Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: Cloud Spider 3.03
Component/s: Cloud Spider
Labels: None
Description
Regex patterns are built from the crawl input options and used to decide whether a URL is valid for the crawl. For some URLs found during the crawl, our URL pattern validator regex never finishes matching: the match runs for a very long time, consumes a lot of CPU, and the crawl process hangs midway.
We need to either change the regex or use different logic to decide whether a URL belongs to the crawl.
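One possible shape for the "different logic" option is sketched below, assuming the crawl scope can be expressed as allowed schemes, allowed hosts and excluded path prefixes. It replaces the regex with structured checks on the parsed URL. This is a swapped-in technique for illustration, not the fix shipped in Cloud Spider 3.03; `is_url_in_scope`, `ALLOWED_HOSTS`, `ALLOWED_SCHEMES` and `EXCLUDED_PATH_PREFIXES` are hypothetical names, and the excluded prefixes are made up.

```python
from urllib.parse import urlsplit

# Hypothetical crawl scope; in the crawler this would be derived from the
# crawl input options instead of being hard-coded.
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"www.monster.de", "www.monster.fr"}
EXCLUDED_PATH_PREFIXES = ("/logout", "/admin")  # hypothetical exclusions


def is_url_in_scope(url: str,
                    allowed_schemes=ALLOWED_SCHEMES,
                    allowed_hosts=ALLOWED_HOSTS,
                    excluded_prefixes=EXCLUDED_PATH_PREFIXES) -> bool:
    """Decide whether a URL belongs to the crawl without running a regex.

    Each check is a set lookup or a prefix comparison, so the work done is
    roughly proportional to the URL length; there is no backtracking that
    can run away on a pathological URL.
    """
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed URLs, e.g. an unclosed IPv6 bracket.
        return False
    # urlsplit lower-cases the scheme, and .hostname is lower-cased too.
    if parts.scheme not in allowed_schemes:
        return False
    if (parts.hostname or "") not in allowed_hosts:
        return False
    path = parts.path or "/"
    return not path.startswith(excluded_prefixes)
```

For example, `is_url_in_scope("http://www.monster.de/jobs?q=java")` is accepted, while `http://www.monster.de/admin/settings` and `ftp://www.monster.de/` are rejected. If the scope rules genuinely need regular expressions, the other option in this ticket could be a linear-time engine such as RE2 in place of a backtracking one.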
Crawls affected so far:
Crawl 1499 - http://www.monster.de/
Crawl 1500 - http://www.monster.fr/