Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: None
- Component/s: Spider
- Labels: None
Description
While crawling a site, if a page is returned with the MIME type "application/xhtml+xml", the spider marks the page as crawled but does not parse it to extract the links on the page.
This MIME type is associated with XHTML pages, although not all XHTML pages are served with it. We need to identify the MIME types most commonly used for pages and make sure the spider can crawl those pages properly.
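
The check described above could look roughly like the following. This is only a sketch, not the spider's actual code: the names `PARSEABLE_MIME_TYPES` and `should_parse_for_links` are hypothetical, and the set contains only the two registered HTML/XHTML types as a starting point. It would need to be extended once the commonly used MIME types have been identified.

```python
# Hypothetical allow-list of MIME types whose bodies should be parsed for links.
# Assumption: only the registered HTML and XHTML types are listed here; the
# real list should come from the survey of commonly used types.
PARSEABLE_MIME_TYPES = frozenset({
    "text/html",
    "application/xhtml+xml",
})


def should_parse_for_links(content_type_header):
    """Return True if the Content-Type header names a parseable page type.

    The header may carry parameters (for example "; charset=UTF-8") and may
    use any letter case, so only the normalized media type is compared.
    """
    if not content_type_header:
        # Missing or empty header: do not attempt to parse.
        return False
    # Drop parameters after the first ";" and normalize case and whitespace.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in PARSEABLE_MIME_TYPES
```

Comparing the normalized media type rather than the raw header value means that a response such as "application/xhtml+xml; charset=UTF-8" is still recognized as a page to parse.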