Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: None
- Fix Version/s: None
- Component/s: Spider
- Labels: None
Description
This data cross-contamination problem occurred because Otto was crawling a domain (www.tree.com) with the "all subdomains" box checked. Some links on the crawled pages point to hosts whose names contain part of the crawled domain name (e.g., www.lendingtree.com). As a result, those links were treated as subdomain links rather than as links to separate domains, and were included in the ADR accordingly. We need to identify subdomain links correctly, even when a separate domain's name overlaps textually with the primary domain being crawled.
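
One possible way to make the subdomain check strict is sketched below. This is an illustration only, not the Spider's actual code: the ticket does not say how the match is currently done, so the sketch assumes the false positives come from comparing host names as plain text without respecting label boundaries. The function names `base_domain` and `is_subdomain_link` are hypothetical, and stripping a leading "www." is a simplification (a production version would more likely consult the Public Suffix List).

```python
from urllib.parse import urlsplit


def base_domain(crawl_domain: str) -> str:
    """Hypothetical helper: normalize the crawled domain and drop a leading 'www.'.

    Simplifying assumption: 'www.tree.com' and 'tree.com' name the same site.
    """
    host = crawl_domain.lower().rstrip(".")
    return host[4:] if host.startswith("www.") else host


def is_subdomain_link(url: str, crawl_domain: str) -> bool:
    """Return True only if the URL's host is the crawled domain or a true subdomain.

    A host counts as a subdomain only when it ends with '.' + base domain,
    so the match always falls on a DNS label boundary.
    """
    host = urlsplit(url).hostname
    if not host:
        # Relative links and scheme-less strings carry no host to compare.
        return False
    host = host.lower().rstrip(".")
    base = base_domain(crawl_domain)
    return host == base or host.endswith("." + base)
```

With this check, `is_subdomain_link("http://www.lendingtree.com/", "www.tree.com")` is False, because "lendingtree.com" ends with the text "tree.com" but not with ".tree.com", while a host such as "blog.tree.com" (a made-up example) is still accepted as a subdomain.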