AdMax / ADMAX-1984

Spider: crawl fails to honor "robots.txt" file for an individual search engine

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: unspecified
    • Fix Version/s: None
    • Component/s: Spider
    • Labels:
      None
    • Environment:

      Operating System: Windows XP
      Platform: PC

    • Bugzilla Id:
      3589

      Description

      Prerequisites:
      Edit the "robots.txt" file in the web root folder as follows, and save it:
      --------------
      User-Agent: Googlebot
      Disallow: /Spider/simple.htm
      --------------

      Steps:
      1. Log into the AdMax application.
      2. Navigate to the SEO section and click on "Spider".
      3. Enter a valid URL to spider, e.g.
      "http://pvwb-of1pvd0010.ri.thesearchagency.com/gurpreet/Spider/newlink.htm"
      4. Select "Custom spider options" and ensure the "Honor Robots" check box is
      checked.
      5. Spider the URL and download the generated ADR report.

      Expected Result:
      In the ADR, the link to "Spider/simple.htm" should not be listed in "Spidered
      Urls_1", as it is disallowed.

      Actual Result:
      In the ADR, the link to "Spider/simple.htm" is listed in "Spidered Urls_1".

      Note:
      Honoring the "robots.txt" file works correctly when a wildcard user agent
      (User-Agent: *) is used to disallow all crawlers.
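      The expected matching behavior can be sketched with Python's standard
      urllib.robotparser, which implements the conventional robots.txt rules. This
      is only an illustration of the semantics the spider should follow; the
      ticket does not state what user-agent string the AdMax spider reports, so
      "SomeOtherBot" below is a hypothetical agent name:

      ```python
      from urllib.robotparser import RobotFileParser

      # The robots.txt content from the prerequisites above.
      robots_txt = """\
      User-Agent: Googlebot
      Disallow: /Spider/simple.htm
      """

      rp = RobotFileParser()
      rp.parse(robots_txt.splitlines())

      # A group scoped to "Googlebot" must block that agent from the path...
      print(rp.can_fetch("Googlebot", "/Spider/simple.htm"))     # False
      # ...while agents not named in any group fall through to "allow".
      print(rp.can_fetch("SomeOtherBot", "/Spider/simple.htm"))  # True
      ```

      A spider honoring robots.txt should therefore skip "/Spider/simple.htm" only
      when its own user agent matches the "Googlebot" group, which is consistent
      with the wildcard case in the note working while the named-agent case fails.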


            People

            • Assignee:
              abhiram Abhiram Bhagwat
            • Reporter:
              saravanan.t Saravanan (Inactive)
            • Votes:
              0
            • Watchers:
              2
