  1. AdMax
  2. ADMAX-2825

Cloud Spider: Spider Tries to Fetch Failed URLs Again and Again (In Each Iteration)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Cloud Spider 3.01
    • Fix Version/s: Cloud Spider 3.02
    • Component/s: Cloud Spider
    • Labels:
      None

      Description

      Bug in Nutch (Related: https://issues.apache.org/jira/browse/NUTCH-578)

      When fetching a URL throws an exception, Nutch schedules that URL for retry, so it is fetched again in every iteration until all iterations are complete. This needs to be changed so that such URLs are permanently marked as failed. The retries also cause the URLs Found count to be wrong, because failed URLs can be added again. This should be a one-line change in Fetcher.java.
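
      A minimal, self-contained sketch of the intended behavior change, assuming a simplified crawl loop rather than the actual code in Fetcher.java. The class RetrySketch, the Status enum, and the crawl method are hypothetical names used only for illustration. It counts fetch attempts for a URL that always throws, first when the exception path marks it for retry and then when it marks it permanently failed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RetrySketch {
    // Hypothetical statuses; not the real Nutch constants.
    enum Status { UNFETCHED, FETCHED, RETRY, FAILED }

    // Runs `iterations` crawl rounds. URLs in `broken` always throw on fetch.
    // `onException` is the status assigned in the exception path.
    // Returns the number of fetch attempts made per URL.
    static Map<String, Integer> crawl(List<String> urls, Set<String> broken,
                                      int iterations, Status onException) {
        Map<String, Status> db = new LinkedHashMap<>();
        Map<String, Integer> attempts = new LinkedHashMap<>();
        for (String u : urls) {
            db.put(u, Status.UNFETCHED);
            attempts.put(u, 0);
        }
        for (int i = 0; i < iterations; i++) {
            for (String u : urls) {
                Status s = db.get(u);
                // Only unfetched or retry-marked URLs are scheduled again.
                if (s != Status.UNFETCHED && s != Status.RETRY) {
                    continue;
                }
                attempts.merge(u, 1, Integer::sum);
                db.put(u, broken.contains(u) ? onException : Status.FETCHED);
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        List<String> urls = List.of("http://example.com/ok", "http://example.com/bad");
        Set<String> broken = Set.of("http://example.com/bad");
        // Marking for retry: the broken URL is fetched in every iteration.
        System.out.println(crawl(urls, broken, 3, Status.RETRY));
        // Marking as failed: the broken URL is fetched once and then skipped.
        System.out.println(crawl(urls, broken, 3, Status.FAILED));
    }
}
```

      In this sketch the only difference between the two runs is the single status value assigned in the exception path, which mirrors the one-line nature of the proposed change.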

            People

            • Assignee:
              antony Antony Rajiv (Inactive)
              Reporter:
              antony Antony Rajiv (Inactive)