  1. AdMax
  2. ADMAX-3128

Cloud Spider: Executing Commands From MsgListener Fails With "Too many open files" Exception

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Sustaining
    • Component/s: Cloud Spider
    • Labels:
      None
    • Environment:

      AWS

    • Sprint:
      Sprint 2

      Description

      Need to find the root cause of the "Too many open files" exception.
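
      A possible first diagnostic step, sketched under assumptions: if the process that logs these errors runs on a HotSpot/OpenJDK JVM on Linux, it can report its own open and maximum file descriptor counts through com.sun.management.UnixOperatingSystemMXBean. The class name FdProbe is hypothetical and not part of the Cloud Spider code. Logging these two numbers periodically would show whether the open count climbs toward the limit before error=24 appears.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, not taken from the Cloud Spider source.
final class FdProbe {

    /**
     * Returns {open, max} file descriptor counts for this JVM,
     * or null when the platform bean does not expose them (e.g. on Windows).
     */
    static long[] descriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = descriptorCounts();
        if (counts == null) {
            System.out.println("File descriptor counts are not available on this JVM");
        } else {
            System.out.println("open fds: " + counts[0] + ", limit: " + counts[1]);
        }
    }
}
```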

      Wed 2012/08/15 17:05:00.615| |Thread-217|StreamGobbler|STDERR: 12/08/15 17:05:00 INFO mapred.LocalJobRunner: 20 threads, 92101 pages, 128 errors, 1.3 pages/s, 2203 kb/s,
      Wed 2012/08/15 17:05:00.728| |Thread-217|StreamGobbler|STDERR: 12/08/15 17:05:00 INFO fetcher.Fetcher: <crawl 3352> total_urls_spidered: 298827 (max:1000000)
      Wed 2012/08/15 17:05:00.729| |Thread-217|StreamGobbler|STDERR: 12/08/15 17:05:00 INFO fetcher.Fetcher: <crawl 3352> fetching: http://www.remax.com/property/85812259-60050095/Off-of-Hereford-Road-Taylor-AZ-85939/
      Wed 2012/08/15 17:05:01.025| |Thread-217|StreamGobbler|STDERR: 12/08/15 17:05:01 INFO fetcher.Fetcher: <crawlid 3352> Sending status message now
      Wed 2012/08/15 17:05:01.051| |Thread-217|StreamGobbler|STDERR: 12/08/15 17:05:01 INFO fetcher.Fetcher: -activeThreads=20, spinWaiting=0, fetchQueues.totalSize=999
      Wed 2012/08/15 17:05:01.885| |main|StreamGobbler|Got message: id: 29ea66d2-2f29-4c41-a3b3-26b04cd06cc1 body: {"msgID":"c2341558-9be4-4aee-93b7-2087ea0102e3","replyQueue":"cloudmgmtsvc_production","replyDestType":"JMS_QUEUE","replyDest":"/queue/SpiderService-ServerResponseQueue","msgType":"start_crawl","body":"

      {\"crawlRequestID\":3376,\"url\":\"http://www.toronto.com/\",\"statusListenerHostname\":\"ip-10-68-26-249.ec2.internal\",\"userAgent\":\"Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)\",\"honorNoIndex\":true,\"honorRobots\":true,\"parameterLimit\":5,\"linkDepth\":6,\"honorNoFollow\":true,\"honorRelNoFollow\":true,\"addPageLinksOnFly\":true,\"numberOfPages\":250000,\"numberOfThreads\":20,\"fixURLEncoding\":true,\"spiderSubDomains\":false,\"spiderSubDomainsPrimary\":false,\"subDomains\":[],\"skipParameters\":[],\"isCrawlAuthneticated\":false,\"allowedDir\":[],\"excludeDir\":[]}

      "}
      Wed 2012/08/15 17:05:01.886| |main|StreamGobbler|Getting RequestMsg...
      Wed 2012/08/15 17:05:01.893| |main|StreamGobbler|Getting Start Crawl Msg...
      Wed 2012/08/15 17:05:01.893| |main|StreamGobbler|Cleaning up old Crawls...
      Wed 2012/08/15 17:05:01.894| |main|StreamGobbler|Execute Command: bash su nutch -c "/nutch/search/scripts/clean-up.sh"
      Wed 2012/08/15 17:05:01.895| |main|StreamGobbler|#### ERROR: executeCommand Cannot run program "bash": java.io.IOException: error=24, Too many open files
      Wed 2012/08/15 17:05:01.896| |main|StreamGobbler|Starting Crawl...
      Wed 2012/08/15 17:05:01.909| |main|StreamGobbler|<CRAWLID 3376> Attempting create crawl config dir, flag false
      Wed 2012/08/15 17:05:01.910| |main|StreamGobbler|Execute Command: mkdir /mnt/nutch/CRAWL_config/CRAWL_3376
      Wed 2012/08/15 17:05:01.911| |main|StreamGobbler|#### ERROR: executeCommand Cannot run program "mkdir": java.io.IOException: error=24, Too many open files
      Wed 2012/08/15 17:05:01.911| |main|StreamGobbler|Execute Command: chmod 777 -R /mnt/nutch/CRAWL_config/CRAWL_3376
      Wed 2012/08/15 17:05:01.912| |main|StreamGobbler|#### ERROR: executeCommand Cannot run program "chmod": java.io.IOException: error=24, Too many open files
      Wed 2012/08/15 17:05:01.912| |main|StreamGobbler|<CRAWLID 3376> crawl config dir created and permissions set to /mnt/nutch/CRAWL_config/CRAWL_3376
      Wed 2012/08/15 17:05:01.913| |main|StreamGobbler|#### ERROR: writeFile java.io.FileNotFoundException: /mnt/nutch/CRAWL_config/CRAWL_3376/subDomains_3376 (No such file or directory)
      Wed 2012/08/15 17:05:01.913| |main|StreamGobbler|#### ERROR: writeFile java.io.FileNotFoundException: /mnt/nutch/CRAWL_config/CRAWL_3376/allowed_3376 (No such file or directory)
      Wed 2012/08/15 17:05:01.914| |main|StreamGobbler|#### ERROR: writeFile java.io.FileNotFoundException: /mnt/nutch/CRAWL_config/CRAWL_3376/exclude_3376 (No such file or directory)
      Wed 2012/08/15 17:05:01.915| |main|StreamGobbler|#### ERROR: writeFile java.io.FileNotFoundException: /mnt/nutch/CRAWL_config/CRAWL_3376/skipParams_3376 (No such file or directory)
      Wed 2012/08/15 17:05:01.931| |main|StreamGobbler|Execute Command: chmod 777 /nutch/Crawl_3376_javaexec.sh
      Wed 2012/08/15 17:05:01.931| |main|StreamGobbler|#### ERROR: executeCommand Cannot run program "chmod": java.io.IOException: error=24, Too many open files
      Wed 2012/08/15 17:05:01.932| |main|StreamGobbler|Execute Command: su nutch -c "/nutch/Crawl_3376_javaexec.sh"
      Wed 2012/08/15 17:05:01.933| |main|StreamGobbler|#### ERROR: executeCommandPID Cannot run program "su": java.io.IOException: error=24, Too many open files
      Wed 2012/08/15 17:05:01.933| |main|StreamGobbler|Kicking off Crawl Process, ProcessID: -1
      Wed 2012/08/15 17:05:01.933| |main|StreamGobbler|Sending Crawl Status...
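
      The ticket does not record the root cause. One well-known way for a JVM that launches many commands to reach error=24 is leaving the pipes of each Process open, since every child process holds up to three descriptors in the parent until its streams are closed. The sketch below shows a command helper that drains and closes them. CommandRunner and Result are hypothetical names, not the actual executeCommand implementation referenced in the log.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not the Cloud Spider executeCommand method.
final class CommandRunner {

    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /** Runs a command, captures merged stdout/stderr, and releases all pipes. */
    static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout: one pipe to drain
        Process process = builder.start(); // throws IOException if the program cannot be started
        StringBuilder output = new StringBuilder();
        try {
            process.getOutputStream().close(); // nothing is written to the child's stdin
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.append(line).append('\n');
                }
            }
            return new Result(process.waitFor(), output.toString());
        } finally {
            process.getErrorStream().close(); // empty after the redirect, closed for completeness
            process.destroy();                // harmless if the child has already exited
        }
    }
}
```

      In the log above, each failed launch was followed by the next step anyway (the config dir was reported as created right after mkdir failed). Because run propagates the IOException and returns the exit code, a caller could stop the crawl setup at the first failure instead.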

            People

            • Assignee:
              jshih Jeff Shih (Inactive)
            • Reporter:
              antony Antony Rajiv (Inactive)