  1. AdMax
  2. ADMAX-2957

Report completion issues when several concurrent reports are in progress

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Link Score Report
    • Fix Version/s: Link Score Report
    • Component/s: Linking Report
    • Labels:
      None
    • Environment:

      Production

      Description

      PROBLEM:

      When multiple report requests are processed concurrently, there appears to be an issue in the page scanning process that causes reports to take an unusually long time to complete, or to not complete at all. This appears to be related to the message-driven beans (MDBs) used internally by the LS application to make the page scanning process behave like a multi-threaded application. Page scan operations are batched and delegated to a pool of MDBs, each of which handles the page scan operation for a subset of the URLs in the report request. When multiple report requests are processing concurrently, the page scanning portion of a report request can become very slow to process the last batch of links for a report, or never process them at all. This typically results in a report appearing in an endless "in-progress" status in the UI.
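
      The batching behavior described above can be sketched roughly as follows. This is an illustrative sketch only, not the Link Score implementation: the class names, the batch size, and the status strings are assumptions made for the example. It shows why a single batch that is never acknowledged leaves a report "in-progress" indefinitely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names throughout; none of these come from the Link Score code base.
public class BatchedScanSketch {

    /** Splits the report's URLs into consecutive batches of at most batchSize entries. */
    static List<List<String>> partition(List<String> urls, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < urls.size(); i += batchSize) {
            int end = Math.min(i + batchSize, urls.size());
            batches.add(new ArrayList<>(urls.subList(i, end)));
        }
        return batches;
    }

    /** Tracks how many batches of one report are still outstanding. */
    static final class ReportProgress {
        private final AtomicInteger pendingBatches;

        ReportProgress(int totalBatches) {
            this.pendingBatches = new AtomicInteger(totalBatches);
        }

        /** Called by whichever worker finishes a batch. */
        void batchFinished() {
            pendingBatches.decrementAndGet();
        }

        /** The report only completes once every batch has been acknowledged. */
        String status() {
            return pendingBatches.get() == 0 ? "complete" : "in-progress";
        }
    }
}
```

      With this kind of bookkeeping, a worker that stalls or is orphaned while holding the last batch never calls batchFinished(), so the status never leaves "in-progress" unless some timeout or failure path is added.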

      Other issues and goals related to the pooled message-driven bean approach in use are:

      • Problematic shut-downs/restarts of JBoss. MDBs that are in an orphaned state could be hanging the restart process.
      • Remove a JBoss-specific dependency (MDBs) from the Link Score application.
      • Utilize existing components in the TSA stack that are designed to be more scalable.
      • Move the storage of reports out of the JBoss server and into a location that has greater disk capacity and is intended for report storage.

      POTENTIAL SOLUTIONS:

      Message-driven beans are used in the Link Score process primarily to increase the number of concurrent page scan operations and allow a report to complete quickly. One approach to solving this issue would be to find another way to improve concurrency, such as delegating page scanning operations to a component that runs outside of the JBoss container, i.e., a common job execution process that operates within the TSA stack. Perhaps more appropriate would be to delegate page scanning operations to the AWS infrastructure that is in place for Cloud-enabled Spider.
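
      As a rough illustration of running page scans concurrently without MDBs, the sketch below uses a plain java.util.concurrent thread pool with an overall time limit, so a report always reaches a terminal status. It is an assumption-laden sketch, not the TSA job execution process or the AWS Spider infrastructure mentioned above: BatchScanner, runReport, and the status strings are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not part of the Link Score application.
public class ScanPoolSketch {

    /** Hypothetical stand-in for the real page scan of one batch of URLs. */
    interface BatchScanner {
        int scan(List<String> batch) throws Exception;
    }

    /**
     * Scans all batches on a fixed-size pool. Returns "complete" only if every
     * batch finished within reportTimeoutMillis; otherwise returns "failed",
     * so the report never stays "in-progress" forever.
     */
    static String runReport(List<List<String>> batches, BatchScanner scanner,
                            int threads, long reportTimeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (List<String> batch : batches) {
                tasks.add(() -> scanner.scan(batch));
            }
            // invokeAll with a timeout cancels any task that has not finished in time.
            List<Future<Integer>> results =
                pool.invokeAll(tasks, reportTimeoutMillis, TimeUnit.MILLISECONDS);
            for (Future<Integer> result : results) {
                if (result.isCancelled()) {
                    return "failed"; // a batch timed out
                }
                try {
                    result.get();
                } catch (ExecutionException e) {
                    return "failed"; // a batch threw an exception
                }
            }
            return "complete";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return "failed";
        } finally {
            pool.shutdownNow(); // never leave worker threads behind
        }
    }
}
```

      The point of the design is that the pool is created and shut down per report, and stalled batches are cancelled rather than left orphaned; whether a per-report pool or a shared external job runner fits better would depend on the actual workload.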

      Other potential benefits to moving Link Score processing outside of the container are:

      • Lower system utilization on the JBoss5 server, which is responsible for other system services that are critical to daily operation.
      • Relocation of completed Link Score reports out of the JBoss document root to a location intended for long-term document storage and retrieval.

            People

            • Assignee:
              Unassigned
              Reporter:
              pwynne Patrick Wynne
            • Votes:
              0
              Watchers:
              0

              Dates

              • Created:
                Updated: