Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Sustaining
    • Fix Version/s: Sustaining
    • Component/s: Data Summarization
    • Labels:
      None
    • Environment:

      Production

    • Sprint:
      Sprint 1

      Description

      Bing SE data is failing to process. The cause appears to be the last line of the downloaded report, a copyright notice, which the system does not recognize and tries to load as a data row.

      Attached is the seuser report for the past few days. Note that today's copyright notice is slightly different:

      Today: "© 2013 Microsoft. All rights reserved. "

      Before: "©2013 Microsoft Corporation. All rights reserved. "

      Rerun:

      nohup $DIR/sepull147.sh -d3 -T 2013-04-23 -a 925 -U 3767 --force 2> /tmp/2013-04-24/sepull147.sh_a_925_u_3767_rerun1.log &

      Log extract:

      2013-04-24 05:30:44.117 (3) [P1T1]: Parsing /var/local/tsa/sedata/msn/tsa-msn-report-2013-04-23-U3767.csv.zip...
      2013-04-24 05:30:45.535 (3) [P1T1]: DatabasePool with a limit of 6 created
      2013-04-24 05:30:45.609 (3) [P1T1]: Deleting existing records for seuser #3767 where ((`GregorianDate`="2013-04-23 00:00:00"))
      2013-04-24 05:30:45.733 (2) [P1T1]: Warning(s) detected in statement [LOAD DATA LOCAL INFILE "/tmp/msn376744295.tmp" INTO TABLE `tmp`.`tmp1` FIELDS TERMINATED BY '\t' ENCLOSED BY '"' (`AccountId`,`CampaignId`,`AdGroupId`,`KeywordId`,`AdId`,`GregorianDate`,`Hour`,`AccountName`,`CampaignName`,`AdDistribution`,`Keyword`,`MatchType`,`Impressions`,`Clicks`,`Ctr`,`AverageCpc`,`Spend`,`AveragePosition`,`Conversions`,`ConversionRate`,`CostPerConversion`,`CurrentMaxCpc`)] executed by user [spike]
      2013-04-24 05:30:45.734 (2) [P1T1]: SQL Warning: Incorrect integer value: '© 2013 Microsoft. All rights reserved. ' for column 'AccountId' at row 5109 (code:1366)
      2013-04-24 05:30:45.734 (2) [P1T1]: SQL Warning: Row 5109 doesn't contain data for all columns (code:1261)
      2013-04-24 05:30:45.789 (3) [P1T1]: Attempt #1 failed, retrying for searchEngineUser 3767

      This is the same error for all sepull147 runs on April 24th.

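      The problem seems to be in MSNSearchEngine.java: a line is treated as report data unless it ends with the expected copyright notice, and the expected suffix still contains "Microsoft Corporation", which today's notice no longer does.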

      private static final String REPORT_COPYRIGHT_NOTICE_SUFFIX = "Microsoft Corporation. All rights reserved. \"";

      Line 5813:
      else if ((!line.equals(""))
      && (!line.startsWith(EMPTY_REPORT_MESSAGE_PREFIX))
      && (!line.startsWith(EMPTY_REPORT_MESSAGE_PREFIX_2))
      && (!line.endsWith(REPORT_COPYRIGHT_NOTICE_SUFFIX))) {

      isEmpty = false;
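
A minimal sketch of why the suffix check misses today's notice, and of one possible more tolerant check. Only REPORT_COPYRIGHT_NOTICE_SUFFIX and the two notice strings come from this ticket; the class name, the method names, and the regular expression are hypothetical and are not taken from MSNSearchEngine.java. The sketch assumes the notice line is wrapped in double quotes in the report, as the existing suffix implies.

```java
import java.util.regex.Pattern;

public final class CopyrightLineCheck {
    // Suffix as quoted in this ticket from MSNSearchEngine.java.
    static final String REPORT_COPYRIGHT_NOTICE_SUFFIX =
            "Microsoft Corporation. All rights reserved. \"";

    // Hypothetical tolerant pattern (an assumption, not existing code):
    // optional quote, copyright sign, optional space, 4-digit year,
    // "Microsoft", any company suffix, then "All rights reserved.".
    static final Pattern COPYRIGHT_NOTICE = Pattern.compile(
            "^\"?\\s*\u00A9\\s*\\d{4}\\s+Microsoft\\b.*All rights reserved\\.\\s*\"?\\s*$");

    private CopyrightLineCheck() {}

    // The check as it appears in the quoted condition.
    static boolean matchesCurrentSuffix(String line) {
        return line.endsWith(REPORT_COPYRIGHT_NOTICE_SUFFIX);
    }

    // Possible replacement that accepts both wordings of the notice.
    static boolean isCopyrightNotice(String line) {
        return COPYRIGHT_NOTICE.matcher(line).matches();
    }

    public static void main(String[] args) {
        String before = "\"\u00A92013 Microsoft Corporation. All rights reserved. \"";
        String today = "\"\u00A9 2013 Microsoft. All rights reserved. \"";

        System.out.println(matchesCurrentSuffix(before)); // true
        System.out.println(matchesCurrentSuffix(today));  // false, so the line is loaded as data
        System.out.println(isCopyrightNotice(before));    // true
        System.out.println(isCopyrightNotice(today));     // true
    }
}
```

Matching on the stable parts of the notice (copyright sign, year, "Microsoft", "All rights reserved.") rather than on an exact suffix should survive small wording changes like this one, at the cost of being slightly looser about what counts as a notice.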

            People

            • Assignee:
              diego.nan Diego Nan (Inactive)
              Reporter:
              diego.nan Diego Nan (Inactive)
            • Votes:
              0
              Watchers:
              1

                Time Tracking

                Estimated:
                1d
                Remaining:
                1d
                Logged:
                Not Specified