  1. AdMax
  2. ADMAX-2597

warehouse summarizer does not handle utf8 characters correctly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: Sustaining
    • Component/s: Data Summarization
    • Labels: None

      Description

      The sources table defines keyword and description as utf8 columns. The temp table that the summarizer creates does not appear to declare those columns as utf8, nor does it set a character set for the table as a whole.
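To make the point above concrete, here is a minimal sketch of a staging-table definition that states its character set and collation explicitly instead of inheriting server defaults. The table name, column sizes, and the choice of utf8_bin are assumptions for illustration only; the summarizer's real DDL is not shown in this ticket, and the unique key is modeled on the key name 'description' from the error message.

```python
def staging_table_ddl(table_name: str,
                      charset: str = "utf8",
                      collation: str = "utf8_bin") -> str:
    """Build a CREATE TEMPORARY TABLE statement with an explicit charset.

    Column names mirror the sources table mentioned in the ticket; the
    sizes and the composite unique key are hypothetical.
    """
    return (
        f"CREATE TEMPORARY TABLE {table_name} (\n"
        "  accountID INT NOT NULL,\n"
        "  keyword VARCHAR(255) NOT NULL,\n"
        "  description VARCHAR(255) NOT NULL,\n"
        "  UNIQUE KEY description (accountID, description)\n"
        # Table-level options so every text column defaults to utf8.
        f") DEFAULT CHARACTER SET {charset} COLLATE {collation}"
    )


if __name__ == "__main__":
    # "tmp_sources_staging" is a made-up name for this example.
    print(staging_table_ddl("tmp_sources_staging"))
```

A binary collation compares values exactly, so variants that differ only by accent or case would count as distinct keys. Whether that is the desired behavior for the summarizer is a separate decision.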

      Account 799 (Small Luxury Hotels) was just added Saturday

      rgardner@xml-06:~$ ls -l /tmp/2011-03-??/sesumm*799*
      -rw-r--r-- 1 tsaapp tsaapp 36167 2011-03-26 04:53 /tmp/2011-03-26/sesummarize.sh_d_3_S_3_T_2011_03_25_a_12,16,782,799_04:52:26.log
      -rw-r--r-- 1 tsaapp tsaapp 46003 2011-03-27 04:55 /tmp/2011-03-27/sesummarize.sh_d_3_S_3_T_2011_03_26_a_12,16,782,799_04:54:35.log
      -rw-r--r-- 1 tsaapp tsaapp 4607 2011-03-28 08:04 /tmp/2011-03-28/sesummarize.sh-799-whsumm-rerun.log
      -rw-r--r-- 1 tsaapp tsaapp 41734 2011-03-28 04:53 /tmp/2011-03-28/sesummarize.sh_d_3_S_3_T_2011_03_27_a_12,16,782,799_04:52:26.log
      -rw-r--r-- 1 tsaapp tsaapp 39692 2011-03-29 04:53 /tmp/2011-03-29/sesummarize.sh_d_3_S_3_T_2011_03_28_a_12,16,782,799_04:52:29.log

      and there was a similar error on Sunday:

      2011-03-27 04:55:36.350 (2) [P10T1]: Exception [creating staging temp table]:com.mysql.jdbc.exceptions.MySQLIntegrityConstraintViolationException: Duplicate entry '799-hoteldorf grĂ¼ner baum' for key 'description'

      There appear to be similar sources for this account:

      mysql> select id, keyword, description from sources where accountID = 799 and distributionID = 3 and keyword = 'Im Weissen Rossl';
      +-----------+------------------+-----------------------------------+
      | id        | keyword          | description                       |
      +-----------+------------------+-----------------------------------+
      | 925871415 | Im Weissen Rssl  | Keyword: [Im Weissen Rssl] broad  |
      | 925871455 | Im Weissen Rssl  | Keyword: [Im Weissen Rssl] exact  |
      | 925895145 | Im Weissen Rossl | Keyword: [Im Weissen Rossl] broad |
      | 925895175 | Im Weissen Rossl | Keyword: [Im Weissen Rossl] exact |
      | 926513915 | im weissen rssl  | Keyword "im weissen rssl"         |
      +-----------+------------------+-----------------------------------+
      5 rows in set (0.02 sec)

      It does appear that the temp table is confusing these slightly different variations.
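One possible mechanism for this kind of confusion is a comparison that ignores accents and case, so that variants differing only by a diacritic produce the same key. The sketch below approximates such a comparison in Python; it is not MySQL's collation algorithm, and the sample strings are illustrative values rather than rows taken from account 799.

```python
import unicodedata


def fold(text: str) -> str:
    """Return an accent- and case-insensitive comparison key."""
    # NFD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return stripped.casefold()


def find_collisions(values):
    """List pairs of distinct values that share the same folded key."""
    seen = {}
    collisions = []
    for value in values:
        key = fold(value)
        if key in seen and seen[key] != value:
            collisions.append((seen[key], value))
        else:
            seen.setdefault(key, value)
    return collisions


if __name__ == "__main__":
    # Hypothetical descriptions that differ only by an umlaut.
    samples = ["hoteldorf grüner baum", "hoteldorf gruner baum"]
    print(find_collisions(samples))
```

Under a unique key that compares values this way, inserting the second variant would be rejected as a duplicate even though the stored bytes differ.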

            People

            • Assignee: squadrim Mike Squadrito (Inactive)
            • Reporter: therouxj Jeff Theroux