Skip to content
Commit 5f9f5b52 authored by David Woodhouse's avatar David Woodhouse
Browse files

Add (failing) test for Evolution's address autocompletion query

[00:21] <tristan> dwmw2, if you really think that, then it will have to be
            rethought again, but if we're gonna rethink it again - please
            add a failing test case to test-client-custom-summary.c before
            we even go down that road.

Here you go :)

Note: This fails because we're using an inner join of folder_id with
folder_id_email_list on uid. Which means that records in folder_id
which don't *have* an email address aren't considered for the search.

This is the query:
SELECT DISTINCT summary.uid, summary.vcard, summary.bdata
  FROM  'folder_id' AS summary
  JOIN  'folder_id_email_list' AS email_list
  ON    +email_list.uid = summary.uid
  WHERE ((summary.nickname IS NOT NULL AND summary.nickname LIKE 'p%')
      OR (email_list.value IS NOT NULL AND email_list.value LIKE 'p%')
      OR ((summary.full_name IS NOT NULL AND summary.full_name LIKE 'p%')
       OR (summary.family_name IS NOT NULL AND summary.family_name LIKE 'p%')
       OR (summary.given_name IS NOT NULL AND summary.given_name LIKE 'p%')
       OR (summary.nickname IS NOT NULL AND summary.nickname LIKE 'p%'))
      OR (summary.file_as IS NOT NULL AND summary.file_as LIKE 'p%'));

You actually need a LEFT OUTER JOIN there, if you're going to do it that
way.

The above query takes about 1350ms on my data set with 238121 records.

If I change it to a LEFT OUTER JOIN, it looks like it will literally
take hours, so I didn't let it finish. I'm not sure what the '+' sign
in 'ON +email_list.uid = summary.uid' is supposed to do, but if I
*remove* that to make it 'LEFT OUTER JOIN … ON email_list.uid = summary.uid'
the query then completes in about 1650ms.

I can improve that by restructuring the query to look like this:

SELECT DISTINCT summary.uid, summary.vcard, summary.bdata
  FROM  'folder_id' AS summary
  JOIN  'folder_id_email_list' AS email_list
  ON    +email_list.uid = summary.uid
  WHERE (email_list.value IS NOT NULL AND email_list.value LIKE 'p%')
UNION
SELECT summary.uid, summary.vcard, summary.bdata
  FROM  'folder_id' AS summary
  WHERE ((summary.nickname IS NOT NULL AND summary.nickname LIKE 'p%')
      OR ((summary.full_name IS NOT NULL AND summary.full_name LIKE 'p%')
       OR (summary.family_name IS NOT NULL AND summary.family_name LIKE 'p%')
       OR (summary.given_name IS NOT NULL AND summary.given_name LIKE 'p%')
       OR (summary.nickname IS NOT NULL AND summary.nickname LIKE 'p%'))
      OR (summary.file_as IS NOT NULL AND summary.file_as LIKE 'p%'));

That runs in about 460ms on my data set.

I can get it down to about 400ms by eliding some of the IS NOT NULL checks
that seem rather gratuitous (although shouldn't sqlite do that for itself
since it's a fairly bloody obvious optimisation?):

SELECT DISTINCT summary.uid, summary.vcard, summary.bdata
  FROM  'folder_id' AS summary
  JOIN  'folder_id_email_list' AS email_list
  ON    +email_list.uid = summary.uid
  WHERE (email_list.value LIKE 'p%')
UNION
SELECT summary.uid, summary.vcard, summary.bdata
  FROM  'folder_id' AS summary
  WHERE ((summary.nickname LIKE 'p%')
      OR (summary.full_name LIKE 'p%')
      OR (summary.family_name LIKE 'p%')
      OR (summary.given_name LIKE 'p%')
      OR (summary.nickname LIKE 'p%')
      OR (summary.file_as LIKE 'p%'));

Finally, if I actually add indices on the fields that need them, such as
file_as, nickname, etc., I can get the search time down to 5-10ms. This
*only* works for my refactored queries using 'UNION', and not for the JOIN
versions. Obviously the indices aren't useful on the results of the JOIN.

(Note: we did already have an index on file_as_localized, but not file_as.
 And likewise for some other fields).
parent d7195ef0
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment