Add (failing) test for Evolution's address autocompletion query
[00:21] <tristan> dwmw2, if you really think that, then it will have to be rethought again, but if we're gonna rethink it again - please add a failing test case to test-client-custom-summary.c before we even go down that road. Here you go :) Note: This fails because we're using an inner join of folder_id with folder_id_email_list on uid. Which means that records in folder_id which don't *have* an email address aren't considered for the search. This is the query: SELECT DISTINCT summary.uid, summary.vcard, summary.bdata FROM 'folder_id' AS summary JOIN 'folder_id_email_list' AS email_list ON +email_list.uid = summary.uid WHERE ((summary.nickname IS NOT NULL AND summary.nickname LIKE 'p%') OR (email_list.value IS NOT NULL AND email_list.value LIKE 'p%') OR ((summary.full_name IS NOT NULL AND summary.full_name LIKE 'p%') OR (summary.family_name IS NOT NULL AND summary.family_name LIKE 'p%') OR (summary.given_name IS NOT NULL AND summary.given_name LIKE 'p%') OR (summary.nickname IS NOT NULL AND summary.nickname LIKE 'p%')) OR (summary.file_as IS NOT NULL AND summary.file_as LIKE 'p%')); You actually need a LEFT OUTER JOIN there, if you're going to do it that way. The above query takes about 1350ms on my data set with 238121 records. If I change it to a LEFT OUTER JOIN, it looks like it will literally take hours, so I didn't let it finish. I'm not sure what the '+' sign in 'ON +email_list.uid = summary.uid' is supposed to do, but if I *remove* that to make it 'LEFT OUTER JOIN … ON email_list.uid = summary.uid' the query then completes in about 1650ms. I can improve that by restructuring the query to look like this: SELECT DISTINCT summary.uid, summary.vcard, summary.bdata FROM 'folder_id' AS summary JOIN 'folder_id_email_list' AS email_list ON +email_list.uid = summary.uid WHERE (email_list.value IS NOT NULL AND email_list.value LIKE 'p%') UNION SELECT summary.uid, summary.vcard, summary.bdata FROM 'folder_id' AS summary WHERE ((summary.nickname IS NOT NULL AND summary.nickname LIKE 'p%') OR ((summary.full_name IS NOT NULL AND summary.full_name LIKE 'p%') OR (summary.family_name IS NOT NULL AND summary.family_name LIKE 'p%') OR (summary.given_name IS NOT NULL AND summary.given_name LIKE 'p%') OR (summary.nickname IS NOT NULL AND summary.nickname LIKE 'p%')) OR (summary.file_as IS NOT NULL AND summary.file_as LIKE 'p%')); That runs in about 460ms on my data set. I can get it down to about 400ms by eliding some of the IS NOT NULL checks that seem rather gratuitous (although shouldn't sqlite do that for itself since it's a fairly bloody obvious optimisation?): SELECT DISTINCT summary.uid, summary.vcard, summary.bdata FROM 'folder_id' AS summary JOIN 'folder_id_email_list' AS email_list ON +email_list.uid = summary.uid WHERE (email_list.value LIKE 'p%') UNION SELECT summary.uid, summary.vcard, summary.bdata FROM 'folder_id' AS summary WHERE ((summary.nickname LIKE 'p%') OR (summary.full_name LIKE 'p%') OR (summary.family_name LIKE 'p%') OR (summary.given_name LIKE 'p%') OR (summary.nickname LIKE 'p%') OR (summary.file_as LIKE 'p%')); Finally, if I actually add indices on the fields that need them, such as file_as, nickname, etc., I can get the search time down to 5-10ms. This *only* works for my refactored queries using 'UNION', and not for the JOIN versions. Obviously the indices aren't useful on the results of the JOIN. (Note: we did already have an index on file_as_localized, but not file_as. And likewise for some other fields).
parent
d7195ef0
Please register or sign in to comment