- Feb 29, 2016
-
-
Carlos Garnacho authored
Our old stale copy of the FTS3/4 module is now deleted, replaced by a shinier embedded FTS5 module. If at configure time we detect that SQLite doesn't offer the FTS5 module, we load our own, just as we used to do with FTS4.

FTS5 brings a few differences in the ways it's meant to be extended, so the tokenizer has been updated to cope with them. Also, FTS5 offers no offsets() builtin function, nor the matchinfo() we used to implement ranking with. It does, however, offer ways to implement additional functions, plus builtin rank support that can be tweaked to achieve the same functional results as before. Other than that, the ways to interact with the FTS virtual table are roughly similar to those in FTS4; insertions and deletions have been updated to do things the FTS5 way.

Since it's not worth bumping the database format (data is reproduced from the journal, so we drop some embedded data such as nie:plainTextContent), the nco:hobby property has been modified to no longer be fulltext indexed. AFAIK no user ever sets or accesses it, and the FTS properties change will trigger the regeneration of the FTS view and virtual tables, resulting in a seamless update to FTS5.

However, we don't leave completely unscathed from the fts3_tokenizer() change. Since the older FTS3/4 tokenizer is no longer registered, we can't just drop the older FTS table, so it is left dangling and never accessed again, in favor of the newer FTS5 table. This is obviously not a problem when creating the database from scratch.

Along the way, a few bugs were found: per-property weights in ranking were being given in a scrambled way (although stable across database generations), and deletion of FTS properties (or entire rows) could leave tokens not fully removed from the FTS table, resulting in confused searches. These are now fixed.

The impact on users of tracker should be none. All the FTS SPARQL-to-SQL translation has been updated to just use FTS5 syntax and tables.
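The FTS4-to-FTS5 differences described above (no matchinfo()/offsets(), builtin rank/bm25() with per-column weights, plain INSERT/DELETE handling) can be sketched directly against SQLite. This is an illustration on a toy table, not Tracker's actual schema, and assumes the SQLite build has FTS5 compiled in:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Toy FTS5 table (not Tracker's schema); assumes FTS5 is compiled in.
con.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
con.executemany("INSERT INTO docs(title, body) VALUES (?, ?)", [
    ("notes", "ranking mentioned only in the body here"),
    ("ranking", "bm25 weights title and body differently"),
])

# FTS5 has no matchinfo()/offsets(); ranking uses the builtin bm25().
# Per-column weights go as extra arguments (title weighted 10x body),
# playing the role that matchinfo()-based ranking played with FTS4.
rows = con.execute(
    "SELECT title FROM docs WHERE docs MATCH 'ranking' "
    "ORDER BY bm25(docs, 10.0, 1.0)").fetchall()
print(rows)

# Deletions are plain DELETEs on the virtual table; FTS5 takes care of
# removing the tokens (the class of bug mentioned above).
con.execute("DELETE FROM docs WHERE title = 'ranking'")
print(con.execute(
    "SELECT count(*) FROM docs WHERE docs MATCH 'ranking'").fetchone()[0])
```

Lower bm25() scores rank better, so the title match sorts first here.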
-
Carlos Garnacho authored
We no longer have a reason to deem this change incompatible. Since no actual text is stored in the FTS tables, and they merely update their tokenization info from the original class tables, we can just drop this info and reconstruct it at will. So there's no longer a need to consider tracker:fulltextIndexed changes incompatible.
-
- Feb 26, 2016
-
-
Resort to the basename in order to guarantee a title. https://bugzilla.gnome.org/show_bug.cgi?id=761466
-
ISO 8601 months and days start at 1. Passing 0 results in wrong parsing of the resulting date string. https://bugzilla.gnome.org/show_bug.cgi?id=761236
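As an illustration (using Python's datetime rather than the code touched by the fix), months and days are 1-based to match ISO 8601, and 0 is simply invalid:

```python
from datetime import datetime

# Months and days are 1-based, matching ISO 8601.
dt = datetime(2016, 2, 26)
print(dt.isoformat())  # 2016-02-26T00:00:00

# Passing 0 for the month does not mean "January"; it is rejected outright.
try:
    datetime(2016, 0, 26)
    ok = False
except ValueError:
    ok = True
print(ok)
```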
-
- Feb 25, 2016
-
-
Carlos Garnacho authored
Also check for another condition where it might happen (tracker_crawler_start() being called on a non-existing dir). This situation is not really critical, so the message has been lowered to a g_debug().
-
Carlos Garnacho authored
When the folder currently being inspected is cancelled, we may jump either to the next pending dir (if any) or to the next crawling root; both of these operations must be cancelled before doing so.
-
Carlos Garnacho authored
We might be passed a non-canonical directory to file_notifier_current_root_check_remove_directory(), so g_file_equal() will be safer.
-
Carlos Garnacho authored
The enumerators were just freed but not closed, leaking fds in the underlying implementations.
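The same free-without-close pitfall exists in other stacks. As a Python analogue (a sketch, not the GFileEnumerator code in question), an os.scandir() iterator keeps a directory fd open until it is explicitly closed:

```python
import os

# Explicitly closing releases the underlying directory fd right away;
# merely dropping the reference (like freeing without closing) leaves
# cleanup to the garbage collector, leaking fds in the meantime.
it = os.scandir(".")
entries = [e.name for e in it]
it.close()  # the analogue of closing the enumerator before freeing it
print(isinstance(entries, list))
```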
-
Carlos Garnacho authored
Always go through the async path, and manage the cancellable in both the IO scheduler and the jobs. If we tried to handle cancellation in the direct path, the cursor would be unref'ed (and try to grab the DB lock again) while the lock is held. There's no way cancellation can work while holding the lock, so this path has simply been removed.
-
- Feb 24, 2016
-
-
Carlos Garnacho authored
The way bulk operations break the ordering of the SPARQL updates may cause certain glitches. E.g. if a (bulk) delete operation happens on a folder while the SPARQL buffer already contains updates for files within that folder, the way we order operations results in the delete happening first and the inserts/updates happening afterwards. This may leave the database in an inconsistent state.

So make bulk operations single-file. This could maybe be smarter and try to compress consecutive similar operations, but there's not much added value in it; besides, the bulk operations we issue in the miner would never match that criterion.

Because we no longer break the ordering with bulk operations, things like the error map are no longer necessary, so we can remove some icky code here. BulkOperationMerge and its functions could be further simplified; that's left for a later cleanup.
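The ordering problem can be sketched with a toy model (not the actual SparqlBuffer code): a folder delete that jumps ahead of already-buffered child inserts leaves stale rows behind, while submission order stays consistent:

```python
def apply(ops):
    # Toy "database": the set of file paths present after the operations.
    db = set()
    for op, path in ops:
        if op == "insert":
            db.add(path)
        else:  # delete a folder and everything under it
            db = {p for p in db if p != path and not p.startswith(path + "/")}
    return db

buffered = [("insert", "/dir/a.txt"), ("insert", "/dir/b.txt"), ("delete", "/dir")]
print(apply(buffered))            # submission order: nothing left, consistent
print(apply(reversed(buffered)))  # bulk delete jumps first: stale children remain
```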
-
Carlos Garnacho authored
If a folder being deleted affects operations in the currently issued tasks (e.g. those we emitted ::process-file on) or in the writeback buffers, those operations would still attempt to proceed, with varying degrees of success.
-
Carlos Garnacho authored
Otherwise the crawler would still attempt to go through the processed folders, adding unnecessary processing and potentially leaving inconsistent state in the TrackerFileSystem if a similar file layout appeared in the future.
-
Carlos Garnacho authored
Resetting and reusing cancellables is not deemed safe. It is better to create new ones and let the old ones receive their last unref after the async callbacks finish.
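A sketch of the idea in Python terms (threading.Event standing in for GCancellable; the names are illustrative): give each operation a fresh cancellation flag rather than resetting a shared one, so late callbacks still observe the state of the operation they belong to:

```python
import threading

class Operation:
    """Illustrative stand-in for an async job holding a cancellable."""
    def __init__(self):
        # A fresh flag per operation: never reset and reuse a shared one,
        # or a late callback from a previous operation may see wrong state.
        self.cancelled = threading.Event()

    def cancel(self):
        self.cancelled.set()

op1 = Operation()
op1.cancel()       # old operation cancelled...
op2 = Operation()  # ...new operation gets its own, unset flag
print(op1.cancelled.is_set(), op2.cancelled.is_set())
```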
-
- Feb 23, 2016
-
-
Carlos Garnacho authored
We can't tell whether the extracted data is complete or valid, so give up on that data entirely.
-
Carlos Garnacho authored
It's useful to know the file it comes from.
-
- Feb 15, 2016
-
-
Carlos Garnacho authored
-
Carlos Garnacho authored
-
Carlos Garnacho authored
The bugs covered by the last patches might leave leftovers in the DB: tracker-miner-fs would update a directory that changed between runs, but wouldn't delete contents that were removed from the filesystem. Add this toggle envvar to point people to a solution. If you've seen errors like "UNIQUE constraint failed: nie:DataObject.nie:url" in journald, you're affected by these bugs and should run tracker-miner-fs once with TRACKER_MINER_FORCE_CHECK_UPDATED set in the environment. Running it just once is enough; enabling this permanently will incur startup performance penalties.
-
Carlos Garnacho authored
Now that we perform per-directory crawling and querying, we can just check whether the currently crawled dir was updated, and remove all filtering features from sparql_contents_query_start(). We were actually interpreting the filter incorrectly: directories were added to the updated_dirs array sooner than they were crawled, so the check for deleted contents was simply skipped. This simplification fixes that situation.
-
object_id is not initialized in this branch of the code; use the object URI instead.
-
Carlos Garnacho authored
When adding files from the store to the TrackerFileSystem, we must at least detect whether they are folders. Returning "unknown" here meant that TrackerMinerFS wouldn't enable the TRACKER_BULK_MATCH_CHILDREN mask on delete operations for directories, which could leave lingering child nfo:FileDataObjects. This situation could happen if a folder was deleted between runs of tracker-miner-fs; all the information we can get comes from the store, as the folder is no longer there at the time of crawling.
-
Carlos Garnacho authored
We can use STRSTARTS here, which works better than fn:starts-with.
-
The GQueue is passed as a filter to sparql_contents_compose_query(), but that function relies on the current file being outside the filter; otherwise it will produce an empty IN () filter.
-
Moreover, an infinite loop may occur if the process-file signal always fails.
-
Directories must get all children invalidated, because already queued tasks might contain new instances of those same files, in which case they would still find the previous URN.
-
This will be useful for delete operations.
-
Carlos Garnacho authored
tracker_crawler_start() may return FALSE if the directory doesn't exist or if its ::check-directory handlers return FALSE. In these circumstances we must still continue on to querying the directory as usual, so it is checked and eventually deleted by file_notifier_traverse_tree_foreach().
-
These leaks had a huge impact, as each TrackerTask held a reference to a GFile, which prevented them from being removed from the TrackerFileSystem when calling tracker_file_system_forget_files(). Due to this behavior, adding/removing/re-adding folders resulted in some folders/files not being indexed.
-
TrackerFileNotifier::file-deleted might have removed the last reference to the given file; make sure it lives a bit longer for the remaining operations to finish.
-
This allows the corresponding node to be retrieved from g_object_get_qdata() instead of traversing the whole tree, which could be a performance hit.
-
If the first element is filtered out, a comma would be appended anyway in the query filter.
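This is the classic hand-built-separator pitfall. A sketch (using a hypothetical helper, not the actual code) of composing the IN () filter by collecting elements first and joining, which sidesteps the leading comma:

```python
def compose_in_filter(ids, skip):
    # Building the list first and joining avoids the stray comma that
    # appears when the first element happens to be filtered out.
    kept = [str(i) for i in ids if i not in skip]
    return "IN (" + ", ".join(kept) + ")"

print(compose_in_filter([1, 2, 3], skip={1}))    # IN (2, 3)
print(compose_in_filter([1, 2, 3], skip=set()))  # IN (1, 2, 3)
```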
-
-
The blob length is defined to be in bytes.
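For illustration: a blob stores encoded bytes, and the byte count differs from the character count for non-ASCII text, so measuring the wrong one corrupts the length:

```python
s = "héllo"
print(len(s))                  # 5 characters
print(len(s.encode("utf-8")))  # 6 bytes -- what a blob length measures
```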
-
-
It was done at the end of the function.
-
All callers require both hashtables, so remove NULL handling for those. Fixes a possible memory leak if fts_properties is NULL.
-
- Feb 14, 2016
-
-
- Feb 06, 2016
-
-
Rūdolfs Mazurs authored
-