misc: Avoid string copies during filtering (bgo#768300)
When we switched over to doing better regex filtering and highlighting of ignored regions, we changed the way we were applying filters from a simple multiple-regex approach to a merged-span-based approach. This is fine, except that it also changed the way we sliced the existing text to produce the filtered version.

Prior to this commit, we removed each matching span by concatenating the two string slices on either side of it. This is extremely slow in Python when there are many matches, because every concatenation allocates and copies a new string, among other things. With this patch, we use a more idiomatic approach of grabbing all of the text sections that we care about and concatenating them in a single join operation at the end.

The test case in bgo#768300 was previously extremely slow (I gave up waiting), but with this change takes a few seconds.

This commit also switches up the role of the "cutter" function, which now only applies changes rather than being expected to modify the text. Text modification is carried out by apply_text_filters itself, since it can do so much more efficiently.
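For illustration only, the difference between the two slicing strategies can be sketched as follows. This is not the code from this commit: merge_spans, filter_by_slicing and filter_by_join are hypothetical names, and the sketch assumes the filters have already been reduced to a list of (start, end) offsets.

```python
def merge_spans(spans):
    """Merge overlapping or touching (start, end) spans into a sorted list."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def filter_by_slicing(text, spans):
    """Old style (hypothetical): cut one span at a time.

    Each iteration builds a brand new string from two slices, so the
    remaining text is copied once per span.
    """
    # Work backwards so that earlier offsets stay valid.
    for start, end in reversed(merge_spans(spans)):
        text = text[:start] + text[end:]
    return text


def filter_by_join(text, spans):
    """New style (hypothetical): collect the kept pieces, join once."""
    kept = []
    pos = 0
    for start, end in merge_spans(spans):
        kept.append(text[pos:start])  # text between the previous span and this one
        pos = end
    kept.append(text[pos:])  # trailing text after the last span
    return "".join(kept)
```

Both functions return the same filtered text; the second copies each kept character once in the final join instead of re-copying the remaining text for every removed span.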
parent 760f63ac