Faster pattern matching under edit distance: a reduction to dynamic puzzle matching and the Seaweed Monoid of permutation matrices
Charalampopoulos, Panagiotis and Kociumaka, T. and Wellnitz, P. (2022) Faster pattern matching under edit distance: a reduction to dynamic puzzle matching and the Seaweed Monoid of permutation matrices. 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, pp. 698-707. ISSN 2575-8454.

Text: focs_biron.pdf (Author's Accepted Manuscript, 604kB)
Abstract
We consider the approximate pattern matching problem under the edit distance. Given a text T of length n, a pattern P of length m, and a threshold k, the task is to find the starting positions of all substrings of T that can be transformed to P with at most k edits. More than 20 years ago, Cole and Hariharan [SODA’98, J. Comput.’02] gave an O(n + k^4 · n/m)-time algorithm for this classic problem, and this runtime has not been improved since. Here, we present an algorithm that runs in time O(n + k^{3.5} √(log m log k) · n/m), thus breaking through this longstanding barrier. In the case where n^{1/4+ε} ≤ k ≤ n^{2/5−ε} for some arbitrarily small positive constant ε, our algorithm improves over the state of the art by polynomial factors: it is polynomially faster than both the algorithm of Cole and Hariharan and the classic O(kn)-time algorithm of Landau and Vishkin [STOC’86, J. Algorithms’89]. We observe that the bottleneck case of the alternative O(n + k^4 · n/m)-time algorithm of Charalampopoulos, Kociumaka, and Wellnitz [FOCS’20] is when the text and the pattern are (almost) periodic. Our new algorithm reduces this case to a new Dynamic Puzzle Matching problem, which we solve by building on tools developed by Tiskin [SODA’10, Algorithmica’15] for the so-called seaweed monoid of permutation matrices. Our algorithm relies only on a small set of primitive operations on strings and thus also applies to the fully compressed setting (where text and pattern are given as straight-line programs) and to the dynamic setting (where we maintain a collection of strings under creation, splitting, and concatenation), improving over the state of the art.
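To make the problem statement concrete, the following is a minimal sketch of the textbook quadratic-time dynamic program for approximate pattern matching under edit distance. It is not the algorithm of the paper and does not come close to the running times discussed above; it only illustrates what "starting positions of all substrings of T within k edits of P" means. The function name `starts_within_k_edits` and the choice of 0-based positions ranging over 0..n (with the empty substring allowed) are assumptions made for this illustration.

```python
def starts_within_k_edits(text: str, pattern: str, k: int) -> list[int]:
    """Return every 0-based position i such that some substring text[i:j]
    has edit distance at most k to pattern (illustrative O(n*m) baseline)."""
    # The standard DP naturally reports END positions of matching substrings.
    # Reversing both strings turns those into START positions in the original.
    t, p = text[::-1], pattern[::-1]
    n, m = len(t), len(p)

    # prev[j] = min edit distance between p[:j] and a substring of t
    # ending at the current text position. Before reading any text,
    # the only substring is empty, so the cost is j insertions.
    prev = list(range(m + 1))
    starts = []
    if prev[m] <= k:
        starts.append(n)  # empty substring at the very end of the text

    for i in range(1, n + 1):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may begin anywhere
        for j in range(1, m + 1):
            cur[j] = min(
                prev[j - 1] + (t[i - 1] != p[j - 1]),  # match or substitute
                prev[j] + 1,                            # skip a text character
                cur[j - 1] + 1,                         # skip a pattern character
            )
        if cur[m] <= k:
            starts.append(n - i)  # end i in reversed text = start n - i
        prev = cur

    return sorted(starts)


if __name__ == "__main__":
    print(starts_within_k_edits("abcdefg", "cde", 0))
    print(starts_within_k_edits("abcdefg", "cde", 1))
```

This baseline takes O(nm) time regardless of k, which is why algorithms whose cost depends mainly on k, such as those cited in the abstract, matter when k is small relative to m.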
Metadata
Item Type:  Article 

Additional Information:  Date of Conference: 31 October 2022 - 03 November 2022. ISBN: 9781665455190 
School:  Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences 
Depositing User:  Panagiotis Charalampopoulos 
Date Deposited:  06 Jan 2023 05:48 
Last Modified:  09 Aug 2023 12:54 
URI:  https://eprints.bbk.ac.uk/id/eprint/50362 