BIROn - Birkbeck Institutional Research Online

    Faster pattern matching under edit distance: a reduction to dynamic puzzle matching and the Seaweed Monoid of permutation matrices

    Charalampopoulos, Panagiotis and Kociumaka, T. and Wellnitz, P. (2022) Faster pattern matching under edit distance: a reduction to dynamic puzzle matching and the Seaweed Monoid of permutation matrices. 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, pp. 698-707. ISSN 2575-8454.

    focs_biron.pdf - Author's Accepted Manuscript

    Abstract

    We consider the approximate pattern matching problem under the edit distance. Given a text T of length n, a pattern P of length m, and a threshold k, the task is to find the starting positions of all substrings of T that can be transformed to P with at most k edits. More than 20 years ago, Cole and Hariharan [SODA’98, SIAM J. Comput.’02] gave an O(n + k^4 · n/m)-time algorithm for this classic problem, and this runtime has not been improved since. Here, we present an algorithm that runs in time O(n + k^{3.5} √(log m log k) · n/m), thus breaking through this long-standing barrier. In the case where n^{1/4+ε} ≤ k ≤ n^{2/5−ε} for some arbitrarily small positive constant ε, our algorithm improves over the state of the art by polynomial factors: it is polynomially faster than both the algorithm of Cole and Hariharan and the classic O(kn)-time algorithm of Landau and Vishkin [STOC’86, J. Algorithms’89]. We observe that the bottleneck case of the alternative O(n + k^4 · n/m)-time algorithm of Charalampopoulos, Kociumaka, and Wellnitz [FOCS’20] is when the text and the pattern are (almost) periodic. Our new algorithm reduces this case to a new Dynamic Puzzle Matching problem, which we solve by building on tools developed by Tiskin [SODA’10, Algorithmica’15] for the so-called seaweed monoid of permutation matrices. Our algorithm relies only on a small set of primitive operations on strings and thus also applies to the fully-compressed setting (where text and pattern are given as straight-line programs) and to the dynamic setting (where we maintain a collection of strings under creation, splitting, and concatenation), improving over the state of the art.
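
To make the problem statement concrete, the following is a minimal reference sketch of the task itself, not of the paper's algorithm. It uses the textbook O(nm)-time dynamic program (often attributed to Sellers) on the reversed strings, so that matching end positions in the reversed text translate into starting positions in the original text. The function name `approx_match_starts` and the 0-based indexing are assumptions made for illustration only.

```python
def approx_match_starts(text: str, pattern: str, k: int) -> list[int]:
    """Return the sorted 0-based positions i such that some substring
    text[i:j] is within edit distance k of pattern.

    Illustrative quadratic-time baseline; this is NOT the algorithm
    from the paper.
    """
    # Reverse both strings: a match ending at position j of the reversed
    # text corresponds to a match starting at position n - j of the text.
    t, p = text[::-1], pattern[::-1]
    n, m = len(t), len(p)

    # prev[j] = min edit distance between p[:i] and any suffix of t[:j].
    # For i = 0 the empty pattern prefix matches the empty suffix at cost 0.
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        cur = [i] + [0] * n  # p[:i] against the empty text prefix: i deletions
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j - 1] + (p[i - 1] != t[j - 1]),  # match or substitution
                prev[j] + 1,                           # drop p[i-1]
                cur[j - 1] + 1,                        # drop t[j-1]
            )
        prev = cur

    return sorted(n - j for j in range(n + 1) if prev[j] <= k)


print(approx_match_starts("abcdef", "cde", 0))  # → [2]
```

With k = 0 this degenerates to exact matching. For larger k the sketch reports every starting position that admits at least one qualifying substring, which is the output format described in the abstract.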

    Metadata

    Item Type: Article
    Additional Information: Date of Conference: 31 October 2022 - 03 November 2022. ISBN: 9781665455190
    School: Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
    Depositing User: Panagiotis Charalampopoulos
    Date Deposited: 06 Jan 2023 05:48
    Last Modified: 09 Aug 2023 12:54
    URI: https://eprints.bbk.ac.uk/id/eprint/50362
