BIROn - Birkbeck Institutional Research Online

    On the average-case complexity of pattern matching with wildcards

    Barton, Carl (2022) On the average-case complexity of pattern matching with wildcards. Theoretical Computer Science, ISSN 0304-3975. (In Press)

    48011.pdf - Published Version of Record
    Available under License Creative Commons Attribution.



    Pattern matching with wildcards is a string matching problem whose goal is to find all factors of a text $t$ of length $n$ that match a pattern $x$ of length $m$, where wildcards (characters that match every character) may be present. In this paper we present a number of complexity results and fast average-case algorithms for pattern matching where wildcards are allowed in the pattern; the results are easily adapted to the case where wildcards are also allowed in the text. We analyse the \textit{average-case} complexity of these algorithms and derive non-trivial time bounds. These are the first results on the average-case complexity of pattern matching with wildcards that provide a provable separation in time complexity between exact pattern matching and pattern matching with wildcards. We introduce the \textit{wc-period} of a string $x$, which is the period of the binary mask $x_b$, where $x_b[i]=a$ \textit{iff} $x[i]\neq \phi$ and $x_b[i]=b$ otherwise; here $\phi$ denotes the wildcard. We denote the length of the wc-period of a string $x$ by $\textsc{wcp}(x)$. We show the following results for a constant $0 < \epsilon < 1$ and a pattern $x$ of length $m$ with $g$ wildcards, where $\textsc{wcp}(x)=p$ and the prefix of $x$ of length $p$ contains $g_p$ wildcards:
    \begin{itemize}
    \item If $\displaystyle\lim_{m \rightarrow \infty} \frac{g_p}{p}=0$, there is an optimal algorithm running in $\mathcal{O}(\frac{n \log_\sigma m}{m})$-time on average.
    \item If $\displaystyle\lim_{m \rightarrow \infty} \frac{g_p}{p}=1-\epsilon$, there is an algorithm running in $\mathcal{O}(\frac{n \log_\sigma m\log_2 p}{m})$-time on average.
    \item If $\displaystyle\lim_{m \rightarrow \infty} \frac{g}{m} = \displaystyle\lim_{m \rightarrow \infty} 1-f(m)=1$, any algorithm takes at least $\Omega(\frac{n \log_\sigma m}{f(m)})$-time on average.
    \end{itemize}
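The definitions in the abstract can be illustrated with a small Python sketch. It is not the paper's algorithm: `find_matches` is a naive O(nm) baseline for the matching problem, and `wcp` computes the period of the binary mask using the standard border (prefix-function) array. Several details are assumptions made here for illustration: the wildcard $\phi$ is written as `?`, "period" is taken to mean the smallest period, and all function names are hypothetical.

```python
def find_matches(t, x, wildcard="?"):
    """Naive baseline: start positions of all factors of t that match x.

    A wildcard in the pattern matches any text character.
    """
    n, m = len(t), len(x)
    return [
        i
        for i in range(n - m + 1)
        if all(x[j] == wildcard or x[j] == t[i + j] for j in range(m))
    ]


def binary_mask(x, wildcard="?"):
    """Mask x_b: 'a' where x has a regular character, 'b' where it has a wildcard."""
    return "".join("b" if c == wildcard else "a" for c in x)


def smallest_period(s):
    """Smallest p such that s[i] == s[i + p] for all valid i (0 for the empty string)."""
    n = len(s)
    if n == 0:
        return 0
    border = [0] * n  # border[i] = length of the longest proper border of s[:i + 1]
    k = 0
    for i in range(1, n):
        while k > 0 and s[i] != s[k]:
            k = border[k - 1]
        if s[i] == s[k]:
            k += 1
        border[i] = k
    return n - border[-1]


def wcp(x, wildcard="?"):
    """Length of the wc-period of x, assuming it is the smallest period of the mask."""
    return smallest_period(binary_mask(x, wildcard))


def wildcards_in_period_prefix(x, wildcard="?"):
    """g_p: number of wildcards in the prefix of x of length p = wcp(x)."""
    return x[: wcp(x, wildcard)].count(wildcard)
```

For the pattern `ab?cd?ef?` the mask is `aabaabaab`, so under these assumptions the wc-period has length 3 and its prefix contains one wildcard, giving $g_p/p = 1/3$.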


    Item Type: Article
    Keyword(s) / Subject(s): Average case complexity, Pattern matching with wildcards, Stringology, Pattern matching, Pattern matching with don't care symbols
    School: School of Business, Economics & Informatics > Computer Science and Information Systems
    Depositing User: Carl Barton
    Date Deposited: 18 May 2022 10:38
    Last Modified: 18 May 2022 12:18

