hide
Free keywords:
Computer Science, Data Structures and Algorithms, cs.DS,Computer Science, Computational Complexity, cs.CC
Abstract:
We study regular expression membership testing: Given a regular expression of
size $m$ and a string of size $n$, decide whether the string is in the language
described by the regular expression. Its classic $O(nm)$ algorithm is one of
the big success stories of the 70s, which allowed pattern matching to develop
into the standard tool that it is today.
Many special cases of pattern matching have been studied that can be solved
faster than in quadratic time. However, a systematic study of tractable cases
was made possible only recently, with the first conditional lower bounds
reported by Backurs and Indyk [FOCS'16]. Restricted to any "type" of
homogeneous regular expressions of depth 2 or 3, they either presented a
near-linear time algorithm or a quadratic conditional lower bound, with one
exception known as the Word Break problem.
In this paper we complete their work as follows:
1) We present two almost-linear time algorithms that generalize all known
almost-linear time algorithms for special cases of regular expression
membership testing.
2) We classify all types, except for the Word Break problem, into
almost-linear time or quadratic time assuming the Strong Exponential Time
Hypothesis. This extends the classification from depth 2 and 3 to any constant
depth.
3) For the Word Break problem we give an improved $\tilde{O}(n m^{1/3} + m)$
algorithm. Surprisingly, we also prove a matching conditional lower bound for
combinatorial algorithms. This establishes Word Break as the only intermediate
problem.
In total, we prove matching upper and lower bounds for any type of
bounded-depth homogeneous regular expressions, which yields a full dichotomy
for regular expression membership testing.