de.mpg.escidoc.pubman.appbase.FacesBean
English
 
Help Guide Disclaimer Contact us Login
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Journal Article

Better Filtering with Gapped q-Grams

MPS-Authors
http://pubman.mpdl.mpg.de/cone/persons/resource/persons44209

Burkhardt,  Stefan
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

http://pubman.mpdl.mpg.de/cone/persons/resource/persons44717

Kärkkäinen,  Juha
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

Locator
There are no locators available
Fulltext (public)
There are no public fulltexts available
Supplementary Material (public)
There is no public supplementary material available
Citation

Burkhardt, S., & Kärkkäinen, J. (2003). Better Filtering with Gapped q-Grams. Fundamenta Informaticae, 56, 51-70.


Cite as: http://hdl.handle.net/11858/00-001M-0000-000F-2C61-5
Abstract
A popular and well-studied class of filters for approximate string matching compares substrings of length $q$, \emph{the $q$-grams}, in the pattern and the text to identify text areas that contain potential matches. A generalization of the method that uses {\em gapped} $q$-grams instead of contiguous substrings is mentioned a few times in literature but has never been analyzed in any depth. In this paper, we report the first results of a study on gapped $q$-grams. We show that gapped $q$-grams can provide orders of magnitude faster and/or more efficient filtering than contiguous $q$-grams. To achieve these results the arrangement of the gaps in the $q$-gram and a filter parameter called \emph{threshold} have to be optimized. Both of these tasks are nontrivial combinatorial optimization problems for which we present efficient solutions. We concentrate on the $k$~mismatches problem, i.e, approximate string matching with the Hamming distance.