de.mpg.escidoc.pubman.appbase.FacesBean
English
 
Help Guide Privacy Policy Disclaimer Contact us
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Conference Paper

Computing the threshold for q-gram filters

MPS-Authors
http://pubman.mpdl.mpg.de/cone/persons/resource/persons44717

Kärkkäinen,  Juha
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

Locator
There are no locators available
Fulltext (public)
There are no public fulltexts available
Supplementary Material (public)
There is no public supplementary material available
Citation

Kärkkäinen, J. (2002). Computing the threshold for q-gram filters. In Algorithm theory, SWAT 2002: 8th Scandinavian Workshop on Algorithm Theory (pp. 348-357). Berlin, Germany: Springer.


Cite as: http://hdl.handle.net/11858/00-001M-0000-000F-2F48-7
Abstract
A popular and much studied class of filters for approximate string matching is based on finding common $q$-grams, substrings of length $q$, between the pattern and the text. A variation of the basic idea uses \emph{gapped} $q$-grams and has been recently shown to provide significant improvements in practice. A major difficulty with gapped $q$-gram filters is the computation of the so-called \emph{threshold} which defines the filter criterium. We describe the first general method for computing the threshold for $q$-gram filters. The method is based on a carefully chosen precise statement of the problem which is then transformed into a constrained shortest path problem. In its generic form the method leaves certain parts open but is applicable to a large variety of $q$-gram filters and may be extensible even to other classes of filters. We also give a full algorithm for a specific subclass. For this subclass, the algorithm has been implemented and used succesfully in an experimental comparison.