Deutsch
 
Hilfe Datenschutzhinweis Impressum
  DetailsucheBrowse

Datensatz

DATENSATZ AKTIONENEXPORT

Freigegeben

Zeitschriftenartikel

Better Filtering with Gapped q-Grams

MPG-Autoren
/persons/resource/persons44209

Burkhardt,  Stefan
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

/persons/resource/persons44717

Kärkkäinen,  Juha
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

Externe Ressourcen
Es sind keine externen Ressourcen hinterlegt
Volltexte (beschränkter Zugriff)
Für Ihren IP-Bereich sind aktuell keine Volltexte freigegeben.
Volltexte (frei zugänglich)
Es sind keine frei zugänglichen Volltexte in PuRe verfügbar
Ergänzendes Material (frei zugänglich)
Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar
Zitation

Burkhardt, S., & Kärkkäinen, J. (2003). Better Filtering with Gapped q-Grams. Fundamenta Informaticae, 56, 51-70.


Zitierlink: https://hdl.handle.net/11858/00-001M-0000-000F-2C61-5
Zusammenfassung
A popular and well-studied class of filters for approximate string matching compares substrings of length $q$, \emph{the $q$-grams}, in the pattern and the text to identify text areas that contain potential matches. A generalization of the method that uses {\em gapped} $q$-grams instead of contiguous substrings is mentioned a few times in literature but has never been analyzed in any depth. In this paper, we report the first results of a study on gapped $q$-grams. We show that gapped $q$-grams can provide orders of magnitude faster and/or more efficient filtering than contiguous $q$-grams. To achieve these results the arrangement of the gaps in the $q$-gram and a filter parameter called \emph{threshold} have to be optimized. Both of these tasks are nontrivial combinatorial optimization problems for which we present efficient solutions. We concentrate on the $k$~mismatches problem, i.e, approximate string matching with the Hamming distance.