Better Filtering with Gapped q-Grams

Burkhardt, S., & Kärkkäinen, J. (2003). Better Filtering with Gapped q-Grams. Fundamenta Informaticae, 56, 51-70.


Creators

Creators:
Burkhardt, Stefan (1), Author
Kärkkäinen, Juha (1), Author
Affiliations:
(1) Algorithms and Complexity, MPI for Informatics, Max Planck Society, ou_24019

Content

Free keywords: -
Abstract: A popular and well-studied class of filters for approximate string matching compares substrings of length $q$, \emph{the $q$-grams}, in the pattern and the text to identify text areas that contain potential matches. A generalization of the method that uses \emph{gapped} $q$-grams instead of contiguous substrings is mentioned a few times in the literature but has never been analyzed in any depth. In this paper, we report the first results of a study on gapped $q$-grams. We show that gapped $q$-grams can provide orders of magnitude faster and/or more efficient filtering than contiguous $q$-grams. To achieve these results, the arrangement of the gaps in the $q$-gram and a filter parameter called the \emph{threshold} have to be optimized. Both of these tasks are nontrivial combinatorial optimization problems for which we present efficient solutions. We concentrate on the $k$~mismatches problem, i.e., approximate string matching with the Hamming distance.
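The abstract's two ingredients, gapped $q$-grams read through a shape and a threshold on how many of them must survive $k$ mismatches, can be illustrated with a small sketch. This is a simplified, position-aligned version of a $q$-gram filter for the Hamming distance, written under stated assumptions: shapes are encoded as sorted offset tuples starting at 0, and `shape_qgrams`, `count_matching`, `threshold`, and `may_match` are hypothetical helper names. `threshold` computes the value by brute force over all mismatch placements, which only works for small inputs; it is not the efficient optimization method the paper presents.

```python
from itertools import combinations


def shape_qgrams(text, shape):
    """Gapped q-grams of `text` read through `shape`, one per start position.

    Assumed encoding: `shape` is a sorted tuple of offsets beginning at 0.
    (0, 1, 2) is a contiguous 3-gram; (0, 2) reads x and x + 2 and skips x + 1.
    """
    span = shape[-1] + 1
    return ["".join(text[x + i] for i in shape)
            for x in range(len(text) - span + 1)]


def count_matching(pattern, window, shape):
    """Start positions x where pattern and window agree on every x + i, i in shape.

    Comparing q-grams at the same x is enough for the Hamming distance,
    because there are no insertions or deletions to shift positions.
    """
    assert len(pattern) == len(window)
    return sum(a == b for a, b in zip(shape_qgrams(pattern, shape),
                                      shape_qgrams(window, shape)))


def threshold(shape, m, k):
    """Smallest count_matching value possible for length m and <= k mismatches.

    Brute force: tries every set of exactly k mismatch positions. Exactly k
    suffices because extra mismatches can only lower the count.
    """
    span = shape[-1] + 1
    starts = range(max(0, m - span + 1))
    k = min(k, m)
    best = len(starts)
    for mismatches in combinations(range(m), k):
        bad = set(mismatches)
        untouched = sum(1 for x in starts
                        if all(x + i not in bad for i in shape))
        best = min(best, untouched)
    return best


def may_match(pattern, window, shape, k):
    """Filter test: False means the window certainly has more than k mismatches;
    True means it is a candidate that still needs exact verification."""
    return count_matching(pattern, window, shape) >= threshold(shape, len(pattern), k)


if __name__ == "__main__":
    print(threshold((0, 1, 2), 10, 2))  # 2, equal to m - q + 1 - k*q here
    print(threshold((0, 2), 10, 2))     # 4
    print(may_match("ACGTACGTAC", "ACGAACGTTC", (0, 1, 2), 2))  # True
```

For the contiguous shape (0, 1, 2) with m = 10 and k = 2, the brute-force threshold equals the classical $q$-gram lemma bound $m - q + 1 - kq = 2$, since each mismatch can destroy at most $q$ of the $q$-grams. For a gapped shape the count depends on where the gaps sit, which is why the shape and the threshold are optimized together.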

Details

Language(s): eng - English
Dates: 2004-06-15; 2003
Publication Status: Issued
Pages: -
Publishing info: -
Table of Contents: -
Rev. Type: Peer
Identifiers: eDoc: 201801
Other: Local-ID: C1256428004B93B8-97486D0D727E678AC1256D0900521D74-BurkhardtKarkkainen2003
Degree: -

Source 1

Title: Fundamenta Informaticae
Source Genre: Journal
Creator(s):
Affiliations:
Publ. Info: -
Pages: -
Volume / Issue: 56
Sequence Number: -
Start / End Page: 51 - 70
Identifier: ISSN: 0169-2968