English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT
 
 
DownloadE-Mail
  Better Filtering with Gapped q-Grams

Burkhardt, S., & Kärkkäinen, J. (2001). Better Filtering with Gapped q-Grams. In A. Amir, & G. Landau (Eds.), Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching (pp. 73-85). Berlin, Germany: Springer.

Item is

Files

show Files

Locators

show

Creators

show
hide
 Creators:
Burkhardt, Stefan1, Author           
Kärkkäinen, Juha1, Author           
Affiliations:
1Algorithms and Complexity, MPI for Informatics, Max Planck Society, ou_24019              

Content

show
hide
Free keywords: -
 Abstract: The q-gram filter is a popular filtering method for approximate string matching. It compares substrings of length q (the q-grams) in the pattern and the text to identify the text areas that might contain a match. A generalization of the method is to use gapped q-grams, subsets of q characters in some fixed non-contiguous shape, instead of contiguous substrings. Although mentioned a few times in the literature, this generalization has never been studied in any depth. In ths paper, we report the first results from a study on gapped q-grams. We show that gapped q-grams can provide orders of magnitude faster and/or more efficient filtering than contiguous q-grams. The performance, however, depends on the shape of the q-grams. The best shaoes are rare and often posess no apparen regularity. We show how to recognize good shapes and demonstrate with experiments their advantage over both contiguous and average shapes. We concentrate here on the k mismatches problem, but also outline an approach for extending the results to the more common k differences problem.

Details

show
hide
Language(s): eng - English
 Dates: 2010-03-022001
 Publication Status: Issued
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: eDoc: 518175
URI: http://www.mpi-sb.mpg.de/~stburk/gapped-q.ps
Other: Local-ID: C1256428004B93B8-5165C4C93C02B85DC1256A8F00420FEE-Burkhardt2001
 Degree: -

Event

show
hide
Title: Untitled Event
Place of Event: Jerusalem, Israel
Start-/End Date: 2001

Legal Case

show

Project information

show

Source 1

show
hide
Title: Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Source Genre: Proceedings
 Creator(s):
Amir, Amihood, Editor
Landau, Gadi, Editor
Affiliations:
-
Publ. Info: Berlin, Germany : Springer
Pages: - Volume / Issue: - Sequence Number: - Start / End Page: 73 - 85 Identifier: ISBN: 3-540-42271-4

Source 2

show
hide
Title: Lecture Notes in Computer Science
Source Genre: Series
 Creator(s):
Affiliations:
Publ. Info: -
Pages: - Volume / Issue: 2089 Sequence Number: - Start / End Page: - Identifier: -