  Fast Mapping of Short Sequences with Mismatches, Insertions and Deletions Using Index Structures

Hoffmann, S., Otto, C., Kurtz, S., Sharma, C. M., Khaitovich, P., Vogel, J., et al. (2009). Fast Mapping of Short Sequences with Mismatches, Insertions and Deletions Using Index Structures. PLoS Computational Biology, 5(9): e1000502.

Files

PLoS_Comput_Biol_2009_5_e1000502.pdf (Publisher version), 704KB
Name: PLoS_Comput_Biol_2009_5_e1000502.pdf
Description: -
OA-Status:
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: © 2009 Hoffmann et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
License: -

Creators

 Creators:
Hoffmann, Steve, Author
Otto, Christian, Author
Kurtz, Stefan, Author
Sharma, Cynthia Mira [1], Author
Khaitovich, Philipp, Author
Vogel, Jörg [1], Author
Stadler, Peter F. [2], Author
Hackermüller, Jörg, Author
Affiliations:
[1] Max-Planck Research Group RNA Biology, Max Planck Institute for Infection Biology, Max Planck Society, ou_1664150
[2] Max Planck Society, ou_persistent13

Content

Free keywords: -
 Abstract: With few exceptions, current methods for short read mapping make use of simple seed heuristics to speed up the search. Most of the underlying matching models neglect the necessity to allow not only mismatches, but also insertions and deletions. Current evaluations indicate, however, that very different error models apply to the novel high-throughput sequencing methods. While the most frequent error-type in Illumina reads are mismatches, reads produced by 454's GS FLX predominantly contain insertions and deletions (indels). Even though 454 sequencers are able to produce longer reads, the method is frequently applied to small RNA (miRNA and siRNA) sequencing. Fast and accurate matching in particular of short reads with diverse errors is therefore a pressing practical problem. We introduce a matching model for short reads that can, besides mismatches, also cope with indels. It addresses different error models. For example, it can handle the problem of leading and trailing contaminations caused by primers and poly-A tails in transcriptomics or the length-dependent increase of error rates. In these contexts, it thus simplifies the tedious and error-prone trimming step. For efficient searches, our method utilizes index structures in the form of enhanced suffix arrays. In a comparison with current methods for short read mapping, the presented approach shows significantly increased performance not only for 454 reads, but also for Illumina reads. Our approach is implemented in the software segemehl available at http://www.bioinf.uni-leipzig.de/Software/segemehl/.
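
The abstract names two ingredients: an index over the reference and a matching model that tolerates mismatches, insertions and deletions. The following Python sketch illustrates both ideas in their simplest form. It is not segemehl's algorithm: it uses a naively built plain suffix array rather than an enhanced suffix array, and a textbook semi-global edit-distance recurrence rather than the paper's matching model. The function names (`build_suffix_array`, `find_exact`, `best_semiglobal_hit`) are hypothetical and chosen only for this illustration.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order.

    Naive construction by sorting suffix strings; suitable only for
    small illustrative inputs, not for genome-scale references.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, sa, pattern):
    """Return sorted start positions of exact occurrences of pattern.

    Two binary searches locate the block of suffixes whose first
    len(pattern) characters equal the pattern.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


def best_semiglobal_hit(read, window):
    """Return the minimum edit distance of read to any substring of window.

    Mismatches, insertions and deletions each cost 1. The alignment may
    start and end anywhere in window at no cost (semi-global alignment).
    """
    prev = [0] * (len(window) + 1)  # row 0: free start anywhere in window
    for i, r in enumerate(read, 1):
        cur = [i]  # aligning i read characters against an empty substring
        for j, w in enumerate(window, 1):
            cur.append(min(
                prev[j - 1] + (r != w),  # match or mismatch
                prev[j] + 1,             # read character absent from window
                cur[j - 1] + 1,          # window character absent from read
            ))
        prev = cur
    return min(prev)  # free end anywhere in window


reference = "ACGTACGTTAGC"
sa = build_suffix_array(reference)
exact_hits = find_exact(reference, sa, "ACG")
indel_cost = best_semiglobal_hit("ACGGT", "TTACGTTT")
```

In this toy setting, an exact lookup in the index supplies candidate positions, and the edit-distance routine then scores a read against a window of the reference while allowing indels. How segemehl actually combines index search and alignment is described in the article itself.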

Details

Language(s): eng - English
 Dates: 2009-09
 Publication Status: Issued
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: Peer
 Identifiers: eDoc: 442297
ISI: 000270800100020
 Degree: -

Source 1

Title: PLoS Computational Biology
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: -
Pages: -
Volume / Issue: 5 (9)
Sequence Number: e1000502
Start / End Page: -
Identifier: ISSN: 1553-734X