  Estimating mutation distances from unaligned genomes

Haubold, B., Pfaffelhuber, P., Domazet-Lošo, M., & Wiehe, T. (2009). Estimating mutation distances from unaligned genomes. Journal of Computational Biology, 16(10), 1487-1500. doi:10.1089/cmb.2009.0106.

Files

Haubold_2009.pdf (Publisher version), 2MB
Name: Haubold_2009.pdf
Visibility: Restricted (Max Planck Institute for Evolutionary Biology, MPLM)
MIME-Type: application/pdf

Creators

Creators:
Haubold, Bernhard (1), Author
Pfaffelhuber, Peter, Author
Domazet-Lošo, Mirjana (1), Author
Wiehe, Thomas, Author
Affiliations:
(1) Research Group Bioinformatics, Department Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Max Planck Society

Content

Free keywords: alignment-free distance; number of substitutions; genome comparison; suffix tree; shortest unique substring
Abstract: Alignment-free distance measures are generally less accurate but more efficient than traditional alignment-based metrics. In the context of genome sequence analysis, the efficiency gain is often so substantial that it outweighs the loss in accuracy. However, a further disadvantage of alignment-free distances is that their relationship to evolutionary events such as substitutions is generally unknown. We have therefore derived an estimator of the number of substitutions per site between two unaligned DNA sequences, K_r. Simulations show that this estimator works well with "ideal" data. We compare K_r to two alternative alignment-free distances: a k-tuple distance and a measure of relative entropy based on average common substring length. All three measures are applied to 27 primate mitochondrial genomes, eight whole genomes of Streptococcus agalactiae strains, and 12 whole genomes of Drosophila species. In each case, the cluster diagrams based on K_r are equivalent to or significantly better than those based on the two alternative measures. This is because, in contrast to the alternative measures, K_r is derived from an explicit model of evolution. The computation of K_r is efficiently implemented in the program kr, which can be downloaded freely from the internet.
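For readers unfamiliar with the k-tuple distances mentioned in the abstract, the following is a minimal Python sketch of one common form: the Euclidean distance between the k-tuple frequency vectors of two sequences. It is an illustration only. The function names `ktuple_counts` and `ktuple_distance`, the default `k=3`, and the choice of Euclidean distance are assumptions made here; the abstract does not specify which k-tuple variant the authors used, and this is not the K_r estimator or the kr program.

```python
from collections import Counter
from math import sqrt


def ktuple_counts(seq, k):
    """Count every overlapping substring of length k (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ktuple_distance(a, b, k=3):
    """Euclidean distance between the k-tuple frequency vectors of a and b.

    Assumed illustrative definition, not necessarily the one used in the paper.
    """
    ca, cb = ktuple_counts(a, k), ktuple_counts(b, k)
    na, nb = sum(ca.values()), sum(cb.values())
    if na == 0 or nb == 0:
        raise ValueError("both sequences must contain at least k characters")
    # A Counter returns 0 for k-tuples that are absent from one sequence.
    words = set(ca) | set(cb)
    return sqrt(sum((ca[w] / na - cb[w] / nb) ** 2 for w in words))
```

No alignment is needed: each sequence is reduced to a table of k-tuple frequencies, so the cost grows linearly with sequence length. As the abstract notes, a distance of this kind has no direct interpretation as substitutions per site.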

Details

Language(s): eng - English
Dates: 2009
Publication Status: Issued
Identifiers: eDoc: 438828
DOI: 10.1089/cmb.2009.0106
Other: 2725/S 39045

Source 1

Title: Journal of Computational Biology
Source Genre: Journal
Volume / Issue: 16 (10)
Start / End Page: 1487 - 1500
Identifier: ISSN: 1066-5277 (print)
ISSN: 1557-8666 (online)