Alignment-free estimation of nucleotide diversity

Haubold, Bernhard; Reed, Floyd A.; Pfaffelhuber, Peter

doi:10.1093/bioinformatics/btq689

Journal Article

MPS-Authors

Haubold, Bernhard
Research Group Bioinformatics, Department Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Max Planck Society;

Reed, Floyd A.
Research Group Population Genetics, Department Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Max Planck Society;

Citation

Haubold, B., Reed, F. A., & Pfaffelhuber, P. (2011). Alignment-free estimation of nucleotide diversity. Bioinformatics, 27(4), 449-455. doi:10.1093/bioinformatics/btq689.

Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-D406-1

Abstract

Motivation: Sequencing capacity is currently growing more rapidly
than CPU speed, leading to an analysis bottleneck in many genome
projects. Alignment-free sequence analysis methods tend to be
more efficient than their alignment-based counterparts. They may,
therefore, be important in the long run for keeping sequence analysis
abreast of sequencing.

Results: We derive and implement an alignment-free estimator of
the number of pairwise mismatches, πm. Our implementation of πm,
pim, is based on an enhanced suffix array and inherits the superior
time and memory efficiency of this data structure. Simulations
demonstrate that πm is accurate if mutations are distributed randomly
along the chromosome. While real data often deviate from this ideal,
πm remains useful for identifying regions of low genetic diversity using
a sliding window approach. We demonstrate this by applying it to the
complete genomes of 37 strains of Drosophila melanogaster, and to
the genomes of two closely related Drosophila species, D. simulans
and D. sechellia. In both cases, we detect the diversity minimum and
discuss its biological implications.
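
To make the quantity in the abstract concrete, the following minimal Python sketch computes the average number of pairwise mismatches per site in sliding windows and reports the window with the lowest diversity. It is not the authors' method: it assumes the sequences are already aligned to equal length, whereas the estimator described above is alignment-free and built on an enhanced suffix array. The function names (pairwise_mismatches, sliding_diversity, diversity_minimum) and the toy sequences are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def pairwise_mismatches(seqs):
    """Mean number of mismatches per site over all pairs of sequences.

    Assumption: the sequences are pre-aligned and of equal length. This is
    the alignment-based quantity, not the alignment-free estimator.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    if not pairs or length == 0:
        return 0.0
    # Count differing positions for every pair, then normalise by the
    # number of pairs and the number of sites.
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / (len(pairs) * length)


def sliding_diversity(seqs, window, step):
    """Return (start, diversity) for each full window along the alignment."""
    length = len(seqs[0])
    result = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        result.append((start, pairwise_mismatches(chunk)))
    return result


def diversity_minimum(windows):
    """Return the (start, diversity) pair with the lowest diversity."""
    return min(windows, key=lambda w: w[1])


if __name__ == "__main__":
    # Toy data (hypothetical): the middle window carries no variation.
    seqs = ["ACGTAAAAACGT", "ACGAAAAAACGA", "TCGTAAAATCGT"]
    windows = sliding_diversity(seqs, window=4, step=4)
    print(windows)
    print(diversity_minimum(windows))
```

On genome-scale data the window and step sizes would be far larger, and the quadratic cost of comparing all pairs in an alignment is exactly the kind of expense that motivates an alignment-free approach.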