Alignment-free estimation of nucleotide diversity

Haubold, Bernhard; Reed, Floyd A.; Pfaffelhuber, Peter

doi:10.1093/bioinformatics/btq689


Please note that a newer version of this record is available:
https://pure.mpg.de/pubman/item/item_1505077_2


Released

Journal Article


MPG Authors

/persons/resource/persons56719

Haubold, Bernhard
Research Group Bioinformatics, Department Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Max Planck Society;

/persons/resource/persons56877

Reed, Floyd A.
Research Group Population Genetics, Department Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Max Planck Society;

External Resources

No external resources are available

Full Texts (restricted access)

No full texts are currently released for your IP range.

Full Texts (freely accessible)

No freely accessible full texts are available in PuRe

Supplementary Material (freely accessible)

No freely accessible supplementary material is available

Citation

Haubold, B., Reed, F. A., & Pfaffelhuber, P. (2011). Alignment-free estimation of nucleotide diversity. Bioinformatics, 27(4), 449-455. doi:10.1093/bioinformatics/btq689.

Citation link: https://hdl.handle.net/11858/00-001M-0000-000F-D406-1

Abstract

Motivation: Sequencing capacity is currently growing more rapidly than CPU speed, leading to an analysis bottleneck in many genome projects. Alignment-free sequence analysis methods tend to be more efficient than their alignment-based counterparts. They may, therefore, be important in the long run for keeping sequence analysis abreast of sequencing.

Results: We derive and implement an alignment-free estimator of the number of pairwise mismatches, πm. Our implementation of πm, pim, is based on an enhanced suffix array and inherits the superior time and memory efficiency of this data structure. Simulations demonstrate that πm is accurate if mutations are distributed randomly along the chromosome. While real data often deviate from this ideal, πm remains useful for identifying regions of low genetic diversity using a sliding window approach. We demonstrate this by applying it to the complete genomes of 37 strains of Drosophila melanogaster, and to the genomes of two closely related Drosophila species, D. simulans and D. sechellia. In both cases, we detect the diversity minimum and discuss its biological implications.
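
To make the quantity in the abstract concrete, the following is a minimal sketch of the conventional, alignment-based calculation that πm is meant to approximate: the mean number of pairwise mismatches per site, evaluated in sliding windows to locate a diversity minimum. It is not the authors' estimator and not their pim program, which avoid alignment by working on an enhanced suffix array. The function names, window parameters, and toy sequences are assumptions made for illustration only.

```python
from itertools import combinations


def pairwise_mismatches(seqs):
    """Mean mismatches per site, averaged over all pairs of aligned sequences."""
    length = len(seqs[0])
    if length == 0 or len(seqs) < 2:
        return 0.0
    pairs = list(combinations(seqs, 2))
    # Count differing positions for every pair of sequences.
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / (len(pairs) * length)


def sliding_diversity(seqs, window, step):
    """Return (start, diversity) for each full window along the alignment."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (start, pairwise_mismatches([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]


def diversity_minimum(seqs, window, step):
    """Return the (start, diversity) of the first window with the lowest diversity."""
    return min(sliding_diversity(seqs, window, step), key=lambda w: w[1])


if __name__ == "__main__":
    # Hypothetical toy alignment: the first half is invariant.
    toy = ["AAAAAAAA", "AAAATAAA", "AAAATTAA"]
    for start, div in sliding_diversity(toy, window=4, step=4):
        print(start, round(div, 3))
    print("minimum:", diversity_minimum(toy, window=4, step=4))
```

This baseline needs aligned input of equal length and compares every pair of sequences position by position, which is the kind of cost that alignment-free approaches aim to avoid.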