



Journal Article

Alignment-free population genomics: an efficient estimator of sequence diversity


Haubold,  Bernhard
Research Group Bioinformatics, Department Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Max Planck Society;


Cite as:
Haubold, B., & Pfaffelhuber, P. (2012). Alignment-free population genomics: an efficient estimator of sequence diversity. G3: Genes, Genomes, Genetics, 2(8), 883-889. doi:10.1534/g3.112.002527.
Comparative sequencing contributes critically to the functional annotation of genomes. Analyzing the increasingly abundant comparative sequencing data requires efficient computational tools. We present a strategy for comparing unaligned genomes that combines a coalescent approach with advanced algorithms for indexing sequences. These algorithms are particularly efficient on large genomes, as their run time ideally grows only linearly with sequence length. Using this approach, we have derived and implemented a maximum-likelihood estimator of π, the average number of mismatches per site between two closely related sequences. By allowing for fluctuating coalescent times, we improve a previously published alignment-free estimator of π. We show through simulation that our new estimator is fast and accurate even with moderate recombination (r ≤ π). To demonstrate its applicability to real data, we compare the unaligned genomes of Drosophila persimilis and D. pseudoobscura. In agreement with previous studies, our sliding window analysis places the global divergence minimum between these two genomes in the pericentromeric region of chromosome 3.
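To make the quantities in the abstract concrete, the following is a minimal Python sketch. It is not the authors' estimator and does not reproduce their coalescent likelihood. It shows three things under stated assumptions. First, π computed directly from two already aligned sequences, which is the value an alignment-free estimator tries to recover. Second, a naive version of an alignment-free observable: for each query position, the length of the shortest substring starting there that is absent from the subject. Third, a sliding-window scan of π. Intuitively, the more two sequences differ, the shorter these absent-substring lengths tend to be. The function names are hypothetical. The substring search here is naive and far slower than the index-based algorithms the paper relies on.

```python
def pairwise_diversity(a: str, b: str) -> float:
    """pi for two aligned, equal-length sequences: mismatches per site."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def shortest_absent_lengths(query: str, subject: str) -> list:
    """For each query position i, the length of the shortest substring
    query[i:i+l] that does not occur anywhere in subject.

    Naive search, for illustration only. Positions whose entire suffix
    occurs in subject have no such substring and are skipped.
    """
    lengths = []
    for i in range(len(query)):
        l = 1
        # Extend the substring while it still occurs in the subject.
        while i + l <= len(query) and query[i:i + l] in subject:
            l += 1
        if i + l <= len(query):
            lengths.append(l)
    return lengths


def sliding_window_diversity(a: str, b: str, window: int, step: int) -> list:
    """(window start, pi) for each window over two aligned sequences."""
    return [
        (start, pairwise_diversity(a[start:start + window], b[start:start + window]))
        for start in range(0, len(a) - window + 1, step)
    ]


if __name__ == "__main__":
    print(pairwise_diversity("ACGTACGTAC", "ACGTTCGTAC"))        # one mismatch in ten sites
    print(shortest_absent_lengths("ACGT", "ACGA"))
    print(sliding_window_diversity("AAAAAAAAAA", "AAAAATTTTT", 5, 5))
```

For identical sequences, `shortest_absent_lengths` returns an empty list, because every suffix of the query occurs in the subject. In a real alignment-free analysis, the distribution of such lengths, rather than a direct site-by-site comparison, is the input to the estimate of π.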