Keywords:
genetic diversity; alignment-free; maximum likelihood; Drosophila;
match length; distribution
Abstract:
Comparative sequencing contributes critically to the functional annotation of genomes. One
prerequisite for successful analysis of the increasingly abundant comparative sequencing data is the
availability of efficient computational tools. We present here a strategy for comparing unaligned genomes
based on a coalescent approach combined with advanced algorithms for indexing sequences. These
algorithms are particularly efficient when analyzing large genomes, as their run time ideally grows only
linearly with sequence length. Using this approach, we have derived and implemented a maximum-likelihood
estimator of the average number of mismatches per site between two closely related
sequences, p. By allowing for fluctuating coalescent times, we are able to improve a previously published
alignment-free estimator of p. We show through simulation that our new estimator is fast and accurate
even with moderate recombination (r ≤ p). To demonstrate its applicability to real data, we compare the
unaligned genomes of Drosophila persimilis and D. pseudoobscura. In agreement with previous studies,
our sliding window analysis locates the global divergence minimum between these two genomes to the
pericentromeric region of chromosome 3.
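
The abstract does not give the estimator's formula, so the following is only a minimal sketch of the quantity named in the keywords, the match length, which is presumably the raw statistic an alignment-free method of this kind works from. The function name `match_lengths` and the toy sequences are hypothetical, and the naive substring search used here takes quadratic time or worse; it stands in for the sequence-indexing algorithms mentioned above and is not the authors' implementation.

```python
def match_lengths(query: str, subject: str) -> list[int]:
    """For each position i in query, return the length of the longest
    prefix of query[i:] that occurs anywhere in subject.

    Naive illustration only: a real tool would use a sequence index
    (for example a suffix tree or suffix array) rather than repeated
    substring searches.
    """
    lengths = []
    for i in range(len(query)):
        k = 0
        # Extend the match while the next-longer prefix still occurs in subject.
        while i + k < len(query) and query[i:i + k + 1] in subject:
            k += 1
        lengths.append(k)
    return lengths


if __name__ == "__main__":
    # Toy sequences that differ only at the last site.
    print(match_lengths("ACGT", "ACGA"))  # prints [3, 2, 1, 0]
```

Intuitively, the more mismatches per site separate two sequences, the shorter these matches tend to be, which is why their distribution carries information about p.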