The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process

Heinrich, V.; Stange, J.; Dickhaus, T.; Imkeller, P.; Kruger, U.; Bauer, S.; Mundlos, S.; Robinson, P. N.; Hecht, J.; Krawitz, P. M.

Released

Journal Article

MPG Authors

Mundlos, S.
Research Group Development & Disease (Head: Stefan Mundlos), Max Planck Institute for Molecular Genetics, Max Planck Society;

Robinson, P. N.
Research Group Development & Disease (Head: Stefan Mundlos), Max Planck Institute for Molecular Genetics, Max Planck Society;

Hecht, J.
Research Group Development & Disease (Head: Stefan Mundlos), Max Planck Institute for Molecular Genetics, Max Planck Society;

Citation

Heinrich, V., Stange, J., Dickhaus, T., Imkeller, P., Kruger, U., Bauer, S., et al. (2011). The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process. Nucleic Acids Res. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/22127862 http://nar.oxfordjournals.org/content/early/2011/11/29/nar.gkr1073.full.pdf.

Citation link: https://hdl.handle.net/11858/00-001M-0000-0010-784C-8

Abstract

With the availability of next-generation sequencing (NGS) technology, it is expected that sequence variants may be called on a genomic scale. Here, we demonstrate that a deeper understanding of the distribution of the variant call frequencies at heterozygous loci in NGS data sets is a prerequisite for sensitive variant detection. We model the crucial steps in an NGS protocol as a stochastic branching process and derive a mathematical framework for the expected distribution of alleles at heterozygous loci before measurement, that is, before sequencing. We confirm our theoretical results by analyzing technical replicates of human exome data and demonstrate that the variance of allele frequencies at heterozygous loci is higher than expected under a simple binomial distribution. Because of this high variance, mutation callers relying on binomially distributed priors are less sensitive to heterozygous variants that deviate strongly from the expected mean frequency. Our results also indicate that error rates can be reduced to a greater degree by technical replicates than by increasing sequencing depth.
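
The overdispersion described in the abstract can be illustrated with a toy simulation. The Python sketch below is not the authors' model: it assumes a simple Galton-Watson style amplification in which each molecule is copied with a fixed probability per cycle, followed by binomial sampling of reads. All names and parameter values (`n_start`, `cycles`, `p_dup`, `depth`, `n_loci`) are hypothetical and chosen only for illustration. It compares the variance of the observed allele fraction at simulated heterozygous loci with the variance a binomial-only model would predict at the same depth.

```python
import random
import statistics


def amplify(n_start, cycles, p_dup, rng):
    """Grow a pool of molecules; each one is copied with probability p_dup per cycle."""
    n = n_start
    for _ in range(cycles):
        n += sum(1 for _ in range(n) if rng.random() < p_dup)
    return n


def observed_fraction(depth, n_start, cycles, p_dup, rng):
    """Simulate one heterozygous locus; return the fraction of reads showing allele A."""
    a = amplify(n_start, cycles, p_dup, rng)  # copies of allele A after amplification
    b = amplify(n_start, cycles, p_dup, rng)  # copies of allele B after amplification
    frac_a = a / (a + b)                      # allele fraction before sequencing
    # Assumption: reads are drawn with replacement from the amplified pool.
    reads_a = sum(1 for _ in range(depth) if rng.random() < frac_a)
    return reads_a / depth


def simulate(n_loci=1000, depth=50, n_start=5, cycles=6, p_dup=0.8, seed=1):
    """Return (mean fraction, observed variance, binomial-only variance)."""
    rng = random.Random(seed)
    fractions = [
        observed_fraction(depth, n_start, cycles, p_dup, rng)
        for _ in range(n_loci)
    ]
    observed_var = statistics.pvariance(fractions)
    binomial_var = 0.25 / depth  # variance of a Binomial(depth, 0.5) proportion
    return statistics.mean(fractions), observed_var, binomial_var


if __name__ == "__main__":
    mean, observed_var, binomial_var = simulate()
    print(f"mean allele fraction:      {mean:.3f}")
    print(f"observed variance:         {observed_var:.5f}")
    print(f"binomial-only expectation: {binomial_var:.5f}")
```

In this toy setting the mean allele fraction stays near 0.5, while the observed variance exceeds the binomial-only value because the random amplification step adds spread before any read is sampled. Increasing `depth` shrinks only the sampling component of the variance, which is consistent in spirit with the abstract's remark on replicates versus depth, though the sketch makes no quantitative claim about real data.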