Released

Thesis

Evaluation of Population-Based Haplotype Phasing Algorithms

MPS-Authors
http://pubman.mpdl.mpg.de/cone/persons/resource/persons202171

Sethi, Riccha
International Max Planck Research School, MPI for Informatics, Max Planck Society;
Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society;

http://pubman.mpdl.mpg.de/cone/persons/resource/persons180775

Marschall, Tobias
Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society;

http://pubman.mpdl.mpg.de/cone/persons/resource/persons98170

Pfeifer, Nico
Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society;

Citation

Sethi, R. (2016). Evaluation of Population-Based Haplotype Phasing Algorithms. Master Thesis, Universität des Saarlandes, Saarbrücken.


Cite as: http://hdl.handle.net/11858/00-001M-0000-002C-41DA-7
Abstract
Knowing the correct order of alleles on haplotypes is valuable for genome-wide association studies (GWAS) and population genetics. A considerable number of computational and statistical algorithms have been developed for haplotype phasing. Historically, these algorithms were compared on simulated population data with sparse markers, modeled on genotype data from the HapMap project. With the advances in next-generation sequencing (NGS) and its falling cost, thousands of individuals across the world have now been sequenced in the 1000 Genomes Project. This has produced genotype information for individuals of different ethnicities, with much denser genetic variation. Here, we develop a scalable approach to assess state-of-the-art population-based haplotype phasing algorithms on benchmark data generated by simulating a population (unrelated and related individuals), the NGS pipeline, and genotype calling. The most accurate algorithm for phase inference in unrelated individuals was MVNCall (v1), while the DuoHMM approach of Shapeit (v2) had the lowest switch error rate, 0.298% (with true genotype likelihoods), in related individuals. We also conducted a comprehensive assessment of algorithms for imputing missing genotypes in a population using a reference panel. For this task, Impute2 (v2.3.2) and Beagle (v4.1) both performed competitively under different imputation scenarios, with genotype concordance rates of >99%. However, Impute2 was better at imputing genotypes with a minor allele frequency of <0.025 in the reference panel.
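The two evaluation metrics named in the abstract, switch error rate and genotype concordance rate, can be illustrated with a minimal sketch. This is not the thesis's evaluation code. It uses the commonly cited definitions and assumes biallelic sites coded 0/1, inferred genotypes that agree with the truth at every heterozygous site, and genotypes given as alternate-allele counts (0, 1, 2). The function names `switch_error_rate` and `genotype_concordance` are hypothetical.

```python
from typing import Sequence, Tuple

Haplotypes = Tuple[Sequence[int], Sequence[int]]


def switch_error_rate(truth: Haplotypes, inferred: Haplotypes) -> float:
    """Fraction of consecutive heterozygous site pairs whose phase is switched.

    Assumption: the inferred genotype matches the true genotype at every
    truly heterozygous site, so only the phase can differ.
    """
    t0, t1 = truth
    p0, _ = inferred
    # Only heterozygous sites carry phase information.
    het = [i for i in range(len(t0)) if t0[i] != t1[i]]
    if len(het) < 2:
        return 0.0  # no pair of sites between which a switch could occur
    # Orientation per site: does inferred haplotype 0 carry true haplotype 0's allele?
    orient = [p0[i] == t0[i] for i in het]
    # A switch is a change of orientation between neighbouring het sites.
    switches = sum(1 for a, b in zip(orient, orient[1:]) if a != b)
    return switches / (len(het) - 1)


def genotype_concordance(true_gt: Sequence[int], imputed_gt: Sequence[int]) -> float:
    """Fraction of genotypes (alternate-allele counts) imputed exactly right."""
    if len(true_gt) != len(imputed_gt):
        raise ValueError("genotype vectors must have the same length")
    if not true_gt:
        raise ValueError("genotype vectors must not be empty")
    matches = sum(t == g for t, g in zip(true_gt, imputed_gt))
    return matches / len(true_gt)
```

Under these definitions, an inferred haplotype pair that is merely listed in the opposite order scores a switch error rate of 0, since the labelling of the two haplotypes is arbitrary; only changes of orientation along the chromosome are counted.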