English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT
 
 
DownloadE-Mail
  Phylogenetic trees from large datasets

Schmidt, H. A. (2003). Phylogenetic trees from large datasets. PhD Thesis, Heinrich–Heine–Universität, Düsseldorf.

Item is

Files

show Files
hide Files
:
PhD-Schmidt.pdf (Any fulltext), 990KB
 
File Permalink:
-
Name:
PhD-Schmidt.pdf
Description:
-
OA-Status:
Visibility:
Restricted (Max Planck Institute for Molecular Genetics, MBMG; )
MIME-Type / Checksum:
application/pdf
Technical Metadata:
Copyright Date:
-
Copyright Info:
eDoc_access: MPG
License:
-

Locators

show

Creators

show
hide
 Creators:
Schmidt, Heiko A.1, Author
Affiliations:
1Max Planck Society, ou_persistent13              

Content

show
hide
Free keywords: -
 Abstract: This thesis deals with topics from Bioinformatics /Phyloinformatics. It touches the fields algorithmic complexity, molecular phylogenetics, sequence evolution, as well as parallel computing and, thus, is addressed to an interdisciplinary community. Although it is hardly possible to write a completely self-contained thesis, I tried to provide some basic information needed for members of either community to be able to more easily understand the topics discussed. Consequently, some information might be enclosed that seem trivial to, e.g., computer scientists but which biologists are possibly not familiar with and vice versa.

Details

show
hide
Language(s): eng - English
 Dates: 2003-07-28
 Publication Status: Accepted / In Press
 Pages: 111 pp
 Publishing info: Düsseldorf : Heinrich–Heine–Universität
 Table of Contents: 1. Overview 1
1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2. Organization of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Molecular Phylogenetics: A General Introduction 5
2.1. Biological Sequences and Molecular Evolution . . . . . . . . . . . . . . . 5
2.1.1. Types of Biological Data . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.2. Sequence Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.3. Multiple Sequence Alignment . . . . . . . . . . . . . . . . . . . . 7
2.1.4. Modeling Molecular Evolution . . . . . . . . . . . . . . . . . . . . 7
2.2. Phylogenetic Tree Reconstruction . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1. Notation on Phylogenetic Trees . . . . . . . . . . . . . . . . . . . 10
2.2.2. Types of Phylogenetic Methods . . . . . . . . . . . . . . . . . . . 14
2.2.3. Maximum Likelihood on Phylogenetic Trees . . . . . . . . . . . . 14
2.2.4. Complexity of Phylogenetic Analysis . . . . . . . . . . . . . . . . 17
3. The Quartet Puzzling Algorithm 19
3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2. Methods from the TREE-PUZZLE Package . . . . . . . . . . . . . . . . 20
3.2.1. Likelihood Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2.2. Quartet Puzzling . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.2.1. ML Step . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.2.2. Puzzling Step . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.2.3. Consensus Step . . . . . . . . . . . . . . . . . . . . . . . 24
4. Improvement in Complexity of the Puzzling Step of QP 25
4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2. Complexity Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3. Complexity of the Original Puzzling Step . . . . . . . . . . . . . . . . . . 27
4.4. More E cient Puzzling Step Algorithms . . . . . . . . . . . . . . . . . . 29
4.4.1. Split-Based Puzzling Step Algorithm . . . . . . . . . . . . . . . . 29
4.4.2. Recursive Puzzling Step Algorithm . . . . . . . . . . . . . . . . . 31
4.4.2.1. Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.4.2.2. Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.4.3. Berry’s MRCA-Based Puzzling Step Algorithm . . . . . . . . . . . 36
4.5. Runtime Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.5.1. Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.5.2. Benchmark Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.5.3. Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.6. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5. Parallelized Quartet Puzzling 43
5.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2. Parameter Estimation for Evolutionary Models . . . . . . . . . . . . . . . 44
5.2.1. Algorithm: Estimating Model Parameters . . . . . . . . . . . . . 44
5.3. Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.4. Runtime Analysis of the Sequential TREE-PUZZLE . . . . . . . . . . . . 46
5.4.1. Runtime of the Parameter Estimation . . . . . . . . . . . . . . . . 46
5.4.2. Runtime of the Quartet Puzzling Algorithm . . . . . . . . . . . . 46
5.5. Parallelizing TREE-PUZZLE . . . . . . . . . . . . . . . . . . . . . . . . 47
5.5.1. Parallelizing the Parameter Estimation . . . . . . . . . . . . . . . 47
5.5.2. Parallelizing Quartet Puzzling . . . . . . . . . . . . . . . . . . . . 47
5.6. Scheduling Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.7. E ciency of Parallel TREE-PUZZLE . . . . . . . . . . . . . . . . . . . . 50
5.7.1. Benchmark Datasets and Setup . . . . . . . . . . . . . . . . . . . 50
5.7.2. Results for Parameter Estimation . . . . . . . . . . . . . . . . . . 51
5.7.3. ML Step and Puzzling Step . . . . . . . . . . . . . . . . . . . . . 51
5.7.4. Overall Scaling of Parallel TREE-PUZZLE . . . . . . . . . . . . . 51
5.8. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6. Large ML Trees from Sequences Using Quartets 57
6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.2. The Modified Quartet Puzzling Algorithm . . . . . . . . . . . . . . . . . 57
6.2.1. The Algorithm ModPUZZLE . . . . . . . . . . . . . . . . . . . 58
6.3. Computational Aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.4. Some Practical Measures on the Method . . . . . . . . . . . . . . . . . . 60
6.5. Discussion and Possible Extensions . . . . . . . . . . . . . . . . . . . . . 61
7. Phylogenetic Trees from Multiple Genesets with Missing Data 63
7.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7.2. Methods to Combine Datasets . . . . . . . . . . . . . . . . . . . . . . . . 64
7.2.1. Low Level Combination: Total Evidence . . . . . . . . . . . . . . 64
7.2.2. High Level Methods . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.2.2.1. Combining Equal-Sized Datasets: Consensus Methods . 64
7.2.2.2. Combining Overlapping Datasets: Supertree Methods . . 66
7.3. Medium Level Combined Phylogenetic Analysis . . . . . . . . . . . . . . 68
7.3.1. Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7.3.2. The Combined Quartet Method to Combine Genesets . . . . . . . 70
7.3.2.1. Combining the ML Quartets . . . . . . . . . . . . . . . . 70
7.3.2.2. Computing the Overall Tree . . . . . . . . . . . . . . . . 71
7.3.2.3. Assessing Whether Genesets Can Be Combined . . . . . 71
7.3.2.4. Overlap-Guided Puzzling Step . . . . . . . . . . . . . . . 72
7.3.2.5. Relative Majority Consensus . . . . . . . . . . . . . . . 73
7.4. The Phylogeny of the Grasses . . . . . . . . . . . . . . . . . . . . . . . . 73
7.4.1. The Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
7.4.2. Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.4.2.1. Low Level Combination . . . . . . . . . . . . . . . . . . 74
7.4.2.2. High Level Combination . . . . . . . . . . . . . . . . . . 74
7.4.2.3. Medium Level Combination . . . . . . . . . . . . . . . . 74
7.4.3. Parameter Estimates from Poaceae Dataset . . . . . . . . . . . . 76
7.4.4. Combinability of the Poaceae Dataset . . . . . . . . . . . . . . . . 77
7.4.5. Reconstructed Poaceae Phylogeny . . . . . . . . . . . . . . . . . . 78
7.4.5.1. Total Evidence Tree . . . . . . . . . . . . . . . . . . . . 78
7.4.5.2. MRP Supertrees . . . . . . . . . . . . . . . . . . . . . . 78
7.4.5.3. MinCut Supertrees . . . . . . . . . . . . . . . . . . . . 82
7.4.5.4. Combined Quartet Tree . . . . . . . . . . . . . . . . . . 82
7.4.6. Other Published Poaceae Phylogenies . . . . . . . . . . . . . . . . 86
7.5. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.5.1. Problems of Dataset-Combining Methods . . . . . . . . . . . . . . 88
7.5.2. Comparison of the Tree Topologies . . . . . . . . . . . . . . . . . 89
7.5.3. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
8. Summary 93
A. Variables and Functions 95
B. Abbreviations 98
Bibliography 101
Acknowledgments 111
 Rev. Type: -
 Identifiers: eDoc: 194884
 Degree: PhD

Event

show

Legal Case

show

Project information

show

Source

show