de.mpg.escidoc.pubman.appbase.FacesBean
Deutsch
 
Hilfe Wegweiser Datenschutzhinweis Impressum Kontakt
  DetailsucheBrowse

Datensatz

 
 
 
 
DownloadE-Mail
  Algorithms for efficient alignment-free sequence comparison

Domazet-Lošo, M. (2010). Algorithms for efficient alignment-free sequence comparison. PhD Thesis, University of Zagreb, Zagreb.

Item is

Basisdaten

einblenden: ausblenden:
Datensatz-Permalink: http://hdl.handle.net/11858/00-001M-0000-000F-D451-8 Versions-Permalink: http://hdl.handle.net/11858/00-001M-0000-000F-D452-6
Genre: Hochschulschrift
Alternativer Titel : Algoritmi za učinkovitu usporedbu sekvenci bez korištenja sravnjivanja

Dateien

einblenden: Dateien
ausblenden: Dateien
:
Thesis_Mirjana Domazet-Loso.pdf (Verlagsversion), 2MB
Beschreibung:
-
Sichtbarkeit:
Öffentlich
MIME-Typ / Prüfsumme:
application/pdf / [MD5]
Technische Metadaten:
Copyright Datum:
-
Copyright Info:
-
Lizenz:
-

Externe Referenzen

einblenden:

Urheber

einblenden:
ausblenden:
 Urheber:
Domazet-Lošo, Mirjana1, Autor              
Haubold, Bernhard1, Ratgeber              
Ristov, Strahil, Gutachter
Affiliations:
1Research Group Bioinformatics, Department Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Max Planck Society, escidoc:1445644              

Inhalt

einblenden:
ausblenden:
Schlagwörter: alignment-free method; evolutionary distance; local sequence homology; genome comparison; HIV; horizontal gene transfer; suffix tree; suffix array; shortest unique substringing
 Zusammenfassung: Sequence comparison is an essential tool in modern biology. It is used to identify homologous regions between sequences, and to detect evolutionary relationships between organisms. Sequence comparison is usually based on alignments. However, aligning whole genomes is computationally difficult. As an alternative approach, alignment-free sequence comparison can be used. In my thesis, I concentrate on two problems that can be solved without alignment: (i) estimation of substitution rates between nucleotide sequences, and (ii) detection of local sequence homology. In the first part of my thesis, I developed and implemented a new algorithm for the efficient alignment-free computation of the number of nucleotide substitutions per site, and applied it to the analysis of large data sets of complete genomes. In the second part of my thesis, I developed and implemented a new algorithm for detecting matching regions between nucleotide sequences. I applied this solution to the classification of circulating recombinant forms of HIV, and to the analysis of bacterial genomes subject to horizontal gene transfer.

Details

einblenden:
ausblenden:
Sprache(n): eng - Englisch
 Datum: 2010-10-07
 Publikationsstatus: Angenommen
 Seiten: III, 127 S.
 Ort, Verlag, Ausgabe: Zagreb : University of Zagreb
 Inhaltsverzeichnis: Table of Contents
1. GENERAL INTRODUCTION.........................................................................1
1.1. Suffix trees and other index data structures used in biological sequence analysis.....................................................................................................................9
1.1.1. Suffix Tree..........................................................................................11
1.1.2. The space and the time complexity of the algorithms for the suffix tree construction.......................................................................................................13
1.1.3. Suffix Array........................................................................................14
1.1.4. The space and the time complexity of the algorithms for suffix array construction.......................................................................................................15
1.1.5. Enhanced Suffix Array.......................................................................17
1.1.6. The 64-bit implementation of the lightweight suffix array construction algorithm 21
1.1.7. Self-index...........................................................................................22
1.1.8. Burrows-Wheeler transform..............................................................23
1.1.9. The FM-Index and the backward search algorithm..........................25
1.1.10. The space and the time-complexity of the FM-index.........................29
2. EFFICIENT ESTIMATION OF PAIRWISE DISTANCES BETWEEN GENOMES...............................................................................................................31
2.1. Introduction................................................................................................31
2.2. Methods.....................................................................................................33
2.2.1. Definition of an alignment-free estimator of the rate of substitution, Kr 33
2.2.2. Problem statement.............................................................................35
2.2.3. Time complexity analysis of the previous approach (kr 1)................35
2.2.4. Time complexity analysis of the new approach (kr 2).......................37
2.2.5. Algorithm 1: Computation of all Kr values during the traversal of a generalized suffix tree of n sequences................................................................38
2.2.6. The implementation of kr version 2...................................................44
2.3. Analysis of Kr on simulated data sets........................................................45
2.3.1. Auxiliary programs............................................................................45
2.3.2. Consistency of Kr...............................................................................46 i
2.3.3. The affect of horizontal gene transfer on the accuracy of Kr............48
2.3.4. The effect of genome duplication on the accuracy of Kr....................49
2.3.5. Run time comparison of kr 1 and kr 2...............................................50
2.4. Application of kr version 2........................................................................53
2.4.1. Auxililary software used for the analysis of real data sets................56
2.4.2. The analysis of 12 Drosophila genomes............................................57
2.4.3. The analysis of 13 Escherichia coli and Shigella genomes...............58
2.4.4. The analysis of 825 HIV-1 pure subtype genomes.............................61
2.5. Discussion..................................................................................................62
3. EFFICIENT ALIGNMENT-FREE DETECTION OF LOCAL SEQUENCE HOMOLOGY....................................................................................66
3.1. Introduction................................................................................................66
3.2. Methods.....................................................................................................69
3.2.1. Problem statement – determining subtype(s) of a query sequence....69
3.2.2. Construction of locally homologous segments..................................71
3.2.3. Time complexity of computing a list of intervals Ii............................72
3.2.4. Algorithm 2: Construction of an interval tree...................................73
3.2.5. Computing a list of segements Gi.......................................................80
3.3. Analysis of st on simulated data sets.........................................................82
3.3.1. Run-time and memory usage analysis of st........................................82
3.3.2. Consistency of st................................................................................85
3.3.3. Comparison to SCUEAL on simulated data sets...............................92
3.4. Application of st.........................................................................................97
3.4.1. The analysis of Neisseria meningitidis..............................................98
3.4.2. The analysis of a recombinant form of HIV-1...................................99
3.4.3. The analysis of circulating recombinant forms of HIV-1................103
3.4.4. The analysis of an avian pathogenic Escherichia coli strain..........104
3.5. Discussion................................................................................................107
4. CONCLUSION..............................................................................................110
5. REFERENCES..............................................................................................112
6. ELECTRONIC SOURCES...........................................................................121
7. LIST OF ABBREVIATIONS AND SYMBOLS.........................................122 ii
iii
ABSTRACT............................................................................................................124
SAŽETAK..............................................................................................................125
CURRICULUM VITAE........................................................................................126
ŽIVOTOPIS...........................................................................................................127
 Art der Begutachtung: -
 Identifikatoren: eDoc: 544378
Anderer: Diss/12255
 Art des Abschluß: Doktorarbeit

Veranstaltung

einblenden:

Entscheidung

einblenden:

Projektinformation

einblenden:

Quelle

einblenden: