Abstract:
The advent of high-throughput sequencers has led to a dramatic increase in the size of
available genomic data. Standard methods, which have worked well for many years, are
not suitable for the analysis of big data sets, due to their reliance on a time-consuming
alignment step. In this thesis, a new alignment-free approach for phylogeny reconstruction is
introduced. The corresponding program, andi, is orders of magnitude faster than classical
approaches and also superior to comparable alignment-free methods.
The central data structure in andi is the enhanced suffix array. It is used to find long
exact matches between sequences. In this thesis, various approaches to the construction of
enhanced suffix arrays, including novel ones, are evaluated with respect to performance.
Additionally, a new parallel algorithm for the computation of suffix arrays is introduced.
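
To make the role of the data structure concrete, the following is a minimal Python sketch of a suffix array with its LCP (longest common prefix) table, and of using the suffix array to find the longest exact match of a query prefix in a subject sequence. It is an illustration only and not the implementation used in andi: the function names `suffix_array`, `lcp_array`, and `longest_match` are hypothetical, the construction is a naive sort rather than any of the algorithms evaluated in the thesis, and the LCP table is computed with Kasai's algorithm as an assumed stand-in.

```python
def suffix_array(text):
    """Naive construction: sort all suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """Kasai's algorithm: lcp[r] is the length of the longest common prefix
    of the suffixes at ranks r-1 and r (lcp[0] is 0)."""
    n = len(text)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def longest_match(text, sa, query):
    """Return (length, position) of the longest prefix of `query` that
    occurs in `text`; position is -1 if not even one character matches."""
    lo, hi = 0, len(sa)  # current interval of matching suffixes, [lo, hi)
    length = 0

    def char_at(r, offset):
        # Character `offset` positions into the suffix of rank r;
        # the empty string sorts before every real character.
        p = sa[r] + offset
        return text[p] if p < len(text) else ""

    while length < len(query) and lo < hi:
        c = query[length]
        # Lower bound: first rank in [lo, hi) whose next character is >= c.
        a, b = lo, hi
        while a < b:
            m = (a + b) // 2
            if char_at(m, length) < c:
                a = m + 1
            else:
                b = m
        new_lo = a
        # Upper bound: first rank in [new_lo, hi) whose next character is > c.
        b = hi
        while a < b:
            m = (a + b) // 2
            if char_at(m, length) <= c:
                a = m + 1
            else:
                b = m
        new_hi = a
        if new_lo == new_hi:
            break  # no suffix extends the match by c
        lo, hi = new_lo, new_hi
        length += 1
    return (length, sa[lo]) if length > 0 else (0, -1)


subject = "banana"
sa = suffix_array(subject)
lcp = lcp_array(subject, sa)
match_length, match_pos = longest_match(subject, sa, "nanx")
```

Here the suffix array and LCP table together form a simple enhanced suffix array of the subject; each character of the query narrows the interval of candidate suffixes by binary search until no suffix extends the match.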