Thursday, 6 December 2012

Alignment-free phylogeny of whole genomes using underlying subwords

Background:
With the progress of modern sequencing technologies, a large number of complete genomes are now available. Traditionally, the comparison of two related genomes is carried out by sequence alignment. There are cases where these techniques cannot be applied, for example if two genomes do not share the same set of genes, or if they cannot be aligned to each other because of low sequence similarity, rearrangements and inversions, or, more specifically, because of their lengths when the organisms belong to different species. In these cases, complete genomes can be compared only with ad hoc methods, usually called alignment-free methods.
Methods:
In this paper we propose a distance function based on subword compositions, called the Underlying Approach (UA). We prove that the matching statistics, a popular concept in the field of string algorithms that captures the statistics of common words between two sequences, can be derived from a small set of "independent" subwords, namely the irredundant common subwords. We define a distance-like measure based on these subwords, such that each region of the genomes contributes only once, so that shared subwords are not counted multiple times. In a nutshell, this filter discards subwords occurring in regions covered by other, more significant subwords.
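To make the notion of matching statistics concrete, here is a minimal sketch under a common definition: for each position i of the first sequence, the length of the longest substring starting at i that also occurs somewhere in the second sequence. The function name `matching_statistics` and the toy sequences are assumptions made for illustration, not taken from the paper. The loop is deliberately naive, whereas efficient constructions usually rely on suffix-based indexes, and it does not implement the irredundant-subword filter or the UA distance itself.

```python
def matching_statistics(s1: str, s2: str) -> list[int]:
    """For each position i of s1, return the length of the longest
    substring of s1 starting at i that also occurs somewhere in s2.

    Naive illustration only: each step re-checks substring containment,
    so this is far slower than an index-based computation.
    """
    stats = []
    for i in range(len(s1)):
        length = 0
        # Extend the match one character at a time while the
        # extended substring still occurs in s2.
        while i + length < len(s1) and s1[i:i + length + 1] in s2:
            length += 1
        stats.append(length)
    return stats


# Toy example (hypothetical sequences):
print(matching_statistics("ACGT", "CGA"))
```

In this toy case, the entry for position 1 reflects the shared subword "CG", while the neighbouring entries correspond to shorter matches lying inside or next to that region. This is the kind of redundancy that counting each region only once is meant to avoid.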
Results:
The Underlying Approach (UA) builds a scoring function based on this set of patterns, called underlying. We prove that this set is, by construction, linear in the size of the input, free of overlaps, and efficient to construct. Results show the validity of our method in the reconstruction of phylogenetic trees, where the Underlying Approach outperforms the current state-of-the-art methods. Moreover, we show that the accuracy of UA is achieved with a very small number of subwords, which in some cases carry meaningful biological information.

Source: http://www.almob.org/content/7/1/34

