1 Department of Software Engineering, College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350108, China
2 Department of Computer Science & Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA
3 Department of Pathology, Carver College of Medicine, The University of Iowa, Iowa City, IA 52242, USA
Motivation: Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment-based methods.

Results: We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared with other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating phylogenetic trees, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, Symbol, is also proposed, which determines the appropriate algorithmic parameter (length) automatically, without first having to try different values. Comparative analysis with state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially as sequence length or dataset size increases.

Availability and implementation: The K2 and Symbol approaches are implemented in the R language as a package, which is freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz).

Contact: email@example.com

Supplementary information: Supplementary data are available at Bioinformatics online.
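
As a rough illustration of what a Kendall-based, alignment-free comparison can look like, the sketch below counts k-mers in two sequences and computes the Kendall tau-b rank correlation between the two count vectors. This is not the K2 algorithm, whose details are not given in this abstract; the function names (`kmer_counts`, `kendall_tau_b`, `kmer_kendall_similarity`), the choice of tau-b, and the convention of returning 0.0 when the statistic is undefined are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kendall_tau_b(x, y):
    """Kendall tau-b between two equal-length numeric vectors (O(n^2) pairs)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both vectors: counted in neither term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    # Assumption: report 0.0 when tau-b is undefined (e.g. a constant vector).
    return (concordant - discordant) / denom if denom else 0.0


def kmer_kendall_similarity(seq_a, seq_b, k):
    """Hypothetical helper: tau-b over k-mer counts of two sequences."""
    counts_a, counts_b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    kmers = sorted(set(counts_a) | set(counts_b))  # shared k-mer vocabulary
    return kendall_tau_b([counts_a[m] for m in kmers],
                         [counts_b[m] for m in kmers])
```

Calling `kmer_kendall_similarity(seq_a, seq_b, k)` yields a value in [-1, 1]; higher values mean the two sequences rank their k-mers by frequency more similarly. The word length `k` has to be chosen by the caller here, which is the kind of parameter the abstract says the improved method selects automatically.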