Codon reassignments have been reported across all domains of life. With the increasing number of sequenced genomes, the development of systematic approaches for genetic code detection is essential for accurate downstream analyses. Three automated prediction tools exist so far: FACIL, GenDecoder and Bagheera; the last two respectively restricted to metazoan mitochondrial genomes and CUG reassignments in yeast nuclear genomes. These tools can only analyze a single genome at a time and are often not followed by a validation procedure, resulting in a high rate of false positives.Results
We present CoreTracker, a new algorithm for the inference of sense-to-sense codon reassignments. CoreTracker identifies potential codon reassignments in a set of related genomes, then uses statistical evaluations and a random forest classifier to predict those that are the most likely to be correct. Predicted reassignments are then validated through a phylogeny-aware step that evaluates the impact of the new genetic code on the protein alignment. Handling simultaneously a set of genomes in a phylogenetic framework, allows tracing back the evolution of each reassignment, which provides information on its underlying mechanism. Applied to metazoan and yeast genomes, CoreTracker significantly outperforms existing methods on both precision and sensitivity.Availability and implementation
CoreTracker is written in Python and available at https://github.com/UdeM-LBIT/CoreTracker.Contact
Supplementary data are available at Bioinformatics online.