1 Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium
2 Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
3 Centre for Immunology, Infection and Evolution, University of Edinburgh, Ashworth Laboratories, King's Buildings, Edinburgh EH9 3JT, UK
4 Department of Human Genetics, David Geffen School of Medicine
5 Department of Biostatistics, School of Public Health
6 Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
Motivation: Advances in sequencing technology continue to deliver increasingly large molecular sequence datasets, which are often heavily partitioned in order to accurately model the underlying evolutionary processes. In phylogenetic analyses, partitioning strategies involve estimating conditionally independent models of molecular evolution for different genes and for different codon positions within those genes. This requires estimating a large number of evolutionary parameters and increases the computational burden of such analyses. The past two decades have also seen the rise of multi-core processors, in both the central processing unit (CPU) and graphics processing unit (GPU) markets, enabling massively parallel computations that many software packages for multipartite analyses do not yet fully exploit.

Results: We propose a Markov chain Monte Carlo (MCMC) approach that uses an adaptive multivariate transition kernel to estimate, in parallel, a large number of parameters split across partitioned data by exploiting multi-core processing. Across several real-world examples, we demonstrate that our approach estimates these multipartite parameters more efficiently than standard approaches, which typically use a mixture of univariate transition kernels. In one case, when estimating the relative rate parameter of the non-coding partition in a heterochronous dataset, MCMC integration efficiency improves by more than 14-fold.

Availability and Implementation: Our implementation is part of the BEAST code base, a widely used open-source software package for Bayesian phylogenetic inference.

Contact: email@example.com

Supplementary information: Supplementary data are available at Bioinformatics online.
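To make the contrast between univariate and multivariate transition kernels concrete, the sketch below implements a generic adaptive random-walk Metropolis sampler: after a warm-up period, all parameters are proposed jointly from a multivariate normal whose covariance is the chain's running empirical covariance scaled by 2.38²/d, mixed with a small fixed isotropic proposal. This is an illustrative textbook scheme, not the BEAST implementation; the function name `adaptive_metropolis`, its tuning parameters (`warmup`, `beta`, `init_scale`, `eps`) and the correlated Gaussian target standing in for multipartite parameters are all hypothetical. Parallel evaluation of per-partition likelihoods, which a single joint proposal makes possible, is not shown.

```python
import numpy as np


def adaptive_metropolis(log_p, x0, n_iter, rng, warmup=1000, beta=0.05,
                        init_scale=0.1, eps=1e-8):
    """Generic adaptive random-walk Metropolis (illustrative sketch only).

    After `warmup` iterations, all d parameters are proposed jointly from
    N(x, (2.38**2 / d) * C + eps * I), where C is the running empirical
    covariance of the chain; with probability `beta` a small fixed
    isotropic proposal is used instead.
    """
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    lp = log_p(x)
    samples = np.empty((n_iter, d))
    mean = np.zeros(d)       # running mean of visited states
    m2 = np.zeros((d, d))    # running sum of centred outer products (Welford)
    n_accept = 0
    for t in range(n_iter):
        if t < warmup or rng.random() < beta:
            # Fixed isotropic random walk: keeps the chain moving before
            # the empirical covariance is informative.
            prop = x + rng.normal(scale=init_scale / np.sqrt(d), size=d)
        else:
            # Joint (multivariate) proposal from the adapted covariance;
            # t states have been recorded so far, hence the (t - 1) divisor.
            cov = m2 / (t - 1)
            prop = rng.multivariate_normal(x, (2.38 ** 2 / d) * cov + eps * np.eye(d))
        lp_prop = log_p(prop)
        # Both proposal components are symmetric, so the Metropolis ratio suffices.
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
            n_accept += 1
        samples[t] = x
        # Welford update of the running mean and covariance with the current state.
        delta = x - mean
        mean += delta / (t + 1)
        m2 += np.outer(delta, x - mean)
    return samples, n_accept / n_iter


# Hypothetical stand-in for d strongly correlated partition parameters:
# a zero-mean Gaussian with unit variances and pairwise correlation 0.9.
d = 5
sigma = 0.9 * np.ones((d, d)) + 0.1 * np.eye(d)
prec = np.linalg.inv(sigma)
log_p = lambda x: -0.5 * x @ prec @ x

rng = np.random.default_rng(1)
samples, acc = adaptive_metropolis(log_p, np.zeros(d), 30_000, rng)
kept = samples[5_000:]  # discard the early, still-adapting part of the chain
print(f"acceptance rate: {acc:.2f}")
print(np.round(np.cov(kept, rowvar=False), 2))
```

On a target like this, where parameters are strongly correlated, one-at-a-time univariate updates must take many small steps along the narrow ridge of the posterior, whereas a joint proposal shaped by the learned covariance can move along it directly. This is the intuition behind preferring a multivariate kernel for many correlated partition parameters.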