The Impact of the Tree Prior on Molecular Dating of Data Sets Containing a Mixture of Inter- and Intraspecies Sampling
In Bayesian phylogenetic analyses of genetic data, prior probability distributions need to be specified for the model parameters, including the tree. When Bayesian methods are used for molecular dating, available tree priors include those designed for species-level data, such as the pure-birth and birth-death priors, and coalescent-based priors designed for population-level data. However, molecular dating methods are frequently applied to data sets that include multiple individuals across multiple species. Such data sets violate the assumptions of both the speciation and coalescent-based tree priors, making it unclear which should be chosen and whether this choice can affect the estimation of node times. To investigate this problem, we used a simulation approach to produce data sets with different proportions of within- and between-species sampling under the multispecies coalescent model. These data sets were then analyzed under pure-birth, birth-death, constant-size coalescent, and skyline coalescent tree priors. We also explored the ability of Bayesian model testing to select the best-performing priors. We confirmed the applicability of our results to empirical data sets from cetaceans, phocids, and coregonid whitefish. Estimates of node times were generally robust to the choice of tree prior, but some combinations of tree priors and sampling schemes led to large differences in the age estimates. In particular, the pure-birth tree prior frequently led to inaccurate estimates for data sets containing a mixture of inter- and intraspecific sampling, whereas the birth-death and skyline coalescent priors produced stable results across all scenarios. Model testing provided an adequate means of rejecting inappropriate tree priors. Our results suggest that tree priors do not strongly affect Bayesian molecular dating results in most cases, even when severely misspecified. However, the choice of tree prior can substantially affect the accuracy of dating results for data sets with mixed inter- and intraspecies sampling.
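The speciation and coalescent tree priors compared here make different assumptions about how node times are distributed. The sketch below illustrates that difference. It is a minimal illustration, not the authors' implementation.

It assumes an ultrametric tree with all tips sampled at the present. It works only with the log-density of the inter-node intervals, summed over topologies. Under a constant-size coalescent, the event rate with k lineages is k(k-1)/(2N), which grows quadratically. Under a pure-birth (Yule) process, the rate is kλ, which grows linearly. The Yule version starts from the root with two lineages and is not conditioned on the number of tips, which is a simplification.

The function names and parameters (`pop_size`, `birth_rate`) are hypothetical. They are not taken from BEAST or any other package.

```python
import math
from typing import Sequence


def intervals_from_node_times(node_times: Sequence[float]) -> list[float]:
    """Return inter-node intervals ordered from the tips (time 0) toward the root.

    Assumes an ultrametric tree whose n tips are all sampled at time 0, so
    node_times holds the ages of its n - 1 internal nodes.
    """
    if any(t < 0 for t in node_times):
        raise ValueError("node ages must be non-negative")
    intervals, previous = [], 0.0
    for t in sorted(node_times):
        intervals.append(t - previous)
        previous = t
    return intervals


def coalescent_log_density(node_times: Sequence[float], pop_size: float) -> float:
    """Log-density of node times under a constant-size coalescent (hypothetical helper).

    Going back in time, every interval ends with a coalescence, at rate
    k(k-1)/(2N) while k lineages are present.
    """
    intervals = intervals_from_node_times(node_times)
    n = len(intervals) + 1
    log_p = 0.0
    for i, tau in enumerate(intervals):
        k = n - i  # lineages present during this interval
        rate = k * (k - 1) / (2.0 * pop_size)
        log_p += math.log(rate) - rate * tau
    return log_p


def yule_log_density(node_times: Sequence[float], birth_rate: float) -> float:
    """Log-density of node times under a pure-birth (Yule) process (hypothetical helper).

    Starts from the root with two lineages and is not conditioned on n.
    Each interval with k < n lineages ends in a speciation at rate k * birth_rate.
    The youngest interval (n lineages) contributes only the probability of no
    further speciation before the present.
    """
    intervals = intervals_from_node_times(node_times)
    n = len(intervals) + 1
    log_p = 0.0
    for i, tau in enumerate(intervals):
        k = n - i  # lineages present during this interval
        rate = k * birth_rate
        if k == n:
            log_p -= rate * tau  # survival term: no event before the present
        else:
            log_p += math.log(rate) - rate * tau
    return log_p


if __name__ == "__main__":
    # Hypothetical mixed-sampling tree: two species with three individuals each,
    # giving four shallow within-species nodes and one deep between-species split.
    mixed = [0.01, 0.015, 0.02, 0.03, 1.0]
    print("coalescent:", coalescent_log_density(mixed, pop_size=0.05))
    print("Yule:      ", yule_log_density(mixed, birth_rate=1.0))
```

In this example, the several short intervals near the tips followed by one long interval near the root do not fit either rate pattern well. That mismatch is the kind of misspecification the abstract describes for data sets with mixed sampling.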