A robust DF-REML framework for variance components estimation in genetic studies
In genetic association studies, linear mixed models (LMMs) are used to test for associations between phenotypes and candidate single nucleotide polymorphisms (SNPs). These same models are also used to estimate heritability, which is central not only to evolutionary biology but also to the prediction of the response to selection in plant and animal breeding, as well as the prediction of disease risk in humans. However, when one or more of the underlying assumptions are violated, the estimation of variance components may be compromised and therefore so may the estimates of heritability and any other functions of these. Considering that datasets obtained from real life experiments are prone to several sources of contamination, which usually induce the violation of the assumption of the normality of the errors, a robust derivative-free restricted-maximum likelihood framework (DF-REML) together with a robust coefficient of determination are proposed for the LMM in the context of genetic studies of continuous traits.Results
The proposed approach, in addition to the robust estimation of variance components and robust computation of the coefficient of determination, allows in particular for the robust estimation of SNP-based heritability by reducing the bias and increasing the precision of its estimates. The performance of both classical and robust DF-REML approaches is compared via a Monte Carlo simulation study. Additionally, three examples of application of the methodologies to real datasets are given in order to validate the usefulness of the proposed robust approach. Although the main focus of this article is on plant breeding applications, the proposed methodology is applicable to both human and animal genetic studies.Availability and implementation
Source code implemented in R is available in the Supplementary Material.Contact
Supplementary data are available at Bioinformatics online.