Genome-wide association studies (GWAS) have had limited success when applied to complex diseases. Analyzing SNPs individually requires several large studies to integrate the often divergent results. In the presence of epistasis, multivariate approaches based on the linear model (including stepwise logistic regression) often have low sensitivity and generate an abundance of artifacts.
Methods: Recent advances in distributed and parallel processing spurred methodological advances in nonparametric statistics. U-statistics for structured multivariate data (µStat) are not confounded by unrealistic assumptions (e.g., linearity, independence).
Results: By incorporating knowledge about relationships between SNPs, µGWAS (GWAS based on µStat) can identify clusters of genes around biologically relevant pathways and pinpoint functionally relevant regions within these genes.
Conclusion: With this computational biostatistics approach increasing power and guarding against artifacts, personalized medicine and comparative effectiveness will advance while subgroup analyses of Phase III trials can now suggest risk factors for adverse events and novel directions for drug development.