TRM: A Powerful Two-Stage Machine Learning Approach for Identifying SNP-SNP Interactions

Abstract

Summary

Studies have shown that interactions among single nucleotide polymorphisms (SNPs) may play an important role in the causes of complex disease. We propose an integrated method that combines two machine learning techniques, Random Forests (RF) and Multivariate Adaptive Regression Splines (MARS), to identify a subset of important SNPs and to detect interaction patterns more effectively and efficiently. In this two-stage RF-MARS (TRM) approach, RF is first applied to select a predictive subset of SNPs, and MARS is then used to identify interaction patterns within that subset. We evaluated the performance of TRM under four models. RF variable selection was based on either the out-of-bag (OOB) classification error rate or the variable importance spectrum (IS), giving the variants RFOOB and RFIS. Our results indicate that RFOOB performed better than both MARS and RFIS in detecting important variables. This study demonstrates that TRMOOB, which is RFOOB followed by MARS, combines the strengths of RF and MARS in identifying SNP-SNP interactions in a scenario of 100 candidate SNPs. TRMOOB had a higher true positive rate and a lower false positive rate than MARS, particularly when searching for interactions strongly associated with the outcome. The use of TRMOOB is therefore favored for exploring SNP-SNP interactions in large-scale genetic variation studies.
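
The two-stage idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: to stay self-contained it replaces RF with a marginal chi-square screen and replaces MARS with an exhaustive pairwise chi-square search over the retained SNPs. The simulated data, the function names (simulate, screen, best_pair), the choice of 10 retained SNPs, and the planted interaction between SNP 0 and SNP 1 are all assumptions made for illustration. What it shows is the structure of the approach: screening 100 candidate SNPs down to 10 reduces the pairwise search from 4950 pairs to 45.

```python
import itertools
import math
import random


def simulate(n_samples=400, n_snps=100, seed=0):
    """Hypothetical data: genotypes coded 0/1/2; risk is raised only when
    SNP 0 and SNP 1 both carry at least one minor allele."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        g = [rng.choice([0, 1, 2]) for _ in range(n_snps)]
        risk = 0.9 if (g[0] > 0 and g[1] > 0) else 0.1
        X.append(g)
        y.append(1 if rng.random() < risk else 0)
    return X, y


def chi_square(groups, y):
    """Pearson chi-square statistic of a categorical grouping vs. a binary outcome."""
    n, cases = len(y), sum(y)
    counts = {}
    for g, out in zip(groups, y):
        tot, pos = counts.get(g, (0, 0))
        counts[g] = (tot + 1, pos + out)
    stat = 0.0
    for tot, pos in counts.values():
        exp_pos = tot * cases / n
        exp_neg = tot - exp_pos
        stat += (pos - exp_pos) ** 2 / exp_pos
        stat += ((tot - pos) - exp_neg) ** 2 / exp_neg
    return stat


def screen(X, y, keep):
    """Stage 1 stand-in (the paper uses RF): keep the top-scoring SNPs."""
    scores = [(chi_square([row[j] for row in X], y), j) for j in range(len(X[0]))]
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:keep])


def best_pair(X, y, selected):
    """Stage 2 stand-in (the paper uses MARS): score every pair of retained
    SNPs by the chi-square of their joint genotype against the outcome."""
    scored = []
    for a, b in itertools.combinations(selected, 2):
        joint = [(row[a], row[b]) for row in X]
        scored.append((chi_square(joint, y), (a, b)))
    return max(scored)[1]


X, y = simulate()
selected = screen(X, y, keep=10)
pair = best_pair(X, y, selected)
print("pairs before screening:", math.comb(len(X[0]), 2))
print("pairs after screening: ", math.comb(len(selected), 2))
print("retained SNPs:", selected)
print("top-scoring pair:", pair)
```

A marginal screen of this kind only retains SNPs that show some main effect, whereas a tree-based method such as RF can also pick up variables whose signal appears mainly through interactions; that difference is one reason the stand-in should not be read as equivalent to the RF stage described above.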
