A knowledge-based approach for predicting gene-disease associations




Recent advances in next-generation sequencing technologies have made it possible to identify gene variations rapidly and inexpensively. Knowing which diseases these variations are associated with is important for early intervention against deadly diseases and for identifying possible therapeutic targets. Genome-wide association studies (GWAS) have identified many individual genes associated with common diseases. To exploit the large amount of data obtained from GWAS and to leverage our understanding of both common and rare diseases, we have developed a knowledge-based approach to predict gene-disease associations. We first derive gene-gene mutual information from the co-occurrence of genes in known gene-disease association data. The mutual information is then combined with known protein-protein interaction networks by a boosted tree regression method.
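The two steps above can be illustrated with a minimal, dependency-free sketch. It is not the Know-GENE implementation, and several details are assumptions: each disease is treated as one observation, so every gene becomes a binary variable (associated or not); mutual information is the standard formula for two binary variables; the features `max_mi` and `frac_ppi_known` produced by the hypothetical helper `candidate_features`, and the input formats (`associations` as disease to gene set, `ppi` as gene to neighbor set), are illustrative choices. Boosting is shown with depth-1 regression trees (stumps) under squared loss, which is one simple instance of boosted tree regression.

```python
import math
from itertools import combinations


def gene_gene_mutual_information(associations):
    """Pairwise MI between genes, treating each disease as one observation.

    associations: hypothetical input mapping disease -> set of associated genes.
    Gene g defines a binary variable X_g (1 if g is associated with a disease).
    """
    n = len(associations)
    genes = sorted(set().union(*associations.values()))
    diseases_of = {g: {d for d, gs in associations.items() if g in gs} for g in genes}
    mi = {}
    for g, h in combinations(genes, 2):
        dg, dh = diseases_of[g], diseases_of[h]
        n11 = len(dg & dh)
        cells = (  # (joint count, marginal count for X_g, marginal count for X_h)
            (n11, len(dg), len(dh)),
            (len(dg) - n11, len(dg), n - len(dh)),
            (len(dh) - n11, n - len(dg), len(dh)),
            (n - len(dg | dh), n - len(dg), n - len(dh)),
        )
        # MI = sum p(x,y) log[p(x,y) / (p(x) p(y))]; empty cells contribute 0.
        mi[(g, h)] = sum((c / n) * math.log(c * n / (a * b)) for c, a, b in cells if c > 0)
    return mi


def candidate_features(gene, known, mi, ppi):
    """Hypothetical features: max MI to known disease genes, share of PPI neighbors that are known."""
    max_mi = max((mi.get(tuple(sorted((gene, k))), 0.0) for k in known if k != gene), default=0.0)
    neighbors = ppi.get(gene, set())
    frac_ppi_known = len(neighbors & known) / len(neighbors) if neighbors else 0.0
    return [max_mi, frac_ppi_known]


def fit_stump(X, r):
    """Single-feature threshold split minimizing squared error on residuals r."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: predict the mean residual everywhere
        m = sum(r) / len(r)
        return lambda row: m
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def boost(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with stumps under squared loss: each stump fits current residuals."""
    base = sum(y) / len(y)
    stumps, pred = [], [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(row) for pi, row in zip(pred, X)]
    return lambda row: base + learning_rate * sum(s(row) for s in stumps)
```

In this sketch, known disease genes would be labeled 1 and other candidates 0, and candidates would be ranked by the boosted model's score. Note that mutual information is symmetric in sign: two genes that never co-occur across diseases can score as high as two that always co-occur, so a real scoring scheme may need to distinguish the two cases.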


The method, called Know-GENE, is compared with a random walk on a heterogeneous network using the same input data. For a set of 960 diseases, with both methods using the same training and testing data in 3-fold cross-validation, the average recall within the top 100 ranked genes is 65.0% for Know-GENE, compared with 37.9% for the state-of-the-art random walk on a heterogeneous network. This significant improvement is mostly due to the inclusion of knowledge-based mutual information.
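The evaluation metric can be made concrete with a short sketch. The abstract does not say how the folds are built or how recall is averaged, so the following are assumptions: each disease's known genes are split into three folds, one fold is hidden per round, candidate rankings are assumed to exclude the training genes, and recall within the top 100 is computed per disease and then averaged over diseases. The function names are hypothetical.

```python
import random


def three_fold_splits(known_genes, seed=0):
    """Yield (train, test) gene sets for one disease; each gene is held out exactly once."""
    genes = sorted(known_genes)
    random.Random(seed).shuffle(genes)
    folds = [set(genes[i::3]) for i in range(3)]
    for test in folds:
        yield set(genes) - test, test


def recall_at_k(ranked_genes, held_out, k=100):
    """Fraction of held-out genes that appear among the top-k ranked candidates."""
    top_k = set(ranked_genes[:k])
    return len(top_k & held_out) / len(held_out)


def average_recall_at_k(rankings, held_out_by_disease, k=100):
    """Mean of per-disease recall@k over diseases with at least one held-out gene."""
    recalls = [
        recall_at_k(rankings[d], held, k)
        for d, held in held_out_by_disease.items()
        if held
    ]
    return sum(recalls) / len(recalls)
```

Averaging per disease gives every disease equal weight regardless of how many known genes it has; pooling all held-out genes instead would weight well-studied diseases more heavily, which is one reason the averaging scheme matters when comparing reported recall figures.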

Availability and Implementation:

Predictions for genes associated with the 960 diseases are available at http://cssb2.biology.gatech.edu/knowgene.


