Motivation: Inter-residue contacts in proteins dictate the topology of protein structures. They are crucial for protein folding and structural stability. Accurate prediction of residue contacts especially for long-range contacts is important to the quality of ab inito structure modeling since they can enforce strong restraints to structure assembly.
Results: In this paper, we present a new Residue-Residue Contact predictor called R2C that combines machine learning-based and correlated mutation analysis-based methods, together with a two-dimensional Gaussian noise filter to enhance the long-range residue contact prediction. Our results show that the outputs from the machine learning-based method are concentrated with better performance on short-range contacts; while for correlated mutation analysis-based approach, the predictions are widespread with higher accuracy on long-range contacts. An effective query-driven dynamic fusion strategy proposed here takes full advantages of the two different methods, resulting in an impressive overall accuracy improvement. We also show that the contact map directly from the prediction model contains the interesting Gaussian noise, which has not been discovered before. Different from recent studies that tried to further enhance the quality of contact map by removing its transitive noise, we designed a new two-dimensional Gaussian noise filter, which was especially helpful for reinforcing the long-range residue contact prediction. Tested on recent CASP10/11 datasets, the overall top L/5 accuracy of our final R2C predictor is 17.6%/15.5% higher than the pure machine learning-based method and 7.8%/8.3% higher than the correlated mutation analysis-based approach for the long-range residue contact prediction.
Availability and Implementation:http://www.csbio.sjtu.edu.cn/bioinf/R2C/
Supplementary information: Supplementary data are available at Bioinformatics online.