Motivated by the rapidly growing number of protein structures in the Protein Data Bank that carry the information needed to predict protein–protein interaction sites, we develop methods for identifying residues that participate in protein–protein interactions. We compare a conditional random fields (CRFs)-based method with conventional classification-based methods, which ignore the relation between the labels of neighboring residues, to show the advantages of the CRFs-based method in predicting protein–protein interaction sites.

Results
The prediction of protein–protein interaction sites is formulated as a sequential labeling problem and solved by applying CRFs with features that include the protein sequence profile and residue accessible surface area. The CRFs-based method achieves performance comparable to state-of-the-art methods when 1276 nonredundant hetero-complex protein chains are used as the training and test sets. Experimental results show that the CRFs-based method is a powerful and robust method for predicting protein–protein interaction sites and can guide biologists in designing targeted experiments on proteins.
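To illustrate why modeling the relation between neighboring labels can matter, the following minimal sketch contrasts per-residue classification with Viterbi decoding in a linear-chain model. It is not the implementation evaluated here: the two-label scheme (`I` for interface, `N` for non-interface), the per-residue scores, and the transition scores are all hypothetical toy values, whereas in a trained CRF they would be derived from learned weights over features such as the sequence profile and accessible surface area.

```python
# Toy linear-chain decoding sketch; all scores below are invented for illustration.
LABELS = ("N", "I")  # assumed label set: non-interface, interface


def independent_labels(emissions):
    """Label each residue on its own score, ignoring its neighbors."""
    return [max(LABELS, key=lambda y: scores[y]) for scores in emissions]


def viterbi(emissions, transitions):
    """Return the label sequence maximizing emission plus transition scores."""
    best = [dict(emissions[0])]  # best[i][y]: best score of a path ending in y at i
    back = []                    # back[i][y]: predecessor label on that best path
    for scores in emissions[1:]:
        cur, ptr = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + transitions[(p, y)])
            cur[y] = best[-1][prev] + transitions[(prev, y)] + scores[y]
            ptr[y] = prev
        best.append(cur)
        back.append(ptr)
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical per-residue scores for a four-residue stretch.
emissions = [
    {"N": 0.2, "I": 1.0},
    {"N": 0.6, "I": 0.5},  # weakly favors N when viewed in isolation
    {"N": 0.1, "I": 1.2},
    {"N": 1.5, "I": 0.0},
]
# Hypothetical transition scores that reward neighboring residues sharing a label.
transitions = {
    ("N", "N"): 0.5, ("I", "I"): 0.5,
    ("N", "I"): -0.5, ("I", "N"): -0.5,
}

independent = independent_labels(emissions)
crf_path = viterbi(emissions, transitions)
print(independent)  # ['I', 'N', 'I', 'N']
print(crf_path)     # ['I', 'I', 'I', 'N']
```

In this toy case the second residue is labeled non-interface when scored alone, but the sequence-level decoding labels it as interface because both of its neighbors are strongly predicted as interface residues.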