Protein interactions provide an important context for the understanding of function. Experimental approaches have been complemented with computational ones, such as PSIMAP, which computes domain–domain interactions for all multi-domain and multi-chain proteins in the Protein Data Bank (PDB). PSIMAP has been used to determine that superfamilies occurring in many species have many interaction partners, to show examples of convergent evolution through shared interaction partners and to uncover complexes in the interaction map.
Motivation: To determine an interaction, the original PSIMAP algorithm checks all residue pairs of any domain pair defined by classification systems such as SCOP. For the PDB, this computation takes several days. The approach has two shortcomings. First, the original PSIMAP algorithm considers only interactions of residue pairs rather than atom pairs, which loses information needed for detailed analysis of contact patterns; at the atomic level the original algorithm would take months. Second, given the superlinear growth of the PDB, the PSIMAP computation is not sustainable.
Results: We address these two shortcomings by developing a family of new algorithms for the computation of domain–domain interactions based on the idea of bounding shapes, which are used to prune the search space. The best of the algorithms improves on the old PSIMAP algorithm by a factor of 60 on the PDB. Additionally, the algorithms allow distributed computation, which we carry out on a farm of 80 Linux PCs. Overall, the new algorithms reduce the computation at the atomic level from months to 20 min. The combination of pruning and distribution makes the new algorithm scalable and sustainable even with the superlinear growth of the PDB.
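
To illustrate how a bounding shape can prune the search space, the following minimal sketch uses axis-aligned bounding boxes as one possible shape. It is not the published PSIMAP implementation: the function names, the 5 Å distance cutoff and the minimum of 5 atom contacts are illustrative assumptions, and domains are represented simply as lists of (x, y, z) atom coordinates.

```python
from itertools import product
from math import dist


def bounding_box(atoms):
    """Return the (min corner, max corner) of the box enclosing all atoms."""
    xs, ys, zs = zip(*atoms)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def boxes_within(box_a, box_b, cutoff):
    """True if the gap between the boxes is at most `cutoff` on every axis.

    If this is False, no atom pair can lie within `cutoff` of each other,
    so the exhaustive pairwise check can be skipped safely.
    """
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    return all(
        lo_a[i] - hi_b[i] <= cutoff and lo_b[i] - hi_a[i] <= cutoff
        for i in range(3)
    )


def count_contacts(atoms_a, atoms_b, cutoff):
    """Exhaustively count atom pairs closer than or equal to `cutoff`."""
    return sum(1 for a, b in product(atoms_a, atoms_b) if dist(a, b) <= cutoff)


def domains_interact(atoms_a, atoms_b, cutoff=5.0, min_contacts=5):
    """Decide whether two domains interact (thresholds are assumptions)."""
    # Cheap test first: distant domain pairs are pruned here.
    if not boxes_within(bounding_box(atoms_a), bounding_box(atoms_b), cutoff):
        return False
    # Only nearby domain pairs pay for the full pairwise comparison.
    return count_contacts(atoms_a, atoms_b, cutoff) >= min_contacts
```

Because the box test is conservative, it never discards a pair that the exhaustive check would accept; it only avoids the quadratic comparison for domain pairs that are clearly too far apart. Each domain pair is evaluated independently, which is also what makes this kind of computation straightforward to distribute across machines.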