Molecular descriptors that are important for establishing quantitative structure-activity relationships are investigated for their ability to distinguish similar from dissimilar peptides. When searching for new lead structures, synthesizing and testing compounds that are too similar wastes time and resources. In contrast, any lead optimization program requires the investigation of compounds similar to that lead. Thus, it is important to be able to maximize or minimize the structural diversity of peptides in order to design useful compound libraries for lead finding or lead refinement projects, respectively.
If a molecular descriptor is a useful measure of similarity for the design of peptide libraries, small differences in this descriptor for a pair of molecules should translate only into small differences in biological activity. Using this paradigm as a basis for descriptor validation, it was possible to rank different molecular descriptors. The physicochemical descriptors compared are 2D fingerprints and five experimentally or theoretically derived principal property scales. Some of the theoretically derived metrics are obtained by computing interaction energies or similarity indices at predefined 3D grid points, using canonical conformations of the individual amino acids. The resulting 3D data matrices are analyzed by principal component analysis, leading to three principal properties for the molecular fields derived from CoMFA (Comparative Molecular Field Analysis) or CoMSIA (Comparative Molecular Similarity Index Analysis).
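The two computational steps described above can be illustrated with a minimal sketch. It assumes that each amino acid is represented by one row of field values sampled at the grid points, and that each peptide has a descriptor vector and a single measured activity. The function names, the Euclidean distance, the two cutoffs, and the "violation fraction" score are hypothetical choices made for illustration only; they are not the validation statistic or the software used in this work.

```python
import numpy as np


def principal_properties(field_matrix, n_components=3):
    """PCA scores from a (n_amino_acids x n_grid_points) field matrix.

    Illustrative only: plain column-centered PCA via SVD, no scaling.
    """
    X = np.asarray(field_matrix, dtype=float)
    X = X - X.mean(axis=0)  # center each grid-point column
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    # Scores on the leading components, ordered by explained variance.
    return U[:, :n_components] * s[:n_components]


def pairwise_differences(descriptors, activities):
    """Descriptor distance and absolute activity difference for all pairs."""
    D = np.asarray(descriptors, dtype=float)
    a = np.asarray(activities, dtype=float)
    i, j = np.triu_indices(len(a), k=1)  # each unordered pair once
    desc_dist = np.linalg.norm(D[i] - D[j], axis=1)  # assumed: Euclidean
    act_diff = np.abs(a[i] - a[j])
    return desc_dist, act_diff


def violation_fraction(desc_dist, act_diff, dist_cutoff, act_cutoff):
    """Fraction of 'similar' pairs that still differ strongly in activity.

    Hypothetical score: a descriptor that is a useful similarity measure
    should give a low value, because close pairs should behave alike.
    """
    close = desc_dist <= dist_cutoff
    if not close.any():
        return 0.0
    return float(np.mean(act_diff[close] > act_cutoff))
```

Under these assumptions, competing descriptors could be compared by computing `violation_fraction` for each on the same peptide set with the same activity cutoff; a lower value indicates that small descriptor differences more reliably correspond to small biological differences.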
The descriptor validation results reveal the applicability of design tools to peptide data sets. In general, experimentally derived descriptors are more acceptable than computationally derived metrics, while the latter provide a statistically valid alternative for characterizing novel building blocks. The CoMSIA metrics perform slightly better than the CoMFA-based principal properties, while GRID-based descriptors are always less acceptable.