Multi-label classifier based on histogram of gradients for predicting the anatomical therapeutic chemical class/classes of a given compound
Given an unknown compound, is it possible to predict its Anatomical Therapeutic Chemical class/classes? This is a challenging yet important problem since such a prediction could be used to deduce not only a compound's possible active ingredients but also its therapeutic, pharmacological and chemical properties, thereby substantially expediting the pace of drug development. The problem is challenging because some drugs and compounds belong to two or more ATC classes, making machine learning extremely difficult.Results:
In this article a multi-label classifier system is proposed that incorporates information about a compound's chemical-chemical interaction and its structural and fingerprint similarities to other compounds belonging to the different ATC classes. The proposed system reshapes a 1D feature vector to obtain a 2D matrix representation of the compound. This matrix is then described by a histogram of gradients that is fed into a Multi-Label Learning with Label-Specific Features classifier. Rigorous cross-validations demonstrate the superior prediction quality of this method compared with other state-of-the-art approaches developed for this problem, a superiority that is reflected particularly in the absolute true rate, the most important and harshest metric for assessing multi-label systems.Availability and implementation:
The MATLAB code for replicating the experiments presented in this article is available at https://www.dropbox.com/s/7v1mey48tl9bfgz/ToolPaperATC.rar?dl=0.Contact:
Supplementary data are available at Bioinformatics online.