Feature construction with Inductive Logic Programming: A Study of Quantitative Predictions of Biological Activity Aided by Structural Attributes*


    loading  Checking for direct PDF access through Ovid

Abstract

Recently, computer programs developed within the field of Inductive Logic Programming (ILP) have received some attention for their ability to construct restricted first-order logic solutions using problem-specific background knowledge. Prominent applications of such programs have been concerned with determining “structure-activity” relationships in the areas of molecular biology and chemistry. Typically the task here is to predict the “activity” of a compound (for example, toxicity), from its chemical structure. A summary of the research in the area is: (a) ILP programs have largely been restricted to qualitative predictions of activity (“high”, “low” etc.); (b) When appropriate attributes are available, ILP programs have equivalent predictivity to standard quantitative analysis techniques like linear regression. However ILP programs usually perform better when such attributes are unavailable; and (c) By using structural information as background knowledge, an ILP program can provide comprehensible explanations for biological activity. This paper examines the use of ILP programs as a method of “discovering” new attributes. These attributes could then be used by methods like linear regression, thus allowing for quantitative predictions while retaining the ability to use structural information as background knowledge. Using structure-activity tasks as a test-bed, the utility of ILP programs in constructing new features was evaluated by examining the prediction of biological activity using linear regression, with and without the aid of ILP learnt logical attributes. In three out of the five data sets examined the addition of ILP attributes produced statistically better results. In addition six important structural features that have escaped the attention of the expert chemists were discovered. The method used here to construct new attributes is not specific to the problem of predicting biological activity, and the results obtained suggest a wider role for ILP programs in aiding the process of scientific discovery.

    loading  Loading Related Articles