Introduction Genomic profiling information is frequently available to oncologists, enabling targeted cancer therapy. Because clinically relevant information is rapidly emerging in the literature and elsewhere, there is a need for informatics technologies to support targeted therapies. To this end, we have developed a system for Automated Identification of Molecular Effects of Drugs, to help biomedical scientists curate this literature to facilitate decision support.
Objectives To create an automated system to identify assertions in the literature concerning drugs targeting genes with therapeutic implications and characterize the challenges inherent in automating this process in rapidly evolving domains.
Methods We used subject-predicate-object triples (semantic predications) and co-occurrence relations generated by applying the SemRep Natural Language Processing system to MEDLINE abstracts and ClinicalTrials.gov descriptions. We applied customized semantic queries to find drugs targeting genes of interest. The results were manually reviewed by a team of experts.
Results Compared to a manually curated set of relationships, recall, precision, and F2 were 0.39, 0.21, and 0.33, respectively, which represents a 3- to 4-fold improvement over a publically available set of predications (SemMedDB) alone. Upon review of ostensibly false positive results, 26% were considered relevant additions to the reference set, and an additional 61% were considered to be relevant for review. Adding co-occurrence data improved results for drugs in early development, but not their better-established counterparts.
Conclusions Precision medicine poses unique challenges for biomedical informatics systems that help domain experts find answers to their research questions. Further research is required to improve the performance of such systems, particularly for drugs in development.