Exploiting non-linear relationships between retention time and molecular structure of peptides originating from proteomes and comparing three multivariate approaches

Abstract

Peptide retention time prediction is gaining popularity in liquid chromatography–tandem mass spectrometry (LC–MS/MS)-based proteomics. It is a promising approach for improving proteome mapping and is useful in both identification and quantification workflows. In this work, a quantitative structure-retention relationships (QSRR) model was developed to predict retention time directly from the molecular structure of 185 peptides originating from 8 well-characterized proteins and two Bacillus subtilis proteomes. A Genetic Algorithm (GA) was used to select a subset of molecular descriptors and was coupled with three machine learning regression methods: Support Vector Regression (SVR), Artificial Neural Networks (ANN), and kernel Partial Least Squares (kPLS). The final GA-SVR, GA-ANN, and GA-kPLS models were validated on an external validation set of 95 peptides originating from human epithelial HeLa cell proteomes. Robustness and stability were ensured by defining the models' applicability domain. The descriptors of the developed models were interpreted, confirming a causal relationship between molecular structure parameters and retention time. The GA-SVR model was shown to be superior to the others in terms of both predictive ability and interpretability of the selected descriptors.
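The general idea of coupling a GA with a regressor for descriptor selection can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, every function name and GA setting below is hypothetical, and a leave-one-out nearest-neighbour regressor stands in for SVR, ANN, or kPLS so that the example needs only the Python standard library. Each GA individual is a Boolean mask over the descriptors, and its fitness is the cross-validated error of the regressor restricted to the selected descriptors.

```python
import random


def loo_error(X, y, mask):
    """Leave-one-out MSE of a 1-nearest-neighbour regressor that only
    sees the descriptors switched on in `mask` (stand-in for SVR/ANN/kPLS)."""
    cols = [j for j, on in enumerate(mask) if on]
    if not cols:
        return float("inf")  # an empty descriptor subset is never acceptable
    total = 0.0
    for i in range(len(X)):
        best_d, best_y = float("inf"), 0.0
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_y = d, y[k]
        total += (y[i] - best_y) ** 2
    return total / len(X)


def ga_select(X, y, n_generations=15, pop_size=12, p_mut=0.1, seed=0):
    """Search descriptor subsets with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    n_desc = len(X[0])
    # Random initial masks, plus the all-descriptors mask as a baseline.
    pop = [[rng.random() < 0.5 for _ in range(n_desc)]
           for _ in range(pop_size - 1)]
    pop.append([True] * n_desc)
    for _ in range(n_generations):
        ranked = sorted(pop, key=lambda m: loo_error(X, y, m))
        parents = ranked[: pop_size // 2]  # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_desc)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per descriptor.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    best = min(pop, key=lambda m: loo_error(X, y, m))
    return best, loo_error(X, y, best)


def make_toy_data(n=30, n_desc=6, seed=1):
    """Synthetic 'descriptors'; only columns 0 and 2 drive the response."""
    rng = random.Random(seed)
    X = [[rng.uniform(0, 1) for _ in range(n_desc)] for _ in range(n)]
    y = [10 * row[0] + 5 * row[2] ** 2 + rng.gauss(0, 0.1) for row in X]
    return X, y


X, y = make_toy_data()
mask, err = ga_select(X, y)
baseline = loo_error(X, y, [True] * len(X[0]))
print("selected descriptors:", [j for j, on in enumerate(mask) if on])
print("LOO MSE (selected) vs (all):", err, baseline)
```

Because the all-descriptors mask is seeded into the initial population and the best individuals always survive to the next generation, the selected subset can never score worse than using every descriptor. In a setting like the one described above, the stand-in regressor would be replaced by the model of interest, and the final model would still need checking on an external validation set.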
