|| Checking for direct PDF access through Ovid
Street cocaine is typically altered with several compounds that increase its harmful health-related side effects, most notably depression, convulsions, and severe damages to the cardiovascular system, lungs, and brain. Thus, determining the concentration of cocaine and adulterants in seized drug samples is important from both health and forensic perspectives. Although FTIR has been widely used to identify the fingerprint and concentration of chemical compounds, spectroscopy datasets are usually comprised of thousands of highly correlated wavenumbers which, when used as predictors in regression models, tend to undermine the predictive performance of multivariate techniques. In this paper, we propose an FTIR wavenumber selection method aimed at identifying FTIR spectra intervals that best predict the concentration of cocaine and adulterants (e.g. caffeine, phenacetin, levamisole, and lidocaine) in cocaine samples. For that matter, the Mutual Information measure is integrated into a Quadratic Programming problem with the objective of minimizing the probability of retaining redundant wavenumbers, while maximizing the relationship between retained wavenumbers and compounds' concentrations. Optimization outputs guide the order of inclusion of wavenumbers in a predictive model, using a forward-based wavenumber selection method. After the inclusion of each wavenumber, parameters of three alternative regression models are estimated, and each model's prediction error is assessed through the Mean Average Error (MAE) measure; the recommended subset of retained wavenumbers is the one that minimizes the prediction error with maximum parsimony. Using our propositions in a dataset of 115 cocaine samples we obtained a best prediction model with average MAE of 0.0502 while retaining only 2.29% of the original wavenumbers, increasing the predictive precision by 0.0359 when compared to a model using the complete set of wavenumbers as predictors.