This study aimed to develop a genotypic method to predict HIV-1 coreceptor usage by employing the nucleotide sequence of the env gene in a tree-augmented naive Bayes (TAN) classifier, and to evaluate its accuracy in prediction compared with other available tools.
Methods
A wrapper data-mining strategy interleaved with a TAN algorithm was employed to evaluate the predictive value of each nucleotide position throughout the HIV-1 env gene. Based on these results, different nucleotide positions were selected to develop a TAN classifier, which was employed to predict the coreceptor tropism of all the full-length env gene sequences with information on coreceptor tropism currently available in the Los Alamos HIV Sequence Database.
Results
Employing 26 nucleotide positions in the TAN classifier, an accuracy of 95.6%, a specificity (identification of CCR5-tropic viruses) of 99.4% and a sensitivity (identification of CXCR4/dual-tropic viruses) of 80.5% were achieved for the in silico cross-validation. Compared with the phenotypic determination of coreceptor usage, the TAN algorithm achieved more accurate predictions than WebPSSM and Geno2pheno [coreceptor] (P < 0.05).
Conclusions
The methodology presented in this work constitutes a robust strategy to identify genetic patterns throughout the HIV-1 env gene that are differentially present in CCR5-tropic and CXCR4/dual-tropic viruses. Moreover, the TAN classifier can be used as a genotypic tool to predict the coreceptor usage of HIV-1 isolates, yielding more accurate predictions than other widely used genotypic tools. The use of this algorithm could improve the prescribing of CCR5 antagonist drugs to HIV-1-infected patients.
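As an illustration of how the performance figures reported above are defined, the following minimal sketch computes accuracy, sensitivity and specificity from paired phenotypic and predicted tropism labels. It follows the convention stated in the Results: CXCR4/dual-tropic viruses are the positive class (sensitivity) and CCR5-tropic viruses the negative class (specificity). The function name `tropism_metrics`, the labels `"R5"` and `"X4"`, and the toy label lists are assumptions made for this example; they are not taken from the study, and the numbers are not study data.

```python
from typing import Dict, Sequence


def tropism_metrics(observed: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for tropism calls.

    Assumed labels (hypothetical): "X4" = CXCR4/dual-tropic (positive class),
    "R5" = CCR5-tropic (negative class).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")

    pairs = list(zip(observed, predicted))
    tp = sum(o == "X4" and p == "X4" for o, p in pairs)  # X4 called X4
    tn = sum(o == "R5" and p == "R5" for o, p in pairs)  # R5 called R5
    fp = sum(o == "R5" and p == "X4" for o, p in pairs)  # R5 called X4
    fn = sum(o == "X4" and p == "R5" for o, p in pairs)  # X4 called R5

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both tropism classes must be present in observed")

    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # identification of CXCR4/dual-tropic viruses
        "specificity": tn / (tn + fp),  # identification of CCR5-tropic viruses
    }


# Toy example only: 8 CCR5-tropic and 2 CXCR4/dual-tropic isolates.
observed = ["R5"] * 8 + ["X4"] * 2
predicted = ["R5"] * 7 + ["X4"] + ["X4", "R5"]
metrics = tropism_metrics(observed, predicted)
```

In this toy case one CCR5-tropic isolate is miscalled as CXCR4/dual-tropic and one CXCR4/dual-tropic isolate is missed. When CCR5-tropic sequences dominate a data set, overall accuracy can stay high even if sensitivity is considerably lower, which is why the three figures are reported separately.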