1 Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
2 Sweden Bioinformatics Infrastructure for Life Sciences (BILS), Stockholm University, Box 1031, 17121 Solna, Stockholm, Sweden
Motivation: Knowledge of the correct subcellular localization of a protein is necessary for understanding its function. Unfortunately, large-scale experimental studies are limited in their accuracy, so the development of prediction methods has been limited by the amount of accurate experimental data. Recently, however, large-scale experimental studies have provided new data that can be used to evaluate the accuracy of subcellular localization predictions in human cells. Using these data, we examined the performance of state-of-the-art methods and developed SubCons, an ensemble method that combines four predictors using a Random Forest classifier.
Results: SubCons outperforms earlier methods on a dataset of proteins whose subcellular localization is confirmed by two independent methods. With nine subcellular localizations, SubCons achieves an F1-score of 0.79, compared with 0.70 for the second-best method. Furthermore, at a false positive rate (FPR) of 1%, the true positive rate (TPR) of SubCons is over 58%, compared with less than 50% for the best individual predictor.
Availability and Implementation: SubCons is freely available as a webserver (http://subcons.bioinfo.se), and its source code is available from https://bitbucket.org/salvatore_marco/subcons-web-server. The golden dataset is also available from http://subcons.bioinfo.se/pred/download.
Contact: firstname.lastname@example.org
Supplementary information: Supplementary data are available at Bioinformatics online.
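The abstract reports the TPR reached at a fixed FPR of 1%. The abstract does not say exactly how this figure is computed, so the following is only a minimal sketch under stated assumptions. It assumes that a single localization is scored one-vs-rest, and that the decision threshold is lowered until the FPR would exceed the limit. The function name `tpr_at_fpr` is hypothetical and is not part of the SubCons code.

```python
from typing import Sequence


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float = 0.01) -> float:
    """Highest TPR reachable while the FPR stays <= max_fpr (hypothetical helper).

    Assumption: one localization evaluated one-vs-rest.
    scores: predicted probability that each protein is in that localization
    labels: 1 if the protein truly is in that localization, else 0
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative examples")

    # Rank proteins from highest to lowest score; lowering the threshold
    # admits them as predicted positives in this order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    best_tpr = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Proteins tied at the same score must be admitted together,
        # because no threshold can separate them.
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg <= max_fpr:
            best_tpr = tp / n_pos
        else:
            # The FPR can only grow as the threshold drops, so stop here.
            break
    return best_tpr
```

With four true members of a localization and 100 non-members, one of which scores highly, the result depends on how strict the FPR limit is. For example, `tpr_at_fpr([0.95, 0.9, 0.8, 0.4] + [0.85] + [0.1] * 99, [1] * 4 + [0] * 100, 0.005)` can only use the threshold above the high-scoring non-member, which gives a TPR of 0.5.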