Motivation: Omics studies aim to find significant changes due to biological or functional perturbation. However, gene and protein expression profiling experiments contain inherent technical variation. In discovery proteomics studies where the number of samples is typically small, technical variation plays an important role because it contributes considerably to the observed variation. Previous methods place both technical and biological variations in tightly integrated mathematical models that are difficult to adapt for different technological platforms. Our aim is to derive a statistical framework that allows the inclusion of a wide range of technical variability.
Results: We introduce a new method called the simulated linear test, or the s-test, that is easy to implement and easy to adapt for different models of technical variation. It generates virtual data points from the observed values according to a pre-defined technical distribution and subsequently employs linear modeling for significance analysis. We demonstrate the flexibility of the proposed approach by deriving a new significance test for quantitative discovery proteomics for which missing values have been a major issue for traditional methods such as the t-test. We evaluate the result on two label-free (phospho) proteomics datasets based on ion-intensity quantitation.
Availability and Implementation: Available at http://www.oncoproteomics.nl/software/stest.html.