The identification of genes or molecular regulatory mechanisms implicated in biological processes often requires the discretization, and in particular booleanization, of gene expression measurements. However, currently used methods mostly classify each measurement into an active or inactive state regardless of its statistical support possibly leading to downstream analysis conclusions based on spurious booleanization results.Results:
In order to overcome the lack of certainty inherent in current methodologies and to improve the process of discretization, we introduce RefBool, a reference-based algorithm for discretizing gene expression data. Instead of requiring each measurement to be classified as active or inactive, RefBool allows for the classification of a third state that can be interpreted as an intermediate expression of genes. Furthermore, each measurement is associated to a p- and q-value indicating the significance of each classification. Validation of RefBool on a neuroepithelial differentiation study and subsequent qualitative and quantitative comparison against 10 currently used methods supports its advantages and shows clear improvements of resulting clusterings.Availability and Implementation:
The software is available as MATLAB files in the Supplementary Information and as an online repository (https://github.com/saschajung/RefBool).Contact:
Supplementary data are available at Bioinformatics online.