Motivation:
A significant difference in the distribution of a feature between two gene sets can provide insight into function or regulation. This statistical setting differs from much of hypothesis testing theory because the genome is often considered to be effectively fixed, finite and entirely known in commonly studied organisms, such as human. The Mann-Whitney U test is commonly employed in this scenario even though its assumptions are not met, which leads to unreliable and generally underpowered results. Permutation tests are also commonly employed for this purpose, but they are computationally burdensome and are not tractable for obtaining small P values or for multiple comparisons.
Results:
We present an exact test for the null hypothesis that gene set membership is independent of the quantitative gene feature of interest. We derive an analytic expression for the randomization distribution of the median of the quantitative feature under the null hypothesis. An efficient implementation permits calculation of precise P values of arbitrary magnitude and makes thousands of simultaneous tests of transcriptome-sized gene sets computationally tractable. The flexibility of the hypothesis testing framework permits extension to a variety of related tests commonly found in genomics. We use the exact test to identify signatures of translation control and protein function in the human genome.
Availability and implementation:
The exact test presented here is implemented in the R package kpmt, available on CRAN.
Supplementary information:
Supplementary data are available at Bioinformatics online.
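To make the idea of a randomization distribution of the median concrete, the following minimal Python sketch computes an exact one-sided P value for the median of a gene set drawn without replacement from a finite genome. It is an illustration under stated assumptions, not the authors' analytic expression and not the kpmt interface: the function names `median_upper_tail` and `brute_force_upper_tail` are hypothetical, only odd set sizes are handled, and only the upper tail is computed. It relies on the standard fact that the median of a size-k subset is at least x exactly when no more than (k-1)/2 of the sampled values fall strictly below x, which is a hypergeometric count.

```python
from fractions import Fraction
from itertools import combinations
from math import comb
from statistics import median


def median_upper_tail(values, k, observed):
    """Exact P(median of a random size-k subset >= observed).

    Hypothetical helper for illustration only; `values` is the feature
    for every gene in the (finite, fully known) genome, and k must be odd.
    """
    if k % 2 == 0:
        raise ValueError("this sketch handles odd k only")
    n = len(values)
    m = k // 2  # the median is the (m + 1)-th smallest sampled value
    below = sum(v < observed for v in values)  # genes strictly below x
    # median >= observed  <=>  at most m sampled genes lie below observed
    favorable = sum(
        comb(below, i) * comb(n - below, k - i) for i in range(m + 1)
    )
    return Fraction(favorable, comb(n, k))


def brute_force_upper_tail(values, k, observed):
    """Same probability by enumerating every size-k subset (tiny n only)."""
    subsets = list(combinations(values, k))
    hits = sum(median(s) >= observed for s in subsets)
    return Fraction(hits, len(subsets))


if __name__ == "__main__":
    genome = [1, 2, 3, 4, 5, 6, 7]  # toy feature values, not real data
    print(median_upper_tail(genome, 3, 6))  # prints 1/7
```

Because the probability is a closed-form sum of binomial coefficients, its cost does not depend on how small the P value is, whereas a permutation test with B resamples cannot report a P value below roughly 1/(B+1). The enumeration helper is only practical for toy inputs and is included here as a correctness check.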