Automatic module selection from several microarray gene expression studies
Independence of genes is commonly but incorrectly assumed in microarray data analysis; rather, genes are activated in co-regulated sets referred to as modules. In this article, we develop an automatic method to define modules common to multiple independent studies. We use an empirical Bayes procedure to estimate a sparse correlation matrix for all studies, identify modules by clustering, and develop an extreme-value-based method to detect so-called scattered genes, which do not belong to any module. The resulting algorithm is very fast and produces accurate modules in simulation studies. Application to real data identifies modules with significant enrichment and results in a huge dimension reduction, which can alleviate the computational burden of further analyses.