¹College of Life Science, Hunan Normal University, Changsha, Hunan 410087, China and ²Department of Biostatistics and Epidemiology, Georgia Regents University, Augusta, GA 30912–4900, USA
Summary: ‘Omic’ data, such as genomic, transcriptomic, proteomic and single nucleotide polymorphism data, have been growing rapidly. These data are large-scale and high-throughput; they challenge traditional statistical methodologies and require multiple testing. Several multiple-testing procedures, such as the Bonferroni procedure, the Benjamini–Hochberg (BH) procedure and the Westfall–Young procedure, have been developed. Some of them control the family-wise error rate and the others control the false discovery rate (FDR). These procedures are valid in some cases but cannot be applied to all types of large-scale data. To address this statistically challenging problem in the analysis of omic data, we propose a general method for generating a set of multiple-testing procedures. The method is based on the BH theorems. By choosing a C-value, one can realize a specific multiple-testing procedure. For example, setting C = 1.22 produces the BH procedure. With C < 1.22, our method generates procedures that weakly control FDR, and with C > 1.22, procedures that strongly control FDR. The procedures with C = G (the number of genes or tests) and C = 0 are, respectively, the Bonferroni procedure and the single-testing procedure; these are the two extremes of the family. To help one choose an appropriate multiple-testing procedure in practice, we develop an algorithm by which FDR can be correctly and reliably estimated. Simulation results show that our method estimates FDR accurately in various scenarios, and we illustrate its application with three real datasets.
Availability and implementation: Our program is implemented in Matlab and is available upon request.
Contact: firstname.lastname@example.org
Supplementary information: Supplementary data are available at Bioinformatics online.
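For readers less familiar with the reference procedures named in the Summary, the following is a minimal Python sketch of the standard textbook forms of the Bonferroni and BH procedures. It does not implement the C-value family proposed here, because the abstract does not state how C enters the rejection rule. The function names, the significance level `alpha` and the example p-values are illustrative assumptions, not taken from the authors' Matlab program.

```python
from typing import List, Sequence


def bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Standard Bonferroni rule: reject test i if p_i <= alpha / G."""
    g = len(p_values)
    return [p <= alpha / g for p in p_values]


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Standard BH step-up rule.

    Sort the p-values, find the largest rank k such that
    p_(k) <= (k / G) * alpha, and reject the k smallest p-values.
    """
    g = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(g), key=lambda i: p_values[i])

    k_max = 0  # largest rank whose p-value passes its threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / g * alpha:
            k_max = rank

    rejected = [False] * g
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for G = 8 tests (illustrative only, not real data).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
n_bonferroni = sum(bonferroni(example_p))
n_bh = sum(benjamini_hochberg(example_p))
print("Bonferroni rejections:", n_bonferroni)
print("BH rejections:", n_bh)
```

On these illustrative p-values, Bonferroni rejects one hypothesis and BH rejects two, which reflects the general pattern that controlling the family-wise error rate is more conservative than controlling FDR.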