Microarrays are widely used to quantify DNA methylation because they are economical, require only small quantities of input DNA and focus on well-characterized regions of the genome. However, pre-processing of methylation microarray data is challenging because of confounding factors that include background fluorescence, dye bias and the impact of germline polymorphisms. We therefore present a data-driven framework, and the insights derived from it, for those seeking the optimal pre-processing method.
Results
Here, we show that Dasen is the optimal pre-processing methodology for prostate cancer data from the Infinium HumanMethylation450 BeadChip array, a platform frequently employed for tumour methylome profiling by both the TCGA and ICGC consortia. We evaluated the impact of 11 pre-processing methods on batch effects, replicate variability, sensitivity and sample-to-sample correlations across 809 independent prostate cancer samples, including 150 reported for the first time in this study. Overall, Dasen is the most effective at removing artefacts and detecting biological differences associated with tumour aggressivity. Relative to the raw dataset, it reduces replicate variance by 67% for β-values and 76% for M-values. Our study provides a pre-processing benchmark for the community with an emphasis on biological implications.
Availability and implementation
All software used in this study is publicly available, as detailed in the article.
Supplementary data are available at Bioinformatics online.