1 Department of Radiation Oncology, University Hospital of Heidelberg, Heidelberg 69120, Germany
2 Clinical Cooperation Unit Radiation Oncology, German Cancer Research Center (DKFZ), Heidelberg 69120, Germany
3 Heidelberg Ion-Beam Therapy Center (HIT), Heidelberg 69120, Germany
4 Heidelberg Institute for Radiation Oncology (HIRO), University Hospital of Heidelberg, Heidelberg 69120, Germany
5 Translational Radiation Oncology, German Cancer Consortium (DKTK), National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Heidelberg 69120, Germany
Motivation: Detailed copy number (CN) variation data can be obtained from Illumina 450k or EPIC methylation arrays. However, the effects of different preprocessing strategies (normalization, transformation and selection of gain/loss cutoff values) on variant calling have not been evaluated systematically.

Results: We provide an R package that allows users to directly compare CN data produced by any preprocessing strategy. It also implements its own method for detecting CN alterations: segments are identified by detecting changes in the variance of the CN data and are subsequently filtered for significance. Meaningful gain/loss cutoffs can be identified automatically by analyzing the resulting ΔCN distributions of all analyzed samples. Three exemplary datasets (two 450k, one EPIC) were selected for comparative analyses of the Raw, Illumina, SWAN, Quantile, Noob, Funnorm and Dasen normalizations. Importantly, all CN data distributions were skewed (skewness -0.66 to -1.2) and therefore required different gain/loss cutoffs. Depending on the normalization method, pronounced baseline differences between samples were observed. We present a workflow that alleviates both issues: Z-transformation removes baseline differences between samples, and automatic cutoff selection circumvents the problems caused by the skewed distributions. With additional filtering of candidates by significance, the workflow yields comparable results for all listed normalization methods except SWAN. In contrast, manually determined cutoffs yield highly variable numbers of variant calls that depend strongly on the selected normalization method.
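The abstract names its computations without spelling them out, so the following Python sketch illustrates three of them in generic form. It assumes a per-sample Z-transformation (standard centering and scaling), the moment-based Fisher-Pearson sample skewness (we assume the reported -0.66 to -1.2 values are of this kind; the estimator is not stated here), and a deliberately simplified single change point in variance under a Gaussian model with one shared plug-in mean. The function names `z_transform`, `skewness` and `variance_change_point` are hypothetical and are not part of the cnAnalysis450k API. The package itself is written in R and additionally filters segments for significance and selects gain/loss cutoffs automatically; neither step is reproduced here.

```python
import math
import statistics


def z_transform(cn_values):
    """Center and scale one sample's CN values to mean 0 and SD 1.

    Subtracting the sample mean removes a per-sample baseline offset;
    dividing by the sample SD puts all samples on a common scale.
    """
    mu = statistics.fmean(cn_values)
    sd = statistics.stdev(cn_values)  # sample SD (n - 1 denominator)
    return [(x - mu) / sd for x in cn_values]


def skewness(values):
    """Moment-based (Fisher-Pearson) sample skewness g1 = m3 / m2**1.5.

    Negative values indicate a longer left tail.
    """
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def variance_change_point(values, min_size=5):
    """Return the index k that best splits values[:k] / values[k:] by variance.

    Simplified single change point in variance (an assumption, not the
    package's segmentation): both segments share one plug-in mean (the
    overall mean), and the split minimizing the Gaussian negative
    log-likelihood k*log(var_left) + (n - k)*log(var_right) (constants
    dropped) is chosen. Returns None if no split leaves at least
    min_size points on each side.
    """
    n = len(values)
    mu = statistics.fmean(values)
    sq = [(x - mu) ** 2 for x in values]
    total = sum(sq)
    eps = 1e-12  # guards against log(0) for constant segments
    best_k, best_cost = None, math.inf
    left = 0.0
    for k in range(1, n):
        left += sq[k - 1]  # running sum of squared deviations of values[:k]
        if k < min_size or n - k < min_size:
            continue
        var_l = max(left / k, eps)
        var_r = max((total - left) / (n - k), eps)
        cost = k * math.log(var_l) + (n - k) * math.log(var_r)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Toy data: a quiet stretch followed by a noisier one.
data = [0.1, -0.1] * 10 + [2.0, -2.0] * 10
print(variance_change_point(data))  # 20
```

Applied to a sample shifted by a constant, `z_transform` returns the same values as for the unshifted sample (up to rounding), which is the sense in which Z-transformation removes baseline differences between samples.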
Taken together, we present a workflow that robustly identifies copy number alterations in methylation array data, largely independent of the normalization applied.

Availability and Implementation: The cnAnalysis450k package is available on GitHub (https://github.com/mknoll/cnAnalysis450k).

Contact: email@example.com

Supplementary information: Supplementary data are available at Bioinformatics online.