Motivation: Copy number variation (CNV) is a type of structural variation, usually defined as genomic segments of 1 kb or larger that present variable copy numbers when compared with a reference genome. The screening and ranking algorithm (SaRa) was recently proposed as an efficient approach for multiple change-point detection, which can be applied to CNV detection. However, some practical issues arise when SaRa is applied to single nucleotide polymorphism (SNP) data.
Results: In this study, we propose a modified SaRa for CNV detection to address these issues. First, we apply quantile normalization to the original intensities so that the normal mean model-based SaRa is robust. Second, we propose a novel normal mixture model, coupled with a modified Bayesian information criterion, for selecting candidate change-points and then clustering the potential CNV segments into copy number states. Simulations showed that the modified SaRa is robust in identifying change-points and performs better than the circular binary segmentation (CBS) method. By applying the modified SaRa to real data from the HapMap project, we illustrated its performance in detecting CNV segments. In conclusion, our modified SaRa method improves on SaRa both theoretically and numerically for identifying CNVs from high-throughput genotyping data.
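To make the screening step concrete, the following is a minimal Python sketch, not the modSaRa implementation (which is in R). It assumes that quantile normalization means a rank-based mapping of the intensities onto standard normal quantiles, and that SaRa screening keeps local maximizers of the absolute difference between right- and left-window means. The function names, the window size h and the threshold are hypothetical; the normal mixture model and the modified Bayesian information criterion are omitted.

```python
from statistics import NormalDist


def quantile_normalize(values):
    """Map intensities onto standard normal quantiles by rank (assumed scheme)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # (rank + 0.5) / n keeps the probability strictly inside (0, 1).
        out[idx] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


def local_diagnostic(y, h):
    """D[i] = mean(y[i:i+h]) - mean(y[i-h:i]); 0 where a full window is unavailable."""
    n = len(y)
    d = [0.0] * n
    for i in range(h, n - h + 1):
        d[i] = sum(y[i:i + h]) / h - sum(y[i - h:i]) / h
    return d


def screen_candidates(y, h, threshold):
    """Return positions where |D| exceeds the threshold and is maximal within +/- h."""
    n = len(y)
    d = [abs(v) for v in local_diagnostic(y, h)]
    candidates = []
    for i in range(h, n - h + 1):
        lo, hi = max(0, i - h), min(n, i + h + 1)
        if d[i] >= threshold and d[i] == max(d[lo:hi]):
            candidates.append(i)
    return candidates
```

On a noiseless toy sequence such as [0.0] * 20 + [1.0] * 10 + [0.0] * 20 with h = 5 and a threshold of 0.5, screen_candidates returns positions 20 and 30, the two true change-points. With real intensities, quantile_normalize would be applied before screening.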
Availability and Implementation: The modSaRa package is implemented in R and is freely available at http://c2s2.yale.edu/software/modSaRa.
Supplementary information: Supplementary data are available at Bioinformatics online.