Background: Non-invasive detection of aneuploidies in a fetal genome through analysis of cell-free DNA circulating in the maternal plasma is becoming a routine clinical test. Such tests, which rely on analyzing the read coverage or the allelic ratios at single-nucleotide polymorphism (SNP) loci, are not sensitive enough for smaller sub-chromosomal abnormalities due to sequencing biases and paucity of SNPs in a genome.
Results: We have developed an alternative framework for identifying sub-chromosomal copy number variations in a fetal genome. This framework relies on the size distribution of fragments in a sample, as fetal-origin fragments tend to be smaller than those of maternal origin. By analyzing the local distribution of the cell-free DNA fragment sizes in each region, our method allows for the identification of sub-megabase CNVs, even in the absence of SNP positions. To evaluate the accuracy of our method, we used a plasma sample with the fetal fraction of 13%, down-sampled it to samples with coverage of 10X–40X and simulated samples with CNVs based on it. Our method had a perfect accuracy (both specificity and sensitivity) for detecting 5 Mb CNVs, and after reducing the fetal fraction (to 11%, 9% and 7%), it could correctly identify 98.82–100% of the 5 Mb CNVs and had a true-negative rate of 95.29–99.76%.
Availability and implementation: Our source code is available on GitHub at https://github.com/compbio-UofT/FSDA.