1 Department of Physics and Astronomy, Johns Hopkins University, Baltimore, MD, USA
2 School of Medicine, Sun Yat-sen University, Guangdong, China
3 Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
4 Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
5 Department of Biomedical Engineering, Johns Hopkins Whitehead School of Engineering, Baltimore, MD, USA
6 Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
Motivation: The alignment of bisulfite-treated DNA sequences (BS-seq reads) to a large genome involves a significant computational burden beyond that required to align non-bisulfite-treated reads. In the analysis of BS-seq data, this can present an important performance bottleneck that can be mitigated by appropriate algorithmic and software-engineering improvements. One strategy is to modify the read-alignment algorithms by integrating the logic related to BS-seq alignment, with the goal of making the software implementation amenable to optimizations that lead to higher speed and greater sensitivity than might otherwise be attainable.

Results: We evaluated this strategy using Arioc, a short-read aligner that uses GPU (general-purpose graphics processing unit) hardware to accelerate computationally expensive programming logic. We integrated the BS-seq computational logic into both GPU and CPU code throughout the Arioc implementation. We then carried out a read-by-read comparison of Arioc's reported alignments with the alignments reported by well-known CPU-based BS-seq read aligners. With simulated reads, Arioc's accuracy is equal to or better than that of the other read aligners we evaluated. With human sequencing reads, Arioc's throughput is at least 10 times higher than that of existing BS-seq aligners across a wide range of sensitivity settings.

Availability and implementation: The Arioc software is available for download at https://github.com/RWilton/Arioc. It is released under a BSD open-source license.

Supplementary information: Supplementary data are available at Bioinformatics online.