We developed VariantBam, a C ++ read filtering and profiling tool for use with BAM, CRAM and SAM sequencing files. VariantBam provides a flexible framework for extracting sequencing reads or read-pairs that satisfy combinations of rules, defined by any number of genomic intervals or variant sites. We have implemented filters based on alignment data, sequence motifs, regional coverage and base quality. For example, VariantBam achieved a median size reduction ratio of 3.1:1 when applied to 10 lung cancer whole genome BAMs by removing large tags and selecting for only high-quality variant-supporting reads and reads matching a large dictionary of sequence motifs. Thus VariantBam enables efficient storage of sequencing data while preserving the most relevant information for downstream analysis.
Availability and implementation: VariantBam and full documentation are available at github.com/jwalabroad/VariantBam.
Supplementary information: Supplementary data are available at Bioinformatics online.