Metagenome sequencing has been used to great effect to characterize the diversity of microbial communities, including the variation in gut microbiota associated with the development and severity of inflammatory bowel diseases (Kostic et al., 2014). However, modern sequencing-by-synthesis platforms such as Illumina produce short reads that are often difficult to assign accurately to a taxonomic unit, and most of these reads are not informative enough to differentiate closely related taxa. Recent long-read single-molecule sequencing technologies such as Pacific Biosciences and Oxford Nanopore Technologies (ONT) permit the identification of taxa with greater specificity and sensitivity than short reads, but they are limited by high cost and low throughput.
Methods:
To assess the accuracy of nanopore sequencing-based metagenome classification, we constructed a mock microbial community by mixing ten cultured bacterial species at approximately equal cellular volume. We sequenced the mock community in two replicates using ONT's MinION sequencer. We also constructed a DNA sample simulating host contamination by mixing the mock community with Arabidopsis thaliana DNA at a ratio of 10% bacterial DNA to 90% host DNA. We used Kraken (Wood and Salzberg, 2014) to assign taxonomic labels to the sequenced reads.
Results:
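The Kraken classification step described under Methods writes one tab-separated line per read. The sketch below is only an illustration, not the pipeline used in this study: it tallies classified reads per taxonomy ID from that per-read output. It assumes Kraken's default per-read format (flag `C` or `U`, read ID, numeric taxonomy ID, read length, k-mer mapping); the helper name `tally_kraken_reads` and the toy lines are hypothetical. It does not walk the NCBI taxonomy, so deciding whether a taxonomy ID is bacterial, or at which rank it sits, would need an extra lookup that is not shown.

```python
from collections import Counter
from typing import Iterable


def tally_kraken_reads(lines: Iterable[str]) -> tuple[Counter, int]:
    """Tally classified reads per taxonomy ID from Kraken per-read output.

    Hypothetical helper. Assumes each non-empty line is tab-separated:
    flag ("C" classified / "U" unclassified), read ID, numeric taxonomy ID,
    read length, k-mer mapping. Returns (reads per taxonomy ID, total reads).
    """
    per_taxon: Counter = Counter()
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        total += 1
        flag, taxid = fields[0], fields[2]
        if flag == "C":
            per_taxon[taxid] += 1  # only classified reads are counted per taxon
    return per_taxon, total


# Toy input: made-up reads, not data from this study
toy_output = [
    "C\tread1\t562\t5210\t562:40 0:10",
    "C\tread2\t562\t4870\t562:35",
    "U\tread3\t0\t3020\t0:60",
    "C\tread4\t1280\t6105\t1280:50",
]
counts, total_reads = tally_kraken_reads(toy_output)
fraction_classified = sum(counts.values()) / total_reads
print(counts, total_reads, fraction_classified)
```

For the four toy lines, two reads fall under taxonomy ID 562, one under 1280, and three of the four reads (75%) are classified; the unclassified read is counted in the total but not per taxon.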
Mock community replicates A1 and A2 produced 55,027 and 84,178 reads, respectively, and sequencing of the mock host/A1 mixture produced 7,531 reads. In replicates A1 and A2, 25% and 47% of reads, respectively, were identified as bacterial and classified at least to the phylum level. Of those classified, over 90% were assigned to a single species or strain. All 10 bacterial species in the mock community were accurately identified, and no false positives were found when a threshold of 3 reads per taxon was applied. The relative abundances of the identified species ranged from 0.01% to 9.45%, possibly reflecting differences in cell size and in cell density in the culture media used to assemble the mock community. The lowest-abundance member was accurately identified to the species level from only 3 reads. In the host-contamination mixture, only 311 reads were identified as bacterial, owing to the 90% “host” DNA and the low sequencing throughput; all members except the lowest-abundance one were accurately identified. Estimated taxon abundances were highly correlated between the two community replicates (r² = 0.861) and between A1 and the host+A1 mixture (r² = 0.942).
Conclusions:
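The 3-read threshold and the replicate correlations reported under Results can be illustrated with the sketch below. It is not the analysis code behind these numbers: the function names, the toy counts, and the choice to compute r² as the squared Pearson correlation of linear-scale relative abundances over the union of taxa (absent taxa counted as 0) are all assumptions, since the abstract does not specify them. The relative-abundance denominator is left as a parameter because the abstract does not state whether it is all reads or only reads passing the threshold.

```python
from typing import Dict, Optional

MIN_READS = 3  # per-taxon read threshold stated in the abstract


def relative_abundance(counts: Dict[str, int], min_reads: int = MIN_READS,
                       total_reads: Optional[int] = None) -> Dict[str, float]:
    """Drop taxa with fewer than min_reads reads, then convert counts to fractions.

    Hypothetical helper. If total_reads is None, fractions are relative to the
    reads that pass the threshold; otherwise they are relative to total_reads.
    """
    kept = {taxon: n for taxon, n in counts.items() if n >= min_reads}
    denominator = total_reads if total_reads is not None else sum(kept.values())
    if denominator == 0:
        return {}
    return {taxon: n / denominator for taxon, n in kept.items()}


def r_squared(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Squared Pearson correlation of two abundance profiles over the union of taxa."""
    taxa = sorted(set(a) | set(b))
    if len(taxa) < 2:
        raise ValueError("need at least two taxa to compute a correlation")
    x = [a.get(t, 0.0) for t in taxa]  # taxa missing from a profile count as 0
    y = [b.get(t, 0.0) for t in taxa]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return (sxy * sxy) / (sxx * syy)


# Toy read counts per taxon (made up, not data from this study)
replicate_1 = {"taxon_A": 120, "taxon_B": 60, "taxon_C": 20, "taxon_D": 2}
replicate_2 = {"taxon_A": 200, "taxon_B": 90, "taxon_C": 40, "taxon_E": 1}
profile_1 = relative_abundance(replicate_1)  # taxon_D (2 reads) is dropped
profile_2 = relative_abundance(replicate_2)  # taxon_E (1 read) is dropped
print(profile_1, profile_2, r_squared(profile_1, profile_2))
```

A squared Pearson correlation on linear-scale abundances is dominated by the most abundant taxa; a log-scale or rank-based comparison would weight rare members more evenly, and the abstract does not say which was used.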
We have shown that whole-metagenome shotgun sequencing on ONT's MinION nanopore platform allows identification of known species, and often strains, in a mock community. Long reads allow increased sensitivity and specificity of taxon identification relative to common short-read sequencing approaches. To extend this technology to characterizing the gut microbiota associated with inflammatory bowel diseases, we also considered the effect of substantial host contamination. We demonstrated that, although sensitivity decreases in proportion to the read coverage lost to host DNA, taxon assignments remain specific and abundance estimates remain highly consistent and reproducible.