Although simple tandem repeats (STRs) comprise ~2% of the human genome and represent an important source of polymorphism, this class of variation remains understudied. We have developed a cost-effective strategy for targeted enrichment of STR regions that uses capture probes targeting the flanking sequences of STR loci, enabling specific capture of DNA fragments containing STRs for subsequent high-throughput sequencing. Using a capture design targeting 6,243 STR loci <94 bp and multiplexing eight individuals in a single Illumina HiSeq2000 sequencing lane, we were able to call genotypes in at least one individual for 67.5% of the targeted STRs. We observed a strong relationship between (G+C) content and genotyping rate: STRs with moderate (G+C) content were recovered with a >90% success rate, whereas only 12% of STRs with ≥80% (G+C) were genotyped in our assay. Analysis of a parent-offspring trio, complete hydatidiform mole samples, repeat analyses of the same individual, and Sanger sequencing-based validation indicated genotyping error rates between 7.6% and 12.4%. The majority of these errors were differences of a single repeat unit at mono- or dinucleotide repeats. Altogether, our STR capture assay represents a cost-effective method that enables multiplexed genotyping of thousands of STR loci and is suitable for large-scale population studies.
We have developed a cost-effective strategy for performing targeted enrichment of simple tandem repeats (STRs) that utilizes capture probes targeting the flanking sequences of STRs, enabling specific capture of DNA fragments containing STRs for subsequent high-throughput sequencing. For a capture design targeting >6,000 STRs located within gene promoter regions, the figure shows the percentage of STRs successfully genotyped and the mean depth of informative reads as a function of STR (G+C) content.
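As an illustration of how genotyping rate can be summarized as a function of (G+C) content, the following minimal Python sketch bins loci by (G+C) percentage and reports the fraction genotyped in each bin. It is not the analysis code used in the study: the function names, the 10-bin layout, and the toy sequences are assumptions made purely for illustration.

```python
from collections import defaultdict


def gc_fraction(seq):
    """Return the fraction of G and C bases in a DNA sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def genotyping_rate_by_gc(loci, n_bins=10):
    """Return {bin lower bound in percent: fraction of loci genotyped}.

    `loci` is an iterable of (sequence, genotyped) pairs. Both the input
    layout and the default of 10 equal-width bins are illustrative choices.
    """
    totals = defaultdict(int)
    called = defaultdict(int)
    for seq, genotyped in loci:
        seq = seq.upper()
        if not seq:
            continue  # skip empty records rather than guess a bin
        gc_count = sum(base in "GC" for base in seq)
        # Integer arithmetic avoids floating-point edge effects at bin borders;
        # a sequence with 100% (G+C) is placed in the top bin.
        idx = min(gc_count * n_bins // len(seq), n_bins - 1)
        totals[idx] += 1
        called[idx] += bool(genotyped)
    return {idx * 100 // n_bins: called[idx] / totals[idx] for idx in sorted(totals)}


# Toy data for illustration only; these are not loci or results from the study.
toy_loci = [
    ("ATATATATAT", True),
    ("CACACACA", True),
    ("CAGCAGCAG", False),
    ("CGGCGGCGG", False),
    ("GCGCGCGC", True),
]
rates = genotyping_rate_by_gc(toy_loci)
print(rates)
```

Each key in the returned dictionary is the lower bound of a (G+C) bin in percent, so plotting the keys against the values gives a curve of the same kind as the one described for the figure.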