Motivation: Identification of microRNA (miRNA) transcriptional start sites (TSSs) is crucial for understanding the transcriptional regulation of miRNAs. As miRNA expression is highly cell-specific, an automatic and systematic method that can identify miRNA TSSs accurately and in a cell-specific manner is urgently needed.
Results: We built a workflow to identify miRNA TSSs by integrating H3K4me3 and DNase I hypersensitive site data with conservation level and sequence features. By applying the workflow to data for 54 cell lines from the ENCODE project, we identified TSSs for 663 intragenic miRNAs and 620 intergenic miRNAs, which together cover 84.2% (1283/1523) of all miRNAs recorded in miRBase 18. Across these cell lines, we found 4042 alternative TSSs for intragenic miRNAs and 3186 alternative TSSs for intergenic miRNAs. Our method performed better than previous non-cell-specific methods for miRNA TSS identification. The cell-specific method developed by Georgakilas et al. gives 158 TSSs of higher accuracy in two cell lines, benefiting from the use of deep-sequencing data. In contrast, our method provides a much larger number of miRNA TSSs (7228) for a broader range of cell lines without requiring costly deep-sequencing data, and is therefore applicable to a wider variety of experimental settings. Analysis showed that upstream promoters (−2 kb to −200 bp relative to the TSS) are more conserved for independently transcribed miRNAs, whereas for miRNAs transcribed with their host genes, the core promoters (−200 bp to +200 bp relative to the TSS) are significantly conserved.
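As an illustration of the two promoter regions named above, the following minimal sketch converts a TSS position and strand into genomic intervals for the upstream promoter (−2 kb to −200 bp) and the core promoter (−200 bp to +200 bp). The function name `promoter_windows`, the `Window` type, and the choice of inclusive coordinates are assumptions made for this example; they are not taken from the workflow itself.

```python
from typing import Dict, NamedTuple


class Window(NamedTuple):
    start: int  # leftmost genomic coordinate (treated as inclusive here)
    end: int    # rightmost genomic coordinate (treated as inclusive here)


def promoter_windows(tss: int, strand: str) -> Dict[str, Window]:
    """Return upstream and core promoter intervals around a TSS.

    Offsets follow the abstract: upstream = -2000..-200, core = -200..+200,
    measured in the direction of transcription. Whether the boundaries are
    inclusive is an assumption of this sketch.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the minus strand, "upstream" means higher genomic coordinates.
    sign = 1 if strand == "+" else -1

    def window(rel_from: int, rel_to: int) -> Window:
        a = tss + sign * rel_from
        b = tss + sign * rel_to
        return Window(min(a, b), max(a, b))

    return {
        "upstream": window(-2000, -200),
        "core": window(-200, 200),
    }


# Consistency check of the counts reported in the abstract.
covered = 663 + 620               # intragenic + intergenic miRNAs with a TSS
coverage_pct = round(100 * covered / 1523, 1)
alternative_tss = 4042 + 3186     # intragenic + intergenic alternative TSSs
```

For a hypothetical TSS at position 10000 on the minus strand, `promoter_windows(10000, "-")` places the upstream promoter at higher coordinates than the TSS, which is the strand-aware behaviour one would want before computing conservation scores over each interval.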
Availability and implementation: Predicted miRNA TSSs and promoters can be downloaded from the supplementary files.
Contact: email@example.com or firstname.lastname@example.org or email@example.com
Supplementary information: Supplementary data are available at Bioinformatics online.