1 Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
2 Department of Human Genetics, Saarland University, 66421 Homburg, Germany
3 Cancer Registry of Norway, Institute of Population-based Cancer Research, N-0304 Oslo, Norway
4 Hummingbird Diagnostics GmbH, 69120 Heidelberg, Germany
5 Department of Internal Medicine III, University Hospital Heidelberg, 69120 Heidelberg, Germany
6 German Center for Cardiovascular Research (DZHK), 69120 Heidelberg, Germany
7 Klaus Tschira Institute for Integrative Computational Cardiology, 69120 Heidelberg, Germany
8 Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, 24105 Kiel, Germany
9 Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
Motivation: Although the amount of small non-coding RNA-sequencing data is continuously increasing, it is still unclear to what extent small RNAs are represented in the human genome.

Results: In this study, we analyzed 303 billion sequencing reads from nearly 25 000 datasets to answer this question. We determined that 0.8% of the human genome is reliably covered by 874 123 regions with an average length of 31 nt. On the basis of these regions, we found that among the known small non-coding RNA classes, microRNAs (miRNAs) were the most prevalent. In subsequent steps, we characterized variations of miRNAs and performed a staged validation of 11 877 candidate miRNAs. Of these, many were actually expressed and significantly dysregulated in lung cancer. Selected candidates were finally validated by northern blots. Although isolated miRNAs could still be present in the human genome, our presented set likely contains the largest fraction of human miRNAs.

Contact: email@example.com or firstname.lastname@example.org

Supplementary information: Supplementary data are available at Bioinformatics online.