CAGI4 SickKids clinical genomes challenge: A pipeline for identifying pathogenic variants
Compared with earlier, more restricted sequencing technologies, whole-genome sequencing can in principle find all causative variants in a rare disease case, but data quality issues and an overwhelming level of background variants complicate the analysis. The CAGI4 SickKids clinical genome challenge provided an opportunity to assess the landscape of variants found in a difficult set of 25 unsolved rare disease cases. To address the challenge, we developed a three-stage pipeline: first carefully analyzing data quality, then classifying high-quality gene-specific variants into seven categories, and finally examining each candidate variant for compatibility with the often complex phenotypes of these patients to arrive at a final prioritization. Variants consistent with the phenotypes were found in 24 of the 25 cases, and in a number of these, there are prioritized variants in multiple genes. Data quality analysis suggests that some of the selected variants are likely incorrect calls, complicating interpretation. The data providers followed up on three suggested variants with Sanger sequencing, and in one case, a prioritized variant was confirmed as likely causative by the referring physician, providing a diagnosis in a previously intractable case.
The CAGI4 SickKids clinical genome challenge provided an opportunity to assess the landscape of genetic variants found in a difficult set of 25 unsolved rare disease cases. We used a three-stage pipeline consisting of careful analysis of data quality, classification of disease-relevant gene-specific variants into seven categories, and prioritization of variants by compatibility with the complex phenotype of a patient. In one case of epileptic encephalopathy, we identified the genetic cause: a missense mutation in KCNB1.
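The three-stage structure described above (quality analysis, classification, phenotype-based prioritization) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Variant` fields, the depth and genotype-quality thresholds, and the set-overlap phenotype score are all assumptions chosen for illustration. The seven variant categories are not enumerated here, so the classifier is passed in as a caller-supplied function rather than guessed at.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    gene: str
    depth: int              # read depth at the site (hypothetical quality field)
    genotype_quality: int   # caller-reported genotype quality (hypothetical)
    category: str = ""      # filled in by stage 2
    gene_phenotypes: set = field(default_factory=set)  # phenotype terms linked to the gene


def passes_quality(v, min_depth=10, min_gq=20):
    """Stage 1: keep only calls above illustrative (assumed) quality thresholds."""
    return v.depth >= min_depth and v.genotype_quality >= min_gq


def phenotype_overlap(v, patient_terms):
    """Stage 3 score: number of patient phenotype terms shared with the gene."""
    return len(v.gene_phenotypes & patient_terms)


def prioritize(variants, patient_terms, classify):
    """Run the three stages in order and return candidates, best match first.

    `classify` is a caller-supplied function mapping a Variant to a category
    label; an empty label means the variant falls in no category of interest.
    """
    # Stage 1: data quality filter.
    high_quality = [v for v in variants if passes_quality(v)]
    # Stage 2: assign each high-quality variant to a category.
    for v in high_quality:
        v.category = classify(v)
    # Stage 3: keep categorized variants compatible with the phenotype.
    candidates = [
        v for v in high_quality
        if v.category and phenotype_overlap(v, patient_terms) > 0
    ]
    return sorted(
        candidates,
        key=lambda v: phenotype_overlap(v, patient_terms),
        reverse=True,
    )
```

In practice, the phenotype comparison for complex cases would likely need expert review rather than a simple term overlap; the score here only stands in for that step.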