Determination of disease phenotypes and pathogenic variants from exome sequence data in the CAGI 4 gene panel challenge
Gene panel sequencing is now widely used for diagnostic and prognostic testing, but so far there have been few objective tests of methods for interpreting these data. We describe the design and implementation of a gene panel sequencing data analysis pipeline (VarP) and its assessment in the CAGI 4 community experiment. The method was applied to clinical gene panel sequencing data from 106 patients, with the goal of determining which of 14 disease classes each patient has and the corresponding causative variant(s). The disease class was correctly identified for 36 cases, including 10 where the original clinical pipeline did not find causative variants. For a further seven cases, we found strong evidence of a disease other than the one tested for. Many of the potentially causative variants are missense variants with no previous association with disease, and these proved the hardest to classify correctly as pathogenic or not. Post hoc analysis showed that three-dimensional structure data could have helped in up to half of these cases. Over-reliance on HGMD annotation led to a number of incorrect disease assignments. We used a largely ad hoc method to assign a probability of pathogenicity to each variant, and much work remains to be done in this area.
We describe the design and implementation of a gene panel sequencing data analysis pipeline, VarP. The performance of the pipeline was assessed in the CAGI 4 community experiment. VarP identified the correct disease class and potentially causative variant(s) in 36 of 106 patients, including 10 patients for whom the clinical pipeline did not find any causative variants. Post hoc analysis showed that three-dimensional structure data could have assisted interpretation in a number of cases.