Background—An emerging standard of care for long-QT syndrome uses clinical genetic testing to identify genetic variants of the KCNQ1 potassium channel. However, interpreting results from genetic testing is confounded by the presence of variants of unknown significance, for which there is inadequate evidence of pathogenicity.
Methods and Results—In this study, we curated from the literature a high-quality set of 107 functionally characterized KCNQ1 variants. Based on this data set, we completed a detailed quantitative analysis of the sequence conservation patterns of subdomains of KCNQ1 and the distribution of pathogenic variants therein. We found that conserved subdomains generally are critical for channel function and are enriched with dysfunctional variants. Using this experimentally validated data set, we trained a neural network, designated Q1VarPred, specifically for predicting the functional impact of KCNQ1 variants of unknown significance. Q1VarPred achieved an estimated Matthews correlation coefficient of 0.581 and an area under the receiver operating characteristic curve of 0.884, outperforming 8 previous methods tested in parallel. Q1VarPred is publicly available as a web server at http://meilerlab.org/q1varpred.
Conclusions—Although many tools are available for making pathogenicity predictions on a genome-wide scale, previous tools fail to perform robustly when applied to KCNQ1. The contrasting and favorable results for Q1VarPred suggest a promising approach, in which a machine-learning algorithm is tailored to a specific protein target and trained with a functionally validated data set to calibrate informatics tools.
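For readers unfamiliar with the two performance measures reported in Methods and Results, the following minimal sketch shows how a Matthews correlation coefficient and an area under the receiver operating characteristic curve can be computed for a binary classifier. It is an illustration only, not the Q1VarPred evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for this example.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels.

    Labels are assumed to be 1 (dysfunctional) or 0 (normal).
    Returns 0.0 when any row or column of the confusion matrix is empty.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one,
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration (not taken from the 107-variant set).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
# Assumed decision threshold of 0.5 to turn scores into class calls.
preds = [1 if s >= 0.5 else 0 for s in scores]

print(round(matthews_corrcoef(labels, preds), 3))
print(round(roc_auc(labels, scores), 3))
```

The Matthews correlation coefficient depends on the chosen threshold, whereas the area under the curve summarizes ranking quality across all thresholds, which is why the two measures are commonly reported together.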