High number of candidate gene variants are identified as disease‐causing in a period of 4 years

Advances in bioinformatic tools paired with the ongoing accumulation of genetic knowledge and periodic reanalysis of genomic sequencing data have led to an improvement in genetic diagnostic rates. Candidate gene variants (CGVs) identified during sequencing or on reanalysis but not yet implicated in human disease or associated with a phenotypically distinct condition are often not revisited, leading to missed diagnostic opportunities. Here, we revisited 33 such CGVs from our previously published study and determined that 16 of them are indeed disease‐causing (novel or phenotype expansion) since their identification. These results emphasize the need to focus on previously identified CGVs during sequencing or reanalysis and the importance of sharing that information with researchers around the world, including relevant functional analysis to establish disease causality.

genes with the potential for phenotypic expansion also emerge after initial clinical sequencing or after reanalysis of the sequenced data.
The identification of candidate gene variants (CGVs) is an important step in genomic analysis and reanalysis, as it can be the starting point for further investigation into the genetic cause and ultimately lead to a diagnosis. However, just because a gene or variant is considered a candidate does not necessarily mean that it will be linked with the patient's phenotype. Establishing a causal link between a candidate gene and disease requires integrating multiple lines of evidence, including: (1) in silico predictions using various software (Ioannidis et al., 2016; Rentzsch et al., 2019); (2) information about the gene's pattern of expression in relevant tissues; (3) in vitro or in vivo functional assessments; (4) evidence from animal models (Boycott et al., 2020); and (5) identification of additional variants in unrelated individuals with overlapping phenotypes (Frésard & Montgomery, 2018; Osmond et al., 2022).
In a previous pilot study, we reanalyzed 102 undiagnosed cases following clinical exome sequencing (CES) and were able to establish new genetic diagnoses in six individuals and identify 33 cases with CGVs (Schmitz-Abe et al., 2019).
Sonia Hills and Qifei Li contributed equally to this work.
Here, we revisited those 33 cases and established that 16 of these CGVs (49%) now have published disease associations or phenotypic expansions consistent with our patients' clinical presentations. Of the remaining 17 CGVs, 15 (45%) are still candidates and 2 (6%) have been ruled out as disease-causing. Our experience demonstrates that CGV identification and the ensuing extensive workup are critical to disease gene discovery and to ending diagnostic odysseys.

| Subjects
Participants' written informed consent was obtained in accordance with the IRB-approved research protocol (10-02-0053) at the Manton Center for Orphan Disease Research of Boston Children's Hospital.
Genomic data were obtained from the laboratories that completed the initial CES.

| CGV identification and sharing
Raw FASTQ data were processed through our custom-built pipeline ("Variant Explorer Pipeline") to detect candidate variants as previously described (Schmitz-Abe et al., 2019). To summarize, variants were filtered on predicted functional coding consequence, allele frequency in control populations, read depth of at least 10×, and complete penetrance. Further variant filtration was performed based on the patient phenotypes. Variants in disease-associated genes were further interpreted using the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) guidelines for variant classification (Pejaver et al., 2022; Richards et al., 2015). CGVs were defined as variants in novel genes or variants in known genes with a potential for expansion of the phenotype. To corroborate findings and/or to find potential collaborators for functional analysis, CGVs were shared on platforms such as "Matchmaker Exchange" (Philippakis et al., 2015) and "GeneMatcher" (Sobreira et al., 2015).
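The filtering step summarized above can be sketched roughly as follows. This is a minimal illustration and not the Variant Explorer Pipeline itself: the `Variant` record, the consequence labels, and the 1% allele-frequency cutoff are hypothetical choices made for the example. Only the 10× minimum read depth is taken from the text, and the penetrance and phenotype-based filters are omitted.

```python
from dataclasses import dataclass

# Hypothetical labels for a "possible functional coding consequence" (assumption).
FUNCTIONAL_CONSEQUENCES = {
    "missense", "nonsense", "frameshift", "splice_site", "inframe_indel",
}

MIN_READ_DEPTH = 10          # from the text: read depth of at least 10x
MAX_ALLELE_FREQUENCY = 0.01  # hypothetical cutoff; the text does not give a value


@dataclass
class Variant:
    gene: str                # placeholder gene label
    consequence: str         # predicted coding consequence
    allele_frequency: float  # frequency in a control population
    read_depth: int          # sequencing depth at the variant site


def passes_filters(v: Variant) -> bool:
    """Return True if the variant survives all three illustrative filters."""
    return (
        v.consequence in FUNCTIONAL_CONSEQUENCES
        and v.allele_frequency <= MAX_ALLELE_FREQUENCY
        and v.read_depth >= MIN_READ_DEPTH
    )


def filter_variants(variants):
    """Keep only the variants that pass the filters, preserving input order."""
    return [v for v in variants if passes_filters(v)]


# Made-up example records (not study data).
example = [
    Variant("GENE_A", "missense", 0.0001, 25),    # kept
    Variant("GENE_B", "synonymous", 0.0001, 40),  # dropped: consequence
    Variant("GENE_C", "frameshift", 0.05, 30),    # dropped: too common
    Variant("GENE_D", "nonsense", 0.0, 8),        # dropped: low depth
]
kept = filter_variants(example)
```

In practice the thresholds would be tuned to the inheritance model and the control database in use; the sketch only shows how the criteria combine.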

| Evaluation of CGVs
Participants' clinical records were reviewed to obtain the most up-to-date information regarding genetic diagnoses. In individuals for whom a different genetic variant was identified as a likely cause for their phenotype or the CGV was subsequently found in healthy family members, the previously identified candidate was ruled out.
The remaining CGVs were further evaluated via literature review and by discussion with the Manton Center Gene Discovery Core Team consisting of genetic counselors, bioinformaticians, researchers, and both internal and referring clinicians. The National Center for Biotechnology Information's (NCBI) PubMed database and the Human Gene Mutation Database (HGMD) were queried for peer-reviewed publications containing each CGV (Stenson et al., 2020).

| CGV classification
The CGVs were reclassified as confirmed if at least one peer-reviewed article was already published or submitted for publication. CGVs that did not meet these criteria continued to be classified as candidates.
Overall, the CGVs were classified into three groups after reevaluation: confirmed (Group 1), candidate (Group 2), and excluded (Group 3) (Figure 1). Within Group 1, CGVs were subdivided: published variants in confirmed disease genes, including both new gene-disease associations and phenotypic expansions (Group 1A), and novel variants in published disease-associated genes with clinical presentations matching the published patient phenotype (Group 1B). Group 2 was subdivided into two categories as well: variants in genes that remain unlinked with human disease (Group 2A) and those that are now linked to human disease, but our patient's phenotype does not overlap with the published phenotype (Group 2B).
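The grouping scheme can be read as a short decision procedure. The sketch below is one possible encoding, assuming four yes/no observations per CGV; the `CgvEvidence` record, its field names, and the order of the checks are illustrative assumptions rather than the procedure used by the study team.

```python
from dataclasses import dataclass


@dataclass
class CgvEvidence:
    # Another causal variant was found, or the CGV was seen in healthy relatives.
    ruled_out: bool
    # The gene has a published (or submitted) association with human disease.
    gene_linked_to_disease: bool
    # This specific variant is itself reported in a publication.
    variant_published: bool
    # The patient's phenotype overlaps the published phenotype.
    phenotype_matches: bool


def classify_cgv(e: CgvEvidence) -> str:
    """Return the group label ("1A", "1B", "2A", "2B", or "3") for one CGV."""
    if e.ruled_out:
        return "3"   # excluded
    if not e.gene_linked_to_disease:
        return "2A"  # candidate: gene still unlinked with human disease
    if not e.phenotype_matches:
        return "2B"  # candidate: gene linked, phenotype does not overlap
    # Confirmed: published variant (1A) or novel variant in a published gene (1B).
    return "1A" if e.variant_published else "1B"


# Illustrative inputs only (not study data).
label = classify_cgv(
    CgvEvidence(
        ruled_out=False,
        gene_linked_to_disease=True,
        variant_published=False,
        phenotype_matches=True,
    )
)
```

Checking exclusion first mirrors the evaluation order described earlier, where ruled-out candidates are removed before literature review.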

| Statistical analysis
Statistical analysis was performed using Microsoft Excel Version 2212 and GraphPad by Dotmatics.

| RESULTS
The 33 CGVs described in our earlier article were identified either by initial CES (n = 15) or subsequently upon reanalysis by our team (n = 18) (Schmitz-Abe et al., 2019). Over the four years since publication of the original article, we have worked closely with collaborators and through Matchmaker Exchange and GeneMatcher to further investigate these CGVs. As part of these efforts, we shared data and tissues when available and performed functional analysis to evaluate and link the CGVs to human disease, resulting in the publication of those findings. The CGVs were classified into three groups (Figure 1 and Table 1). Overall, 16 of those 33 CGVs are now confirmed to be newly associated with human disease or a phenotypic expansion of an already known disease, 15 continue to be classified as candidates, and two have been excluded.
Of the 16 confirmed CGVs, 11 (69%) represent novel gene discovery while five (31%) describe phenotypic expansions of already known disease genes (Table 2). Among these 16 CGVs, the confirmation process involved the identification of unrelated individuals with overlapping phenotypes in 11 cases (69%), in vitro functional assessment of cellular models in 6 cases (38%), the use of mouse models in 2 cases (13%), and other in vivo models (Xenopus, Drosophila) in 3 cases (19%) (Table 2). The average and median times between CGV identification (either by CES or reanalysis) and formal association with human disease or phenotypic expansion were 2.9 ± 1.7 years and 2.5 years, respectively. In our cohort, CGVs identified by initial CES were more likely to be published as disease-associated than those identified upon reanalysis (73% for CES vs. 28% for reanalysis, p = 0.015, Fisher's exact test).
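The reported comparison can be checked with a short, dependency-free calculation. The sketch below assumes the 2×2 table implied by the percentages: 11 of 15 CES-identified CGVs confirmed (73%) and 5 of 18 reanalysis-identified CGVs confirmed (28%). These counts are inferred from the reported figures rather than stated directly, and the two-sided convention used here (summing all tables no more probable than the observed one) is an assumption about how the test was run.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)

    def weight(k: int) -> int:
        # Unnormalized hypergeometric probability of k in the top-left cell.
        return comb(col1, k) * comb(n - col1, row1 - k)

    observed = weight(a)
    # Sum every table that is no more probable than the observed one.
    # Integer weights keep the comparison exact (no floating-point ties).
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(n, row1)


# Rows: CES, reanalysis. Columns: confirmed, not confirmed (inferred counts).
p = fisher_exact_two_sided(11, 4, 5, 13)
print(f"p = {p:.3f}")  # prints p = 0.015
```

Under these assumptions the result agrees with the reported p = 0.015.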
In 13 (81%) of the 16 confirmed CGVs, both the candidate gene and the specific genetic variant have been reported as disease-causing via publication (Group 1A) (Table 2). We also identified three novel variants (Group 1B) in now known disease-causing genes with clinical presentations matching the published phenotypes (Table 2). The classification of each published or novel variant as disease-causing was determined using a variety of evidence, including phenotypic overlap with unrelated individuals carrying variants in the same gene, protein domains, population frequencies, in silico prediction tools used by the authors of the publications, and functional analysis using cellular and mouse models (Table 2).
Fifteen (45%) of the CGVs remain candidates (Group 2) (Table 1), of which five (33%) are in genes that have since been confirmed to be disease-associated, although our patients' clinical presentations did not match those reported in the literature (Group 2B) (Table S1). Two candidates were excluded (Group 3): the first, a PTBP1 variant, was found in multiple healthy family members; the second, a KIF14 candidate, was replaced by two variants in THG1L that were published as disease-causing and matched the phenotype (Rabin et al., 2021).

| DISCUSSION
This study evaluates the relationship between CGVs and their perceived roles in disease causation over time. Several existing studies of nondiagnostic exome or genome reanalysis have solved a portion of undiagnosed cases in addition to identifying CGVs (Eldomery et al., 2017; Hagman et al., 2017; Liu et al., 2019; Schobers et al., 2022; Shashi et al., 2019). However, few articles have discussed the follow-up of these CGVs and how many are ultimately determined to be disease-causing over a period of time. This study primarily focuses on reevaluating previously identified CGVs (Schmitz-Abe et al., 2019).
We focus heavily on CGVs as part of our gene discovery strategy by sharing them on various data-sharing platforms such as GeneMatcher and Matchmaker Exchange, performing functional analysis in our laboratory as needed, and diligently collaborating with other groups interested in many of these genes, leading to the publication of new gene-disease associations and phenotypic expansions. We revisited the initial cohort of 33 CGVs, and in less than 4 years, 16 of them have been confirmed to be newly associated with disease or have had the disease phenotype expanded. This suggests that a large proportion of CGVs turn out to be disease-causing after consistent follow-up.
We often consider ordering additional genetic testing, including whole genome sequencing, RNA sequencing, and methylation studies, when initial next-generation sequencing such as CES identifies a CGV that is deemed inconclusive (Wojcik et al., 2023). While these tests should remain part of the consideration, adequate attention and resources should be given to CGVs. In order to incorporate this practice into clinical care, we propose a twofold approach: inclusion of the follow-up steps described herein by commercial laboratories to target CGVs flagged on initial analysis, paired with clinician referral of patients with negative CES to research groups specializing in this type of genetic reanalysis. Overall, we recommend a close follow-up of CGVs by sharing them on various data-sharing platforms and reevaluating them periodically, both among commercial laboratories and research groups. This will enable new gene discoveries, gene-disease connections, and phenotypic expansions, assisting families with diagnosis and potential therapeutics.

FIGURE 1 Candidate gene variant classification. CGV, candidate gene variant.
TABLE 1 Candidate gene classification. a Novel gene.
TABLE 2 List of Group 1 cases. a Novel gene. b Singleton.
Abbreviations: CADD, Combined Annotation Dependent Depletion; CES, clinical exome sequencing; ExAC, Exome Aggregation Consortium; gnomAD, Genome Aggregation Database; LP, likely pathogenic; NA, not available; P, pathogenic; PolyPhen-2, Polymorphism Phenotyping v2; PROVEAN, Protein Variation Effect Analyzer; REVEL, Rare Exome Variant Ensemble Learner; SIFT, Sorting Intolerant From Tolerant; VUS, variant of uncertain significance.