Inter‐lab concordance of variant classifications establishes clinical validity of expanded carrier screening

Abstract Expanded carrier screening (ECS) panels that use next‐generation sequencing aim to identify pathogenic variants in coding and clinically relevant non‐coding regions of hundreds of genes, each associated with a serious recessive condition. ECS has established analytical validity and clinical utility, meaning that variants are accurately identified and pathogenic variants tend to alter patients' clinical management, respectively. However, the clinical validity of ECS, that is, correct discernment of whether an identified variant is indeed pathogenic, has only been shown for single conditions, not for panels. Here, we evaluate the clinical validity of a >170‐condition ECS panel by assessing concordance between >12 000 variant interpretations classified with guideline‐based criteria and their corresponding per‐variant combined classifications in ClinVar. We observe 99% concordance at the level of unique variants. A more clinically relevant frequency‐weighted analysis reveals that fewer than 1 in 500 patients are expected to receive a report with a variant that has a discordant classification. Importantly, gene‐level concordance is not diminished for rare ECS conditions, suggesting that large panels do not balloon the panel‐wide false‐positive rate. Finally, because ECS is intended to serve all reproductive‐age couples, we show that classification of novel variants is feasible and scales predictably for a large population.


| INTRODUCTION
A genetic test is described as having "clinical validity" if it yields a positive result when the clinical condition of interest is present and a negative result otherwise. 1 In the context of expanded carrier screening (ECS), which tests for tens to hundreds of Mendelian conditions simultaneously, patients are not typically affected with the screened conditions; rather, they are most often asymptomatic carriers who are at high risk of having an affected child if their reproductive partners are also carriers for any of the same conditions.
For an ECS panel to be clinically valid, it must correctly identify pathogenic variants in the gene associated with each screened condition. This requirement can be broken down into three steps to assess whether it is satisfied: (a) demonstration that the screened genes are associated with the conditions of interest, (b) evaluation of whether the test accurately discovers variants in those genes ("analytical validity"), and (c) correct discernment of which variants are pathogenic and which are benign. The first step has been satisfied for many of the most prevalent conditions screened by ECS (eg, defects in the CFTR, SMN1/2, FMR1, and HEXA genes cause cystic fibrosis, spinal muscular atrophy, fragile X syndrome, and Tay-Sachs disease, respectively [2][3][4][5]). Efforts are currently underway to apply established criteria 6 for gene-disease association to the less-common conditions on larger ECS panels (manuscript in preparation). The second step, which describes the analytical validity of ECS, is well established in the literature. 7 We recently validated that next-generation sequencing (NGS) coupled with software-assisted manual call review can detect single-nucleotide variants (SNVs), short insertions and deletions ("indels"), and copy-number variants (CNVs) with >99% sensitivity and specificity on a >170-condition ECS panel. 8,9 The third step, correct discernment of variant pathogenicity, has been investigated for some commonly tested ECS conditions, 10 but has not been well established for whole ECS panels. Nevertheless, it is critically important because the sequencing of full exons performed on many ECS offerings can discover novel variants whose pathogenicity must be assessed prior to reporting. 11 Here and elsewhere, we refer to novel variants as those that have not been previously detected and classified by the observing institution.
In an attempt to make variant interpretation more systematic, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) issued joint guidelines that specify combinations of evidence (eg, enrichment in cases relative to controls, clinical impact in animal models, etc.) that can yield the following classifications: benign, likely benign, variant of uncertain significance (VUS), likely pathogenic, and pathogenic. 12 These guidelines recommend that variant classification criteria be applied differently for ECS than for tests performed in an affected population: more stringent criteria must be met for pathogenicity because ECS patients are often asymptomatic, so rare variants with limited evidence are often classified as VUS. ECS laboratories typically only report pathogenic and likely pathogenic alleles (VUSs are not reported in ECS 11).
One way to assess the proficiency of discerning variant pathogenicity is through comparison to the consensus among submitters to public databases like ClinVar 13 (see Section 4). For instance, the clinical validity of hereditary cancer screening was explored through analysis of ClinVar concordance. 14 Other studies of ClinVar data reveal inter-lab disparity in variant classification that manifests as observed discordances. 13,15,16 Because ClinVar submissions often include the evidence underlying each classification, it may be possible to adjudicate discordances and understand their origin (eg, laboratories performing different types of testing may weigh the age of disease onset differently).
We investigated the clinical validity of a >170-condition ECS at the level of variant classification through concordance between our classifications and those from other laboratories with submissions in ClinVar.
We count the number of variants with concordant or discordant interpretations, classify the reasons for discordance, calculate the frequency with which patients' reports contain a variant with disputed interpretation, and assess gene-level concordance rates. Finally, because clinical uptake of ECS is growing and the number of variants requiring classification will increase proportionately, we explore the resources required to maintain the clinical validity of ECS as testing volume scales.

| Variant classification
We retrospectively queried variant classifications used internally at Myriad Women's Health ("MWH"; previously Counsyl, South San Francisco, California) for the Foresight ECS, which uses NGS of full exons or specialized assays to detect SNVs, indels, and CNVs in genes that cause 176 different recessive conditions. These classifications were generated in a manner consistent with the ACMG/AMP variant interpretation guidelines either manually or using software-assisted classification (for variants without literature references), and classifications are routinely re-evaluated as new data are obtained.

| Concordance analysis
We evaluated and categorized differences between MWH and ClinVar classifications for the 172 genes of interest, simplifying assertions to "reportable" (eg, pathogenic or likely pathogenic) vs "not reportable" (eg, benign, likely benign, or VUS). ClinVar assertions were combined per variant using a majority rule. In the case of a tie, the variant was considered non-reportable. The combined entry was used for concordance analysis. The latest MWH interpretations for variants for which either the combined ClinVar interpretation or the MWH interpretation at the time of this study was "reportable" have been submitted to ClinVar under the name Counsyl (https://www.ncbi.nlm.nih.gov/clinvar/submitters/320494/). 18 ClinVar entries were classified as either concordant with MWH interpretations, or falling into one of seven discordance categories: Alleles were weighted by their population frequency (described below) to estimate the frequencies of different variant categories and to establish the respective rates at which a patient is expected to receive a report with concordant or discordant variants.
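The per-variant combination described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes the majority rule is applied after each assertion has been simplified to "reportable" or "not reportable", and the names `simplify`, `combine_clinvar`, and `is_concordant` are hypothetical.

```python
from collections import Counter

# Assumption: only these two classifications count as "reportable".
REPORTABLE = {"pathogenic", "likely pathogenic"}


def simplify(assertion):
    """Collapse a five-tier classification into the two-class scheme."""
    label = assertion.strip().lower()
    return "reportable" if label in REPORTABLE else "not reportable"


def combine_clinvar(assertions):
    """Combine all ClinVar assertions for one variant by majority rule.

    A tie (or an empty list) is resolved as "not reportable",
    mirroring the tie-breaking rule stated in the text.
    """
    counts = Counter(simplify(a) for a in assertions)
    if counts["reportable"] > counts["not reportable"]:
        return "reportable"
    return "not reportable"


def is_concordant(lab_assertion, clinvar_assertions):
    """True when the lab's simplified call matches the combined ClinVar call."""
    return simplify(lab_assertion) == combine_clinvar(clinvar_assertions)
```

For example, two reportable submissions against one benign submission combine to "reportable", whereas one pathogenic and one VUS submission tie and therefore combine to "not reportable".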

| Variant interpretation load analysis
To assess the workload required to interpret novel variants, we estimated the rate of observing novel alleles in the patient population and

| RESULTS
Our approach to assessing the clinical validity of a >170-condition ECS panel is best illustrated through an example (Figure 1). For each partner in a couple, NGS was used to discover variants (Figure 1, top).
Importantly, in this study, we did not explore the efficacy of variant identification, as the analytical validity of ECS has been established previously. 8 Instead we focused on variant interpretation, which follows a guideline-based workflow to gather various forms of evidence that collectively yield a classification (Figure 1

| Discordant interpretations arise partly due to ECS-specific evidence requirements
We sought to elucidate common themes underlying the observed discordant variant classifications by categorizing their causes. For instance, because there is more of a premium on specificity in screening tests than in diagnostic tests, we expected that our ECS classification workflow would tend to favor non-reportable VUS and benign/likely-benign classifications relative to diagnostic tests, which may have relatively more reportable pathogenic/likely-pathogenic classifications. Of the 237 "raw discordances," 76.8% arose from reportable assertions in ClinVar for variants that we did not consider to meet the criteria for being reportable (ie, that we considered VUS or benign/likely benign). After expert review, 44.7% of discordant variants had a clear explanation that warranted removal from further analyses in an ECS context: 25.7% of all discordances were because of variants with seemingly unreliable classifications based on sparse data and a hedged description (eg, an LP classification without case reports, based only on in silico analysis and low allele frequency, and accompanied by free-text stating "[the variant] is a strong candidate for a disease-causing variant, however, the possibility it may be a rare benign variant cannot be excluded"). Eleven percent were variants with no published cases and no other lines of evidence to support pathogenic classification, 3.4% were variants with homozygotes observed in the population (suggesting that the variant is either benign or low penetrance), and 4.6% fell into categories where reporting of variants in a carrier-screening setting might not be appropriate compared to a diagnostic setting (eg, variants with an adult-onset phenotype, variants whose pathogenicity is contingent on the presence or absence of a second variant in the same gene, and variants with reduced penetrance or variable expressivity).
These variants with clearly explained discordances were not counted in the clinical performance analysis (Figures 2-3) because they did not meet MWH's definition of ECS-level evidence for pathogenicity or were not appropriate for reporting in ECS (see Table S3 for details on each of the 106 excluded variants). The remaining 131 (55.3%) raw discordances could not be clearly categorized by an expert; in the majority (74%) of these, MWH did not consider the available evidence sufficient to interpret the variant as reportable (Figure 2) (eg, other labs may be privy to additional patient data that enable the reportable interpretation with confidence). These 131 discordances were used in further analyses.
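The counts and percentages reported above are mutually consistent, which a short arithmetic check makes explicit. The sketch below uses only figures stated in the text (237 raw discordances, 106 excluded, and the four category percentages); the variable and function names are illustrative.

```python
RAW_DISCORDANCES = 237  # all "raw discordances"
EXCLUDED = 106          # clearly explained, removed from further analysis


def pct_of_raw(count):
    """Express a count as a percentage of all raw discordances (1 decimal)."""
    return round(100 * count / RAW_DISCORDANCES, 1)


# Discordances carried forward into the clinical-performance analysis.
remaining = RAW_DISCORDANCES - EXCLUDED

# The four explained categories, as percentages of all raw discordances:
# unreliable classification, no supporting evidence, homozygotes observed,
# and not appropriate for reporting in a carrier-screening setting.
category_pcts = [25.7, 11.0, 3.4, 4.6]
excluded_pct_from_categories = round(sum(category_pcts), 1)

print(remaining, pct_of_raw(EXCLUDED), pct_of_raw(remaining),
      excluded_pct_from_categories)
```

The four category percentages sum to the 44.7% excluded share, and the excluded and remaining shares together account for all 237 raw discordances.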

| The probability of carrying a variant with a legitimate interpretation difference is low, yielding high clinical sensitivity and PPV
We investigated the probability of an ECS patient receiving a report with at least one disputed variant call because prior exploration of variant discordance suggested that interpretation discordance was common. 13 To assess the clinical sensitivity, specificity, PPV, and NPV of MWH variant classifications, comparison to a truth set was needed.
However, for discordant variants, it was unclear whether the interpretation from MWH or ClinVar was correct. Therefore, we approximated the worst-case scenario for MWH by assuming that ClinVar is always the correct source of truth. Under this assumption, the aggregate clinical sensitivity of our ECS panel (based on variant interpretation concordance) is estimated to be >98%, the PPV >99%, the specificity >99.9%, and the NPV >99.9% (Figure 3, red points). These data suggest that there is broad agreement among variant interpretations for an ECS panel with >170 genes.
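One way to picture the calculation is as a frequency-weighted confusion matrix in which the combined ClinVar classification is treated as truth. The following is a minimal sketch under that worst-case assumption, not the authors' implementation; the function name `clinical_metrics`, the tuple layout, and the toy frequencies are assumptions made for illustration.

```python
def clinical_metrics(variants):
    """Frequency-weighted performance metrics with ClinVar as truth.

    `variants` is an iterable of (frequency, lab_reportable, clinvar_reportable)
    tuples, where the two flags are booleans. Each variant contributes its
    population frequency, not a count of 1, to the confusion matrix.
    A metric with a zero denominator is returned as None.
    """
    tp = fp = fn = tn = 0.0
    for freq, lab, clinvar in variants:
        if lab and clinvar:
            tp += freq          # both call the variant reportable
        elif lab:
            fp += freq          # lab reportable, ClinVar not
        elif clinvar:
            fn += freq          # ClinVar reportable, lab not
        else:
            tn += freq          # both non-reportable

    def ratio(numerator, denominator):
        return numerator / denominator if denominator > 0 else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy data only; these frequencies are invented for illustration.
toy = [
    (0.02, True, True),
    (0.001, False, True),
    (0.0005, True, False),
    (0.5, False, False),
]
print(clinical_metrics(toy))
```

Because each variant is weighted by its frequency, a single common discordant variant can move a metric far more than many rare ones.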
We calculated the estimated clinical-performance metrics individually for every gene on the panel to test whether interpretation efficacy decreases for rare conditions (Table S4). For a common disease like cystic fibrosis, the estimated clinical sensitivity was 99.90%, with PPV, specificity, and NPV all >99.9% (Figure 3), comparable to levels of clinical validity reported for hereditary cancer screening of BRCA1/2. 14 The metrics were high for most genes individually, with 55% of genes having a sensitivity of >99.9%, 92% having a specificity of >99.9%, 73% having a PPV of >99.9%, and 84% having an NPV of >99.9% (Figure 3, pie charts). While 73% of genes had a sensitivity of >95%, 15% of genes had a sensitivity between 54% and 95% because of a small number of relatively high-frequency discordant variants that consequently have a large impact on the calculation of gene-level sensitivity. In addition, 12% of genes could not be analyzed for sensitivity because no reportable ClinVar variants observed by another laboratory passed our filtering criteria (see Section 2). Notably, performance did not diminish for rare disorders: it remained high across the range of carrier rates (Figure 3). Generally, benign/likely benign variants tended to consume less time for interpretation than VUS or reportable variants did (Figure 4B, C).

| Variant interpretation can be performed at scale
Variants with literature references tended to take more time to interpret than those without such evidence (Figure 4D), and their interpretation time was also more variable.

| DISCUSSION
Here we evaluated a key aspect of ECS In this study, we differentiated between "raw discordances" and final discordances, and this analysis required the involvement of a variant-interpretation expert. For instance, we observed 61 discor-