Genotype–phenotype associations in 1018 individuals with SCN1A‐related epilepsies

SCN1A variants are associated with epilepsy syndromes ranging from mild genetic epilepsy with febrile seizures plus (GEFS+) to severe Dravet syndrome (DS). Many variants are de novo, making early phenotype prediction difficult, and genotype–phenotype associations remain poorly understood.


| INTRODUCTION
DS usually presents at approximately 5-6 months of age with prolonged, febrile and afebrile, hemiclonic or generalized clonic seizures. From age 9 months to 4 years other seizure types, including myoclonic, absence, and focal seizures, develop. 5,6 Typical antiseizure medications have limited efficacy, and sodium channel blockers are associated with worse outcomes. 7 From age 2 years, cognitive, behavioral, and motor development becomes significantly impaired. Epilepsies within the less severe GEFS+ spectrum also present early in life 8; however, cognitive development is normal, and many cases have a positive family history.
Within SCN1A, pathogenic missense variants account for 45%-55% of DS phenotypes, with the remainder consisting of protein-truncating variants (PTVs), copy number variants, and other mutation types. 9,10 More than 90% of GEFS+ variants are missense. 11 Truncating variants lead to haploinsufficiency, 12 mediated by nonsense-mediated decay (NMD) of mRNA containing premature stop codons, causing total loss of function (LOF). Missense variants have varied functional effects, with those causing complete LOF likely resulting in DS and those causing partial LOF being associated with milder GEFS+ phenotypes. 13,14 Functional studies are time-consuming and costly, so accurate functional information is only available for a small proportion of the thousands of variants associated with DS and GEFS+. 15,16 Disease-associated PTVs are more evenly distributed across the gene, whereas missense variants are relatively confined to regions that code for sections important for channel function, namely, the S4 voltage sensor and S5-S6 pore regions. 11,17 SCN1A is mainly expressed in cortical interneurons, and the predominant disease mechanism in SCN1A-related epilepsies is reduced function in inhibitory interneurons shifting the balance toward increased neuronal excitation. 18 Our recently developed SCN1A prediction model builds on previous work 24,25 and, by utilizing machine learning algorithms, supports clinical judgment in predicting the likelihood of DS or milder GEFS+ phenotypes. 6

Status epilepticus as presenting seizure type is a highly specific (95.2%) but nonsensitive (32.7%) feature of DS.
Significance: Understanding genotype-phenotype associations in SCN1A-related epilepsies is critical for early diagnosis and management. We demonstrate an earlier disease onset in patients with missense variants in important functional regions, the occurrence of GEFS+ truncating variants, and the value of in silico prediction scores. Status epilepticus as initial seizure type is a highly specific, but not sensitive, early feature of DS.

KEYWORDS
Dravet syndrome, GEFS+, genotype-phenotype associations, SCN1A, severe myoclonic epilepsy of infancy

Key points
• We report genotype-phenotype analysis of an international cohort of 1018 individuals with SCN1A-related epilepsies
• Missense variants in functionally important regions are more often associated with DS, whereas those in nonconserved distal and linker regions are more often associated with less severe GEFS+ phenotypes
• Truncating variants at the very beginning or the very end of the SCN1A gene are rare and more likely to be associated with less severe GEFS+ phenotypes
• Identical missense variants share more similar phenotypic features
• Status epilepticus as the initial seizure type is a highly specific, but not sensitive, feature of DS

The majority of disease-associated SCN1A variants occur de novo, and despite recent advances genotype-phenotype associations are still poorly understood. This study assesses data from a retrospective cohort of 1018 individuals with SCN1A-related epilepsies, exploring how seizure characteristics, variant type, position, and in silico scores relate to the epilepsy phenotype. We hope that further elucidating the relationships between these features will help clinicians better predict phenotype in patients with SCN1A-related epilepsies.

| Phenotype and clinical data
This was a retrospective cohort study of 1018 SCN1A-positive patients from the United Kingdom (n = 276), Australia (n = 203), 6,26 France (n = 201), Italy (n = 126), the Netherlands (n = 109), Belgium (n = 72), and Denmark (n = 31). Cohort details have recently been published as part of the development and validation of a prediction model for early diagnosis of SCN1A-related epilepsies. 27 The cohorts included cases that came via research referrals and from consecutive referrals for clinical genetic testing across various centers in their respective countries.
Epilepsy phenotype was classified by clinicians with expertise in epilepsy diagnosis. The criteria for classification of phenotypes were as follows. DS was defined as generalized or hemiclonic seizures frequently triggered by fever and often prolonged, typically followed by other seizure types including myoclonic, focal impaired awareness, and absence seizures; and normal cognitive and psychomotor development prior to seizure onset, with subsequent slowing, including plateauing or regression of skills in the second year of life. 28 Cases were classified as GEFS+ if phenotypes were concordant with the GEFS+ spectrum and of normal intellect regardless of family history. 26 Onset was recorded as age at first seizure in months. Status epilepticus as the initial seizure type was classified as a first seizure lasting longer than 30 min. 29

| Variant identification
We report on patients whose variants were previously identified and validated as part of published cohorts. 27 A summary of the methodology used to identify variants in those cohorts is as follows. For identification of single nucleotide variants, all 26 exons of SCN1A were examined by Sanger or next generation sequencing. Large-scale gene rearrangements were identified by multiplex ligation-dependent probe amplification. Missense variants and PTVs including premature stop codons, frameshifts that resulted in premature stop codons, large deletions, and whole gene deletions were included. Intronic variants predicted to disrupt splicing were omitted due to the complexity in predicting their functional effects.

| Variant stratification
Missense variants were grouped into one of 43 regions according to their amino acid position in the protein. The region boundaries were determined by the protein topology provided by the SCN1A protein annotation on UniProt. 30 Missense variants at the same amino acid position with matching amino acid substitutions were classified as identical variants.

| Control of repeated family data
If identical variants originated from the same family, data were only counted once per family for analyses that could be affected by bias imparted by repeated measurement. For those variables where data may have differed between family members, for example, age at seizure onset and variant scores, the median value was used so as to best reflect a random sample. An exception was made for families with intrafamilial variable phenotypes, for example, including both DS and GEFS+ affected individuals. Within these families, data could neither be consolidated to one representative data point nor included without introducing repeated measurement of familial data, hence they were excluded from those analyses. This applied to six families with the following variants: p.G163R (Dutch), p.A201V (Dutch), p.V406A (Dutch), p.S570I (Italian), p.T875K (Australian), and p.V982A (Dutch).
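The consolidation rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the record fields (`family_id`, `variant`, `phenotype`, `onset_months`) and the example rows are hypothetical placeholders, and only age at onset is consolidated here.

```python
from collections import defaultdict
from statistics import median


def consolidate_families(records):
    """Return one record per (family, variant) group.

    Groups whose members carry different phenotypes are dropped, mirroring
    the exclusion of families with intrafamilial variable phenotypes.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["family_id"], rec["variant"])].append(rec)

    consolidated = []
    for (family_id, variant), members in groups.items():
        phenotypes = {m["phenotype"] for m in members}
        if len(phenotypes) > 1:
            continue  # mixed DS / GEFS+ family: excluded from these analyses
        consolidated.append({
            "family_id": family_id,
            "variant": variant,
            "phenotype": phenotypes.pop(),
            # Median onset represents the family as a single data point.
            "onset_months": median(m["onset_months"] for m in members),
            "n_members": len(members),
        })
    return consolidated


# Hypothetical example rows (not cohort data).
records = [
    {"family_id": "F1", "variant": "variant_A", "phenotype": "GEFS+", "onset_months": 10},
    {"family_id": "F1", "variant": "variant_A", "phenotype": "GEFS+", "onset_months": 14},
    {"family_id": "F1", "variant": "variant_A", "phenotype": "GEFS+", "onset_months": 24},
    {"family_id": "F2", "variant": "variant_B", "phenotype": "DS", "onset_months": 5},
    {"family_id": "F3", "variant": "variant_C", "phenotype": "DS", "onset_months": 6},
    {"family_id": "F3", "variant": "variant_C", "phenotype": "GEFS+", "onset_months": 12},
]
result = consolidate_families(records)
```

In this sketch family F1 collapses to one record with its median onset, F2 passes through unchanged, and F3 is dropped because its members have divergent phenotypes.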
For analyses specific to identical variants, family members were examined as separate individuals.

| Variant scoring
A number of tools designed to assess the deleteriousness or pathogenicity of a specific variant were used. All variants were given an SCN1A genetic score, which estimates a variant's pathogenicity based on paralogue conservation and physicochemical difference. The higher the SCN1A genetic score, the greater the predicted severity (range = 0 [similar]-207 [dissimilar]). 27 For variants with available chromosomal position data, we obtained a Combined Annotation Dependent Depletion (CADD) score: a derivation of genomic data including sequence context, gene models, evolutionary constraint, epigenetic influence, and functional prediction. The higher the CADD score (range = 1-99, log derived), the greater the likelihood of pathogenicity. 19 Additionally, for all missense variants, the Rare Exome Variant Ensemble Learner (REVEL) score and Grantham score were calculated. 20,21 REVEL scores (range = 0-1) combine the output of 13 individual tools covering amino acid attributes, conservation, and biochemical features, with a higher score corresponding to pathogenicity. Grantham scores (range = 5-215) estimate the degree of physicochemical difference between amino acids, with higher scores indicative of greater dissimilarity.

| Statistical analysis
Significance level was set at α = .05. Median values and interquartile ranges (IQRs) compared differences in onset of two groups when data were nonnormally distributed. We used Mann-Whitney U-tests to assess significant differences in median onset when variables were nonnormal but similarly distributed. Independent sample t-tests measured significant differences in mean values of continuous variables between two groups when data were normally distributed and in exploratory and descriptive analyses. If homogeneity of variances was violated, the unequal variances t-test was reported. Spearman rank correlations were used to find the strength of association between onset and variant scores. Binomial tests with Clopper-Pearson 95% confidence intervals, chi-squared tests, and Fisher exact tests were used to assess the significance of differences in proportions. The Levene test was used to find significant differences in variance between two groups. SD and IQR were utilized as descriptive measures of variance. A Wilcoxon signed-rank test was used to assess the difference between the medians of individuals with the identical variant and the median of their comparative phenotypic cohort. In analyses where multiple testing was conducted, the Bonferroni correction was applied to p-values. Tests were carried out on IBM SPSS Statistics v28 and R Stats Package v4.1.2. The figures with structural heatmaps were generated with PyMOL v2.4.1. Preparation of the variants for the heatmaps was done using custom Python scripts.
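Two of the procedures named above, the Bonferroni correction and the Clopper-Pearson exact interval for a proportion, are simple enough to illustrate with the Python standard library. The study itself used SPSS and R, so the following is only a sketch of the underlying arithmetic; the function names are hypothetical, the interval is found by bisection on the binomial tail rather than by the beta quantile a statistics package would use, and the input p-values are made up for illustration.

```python
from math import comb


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(func, target, increasing):
    """Find p in [0, 1] with func(p) == target for a monotone func."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for k successes in n trials."""
    # Lower bound: smallest p with P(X >= k) = alpha / 2 (increasing in p).
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1 - _binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: largest p with P(X <= k) = alpha / 2 (decreasing in p).
    upper = 1.0 if k == n else _bisect(
        lambda p: _binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper


# Illustrative p-values (hypothetical, not taken from the study).
adjusted = bonferroni([0.001, 0.02, 0.04])

# Interval around the GEFS+ proportion of missense variants (106/486).
ci_low, ci_high = clopper_pearson(106, 486)
```

For routine work an established implementation, such as the binomial test in R used by the authors, is preferable to a hand-rolled bisection.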
Anonymized data not published within this article will be available from the corresponding author by email on reasonable request.

| Ethical approval
Retrospective review of anonymized clinical referral data and variant findings was approved by the relevant institutional review boards (West of Scotland Research Ethics Committee, reference number 16/WS/0203).

| Phenotypic differences of missense variant carriers relate to coding region
For the entire cohort, patients with missense variants located more proximally to the central pore coding region had earlier age at onset, as indicated by the increased red shading centrally in Figure 2A. When focusing on the GEFS+ cohort, a similar pattern was noted. Patients with variants located further away from the central pore coding region presented with later age at seizure onset, as indicated by the increase in blue shading peripherally in Figure 2B. Overall, regardless of the phenotype (DS vs. GEFS+), patients with missense variants affecting pore-forming regions (S5 + S5-S6 linker + S6) had significantly earlier seizure onset compared to patients with missense variants located outside the pore coding regions (mean onset in pore regions = 7.7 months vs. mean onset outside pore regions = 9.3 months, adjusted p < .001).
We then investigated how the missense variant frequency related to age at seizure onset, according to functional region. To do so, we first assessed the variant frequency across the whole protein (486 variants among 2009 amino acid residues; Table S2) and assigned this ratio a value of 1. Individual coding regions were then grouped according to their respective variant frequency ratio (Figure 3A). Regions were labeled "normal" if the ratio was valued at .5-1.5, "variant dense" if the ratio was >1.5, and "variant sparse" if the ratio was <.5. Variant dense regions included conserved N-terminus, S4, S4-S5, S5, S5-S6, and S6. Variant sparse regions included the nonconserved N-terminus, S3-S4, D1-D2, D2-D3, and nonconserved C-terminus. S1-S3, D3-D4, and the conserved C-terminus made up the remaining "normal" regions.
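The density labeling described above amounts to normalizing each region's variants-per-residue rate by the whole-protein rate (486/2009) and thresholding the result. A minimal sketch follows; the function names and the per-region counts in the example are hypothetical (the actual counts are in Table S2), and only the whole-protein totals and the thresholds are taken from the text.

```python
# Whole-protein totals reported in the text (Table S2).
TOTAL_VARIANTS = 486
TOTAL_RESIDUES = 2009


def frequency_ratio(region_variants, region_residues):
    """Region variant rate relative to the whole-protein rate (which is 1)."""
    overall_rate = TOTAL_VARIANTS / TOTAL_RESIDUES
    return (region_variants / region_residues) / overall_rate


def classify_region(ratio):
    """Apply the thresholds from the text: >1.5 dense, <.5 sparse, else normal."""
    if ratio > 1.5:
        return "variant dense"
    if ratio < 0.5:
        return "variant sparse"
    return "normal"


# Hypothetical (variants, residues) counts for three illustrative regions.
example_regions = {
    "region_X": (30, 50),
    "region_Y": (25, 100),
    "region_Z": (5, 200),
}
labels = {
    name: classify_region(frequency_ratio(v, r))
    for name, (v, r) in example_regions.items()
}
```

Normalizing by the whole-protein rate makes regions of very different lengths comparable, which is why a short segment with few variants in absolute terms can still be labeled variant dense.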
First, we compared differences in presentation as measured by age at seizure onset according to the coding regions of the SCN1A protein. Patients with missense variants in variant dense regions consistently presented with earlier seizure onset (all ≤7.0 months; Figure 3B). Patients with variants in variant sparse regions presented with later seizure onset, with only one of these regions being associated with median onset < 7.0 months (nonconserved C-terminus). As a whole, variant dense regions were significantly associated with earlier age at seizure onset compared with both variant sparse (6.0 vs. 10.0 months, p < .001) and normal regions (6.0 vs. 7.0 months, p = .003). Variant sparse regions were also significantly associated with later onset seizures than normal regions (10.0 vs. 7.0 months, p = .036).

FIGURE 1
Distribution of SCN1A variants throughout the gene. (A) Excess of the SCN1A missense variant burden for Dravet syndrome (DS) versus milder genetic epilepsy with febrile seizures plus (GEFS+) phenotypes by protein position. X-axis displays the protein positions made up of the different domain and linker gene regions. Schematic displays the unfurled SCN1-alpha subunit in situ in the cell membrane scaled to the x-axis, alongside the important topological protein regions included according to the length of the protein they make up. The exact amino acid positions contained within each region are given in the appendix (Table S1). Four homologous functional domains (I, II, III, IV) made up of six transmembrane regions (S1, S2, S3 [green], S4 [yellow], S5, S6 [orange]) are connected by linker regions (curved black lines). The S4 regions are positively charged, voltage gating the channel. The S5 and S6 regions make up the lining of the pore. The S5-S6 linker regions critically have parts of their polypeptide chain enter into the pore region through which sodium ions pass. The protein is capped by N- and C-terminus branches, illustrated by NH3+ and CO2−, respectively. (B) Excess of the protein-truncating variant burden for DS versus GEFS+ patients by protein position. X-axis also displays the protein positions coded for by SCN1A exons 1-26. 31 (A, B) DS patient excess is shown in pink; GEFS+ patient excess is shown in blue. The x-axis is divided into sliding windows of 41 residues/protein positions. The horizontal lines traversing the figure represent the expected level of variants identified across the gene if spread evenly, also colored according to phenotype.
We then determined whether the proportion of missense variants associated with GEFS+ phenotypes could be related to this distribution pattern. The proportion of GEFS+-associated missense variants in this cohort was 21.8% (106/486, numbers consolidated as per Materials and Methods section 2.4, Control of Repeated Family Data). Variant sparse regions were associated with increased frequency of GEFS+ (15/29, 51.7%), whereas variant dense regions were mainly associated with DS (280/340, 82.3%, χ² = 19.161, df = 1, p < .001). The exact values used to calculate these proportions are in the supplementary appendix (Table S2). Within the individual gene domains/subdomains, variant density was strongly negatively correlated with the proportion of GEFS+ patients (−.700, p = .004).
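The chi-squared statistic quoted above can be recomputed from the reported proportions with a plain 2 × 2 Pearson test (no continuity correction). In this sketch the two complementary cell counts (14 DS in variant sparse regions, 60 GEFS+ in variant dense regions) are obtained by subtraction, which assumes every consolidated variant is labeled either DS or GEFS+; the function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2 x 2 table [[a, b], [c, d]].

    Returns the statistic and the p-value for 1 degree of freedom,
    using P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Rows: variant sparse, variant dense. Columns: GEFS+, DS.
sparse_gefs, sparse_ds = 15, 29 - 15    # 15/29 GEFS+ in sparse regions
dense_gefs, dense_ds = 340 - 280, 280   # 280/340 DS in dense regions

chi2, p_value = chi_square_2x2(sparse_gefs, sparse_ds, dense_gefs, dense_ds)
print(f"chi2 = {chi2:.3f}, p = {p_value:.2e}")
```

With these counts the statistic rounds to 19.161, matching the value reported in the text.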

| Phenotypic features among patients with identical variants
The cohort contained 14 variants that were found in five or more patients, often from the same family.

TABLE 2
Truncating variants associated with GEFS+ phenotypes.
We compared the variability in age at seizure onset between those who shared an identical variant and those with different variants. We distinguished variants associated with DS only, those associated with other GEFS+ phenotypes only (GEFS+), and those associated with both DS and other GEFS+ phenotypes (Mixed). SD of age at seizure onset for the entire DS cohort was 2.9 months (n = 823). The IQR of onset for the entire other GEFS+ cohort was 11 months. These were considered standard values of seizure onset variability within DS only and other GEFS+/Mixed phenotypes, respectively. All identical DS variant carrier groups exhibited SDs in seizure onset age of <2.9 (Table 3). All identical other GEFS+/Mixed variant carrier groups exhibited IQRs of <11. Identical DS only variant carriers exhibited less variability in age at onset than nonidentical DS variant carriers (1.9 vs. 2.9 months, p = .001). Identical other GEFS+/Mixed variant carriers also showed reduced variability in age at onset compared to nonidentical other GEFS+ variant carriers (8 vs. 11 months, p = .043).

| DISCUSSION
This large cohort study of patients with SCN1A variants provides evidence of association between gene variant features and phenotype. This includes differences in age at seizure onset associated with different variant types, identification of specific regions within SCN1A associated with specific presentations, and initial status epilepticus as a predictive phenotypic marker for DS. We interrogated the value of in silico variant scores as predictors of phenotype.
Age at seizure onset is both earlier and less variable for patients with truncating variants compared to missense variants. These findings are thought to be explained by the differences in disease-causing mechanisms associated with each variant type. PTVs result in nonfunctional alleles causing haploinsufficiency consistent with complete LOF. 12 Recent functional studies show SCN1A missense variants produce various altered states of functionality that are likely to affect severity of the phenotype. Variants associated with nonmeasurable sodium currents (i.e., complete LOF) are associated with severe phenotypes, whereas variants with reduced but measurable current and changes in polarization and activation properties (i.e., partial LOF) are associated with milder phenotypes. 13,14 The increased variance of phenotypic features associated with missense variants within this cohort may be explained by differences in the degree of LOF caused by these pathogenic variants.
Missense variants occur at higher frequencies in regions coding for important functional segments of the SCN1A protein. 11,17 We confirm that these variants broadly cluster in the four functional domains with increased clustering in regions S4-S6. S4 is highly conserved and functions as the voltage sensor, whereas the S5-S6 regions are pore-lining and are responsible for ion selectivity. 32 Our data add that collectively missense variants affecting these regions present with earlier seizure onset compared with those affecting other regions of the protein. Furthermore, these findings may also indicate a previously overlooked functional importance of the conserved N-terminus coding region, as this region is associated with increased pathogenic variant density, earlier age at onset, and increased proportion of DS phenotypes equivalent to regions of already well-established importance. A Japanese cohort recently demonstrated clustering within this region of the N-terminus. 17 Those missense variants occurring in functionally less important areas, the D1-D2 and D2-D3 linkers and nonconserved terminus regions, had a higher chance of being associated with GEFS+ phenotypes and later seizure onset. It is of note that the D3-D4 linker region displayed both higher variant density and lower age at onset when compared to D1-D2 and D2-D3 linkers. The D3-D4 linker region of SCN1A codes for part of the functionally important inactivation gate of the sodium channel and has been previously linked with gain-of-function variants in patients with familial hemiplegic migraine type 3. 33,34 This may explain why this specific linker region exhibits pathogenicity similar to that of functionally more important gene regions (S1-S3 and conserved C-terminus).
The association of milder GEFS+ phenotypes with severe truncating variants is surprising, as these are expected to lead to complete LOF and a severe DS phenotype. So far, this has only rarely been reported. Mosaic variants are thought to account for approximately 7.5% of SCN1A epilepsies. 35 Mosaic variants are associated with milder disease and may provide an explanation for the occurrence of some of these GEFS+ phenotypes, although access to these genetic data for confirmation was not available (apart from the Dutch cohort). We observed that GEFS+-associated PTVs were nearly 10 times more likely to be found in the regions coded by exon 26 or first 100 nt. NMD, the mechanism responsible for haploinsufficiency in PTVs, has been shown to exhibit reduced efficiency within these regions. 36 NMD is least efficient for stop codon variants found within the last exon and first 100 nt of a gene. Variants in these regions may result in the production of proteins due to a faulty NMD process. These proteins may retain some functionality, which could translate into a mitigation of haploinsufficiency. Additionally, NMD efficiency decreases the further downstream from the last exon junction a stop codon variant is located. This could apply to SCN1A, where exon 26 is exceptionally long, and may explain why toward the extreme end of the gene and exon there appears to be a drop-off in disease-associated PTVs entirely. Only one variant within this cohort was identified after position 1930, and this was associated with a GEFS+ phenotype. The increasing frequency of milder GEFS+ phenotypes within the region in question may indicate a trend toward more benign PTV-associated phenotypes within the last exon. This drop in truncating variant frequency in the distal C-terminus may indicate the point where NMD functionality is considerably reduced. Furthermore, a similar pattern can be observed around these regions in the missense variant carrier cohort. Within the nonconserved N-terminus coding region, only four missense variants were identified, all of which were associated with GEFS+ phenotypes. Additionally, after position 1928, only two missense variants (one DS, one GEFS+) were identified, supporting the idea that variants in these functionally less important regions are rarely associated with severe phenotypes for both variant types. Further functional studies regarding truncating variants found in the last exon and <100 nt regions could provide evidence of their partially/fully rescued functional status. It is also conceivable that the real effect of the change is different from an assumed theoretical one, and could for example be associated with a nonsense-mediated altered splicing event that would restore the phase.
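The two positional criteria discussed above (a premature stop codon in the last exon, or within the first 100 nt) can be expressed as a simple flagging rule. The sketch below is a heuristic illustration only and does not predict NMD outcome for any real variant: the function name and arguments are hypothetical, the 100 nt threshold is assumed to be counted from the start of the coding sequence, and the example positions are made up.

```python
LAST_EXON = 26           # SCN1A exons are numbered 1-26 in the text
START_PROXIMAL_NT = 100  # "first 100 nt" threshold from the text


def may_escape_nmd(stop_cds_nt, exon_number,
                   last_exon=LAST_EXON, start_proximal_nt=START_PROXIMAL_NT):
    """Flag a premature stop codon lying in a region of reduced NMD efficiency.

    stop_cds_nt: 1-based coding-sequence position of the stop codon
                 (assumed coordinate system for the 100 nt rule).
    exon_number: exon containing the stop codon.
    """
    in_last_exon = exon_number == last_exon
    near_start = stop_cds_nt <= start_proximal_nt
    return in_last_exon or near_start


# Hypothetical positions used only to exercise the rule.
flags = {
    "early_stop": may_escape_nmd(stop_cds_nt=60, exon_number=1),
    "mid_gene_stop": may_escape_nmd(stop_cds_nt=2500, exon_number=14),
    "last_exon_stop": may_escape_nmd(stop_cds_nt=5500, exon_number=26),
}
```

A flag from this rule only marks a variant as a candidate for partially retained function; as the text notes, functional studies would be needed to confirm any rescue.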
A lack of similarity in anecdotal clinical reports associated with identical SCN1A variants might suggest that SCN1A-related epilepsies lack strong genotype-phenotype associations. Our large sample provided us with a sufficient number of identical variants to test this hypothesis. Using age at seizure onset as our parameter of comparison, all "identical variant" groups within this cohort exhibited less variability in age at seizure onset compared with phenotypic controls (DS and GEFS+). However, we still observed phenotypic discrepancy, although reduced, among some identical variant carriers, which emphasizes the complexity of variant prediction. Further studies of identical variant carrier cohorts with divergent phenotypes could prove useful ground effects.
If easily accessible in silico and biochemical variant scores determined by variant characteristics can show associations to phenotype or severity of disease, they could act as clinical tools in outcome prediction of patients presenting with SCN1A-related epilepsy. Current studies have proven the efficacy of tools such as CADD and REVEL in distinguishing pathogenic from benign SCN1A variants, but evidence linking them to severity and epilepsy type is lacking. 13 The recently developed SCN1A prediction model provided evidence to support its further ability to accurately predict the associated phenotype at onset. 27 However, prediction outputs complement rather than replace the clinical diagnostic workup.
SCN1A-related epilepsies can prove difficult to differentiate at initial presentation. Status epilepticus has long been associated with DS and SCN1A-related epilepsies. 40,41 For the subcohort of individuals where we had access to initial seizure type data, approximately one third of DS patients presented with status epilepticus compared with only 5% of GEFS+ patients. Our findings show that status epilepticus as initial presentation is a nonsensitive but highly specific and predictive feature of DS. This provides clinicians with the ability to more confidently diagnose DS phenotypes in SCN1A-positive individuals based on presenting seizure type. 6 Earlier identification of DS allows clinicians to make more informed treatment choices, communicate the diagnosis and prognosis with patient families/carers, and in the future, identify patients who may benefit from targeted therapies early.
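For readers less familiar with the terms, the following sketch shows how sensitivity, specificity, and positive predictive value are derived from a 2 × 2 table in which the "test" is status epilepticus as the initial seizure type and the "condition" is DS. The counts are invented purely for illustration and are not the cohort's data; the function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, and PPV from 2 x 2 counts.

    tp: DS patients presenting with status epilepticus
    fn: DS patients presenting without it
    fp: GEFS+ patients presenting with status epilepticus
    tn: GEFS+ patients presenting without it
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of DS patients the sign picks up
        "specificity": tn / (tn + fp),  # share of GEFS+ patients without the sign
        "ppv": tp / (tp + fp),          # probability of DS given the sign
    }


# Invented counts for illustration only (not study data).
metrics = diagnostic_metrics(tp=30, fn=70, fp=10, tn=190)
```

A feature with low sensitivity but high specificity behaves as in this example: its absence says little, but its presence makes DS considerably more likely. The positive predictive value also depends on the ratio of DS to GEFS+ patients in the population being assessed.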
The findings presented in this study must be contextualized by certain limitations. This retrospective cohort only included individuals who had been referred for genetic testing or for research purposes, which could impart bias. Our study contained a cohort with a vast number of DS patients compared to GEFS+ patients. With respect to variant types, we found missense variants were responsible for slightly more than half of DS phenotypes, with the remainder made up of truncating variants. GEFS+ phenotypes result largely from missense variants, with a small minority being truncating. These proportions were reflected by those shown in previously reported smaller cohort studies, strengthening the applicability of findings on a population scale. 11,17 It should be noted that patient data were obtained from research centers based in countries with predominantly Caucasian ethnicity and therefore may not entirely reflect differences among other ethnic groups. Additionally, due to the milder nature of disease observed in other GEFS+ phenotypes, it is likely that fewer patients will undergo molecular genetic testing. Hence, it is possible that the true population proportion of SCN1A-associated GEFS+ is higher than that reported in this and other similarly constructed cohorts.

| CONCLUSIONS
This study extends our current knowledge of genotype-phenotype associations in SCN1A-related epilepsies. This includes findings regarding the earlier disease onset in patients with missense variants in important functional regions, the occurrence of GEFS+ truncating variants, the similarity of identical variants, and the value of in silico variant prediction scores. The identification of highly specific early disease features such as status epilepticus aids early diagnosis and may help shape treatment choice in the early stages of disease.

FIGURE 2
Three-dimensional schematic of the SCN1A protein shown from the side, top, and bottom angles. The central opening in the schematics represents the channel pore, and voltage-sensing domain (VSD) sites are indicated. The schematic is colored to indicate where variants are located throughout the gene, with shading representing the associated age at seizure onset of variants found within these regions. Schematic A encompasses all missense variant patients. Schematic B only displays missense variant genetic epilepsy with febrile seizures plus patients.

TABLE 3
Identical variant carriers.