Annotation of functional impact of voltage‐gated sodium channel mutations

Abstract
Voltage-gated sodium channels are pore-forming transmembrane proteins that selectively allow sodium ions to flow across the plasma membrane according to the electrochemical gradient, thus mediating the rising phase of action potentials in excitable cells and playing key roles in physiological processes such as neurotransmission, skeletal muscle contraction, heart rhythm, and pain sensation. Genetic variations in the nine human genes encoding these channels are known to cause a large range of diseases affecting the nervous and cardiac systems. Understanding the molecular effect of genetic variations is critical for elucidating the pathologic mechanisms of known variations and for predicting the effect of newly discovered ones. To this end, we have created a Web-based tool, the Ion Channels Variants Portal, which compiles all variants characterized functionally in the human sodium channel genes. This portal describes 672 variants, each associated with at least one molecular or clinical phenotypic impact, for a total of 4,658 observations extracted from 264 different research articles. These data were captured as structured annotations using standardized vocabularies and ontologies, such as the Gene Ontology and the Ion Channel ElectroPhysiology Ontology. All these data are available to the scientific community via neXtProt at https://www.nextprot.org/portals/navmut.


INTRODUCTION
Ion channels are integral membrane proteins that allow ions to flow across membranes in all living cells, playing an important role in key physiological processes such as neurotransmission, muscle contraction, learning and memory, secretion, cell proliferation, regulation of blood pressure, fertilization, and cell death. In humans, 344 genes encode the pore-forming subunits of ion channels. Mutations in more than 126 ion channel genes, as well as in several genes encoding ion channel-interacting proteins, have been reported to cause diseases known as channelopathies (Ashcroft, 2006). The voltage-gated sodium channel group comprises nine members in mammals: SCN1A-SCN5A, corresponding to Nav1.1 to Nav1.5, and SCN8A-SCN11A, corresponding to Nav1.6 to Nav1.9 (Table 1). SCN7A, corresponding to Nax, is not included in this group as it is not voltage sensitive. These channels are proteins of about 2,000 amino acids that are composed of four domains (DI-DIV), each consisting of six transmembrane helices (S1-S6) and connected to each other by the L1, L2, and L3 linkers (Figure 1). They interact with other proteins that modulate channel gating properties as well as channel trafficking and subcellular localization. Regions known to be involved in the gating properties and the protein-protein interactions are listed in Table 2 (Ahern, Payandeh, Bosmans, & Chanda, 2016).
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Disruption of any aspect of the channel function can cause a wide spectrum of diseases. Depending on their tissue expression, defects in their function cause disorders such as epilepsy and seizures, paralysis, myotonia, pain disorders, autism, cognitive impairments, as well
TABLE 1 Diseases associated with mutations in voltage-gated sodium channels
family; all data are represented using standardized vocabularies; and last but not least, each statement is reviewed by an expert curator and goes through a thorough quality control process.

Selection of data for curation
The Ion Channels Variants Portal presents data on the molecular, cellular, and organ-level phenotypes caused by genetic variations in the nine human voltage-gated sodium channels. Only variations that affect the coding sequence and give a defined mutated product have been considered, excluding intronic or splice site variations. This information was manually extracted from research articles indexed in PubMed and captured using an in-house developed biocuration software platform, the BioEditor. This tool allows the capture of structured annotations using standardized vocabularies. The experimental evidence supporting each annotation is also captured, including the reference, the type of assay, the protein origin, and the biological system in which the experiment was performed.

Data model
Annotation statements (Table 3A) are triplets composed of (1) a subject that corresponds to the protein variation being annotated; (2) a relation; and (3) an object. The relations (Table 3B) linking the subject and the object are grouped into two main concepts: "no impact" and "impact." The possible impacts of variations can be further specified using "increases," "decreases," or "gains function." The relation "causes phenotype" is used for mammalian phenotype terms, and specific impacts of variations on electrophysiological parameters can be described with relevant relations such as "depolarizes," "hyperpolarizes," "hastens," "slows," or "impacts temperature-dependence of." For each annotation, detailed information about the experimental support of each statement is captured as evidence statements (Table 3C). The annotation evidence is composed of (1) one or more terms from the Evidence and Conclusion Ontology (Chibucos et al., 2014), describing the experiment performed, such as basic biological experiments or specific types of electrophysiological recordings; (2) the protein origin, which represents the species from which the protein was obtained for the experiment, described using the NCBI taxonomy (https://www.ncbi.nlm.nih.gov/taxonomy); (3) the biological system in which the experiment was done, which may contain one or more of these elements: the organism, from the NCBI taxonomy; the tissue or cell type, from the CALOHA human anatomy vocabulary (ftp://ftp.nextprot.org/pub/current_release/controlled_vocabularies/caloha.obo); or the cell line, from the Cellosaurus knowledge resource (http://web.expasy.org/cellosaurus/); (4) a qualitative assessment of the severity of the phenotype, either "Mild," "Moderate," or "Severe"; (5) a quality flag: each piece of evidence is labeled as either Gold (high quality) or Silver (good quality), where the "Silver" tag is used when the quality of the experiment is not optimal according to the curator's judgment; and (6) one reference, usually identified using a PubMed ID.
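As an illustration of the data model described above, the triplet and its evidence could be represented roughly as follows. This is a minimal sketch, not the portal's actual schema: the class and field names are hypothetical, and the evidence values in the example are placeholders rather than curated data. Only the triplet "SCN5A-p.Ser2014* decreases macroscopic conductance" is taken from the text (Table 3 caption).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Severity(Enum):
    MILD = "Mild"
    MODERATE = "Moderate"
    SEVERE = "Severe"


class Quality(Enum):
    GOLD = "Gold"      # high quality
    SILVER = "Silver"  # good quality, but not optimal per curator judgment


@dataclass
class Evidence:
    """One evidence statement supporting an annotation (hypothetical field names)."""
    eco_terms: List[str]          # one or more Evidence and Conclusion Ontology terms
    protein_origin: str           # species the protein was obtained from (NCBI taxonomy)
    biological_system: List[str]  # organism, tissue/cell type, and/or cell line
    severity: Severity
    quality: Quality
    reference: str                # usually a PubMed ID


@dataclass
class Annotation:
    """A subject-relation-object triplet with its supporting evidence."""
    subject: str   # the protein variation being annotated
    relation: str  # e.g. "decreases", "hyperpolarizes", "causes phenotype"
    obj: str       # the annotated term
    evidence: List[Evidence] = field(default_factory=list)

    def statement(self) -> str:
        return f"{self.subject} {self.relation} {self.obj}"


# Placeholder evidence values, for illustration only.
example = Annotation(
    subject="SCN5A-p.Ser2014*",
    relation="decreases",
    obj="macroscopic conductance",
    evidence=[
        Evidence(
            eco_terms=["<ECO term>"],
            protein_origin="<NCBI taxonomy species>",
            biological_system=["<cell line>"],
            severity=Severity.MODERATE,
            quality=Quality.GOLD,
            reference="<PubMed ID>",
        )
    ],
)
print(example.statement())
```

Keeping the evidence as a list on the annotation reflects the statement that each annotation may be supported by several evidence statements, each with its own reference and quality flag.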

Assessment criteria for phenotype severity and data confidence
The criteria used to grade the severity of the phenotypic observations are based on the fold-change of activity, response, and so on, in the mutant compared with the wild-type control under the same conditions. A phenotype is considered "Mild" when the change is around 10%-20% of the control. "Moderate" is assigned when the change is between 20% and 80% of the control, whereas "Severe" is used to describe changes exceeding 80% of the control.

TABLE 3 Annotation elements: (A) basic triplet statement for "SCN5A-p.Ser2014*" variant, (B) relations, and (C) evidence for the triplet "SCN5A-p.Ser2014* decreases macroscopic conductance" (columns: Element, CV/Ontology, Example)
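The severity thresholds given in this section can be sketched as a small grading function. This is a hedged illustration only: the function name is hypothetical, the change is interpreted here as the absolute percent difference from the wild-type value, and the handling of the exact boundaries (20% and 80%) and of changes below 10% is an assumption, since the text does not specify them and curators apply the criteria case by case.

```python
from typing import Optional


def grade_severity(mutant: float, wild_type: float) -> Optional[str]:
    """Grade a phenotype from the percent change relative to the wild-type control.

    Assumptions (not stated in the text): exactly 20% is graded "Mild",
    exactly 80% is graded "Moderate", and changes below 10% return None.
    """
    if wild_type == 0:
        raise ValueError("wild-type control value must be non-zero")
    change = abs(mutant - wild_type) / abs(wild_type) * 100.0
    if change > 80.0:
        return "Severe"
    if change > 20.0:
        return "Moderate"
    if change >= 10.0:
        return "Mild"
    return None  # below the range covered by the stated criteria


# Example with made-up measurements (arbitrary units):
print(grade_severity(mutant=50.0, wild_type=100.0))
```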
Each piece of evidence is also assigned a qualitative confidence score, Gold or Silver. This is subjective to some extent and varies on a case-by-case basis, but our guidelines for assigning Gold quality to an experiment require either good statistical significance (P ≤ 0.01, for experiments where appropriate) or substantial relevance of the experiment, including appropriate controls. For example, a Silver tag may be assigned when the data are qualitative and/or statistical evaluation is missing, when errors are very large, when the data result from a low-confidence assay (for example, low replicate number or poorly defined experimental systems), or when the experiment is carried out on a non-human protein that is evolutionarily distant from the human protein.

Quality control
To ensure data integrity, the annotations undergo both automated and manual checks. Automated checks ensure that the annotation is complete, that is, it contains a subject, a relation, an object, a reference, at least one evidence code, and the species in which the experiment was done. In the case of sequence variations, our software checks that the original amino acid at the annotated position is found in the sequence being annotated. Manual checks are also performed to ensure correct and consistent annotations. For example, controlled vocabulary terms used in the annotations are checked to ensure that terms used for biological processes are consistent throughout the annotation corpus. Once all the checks are successful, the annotations obtain a "Valid" status and are integrated into the portal.
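The two automated checks described above (completeness, and agreement of the original amino acid with the reference sequence) could look roughly like the following. This is a sketch under stated assumptions, not the portal's software: the dictionary keys, function names, and the toy sequence are hypothetical, and the variant is assumed to be written in the "p.Ser2014*" style used in the Table 3 caption.

```python
import re
from typing import Dict, List

# Fields the text lists as mandatory for a complete annotation (key names are hypothetical).
REQUIRED_FIELDS = ("subject", "relation", "object", "reference", "evidence_codes", "species")

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def missing_fields(annotation: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not annotation.get(name)]


def original_residue_matches(variant: str, sequence: str) -> bool:
    """Check that the residue named in the variant is found at that position (1-based)."""
    match = re.search(r"p\.([A-Z][a-z]{2})(\d+)", variant)
    if match is None:
        raise ValueError(f"cannot parse variant: {variant!r}")
    residue = THREE_TO_ONE.get(match.group(1))
    position = int(match.group(2))
    if residue is None or not 1 <= position <= len(sequence):
        return False
    return sequence[position - 1] == residue


# Toy sequence for illustration only; not a real channel sequence.
toy_sequence = "MSTK"
print(original_residue_matches("TOY-p.Ser2Ala", toy_sequence))
print(missing_fields({"subject": "TOY-p.Ser2Ala", "relation": "decreases"}))
```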

Creation of the Ion Channels Variants Portal
The Ion Channels Variants Portal is specific to voltage-gated sodium

Impact of Nav variants on protein function
The Ion Channels Variants Portal contains 4,658 phenotypic observations on 672 variants, both natural and artificially generated, each with at least one phenotypic impact, extracted from 264 publications (Table 4).
The corpus of data available for each Nav varies widely, with SCN2A, SCN4A, and SCN5A having the highest number of variants, whereas

Distribution of Nav variants in different topological regions, domains, and segments
To simplify the presentation of the variant distribution, we first divided the proteins based on two topological types: the cytoplasmic regions, composed of the amino- and carboxyl-termini as well as the three linkers L1, L2, and L3, and the DI-DIV domains (mostly composed of transmembrane regions).
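The division into cytoplasmic regions and DI-DIV domains can be sketched as a lookup of each variant position in a table of region boundaries. The boundaries below are toy values on an imaginary 130-residue protein, chosen only to make the example runnable; they are not real Nav coordinates, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# (region name, first residue, last residue, topological type)
Region = Tuple[str, int, int, str]

# Toy layout for illustration only; NOT real channel coordinates.
TOY_LAYOUT: List[Region] = [
    ("N-terminus", 1, 10, "cytoplasmic"),
    ("DI", 11, 30, "domain"),
    ("L1", 31, 40, "cytoplasmic"),
    ("DII", 41, 60, "domain"),
    ("L2", 61, 70, "cytoplasmic"),
    ("DIII", 71, 90, "domain"),
    ("L3", 91, 95, "cytoplasmic"),
    ("DIV", 96, 115, "domain"),
    ("C-terminus", 116, 130, "cytoplasmic"),
]


def locate(position: int, layout: List[Region]) -> Tuple[str, str]:
    """Return (region name, topological type) for a 1-based residue position."""
    for name, start, end, kind in layout:
        if start <= position <= end:
            return name, kind
    raise ValueError(f"position {position} is outside the layout")


def fraction_by_type(positions: Iterable[int], layout: List[Region]) -> Dict[str, float]:
    """Fraction of variant positions falling in each topological type."""
    counts = Counter(locate(p, layout)[1] for p in positions)
    total = sum(counts.values())
    return {kind: count / total for kind, count in counts.items()}


# Four made-up variant positions: one cytoplasmic, three in domains.
print(fraction_by_type([5, 15, 45, 100], TOY_LAYOUT))
```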
Consistent with the important role of the DI-DIV domains in channel gating (Bennett et al., 1995), most variants are located in these regions (Figure 3A): seven out of the nine channels have over 60% of their variants in the DI-DIV domains. However, the SCN5A and SCN8A variants are evenly distributed between the domains and cytoplasmic regions. In the case of SCN5A, the cytoplasmic regions have been extensively studied because they are targeted by a high proportion of disease-causing mutations. The underlying defect of the SCN5A mutations in the L3 and the C-terminal regions has been linked to their role in the inactivation gating process (Bennett et al., 1995; Mantegazza, Yu, Catterall, & Scheuer, 2001; Wehrens et al., 2003). It is not clear why for domains DIII and DIV that contain more artificial mutants due to their extensive study in SCN2A and SCN4A (Figure 3B).

Distribution of variants within S1-S6 transmembrane segments
Looking in more detail at the S segments regardless of the domain they belong to, we observed that the natural variants impacting electrophysiological parameters are distributed predominantly in the S4 segment (41%), whereas the artificial variants are mainly localized in the S6 segment (37%) (Figure 3C). The S6 segments are critical for permeation and fast inactivation because of their close proximity to the selectivity filter and inactivation gate (McPhee, Ragsdale, Scheuer, & Catterall, 1994), whereas the S4 segments play an essential role in the voltage-sensing domain (VSD) (Stuhmer et al., 1989). In addition, the S4 segments of the DIV domain have been shown to play a unique role in activation-inactivation coupling (Chen, Santarelli, Horn, & Kallen, 1996). Interestingly, a large fraction of disease-causing variants are also localized in these segments: 25% in S4 and 23% in S6. Thus, as expected, the distribution of variants with electrophysiological effects correlates with the importance of the D domains and the S segments for channel function.

Distribution of non-electrophysiological phenotypes
Finally, we looked at the annotations affecting phenotypes other than electrophysiological parameters, such as binding, signaling, effects on cellular function, localization, and degradation. We noticed that these variants are mostly localized in the cytoplasmic regions (N-terminus, linkers, and C-terminus) rather than in the domains DI-DIV (63% and 37%, respectively), which is most likely related to the fact that transmembrane sequences from the domains offer less physical access to binding. All these variants impair interactions with intracellular proteins (outlined in Table 2) that regulate the trafficking and/or activity of the channels, including protein kinases (PKC, PKA, CSNK2A2, GSK3B, etc.), ubiquitin-protein ligases (NEDD4 and NEDD4L), and trafficking partners such as Ankyrin (ANK2 and ANK3), Syntrophin (SNTA1, SNTB1, SNTB2, SNTG1, SNTG2), fibroblast growth factor homologous factors 1-4 (FGF-11/-12/-13/-14), calmodulin, and so on (Abriel & Kass, 2005).

Comparison of disease-associated variants and experimentally characterized variants
One important challenge in medicine is to correlate the symptoms of patients (phenotypes) with the causative genetic variations. Conversely, as genome sequencing is expected to become a standard medical laboratory practice, it may be possible to identify mutations in individuals before any symptoms are visible. This practice may be especially relevant for Nav variants, as several associated diseases have a relatively late onset and mild disease progression, thus providing ample opportunities for disease prevention or attenuation if diagnosed in time. Our portal therefore has important potential applications in this area. A total of 1,289 disease-associated variants in the nine voltage-gated sodium channels were obtained from ClinVar and neXtProt (Table 4); the portal has functional annotations for 18% of them (231 variants). Hence, the largest fraction of disease-associated variants has not been functionally characterized. Nevertheless, there are 441 variants not described to cause diseases but with functional data that could contribute to understanding the pathogenic potential of new or paralogous variants. A detailed view of the overlap between the disease variants and the experimentally characterized variants is shown in Figure 4, where the fraction of variants in each of the domains and linkers is shown for each Nav. The overall distribution of the variants with phenotypic data generally matches that of the disease variants, except for the C-terminal region. This exception is likely due to the specific research focus on the regulatory interactions occurring in this region, for example with calmodulin and NEDD4L. These data indicate that despite the importance of this C-terminal region, only a few natural mutations have been found in this region and shown to trigger disorders.
FIGURE 4 Stacked fraction of (A) the disease-causing variants per region and of (B) the characterized variants per region
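The overlap figures quoted in this section follow from three counts given in the text: 1,289 disease-associated variants, 672 characterized variants in the portal, and 231 variants in both sets. The short calculation below reproduces the derived values; the function and key names are hypothetical, and the count of disease variants without functional data is derived here by subtraction rather than stated in the text.

```python
from typing import Dict


def overlap_summary(disease: int, characterized: int, both: int) -> Dict[str, float]:
    """Derive overlap statistics from two set sizes and the size of their intersection."""
    return {
        # Share of disease-associated variants that have functional annotations.
        "percent_disease_characterized": 100.0 * both / disease,
        # Characterized variants not described to cause disease.
        "characterized_only": characterized - both,
        # Disease-associated variants without functional annotations (derived value).
        "disease_only": disease - both,
    }


summary = overlap_summary(disease=1289, characterized=672, both=231)
print(round(summary["percent_disease_characterized"]))  # rounds to 18
print(summary["characterized_only"])                    # 441
```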

CONCLUSIONS
The Ion Channels Variants Portal we have developed provides an exhaustive list of variants in voltage-gated sodium ion channels for which molecular phenotypes are available, curated in a highly structured model, with detailed information about the experimental system, without redundancy in the data and with complete traceability to the original experimental results. Researchers as well as clinical geneticists will be able to consult this database to have a comprehensive overview of the available data, which may be used to support the clinical decision process. Furthermore, with the large amount of data available, correlations between different mutations and certain diseases may be used to predict the effect of similar mutations in paralogous proteins.
The Ion Channels Variants Portal will undoubtedly be a useful resource for a better understanding of ion channel function, essential for understanding channelopathies. To be consistent with this aim, we will continue our effort to integrate newly characterized Nav variants and we welcome contact with groups having data sets that they would like to be considered for inclusion in the portal. Finally, we also plan to use the same approach to expand the annotation corpus to other ion channels.

ACKNOWLEDGMENT
The neXtProt server is hosted by Vital-IT, the SIB Swiss Institute of Bioinformatics' Competence Centre in Bioinformatics and Computational Biology. This work was funded by the Swiss National Science Fund Grant CR33I3_156233 to AB and HA.

DISCLOSURE STATEMENT
The authors declare no conflict of interest.