Monogenic diabetes syndromes: Locus-specific databases for Alström, Wolfram, and Thiamine-responsive megaloblastic anemia

Citation for published version: Astuti, D, Sabir, A, Fulton, P, Zatyka, M, Williams, D, Hardy, C, Milan, G, Favaretto, F, Yu-Wai-Man, P, Rohayem, J, López de Heredia, M, Hershey, T, Tranebjaerg, L, Chen, JH, Chaussenot, A, Nunes, V, Marshall, B, McAfferty, S, Tillmann, V, Maffei, P, Paquis-Flucklinger, V, Geberhiwot, T, Mlynarski, W, Parkinson, K, Picard, V, Bueno, GE, Dias, R, Arnold, A, Richens, C, Paisey, R, Urano, F, Semple, R, Sinnott, R & Barrett, TG 2017, 'Monogenic diabetes syndromes: Locus-specific databases for Alström, Wolfram, and Thiamine-responsive megaloblastic anemia', Human Mutation, vol. 38, no. 7, pp. 764-777. https://doi.org/10.1002/humu.23233


INTRODUCTION
Monogenic diabetes syndromes are characterized by glucose intolerance together with extrapancreatic features, and result from one or more defects in a single gene. There are about 40 different genetic subtypes identified so far, with an estimated prevalence of 2%-5% of all patients with diabetes (Schwitzgebel, 2014). The wide phenotypic and genetic heterogeneity poses significant problems for our understanding of disease mechanisms, and for providing prognostic information. This is compounded by the identification of diabetes syndrome gene variants of uncertain significance in isolated diabetes through the widespread application of next-generation sequencing (Alkorta-Aranburu et al., 2014; Ellard et al., 2013; Philippe et al., 2015). There are no up-to-date variant databases for most monogenic diabetes syndrome genes; those that do exist contain limited historical variants on publicly available websites (HGVS: http://www.hgvs.org/dblist/dblist.html, GEN2PHEN: http://www.gen2phen.org/data/lsdbs, LOVD: http://grenada.lumc.nl/LSDB_list/lsdbs, WAVe: http://bioinformatics.ua.pt/WAVe, ClinVar: http://www.ncbi.nlm.nih.gov/clinvar).
Wolfram (type 1, MIM# 222300; type 2, MIM# 604928), Alström (MIM# 203800), and Thiamine-responsive megaloblastic anemia (MIM# 249270) syndromes are rare, monogenic syndromes in which diabetes is a common feature. They are chronically debilitating, highly complex, and, in common with other rare diseases, often subject to misdiagnosis, delayed diagnosis, and nondiagnosis. The syndromes exhibit clinical overlap: all can cause profound visual and hearing impairment, and diabetes mellitus (DM) or impaired glucose tolerance. With 0.57, 0.14, and 0.1 cases per 100,000 (Prevalence of rare diseases: Bibliographic data, 2013), all three syndromes also fall within the EU rare disease definition of "a prevalence of not more than 5 affected persons per 10,000 population" (Regulation (...). EURO-WABB is an EU initiative to widen access to genetic testing, clinical information, and research for the overlapping rare diabetes syndromes Wolfram, Alström, Bardet-Biedl syndrome, and others, in Europe (www.euro-wabb.org). As part of this project, we have created a new locus-specific database to provide catalogs of gene variations involved in monogenic diabetes syndromes. By building on the existing generic frameworks and platforms for rare diseases, this gene variant database operates at a disease-specific level to support efficient diagnosis and research for these syndromic diabetes diseases.

THE GENES
This report focuses on four genes in the EURO-WABB locus-specific database, namely ALMS1, WFS1, CISD2, and SLC19A2.
Pathogenic variants in WFS1 (MIM# 606201) cause Wolfram syndrome (WS) type 1, a rare neurodegenerative disease characterized by DM and optic atrophy (OA). The gene is located on chromosome 4p16.1, consists of eight exons spanning 33.4 kb of genomic DNA, and codes for an 890 amino acid protein, Wolframin (Inoue et al., 1998). Wolframin is an endoplasmic reticulum (ER) membrane protein (Takeda et al., 2001), thought to function as an ER calcium channel or a regulator of ER calcium channel activity (Osman et al., 2003), and is involved in the unfolded protein response via interaction with and regulation of the ER stress sensor ATF6 (Fonseca et al., 2010). It is under regulation by the ER stress sensors PERK, IRE1alpha, and ATF6-beta (Fonseca et al., 2005; Odisho, Zhang, & Volchuk, 2015).
WS type 1 is also known as DIDMOAD after the clinical features associated with the disease (diabetes insipidus, diabetes mellitus, optic atrophy, and deafness). Although nonautoimmune insulin-dependent DM is the most common manifestation of WS, the most frequent causes of morbidity and mortality associated with the disease are neurological disorders and urinary tract complications (Kinsley, Swift, Dumont, & Swift, 1995).
Pathogenic variants in the CISD2 gene (MIM# 611507) have been identified in patients with WS type 2 (Amr et al., 2007; Mozzillo et al., 2014). WS type 2 differs from type 1 in that, so far, neither diabetes insipidus (DI) nor psychiatric disorder has been associated with the disease, and in the novel presence of defective platelet aggregation leading to peptic ulcer bleeding. CISD2 is located on chromosome 4q24, consists of three exons spanning a 64.7-kb genomic region, and codes for a 135 amino acid protein, ERIS (ER intermembrane small protein).
ERIS is a highly conserved zinc finger protein of the ER membrane involved in the regulation of cellular calcium homeostasis and mitochondrial biogenesis (Wang et al., 2014). Immunoprecipitation studies showed that the ERIS protein coded by CISD2 does not interact with Wolframin (Amr et al., 2007). Studies in mice show that cisd2 deficiency causes mitochondrial death and dysfunction accompanied by autophagic death (Chen et al., 2009). To date, only 13 individuals with CISD2 mutations have been reported in the literature (Amr et al., 2007; Mozzillo et al., 2014; Rondinelli, Novara, Calcaterra, Zuffardi, & Genovese, 2015).
The database is based on the Leiden Open-source Variation Database (LOVD) platform V2.0-36 (Fokkema et al., 2011). We included the following minimum data item set: pathogenicity, DNA change, genomic position in the reference sequence and genome assembly (GRCh38), predicted protein change, mutation type, variant remarks (other information available for the variant), technique used, link to published reference if applicable, and the following anonymized clinical data: ethnic origin, gender, consanguinity, and clinical features.
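As a rough illustration of the minimum data item set listed above, a single variant entry could be modelled as a small record type. This is only a sketch: the class name, field names, and example values are hypothetical and do not reflect the actual LOVD schema or any real database entry.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VariantRecord:
    """Hypothetical container for the minimum data items (not the LOVD schema)."""
    gene: str
    pathogenicity: str
    dna_change: str
    genomic_position: str
    genome_assembly: str
    predicted_protein_change: str
    mutation_type: str
    technique: str
    variant_remarks: str = ""
    reference: Optional[str] = None  # link to a published reference, if any
    # Anonymized clinical data
    ethnic_origin: Optional[str] = None
    gender: Optional[str] = None
    consanguinity: Optional[bool] = None
    clinical_features: List[str] = field(default_factory=list)

    def missing_items(self) -> List[str]:
        """Return the names of mandatory text items that were left empty."""
        required = [
            "gene", "pathogenicity", "dna_change", "genomic_position",
            "genome_assembly", "predicted_protein_change",
            "mutation_type", "technique",
        ]
        return [name for name in required if not getattr(self, name)]


# Placeholder values for illustration only; this is not a real submission.
example = VariantRecord(
    gene="WFS1",
    pathogenicity="uncertain significance",
    dna_change="placeholder DNA change",
    genomic_position="placeholder position",
    genome_assembly="GRCh38",
    predicted_protein_change="placeholder protein change",
    mutation_type="missense",
    technique="placeholder technique",
    clinical_features=["DM", "OA"],
)
```

Calling `example.missing_items()` on a complete record returns an empty list, which a curator-side script could use as a simple completeness check before accepting a submission.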
As standard for the LOVD system, the database has links to other services such as PubMed, HGNC, Entrez Gene, OMIM, and GeneCards, in addition to sequence databases. The database catalogs variants identified in patients reported to have been diagnosed with AS, WS type 1/type 2, and TRMA syndrome.
In predicting variant pathogenicity, we followed guidelines from the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) (Richards et al., 2015) and considered other supporting information such as experimental evidence, presence in multiple families, and segregation with disease phenotypes, as well as the prediction algorithms SIFT (Ng & Henikoff, 2003) and PolyPhen-2 (Sunyaev et al., 2001).
The database is implemented on a secure server held by University

Novel variants in ALMS1 and WFS1
We report 17 novel germline ALMS1 variants detected in 17 UK AS patients from 16 families and one Slovakian patient (Table 2) and 23 novel WFS1 variants in 59 UK WS patients from 48 families (Table 3) referred to the West Midlands Regional Genetic Service, Birmingham Women's Hospital and Department of Medicine, Padua University, Italy. All novel variants identified were submitted to the EURO-WABB database (https://lovd.euro-wabb.org).

WFS1 genotype-phenotype analysis
We collected information on the age of onset of DM, OA, deafness, DI, and other reported clinical features from patients in the database and categorized the disease by phenotype and genotype.
Patients whose phenotype could clearly be identified were then assigned into respective genotype and phenotype categories (Supp. ...).
[Figure legend fragment] ... (Krogh, Larsson, von Heijne, & Sonnhammer, 2001) and SMART (Letunic, Doerks, & Bork, 2015). ER, endoplasmic reticulum; CI, confidence interval.
Notes: Group 1: variants predicted to cause complete or partial loss of function (N-terminal nonsense and frameshifts, splice-site variants predicted to cause exon skipping/deletions; C-terminal nonsense and frameshift; N-terminal small in-frame deletions/duplications/insertions/indels); or compound heterozygous where one variant is predicted to cause complete and the other a partial loss of function. Group 2: variants predicted to cause minor loss of function (missense, C-terminal small in-frame deletions/duplications/insertions/indels) or compound heterozygous for a variant predicted to cause partial and minor loss of function. See Supp. ... Sensitivity, specificity, positive predictive value, and negative predictive value were calculated using VassarStats Clinical calculator 1 (www.vassarstats.net).
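For readers who want to reproduce the sensitivity, specificity, and predictive values mentioned in the notes, the standard definitions from a 2x2 table can be sketched as follows. The values in the paper were obtained with VassarStats; the function name and the counts below are hypothetical and serve only to show the arithmetic.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard 2x2-table metrics.

    tp/fp/fn/tn are true-positive, false-positive, false-negative and
    true-negative counts; every row and column total must be non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly identified
        "specificity": tn / (tn + fp),  # negatives correctly identified
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts, not taken from the study.
metrics = diagnostic_metrics(tp=40, fp=10, fn=20, tn=30)
```

With these made-up counts the positive predictive value is 40/50 = 0.8 and the specificity is 30/40 = 0.75; confidence intervals, which VassarStats also reports, are not covered by this sketch.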
UK-White European families. In these cases, we speculate that genetic and environmental interactions may contribute to variable expressivity.
Comparison of the age of onset of DM and OA in group 1 and group 2 genotypes revealed a highly significant difference in phenotypes between the two groups. The mean age of onset of DM was 6.3 ± 3.5 years in patients with group 1 genotypes and 12.0 ± 9.9 years in individuals with group 2 genotypes (P < 0.0001), whereas the mean age of onset of OA was 11.7 ± 5.7 years in individuals with group 1 genotypes and 15.8 ± 11.4 years in individuals carrying group 2 genotypes (P = 0.0023). A significant difference in the age of onset of DI was also observed between individuals carrying group 1 and group 2 genotypes. The mean age of onset of DI was 13.9 ± 6 years and 18.0 ± 10 years in group 1 and group 2 genotypes (P = 0.047), respectively (Table 6). Rohayem et al. (2011) and de Heredia, Clèries, and Nunes (2013) also showed significant differences in the age of onset of DM and DI among patients carrying predicted complete, partial, or minor loss-of-function mutations. However, due to differences in the genotypic classification used by these authors, the mean ages of onset of DM and DI cannot be directly compared. It has been previously reported that some patients harboring a homozygous frameshift variant in the C-terminal end of WFS1 tend to have a delayed onset of OA (Zalloua et al., 2008). We therefore analyzed 19 patients carrying homozygous frameshift variants in the C-terminal region of WFS1 (patients 265, 271-280, 285, 290, 295-298, 300, and 301 in Supp. Table S1), and 33 patients harboring a homozygous frameshift variant in the N-terminal region (patients 14-21, 58, 63, 87-91, 104, 112, 116, 118, 119, 123, 135, 198, 202, 235, 243-247, and 252-254 in Supp. Table S1). There is a slight difference in the age of onset of OA between patients with homozygous frameshift C-terminal variants and patients with homozygous frameshift N-terminal variants (13.2 ± 5 years and 11.2 ± 6.1 years, respectively). However, this is not statistically significant.
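The comparison of OA onset between the C-terminal and N-terminal frameshift groups can be approximated from the summary figures alone. The sketch below computes a Welch two-sample t statistic and its Welch-Satterthwaite degrees of freedom. It rests on several assumptions that the text does not confirm: that the values after ± are standard deviations, that all 19 and 33 patients contributed an OA onset age, and that a t-test is an acceptable stand-in for whichever test the authors actually used. The function name is hypothetical.

```python
import math
from typing import Tuple


def welch_t(mean1: float, sd1: float, n1: int,
            mean2: float, sd2: float, n2: int) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from group means, standard deviations and sample sizes."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# C-terminal group: 13.2 ± 5 years, n = 19 (assumed SD and complete data)
# N-terminal group: 11.2 ± 6.1 years, n = 33 (assumed SD and complete data)
t_stat, df = welch_t(13.2, 5.0, 19, 11.2, 6.1, 33)
```

Under these assumptions t is about 1.28 on roughly 44 degrees of freedom, which is below the two-sided 5% critical value of about 2.0 and therefore in line with the non-significant result reported above.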
Variants associated with a WS phenotype were distributed both outside and inside the transmembrane region, whereas variants involved in the dominant form of WFS1-related disorder were mainly located at the C-terminal end of the protein (Fig. 1).

Future prospects and database update
The clinical overlaps and complexity exhibited by the syndromes mentioned in this report may lead to delayed diagnosis or misdiagnosis. We have demonstrated that a more detailed description of clinical phenotypes in the patients, coupled with genotype information, can provide insight into genotype-phenotype correlations of these syndromes. Unfortunately, the clinical phenotypes are difficult to access, not always available, and sometimes unreliable with respect to age of onset. We hope that the information available for some of the patients in our databases will allow for better understanding of the disease and reliable genetic counselling for the patients and their families. Ultimately, functional studies of the variants will be necessary to further our understanding of disease mechanisms, which will lead to the development of personalized therapies.
The EURO-WABB LOVD locus-specific databases for ALMS1/WFS1/CISD2/SLC19A2 have been available online since 2012 and have received submissions of variants identified in patients.
Future contributors can submit their variants online or by contacting the curators and providing them with the necessary information. When referring to the EURO-WABB ALMS1/WFS1/CISD2/SLC19A2 LOVD databases, we kindly ask users to cite this article.

ACKNOWLEDGMENTS
We sincerely thank referring clinicians, patients and their families, and past contributors for submitting their variants to our databases. We are very grateful for the support of Alström Syndrome UK, Associ-

DISCLOSURE STATEMENT
The authors declare no conflict of interest.