A Pan-European Study of the C9orf72 Repeat Associated with FTLD: Geographic Prevalence, Genomic Instability, and Intermediate Repeats

We assessed the geographical distribution of C9orf72 G4C2 expansions in a pan-European frontotemporal lobar degeneration (FTLD) cohort (n = 1,205), ascertained by the European Early-Onset Dementia (EOD) consortium. Next, we performed a meta-analysis of our data and those of other European studies, together comprising 2,668 patients from 15 Western European countries. The frequency of C9orf72 expansions in Western Europe was 9.98% in overall FTLD, 18.52% in familial FTLD, and 6.26% in sporadic FTLD. Outliers with higher frequencies were Finland (29.33%) and Sweden (20.73%), but also Spain (25.49%). In contrast, the prevalence in Germany was only 4.82%. In addition, we studied the effect of intermediate repeats (7–24 repeat units), which are strongly correlated with the risk haplotype, on disease and on C9orf72 expression. In vitro reporter gene expression studies demonstrated significantly decreased transcriptional activity of C9orf72 with an increasing number of normal repeat units, suggesting that intermediate repeats might act as predisposing alleles and supporting a loss-of-function disease mechanism. Further, we observed a significantly increased frequency of short indels in the GC-rich low complexity sequence (LCS) adjacent to the G4C2 repeat in C9orf72 expansion carriers (P < 0.001), with the most common indel creating one long contiguous imperfect G4C2 repeat, which is likely more prone to replication slippage and pathological expansion.

In the present study, we aimed to extend our C9orf72 observations in the Flanders-Belgian FTLD and FTLD-ALS cohort (n = 360) to a larger European cohort of 845 FTLD and FTLD-ALS patients, in which we determined the geographical distribution and prevalence of the pathological G4C2 expansion. Furthermore, we provide the first evidence for a role of G4C2 intermediate repeat length in C9orf72 expression and propose hypotheses on the genomic mechanisms favoring pathological expansion of the G4C2 repeat.

Study Populations
The European FTLD cohort was collected through the European Early-Onset Dementia (EOD) consortium (Supp. Table S1). The European EOD consortium was launched in August 2011 to centralize and harmonize epidemiological, clinical, and biological data, together with biomaterial of EOD patients throughout Europe, to stimulate high-profile translational dementia research. Supp. Table S1 describes the number of patients per country and per clinical subgroup contributed by the European EOD consortium members. We received DNA and clinical and demographic information on 917 unrelated FTLD and FTLD-ALS patients, as well as histopathology data of 46 patients obtained at autopsy. The 917 patients also included 10 patients from Wallonia, the French-speaking part of Belgium, and six additional patients from Italy, Spain, and Sweden, who were referred for clinical genetic testing to the Diagnostic Service Facility in our Department of Molecular Genetics (DMG DSF). Patients had been diagnosed according to the established Neary clinical diagnostic criteria [Neary et al., 1998] and the Mackenzie consensus criteria for neuropathological diagnosis [Mackenzie et al., 2010].
The Flanders-Belgian cohort consisted of 337 unrelated patients with FTLD and 23 with FTLD-ALS. These patients were recruited through the Belgian Neurology (BELNEU) consortium, a collaboration with neurologists affiliated with nine specialized memory clinics and neurology departments in Belgium [Gijselinck et al., 2012; Van Langenhove et al., 2012b] (Supp. Table S2). In addition to the patient cohort, a Flanders-Belgian control cohort was assembled (n = 1,083). For a more detailed description, see Supp. Materials and Methods.

Figure 1.
Genotyping assays to characterize the C9orf72 region and G4C2 repeat. The C9orf72 G4C2 repeat (yellow box) is located upstream of the first exon of isoform NM_018325.3 (dark blue arrow) and adjacent to a GC-rich low-complexity sequence (LCS; light grey box); their nucleotide sequences are shown above. The sequence of the recurrent 10-bp deletion g.26747_26756delGTGGTCGGGG (Table 4, Supp. Figure S1), which we observed in the LCS, is indicated in blue. Below the sequence, the primers and their corresponding PCR amplicons are shown for each PCR genotyping assay: STR-PCR in pink, forward RP-PCR in green, reverse RP-PCR in red, and RP-PCR for sequencing in blue.

Histopathology of C9orf72 Expansion Carriers
From 11 C9orf72 G4C2 expansion carriers, formalin-fixed brain was available for immunohistochemistry. Five-micrometer sections were obtained from the frontal cortex, temporal neocortex, hippocampus, area striata, neostriatum, mesencephalon, pons, and cerebellum. For seven cases, additional samples were provided from the thalamus and spinal cord. Slides were stained for ubiquitin, p62, hyperphosphorylated tau, β-amyloid, TDP-43, and FUS. For technical details, see Supp. Materials and Methods.

C9orf72 G4C2 Genotyping Assays
We developed an alternative repeat-primed PCR assay (reverse RP-PCR; Fig. 1) and a short tandem repeat (STR) fragment length assay with flanking primers optimized for alleles with high GC content (STR-PCR; Fig. 1), allowing reliable identification of G4C2 expansion carriers and exact sizing of normal repeat lengths. These assays were performed in both cohorts and in younger-generation relatives of index patients carrying an intermediate repeat allele or a variant in the flanking LCS without an expansion.
For technical details on primers and amplification protocols see Supp. Materials and Methods.

Sequencing of the C9orf72 GC-Rich Low Complexity Sequence
We used the product of an alternative forward RP-PCR (RP-PCR for sequencing; Fig. 1) and sequenced the low complexity sequence (LCS) with the locus-specific reverse primer.
The Flanders-Belgian patient (n = 317) and control (n = 752) cohorts and 57 expansion carriers and 114 nonexpansion carriers of the European cohort were successfully screened. Cosegregation of LCS variations with the presence of a G4C2 expansion was analyzed in two available families. For technical details, see Supp. Materials and Methods.

C9orf72 Exon Sequencing and Dosage Analysis
Both cohorts were screened for coding and splice-site mutations in C9orf72. The frequency of rare mutations was determined in 400 controls. Further, we screened the Flanders-Belgian cohort for exonic deletions or duplications using the Multiplex Amplicon Quantification technique [Kumps et al., 2010]. For technical details see Supp. Materials and Methods.

Genetic Association Studies
We tested the association with disease of C9orf72 intermediate G4C2 alleles and of SNP rs2814707, which tags the chromosome 9p21 risk haplotype, stratifying for the presence or absence of C9orf72 intermediate repeats. Odds ratios (OR) with 95% confidence intervals (CI) were calculated in a logistic regression model adjusted for age and gender. In addition, we studied the correlation between the minor risk T-allele and intermediate repeat length. In patients of both cohorts without a G4C2 expansion, we tested whether age at onset differed with normal repeat length using a Kruskal-Wallis test, and we compared age at onset between carriers of short (2-6) and intermediate (7-24) repeat alleles in a Kaplan-Meier survival analysis. All analyses were done in IBM SPSS Statistics 20 (IBM Corporation, Armonk, NY, USA). For details, see Supp. Materials and Methods.
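The Kaplan-Meier comparison of onset age can be illustrated with a minimal product-limit estimator. This is a hedged Python sketch, not the SPSS procedure used in the study: the function name `kaplan_meier` is hypothetical, onset age is treated as the event time, and the optional `observed` flag for right-censored records is an assumption, since censoring is not described here.

```python
from collections import Counter


def kaplan_meier(times, observed=None):
    """Product-limit estimate of S(t), the probability that onset occurs after age t.

    times    -- onset (or censoring) ages, one per patient
    observed -- 1 if onset was observed at that age, 0 if right-censored;
                by default every record is treated as an observed onset
    Returns a list of (age, S(age)) pairs, one per distinct observed onset age.
    """
    if observed is None:
        observed = [1] * len(times)
    onsets = Counter(t for t, e in zip(times, observed) if e)
    leaving = Counter(times)  # records leaving the risk set at each age
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = onsets.get(t, 0)
        if d:
            # multiply by the conditional probability of no onset at age t
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve
```

Calling `kaplan_meier` separately on the onset ages of short-allele and intermediate-allele carriers gives the two curves; a formal comparison between groups (for example, a log-rank test) is not shown.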

Luciferase Reporter Assays
We selected a 2-kb C9orf72 promoter fragment (chr9:27,572,414-27,574,451; NCBI Build 37/hg19) containing the G4C2 repeat and enriched for histone marks, DNaseI hypersensitivity clusters, and transcription factor binding sites based on ENCODE transcription data [Gijselinck et al., 2012]. The fragment was amplified by PCR from individuals carrying different numbers of normal repeat units (2, 9, 17, and 24 units) using primers flanked by attB sites. PCR products were cloned into the pDONR221 vector (Invitrogen, Life Technologies, Grand Island, NY, USA) by a BP recombination reaction (Invitrogen, Life Technologies, Grand Island, NY, USA), and the integrity of all inserts was confirmed by sequence analysis. Correct entry clones were selected and cloned by an LR recombination reaction (Invitrogen, Life Technologies, Grand Island, NY, USA) into an in-house developed promoterless destination vector containing the Gaussia luciferase reporter gene downstream of a Gateway cassette. Human HEK293T cells were propagated and seeded for transient transfection in 24-well tissue-culture dishes at 2 × 10⁵ cells per well and were allowed to recover for 24 hr. Cells were cotransfected in duplicate with 40 ng of pSV40-CLuc plasmid, which encodes the Cypridina luciferase gene under an SV40 promoter (New England Biolabs, Ipswich, MA, USA), and 1,000 ng of C9orf72 promoter construct (three independent constructs per repeat length), using 2.4 μl Lipofectamine 2000 (Invitrogen, Life Technologies, Grand Island, NY, USA). After 24 hr, Gaussia luciferase activity (LA_G) and Cypridina luciferase activity (LA_C) were measured in duplicate in the growth medium using a BioLux Gaussia and Cypridina Luciferase Assay Kit (New England Biolabs, Ipswich, MA, USA) and a Veritas Microplate Luminometer with Dual Reagent Injectors (Promega, Madison, WI, USA). To correct for transfection efficiency and DNA uptake, the relative luciferase activity (RLA) was calculated as RLA = LA_G/LA_C.
This experiment was repeated three times, resulting in 36 measurements for each repeat length. RLAs of different repeat lengths were compared using a Mann-Whitney U test. For details, see Supp. Materials and Methods.
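The normalization and rank-based comparison described above can be sketched in Python. The helper names are hypothetical; the U statistic is computed directly from mid-ranks of the pooled sample (tied values share their average rank) rather than with a statistics package, and no P value is derived. The count of 36 measurements per repeat length is consistent with three experiments × three independent constructs × duplicate transfections × duplicate readings, which is an assumed reading of the protocol. The last helper expresses the change in mean RLA relative to the 2-unit construct as a percent decrease, the form in which the 52% reduction is reported.

```python
from statistics import mean

# Assumed breakdown: experiments x constructs x transfections x readings
MEASUREMENTS_PER_LENGTH = 3 * 3 * 2 * 2


def relative_luciferase_activity(gaussia, cypridina):
    """RLA = LA_G / LA_C for each paired reading (corrects for transfection efficiency)."""
    return [g / c for g, c in zip(gaussia, cypridina)]


def mann_whitney_u(x, y):
    """Return (U_x, U_y) computed from mid-ranks of the pooled sample."""
    pooled = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # values at 0-based positions i..j-1 get ranks i+1..j; store their average
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    rank_sum_x = sum(ranks[v] for v in x)
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


def percent_decrease(reference, test):
    """Decrease of mean RLA in `test` relative to `reference`, in percent."""
    return 100.0 * (1.0 - mean(test) / mean(reference))
```

For example, `percent_decrease(rla_2_units, rla_24_units)` on the two sets of RLA values would give the reduction for the 24-unit promoter; the variable names here are placeholders.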

The European EOD Consortium
The European cohort included 917 unrelated patients, of whom 845 had a possible or probable diagnosis of FTLD (n = 781) or FTLD-ALS (n = 64; Table 1). In the remaining 72 patients, the clinical presentation showed indications of FTLD together with symptoms of other neurodegenerative brain diseases, such as Alzheimer or Parkinson disease. A pathological diagnosis on autopsied brain was obtained for 45 patients, comprising FTLD-TDP (n = 28), FTLD-MND-TDP (n = 15), FTLD-UPS (n = 1), and FTLD-TAU (n = 1). Information on family history of disease was available for 609 (72.07%) of the 845 patients, of whom 251 had a positive family history and 358 were considered sporadic (Table 2). Mean onset age and range were comparable between the FTLD (62.7 ± 9.0 years, range 28-88 years) and FTLD-ALS (60.9 ± 9.8 years, range 31-83 years) groups.
To evaluate the distribution of the G4C2 expansion, we calculated mutation frequencies per country, overall and per clinical phenotype (Supp. Table S2; Table 3). In Belgium, Portugal, and Italy, the pathological G4C2 expansion showed comparable overall frequencies, ranging between 6.09% and 7.86%, and close to the

C9orf72-Associated Clinical and Pathological Phenotype
Of 73 G4C2 expansion carriers in the European cohort, 50 received a clinical diagnosis of FTLD and 23 of FTLD-ALS. The mean onset age in all carriers was 58.0 ± 7.5 years (range 40-75 years) and was comparable in the FTLD and FTLD-ALS subgroups (Supp. Table S3). The mean disease duration in 31 deceased carriers was 5.3 ± 3.5 years (range 1-14 years) but differed between the clinical subgroups: survival was on average 1.4 years shorter for FTLD-ALS carriers (4.7 ± 3.5 years; n = 17, range 1-14 years) than for FTLD carriers (6.1 ± 3.6 years; n = 14, range 1-14 years). For 23 G4C2 expansion carriers, clinical information was extensive enough to allow subclassification into the different FTLD phenotypes: in 22 (95.65%), the clinical presentation was consistent with bvFTD, and one presented with PNFA. Autopsied brain was available for 11 European G4C2 expansion carriers (two Portuguese, five Spanish, one Austrian, one Czech, and two Swedish patients). In all 11 cases, we found TDP-43 pathology in the frontal and temporal neocortex, hippocampus, and neostriatum, compatible with type B TDP proteinopathy [Mackenzie et al., 2010]. Accordingly, TDP-43-immunoreactive neuronal cytoplasmic inclusions (NCI) and dystrophic neurites were widespread over the entire cortical thickness, but NCI were more pronounced in the pyramidal cells of layer 2 than in the deeper cortical layers. In addition to the TDP-43-positive pathology, p62-immunoreactive NCI were present in the granular layer of the dentate gyrus of the hippocampus and in the granular layer of the cerebellar cortex. Further, p62-positive irregular granular NCI were observed in pyramidal neurons of the CA4 and CA3 regions of the hippocampus.

Genomic Complexity in the C9orf72 Region
The C9orf72 G4C2 repeat is contiguous with a GC-rich LCS comprising exon 1 of the C9orf72 transcript NM_018325.3 (Fig. 1). We sequenced the GC-rich LCS in 317 unrelated patients of the Flanders-Belgian cohort and in 752 controls (Fig. 1). We observed heterozygous deletions of 5-23 base pairs (bp) in a total of 19 individuals: 10 patients carrying a G4C2 expansion (10/27 = 37.04%), five noncarrier patients (5/290 = 1.72%), and four control individuals (4/752 = 0.53%; Table 4). These variable deletions were significantly more frequent in carriers of a G4C2 expansion than in noncarrier patients (OR = 33.53; 95% CI 10.31-109.09; P < 0.001) and controls (OR = 93.50; 95% CI 28.53-306.39; P < 0.001). Remarkably, nine of the 10 (90.00%) expansion carriers with a deletion carried the same heterozygous 10-bp GTGGTCGGGG deletion (g.26747_26756delGTGGTCGGGG; Supp. Fig. S1), which was not observed in noncarrier patients or controls (Table 4). This 10-bp deletion is contiguous with the G4C2 repeat and joins two 100% GC sequences, thereby extending the GC-rich motif of the G4C2 repeat with imperfect repeats (Fig. 1). In this context, it is striking that the GTGGT motif (Fig. 1) was deleted in all 10 deletion carriers among the 27 unrelated patients carrying an expanded G4C2 repeat (37.04%), but only once in noncarrier patients (1/290 = 0.34%) and once in control individuals (1/752 = 0.13%; Table 4). To replicate these findings, we successfully sequenced the LCS in 57 unrelated patient carriers and 114 patient noncarriers from the European cohort. We observed a comparably high frequency of deletions and insertions (indels) in the patient carriers (14/57 = 24.56%) and no indels in the noncarriers (0/114, <0.88%; OR = 36.79; 95% CI 4.69-288.35; P = 0.001; Table 4). Seven of the 14 (50.00%) patient carriers with an indel carried the same 10-bp deletion, and in 10 of 14 (71.43%), GTGGT was deleted (Table 4).
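Assuming the reported interval for the expansion carrier versus noncarrier comparison is an unadjusted Woolf (log-based) interval, it can be reproduced from the counts above with a short Python sketch (the function name is hypothetical). With 10 of 27 carriers and 5 of 290 noncarriers carrying a deletion, it gives OR ≈ 33.53 with a 95% CI of about 10.31-109.09. The sketch is not meant for the other comparisons; in particular, this simple interval is undefined when a cell is zero, as for the European noncarriers (0/114).

```python
import math
from statistics import NormalDist


def odds_ratio_woolf(a, b, c, d, level=0.95):
    """Unadjusted odds ratio of a 2x2 table with a Woolf (log-based) CI.

    a -- cases with the variant         b -- cases without the variant
    c -- comparison group with variant  d -- comparison group without it
    All four cells must be non-zero for this simple interval.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Flanders-Belgian LCS deletions: 10 of 27 expansion carriers, 5 of 290 noncarrier patients
or_, lower, upper = odds_ratio_woolf(10, 27 - 10, 5, 290 - 5)
```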
Of note, the children of the FTLD patient carrying the g.26747_26751delGTGGT deletion without an expansion did not show de novo expansions. This deletion is located on an allele of 5 repeat units.
The LCS is included in the PCR fragments produced by the forward RP-PCR assay used to identify G4C2 expansion carriers (Fig. 1) [Gijselinck et al., 2012]. To eliminate the influence of LCS variability on G4C2 expansion detection and on sizing of normal repeat alleles, we developed a reverse RP-PCR assay on the sense strand (Fig. 1). This assay confirmed the presence of the G4C2 expansion in patients of both cohorts.
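Why an LCS indel can distort fragment-length sizing when the LCS lies inside the amplicon can be shown with simple arithmetic: the number of 6-bp G4C2 units is inferred as (fragment length - constant flank length) / 6. In this Python sketch, the 100-bp flank is a hypothetical value, not the actual amplicon design of any assay described here, and the helper name is also hypothetical.

```python
UNIT_BP = 6  # length of one GGGGCC unit


def units_from_fragment(fragment_bp, flank_bp):
    """Infer the number of G4C2 units from a fragment length in bp.

    Returns a float so that fragments that are not a whole number of units
    longer than the flank (for example, because of an indel in the flanking
    sequence) are visible before rounding.
    """
    return (fragment_bp - flank_bp) / UNIT_BP


FLANK_BP = 100  # hypothetical non-repeat length of the amplicon
true_units = 5
normal = FLANK_BP + true_units * UNIT_BP  # 130 bp
with_del10 = normal - 10  # 10-bp LCS deletion inside the amplicon
with_del5 = normal - 5    # 5-bp GTGGT deletion inside the amplicon

for label, size in [("no indel", normal), ("10-bp del", with_del10), ("5-bp del", with_del5)]:
    est = units_from_fragment(size, FLANK_BP)
    print(f"{label}: {size} bp -> {est:.2f} units, called {round(est)}")
```

Under these assumptions, a 10-bp deletion would be miscalled as two units shorter and a 5-bp deletion as one unit shorter, illustrating why an assay whose product excludes the LCS is useful for sizing.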

Sizing of Normal C9orf72 G4C2 Repeat Lengths
We used an STR genotyping assay (STR-PCR; Fig. 1)
Table 4 notes: a The nucleotide sequence GTGGT is most frequently deleted in the LCS adjacent to the G4C2 repeat in C9orf72 (indicated in bold). b gDNA numbering relative to the reverse complement of contig AL451123.12, starting at nucleotide 1.

Figure 2. A: Distribution of normal repeat lengths in Flanders-Belgian patients and control individuals. Histograms of G4C2 repeat units sized <60 repeats in Flanders-Belgian patients, excluding patients with mutations in known causal genes or with a pathological G4C2 expansion, compared with control individuals. B: Correlation of normal repeat lengths with rs2814707 alleles. Histograms of G4C2 repeat units in 610 control individuals homozygous for the rs2814707 C-allele and 53 homozygous for the rs2814707 T-allele.
expansion carriers. In addition, we used the reverse RP-PCR assay (Fig. 1) to validate the size of the longest allele in nonexpansion carriers. Allele lengths obtained with the two assays were 99% concordant, and discordant allele calls could be explained by the presence of an LCS deletion. Hence, the STR-PCR can be used for correct sizing of normal repeat alleles. The observed G4C2 repeat lengths ranged from 2 to 24 units in nonexpansion carriers of the Flanders-Belgian cohort (g.26724GGGGCC[2_24]; relative to the reverse complement of AL451123.12; Fig. 2A) and from 2 to 21 units in the European cohort. The shortest allele contained one unit fewer than the reference genome (NCBI Build 37/hg19; Fig. 1). The frequency distribution of normal repeat alleles was trimodal (Fig. 2A). Using an expectation-maximization algorithm implemented in the R package MCLUST, the three groups could be defined by cutoffs at 4 and 7 units (2-3 units, 4-6 units, and 7-24 units), with a probability of correct classification of 0.9944.
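The three allele groups were defined with a Gaussian mixture model fitted in the R package MCLUST. As a rough illustration of the underlying expectation-maximization (EM) idea only, the following Python sketch fits a three-component one-dimensional Gaussian mixture to repeat lengths and assigns each length to its most probable component. It assumes a fixed number of components, hand-picked starting means, and a variance floor to cope with integer-valued data, and it does not perform MCLUST's model selection. The allele list is synthetic, not study data, and all function names are hypothetical.

```python
import math


def _log_normal(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def posteriors(x, weights, means, variances):
    """E-step for one value: P(component j | x), computed in log space."""
    logs = [math.log(w) + _log_normal(x, m, v) for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    p = [math.exp(l - top) for l in logs]
    s = sum(p)
    return [v / s for v in p]


def fit_gmm_1d(data, init_means, n_iter=200, var_floor=0.1):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    k, n = len(init_means), len(data)
    means = list(init_means)
    m0 = sum(data) / n
    variances = [max(var_floor, sum((x - m0) ** 2 for x in data) / n)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp = [posteriors(x, weights, means, variances) for x in data]
        # M-step: re-estimate each component from its responsibilities
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_floor, var_j)
    return weights, means, variances


def classify(x, weights, means, variances):
    """Index of the most probable component for repeat length x."""
    post = posteriors(x, weights, means, variances)
    return post.index(max(post))


# Synthetic repeat lengths for illustration only (not study data)
SYNTHETIC_LENGTHS = ([2] * 30 + [3] * 10 + [4] * 5 + [5] * 40 + [6] * 10
                     + [8, 8, 9, 9, 10, 10, 11, 12, 14, 17, 20, 24])
params = fit_gmm_1d(SYNTHETIC_LENGTHS, init_means=[2.5, 5.0, 12.0])
```

Cutoffs between groups can then be read off as the lengths at which `classify` switches from one component to the next.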

Intermediate Repeat Length and the Chromosome 9 Risk Haplotype
In the Flanders-Belgian cohort, we previously reported a significant association of disease with the risk T-allele of rs2814707 (P = 0.008) [Gijselinck et al., 2012], which tags the chromosome 9p risk haplotype. Repeat expansion carriers contributed the largest fraction of the attributable risk, but a residual association signal remained in nonexpansion carriers homozygous for the T-allele (OR = 1.75; 95% CI 1.02-3.01; P = 0.042; Supp. Table S4) [Gijselinck et al., 2012]. We used three different PCR genotyping assays to test for pathological G4C2 expansions, making it unlikely that the residual association could be explained by missed mutation carriers. We also analyzed C9orf72 for other mutations by sequencing all exons in 493 patients and by exon-based analysis of putative deletions/duplications in 413 patients. Except for one patient-specific missense mutation (c.196A>T in NM_018325.3 [p.Thr66Ser]) without a clear in silico deleterious effect, no other pathogenic mutations were identified. When we compared the distribution of normal repeat lengths between the rare T-allele and the common C-allele, we observed that alleles of at least 7 repeat units were strongly correlated with the T-allele (P < 0.001; Fig. 2B), corresponding to allele group 3 of the trimodal frequency distribution. We defined this group (7-24 units) as intermediate repeat alleles. When we recalculated the genetic association of rs2814707 with disease after excluding individuals homozygous for intermediate alleles, the residual association disappeared (P = 0.121; Supp.

Reporter Gene Analysis
To evaluate the effect of repeat length on C9orf72 promoter activity, we measured the RLA of constructs containing a C9orf72 promoter fragment with 2, 9, 17, or 24 units. RLA was highly significantly decreased for fragments with 9, 17, and 24 units compared with the wild-type allele of 2 units (P < 0.001), with a maximum decrease of 52% for the promoter containing 24 units (Fig. 3). These data indicate that intermediate repeat alleles result in significantly reduced C9orf72 promoter activity compared with wild-type alleles.

Discussion
In the present study, we assessed the geographical distribution of pathological G4C2 expansions in an extended pan-European cohort of FTLD patients originating from Italy, Germany, Portugal, Sweden, Spain, the Czech Republic, Bulgaria, Austria, and Belgium, ascertained within the newly formed European EOD consortium. The prevalence of pathological G4C2 expansions in the total European cohort (73/845, 8.64%), as well as in the FTLD-only (50/781, 6.40%) and FTLD-ALS (23/64, 35.94%) subgroups, was comparable to the frequencies that we and others reported in the three original gene identification reports on C9orf72 [Gijselinck et al., 2012; Renton et al., 2011].
The FTLD syndrome in European G4C2 expansion carriers was most often characterized by behavioral disturbances (95.7%), in line with our observations from an in-depth genotype-phenotype correlation study of Belgian C9orf72 carriers [Van Langenhove et al., 2012b]. Thirty-two percent of expansion carriers also developed concomitant ALS symptomatology. Further, we observed a wide range in onset age and disease duration, suggestive of genetic and/or environmental (epigenetic) disease modifiers [Van Langenhove et al., 2012b]. In 13 G4C2 expansion carriers, postmortem neuropathological diagnosis confirmed brain deposition of TDP-43 inclusions, explaining 30.23% (13/43) of TDP-43-positive patients. Because up to 70% of patients with TDP pathology remained unresolved after C9orf72 analysis, we performed mutation profiling of the other TDP-43-associated genes GRN, VCP, and TARDBP. This revealed one additional TARDBP mutation, p.Ile383Val, in a patient from the Barcelona Brain Bank. Interestingly, this mutation, located in the C-terminal region of the protein, had been reported once before in a US ALS family [Rutherford et al., 2008], yet in this Spanish patient (diagnosis SD, age at onset 60 years, age at death 75 years, father with dementia), no clinical or pathological signs of MND were recorded. Taken together, 67.44% (29/43) of the 43 patients with TDP-43 pathology remained unresolved by the known genes. Among 17 familial TDP patients, 58.82% (10/17) had no known mutation, pointing to at least one, and likely more, TDP-43-associated genes yet to be discovered. Eleven C9orf72-positive carriers with TDP-43 pathology underwent further in-depth histopathological evaluation.
Although the neuropathological signature of C9orf72 has proven to be more diverse than initially anticipated (reports range from strong TDP proteinopathy reminiscent of FTLD-TDP type A [Snowden et al., 2012; Stewart et al., 2012] to complete absence of TDP-43 pathology [Gijselinck et al., 2012; Murray et al., 2011]; reviewed in Cruts et al., Trends Neurosci, under revision), all patients in our study fitted the criteria of TDP proteinopathy type B [Mackenzie et al., 2010], albeit with a relatively low lesion load. Eight of the 11 patients displayed concomitant AD pathology, yet AD stages were relatively mild (Braak stage I-III for neurofibrillary tangle pathology and stage B for β-amyloid pathology), and this finding was probably age-related rather than related to the C9orf72 expansion. All five cases stained with a p62 antibody showed immunoreactive NCI in the dentate gyrus of the hippocampus, in the CA4 and CA2/3 pyramidal neurons of the hippocampus, and in the granular layer of the cerebellar cortex. As this lesion load is far more pronounced than that of the TDP-43 lesions, p62 pathology is likely a highly distinctive feature of C9orf72-associated FTLD [Al-Sarraj et al., 2011]. In another recent large-scale screening of the C9orf72 repeat expansion, Majounie et al. (2012) reported frequencies for a total of 4,448 ALS patients and 1,425 FTLD patients from European and US Caucasian populations and compared them with smaller sets of patients from other ethnic backgrounds. Of interest for the present study, the Majounie study included FTLD patients from the UK, the Netherlands, France, Finland, Germany, Sardinia, and Sweden. Except for a small number of German (n = 29) and Swedish (n = 7) patients, these countries were not represented in our study, and vice versa. Further, a study conducted in Denmark identified 10 C9orf72 expansion carriers in a cohort of 82 Danish FTLD patients [Lindquist et al., 2012].
To obtain an even more representative picture of the prevalence and distribution of the C9orf72 repeat expansion in Western Europe, we performed a meta-analysis of the FTLD cohorts from our European EOD consortium study (n = 845), our previously reported Flanders-Belgian cohort (n = 360) [Gijselinck et al., 2012], the Majounie study (n = 1,381) [Majounie et al., 2012], and the Lindquist study (n = 82) [Lindquist et al., 2012]. This yielded prevalence data on 2,668 FTLD patients from 15 European countries (Table 3; countries with patient groups smaller than 20 were not included in the meta-analysis). The overall frequency of the pathological C9orf72 G4C2 expansion in Western Europe was 9.98% in total FTLD, 18.52% in familial FTLD, and 6.26% in sporadic FTLD. Per country, Belgium, Italy, the Netherlands, Portugal, and the UK showed total prevalences ranging from 6.09% to 10.29%. Denmark showed a slightly increased prevalence of 12.20%, and in France the prevalence was about double the average, at 18.00%. In line with the hypothesis of a Scandinavian founder effect for the chromosome 9-C9orf72 haplotype [Mok et al., 2011], frequencies were highest in Finland (29.33%) and high in Sweden (20.73%). Yet, in contrast with this hypothesis and the expected north-south gradient in founder haplotype prevalence, the C9orf72 expansion frequency reached 25.49% in Spain. At the other extreme, the prevalence in Germany was only 4.82%. Of note, Ireland, Luxembourg, Switzerland, Poland, Norway, and Greece are not yet represented in this meta-analysis of Western Europe.
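The size filter and pooling step can be sketched in Python, assuming that pooled frequencies are simply total carriers divided by total patients over the groups that pass the filter; the exact pooling used for the meta-analysis is not described here, and the helper name is hypothetical. As a check, the European EOD subgroup counts given earlier in the Discussion (FTLD 50/781, FTLD-ALS 23/64) pool to 73/845 = 8.64%, while a hypothetical group of five patients would be dropped.

```python
def pooled_frequency(counts, min_n=20):
    """Pool expansion frequencies across groups by summing raw counts.

    counts -- mapping of group name -> (expansion carriers, patients screened)
    min_n  -- groups with fewer patients are left out of the pooled estimate
    Returns (carriers, patients, percent, sorted names of excluded groups).
    """
    kept = {g: cn for g, cn in counts.items() if cn[1] >= min_n}
    excluded = sorted(set(counts) - set(kept))
    carriers = sum(c for c, _ in kept.values())
    patients = sum(n for _, n in kept.values())
    return carriers, patients, 100.0 * carriers / patients, excluded


example = {
    "FTLD": (50, 781),
    "FTLD-ALS": (23, 64),
    "hypothetical small group": (1, 5),  # illustrative, not study data
}
print(pooled_frequency(example))
```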
We precisely sized the normal range of repeat alleles in both the Flanders-Belgian and European cohorts using two different assays and observed a range of 2 to 24 units. The allele frequencies showed a trimodal distribution with a long tail containing the rare longer alleles. Alleles in group 3 of this trimodal distribution (intermediate alleles; 7-24 units) are almost exclusively present on the chromosome 9 risk haplotype tagged by the rs2814707 T-allele. We asked whether these alleles might act as a risk factor for FTLD or have a mild effect on gene expression. Although intermediate repeat alleles were not significantly associated with disease in the Flanders-Belgian cohort, their slightly higher frequency in patients than in control individuals explained the residual association we had observed for homozygous carriers of the rs2814707 T-allele (OR = 1.75; 95% CI 1.02-3.01; P = 0.042). This cohort included a group of 16 FTLD-ALS patients without a pathological expansion. Although this group is small, it is remarkable that seven of their 32 alleles (21.9%) had at least 10 units, significantly more than the 7.2% in control individuals (OR = 3.685; 95% CI 1.560-8.706; P = 0.003). This suggests that intermediate alleles might rather confer risk in diseases involving ALS. Because the European cohort was collected for a frequency study of the repeat expansion, we did not have ethnicity-matched control individuals for an association study.
Flanders-Belgian and European patient carriers of intermediate repeats (7-24) did not show an earlier onset age than carriers of shorter repeats (2-6; data not shown). Also, the variable onset age in patients with a pathological G4C2 expansion [Gijselinck et al., 2012] could not be explained by a modifying effect of the number of repeat units on the wild-type allele (data not shown), in contrast to what has been shown for patients with Huntington's disease [Aziz et al., 2009].
The mechanism by which G4C2 intermediate repeats might contribute to disease pathogenesis has remained elusive. In this study, we performed reporter gene expression studies to evaluate the effect of intermediate repeats of different sizes on C9orf72 promoter activity. We showed that the transcriptional activity of the C9orf72 promoter significantly decreases with an increasing number of repeat units, with a maximal reduction of 52% for the promoter containing 24 units compared with the 2-unit wild-type allele (Fig. 3). To define more precisely the threshold above which decreased transcriptional activity becomes apparent, alleles of more different lengths should be investigated, although these data clearly show that intermediate alleles affect the normal transcriptional activity of the C9orf72 promoter. With the cloned intermediate alleles, we did not reach the near-zero expression level expected for pathogenic expansion alleles. Therefore, alleles with more than 24 units are needed to better interpret the grey zone between normal and expanded alleles. These in vitro data confirm that the G4C2 repeat is located in the promoter region of C9orf72, as we previously suggested [Gijselinck et al., 2012]. They also favor the loss-of-function hypothesis, which was based on a 50% reduction of C9orf72 levels in the brain of pathological expansion carriers, consistent with the expansion preventing expression of the mutant allele [Gijselinck et al., 2012]. However, in this study we did not identify other simple sequence mutations or deletions/duplications pointing to a loss-of-function mechanism, except for one patient-specific missense mutation (p.Thr66Ser) of unclear pathogenicity. Therefore, other molecular mechanisms, including sequestration of RNA-binding proteins and RNA toxicity, which can also lead to a decrease of the mRNA level, might also contribute to disease.
These two potential disease mechanisms are not mutually exclusive, and might also lead together to degeneration of neuronal populations in either the frontal cortex (FTLD) or the spinal cord (ALS).
The mechanisms by which the G4C2 repeat expands to a pathological size range remain to be discovered. In our first C9orf72 study in the Flanders-Belgian cohort [Gijselinck et al., 2012], we showed that the majority of pathological G4C2 expansions were located on the same risk haplotype, tagged by the rare T-allele of SNP rs2814707 and strongly associated with FTLD and ALS. These observations were confirmed in the European study, in which 100% of carriers of a pathological G4C2 expansion carried at least one T-allele. It has been proposed that this tight genetic association might be explained by a single founder mutation on this risk haplotype [Mok et al., 2011; Majounie et al., 2012]. An alternative hypothesis is that a specific genomic context on this risk haplotype renders the G4C2 repeat less stable and more prone to expansion into the pathological size range. In this context, we and others showed that G4C2 intermediate repeats are also highly significantly overrepresented on the same risk haplotype, suggesting that these intermediate repeats are more prone to replication slippage and unstable inheritance, thus triggering pathological expansions. The observation that intermediate alleles are in linkage disequilibrium with the same SNP allele as expanded alleles has also been made previously in other repeat expansion disorders, for example, Fragile X Syndrome [Gunter et al., 1998]. The appearance of pathological G4C2 expansions in apparently sporadic patients in this and previous studies [Gijselinck et al., 2012; Renton et al., 2011] seems to support this hypothesis. However, we did not observe increasing G4C2 repeat length or de novo expansions in the younger generations of families of index patients with intermediate repeat lengths of up to 22 units.
Therefore, intermediate alleles might be better considered as predisposing alleles for further stepwise expansion, probably over many generations, rather than as premutations. A study of sporadic ALS trios also did not show evidence of repeat instability between two generations [Pamphlett et al., 2012].
Further, in patients of both the Flanders-Belgian and European cohorts with a pathological G4C2 expansion, we observed a significantly higher frequency of short indels in the GC-rich LCS adjacent to the G4C2 repeat (24/84, 28.57%) than in patient noncarriers (5/404, 1.24%; P < 0.001) and control individuals (4/752, 0.53%; P < 0.001). Remarkably, in 23.81% (20/84) of repeat expansion carriers, the indel included a deletion of the GTGGT motif contiguous with the G4C2 repeat (Fig. 1), compared with 0.25% of noncarrier patients and 0.13% of control individuals (Table 4). This GTGGT deletion joins the two GC-rich sequences, increasing the overall GC content of the LCS. A similar effect has been reported for CAG expansion diseases, in which the expandability of the repeat increases with a higher GC content of the surrounding DNA sequence [Brock et al., 1999; Nestor and Monckton, 2011]. Also, the GTGGT deletion creates one long contiguous imperfect G4C2 repeat, which is likely more prone to replication slippage. Therefore, the GTGGT sequence is potentially an important stabilizer of the G4C2 repeat. This resembles the loss of AGG interspersions in unstable Fragile X alleles [Kunst and Warren, 1994; Larsen et al., 2000]. Moreover, all carriers of a GTGGT deletion in the LCS had at least one rs2814707 T-allele, and the 10-bp deletion was cotransmitted with the G4C2 expansion in relatives of two expansion carriers, indicating that it is likely located on the risk haplotype and might trigger the expansion. Nevertheless, not all carriers of a pathological G4C2 expansion also carry an LCS indel, and thus we cannot exclude that the indels are rather a consequence of the neighboring expansion destabilizing the genomic region. Of note, the indels found in nonexpansion carriers mostly do not affect the GTGGT sequence but rather stabilize the repeat by deleting 100% GC sequence.
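The pooled percentages in this paragraph follow from summing the per-cohort counts reported in the Results; the short Python check below makes this explicit (the helper name is hypothetical). Controls were sequenced only in the Flanders-Belgian cohort, so the control figure (4/752) needs no pooling, and P values are not recomputed here.

```python
def pool(*pairs):
    """Sum (individuals with variant, individuals sequenced) pairs.

    Returns (with variant, sequenced, percent).
    """
    k = sum(e for e, _ in pairs)
    n = sum(t for _, t in pairs)
    return k, n, 100.0 * k / n


# Counts from the Results as (with variant, sequenced):
# first pair = Flanders-Belgian cohort, second pair = European cohort
ANY_INDEL = {
    "expansion carriers": [(10, 27), (14, 57)],
    "noncarrier patients": [(5, 290), (0, 114)],
}
GTGGT_DELETION = {
    "expansion carriers": [(10, 27), (10, 57)],
    "noncarrier patients": [(1, 290), (0, 114)],
}

for name, table in [("any indel", ANY_INDEL), ("GTGGT deletion", GTGGT_DELETION)]:
    for group, pairs in table.items():
        k, n, pct = pool(*pairs)
        print(f"{name}, {group}: {k}/{n} = {pct:.2f}%")
```

This prints 24/84 = 28.57% and 5/404 = 1.24% for any indel, and 20/84 = 23.81% and 1/404 = 0.25% for the GTGGT deletion, matching the figures in the text.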
The GTGGT deletion in one FTLD patient without a repeat expansion is located on a short allele of 5 units, which is probably not unstable enough to expand. A GTGGT deletion on an intermediate allele would most likely not be stable and hence would not be observed.
A better understanding of the mechanism underlying G4C2 instability will be essential to assess disease risk and to improve clinical benefit. Further, elucidating how G4C2 expansions lead to disease will be crucial to unveil biological pathways and key molecules in the disease process as targets for future therapies.