Single Nucleotide Polymorphism and FMR1 CGG Repeat Instability in Two Basque Valleys

Authors


Dr. Isabel Arrieta, Department of Genetics, Physical Anthropology and Animal physiology, Faculty of Science and Technology, University of the Basque Country, Apdo. 644-48080, Bilbao, Spain. Tel: (+34) 946012605; Fax: (+34) 946013145; E-mail: mariaisabel.arrieta@ehu.es

Summary

Fragile X Syndrome (FXS, MIM 309550) is mainly due to the expansion of a CGG trinucleotide repeat sequence, found in the 5′ untranslated region of the FMR1 gene. Some studies suggest that stable markers, such as single nucleotide polymorphisms (SNPs) and the study of populations with genetic identity, could provide a distinct advance in investigating the origin of CGG repeat instability. In this study, seven SNPs (WEX28 rs17312728:G>T, WEX70 rs45631657:C>T, WEX1 rs10521868:A>C, ATL1 rs4949:A>G, FMRb rs25707:A>G, WEX17 rs12010481:C>T and WEX10 ss71651741:C>T) were analyzed in two Basque valleys (Markina and Arratia). We examined the association between these SNPs and the CGG repeat size, the AGG interruption pattern and two microsatellite markers (FRAXAC1 and DXS548). The results suggest that in both valleys WEX28-T, WEX70-C, WEX1-C, ATL1-G, and WEX10-C are preferentially associated with cis-acting sequences directly influencing instability. However, comparison of the two valleys also reveals important differences with respect to: (1) frequency and structure of “susceptible” alleles and (2) association between “susceptible” alleles and STR and SNP haplotypes. These results may indicate that, in Arratia, SNP status does not identify a pool of susceptible alleles, as it does in Markina. In the Arratia valley, the SNP haplotype association also reveals a potential new “protective” factor.

Introduction

Fragile X Syndrome (FXS, MIM 309550) is probably the most common of the 215 X-linked mental retardation diseases described (Chiurazzi et al., 2008). Although other mutational and epigenetic mechanisms could give rise to the disorder (Tabolacci et al., 2008; Collins et al., 2010a; Collins et al., 2010b; Gronskov et al., 2011), the syndrome is mainly due to the expansion of a CGG trinucleotide repeat sequence found in the 5′ untranslated region (5′-UTR) of the first exon of the FMR1 gene (Fu et al., 1991; Oberle et al., 1991; Verkerk et al., 1991; Yu et al., 1991). The location of this gene coincides with a rare folate sensitive fragile site at Xq27.3 (FRAXA; Sutherland & Hecht, 1985; Sutherland, 2007).

Alleles of the CGG repeat fall into four classes. Normal alleles have a variable number of CGG repeats (from 6 to ∼50) with modes of 20, 23, 30, and 40 in Caucasian populations (Brown et al., 1993; Ennis et al., 2007). Fragile X mutations are classified into premutations and full mutations (Oberle et al., 1991; Rousseau et al., 1991). Premutation alleles contain from ∼50 to 200 CGG repeats. In full mutated alleles, the trinucleotide CGG is generally expanded to more than 200 repeats. The term gray zone has been used to define high range normal alleles (from 35 to ∼50 repeats; Peprah et al., 2010). Only full mutation alleles are associated with clinical and cytogenetic expression of Fragile X Syndrome (Rousseau et al., 1991). Nevertheless, the FMR1 gene is also involved in the pathogenesis of two other conditions, premature ovarian failure (POF) and fragile X-associated tremor/ataxia syndrome (FXTAS; Schwartz et al., 1994; Jacquemont et al., 2002; Sherman et al., 2007; Tassone et al., 2007).
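The size classes described above can be written as a small lookup. This is only an illustrative sketch: the text gives approximate boundaries ("∼50"), so the exact cut-offs used here (34/35, 50/51, 200/201) and the function name `classify_cgg_allele` are assumptions, not values taken from the cited studies.

```python
def classify_cgg_allele(repeats: int) -> str:
    """Assign a CGG repeat count to one of the size classes named in the text.

    Cut-offs are assumed for illustration; the text quotes them as approximate.
    """
    if repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if repeats < 35:
        return "normal"
    if repeats <= 50:
        return "gray zone"       # high range normal alleles (35 to ~50)
    if repeats <= 200:
        return "premutation"     # ~50 to 200 repeats
    return "full mutation"       # more than 200 repeats


if __name__ == "__main__":
    for n in (20, 30, 36, 43, 90, 250):
        print(n, classify_cgg_allele(n))
```

With these assumed cut-offs, every gray zone allele reported later in Table 3 (35 to 43 repeats) falls into the "gray zone" class.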

The mechanism behind the repeat expansion from normal to full mutation is believed to be complex and to involve multiple steps (Morton & Macpherson, 1992; Kolehmainen, 1994; Morris et al., 1995a; Morris et al., 1995b). Family and population studies have revealed several factors that may be involved in transforming a stable into an unstable allele. Published data reveal that: (1) No direct transition from a normal size to full mutated allele has been detected. (2) Within fragile X families, the risk of expansion from premutation to a full mutation varies with the sex and age of the transmitting parents (Fu et al., 1991; Heitz et al., 1992; Yu et al., 1992; Loesch et al., 1995; Nolin et al., 1996; Sherman et al., 1996; Ashley-Koch et al., 1998). Transmission of normal and gray zone alleles through males is less stable than through females (Sullivan et al., 2002). (3) Instability also increases with the number of repeats (Fu et al., 1991; Heitz et al., 1992; Richards & Sutherland, 1992; Nolin et al., 1996; Sherman et al., 1996; Ashley-Koch et al., 1998; Nolin et al., 2003). (4) Expansion of the CGG trinucleotide repeat is polarized and there are various AGG interspersion patterns that are believed to be responsible for stabilizing the alleles (Eichler et al., 1994; Kunst & Warren, 1994; Eichler et al., 1996; Kunst et al., 1996; Murray et al., 1997; Crawford et al., 2000b; Mathews et al., 2001; Napierala et al., 2005).

In addition to CGG repeat size and structure, other cis-acting factors could play a role in CGG repeat instability. Population studies revealed that mutated alleles are in linkage disequilibrium with linked microsatellite markers. DXS548 and FRAXAC1 are the most studied flanking markers (Morris et al., 1995a; Eichler et al., 1996; Gunter et al., 1998; Ennis et al., 2001). This association is stronger in populations with known founder effects for other diseases (Oudet et al., 1993a; Oudet et al., 1993b; Haataja et al., 1994; Rousseau et al., 1995; Peprah et al., 2010). Gray zone alleles are also in linkage disequilibrium with flanking microsatellites (Dombrowski et al., 2002).

Although instability seems to depend on three cis-acting factors, the size of the CGG repeat, the AGG interspersion pattern, and the STR haplotype, these factors alone do not completely describe the molecular basis for the linkage disequilibrium between normal and fragile X chromosomes. Some studies suggest that more stable markers, such as single nucleotide polymorphisms (SNPs), could make an important contribution to the study of the origin of the CGG repeat instability (Gunter et al., 1998; Crawford et al., 2000a; Mathews et al., 2001; Brightwell et al., 2002a; Brightwell et al., 2002b; Curlis et al., 2005; Zhou et al., 2006; Ennis et al., 2007). Results obtained by Gunter et al. (1998) with a SNP (ATL1) suggest that allele state at DXS548 and FRAXAC1 may not be relevant for repeat stability because chromosomes with the same haplotype for these two markers show clearly different transition rates to full mutation status. According to Brightwell et al. (2002b), it is important to study populations of different genetic identity. These authors suggest that SNPs could provide a useful resource for investigating the genetic mechanisms behind instability and expansion of the FRAXA triplet repeat.

A previous screening for fragile X syndrome in a mentally retarded group showed an absence of the full mutation among the Basque population (Arrieta et al., 1999a). Subsequent investigations on the FMR1 gene among a normal sample of Basque origin from the Biscay province and other Basque provinces showed a low frequency of large alleles and the maintenance of AGG interruptions on them (Arrieta et al., 1999b; Peñagarikano et al., 2004). In another previous work (Arrieta et al., 2003), we extended our study to Markina and Arratia, two different and isolated Basque valleys from the Biscay province. The results obtained showed differences between Markina and Arratia with respect to factors involved in CGG repeat instability and also a great similarity between the general Basque sample from the Biscay province and that from the Markina valley.

To gain further insight into the genetic diversity of the fragile X CGG repeat in these two Basque valleys, we analyze in this report seven SNPs identified in the vicinity of the FRAXA repeat (Fig. 1). These SNPs are WEX28 (rs17312728:G>T) located at (–184165) base pairs proximal to FRAXA (Brightwell et al., 2002b), WEX70 (rs45631657:C>T) located at (–53907) base pairs proximal to FRAXA (Ennis et al., 2007), WEX1 (rs10521868:A>C) located at (–2064) base pairs proximal to FRAXA (Brightwell et al., 2002b), ATL1 (rs4949:A>G) located in the first intron of the FMR1 gene (Gunter et al., 1998), FMRb (rs25707:A>G) located in exon 5 of the FMR1 gene (Kunst & Warren, 1994), WEX17 (rs12010481:C>T) located at (107335) base pairs proximal to FRAXA (Brightwell et al., 2002b), and WEX10 (ss71651741:C>T) located at (254752) base pairs proximal to FRAXA (Brightwell et al., 2002b).

Figure 1.

Location of FRAXA CGG repeat, short tandem repeats (DXS548 and FRAXAC1), and single nucleotide polymorphisms in and around the FMR1 locus (not to scale). This figure is adapted from Zhou et al. (2006) and Peprah et al. (2010). All DNA elements are arranged from centromere (CEN) to telomere (TEL). The positions relative to FRAXA, in kilobases, have been adopted from Brightwell et al. (2002a) and Ennis et al. (2007). Allele numbering for DXS548 and FRAXAC1 is from Chiurazzi et al. (1999), with the numbers in parentheses representing the base-pair size, according to Rousseau et al. (1995). AGG interspersions within the FMR1 CGG repeat are represented by a (+) sign and the number refers to the triplet length of uninterrupted CGG repeats.

Material and Methods

DNA Sample Material

Individual genomic DNA samples were obtained from 262 healthy unrelated male individuals of Basque origin, 204 from Markina and 58 from Arratia. The sample constitutes a solid proportion of the unrelated and Basque origin population of each valley. Their Basque origin was confirmed by analyzing the individual ancestry on the basis of two criteria: the surnames and the place of birth. Basque surnames constitute a good criterion because they are very different, not only from those of other Spanish populations, but also from valley to valley within the Basque Country. Therefore, an individual is considered autochthonous of one valley if their grandparents and great-grandparents were born in that valley and if their Basque surnames are characteristic of that valley. (In Spain, both the father's and mother's surnames are used in a sequential order, so it is easy to ascertain the grandparents’ and/or the great-grandparents’ surnames.)

Molecular Analysis

Genomic DNA was extracted from peripheral blood leukocytes according to standard procedures (Sambrook et al., 1989). All DNA samples had been previously genotyped for FRAXA repeat size, AGG interspersion pattern and microsatellite markers, DXS548, and FRAXAC1 (Arrieta et al., 2003).

In this work, we analyzed seven SNPs identified in the vicinity of the FMR1 repeat, six used by Brightwell et al. (2002a), and one used by Ennis et al. (2007), that appear to act as a marker of repeat expansion. The seven SNPs were amplified by allele-specific PCRs. Amplifications were carried out according to Brightwell et al. (2002a) and using the primers that had been kindly supplied by G. Brightwell and P.A. Jacobs (Wessex Regional Genetics Laboratory, Salisbury District Hospital, Salisbury, Wiltshire, UK). The SNP alleles were identified on the basis of PCR product sizes after ethidium bromide-stained agarose gel electrophoresis (Brightwell et al., 2002a).

Statistical Methods

A χ2-test of independence was performed (using SPSS software) to examine the allele association among loci and the allele distribution between valleys. When the expected frequencies were <5, Fisher's exact test was used.
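The decision rule above (χ2-test unless an expected frequency falls below 5) can be sketched in a few lines. The analysis itself was run in SPSS; the pure-Python version below is only an illustration for 2×2 tables, and the function names, the absence of a continuity correction, and the choice of example tables are assumptions. The counts are taken from Table 1.

```python
import math


def expected_counts(table):
    """Expected cell counts of a 2x2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def chi2_independence_2x2(table):
    """Pearson chi-square statistic and p-value (1 d.f., no continuity correction).

    Assumes every row and column total is non-zero.
    """
    exp = expected_counts(table)
    stat = sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def needs_exact_test(table):
    """True when any expected count is below 5, the rule stated in the text."""
    return any(e < 5 for row in expected_counts(table) for e in row)


if __name__ == "__main__":
    # WEX28 alleles (G, T) by valley: Markina 135 + 0 and 64 + 5, Arratia 40 + 3 and 11 + 4.
    between_valleys = [[135, 69], [43, 15]]
    print(needs_exact_test(between_valleys), chi2_independence_2x2(between_valleys))
    # FMRb alleles (A, G) in Arratia, normal vs gray zone: small expected counts.
    print(needs_exact_test([[0, 51], [2, 5]]))
```

The second table has expected counts well below 5, so under the stated rule it would be handed to Fisher's exact test rather than the χ2-test.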

Results

Allelic Distribution of SNPs among CGG Normal and Gray Zone Repeats

Table 1 shows the frequency distribution of each SNP in the total study population and within each valley in normal and gray zone alleles. All the SNPs had a total sample population frequency of over 6%; the WEX17-C allele was the rarest (6.87%), followed by the FMRb-A allele (7.63%). The total sample population frequency of some SNPs was fairly similar: WEX1 (A 8.39%, C 91.61%), FMRb (A 7.63%, G 92.37%), WEX17 (C 6.87%, T 93.13%), and WEX10 (T 8.78%, C 91.22%). In contrast, the WEX70-T (62.59%) and the ATL1-A (61.45%) alleles were the most evenly distributed throughout the study population. In the total sample population and among valleys, alleles WEX28-G, WEX70-T, WEX1-C, ATL1-A, FMRb-G, WEX17-T, and WEX10-C were overrepresented (P < 0.01 for ATL1 and WEX70 and P < 0.001 for the other SNPs, χ2-test).

Table 1.  Distribution of SNPs between the individuals with normal and gray zone FMR1 alleles in Markina and Arratia valleys.

| SNP | Allele | Total N (%) | Markina normal N (%) | Markina gray zone N (%) | Arratia normal N (%) | Arratia gray zone N (%) |
|---|---|---|---|---|---|---|
| WEX28 | Gᵃ | 178 (67.94) | 135 (67.84) | 0 (0.00) | 40 (78.43) | 3 (42.86) |
| | T | 84 (32.06) | 64 (32.16) | 5 (100.00) | 11 (21.57) | 4 (57.14) |
| WEX70 | C | 98 (37.41) | 78 (39.20) | 5 (100.00) | 10 (19.61) | 5 (71.43) |
| | Tᵃ | 164 (62.59) | 121 (60.80) | 0 (0.00) | 41 (80.39) | 2 (28.57) |
| WEX1 | A | 22 (8.39) | 13 (6.53) | 0 (0.00) | 7 (13.72) | 2 (28.57) |
| | Cᵃ | 240 (91.61) | 186 (93.47) | 5 (100.00) | 44 (86.28) | 5 (71.43) |
| ATL1 | Aᵇ | 161 (61.45) | 119 (59.80) | 0 (0.00) | 40 (78.43) | 2 (28.57) |
| | G | 101 (38.55) | 80 (40.20) | 5 (100.00) | 11 (21.57) | 5 (71.43) |
| FMRb | A | 20 (7.63) | 16 (8.04) | 2 (40.00) | 0 (0.00) | 2 (28.57) |
| | Gᵃ | 242 (92.37) | 183 (91.96) | 3 (60.00) | 51 (100.00) | 5 (71.43) |
| WEX17 | C | 18 (6.87) | 13 (6.53) | 3 (60.00) | 0 (0.00) | 2 (28.57) |
| | Tᵇ | 244 (93.13) | 186 (93.47) | 2 (40.00) | 51 (100.00) | 5 (71.43) |
| WEX10 | Cᵃ | 239 (91.22) | 186 (93.47) | 5 (100.00) | 44 (86.28) | 4 (57.14) |
| | T | 23 (8.78) | 13 (6.53) | 0 (0.00) | 7 (13.72) | 3 (42.86) |

Note: The frequencies of SNP alleles are given among normal (6–50) and gray zone range (35–50; Peprah et al., 2010) alleles. For each SNP allele the data are for both valleys (total) and for Markina and Arratia separately.
ᵃ,ᵇ Alleles overrepresented in the total study population and among each valley, P < 0.001 and P < 0.01, respectively, χ2-test.
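The percentages in Table 1 follow directly from the counts. As a check on how they are derived, the sketch below recomputes the WEX28 row; the variable and function names are hypothetical, and only the counts come from the table.

```python
# Column order of the per-group counts, as in Table 1.
GROUPS = ("Markina normal", "Markina gray zone", "Arratia normal", "Arratia gray zone")

# WEX28 counts copied from Table 1.
WEX28_COUNTS = {"G": (135, 0, 40, 3), "T": (64, 5, 11, 4)}


def total_percentages(counts):
    """Percentage of each allele over all chromosomes in the sample."""
    totals = {allele: sum(row) for allele, row in counts.items()}
    n = sum(totals.values())
    return {allele: round(100 * t / n, 2) for allele, t in totals.items()}


def group_percentages(counts, group):
    """Percentage of each allele within one valley and size class."""
    i = GROUPS.index(group)
    n = sum(row[i] for row in counts.values())
    return {allele: round(100 * row[i] / n, 2) for allele, row in counts.items()}


if __name__ == "__main__":
    print(total_percentages(WEX28_COUNTS))
    for group in GROUPS:
        print(group, group_percentages(WEX28_COUNTS, group))
```

Each percentage in the table is taken within its own column, so the two alleles of a SNP sum to 100% in every group.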

In Markina, the two alleles of each SNP were found in chromosomes with the normal CGG repeat length, but WEX28-T, WEX70-C, WEX1-C, ATL1-G, and WEX10-C were found in 100% and FMRb-G and WEX17-C in 60% of gray zone chromosomes, respectively. In this valley a significant difference was observed in the distribution of the two SNP alleles between normal and gray zone length repeats for WEX28, WEX70, ATL1, and WEX17 (P < 0.05 for ATL1 and WEX70 and P < 0.01 for WEX28 and WEX17, Fisher's exact test). We have also seen that allele T of WEX28, allele C of WEX70, allele C of WEX1, allele G of ATL1, and allele C of WEX10 were overrepresented among gray zone length CGG repeats in Markina valley (P < 0.05 in all cases, Fisher's exact test).
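For readers who want to see how a comparison of this kind is evaluated, here is a minimal two-sided Fisher's exact test for a 2×2 table, applied to the Markina WEX28 counts from Table 1. It is a sketch, not the SPSS procedure used in the study; the function name and the "sum all tables no more likely than the observed one" definition of the two-sided p-value are assumptions.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a 2x2 table of counts.

    The p-value sums the hypergeometric probabilities of every table with the
    same margins that is no more likely than the observed table.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Rows: normal, gray zone; columns: WEX28-G, WEX28-T (Markina, Table 1).
    markina_wex28 = [[135, 64], [0, 5]]
    print(f"p = {fisher_exact_2x2(markina_wex28):.4f}")
```

For this table the p-value comes out at about 0.004, in line with the P < 0.01 reported for WEX28 in Markina.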

In Arratia, the two alleles of each SNP were observed in gray zone length CGG repeats but only one allele of FMRb and of WEX17 was observed in normal length CGG repeats. Among gray zone chromosomes, the SNP WEX70-C, WEX1-C, ATL1-G, and FMRb-G alleles were found on 71.43%, WEX28-T and WEX10-C on 57.14%, and WEX17-C on 28.57%, respectively. In this valley, significant differences were found in the distribution of the SNP alleles between normal and gray zone length repeats for WEX70, ATL1, FMRb, and WEX17 (P < 0.01 for WEX70 and ATL1, and P < 0.01 for FMRb and WEX17, Fisher's exact test). In Arratia, allele C of WEX70, allele C of WEX1, allele G of ATL1, allele G of FMRb, and allele T of WEX17 were overrepresented among gray zone length CGG repeats, but there were no significant differences (P > 0.05 in all cases, Fisher's exact test).

SNPs and AGG Interspersion Pattern

Table 2 shows the distribution of SNPs and AGG interspersion pattern among CGG normal repeats in Markina and Arratia. In relation to AGG position, 10 different alleles were observed. Only three chromosomes without AGGs were observed. Most alleles (187, 71.37%) possessed substructures of the type (CGG)10AGG(CGG)9AGG(CGG)n (10 + 9+n) or (CGG)10AGG(CGG)n (10+n). We determined also the 3′ pure CGG repeat length (n), and divided this into two groups, one with a pure 3′ CGG repeat length of <20 (normal) and another with a pure 3′ CGG repeat length within the range of 20–30 (intermediate). Just 0.81% of alleles were found to have a 3′ pure CGG repeat length ≥20. In both valleys there was a positive association between the chromosomes that had two AGG interruptions, the first in the 11th position and the second in the 21st position (10 + 9+n) with a 3′ pure CGG repeat length <20 and alleles WEX28-G, WEX70-T, WEX1-C, ATL1-A, FMRb-G, WEX17-T, and WEX10-C. The same SNP alleles had a positive association (although with a lower frequency) with the chromosomes that had the AGG interruption in the 11th position (10+n) also with a 3′ pure CGG repeat length <20. In Markina, 8.54% of WEX28-T, 11.06% of WEX70-C, 0.50% of WEX1-C, 13.06% of ATL1-G, 10.55% of FMRb-G, 1.50% of WEX17-C, and 12.06% of WEX10-C chromosomes had the first AGG interruption in the 10th position (9+n). In Arratia, 5.88% of WEX28-T, WEX70-C, ATL1-G, FMRb-G, and WEX10-C chromosomes, 3.92% of WEX1-C, and 0% of WEX17-C chromosomes had the first AGG interruption in the 10th position (9+n).

Table 2.  Distribution of SNPs according to the AGG interspersion pattern among FMR1 normal alleles (<35) in Markina and Arratia valleys.

| Valley | AGG pattern | Total (N) | WEX28 G/T (%) | WEX70 C/T (%) | WEX1 A/C (%) | ATL1 A/G (%) | FMRb A/G (%) | WEX17 C/T (%) | WEX10 C/T (%) |
|---|---|---|---|---|---|---|---|---|---|
| Markina | All | 199 | | | | | | | |
| | 19, 20, 23 | 2 | G 1.01 | T 1.01 | C 1.01 | A 1.01 | G 1.01 | T 1.01 | C 1.01 |
| | 9+n | 10 | G 3.02 | T 0.50 | C 4.02 | A 0.00 | G 5.03 | T 5.03 | C 5.03 |
| | 10+n | 46 | G 23.11 | T 15.57 | C 23.11 | A 15.07 | G 23.11 | T 23.11 | C 17.59 |
| | 12 + 10 | 9 | G 2.51 | T 3.01 | C 4.52 | A 3.51 | G 4.52 | T 4.52 | C 4.52 |
| | 15 + 9 | 1 | G 0.50 | T 0.00 | C 0.00 | A 0.00 | G 0.50 | T 0.50 | C 0.50 |
| | 18+n | 10 | G 0.00 | T 0.00 | C 5.03 | A 1.51 | G 0.00 | T 0.00 | C 5.03 |
| | 10 + 9+n | 102 | G 36.18 | T 38.69 | C 51.25 | A 37.18 | G 51.25 | T 51.25 | C 51.25 |
| | 9 + 9+n | 8 | G 1.51 | T 1.51 | C 4.02 | A 0.00 | G 1.51 | T 2.51 | C 3.01 |
| | 9 + 12 + 9 | 8 | G 0.00 | T 0.00 | C 0.00 | A 0.00 | G 4.02 | T 4.02 | C 4.02 |
| | 10 + 10 + 12 | 3 | G 0.00 | T 1.51 | C 0.50 | A 1.51 | G 1.01 | T 1.51 | C 1.51 |
| Arratia | All | 51 | | | | | | | |
| | 19, 20, 23 | 1 | G 1.96 | T 0.00 | C 1.96 | A 1.96 | G 1.96 | T 1.96 | C 1.96 |
| | 9+n | 1 | G 0.00 | T 0.00 | C 0.00 | A 0.00 | G 1.96 | T 1.96 | C 1.96 |
| | 10+n | 12 | G 23.53 | T 21.57 | C 23.53 | A 23.53 | G 23.53 | T 23.53 | C 15.69 |
| | 12 + 10 | 7 | G 11.77 | T 1.96 | C 5.89 | A 13.73 | G 13.73 | T 13.73 | C 13.73 |
| | 18+n | 1 | G 1.96 | T 0.00 | C 1.96 | A 0.00 | G 1.96 | T 1.96 | C 1.96 |
| | 10 + 9+n | 27 | G 39.21 | T 41.17 | C 52.94 | A 39.21 | G 52.94 | T 52.94 | C 47.06 |
| | 9 + 9+n | 2 | G 0.00 | T 0.00 | C 0.00 | A 0.00 | G 3.92 | T 3.92 | C 3.92 |

Note: In AGG pattern the position of an AGG is designated by a plus sign (+), the number refers to the triplet length of uninterrupted CGG repeats and n is the remaining number of uninterrupted repeats. Only the frequency of one of the two SNP alleles is indicated.

The distribution of SNPs and AGG interspersion patterns within gray zone CGG repeats is shown in Table 3. Among gray zone alleles, 80% and 85.71% had a double interruption and 20% and 14.29% had a triple interruption in Markina and Arratia, respectively. In both valleys 100% of gray zone alleles had structures with the first AGG interruption in the 10th position. On the other hand, 33% of gray zone alleles were found to have a 3′ pure CGG repeat length ≥20. All these alleles appeared in the Arratia valley. In Markina, 100% of WEX28-T, WEX70-C, WEX1-C, ATL1-G, and WEX10-C and 60% of FMRb-G and WEX17-C had the first AGG interruption in the 10th position (9+n). In Arratia, 71.43% of WEX70-C, WEX1-C, ATL1-G, and FMRb-G, 57.14% of WEX28-T and WEX10-C, and 28.57% of WEX17-C had the first interruption in the 10th position (9+n).

Table 3.  Distribution of SNPs according to the AGG interspersion pattern among FMR1 gray zone alleles (≥35) in Markina and Arratia valleys.

| Valley | AGG pattern | Total (N) | WEX28 G/T (%) | WEX70 C/T (%) | WEX1 A/C (%) | ATL1 A/G (%) | FMRb A/G (%) | WEX17 C/T (%) | WEX10 C/T (%) |
|---|---|---|---|---|---|---|---|---|---|
| Markina | All | 5 | | | | | | | |
| | 9 + 9 + 5 + 11 | 1 | T 20.00 | C 20.00 | C 20.00 | G 20.00 | G 20.00 | T 0.00 | C 20.00 |
| | 9 + 10 + 15 | 4 | T 80.00 | C 80.00 | C 80.00 | G 80.00 | G 40.00 | T 40.00 | C 80.00 |
| Arratia | All | 7 | | | | | | | |
| | 9 + 7 + 20 | 2 | T 28.57 | C 28.57 | C 28.57 | G 28.57 | G 28.57 | T 14.29 | C 14.29 |
| | 9 + 9 + 9 + 9 | 1 | T 14.29 | C 14.29 | C 14.29 | G 14.29 | G 14.29 | T 14.29 | C 14.29 |
| | 9 + 9 + 16 | 1 | T 14.29 | C 14.29 | C 14.29 | G 14.29 | G 14.29 | T 14.29 | C 14.29 |
| | 9 + 9 + 22 | 1 | T 0.00 | C 14.29 | C 0.00 | G 14.29 | G 14.29 | T 14.29 | C 14.29 |
| | 9 + 9 + 23 | 1 | T 0.00 | C 0.00 | C 0.00 | G 0.00 | G 0.00 | T 14.29 | C 0.00 |
| | 9 + 10 + 14 | 1 | T 0.00 | C 0.00 | C 14.29 | G 0.00 | G 0.00 | T 0.00 | C 0.00 |

Note: In AGG pattern the position of an AGG is designated by a plus sign (+), the number refers to the triplet length of uninterrupted CGG repeats and n is the remaining number of uninterrupted repeats. Only the frequency of one of the two SNP alleles is indicated.
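The AGG pattern notation can be unpacked mechanically. The sketch below, which assumes fully numeric patterns such as those of the gray zone alleles (patterns ending in "n" are not handled), derives the total repeat length, the position of the first AGG, and the 3′ pure CGG length, and then recomputes the share of gray zone alleles with a 3′ pure tract of at least 20 repeats. Function and variable names are hypothetical; the patterns and counts are those of Table 3.

```python
def describe_pattern(pattern: str) -> dict:
    """Summarise a fully numeric AGG interspersion pattern such as "9+10+15"."""
    tracts = [int(part) for part in pattern.replace(" ", "").split("+")]
    n_agg = len(tracts) - 1
    return {
        "agg_count": n_agg,
        "total_repeats": sum(tracts) + n_agg,  # each AGG occupies one triplet
        "first_agg_position": tracts[0] + 1 if n_agg else None,
        "pure_3prime": tracts[-1],
    }


# Gray zone patterns and allele counts copied from Table 3.
GRAY_ZONE = {
    "Markina": {"9+9+5+11": 1, "9+10+15": 4},
    "Arratia": {"9+7+20": 2, "9+9+9+9": 1, "9+9+16": 1,
                "9+9+22": 1, "9+9+23": 1, "9+10+14": 1},
}


def share_with_long_3prime(patterns, threshold=20):
    """Percentage of alleles whose 3' pure CGG tract is at least `threshold`."""
    total = sum(patterns.values())
    long_tract = sum(n for p, n in patterns.items()
                     if describe_pattern(p)["pure_3prime"] >= threshold)
    return 100 * long_tract / total


if __name__ == "__main__":
    print(describe_pattern("9 + 10 + 15"))
    for valley, patterns in GRAY_ZONE.items():
        print(valley, round(share_with_long_3prime(patterns), 2))
```

Under this reading, 4 of the 7 Arratia gray zone alleles (about 57%) and none of the Markina ones carry a 3′ pure tract of 20 or more repeats, which is 4 of 12 (33%) overall.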

SNPs and FRAXAC1-DXS548 Haplotype Associations

Among CGG normal repeats, 10 and 9 different haplotypes for FRAXAC1-DXS548 were observed in Markina and Arratia, respectively (Table 4). The distribution of SNP alleles within haplogroups was different in the valleys. In Markina, WEX28-T, WEX70-C, WEX1-C, ATL1-G, FMRb-G, WEX17-T, and WEX10-C alleles, and in Arratia WEX1-C, FMRb-G, WEX17-T, and WEX10-C alleles were associated with many more different haplotypes than the other SNP alleles. The most common haplotype in both valleys was 38–40. This haplotype was overrepresented among normal alleles only in Markina (P < 0.001, Fisher's exact test). In both valleys the majority of individuals with WEX28-G, WEX70-T, WEX1-C, ATL1-A, FMRb-G, WEX17-T, and WEX10-C were found in haplotype 38–40 (P < 0.01, χ2-test in all cases).

Table 4.  Distribution of SNPs according to the FRAXAC1-DXS548 haplotypes among FMR1 normal alleles (<35) in Markina and Arratia valleys.

| Valley | Haplotype | Total (N) | WEX28 G/T (%) | WEX70 C/T (%) | WEX1 A/C (%) | ATL1 A/G (%) | FMRb A/G (%) | WEX17 C/T (%) | WEX10 C/T (%) |
|---|---|---|---|---|---|---|---|---|---|
| Markina | All | 199 | | | | | | | |
| | 34–40 | 2 | G 1.00 | T 0.50 | C 1.00 | A 0.50 | G 1.00 | T 1.00 | C 1.00 |
| | 36–42 | 12 | G 1.50 | T 0.50 | C 0.00 | A 1.00 | G 6.03 | T 6.03 | C 6.03 |
| | 36–48 | 3 | G 0.00 | T 0.00 | C 1.50 | A 0.00 | G 1.50 | T 1.50 | C 1.50 |
| | 36–50 | 7 | G 0.00 | T 0.00 | C 3.52 | A 0.00 | G 3.52 | T 3.52 | C 3.52 |
| | 38–40 | 137 | G 59.32ᵃ | T 48.75ᵃ | C 68.36ᵃ | A 47.24ᵃ | G 65.34ᵃ | T 64.34ᵃ | C 63.33ᵃ |
| | 38–42 | 19 | G 1.00 | T 7.54 | C 9.55 | A 8.04 | G 9.55 | T 8.54 | C 9.55 |
| | 42–42 | 10 | G 5.03 | T 3.02 | C 5.03 | A 3.02 | G 5.03 | T 5.03 | C 5.03 |
| | 42–48 | 6 | G 0.00 | T 0.00 | C 3.01 | A 0.00 | G 0.00 | T 3.01 | C 3.01 |
| | 42–50 | 2 | G 0.00 | T 0.00 | C 1.00 | A 0.00 | G 0.00 | T 0.50 | C 0.50 |
| | 42–54 | 1 | G 0.00 | T 0.00 | C 0.50 | A 0.00 | G 0.00 | T 0.00 | C 0.00 |
| Arratia | All | 51 | | | | | | | |
| | 34–40 | 1 | G 1.96 | T 1.96 | C 1.96 | A 1.96 | G 1.96 | T 1.96 | C 0.00 |
| | 36–40 | 1 | G 0.00 | T 1.96 | C 1.96 | A 1.96 | G 1.96 | T 1.96 | C 1.96 |
| | 36–42 | 1 | G 1.96 | T 0.00 | C 0.00 | A 0.00 | G 1.96 | T 1.96 | C 1.96 |
| | 36–48 | 1 | G 0.00 | T 0.00 | C 0.00 | A 0.00 | G 1.96 | T 1.96 | C 1.96 |
| | 38–40 | 33 | G 64.71ᵃ | T 54.90ᵃ | C 54.90ᵃ | A 58.83ᵃ | G 64.71ᵃ | T 64.71ᵃ | C 52.94ᵃ |
| | 38–42 | 7 | G 7.85 | T 11.77 | C 13.73 | A 13.73 | G 13.73 | T 13.73 | C 13.73 |
| | 40–40 | 1 | G 1.96 | T 1.96 | C 1.96 | A 1.96 | G 1.96 | T 1.96 | C 1.96 |
| | 42–48 | 3 | G 0.00 | T 0.00 | C 5.88 | A 0.00 | G 5.88 | T 5.88 | C 5.88 |
| | 42–50 | 3 | G 0.00 | T 0.00 | C 5.88 | A 0.00 | G 5.88 | T 5.88 | C 5.88 |

Note: Allele nomenclature of haplotypes according to Chiurazzi et al. (1999). Only the frequency of one of the two SNP alleles is indicated.
ᵃ Alleles overrepresented in the most common haplotype (38–40; P < 0.01, χ2-test).

The distribution of SNP alleles and FRAXAC1-DXS548 haplotypes within gray zone CGG repeats is shown in Table 5. Three different haplotypes for FRAXAC1-DXS548 were observed in both valleys. In Markina, as in the general Basque sample from Biscay, the most common haplotype was 42–50. In this valley each of the seven SNPs correlated with the haplotype 42–50. For example, 60% of individuals with alleles WEX28-T, WEX70-C, WEX1-C, ATL1-G, WEX17-C, and WEX10-C and 40% with allele FMRb-A were found in haplotype 42–50. In Arratia the most common haplotype was 38–40; this haplotype was also overrepresented among CGG normal repeats. In this valley, 57.15% of individuals with alleles WEX1-C, FMRb-G, and WEX17-T and 42.86% with WEX28-G, WEX70-C, ATL1-G, and WEX10-C alleles were found in haplotype 38–40. In relation to the other haplotypes, in Markina 20% of individuals with alleles WEX28-T, WEX70-C, WEX1-C, ATL1-G, FMRb-G, WEX17-T, and WEX10-C were associated with haplotypes 36–48 and 36–50. In Arratia, 14.28% of individuals with alleles WEX28-T, WEX70-C, ATL1-G, FMRb-G, and WEX17-T were associated with haplotypes 36–48 and 42–48. In this valley the two alleles of SNPs WEX1 and WEX10 were associated with haplotypes 36–48 and 42–48.

Table 5.  Distribution of SNPs according to the FRAXAC1-DXS548 haplotypes among FMR1 gray zone alleles (≥35) in Markina and Arratia valleys.

| Valley | Haplotype | Total (N) | WEX28 G/T (%) | WEX70 C/T (%) | WEX1 A/C (%) | ATL1 A/G (%) | FMRb A/G (%) | WEX17 C/T (%) | WEX10 C/T (%) |
|---|---|---|---|---|---|---|---|---|---|
| Markina | All | 5 | | | | | | | |
| | 36–48 | 1 | T 20.00 | C 20.00 | C 20.00 | G 20.00 | G 20.00 | C 0.00 | C 20.00 |
| | 36–50 | 1 | T 20.00 | C 20.00 | C 20.00 | G 20.00 | G 20.00 | C 0.00 | C 20.00 |
| | 42–50 | 3 | T 60.00ᵃ | C 60.00ᵃ | C 60.00ᵃ | G 60.00ᵃ | G 20.00 | C 60.00ᵃ | C 60.00ᵃ |
| Arratia | All | 7 | | | | | | | |
| | 36–48 | 1 | T 14.28 | C 14.28 | C 0.00 | G 14.28 | G 0.00 | C 14.28 | C 14.28 |
| | 38–40 | 5 | T 28.58 | C 42.86 | C 57.16 | G 42.86 | G 57.16 | C 14.28 | C 42.86 |
| | 42–48 | 1 | T 14.28 | C 14.28 | C 14.28 | G 14.28 | G 14.28 | C 0.00 | C 0.00 |

Note: Allele nomenclature of haplotypes according to Chiurazzi et al. (1999). Only the frequency of one of the two SNP alleles is indicated.
ᵃ Alleles overrepresented in one of the most prevalent fragile X FRAXAC1-DXS548 haplotypes (42–50; P ≤ 0.05, Fisher's exact test).

Certain FRAXAC1-DXS548 haplotypes have been previously described as positively or negatively associated with the fragile X mutation. One of the most prevalent fragile X FRAXAC1-DXS548 haplotypes was 42–50, as was observed in the general Basque sample from Biscay. This haplotype was associated with gray zone alleles only in Markina. A significant positive association between this haplotype, the gray zone CGG repeat length and the SNP WEX28-T, WEX70-C, WEX1-C, ATL1-G, WEX17-C, and WEX10-C alleles was found only in Markina (P ≤ 0.05 in all cases, Fisher's exact test).
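The "most common haplotype" statements in this section can be read straight off the haplotype counts in Tables 4 and 5. The following sketch tabulates them; the dictionary layout and function name are assumptions made for illustration, and only the counts come from the tables.

```python
# FRAXAC1-DXS548 haplotype counts copied from Tables 4 (normal) and 5 (gray zone).
HAPLOTYPE_COUNTS = {
    ("Markina", "normal"): {"34–40": 2, "36–42": 12, "36–48": 3, "36–50": 7,
                            "38–40": 137, "38–42": 19, "42–42": 10,
                            "42–48": 6, "42–50": 2, "42–54": 1},
    ("Arratia", "normal"): {"34–40": 1, "36–40": 1, "36–42": 1, "36–48": 1,
                            "38–40": 33, "38–42": 7, "40–40": 1,
                            "42–48": 3, "42–50": 3},
    ("Markina", "gray zone"): {"36–48": 1, "36–50": 1, "42–50": 3},
    ("Arratia", "gray zone"): {"36–48": 1, "38–40": 5, "42–48": 1},
}


def most_common_haplotype(counts):
    """Return the modal haplotype and its share (in %) of the chromosomes."""
    name = max(counts, key=counts.get)
    return name, round(100 * counts[name] / sum(counts.values()), 2)


if __name__ == "__main__":
    for (valley, size_class), counts in HAPLOTYPE_COUNTS.items():
        print(valley, size_class, len(counts), most_common_haplotype(counts))
```

The output shows 38–40 as the modal haplotype in every group except Markina gray zone chromosomes, where 42–50 accounts for 3 of 5.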

Discussion

The genetic mechanisms behind instability and expansion of the FMR1 gene are still not fully understood. To investigate the association between a predisposition to expansion and genetic alterations that may be acting in cis, in this study we used seven SNPs linked to the FRAXA locus. Data from Mathews et al. (2001) indicate that cis-acting factors, defined by SNPs, because of their relative low mutation rate, remain in the population without sustaining recurrent mutation much longer than do microsatellites. Therefore, these markers have a greater time depth than microsatellites, revealing features of ancient history that could provide a distinct advance to studies investigating the origin of CGG repeat instability of FMR1.

Following Arrieta et al. (2003), Basques are an ancient population now living in the west of the Pyrenees Mountains. They speak an ancient language with very distinct characteristics from the surrounding populations. The Basque language, “Euskara,” is an extreme case of a relic language that has survived through thousands of years of continuous linguistic turnover in neighboring regions (Cavalli-Sforza, 1991). According to Cavalli-Sforza & Piazza (1993), “conservation of a distinct language must have been an important factor in maintaining social and genetic identity.” Although it might seem difficult to delimit a series of natural regions, in the Basque Country it is possible to distinguish different valleys with common features but a clear peculiarity that makes them different. This study was performed in two valleys from Biscay province, Markina and Arratia. The orography of this province, with 80 km of coast and several rivers, has caused the appearance of these valleys, each of them with their own resources and a relative isolation until recent times. The isolation that has occurred even among the valleys is documented also by the Basque linguistic differentiation between them. Markina is sited in the northeast side. This region shows isolating features because of the presence of a mountain range parallel to the sea. Arratia is located in the south of the province between a river and a mountain range, and also shows isolating features. Farming and cattle breeding were very important until the beginning of the 20th century.

In agreement with Gunter et al. (1998), we believe that the SNPs analyzed are unlikely to directly affect repeat stability as they are several kb away from the CGG repeat of FMR1. Thus, SNPs are more likely to be linked to cis-acting sequences directly influencing stability. Accordingly, to evaluate the role of SNPs in the CGG repeat instability in the two Basque valleys analyzed and gain further insight into their genetic diversity, we examined the association between SNP alleles and all the known risk factors identified with predisposition to CGG repeat expansion in FMR1: the CGG repeat size, the AGG interruption pattern, and two microsatellite markers (FRAXAC1 and DXS548).

With respect to CGG repeat size, the alleles of the seven SNPs analyzed were not equally linked to chromosomes with CGG normal and gray zone repeats. In the total sample and also in the valleys, the analysis of the SNPs showed that the WEX28(G)–WEX70(T)–WEX1(C)–ATL1(A)–FMRb(G)–WEX17(T)–WEX10(C) haplotype was found more frequently in the normal CGG repeat.

There is also an association between some SNP alleles and gray zone chromosomes, and this association is strongest in Markina, where all gray zone chromosomes carried the WEX28-T, WEX70-C, WEX1-C, ATL1-G, and WEX10-C alleles. In populations of Caucasian origin, Gunter et al. (1998), Crawford et al. (2000a) and Curlis et al. (2005) found that allele G of ATL1 was in significant linkage disequilibrium with gray zone CGG repeat alleles and Ennis et al. (2007) found that allele C of WEX70 occurred on chromosomes with gray zone CGG repeats. Moreover, Kunst & Warren (1994) and Curlis et al. (2005) found that the frequency of allele A of FMRb was much higher in gray zone alleles. Nevertheless, in the valleys analyzed, allele G of FMRb was overrepresented among normal and gray zone alleles. The data also showed that the association of ATL1-G and WEX70-C alleles found in gray zone CGG repeats in Markina was more similar to that found in other Caucasian populations than to that found in Arratia.

The analysis of the relationship between SNPs and AGG interruption pattern of the FMR1 CGG repeat in normal chromosomes and in both valleys showed again that the WEX28(G)–WEX70(T)–WEX1(C)–ATL1(A)–FMRb(G)–WEX17(T)–WEX10(C) haplotype was tightly associated with the most frequent repeat patterns 10 + 9+n and 10+n. A study by Zhou et al. (2006) suggested that the presence of the ATL1-A allele is a strong predictor of the presence in cis of the 10 + 9+n repeat pattern. According to Crawford et al. (2000b) if 9+n structures (with the first AGG interruption in the 10th position) were inherently more unstable than 10+n structures (with the first AGG interruption in the 11th position), one might expect to see an overabundance of 9+n structures among gray zone alleles. In fact, in both valleys all the gray zone chromosomes were linked to the 9+n configurations. In Markina, 100% of individuals with the WEX28(T)–WEX70(C)–WEX1(C)–ATL1(G)–WEX10(C) haplotype were associated with 9+n tracts. In this valley gray zone alleles had ≤20 pure repeats at the 3′ end. In Arratia approximately 60% of individuals with the WEX28(T)–WEX70(C)–WEX1(C)–ATL1(G)–WEX10(C) haplotype were associated with 9+n structures. In this valley 57% of gray zone alleles had ≥20 pure repeats at the 3′ end. As in other analyzed populations of Caucasian origin (Gunter et al., 1998; Crawford et al., 2000b; Ennis et al., 2001), in both valleys allele G of ATL1 was associated with 9+n structures.

In our previous study (Arrieta et al., 2003), a significant positive association between the “normal” FRAXAC1-DXS548 haplotype 38–40 and the lowest CGG repeats (<35) was found in Markina and Arratia. In this study, this STR haplotype was also associated in both valleys with the same SNP haplotype found in excess in the normal CGG repeats and with the AGG interspersion patterns 10 + 9+n and 10 + 9. In gray zone chromosomes, in Markina valley, one of the most prevalent fragile X FRAXAC1-DXS548 haplotypes, 42–50, was linked exclusively to the SNP WEX28(T)–WEX70(C)–WEX1(C)–ATL1(G)–WEX10(C) haplotype. In studies of Caucasian populations, Brightwell et al. (2002a) showed that the 42–50 haplotype was linked exclusively to allele C of WEX1; Gunter et al. (1998), Ennis et al. (2001), and Brightwell et al. (2002a) showed that allele G of ATL1 was found in association with haplotype 42–50; and the study of Ennis et al. (2007) suggested that the allele C of WEX70 arose on chromosomes with haplotype 42–50. Ennis et al. (2007) also proposed that allele C of WEX70 may be of importance in assigning high/low expansion risk and suggested that “further investigations of this SNP in samples of diverse ethnicity will help to establish the etiology underlying our observations.”

These results suggest that the two alleles of WEX28, WEX70, WEX1, ATL1, and WEX10 are not equally associated with chromosomes with CGG repeat instability. In both valleys, allele T of WEX28, allele C of WEX70, allele C of WEX1, allele G of ATL1, and allele C of WEX10 are associated with gray zone alleles, with the AGG repeat in the 10th position of the FMR1 triplet array. Considering that instability increases with the number of repeats (Fu et al., 1991; Heitz et al., 1992; Richards & Sutherland, 1992; Nolin et al., 1996; Sherman et al., 1996; Ashley-Koch et al., 1998; Nolin et al., 2003) and that the position of the first AGG interruption might also be a factor for instability (Eichler et al., 1994; Kunst & Warren, 1994; Eichler et al., 1996; Kunst et al., 1996; Murray et al., 1997; Crawford et al., 2000b; Mathews et al., 2001; Napierala et al., 2005), these SNP alleles are preferentially associated with cis-acting sequences directly influencing instability.

The comparison of the two Basque valleys analyzed also reveals important differences: (1) Frequency and structure of “susceptible” alleles are different in the two Basque valleys analyzed. Arratia has a slightly increased frequency of “susceptible” alleles, defined by the 3′ pure CGG repeat length and the number of gray zone alleles. (2) The association between susceptible alleles and STR and SNP haplotypes is also different between Markina and Arratia. Only in Markina are all the individuals with gray zone alleles and 9+n structures associated with the WEX28(T)–WEX70(C)–WEX1(C)–ATL1(G)–WEX10(C) haplotype. Moreover, only in Markina are all the gray zone chromosomes with the most prevalent fragile X FRAXAC1-DXS548 (42–50) haplotype associated with the SNP WEX28(T)–WEX70(C)–WEX1(C)–ATL1(G)–WEX10(C) haplotype.
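Because all sampled individuals are male, each carries a single X chromosome and the SNP haplotype can be read directly from the seven genotypes. A minimal sketch of how the two haplotypes discussed here could be represented and matched is given below; the data structures, function names, and the example chromosome are hypothetical.

```python
# SNP order from centromere to telomere, as listed in the Introduction.
SNP_ORDER = ("WEX28", "WEX70", "WEX1", "ATL1", "FMRb", "WEX17", "WEX10")

# Seven-SNP haplotype found in excess on normal CGG repeats.
NORMAL_HAPLOTYPE = dict(zip(SNP_ORDER, "GTCAGTC"))

# Five-SNP haplotype associated with gray zone alleles.
GRAY_ZONE_HAPLOTYPE = {"WEX28": "T", "WEX70": "C", "WEX1": "C",
                       "ATL1": "G", "WEX10": "C"}


def format_haplotype(chromosome):
    """Render a chromosome's alleles in the paper's WEX28(G)–WEX70(T)–... notation."""
    return "–".join(f"{snp}({chromosome[snp]})"
                    for snp in SNP_ORDER if snp in chromosome)


def carries(chromosome, haplotype):
    """True when the chromosome matches every allele of the given haplotype."""
    return all(chromosome.get(snp) == allele for snp, allele in haplotype.items())


if __name__ == "__main__":
    # Hypothetical chromosome, for illustration only.
    example = {"WEX28": "T", "WEX70": "C", "WEX1": "C", "ATL1": "G",
               "FMRb": "A", "WEX17": "C", "WEX10": "C"}
    print(format_haplotype(example))
    print(carries(example, GRAY_ZONE_HAPLOTYPE), carries(example, NORMAL_HAPLOTYPE))
```

Matching on a partial haplotype, as `carries` does, mirrors the way the five-SNP haplotype is used in the text while FMRb and WEX17 are left free to vary.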

The results may indicate that in Arratia, the SNP status does not identify a pool of susceptible alleles as it does in Markina. The data obtained from Markina in relation to the SNP WEX70 support the suggestions of Ennis et al. (2007). In Arratia the SNP haplotype association also reveals a potential new “protective” factor. According to Crawford et al. (2000a), the haplotype association between SNPs, flanking microsatellites, and FRAXA repeat size probably reflects the mutational and population history of the CGG repeat expansion, rather than identifying susceptible haplotypes involved in the mechanism of instability.

Acknowledgements

We acknowledge the individuals from Markina and Arratia for their cooperation in this study. This work was supported by the Department of Education, Universities, and Research of the Basque Government (IT-409–07) and by the Vice-rectorate for research of the University of the Basque Country (GIU 10/05).
