Phenotype‐genotype relations in facioscapulohumeral muscular dystrophy type 1

To determine how much of the clinical variability in facioscapulohumeral muscular dystrophy type 1 (FSHD1) can be explained by the D4Z4 repeat array size, D4Z4 methylation and familial factors, we included 152 carriers of an FSHD1 allele (23 single cases, 129 familial cases from 37 families) and performed state-of-the-art genetic testing, extensive clinical evaluation and quantitative muscle MRI. Familial factors accounted for 50% of the variance in disease severity (FSHD clinical score). The D4Z4 repeat array size explained only a limited share of the variance in disease severity (approximately 10%), and this share varied per body region (approximately 30% for the facial muscles, 15% for the upper extremities and 3% for the lower extremities). Unaffected gene carriers had longer repeat arrays than symptomatic individuals (7.3 vs 6.0 units, P < 0.001) and slightly higher Delta1 methylation levels (D4Z4 methylation corrected for repeat size, 0.96 vs −2.46, P = 0.048).

The D4Z4 repeat array size and D4Z4 methylation contribute to variability in disease severity and penetrance, but other disease modifying factors must be involved as well. The larger effect of the D4Z4 repeat array on facial muscle involvement suggests that these muscles are more sensitive to the influence of the FSHD1 locus itself, whereas leg muscle involvement seems highly dependent on modifying factors.

KEYWORDS
disease modifiers, epigenetics, facioscapulohumeral muscular dystrophy (FSHD), genotype, phenotype

1 | INTRODUCTION
Facioscapulohumeral muscular dystrophy (FSHD) is one of the most common inherited muscle disorders. 1 It is characterized by progressive and typically asymmetrical weakness and wasting of facial, shoulder girdle and upper arm muscles, and often also trunk and leg muscles. 2 The degree of muscle involvement and the rate of disease progression are highly variable both between and within families. 3,4
FSHD is caused by the expression of the DUX4 transcription factor that is normally suppressed in somatic cells. 5,6 A copy of the DUX4 gene is located within each unit of the D4Z4 repeat array on chromosome 4q35, with a complete DUX4 gene in the most distal D4Z4 unit. In the normal population this repeat array varies between 8 and 100 units, whereas in facioscapulohumeral muscular dystrophy type 1 (FSHD1), the most common form of FSHD, it is contracted to 1 to 10 D4Z4 units. 7 [...] gene instead of a repeat contraction. 9,10 Both gene products are necessary to establish or maintain a repressive D4Z4 chromatin structure in somatic tissue. In both FSHD1 and FSHD2 the mutations only lead to disease if they are present on specific haplotypes that provide the necessary polyadenylation signal (DUX4PAS) to stabilize the DUX4 transcript. 7,11 A small number of FSHD cases cannot be explained genetically by these two mechanisms.
For FSHD1 a rough inverse correlation between the number of D4Z4 repeat units and disease severity has been repeatedly described. 3,12-15 The majority of patients with 1 to 3 repeat units have a severe phenotype, while patients with 7 to 10 repeat units tend to be more mildly affected. 16,17 However, variability in disease severity is large for all repeat array sizes. Within families with repeat array sizes of 7 to 10 units, asymptomatic or non-penetrant gene carriers are found frequently (up to 30% of family members). 15,18 These longer-sized repeat arrays are also found in 1% to 2% of the healthy Caucasian population, indicating that they are disease permissive, but not always pathogenic. 19,20
Since the discovery of the disease mechanism for FSHD2, it is becoming increasingly clear that not only the D4Z4 repeat size, but also the epigenetic state of the D4Z4 locus contributes to disease severity and penetrance. Observations that pathogenic variants in SMCHD1 aggravate disease severity in FSHD1 families suggested that D4Z4 chromatin modifiers influence DUX4 expression in skeletal muscle. 21 This hypothesis was supported by the lower CpG methylation level that was found in symptomatic individuals with 7 to 10 repeat units compared to asymptomatic and non-penetrant gene carriers with the same repeat size. 22
Still, with the current knowledge of the disease mechanism we cannot adequately explain the large clinical variability, even within families. Most likely, disease severity and penetrance are determined through a complex interplay of genetic, epigenetic and environmental and/or lifestyle factors. Two of the contributing factors are the D4Z4 repeat array size and D4Z4 chromatin structure (reflected by the methylation level), although it is unclear how much of the clinical variability can be explained by these factors.
Additionally, because of the characteristic pattern of muscle involvement, the influence of the genetic defect and disease-modifying factors may differ between body regions or muscle groups.
In this study, we combine state-of-the-art genetic testing for FSHD with extensive clinical data in a large cohort of FSHD1 patients to assess how much of the clinical variability can be explained with our current knowledge on the (epi)genetic mechanism. We use family data to estimate the influence of familial factors on disease severity and include a detailed description of clinical features to further refine phenotype-genotype correlations.

| Patients
We recruited patients between 2014 and 2015 through the Neurology department of the Radboud University Medical Center, the national referral center for FSHD patients in the Netherlands. We performed genetic testing on individuals aged 18 years or older and (a) with an FSHD phenotype, or (b) without an FSHD phenotype, but with at least one affected first-degree family member. All individuals who tested positive for FSHD1 (D4Z4 repeat array size 1-10 units on a DUX4PAS containing haplotype) were included. 11,19 Exclusion criteria were the presence of pathogenic variants in SMCHD1 or DNMT3B and somatic mosaicism for the D4Z4 repeat array contraction. Asymptomatic mutation carriers were defined as individuals aged 25 years and older who did not report symptoms of FSHD on history taking, but who showed signs of FSHD on physical examination. Non-penetrant mutation carriers were aged 25 years and older, reported no symptoms and had no signs of FSHD on physical examination.

| Genetic testing
For all samples we isolated blood-derived genomic DNA (gDNA), which was analyzed for D4Z4 repeat size and haplotype on chromosomes 4q and 10q, as previously described. 11 Southern blot analysis of gDNA after digestion with the methylation-sensitive restriction enzyme FseI was used to determine the CpG methylation at the D4Z4 repeats on chromosomes 4 and 10. The Delta1 score, a measure of the degree of D4Z4 hypomethylation, was calculated as previously described. 22 D4Z4 CpG methylation depends on repeat size, and the Delta1 score indicates the difference between the D4Z4 CpG methylation expected from the number of repeat units and the observed methylation. 22 Detailed protocols are freely available from the Fields Center website (www.urmc.rochester.edu/fields-center).
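Read this way, the Delta1 score is a residual. As a sketch in notation that is not the source's: $M_{\mathrm{obs}}$ is the observed FseI methylation, $n$ the number of D4Z4 repeat units, and $M_{\mathrm{exp}}(n)$ the methylation predicted for that repeat size by the model of reference 22, whose form is not given here. The sign convention (observed minus expected, so that negative values indicate hypomethylation) is an assumption, chosen because symptomatic carriers are reported with the lower Delta1 values.

```latex
\Delta_1 = M_{\mathrm{obs}} - M_{\mathrm{exp}}(n)
```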

| Muscle MRI
Ninety patients also participated in a large quantitative muscle MRI study on FSHD. 26 Scanning protocol and data processing are described in detail elsewhere. 26 The MR imaging was performed on a 3-Tesla MR system (TIM Trio; Siemens, Erlangen, Germany). Briefly, the legs were scanned using a Dixon 2.0 sequence. Slice thickness was set at 5 mm. The Dixon sequence fat fraction map was used to draw a region of interest for each of the leg muscles. Muscle fat fractions were calculated per region of interest. Fat fractions below 15% are considered normal. 27
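As an illustration of the per-muscle calculation, the sketch below computes a fat fraction from fat and water signal using the usual Dixon definition, fat / (fat + water), averages it over the voxels of a region of interest and applies the 15% cut-off. Averaging voxel values per region, the function names and the voxel data are assumptions for illustration; the actual processing pipeline is described in reference 26.

```python
NORMAL_THRESHOLD_PERCENT = 15.0  # fat fractions below this are considered normal


def fat_fraction_percent(fat_signal: float, water_signal: float) -> float:
    """Dixon fat fraction of one voxel, in percent: fat / (fat + water)."""
    total = fat_signal + water_signal
    if total <= 0:
        raise ValueError("fat + water signal must be positive")
    return 100.0 * fat_signal / total


def roi_fat_fraction_percent(voxels: list[tuple[float, float]]) -> float:
    """Mean fat fraction over the (fat, water) voxels of one region of interest.

    Averaging voxel-wise fat fractions is an assumption of this sketch.
    """
    if not voxels:
        raise ValueError("region of interest contains no voxels")
    values = [fat_fraction_percent(fat, water) for fat, water in voxels]
    return sum(values) / len(values)


def is_normal(fat_fraction: float) -> bool:
    """True when the fat fraction (in percent) is below the 15% cut-off."""
    return fat_fraction < NORMAL_THRESHOLD_PERCENT


# Hypothetical regions of interest: lists of (fat signal, water signal) per voxel.
rois = {
    "muscle_a": [(5.0, 95.0), (15.0, 85.0)],   # low fat content
    "muscle_b": [(40.0, 60.0), (60.0, 40.0)],  # substantial fatty infiltration
}

results = {name: roi_fat_fraction_percent(voxels) for name, voxels in rois.items()}
for name, value in results.items():
    label = "normal" if is_normal(value) else "abnormal"
    print(f"{name}: {value:.1f}% ({label})")
```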

| Protocol approval
This study was conducted according to the principles of the Declaration of Helsinki (version October 2013) and in accordance with the Medical Research Involving Human Subjects Act (WMO). The study protocol was approved by the regional medical ethics committee (CMO region Arnhem-Nijmegen). All participants signed informed consent.

| Statistical analyses
Statistical analyses were performed using SPSS Statistics version 22 and R Studio version 3.2. Descriptive statistics (mean, SD, range and frequency) were calculated for each variable. The relationship between the age-corrected FSHD clinical score (FSHD clinical score divided by age at examination) and the D4Z4 repeat array size was studied visually by a scatter plot and a best-fitting trend line based on the least-squares method.
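The age correction and the trend line can be sketched as follows. The records are invented for illustration and the helper names are hypothetical; the sketch follows the definition given in the text (score divided by age) and fits a straight line with the closed-form least-squares solution, which may differ from the trend line type used in the actual analysis.

```python
def age_corrected_score(clinical_score: float, age_years: float) -> float:
    """FSHD clinical score divided by age at examination."""
    if age_years <= 0:
        raise ValueError("age must be positive")
    return clinical_score / age_years


def least_squares_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Slope and intercept of the ordinary least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical records: (D4Z4 repeat units, FSHD clinical score, age in years).
records = [(3, 12, 40), (5, 9, 45), (7, 4, 50), (9, 2, 50)]

repeat_units = [float(units) for units, _, _ in records]
corrected = [age_corrected_score(score, age) for _, score, age in records]

slope, intercept = least_squares_line(repeat_units, corrected)
print(f"trend line: score/age = {slope:.3f} * repeat units + {intercept:.3f}")
```

A negative slope in such a plot corresponds to the inverse relation between repeat array size and severity discussed in the introduction.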
Next, nested linear regression models were fitted to study the fraction of variance in disease severity (FSHD clinical score, dependent variable) explained by each of the variables age, sex, D4Z4 repeat array size and Delta1 methylation score (independent variables). This was performed by adding the independent variables stepwise to the model to assess the additional explained variance by each of the added variables.
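The stepwise procedure can be sketched as a sequence of ordinary least-squares fits in which each model adds one predictor to the previous one, so that the gain in R² is the additional variance explained by that predictor. The data below are invented, the helper names are hypothetical, and the choice of age plus sex as the starting model is an assumption of this sketch rather than something the text confirms.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns: list[list[float]], y: list[float]) -> float:
    """R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical cohort of eight carriers (not study data).
data = {
    "age": [25.0, 32.0, 41.0, 48.0, 53.0, 60.0, 66.0, 72.0],
    "sex": [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0],
    "repeats": [8.0, 5.0, 7.0, 3.0, 6.0, 4.0, 9.0, 2.0],
    "delta1": [1.0, -3.0, 2.0, -4.0, 0.5, -1.5, 3.0, -2.0],
}
score = [1.0, 4.0, 3.0, 9.0, 5.0, 8.0, 3.0, 12.0]  # FSHD clinical score

# Nested models: each step keeps the earlier predictors and adds one more.
steps = [
    ("age + sex (assumed start)", ["age", "sex"]),
    ("+ D4Z4 repeat size", ["age", "sex", "repeats"]),
    ("+ Delta1 score", ["age", "sex", "repeats", "delta1"]),
]
r2_by_model = [r_squared([data[name] for name in names], score) for _, names in steps]
added = [r2_by_model[0]] + [b - a for a, b in zip(r2_by_model, r2_by_model[1:])]

for (label, _), r2, gain in zip(steps, r2_by_model, added):
    print(f"{label:26s} R^2 = {r2:.3f} (added {gain:.3f})")
```

Because the models are nested, R² cannot decrease from one step to the next; the increments depend on the order in which predictors enter whenever predictors are correlated, as repeat size and methylation are.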
Because we were mainly interested in the additional value of D4Z4 repeat array size and Delta1 methylation score, the "baseline model" [...] Table 1. There was [...]
Figure 1 shows a scatter plot of the age-corrected disease severity (FSHD clinical score) against the D4Z4 repeat array size. Patients with 7 to 9 D4Z4 repeat units were less severely affected than patients with 3 to 6 D4Z4 repeat units (age-corrected FSHD clinical score 9.7 vs 16.2, P < 0.001). For the Delta1 methylation score no significant association with age-corrected disease severity was found (Figure 2).

| Explained variance in disease severity
The Delta1 methylation score decreased with an increase in the D4Z4 repeat array size (Table 2). The explained variance (coefficient of determination, R²) for each of the nested linear regression models is given in Table 3.

| Disease penetrance
This study included 14 asymptomatic and 9 non-penetrant gene car-

| Explained variance in disease severity per body region
We further refined the phenotype-genotype relations by applying the nested linear models (models 1-3) to various outcome measures for three body regions: the face, the upper extremities and the lower extremities (Table 3). We found that approximately 30% of the variance in the involvement of the facial muscles (facial score) was [...] are contradictory. 28-30 One study on aerobic exercise in FSHD showed that it slows down disease progression in leg muscles. 31 The characteristic pattern of muscle involvement in FSHD prompted us to assess whether the influence of genetic and epigenetic factors differs per body region or muscle group. Indeed, the D4Z4 repeat array size had a stronger influence on the degree of facial weakness than on the upper and lower extremity involvement.
This is in line with previous studies showing that patients with a facial-sparing phenotype generally have repeat array sizes of >30 kb (approximately 7 units). 32-40 In contrast to the facial muscles, leg muscle involvement was influenced by age, but hardly by D4Z4 repeat array size. Remarkably, there was no difference in the influence of the D4Z4 repeat array size and methylation between frequently involved and frequently spared leg muscles. 26,41
These findings raise the question whether the facial muscles, whose weakness is the most characteristic and often the first symptom of FSHD, are more sensitive to (differences in) DUX4 expression levels than other muscles. There are no histological or molecular data on the facial muscles in FSHD because they cannot be biopsied, and clinical knowledge is lacking as well. However, given the recent studies suggesting a functional relationship between DUX4 and the myogenic Pax3 and Pax7 homeodomain transcription factors, but not with other related homeodomains such as Pitx2 and Tbx1, it is tempting to speculate that facial muscles are more susceptible to DUX4 damage during development. 42,43
The small influence of the D4Z4 repeat array size on the degree of leg muscle involvement suggests that these muscles are more sensitive to modifying factors, or that compensation by other myogenic homeobox proteins takes place. Because all patients with leg muscle involvement also had some degree of facial and/or shoulder girdle muscle involvement, DUX4 expression is likely to be required as a trigger to induce disease activity in the leg muscles. This could indicate that the involvement of leg muscles results from a complex interplay of downstream effects of DUX4 together with various modifying factors. Possibly, the influence of physical activity is larger for the lower extremity muscles, as the level of activity is more variable for the leg muscles than for the facial muscles.
Additional research is required to test this hypothesis.