Systemic sclerosis (SSc) is a multifactorial disease characterized by a triad of typical features: widespread vasculopathy, abnormalities of the immune system, and fibrosis (1). All of these characteristics are subtly interlaced and influence each other in a pathologic network that leads to the onset of the disease or to the development of specific organ manifestations (2, 3). In particular, it has been hypothesized that vascular injury and activation of the immune system constitute the early events and background that sustain the development of fibrosis (2) or the onset of other manifestations, such as pulmonary hypertension or scleroderma renal crisis (4).
Cytokines are key mediators of the immune system with a widespread array of functions, ranging from the regulation of inflammation to cell activation, proliferation, or differentiation (5). Cytokines may also promote the deposition of collagen and fibrosis (6) and many studies have focused on the role of these mediators in SSc, depicting alterations in their concentrations (7–9) or in the balance between Th1 and Th2 cytokine levels (10). Because cytokine production is regulated at the genetic level (11–14), it has been hypothesized that single-nucleotide polymorphisms (SNPs) in or near cytokine genes may be relevant to the development of SSc. Nonetheless, studies conducted so far on this matter have often yielded disappointing results (15–17) and, in some cases, the associations described by some authors have not been confirmed in replication studies conducted in other independent populations (16, 18–20). These contradictory results could be ascribed to different factors. First, the studies used small sample sizes and therefore were unable to depict a real association due to Type II errors (21). Second, the studied SNPs might not have a causative role in the pathogenesis of SSc, but rather they might only be relevant in the progression or in the expression of the disease (17). Third, each SNP may not have a discernible main independent effect on disease risk, but its effect may be dependent on other genetic variations (gene–gene interaction or epistasis) (22). The latter aspect is of particular importance when dealing with multifactorial diseases, such as SSc, and particularly in the present context because cytokines are redundant in their activity and because they may influence each other's production and function by acting synergistically or antagonistically (5).
The evaluation of gene–gene interaction in multifactorial diseases is a challenging task, and to date, several approaches have been developed for this purpose, including parametric statistical methods such as linear and logistic regression (23) or nonparametric methods such as genetic programming neural networks (24), multilocus genotype-pedigree disequilibrium test (25), or multifactor dimensionality reduction (MDR) (26). Parametric methods suffer from a general lack of power and flexibility to detect high-order gene–gene interactions and thus nonparametric models are regarded as more appropriate in the context of statistical epistasis as alternative research strategies (27, 28).
MDR is a nonparametric and genetic model-free data mining method developed to detect gene–gene interactions. It examines all possible SNP combinations from a given set of SNPs and chooses the combination that best predicts the risk of disease by maximizing the classification accuracy of cases and controls. Among the advantages of the MDR strategy are the ability to detect and characterize high-order gene–gene interactions in case–control studies with moderate sample sizes by reducing the genotype predictors from n dimensions to 1 dimension, and the ability to analyze correlated predictors, thus overcoming the problem of multicollinearity (28). The MDR method has successfully been used in a variety of complex multifactorial human diseases such as rheumatoid arthritis (29), lupus nephritis (30), prostate and sporadic breast cancer (31, 32), and others (26). It has also been shown that the MDR analysis may be integrated with other statistical approaches to better depict gene–gene interactions, and this strategy has been advocated as an optimal approach to elucidate complex epistatic interactions in human diseases (33, 34). Recently, Heidema et al (35) demonstrated that in genetic-association studies, the application of different multilocus methods may reduce the chance of falsely identifying SNPs as important and may be helpful for the selection of a set of important SNPs for further biologic studies.
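The dimensionality-reduction step described above can be illustrated with a minimal sketch. It is not the MDR software itself: the function names and the toy data are hypothetical, cross-validation and the search over SNP combinations are omitted, and the only rule taken from the text is that a multilocus genotype cell is labeled high risk when its case:control ratio exceeds the overall case:control ratio.

```python
from collections import Counter

def mdr_label_cells(genotypes, status):
    """Label each multilocus genotype cell high risk (1) or low risk (0).

    genotypes: list of tuples, one tuple of genotype codes per subject
    status:    list of 0 (control) / 1 (case), same length as genotypes
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    threshold = n_cases / n_controls  # overall case:control ratio
    cases, controls = Counter(), Counter()
    for g, s in zip(genotypes, status):
        (cases if s == 1 else controls)[g] += 1
    labels = {}
    for cell in set(cases) | set(controls):
        # Cross-multiplied comparison avoids dividing by zero controls.
        labels[cell] = int(cases[cell] > threshold * controls[cell])
    return labels

def balanced_accuracy(genotypes, status, labels):
    """Mean of sensitivity and specificity of the 1-dimensional MDR variable."""
    tp = tn = fp = fn = 0
    for g, s in zip(genotypes, status):
        pred = labels.get(g, 0)  # unseen cells default to low risk (assumption)
        if s == 1 and pred == 1:
            tp += 1
        elif s == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# Hypothetical toy data: 2 SNPs coded 0/1, cases carry exactly one variant.
toy_genotypes = [(0, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0)]
toy_status = [1, 1, 1, 0, 0, 0]
toy_labels = mdr_label_cells(toy_genotypes, toy_status)
toy_accuracy = balanced_accuracy(toy_genotypes, toy_status, toy_labels)
```

In this toy example neither SNP alone separates cases from controls, yet the pooled high-risk/low-risk variable does, which is the situation MDR is designed to capture.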
The present study was conducted to determine the epistatic interactions of multiple cytokine SNPs on SSc susceptibility by using the MDR method, the results of which were also verified and complemented by other analytical strategies to minimize the chance of false-positive findings.
The clinical and demographic characteristics of the patients are reported in Table 1. Genotyping results for the IL-4 G-1098T, IL-4 C-590T, and IL-4 C-33T did not meet the quality requirement for interpretation (e.g., unequal or weak amplification results) and were therefore excluded from analysis; this problem had already been observed in other studies in which the Heidelberg kit was used for genotyping (45). Missing genotypes had low frequencies (IL-1α C-889T: 1.2%; IL-1β C-511T: 0.2%; IL-1β C+3962T: 1%; IL-1R Cpst1970T: 0.2%; IL-2 G-330T: 3.5%; IL-2 G+160T: 3.5%; IL-4Rα A+1902G: 0.4%; IL-6 C-174G: 1.4%; IL-6 Ant565G: 1.4%; IL-10 A-1082G: 7.2%; IL-10 C-819T: 7.2%; IL-10 A-590C: 7.2%; IL-12 A-1188C: 1.4%; TGFβ1 T/C codon 10: 5.2%; TGFβ1 G/C codon 25: 5.2%; IFNγ AUTR5644T: 3.5%; TNFα A-308G: 1.2%; TNFα A-238G: 1.2%) and were randomly distributed across cases and controls; the frequency-based approach we used was therefore deemed appropriate to fill in missing genotypes.
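As an illustration only, a frequency-based fill-in of missing genotypes could look like the sketch below. It assumes that "frequency-based" means replacing each missing value with the most common observed genotype at that SNP; the exact rule used in the study is described in the Methods and is not restated here, and the function name and example data are hypothetical.

```python
from collections import Counter

def impute_most_frequent(column, missing=None):
    """Replace missing genotypes in one SNP column with the most common genotype.

    Assumption: 'frequency-based' imputation is taken to mean mode imputation.
    """
    observed = [g for g in column if g is not missing]
    if not observed:
        return list(column)  # nothing to learn the frequencies from
    most_common = Counter(observed).most_common(1)[0][0]
    return [most_common if g is missing else g for g in column]

# Hypothetical genotype column with one missing call.
example_column = ["CT", "CC", None, "CC", "TT"]
imputed_column = impute_most_frequent(example_column)
```

Such a rule is reasonable only when missingness is rare and unrelated to case–control status, which is the condition the paragraph above reports.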
Table 1. Clinical and demographic characteristics of 242 patients with systemic sclerosis*
|Female sex||214 (88.1)|
| ANA||232 (95.9)|
| ACA||104 (43)|
| Scl70||99 (40.9)|
|Age at onset, mean ± SD years||47.7 ± 14.4|
|Restrictive lung disease||48 (19.8)|
|Impaired DLCO||179 (73.9)|
|Pulmonary hypertension||56 (23.1)|
|Esophageal involvement||174 (71.9)|
|Renal involvement||10 (4.1)|
Allele and genotype frequencies for the studied SNPs in cases and controls and in the 2 disease subsets are reported in Table 2. All SNPs, with the exception of the IL-12 A-1188C SNP, were consistent with the HWE in both patients and controls. Allele and genotype frequencies for the SNPs in HWE were equally distributed between cases and controls; similarly, no differences were observed between controls and lcSSc patients or between controls and dcSSc patients (Table 2).
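For readers unfamiliar with the HWE check, a minimal sketch of one common way to test it is given below: a chi-square goodness-of-fit comparison of observed genotype counts with those expected from the allele frequencies. The text does not state which test was applied, so this choice, the function name, and the example counts are assumptions.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 df) for deviation from Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the 3 genotypes at a biallelic SNP.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

# Hypothetical counts: the first set is in exact equilibrium, the second is not.
chi2_in_hwe = hwe_chi_square(25, 50, 25)
chi2_out_of_hwe = hwe_chi_square(40, 20, 40)
```

A statistic above 3.84 (the 5% critical value of the chi-square distribution with 1 degree of freedom) would indicate departure from HWE under this test.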
Table 2. Allele and genotype frequencies in patients and controls*
| || ||T||24.9||25.7||24.2||30.5||CT||36.4||35.6||33.9||40.7|
| || || || || || || ||TT||6.7||7.9||7.2||10.2|
| || ||T||34.7||33.4||33.7||32.5||CT||41.3||47.7||48.6||45.0|
| || || || || || || ||TT||14.0||9.5||9.4||10.0|
| || ||T||22.9||23.7||22.1||28.3||CT||34.0||34.9||33.1||40.0|
| || || || || || || ||TT||5.9||6.2||5.5||8.3|
| || ||T||35.7||32.6||31.5||35.8||CT||48.3||44.4||46.4||38.3|
| || || || || || || ||TT||11.6||10.4||8.3||16.7|
| || ||T||74.8||77.7||78.0||76.7||CT||33.1||37.2||35.2||43.3|
| || || || || || || ||TT||58.3||59.1||60.4||55.0|
| || ||G||15.9||16.9||16.9||16.7||AG||25.2||25.4||27.2||20.0|
| || || || || || || ||GG||3.3||4.2||3.3||6.7|
| || ||C||20||26||27||24||AC||35.1||39.5||39.9||38.3|
| || || || || || || ||CC||4.4||6.4||6.9||5|
| || ||T||44.2||46.4||45.3||50.0||AT||50.2||46.2||43.6||54.4|
| || || || || || || ||TT||19.0||23.3||23.5||22.8|
| || ||T||55.0||52.4||53.5||49.1||CT||49.6||54.5||53.8||56.9|
| || || || || || || ||TT||30.3||25.1||26.6||20.7|
| ||Codon 25||C||7.9||10.0||9.5||11.2||CC||0.4||0.9||1.2||0.0|
| || ||G||92.1||90.0||90.5||88.8||CG||14.9||18.2||16.8||22.4|
| || || || || || || ||GG||84.6||81.0||82.1||77.6|
| || ||G||89.2||90.0||90.3||89.2||AG||19.8||18.3||17.1||21.7|
| || || || || || || ||GG||79.3||80.9||81.8||78.3|
| || ||G||94.5||93.4||93.4||93.3||AG||9.2||11.6||12.2||10.0|
| || || || || || || ||GG||89.9||87.6||87.3||88.3|
| || ||T||62.8||64.8||65.2||63.4||GT||53.5||51.9||50.8||55.4|
| || || || || || || ||TT||36.1||38.8||39.8||35.7|
| || ||T||26.5||27.4||26.2||31.3||GT||41.7||37.1||34.8||44.6|
| || || || || || || ||TT||5.7||8.9||8.8||8.9|
| || ||T||90||89||89||89||GT||19.9||20.3||20||21.1|
| || || || || || || ||TT||80.1||79.1||79.1||78.9|
| || ||T||13||20||20||20||CT||74||37.3||38.3||34.2|
| || || || || || || ||TT||26||1.3||0.9||2.6|
| || ||T||16||20||20||21||CT||27.4||37.3||37.4||36.8|
| || || || || || || ||TT||2.1||2||1.7||2.6|
| || ||G||66.8||69.0||70.1||65.8||CG||46.0||47.1||46.7||48.3|
| || || || || || || ||GG||43.8||45.5||46.7||41.7|
| || ||G||74.0||72.9||74.5||68.3||AG||39.1||43.4||42.3||46.7|
| || || || || || || ||GG||54.5||51.2||53.3||45.0|
| || ||G||41.4||40.4||39.4||43.5||AG||42.7||51.1||51.4||50.0|
| || || || || || || ||GG||20.0||14.8||13.7||18.5|
| || ||T||26.8||21.6||21.7||21.3||CT||35.5||35.4||37.7||27.8|
| || || || || || || ||TT||9.1||3.9||2.9||7.4|
| || ||C||73.2||78.4||78.3||78.7||AC||35.5||35.4||37.7||27.8|
| || || || || || || ||CC||55.5||60.7||59.4||64.8|
The 4 best SNPs selected by the TuRF filter algorithm were IL-1β C-511T, IL-6 Ant565G, IL-2 G-330T, and IL-1Ra Cmspal11100T for controls versus SSc patients; IFNγ AUTR5644T, IL-6 C-174G, IL-6 Ant565G, and IL-2 G-330T for controls versus lcSSc patients; and IL-6 Ant565G, IL-10 C-819T, IL-2 G+160T, and IL-1R Cpst1970T for controls versus dcSSc patients.
The results of the exhaustive MDR analysis that evaluated all possible combinations of these 4 polymorphisms for each comparison are summarized in Table 3. The best model of each order is shown along with its TA, CVC, and significance level as determined by permutation testing. As can be observed, none of the models were significant for the controls versus SSc comparison, whereas the MDR selected the 3-way combination as the best model for both the controls versus lcSSc patients comparison (P = 0.004) and the controls versus dcSSc patients comparison (P = 0.050).
Table 3. Multifactor dimensionality reduction (MDR) analysis*
|Comparison†||Best combination in each dimension||TA||CVC||P|
|Controls vs. SSc||IL-1β C-511T||0.49||7/10||0.883|
| ||IL-1β C-511T, IL-6 Ant565G||0.51||4/10||0.706|
| ||IL-1β C-511T, IL-6 Ant565G, IL-2 G-330T||0.5||8/10||0.796|
| ||IL-1β C-511T, IL-6 Ant565G, IL-1Ra Cmspal11100T, IL-2 G-330T||0.5||10/10||0.796|
|Controls vs. lcSSc||IFNγ AUTR5644T||0.46||7/10||0.981|
| ||IFNγ AUTR5644T, IL-6 Ant565G||0.52||6/10||0.583|
| ||IL-2 G-330T, IFNγ AUTR5644T, IL-6 Ant565G‡||0.60||10/10||0.004|
| ||IL-6 C-174G, IL-2 G-330T, IFNγ AUTR5644T, IL-6 Ant565G||0.55||10/10||0.200|
|Controls vs. dcSSc||IL-6 Ant565G||0.47||4/10||0.967|
| ||IL-1R Cpst1970T, IL-6 Ant565G||0.56||10/10||0.082|
| ||IL-10 C-819T, IL-1R Cpst1970T, IL-6 Ant565G‡||0.57||10/10||0.050|
| ||IL-10 C-819T, IL-1R Cpst1970T, IL-2 G+160T, IL-6 Ant565G||0.53||10/10||0.455|
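The permutation-based P values in Table 3 follow a general recipe that can be sketched as follows. This is a simplified illustration, not the procedure implemented in the MDR software: `statistic` is a hypothetical callable standing in for the full model search that would be repeated on each shuffled data set, and the toy predictor and labels are invented.

```python
import random

def permutation_p_value(observed, statistic, labels, n_perm=1000, seed=0):
    """Empirical P value: share of label shuffles scoring at least `observed`.

    statistic: callable taking a list of case/control labels and returning a
               score (in a real analysis, the TA of the best refitted model).
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        if statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids P = 0

# Hypothetical toy setting: a fixed predictor that matches the true labels.
true_labels = [1] * 10 + [0] * 10
predictions = list(true_labels)

def toy_accuracy(labels):
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)

p_value = permutation_p_value(toy_accuracy(true_labels), toy_accuracy, true_labels)
```

Because the null distribution is built from the data themselves, no parametric assumption about the distribution of TA is required.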
To test whether the preprocessing by the TuRF filter was indeed the optimal strategy for the multilocus analysis, we also ran the MDR on the whole data set with 18 SNPs and compared the results. The 3-locus models reported in Table 3 had a worse training accuracy (TrA) and a much better TA than the 3-locus models obtained running the MDR on the entire data set (TrA = 0.63, TA = 0.60 versus TrA = 0.65, TA = 0.56 for the controls versus lcSSc patients comparison; TrA = 0.68, TA = 0.57 versus TrA = 0.71, TA = 0.46 for the controls versus dcSSc patients comparison). These results indicate that the TuRF analysis actually reduced the amount of overfitting, improving the signal and helping the MDR algorithm to generate models that are more likely to generalize to independent data sets.
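The overfitting argument can be made explicit by computing the gap between training and testing accuracy for each model, using only the values reported in the paragraph above. The dictionary layout and variable names are illustrative.

```python
# (comparison, data set) -> (training accuracy TrA, testing accuracy TA),
# values as reported in the text for the 3-locus models.
models = {
    ("lcSSc", "TuRF-filtered"): (0.63, 0.60),
    ("lcSSc", "all 18 SNPs"): (0.65, 0.56),
    ("dcSSc", "TuRF-filtered"): (0.68, 0.57),
    ("dcSSc", "all 18 SNPs"): (0.71, 0.46),
}

# A larger TrA - TA gap means the model fits the training folds better than
# it predicts held-out subjects, i.e., more overfitting.
gaps = {key: round(tra - ta, 2) for key, (tra, ta) in models.items()}
```

For both comparisons the gap is smaller after filtering (0.03 versus 0.09 for lcSSc, 0.11 versus 0.25 for dcSSc).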
The distribution of cases and controls for the controls versus lcSSc patients comparison and the controls versus dcSSc patients comparison is summarized in Figure 1. Note that the pattern of high-risk and low-risk genotype combinations is nonlinear across each multilocus dimension; this is evidence of gene–gene interaction or epistasis.
Figure 1. Interaction of attributes. Summary of 3-locus genotype combinations associated with A, limited cutaneous systemic sclerosis and B, diffuse cutaneous systemic sclerosis. Each multilocus genotype combination is considered high risk when the ratio of cases to controls exceeds a threshold T, equal to the ratio of cases to controls in each population; otherwise, the cell is classified as low risk. High-risk combinations are depicted as darkly shaded cells, low-risk combinations as lightly shaded cells; empty cells are left blank. For each cell, the left bar indicates the cases, the right bar the controls. The pattern of high-risk and low-risk cells differs across each of the multilocus dimensions; this is evidence of gene–gene interaction or epistasis. IL = interleukin; IFNγ = interferon-γ; IL-1R = interleukin-1 receptor.
When we tried to replicate MDR results by the logistic-based approach, the FITF failed to detect any statistically significant effect (P > 0.05) for any of the comparisons. However, when the FITF was run considering the multilocus variables constructed by the MDR inductive algorithm, we found that these variables were highly significant, both for the controls versus lcSSc patients comparison (P < 0.001) and for the controls versus dcSSc patients comparison (P < 0.001). These results suggest that MDR identified a nonadditive interaction that was not identified by FITF because FITF depends on main effects.
The nonlinear relationship among the attributes both in the controls versus lcSSc patients comparison and in the controls versus dcSSc patients comparison is clearly illustrated in Figure 2, which shows an interaction graph highlighting the amount of information gained about case–control status by putting 2 polymorphisms together using the MDR function. A red or orange line connecting 2 polymorphisms suggests a positive information gain that can be interpreted as a synergistic or nonadditive relationship; a yellow line indicates independence or additivity. The interaction information analysis indicates that for both comparisons, interactions are mostly synergistic and that in the controls versus lcSSc patients comparison, the IFNγ AUTR5644T is the key SNP among the 3 SNPs selected in the best MDR model. On the contrary, no clear-cut distinction can be made for the model involving dcSSc patients. The significant interaction effects in the absence of a main effect, typical of nonlinear interactions (XOR model), confirm the epistatic nature of the interrelationship among cytokine SNPs in the present context.
Figure 2. Interaction graphs. The interaction model describes the percentage of the entropy (information gain) that is explained by each factor or 2-way interaction. The percentage in each node expresses the amount of the label's uncertainty eliminated by that node's attribute, and each connection expresses the relative mutual information between the 2 attributes it joins; a red or orange line suggests a positive information gain, which can be interpreted as a synergistic or nonadditive relationship; a yellow line indicates independence or additivity. Only the most important attributes selected after filtering (42) are reported for A, the limited cutaneous subset or B, the diffuse cutaneous subset. IFN = interferon; IL-2 = interleukin-2; IL-1R = IL-1 receptor.
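The notion of information gain from combining 2 attributes can be sketched with a small entropy calculation. This is a minimal illustration under the assumption that the gain is defined as the information the pair carries about case–control status minus the information carried by each attribute alone; the function names and the XOR toy data are hypothetical and are not study data.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (bits) of a list of hashable values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def interaction_gain(a, b, status):
    """Information about status gained by joining a and b beyond main effects."""
    joint = list(zip(a, b))
    return (mutual_information(joint, status)
            - mutual_information(a, status)
            - mutual_information(b, status))

# Hypothetical XOR pattern: status is 1 only when exactly one SNP is variant.
snp_a = [0, 0, 1, 1]
snp_b = [0, 1, 0, 1]
status = [0, 1, 1, 0]
main_effect_a = mutual_information(snp_a, status)
gain = interaction_gain(snp_a, snp_b, status)
```

In the XOR pattern each SNP alone carries no information about status, whereas the pair carries 1 full bit, so the gain is positive; this is the purely synergistic case referred to in the text.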
The current study was undertaken to determine the contribution of a high number of cytokine SNPs to either the susceptibility to SSc or the expression of the disease (e.g., disease subset). For this purpose we used MDR, a novel computational algorithm, to evaluate gene–gene interactions, as it has become clear that to study complex diseases with a polygenic background, traditional logistic regression analysis approaches are not adequate and may underestimate the genetic contribution to disease in the presence of interactions between loci (28). Indeed, traditional methods suffer from a general lack of power and may provide estimates with large standard errors, thus increasing Type I errors, when dealing with multiple variables and/or small sample sizes. On the contrary, MDR has been developed to overcome these concerns and has proved capable of identifying evidence for high-order gene–gene interactions in the absence of any statistically significant independent main effect (29–32).
Our results indicate that cytokine SNPs do not contribute to susceptibility to SSc per se, but rather they may be important in determining which subset of the disease the patient is likely to develop. Two different 3-factor models were found to be relevant in the susceptibility to each disease subset, both of which display the characteristics of epistasis: none of the cytokines showed independent main effects (Figure 2) and their interactions appeared to be nonlinear (Figure 1). The presence of epistatic interactions among cytokine SNPs may explain the apparently contradictory results we observed after the MDR and the FITF analysis. Indeed, although the FITF method in the presence of additive, dominant, and recessive models is more powerful than other analytical approaches, either parametric or nonparametric, it may not detect interactions when they involve genes with little or no marginal effects (44). In contrast, when this approach was used as part of a multistrategy constructive induction algorithm (26) that involved the removal of noisy attributes by the principles of information theory (e.g., TuRF) (42), the selection of interesting attributes by a nonparametric approach targeted at epistasis (e.g., MDR), and the construction of new multilocus attributes, it confirmed the validity of the 3-factor model sorted out by the MDR algorithm.
As outlined by Moore and Williams (46), it is difficult to make inferences about the biologic significance of a statistical model of epistasis; nonetheless, each of our 3-factor models has its own biologic plausibility. MDR analysis selected the IL-2 G-330T, IL-6 Ant565G, and IFNγ AUTR5644T SNPs as the best predictors of lcSSc risk (Table 3). IL-6 was found to be produced at increased levels by cultured peripheral blood mononuclear cells from SSc patients (47), and increased IL-6 levels were observed in sera from SSc patients, also correlating with the degree of skin fibrosis (9). In contrast, IFNγ can negatively regulate extracellular matrix synthesis at the transcriptional level in SSc fibroblasts (48), higher levels of IFNγ were found in lcSSc patients compared with dcSSc patients (49), and the administration of recombinant IFNγ proved effective in ameliorating skin fibrosis (50). Thus, IFNγ may modulate the fibrotic responses mediated by IL-6, and indeed lcSSc patients usually have an indolent fibrotic disease (51). Finally, the contribution of IL-2 polymorphisms to the genesis of lcSSc has already been described, as these polymorphisms were shown to be differently distributed between lcSSc patients and controls in the Italian population (16). As far as dcSSc is concerned, the MDR analysis detected a significant 3-way interaction among the IL-1R Cpst1970T, the IL-6 Ant565G, and the IL-10 C-819T SNPs. All of these cytokines have prominent profibrotic features (52, 53) and a role for IL-1R, IL-10, or IL-6 has already been demonstrated in the development of fibrotic responses in patients with SSc (42, 50–53). Thus the synergistic interaction among these cytokines, as outlined in Figure 2, may promote collagen synthesis or deposition and account for the prominent fibrotic features observed in the dcSSc subset of the disease (51).
Altogether, our results indicate that cytokine SNPs with a profibrotic or a regulatory function on profibrotic interleukins may be important in the susceptibility to a given SSc subset, that is, in determining the degree of fibrosis the patient is likely to develop (e.g., dcSSc or lcSSc). These findings are not totally unexpected because fibrosis is the ultimate hallmark of SSc (2). Still, although our models have their own biologic plausibility, a functional study would be the ultimate demonstration of their clinical relevance. However, connecting the high-risk and low-risk genotype combinations with what is known about fibrosis pathways is extremely difficult because much of what is known about the function of these pathways is based on experiments that involve 1 gene at a time. A fully elucidative study of multiple gene–gene interactions could be accomplished by either collecting and analyzing the whole genetic, genomic, and proteomic data from complex biologic systems (52) or studying biologic pathways in simple organisms (46). Nevertheless, these strategies are extremely complex and may present overwhelming technical problems; an alternative strategy would therefore be to generate, via mathematical models (e.g., Petri nets), simpler hypotheses about biochemical systems that can eventually be tested in vitro or in vivo (53). So far these strategies have proved effective in relatively simple settings involving 2-locus epistatic models (54), but they could theoretically be extended to more complex biologic hierarchies that account for additional layers of complexity, thus helping to understand the genotype–phenotype relationship in genetic studies of rare diseases, such as SSc.
A major shortcoming of genetic-association studies is the possibility of false-positive results even in the presence of statistically significant findings, which Wacholder et al (55) quantify as the false-positive report probability (FPRP). Although no formal calculation of the FPRP was carried out and no strategy has been outlined to perform such a calculation in the context of epistasis, some considerations lead us to believe that the probability of false-positive findings in our study is relatively low. First, only a limited number of cytokine SNPs has been described so far (56), and this would considerably increase the prior probability that the association is real. Second, for all the SNPs we analyzed, there have been reports that they have a functional relevance and that they are associated with high or low production of the corresponding cytokine (14, 56), thus further increasing the prior probability. Finally, using the approach described by Moore et al (26), we confirmed the relevance of the SNPs selected by the MDR method by showing that the multilocus attributes constructed by the MDR inductive algorithm are highly significant when analyzed by the logistic regression–based approach FITF (44), thus reducing the chance of falsely identifying these SNPs as important (35).
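To show why a higher prior probability lowers the chance of a false-positive report, the FPRP of Wacholder et al can be written as α(1 − π) / [α(1 − π) + (1 − β)π], where α is the significance level, 1 − β the power, and π the prior probability of a true association. The sketch below evaluates this expression; the numerical inputs are purely illustrative and are not estimates for the present study, for which no FPRP was calculated.

```python
def fprp(alpha, power, prior):
    """False-positive report probability.

    alpha: significance level (or observed P value)
    power: statistical power, 1 - beta
    prior: prior probability that the association is real
    """
    false_pos = alpha * (1 - prior)
    true_pos = power * prior
    return false_pos / (false_pos + true_pos)

# Illustrative inputs only: same alpha and power, two different priors.
fprp_low_prior = fprp(alpha=0.05, power=0.8, prior=0.1)
fprp_high_prior = fprp(alpha=0.05, power=0.8, prior=0.5)
```

With these hypothetical inputs the FPRP falls from 0.36 to about 0.06 as the prior rises from 0.1 to 0.5, which is the direction of the argument made in the paragraph above.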
In summary, we provide evidence for gene–gene interaction among cytokine SNPs in the context of SSc. By applying the MDR algorithm, it was possible to model these interactions in the absence of main independent effects or detectable significance by parametric statistical approaches. This methodology allowed us to identify the most interesting cytokine SNPs from a large number of analyzed polymorphisms (n = 18). We think this information will play an important role in helping future researchers target a smaller number of biologic pathways in an effort to develop a better understanding of the mechanisms that underlie SSc susceptibility and/or disease expression.
Dr. Beretta had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study design. Beretta, Scorza.
Acquisition of data. Beretta, Cappiello.
Analysis and interpretation of data. Beretta, Cappiello, Moore, Barili.
Manuscript preparation. Beretta, Cappiello, Moore, Greene.
Statistical analysis. Beretta, Moore, Greene.
Collection of funds. Scorza.