Summary
Coronary heart disease (CHD) is a complex disease that is influenced not only by genetic and environmental factors but also by gene–environment (GE) interactions in interconnected biological pathways or networks. Classical methods are inadequate for identifying GE interactions because of the complex relationships among risk factors, mediating risk factors (e.g., hypertension, blood lipids, and glucose), and CHD. Our aim was to develop a two-level structural equation model (SEM) to identify genes and GE interactions in the progression of CHD, taking into account the causal structure among mediating risk factors and CHD (Level 1) and the hierarchical family structure (Level 2). The method was applied to the Framingham Heart Study (FHS) Offspring Cohort data. Our approach has several advantages over classical methods: (1) it provides important insight into how genes and contributing factors affect CHD by investigating the direct, indirect, and total effects; and (2) it aids the development of biological models that more realistically reflect the complex biological pathways or networks. Using our method, we detected a GE interaction between SERPINE1 and body mass index (BMI) on CHD, which has not been reported previously. We conclude that SEM modeling of GE interaction can be applied in the analysis of complex epidemiological data sets.
Introduction
Coronary heart disease (CHD) is the most common form of heart disease in most developed countries. Despite encouraging declines over the past decades, it remains the leading cause of death in the United States. Like other complex diseases, CHD is influenced by multiple risk factors that interconnect with one another in complicated ways (Talmud, 2004; 2006). For example, cigarette smoking may affect CHD directly by damaging the vascular endothelium and indirectly by perturbing lipoprotein metabolism and increasing insulin resistance and lipid intolerance. Interactions between genetic and environmental risk factors can sometimes be more important than the values of the individual factors in determining the probability of developing disease (Yang & Khoury, 1997). Understanding this complex interplay of genes and environment may lead to new methods of disease detection and prevention. Studies have found GE interactions that contribute to the risk of CHD, but results have not been consistent across studies. For example, previous studies reported that the risk of CHD is influenced by an interaction between smoking and APOE polymorphism (Humphries et al., 2001; Djoussé et al., 2002; Stengard et al., 1995; Talmud et al., 2005), but two large, well-designed studies have provided conflicting results (Keavney et al., 2003; Liu et al., 2003). Such inconsistencies across studies may be related to population differences, differences in the approaches used to assess GE interaction, confounding, low power, or sampling variation (see, e.g., Talmud, 2006).
The FHS, a population-based prospective study, began in 1948. The aim of the FHS is to identify the common factors or characteristics that contribute to cardiovascular disease (CVD) by following its development over a long period of time in a large group of participants who had not yet developed overt symptoms of CVD or suffered a heart attack or stroke. In 1971, a second-generation group, the Offspring Cohort, was enrolled and has been followed every 4 years since enrollment. In 2002, a third-generation cohort was also added. Over the years, the FHS and other epidemiological studies have produced many basic findings that remain important for understanding the causes of CVD, including genetic factors. The major risk factors, which are direct causes of CVD, include cigarette smoking, hypertension, cholesterol disorders, elevated glucose, and advancing age. The underlying risk factors (overweight/obesity, physical inactivity, diet, family history of premature CVD, and various genetic factors) affect CVD risk by acting through the major risk factors, and they also appear to influence risk in ways unrelated to the major risk factors. For example, familial influences on risk of CVD are mediated in part through blood pressure and blood lipoprotein levels (Grundy et al., 1998; Smith et al., 2004). Patients with diabetes are at high risk of hypertension and cholesterol disorders such as atherogenic dyslipidemia and high levels of low-density lipoprotein (LDL) cholesterol (Grundy et al., 2002). A clustering of risk factors including obesity, atherogenic dyslipidemia, and elevated blood pressure and glucose (metabolic syndrome) is under the pleiotropic control of several loci (Stein et al., 2003). Blood glucose (BG) is strongly related to blood pressure, positively related to triglyceride, and negatively related to high-density lipoprotein (HDL) (Cambien et al., 1987). We now have well-established risk factors for CVD that are interconnected in a complex biological system.
Despite the complexity of development of CHD, GE interactions are frequently evaluated using (1) contingency table analysis, considering one or two exposures or genes at a time, singly or in pairwise combinations, thereby neglecting the potential confounding effects of other factors or more complex interactions (Botto & Khoury, 2001); and (2) multiple regression analysis (univariate approaches), including all variables in a single model (Djoussé et al., 2002; Humphries et al., 2001; Talmud, 2004), ignoring indirect effects through other mediating risk factors and pleiotropic effects, and as a result, neglecting an important source of information that could provide additional insight into GE interactions. Several joint analyses on multiple traits have been developed to take into account the correlation or causal relationship among multiple traits in genetic association studies. These methods have been shown to improve the power of association tests and precision of parameter estimates (Jiang & Zeng, 1995; Zhu & Zhang, 2009). Therefore, a joint analysis on mediating risk factors and CHD is needed to reflect the underlying roles of risk factors for the disease.
Structural equation modeling (SEM) is a generalization of simultaneous equation procedures originating from path analysis (Wright, 1921) and initially popularized in econometrics and genetics. Recently, it has been used to examine GE interactions in twin data (Rijsdijk & Sham, 2002; McCaffery et al., 2009) and applied to functionally related traits in genetic research with the goal of characterizing genetic architecture precisely and intuitively (Nadeau et al., 2003; Gianola & Sorensen, 2004; Li et al., 2006; Neto et al., 2008; Zhu & Zhang, 2009). Nock et al. (2009) studied the genetic determinants of metabolic syndrome in the Framingham Heart Study (FHS) with conventional SEM, where parameters were estimated by minimizing the difference between the observed covariance matrix and the model-predicted covariance matrix, assuming multinormally distributed continuous variables. Their method, however, did not focus on GE interaction or categorical outcomes, and it differs from our approach. SEM is a useful method for estimating and evaluating simultaneous causal relationships among variables, because it allows a variable to be both a dependent variable and a predictor. In particular, SEM allows researchers to decompose the effect of one variable on another into direct, indirect, and total effects. A direct effect is the influence of one variable on another that is not causally explained by any intermediary variable. An indirect effect is a relationship that is explained by at least one intervening variable, and the total effect is the sum of the direct and indirect effects. By explicitly accounting for the underlying roles of the risk factors for CHD, SEM can provide more insight and a richer understanding of how the risk factors influence CHD.
The FHS data consisted of family structure information, genotypic information, and long-term phenotypic information. Full sibs share genes and often live under common environmental conditions for a period of their lives, which makes them a good source for a study of GE interaction. The objectives of this study were (1) to use the FHS Offspring Cohort data to identify GE interactions in the progression of CHD using a generalized two-level SEM that accounts for the causal structure among mediating risk factors and CHD, and (2) to compare the proposed method with a classical univariate approach (logistic regression) in terms of identification of GE interactions.
Threshold Model
A dichotomous observed response y_{ijk} (k = 1, 2, …, p), such as CHD or the presence/absence of hypertension, for the jth individual in the ith family is related to an unobserved latent continuous response y*_{ijk} via a threshold model (Muthen, 1984):
$$ y_{ijk} = \begin{cases} 1 & \text{if } y^{*}_{ijk} > \tau_{k} \\ 0 & \text{otherwise} \end{cases} \tag{1} $$
where τ_{k} is a threshold parameter and pr(y_{ijk} = 1 | x_{ij}) = pr(y*_{ijk} > τ_{k} | x_{ij}).
For a continuous observed response, y*_{ijk} is directly observed, that is, y_{ijk} = y*_{ijk}.
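The threshold link can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the latent response given the predictors is normal with mean `mu` and standard deviation `sd`; the function names are hypothetical and not part of the original analysis.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dichotomize(latent: float, tau: float) -> int:
    """Observed response under the threshold model: 1 if y* > tau, else 0."""
    return 1 if latent > tau else 0


def prob_positive(mu: float, tau: float, sd: float = 1.0) -> float:
    """pr(y = 1 | x) = pr(y* > tau | x), assuming y* | x ~ N(mu, sd^2)."""
    return 1.0 - normal_cdf((tau - mu) / sd)
```

When the latent mean equals the threshold, `prob_positive` returns 0.5, and raising the threshold lowers the probability of observing a positive response.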
Structural Model
Within-family model (Level 1)
The linear structural model can be specified as
$$ y^{*}_{ij} = v_{i} + B y^{*}_{ij} + \Gamma x_{ij} + \zeta_{ij} \tag{2} $$
where y*_{ij} is the (p× 1) vector of latent responses for the jth individual in the ith family; the intercept v_{i} is a (p× 1) vector of means of underlying responses over all individuals in the ith family; B is a (p×p) matrix of structural parameters that describes the causal relationships among the p latent responses, where (I−B)^{−1} exists; Γ is a (p×q) coefficient matrix that describes the causal relationships between the latent responses and the predictor variables; x_{ij} is a (q× 1) vector of observed predictor variables (categorical or continuous), which includes the genotypic covariates, the environmental covariates, and the GE interactions created as the cross-products of genotypic and environmental covariates; and ζ_{ij} is a (p× 1) vector of residuals, which is assumed to be multivariate normally distributed with mean vector zero and covariance matrix Ψ. Note that this model is conditional on the observed predictor variables x_{ij}. Unlike conventional SEMs, where all observed variables are treated as responses, there are no distributional assumptions regarding x_{ij} (Muthen, 1984; Skrondal & Rabe-Hesketh, 2004). To account for the correlation of individuals within a family, the intercept was allowed to vary across families and was defined at the second level.
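As a small illustration of how a predictor vector with GE cross-products might be assembled, the following Python sketch builds the interaction terms from named covariates. The numeric genotype coding and all names here are hypothetical assumptions; the source does not state how the genotypes were coded.

```python
from typing import Dict, List, Tuple


def build_predictors(
    genotype: Dict[str, float],
    environment: Dict[str, float],
    interactions: List[Tuple[str, str]],
) -> Dict[str, float]:
    """Return the predictors x_ij: main effects plus G x E cross-products."""
    x: Dict[str, float] = {}
    x.update(genotype)
    x.update(environment)
    for gene, exposure in interactions:
        # Each GE interaction is the product of a genotypic and an
        # environmental covariate.
        x[f"{gene}x{exposure}"] = genotype[gene] * environment[exposure]
    return x


# Hypothetical subject: genotype indicator coded 1.0, BMI of 28 kg/m^2.
subject = build_predictors(
    {"SERPINE1": 1.0}, {"BMI": 28.0}, [("SERPINE1", "BMI")]
)
```

With such a term in the model, the slope of the response on BMI is allowed to differ between genotype groups, which is what the interaction coefficient measures.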
Family random-intercept model (Level 2)
Define the family-level model as
$$ v_{i} = \gamma + \xi_{i} \tag{3} $$
where v_{i} is a p× 1 vector of means of underlying responses in ith family, which represent heterogeneity between families in the overall response; γ is a (p× 1) vector of overall means of underlying responses; ξ_{i} is a (p× 1) vector of residuals and is assumed to be multivariate normally distributed with mean vector zero and covariance matrix Θ. The random effects at different levels of the model are assumed to be independent.
The model for the Level-1 units (jth individual) and Level-2 units (ith family) can be written by substituting equation (3) into (2) and solving for the reduced form, which gives a generalized two-level SEM
$$ y^{*}_{ij} = (I - B)^{-1}(\gamma + \Gamma x_{ij} + \xi_{i} + \zeta_{ij}) \tag{4} $$
The between-family and within-family covariance matrices are derived as Σ_{B}= (I−B)^{−1}Θ[(I−B)^{−1}]′ and Σ_{W}= (I−B)^{−1}Ψ[(I−B)^{−1}]′. The expected value and covariance matrix of y*_{ij} are derived as
$$ E(y^{*}_{ij} \mid x_{ij}) = \mu_{ij} = (I - B)^{-1}(\gamma + \Gamma x_{ij}) \tag{5} $$
$$ \mathrm{Cov}(y^{*}_{ij} \mid x_{ij}) = \Sigma = \Sigma_{B} + \Sigma_{W} \tag{6} $$
where μ_{ij} is the model-implied mean vector of the endogenous variables and Σ is the model-implied variance–covariance matrix among the endogenous variables. Both μ_{ij} and Σ contain parameters to be estimated.
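The implied moments can be computed numerically, as in the following pure-Python sketch on a toy model with two latent responses and one predictor. All parameter values are invented for illustration. The sketch further assumes a recursive (acyclic) model, so that B is nilpotent and (I−B)^{−1} equals the finite sum I + B + … + B^{p−1}.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def inv_i_minus_b(b: Matrix) -> Matrix:
    """(I - B)^{-1} for an acyclic model: I + B + ... + B^(p-1)."""
    n = len(b)
    result, power = identity(n), identity(n)
    for _ in range(n - 1):
        power = matmul(power, b)
        result = add(result, power)
    return result


def implied_moments(
    b: Matrix, coef: Matrix, intercepts: List[float], x: List[float],
    theta: Matrix, psi: Matrix,
) -> Tuple[List[float], Matrix, Matrix, Matrix]:
    """Return mu_ij, Sigma_B, Sigma_W and Sigma = Sigma_B + Sigma_W."""
    a = inv_i_minus_b(b)
    # Column vector gamma + Gamma x_ij.
    linear = [[g + sum(c * v for c, v in zip(row, x))]
              for g, row in zip(intercepts, coef)]
    mu = [row[0] for row in matmul(a, linear)]
    sigma_b = matmul(matmul(a, theta), transpose(a))
    sigma_w = matmul(matmul(a, psi), transpose(a))
    return mu, sigma_b, sigma_w, add(sigma_b, sigma_w)


# Toy model: response 1 affects response 2 with coefficient 0.5.
mu, sigma_b, sigma_w, sigma = implied_moments(
    b=[[0.0, 0.0], [0.5, 0.0]],
    coef=[[1.0], [0.0]],
    intercepts=[0.0, 0.0],
    x=[2.0],
    theta=[[0.2, 0.0], [0.0, 0.1]],
    psi=[[1.0, 0.0], [0.0, 1.0]],
)
```

In this toy model the predictor shifts the second response only through the first, so its implied mean is half that of the first response.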
Estimation
The model described in (5) and (6) differs from the conventional SEM, in which parameters are estimated by minimizing the difference between the observed covariance matrix and the model-predicted covariance matrix, assuming multinormally distributed continuous variables. Instead, the likelihood of the observed data must be obtained by integrating out the latent responses y*_{ij}. Let θ be the vector of all parameters, including the regression coefficients, the threshold parameters τ_{k} for dichotomous variables, and the nonduplicated elements of the covariance matrix. Let y_{i} be the observed response vector and x_{i} the vector of predictor variables for all subjects in the ith family. Let y and X be the response vector and the matrix of predictor variables for all subjects. Given equations (5) and (6), the latent responses y*_{ij} follow a multivariate normal distribution with mean μ_{ij} and covariance matrix Σ. We denote the multivariate normal density of the latent responses at the family level as h_{i}(y*_{i}; θ). The marginal likelihood is constructed recursively. The conditional density of the observed responses y_{i} of a family, given the latent responses y*_{i}, is (Rabe-Hesketh et al., 2004)
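$$ f_{i}(y_{i} \mid y^{*}_{i}; \theta) = \prod_{j} f(y_{ij} \mid y^{*}_{ij}; \theta) \tag{7} $$
where the product is over all subjects within the family. Since the N families are assumed to be sampled independently, the total marginal likelihood is the product of the contributions from all families,
$$ L(\theta; y, X) = \prod_{i=1}^{N} \int f_{i}(y_{i} \mid y^{*}_{i}; \theta)\, h_{i}(y^{*}_{i}; \theta)\, dy^{*}_{i} \tag{8} $$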
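To make the idea of integrating out a latent quantity concrete, the following Python sketch evaluates the marginal likelihood for a much simplified case: a single dichotomous outcome with a family random intercept. It is an illustrative assumption, not the estimation algorithm used in the study: the within-family residual variance is fixed at 1, the integral over the random intercept is approximated with a plain trapezoid rule, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(x: float, sd: float) -> float:
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def family_likelihood(
    y: Sequence[int], eta: Sequence[float], tau: float, theta: float,
    n_points: int = 2001, width: float = 8.0,
) -> float:
    """Marginal likelihood of one family's 0/1 responses.

    y     : observed 0/1 response of each family member
    eta   : fixed part of the latent mean for each member
    tau   : threshold parameter
    theta : variance of the family random intercept (must be > 0)
    """
    sd = math.sqrt(theta)
    lo = -width * sd
    step = 2.0 * width * sd / (n_points - 1)
    total = 0.0
    for m in range(n_points):
        xi = lo + m * step
        # Given the random intercept, members are independent.
        cond = 1.0
        for y_j, eta_j in zip(y, eta):
            p1 = normal_cdf(eta_j + xi - tau)
            cond *= p1 if y_j == 1 else 1.0 - p1
        weight = 0.5 if m in (0, n_points - 1) else 1.0
        total += weight * cond * normal_pdf(xi, sd)
    return total * step


def log_likelihood(
    families: List[Tuple[Sequence[int], Sequence[float]]],
    tau: float, theta: float,
) -> float:
    """Families are independent, so their log contributions add."""
    return sum(math.log(family_likelihood(y, eta, tau, theta))
               for y, eta in families)
```

For a family with a single member the integral has a closed form, Φ((η − τ)/√(1 + θ)), which gives a convenient check on the numerical integration.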
Skrondal and Rabe-Hesketh (2004) provide an extensive overview of estimation methods for SEMs with noncontinuous variables and related models. Muthen & Satorra (1996) developed a general three-stage procedure to obtain estimates, standard errors, and a χ^{2} measure of fit for a given structural model with a mixture of dichotomous, ordered categorical, and continuous measures of latent variables. This estimation approach is computationally efficient and is implemented in Mplus (Muthen & Muthen, 2004).
The likelihood ratio (LR) statistic
$$ \mathrm{LR} = -2\,[\log L(\hat{\theta}_{r}) - \log L(\hat{\theta}_{f})] \tag{9} $$
is used for hypothesis tests and model selection, where \hat{θ}_{r} is the maximum likelihood (ML) estimator under the reduced model and \hat{θ}_{f} is the ML estimator under the full model. The LR statistic has an asymptotic χ^{2} distribution when the reduced model is true, with degrees of freedom equal to the difference in the degrees of freedom of the two models (Bollen, 1989).
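The LR test for a single constrained coefficient can be sketched in a few lines of Python. The sketch covers only the one-degree-of-freedom case, for which the χ^{2} tail probability can be written with the complementary error function; the log-likelihood values in the usage line are invented for illustration.

```python
import math
from typing import Tuple


def lr_test_1df(loglik_reduced: float, loglik_full: float) -> Tuple[float, float]:
    """Return the LR statistic and its P-value on one degree of freedom.

    For a chi-square variable with 1 df, pr(X > x) = erfc(sqrt(x / 2)).
    """
    lr = max(-2.0 * (loglik_reduced - loglik_full), 0.0)
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# Hypothetical log-likelihoods of a reduced and a full model.
lr, p = lr_test_1df(loglik_reduced=-101.920729410347, loglik_full=-100.0)
```

In this invented example the statistic sits at the usual 5% critical value for one degree of freedom, so the P-value is approximately 0.05.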
Study Population
This study used a cross-sectional design based on the Offspring Cohort data of the FHS. Subjects who participated in the fifth examination cycle were selected. The parents of the Offspring were excluded. The analyses were based on the 966 participants with complete data on family structure and on the variables used. The data consisted of 444 full sibships from 279 families. Of these families, only nine contained half siblings, so we considered only the correlations among full siblings; the correlations among individuals in the same family but with one or two different parents were ignored in this analysis. Individuals were used as Level-1 units and full sibships were used as Level-2 units of the hierarchical family structure. The endpoint was CHD, which was defined as present if subjects had a diagnosis of angina pectoris, myocardial infarction, coronary insufficiency, or sudden or nonsudden death. Other phenotypes of interest were hypertension, the ratio of total cholesterol to high-density lipoprotein cholesterol (TC/HDLc), and fasting BG. Hypertension was defined as systolic BP ≥ 140 mmHg or diastolic BP ≥ 90 mmHg, or current use of medication to lower high BP. Two readings obtained by the physician and one reading obtained by the nurse were averaged to calculate systolic and diastolic BP values. Genes of interest included apolipoprotein E (APOE) and the single-nucleotide polymorphism (SNP) rs1799768 in the human plasminogen activator inhibitor-1 gene (SERPINE1, previously PAI-1), because previous studies have shown that these genes are associated with CHD. As in the majority of previous studies, we grouped subjects as ɛ2 carriers if they had genotype E2/E2 or E2/E3, as ɛ3 if they had genotype E3/E3, and as ɛ4 carriers if they had genotype E3/E4 or E4/E4. SERPINE1 was classified into three possible genotypes: 4G/4G, 4G/5G, and 5G/5G.
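The phenotype and genotype definitions above can be expressed as a short Python sketch. The function and variable names are hypothetical. The source does not say how the E2/E4 genotype was handled, so the sketch leaves it unclassified rather than guessing.

```python
from statistics import mean
from typing import Optional, Sequence

# APOE grouping as described in the text; E2/E4 is not mentioned there.
APOE_GROUP = {
    "E2/E2": "e2", "E2/E3": "e2",
    "E3/E3": "e3",
    "E3/E4": "e4", "E4/E4": "e4",
}


def is_hypertensive(
    systolic: Sequence[float],
    diastolic: Sequence[float],
    on_bp_medication: bool,
) -> bool:
    """Apply the definition to the average of the available BP readings."""
    return (
        mean(systolic) >= 140
        or mean(diastolic) >= 90
        or on_bp_medication
    )


def apoe_group(genotype: str) -> Optional[str]:
    """Return 'e2', 'e3' or 'e4', or None for genotypes not covered."""
    return APOE_GROUP.get(genotype)
```

For example, three systolic readings of 138, 142, and 141 mmHg average just above 140 mmHg, so the subject is classified as hypertensive even without medication.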
Following the theory of the development of CHD and scientific research (Grundy, 1999; Grundy et al., 2002; Smith et al., 2004) discussed in the previous section, we developed the conceptual model presented in Figure 1. The model illustrates the relationships among the mediating risk factors hypertension, BG, TC/HDLc, and endpoint CHD. The covariates of interest were age, gender, body mass index (BMI), number of cigarettes per day, and alcohol consumption (oz/week).
Statistical Analyses
Our proposed two-level SEM was implemented using Mplus Version 5.1 software (Muthen & Muthen, 2004) (Type = Twolevel; Estimator = ML; Algorithm = Integration) to account for the relationships among the mediating risk factors hypertension, BG, and TC/HDLc and the endpoint CHD, and for the hierarchical family structure with individuals at the first level and full sibships at the second level. The backward elimination procedure was performed in two steps. First, we started with the saturated model (full model), which was built by fitting the model with age, sex, BMI, cigarettes per day, alcohol, APOE, and SERPINE1 as explanatory variables to predict the four related phenotypes (Fig. 1). A reduced model was then fit in which the regression coefficient of the least important variable (the one with the largest P-value) was constrained to zero. The LR test statistic was obtained as LR =−2(loglikelihood_{reduced}− loglikelihood_{full}), which is approximately χ^{2} distributed with one degree of freedom. This procedure was repeated until all the remaining variables were significant (P < 0.05). Second, to determine which GE interaction terms should be included in the final model, the backward elimination procedure was performed starting with all possible interactions between the explanatory variables identified in the first step. The overall goodness-of-fit of a model can be evaluated by the χ^{2} test, the comparative fit index (CFI), the root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR). However, these fit indices are not available in Mplus for models with categorical responses. Instead, we used Akaike's Information Criterion (AIC), the Bayesian Information Criterion (BIC), and significance tests of path coefficients to assess overall goodness-of-fit. Smaller AIC and BIC values and significant path coefficients were taken to indicate an acceptable, well-fitting model.
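The backward elimination loop can be sketched generically in Python. This is an illustration under stated assumptions rather than the Mplus workflow itself: `fit_loglik` is a hypothetical user-supplied function that returns the maximized log-likelihood of a model containing the given terms, and the sketch uses the one-degree-of-freedom LR test both to rank terms and to decide whether to drop them.

```python
import math
from typing import Callable, Dict, List, Sequence


def backward_eliminate(
    terms: Sequence[str],
    fit_loglik: Callable[[List[str]], float],
    alpha: float = 0.05,
) -> List[str]:
    """Drop the least significant term until all remaining have P < alpha."""
    current = list(terms)
    while current:
        loglik_full = fit_loglik(current)
        p_values: Dict[str, float] = {}
        for term in current:
            reduced = [t for t in current if t != term]
            lr = max(-2.0 * (fit_loglik(reduced) - loglik_full), 0.0)
            # Chi-square tail probability on 1 df.
            p_values[term] = math.erfc(math.sqrt(lr / 2.0))
        worst = max(p_values, key=p_values.get)
        if p_values[worst] < alpha:
            break  # every remaining term is significant
        current.remove(worst)
    return current


# Toy stand-in for a model fit: each term adds a fixed amount of
# log-likelihood (values invented for illustration).
GAIN = {"age": 10.0, "sex": 5.0, "bmi": 0.1}


def toy_fit(model_terms: List[str]) -> float:
    return -100.0 + sum(GAIN[t] for t in model_terms)


kept = backward_eliminate(["age", "sex", "bmi"], toy_fit)
```

In the toy example only the term with a negligible contribution to the log-likelihood is removed.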
Results
Characteristics of the study population are shown in Table 1. A total of 496 men and 470 women were assessed, of whom 62 (12.5%) men and 23 (4.9%) women had CHD as previously defined. Their clinical characteristics according to SERPINE1 and APOE genotypes are summarized in Tables 2 and 3. The ɛ4 carriers had higher total cholesterol and a higher ratio of total cholesterol to HDL cholesterol; the ɛ2 carriers had lower total cholesterol and a lower ratio of total cholesterol to HDL cholesterol (Table 3).
Table 1. Characteristics of the study population.
Characteristic  Male (N= 496): with CHD (N= 62)  Male: without CHD (N= 434)  Male: P*  Female (N= 470): with CHD (N= 23)  Female: without CHD (N= 447)  Female: P*
Age  58.7 ± 9.3  51.1 ± 10.0  <.0001  58.4 ± 9.5  51.9 ± 9.9  0.002 
Body mass index (kg/m^{2})  29.1 ± 3.8  28.1 ± 4.0  0.082  28.2 ± 5.5  26.8 ± 5.84  0.2651 
Cigarettes per day  6.1 ± 12.5  4.0 ± 10.3  0.1358  6.8 ± 12.5  3.9 ± 9.3  0.1572 
Total alcohol per week (Oz)  3.0 ± 5.5  3.7 ± 4.54  0.254  1.1 ± 2.2  1.7 ± 2.5  0.3169 
Systolic blood pressure (mm Hg)  129.7 ± 18.6  126.5 ± 15.4  0.1386  132.8 ± 19.7  120.4 ± 18.0  0.0015 
Diastolic blood pressure (mmHg)  76.9 ± 8.7  77.4 ± 8.9  0.641  75.9 ± 9.84  71.9 ± 9.3  0.0448 
Hypertension (%)  43.6  26.3  0.0048  47.8  22.6  0.0056 
Antihypertensive treatment (%)  29.0  12.3  0.0004  34.8  11.4  <.0001 
Blood glucose (mg/dL)  112.1 ± 37.9  101.9 ± 26.3  0.0077  106.1 ± 34.1  97.8 ± 29.1  0.1863 
Total cholesterol (mg/dL)  202.1 ± 35.1  202.0 ± 35.2  0.9865  225.7 ± 36.9  202.8 ± 37.9  0.0048 
HDL cholesterol (mg/dL)  38.5 ± 9.7  43.5 ± 11.3  0.001  48.7 ± 14.5  56.2 ± 14.1  0.0134 
TC/HDLc  5.7 ± 2.4  4.9 ± 1.4  0.0004  4.9 ± 1.4  3.8 ± 1.3  <.0001 
Table 2. Characteristics of the study population by SERPINE1 genotypes.
Characteristic  4G/4G (N= 267)  4G/5G (N= 491)  5G/5G (N= 208)  P*
Age  51.4 ± 9.5  52.4 ± 10.4  52.3 ± 10.02  0.4372 
Body mass index (kg/m^{2})  27.8 ± 4.8  27.5 ± 5.1  27.5 ± 4.9  0.6821 
Cigarettes per day  4.6 ± 10.8  3.6 ± 9.4  4.9 ± 10.6  0.1861 
Total alcohol per week (Oz)  2.6 ± 3.9  2.7 ± 3.9  2.8 ± 3.8  0.8177 
Systolic blood pressure (mmHg)  123.3 ± 16.64  124.6 ± 17.8  123.8 ± 17.0  0.6299 
Diastolic blood pressure (mmHg)  74.9 ± 9.3  74.8 ± 9.4  74.5 ± 9.9  0.9060 
Hypertension (%)  25.8  27.9  22.6  0.3411 
Antihypertensive treatment (%)  13.1  14.3  12.0  0.7166 
CHD (%)  8.2  9.0  9.1  0.9280 
Blood glucose (mg/dL)  102.89 ± 32.9  100.3 ± 27.3  99.29 ± 26.9  0.3466 
Total cholesterol (mg/dL)  201.6 ± 36.2  202.4 ± 36.9  206.0 ± 36.4  0.3767 
HDL cholesterol (mg/dL)  47.7 ± 13.1  49.3 ±14.5  50.6 ± 15.3  0.0872 
TC/HDLc  4.61 ± 1.5  4.5 ± 1.6  4.4 ± 1.4  0.5938 
Table 3. Characteristics of the study population by APOE genotypes.
Characteristic  E2 (N= 123)  E3 (N= 638)  E4 (N= 205)  P*
Age  52.6 ± 9.9  52.2 ± 10.4  51.4 ± 9.2  0.5115 
Body mass index (kg/m^{2})  27.6 ± 4.6  27.7 ± 5.0  27.2 ± 5.0  0.4812 
Cigarettes per day  5.2 ± 10.2  4.3 ± 10.4  2.9 ± 8.7  0.0971 
Total alcohol per week (Oz)  2.6 ± 4.1  2.6 ± 3.86  3.0 ± 3.9  0.4369 
Systolic blood pressure (mmHg)  123.0 ± 16.2  125.0 ± 17.4  121.7 ± 17.4  0.0476 
Diastolic blood pressure (mmHg)  74.7 ± 9.3  74.9 ± 9.4  74.4 ± 10.1  0.7457 
Hypertension (%)  25.2  27.9  21.5  0.1831 
Antihypertensive treatment (%)  13.8  13.8  12.2  0.8369 
CHD (%)  7.3  8.9  9.3  0.8157 
Blood glucose (mg/dL)  101.3 ± 30.8  100.7 ± 28.0  100.7 ± 30.4  0.9782 
Total cholesterol (mg/dL)  186.2 ± 36.8  204.4 ± 35.7  208.6 ± 36.5  <.0001 
HDL cholesterol (mg/dL)  50.1 ± 14.0  49.5 ± 14.7  47.6 ± 13.3  0.212 
TC/HDLc  4.0 ± 1.4  4.5 ± 1.6  4.7 ± 1.4  0.0006 
Figure 2 shows the estimates of the path coefficients of the final two-level SEM, including gene–environment (GE) interactions, in the development of CHD. A path coefficient is a standardized regression coefficient (beta) showing the direct effect of an independent variable on a dependent variable in the path model. Increasing age, cigarette smoking, being male, hypertension, and high TC/HDLc are major risk factors that influence CHD risk directly. There were significant differences in TC/HDLc levels among the APOE genotype groups (P < 0.0001). Compared with ɛ3 carriers, the ɛ4 carriers had a higher and the ɛ2 carriers a lower TC/HDLc ratio. We observed a significant effect of the interaction between number of cigarettes per day and APOE genotype (β=−0.02, P= 0.0064), implying that APOE genotype influenced the response of lipid level to the number of cigarettes per day. Additionally, the ɛ2 carriers showed a greater response (slope) to cigarettes than the other two genotype groups. APOE and APOE× cigarettes/day influenced the risk of CHD through the mediating risk factor TC/HDLc. Although no significant effect of SERPINE1 was found, we did find a significant interaction between SERPINE1 and BMI for CHD through BG level (β=−0.502, P= 0.0244). The 4G/4G carriers had the greatest response of BG level to BMI. We also observed a significant effect of alcohol consumption on TC/HDLc, which ultimately reduced the CHD risk.
Table 4 presents the estimates of direct, indirect, and total effects on the development of CHD, together with CHD odds ratios for the major risk factors that influence CHD directly, using the two-level SEM. The path coefficients were used to calculate the indirect and total effects. For example, the indirect effect of SERPINE1× BMI on CHD was calculated by multiplying the path coefficients along each path from SERPINE1× BMI to CHD and summing the products: SERPINE1× BMI > BG > Hypertension > CHD gives −0.525 × 0.005 × 0.393 =−0.00103, and SERPINE1× BMI > BG > TC/HDLc > CHD gives −0.525 × 0.009 × 0.28 =−0.00137. Hence, the total indirect effect of SERPINE1× BMI on CHD is the sum over these paths ((−0.00103) + (−0.00137) =−0.0024). The total effect of SERPINE1× BMI on CHD is the sum of its direct and indirect effects (0 + (−0.0024) =−0.0024). Hypertension and high TC/HDLc are major risk factors that influence CHD directly. The estimated odds of developing CHD are 1.481 (95% confidence limits (CL): 1.021, 2.150) times as high for hypertensive subjects as for nonhypertensive subjects. The estimated odds of CHD are multiplied by 1.323 (95% CL: 1.131, 1.548) for each 1-unit increase in TC/HDLc. Being male, increasing age, and number of cigarettes per day are also major risk factors that influence CHD both directly and indirectly through mediating risk factors such as TC/HDLc, hypertension, and BG. Alcohol consumption, BMI, APOE, APOE× cigarettes/day, SERPINE1, and SERPINE1× BMI affect CHD only through mediating risk factors. These results show that our two-level SEM analysis provides additional information on how the risk factors affect CHD both directly and indirectly.
Table 4. Estimated risk effects on CHD and CHD odds ratios for direct effects using a generalized two-level SEM, the Framingham Offspring Study. NA marks cells that are empty in the original table.
Risk factor  Direct effect  Odds ratio for direct effect (95% CL)  Indirect effect  Total effect
Hypertension  0.3930  1.481 (1.021, 2.150)  NA  0.3930
TC/HDLc  0.2800  1.323 (1.131, 1.548)  NA  0.2800
Blood glucose (mg/dL)  NA  NA  0.0045  0.0045
Sex  −0.9110  0.402 (0.224, 0.721)  −0.3002  −1.2112
Age  0.0780  1.081 (1.048, 1.116)  0.0472  0.1252
Cigarettes per day  0.0280  1.028 (1.004, 1.053)  0.0185  0.0465
Total alcohol per week (Oz)  NA  NA  −0.0106  −0.0106
Body mass index (BMI) (kg/m^{2})  NA  NA  0.0963  0.0963
APOE  NA  NA  0.1221  0.1221
APOE× cigarettes/day  NA  NA  −0.0064  −0.0064
SERPINE1  NA  NA  0.0573  0.0573
SERPINE1× BMI  NA  NA  −0.0024  −0.0024
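The path-tracing rule used for the indirect effects (multiply the coefficients along each path, then sum over paths) can be reproduced with a short Python sketch. It includes only the path coefficients quoted in the text, which are rounded, and it assumes the path diagram is acyclic. The node labels are hypothetical shorthand.

```python
from typing import Dict, List, Tuple

# Rounded path coefficients quoted in the Results text.
EDGES: Dict[str, Dict[str, float]] = {
    "SERPINE1xBMI": {"BG": -0.525},
    "BG": {"Hypertension": 0.005, "TC/HDLc": 0.009},
    "Hypertension": {"CHD": 0.393},
    "TC/HDLc": {"CHD": 0.28},
}


def path_products(
    edges: Dict[str, Dict[str, float]], source: str, target: str
) -> List[Tuple[List[str], float]]:
    """Return (path, product of coefficients) for every directed path."""
    found: List[Tuple[List[str], float]] = []

    def walk(node: str, path: List[str], product: float) -> None:
        if node == target:
            found.append((path, product))
            return
        for nxt, coef in edges.get(node, {}).items():
            walk(nxt, path + [nxt], product * coef)

    walk(source, [source], 1.0)
    return found


def effects(
    edges: Dict[str, Dict[str, float]], source: str, target: str
) -> Tuple[float, float, float]:
    """Return (direct, indirect, total) effects of source on target."""
    direct = edges.get(source, {}).get(target, 0.0)
    total = sum(product for _, product in path_products(edges, source, target))
    return direct, total - direct, total


bg_direct, bg_indirect, bg_total = effects(EDGES, "BG", "CHD")
ge_direct, ge_indirect, ge_total = effects(EDGES, "SERPINE1xBMI", "CHD")
```

With these rounded inputs the indirect effect of BG is about 0.0045 and that of SERPINE1× BMI is about −0.0024, in line with Table 4. The second SERPINE1× BMI path evaluates to roughly −0.00132 rather than the −0.00137 quoted in the text, presumably because the published value was computed from unrounded estimates.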
As a comparison with a standard approach, Table 5 summarizes the results of a univariate logistic regression, which has only one dependent variable, CHD. We observed only three significant effects on CHD: TC/HDLc, age, and sex. No significant effects of genes or GE interactions were found, most likely because these effects were small. The estimated odds of CHD are multiplied by 1.286 (95% CL: 1.091, 1.515) for each 1-unit increase in TC/HDLc. The estimated odds of developing CHD for females are 0.374 (95% CL: 0.205, 0.686) times those for males. The estimated odds of CHD are multiplied by 1.082 (95% CL: 1.047, 1.119) for each 1-year increase in age. The major differences between our proposed two-level SEM and the univariate logistic approach were that the SEM identified more significant risk factors (APOE, APOE× cigarettes/day, and SERPINE1× BMI) and that it allowed us to fit a more complex and biologically realistic model, which in turn allowed estimation of direct and indirect risk effects on CHD.
Table 5. Estimated risk effects on CHD and CHD odds ratios using a univariate logistic regression, the Framingham Offspring Study.
Effect  Estimate  Standard error  P-value (t-test)  Odds ratio (95% CL)
Hypertension  0.3389  0.2953  0.2516  1.403 (0.786, 2.507) 
TC/HDLc  0.2512  0.0835  0.0028  1.286 (1.091, 1.515) 
Blood glucose (mg/dL)  0.0006  0.0040  0.8881  1.001 (0.993, 1.008) 
Sex  −0.9823  0.3079  0.0015  0.374 (0.205, 0.686) 
Age  0.0792  0.0167  <.0001  1.082 (1.047, 1.119) 
Cigarettes per day  0.0655  0.0485  0.1771  1.068 (0.971, 1.175) 
Total alcohol per week (Oz)  −0.0306  0.0354  0.3877  0.970 (0.905, 1.040) 
Body mass index (BMI) (kg/m^{2})  −0.0586  0.0909  0.5198  0.943 (0.789, 1.128) 
APOE  0.1973  0.2684  0.4625  1.218 (0.719, 2.064) 
APOE× cigarettes/day  −0.0168  0.0225  0.4545  
SERPINE1  −1.0702  1.2188  0.3803  0.343 (0.031, 3.760) 
SERPINE1× BMI  0.0388  0.0419  0.3548  
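The odds ratios and confidence limits in Table 5 follow from the logistic regression estimates by exponentiation. The Python sketch below assumes Wald-type 95% limits, exp(estimate ± 1.96 × standard error); the source does not state how the limits were computed, so small differences from the published values are expected because the tabulated inputs are rounded.

```python
import math
from typing import Tuple

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def odds_ratio_ci(estimate: float, std_error: float) -> Tuple[float, float, float]:
    """Return (odds ratio, lower 95% limit, upper 95% limit)."""
    return (
        math.exp(estimate),
        math.exp(estimate - Z_975 * std_error),
        math.exp(estimate + Z_975 * std_error),
    )


# TC/HDLc row of Table 5: estimate 0.2512, standard error 0.0835.
or_tc, lower_tc, upper_tc = odds_ratio_ci(0.2512, 0.0835)
```

For the TC/HDLc row this gives values close to the published 1.286 (1.091, 1.515).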
Table 6 shows the estimated correlations between exogenous and endogenous variables based on our proposed model. For the two latent endogenous variables (CHD and hypertension), correlations were not calculated.
Table 6. Estimated correlations between endogenous and exogenous variables using a generalized two-level SEM, the Framingham Offspring Study.
Variable  TC/HDLc  Blood glucose (mg/dL)  Sex  Age  Cigarettes/day  Alcohol (Oz)/wk  BMI  APOE  APOE× Cigarettes/day  SERPINE1  SERPINE1× BMI
TC/HDLc  1.000           
Blood glucose  0.258  1.000          
Sex  −0.359  −0.087  1.000         
Age  0.142  0.232  0.006  1.000        
Cigarettes per day  0.090  0.039  0.009  −0.060  1.000       
Alcohol(Oz)/wk  −0.008  0.012  0.259  −0.015  0.166  1.000      
BMI  0.318  0.275  −0.137  0.033  −0.039  −0.035  1.000     
APOE  0.116  −0.004  −0.018  −0.036  −0.068  0.031  −0.027  1.000    
APOE×cigarettes/day  0.097  0.052  −0.024  −0.053  0.961  0.173  −0.021  0.040  1.000   
SERPINE1  0.118  0.078  −0.061  0.049  −0.012  −0.004  0.426  0.015  0.005  1.000  
SERPINE1× BMI  −0.032  −0.045  0.008  0.033  0.003  0.020  −0.023  0.028  0.012  0.881  1.000 
Discussion
In this work, we presented a generalized two-level SEM, including genotype and GE interactions, to model the development of CHD using the FHS Offspring Cohort data. Compared with a classical univariate method (logistic regression), our approach had several advantages: (1) it provided important insights into how genes and contributing factors affect CHD by investigating the direct, indirect, and total effects; (2) it aided the development of biological models that more realistically reflect the complex biological pathways and networks; and (3) it identified more significant risk factors (APOE, APOE× smoking, and SERPINE1× BMI) than the traditional univariate logistic regression approach. These advantages should encourage researchers to use this method more frequently in the analysis of complex epidemiological data.
APOE is a major component of LDL and HDL. It plays a key role in the metabolism of cholesterol and triglyceride by serving as a receptor-binding ligand that removes excess cholesterol from the plasma and carries it to the liver for processing (Dallongeville et al., 1992; Eichner et al., 2002). The structural gene locus of this apolipoprotein is polymorphic (Utermann et al., 1977). Three major APOE isoforms encoded by three common alleles (ɛ2, ɛ3, and ɛ4) at the APOE locus have been studied extensively, and results from several studies (Song et al., 2004; Wilson et al., 1996; Hixson, 1991; Davignon et al., 1988) showed that, compared with ɛ3 homozygotes, ɛ4 carriers have the highest CHD risk and ɛ2 carriers the lowest. The likely mechanism for the effect of the APOE polymorphism on CHD risk is through lipid metabolism (Song et al., 2004; Davignon et al., 1999). However, none of the current statistical approaches takes this underlying mechanism into account. The results from this study showed a significant interaction between smoking and APOE polymorphism on CHD, which is consistent with previous findings (Humphries et al., 2001; Djoussé et al., 2002; Stengard et al., 1995; Talmud et al., 2005) and which supports a possible pathogenesis of CHD in which lipid levels bridge APOE, APOE× smoking, and CHD.
PAI-1 is an inhibitor of fibrinolysis that is involved in the control of atherothrombosis and insulin resistance (Alessi & Juhan-Vague, 2006). Several SNPs in the human SERPINE1 gene have been identified (Dawson et al., 1993), among which the 4G/5G insertion/deletion polymorphism located at position −675 of the promoter region has been studied extensively. The 4G/4G genotype of the SERPINE1 gene has been associated with higher PAI-1 levels than the 4G/5G and 5G/5G genotypes (Humphries et al., 1992; Dawson et al., 1993; Eriksson et al., 1995). Elevated plasma PAI-1 levels are associated with reduced fibrinolytic activity, which in turn plays an essential role in the pathogenesis of cardiovascular disease and other diseases associated with thrombosis. Studies have demonstrated that PAI-1 levels are a risk factor for CHD (Hamsten et al., 1987; Eriksson et al., 1995; Juhan-Vague et al., 1996) and diabetes (Mansfield et al., 1995; Festa et al., 2002; 2006; Meigs et al., 2006; Kanaya et al., 2006). In line with previous reports that PAI-1 and coronary events are related principally through the insulin resistance syndrome (obesity, glucose intolerance, hypertension, and dyslipidemia) (Juhan-Vague et al., 1996; Anand et al., 2003), we found that a significant SERPINE1× BMI interaction influences the risk of CHD by influencing BG, which has not been reported previously. In contrast, other studies showed a significant interaction between smoking and the SERPINE1 gene on the risk of CHD (Morange et al., 2007; Su et al., 2006). Whether the SERPINE1× BMI interaction affects CHD independently of traditional risk factors remains to be confirmed in longitudinal studies.
Finally, the present study has limitations. First, the study population consists predominantly of Caucasian residents of Framingham, Massachusetts, which restricts the applicability of the findings to other ethnic groups in which CVD may be more prevalent. Second, including multiple clinical measurements per subject over time might have improved the assessment of GE interactions. However, multiple measurements added complexity that caused computational convergence problems; this extension will be pursued in future research.