Molecular epidemiology of preterm delivery: methodology and challenges


Dr Xiaobin Wang, Department of Pediatrics, Maternity 4, Boston University School of Medicine, 91 E. Concord Street, Boston, MA 02118, USA. E-mail:


Preterm delivery (PTD) appears to be a complex trait determined by both genetic and environmental factors. Few studies have examined genetic influence on PTD. The overall goal of our study is to examine major candidate genes of PTD and to test gene–environment interactions. Our study includes 500 preterm trios (500 preterm babies and their parents) and 500 maternal age-matched term controls. We will perform the transmission/disequilibrium test (TDT) on candidate genes thought to be important in each of the four biological pathways of PTD: (1) decidual-chorioamniotic inflammation: interleukin 1 (IL-1), IL-6, and tumour necrosis factor (TNF); (2) maternal and fetal stress: corticotropin-releasing hormone (CRH); (3) uteroplacental vascular lesions: methylenetetrahydrofolate reductase (MTHFR); and (4) susceptibility to environmental toxins: GSTM1, GSTT1, CYP1A1, CYP2D6, CYP2E1, NAT2, NQO1, ALDH2, and EPHX. We will also perform standard case-control analyses on the 500 preterm cases and 500 term controls to examine gene–environment interactions. The major environmental, nutritional and social factors as well as clinical variables known or suspected to be associated with PTD will be used to test for gene–environment interactions. This study integrates epidemiological and clinical data as well as genetic markers along major pathogenic pathways of PTD. The findings from this study should improve our understanding of genetic influences on PTD and gene–environment interactions.


The rate of preterm delivery (PTD) remains high in the US at about 17.4% in blacks and 9.8% in whites.1 While previous epidemiological and clinical studies have identified a number of potential risk factors of PTD, the underlying biological mechanisms for these observed associations are poorly understood. Most cases of PTD occurring in the general population cannot readily be explained by any of the known or suspected risk factors.2 In the last two decades, various programmes undertaken specifically to prevent PTD have been largely unsuccessful.3,4 The few that are effective, including treatment of urinary tract infection, cerclage, and treatment of bacterial vaginosis in high-risk women, are not universally effective and apply to only a small percentage of women at risk of PTD.5 A large multicentre randomised clinical trial of treatment of asymptomatic bacterial vaginosis in pregnant women did not reduce PTD.6 Even in a low-income population with prevalent environmental risk factors, a randomised controlled trial of a PTD prevention programme did not significantly reduce the rate of PTD.7 It is generally agreed that the major obstacle to PTD prevention has been our incomplete understanding of its pathophysiology. This underlines the need for future research of PTD to go beyond the classical epidemiological approach and to look beyond traditional risk factors.

The current literature has provided strong evidence of a familial or intergenerational influence on PTD or low birthweight (LBW). A study from Scotland8 found that sisters of women who had delivered preterm LBW infants were more likely to have a preterm infant than the sisters of women who had delivered term growth-retarded infants. A Norwegian study9 suggests no significant association between mother and offspring preterm status. However, a US study showed an increased risk of PTD among women who themselves were born before 37 weeks’ gestation.10 A mother’s own birthweight is also an important determinant of her infant’s birthweight. Infants born to LBW mothers have lower mean birthweight and are more likely to be LBW than those born to normal birthweight mothers, even after accounting for other relevant maternal and infant covariates.9,11–13 A previous history of LBW or PTD is one of the most important risk factors for a subsequent PTD.14 It has also been shown that the risk of PTD increases substantially with the number of previous LBW or preterm infants. Bakketeig and coworkers15 showed that the risk of PTD (defined as < 36 weeks in their study) in the second pregnancy was 14.3% if the first birth was preterm and 28.1% for the third pregnancy if both prior births were preterm. The risk of recurrence did not appear to be affected by the presence of medical complications, the length of the interpregnancy interval, or fetal survival. Our group has demonstrated strong familial aggregation of LBW in both US white and black populations.16 The combined effects of the mother’s birthweight and that of the index child on the risk of PTD or intrauterine growth retardation (IUGR) in the siblings are either additive or interactive. Consistently, a recent study17 based on US white and black populations suggests that recurrence of PTD contributes a notable portion of all PTDs, especially at the shortest gestations. The strong familial or intergenerational influences on PTD or LBW may be attributed to environmental factors, genetic factors, or both.

In contrast to many studies that consider the relative contribution of both genetic and environmental factors to a number of human complex diseases such as cancer, obesity, diabetes, asthma and hypertension, few studies have examined genetic influences on PTD. We hypothesise that PTD is a highly heterogeneous complex entity determined by multiple genetic and environmental factors. As illustrated in Fig. 1, clinical and experimental evidence indicates that most PTDs may result from four pathogenic processes: (1) decidual-chorioamniotic inflammation caused by ascending genitourinary tract or systemic infection; (2) maternal-fetal hypothalamic-pituitary-adrenal axis activation caused by stress; (3) uteroplacental vascular lesions caused by coagulopathy, hypertension, and vascular lesions; and (4) susceptibility to environmental toxins. Ultimately, the four pathways converge on final clinical presentations characterised as preterm labour, preterm premature rupture of the membranes (PPROM), or medical induction due to maternal or fetal health threat, all of which lead to PTD. The central hypothesis of our study is that the polymorphisms of candidate genes in the four pathogenic pathways of PTD, independently or interacting with environmental factors, are associated with PTD. We are conducting a molecular epidemiological study to examine the effects of important candidate genes and their potential interactions with environmental factors on PTD among 500 preterm trios (500 preterm babies and their parents) and 500 maternal age-matched term controls. The specific aims are: (1) to perform the transmission/disequilibrium test (TDT) on candidate genes thought to be important in the major biological pathways of PTD (see Table 1); and (2) to conduct a case-control analysis among 500 preterm cases and 500 term controls to examine gene–environment interactions. Major environmental, nutritional and social factors known or suspected to be associated with PTD will be assessed with respect to gene–environment interactions. In the next section, we review the epidemiological and clinical evidence, molecular biology/genetic studies, and major candidate genes for each pathway. We are aware of other potential pathogenic pathways, but their significance is less substantiated by the literature.

Figure 1. Pathogenic pathways of preterm delivery

Table 1. Candidate genes for preterm delivery

1. Decidual-chorioamniotic-inflammation pathway

Epidemiological and clinical evidence

There is increasing epidemiological and clinical evidence that amniochorionic-decidual infections play a role in PTD. The epidemiological profile of women at risk for PTD overlaps that of women at risk for acquiring sexually transmitted diseases (i.e. poor, young, minority, inner city, unmarried).18 Vaginal pathogens including N. gonorrhoeae, C. trachomatis, T. vaginalis, and Bacteroides spp., as well as asymptomatic bacteriuria, are found with greater frequency among women with PTD.19 Particularly striking are seven studies (two case-control and five cohort) which reported an increased risk of PTD in women with bacterial vaginosis,20–22 with relative risks ranging from 2.0 to 6.9. However, randomised clinical trials on the efficacy of antibiotic treatment of bacterial vaginosis to prolong the pregnancy have yielded mixed results.6,23–26 Systemic infections are also associated with PTD, including pyelonephritis,27 pneumonia,28 peritonitis29 and periodontal disease.30 These findings suggest that an infection even remote from the uterus can activate an inflammatory process that triggers a uteroplacental response, leading to PTD.

Inflammatory mediators and candidate genes

Pro-inflammatory cytokines (e.g. IL-1, IL-6, TNF) are mediators of inflammation produced by the macrophage/monocyte system in response to bacterial products. They are part of, and stimulate further, the cascade of signals that is the inflammatory response to infection.31–35 The amniotic fluid of patients with PTD and intra-amniotic fluid infections displays detectable levels of bacterial-derived endotoxin, IL-1 and TNF.31–40 Levels of these cytokines also correlate with histological chorioamnionitis.33,41 IL-1 and TNF stimulate uterotonin expression including prostaglandin E2 (PGE2)37,38,42–44 and endothelin.45 PGE2 is a powerful stimulant of myometrial contractions. Clinically, systemic or local administration of PGE2 induces labour. A study of 68 women with preterm labour46 found that amniotic fluid concentrations of PGE2 were significantly greater in women with PTD and intra-amniotic infection than in women without infection. A more recent study showed an increase in prostaglandin bioavailability before onset of labour.47 The effect of IL-1 and TNF can be further amplified by IL-6, which is secreted by cultured decidual and chorionic cells in response to IL-1 and TNF.45,48,49 Activation of the cytokine network also enhances decidual, fetal membrane, and cervical extracellular matrix (ECM)-degrading protease activity.50–52 The concerted effect of these proteases is efficient degradation of collagen, laminin, elastin, and fibronectin, which are crucial ECM components of the fetal membranes, decidua, and cervix.

In sum, the recent increase in knowledge about infection and PTD has shed new light and raised many questions.53 The inflammation pathways appear to be extremely complex, and selecting candidate genes and understanding the contribution of any single gene can be challenging. Our study focuses on genes that encode IL-1, IL-6, and TNF and examines whether maternal or fetal variant genotypes are associated with increased risk of amniochorionic-decidual infection and PTD.

2. Stress and activation of the hypothalamic-pituitary-adrenal (HPA) axis

Epidemiological and clinical evidence

Epidemiological factors commonly associated with maternal stress are also associated with PTD.54,55 The incidence of PTD is increased among unmarried and poor mothers,56 African-Americans even after controlling for socio-economic status,57,58 patients with major stressful events,54,59 patients with elevated psychological scores for anxiety,60 and those subjectively reporting increased stress and anxiety.61 A recent study demonstrated an inverse correlation between levels of psychosocial and physiological stress and cervical length.62 The link between fetal stress and PTD is suggested by the increase in placental vascular lesions and intrauterine growth retardation among patients delivering preterm without infections or overt pre-eclampsia.63,64

Stress mediators and candidate genes

When individuals are under internal and/or external stress, they undergo a cascade of neuroendocrine responses. Corticotropin-releasing hormone (CRH) is the major hypothalamic regulator of the mammalian stress response. In addition to expression in the central nervous system, CRH is also expressed by trophoblasts in placenta and chorion, as well as by amnion and decidual cells.65–71 Plasma CRH levels rise during the second half of pregnancy, peak during labour, and rapidly decline postpartum.72–75 It has been suggested that activation of the fetal HPA axis drives a CRH-mediated ‘placental clock’ that triggers the onset of parturition at term.11,72,73,76 Similar HPA axis-modulated pathways also appear to be capable of triggering stress-induced PTD. A few studies showed that maternal CRH levels rise precociously among women who deliver prematurely.73,77

Parturition appears to be induced by CRH via two pathways. First, CRH mediates pituitary adrenocorticotropin (ACTH) secretion, which in turn enhances adrenal cortisol secretion.78,79 Moreover, hypothalamic-induced activation of the fetal HPA axis is associated with an increase in fetal ACTH and cortisol.80 Thus, activation of the maternal or fetal HPA axis would lead to increased levels of cortisol which, in turn, would result in enhanced placental CRH production,81 which leads to enhancement of prostanoid production by isolated amnion, chorion, and decidual cells.68–71,81,82 Prostaglandins act as direct uterotonins, but also enhance myometrial receptivity by increasing oxytocin receptors83 and formation of gap junctions.84 Prostaglandins also elicit cervical change by enhancing ECM turnover.85 Second, CRH appears to induce parturition by stimulating the secretion of dehydroepiandrosterone sulphate (DHEAS) from the fetal adrenal gland.76 DHEAS is the obligate precursor of placental oestrone (E1), oestradiol (E2), and oestriol (E3).86 Oestrogens interact with myometrium to enhance gap junction (connexin 43) formation,87 oxytocin receptor,88 prostaglandin activity,89 myosin light chain kinases (MLCK) and calmodulin expression.90

In sum, current data suggest that maternal and fetal stress, with resultant activation of the HPA axis, is an important pathogenic pathway of PTD. The relative contribution of maternal and fetal genes in this pathway has not been evaluated. Our study focuses on the gene encoding CRH and investigates both maternal and fetal CRH gene polymorphisms in relation to the risk of PTD. We are also interested in whether there are interactions between maternal CRH genotype and psychosocial stressors before and during pregnancy in relation to PTD.

3. Uteroplacental vasculopathy

Epidemiological and clinical evidence

The potential importance of a vascular pathway to PTD has recently been emphasised.91 Decidual haemorrhage presenting as vaginal bleeding in the first and subsequent trimesters is associated with a threefold increased adjusted relative risk for PTD due to preterm labour with intact membranes.92 Hager et al.93 observed that vaginal bleeding in more than one trimester carried the highest identifiable risk of PPROM with an odds ratio of 7.4. Ekwo et al.94 found an adjusted odds ratio of > 100 for PTD among women who experienced vaginal bleeding in more than one trimester when their previous pregnancy had been complicated by PPROM. When subchorionic haemorrhage is detected by ultrasound, the risks of PTD, stillbirth, miscarriage, and abruptio placentae are increased.95 The normal function of placental vessels depends on the balance of procoagulant and anticoagulant mechanisms for damage repair and maintenance of blood fluidity. Pregnancy induces marked changes in the coagulation system and may increase the risk of thromboembolic events, especially among pregnant women who have acquired or genetic risk factors for thrombosis.96 Below we review one condition that may affect such risk.

Hyperhomocysteinaemia (HHC) and candidate genes

HHC is indicative of disrupted homocysteine metabolism. It occurs in the rare hereditary homocystinuria but more commonly results from a combination of vitamin B12 or folate deficiency and mutations in the gene encoding the enzyme methylenetetrahydrofolate reductase (MTHFR). The missense mutation (C677T) of the MTHFR gene has been associated with reduced MTHFR activity and modestly increased plasma homocysteine concentrations, particularly in persons with plasma folate levels below the median.97 Homozygosity for the MTHFR mutation is found in 10–20% of the population, but its frequency varies significantly among populations.98–100 A second common mutation in the MTHFR gene (A1298C) has recently been identified.101 A significant interaction appears to exist between the C677T and A1298C mutations.101 HHC has been associated with increased risk of thromboembolism102,103 and of coronary heart disease.104,105 Moreover, even mildly elevated plasma homocysteine (about 30% above normal controls) has been identified as an independent risk factor for numerous vascular disorders, including cerebrovascular,106 cardiovascular, and peripheral vascular disease.107 Studies have also shown that HHC-inducing interaction between MTHFR mutation and low folate intake accounts for a substantial portion of neural tube defects.108 Particularly pertinent to this study are recent reports linking HHC to increased risk of pre-eclampsia,97,109–111 recurrent miscarriage,112,113 and placental abruption or infarction.113,114

In sum, uteroplacental vasculopathy appears to be an important pathogenic pathway of PTD. HHC (as a result of MTHFR mutation and/or low folate intake) may be an important underlying condition. Our study investigates whether MTHFR gene polymorphisms affect the risk of uteroplacental vasculopathy and PTD, and assesses the potential interaction of MTHFR gene polymorphisms with low folate intake on the risk of uteroplacental vasculopathy and PTD. It is noted that genes encoding other metabolic enzymes may also affect HHC levels. For example, B12-dependent methionine synthase (MS), an enzyme that catalyses the remethylation of homocysteine to methionine, also plays an important role in the remethylation pathway of homocysteine.115 Thus, in addition to the MTHFR gene, other homocysteine metabolism genes may be of future interest.

4. Genetic susceptibility to environmental toxins

Humans are exposed to a variety of reproductive toxicants. A growing body of evidence demonstrates an association between environmental and occupational exposures and adverse reproductive outcomes. Exposures studied include cigarette smoking,116,117 caffeine consumption,118 pesticides119,120 and organic solvents and related compounds.121–126 Nevertheless, not all women who are exposed have adverse reproductive outcomes. It is speculated that the reproductive risk associated with exposure to endogenous or exogenous chemicals may be modified by genetic variation in metabolic detoxification activities.127 The metabolic detoxification process involves two phases: phase I, in which the original non-polar compound becomes polar and reactive, and phase II, in which the transformed polar compound is conjugated with certain endogenous functional groups such as glutathione, sulphate, glucuronide, and amino acids; thus, the end product becomes a stable hydrophilic compound that can easily be excreted.128 In humans, a significant proportion of these metabolic genes are polymorphic. As multiple alleles exist at loci encoding chemical-metabolising enzymes, the expression of different host susceptibility phenotypes may explain the considerable variability in pregnancy outcomes associated with environmental toxins. For example, the cytochrome P450 family serves as the major enzyme system in phase I metabolism. CYP1A1 is a well-studied phase I enzyme, and its polymorphism has been associated with individual cancer susceptibility.129,130 The glutathione S-transferases (GSTT1 and GSTM1) are the major phase II enzymes. Our study of a Chinese population131 showed that the GSTT1 deletion genotype significantly modified the risk of increased sister chromatid exchange among workers exposed to benzene. In combined phase I and phase II enzyme disorders, a 40-fold increased risk of tobacco smoke-induced lung cancer was observed in individuals with susceptible CYP1A1 and GSTM1 genotypes,132,133 which suggests that phase I and phase II enzymes have a synergistic effect.

In summary, available data support the hypothesis that a woman’s reproductive risk is related to both her environmental exposures and her genetic susceptibility to adverse effects of these exposures. Our study will focus on nine metabolic genes known to lead to genetic differences in metabolic detoxification capacity: GSTM1, GSTT1, CYP1A1, CYP2D6, CYP2E1, NAT2, NQO1, ALDH2, and EPHX.

Study design and methods

Study population

Our study includes 500 preterm trios (mother, father, and infant) and 500 term controls in Anqing, China. Anqing is a city stretching for about 80 km along the north bank of the Yangtze river. It has three urban areas and eight rural counties, with a total area of 15 000 km². The total population in 1990 was 5.8 million (10% urban and 90% rural); in 1995, the birth, mortality, and natural growth rates were 21.0, 14.7 and 6.3 per thousand, respectively. Anqing Maternal and Child Health Care Center (AMCHCC) was established in 1971 and currently has 95 physicians, nurses and staff members. There are four major general hospitals with over 2500 beds, which provide medical service for 95% of the urban population. All urban residents are required to undergo physical examinations in AMCHCC before their marriage registration. When a married woman becomes pregnant, she receives free prenatal health care in AMCHCC and in one of the four major general hospitals. This well-established maternal health care system has provided a unique opportunity to conduct prospective genetic and environmental epidemiological studies in reproductive health.

PTD cases are defined as gestational age < 37 weeks and controls are those with gestational age between 39 and 42 weeks and birthweight within the 25th and 75th gestational age-specific percentiles for the study population. The controls were matched with cases by maternal age (± 5 years) and date of delivery (± 3 days). Women with multiple gestation, chromosomal abnormality or major birth defect, and known history of incompetent cervix, or PTD due to maternal trauma were excluded. Since 1996, we have recruited a total of 500 preterm trios.
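The case and control definitions and the matching rule above can be written down compactly. The following is a minimal sketch, not the study's actual procedure: the `Delivery` record and its field names are hypothetical, the boundary values are assumed to be inclusive, and the exclusion criteria (multiple gestation, birth defects, and so on) are assumed to have been applied beforehand.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Delivery:
    """Hypothetical record of one eligible delivery (field names are illustrative)."""
    gestational_age_weeks: float
    birthweight_percentile: float  # gestational age-specific percentile
    maternal_age_years: float
    delivery_date: date


def is_case(d: Delivery) -> bool:
    """PTD case: gestational age below 37 weeks."""
    return d.gestational_age_weeks < 37


def is_eligible_control(d: Delivery) -> bool:
    """Term control: 39 to 42 weeks and 25th to 75th birthweight percentile.

    Treating both ranges as inclusive is an assumption of this sketch.
    """
    return (39 <= d.gestational_age_weeks <= 42
            and 25 <= d.birthweight_percentile <= 75)


def is_matched(case: Delivery, control: Delivery) -> bool:
    """Matching rule: maternal age within 5 years, delivery date within 3 days."""
    age_ok = abs(case.maternal_age_years - control.maternal_age_years) <= 5
    date_ok = abs((case.delivery_date - control.delivery_date).days) <= 3
    return age_ok and date_ok
```

Note that deliveries at 37 or 38 weeks fall into neither group under these definitions, which widens the separation between cases and controls.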

Data collection procedures

Recruitment of subjects

All non-smoking and non-drinking women aged 20–34 years taking physical examinations for marriage registration in AMCHCC were invited to participate. After consent forms were signed, trained interviewers administered questionnaires to the women and their husbands to collect baseline information. In addition, trained examiners collected blood samples and measured height and weight according to standard protocols.

Confirmation of pregnancy and follow-up

When a woman missed a period or developed early signs or symptoms of pregnancy, she was evaluated at AMCHCC by an obstetrician, including confirmation of her last menstrual period (LMP), a urine β-hCG test and an obstetric examination. Gestational age could therefore be estimated accurately for each woman. All participating women received routine prenatal care at AMCHCC until the third trimester of the pregnancy. They then received routine prenatal care, delivery services, and postnatal care at one of the four major general hospitals in Anqing.

Collection of epidemiological and clinical data

Trained interviewers administered previously validated questionnaires to each eligible woman and her husband at the following time points. (1) At enrolment: a baseline questionnaire was administered to obtain information on socio-demographic characteristics, current medication, health status, reproductive history (especially contraceptive use, abortion and infertility), job activities, occupational exposure to dust, chemicals, radiation, noise and heat, ergonomic aspects, job-related stress, social support, physical activities, active and passive smoking, indoor coal combustion, cooking oil fumes, air conditioner use, consumption of tea, coffee and alcohol, and diet. (2) At first prenatal visit: a questionnaire was administered to record whether during the study period any changes occurred in smoking, alcohol use, home environment, medication, occupational exposures, and health status. (3) At delivery: a labour and delivery record was completed by a trained nurse to record pregnancy outcomes, time of delivery, gender of the child, birthweight, and complications during pregnancy or labour and delivery. In addition, dietary information before and during the pregnancy was obtained using a food frequency questionnaire. A cord blood sample was collected from each eligible newborn. The study protocols were reviewed and approved by the Human Subject Committee of both Anqing and Boston University.

Selection of candidate genes

This proposed study will evaluate major candidate genes in the four pathogenic pathways of PTD as listed in Table 1. Information on each candidate gene’s chromosome location, known genetic polymorphisms, GenBank accession number, and source of reference is provided. To the best of our knowledge, Table 2 summarises the impact of gene mutation on the functional activity of each gene. We did not include the well-described TNFA G-308A polymorphism because our pilot study in 1057 Chinese subjects showed that its variant allele frequency was < 6%. We did not study the ADH3 gene because the prevalence of alcohol drinking is very low in Chinese women (< 2%). Our selection of candidate genes reflects the current knowledge of important genetic influences on PTD. The rapid advancement of the Human Genome Project and biomedical research may reveal novel and important candidate genes of PTD.

Table 2. Impact of gene polymorphisms on gene expression and protein activity

Statistical analysis and challenges

General strategies

Simple exploratory analyses will first be performed to determine whether any transformations may be necessary, to identify any data problems, and to identify relationships that may warrant further exploration. We will apply the transmission disequilibrium test (TDT) as well as regression analyses suitable for case-control design to address the specific aims of our study.

TDT is one of the most popular procedures for testing genetic association.147 It tests for non-random transmission of an allele from parents heterozygous for that allele to a well-defined class of offspring. Ordinarily, heterozygous parents will transmit either allele with 50% probability. However, in the presence of a genetic association with a disease, alleles important to the disease process (or in linkage disequilibrium with such alleles) will be transmitted preferentially. Such deviations can be detected by a simple application of McNemar’s statistic. We have chosen TDT to investigate the candidate genes for three major reasons. First, with rapid progress of the Human Genome Project, hundreds of genes have been mapped and can be used for association studies on complex diseases such as PTD. Secondly, for a marker that is extremely close to a preterm locus or is the preterm locus itself, TDT can be far more powerful than conventional linkage tests.147–149 Thirdly, although it is simple in design, TDT is equivalent to a randomised experiment and therefore is resistant to confounding.
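To make the McNemar-type calculation concrete, the sketch below computes the TDT statistic for a biallelic marker from two counts: the number of heterozygous parents who transmitted the allele of interest to the preterm infant and the number who did not. The function names are illustrative, the counts in the usage note are hypothetical, and the p-value relies on the large-sample chi-square approximation with one degree of freedom.

```python
from math import erfc, sqrt


def tdt_statistic(transmitted: int, untransmitted: int) -> float:
    """McNemar-type TDT chi-square (1 df) from heterozygous-parent counts."""
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - untransmitted) ** 2 / total


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom.

    Uses the identity P(Z**2 > s) = erfc(sqrt(s / 2)) for standard normal Z.
    """
    return erfc(sqrt(stat / 2.0))
```

For example, if 60 heterozygous parents transmitted the allele and 40 did not, `tdt_statistic(60, 40)` gives (60 − 40)² / 100 = 4.0, and `chi2_1df_pvalue(4.0)` is just under 0.05. Homozygous parents contribute nothing to either count, which is why only heterozygous parents are informative.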

Case-control design

While TDT provides an appealing approach to detecting genetic associations with disease, a disadvantage of TDT is that there can be some loss of power because only heterozygous parents can be used in the analysis. Recent theoretical work150,151 has indicated that standard case-control methodology can be more powerful than family-based tests such as TDT in detecting genetic association with specific phenotypes. We will first use a number of common statistical techniques, including single-locus allele contingency tables and χ² tests, to examine the association of the major candidate genes with PTD in both the mothers and the infants. The regression-based approach will provide us with good power to detect the genetic main effects of interest, as well as gene–environment interactions.152 We will use hierarchical modelling techniques to overcome problems associated with sparse data.153 Further discussion regarding the value of hierarchical models in epidemiological settings can be found in Rothman and Greenland.152
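As an illustration of the single-locus allele contingency table analysis, the sketch below computes the Pearson chi-square statistic (without continuity correction) and the allelic odds ratio for a 2 × 2 table of allele counts in cases and controls. The function name is illustrative and the counts in the usage note are hypothetical, not study data.

```python
from typing import Tuple


def allele_chi_square(case_variant: int, case_common: int,
                      ctrl_variant: int, ctrl_common: int) -> Tuple[float, float]:
    """Return (Pearson chi-square with 1 df, allelic odds ratio) for a 2x2 table."""
    a, b, c, d = case_variant, case_common, ctrl_variant, ctrl_common
    n = a + b + c + d
    row_case, row_ctrl = a + b, c + d
    col_variant, col_common = a + c, b + d
    if min(row_case, row_ctrl, col_variant, col_common) == 0:
        raise ValueError("a table margin is zero; the test is undefined")
    # Shortcut formula for the Pearson statistic of a 2x2 table
    stat = n * (a * d - b * c) ** 2 / (row_case * row_ctrl * col_variant * col_common)
    # Odds of carrying the variant allele in cases relative to controls
    odds_ratio = float("inf") if b * c == 0 else (a * d) / (b * c)
    return stat, odds_ratio
```

With 500 cases and 500 controls there are 1000 alleles in each group. If, hypothetically, 120 case alleles and 80 control alleles were the variant, `allele_chi_square(120, 880, 80, 920)` would return a statistic of about 8.89 and an odds ratio of about 1.57. Counting alleles rather than individuals assumes that the two alleles of a person are independent, which holds under Hardy–Weinberg equilibrium.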

Multiple comparisons

As in most genetic epidemiological studies, multiple testing is an important consideration. We have chosen not to apply the standard Bonferroni-type adjustments for multiple comparisons, which involve dividing the desired type I error rate by the number of planned comparisons to ensure that the overall study type I error remains at the appropriate level. As argued recently in the genetic epidemiology literature, such approaches may be unnecessarily conservative.154 Many researchers argue that the multiplicity problems encountered in genetic epidemiology research require an alternative paradigm. Rothman and Greenland152 argue that hierarchical Bayesian models can provide an excellent framework for handling multiplicity. The kinds of models they recommend are not Bayesian in the sense that they require informative prior knowledge. Rather, Bayesian computational methods are used to fit models that allow covariate effects to be modelled as random. Aragaki et al.155 discuss this approach specifically in the context of a study involving gene–environment interactions. We plan to use a similar approach for our analysis.
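For reference, the Bonferroni arithmetic described above is sketched below, together with the family-wise error rate that would arise without any adjustment. The second function assumes the tests are independent, which is a simplification; the number of comparisons in the usage note (14) is purely illustrative and is not the study's planned number of tests.

```python
def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-test significance level under a Bonferroni adjustment."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def unadjusted_familywise_error(alpha: float, n_comparisons: int) -> float:
    """Chance of at least one false positive across independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_comparisons
```

With an overall level of 0.05 and 14 comparisons, the Bonferroni per-test threshold is 0.05 / 14, roughly 0.0036, whereas testing each at 0.05 without adjustment would give a family-wise error rate of roughly 0.51 under independence. The gap between these two figures illustrates both why some adjustment is needed and why the Bonferroni threshold can be conservative when tests are correlated.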

Gene–environment interactions

For specific Aim 2, we will test for gene–environment interactions. Such tests can easily be incorporated into analyses of all outcomes of interest by adding a covariate that reflects the interaction between genotype and exposure. Consider a model that includes an exposure covariate X1, which takes the value 1 if exposed and 0 otherwise, and a genotype covariate X2, which takes the value 1 if the individual has the high-risk genotype and 0 otherwise. We ignore confounding variables for the purpose of this discussion, but they are easily added to the model below. Suppose the outcome is represented by Y and, for illustration, that Y is binary so that a logistic model is appropriate. If we fit the model logit(Pr(Y = 1)) = a0 + a1*X1 + a2*X2 + a3*X1*X2, then a1 can be interpreted as the log odds ratio associated with exposure for individuals with the common genotype (X2 = 0), and (a1 + a3) is the corresponding log odds ratio for individuals with the high-risk genotype (X2 = 1). A finding that a3 is significantly different from zero suggests that genotype modifies the effect of exposure on the multiplicative (odds ratio) scale. In a similar manner, we can also evaluate gene–environment interaction for departures from additivity using linear regression models.
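The interpretation of the coefficients can be checked numerically. The sketch below evaluates the logistic model for given coefficients and derives the exposure odds ratio within each genotype stratum; it does not fit the model to data. The function names and the coefficient values are hypothetical and chosen only for illustration.

```python
from math import exp, log


def predicted_probability(a0: float, a1: float, a2: float, a3: float,
                          x1: int, x2: int) -> float:
    """Pr(Y = 1) under logit(Pr(Y = 1)) = a0 + a1*X1 + a2*X2 + a3*X1*X2."""
    linear_predictor = a0 + a1 * x1 + a2 * x2 + a3 * x1 * x2
    return 1.0 / (1.0 + exp(-linear_predictor))


def exposure_odds_ratio(a1: float, a3: float, x2: int) -> float:
    """Odds ratio for exposure (X1 = 1 vs 0) within genotype stratum X2."""
    return exp(a1 + a3 * x2)


# Hypothetical coefficients: exposure OR of 1.5 with the common genotype,
# multiplied by 2 in carriers of the high-risk genotype.
A0, A1, A2, A3 = -2.0, log(1.5), 0.3, log(2.0)
OR_COMMON = exposure_odds_ratio(A1, A3, 0)      # exp(a1)
OR_HIGH_RISK = exposure_odds_ratio(A1, A3, 1)   # exp(a1 + a3)
```

Here exp(a3) is the ratio of the two stratum-specific odds ratios, so a3 = 0 corresponds to no interaction on the multiplicative scale. The same odds ratios can be recovered from `predicted_probability` by comparing the odds p / (1 − p) between exposed and unexposed individuals within a genotype stratum.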

Adjustment for confounders

An extensive list of potential confounding variables will be collected through questionnaires, clinical records and laboratory test results. For each specific aim and hypothesis, we will select a subset of variables as potential confounders to be controlled for based on biological plausibility, timing of exposure, and causal pathway. Selection of the covariates or confounding factors for all multivariate models will use a combination of standard statistical procedures for variable selection (e.g. stepwise regression) and model manipulation based on biological considerations. In addition to variables that contribute independently to the outcome of interest, each potential confounding factor associated with a change-in-estimate of 5% or more (in either direction) in the multiple regression model will be included. The effects of multiplicative interactions between the covariates, between the genes, or between the genes and environments on PTD will be explored analogously.
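The change-in-estimate rule can be stated as a simple calculation. The sketch below compares the coefficient of interest estimated without and with a candidate confounder in the model. Expressing the change relative to the unadjusted estimate is an assumption, since conventions differ on the choice of denominator; the function names and the numbers in the usage note are hypothetical.

```python
def relative_change(unadjusted: float, adjusted: float) -> float:
    """Absolute relative change in an estimate after adding a covariate."""
    if unadjusted == 0:
        raise ValueError("unadjusted estimate is zero; relative change undefined")
    return abs(adjusted - unadjusted) / abs(unadjusted)


def retain_as_confounder(unadjusted: float, adjusted: float,
                         threshold: float = 0.05) -> bool:
    """Apply the change-in-estimate criterion (5% by default)."""
    return relative_change(unadjusted, adjusted) >= threshold
```

For example, if a log odds ratio moves from 0.40 to 0.37 when a covariate is added, the change is 7.5% and the covariate would be retained, whereas a move from 0.40 to 0.39 (2.5%) would not meet the criterion.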

Power consideration

For an association study of 500 preterm trios using TDT, with α = 0.01, our power calculation indicates that, except when both the allele transmission probability (0.55) and the allele frequency (0.05) are low, the power is extremely high, in many cases over 99%. For testing gene–environment interactions, with two-sided statistical tests at significance level 0.01, we will have excellent power to detect gene–environment interactions with 500 cases and 500 controls for relatively frequent alleles and environmental exposures.
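The dependence of TDT power on allele frequency and transmission probability can be sketched with a normal approximation. This is not the authors' power calculation, whose method is not described; it assumes that only heterozygous parents are informative, that the expected number of heterozygous parents follows Hardy–Weinberg proportions (an approximation for parents ascertained through an affected child), and that the TDT is treated as a two-sided test of a transmission probability of 0.5. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def expected_het_parents(n_trios, allele_freq):
    """Expected heterozygous parents among 2 * n_trios parents (HWE assumed)."""
    return 2 * n_trios * 2 * allele_freq * (1 - allele_freq)

def tdt_power(n_het, transmission, alpha=0.01):
    """Approximate two-sided power of the TDT.

    n_het        : number of informative (heterozygous) parents
    transmission : true probability that the risk allele is transmitted
    The statistic (b - c) / sqrt(b + c) is treated as normally distributed.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(n_het) * (2 * transmission - 1)      # mean under H1
    sd = 2 * math.sqrt(transmission * (1 - transmission))  # SD under H1
    upper = std_normal.cdf((shift - z_crit) / sd)
    lower = std_normal.cdf((-shift - z_crit) / sd)
    return upper + lower

# Grid of illustrative allele frequencies and transmission probabilities.
for freq in (0.05, 0.20, 0.40):
    n_het = expected_het_parents(500, freq)
    for t in (0.55, 0.60, 0.65):
        print(f"freq={freq:.2f} t={t:.2f} n_het={n_het:.0f} "
              f"power={tdt_power(n_het, t):.3f}")
```

Under these assumptions, an allele frequency of 0.05 leaves only about 95 informative parents among 500 trios, and power at a transmission probability of 0.55 is low, which is in line with the exception noted above.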


This study has the following unique features. It draws on detailed epidemiological and clinical data as well as blood samples from a large number of preterm trios and term controls from a homogeneous population, which permits a comprehensive genetic-epidemiological analysis of preterm delivery with sufficient statistical power. It will evaluate important candidate genes in four pathogenic pathways of PTD and will test gene–environment interactions. It takes advantage of advances in biotechnology and the Human Genome Project by utilising candidate gene and sequence information. The findings from this study will enhance our understanding of the aetiology of PTD.

We have chosen to use a Chinese population for several reasons. First, PTD is less prevalent in China, partially due to high-quality prenatal care, optimum childbearing age, and low prevalence of environmental and social risk factors (e.g. maternal active smoking and alcohol drinking), so that the PTD that does occur is more likely to reflect genetic predisposition. Secondly, as a result of previous projects, epidemiological and clinical data as well as blood samples from 500 preterm trios and 500 controls have already been collected. This makes our study both cost-effective and time-saving. Thirdly, since the women have been prospectively followed up from early pregnancy, the information on gestational age and time-dependent covariates should be accurate. Finally, all the study subjects were from a large homogeneous population, thus increasing the power of TDT to detect genetic associations. Because we have obtained both parental and fetal genotypes, we will be able to apply both TDT and the regression-based techniques used in standard case-control settings. Being able to perform both types of analysis is an important strength of our study, as the two have different strengths and weaknesses and are complementary in many ways, as discussed earlier.

Few studies have examined gene–environment interactions in relation to PTD. Genetic susceptibility is important to consider in assessing whether an exposed individual is at increased risk. Markers of susceptibility can be incorporated into epidemiological models as effect modifiers to study gene–environment interactions in relation to health outcomes. Our group has demonstrated significant interactions between metabolic detoxification genes and various environmental toxins on adverse pregnancy outcomes. For example, the association between low-level benzene exposure and shortened gestation was significantly modified by genetic susceptibility as defined by two susceptibility genes: CYP1A1 (HincII polymorphism) and GSTT1 (deletion polymorphism).156 The association between organophosphate pesticide exposure and male reproductive outcomes was significantly modified by paraoxonase 1 gene polymorphisms.157 We believe that a gene–environment approach offers a novel and promising research direction for PTD.

In summary, efforts to prevent PTD have been hampered by a poor understanding of the underlying aetiology. The most promising approach therefore is to elucidate the biological pathways of PTD and to understand the role of genetic and environmental factors in the pathogenesis of PTD at the molecular genetic level. Our study is among the first to evaluate major candidate genes of PTD and to test gene–environment interactions along the four pathogenic pathways of PTD. It is further strengthened by the coordinated use of both TDT and case-control designs. It has the potential to identify novel genetic variants and gene–environment interactions responsible for PTD. When specific genetic variants associated with PTD are detected and confirmed, the road is paved for further investigation into their biological functions in relation to PTD. Such discoveries would shed light on the pathophysiology of PTD, possibly leading to better strategies for prevention, diagnosis and treatment.


This study is supported in part by grant 20-FY98-0701 from the March of Dimes Birth Defects Foundation; by grant R825818 from the Environmental Protection Agency; by grant 1R01 HD32505-01 from the National Institute of Child Health and Human Development; by grant 1R01 ES08337-01 from the National Institute of Environmental Health Sciences; and by the Barbara and Joel Alpert Children of the City Endowment Fund from the Department of Pediatrics, Boston University School of Medicine, and Boston Medical Center.