Predicting Risk of Spontaneous Preterm Delivery in Women with a Singleton Pregnancy

Authors

  • Nils-Halvdan Morken,

    Corresponding author
    1. Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
    2. National Institute of Environmental Health Sciences, Epidemiology Branch, Durham, NC
    3. Department of Obstetrics and Gynecology, Haukeland University Hospital, Bergen, Norway
    • Correspondence:

      Nils-Halvdan Morken, Department of Global Public Health and Primary Care, University of Bergen, Kalfarveien 31, Bergen 5018, Norway.

      E-mail: nils-halvdan.morken@kk.uib.no

  • Karin Källen,

    1. Tornblad Institute, University of Lund, Lund, Sweden
  • Bo Jacobsson

    1. Department of Obstetrics and Gynecology, Institute for the Health of Women and Children, Sahlgrenska University Hospital, Göteborg, Sweden

Abstract

Background

Prediction of a woman's risk of a spontaneous preterm delivery (PTD) is a core challenge and an unresolved problem in today's obstetric practice. The objective of this study was to develop prediction models for spontaneous PTD (<37 weeks).

Methods

A population-based register study of women born in Sweden with spontaneous onset of delivery was designed using Swedish Medical Birth Register data for 1992–2008. Predictive variables were identified by multiple logistic regression analysis, and outputs were used to calculate adjusted likelihood ratios in primiparous (n = 199 272) and multiparous (n = 249 580) singleton pregnant women. The predictive ability of each model was validated in a separate test sample for primiparous (n = 190 936) and multiparous (n = 239 203) women, respectively.

Results

For multiparous women, the area under the ROC curve (AUC) of 0.74 [95% confidence interval (CI) 0.73, 0.74] indicated satisfactory performance of the model, whereas for primiparous women performance was rather poor (AUC 0.58 [95% CI 0.57, 0.58]). For both primiparous and multiparous women, the prediction models performed quite well for pregnancies with comparatively low risk of spontaneous PTD, but were less able to predict pregnancies with ≥30% risk of spontaneous PTD.

Conclusions

Spontaneous PTD is difficult to predict in multiparous women and nearly impossible to predict in primiparous women when this statistical method is applied to a large and unselected sample. However, adding clinical data (such as cervical length) may further improve predictive performance in the future.

Spontaneous preterm delivery (PTD) includes preterm labour and preterm pre-labour rupture of the membranes leading to PTD, and designates the largest subgroup of the PTD syndrome.[1] Prediction of a woman's risk of a spontaneous PTD and the management of high-risk women are core challenges and unresolved problems in today's obstetric practice. A high-quality risk prediction system would help direct the right amount of effort and resources to the treatment of the right patients and would make counselling of patients easier. What is most needed is a set of techniques that make it possible, early in pregnancy, to assess a woman's individual risk of spontaneous PTD given a certain set of risk factors.

The results from the development of risk scoring systems and diagnostic tests have so far been rather disappointing, and no adequate screening tool is available for early detection of PTD. The numerous previously developed risk scoring systems[2-10] aim at predicting risk of PTD based on past obstetric history. No scoring system has been shown to be superior to clinical judgement.[11] In otherwise asymptomatic women, the present risk scoring systems used in early pregnancy have a wide range of accuracy of predicting PTD, and they have so far not been recommended in clinical practice.[12]

Smith and colleagues have developed a statistical method that combines logistic and Bayesian methods.[13] This method has been applied in the prediction of caesarean section risk.[13] The most important benefit of the technique is that calculation of an optimal replacement constant makes missing data easy to handle and the model more robust. Smith et al. proposed that this method may be generally applicable for clinical estimation of risk.

The aims of this study were to identify the most important available predictive variables for spontaneous PTD from a large and unselected sample, apply a statistical technique developed for other obstetric outcomes, assess if spontaneous PTD can be predicted, and validate the final models in a separate validation data set that was kept apart from the developmental data base.

Methods

A nationwide population-based register study was designed using data from the Swedish Medical Birth Register. Information is collected prospectively by the staff responsible for patient care and includes demographic data, reproductive history, and complications during pregnancy, delivery, and the neonatal period. Copies of standardized individual antenatal, obstetric, and paediatric records are forwarded to the Birth Register, where the information is automatically entered into a database and stored. All births and deaths in Sweden are validated every year through individual record linkage to the Swedish Population Register, using the mothers' and infants' unique national registration numbers, assigned to each Swedish resident at birth. The Medical Birth Register is maintained by the Swedish National Board of Health and Welfare.

The Swedish Medical Birth Register contains information on 98–99% of all births in Sweden. It is one of the most complete birth registers in the world and has previously been validated.[14] The quality is also controlled on an annual basis. Register information on gestational age is considered to be reliable.[14]

In the Swedish Medical Birth Register, the following sources are available to estimate gestational age: (1) date of last menstrual period, (2) corrected expected date of parturition according to last menstrual period (the estimate made by the midwife at the antenatal care centre, essentially based on last menstrual period and menstrual cycle length), (3) expected date of parturition according to ultrasound, and (4) estimated gestational age at birth reported by the delivery unit. Using these sources in hierarchical order, the best available estimate of gestational age for each infant is determined and designated best estimate. According to this method, the gestational age determined by ultrasound was preferred when available and not too incongruous with the other sources.

We used, wherever possible, predictive variables that were collected early in pregnancy, i.e. before the second trimester. Maternal smoking was recorded at the beginning of pregnancy and reflected pre-pregnancy smoking. Haemorrhage during pregnancy referred to data collected at any time during the early phase of pregnancy.

Type of birth onset, i.e. spontaneous or induced labour or pre-labour caesarean section, has been registered in the Swedish Medical Birth Register since the late 1990s. In cases of preterm pre-labour rupture of the membranes [International Classification of Diseases (ICD) 9:6581; ICD 10:O42], births were regarded as spontaneous PTDs regardless of the reported onset of labour. This variable has previously been validated and is considered to be reliable.[1] All registered iatrogenic preterm births, according to the definition outlined above, were excluded from further analysis (outlined in the flow chart, Figure 1).

Figure 1.

Flow chart showing the data selection process.

Included in the current study were singleton pregnancies with deliveries occurring during the period 1992–2008, for which required data were available in the Swedish Medical Birth Register. Data for primiparous women and multiparous women were analysed separately. Prior to the model development, each of the 878 991 included pregnancies was randomly assigned to either the development sample (odd number in mother's day of birth) or the test sample (even number in mother's day of birth). The data preparation process is described in detail in a flow chart (Figure 1). Model validation is important,[15] and the development and test samples were kept apart.
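The assignment rule described above can be sketched in a few lines of Python. This is only an illustration of the odd/even rule as stated in the text; the function name `assign_sample` is hypothetical and the register's actual data handling is not described in the source.

```python
from datetime import date


def assign_sample(mother_birth_date: date) -> str:
    """Assign a pregnancy to a sample from the mother's day of birth.

    Odd day of the month -> development sample, even day -> test sample,
    following the rule stated in the Methods. The function name and the
    string labels are illustrative assumptions.
    """
    return "development" if mother_birth_date.day % 2 == 1 else "test"


print(assign_sample(date(1975, 3, 17)))   # odd day of month
print(assign_sample(date(1980, 11, 30)))  # even day of month
```

Because the rule depends only on the mother's date of birth, all pregnancies of the same woman fall into the same sample under this sketch.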

Statistical methods

The prediction models for primiparous women and multiparous women, respectively, were developed and evaluated in four steps. First, the development samples were used to determine the most important factors for predicting spontaneous PTD, using univariate and, finally, multivariable logistic regression analyses (see detailed description below). Secondly, the outputs of the final fitted multiple logistic regression models were used as previously described in detail by Smith and colleagues[13] in their prediction model for caesarean section risk. Thus, the outputs were converted into likelihood ratios (LR). The method can briefly be summarized into the following steps: (1) an optimal replacement constant was estimated for the included independent variables in the logistic regression model to be used when information on the variable was lacking (x1, x2 etc. is lacking); (2) the LRs were adjusted by assessing the difference between the replacement constant and the actual ‘overall log odds’ for the outcome in the development sample; and (3) the above process was repeated for each included independent variable, and the output was used to calculate adjusted LR for PTD. Thirdly, the outputs from the step above were used to calculate the individual LR for spontaneous PTD for each woman in the test samples (primiparous and multiparous women, respectively). The obtained LR estimates were used to create receiver operating characteristic (ROC) curves, and to calculate the area under the ROC curve (AUC) with 95% confidence interval [CI].
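As an illustration of the third step, the AUC can be obtained directly from the individual LR estimates and the observed outcomes. The sketch below uses the rank-based (Mann–Whitney) formulation of the AUC; the source does not state which estimator was used in Gauss, so this choice, the function name `auc_from_scores`, and the toy data are assumptions.

```python
def auc_from_scores(scores, outcomes):
    """Rank-based AUC: probability that a randomly chosen case has a higher
    score than a randomly chosen non-case (ties count as one half).

    scores   -- individual likelihood ratios (any numeric score works)
    outcomes -- 1 for spontaneous PTD, 0 otherwise
    """
    pairs = sorted(zip(scores, outcomes))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores and give each the average rank.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_cases = sum(1 for _, y in pairs if y == 1)
    n_controls = n - n_cases
    rank_sum_cases = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_cases - n_cases * (n_cases + 1) / 2) / (n_cases * n_controls)


# Toy example (made-up numbers, not study data).
print(auc_from_scores([0.5, 0.9, 1.3, 2.1], [0, 0, 1, 1]))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination, which is the scale on which the reported values of 0.58 and 0.74 should be read.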

Finally, the observed and predicted rates of spontaneous PTD in the test sample were calculated for various ‘predicted risk strata’. The predicted risk for spontaneous PTD for each woman in the test samples was calculated using Bayes' theorem:

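$$\text{posterior odds} = \text{background odds} \times \prod_{i} \mathrm{LR}_i$$

$$\text{predicted risk} = \frac{\text{posterior odds}}{1 + \text{posterior odds}}$$

where $\mathrm{LR}_i$ is the adjusted likelihood ratio for the woman's value of factor $i$.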

The initial logistic regression analyses were performed in three steps. First, for each factor evaluated, the best model (continuous linear, continuous second degree polynomial, or class variables) was determined by considering the levels of significance and the goodness of fit[16] of the different models. When the best model of each factor was chosen, the factors with P-value <0.20 were entered into a multiple logistic regression analysis. When determining the level of significance of factors represented by a second degree polynomial, or several class variables, the simultaneous significance level of the fractions was considered (and not the individual P-values). The final multiple logistic regression model included significant factors (P < 0.05) only. The best performing model was selected based on a trade-off between the overall P-value of the model and the Hosmer–Lemeshow goodness of fit test.

For both groups (primiparous and multiparous women), the following factors were evaluated in relation to spontaneous PTD using univariate logistic regression analyses: maternal characteristics (age [linear model for primiparous, second degree polynomial for multiparous women], height [linear], body mass index [BMI, kg/m2] [second degree polynomial], smoking [semi-continuous linear: 1 = no, 2 = 1–9 cigarettes per day, 3 = 10 or more cigarettes per day]), maternal pre-pregnancy disease (diabetes, hypertension, asthma, Crohn's disease, epilepsy) (yes/no classes), pregnancy complications/foetal abnormalities (urinary tract infections, haemorrhages, Down syndrome, neural tube defects, kidney malformations) (yes/no classes), discrepancy between gestational age according to ultrasound (GA_U) and gestational age based on date of last menstrual period (GA_LMP) (classes ≥ +14 days, +7 to +13 days, −6 to +6 days, −13 to −7 days, and ≤ −14 days), and obstetric history (number of previous spontaneous abortions [linear], number of years of involuntary childlessness [linear]). For multiparous women, more information regarding obstetric history was included: the number of previous children (classes 1, 2, 3, or more), gestational duration (weeks, linear) of last pregnancy, and the interval (years, linear) between the last and the current pregnancy.
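The coding of two of these variables can be sketched as follows. This is a minimal illustration, assuming that the discrepancy is computed as GA_U minus GA_LMP in days (so that a negative value means the ultrasound estimate is shorter); the function names and the class labels are hypothetical.

```python
def smoking_score(cigarettes_per_day: int) -> int:
    """Semi-continuous smoking score: 1 = no, 2 = 1-9/day, 3 = 10 or more/day."""
    if cigarettes_per_day <= 0:
        return 1
    if cigarettes_per_day <= 9:
        return 2
    return 3


def discrepancy_class(ga_u_days: int, ga_lmp_days: int) -> str:
    """Class of the discrepancy between GA_U and GA_LMP.

    Assumption: discrepancy = GA_U - GA_LMP, in days.
    """
    diff = ga_u_days - ga_lmp_days
    if diff >= 14:
        return ">=+14"
    if diff >= 7:
        return "+7 to +13"
    if diff >= -6:
        return "-6 to +6"
    if diff >= -13:
        return "-13 to -7"
    return "<=-14"


# Ultrasound estimate 10 days shorter than the LMP-based estimate.
print(discrepancy_class(270, 280))
print(smoking_score(12))
```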

The statistical analyses were performed using Gauss (GaussTM, Aptech Systems Inc., Maple Valley, WA, USA; http://www.aptech.com). The ethics committee, Göteborg, Sweden, approved the study (Reference number: Göteborg 258-07). The National Board of Health and Welfare approved the use of data from the Swedish Medical Birth Register.

Results

Table 1 shows the demographic description, maternal and foetal characteristics, maternal disease, pregnancy complications, and obstetric history, by the four study groups (one development sample and one validation sample for primiparous women and multiparous women, respectively). The rates of spontaneous PTD were 4.3% and 2.6% among primiparous and multiparous women, respectively.

Table 1. Demographic description, maternal and foetal characteristics, maternal disease, pregnancy complications, and obstetric history by study group. All values are percentages.
Characteristic | Primiparous women, development dataset (n = 199 272) | Primiparous women, test dataset (n = 190 936) | Multiparous women, development dataset (n = 249 580) | Multiparous women, test dataset (n = 239 203)
Maternal age (years) | | | |
<20 | 3.2 | 3.1 | 0.2 | 0.2
20–24 | 22.0 | 22.2 | 7.8 | 7.8
25–29 | 40.3 | 40.2 | 31.0 | 31.1
30–34 | 26.4 | 26.4 | 39.4 | 39.6
35–39 | 7.1 | 7.0 | 18.4 | 18.2
40+ | 1.0 | 1.0 | 3.3 | 3.2
Maternal height (cm) | | | |
<150 | 0.1 | 0.1 | 0.1 | 0.1
150–159 | 8.7 | 8.8 | 9.1 | 9.1
160–169 | 55.3 | 55.1 | 56.4 | 56.3
170–179 | 33.7 | 33.8 | 32.5 | 32.7
180+ | 2.2 | 2.2 | 1.9 | 1.8
Maternal BMI (kg/m2) | | | |
<20 | 12.0 | 11.9 | 9.6 | 9.7
20–24 | 58.8 | 59.0 | 55.4 | 55.3
25–29 | 21.3 | 21.1 | 24.7 | 24.7
30+ | 7.9 | 8.0 | 10.3 | 10.3
Maternal smoking | | | |
No | 88.6 | 88.7 | 87.2 | 87.2
<10 cigarettes per day | 8.5 | 8.3 | 8.1 | 8.2
≥10 cigarettes per day | 3.0 | 3.0 | 4.6 | 4.7
Difference GA_U and GA_LMP (days) | | | |
≥+14 | 1.2 | 1.2 | 1.0 | 1.0
+7 to +13 | 2.7 | 2.7 | 2.6 | 2.6
−6 to +6 | 71.2 | 71.3 | 72.3 | 72.4
−13 to −7 | 18.0 | 17.8 | 17.9 | 17.9
≤−14 | 6.9 | 6.9 | 6.2 | 6.2
Maternal diseases/pregnancy complications | | | |
Pre-existing diabetes | 0.1 | 0.1 | 0.4 | 0.4
Pre-existing hypertension | 0.3 | 0.3 | 0.4 | 0.4
Epilepsy | 0.4 | 0.4 | 0.4 | 0.4
Urinary tract infections | 12.7 | 12.7 | 10.0 | 10.1
Asthma | 7.1 | 7.1 | 5.7 | 5.7
Crohn's disease | 0.6 | 0.6 | 0.5 | 0.6
Pregnancy bleedings | 0.0 | 0.0 | 0.0 | 0.0
Spontaneous preterm delivery (<37 weeks) | 4.3 | 4.2 | 2.6 | 2.6
Foetal conditions | | | |
Down syndrome | 0.0 | 0.0 | 0.1 | 0.1
Neural tube defects | 0.0 | 0.0 | 0.0 | 0.0
Kidney malformations | 0.0 | 0.0 | 0.1 | 0.1
Infant gender | | | |
Male | 51.4 | 51.3 | 51.3 | 51.4
Female | 48.6 | 48.7 | 48.7 | 48.6
In vitro fertilization | 1.6 | 1.6 | 0.5 | 0.5
Years of involuntary childlessness (years) | | | |
0 | 90.4 | 90.3 | 95.4 | 95.4
1–2 | 7.3 | 7.4 | 3.7 | 3.7
3 or more | 2.3 | 2.3 | 0.9 | 0.8
Previous spontaneous abortions | | | |
None | 86.4 | 86.5 | 75.2 | 75.1
1–2 | 13.0 | 13.0 | 23.1 | 23.2
3 or more | 0.6 | 0.6 | 1.7 | 1.7
Previous children | | | |
None | 100 | 100 | |
1 | | | 69.4 | 69.4
2 | | | 24.6 | 24.5
3 or more | | | 6.1 | 6.1
Years since last pregnancy (years) | | | |
<2 | | | 51.4 | 51.3
2–5 | | | 33.3 | 33.5
6–9 | | | 11.1 | 11.0
≥10 | | | 4.1 | 4.1
Pregnancy duration last pregnancy (weeks) | | | |
<34 | | | 1.4 | 1.3
34–36 | | | 4.0 | 4.0
37–38 | | | 17.2 | 17.1
39–40 | | | 51.0 | 51.1
41 | | | 18.6 | 18.6
42+ | | | 7.8 | 7.8

Table 2 (primiparous women) and Table 3 (multiparous women) show the results of the final multiple logistic regression analyses evaluating the relation of the different factors to spontaneous PTD. Among primiparous women, maternal age was significant in the univariate setting (not shown), but was not significant in the initial multiple model including all factors with P < 0.2, and was therefore not included in the final analysis. Among multiparous women, the association between maternal age and spontaneous PTD was U-shaped and highly significant in both the univariate and the multivariate settings. For both primiparous and multiparous women, the association between maternal BMI and spontaneous PTD was U-shaped and highly significant. Maternal smoking was also a strong risk factor for spontaneous PTD in both groups. Of all the maternal diseases evaluated, only maternal pre-existing diabetes (primiparas and multiparas) and hypertension (primiparas) were associated with spontaneous PTD. Among primiparous women, pregnancies were more likely to end with a spontaneous PTD if the gestational age according to ultrasound differed by 1 week or more (in either direction) from the gestational age based on the date of the last menstrual period. For multiparous women, there was an increased risk of spontaneous PTD only if gestational age according to ultrasound was 1 week or more shorter than the gestational age based on the date of the last menstrual period, or if the discrepancy between gestational age according to ultrasound and the date of the last menstrual period was 14 days or more. For both primiparous and multiparous women, the three evaluated foetal abnormalities were all associated with spontaneous PTD. Foetal male gender, but not in vitro fertilization, was also associated with spontaneous PTD.
In both parity groups, there was a significant association between spontaneous PTD and both the number of years of involuntary childlessness and the number of previous spontaneous abortions. However, for multiparous women, the significance of ‘years of involuntary childlessness’ disappeared in the initial multivariate analysis, and this factor was therefore not included in the final model. Among multiparous women, women with two previous children had the lowest risk of spontaneous PTD (both univariate and multivariate analyses). In the univariate analysis, three previous children or more was associated with an increased risk of spontaneous PTD, but the association disappeared in the multivariate analysis. The risk of spontaneous PTD decreased linearly and significantly with increasing gestational duration of the last pregnancy. There was a U-shaped association between the time elapsed since the last delivery and the risk of spontaneous PTD in the current pregnancy.
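The U-shaped associations arise from the second degree polynomial terms. As a rough illustration, if the log-odds contribution of a factor x is ln(OR_linear)·x + ln(OR_quadratic)·x², the risk is lowest at x = −ln(OR_linear) / (2·ln(OR_quadratic)). The sketch below applies this to the BMI odds ratios reported in Table 2 (0.95 and 1.001); the function name `lowest_risk_value` is hypothetical, and because the published odds ratios are heavily rounded the result should be read as approximate only.

```python
from math import log


def lowest_risk_value(or_linear: float, or_quadratic: float) -> float:
    """Vertex of a quadratic log-odds term b1*x + b2*x**2.

    b1 = ln(or_linear), b2 = ln(or_quadratic). The curve is U-shaped
    (has a minimum) only when b2 > 0, i.e. or_quadratic > 1.
    """
    b1 = log(or_linear)
    b2 = log(or_quadratic)
    if b2 <= 0:
        raise ValueError("no minimum: quadratic odds ratio must exceed 1")
    return -b1 / (2 * b2)


# BMI terms for primiparous women as reported in Table 2 (rounded values).
bmi_minimum = lowest_risk_value(0.95, 1.001)
print(round(bmi_minimum, 1))
```

With these rounded inputs the minimum falls at a BMI of roughly 26, which is in line with the region of lowest likelihood ratios for BMI in Table 4. Small changes in the third decimal of the quadratic odds ratio shift this value noticeably.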

Table 2. Risk factors for spontaneous preterm delivery (<37 weeks) among primiparous women. Results from multiple logistic regression analyses
Factor | Odds ratio, final multiple model (a) | [95% CI]
Maternal age (b) | |
Maternal height (c) | 0.97 | [0.97, 0.97]
Maternal BMI, linear term (BMI) | 0.95 | Simultaneous P-value <.001
Maternal BMI, quadratic term (BMI2) | 1.001 |
Maternal smoking (d) | 1.09 | [1.00, 1.04]
Maternal pre-existing diabetes | 2.77 | [1.89, 4.06]
Maternal pre-existing hypertension | 1.79 | [1.32, 2.42]
Difference between GA_U and GA_LMP | |
≥+14 days (e) | 1.33 | [1.11, 1.60]
+7 to +13 days (e) | 0.79 | [0.67, 0.92]
−13 to −7 days (e) | 1.36 | [1.29, 1.44]
≤−14 days (e) | 1.39 | [1.28, 1.50]
Male infant gender (f) | 1.17 | [1.12, 1.22]
Down syndrome | 6.33 | [3.55, 11.28]
Neural tube defects | 3.68 | [1.41, 9.61]
Kidney malformations | 3.75 | [1.98, 7.14]
Pregnancy bleeding | 6.42 | [2.90, 14.22]
In vitro fertilization | |
Years of involuntary childlessness (g) | 1.02 | [1.00, 1.04]
Previous spontaneous abortions (h) | 1.08 | [1.03, 1.12]

  1. GA_U, Gestational age according to ultrasound; GA_LMP, Gestational age according to last menstrual period.
  2. (a) Model including all factors that were significant (all factors shown in the column) in the initial multiple model.
  3. (b) Odds ratio per 1 year increase.
  4. (c) Odds ratio per 1 cm increase.
  5. (d) Odds ratio per one step increase in a quasi-continuous scale where 1 = no, 2 = 1–9 cig./day, 3 = 10 or more cig./day.
  6. (e) Compared with difference −6 to +6 days.
  7. (f) Compared with females.
  8. (g) Odds ratio per one year increase.
  9. (h) Odds ratio per 1-step increase.

Table 3. Risk factors for spontaneous preterm delivery (<37 weeks) among multiparous women. Results from multiple logistic regression analyses
Factor | Odds ratio, final multiple model (a) | [95% CI]
Maternal age (b), linear term (years) | 0.91 | Simultaneous P-value <.001
Maternal age (b), quadratic term (years2) | 1.002 |
Maternal height (c) | 0.99 | [0.98, 0.99]
Maternal BMI, linear term (BMI) | 0.95 | Simultaneous P-value <10−6
Maternal BMI, quadratic term (BMI2) | 1.001 |
Maternal smoking (d) | 1.37 | [1.31, 1.43]
Maternal pre-existing diabetes | 2.47 | [1.95, 3.13]
Difference between GA_U and GA_LMP | |
≥+14 days (e) | 1.44 | [1.16, 1.79]
+7 to +13 days (e) | * |
−13 to −7 days (e) | 1.41 | [1.32, 1.50]
≤−14 days (e) | 1.57 | [1.43, 1.72]
Male infant gender (f) | 1.20 | [1.14, 1.26]
Down syndrome | 6.08 | [4.23, 8.75]
Neural tube defects | 3.11 | [1.20, 8.06]
Kidney malformations | 3.09 | [1.96, 4.87]
Pregnancy bleeding | 6.35 | [3.47, 11.62]
Previous spontaneous abortions (g) | 1.13 | [1.09, 1.16]
Number of previous children | |
2 (h) | 0.87 | [0.82, 0.92]
3 or more (h) | |
Gestational duration last pregnancy (i) | 0.76 | [0.76, 0.77]
Interval between pregnancies | |
<2 years (j) | 1.21 | [1.14, 1.29]
6 to 9 years (j) | 1.16 | [1.07, 1.26]
10 years or more (j) | 1.44 | [1.28, 1.62]

  1. GA_U, Gestational age according to ultrasound; GA_LMP, Gestational age according to last menstrual period.
  2. (a) Model including all factors that were significant (all factors shown in the column) in the initial multiple model.
  3. (b) Odds ratio per 1 year increase.
  4. (c) Odds ratio per 1 cm increase.
  5. (d) Odds ratio per one step increase in a quasi-continuous scale where 1 = no, 2 = 1–9 cig./day, 3 = 10 or more cig./day.
  6. (e) Compared with difference −6 to +6 days.
  7. (f) Compared with females.
  8. (g) Odds ratio per 1-step increase.
  9. (h) Compared with 1 previous child.
  10. (i) Odds ratio per 1 week increase.
  11. (j) Compared with 2–5 years.
  12. * This category was non-significant in the univariate analyses and was in the multivariate analyses merged with the reference category −6 to +6 days.

Table 4 (primiparous women) and Table 5 (multiparous women) show the LRs based on the results of the final multiple logistic regression analyses. In Appendix 1, there is an example showing how the LR tables could be used to estimate the individual risk for spontaneous PTD for a certain woman.

Table 4. Likelihood Ratios (LR) for spontaneous preterm delivery (<37 weeks) among primiparous women, by different values of each significant factor in the final multiple logistic regression model

Maternal BMI
kg/m2 | LR
15 | 1.1
16 | 1.08
17 | 1.06
18 | 1.04
19 | 1.03
20 | 1.01
21 | 1.00
22 | 0.99
23 | 0.99
24 | 0.98
25 | 0.98
26 | 0.98
27 | 0.98
28 | 0.98
29 | 0.99
30 | 1.00
31 | 1.01
32 | 1.02
33 | 1.03
34 | 1.05
35 | 1.07
36 | 1.09
37 | 1.11
38 | 1.14
39 | 1.17
40 | 1.21

Maternal height
cm | LR
150 | 1.62
151 | 1.58
152 | 1.53
153 | 1.49
154 | 1.44
155 | 1.4
156 | 1.36
157 | 1.32
158 | 1.29
159 | 1.25
160 | 1.21
161 | 1.18
162 | 1.15
163 | 1.11
164 | 1.08
165 | 1.05
166 | 1.02
167 | 0.99
168 | 0.96
169 | 0.94
170 | 0.91
171 | 0.88
172 | 0.86
173 | 0.83
174 | 0.81
175 | 0.79
176 | 0.76
177 | 0.74
178 | 0.72
179 | 0.7
180 | 0.68
181 | 0.66
182 | 0.64
183 | 0.62
184 | 0.61
185 | 0.59

Maternal smoking
Cig./day | LR
No | 0.99
1–9 | 1.06
10+ | 1.13

Involuntary childlessness
Years | LR
0 | 0.99
1 | 1.01
2 | 1.03
3 | 1.05
4 | 1.07
5 | 1.09
6 | 1.12
7 | 1.14
8 | 1.16

Discrepancy between GA_U and GA_LMP
Days | LR
≥+14 | 1.32
+7 to +13 | 0.79
−6 to +6 | 0.92
−13 to −7 | 1.27
≤−14 | 1.36

Previous spontaneous abortions
Number | LR
0 | 0.99
1 | 1.07
2 | 1.15
3 | 1.24
4 | 1.34
5 | 1.45

Pregnancy complications
Condition | LR
Bleeding | 6.13
Diabetes | 2.77
Hypertension | 1.76

Foetal abnormalities
Condition | LR
Down syndrome | 6.07
Neural tube defects | 3.68
Kidney malformations | 3.74

Foetal gender
Gender | LR
Female | 0.92
Male | 1.08

Table 5. Likelihood Ratios (LR) for spontaneous preterm delivery (<37 weeks) among multiparous women, by different values of each significant factor in the final multiple logistic regression model

Maternal age
Years | LR
16 | 1.25
17 | 1.2
18 | 1.15
19 | 1.11
20 | 1.08
21 | 1.05
22 | 1.02
23 | 1
24 | 0.99
25 | 0.97
26 | 0.96
27 | 0.96
28 | 0.95
29 | 0.95
30 | 0.96
31 | 0.96
32 | 0.97
33 | 0.99
34 | 1.01
35 | 1.03
36 | 1.05
37 | 1.08
38 | 1.12
39 | 1.16
40 | 1.21
41 | 1.26
42 | 1.32
43 | 1.38
44 | 1.46
45 | 1.54

Maternal BMI
kg/m2 | LR
15 | 1.55
16 | 1.45
17 | 1.35
18 | 1.27
19 | 1.2
20 | 1.14
21 | 1.09
22 | 1.04
23 | 1
24 | 0.97
25 | 0.94
26 | 0.92
27 | 0.9
28 | 0.88
29 | 0.87
30 | 0.87
31 | 0.86
32 | 0.86
33 | 0.87
34 | 0.88
35 | 0.89
36 | 0.91
37 | 0.93
38 | 0.96
39 | 0.99
40 | 1.03

Maternal height
cm | LR
150 | 1.28
151 | 1.26
152 | 1.24
153 | 1.22
154 | 1.2
155 | 1.19
156 | 1.17
157 | 1.15
158 | 1.13
159 | 1.12
160 | 1.1
161 | 1.08
162 | 1.07
163 | 1.05
164 | 1.04
165 | 1.02
166 | 1.01
167 | 0.99
168 | 0.98
169 | 0.96
170 | 0.95
171 | 0.94
172 | 0.92
173 | 0.91
174 | 0.9
175 | 0.88
176 | 0.87
177 | 0.86
178 | 0.84
179 | 0.83
180 | 0.82
181 | 0.81
182 | 0.8
183 | 0.78
184 | 0.77
185 | 0.76

Maternal smoking
Cig./day | LR
No | 0.93
1–9 | 1.27
10+ | 1.75

Previous spontaneous abortions
Number | LR
0 | 0.96
1 | 1.08
2 | 1.21
3 | 1.37
4 | 1.54
5 | 1.74

Discrepancy between GA_U and GA_LMP
Days | LR
≥+14 | 1.43
−6 to +13 | 0.9
−13 to −7 | 1.32
≤−14 | 1.52

Number of previous children
Number | LR
1 | 1.03
2 | 0.9
3+ | 1.03

Pregnancy complications
Condition | LR
Bleeding | 6.35
Diabetes | 2.45

Foetal abnormalities
Condition | LR
Down syndrome | 6.06
Neural tube defects | 3.11
Kidney malformations | 3.09

Foetal gender
Gender | LR
Female | 0.91
Male | 1.09

Gestational duration previous pregnancy
Weeks | LR
28 | 18.6
29 | 14.16
30 | 10.78
31 | 8.21
32 | 6.25
33 | 4.76
34 | 3.62
35 | 2.76
36 | 2.1
37 | 1.6
38 | 1.22
39 | 0.93
40 | 0.71
41 | 0.54
42 | 0.41

Interval between previous and current pregnancy
Years | LR
<2 | 1.16
2–5 | 0.92
6–9 | 1.14
≥10 | 1.42

The overall ability of the developed models to predict spontaneous PTD in the test samples was illustrated using ROC curves. For multiparous women (Figure 2), the AUC (0.74) indicated that the performance of the prediction model was satisfactory, while for primiparous women (Figure 3), the performance of the prediction model was rather poor (AUC = 0.58). Figure 2 shows that with an LR cut-off of 1.3, it would be possible to detect 50% of all cases of spontaneous PTD among multiparous women. The optimal point (the point with the largest distance to the line of unity [sensitivity = 1 − specificity]) was LR = 1.00.
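The optimal point on a ROC curve, as defined above, is the cut-off that maximizes sensitivity − (1 − specificity), since the perpendicular distance to the line of unity is proportional to this difference. A minimal sketch of that search follows; the function name `best_cutoff` and the toy data are illustrative assumptions, and the simple loop is quadratic in sample size, so it is meant for explanation rather than for register-sized data.

```python
def best_cutoff(scores, outcomes):
    """Return (cutoff, distance, sensitivity, specificity) for the cut-off
    that maximizes sensitivity - (1 - specificity).

    A woman is classified as high risk when her score (LR) is >= cutoff.
    """
    n_cases = sum(outcomes)
    n_controls = len(outcomes) - n_cases
    best = None
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and y == 1)
        false_pos = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and y == 0)
        sensitivity = true_pos / n_cases
        false_pos_rate = false_pos / n_controls
        distance = sensitivity - false_pos_rate
        if best is None or distance > best[1]:
            best = (cutoff, distance, sensitivity, 1 - false_pos_rate)
    return best


# Toy example (made-up likelihood ratios and outcomes, not study data).
lrs = [0.6, 0.8, 1.0, 1.2, 1.5, 2.0]
ptd = [0, 0, 0, 1, 0, 1]
print(best_cutoff(lrs, ptd))
```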

Figure 2.

ROC curve showing the overall ability of the developed model using the multiparous development sample to predict spontaneous preterm delivery (<37 weeks) among women in the multiparous test sample. The numbers represent likelihood ratios. Area under curve 0.74 [95% CI 0.73, 0.74].

Figure 3.

ROC curve showing the overall ability of the developed model using the primiparous development sample to predict spontaneous preterm delivery (<37 weeks) among women in the primiparous test sample. The numbers represent likelihood ratios. Area under curve 0.58 [95% CI 0.57, 0.58].

Finally, the performance of the prediction models is summarized in Table 6, which shows the observed and predicted rates of spontaneous PTD by predicted risk strata for primiparous and multiparous women. The performance was quite good for pregnancies with comparatively low risk of spontaneous PTD, whereas the ability of the prediction models to predict pregnancies with 30% or more risk of spontaneous PTD was limited.

Table 6. The observed and predicted rates of spontaneous preterm delivery (<37 weeks), by predicted risk strata
Estimated risk strata | Corresponding likelihood ratio | Total (N) | Cases (n) | % Observed cases | % Predicted cases
Primiparous women | | | | |
0–2.4% | <0.57 | 2581 | 59 | 2.3 | 2.3
2.5–4.9% | 0.58–1.18 | 149167 | 5688 | 3.8 | 3.8
5–7.4% | 1.19–1.82 | 37057 | 2180 | 5.9 | 5.7
7.5–9.9% | 1.83–2.49 | 1693 | 121 | 7.2 | 8.2
10.0–14.4% | 2.50–3.97 | 288 | 33 | 11.5 | 11.9
15–19.9% | 3.98–5.63 | 95 | 17 | 17.9 | 16.8
20–29.9% | 5.64–9.65 | 47 | 11 | 23.4 | 23.8
30–39.9% | 9.66–18.43 | 6 | 0 | 0 | 38.5
40–49.9% | 18.44–22.53 | 2 | 1 | 50.0 | 46.1
50% or more | ≥22.54 | 0 | | |
Multiparous women | | | | |
0–2.4% | <0.94 | 165805 | 2187 | 1.3 | 1.6
2.5–4.9% | 0.95–1.95 | 56782 | 2143 | 3.8 | 3.3
5–7.4% | 1.96–3.01 | 9403 | 809 | 8.6 | 6.0
7.5–9.9% | 3.02–4.13 | 2945 | 378 | 12.8 | 8.6
10.0–14.4% | 4.14–6.56 | 2038 | 336 | 16.5 | 12.0
15–19.9% | 6.57–9.30 | 790 | 135 | 17.1 | 17.2
20–29.9% | 9.31–15.95 | 665 | 134 | 20.2 | 24.3
30–39.9% | 15.96–30.45 | 422 | 68 | 16.1 | 36.4
40–49.9% | 30.46–37.22 | 81 | 17 | 21.0 | 47.2
50% or more | ≥37.23 | 272 | 50 | 18.4 | 60.2
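The correspondence between a risk stratum and its likelihood ratio follows from Bayes' theorem: the LR needed to reach a given predicted risk is the odds of that risk divided by the background odds. The sketch below uses the multiparous background odds of 0.02686 quoted in Appendix 1 and reproduces several of the multiparous LR bounds in Table 6; the function names are hypothetical, and the observed rate is simply cases divided by the stratum total.

```python
def lr_threshold(risk: float, background_odds: float) -> float:
    """Likelihood ratio at which the predicted risk equals `risk`."""
    return (risk / (1 - risk)) / background_odds


def observed_rate_percent(cases: int, total: int) -> float:
    """Observed rate of spontaneous PTD in a stratum, in percent."""
    return 100 * cases / total


MULTIPAROUS_BACKGROUND_ODDS = 0.02686  # value quoted in Appendix 1

for risk in (0.025, 0.05, 0.30, 0.50):
    print(risk, round(lr_threshold(risk, MULTIPAROUS_BACKGROUND_ODDS), 2))

# Lowest multiparous stratum in Table 6: 2187 cases among 165805 women.
print(round(observed_rate_percent(2187, 165805), 1))
```

Comparing the observed rate with the mean predicted risk within each stratum is what Table 6 reports as observed and predicted percentages.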

Comment

The predictive ability of the model for multiparous women was satisfactory as indicated by an AUC of 0.74 [95% CI 0.73, 0.74]. Also, for both primiparous and multiparous women, the performance of the prediction models was quite good for pregnancies with comparatively low risk of spontaneous PTD (<30%).

The predictive ability of the two models varied between primiparous and multiparous women. Obstetric history adds useful information and reflects the individual aspect of gestational age and emphasizes the importance of heritability and its contribution to the tendency to repeat gestational age in subsequent pregnancies.[17, 18] On the other hand, the primiparous model fared rather poorly. A major problem is that most women who deliver preterm have no obvious risk factors.[11, 19] This makes early detection of PTD very difficult, and prediction in primiparous women is limited, as shown by our data.

The technique of using Bayesian theory in predicting risk of spontaneous PTD has recently been applied by Lee et al.[20] They concluded that using Bayesian filtering was both customizable and useful in establishing a model for prediction of PTD. However, their study was very small, with only 522 women, and they had a fairly high proportion of PTD (18.4%), unlike our population. Both aspects limit the generalizability of their conclusions. Also, Schaaf and colleagues from the Netherlands have recently developed a model for prediction of singleton spontaneous PTD in a population with a PTD proportion similar to that in our data (3.8%).[21] They found both previous PTD and bleeding in pregnancy to be strong predictors of spontaneous PTD, and the accuracy of predictions decreased with increasing risk, similar to our data. However, they did not distinguish between primiparous and multiparous women, and they found a lower AUC of 0.63 [95% CI 0.63, 0.63], indicating a lower predictive ability in their combined model.

The logistic regression modelling identifies the available independent variables from the register that are important and can help in predicting spontaneous PTD. The subsequent calculation of adjusted LRs, based on the output of the final logistic regression models, has the advantage that it makes it possible to handle missing data easily. An independent prior estimate has to be used to perform a Bayesian analysis. In our calculations, this information lies in the background odds found in the development sample. If this model is to be used in other populations, the background odds for that specific population have to be used instead. It is likely that we have identified, from a large and unselected population, the explanatory variables of major importance in the prediction of spontaneous PTD. In addition, the final output gives an estimate of the individual risk, which is an important issue in the clinical situation (exemplified in Appendix 1). Clearly, the performance of the prediction models is not perfect, and it illustrates the complicated challenge of predicting risk of spontaneous PTD for an individual woman. The major strength of this statistical technique is its robustness in handling missing data through replacement coefficients. Hopefully, adding variables from clinical examination of the women (such as cervical length measurements by ultrasound) or biological markers, together with further development of the models, may make this otherwise economical and easy-to-use predictive tool clinically applicable. Cervical length measurements would have been interesting to incorporate, but unfortunately, this information was not registered in the Swedish Medical Birth Register throughout the study period.

Compared with a recently published predictive model for preterm birth by Catley et al.[11] using artificial neural networks, we look at a more specific subgroup of PTD. We have also included more independent variables than Catley and colleagues, and we have reached an even better predictive ability. None of the other developed prediction tools and risk scoring systems described is in daily clinical use, and our study shows that high-quality epidemiological and clinical data can reach at least the same or even better predictive ability than more sophisticated methods.

Our prediction models are developed and validated in a large unselected sample of Caucasian women from a Scandinavian population with low risk of spontaneous PTD and in epidemiological and clinical data of high quality from a compulsory national birth register, which is considered one of the most complete registers in the world. One crucial question to be answered is: ‘Can the models be generalized to different populations with different ethnic distributions and different proportions of spontaneous PTD?’ The multiparous model is applicable to populations with similar preterm birth rates, but the usefulness in more inhomogeneous populations, especially those with greater ethnic plurality, is thus an open question that needs to be addressed. On the other hand, we have previously in work with the presented models assessed the ethnic component, but it was left out as it added no extra predictive ability to the models. Most likely, and based on our assessments, it seems like diversity in preterm birth rate between populations may be a more important issue on generalizability than ethnic inhomogeneity.

An interesting observation from our models was that urinary tract infections in pregnancy were not associated with spontaneous PTD in this Caucasian low-risk population. This is contrary to data from other populations, in which infectious maternal diseases such as urinary tract infections, especially pyelonephritis,[22-26] and pneumonia[27-29] have been found to be associated with PTD. Our group has previously found the same lack of association in a different Scandinavian population from Norway.[30] Similarly, Schaaf et al. did not find recurrent urinary tract infections in pregnancy to be a predictive factor for spontaneous PTD in a Dutch population with a similar proportion of spontaneous PTD.[21] Consequently, this further strengthens the possibility that the link between maternal infection and PTD may vary between populations and health care settings.

Further development of this statistical technique in prediction of spontaneous PTD is warranted based on findings from our study. Prediction of spontaneous PTD proved to be difficult.

Acknowledgements

The study was supported by the Göteborg Medical Society, the Evy and Gunnar Sandberg Foundation, and the Norwegian Research Council.

Appendix 1

Case example:

Calculation of the predicted probability of spontaneous preterm delivery (<37 weeks) in a pregnant woman with three previous children, aged 38 years, BMI 20.0, height 165.0 cm, smoking habits ≥10 cigarettes/day, gestational age according to ultrasound was 10 days shorter than the corresponding age based on the date of the last menstrual period, no diabetes or pregnancy complications, no known fetal abnormalities, male fetal gender, two previous spontaneous abortions, and the last pregnancy duration was 36 weeks.

Given the background odds for spontaneous delivery <37 weeks in the multiparous test population = 0.02686, then

Posterior odds = 0.02686 × (1.03 × 1.12 × 1.14 × 1.02 × 1.75 × 1.32 × 1.09 × 1.21 × 2.1) = 0.02686 × 8.58 = 0.23

Predicted probability = 0.23/(1 + 0.23) = 0.187 = 18.7 %

Using this prediction model the multiparous woman in the case example has a 19 % risk for a spontaneous delivery before 37 completed weeks of pregnancy.
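The arithmetic of the case example can be reproduced with a short Python sketch. The likelihood ratios are the nine values used in the calculation above (read from Table 5) and the background odds is the value quoted in the text; the function name `predicted_risk` and the dictionary keys are illustrative assumptions.

```python
from math import prod


def predicted_risk(background_odds: float, likelihood_ratios) -> float:
    """Posterior odds = background odds x product of LRs; risk = odds / (1 + odds)."""
    posterior_odds = background_odds * prod(likelihood_ratios)
    return posterior_odds / (1 + posterior_odds)


# Likelihood ratios for the woman in the case example (from Table 5).
case_lrs = {
    "three previous children": 1.03,
    "age 38 years": 1.12,
    "BMI 20": 1.14,
    "height 165 cm": 1.02,
    "smoking 10+ cigarettes/day": 1.75,
    "GA_U 10 days shorter than GA_LMP": 1.32,
    "male foetus": 1.09,
    "two previous spontaneous abortions": 1.21,
    "last pregnancy 36 weeks": 2.1,
}

combined_lr = prod(case_lrs.values())
risk = predicted_risk(0.02686, case_lrs.values())
print(round(combined_lr, 2), round(100 * risk, 1))
```

To apply the same calculation to another population, the background odds would be replaced by the odds of spontaneous PTD in that population, as noted in the Comment section.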
