A nationwide population-based register study was designed using data from the Swedish Medical Birth Register. Information is collected prospectively by the staff responsible for patient care and includes demographic data, reproductive history, and complications during pregnancy, delivery, and the neonatal period. Copies of standardized individual antenatal, obstetric, and paediatric records are forwarded to the Birth Register, where the information is automatically entered into a database and stored. All births and deaths in Sweden are validated every year through individual record linkage to the Swedish Population Register, using the mothers' and infants' unique national registration numbers, assigned to each Swedish resident at birth. The Medical Birth Register is maintained by the Swedish National Board of Health and Welfare.
In the Swedish Medical Birth Register, the following sources are available to estimate gestational age: (1) date of last menstrual period, (2) corrected expected date of parturition according to last menstrual period (the estimate made by the midwife at the antenatal care centre, essentially based on last menstrual period and menstrual cycle length), (3) expected date of parturition according to ultrasound, and (4) estimated gestational age at birth reported by the delivery unit. These sources are applied in hierarchical order to determine the best available estimate of gestational age for each infant, designated the ‘best estimate’. Under this method, the gestational age determined by ultrasound is preferred when it is available and not too incongruous with the other sources.
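The hierarchy can be illustrated with a minimal sketch. It is not the register's actual algorithm: the function and parameter names, the fallback order among the non-ultrasound sources, and the 14-day tolerance used to decide whether the ultrasound estimate is 'too incongruous' are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical tolerance in days; the actual threshold is not stated in the text.
MAX_DISCREPANCY_DAYS = 14


def best_estimate(ga_ultrasound: Optional[int],
                  ga_corrected_lmp: Optional[int],
                  ga_lmp: Optional[int],
                  ga_delivery_unit: Optional[int]) -> Optional[int]:
    """Return the preferred gestational age at birth (days), or None if no source exists."""
    # Non-ultrasound sources, in an assumed order of preference.
    others = [ga for ga in (ga_corrected_lmp, ga_lmp, ga_delivery_unit)
              if ga is not None]
    if ga_ultrasound is not None:
        # Accept ultrasound if it is the only source, or if it agrees
        # with at least one other source within the tolerance.
        if not others or any(abs(ga_ultrasound - ga) <= MAX_DISCREPANCY_DAYS
                             for ga in others):
            return ga_ultrasound
    # Otherwise fall back to the first available non-ultrasound source.
    return others[0] if others else None
```

In this sketch an ultrasound estimate that disagrees with every other source by more than the tolerance is discarded in favour of the next source in the list.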
We used, wherever possible, predictive variables that were collected early in pregnancy (i.e. before the second trimester). Maternal smoking was recorded at the beginning of pregnancy and reflected pre-pregnancy smoking. Haemorrhage during pregnancy referred to haemorrhage recorded at any time during the early phase of pregnancy.
Type of birth onset, i.e. spontaneous labour, induced labour, or pre-labour caesarean section, has been registered in the Swedish Medical Birth Register since the late 1990s. In cases of preterm pre-labour rupture of the membranes [International Classification of Diseases (ICD) 9:6581; ICD 10:O42], births were regarded as spontaneous PTDs regardless of the reported onset of labour. This variable has previously been validated and is considered to be reliable. All registered iatrogenic preterm births, according to the definition outlined above, were excluded from further analysis (outlined in flow chart, Figure 1).
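The classification rule above can be sketched as follows. The onset labels ("spontaneous", "induced", "prelabour_caesarean"), the function name, and the prefix matching of diagnosis codes are assumptions for illustration; the function also assumes that it is applied to preterm births only.

```python
from typing import Iterable

# Codes for pre-labour rupture of the membranes as given in the text
# (ICD-9 6581, ICD-10 O42). Prefix matching, so that subdivisions such
# as "O42.0" are also caught, is an assumption of this sketch.
PPROM_PREFIXES = ("6581", "O42")


def classify_preterm_onset(reported_onset: str,
                           diagnosis_codes: Iterable[str]) -> str:
    """Classify a preterm birth as 'spontaneous' or 'iatrogenic'."""
    has_pprom = any(
        code.replace(".", "").upper().startswith(PPROM_PREFIXES)
        for code in diagnosis_codes
    )
    # Rupture of the membranes overrides the reported onset of labour.
    if has_pprom or reported_onset == "spontaneous":
        return "spontaneous"
    return "iatrogenic"
```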
Included in the current study were singleton pregnancies with deliveries occurring during the period 1992–2008, for which the required data were available in the Swedish Medical Birth Register. Data for primiparous women and multiparous women were analysed separately. Prior to model development, each of the 878 991 included pregnancies was randomly assigned either to the development sample (odd number in mother's day of birth) or to the test sample (even number in mother's day of birth). The data preparation process is described in detail in a flow chart (Figure 1). To allow independent validation of the models, the development and test samples were kept apart.
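The assignment rule is simple enough to state as code. The sketch below assumes the day of the month of the mother's birth is available as an integer; the function name is hypothetical.

```python
def assign_sample(mother_day_of_birth: int) -> str:
    """Assign a pregnancy to a sample from the mother's day of birth (1-31)."""
    if not 1 <= mother_day_of_birth <= 31:
        raise ValueError("day of birth must be between 1 and 31")
    # Odd day -> development sample, even day -> test sample.
    return "development" if mother_day_of_birth % 2 == 1 else "test"


# Example: split a handful of (hypothetical) pregnancies by mother's day of birth.
days = [1, 2, 15, 30, 31]
samples = {day: assign_sample(day) for day in days}
```

Because months with 31 days contribute one extra odd day, a split of this kind tends to make the development sample slightly larger than the test sample.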
The prediction models for primiparous women and multiparous women, respectively, were developed and evaluated in four steps. First, the development samples were used to determine the most important factors for predicting spontaneous PTD, using univariate and then multiple logistic regression analyses (see detailed description below). Secondly, the outputs of the final fitted multiple logistic regression models were converted into likelihood ratios (LR), as previously described in detail by Smith and colleagues in their prediction model for caesarean section risk. The method can briefly be summarized in the following steps: (1) for each independent variable included in the logistic regression model, an optimal replacement constant was estimated, to be used when information on that variable was lacking (i.e. when x1, x2, etc. were missing); (2) the LRs were adjusted by assessing the difference between the replacement constant and the actual ‘overall log odds’ for the outcome in the development sample; and (3) the above process was repeated for each included independent variable, and the output was used to calculate the adjusted LR for PTD. Thirdly, the outputs from the step above were used to calculate the individual LR for spontaneous PTD for each woman in the test samples (primiparous and multiparous women, respectively). The obtained LR estimates were used to create receiver operating characteristic (ROC) curves and to calculate the area under the ROC curve (AUC) with 95% confidence intervals (CI).
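As a rough illustration of the second and third steps, the sketch below converts a fitted log odds into an LR relative to the overall log odds and computes the AUC from a set of LRs. It is a simplification, not the procedure of Smith and colleagues: the replacement constants for missing variables are omitted, the function names are hypothetical, and the AUC is computed by direct pairwise comparison, which is only practical for small samples.

```python
import math
from typing import Sequence


def likelihood_ratio(individual_log_odds: float, overall_log_odds: float) -> float:
    """LR as the ratio of a woman's fitted odds to the overall odds of the outcome."""
    return math.exp(individual_log_odds - overall_log_odds)


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen case
    has a higher score than a randomly chosen non-case (ties count as 0.5)."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("both outcome classes must be present")
    # Pairwise comparison is O(n^2); a rank-based method would be
    # needed for a register-sized sample.
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in non_cases)
    return wins / (len(cases) * len(non_cases))
```

An AUC of 0.5 corresponds to no discrimination between women with and without the outcome, and 1.0 to perfect discrimination.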
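Finally, the observed and predicted rates of spontaneous PTD in the test sample were calculated for various ‘predicted risk strata’. The predicted risk of spontaneous PTD for each woman in the test samples was calculated using Bayes' theorem: the prior odds of spontaneous PTD were multiplied by the woman's LR to give the posterior odds, which were then converted to a probability (posterior odds / [1 + posterior odds]).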
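In its odds form, Bayes' theorem states that the posterior odds equal the prior odds multiplied by the LR. A minimal sketch of the calculation follows; the function name is hypothetical, and the prior risk of 5% used in the example is purely illustrative and not a figure from this study.

```python
def predicted_risk(lr: float, prior_risk: float) -> float:
    """Convert a likelihood ratio and a prior risk into a predicted (posterior) risk."""
    if lr < 0 or not 0 < prior_risk < 1:
        raise ValueError("lr must be >= 0 and prior_risk strictly between 0 and 1")
    prior_odds = prior_risk / (1 - prior_risk)
    posterior_odds = prior_odds * lr  # Bayes' theorem in odds form
    return posterior_odds / (1 + posterior_odds)


# Illustrative values only: an LR of 2 applied to an assumed prior risk of 5%.
example_risk = predicted_risk(2.0, 0.05)
```

An LR of 1 leaves the prior risk unchanged, while LRs above or below 1 raise or lower it.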
The initial logistic regression analyses were performed in three steps. First, for each factor evaluated, the best model (continuous linear, continuous second degree polynomial, or class variables) was determined by considering the levels of significance and the goodness of fit of the different models. Secondly, once the best model for each factor had been chosen, the factors with P-value <0.20 were entered into a multiple logistic regression analysis. When determining the level of significance of factors represented by a second degree polynomial, or by several class variables, the joint significance of all terms was considered (and not the individual P-values). Thirdly, the final multiple logistic regression model was restricted to significant factors (P < 0.05). The best performing model was selected based on a trade-off between the overall P-value of the model and the Hosmer–Lemeshow goodness of fit test.
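For readers unfamiliar with the Hosmer–Lemeshow test, the statistic can be sketched as follows. This is a generic textbook formulation with equal-sized risk groups, not the implementation used in the study; the function name and the default of ten groups are assumptions.

```python
from typing import Sequence


def hosmer_lemeshow(predicted: Sequence[float],
                    observed: Sequence[int],
                    groups: int = 10) -> float:
    """Hosmer-Lemeshow chi-square statistic over groups of increasing predicted risk."""
    # Sort by predicted risk only, keeping the original order for ties.
    pairs = sorted(zip(predicted, observed), key=lambda pair: pair[0])
    n = len(pairs)
    if n < groups:
        raise ValueError("need at least one observation per group")
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)       # expected number of events
        events = sum(y for _, y in chunk)         # observed number of events
        mean_p = expected / size
        variance = size * mean_p * (1 - mean_p)
        if variance > 0:                          # skip degenerate groups
            statistic += (events - expected) ** 2 / variance
    return statistic
```

The statistic is conventionally compared with a chi-square distribution with (groups − 2) degrees of freedom; small values indicate good agreement between observed and predicted event counts.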
For both groups (primiparous and multiparous women), the following factors were evaluated in relation to spontaneous PTD using univariate and multiple logistic regression analyses: maternal characteristics (age [linear model for primiparous, second degree polynomial for multiparous women], height [linear], body mass index [BMI, kg/m²; second degree polynomial], smoking [semi-continuous linear: 1 = no, 2 = 1–9 cigarettes per day, 3 = 10 or more cigarettes per day]), maternal pre-pregnancy disease (diabetes, hypertension, asthma, Crohn's disease, epilepsy; yes/no classes), pregnancy complications/foetal abnormalities (urinary tract infections, haemorrhages, Down syndrome, neural tube defects, kidney malformations; yes/no classes), discrepancy between gestational age according to ultrasound (GA_U) and gestational age based on date of last menstrual period (GA_LMP) (classes ≥ +14 days, +7 to +13 days, −6 to +6 days, −13 to −7 days, and ≤ −14 days), and obstetric history (number of previous spontaneous abortions [linear], number of years of involuntary childlessness [linear]). For multiparous women, additional information on obstetric history was included: the number of previous children (classes 1, 2, or 3 or more), gestational duration (weeks, linear) of the last pregnancy, and the interval (years, linear) between the last and the current pregnancy.
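Two of the codings above can be made concrete with a short sketch. The discrepancy classes are read here as ≥ +14, +7 to +13, −6 to +6, −13 to −7, and ≤ −14 days; the sign convention (GA_U minus GA_LMP), the class labels, and the function names are assumptions for illustration.

```python
def discrepancy_class(ga_u_days: int, ga_lmp_days: int) -> str:
    """Class of the discrepancy between ultrasound- and LMP-based gestational age."""
    diff = ga_u_days - ga_lmp_days  # assumed sign convention: GA_U minus GA_LMP
    if diff >= 14:
        return ">= +14"
    if diff >= 7:
        return "+7 to +13"
    if diff >= -6:
        return "-6 to +6"
    if diff >= -13:
        return "-13 to -7"
    return "<= -14"


def smoking_code(cigarettes_per_day: int) -> int:
    """Semi-continuous smoking code: 1 = no, 2 = 1-9 per day, 3 = 10 or more per day."""
    if cigarettes_per_day < 0:
        raise ValueError("cigarettes_per_day cannot be negative")
    if cigarettes_per_day == 0:
        return 1
    return 2 if cigarettes_per_day <= 9 else 3
```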
The statistical analyses were performed using Gauss (Gauss™, Aptech Systems Inc., Maple Valley, WA, USA; http://www.aptech.com). The ethics committee, Göteborg, Sweden, approved the study (Reference number: Göteborg 258-07). The National Board of Health and Welfare approved the use of data from the Swedish Medical Birth Register.