Parker JD, Liao D, Schenker N, Branum A. The use of covariates to identify records with implausible gestational ages using the birthweight distribution. Paediatric and Perinatal Epidemiology 2010.
The objective of this study was to evaluate the usefulness of covariates in identifying birth records with implausible values of gestational age. Birthweight distributions for births with early reported gestational ages are markedly bimodal, suggesting a mixture of two distributions. Most births form a normal-shaped left-hand (primary) distribution and a smaller number form the right-hand (secondary) distribution. Births in the secondary distribution are thought to have misreported gestational ages. Prior work has found that births in the secondary distribution are at higher risk of poor outcomes than those in the primary distribution. Using 2002 US Natality data for gestational ages 26–35 weeks, we fit normal mixture models to birthweight, separately by reported gestational age, with and without covariates (maternal race, education, parity, age, region of the country, prenatal care initiation). Additional models were stratified by infant sex. This approach allowed the relationship between the covariates and birthweight to differ between the components.
Mixture models fit reasonably well for reported gestational ages <33 weeks, but not for later weeks. Counter to the hypothesis, results were similar across models with and without covariates, stratification, or both, although stratified models without covariates predicted slightly more girls and slightly fewer boys in the secondary distribution than did the corresponding unstratified models. For reported gestational ages <33 weeks, predictions from the four sets of models were highly correlated, and predictions were similar for subgroups defined by the clinical estimates of gestational age and other covariates. For births with reported gestational ages of 29 or more weeks, the proportion in the secondary distribution exceeded 30%, although this varied by maternal characteristics. The use of covariates and stratification complicated model fitting without materially improving identification of implausible gestational age values, supporting inferences from prior studies using data ‘cleaned’ without consideration of maternal or infant characteristics.
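
The two-component normal mixture described above can be illustrated with a minimal sketch. It is not the authors' implementation: it omits covariates and stratification, fits a single mixture by plain EM rather than one per reported gestational week, and runs on synthetic birthweights whose means, spreads, and mixing proportion are invented purely for illustration. The function name `fit_two_normal_mixture` and all parameter values are assumptions.

```python
import numpy as np

def fit_two_normal_mixture(x, n_iter=500, tol=1e-8):
    """EM fit of a two-component univariate normal mixture (no covariates)."""
    x = np.asarray(x, dtype=float)
    # Crude 2-means initialisation, starting from the extremes of the data.
    centers = np.array([x.min(), x.max()])
    for _ in range(20):
        labels = np.abs(x[:, None] - centers).argmin(axis=1)
        centers = np.array([x[labels == k].mean() for k in (0, 1)])
    mu = centers
    sd = np.array([x[labels == k].std() for k in (0, 1)])
    weight = np.array([(labels == k).mean() for k in (0, 1)])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each record belongs to each component.
        z = (x[:, None] - mu) / sd
        dens = weight * np.exp(-0.5 * z ** 2) / (sd * np.sqrt(2.0 * np.pi))
        total = dens.sum(axis=1)
        resp = dens / total[:, None]
        # M-step: weighted updates of mixing proportions, means, and SDs.
        nk = resp.sum(axis=0)
        weight = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        # Stop once the log-likelihood no longer improves.
        ll = np.log(total).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weight, mu, sd, resp

# Synthetic data only (hypothetical values, in grams): a large primary
# component and a smaller, heavier secondary component.
rng = np.random.default_rng(0)
birthweight = np.concatenate([
    rng.normal(1200.0, 250.0, 4250),  # primary: plausible gestational age
    rng.normal(3200.0, 450.0, 750),   # secondary: suspected misreports
])

weight, mu, sd, resp = fit_two_normal_mixture(birthweight)
# Flag records more likely to belong to the secondary (right-hand) component.
flagged = resp[:, 1] > 0.5
print("estimated secondary proportion:", round(float(weight[1]), 3))
print("component means (g):", np.round(mu, 1))
print("records flagged:", int(flagged.sum()))
```

Component 1 is the right-hand component by construction of the initialisation, so `resp[:, 1]` plays the role of the predicted probability of membership in the secondary distribution. The 0.5 cut-off is an arbitrary choice for this sketch; extending it to the covariate models in the abstract would mean replacing each component mean with a regression on the covariates.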