Assessing the quality of last menstrual period date on California birth records


  • Conflicts of interest: the authors have declared no conflicts of interest.

Michelle Pearl, Sequoia Foundation c/o Genetic Disease Screening Program, California Department of Public Health, 850 Marina Bay Parkway, Rm. F175, Mail Stop 8200, Richmond, CA 94804, USA.


Birth certificate last menstrual period (LMP) date is widely used to estimate gestational age in the US. While data quality concerns have been raised, no large population-based study has isolated data quality issues by comparing birth record LMP (Birth LMP) with reliable LMP dates from another source. We assessed LMP data quality in 2002 California singleton livebirth records (n = 515 381) and in a subset of records with linked prenatally collected LMP from California's statewide Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) (n = 105 936). Missing or incomplete LMP data affected 13% of birth records; 17% of those had complete LMP within XAFP records.

Data quality indicators supported XAFP LMP as more accurate than Birth LMP, with a lower prevalence of digit preference, post-term delivery, out-of-range gestational age estimates and implausible birthweight-for-gestational age. The bimodal birthweight distribution evident at 20–31 weeks' gestation based on Birth LMP was nearly absent with XAFP LMP-based gestational age. Approximately 32% of the second birthweight mode was explained by apparent clerical errors in Birth LMP month. Digit preference errors, particularly day 1, were associated with gestational age overestimation. Preterm delivery rates were higher according to Birth (7.6%) vs. XAFP LMP (7.2%). One-fifth of observed preterm and over half of observed post-term births using Birth LMP were not true cases; 15% of true preterm cases were missed. African American or Hispanic, less educated, and publicly or uninsured women were most likely to be misclassified and have large LMP date discrepancies attributable to clerical or digit preference error. The implementation of a revised birth certificate is an opportunity for targeted training and data entry checks that could substantially improve LMP accuracy on birth records.


Last menstrual period (LMP) date is the most widely available source for estimating gestational age from birth certificates in the US, and is the only source from the California certificate of livebirth before 2007. However, gestational age estimates from LMP in general, and from birth records in particular, are prone to error, as exhibited by digit preference1–3 and implausible values relative to birthweight.4 Errors in gestational age estimates from LMP have resulted in excess post-term births relative to ultrasound estimates1,5 and a bimodal birthweight distribution among very early preterm deliveries6,7 not observed for very early preterm deliveries identified through clinical and ultrasound estimates.8,9

It is unknown to what extent birth certificate LMP data quality is affected by recall difficulties and clerical error, beyond limitations inherent in the LMP dating method and its assumption of conception 14 days after the first day of menstrual bleeding (e.g. cycle length variability, amenorrhoea, non-menstrual vaginal bleeding mistaken for a normal period).10 Digit preference in the reported day of the month, an indication of recall error, is prevalent in LMP from birth records as well as medical records.1–3 Quantification of clerical errors in recording and entry, such as month or year discrepancies and month/day transpositions, requires comparison of LMP from different sources, yet no such population-based comparisons have been published.

The goals of this analysis are to (1) establish whether prenatally collected LMP data from California's centralised prenatal screening programme is more accurate than LMP data from linked birth records; (2) quantify the magnitude and impact of gestational age reporting errors; (3) determine to what extent clerical and recall error contribute to discrepancies in LMP dating; and (4) identify population subgroups most affected by poor LMP data quality. By comparing LMP dates from birth records with a population-based source of reliable LMP data, the study design isolates reporting error in LMP rather than errors inherent in the LMP dating methodology.


California singleton livebirth records from 2002 (n = 515 389) were linked to data from pregnant women enrolled in the statewide Expanded Alpha-fetoprotein Screening Program (XAFP) between July 2001 and December 2002. The XAFP is a voluntary, triple marker screening programme offered to all women entering prenatal care by 20 weeks' gestation. In order to interpret serological markers, the programme requires an estimate of gestational age based on ultrasound, LMP, or physical examination, which is reported by the medical provider at the time of maternal blood collection (between 15 and 20 weeks' gestation) and double-key entered by programme personnel. The programme assigns a ‘best estimate’ of gestational age that prioritises ultrasound when available as the ‘gold standard’, unless otherwise specified by the provider. Between 20% and 25% of records are routinely verified with providers before serological interpretation, and those with positive or uninterpretable screen results (roughly an additional 8%) receive further follow-up to confirm gestational age.

Probabilistic matching was used to link records from the XAFP and birth certificates, using mother's name, date of birth, social security number, delivery date, XAFP accession date, telephone number, street address, city and zip code.11 A conservative certainty cut-off was used to minimise false matches. Overall, 327 218 livebirth records (63%) linked to an XAFP record from the same pregnancy. As a quality control measure, 1800 records with large gestational age discrepancies or whose birth records indicated no prenatal care before 6 months' gestation were reviewed for matching accuracy, yielding six likely mismatches (0.4%). No mismatches were found from manual review of records with out-of-range gestational age values based on XAFP LMP (<20 or >45 weeks, n = 45).

Of 515 389 birth records in 2002, eight birth records with missing birthweight, 29 468 missing date of LMP and 37 155 missing only day of LMP were excluded, yielding 448 758 complete records. Comparisons with XAFP LMP data are based on 105 936 birth records with complete LMP date linked to an XAFP record with LMP date as the ‘best estimate’ of gestational age.

Data quality indicators evaluated include the proportion with post-term deliveries, out-of-range gestational age, implausible birthweight-for-gestational age, very preterm births with implausibly high birthweights (second birthweight mode), and digit preference. Gestational age was calculated as the neonate's date of birth minus the LMP date, with those <20 completed weeks or >44 completed weeks considered out-of-range and excluded from rate calculations. Preterm was defined as 20–36 completed weeks and post-term as 42–44 completed weeks. Implausible birthweight-for-gestational age was determined according to National Center for Health Statistics cut points (<20 weeks, ≥1000 g; 20–23 weeks, ≥2000 g; 24–27  weeks, ≥3000 g; 28–31 weeks, ≥4000 g; 32–47 weeks, ≤1000 g).12
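The gestational age rules above can be expressed compactly. The following Python sketch is illustrative only and is not the code used in the analysis; the function names are hypothetical, and returning no flag for gestational ages beyond 47 weeks (for which no cut point is listed) is an assumption.

```python
from datetime import date


def completed_weeks(birth_date: date, lmp_date: date) -> int:
    """Gestational age in completed weeks: days from LMP to birth, floored to weeks."""
    return (birth_date - lmp_date).days // 7


def ga_category(weeks: int) -> str:
    """Classify completed weeks using the category boundaries given in the text."""
    if weeks < 20 or weeks > 44:
        return "out-of-range"  # excluded from rate calculations
    if weeks <= 36:
        return "preterm"  # 20-36 completed weeks
    if weeks <= 41:
        return "term"  # 37-41 completed weeks
    return "post-term"  # 42-44 completed weeks


def implausible_birthweight(weeks: int, grams: float) -> bool:
    """Apply the birthweight-for-gestational age cut points listed in the text."""
    if weeks < 20:
        return grams >= 1000
    if weeks <= 23:
        return grams >= 2000
    if weeks <= 27:
        return grams >= 3000
    if weeks <= 31:
        return grams >= 4000
    if weeks <= 47:
        return grams <= 1000
    return False  # assumption: no cut point is listed beyond 47 weeks


# Hypothetical record: LMP on 1 January 2002, delivery on 8 October 2002 (280 days).
weeks = completed_weeks(date(2002, 10, 8), date(2002, 1, 1))
print(weeks, ga_category(weeks), implausible_birthweight(weeks, 3400))
```

Out-of-range values are classified before the preterm and post-term checks so that they cannot enter either numerator.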

To examine the bimodal birthweight distribution, birthweight density plots were generated from birth records for births at 20–27 weeks' and 28–31 weeks' gestation, as defined by Birth LMP and XAFP LMP, using kernel density estimation.13 Birthweights ≥2200 g at 20–27 weeks' gestation and ≥2700 g at 28–31 weeks' gestation were considered to be in the second birthweight mode. LMP days of the month with frequency greater than expected by chance include 1, 5, 10, 15, 20, 25 and 28. The overall expected proportion with any preferred digit is 23.0%, and the expected proportion for each of days 1–28 is 3.3%. The magnitude of measurement error in gestational age from Birth LMP dates was estimated by the difference between Birth LMP and XAFP LMP gestational age estimates. Positive differences represent overestimation of birth gestational age relative to XAFP. Among discrepant records, the R² from linear regression models of birthweight on gestational age, defined by either Birth LMP or XAFP LMP, was assessed. We further examined false-positive preterm and post-term rates (1 − specificity = false positives/all gold-standard negatives), false-negative preterm rates (1 − sensitivity = false negatives/all gold-standard positives), and false-positive preterm and post-term screen rates (1 − positive predictive value = false positives/screen positives), treating XAFP gestational age as the gold standard.
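One way to arrive at the expected digit frequencies quoted above is to assume that LMP dates fall uniformly across the calendar, so that each of days 1–28 occurs 12 times in an average 365.25-day year. That uniformity assumption belongs to this sketch and is not a stated method of the analysis. The sketch also illustrates the sign convention for the Birth minus XAFP difference; all names are hypothetical.

```python
from datetime import date

PREFERRED_DAYS = (1, 5, 10, 15, 20, 25, 28)

# Assumption: uniform LMP dates, so each of days 1-28 appears 12 times per 365.25 days.
EXPECTED_PER_DAY = 12 / 365.25
EXPECTED_ANY_PREFERRED = len(PREFERRED_DAYS) * EXPECTED_PER_DAY


def preferred_digit_share(lmp_dates) -> float:
    """Observed proportion of LMP dates whose day of the month is a preferred digit."""
    dates = list(lmp_dates)
    hits = sum(d.day in PREFERRED_DAYS for d in dates)
    return hits / len(dates)


def ga_difference_days(birth_lmp: date, xafp_lmp: date) -> int:
    """Birth minus XAFP gestational age in days.

    Both gestational ages are measured to the same delivery date, so the
    difference reduces to xafp_lmp - birth_lmp. A positive value means the
    Birth LMP overestimates gestational age relative to XAFP.
    """
    return (xafp_lmp - birth_lmp).days


print(round(100 * EXPECTED_PER_DAY, 1), round(100 * EXPECTED_ANY_PREFERRED, 1))
```

Comparing `preferred_digit_share` against `EXPECTED_ANY_PREFERRED` gives the excess attributable to digit preference under the stated assumption.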

Two error flags were evaluated to explain discrepancies: clerical error and digit preference (indicating recall error). Clerical error types were suggested by Blair et al.14 as well as the distribution of observed discrepancies and include: dates that differ in only the month or year field; dates that differ by 1 in the tens digit of the day field (e.g. day 1 vs. 11); transposed month and day; LMP equal to the delivery date; or LMP 28 days or less before the child's date of birth, possibly reflecting an estimated delivery date. The electronic birth recording system used to enter 90% of records in California in 2002 did not allow LMP entries with dates beyond the delivery date. The XAFP data entry programme triggers a double-check for LMP dates beyond the date of blood collection. Records with preferred digit LMP days were labelled ‘digit preference errors’ if the date was discrepant from the XAFP date and the discrepancy was not also considered a clerical error. The proportion of discrepancies and poor data quality indicators ‘explained’ by each error type was evaluated by calculating the percentage change in prevalence of each indicator when substituting XAFP LMP values for Birth LMP values for records flagged as either clerical or digit-preference error.
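The two error flags can be sketched as simple date comparisons. This Python example is a minimal illustration under stated assumptions, not the authors' implementation: the function names and labels are hypothetical, the order in which overlapping rules are checked is an assumption, and the tens-digit rule is assumed to require an unchanged units digit (as in the day 1 vs. 11 example).

```python
from datetime import date
from typing import Optional

PREFERRED_DAYS = (1, 5, 10, 15, 20, 25, 28)


def clerical_error_type(birth_lmp: date, xafp_lmp: date, delivery: date) -> Optional[str]:
    """Return a label for the suspected clerical error, or None if no rule applies."""
    if birth_lmp == xafp_lmp:
        return None  # not discrepant
    if birth_lmp == delivery:
        return "lmp_equals_delivery_date"
    if 0 < (delivery - birth_lmp).days <= 28:
        return "possible_estimated_delivery_date"
    b, x = birth_lmp, xafp_lmp
    if (b.year, b.day) == (x.year, x.day) and b.month != x.month:
        return "month_only"
    if (b.month, b.day) == (x.month, x.day) and b.year != x.year:
        return "year_only"
    same_month = (b.year, b.month) == (x.year, x.month)
    # Assumption: tens digit differs by exactly 1 and the units digit is unchanged.
    if same_month and abs(b.day // 10 - x.day // 10) == 1 and b.day % 10 == x.day % 10:
        return "day_tens_digit"
    if b.year == x.year and (b.month, b.day) == (x.day, x.month):
        return "month_day_transposed"
    return None


def digit_preference_error(birth_lmp: date, xafp_lmp: date, delivery: date) -> bool:
    """Preferred-digit day that is discrepant from XAFP and not already a clerical error."""
    return (
        birth_lmp != xafp_lmp
        and birth_lmp.day in PREFERRED_DAYS
        and clerical_error_type(birth_lmp, xafp_lmp, delivery) is None
    )
```

Checking clerical rules first mirrors the definition in the text, in which a preferred-digit discrepancy counts as a digit preference error only when it is not also a clerical error.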

The relationship between birth certificate demographic and obstetric characteristics and data quality, misclassification and gestational age estimates was examined by comparing prevalence across subgroups defined by: self-reported race/ethnicity, with Hispanic ethnicity stratified by mother's birthplace [US-born or foreign-born (Mexico in 87.5% of cases)]; maternal age; years of completed education categorised as <12, 12 and >12; parity (number of livebirths before current delivery); and source of payment for delivery, grouped as Medi-Cal (California's Medicaid programme), private insurance, uninsured or other (Medicare, worker's compensation, other governmental and non-governmental programmes).


Data completeness and population selection factors are assessed in Table 1. In 2002 birth records, 12.9% of deliveries were missing LMP dates, 55.8% of those missing day only (data not shown). Missing or incomplete LMP data on birth records were associated with African American and US-born Hispanic race/ethnicity, younger maternal age, higher prevalence of low birthweight, less than high-school education, and Medi-Cal coverage (Table 1). Of records with missing or incomplete LMP, 39.2% had complete ultrasound data and 16.6% had complete LMP data in linked XAFP records (data not shown).

Table 1.  Characteristics of linked and unlinked study populations, California 2002 Live Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records

Columns, left to right (cell values are %):
 2002 Livebirthsa (n = 515 381)
   Missing or incomplete LMP date, n = 66 623 (12.9%)
   With LMP date, n = 448 758 (87.1%)
 2002 Livebirthsa with Birth LMP (n = 448 758)
   Not linked to XAFPb, n = 178 012 (39.7%)
   Linked to XAFPc, n = 270 746 (60.3%)
 2002 Livebirthsa with Birth LMP, linked to XAFP (n = 270 746)
   XAFP Ultrasoundd, n = 164 810 (60.9%)
   XAFP LMPd, n = 105 936 (39.1%)

 African American  7.
 Hispanic, US-born  21.9  17.5  16.1  18.3  18.1  18.7
 Hispanic, foreign-born  28.9  33.0  34.4  32.2  30.6  34.7
 Pacific Islander  3.
 American Indian/Alaskan Native  0.
Age (years)
Education (years)
Previous livebirths (parity)
Birthweight (g)
Method of payment for delivery
 Any private  48.8  53.0  44.4  58.7  62.3  53.1
Birth LMP gestational age (completed weeks)
 Preterm:e 20–36  NA  9.
 Post-term:e 42–44  NA  6.

  • a Excludes n = 8 records missing birthweight.
  • b Also includes n = 12 239 records that linked to XAFP but had no XAFP LMP or ultrasound data.
  • c Records with XAFP LMP or ultrasound data.
  • d Best estimate of gestational age used by the state-sponsored prenatal screening programme to interpret serological markers.
  • e Denominator excludes records with gestational ages <20 and >44 completed weeks.
  • LMP, last menstrual period; NA, not applicable.
Compared with non-XAFP participants, XAFP participants were more likely to be under the age of 34 years, to have no previous livebirths, to have completed more than 12 years of education, and to be privately insured (Table 1). Among XAFP participants, women with LMP as opposed to ultrasound ‘best estimates’ were more likely to be foreign-born Hispanic, have less than high-school education, and have Medi-Cal coverage. Both preterm and post-term birth rates derived from birth certificate LMP were higher among XAFP participants with ultrasound best estimates compared with those with LMP best estimates (8.9% vs. 7.6% and 8.2% vs. 3.7%, respectively).

XAFP LMP appears to suffer from fewer data quality problems than Birth LMP, as evidenced by fewer out-of-range gestational age values, fewer preferred digits, lower post-term rates and lack of a bimodal birthweight distribution at early gestational ages (Table 2). Preterm birth prevalence was higher according to linked Birth LMP than XAFP LMP (7.6% vs. 7.2%). Birth records linked to XAFP records had lower prevalence of out-of-range gestational age, post-term births and implausible birthweight-for-gestational age than the overall birth population. Day 1 was the most commonly reported day in overall birth records, and day 15 was most commonly reported by both Birth LMP and XAFP LMP within the linked sample. While digit preference is evident in both data sources for LMP date, over-reporting of days 1 and 15 of the month was higher in Birth LMP vs. XAFP LMP dates (Table 2).

Table 2.  Data quality indicators by study population and LMP data source, California 2002 Live Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records

                                              2002 Livebirths   2002 Livebirths linked to XAFP records with LMP
                                              Birth LMP         Birth LMP         XAFP LMP
                                              (n = 448 758)     (n = 105 936)     (n = 105 936)
Gestational age at birth (completed weeks)
 Out-of-range: <
 Out-of-range: >44                            1.6               0.6               0.03
 Preterm:a 20–36                              9.0               7.6               7.2
 Term:a 37–41                                 84.4              88.7              90.5
 Post-term:a 42–44                            6.6               3.7               2.3
Digit preference, LMP dayb
 Any preferred digit                          36.7              34.3              31.4
Implausible birthweight-for-gestational age   0.2               0.1               0.02
Second birthweight modec                      % (n)             % (n)             % (n)
 20–27 weeks                                  22.6 (477)        14.1 (47)         1.9 (6)
 28–31 weeks                                  29.2 (1015)       21.0 (124)        6.4 (33)
 Overall: 20–31 weeks                         26.7 (1492)       18.5 (171)        4.7 (39)

  • a Denominator excludes records with gestational ages <20 and >44 completed weeks.
  • b Expected frequency of preferred digits is 3.3%.
  • c Proportion with birthweight ≥2200 g among deliveries 20–27 weeks and ≥2700 g among deliveries 28–31 weeks.
  • LMP, last menstrual period.

The proportion of very preterm births falling within the second birthweight mode was largest among the overall birth population (26.7% of all births between 20 and 31 weeks), and was four times greater when using Birth LMP than XAFP LMP to estimate gestational age in the linked sample (Table 2). The second birthweight mode all but disappeared within 20–27 weeks' gestation when gestational age was derived from XAFP LMP (Fig. 1), and was greatly attenuated between 28 and 31 weeks (Fig. 2).

Figure 1.

Birthweight distribution within LMP-based gestational age 20–27 completed weeks from birth (n = 333) and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP, n = 315) records, California 2002 Linked Birth and Prenatal Screening records.

Figure 2.

Birthweight distribution within LMP-based gestational age 28–31 completed weeks from birth (n = 590) and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP, n = 517) records, California 2002 Linked Birth and Prenatal Screening records.

The majority of Birth LMP and XAFP LMP dates are identical (71.1%), and 65.0% of discrepancies amount to ≤1 week in either direction (Table 3). Among discrepant records, XAFP LMP-derived days of gestation have a stronger association with birthweight than Birth LMP-derived days of gestation (n = 30 624; R² = 0.27 and R² = 0.01, respectively). Large (>2 weeks) gestational age overestimates are 75% more common than large underestimates (Table 3; 3.7% vs. 2.1%), and account for 97.2% of gestational ages >44 weeks and 41.6% of post-term births (data not shown). Birth LMP dates with preferred digits have larger discrepancies and greater gestational age overestimation than dates with non-preferred digits. Among Birth LMP dates with day 1 of the month, 16.2% overestimate gestational age by more than 2 weeks whereas 2.6% underestimate gestational age. The vast majority of records in the second birthweight mode (79.5%) underestimate gestational age by more than 31 days relative to XAFP LMP gestational age (Table 3).

Table 3.  Magnitude of difference between gestational ages calculated from Birth LMP vs. XAFP LMP date, by data quality indicators, California 2002 Linked Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records (n = 105 936)

Birth minus XAFP          Birth LMP data quality indicators
                          % Overall        % Among             % Among        % Implausible               % Among 2nd
                          (n = 105 936)    preferred digits    day 1          birthweight-for-            birthweight mode
                                           (n = 36 333)        (n = 6614)     gestational age (n = 91)    (n = 171)
Gestational age (days)
 0 (no difference)        71.1             65.7                52.1           7.7                         9.9
 −1 to −7                 10.
 −8 to −
 −15 to −31               1.
 <−14 days                2.1              2.6                 2.6            83.5                        87.1
 >14 days                 3.7              5.9                 16.2           6.6                         0.0

  1. LMP, last menstrual period.

Table 4 shows the cross-classification of gestational age categories according to Birth LMP and XAFP LMP. Within Birth LMP-based gestational age groups of 20–31 and 32–36 weeks, 12.4% and 21.4%, respectively, are term births based on XAFP LMP. More than half of post-term births and 83.3% of those >44 weeks according to Birth LMP are term births based on XAFP LMP. The rate of false-negative preterm births is 15.2%, and 20.5% of observed preterm births and 53.9% of observed post-term births are false positives. While the majority of these misclassifications result from discrepancies of >2 weeks, 30.6% of preterm false negatives, 22.4% of preterm false positives and 22.9% of post-term false positives result from discrepancies of ≤14 days (data not shown). Birth records missing LMP dates with linked XAFP LMP data have higher preterm rates than linked records not missing LMP dates (9.8% and 7.2%, respectively) (Table 4).

Table 4.  Distribution of XAFP gestational age within Birth LMP gestational age categories (completed weeks), California 2002 Linked Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records (n = 105 936)

Birth LMP-based          XAFP LMP-based gestational agea
gestational age
(completed weeks)        <20    20–31    32–36     37–41    42–44    >44    Total %    Total N
 32–36                     1       52    5 540     1 523       10      2        6.7      7 128
 37–41                     3       30    1 055    91 694      598      7       88.2     93 387
 42–44                     0        1       57     2 010    1 770      5        3.6      3 843
 N                        17      832    6 765    95 877    2 417     28                105 936
 N                         2      123      959     9 693      284      9                 11 070

Preterm false-positive rateb              1 652/97 724 = 1.7%
Preterm false-negative rateb              1 143/7 535 = 15.2%
Preterm false-positive screen rateb       1 652/8 044 = 20.5%
Post-term false-positive rateb            2 068/102 876 = 2.0%
Post-term false-positive screen rateb     2 068/3 838 = 53.9%

  • a Bolded diagonal values indicate birth records correctly categorised according to XAFP gestational age categories.
  • b Calculations exclude Birth and XAFP gestational ages <20 and >44 completed weeks (total n = 105 259). Because post-term births derived from either LMP source may be unreliable, a post-term false-negative rate is not presented.
  • LMP, last menstrual period.
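The misclassification rates reported with Table 4 follow directly from their numerators and denominators. The Python sketch below recomputes them from the published counts, treating XAFP gestational age as the gold standard; the function and argument names are hypothetical and introduced only for illustration.

```python
def misclassification_rates(false_pos, gold_neg, screen_pos, false_neg=None, gold_pos=None):
    """Return rates in percent, rounded to one decimal place.

    false_pos  : Birth LMP positive, XAFP LMP negative
    gold_neg   : all XAFP LMP negatives (denominator for 1 - specificity)
    screen_pos : all Birth LMP positives (denominator for 1 - positive predictive value)
    false_neg  : Birth LMP negative, XAFP LMP positive (optional)
    gold_pos   : all XAFP LMP positives (denominator for 1 - sensitivity, optional)
    """
    rates = {
        "false_positive_rate": round(100 * false_pos / gold_neg, 1),
        "false_positive_screen_rate": round(100 * false_pos / screen_pos, 1),
    }
    if false_neg is not None and gold_pos is not None:
        rates["false_negative_rate"] = round(100 * false_neg / gold_pos, 1)
    return rates


# Counts as published with Table 4 (in-range records only, total n = 105 259).
preterm = misclassification_rates(
    false_pos=1652, gold_neg=97724, screen_pos=8044, false_neg=1143, gold_pos=7535
)
post_term = misclassification_rates(false_pos=2068, gold_neg=102876, screen_pos=3838)
print(preterm)
print(post_term)
```

As a consistency check, the preterm gold-standard negatives and positives (97 724 + 7 535) sum to the 105 259 in-range records noted in the table footnote.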

Of all gestational age discrepancies, 46.3% can be described as either clerical or digit preference errors. Clerical errors observed from discrepancies between Birth LMP and XAFP LMP dates represent 2.7% of all linked records and 9.3% of all discrepancies, whereas the prevalence of non-clerical digit preference error is 10.7% of all linked records and 37.0% of all discrepancies. Among clerical errors, 2.2% are whole year deviations, 0.9% possible confusions with estimated delivery date, 47.7% whole month deviations, 1.2% month/day transpositions, and 47.8% 10-day deviations. Among clerical errors, XAFP LMP gestational age is more closely related to birthweight (R² = 0.33), whereas no relationship exists between Birth LMP gestational age and birthweight (R² = 0.00).

Proportions displayed in Fig. 3 represent the amount by which the prevalence of each data quality indicator decreases when clerical or digit preference errors in birth records are corrected using XAFP LMP as the gold standard. Clerical errors are associated more with large underestimates than overestimates of gestational age, resulting in 33.1% of the preterm false positives and 31.6% of the second birthweight mode observed between 20 and 31 weeks (all of the latter involved errors in the month field). Digit preference error, especially day 1 error, is associated with large gestational age overestimates. Day 1 errors, while representing only 2.7% of linked records, disproportionately contribute to post-term out-of-range gestational ages, post-term false positives and missed preterm cases.
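The 'proportion explained' described above can be read as the relative reduction in an indicator's prevalence once flagged records are corrected. The Python sketch below illustrates that calculation under simplifying assumptions: indicators depend only on gestational age in completed weeks, the record layout and names are hypothetical, and the toy records are invented for illustration and are not study data.

```python
from typing import Callable, Iterable, Mapping


def proportion_explained(
    records: Iterable[Mapping],
    indicator: Callable[[int], bool],
    error_flag: str,
) -> float:
    """Relative drop in indicator prevalence after correcting one error type."""
    before = after = 0
    for rec in records:
        before += indicator(rec["birth_ga"])
        # Substitute the XAFP value only for records carrying the given flag.
        corrected = rec["xafp_ga"] if rec["flag"] == error_flag else rec["birth_ga"]
        after += indicator(corrected)
    if before == 0:
        raise ValueError("indicator is absent under Birth LMP")
    return (before - after) / before


def is_preterm(weeks: int) -> bool:
    return 20 <= weeks <= 36


def is_over_44(weeks: int) -> bool:
    return weeks > 44


# Invented toy records (gestational age in completed weeks), for illustration only.
toy = [
    {"birth_ga": 30, "xafp_ga": 38, "flag": "clerical"},
    {"birth_ga": 45, "xafp_ga": 40, "flag": "digit_preference"},
    {"birth_ga": 46, "xafp_ga": 45, "flag": None},
    {"birth_ga": 40, "xafp_ga": 40, "flag": None},
]
print(proportion_explained(toy, is_over_44, "digit_preference"))
```

Records without the chosen flag keep their Birth LMP value, so the result isolates the contribution of a single error type.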

Figure 3.

Birth/XAFP LMP date discrepancies and poor data quality indicators: proportion explained by clerical error and digit preference error, California 2002 Linked Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records (n = 105 936). LMP, last menstrual period.

Discrepancies between Birth LMP and XAFP LMP gestational age estimates vary by population subgroup (Table 5). Large underestimation of gestational age, clerical errors and false-positive preterm rates are apparent among foreign-born Hispanics, younger women with less than high-school education, women with high parity and with Medi-Cal or no insurance. Large overestimation of gestational age, digit preference and post-term false-positive rates are observed among African Americans, Native Americans, women with low education level, high parity, and Medi-Cal or no insurance. Preference for LMP day 1 is most prevalent among African Americans (data not shown) while clerical errors are more prevalent among foreign-born Hispanics.

Table 5.  Maternal and infant characteristics by gestational age categories and data quality indicators, California 2002 Linked Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records (n = 105 936)
Digit preference error, Birth LMP
Clerical error, Birth LMP
Preterm rate, Birth LMP
Preterm rate, XAFP LMP
Preterm false-negative ratea
Preterm false-positive ratea
Preterm false-positive screen ratea

 African American  2.2  5.4  12.9  2.7  11.3  10.9  13.1  2.1  16.1
 Hispanic, US-born  2.
 Hispanic, foreign-born  3.
 Pacific Islander  1.
 Native American  1.7  3.9  11.
Age (years)
Education (years)
Previous livebirths (parity)
Method of payment for delivery
 Any private  1.

  • a Excludes Birth and XAFP gestational ages <20 and >44 completed weeks (total n = 105 259, see Table 4 for detail).
  • LMP, last menstrual period.

Rates of preterm birth are approximately 5% lower across population subgroups when defined according to XAFP LMP than according to Birth LMP, with the exception of foreign-born Hispanics, whose preterm rates are 10% lower using XAFP LMP (Table 5). The preterm birth rate among foreign-born Hispanics appears to be lower than that for US-born Hispanics using XAFP LMP, while rates are identical using Birth LMP. Other preterm birth rate comparisons among subgroups change little based on LMP data source. However, overall preterm birth rates mask substantial misclassification in both directions. Among African Americans, for example, 13.1% of preterm cases are missed using Birth LMP, whereas 16.1% of presumed preterm cases are not true cases. Foreign-born Hispanics have the highest preterm false-positive and false-positive screen rates based on Birth LMP (2.3% and 26.4%, respectively). Native Americans and foreign-born Hispanics have the highest preterm false-negative rates (19.2% and 18.2%, respectively). Medi-Cal coverage and lack of insurance, high parity, young age and low education level are associated with high misclassification of preterm births in both directions.

Overall post-term rates are 36% lower using XAFP LMP compared with Birth LMP; however, this decrease is higher among African Americans, women aged over 35 years, women with high parity and women with Medi-Cal or no insurance. African Americans and the uninsured have the highest post-term false-positive rates (2.6% each), followed by Native Americans, women with less than high-school education and women with Medi-Cal (data not shown).


This is the first study to compare LMP dates from birth certificates with a large, population-based source of reliable, prenatally collected LMP data in order to isolate data reporting errors. Birth LMP was discrepant with XAFP LMP nearly a third of the time, resulting in one-fifth of preterm births and half of post-term births from birth records representing false positives, and 15% of true preterm cases being missed. Agreement within 1 week was larger in the current study than a previous comparison of LMP-based gestational age from birth records with gestational age from medical charts among normal-birthweight babies in northern California (89% and 77–78%, respectively); however, some chart estimates in that smaller study were derived from ultrasound.15

While menstrual dating has inherent flaws for estimating gestational age, the recording of LMP date itself is prone to errors amenable to improvement. California's centralised XAFP prenatal screening programme is the largest in the country, serving approximately 70% of pregnant women in the State. As accurate gestational age is needed for interpretation of risks for trisomies and neural tube defects, XAFP data provide a population-based source of gestational age in California. Until now, only vital records have provided sufficient numbers of very early deliveries to examine the bimodal distribution of birthweight. The second birthweight mode at early gestations appears to be largely an issue of clerical and recall error, rather than pathological non-menstrual bleeding misidentified as a normal menstrual cycle.6 XAFP LMP is more accurate than LMP from birth certificates, as demonstrated by lower rates of digit preference, out-of-range gestational ages, implausible birthweight-for-gestational age and post-term births. Over half of large discrepancies in LMP dates were explained by suspected clerical and digit preference errors, indicating that quality control measures have the potential to improve gestational age estimates.

Clerical errors may arise from recording dates from the wrong field (e.g. estimated due date14 or child's date of birth), manual error transcribing a date into a chart or worksheet, or typographical error on data entry. In this analysis, assessment of clerical error may have been incomplete as misread digits in the day field were only assessed if the tens digit differed by one or the month and day were transposed. On the other hand, random or recall error may have resulted in suspected clerical error by chance. Among discrepant records flagged for clerical error, birthweight was strongly associated with XAFP LMP gestational age while lacking association with Birth LMP gestational age, suggesting errors are predominantly in Birth LMP.

In 2002, California's XAFP programme required double-key entry of all dates, thus providing built-in error checks during data entry, verification of key fields with providers where any data element was missing, and follow-up of non-negative screening results, which probably account for improved data quality. The State vital statistics electronic data entry programme requires confirmation of dates of LMP that precede birth by more than 1 year and gestations less than 140 days with birthweight >2000 g. Implementing double-key entry and expanding data checks to other situations including mistaking the estimated due date for the LMP, additional implausible birthweight entries and out-of-range gestational age estimates, could yield substantial improvements in Birth LMP data quality.

Birth LMP dates with preferred digits were more likely than those with non-preferred digits to differ from XAFP LMP dates by more than 2 weeks (8.5% vs. 4.4%). Increased discrepancies associated with preferred vs. non-preferred digits have also been reported comparing gestational age estimates from XAFP LMP dates with ultrasound gestational age estimates.3 Increased digit preference prevalence in Birth LMP relative to XAFP LMP implies that mothers are directly asked for LMP information during birth registration. Increasing duration of LMP recall has been associated with gestational age overestimation16 and may be one explanation for the overestimation of gestational age that we observed for Birth LMP dates with preferred digits. Maternal querying and missing LMP dates may both result from missing prenatal charts at the time of birth registration.

The prevalence of digit preference in day of LMP dates changed little between 1987 and 2002 (35.9% and 36.7%, respectively).3 While preference for day 1 in Birth LMP dates was associated with large gestational age overestimations and missed preterm cases in our study, its prevalence in California birth records has decreased from 7.7% in 2001 and 7.3% in 2002 to 5.7% in 2003. Researchers should assess the degree of day 1 digit preference when relying on LMP dates for gestational age estimates, particularly among vulnerable subpopulations.

LMP data in California birth records were not more complete in 2002 than they were in 1987 (12.9% vs. 12.7%).3 Nationally in 2002, 5.1% of birth records were missing only day of LMP and 5.5% were also missing month and year (J. Martin, 7 Nov 2005, pers. comm.). Missing LMP data threaten external validity of preterm birth estimates. Births missing LMP data are disproportionately from vulnerable populations and have higher risk of infant mortality.17,18 Implausible gestational ages are frequently excluded from analysis, further compounding the missing data problem. In California birth records, missing day is imputed as 15 for gestational age calculation. However, unlike in other States, a clinical or obstetric estimate of gestational age was not available in California until 2007 to substitute for incomplete LMP or out-of-range gestational age estimates.

Direct linkage of birth records to XAFP records where LMP dates were considered the ‘best estimate’ of gestational age ensures the most direct LMP error assessment possible on a large, population-based sample. However, the population of women with XAFP LMP as the best gestational age estimate differed from the general birth population, with fewer women aged over 35 years, fewer women with post-high school education, and fewer preterm or low-birthweight deliveries. Women aged over 35 years often elect to have a diagnostic test (e.g. amniocentesis) rather than a screening test. Beyond selection factors related to prenatal screening participation, women in this study had LMP dates considered reliable for screening interpretation. It is likely that the LMP dates in birth records for these women are more reliable than the LMP dates of women who were referred for ultrasound, as suggested by the higher Birth LMP-based post-term rates among women with ultrasound dating. Similarly, relative to linked records, the overall birth population had higher digit preference and post-term rates and a larger proportion in the second birthweight mode. For these reasons, the discrepancies we observed probably underestimate the true extent of LMP reporting errors in the general population of California births.

Studies comparing birth certificate LMP with ultrasound gestational age estimates need to consider the role of reporting error in vital records in addition to inherent biological or methodological limitations of LMP dating. Direct comparison of XAFP LMP with ultrasound could also lead to biased conclusions regarding the quality of the LMP dating method. We observed an excess of post-term births based on XAFP LMP dates within the subsample of XAFP records with both LMP and ultrasound data, suggesting an over-representation of unreliable XAFP LMP dates necessitating ultrasound confirmation. This small subgroup of XAFP participants with both XAFP LMP and ultrasound data, comprising 14% of all XAFP participants and 1% of 1987 California livebirths, has been the focus of previous research.3

Women of African American and Hispanic origin, with less education and higher parity, and with public or no insurance coverage were disproportionately affected by misclassification and missing LMP data on birth records. Foreign-born Hispanics had the highest rates of clerical error and underestimated gestational age, and a high preterm false-positive rate. However, recall error, indicated by digit preference, was less pronounced than among African Americans and Native Americans. The appearance of higher preterm rates among US-born Hispanics relative to foreign-born Hispanics using gestational age from XAFP LMP dates suggests that reporting error in birth records, and clerical error in particular, may hide a Hispanic paradox for preterm delivery similar to that observed for birthweight, as hypothesised by others.19 Indices based on gestational age, such as small-for-gestational age or adequacy of prenatal care, may also be biased among these segments of the population.

Beginning in 2007, the obstetric estimate was added to the California birth certificate, intended to reflect ultrasound dating where available.20 Linked data from the XAFP programme suggest that at least 39% of birth records missing LMP could potentially have an obstetric estimate informed by ultrasound. Because women obtaining ultrasound during pregnancy are not representative of the birth population, as well as for other reasons, LMP dating will still be the primary source for population monitoring of preterm delivery. We conclude that some limitations previously attributed to the LMP dating method may be ameliorated through data quality control measures. The training surrounding the implementation of the revised birth certificate provides an opportunity to emphasise appropriate sources for gestational age data and to enhance data-checking protocols.


This paper was partially supported through contract CQ004942-LOS with the Centers for Disease Control and Prevention, Atlanta, GA. The authors are indebted to Joyce A. Martin of the Centers for Disease Control and Prevention, National Center for Health Statistics, and Alan Oppenheim of the California Department of Health Services, Center for Health Statistics for insight regarding national and State birth certificate data; Bob Currier and Marie Roberson of the California Department of Health Services, Genetic Disease Branch and Patricia M. Dietz of the Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion for their thoughtful comments; Alan Hubbard of University of California, Berkeley for statistical support; Allen Hom and Steve Graham of the Sequoia Foundation for data linkage; and Deborah Hildebrandt and Marissa Root for manuscript assistance.