Fetal crown–rump length (CRL) measurement by ultrasound in the first trimester is the standard method for pregnancy dating; however, a multitude of CRL equations to estimate gestational age (GA) are reported in the literature.
To evaluate the methodological quality of studies reporting CRL equations to estimate GA, using a set of predefined criteria.
Searches of MEDLINE, EMBASE, and CINAHL databases, from 1948 to 31 January 2011, and secondary reference sources, were performed.
Observational ultrasound studies, where the primary aim was to create equations for GA estimation using a CRL measurement.
Included studies were scored against predefined, independently agreed methodological criteria, and an overall quality score was calculated for each study.
The searches yielded 1142 citations. Two reviewers screened the papers and independently assessed the full-text versions of 29 eligible studies. The highest potential for bias was noted in inclusion and exclusion criteria, and in maternal demographic characteristics. No studies had systematic ultrasound quality-control measures. The four studies with the highest scores (lowest risk of bias) satisfied 18 or more of the 29 criteria; these showed lower variation in GA estimation than the remaining, lower-scoring studies. This was particularly evident at the extremes of GA.
Considerable methodological heterogeneity and limitations exist in studies reporting CRL equations for estimating GA, and these result in a wide range of estimated GAs for any given CRL; however, when studies with the highest methodological quality are used, this range is reduced.
Accurate dating of pregnancy is important, as up to 30% of women attending an antenatal clinic have uncertain or unreliable menstrual dates. Antenatal care and interventions aimed at improving pregnancy outcome rely on knowledge of the gestational age (GA). The potential benefits of correct ultrasound dating in the first trimester include improved performance of first-trimester screening for chromosomal abnormalities, fewer pregnancies classified as preterm, and a reduced incidence of post-term delivery. It has also been shown that dating the pregnancy in the first rather than the second trimester can reduce the number of unnecessary inductions of labour.[6-8]
Crown–rump length (CRL) is the most commonly used fetal measurement for pregnancy dating in the first trimester. The first equation that correlated CRL with GA was reported by Robinson and Fleming in 1975. Several studies proposing and validating different CRL equations have been reported since then.[9-37] Although the original Robinson equation remains widely used, there is variation in practice and no consensus exists on which formula is the most appropriate for pregnancy dating. The prevailing practice for GA assessment is often dictated by operator preference, the default equation setting in the ultrasound equipment, local hospital policy, or national guidance. The use of different formulae can lead to a discrepancy of several days in GA estimation for the same CRL measurement.
Assessing the accuracy of CRL formulae is difficult, as it requires an independent gold standard for GA estimation: for instance, some studies have compared CRL dates with GAs based on the date of embryo transfer in pregnancies following in vitro fertilisation (IVF).[39-41] The problem with this approach is that IVF pregnancies may not be biologically equivalent to spontaneous conceptions. They are associated with higher perinatal risks and congenital malformation rates.[42, 43] Therefore, it is possible that early fetal growth in IVF pregnancies is also different to that in spontaneously conceived pregnancies.
Another way to evaluate existing CRL equations is to assess the methodological quality of the studies from which they were derived, in a manner similar to assessing the quality of randomised controlled trials, in order to evaluate the potential sources of bias and to identify the best equations to be used. The objective of this systematic review was therefore to perform such an evaluation.
This systematic review of observational studies was conducted and reported using the checklist proposed by the Meta-analyses of Observational Studies in Epidemiology (MOOSE) group. Three major electronic databases (MEDLINE, EMBASE, and CINAHL) were systematically searched from 1948 to 31 January 2011. Studies were included if they reported GA estimation from first-trimester CRL measurements using ultrasound. Only articles written in English were considered. Articles were excluded if they did not report a new equation for CRL dating. For instance, reviews and studies performing validation of previously published dating equations were excluded from the review.
A search strategy was formulated in collaboration with a professional information specialist: we searched MEDLINE (OvidSP; 1948–31 January 2011), EMBASE (OvidSP; 1974–31 January 2011), and CINAHL (EbscoHOST; 1980–31 January 2011). A cited reference search was conducted on the Science Citation Index (Web of Knowledge; 1945–31 January 2011) for two seminal papers.[9, 45] The following keywords were entered: crown–rump length OR CRL OR fetal OR foetal OR fetus OR foetus AND length OR embryo* AND (pole OR length) AND ultrasound OR ultrasonogra* OR ultra-sonogra* OR sonic* OR scan* AND gestational age OR gestation* OR expected gestation OR expected date* OR date delivery* OR dating delivery OR dating AND (formula OR model OR chart) OR dating.
Two reviewers (R.N. and J.D.) screened the titles and abstracts of all identified citations, and selected potentially eligible studies. The same reviewers independently assessed the full-text versions of eligible studies, and any disagreements were resolved by consensus or consultation with a third reviewer (A.T.P.). Reference lists of retrieved full-text articles were examined for additional, relevant citations. The flow chart of the literature search, plus the inclusions and exclusions, is presented in Figure 1.
The quality of the studies included was assessed using a modified version of the methods used in our previous evaluation of fetal growth charts. A list of methodological quality criteria (listed in Table S1) was devised a priori and divided into two domains: study design (12 criteria); and statistical and reporting methods (17 criteria). Studies were assessed against each criterion within the checklist and were scored as either 0 or 1 if there was a ‘high’ or ‘low’ risk of bias, respectively. The overall quality score was defined as the sum of ‘low risk of bias’ marks (with the range of possible scores being 0–29).
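The scoring rule described above can be sketched in a few lines. This is a minimal illustration, assuming each criterion is stored as a 0/1 mark; the function name and the example marks are hypothetical and are not taken from Table S1 or from any reviewed study.

```python
# Hypothetical sketch of the scoring scheme: 12 study-design criteria and
# 17 statistical/reporting criteria, each marked 1 (low risk of bias)
# or 0 (high risk of bias).
N_DESIGN = 12
N_STATS = 17


def quality_score(design_marks, stats_marks):
    """Return the overall quality score (0-29) as the sum of low-risk marks."""
    if len(design_marks) != N_DESIGN or len(stats_marks) != N_STATS:
        raise ValueError("expected 12 design marks and 17 statistical marks")
    marks = list(design_marks) + list(stats_marks)
    if any(m not in (0, 1) for m in marks):
        raise ValueError("each mark must be 0 (high risk) or 1 (low risk)")
    return sum(marks)


# Illustrative marks only; they do not describe a real study.
example_design = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1]                 # 8 of 12
example_stats = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0]   # 10 of 17
example_score = quality_score(example_design, example_stats)          # 18 of 29
```

A study meeting every criterion would score 29 and one meeting none would score 0, which matches the stated range of possible scores.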
The studies included were reviewed and study details entered into an Excel spreadsheet (Microsoft, 2007). The methodological quality of each study was then assessed by two obstetricians (R.N. and J.D.) and a medical statistician (E.O.O.). Disagreements were resolved by consensus or consultation with a fourth reviewer (A.T.P.).
The searches yielded 1142 citations, of which 62 were considered for potential inclusion (Figure 1). Thirty-three studies were excluded because they described growth with GA (n = 16), assessed or compared existing chart(s) (n = 3), were reviews or practice guidelines (n = 3), or had other aims (n = 11; Table S2).[38-41, 45, 47-74] Finally, 29 studies providing data on over 11 000 pregnancies met the inclusion criteria and were included in the final analysis (Table 1).[9-37]
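As a quick arithmetic check of the study flow, the reported counts can be tallied as follows. The numbers are those given in the text; the variable names and category labels are illustrative.

```python
# Counts as reported in the text; names are illustrative only.
citations_found = 1142
considered_for_inclusion = 62

excluded = {
    "described growth with GA": 16,
    "assessed or compared existing chart(s)": 3,
    "reviews or practice guidelines": 3,
    "other aims": 11,
}

total_excluded = sum(excluded.values())
included = considered_for_inclusion - total_excluded
print(f"excluded={total_excluded}, included={included}")
# prints "excluded=33, included=29"
```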
|Study||Country||Study period (months or year range)||Data collection||Study design||Conception||GA estimation method||GA range (weeks)||Women (n)||Measurements (n)||Scanning method||Quality score|
|Bovicelli et al.||Italy||NR||P||CS||Spont||LMP||7–13||237||NR||NR||11|
|Campbell et al.||UK||39||R||CS||Spont||LMP||7–14||316||NR||NR||13|
|Chalouhi et al.||France||2001–2006||P||CS||AC/Spont (a)||Oocyte retrieval||11–14||331||NR||TV/TA||15|
|Chevernak et al.||USA||47||R||NR||Mixed||hCG, body temperature, cervical mucus consistency, endometrial biopsy, IVF||–12||77||NR||NR||5|
|Drumm et al.||Ireland||15||P||CS||Spont||LMP||6–14||253||253||TA||17|
|Goldstein and Wolfson||USA||NR||P||CS||Spont||LMP||NR||143||143||TV||12|
|Grisolia et al.||Italy||NR||P||CS||Spont||LMP||5–12||236||NR||TV||17|
|Hadlock et al.||USA||9||P||CS||Spont||LMP||5–18||416||NR||TV/TA||16|
|Izquierdo et al.||USA||12||NR||CS||Spont||LMP||8–12||92||NR||TV||13|
|Kurjak et al.||Yugoslavia||NR||NR||Mixed||Spont||LMP||6–14||220||390||TA||10|
|MacGregor et al.||USA||NR||NR||CS||AC||LH, body temperature, follicular collapse on ultrasound||7–13||72||72||TA||14|
|McLennan and Schluter||Australia||22||P||CS||Mixed||LMP, embryo transfer||5–14||396||NR||TV/TA||18|
|Papaioannou et al.||UK||77||R||Longit||Mixed||Previous CRL dating equation||6–13||4698||NR||TV/TA||16|
|Pedersen||Denmark||NR||P||Longit||Spont||LMP, body temperature||7–14||101||289||TA||15|
|Piantelli et al.||Italy||NR||P||Longit||Spont||LMP||7–13||72||NR||TA||14|
|Robinson et al.||UK||NR||P||CS||Spont||LMP||6–14||334||334||TA||18|
|Rossavik et al.||USA||NR||P||Longit||Mixed||Embryo transfer, follicular collapse on ultrasound||7–15||35||106||TA||10|
|Sahota et al.||China||24||P||CS||Spont||LMP||6–15||393||393||NR||21|
|Selbing and Fjällbrant||Sweden||NR||P||CS||AC||Insemination, body temperature||NR||24||24||TA||13|
|Silva et al.||USA||24||P||CS||AC||LH, ovulation induction||5–9||36||36||TV||12|
|Van de Velde et al.||the Netherlands||NR||P||Mixed||Spont||Body temperature||7–14||60||118||TA||13|
|Verburg et al.||the Netherlands||46||P||CS||Spont||LMP||6–14||2079||2079||TV/TA||20|
|Vollebergh et al.||the Netherlands||1981–1986||R||CS||Spont||Body temperature||6–13||47||47||TA||9|
|Westerway et al.||Australia||22||P||Mixed||NR||LMP||5–14||NR (b)||NR||NR||10|
|Wisser et al.||Germany||56||P||Mixed||AC||Oocyte retrieval, insemination||5–14||139||274||TV||15|
The main characteristics and overall quality score for each study included are presented in Table 1. The earliest study was published in 1975 and the latest in 2011.[9, 12] Data collection was prospective in 17 studies, retrospective in seven studies, and not reported or uncertain in five studies (Figure 2A; Table S3). Eighteen studies had a cross-sectional design, five were longitudinal, and five were mixed cross-sectional and longitudinal; the design of the remaining study was not reported.
Unselected, low-risk pregnancies were included in only eight (28%) studies. Overall, the demographic characteristics of the populations and any inclusion or exclusion criteria were not well described. Although almost all of the studies reported some of the inclusion/exclusion criteria used in the scoring, in no study were all of them used (Table S1).
The independent method used to assess GA was the first day of the last menstrual period (LMP) in 16 studies. In the remainder, GA was assessed using dates relevant to assisted reproduction, e.g. the date of oocyte retrieval, luteinising hormone surge, embryo transfer, basal body temperature rise, or intrauterine insemination.
Overall, the ultrasound aspects of the studies were well described (Figure 2B; Table S4). Transabdominal ultrasound was used in 12 studies, transvaginal ultrasound was used in five studies, and both were used in six studies. In 14 studies, more than one sonographer obtained scans. The method of image acquisition was well described (26 studies); however, none of the studies employed a comprehensive strategy for ultrasound quality control.
Although all studies had pregnancy dating as their main purpose, the regression equation of GA versus CRL was not reported in four studies. Assessment of the goodness of fit of the proposed equation was performed in 18 studies.
This review has identified four studies that satisfied 18 or more of the 29 quality criteria (Table 2).[9, 23, 29, 34] Figure 3 shows the variation of GA estimation using these four ‘best-scoring’ charts, compared with the remaining 22 lower-scoring studies (in three further studies it was not possible to calculate the regression equation). It is notable that the best charts are very similar, and that the remaining 22 studies give a wide range of estimated GA, particularly at the extremes of CRL.
|Fetal crown–rump length (mm)||Gestational age (weeks + days)|
| ||McLennan and Schluter||Robinson and Fleming (a)||Sahota et al. (a)||Verburg et al.|
|5||6 + 0||6 + 0||6 + 2||6 + 2|
|10||7 + 0||7 + 1||7 + 2||7 + 4|
|15||7 + 6||7 + 6||8 + 1||8 + 2|
|20||8 + 4||8 + 4||8 + 6||9 + 0|
|25||9 + 1||9 + 2||9 + 3||9 + 4|
|30||9 + 5||9 + 6||9 + 6||10 + 0|
|35||10 + 2||10 + 2||10 + 3||10 + 3|
|40||10 + 5||10 + 6||10 + 6||10 + 6|
|45||11 + 1||11 + 2||11 + 2||11 + 2|
|50||11 + 4||11 + 5||11 + 5||11 + 5|
|55||12 + 0||12 + 1||12 + 1||12 + 0|
|60||12 + 2||12 + 3||12 + 3||12 + 3|
|65||12 + 5||12 + 6||12 + 6||12 + 5|
|70||13 + 0||13 + 1||13 + 1||13 + 0|
|75||13 + 2||13 + 4||13 + 3||13 + 3|
|80||13 + 3||13 + 6||13 + 6||13 + 5|
|85||–||14 + 1||14 + 1||14 + 0|
|Formula||GA (days) = 32.61967 + (2.62975 × CRL) – [0.42399 × log(CRL) × CRL]||GA (days) = 8.052 × (CRL × 1.037)^{1/2} + 23.73 (b)||GA (days) = 26.643 + 7.822 × CRL^{1/2}||GA (weeks) = exp[1.4653 + 0.001737 × CRL + 0.2313 × log(CRL)]|
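The four formulae in Table 2 can be evaluated directly, as in the sketch below. It assumes that CRL is in millimetres, that "log" denotes the natural logarithm, and that GA is rounded to the nearest whole day before conversion to weeks + days; these assumptions are not stated in the table but reproduce the tabulated values checked here. The function and dictionary names are illustrative.

```python
import math


def ga_days_mclennan(crl_mm):
    # McLennan and Schluter: GA in days (log taken as natural logarithm)
    return 32.61967 + 2.62975 * crl_mm - 0.42399 * math.log(crl_mm) * crl_mm


def ga_days_robinson(crl_mm):
    # Robinson and Fleming: GA in days
    return 8.052 * math.sqrt(crl_mm * 1.037) + 23.73


def ga_days_sahota(crl_mm):
    # Sahota et al.: GA in days
    return 26.643 + 7.822 * math.sqrt(crl_mm)


def ga_days_verburg(crl_mm):
    # Verburg et al.: formula gives GA in weeks, converted here to days
    weeks = math.exp(1.4653 + 0.001737 * crl_mm + 0.2313 * math.log(crl_mm))
    return 7 * weeks


FORMULAE = {
    "McLennan and Schluter": ga_days_mclennan,
    "Robinson and Fleming": ga_days_robinson,
    "Sahota et al.": ga_days_sahota,
    "Verburg et al.": ga_days_verburg,
}


def weeks_plus_days(ga_days):
    # Rounding to the nearest day is an assumption about how Table 2 was built.
    total = round(ga_days)
    return f"{total // 7} + {total % 7}"


def estimates(crl_mm):
    """Return the weeks + days estimate from each formula for one CRL value."""
    return {name: weeks_plus_days(f(crl_mm)) for name, f in FORMULAE.items()}
```

For a CRL of 45 mm, `estimates(45)` gives 11 + 1 for McLennan and Schluter and 11 + 2 for the other three formulae, in agreement with the corresponding row of Table 2.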
The aim of this review was to investigate the methodology used in studies reporting GA estimation based on CRL measurement. Using a set of 29 criteria the studies were scored as having a low or high risk of bias based on study design, and on the statistical and reporting methods used. This produced a wide range of scores, showing that the quality of the studies was variable (median 15, range 5–21): nine studies scored >15/29 and six studies scored <12/29. We previously used this approach for the case of ultrasound chart creation in fetal biometry. In our view, this is the most scientific way to compare the methodological rigour of studies, improve consistency in fetal growth research, and highlight limitations that should be avoided in future research.
We found that there is considerable heterogeneity and that limitations exist in studies reporting CRL equations for estimating GA. The four studies with the highest scores (lowest risk of bias) satisfied 18 or more of the 29 criteria; these showed lower variation in GA estimation than the remaining, lower-scoring studies, and this was particularly evident at the extremes of GA.
This review has several strengths. The use of a quality score allowed for an objective and quantitative assessment of study methodology: the quality criteria were formulated a priori, and were based on a previously published quality checklist used in studies of fetal biometry. One limitation is that an English language restriction was used; however, unlike systematic reviews of randomised controlled trials, where it is imperative that all available evidence is included to estimate the effect of treatment, this is less likely to be a significant limitation in reviews of methodological quality.
There is a debate regarding how best to select samples in research studies that aim to create reference equations of fetal size.
Some authors propose using markers of ovulation or oocyte retrieval/embryo transfer dates in IVF pregnancies as the gold standard; however, uncertainties remain in modelling GA estimation charts in such pregnancies, including the potential time lag between ovulation and conception, differences in early embryonic growth in vitro, and, more importantly, differences arising from the selected nature of the population undergoing assisted conception. There are conflicting results about first-trimester fetal growth in IVF pregnancies. Both underestimation and overestimation have been reported between assisted and spontaneous conception populations.[14, 22, 31, 32] Moreover, pregnancies achieved by assisted reproduction may be at higher risk of perinatal complications than normally conceived pregnancies. Finally, we consider that using a sample of women undergoing assisted reproduction to create dating charts that are then applied to a population of women with spontaneous conception is questionable.
Some authors have proposed using a sample that is as unselected as possible to best represent the underlying population. The problem with this strategy is that a number of pathological conditions may be prevalent, which are likely to affect the reference equations derived. We believe that the purpose of a reference equation is to demonstrate how fetuses should grow (prescriptive), rather than how they do grow (descriptive). Risk factors and pathological processes, such as smoking, hypertension and pre-eclampsia, maternal disease, abnormal fetal karyotype and congenital anomalies, preterm delivery, and stillbirth, are known to affect fetal size later in pregnancy. There is now evidence to suggest that fetal growth restriction can be evident as early as the first trimester. Therefore, when producing reference CRL equations, efforts should be made to ensure the sample consists of women at low risk of developing such complications.
A number of studies reporting CRL measurements in the first trimester have been excluded from this review because they attempted to answer a different question: to describe fetal growth in the first trimester.[49, 51-55, 57, 59, 61, 65, 66, 68, 70, 72-74] In some of the studies the authors considered both of these concepts, and such reports were included if GA estimation was one of the stated aims of the study and if a GA estimation formula was provided, regardless of how the data were analysed.[10, 11, 15, 21] The study by McLennan and Schluter illustrates the differences between the two concepts. The scatter plot of CRL (the independent variable) against GA is reported first, deriving the equation for GA estimation. In the second figure the scatter plot of GA (the independent variable) against CRL is reported. Both charts can be derived from the same population, but differences are seen relating to the analysis performed (i.e. modelling GA estimation rather than fetal size). Sahota et al. elegantly demonstrate how the assessment of size and maturity should not be considered interchangeable, as just ‘flipping’ a regression can lead to an over- or underestimation of GA, especially at the extremes of the CRL range.
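The effect of 'flipping' a regression can be illustrated with a small simulation. This is a sketch under strong assumptions: the data are synthetic, the relationship is taken to be linear for simplicity (the published equations are not), and all coefficients are hypothetical. It only shows the general statistical point that regressing GA on CRL and inverting a regression of CRL on GA agree at the mean CRL but diverge towards the extremes.

```python
import random


def ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic, purely illustrative data: GA in days, CRL in mm.
rng = random.Random(0)
ga = [rng.uniform(49, 98) for _ in range(500)]
crl = [1.2 * g - 45 + rng.gauss(0, 4) for g in ga]

# Dating model: GA regressed on CRL (CRL is the independent variable).
a0, a1 = ols(crl, ga)
# Size model: CRL regressed on GA (GA is the independent variable).
b0, b1 = ols(ga, crl)


def ga_direct(crl_mm):
    return a0 + a1 * crl_mm


def ga_flipped(crl_mm):
    # Size model solved for GA, i.e. the 'flipped' regression.
    return (crl_mm - b0) / b1


mean_crl = sum(crl) / len(crl)
for c in (10, mean_crl, 80):
    print(f"CRL {c:5.1f} mm: direct {ga_direct(c):6.2f} d, "
          f"flipped {ga_flipped(c):6.2f} d")
```

Because the flipped line is always at least as steep as the direct one, it gives a lower GA than the dating model for small CRL values and a higher GA for large ones, with the gap growing as the scatter in the data increases.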
We believe that the recommended study design should be a prospective study of normally conceived, singleton pregnancies, with a pre-defined analysis plan and a prior sample size estimation. Reporting of the demographic characteristics, recruitment period, and estimated GA is essential information for such observational studies; in the present review, <50% of the studies identified satisfied these criteria (Figure 2A, B); in addition, <60% had a prospective design. Most hospitals now routinely collect information using ultrasound software databases, and retrospective analysis of such databases can very easily generate a large sample size. However, retrospective studies are fraught with potential bias as data quality may be variable and the ability to perform continuing ultrasound quality assurance is curtailed.
It has previously been argued that reference studies should be performed by a single operator in order to reduce inter-observer error; however, ultrasound scans in most clinical services are performed by multiple operators, and so variability is inevitable and it would be illogical to ignore it. Reference studies should account for this when using multiple operators, and quality assurance steps should be taken to improve the quality and consistency of measurements, including the standardisation of contributing ultrasonographers.
In the analysis, a table of included observations should show how many women were recruited in each GA window. Both the median and variance should be modelled as a function of GA in a way that accounts for the increasing variability with gestation, and should provide smooth percentile curves. A goodness-of-fit assessment, with graphical evaluation of the superimposed centiles, is essential to evaluate the predictive model. To assess the model, a smooth change of the mean should be represented, superimposed onto the raw data.[12, 14, 16, 17, 19, 23-25, 27, 29, 31, 34, 37] While many studies described the statistical method used, more than half did not satisfy the above criteria (Figure 2B).
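One common way to obtain smooth centile curves from a modelled mean and variability is sketched below. It assumes that the measurement is normally distributed around the modelled mean at each GA and uses hypothetical linear functions for the mean and the standard deviation; neither the coefficients nor the function names come from any of the reviewed studies.

```python
from statistics import NormalDist


def mean_crl(ga_days):
    # Hypothetical smooth model of the mean CRL (mm) at a given GA (days).
    return -45.0 + 1.2 * ga_days


def sd_crl(ga_days):
    # Hypothetical model of the SD, increasing with gestation.
    return 1.0 + 0.05 * ga_days


def centile(ga_days, p):
    """Return the p-th centile (0 < p < 1) assuming normality at each GA."""
    z = NormalDist().inv_cdf(p)
    return mean_crl(ga_days) + z * sd_crl(ga_days)


for ga in (56, 70, 84):
    low, mid, high = (centile(ga, p) for p in (0.05, 0.50, 0.95))
    print(f"GA {ga} d: 5th {low:.1f}, 50th {mid:.1f}, 95th {high:.1f} mm")
```

Because the SD is itself a function of GA, the band between the 5th and 95th centiles widens with gestation, which is the behaviour the modelling is meant to capture; superimposing such curves on the raw data is one way to judge goodness of fit graphically.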
When adopting reference equations for use in clinical service it is reasonable to choose the publications with the lowest risk of methodological bias (Table 2). This review has identified four studies that satisfied 18 or more of the 29 quality criteria. In Figure 3 it is evident that using any of these four charts leads to very small differences in GA estimation, when compared with the remaining charts.
This systematic review has demonstrated considerable heterogeneity of design in the studies of pregnancy dating by CRL: this results in a wide range of estimated GA for any given CRL. The use of any one of the four identified studies that satisfy most quality criteria leads to very small differences in GA estimation. Consensus in methodology is essential in order to appraise population differences in CRL measurement. A checklist of recommended design is proposed to aid such consensus and potentially reduce the variability in application for clinical practice.
The authors declare that they have no conflicts of interest.
RN, JD, SHK, JV, and ATP designed the study. JV, ACA, and CI defined the quality criteria a priori. RN, JD, and EOO extracted the data. RN, JD, EOO, and ATP scored the studies, analysed the data, interpreted the results, drafted the article, and made the decision to publish. All authors assisted in drafting the article, submitting it, and revising it for important intellectual content, and all authors edited and approved the final version to be published.
No ethical approval was required.
All authors are part of the INTERGROWTH–21st Project, an international study of fetal growth (www.intergrowth21.org.uk) funded by the Bill & Melinda Gates Foundation to the University of Oxford, for which we are very grateful. C. Ioannou and A. T. Papageorghiou are supported by the Oxford Partnership Comprehensive Biomedical Research Centre, with funding from the Department of Health's National Institute for Health Research (NIHR) Biomedical Research Centres funding scheme.
We would like to thank Ms Nia Wyn Roberts, Outreach Librarian, Bodleian Health Care Libraries, for her assistance with the literature search, and Prof. Doug Altman for useful discussions on the assessment of fetal dating charts.