Systematic review of methodology used in ultrasound studies aimed at creating charts of fetal size


Correspondence: Dr AT Papageorghiou, Nuffield Department of Obstetrics and Gynaecology, John Radcliffe Hospital, Oxford, OX3 9DU, UK.


Please cite this paper as: Ioannou C, Talbot K, Ohuma E, Sarris I, Villar J, Conde-Agudelo A, Papageorghiou A. Systematic review of methodology used in ultrasound studies aimed at creating charts of fetal size. BJOG 2012;119:1425–1439.

Background  Reliable ultrasound charts are necessary for the prenatal assessment of fetal size, yet there is a wide variation of methodologies for the creation of such charts.

Objective  To evaluate the methodological quality of studies of fetal biometry using a set of predefined quality criteria of study design, statistical analysis and reporting methods.

Search strategy  Electronic searches in MEDLINE, EMBASE and CINAHL, and references of retrieved articles.

Selection criteria  Observational studies whose primary aim was to create ultrasound size charts for bi-parietal diameter, head circumference, abdominal circumference and femur length in fetuses from singleton pregnancies.

Data collection and analysis  Studies were scored against a predefined set of independently agreed methodological criteria and an overall quality score was given to each study. Multiple regression analysis between quality scores and study characteristics was performed.

Main results  Eighty-three studies met the inclusion criteria. The highest potential for bias was noted in the following fields: ‘Inclusion/exclusion criteria’, as none of the studies defined a rigorous set of antenatal or fetal conditions which should be excluded from analysis; ‘Ultrasound quality control measures’, as no study demonstrated a comprehensive quality assurance strategy; and ‘Sample size calculation’, which was apparent in six studies only. On multiple regression analysis, there was a positive correlation between quality scores and year of publication: quality has improved with time, yet considerable heterogeneity in study methodology is still observed today.

Conclusions  There is considerable methodological heterogeneity in studies of fetal biometry. Standardisation of methodologies is necessary in order to make correct interpretations and comparisons between different charts. A checklist of recommended methodologies is proposed.


Prenatal evaluation of fetal size was made possible in the 1960s by the introduction of combined A- and B-mode ultrasonography in obstetrics.1 Second- and third-trimester fetal biometry is now common practice and represents one of the most common medical investigations undertaken. However, a Cochrane review of trials evaluating routine ultrasonography beyond 24 weeks of gestation in low-risk pregnancies has not demonstrated any benefit in perinatal outcomes.2 Furthermore, intrauterine growth restriction remains a leading cause of perinatal loss, accounting for at least one-fifth of stillbirths in the UK;3 failure to diagnose and lack of effective treatment are the likely explanations.

The recognition of pathological growth is dependent on the existence of reliable standards. However, the establishment of normal charts for key biometric variables, such as bi-parietal diameter (BPD), head circumference (HC), abdominal circumference (AC) and femur length (FL), is not straightforward. Discrepancies in median values and percentile curves between studies have often been attributed to different distributions in racial,4 gender5 or other biological and demographic determinants.6 However, it is likely that differences in study design, data analysis and presentation also contribute to the observed discrepancies. Ultrasound measurement is often subject to observer error, and it is possible that systematic variations in measurement accuracy may exist between different studies. Suboptimal methodology when producing a fetal size chart is likely to affect the ability to discriminate the healthy from the compromised fetus.

Recommended methods in various aspects of study design have been published in recent years,7–11 including appropriate statistical methods for modelling cross-sectional10 (one scan per fetus) or longitudinal11 (serial scans) ultrasound data. Conversely, other aspects, such as sample size, population selection and exclusion criteria, remain debated to this day.

The aim of this systematic review was to evaluate the methodological quality of studies of fetal biometry using a set of predefined quality criteria of study design, statistical analysis and reporting methods.


This systematic review of observational studies was conducted and reported following the checklist proposed by the MOOSE group.12 Three major electronic databases (MEDLINE, EMBASE and CINAHL) were systematically searched for the period 1968 to September 2011 to identify studies of two-dimensional ultrasound biometry. Reference lists of retrieved full-text articles were examined for additional, relevant citations. Studies were included if the primary objective was to create size charts for BPD, HC, AC and FL on B-mode ultrasound in normal singleton pregnancies using either a cross-sectional or longitudinal design. The search was not restricted by study design or methodology, but only articles written in English were considered. Articles were excluded if: (1) only A-mode ultrasound was used; (2) the primary aim was other than the construction of size charts, for instance, the prediction of gestational age or comparisons between different population groups; or (3) the chart did not cover the entire gestation, for instance, a size chart restricted to 20 to 30 weeks of gestation.

The keyword search strategy, which was constructed by a professional information specialist, is presented in Table 1. Two reviewers (KT and CI) screened the titles and abstracts of all identified citations, and selected potentially eligible studies. The full-text versions of eligible studies were independently assessed by the same reviewers and any disagreements were resolved by consensus or consultation with a third reviewer (ATP). Authors’ institutions were contacted in order to obtain a copy of the published article where this was not available from library sources. The flow chart of the literature search is presented in Figure 1. Studies excluded from this review and the reasons for exclusion are listed in Appendix S1.

Table 1.   Search strategy

1. Fetal Development/
2. *Gestational Age/
3. ((fetal or foetal or fetus or foetus) adj2 growth).tw.
4. ((fetal or foetal or fetus or foetus) adj biometr*).tw.
5. 1 or 2 or 3 or 4
6. (growth adj (curve* or chart* or standard* or index or indices)).tw.
7. (reference adj (curve* or chart* or index or indices or equation* or value* or range* or equation* or centile* or percentile*)).tw.
8. (biometr* adj (curve* or chart* or index or indices or equation* or value* or range* or equation* or centile* or percentile*)).tw.
9. (size adj (chart* or curve*)).tw.
10. (dating adj (curve* or chart*)).tw.
11. *Reference Values/
12. 6 or 7 or 8 or 9 or 10 or 11
13. Ultrasonography, Prenatal/
14. (ultrasound* or ultrasonogra* or sonogra*).tw.
15. 13 or 14
16. 5 and 12 and 15
17. exp animal/not human/
18. 16 not 17
19. limit 18 to English language

Note: * indicates keyword truncation.

Figure 1. Consort flow diagram of literature assessment.

A list of methodological quality criteria (shown in Table 2) was initially developed by one of us (AC-A) in advance of the review, and agreed between three of the authors (JV, AC-A and ATP) independent of those who performed the data abstraction. These quality criteria are based on available published research,7–11 and are divided into three domains: study design, statistical methods and reporting methods; in total, 23 quality criteria were used for cross-sectional studies and 24 criteria for longitudinal studies.

Table 2.   Methodological quality criteria

1. Study design

1.1 Design
Low risk of bias: Clearly described as either cross-sectional or longitudinal
High risk of bias: Not reported; mixture of cross-sectional and longitudinal data

1.2 Sample selection
Low risk of bias: Population-based study where there are attempts to identify and clearly define populations from a specific geographical area; from this underlying population, women are selected either consecutively or at random
High risk of bias: Not population based; convenience sampling; arbitrary recruitment; or not reported

1.3 Number of occasions each fetus was measured (only for cross-sectional studies)
Low risk of bias: Each fetus was measured and included only once
High risk of bias: Some fetuses were measured and included more than once

1.4 Method of selecting the gestational ages at which the fetuses were measured (only for longitudinal studies)
Low risk of bias: Interval of measures prospectively prespecified and justified
High risk of bias: Interval of measures not prospectively prespecified and justified or not reported

1.5 Reason(s) for choosing a particular number of serial measurements (only for longitudinal studies)
Low risk of bias: Clear documentation of the intended number of serial measurements
High risk of bias: No clear documentation of the intended number of serial measurements

1.6 Inclusion/exclusion criteria
Low risk of bias: The study made it clear that women at high risk of pregnancy complications were not included, and that women with abnormal outcome were excluded, i.e. an effort was made to include ‘normal’ outcome as best possible. As a minimum, the study population should exclude:
– multiple pregnancy
– fetuses with congenital structural or chromosomal anomalies
– fetal death
– women with disorders that may affect fetal growth (at least should specify exclusion of women with pre-existing hypertension, diabetes mellitus, renal disease and smoking)
– pregnancy complications (at least pre-eclampsia)
– pregnancies conceived by assisted reproductive technology
High risk of bias:
– The study population included both low-risk and high-risk pregnancies, or women with abnormal outcome were not excluded
– Study population that did not exclude fetuses or women with the characteristics previously described
– Exclusions which would have a direct effect on the estimated percentiles, such as fetuses found at birth to be large or small for dates

1.7 Sample size
Low risk of bias: A priori determination/calculation of sample size and justification
High risk of bias: Lack of a priori sample size determination/calculation and justification

1.8 Data collection
Low risk of bias: Prospective study and ultrasound data collected specifically for the purpose of constructing charts of fetal size or fetal growth
High risk of bias: Retrospective study, or data not collected specifically for the purpose of constructing charts of fetal size or fetal growth, or unclear (e.g. use of routinely collected data)

1.9 Method of dating pregnancy
Low risk of bias: Clearly described; known last menstrual period (LMP) and regular menstrual cycles prior to pregnancy AND a sonogram before 14 weeks demonstrating a crown–rump length (CRL) that corroborates LMP dates (within how many days unspecified)
High risk of bias: Not described clearly; gestational age assessment at >14 weeks, or gestational age assessment not including ultrasonographic verification

1.10 Collection of data on gestational age at inclusion
Low risk of bias: The gestational age was calculated precisely to the day
High risk of bias: Truncation of gestational age to the number of ‘completed weeks’

2. Statistical methods

2.1 Number of measurements taken for each biometric variable
Low risk of bias: More than one measure per fetus per scan
High risk of bias: Single measure or not specified

2.2 Statistical methods
Low risk of bias: Clearly described and identified
High risk of bias: Not clearly described and identified

2.3 Assessment of increasing variability of the data with gestation
Low risk of bias: Performed
High risk of bias: Not performed

2.4 Assessment of goodness of fit of the models
Low risk of bias: A test of goodness of fit of the models was reported
High risk of bias: Goodness of fit of models was not reported

2.5 Scatter diagram of the data with the fitted percentiles superimposed
Low risk of bias: Study included scatter diagrams of the data with the percentiles superimposed
High risk of bias: Study did not include scatter diagrams of the data with the percentiles superimposed

2.6 Change in reference percentiles across gestational age
Low risk of bias: Smooth change
High risk of bias: Not smooth change

2.7 Methods used to estimate age-specific reference intervals for fetal size measurements
Low risk of bias: ‘Mean and standard deviation (SD) model’, smoothed crude percentiles, or ‘LMS method’13
High risk of bias: Inadequate

3. Reporting methods

3.1 Characteristics of study population
Low risk of bias: Presented in a table or clearly described, and includes minimum dataset of age, weight, height or body mass index and parity
High risk of bias: Not presented in a table or not clearly described, or does not contain minimum dataset

3.2 Description of number approached/enrolled
Low risk of bias: Described
High risk of bias: Not described

3.3 Ultrasound machine(s) used
Low risk of bias: Clearly specified
High risk of bias: Not clearly specified

3.4 Number of sonographers that took the measurements
Low risk of bias: Reported
High risk of bias: Unreported

3.5 Description of measurement techniques
Low risk of bias: The study described sufficient and unambiguous details of the measurement techniques used for fetal size parameters, including imaging plane and calliper application method
High risk of bias: The study did not describe sufficient and unambiguous details of the measurement techniques used for fetal size parameters

3.6 Contains quality control measures
Low risk of bias: Should include the following:
– assessment of intraobserver variability
– assessment of interobserver variability
– image review
– image scoring
– image storage
High risk of bias: Does not contain quality control measures

3.7 Report of mean and SD of each measurement and the sample size for each week of gestation
Low risk of bias: Presented in a table or clearly described
High risk of bias: Not presented in a table or not clearly described

3.8 Report of regression equations for the mean (and SD if relevant) for each measurement
Low risk of bias: Reported
High risk of bias: Not reported

The included studies were reviewed by two obstetricians (CI and KT) and a medical statistician (EO), and study details were abstracted onto an Excel spreadsheet. Studies were assessed against each criterion within the checklist and were scored as either ‘high’ or ‘low’ risk of bias. Disagreements were resolved either by consensus or consultation with a third reviewer (ATP). The overall quality score was defined as the percentage of ‘low risk of bias’ marks over the total number of quality criteria for each study.
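The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `quality_score` and the example marks are hypothetical, and the list is assumed to contain only the criteria applicable to the study design (23 for cross-sectional studies, 24 for longitudinal studies).

```python
def quality_score(marks):
    """Return the percentage of 'low' risk-of-bias marks among all criteria.

    `marks` holds one entry per applicable criterion, each either
    'low' or 'high' (hypothetical encoding, chosen for this sketch).
    """
    if not marks:
        raise ValueError("at least one criterion is required")
    low = sum(1 for mark in marks if mark == "low")
    return 100.0 * low / len(marks)


# Hypothetical cross-sectional study: 23 applicable criteria, 18 at low risk.
example_marks = ["low"] * 18 + ["high"] * 5
example_score = quality_score(example_marks)
print(round(example_score))  # 18/23 = 78.3%, printed as 78
```

Because the denominator differs between designs, scores from cross-sectional and longitudinal studies are compared as percentages rather than as raw counts.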

Multiple regression analysis was performed between quality scores and study characteristics which were not part of the scoring algorithm: year of publication, sample size of participating women, sample size of included ultrasound examinations, study duration, type of participating hospitals (teaching versus nonteaching), number of participating sites (single versus multi-site), number of sonographers (single versus multiple) and type of country (low-, middle- or high-income country, using the 2010 World Bank Classification of economies by gross national income). Statistical analyses were performed using Microsoft Excel 2010 and IBM SPSS Statistics version 19.
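As a rough illustration of what a multiple regression of quality score on study characteristics involves, the following pure-Python sketch fits ordinary least squares with an intercept and reports the coefficient of determination. It is not the authors' SPSS analysis: the helper names are hypothetical, the data are synthetic and noise-free, and no P values are computed.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(predictors, outcome):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    design = [[1.0] + [float(v) for v in row] for row in predictors]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, outcome)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in design]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return coef, 1.0 - ss_res / ss_tot


# Synthetic, noise-free data (NOT taken from the review): years since 1970 and
# teaching-hospital status (1 = teaching), score = 10 + 0.5*years + 10*teaching.
predictors = [[5, 0], [12, 1], [20, 0], [28, 1], [36, 1]]
scores = [12.5, 26.0, 20.0, 34.0, 38.0]
coefficients, r_squared = fit_multiple_regression(predictors, scores)
print(coefficients, r_squared)
```

With real study data the fit would not be exact, and the coefficient of determination would indicate the share of score variance explained by the predictors.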


Eighty-three publications from 32 countries were identified;14–96 the earliest was published in 1971 and the latest in 2008. In two publications,49,76 multiple study designs were used, for instance AC using a longitudinal and a cross-sectional design; the best described methodology with the highest quality score was used in the final analysis. The median sample size of participating women was 558 (minimum, 19; maximum, 17 660; interquartile range, 1120), whereas the median number of ultrasound examinations was 800 (minimum, 167; maximum, 50 131; interquartile range, 1770). Forty studies reported one biometric parameter only, whereas the remaining 43 studies included combinations of BPD, HC, AC or FL, but only 22 studies reported a complete set of all four parameters. In total, there were 60 studies for BPD, 34 for HC, 41 for AC and 43 for FL.

The study characteristics and overall quality score for each study are presented in Table 3. The breakdown of scores per study and per quality criterion is presented in Appendix S2. Additional characteristics not included in the scoring algorithm are presented in Appendices S3 (Additional maternal and pregnancy characteristics) and S4 (Additional study characteristics). Sixty-one studies had a cross-sectional design and 22 studies had a longitudinal design. Amongst the 61 cross-sectional studies, only 36 (59%) clearly specified that one examination per fetus was performed during the study period. Amongst the 22 longitudinal studies, only in 12 (55%) were ultrasound data analysed using a method that took into account their serial nature.

Table 3.   Characteristics of the included studies

Reference | Year | Country | Number of women | Number of scans | Weeks | Measurement | Design | Data collection | Scans for research purposes only | Quality score (%)
Aickin et al.14 | 1976 | New Zealand | 446 | 1018 | 19–41 | BPD | CM | Unclear | Unclear | 17
Al-Meshari & Raber15 | 1987 | Saudi Arabia | 1570 | 1570 | 12–41 | BPD | C1 | Unclear | Unclear | 30
Amoa et al.16 | 1993 | Papua New Guinea | 425 | 1229–1514 | 12–42 | BPD, AC, FL | LC | Unclear | Unclear | 21
Ashrafunnessa et al.17 | 2003 | Bangladesh | 710 | 710 | 12–42 | BPD, AC, FL | C1 | Prospective | Yes | 70
Ayangade & Okonofua18 | 1986 | Nigeria | 558 | 712 | 12–40 | BPD | CM | Unclear | Unclear | 26
Beigi & ZarrinKoub19 | 2000 | Iran | 1324 | 15 594–15 693 | 12–40 | BPD, FL | LC | Unclear | Unclear | 13
Bergsjo et al.20 | 1976 | Norway | 131 | 421 | 24–44 | BPD | CM | Unclear | No | 13
Brons et al.21 | 1990 | Netherlands | 63 | 520 | 12–40 | BPD, HC, AC, FL | LL | Unclear | Unclear | 42
Browne et al.22 | 1992 | USA | 8285 | 7249–8108 | 10–44 | BPD, HC, AC, FL | C1 | Retrospective | No | 35
Campbell & Newman23 | 1971 | UK | 574 | 1029 | 13–40 | BPD | CM | Unclear | Unclear | 17
Chan & Yeo24 | 1991 | Singapore | 1442 | 1442 | 17–40 | BPD | C1 | Retrospective | No | 22
Chang et al.25 | 1996 | Taiwan | 2077 | 2077 | 16–41 | BPD | C1 | Unclear | Unclear | 52
Chitty et al.26 | 1994 | UK | 610 | 425–610 | 12–42 | AC | C1 | Prospective | Yes | 78
Chitty et al.27 | 1994 | UK | 649 | 649 | 12–42 | FL | C1 | Prospective | Yes | 78
Chitty et al.28 | 1994 | UK | 594 | 594 | 12–42 | BPD, HC | C1 | Prospective | Yes | 78
de la Vega et al.29 | 2008 | Puerto Rico | 548 | 548 | 14–38 | BPD, HC, AC, FL | C1 | Unclear | No | 13
Deter et al.30 | 1982 | USA | 252 | 252 | 13–41 | HC, AC | C1 | Unclear | Unclear | 35
Deter et al.31 | 1982 | USA | 20 | Unclear | 12–40 | BPD, HC, AC | LL | Unclear | Unclear | 33
Deter et al.32 | 1987 | USA | 20 | Unclear | 15–40 | FL | LL | Unclear | Unclear | 33
Di Battista et al.33 | 2000 | Italy | 238 | 1237–1539 | 12–40 | BPD, HC, AC, FL | LL | Unclear | Unclear | 42
Dubiel et al.34 | 2008 | Poland | 959 | 959 | 20–42 | BPD, HC, AC, FL | CS | Prospective | Unclear | 43
Elejalde & de Elejalde35 | 1986 | USA | 1068 | 2409 | 10–40 | BPD, FL | CM | Unclear | No | 22
Eriksen et al.36 | 1985 | Denmark | 41 | 493 | 13–40 | BPD | LL | Prospective | Unclear | 42
Exacoustos et al.37 | 1991 | Italy | 2317 | 2317 | 13–40 | FL | C1 | Unclear | No | 48
Fescina et al.38 | 1982 | | 30 | 692–722 | 13–40 | BPD, HC, AC | LC | Unclear | Unclear | 25
Gallivan et al.40 | 1993 | UK | 67 | 434 | 20–40 | AC | LL | Prospective | Yes | 58
Guihard-Costa et al.41 | 1995 | France | 3433 | 3974–4248 | 8–41 | BPD, FL | CM | Retrospective | Unclear | 30
Hadlock et al.42 | 1982 | USA | 400 | 400 | 15–41 | HC | C1 | Unclear | Unclear | 43
Hadlock et al.45 | 1982 | USA | 338 | 338 | 12–40 | FL | CS | Unclear | No | 35
Hadlock et al.43 | 1982 | USA | 400 | 400 | 15–41 | AC | C1 | Unclear | Unclear | 39
Hadlock et al.44 | 1982 | USA | 533 | 533 | 12–40 | BPD | C1 | Unclear | Unclear | 43
Hoffbauer et al.46 | 1979 | Germany | 430 | 800 | 12–40 | BPD, HC, AC | CM | Unclear | Unclear | 9
Issel et al.48 | 1975 | Germany | Unclear | 5400 | 12–43 | BPD | CS | Unclear | Unclear | 9
Jeanty et al.52 | 1981 | Belgium | 450 | Unclear | 15–40 | FL | CS | Unclear | Unclear | 17
Jeanty et al.49 | 1984 | Belgium | 45 | 695 | 12–40 | AC | LL | Prospective | Yes | 67
Jeanty et al.51 | 1984 | Belgium | 45 | 696 | 10–40 | BPD, HC | LL | Prospective | Yes | 63
Jeanty et al.50 | 1984 | Belgium | 45 | 646 | 12–40 | FL | LL | Prospective | Yes | 63
Johnsen et al.53 | 2006 | Norway | 650 | 2489–2589 | 10–42 | BPD, HC, AC, FL | LL | Prospective | Yes | 67
Jung et al.54 | 2007 | South Korea | 10 455 | 10 455 | 12–40 | BPD, HC, AC, FL | C1 | Retrospective | No | 61
Kurmanavicius et al.55 | 1999 | Switzerland | 6557 | 5462–6217 | 12–42 | BPD, HC | C1 | Retrospective | No | 65
Kurmanavicius et al.56 | 1999 | Switzerland | 6557 | 5807–5860 | 12–42 | AC, FL | C1 | Retrospective | No | 65
Lai & Yeo57 | 1995 | Singapore | 6374 | 6017–6131 | 14–41 | BPD, HC, AC, FL | C1 | Retrospective | No | 61
Larsen et al.58 | 1990 | Denmark | 35 | Unclear | 14–40 | BPD, AC, FL | LC | Unclear | Unclear | 42
Lei & Wen59 | 1998 | China | 5496 | 5496 | 16–40 | BPD, HC, AC, FL | C1 | Prospective | No | 22
Lessoway et al.60 | 1998 | Canada | 1396 | 747–790 | 11–42 | BPD, HC, AC, FL | C1 | Prospective | No | 43
Leung et al.61 | 2008 | China | 709 | 679–708 | 12–40 | BPD, HC, AC, FL | C1 | Prospective | Yes | 70
Levi et al.63 | 1973 | Belgium | 1011 | 3032 | 15–43 | BPD | CM | Unclear | Unclear | 17
Levi et al.62 | 1975 | Belgium | Unclear | 767 | 19–43 | BPD, HC | CS | Unclear | Unclear | 9
Lu et al.64 | 2008 | Taiwan | Unclear | 50 131 | 14–41 | AC | CS | Retrospective | No | 35
Mathai et al.65 | 1995 | India | 120 | 477–498 | 20–40 | BPD, AC, FL | LC | Prospective | Yes | 17
Merz et al.66 | 1987 | Germany | 530 | 530 | 13–42 | FL | C1 | Prospective | Unclear | 26
Munjanja et al.67 | 1988 | Zimbabwe | 190 | 857–1233 | 12–40 | BPD, HC | LC | Unclear | Unclear | 13
Munoz et al.68 | 1986 | South Africa | 1842 | 1842 | 15–40 | BPD | C1 | Unclear | Unclear | 52
Nasrat & Bondagii69 | 2005 | Saudi Arabia | Unclear | Unclear | 14–40 | BPD, HC, AC, FL | C1 | Retrospective | No | 22
Neufeld et al.70 | 2004 | Guatemala | 319 | 684 | 14–38 | BPD, HC, AC, FL | LC | Prospective | Yes | 33
O’Brien & Queenan71 | 1981 | USA | 411 | 1016 | 14–40 | FL | CM | Unclear | Unclear | 22
Okonofua et al.72 | 1988 | Nigeria | 200 | 219 | 20–40 | AC | CM | Unclear | No | 17
Okupe et al.73 | 1984 | Nigeria | 552 | 1104 | 12–40 | BPD | CM | Unclear | Unclear | 17
Paladini et al.74 | 2005 | Italy | 626 | 623–625 | 16–40 | BPD, HC, AC, FL | C1 | Prospective | Yes | 65
Pang et al.75 | 2003 | China | 500 | 2349 | 24–40 | BPD, HC, AC, FL | LL | Prospective | Yes | 50
Persson et al.76 | 1978 | Sweden | 93 | 707 | 15–42 | BPD | LC | Unclear | Yes | 46
Persson & Weldner77 | 1986 | Sweden | 19 | 167 | 11–39 | BPD, FL | LC | Prospective | Yes | 42
Pineau et al.78 | 2003 | France | 1336 | 1336 | 12–38 | BPD, HC | C1 | Retrospective | No | 39
Queenan et al.79 | 1976 | USA | 468 | 738 | 17–43 | BPD | CM | Unclear | Unclear | 13
Saksiriwuttho et al.80 | 2007 | Thailand | 628 | 628 | 14–41 | BPD, HC, AC, FL | C1 | Prospective | Yes | 35
Salomon et al.81 | 2006 | France | Unclear | 19 647 | 15–40 | BPD, HC, AC, FL | CS | Unclear | No | 48
Schluter et al.82 | 2004 | Australia | 17 660 | 15 871–20 520 | 11–41 | BPD, HC, AC, FL | C1 | Unclear | No | 61
Shohat & Romano-Zelekha83 | 2001 | Israel | 1143 | 1143 | 13–41 | BPD, FL | C1 | Prospective | No | 22
Siwadune et al.84 | 2000 | Thailand | 613 | 613 | 12–41 | BPD | C1 | Prospective | Yes | 65
Smulian et al.85 | 2001 | USA | 10 070 | 10 070 | 11–42 | AC | CS | Retrospective | No | 43
Snijders & Nicolaides86 | 1994 | UK | 1040 | 1040 | 14–40 | BPD, HC, AC, FL | C1 | Retrospective | No | 61
Sunsaneevithayakul et al.87 | 2000 | Thailand | 615 | 615 | 12–41 | AC | C1 | Prospective | Yes | 70
Tamura & Sabbagha88 | 1980 | USA | 200 | 536 | 18–41 | AC | CM | Unclear | Unclear | 9
Thame et al.89 | 2003 | Jamaica | 499 | 2574 | 13–37 | BPD, HC, AC, FL | LC | Prospective | Unclear | 42
Titapant et al.90 | 2000 | Thailand | 608 | 608 | 12–41 | FL | C1 | Prospective | Yes | 70
Todros et al.91 | 1987 | Italy | 1426 | 603–1283 | 12–42 | BPD, HC, AC | C1 | Unclear | Unclear | 30
Verburg et al.92 | 2008 | Netherlands | 8313 | 20 277–22 271 | 10–40 | BPD, HC, AC, FL | LL | Prospective | Yes | 75
Warda et al.93 | 1985 | USA | 254 | 254 | 13–39 | FL | C1 | Unclear | No | 57
Westerway et al.95 | 2000 | Australia | 3800 | Unclear | 11–40 | BPD, HC, AC, FL | CM | Unclear | No | 22
Wladimiroff et al.96 | 1978 | Netherlands | 303 | 303 | 24–41 | BPD | C1 | Unclear | No | 17

Note: AC, abdominal circumference; BPD, bi-parietal diameter; FL, femur length; HC, head circumference; CM, cross-sectional data, several scans per fetus; C1, cross-sectional data, one scan per fetus; CS, cross-sectional data, number of scans per fetus not stated; LC, longitudinal data with cross-sectional analysis; LL, longitudinal data with longitudinal analysis.

In only 20 of the 83 studies (24%) were the ultrasound data collected prospectively and explicitly for research purposes, whereas, in 13 studies (16%), a retrospective analysis of an existing database was performed. In the remaining studies, it was unclear whether a prospective or retrospective design was used, or whether the ultrasound examinations were performed for research purposes or as part of routine clinical care.

The frequencies of ‘low risk of bias’ in each of the three groups of methodological criteria are presented in Figures 2–4. Highest risk of bias was noted in the following fields: ‘Inclusion/exclusion criteria’, where none of the studies defined a rigorous set of antenatal or fetal conditions that should be excluded from analysis in order to ensure a normal pregnancy outcome (Figure 2, item 1.6); ‘Ultrasound quality control measures’, where no study demonstrated a comprehensive quality assurance strategy (Figure 4, item 3.6); and ‘Sample size calculation’, which was apparent in only six studies (Figure 2, item 1.7).

Figure 2. Methodological quality of included studies: study design.

Figure 3. Methodological quality of included studies: statistical methods.

Figure 4. Methodological quality of included studies: reporting methods.

Although individual participant selection criteria, such as those shown in Figure 5, were applied in various studies, no study systematically applied all of them. Conversely, in 22 studies (27%), inappropriate exclusions were applied, such as removing from the final analysis cases in the outer percentiles of the ultrasound measurement or birthweight.

Figure 5. Exclusion criteria used in the included studies.

None of the studies in this review used a comprehensive ultrasound quality control strategy incorporating the items in Figure 6. In approximately one-half of the studies (40 studies), ultrasound examinations were performed by multiple sonographers, yet an exercise to standardise participating sonographers was reported in only four studies. No study reported the use of an image scoring method for the purpose of ultrasound quality assurance; in only four studies was it ensured that sonographers were blind to the actual measurement recorded during the examination.

Figure 6. Ultrasound quality assurance measures in the included studies.

Table 4 shows the different methods used for gestational age estimation. Last menstrual period (LMP) used in isolation remains the most popular method. Only 12 studies (14%) used a dating method considered to be at low risk of bias, namely either CRL alone or LMP confirmed by CRL.

Table 4.   Dating methods used in the included studies

Type of dating | Number of studies (%)
LMP only | 37 (45)
LMP confirmed by US parameter (non-CRL) | 21 (25)
LMP confirmed by CRL | 9 (11)
Other | 6 (7)
Not stated | 5 (6)
CRL only | 3 (4)
US parameter only (non-CRL) | 2 (2)

Note: CRL, crown–rump length; LMP, last menstrual period; US, ultrasound.
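The percentages in Table 4 and the count of studies with low-risk dating can be reproduced from the reported counts. The sketch below uses only the figures given in the table; the variable names are illustrative.

```python
# Counts of studies per dating method, as reported in Table 4.
dating_counts = {
    "LMP only": 37,
    "LMP confirmed by US parameter (non-CRL)": 21,
    "LMP confirmed by CRL": 9,
    "Other": 6,
    "Not stated": 5,
    "CRL only": 3,
    "US parameter only (non-CRL)": 2,
}

# Methods classified in the review as 'low risk of bias'.
LOW_RISK_METHODS = {"LMP confirmed by CRL", "CRL only"}

total_studies = sum(dating_counts.values())
percentages = {
    method: round(100 * count / total_studies)
    for method, count in dating_counts.items()
}
low_risk_studies = sum(dating_counts[m] for m in LOW_RISK_METHODS)
low_risk_percent = round(100 * low_risk_studies / total_studies)

print(total_studies, low_risk_studies, low_risk_percent)  # 83 12 14
```

The counts sum to the 83 included studies, and the two low-risk methods together account for the 12 studies (14%) quoted in the text.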

Results from individual studies were reported in the form of tables, equations or charts as demonstrated in Figure 7. Although tables of median values (68 studies) and tables of percentile ranges (65 studies) were a common method of presentation, half of the time these contained the raw unmodelled data; fitted median values following analytical modelling were presented in 35 studies and fitted percentiles in 33 studies. An equation for the median was reported in 50 of 83 studies, whereas the standard deviation was mathematically expressed in 32 of 83 studies, either as a fixed number or as a function of gestation. Printed charts of the median and percentile curves were seen in the vast majority of the publications.
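To illustrate how a reported equation for the mean and an expression for the SD translate into fitted percentiles under a ‘mean and SD model’, the following sketch assumes that measurements are normally distributed at each gestational age. The two placeholder equations and their coefficients are invented for illustration only and do not come from any of the included studies.

```python
from statistics import NormalDist


def fitted_centile(ga_weeks, percentile, mean_fn, sd_fn):
    """Centile at a given gestational age: mean(GA) + z * SD(GA).

    Assumes a normal distribution of the measurement at each gestational age.
    """
    z = NormalDist().inv_cdf(percentile / 100.0)
    return mean_fn(ga_weeks) + z * sd_fn(ga_weeks)


# HYPOTHETICAL placeholder equations (not from any published chart).
def placeholder_mean(ga_weeks):
    return -30.0 + 10.0 * ga_weeks  # measurement in mm


def placeholder_sd(ga_weeks):
    return 1.0 + 0.2 * ga_weeks  # SD increasing with gestation


p5 = fitted_centile(20, 5, placeholder_mean, placeholder_sd)
p50 = fitted_centile(20, 50, placeholder_mean, placeholder_sd)
p95 = fitted_centile(20, 95, placeholder_mean, placeholder_sd)
print(p5, p50, p95)
```

A study that reports only a printed chart, without the equations, cannot be used in this way, which is one reason the reporting of regression equations is included among the quality criteria.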

Figure 7. Use of presentation methods in the included studies.

On univariate regression analysis, positive predictors of quality score were the year of publication (P < 0.001), sample size of participating women (P = 0.04) and teaching (as opposed to nonteaching) hospital status (P = 0.003). On multiple regression analysis, however, the effects of sample size and hospital type became nonsignificant, and only the year of publication persisted as a significant predictor of quality score, with a coefficient of determination R² = 0.24. A scatterplot of quality scores versus year of publication is shown in Appendix S5.


This review has revealed substantial heterogeneity of methodology used in ultrasound studies of fetal biometry. A predefined quality scoring sheet was used in the assessment of the included studies. This checklist is not intended to commend or discard studies, but rather to be used as a consensus guideline in order to improve consistency in fetal growth research.

This review has several strengths. In the literature search, there were no restrictions by year of publication, as some of the older ultrasound charts may still be used in current clinical practice. The quality criteria were based on the best available evidence and were agreed independent of the reviewers who performed the data abstraction. The use of a quality score in percentage form allowed an objective rather than empirical assessment of quality and also enabled regression analyses in order to identify temporal or other trends. However, there are also some limitations. English language restriction was imposed and it is possible that studies of normative fetal biometry from non-English-speaking countries may have been missed. This was imposed for practical reasons; unlike systematic reviews of treatment effect, where it is imperative that all evidence is found, our aim was to assess the methodological quality of studies on fetal size. Another limitation was that the reviewers who performed the data abstraction were not blind to the origin and authors of the included studies. Finally, the older studies in this review were tested against some quality criteria which have only been established in recent years. Although it may seem unfair to judge previously published work by today’s standards, it is important for clinicians who choose amongst any of the published ultrasound charts to have an up-to-date assessment of their methodological quality.

Multiple regression analysis demonstrated that quality scores have significantly improved in recent years. This is logical as the present criteria of quality have been developed through gradual improvement in both the ultrasound technology and the statistical methods of data analysis. However, considerable heterogeneity still persists today: even in the last decade, quality scores of published studies have varied considerably around the average trend. It is possible that such differences in study quality may explain discrepancies in the reported fetal size curves between different populations. Clearly defined and consistent methodology is therefore necessary in order to critically appraise such population differences.

For instance, none of the included studies scored a ‘low risk of bias’ for their inclusion and exclusion criteria. This scoring field considered that a study should exclude women with conditions and outcomes associated with pathological fetal growth. The aim of a fetal size chart should be to depict how infants should grow under optimal conditions (a ‘prescriptive’ standard) rather than how they often grow (a ‘descriptive’ reference).97 To achieve this, it is necessary to consider factors that influence growth. A number of such factors are well established: maternal smoking;98 maternal disease, e.g. chronic hypertension or diabetes;99 pregnancy-induced hypertension;100 pre-eclampsia;100 abnormal karyotype;99 congenital anomalies;99 pre-term delivery;101 stillbirth.102 Some of these conditions were excluded in one or more studies in this review, but no study excluded all of them.

Authors have previously argued against this, claiming that an unselected population ensures a better representation of the underlying population,9 or that some exclusion criteria, such as maternal smoking for instance, are not reasonable.9,103 However, if strong evidence exists that each of these factors affects fetal growth, then all should be excluded. It is unfortunate that the term ‘supernormalisation’103 has acquired a negative connotation in the past.

Conversely, it is arbitrary and illogical to exclude from analysis the outer percentiles of the ultrasound measurement values. For instance, measurements greater than 1.66 or two standard deviations from the mean31,64 may well represent physiological healthy variability. Similarly, exclusion of cases with birthweight in the outer percentiles has been used by some studies80,91 in order to define normal growth. The fundamental concern with this approach is the risk of excluding fetuses that are healthy, but constitutionally small or large for gestational age.
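The effect of discarding outer values on the estimated spread can be shown with a small simulation. This is a sketch under the assumption of normally distributed z-scores; the seed, sample size and two-SD cut-off are arbitrary choices for illustration, and the data are simulated rather than taken from any study.

```python
import random
import statistics

random.seed(2012)  # fixed seed so the sketch is reproducible

# Simulated z-scores of a measurement in healthy fetuses (true SD = 1).
full_sample = [random.gauss(0.0, 1.0) for _ in range(20000)]

# Inappropriate exclusion: drop everything beyond two SD from the mean.
trimmed_sample = [z for z in full_sample if abs(z) <= 2.0]

sd_full = statistics.stdev(full_sample)
sd_trimmed = statistics.stdev(trimmed_sample)

print(f"SD before exclusion: {sd_full:.2f}")
print(f"SD after exclusion:  {sd_trimmed:.2f}")
print(f"Cases removed: {len(full_sample) - len(trimmed_sample)}")
```

For a normal distribution truncated at two SD, theory gives an SD of about 0.88 of the original, so percentile curves derived from the trimmed sample would be too narrow and healthy fetuses near the true outer percentiles would be flagged as abnormal.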

Only one-quarter of the studies in this review had a prospective design in which ultrasound examinations were performed for research purposes only. This is an important point. Most clinical ultrasound services now routinely collect information in computerised databases. Retrospective analysis of such databases is a practical solution to the generation of a large sample size; however, the ability of the researchers to address potential confounders is curtailed. For instance, reference curves can be skewed by the fact that a proportion of the examinations are clinically indicated as a result of suspected pathological growth. It is also difficult to retrospectively ascertain maternal or fetal complications, unless a reliable coding system is in place. The alternative is a prospective study. Strictly speaking, this indicates that the study design, participant recruitment and collection of clinical, demographic and ultrasound data are carried out with the objective of creating size charts. As a result of inconsistent terminology and ambiguous description, it was often difficult to identify prospective studies from retrospective database analyses in this review.

Accurate estimation of (gestational) age is a fundamental prerequisite for creating any size chart. It is already recognised that suboptimal dating in older ultrasound studies may have contributed to the observed flattening or reversal of growth at term.104 In addition, in studies up to the mid-1980s, the research objectives often overlapped between the assessment of fetal size and pregnancy dating; several studies used a biometric parameter, such as BPD, both as a means of dating and as a predictor of growth, in an almost circular fashion.31,42,45

Several different dating strategies were encountered in this review. On certain occasions, dating was either vaguely stated or not described at all. There is now robust evidence that early ultrasonographic determination of gestational age is more reliable than LMP alone.105 It has also been argued that, for clinical management, ultrasonography alone may be marginally superior to LMP confirmed by ultrasound,106 although such differences are small. However, when the objective is chart creation, the use of ultrasound for dating and then again for the assessment of fetal size creates a circular argument; for instance, it is possible that a smaller than expected CRL measurement may be indicative of early growth restriction or adverse pregnancy outcome,107 but such an association will be missed if the gestational age is recalculated on the basis of CRL. Therefore, in the case of studies aiming to create fetal size charts, we believe that it is conceptually more appropriate to estimate the date of confinement using an independent method, such as the LMP, provided that this is corroborated by CRL measurement. In the methodological assessment, we classified as ‘low risk of bias’ studies that used dating either by CRL alone, or by LMP corroborated by CRL. In any case, the parameter used for sonographic dating and the gestational window during which this is applied should always be clearly specified, in order to demonstrate that methods are adequately standardised; this also highlights the recommended dating practice for institutions that wish to adopt these fetal size charts.

Considerable variation exists in the collection of ultrasound data, the number of sonographers and scan machines used, measurement method definition, standardisation and the monitoring of ultrasound data quality. We have proposed a comprehensive quality control strategy which includes saving and independently reviewing scan images, the use of an image scoring method and the assessment of intra- and interobserver variability of measurement. Studies with multiple sonographers are preferable as they reflect real clinical practice, provided that such strategies for quality assurance are in place. Explicit description of measurement planes and calliper application conventions is necessary. A formal standardisation exercise prior to the start of a multi-sonographer study has been shown to increase data consistency.108 Blinding of the sonographers to their own measurements is a reasonable effort to remove observer bias.
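One common way to summarise interobserver variability is the Bland-Altman 95% limits of agreement; this review does not prescribe a particular statistic, so the choice here is an assumption. The sketch below uses invented paired measurements for illustration only, and the function name `limits_of_agreement` is hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(obs_a, obs_b):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(obs_a, obs_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - spread, bias + spread


# Hypothetical HC measurements (mm) of the same five fetuses by two sonographers
sonographer_a = [262.1, 297.4, 310.2, 255.0, 289.3]
sonographer_b = [260.8, 298.9, 308.5, 256.2, 290.0]

bias, lower, upper = limits_of_agreement(sonographer_a, sonographer_b)
print(f"bias {bias:.2f} mm, limits of agreement {lower:.2f} to {upper:.2f} mm")
```

The same function applies to intraobserver variability if the two lists hold repeat measurements by a single sonographer.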

A consensus has been reached in recent years regarding the appropriate methods for statistical modelling and data presentation.9–11 Presentation of the raw measurement data only is not clinically useful or informative. Both the median and variance should be modelled as a function of gestational age in a manner that accounts for the increasing variability with gestation and provides smooth percentile curves; goodness of fit testing should demonstrate that these curves describe accurately the structure of the raw data.10 Finally, the results should be presented in the form of tables of fitted percentile values, gestational curve charts and regression equations for both the mean and standard deviation.
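As a minimal sketch of how regression equations for the mean and standard deviation translate into a chart, a fitted centile at a given gestational age can be computed as the mean plus a normal deviate multiplied by the SD. This assumes normally distributed measurements at each gestational age; the function name `fitted_centile` is hypothetical. The input values are the fitted 50th centile and SD for BPD at 28 weeks reported by Chitty et al.28 in Table 5.

```python
from statistics import NormalDist


def fitted_centile(mean_mm: float, sd_mm: float, centile: float) -> float:
    """Value at the requested centile (0-100), given the fitted mean and SD."""
    z = NormalDist().inv_cdf(centile / 100.0)  # e.g. about -1.28 for the 10th centile
    return mean_mm + z * sd_mm


# BPD at 28 weeks: fitted 50th centile 73.35 mm, fitted SD 3.17 mm (Table 5)
p10 = fitted_centile(73.35, 3.17, 10)
print(f"10th centile: {p10:.2f} mm")
```

The result agrees with the 10th centile of 69.29 mm tabulated for the same chart, which illustrates why publishing both equations allows any centile or z-score to be recovered.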

When we assessed measurement differences from those studies that had the highest scores (Table 5), it was noted that, for the majority of biometric parameters, significant heterogeneity remained: for example, a BPD measurement of 88 mm at 36 weeks is around the 50th centile of one chart,74 whereas the same fetus would be below the 10th centile by another,92 and the same is seen at various other gestational ages. Differences in AC were similarly wide, with overlap between 10th and 50th centiles from different studies evident.49,53 However, equivalent centiles of HC seem to be very similar between studies, with most differences well within 0.5 SD. This may be a reflection of the lower variability in HC measurements across different countries because of the properties of HC as a marker of fat-free mass, or may simply be caused by the fact that HC is a parameter that is more consistently measured in different locations. In other words, whether these differences in biometric parameters are a result of different measurement methodologies or differences in population characteristics is impossible to establish using this approach. It is only when studies of fetal size of the highest methodological quality are performed uniformly in different populations that differences in fetal size can be properly appraised or attributed to biological determinants.
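The BPD example can be checked with a short calculation that converts one measurement into a z-score and centile on two different charts. This is a sketch that assumes normally distributed measurements at 36 weeks; the function name `centile_of` is hypothetical, and the fitted 50th centiles and SDs are taken from Table 5.

```python
from statistics import NormalDist


def centile_of(measurement_mm: float, median_mm: float, sd_mm: float):
    """Return (z-score, centile) of a measurement on a chart with the given median and SD."""
    z = (measurement_mm - median_mm) / sd_mm
    return z, 100.0 * NormalDist().cdf(z)


# Fitted 50th centile and SD (mm) for BPD at 36 weeks, from Table 5
charts = {
    "Paladini et al.": (88.10, 3.12),
    "Verburg et al.": (93.55, 3.63),
}

results = {name: centile_of(88.0, median, sd) for name, (median, sd) in charts.items()}
for name, (z, centile) in results.items():
    print(f"{name}: z = {z:.2f}, centile = {centile:.1f}")
```

Under these assumptions, the same 88 mm measurement lies close to the 50th centile on one chart and well below the 10th centile on the other, a gap of roughly 1.5 SD.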

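Table 5.   Comparison of measurement values amongst highest scoring studies

(A) Bi-parietal diameter (BPD) studies

| Reference | 28 weeks: 10th centile | 28 weeks: 50th centile | 28 weeks: SD | 32 weeks: 10th centile | 32 weeks: 50th centile | 32 weeks: SD | 36 weeks: 10th centile | 36 weeks: 50th centile | 36 weeks: SD |
|---|---|---|---|---|---|---|---|---|---|
| Chitty et al.28 | 69.29 | 73.35 | 3.17 | 78.95 | 83.32 | 3.41 | 86.78 | 91.45 | 3.64 |
| Verburg et al.92 | 70.00 | 73.68 | 2.87 | 80.22 | 84.38 | 3.24 | 88.90 | 93.55 | 3.63 |
| Leung et al.61 | 68.09 | 72.20 | 3.21 | 78.04 | 82.33 | 3.35 | 85.55 | 90.02 | 3.49 |
| Johnsen et al.53 | 69.00 | 73.00 | 3.12 | 78.00 | 83.00 | 3.90 | 86.00 | 91.00 | 3.90 |
| Paladini et al.74 | 66.20 | 70.90 | 3.67 | 75.70 | 80.30 | 3.59 | 84.10 | 88.10 | 3.12 |
| Kurmanavicius et al.55 | 70.49 | 74.94 | 3.47 | 79.94 | 84.74 | 3.74 | 87.44 | 92.56 | 3.99 |
| Siwadune et al.84 | 65.71 | 68.32 | 2.04 | 74.93 | 77.91 | 2.32 | 81.75 | 85.15 | 2.65 |

(B) Head circumference (HC) studies

| Reference | 28 weeks: 10th centile | 28 weeks: 50th centile | 28 weeks: SD | 32 weeks: 10th centile | 32 weeks: 50th centile | 32 weeks: SD | 36 weeks: 10th centile | 36 weeks: 50th centile | 36 weeks: SD |
|---|---|---|---|---|---|---|---|---|---|
| Chitty et al.28 | 249.10 | 262.50 | 10.45 | 282.70 | 297.30 | 11.39 | 309.00 | 324.80 | 12.32 |
| Verburg et al.92 | 251.90 | 262.70 | 8.42 | 285.40 | 297.80 | 9.67 | 309.50 | 323.50 | 10.92 |
| Leung et al.61 | 246.90 | 258.76 | 9.25 | 281.22 | 293.54 | 9.61 | 305.60 | 318.38 | 9.97 |
| Johnsen et al.53 | 246.00 | 259.00 | 10.14 | 279.00 | 293.00 | 10.92 | 304.00 | 320.00 | 12.48 |
| Paladini et al.74 | 250.00 | 263.50 | 10.53 | 280.90 | 296.10 | 11.86 | 303.30 | 320.00 | 13.03 |
| Kurmanavicius et al.55 | 248.43 | 262.73 | 11.15 | 279.87 | 295.32 | 12.05 | 303.26 | 319.87 | 12.96 |

(C) Abdominal circumference (AC) studies

| Reference | 28 weeks: 10th centile | 28 weeks: 50th centile | 28 weeks: SD | 32 weeks: 10th centile | 32 weeks: 50th centile | 32 weeks: SD | 36 weeks: 10th centile | 36 weeks: 50th centile | 36 weeks: SD |
|---|---|---|---|---|---|---|---|---|---|
| Chitty et al.26 | 212.92 | 230.57 | 13.77 | 248.97 | 269.71 | 16.18 | 282.59 | 306.41 | 18.58 |
| Verburg et al.92 | 224.30 | 239.40 | 11.78 | 261.10 | 278.60 | 13.65 | 292.10 | 312.20 | 15.68 |
| Leung et al.61 | 218.95 | 233.90 | 11.66 | 256.61 | 273.56 | 13.22 | 290.68 | 309.64 | 14.79 |
| Sunsaneevithayakul et al.87 | 217.04 | 232.58 | 12.12 | 254.89 | 273.26 | 14.33 | 289.72 | 309.12 | 15.13 |
| Johnsen et al.53 | 223.00 | 240.00 | 13.26 | 262.00 | 282.00 | 15.60 | 299.00 | 321.00 | 17.16 |
| Jeanty et al.49 | 208.01 | 225.24 | 13.44 | 245.13 | 262.36 | 13.44 | 276.07 | 293.30 | 13.44 |
| Paladini et al.74 | 218.20 | 239.30 | 16.46 | 252.30 | 275.00 | 17.71 | 284.30 | 307.00 | 17.71 |
| Kurmanavicius et al.56 | 213.21 | 231.78 | 14.49 | 249.60 | 270.61 | 16.39 | 283.33 | 306.78 | 18.29 |

(D) Femur length (FL) studies

| Reference | 28 weeks: 10th centile | 28 weeks: 50th centile | 28 weeks: SD | 32 weeks: 10th centile | 32 weeks: 50th centile | 32 weeks: SD | 36 weeks: 10th centile | 36 weeks: 50th centile | 36 weeks: SD |
|---|---|---|---|---|---|---|---|---|---|
| Chitty et al.27 | 49.25 | 52.70 | 2.69 | 57.43 | 61.18 | 2.93 | 64.14 | 68.19 | 3.16 |
| Verburg et al.92 | 49.68 | 52.47 | 2.18 | 57.43 | 60.46 | 2.36 | 63.24 | 66.52 | 2.56 |
| Leung et al.61 | 47.22 | 50.02 | 2.18 | 55.20 | 58.17 | 2.32 | 62.33 | 65.46 | 2.44 |
| Titapant et al.90 | 46.45 | 48.95 | 1.95 | 54.39 | 57.10 | 2.11 | 60.95 | 63.89 | 2.29 |
| Johnsen et al.53 | 48.00 | 52.00 | 3.12 | 56.00 | 60.00 | 3.12 | 63.00 | 67.00 | 3.12 |
| Paladini et al.74 | 49.50 | 52.40 | 2.26 | 56.90 | 60.50 | 2.81 | 62.90 | 67.20 | 3.35 |
| Kurmanavicius et al.56 | 48.55 | 52.30 | 2.93 | 56.96 | 60.85 | 3.03 | 64.28 | 68.28 | 3.12 |

For each parameter, the five highest scoring studies were identified; additional studies were included if more than one study occupied the fifth place; one study by Ashrafunnessa et al.17 is not included despite having the third highest quality score (70%), because data for the fitted centiles were not presented, meaning that we could not derive values for this table. All BPD measurements were outer to outer, except in Siwadune et al.,84 where the outer to inner convention was used; all measurements quoted in millimetres; all centiles are fitted centiles following modelling; SD, fitted standard deviation.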


Conclusion

This systematic review has demonstrated considerable heterogeneity of design in ultrasound studies of fetal biometry. The use of uniform methodology of the highest quality is essential in order to establish whether population differences in fetal measurements are biological or caused by differences in measurement. A checklist of the recommended design is proposed in order to aid such uniformity.

Disclosure of interests

CI, IS, EO, JV and ATP are part of the INTERGROWTH-21st project, an international study of fetal growth (

Contribution to authorship

CI, KT and ATP designed the study, analysed the data, interpreted the results, drafted the manuscript and made the decision to submit. AC-A, JV and ATP defined the quality criteria a priori. CI, KT and EO extracted the data. All authors had full access to the data, interpreted the results, edited and approved the final manuscript.

Details of ethical approval

No ethics approval was required.


Funding

This project was supported by INTERGROWTH-21st (Grant ID# 49038) from the Bill & Melinda Gates Foundation to the University of Oxford, for which we are very grateful. CI and ATP are supported by the Oxford Partnership Comprehensive Biomedical Research Centre with funding from the Department of Health NIHR Biomedical Research Centres funding scheme.


Acknowledgements

We would like to thank Nia Roberts, outreach librarian at the Bodleian Health Care Libraries, for her assistance in the literature search.