Statistical Opinion

# Statistical methods for constructing gestational age-related reference intervals and centile charts for fetal size

Article first published online: 3 JAN 2007

DOI: 10.1002/uog.3911

Copyright © 2007 ISUOG. Published by John Wiley & Sons, Ltd.

#### How to Cite

Silverwood, R. J. and Cole, T. J. (2007), Statistical methods for constructing gestational age-related reference intervals and centile charts for fetal size. Ultrasound Obstet Gynecol, 29: 6–13. doi: 10.1002/uog.3911

#### Publication History

- Issue published online: 3 JAN 2007
- Article first published online: 3 JAN 2007

### Introduction

Many fetal size variables, for example head measurements, abdominal measurements and femur length, increase over the course of gestation. Reference intervals (RIs) and centile charts provide a means of assessing these measurements, at a given gestational age (GA) or across a range of GAs, respectively, and are tools of great importance in clinical medicine.

RIs (sometimes, misleadingly, called ‘normal ranges’) represent the interval between a pair of symmetrically placed extreme centiles (such as the 5^{th} and 95^{th} for a 90% interval) of a size variable, denoted *y*, at a given GA. Centile charts plot the values of *y* corresponding to one or more RIs against the relevant GA over a range of GAs. In the field of fetal size, values which lie outside the RI are regarded as extreme and may indicate the presence of a disorder such as intrauterine growth restriction1 or macrosomia2. More informative, however, than this forced dichotomy is the calculation of a value's centile position, or *Z*-score, relative to the reference population, estimated from knowledge of the distribution of *y* at a given GA. For a given observation, the proximity of the centile position to 0% or 100% (alternatively the magnitude and sign of the *Z*-score) is then a measure of how extreme the observation is compared to the reference data at that GA. A centile position above 50% (equivalently a positive *Z*-score) signifies a measurement greater than average for that GA, and a centile position below 50% (or a negative *Z*-score) one less than average.

While recent years have seen the publication of a variety of strategies for the construction of RIs, incorrect methods have still been used for fetal measurements of all kinds1. The choice of suitable methodology in this field is especially crucial as inaccurate centiles may lead to false conclusions regarding the development of the fetus, resulting in suboptimal clinical care.

In an article in this issue of the Journal, Sherer *et al.*3 construct centile charts of the axial cerebellar hemisphere circumference (CHC) and area (CHA) through gestation using one such method, based upon regression modelling of both the mean and the standard deviation (SD) across GA, as detailed by Altman and Chitty4 and Royston and Wright1.

It is the aim of the present article to further examine the statistical approach used by Sherer *et al.*3, while taking a more general look at the problem of constructing GA-related RIs and considering alternative approaches to this problem. Techniques for longitudinal data, where each subject contributes repeated observations, as opposed to cross-sectional data, where they contribute only one, require a different approach and are not considered here. Further information on this area can be found in, for example, Royston and Altman5 and Royston6.

While many of the techniques explored here could be, and indeed have been, used in the context of anthropometric measurements, the focus here is on applications in the field of fetal size.

### The general problem

Prior to the statistical analysis, many RIs and charts for fetal size are already flawed by weaknesses in study design. As with any study, the choice of an appropriate sample is of great importance. While some published studies use routinely collected data, resulting in the inclusion of multiple observations on some fetuses, Altman and Chitty4 note that these fetuses are likely to be those with clinical indications, introducing bias to the sample. They advocate collecting data specifically for the purpose of developing the RI, with each fetus being included only once. Within this framework it is important to have as unselected a sample as possible because reference data should relate to ‘normal’ fetuses. Altman and Chitty4 suggest that it is reasonable to exclude fetuses subsequently found to have a congenital abnormality, though they recommend the inclusion of neonatal deaths and fetuses large or small for dates at birth where this is not the case. Maternal conditions which could affect fetal growth are also deemed reasonable exclusion criteria.

While imprecise estimates of the RI will be obtained when the sample size of the dataset is too small1, it is not easy to accurately specify appropriate sample sizes. In particular, when interest is focused on the extreme centiles, as is often the case, several hundred observations may be necessary to obtain estimates at an appropriate level of precision.

There are a variety of available statistical approaches for the calculation of RIs, the most important of which are to be reviewed presently. The method needs to produce reference centiles which change smoothly with GA and provide a good fit to the data. While clearly these requirements are essential, it is also preferable, for the sake of general usability and accessibility, to maintain as simple a statistical model as possible. Accordingly, the choice of approach must strike a balance between these conditions. It is also desirable that tools are available for calculating the relevant centile positions and *Z*-scores for any further measurements, which again should be as user-friendly as possible in their application. Not only is the calculation of *Z*-scores useful on an observation-specific level, it has also been shown to be instrumental in the assessment of chart comparison7 and quality control8.

### Mean and SD model

The statistical approach followed by Sherer *et al.*3, here referred to as the ‘mean and SD model’, is one which has been found to be sufficiently general to cope with a wide range of fetal measurements available from ultrasound scanning1. Generally, under the assumption that at each GA the measurement of interest has a Gaussian (or normal) distribution with mean and SD that vary smoothly with GA, the centile curve at a given GA may be calculated by:

$$\text{centile} = \text{mean}_{GA} + K \times \text{SD}_{GA} \qquad (1)$$

where mean_{GA} and SD_{GA} are, respectively, the mean and SD at the required GA, and *K* is the desired normal equivalent deviate (NED). The NED takes a value corresponding to the proportion of the standard normal distribution (with mean of 0 and SD of 1) lying to the left of it. For example, the 50^{th} centile (with a proportion of 0.5 of the standard normal distribution to the left of it) has an NED of 0, while the determination of a 90% reference range (i.e. the 5^{th} and 95^{th} centile curves) would require *K* = ±1.645.
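As a minimal sketch of how *K* relates to a centile, the following Python snippet uses the standard library's `statistics.NormalDist`. The function name `centile_value` and the mean and SD values are hypothetical illustrations, not values from any of the studies cited.

```python
from statistics import NormalDist

def centile_value(mean_ga, sd_ga, centile):
    """Value of y at a centile (0-100), assuming y is Gaussian at this GA."""
    k = NormalDist().inv_cdf(centile / 100.0)  # normal equivalent deviate K
    return mean_ga + k * sd_ga

# Hypothetical mean and SD of a measurement at one gestational age
mean_ga, sd_ga = 50.0, 3.0
lower = centile_value(mean_ga, sd_ga, 5)
upper = centile_value(mean_ga, sd_ga, 95)

print(round(NormalDist().inv_cdf(0.95), 3))  # 1.645
print(round(lower, 2), round(upper, 2))      # 45.07 54.93
```

The 50th centile returns the mean itself, since its NED is 0.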

The ‘mean and SD model’ approach aims to find functions that adequately represent how the mean and SD change with GA, allowing any desired centile curve to be readily calculated by appropriate choice of *K*.

Firstly the mean is modeled by fitting a polynomial curve to the raw data by means of least squares regression analysis. Royston and Wright recommend the initial use of a cubic polynomial (*a* + *bt* + *ct*^{2} + *dt*^{3}, where, for simplicity, GA is represented by *t*)1. If the cubic coefficient, *d*, is not significantly different from zero (approximately, if the absolute value of *d* is less than twice its standard error), a quadratic polynomial (*a* + *bt* + *ct*^{2}) should be fitted, with the same assessment made of the quadratic coefficient, *c*. The process should be repeated until no further removal of terms is possible. While quadratic or cubic curves will often give a good fit to the data, Altman and Chitty4 suggest the linear-cubic model (*a* + *bt* + *dt*^{3}) as a good alternative for fetal size data. It is advocated that the choice of curve be based not only on statistical significance, but also on the quality of fit to the data and esthetic appearance, especially at the extremes of GA. Sherer *et al.* found a linear model (*a* + *bt*) to be sufficient for the CHC curve and a quadratic polynomial to be suitable for CHA3.
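The stepwise reduction from a cubic can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of any cited author: it relies on NumPy's `np.polyfit` with `cov=True` to obtain coefficient standard errors, uses simulated data with made-up coefficients, and the function name `select_mean_degree` is hypothetical. It checks only the highest-order term at each step and does not cover the linear-cubic alternative.

```python
import numpy as np

def select_mean_degree(t, y, max_degree=3):
    """Drop the highest-order term while |coefficient| < 2 * standard error."""
    for degree in range(max_degree, 0, -1):
        # polyfit returns coefficients highest power first, plus their covariance
        coef, cov = np.polyfit(t, y, degree, cov=True)
        se_top = np.sqrt(cov[0, 0])
        if abs(coef[0]) >= 2 * se_top:
            return degree, coef
    return 0, np.array([np.mean(y)])

# Simulated data: a quadratic mean curve with constant SD (illustrative only)
rng = np.random.default_rng(0)
t = rng.uniform(14, 40, 500)
y = 5 + 3 * t - 0.03 * t ** 2 + rng.normal(0, 1, 500)

degree, coef = select_mean_degree(t, y)
print(degree, coef)
```

In practice the statistical rule would be combined with inspection of the fitted curve, as the text advises.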

Once a suitable mean model has been decided upon, attention can turn to the variability in the data. Residuals from the fitted mean model (observed value minus predicted value) should be calculated and plotted against GA to show if and how variability changes with GA4.

Previously, modeling of the variability was not often considered, even though in the field of fetal size SD almost always changes with GA9. While other methods have been proposed10, the approach most frequently used is that of Altman9. It follows—from the assumption that the variable under consideration is normally distributed at all GAs—that the residuals from the mean model should also be normally distributed. This in turn means that the absolute residuals (residuals with the sign removed) have a half normal distribution. As the mean of a half standard normal distribution is √(2/π), the mean of the absolute residuals multiplied by √(π/2) is an estimate of the SD of the residuals. Hence if the SD is not reasonably constant over GA, predicted values from a regression of the absolute residuals on age multiplied by √(π/2) will give age-specific estimates of the SD of the residuals, and hence of *y*.

An alternative formulation for Altman's approach favored by Royston and Wright1, and employed in this instance by Sherer *et al.*3, is to produce ‘scaled absolute residuals’ (SARs) by multiplying the absolute residuals by √(π/2). The SARs are then regressed on GA, the predicted values from which again estimate the SD of the residuals.
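A small simulation can illustrate why regressing scaled absolute residuals on GA recovers the SD. The sketch below assumes residuals that are Gaussian with an SD rising linearly with GA; the coefficients, the sample size and the helper `linear_fit` are all hypothetical choices made for illustration.

```python
import math
import random

def linear_fit(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def true_sd(t):
    # Assumed SD curve used only to simulate residuals
    return 1.0 + 0.1 * t

random.seed(1)
ga = [random.uniform(14, 40) for _ in range(5000)]
residuals = [random.gauss(0.0, true_sd(t)) for t in ga]

# Scaled absolute residuals: |residual| * sqrt(pi / 2)
sar = [abs(r) * math.sqrt(math.pi / 2) for r in residuals]

a, b = linear_fit(ga, sar)
sd_at_30 = a + b * 30  # estimated SD at 30 weeks; the simulated truth is 4.0
print(round(sd_at_30, 2))
```

Regressing the unscaled absolute residuals and multiplying the predictions by √(π/2) afterwards gives the same fitted SD, because least squares is linear in the response.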

Under either formulation, if the absolute residuals, be they scaled or unscaled, show no trend with GA, the SD is estimated as the SD of the unscaled original residuals (observed value minus predicted value). If there is a trend, polynomial regression is needed to estimate an appropriate curve in the same way as for the mean. Altman suggests that it is unlikely that a curve more complex than quadratic is required for a satisfactory fit to the SD9. Superimposing ±1.645 × SD on the residual plot is useful to see how well the SD has been modeled, as approximately 90% of the observed residuals should fall within these limits. Sherer *et al.*3 found the CHC SARs to be suitably represented by a linear relationship with GA, while those for CHA required a cubic polynomial.

As the regression analysis to estimate the mean should really take into account any increase in SD with GA, at this juncture the mean model can be refitted using the reciprocal of the square of the estimated SD as weights. However, Altman and Chitty report that the effect of refitting is almost always rather small4.

*Z*-scores (also known as SD scores) are a useful tool in assessing model fit, and are defined as:

$$Z = \frac{\text{observed value} - \text{mean}_{GA}}{\text{SD}_{GA}} \qquad (2)$$

where mean_{GA} and SD_{GA} are, respectively, the mean and SD given by the model for the GA at which the observation is made. Hence *Z*-scores represent the observed values expressed on a standard normal scale (with a mean of 0 and SD of 1), with the mean and SD adjusted for GA.
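To make the definition concrete, the sketch below computes a *Z*-score and the corresponding centile position for a single observation. The linear mean and SD curves and their coefficients are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
from statistics import NormalDist

def mean_model(t):
    # Hypothetical fitted mean curve (illustrative coefficients only)
    return -20.0 + 3.0 * t

def sd_model(t):
    # Hypothetical fitted SD curve (illustrative coefficients only)
    return 1.0 + 0.08 * t

def z_score(y, t):
    """Observed value expressed in SD units relative to the model at GA t."""
    return (y - mean_model(t)) / sd_model(t)

def centile_position(y, t):
    """Centile (0-100) from the standard normal cumulative distribution."""
    return 100.0 * NormalDist().cdf(z_score(y, t))

# At t = 25 the model gives mean 55 and SD 3, so y = 62 lies 7/3 SDs above
z = z_score(62.0, 25.0)
centile = centile_position(62.0, 25.0)
print(round(z, 3), round(centile, 1))
```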

Altman and Chitty4 recommend three methods of evaluation for the goodness of fit, all of which Sherer *et al.*3 appear to have carried out. These methods will be illustrated using data on fetal biparietal diameter (BPD). A subset of 850 of the 19 647 fetuses analyzed by Salomon *et al.*11 were fitted with a ‘mean and SD model’ in the standard manner, as outlined above, resulting in a cubic mean model and a linear SD model. Firstly, a plot of the *Z*-scores against GA should be checked for the existence of any patterns. The *Z*-scores should be randomly scattered about zero at all GAs, with any deviation from this indicating that the mean curve may require modification. This is shown in Figure 1 for the example dataset, with the BPD *Z*-scores appearing to adhere to this stipulation.

Secondly, a normal plot (essentially a scatterplot of the actual data values plotted against the ‘ideal’ values from a normal distribution) can be used to check that the *Z*-scores have a close to normal distribution. This is signified by a roughly straight line but can be confirmed more formally using the Shapiro–Wilk *W* test or Shapiro–Francia *W*′ test. Figure 2 shows that in the example dataset the BPD *Z*-scores do have a close to normal distribution and this is corroborated by both the Shapiro–Wilk *W* and Shapiro–Francia *W*′ tests having *P* of 0.998.

Finally, the appropriate proportion of observations should fall between and outside fitted centiles, for example approximately 90% of *Z*-scores should lie between *Z* = −1.645 and *Z* = +1.645. Deviation from this may imply that a higher-order polynomial curve for the SD is needed. For the example dataset, lines corresponding to a BPD *Z*-score of ±1.645 have been plotted on Figure 1. A brief examination suggests that approximately 90% of the data lie between the lines, with calculations confirming that 4.9% of the data lie below *Z* = −1.645 and 4.2% above *Z* = +1.645 (compared to an expected 5% for each). It is unlikely that the values will both be exactly 5%, so figures such as these indicate an adequate level of fit.
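The check on tail proportions amounts to simple counting. A minimal sketch, using simulated standard normal values as a stand-in for real *Z*-scores (the function name `tail_proportions` is hypothetical):

```python
import random

def tail_proportions(z_scores, k=1.645):
    """Proportions of Z-scores below -k, above +k, and between the two."""
    n = len(z_scores)
    below = sum(z < -k for z in z_scores) / n
    above = sum(z > k for z in z_scores) / n
    return below, above, 1.0 - below - above

random.seed(0)
z_scores = [random.gauss(0.0, 1.0) for _ in range(850)]  # simulated stand-in
below, above, inside = tail_proportions(z_scores)
print(f"below: {below:.1%}, above: {above:.1%}, inside: {inside:.1%}")
```

With a sample of this size, each tail proportion would be expected to vary by roughly a percentage point around 5% through sampling alone.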

This aspect of the data can be further examined in a plot such as in Figure 3, a histogram of the *Z*-scores with an overlaid standard normal distribution. If the model fits well then the histogram should match up with the standard normal distribution, meaning that the expected and observed centiles lie at the same values. Given the sample size of the dataset, the histogram for the BPD data shows a close to standard normal distribution, indicating an adequate model fit.

Once a satisfactory model has been determined, the centile curves for the desired reference interval may be calculated by substituting the expressions for the mean and SD into equation (1). The *Z*-score for any new individual may be calculated using equation (2) and its centile obtained from the standard normal cumulative distribution function. Finally, the calculated centiles should be superimposed on the scatter diagram of observed values against GA to ensure a suitable fit.

Besides the study currently under consideration, this approach to the construction of RIs has been widely used in the field of fetal measurements. Altman illustrated his absolute residual approach by developing reference centiles of fetal foot length9. Chitty *et al.* constructed new charts for fetal head circumference, BPD and other head dimensions12, fetal abdominal circumference and area13, and fetal femur length14. Royston and Wright1 estimated RIs for fetal head circumference (using the same data as Chitty *et al.*12), hemoglobin concentration and kidney volume. Salomon *et al.* constructed new reference charts and equations for fetal biparietal diameter, head circumference, abdominal circumference and femur length11.

#### Extensions to the mean and SD model

Several extensions to the basic ‘mean and SD model’ approach described above have been posited as ways to improve the performance of the method. The use of logarithmic transformations and fractional polynomials is described below.

##### Mean and SD model with logarithmic transformation

Many size measurements tend to follow a skewed distribution at a given GA, usually with a positive skew, where the right tail of the distribution is longer than the left. While this clearly conflicts with the assumption that at each GA the data come from a population with a normal distribution, it can often be overcome by the application of a logarithmic transformation. This same solution will also increase the ease with which a model can be fitted if the SD of the original measurements increases rapidly with GA.

Royston suggests initially attempting to fit the mean model to the original measurements10. If the residuals from this model show a positive skew then a logarithmic transformation should be performed on the original values, *y*, and the model refitted on log(*y*). If residuals from the refitted model are once again skewed, it is then recommended to try using a modified logarithmic transformation of the form log(*y* + *C*), where *C* is positive if the new residuals are negatively skewed, and negative otherwise. A polynomial model of the same degree as the optimal model for log(*y*) is then repeatedly fitted, with the value of *C* varied until the highest (i.e. least significant) *P*-value for the normality test of the residuals is reached. Often a value of *C* will be found that makes the distribution of residuals satisfactorily normal.

Once acceptable residuals from the mean model have been obtained, the rest of the ‘mean and SD model’ fitting procedure is continued as before. However, it is important to back-transform the curves once the model has been finalized using the antilog (exponential if a natural logarithmic transformation was used), also remembering to subtract *C* for a modified logarithmic transformation. While this simple procedure can easily cope with the problem of skewed data, Altman and Chitty report that very few fetal size measurements require transformation4.
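The back-transformation step can be sketched as below, assuming a natural logarithm was used so that log(*y* + *C*) is Gaussian at a given GA. The function name and the numerical values are hypothetical.

```python
import math
from statistics import NormalDist

def back_transformed_centile(mean_log, sd_log, centile, c=0.0):
    """Centile of y when log(y + c) is Gaussian with the given mean and SD."""
    k = NormalDist().inv_cdf(centile / 100.0)
    # Compute the centile on the log scale, take the antilog, then subtract C
    return math.exp(mean_log + k * sd_log) - c

# Hypothetical model values on the log(y + 2) scale at one GA
mean_log, sd_log, c = math.log(12.0), 0.1, 2.0
lower = back_transformed_centile(mean_log, sd_log, 5, c)
median = back_transformed_centile(mean_log, sd_log, 50, c)
upper = back_transformed_centile(mean_log, sd_log, 95, c)
print(round(lower, 2), round(median, 2), round(upper, 2))
```

The back-transformed centiles are no longer symmetric about the median: the upper centile lies further from it than the lower one, which is how the transformation accommodates positive skew.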

The effect of the logarithmic transformation is illustrated here using data on birth weight in 58 940 neonates as analyzed by Salomon *et al.*15. Figure 4, a scatterplot of birth weight against GA at birth, shows a marked increase in variability with GA and also suggests a slight positive skew to the data at a given GA. In Figure 5 the birth weights have undergone a natural logarithmic transformation, resulting in a more constant variance over GA with any evidence of skew being removed. The fitting of a ‘mean and SD model’ to these transformed data should now be simpler.

The modification of the ‘mean and SD model’ by the addition of a logarithmic transformation is somewhat less common than the unmodified version in the fetal size literature. Royston used a modified logarithmic transformation in an example concerning fetal triglycerides10. After fitting an initial quadratic mean model, positive skew was identified in the residuals. A logarithmic transformation was performed on the original values and a quadratic mean model fitted on log(*y*). However, this introduced negative skewness, so a modified logarithmic transformation was utilized. Wright and Royston, in an example regarding fetal abdominal circumference, also used a logarithmic transformation16.

##### Mean and SD model using fractional polynomials

Fractional polynomials (FPs), formalized by Royston and Altman17, extend the range of models afforded by conventional polynomials by allowing the powers of *t* to take negative and fractional values as well. Whilst a conventional polynomial is of the form

$$a + bt + ct^{2} + dt^{3} + \ldots$$

FPs are defined as

$$a + bt^{p_1} + ct^{p_2} + dt^{p_3} + \ldots$$

where *p*_{1}, *p*_{2}, *p*_{3}, … are chosen from a predetermined set, usually taken to be {−2, −1, −0.5, 0, 0.5, 1, 2, 3}. Here a value of −1 represents the inverse of *t* and 0.5 the square root of *t*. By convention the power 0 is defined to be log(*t*). If one or more power(s) in the model is/are duplicated then the model will include ‘repeated powers’, whereby the second term is multiplied by log(*t*). As an example, an FP of degree 3 with powers (0, 2, 2) (i.e. *p*_{1} = 0, *p*_{2} = 2 and *p*_{3} = 2) is of the form

$$a + b\log(t) + ct^{2} + dt^{2}\log(t)$$

Estimation of the best fitting FP for a given dataset involves both a systematic search for the best power or combination of powers from the permitted set, and estimation of the associated parameter coefficients. This selection process includes fitting a model for each combination of powers in the permitted set. This means, for example, that fitting a fractional polynomial of degree 2 (i.e. of the form $a + bt^{p_1} + ct^{p_2}$) using the standard set detailed above would involve fitting a different model for each of the 36 permissible combinations of powers. From these models the one with the lowest residual standard deviation is chosen to be optimal.
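The count of 36 follows from choosing two powers from eight with repetition allowed (28 distinct pairs plus 8 repeated pairs). The sketch below enumerates the combinations and builds the FP terms for a given set of powers, including the log(*t*) multiplier for repeated powers. The helper `fp_terms` is a hypothetical name, and the model fitting itself is not shown.

```python
import math
from itertools import combinations_with_replacement

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp_terms(t, powers):
    """FP terms at t > 0, excluding the intercept; power 0 means log(t)."""
    terms = []
    previous = None
    repeats = 0
    for p in sorted(powers):
        base = math.log(t) if p == 0 else t ** p
        # Each repeat of a power multiplies the term by a further log(t)
        repeats = repeats + 1 if p == previous else 0
        terms.append(base * math.log(t) ** repeats)
        previous = p
    return terms

degree_two = list(combinations_with_replacement(POWERS, 2))
print(len(degree_two))  # 36

# Powers (0, 2, 2) give the terms log(t), t^2 and t^2 * log(t)
print(fp_terms(20.0, (0, 2, 2)))
```

Each candidate combination would then be fitted by least squares on these terms and the residual SDs compared.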

FPs give at least as good a fit to data as a conventional polynomial of corresponding degree and often offer a better fit than conventional polynomials of higher degree. Royston and Wright recommend the use of FPs for modeling the mean or SD curve if a quartic or quintic polynomial is required for an adequate fit to the data1.

Over recent years the use of FPs in the construction of RIs has become more popular. Kurmanavicius *et al.*18 created ranges for BPD, occipitofrontal diameter, head circumference and cephalic index using this method, although in each case, bar the cephalic index SD, the best fitting fractional polynomial was found to be a conventional polynomial. Kurmanavicius *et al.*19 also modeled mean abdominal diameter, abdominal circumference and femur length using FPs, with only femur length SD taking a fractional model. Size charts for fetal bones (radius, ulna, humerus, tibia, fibula, femur and foot) were presented by Chitty and Altman after fitting FPs, with all but one mean model, though none of the SD models, taking fractional form20.

### Alternative methods

Besides the ‘mean and SD model’, Wright and Royston16 report the other most widely applied statistical approaches for estimating GA-specific reference intervals in practice to be those of smoothed crude centiles21 and LMS22–24, as detailed below.

#### Centile curves based on direct centile estimates

For a sufficiently large dataset (several hundred observations at each week of gestation, according to Altman and Chitty4), one intuitive approach is to calculate empirical estimates for each desired centile at a given GA. While the curves produced by joining these values will be rough, even for large sample sizes, smoother curves can be obtained by considering ‘windows’ of GAs instead of each GA separately. Here, increasing window size will increase smoothness, though information can easily be lost through oversmoothing16.

A more formalized version of this approach, with a second stage involving centile smoothing based on the technique of Cleveland25, is presented by Healy *et al.*21. This approach makes no assumption about the nature of the distribution of measurements at a given GA but takes advantage of the knowledge that both the centiles themselves and the intervals between centiles at a fixed GA should behave smoothly.

In the first stage, observations are ordered by GA and the first *k*, where *k* usually represents 5–10% of the total data, selected. Initial empirical centile estimates at the required values, for example 5%, 10%, 25%, 50%, 75%, 90% and 95%, are calculated from these *k* measurements by sorting and counting, and then plotted against the median GA of the *k* observations. This ‘window’ of *k* observations is then moved on to encompass measurements 2 to *k* + 1, then 3 to *k* + 2, etc., with the same estimation procedure repeated on each occasion, until all observations have been included.
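The first stage can be sketched as a moving window over the GA-ordered data. This is an illustration only: the centile within each window is taken here by linear interpolation between order statistics, which is one of several possible conventions and is an assumption rather than the rule used by Healy *et al.*; the function names are hypothetical.

```python
from statistics import median

def empirical_centile(sorted_values, p):
    """Centile p (0-100) by linear interpolation between order statistics."""
    pos = (len(sorted_values) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def moving_window_centiles(ga, y, k, centiles=(5, 50, 95)):
    """One row per window of k observations: (median GA, centile estimates)."""
    pairs = sorted(zip(ga, y))  # order the observations by GA
    rows = []
    for start in range(len(pairs) - k + 1):
        window = pairs[start:start + k]
        values = sorted(v for _, v in window)
        rows.append((median(t for t, _ in window),
                     [empirical_centile(values, p) for p in centiles]))
    return rows

# Tiny deterministic example: 10 observations, windows of 5
ga = list(range(10))
y = [2.0 * t for t in ga]
rows = moving_window_centiles(ga, y, k=5)
print(len(rows), rows[0])
```

With *n* observations and window size *k*, this yields *n* − *k* + 1 overlapping windows, which is why the raw estimates need the second-stage smoothing.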

The initial centile estimates will be irregular, so the second stage smoothes them to provide more usable centile curves. It is first assumed that each centile curve can be approximated by a polynomial of degree *p*, so that *y*_{i}, the smoothed value of the *i*th centile, is given by

$$y_i = a_{0i} + a_{1i} t + \ldots + a_{pi} t^{p} \qquad (3)$$

where *t* again represents GA. Now consider the proportion corresponding to the *i*th centile (for example 0.5 for the 50^{th} centile) and define *z*_{i} as its NED, as before.

The coefficients *a*_{ji} for a fixed *j* are then modeled as a polynomial in *z*_{i}, so that

$$a_{ji} = b_{j0} + b_{j1} z_i + \ldots + b_{jq_j} z_i^{q_j} \qquad (4)$$

where the degree *q*_{j} of the polynomial may differ from one value of *j* to another. This restricts the distance between centiles and prevents the resulting curves from crossing. Combining equations (3) and (4) gives a linear model for the centile values which can be fitted by least squares regression. It follows that for any observation a corresponding *Z*-score can be calculated by solving a polynomial equation, though the order of the polynomial may realistically prohibit this. Goodness of fit should be judged by counting the points falling between adjacent centiles. This method was applied by Wright and Royston to measurements of fetal abdominal circumference and provided an adequate fit16.
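The combination of equations (3) and (4) can be written out explicitly. Writing $a_{ji}$ for the coefficient of $t^{j}$ in the *i*th centile curve and $b_{jk}$ for the coefficients of its polynomial in $z_i$ (a notation assumed here for illustration), substitution gives

$$y_i = \sum_{j=0}^{p} a_{ji}\, t^{j} = \sum_{j=0}^{p} \sum_{k=0}^{q_j} b_{jk}\, z_i^{k}\, t^{j}$$

Each product $z_i^{k} t^{j}$ is a known quantity for a given centile and GA, so the model is linear in the unknown $b_{jk}$ and all centiles can be fitted in a single least squares regression.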

#### LMS

The LMS method, introduced by Cole22, 23 and refined by Cole and Green24, provides a general method for fitting smooth centile curves to reference data. It utilizes the power transformation family of Box and Cox26 to allow the skewness of the measurement distribution, as well as the median and variability, to vary with age. These three features of the distribution are summarized by the parameters λ, µ and σ, the initials of which (L, M and S) give rise to the name of the method. The original form22, 23 required age to be split into groups, an arbitrary procedure whereby different groupings would produce different centile curves. This subjective stage was removed by Cole and Green24 through the addition of a nonparametric aspect. Owing to the superiority of the later version, only this is detailed here.

As previously asserted, many size measurements follow a skewed distribution. The use of a suitable power transformation, which stretches one tail of the distribution and shrinks the other, can remove this skewness and ‘normalize’ the data. One such family of transformations, proposed by Box and Cox26, is used in the LMS method, with the optimal power at a given GA calculated from the data to completely remove skewness in the distribution. As skewness changes with GA, the calculated power also changes.

Given a variable of interest *y* with median µ and a power transformation so that *y*^{λ} (or log(*y*) if λ = 0) is normally distributed, we consider the transformed variable

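$$x = \begin{cases} \dfrac{(y/\mu)^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[2ex] \log(y/\mu), & \lambda = 0 \end{cases} \qquad (5)$$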

based on the Box–Cox transformation26. This transformation maps the median µ of *y* to *x* = 0 and is continuous at λ = 0. For λ = 1 the SD of *x* is the coefficient of variation (CV) of *y*, and Cole and Green24 show that this remains approximately true for all moderate values of λ. The optimal value of λ now minimizes the SD of *x*.
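The statement about λ = 1 can be checked directly. Assuming the transformation takes the usual Box–Cox-based form $x = [(y/\mu)^{\lambda} - 1]/\lambda$ for λ ≠ 0, setting λ = 1 gives

$$x = \frac{y}{\mu} - 1 \quad \Longrightarrow \quad \text{SD}(x) = \frac{\text{SD}(y)}{\mu}$$

which is the CV of *y* whenever the median µ is close to the mean, as it is for a roughly symmetric distribution.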

Denoting the SD of *x* (and CV of *y*) by σ, the *Z*-score (or SD score) of *x* (and hence *y*) is given by:

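$$z = \frac{x}{\sigma} = \begin{cases} \dfrac{(y/\mu)^{\lambda} - 1}{\lambda\sigma}, & \lambda \neq 0 \\[2ex] \dfrac{\log(y/\mu)}{\sigma}, & \lambda = 0 \end{cases} \qquad (6)$$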

and is assumed to take a standard normal distribution.

Assume that the distribution of *y* varies with GA, *t*, and that λ, µ and σ at *t* are read off smooth curves *L*(*t*), *M*(*t*) and *S*(*t*). Then

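$$z = \begin{cases} \dfrac{[y/M(t)]^{L(t)} - 1}{L(t)\,S(t)}, & L(t) \neq 0 \\[2ex] \dfrac{\log[y/M(t)]}{S(t)}, & L(t) = 0 \end{cases} \qquad (7)$$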

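Rearranging equation (7) shows that centile 100α of *y* at *t* is given by

$$C_{100\alpha}(t) = \begin{cases} M(t)\,[1 + L(t)\,S(t)\,z_{\alpha}]^{1/L(t)}, & L(t) \neq 0 \\[1ex] M(t)\,\exp[S(t)\,z_{\alpha}], & L(t) = 0 \end{cases} \qquad (8)$$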

where *z*_{α} is the normal equivalent deviate of size α. This shows that if *L*, *M* and *S* are smooth, then so are the centile curves.
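The relationship between a measurement, its *Z*-score and a centile under the LMS method can be sketched in a few lines. The code assumes the usual LMS formulae, with the *Z*-score equal to [(y/M)^L − 1]/(LS) and its inverse giving the centile, and takes the values of *L*, *M* and *S* at the GA of interest as already read off the fitted curves. The function names and numerical values are hypothetical.

```python
import math
from statistics import NormalDist

def lms_z(y, l, m, s):
    """Z-score of y given the L, M and S values at the relevant GA."""
    if l == 0:
        return math.log(y / m) / s
    return ((y / m) ** l - 1.0) / (l * s)

def lms_centile(centile, l, m, s):
    """Value of y at a centile (0-100), the inverse of lms_z."""
    z = NormalDist().inv_cdf(centile / 100.0)
    if l == 0:
        return m * math.exp(s * z)
    return m * (1.0 + l * s * z) ** (1.0 / l)

# Hypothetical L, M, S values at one GA
l, m, s = -0.5, 3000.0, 0.12
y95 = lms_centile(95, l, m, s)
print(round(y95, 1), round(lms_z(y95, l, m, s), 3))
```

Converting a centile to a value and back recovers the original *Z*-score, and the 50th centile returns *M*, the median.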

Cole and Green then introduce a penalized likelihood function, derived from equation (7), with three integrals providing roughness penalties for the curves *L*(*t*), *M*(*t*) and *S*(*t*)24. The extent of these penalties, and hence the smoothness of the curves, are controlled by three smoothing parameters, and these are the only parameters requiring specification in order to fit the model. However, ‘equivalent degrees of freedom’ (EDFs), calculated for each fitted curve as a function of these smoothing parameters, give a more usable measure of the extent of the smoothing.

The illustrative examples of Cole and Green24, although not from the field of fetal measurements, show values of the *L* curve falling well below zero. This indicates the presence of considerably more skew than a log transformation would remove, and the extent of variability of the *L* curves with age reinforces the notion that transformation using a single power for all ages is inappropriate.

### Discussion

There are several viable methods available, of varying complexity, for constructing age-related RIs and centile charts. Ideally, methods should be understandable by clinicians, and the results easy to use, even without a statistical computer package. It is desirable that any published method should provide the potential user with the means of calculating the corresponding *Z*-score and centile for a given measurement. The mere provision of a mean model or centile chart, regardless of the quality, is not really adequate. Any approach must also be sufficiently flexible to be applicable successfully to many sets of data. Unfortunately, none of the methods currently available fulfills all these criteria, so it is unlikely that any one would be appropriate in all circumstances.

In the simplest setting, if it is plausible that the observed measurements at each GA do indeed come from a population with a normal distribution and, in addition, the variance across the age range is constant, then the use of conventional polynomial regression may be justified. However, the strict adherence to these assumptions is unlikely, meaning that the model may not produce sufficiently reliable reference intervals. Slightly more realistic is the acknowledgment that variance is likely to change over the age range. This feature can be included by fitting the ‘mean and SD model’ as described previously, though again the assumption of an underlying normal distribution is not always tenable. This issue can often be dealt with by the addition of a (modified) logarithmic transformation prior to the model fitting to correct any skew (distribution asymmetry). However, this approach still suffers from the well-known limitations of polynomial curve shapes. This last hurdle can be overcome by the relaxation of the restrictions imposed on the powers of the polynomial, allowing the use of FPs. As FPs give at least as good a fit to data as a conventional polynomial of corresponding degree, and as the fitting of FPs with most basic statistical software is relatively straightforward, there seems little reason not to adopt them as standard.

All of these variations on the ‘mean and SD model’ benefit from being relatively conceptually simple and easy to use, with the necessary techniques available in most basic statistical packages. The resulting centile curves and *Z*-scores can be expressed as explicit formulae, meaning that the centile position of any individual is easily obtainable. While the method as described here is adequate for most fetal measurements, there are some cases that cannot be handled properly by this approach. It is important to emphasize the strong assumption that at each GA the data come from a population with a normal distribution. While skewed data may sometimes be corrected by a log transformation, this is not always successful, with time-varying skewness especially difficult to accommodate. Even after transformation, kurtosis (a non-normal distribution shape) may remain in the data, again in contravention of the assumption. Variables with a complex curve shape beyond those available from conventional (or even fractional) polynomials may also require alternative techniques.

The method of producing centile curves based on empirical centile estimates as described by Healy *et al.* makes no assumption about the nature of the distribution of measurements at a fixed GA, which is an appealing feature21. This approach provides a flexible way of constructing centile curves that is capable of handling many patterns of growth due to the lack of a pre-specified functional form. However, there are some drawbacks. Experience is needed to find the best ways of choosing the values of the adjustable parameters involved, and clearly there is some degree of subjectivity here. The estimation of the centile values of further observations is not simple unless a very basic model has been fitted. There is also some vulnerability to outlying values affecting the derived centile values. We agree with the conclusion of Altman and Chitty that this is not a suitable method for the derivation of fetal size charts, except when other methods are unsuccessful4.

The LMS method with penalized likelihood^{24} is extremely flexible and widely applicable^{16}. It is usually easy to produce convincing centile curves, regardless of the complexity of the curve shape, and time-varying skewness is easily dealt with. It also has the appealing by-product of the L, M and S curves which completely summarize the measurement's distribution over the age range and facilitate further investigation into the underlying structure of the data. Penalized likelihood provides an elegant solution for ridding the earlier method of its arbitrary categorization, with the smoothing of the three curves becoming an integral part of the likelihood maximization. Now the only arbitrariness in the procedure is the choice of the three smoothing parameters.

There are, however, some general problems with the smoothing approach. Where data are more sparse near the ends of the age range, ‘edge effects’ (spurious changes in the centiles) may be observed, though this can be avoided by truncating the data at each end. One major drawback of non-parametric estimators is the lack of a succinct formula with which to estimate further centile values. This means that centiles may only be displayed graphically or in tabular form. Finally, the assumption of normality following the Box–Cox transformation may be violated by the presence of kurtosis, for which the transformation does not adjust.
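To illustrate what working from tabulated values involves, the sketch below looks up L, M and S for a given GA by linear interpolation and then applies the standard LMS relation, Z = ((y/M)^L − 1)/(L·S), or Z = ln(y/M)/S when L = 0. The table entries, the function names and the use of linear interpolation between tabulated ages are all assumptions for illustration only; a real application would use the tabulated output of a fitted model.

```python
import math
from bisect import bisect_right
from statistics import NormalDist

# Hypothetical tabulated values (GA in weeks, L, M, S); illustration only.
LMS_TABLE = [
    (20.0, 0.8, 150.0, 0.060),
    (24.0, 0.9, 195.0, 0.055),
    (28.0, 1.0, 240.0, 0.050),
]


def interpolate_lms(ga):
    """Linearly interpolate L, M and S between adjacent tabulated GAs."""
    gas = [row[0] for row in LMS_TABLE]
    if not gas[0] <= ga <= gas[-1]:
        raise ValueError("GA outside the tabulated range")
    hi = min(bisect_right(gas, ga), len(gas) - 1)
    lo = hi - 1
    w = (ga - gas[lo]) / (gas[hi] - gas[lo])
    return tuple(
        a + w * (b - a)
        for a, b in zip(LMS_TABLE[lo][1:], LMS_TABLE[hi][1:])
    )


def lms_z_score(y, ga):
    """Z-score of measurement y at a given GA under the LMS model."""
    L, M, S = interpolate_lms(ga)
    if abs(L) < 1e-12:  # limiting (log-normal) case, L = 0
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


def lms_centile_value(p, ga):
    """Measurement lying on the p-th centile at a given GA."""
    L, M, S = interpolate_lms(ga)
    z = NormalDist().inv_cdf(p / 100.0)
    if abs(L) < 1e-12:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)
```

Unlike the parametric case, the result depends on the table and on the interpolation rule rather than on a single closed-form expression in GA.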

A more recently proposed generalization of the LMS approach, the LMSP method of Rigby and Stasinopoulos^{27}, uses the Box–Cox power exponential (BCPE) distribution to try to overcome the issue of kurtosis. A fourth parameter is introduced in the power transformation to account for the observed kurtosis in the distribution, and centile estimation proceeds in a manner similar to that of the conventional LMS method.

While application of the LMS method may appear daunting to the first-time user, the advent of specially designed programs such as the LMSChartmaker of Cole and Pan^{28}, as well as packages for widely used general statistical programs, means that with brief instruction this need not be the case.

Wright and Royston advise that a ‘simple formula’ to allow estimation of centile position for an individual is extremely valuable^{16}. If, in light of the requirements specific to the data under analysis, such a formula is deemed essential, this excludes both the LMS method and any approach based on empirical centile estimates. Of the methods examined here, this leaves only the parametric approach of the ‘mean and SD model’. The choice of approach therefore reduces to a trade-off between the simplicity, usability and accessibility of the inferior model provided by the parametric approach and the superior but less user-friendly model provided by the LMS method.

### Conflict of Interest Statement

T. J. C. provides the lmsChartMaker Light program as a free download, and earns royalties on the Pro version of the program, which contains more facilities.

### References

- 1. How to construct ‘normal ranges’ for fetal variables. Ultrasound Obstet Gynecol 1998; 11: 30–38.
- 2. Correctly identifying the macrosomic fetus: improving ultrasonography-based prediction. Am J Obstet Gynecol 2000; 182: 1489–1495.
- 3. Nomograms of the axial fetal cerebellar hemisphere circumference and area throughout gestation. Ultrasound Obstet Gynecol 2007; 29: 31–36.
- 4. Charts of fetal size: 1. Methodology. Br J Obstet Gynaecol 1994; 101: 29–34.
- 5. Design and analysis of longitudinal studies of fetal size. Ultrasound Obstet Gynecol 1995; 6: 307–312.
- 6. Calculation of unconditional and conditional reference intervals for foetal size and growth from longitudinal measurements. Stat Med 1995; 14: 1417–1436.
- 7. The impact of choice of reference charts and equations on the assessment of fetal biometry. Ultrasound Obstet Gynecol 2005; 25: 559–565.
- 8. Analysis of Z-score distribution for the quality control of fetal ultrasound measurements at 20–24 weeks. Ultrasound Obstet Gynecol 2005; 26: 750–754.
- 9. Construction of age-related reference centiles using absolute residuals. Stat Med 1993; 12: 917–924.
- 10. Constructing time-specific reference ranges. Stat Med 1991; 10: 675–690.
- 11. French fetal biometry: reference equations and comparison with other charts. Ultrasound Obstet Gynecol 2006; 28: 193–198.
- 12. Charts of fetal size: 2. Head measurements. Br J Obstet Gynaecol 1994; 101: 35–43.
- 13. Charts of fetal size: 3. Abdominal measurements. Br J Obstet Gynaecol 1994; 101: 125–131.
- 14. Charts of fetal size: 4. Femur length. Br J Obstet Gynaecol 1994; 101: 132–135.
- 15. Birth weight and size in France. Charts and equations. J Gynecol Obstet Biol Reprod (Paris) 2007; (in press).
- 16. A comparison of statistical methods for age-related reference intervals. J R Statist Soc A 1997; 160: 47–69.
- 17. Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling. Appl Statist 1994; 43: 429–467.
- 18. Fetal ultrasound biometry: 1. Head reference values. Br J Obstet Gynaecol 1999; 106: 126–135.
- 19. Fetal ultrasound biometry: 2. Abdomen and femur length reference values. Br J Obstet Gynaecol 1999; 106: 136–143.
- 20. Charts of fetal size: limb bones. BJOG 2002; 109: 919–929.
- 21. Distribution-free estimation of age-related centiles. Ann Hum Biol 1988; 15: 17–22.
- 22. Fitting smoothed centile curves to reference data. J R Statist Soc A 1988; 151: 385–418.
- 23. The LMS method for constructing normalized growth standards. Eur J Clin Nutr 1990; 44: 45–60.
- 24. Smoothing reference centile curves: the LMS method and penalized likelihood. Stat Med 1992; 11: 1305–1319.
- 25. Robust locally weighted regression and smoothing scatterplots. J Am Stat Assoc 1979; 74: 829–836.
- 26. An analysis of transformations. J R Statist Soc B 1964; 26: 211–252.
- 27. Smooth centile curves for skew and kurtotic data modelled using the Box–Cox power exponential distribution. Stat Med 2004; 23: 3053–3076.
- 28. LMSChartmaker. 2005; http://www.healthforallchildren.co.uk [Accessed 1 December 2006].