Fetal cerebellar growth and Sylvian fissure maturation: international standards from Fetal Growth Longitudinal Study of INTERGROWTH‐21st Project

To construct international ultrasound‐based standards for fetal cerebellar growth and Sylvian fissure maturation.


INTRODUCTION
The fetal central nervous system undergoes extraordinary transformation throughout pregnancy. The cerebellum and brain cortex are major landmarks of this complex yet highly organized neurodevelopmental process 1,2 .
The cerebellum, which is associated later in life with sensorimotor, cognitive and affective regulation 1,3 , can be identified during fetal life by ultrasonography as early as 12 weeks' gestation 4 . In clinical practice, ultrasonography is used to evaluate the anatomical integrity of the fetal cerebellum and linear growth of the transcerebellar diameter (TCD) 1,3,5 . Measuring the TCD also enables estimation of gestational age in cases of uncertain dates in the third trimester 6,7 , and may be more suitable when there are fetal growth disturbances, because cerebellar growth is less affected by placental insufficiency due to the 'brain-sparing' phenomenon 6,8 .
The fetal brain cortex displays remarkable gestational-age-specific maturation 9,10 . A leading marker of these processes, which follow a predictable timetable 9,11,12 , is the development (operculization) of the Sylvian fissure (SF) on the lateral convexities of the cerebral hemispheres.
There are several reference ranges for TCD measurements 4,6,9,13-22 and SF maturation 9,22-26 . Many of the studies providing these data have a high risk of methodological bias owing to sample selection, study design, data analysis and lack of ultrasound quality control. These limitations most probably explain the reported variation in the range of values across pregnancy, which makes clinical interpretation difficult. In addition, and importantly, none of the fetal studies continued to assess neurodevelopment into early childhood, which would seem a logical requirement for any tool proposed to evaluate fetal normality 27 .
The Fetal Growth Longitudinal Study (FGLS) of the INTERGROWTH-21st Project has produced international standards, based on World Health Organization (WHO) recommendations 28 , for early and late pregnancy dating 29,30 , fetal growth 31 and estimated fetal weight 32 , as well as other aspects of pregnancy care 33,34 . To complement these clinical tools, we aimed to produce international standards for longitudinal TCD growth and SF maturation from the same population of healthy pregnant women contributing data to the FGLS whose babies had adequate growth and development from early fetal life to 2 years of age 35 .

Study population
INTERGROWTH-21st is an international, multicenter, population-based project (www.intergrowth21.org.uk) 28 . Phase I of the INTERGROWTH-21st Project, conducted between 2009 and 2016, consisted of nine complementary studies designed to describe optimal human growth and development, based conceptually on the WHO prescriptive approach. The FGLS, one of the main studies of the INTERGROWTH-21st Project 31 , enrolled, before 14 weeks' gestation, a large cohort of healthy, well nourished women with a naturally conceived singleton pregnancy who met rigorous individual inclusion criteria, and whose babies were monitored until 2 years of age, in order to generate international standards.
The INTERGROWTH-21st Project methodology has been described elsewhere in detail 31 . Briefly, participants were first selected at the population level and then at the individual level. At the population level, urban areas (a complete city or county, or part of a city with clear political or geographical limits) in which most deliveries occurred in healthcare facilities were identified. The areas had to be located at an altitude of < 1600 m, with a low risk of fetal growth disturbances, as well as have an absence or low level of major, known, non-microbiological contamination, such as pollution, domestic smoke, radiation or any other toxic abuse. Within each area, all institutions providing pregnancy and neonatal care in which more than 80% of births occurred were selected. From these populations, women were selected at an individual level if they had no clinically relevant obstetric or medical history, initiated antenatal care before 14 weeks and met the entry criteria of optimal health, nutrition, education and socioeconomic status 31 .
The women underwent serial fetal ultrasound scans every 5 weeks (± 1 week) until 41 + 6 weeks in eight urban areas worldwide that were geographically delimited to ensure that the study was population based. At each visit, a set of two-dimensional (2D) images and three-dimensional (3D) volumes of fetal biometric parts was obtained and stored digitally 31 . Gestational age was based on the last menstrual period (LMP) provided that its date was certain, the woman had regular 24-32 day cycles, she had not been using hormonal contraception or breastfeeding in the preceding 2 months, and any discrepancy between gestational age based on LMP and gestational age based on crown-rump length, measured by ultrasound at 9 + 0 to 13 + 6 weeks after the LMP, was 7 days or less 31,36 .
All ultrasound scans were performed by sonographers who were trained in a standardized way and audited regularly according to FGLS requirements. The same ultrasound equipment was used at all study sites (Philips HD-9; Philips Ultrasound, USA), with curvilinear abdominal 2D transducers C5-2, C6-3 and one curvilinear abdominal 3D transducer V7-3, and was specially adapted to ensure that measurement values were not visible on the screen in order to reduce 'expected value' bias 31,37 .
For the present analysis, we included only FGLS participants whose children had neurodevelopmental, nutritional and morbidity assessments at 2 years ± 2 months of age, in five of the original eight study sites (Pelotas (Brazil); Turin (Italy); Oxford (UK); the central area of Nagpur (India); and the Parklands suburb of Nairobi (Kenya)) 35,38,39 . Three sites, in China, Oman and the USA, did not participate in the early childhood development assessments for logistical and administrative reasons pertaining to the timing of the start of the study and/or staff availability.
The INTERGROWTH-21st Project was approved by the Oxfordshire Research Ethics Committee 'C' (ref: 08/H0606/139), the research ethics committees of the individual participating institutions, and the corresponding regional health authorities in which the project was implemented. Participants provided written consent to be involved in the project 28 .

Volume acquisition, offline analysis and measurement methodology
TCD measurement and SF assessment were performed using still images, extracted from the available 3D fetal head volumes acquired at the five study sites. Head volumes were acquired at the level of the axial, transthalamic plane. Six predefined quality-control criteria of the 2D images had to be satisfied to acquire the volume: oval shape, symmetrical plane, thalami and cavum septi pellucidi (CSP) visible, cerebellum not visible and head occupying at least 30% of the image 40 .
The acquisition was undertaken with the volume data box and angle of sweep (usually 70°) adjusted to include the entire skull, during fetal quiescence, with the mother holding her breath and with the transducer held steady. The real-time image was observed during acquisition to confirm that the sweep included the entire skull with no maternal or fetal movement during the sweep; otherwise, the volume was discarded and the acquisition repeated 41 . All data were then transferred electronically to the Ultrasound Coordinating Unit in Oxford. Further details of the methodology for volume acquisition are available at https://intergrowth21.org.uk (follow the link to 'Study Protocol' to download the ultrasound manual) 41 .
Offline image analysis for plane reconstruction and measurements was carried out using the open-source image analysis software program MITK (Medical Imaging Interaction Toolkit MITK, version 0.12.2; German Cancer Research Center, Division of Medical Image Computing, www.mitk.org). All measurements were undertaken by one experienced fetal medicine specialist at the Coordinating Unit in Oxford, who was standardized in ultrasound volume manipulation 37 and who was blinded to the clinical details and gestational age.
The TCD was measured in the standard transcerebellar plane 5 , while SF maturation was assessed in the transthalamic plane 40 . First, each stored volume of the fetal head was uploaded onto the multiplanar mode facility 42 . Second, starting from this plane, rotation or scrolling of the volume in orthogonal planes was undertaken with the fulcrum of rotation primarily in the middle of the CSP 42 . As the transthalamic plane was the plane of volume acquisition, it required minimal manipulation to extract the 2D plane 40 . The transcerebellar 2D plane was extracted at the level of the transthalamic plane with a slight posterior tilting and visualization of the frontal horns of the lateral ventricles, CSP, thalami, cerebellum and cisterna magna 5 . Once the appropriate transcerebellar 2D plane had been extracted, the TCD was measured perpendicular to the midline echo (falx cerebri), with the calipers placed 'outer to outer' between the distal margins of the hemispheres at the largest transverse diameter of the cerebellum 5 .
Assessment of the SF was performed in the brain hemisphere distal to the probe to prevent shadowing from the fetal skull bones, with a focus on the angle changes between the insula and the temporal lobe 22 . We used a simple, unweighted, scoring system, ranging from Grade 0 (no development) to Grade 5 (maximum development), employed previously for magnetic resonance imaging 43 and ultrasound analysis of cortical maturation ( Figure S1) 9,22,44 . The hemisphere in which the SF was measured (right or left) was identified by combining fetal presentation (cephalic or breech) with head direction at the time of measurement; when the presentation was transverse or oblique, the hemisphere was not determined.
To ensure that the best possible images were obtained for each extracted plane, we used a scoring system to evaluate and grade image quality 45 (1, impossible to assess accurately; 2, possible to assess accurately; 3, good; 4, almost perfect). Only 3D volumes that scored 2-4 were included in the analysis.

Reproducibility
Formal assessment of interobserver reproducibility for plane reconstruction and TCD acquisition was undertaken following the INTERGROWTH-21st quality control strategy 45,46 in a randomly selected subset of 132 fetuses (12%). From this subset, a single head volume of each fetus was selected randomly and assessed by two fetal medicine specialists. Both observers independently uploaded each volume, extracted the transcerebellar plane and measured the TCD blinded to all measurements obtained (including their own).

Statistical analysis
Outliers, defined as measurements > 5 SD above the mean at each gestational age, were excluded. To assess the possibility of pooling data across sites, we used two complementary analytical strategies 36 . Firstly, variance component analysis was used to calculate the percentage of total variance due to between-site variance, as well as an estimation of the percentage of total variance for individuals within each site. Secondly, for each site, at five specific gestational-age windows, we calculated the difference between each site's mean and the mean of all sites together. Each difference was then expressed as a proportion of all the sites' SD, i.e. the SD of the data pooled across all sites, at each corresponding gestational age, to give the standardized site difference (SSD). The SSD is similar to the Z-score and is expressed in units of all the sites' SD (i.e. 1.0 standardized difference = 1.0 of all the sites' SD). The SSD allows for direct comparisons of biometric measurements in populations across pregnancy, standardized by the corresponding pooled SD. A pattern of SSD values < 0.5 was prespecified in the FGLS protocol, in keeping with WHO recommendations, as an adequate cut-off value for combining data from all sites 36 .
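The SSD calculation described above can be illustrated with a minimal Python sketch. It is not the study's own code (the analysis was performed in Stata); the function names, the 0.5 default and the dictionary layout are assumptions made for illustration, and it handles a single gestational-age window at a time.

```python
from statistics import mean, stdev


def standardized_site_differences(values_by_site):
    """Return {site: SSD} for one gestational-age window.

    values_by_site maps a site name to the list of measurements
    (e.g. TCD in mm) obtained at that site within the window.
    """
    # Pool the measurements from all sites for this window.
    pooled = [v for values in values_by_site.values() for v in values]
    pooled_mean = mean(pooled)
    pooled_sd = stdev(pooled)  # SD of the data pooled across all sites
    # SSD = (site mean - mean of all sites) / SD of all sites
    return {
        site: (mean(values) - pooled_mean) / pooled_sd
        for site, values in values_by_site.items()
    }


def can_pool(ssd_by_site, cutoff=0.5):
    """True if every site's SSD lies within +/- cutoff of zero."""
    return all(abs(ssd) < cutoff for ssd in ssd_by_site.values())
```

In use, the function would be called once per gestational-age window, and pooling would be considered adequate only if the SSD pattern stayed below the prespecified cut-off in all windows.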
The distribution of fetal cerebellar measurements was assessed for normality, conditional on gestational age. We then modeled TCD as a function of gestational age using fractional polynomial regression and obtained the fitted centiles 47 . For the SF maturation analysis, we calculated the mean gestational age and 95% CI for each development score. Goodness of fit of the resultant models was assessed as described previously by Ohuma. In addition, model fit was assessed visually using quantile-quantile (Q-Q) plots of the residuals, plots of residuals against fitted values, distribution of fitted Z-scores against gestational age and a comparison of the estimated proportions of observations falling below the 3rd centile or above the 97th centile to the expected proportions of 3%.
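For readers unfamiliar with fractional polynomials, the following Python sketch shows the general form of a second-degree fit for the mean only. It is an illustration under stated assumptions, not the authors' Stata procedure: the candidate power set is the conventional one for this method and is assumed here, selection is by residual sum of squares, and all function names are hypothetical. Modeling of the SD and the correlation between repeated scans of the same fetus are not addressed.

```python
import itertools

import numpy as np

# Conventional candidate powers (assumed); power 0 denotes ln(x).
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, p):
    """One fractional polynomial term: x**p, or ln(x) when p == 0."""
    return np.log(x) if p == 0 else x ** p


def fp2_design(x, p1, p2):
    """Design matrix [1, x^p1, x^p2]; a repeated power uses x^p * ln(x)."""
    t1 = fp_term(x, p1)
    t2 = fp_term(x, p2) * np.log(x) if p1 == p2 else fp_term(x, p2)
    return np.column_stack([np.ones_like(x), t1, t2])


def fit_fp2(x, y):
    """Return ((p1, p2), coefficients) minimising the residual sum of squares.

    x must be strictly positive (e.g. gestational age in weeks).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for p1, p2 in itertools.combinations_with_replacement(POWERS, 2):
        X = fp2_design(x, p1, p2)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, (p1, p2), coef)
    return best[1], best[2]


def predict_fp2(x, powers, coef):
    """Fitted mean at x for the chosen powers and coefficients."""
    x = np.asarray(x, dtype=float)
    return fp2_design(x, *powers) @ coef
```

The exhaustive search over power pairs is inexpensive because the candidate set is small; each candidate is an ordinary least-squares fit.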
Measurement reproducibility was assessed using Bland-Altman analysis of the interobserver differences and their 95% limits of agreement 49,50 . We did not consider the intraclass correlation coefficient (ICC) to be appropriate for the reproducibility study since it depends on the range of the measurement values. Consequently, rather than having fixed values, ICCs vary according to the range of gestational ages being studied 51 . Therefore, we instead used the method proposed by Bland and Altman, which has been shown to be more appropriate for assessing the repeatability of two measurements 49 . Analysis was performed using Stata version 15 (StataCorp., College Station, TX, USA).
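The Bland-Altman quantities referred to above (mean interobserver difference and limits of agreement) can be computed as in the following minimal Python sketch. The function name and the multiplier argument k are illustrative assumptions; k = 2 matches the ± 2 SD limits reported in this study, and the sketch is not the code used for the published analysis.

```python
from statistics import mean, stdev


def bland_altman(observer1, observer2, k=2.0):
    """Return (mean difference, lower limit, upper limit) of agreement.

    observer1 and observer2 are paired measurements of the same volumes
    (e.g. TCD in mm), in the same order.
    """
    diffs = [a - b for a, b in zip(observer1, observer2)]
    bias = mean(diffs)   # mean interobserver difference
    sd = stdev(diffs)    # SD of the paired differences
    return bias, bias - k * sd, bias + k * sd
```

Because the limits are expressed in the units of the measurement itself, they do not depend on the range of gestational ages sampled, which is the reason given above for preferring this approach to the ICC.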

RESULTS
Of the children in the original FGLS cohort 31 who had a developmental assessment at 2 years of age (n = 1339) 38 , 1130 (84%) also had an available 3D ultrasound fetal head volume. The proportional contribution of cases from the study sites was 26% for India (n = 291), 25% for Kenya (n = 280), 23% for Italy (n = 262), 14% for Brazil (n = 156) and 12% for the UK (n = 141).
The sociodemographic characteristics and pregnancy/perinatal outcomes of the study sample are presented in Table 1. They are similar to those of the total FGLS cohort 31 , and confirm the health and low-risk status of the population studied. In addition, we provide evidence that the fetuses whose brain development is the subject of this analysis had low morbidity and adequate growth and development at 2 years of age (Figure 1 and Tables 2 and 3) 35,38 . The median number of ultrasound scans per woman was five (range, 1-6); 90% had four or more scans, indicating good adherence to the study protocol. The total number of 3D volumes available was 5746; however, 2730 (48%) volumes could not be assessed. Most of these were obtained after 32 weeks' gestation when visualization and assessment of brain structures are hampered by acoustic shadowing of the calcified fetal skull, fetal head position in the maternal pelvis and a reduction in amniotic fluid volume 37,42,52 . Other reasons for images of limited quality were fetal movement artifact not evident during the original scan, acoustic shadows from proximal structures, reverberation artifacts and unfavorable fetal head orientation 22,37 . After these exclusions and removal of 11 outliers, 3016 and 2359 volumes from 1130 fetuses were available for TCD and SF analysis, respectively.
Interobserver reproducibility for plane reconstruction and TCD measurement was assessed in a randomized subset of volumes from 132 fetuses (12% of the total sample of 1130 fetuses), across a gestational-age range from 15 + 0 weeks to 36 + 2 weeks. TCD could be measured in 108 (82%) of the 132 volumes. Bland-Altman plots were used to present graphically the interobserver differences and their 95% limits of agreement (Figure 2) 49,50 . The mean interobserver difference was very close to zero (0.07 mm); the inferior and superior limits of agreement (± 2 SD) were -2.40 mm and + 2.55 mm, respectively, suggesting very close agreement. No evidence of consistent bias was seen across the range of measurements.
Figure 2 Bland-Altman plot showing interobserver reproducibility for measurement of fetal transcerebellar diameter on ultrasound in 108 randomly selected fetuses. Mean interobserver difference and 95% limits of agreement (± 2 SD) are shown.
For both structures assessed, within-site variance was greater than between-site variance. For TCD, within-site variance was 0.33 (10.9% of the total variance), while between-site variance was 0.005 (0.2% of the total variance). For SF maturation, the within-site variance estimate was 0.007 (3.2% of the total variance), while between-site variance was 0.005 (2.3% of the total variance) (Table 4). SSD according to gestational age for the five sites was expressed as a proportion of the SD of all the sites at each gestational-age interval. All SSDs for TCD and SF score were below 0.5 SD for all five fetal gestational-age windows (Figure S2). The results of these analyses show that the five study populations are sufficiently similar, according to WHO predefined criteria 53 , for the data to be pooled to produce international standards.
All cerebellar measurements were distributed normally, conditional on gestational age. The best fitting powers were provided by a second-degree fractional polynomial. The gestational-age-specific 3rd, 50th and 97th smoothed centiles for TCD are shown in Figure 3a, and the gestational-age-specific 3rd, 5th, 10th, 50th, 90th, 95th and 97th smoothed centiles are presented in Table S1. For clinical purposes, we also present the same fitted centiles of gestational-age estimation based on measurement of TCD (Figure 3b, Table S2).
Assessment of goodness of fit of the TCD model by gestational-age-specific comparisons of empirical centiles to smoothed centile curves showed good agreement, and scatterplots of Z-scores by gestational age did not show any patterns (data not shown). The equations for the mean and SD from the fractional polynomial models for TCD are presented in Table S3, which allow the reader to calculate any desired centiles according to gestational age in exact weeks. The actual values for these centiles according to gestational age are presented in Table S1. Striking overlap of TCD values between male and female fetuses was seen ( Figure S3).
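Because the measurements were normally distributed conditional on gestational age, any centile follows from the fitted mean and SD as mean + z × SD. The Python sketch below shows this arithmetic. The equations in Table S3 are not reproduced here, so mean_fn and sd_fn are hypothetical placeholders into which the reader would substitute the published equations; the function names are likewise illustrative.

```python
from statistics import NormalDist


def centile_value(ga_weeks, centile, mean_fn, sd_fn):
    """Measurement at a given centile (0-100, exclusive) for a gestational age.

    mean_fn and sd_fn are placeholders for the fitted mean and SD equations;
    each takes gestational age in exact weeks.
    """
    z = NormalDist().inv_cdf(centile / 100)  # standard normal quantile
    return mean_fn(ga_weeks) + z * sd_fn(ga_weeks)


def z_score(measurement, ga_weeks, mean_fn, sd_fn):
    """Z-score of an observed measurement at a given gestational age."""
    return (measurement - mean_fn(ga_weeks)) / sd_fn(ga_weeks)
```

For example, the 3rd and 97th centiles correspond to z of approximately -1.88 and +1.88, so the two curves are symmetric about the fitted mean at every gestational age.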
The SF increased in maturation with advancing gestation, and there was no appreciable difference between study sites, with complete overlap of the mean gestational age and 95% CI for each developmental score (Figure 4). Spaghetti plots of SF scores obtained longitudinally in the same individuals, according to study site, are shown in Figure 5. Similar to TCD, there were no sex differences in SF maturation score (Figure S4). Finally, it was possible to determine which SF was measured (right or left) in 2359 scans; spaghetti plots of the maturation score between the right and left SFs showed no differences (Figure S5).
Figure 5 Spaghetti plot of fetal Sylvian fissure maturation scores, obtained longitudinally in fetuses that contributed data to Sylvian fissure maturation international standards, according to study site. Kenya (n = 280 (24.8%)); India (n = 291 (25.8%)); Italy (n = 262 (23.2%)); Brazil (n = 156 (13.8%)); UK (n = 141 (12.5%)); total, n = 1130.

DISCUSSION
In this study, we present international standards for fetal cerebellar growth and SF maturation based on data from a large, longitudinal sample, obtained under rigorously controlled conditions, from well nourished women living in environments with minimal constraints on fetal growth, across five geographically diverse urban areas worldwide. In addition, and unique to the ultrasound literature, we provide follow-up evidence that the fetuses that contributed data to these standards had low morbidity and adequate growth and development at 2 years of age 35,38 .
Variance component analysis showed that only 0.2% and 2.3% of the total variability in fetal cerebellar growth and SF maturation, respectively, could be attributed to between-site differences. These results are compatible with the 1.9-3.5% variability between sites reported for fetal skeletal growth and newborn length in the FGLS 36 , the 3% variability reported for infant length in the WHO Multicentre Growth Reference Study 53 , and the 1.3-9.2% variability reported for skeletal growth and neurodevelopmental milestones at 2 years of age in the FGLS follow-up study 35,38 .
Our results refute suggestions that the observed variability in these measurements, between unselected samples, is related to genetic differences 3,17,21,54,55 . Similar suggestions have been made previously about human growth. However, there is now consistent evidence that the variability in human skeletal growth within a population is seven times larger than that between populations (genetic variability), which represents less than 10% of the total variance 36,56,57 .
Our results also do not support the recently published suggestion of in-utero sexual dimorphism in the development of brain structure and function 58,59 . Specifically, male fetuses have been documented to have larger cerebellar gray-matter volume, as well as greater intracerebellar functional connectivity 59 . However, we did not find any sex-related differences in the pattern of growth or maturation of the studied brain structures. In addition, we found no differences between the right and left SF in this in-utero population; in the mature brain, the SF follows a steeper trajectory in the right hemisphere, while it extends further posteriorly and is longer (in horizontal length) on the left side. It is possible that these differences are not evident in utero or in the axial ultrasound planes.
The international standards presented here demonstrate a more than 2-fold increase in TCD during the second half of pregnancy ( Figure 3). The rapid growth of the cerebellum within a relatively narrow time period suggests that TCD measurement may facilitate gestational age estimation during the second and early third trimesters. This may also apply in suspected fetal growth restriction, as the brain-sparing phenomenon may protect cerebellar growth 4,19,21,28 , and close to term when the head may be more difficult to measure 4,7,19,21 . In our study, TCD predicted gestational age well, and the prediction intervals of ± 7 days at 20 weeks and ± 10 days at 32 weeks compare favorably with those obtained using head circumference 30 . Further work will be needed to assess robustness in pregnancies with abnormal growth.
Longitudinal evaluation of SF maturation shows a characteristic pattern 44,60,61 . We estimated the mean gestational age and 95% CI at which each SF maturation categorical score is expected. Hence, assessment of the progress of SF maturation may be a useful adjunct to fetal brain examination, especially if a brain abnormality is suspected 23,44 .
We chose to describe and quantify ultrasound patterns of SF maturation for the following reasons: first, the SF is the first primary fissure to be evident sonographically (at 17-19 weeks' gestation), which means that it can be assessed readily at the time of the mid-trimester fetal scan 9,52 ; second, its development follows a predictable timetable during pregnancy, which makes it easier to incorporate SF assessment into routine clinical care 11,52 ; third, SF assessment is feasible using the standard 2D ultrasound plane used for routine fetal head biometry, facilitating examination without specialist training in neurosonography or transvaginal scanning, and without increasing the total scanning time 11,23 ; and, fourth, a simple SF scoring system is available 22 . More complex scoring systems or measurements may then be used for detailed neurosonographic examination 11,23,24,62 .
A number of studies have reported TCD 4,6,9,13-22 and SF maturation 9,22-24 reference ranges; however, existing charts have several limitations. A recent systematic review found substantial heterogeneity in the methodological quality of studies aimed at developing fetal brain structure charts, which can lead to significant variability in the interpretation of ultrasound measurements. None of the studies had a low risk of bias for sample selection, inclusion/exclusion criteria, quality control, neonatal/infant outcomes or neurological follow-up; in addition, goodness of fit of the proposed model was reported in fewer than 35% of the studies 27 . We therefore aimed to avoid these limitations by producing international standards to complement those from the INTERGROWTH-21st Project that are already being used clinically and for research purposes 29-34,63,64 . The 3D ultrasound volumes were taken specifically for this purpose during the FGLS, using rigorous methods, identical ultrasound equipment, blinding of operators and a detailed quality-control strategy 28,31 . The images were extracted from the volumes using standardized axial planes recommended for routine clinical practice 5 , and were measured and scored in accordance with a predefined protocol 40 . Our reproducibility study suggested a high level of agreement in fetal imaging. Furthermore, and crucially, the cohort was followed up to 2 years of age and developmental outcomes confirmed their eligibility for the construction of international standards 38,39 .
It could be said that measurements or scores acquired on planes extracted from 3D volumes do not completely accord with 2D measurements obtained in real time 37,42,65 . However, we believe that this is unlikely because, although volumetry is associated with a degree of variability if not standardized, once rigorous methodology is adopted, 2D assessments from reconstructed planes can be as reproducible as, and consistent with, 2D measurements obtained in real time 22,37,42,52,65 .

Conclusions
The growth and developmental patterns of the fetal brain structures we studied were similar across diverse geographical regions, and there were no differences between male and female fetuses. Hence, we pooled the data to produce international standards for TCD growth and SF maturation. We suggest that widespread implementation of the standards will enhance the clinical interpretation of fetal brain scans and standardize research findings.

ACKNOWLEDGMENTS
This study was funded by the INTERGROWTH-21st grant 49038 from the Bill & Melinda Gates Foundation to the University of Oxford; we gratefully acknowledge their support. A.T.P. is supported by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre with funding from the NIHR Biomedical Research Centre (BRC) funding scheme. The views expressed herein are those of the authors and not necessarily those of the NHS, the NIHR, the Department of Health or any of the other funders. We are grateful to Professor Brenda Eskenazi for her comments regarding the Sylvian fissure analysis. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

SUPPORTING INFORMATION ON THE INTERNET
The following supporting information may be found in the online version of this article:

Figure S2
Standardized site discrepancy (SSD) for fetal transcerebellar diameter (TCD) (a) and Sylvian fissure (SF) maturation score (b), according to gestational age. SSD calculated as: (mean of site's measurements - mean of all sites' measurements at each gestational age interval)/SD of all sites' measurements at each gestational age interval. SSD adjusted at median gestational age for all sites at each gestational age interval. ± 0.5 SD is shown (----).

Figure S3
Transcerebellar diameter ultrasound measurements according to gestational age and fetal sex (male (green) or female (red)), in fetuses that contributed data to transcerebellar diameter international standards. No suggestion of any sex differences was evident.

Figure S4
Spaghetti plot of fetal Sylvian fissure maturation scores, obtained longitudinally in fetuses that contributed data to Sylvian fissure maturation international standards, according to fetal sex. No suggestion of any sex differences was evident.

Figure S5
Spaghetti plot of Sylvian fissure maturation scores, obtained longitudinally in fetuses that contributed data to Sylvian fissure maturation international standards, according to side of the Sylvian fissure. No suggestion of any differences was evident.

Table S1
Smoothed centiles for transcerebellar diameter (mm) according to exact gestational age (weeks)