Validity and reliability of radiological methods to assess proximal hip geometry in children with cerebral palsy: a systematic review




The aim of this systematic review was to assess the current validity and reliability of radiological methods used to measure proximal hip geometry in children with cerebral palsy.


The MEDLINE, CINAHL Plus, Embase, Web of Science, Academic Search Premier, The Cochrane Library, and PsycINFO databases were searched using relevant keywords and predefined inclusion and exclusion criteria.


The migration percentage measured using X-rays showed excellent reliability and concurrent validity with three-dimensional (3D) measurements from computed tomography (CT) scans. The acetabular index, measured using X-rays, had good reliability but moderate concurrent validity with 3D CT measurements; 3D CT scan indexes had greater reliability. The measurement of the neck-shaft angle using X-rays showed excellent concurrent validity with measurements from 3D CT scans and excellent reliability. Regarding femoral anteversion, one study found an excellent correlation between two-dimensional CT and clinical assessment and excellent reliability. Two others provided weaker evidence for the use of CT and ultrasound.


Most of the X-ray-based measurements showed good to excellent metrological properties. More metrological evidence is needed for the assessment of femoral anteversion. Magnetic resonance imaging and ultrasound-based measurements have great potential although very little metrological evidence is available.

What this paper adds

  • Most X-ray-based measurements have good to excellent metrological properties.
  • More metrological evidence is needed for the assessment of femoral anteversion.
  • MRI- and ultrasound-based measurements have great potential but little metrological evidence is available regarding their use.

Hip deformities occur in over one-third of children with cerebral palsy (CP) and are the second most common musculoskeletal deformity after equinus.[1-3] The femoral head frequently migrates relative to the acetabulum, which can lead to subluxation causing pain and functional limitations.[4] Surveillance of hip deformity throughout growth is a challenge in this population.[5-7] Hip migration has been shown to be mostly associated with acetabular dysplasia, increased femoral neck-shaft angle, and increased femoral anteversion.[2, 8-11] These parameters are used in clinical decision-making regarding interventions[2, 10, 12] such as physical therapy, positioning, botulinum toxin injections, or surgery (e.g. adductor tenotomy, or derotational or varising osteotomy).[13-15] Because clinical evaluation of proximal hip geometry is limited to the measurement of femoral anteversion and joint range of motion, radiologically based measurements are required to provide a detailed assessment and to ensure a reliable follow-up throughout growth, pre- and post-intervention, or in research studies. The metrological properties of such measurements have not been specifically reviewed with regard to the evaluation of hip deformities found in children with CP. Numerous techniques are described in the literature, in studies of varying quality, making analysis difficult.

The main aim of this paper was to review the evidence for the metrological properties of image-based measurements of proximal hip geometry in children with CP, including hip migration, acetabular dysplasia, femoral neck-shaft angle, and femoral anteversion. More specifically, we aimed to (1) collect, evaluate, and report the data in studies that assessed the concurrent validity and reliability of imaging methods; (2) report the threshold for ‘real change’ when available; and (3) propose future research.


Database search and selection process

Articles were identified through a comprehensive search of the following computerized bibliographic databases: MEDLINE (1949-07/2012), CINAHL Plus (1937-07/2012), Embase (1947-07/2012), Web of Science (1898-07/2012), Academic Search Premier (1975-07/2012), The Cochrane Library, and PsycINFO (1967-07/2012).

The search used the following Medical Subject Headings (MeSH) terms and text words, combining the keywords to make the search as exhaustive as possible: (1) cerebral palsy; (2) (keywords relative to the hip or femur) ‘acetabular dysplasia’, ‘acetabular index’, ‘hip migration’, ‘hip subluxation’, ‘hip dysplasia’, ‘femoral anteversion’, ‘neck-shaft angle’, ‘coxa valga’; (3) (keywords relative to the imaging type) ‘radiography’, ‘X-ray’, ‘tomography’, ‘CT scan’, ‘ultrasonography’, ‘ultrasound’, ‘echography’, ‘MRI’, ‘biplanar radiography’, ‘biplanar X-ray’; and (4) (keywords relative to metrological properties) ‘measurement’, ‘measure’, ‘validity’, ‘reliability’, ‘repeatability’.

Two reviewers (BM and CP) independently assessed the papers by title and abstract with regard to the inclusion and exclusion criteria described below. Consensus for the inclusion/exclusion of relevant articles was reached by discussion. To be included, studies had to meet the following criteria: (1) original articles published in peer-reviewed journals, excluding conference proceedings; (2) studies including children or young adults below the age of 20 years with CP; (3) studies involving evaluation of imaging-based measurement of proximal hip geometry (acetabular dysplasia, hip migration, neck-shaft angle, or femoral torsion); and (4) studies reporting data regarding reliability and/or concurrent validity of imaging techniques.

The exclusion criteria were as follows: (1) studies not published in English; (2) studies including CP and other pathologies from which the data of the children with CP could not be extracted; (3) studies published before 1980, which were reviewed and considered inappropriate because they involved radiological methods that are no longer used;[16, 17] and (4) studies showing concurrent validity between different measures of the same concept using the same material (e.g. migration percentage and centre-edge angle measuring hip migration on the same anteroposterior X-ray, as in Reimers[18]), on the basis that studies comparing the same concept with different devices provided more evidence. Studies that analysed correlations between the different measures of proximal hip geometry without carrying out a metrological evaluation were also excluded (e.g. Abel et al.[8]).

The references of the selected articles were also searched in order to complete the selection process. Data items were extracted using a standardized form (see Tables 1 and 3).

Table 1. Population and methodology of articles evaluating hip migration, acetabular morphology, and the Robin and Graham classification
| Study | Type of study | Number of hips | Number of children | Mean age/range/SD | Population description | Radiological device | Posture control (Y=yes, N=not reported) | Measure | Comparison with another technique | Number of examiners | Examiners: qualifications and years of experience | Number of trials by sessions/number of sessions | Assessment | Statistical analysis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hip migration | | | | | | | | | | | | | | |
| Cliffe et al.[36] | P | 40 | 20 | 30mo–10y | GMFCS level IV: 10, V: 10 | Anteroposterior X-ray, two radiographs taken the same day | Pelvis and hips in neutral position | Migration percentage | | 2 | Paediatric radiologists | Interpretation by two radiologists of the two radiographs for each child, twice | Intra- and inter-reliability, assessment of the effect of positioning, inter-observer error and variations in observer technique over time | Mean, SD, Pearson's correlation coefficient, ICC |
| Faraj et al.[37] | R | 44 | 22 | 2–8y | | Anteroposterior X-ray | Hip joints in neutral position, without ab/adduction | Migration percentage | | 2 | Orthopaedic trainees | 2/2 | Intra- and inter-reliability within and between sessions | MAD, Kruskal–Wallis, Mann–Whitney U |
| Kim et al.[38] | R | 152 | 100 | 7y 11mo (SD 1y 6mo) | | Anteroposterior X-ray | Coxae in neutral position | Migration percentage; modified migration percentage | | 2 | Rehabilitation doctors | 1/3 | Intra- and inter-reliability | ICC, SEM |
| Pountney et al.[40] | R | 20 | | | | Anteroposterior X-ray | Pelvis and hips in neutral position | Migration percentage (use of a drafting arm) | | 3 | | 2/2 | Intra- and inter-reliability | Bland and Altman (LA) |
| Acetabular morphology | | | | | | | | | | | | | | |
| Chung et al.[31] | R | 54 | 27 | 7y 2mo (SD 2y 4mo) | Spastic quadri: 19, dipl: 6, mixed: 2 | 3D CT | N | 3D visual assessment of acetabular dysplasia (anterior, posterior or global defect) | | 4 | 20: 5y/4y/trainee | 1/2 | Intra- and inter-reliability | Kappa, % of agreement (rate of people classifying the acetabulum in the same class) |
| Chung et al.[29] | P | 17 | 12 | 8y 1mo (5y 11mo–13y 2mo) | 12 spastic quadri; GMFCS III: 1, IV: 8, V: 3 | 3D CT | N | Three directional indexes (anterosuperior, superolateral, posterosuperior) | | 3 | 20: 5y/chief resident | 1/2 | Intra- and inter-reliability | ICC, mean difference (range) |
| Park et al.[33] | P | 22 | 16 | 8y 4mo (SD 2y 2mo) | Spastic quadri; GMFCS III: 2, IV: 8, V: 6 | 3D CT | N | From reformatted axial plane: anterior/posterior acetabular indexes and acetabular anteversion | Three-directional indexes (anterosuperior, posterosuperior) for pre- and post-osteotomy | 4 | 20y/6y/5y/research assistant | 1/2 | Intra- and inter-reliability, concurrent validity | Sample size, ICC, Pearson correlation coefficients, paired t test and Wilcoxon signed rank test |
| Multiple evaluations | | | | | | | | | | | | | | |
| Gose et al.[35] | R | 150 (102 for concurrent study) | 75 | 5y 5mo (2y 8mo–6y 11mo) | Spastic dipl: 60, quadri: 15; GMFCS II: 17, III: 34, IV: 16, V: 8 | Anteroposterior X-ray, 3D CT | N | Migration percentage, acetabular index | 3D migration, lateral opening angle | | | | Concurrent validity | Spearman rank correlation coefficient |
| Gose et al.[34] | R | 91/20 (for reliability study of CT angles) | 91/20 (3D CT) | 5y 2mo (2y 7mo–6y 10mo) | Spastic dipl: 66, quadri: 25; GMFCS II: 9, III: 42, IV: 32, V: 8; Robin and Graham classification II: 4, III: 20, IV: 63, V: 4 | Anteroposterior X-ray, 3D CT | N | Robin and Graham classification | 3D migration, lateral opening angle, sagittal inclination angle | 2 (for reliability study of CT angles) | | 1/2 | Concurrent validity; intra- and inter-reliability (CT angles) | Kruskal–Wallis, ICC, RMSE |
| Lee et al.[32] | R | 51 out of 384 | 51 out of 384 | 9y 1mo (3y–17y) | GMFCS I: 146, II: 109, III: 69, IV: 42, V: 18; 307 dipl and quadri, 77 hemi | Anteroposterior X-ray | Supine position, hips in internal rotation | Migration percentage | | 2 | 8y/3y orthopaedic surgeons | 1/1 | Interrater reliability | Calculate sample size for reliability, ICC |
| Parrott et al.[39] | R | 20 | 20 | 32mo (11mo–8y 5mo) | | Anteroposterior X-ray | Pelvis and hips in neutral position | Migration percentage, acetabular index | | 5 | 1–3y research physiotherapists | 1/2 | Intra- and inter-reliability within and between sessions | Spearman, Wilcoxon, ANOVA, ICC, SEM |
| Segev et al.[41] | R | 20 | 10 | | | Anteroposterior X-ray | N | Acetabular index, centre-edge angle, migration percentage | | 5 | Senior orthopaedic surgeons | 1/3 | Intra- and inter-reliability | ANOVA: variances and standard deviations, ICC, paired t test |
| Hip classification | | | | | | | | | | | | | | |
| Murnaghan et al.[48] | R | 42 | 42 | 14–19y | | Anteroposterior X-ray | Pelvis and hips in neutral position | Robin and Graham classification | | 4 surgeons, 4 physiotherapists | 2 residents/2 surgeons: 4–10y/physiotherapists 16–26y | 1/2 | Intra- and inter-reliability, concurrent validity | ICC, asymptotic symmetry test |
| Robin et al.[47] | R | 268 | 134 | 16y 4mo (14y–19y 1mo) | GMFCS level I: 29, II: 25, III: 27, IV: 24, V: 29 | Anteroposterior X-ray | N | Robin and Graham classification | Migration percentage | 2 | | 1/1 | Concurrent validity | Agreement percentages |

R, retrospective; P, prospective; GMFCS, Gross Motor Function Classification System; quadri, quadriplegic; dipl, diplegic; hemi, hemiplegic; 3D, three-dimensional; ICC, intraclass correlation coefficient; SEM, standard error of measurement; MAD, median absolute difference; LA, limits of agreement; RMSE, root mean square error.

Quality and metrological assessment

Since no standardized quality assessment tools are available for the evaluation of articles in this field, a customized quality assessment scale was developed based on the literature. The aim of this scale was to provide both an assessment of the intrinsic quality of each article (maximum score 28) and an assessment of the metrological evidence supporting the method evaluated (maximum score 10). The total score was named the Q-score and is out of 100 (Table SI, online supporting information). The first part of the scale was based on previously published quality checklists for systematic reviews or scales assessing the quality of studies included in systematic reviews. These scales included questions focusing on the study design and quality of the reporting of the methodology and results.[19-22] The metrological part of the scale was based on evidence-based medicine in radiology[23] and studies providing examples of scales to evaluate metrological articles.[24, 25] The metrological score reflects the amount of metrological evidence brought by an article for the method considered. The rating of the quality assessment was carried out by two observers (CP and SB) independently and disagreements were then resolved by consensus.

In this review, an intraclass correlation coefficient (ICC) of 0–0.20 was considered low, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good, and 0.81–1 excellent. An r or kappa coefficient of 0.21–0.40 was considered fair, 0.41–0.60 moderate, 0.61–0.80 good, and 0.81–1 excellent.[26-28] Although we acknowledge the limits of using such simple definitions, we decided to use this system for clarity and to enable comparisons between studies.[27, 28]
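The ICC bands above can be written as a small lookup. The following Python sketch is illustrative only: the function name `classify_icc` is hypothetical, and it assumes coefficients are reported to two decimal places, so that a value falling between two stated bands (e.g. 0.205) is assigned to the higher band.

```python
def classify_icc(value: float) -> str:
    """Return the qualitative label used in this review for an ICC value.

    Assumption (not from the review): values are rounded to two decimals,
    so each band is treated as closed at its stated upper bound.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("expected a coefficient between 0 and 1")
    # (upper bound of the band, label), checked in ascending order.
    bands = [
        (0.20, "low"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "good"),
        (1.00, "excellent"),
    ]
    for upper, label in bands:
        if value <= upper:
            return label
    raise AssertionError("unreachable: value was checked to be <= 1.0")


# Example with interrater ICCs reported in Tables 3 and 4.
for icc in (0.93, 0.79, 0.68):
    print(icc, classify_icc(icc))
```

The same bands apply to r and kappa coefficients from 0.21 upwards; the review does not state a label for r or kappa values below 0.21, so the sketch should not be used for those.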

Because it was impossible to combine the results owing to the different statistical methods used, measurement errors estimating a statistical value for true change are reported as they were reported in each article.


References and explanations of the measurements taken from images can be found in Table SII (online supporting information). Figures 1 and 2 provide a visual representation of the main measurements described below.

Figure 1.

Anteroposterior hip X-ray measurements. Migration percentage=AB/AC×100%; AI, acetabular index; CEA, centre-edge angle; HSA, head-shaft angle; NSA, neck-shaft angle.
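
As a worked example of the migration percentage formula in the caption, the Python sketch below computes AB/AC×100. It assumes, following Reimers' definition, that AB is the width of the femoral head lying lateral to the acetabular margin and AC is the total width of the femoral head; the figure itself is not reproduced here, and the function name and input values are hypothetical.

```python
def migration_percentage(ab: float, ac: float) -> float:
    """Migration percentage = AB / AC x 100.

    ab: width of the uncovered part of the femoral head (assumed meaning of AB)
    ac: total width of the femoral head (assumed meaning of AC)
    Both lengths must be in the same unit (e.g. millimetres on the radiograph).
    """
    if ac <= 0:
        raise ValueError("AC must be a positive length")
    if ab < 0:
        raise ValueError("AB cannot be negative")
    return 100.0 * ab / ac


# Hypothetical measurements: 10 mm uncovered out of a 40 mm wide femoral head.
print(migration_percentage(10.0, 40.0))  # 25.0
```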

Figure 2.

Femoral anteversion. Anterior torsion of the femoral neck relative to the femoral condyles in the transverse plane. A, anterior; P, posterior.

Selection process

After duplicates were removed, 962 articles were automatically extracted from the databases and 19 articles fitted the inclusion and exclusion criteria: 13 from the PubMed database, three from other databases, and three found during the reference check. The methodology of each study is summarized in Tables 1 and 2 and the results of each study in Tables 3 and 4. Five articles came from the same team[29-33] and three came from another team.[10, 34, 35]

Table 2. Population and methodology of articles evaluating the neck-shaft angle, head-shaft angle, and femoral anteversion
| Study | Type of study (R=retrospective, P=prospective) | Number of hips | Number of children | Mean age (y)/range/SD | Population description | Radiological device | Posture control (NR=not reported) | Measure | Comparison with another technique | Number of examiners | Qualifications and years of experience | Number of trials by sessions/number of sessions | Assessment | Statistical analysis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Neck-shaft angle, head-shaft angle | | | | | | | | | | | | | | |
| Foroohar et al.[43] | R | 10 for reliability (out of 70) | 39 | Group 1: 8y 1mo; Group 2: 7y 8mo; Group 3: 7y | Group 1: 15 patients with spastic CP; Group 2: 10 patients with spastic CP, need surgical intervention; Group 3: control | Anteroposterior X-ray | Radiographs in external rotation excluded | X-ray head-shaft angle | | 3 | Orthopaedic residents | 1/3 | Intra- and interrater reliability | ICC |
| Gose et al.[10] | R | 186 | 93 | 5y 4mo (2y 7mo–6y 10mo) | 73 dipl, 20 quadri; GMFCS level II: 20, III: 41, IV: 22, V: 10 | 3D CT, X-ray | NR | 3D CT neck-shaft angle | X-ray neck-shaft angle | | | | Concurrent validity | Spearman rank correlation test |
| Lee et al.[32] | R | 384, 51 (out of 384) for the reliability session | 384, 51 (out of 384) for the reliability session | 9y 1mo (3y–17y) | GMFCS I: 146, II: 109, III: 69, IV: 42, V: 18; 307 dipl and quadri, 77 hemi | Anteroposterior X-ray | Supine, hips in internal rotation | X-ray neck-shaft angle and X-ray head-shaft angle | | 2 for reliability | Orthopaedic surgeons 8/3y (reliability) | 1/1 | Interrater reliability | Calculate sample size for reliability, ICC for continuous data and kappa for categorical data; Pearson correlation coefficient |
| Femoral anteversion | | | | | | | | | | | | | | |
| Mahboubi et al.[45] | R | 18 (out of 30) | 18 (out of 30) | 3–15y | Patients with coxa valga, NSA>140° | CT | NR | CT femoral anteversion, superimposition of two proximal slices | CT femoral anteversion (one proximal slice) | | | | Concurrent validity | |
| Miller et al.[44] | R | 80 | 40 | 7y 7mo (3y–16y) | | Ultrasound, CT | Ultrasound: supine, legs flexed over the end of the table and parallel to each other; CT: NR | Head-neck ultrasound or flat-surface ultrasound | CT femoral anteversion; CT flat-surface method | | | | Concurrent validity | Correlation coefficient |
| Femoral anteversion and neck-shaft angle | | | | | | | | | | | | | | |
| Chung et al.[30] | P | 36 | 36 | 11y (5y–20y) | GMFCS level I: 5, II: 11, III: 11, IV: 7, V: 2; 6 hemi, 25 dipl, 5 quadri | Anteroposterior X-ray | Supine, hips in internal rotation | X-ray neck-shaft angle | CT reformatted slices neck-shaft angle | 3 | | 1/2 | Concurrent validity, intra- and interrater reliability | Sample size, ICC, Pearson rank correlation coefficient |
| | | | | | | CT | NR | CT reformatted slices neck-shaft angle | | 2 | | 1/2 | Intra- and interrater reliability | ICC, Pearson rank correlation coefficient |
| | | | | | | CT | NR | CT femoral anteversion | Trochanteric prominence angle test, maximal hip internal and external rotation | 2 | 8/7y | 1/2 | Concurrent validity, intra- and interrater reliability | |

R, retrospective; P, prospective; quadri, quadriplegic; dipl, diplegic; hemi, hemiplegic; 3D, three-dimensional; ICC, intraclass correlation coefficient.

Table 3. Metrological properties of measures evaluating hip migration, acetabular morphology, and Robin and Graham classification
| Study | Radiological device | Measure | Concurrent validity | Intrarater reliability | Interrater reliability | Comments |
|---|---|---|---|---|---|---|
| Hip migration | | | | | | |
| Cliffe et al.[36] | Anteroposterior X-ray, two radiographs taken the same day | Migration percentage on two radiographs for one hip | | Two radiographs of the same child: ICC=0.97; mean diff=3.2% the same day, 3.3% a later day; correlation between migration percentage (2 radiographs): R²=0.94 (p<0.001); mean+2SD=10% | Two radiographs of the same child: ICC=0.96; mean diff=3.7% (SD=3.8%); mean+2SD=11% | Assessment of two different radiographs for each child |
| Faraj et al.[37] | Anteroposterior X-ray | Migration percentage | | MAD intra-session=3% (95th centile 12.7%); MAD inter-session=1.7–3.2% (95th centile 12.6–12.9%) | MAD=2.8% (95th centile 22%) | |
| Gose et al.[35] | Anteroposterior X-ray, 3D CT | Migration percentage | Correlation between 3D migration and migration percentage: r=0.85 (p<0.001) | | | |
| Kim et al.[38] | Anteroposterior X-ray | Migration percentage, modified migration percentage | | Migration percentage: ICC range=0.94–0.97, SEM range=6.0–11.9%; modified migration percentage: ICC range=0.90–0.97, SEM range=6.2–17.1% | Migration percentage: ICC=0.95, SEM=9.1%; modified migration percentage: ICC=0.87, SEM=15.9% | Higher reliability in the group without dysplasia |
| Lee et al.[32] | Anteroposterior X-ray | Migration percentage | | | ICC=0.93 | |
| Parrott et al.[39] | Anteroposterior X-ray | Migration percentage | | ICC range=0.95–0.97; SEM for measurement change range=2.82–4.22%; mean diff range=–1.94–2.78%; SD range=1.59–5.75%; 1.96×SEM=5.8% | ICC range=0.91–0.93; SEM for measurement change range=4.33–5.86%; SD range=1.28–6.45%; 1.96×SEM=11.5% | Influence of the gothic arch on the measurement error |
| Pountney et al.[40] | Anteroposterior X-ray | Migration percentage | | LA range=3.3–7.6%; MD=0.1% | LA range=5–8.1%; MD=0.1–1.6% | Six hips excluded from the inter-measurer analysis of the LA |
| Segev et al.[41] | Anteroposterior X-ray | Migration percentage | | SD range=3.31–7.91% | ICC=0.83–0.92; SD=1.03–1.04% | |
| | | Centre-edge angle | | SD range=4.81–5.04° | ICC=0.60–0.74; SD=2.92–3.08° | |
| Acetabular morphology | | | | | | |
| Chung et al.[31] | 3D CT | 3D visual assessment of acetabular dysplasia | | Kappa range=0.50–0.81; % of agreement: 74–89 | Kappa=0.61; % of agreement: 79 | |
| Chung et al.[29] | 3D CT, reformatted slices | Three directional acetabular indexes | | ICC range=0.88–0.98; mean diff=2.1–4.1° | ICC range=0.85–0.93 | |
| Gose et al.[35] | Anteroposterior X-ray, 3D CT | Acetabular index | Correlation between lateral opening angle and acetabular index: r=0.58 (p<0.001) | | | |
| Gose et al.[34] | 3D CT | Lateral opening angle, sagittal inclination angle | | ICC=0.93; RMSE=1.51° | ICC=0.91; RMSE=1.69° | |
| Park et al.[33] | 3D CT, reformatted slices | From transaxial plane: anterior/posterior acetabular index, acetabular anteversion; three directional indexes | Pre-osteotomy: anterior acetabular index vs anterosuperior index r=0.45 (p=0.04); posterior acetabular index vs posterosuperior index r=0.49 (p=0.02). Post-osteotomy: anterior acetabular index vs anterosuperior index r=0.21 (p=0.36); posterior acetabular index vs posterosuperior index r=0.14 (p=0.53) | ICC range: anterior acetabular index=0.55–0.94, posterior acetabular index=0.85–0.96, acetabular anteversion=0.90–0.98. Mean diff range: anterior acetabular index=2.1–5.8°, posterior acetabular index=1.4–3.1°, acetabular anteversion=0.7–1.8° | ICC range: anterior acetabular index=0.79–0.83, posterior acetabular index=0.70–0.73, acetabular anteversion=0.92–0.95 | |
| Parrott et al.[39] | Anteroposterior X-ray | Acetabular index | | ICC range: 0.91–0.92; SEM for measurement change range: 1.82–1.91°; mean diff: –0.85–0.10; SD range: –2.62–2.80°; 1.96×SEM=2.6° | ICC range: 0.80–0.81; SEM for measurement change range: 2.69–3.02°; SD range: 0.82–3.53°; 1.96×SEM=5.9° | Influence of the gothic arch on the measurement error |
| Segev et al.[41] | Anteroposterior X-ray | Acetabular index | | SD=2.96–3.18° | ICC range=0.69–0.74; SD range=0.27–1.12° | |
| Gose et al.[34] | Anteroposterior X-ray, 3D CT | Robin and Graham classification, lateral opening angle, sagittal inclination angle, 3D migration | Significant correlation between grades of classification and lateral opening angle (p<0.001) and 3D migration (p<0.001), not with sagittal inclination angle (p=0.82) | | | |
| Murnaghan et al.[48] | Anteroposterior X-ray | Robin and Graham classification | Agreement with the established standards: ICC range=0.74–0.96 | ICC range=0.88–0.94 | ICC range=0.84–0.92 | |
| Robin et al.[47] | Anteroposterior X-ray | Robin and Graham classification | Percentage agreement of qualitative indices range=90.3–98.1%; agreement between estimated and true hip grade: kappa coefficient=0.96 | | | |

3D, three-dimensional; ICC, intraclass correlation coefficient; MAD, median absolute difference; Mean diff, mean difference; SEM, standard error of measurement; LA, limits of agreement; RMSE, root mean square error.

Table 4. Metrological properties of measures evaluating neck-shaft angle, head-shaft angle, and femoral anteversion
| Study | Radiological device | Measure | Intrarater reliability | Interrater reliability | Concurrent validity | Comments |
|---|---|---|---|---|---|---|
| Neck-shaft angle, head-shaft angle | | | | | | |
| Foroohar et al.[43] | Anteroposterior X-ray | X-ray head-shaft angle | ICC=0.68 (1 rater) | ICC=0.94 (3 raters) | | |
| Chung et al.[30] | Anteroposterior X-ray | X-ray neck-shaft angle; CT reformatted slices neck-shaft angle | ICC=0.93–0.97; mean diff=4.0 (SD 3.4°); 90% of the measurements within 10° | ICC=0.87–0.94 | Correlation between X-ray and CT reformatted slices neck-shaft angle: r=0.89 (p<0.001) | |
| | 3D CT, reformatted slices | CT reformatted slices neck-shaft angle | ICC=0.96–0.97 | ICC=0.96–0.96 | | |
| Gose et al.[10] | 3D CT, anteroposterior X-ray | 3D CT neck-shaft angle; X-ray neck-shaft angle | | | Correlation between 3D CT and X-ray neck-shaft angle: r=0.74 (p<0.001) | X-ray neck-shaft angle significantly larger than the 3D CT neck-shaft angle (p<0.001) |
| Lee et al.[32] | Anteroposterior X-ray | X-ray neck-shaft angle; X-ray head-shaft angle; shape of the proximal femoral epiphysis | | X-ray neck-shaft angle: ICC=0.98; X-ray head-shaft angle: ICC=0.79; shape of the proximal femoral epiphysis: gamma=0.97, kappa=0.66 | | Correlation between X-ray neck-shaft angle and head-shaft angle: r=0.72 (p<0.001); X-ray neck-shaft angle and migration percentage: r=0.42 (p<0.001); X-ray head-shaft angle and migration percentage: r=0.26 (p<0.001) |
| Femoral anteversion | | | | | | |
| Mahboubi et al.[45] | CT | CT femoral anteversion | | | Comparison between the old and the new technique: r=0.90 | Mean for the new technique 29.4° and for the old 25.4°. Difference of more than 10° in some patients |
| Miller et al.[44] | CT, ultrasound | CT femoral anteversion; CT flat-surface method; ultrasound femoral anteversion; flat-surface ultrasound | | | Correlation between: head-neck ultrasound and CT femoral anteversion=0.51; head-neck ultrasound and CT flat-surface method=0.54; head-neck ultrasound and flat-surface ultrasound=0.72; flat-surface ultrasound and CT femoral anteversion=0.60; flat-surface ultrasound and CT flat-surface method=0.67 | Regarding CT femoral anteversion, 21 hips could not be measured by one proximal slice. Eight hips could not be measured by two proximal slices because of high neck-shaft angle |
| Chung et al.[30] | CT | CT femoral anteversion | ICC=0.98–0.995 | ICC=0.98–0.98 | Correlation between CT measurement and trochanteric prominence angle test r=0.86 (p<0.001); hip internal rotation r=0.79 (p<0.001); hip external rotation r=–0.48 (p<0.001) | |

CT, computed tomography; ICC, intraclass correlation coefficient.

Quality assessment

The quality rating scale is presented in Table SI. The principal aim of 13 out of the 19 studies was metrological. The mean Q-score was 65/100 (SD 14). Two articles had scores greater than 80, five articles had scores between 70 and 80, seven articles had scores between 60 and 70 and five articles had scores lower than 60/100.

Measurement of hip migration

The hip migration percentage, as first proposed by Reimers in 1980,[18] was evaluated in eight studies (mean Q-score 67).[32, 35-41] Except for the Gose et al.[35] study (Q-score 55), the studies evaluated only the use of anteroposterior X-rays for the assessment of femoral head migration. Gose et al.[35] found excellent concurrent validity between migration percentage and the degree of migration in the frontal plane using a three-dimensional (3D) model of migration based on 3D computed tomography (CT). The seven other studies examined the reliability of migration percentage, all showing that intra- and interrater reliability was good to excellent. Results for standard error of measurement (SEM) × 1.96, mean + 2SD, and the 95th centile of the absolute difference between measurements were reported to be 5.8%, 10%, and 12.9% respectively for one measurer, and 11.5%, 11%, and 22% respectively for different measurers, depending on the study[36, 37, 39] (Q-scores 74, 74, 71). Using the lateral edge of the acetabular roof was more reliable than using the ‘sourcil’ of the acetabulum, as described by Kim et al.[38]
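
To illustrate how such a threshold is applied, the Python sketch below checks whether the difference between two migration percentage readings exceeds SEM × 1.96. The two constants are the values reported by Parrott et al.[39] in Table 3; the function names and the example readings are hypothetical, and this is a minimal sketch rather than a validated clinical rule.

```python
# SEM x 1.96 for migration percentage, as reported by Parrott et al. (Table 3),
# expressed in percentage points.
THRESHOLD_SAME_MEASURER = 5.8
THRESHOLD_DIFFERENT_MEASURERS = 11.5


def threshold_from_sem(sem: float, z: float = 1.96) -> float:
    """Threshold for change computed as z x SEM, as used in the studies above."""
    if sem < 0:
        raise ValueError("SEM cannot be negative")
    return z * sem


def exceeds_threshold(first: float, second: float, threshold: float) -> bool:
    """True if the absolute change between two readings is above the threshold."""
    return abs(second - first) > threshold


# Hypothetical readings: migration percentage of 22% at one visit, 30% at the next.
print(exceeds_threshold(22.0, 30.0, THRESHOLD_SAME_MEASURER))        # True
print(exceeds_threshold(22.0, 30.0, THRESHOLD_DIFFERENT_MEASURERS))  # False
```

With these hypothetical readings, the same 8-point change would count as a real change when both radiographs are read by the same measurer but not when they are read by different measurers.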

Measurement of acetabular dysplasia

The acetabular index, as proposed by Hilgenreiner,[42] using anteroposterior X-rays, was evaluated in three studies (Q-scores 74, 61, 55, mean 63).[35, 39, 41] The acetabular index showed moderate concurrent validity with measures carried out using 3D CT scans.[35] The intrarater reliability of the acetabular index was found to be excellent[39] and the interrater reliability good in two studies.[39, 41] The value of SEM × 1.96 was reported to be 2.6° for a single assessor and 5.9° for different assessors.[39]

Morphological axial two-dimensional (2D) measurements of the acetabulum, including the anterior and posterior acetabular indexes and acetabular anteversion,[33] showed moderate concurrent validity with 3D measures using reformatted slices from a 3D CT scan performed before osteotomy (r=0.45–0.49, p<0.05) and poor concurrent validity after osteotomy (r=0.14–0.21, p>0.05). Intra- and interrater reliability of the different 2D measures was moderate to excellent (mean differences between 0.7° and 5.8°, Q-score 92).

Three-dimensional CT was used to evaluate different measurements in four studies.[29, 31, 34, 35] Chung et al.[31] visually assessed acetabular defects using 3D images and found moderate to good intra- and interrater reliability (Q-score 63). In a later study, the same team evaluated three directional indices of acetabular geometry calculated from reformatted slices from 3D CT scans and found excellent intra- and interrater reliability[29] (Q-score 68). In two different studies, Gose et al.[34, 35] showed that the intra- and interrater reliability of using a best-fit plane of the active surface of the ilium to define the orientation of the acetabulum was excellent, and there was moderate concurrent validity with plain radiography (Q-scores 55, 74).

Measurement of neck-shaft angle and head-shaft angle

The neck-shaft angle was evaluated in three studies[10, 30, 32] (Q-scores 55, 68, 89; mean 71) and the head-shaft angle in two studies[32, 43] (Q-scores 53, 68; mean 61). The neck-shaft angle measured using anteroposterior X-ray showed an excellent correlation with the neck-shaft angle measured using 3D CT reformatted slices[30] and a good correlation with that measured on 3D models derived from segmentation.[10] The intra- and interrater reliability of X-rays for the measurement of neck-shaft angle was excellent;[30, 32] the intrarater reliability for head-shaft angle was good and the interrater reliability was good to excellent.[32, 43] Chung et al.[30] showed that 90% of the measurements of neck-shaft angle using X-rays by the same rater were within 10° (Q-score 89). The same authors reported the neck-shaft angle measured using 3D CT scans (reformatted slices) to be a highly reliable method.

Femoral anteversion

Three studies evaluated in vivo measures of femoral anteversion in children with CP[30, 44, 45] (mean Q-score 53). Chung et al.[30] (Q-score 89) found an excellent correlation between measurements from 2D CT scans (a method using two slices through the femoral neck) and the trochanteric prominence angle test, even though the latter overestimated the CT scan measurement by a mean of 4.8° (SD 8.4). They reported excellent intra- and interrater reliability of the 2D CT scan measurement.

Miller et al.[44] (Q-score 42) evaluated the concurrent validity of two 2D CT scan methods (using measurements taken from a single slice or two laminated sections on the neck and flat surface) and two measurements using ultrasound (head-neck and flat-surface measure). The correlation of the two CT measurements with the ultrasound-based head-neck measurement was moderate, whereas their correlation with the ultrasound-based flat-surface measurement was good.

Mahboubi et al.[45] (Q-score 29) evaluated the concurrent validity of two methods based on two different CT scan slices: the first involved superimposition of a slice from the capital femoral epiphysis on a slice through the femoral neck and the second was Hernandez's method[46] using one slice. The correlation between the two methods was excellent even though the authors reported differences of more than 10° between the two measurements in some patients.

The Robin and Graham classification

Robin et al.[47] developed a six-grade radiological classification for children with CP at bone maturity based on qualitative indices and the measurement of the migration percentage. This classification was assessed in three articles[34, 47, 48] (Q-scores 63, 71, 74, mean 68). It was originally applied to children with CP at skeletal maturity (closure of triradiate cartilage). The intra- and inter-observer reliability were excellent in this ‘matured population’[48] (Q-score 71). The agreement between ‘estimated’ hip grade and true hip grade based on migration percentage was good to excellent. The percentage of agreement between the four qualitative morphological features of the classification and the hip grading based on migration percentage was excellent[47] (Q-score 63). More recently, a good correlation between the grades of classification and measurements from 3D CT scans was also found in a population of children aged between 2 years and 7 years.[34]


The aim of this systematic review was to evaluate existing metrological evidence for image-based measurements of hip geometry in children with CP. The metrological evidence for the use of the four main parameters defining hip geometry in this population varied depending on the parameter and the imaging and measurement techniques used. After discussion of the evidence for each parameter, the main limitations of this review are discussed and perspectives are highlighted.

Hip migration

Of all the parameters assessed, the metrological properties of the migration percentage have been the most studied. This parameter was shown to have good concurrent validity and good to excellent intra- and interrater reliability. The reliability of migration percentage measurement tends to increase with the size of the femoral head and age.[39] A threshold for true change has been defined between 5.8% and 11.5% depending on the article;[36, 39] however, this threshold is greatly increased if the measurer is inexperienced.[37]

The results of this review show that, to date, there is a lack of studies that have evaluated the use of 3D images to assess hip migration in children with CP. In the light of the current literature, migration percentage appears to be the most valid and reliable technique for the surveillance of hip migration in children with CP. Some suggestions have been made in the literature to improve reliability and to reduce errors: (1) reliability may be increased by providing more specific anatomical landmarks for measurements of the acetabulum, such as the lateral margin of the acetabular roof[38] or the midpoint of the ‘gothic arch’, which frequently occurs in the case of acetabular dysplasia;[39] (2) careful positioning of the patient may also limit measurement errors and increase reliability.[18, 49] Radiographs should be carried out as far as possible with the pelvis flat, horizontal, and with neutral abduction/adduction of the legs.[50, 51]
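As a reminder of what is being measured, the migration percentage is conventionally the proportion of the femoral head width that lies lateral to Perkins' line, expressed as a percentage. The following is a minimal sketch of that arithmetic only, assuming that the landmarks have already been identified on the radiograph and projected onto an axis parallel to Hilgenreiner's line, with coordinates increasing laterally. The function name, parameter names, and example values are hypothetical and are not taken from any of the studies reviewed.

```python
def migration_percentage(head_medial_x, head_lateral_x, perkins_x):
    """Return the migration percentage (%) from three landmark positions.

    All positions are assumed to be measured along an axis parallel to
    Hilgenreiner's line, in the same unit, increasing laterally.
    """
    head_width = head_lateral_x - head_medial_x
    if head_width <= 0:
        raise ValueError("lateral edge must lie lateral to the medial edge")
    # Width of the femoral head lying lateral to Perkins' line,
    # clamped so that the result stays between 0% and 100%.
    uncovered = min(max(head_lateral_x - perkins_x, 0.0), head_width)
    return 100.0 * uncovered / head_width


# Hypothetical example: a 30 mm wide femoral head with 9 mm lateral to Perkins' line.
mp = migration_percentage(head_medial_x=10.0, head_lateral_x=40.0, perkins_x=31.0)
print(f"Migration percentage: {mp:.1f}%")
```

Because the result depends entirely on where the acetabular landmark (and hence Perkins' line) is placed, small differences in landmark choice translate directly into differences in the percentage, which is why the landmark recommendations above matter.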

Acetabular dysplasia

The metrological properties of the acetabular index calculated from radiographs and measurements of acetabular morphology from 2D or 3D images have been evaluated for the assessment of acetabular dysplasia.

The acetabular index calculated from radiographs showed moderate concurrent validity with measurements taken from 3D images, good to excellent reliability and a threshold of true change of 2.6° when one measurer is involved and 5.9° when there are several measurers. As for the calculation of migration percentage, care must be taken to reduce measurement errors.[52, 53] Use of the lateral margin of the sourcil rather than the lateral margin of the acetabular roof has been recommended by Agus et al.,[54] although this has not been evaluated in children with CP. Using the mid-point of the ‘gothic arch’ when it is present may increase reliability.[39] Finally, extra care must also be taken for young children with small femoral heads as errors can be greater.[39]

For measurements taken using CT, the reliability was moderate to excellent for axial 2D CT scans and was excellent when 3D reformatted images were used.[29, 33] Equally, sensitivity to change was increased by the use of 3D images.[29, 33] It has been reported that the use of a 3D protocol avoids the errors that are related to the position of the pelvis, increases the accuracy of the calculation of acetabular anteversion, and avoids misinterpretation of some complex acetabular defects.[8, 55] Overall, these results suggest that 3D planes as defined by Chung et al.[29] or Gose et al.[34] are more metrologically sound and more clinically relevant than transaxial plane measurements, especially when surgical interventions are planned.

Neck-shaft and head-shaft angles

Current evidence suggests that the neck-shaft angle using anteroposterior X-rays is the method of reference for the measurement of femoral valgus deformity in children with CP. The head-shaft angle is more strongly correlated with migration percentage than the neck-shaft angle, and therefore appears to better reflect hip instability (correlation with migration percentage: r=0.26 for the neck-shaft angle and r=0.42 for the head-shaft angle).[32] The limits of agreement of the femoral neck-shaft angle are within 10° according to Chung et al.[30] Measuring the neck-shaft angle from a standard radiograph requires femoral anteversion to be taken into account, which requires the hips to be positioned in internal rotation (about 30°).[44, 56, 57] If more appropriate, 3D CT can be used with excellent reliability, although no limits or values of measurement errors have yet been provided.[30]

Femoral anteversion

This parameter is three-dimensional in nature and is therefore probably the most challenging measurement in children with CP. The method used by Chung et al.[30] using two slices (as recommended by Murphy et al.),[58] has shown the best in vivo evidence in children with CP to date, with an excellent correlation with the trochanteric-prominence-angle test and excellent reliability. However, this technique underestimated the angle obtained with the trochanteric-prominence-angle test by a mean of 4.8°. Another study in 52 children out of 59 using 2D CT measurements[59] (Weiners et al.,[60] Hernandez et al.[46] and Murphy et al.'s[58] methods) (not included in this review) also reported a mean underestimation of 29.6° in comparison with intra-operative measurement in one side of 12 of these 52 children. Furthermore, in the present review, data describing the levels of the slices and anatomical landmarks used to create the condylar axis are lacking, thus making it difficult to draw any recommendations.[58] In the literature, other techniques using 2D CT have been proposed, reflecting the difficulty of using axial scan slice(s) to define accurately the landmarks needed to determine the ‘femoral neck line’[46, 58, 60-63] (see Fig. 2) and the condylar line.[46, 58, 60]

Miller et al.[44] proposed ultrasound as an alternative method to measure anteversion, at least when the neck-shaft angle is over 150°. They showed moderate correlation with the 2D CT scans but did not report data regarding reliability. In dried femora and healthy adults,[64, 65] ultrasound has shown excellent reliability and concurrent validity with magnetic resonance imaging (MRI) for measuring femoral anteversion. Further metrological studies of 2D CT, ultrasound, and MRI are needed to evaluate femoral anteversion in children with CP.

The Robin and Graham classification

The aim of the Robin and Graham classification is to communicate the natural history of hip deformities in children with CP and to describe the outcome of interventions. This classification has been shown to have good concurrent validity with 3D CT in a population aged between 2 years and 7 years and excellent reliability in an adolescent population. Since it is quick and easy to apply, it has become a new reference tool in the field. The quantitative part of the classification is based on hip migration percentage; therefore, the limitations relating to migration percentage measurement also apply to the classification.


There are some limitations to the conclusions drawn in this review. First, there was a wide range in the quality of the articles included (Q-scores from 29 to 92). The different statistical methods of reporting the results for concurrent validity (r, r², κ, limits of agreement) and reliability (mean differences, SD, median absolute difference, ICC, SEM) as well as missing descriptions (i.e. the population studied) made any direct comparisons between studies difficult. Furthermore, a priori sample size calculations to ensure that each study was well powered were reported in only three studies.[30, 32, 33] Second, among the articles included, five were from the same team and three from another team. Since measurements are sensitive to local habits, this could introduce bias in the generalization of the results. Third, this review focused on concurrent validity and reliability, which are the two main aspects of metrological assessment studied in the literature. Literature regarding the other aspects of validity is scarce and needs further investigation. Fourth, the quality scale we developed has not been validated and the scores should, therefore, be interpreted with caution.

Improving metrological studies

Three-dimensional CT with reformatted images and 3D modelling, the trochanteric prominence test, or intraoperative anatomical measurements are considered as criterion standards[8, 30, 34, 59] (and see below). However, these methods also lack validation and one should be cautious when interpreting the results. The definition of a reliable and valid complete 3D model of the hip joint in children with CP is necessary. Another way to measure the proximal femur geometry is to study either femoral models that mimic CP hip geometry, or cadavers. These methods help with the validation process but should be followed by validation in vivo.[56, 66]

With regard to the statistical analysis, which is a key point of this type of study, no exhaustive recommendations are available. However, when studying the reliability of a measurement, reporting the limits of agreement, the SEM, or the minimal detectable difference would allow future users to judge whether an observed difference can be attributed to a true change or to an error of measurement.[28] The power of the results can also be improved by calculating, a priori, the minimal sample size according to the number of raters and the number of repetitions.[67, 68]
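To illustrate how these three quantities relate to one another, the following minimal sketch derives them from paired repeated measurements. It assumes one common set of conventions: 95% limits of agreement taken as the mean difference ± 1.96 times the SD of the differences, the SEM estimated as the SD of the differences divided by √2, and the minimal detectable difference at the 95% level taken as 1.96 × √2 × SEM. Other formulations exist (for instance, an SEM derived from the ICC), and the function name and data below are hypothetical rather than drawn from the studies reviewed.

```python
import math
import statistics


def agreement_stats(first, second):
    """Summarise agreement between two sets of paired measurements.

    Assumed conventions (one of several in use):
      limits of agreement = bias +/- 1.96 * SD of the differences
      SEM                 = SD of the differences / sqrt(2)
      MDD95               = 1.96 * sqrt(2) * SEM
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [b - a for a, b in zip(first, second)]
    bias = statistics.mean(diffs)        # systematic difference between sessions
    sd_diff = statistics.stdev(diffs)    # sample SD of the paired differences
    loa = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)
    sem = sd_diff / math.sqrt(2)
    mdd95 = 1.96 * math.sqrt(2) * sem
    return {"bias": bias, "loa": loa, "sem": sem, "mdd95": mdd95}


# Hypothetical migration percentages (%) measured twice on the same six hips.
session_1 = [22.0, 35.0, 41.0, 18.0, 50.0, 27.0]
session_2 = [24.0, 33.0, 44.0, 19.0, 47.0, 29.0]

stats = agreement_stats(session_1, session_2)
print(f"Bias: {stats['bias']:.2f}")
print(f"Limits of agreement: {stats['loa'][0]:.2f} to {stats['loa'][1]:.2f}")
print(f"SEM: {stats['sem']:.2f}  MDD95: {stats['mdd95']:.2f}")
```

Under these assumptions, a change observed between two examinations that is smaller than the minimal detectable difference cannot be distinguished from measurement error.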

Up and coming techniques

The studies included assessed anteroposterior X-rays, ultrasound, and 2D and 3D CT scans using either reformatted images or 3D bone models. More evidence is needed for ultrasound and for 2D and 3D CT scans, especially with regard to the assessment of femoral geometry.

Even though MRI has already been used in children with CP and other populations for the evaluation of proximal femoral geometry,[69-72] to date no studies have evaluated its metrological properties for acetabular dysplasia, femoral head migration, or geometry of the femur in the specific population of children with CP. In a population of children with mixed pathologies (four children with CP out of 17 children with other pathologies), femoral anteversion measured with MRI showed good concurrent validity with CT and ultrasound as well as excellent intra- and interrater reliability.[70] Although MRI is time-consuming for 3D modelling and often requires young children to be sedated, its validity and reliability, the fact that it is radiation-free, and the possibility of 3D bone and cartilage modelling suggest that it could be a highly useful tool.

EOS biplanar X-ray plus 3D modelling is a new method that involves a low radiation dose, allows images to be taken in weight bearing, and evaluates the entire lower limb in a single scan. Preliminary studies of concurrent validity with 3D CT and of reliability in cadavers for 3D acetabular coverage,[73] and the first results in a population of 12 children including six children with CP, show that this technique is promising for the assessment of the hip in this population.[74]

Finally, we would particularly like to emphasize the need for evidence for radiation-free methods. There is now clear evidence that repetitive exposure to radiographs may increase the risk of cancer in the paediatric population.[75, 76] Since these children frequently undergo orthopaedic follow-up for the duration of their lives, they may have an increased risk of cancer.


This review reports on the existing evidence behind the various image-based measurements of proximal hip geometry in children with CP. How to use such measurements, patient management strategies, and guidelines for hip surveillance were outside the scope of this review. Migration percentage, acetabular index, neck-shaft angle and the Robin and Graham classification all showed good to excellent concurrent validity and reliability, and are clinically relevant; these measurements would, therefore, seem to be the criterion-standard for initial diagnosis and hip surveillance. Other more complex methods, mostly for pre-surgical evaluations, such as 3D CT (reformatted planes or 3D modelling) can be used reliably for the assessment of acetabular dysplasia and neck-shaft angle but need further standardization and validation. Although there is some evidence for the use of CT scans in the assessment of femoral anteversion, this parameter remains a challenge and further developments and metrological evidence are required. Further studies are also needed to develop methods that involve little or no radiation, such as ultrasound and MRI, specifically for the population of children with CP.


We sincerely thank Johanna Robertson for her help in revising the English of the article and Laetitia Houx for her help in illustrating the measurements and in reviewing the article.