The aim of this systematic review was to assess the current validity and reliability of radiological methods used to measure proximal hip geometry in children with cerebral palsy.
The MEDLINE, CINAHL Plus, Embase, Web of Science, Academic Search Premier, The Cochrane Library, and PsycINFO databases were searched using relevant keywords, and inclusion/exclusion criteria were applied to the results.
The migration percentage using X-rays showed excellent reliability and concurrent validity with three-dimensional (3D) measurements from computed tomography (CT) scans. The acetabular index, measured using X-rays, had good reliability but moderate concurrent validity with 3D CT measurements; 3D CT scan indexes had greater reliability. The measurement of the neck-shaft angle using X-rays showed excellent concurrent validity with measurements from 3D CT scans and excellent reliability. Regarding femoral anteversion, one study found an excellent correlation between two-dimensional CT and clinical assessment and excellent reliability. Two others showed less evidence for the use of CT and ultrasound.
Most of the X-ray-based measurements showed good to excellent metrological properties. More metrological evidence is needed for the assessment of femoral anteversion. Magnetic resonance imaging and ultrasound-based measurements have great potential although very little metrological evidence is available.
Hip deformities occur in over one-third of children with cerebral palsy (CP) and are the second most common musculoskeletal deformity after equinus.[1-3] The femoral head frequently migrates relative to the acetabulum, which can lead to subluxation causing pain and functional limitations. Surveillance of hip deformity throughout growth is a challenge in this population.[5-7] Hip migration has been shown to be mostly associated with acetabular dysplasia, increased femoral neck-shaft angle, and increased femoral anteversion.[2, 8-11] These parameters are used in clinical decision-making regarding interventions[2, 10, 12] such as physical therapy, positioning, botulinum toxin injections, or surgical interventions such as adductor tenotomy and derotational or varising osteotomy.[13-15] Because clinical evaluation of proximal hip geometry is limited to the measurement of femoral anteversion and joint range of motion, radiologically based measurements are required to provide a detailed assessment and to ensure reliable follow-up throughout growth, before and after intervention, or in research studies. The metrological properties of such measurements have not been specifically reviewed with regard to the evaluation of hip deformities found in children with CP. Numerous techniques are described in the literature, in studies of varying quality, making analysis difficult.
The main aim of this paper was to review the evidence of the metrological properties of image-based measurement of proximal hip geometry in children with CP, including hip migration, acetabular dysplasia, femoral neck shaft angle, and femoral anteversion. More specifically we aimed to (1) collect, evaluate, and report the data in studies that assessed the concurrent validity and reliability of imaging methods; (2) report the threshold for ‘real change’ when available; and (3) propose future research.
Articles were identified through a comprehensive search of the following computerized bibliographic databases: MEDLINE (1949-07/2012), CINAHL Plus (1937-07/2012), Embase (1947-07/2012), Web of Science (1898-07/2012), Academic Search Premier (1975-07/2012), The Cochrane Library, and PsycINFO (1967-07/2012).
The search used the following Medical Subject Headings (MeSH) terms and text words, combining the keywords in order to achieve exhaustivity: (1) cerebral palsy; (2) (keywords relative to the hip or femur) ‘acetabular dysplasia’, ‘acetabular index’, ‘hip migration’, ‘hip subluxation’, ‘hip dysplasia’, ‘femoral anteversion’, ‘neck-shaft angle’, ‘coxa valga’; (3) (keywords relative to the imaging type) ‘radiography’, ‘X-ray’, ‘tomography’, ‘CT scan’, ‘ultrasonography’, ‘ultrasound’, ‘echography’, ‘MRI’, ‘biplanar radiography’, ‘biplanar X-ray’; and (4) (keywords relative to metrological properties) ‘measurement’, ‘measure’, ‘validity’, ‘reliability’, ‘repeatability’.
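The combination of the four keyword groups can be illustrated with a minimal sketch. The text states only that the keywords were combined; joining terms with OR inside a group and AND between groups is an assumption here, and the function name `build_query` is hypothetical. The exact syntax accepted by each database is not reproduced.

```python
# Keyword groups copied from the search description above.
GROUPS = [
    ["cerebral palsy"],
    ["acetabular dysplasia", "acetabular index", "hip migration",
     "hip subluxation", "hip dysplasia", "femoral anteversion",
     "neck-shaft angle", "coxa valga"],
    ["radiography", "X-ray", "tomography", "CT scan", "ultrasonography",
     "ultrasound", "echography", "MRI", "biplanar radiography",
     "biplanar X-ray"],
    ["measurement", "measure", "validity", "reliability", "repeatability"],
]


def build_query(groups):
    """Join terms with OR inside a group and AND between groups.

    The Boolean structure is an assumption, not the authors' stated query.
    """
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


QUERY = build_query(GROUPS)
print(QUERY)
```

Keeping the groups as data makes it straightforward to adapt the same keyword set to the field tags of each database.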
Two reviewers (BM and CP) independently assessed the papers by title and abstract with regard to the inclusion and exclusion criteria described below. Consensus for the inclusion/exclusion of relevant articles was reached by discussion. To be included, studies had to meet the following criteria: (1) original articles published in peer-reviewed journals, excluding conference proceedings; (2) studies including children or young adults below the age of 20 years with CP; (3) studies involving evaluation of imaging-based measurement of proximal hip geometry (acetabular dysplasia, hip migration, neck-shaft angle, or femoral torsion); and (4) studies reporting data regarding reliability and/or concurrent validity of imaging techniques.
The exclusion criteria were (1) studies that were not in English; (2) studies including CP and other pathologies for which it was not possible to extract the data of the children with CP; (3) studies published before 1980, which were reviewed and considered inappropriate because they involved radiological methods that are no longer used;[16, 17] and (4) studies showing concurrent validity between different measures of the same concept using the same material (e.g. migration percentage and centre-edge angle measuring hip migration on the same anteroposterior X-ray, as in Reimers), on the basis that studies comparing the same concept with different devices provided more evidence. Studies that analysed correlations between the different measures of proximal hip geometry without carrying out a metrological evaluation were also excluded (e.g. Abel et al.).
The references of the selected articles were also searched in order to complete the selection process. Data items were extracted using a standardized form (see Tables 1 and 3).
|Study||Type of study||Number of hips||Number of children||Mean Age/range/SD||Population description||Radiological device||Posture control Y=yes N=not reported||Measure||Comparison with another technique||Number of examiners||Examiners: qualifications and years of experience||Number of trials by sessions/number of sessions||Assessment||Statistical analysis|
|Cliffe et al.||P||40||20||30mo–10y||GMFCS level IV: 10 V: 10||Anteroposterior X-ray, two radiographs taken the same day||Pelvis and hips in neutral position||Migration percentage||–||2||Paediatric radiologists||Interpretation by two radiologists of the two radiographs for each children twice||Intra- and inter-reliability, assessment of the effect of positioning, inter-observer error and variations in observer technique over time||Mean, SD, Pearson's correlation coefficient, ICC|
|Faraj et al.||R||44||22||2–8y||–||Anteroposterior X-ray||Hip joints in neutral position, without ab/adduction||Migration percentage||–||2||Orthopaedic trainees||2/2||Intra- and inter-reliability within and between sessions||MAD, Kruskal–Wallis, Mann Whitney U|
|Kim et al.||R||152||100||7y 11mo (SD 1y 6mo)||–||Anteroposterior X-ray||Coxae in neutral position||Modified migration percentage||–||2||Rehabilitation doctors||1/3||Intra- and inter-reliability||ICC, SEM|
|Pountney et al.||R||20||–||–||–||Anteroposterior X-ray||Pelvis and hips in neutral position||Migration percentage (use of a drafting arm)||–||3||–||2/2||Intra- and inter-reliability||Bland and Altman (LA)|
|Chung et al.||R||54||27||7y 2mo (SD 2y 4mo)||Spastic quadri: 19 dipl: 6 mixed: 2||3D CT||N||3D visual assessment of acetabular dysplasia (anterior, posterior or global defect)||–||4||20: 5y/4y/trainee||1/2||Intra- and inter-reliability||Kappa, % of agreement (rate of people classifying the acetabulum in the same class)|
|Chung et al.||P||17||12||8y 1mo (5y 11mo – 13y 2mo)||12 spastic quadri GMFCS III: 1 IV: 8 V: 3||3D CT||N||Three directional indexes (anterosuperior, superolateral, posterosuperior)||–||3||20: 5y/chief resident||1/2||Intra- and inter-reliability||ICC, Mean difference (range)|
|Park et al.||P||22||16||8y 4mo (SD 2y 2mo)||Spastic quadri GMFCS III: 2 IV: 8 V: 6||3D CT||N||From reformatted axial plane: anterior/posterior acetabular indexes and acetabular anteversion||Three-directional indexes (anterosuperior, posterosuperior) for pre- and post-osteotomy||4||20y/6y/5y/research assistant||1/2||Intra- and inter-reliability, concurrent validity||Sample size, ICC, Pearson correlation coefficients, Paired t test and Wilcoxon signed rank test|
|Gose et al.||R||150 (102 for concurrent study)||75||5y 5mo (2y 8mo– 6y 11mo)||Spastic dipl: 60, quadri: 15 GMFCS II: 17 III: 34 IV: 16 V:8||Anteroposterior X-ray, 3D CT||N||Migration percentage, acetabular index||3D migration, lateral opening angle||–||–||–||Concurrent validity||Spearman rank correlation coefficient|
|Gose et al.||R||91/20 (for reliability study of CT angles)||91/20 (3DCT)||5y 2mo (2y 7mo– 6y 10mo)||Spastic dipl: 66, quadri: 25 GMFCS II: 9 III: 42 IV: 32 V: 8 Robin and Graham classification II: 4 III: 20 IV: 63 V: 4||Anteroposterior X-ray 3D CT||N||Robin and Graham classification||3D migration, lateral opening angle, sagittal inclination angle||2 (for reliability study of CT angles)||–||1/2||Concurrent validity Intra- and inter-reliability (CT angles)||Kruskal–Wallis, ICC, RMSE|
|Lee et al.||R||51 out of 384||51 out of 384||9y 1mo (3y – 17y)||GMFCS I: 146 II: 109 III: 69 IV: 42 V: 18 307 dipl and quadr, 77 hemi||Anteroposterior X-ray||Supine position, hips in internal rotation||Migration percentage||–||2||8y/3y orthopaedic surgeons||1/1||Interrater reliability||Calculate sample size for reliability, ICC|
|Parrott et al.||R||20||20||32mo (11mo– 8y 5mo)||–||Anteroposterior X-ray||Pelvis and hips in neutral position||Migration percentage, acetabular index||–||5||1–3y research physiotherapists||1/2||Intra- and inter-reliability within and between sessions||Spearman, Wilcoxon, ANOVA, ICC, SEM|
|Segev et al.||R||20||10||–||–||Anteroposterior X-ray||N||Acetabular index, centre-edge angle, migration percentage||–||5||Senior orthopaedic surgeons||1/3||Intra- and inter-reliability||ANOVA: variances and standard deviations, ICC, paired t test|
|Murnaghan et al.||R||42||42||14–19y||–||Anteroposterior X-ray||Pelvis and hips in neutral position||Robin and Graham Classification||–||4 surgeons, 4 physiotherapists||2 residents/2 surgeons: 4–10y/physiotherapists 16–26y||1/2||Intra- and inter-reliability, concurrent validity||ICC, asymptotic symmetry test|
|Robin et al.||R||268||134||16y 4mo (14y –19y 1mo)||GMFCS level I: 29 II: 25 III: 27 IV: 24 V: 29||Anteroposterior X-ray||N||Robin and Graham classification||Migration percentage||2||–||1/1||Concurrent validity||Agreement percentages|
Since no standardized quality assessment tools are available for the evaluation of articles in this field, a customized quality assessment scale was developed based on the literature. The aim of this scale was to provide both an assessment of the intrinsic quality of each article (maximum score 28) and an assessment of the metrological evidence supporting the method evaluated (maximum score 10). The total score was named the Q-score and is out of 100 (Table SI, online supporting information). The first part of the scale was based on previously published quality checklists for systematic reviews or scales assessing the quality of studies included in systematic reviews. These scales included questions focusing on the study design and quality of the reporting of the methodology and results.[19-22] The metrological part of the scale was based on evidence-based medicine in radiology and studies providing examples of scales to evaluate metrological articles.[24, 25] The metrological score reflects the amount of metrological evidence brought by an article for the method considered. The rating of the quality assessment was carried out by two observers (CP and SB) independently and disagreements were then resolved by consensus.
In this review, an intraclass correlation coefficient (ICC) between 0 and 0.20 was considered low, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good, and 0.81–1 excellent. The r or kappa coefficient was considered fair at 0.21–0.40, moderate at 0.41–0.60, good at 0.61–0.80, and excellent at 0.81–1.[26-28] Although we acknowledge the limits of using such simple definitions, we decided to use this system for clarity and to enable comparisons between studies.[27, 28]
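The bands above can be written as a small lookup, shown here as a minimal sketch. The function name `classify_coefficient` is hypothetical. Two choices are assumptions rather than part of the review: the 'low' band, which the text defines for the ICC only, is also applied to r and kappa, and negative correlations are classified by their magnitude.

```python
def classify_coefficient(value):
    """Map an ICC, r or kappa value to the descriptive band used in the review.

    Assumptions: negative correlations are classified by magnitude, and the
    'low' band (stated for the ICC) is reused for r and kappa.
    """
    magnitude = round(abs(value), 2)  # bands are given to two decimals
    if magnitude > 1.0:
        raise ValueError("coefficient must lie between -1 and 1")
    if magnitude <= 0.20:
        return "low"
    if magnitude <= 0.40:
        return "fair"
    if magnitude <= 0.60:
        return "moderate"
    if magnitude <= 0.80:
        return "good"
    return "excellent"


for coefficient in (0.85, 0.58, -0.48):
    print(coefficient, classify_coefficient(coefficient))
```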
Because it was impossible to combine the results owing to the different statistical methods used, measurement errors estimating a statistical value for true change are reported as they were reported in each article.
References and explanations of the measurements taken from images can be found in Table SII (online supporting information). Figures 1 and 2 provide a visual representation of the main measurements described below.
After duplicates were removed, 962 articles were automatically extracted from the databases and 19 articles fitted the inclusion and exclusion criteria: 13 from the PubMed database, three from other databases, and three found during the reference check. The methodology of each study is summarized in Tables 1 and 2 and the results of each study in Tables 3 and 4. Five articles came from the same team[29-33] and three came from another team.[10, 34, 35]
|Study||Type of study R=retro P=pro||Number of hips||Number of children||Mean age (y)/range/SD||Population description||Radiological device||Posture control (NR=not reported)||Measure||Comparison with another technique||Number of examiners||Qualifications and years of experience||Number trials by sessions/number sessions||Assessment||Statistical analysis|
|Neck shaft angle, head shaft angle|
|Foroohar et al.||R||10 for reliability (out of 70)||39||Group 1: 8y 1mo Group 2: 7y 8mo Group 3: 7y||Group 1: 15 patients with spastic CP; Group 2: 10 patients with spastic CP, need surgical intervention; Group 3: control||Anteroposterior X-ray||Radiographs in external rotation excluded||X-ray head shaft angle||–||3||Orthopaedic residents||1/3||Intra- and interrater reliability||ICC|
|Gose et al.||R||186||93||5y 4mo (2y 7mo – 6y 10mo)||73 dipl, 20 quadri GMFCS level II: 20 III: 41 IV: 22 V: 10||3D CT X-ray||NR||3D CT neck shaft angle||X-ray neck shaft angle||–||–||–||Concurrent validity||Spearman rank correlation test|
|Lee et al.||R||384, 51 (out of 384) for the reliability session||384, 51 (out of 384) for the reliability session||9y 1mo (3y – 17y)||GMFCS: I: 146 II: 109 III: 69 IV: 42 V: 18 307 dipl and quadr, 77 hemi||Anteroposterior X-ray||Supine, hips in internal rotation||X-ray neck-shaft angle and X-ray head-shaft angle||–||2 for reliability||Orthopaedic surgeons 8/3y (reliability)||1/1||Interrater reliability||Calculate sample size for reliability, ICC for continuous data and kappa for categorical data; Pearson correlation coefficient|
|Mahboubi et al.||R||18 (out of 30)||18 (out of 30)||3 – 15y||Patients with coxa valga, NSA>140°||CT||NR||CT femoral anteversion, superimposition of two proximal slices||CT femoral anteversion, (one proximal slice)||–||–||–||Concurrent validity||–|
|Miller et al.||R||80||40||7y 7mo (3y–16y)||–||Ultrasound CT||Ultrasound: supine, legs flexed over the end of the table and parallel to each other CT: NR||Head-neck ultrasound or flat-surface ultrasound||CT femoral anteversion CT flat-surface method||–||–||–||Concurrent validity||Correlation coefficient|
|Femoral anteversion and neck shaft angle|
|Chung et al.||P||36||36||11y (5y – 20y)||GMFCS level I: 5 II: 11 III: 11 IV: 7 V: 2 6 hemi, 25 dipl, 5 quadri||Anteroposterior X-ray||Supine, hips in internal rotation||X-ray neck-shaft angle||CT reformatted slices neck shaft angle||3||–||1/2||Concurrent validity, intra- and interrater reliability||Sample size, ICC, Pearson rank correlation coefficient|
|CT||NR||CT reformatted slices neck-shaft angle||–||2||–||1/2||Intra- and interrater reliability||ICC, Pearson rank correlation coefficient|
|CT||NR||CT femoral anteversion||Trochanteric prominence angle test, maximal hip internal and external rotation||2||8/7y||1/2||Concurrent validity, intra- and interrater reliability|
|Study||Radiological device||Measure||Concurrent validity||Intrarater reliability||Interrater reliability||Comments|
|Cliffe et al.||Anteroposterior X-ray two radiographs taken the same day||Migration percentage on two radiographs for one hip||–||Two radiographs of the same child ICC=0.97 mean diff=3.2% the same day, 3.3% a later day correlation between migration percentage (2 radiographs): R=0.94 (p<0.001) mean+2SD=10%||Two radiographs of the same child ICC=0.96 mean diff=3.7% (SD=3.8%) mean+2SD=11%||Assessment of two different radiographs for each child|
|Faraj et al.||Anteroposterior X-ray||Migration percentage||–||MAD intra-session=3% (95th centile 12.7%); MAD inter-session=1.7–3.2% (95th centile 12.6–12.9%)||MAD=2.8% (95th centile 22%)||–|
|Gose et al.||Anteroposterior X-ray 3D CT||Migration percentage||Correlation between 3D migration and migration percentage: r=0.85 (p<0.001)||–||–||–|
|Kim et al.||Anteroposterior X-ray||Migration percentage, modified migration percentage||–||Migration percentage ICC range=0.94–0.97 SEM range=6.0–11.9%; Modified migration percentage ICC range=0.90–0.97 SEM range=6.2–17.1%||Migration percentage ICC=0.95 SEM=9.1%; Modified migration percentage ICC=0.87 SEM=15.9%||Higher reliability in the group without dysplasia|
|Lee et al.||Anteroposterior X-ray||Migration percentage||–||–||ICC=0.93||–|
|Parrott et al.||Anteroposterior X-ray||Migration percentage||–||ICC range=0.95–0.97 SEM for measurement change range=2.82–4.22% mean diff range=–1.94–2.78% SD range=1.59–5.75% 1.96×SEM=5.8%||ICC range=0.91–0.93 SEM for measurement change range=4.33–5.86% SD range=1.28–6.45% 1.96×SEM=11.5%||Influence of the gothic arch on the measurement error|
|Pountney et al.||Anteroposterior X-ray||Migration percentage||–||LA range=3.3–7.6% MD=0.1%||LA range=5–8.1% MD=0.1–1.6%||Six hips excluded from the inter-measurer analysis of the LA|
|Segev et al.||Anteroposterior X-ray||Migration percentage||–||SD range=3.31–7.91%||ICC=0.83–0.92 SD=1.03–1.04%||–|
|Centre-edge angle||–||SD range=4.81–5.04°||ICC=0.60–0.74 SD=2.92–3.08°||–|
|Chung et al.||3D CT||3D visual assessment of acetabular dysplasia||–||Kappa range=0.50–0.81; % of agreement: 74–89||Kappa=0.61; % of agreement: 79||–|
|Chung et al.||3D CT, reformatted slices||Three directional acetabular indexes||–||ICC range=0.88–0.98 mean diff=2.1–4.1°||ICC range=0.85–0.93||–|
|Gose et al.||Anteroposterior X-ray 3D CT||Acetabular index||Correlation between lateral opening angle and acetabular index: r=0.58 (p<0.001)||–||–||–|
|Gose et al.||3D CT||Lateral opening angle, sagittal inclination angle||–||ICC=0.93 RMSE=1.51°||ICC=0.91 RMSE=1.69°||–|
|Park et al.||3D CT, reformatted slices||From transaxial plane: anterior/posterior acetabular index, acetabular anteversion, three directional indexes||Pre-osteotomy: anterior acetabular index vs anterosuperior index: r=0.45 (p=0.04); posterior acetabular index vs posterosuperior index: r=0.49 (p=0.02). Post-osteotomy: anterior acetabular index vs anterosuperior index: r=0.21 (p=0.36); posterior acetabular index vs posterosuperior index: r=0.14 (p=0.53)||ICC range: anterior acetabular index=0.55–0.94 posterior acetabular index=0.85–0.96 acetabular anteversion=0.90–0.98; Mean diff range: anterior acetabular index=2.1–5.8° posterior acetabular index=1.4–3.1° acetabular anteversion=0.7–1.8°||ICC range: anterior acetabular index=0.79–0.83 posterior acetabular index=0.70–0.73 acetabular anteversion=0.92–0.95||–|
|Parrott et al.||Anteroposterior X-ray||Acetabular index||–||ICC range: 0.91–0.92 SEM for measurement change range: 1.82–1.91° mean diff: –0.85–0.10 SD range: –2.62–2.80° 1.96×SEM=2.6°||ICC range: 0.80–0.81 SEM for measurement change range: 2.69–3.02° SD range: 0.82–3.53° 1.96×SEM=5.9°||Influence of the gothic arch on the measurement error|
|Segev et al.||Anteroposterior X-ray||Acetabular index||–||SD=2.96–3.18°||ICC range=0.69–0.74 SD range=0.27–1.12°||–|
|Gose et al.||Anteroposterior X-ray three-dimensional CT||Robin and Graham classification, lateral opening angle, sagittal inclination angle, 3D migration||Significant correlation between grades of classification and lateral opening angle p<0.001 3D migration p<0.001 not with sagittal inclination angle p=0.82||–||–||–|
|Murnaghan et al.||Anteroposterior X-ray||Robin and Graham classification||Agreement with the established standards ICC range=0.74–0.96||ICC range=0.88–0.94||ICC range=0.84–0.92||–|
|Robin et al.||Anteroposterior X-ray||Robin and Graham classification||Percentage agreement of qualitative indices range=90.3–98.1%||–||–||Agreement between estimated and true hip grade: kappa coefficient=0.96|
|Study||Radiological device||Measure||Intrarater reliability||Interrater reliability||Concurrent validity||Comments|
|Neck shaft angle, head shaft angle|
|Foroohar et al.||Anteroposterior X-ray||X-ray head-shaft angle||ICC=0.68 (1 rater)||ICC=0.94 (3 raters)||–||–|
|Chung et al.||Anteroposterior X-ray||X-ray neck-shaft angle CT reformatted slices neck-shaft angle||ICC=0.93–0.97 mean diff=4.0 SD 3.4° 90% of the measurements within 10°||ICC=0.87–0.94||Correlation between X-ray and CT reformatted slices neck-shaft angle: r=0.89 (p<0.001)||–|
|3D CT, reformatted slices||CT reformatted slices neck-shaft angle||ICC=0.96–0.97||ICC=0.96–0.96||–||–|
|Gose et al.||3D CT, anteroposterior X-ray||3D CT neck-shaft angle; X-ray neck-shaft angle||–||–||Correlation between 3D CT and X-ray neck-shaft angle r=0.74 (p<0.001)||X-ray neck-shaft angle significantly larger than the 3D CT neck-shaft angle (p<0.001)|
|Lee et al.||Anteroposterior X-ray||X-ray neck-shaft angle; X-ray head-shaft angle; shape of the proximal femoral epiphysis||–||X-ray neck-shaft angle: ICC=0.98; X-ray head-shaft angle: ICC=0.79; shape of the proximal femoral epiphysis: gamma=0.97, kappa=0.66||–||Correlation between X-ray neck-shaft angle and head-shaft angle: r=0.72 (p<0.001); X-ray neck-shaft angle and migration percentage: r=0.42 (p<0.001); X-ray head-shaft angle and migration percentage: r=0.26 (p<0.001)|
|Mahboubi et al.||CT||CT femoral anteversion||–||–||Comparison between the old and the new technique: r=0.90.||Mean for the new technique 29.4° and for the old 25.4°. Difference more than 10° in some patients|
|Miller et al.||CT, ultrasound||CT femoral anteversion; CT flat-surface method; ultrasound femoral anteversion; flat-surface ultrasound||–||–||Correlation between: head-neck ultrasound and CT femoral anteversion=0.51; head-neck ultrasound and CT flat-surface method=0.54; head-neck ultrasound and flat-surface ultrasound=0.72; flat-surface ultrasound and CT femoral anteversion=0.60; flat-surface ultrasound and CT flat-surface method=0.67||Regarding CT femoral anteversion, 21 hips could not be measured by one proximal slice; eight hips could not be measured by two proximal slices because of high neck-shaft angle|
|Chung et al.||CT||CT femoral anteversion||ICC=0.98–0.995||ICC=0.98–0.98||Correlation between CT measurement and trochanteric prominence angle test r=0.86 (p<0.001) hip internal rotation r=0.79 (p<0.001) hip external rotation r=–0.48 (p<0.001)||–|
The quality rating scale is presented in Table SI. The principal aim of 13 out of the 19 studies was metrological. The mean Q-score was 65/100 (SD 14). Two articles had scores greater than 80, five articles had scores between 70 and 80, seven articles had scores between 60 and 70 and five articles had scores lower than 60/100.
The hip migration percentage as first proposed by Reimers in 1980 was evaluated in eight studies (mean Q-score 67).[32, 35-41] Except for the Gose et al. study (Q-score 55), the studies evaluated only the use of anteroposterior X-rays for the assessment of femoral head migration. Gose et al. found excellent concurrent validity between migration percentage and the degree of migration in the frontal plane using a three-dimensional (3D) model of migration based on 3D computed tomography (CT). The seven other studies examined the reliability of migration percentage, all showing that intra- and interrater reliability was good to excellent. Depending on the study, the values for standard error of measurement (SEM) × 1.96, mean + 2SD, and the 95th centile were reported to be 5.8%, 10%, and 12.9% respectively for one measurer, and 11.5%, 11%, and 22% respectively for different measurers[36, 37, 39] (Q-scores 74, 74, 71). Using the lateral edge of the acetabular roof was more reliable than using the ‘sourcil’ of the acetabulum, as described by Kim et al.
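For readers unfamiliar with the index, the arithmetic behind the migration percentage can be sketched as follows. The definitions used in this review are those of Table SII, which is not reproduced here; the sketch follows the usual description of Reimers' index, namely the share of the femoral head width lying lateral to Perkins' line. The function name, the coordinate convention, and the clamping to 0–100 are assumptions made for illustration.

```python
def migration_percentage(head_medial_x, head_lateral_x, perkins_x):
    """Percentage of the femoral head width lying lateral to Perkins' line.

    All three arguments are hypothetical x-coordinates (same unit, e.g. mm)
    measured along Hilgenreiner's line and increasing laterally.
    """
    head_width = head_lateral_x - head_medial_x
    if head_width <= 0:
        raise ValueError("head_lateral_x must be lateral to head_medial_x")
    uncovered = head_lateral_x - perkins_x
    # Assumed clamping: a fully covered head gives 0, a fully uncovered one 100.
    uncovered = min(max(uncovered, 0.0), head_width)
    return 100.0 * uncovered / head_width


# Illustrative values only, not patient data.
print(migration_percentage(10.0, 50.0, 40.0))  # 25.0
```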
The acetabular index, as proposed by Hilgenreiner, using anteroposterior X-rays, was evaluated in three studies (Q-scores 74, 61, 55, mean 63).[35, 39, 41] The acetabular index showed moderate concurrent validity with measures carried out using 3D CT scans. The intrarater reliability of the acetabular index was found to be excellent and the interrater reliability good in two studies.[39, 41] The SEM × 1.96 value was reported to be 2.6° for a single assessor and 5.9° for different assessors.
Morphological axial two-dimensional (2D) measurements of the acetabulum, including the anterior and posterior acetabular indexes and acetabular anteversion, showed moderate concurrent validity with 3D measures using reformatted slices from a 3D CT scan performed before osteotomy (r=0.45–0.49, p<0.05) and poor concurrent validity after osteotomy (r=0.14–0.21, p>0.05). Intra- and interrater reliability of the different 2D measures was moderate to excellent (mean differences between 0.7° and 5.8°, Q-score 92).
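The SEM × 1.96 values quoted above are taken directly from the included studies. As a minimal sketch of how such a threshold can be derived, one common formulation computes the SEM from the between-subject SD and the ICC. Whether the included studies used exactly this formula is an assumption, the function names are hypothetical, and the input values are purely illustrative.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from SD and ICC: SD * sqrt(1 - ICC).

    This is one common formulation; the included studies may have used another.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def threshold_95(sem):
    """SEM x 1.96, the form of threshold reported in this review."""
    return 1.96 * sem


# Illustrative values only: SD of 10 degrees and ICC of 0.96.
sem = sem_from_icc(10.0, 0.96)
print(round(sem, 2), round(threshold_95(sem), 2))
```

Some authors multiply by an additional factor of √2 when the threshold is applied to the difference between two measurements, which gives a more conservative value.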
Three-dimensional CT was used to evaluate different measurements in four studies.[29, 31, 34, 35] Chung et al. visually assessed acetabular defects using 3D images and found moderate to good intra- and interrater reliability (Q-score 63). In a later study, the same team evaluated three directional indices of acetabular geometry calculated from reformatted slices from 3D CT scans and found excellent intra- and interrater reliability (Q-score 68). In two different studies, Gose et al.[34, 35] showed that the intra- and interrater reliability of using a best-fit plane of the active surface of the ilium to define the orientation of the acetabulum was excellent, and there was moderate concurrent validity with plain radiography (Q-scores 55, 74).
The neck-shaft angle was evaluated in three studies[10, 30, 32] (Q-scores 55, 68, 89; mean 71) and the head-shaft angle in two studies[32, 43] (Q-scores 53, 68; mean 61). The neck-shaft angle measured using anteroposterior X-ray showed an excellent correlation with the neck-shaft angle measured using 3D CT reformatted slices and a good correlation with the 3D models issued from segmentation. The intra- and interrater reliability of X-rays for the measurement of neck-shaft angle was excellent;[30, 32] the intrarater reliability for head-shaft angle was good and the interrater reliability was good to excellent.[32, 43] Chung et al. showed that 90% of the neck-shaft angle measurements made on X-rays by the same rater were within 10° (Q-score 89). The same authors reported the neck-shaft angle measured using 3D CT scans (reformatted slices) to be a highly reliable method.
Three studies evaluated in vivo measures of femoral anteversion in children with CP[30, 44, 45] (mean Q-score 53). Chung et al. (Q-score 89) found an excellent correlation between measurements from 2D CT scans (a method using two slices through the femoral neck) and the trochanteric prominence angle test, even though the latter overestimated the CT scan measurement by a mean of 4.8° (SD 8.4). They reported excellent intra- and interrater reliability of the 2D CT scan measurement.
Miller et al. (Q-score 42) evaluated the concurrent validity of two 2D CT scan methods (using measurements taken from a single slice or two laminated sections on the neck and flat surface) and two measurements using ultrasound (head-neck and flat-surface measure). The correlation of the two CT scan methods with the ultrasound-based femoral head-neck measure was moderate (r=0.51–0.54), whereas their correlation with the ultrasound-based flat-surface measure was moderate to good (r=0.60–0.67).
Mahboubi et al. (Q-score 29) evaluated the concurrent validity of two methods based on two different CT scan slices: the first involved superimposition of a slice from the capital femoral epiphysis on a slice through the femoral neck and the second was Hernandez's method using one slice. The correlation between the two methods was excellent even though the authors reported differences of more than 10° between the two measurements in some patients.
Robin et al. developed a six-grade radiological classification for children with CP at bone maturity based on qualitative indices and the measurement of the migration percentage. This classification was assessed in three articles[34, 47, 48] (Q-scores 63, 71, 74, mean 68). It was originally applied to children with CP at skeletal maturity (closure of triradiate cartilage). The intra- and inter-observer reliability were excellent in this ‘matured population’ (Q-score 71). The agreement between ‘estimated’ hip grade and true hip grade based on migration percentage was good to excellent. The percentage of agreement between the four qualitative morphological features of the classification and the hip grading based on migration percentage was excellent (Q-score 63). More recently, a good correlation between the grades of classification and measurements from 3D CT scans was also found in a population of children aged between 2 years and 7 years.
The aim of this systematic review was to evaluate existing metrological evidence for image-based measurements of hip geometry in children with CP. The metrological evidence for the use of the four main parameters defining hip geometry in this population varied depending on the parameter and the imaging and measurement techniques used. After discussion of the evidence for each parameter, the main limitations of this review are discussed and perspectives are highlighted.
Of all the parameters assessed, the metrological properties of the migration percentage have been the most studied. This parameter was shown to have excellent concurrent validity and good to excellent intra- and interrater reliability. The reliability of migration percentage measurement tends to increase with the size of the femoral head and age. A threshold for true change has been defined between 5.8% and 11.5% depending on the article;[36, 39] however, this threshold is greatly increased if the measurer is inexperienced.
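The practical use of such a threshold can be illustrated with a minimal sketch. It takes the 1.96 × SEM values reported by Parrott et al. (5.8% for one measurer, 11.5% for different measurers) as the bounds; treating these two figures as fixed decision thresholds is a simplification, and the names below are hypothetical.

```python
# 1.96 x SEM values reported for migration percentage (Parrott et al.).
THRESHOLDS = {"same_rater": 5.8, "different_raters": 11.5}


def exceeds_measurement_error(mp_before, mp_after, same_rater=True):
    """Return True if the change in migration percentage (in percentage
    points) is larger than the reported measurement-error threshold."""
    key = "same_rater" if same_rater else "different_raters"
    return abs(mp_after - mp_before) > THRESHOLDS[key]


# Illustrative values only: a change from 20% to 27%.
print(exceeds_measurement_error(20.0, 27.0, same_rater=True))   # True
print(exceeds_measurement_error(20.0, 27.0, same_rater=False))  # False
```

The same 7-point change is therefore interpretable as a true change only when both radiographs were measured by the same rater, which illustrates why the rater should be recorded alongside each measurement.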
The results of this review show that, to date, there is a lack of studies that have evaluated the use of 3D images to assess hip migration in children with CP. In the light of the current literature, migration percentage appears to be the most valid and reliable technique for the surveillance of hip migration in children with CP. Some suggestions have been made in the literature to improve reliability and to reduce errors: (1) reliability may be increased by providing more specific anatomical landmarks for measurements of the acetabulum, such as the lateral margin of the acetabular roof or the midpoint of the ‘gothic arch’, which frequently occurs in the case of acetabular dysplasia; (2) careful positioning of the patient may also limit measurement errors and increase reliability.[18, 49] Radiographs should be carried out as far as possible with the pelvis flat, horizontal, and with neutral abduction/adduction of the legs.[50, 51]
The metrological properties of the acetabular index calculated from radiographs and measurements of acetabular morphology from 2D or 3D images have been evaluated for the assessment of acetabular dysplasia.
The acetabular index calculated from radiographs showed moderate concurrent validity with measurements taken from 3D images, good to excellent reliability and a threshold of true change of 2.6° when one measurer is involved and 5.9° when there are several measurers. As for the calculation of migration percentage, care must be taken to reduce measurement errors.[52, 53] Use of the lateral margin of the sourcil rather than the lateral margin of the acetabular roof has been recommended by Agus et al., although this has not been evaluated in children with CP. Using the mid-point of the ‘gothic arch’ when it is present may increase reliability. Finally, extra care must also be taken for young children with small femoral heads as errors can be greater.
For measurements taken using CT, the reliability was moderate to excellent for axial 2D CT scans and was excellent when 3D reformatted images were used.[29, 33] Equally, sensitivity to change was increased by the use of 3D images.[29, 33] It has been reported that the use of a 3D protocol avoids the errors that are related to the position of the pelvis, increases the accuracy of the calculation of acetabular anteversion, and avoids misinterpretation of some complex acetabular defects.[8, 55] Overall, these results suggest that 3D planes as defined by Chung et al. or Gose et al. are more metrologically sound and more clinically relevant than transaxial plane measurements, especially when surgical interventions are planned.
Current evidence suggests that the neck-shaft angle measured on anteroposterior X-rays is the method of reference for the measurement of femoral valgus deformity in children with CP. However, the head-shaft angle is more strongly correlated with migration percentage than the neck-shaft angle, the former appearing to reflect hip instability more (correlation between head-shaft angle and migration percentage r=0.42 and between neck-shaft angle and migration percentage r=0.26). The limits of agreement of the femoral neck-shaft angle are within 10° according to Chung et al. Measurement of the neck-shaft angle from a standard radiograph requires femoral anteversion to be taken into account, which requires the hips to be positioned in internal rotation (about 30°).[44, 56, 57] If more appropriate, 3D CT can be used with excellent reliability, although no limits of agreement or values of measurement error have yet been provided.
Femoral anteversion is three-dimensional in nature and is therefore probably the most challenging parameter to measure in children with CP. The method used by Chung et al., using two slices (as recommended by Murphy et al.), has shown the best in vivo evidence in children with CP to date, with an excellent correlation with the trochanteric prominence angle test and excellent reliability. However, this technique underestimated the angle obtained with the trochanteric prominence angle test by a mean of 4.8°. Another study (not included in this review) of 52 children out of 59, using 2D CT measurements (the methods of Weiners et al., Hernandez et al., and Murphy et al.), also reported a mean underestimation of 29.6° in comparison with intraoperative measurement on one side in 12 of these 52 children. Furthermore, in the studies included in the present review, data describing the levels of the slices and the anatomical landmarks used to create the condylar axis are lacking, making it difficult to draw any recommendations. In the literature, other techniques using 2D CT have been proposed, reflecting the difficulty of using axial scan slice(s) to define accurately the landmarks needed to determine the ‘femoral neck line’[46, 58, 60-63] (see Fig. 2) and the condylar line.[46, 58, 60]
Miller et al. proposed ultrasound as an alternative method to measure anteversion, at least when the neck-shaft angle is over 150°. They showed moderate correlation with 2D CT scans but did not report data regarding reliability. In dried femora and healthy adults,[64, 65] ultrasound has shown excellent reliability and concurrent validity with magnetic resonance imaging (MRI) for measuring femoral anteversion. Further metrological studies of 2D CT, ultrasound, and MRI are needed to evaluate femoral anteversion in children with CP.
The aim of the Robin and Graham classification is to communicate the natural history of hip deformities in children with CP and to describe the outcome of interventions. This classification has been shown to have good concurrent validity with 3D CT in a population aged between 2 years and 7 years and excellent reliability in an adolescent population. Since it is quick and easy to apply, it has become a new reference tool in the field. The quantitative part of the classification is based on hip migration percentage; therefore, the limitations relating to migration percentage measurement also apply to the classification.
There are some limitations to the conclusions drawn in this review. First, there was a wide range in the quality of the articles included (Q-scores ranging from 29 to 92). The different statistical methods of reporting the results for concurrent validity (r, r², K, limits of agreement) and reliability (mean differences, SD, median absolute difference, ICC, SEM), as well as missing descriptions (e.g. of the population studied), made any direct comparisons between studies difficult. Furthermore, a priori sample size calculations to ensure that each study was well powered were reported in only three studies.[30, 32, 33] Second, among the articles included, five were from the same team and three from another team. Since measurements are sensitive to local habits, this could introduce bias in the generalization of the results. Third, this review focused on concurrent validity and reliability, which are the two main aspects of metrological assessment studied in the literature. Literature regarding the other aspects of validity is scarce and needs further investigation. Fourth, the quality scale we developed has not been validated and the scores should, therefore, be interpreted with caution.
Three-dimensional CT with reformatted images and 3D modelling, the trochanteric prominence test, or intraoperative anatomical measurements are considered as criterion standards[8, 30, 34, 59] (and see below). However, these methods also lack validation and one should be cautious when interpreting the results. The definition of a reliable and valid complete 3D model of the hip joint in children with CP is necessary. Another way to measure the proximal femur geometry is to study either femoral models that mimic CP hip geometry, or cadavers. These methods help with the validation process but should be followed by validation in vivo.[56, 66]
With regard to the statistical analysis, which is a key point in this type of study, no exhaustive recommendations are available. However, when studying the reliability of a measurement, reporting the limits of agreement, the SEM, or the minimal detectable difference would allow future users to judge whether an observed difference is more likely to reflect a true change or an error of measurement. The power of the results can also be improved by calculating, a priori, the minimal sample size according to the number of raters and the number of repetitions.[67, 68]
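To make these reliability statistics concrete, the following minimal sketch shows how they are commonly computed. It assumes the standard formulas: SEM = SD × √(1 − ICC), minimal detectable difference at the 95% level = 1.96 × √2 × SEM, and Bland-Altman limits of agreement = mean difference ± 1.96 × SD of the paired differences. The function names and the sample values are hypothetical and are not taken from the studies reviewed.

```python
import math
from statistics import mean, stdev


def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and the ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_difference(sem, z=1.96):
    """Smallest difference between two measurements unlikely to be due to
    measurement error alone (95% level by default)."""
    return z * math.sqrt(2.0) * sem


def limits_of_agreement(first, second, z=1.96):
    """Bland-Altman limits of agreement for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two paired series of at least two values")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = z * stdev(diffs)  # sample SD of the paired differences
    return bias - spread, bias + spread


if __name__ == "__main__":
    # Hypothetical values, for illustration only
    sem = sem_from_icc(sd=10.0, icc=0.96)
    mdd = minimal_detectable_difference(sem)
    print(f"SEM = {sem:.2f}, minimal detectable difference = {mdd:.2f}")
    rater_a = [22.0, 35.0, 41.0, 18.0, 50.0]
    rater_b = [24.0, 33.0, 44.0, 17.0, 49.0]
    low, high = limits_of_agreement(rater_a, rater_b)
    print(f"Limits of agreement: {low:.2f} to {high:.2f}")
```

Reporting any of these quantities in the units of the measurement itself (percentage points or degrees) lets a reader compare an observed change directly against the expected measurement error, which a correlation coefficient or ICC alone does not allow.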
The studies included assessed anteroposterior X-rays, ultrasound, and 2D and 3D CT scans using either reformatted images or 3D bone models. More evidence is needed for ultrasound and for 2D and 3D CT scans, especially with regard to the assessment of femoral geometry.
Even though MRI has already been used in children with CP and other populations for the evaluation of proximal femoral geometry,[69-72] to date there are no studies evaluating its metrological properties for acetabular dysplasia, femoral head migration, or the geometry of the femur in the specific population of children with CP. In a population of children with mixed pathologies (four children with CP out of 17 children with other pathologies), femoral anteversion measured with MRI showed good concurrent validity with CT and ultrasound as well as excellent intra- and interrater reliability. Although MRI is time-consuming for 3D modelling and often requires young children to be sedated, its validity and reliability, the fact that it is radiation-free, and the possibility of 3D bone and cartilage modelling suggest that it could be a highly useful tool.
EOS biplanar X-ray imaging with 3D modelling is a new method that involves a small radiation dose but allows images to be taken in a weight-bearing position and evaluates the entire lower limb in a single scan. Preliminary studies of concurrent validity with 3D CT and of reliability in cadavers for 3D acetabular coverage, and the first results in a population of 12 children including six children with CP, show that this technique is promising for the assessment of the hip in this population.
Finally, we would particularly like to emphasize the need for evidence for radiation-free methods. There is now clear evidence that repeated exposure to radiographs may increase the risk of cancer in the paediatric population.[75, 76] Since these children frequently undergo orthopaedic follow-up throughout their lives, they may have an increased risk of cancer.
This review reports on the existing evidence behind the various image-based measurements of proximal hip geometry in children with CP. It was not the aim of this review to address how such measurements should be used, patient management strategies, or guidelines for hip surveillance. Migration percentage, acetabular index, neck-shaft angle, and the Robin and Graham classification all showed good to excellent concurrent validity and reliability, and are clinically relevant; these measurements would, therefore, seem to be the criterion standard for initial diagnosis and hip surveillance. Other more complex methods, mostly for pre-surgical evaluations, such as 3D CT (reformatted planes or 3D modelling), can be used reliably for the assessment of acetabular dysplasia and neck-shaft angle but need further standardization and validation. Although there is some evidence for the use of CT scans in the assessment of femoral anteversion, this parameter remains a challenge and further developments and metrological evidence are required. Further studies are also needed to develop methods that involve little or no radiation, such as ultrasound and MRI, specifically for the population of children with CP.
We sincerely thank Johanna Robertson for her help in revising the English of the article and Laetitia Houx for her help in illustrating the measurements and in reviewing the article.