Development and validation of item sets to improve efficiency of administration of the 66-item Gross Motor Function Measure in children with cerebral palsy


Dr Dianne J Russell at CanChild Centre for Childhood Disability Research, McMaster University, IAHS 408, 1400 Main Street West, Hamilton, ON L8S 1C7, Canada. E-mail:


Aim  To develop an algorithmic approach to identify item sets of the 66-item version of the Gross Motor Function Measure (GMFM-66) to be administered to individual children, and to examine the validity of the algorithm for obtaining a GMFM-66 score.

Method  An algorithmic approach was used to identify item sets of the GMFM-66 (GMFM-66-IS) using data from 95 males and 79 females with cerebral palsy (CP; mean age 14y 7mo, SD 1y 8mo, range 12y 7mo to 17y 8mo). The GMFM-66-IS scores were then validated using combined data from three Dutch studies involving 134 males and 92 females with CP (mean age 7y, SD 4y 6mo, range 1y 4mo to 13y 8mo), representing all levels of the Gross Motor Function Classification System.

Results  The final algorithm contains three decision items from the GMFM-66 that determine which one of four item sets to administer. The GMFM-66-IS has excellent agreement with the full GMFM-66 both at a single assessment (intraclass correlation coefficient [ICC]=0.994, 95% confidence interval [CI] 0.993–0.996) and across repeat assessments (ICC=0.92, 95% CI 0.89–0.95).

Interpretation  The GMFM-66-IS is a promising alternative to the full GMFM-66. Users should be consistent in their choice of measure (GMFM-66 or GMFM-66-IS) on repeat testing and clearly identify which method was used.

The Gross Motor Function Measure (GMFM) was designed to assess change in gross motor function over time in children with cerebral palsy (CP). The GMFM has been shown to be a reliable and valid measure of gross motor function,1–4 and it is widely used as a measure of motor function in research applications.5–9 Rasch analysis has been applied to the GMFM to improve its scalability and interpretability, resulting in a version containing only 66 of the original 88 items. For clarity, the reduced version is referred to as the GMFM-66,10 and the original 88-item GMFM is now referred to as the GMFM-88. The reliability, validity, and responsiveness of the GMFM-66 have been demonstrated.10–13

The GMFM-88 was designed and validated as a responsive measure of change over time. An important consideration in the development of the GMFM-66 was that responsiveness should be maintained. This was assessed at the time of development, and Wang and Yang14 have since concluded that the responsiveness of the GMFM-66 is actually superior to that of the GMFM-88. This property makes it particularly useful for measuring the effect of intervention programmes.

Additional advantages of the GMFM-66 over the GMFM-88 include the interval properties of the scale that allow improved interpretability of a total score and change scores, the decreased time needed to administer the 66 items, the ability to obtain a score even when not all of the items are tested, the availability of a computer program for scoring, and the ordering of items according to difficulty on item maps.

As a test of the rigor of the Rasch method, a computer simulation exercise was undertaken to determine the minimum number of items that would be required to obtain a valid estimate of gross motor ability. This exercise is described in the Gross Motor Function Measure User’s Manual11 and by Avery et al.12 The results demonstrate that as few as 13 items can provide an accurate estimate of the GMFM-66 score. On the basis of these findings researchers and clinicians have been eager to test fewer items to reduce even further the time involved for each assessment. There are, however, no guidelines for choosing an appropriate subset of the 66 items.

To provide a systematic method for the selection of items, an algorithmic approach was developed in the present study, in which children are assessed on a series of decision items that guide the therapist to a set of predefined items targeted to the individual child’s ability; once the item set appropriate to the individual’s ability is found, the remaining decision items are recorded as ‘not tested’. The score arrived at through this item-set method will be referred to as the GMFM-66 item set (GMFM-66-IS) score.

The purposes of the present study were to develop this algorithm to enable users to identify the appropriate subset of GMFM-66 items based on a child’s functional ability, and to examine the validity of this method using cross-sectional and longitudinal data. We aimed to answer the following questions: can the algorithmic method be used to obtain a valid GMFM-66-IS score for a single assessment; does the GMFM-66-IS score measure change over time in a similar way to the full GMFM-66 score (and do different item sets measure change differently); and can the algorithmic approach be used for children who are assessed with different item sets on different occasions?



Development sample

Data from the first GMFM-66 assessment of 228 adolescents participating in the Adolescent Study of Quality of Life, Movement and Exercise15 were used to develop the algorithmic approach; this sample will be referred to as the development sample. Fifty-four participants were excluded because they had a score of ‘not tested’ on the first decision item, leaving 174 participants in whom the algorithmic approach was piloted. Characteristics of this sample are found in Table I.

Table I. Characteristics of participants in the development and validation samples

| Characteristic | Development sample | Validation sample |
| --- | --- | --- |
| Sample size, n | 174 | 227 |
| Type of CP, n | | |
| Spastic bilateral | 106 | 110 |
| Spastic unilateral | 28 | 94 |
| Not specified | 3 | 0 |
| GMFCS level, n | | |
| Not specified (a) | 0 | 55 |
| Item set, n | | |
| Not calculated (b) | 0 | 3 |
| Sex, n | | |
| Not specified | 0 | 1 |
| Age, y:mo | | |
| Mean (SD) | 14:7 (1:8) | 7:0 (4:6) |
| Range | 12:7–17:8 | 1:4–13:8 |

(a) Gross Motor Function Classification System (GMFCS) levels were not measured in one of the studies contributing to the validation sample; however, all children in that study ranged in function from level I to level III. (b) Participants who did not complete the first decision item could not be assigned an item set. CP, cerebral palsy.

The children were assessed by study therapists who had been trained in the administration and scoring of the GMFM-88. The therapists’ reliability in assessing the GMFM-66 was measured against a criterion video before the study and again near the end of data collection to ensure that they reached and maintained a criterion score of more than 0.80 using a weighted kappa.16

Ethical approval was obtained through the institutional review board of McMaster University; all parents consented, and participants gave assent.

Validation sample

The participants in the development sample were also involved in the Ontario Motor Growth Curves study, which provided data for the Rasch analysis used to develop the GMFM-66. Because the datasets involve the same children, there was concern that any data collected from this sample might overestimate the strength of the relationship between the GMFM-66-IS score and the full GMFM-66 score. Therefore, the GMFM-66-IS was validated using GMFM-88 data sets previously collected by researchers in the Netherlands. This validation sample provides data from three studies involving children with a broad age range from 1 to 13 years; characteristics of this sample are provided in Table I.

The first sample (n=107) came from the Pediatric Rehabilitation Research in the Netherlands group.17 These children were aged 9 to 13 years. Only one assessment was available for each individual, so these data were used only in the cross-sectional analysis. GMFM-88 assessments were administered by a trained researcher who also classified the children by Gross Motor Function Classification System (GMFCS) level. GMFM-66 scores were calculated from the GMFM-88 scores using the Gross Motor Ability Estimator software provided with the GMFM-66 user manual.11

The second sample (n=62) was provided by the Pediatric Rehabilitation Research in the Netherlands cohort of children aged 5 years or younger. These children were assessed at 18 months, 2 years 6 months, and 3 years 6 months. This sample was used for both the cross-sectional and the longitudinal analyses. All GMFM-88 assessments were administered by trained assessors.

The third sample (n=55) was provided by Ketelaar and colleagues18 from their study of functional therapy for children with CP. The GMFCS level was not recorded for these children, but they were all assessed as having ‘mild’ (n=43) or ‘moderate’ (n=12) motor impairment, corresponding to GMFCS levels I to III (M Ketelaar, personal communication 2007). In this data set children aged 1 to 7 years were assessed on four occasions at 6-month intervals. For the cross-sectional analysis, the first assessment of each individual was used; the longitudinal analysis involved a subset of children who had a repeat assessment at 1 year. Five paediatric physical therapists and one research assistant were trained in the administration and scoring of the GMFM-88 and carried out all GMFM-88 assessments. For the repeat assessments, the same assessor administered the GMFM-88 in the same therapy room.


The purpose of developing the new scoring method was to provide a set of items targeted to an individual’s ability, to enable accurate scoring with fewer items. Initially six item sets were identified. These were selected on the basis of six items that were spaced at approximately equal intervals across the GMFM-66 item difficulty map to ensure that the spectrum of abilities would be captured. Additional items clustered around these six items were included, to ensure an adequate sampling above and below the target ability level, so that each set contained at least 13 items, recommended as the minimum number required for an accurate estimate.12

To direct therapists to the appropriate item set, five decision items were selected at intervals midway between adjacent item sets. A child who could complete a decision item (i.e. who had a score of 3) would then be tested on a more difficult decision item; if they scored a 2 or less, they would be directed to the previous item set. The initial pilot algorithm is outlined in Figure S1 (supporting information published online).
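The routing rule described above can be sketched in a few lines of Python. This is only an illustration of the pilot rule as stated in the text (a score of 3 moves the child on to the next decision item, a score of 2 or less selects the item set just below that decision item); the function name, the use of `None` for ‘not tested’, and the handling of untested later decision items are assumptions made for the example, and the final algorithm in Figure 1 may differ in its details.

```python
from typing import Optional, Sequence

FULL_SCORE = 3  # the text treats a score of 3 as completing a decision item


def select_item_set(decision_scores: Sequence[Optional[int]]) -> Optional[int]:
    """Return the 1-based item set chosen by the pilot routing rule.

    decision_scores lists the child's scores on the decision items, ordered
    from easiest to hardest. None stands for 'not tested' (an assumption of
    this sketch, not a convention taken from the paper).
    """
    if not decision_scores or decision_scores[0] is None:
        # Without a score on the first decision item no item set is assigned.
        return None
    for position, score in enumerate(decision_scores, start=1):
        # Assumption: an untested later decision item is treated as not completed.
        if score is None or score < FULL_SCORE:
            # Decision item `position` lies between item sets `position` and
            # `position + 1`, so failing it selects item set `position`.
            return position
    # Every decision item was completed: use the most difficult item set.
    return len(decision_scores) + 1


# Five decision items route a child to one of six item sets (pilot version).
print(select_item_set([3, 3, 2, None, None]))
```

With three decision items, as in the final algorithm, the same function would return a value from 1 to 4.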

Using data from the development sample, the performance of the initial six item sets was assessed in relation to the full GMFM-66 score. With those preliminary findings, we embarked on an iterative approach, modifying the item sets and the decision items so that the resulting GMFM-66-IS score would closely approximate the full GMFM-66 score.

Statistical analysis

Data analysis was carried out using SPSS for Windows version 14.0 (SPSS Inc., Chicago, IL, USA).

Cross-sectional analysis

The algorithm was used to calculate a GMFM-66-IS score for each participant. A simple regression analysis was then performed to determine how accurately the GMFM-66-IS score predicted the full GMFM-66 score.

An intraclass correlation coefficient (ICC) was calculated to give an estimate of the absolute agreement between the algorithm score and the full score. Unlike a Pearson’s correlation, this ICC would penalize any systematic bias in the algorithm score. The chosen ICC was based on a two-way mixed-effects model; calculations were carried out according to the methods presented by McGraw and Wong19 for the coefficient ICC(A,1).
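For readers who want to reproduce this kind of agreement statistic, the sketch below computes the single-measure absolute-agreement coefficient ICC(A,1) from the two-way mean squares, using the formula (MSR − MSE) / (MSR + (k − 1)MSE + (k/n)(MSC − MSE)). It is a plain-Python illustration rather than the SPSS procedure used in the study, and the function and variable names are chosen for the example.

```python
from typing import Sequence


def icc_a1(rows: Sequence[Sequence[float]]) -> float:
    """ICC(A,1): two-way model, absolute agreement, single measurement.

    rows holds one entry per child; each entry lists that child's score
    under each scoring method (here k = 2: item set score and full score).
    """
    n = len(rows)
    k = len(rows[0])
    grand = sum(sum(row) for row in rows) / (n * k)
    row_means = [sum(row) / k for row in rows]
    col_means = [sum(row[j] for row in rows) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between children
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between methods
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores: the second method reads exactly 1 point higher.
biased = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
print(icc_a1(biased))
```

In the hypothetical example the two columns are perfectly correlated, so a Pearson correlation would equal 1, yet the constant 1-point offset pulls ICC(A,1) below 1. This is the penalty for systematic bias referred to above.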

To gain an understanding of the factors that might determine for whom the algorithms work well, the degree of score agreement was examined in subgroups defined by several variables such as the type of CP, GMFCS level, item set, age, and sex. Four variations of score agreement were considered: exact agreement, agreement within 2 points (plus or minus), agreement within 4 points, and agreement within 10 points. The effect of the child’s age on the score agreement was also examined by a plot of the differences between the item set and full scores against age.
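The four agreement bands can be tabulated as in the following sketch. It assumes that the bands are cumulative (a child with exact agreement is also counted as within 2, 4, and 10 points) and that each band includes its boundary; both are assumptions of the example. Apart from the outlier pair reported in the Results (89.7 and 60.1), the scores are invented for illustration.

```python
from typing import Dict, Sequence, Tuple


def agreement_summary(
    item_set_scores: Sequence[float],
    full_scores: Sequence[float],
    tolerances: Sequence[float] = (0.0, 2.0, 4.0, 10.0),
) -> Dict[float, Tuple[int, float]]:
    """Count children whose two scores agree within each tolerance.

    Returns {tolerance: (count, percentage)}; bands are cumulative.
    """
    diffs = [abs(a - b) for a, b in zip(item_set_scores, full_scores)]
    n = len(diffs)
    summary = {}
    for tol in tolerances:
        # A tiny epsilon guards against floating-point noise at the boundary.
        count = sum(1 for d in diffs if d <= tol + 1e-9)
        summary[tol] = (count, 100.0 * count / n)
    return summary


# Hypothetical data; only the last pair (89.7 vs 60.1) comes from the paper.
is_scores = [50.0, 61.5, 70.0, 89.7]
full = [50.0, 60.0, 66.0, 60.1]
for tol, (count, pct) in agreement_summary(is_scores, full).items():
    print(f"within {tol:g} points: {count} ({pct:.1f}%)")
```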

Longitudinal analysis

Of the 227 individuals in the validation sample, 110 had two assessments 1 year apart, and these were used in the longitudinal analysis. Change scores (difference between the baseline and 1y scores) were calculated for both the GMFM-66-IS and the full GMFM-66 score. An ICC(A,1) was calculated to determine how closely the GMFM-66-IS and the GMFM-66 measured change.

To examine whether change was measured differently depending on which item set was administered, the difference in the change scores was calculated for each participant (difference score). For children in whom the GMFM-66-IS score and the GMFM-66 score indicated the same amount of change, the difference score would be 0. Children assessed as having changed more with the GMFM-66-IS would have a positive value for the difference score. The mean difference scores were calculated for each item set, and a one-way analysis of variance (ANOVA) was performed to examine whether change was assessed differently with the different item sets. Only participants who were assessed with the same item set on the two occasions were included in this analysis.
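As a worked illustration of the difference score and the one-way ANOVA, the sketch below uses the sign convention stated in this paragraph (change on the GMFM-66-IS minus change on the full GMFM-66) and computes the F statistic from the between-group and within-group mean squares. The function names and the example values are hypothetical; the study itself used SPSS.

```python
from typing import Dict, List


def difference_score(
    is_baseline: float, is_followup: float,
    full_baseline: float, full_followup: float,
) -> float:
    """Change on the GMFM-66-IS minus change on the full GMFM-66."""
    return (is_followup - is_baseline) - (full_followup - full_baseline)


def one_way_anova_f(groups: Dict[int, List[float]]) -> float:
    """F statistic for a one-way ANOVA of difference scores by item set."""
    values = [v for g in groups.values() for v in g]
    grand = sum(values) / len(values)
    means = {label: sum(g) / len(g) for label, g in groups.items()}

    ss_between = sum(len(g) * (means[label] - grand) ** 2
                     for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# A child who gains 5 points on the item set score and 3 on the full score.
print(difference_score(40.0, 45.0, 41.0, 44.0))

# Hypothetical difference scores grouped by item set.
print(one_way_anova_f({1: [0.0, 1.0, 2.0], 2: [1.0, 2.0, 3.0]}))
```

The F statistic would then be compared against the F distribution with (number of item sets − 1) and (number of children − number of item sets) degrees of freedom.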

Young children would be expected to move between item sets as their gross motor function improves. For example, a very young child may initially be assessed on item set 1 and later on item set 2 or 3. To determine whether moving between the item sets affected the relationship between the GMFM-66-IS score and the full GMFM-66 score, a two-factor ANOVA model was used. This model examined the effect of two factors on the difference in change score: the initial item set and whether the child moved between item sets.


Algorithm and item sets

Following the iterative review of data from the development sample, we decided to combine three of the item sets into one (and therefore remove two of the decision items), leaving four item sets that gave good agreement between the full GMFM-66 score and the GMFM-66-IS score. Item set 1 has 15 items, item set 2 has 29 items, item set 3 has 39 items, and item set 4 has 22 items. The final algorithm is outlined in Figure 1, and the associated item sets are described in Appendix S1 (supporting information published online).

Figure 1. The final algorithmic approach for the identification of the appropriate item set for each child being assessed. Decision items are shown in bold.

Cross-sectional analysis

A simple linear regression model was used to predict the full GMFM-66 score from the GMFM-66-IS score. If the GMFM-66-IS score was a perfect predictor of the GMFM-66 score one would expect the regression intercept to be 0 and its slope to be 1; 95% confidence intervals (CIs) of the parameter estimates that include these values are therefore desirable. The estimated intercept was close to 0 (β0=−0.54, 95% CI −1.45 to 0.37), and the estimated slope was near 1 (β1=1.01, 95% CI 0.99–1.02).
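The check described here (does the interval for the intercept include 0, and does the interval for the slope include 1?) can be sketched as follows. The example fits ordinary least squares by hand and, as a simplifying assumption, builds the intervals from the normal quantile rather than the t distribution that a statistics package would use; with a sample of more than 200 children the two are very close. The data are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist
from typing import Dict, Sequence, Tuple

Estimate = Tuple[float, float, float]  # (estimate, lower bound, upper bound)


def simple_regression(
    x: Sequence[float], y: Sequence[float], level: float = 0.95
) -> Dict[str, Estimate]:
    """Least-squares fit of y on x with approximate confidence intervals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (n - 2)  # residual variance
    se_slope = sqrt(sigma2 / sxx)
    se_intercept = sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))

    # Assumption: normal quantile used in place of the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return {
        "intercept": (intercept, intercept - z * se_intercept,
                      intercept + z * se_intercept),
        "slope": (slope, slope - z * se_slope, slope + z * se_slope),
    }


def contains(estimate: Estimate, value: float) -> bool:
    """True when the interval around the estimate includes the value."""
    _, low, high = estimate
    return low <= value <= high


# Hypothetical item set scores (x) and full scores (y).
fit = simple_regression([20.0, 35.0, 50.0, 65.0, 80.0],
                        [21.0, 34.0, 51.0, 64.0, 81.0])
print(contains(fit["intercept"], 0.0), contains(fit["slope"], 1.0))
```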

A plot of the full GMFM-66 score against the GMFM-66-IS score is presented in Figure S2 (supporting information published online). It indicates that the two methods produce very similar scores, with most individuals being close to the line of perfect agreement. There is, however, one very prominent outlier. This child had a GMFM-66-IS score of 89.7 and a full score of 60.1. Removal of this outlier had little effect on the results (details not presented).

The ICC was high at 0.994 (95% CI 0.993–0.996), confirming that the absolute agreement between the two measures is very good. A plot of age against score agreement (GMFM-66-IS minus GMFM-66; not shown) indicated that age was not related to score agreement; this was confirmed by a low correlation coefficient (r=0.015).

The mean GMFM-66-IS and full GMFM-66 scores for different groups of respondents are shown in Table II. For all groups examined, the mean and SD scores are very similar across scoring methods. Table III shows how well the GMFM-66-IS score approximates the full score for different groups of respondents. Overall, 82% of respondents had GMFM-66-IS scores within 2 points of the full GMFM-66 score, and 96% of the respondents had agreement within 4 points. Item set 2 performed particularly well, with all but two respondents having score agreement within 2 points, and all respondents having agreement within 4 points. Item sets 3 and 4 had the largest score differences, but even then at least 94% of participants had score agreement within 4 points.

Table II. Mean (SD) 66-item Gross Motor Function Measure (GMFM-66) and item set scores (GMFM-66-IS) for the validation sample

| Group | n | GMFM-66, mean (SD) | GMFM-66-IS, mean (SD) |
| --- | --- | --- | --- |
| Type of cerebral palsy | | | |
| Spastic bilateral | 110 | 52.1 (23.6) | 52.3 (23.3) |
| Spastic unilateral | 94 | 69.1 (22.3) | 69.2 (22.0) |
| Dyskinetic | 5 | 37.4 (30.1) | 37.5 (29.2) |
| Ataxic | 4 | 85.3 (12.7) | 84.6 (13.1) |
| Mixed | 11 | 45.9 (29.2) | 46.8 (29.2) |
| GMFCS level | | | |
| I | 71 | 76.7 (21.7) | 76.9 (21.4) |
| II | 24 | 61.0 (22.7) | 60.9 (22.0) |
| III | 27 | 44.7 (14.7) | 45.1 (15.0) |
| IV | 26 | 32.9 (10.1) | 33.0 (10.0) |
| V | 21 | 22.1 (8.1) | 22.9 (8.2) |
| Not specified | 55 | 69.7 (11.2) | 69.5 (10.8) |
| Item set | | | |
| 1 | 32 | 21.4 (7.2) | 22.1 (7.1) |
| 2 | 45 | 37.7 (7.3) | 37.7 (7.4) |
| 3 | 64 | 60.2 (11.6) | 60.0 (11.0) |
| 4 | 83 | 84.8 (10.1) | 85.0 (8.9) |
| Total | 224 | 59.2 (25.2) | 59.4 (24.9) |

GMFCS, Gross Motor Function Classification System.

Table III. Score agreement for the validation sample

| Group | Total | Exact | ±2 points | ±4 points | ±10 points | >10 points |
| --- | --- | --- | --- | --- | --- | --- |
| Type of cerebral palsy, n (%) | | | | | | |
| Spastic bilateral | 110 | 19 (17.3) | 92 (83.6) | 106 (96.4) | 110 (100) | 0 (0) |
| Spastic unilateral | 94 | 31 (33) | 74 (78.7) | 90 (95.7) | 92 (97.9) | 2 (2.1) |
| Dyskinetic | 5 | 1 (20) | 5 (100) | 5 (100) | 5 (100) | 0 (0) |
| Ataxic | 4 | 2 (50) | 3 (75) | 4 (100) | 4 (100) | 0 (0) |
| Mixed | 11 | 2 (18.2) | 10 (90.9) | 10 (90.9) | 11 (100) | 0 (0) |
| GMFCS level, n (%) | | | | | | |
| I | 71 | 35 (49.3) | 65 (91.5) | 69 (97.2) | 70 (98.6) | 1 (1.4) |
| II | 24 | 3 (12.5) | 18 (75) | 23 (95.8) | 24 (100) | 0 (0) |
| III | 27 | 2 (7.4) | 25 (92.6) | 26 (96.3) | 27 (100) | 0 (0) |
| IV | 26 | 4 (15.4) | 24 (92.3) | 25 (96.2) | 26 (100) | 0 (0) |
| V | 21 | 5 (23.8) | 18 (85.7) | 21 (100) | 21 (100) | 0 (0) |
| Not specified | 55 | 6 (10.9) | 34 (61.8) | 51 (92.7) | 54 (98.2) | 1 (1.8) |
| Item set, n (%) | | | | | | |
| 1 | 32 | 8 (25) | 28 (87.5) | 31 (96.9) | 32 (100) | 0 (0) |
| 2 | 45 | 5 (11.1) | 43 (95.6) | 45 (100) | 45 (100) | 0 (0) |
| 3 | 64 | 5 (7.8) | 47 (73.4) | 61 (95.3) | 64 (100) | 0 (0) |
| 4 | 83 | 37 (44.6) | 66 (79.5) | 78 (94) | 81 (97.6) | 2 (2.4) |
| Total, n (%) | 224 | 55 (24.6) | 184 (82.1) | 215 (96) | 222 (99.1) | 2 (0.9) |

GMFCS, Gross Motor Function Classification System; ±, within.

Longitudinal analysis

The agreement between the change in the full GMFM-66 score and the change in the GMFM-66-IS score across all participants was high (ICC=0.92, 95% CI 0.89–0.95), indicating good global agreement between the two methods.

For participants who were assessed with the same item set at both timepoints, there was no significant difference in change scores among the item sets (F=1.87, p>0.05). The mean difference in change score for all item sets was near 0, indicating little difference in how change is measured. The difference in change score was slightly positive for item set 3 (mean difference 0.845, 95% CI 0.34–1.35), indicating a slightly greater change in score over 12 months on the full score compared with item set 3.

Estimated marginal means of the two-factor ANOVA of the relationship between the differences in change score and both the initial item set and whether the child changed item set are shown in Table IV. The results of the ANOVA indicated that neither the initial item set nor a change in item sets over 1 year influenced the difference in change score between the methods. However, the interaction between these two factors was statistically significant (p=0.007), indicating that the full score and the GMFM-66-IS score measured change differently depending on the combination of initial item set and whether the child moved between item sets. The estimated marginal means of the difference in change scores between the methods indicate that the difference is actually close to 0 in all instances, confirming that, despite the statistical significance of the finding, there was no substantial numerical difference between the two scoring methods.

Table IV. Estimated marginal means from the two-way analysis of variance investigating the effect of initial item set and whether the child changed item sets on the difference in change score between the item set and full scores on the 66-item Gross Motor Function Measure (GMFM-66, n=110)

| Initial item set | Change in item set (a) | n | Mean | Standard error | 95% confidence interval |
| --- | --- | --- | --- | --- | --- |
| 1 | Same item set | 7 | 0.657 | 0.864 | −1.057 to 2.371 |
| 1 | Different item set | 2 | 1.682 | 1.022 | −0.346 to 3.710 |
| 2 | Same item set | 11 | −0.525 | 0.689 | −1.893 to 0.842 |
| 2 | Different item set | 4 | −0.187 | 0.511 | −1.201 to 0.826 |
| 3 | Same item set | 35 | 0.845 | 0.386 | 0.078 to 1.611 (b) |
| 3 | Different item set | 19 | −1.804 | 0.554 | −2.903 to −0.704 |
| 4 | Same item set | 15 | −0.191 | 0.590 | −1.362 to 0.979 |
| 4 | Different item set | 17 | NA (c) | | |

(a) Whether or not the child was measured with different item sets at baseline and 1 year. (b) Statistically significant result, indicating that the mean difference between the GMFM-66 and the GMFM-66-IS was slightly positive for this group. (c) No children who were initially assessed in item set 4 were assessed on a different item set 1 year later.


The results indicate that the algorithmic approach to GMFM-66 assessment and the resulting GMFM-66-IS scores are a good alternative to assessment using the full GMFM-66.

The GMFM-66-IS score correlates well with the full score when applied to a single assessment, as indicated by the coefficients of the regression analysis. The estimated intercept was near 0, indicating that for this sample the GMFM-66-IS was systematically neither higher nor lower than the full score. Also, the estimated slope was approximately 1, indicating that GMFM-66-IS scores correspond very closely to the full GMFM-66 scores. In addition, the ICC, which reflects absolute agreement between the two methods, was very high.

When used to measure change over time the GMFM-66-IS score shows excellent agreement with the full GMFM-66 score, regardless of the item set on which a person is assessed or whether different item sets are used at different assessments. The overall agreement between 1-year change scores for the two methods was high. There was no systematic difference between item sets or between children who were assessed on the same or different item sets on each occasion. A significant interaction between initial item set and whether the item set changed between assessments was found. This is not strong evidence of a practical problem because the mean difference between the groups was close to 0. No further comparison was performed for the longitudinal analysis because of its smaller sample size.

Because the GMFM-66-IS does not require assessment of items considered to be too easy for a child’s overall level of function, the score tends to be higher than the full GMFM-66 score. In fact the GMFM-66-IS score may be a more accurate representation of the child’s function if the easier items are no longer relevant to the child’s overall functioning. If, however, one was targeting intervention to the affected side of a child with unilateral CP, one might wish to use the full GMFM-66 to capture change in that aspect of a child’s function.

It would be prudent for therapists to be consistent in their choice of scoring method for a given child. For therapists interested primarily in assessing change over time or in research studies where administration time is an important consideration, the GMFM-66-IS is a useful measure. For those who would like to assess the whole range of abilities, particularly to obtain a detailed descriptive account of a child’s current function, the full GMFM-66 is useful. Therapists should then continue to use the same method in future assessments for any given child. This will ensure that the measured change is assessed consistently.

This validation exercise shows that the GMFM-66-IS is a systematic approach to identifying the appropriate subset of items for any given child, and it is a promising option to decrease the administration and scoring time required for a valid GMFM-66 score. However, because the items were selected post hoc from the full GMFM-66, the extent of this time-saving is yet to be determined. As with the full GMFM-66, the item scores from the GMFM-66-IS can be entered into the Gross Motor Ability Estimator computer program to give an estimate of the GMFM-66 score and to display and print item difficulty maps.

This was a retrospective study of the validity of the GMFM-66-IS in which we aimed to assess whether the algorithmic method and the resulting item set score provide a reliable alternative to the full GMFM-66. Because this has been a theoretical exercise, it would be important to get an understanding of the clinical utility of this approach prospectively. We encourage clinicians to try this approach, print the item maps, and see whether this gives them the type of clinical information that they need. Further validation of this approach could be done using other data sets. Current work is under way to evaluate the reliability, responsiveness, and clinical use of the GMFM-66-IS prospectively.


We are very grateful to our Dutch colleagues, Jeanine Voorman, Marjolijn Ketelaar, and the Pediatric Rehabilitation Research in The Netherlands Study Group, for sharing their Gross Motor Function Measure (GMFM) data with us. We also thank Barb Galuppi, the many therapists who did the GMFM assessments, and the children and families who participated in these studies. Funding for the Adolescent Study of Quality of Life, Movement and Exercise study was provided by the Canadian Institutes of Health Research (CIHR; MOP-53258). Dianne Russell holds a Research Scholar award from the Ontario Federation for Cerebral Palsy. Peter Rosenbaum holds a CIHR Canada Research Chair in Childhood Disability.