Aim The aim of the study was to investigate the construct validity of the Quality of Upper Extremity Skills Test (QUEST) in children with cerebral palsy (CP).
Method A total of 170 QUEST assessments from a convenience sample of 94 children with CP involved in clinical and research treatment programmes (54 males, 40 females; mean age 6y 10mo, SD 2y 11mo, range 2–16y; Gross Motor Function Classification System levels I–V) were reviewed.
Results The QUEST was not unidimensional; many items demonstrated poor fit when total scores were analysed; goodness of fit improved when domains were considered independently and limbs separately examined. QUEST items involving elbow flexion and/or forearm in pronation were easily achieved, thus reducing test sensitivity. Postures items in the grasp domain behaved erratically, with little relationship to total scores.
Interpretation Calculating total scores is discouraged. Reporting QUEST results separately for domains and each limb is recommended. Posture items in the grasp domain had little relationship with total scores and it is recommended that they be removed from the test.
Measurement of upper limb movement and function among children with cerebral palsy (CP) has been investigated for many years. One of the earliest measures to be introduced was the Quality of Upper Extremity Skills Test (QUEST).1 The domains and items of the QUEST reflect assumptions of neurodevelopmental theory,2 which was presumed to underpin upper limb movement and function.1 QUEST scores are assumed to represent the ‘latent trait’ of upper limb movement quality and function in children with CP; however, this construct has not been tested. Previous QUEST validity studies, reviewed below, have examined only concurrent validity. Although there are now a number of measures relating to upper limb movement and function,3 the QUEST has had, and continues to have, widespread clinical and research use.4–11 Therefore, examination of construct validity is needed and overdue.
The QUEST manual contains psychometric and administration information.1 It is a 34-item criterion-referenced observation test, with higher scores indicating increased levels of achievement on harder items and the assumption that easier items can be credited as performed. There are four domains: dissociated movement, grasp, weight bearing, and protective extension. The domain score is a summed item score converted into a standardized percentage and the total score is the average of domain scores. Although the original target population was children with CP aged 18 months to 8 years, it has been used in research with older children.4,8,10,11 QUEST reliability is well established and has been found to be high for internal consistency and interrater and intrarater reliability.12 There is, however, a dearth of evidence relating to validity with both the target age range population and children over the age of 8 years.
During initial test development,1 concurrent validity of the QUEST with the Peabody Developmental Motor Scales – Fine Motor Subtest13 was tested. A strong correlation (r=0.84) suggested that the QUEST measured aspects of fine motor ability. High concurrent validity with the Melbourne Assessment of Unilateral Upper Limb Function14 in children with hemiplegic CP aged 5 to 8 years was found.15 This confirmed that the QUEST measures aspects of upper limb function.
To date, no study of the construct validity of individual QUEST items, domains, or total score has been carried out using item–response theory methods such as Rasch analysis.16 Rasch analysis provides information about the relative difficulty of items, and identifies items that do not contribute meaningfully to the latent trait being measured.16 This helps reveal redundant items, and contributes to characterizing the dimension being measured. In the present study, therefore, Rasch methodology was used to determine whether the QUEST measures the latent trait of ‘upper limb quality of movement and function’.
In simple Rasch models, probability of a correct response is modelled as a logistic function of the difference between the person (ability or attainment level) and item (difficulty of required performance).16 QUEST person parameters are the level of upper limb quality of movement and function (the ‘latent trait’) possessed by a child on an item, a domain, or a total score. A child with higher upper limb movement ability should have a higher probability of a higher score relative to the difficulty of an item. Rasch modelling constructs a theoretical line of measurement for all items, positioning them in order of difficulty – if items do reflect the latent trait there should be a good fit between the theoretical line and item placement. ‘Fit’ statistics thus describe how well items represent sample abilities and how well each participant has achieved items in comparison with model expectations.16 If the test represents a latent trait, all items should contribute and only one dimension should be evident. If a test is unidimensional, all items will have ‘good fit’, and a child’s likelihood of achieving item success should predict his or her overall score. QUEST total and domain scores are routinely reported; hence, this study examined construct validity of both.
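The logistic relationship described above can be illustrated with a minimal sketch. It assumes the simplest dichotomous (pass/fail) case; QUEST items are scored in ordered categories, so this is a simplification for illustration rather than the model fitted in the study. The function name and example values are hypothetical.

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of success under the dichotomous Rasch model.

    Both arguments are in logits; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Ability equal to item difficulty gives an even chance of success.
p_equal = rasch_probability(0.0, 0.0)
# Ability 2 logits above the item difficulty: success is likely.
p_easier = rasch_probability(1.0, -1.0)
# Ability 2 logits below the item difficulty: success is unlikely.
p_harder = rasch_probability(-1.0, 1.0)
```

Because the probability depends only on the difference between ability and difficulty, the last two cases are mirror images of each other.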
We sought to establish QUEST construct validity in total and domain scores in children with CP aged 2 to 16 years.
The study used clinical data from the Queensland Cerebral Palsy Health Service (CP Health), Brisbane, data from a randomized controlled trial of upper limb intramuscular botulinum toxin A (Thumbs Up, conducted through CP Health), and data from a randomized controlled trial of occupational therapy home programmes conducted through The Spastic Centre, NSW.17 No attempt was made to control for treatments experienced by the children at the time of QUEST administration. Consequently, the sample represented a typically diverse CP population who regularly access services for evaluation and intervention. This is an acknowledged study limitation. The Human Research Ethics Committees of the Royal Children’s Hospital, The Spastic Centre, the University of Sydney, and the University of Western Sydney approved the use of existing data for this study.
To calculate sample size, information relating to person–measure estimate stability and item calibration sample size requirements was considered. Application of the Rasch model requires ‘as many items for a stable person measure as you need persons for a stable item measure’ (ref. 18 p. 328). A sample of 150 tests is recommended for item calibrations to be stable within 0.5 logits.18 In total, 170 QUEST assessments were gathered from existing data sources (CP Health [n=67], Thumbs-Up study [n=41], and The Spastic Centre [n=62]).
Four occupational therapists administered the QUEST according to the manual instructions. Each had clinical experience of treating children with CP (3y, 8y, 18y, and 20y) and experience using the QUEST before study data collection. All therapists had participated in QUEST training before study data collection to enhance data consistency.12
Administration materials were either the same as those specified in the manual or, when the test manual permitted, they were varied to ‘facilitate movements through verbal encouragement, toys, demonstration and/or handling the child as necessary’ (ref. 1, p. 13). All scoring used digital recordings, standard QUEST score sheets and a comments form. Hickey and Ziviani19 corrections to range of motion at elbow and wrist joints were used, as was their recommendation not to use the subjective rating scales of hand function, spasticity, and cooperativeness.
The Winsteps Rasch analysis program (version 3.63.0) was used for analyses.20 From the matrix of raw scores, the model estimated a linear ability for each child and a linear difficulty for each item. These were scaled along a unidimensional continuum ranging from minus to plus infinity.16 Measurement units were expressed in logits, the logarithm of the ratio of ‘pass’ to ‘fail’ probabilities. In keeping with convention, zero was set at the average item difficulty. This scaling was used to overcome known limitations of raw scoring an observational, categorical scale.16
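As a rough illustration of the logit scale, the sketch below converts a pass probability to log-odds and re-centres a set of item difficulties so that their mean is zero. The function names and the difficulty values are hypothetical; Winsteps performs the equivalent scaling internally during estimation.

```python
import math

def logit(p_pass: float) -> float:
    """Log-odds: natural log of the ratio of pass to fail probabilities."""
    return math.log(p_pass / (1.0 - p_pass))

def centre_item_difficulties(difficulties):
    """Shift item difficulties (in logits) so that their mean is zero."""
    mean_difficulty = sum(difficulties) / len(difficulties)
    return [d - mean_difficulty for d in difficulties]

even_odds = logit(0.5)             # pass and fail equally likely
likely_pass = logit(0.9)           # positive logit
hypothetical_difficulties = [0.4, 1.9, -0.7, 2.4]
centred = centre_item_difficulties(hypothetical_difficulties)
```

After centring, an item at zero logits is of average difficulty for the test, and a child at zero logits has an even chance of passing such an item.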
Fit statistics indicated the extent to which data were unidimensional.21 Closeness of observed scores to the predicted scoring pattern was expressed by (1) outlier-sensitive fit (outfit: sensitive to unexpected behaviour affecting responses to items far from a child’s ability level); and (2) information-weighted fit (infit: sensitive to unexpected behaviour affecting responses to items matching child ability). Both fit statistics were expected to approach 1.0, with typical acceptable values between 0.6 and 1.4.16 Point–biserial correlation coefficients were computed for each item, indicating the extent to which child scores on an item correlated with whole test scores, thus indicating predictable behaviour of items in relation to ability. As item predictability can be influenced by the number of counts in each category, the recommended minimum number (n=10) of counts per category for item predictability was used.18
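The two fit statistics can be sketched for a single dichotomous item as follows. This is an illustrative simplification, not the Winsteps implementation: it assumes pass/fail scoring and already-estimated abilities and difficulty, whereas QUEST items use ordered categories. All names and data are hypothetical; the 0.6 to 1.4 range is the one stated above.

```python
import math

def rasch_probability(ability, difficulty):
    """Expected probability of a pass under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def item_fit(responses, abilities, difficulty):
    """Return (infit, outfit) mean squares for one pass/fail item.

    responses: 0/1 score per child; abilities: logit measure per child.
    """
    squared_residuals = []
    variances = []
    for score, ability in zip(responses, abilities):
        p = rasch_probability(ability, difficulty)
        squared_residuals.append((score - p) ** 2)
        variances.append(p * (1.0 - p))
    # Outfit: unweighted mean of squared standardized residuals,
    # so responses far from a child's ability level dominate.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    # Infit: residuals weighted by model variance (information),
    # so responses near a child's ability level dominate.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit

def within_range(value, low=0.6, high=1.4):
    """Check a mean square against the acceptable range used in this study."""
    return low <= value <= high

abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
# Perfectly ordered responses: more predictable than the model expects.
infit_ordered, outfit_ordered = item_fit([0, 0, 1, 1, 1], abilities, 0.0)
# Reversed responses: low-ability children pass, high-ability children fail.
infit_erratic, outfit_erratic = item_fit([1, 1, 0, 0, 0], abilities, 0.0)
```

In the reversed pattern the outfit value exceeds the infit value because the most surprising responses come from the children whose ability is furthest from the item difficulty.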
Reliability was directly computed from the measurement error accompanying each child’s ability and item difficulty estimates.16,21 The person separation index indicated the extent to which measures separated individuals into different ability levels.16,21,22 We expected that the test would separate children into those with high, intermediate, and low abilities.16
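The link between measurement error, separation, and reliability can be sketched using the standard Rasch relationships: separation G is the error-adjusted standard deviation of measures divided by the root mean square error, and reliability is G²/(1+G²). The strata formula (4G+1)/3 is a commonly used convention and is included here as an assumption, not as something reported in this study. Function names are hypothetical.

```python
import math

def separation_from_sd(observed_sd, rmse):
    """Separation G: error-adjusted ('true') SD of measures divided by RMSE."""
    true_variance = max(observed_sd ** 2 - rmse ** 2, 0.0)
    return math.sqrt(true_variance) / rmse

def reliability_from_separation(g):
    """Reliability implied by a separation index G."""
    return g ** 2 / (1.0 + g ** 2)

def strata(g):
    """Number of statistically distinct ability levels (assumed convention)."""
    return (4.0 * g + 1.0) / 3.0

# Separation values taken from Table II of this paper.
whole_test_reliability = reliability_from_separation(2.51)
protective_extension_reliability = reliability_from_separation(0.45)
# A separation of 2 corresponds to three distinguishable ability levels.
strata_at_threshold = strata(2.0)
```

Applied to the separation values in Table II, these relationships give reliabilities that round to the tabulated figures (0.86 for the whole test and 0.17 for protective extension).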
Ninety-four children contributed 170 QUEST test scores (Table I). Where children contributed more than one QUEST test score, assessments were performed an average of 10 weeks apart (never less than 4wk). For all children who contributed more than one QUEST, scores were tested for violation of independence, and the person map was inspected before analysis to ensure that scores from the same child were not co-located. Although all levels of gross motor and manual abilities were represented, children were mostly high functioning, with 66% classified as Manual Ability Classification System (MACS) level I or II. This skewing of the sample towards higher function in our clinical and research samples was not unexpected, as children with hemiplegia and diplegia commonly present for remediation of functional difficulties arising from impaired quality of upper limb movement.
Table I. Demographics of study participants
| ||Total sample||Thumbs up data||Clinical data||Home programme study|
|Multiple QUEST contributionsa (n)|
| One QUEST||43||3||35||9|
| Two QUESTs||30||8||9||13|
| Three QUESTs||17||6||2||9|
| Four QUESTs||4||1||2||0|
|Total QUESTs (n)||170||41||67||62|
|Age at time of each QUEST, mean (SD), range, y:mo||6:10 (2:11), 2:0–16:7||6:5 (3:2), 2:2–11:4||6:4 (3:1), 2:0–16:7||7:8 (2:2), 4:5–13:2|
|GMFCS level (all QUESTs)|
|MACS (all QUESTs)|
|Type of CP (all QUESTs)|
| Spastic quadriplegia||16||6||10||0|
| Spastic diplegia||41||3||13||25|
| Spastic hemiplegia||96||32||34||30|
Assessing therapists chose which QUEST domains to administer, resulting in varying quantities of domain data. The weight bearing or protective extension domains, for example, were omitted if in the therapist’s view a child was unable to perform items because of gross motor limitations or behavioural problems. This reflects clinical practice conventions. Therefore, data available were as follows: dissociated movement (n=162), grasp (n=170), weight bearing (n=39), and protective extension (n=38).
Initial Rasch analysis
A priori, ‘normal’ posture items were removed from grasp, as postural abilities were also reflected in ‘abnormal’ posture categories. This resulted in 61 items for analysis. Table II summarizes person separation, reliability, item fit, and point–biserial correlation information. Item–person maps depict the spread of item difficulty in relation to person abilities. Inspection of item–person maps helped identify if there were sufficient items to separate children of high, intermediate, and low levels of ability, without overrepresentation of items at any level of difficulty.
Table II. Summary of analyses for total scores and domain scores
|Domain (no. of items)||Person separation||Reliability||Items with poor fit||Item map|
|Whole test (61)||2.51||0.86||14||Many items at the same level of difficulty throughout the scale, particularly items of intermediate levels of difficulty, i.e. high scale redundancy.|
|Dissociated movement (34)||2.46||0.86||1||Less item redundancy than the whole test. Able to discriminate between children of high, intermediate and low levels of ability|
|Dissociated movement, left (17)||2.80||0.89||3|| |
|Dissociated movement, right (17)||2.76||0.88||1|| |
|Grasp (postures removed) (6)||0.86||0.43||0||When limbs considered separately, able to discriminate between intermediate and low ability. The small number of items reduces the discriminatory ability of the domain|
|Grasp, left (3)||1.75||0.75||0|| |
|Grasp, right (3)||1.73||0.75||0|| |
|Weight bearing (10)||0.99||0.49||1||Able to discriminate between intermediate and low levels of ability|
|Weight bearing, left (5)||1.17||0.58||0|| |
|Weight bearing, right (5)||0.75||0.36||0|| |
|Protective extension (6)||0.45||0.17||0||Able to discriminate between intermediate and low levels of ability|
|Protective extension, left (3)||0.98||0.49||0|| |
|Protective extension, right (3)||0.71||0.34||0|| |
Total score analysis
Total score person separation and reliability were good (Table II); however, 14 items had poor fit (Table II), suggesting they did not pertain to the same latent trait.16 Seven items had low point–biserial correlations (≤0.15; Table III). Some of these were among the easiest in the test, with most children achieving success, as evidenced by low category counts for 0 (‘cannot do’) or 1 (‘partial credit’, e.g. elbow flexion with the forearm in pronation). Other erratic items were in postures. These were difficult to rate, particularly in children with hemiplegia who were capable on one side. They appeared to have little relationship with total scores. Postures items were thus removed from the remaining analyses. The item–person map for total scores (Fig. 1; the full legend can be found as supporting online information) demonstrated high scale redundancy, with many items being of the same level of difficulty.
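The screening for low item-to-total correlations can be sketched as below. This is a simplified illustration using a plain Pearson correlation between each item's scores and the whole-test scores; the exact coefficient computed by Winsteps may differ in detail. The item names and scores are hypothetical, and only the 0.15 cut-off comes from this study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so the item carries no information
    return sxy / math.sqrt(sxx * syy)

def flag_low_correlation(item_scores, total_scores, threshold=0.15):
    """Return {item name: correlation} for items at or below the threshold."""
    flagged = {}
    for name, scores in item_scores.items():
        r = pearson(scores, total_scores)
        if r <= threshold:
            flagged[name] = r
    return flagged

# Hypothetical data: six children, three items scored 0-2.
total_scores = [10, 20, 30, 40, 50, 60]
item_scores = {
    "ordered_item": [0, 0, 1, 1, 2, 2],  # rises with total score
    "easy_item": [2, 2, 2, 2, 2, 2],     # achieved by every child
    "erratic_item": [2, 0, 1, 2, 0, 1],  # unrelated to total score
}
flagged = flag_low_correlation(item_scores, total_scores)
```

In this toy data set the item achieved by every child and the erratic item are flagged, mirroring the two kinds of problem item described above.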
Table III. Items with poor fit and poor point–biserial correlations
|Analysis||Items||Poor infit||Poor outfit||Poor point–biserial correlation|
|Total score||Posture: head atypical||✓||✓||✓|
|Posture: trunk lateral|| ||✓||✓|
|Posture: shoulders elevated|| ||✓||✓|
|Posture: trunk forward|| ||✓||✓|
|Posture: shoulders retracted|| ||✓||✓|
|DM: Shoulder flexion, right|| ||✓|| |
|DM: wrist extension with elbow flexion, left|| ||✓|| |
|DM: wrist extension with forearm in pronation, left|| ||✓|| |
|DM: wrist extension with forearm in pronation, right||✓|| || |
|DM: wrist extension with elbow flexed, right||✓|| || |
|DM: elbow flexion with forearm in pronation, left|| || ||✓|
|DM: elbow flexion with forearm in pronation, right|| || ||✓|
|Grasp: cereal, right||✓||✓|| |
|Weight bearing: sitting with hands behind, left||✓|| || |
|Protective extension: backwards, left||✓||✓|| |
|Protective extension: backwards, right||✓|| || |
|DM (all items)||Elbow flexion with forearm in pronation, right|| ||✓||✓|
|DM (left items only)||Elbow flexion with forearm in pronation, left||✓||✓|| |
|Wrist extension with elbow in extension, left|| ||✓|| |
|Wrist extension with elbow in flexion, left||✓|| || |
|DM (right items only)||Elbow flexion with forearm in pronation, right||✓||✓|| |
|Weight bearing||Weight bearing: sitting with hands behind, left||✓|| || |
Figure 1. Item–person map for the 61 items in the total score. On the left side of the vertical dashed line, ‘.’ represents the estimated total score of one individual, ‘X’ represents the estimated total scores of two individuals. On the right side of the vertical dashed line, average item difficulties are presented, with more difficult items at the top (items are identified with labels from item 1, A1, to item A61). (The full version of this legend with explanations for each item label can be found as supplementary material in the online version of this article.)
Redundancy in QUEST items and poor fit when total scores were calculated indicated domains should be analysed separately. Theoretical assumptions underlying the QUEST meant that items were based on tasks assumed to follow a developmental trajectory for the refinement of skills. In this study, it was apparent that this was problematic because 57% of the sample had unilateral impairment. Consequently, because it was likely children would perform better with their dominant hand than with their non-dominant hand, we analysed left and right items separately.
In the dissociated movement domain, the person separation index was acceptable (Table II). One item had unacceptable outfit and a low point–biserial correlation. Seventeen items had insufficient (<10) counts for one or more categories – largely items that involved the forearm in pronation and/or the elbow in flexion; these represented the easiest items on the test. Item–person map examination suggested that the dissociated movement domain could separate children into high-, medium-, and low-ability categories. Left and right limb analyses revealed acceptable person separation and reliability (Table II). Although all items had acceptable point–biserial correlations, three left items and one right item demonstrated poor fit (Table III).
In the grasp domain, the person separation index was below the acceptable standard of 2 (Table II). Item–person map inspection revealed that the small number of items in this domain did not allow for adequate discrimination between children with different levels of ability. All items had acceptable point–biserial correlations and acceptable infit and outfit values. Category function analyses demonstrated disordered rating scales for grasp items. For example, for ‘grasp of cereal’, the categories ‘scissor grasp’ and ‘inferior scissor grasp’ were not often used by raters in this study, making these partial credit categories redundant in our sample. When left and right items were examined separately, category function improved and all items had good fit.
In the weight bearing domain, the separation value was below the expected value (Table II). All items had adequate point–biserial correlations; one item had poor infit (Table III). Many weight bearing categories had counts of less than 10. The item map indicated that there were insufficient items to separate children in the middle of the scale. When limbs were considered separately, all items had good fit and point–biserial correlations.
In the protective extension domain, person separation was poor (Table II), possibly because of the small number of items. Fit statistics and point–biserial correlations were acceptable. Many categories within the protective extension items had counts of less than 10. Examination of left and right items separately revealed that all items had acceptable fit and point–biserial correlations.
This was the first study to investigate QUEST construct validity in children with CP aged 2 to 16 years. Rasch modelling was used to determine whether the QUEST items were unidimensional, that is, representative of the same latent trait. Analysis demonstrated that 14 items had poor fit. Poor point–biserial correlations indicated seven unpredictable items, including the atypical postures items. Categories within rating scales for some items were redundant: in some cases only three or four out of a possible six categories were used. These findings suggest that the QUEST total score may not reflect a unidimensional construct.
Analysis of domains found adequate fit statistics and less item redundancy when compared with total scores. Grasp had better scalability when separated into left and right items, particularly for category function. Dissociated movement items with elbow in flexion and/or pronation were achieved by most children, resulting in unpredictable relationships between these items and total/domain scores. Weight bearing and protective extension had many items with low counts. This may be related to the study sample, which comprised children with a relatively high level of functioning, or it may suggest these items are redundant and categories should be collapsed. Further investigation of elbow flexion/pronation items, weight bearing and protective extension domains with a sample of lower-functioning children is required to determine construct validity and whether these items should remain in the scale.
Rasch analysis of the total score revealed that the QUEST has significant item redundancy. When domains were examined separately, scalability improved, with less redundancy and improved item fit. On their own, domains appear to more meaningfully reflect unidimensional constructs.
QUEST results should be reported only by domains and not as a total score. Reporting domain scores will not compromise the reliability of the test, as reliability findings for domain scores have been found to be high and comparable to total score findings.12 More than one domain should be administered to improve the sensitivity of the instrument, because individual domains had demonstrated difficulty separating children into ability categories.
Inspection of QUEST items reveals that, with the exception of postures, they are all unimanual; however, the total or domain score involves adding the scores from each limb together. Thus, good performance by one limb may mask poor results of the other. Previous researchers have recognized this limitation of the QUEST and calculated unimanual scores.4,9,10 Other upper limb assessments, for example the Assisting Hand Assessment,23 the Paediatric Motor Activity Log,24 and the Melbourne Assessment of Unilateral Upper Limb Function,14 produce unimanual total scores. To enhance interpretability of QUEST scores, improve category function, and allow easier comparison with other unimanual assessments, it is recommended that unimanual QUEST scores be calculated.
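The masking effect can be shown with a small hypothetical example. The sketch assumes a simple percentage-of-maximum score and a maximum of 2 points per item; this is not the conversion specified in the QUEST manual, and the item scores are invented for illustration.

```python
def percent_score(item_scores, max_per_item=2):
    """Percentage of the maximum achievable score (assumed scoring rule)."""
    return 100.0 * sum(item_scores) / (max_per_item * len(item_scores))

# Hypothetical child with unilateral impairment affecting the right limb.
left_items = [2, 2, 2, 2]
right_items = [0, 0, 1, 0]

combined_score = percent_score(left_items + right_items)
left_score = percent_score(left_items)
right_score = percent_score(right_items)
```

Here the combined score falls in the middle of the range, while the separate limb scores show full performance on the left and very limited performance on the right, which is the information a unimanual score preserves.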
Items involving the elbow in flexion and/or the forearm in pronation were demonstrated by most children, which was not surprising as this is the known synergistic spastic upper limb posture. The suggestion made by Woodbury et al.25 to reconsider the use of resting-state reflexes involving synergistic patterns (such as elbow flexion and forearm pronation) when measuring purposeful upper limb movement is supported by our study. Further research evaluating removal of these items is needed, particularly for children classified as MACS levels IV and V.
Postures items in the grasp domain were problematic, and removal is recommended. Postures items had little relationship with QUEST total scores, as evidenced by poor point–biserial correlations. Further, postures item instructions require that ‘observations for scoring this item should be made while administering the grasp items in the following section’ (ref. 1 p. 7). In practice, raters might see abnormal postures for one or two grasps, and normal postures for others. Deleting these items removes not only psychometric but also administration problems.
The convenience sample resulted in repeat tests and in diverse participant treatment, which was not controlled. It was typical of the clinical CP population in this regard. Although Rasch analysis included adequate data for dissociated movement and grasp, the sample provided limited data for protective extension and weight bearing. As a result, findings in these domains should be viewed with caution. Children with low levels of manual ability (MACS levels IV and V) were underrepresented.
Recommendations for research
The authors of the QUEST assumed that development of upper limb movement and function in children with CP mirrors development in children without disability; however, our results suggest otherwise. The results of this study instead suggest item redundancy in scales and a development of grasp that is not hierarchical. Longitudinal research on the development of hand function in children with CP is required to refine measurement of upper limb quality of movement and function appropriately. Further psychometric investigation on the amended QUEST is required with samples that represent the full range of ability of children with CP.