Development of a parent‐reported questionnaire evaluating upper limb activity limitation in children with cerebral palsy

Abstract
Background and purpose Upper limb activity measures for children with cerebral palsy have a number of limitations, for example, lack of validity and poor responsiveness. To overcome these limitations, we developed the Children's Arm Rehabilitation Measure (ChARM), a parent‐reported questionnaire validated for children with cerebral palsy aged 5–16 years. This paper describes both the development of the ChARM items and response categories and its psychometric testing and further refinement using the Rasch measurement model.
Methods To generate valid items for the ChARM, we collected goals of therapy specifically developed by therapists, children with cerebral palsy, and their parents for improving activity limitation of the upper limb. The activities, which were the focus of these goals, formed the basis for the items. Therapists typically break an activity into natural stages for the purpose of improving activity performance, and these natural orders of achievement formed each item's response options. Items underwent face validity testing with health care professionals, parents of children with cerebral palsy, academics, and lay persons. A Rasch analysis was performed on ChARM questionnaires completed by the parents of 170 children with cerebral palsy from 12 hospital paediatric services. The ChARM was amended, and the procedure repeated on 148 ChARMs (from children's mean age: 10 years and 1 month; range: 4 years and 8 months to 16 years and 11 months; 85 males; Manual Ability Classification System Levels I = 9, II = 26, III = 48, IV = 45, and V = 18).
Results The final 19‐item unidimensional questionnaire displayed fit to the Rasch model (chi‐square p = .18), excellent reliability (person separation index = 0.95, α = 0.95), and no floor or ceiling effects. Items showed no response bias for gender, distribution of impairment, age, or learning disability.
Discussion The ChARM is a psychometrically sound measure of upper limb activity validated for children with cerebral palsy aged 5–16 years. The ChARM is freely available for use to clinicians and nonprofit organisations.

Traditional interventions improve the independence of children with cerebral palsy by addressing activity limitation, that is, improving active function by the independent movement of the child to achieve an activity (Ashford & Turner-Stokes, 2013). In recent years, research and reviews investigating these interventions suggest that there is a lack of valid and responsive measures for evaluating changes in upper limb activity limitation (Hoare et al., 2010; Meyer-Heim & van Hedel, 2013; Palsbo & Hood-Szivek, 2012; Qiu et al., 2009; Sakzewski, Ziviani, & Boyd, 2009; Sandlund, Mcdonough, & Hager-Ross, 2009). This is supported by systematic reviews into measures of activity limitation for children with cerebral palsy (Gilmore, Sakzewski, & Boyd, 2010; Greaves, Imms, Dodd, & Krumlinde-Sundholm, 2010; Klingels et al., 2010). These reviews suggest that the ABILHAND-Kids is the most psychometrically robust measure available for this purpose.
The ABILHAND-Kids was developed using Rasch analysis, which allows the transformation of ordinal outcome scores into linear (interval-level) scores if the data from the items fit the Rasch mathematical model (Bond & Fox, 2015, p. 29). This approach satisfies the compelling argument that ordinal outcome scores should not be used in clinical trials (Hobart, Cano, Zajicek, & Thompson, 2007; Merbitz, Morris, & Grip, 1989). However, there is no evidence that the ABILHAND-Kids measure is responsive (Gilmore et al., 2010; Greaves et al., 2010). Other studies using the ABILHAND-Kids also suggest a lack of responsiveness (Preston et al., 2016; Preston et al., 2015). The adult version of the ABILHAND-Kids (the ABILHAND) also has limited responsiveness when compared with other measures (Bovolenta, Clerici, Agosti, & Franceschini, 2009). Additionally, the ABILHAND-Kids was validated on a sample of French-speaking children with cerebral palsy that included only four children with severe activity limitation, for whom a floor effect was reported, and 46% of the remaining sample were classed as having minimal to no activity limitation (Arnould, Penta, Renders, & Thonnard, 2004). It is increasingly important that the validity and scale range of measures of activity limitation include children with more severe disability, because new approaches such as robotic and computer-assisted rehabilitation technology are potentially more inclusive for children whose degree of disability prevents their participation in other rehabilitation practices such as constraint-induced movement therapy (Fasoli et al., 2010; Meyer-Heim & van Hedel, 2013).
Since the reviews of Gilmore et al. (2010), Greaves et al. (2010), and Klingels et al. (2010), two other measures with good potential (the paediatric motor activity log [revised; Wallen, Bundy, Pont, & Ziviani, 2009] and the Children's Hand-use Experience Questionnaire [Skold, Krumlinde-Sundholm, Hermansson, & Eliasson, 2009]) have been developed using Rasch analysis, but they still require further psychometric testing (Skold, Hermansson, Krumlinde-Sundholm, & Eliasson, 2011; Wallen & Ziviani, 2013), and some items appear unsuitable for all children, for example, fastening a necklace (Skold et al., 2011). We therefore set out to construct a measure that encompasses the most common activities of daily living at which children with cerebral palsy experience limitation. By developing the new measure using the Rasch model, we intended that the final measure would permit transformation of the raw scores to interval-level measurement.
The Rasch model is a probabilistic mathematical model of measurement based upon, but less rigid than, the (deterministic) Guttman pattern (Bond & Fox, 2015, p. 177; Tennant & Conaghan, 2007). The underlying principle for constructing measures based on the Rasch model is that the probability of a person endorsing, or "passing," an item is influenced only by the difficulty of the item and the ability of the person (Tennant & Conaghan, 2007). Endorsing an item illustrates a specific "quantity" of the trait being measured, and it is probable that all easier items will also be endorsed by that person. This technique allows the person being measured to be numerically quantified on a logistic scale if the items themselves are on a linear scale and if they are unidimensional (they all relate strongly to the trait being measured and not a different underlying trait). The linear (interval) scales on which items and persons are numerically located are calibrated in log-odds units called logits. These units represent the natural logarithm of the odds of success, that is, endorsing (or passing) an item (Bond & Fox, 2015, p. 29).
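For the simplest (dichotomous) case, the relationship between person ability, item difficulty, and logits described above can be written out as follows. The symbols are notation assumed here for illustration and are not taken from the source: \theta_n is the location (ability) of person n, b_i is the location (difficulty) of item i, and P_{ni} is the probability that person n endorses item i.

```latex
P_{ni} = \frac{e^{\theta_n - b_i}}{1 + e^{\theta_n - b_i}},
\qquad
\ln\!\left(\frac{P_{ni}}{1 - P_{ni}}\right) = \theta_n - b_i
```

As a worked example, a person located 1 logit above an item has odds of e^{1} \approx 2.72 of endorsing it, that is, a probability of e/(1+e) \approx 0.73. A person located exactly at the item's difficulty has a probability of 0.5.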
Responses to items showing a good fit to the Rasch model are determined to have met the fundamental principles of measurement for achieving linear (interval-level) outcome scores (Newby, Conner, Grant, & Bunderson, 2009). Bond and Fox (2015, Chapter 3) give a helpful description of these principles, and a commentary on what should be expected from a Rasch analysis is provided by Tennant and Conaghan (2007).
This study therefore aimed to develop and establish a psychometrically sound, parent-completed questionnaire for measuring activity limitation of children with cerebral palsy aged 5-16 years that could produce interval-level measurement and that had no floor effects even in children with the most severe upper limb activity limitation.

| Item and response category development
To develop appropriate items for the new measure, we used an approach that aimed to focus the items squarely within the dimension of activity limitation, and specifically on those activities at which children with cerebral palsy most commonly experience limitations. Our hypothesis for this approach was that treatment goals targeting upper limb activity limitation, formulated after functional assessment of children with cerebral palsy aged 5-16 years by clinical and research doctors and therapists, would provide an appropriate basis for items that relate directly to upper limb activity. We approached 14 therapy teams across England to collect appropriate goals of rehabilitation therapy. We combined these with goals taken from our own research work (Preston et al., 2016; Weightman et al., 2011). For the purposes of the item set of the Children's Arm Rehabilitation Measure (ChARM), the goals were rewritten to form item stems. A major advantage of this approach is that the completed measure will have great clinical relevance because it is based on the most common functional difficulties experienced by the population for whom the measure is validated.
Response options for items also need to be properly developed. Item responses can be varied in type or number (Streiner & Norman, 2003, pp. 33-35), or they can be consistent for each item, for example, rating capability as "Easy, Difficult, or Impossible" for each item, as in the ABILHAND-Kids. Too many response options can introduce error (Bond, 2003). Conversely, too few response options may result in poor responsiveness (Bovolenta et al., 2009), possibly as a consequence of increased floor and ceiling effects caused by the width of the categories (Merbitz et al., 1989). Bond and Fox (2015, p. 160) suggest that the optimum number of response options is entirely dependent on the characteristic being measured and should be assessed empirically for each scale. We therefore elected to develop item responses from the natural stages into which each item's activity can be broken as is typically done in rehabilitation by therapists working on reducing activity limitation (Bobath, 1990). For example, the item responses for the item "donning a vest" included the following natural stages:
• Yes, my child can put on a vest.
• My child can put on a vest if it is laid out first.
• My child can put on a vest once it has been pulled over their head or one arm.
• My child can complete putting on a vest once it has been put on over the head and arms.
• No, my child needs help to completely put on a vest.
Individual items therefore had a differing number of response options. The resulting item set was reviewed by between two and five therapists spread across the 12 rehabilitation teams that agreed to support the development of the ChARM. The item set was then formulated into the ChARM questionnaire.
The ChARM underwent face validity testing by a process in which the ChARM was reviewed by five groups of five or six people, one group after another. Each group included paediatric therapists, parents of children with cerebral palsy, professors and researchers who specialise in psychometrics and in the development of new measures, and lay persons. After each group's review, the reviewers' comments were addressed before the ChARM was reviewed by the next group of reviewers. The process was repeated four times in total. Paediatric therapists were not from the teams that had been involved in the generation of goals or the review of the items.
The aim of the next stage was to obtain a dataset of ChARM responses in order to perform psychometric testing. This required parents of children with cerebral palsy aged 5-16 years across the range of manual disability treated by paediatric therapists (Manual Ability Classification System [MACS; Eliasson et al., 2006] Levels II-V) to complete the ChARM and return it to us. Therefore, the paediatric therapy teams posted a ChARM, an information sheet, and a prepaid, addressed envelope to the parents of each child on their caseload who met these criteria, so that parents could return the ChARM directly to the research team. A web-based version of the questionnaire was available for parents who preferred to submit responses online. Both versions included a section for parents to give details of their child for the purposes of investigating response bias, for example, gender, age, and manual ability. We also included a text box for parents to leave comments.
Following an initial Rasch analysis on this first draft of the ChARM, we modified the ChARM on the basis of the Rasch findings and posted the revised version (ChARM draft 2) back to the parents who had returned the first draft in order to perform a second Rasch analysis. To overcome the possibility that we would not receive a response from every family in the original cohort, therapy teams from two additional regional paediatric services posted out the questionnaire to the parents of children with cerebral palsy aged 5-16 years. We also used social media (e.g., Facebook and the message boards of Hemiplegia and Scope) to attempt to increase the sample of children with cerebral palsy for whom the ChARM would be completed.

| Rasch analysis
The Rasch analyses in this study were performed using RUMM2030 Version 5.4 for Windows, Copyright 1997-2012 RUMM Laboratory Pty Ltd. Masters' Partial Credit Model (unrestricted; polytomous or extended response category test format) was used because item responses varied in type and number between items (Masters, 1982).
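To make the choice of model more concrete, the following is a minimal Python sketch of how the partial credit model assigns probabilities to the response categories of a single item. It is not RUMM2030's implementation; the function name and the threshold values are hypothetical, and in practice the thresholds are estimated from the data.

```python
import math


def pcm_category_probabilities(theta, thresholds):
    """Return P(X = 0), ..., P(X = m) for one item under the partial credit model.

    theta: person location in logits.
    thresholds: the m threshold locations (in logits) of an item that has
        m + 1 ordered response categories. Items may differ in m.
    """
    # Category x has weight exp(sum over k <= x of (theta - threshold_k));
    # the sum for category 0 is empty, so it starts at 0.
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - tau))
    # Subtract the largest exponent before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical five-category item (four thresholds), person located at 0 logits.
example = pcm_category_probabilities(0.0, [-2.0, -1.0, 1.0, 2.0])
```

With a single threshold the function reduces to the dichotomous Rasch model, which is one way to sanity-check it.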
The analyses generate summary statistics illustrating mean person and item locations and the overall fit to the Rasch model based on a chi-squared test of fit. Additionally, two measures of internal consistency are available: the person separation index (PSI) and Cronbach's α. In order to power an adequate Rasch analysis, we required a minimum of 100 completed ChARMs to achieve 95% confidence of item calibration to within 0.5 logits (Linacre, 1994). We did not collect data on how many ChARMs were posted by therapy teams.
Individual item analysis includes an assessment of individual item fit (using chi-square and standardised fit-residual statistics), response category threshold ordering, response dependency, and item response bias (differential item functioning). Additionally, unidimensionality is investigated by identifying the two most divergent subsets of items within the first factor of a principal component analysis of the residuals, as described in Tennant and Conaghan (2007). Separate person estimates are generated for each of these divergent item subsets, and differences in the individual person estimates are evaluated using a series of t tests. The percentage of significant tests should not exceed 5%, and the lower bound confidence interval for a binomial test of proportions should overlap (i.e., be lower than) the 5% limit to indicate unidimensionality.
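The unidimensionality check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the RUMM2030 procedure: it assumes each person has a location estimate and standard error from each of the two item subsets, treats a difference as significant when |t| exceeds 1.96, and uses a normal approximation for the binomial confidence interval. The function name and inputs are hypothetical.

```python
import math


def unidimensionality_check(est_a, se_a, est_b, se_b, critical=1.96, limit=0.05):
    """Compare person estimates from two item subsets.

    Returns (proportion of significant t tests,
             lower bound of the approximate 95% CI for that proportion,
             whether the lower bound is at or below the limit).
    """
    n = len(est_a)
    significant = 0
    for a, sa, b, sb in zip(est_a, se_a, est_b, se_b):
        # t statistic for the difference between one person's two estimates
        t = (a - b) / math.sqrt(sa ** 2 + sb ** 2)
        if abs(t) > critical:
            significant += 1
    proportion = significant / n
    # Normal-approximation confidence interval for a binomial proportion (assumption)
    half_width = critical * math.sqrt(proportion * (1 - proportion) / n)
    lower = max(0.0, proportion - half_width)
    return proportion, lower, lower <= limit
```

Under this reading, a proportion above 5% is still compatible with unidimensionality when the lower confidence bound falls at or below 5%.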
Where disordered thresholds are present, amendments will be made by combining two or more adjacent response categories. Where evidence of response dependency or multidimensionality is present, items will be removed.
Once fit to the model is achieved, each deleted item will be individually reintroduced to the final item set to reevaluate the initial source of misfit.
To evaluate external construct validity of the ChARM, we hypothesised that there would be significant differences between mean logit scores of all children grouped by MACS (manual ability) level. To determine this, we planned to perform an analysis of variance on mean logit scores calculated for all children within each MACS level.
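The planned comparison can be illustrated with a minimal one-way analysis of variance in plain Python. The function name is hypothetical and the logit scores below are invented purely to show the calculation; they are not study data. The sketch returns only the F statistic and its degrees of freedom; obtaining a p value would additionally require the F distribution (for example from a statistics package).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented logit scores for three hypothetical MACS groups
f_stat, df_b, df_w = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```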

| Initial Rasch analysis
The initial Rasch analysis was conducted on a dataset from 170 ChARMs, each with 40 items, completed by the parents of children with cerebral palsy who were approached anonymously through the 12 regional therapy teams. This revealed a number of psychometric problems, for example, misfitting items and lack of unidimensionality.
We addressed these problems through a process that is described in more detail below. The initial Rasch analysis informed development of ChARM draft 2, which showed good fit to the Rasch model but also a large floor effect (greater than 20% of scores outside the range of the scale; Holmes & Shea, 1997). One parent of a MACS Level V child listed in the comments section 10 items that she suggested were missing but desirable for her child. Six of these items were relevant for both age range and gender and were added to ChARM draft 2 in an attempt to address the floor effect. This resulted in a questionnaire of 25 items, which was sent out to parents to obtain a new dataset on which to perform a second Rasch analysis and develop a final version of the ChARM. We received a completed 25-item ChARM draft 2 from 148 parents of children whose demographics and clinical details are given in Table 1. None of these responses resulted from the use of social media. All data were included in the psychometric testing.
Initial summary statistics for the ChARMs returned by the 148 parents are shown in Table 2 and indicated a degree of misfit to the Rasch model (chi-square statistic = 128.9, df = 50, p < .001). Initial fit statistics for items suggested that only item 2, an "easy" item involving an activity of "pressing a button or switch," displayed a significant misfit to the model.

| Threshold ordering
Five items initially displayed disordered response thresholds. To resolve this, we combined responses that illustrated disordered thresholds, using appropriate wording from each response to produce an ordered categorical response between the remaining unchanged categories. Figure 1 illustrates the threshold maps before and after addressing the disordered thresholds.
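The two steps involved, detecting disordered thresholds and rescoring an item after adjacent categories are combined, can be sketched in Python as follows. The function names, threshold values, and the 0-to-m scoring convention are assumptions made for illustration; the rewording of the combined response option is a separate, manual step.

```python
def thresholds_are_ordered(thresholds):
    """True if each threshold is located above the previous one (in logits)."""
    return all(lower < upper for lower, upper in zip(thresholds, thresholds[1:]))


def collapse_category(responses, category):
    """Merge `category` into the category directly below it and renumber.

    responses: item scores coded 0..m; `category` must be at least 1.
    """
    return [score - 1 if score >= category else score for score in responses]


# Hypothetical item with five categories whose middle thresholds are reversed
disordered = thresholds_are_ordered([-2.0, 0.5, -0.3, 2.0])
rescored = collapse_category([0, 1, 2, 3, 4], 2)
```

After rescoring, the item would be re-estimated and its thresholds checked again.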

| Local dependency and unidimensionality
A number of items displayed local dependence through a correlation of item residuals. All observed dependencies made conceptual sense, but the content of the dependent items did not lend itself to the items being combined into a single item with a broader response range.
We therefore resolved this issue in an iterative process by deleting the item that displayed dependency with more items than any other, and then the dependent items with the least favourable fit statistics.
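A simplified version of this iterative removal can be sketched in Python. This is an assumption-laden illustration rather than the authors' exact procedure: it takes a list of item pairs already flagged as dependent, repeatedly removes the item involved in the most remaining pairs, and breaks ties using the larger absolute fit residual. The function name, item labels, and fit values are hypothetical, and in a real analysis the model would be re-estimated after each deletion.

```python
def remove_dependent_items(dependent_pairs, fit_residuals):
    """Greedily choose items to delete until no dependent pair remains.

    dependent_pairs: iterable of (item, item) tuples flagged as locally dependent.
    fit_residuals: dict mapping each item to its standardised fit residual.
    Returns the items chosen for deletion, in order.
    """
    remaining = {frozenset(pair) for pair in dependent_pairs}
    removed = []
    while remaining:
        counts = {}
        for pair in remaining:
            for item in pair:
                counts[item] = counts.get(item, 0) + 1
        # Most dependencies first; ties go to the item with the poorer fit.
        worst = max(counts, key=lambda item: (counts[item], abs(fit_residuals[item])))
        removed.append(worst)
        remaining = {pair for pair in remaining if worst not in pair}
    return removed


pairs = [("A", "B"), ("A", "C"), ("D", "E")]
fit = {"A": 0.5, "B": 1.0, "C": 0.2, "D": 2.5, "E": 0.3}
to_delete = remove_dependent_items(pairs, fit)  # ["A", "D"]
```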
After the removal of six items, including the misfitting item 2, the ChARM displayed acceptable evidence of unidimensionality (only 8% of t tests were significant, with the lower bound of the 95% confidence interval at 4.6%).

| Item response bias
None of the items displayed item response bias at a Bonferroni-adjusted significance level. Bias was investigated for all items on the basis of age group (5-8 years old, 9-12 years old, or 13-16 years old), gender (male or female), distribution of arm impairment (unilateral or bilateral), learning difficulties, and visual impairment (present or not present).

| Final summary item fit statistics
Final summary statistics are shown in Table 2, and final item fit statistics are shown in Table 3 for the final 19 items.
Once the final psychometrically acceptable item set had been established, deleted items were reintroduced, one at a time, to the final item set to check that the initial misfit anomaly was still present. This was the case for all deleted items, so none were included in the final item set.
A person-item distribution map is shown in Figure 2. This sample size also provides at least 95% confidence of item calibration to within 0.5 logit, although given the good targeting parameters of the scale, it is likely that a 99% confidence of item calibration to within 0.5 logit has been achieved (Linacre, 1994).

| Reliability
The PSI indicates an internal consistency of 0.946 with extremes included and 0.951 without extremes (see Table 2). The Cronbach's α value of the final item set is 0.95, indicating good targeting of the item distribution.
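For readers unfamiliar with the second statistic, Cronbach's α can be computed from raw item scores as in the minimal Python sketch below. It assumes complete data (every person answers every item) and uses sample variances; the function name and the tiny dataset are hypothetical and are not study data. The PSI is not reproduced here because it depends on the Rasch person estimates and their standard errors.

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a complete persons-by-items table of scores.

    scores: list of rows, one per person, each a list of k item scores.
    """
    k = len(scores[0])

    def sample_variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    item_variances = [sample_variance([row[i] for row in scores]) for i in range(k)]
    total_variance = sample_variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Invented scores for three people on two items
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```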

| DISCUSSION
This study successfully developed a psychometrically robust measure of upper limb activity limitation specifically validated for children with cerebral palsy aged 5-16 years. The ChARM is unidimensional, has excellent reliability, and displays no response bias for gender, topography, age or learning disability, and no floor or ceiling effects. The sample size permitted a strong calibration of items (Linacre, 1994).
Post-development psychometric testing to establish the measurement properties of a new measure is essential (Hobart et al., 2007; Tennant, 2007), but nothing in this subsequent validation can rectify badly selected and inappropriate items (Streiner & Norman, 2003, p. 15). Defining and selecting items that adequately represent the characteristic to be measured are of critical importance (Wilson, 2005, p. 64). Diligent design of outcome measures may help to prevent limitations described above (Hobart et al., 2007), starting with careful consideration of the actual trait being measured (Hobart et al., 2007). Our strategy for developing items ensured that items would be valid and appropriate for a high proportion of children with cerebral palsy across the targeted age and manual ability range and crucially that would represent the single characteristic that was to be evaluated: changes to upper limb activity limitation, as defined by the World Health Organisation (World Health Organisation, 2002; World Health Organisation, 2016). We will evaluate responsiveness in a subsequent study.
Note. ChARM = Children's Arm Rehabilitation Measure; CI = confidence interval; PSI = person separation index; SD = standard deviation. Initial analysis of draft 2: initial summary statistics of Rasch analysis performed on one hundred forty-eight 25-item ChARM questionnaires returned for children described in Table 1. Final analysis of draft 2: final summary statistics of Rasch analysis on 19-item questionnaire after addressing psychometric issues.
We decided against a standard (identical) response format for each item and elected to include all potentially appropriate response options knowing that they would be evaluated empirically, using Rasch analysis to identify disordering of thresholds and demonstrate which response facilitates easy identification of the stages of achievement that a child has reached (Bobath, 1990); it avoids the potential uncertainty for the respondent of which response option to endorse that occurs with homogeneous item response options; and it overcomes the halo effect (when respondents endorse the same response category for each item; Streiner & Norman, 2003, p. 39). No parent reported taking longer than 4 min to complete the ChARM.

| Limitations
Despite the promising results, some study limitations are present.
The items were developed from goals of activity rehabilitation taken from 53 children. This was a smaller number than anticipated, given that the therapy teams involved covered well-populated areas potentially including up to 2,000 children with cerebral palsy, none of whom was excluded unless they fell outside the age range of 5-16 years. Possible reasons for this poor initial recruitment include the requirement for all participants in the early stages to give full, written, informed consent to take part in the study despite the low impact of the study on children's care. This has now been recognised by the National Health Service research ethics service, and proportionate review is now available for studies of this nature. However, the ethics committee removed the need for written consent because the data were all anonymised, and a return of the questionnaire to researchers by parents was considered by the ethics committee to imply that informed consent was given. Additionally