Psychometric properties of the WeeFIM in children with cerebral palsy in Turkey

Authors


Acknowledgements

This study was first presented as an oral presentation at the Society of Rehabilitation Research Summer Meeting, Leeds, UK, 3–4 July 2007.

Correspondence to Dr Ayşe A Küçükdeveci at Ankara 85 Sitesi, 176. sokak, No: 12, Beysukent, 06800 Ankara, Turkey. E-mail: ayse@tepa.com.tr

Abstract

The Functional Independence Measure for Children (WeeFIM) instrument has recently been adapted and validated for non-disabled children in Turkey. The aim of this study was to validate the instrument in children with cerebral palsy (CP). One hundred and thirty-four children with CP were assessed using the WeeFIM. Reliability was tested by internal consistency, intraclass and interrater correlation coefficients (ICCs), internal construct validity by Rasch analysis, and external construct validity by correlation with the Denver II Development Test (Denver II). Mean age of the participants (70 females, 64 males) was 4y 6mo (SD 3y 8mo, range 6mo–16y). CP type was: diplegia in 37.3%, hemiplegia in 20.2%, quadriplegia in 8.2%, ‘baby at risk’ (i.e. infants who show neuromotor delay but cannot be classified in a CP type) in 29.9%, and other in 4.5%. Reliability of the WeeFIM was excellent with high Cronbach’s alpha and ICC values ranging between 0.91 and 0.98 for the motor and cognitive scales. After collapsing response categories, both motor and cognitive scales met Rasch model expectations. Unidimensionality of the motor scale was confirmed after adjustment for local dependency of items. There was no substantive differential item functioning and strict unidimensionality for both scales was shown by analysis of the residuals. External construct validity was supported by expected high correlations with developmental ages determined by the social, fine motor function, language, and gross motor function domains of the Denver II. We conclude that the WeeFIM is a reliable and valid instrument for evaluating the functional status of Turkish children with CP.

List of abbreviations

DIF: Differential item functioning
FIM: Functional Independence Measure
WeeFIM: Functional Independence Measure for Children

A variety of models have been used to describe the health status of children with disabilities.1,2 One of these is the developmental disability model, where motor and cognitive delay are quantified after comprehensive assessments of developmental and behavioural processes. There are various scales that are traditionally used to determine the developmental status of children such as the Battelle Developmental Inventories,3 Capute Scales,4 the Denver II Development Test (Denver II),5 and the Vineland Adaptive Behavior Scales.6 These instruments make a discriminative assessment of a child’s developmental skills. They usually require comprehensive and time-consuming assessments with trained examiners. Despite these strengths, the developmental model does not adequately account for a child’s skills in performing daily-living activities in natural environments such as the home and community. Another model, which became popular in the past decade, is the functional disability model where the child’s level of independence in activities and participation in daily life are addressed. The main advantage of the functional disability model is that it is outcome-oriented, defines the needs of children for functioning, and can be used for decision making in rehabilitation.2,7 There are two commonly used functional assessment scales for children: the Pediatric Evaluation of Disability Inventory (PEDI)8 and the Functional Independence Measure for Children (WeeFIM).9 Each instrument measures functional abilities and limitations when performing activities of daily living, taking into account caregiver assistance and the use of special equipment. The PEDI includes 197 items with an administration time of 45 minutes; the WeeFIM includes 18 items with a 20-minute administration time. As the WeeFIM is shorter and quicker to administer it lends itself to assessment of functional outcome in paediatric rehabilitation.

The WeeFIM was adapted from the adult Functional Independence Measure (FIM), retaining the same structure as the original scale.9 It includes 18 items covering six areas: self-care (eating, grooming, bathing, dressing upper body, dressing lower body, toileting); sphincter control (bladder management, bowel management); transfer (chair/bed/wheelchair transfer, toilet transfer, tub/shower transfer); locomotion (crawling/walking/wheelchair, stair climbing); communication (comprehension, expression); and social cognition (social interaction, problem solving, memory). A 7-level ordinal rating system ranging from 7 (complete independence) to 1 (total assistance), is used to score performance in each item. As with the FIM, the WeeFIM consists of two dimensions: motor and cognitive.10 The motor scale includes self-care, sphincter control, transfer, and locomotion items; the cognitive scale includes communication, and social cognition items. The WeeFIM’s purpose is to be an evaluative measure of disability using a minimal essential data set that is discipline free (i.e. can be used by paediatricians, rehabilitation specialists, physiotherapists, nurses, etc.) and is designed to track outcomes across clinical settings. It is recommended for use in children older than 6 months. The WeeFIM can be administered through direct observation, interview, or a combination of observation and interview.9 Consequently the WeeFIM is being extensively used in paediatric rehabilitation including in children with cerebral palsy (CP), brain injury, spina bifida, genetic impairments, and paediatric burns.2,7,10,11

The WeeFIM has recently been adapted and validated for non-disabled children in Turkey.12 As the instrument is intended to be used in rehabilitation settings for children with CP, it was felt that validation of the scale for this group would be necessary. Therefore, the aim of this study was to investigate the psychometric properties of the WeeFIM in children with CP in Turkey. To achieve this aim, the reliability of the instrument was tested first; internal construct validity of the scale was then assessed by modern psychometric methods based on Rasch analysis. Rasch analysis is increasingly used to evaluate the integrity of scales used for outcome measurement, including key aspects such as unidimensionality, functioning of item categories, and item bias. Last, external construct validity was tested by traditional statistical methods.

Method

Participants

One hundred and thirty-four children with CP, attending the outpatient clinic at the Department of Physical Medicine and Rehabilitation, Medical Faculty of Ankara University, Turkey were recruited for the study. All children had a confirmed medical diagnosis and were receiving therapy or follow-up services in special training and education centres for children with developmental disabilities. Written informed consent was obtained from the primary caregivers of the participants. The study was carried out in compliance with the Helsinki Declaration and was submitted to, and acknowledged by the Research Ethics Committee of Ankara University Faculty of Medicine, as required in Turkey for a non-intervention study of this kind.

Assessment

Assessment included the administration of the WeeFIM and the Denver II by the same physiatrist (BST). Additionally, a subgroup of 28 children were rated on the WeeFIM by a second physiatrist (GY) on the same day. The WeeFIM used in the present study was the previously adapted version.12 Both physiatrists applying the WeeFIM were trained in rating the instrument. Administration of the WeeFIM was performed, as stated in the guidelines, by both direct observation of the children and interviews with primary caregivers, 90% of whom were mothers, 7% fathers, and 3% other relatives.

The Denver II was used for the evaluation of the developmental status of the children. It is a 125-item test which assesses children’s development in four areas: social (aspects of socialization inside and outside home); fine motor functions (eye-hand coordination, manipulation of small objects); language (production of sounds, ability to recognize, understand, and use language); and gross motor functions (motor control, sitting, walking, jumping, and other movements).5 Results of assessment in each area of the test are assessed by developmental age and graded as being ‘failure’, ‘suspicious’, or ‘successful’. The Denver II was validated for the Turkish paediatric setting13 and it is the most commonly used developmental test in Turkey. The test is used as part of a routine clinical procedure for the assessment of children with developmental disabilities in their current setting. The physiatrist performing the assessment was licensed for administering the validated Turkish version of the test.

Reliability

Reliability of the WeeFIM was tested in a number of ways. First, the internal consistency of the instrument was tested. The internal consistency of an instrument is an estimate of the degree to which its constituent items are interrelated and is assessed by Cronbach’s alpha coefficient.14 In addition, for overall reliability, the intraclass correlation coefficient (ICC 2,1) was calculated.15 Reliability was further tested by the person separation index (PSI) from the Rasch analysis (see below). Finally, interrater reliability was investigated through the ICC. Usually, a reliability of 0.70 is required for analysis at group level and values of 0.85 and higher for individuals.15
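As an illustration of the two classical coefficients named above, the following Python sketch computes Cronbach's alpha from a children-by-items score matrix and ICC(2,1) from a children-by-raters matrix, using the standard two-way random-effects, absolute-agreement, single-measure formula. The function names and the data layout are assumptions made for this example only; the analyses reported in this study were run in SPSS, and this sketch is not a reproduction of that procedure.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per child, one value per item)."""
    k = len(scores[0])
    # Sample variance of each item across children.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the summed (total) score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_2_1(ratings):
    """ICC(2,1) for a list of rows (one row per child, one value per rater)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: children (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because ICC(2,1) includes the rater variance term in its denominator, a systematic difference between raters lowers the coefficient, which is why it suits an interrater comparison of the kind described here.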

Internal construct validity

The internal construct validity of the scale was assessed by Rasch analysis. The Rasch methodology adopted in this study has been widely used for analysis of the FIM.16 The analytical approach is described in detail elsewhere17 (Appendix I, supporting information published online). Data were fitted to the Rasch model using the RUMM2020 software.18

External construct validity

External construct validity of the WeeFIM was assessed through convergent validity with the Denver II. Convergent validity refers to the assumption that different measures of similar hypothetical constructs ought to correlate highly with one another if the measures are valid.15 Although developmental status and functional status are not the same constructs, high associations are expected between the similar sections of these two constructs.8 For example, a high correlation is expected between the WeeFIM cognitive section and the language ability section of a developmental test. Although some questions were raised at the outset of the study regarding the specificity of the Denver II, this test was used in the present study because it was the only validated developmental test available in Turkish.
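Convergent validity of this kind is quantified in this study with Spearman's rank correlation. The sketch below shows one way to compute it in plain Python: values are converted to ranks (tied values share their average rank) and a Pearson correlation is taken on the ranks. The function names are hypothetical, and this is an illustration of the statistic rather than the SPSS routine used by the authors.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

A rank-based coefficient is a natural choice here because WeeFIM ratings are ordinal and the relation with developmental age need not be linear.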

Sample size and statistical analysis

The sample size of 134 used in the current study gave 99% confidence of the person-estimate (e.g. the latent estimate of motor ability) being within 0.7 SD logits of its true value.19 This sample size was also sufficient to test for differential item functioning (DIF) where, at an alpha of 0.01, a difference of 0.5 SDs within the residuals can be detected for any two groups with a β of 0.20. Bonferroni corrections were applied to both fit and DIF statistics due to the number of tests undertaken.20 Consequently a value of 0.004 was used throughout. The Wilcoxon signed rank test was used for evaluating change over time and Spearman’s rank correlation was used for associations between instruments.
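The Bonferroni adjustment divides the nominal significance level by the number of tests. The short sketch below is illustrative only: the paper reports the adjusted value of 0.004 but does not state the nominal level or the number of tests, so the inputs used here (a nominal alpha of 0.05 and 13 tests, one per motor item) are assumptions chosen because they give a value that rounds to 0.004.

```python
def bonferroni_alpha(nominal_alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return nominal_alpha / n_tests


# Assumed inputs (not stated in the paper): alpha = 0.05, 13 tests.
adjusted = bonferroni_alpha(0.05, 13)
print(round(adjusted, 3))  # prints 0.004
```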

Statistical analysis was undertaken using SPSS-11 for Windows.

Results

Characteristics of children

The 134 children (70 females, 64 males) had a mean age of 4y 6mo (SD 3y 8mo, range 6mo–16y, median 41mo). Age distribution was: 29.1% less than 18 months, 11.2% 18 to 30 months, 23.1% 31 to 60 months, 12.0% 61 to 84 months, and 24.6% more than 84 months. CP type was diplegia in 37.3%, hemiplegia in 20.2%, quadriplegia in 8.2%, ‘baby at risk’ in 29.9%, and other in 4.5%. Preterm or low-birthweight infants who show neuromotor delay or abnormalities, but who cannot be classified in a certain type of CP according to motor or topographic involvement, are defined as ‘baby at risk’.21,22 Most of the group was classified as ‘failure’ or ‘suspicious’ according to the four areas of the Denver II (Table I).

Table I. Developmental state of participants according to Denver II Development Test5

                 Denver II   Denver II    Denver II   Denver II
                 Social      Fine motor   Language    Gross motor
Failure, %       75          60           51          88
Suspicious, %    9           7            4           5
Successful, %    16          33           45          7

Mean age of the subgroup of children (n=28; 16 females, 12 males) used for interrater reliability was 4y 8mo (SD 3y 2mo, range 6mo–12y 10mo, median 4y 6mo). CP type was diplegia (n=13), hemiplegia (n=8), quadriplegia (n=2), and ‘baby at risk’ (n=5).

Reliability

Internal consistency (Cronbach’s alpha), ICC, and PSI values of the WeeFIM motor and cognitive scales were high (>0.90) and consistent for individual use (Table II). Interrater reliability was excellent with ICC values of 0.98 and 0.93 for the motor and cognitive scales respectively.

Table II. Reliability of the Functional Independence Measure for Children (WeeFIM)9

                         Internal consistency    ICC (95% CI)       PSI      Interrater reliability
                         (Cronbach’s α) n=134    n=134              n=134    ICC (95% CI) n=28
WeeFIM Motor scale       0.93                    0.91 (0.88–0.93)   0.93     0.98 (0.95–0.99)
WeeFIM Cognitive scale   0.98                    0.97 (0.96–0.98)   0.99     0.93 (0.85–0.97)

ICC, intraclass correlation coefficient; CI, confidence interval; PSI, person separation index.

Internal construct validity

Initial analysis of the motor scale showed poor fit to model expectations (Table III, analysis 1). The response categories of most items did not reflect an increasing level of the underlying trait, as would be expected when the scale worked properly. For example, the transition (threshold) between categories one and two was located at a higher level of independence than the transition between categories five and six (e.g. see Fig. 1). This distortion is referred to as ‘disordered thresholds’ and thus categories were collapsed to remove this disordering, which resulted in improved fit (Table III, analysis 2). Reliability (PSI) was high at 0.93. Little DIF or item bias was observed in the data and, where present, was limited to the walking and stairs items, the former of which displayed DIF for age and type of CP, the latter for type of CP (e.g. at the same level of motor ability, the response to the item varied by type of CP). However, independent t-test analysis indicated persisting multidimensionality in the data.
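To make the idea of ordered and disordered thresholds concrete, the sketch below evaluates category probabilities under the polytomous Rasch (partial credit) parameterisation and checks whether an item's thresholds increase monotonically. The threshold values are invented for illustration and are not estimates from this study; the function names are hypothetical, and the sketch does not reproduce the RUMM2020 estimation procedure. WeeFIM ratings 1 to 7 correspond to categories 0 to 6 here.

```python
import math

# Hypothetical thresholds (logits) for a 7-category item; not study estimates.
ORDERED = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
# Here the first threshold lies above the fifth, i.e. the thresholds are disordered.
DISORDERED = [1.5, -2.0, -1.0, 0.0, -0.5, 2.0]


def category_probabilities(theta, thresholds):
    """Probability of each category 0..m for a person at ability theta (logits)."""
    # Category x has log-weight equal to the sum of (theta - tau_k) for k = 1..x.
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - tau))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_are_ordered(thresholds):
    """True when every threshold is strictly higher than the one before it."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))
```

With ordered thresholds, each category is the most probable response over some interval of ability; with disordered thresholds at least one category never is, which is the usual motivation for collapsing adjacent categories.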

Table III. Fit to Rasch model expectations of Functional Independence Measure for Children (WeeFIM)9 motor and cognitive scales

Analysis                          n     Item residual     Person residual   χ2 value   p        PSI      % t-tests out of
                                        Mean     SD       Mean     SD                                    range (95% CI)
WeeFIM Motor Scale
1. Initial                        125   −0.504   1.278    −0.420   0.850    102.1      <0.001   0.941    12.8 (8–17)
2. After rescoring                94    −0.781   1.086    −0.415   0.824    33.5       0.1483   0.930    12.8 (8–17)
3. Testlets making three items    94    −0.273   0.972    −0.223   0.590    12.1       0.0285   0.929    1.0
WeeFIM Cognitive Scale
4. Initial                        83    −0.492   0.870    −0.264   0.614    4.4        0.4907   0.9883   4.8
5. After rescoring                83    −0.530   0.915    −0.251   0.675    4.0        0.5451   0.9879   2.4

PSI, person separation index; CI, confidence interval.

Figure 1. Category probability curves of eating item from Functional Independence Measure for Children (WeeFIM)9 motor scale

Further analysis of the 13-item motor scale showed high correlations among the residuals, indicating local dependence (potential redundancy) among the underlying item sets designated as self-care, sphincter control, and mobility (transfer and locomotion). These sets of items were grouped into three ‘super items’ or testlets, reflecting these three underlying sets, and the data were then refitted to the Rasch model. Fit to model expectation was good and data were strictly unidimensional (t-tests 1%), suggesting that the original misfit was a result of local dependence (Table III, analysis 3).
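Forming testlets amounts to summing the item scores within each locally dependent set so that each set enters the Rasch analysis as a single polytomous ‘super item’. The sketch below groups the 13 motor items into the three sets named above. The item keys are hypothetical labels chosen for this example, and raw 1 to 7 ratings are used for simplicity, whereas the study applied testlets after rescoring.

```python
# Item keys are illustrative labels for the 13 WeeFIM motor items.
TESTLETS = {
    "self_care": ["eating", "grooming", "bathing",
                  "dressing_upper", "dressing_lower", "toileting"],
    "sphincter_control": ["bladder", "bowel"],
    "mobility": ["transfer_chair", "transfer_toilet", "transfer_tub",
                 "locomotion", "stairs"],
}


def to_testlets(item_scores):
    """Sum one child's item scores (a dict keyed by item) within each testlet."""
    return {name: sum(item_scores[item] for item in items)
            for name, items in TESTLETS.items()}


# Example: a child rated 7 (complete independence) on every motor item.
all_items = [item for items in TESTLETS.values() for item in items]
print(to_testlets({item: 7 for item in all_items}))
# prints {'self_care': 42, 'sphincter_control': 14, 'mobility': 35}
```

Summing within a set absorbs the dependence between its items into one score, so the remaining residual correlations are between sets rather than within them.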

Data from the cognitive scale initially failed to fit the Rasch model (Table III, analysis 4), but after rescoring for disordered thresholds of the response categories the scale met model expectations (Table III, analysis 5). Reliability was high at 0.99. No DIF was found for any item. The number of significant independent t-tests for the cognitive scale was low (2.4%), supporting the unidimensionality of the scale.
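The percentage of significant t-tests reported in Table III can be understood through the following sketch. It assumes the commonly described form of the test: each child receives two ability estimates from two different subsets of items, the estimates are compared with a t statistic based on their standard errors, and the proportion of children with a significant difference is reported with a confidence interval. The exact procedure used in this study is given in Appendix I and is not reproduced here; the function name, the 1.96 cut-off, and the normal-approximation interval are assumptions of this example.

```python
import math


def proportion_significant_t(est_a, se_a, est_b, se_b, critical=1.96):
    """Proportion of persons whose two ability estimates differ significantly.

    Returns (proportion, lower, upper), where the bounds are a
    normal-approximation 95% interval clipped to [0, 1].
    """
    n = len(est_a)
    flagged = 0
    for a, sa, b, sb in zip(est_a, se_a, est_b, se_b):
        # t statistic for the difference between the two estimates.
        t = (a - b) / math.sqrt(sa ** 2 + sb ** 2)
        if abs(t) > critical:
            flagged += 1
    p = flagged / n
    half_width = critical * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

Under this reading, a scale is usually taken to be unidimensional when the proportion of significant tests, or at least the lower bound of its interval, does not exceed 5%.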

External construct validity

The correlations of the WeeFIM scales with the four areas of the Denver II were as expected, the strongest (r=0.94) being between the WeeFIM cognitive scale and the Denver II language section, and the least strong (r=0.71) between the WeeFIM cognitive scale and the Denver II gross motor function section (Table IV).

Table IV. Correlations of Functional Independence Measure for Children (WeeFIM)9 with Denver II Development Test5 presented as Spearman’s r (n=134)

                    Denver II   Denver II    Denver II   Denver II
                    Social      Fine motor   Language    Gross motor
WeeFIM Motor        0.86        0.85         0.85        0.85
WeeFIM Cognitive    0.84        0.87         0.94        0.71

Discussion

The present study investigated the reliability and the validity of the Turkish version of the WeeFIM in children with CP. The scale was found to be reliable with high internal consistency, overall reliability, and interrater reliability. Cronbach’s alpha values were found to be 0.93 and 0.98 for the WeeFIM motor and cognitive scales respectively, comparable to 0.99 which was reported in a sample including 573 normally developing children.12 The same study reported the overall ICC values as 0.81 and 0.91 for the motor and cognitive scales respectively, whereas they were 0.91 and 0.97 in the present study. The only PSI value previously reported in a group of children with disabilities was 0.94 for the motor scale which was in concordance with our finding of 0.93.10 Interrater reliability ICCs of 0.98 and 0.93 for motor and cognitive scales respectively, were also consistent with earlier studies. In a study including 205 children with disabilities, the interrater ICC of the WeeFIM was reported as 0.96.23 Similarly, ICCs of domain scores ranged from 0.90 to 0.99 in 569 normally developing Thai children.24 In another study of typically developing Japanese children, interrater ICCs were found to be 0.99 and 0.98 for motor and cognitive scales respectively.25

Regarding internal construct validity, the WeeFIM motor scale in its present form did not satisfy the requirements of the Rasch measurement model. Collapsing of item categories was necessary, which improved fit but multidimensionality remained. Further analysis suggested that local dependency was present which would have a considerable effect on parameter estimates as well as inflating classical reliability estimates. Once this was adjusted for by creating testlets, all 13 items (presented as three testlets) remained in the scale, which was strictly unidimensional. Thus, it appears that local dependency was affecting parameter estimates in such a way as to give the appearance of multidimensionality. This contrasts with previous findings which showed that either the motor scale split into three subscales, or that the sphincter items must be removed to satisfy Rasch model requirements.10,26 Consequently, those with larger data sets for the WeeFIM (and indeed the FIM) may wish to revisit their analysis to see if local dependency was affecting results. Similarly, the WeeFIM cognitive scale required collapsing of categories. After rescoring the item categories, the cognitive scale was found to be a unidimensional Rasch scale without showing any DIF.

This is the first WeeFIM Rasch study reporting on data of children with CP only. Another Rasch study also reported on children with disabilities but only 11% of the group had CP.10 Similar to the findings of this earlier study, our results confirmed the internal construct validity of the motor and cognitive scales of the WeeFIM.

External construct validity of the WeeFIM was investigated through its convergent validity with Denver II. Although better developmental screening tools, such as the Battelle Developmental Inventories,3 Capute Scales,4 or Ages and Stages Questionnaire,27 are currently available, Denver II is the only developmental test with a validated Turkish version.13 The least strong correlation (r=0.71) was observed between the WeeFIM cognitive scale and the Denver II gross motor section, whereas the strongest correlation (r=0.94) was between the WeeFIM cognitive scale and the Denver II language section. Other correlations were all nearly 0.85. These correlations with developmental ages were as expected, confirming the convergent validity of the scale. In a study investigating the association between functional status and the language tests, medium to high correlations (0.75–0.77) with the WeeFIM total scores were reported.28 In another study, cognitive domains of the WeeFIM had high correlations (0.75–0.88) with cognitive, communication, and social sections of Battelle Developmental Inventory Screening Test and the Vineland Adaptive Behavior Scales, but moderate correlations (0.53–0.76) with sections measuring motor development. In the same study, motor domains of the WeeFIM were highly correlated (0.77–0.88) with motor developmental sections, whereas lower correlations (0.40–0.80) were demonstrated with cognitive, communicative, and social developmental sections.2 It was also reported that high correlations (0.82–0.84) were found between the self-care and mobility sections of another functional assessment scale, PEDI, and adaptive and gross motor domains of the Battelle Developmental Inventory Screening Test.8 The relations reported in all of those studies are in concordance with our external construct validity findings.

One point of concern in the present study is the number of extreme cases, namely those at the floor of the motor scale. Over nine out of ten cases (91%) at the floor of this scale were children aged less than 41 months. Thus, over two-fifths of all children examined (41%) were at the floor of the motor scale, despite adjustments to the scoring instructions to include motor aspects such as crawling. This needs to be examined further, to determine if developmental delay makes this a valid observation, or to see if the scoring instructions are sensitive enough to grade the youngest children. Otherwise, this may be a limitation of applying generic motor measures to children with CP, as opposed to using a measure constructed specifically for this group.

Although we did observe some DIF for the walking item, the absence of DIF for other items is contrary to that found in previous studies10,12 and suggests that items are invariant in their hierarchical ordering, at least across the crude age groups used in this study. Nevertheless, it is possible that developmental delay in these children with CP is making the age groups more homogeneous than otherwise might be the case.

There are some weaknesses in the current study. The sample size is low and this may have contributed to the necessity to collapse categories. At the present time, because of this, we recommend the existing scoring structure be retained until further evidence accrues to support the need for re-structuring the response options. Similarly, the 13 items of the motor scale should be used as in the original version. Although we identified local dependency among these items, and subsequently grouped them into three testlets (super items), there is always a tension between the clinimetric and psychometric demands placed upon instruments. Each item contains potentially important information for the clinical management of these children, and in this context, replication of items (redundancy) may not be of concern. Any local dependence in these items can be accommodated, post hoc, as we have done. However, if a shorter set of items were sought, for whatever reason, then this redundancy would be taken into account in item selection.

The independent t-test we used to confirm unidimensionality has been shown to be very effective at detecting multidimensionality in data.29 However, it loses its power when the number of thresholds being used to make estimates is very low and the differences in estimates, which form the basis of this test, are less likely to be normally distributed. The small sample size also restricted our ability to examine DIF in fine detail, for example, at smaller age groups. Finally, the sample size also precludes providing an exchange rate between the raw score and the interval scale estimate (Rasch-transformed score) at the present time.

Conclusion

The WeeFIM is a reliable and valid scale for assessing motor and cognitive function in children with CP, although the floor effect suggests that validity is weaker with the youngest children where most of this effect occurs. From a measurement perspective there is some redundancy in the motor scale item set, but clinical management considerations may override this and, in any case, grouping items for analytical purposes into the underlying clinical subscales, such as self-care, removes the disturbance caused by this dependency.

Disclaimer

The patient data collected during the course of this research study has not been processed by Uniform Data System for Medical Rehabilitation (UDSMR). No implication is intended that such data has been or will be subjected to UDSMR’s standard data processing procedures or that it is otherwise comparable to data processed by UDSMR, nor have researchers been trained or credentialed in the use or implementation of the WeeFIM by UDSMR.

FIM and WeeFIM are trademarks of Uniform Data System for Medical Rehabilitation, a division of UB Foundation Activities, Inc.
