To revise the content of the Functional Index in myositis (FI) and to evaluate measurement properties of a revised FI.
Previously performed FI (n = 287) were analyzed for internal redundancy and consistency, and ceiling and floor effects. Content was evaluated and a preliminary revised FI was developed. To evaluate the construct validity of the preliminary revised FI, it was compared with isokinetic measurements of muscular strength and endurance, the Myositis Activities Profile, disease impact on general wellbeing, and creatine phosphokinase levels. Minor adjustments were made and the revised FI was investigated for interrater reliability and intrarater reliability over a 1-week period. After this, some minor, additional adjustments were made leading to the final version, FI-2.
Five tasks were removed from the original FI due to ceiling effects. Performance pace and number of repetitions were modified for the remaining tasks. A moderate correlation (rs = 0.58) was found between the shoulder flexion task of the preliminary revised FI and isokinetic measurements of shoulder flexion endurance. Intraclass correlation coefficient (ICC) for interrater reliability of the revised FI varied from 0.86–0.99 with no systematic differences. ICC for intrarater reliability varied from 0.56–0.99 with systematic differences (P < 0.05) between test and retest in 3 of the tasks. The sit-up task was excluded due to low intrarater reliability resulting in the final 7-item FI-2. There was a good correlation between tasks on the right and left side suggesting that the FI-2 could be performed unilaterally.
The FI-2 is a valid and reliable outcome measure of impairment for patients with polymyositis or dermatomyositis. It is well tolerated and the unilateral FI-2 requires a maximum of 21 minutes to perform. Further evaluation of sensitivity to change and testing in healthy individuals needs to be conducted.
Polymyositis (PM) and dermatomyositis (DM) are idiopathic systemic inflammatory diseases (1) that are clinically characterized by symmetrical muscle weakness together with fatigue and in some cases also lung fibrosis and myalgia (1–4). Both muscle impairment and lung fibrosis may lead to activity limitations such as difficulties walking uphill, climbing stairs, or washing hair (3, 5).
There are few valid and reliable assessments to evaluate disability in patients with PM and DM. Recently, the International Myositis Assessment and Clinical Studies Group (IMACS) proposed a 6-domain disease activity core set to be used in clinical trials in patients with myositis: physician global disease activity, patient/parent global disease activity, muscle strength by manual muscle test (MMT), activity limitation by the Stanford Health Assessment Questionnaire (HAQ) (6), laboratory assessments of muscle enzymes, and assessment of extra-skeletal involvement (7). These assessments were developed in adult and juvenile patients with myositis followed by an initial validation in adult patients (8).
The MMT and kinetic systems or Cybex devices have been used in trials to assess isometric muscle strength in myositis (9–14). However, the MMT could have limited measuring reliability in patients with mild impairment (15). Isokinetic measurements were used in one trial including patients diagnosed with inclusion body myositis (IBM) (12), but have not yet been used in patients with PM and DM. These systems are costly and not feasible in daily clinical practice. Another limitation of these measurements is that they measure strength, although patients with PM and DM often report impaired muscle endurance in addition to muscle weakness.
The Functional Index in myositis (FI) was the first functional impairment outcome measure developed specifically for patients with PM and DM (16). The FI is based on repetitive movements involving selected muscle groups to capture decreased muscle endurance and was reliable and validated regarding its ability to discriminate patients from healthy individuals. The FI is useful in assessing patients with moderate to severe impairment (17, 18). However, it is time-consuming, some tasks might be inadequate, and ceiling effects have been observed in patients with mild impairment (17, 18).
The objective of this study was to revise the content of the original FI, to develop a revised FI, to establish content and construct validity of the revised FI, and also to investigate its interrater and intrarater reliability.
Patients with PM or DM were recruited from the rheumatology clinics at the Karolinska University Hospital, Stockholm and the Sahlgrenska University Hospital, Göteborg, Sweden. Inclusion criteria were a diagnosis of PM/DM according to Bohan and Peter (19), ability to understand the Swedish language well, and ability to perform the FI (16). Exclusion criteria were a diagnosis of IBM, pulmonary hypertension, or severe osteoporosis. The process of scrutinizing the FI and also the development and evaluation of the revised FI was performed in sequences, i.e., each part of the study was designed based on the results from the previous one. The study was performed as 5 parts with the patients included in each part defined as a cohort. A total of 78 patients divided into 5 cohorts were consecutively included for the different parts of this study (Table 1).
|Characteristic||Cohort 1 n = 53||Cohort 2 n = 4||Cohort 3 n = 25||Cohort 4 n = 13||Cohort 5 n = 13|
|Age, years||NA||54 (53–56)||55 (28–73)||53 (27–69)||54 (27–76)|
|PM/DM diagnosis, no.||32/21||3/1||16/9||7/6||8/5|
|SLE/SSc/SS diagnosis in addition to PM/DM†, no.||1/1/2||0/0/0||1/2/1||0/0/0||0/0/0|
|Diagnosis duration, years||3.0 (0.0–21.0)||4.0 (2.4–5.0)||5.0 (1.5–29.0)||3.3 (0.2–18.0)||4.2 (2.0–18.1)|
|Prednisone, mg/day||NA||NA||3.8 (0.0–20.0)||5.0 (0.0–30.0)||0.0 (0.0–15.0)|
The original FI comprised a total of 14 tasks in the following consecutive order: grip strength, elbow flexion and shoulder flexion with 1-kg weight cuffs around the wrists, shoulder abduction, hip flexion, hip abduction, step test, heel lifts, toe lifts, head lifts, and sit-ups. Scoring was based on the number of correctly performed repetitions. Maximum score for each task was 5 (maximal capacity). The FI also included transfer from side-to-side in a lying position, transfer to a sitting position, and Peak Expiratory Flow (PEF), each with a maximum score of 3 (maximal capacity). The total FI score varied from 0 (minimum capacity) to 64 (maximal capacity) per body side (16).
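The scoring arithmetic described above can be checked with a short sketch. The task names are taken from the text; the list layout and the function name `max_total_score` are illustrative assumptions and not part of the original instrument.

```python
# Tasks of the original FI scored 0-5, as listed in the text.
FIVE_POINT_TASKS = [
    "grip strength", "elbow flexion", "shoulder flexion",
    "shoulder abduction", "hip flexion", "hip abduction",
    "step test", "heel lifts", "toe lifts", "head lifts", "sit-ups",
]

# Tasks of the original FI scored 0-3, as listed in the text.
THREE_POINT_TASKS = [
    "transfer side-to-side", "transfer to sitting",
    "peak expiratory flow",
]


def max_total_score() -> int:
    """Return the maximal total score per body side (hypothetical helper)."""
    return 5 * len(FIVE_POINT_TASKS) + 3 * len(THREE_POINT_TASKS)


if __name__ == "__main__":
    print(len(FIVE_POINT_TASKS) + len(THREE_POINT_TASKS), "tasks")
    print("maximal total score:", max_total_score())
```

Eleven tasks with a maximum of 5 and three tasks with a maximum of 3 give 55 + 9 = 64, which agrees with the stated total.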
Maximal voluntary right shoulder flexor and right knee extensor dynamic muscle strength and endurance were evaluated using an isokinetic dynamometer (Kin-Com 500H, Chattecx, Chattanooga, TN). Dynamic right shoulder flexor strength was measured in a range from 20°–120°, where 0° indicated that the arm was aligned with the trunk. Dynamic right knee extensor strength was measured in a range from 90°–30°, where 0° indicated a straight leg. The tests were performed in concentric actions at a velocity of 90°/second, with passive return to the start position at 60°/second. Start force was set to 50% of the isometric mean peak torque of 3 maximal contractions in the respective start positions. Maximal dynamic strength was evaluated by 3 concentric contractions with a 1-minute rest in between, a procedure with good reproducibility (20, 21). Mean torque (MT) of the 3 contractions in a range from 40°–115° and 70°–35° in shoulder and knee, respectively, was determined (22).
After a 5-minute rest, the dynamic muscle endurance tests were performed in the same settings. Muscle endurance was defined as the number of maximal repetitions performed to exhaustion. After completion of each test, the patients rated perceived muscular exertion using the Borg CR-10 scale. The Borg CR-10 scale is a category scale with ratio properties ranging from 0 = nothing at all to 10 = very, very strong—almost maximal (23). The number of repetitions performed at a level above 40%, 50%, 60%, and 70% of MT was calculated for further analyses.
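As an illustration of the last step, the following sketch counts the repetitions of an endurance test that exceed each percentage of MT. The per-repetition torque values and the function name `repetitions_above` are hypothetical; the source does not describe how the dynamometer output was processed, and "above" is read here as strictly greater than the threshold.

```python
from typing import Dict, Sequence


def repetitions_above(
    torques: Sequence[float],
    mean_torque: float,
    percentages: Sequence[int] = (40, 50, 60, 70),
) -> Dict[int, int]:
    """Count repetitions whose torque exceeds each percentage of MT.

    torques      -- torque of each endurance repetition (assumed input)
    mean_torque  -- MT from the 3 maximal strength contractions
    percentages  -- threshold levels, in percent of MT
    """
    counts = {}
    for pct in percentages:
        threshold = pct * mean_torque / 100
        counts[pct] = sum(1 for t in torques if t > threshold)
    return counts


if __name__ == "__main__":
    # Invented torque values for a fatiguing series of repetitions.
    example = [100, 80, 65, 55, 45, 30]
    print(repetitions_above(example, mean_torque=100))
```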
The Myositis Activities Profile (MAP) is a disease-specific questionnaire assessing activity limitation in patients with PM and DM. The MAP is divided into 4 subscales, Movement, Moving around, Self-care, and Domestic; and 4 single items, Social, Avoiding overexertion, Work/school, and Leisure. Each item is scored on a 7-grade scale from 1 to 7; where 1 = no difficulty to perform, and 7 = impossible to perform. The median of all items included constitutes the score of each subscale (24).
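The subscale scoring rule can be sketched as follows. The item ratings are invented for illustration, and the function name `subscale_score` is an assumption; the assignment of items to subscales is defined in the MAP itself (24) and is not reproduced here.

```python
from statistics import median
from typing import Sequence


def subscale_score(item_scores: Sequence[int]) -> float:
    """Return the median of the item scores in one MAP subscale.

    Each item must lie on the 7-grade scale, where 1 = no difficulty
    to perform and 7 = impossible to perform.
    """
    if not item_scores:
        raise ValueError("a subscale needs at least one rated item")
    for score in item_scores:
        if not 1 <= score <= 7:
            raise ValueError(f"item score {score} is outside 1-7")
    return median(item_scores)


if __name__ == "__main__":
    # Hypothetical ratings for the items of a single subscale.
    print(subscale_score([1, 2, 4]))
```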
The impact of disease on general wellbeing during the past week and during the past 6 months, respectively, was rated using 0–100-mm visual analog scales, where 0 = no impact on wellbeing, and 100 = maximal impact on wellbeing (25).
Serum creatine phosphokinase (CPK) levels were used as a surrogate marker for disease activity (normal values for women < 3.5 μkat/l, and for men < 4.7 μkat/l).
The content of the original FI was used as the basis for the revised version. A total of 287 FI assessments, performed in clinical practice and in 2 exercise trials (17, 18) between 1996 and 2000, were analyzed. Cohort 1, consisting of 53 patients with various stages of disease, was used in the analysis. Each task of the FI and the total FI scores were checked for possible floor or ceiling effects. The FI was then divided into 3 subscales (upper limbs, lower limbs, and neck and trunk), which were analyzed for internal redundancy and internal consistency. The results were presented to a group of health professionals with experience in treating patients with PM and DM. They commented on the results and the relevance of each individual FI task. Cohort 2 comprised 4 patients, all of whom were from cohort 1 and had previous experience of performing the FI. These patients were included consecutively as they visited the clinic for regular check-ups during April 2001. They performed the FI and were invited to comment on the relevance of each task and to suggest additional ones. A preliminary revised FI was then developed.
The construct validity of the preliminary revised FI was then assessed in 25 patients in cohort 3 (19 from cohort 1 and another 6 patients at the rheumatology clinic meeting the inclusion criteria and accepting participation in the study, all with various stages of disease). Our hypotheses were that the shoulder flexion and knee extension tasks of the preliminary revised FI should be more convergent with the isokinetic endurance measures of the shoulder flexion and knee extension than with the isokinetic strength measures. They were also hypothesized to converge slightly with activity limitation (the MAP), but to be divergent from disease activity (CPK levels) and participation restriction (disease impact on wellbeing). Possible internal redundancy, floor, and ceiling effects were also analyzed. The patients in cohort 3 visited the clinic on 3 occasions. On the first visit, the cohort received instruction on the isokinetic measurement procedure. On the second visit the patients performed the preliminary revised FI, filled out the MAP, and rated disease impact on their wellbeing. While performing the revised FI the patients rated perceived muscular exertion and were monitored with a heart rate display to evaluate the exercise intensity of each single task. During the third visit isokinetic measurements of muscle strength and endurance were performed. The preliminary revised FI and isokinetic measurements were performed within the same week, and the learning session was scheduled 1 week prior. The same physical therapist instructed, observed, and scored all measures. Blood samples for analysis of serum CPK levels were drawn within 2 weeks of the second visit. Minor adjustments were made on the preliminary revised FI, resulting in the revised FI.
Interrater reliability of the revised FI was assessed using cohort 4 consisting of 13 patients all with various stages of disease. The patients performed the revised FI twice on the same day with a 1-hour rest period in between sessions. Three physical therapists were involved in this part of the study. One observed all 13 patients and the other 2 randomly observed 11 and 2 patients, respectively, on either the first or second test occasion. Each therapist read a manual for the revised FI, but no common learning session took place. Measures of maximal and mean grip strength assessed by the Grippit instrument (AB Detektor, Göteborg, Sweden) were conducted after performance of the revised FI on all occasions.
Intrarater reliability over 1 week was performed in the 13 patients in cohort 5 (which comprised 1 patient from cohort 1, 1 from cohort 3, 5 from cohort 4, and an additional 6 patients registered at the rheumatology clinic meeting the inclusion criteria and accepting participation in the study), all with stable disease activity and stable medication doses for the past 3 months. One physical therapist observed all patients on all occasions. The Grippit measurements were carried out after the performance of the revised FI on all occasions. Minor modifications were made, resulting in the final version, the FI-2.
All data were analyzed with Statistica 6.0 (StatSoft, Tulsa, OK). Spearman correlation coefficient (rs) was used to analyze both internal redundancy and internal consistency within the FI and the preliminary revised FI, and the convergence or divergence with other constructs. Correlation coefficients rs >0.90 were considered to indicate redundancy and rs <0.60 to indicate poor consistency. In analyses of construct validity, rs 0–0.25 was interpreted as no or very low correlation, rs 0.26–0.49 as low, rs 0.50–0.69 as moderate, rs 0.70–0.89 as high, and rs 0.90–1.0 as very high correlation (26). Intraclass correlation coefficients (ICCs) were calculated for interrater and intrarater reliability. ICCs <0.75 were considered to indicate low to fair reliability and those >0.75 to indicate good to excellent reliability (27). The level of significance was P < 0.05, indicating systematic differences between test and retest.
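The interpretation thresholds above can be expressed as a small lookup, shown here as a minimal sketch. Two points are assumptions rather than statements from the source: values that fall between the published bands (for example rs = 0.255) are assigned to the lower band, and an ICC of exactly 0.75, which the text leaves undefined, is treated as good to excellent. The function names are illustrative.

```python
def classify_correlation(rs: float) -> str:
    """Map a Spearman coefficient to the construct validity bands."""
    magnitude = abs(rs)
    if magnitude > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude < 0.26:
        return "no or very low"
    if magnitude < 0.50:
        return "low"
    if magnitude < 0.70:
        return "moderate"
    if magnitude < 0.90:
        return "high"
    return "very high"


def classify_icc(icc: float) -> str:
    """Map an ICC to the reliability bands.

    Assumption: an ICC of exactly 0.75 counts as good to excellent.
    """
    return "low to fair" if icc < 0.75 else "good to excellent"


if __name__ == "__main__":
    print(classify_correlation(0.58))  # shoulder flexion vs endurance
    print(classify_icc(0.56))          # sit-up task, intrarater
```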
The ethics committees of the Karolinska Institutet in Stockholm and the Sahlgrenska University Hospital in Göteborg approved the design of the study and all patients gave their informed consent.
The box plot distributions for the 287 FI performed by cohort 1 are shown in Figure 1. Ceiling effects (defined as median value equals 20–50 percentile of the total variation of values) were observed for the elbow flexion and hip abduction tasks, for the transfers from side-to-side, as well as up to sitting and PEF.
Furthermore the shoulder flexion, the shoulder abduction, the hip flexion, the step test, the head lift, and sit-up tasks had median values of 5 indicating maximal capacity (Figure 1). Median value for the total FI score was 53 (range 16–64) indicating a limited possibility to detect clinically relevant improvement.
No tasks were redundant, indicated by correlation coefficients from rs = 0.10 to rs = 0.82. Grip strength score showed poor consistency with the upper limbs subscale (rs = 0.43). Hip abduction score showed poor consistency with the lower limb subscale (rs = 0.48), and all tasks within the neck and trunk subscale were inconsistent (rs 0.09–0.46).
Health professionals and patients rated most tasks on the original FI as relevant, including the grip strength. Because the Grippit is not a functional assessment of grip function, whereas all other tasks included in the FI are classified as functional, the health professionals suggested that the grip measurements should be separate rather than be included in the revised FI. Although the elbow flexion task had a ceiling effect, the majority of patients and health professionals considered it to be relevant. The patients questioned the relevance of the hip abduction task, the transfers, and the PEF measurement, but rated the heel lift and the toe lift tasks as relevant but difficult to perform.
Based on the above assessments, the grip measurements, the hip abduction task, the transfers, and the PEF measurements were excluded from the preliminary revised FI. To overcome the ceiling effects observed in the remaining tasks of the FI, some adjustments were made. The maximal number of repetitions for the elbow flexion, shoulder flexion, shoulder abduction, head lift, sit-up, hip flexion, and step test tasks was increased from 20 to 60. A pace of 40 beats per minute, given by a metronome, resulting in 20 repetitions per minute, was used to standardize the movement velocity. The heel lift and the toe lift tasks were modified from standing on one leg with minimal balance support to standing on both feet with balance support from a wall. The maximal number of repetitions for the 2 latter tasks was increased from 20 to 120, and they were performed at a pace of 80 beats per minute, resulting in 40 repetitions per minute. These changes resulted in the preliminary revised FI. The patients performed 5 learning repetitions to enhance the performance of the tasks at the given speeds.
Two of the 25 patients in cohort 3 did not perform the isokinetic measurements: 1 due to muscle soreness of 2 weeks' duration after performing the preliminary revised FI, and 1 due to lung surgery. As hypothesized, the shoulder flexion task on the preliminary revised FI was most convergent with the isokinetic shoulder flexion endurance measurements (rs = 0.48–0.58), but less with the isokinetic maximal MT (rs = 0.37), the MAP (rs = 0.28), and wellbeing (varying from rs = 0.17–0.36), and was divergent with CPK levels (rs = 0.05). Our hypotheses regarding the step test of the preliminary revised FI were not supported because the highest correlation was obtained with isokinetic maximal MT (rs = 0.42), while correlations with other constructs varied from rs = 0.01–0.32 (Table 2).
|Constructs||FI shoulder flexion task n = 25||FI step task n = 25|
|Isokinetic maximal MT, shoulder flexion||0.37†||—|
|Isokinetic endurance, shoulder flexion|
|Isokinetic maximal MT, knee extension||—||0.42†|
|Isokinetic endurance, knee extension|
|Wellbeing, past week||0.17||0.03|
|Wellbeing, past 6 months||0.36||0.28|
No tasks of the preliminary revised FI were redundant. Correlations between the tasks on the right and left sides ranged from rs = 0.76–0.92.
The elbow flexion task still had a ceiling effect in the preliminary revised FI, with a median value of 60 repetitions that equaled the 20–50 percentiles for both the right and left sides, and was therefore not included in the 8-task revised FI. All other tasks varied from 0–60 or from 0–120 repetitions, respectively (Figure 2). There was no significant difference between the right and left sides of the body in any task except the step test. Therefore, the other tasks are presented as the mean of the right and left sides in Figure 2.
Cohort 4 (n = 13) performed 2 revised FI and Grippit measurements (with a 1-hour rest period in between) and were observed by independent observers. The ICC varied between 0.86 and 0.99, and between 0.92 and 0.98 for the different tasks of the revised FI and the Grippit measurements, respectively. No systematic differences were found (Table 3).
|Task (score range)||Interrater reliability ICC n = 13||Intrarater reliability ICC n = 10||Error of measurement, no. (%)|
|Shoulder flexion right (0–60)||0.97||0.75†||10 (16)|
|Shoulder flexion left (0–60)||0.96||0.90†||6 (10)|
|Shoulder abduction right (0–60)||0.86||0.94†||5 (8)|
|Shoulder abduction left (0–60)||0.91||0.87||8 (13)|
|Head lift (0–60)||0.90||0.93||5 (8)|
|Sit-ups (0–60)‡||0.92||0.56||15 (25)|
|Hip flexion right (0–60)||0.92||0.91||7 (12)|
|Hip flexion left (0–60)||0.99||0.80||10 (16)|
|Step test right (0–60)||0.97||0.99||3 (5)|
|Step test left (0–60)||0.98||0.97||4 (7)|
|Heel lift (0–120)||0.98||0.89||16 (13)|
|Toe lift (0–120)||0.99||0.97||7 (6)|
Cohort 5 (n = 13) was used to assess intrarater reliability. Three patients could not repeat the revised FI after one week: 1 patient experienced muscle soreness lasting 5 days after the first performance and did not want to repeat the measurements; 1 patient still experienced muscle soreness after 10 days and did not perform the second revised FI in accordance with the instructions due to fear of aggravating pain; and 1 patient cancelled the second visit due to other illness. Therefore, the results of intrarater reliability were calculated on the remaining 10 patients. ICC varied between 0.56 and 0.99 for the tasks of the revised FI, and between 0.93 and 0.97 for the Grippit measurements. Systematic differences (P < 0.05) between test and retest were found for the right and left shoulder flexion tasks and for the right side shoulder abduction task (Table 3). The measurement error for the different tasks of the revised FI varied between 3 and 16 repetitions, and between 20 and 30 Newtons for the Grippit measurements (Table 3). The sit-up task was excluded due to low intrarater reliability (ICC 0.56), leading to the final version, FI-2 (Appendix A, B), which includes 7 tasks of the upper and lower extremities and head lift. Each task of the FI-2 is scored separately as the number of repetitions performed correctly, with scoring starting after 5 learning repetitions.
Muscle soreness was experienced by 22 of the 25 patients in cohort 3 (1 severe, 21 mild) after performing the preliminary revised FI and the isokinetic measurements. Eighteen of the 25 patients wore the pulse display during performance of the preliminary revised FI. The median heart rate during the different tasks varied between 49% and 69% of the patients' individually predicted maximal heart rate. Median perceived exertion for each task was 3 (range 0–10) according to the Borg CR-10 scale.
The final version, FI-2, requires a maximum of 33 minutes to administer (maximal performance time for all tasks); patients with more severe impairment complete it in less time. Based on the high correlations between tasks on the right and left body side of the preliminary revised FI, the FI-2 could be performed unilaterally, preferably on the patient's dominant side, when used in clinical trials. The unilateral FI-2 requires only 5 to 21 minutes, depending on the impairment level of the patient.
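The maximal administration times follow from the repetition limits and metronome paces reported earlier, as the sketch below illustrates. It assumes that two metronome signals correspond to one repetition (as implied by the reported paces), that the tasks performed on each side are those listed per side in Table 3, and that rests and learning repetitions are not counted. The dictionary layout and the function name `max_minutes` are illustrative.

```python
# task: (maximal repetitions, metronome signals per minute, per side?)
FI2_TASKS = {
    "shoulder flexion": (60, 40, True),
    "shoulder abduction": (60, 40, True),
    "head lift": (60, 40, False),
    "hip flexion": (60, 40, True),
    "step test": (60, 40, True),
    "heel lift": (120, 80, False),
    "toe lift": (120, 80, False),
}


def max_minutes(unilateral: bool) -> float:
    """Return the maximal pure performance time of the FI-2 in minutes."""
    total = 0.0
    for max_reps, signals_per_min, per_side in FI2_TASKS.values():
        reps_per_min = signals_per_min / 2  # two signals per repetition
        sides = 2 if (per_side and not unilateral) else 1
        total += sides * max_reps / reps_per_min
    return total


if __name__ == "__main__":
    print("bilateral:", max_minutes(unilateral=False), "minutes")
    print("unilateral:", max_minutes(unilateral=True), "minutes")
```

Each task takes at most 3 minutes, so 11 task performances give 33 minutes bilaterally and 7 give 21 minutes unilaterally, in line with the figures stated in the text.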
The FI-2 is the first partially-validated and reliable disease-specific measure of impairment in patients with PM and DM. The FI-2 contains no redundant tasks, has no ceiling or floor effects, and possesses satisfactory construct validity and interrater and intrarater reliability. It is not time-consuming to administer, is well-tolerated by patients with all stages of disease, and does not require any expensive equipment or formal training.
The content of the FI-2 represents measurement of muscle impairment of the upper and lower limbs and neck. This content was confirmed by extensive analysis of many previously-performed FI, as well as input by patients and health professionals with a variety of training and experience. The inclusion of shoulder flexion, shoulder abduction, hip flexion, step test, and head lift tasks in the FI-2 is in accordance with the disease phenotype of myositis (28). It has also been suggested that distal muscle groups may be involved in later stages of the disease (3), which supports the inclusion of heel and toe lift tasks. This also indicates that grip strength, although not included in the FI-2, is important to measure and our results support the reliability of the Grippit as a grip strength measure for patients with myositis. The inclusion of the sit-up task in both the preliminary revised FI and the revised FI was based on clinical experience and patient expertise, but it was excluded in the FI-2 due to low intrarater reliability. The original FI elbow flexion task did not discriminate patients with myositis from healthy controls (16). This could explain the ceiling effect of this task and supports its exclusion from the FI-2. Modifications of the number of repetitions and the pace standardization conferred absence of floor and ceiling effects for the FI-2. This has made it a more sensitive tool to detect changes in impairment in patients with various degrees of muscle impairment, but this modification should be confirmed in a longitudinal study.
Our results support the FI-2 as a measure of impairment. For the upper limb the shoulder flexion of the preliminary revised FI measures endurance more than strength in accordance with our hypothesis. However, it is more uncertain what the step test task measures. Our failure to demonstrate convergence between the step test task and the isokinetic knee extensor endurance test might be ascribed to the latter being open-chained while the former is performed in a closed chain. The absence of higher correlations between the preliminary revised FI shoulder flexion and knee extension tasks and the isokinetic measurements might be explained by several factors. Cohort 3 consisted of a small number of patients with variability in muscle performance. Furthermore, the knee extensor muscle groups tested with the isokinetic measurements of strength and endurance might not be ideal to compare with the step test task of the preliminary revised FI, because they do not completely reflect the functional use of muscles as in the FI. Nonetheless, no gold standard for functional measurement of muscle impairment is available and the use of dynamic isokinetic measurements was hypothesized to be more convergent to the dynamic tasks of the preliminary revised FI than isometric measurements such as MMT or an isometric dynamometer (9–14). Because the HAQ is now considered the international standard, it should have been included in addition to the disease specific, valid MAP for construct validity evaluation of the FI-2. However, when this study was designed, the IMACS outcome measure had not been presented.
All 3 physical therapists involved in the reliability part of the study had regularly performed the original FI and were therefore familiar with a majority of the tasks. This may explain why additional information and practice before study start was not needed. The intrarater reliability of the revised FI was good to excellent, although surprisingly, not to the same extent as for interrater reliability. Because the interrater reliability of the sit-up task was excellent, its poor intrarater reliability is hard to explain and might be ascribed to chance. The fact that no literature includes the abdominal muscles in the phenotype of PM and DM and that the sit-up task showed poor intrarater reliability supports its exclusion in the final FI-2. The systematic differences between the test and retest were probably due to some patients performing more repetitions at retest. This could be explained either by day-to-day variations of health status or by the patients being more cautious on the first test in order to avoid over-exertion.
The revised FI was well tolerated. Only 3 of the 44 patients included in cohorts 3, 4, and 5 experienced severe delayed onset muscle soreness after performing any of the revised FIs. One of these was in retrospect experiencing an upcoming flare of her disease. The other 2 overcame the muscle soreness in approximately 1 week. All other patients tolerated the measures well with only mild short-term or no muscle soreness. Patients with myositis have previously been reported to experience myalgia and muscle tenderness; however, it is not related to previous muscle activity (2, 16). To our knowledge there are no published data explaining the reasons for longstanding muscle soreness after exercise in these patients. Some patients included in cohorts 3, 4, and 5 had mild to moderate osteoporosis and tolerated the FI-2 measures well. However, some caution should be exercised when measuring patients with severe osteoporosis and multiple vertebral fractures.
All patients who met the inclusion criteria registered at the 2 rheumatology clinics involved in this study were invited to participate in the different parts of this study; however, the number of patients in some cohorts was limited. A larger number of patients might have resulted in more stable analyses. Although the samples were small, they represented a wide variety of age, sex, disease duration, and disability.
Because the international consensus concerning measurements of disease activity was not available at the start of the study, CPK level was chosen as a marker for disease activity, although it is well known that CPK levels do not always follow disease activity (29). Muscle inflammation could also have been assessed with muscle biopsies or magnetic resonance imaging of muscles; however, these procedures also have limitations.
In conclusion, our results support the FI-2 as a valid, reliable, and feasible outcome measure of muscle impairment for use in daily practice and clinical trials, although further evaluations of its sensitivity to change and normal values need to be conducted. The FI-2 could complement the proposed IMACS core set of outcome measures as a measure for muscle endurance.
Many thanks are given to all participating patients, to Eva Romanus for observing patients in the reliability section, and to Associate Professor Robert A. Harris for linguistic reading of the manuscript.
Instructions for correct performance of the Functional Index-2.