Measurement properties of the Falls Efficacy Scale‐International (FES‐I) in persons with late effects of polio: A cross‐sectional study

Fear of falling (FoF) is very common in persons with late effects of polio (LEoP). An internationally recognized rating scale to assess FoF is the Falls Efficacy Scale‐International (FES‐I). Yet, there is limited knowledge about its measurement properties in persons with LEoP.


INTRODUCTION
Falls and fear of falling (FoF) are common among persons with late effects of polio (LEoP). 1-4 The increasing impairments of the lower limbs that often occur decades after the acute poliomyelitis infection 5-7 can lead to reduced balance 8 and walking limitations 9 and can increase the risk of falling. 7 Studies have shown that between 50% and 84% of persons with LEoP fall every year, 1-4 especially outdoors 2 and during activities related to walking. 2,3 Many are injured when they fall, 1,2,10,11 and the experience of falling can lead to worry, distress, 12 and FoF. 1,2,8,13,14 FoF is defined as "an ongoing concern about falling that ultimately limits the performance of activities of daily living." 15 Up to 95% of persons with LEoP have been reported to experience FoF when performing daily activities, 1,2,8,13,14 and for some, the fear is constantly on their mind. 12 FoF can lead to difficulties in carrying out household and leisure activities and in participating in social events, 12 to limited physical activity, 16,17 and to avoidance of meaningful daily activities. 18 Therefore, it is of great importance to assess FoF accurately among persons with LEoP.
One common rating scale to assess FoF is the Falls Efficacy Scale-International (FES-I). The FES-I is widely used in older people, 19-22 in persons with neurological disorders, 23-25 and in persons with LEoP. 2,4,7,14,16 There are different versions of the scale; the most commonly used is the 16-item version, which assesses FoF during various physical and social activities. To minimize the assessment burden, a 7-item version of the FES-I (short FES-I) also exists as a screening tool for FoF. 17,19,22,26-28 In clinical practice, the item scores of the FES-I are summed to a total score. However, data from a multi-item ordinal rating scale are nonlinear in nature and should not be summed into a total score unless the scale has been shown to be psychometrically sound and unidimensional. 29 Two approaches that can be used to investigate the measurement properties of an ordinal scale are factor analysis 30 and Rasch model analysis. 31 Factor analysis identifies the number of factors that represent the relationships among sets of interrelated variables, whereas Rasch analysis tests how well the data fit the expectations of the Rasch model. If data from an ordinal scale are unidimensional and fit the Rasch model, the scores can be transformed into interval scores, which permits summation of the scores and the use of parametric statistics. 31 The FES-I has been shown to have excellent reliability and construct validity in older people living in the community. 32 It is somewhat unclear, however, whether the scale comprises one or two dimensions. One study among older people 19 reported that the FES-I comprises one factor, whereas other studies in older people 20,33 and in persons with multiple sclerosis 23 reported that it may comprise two factors.
It is important to note that none of these studies 19,20,23,33 used what can today be considered the gold standard for explorative factor analysis, that is, a polychoric correlation matrix, parallel analysis, minimum rank factor analysis, promin rotation, and reporting of quality criteria. 34-36 Nor has any previous study thoroughly evaluated the measurement properties of the FES-I among persons with LEoP. Therefore, the aim of this study was to investigate the measurement properties of the FES-I and the short FES-I in persons with LEoP, using modern explorative factor analysis and Rasch analysis.

Study design
A cross-sectional design was applied, and data were collected through a postal survey. Questions on demographics, clinical characteristics, and the occurrence of falls, together with rating scales for FoF, self-perceived impairments, and walking limitations, were mailed to the participants. The present study mainly presents data on FoF; other data have been presented previously. 2

Participants
A total of 356 ambulatory, community-dwelling persons with LEoP were recruited from a post-polio rehabilitation clinic at a university hospital in southern Sweden. 2 Inclusion criteria were: (1) a confirmed history of acute poliomyelitis affecting the lower limbs; (2) a period of recovery and functional stability of at least 15 years; (3) clinically verified late effects of polio, that is, new symptoms and/or loss of function in one or both lower limbs that had persisted for at least a year; (4) ability to walk indoors; and (5) ability to understand Swedish. The exclusion criterion was other diseases that could affect a person's mobility. Of the 356 persons invited to participate in the survey, 31 did not respond or declined to participate, and 4 had incomplete FES-I responses. Thus, FoF data from 321 participants were used in this study.

The Falls Efficacy Scale-International
The FES-I 33 consists of 16 items that assess how concerned persons are about falling when: (1) cleaning the house; (2) getting dressed or undressed; (3) preparing simple meals; (4) taking a bath or shower; (5) going to the shop; (6) getting in or out of a chair; (7) going up or down stairs; (8) walking around in the neighborhood; (9) reaching for something above the head or on the ground; (10) going to answer the telephone before it stops ringing; (11) walking on a slippery surface; (12) visiting a friend or relative; (13) walking in a place with crowds; (14) walking on an uneven surface; (15) walking up or down a slope; and (16) going out to a social event.
The short FES-I includes seven of the FES-I items, namely items 2, 4, 6, 7, 9, 15, and 16. 26 All items on the FES-I and the short FES-I are rated on a 4-point Likert scale, ranging from 1 = not at all concerned (about falling) to 4 = very concerned (about falling). The total score ranges from 16 to 64 points on the FES-I and from 7 to 28 points on the short FES-I; on both scales, a higher score indicates greater FoF. Both the FES-I and the short FES-I have been reported to have good measurement properties among older people (Cronbach's alpha 0.96/0.92; mean inter-item correlation 0.64/0.63; intraclass correlation coefficient 0.82/0.83, respectively). 26
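The raw scoring rule described above can be sketched in Python. The function names and the dictionary input (item number mapped to a 1-4 response) are illustrative assumptions; the code only sums item scores and, consistent with the requirement that every item be answered, refuses to produce a total when an item is missing. The resulting raw total is still an ordinal score.

```python
from typing import Mapping

# Item numbers (1-16) included in the short FES-I, as listed in the text.
SHORT_FES_I_ITEMS = (2, 4, 6, 7, 9, 15, 16)
FES_I_ITEMS = tuple(range(1, 17))


def _validated(responses: Mapping[int, int], items) -> list:
    """Return the scores for `items`, refusing missing or out-of-range answers."""
    scores = []
    for item in items:
        if item not in responses:
            raise ValueError(f"item {item} is missing; no total score can be computed")
        score = responses[item]
        if score not in (1, 2, 3, 4):
            raise ValueError(f"item {item} has out-of-range score {score!r}")
        scores.append(score)
    return scores


def fes_i_total(responses: Mapping[int, int]) -> int:
    """Raw FES-I total: sum of 16 items, range 16-64 (higher = more FoF)."""
    return sum(_validated(responses, FES_I_ITEMS))


def short_fes_i_total(responses: Mapping[int, int]) -> int:
    """Raw short FES-I total: sum of 7 items, range 7-28 (higher = more FoF)."""
    return sum(_validated(responses, SHORT_FES_I_ITEMS))


# Hypothetical respondent: "not at all concerned" (1) on every item except
# walking on a slippery surface (item 11 = 4, "very concerned").
example = {item: 1 for item in FES_I_ITEMS}
example[11] = 4
print(fes_i_total(example))        # 19
print(short_fes_i_total(example))  # 7 (item 11 is not part of the short FES-I)
```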

Analysis
Demographics and clinical characteristics of the participants were analyzed using IBM SPSS Statistics version 25 (IBM Corporation, Armonk, New York, USA) and presented as mean (SD) or percent. To assess the measurement properties of the FES-I, both factor analysis and Rasch analysis were performed.

Factor analysis
A modern factor analysis was applied in the present study, including a polychoric correlation matrix, parallel analysis, minimum rank factor analysis, and promin rotation if two or more factors were found. 36 In parallel analysis, the percent of variance in the real data is compared with the mean and the 95th percentile of the percent of variance in random data. For a one-factor solution, the real-data percent of variance is expected to exceed the mean of the random variance; to be significant, it should exceed the 95th percentile of the random percent of variance. 36 The appropriateness of the factor analysis was checked by means of the determinant of the correlation matrix, which should be >0.00001; lower values indicate multicollinearity or singularity (ie, very high or perfect associations between items). In addition, Bartlett's test of sphericity should be significant (p < .05), indicating that the correlation matrix is suitable for factoring, and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy should be above 0.5. 37,38
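The parallel-analysis logic and the three factorability checks can be illustrated with a short NumPy sketch. This is a simplification, not the study's procedure: it uses Pearson correlations and principal-component eigenvalues (Horn's original form of parallel analysis) instead of the polychoric correlations and minimum rank factor analysis named above, and the function names and simulated data are hypothetical. A p-value for Bartlett's statistic can be obtained with, for example, `scipy.stats.chi2.sf(chi2, df)`.

```python
import numpy as np


def parallel_analysis(data, n_sim=500, seed=0):
    """Horn-style parallel analysis on Pearson correlations (a simplification).

    data: n persons x p items array of item scores.
    Returns the real-data percent of variance per component, the mean and
    95th percentile of the same quantity in random normal data of equal size,
    and the number of leading components exceeding the 95th percentile.
    """
    n, p = data.shape

    def pct_variance(x):
        eigvals = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[::-1]
        return eigvals / p * 100  # eigenvalues of a correlation matrix sum to p

    real = pct_variance(data)
    rng = np.random.default_rng(seed)
    sims = np.array([pct_variance(rng.standard_normal((n, p))) for _ in range(n_sim)])
    mean_rand = sims.mean(axis=0)
    p95_rand = np.percentile(sims, 95, axis=0)

    n_factors = 0
    for r, q in zip(real, p95_rand):
        if r <= q:
            break
        n_factors += 1
    return real, mean_rand, p95_rand, n_factors


def factorability(data):
    """Determinant, Bartlett's chi-square with its df, and the overall KMO."""
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)
    det = np.linalg.det(R)
    # Bartlett's test of sphericity: H0 is that R is an identity matrix.
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(det)
    df = p * (p - 1) // 2
    # KMO compares zero-order correlations with partial (anti-image) correlations.
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)
    off_diag = ~np.eye(p, dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    kmo = r2 / (r2 + a2)
    return det, chi2, df, kmo


# Simulated one-factor data: 321 persons, 16 items (hypothetical, not study data).
rng = np.random.default_rng(1)
factor = rng.standard_normal((321, 1))
items = 0.8 * factor + 0.6 * rng.standard_normal((321, 16))
real, mean_rand, p95_rand, n_factors = parallel_analysis(items)
det, chi2, df, kmo = factorability(items)
print(n_factors, round(real[0], 1), round(p95_rand[0], 1), det, kmo)
```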

Rasch analysis
In the Rasch model analysis, the partial credit model 39-41 was used to assess whether the FES-I and the short FES-I met the measurement requirements of the Rasch model (RUMM2030, RUMM Laboratory, Perth, Australia). This included tests of unidimensionality, local dependency, targeting, hierarchical order of items, differential item functioning (DIF), response category functioning, and reliability (Person Separation Index, PSI). The Rasch model analysis was also used to transform raw scores into interval measurements (linearized scores).
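As an illustration of the partial credit model, the following Python sketch computes the category probabilities of one item, its expected score, and whether its thresholds are ordered. It assumes absolute thresholds (item location already included) and items rescored 0-3 instead of 1-4; the function names and threshold values are hypothetical and not estimates from this study. At a threshold, the two adjacent categories are equally probable, which is the property used when inspecting response category functioning.

```python
import numpy as np


def pcm_probabilities(theta, thresholds):
    """Category probabilities P(X = 0..m | theta) for one partial credit item.

    theta: person location in logits.
    thresholds: the item's m thresholds tau_1..tau_m in logits (absolute,
    ie, already including the item location). A FES-I item scored 1-4 is
    treated as 0-3 here, so m = 3.
    """
    tau = np.asarray(thresholds, dtype=float)
    # Log-numerator of category x is sum_{k<=x} (theta - tau_k); 0 for x = 0.
    log_num = np.concatenate(([0.0], np.cumsum(theta - tau)))
    log_num -= log_num.max()  # guard against overflow before exponentiating
    num = np.exp(log_num)
    return num / num.sum()


def expected_item_score(theta, thresholds):
    """Model-expected item score (0..m) at theta."""
    probs = pcm_probabilities(theta, thresholds)
    return float(np.dot(np.arange(probs.size), probs))


def thresholds_ordered(thresholds):
    """True if the thresholds increase, ie, the categories work as intended."""
    return bool(np.all(np.diff(thresholds) > 0))


# Hypothetical thresholds for one item.
tau = [-1.5, 0.2, 1.8]
print(pcm_probabilities(0.2, tau))  # categories 1 and 2 equally likely at tau_2
print(expected_item_score(0.0, tau))
print(thresholds_ordered(tau))      # True
```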
Unidimensionality and local dependency
Unidimensionality (ie, when all items in the scale measure a single construct) was assessed by means of principal component analysis (PCA) of the residual correlations. Analysis of the residuals was conducted to identify potential sub-dimensions in the FES-I. Person location estimates were derived from two subsets of items: one that loaded positively and one that loaded negatively on the first principal component of the residuals.
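The comparison of the two subset-based person locations can be sketched as follows. The per-person test statistic (difference divided by the combined standard error, judged against 1.96) and the normal-approximation 95% CI for the proportion are assumptions chosen for illustration; the exact interval reported by RUMM2030 may be computed differently. Function and variable names are hypothetical.

```python
import numpy as np


def subset_comparison(theta_a, se_a, theta_b, se_b, z_crit=1.96):
    """Compare each person's locations from two item subsets.

    theta_a/se_a: person locations (logits) and standard errors from the
    items loading positively on the first residual component; theta_b/se_b:
    the same from the negatively loading items.
    Returns the proportion of persons whose two estimates differ
    significantly and a normal-approximation 95% CI for that proportion.
    """
    theta_a, se_a = np.asarray(theta_a, float), np.asarray(se_a, float)
    theta_b, se_b = np.asarray(theta_b, float), np.asarray(se_b, float)
    t = (theta_a - theta_b) / np.sqrt(se_a ** 2 + se_b ** 2)
    n = t.size
    prop = float(np.mean(np.abs(t) > z_crit))
    half_width = 1.96 * np.sqrt(prop * (1 - prop) / n)
    return prop, (prop - half_width, prop + half_width)


# Hypothetical example: 200 persons, 30 of whom differ by 1 logit (SE 0.3 each).
theta_a = np.zeros(200)
theta_b = np.zeros(200)
theta_b[:30] = 1.0
se = np.full(200, 0.3)
prop, ci = subset_comparison(theta_a, se, theta_b, se)
print(prop, ci)  # proportion 0.15; the CI lower bound lies above 0.05
```

Under the criterion in the text, a proportion whose CI lies entirely above 5%, as in this invented example, would speak against unidimensionality.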
If any violation of unidimensionality is negligible, only a small number of persons will have locations that differ significantly between the two item sets. To support unidimensionality, the overall proportion of persons with significantly different measures from the two item subsets should be <5%. 42,43
Local dependency (ie, when the response to one item determines the response to another item) 44 was assessed by inspecting item fit residuals. As a rule of thumb, the fit residual of each item (based on the residuals, ie, observed minus expected scores) should lie within ±2.5. Negative values (< −2.5) indicate local dependency, whereas positive values (> 2.5) indicate multidimensionality. 45 Item characteristic curves (ICCs) illustrate the relationship between observed and expected scores, which aids the interpretation of the magnitude and pattern of the numerical fit statistics. Residual correlations between individual items were also assessed; if the responses to the items are independent of each other, the residual correlations should not exceed 0.30. 45
Targeting
The targeting of the sample to the FES-I and the short FES-I was analyzed by assessing the mean person location, which expresses the average magnitude and direction by which the person locations differ from the item locations. Good targeting means that the scale covers the persons' levels of FoF, whereas poor targeting means that the persons' FoF is not covered by the scale's range of measurement. Mistargeting results in lower precision and in difficulties differentiating persons' FoF along the latent scale. 46 For the FES-I, positive person logit locations indicate that the sample experiences more FoF than represented by the scale, and negative person logit locations indicate the opposite. 46 In general, mean person locations within ±0.5 logits indicate good targeting. 47
Hierarchical ordering of items
The logical hierarchical ordering of items along the quantitative continuum was analyzed to assess the internal validity of the FES-I and the short FES-I. 45 It is reasonable to consider the FES-I item locations as a hierarchy from negative (more FoF) to positive (less FoF) locations.
Differential item functioning
DIF was analyzed to assess whether items work in the same way across different groups of people. In the present study, DIF was evaluated for gender (men and women) and age (65 years or younger vs 66 years and older). If DIF can be clinically justified, it should not be resolved by splitting the item by the person factor (eg, gender or age group). 48
Response category functioning
The response category functioning of each item was analyzed by inspecting the ordering of the thresholds. A threshold is the point between two adjacent categories at which either response is equally probable. The FES-I and the short FES-I have four response categories, which yield three thresholds per item. Disordered thresholds indicate that the response categories are not functioning as expected, for example, that persons have difficulty distinguishing between "somewhat concerned" and "fairly concerned." Disordered thresholds can be resolved by rescoring items, that is, by collapsing response categories. 46,49
Reliability
The reliability of the FES-I and the short FES-I was assessed by the Person Separation Index (PSI), which indicates the power of the scale to discriminate among persons, that is, between levels of FoF among persons with LEoP. The PSI is analogous to Cronbach's alpha 50 but is based on the logit values from the Rasch model analysis; higher PSI values indicate a greater ability to detect reliable differences between persons.
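A minimal sketch of the PSI, assuming the common formulation in which the mean squared standard error of the person estimates is subtracted from their observed variance; RUMM2030's own computation (for example, its handling of extreme scores) may differ in detail. The function name and the person estimates are hypothetical.

```python
import numpy as np


def person_separation_index(theta, se):
    """PSI = (observed variance - mean error variance) / observed variance.

    theta: person locations in logits; se: their standard errors.
    """
    theta = np.asarray(theta, float)
    se = np.asarray(se, float)
    observed_var = np.var(theta, ddof=1)  # sample variance of person locations
    error_var = np.mean(se ** 2)          # average measurement error variance
    return float((observed_var - error_var) / observed_var)


# Hypothetical person estimates, all with SE 0.5 logits.
print(person_separation_index([-2, -1, 0, 1, 2], [0.5] * 5))  # 0.9
```

As with Cronbach's alpha, the value approaches 1 when measurement error is small relative to the spread of persons along the scale.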
Raw score transformation to interval measurements
Raw scores of the FES-I and the short FES-I were also transformed into interval measurements. Because the FES-I score is ordinal in nature, a one-point difference on the items does not necessarily represent the same amount of FoF across the measurement spectrum. Using RUMM2030, the raw scores can be translated into logits. Given an appropriate solution, the Rasch person estimates in logits can be transformed into interval measurements (linearized scores). 51
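The raw-score-to-logit step can be sketched under the partial credit model: because the raw total is a sufficient statistic, the maximum-likelihood person location for a non-extreme raw score is the logit at which the model-expected total equals that raw score. The sketch below assumes known thresholds, 0-based item scoring (subtract 7 from a short FES-I total), and a hypothetical linear rescaling of logits onto the raw-score range; it is not necessarily the convention used to produce Appendix S1, and extreme scores (which have no finite estimate) are not handled. All names and threshold values are illustrative.

```python
import numpy as np


def expected_total(theta, item_thresholds):
    """Model-expected raw total (each item scored 0..m_i) at location theta."""
    total = 0.0
    for tau in item_thresholds:
        log_num = np.concatenate(([0.0], np.cumsum(theta - np.asarray(tau, float))))
        w = np.exp(log_num - log_num.max())
        total += float(np.dot(np.arange(w.size), w) / w.sum())
    return total


def raw_to_logit(raw, item_thresholds, lo=-10.0, hi=10.0, tol=1e-8):
    """ML person location for a non-extreme raw total (0-based scoring).

    The expected total increases strictly with theta, so bisection finds the
    theta at which the expected total equals the observed total.
    """
    max_raw = sum(len(tau) for tau in item_thresholds)
    if not 0 < raw < max_raw:
        raise ValueError("extreme scores have no finite ML estimate")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_total(mid, item_thresholds) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def linearize(theta, theta_min, theta_max, score_min, score_max):
    """Hypothetical linear rescaling of logits onto the raw-score range."""
    return score_min + (theta - theta_min) * (score_max - score_min) / (theta_max - theta_min)


items = [[-1.0, 0.0, 1.0]] * 7      # hypothetical thresholds, 7 items scored 0-3
theta_lo = raw_to_logit(1, items)   # lowest non-extreme 0-based total (raw 8)
theta_hi = raw_to_logit(20, items)  # highest non-extreme 0-based total (raw 27)
raw_short = 12                      # short FES-I raw total (range 7-28)
theta = raw_to_logit(raw_short - 7, items)
print(round(theta, 3), round(linearize(theta, theta_lo, theta_hi, 8, 27), 2))
```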

RESULTS
The demographics and clinical characteristics of the 321 participants (173 women) are presented in Table 1. Their mean age was 70 (SD 10) years, the mean age at the acute poliomyelitis infection was 6 (SD 6) years, and the mean duration of LEoP was 20 (SD 9) years. A majority (69%) lived with a partner, and 38% lived in housing with stairs. All participants were ambulatory, and about a third could walk >1000 meters. Most of the participants used walking aids. Nearly half (48%) reported balance problems, and approximately two-thirds had fallen during the past year.

Factor analysis
The factor analysis resulted in one factor according to parallel analysis, with 70.4% explained common variance and a reliability of 0.97 (McDonald's ordinal omega). The parallel analysis supported the one-factor solution, as the real-data percent of variance (71.8%) exceeded both the mean (12.7%) and the 95th percentile (14.2%) of the random variance. Bartlett's test of sphericity and the KMO criterion were fulfilled. The determinant of the matrix was low (<0.000001), indicating singularity (Table 2). The high reliability and the low determinant may indicate some item redundancy (ie, local dependency).

Unidimensionality and local independence
Analysis of the FES-I showed some misfit to the Rasch model (χ2 = 141.94; p < .0001; item mean (SD) fit residual = −0.27 (1.1); person mean (SD) fit residual = −0.32 (1.0)). The PCA of the residuals indicated multidimensionality, as the 95% confidence interval (CI) did not include 5% (12.7% of persons with significantly different measures from the two item subsets, 95% CI 10.3%-15.1%). Item 4 "Taking a bath or shower" had a fit residual slightly above 2.5, which also indicated multidimensionality. However, inspection of the item characteristic curve for this item showed only a slight deviation of the observed score from the expected score. Inspection of residual correlations showed high correlations (>0.3) between items 12 "Visiting a friend or relative" and 16 "Going out to a social event," between items 13 "Walking in a place with crowds" and 16 "Going out to a social event," and between items 14 "Walking on an uneven surface" and 15 "Walking up or down a slope."
The short FES-I also showed some misfit to the Rasch model (χ2 = 46.73; p < .01; item mean (SD) fit residual = −0.04 (1.4); person mean (SD) fit residual = −0.32 (0.88)). The PCA of the residuals supported unidimensionality, as the lower bound of the 95% CI for the proportion of significant tests overlapped 5%. Item 7 "Going up or down stairs" had a fit residual slightly below −2.5, indicating local dependency. Inspection of the item characteristic curve for this item showed only a slight deviation of the observed score from the expected score. No residual correlations ≥0.3 were found in the short FES-I.
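The two screening rules applied in these results, fit residuals outside ±2.5 and residual correlations of 0.30 or more, can be expressed as a small Python sketch. The function names are hypothetical, the inputs are assumed to be values already exported from the Rasch software, and the numbers in the usage lines are invented for illustration rather than taken from this study.

```python
import numpy as np


def flag_fit_residuals(fit_residuals, limit=2.5):
    """Flag items whose fit residual lies outside +/- limit."""
    flags = {}
    for item, value in fit_residuals.items():
        if value > limit:
            flags[item] = "possible multidimensionality"
        elif value < -limit:
            flags[item] = "possible local dependency"
    return flags


def dependent_pairs(residual_corr, labels, cutoff=0.30):
    """Item pairs whose residual correlation is at or above the cutoff."""
    r = np.asarray(residual_corr, float)
    return [(labels[i], labels[j], float(r[i, j]))
            for i in range(len(labels))
            for j in range(i + 1, len(labels))
            if r[i, j] >= cutoff]


# Hypothetical values, not the study's estimates.
flags = flag_fit_residuals({"item 4": 2.6, "item 7": -2.6, "item 1": 0.4})
pairs = dependent_pairs([[1.0, 0.35, 0.1],
                         [0.35, 1.0, -0.2],
                         [0.1, -0.2, 1.0]], ["item 12", "item 16", "item 3"])
print(flags)
print(pairs)  # [('item 12', 'item 16', 0.35)]
```

Flags of this kind only mark candidates; as in the results, the item characteristic curves are still inspected before concluding that misfit is substantial.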

Sample-to-scale targeting
As shown in Figure 1, the sample-to-scale targeting for the FES-I and the short FES-I was suboptimal. In both versions of the scale, the measurements of some persons on the left (floor effect: less FoF) were not matched by items of equivalent difficulty in relation to FoF. The mean person location was −1.229 (SD 2.733) for the FES-I and −1.152 (SD 2.103) for the short FES-I, indicating that the sample on average reported somewhat less FoF than represented by the scales.

Hierarchical order of items
Table 3 presents the hierarchical ordering of the items in the FES-I and the short FES-I. In the FES-I, item 11 "Walking on a slippery surface" was perceived as the most concerning, whereas item 3 "Preparing simple meals" was perceived as the least concerning. In the short FES-I, item 15 "Walking up or down a slope" was perceived as the most concerning, whereas item 2 "Getting dressed or undressed" was perceived as the least concerning. The ordering of the items seemed clinically reasonable for both versions of the scale.

DIF
For the FES-I, gender DIF was found for item 1 "Cleaning the house" (p = .00017), where men reported less FoF than women, and for item 2 "Getting dressed or undressed" (p = .00006), where women reported less FoF than men. First, item 2 was adjusted for gender by splitting it. As the DIF for item 1 remained, this item was also split. However, the strong correlation (r = 1.0) between the DIF-adjusted and the nonadjusted person locations indicated that the DIF for items 1 and 2 was of minor importance.
For the short FES-I, a gender DIF also occurred for item 2. As there was a strong correlation (r = 1.0) between the DIF-adjusted person locations and the nonadjusted locations for this item, it was concluded that the DIF was of minor importance.

Response category functioning
The response category thresholds worked as intended for all items in the FES-I and the short FES-I.

Reliability
Reliability was high for both the FES-I and the short FES-I (PSI with extremes 0.92/0.86; Cronbach's alpha 0.95/0.87, respectively), implying that almost five and four distinct levels of FoF (4.85/3.69 strata) can be identified by the two scales, respectively. 52
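The strata values appear consistent with the widely used formula H = (4G + 1)/3, where the separation index G = sqrt(PSI/(1 − PSI)); we assume here that this is the method of reference 52. The Python sketch below reproduces 4.85 from PSI 0.92. With the rounded PSI of 0.86 it gives about 3.64, so the reported 3.69 presumably reflects the unrounded PSI value.

```python
import math


def separation_index(psi):
    """Person separation G from a reliability value: G = sqrt(R / (1 - R))."""
    return math.sqrt(psi / (1 - psi))


def strata(psi):
    """Number of statistically distinct levels: H = (4G + 1) / 3."""
    return (4 * separation_index(psi) + 1) / 3


print(round(strata(0.92), 2))  # 4.85
print(round(strata(0.86), 2))  # 3.64
```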

Raw score transformation to interval measurements
In Appendix S1, raw scores of the FES-I and the short FES-I are transformed into linear logit values (together with their standard errors) and into linearized scores. For the FES-I, the highest information (ie, the lowest measurement error) corresponded to a raw score of 46 and a linearized score of 51.09; the corresponding figures for the short FES-I were 20 and 20.82, respectively. This could be taken as an indication of where the best cutoff between low and high concern about falling is located, with respect to the lowest measurement error.
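Under the partial credit model, the information an item provides at a location equals the variance of its score there, the test information is the sum over items, and the standard error of a person estimate is 1/sqrt(information). The Python sketch below locates the logit with maximal test information by grid search; the thresholds are hypothetical, and mapping that logit back to a raw score would require a raw-score-to-logit table such as the one in Appendix S1.

```python
import numpy as np


def item_information(theta, thresholds):
    """PCM item information at theta, ie, the variance of the item score."""
    log_num = np.concatenate(([0.0], np.cumsum(theta - np.asarray(thresholds, float))))
    w = np.exp(log_num - log_num.max())
    p = w / w.sum()
    k = np.arange(p.size)
    mean = np.dot(k, p)
    return float(np.dot(k ** 2, p) - mean ** 2)


def most_informative_location(item_thresholds, grid=None):
    """Grid search for the theta with maximal test information.

    Returns (theta, information, standard error = 1 / sqrt(information)).
    """
    if grid is None:
        grid = np.linspace(-6.0, 6.0, 1201)
    info = np.array([sum(item_information(t, tau) for tau in item_thresholds)
                     for t in grid])
    best = int(np.argmax(info))
    return float(grid[best]), float(info[best]), float(1.0 / np.sqrt(info[best]))


# Hypothetical 16-item scale with identical, symmetric thresholds.
theta_best, info_best, se_best = most_informative_location([[-1.0, 0.0, 1.0]] * 16)
print(round(theta_best, 2), round(se_best, 3))
```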

DISCUSSION
Many persons with LEoP are afraid of falling when performing everyday activities; therefore, FoF is important to assess. Here, we have, for the first time, evaluated the measurement properties of the FES-I and the short FES-I in persons with LEoP. The factor analysis showed that the FES-I was unidimensional, even though the Rasch analysis showed some misfit to the Rasch model and local dependency (in particular for the FES-I). Targeting for both the FES-I and the short FES-I was somewhat suboptimal, as our participants on average reported less FoF than represented by the scales. A negligible gender DIF was found for two items in the FES-I and for one item in the short FES-I. Reliability was high for both the FES-I and the short FES-I, and the response category thresholds worked as intended for both versions of the scale.
The explorative factor analysis resulted in a one-factor solution for the FES-I, with 70.4% explained variance and high reliability (0.97). The determinant of the matrix was low, possibly due to high associations between items (cf. Table 2). In addition, the high reliability and the low determinant may indicate some item redundancy (ie, local dependency). Our finding is in line with the study by Delbaere et al among older people, 19 which reported that the FES-I comprises one factor. However, other studies in older people 20,33 and in persons with multiple sclerosis 23 have reported that the FES-I comprises two factors, albeit with some support for unidimensionality. These differing results may be explained by the use of different factor-analytic methods. None of the above-mentioned studies 19,20,23,33 used what can today be considered the gold standard for explorative factor analysis. Instead, they used the "default methods," most likely because other studies often use these methods. The default methods include PCA with Varimax rotation, based on a Pearson correlation matrix, and criteria that tend to overestimate the number of factors to extract (eigenvalue >1 and/or scree plot). 53,54 In addition, the two-factor solutions are characterized by "difficulty factors" rather than "latent variables," 30,55 meaning that the least difficult and the most difficult items form separate factors. 20,23,33 Because we used modern gold-standard methods for factor analysis, we are confident that our analyses support considering the FES-I unidimensional.
TABLE 3 Item-level Rasch model fit of the items in the FES-I (16-i) and the short FES-I (7-i), ordered by location from the most concerning task to the least concerning task
In the Rasch analysis, we found some misfit to the Rasch model for the FES-I regarding unidimensionality and local independence. The 95% CI for the proportion of persons with significantly different measures from the two item subsets did not include 5%, and the item "Taking a bath or shower" showed a slight deviation of the observed score from the expected score. In addition, local dependency (residual correlations >0.3) was found between the items "Visiting a friend/relative" and "Going to a social event," between "Walking in a place with crowds" and "Going to a social event," and between "Walking on an uneven surface" and "Walking up/down a slope." Dependency among these items may be expected, as they partly share similar aspects (ie, social components and walking in a challenging environment). In the Rasch analysis of the short FES-I, however, unidimensionality was supported and no residual correlations ≥0.3 were found. Thus, the short FES-I seems to fit the Rasch model somewhat better than the FES-I, in line with another study in persons with multiple sclerosis. 23 Regarding the hierarchical ordering of the items in the FES-I, the participants rated "Walking on a slippery surface" as the most concerning and "Preparing simple meals" as the least concerning. In the short FES-I, participants rated "Walking up or down a slope" as the most concerning item and "Getting dressed or undressed" as the least concerning. These ratings are clinically reasonable and in accordance with previous studies in persons with LEoP, 1,2 in which activities related to walking in particular were associated with increased FoF. 7 In the present study, men reported less FoF than women when "Cleaning the house," and women reported less FoF than men when "Getting dressed or undressed." Possible reasons are that women may clean the house more often than men, and that they may more often sit down when getting dressed or undressed. However, further analysis showed that the gender DIF for these items was negligible.
To the best of our knowledge, no previous study has reported any particular gender differences for these items.
Furthermore, the Rasch analysis showed that the response category thresholds worked as intended for all items in both the FES-I and the short FES-I. Our finding is in line with the study by Halvarsson et al, who reported good response category functioning for the FES-I. 21 In addition, reliability was high for both versions of the scale (PSI 0.92 and 0.86 for the FES-I and the short FES-I, respectively), with the higher value for the FES-I. Our results are in accordance with those of previous studies of the FES-I, 20,23,33 and slightly higher for the short FES-I than those reported by Delbaere et al. 19 The sample-to-scale targeting was somewhat suboptimal for both versions of the scale (cf. Figure 1A,B). A possible reason is that our participants were fairly well functioning: all were ambulatory, around 50% had no or little balance problems, and one-third could walk >1000 meters. A previous study has shown that persons with LEoP seem to have less FoF the better their walking ability and balance. 7 Because summed FES-I and short FES-I scores are used in clinical practice and in research, we transformed the raw (ordinal) scores into linearized scores (interval measurements) (see Appendix S1). Interval scores are more appropriate than ordinal scores for parametric statistics and enable comparisons of changes in FoF over time. 56 However, transformed values can only be obtained from the raw scores if the person has responded to all items. In our study, only 4 of the 325 respondents had incomplete FES-I responses, indicating that the scale was easy to complete.

Strengths and limitations
Our participants comprised a fairly homogeneous sample of ambulatory persons with mild to moderate LEoP. Therefore, further testing of the FES-I and the short FES-I is needed in a more heterogeneous population of persons with LEoP. A strength of the study is that we included 321 persons, which is considered sufficiently large for both factor analysis 57 and Rasch analysis. 58 Several statistical methods were used to assess the measurement properties of both the FES-I and the short version of the scale. However, DIF was only assessed in relation to gender and age, and other factors that could potentially contribute to item bias need to be examined. In addition, further evaluations of other measurement properties of the two versions of the FES-I are warranted, for example, test-retest reliability and responsiveness, among persons with LEoP.

CONCLUSION
The measurement properties of the FES-I and the short FES-I in persons with LEoP were considered sufficient. The factor analysis of the FES-I supported unidimensionality and the Rasch analysis showed that both