Assessing fatigue in childhood cancer survivors: Psychometric properties of the Checklist Individual Strength and the Short Fatigue Questionnaire – a DCCSS LATER study

Abstract

Background:
Fatigue is often reported by patients with childhood cancer both during and after cancer treatment. Several instruments to measure fatigue exist, although none are specifically validated for use in childhood cancer survivors (CCS). The aim of the current study was to present norm values and psychometric properties of the Checklist Individual Strength (CIS) and Short Fatigue Questionnaire (SFQ) in a nationwide cohort of CCS.

Methods:
In total, 2073 participants were included from the Dutch Childhood Cancer Survivor Study (DCCSS) LATER cohort. Normative data, construct validity, structural validity, and internal consistency were calculated for the CIS and SFQ. In addition, reliability and a cut-off score to indicate severe fatigue were determined for the SFQ.

Results:
Correlations between CIS/SFQ and vitality measures asking about fatigue were high (>0.8). Correlations between CIS/SFQ and measures of different constructs (sleep, depressive emotions, and role functioning emotional) were moderate (0.4–0.6). Confirmatory factor analysis resulted in a four-factor solution for the CIS and a one-factor solution for the SFQ with Cronbach's alpha for each (sub)scale showing good to excellent values (>0.8). Test–retest reliability of the SFQ was adequate (Pearson's correlation = 0.88; ICC = 0.946; weighted Cohen's kappa item scores ranged 0.31–0.50) and a cut-off score of 18 showed good sensitivity and specificity scores (92.6% and 91.3%, respectively).

Conclusion:
The current study shows that the SFQ is a good instrument to screen for severe fatigue in CCS. The CIS can be used as a tool to assess the multiple fatigue dimensions in CCS.


KEYWORDS
checklist individual strength, childhood cancer survivors, psychometric properties, severe fatigue, short fatigue questionnaire

| INTRODUCTION
Cancer-related fatigue, defined as a distressing, persistent, subjective sense of physical, emotional, and/or cognitive tiredness or exhaustion related to cancer and/or cancer treatment that is not proportional to recent activity and interferes with usual functioning, 1 is often reported by patients with childhood cancer both during and after (successful) cancer treatment. It has been shown to be a debilitating late effect even years after treatment, limiting a person's daily functioning and affecting quality of life. [2][3][4] The National Comprehensive Cancer Network (NCCN) recommends screening all cancer patients for fatigue regularly during and after treatment. 1 Fatigue is a subjective, multidimensional phenomenon that is best assessed with a questionnaire. 5 Several instruments to measure fatigue exist, although none are specifically validated for use in adult childhood cancer survivors (CCS).
A frequently used multidimensional questionnaire to measure fatigue is the Checklist Individual Strength (CIS). 6 It has 20 items across four subscales that distinguish between fatigue severity, concentration problems, reduced motivation, and activity level. The CIS was validated in patients with and survivors of adult-onset cancer, 7 but not in CCS specifically.
The CIS can be a good instrument to assess the multiple dimensions of fatigue, but to screen for fatigue it is desirable to have a shorter instrument. Guidelines for survivors of adult cancer recommend screening for fatigue using a numerical rating scale (NRS), 1,8 but this may not be a reliable screening technique in CCS, as a single-item screening instrument was found not to be accurate for identifying cases of clinically significant fatigue in survivors of pediatric brain tumors. 9 Given the current lack of an adequate screening instrument, the international late effects of childhood cancer guideline harmonization group (IGHG) recommends screening for fatigue by taking a short medical history that asks about the survivor's feelings of tiredness and exhaustion. 10 Nonetheless, a systematic measure to indicate whether a person experiences severe fatigue would be preferable. A validated questionnaire with a cut-off score to indicate severe fatigue could be that measure.
A recent study showed that the Short Fatigue Questionnaire (SFQ), 11 a short version of the CIS, is an excellent instrument to screen for severe fatigue in the general population and several patient populations, including survivors of breast cancer and adult hematologic cancer survivors. 12 With a cut-off score to indicate severe fatigue, 12 the SFQ could be an objective screening instrument in CCS.
The SFQ and the CIS are questionnaires that are potentially useful in CCS care. The SFQ as a short screening instrument and the CIS as a multidimensional fatigue questionnaire have already been shown to be valid in multiple patient populations, including cancer patients and survivors of adult-onset cancer. 7,12 To test whether previously shown questionnaire properties also apply to CCS, validation of both instruments in this patient population is needed. The aim of the current study was to investigate the psychometric properties of the CIS and the SFQ in a nationwide cohort of CCS. Additionally, to indicate how CCS score on the CIS and SFQ compared to other populations, norm values are presented.

| METHODS

| Participants
Participants were included from the Dutch Childhood Cancer Survivor Study (DCCSS) LATER cohort. 13 This nationwide cohort of CCS, diagnosed before the age of 18 between January 1, 1963 and December 31, 2001 in the Netherlands, was established for a multidisciplinary DCCSS LATER program for CCS late effect care and research. Data for the study were collected during a clinic visit, which took place in the period 2017-2020 (details can be found elsewhere 14 ). Among other questionnaires (described in detail below), the CIS and SFQ were completed by the participants. A subgroup of the participants completed the SFQ twice within 1 week, once during the clinic visit and once digitally at home. The second version was part of a questionnaire survey for the whole study cohort, which most participants had already completed in 2013; a small subgroup who had not been able to participate in the original survey were asked to complete it for the current study. All participants gave written informed consent (for participants aged <16 (n = 3), parents gave additional written consent). The study was approved by the Medical Research Ethics Committee of the Amsterdam University Medical Center (registered at toetsingonline.nl, NL34983.018.10).

| CIS
The Checklist Individual Strength (CIS) 6 has 20 items (Table S1) and was designed to measure four fatigue dimensions, namely fatigue severity (CIS-fatigue; 8 items), concentration (5 items), motivation (4 items), and physical activity level (3 items). Some items are reversed before scores are added up (Table S1) to calculate the total score, which can range from 20 to 140, with a higher score corresponding to more problematic fatigue. A score of 35 or higher on the fatigue severity subscale indicates severe fatigue; this cut-off was validated in the general Dutch population. 7

| SFQ
The Short Fatigue Questionnaire (SFQ) 11 consists of four items (identical to four items of the CIS-fatigue; Table S1) measuring fatigue severity. Three item scores are reversed and then all item scores are added up, resulting in a total score that varies from 4 to 28 with a higher score reflecting more fatigue. The SFQ was validated in the general Dutch population and a cut-off score of 18 or higher was suggested to indicate severe fatigue. 12
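The scoring rule described above can be sketched in a few lines of Python. This is an illustration only, not the instrument's official scoring code. The 7-point item scale is inferred from the stated total range of 4 to 28, and the positions of the reversed items (`reversed_idx`) are placeholders, because the actual reversed items are listed in Table S1, which is not reproduced here.

```python
from typing import Sequence

SFQ_MIN, SFQ_MAX = 1, 7   # assumed 7-point item scale (4 items -> total 4..28)
SFQ_CUTOFF = 18           # cut-off for severe fatigue stated in the text


def score_sfq(items: Sequence[int], reversed_idx: Sequence[int] = (1, 2, 3)) -> int:
    """Return the SFQ total score; a higher score reflects more fatigue.

    `reversed_idx` (0-based positions) is a hypothetical placeholder:
    the real reversed items are defined in Table S1 of the paper.
    """
    if len(items) != 4:
        raise ValueError("the SFQ has exactly four items")
    total = 0
    for i, raw in enumerate(items):
        if not SFQ_MIN <= raw <= SFQ_MAX:
            raise ValueError(f"item {i + 1} out of range: {raw}")
        # Reversing maps 1 -> 7, 2 -> 6, ..., 7 -> 1.
        total += (SFQ_MIN + SFQ_MAX - raw) if i in reversed_idx else raw
    return total


def is_severely_fatigued(total: int) -> bool:
    """Apply the suggested cut-off (total score of 18 or higher)."""
    return total >= SFQ_CUTOFF
```

With the placeholder item positions, `score_sfq([7, 1, 1, 1])` gives the maximum total and `score_sfq([1, 7, 7, 7])` the minimum.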

| Other measures
To determine the relationship between the CIS/SFQ and other (fatigue related) measures, two health-related quality of life questionnaires that include aspects of fatigue (e.g., vitality) were completed.

| TAAQOL
The TNO and AZL Questionnaire for Adult's Quality of Life (TAAQOL) 15 was used (Table S1). Scale scores were calculated following instructions described elsewhere 16 with higher scores indicating better HRQOL. The TAAQOL has been validated in both the general population and in patients with chronic diseases. 15

| SF-36
The Vitality subscale of the SF-36 18 consists of four items (Table S1), covering feelings of energy and fatigue. Scale scores were calculated following instructions described elsewhere, 19 so that higher scores indicate better HRQOL. The SF-36 has been validated in several patient populations, including cancer patients 20 and CCS. 21

The questionnaires were completed digitally during the clinic visit (participants who were unable to visit the clinic or to complete the questionnaires during the visit were asked to complete a digital or paper version at home).

| Statistical analysis
IBM SPSS (IBM Corp. Released 2017. IBM SPSS Statistics for Windows, Version 25.0. Armonk, NY: IBM Corp) and R 22 were used to conduct the analyses (all tests with α = 0.05). To examine possible selection bias, study participants and non-participants (eligible CCS who did not return informed consent or did not complete study questionnaires) were compared on sex, decade of birth, childhood cancer diagnosis, decade of diagnosis, and treatment with chemotherapy and/or radiotherapy (yes/no). Cramér's V was calculated to examine effect sizes of potential differences between the groups. Seven hundred and forty-four persons explicitly refused to participate (see flowchart in Figure S1) and were therefore excluded from this analysis.
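As an illustration of the effect size used for the participant versus non-participant comparison, the following sketch computes Cramér's V from a contingency table with the standard formula V = sqrt(χ² / (n · (min(rows, columns) − 1))). It is a minimal Python example, not the SPSS or R code used in the study, and the function name is an assumption made for this sketch.

```python
import math
from typing import Sequence


def cramers_v(table: Sequence[Sequence[float]]) -> float:
    """Cramér's V for an r x c table of counts (no empty rows or columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))
```

A table in which group membership and the characteristic are unrelated yields 0, and a perfectly separated table yields 1.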

| Normative data
Mean total scores and percentile scores of the SFQ and CIS (total and subscale scores) were calculated.

| Construct validity
Pearson's correlations between the CIS/SFQ and the vitality subscales of the TAAQOL and the SF-36 were calculated (convergent validity). As both vitality subscales assess symptoms of fatigue, strong correlations (r ≥ 0.7) were expected. Pearson's correlations between the CIS/SFQ and the sleep and depressive emotions subscales of the TAAQOL and the role functioning emotional subscale of the SF-36 were also calculated (discriminant validity). We expected fatigue to be moderately related to the HRQOL concepts sleep, depressive emotions, and emotional functioning (r between 0.4 and 0.7) because these concepts overlap with fatigue to some extent but differ from it, as was pointed out by previous studies. 23,24
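The construct validity check can be illustrated with a small Python sketch that computes Pearson's r and compares it with the expected bands from the text. The helper `meets_expectation` and its use of the absolute value are assumptions of this sketch: the sign of r depends on the scoring direction of each scale (higher CIS/SFQ scores mean more fatigue, higher HRQOL scores mean better functioning), so only the magnitude is compared.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def meets_expectation(r: float, kind: str) -> bool:
    """Check |r| against the expected band for the given validity type."""
    magnitude = abs(r)
    if kind == "convergent":      # strong correlation expected (r >= 0.7)
        return magnitude >= 0.7
    if kind == "discriminant":    # moderate correlation expected (0.4 to 0.7)
        return 0.4 <= magnitude <= 0.7
    raise ValueError(f"unknown validity type: {kind}")
```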

| Structural validity
Confirmatory factor analysis (CFA) was performed to determine a single-factor structure for the SFQ (fatigue severity) and a four-factor structure for the CIS (fatigue severity, concentration, motivation, and physical activity). The structure of both instruments had already been validated in the general (Dutch) population, 7,12 but had yet to be confirmed in CCS specifically. Maximum likelihood estimation 25 with direct Oblimin rotation 26 was performed. Eigenvalues ≥1 were used to identify factors, and factor loadings were then calculated for each item (>0.4 was considered a good factor loading). The item correlation matrix, the Kaiser-Meyer-Olkin (KMO) test, and Bartlett's test of sphericity were used to test the adequacy of the data for factor analysis. Model fit was examined using the root mean square error of approximation (RMSEA), with a value <0.06 as a cut-off value to indicate good model fit. 27

| Internal consistency
To indicate whether the items of the instruments measure the same underlying constructs, Cronbach's alpha was calculated for the SFQ and the four subscales of the CIS. It is a measure between 0 and 1 indicating how items hold together in a scale where >0.9 is seen as excellent, >0.8 as good, >0.7 as acceptable, and <0.6 as poor internal consistency.
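For readers who want to see the computation, the following Python sketch implements the standard Cronbach's alpha formula, alpha = k/(k − 1) · (1 − Σ item variances / variance of the total score), together with the interpretation labels given above. The function names are assumptions of this sketch. The text does not label values between 0.6 and 0.7, so the sketch leaves that range unlabeled rather than guessing.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one respondent's item scores."""
    k = len(rows[0])                                  # number of items
    item_vars = [variance(col) for col in zip(*rows)]  # per-item sample variance
    total_var = variance([sum(row) for row in rows])   # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def label_alpha(alpha: float) -> str:
    """Interpretation bands as listed in the text."""
    if alpha > 0.9:
        return "excellent"
    if alpha > 0.8:
        return "good"
    if alpha > 0.7:
        return "acceptable"
    if alpha < 0.6:
        return "poor"
    return "not labeled in the text"
```

Applied to the reported values, `label_alpha(0.92)` returns "excellent" for the SFQ and `label_alpha(0.85)` returns "good" for the CIS motivation subscale.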

| Reliability
To determine test-retest reliability of the SFQ, data from a subgroup of the participants (n = 90) who completed the SFQ twice within 1 week were analyzed in three ways (following Bruton, Conway & Holgate, who proposed that a combination of approaches is more likely to give a true picture of the instrument's reliability 28 ). Pearson's correlation and the intraclass correlation coefficient (ICC) 29 between the two measurement moments were calculated for total scores, and weighted Cohen's kappa (Kw) 30 was calculated for each individual item of the SFQ. To calculate the ICC, a two-way random effects, absolute agreement, single measurement model was used. 31 The CIS was completed only once; therefore, its test-retest reliability could not be investigated.
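The ICC model named above can be sketched as follows. This is a minimal Python illustration that assumes the common ANOVA-based formula for a two-way random effects, absolute agreement, single measurement ICC: (MSR − MSE) / (MSR + (k − 1)·MSE + (k/n)·(MSC − MSE)), where MSR, MSC, and MSE are the mean squares for subjects, measurements, and error. It is not the SPSS or R routine used in the study, and the function name is hypothetical.

```python
from typing import Sequence


def icc_absolute_single(scores: Sequence[Sequence[float]]) -> float:
    """ICC (two-way random, absolute agreement, single measurement).

    `scores` has one row per participant and one column per measurement
    moment (here: first and second SFQ total score).
    """
    n = len(scores)        # participants
    k = len(scores[0])     # measurement moments
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(col) / n for col in zip(*scores)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Unlike Pearson's correlation, this coefficient penalizes a systematic shift between the two measurements: for the toy data `[[1, 2], [2, 3], [3, 4]]` Pearson's r is 1, while the ICC is 2/3.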

| Cut-off score SFQ
To confirm whether the suggested cut-off score of 18 for the SFQ to determine severe fatigue can also be used in the CCS population, an ROC analysis was performed. True "severe fatigue cases" were determined using the cut-off score of 35 on the CIS-fatigue. Sensitivity (proportion of true severe fatigue cases correctly identified by the SFQ) and specificity (proportion of true non-cases correctly identified by the SFQ) were calculated for a range of possible cut-off scores for the SFQ (18 ± 2). Youden's index was calculated, a value between 0 and 1 with a higher value suggesting a better cut-off point. 32,33 In addition, the positive predictive value (PPV; proportion of severe fatigue cases identified by the SFQ that are true cases) and negative predictive value (NPV; proportion of non-cases identified by the SFQ that are truly non-cases) were calculated.
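The quantities defined above follow directly from a 2 x 2 classification table. The Python sketch below shows one way to compute them for a single candidate SFQ cut-off, using the CIS-fatigue cut-off of 35 as the reference standard as described in the text. The function name and the toy data in the usage note are illustrative assumptions, not study data.

```python
from typing import Dict, Sequence


def screening_metrics(
    sfq_totals: Sequence[int],
    cis_fatigue: Sequence[int],
    sfq_cutoff: int = 18,
    cis_cutoff: int = 35,
) -> Dict[str, float]:
    """Sensitivity, specificity, Youden's index, PPV, and NPV for one cut-off."""
    tp = fp = tn = fn = 0
    for sfq, cis in zip(sfq_totals, cis_fatigue):
        flagged = sfq >= sfq_cutoff    # screen-positive on the SFQ
        is_case = cis >= cis_cutoff    # "true" severe fatigue case (reference)
        if flagged and is_case:
            tp += 1
        elif flagged and not is_case:
            fp += 1
        elif not flagged and is_case:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

Calling the function for each cut-off from 16 to 20 and keeping the one with the largest `youden` value mirrors the selection rule in the text. With the sensitivity and specificity reported for a cut-off of 18 (92.6% and 91.3%), Youden's index is 0.926 + 0.913 − 1 = 0.839.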
If a participant had one or more missing values for the CIS/SFQ items and therefore no subscale score could be determined, the participant was excluded from the analyses of that particular subscale. Table S1 shows the number of participants for each subscale.

| RESULTS
In total, 2073 participants (43.8% of eligible persons) were included in the current study (flowchart in Figure S1). A comparison with non-participants is shown in Table S3. There were no large differences between the groups (small effect sizes), suggesting no selection bias in the study cohort. Table 1 shows the participant characteristics.

| Normative data
Mean total scores and subscale scores for the CIS and SFQ are shown in Table 2. Also, percentile scores (25th, 50th, and 75th) are presented.

| Construct validity
Pearson's correlations between the total score of the CIS/SFQ and the vitality, sleep, and depressive emotions subscales of the TAAQOL and the vitality and emotional role functioning subscales of the SF-36 are shown in Table 3. Correlations between CIS/SFQ and the vitality subscales were high (>0.8), indicating good convergent validity. Correlations between CIS/SFQ and the HRQOL domains sleep, depressive emotions, and role functioning emotional were moderate (between 0.4 and 0.6), indicating good discriminant validity.

| Structural validity

| CIS
Item correlations ranged from 0.51 to 0.82 for items of the same subscale and from 0.28 to 0.78 for items of different subscales. The KMO test showed a value of 0.96, Bartlett's test of sphericity was significant (p < 0.001), and the RMSEA was 0.07. CFA resulted in a four-factor solution, with the factors explaining 53.7%, 11.0%, 6.4%, and 5.0% of the variance, respectively (76.1% total variance explained). The scree plot in Figure S2A confirms a four-factor solution. The eigenvalues for the four factors and factor loadings of all items (range 0.443-0.925) are shown in Table S2. All items loaded well (>0.4) on their original subscale, with items 14 and 20 loading well on two subscales (the fatigue severity and activity subscales).

| SFQ
Item correlations were all >0.7, the KMO test showed a value of 0.84, the Bartlett's test of sphericity was significant (p < 0.001), and the RMSEA was 0.13. CFA resulted in a one-factor solution explaining 81.1% of the variance. In Figure S2B, the scree plot is shown with an eigenvalue of 3.245 confirming a one-factor solution. Table S2 presents the factor loadings of the items, which were all good (range 0.785-0.939).
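The reported eigenvalue and the reported share of variance are consistent with each other. As a worked check, assuming the eigenvalue was taken from the correlation matrix of the four SFQ items (whose eigenvalues sum to the number of items, p = 4), the proportion of variance explained by the single factor is:

```latex
\[
\frac{\lambda_1}{p} = \frac{3.245}{4} \approx 0.811 \quad (81.1\%)
\]
```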

| Internal consistency
Cronbach's alpha for the subscales fatigue severity, concentration, motivation, and physical activity level of the CIS was 0.95, 0.91, 0.85, and 0.91, respectively (alpha for all 20 items was 0.95), indicating good to excellent internal consistency. Cronbach's alpha for the SFQ was 0.92 indicating excellent internal consistency.

| Reliability
A total of 90 participants completed the SFQ twice within 1 week. Thirty-nine participants completed both SFQ questionnaires on the same day and 51 participants completed the second SFQ within a week of the first one (the mean interval between the two measurements was 4 days; n = 90). Pearson's correlation between total scores of the two SFQ measurements was high (0.88; p < 0.001) and the ICC was excellent (0.946; 95% CI: 0.907-0.967). Kw scores for items 1-4 were 0.50, 0.43, 0.31, and 0.34, respectively (all p < 0.01), reflecting fair to moderate item agreement.

| Cut-off score SFQ
ROC analysis showed an area under the curve of 0.974 (95% CI: 0.969-0.980). Sensitivity and specificity of the suggested cut-off score of 18 (± 2) are presented in Table 4. This table also shows the PPV and NPV. The suggested cut-off score of 18 had the highest value for the Youden's index (highest combined sensitivity and specificity) and showed good PPV and NPV.

| DISCUSSION
The aim of the current study was to present norm values and psychometric properties of the SFQ, a four-item screening instrument for severe fatigue, and the CIS, a multidimensional fatigue questionnaire, in a nationwide cohort of CCS. Results show psychometric properties of the SFQ and CIS to be good in CCS and therefore the SFQ can be used to screen for severe fatigue in this population and the CIS can be used to evaluate the multiple fatigue dimensions.
The IGHG guideline 10 suggests screening regularly for severe fatigue; however, no screening instrument had yet been validated in CCS. Single-item screening (with the fatigue thermometer) was shown not to be reliable for indicating clinically significant fatigue in survivors of adolescent brain tumors, 9 suggesting that a multiple-item instrument is better suited for screening in CCS. Given the lack of a validated screening instrument, it is currently suggested to screen for fatigue by taking a medical history focused on the survivor's feelings of tiredness and exhaustion (at every long-term follow-up visit). Recommended questions to ask are "do you get tired easily" or "are you too tired or exhausted to enjoy the things you like to do." 10 The first of these questions is asked in the SFQ, accompanied by questions asking about exhaustion and fitness level. Looking at the suggestions made in the guideline, the SFQ meets the requirements to screen for fatigue in CCS. The current study showed the psychometric properties of the SFQ in CCS to be adequate. In addition, the suggested cut-off score of 18 to indicate severe fatigue showed the highest combined sensitivity and specificity, as well as a good PPV and NPV in CCS. The SFQ is therefore well suited for fatigue surveillance.
The guideline further suggests additional testing with a validated fatigue measure for survivors with an indication of severe fatigue. 10 The PedsQL Multidimensional Fatigue Scale or the PROMIS Pediatric Fatigue measure is suggested, as both have been validated in CCS. 34,35 However, the psychometric properties presented in those studies are limited. The psychometric properties of the PedsQL presented by Robert et al. 34 are good (Cronbach's alpha of total score and three subscales all ≥0.88), but other psychometric properties remain to be determined. Also, the cohort in which the study was conducted was relatively small (n = 64) and did not include all childhood malignancies (only CNS, hematological, lymphoma, and solid tumor cancer diagnoses were included). The study by Hinds et al. 35 showed the PROMIS Pediatric Fatigue measure to be a valid instrument to distinguish different levels of fatigue and showed that it is feasible for cancer patients and survivor populations to properly complete the questionnaire. However, no psychometric properties of the PROMIS in CCS were presented and the studied cohort only included survivors of leukemia, lymphoma, brain tumors, or solid tumors. The current study was the first to validate the CIS in a nationwide cohort of CCS including all childhood malignancies and showed the CIS to be a good instrument to investigate multiple dimensions of fatigue. The structural validity we found in CCS (four-factor solution) is comparable to what has been reported in the general Dutch population, 7 a Japanese working population, 36 a healthy Portuguese population, 37 and a population of patients with rheumatoid arthritis. 38 Items 14 (Physically I am in bad shape) and 20 (Physically I feel I am in good shape) had good factor loadings (≥0.4) for both the fatigue severity subscale and the physical activity subscale, meaning these items could be used for both subscales. However, to ensure optimal comparison of subscale scores between different populations, we suggest using the original structure of the CIS (items 14 and 20 in the fatigue severity subscale).
Correlations with the vitality subscales of the SF-36 and TAAQOL were good (Table 3). A high correlation with these subscales, which ask about (life) energy, tiredness, and exhaustion, means that these issues and symptoms of fatigue are reflected in the total score of the CIS and SFQ as well. On the other hand, moderate correlations with the sleep, depressive emotions, and role functioning emotional subscales show that the CIS and SFQ can discriminate well between fatigue and these symptoms, which often interfere with fatigue.
Norm values can help in interpreting results. Subscale scores and total norm scores of the CIS were comparable to norm scores of adult-onset breast cancer and hematological cancer survivors. 7 Compared to norms of the general Dutch population, CCS score higher on all subscales and the total score of the CIS. As previous literature showed symptoms of fatigue to be more prevalent in CCS compared to controls, 4,39,40 it was expected that CCS would show higher norm values. Since no large differences in diagnosis and treatment-related variables between participants and non-participants were found, we assume norm values of the current CCS study cohort to be generalizable.
A limitation of the current study was the lack of a gold standard for confirming the cut-off score of 18 for severe fatigue on the SFQ. No validated cut-off instrument to indicate severe fatigue was yet available in CCS and therefore we used the cut-off score of the CIS (≥35) as a gold standard in the current study. The current study showed the structure and internal consistency of the items and subscales of the CIS to be good and comparable to those in populations in which it has already been widely used (general population, survivors of adult cancer) 7 and we therefore believe that the cut-off score of 35 can be safely used in CCS as well.

TABLE 4 Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of several SFQ cut-off scores

To conclude, with a growing population of cancer survivors worldwide and fatigue as a frequently reported late effect, systematic screening for clinically significant fatigue will become increasingly important. The current study shows the SFQ to be a good instrument to screen for severe fatigue in CCS. Should the SFQ indicate that a person is severely fatigued (total score ≥18), additional testing is suggested, and the CIS can then be used as a tool to assess the multiple fatigue dimensions.