Evaluating quality of life in frailty: applicability and clinimetric properties of the SarQoL® questionnaire

Abstract Background The SarQoL® questionnaire was specifically designed to measure quality of life (QoL) in sarcopenia. Frailty and sarcopenia have areas of overlap, notably weak muscle strength and slow gait speed, which may mean that the SarQoL could provide a measure of QoL in frailty. This study aimed to evaluate the clinimetric properties of the SarQoL questionnaire in physical frailty using the Fried criteria. Methods Analyses were carried out on data from the Sarcopenia and Physical impairment with advancing Age study. Frailty was assessed with the Fried criteria and QoL with the SarQoL, the Short‐Form 36‐Item, and the EuroQoL 5‐Dimension (EQ‐5D) questionnaires. We evaluated discriminative power (with the Kruskal–Wallis analysis of variance test), internal consistency (with Cronbach's alpha), construct validity (through hypotheses testing), test–retest reliability (with the intraclass correlation coefficient), measurement error (calculating standard error of measurement and smallest detectable change), and responsiveness (through hypotheses testing and standardized response mean). Results In total, 382 participants were included for the validation and 117 for the responsiveness evaluation. They had a median age of 73 (69–79) years, took 5 (3–8) drugs, and had 4 (3–5) co‐morbidities. There were more women (n = 223; 58.4%) than men and, in total, 172 (45%) robust, 167 (44%) pre‐frail, and 43 (11%) frail participants. Discriminative power was confirmed when significantly lower (P < 0.001) overall SarQoL scores, and thus also worse QoL, were observed between robust [77.1 (64.35–85.90)], pre‐frail [62.54 (53.33–69.57)], and frail [49.99 (40.45–56.06)] participants. Six of the SarQoL domains performed likewise, with significantly lower scores according to frailty status with Domain 7 (fears) being the exception. Internal consistency was good (α = 0.866). Convergent (using Short‐Form 36‐Item and EQ‐5D) and divergent construct validity (using EQ‐5D) was confirmed. 
Test–retest reliability was excellent [intraclass correlation coefficient = 0.918 (0.834–0.961)], with a standard error of measurement of 3.88 and a smallest detectable change of 10.76 points. We found moderate responsiveness when five of the nine hypotheses were confirmed, coupled with a large effect size for the overall SarQoL score (corrected standardized response mean of −1.44). Conclusions The SarQoL questionnaire has adequate clinimetric properties for use with frail patients in clinical practice and trials and could provide data that are more appropriate and detailed than the generic questionnaires currently used.


Introduction
The World Health Organization declared the period from 2020 to 2030 to be the decade of healthy ageing, which they define as 'the process of developing and maintaining the functional ability that enables wellbeing in older age'. 1 This concept is closely linked to the syndrome of frailty, a clinically recognizable state of increased vulnerability in older people, caused by age-related losses in physiological reserves and function across multiple organ systems, such that the ability to cope with everyday or acute stressors is compromised. 2 This state of increased vulnerability is associated with negative health outcomes, as evidenced by a recent meta-analysis, which found an increased likelihood of premature mortality, hospitalization, and institutionalization. 3 Frailty was also associated with an increased risk for developing disability in both basic and instrumental activities of daily living, an increased risk for physical limitations, dependency, falling, fractures, cognitive decline, decline in lean body mass, and lower life satisfaction. 3 These outcomes, in combination with an estimated prevalence of 10.7-18%, mean that frailty represents an important burden on public health. [4][5][6] While hard outcomes such as mortality and hospitalizations remain the primary indicators in research settings, outcomes measuring the subjective experience of patients are becoming an essential part of the arsenal. Health-related quality of life is one of the main patient-reported outcome measures used in research, and several studies have already focused on quality of life (QoL) in frailty in the last decade. A 2019 systematic review listed 22 studies that assessed QoL in frailty and demonstrated that frail participants had worse QoL than robust participants. However, these differences between frail and robust people were only clear for the sub-concepts of physical functioning and satisfaction with life.
For social and environment scales, results were inconsistent between the different questionnaires used, limiting their usefulness in assessing the psychosocial well-being of pre-frail and frail individuals. In this systematic review, the Short-Form 36-Item (SF-36) was the most frequently used instrument out of the 14 instruments included, followed by the WHOQOL-BREF, the CASP-19, and the EUROHIS-QOL. 7 Several observations can be made from the results of this systematic review. First, the SF-36, which was the most frequently used instrument to measure QoL in frailty, is a generic instrument and not adapted to specific populations or diseases. 8 While generic instruments allow QoL to be compared between a range of populations, specific instruments often possess better construct validity and are more sensitive to changes in QoL over time. 9 Secondly, the concept of QoL and the components needed to provide a holistic assessment were interpreted differently between each of the QoL questionnaires. While some concepts from the generic QoL questionnaires mentioned previously are shared with the sarcopenia quality of life (SarQoL®) questionnaire (i.e. physical and mental health and activities of daily living), others such as 'body composition', 'leisure activities', and 'fears' are unique.
The systematic review did not include frailty-specific QoL instruments. A QoL instrument specific to the frailty syndrome might improve sensitivity to change in disease-specific QoL over time in this group. 10 One such specific questionnaire is the SarQoL questionnaire, developed in 2015 with the aim of measuring health-related QoL in sarcopenic persons. 11 The questionnaire was constructed using input from experts, literature review, and crucially, interviews with older, sarcopenic individuals. It has been validated for use with sarcopenic, older, community-dwelling participants in multiple languages and has consistently been shown to be a valid and reliable instrument, as well as responsive to changes in QoL. [12][13][14][15][16][17][18][19][20][21] Multiple authors have argued that the conceptual frameworks of frailty and age-related sarcopenia overlap substantially, notably on the similar clinical manifestations used to diagnose the two conditions. The slowness indicator in the Fried criteria for frailty and the low gait speed indicator used to characterize muscle function in sarcopenia are one area of overlap between the two conditions. Partial overlap exists between weight loss in frailty and muscle loss in sarcopenia, and fatigue/exhaustion in frailty and grip strength in sarcopenia. Some have argued that sarcopenia is equivalent to the physical component of frailty, separate from the cognitive, psychological, sociological, and spiritual components of frailty. [22][23][24] Because of the overlap between sarcopenia and physical frailty, we considered it worthwhile to explore whether the SarQoL questionnaire could be used in the assessment of QoL in frail and pre-frail individuals, as diagnosed with the Fried criteria. This study aims to examine the clinimetric properties of the SarQoL questionnaire in robust, pre-frail, and frail participants of the Sarcopenia and Physical impairment with advancing Age (SarcoPhAge) study.

Population
The analyses described in this manuscript have been carried out using the data collected during the SarcoPhAge study. This cohort study followed a sample of community-dwelling older people for 5 years and has been described in multiple publications. [25][26][27][28][29] In brief, the SarcoPhAge study recruited a convenience sample of volunteers aged 65 years or older living in the Liège province of Belgium. Participants were recruited from different departments of an outpatient clinic in Liège, as well as through advertisement in the local press. Candidates were not eligible for inclusion in the cohort if they presented with a body mass index (BMI) >50 kg/m² or if they had one or more amputated limbs. No other exclusion criteria were applied. Participants were invited to the research centre once yearly, where they performed physical tests and completed questionnaires. 25 For the analyses presented here, we used data from Year 1 of follow-up, except for the evaluation of the responsiveness of the questionnaire, where we used data from the visits carried out at 1 and 5 years into the study.

Frailty evaluation
In the SarcoPhAge sample, physical frailty was evaluated with the criteria described by Fried et al. 30 The Fried diagnostic criteria evaluate five items to determine whether a person is considered to be robust, pre-frail, or frail. In this study, the five criteria were measured with the following instruments: weakness was present if handgrip strength measured with a hydraulic dynamometer was below the cut-offs based on gender and BMI; low gait speed was detected by evaluating usual walking speed on a 4 m track (results corrected to a 4.5 m track) with cut-offs based on gender and height; low physical activity was measured with the Minnesota Leisure Time Activity Questionnaire 31 using gender-specific cut-offs for kilocalories used in physical activity in the preceding week; exhaustion was established using two items from the Center for Epidemiological Studies Depression scale; 32 and weight loss was detected through a self-reported question on unintentional weight loss of more than 4.5 kg in the past year. 25,30 For each item, participants were given 1 point if below the cut-off, and 0 if not, and these item scores were summed for a frailty score between 0 and 5. Participants with zero points were considered robust, a score of 1 or 2 points indicated a pre-frail state, and subjects with a score of 3 or more points were considered to be frail. A detailed description of the criteria, instruments, and cut-off values is provided in Supporting Information, Table S1.
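The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's own code: the function name `fried_status` and its boolean arguments are hypothetical, and each argument is assumed to already encode whether the corresponding criterion was met according to the cut-offs in Table S1.

```python
def fried_status(weakness, slowness, low_activity, exhaustion, weight_loss):
    """Return (score, status) from five binary Fried criteria.

    Each argument is True when the criterion is met (1 point) and
    False otherwise (0 points). Deciding whether a criterion is met
    (cut-offs by gender, BMI, height) is assumed to happen upstream.
    """
    criteria = (weakness, slowness, low_activity, exhaustion, weight_loss)
    score = sum(1 for met in criteria if met)  # frailty score between 0 and 5

    if score == 0:
        status = "robust"
    elif score <= 2:
        status = "pre-frail"
    else:
        status = "frail"
    return score, status


if __name__ == "__main__":
    # A participant with weak grip strength and exhaustion only.
    print(fried_status(True, False, False, True, False))
```

Keeping the cut-off logic outside the function separates the measurement step from the classification step, which is the only part fully specified in the text.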

Quality of life measurement
The SarQoL questionnaire, the focus of this validation study, is a patient-reported outcome measure specifically designed to evaluate QoL in older, sarcopenic, community-dwelling people. There are 55 items in the questionnaire, categorized into seven domains of health-related dysfunction: (i) physical and mental health, (ii) locomotion, (iii) body composition, (iv) functionality, (v) activities of daily living, (vi) leisure activities, and (vii) fears. A score between 0 (worst QoL) and 100 (best QoL) is provided for each domain, and an overall QoL score (range: 0-100 points) is calculated on the entirety of the questionnaire. 11 The scoring algorithm is not publicly available, but tools to calculate the scores are available upon request via info@sarqol.org or via the website www.sarqol.org and free for non-sponsored research. The questionnaire is self-reported and takes about 15 min to complete. The SarQoL questionnaire has been validated in multiple languages and has been shown to be a valid and reliable instrument. 13, [15][16][17][18][19]33 The questionnaire was shown to be responsive to changes in QoL in a sample of 42 sarcopenic subjects followed over 3 years, and its standard error of measurement (SEM) and smallest detectable change have been calculated in different European populations as well as pooled. 20,21 Complementary to the SarQoL questionnaire, two generic QoL questionnaires were also completed by each participant to allow the evaluation of the construct validity of the SarQoL questionnaire. The first of these, the SF-36 questionnaire, measures functional health and well-being from the patient's perspective, providing eight domain scores (physical functioning, social functioning, role functioning physical, role functioning emotional, vitality, bodily pain, mental health, and general health) and two summary scores (physical and mental), all scored from 0 (worst QoL) to 100 (best QoL) points.
34 Secondly, the EuroQoL 5-Dimension 3-Level (EQ-5D-3L) and the associated visual analogue scale (EQ-VAS) were administered. The EQ-5D is a generic measure of health status, which records self-reported problems (none, some, and extreme) on five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression). 35 Results are reported as an index score (between 0 and 1, with 0 indicating death and 1 indicating perfect health) and a self-rated health evaluation (between 0, worst imaginable health, and 100, best imaginable health). 36

Clinimetric properties
The measurement properties to be included in this validation were selected based on the COSMIN taxonomy and its related documentation. 37,38 These include known-groups validity (also known as discriminative power), internal consistency, construct validity (through hypotheses testing), reliability (test-retest), measurement error, and responsiveness. We also looked at the presence of floor and/or ceiling effects and provided the smallest detectable change to aid in the interpretation of the evolution of the SarQoL scores over time.
i Known-groups validity is based on the hypothesis that two or more groups with distinctive characteristics should logically differ in the construct that is measured. 39 In the context of this study, the hypothesis is that robust participants should have higher QoL scores than pre-frail and frail participants, which would mean that the SarQoL questionnaire can discriminate between the three frailty profiles. ii Internal consistency quantifies the degree of interrelatedness between the items in the questionnaire, that is, whether all items in the SarQoL measure the same underlying construct (QoL). 37,38 iii Construct validity is used to assess whether the questionnaire under investigation actually measures what it theoretically aims to measure. This is performed by comparing the questionnaire with other questionnaires (or subscales of) that should, in theory, measure the same construct (convergent validity) or a different construct (divergent validity). 37,38 In this study, we utilized the same eight hypotheses on the strength of association between the overall QoL score of the SarQoL questionnaire and domains of the SF-36 and EQ-5D that were used in previous validations. 13-19,33 iv The test-retest reliability of the questionnaire shows whether the scores measured by the SarQoL questionnaire remain stable between multiple administrations, on the condition that the participants' health state also remains stable. 37,38 To measure this, the SarQoL questionnaire was administered twice, with an approximate interval of 2 weeks in between, and participants provided information on the stability of their health. Because of the different objectives of the study that collected the data analysed in this article, only the 43 participants who were diagnosed as sarcopenic with the European Working Group on Sarcopenia in Older People criteria were invited, at the time of the original validation study, to participate in the retest part, with 30 providing usable data. 
13 Reliability is also demonstrated by the SEM, which provides a measure of the dispersion of observed scores around the 'true' score from repeated measurements. The smallest detectable change provides the value for the minimum change in QoL scores that needs to be observed to be certain that the measured change in QoL is real and not possibly due to measurement error. 38 v Floor and ceiling effects indicate that the range of the scale is too narrow and that extreme profiles cannot be accurately measured. They are present when >15% of the participants obtain either the highest or lowest score. vi The last clinimetric property investigated was the responsiveness of the questionnaire, that is, its capacity to detect change over time, between the first and fifth years of the SarcoPhAge study. 21 We used the same methodology as in a previous evaluation of the responsiveness of the SarQoL, which was a combination of hypothesis testing and effect size evaluation. 21 In short, we evaluated nine hypotheses (see Table 5) on the theorized strength of correlation between the changes observed with the SarQoL questionnaire between Year 1 and Year 5 and the changes observed with (domains of) the SF-36, EQ-5D, and EQ-VAS. The results were interpreted with the criteria from de Boer et al., which indicate that a questionnaire has high responsiveness if at least 75% of hypotheses are confirmed, moderate responsiveness when 50-75% are confirmed, and poor responsiveness when less than 50% are confirmed. 40 In this analysis, we included all participants for whom we had valid data at Year 1 and Year 5. For the second method, we calculated standardized response means (SRMs) (a measure of effect size), which reflect the magnitude of change measured by the SarQoL and by the other questionnaires used in this study. Larger effect sizes indicate that the questionnaire possesses better responsiveness.
41 Because this method is based on the assumption that a change in health status has occurred, we could only include those participants for whom we had valid data at Year 1 and Year 5 and whose frailty status changed in the years between evaluations. The change in frailty status is used here as a proxy measure of change in health status, and we hypothesize that a change in frailty status will be reflected in the observed change in QoL.
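The hypothesis-based interpretation of responsiveness described above reduces to a simple threshold rule. The sketch below is illustrative only: the function name `responsiveness_rating` is hypothetical, and treating exactly 75% as 'high' and exactly 50% as 'moderate' is our reading of the criteria as summarized in the text.

```python
def responsiveness_rating(confirmed, total):
    """Rate responsiveness from the share of confirmed hypotheses.

    Thresholds follow the criteria as summarized in the text:
    at least 75% confirmed -> high, 50% up to 75% -> moderate,
    below 50% -> poor.
    """
    if total <= 0 or not 0 <= confirmed <= total:
        raise ValueError("need 0 <= confirmed <= total and total > 0")

    share = confirmed / total
    if share >= 0.75:
        return "high"
    if share >= 0.50:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    # Five of nine hypotheses confirmed, as reported for this study.
    print(responsiveness_rating(5, 9))
```

With five of nine hypotheses confirmed (about 56%), the rule returns the 'moderate' rating reported in this article.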

Statistical analysis
Normality of distribution for quantitative variables was tested with the Shapiro-Wilk test, by comparing mean and median and by evaluating the histogram and Q-Q plot. Continuous variables following a Gaussian distribution are reported as mean ± standard deviation, while those that do not are reported as median (25th-75th percentile). Nominal variables are reported as absolute (n) and relative (%) frequencies. The evaluation of differences between groups for nominal variables was carried out using Pearson's χ² test. All results were considered significant at the 5% level (P ≤ 0.05), except for pairwise comparisons between the robust, pre-frail, and frail groups, which were considered significant at P ≤ 0.017 (P-value adjusted for the number of comparisons: 0.05/3). iv The test-retest reliability was quantified by calculating the intraclass correlation coefficients (ICCs) (two-way mixed model, absolute agreement type) between the scores from the first and the second administrations. ICCs greater than 0.7 indicate acceptable reliability. 38 The SEM was calculated by dividing the standard deviation of the difference between the scores from the first administration and those of the second administration by the square root of 2. This gives the following formula: SEM = SD(test score − retest score)/√2. The smallest detectable change is derived from the SEM value, by the following formula: 1.96 × √2 × SEM. v Floor and ceiling effects were evaluated following inspection of the frequency tables. vi Finally, SRMs, a measure of effect size and used to evaluate responsiveness, were calculated by dividing the mean difference between the SarQoL scores from the first year and the fifth year of the SarcoPhAge study by the standard deviation of the differences between these paired values.
The SRM values were subsequently transformed with the formula SRM/√2/√(1 − r), where 'r' signifies the correlation between Year 1 and Year 5 scores. 43
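The three formulas for the SEM, the smallest detectable change, and the uncorrected SRM can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the sample standard deviation (n − 1 denominator) is assumed, and the SRM is computed on follow-up minus baseline so that a decline in QoL gives a negative value. The correction applied to the SRM is not reproduced here.

```python
import math
import statistics


def sem_from_test_retest(test, retest):
    """SEM = SD(test score - retest score) / sqrt(2).

    Uses the sample standard deviation (an assumption; the text does
    not say which denominator was used).
    """
    diffs = [t - r for t, r in zip(test, retest)]
    return statistics.stdev(diffs) / math.sqrt(2)


def smallest_detectable_change(sem):
    """Smallest detectable change = 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem


def standardized_response_mean(baseline, follow_up):
    """Uncorrected SRM: mean of paired differences / SD of those differences.

    Differences are follow-up minus baseline (an assumed sign convention).
    """
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


if __name__ == "__main__":
    # Illustrative scores only, not study data.
    test = [70.0, 60.0, 80.0, 50.0]
    retest = [68.0, 63.0, 79.0, 54.0]
    sem = sem_from_test_retest(test, retest)
    print(round(sem, 2), round(smallest_detectable_change(sem), 2))
    print(round(smallest_detectable_change(3.88), 2))
```

Applying `smallest_detectable_change` to the rounded SEM of 3.88 points reported in this article gives approximately 10.75 points, which agrees with the reported 10.76 once rounding of the SEM is taken into account.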

Clinical characteristics
In total, 382 subjects were eligible for inclusion at the first follow-up visit of the SarcoPhAge study. These subjects had a median age of 73 (69-78) years, were slightly overweight at a median BMI of 27 (24-30) kg/m², took a median of 5 (3-8) drugs, and had a median of 4 (3-5) co-morbidities. There were slightly more women (n = 223; 58.4%) than men in the sample. The median grip strength was 39 (33-45) kg for men and 21 (17.5-25) kg for women. Lastly, the median gait speed in the complete sample was 1.09 (0.91-1.28) m/s. All 382 participants were evaluated for frailty with the Fried criteria, and we found 172 (45%) robust, 167 (44%) pre-frail, and 43 (11%) frail individuals. Clinical characteristics were significantly different between these three groups. Frail participants were older than pre-frail participants, who, in turn, were older than robust individuals (P < 0.001). The same dynamic was present for BMI (with frail participants having the highest BMI; P < 0.001), drug consumption (with frail participants taking the most drugs; P < 0.001), and co-morbidities (with frail participants having the highest number of co-morbidities; P < 0.001). As expected, frail participants had lower grip strength than pre-frail participants, who, in turn, had lower grip strength than robust people (P < 0.001 for men and women). The same was observed for gait speed (with frail participants having the lowest gait speed; P < 0.001). Detailed results and pairwise comparisons are available in Table 1.

Known-groups validity
The overall QoL score measured by the SarQoL questionnaire was significantly different (P < 0.001) between the three categories of frailty, following a downward trend with robust participants having the best QoL [77.10 (64.35-85.90)], followed by pre-frail [62.54 (53.33-69.57)] and frail [49.99 (40.45-56.06)] participants. The differences between the overall QoL scores in these three groups were revealed to be significant in the paired comparisons (all P < 0.001).
The QoL scores for the seven domains of health-related quality of life in the SarQoL questionnaire were also shown to be highly significantly different (P < 0.001). An examination of the paired differences showed that only Domain 7 (fears) was not significantly different in the comparison between pre-frail and frail groups (P = 0.119).
The complete results of the known-groups validity are presented in Table 2.

Internal consistency
The homogeneity of the questionnaire was found to be good, with a Cronbach's alpha of 0.866, within the 0.70 to 0.95 range considered good. This shows that the questionnaire is consistent without showing the increased likelihood of redundancy associated with alpha values greater than 0.95. The influence of individual domains was tested by deleting a single domain at a time. The resulting alpha values ranged from 0.854 to 0.894, indicating that no domain unduly influences the internal consistency.
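The alpha-if-domain-deleted analysis can be illustrated with a short Python sketch. It uses the standard formula for Cronbach's alpha and assumes that alpha is computed on domain scores, with one list of participant scores per domain; the function names `cronbach_alpha` and `alpha_if_deleted` are hypothetical and the example data are invented.

```python
import statistics


def cronbach_alpha(columns):
    """Cronbach's alpha for k components (here: domains).

    columns: list of k lists, each holding the scores of the same
    n participants on one component.
    alpha = k / (k - 1) * (1 - sum of component variances / variance of totals)
    """
    k = len(columns)
    if k < 2:
        raise ValueError("at least two components are required")
    component_variance = sum(statistics.variance(col) for col in columns)
    totals = [sum(values) for values in zip(*columns)]
    return (k / (k - 1)) * (1 - component_variance / statistics.variance(totals))


def alpha_if_deleted(columns):
    """Recompute alpha with each component left out in turn."""
    return [
        cronbach_alpha(columns[:i] + columns[i + 1:])
        for i in range(len(columns))
    ]


if __name__ == "__main__":
    # Invented scores for three components and four participants.
    data = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 3, 4]]
    print(cronbach_alpha(data))
    print(alpha_if_deleted(data))
```

A component whose removal raises alpha markedly would be a candidate for closer inspection; in this study, the narrow range of 0.854 to 0.894 suggests no such domain.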

Construct validity
Two sets of hypotheses were examined: for the convergent construct validity, we theorized that the overall QoL score of the SarQoL questionnaire measures a construct related to the SF-36 physical functioning, role limitation due to physical problems, and vitality domains as well as to the EQ-5D utility score. We therefore expect to find moderate to strong correlations between the SarQoL and these domains. For the divergent construct validity, we theorized that the overall QoL score of the SarQoL questionnaire measures a different construct than the SF-36 role limitation due to emotional problems and mental health domains, as well as the self-care and anxiety/depression items of the EQ-5D. If this is correct, we expect to find weak or non-existent correlations between the SarQoL and these domains. The convergent validity of the SarQoL questionnaire in the entire sample was excellent, as evidenced by the confirmation of the four pre-specified hypotheses and the strong correlations between the overall QoL score of the SarQoL questionnaire and the four domains theorized to measure similar constructs (correlation coefficients between 0.447 and 0.798). When isolating the three frailty groups, the results were largely similar, with the exception of the correlation between the SarQoL and the SF-36 role limitation due to physical problems domain in the frail group, which dropped from r = 0.628 (P < 0.001) to r = 0.246 (P = 0.199).
The results of the divergent construct validity were less straightforward: both in the complete sample and in the three frailty categories, we found moderate to strong correlations between the SarQoL questionnaire and the two domains of the SF-36 theorized to be measuring a different construct. The hypotheses with the EQ-5D self-care and anxiety/depression items were confirmed by weak correlations (respectively, r = −0.273; P < 0.001 and r = −0.257; P < 0.001).
The full results, shown in Table 3, demonstrate that six out of the eight pre-specified hypotheses were confirmed, fulfilling the criteria of 75%, which indicates acceptable construct validity.
The SEM in this sample was calculated to be 3.88 points, leading to a smallest detectable change of 10.76 points. In practical terms, the overall QoL score of an individual participant would have to change by 10.76 points to be able to be sure that the observed change in QoL is due to a real change in QoL in the patient. SEM and smallest detectable change for the individual domains are reported in Table 4.

Floor and ceiling effects
None of the 382 participants obtained the lowest (0) or the highest (100) score possible for the overall QoL score of the SarQoL® questionnaire, showing the absence of floor and ceiling effects in the summary score.
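The floor and ceiling check amounts to counting extreme scores and comparing their share with the 15% threshold stated in the methods. The sketch below is illustrative; the function name `floor_ceiling_effects` and the returned dictionary keys are hypothetical, and the example scores are invented.

```python
def floor_ceiling_effects(scores, minimum=0.0, maximum=100.0, threshold=0.15):
    """Flag floor/ceiling effects on a 0-100 scale.

    An effect is considered present when more than `threshold`
    (15% by default) of participants obtain the lowest or the
    highest possible score.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")

    floor_share = sum(1 for s in scores if s == minimum) / n
    ceiling_share = sum(1 for s in scores if s == maximum) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


if __name__ == "__main__":
    # Invented scores: two of ten participants at the minimum.
    example = [0, 0, 50, 100, 60, 70, 80, 90, 40, 30]
    print(floor_ceiling_effects(example))
```

When no participant obtains either extreme score, as reported for the overall SarQoL score, both shares are zero and neither effect is flagged.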

Responsiveness
Out of the 382 participants who provided usable data at the first year of the SarcoPhAge study, 235 remained in the study at the fifth year of follow-up and were included in the responsiveness evaluation. Of these 235, a further 117 changed in terms of their frailty status between the first and fifth years of the study, and these were included in the analysis of responsiveness through the evaluation of effect size (SRMs).
We examined nine hypotheses used in an earlier study of the responsiveness of the SarQoL questionnaire, on the theorized correlation between changes measured by the SarQoL questionnaire and by other questionnaires. We were able to confirm five out of nine hypotheses but had to reject four, including Hypothesis 1 (ΔSarQoL overall score and ΔSF-36 general health) and Hypothesis 4 (ΔSarQoL overall score and ΔEQ-VAS). The details of the hypotheses and the observed correlations can be found in Table 5.
We also evaluated responsiveness with the metric of effect size. We calculated SRMs for all domains and summary scores of the SarQoL, SF-36, and EQ-5D questionnaires. The complete results are reported in Table 6. We can observe that the SRM of the SarQoL overall score (corrected SRM = −1.14) is much larger than the SF-36 PCS (corrected SRM = −0.634), the EQ-5D index (corrected SRM = 0.064), and the EQ-VAS (corrected SRM = −0.267). Globally, the SarQoL questionnaire had small effect sizes for three domain scores, moderate for two, and large for one. The SF-36 obtained small effect sizes for five domains and the MCS, and moderate effect sizes for three domains and the PCS.

Discussion
This study examined whether the SarQoL questionnaire could be used as a disease-specific instrument to measure health-related QoL in frailty. The psychometric results presented in this article indicate that it has adequate measurement properties when used with the Fried frailty criteria. This means that the SarQoL could be a new option for researchers seeking to evaluate QoL in populations characterized by the presence of pre-frailty and/or frailty.
This study demonstrated that the SarQoL questionnaire can discriminate between robust, pre-frail, and frail subjects, with declining QoL scores according to the category of frailty, and that it can do so over a wide range of concepts. The systematic review by Crocker et al. highlighted that (sub)scales measuring physical aspects of QoL were broadly able to discriminate between robust and frail people but reported inconsistent results for other aspects of QoL. 7 Therefore, it is encouraging to see that the SarQoL questionnaire is able to discriminate on more than just the physical aspects of QoL and that it brings extra precision in being able to discriminate between robust, pre-frail, and frail individuals. A note of caution is warranted with regard to Domain 7, where only the comparison between robust and frail participants yielded significantly different QoL scores. This domain should not be interpreted in a vacuum but taking into account the other domain scores and the overall QoL score.
The internal consistency was shown to be high (α = 0.866), indicating that the domains in the questionnaire are highly interrelated and measure the same construct, QoL. Mixed results were obtained in the evaluation of the construct validity of the questionnaire. All four hypotheses on the convergent validity were confirmed, but two out of the four hypotheses for divergent validity were rejected. The two rejected hypotheses, where we found stronger correlations than expected, were between the overall QoL score of the SarQoL questionnaire and the mental health and role limitations due to emotional problems domains of the SF-36. It may be that our hypotheses are erroneous and that these two domains are conceptually closer to the SarQoL questionnaire than we theorized. One correlation of particular interest is between the SarQoL overall QoL score and the SF-36 role limitation due to physical limitations in the frail group (r = 0.246), because it is significantly lower than the correlation coefficients in the robust (r = 0.408) and pre-frail groups (r = 0.611). Upon further investigation, this discrepancy is linked to the significant floor effect in this SF-36 domain, where 16 of the 30 participants obtain the lowest score possible.
The test-retest reliability of the questionnaire was excellent, with an ICC of 0.918 (95% CI = 0.834-0.961) for the overall score. However, because the original study only contacted the participants diagnosed as sarcopenic with the European Working Group on Sarcopenia in Older People criteria to enter the evaluation of the test-retest reliability, there were only data available for 29 participants. So, while this is a result that indicates good test-retest reliability, with an elevated ICC and a relatively small CI, these results should be confirmed in a larger sample and in particular samples with sufficient pre-frail and frail participants to calculate ICCs for these particular groups. Because the SEM and the smallest detectable change are based on the test-retest data, this same remark also applies to these two indicators. It should also be noted that, in this study, Domain 6 (leisure activities) and Domain 7 (fears) did not demonstrate adequate reliability. We hypothesize that this may be because of the low sample size in combination with the low number of items for these two domains (two items for Domain 6 and four items for Domain 7), which causes any difference between the responses between the first and second administration of the questionnaire to be exaggerated in the scores.
We examined the ability of the SarQoL questionnaire to detect a change in QoL. We found moderate responsiveness through the confirmation of five out of nine hypotheses on the correlation between changes in QoL observed by the SarQoL questionnaire and by other questionnaires. It is possible that the rejection of several hypotheses is linked to the lower SRMs found between the different questionnaires. In fact, the SRM of the overall QoL score of the SarQoL questionnaire is markedly stronger at SRM = −1.144 compared with the strongest effect size of the SF-36, which was the physical functioning subscale at SRM = −0.749. It may be that the rejection of some hypotheses was thus caused not by poor responsiveness of the SarQoL questionnaire but by smaller effect sizes found by the SF-36. Similarly, for the EQ-VAS, we found a small effect at SRM = −0.267 and the rejection of two hypotheses associated with this instrument.
For the EQ-VAS, the rejected hypotheses may likewise be linked more to the performance of the instrument itself, in combination with the 4 year interval between the assessments, than to poor responsiveness of the SarQoL questionnaire. It is highly likely that an instrument such as the EQ-VAS would be influenced by response shift, which is defined as a change in the self-evaluation of the meaning of a target construct caused by a reconceptualization of the construct, a reprioritization of the participants' values, or a recalibration of the respondents' internal standards of measurement. 38,45 Overall assessments, such as the EQ-VAS, which asks the respondent to indicate on a scale from 0 to 100 'how good or bad your health state is today', are more vulnerable to response shift because they require careful consideration and interpretation of the question. The participants had to evaluate for themselves the meaning of the concept 'health state' and what is considered 'good' and 'bad' and assign a numerical value to this, leaving open the possibility of reconceptualization, reprioritization, or recalibration. 46
Researchers investigating changes in QoL over time or pre-intervention/post-intervention should make the overall QoL score of the SarQoL questionnaire their main outcome, given that it has the highest SRM and the smallest detectable change. If a significant change in overall QoL is found, further analyses of the individual domains could be useful in indicating in which domains a participant's QoL has changed.
Because this study used data collected during a previous study, we were unable to investigate and quantify the content validity of the SarQoL questionnaire in a population of frail, older, community-dwelling individuals. In the development of the questionnaire, content validity had been put at the heart of the process by soliciting, at each step of the item generation and selection process, input from multiple sarcopenic persons. 11 In this study, we were unable to obtain comparable input from frail individuals.
However, some authors have theorized that sarcopenia, the target condition for which the SarQoL questionnaire was developed, constitutes one of the main components of the clinical frailty syndrome, all the while recognizing that frailty should not be limited to physical manifestations but should also incorporate psychological, cognitive, emotional, social, and spiritual factors. 24,47 Currently, to our knowledge, the only questionnaire that measures QoL and that is specifically designed with and for older frail persons is the Geriatric Quality of Life Questionnaire. 48 However, its developers left the definition of what constitutes the 'frail elderly' to the judgement of the clinicians responsible for recruitment, instead of relying on a recognized diagnostic tool. While the SarQoL questionnaire was not specifically developed for frailty, the shared characteristics between sarcopenia and frailty mean that it should be able to provide a precise measurement of the physical weakness aspect of frailty. Apart from the physical domains, the SarQoL has also incorporated items on mental health, body image, sexuality, activities of daily living, leisure activities, and fears, making for a multidimensional framework of QoL.
Healthy ageing is already high on the agenda for most health systems in both Western and Asian countries and will only gain in importance as the number of older people increases. 49 Concepts such as frailty, sarcopenia, or the construct recently proposed by the World Health Organization called Intrinsic Capacity, which is a composite of all the physical and mental capacities of an individual, may play an important role in any future medical approach. 50 Whatever approach is adopted, it must take into account the perspective and priorities of the target population, and QoL can be an important metric for this. Having valid, reliable, and precise instruments to measure QoL that can pick up on the impact of a specific target condition is a prerequisite for relying on QoL instruments to provide information on the patients' lived experience.
There are some limitations to this study. First, we adopted the frailty criteria developed by Fried et al., but other diagnostic approaches are available, such as the Rockwood Clinical Frailty Scale or the IF-VIG, among others. 51-53 Although all these approaches purportedly measure the same concept, frailty, we cannot be sure that the results on the validity and reliability of the SarQoL would have been the same if we had applied other diagnostic approaches. Second, our sample of robust, pre-frail, and frail participants is not necessarily representative of frailty in the wider community. Because these data were collected within a study that recruited volunteers, and which asked those volunteers to make several trips to the research centre, it is likely that the SarcoPhAge study recruited a sample that was in better overall condition, and that had better mobility, than a representative sample of pre-frail and frail participants. While this study has shown that the SarQoL questionnaire is a valid and reliable tool in frailty, additional investigations in samples with a different make-up are needed to confirm these results. Lastly, while the overall sample size was more than adequate for a psychometric study, the test-retest sample is relatively small, with only 29 participants. This steep reduction from the overall sample size results from the fact that only a subset of participants was invited to complete the questionnaire a second time. However, because we have the 95% CI around the ICC, we can judge that most values have adequate precision, apart from those for Domains 6 and 7.