Psychometric properties of a patient‐reported outcome set in acute stroke patients

Abstract Objectives Impairments after stroke may affect multiple domains of health‐related quality of life (HRQoL). Patient‐reported outcome measures (PROMs) have proven valuable in measuring patients’ well‐being. We examined the psychometric properties of a standard set of PROMs assessing global health, self‐reported functioning, and anxiety and depression in a German health care setting. Method We included inpatients diagnosed with stroke at the Department of Neurology at the University Medical Center Hamburg‐Eppendorf. Following the stroke‐specific standard set of the International Consortium for Health Outcome Measurement, we collected demographic and clinical information at baseline and, at the 90‐day follow‐up, PROMs for global health (PROMIS‐10), self‐reported functioning (three items), and anxiety and depression (PHQ‐4). We conducted confirmatory factor analyses to test factorial validity and correlation analyses to test construct validity. We further conducted item and reliability analyses. Results In a sample of 487 patients (mean age 71.1 years, SD 12.6; 47% female) with mild to moderate symptoms, model fit for the PROMIS‐10 was acceptable for both the two‐factor and the single‐factor model. Factor loadings ranged from 0.52 to 0.94. The postulated single‐factor model for functioning was saturated with zero degrees of freedom; its factor loadings ranged from 0.90 to 0.96. For the PHQ‐4, the two‐factor model showed excellent fit, with factor loadings ranging from 0.78 to 0.87. Internal consistency was acceptable to good. Construct validity was generally confirmed. Conclusions The PROMIS‐10 is a valid and reliable instrument to measure HRQoL among German stroke patients. While the PHQ‐4 was confirmed as a screening measure for mental disorders, further research is needed on items assessing self‐reported functioning. Results are limited to patients showing minimal functional deficits.


INTRODUCTION
Patient-reported outcome measures (PROMs) have a strong impact on decisions made when evaluating patient care with the aim of improving patients' health-related quality of life (HRQoL) (Reeves et al., 2018;Snyder et al., 2013;Valderas & Alonso, 2008). In addition to physiological and other medical information, patients' health status is now also assessed through their subjective experience of health-related domains such as mental well-being, functional impairment, psychosocial functioning, and quality of life (Glasgow et al., 2012;Ishaque et al., 2019;Willke et al., 2004). Although the benefits and costs of implementing PROMs in routine care have been critically discussed (Gilbody et al., 2002;Greenhalgh et al., 2005;Marshall et al., 2006), assessing PROMs has a positive impact on patient satisfaction, process of care, and health outcomes (Ishaque et al., 2019;Recinos et al., 2017), further promoting the shift toward more patient-centered medical care (Baumhauer, 2017;Glasgow et al., 2012).
Aiming to coordinate and standardize the rising number of PROMs (Kotronoulas et al., 2014;Price-Haywood et al., 2019), the National Institutes of Health (NIH) established the Patient-Reported Outcomes Measurement Information System (PROMIS) (Cella et al., 2010). Among the disease-specific standard sets published by the International Consortium for Health Outcome Measurement (ICHOM) is the Standard Set for Stroke (ICHOM-SSS), which was developed qualitatively based on expert consensus, with the primary aim of creating a clinically intuitive and practical measure (Salinas et al., 2016). Independent of the severity of stroke symptoms, the impairments that may occur after stroke have the potential to affect every health-related domain, including HRQoL (Katzan, Schuster, et al., 2018;Katzan, Thompson, et al., 2018;Katzan et al., 2019;Price-Haywood et al., 2019). Thus, PROMs are a valuable addition to well-established clinician-reported measures for capturing changes relevant to patients' well-being.
In the ICHOM-SSS, one of the measures to assess patient-reported health status is the PROMIS Global Health short form (PROMIS-10).
The instrument measures patients' global health status based on a global physical health (GPH) score and a global mental health (GMH) score. Both scales have been validated in previous studies: Hays and colleagues (2009) suggested a two-factor structure (GPH, GMH) with four items each, after rejecting a single-factor solution in a confirmatory factor analysis (CFA). Another validation study also confirmed the suggested two-factor solution. Although the CFAs in both studies showed acceptable model fit, their models excluded two single items, and the authors identified inconsistencies among global goodness-of-fit indices. Because the PROMIS-10 comprises 10 items, a validation study should take all items into account and address these shortcomings.
In addition to patients' global health status, their functional impairment and potential mental disorders are considered central patient-reported outcomes (Poku et al., 2016;Price-Haywood et al., 2019). Both are strongly associated with HRQoL after stroke (Rafsten et al., 2018;Tramonti et al., 2014;Wilson & Cleary, 1995). However, functional impairment after stroke is usually assessed by clinicians evaluating the patients' physical and cognitive impairments (Harvey, 2015;Jönsson et al., 2014;Lyden et al., 1994), which neglects the patient perspective. As for mental disorders, systematic reviews and meta-analyses on post-stroke anxiety (PSA) and post-stroke depression (PSD) report prevalences between 29% and 31% (Ayerbe et al., 2013;Hackett & Pickles, 2014;Rafsten et al., 2018). In the included studies, PSA and PSD were mostly diagnosed using self-report measures (e.g., Hospital Anxiety and Depression Scale, Hamilton Anxiety Rating Scale, Generalized Anxiety Disorder 7-item scale, Beck Depression Inventory, and Patient Health Questionnaire). In another meta-analysis on PSD, in which patients were diagnosed based on clinical interviews, prevalences ranged between 11% and 18% (Mitchell et al., 2017). Since mental disorders may affect recovery and rehabilitation after stroke (Belagaje, 2017;Nannetti et al., 2005), these findings stress the need for reliable and valid patient-reported outcome measures of anxiety and depression after stroke.
In the context of a larger study implementing the ICHOM-SSS in routine stroke care in Germany (Rimmele et al., 2019), we aimed to test the psychometric properties of a patient-reported outcome set in patients with stroke 90 days after a cerebrovascular incident, and thereby to further confirm the factor structure of the PROMIS-10 and its validity in a German-speaking sample. First, we tested the factorial validity of the PROMs. To this end, we aimed to confirm (a) the two-factor structure of the PROMIS-10 measuring global health (Hays et al., 2009); (b) a heuristically postulated single-factor structure of the ICHOM-SSS items for functional impairment (self-reported functioning) (Salinas et al., 2016); and (c) the two-factor structure of the Patient Health Questionnaire-4 (PHQ-4) measuring anxiety and depression, to test its validity for screening for these symptoms in patients with stroke (Kroenke et al., 2009). Second, we aimed to determine construct and discriminant validity. To this end, we expected (a) GPH to show stronger negative associations with self-reported and clinician-rated functioning than with anxiety and depression; (b) GMH to show stronger negative associations with anxiety and depression than with self-reported and clinician-rated functioning; (c) a substantial correlation between self-reported and clinician-rated functioning; and (d) a moderate positive association between GPH and GMH.

Study design and study sample
This psychometric study is part of a prospective exploratory observational and implementation study that is currently being conducted at the Department of Neurology at the University Medical Center Hamburg-Eppendorf, Germany. The hospital's stroke unit cares for all regular patients with stroke admitted to the hospital. There, we consecutively recruited inpatients who were diagnosed with acute ischemic or hemorrhagic stroke over a period of 15 months. We excluded patients who showed severe deficits in their ability to communicate (e.g., dementia or aphasia). All patients or their legal guardians provided informed consent. Please see the study protocol for more detailed information and the primary research questions (Rimmele et al., 2019). The study protocol was approved by the ethics committee of the Hamburg chamber of physicians. The study is registered at clinicaltrials.gov, NCT03795948.
Following the ICHOM-SSS (Salinas et al., 2016), there were four assessment points: baseline (admission to the hospital), discharge from the hospital, and 90-day and 12-month follow-up. For this study, we used the information collected at baseline (demographic, diagnostic, and clinical information, functional impairment, and patient-reported health prior to the stroke) and at the 90-day follow-up (patient-reported outcomes including functional impairment). At baseline, study participants completed the paper-pencil version of the ICHOM-SSS during their hospital stay. If they were unable to complete it by themselves, a research assistant administered the items in an in-person interview. Follow-up questionnaires were sent to the patients after discharge. If patients indicated that they needed assistance, the items were administered in a telephone interview. Patients' ability to complete the questionnaire without help was assessed by a separate item.

Measures
We collected basic demographic and clinical characteristics from the patients' electronic health records. We assessed all other information according to the German version of the ICHOM-SSS (Supporting information S1). This included details on the stroke event, such as prior vascular and systemic diseases, risk factors, stroke severity, duration of symptoms (less than 1 hour, 1 hour to 1 day, longer than 1 day, unable to determine), and level of consciousness at arrival (fully awake, somnolent, coma). Stroke severity was measured using the NIH Stroke Scale (Lyden et al., 1994), with a score of 0 indicating no stroke symptoms. For self-reported functional impairment as assessed by the ICHOM-SSS, patients were asked to indicate whether they needed help walking, going to the toilet, and getting dressed. Patients had three response options for the first item (i.e., 1 = able to walk without help, 2 = able to walk with help, 3 = unable to walk) and two response options for each of the latter two items. For validation purposes, we used clinician-rated functional impairment as assessed by the simplified modified Rankin Scale questionnaire (smRSq; Bruno et al., 2013;van Swieten et al., 1988), which measures patients' degree of disability or dependence.
The scale consists of one item, which is scored on a seven-point Likert scale ranging from 0 (no symptoms) to 6 (death); it was administered by telephone 90 days after stroke to either the patient or a relative or caregiver.

Statistical analyses
We calculated descriptive statistics for sample characteristics (frequencies, means, and SDs), and performed item analysis for the patient-reported outcome measures (means, SDs, skewness, and kurtosis). To assess reliability, we calculated Cronbach's α as a measure of internal consistency for each scale as well as the standardized difficulty and corrected item-total correlation for each item.
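The reliability statistics named above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in this study: it assumes complete data (no missing responses) and defines standardized item difficulty as the item mean rescaled to the 0 to 1 range of its possible scores, which is one common convention and may differ from the one applied here. All function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (one list of scores per item).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score)
    Assumes all columns list the same respondents in the same order, no missing data.
    """
    k = len(items)
    n = len(items[0])
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(items, j):
    """Correlation of item j with the sum of all other items (item j excluded)."""
    n = len(items[0])
    rest = [sum(col[i] for idx, col in enumerate(items) if idx != j) for i in range(n)]
    return pearson_r(items[j], rest)


def standardized_difficulty(col, min_score, max_score):
    """Item mean rescaled to 0..1 (assumed convention: (mean - min) / (max - min))."""
    return (mean(col) - min_score) / (max_score - min_score)


if __name__ == "__main__":
    # Hypothetical responses of five people to two items scored 0-3 (not study data).
    nervous = [0, 1, 0, 2, 3]
    worry = [0, 1, 1, 2, 2]
    scale = [nervous, worry]
    print(round(cronbach_alpha(scale), 2))
    print(round(corrected_item_total(scale, 0), 2))
    print(round(standardized_difficulty(nervous, 0, 3), 2))
```

With only two items, as in each PHQ-4 subscale, the corrected item-total correlation reduces to the correlation between the two items.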
We performed our analyses within the framework of classical test theory. It should be noted that several other approaches exist, most prominently item response theory, which may model the response patterns in the data even better than the methods applied here. We conducted a series of analyses to test the factorial structure of the investigated measures. First, we tested the hypothesized two-factor structure of the PROMIS-10, with four items loading on a global mental health (GMH) factor and four items loading on a global physical health (GPH) factor. We allowed the two single items global health (global01) and social participation (global09) to be correlated with both factors.
Second, we tested a single-factor model to examine whether the three categorical items for post-stroke self-reported functioning load on one latent factor. Third, we aimed to confirm the two-factor structure of the PHQ-4, with the items measuring nervousness and worries loading on the anxiety factor and the items measuring loss of interest and depressive mood loading on the depression factor (Löwe et al., 2010). In all confirmatory factor analyses, we evaluated model fit based on several indices (Hu & Bentler, 1999;Kriston et al., 2008;Schermelleh-Engel et al., 2003), including the normed χ² (χ²/degrees of freedom [df] < 3.0 for good and < 5.0 for acceptable fit). To test the construct validity of the scale scores, we calculated Pearson's correlation coefficients between the two scales of the PROMIS-10 (GPH, GMH), self-reported and clinician-rated functioning, and the subscales of the PHQ-4 (Cohen, 1992).
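For orientation, the normed χ² criterion stated above and three other indices reported in this study (RMSEA, CFI, TLI) can be computed from the model and baseline χ² values with their textbook maximum-likelihood formulas. This is a simplified sketch under stated assumptions: with categorical indicators, software typically reports scaled or robust χ² statistics, and some programs use N instead of N − 1 in the RMSEA, so values may differ from published output. All input numbers in the example are hypothetical and are not taken from the study tables.

```python
from math import sqrt


def normed_chi2(chi2, df):
    """chi2 / df; undefined for saturated models (df = 0)."""
    if df <= 0:
        raise ValueError("normed chi-square requires df > 0")
    return chi2 / df


def normed_chi2_label(value):
    """Thresholds stated in the text: < 3.0 good, < 5.0 acceptable."""
    if value < 3.0:
        return "good"
    if value < 5.0:
        return "acceptable"
    return "poor"


def rmsea(chi2, df, n):
    """Root mean square error of approximation (ML formula using N - 1)."""
    if df <= 0:
        raise ValueError("RMSEA requires df > 0")
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index relative to the independence (baseline) model."""
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis index (non-normed fit index)."""
    base_ratio = chi2_base / df_base
    return (base_ratio - chi2 / df) / (base_ratio - 1.0)


if __name__ == "__main__":
    # Hypothetical model and baseline chi-square values for N = 487.
    chi2, df, n = 100.0, 20, 487
    chi2_base, df_base = 1000.0, 28
    print(normed_chi2_label(normed_chi2(chi2, df)))
    print(round(rmsea(chi2, df, n), 3))
    print(round(cfi(chi2, df, chi2_base, df_base), 3))
    print(round(tli(chi2, df, chi2_base, df_base), 3))
```

The explicit `df <= 0` guards reflect that neither the normed χ² nor the RMSEA can be computed for a saturated model.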
We calculated the correlation between GPH and GMH to test discriminant validity.
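The construct-validity hypotheses compare the strength of correlations, and Cohen (1992) is cited for interpreting them. The sketch below computes Pearson's r and labels |r| with Cohen's conventional benchmarks for correlations (0.10 small, 0.30 medium, 0.50 large); the helper for a hypothesis such as (a) merely compares absolute values and does not test whether two dependent correlations differ significantly. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cohen_label(r):
    """Label the magnitude of r with Cohen's (1992) benchmarks for correlations."""
    a = abs(r)
    if a >= 0.5:
        return "large"
    if a >= 0.3:
        return "medium"
    if a >= 0.1:
        return "small"
    return "negligible"


def stronger_association(r_expected, r_other):
    """True if the correlation expected to be stronger is larger in absolute value."""
    return abs(r_expected) > abs(r_other)


if __name__ == "__main__":
    # Hypothetical scores for six patients (not study data).
    gph = [12, 15, 9, 18, 14, 10]
    smrs = [2, 1, 3, 0, 1, 3]         # clinician-rated disability (higher = worse)
    depression = [1, 0, 2, 0, 3, 1]   # PHQ-4 depression subscale (0-6)
    r_func = pearson_r(gph, smrs)
    r_dep = pearson_r(gph, depression)
    print(round(r_func, 2), cohen_label(r_func))
    print(round(r_dep, 2), cohen_label(r_dep))
    print("hypothesis (a) direction holds:", stronger_association(r_func, r_dep))
```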

Study sample
We collected data from 1,725 patients between March 2017 and June 2018. This sample comprised all patients admitted to the stroke unit. For this psychometric study, we excluded patients who did not give consent at the 90-day follow-up (n = 684) and, to avoid possible bias, patients who were unable to complete the questionnaire without the help of a relative or caregiver (n = 554). The flow diagram describes participation in detail (Figure 1).
Sample characteristics (table fragment): alcohol (more than one beverage per day), n = 37 (8%). † As assessed by the NIH Stroke Scale (Lyden et al., 1994). ‡ As assessed by the simplified modified Rankin Scale questionnaire (Bruno et al., 2013;van Swieten et al., 1988).

Factorial validity
Standardized factor loadings for the suggested two-factor model of the PROMIS-10 ranged between 0.55 and 0.93 and were lowest for the three recoded items. The model showed poor fit on all indices except the SRMR. After residual correlations were added post hoc, standardized factor loadings for this adapted model ranged between 0.52 and 0.94. Model fit improved (Table 2) and was acceptable, except for the normed χ² and RMSEA. The BIC confirmed that this model (Figure 2) fit the data better than the model without residual correlations.
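The two-factor model, including the three recoded items, can be made concrete with a raw-score sketch. It assumes the item-to-factor allocation and recoding commonly documented for the PROMIS Global Health scale (Hays et al., 2009): GPH comprises global03, global06, recoded pain (global07), and reversed fatigue (global08); GMH comprises global02, global04, global05, and reversed emotional problems (global10). It further assumes items coded 1 to 5 (fatigue and emotional problems coded so that higher means worse) and a 0 to 10 pain rating. The function names are hypothetical, and raw sums still need to be converted to T-scores with the official PROMIS tables, which are not reproduced here.

```python
# Assumed item allocation (PROMIS Global Health, as commonly documented; verify against
# the official scoring manual). global01 and global09 are not part of either raw score.
GPH_ITEMS = ("global03", "global06", "global07rc", "global08r")
GMH_ITEMS = ("global02", "global04", "global05", "global10r")


def recode_pain(rating):
    """Collapse the 0-10 pain rating (global07) to 1-5, higher = less pain."""
    if not 0 <= rating <= 10:
        raise ValueError("pain rating must lie between 0 and 10")
    if rating == 0:
        return 5
    if rating <= 3:
        return 4
    if rating <= 6:
        return 3
    if rating <= 9:
        return 2
    return 1


def raw_subscale_scores(responses):
    """Return (GPH, GMH) raw sums, each 4-20, higher = better health.

    Assumes items coded 1-5, with fatigue (global08) and emotional problems
    (global10) coded so that higher = worse; these two are reversed here.
    """
    scored = dict(responses)
    scored["global07rc"] = recode_pain(responses["global07"])
    scored["global08r"] = 6 - responses["global08"]
    scored["global10r"] = 6 - responses["global10"]
    gph = sum(scored[item] for item in GPH_ITEMS)
    gmh = sum(scored[item] for item in GMH_ITEMS)
    return gph, gmh


if __name__ == "__main__":
    # One hypothetical respondent (not study data).
    answers = {"global01": 3, "global02": 4, "global03": 3, "global04": 4,
               "global05": 2, "global06": 4, "global07": 2, "global08": 2,
               "global09": 3, "global10": 2}
    print(raw_subscale_scores(answers))  # -> (15, 14)
```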
As both latent factors were highly correlated (r = 0.95), we also tested a model with a single global health factor in a post hoc analysis; this model fit our data poorly on all indices (Table 2). To improve model fit, we added residual correlations between the items for mobil[…]. Model fit improved (Table 2) and was acceptable, except for the RMSEA.
For self-reported functioning (functional impairment), standardized factor loadings in the single-factor model were 0.91 for ambulation, 0.96 for toileting, and 0.90 for getting dressed (Figure S1). Due to the limited number of indicators, the model was saturated with zero degrees of freedom. This means that the number of parameters to be estimated equaled the number of unique pieces of information (variances and covariances) in the observed data; therefore, global model fit could not be assessed.
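The saturation argument is simple bookkeeping. Assuming continuous indicators and latent factors whose variances are fixed to 1, each indicator contributes one loading and one residual variance, while p indicators provide p(p + 1)/2 unique variances and covariances. The sketch below (hypothetical function names) reproduces df = 0 for the three-item functioning model; applied to a two-factor PHQ-4 model with two items per factor and one factor covariance, the same count gives df = 1. With polychoric correlations for categorical items the bookkeeping differs in detail, but for these two models it yields the same df.

```python
def unique_moments(p):
    """Number of unique variances and covariances among p observed indicators."""
    return p * (p + 1) // 2


def free_parameters(indicators_per_factor, residual_covariances=0):
    """Free parameters of a simple-structure CFA with factor variances fixed to 1.

    Each indicator: one loading + one residual variance.
    Every pair of factors: one factor covariance.
    Plus any residual covariances added to the model.
    """
    p = sum(indicators_per_factor)
    m = len(indicators_per_factor)
    return 2 * p + m * (m - 1) // 2 + residual_covariances


def cfa_df(indicators_per_factor, residual_covariances=0):
    """Degrees of freedom = unique moments - free parameters."""
    p = sum(indicators_per_factor)
    return unique_moments(p) - free_parameters(indicators_per_factor, residual_covariances)


if __name__ == "__main__":
    print("functioning, 1 factor x 3 items:", cfa_df([3]))   # -> 0 (saturated)
    print("PHQ-4, 2 factors x 2 items:", cfa_df([2, 2]))     # -> 1
```

Each residual correlation added post hoc, as for the PROMIS-10 models, costs one further degree of freedom.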
Further, we confirmed the two-factor structure of the PHQ-4.

Construct validity
We calculated Pearson's correlation coefficients between the constructs of interest to test discriminant and construct validity (Table 3).
Due to the sample size, results should be interpreted based on the strength of the associations, as indicated by the correlation coefficients, rather than on the p-values. The global health scales of the PROMIS-10 correlated strongly, indicating that the factors largely overlap and cannot easily be differentiated, which suggests limited discriminant validity. The negative associations between GMH and anxiety and depression were stronger than those between GMH and self-reported or clinician-rated functional impairment, indicating construct validity for this subscale. The negative associations between GPH and anxiety and depression were at least as strong as those between GPH and self-reported or clinician-rated functional impairment. This finding is not fully consistent with theoretical expectations and indicates limited construct validity.
Abbreviations: BIC, Bayesian information criterion; CFI, comparative fit index; normed χ², χ²/degrees of freedom; RMSEA, root mean square error of approximation; SRMR, standardized root mean square residual; TLI, Tucker-Lewis index.

Reliability
Acceptance of the three patient-reported outcome measures was high, with data missing in three cases at most. […] indicated acceptable internal consistency. Table S1 shows a summary of the reliability analyses.
The majority of patients did not report any symptoms of anxiety or depression, with item means ranging from 0.47 (worries) to 0.69 (loss of interest). Cronbach's α coefficients for both scales indicated acceptable (anxiety) to good (depression) internal consistency. Table S2 shows a summary of the reliability analyses.

DISCUSSION
In this psychometric study, we examined the properties of a patient-reported outcome set in patients with stroke. For the PROMIS-10, the two- and single-factor models showed good model fit after residual correlations were added post hoc based on the results of the CFAs.
The varying factor loadings and correlated errors, if confirmed by independent studies, may suggest forming weighted scores for both subscales. The correlations between the items assessing mood, general health and physical activity, and emotional problems and fatigue were also reported by Hays et al. (2009). However, as the residual correlations were low and could be explained by the item contents, they are unlikely to raise serious concerns during application of the measure.
Nonetheless, they seem to be present in different settings; therefore, they deserve further attention in independent psychometric investigations. Also, the RMSEA, a standardized measure of the amount of error in the model, was above the frequently used threshold of 0.08 and thus did not always support the tested models. However, the RMSEA tends to be inflated in models with few strongly correlated variables (Kenny et al., 2014;Kenny & McCoach, 2003;Shi et al., 2019).
We therefore think that, in this situation, the other fit indices should be given more weight.
According to the T-scores, our sample reported lower GPH and GMH than the general US population (ichom.org/files/medicalconditions/stroke/stroke-reference-guide.pdf, accessed April 26, 2020; Hays et al., 2009). We tested a reflective, saturated model for self-reported functioning, with high factor loadings for all three items. Note, however, that we chose this single-factor structure heuristically, to correspond with the three items suggested by the ICHOM-SSS. To measure self-reported functioning more comprehensively, future validation studies may benefit from models that include more indicators assessing functioning in a more differentiated manner, such as the PROMIS Physical Function item bank (Rose et al., 2014), the Stroke Impact Scale-16 (Duncan et al., 2003), and the Stroke Specific Quality of Life Scale (Ewert & Stucki, 2007). The moderate associations between self-reported functioning and the external measures suggest low construct validity.
In particular, the association between self-reported and clinician-rated functioning was weaker than expected. Since both constructs aim to assess patients' disability or dependence, one possible interpretation is that clinicians' and patients' perspectives differ substantially, which is a common finding in the care of stroke patients (Price-Haywood et al., 2019). This underlines the importance of assessing self-reported functioning with adequate measures.
We were able to confirm the two-factor structure of the PHQ-4, showing that the questionnaire is a reliable and valid measure for screening for symptoms of anxiety and depression in patients with stroke. Yet, the correlation between both factors was high, indicating low discriminant validity, which is further supported by the results of the correlation analysis. Still, the construct validity of the PHQ-4 was satisfactory, as symptoms of anxiety and depression showed weaker associations with GPH than with GMH and were only moderately associated with both measures of functioning. Contrary to the prevalences reported in the cited studies, patients screening positive for symptoms of depression (11% with a score ≥3) and anxiety (12% with a score ≥3) were underrepresented in our sample, which may be due to their higher level of functioning.
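The screening proportions above rest on simple sum scores. In the PHQ-4, each item is scored from 0 (not at all) to 3 (nearly every day); the two anxiety items and the two depression items are summed separately, and the text uses a subscale score of 3 or higher as a positive screen. The sketch below, with hypothetical function names and data, illustrates that logic only; it is not the scoring code used in the study.

```python
def phq4_subscales(nervous, worry, interest, mood):
    """Return (anxiety, depression) subscale sums, each ranging 0-6."""
    for value in (nervous, worry, interest, mood):
        if value not in (0, 1, 2, 3):
            raise ValueError("PHQ-4 items are scored 0-3")
    anxiety = nervous + worry       # nervousness + worries
    depression = interest + mood    # loss of interest + depressive mood
    return anxiety, depression


def screens_positive(subscale_score, cutoff=3):
    """Positive screen when the subscale sum reaches the cutoff used in the text (>= 3)."""
    return subscale_score >= cutoff


def proportion_positive(scores, cutoff=3):
    """Share of patients whose subscale score reaches the cutoff."""
    return sum(screens_positive(s, cutoff) for s in scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical item responses for four patients: (nervous, worry, interest, mood).
    patients = [(0, 0, 0, 1), (2, 1, 1, 1), (1, 0, 2, 2), (0, 1, 0, 0)]
    anxiety, depression = zip(*(phq4_subscales(*p) for p in patients))
    print("anxiety positive:", proportion_positive(anxiety))
    print("depression positive:", proportion_positive(depression))
```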
There are limitations to our study. Our findings cannot be generalized to patients with severe symptoms, because only a few of the patients in our sample suffered from moderate to severe stroke symptoms.
Since this was also the case for an earlier validation study, there is currently no evidence supporting the use of the PROMIS-10 among more severely impaired patients. At the same time, to control for potential bias, we excluded from the psychometric study those patients who did not complete the questionnaire themselves but had a relative or caregiver complete it for them (the administered measures were designed to be answered by the patients). However, this likely limits the generalizability of our findings, because patients who suffer from more severe stroke symptoms lack the mental or physical capacity to fill out the questionnaire or find it too distressing. Accordingly, patient-reported outcomes like the PROMIS-10 may only apply to patients with mild or moderate impairment (George & Zhao, 2018). Our findings may also be limited by skewed data distributions, especially for self-reported functioning and the PHQ-4, which suggest floor effects. This may be explained by the overall low distress reported by patients in our sample. With regard to generalization, it is also notable that we recruited patients from a university medical center. In the case of a cerebrovascular incident, patients might be attended to more quickly in this specific, urban clinical setting than in rural areas. Moreover, we were unable to determine the construct and discriminant validity of the PROMIS-10 in a narrower sense, because the ICHOM-SSS does not include similar and distinct constructs. Since this psychometric study was part of a larger clinical study testing the implementation of the ICHOM-SSS, we used the items provided by the standard set and added the PHQ-4 to maintain efficiency and keep the burden for patients to a minimum.

CONCLUSION AND IMPLICATIONS
Although there have been previous attempts to conceptualize the rising number of available PROMs (Valderas & Alonso, 2008), the ICHOM has been essential in providing standardized, efficient, and disease-specific sets of PROMs. The PROMIS-10 of the ICHOM-SSS provides a standard set of items for assessing HRQoL in stroke patients. We were able to show that the German version of the PROMIS-10 is a valid and reliable instrument to measure HRQoL among stroke patients with mild to moderate symptoms. Our findings are in line with previous validation studies on the structure of the PROMIS-10 (Hays et al., 2009). Yet, the psychometric limitations also found in our study suggest that there may be alternative approaches to measuring global HRQoL. In addition to the PROMIS-10, we were able to show the value of measuring self-reported functioning and symptoms of anxiety and depression in this population to further integrate the patient perspective into routine care. While the PHQ-4 has proven to be a valid and reliable instrument to screen for mental disorders, our study offers new information on the assessment of self-reported functioning. PROMs measuring functional impairment need to explore the aspects that patients find most relevant to their functioning and assess them using a comprehensive item pool. In addition, an in-depth investigation of the level and conditions of agreement between self-report and clinician assessment of functional impairment is urgently needed.
Future research is needed to address the practicability and benefit of the PROMIS-10 and self-reported functioning, especially among patients who suffer from moderate to severe symptoms.

FUNDING INFORMATION
Innovation Fund of the German Federal Joint Committee, grant number: 01VSF16023.

CONFLICT OF INTEREST
GT reports receiving consulting fees from Acandis and Portola, grant support and lecture fees from Bayer, lecture fees from Boehringer Ingelheim, Bristol-Myers Squibb/Pfizer, and Daiichi Sankyo, and consulting fees and lecture fees from Stryker, all outside this work. All other authors declare no conflict of interest.