Development and validation of a patient‐reported outcome measure for patients with chronic respiratory failure: The CRF‐PROM scale

Abstract
Background: Various health‐related quality‐of‐life (HRQOL) tools are used to evaluate patients with chronic respiratory failure (CRF), but few tools are available for evaluating social support and treatment in these patients. The present study aimed to develop a systematic patient‐reported outcome measure (PROM) for use in patients with CRF.
Methods: The conceptual framework and item bank of the CRF‐PROM scale were generated by reviewing the relevant literature and existing HRQOL scales, interviewing patients with CRF and conducting focus groups. After the initial scale was created, its items were selected using two item selection approaches (classical test theory and item response theory), and the final scale was created. The reliability, validity and feasibility of the final scale were then assessed.
Results: The CRF‐PROM scale comprises four domains (physiological, psychological, social and therapeutic) and 10 dimensions. After item selection, the final scale included 50 items. Cronbach's α coefficients were all above 0.7, indicating good reliability. The construct validity results met the relevant standards of confirmatory factor analysis. The response rates of the preinvestigation and the formal investigation were 93.3% and 97.6%, respectively.
Conclusions: The CRF‐PROM scale developed in the present study is effective and reliable. It could be widely used in the posthospital management of patients, in CRF studies and in clinical trials of new medical products and interventions.
Patient or Public Contribution: Participants from eight different hospitals and communities took part in the development or validation phase of the CRF‐PROM scale.


| INTRODUCTION
Chronic respiratory failure (CRF) occurs in the advanced stages of many respiratory diseases, and it is associated with a high hospitalization rate and high mortality. 1 It continues to affect the quality of life of patients posthospitalization 2 in terms of physical and psychological factors as well as factors associated with social support, satisfaction with treatment and treatment compliance, among others.
In previous reports, it was proposed that the assessment of the effects of treatment on any individual patient should include the patient's own evaluation of therapy, or patient-reported outcome. 3,4 A patient-reported outcome measure (PROM) is any report of the status of a patient's health condition that comes directly from the patient, without interpretation of the patient's response by a clinician or anyone else. 5-7 The importance of assessing quality of life when evaluating the human and financial costs and benefits of modern medical techniques has been increasingly recognized in research and healthcare practice.
Currently, there are many scales for estimating quality of life, but there is no scale specifically designed for use in patients with CRF.
When quality of life must be estimated in patients with CRF, a generic scale (such as the SF-36 8 ) or a chronic obstructive pulmonary disease (COPD) scale is usually used. 9 Generic instruments are useful for comparing effects on quality of life across populations with different diseases; however, disease-specific tools are generally more sensitive to disease-specific issues and are therefore more appropriate for clinical trials that evaluate specific therapeutic interventions. 10,11 Previous research in patients with CRF has mainly focused on their physical condition and exercise capacity. 12 Few studies have investigated disease-related changes in psychological state, or factors such as social support, treatment compliance and therapeutic satisfaction.
The aim of the present study was to develop a new PROM scale for use in patients with CRF that was comprehensive and showed sufficient validity, reliability and feasibility. The intention was to develop a scale that was a useful tool for posthospital management and in clinical trials.

| METHODS

| Ethics approval and consent to participate
The study and the CRF-PROM were reviewed and approved by the Medical Ethics Committee of Shanxi Medical University, China (No. 2018LL128), and written informed consent was obtained from all participants.

| Study population
Participants were enrolled from eight different hospitals and communities in Shanxi Province, China. Participants usually completed the scale independently; when a participant could not do so unassisted, a trained investigator read the questions aloud. The inclusion criteria were age 18 years or older and willingness to participate in the study. The CRF group included patients diagnosed with CRF by a clinician, and the control group included healthy subjects from the same communities without respiratory failure or malignant tumours of the respiratory system. The exclusion criteria were mental illness, a disorder of consciousness, or inability to understand or complete the scale for any reason.

The conceptual framework and item bank of the scale were generated from a review of the literature and existing HRQOL scales, one-on-one patient interviews and focus groups (Figure 2). In this stage, 10 patients (male/female ratio: 1.5; average age: 68.3 years) were selected for one-on-one interviews, and three focus groups were organized. Each focus group included two respiratory disease specialists, one nurse, one psychologist, one sociologist and one ethics expert, who together generated specific suggestions for revising the scale in each domain. Each one-on-one interview lasted at least 30 min, and the impact of the disease on the patient's quality of life was documented. An item bank was then created.

| Formation of the initial scale
All items in the questionnaire are rated on a 5-point Likert scale, and the questionnaire includes positive items (higher scores correspond to better quality of life) and negative items (higher scores correspond to poorer quality of life). Each item is scored from 0 to 4, and the scores of negative items are converted into negative numbers when total scores are calculated. We selected 15 patients for a pilot survey to determine whether all items were accurately understood. Each patient completed the scale independently and then explained their interpretation of every item to the investigator. Any misunderstood items were modified.
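The scoring rule above can be illustrated with a minimal Python sketch. It follows the text literally (negative items enter the total with a minus sign); the item codes, the assignment of items as positive or negative, and the function name `total_score` are all hypothetical.

```python
def total_score(responses: dict[str, int], negative_items: set[str]) -> int:
    """Sum 0-4 item scores, entering negative items as negative numbers.

    responses maps hypothetical item codes to scores; negative_items lists
    the codes of items where a higher score means poorer quality of life.
    """
    total = 0
    for item, score in responses.items():
        if not 0 <= score <= 4:
            raise ValueError(f"{item}: score {score} is outside the 0-4 range")
        total += -score if item in negative_items else score
    return total


# Hypothetical respondent: two positive items and one negative item
answers = {"PHY1": 3, "PHY2": 4, "PSY1": 2}
print(total_score(answers, negative_items={"PSY1"}))  # 3 + 4 - 2 = 5
```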

| Item selection
To ensure that the final scale has good reliability, validity and feasibility, the selection process followed the principles of good sensitivity, independence, representativeness and internal consistency.
We used both classical test theory (CTT; in this study comprising the discrete trend, exploratory factor analysis (EFA), the correlation coefficient and Cronbach's α coefficient) and item response theory (IRT) to select the scale items. 13 Among the CTT methods, the standard deviation (SD) of each item's score was used to measure its discrete trend (dispersion), and deletion of items with SD < 1 is recommended. 13,14 In the EFA, factors were extracted using the principal component method and rotated using varimax (maximum variance orthogonal) rotation. Items with low factor loadings (<0.4), or with similar loadings on more than one factor, were excluded. The correlation coefficient between each item and its dimension was calculated, and in the present study, items with small correlation coefficients (<0.6) were deleted. 15 Internal consistency was assessed using Cronbach's α coefficient and the corrected item-total correlation (CITC). An item with CITC < 0.5, or whose removal markedly increased the Cronbach's α coefficient of its dimension, was considered to reduce the internal consistency of that dimension and was removed. 16

IRT, the last item selection method in this study, evaluated item performance by fitting a Bayesian generalized partial credit model (GPCM). Using the parallel and rawpar functions, each dimension was consistently found to be unidimensional, so IRT was applied separately to each of the 10 dimensions. Marginal maximum likelihood estimation, which suits large samples, was used, and the discrimination parameter (a) and difficulty parameters (b1 to b4) of each item were calculated using Multilog 7.03 software. The general requirement is a > 0.6, and in the present study, items with a < 0.60 were excluded. The parameters b1, b2, b3 and b4 correspond to four levels of difficulty: b1 is the category threshold parameter between option 1 and option 2, and so on, with b1 < b2 < b3 < b4.
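To illustrate how a and b1 to b4 shape an item's response categories, here is a minimal Python sketch of the standard GPCM category probabilities, together with a hypothetical screening check. It is not the Multilog estimation used in the study: the parameter values are invented, the scaling constant D is assumed to be 1, and `gpcm_probs` and `keep_item_irt` are hypothetical names. Treating disordered or out-of-range thresholds as a failed IRT test is an assumption; the text states only the a < 0.60 exclusion explicitly.

```python
import math


def gpcm_probs(theta: float, a: float, b: list[float], D: float = 1.0) -> list[float]:
    """Category probabilities P(X = k | theta), k = 0..len(b), under the GPCM.

    Category k corresponds to response option k + 1, and b[v - 1] is the
    threshold between categories v - 1 and v (b[0] is b1, between options 1 and 2).
    """
    # Cumulative sums of D * a * (theta - b_v); category 0 has an empty sum of 0.
    logits = [0.0]
    for b_v in b:
        logits.append(logits[-1] + D * a * (theta - b_v))
    # Subtract the largest logit before exponentiating, for numerical stability.
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def keep_item_irt(a: float, b: list[float], a_min: float = 0.60, b_limit: float = 3.0) -> bool:
    """Hypothetical IRT screen: a >= a_min, thresholds ordered and within [-b_limit, b_limit]."""
    ordered = all(lo < hi for lo, hi in zip(b, b[1:]))
    in_range = all(-b_limit <= b_v <= b_limit for b_v in b)
    return a >= a_min and ordered and in_range


# Hypothetical item parameters (not estimates from the study)
a, b = 1.2, [-1.5, -0.5, 0.4, 1.6]
probs = gpcm_probs(theta=-1.5, a=a, b=b)
print([round(p, 3) for p in probs])  # options 1 and 2 are equally likely at theta = b1
print(keep_item_irt(a, b), keep_item_irt(0.45, b))  # True False
```

At θ = b1 the first two categories are equally likely, which is what "threshold between option 1 and option 2" means in this model.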
The difficulty parameters generally range from −3 to +3. 17 After the five tests described above were applied, an item was retained if it passed three or more of them. The items thus retained were combined with additional items endorsed by a majority of the expert panel of investigators, via a formal process described in detail below, to generate the final version of the scale.

| Reliability analysis
Reliability refers to the consistency of measurement results. The most commonly used indicator of reliability is Cronbach's α coefficient. It is generally believed that Cronbach's α coefficient should be above 0.7.
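For reference, the standard definition of Cronbach's α for a dimension of k items, where σ²_{Y_j} is the variance of item j and σ²_X is the variance of the dimension's total score, is:

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} \sigma^{2}_{Y_j}}{\sigma^{2}_{X}}\right)
```

α increases as the items covary more strongly relative to their individual variances; for example, α = 1 when all k items have identical scores across respondents.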

| Content validity
Content validity refers to whether the items can represent the measured content. In the present study, the content validity index (CVI) was used for quantitative analysis. If the CVI was higher than 80%, the item was retained.
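The text does not state how the CVI was computed. A common item-level definition is the proportion of experts who rate the item as relevant, typically 3 or 4 on a 4-point relevance scale; the Python sketch below assumes that definition, and `item_cvi` and `keep_by_cvi` are hypothetical names.

```python
def item_cvi(ratings: list[int], relevant_min: int = 3) -> float:
    """Share of experts rating the item as relevant (assumed: >= 3 on a 4-point scale)."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return sum(r >= relevant_min for r in ratings) / len(ratings)


def keep_by_cvi(ratings: list[int], threshold: float = 0.80) -> bool:
    """Retain the item only if its CVI is strictly higher than 80%."""
    return item_cvi(ratings) > threshold


# Hypothetical ratings from six experts
print(round(item_cvi([4, 4, 3, 4, 3, 2]), 3), keep_by_cvi([4, 4, 3, 4, 3, 2]))  # 0.833 True
print(keep_by_cvi([4, 3, 4, 2, 4]))  # 4/5 = 0.80 is not higher than 80%, so False
```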

| Dimensional correlation
Dimensional correlation is the degree of correlation between each item and its own domain. When the correlation coefficient r is > .50, the dimensional correlation is considered acceptable.

| Construct validity
Construct validity indicates whether the hypothesized domain structure of the scale is supported by the data; in the present study, it was evaluated via confirmatory factor analysis (CFA) using LISREL 10.0 software. In accordance with the theoretical framework, a four-factor model was tested. Although many fit indices are available for model evaluation, none of them alone provides a fully standardized test of model adequacy. Relatively reliable indicators include the non-normed fit index (NNFI), the comparative fit index (CFI), the adjusted goodness-of-fit index (AGFI) and the root mean square error of approximation (RMSEA). An RMSEA value below 0.08 is generally taken to indicate a reasonably good fit (smaller is better), and a value between 0.08 and 0.10 indicates a moderate fit. As with the RMSEA, a smaller root mean square residual (RMR) indicates a better fit. When the normed fit index (NFI), the NNFI, the CFI and the incremental fit index (IFI) are above 0.90, the fit is considered reasonably good (larger is better), 18 and when they are close to 0.90, the fit can be considered acceptable.
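LISREL reports these indices directly. To make the cut-offs concrete, the Python sketch below computes RMSEA, CFI and NNFI from the chi-square statistics of the tested model and of the baseline (independence) model using their standard formulas, and labels RMSEA with the thresholds given above. The function names and example numbers are hypothetical, the "poor" label for values above 0.10 is a convention not stated in the text, and software packages differ in details (for example, some use N rather than N − 1 in the RMSEA denominator), so values may not match LISREL output exactly.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index; _m = tested model, _b = baseline (independence) model."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def nnfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Non-normed fit index (Tucker-Lewis index)."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


def rmsea_label(value: float) -> str:
    if value < 0.08:
        return "reasonably good"
    if value <= 0.10:
        return "moderate"
    return "poor"  # conventional reading of values above 0.10 (not stated in the text)


# Hypothetical chi-square results for a model fitted to 400 respondents
fit = rmsea(chi2=2100.0, df=1169, n=400)
print(round(fit, 3), rmsea_label(fit))
print(round(cfi(2100.0, 1169, 25000.0, 1225), 3), round(nnfi(2100.0, 1169, 25000.0, 1225), 3))
```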

| Response analysis
Response analysis refers to the ability of the scale to detect small changes in the measured indicators. It is commonly examined by testing whether the scale detects changes over time within the same group and whether it identifies differences in a measured indicator between groups, that is, whether it can discriminate between populations with different characteristics. In the present study, the scale was required to distinguish between the CRF group and the control group based on differences between their mean scores in each dimension (except the therapeutic dimension, which does not apply to the control group). A two-sample t test was used, and p < .05 was taken to indicate that the scale could differentiate between the control group and the CRF group.
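A minimal standard-library Python sketch of the pooled-variance (Student) two-sample t test for one dimension is shown below. To stay dependency-free, it converts t to a two-sided p-value with the normal approximation, which is adequate only for large degrees of freedom (such as the hundreds of participants here); `scipy.stats.ttest_ind(x, y)` returns the exact t-distribution p-value. The function name and the simulated scores are hypothetical.

```python
import math
import random
import statistics


def pooled_t_test(x: list[float], y: list[float]) -> tuple[float, int, float]:
    """Student's two-sample t test assuming equal variances.

    Returns (t, df, p); p is two-sided and uses the normal approximation
    to the t distribution, so it is reliable only for large df.
    """
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.fmean(x) - statistics.fmean(y)) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, df, p


# Simulated (hypothetical) dimension scores with the group sizes of Table 1
random.seed(1)
crf = [random.gauss(2.0, 0.8) for _ in range(364)]
control = [random.gauss(2.6, 0.8) for _ in range(125)]
t, df, p = pooled_t_test(crf, control)
print(round(t, 2), df, p < 0.05)
```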

| Feasibility analysis
Feasibility reflects the degree to which the scale is deemed acceptable by target respondents. Commonly used indicators include the completion rate and the completion time. The completion rate, also known as the response rate, refers to the percentage of questionnaires that are attempted by respondents and returned. It is generally required to be more than 85%. The scale completion time is usually intentionally restricted to 30 min or less, because a longer duration is not conducive to clinical application or the implementation of an investigation.
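The two feasibility criteria can be checked as in the Python sketch below. Summarizing completion time by the median, the example counts and times, and the function name `feasibility` are assumptions for illustration.

```python
import statistics


def feasibility(distributed: int, returned: int, minutes: list[float],
                min_rate: float = 0.85, max_minutes: float = 30.0) -> dict:
    """Response rate (must exceed 85%) and median completion time (30 min or less)."""
    if distributed <= 0:
        raise ValueError("no questionnaires were distributed")
    rate = returned / distributed
    median_minutes = statistics.median(minutes)
    return {
        "response_rate": rate,
        "rate_ok": rate > min_rate,
        "median_minutes": median_minutes,
        "time_ok": median_minutes <= max_minutes,
    }


# Hypothetical survey: 200 questionnaires distributed, 180 returned
print(feasibility(200, 180, [22.0, 18.5, 25.0, 31.0, 19.0]))
# response_rate 0.9 and median 22.0 min: both criteria are met
```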

| Data analysis software
Data analyses were conducted using SPSS 25.0, Multilog 7.03, LISREL 10.0 and R 3.6.1.

| RESULTS

Table 1 shows the characteristics of the 364 CRF patients and 125 control participants who completed the final scale. The sociodemographic characteristics of the two groups were balanced and comparable.

| Item selection
The discrete trend (dispersion) was measured using the SD. The SD of each item is shown in Table 2. Deletion (SD < 1) was recommended for COM1 and SAT4.

In the factor analysis, the Kaiser-Meyer-Olkin (KMO) measure was 0.795, and Bartlett's test of sphericity yielded p < .001, indicating that the data were suitable for factor analysis. Based on the theoretical framework and the results of the EFA, 10 factors were ultimately extracted. Varimax (maximum variance orthogonal) rotation was then applied, and items with small (<0.4) or similar loadings across factors were removed (…).

In accordance with the predetermined theoretical structure, the correlation coefficient of each item with its own dimension and with the other dimensions was calculated, and items with small correlation coefficients (<0.6) were deleted (Table 2). The suggested deletions were SOM10, SOM12, SOM13, DLA1, SC1, SAT2 and SAT3. The Cronbach's α coefficient for each dimension of the scale is shown in Table 2.

A dimension was considered unidimensional when only one factor was extracted (the eigenvalue of the second factor was less than 1) or when the eigenvalue of the first factor was more than twice that of the second. Based on IRT, considering the estimated values of a, the suggested deletions were DLA1, DLA3 and SC1 (Table 2). Figure 3 shows the total information and measurement error provided by each dimension. Figure 4 shows a matrix of the item characteristic curves (ICCs) of all items. Ideally, the first and fifth curves of the ICC change monotonically, and the second, third and fourth curves are bell-shaped (unimodal). As shown in Figure 4 […]
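The eigenvalue rule for unidimensionality stated above can be sketched in Python with NumPy. In practice the input would be the item correlation matrix of one dimension (for example, `np.corrcoef(scores, rowvar=False)` for a respondents × items array); the matrices below are hypothetical, and this sketch is not the study's R procedure.

```python
import numpy as np


def is_unidimensional(corr: np.ndarray) -> bool:
    """Eigenvalue rule from the text, applied to an item correlation matrix."""
    eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]  # largest first
    first, second = eigenvalues[0], eigenvalues[1]
    # Unidimensional if the second eigenvalue is below 1, or the first is more than twice the second
    return bool(second < 1.0 or first > 2.0 * second)


# Hypothetical correlation matrices
one_factor = np.full((3, 3), 0.5) + 0.5 * np.eye(3)  # eigenvalues 2.0, 0.5, 0.5
two_factors = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
])  # eigenvalues 1.9, 1.9, 0.1, 0.1
print(is_unidimensional(one_factor), is_unidimensional(two_factors))  # True False
```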

| Content validity
The CVI of all items was higher than 80%, indicating good content validity.

| Dimensional correlation
The correlation coefficients between items and their own domains […] were above 0.50, and those of 7 of the 9 items in the therapeutic domain were above 0.50.

| Construct validity
Confirmatory factor analysis supported a four-factor structure, and the fit indices met the relevant standards.

| Response analysis
In the response analysis, the mean scores of the control group and the CRF group were statistically significantly different in every dimension assessed, indicating that the scale can distinguish between groups with different levels of quality of life (Table 5).

| Feasibility analysis
In the small-sample preinvestigation, the initial PROM was distributed to 120 patients with CRF. The response rates of the preinvestigation and the formal investigation were 93.3% and 97.6%, respectively.

| DISCUSSION

CRF is a long-term disease, so posthospital management plays an important role in a patient's prognosis. 25 However, because family members often lack medical knowledge, they frequently do not know how to systematically assess changes in a patient's status. In this context, it is reasonable to expect that the CRF-PROM will assist caregivers.
The National Center for Complementary and Alternative Medicine emphasizes that the effect of an intervention or treatment must be confirmed by a recognized endpoint indicator. 26

| CONCLUSIONS
The results of the present study suggest that the CRF-PROM is effective and reliable. It can be widely used in the posthospital management of patients, studies investigating CRF and clinical trials of new medical products. The CRF-PROM addresses the current lack of a scale specifically designed for use in CRF patients, and it may also prove useful in the development of additional disease-specific scales.
Notably, however, the broader applicability of the scale needs to be assessed by administering it in more provinces of China, its generalizability across different ethnic groups also remains to be assessed, and it may require further modification.

ACKNOWLEDGEMENTS
We thank all of the focus group members, the eight hospitals and communities that have worked with us and the investigators and medical staff who helped in the investigation. This study was supported by a grant from the National Natural Science Foundation of