The underlying structure of the English Cancer Patient Experience Survey: Factor analysis to support survey reporting and design

Abstract

Background: The English Cancer Patient Experience Survey (CPES) is a regularly conducted survey measuring the experience of cancer patients. We studied the survey's underlying structure using factor analysis to identify potential improvements in reporting and questionnaire design.

Methods: Respondents to the Cancer Patient Experience Survey 2015 (n = 71,186, response rate 66%) were split into two random subgroups. Using exploratory factor analysis (EFA) on the first subgroup, we identified the survey's latent structure. EFA was applied to 12 sets of items: a first ("core") set comprised questions that applied to all participants; the subsequent sets contained the core set plus questions corresponding to specific care pathways/patient groups. We used confirmatory factor analysis (CFA) on the second data subgroup for cross-validation.

Results: The EFA suggested that five latent factors underlie the survey's core questions. Analyses of the remaining 11 care pathway/patient group item sets indicated the same five latent factors, although additional factors were present for questions applicable to patients with an overnight stay or those accessing specialist nursing. The five-factor models had an excellent fit (comparative fit index = 0.95, root mean square error of approximation = 0.045 for the core set of questions). Items loading on each factor generally corresponded to a specific section or subsection of the questionnaire. CFA findings were concordant with the EFA patterns.

Conclusion: The findings suggest five coherent underlying sub-constructs relating to different aspects of cancer health care. They support the construction of evidence-based composite indicators for different domains of experience and provide options for survey re-design.


| BACKGROUND
Patient experience has been established as a distinct domain of quality of care, alongside clinical effectiveness and patient safety. [1][2][3] Consequently, in recent decades modern healthcare systems have conducted large patient surveys with nationwide coverage, whose findings are reported publicly for the responsible/accountable organizations. Examples include the General Practice Patient Survey (GPPS) and the Adult Inpatient Survey in England, and the CAHPS surveys in the United States. [4][5][6][7] Although some such surveys encompass patients with any disease, others focus on the experience of patients with specific diseases. The English Cancer Patient Experience Survey (CPES) is an example of such a survey. 8 By 2020, eight waves of the survey had been conducted since 2010, with another two being prepared.
Ideally, the psychometric properties of survey questionnaires are examined during the survey design process. Often, as was the case with the CPES, surveys are implemented prior to any psychometric evaluation. In such cases, factor analysis can provide insights with a number of potential uses. Factor analysis is a family of statistical techniques which identify underlying, latent, relationships among survey items, helping to identify the constructs underpinning a survey. Using factor analysis, survey questions which relate to the same underlying construct or domain of care can be grouped together. These domains of care could be used as the basis for performance management and public reporting conventions. Organizations may be classified on the basis of their performance within each domain, rather than, or in addition to, being classified on every item. Knowledge of these domains might help to more efficiently target quality improvement efforts, addressing the source of deficits in patient experience, rather than each particular aspect of experience measured by individual questions.
Having identified domains, results of factor analysis can also inform future questionnaire development. The number of questions relating to the domain, and the consistency of responses to questions within that domain, can help to inform whether further questions are needed or if there is potential for item removal within a domain. Although the approach has been applied to other patient experience surveys, [9][10][11][12][13][14][15][16] no prior study has used factor analysis to identify the underlying structure of CPES. We therefore aimed to elucidate the structure of the CPES survey using factor analysis.

| Data
We used data from 71,186 respondents to the National CPES 2015 (response rate 65.7%). Details of the survey and method of administration have been published elsewhere. 17 Briefly, the survey was mailed to all adult patients (aged 16 and over) discharged from a National Health Service hospital after inpatient or day case cancer-related treatment during April-June 2015, following vital status checks at survey mail-out (between 3 and 5 months after the sampling period).
The survey included 49 evaluative questions relating to aspects of patient experience (i.e., questions which ask patients to evaluate their care, in contrast with filter questions, which typically ask factual questions about care to establish whether a section of questions is relevant, e.g., whether the patient has had an operation). It also included questions about the patient (including age, gender, and ethnicity). Of the evaluative questions, seven have binary response options, 41 use a Likert scale with 3-7 response options, and one asks patients to rate their overall satisfaction between 0 and 10. Respondents were split randomly into two data sets, with one (N = 35,559) used to establish the underlying structure of the data through exploratory factor analysis (EFA). The resulting factor structure was then confirmed in the second data set (N = 35,627) using confirmatory factor analysis (CFA).
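The split into exploratory and confirmatory halves can be sketched as follows. The paper's analyses were run in R; this is an illustrative Python stand-in, and the function name `split_sample` and the exact half-split are ours (the paper's random split produced slightly unequal halves of 35,559 and 35,627).

```python
import numpy as np

def split_sample(n_respondents, seed=0):
    """Shuffle respondent indices and divide them into two halves:
    one for exploratory factor analysis (EFA), one for confirmatory
    factor analysis (CFA)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_respondents)
    half = n_respondents // 2
    return idx[:half], idx[half:]

# 71,186 CPES 2015 respondents -> two non-overlapping random halves
efa_idx, cfa_idx = split_sample(71186)
```

Fitting the EFA and CFA on disjoint halves means the confirmatory fit statistics are not flattered by having been derived from the same respondents the structure was discovered in.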

| Core questions
Of the 49 evaluative questions, 20 represented domains of care that were assumed to be relevant to all respondents (with the remaining questions being relevant only to certain groups of patients, such as those treated by chemotherapy or those in education or employment; see below). We excluded one of these questions (relating to access to clinical nurse specialists) from the core set as it acts both as a measure of experience and as a filter question. Throughout this work, we refer to the remaining 19 items as "core questions." Despite the high overall response rate of the survey, only 27% (19,263) of respondents gave an informative response to all the core questions (i.e., answers such as 'Don't know/can't remember' were treated as missing). Restricting analyses to this subset of respondents would reduce precision and introduce the potential for bias that can arise from listwise deletion. 18 To counter this, we produced a single imputation of the missing responses using chained equations under the missing at random assumption. Predictive mean matching was used to maintain the interval nature of the data.
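The predictive-mean-matching step at the heart of this imputation can be sketched as below. This is a minimal single-variable Python illustration, not the authors' implementation (chained-equations software such as R's mice cycles steps like this over all incomplete variables); the function `pmm_impute` and the donor-pool size `k=5` are our illustrative choices.

```python
import numpy as np

def pmm_impute(y, X, k=5, seed=0):
    """One predictive-mean-matching (PMM) step for an incomplete
    variable y given complete predictors X: regress y on X over the
    observed cases, then for each missing case donate the observed y
    of one of the k cases whose predicted means are closest. Because
    donated values are actual observed responses, the imputations stay
    on the original response scale."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    Xd = np.column_stack([np.ones(len(y)), X])  # add intercept
    beta, *_ = np.linalg.lstsq(Xd[obs], y[obs], rcond=None)
    pred = Xd @ beta
    y_imp = y.copy()
    for i in np.where(~obs)[0]:
        # indices (within the observed subset) of the k nearest donors
        donors = np.argsort(np.abs(pred[obs] - pred[i]))[:k]
        y_imp[i] = y[obs][rng.choice(donors)]
    return y_imp
```

Donating observed values rather than regression predictions is what preserves the interval/ordinal nature of the survey responses mentioned above.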
Reflecting the dichotomous and ordinal nature of the response options for most CPES questions, we primarily employed categorical (polychoric) correlations within the EFA, to avoid the attenuation of correlations between two categorical variables which can occur when Pearson correlations are used. We used linear (Pearson) correlations only for correlations involving the single 0-10 rating question. All correlations were computed using the psych package in R. 19 We first performed an unrestricted EFA to determine the number of factors to retain, using two methods: the Kaiser criterion, which retains factors with eigenvalues greater than one, 20 and Cattell's scree test, which involves examining a plot of eigenvalues (the scree plot) for breaks or discontinuities. 21 Having identified the number of factors, we performed additional EFA restricted to the number of factors identified by either method. We applied oblique rotations (using the Promax rotation method as implemented by the psych package in R 19 ) whenever either method indicated the retention of more than one factor, to examine whether rotated models resulted in improved overall fit. These rotations allow inter-factor correlations to be freely estimated. 22,23 We used a cut-off of 0.40 for the factor loadings; 24 items with lower loadings were removed.
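The Kaiser criterion amounts to eigen-decomposing the item correlation matrix and counting eigenvalues above 1. A minimal Python sketch on simulated data, using plain Pearson correlations as a simple stand-in for the polychoric correlations the paper computed with R's psych package:

```python
import numpy as np

def kaiser_count(responses):
    """Eigenvalues of the item correlation matrix, sorted descending,
    plus the number of factors the Kaiser criterion would retain
    (one per eigenvalue > 1)."""
    R = np.corrcoef(responses, rowvar=False)
    eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
    return eigenvalues, int(np.sum(eigenvalues > 1.0))

# Simulated unidimensional data: six items driven by one latent score,
# mirroring the pattern seen for the CPES core questions (one dominant
# eigenvalue, the rest well below 1)
rng = np.random.default_rng(1)
latent = rng.normal(size=(1000, 1))
items = latent + 0.8 * rng.normal(size=(1000, 6))
eigenvalues, n_retained = kaiser_count(items)
```

Plotting `eigenvalues` against their rank gives the scree plot used in Cattell's test; the eigenvalues sum to the number of items, so a first eigenvalue far above 1 signals a dominant common factor.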
To account for the ordinal nature of responses, the factor structures from the EFA were examined within the CFA using structural equation models, applying the Satorra-Bentler adjustment to the standard errors and chi-squared values. 22,23 We made use of the population error statistic root mean square error of approximation (RMSEA), the baseline comparison statistics comparative fit index (CFI) and Tucker-Lewis index (TLI), and the standardized root mean squared residual (SRMR) statistic. The following cut-off values are presently recognized as indicative of good fit: RMSEA < 0.07, CFI ≥ 0.95, TLI ≥ 0.95, SRMR < 0.08. 25,26 CFA was performed using the lavaan package in R. 27 The internal consistency reliability coefficients (Cronbach's alpha) for each factor derived from the EFA core model were computed using polychoric and Pearson correlations. The range of Cronbach's alpha coefficients within each factor when one question was left out was also calculated.
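For readers unfamiliar with these fit indices, the standard formulas relating them to the fitted and baseline (independence) model chi-squares can be sketched as follows. The chi-square values in the usage example are hypothetical, not taken from the paper; in practice lavaan reports these statistics directly.

```python
import numpy as np

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """RMSEA, CFI and TLI from the fitted model's chi-square (chi2_m,
    df_m), the baseline model's chi-square (chi2_b, df_b), and the
    sample size n, following the standard SEM formulas (one common
    convention uses n - 1 in the RMSEA denominator)."""
    rmsea = np.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))
    cfi = 1.0 - max(chi2_m - df_m, 0.0) / max(chi2_b - df_b, chi2_m - df_m, 1e-12)
    tli = ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1.0)
    return rmsea, cfi, tli

# Hypothetical chi-square values for illustration only
rmsea, cfi, tli = fit_indices(chi2_m=142.0, df_m=100, chi2_b=5000.0, df_b=120, n=1000)
```

RMSEA penalizes model misfit per degree of freedom and shrinks with sample size, whereas CFI and TLI measure improvement over the worst-case independence model, which is why the two families of indices can disagree about the same model, as seen in the Results.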
| Questions relating to specific patient groups/care pathways

Unlike "core" questions that every patient could have answered, most questions applied only to specific patient groups (e.g., those in education or employment) or to those who had undergone specific care pathways (e.g., having been treated by chemotherapy or having had an overnight stay in hospital). When responses to these questions were missing, this usually reflected the lack of applicability of a specific care pathway (e.g., patients without a hospital stay should not answer questions regarding their experience as inpatients), rather than a lack of response to an applicable question. For this reason, we did not impute responses to questions relating to specific patient groups/care pathways and aspects of care.
Following previous work examining key drivers of satisfaction, 28 we classified questions into 10 sets representing a specific patient group or care pathway, plus a further set including the question about access to clinical nurse specialists which was left out of the core set for analytic reasons (Appendix 1, Table A1). The above analysis for the core questions was repeated a further 11 times including responses to the core questions and responses to the questions applicable to the particular patient group/care pathway.
Analysis was performed using R 3.6.1. 29

| Core questions
The scree plot for the unrestricted EFA applied to the core set of 19 questions applicable to all respondents is shown in Figure 1. Only one factor had an eigenvalue > 1, suggesting a single unidimensional underlying patient experience construct. A restricted EFA model with a single factor resulted in factor loadings > 0.4 for 18 of the 19 questions considered (Table 1). Only the question on willingness to take part in cancer research (Q58) had a loading < 0.4. Applying this one-factor model (after removing Q58) within a CFA showed that, depending on the goodness-of-fit measure used, the model did not provide a good fit to the data (RMSEA = 0.081, CFI = 0.836, and TLI = 0.814 indicating an unacceptable fit, and SRMR = 0.054 indicating an acceptable fit, against recommended threshold values of RMSEA < 0.07, CFI ≥ 0.95, TLI ≥ 0.95, and SRMR < 0.08). 25,26 We therefore examined the scree plot to determine the number of factors which should be retained. The scree plot (Figure 1) did not display any clear break or discontinuity. We therefore chose to retain five factors, corresponding to the point where the steep decline ends (after factor five) and the eigenvalues reach a very low level (at factors five and six). Applying a five-factor restricted EFA to the 19 core questions identified factor loadings > 0.4 for all questions, except for Q58 (Table 2). In general, the factors correspond to a domain or subdomain of care as explicitly captured by a section or subsection of the survey questionnaire.
The questions loading on each factor are shown in Table 2. Applying this five-factor model within a CFA showed that the model provided a good fit to the data on three of the four goodness-of-fit measures considered, with the fourth just below the acceptable threshold (RMSEA = 0.045 and SRMR = 0.029 indicating a good fit, CFI = 0.954 indicating an acceptable fit, and TLI = 0.944 falling just short of the acceptable threshold).
The values of Cronbach's alpha for each of the five factors met the conventionally acceptable value of 0.70, with the exception of Factor 5, whose Cronbach's alpha was 0.60 (Appendix 1, Table A2). Deletion of one question from Factor 2 ("As far as you know, was your GP given enough information about your condition and the treatment you had at the hospital?") led to an increase in Cronbach's alpha.
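Cronbach's alpha and the alpha-if-item-deleted values reported in Appendix 1 follow directly from the item variances and total-score variance. A minimal Python sketch on simulated data (the paper also computed alpha from polychoric correlations, which this simple covariance-based version does not reproduce):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances /
    variance of the total score), over a respondents-by-items array."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def alpha_if_deleted(items):
    """Alpha recomputed with each item left out in turn; an increase
    after deletion flags an item inconsistent with the rest of the scale."""
    k = items.shape[1]
    return [cronbach_alpha(np.delete(items, j, axis=1)) for j in range(k)]

# Simulated four-item scale sharing one latent component
rng = np.random.default_rng(2)
latent = rng.normal(size=(2000, 1))
items = latent + rng.normal(size=(2000, 4))
alpha = cronbach_alpha(items)
loo = alpha_if_deleted(items)
```

For a coherent scale, deleting any item lowers alpha; the GP-information item in Factor 2 behaved in the opposite way, which is why its deletion increased alpha.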

| SPECIFIC PATIENT GROUPS/CARE PATHWAYS
The scree plots for the unrestricted EFA models applied to the 11 sets of "patient group/pathway-specific" questions (comprising the core questions plus questions applicable to a particular patient group/care pathway) are shown in Appendix 2. When the number of factors to retain was based on eigenvalues > 1, we retained only one factor in eight of the 11 patient group/pathway-specific sets of questions. For the remaining three sets of questions, there were two eigenvalues > 1 (Appendix 2). As was the case with the model for the core questions only, in all 11 patient group/pathway-specific EFA models restricted to one or two factors, the question on willingness to take part in cancer research (Q58) consistently had factor loadings < 0.4. This indicated that this question belonged neither to the core underlying construct of patient experience nor to the underlying construct of the additional patient group/pathway-specific factor. For the 8/11 patient group/pathway-specific sets of questions where (in restricted EFA) only one factor was retained, the noncore questions all loaded (>0.4) onto this single factor, indicating that they belonged to the core underlying construct of patient experience, with the exception of Q17 ("Were you given the name of a Clinical Nurse Specialist?"). Where two factors were retained, the noncore questions all loaded onto a second factor defined by the noncore questions, namely:
• Questions about support for people with cancer.
• Questions about hospital stay.
• Questions about support from health and social care services outside hospital.
One or more core questions also loaded onto the new factor. No cross-loadings were observed in any of these models.
As with models restricted to the core questions, applying a one- or two-factor CFA (as appropriate) to the question sets did not provide a good fit to the data (see Appendix 1, Table A3).
As with the core set of questions, the scree plots for the 11 sets of questions (comprising the core questions plus questions applicable to a particular patient group/care pathway) did not display any clear break or discontinuity. Instead, we retained a number of factors such that all factors present in the core-questions-only model were retained. In nine of the 11 patient group/pathway-specific sets of questions this was achieved by retaining five factors. In two cases, an additional factor was retained (resulting in six-factor models, see Table 3), relating to:
• Questions about specialist nurse care.
• Questions about hospital stay.
In CFA, these five- and six-factor models (Appendix 1, Table A3) were found to provide a good/acceptable fit to the data according to RMSEA and SRMR (RMSEA range 0.042-0.058 and SRMR range 0.028-0.056). The CFI and TLI statistics for these models were close to achieving, or achieved, an acceptable fit (CFI range 0.931-0.954 and TLI range 0.919-0.944).

| Summary of findings
We have applied exploratory and confirmatory factor analyses to the responses to the English CPES. We found that the core set of questions which applied to all patients, and many questions which applied only to a subset of patients, were dominated by a single underlying factor (as indicated by factor eigenvalues > 1). However, this single factor did not provide a good description of the data (according to goodness-of-fit metrics), implying a more complex underlying structure. Visual inspection of scree plots implied that five underlying factors can describe the experiences of patients captured by the core questions applicable to all patients: shared decision-making; care coordination and administration; the diagnostic process; timeliness of investigations; and aftercare and support. Many questions applicable to specific subsets of patients also fitted within these five underlying domains, but additional factors were required for specialist nursing and for hospital stay. These domains of care provide a good description of the data (according to goodness-of-fit metrics). Furthermore, they largely fit with the existing structure of the survey and, in light of the data presented here, represent a reasonable target for public reporting of data and performance improvement.

TABLE 1 Factor matrix for the exploratory factor analysis model restricted to a single factor, applied to the core set of 19 questions. Blanks correspond to loadings less than 0.4.

| Comparisons with the literature
Previous work has examined the underlying structure of other nationwide patient experience surveys, including HCAHPS and GPPS, 30,31 but this is the first time that this approach has been applied to an established nationwide survey of the experience of cancer patients. It has long been recognized that patient experience varies greatly by patient sociodemographic characteristics, including age, sex, socioeconomic status, and ethnicity. 32,33 Furthermore, for CPES, cancer site/type is strongly associated with ratings of experience, above and beyond adjustment for other patient-level variables. 32,34 Future work should address whether the underlying structure of CPES may vary by patient group.

| Strengths and limitations
We used a large sample, which allowed for precise estimation of the underlying factors. We used data from a single year only; however, the survey has been conducted seven times between 2010 and 2019. During this period, only small changes have been made to the wording of survey items, the number and type of questions have remained largely the same, and the overall structure has remained consistent, with the same sections covering the various stages of the care pathway. The findings are therefore likely to be generalizable across survey waves. We note that, in general, core questions relating to the different factors tend to be placed in close proximity to each other within the questionnaire. While this can be useful to the patient, it is possible that this proximity influenced the factor structure we observed. However, when we also considered the questions applicable to certain patient groups or care pathways, we found similar questions loading onto the same factor even though they were placed at some distance from each other within the questionnaire. This would not be expected if proximity were the driving force behind the observed factors, and it thus provides further support for the five/six-factor structure we propose.

| Implications
Our results support the current structure of the survey, which in general covers the range of aspects of care and patient experience relevant to cancer patients. The survey seems to capture the experience of patient groups defined by different care pathways and services equally well. Furthermore, the results indicate that although factual questions (such as that about participating in research, Q58) can be successfully included in care experience questionnaires, it is important to recognize that these do not, on the basis of the results presented here, represent aspects of patient experience per se. This analysis can support the construction of a number of composite indicators to summarize hospital performance with respect to cancer patient experience. For example, such composites might target organizational performance across aspects of care experience relating to the five underlying domains/factors identified (i.e., shared decision-making; care coordination and administration; the diagnostic process; timeliness of investigations; and aftercare and support). Alongside consideration of the drivers of satisfaction with care, such composites may help users of the survey to more easily relate to study findings and to prioritize bundles of actions and interventions targeting specific composite domains (we explore this further below). This could, in principle, help to increase the reliability of organizational-level scores, which is a known limitation of question-based scores of the CPES survey, though this needs to be explored directly in further empirical research.
It is worth noting that while factor analysis can provide evidence about patterns of responses, it tells us little about the relative importance of the various aspects of care. If an overall summary score were to be derived, various weighting schemes could be applied. All questions or domains of care could be weighted equally, though this implies they have equal importance. Alternatively, policy-based weights may be employed, reflecting an external view of the importance of different domains of care. A third option is to employ a key drivers analysis, which empirically examines the importance of survey items to survey respondents through their associations with a global evaluation item. Such a key drivers analysis has been carried out for a number of surveys, including CPES. 28,[35][36][37] Many of these use a selection of individual questions, but others use domain scores, which can be based on factor analyses.
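One common form of key drivers analysis regresses the global evaluation item on the domain scores and treats the standardized coefficients as importance weights. A minimal Python sketch on simulated data (the two "domains" and their effect sizes are entirely hypothetical, not results from CPES):

```python
import numpy as np

def key_driver_weights(domain_scores, overall_rating):
    """Standardized regression coefficients of a global rating on
    domain-level scores; in a key drivers analysis these serve as
    empirical importance weights for the domains."""
    Z = (domain_scores - domain_scores.mean(axis=0)) / domain_scores.std(axis=0, ddof=1)
    y = (overall_rating - overall_rating.mean()) / overall_rating.std(ddof=1)
    X = np.column_stack([np.ones(len(y)), Z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]  # drop the intercept

# Simulated example: the overall rating depends more strongly on the
# first of two hypothetical domain scores
rng = np.random.default_rng(3)
domains = rng.normal(size=(5000, 2))
overall = 0.6 * domains[:, 0] + 0.2 * domains[:, 1] + 0.3 * rng.normal(size=5000)
weights = key_driver_weights(domains, overall)
```

The domain scores fed into such an analysis could themselves be factor scores from the five-factor structure identified here, linking the two approaches.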
The CPES is a survey with a relatively large number of questions. As such, there may be some desire to shorten the questionnaire to reduce the burden on responding patients. The high internal consistency of Factor 1 (shared decision-making) and Factor 3 (diagnostic process) indicates potential for item removal. In contrast, the low internal consistency of Factor 5 (aftercare and support) indicates that there may be benefit in additional questions in this area. While factor analysis can help to identify potential questions for removal (for example, by identifying domains of experience surveyed by a large number of questions), it is not considered sufficient for such purposes on its own. 22,23 First, removing a question from a survey could be detrimental to its content validity. Furthermore, weak loadings might be the result of sampling error, although this is unlikely to be an issue in our study context, given the large sample size. As a consequence, replication of factor analytic models is critical for scale development.

| CONCLUSION
The underlying structure of the CPES corresponds to five major aspects of the care experience and pathways of cancer patients. The findings support the current survey design, while also providing potential options to guide survey redesign, to inform how survey findings might optimally be reported, and to target improvement efforts.