Health utility values of breast cancer treatments and the impact of varying quality of life assumptions on cost‐effectiveness

In breast cancer research, utility assumptions are outdated and inconsistent, which may affect the results of quality adjusted life year (QALY) calculations and thereby cost‐effectiveness analyses (CEAs). Four hundred sixty-four female patients with breast cancer treated at Erasmus MC, the Netherlands, completed EQ‐5D‐5L questionnaires from diagnosis throughout their treatment. Average utilities were calculated stratified by age and treatment. These utilities were applied in CEAs analysing 920 breast cancer screening policies differing in eligible ages and screening interval, simulated by the MISCAN‐Breast microsimulation model, using a willingness‐to‐pay threshold of €20,000 per QALY gained. The CEAs included varying sets of normative, breast cancer treatment, and screening and follow‐up utilities. Efficiency frontiers were compared to assess the impact of the utility sets. The calculated average patient utilities were reduced at breast cancer diagnosis and 6 months after surgery and increased toward normative utilities 12 months after surgery. When normative utility values of 1 were used in CEAs, QALYs were overestimated compared to using average gender and age‐specific values. Only small differences in QALYs gained were seen when varying treatment utilities in CEAs. The CEAs varying screening and follow‐up utilities showed only small changes in QALYs gained and the efficiency frontier. Throughout all variations in utility sets, the optimal strategy remained robust: biennial screening for ages 40–76 years and occasionally biennial screening for ages 40–74 years. In sum, we recommend using gender and age stratified normative utilities in CEAs, and patient‐based breast cancer utilities stratified by age and treatment or disease stage. Furthermore, despite varying utilities, the optimal screening scenario seems very robust.


What's new?
Utility scores facilitate calculations of quality-adjusted life years (QALYs), which represent a measure of the value of health outcomes. With regard to breast cancer, however, utility scores are inconsistent, with consequences for QALY evaluation and cost-effectiveness analysis (CEA).
In our study, new utility values for breast cancer treatment were determined at different time points and stratified by different measures. Our evaluation shows that normative utilities stratified by age and gender best reflected real-world situations and were therefore most meaningful for CEA. Likewise, patient-based breast cancer quality-of-life parameters stratified by age and treatment or by disease stage were most appropriate for CEA.

| INTRODUCTION
3][4][5] In the Netherlands, the 10-year survival rate of invasive breast cancer increased from 40% in 1961-1970 to 80% in 2011-2020. 6 Researchers and physicians aim not only to reduce breast cancer mortality but also to improve the health-related quality of life (HrQoL) of patients with breast cancer. Especially with a growing population of patients with a good prognosis, the effects of treatment choices on HrQoL have become more important. Diagnosis and treatment of breast cancer have both physical and psychological effects on patients and affect their HrQoL. To measure HrQoL in patients, multiple disease-specific and generic patient-reported outcome measures (PROMs) have been developed. 8][9] Generic PROMs such as the 36-Item Short Form Survey (SF-36) and the EuroQol EQ-5D are not disease specific and can be used to compare HrQoL between groups of people, irrespective of their health condition. 10,11 Some generic PROMs can be used to calculate a utility score, which represents the desirability of an individual's health state at a particular point in time and generally ranges between 1 (perfect health) and 0 (equal to death), but can also be negative (worse than death). 12 Subsequently, utility scores per health state can be applied in cost-effectiveness analyses to calculate the quality-adjusted life years (QALYs) of a specific group of people over time.
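As a worked illustration of how utility scores are turned into QALYs, the standard textbook formulation (not a formula stated in this study) weights the time spent in each health state by its utility:

```latex
\mathrm{QALY} = \sum_{k=1}^{K} u_k \, \Delta t_k
```

Here $u_k$ is the utility of health state $k$ and $\Delta t_k$ the time in years spent in it. For example, a hypothetical person living 1 year at utility 0.8 and then 2 years at utility 0.9 accrues $0.8 \times 1 + 0.9 \times 2 = 2.6$ QALYs, compared with 3 QALYs for the same 3 years in perfect health.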
Utility scores have been reported for a range of health states, diseases and populations.4][15][16][17] The differences between these estimates can largely be explained by three issues.
16]18 The United Kingdom National Institute for Health and Care Excellence (NICE) recommends the use of the EQ-5D, which has led to an increase in its use over time. 19 Second, studies differed in the population who completed the HrQoL measures on which the utility scores were based. A systematic review of oncological cost-effectiveness studies found that only 25% of the studies used utility scores based on responses of people who had actually experienced the specific health state. 15 In breast cancer specifically, a systematic review found that 11% of the metastatic breast cancer utilities and 43% of the early breast cancer utilities were based on responses of people who had actually experienced the specific health states. 16 The other utilities were based on responses of clinicians, researchers, the community, or patients who were asked to value a hypothetical health state. 5][16] Third, utility scores were found to differ between countries even when populations, socioeconomic status, health systems and attitudes toward health were expected to be quite similar. 20 It would be most accurate to use utility scores derived from the same population as the target population of a study. Therefore, the EQ-5D has country-specific value sets to translate the answers to the questionnaire into a country-specific utility score. 21
In cost-effectiveness analyses with QALYs as the effectiveness measure, utility values are applied to health states relevant to the disease or intervention studied. Cost-effectiveness analyses concerning cervical cancer, colorectal cancer and oesophageal adenocarcinoma screening showed that outcomes depended on which utility scores were used. 18,22 For instance, in a study by de Kok et al, varying HrQoL parameters in health states related to cervical cancer screening and treatment led to variation in the optimal screening strategy, with differences in preferred primary test, number of lifetime tests and screening interval. 18 This shows that the utility values chosen in cost-effectiveness analyses may influence the results of the analyses and thereby possibly policy recommendations. For breast cancer health states, a variety of utility values are available. However, it is uncertain which values are best to use. 13,14,17 These uncertainties concern utilities of the normative population, disutilities of breast cancer states (ie, the decrease in utility value due to a specific health state), and disutilities concerning screening and follow-up.
Therefore, the aim of our study is to determine new utility values for various breast cancer treatment options at different time points during treatment, stratified by age. In addition, our study aims to quantify the impact of different sets of normative, breast cancer treatment, and screening and follow-up utility values on the cost-effectiveness of a large set of breast cancer screening strategies.

| METHODS
Our study consisted of two parts: (1) a longitudinal prospective cohort study on health utility values during breast cancer treatment and (2) a methodological microsimulation study on the effects of varying quality of life assumptions on cost-effectiveness.

| Data collection
Breast cancer utility scores were calculated using an institutional database including questionnaire responses of 734 women with breast cancer who received axillary treatment for lymph node staging and/or metastasis in the Erasmus MC Cancer Institute in the Netherlands between November 2015 and December 2021. 23 Inclusion criteria were a breast cancer diagnosis, receipt of axillary treatment and completion of a set of quality of life related PROMs in the 'patient data platform', the Institute's online PROM collection tool. This tool included the generic HrQoL questionnaire EuroQol EQ-5D-5L. 24 The EQ-5D-5L is the updated version of the originally developed EQ-5D-3L and offers five answer options per HrQoL dimension, which makes the results more sensitive to small variations in HrQoL than those of the EQ-5D-3L. 25,26 Exclusion criteria were undergoing proton therapy or palliative treatment, or having a history of previous breast cancer. Data of patients were also excluded if clinical data were unavailable or if patients had not completed the EQ-5D-5L. Informed consent was obtained during the first questionnaire, as part of the routine care protocol. 27,28 Specific information about the cohort has been published elsewhere. 23 The database contained self-reported sociodemographic data, clinical treatment data from medical records and EQ-5D-5L data from questionnaires completed before surgery (baseline), 6 months post-surgery and 1 year post-surgery.

| Statistical analyses
The answers to the EQ-5D-5L questionnaires were transformed into utility scores using the Dutch value set. 29 Average utility scores were calculated for patients stratified by treatment allocation and age.
Treatment was categorised using four characteristics of treatment
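A minimal sketch of the stratified averaging, assuming utilities have already been derived from the EQ-5D-5L answers with the Dutch value set (the value-set coefficients are not reproduced here). The records, stratum labels and the function name `average_utilities` are hypothetical; the number of patients per stratum is returned alongside the mean because very small strata give unreliable averages.

```python
from collections import defaultdict
from statistics import mean


def average_utilities(records):
    """Mean utility and number of patients per (age group, treatment, time point) stratum.

    Each record is (age_group, treatment, time_point, utility). Utilities are assumed
    to be already converted from EQ-5D-5L answers with the Dutch value set.
    """
    strata = defaultdict(list)
    for age_group, treatment, time_point, utility in records:
        strata[(age_group, treatment, time_point)].append(utility)
    return {key: (mean(values), len(values)) for key, values in strata.items()}


# HYPOTHETICAL records, not study data
records = [
    ("45-54", "BCS", "baseline", 0.85),
    ("45-54", "BCS", "baseline", 0.95),
    ("45-54", "BCS", "6 months", 0.80),
    ("65-74", "mastectomy", "baseline", 0.70),
]
for stratum, (avg, n) in sorted(average_utilities(records).items()):
    print(stratum, round(avg, 3), n)
```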

| Microsimulation modelling
The microsimulation screening analysis breast (MISCAN-Breast) model was used to simulate individual life histories of a population of women from birth to death and, in a subset of women, the natural history of breast cancer (Figure S1). 4,30 Breast cancer treatment parameters were updated with data up to 2013, and the natural history and breast cancer survival rates were calibrated with data up to 2015. 31 Additional information about the MISCAN-Breast model can be found in the Supplementary Methods section in Data S1 and elsewhere. 4,31,32 In our study, we simulated a cohort of 10 million women at average risk of developing breast cancer. All women were assumed to be born on 1 January 1980, and life tables, breast cancer parameters and screening parameters were based on data from the Netherlands. Outcomes were calculated for the women from age 40 until death. To estimate the full potential of the screening strategies, attendance rates were set at 100%. In total, 920 breast cancer screening policies were simulated, varying in the age groups eligible for screening and the screening interval. The simulations with the MISCAN-Breast model were performed as in previously published cost-effectiveness analyses; additional details on parameters and assumptions can be found in the original article. 32 The results from the simulations were used to calculate, per screening policy, the number of QALYs gained and the additional costs compared to no screening. Furthermore, incremental cost-effectiveness ratios (ICERs) were calculated to form efficiency frontiers.

| Quality of life parameters
To evaluate the effect of different utility sets on cost-effectiveness, different sets of (1) normative, (2) breast cancer treatment and (3) screening and follow-up utility values were used to calculate QALYs gained and ICERs.

Normative utility values
First, cost-effectiveness analyses were performed using different utility sets for the normative health state (ie, the average health state of a comparative population without the disease of interest). The normative utility values applied were perfect health (utility score of 1), gender specific (0.858) 29 and gender and age specific 33 (Table 1).

In these analyses, breast cancer treatment and screening and follow-up utility values were used as in the study by Kregting et al. 32
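A minimal sketch of why the choice of normative utility set matters, assuming a hypothetical averted breast cancer death at age 60 that adds 20 undiscounted life-years: those life-years are lived in normative health, so the QALYs gained scale with the normative value. The gender-specific value of 0.858 is taken from the text; the age-specific step function is a hypothetical placeholder, not the Table 1 values, and all function names are illustrative.

```python
def perfect_health(age):
    return 1.0


def gender_specific(age):
    return 0.858  # gender-specific normative value reported in the text


def gender_and_age_specific(age):
    # HYPOTHETICAL step function declining with age; not the Table 1 values
    if age < 50:
        return 0.90
    if age < 70:
        return 0.85
    return 0.80


def qalys_from_life_years(start_age, years, normative_utility):
    """Undiscounted QALYs for whole life-years lived in normative health from start_age."""
    return sum(normative_utility(start_age + t) for t in range(years))


# HYPOTHETICAL averted death at age 60 adding 20 life-years
for label, utility_set in [("perfect health", perfect_health),
                           ("gender specific", gender_specific),
                           ("gender and age specific", gender_and_age_specific)]:
    print(label, round(qalys_from_life_years(60, 20, utility_set), 2))
```

Under these assumptions the perfect-health set credits 20 QALYs, the gender-specific set about 17.2 and the hypothetical age-specific set 16.5, which mirrors the direction of the overestimation reported when normative utilities of 1 are used.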

Breast cancer treatment utility values
Second, cost-effectiveness analyses were performed using different utility sets for breast cancer treatment health states. These utility values were (1) equal to the values from Kregting et al, 32 which were based on Stout et al, 34 and (2-5) based on the values from part 1 of the current study (Table 1). Utility scores at diagnosis were used in the analyses for a duration of 1 month, treatment scores for the following 11 months and recovery scores for the subsequent year. Furthermore, screening and follow-up utility values were similar to Kregting et al, and normative utility values were age and gender specific. 32,33

Screening and follow-up utility values
Lastly, cost-effectiveness analyses were performed using different utility sets for breast cancer screening and follow-up health states. These utility values were varied based upon the values by de Haes et al (Table 1). Therefore, the analyses of these utility sets can be seen as a sensitivity analysis. Furthermore, breast cancer treatment utility values were similar to Kregting et al, and normative utility values were age and gender specific. 32,33
For some treatment categories, no patient data were available to calculate utility values. This was the case for some treatment options in the age group 75 years and older. Therefore, we assumed that the effect of treatment on utility values for women aged 75 years and older would be the same as for women aged between 64 and 75 years receiving the same treatment, with a correction based on the difference in normative utility score between the two age groups (factor 0.99). Furthermore, there were no utility data for patients with breast cancer who had no surgery. Therefore, we assumed that women younger than 75 years without surgery had a poor prognosis, possibly due to metastasis or comorbidity, and therefore a poor quality of life, with an assumed disutility factor of 30% compared to normative utilities. Women aged 75 years and older who did not undergo breast cancer surgery were assumed to have latent tumours that would probably not lead to breast cancer death. Therefore, the disutility factor for this group was assumed to be 15% compared to normative utility values.
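The phase durations and imputation rules for treatment utilities can be sketched as follows. This is a minimal illustration under stated assumptions: phase utilities are treated as absolute utility values, the 30% and 15% disutilities are read as proportional reductions of the normative utility, and the phase utilities and function names are hypothetical rather than the study's implementation.

```python
# Phase durations in months: diagnosis 1 month, treatment the following 11 months,
# recovery the subsequent 12 months.
PHASE_MONTHS = {"diagnosis": 1, "treatment": 11, "recovery": 12}

AGE_75_FACTOR = 0.99                    # normative correction between the two oldest age groups
NO_SURGERY_DISUTILITY_UNDER_75 = 0.30   # assumed proportional reduction of normative utility
NO_SURGERY_DISUTILITY_75_PLUS = 0.15


def impute_75_plus(utilities_younger_group):
    """Borrow phase utilities from the next-younger age group, scaled by the 0.99 factor."""
    return {phase: u * AGE_75_FACTOR for phase, u in utilities_younger_group.items()}


def no_surgery_utilities(age, normative_utility):
    """Apply the age-dependent no-surgery disutility to the normative value in every phase."""
    disutility = NO_SURGERY_DISUTILITY_75_PLUS if age >= 75 else NO_SURGERY_DISUTILITY_UNDER_75
    u = normative_utility * (1 - disutility)
    return {phase: u for phase in PHASE_MONTHS}


def qaly_loss(phase_utilities, normative_utility):
    """Undiscounted QALY loss over the 24-month episode relative to normative health."""
    return sum((normative_utility - phase_utilities[phase]) * months / 12
               for phase, months in PHASE_MONTHS.items())


# HYPOTHETICAL phase utilities for women aged 65-74 in one treatment category
utilities_65_74 = {"diagnosis": 0.80, "treatment": 0.78, "recovery": 0.84}
utilities_75_plus = impute_75_plus(utilities_65_74)
print(round(qaly_loss(utilities_65_74, 0.86), 4))
print(round(no_surgery_utilities(80, 0.80)["treatment"], 3))
```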
Data from the Netherlands Cancer Registry (NKR) and the Netherlands Comprehensive Cancer Organisation (IKNL) on the use of breast cancer treatment options in the total population of patients with breast cancer diagnosed in 2017 in the Netherlands were used to determine treatment usage per age group, stratified by mode of detection (Tables S2-S5). The modes of detection were screen-detected cancer, interval cancer (diagnosed at most 30 months after a screening examination) and clinically detected cancer (in women who had not attended screening for at least 30 months).
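A minimal sketch of the mode-of-detection rule and of combining treatment-specific utilities by observed treatment usage. The usage-weighted average is one plausible way to combine them per age group and mode of detection and is an assumption, not the model's documented implementation; the 30-month boundary is treated as inclusive for interval cancers, and the treatment shares, utilities and function names are hypothetical.

```python
def mode_of_detection(screen_detected, months_since_last_screen):
    """Classify a cancer using the 30-month window.

    months_since_last_screen is None for women who never attended screening.
    """
    if screen_detected:
        return "screen-detected"
    if months_since_last_screen is not None and months_since_last_screen <= 30:
        return "interval"
    return "clinically detected"


def usage_weighted_utility(treatment_shares, treatment_utilities):
    """Average treatment utilities weighted by the share of patients receiving each treatment."""
    total = sum(treatment_shares.values())
    return sum(share * treatment_utilities[t] for t, share in treatment_shares.items()) / total


# HYPOTHETICAL treatment usage for one age group and mode of detection
shares = {"BCS": 0.6, "mastectomy": 0.4}
utilities = {"BCS": 0.85, "mastectomy": 0.80}
print(mode_of_detection(False, 18))
print(round(usage_weighted_utility(shares, utilities), 3))
```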

| Cost-effectiveness analyses
A healthcare payer perspective was adopted and direct medical costs were calculated, including costs of screening, diagnostics and treatment.
Cost parameters were similar to the analyses of Kregting et al, in which they were largely based on a study by Geuzinge et al (Table S1 and Supplementary Methods in Data S1). 32,35 Per utility category, multiple cost-effectiveness analyses were performed, differing in utility scores only. All other parameters remained the same and were equal to those in the cost-effectiveness analyses previously performed by Kregting et al. This included discounting of QALYs and costs at 3.5% per year from 2020.
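Discounting at 3.5% per year can be sketched as below. This assumes whole-year offsets from 2020 (the timing convention within a year is not stated in the text); the function name and QALY values are hypothetical, and the same function applies to costs.

```python
def discounted_total(values_by_year, base_year=2020, rate=0.035):
    """Discount yearly costs or QALYs back to base_year at a constant annual rate."""
    return sum(value / (1 + rate) ** (year - base_year)
               for year, value in values_by_year.items())


# HYPOTHETICAL: one QALY gained in each of 2020, 2021 and 2030
qalys = {2020: 1.0, 2021: 1.0, 2030: 1.0}
print(round(discounted_total(qalys), 4))
```

Because gains from screening often occur decades after the screening costs, discounting reduces QALYs gained relatively more than costs.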
Subsequently, ICERs were calculated by dividing the difference in costs by the difference in QALYs between screening strategies. Therefore, the ICER reflects the additional costs required to gain one additional QALY compared to the previous strategy. ICERs were not calculated for strategies that were dominated by another strategy (ie, another strategy gained more QALYs at lower costs). Per utility score set, an efficiency frontier was drawn through all strategies that were not dominated and therefore had an ICER. The ICERs were compared to a conservative willingness to pay (WTP) threshold of €20,000 per QALY gained. 36 Strategies that did not exceed this threshold were considered cost-effective. Per utility category, the efficiency frontiers were compared between the different utility score sets. Moreover, the optimal strategies according to the WTP threshold were compared between utility sets.
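A minimal sketch of the ICER and efficiency-frontier calculation, and of picking the optimal strategy at a WTP of €20,000 per QALY gained. The text mentions only strong dominance; this sketch also skips extendedly dominated strategies, which is common practice for efficiency frontiers but is an assumption here. Strategy names, costs and QALYs are hypothetical, not results of this study.

```python
from typing import List, NamedTuple, Optional, Tuple


class Strategy(NamedTuple):
    name: str
    cost: float   # discounted costs (EUR), e.g. relative to no screening
    qalys: float  # discounted QALYs gained, e.g. relative to no screening


def icer(new: Strategy, ref: Strategy) -> float:
    """Additional costs per additional QALY of `new` compared with `ref`."""
    return (new.cost - ref.cost) / (new.qalys - ref.qalys)


def efficiency_frontier(strategies: List[Strategy]) -> List[Tuple[Strategy, Optional[float]]]:
    """Return (strategy, ICER vs previous frontier strategy) pairs.

    Starts at the cheapest strategy and repeatedly moves to the strategy with the lowest
    ICER among those gaining more QALYs, so strongly and extendedly dominated
    strategies are never selected.
    """
    current = min(strategies, key=lambda s: (s.cost, -s.qalys))
    frontier: List[Tuple[Strategy, Optional[float]]] = [(current, None)]
    while True:
        more_effective = [s for s in strategies if s.qalys > current.qalys]
        if not more_effective:
            return frontier
        ref = current
        nxt = min(more_effective, key=lambda s: (icer(s, ref), -s.qalys))
        frontier.append((nxt, icer(nxt, ref)))
        current = nxt


def optimal_strategy(frontier, wtp=20_000.0):
    """Most effective frontier strategy whose ICER does not exceed the WTP threshold."""
    best = frontier[0][0]
    for strategy, ratio in frontier[1:]:
        if ratio > wtp:
            break
        best = strategy
    return best


# HYPOTHETICAL strategies; not results from this study
strategies = [
    Strategy("no screening", 0.0, 0.0),
    Strategy("A", 1_000_000.0, 100.0),
    Strategy("B", 1_500_000.0, 110.0),
    Strategy("C", 2_000_000.0, 90.0),   # strongly dominated by A
    Strategy("D", 1_100_000.0, 104.0),
    Strategy("E", 1_300_000.0, 105.0),  # extendedly dominated
]
frontier = efficiency_frontier(strategies)
print([(s.name, ratio) for s, ratio in frontier])
print(optimal_strategy(frontier).name)
```

Along such a frontier the ICERs are non-decreasing, which is why the search for the optimal strategy can stop at the first ICER above the threshold.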

| RESULTS
In total, 734 patients were identified in the institute's online patient database until 31 December 2021 (Figure 1). Results of the cost-effectiveness analyses showed that variation in normative utilities had an effect on the number of QALYs gained per modelled screening strategy (Figure 2A). The 'perfect health' utility set resulted in more QALYs gained than the other two sets. Only small differences in QALYs gained were observed between the 'gender specific' and 'gender and age specific' utility sets, with the former slightly lower. Across all three utility sets, the efficiency frontiers were very similar with regard to which strategies were on the frontier. At the WTP threshold of €20,000 per QALY gained, the optimal strategy was biennial screening for ages 40-76 for all three utility sets (Table S6).
The analyses varying breast cancer treatment utilities showed small differences in QALYs gained per strategy (Figure 2B). For the triennial and quadrennial strategies, the efficiency frontiers largely overlapped (Table S7). However, for the biennial strategies the differences in QALYs gained were somewhat larger, and for annual strategies they were larger still. Again, the strategies on the efficiency frontier were similar across the utility sets used. Furthermore, the utility sets 'stratified by disease stage', 'stratified by age and chemotherapy' and 'stratified by age and chemo and/or endocrine therapy' led to an optimal strategy of biennial screening for ages 40-76, whereas the sets 'stratified by age and type of surgery' and 'stratified by age and endocrine therapy' led to biennial screening for ages 40-74 as the optimal strategy.
The sensitivity analyses on the screening and follow-up utility values showed that the variation in these values did not substantially to perform a sensitivity analyses. Therefore, no preferred set can be identified. However, since the number of QALYs gained, the strategies on the efficiency frontier and the optimal strategy were very similar for all utility sets, it can be concluded that the screening and follow-up utility values as found by de Haes et al give robust results. 37
The effect of differing quality of life parameters was most substantial for normative utility values because of the large number of simulated person-years experienced in normative health. In contrast, only a proportion of the simulated individuals developed breast cancer, and only for a limited period of time, resulting in fewer person-years with breast cancer than in normative health.
Therefore, the variations in normative utility values had the largest effect on the total number of QALYs gained. Given that the normative utility values from Clarijs et al were retrieved from a large population of women in the Netherlands and were stratified by both gender and age, these parameters can be considered best estimates. 33 Because the QALY estimates in the analyses using only gender specific utility values were quite similar to those using gender and age specific utility values, we conclude that gender specific utility values can be used in the absence of gender and age stratified normative utility values. Also of importance is the use of country-specific utility parameters, especially for normative utilities, which should preferably be calculated using the same measure and the same value set as the data on the disease or intervention of interest. 38,39 A smaller effect on the number of QALYs gained was seen when varying breast cancer treatment quality of life parameters. This was expected for the utility sets that were all based on the same dataset and only stratified differently by treatment. However, a bigger difference was expected for the 'Kregting et al' utility set, which used rather different utility parameters based on stage at diagnosis. The analyses showed that disutilities attributed to the health state 'breast cancer leading to death' had the biggest effect on the number of QALYs gained. Furthermore, sensitivity analyses of the screening and follow-up utility parameters showed that the results of the cost-effectiveness analyses are robust. Although the estimates of de Haes et al were made more than 30 years ago and are sometimes debated, they were shown to lead to robust QALY results on population level when varied in sensitivity analyses from half up to double the value estimates. 37 Moreover, all cost-effectiveness analyses with variations in utility sets resulted in a very similar list of screening strategies on the efficiency frontier and two quite similar optimal screening strategies: biennial screening for ages 40-74 or ages 40-76 (when using a WTP threshold of €20,000 per QALY gained). This showed that the benefit-harm balance of these strategies was robustly advantageous over the other screening strategies investigated.
Earle et al reported utilities for patients with breast cancer undergoing BCS, chemotherapy or endocrine therapy to be between 0.97 and 1.0, compared to 1.0 for a normative population. 15 In the current study, the utility values for these categories are much lower, ranging from 0.71 (age 75+, chemotherapy yes, during recovery) to 0.91 (age 45-54, chemotherapy yes, endocrine therapy no, at diagnosis).
When the gender and age specific normative utilities are taken into consideration, the differences become smaller but remain present. An explanation for these differences can be found in the timing of the valuation relative to the treatment, the quality of life instrument used and the population asked to value the health states. 16 A few studies have investigated the effects of varying quality of life parameters in cost-effectiveness analyses. A study on colorectal and oesophageal cancer screening strategies found that varying utility parameters could substantially affect the number of QALYs gained. 22 However, this did not seem to affect which strategies were on the efficiency frontier, which is comparable to the finding of the current study in breast cancer screening strategies. In contrast, a study by de Kok et al found different optimal strategies for cervical cancer screening when using different utility sets. 18 Our study even found different screening modalities to be optimal depending on the set of as the size of the disutilities. In addition, the patient treatment utilities in the current study were largely based on the same set of data, which caused them to be rather similar. Moreover, compared to the studies on colorectal, oesophageal and cervical cancer screening, the current study modelled many more screening strategies in the cost-effectiveness analyses. Despite the higher number of strategies, the strategies on the efficiency frontier and the optimal strategy based on the WTP threshold were more robust.
A major strength of the current study was that both normative and breast cancer utility parameters in the cost-effectiveness analyses were based on prospective, longitudinal real-world data collected using PROMs. Moreover, the use of the validated EQ-5D-5L in combination with the Dutch value set ensured that the utility values were of high quality and representative of the Dutch population. 29 The utility values for patients with breast cancer were calculated using prospective cohort data from the population who actually experienced the specific health states studied. Therefore, the utility values were more representative of the actual health states patients experience than in studies where doctors, nurses or the general population valued health states that they never experienced themselves. A limitation of the dataset was that all patients were treated in Erasmus MC, an academic hospital. Possibly, this could have influenced the representativeness of this patient population compared with all patients with breast cancer in the Netherlands. In addition, due to the inclusion criteria of the dataset, only patients who received surgery were included. Therefore, no quality of life data were available on patients with breast cancer who did not receive surgery, and assumptions had to be made on utility parameters for this group. Utility parameters for this group were also hard to find in the literature; therefore, the assumptions made are less reliable. Nevertheless, it is a strength of our study that this population was taken into consideration, because data from IKNL showed that there is a substantial group of patients with breast cancer who did not receive surgery, especially among those who did not participate in screening. Another limitation of the dataset was that, despite the high number of patients, some categories included only small numbers of patients or none at all after stratification by age and treatment type. This can be explained by the fact that certain treatments are very uncommon in certain age groups (eg, chemotherapy in patients over the age of 75). To provide utility parameters for the cost-effectiveness analyses, the values from the closest age group and a factor based on normative differences between the age groups were used.
In conclusion, our study provided new data-based utility values for patients with breast cancer, stratified by age and treatment options, which can be used in cost-effectiveness analyses. Furthermore, it supports the use of gender and age stratified normative utilities and of patient-based breast cancer quality of life parameters stratified by age and treatment or by disease stage. In addition, the number of QALYs gained was not sensitive to variations in screening and follow-up utilities, and the efficiency frontiers and optimal screening strategies were found to be very robust.

TABLE 1 Quality of life parameters applied to health states in the different cost-effectiveness analyses.
TABLE 2 Population characteristics at baseline.
a BCS, breast-conserving surgery.