Quality-of-life aspects of the overactive bladder and the effect of treatment with tolterodine
Article first published online: 25 DEC 2001
Volume 83, Issue 6, pages 583–590, April 1999
How to Cite
Kobelt, Kirchberger and Malone-Lee (1999), Quality-of-life aspects of the overactive bladder and the effect of treatment with tolterodine. BJU International, 83: 583–590. doi: 10.1046/j.1464-410x.1999.00004.x
- Issue published online: 25 DEC 2001
The overactive bladder (OAB) is a multisymptomatic problem that affects people of both sexes and all ages, although it is more prevalent in the elderly [ 1–5]. The OAB is defined as urgency and frequency with or without urge incontinence. Urinary incontinence and urgency/frequency have been shown to affect a patient’s physical, social and emotional well-being [ 6–12]; the efficacy of any treatment is generally measured as a decrease in the frequency of incontinence. While such clinical efficacy measures (often surrogate endpoints) are of great value and particularly important in clinical trials, they fail to address the patients’ concept of disease burden and the consequences of treatment from the patients’ perspective. Measuring these patient-derived values is therefore important, as patients are paying a larger part of treatment costs and health outcomes are increasingly evaluated from the patients’ view. The benefits of specific treatments and of the healthcare delivery system in general will be judged in terms of the extent to which changes in patients’ functioning or well-being meet their needs and expectations.
Subjective quality of life (QoL) can be objectively measured and several instruments have been developed, validated and used extensively. Generic instruments such as the MOS Short Form (SF) 36 [ 13] provide a profile of the patients’ QoL covering several health concepts, generally grouped in physical, mental and social domains. Instruments such as the EuroQol EQ-5D [ 14, 15] measure QoL as an index and provide a health-state classification system that allows their use in cost-utility analyses. The advantage of generic measures is that they are more comprehensive and allow comparison across populations and diseases. However, they are generally not sensitive enough to detect small changes caused by treatment. Specific measures are targeted more to issues relevant to the patient and the clinician, and can thus be used to determine whether a desired treatment effect has been achieved or not. Several instruments have been developed specifically for urinary incontinence, including the Incontinence Impact Questionnaire (IIQ) and the Urinary Distress Inventory (UDI) [ 16], the York Incontinence Perception Scale (YIPS) [ 17], the Incontinence Quality of Life (IQoL) [ 18], the Incontinence Quality of Life Index (IQoLI) [ 19] and the King’s Health Questionnaire (KHQ) [ 20]. The two types of instruments are complementary and generally, they are combined in QoL research.
The objective of this review is to discuss the impact on QoL of the OAB and analyse the improvements that can be achieved with an effective and well-tolerated drug such as tolterodine. We used the SF36 and the EuroQol to assess the impact of the OAB on patients’ well-being, and the IQoLI and the KHQ to investigate the effect of treatment with tolterodine. The IQoLI was tested for use in a clinical trial setting and the KHQ, so far only validated with female patients, was also validated for use in men.
The SF36, a generic instrument, was used both in cross-sectional surveys and in clinical trials. The instrument includes 36 questions with multiple answers that are grouped into eight subscales measuring eight health concepts or domains. Each subscale is scored individually between 0 (worst) and 100 (best). It was integrated in two large ‘willingness-to-pay’ (WTP) surveys in Sweden [ 21] and in the USA [ 22] of patients with urge and/or mixed incontinence. Scores were correlated with self-reported symptoms of frequency and incontinence, and to the amounts that patients were willing to pay for symptom relief, to estimate the impact of the syndrome on patients’ QoL. It was also used in a clinical study comparing tolterodine with oxybutynin and placebo over 12 weeks in North America [ 23], and in a 10-week naturalistic clinical trial comparing tolterodine with oxybutynin in patients over 50 years of age in the UK [ 24]. Scores obtained at baseline were used to show the effect of the disease, by comparing them with published normative values of an age- and sex-matched cohort of the normal population. Table 1 shows the baseline characteristics of the different cohorts.
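The 0 (worst) to 100 (best) scoring of a subscale can be illustrated with the linear rescaling commonly used for scales of this kind. This is a minimal sketch only: the function name and arguments are hypothetical, and the item recoding and item-to-subscale assignment of the SF36 itself are not reproduced here.

```python
def rescale_0_100(raw, lowest, highest):
    """Linearly map a raw subscale sum onto 0 (worst) to 100 (best).

    raw     -- summed item scores for one subscale (after any recoding)
    lowest  -- lowest possible raw sum for that subscale
    highest -- highest possible raw sum for that subscale
    """
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    if not lowest <= raw <= highest:
        raise ValueError("raw score outside the possible range")
    return (raw - lowest) / (highest - lowest) * 100.0


# Hypothetical subscale with a possible raw range of 10 to 30.
print(rescale_0_100(15, 10, 30))
```

Because every subscale is placed on the same 0–100 range, scores from different domains and cohorts can be compared directly, as in the comparisons with normative values described above.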
In all studies where the SF36 was used, scores for patients with an OAB in most domains were significantly lower than those of the normal population [ 21, 22, 25]. For comparison, scores are reproduced in Table 2. Figure 1 illustrates the scores of the Swedish patients with an OAB compared with normalized scores of the general population for each of the domains, controlled for age and sex, but not for comorbidity. Analysis of different age groups indicates that the impact of the condition is more severe in younger patients. Scores for patients younger than 70 years are significantly lower than those of age-matched normal subjects, while the difference in scores between patients and normal subjects over 70 is not significant. Table 3 illustrates the scores for different age groups obtained in the Swedish survey. The severity of symptoms of frequency and incontinence correlated negatively with the SF36 scores (P<0.001) [ 25].
The EuroQol EQ-5D was used to assess utility in the cross-sectional WTP survey in Sweden and in the clinical trial in the elderly in the UK. Utility measurement is the preferred way to integrate QoL into economic analysis, in the form of quality-adjusted life years. Utility refers to the preference that individuals or society have for a particular health outcome, generally measured on a scale between 0 (dead) and 1 (perfect health). These preferences are then used to value health states relative to one another. Utilities can be measured directly by methods such as the ‘standard gamble’ or the ‘time trade-off’ (TTO) [ 25, 26], or by using a questionnaire such as the EQ-5D. This instrument provides a measure of overall health-related QoL (utility) based on five descriptive questions with three levels of answers and a rating scale. The rating scale is a vertical, calibrated visual-analogue scale (VAS) with labelled anchors of ‘death’ (at 0) and ‘best imaginable health state’ (at 100). For the descriptive part, utility values between 0 (death) and 1 (full health) for the different combinations of possible answers have been established in the general population in the UK, using the TTO method, and are now more widely used than the rating scale [ 15]. In Sweden, 455 of 461 patients completed the questionnaire and all answers could be used. In the clinical trial in the UK, 325 patients completed the questionnaire adequately at baseline, while 53 patients either did not answer or gave incomplete answers and were omitted from the analysis. Excluded patients were not significantly different from those included. As with the SF36, EQ-5D scores were compared with an age- and sex-matched cohort of the normal population.
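To illustrate how a utility on the 0 to 1 scale enters an economic analysis, quality-adjusted life years can be sketched as utility-weighted time. The function name and the example values below are hypothetical, no discounting is applied, and this is not a calculation performed in the studies reviewed here.

```python
def quality_adjusted_life_years(periods):
    """Sum utility-weighted time over (utility, years) periods.

    Each utility is assumed to lie between 0 (dead) and 1 (perfect health),
    matching the scale described in the text. No discounting is applied.
    """
    total = 0.0
    for utility, years in periods:
        if not 0.0 <= utility <= 1.0:
            raise ValueError("utility must lie between 0 and 1")
        if years < 0:
            raise ValueError("duration must not be negative")
        total += utility * years
    return total


# Hypothetical example: two years at utility 0.5, then one year at 1.0.
print(quality_adjusted_life_years([(0.5, 2.0), (1.0, 1.0)]))
```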
EQ-5D scores showed the same pattern as the SF36 and scores obtained with both methods were significantly lower than those of the normal population. In Sweden, values were 0.68 and 0.65 for the TTO method and the ‘thermometer’, respectively, and correlated negatively with symptom severity (P<0.001) [ 27]. In the UK elderly population, the respective scores were 0.73 and 0.68.
However, differences in scores between the normal population and patients with OAB may not be fully explained by the impact of the syndrome. It could be expected that patients with OAB have a higher level of comorbidity than that found in the normal population. As the values for the normal population for both the SF36 and the EQ-5D were obtained from published sources, information on concomitant diseases was not available. We therefore analysed the impact of concomitant diseases on the utility scores obtained in the clinical trial in the UK, where information on comorbidity was available, with multiple regression analyses. The impact of comorbidity had, as expected, a significant negative effect on the overall level of the utility scores, but did not affect the correlation between symptoms of frequency and urge incontinence and the utility scores.
The effect of the treatment of OAB was analysed in the two clinical trials in North America and the UK using both generic and disease-specific instruments. In the North American trial comparing tolterodine with oxybutynin and placebo [ 23], QoL was measured with the SF36 and an exploratory instrument specifically developed for urge incontinence in women, the IQoLI [ 19], at baseline and after 12 weeks. In addition, SF36 scores for 161 patients in an open extension of the trial to 12 months were available. The IQoLI contains 25 questions with four possible responses coded 0–3 that are summed to an overall score of 0–75; high scores indicate a better QoL. Ten of the questions contain an additional option of ‘not applicable’, which is scored as ‘no problem’ (3). Analysis of the responses to this option indicated that it was not possible to distinguish whether a question was not applicable as a consequence of the disease, or because of other external factors. Hence, further work on the formulation of the questions for this option is required, and the results reported here are only indicative. In all, 277 patients were included in the clinical trial; 260 patients completed the SF36 ( Table 1) and 214 female patients completed the IQoLI at baseline. In this trial, patients who withdrew from treatment were not followed, and scores at the end of the study are therefore only available for patients who remained on tolterodine, oxybutynin or placebo. The clinical efficacy of tolterodine and oxybutynin was equal, and both were significantly more effective than placebo, despite a considerable placebo effect. However, significantly more patients withdrew from treatment in the oxybutynin group (31% compared with 13% for tolterodine and 14% for placebo) [ 23], which is likely to bias the results. The scores presented here only address the treatment effect when patients remained on their respective treatment.
SF36 scores did not change significantly from baseline in any of the groups after 12 weeks of treatment and differences between the groups were not significant. However, most domains showed a trend to a positive change for patients treated with tolterodine, but not for patients on oxybutynin or placebo (Fig. 2). In addition, changes in symptoms of frequency and incontinence in the tolterodine group correlated positively with changes in QoL, reaching significance in the mental domains (‘role, emotional’, ‘mental health’) (P<0.05). Correlations were not significant in the oxybutynin group and negative in the placebo group. However, after open extension of tolterodine treatment to 12 months, changes in scores were not significant in any of the domains.
When measured with the IQoLI in the North American trial, scores improved from 40.6 to 44.1 in the group treated with tolterodine, and from 43.3 to 46.9 in the group treated with oxybutynin. Scores for patients in the placebo group also changed significantly and there was no significant difference between the groups. This result did not change when the problematic ‘not applicable’ answers were scored with the opposite score (0 ‘severe problems’ instead of 3 ‘no problem’).
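The IQoLI scoring rule (25 responses coded 0–3, summed to 0–75) and the sensitivity check on the ‘not applicable’ answers can be sketched as follows. This is an illustration under assumptions: the function name and the use of `None` as the ‘not applicable’ marker are hypothetical, and the sketch does not check which ten questions offer that option, as they are not listed here.

```python
NOT_APPLICABLE = None  # hypothetical marker for a 'not applicable' answer


def iqoli_total(responses, na_score=3):
    """Sum 25 IQoLI responses (each coded 0-3) into a 0-75 total.

    'Not applicable' answers are given na_score: 3 ('no problem') in the
    main analysis, 0 ('severe problems') in the sensitivity analysis.
    """
    if len(responses) != 25:
        raise ValueError("the IQoLI has 25 questions")
    total = 0
    for answer in responses:
        if answer is NOT_APPLICABLE:
            total += na_score
        elif answer in (0, 1, 2, 3):
            total += answer
        else:
            raise ValueError("responses must be coded 0-3 or not applicable")
    return total


# Hypothetical respondent: 20 answers of 3 and 5 'not applicable' answers.
answers = [3] * 20 + [NOT_APPLICABLE] * 5
print(iqoli_total(answers))              # main scoring rule
print(iqoli_total(answers, na_score=0))  # sensitivity analysis
```

Scoring the same responses under both rules shows how far a total can move purely because of the ‘not applicable’ convention.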
In the UK study [ 24], comparing tolterodine with oxybutynin, QoL was measured with the SF36 and the KHQ [ 20] at baseline and at 10 weeks. The KHQ is a specific questionnaire for the rapid assessment of women with urinary incontinence that was designed, piloted and validated over 3 years in 1100 women referred to a tertiary referral urogynaecology unit in London. The questionnaire contains 21 questions in eight different QoL domains, a domain assessing urinary coping strategies and a separate scale for measuring the severity of urinary symptoms. Answers are scored with a four-point system and include an ‘inapplicable’ option for questions relating to personal (sexual) relationships. Weighted summary scores in each domain range from 0 to 100, with higher scores indicating greater impairment.
Validation of the KHQ in men
Although the KHQ was specifically developed for women, we asked all patients to complete the measure, to investigate its applicability in men; 125 men were available for the clinical analysis at the end of the study and 121 of these had completed the KHQ. Responses from 29 patients were incomplete, in 24 because the patients had not completed the domain ‘personal relationships’, as they considered it inapplicable. Only 92 patients were therefore included in the full analysis.
Ten weeks is considered too long an interval to allow an assessment of test-retest reliability and thus only the answers at study completion were used. Internal consistency refers to the degree of correlation between the items forming a subscale and is assessed using Cronbach’s alpha statistics; the alpha coefficient should be >0.70. Construct validity was tested by correlating KHQ subscales with the SF36, as it was expected that comparable subscales would show substantial correlations. This analysis was performed for six domains of the KHQ that were comparable with the SF36, using Spearman’s correlation coefficient. Finally, we performed a test of item discriminant validity using the Multitrait Analysis Program (MAP) [ 28]. The ‘multitrait/multi-item’ approach tests whether an item correlates more with the subscale it is assigned to than with other subscales, and provides further analysis of the psychometric properties of a questionnaire.
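For readers unfamiliar with the internal-consistency statistic, a minimal sketch of Cronbach’s alpha for one subscale is given below. It uses the standard formula based on item variances and the variance of the summed score; the function name and the example data are hypothetical and are not taken from the trial.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one subscale.

    items -- list of item columns; each column holds the scores of the
             same respondents, in the same order, on one item.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are needed")
    n = len(items[0])
    if n < 2 or any(len(column) != n for column in items):
        raise ValueError("each item needs the same respondents (two or more)")
    # Summed subscale score for each respondent.
    totals = [sum(column[i] for column in items) for i in range(n)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("summed scores have zero variance")
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical two-item subscale answered by three respondents.
alpha = cronbach_alpha([[1, 2, 3], [2, 1, 3]])
print(round(alpha, 3), alpha > 0.70)
```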
The estimate of internal consistency gave a Cronbach’s alpha coefficient of >0.70 in all domains, with five domains having a high validity, with coefficients of 0.85–0.88. Internal consistency was comparable with the values published for females ( Table 4). The analysis of criterion validity showed significant correlations between KHQ and SF36 subscales. A comparison with the correlations detected in the published validation study for female patients resulted in comparable results for three of the domains ( Table 4). In the MAP analysis, three subscales had a perfect fit (‘personal relationships’, ‘emotions’ and ‘sleep/energy’). The scaling success of the remaining domains was satisfactory, with rates of 82.5–91.7%. In total, the scaling success reached 91.4%, with no cases in which an item correlated significantly lower with its own subscale than with another scale, which would be considered a definite scaling error. Thus, it can be concluded that the application of the KHQ to men did not jeopardize the psychometric properties of the questionnaire and that the instrument can be used in all patients.
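The idea behind the scaling-success rate can be sketched in simplified form: each item is correlated with its own subscale (excluding the item itself) and with every other subscale, and a comparison counts as a success when the own-subscale correlation is the higher of the two. This is not the MAP program itself; the function names and data are hypothetical, the significance criterion used to flag definite scaling errors is omitted, and each subscale is assumed to have at least two items with non-constant scores.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def scaling_success(subscales):
    """Percentage of item-versus-other-subscale comparisons in which the
    item correlates more highly with its own subscale.

    subscales -- dict mapping subscale name to a list of item columns.
    """
    successes = comparisons = 0
    for own_name, items in subscales.items():
        for index, item in enumerate(items):
            # Own-subscale total corrected for overlap (item removed).
            rest = [col for j, col in enumerate(items) if j != index]
            own_total = [sum(values) for values in zip(*rest)]
            r_own = pearson(item, own_total)
            for other_name, other_items in subscales.items():
                if other_name == own_name:
                    continue
                other_total = [sum(values) for values in zip(*other_items)]
                comparisons += 1
                if r_own > pearson(item, other_total):
                    successes += 1
    return 100.0 * successes / comparisons


# Hypothetical data: two subscales, two items each, five respondents.
data = {
    "A": [[1, 2, 3, 4, 5], [2, 3, 4, 5, 7]],
    "B": [[5, 1, 4, 2, 3], [4, 1, 5, 2, 3]],
}
print(scaling_success(data))
```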
In the UK study, 378 patients over 50 years old were enrolled and the SF36 obtained at baseline from all, although only 339 completed all domains ( Table 1), and the KHQ obtained from 338 patients. Patients who withdrew from treatment during the trial were followed as far as possible and the KHQ was obtained from 294 of 308 patients on treatment at study completion, and from 44 of 70 patients that had withdrawn. We analysed results for the two treatment groups on an intent-to-treat basis, and for patients who completed treatment. There were too few patients who had withdrawn from treatment, particularly in the tolterodine group (15 compared with 29 in the oxybutynin group) to allow subgroup analysis. We therefore analysed results for those completing and not completing treatment in the combined groups. In addition, we analysed whether the effect of treatment on QoL differed in different age groups.
SF36 scores did not change significantly after 10 weeks. However, when measured with the KHQ, the QoL of the elderly patients improved significantly; six of the eight subscales showed a significant change for both treatment groups, and coping strategies were reduced significantly. The difference between the groups was not significant in any of the domains. Table 5 shows the change in scores for the two groups on an intent-to-treat basis. ‘General health perceptions’ and ‘personal relationships’ did not change significantly. However, 39% of patients did not complete the latter, as it did not apply to them. Table 5 also shows the change from baseline for different age groups. For this analysis, we combined the treatment groups to increase the sample size. Analyses of the age groups by treatment group showed no significant difference. The positive effect on QoL was maintained in all age groups and was similar to the results for the entire cohort. However, ‘personal relationships’ improved significantly for patients younger than 60 years, while ‘sleep’ was no longer significant in this group.
Analysis of scores at the end of the study for patients completing the 10-week treatment and those who withdrew from treatment showed significantly worse scores for those not completing in five of the eight domains (Table 5). Surprisingly, the mean scores of those not completing the study had deteriorated compared with their baseline value. This was the case for four domains on the KHQ and six domains in the SF36. The change was significant for ‘general health’. However, the potential effect on patients of having to withdraw from treatment needs to be further investigated.
Subjective QoL is an important outcome and the ICS has recommended that QoL measurements be included in all studies of urinary incontinence as a complement to clinical measures [ 29]. Measuring QoL in patients with OAB, where they have different symptoms in different combinations, is not easy and no instrument covers the full spectrum. In addition, if compared with no treatment (placebo) over a short time, there is generally a very strong placebo effect, making objective analysis difficult. Generic QoL instruments are easy to use, particularly if they are as short as the EuroQol, but they are not very sensitive to differences in symptoms or to changes after treatment in patients who usually have several concomitant diseases, as with urinary incontinence. Nevertheless, there was a surprisingly strong correlation between these general scores and the severity of the symptoms of OAB in several of the populations studied.
Patients with OAB have QoL scores that are significantly worse than those seen in the general population. However, most of the scores were obtained in cross-sectional surveys, and the comparison with the normal population is based on published normal values, rather than on an actual matched cohort. That said, a sample with none of the symptoms of OAB that is also matched for comorbidity might be difficult to find. Instead, the comparison might have to be limited to patients who are either continent or not. A Swedish study found substantial differences in QoL, measured with the Nottingham Health Profile, of incontinent elderly women compared with matched normal subjects, controlled for other illnesses and physical, marital or residential characteristics [ 30]. Patients were more emotionally disturbed and socially isolated than controls, and more likely to experience sleep disturbance, while they were no different in the other domains (energy, pain, mobility). It is therefore likely that the present results with the SF36 overstate the impact of OAB alone and further work is required to assess the difference from the normal population accounted for by OAB alone.
However, the effect of the severity of symptoms of OAB on QoL scores does not seem to be affected by comorbidity, as indicated by the multiple regression analysis of EQ-5D scores. Comorbidity had, as expected, a significant negative effect on the overall level of scores, but it did not affect the negative correlation between symptom severity and utility scores. Thus, although the difference from the normal population may be overstated, the negative effect of increasing severity of symptoms remains unaffected, and in the absence of treatments that cure the condition, a reduction of symptoms can be expected to lead to an improvement in QoL.
As a consequence of the need to measure differences after treatment, several disease-specific instruments have been developed during the past few years. Judging from the validation studies of these measures, they appear to address well the problem of urinary incontinence and can measure the impact of treatment. However, until these measures have been widely used in different settings with different treatments and by different investigators, we will not be able to judge their value fully.
The KHQ, which had been validated in a very large sample of female patients in the UK, was able to measure the impact of treatment. The two treatments compared, tolterodine and oxybutynin, have a similar clinical efficacy and therefore QoL would not be expected to differ between the groups when patients were on treatment. The main difference between the groups was in the number of adverse effects, predominantly dry mouth. However, patients suffering from severe dry mouth withdrew from treatment and it was therefore not possible to assess the effect on QoL of this side-effect. The protocol of the trial demanded that patients who withdrew from treatment were followed for the entire 10 weeks, but it is inherently difficult to follow patients who withdraw. Although 10-week measurements were available for 62% of those who discontinued, there were too few available questionnaires in the two treatment groups (15 and 29 patients, respectively) to allow an analysis of the difference or to affect the scores of the entire group. However, patients who withdrew from treatment had significantly lower scores at 10 weeks than had those on treatment, although there was no significant difference between those completing or not completing at baseline. In addition, scores of the latter at 10 weeks were below their baseline values on both the generic and the specific instruments. It remains speculative as to why patients would feel worse after having tried treatment; it is possible that the placebo effect linked to participation in a clinical trial has a positive effect on QoL scores, and not completing the trial may convey a feeling of failure and decrease the scores. However, the sample was too small to analyse this further.
Although the KHQ had been developed for women, we also asked men in the trial to complete it, to analyse its applicability to all patients. The proportion of men in the trial was higher than expected and it was therefore possible to test the psychometric properties of the instrument in men. It appears that the instrument performs as well in men as in women and can therefore be used in all patients in a trial.
The OAB affects patients’ activities and well-being severely, and significant improvements are seen with treatment, even after a short time. A treatment that is effective and well tolerated, and will therefore increase compliance, can be expected to have a profound overall effect on this patient population where under-treatment is the rule rather than the exception. Tolterodine has been studied in many clinical trials in different patient populations and settings, and has the potential to increase compliance with treatment, and therefore patients’ overall well-being.
- 1. National Institutes of Health Consensus Development Conference on Urinary Incontinence in Adults. J Am Geriatr Soc 1990; 38: 265–72.
- 19. Measuring quality of life in female urinary urge incontinence: development and psychometric properties of the IQoLI. J Outcomes Res 1997; 1: 1–8.
- 22. Urge incontinence: quality of life and patients’ valuation of symptom reduction. PharmacoEconomics 1998; 14: 153–9.
- 24. UK/Eire Tolterodine versus Oxybutynin Study Group. The comparative tolerability and efficacy of tolterodine 2 mg bid versus oxybutynin 2.5/5 mg bid in the treatment of the overactive bladder. Proceedings of the ICS 1998; 220: 163–4.
- 28. User’s guide for the Multitrait Analysis Program (MAP). Rand Corporation Report N-2786-RC. Santa Monica: Rand Corporation, 1988.