The validation of the Dutch OAB‐q SF: An overactive bladder symptom bother and health‐related quality of life short‐form questionnaire

The overactive bladder quality of life short‐form questionnaire (OAB‐q SF) evaluates both symptom bother and health‐related quality of life in patients with OAB, a highly prevalent disease. The objective of this study was to translate and validate a Dutch version of the OAB‐q SF.


| INTRODUCTION
Overactive bladder (OAB) is defined as urgency, with or without urgency urinary incontinence, usually associated with frequency and nocturia. The reported prevalence of this condition is between 13% and 16% worldwide and is expected to increase as the population ages. OAB has been shown to have a great negative impact on an individual's health-related quality of life (HRQOL), and it places a high burden on society. 1-3 OAB is a symptom-based condition, and urodynamic investigations have low positive and negative predictive values. 4,5 The best available method to diagnose the disease, quantify its severity, and evaluate treatment effects is therefore the use of patient-reported outcomes (PROs), usually in the form of a questionnaire.
Since the introduction of PROs, many different questionnaires have been developed. To compare the burden of OAB between patients and to define guidelines for treatment, consensus is needed on which questionnaire to use. The guidelines of the European Association of Urology (EAU) and the International Continence Society (ICS) do not recommend a specific questionnaire for OAB, but both professional organizations mention that it is important to use questionnaires validated in the language of use. 6,7 The International Consortium for Health Outcomes Measurement (ICHOM) aims to improve value-based healthcare by defining global standard sets of outcome measures for different conditions. A core set of outcome measures for OAB, which includes the OAB-q SF questionnaire, was developed in 2017. 8
The OAB-q Short Form (SF) is a questionnaire used worldwide to assess health-related quality of life in patients with OAB. It is the shorter version of the 33-item OAB-q questionnaire and includes 19 items: a six-item symptom bother scale and a 13-item HRQOL scale. 9,10 Before the ICHOM set of outcome measures for OAB can be implemented in the Netherlands, the OAB-q SF questionnaire needs to be translated into and validated in Dutch. Therefore, the aim of this study was to translate and validate the OAB-q SF in the Dutch language.

| Study design
This is a single-center, prospective cohort validation study, for which approval was obtained by the Ethics Review Board.

| Patient group
All patients seen at the Urology outpatient clinic between April 2018 and February 2019 and diagnosed with OAB were eligible for screening. OAB was defined as urinary urgency, with or without urinary incontinence. Inclusion criteria were age 18 years and above and being fluent and literate in the Dutch language. Exclusion criteria consisted of urinary diversions, a history of or active malignant tumors of the urinary tract, hematuria, bladder stones, neurogenic bladder, dementia, mental retardation, and symptomatic urinary tract infection. The treating physician explained the study to patients eligible for inclusion and invited them to participate. After signing informed consent, patients were asked to complete the questionnaires during the inclusion visit (test) and 2 weeks later at home (retest). Characteristics of the included patients were extracted from the medical records.

| Reference group
Patients who visited the Allergology outpatient clinic between September 2019 and December 2019 were invited to form the reference group. Inclusion criteria were age 18 years and above and being fluent and literate in the Dutch language. Exclusion criteria consisted of a urological medical history or current bladder problems, dementia, and mental retardation. We considered these patients a proper control group because allergy pathology has no relationship with bladder problems, and those with bladder problems were not eligible for inclusion in the reference group. Patients who met the inclusion criteria were informed by their treating physician; those willing to participate signed informed consent and completed one set of questionnaires.
• The OAB-q SF is a 19-item, self-administered, disease-specific instrument derived from the OAB-q. 9,10 It contains two subscales: symptom bother (six items) and health-related quality of life (HRQOL, 13 items). Each item is rated on a six-point Likert scale, ranging from 1 (not at all) to 6 (a very great deal) for the symptom bother scale and from 1 (none of the time) to 6 (all of the time) for the HRQOL scale. The two subscales are summed separately and, following the scoring manual, 9 transformed into scores ranging from 0 to 100. A higher score on the symptom bother scale indicates greater symptom severity, and a higher score on the HRQOL scale indicates a better HRQOL, so the two scores are inversely related. The two scores are always reported separately, since the OAB-q SF has no total score.
• The EQ-5D-5L questionnaire (European Quality of life 5-Dimension 5-Level questionnaire), developed by the EuroQol group, is one of the most used PRO instruments for the measurement of HRQOL. 11 It consists of five questions addressing mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. The answers are transformed to an index value ranging from 0 (inability) to 1 (no problems) by using the accessory index value calculator. In addition, the health state is self-reported on a visual analog scale (VAS) ranging from 0 "the worst health you can imagine" to 100 "the best health you can imagine."
• The UDI-6 is a six-item symptom inventory, specific to symptoms associated with lower urinary tract dysfunction. It combines information on irritative, stress, and obstructive/discomfort symptoms of the lower urinary tract. 12 This questionnaire has been translated and validated in Dutch, and the mean score of the six items is converted to a 0 to 100 scale following the scoring manual. 13
• The ICIQ-OAB questionnaire assesses frequency, nocturia, urgency, and incontinence in four questions. The impact of each of these four problems on quality of life is self-reported on a bother scale from 0 to 10. According to the design of the questionnaire, the results of the four questions are summed to create a score, the ICIQ-OAB Q (questions). The design of the questionnaire does not indicate how to calculate a total score for the bother scales; in the present study, the four bother scales were summed to create the ICIQ-OAB BS (bother scales), a value ranging from 0 to 40 that indicates the HRQOL.
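The rescaling of a summed subscale to a 0 to 100 range can be illustrated with a short sketch. It assumes a plain linear (min-max) transformation and items coded 1 to 6; the function name, the `reverse` flag, and the default coding are illustrative assumptions, and the scoring manual 9 remains the authoritative source for the actual procedure, including the handling of missing items.

```python
def transform_subscale(item_scores, item_min=1, item_max=6, reverse=False):
    """Rescale a summed subscale to 0-100 (assumed linear min-max transformation).

    reverse=False: higher raw sum -> higher score (e.g. symptom bother).
    reverse=True:  higher raw sum -> lower score (e.g. HRQOL, where a high
                   transformed score should mean a better quality of life).
    """
    n = len(item_scores)
    if n == 0:
        raise ValueError("at least one item score is required")
    if any(not item_min <= s <= item_max for s in item_scores):
        raise ValueError("item score outside the allowed range")
    lowest = n * item_min    # lowest possible raw sum
    highest = n * item_max   # highest possible raw sum
    raw = sum(item_scores)
    if reverse:
        return (highest - raw) / (highest - lowest) * 100
    return (raw - lowest) / (highest - lowest) * 100


# Hypothetical respondent: six symptom bother items, thirteen HRQOL items
bother = transform_subscale([1, 2, 3, 4, 5, 6])
hrqol = transform_subscale([1] * 13, reverse=True)
```

With this convention, a respondent who answers "none of the time" to every HRQOL item receives the best possible HRQOL score, which matches the inverse relation between the two subscales described above.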

| CROSS-CULTURAL ADAPTATION
The cross-cultural adaptation of the original English OAB-q SF into the Dutch language was done according to the standardized guidelines for linguistic validation. 14 The forward translation of the English OAB-q SF into Dutch was performed separately by three professional native Dutch-speaking translators. During a consensus meeting, discrepancies between the three translations were discussed with the translators, two urologists (BB and JS), and the primary investigator (IG). The final version (see the Supporting Information material) was back-translated by a native English-speaking translator. To confirm the content validity of the Dutch version, the questionnaire was evaluated face-to-face with five patients visiting the urology outpatient clinic.

| Content validity
The content validity was assessed during the linguistic validation by patients and researchers (IG, BB, and JS). Researchers subjectively evaluated the correspondence between the clinical symptoms of OAB and the questions. Patients reported on the formulation and clarity of the questions during the face-to-face evaluation.

| Internal consistency
By assessing the correlation between different items within the questionnaire, the internal consistency is examined, demonstrating whether the items measure the same underlying construct. The Cronbach's α was calculated for the two subscales of the OAB-q SF. A Cronbach's α between 0.70 and 0.95 was considered to reflect adequate internal consistency. 15
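As an illustration of the statistic, the sketch below computes Cronbach's α from a table of item scores using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the summed score). The study itself used SPSS; the function name and the example data are hypothetical, and the sketch assumes complete cases only.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases.

    rows: one list of item scores per respondent, all of equal length.
    """
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(rows[0])
    items = list(zip(*rows))                            # one tuple per item
    item_var = sum(variance(col) for col in items)      # sum of item variances
    total_var = variance([sum(r) for r in rows])        # variance of summed score
    return k / (k - 1) * (1 - item_var / total_var)


# Made-up answers of three respondents to a two-item scale
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
adequate = 0.70 <= alpha <= 0.95   # criterion used in this study
```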

| Reproducibility
Reproducibility is the degree to which repeated measurements in the test-retest period provide similar answers. When testing reproducibility, a distinction is made between reliability and agreement. 15,16 Reliability is the degree to which patients can be differentiated from each other despite measurement error. This was expressed by the intraclass correlation coefficient (ICC) for agreement; values over 0.70 were considered acceptable. Agreement, in turn, reflects the measurement error, that is, how close the scores obtained on separate occasions are to each other. The limits of agreement (LOA) were expressed as the mean change in scores between the repeated measurements ± 1.96 × the standard deviation of the changes. 16,17
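Both quantities can be sketched for paired test-retest scores. The ICC variant below is the single-measure, two-way, absolute-agreement form; the text does not state which exact ICC model was requested in SPSS, so this choice is an assumption, as are the function names and the example scores. The LOA follow the definition given above: mean change ± 1.96 × SD of the changes.

```python
from statistics import mean, stdev


def icc_agreement(test, retest):
    """Single-measure, two-way, absolute-agreement ICC (assumed variant)."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equally long lists with at least two patients")
    n, k = len(test), 2
    rows = list(zip(test, retest))
    grand = mean(list(test) + list(retest))
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)          # between patients
    ss_cols = n * sum((mean(c) - grand) ** 2 for c in (test, retest))  # between occasions
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def limits_of_agreement(test, retest):
    """Return (mean change, lower limit, upper limit) of the retest - test changes."""
    changes = [b - a for a, b in zip(test, retest)]
    centre = mean(changes)
    half_width = 1.96 * stdev(changes)
    return centre, centre - half_width, centre + half_width


# Made-up subscale scores of four patients
test_scores = [10, 20, 30, 40]
retest_scores = [20, 30, 40, 50]     # every patient scores 10 points higher
icc = icc_agreement(test_scores, retest_scores)
```

In the made-up example every patient shifts by the same amount, so the ranking of patients is perfectly preserved, yet the agreement ICC stays below 1 because the systematic shift counts as disagreement.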

| Criterion validity
The criterion validity, that is, the extent to which the OAB-q SF questionnaire scores relate to a gold standard, was determined with the Pearson correlation coefficient (range, −1 to 1) when the association was linear and with the Spearman correlation coefficient when it was not. Because a gold standard does not exist for OAB, the UDI-6 and the ICIQ-OAB (Q and BS) served as reference instruments instead.
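The difference between the two coefficients can be shown with a small sketch: the Spearman coefficient is the Pearson coefficient applied to ranks (tied values share the average rank), so it captures monotonic associations that are not linear. The function names and data are illustrative only and are not taken from the study.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Rank values from 1 upward; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up monotonic but non-linear relation
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
r_linear = pearson(x, y)     # below 1 because the relation is curved
r_rank = spearman(x, y)      # 1 because the ordering is identical
```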

| Construct validity
Predefined hypotheses about the relation of the OAB-q SF to other instruments were tested. Construct validity was considered adequate when at least 75% of the results were in accordance with the predefined hypotheses. 15 The following hypotheses were formulated:
1. The reference group will have lower OAB-q SF symptom bother scores and higher OAB-q SF HRQOL scores than the patient group.
2. Patients with a higher UDI-6 score will have a higher OAB-q SF symptom bother score.
3. Patients with a higher ICIQ-OAB Q (questions) score will have a higher OAB-q SF symptom bother score.
4. Patients with a higher ICIQ-OAB BS (bother scale) score will have a lower OAB-q SF HRQOL score.
5. Patients with a lower EQ-5D-5L index value and patients with a lower EQ-5D-5L VAS will have a lower score on the OAB-q SF HRQOL.

| Floor and ceiling effects
Floor and ceiling effects were considered present if more than 15% of the respondents achieved the lowest or highest possible score, respectively. 15 The floor and ceiling effects were calculated for the symptom bother and HRQOL scores at baseline in the patient group and in the reference group.
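The 15% criterion can be checked with a few lines of code. The sketch below assumes transformed subscale scores on a 0 to 100 scale; the function name, the returned keys, and the example scores are hypothetical.

```python
def floor_ceiling(scores, lowest=0.0, highest=100.0, threshold=15.0):
    """Percentage of respondents at the lowest and highest possible score."""
    if not scores:
        raise ValueError("no scores given")
    n = len(scores)
    floor_pct = 100 * sum(1 for s in scores if s == lowest) / n
    ceiling_pct = 100 * sum(1 for s in scores if s == highest) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold,      # more than 15% at the minimum
        "ceiling_effect": ceiling_pct > threshold,  # more than 15% at the maximum
    }


# Made-up scores of 20 respondents: four at the minimum, one at the maximum
example = [0.0] * 4 + [100.0] + [10.0, 20.0, 30.0, 40.0, 50.0] * 3
result = floor_ceiling(example)
```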

| STATISTICAL METHOD
A sample size of at least 50 participants was considered adequate for validation of questionnaires; 15 thus, we aimed to include a total of 100 participants, 50 in the patient group and 50 in the reference group. Continuous data are presented as mean and standard deviation (SD). The Student t test and the χ² test were used to evaluate differences between the patient and reference groups for continuous and categorical variables, respectively. Statistical analyses were performed using SPSS version 24.0 (IBM Corp, Armonk, NY). Statistical significance was defined as P < .05.

| RESULTS
In total, 103 participants were included in the study. In the patient group, 56 patients signed informed consent, of whom 52 completed the questionnaires at both time points. Four patients did not return the second questionnaire and were therefore excluded from the analyses. The reference group consisted of 51 participants who completed the questionnaires at one time point. Table 1 displays the patient characteristics and the baseline scores of the four questionnaires.

| Content validity
Content validity was confirmed during the face-to-face evaluation of the questionnaire. Question 8 of the OAB-q SF HRQOL subscale was discussed, but did not lead to changes in the questionnaire. Furthermore, the face-to-face evaluation demonstrated that patients found the questionnaire understandable, easy to complete and clear.

| Internal consistency
The internal consistency of the questionnaire was good for both subscales, given that a Cronbach's α between 0.70 and 0.95 reflects adequate internal consistency. In the patient group, the Cronbach's α values for the OAB-q SF symptom bother subscale were 0.84 and 0.87 for test and retest, respectively. For the OAB-q SF HRQOL subscale, the Cronbach's α values were 0.88 and 0.91 for test and retest, respectively.

| Reproducibility
In the patient group, the second questionnaire was returned after a mean of 15.8 days (SD ± 11). An adequate reliability was confirmed with ICCs higher than 0.70 for the two subscales of the OAB-q SF. Table 2 lists the ICCs for agreement and LOA ranges for the two subscales of the OAB-q SF.

| Criterion validity
Using the Pearson correlation coefficient, a moderate to very strong correlation was detected between the OAB-q SF symptom bother subscale and both the UDI-6 and the ICIQ-OAB Q. The criterion validity of the OAB-q SF HRQOL subscale was evaluated by calculating its correlation with the ICIQ-OAB BS and with the EQ-5D-5L index values and VAS. For the EQ-5D-5L index values, the Spearman correlation coefficient was used, since no linear relationship was found between the OAB-q SF HRQOL and the EQ-5D-5L index value. These correlations ranged from weak to strong (see Table 3 for ρ and P values).

| Construct validity
All predefined hypotheses were confirmed:
1. The reference group had lower OAB-q SF symptom bother scores and higher OAB-q SF HRQOL scores than the patient group (Table 1).
2. Patients with a higher UDI-6 score had a higher OAB-q SF symptom bother score (Table 3).
3. Patients with a higher ICIQ-OAB Q (questions) score had a higher OAB-q SF symptom bother score.
4. Patients with a higher ICIQ-OAB BS (bother scale) score had a lower OAB-q SF HRQOL score (Table 3).
5. Patients with a lower EQ-5D-5L index value and a lower EQ-5D-5L VAS had a lower score on the OAB-q SF HRQOL (Table 3).

| Floor and ceiling effects
In the patient group, no floor or ceiling effects were seen for the two subscales (Table 4). In the reference group, a floor effect was seen for the symptom bother subscale: 17.6% of participants had the lowest possible score of 0. Moreover, a ceiling effect was seen for the HRQOL subscale: 29.4% of participants had the highest possible score.

| DISCUSSION
The primary aim of this study was to translate and validate the OAB-q SF in the Dutch language. The results showed that the Dutch version is valid, reliable, and internally consistent. This enables the use of the OAB-q SF in daily practice in the Netherlands as a valid tool to measure both symptom bother and health-related quality of life in patients with OAB quickly and easily. The content validity of the questionnaire was confirmed during the face-to-face evaluation. One patient commented on question 8 of the HRQOL subscale: "During the past 4 weeks, how often have your bladder symptoms caused you to have problems with your partner or spouse?" The issue was that the response option "not applicable" was lacking for those who had no partner. Because adding this response option would complicate the scoring, we discussed this problem with the designers of the original questionnaire. 9 In the cohort of Coyne et al, 9 patients either left the question blank, in which case it was recorded as missing, or answered "None of the time," given that when the question is not applicable, the answer really is none of the time. Therefore, "not applicable" was not added as a response option in the Dutch version, and no changes were made as a result of this discussion. 9 Moreover, according to the scoring manual of the OAB-q SF, the score can be adjusted for up to 50% missing items, still yielding a score ranging from 0 to 100. The significantly different scores in the patient group (higher symptom bother and lower HRQOL) compared with the reference group indicate good discriminative ability and possible diagnostic value of the OAB-q SF. Comparable to the Cronbach's α values of the original OAB-q SF (0.82 and 0.91) and the Spanish validation (0.81 and 0.92), 9,18 the Cronbach's α values of the Dutch OAB-q SF (0.83 to 0.89) demonstrated good internal consistency.
Using the change in scores between test and retest, the agreement and the limits of agreement were calculated, demonstrating adequate reliability and reproducibility. These results are in accordance with the original OAB-q SF study 9 and the Spanish validation study. 18 Concerning the criterion validity, the present study correlated the OAB-q SF with the UDI-6, the ICIQ-OAB, and the EQ-5D-5L because of the absence of a gold standard. As expected, the symptom bother subscale showed a strong correlation with the UDI-6 and the ICIQ-OAB questions for both test and retest. Moreover, the OAB-q SF HRQOL subscale showed a strong correlation with the ICIQ-OAB bother scales, but its correlations with the EQ-5D-5L index value and VAS were moderate. The ICIQ-OAB bother scales focus on OAB symptoms, whereas the EQ-5D-5L measures health more generally, which might explain why these correlations were moderate rather than strong. The Spanish validation study also used the EQ-5D and its VAS and showed comparable, moderate correlations. 18 All predefined hypotheses in the present study were confirmed, demonstrating that the patient and reference groups are well distinguishable and that the construct validity is good.
In the patient group, no floor or ceiling effects were detected, which implies that although many patients had severe OAB, the questionnaire is still discriminative enough to detect worsening or improvement of symptom bother or HRQOL. In the reference group, as expected, a floor effect was found in the symptom bother scale (17.6%), indicating that these participants had no bother due to bladder problems. Moreover, a ceiling effect was seen in the HRQOL scale (29.4%), indicating that in the reference group, bladder problems were absent or not severe enough to decrease HRQOL.
The strength of the current study is the use of standardized measurement properties as described by Terwee et al 15 to evaluate the reliability and validity of the OAB-q SF. The current study did not determine responsiveness and interpretability because of the short follow-up and a lack of therapy changes over time in the study group. This is a limitation of the study; however, previous literature on the English OAB-q SF demonstrates good responsiveness and interpretability. 9 There was a difference in mean age between the patient and the reference group. The reference group was used for only one of the five construct validity hypotheses. All other measurement properties were calculated without the reference group and are therefore not influenced by this age difference. Another limitation of the study is the absence of a gold standard to assess the criterion validity. On the other hand, the absence of a gold standard in this highly prevalent disease demonstrates the need for a good PRO in OAB. The choice to include the OAB-q SF in the ICHOM OAB set suggests that this questionnaire might be a valid PRO for OAB symptoms.

| CONCLUSION
In conclusion, this Dutch version of the OAB-q SF showed good validity and reliability according to well-established guidelines on measurement properties. The OAB-q SF is a suitable instrument for assessing both symptom bother and HRQOL in patients suffering from OAB. We recommend the use of this measurement tool in both research and clinical practice.