Dyspepsia is a common condition that consumes considerable resources in both investigation and treatment, and there is still a great deal of uncertainty regarding its management.1, 2 As a result, a number of ‘cost-effectiveness’ randomized trials of dyspepsia management strategies have been conducted. One difficulty for researchers has been choosing an appropriate outcome measure, as definitions of dyspepsia have changed over the years and cost-effectiveness studies require that the ‘effect measure’ is not contaminated by ‘resource use questions’ such as visits to a doctor. Multidimensional scales that also assess quality of life are particularly problematic, as there are better validated quality of life measures that generalize across disease areas (e.g. Health Utility Index3 and EQ-5D4). As there is no ‘absolute’ definition of dyspeptic symptoms, we rely on questionnaires that have established psychometric properties.5 Dyspepsia symptoms can be assessed by measuring either frequency or severity. The frequency of symptoms has been found to correlate more closely with a clinical diagnosis of dyspepsia than severity, indicating that frequency may be more valid for pragmatic studies.6 In gastro-oesophageal reflux disease (GERD), severity of symptoms correlates more closely with oesophagitis cure than frequency, indicating that severity may be more responsive to change.7 Measuring both frequency and severity of dyspepsia symptoms may improve both the validity and the responsiveness to change of an instrument compared with measuring either alone.6–9 A final important factor for cost-effectiveness trials is that the instrument is suitable for self-completion by the subject, in terms of length and ease of comprehension.
For clinical trials based in a primary care setting, the outcome measure should have been validated in a primary care population, where the aetiology, prevalence and severity of patients’ symptoms may differ from those in secondary or tertiary care populations.10
In addition, outcome measures should be able to distinguish between patients suffering from predominantly ulcer-like symptoms (epigastric pain) and those with predominantly reflux symptoms (heartburn and regurgitation), although some evidence suggests that predominant ulcer-like or reflux symptoms do not reliably predict an endoscopic diagnosis of ulcer or oesophagitis, respectively.11
In a recent review of symptom-based outcome measures for dyspepsia and GERD trials, 37 studies were identified describing 26 questionnaire outcome measures.5 Twelve assessed symptoms only, and 14 were multidimensional. Of the unidimensional questionnaires, only two assessed both frequency and severity of dyspepsia and had proven reliability, validity and responsiveness. The Reflux Disease Diagnostic Questionnaire (RDQ) is an excellent measure for GERD, but is not validated to assess dyspepsia.11 The Leeds Dyspepsia Questionnaire (LDQ)10 was the only fully validated unidimensional instrument to assess both frequency and severity of dyspepsia symptoms. Although the LDQ is a useful unidimensional outcome measure for dyspepsia, it has three main disadvantages: it is researcher administered (not self-completed), it is long (nine pages) and it has a long reference time frame (6 months). The aim of this study was to validate a shortened and revised version of the LDQ, the Short-Form LDQ (SF-LDQ), as a suitable measure for dyspepsia trials.
The SF-LDQ proved to be a sensitive and specific measure, acceptable to patients and suitable for high rates of self-completion. The SF-LDQ was responsive to change and able to differentiate between populations with differing prevalence, demonstrating discriminant validity and providing additional evidence for the construct validity of this instrument. Other dyspepsia questionnaires have also been tested for discriminant validity.19–21
The summed frequency scoring system demonstrated the greatest concurrent validity when analysed using the area under ROC curves and logistic regression. However, the difference in concurrent validity between this scoring system and the summed total score was not statistically significant. The summed total score has a greater range of values (0–32) than the summed frequency score (0–16), which gives greater precision. Rates of non-response to questions about symptom frequency were very low (0.5–1%), indicating that these questions were acceptable. Rates of non-response to questions about symptom severity were higher (5–6%), indicating that these items were less interpretable or acceptable to a minority of respondents.
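The two scoring systems compared above can be illustrated with a minimal sketch. It assumes four symptom items, each with one frequency and one severity response coded 0 to 4; this coding is inferred from the stated score ranges (0–16 and 0–32) and is not taken from the questionnaire itself. The function names are hypothetical.

```python
# Illustrative scoring sketch. ASSUMPTION: four symptom items, each with a
# frequency response and a severity response coded 0-4, which reproduces the
# score ranges quoted in the text (0-16 and 0-32).
N_ITEMS = 4
MAX_ITEM_SCORE = 4


def _validate(responses):
    """Reject incomplete or out-of-range response lists."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    for r in responses:
        if not isinstance(r, int) or not 0 <= r <= MAX_ITEM_SCORE:
            raise ValueError(f"response {r!r} is outside 0-{MAX_ITEM_SCORE}")


def summed_frequency_score(frequency):
    """Sum of the frequency responses only (range 0-16)."""
    _validate(frequency)
    return sum(frequency)


def summed_total_score(frequency, severity):
    """Sum of frequency and severity responses (range 0-32)."""
    _validate(frequency)
    _validate(severity)
    return sum(frequency) + sum(severity)
```

Under these assumptions the total score has 33 possible values against 17 for the frequency score, which is the sense in which it offers finer gradation.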
The LDQ has previously demonstrated a sensitivity of 80% (95% CI: 65–91%) and a specificity of 79% (95% CI: 66–89%) in a primary care population.10 These values were marginally higher than the SF-LDQ's sensitivity of 77% (95% CI: 68–85%) and specificity of 73% (95% CI: 68–78%) using the summed total score. However, this difference is not statistically significant, and even if the SF-LDQ were slightly less valid than the LDQ, this would be offset by the increased acceptability, feasibility and reliability of the shorter self-completed measure.17 The SF-LDQ had a high level of internal consistency when tested by Cronbach's alpha coefficient and the item-total correlation method, indicating that all of the questions in the questionnaire scale were measuring the same underlying construct, producing a high level of reliability. The SF-LDQ had a higher score for Cronbach's alpha coefficient (0.90) than the LDQ (0.69),10 suggesting that the shorter form was more accurately measuring a single construct.
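For readers unfamiliar with the internal consistency statistic, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the summed score), where k is the number of items. The function name and data layout are hypothetical, and this is not the analysis code used in the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances throughout. Illustrative only.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must answer the same number of items")

    # Variance of each item across respondents.
    item_variances = [variance([r[i] for r in rows]) for i in range(k)]
    # Variance of each respondent's summed score.
    total_variance = variance([sum(r) for r in rows])
    if total_variance == 0:
        raise ValueError("summed scores have zero variance")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Alpha approaches 1 when items rise and fall together across respondents and falls towards 0 when items vary independently of one another.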
Assessment of concurrent validity involves comparing the questionnaire against a ‘gold standard’. As there is no ‘gold standard’ for diagnosis of dyspepsia,6, 22, 23 the general practitioner's (GP's) diagnosis was chosen as a quasi-gold standard. Validity has been established by comparison with a clinician's diagnosis in previous studies,10, 19, 24–27 whilst other studies of dyspepsia outcome measures have used different ‘gold standard’ comparisons to demonstrate concurrent validity, such as generic quality of life scores,28–33 patient self-assessment using diaries34 and dyspepsia adverse events.35 An alternative approach would have been to compare the SF-LDQ with the LDQ as the gold standard.17 However, the correlation between the two questionnaires would have been artificially inflated by the presence of four identical questions. No attempt was made to standardize GPs’ diagnosis through discussion of the 1988 Working Party definition of dyspepsia with them.18 Standardizing the GPs’ diagnosis in this way would have artificially increased the concurrent validity of the SF-LDQ, because the questionnaire is based upon the Working Party definitions.
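The comparison of questionnaire scores against a reference diagnosis can be sketched as follows. This is a minimal illustration, assuming a score at or above a cut-off counts as a positive test; the `cutoff` parameter and both function names are hypothetical, and no cut-off value from the study is implied. The area under the ROC curve is computed through its rank interpretation: the probability that a randomly chosen patient with the diagnosis scores higher than one without, with ties counted as one half.

```python
def sensitivity_specificity(scores, has_diagnosis, cutoff):
    """Sensitivity and specificity of `score >= cutoff` against a reference."""
    tp = fn = tn = fp = 0
    for score, diagnosed in zip(scores, has_diagnosis):
        positive = score >= cutoff
        if diagnosed and positive:
            tp += 1
        elif diagnosed:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diagnosed and one undiagnosed patient")
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, has_diagnosis):
    """Area under the ROC curve via pairwise comparison of the two groups."""
    cases = [s for s, d in zip(scores, has_diagnosis) if d]
    controls = [s for s, d in zip(scores, has_diagnosis) if not d]
    if not cases or not controls:
        raise ValueError("need at least one diagnosed and one undiagnosed patient")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

An area of 0.5 corresponds to a score that does no better than chance at separating the two groups, and 1.0 to perfect separation.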
It should be emphasized that the SF-LDQ is designed as an outcome assessment tool and not as a diagnostic tool. Although considerable effort has been made by the Rome process to disentangle reflux and epigastric pain, this has not been successful where patients have not had endoscopic investigation to exclude peptic ulcer and oesophagitis. Exclusion of patients with predominant reflux-like symptoms did not substantially alter the concurrent validity of the SF-LDQ, when assessed by the area under the ROC curve for the summed frequency score. This suggests that using the Rome II definition of dyspepsia instead of the 1988 Working Party definition has little influence on the concurrent validity of the SF-LDQ. Both GERD and dyspepsia have recently been re-defined by the Rome III panel.36, 37 These changes are designed to aid further research in selected subgroups of patients and do not alter the nature of symptoms sought as outcome measures for use in uninvestigated patients, where both reflux symptoms and epigastric pain commonly coexist.
The correlation between the test–retest SF-LDQ scores 2 days apart was 0.93, showing a high degree of reliability. Whilst only 40% of patients returned the second questionnaire, this low response rate was not unexpected for a primary care sample where most participants do not have dyspepsia. The LDQ had a weaker correlation between the two questionnaire scores (0.83) when test–retest reliability was assessed,10 but the response rate for the second questionnaire was higher (96% in a secondary care population).
Validation of the SF-LDQ as a postal questionnaire was not carried out in this study. However, the reliability, validity and responsiveness should not be affected by postal completion, as it is a self-completed instrument. Interpretability and acceptability of the SF-LDQ were demonstrated, suggesting that the questionnaire should have a good response rate. The response rate was only 40% in the test–retest sample, but it was 78% in the responsiveness to change sample (secondary care), where dyspepsia was more salient to the respondents.
The SF-LDQ's responsiveness to change was highly statistically significant in 37 patients receiving a treatment of known effectiveness. The standardized response mean values suggested that this response to change was large. The LDQ was assessed in a similar way and was found to be equally responsive to change, although the standardized response mean was not calculated.10 Other studies have used two groups receiving treatment and placebo, in order to compare the responsiveness of the two groups.38, 39 However, it is not ethical to use placebos except in the context of a randomized controlled trial, so this was not possible in this study. No alternative method of confirming that a response to change had occurred was used in this study, such as a blinded clinician assessment after treatment or a question on self-reported global improvement, as the treatments used have proven efficacy.40–42
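The two statistics discussed here can be sketched briefly. The text does not state which correlation coefficient was used for test–retest reliability, so Pearson's r is assumed below purely for illustration. The standardized response mean is taken in its usual form, the mean change in score divided by the standard deviation of the change. Function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(first, second):
    """Pearson correlation between paired scores (ASSUMED coefficient type)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    mx, my = mean(first), mean(second)
    sxy = sum((x - mx) * (y - my) for x, y in zip(first, second))
    sxx = sum((x - mx) ** 2 for x in first)
    syy = sum((y - my) ** 2 for y in second)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary in both administrations")
    return sxy / sqrt(sxx * syy)


def standardized_response_mean(before, after):
    """Mean improvement (before minus after) divided by the SD of the change."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    changes = [b - a for b, a in zip(before, after)]
    sd = stdev(changes)
    if sd == 0:
        raise ValueError("change scores have zero variance")
    return mean(changes) / sd
```

Because higher scores indicate worse symptoms, the change is computed as before minus after so that improvement gives a positive value. By a commonly used convention, values of about 0.8 or more are read as a large response.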
Three percent of respondents commented that the text size was too small, and a larger version of the questionnaire should be made available for such patients in clinical trials. Non-English-speaking patients were excluded from this study, but should be included in clinical dyspepsia trials to increase generalizability. Translation of the SF-LDQ into other languages would alter its characteristics, necessitating further validation.17, 31
The SF-LDQ is a self-completed outcome measure that assesses both the frequency and severity of dyspepsia symptoms for which acceptability, interpretability, reliability, validity and responsiveness to change have been demonstrated. It is a precise measure using the summed total score of frequency and severity responses and has good feasibility due to its brevity. The SF-LDQ meets all the criteria for an outcome measure for dyspepsia in cost-effectiveness trials, and is particularly well suited to primary care trials involving uninvestigated patients.