A review of guidelines on benign prostatic hyperplasia and lower urinary tract symptoms: are all guidelines the same?


C.T. Brown, Clinical Effectiveness Unit, The Royal College of Surgeons of England, 35/43 Lincoln's Inn Fields, London WC2A 3PE, UK.
e-mail: cbrown@rcseng.ac.uk


The Clinical Practice Guidelines on BPH/LUTS are examined by authors from London and Poitiers. Their review of the literature found that the overall and methodological quality of such guidelines varies widely. They acknowledge the difficulties in developing careful guidelines, but suggest that guidelines be formally appraised for quality and methods, as those of high quality are more likely to help urologists in decision-making.



To compare overall and methodological quality with content in national and supra-national Clinical Practice Guidelines (CPGs) on benign prostatic hyperplasia (BPH) and lower urinary tract symptoms (LUTS), as the purpose of CPGs is to reduce unwanted variation in practice and improve patient care by setting agreed standards based on the best available evidence.


An electronic search was used to identify Internet-based national and supra-national CPGs on BPH and LUTS available in 2001. Two independent assessors analysed the content and appraised the methodological quality of the CPGs using an existing and validated instrument (St. George's Hospital Medical School Health Care Evaluation Unit Appraisal Instrument) comprising 37 items grouped into three broad areas, i.e. rigour of development, context and content, and clinical application.


Eight CPGs were suitable for appraisal; there was much variation in overall and methodological quality. There was agreement that a patient history and physical examination (including a digital rectal examination) should be used in all symptomatic men. In addition, patients’ symptoms should be assessed using a validated symptom score, e.g. the International Prostate Symptom Score. There was considerable variation in the number and type of diagnostic tests recommended for routine assessment. CPGs scoring low on the appraisal instrument (indicating poor overall and methodological quality) were more likely to recommend more diagnostic tests than those scoring high. There was general agreement between the guidelines on the treatment of BPH/LUTS and the importance of the patient's involvement in making management decisions. Guideline quality was independent of local health resources and publication year.


The overall and methodological quality of CPGs on BPH/LUTS varies considerably. There appears to be an inverse relationship between guideline quality and the number of diagnostic tests recommended for routine assessment. Using CPGs of high quality may prevent men with BPH/LUTS being exposed to tests of doubtful utility. Although this may reduce both resource use and exposure to potential harm, moving to a more minimalist approach to diagnosis may itself be potentially harmful to patients.


Clinical Practice Guideline


European Association of Urology.


High-quality healthcare implies a practice that is consistent with the best available evidence [1]. Obtaining and critically appraising the current evidence, as well as considering this evidence in the context of an individual's circumstances, is beyond the time, skills and resources of most clinicians. Clinical Practice Guidelines (CPGs) provide an evidence-based framework on which clinicians base their practice. Their purpose is to reduce unwanted variation by setting agreed standards based on the best available evidence [2]. However, in clinical areas where there are numerous CPGs there is considerable variation in the recommendations they make. This has raised concerns about their quality [3].

Over the last decade several national and supra-national CPGs have been developed that provide advice on the management of BPH and associated LUTS. There is much variation in both the diagnostic and therapeutic recommendations made by these CPGs [4]. Are the variations in content associated with overall and methodological quality? To address this question we analysed the content and assessed the quality of all national and supra-national Internet-based CPGs on BPH/LUTS available in December 2001.


Only Internet-based national and supra-national CPGs were selected for appraisal; CPGs were selected if they were available in full (most CPGs in this study), had a summary of their recommendations (including details of how to obtain a full copy), or were publicised on the Internet. The Internet was chosen as it is now considered the preferred method of obtaining scientific information and is widely available. National and supra-national CPGs were selected as they represent either national guideline development groups, e.g. the National Health and Medical Research Council of Australia, or institutions of interested parties, e.g. BAUS. Many CPGs have been developed in this area at a local level, such as those published as part of meeting proceedings or produced by loco-regional working parties. These were excluded, as it would not be possible with existing resources to search for, obtain and appraise all such CPGs. CPGs were appraised irrespective of their year of publication.

Medline, Google, AltaVista and Yahoo were searched under the principal Medical Subject Headings ‘benign prostatic hyperplasia’, ‘BPH’, ‘benign prostatic enlargement’, ‘lower urinary tract symptoms’ and ‘LUTS’ to locate CPGs on BPH and LUTS. Each heading was then searched again, combined using the Boolean operator AND with ‘guidelines’, ‘clinical practice guidelines’ and ‘recommendations’. Searches were extended using the Related Subject command when available. Some CPGs were not available in full on the Internet and were appraised using either a CD-ROM (European Association of Urology, EAU) or a hard copy (WHO and UK).
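The combination of subject terms and qualifiers described above can be enumerated programmatically. The following is a minimal sketch only: the terms are those listed in the text, but the quoted `"term" AND "qualifier"` syntax and the function name `build_queries` are assumptions for illustration, as each search engine has its own query syntax and the authors do not report the exact strings they entered.

```python
from itertools import product

# Subject terms and qualifiers exactly as listed in the text.
SUBJECT_TERMS = [
    "benign prostatic hyperplasia",
    "BPH",
    "benign prostatic enlargement",
    "lower urinary tract symptoms",
    "LUTS",
]
QUALIFIERS = ["guidelines", "clinical practice guidelines", "recommendations"]


def build_queries(terms, qualifiers):
    """Return every 'term AND qualifier' pair as a quoted query string.

    The quoting style is an assumption, not the authors' documented syntax.
    """
    return [f'"{term}" AND "{qualifier}"' for term, qualifier in product(terms, qualifiers)]


queries = build_queries(SUBJECT_TERMS, QUALIFIERS)
for query in queries:
    print(query)
```

Five subject terms combined with three qualifiers give 15 distinct searches per database or search engine.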

The instrument used to appraise the CPGs [5,6] is the result of collaborative research between the Health Care Evaluation Unit at St George's Hospital Medical School, London, UK, and The Health Services Research Unit at Aberdeen University, Scotland. It was developed and designed to encourage the systematic development of high-quality CPGs. It also provides a structured, transparent and reproducible method of appraising CPGs. The instrument has been extensively tested for reliability and validity on over 60 CPGs [7] and comprises 37 items grouped in three dimensions, addressing rigour of development/methodology (20 items); context and content (12 items); and clinical application (five items; Appendix A, Table 1). Over half of the available ‘points’ are allocated to the ‘rigour of development’. The appraisal instrument was developed in this way because rigour of development/methodology is thought to be the principal reason why CPG recommendations may differ, given the same available evidence. The final score can vary from 0 (indicating low quality) to 37 (indicating high quality). A user guide accompanies the instrument, ensuring that the questions are interpreted consistently between assessors.

Table 1.  A summary of the appraisal instrument

| Dimension and criteria assessed | Items | No. of items |
|---|---|---|
| 1. Rigour of development | 1–20 | 20 |
| Responsibility of the guidelines | | |
| Composition of the development group | | |
| Identification and interpretation of the evidence | | |
| Formulation of recommendations | | |
| Link between evidence and main recommendations | | |
| Peer review | | |
| 2. Context and content | 21–32 | 12 |
| Aims and objectives of the guidelines | | |
| Identification of the target group | | |
| Circumstances for application and non-application | | |
| Presentation and format | | |
| Outcomes, benefits, harms and costs | | |
| 3. Clinical application | 33–37 | 5 |
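The structure of the instrument summarised in Table 1 can be sketched in code. The item ranges per dimension are taken from the table; the scoring rule (one point for each ‘Yes’, nothing for any other answer) is an assumption made for illustration, as the text gives only the range of the final score (0–37) and not the rule applied to each item. The name `score_guideline` and the example answers are hypothetical.

```python
# Item ranges per dimension, as given in Table 1.
DIMENSIONS = {
    "Rigour of development": range(1, 21),    # items 1-20
    "Context and content": range(21, 33),     # items 21-32
    "Clinical application": range(33, 38),    # items 33-37
}


def score_guideline(answers):
    """Score a guideline from a dict mapping item number -> answer string.

    Assumption: one point per 'Yes'; 'No', 'Not sure', 'Not applicable'
    and missing answers all score zero.
    """
    per_dimension = {
        name: sum(1 for item in items if answers.get(item) == "Yes")
        for name, items in DIMENSIONS.items()
    }
    return per_dimension, sum(per_dimension.values())


# Hypothetical answers, for illustration only.
example_answers = {item: "Yes" for item in range(1, 38)}
example_answers[16] = "No"
example_answers[31] = "Not sure"

per_dimension, total = score_guideline(example_answers)
print(per_dimension, total)
```

Because 20 of the 37 items fall under rigour of development, a guideline that neglects its development methods cannot score above 17 under this rule, whatever its content.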

In this study two independent assessors appraised the quality of all the selected CPGs using the instrument. Where discrepancies in overall and individual question scores arose, each was re-assessed after discussion. Where disagreement remained an average score was taken; this explains why some total scores include a half-mark.
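The final step of this reconciliation, averaging where the assessors still disagree, explains the half-marks and can be sketched as follows. This is an illustration only: it assumes each item is scored 1 (credited) or 0 (not credited) by each assessor, it does not model the preceding discussion, and the function name `reconcile` and the example scores are hypothetical.

```python
def reconcile(scores_a, scores_b):
    """Average two assessors' per-item scores (assumed 1 = credited, 0 = not).

    Items on which the assessors agree keep their score; items on which
    they still disagree after discussion contribute a half-mark.
    """
    if scores_a.keys() != scores_b.keys():
        raise ValueError("both assessors must score the same items")
    return {item: (scores_a[item] + scores_b[item]) / 2 for item in scores_a}


# Hypothetical scores for four items; the assessors disagree on item 2.
assessor_a = {1: 1, 2: 1, 3: 0, 4: 1}
assessor_b = {1: 1, 2: 0, 3: 0, 4: 1}

combined = reconcile(assessor_a, assessor_b)
combined_total = sum(combined.values())
print(combined, combined_total)
```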


Twelve CPGs were identified; eight were suitable for appraisal and were identified from the following organizations and countries: the AUA, EAU, International Consensus Committee (WHO), Australia, Germany, Malaysia, Singapore and the UK (Appendix B); Malta, Poland and Switzerland indicated that they did not have their own CPGs, but based their national recommendations on the WHO CPGs. The CPGs from Brazil were not available in English.

As the individual recommendations made by many of these CPGs were discussed in detail elsewhere [4] we report findings relating to overall and methodological quality as determined by the appraisal instrument.

Half of the CPGs assessed (EAU, Malaysia, Singapore and UK) failed to report their search strategy for obtaining the evidence on which their recommendations were based. Similarly the EAU, WHO and Malaysia (low scoring) did not identify any method of assessing the strength of the evidence used. High-scoring CPGs (Australia, AUA) ranked the evidence used to make their recommendations, e.g. randomized controlled trials are considered to provide stronger evidence than descriptive studies including case-series and case reports [8]. High-scoring CPGs were more likely to address clinical application, declare their funding source and have a multidisciplinary development team, e.g. doctors, nurses, GPs and epidemiologists. A summary of the CPGs appraised is shown in Table 2.

Table 2.  A summary of the CPGs appraised and of the recommendations for diagnostic testing in the routine assessment of men with BPH/LUTS
* In decreasing order of score.
† Mean score from two assessors using the appraisal instrument; score 0–37, with a higher score indicating better overall and methodological quality.
‡ For implementation in clinical practice.
¶ R, recommended; O, optional; NR, not recommended, special circumstances only; ND, not discussed; KUB, kidneys, ureters and bladder; PVR, postvoid residual volume.

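| CPG (rank by score) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Quality score | 30.5 | 28.5 | 11.5 | 11 | 10.5 | 9.5 | 9 | 5 |
| Publication year | 2000 | 1994 | 1991 | 2001 | 1997 | 1999 | 1998 | 1999 |
| Use of symptom score | R | R | R | R | R | R | R | R |
| Use of a voiding diary | O | O | R | ND | ND | O | ND | R |
| Urine analysis | R | R | R | ND | R | R | R | R |
| PVR measurement | NR | O | R | R | R | R | O | R |
| Serum creatinine | NR | R | R | R | R | R | R | R |
| urinary tract | NR | NR | O | O | NR | R | O | O |
| Total recommended tests | 2 | 3 | 6 | 4 | 5 | 7 | 4 | 7 |

Rows for which the column positions of the marks are not preserved:
References listed: X X
Funding source declared: X X X X X
Multidisciplinary team
Final draft piloted: X X X X X X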


All the CPGs agreed that the patient's history should be taken and a physical examination (including a DRE) be used in all symptomatic men, and that patients’ symptoms should be assessed using a validated symptom score, e.g. the IPSS.

The number of recommended diagnostic tests for the routine assessment of men with BPH/LUTS was compared among the CPGs. In some cases if a test was not recommended the CPG stated it as optional. For example, many CPGs stated that flexible cystoscopy in men with BPH/LUTS was not routinely recommended but remained optional. Optional tests usually required clinical judgement to be applied to a situation, e.g. requesting upper urinary tract imaging on the basis of a raised serum creatinine level. Some diagnostic tests were actively not recommended for routine use and in these instances most CPG developers justified not using a test based on current evidence. Other CPGs failed to discuss some tests at all.

There was much variation in the number and type of diagnostic tests recommended for routine assessment by each of the CPGs (Table 2). The number of recommended diagnostic tests was inversely related to CPG quality as determined by the appraisal instrument. CPGs with low scores (indicating poor overall and methodological quality) were more likely to recommend more routine diagnostic tests (Fig. 1). The highest scoring CPG (Australia) not only recommended the fewest diagnostic tests, but also contained the most tests stated as ‘not recommended’, each justified with a summary of the current evidence.

Figure 1. The relationship between CPG score and the number of routine diagnostic tests recommended, with the regression line.
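The direction of the relationship shown in Fig. 1 can be checked from the ‘Quality score’ and ‘Total recommended tests’ rows of Table 2. The sketch below assumes an ordinary least-squares line and a Pearson correlation coefficient; the paper does not state which fitting method was used or report the coefficients, so this is an illustration rather than a reproduction of the authors' analysis. The function name `pearson_and_line` is hypothetical.

```python
from math import sqrt

# Values taken from Table 2, in decreasing order of quality score.
quality_scores = [30.5, 28.5, 11.5, 11, 10.5, 9.5, 9, 5]
recommended_tests = [2, 3, 6, 4, 5, 7, 4, 7]


def pearson_and_line(x, y):
    """Return (r, slope, intercept) for a least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


r, slope, intercept = pearson_and_line(quality_scores, recommended_tests)
print(f"r = {r:.2f}, tests = {intercept:.2f} + ({slope:.3f} x score)")
```

With only eight CPGs, and two of them far higher scoring than the rest, any such fit is driven largely by those two points and should be read with corresponding caution.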


There was general agreement on the medical and surgical management of BPH and LUTS. All CPGs agreed on the criteria for surgical intervention in the presence of complications such as renal insufficiency, stones and recurrent infection, or in those where medical therapy had failed to control the most severe symptoms. In uncomplicated cases (those which cause no serious health threat) there was also agreement on the importance of patient preference in the decision-making process on treatment.


This study shows that there is considerable variation in the overall and methodological quality of CPGs for BPH and LUTS. As the same evidence is available to all CPG developers, the recommendations for best practice should be similar. The variations in CPG content seen in this and other studies may be explained by the variation in methodological quality. The results of this study are only observational and do not represent the opinion or recommendation of any of the authors or their institutions.

In this study CPGs were classified for quality according to the appraisal instrument's scoring system; the higher the score, the higher the quality. This classification might change if the scores were allocated differently, e.g. with equal scoring across the three dimensions of rigour of development, context and content, and clinical application. However, the present instrument is the only validated CPG appraisal tool available.

The variation in overall and methodological quality found in the CPGs is probably a result of their development not being systematic. Similar results were reported previously for CPGs on the anticoagulation of patients with atrial fibrillation [9]. CPGs with low scores on the appraisal instrument neglected issues relating to the rigour of development, including linking their recommendations with high-quality evidence, e.g. that from randomized controlled trials or systematic reviews. The validity of recommendations made by CPGs that fail to state their search strategy, or how they formulated their recommendations from the current evidence (i.e. those scoring low in this study), must be questioned.

The relationship between overall and methodological quality and CPG content was most evident in the use of routine recommended diagnostic tests. High-quality CPGs recommended fewer routine diagnostic tests than low-quality CPGs. As CPGs are developed with users in mind, this could have a profound effect on the process and outcome of care for men with BPH/LUTS. However, the conclusions from this study should be interpreted cautiously as only eight CPGs were included in the analysis.

One possible explanation for the variation in quality is that the most recently published CPGs might score the highest; this was not the case. The two CPGs with the highest scores were among the earliest and latest developed. The WHO CPGs were published in 1991 (the earliest) and had not been updated when this study was conducted, suggesting that updating was not considered mandatory. We think that comparing CPGs published up to 10 years apart is valid, as indicated by the similarities in the content of the CPGs from EAU (2001) and WHO (1991). We appraised the methods of currently available CPGs irrespective of publication date.

Another possible explanation could be that local health care resources influence the number of tests recommended. However, the richest (America) and the poorest (Malaysia) recommended a similar number of tests [10]. Cultural differences in urological practice between America, Europe and Australia may also explain this variation in content. However, there was no reason for any differences in methodological quality.

The evidence of the usefulness of some diagnostic tests in many disease areas is poor [11]. In this study, high-quality CPGs were more likely to dismiss tests as there was no evidence of diagnostic effectiveness. Stating a test as not recommended is far more difficult for a CPG developer (as justification is required), but more useful for the potential user. A thorough understanding and synthesis of the evidence, as well as accurate documentation within the CPG, is required to achieve this in a way that will benefit clinicians.

The variation in recommendations for diagnostic tests for men with BPH/LUTS has strong implications for the management of such men. Using CPGs of high quality may prevent men with BPH/LUTS being exposed to tests of doubtful utility, reducing both resource use and exposure to potential harm. Adopting CPGs of poor quality may result in greater activity, by exposing men to many tests with little evidence to support them [12]. Using the highest scoring CPG (Australian), where only two routine diagnostic tests are recommended, i.e. symptom score and urine analysis, may seem attractive, but this minimal approach may itself expose patients to harm. Although there is only a weak correlation between LUTS, uroflow and postvoid residual volume [13,14], many would argue that these are useful tests both diagnostically (e.g. in detecting those with high residuals) and in monitoring those on watchful waiting or who have started drug therapy. Recommendations based solely on evidence may not be suitable when creating a purposeful CPG for an everyday user. Until the effectiveness of some individual tests has been confirmed, their routine use is unlikely to be recommended by the highest scoring CPGs where evidence summaries accompany the recommendations. This highlights the importance of a planned update to amend current recommendations based on the most recent evidence.

In conclusion, the overall and methodological quality of CPGs for BPH/LUTS varies considerably. In addition, higher quality CPGs recommend fewer routine diagnostic tests. By using a methodological appraisal instrument CPG developers will be encouraged to create CPGs that reflect the relevant research evidence more accurately. However, developing the ideal ‘user friendly’ CPG suitable for all practitioners who manage men with BPH/LUTS is a difficult task. Where numerous guidelines exist, those that have been formally appraised and are found to be of high methodological quality are likely overall to be the most beneficial to patients. Countries or organizations with no resources to create their own high-quality CPGs should adapt their practice policies from CPGs that score highly when formally appraised.


Appendix A. The appraisal instrument for CPGs. Each question is answered by ‘Yes’, ‘No’, ‘Not sure’ or ‘Not applicable’


Responsibility for guideline development

  • 1. Is the agency responsible for the development of the guidelines clearly identified?
  • 2. Was external funding or other support received for developing the guidelines?
  • 3. If external funding or support was received, is there evidence that the potential biases of the funding body(ies) were taken into account?

Guideline development group

  • 4. Is there a description of the individuals (e.g. professionals, interest groups, including patients) who were involved in the guideline development group?
  • 5. If so, did the group contain representatives of all key disciplines?

Identification and interpretation of evidence

  • 6. Is there a description of the sources of information used to select the evidence on which the recommendations are based?
  • 7. If so, are the sources of information adequate?
  • 8. Is there a description of the method(s) used to interpret and assess the strength of the evidence?
  • 9. If so, is (are) the method(s) for rating the evidence satisfactory?
  • 10. Is there a description of the methods used to formulate the recommendations?
  • 11. If so, are the methods satisfactory?
  • 12. Is there an indication of how the views of interested parties not on the panel were taken into account?
  • 13. Is there an explicit link between the major recommendations and the level of supporting evidence?

Peer review

  • 14. Were the guidelines independently reviewed prior to their publication/release?
  • 15. If so, is explicit information given about methods and how comments were addressed?
  • 16. Were the guidelines piloted?
  • 17. If the guidelines were piloted, is explicit information given about the methods used and the results adopted?


  • 18. Is there a mention of a date for reviewing or updating the guidelines?
  • 19. Is the body responsible for the reviewing and updating clearly identified?

Overall assessment of development process

  • 20. Overall, have the potential biases of guideline development been adequately dealt with?



  • 21. Are the reasons for developing the guidelines clearly stated?
  • 22. Are the objectives of the guidelines clearly defined?


  • 23. Is there a satisfactory description of the patients to which the guidelines are meant to apply?
  • 24. Is there a description of the circumstances (clinical or non-clinical) in which exceptions might be made in using the guidelines?
  • 25. Is there an explicit statement of how the patient's preferences should be taken into account in applying the guidelines?


  • 26. Do the guidelines describe the condition to be detected, treated, or prevented in unambiguous terms?
  • 27. Are the different possible options for management of the condition clearly stated in the guidelines?
  • 28. Are the recommendations clearly presented?

Likely costs and benefits

  • 29. Is there an adequate description of the health benefits that are likely to be gained from the recommended management?
  • 30. Is there an adequate description of the potential harms or risks that may occur as a result of the recommended management?
  • 31. Is there an estimate of the costs or expenditures likely to be incurred from the recommended management?
  • 32. Are the recommendations supported by the estimated benefits, harms and costs of the intervention?


Guideline dissemination and implementation

  • 33. Does the guideline document suggest possible methods for dissemination and implementation?

Monitoring of guidelines/clinical audit

  • 34. Does the guideline document specify criteria for monitoring compliance?
  • 35. Does the guideline document identify clear standards or targets?
  • 36. Does the guideline document define measurable outcomes that can be monitored?

National Guidelines Only

  • 37. Does the guideline document identify key elements which need to be considered by local guideline groups?


Appendix B. The URLs used in this study





International Consensus Committee (WHO)