  • evidence-based medicine;
  • research design standards;
  • epidemiological;
  • publishing standards;
  • observational study methods




To develop and apply a standardized evaluation form for assessing the methodological and reporting quality of observational studies of surgical interventions in urology.


An evaluation standard was developed using the Consolidated Standards of Reporting Trials statement and previously reported surgical reporting quality instruments. Consensus scoring among three reviewers was developed using two distinct sets of studies. All comparative observational studies involving therapeutic surgical procedures published in four major urological journals in 1995 and 2005 were randomly divided among the reviewers. Categories of reporting adequacy included background, intervention, statistical analysis, results and discussion.


Twenty-seven articles in 1995 and 62 in 2005 met the inclusion criteria; 90% of studies were retrospective. From 1995 to 2005, the overall reporting quality score increased by 3.9 points (95% confidence interval, CI, 2.7–5.9; P = 0.001), from a mean (sd) of 19.1 (3.9) to 23.0 (4.2) on a scale of 0–42. There were significant improvements in the reporting categories of study background (+0.7 points, 95% CI 0.1–1.3, P = 0.043, 0–8-point scale), intervention (+1.6 points, 0.8–2.3, P = 0.001, 0–9-point scale), and statistical analysis (+0.8 points, 0.2–1.4, P = 0.006, 0–9-point scale). There were smaller, statistically nonsignificant improvements for results (+0.5 points, −0.3 to 1.2, P = 0.217, 0–10-point scale) and discussion reporting (+0.4 points, −0.1 to 0.8, P = 0.106, 0–6-point scale).


There have been minor improvements in the reporting of observational studies of surgical intervention between 1995 and 2005. However, reporting quality remains suboptimal. Clinical investigators, reviewers and journal editors should continue to strive for transparent reporting of the observational studies representing the bulk of the clinical evidence for urological procedures.


RCT, randomized controlled trial

CONSORT, Consolidated Standards of Reporting Trials

STROBE, Strengthening the Reporting of Observational Studies in Epidemiology.



Well-designed randomized controlled trials (RCTs) are generally accepted to provide the highest level of evidence in support of therapeutic interventions. However, they face considerable challenges when applied to surgical procedures [1]. Specifically, ensuring that surgeons are unaware of treatment is impossible, and ensuring the same for patients and other personnel involved in the intervention is often not feasible. Other methodological issues relate to surgical skill-sets and the accumulation of experience [2]. In addition, both surgeons and patients frequently lack equipoise and are unwilling to accept random allocation to an invasive surgical procedure. These reasons might contribute to the fact that RCTs account for <5% of the studies published in urology [3].

In the absence of RCTs, the highest quality of evidence is derived preferably from prospective observational studies of high methodological quality. In urology, observational studies represent the bulk of available evidence on the therapeutic effectiveness of surgical procedures and devices [4]. Whereas the issues surrounding the methodological and reporting quality of RCTs are well-recognized and have led to the Consolidated Standards of Reporting Trials (CONSORT) initiative [5], there has been less focus on observational studies to date. Although several topic-specific guidelines for the reporting of non-randomized studies have been proposed, no consensus statement has yet been promulgated [6–10]. Furthermore, many of these guidelines are not readily applicable to surgery.

In the light of the paramount importance of observational studies to the body of evidence in support of therapeutic interventions in urological surgery, we sought to develop a set of surgery-specific criteria for critically assessing the methodological and reporting quality of cohort studies. These criteria were then applied to studies published a decade apart in urology journals to assess the progression of reporting adequacy of invasive surgical procedure studies.



As no surgery-specific guidelines for reporting observational cohort studies were available, a composite checklist based on the CONSORT criteria and checklists for evaluating paediatric surgical studies was developed, consisting of 42 scoreable items with three additional items for studies with secondary outcomes or with no single primary outcome [5,7,11,12] (the Appendix details the quality-reporting instrument). Each item was scored as either ‘yes’, ‘no’, or ‘not applicable.’ To ensure internal consistency, each of the three reviewers (T.T., R.B. and P.D.) evaluated a separate set of four randomly selected observational studies that met the study selection criteria outlined below. One article each came from the volume published in 2003 of each of the four journals to be reviewed. Based on these evaluations, changes were made to the evaluation instrument and a second set of four randomly selected studies from 2002 was reviewed. There were no substantive differences during this second evaluation of the assessment instrument.

Study selection was limited to cohort studies of urological surgical procedures published in 1995 and 2005 in the four major urology journals, i.e. European Urology, Journal of Urology, BJU International (formerly known as the British Journal of Urology) and Urology. These journals are read by a wide audience of practising urologists and currently have the highest impact scores in this field (4.9, 4.0, 2.6 and 2.1, respectively) [13]. Prospective and retrospective cohort studies were identified using a defined PubMed search strategy (Table 1).

Table 1.  The PubMed search strategy
Publication year:         1995 or 2005
Journals:                 J Urol; Br J Urol; Eur Urol
Study type:               Comparative
Publication type:         NOT case reports; NOT review; NOT editorial; NOT comment; NOT letter; NOT correspondence; NOT drug therapy
Medline Subject Heading:  Surgical procedures, operative

Using the article abstracts, the articles identified by this search strategy were then screened for eligibility. The inclusion criteria were: (i) prospective or retrospective cohort studies with at least one arm involving an invasive urological surgical procedure or device for the purposes of therapeutic intervention; and (ii) analyses based on a comparison of at least two distinct groups of patients. Studies concerned primarily with determining prognosis, diagnosis, or making before and after comparisons within the same group (e.g. at different time points) were excluded. Studies using raw data from previously published studies (secondary analyses) were also excluded. A preliminary review of every article abstract published in the first three months of 1995 and 2005 in each of the four journals showed that the PubMed search strategy identified >90% of eligible articles compared to manual searching. Each of the three reviewers was then randomly assigned a third of the articles from each journal deemed eligible for this study. Given the unique appearance of each journal, articles were not ‘blinded’ for journal, year of publication, authors, institution of origin, or funding source.

The primary aim of the present study was to develop and implement an instrument to assess the quality of surgical cohort studies, and thus assess the status quo of surgical study reporting in urological journals. The number of checklist criteria met (i.e. ‘yes’ responses) in each report was analysed. ‘Not applicable’ responses were scored as ‘yes’ responses for criteria that could not be fulfilled given the particular study design or study results. Statistical reporting for studies with a single primary outcome and no secondary outcomes was assessed using items S7a-S9a (Appendix). Statistical reporting for studies with no single primary outcome was assessed using items S7b-S9b. Statistical reporting for studies with both a single primary outcome and secondary outcomes was scored using the mean of items S7a-S9a and S7b-S9b.
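The scoring rule described above can be illustrated with a minimal sketch. The function names, the dictionary layout and the example responses are hypothetical, and the sketch assumes that the 'mean of items S7a-S9a and S7b-S9b' is the mean of the two three-item block scores; it is not the authors' own software.

```python
from statistics import mean

PRIMARY_ITEMS = ("S7a", "S8a", "S9a")    # outcome items for a single primary outcome
SECONDARY_ITEMS = ("S7b", "S8b", "S9b")  # outcome items for secondary/other outcomes


def item_points(response):
    """Score one checklist response: 'yes' and 'n/a' count as met, 'no' does not."""
    return 1 if response.lower() in ("yes", "n/a") else 0


def outcome_points(responses, has_primary, has_secondary):
    """Score the outcome-reporting items according to the study's outcome structure."""
    a = sum(item_points(responses[k]) for k in PRIMARY_ITEMS)
    b = sum(item_points(responses[k]) for k in SECONDARY_ITEMS)
    if has_primary and has_secondary:
        return mean([a, b])  # assumption: average of the two block scores
    return a if has_primary else b


def overall_score(responses, has_primary, has_secondary):
    """Sum all non-outcome items, then add the outcome-reporting points."""
    outcome_keys = PRIMARY_ITEMS + SECONDARY_ITEMS
    other = sum(item_points(v) for k, v in responses.items() if k not in outcome_keys)
    return other + outcome_points(responses, has_primary, has_secondary)


# Hypothetical, abbreviated response sheet (a real sheet would hold every item).
example = {
    "B1": "yes", "B2": "no", "B5": "n/a",
    "S7a": "yes", "S8a": "no", "S9a": "no",
    "S7b": "yes", "S8b": "yes", "S9b": "no",
}
print(overall_score(example, has_primary=True, has_secondary=True))
```

Selecting the item block by outcome structure, rather than counting every 'N/A' as met, keeps the maximum contribution of the outcome items at three points, consistent with the 0–9-point scale reported for statistical analysis.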

Reporting adequacy scores in five categories (background, intervention, statistical analysis, results and discussion) were calculated for each study. Summary statistics for each year were also generated. The maximum possible overall score of our quality-assessment instrument was 42. Based on similar studies from reports in paediatric surgery, the estimated mean (sd) score would have been 22 (7) in 1995. Using anticipated sample sizes of 35 and 70 for 1995 and 2005, and a two-sided α of 0.05, this study was estimated to have 85% power to detect a clinically important 20% (4.4 point) increase in reporting quality score.
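As a rough check of the sample-size reasoning above, the power of a two-sided, two-sample comparison can be approximated with a normal (z) approximation. This is a sketch under stated assumptions (equal standard deviations in both years, normal approximation instead of the t distribution) and is not the PASS 2005 calculation itself; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(delta, sd, n1, n2, alpha=0.05):
    """Approximate power to detect a mean difference `delta` (two-sided z-test)."""
    se = sd * sqrt(1 / n1 + 1 / n2)                # standard error of the difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for two-sided alpha
    # Probability of exceeding the critical value in the direction of the effect;
    # the negligible contribution of the opposite tail is ignored.
    return NormalDist().cdf(delta / se - z_crit)


# Inputs taken from the text: SD 7, 35 and 70 articles, 4.4-point difference.
power = two_sample_power(delta=4.4, sd=7, n1=35, n2=70)
print(f"approximate power: {power:.2f}")
```

With these inputs the approximation gives a power of about 0.86, in line with the 85% stated above.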

Sample size was estimated using Power Analysis and Sample Size (PASS 2005) software (NCSS, Kaysville, Utah, USA) and for all other statistical analyses we used standard commercial software. Groups were compared using the chi-square test and Student’s t-test, using two-sided testing and a predefined α of 0.05. Effect sizes for continuous variables were reported as mean differences with 95% CI. No formal adjustment was made for multiple comparisons.



Of 244 articles in 1995 and 461 in 2005 identified by the PubMed search strategy, 31 in 1995 and 65 in 2005 were eligible for inclusion in this study after abstract review. Of these, seven articles were excluded after review of the full-text publication, leaving 27 in 1995 and 62 in 2005; Fig. 1 shows the flowchart for study selection. Of the selected studies, 90% were retrospective; the most common topics were endourology (30%), oncology (24%), and reconstruction (24%) (Table 2).


Figure 1. Study selection flow chart.


Table 2.  The details of the studies
Variable, n (%)         1995       2005       Overall    P
Study topic
  Endourology           9 (33)     18 (29)    27 (30)    0.669
  Oncology              5 (19)     16 (26)    21 (24)
  Reconstruction        6 (22)     15 (24)    21 (24)
  Voiding dysfunction   4 (15)     4 (7)      8 (9)
  Infertility           1 (4)      6 (10)     7 (8)
  Transplant            2 (7)      2 (3)      4 (4)
  Infection             0 (0)      1 (2)      1 (1)
Study type
  Retrospective         24 (89)    56 (90)    80 (90)    0.300
  Prospective           2 (7)      6 (10)     8 (9)
  Not specified         1 (4)      0 (0)      1 (1)
  Yes                   3 (11)     5 (8)      8 (9)      0.556
  No                    16 (59)    44 (71)    60 (67)
  Not specified         8 (30)     13 (21)    21 (24)
Median sample size:
  Overall               112        100        100        0.510
  Of smallest arm       29         28         29         0.726

From 1995 to 2005, the overall score for reporting quality increased by 3.9 points (95% CI 2.7–5.9, P = 0.001) on a scale of 0–42, from a mean (sd) of 19.1 (3.9) to 23.0 (4.2). There were consistent improvements across all categories (Fig. 2), although not all of them were statistically significant. There were significant improvements of +0.7 points (95% CI 0.1–1.3, P = 0.043) on a 0–8-point scale for study background, +1.6 points (0.8–2.3, P = 0.001) on a 0–9-point scale for intervention, and +0.8 points (0.2–1.4, P = 0.006) on a 0–9-point scale for statistical analysis. There were smaller, nonsignificant improvements of +0.5 points (−0.3 to 1.2, P = 0.217) on a 0–10-point scale for results, and +0.4 points (−0.1 to 0.8, P = 0.106) on a 0–6-point scale for discussion.


Figure 2. The quality-of-reporting assessment by year, showing the mean number of items reported in each quality-assessment category. Grey bars represent articles from 1995, and black bars articles from 2005.


For individual criteria, a large percentage of studies from both years explained the scientific background (96% in 1995 vs 97% in 2005, P = 0.909) and study hypothesis (85% in 1995 vs 87% in 2005, P = 0.808). From 1995 to 2005, there were major improvements in descriptions of the surgical intervention (59% vs 87%, P = 0.003) and efforts of the researchers to standardize the procedures under investigation (48% vs 82%, P = 0.001). By contrast, clearly defined primary endpoints (11% vs 19%, P = 0.340), reporting of baseline data (59% vs 73%, P = 0.213), discussion of limitations (44% vs 48%, P = 0.732), provision of exact P-values for nonsignificant results (41% vs 63%, P = 0.053), and reporting of patient-capture rates (41% vs 44%, P = 0.806) did not significantly improve from 1995 to 2005.
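A comparison of proportions such as those above can be sketched as a Pearson chi-square test on a 2 × 2 table. The counts below are an assumption: they are reconstructed from the rounded percentages for the description of the surgical intervention (59% of 27 articles and 87% of 62 articles), not taken from the original data, and no continuity correction is applied. The function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = (a, b, c, d)
    expected = (row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Assumed counts: criterion met / not met in 1995 (16 / 11) and in 2005 (54 / 8).
chi2, p_value = chi_square_2x2(16, 11, 54, 8)
print(f"chi-square = {chi2:.2f}, P = {p_value:.3f}")
```

Under these assumed counts the test gives a P-value of about 0.003, matching the value reported for this criterion.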



Evidence-based decision making at both the individual patient and health-policy levels hinges on the availability of high-quality research studies. For questions of therapeutic effectiveness, the highest quality evidence is ideally derived from a systematic review and meta-analysis of multiple RCTs of low heterogeneity. However, this type of evidence is rarely available in urological reports, a situation that compels urologists to rely upon lower-quality evidence for clinical decision-making [4]. In the absence of RCT evidence, the best available observational data would stem from high-quality, preferably prospective, cohort studies. These studies should not only be carefully designed and executed, but also reported as transparently as possible. Such transparency allows the reader to more easily assess the validity and relevance of study results and ascertain whether the findings are applicable to the care of an individual patient. The critical issue of reporting quality has been recognized in the development of reporting standards such as CONSORT for RCTs [5], the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (previously known as Quality of Reporting of Meta-Analyses) for systematic reviews and meta-analyses [14], and the Standards for Reporting of Diagnostic Accuracy for diagnostic tests [15]. Specific reporting criteria for observational studies of surgical procedures have yet to be developed.

In the present study we developed a standardized checklist to assess the methodological and reporting quality for observational studies of surgical interventions. During the decade from 1995 to 2005, there was an increase in the overall quality of reporting score of 3.9 points (from 19.1 to 23.0) on a 42-point scale, representing a 17% improvement. While statistically significant, this score increase was just short of the 20% increase that we had considered to be a meaningful improvement for a 10-year period. In addition, overall reporting quality continued to be low in 2005, with only 54% of reporting criteria being met. Criteria that fell markedly below expectations in 2005 included the identification of a single primary outcome (18.5%), the description of the surgeons’ experience level (15.4%), and reporting of the percentage of patients lost to follow-up (13.8%). Each of these three criteria can have a major impact on the validity and general applicability of study findings, and is therefore important to the critical appraisal process. Although completeness is necessary for high-quality reporting, it is important to recognize that the failure to report certain methodological safeguards against bias does not necessarily indicate that they were not applied [16]. However, as the published report frequently represents the only source of information about a given study, complete and transparent reporting is of great importance to an evidence-based practice.

In the present study, the summation of equally weighted individual items was used to calculate a summary score representing an overall assessment of the methodological and reporting quality. This well-established approach is insightful, but has been criticised because methodological issues of varying importance are treated equally [17]. On the other hand, differential weighting of individual criteria would be subjective and subject to similar criticism. We therefore sought to complement the comparison of summary scores from 1995 and 2005 with the reporting of individual category scores (e.g. background, intervention, results, etc.) and select individual criterion scores. We chose not to adjust these comparative statistical analyses for multiple testing, recognizing that there is an increased risk of a type I error. Improvements in individual criteria should therefore be interpreted with caution.

An additional limitation of this study is its focus on only 2 years of publication. Although one might speculate that the years chosen for review do not adequately represent the reporting quality of urological journals in other years, there is no foundation for this assumption. A further limitation of this study is that the instrument for assessing reporting quality was not validated. As there are no pre-existing validated instruments, our instrument was developed from applicable criteria found in the CONSORT statement and surgery-specific criteria found in other instruments. The instrument was then modified based upon the responses from three independent reviewers using two separate sets of sample observational-study articles. A formal validation that would have required a significantly greater effort with more reviewers and test articles was not done. However, one of the strengths of this study is its use of a standardized quality-of-reporting form that was derived from established instruments and pilot-tested for interobserver agreement.

Since our study began, an international collaboration has released the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement [18]. This 22-item checklist includes many of the same criteria found in our instrument for the background, methods, results and discussion categories. However, the STROBE statement, which aims to have the broadest applicability possible, is not surgery-specific. In particular, criteria not found in the STROBE statement, such as those involving the number of participating centres, the number of surgeons, and the experience level of the surgeons, can make a significant difference in how surgical studies are interpreted. An extension of the CONSORT statement to evaluate the reporting of nonpharmacological trials has also recently been developed that addresses several of these issues for RCTs of surgical interventions [19]. The STROBE statement and the CONSORT criteria for nonpharmacological trials therefore provide the basic framework upon which to build more surgery-appropriate criteria to critically appraise surgical observational studies. Reporting quality is receiving increasing attention in the research community, as witnessed by the launch of the Enhancing the Quality and Transparency of Health Research initiative that promotes the transparent and accurate reporting of health research [20]. Journal editors of leading urological journals are urged to support these efforts and to promote the transparent reporting of both RCTs and observational studies, by formally endorsing reporting standards where they exist.

In conclusion, although there have been minor improvements from 1995 to 2005, the methodological and reporting quality of observational studies of surgical procedures in urological journals remains suboptimal, with only about half of reporting standards being met in 2005. Increased efforts to train urological investigators in clinical research methods, and increased emphasis on the transparency of reporting of studies by reviewers and editors, appear to be indicated to address this issue.


  1. Harbour R, Miller J. A new system for grading recommendations in evidence based guidelines. BMJ 2001; 323: 334–6
  2. Devereaux PJ, Bhandari M, Clarke M et al. Need for expertise based randomised controlled trials. BMJ 2005; 330: 88
  3. Scales CD Jr, Norris RD, Keitz SA et al. A critical assessment of the quality of reporting of randomized, controlled trials in the urology literature. J Urol 2007; 177: 1090–4
  4. Borawski KM, Norris RD, Fesperman SF, Vieweg J, Preminger GM, Dahm P. Levels of evidence in the urological literature. J Urol 2007; 178: 1429–33
  5. Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet 2001; 357: 1191–4
  6. Des Jarlais DC, Lyles C, Crepaz N. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: the TREND statement. Am J Public Health 2004; 94: 361–6
  7. Rangel SJ, Kelsey J, Colby CE, Anderson J, Moss RL. Development of a quality assessment scale for retrospective clinical studies in pediatric surgery. J Pediatr Surg 2003; 38: 390–6
  8. Cho MK, Bero LA. Instruments for assessing the quality of drug studies published in the medical literature. JAMA 1994; 272: 101–4
  9. Downs SH, Black N. The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health 1998; 52: 377–84
  10. Margetts BM, Thompson RL, Key T et al. Development of a scoring system to judge the scientific quality of information from case-control and cohort studies of nutrition and disease. Nutr Cancer 1995; 24: 231–9
  11. Rangel SJ, Kelsey J, Henry MC, Moss RL. Critical analysis of clinical research reporting in pediatric surgery: justifying the need for a new standard. J Pediatr Surg 2003; 38: 1739–43
  12. Begg C, Cho M, Eastwood S et al. Improving the quality of reporting of randomized controlled trials. The CONSORT Statement. JAMA 1996; 276: 637–9
  13. ISI Web of Knowledge. Journal Citation Reports. Thomson Corporation, 2007
  14. PLoS Medicine Editors. Many reviews are systematic but some are more transparent and completely reported than others. PLoS Med 2007; 4: e147
  15. Bossuyt PM, Reitsma JB, Bruns DE et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Ann Intern Med 2003; 138: W1–12
  16. Devereaux PJ, Choi PT, El-Dika S et al. An observational study found that authors of randomized controlled trials frequently use concealment of randomization and blinding, despite the failure to report these methods. J Clin Epidemiol 2004; 57: 1232–6
  17. Sanderson S, Tatt ID, Higgins JP. Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography. Int J Epidemiol 2007; 36: 666–76
  18. Von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet 2007; 370: 1453–7
  19. Boutron I, Moher D, Altman DG, Schulz KF, Ravaud P. Extending the CONSORT statement to randomized trials of nonpharmacologic treatment: explanation and elaboration. Ann Intern Med 2008; 148: 295–309
  20. Altman DG, Simera I, Hoey J, Moher D, Schulz K. EQUATOR: reporting guidelines for health research. Lancet 2008; 371: 1149–50



The instrument for assessing the quality of observational study reporting

Background
B1  Can the number of participating centres be determined? (No, Yes)
B2  Can the practice type(s) of the participating centres be determined? (No, Yes)
B3  Can the number of surgeons who participated in the study be determined? (No, Yes)
B4  Is the experience level of the surgeons performing the procedure specified? (No, Yes)
B5  Is the setting/location of collection of subjective data detailed? (No, Yes, N/A)
B6  Scientific background/rationale explained? (No, Yes)
B7  Specific objectives or hypotheses stated (i.e. broadly outlined method for comparison indicated)? (No, Yes)
B8  Are selection and/or exclusion criteria for cases clearly stated? (No, Yes)
Intervention
I1  Is the surgical technique/intervention adequately described (e.g. specifically referenced article)? (No, Yes)
I2  Is there any mention of an attempt to standardize operative technique/application of intervention? If technique is adequately described, standardization may be assumed. (No, Yes)
I3  Is there any mention of an attempt to standardize perioperative care? (No, Yes)
I4  Is the period when all cases were performed clearly stated? (No, Yes)
I5  Can it be determined if patients were or were not treated concurrently within the same periods? (No, Yes)
I6  Is there a clearly defined single primary outcome? (No, Yes)
I7  Methods for assessing outcomes described? (No, Yes)
I8  For studies assessing functional outcomes, is it stated whether the assessment tool is validated? (No, Yes, N/A)
I9  Do the authors describe how patients were chosen into each treatment group (e.g. patient choice or specific criteria)? (No, Yes)
Statistical analysis
S1  Calculation to justify sample size? (No, Yes)
S2  Statistical methods described? (No, Yes)
S3  Statistical software identified? (No, Yes)
S4  Identification of predetermined α (significant P-values)? (No, Yes)
S5  Sided-ness of testing reported? (No, Yes)
S6  Is it stated whether multiple testing is specifically addressed or not addressed? Answer ‘N/A’ only if it is obvious that there was no multiple testing. (No, Yes, N/A)
S7a  Summary of results for each group for the single primary outcome? If no single primary outcome defined, select ‘N/A’ and use Questions S7b-S9b for outcomes reporting assessment. (No, Yes, N/A)
S8a  Effect size (e.g. HR, RR, NNT) provided for the primary outcome? (No, Yes, N/A)
S9a  Precision of effect size (e.g. CI for HR, RR, NNT) provided for the primary outcome? Answer ‘No’ if no effect size given. (No, Yes, N/A)
S7b  Summary of results for each group for secondary/other outcomes? (No, Yes, N/A)
S8b  Effect size (e.g. HR, RR, NNT) provided for secondary/other outcomes? (No, Yes, N/A)
S9b  Precision of effect size (e.g. CI for HR, RR, NNT) provided for the secondary/other outcomes? Answer ‘No’ if no effect size given. (No, Yes, N/A)
Results reporting
R1  Was any attempt made to blind evaluators during the analysis of data? (No, Yes)
R2  Was the patient population from which the cases were selected adequately described or identified (e.g. geographically)? (No, Yes)
R3  Are study capture rates provided? If stated that ‘all’ patients were captured within a given period, then answer ‘Yes.’ (No, Yes)
R4  Are relevant baseline demographic and clinical data given for each group? (No, Yes)
R5  Are actual numbers, alone or in addition to percentages, furnished for all demographic variables? (No, Yes)
R6  Are actual numbers, alone or in addition to percentages, furnished for all results? (No, Yes)
R7  Is the number and nature of complications addressed? (No, Yes)
R8  For longitudinal studies, is attrition of subjects and reason for attrition recorded? (No, Yes, N/A)
R9  Are exact P-values for significant results provided (<0.01 acceptable)? Check text for data not reported in tables/figures. (No, Yes, N/A)
R10  Are exact P-values for insignificant results provided? Check text for data not reported in tables/figures. (No, Yes, N/A)
Discussion
D1  Do the authors address whether there is any missing data? If not explicitly addressed, answer ‘No’ unless it is obvious there is no missing data. (No, Yes)
D2  Interpretation of results provided? (No, Yes)
D3  Explicitly address study hypotheses/objectives? Answer ‘N/A’ if no study hypotheses/objectives were stated (Item B7). (No, Yes, N/A)
D4  Address sources of potential bias/study limitations? (No, Yes)
D5  Explicitly address general applicability or lack of general applicability of findings? (No, Yes)
D6  Interpretation in context of current evidence? (No, Yes)