H van Dongen, Department of Gynaecology, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, the Netherlands. Email H.van_Dongen@lumc.nl
Background This study was conducted to assess the accuracy and feasibility of diagnostic hysteroscopy in the evaluation of intrauterine abnormalities in women with abnormal uterine bleeding.
Search strategy Electronic databases were searched from 1 January 1965 to 1 January 2006 without language selection. The medical subject heading (MeSH) and textwords for the following terms were used: hysteroscopy, diagnosis, histology, histopathology, hysterectomy, biopsy, sensitivity and specificity.
Setting University Hospital.
Selection criteria Studies were included if they reported on the accuracy of diagnostic hysteroscopy in women with abnormal uterine bleeding compared with histology obtained by guided biopsy during hysteroscopy, operative hysteroscopy or hysterectomy.
Data collection and analysis Electronic databases were searched for relevant studies and references were cross-checked. Validity was assessed and data were extracted independently by two authors. Heterogeneity was calculated and data were pooled. Subgroup analysis was performed according to validity criteria, study quality, menopausal state, time, setting and performance of the procedure. Outcome measures were the pooled sensitivity, specificity, likelihood ratios, post-test probabilities and feasibility of diagnostic hysteroscopy in the prediction of uterine cavity abnormalities. Post-test probabilities were derived from the likelihood ratios and the prevalence of intrauterine abnormalities among included studies. Feasibility included technical success rate and complication rate.
Main results One population of homogeneous data could be identified, consisting of patients with postmenopausal bleeding. In this subgroup the positive and negative likelihood ratios were 7.9 (95% CI 4.79–13.10) and 0.04 (95% CI 0.02–0.09), raising the pre-test probability from 0.61 to a post-test probability of 0.93 (95% CI 0.88–0.95) for positive results and reducing it to 0.06 (95% CI 0.03–0.13) for negative results. The pooled likelihood ratios of all studies included, calculated with the random effects model, were 6.5 (95% CI 4.1–10.4) and 0.08 (95% CI 0.07–0.10), changing the pre-test probability of 0.46 to post-test probabilities of 0.85 (95% CI 0.78–0.90) and 0.07 (0.06–0.08) for positive and negative results respectively. Subgroup analyses gave similar results. The overall success rate of diagnostic hysteroscopy was estimated at 96.9% (SD 5.2%, range 83–100%).
Conclusions This systematic review and meta-analysis shows that diagnostic hysteroscopy is both accurate and feasible in the diagnosis of intrauterine abnormalities.
Abnormal uterine bleeding in premenopausal and postmenopausal women is the single most common reason for gynaecological referrals. Polyps and myomas have been reported in more than 40% of referred patients.1 The ultimate gold standard in uterine cavity evaluation is hysterectomy. Hysterectomy cannot, however, be used as a diagnostic tool. Hysteroscopy permits direct visualisation of the cervical canal and uterine cavity, enabling observation of intrauterine abnormalities. An accurate diagnosis may result in surgical or medical treatment directed at the specific pathology and may avoid the need for major surgery. Since Gimpelson and Rappold2 reported that hysteroscopy combined with guided biopsy was more accurate than dilatation and curettage, hysteroscopy has been considered an accurate ‘gold standard’ in uterine cavity evaluation. Despite the lack of adequate information about its diagnostic accuracy, it is used in many studies, with and without endometrial sampling, as a reference standard.3–5 Although a high-quality review of the accuracy of hysteroscopy was published in 2002, that review focused exclusively on studies reporting on the presence or absence of (pre-)malignant disorders of the endometrium.6 It was not until 2003 that a systematic review and meta-analysis assessed the accuracy of hysteroscopy for intracavitary abnormalities in general in premenopausal women with abnormal uterine bleeding.7 That review, however, included only studies written in English and, because of heterogeneity between studies, calculated no positive likelihood ratio. Therefore, the purpose of this systematic review and meta-analysis is to evaluate, without language restriction, the diagnostic accuracy of hysteroscopy in the evaluation of intrauterine abnormalities in premenopausal and postmenopausal women with symptoms of abnormal uterine bleeding.
This review was focused on studies in which the results of the diagnostic hysteroscopy in the evaluation of the uterine cavity were compared to histology. The population of interest was premenopausal or postmenopausal women with symptoms of abnormal uterine bleeding. The main outcome measures of our systematic review and meta-analysis were the accuracy, by means of likelihood ratios and post-test probabilities. Secondary outcome measures were the feasibility by means of technical success rate and complication rate, and the accuracy of hysteroscopy in the diagnosis of endometrial polyps and submucous myomas.
Electronic databases (MEDLINE, EMBASE, Current Contents, Science Citation Index and the Cochrane database) were searched from 1 January 1965 (first MEDLINE citations) to 1 January 2006 without language selection as suggested for an accurate literature search.8 In our search we used the strategy for articles reporting on diagnostic test evaluation as proposed in the literature.9,10 The medical subject heading (MeSH) and text words for the following terms were used in the search strategy: hysteroscopy, diagnosis, histology, histopathology, hysterectomy, biopsy, sensitivity and specificity (Appendix A; search strategy). Reference lists of the included articles were cross-checked in search for additional relevant studies not detected by the literature search.
Two authors (H.vD. and C.dK.) read the abstracts of all retrieved studies and decided about the inclusion. In case of disagreement the article was included for full reading.
Studies were included if they reported on the accuracy of diagnostic hysteroscopy in women with abnormal uterine bleeding compared with histology obtained by guided biopsy during hysteroscopy, operative hysteroscopy or hysterectomy as the reference standard.
Studies on both premenopausal and postmenopausal women were eligible. A study population was considered premenopausal or postmenopausal if more than 70% of the participants were premenopausal or postmenopausal respectively. If the study population could not be classified as either, the study was excluded from the subgroup analysis for this item, as discussed further on. Studies in which fewer than 70% of the women had complaints of abnormal uterine bleeding were excluded. Studies in which more than 5% of the women used tamoxifen, or in which fertility problems were the primary reason for hysteroscopy, were excluded because the different prevalence and pathophysiology influence the outcome measures.11 Studies reporting only on malignancy of the endometrium were excluded as well. This study focused on diagnostic hysteroscopy; therefore studies reporting on diagnostic accuracy based on findings at hysteroscopy specifically designed for therapy were also excluded.
The methodological quality of each selected paper was assessed independently by the two reviewers. The quality assessment tool (an adjusted version of the validity criteria suggested by Devillé et al.,8 Appendix B) contained criteria assessing internal and external validity. The internal validity criteria refer to study characteristics. External validity criteria were used to provide insight into the generalisability of the studies. The quality assessment tool was piloted in a subset of included studies and tested for reliability by means of repeatability of its use and agreement between the two reviewers. Disagreement concerning the quality of the included studies was resolved by consensus. As no evidence exists on how to interpret the scores of validity tools, we decided to consider studies to be of high quality if they scored ten points or more (more than two-thirds of the maximum score). Studies with a quality score between six and nine were considered of moderate quality and studies with a score of five or less were considered of low quality. The mean quality of all included studies was calculated.
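The cut-offs described above can be written down compactly. The following is a minimal sketch, not the authors' tool: the function name `quality_category` is hypothetical, and the valid range of 1 to 14 points is taken from the limits stated in Appendix B.

```python
def quality_category(score: int, min_score: int = 1, max_score: int = 14) -> str:
    """Map a quality score to the category used in this review.

    Cut-offs as stated in the text: >= 10 high, 6-9 moderate, <= 5 low.
    The function name and the range check are illustrative assumptions.
    """
    if not min_score <= score <= max_score:
        raise ValueError("score outside the possible range of the tool")
    if score >= 10:
        return "high"
    if score >= 6:
        return "moderate"
    return "low"


# Example: the extremes of the reported range (4 and 13).
print(quality_category(4), quality_category(13))
```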
We used guidelines for meta-analysis of diagnostic trials as published by others.8,12–14 Two outcomes were considered: intracavitary abnormality and normal uterine cavity. Intracavitary abnormalities included all intrauterine polyps, myomas, synechiae, septae, and (pre-)malignancies. A uterine cavity was considered normal in case of functional or atrophic endometrium. Data were abstracted as 2 × 2 tables of hysteroscopy result (positive or negative for intrauterine abnormality) and the histological results (normal or abnormal) independently by the two reviewers. To define test errors, cases in which the hysteroscopy result was negative for intrauterine abnormalities and the reference standard result was abnormal were regarded as false-negative results. False-positive results were cases in which the hysteroscopy result was abnormal and the reference standard was normal.
Information on menopausal status, the number of women recruited and technical details pertaining to hysteroscopic examination were retrieved from the articles as well. Whenever necessary, authors were contacted and asked to supply additional information. Furthermore, separate 2 × 2 tables were constructed to analyse the diagnostic accuracy specifically for benign intrauterine disorders, such as endometrial polyps and submucous myomas. When 2 × 2 tables contained empty cells, 0.5 was added to each cell to enable calculations. The sensitivity and specificity with 95% CI were calculated from each 2 × 2 table for all included studies. A χ2 statistic, weighted for sample size, was used to evaluate heterogeneity. Subgroup analysis was performed to obtain the largest population with homogeneous data for the meta-analysis. Subgroups were defined before the analysis according to items of the quality assessment tool, quality of the studies, menopausal state, time, setting and performance of the procedure. The summary sensitivity–specificity point, likelihood ratios and post-test probabilities (with 95% CI) were estimated by the fixed effects model whenever heterogeneity could be rejected (χ2: P ≥ 0.05). The post-test probabilities were derived from the likelihood ratios and the pre-test probabilities, the latter being taken as the prevalence of intrauterine abnormalities among included studies. Likelihood ratios greater than 5 or less than 0.2 may be expected to generate moderate to large shifts from pre-test to post-test probability.15 In the presence of heterogeneity across studies, a random effects model was used. This approach produces wider 95% CIs and allows for a more conservative interpretation of the results.8
Spearman’s correlation of sensitivities and specificities was calculated to assess whether studies originated from a single receiver operating characteristic (ROC) curve. If Spearman’s correlation coefficient was <−0.25, studies were plotted in a sensitivity–specificity plot and a summary ROC curve was constructed.
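The per-study calculations described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the software used in the review: the names `Accuracy`, `accuracy_from_2x2` and `post_test_probability` are hypothetical, the 0.5 correction is applied only when a table has an empty cell (as the text describes), and confidence intervals are omitted.

```python
from dataclasses import dataclass


@dataclass
class Accuracy:
    sensitivity: float
    specificity: float
    lr_positive: float
    lr_negative: float


def accuracy_from_2x2(tp: float, fp: float, fn: float, tn: float) -> Accuracy:
    """Sensitivity, specificity and likelihood ratios from one 2 x 2 table."""
    # Add 0.5 to every cell only if at least one cell is empty.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return Accuracy(sens, spec, sens / (1 - spec), (1 - sens) / spec)


def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical 2 x 2 table (not data from any included study).
example = accuracy_from_2x2(tp=90, fp=10, fn=10, tn=90)
print(example)

# Pre-test probability and positive likelihood ratio reported for the
# postmenopausal subgroup; the result rounds to the reported 0.93.
print(round(post_test_probability(0.61, 7.9), 2))
```

Working on the odds scale keeps the update a single multiplication, which is why likelihood ratios are convenient for moving from prevalence to post-test probability.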
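The decision rule above can be illustrated with a small, dependency-free sketch. The helper names (`average_ranks`, `spearman`, `summary_roc_warranted`) are hypothetical, ties are handled with average ranks, and the example values are invented for illustration rather than taken from the included studies.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks.

    Raises ZeroDivisionError if all values in x or in y are identical.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def summary_roc_warranted(sensitivities, specificities, threshold=-0.25):
    """Apply the rule in the text: construct a summary ROC curve if rho < -0.25."""
    return spearman(sensitivities, specificities) < threshold


# Invented values showing a clear trade-off between sensitivity and specificity.
print(summary_roc_warranted([0.90, 0.95, 0.99], [0.95, 0.90, 0.80]))
```

A clearly negative correlation suggests that studies differ mainly in their implicit threshold, which is the situation a summary ROC curve is meant to describe.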
Hysteroscopic procedures that failed to make a final diagnosis because of technical aspects, inadequate visualisation or participant factors were categorised as failed procedures and recorded. Complication rates were recorded as well.
Of 409 retrieved abstracts, 55 relevant articles were selected for full reading. Cross-checking the reference lists of relevant studies yielded another 16 studies, leading to 71 articles for full reading. Fourteen authors were contacted and asked for additional data; 3 responded, whereas 11 did not respond and their studies had to be excluded. In total, 54 studies were excluded for various reasons, as shown in Figure 1.2,16–68 Finally, 17 studies with 4208 procedures remained and were included in our meta-analysis.69–85 The characteristics of these studies are shown in Table 1 and the sensitivity–specificity plots are displayed in Figure 2. The mean quality of the included studies was 8.7 (median 8, range 4–13, maximum possible score 14). The intra-observer and inter-observer agreement of the quality assessment (intra-class correlation alpha with 95% CI) was 0.77 (95% CI 0.71–0.84) and 0.72 (95% CI 0.65–0.79), respectively.
Table 1. Methodological details of all included studies
The results of the heterogeneity analysis are shown in Table 2. Five studies that included women with postmenopausal bleeding were found to have homogeneous data.73,78,79,82,84 In this group the pooled sensitivity of diagnostic hysteroscopy in the assessment of the uterine cavity was 0.96 (95% CI 0.93–0.99) and the pooled specificity 0.90 (95% CI 0.83–0.95). The positive and negative likelihood ratios were 7.9 (95% CI 4.79–13.10) and 0.04 (95% CI 0.02–0.09) respectively. The pre-test probability (prevalence) of uterine cavity abnormalities in this subgroup was 0.61 (95% CI 0.25–0.97) and changed to post-test probabilities of 0.93 (95% CI 0.88–0.95) and 0.06 (95% CI 0.03–0.13) for positive and negative results respectively. A summary ROC curve could not be constructed because the Spearman’s correlation coefficient of the sensitivities and specificities was −0.19.
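As a worked check, the reported post-test probabilities for this subgroup follow from the pre-test probability and the likelihood ratios via the odds form of Bayes' theorem. The intermediate values are rounded here for display.

```latex
\[
\text{odds}_{\text{pre}} = \frac{0.61}{1 - 0.61} \approx 1.56
\]
\[
\text{odds}_{\text{post}}^{+} = 1.56 \times 7.9 \approx 12.4,
\qquad
p_{\text{post}}^{+} = \frac{12.4}{1 + 12.4} \approx 0.93
\]
\[
\text{odds}_{\text{post}}^{-} = 1.56 \times 0.04 \approx 0.063,
\qquad
p_{\text{post}}^{-} = \frac{0.063}{1 + 0.063} \approx 0.06
\]
```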
Table 2. Heterogeneity analysis and correlation of sensitivity and specificity of included studies
Columns: number of studies; number of procedures; sensitivity P-value (χ2 test); specificity P-value (χ2 test).
Subgroups (rows): gold standard operative hysteroscopy or guided biopsy; gold standard hysterectomy; explicit definition of normal/abnormal; avoidance of verification bias; independent interpretation of tests; inclusion and exclusion criteria mentioned; premenopausal women only; postmenopausal women only; patients scheduled for surgery only; distension with saline fluid; distension with carbon dioxide; follicular phase of cycle; time of verification; reference test similar; quality score ≥10; quality score ≥8; quality score ≥7.
Heterogeneity remained within the other specified groups; we therefore decided to pool all included studies using the random effects model. The prevalence of uterine abnormalities was 46.6% (95% CI 22–67%). The pooled likelihood ratios of all studies included were 6.5 (95% CI 4.1–10.4) and 0.08 (95% CI 0.07–0.10) for positive and negative results respectively. The pre-test probability of 0.46 changed to post-test probabilities of 0.85 (95% CI 0.78–0.90) and 0.07 (95% CI 0.06–0.08) for positive and negative results respectively. The pooled likelihood ratios and post-test probabilities of the subgroup analyses, according to the quality assessment score and menopausal state, are detailed in Table 3.
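The text does not state which random effects estimator was used. As one common possibility, the sketch below pools log likelihood ratios with the DerSimonian-Laird method; this choice is an assumption, as are the function name `pool_random_effects` and the example inputs, which are invented and not taken from the included studies.

```python
import math


def pool_random_effects(log_estimates, variances, z=1.96):
    """DerSimonian-Laird pooling of log-scale estimates (assumed method).

    Returns (pooled, lower, upper) on the original ratio scale.
    """
    weights = [1 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, log_estimates)) / sum(weights)
    # Cochran's Q measures dispersion around the fixed effects estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_estimates))
    df = len(log_estimates) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, log_estimates)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return tuple(math.exp(v) for v in (pooled, pooled - z * se, pooled + z * se))


# Invented positive likelihood ratios and variances of their logarithms.
ratios = [4.0, 6.5, 12.0]
log_variances = [0.05, 0.04, 0.09]
print(pool_random_effects([math.log(r) for r in ratios], log_variances))
```

Adding the between-study variance to each study's own variance flattens the weights and widens the interval, which is the more conservative behaviour the methods section attributes to the random effects model.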
Table 3. Results of meta-analysis on the diagnostic accuracy of hysteroscopy in the evaluation of the uterine cavity stratified by different subgroups
Columns: number of procedures; likelihood ratio (95% CI); post-test probability (95% CI).
Fixed effects model was used to calculate pooled likelihood ratios.
In a separate analysis we calculated the pooled sensitivity, specificity, likelihood ratios and post-test probabilities of all studies included for the diagnosis of endometrial polyps and submucous myomas with the random effects model. Unfortunately, some studies had to be excluded because the first author of the article was unable to supply the raw data necessary to split results for endometrial polyps and submucous myomas. For the diagnosis of endometrial polyps (n = 12) the pooled sensitivity was 0.94 (95% CI 0.92–0.96), whereas the specificity was 0.92 (95% CI 0.91–0.94). For the diagnosis of submucous myomas (n = 11), the pooled sensitivity was 0.87 (95% CI 0.81–0.92), and the specificity 0.95 (95% CI 0.93–0.97). The corresponding likelihood ratios, post-test probabilities and prevalence are depicted in Table 4.
Table 4. Results of meta-analysis on the diagnostic accuracy of hysteroscopy in the evaluation of endometrial polyps and submucous myomas by the random effects model
Columns: likelihood ratio (95% CI); post-test probability (95% CI).
Fixed effects model was used to assess pooled likelihood ratios.
Failure rates were clearly reported in 12 (71%) of the 17 studies. The overall success rate of diagnostic hysteroscopy was estimated at 96.6% (SD 3.6%) when studies with unclear reporting were excluded. For premenopausal women the success rate weighted for the number of procedures was 96.8% (SD 2.7%), which was significantly higher (P = 0.002) than for postmenopausal women (success rate 95.6%, SD 6.4%). If participants were not weighted by the number of procedures, there was no significant difference found between premenopausal and postmenopausal women. The success rate of outpatient (office) procedures was estimated at 96.1% (SD 3.8%). In 1399 procedures 16 complications were reported (1.0%, SD 1.6%, range 0–4%): 13 vasovagal collapses, two false tracts and one perforation of the uterine wall.
This systematic review and meta-analysis of diagnostic hysteroscopy for premenopausal and postmenopausal women with abnormal uterine bleeding provides information from several comparative studies of hysteroscopy and histology collected at hysterectomy, operative hysteroscopy or guided biopsy as reference tests. It shows that diagnostic hysteroscopy is accurate in the diagnosis of intrauterine abnormalities and therefore clinically useful. Moreover, in accordance with others,6,86 our review confirms that diagnostic hysteroscopy is safe, with a low incidence of serious complications and a small failure rate.
The prevalence of intrauterine abnormalities in our review of women with abnormal uterine bleeding was 46.6%, which is consistent with previously published literature.1 The likelihood ratios were in the range that suggests that diagnostic hysteroscopy is useful both in predicting disease and in ruling it out.15 A separate analysis of the accuracy for endometrial polyps and submucous myomas did not reveal any difference. As missing endometrial polyps in postmenopausal women may result in undiagnosed malignant disorders, a subanalysis was performed, which showed similar results. Likewise, Clark et al.6 showed in their meta-analysis that diagnostic hysteroscopy is accurate in the diagnosis of endometrial cancer.
It has been suggested that a thick endometrium obscures a complete view of the uterine cavity, which would especially hamper accurate detection of intrauterine abnormalities.87 Therefore we pooled studies that performed hysteroscopy solely in the follicular phase of the menstrual cycle. Unfortunately, this failed to result in a clinically significant increase of the post-test probability, so an evidence-based recommendation on this subject cannot be made yet. Nevertheless, to achieve optimal visualisation it is practical to schedule diagnostic hysteroscopy in the follicular phase of the cycle.88
Generally, when all studies are pooled rather than a selected group, one expects a more precise but more conservative result. In this review the accuracy estimates obtained by pooling all studies were not as good as those of the homogeneous studies. The homogeneous population consists of postmenopausal women, so the difference may reflect better accuracy in the postmenopausal state. Nevertheless, if we compare the likelihood ratios of postmenopausal with those of premenopausal women, this was the case only for the negative estimate. It is therefore more likely that the different models used to calculate the pooled likelihood ratios, and the different quality of the studies included in the two subgroups, are responsible for this phenomenon.
Further, although we found in this review a significantly better success rate of diagnostic hysteroscopy among premenopausal women than among postmenopausal women, this difference was only 1% and therefore clinically not of any importance.
Also noteworthy is that 22% of the articles selected for full reading were obtained by cross-checking reference lists of included studies. Although this may imply a poor search strategy, it is more likely that these reports were poorly indexed, which is often the case for older reports on diagnostic accuracy.8 Moreover, none of the studies identified by cross-checking eventually met our inclusion criteria, and all were excluded.
As meta-analyses often include small numbers of studies, the power of the χ2 tests is low, and so they are poor at detecting true heterogeneity among studies.89 An alternative approach to quantify the effect of heterogeneity is the I2 index, which describes the percentage of total variation across studies that is due to heterogeneity rather than chance.90 In this review the I2 index revealed no differences compared with the χ2 tests for heterogeneity (data not shown).
The differences in results among the individual included studies give reason to criticise our review. Homogeneity of results from study to study is one of the criteria for meta-analysis, but the presence of inconsistency does not always invalidate a meta-analysis. In this situation, it is important to consider possible reasons for heterogeneity. We examined the sources of heterogeneity in accordance with published guidelines, taking into account differences in methodological quality and study characteristics.8,91 The quality of the included studies indeed varied considerably. Nevertheless, subgroup analyses regarding quality revealed no specific sources of heterogeneity. Investigation of heterogeneity is, however, often limited without access to individual participant data.92
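The relationship between the heterogeneity χ2 statistic and the I2 index can be sketched as follows. This is an illustration rather than the authors' calculation: it assumes inverse-variance weights for Cochran's Q (the review weighted by sample size), and the function names `cochran_q` and `i_squared` are hypothetical.

```python
def cochran_q(estimates, variances):
    """Cochran's Q with inverse-variance weights (an assumed weighting)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))


def i_squared(q: float, n_studies: int) -> float:
    """I2 = 100 * (Q - df) / Q, truncated at zero, with df = n_studies - 1."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical values: Q = 20 across 11 studies gives 10 degrees of freedom.
print(i_squared(20.0, 11))  # 50.0
```

Because I2 rescales Q by its degrees of freedom, it is less tied to the number of studies than the P-value of the χ2 test, which is the motivation given in the text for reporting it.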
Heterogeneity may also be caused by clinical differences.93 Variations in the study population among studies can all result in different estimates of diagnostic accuracy. An explanation for these differences might be the fact that disease status is defined by use of different diagnostic thresholds to define positive and negative results.94 The primary outcome in our meta-analysis was presence or absence of intrauterine abnormalities instead of a certain threshold. Nevertheless, this is prone to a certain amount of subjectivity and could have introduced heterogeneity. Unfortunately, definition of diseased state was poorly reported and could not be solved as we had no access to individual data.
As the number of studies included in this review was rather small, it was not useful to examine sources of heterogeneity as thoroughly as possible, as the number of available data points would have limited its significance. To be able to draw conclusions, we decided to base our inferences on the overall pooled results calculated by the random effects model.95
The potential bias due to variation in histological verification and lack of blinding in its assessment needs to be discussed as well. Hysterectomy specimens are regarded as the criterion standard for verification of intrauterine diseases, but exclusive use of this reference standard in a diagnostic study is not feasible.6 Therefore it is not surprising that many included studies obtained histology by guided biopsy. If the phenomenon of an imperfect gold standard is ignored, there will be a tendency to underestimate the diagnostic performance of the investigated test.96 Conversely, if the reference test is interpreted with knowledge of the outcome of the index test, test accuracy is overestimated.8
With regard to diagnostics of the uterine cavity, it is noteworthy that recently a meta-analysis on the accuracy of saline infusion sonography in women with abnormal uterine bleeding reported a sensitivity of 0.95 and a specificity of 0.88, equalling the accuracy of diagnostic hysteroscopy in our review (0.94 and 0.89, respectively).11 It is thought that saline infusion sonography reduces costs and discomfort for women concerned.97–100 Nowadays, diagnostic hysteroscopy is performed according to the so-called vaginoscopic approach without use of speculum or tenaculum, reducing discomfort significantly.101 Furthermore, as a result of recent advances in endoscopic instrumentation there is evidence suggesting that outpatient therapeutic hysteroscopic procedures provide significant cost savings and are preferred by women compared to day case procedures.102,103 Whether these improvements make diagnostic hysteroscopy comparable to saline infusion sonography regarding cost-effectiveness and patient compliance remains unclear.
This systematic review and meta-analysis shows that diagnostic hysteroscopy is both accurate and feasible in the diagnosis of intrauterine abnormalities. As diagnostic hysteroscopy is predominantly performed in the outpatient clinic, and therapy in an inpatient setting, an accurate diagnosis is important to direct treatment at the specific pathology and avoid needless surgery. Moreover, it may also contribute to the prognosis of expected quality of life (e.g. regarding complaints).
The authors would like to thank Mr J.W. Schoones of the Walaeus Library for his pleasant and very useful help with the electronic literature searches.
Appendix A. Search strategy in PubMed (MEDLINE) for publications about the evaluation of diagnostic accuracy
The strategy cited below was also used for EMBASE, Current Contents, Science Citation Index and the Cochrane database, adjusted according to the specific requirements of each electronic database.
(((hysteroscopy AND diagnosis) OR (hysteroscop* AND diagnos*)) AND (“cytology”[Subheading] OR histology OR histopathology OR histolog* OR histopatholog* OR “Biopsy”[MeSH] OR biopsy OR biops* OR “Hysterectomy”[MeSH] OR hysterectom* OR hysterectomy OR “Curettage”[MeSH] OR curettage) AND (“Sensitivity and Specificity”[MeSH] OR “Sensitivity and Specificity” OR (Sensitivity AND Specificity) OR “Predictive Value of Tests”[MeSH] OR Predictive Value of Tests)) OR ((hysteroscopy OR hysteroscop*) AND (“cytology”[Subheading] OR histology OR histopathology OR histolog* OR histopatholog* OR “Biopsy”[MeSH] OR biopsy OR biops* OR “Hysterectomy”[MeSH] OR hysterectom* OR hysterectomy OR “Curettage”[MeSH] OR curettage) AND (“Sensitivity and Specificity”[MeSH] OR “Sensitivity and Specificity” OR (Sensitivity AND Specificity)))
Appendix B. Quality assessment form (adjusted according to Devillé et al.8; respective points in brackets, minimum score 1 point, maximum score 14 points)
Criteria of internal validity
1. Reference standard
Hysteroscopy with biopsy (1)
2. Reference test similar within study
One reference standard (1)
3. Definition of abnormal/normal
Definition of polyp and fibroid (1)
4. Independent interpretation
Interpretation reference blinded for index (1)
5. Avoidance of verification bias
Verification independent of result (1)
Prospective (consecutive series), retrospective (0)
Criteria of external validity
7. Setting mentioned
Information to identify setting (outpatient, inpatient)
Failure rate mentioned (1)
9. Reason for referral
Inclusion criteria mentioned (1)
Exclusion criteria mentioned (1)
10. Information on index test
Information about procedure (in)directly available (1)