Endometriosis is a common gynaecological condition found in women of reproductive age.1–3 The condition is associated with chronic pelvic pain and infertility, which can result in reduced quality of life, psychological morbidity and work absenteeism.4 Diagnosis is invariably made following laparoscopic inspection of the pelvis. The appearance of endometriosis varies, but classically is seen as areas of discolouration (so-called implants or deposits) or peritoneal defects and scarring. More extensive disease can lead to formation of adhesions and cysts. Both medical and surgical treatments for endometriosis are associated with considerable morbidity and frequently short term relief of chronic pelvic pain symptoms.5,6 In subfertility, laparoscopic ablative treatment is associated with improved reproductive outcome.4,7 Accurate diagnosis is therefore essential to optimally target women likely to benefit from treatment, to reduce unnecessary morbidity and to efficiently use health service resources.
Advances in endoscopic instrumentation have facilitated peritoneal tissue biopsy for histological assessment. The existing data supporting the use of laparoscopy in the visual diagnosis of endometriosis have seldom been validated against histological findings confirmed independently using an established gold standard. Individual studies on this subject are small, leading to imprecise estimates of diagnostic accuracy. We therefore undertook a quantitative systematic review to obtain more precise accuracy estimates and to explore reasons for heterogeneity.
General bibliographic databases MEDLINE (1966–December 2003) and EMBASE (1980–December 2003) were searched. Pilot searches suggested that the following search strategy gave reasonable precision without compromising sensitivity: All medical subject headings (MeSH) and textwords for the terms biopsy, histology or pathology were combined with the MeSH term endometriosis [classification and diagnosis]. This allowed us to capture studies where diagnosis of endometriosis was verified by histology. The search was limited to human studies with no language restrictions. Studies addressing the relevant diagnostic technology (laparoscopy) were then identified on completion of the initial search phase by examining all the retrieved citations. Reference lists of all known reviews and primary studies were checked and the specialist journal Gynaecological Endoscopy hand searched for relevant articles.
Selection of studies was achieved in a two-stage process by two independent reviewers (CBW and TJC). Stage I involved identifying potentially relevant titles and abstracts from the bibliographic database searches; papers were provisionally included unless they could be clearly excluded as not addressing the issue at hand. The test of interest was diagnostic laparoscopy and, in order to utilise all available data, the population was not restricted, although the underlying indication for laparoscopy was recorded. Appropriate studies were test accuracy studies where the results of the diagnostic test of interest (i.e. visual diagnosis of endometriosis at laparoscopy) were compared with the results of a reference standard based on histology. Studies where verification of diagnosis by the reference standard was dependent upon the test result (i.e. ‘positive’ or ‘negative’ test) were also included, but classified as ‘incompletely verified’. The full text of all articles provisionally included at Stage I was retrieved (Stage II). Final inclusion or exclusion decisions were made on the basis of the listed criteria of population, test and reference standard. Disagreements were resolved by consensus or arbitration by a third reviewer. The strength of agreement between reviewers, taking into account the play of chance, was computed using the kappa statistic (agreement is considered good if >0.6 and very good if >0.8).8
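The chance-corrected agreement described above can be illustrated with a minimal sketch of the unweighted Cohen's kappa for two reviewers. The function name and the counts in the example table are hypothetical and are not the review's data. For a two-category include/exclude decision, the weighted and unweighted statistics coincide.

```python
def cohens_kappa(table):
    """Unweighted Cohen's kappa from a square agreement table.

    table[i][j] is the number of citations that reviewer A placed in
    category i and reviewer B placed in category j.
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    # Observed agreement: proportion of citations on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / total
    # Agreement expected by chance, from the marginal totals.
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts: rows are reviewer A (include, exclude),
# columns are reviewer B (include, exclude).
example_table = [[40, 10], [5, 45]]
kappa = cohens_kappa(example_table)
print(f"kappa = {kappa:.2f}")
```

With these illustrative counts, observed agreement is 0.85 and chance agreement is 0.50, so kappa falls in the range labelled good by the thresholds quoted above.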
All papers meeting the eligibility criteria were assessed for their methodological quality, which involved scrutinising study designs and the relevant features of population, test and reference standard.9,10 These features included method of data collection and patient selection, population spectrum (full description of clinical presentation), description of the diagnostic test (technique, diagnostic criteria including lesion type and extent) and histological reference standard sufficient to allow replication, presence of verification bias (completeness and timing of verification of test result by the reference standard) and blinding.11 Histological definition of endometriosis was regarded as adequate if both glands and stroma were present in a sufficient biopsy specimen. Withdrawal of women from the study, missing data and lack of outcome data were categorised as lost to follow up. In addition to assessing the above features in isolation, the following quality hierarchy of diagnostic evidence was also used, which incorporated the above items12:
I. An independent, blind comparison with reference standard among an appropriate population of consecutive patients.
II. An independent, blind comparison with reference standard among an appropriate population of non-consecutive patients or confined to a narrow population of study patients.
III. An independent, non-blind comparison with reference standard among an appropriate population of consecutive patients.
IV. An independent, non-blind comparison with reference standard among an appropriate population of non-consecutive patients or confined to a narrow population of study patients.
V. An independent, blind comparison among an appropriate population of patients, but reference standard not applied to all study patients.
The assessment of English language papers was performed by two reviewers independently (CW and TJC) and foreign language papers by one reviewer (CW) following translation where necessary. Any disagreement was resolved by consensus.
Data were abstracted as two by two tables of the laparoscopy result (positive or negative for endometriosis) and the histology result (positive or negative for endometriosis) wherever possible. This allowed us to calculate the true positive rate (sensitivity), false positive rate (1 − specificity) and likelihood ratios (LRs) for each study. Where more than one biopsy specimen was taken from a normal or abnormal area of peritoneum, the unit of measure to derive the accuracy characteristics was taken to be the particular peritoneal area under scrutiny rather than the patient.
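The accuracy measures derived from each two by two table can be sketched as follows. This is a minimal illustration using standard definitions; the function name and the counts are hypothetical and do not come from any included study.

```python
def accuracy_from_two_by_two(tp, fp, fn, tn):
    """Accuracy measures from a two by two table.

    tp: laparoscopy positive, histology positive
    fp: laparoscopy positive, histology negative
    fn: laparoscopy negative, histology positive
    tn: laparoscopy negative, histology negative
    Assumes every cell is non-zero; a zero cell would need a
    continuity correction, which is not shown here.
    """
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)    # equals 1 - specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": false_positive_rate,
        "lr_positive": sensitivity / false_positive_rate,
        "lr_negative": (fn / (tp + fn)) / specificity,
    }


# Hypothetical counts, for illustration only.
result = accuracy_from_two_by_two(tp=90, fp=20, fn=10, tn=80)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

When the unit of measure is the peritoneal area rather than the patient, the same function applies with each cell counting biopsy sites instead of women.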
Computation of two by two tables was precluded in studies where data were restricted to either positive test results or negative test results. Among such studies, data were extracted to allow positive and negative predictive values to be calculated respectively.
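For studies restricted to one test result, only a predictive value can be computed. A minimal sketch follows, with hypothetical function names. The two worked figures are the counts given in the Results for the single study reporting accuracy by rAFS stage.

```python
def positive_predictive_value(confirmed_positive, test_positive):
    """Proportion of visually positive findings confirmed by histology."""
    return confirmed_positive / test_positive


def negative_predictive_value(confirmed_negative, test_negative):
    """Proportion of visually negative findings confirmed as disease free."""
    return confirmed_negative / test_negative


# Counts reported in the Results for the staging comparison.
ppv_minimal_mild = positive_predictive_value(40, 47)
ppv_moderate_severe = positive_predictive_value(17, 18)
print(round(ppv_minimal_mild, 2), round(ppv_moderate_severe, 2))
```

Unlike sensitivity and specificity, predictive values depend on the prevalence of endometriosis in the population studied, so values from different studies are not directly comparable.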
Laparoscopic procedures failing to make a final diagnosis because of technical aspects (e.g. failed pneumoperitoneum, inadequate visualisation) were categorised as failed procedures. Failure rates were recorded but excluded from two by two tables, whereas indeterminate test results (inability to make a diagnosis with acceptable laparoscopic visualisation) were used in a sensitivity analysis including them along with negative results. This was planned in order to determine the impact of these results on test performance, thereby preventing biased assessment of test characteristics. Information on the number of women recruited and whose outcome data were known was also sought from the manuscripts. In addition, details about the type and site of lesion(s) and their extent (according to the revised American Fertility Society Classification [rAFS])13 were sought to examine their effects on accuracy in sensitivity analyses. Complication data relating to the diagnostic procedure or arising from acquiring tissue for the reference standard (peritoneal biopsy) were also extracted where recorded.
Meta-analysis to produce summary pooled estimates of sensitivity and specificity (weighted by the number of biopsy specimens or women accordingly) was performed if these measures were found to behave independently, as indicated by lack of statistical correlation between them. In the presence of an apparent association between sensitivity and specificity, a summary receiver operating characteristic (ROC) curve was planned to be generated. However, estimates of sensitivity and specificity or production of summary ROC curves have limited value in clinical interpretation. Moreover, there was no significant association between sensitivity and specificity (Spearman's correlation coefficient r= 0.4, P= 0.42), so a summary ROC curve was not generated.14 Summary LRs for both positive and negative test results along with their 95% confidence intervals (CI) were generated as the principal measures of diagnostic accuracy based on the recommendations of the various evidence-based medicine groups.15–17
The LRs indicate by how much a given laparoscopy finding raises or lowers the probability of having endometriosis. This is important in clinical decision making because the estimated probability of disease (or not having disease) is a prime factor determining whether to withhold treatment, undertake further diagnostic testing or treat without further testing.18 Thus, the generation of LRs and post-test probabilities represents a more relevant method of establishing the utility of a test and reduces the risk of erroneous inferences being drawn.19 Pooling of LRs was performed by weighting the log LR from each study in inverse proportion to its variance, using a random effects model in the presence of heterogeneity.
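The pooling step can be sketched as below. The text does not name the random effects estimator, so the DerSimonian and Laird method is assumed here. The variance formula for the log LR is the standard large-sample expression. The function names and the two by two counts are hypothetical.

```python
import math


def log_lr_positive(tp, fp, fn, tn):
    """Return (log LR+, variance of log LR+) for one two by two table."""
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    log_lr = math.log(sensitivity / false_positive_rate)
    variance = 1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn)
    return log_lr, variance


def pool_random_effects(effects, variances):
    """Inverse-variance pooling with a DerSimonian-Laird tau^2 (assumed)."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q measures dispersion around the fixed effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    scale = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / scale) if df > 0 and scale > 0 else 0.0
    # Between-study variance is added to each within-study variance.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return pooled, se, tau2


# Hypothetical studies as (tp, fp, fn, tn), for illustration only.
tables = [(90, 20, 10, 80), (45, 15, 5, 35)]
effects, variances = zip(*(log_lr_positive(*t) for t in tables))
pooled, se, tau2 = pool_random_effects(effects, variances)
pooled_lr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
print(f"pooled LR+ = {pooled_lr:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

When the estimated between-study variance is zero, the random effects weights reduce to the fixed effect inverse-variance weights.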
We examined heterogeneity of results between studies both graphically and statistically (using χ2 test). We explored for sources of heterogeneity by subgroup analyses, stratifying the relevant studies according to variation in specific study characteristics [e.g. population, test result (including type, site and extent of lesions), reference standard and study quality]. We explored for publication bias by producing a funnel plot of diagnostic odds ratios against corresponding variances. The adjusted rank correlation method was used to test the correlation between estimated diagnostic odds ratios (ratio of LRs) and their variances.20
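The quantities behind the heterogeneity test and the funnel plot can be sketched as follows. Cochran's Q is assumed as the form of the χ2 statistic, which the text does not specify. It is referred to a χ2 distribution with one fewer degree of freedom than the number of studies. The function names and counts are hypothetical.

```python
import math


def diagnostic_odds_ratio(tp, fp, fn, tn):
    """Diagnostic odds ratio, equal to LR+ divided by LR-."""
    return (tp * tn) / (fp * fn)


def log_dor_variance(tp, fp, fn, tn):
    """Large-sample variance of the log diagnostic odds ratio."""
    return 1 / tp + 1 / fp + 1 / fn + 1 / tn


def cochran_q(effects, variances):
    """Return (Q, degrees of freedom) for inverse-variance weighted effects."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1


# Hypothetical studies as (tp, fp, fn, tn), for illustration only.
tables = [(90, 20, 10, 80), (45, 15, 5, 35)]
log_dors = [math.log(diagnostic_odds_ratio(*t)) for t in tables]
dor_variances = [log_dor_variance(*t) for t in tables]
q, df = cochran_q(log_dors, dor_variances)
print(f"Q = {q:.3f} on {df} degree(s) of freedom")
# Each (log DOR, variance) pair would be one point on a funnel plot.
```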
The search revealed 1426 citations, of which 50 articles were thought to be relevant by both reviewers; full manuscripts of these papers were obtained. Agreement regarding eligibility was 91% (weighted kappa 0.65, 95% CI 0.54–0.76). No additional articles were identified through examination of the reference lists of the known primary publications and review articles. After independent review of the 50 manuscripts, 27 articles (3732 women) were considered to be eligible for inclusion in the review (Fig. 1). Of these studies, four (433 women)21–24 provided complete data for computation of test accuracy. The remaining 23 studies were restricted to positive (21 articles including 2707 women)25–45 or negative (two articles including 159 women)46,47 test results.
Details of the participants, interventions, outcomes and study quality criteria of the studies selected for meta-analyses are summarised in Tables 1 and 2. Overall compliance with methodological quality criteria of selected studies was poor. No study was of the highest methodological quality (level 1), two studies (7%) were classified as level 2, two studies (7%) as level 3 and the remaining 23 studies (86%) as level 4 or level 5, of which 11 studies (41%) were level 5.
Table 1. Study characteristics.
[Table body not recoverable; column headings: Study (year published); No. of patients.]
Footnotes to Table 1:
Numbers in parentheses are those patients diagnosed as having endometriosis on laparoscopy.
Numbers in parentheses are the number visually diagnosed at laparoscopy, the remainder being visually diagnosed on laparotomy.
Complete biopsy data available for 214 patients (352 biopsies). Biopsy data incompletely reported for remaining 20 patients.
The overall weighted sensitivity was 94% (95% CI 80–98%) and specificity was 79% (95% CI 67–87%) according to the four completely verified studies of laparoscopy for endometriosis (Table 3). Table 3 also presents accuracy data according to final diagnosis using the patient (rather than biopsy specimens) as the unit of measure. No substantial alterations in accuracy were noted when accuracy characteristics were calculated in this manner.
Table 3. Diagnostic accuracy of laparoscopy for endometriosis from studies with complete verification.
[Table body not recoverable; column headings: Endometriosis (biopsy specimen); +ve test (sensitivity); −ve test (specificity); +ve test (sensitivity); −ve test (specificity).]
The numbers shown in brackets are the figures from which the sensitivity and specificity were calculated (test/reference).
Individual study prevalence of endometriosis varied between 18% and 77% reflecting variation in population spectrum. The individual and pooled LRs for endometriosis are shown in Fig. 2. Assuming a 10% pre-test probability of endometriosis, a positive laparoscopy increases the likelihood of disease to 32% (95% CI 21–46%) and a negative laparoscopy decreases this likelihood to 0.7% (95% CI 0.1–5.0%). Table 4 shows the effect of a positive or negative laparoscopy result on post-test probability of endometriosis, according to various pre-test probabilities (disease prevalence of 10%, 20% and 50%, respectively).
Table 4. Risk of woman having endometriosis following a diagnostic laparoscopy stratified by pre-test probability of endometriosis. The following equation was used for calculating post-test probability: Post-test Probability = Likelihood Ratio × Pre-test Probability/[1 − Pre-test Probability × (1 − Likelihood Ratio)] where likelihood ratios (95% CI) for diagnostic laparoscopy are LR+ 4.30 (2.45–7.55)/LR− 0.06 (0.01–0.47). Ranges of post-test probability were calculated by using lower and upper limits of 95% confidence intervals of likelihood ratios.
[Table body not recoverable; column headings: Pre-test probability of endometriosis; Post-test probability [% (range)] for Test+ (abnormal result) and Test− (normal result).]
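The equation given in the caption of Table 4 can be applied directly, as in the minimal sketch below. The LRs and pre-test probabilities are those stated in the text; the function and variable names are hypothetical.

```python
def post_test_probability(pre_test, likelihood_ratio):
    """Post-test probability from the equation in the Table 4 caption."""
    return likelihood_ratio * pre_test / (1 - pre_test * (1 - likelihood_ratio))


# Pooled likelihood ratios as reported in the Table 4 caption.
LR_POSITIVE = 4.30
LR_NEGATIVE = 0.06

after_positive = post_test_probability(0.10, LR_POSITIVE)
after_negative = post_test_probability(0.10, LR_NEGATIVE)
print(f"10% pre-test, test positive: {after_positive * 100:.0f}%")
print(f"10% pre-test, test negative: {after_negative * 100:.1f}%")

# The same calculation for each pre-test probability considered in Table 4.
for pre_test in (0.10, 0.20, 0.50):
    positive = post_test_probability(pre_test, LR_POSITIVE)
    negative = post_test_probability(pre_test, LR_NEGATIVE)
    print(pre_test, round(positive, 3), round(negative, 4))
```

At a 10% pre-test probability this reproduces the 32% and 0.7% quoted in the text. The ranges follow from substituting the confidence limits of each LR into the same function.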
Heterogeneity of diagnostic performance between studies was present as confirmed by a statistically significant χ2 test and this remained for positive laparoscopy results following subgroup analyses according to items of study quality. However, for negative test results, a potential explanation for heterogeneity in accuracy was provided by study quality. The pooled LR estimate for a negative test result from the two studies of higher quality (level 2) was marginally more conservative at 0.07 (95% CI 0.02–0.24, χ2 test for heterogeneity P= 0.1). A funnel plot (not shown) indicated that smaller studies tend to report better diagnostic test performance, although the correlation was not statistically significant (rank correlation r=−0.8, P= 0.3) and so publication bias is unlikely to be a problem.
Of the 23 remaining studies which were unsuitable for meta-analysis (studies with incomplete verification), 21 were restricted to positive test results and 2 to negative test results. The median PPV was 0.76 (range 0.25–1.0) regardless of whether patient or biopsy specimen was taken as the unit of measure (Table 5). Median values were not significantly different when data were stratified by population characteristics (pain, infertility, mixed or unreported presenting symptoms). Negative predictive values of 0.02 and 0.08 were reported in the two remaining studies.
Table 5. Diagnostic accuracy of laparoscopy from studies with incomplete verification. The numbers shown in brackets are the figures from which the positive and negative predictive values were calculated (test/reference).
In six studies,24,29,37,40,42,45 accuracy data were reported according to location within the pelvic cavity (lesion site) and in nine studies21,24,25,34,36,37,42,44,45 according to morphological appearance (lesion type). No consistent trend in accuracy was confirmed due to the paucity of comparative data between studies and data pooling was thus not feasible (data not presented, but available from the authors). The effect of disease staging (rAFS classification) was reported in a single study. The predictive ability of a positive laparoscopy was better with increased severity of endometriosis [40/47 (0.85) for minimal/mild disease vs 17/18 (0.94) for moderate/severe disease]. No study reported any failed laparoscopic instrumentation or diagnostic procedures. No major direct complications were reported.
Our review shows that there is very little good quality literature assessing the value of visual diagnosis of endometriosis at laparoscopy. Among the available studies, a negative diagnostic laparoscopy seems to be highly accurate for excluding endometriosis and is thereby useful to the clinician in aiding decision-making. A positive laparoscopy is less informative and of limited value when used in isolation. For example, if we assume a 20% population prevalence of endometriosis, a positive finding on laparoscopy will be incorrect in about half of the cases. There is therefore significant potential for unnecessary clinical morbidity as a consequence of unnecessary treatments based on false positive diagnoses. No test failures or serious complications were reported, suggesting that laparoscopy is both a successful and safe diagnostic intervention, although reporting bias is likely as recording of failures and complications was unclear in some studies.
The strength of our review is based on the compliance with criteria for performing a rigorous systematic review,9,48,49 which included, among others, the use of study quality assessment and exploration of heterogeneity by planned pre-specified subgroup analyses.50,51 Homogeneity of results from study to study is one of the criteria for meta-analysis, but the presence of inconsistency does not always invalidate a meta-analysis. In this situation, it is important to consider possible reasons for heterogeneity and so try to explain it. However, the lack of a satisfactory explanation for heterogeneity when dealing with positive test results limits the strength of inferences arising from this meta-analysis. The small number of studies available and corresponding sample sizes restricted exploratory subgroup analyses. Furthermore, the availability of details of potential, pre-specified explanatory population and test characteristics (e.g. clinical presentation, type, location and extent of endometriotic lesions) was limited due to inconsistent and incomplete reporting. However, cautious interpretation would demand that we consider the performance of laparoscopy to vary according to these features, although the magnitude and direction of any effect is unclear at present.
Another reason for careful data interpretation in the clinical setting relates to the unit of analysis employed to estimate diagnostic accuracy. In a number of studies, several biopsies were taken from each woman and in others a single biopsy was taken for diagnostic purposes. In the former studies, presentation of data varied (according to biopsy, patient or both) so that it was only possible to compute accuracy based on biopsy specimen as well as final patient diagnosis in seven studies. Some may argue that where several biopsies are obtained, the final diagnosis is the most relevant unit of analysis, as subsequent management will depend upon this. Conversely, limiting presentation of data to a final, often arbitrary, ‘composite’ diagnosis risks masking the true accuracy of visual diagnosis and any variation according to disease spectrum (lesion type and number). We believe that both approaches have merit and have therefore presented accuracy according to specimen and overall patient diagnosis.
The diagnosis of endometriosis in young women often leads to long term medicalisation and increases the likelihood of repeated surgical intervention. Both medical and surgical interventions are associated with significant side effects and costs. Moreover, although the condition is benign, the implications of diagnosis in terms of a woman's perception of her sexuality and fertility may have an adverse psychological impact and be detrimental to her quality of life. The LR for a positive test on laparoscopy (4.30, 95% CI 2.45–7.55) is therefore unlikely to raise the pre-test probability of endometriosis over any threshold for advanced management in most clinicians' practice, unless disease prevalence is very high.18 Additional testing, namely, peritoneal biopsy, will therefore be required when endometriosis is suspected following laparoscopic visual inspection. The case for performing such biopsies is further strengthened when taking into account technological advances in endoscopic equipment and instrumentation, which have facilitated laparoscopic intervention. In contrast, a woman with a negative laparoscopy can be adequately reassured without the need for further testing.
This review highlights the paucity of high quality accuracy studies in this field. Endometriosis is a common condition associated with substantial morbidity. High quality diagnostic studies are thus urgently required. They should be performed according to the recently published STARD criteria52,53 to avoid the methodological deficiencies discovered in this systematic review. In particular, the population spectrum should reflect standard practice and be adequately described, the features that constitute a normal or abnormal laparoscopy should be explicit and the type, site and extent of disease should be recorded to assess their impact on accuracy. Finally, the test result must be verified in all cases and potential bias due to variation in interpretation of the histological reference standard should be minimised by standardised reporting and by blinding the histological assessment to the preceding laparoscopy result. This review should form the basis for the planning of such studies in the future.
The authors would like to thank Dr Honest for his help in constructing Fig. 2.
Appendix A. Reference list of excluded studies
A1. Martin DC, Ahmic R, el Zeky FA, Vander ZR, Pickens MT, Cherry K. Increased histologic confirmation of endometriosis. J Gynecol Surg 1990;6:275–279.
A2. Bonte H, Chapron C, Vieira M, et al. Histologic appearance of endometriosis infiltrating uterosacral ligaments in women with painful symptoms. J Am Assoc Gynecol Laparosc 2002;9:519–524.
A3. Chapron C, Dumontier I, Dousset B, et al. Results and role of rectal endoscopic ultrasonography for patients with deep pelvic endometriosis. Hum Reprod 1998;13:2266–2270.
A4. Kitawaki J, Kusuki I, Koshiba H, Tsukamoto K, Fushiki S, Honjo H. Detection of aromatase cytochrome P-450 in endometrial biopsy specimens as a diagnostic test for endometriosis. Fertil Steril 1999;72:1100–1106.
A5. Vercellini P, Vendola N, Bocciolone L, Rognoni MT, Carinelli SG, Candiani GB. Reliability of the visual diagnosis of ovarian endometriosis. Fertil Steril 1991;56:1198–1200.
A6. Lin J, Sun C, Zhang X. Clinical evaluation on laparoscopic diagnosis of pelvic endometriosis by heat-colour test with endocoagulation. Zhonghua Fu Chan Ke Za Zhi 1997;32:280–283.
A7. Nakamura M, Katabuchi H, Tohya T, Fukumatsu Y, Matsuura K, Okamura H. Scanning electron microscopic and immunohistochemical studies of pelvic endometriosis. Hum Reprod 1993;8:2218–2226.
A8. Sun AD. Laparoscopy for the diagnosis of endometriosis: analysis of 254 cases. Zhonghua Fu Chan Ke Za Zhi 1991;26:24–27, 61.
A9. Vercellini P, Trespidi L, De Giorgi O, Cortesi I, Parazzini F, Crosignani PG. Endometriosis and pelvic pain: relation to disease stage and localization. Fertil Steril 1996;65:299–304.
A10. Canis M, Mage G, Manhes H, Pouly JL, Wattiez A, Bruhat MA. Laparoscopic treatment of endometriosis. Acta Obstet Gynecol Scand Suppl 1989;150:15–20.
A11. Doberl A, Bergqvist A, Jeppsson S, et al. Regression of endometriosis following shorter treatment with, or lower dose of danazol. Comparison of pre- and post-treatment laparoscopic findings in the Scandinavian multi-center study. Acta Obstet Gynecol Scand Suppl 1984;123:51–58.
A12. Kreiner D, Fromowitz FB, Richardson DA, Kenigsberg D. Endometrial immunofluorescence associated with endometriosis and pelvic inflammatory disease. Fertil Steril 1986;46:243–246.
A13. Rawson JM. Prevalence of endometriosis in asymptomatic women. J Reprod Med 1991;36:513–515.
A14. Kably AA, Matus CM, di Castro P, Ibarra V, Serviere C. Achromatic endometriosis. Ginecol Obstet Mex 1990;58:324–327.
A15. Goldstein DP, De Cholnoky C, Emans SJ. Adolescent endometriosis. J Adolesc Health Care 1980;1:37–41.
A16. Donnez J, Nisolle M, Casanas-Roux F. Peritoneal endometriosis: two-dimensional and three-dimensional evaluation of typical and subtle lesions. Ann N Y Acad Sci 1994;734:342–351.
A17. Donnez J, Nisolle M, Gillerot S, Smets M, Bassil S, Casanas-Roux F. Rectovaginal septum adenomyotic nodules: a series of 500 cases. Br J Obstet Gynaecol 1997;104:1014–1018.
A18. Muzii L, Catalano GF, Marana R. Endometriosis externa and interna: endoscopic diagnosis. Rays 1998;23:683–692.
A19. Nisolle M, Donnez J. Peritoneal endometriosis, ovarian endometriosis, and adenomyotic nodules of the rectovaginal septum are three different entities. Fertil Steril 1997;68:585–596.
A20. Redwine DB. Peritoneal blood painting: an aid in the diagnosis of endometriosis. Am J Obstet Gynecol 1989;161(4):865–866.