Symptoms, ultrasound imaging and biochemical markers alone or in combination for the diagnosis of ovarian cancer in women with symptoms suspicious of ovarian cancer

  • Protocol
  • Diagnostic

Abstract

This is the protocol for a review and there is no abstract. The objectives are as follows:

To establish the accuracy of symptoms, ultrasound and biomarkers alone or in combination for the diagnosis of ovarian cancer in pre- and postmenopausal women.

To compare the accuracy of different tests or test combinations.

We will investigate the following sources of heterogeneity:

Population

  • Clinical setting (generalist/primary care/community/family practice) versus specialist setting (cancer unit/cancer centre/gynaecological oncology)

  • Menopausal status

Index tests

  • Test positivity threshold

  • Experience of the ultrasound test operator (general sonographers versus specialist interest)

Target condition

  • Histological subtype

Study quality

  • Case-control versus other study designs

  • Study quality: for study participants not receiving surgery initially following a negative index test result: 12 months follow-up versus less than 12 months follow-up

Background

Ovarian cancer is the deadliest and the most common cause of mortality among all gynaecological cancers. In 2012, 239,000 women were diagnosed with ovarian cancer and 152,000 women died worldwide (CRUK 2014). The high case fatality rate is largely attributed to the advanced stage at diagnosis in the majority of ovarian cancers. Approximately 75% of ovarian cancers are diagnosed in an advanced stage. Five-year survival rates are less than 30% in advanced-stage disease in comparison to five-year survival of more than 90% in stage 1 disease (CRUK 2014). Lack of awareness and recognition of symptoms by patients and physicians is considered one of the main factors in delayed diagnosis and poor outcomes. Diagnosis of ovarian cancer is challenging because of variable presentation, the non-specific nature of symptoms (Fitch 2002), and the low prevalence (0.23%) (Myers 2006). Ten per cent of women will undergo surgery in their lifetime for ovarian pathology, but only a small minority will have ovarian cancer (RCOG 2011). The prevalence of cancer in women undergoing surgery for ovarian pathology is 20% (Koonings 1989), but ranges from 5.7% to 57.5% (Myers 2006).

Diagnosis of ovarian cancer in premenopausal women poses additional challenges. The majority of tumours detected in premenopausal women tend to be benign; only 1 in 1000 symptomatic ovarian cysts are malignant, increasing to 3 in 1000 at age 50 (RCOG 2011).

A considerable amount of research has been dedicated to early diagnosis of ovarian cancer in an effort to improve outcomes. The use of symptoms, biomarkers and imaging (in particular ultrasound parameters) has been explored in an effort to make an earlier and more accurate diagnosis (Bankhead 2005; Sarojini 2012; Van Calster 2012). The accuracy of these tests, alone or in combination, and in different healthcare settings, has been investigated in various studies (Ferraro 2013; Kaijser 2014), but the most accurate combination of tests has yet to be determined.

This is a generic protocol for a series of four linked diagnostic test accuracy reviews to estimate the accuracy of symptoms, ultrasound imaging and biochemical markers alone or in combination for the diagnosis of ovarian cancer:

  • Clinical symptoms for the diagnosis of ovarian cancer

  • Ultrasound imaging for the diagnosis of ovarian cancer

  • Biochemical markers for the diagnosis of ovarian cancer

  • Symptoms, ultrasound imaging and biochemical markers for the diagnosis of ovarian cancer

Target condition being diagnosed

The diagnosis of ovarian cancer is difficult, largely because many physiological and benign conditions in premenopausal women, including the menstrual cycle, endometriosis and fibroids, may present in a similar way: with symptoms, an abnormal ultrasound scan, raised biomarkers, or a combination of these. The result is that the specificity of these tests is reduced and the probability of false positive results is increased.

Ovarian cancer is a heterogeneous disease and includes epithelial cell tumours, germ cell tumours, stromal cell tumours, metastatic cancers and tumours of low malignant potential (LMP, also known as borderline tumours). More than 90% of ovarian cancers in postmenopausal women are epithelial cell tumours, whereas in premenopausal women 15% to 20% of tumours are germ cell in origin. Epithelial cell tumours are the most common and within this group of tumours high-grade serous carcinoma is the commonest and most deadly. Other common epithelial histological types are mucinous, clear cell and endometrioid types (Shepherd 2000). Current understanding of the pathogenesis of ovarian cancers suggests they are different diseases, sharing the same anatomical location. Recent morphological and genetic studies have helped to improve our understanding of ovarian carcinogenesis and tumour behaviour based on different histology types. The distal fallopian tube has been identified as the origin for serous ovarian carcinomas and ovarian clear cell cancers, the origin of endometrioid ovarian cancer has been linked to endometriosis (Wiegand 2010), and the origin of the majority of mucinous tumours is considered to be the appendix (Seidman 2003). A dualistic model has also been proposed based on the behaviour of tumours (Shih 2004). Type I tumours are indolent, present at an early stage and progress from benign to intermediate to carcinoma in a stepwise pattern; low-grade serous, endometrioid, clear cell and mucinous carcinomas are examples of type I tumours. Type II tumours are aggressive, high-grade carcinomas, most often diagnosed at an advanced stage and include high-grade serous, endometrioid and undifferentiated carcinomas. Type I and type II tumours display markedly different and distinct genetic patterns (Cho 2009). This advancement in understanding has major research implications, especially regarding the role of biomarkers in the management of ovarian cancer.
However, there is a lag between recent advances in knowledge and the current evidence base. This understanding is yet to be reflected in the majority of the primary studies. Many biomarkers are being investigated and preliminary evidence on a few promising biomarkers has been reported, but needs to be substantiated in larger studies.

Existing reviews are mostly restricted to the performance of tests in epithelial ovarian cancer in postmenopausal women and they neglect the heterogeneous nature of the disease and the different prevalence of the tumour based on menopausal status (Myers 2006). Most studies have investigated the accuracy of tests for the diagnosis of ovarian cancer in adnexal masses occurring in postmenopausal women or have not made a distinction between pre- and postmenopausal women (Myers 2006).

Our review will include primary ovarian cancer of all histological types and stages, including borderline tumours. We will not consider the diagnosis of metastatic disease (cancer found in the ovary, but originating in another organ) in this review.

Index test(s)

Symptoms

In the last decade, the perception of ovarian cancer as a 'silent killer' has changed. Several published studies have concluded that the diagnosis is preceded by persistent gastrointestinal and urinary symptoms, and menstrual disturbances (Bankhead 2005). Symptoms frequently reported in studies include abdominal pain, pelvic pain, abdominal bloating, distension, altered bowel habit, such as constipation and diarrhoea, and urinary symptoms. However, the duration, severity and nature of these symptoms are non-specific and mimic benign conditions, such as irritable bowel syndrome and perimenopausal changes, making early recognition and diagnosis challenging. The majority of studies investigating the accuracy of symptoms are case-control studies using non-validated questionnaires and are prone to recording or recall bias. The Goff Symptom Index, one of the most commonly studied and validated questionnaires, has been shown to have a sensitivity of 66.7% and specificity of 90% in women older than 50 years, and a sensitivity and specificity of 86.7% each in women younger than 50 years (Goff 2007). However, the potential of the Goff Symptom Index and other symptom scores to improve outcomes significantly, and their value as diagnostic tools, have also been questioned, since the interval from recognition to diagnosis is variable and may only be a few months (Lim 2012).
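To illustrate why a non-specific symptom score is hard to act on when the disease is uncommon, the following minimal sketch applies Bayes' theorem to the Goff Symptom Index figures quoted above for women older than 50 years. Pairing these accuracy figures with the two prevalence values from the Background (0.23% and 20%) is an assumption made purely for illustration; the function name is hypothetical and the calculation is not part of the planned review methods.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test using Bayes' theorem."""
    tp = sensitivity * prevalence              # true positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false positive fraction
    fn = (1 - sensitivity) * prevalence        # false negative fraction
    tn = specificity * (1 - prevalence)        # true negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Accuracy figures quoted in the text (Goff 2007, women older than 50 years).
# Combining them with these prevalences is an illustrative assumption.
for label, prev in [("prevalence 0.23%", 0.0023), ("prevalence 20%", 0.20)]:
    ppv, npv = predictive_values(0.667, 0.90, prev)
    print(f"{label}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

The same sensitivity and specificity give very different positive predictive values in the two settings, which is one reason clinical setting is examined as a source of heterogeneity.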

Biochemical markers

Biochemical markers are substances secreted or shed by tumours into surrounding blood and body fluids and expressed in abnormal tissues. Biomarkers may be uniquely specific for some tumour subtypes or non-specific.

The most commonly used biomarker for ovarian cancer is CA125, which is raised in many benign and physiological conditions (Moss 2005; Posadas 2004). CA125 operating at a threshold of 30 units/ml has been shown to have a sensitivity of 81% and specificity of 75% for distinguishing benign from malignant tumours in mixed pre- and postmenopausal populations with adnexal masses (Jacobs 1990). However, CA125 has a low sensitivity (50%) for early-stage ovarian cancer (Jacobs 1989), and reduced specificity in premenopausal women.

More recently, other promising tumour markers have received Food and Drug Administration (FDA) approval. These include OVA1™, which is a multivariate index assay combining five bioassays: CA125 II, transthyretin (TTR), apolipoprotein A1 (Apo-A1), transferrin and beta 2 microglobulin. Human epididymis protein 4 (HE4) has been demonstrated to have similar sensitivity, but improved specificity compared to CA125 and OVA1™ for ovarian cancer, particularly in premenopausal women (Ferraro 2013; Holcomb 2011). Human chorionic gonadotropin (HCG), lactate dehydrogenase (LDH) and alpha fetoprotein (AFP) are germ cell tumour markers and are recommended for use in women under 40 years (ACOG 2007; RCOG 2011).

Revised understanding of the pathophysiology of ovarian cancer suggests that the majority of high-grade serous ovarian cancers and primary peritoneal cancers arise from the fimbrial end of the fallopian tube and are therefore likely to disseminate intraperitoneally early (Vaughan 2011). This implies that early detection with symptoms and ultrasound imaging may never be achievable and sensitive biomarkers will be required to detect early disease. It has been noted that levels of some tumour markers may begin to rise as early as three years prior to diagnosis (Anderson 2009).

Ultrasound

Ultrasound imaging enables visualisation of morphological details of ovarian cysts. The diagnostic potential of ultrasound has improved with advancing technology and the availability of transvaginal ultrasound (TVS), 3D ultrasound and Doppler techniques to characterise blood flow. However, the use of ultrasound to characterise lesions is influenced by interference from surrounding tissue, variability of the macroscopic features and the subjective, operator-dependent nature of interpretation. Various scores have been developed to make ultrasound interpretation more objective (Geomini 2009). Morphological features, such as size, presence of bilateral lesions, presence and thickness of septum, presence of solid areas, excrescences and papillary structures within tumours, metastases, presence of ascites and Doppler measurements of blood flow, have been combined in various ways.

  • The 'U' score (presence of bilateral lesions, multilocularity, solid areas, metastases or ascites, where U = 0 indicates the absence of any of these features; U = 1 indicates the presence of any one of these features and U = 3 indicates the presence of two or more of these features) (RCOG 2011).

  • The 'B' rules (unilocular cysts; presence of solid components where the largest solid component is less than 7 mm; presence of acoustic shadowing; smooth multilocular tumour with a largest diameter of less than 100 mm; no blood flow) (RCOG 2011).

  • The 'M' rules (irregular solid tumour; presence of ascites; at least four papillary structures; irregular multilocular solid tumour with a largest diameter of 100 mm or more; very strong blood flow) (RCOG 2011).

The U score and the B and M rules have been evaluated in many primary studies on their own or in combination with other features (i.e. non-ultrasound features) (Kaijser 2014). New ultrasound-based models (simple rules (SR) and logistic regression model 2 (LR2)) have been proposed by the International Ovarian Tumour Analysis (IOTA) group as having better diagnostic accuracy in the preoperative evaluation of ovarian tumours, but external validation of these models to date is limited (Kaijser 2014).
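The way the B and M rules are usually combined into a single classification can be sketched as follows. The decision convention used here (malignant if only M features are present, benign if only B features are present, inconclusive otherwise) is not stated in this protocol; it is an assumption based on how the IOTA simple rules are commonly described. The function name and feature labels are hypothetical.

```python
def classify_simple_rules(b_features, m_features):
    """Classify an adnexal mass from observed B (benign) and M (malignant) features.

    Assumed convention, not taken from this protocol:
      - one or more M features and no B features -> "malignant"
      - one or more B features and no M features -> "benign"
      - both kinds present, or neither           -> "inconclusive"
    """
    has_b = len(b_features) > 0
    has_m = len(m_features) > 0
    if has_m and not has_b:
        return "malignant"
    if has_b and not has_m:
        return "benign"
    return "inconclusive"


# Hypothetical examples.
print(classify_simple_rules({"unilocular cyst"}, set()))
print(classify_simple_rules(set(), {"very strong blood flow"}))
print(classify_simple_rules({"acoustic shadowing"}, {"irregular solid tumour"}))
```

Under this convention a proportion of masses remains unclassified, so studies reporting the simple rules need to state how inconclusive results were handled before a 2 x 2 table can be extracted.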

Combinations of tests

Ovarian cancer is a heterogeneous tumour and it is likely that a combination of tests has the potential to improve diagnostic accuracy over any single test alone. The Risk of Malignancy Index (RMI), calculated by multiplying the ultrasound score, menopausal status and CA125 (RMI = U × M × CA125), is the most widely used combination of tests. The sensitivity and specificity of RMI have been shown to be 70% and 90% respectively in postmenopausal women when a cut-off of RMI 250 is used (RCOG 2010). The Risk of Ovarian Malignancy Algorithm (ROMA) uses menopausal status and the biomarkers CA125 and HE4 to assess the probability of ovarian cancer in adnexal masses. A meta-analysis concluded that ROMA aided differentiation of epithelial ovarian cancer (EOC) from benign masses with higher sensitivity, but lower specificity, compared to HE4 and CA125 alone; however, considerable heterogeneity was present resulting from the different thresholds, variations in study design and patient characteristics (Li 2012). Other test combinations have been proposed, including more recently the ADNEX (Assessment of Different NEoplasias in the adneXa) model, which combines clinical and ultrasound variables and the biomarker CA125 and shows promise in the preoperative discrimination of benign, borderline, early and advanced malignancies in ovarian masses (Van Calster 2014).
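The RMI formula above can be written out as a short sketch. The U score mapping follows the description given earlier (0, 1 or 3 according to the number of ultrasound features). The menopausal weighting (M = 1 for premenopausal and M = 3 for postmenopausal women) is not stated in this protocol and is assumed here from the original RMI formulation; function names are hypothetical.

```python
def u_score(n_features):
    """Map the number of abnormal ultrasound features to the U score (0, 1 or 3)."""
    if n_features < 0:
        raise ValueError("number of features cannot be negative")
    if n_features == 0:
        return 0
    if n_features == 1:
        return 1
    return 3  # two or more features


def risk_of_malignancy_index(n_features, postmenopausal, ca125):
    """RMI = U x M x CA125, with CA125 in units/ml."""
    m = 3 if postmenopausal else 1  # assumed weighting, not given in the protocol
    return u_score(n_features) * m * ca125


# Hypothetical patient: two ultrasound features, postmenopausal, CA125 of 40 units/ml.
rmi = risk_of_malignancy_index(2, True, 40)
print(rmi, "above cut-off of 250" if rmi > 250 else "at or below cut-off of 250")
```

Because the index is a product, a U score of 0 gives an RMI of 0 whatever the CA125 level, which is worth bearing in mind when comparing RMI with CA125 alone.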

For the purpose of this review we will consider combination tests as tests combining variables from more than one category of the index tests, for example: symptoms, ultrasound imaging and biochemical markers. We will also consider a test that includes risk factors for ovarian cancer, such as age or family history, combined with one or more of the index tests included in this review, i.e. symptoms, ultrasound scan and biomarkers, as combination tests. Our review will also include any combination of tests used in any order.

Clinical pathway

Symptomatic women present in both generalist and specialist settings and may undergo further investigations including biomarker tests, ultrasound scan or both to guide referral to general gynaecologists or gynaecological oncologists. Existing guidelines vary in their recommendations. The National Institute for Health and Care Excellence (NICE) and the Royal College of Obstetricians and Gynaecologists (RCOG) in the UK have suggested a clinical pathway where symptoms in primary care trigger further testing in primary care with ultrasound scan and biomarkers prior to referral to specialist care. The NICE guidance recommends ultrasound imaging in symptomatic women with CA125 of 35 IU/ml or greater. The RMI is used in secondary care to triage for surgical management (NICE 2011): postmenopausal women are referred to gynaecological oncologists if the RMI is more than 250 and premenopausal women are referred if CA125 is more than 200 units/ml (RCOG 2010; RCOG 2011). The American College of Obstetricians and Gynecologists (ACOG) recommends using a combination of symptoms, risk factors, biomarkers and imaging tests (including computed tomography (CT) and positron emission tomography-computed tomography (PET-CT)) to triage women for surgical management (ACOG 2007). Use of germ cell tumour markers such as alpha fetoprotein (AFP), human chorionic gonadotropin (HCG) and lactate dehydrogenase (LDH) is recommended in women under 40 years (ACOG 2007; RCOG 2011). A recent multicentre study in the UK demonstrated variable adherence to the recent NICE guidance in terms of tests used, thresholds used and interpretation (Rai 2015).

Prior test(s)

As a minimum, women will present with self-assessed symptoms. In addition, women may have had clinical assessment (history and examination), imaging and biomarker tests prior to testing with the index test, depending on the point in the clinical pathway at which the index tests are being evaluated.

Role of index test(s)

The index tests are used for triage of patients with symptoms and/or signs suspicious of ovarian cancer presenting in primary or secondary care for further testing or treatment.

Alternative test(s)

This review is concerned with initial investigations to diagnose ovarian cancer that would be applicable both in primary and secondary care. Computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET) and other complex imaging techniques are beyond the scope of this review.

The multivariate biomarker OvaSure™ (CA125, prolactin, leptin, insulin-like growth factor II, macrophage inhibitory factor and osteopontin) has been reported to have a sensitivity and specificity of 95.3% and 99.4% respectively (Visintin 2008), but has been withdrawn from the market as these performance characteristics were based on an inaccurate prevalence rate. Advances in technology, especially in mass spectrometry, have led to high throughput assays and rapid turnover in identifying promising new biomarkers. However, deficiencies in verification, validation and reproducibility have militated against the translation of promising biomarkers into practice. In addition, lack of a consistent, standardised process for obtaining regulatory approval for tests may be contributing to the gap between development and implementation. For this reason we will only include currently FDA-approved markers used in the diagnosis of ovarian cancer in this review. We will be inclusive in our search strategy and map emerging biomarkers for the diagnosis of primary ovarian cancer as a resource for updating the review.

Rationale

Advances in surgical practice and chemotherapy have slightly improved survival, but ovarian cancer continues to have high mortality, which is largely attributed to advanced stage at diagnosis. The non-specific nature of symptoms associated with ovarian cancer and the high prevalence of ovarian cysts of uncertain significance (30% of females with regular menstruation, 50% of females with irregular menstruation and 6% of postmenopausal females (Duklewski 2009)) continue to pose problems for early and accurate diagnosis.

With advances in knowledge revealing the extent of the heterogeneous nature of ovarian cancer disease there is a need to re-examine the performance of tests alone and in combination and in sub-populations of ovarian cancer risk and for different types of disease. In addition, a review is needed that encompasses the most recent test developments.

An internal scoping exercise of systematic reviews carried out by the author team in preparation for this Cochrane review demonstrated limitations in quality and the degree to which current understanding is reflected. Deficiencies included inadequate ascertainment of the literature, limited spectrum, lack of consideration of prior testing, inclusion of studies with inadequate reference standards (minimal or no follow-up data on patients who did not undergo surgery) and non-ascertainment of the disease status of index test negatives.

Accurate diagnosis of ovarian cancer is important to ensure appropriate referral and further management, including surgery. Outcomes in ovarian cancer are better when patients are referred to specialists in gynaecological oncology and inappropriate referral to general gynaecologists may result in the need for additional remedial surgery. However, referral of benign masses to specialists in gynaecological oncology may cause unnecessary anxiety in women, result in unnecessarily invasive surgery, compromise fertility and overwhelm services.

Objectives

To establish the accuracy of symptoms, ultrasound and biomarkers alone or in combination for the diagnosis of ovarian cancer in pre- and postmenopausal women.

To compare the accuracy of different tests or test combinations.

Secondary objectives

We will investigate the following sources of heterogeneity:

Population

  • Clinical setting (generalist/primary care/community/family practice) versus specialist setting (cancer unit/cancer centre/gynaecological oncology)

  • Menopausal status

Index tests

  • Test positivity threshold

  • Experience of the ultrasound test operator (general sonographers versus specialist interest)

Target condition

  • Histological subtype

Study quality

  • Case-control versus other study designs

  • Study quality: for study participants not receiving surgery initially following a negative index test result: 12 months follow-up versus less than 12 months follow-up

Methods

Criteria for considering studies for this review

Types of studies

We will include diagnostic case-control, cross-sectional and comparative diagnostic test accuracy studies. We will also include studies developing and validating multivariable models for the diagnosis of ovarian cancer. We anticipate that in view of the low prevalence of ovarian cancer, the majority of cross-sectional studies will recruit women with reference standard results and index test results will be ascertained retrospectively. Studies must contain sufficient data to extract 2 x 2 tables on the diagnostic test performance. We will include studies not providing verification of index test negatives and construct 2 x 2 tables by imputation using setting-specific prevalence estimates.
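As a reminder of what a 2 x 2 table needs to provide, the following minimal sketch derives sensitivity and specificity from the four cell counts. The class name and the example counts are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-classification of index test result against the reference standard."""
    tp: int  # index test positive, ovarian cancer present
    fp: int  # index test positive, ovarian cancer absent
    fn: int  # index test negative, ovarian cancer present
    tn: int  # index test negative, ovarian cancer absent

    def __post_init__(self):
        if min(self.tp, self.fp, self.fn, self.tn) < 0:
            raise ValueError("cell counts cannot be negative")

    @property
    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self):
        return self.tn / (self.tn + self.fp)


# Hypothetical study: 50 women with cancer, 200 without.
table = TwoByTwo(tp=45, fp=20, fn=5, tn=180)
print(f"sensitivity = {table.sensitivity:.2f}, specificity = {table.specificity:.2f}")
```

A study qualifies for data extraction only if all four cells can be recovered, either directly or from reported accuracy estimates and group sizes.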

Participants

Adult women aged 18 years or older, irrespective of menopausal status. We will exclude studies restricted exclusively to populations under 18. We will exclude women with a previous history of ovarian cancer and pregnant women.

Prior tests

The review will include women who have symptoms or signs suggestive of ovarian cancer. As a minimum, women will have undertaken self-assessment. As the review covers index tests used in both generalist and specialist settings, women may also have had ultrasound scan, biomarker testing or both prior to the index test being evaluated. We will exclude cross-sectional studies explicitly describing the population as asymptomatic or screening or where the asymptomatic participants cannot be disaggregated. We will downgrade studies not clarifying symptomatology in the included population in the quality assessment (QUADAS-2) by noting applicability in the patient domain as unclear.

Index tests

Symptoms

We will include combinations of symptoms alone or combinations of symptoms, signs or risk factors for ovarian cancer (such as family history) at any threshold and in any order. We will exclude studies restricted to single symptoms, signs alone or risk factors alone.

Biomarkers

We will include the following FDA approved biomarkers:

  • CA125

  • CEA

  • HE4

  • OVA1

  • LDH

  • HCG

  • AFP

Although HCG and AFP are FDA approved markers they are not approved by the FDA for use as tests in ovarian cancer. However, they are used clinically and recommended by the RCOG and ACOG for women under 40 as additional markers for germ cell tumours and we will include them.

We will tabulate other, non-FDA approved ovarian cancer biomarkers as in development to inform review updates.

Ultrasound

Any ultrasound characteristic or combination of characteristics at any threshold, conducted and interpreted by either generalist or specialist sonographers (we will investigate operator experience as a potential source of heterogeneity). We will review only studies post 1991 to restrict the review to current technology and include:

  • 3D ultrasound

  • Grey scale morphology (TVS)

  • Doppler studies involving ovarian pathology

Combinations of tests

Any combination of the index tests listed above (symptoms, ultrasound scan, biomarkers) at any threshold and used in any order.

Target conditions

Ovarian cancer, all stages and types. We will exclude studies restricted to specific ovarian pathologies with the exception of epithelial ovarian cancer (EOC) as this is the most common (> 90% in postmenopausal women) of the ovarian cancers and is associated with the highest mortality. We will exclude metastatic or recurrent ovarian cancer and studies where it is not possible to disaggregate data concerning metastatic disease.

Reference standards

Histology in women who have undergone surgery and clinical follow-up in women who have not undergone surgery. For studies using clinical follow-up, we will consider follow-up of at least 12 months to be of higher quality and we will analyse these studies separately from those where clinical follow-up is less than 12 months.

Search methods for identification of studies

Electronic searches

We will use sensitive search strategies combining terms for the target condition (ovarian cancer) and the index tests (biochemical markers, symptom indices and ultrasound tests or testing strategies) as well as terms to describe test combinations. We will adapt the strategies to run across a range of databases: the Cochrane Central Register of Controlled Trials (CENTRAL), MEDLINE and MEDLINE In Process (Ovid), EMBASE (Ovid), CINAHL (Ebsco), the Cochrane Database of Systematic Reviews (CDSR), Database of Abstracts of Reviews of Effects (DARE), Health Technology Assessment Database (HTA) and SCI Science Citation Index (ISI Web of Knowledge). We will draw on existing systematic reviews and guidelines as a source of primary studies. We will apply no language restrictions.

We will update the electronic searches for symptoms from the search strategies (completed 2009) used to inform recent UK guidance (NICE 2011). In addition, for ultrasound and biomarkers, we will add additional terms and backdate the searches further to 1991 in order to capture emerging evidence on the probable use of IOTA (International Ovarian Tumor Analysis) variables and biomarkers that were not covered in the NICE guidance. The symptom score search strategy for MEDLINE (Ovid) is shown in Appendix 1.

Searching other resources

To identify ongoing and unpublished studies we will search the following trials registers and conference abstracts and proceedings without a date limit: ClinicalTrials.gov, UK Clinical Research Network Study Portfolio Database (UKCRN) and WHO International Clinical Trials Registry Platform (ICTRP). We will individually search conference proceedings from the European Society of Gynaecological Oncology (ESGO), International Gynecologic Cancer Society (IGCS), American Society of Clinical Oncology (ASCO) and Society of Gynecologic Oncology (SGO), supplemented by searches of the ZETOC and Conference Proceedings Citation Index (Web of Knowledge).

We will handsearch the citation lists of reviews and included studies.

We will supplement the searches for biomarkers by searching the FDA (http://www.fda.org.uk) and European Medicines Agency websites (http://www.ema.europa.eu/ema/) using strategies used by the UK National Horizon Scanning Centre. Inclusion will be limited to FDA approved biomarkers but we will map non-FDA approved biomarkers as a resource for review updates.

Data collection and analysis

We will combine the results of searches in an EndNote database and remove the duplicates. We will carry out study selection and quality assessment in duplicate and independently (NR and RC) with disagreements resolved by discussion or arbitration by a third review author (CD or SS).

Selection of studies

Study authorship will not be concealed (Cochrane DTA Handbook 2013). We will review unique titles and abstracts against the predefined selection criteria to select potentially relevant studies for full-text review. We will carry out study selection independently and in duplicate (NR, RC). We will resolve differences in opinion by discussion and resolve any persisting disagreements using a third arbiter (SS or CD). We will summarise the results of the selection process using a PRISMA flow diagram and document reasons for exclusion. The group of experts forming the management group of the ROCkeTS (Refining Ovarian Cancer Test Accuracy Scores) study will check the final list of included studies.

Data extraction and management

We will use a pre-defined data collection form (Appendix 2). We will carry out data extraction independently and in duplicate (NR and RC). We will resolve any difference of opinion by discussion and resolve any persisting disagreements using a third arbiter (SS or CD). We will seek the following data: study design, setting, method of recruitment, number of participants, age, menopausal status (directly or using age over 50 years and history of previous hysterectomy as a proxy for postmenopausal status), prior tests, index tests and index test threshold, index test operator (for symptoms and ultrasound scan), reference standard (including duration of follow-up) and stage of cancer. We will extract data to derive a 2 x 2 table for each study. Where index test negatives are not verified in a study we will impute prevalence estimates applicable to the study setting.
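The protocol does not specify how the imputation for unverified index test negatives will be carried out. One possible reading, shown here only as a sketch under that assumption, is to take the expected number of cancers from an external setting-specific prevalence and attribute any cancers not found among index test positives to the unverified negatives. The function name and all numbers are hypothetical.

```python
def impute_two_by_two(tp, fp, n_negative, prevalence):
    """Complete a 2 x 2 table when index test negatives were not verified.

    tp, fp      verified index test positives with and without cancer
    n_negative  number of index test negatives (disease status unknown)
    prevalence  assumed setting-specific prevalence of ovarian cancer (0 to 1)

    Returns (tp, fp, fn, tn); fn and tn are expected counts and may be fractional.
    """
    if not 0 <= prevalence <= 1:
        raise ValueError("prevalence must lie between 0 and 1")
    n_total = tp + fp + n_negative
    expected_cancers = prevalence * n_total
    # Cancers not detected among positives are assumed to sit among the negatives,
    # bounded so that fn can neither be negative nor exceed the number of negatives.
    fn = min(max(expected_cancers - tp, 0.0), float(n_negative))
    tn = n_negative - fn
    return tp, fp, fn, tn


# Hypothetical study: 100 test positives verified by surgery, 900 unverified negatives.
print(impute_two_by_two(tp=30, fp=70, n_negative=900, prevalence=0.04))
```

Any imputed table depends heavily on the prevalence chosen, so results from such studies would need to be examined in sensitivity analyses.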

Assessment of methodological quality

We will undertake quality assessment independently and in duplicate (NR and RC). We will resolve any difference of opinion by discussion and resolve any persisting disagreements using a third arbiter (SS or CD). We will tailor the QUADAS-2 checklist according to the topic area being addressed by the addition of a comparative domain and a separate domain for modelling studies drawing on the forthcoming PROBAST (prediction model risk of bias assessment) tool for diagnostic and prediction models (Wolff 2014). The tailoring is detailed in Appendix 3 and summarised below.

Patient selection

Inappropriate exclusions include specific age groups, histological sub-types or grades, specific ovarian cancer pathologies, co-morbidities such as endometriosis and infertility.

Applicability judgements will depend on symptom status (minimum of self-assessed in generalist settings and self-assessed or elicited by a healthcare professional in specialist settings).

Index test

Applicability judgements will depend on the experience of the healthcare professional eliciting symptoms and the operator of the ultrasound scan, and whether the interpretation of the ultrasound scan was informed by the presence of symptoms.

Reference standard

Follow-up of less than 12 months for index test negatives is considered unlikely to correctly classify the target condition.

Applicability judgements will depend on whether disease positives can be disaggregated into borderline, ovarian cancer and metastatic disease.

Flow and timing domain

The interval between the application of the index test and the reference standard should be three months or less.

Disease negatives should all receive the same reference standard.

Addition of a comparative domain

For studies comparing two or more index tests, selection of participants should be the same for each test.

For studies comparing two or more index tests, the interval between index tests should be less than three months.

We will present the results of the quality assessment graphically and narratively, highlighting the most important threats to validity and applicability.

Statistical analysis and data synthesis

We will conduct preliminary exploration of diagnostic accuracy study results for each index test separately (see 1 to 4 below), by plotting estimates of sensitivity and specificity in (i) forest plots and (ii) ROC plots.
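Forest plots of sensitivity and specificity normally show each study estimate with a confidence interval. The protocol does not state which interval method will be used; the sketch below uses the Wilson score interval as one reasonable choice, which is an assumption. The function name and the example counts are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (default z gives about 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical study: 45 of 50 cancers detected, 180 of 200 non-cancers test negative.
sens_low, sens_high = wilson_interval(45, 50)
spec_low, spec_high = wilson_interval(180, 200)
print(f"sensitivity 0.90 ({sens_low:.2f} to {sens_high:.2f})")
print(f"specificity 0.90 ({spec_low:.2f} to {spec_high:.2f})")
```

With the same point estimate, the smaller group of women with cancer gives a visibly wider interval, which is typical for sensitivity estimates in low-prevalence conditions.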

Index test groups

We will analyse index tests separately in the following groups:

  1. Symptoms suspicious of ovarian cancer, alone or in combination

  2. Biomarkers alone or in combination (CA125, CEA, HE4, OVA1, LDH, HCG, AFP)

  3. Ultrasound characteristics, alone or in combination

  4. Combinations of tests across categories 1 to 3 (either as rules or multivariable models)

Within each index test group, we will consider a meta-analysis where studies use the same test or same combination of tests, studies have compatible study designs and where heterogeneity (as assessed by visual inspection and clinical expertise) is considered reasonable.

Index test subgroups

For each index test or test combination, if there are sufficient studies, we will consider the following subgroups for separate meta-analyses.

Patient characteristics
  • Generalist setting (primary care, community care, family practice) versus specialist setting (secondary care, cancer unit, cancer centre).

  • Pre- versus postmenopausal status as explicitly stated, or, if menopausal status is not reported, using age (< 50 years versus > 50 years) or hysterectomy (yes/no) as surrogates for menopausal status.

Index test characteristics
  • Test threshold where studies report a common threshold.

  • For ultrasound studies, if the same variables or same rules or same combination of variables are used in more than four studies, we will consider them for meta-analysis. In addition, we will do an overall meta-analysis across the 'U', 'B' and 'M' scores, including other studies derived from scores based on similar ultrasound parameters to these.

Target condition
  • Histological subtype: epithelial versus non-epithelial; high-grade serous (type II) versus other epithelial (type I); early-stage (stage I/II) versus late-stage disease (stage III/IV).

Study quality
  • Case-control versus other study designs.

Methods for meta-analysis

We will explore diagnostic accuracy by creating forest plots of study-specific estimates of sensitivity and specificity, and by plotting these estimates in ROC space. Where adequate data are available and it is considered reasonable to pool results, we will perform meta-analyses using hierarchical models. Since the characteristics measured by index tests may yield binary, ordinal or quantitative test results, the choice of model, either the bivariate model (Chu 2006; Reitsma 2005) or the HSROC model (Rutter 2001), will depend on whether studies report common thresholds or thresholds vary across studies. To estimate average sensitivity and specificity, we will perform the analysis of each test by first restricting to studies that report a common threshold. To estimate a summary ROC curve without restricting to a common threshold, we will randomly select data at one threshold from each study. We will perform all analyses using the NLMIXED procedure in Statistical Analysis System (SAS 2009) and the xtmelogit command in Stata version 14 (StataCorp LP 2015).
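As an illustration only, the study-specific estimates that feed the forest plots and ROC plots can be derived from each study's 2 x 2 table as sketched below. The class and function names are hypothetical and the counts are invented for the example; this is not the review's analysis code, which will use SAS and Stata as stated above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one study's 2 x 2 table (index test versus reference standard)."""
    tp: int  # index test positive, disease positive
    fp: int  # index test positive, disease negative
    fn: int  # index test negative, disease positive
    tn: int  # index test negative, disease negative


def sensitivity(table: TwoByTwo) -> float:
    """Proportion of disease-positive participants with a positive index test."""
    return table.tp / (table.tp + table.fn)


def specificity(table: TwoByTwo) -> float:
    """Proportion of disease-negative participants with a negative index test."""
    return table.tn / (table.tn + table.fp)


def roc_point(table: TwoByTwo) -> Tuple[float, float]:
    """Coordinates in ROC space: (1 - specificity, sensitivity)."""
    return (1.0 - specificity(table), sensitivity(table))


# Invented counts, for illustration only.
example = TwoByTwo(tp=40, fp=30, fn=10, tn=120)
print(sensitivity(example), specificity(example), roc_point(example))
```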

For studies testing multivariable models, we will include both validation and development models with a separate subgroup meta-analysis for validation models only (higher level of evidence). We will consider meta-analysis of multivariable models where exactly the same model is used in terms of both variables and variable coefficients, the model estimates relate to similar patient populations and there are sufficient studies for meta-analysis.

We will consider random-effects univariate analyses (which ignore any correlation between sensitivity and specificity) where pooling is considered an appropriate approach but where hierarchical models fail to converge.  

Where meta-analysis is not considered appropriate due to clinical or methodological heterogeneity we will use a narrative synthesis.

We will translate summary estimates of sensitivity and specificity into the summary estimates of the probability of disease for test-positive patients (PPV) and test-negative patients (1-NPV) using a prevalence of ovarian cancer of 0.23% in women presenting to primary care with symptoms (Myers 2006). In sensitivity analyses we will explore other values of prevalence reflecting secondary care.
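The translation from summary sensitivity and specificity to PPV and 1-NPV follows from Bayes' theorem. The sketch below shows the arithmetic; the function name is hypothetical, and the sensitivity and specificity values in the usage lines are placeholders rather than estimates from any study. Only the prevalence of 0.23% is taken from the text above.

```python
from typing import Tuple


def predictive_values(sens: float, spec: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, 1 - NPV) for a given sensitivity, specificity and prevalence.

    PPV     = P(disease | test positive)
    1 - NPV = P(disease | test negative)
    """
    if not (0.0 < prevalence < 1.0):
        raise ValueError("prevalence must lie strictly between 0 and 1")
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    false_neg = (1.0 - sens) * prevalence
    true_neg = spec * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    one_minus_npv = false_neg / (false_neg + true_neg)
    return ppv, one_minus_npv


# Placeholder accuracy values (assumptions), with the primary care prevalence of 0.23%.
ppv, one_minus_npv = predictive_values(sens=0.80, spec=0.90, prevalence=0.0023)
print(f"PPV = {ppv:.4f}, 1-NPV = {one_minus_npv:.6f}")
```

Repeating the call with a higher prevalence shows how the same test accuracy would translate into predictive values in a secondary care population.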

Methods for test thresholds

For studies where the 2 x 2 table is generated by using test thresholds, the choice of model, either the bivariate model (Chu 2006; Reitsma 2005) or the HSROC model (Rutter 2001), will depend on whether studies report common thresholds or thresholds vary across studies.

For an index test with multiple possible thresholds, we will normally extract results at up to three thresholds within an individual study. We will prioritise extraction of results in the following order: (i) pre-specified thresholds, (ii) thresholds commonly used in clinical guidelines, (iii) thresholds commonly used in the published literature and (iv) thresholds reported as main outcomes in the studies. In exceptional instances we may extract data at five thresholds for an individual index test in a single study.
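The prioritisation order can be expressed as a simple ranking, sketched below for illustration. The category labels, the class name and the tie-breaking rule (thresholds in the same category keep the order in which they were recorded) are assumptions made for the example and are not specified in this protocol.

```python
from typing import List, NamedTuple

# Priority ranks (i) to (iv); the labels themselves are hypothetical.
PRIORITY = {
    "pre_specified": 1,
    "guideline": 2,
    "literature": 3,
    "main_outcome": 4,
}


class ReportedThreshold(NamedTuple):
    value: float  # threshold value as reported in the study
    basis: str    # one of the keys in PRIORITY


def thresholds_to_extract(reported: List[ReportedThreshold], limit: int = 3) -> List[ReportedThreshold]:
    """Return at most `limit` thresholds, highest priority first.

    sorted() is stable, so thresholds sharing a category keep their recorded order.
    """
    ranked = sorted(reported, key=lambda t: PRIORITY[t.basis])
    return ranked[:limit]


# Invented thresholds, for illustration only.
reported = [
    ReportedThreshold(65.0, "literature"),
    ReportedThreshold(35.0, "guideline"),
    ReportedThreshold(95.0, "main_outcome"),
    ReportedThreshold(30.0, "pre_specified"),
]
print(thresholds_to_extract(reported))
```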

We will exclude studies where it is not possible to identify an appropriate threshold to enable a 2 x 2 table for study results at a single threshold to be reported according to our categories of index test (1 to 4 above).

Exploratory analyses will include forest plots of study estimates of sensitivity and specificity grouped by test threshold, and plotting sensitivity and specificity in ROC space with test thresholds indicated.

To estimate average sensitivity and specificity, we will perform the analysis of each test by first restricting to studies that report common threshold(s) or thresholds viewed as clinically important in the published literature.

To estimate a summary ROC curve without restricting to a common threshold, we will use methods of meta-analysis using multiple thresholds across studies (Riley 2015), or if we are unable to use these methods we will randomly select a single threshold for each study.
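Where the fallback of random selection is used, fixing the random seed makes the selection reproducible. The following is a minimal sketch under assumed data structures (a mapping from study to threshold to 2 x 2 counts); the function name, the seed value and the counts are all hypothetical.

```python
import random
from typing import Dict, Tuple

Counts = Tuple[int, int, int, int]  # (TP, FP, FN, TN)


def select_one_threshold_per_study(
    data: Dict[str, Dict[float, Counts]], seed: int
) -> Dict[str, Tuple[float, Counts]]:
    """Randomly pick a single reported threshold for each study.

    Studies and thresholds are sorted before sampling so that a given seed
    always yields the same selection, regardless of input ordering.
    """
    rng = random.Random(seed)
    selected = {}
    for study in sorted(data):
        thresholds = sorted(data[study])
        chosen = rng.choice(thresholds)
        selected[study] = (chosen, data[study][chosen])
    return selected


# Invented studies and counts, for illustration only.
example = {
    "Study A": {35.0: (45, 40, 5, 110), 65.0: (38, 20, 12, 130)},
    "Study B": {35.0: (28, 25, 2, 95)},
}
print(select_one_threshold_per_study(example, seed=2015))
```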

Comparison of test accuracy

We will compare test accuracy between different tests first by restricting to studies that make head-to-head (direct) comparisons between tests within the same population as this provides the most reliable evidence (Takwoingi 2013). Secondly, we will also compare tests by including all relevant studies (indirect comparison), particularly where there are few studies comparing tests within the same population. We will compare test accuracy by adding a covariate for test types to be compared in the bivariate or HSROC model, and we will use likelihood ratio tests to test statistical significance between tests.
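A likelihood ratio test compares the log-likelihood of the model with the test-type covariate against the model without it. The sketch below shows only the final step, assuming the two log-likelihoods have already been obtained from the fitted models; the function names and the numeric values are hypothetical. To stay self-contained it supports one or two additional parameters only (for example, a covariate effect on sensitivity, on specificity, or on both).

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared distribution (df = 1 or 2 only)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports df = 1 or df = 2 only")


def likelihood_ratio_test(
    loglik_reduced: float, loglik_full: float, extra_params: int
) -> Tuple[float, float]:
    """Return (test statistic, p-value) for nested models.

    statistic = 2 * (log-likelihood of full model - log-likelihood of reduced model)
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("the full model should not fit worse than the reduced model")
    return statistic, chi2_sf(statistic, extra_params)


# Invented log-likelihoods, for illustration only.
stat, p_value = likelihood_ratio_test(loglik_reduced=-100.0, loglik_full=-97.0, extra_params=2)
print(stat, p_value)
```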

Investigations of heterogeneity

For each index test group, we will explore the effect of the relevant factors specified in the secondary objectives by visual inspection of forest plots and summary ROC plots. For further investigations of heterogeneity, if there are adequate data, we will add factors as single covariates to the bivariate or HSROC model. We will separately add the following covariates to the bivariate model to assess their association with test performance:

Index test
  • Experience of test operator (ultrasound and symptom elicitation by health professionals) applicable to setting, as a categorical covariate (yes versus no)

  • Threshold (if a continuous covariate)

Reference standard
  • Low risk of bias in the reference standard domain as a categorical variable: high or unclear risk versus low risk

Sensitivity analyses

We will consider sensitivity analyses if there are sufficient studies to investigate the impact on the summary estimates of (i) including only studies with low concern about applicability in the patient selection domain of QUADAS-2, (ii) leaving out potentially highly influential studies and (iii) classification of borderline tumours as malignant or benign.

We will calculate PPV and NPV for a range of values for the prevalence of ovarian cancer reflecting both primary care and secondary care using the best available estimates from the published literature and hospital audits.

Assessment of reporting bias

We will not undertake any formal assessment of reporting bias in our review due to current uncertainty about how to assess reporting bias in diagnostic test accuracy reviews, especially in the presence of heterogeneity (Deeks 2005).

Acknowledgements

This project was supported by the National Institute for Health Research, via Cochrane Infrastructure funding to the Cochrane Gynaecological, Neuro-oncology and Orphan Cancer Group. The views and opinions expressed therein are those of the authors and do not necessarily reflect those of the Systematic Reviews Programme, NIHR, NHS or the Department of Health.

We thank Jo Morrison for clinical and editorial advice and Clare Jess for her contribution to the editorial process.

Appendices

Appendix 1. Search strategy

MEDLINE search strategy

1. Symptoms scores
Search strategy

Database: Ovid MEDLINE(R) 1946 to April Week 2 2015

1 exp ovarian neoplasms/di
2 exp adnexal diseases/di
3 ((ovar$ or adnexal or fallopian or peritoneal$ or pelvic) adj3 (cancer$ or carcinoma$ or malignan$ or mass or masses or cyst or cysts or neoplas$ or tumor$ or tumour$)).tw.
4 ((borderline or border line) adj4 ovar$).tw.
5 exp Fallopian Tube Neoplasms/di
6 exp Peritoneal Neoplasms/di
7 exp pelvic neoplasms/di
8 ((epithelial or germ cell) adj5 ovar$).tw.
9 or/1-8
10 exp "Signs and Symptoms"/
11 symptom$.ti,ab.
12 exp early diagnosis/ or exp Diagnosis/
13 exp "Early Detection of Cancer"/
14 (early adj (sign$ or symptom$)).tw.
15 (abdom$ adj3 (pressure or pain$ or swelling$ or hard)).tw.
16 (bowel irregularit$ or bloat$ or fullness or satiet$ or gastro$).tw.
17 (fatigue or weight loss$ or weight gain$ or constipat$ or diarrhoea or diarrhea or gas).tw.
18 (nausea$ or indigestion).tw.
19 ((loss or lack) adj3 (energ$ or appetite$)).tw.
20 (urin$ adj3 (frequenc$ or urgenc$)).tw.
21 ((leg$ or ankle$) adj2 (swell$ or swollen)).tw.
22 ((abnormal or irregular or postmenopausal) adj1 vaginal adj (bleed$ or discharge$)).tw.
23 (pelvic discomfort$ or pelvic pain$ or chest pain$ or respirator$ difficult$ or lower back pain$).tw.
24 or/10-22
25 9 and 24
26 (index or risk$ or score$ or scoring or checklist$ or rule$ or indices or tool$ or instrument$ or survey$ or questionnaire$ or interview$).tw.
27 25 and 26
28 limit 27 to (humans and yr="2009 - 2015")

2. Biomarkers
Search strategy

Database: Ovid MEDLINE(R) 1946 to April Week 3 2015

1 exp Ovarian Neoplasms/di
2 exp Adnexal Diseases/di
3 ((ovar$ or adnexal or fallopian or peritoneal$ or pelvic) adj3 (cancer$ or carcinoma$ or malignan$ or mass or masses or cyst or cysts or neoplas$ or tumour$ or tumor$)).tw.
4 ((borderline or border line) adj4 ovar$).tw.
5 exp Fallopian Tube Neoplasms/di
6 exp Peritoneal Neoplasms/di
7 exp Pelvic Neoplasms/di
8 ((epithelial or germ cell) adj5 ovar$).tw.
9 or/1-8
10 exp Tumor Markers, Biological/
11 exp Biological Markers/
12 Proteomics/
13 Genetic Markers/
14 Metabolomics/
15 multiplex$.tw.
16 multivariate.tw.
17 (CA125 or CA-125 or HE4 or OVA 1 or OVA1 or HCG or LDH or AFP or CEA).tw.
18 CA-125 Antigen/
19 Chorionic Gonadotropin/
20 L-Lactate Dehydrogenase/
21 alpha-Fetoproteins/
22 Carcinoembryonic Antigen/
23 or/10-22
24 9 and 23
25 limit 24 to (humans and yr="2009-2015")

3. Ultrasound/IOTA
Search strategy

Database: Ovid MEDLINE(R) 1946 to April Week 3 2015

1 exp Ovarian Neoplasms/di
2 exp Adnexal Diseases/di
3 ((borderline or border line) adj4 ovar$).tw.
4 exp Fallopian Tube Neoplasms/di
5 exp Peritoneal Neoplasms/di
6 exp Pelvic Neoplasms/di
7 ((ovar$ or adnexal or fallopian or peritoneal$ or pelvic) adj3 (cancer$ or carcinoma$ or malignan$ or mass or masses or cyst or cysts or neoplas$ or tumour$ or tumor$)).tw.
8 ((epithelial or germ cell) adj5 ovar$).tw.
9 or/1-8
10 exp ultrasonography/
11 ultraso$.tw.
12 (transvagina$ adj2 sonogra$).tw.
13 or/10-12
14 9 and 13
15 limit 14 to (human and yr=2009-2015)
16 IOTA.tw.
17 International Ovarian Tumor Analysis.tw.
18 ((ovarian or epithelial or adnex$ or fallopian or peritoneal or pelvic) adj3 (model$ or regress$ or rule$ or score$ or algorithm$ or term$ or definition$ or measure$)).ti,ab.
19 or/16-18
20 13 and 19
21 limit 20 to human
22 15 or 21

Appendix 2. Data extraction form

Data extraction: ROCkeTS

Please enter your initials (NR, RC) 

STUDY IDENTIFICATION AND STUDY TYPE

Study ID (number allocated in included studies PDFs and 'notes' field in reference manager)  Comments

Study authors

(surname and year)

  
Country in which study conducted  

Study design

Please choose from:

- 'Prospective' cross-sectional test accuracy study (P CS)

- 'Retrospective' cross-sectional test accuracy study (R CS)

- Case-control test accuracy study (CC)

- Comparison of the accuracy of tests or testing strategies in 2 different populations (e.g. a randomised trial of tests or testing strategies) (Between-person comparison - BPC)

- Within-person comparison of test accuracy (WPC)

- Unclear (U)

  

PATIENT SELECTION DETAIL

For studies comparing two index tests or testing strategies in different patient populations complete details for each patient population (copy and paste table if necessary)

FOR NON-COMPARATIVE STUDIES OR WITHIN-PERSON TEST COMPARISONS (WPC) assessing the accuracy of one index test or one index testing strategy describe methods of participant selection as reported (cut and paste from paper if possible)

Include total number of study participants

  Comments
 

FOR BETWEEN-PERSON COMPARATIVE STUDIES (BPC) of two index tests or testing strategies in different patient populations describe methods of participant selection receiving each index test or testing strategy as reported

(cut and paste from paper if possible)

Include total number of study participants receiving each test or testing strategy

  

Clinical setting mentioned

If yes

Y/N/U

Primary/community

Secondary/hospital/cancer unit/cancer centre

NR

 
Risk factors such as age, menopause, family history, BRCA status, other cancers mentioned in the study  

INCLUDED PATIENT CHARACTERISTICS DETAIL.

For studies comparing two index tests or testing strategies in different patient populations complete details for each patient population (copy and paste table if necessary)

FOR NON-COMPARATIVE STUDIES OR WITHIN-PERSON TEST COMPARISONS (WPC) describe characteristics of included patients as reported (cut and paste from paper if possible)  Comments
 
FOR BETWEEN-PERSON COMPARATIVE STUDIES (BPC) of two index tests or testing strategies in different patient populations describe characteristics of participants receiving each index test or testing strategy as reported (cut and paste from paper if possible)  

Age as reported or not reported ('NR')

(delete options as necessary)

- Age range:

- Age mean (SD):

- NR

 
Menopausal status (n/%): pre/post
Prior test(s)

Symptoms

Signs

Biomarker/s

USS

NR

 
Histology Number (%)
Benign Number (%)
Endometriosis  
Others  
Tumours of low malignant potential (LMP/borderline) Number (%)
Malignant Number (%)
I Number (%)
II Number (%)
III Number (%)
IV Number (%)

PATIENT SELECTION RISK OF BIAS

PATIENT SELECTION

A. Risk of bias

Describe methods of patient selection:
a) Was a consecutive or random sample of patients enrolled? Yes/No/Unclear
b) Was a case-control design avoided? Yes/No/Unclear

c) Did the study avoid inappropriate exclusions?

a) include all ages and regardless of menopausal status or justify restrictions

b) include all stages of ovarian cancer.

c) include co-morbidities such as infertility and endometriosis

Yes/No/Unclear

Could the selection of patients have introduced bias?

If a) and b) and c) 'YES' = low risk of bias

If a) or b) or c) 'No' = high risk of bias

If a) or b) or c) 'Unclear' = unclear risk of bias

RISK: LOW/HIGH/UNCLEAR
B. Concerns regarding applicability
Describe included patients (prior testing, presentation, intended use of index test and setting):

Is there concern that the included patients do not match the review question?

a) Patients all symptomatic OR symptomatic and asymptomatic can be disaggregated

b) Prior tests primary care: self reported symptoms

c) Prior tests secondary care: self reported symptoms OR self reported symptoms PLUS one or more of biochemical markers and ultrasound

CONCERN: LOW/HIGH/UNCLEAR

Low - a) and b) and c) Yes

High - a) or b) or c) No

Unclear - a) or b) or c) Unclear

INDEX TEST(S) DETAILS

For studies comparing two index tests or testing strategies in different patient populations complete details for each index test or testing strategy (copy and paste table if necessary)

INDEX TEST(S)

Test (note the type of symptom, biomarker, ultrasound variable)

Test threshold (what constitutes abnormal test symptoms, signs, biomarkers or USS: threshold value or fixed value of abnormality)

Clinical setting in which index test performed

If operator was blinded to previous test(s) result

Describe what prior tests information was available to those interpreting index test

Comments

Detail about conduct of index test that might be a source of heterogeneity (e.g. experience of operator (ultrasound, symptoms), type of technology (biomarkers))

- For test combinations test order and rule for combining tests

Symptoms (list)

Yes

No

NR

(Number of symptoms and time element)

Primary/community/family practice

Secondary: hospital/cancer unit/cancer centre/gyn oncologist

NR/U/Mixed

Yes

No

NR

  
Signs

Yes

No

NR

 

Primary/community/family practice

Secondary: hospital/cancer unit/cancer centre/gyn oncologist

NR/U/Mixed

Yes

No

NR

  
Biomarkers

Yes

No

NR

 

Primary/community/family practice

Secondary: hospital/cancer unit/cancer centre/gyn oncologist

NR/U/Mixed

Yes

No

NR

  
USS

Yes

No

NR

 

Primary/community/family practice

Secondary: hospital/cancer unit/cancer centre/gyn oncologist

NR/U/Mixed

Yes

No

NR

  
Combination

Yes

No

NR

 

Primary/community/family practice

Secondary: hospital/cancer unit/cancer centre/gyn oncologist

NR/U/Mixed

Yes

No

NR

 

If a combination of tests (a testing strategy) was used for each participant please detail:

- What combination of tests?

- The order in which tests were performed?

- The rule for combining test results:

E.g. + and + = surgery

E.g. + and - = surgery

E.g. + and - = no surgery

E.g. – and - = no surgery

Etc

or not reported ('NR')

Combination:

Order:

Rule for combining tests:

Comments

INDEX TEST(S)

(If more than one index test was used, please complete for each test).

A1. Risk of bias (symptoms)
Describe the index test and how it was conducted and interpreted:
a) Was the index test or testing strategy result interpreted without knowledge of the results of the reference standard? Yes/No/Unclear
b) If a threshold was used, was it pre-specified? Yes/No/Unclear
Could the conduct or interpretation of the index test have introduced bias?

RISK: LOW/HIGH/UNCLEAR

High - a) or b) No

Low - a) and b) Yes

Unclear - a) or b) Unclear

A2. Risk of bias (ultrasound)
Describe the index test and how it was conducted and interpreted:
a) Was the index test or testing strategy result interpreted without knowledge of the results of the reference standard? Yes/No/Unclear
b) If a threshold was used, was it pre-specified? Yes/No/Unclear
Could the conduct or interpretation of the index test have introduced bias?

RISK: LOW/HIGH/UNCLEAR

High - a) or b) No

Low - a) and b) Yes

Unclear - a) or b) Unclear

A3. Risk of bias (Biomarkers) rule different because objective test in comparison to US and symptom elicitation
Describe the index test and how it was conducted and interpreted:
a) Was the index test or testing strategy result interpreted without knowledge of the results of the reference standard? Yes/No/Unclear
b) If a threshold was used, was it pre-specified? Yes/No/Unclear
Could the conduct or interpretation of the index test have introduced bias?

RISK: LOW/HIGH/UNCLEAR

High - b) No or (a) and b)) No

Low - a) and b) Yes

Unclear - a) or b) Unclear

A4. Risk of bias (within-study combination) 
Describe the index test and how it was conducted and interpreted: 
a) Was the index test or testing strategy result interpreted without knowledge of the results of the reference standard? Yes/No/Unclear
b) If a threshold was used, was it pre-specified? Yes/No/Unclear
c) i) Were symptoms/signs interpreted without knowledge of ultrasound or biomarkers; ii) was ultrasound interpreted without knowledge of biomarkers? Yes/No/Unclear
Could the conduct or interpretation of the index test have introduced bias?

RISK: LOW/HIGH/UNCLEAR

High - a) or b) or c) No

Low - a) and b) or c) Yes

Unclear - a) or b) or c) Unclear

B. Concerns regarding applicability
a) Is the skill of the person performing US and eliciting symptoms detailed? (level of training and/or experience) Yes/No/Unclear/NA
b) Was US performed in all patients by non-specialised sonographers? Yes/No/Unclear
c) Was US performed with knowledge of symptoms/signs/biomarkers? Yes/No/Unclear
Is there concern that the index test, its conduct or interpretation differ from the review question?

CONCERN: LOW/HIGH/UNCLEAR

High - a) or b) or c) No

Low - a) and b) or c) Yes

Unclear - a) or b) Unclear

REFERENCE STANDARD AND TARGET CONDITION DETAIL

REFERENCE STANDARD

Surgery (%)

Follow-up (%) and length of follow-up

TARGET CONDITION

Target conditions are ovarian cancer (see list of different histology of ovarian cancer)

Epithelial ovarian cancer (EOC) Number (%) Comments
Serous  
Mucinous 
Endometrioid 
Clear 
Germ cell tumours 
Stromal cell tumours 
LMP 
  Others (metastasis)  

REFERENCE STANDARD RISK OF BIAS

A. Risk of bias
Describe the reference standard and how it was conducted and interpreted:

a) Is the reference standard likely to correctly classify the target condition?

Index test +ve:

Histology following laparoscopy or laparotomy

Yes/No/Unclear

b) Is the reference standard likely to correctly classify the target condition?

Index test –ve:

Yes - if a follow-up period of at least 12 months is included as required to assess whether the target condition is present

No - if a minimum 12-month follow-up period is absent

Unclear - if no information on follow-up period is included

Yes/No/Unclear
Could the reference standard, its conduct or its interpretation have introduced bias?

RISK: LOW/HIGH/UNCLEAR

High - a) or b) No

Low - a) and b) Yes

Unclear - a) or b) Unclear

DOMAIN 3: REFERENCE STANDARD (continued)

B. Concerns regarding applicability

Is there concern that the target condition as defined by the reference standard does not match the review question?

Yes - ovarian cancer, borderline and metastatic disease are not differentiated (and cannot be for analysis)

No - ovarian cancer, borderline and metastatic disease can be differentiated for analysis

Unclear - unclear if ovarian cancer, borderline and metastatic disease have been disaggregated

CONCERN: Yes/No/Unclear

DOMAIN 4: FLOW AND TIMING

A. Risk of bias

Describe any patients who did not receive the index test(s) and/or reference standard or who were excluded from the 2 x 2 table (refer to study flow diagram):

Describe the time interval and any interventions between index test(s) and reference standard:

a) Was there less than 3 months interval between application of each index test and application of the reference standard? Yes/No/Unclear
b) Did all patients receive a reference standard? Yes/No/Unclear
c) Did all index test -ve patients receive the same reference standard? Yes/No/Unclear
d) Were all patients who underwent testing included in the analysis? Yes/No/Unclear
Could the conduct or interpretation of reference standard have introduced bias?

RISK: LOW/HIGH/UNCLEAR

LOW - a) and b) and c) and d) - Yes

HIGH - a) and b) and c) and d) - No

UNCLEAR - a) and b) and c) and d) - Unclear

COMPARATIVE DOMAIN (if applicable)

A. Risk of bias

Describe the selection process for participants to receive one or other index test or index testing strategy

Describe the time interval and any interventions between index test(s) for within-person test comparisons

a) For studies comparing two or more index tests or testing strategies in different patient populations were the selection criteria for participants receiving one or other index test or testing strategy the same? Yes/No/Unclear/NA

b) For within-study comparisons of index tests:

- was the interval between application of each index test < 3 months

Yes/No/Unclear/NA

c) For within-study comparisons of individual index tests:

- were index tests interpreted blind to the results of other index test results

Yes/No/Unclear/NA

Could the conduct of the comparative study have introduced bias?

LOW - a) OR (b) and c)) - Yes

HIGH - a) OR (b) and c)) - No

UNCLEAR - a) OR (b) or c)) - Unclear

RISK: LOW/HIGH/UNCLEAR
B. Concerns regarding applicability
Describe included patients (prior testing, presentation, intended use of index test and setting):

Is there concern that included patients have been selected in a different way to participants in non-comparative studies?

Low - No

High - Yes

Unclear - Unclear

CONCERN: LOW/HIGH/UNCLEAR

RISK OF BIAS FOR MULTIVARIABLE DIAGNOSTIC MODELLING STUDIES (if applicable)

1. Participant selection DEV Yes/No/Unclear VAL Yes/No/Unclear
a) Were appropriate data sources used, e.g. cohort, RCT or nested case-control study data? DEV Yes/No/Unclear  
b) Were participants enrolled at a similar state of health, or were predictors considered to account for differences?  DEV Yes/No/Unclear VAL Yes/No/Unclear

Could the selection of patients have introduced bias?

HIGH: a) OR a) and b) - YES

LOW: a) OR a) and b) - NO

UNCLEAR: a) OR a) and b) - UNCLEAR

DEV HIGH/LOW/UNCLEAR VAL HIGH/LOW/UNCLEAR
2. Predictors DEV Yes/No/Unclear VAL Yes/No/Unclear
a) Were predictors defined and assessed in a similar way for all participants?   DEV Yes/No/Unclear  
b) Are all predictors available at the time the model is intended to be used DEV Yes/No/Unclear VAL Yes/No/Unclear
c) Were all relevant predictors analysed?: No if symptoms only; No if US index test only; No if combination of index tests (symptoms, US and biomarkers) but miss out US OR Symptom OR FDA approved biomarkers DEV Yes/No/Unclear VAL Yes/No/Unclear

Could the definition, measurement or analysis of predictors have introduced bias?

HIGH: a) OR b) OR c) - YES

LOW: a) OR b) OR c) - NO

UNCLEAR: a) OR b) OR c) - UNCLEAR

DEV HIGH/LOW/ UNCLEAR VAL HIGH/LOW/ UNCLEAR
3. ANALYSIS DEV Yes/No/Unclear VAL Yes/No/Unclear
a) Were there a reasonable number of outcome events? DEV Yes/No/Unclear  
b) Were there a reasonable number of outcome events?   VAL Yes/No/Unclear
c) Were non-binary predictors handled appropriately? DEV Yes/No/Unclear VAL Yes/No/Unclear
d) Was selection of predictors based on univariable analysis avoided? DEV Yes/No/Unclear  
e) Do predictors and their assigned weights in the final model correspond to the results from multivariable analysis? DEV Yes/No/Unclear  
f) For the model or any simplified score, were relevant performance measures evaluated, e.g. calibration, discrimination, (re)classification and net benefit? DEV Yes/No/Unclear VAL Yes/No/Unclear
g) Was the model recalibrated or was it likely (based on the evidence presented, e.g. calibration plot) that recalibration was not needed? DEV Yes/No/Unclear VAL Yes/No/Unclear
h) Was model validation undertaken in individuals other than those in the model development (external validation)?   VAL Yes/No/Unclear

Could the analysis strategy have introduced bias?

HIGH: a) OR b) OR c) OR d) OR e) OR f) OR g) OR h) - YES

LOW: a) OR b) OR c) OR d) OR e) OR f) OR g) OR h) - NO

UNCLEAR: a) OR b) OR c) OR d) OR e) OR f) OR g) OR h) - UNCLEAR

DEV HIGH/LOW/ UNCLEAR VAL HIGH/LOW/ UNCLEAR

TEST ACCURACY DATA

If reported please complete the following 2 x 2 contingency table. For studies investigating the accuracy of more than one index test or testing strategy please complete a 2 x 2 table for each test/testing strategy (cut and paste table as necessary). Imaging test results will be dichotomous.

LOWEST LEVEL OF AGGREGATION:

Fill in data as available.

  REFERENCE STANDARD (ovarian cancer) REFERENCE STANDARD (borderline) REFERENCE STANDARD (benign) 
INDEX TEST/TESTING STRATEGY +ve for ovarian cancer    TOTAL INDEX TEST +ve
INDEX TEST/TESTING STRATEGY +ve for borderline    TOTAL INDEX TEST +ve
INDEX TEST/TESTING STRATEGY +ve for benign    TOTAL INDEX TEST -ve
  DISEASE +ve TOTAL borderline DISEASE -ve TOTAL 'N'
Aggregation borderline +ve TOTAL DISEASE +ve TOTAL DISEASE -ve TOTAL 'N'
Aggregation borderline -ve TOTAL DISEASE +ve TOTAL DISEASE -ve TOTAL 'N'

INSERT ANOTHER MORE DETAILED TABLE WITH SUB-CATEGORIES OF OVARIAN CANCER FOR LOW GRADE AND HIGH GRADE, TYPE 1 AND TYPE 2, EARLY-STAGE AND LATE-STAGE

  REFERENCE STANDARD (early-stage) REFERENCE STANDARD (advanced-stage) 
INDEX TEST/TESTING STRATEGY +ve (early-stage)   TOTAL INDEX TEST +ve
INDEX TEST/TESTING STRATEGY +ve (late-stage)   TOTAL INDEX TEST -ve
  DISEASE +ve DISEASE -ve TOTAL 'N'
  REFERENCE STANDARD (Type 1) REFERENCE STANDARD (Type 2) 
INDEX TEST/TESTING STRATEGY +ve (Type 1)   TOTAL INDEX TEST +ve
INDEX TEST/TESTING STRATEGY +ve (Type 2)   TOTAL INDEX TEST -ve
  DISEASE +ve DISEASE -ve TOTAL 'N'

Appendix 3. QUADAS-2

PATIENT SELECTION RISK OF BIAS

PATIENT SELECTION

A. Risk of bias

Describe methods of patient selection:
a) Was a consecutive or random sample of patients enrolled? Yes/No/Unclear
b) Was a case-control design avoided? Yes/No/Unclear

c) Did the study avoid inappropriate exclusions?

a) include all ages and regardless of menopausal status or justify restrictions

b) include all stages of ovarian cancer

c) include co-morbidities such as infertility and endometriosis

Yes/No/Unclear

Could the selection of patients have introduced bias?

If a) and b) and c) 'YES' = low risk of bias

If a) or b) or c) 'No' = high risk of bias

If a) or b) or c) 'Unclear' = unclear risk of bias

RISK: LOW/HIGH/UNCLEAR
B. Concerns regarding applicability
Describe included patients (prior testing, presentation, intended use of index test and setting):

Is there concern that the included patients do not match the review question?

a) Patients all symptomatic OR symptomatic and asymptomatic can be disaggregated

b) Prior tests primary care: self reported symptoms

c) Prior tests secondary care: self reported symptoms OR self reported symptoms PLUS one or more of biochemical markers and ultrasound

CONCERN: LOW/HIGH/UNCLEAR

Low - a) and b) and c) Yes

High - a) or b) or c) No

Unclear - a) or b) or c) Unclear

INDEX TEST(S)

(If more than one index test was used, please complete for each test).

A1. Risk of bias (symptoms)
Describe the index test and how it was conducted and interpreted:
a) Was the index test or testing strategy result interpreted without knowledge of the results of the reference standard? Yes/No/Unclear
b) If a threshold was used, was it pre-specified? Yes/No/Unclear
Could the conduct or interpretation of the index test have introduced bias?

RISK: LOW/HIGH/UNCLEAR

High - a) or b) No

Low - a) and b) Yes

Unclear - a) or b) Unclear

A2. Risk of bias (ultrasound)

Describe the index test and how it was conducted and interpreted:

a) Was the index test or testing strategy result interpreted without knowledge of the results of the reference standard?

Yes/No/Unclear

b) If a threshold was used, was it pre-specified?

Yes/No/Unclear

Could the conduct or interpretation of the index test have introduced bias?

RISK: LOW/HIGH/UNCLEAR

High - a) or b) No

Low - a) and b) Yes

Unclear - a) or b) Unclear

A3. Risk of bias (biomarkers) rule different because objective test in comparison to US and symptom elicitation

Describe the index test and how it was conducted and interpreted:
a) Was the index test or testing strategy result interpreted without knowledge of the results of the reference standard? Yes/No/Unclear
b) If a threshold was used, was it pre-specified? Yes/No/Unclear
Could the conduct or interpretation of the index test have introduced bias?

RISK: LOW/HIGH/UNCLEAR

High - a) or b) No

Low - a) and b) Yes

Unclear - a) or b) Unclear
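
The High/Low/Unclear rule shared by A1 to A3 above could be applied mechanically during data extraction. The following is a minimal Python sketch, not part of the protocol. The function name `judge_risk` is hypothetical, and the precedence it uses (any 'No' gives HIGH, all 'Yes' gives LOW, anything else gives UNCLEAR) is an assumption, because the form does not say which judgement applies when one answer is 'No' and another is 'Unclear'.

```python
from typing import Iterable

VALID_ANSWERS = {"yes", "no", "unclear"}


def judge_risk(answers: Iterable[str]) -> str:
    """Combine signalling-question answers into one risk-of-bias judgement.

    Assumed precedence (not stated in the form): any 'No' -> HIGH,
    all 'Yes' -> LOW, every other combination -> UNCLEAR.
    """
    normalised = [answer.strip().lower() for answer in answers]
    if not normalised:
        raise ValueError("at least one signalling-question answer is required")
    for answer in normalised:
        if answer not in VALID_ANSWERS:
            raise ValueError(f"unexpected answer: {answer!r}")
    if "no" in normalised:
        return "HIGH"
    if all(answer == "yes" for answer in normalised):
        return "LOW"
    return "UNCLEAR"


# Example: question a) answered 'Yes', question b) answered 'Unclear'.
print(judge_risk(["Yes", "Unclear"]))
```

A reviewer who prefers a different tie-break, for example treating 'No' plus 'Unclear' as UNCLEAR, would only need to reorder the two checks.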

B. Concerns regarding applicability
a) Is the skill of the person performing US and eliciting symptoms detailed (level of training and/or experience)? Yes/No/Unclear/NA
b) Was US performed in all patients by non-specialised sonographers? Yes/No/Unclear
c) Was US performed with knowledge of symptoms/signs/biomarkers? Yes/No/Unclear
Is there concern that the index test, its conduct or interpretation differ from the review question?

CONCERN: LOW/HIGH/UNCLEAR

High - a) or b) or c) No

Low - a) and b) or c) Yes

Unclear - a) or b) Unclear

REFERENCE STANDARD RISK OF BIAS

A. Risk of bias
Describe the reference standard and how it was conducted and interpreted:

a) Is the reference standard likely to correctly classify the target condition?

Index test +ve:

Histology following laparoscopy or laparotomy

Yes/No/Unclear

b) Is the reference standard likely to correctly classify the target condition?

Index test –ve:

Yes - if a minimum follow-up period of at least 12 months is included, as required to assess whether the target condition is present

No - if a minimum 12-month follow-up period is absent

Unclear - if no information on follow-up period is included

Yes/No/Unclear
Could the reference standard, its conduct or its interpretation have introduced bias?

RISK: LOW/HIGH/UNCLEAR

High - a) or b) No

Low - a) and b) Yes

Unclear - a) or b) Unclear

DOMAIN 3: REFERENCE STANDARD (continued)

B. Concerns regarding applicability

Is there concern that the target condition as defined by the reference standard does not match the review question?

Yes - ovarian cancer, borderline and metastatic disease are not differentiated (and cannot be for analysis)

No - ovarian cancer, borderline and metastatic disease can be differentiated for analysis

Unclear - unclear if ovarian cancer, borderline and metastatic disease have been disaggregated

CONCERN: Yes/No/Unclear

DOMAIN 4: FLOW AND TIMING

A. Risk of bias

Describe any patients who did not receive the index test(s) and/or reference standard or who were excluded from the 2 x 2 table (refer to study flow diagram):

Describe the time interval and any interventions between index test(s) and reference standard:

a) Was the interval between application of each index test and application of the reference standard less than 3 months? Yes/No/Unclear
b) Did all patients receive a reference standard? Yes/No/Unclear
c) Did all index test -ve patients receive the same reference standard? Yes/No/Unclear
d) Were all patients who underwent testing included in the analysis? Yes/No/Unclear
Could the conduct or interpretation of reference standard have introduced bias?

RISK: LOW/HIGH/UNCLEAR

LOW - a) and b) and c) and d) - Yes

HIGH - a) and b) and c) and d) - No

UNCLEAR - a) and b) and c) and d) - Unclear

COMPARATIVE DOMAIN (if applicable)

A. Risk of bias

Describe the selection process for participants to receive one or other index test or index testing strategy:

Describe the time interval and any interventions between index test(s) for within-person test comparisons:

a) For studies comparing two or more index tests or testing strategies in different patient populations were the selection criteria for participants receiving one or other index test or testing strategy the same? Yes/No/Unclear/NA

b) For within-study comparisons of index tests:

- was the interval between application of each index test < 3 months?

Yes/No/Unclear/NA

c) For within-study comparisons of individual index tests:

- were index tests interpreted blind to the results of the other index tests?

Yes/No/Unclear/NA

Could the conduct of the comparative study have introduced bias?

LOW - a) OR (b) and c)) - Yes

HIGH - a) OR (b) and c)) - No

UNCLEAR - a) OR (b) or c)) - Unclear

RISK: LOW/HIGH/UNCLEAR
B. Concerns regarding applicability
Describe included patients (prior testing, presentation, intended use of index test and setting):

Is there concern that included patients have been selected in a different way to participants in non-comparative studies?

Low - No

High - Yes

Unclear - Unclear

CONCERN: LOW/HIGH/UNCLEAR

RISK OF BIAS FOR MULTIVARIABLE DIAGNOSTIC MODELLING STUDIES (if applicable)

1. Participant selection DEV Yes/No/Unclear VAL Yes/No/Unclear
a) Were appropriate data sources used, e.g. cohort, RCT or nested case-control study data? DEV Yes/No/Unclear  
b) Were participants enrolled at a similar state of health, or were predictors considered to account for differences? DEV Yes/No/Unclear VAL Yes/No/Unclear

Could the selection of patients have introduced bias?

HIGH: a) OR a) and b) - YES

LOW: a) OR a) and b) - NO

UNCLEAR: a) OR a) and b) - UNCLEAR

DEV HIGH/LOW/UNCLEAR VAL HIGH/LOW/UNCLEAR
2. Predictors DEV Yes/No/Unclear VAL Yes/No/Unclear
a) Were predictors defined and assessed in a similar way for all participants? DEV Yes/No/Unclear  
b) Are all predictors available at the time the model is intended to be used? DEV Yes/No/Unclear VAL Yes/No/Unclear
c) Were all relevant predictors analysed? (No if symptoms only; No if US index test only; No if a combination of index tests (symptoms, US and biomarkers) omits US, symptoms or FDA-approved biomarkers) DEV Yes/No/Unclear VAL Yes/No/Unclear

Could the definition, measurement or analysis of predictors have introduced bias?

HIGH: a) OR b) OR c) - YES

LOW: a) OR b) OR c) - NO

UNCLEAR: a) OR b) OR c) - UNCLEAR

DEV HIGH/LOW/UNCLEAR VAL HIGH/LOW/UNCLEAR
3. Analysis DEV Yes/No/Unclear VAL Yes/No/Unclear
a) Were there a reasonable number of outcome events? DEV Yes/No/Unclear  
b) Were there a reasonable number of outcome events?   VAL Yes/No/Unclear
c) Were non-binary predictors handled appropriately? DEV Yes/No/Unclear VAL Yes/No/Unclear
d) Was selection of predictors based on univariable analysis avoided? DEV Yes/No/Unclear  
e) Do predictors and their assigned weights in the final model correspond to the results from multivariable analysis? DEV Yes/No/Unclear  
f) For the model or any simplified score, were relevant performance measures evaluated, e.g. calibration, discrimination, (re)classification and net benefit? DEV Yes/No/Unclear VAL Yes/No/Unclear
g) Was the model recalibrated or was it likely (based on the evidence presented, e.g. calibration plot) that recalibration was not needed? DEV Yes/No/Unclear VAL Yes/No/Unclear
h) Was model validation undertaken in individuals other than those in the model development (external validation)?   VAL Yes/No/Unclear

Could the analysis strategy have introduced bias?

HIGH: a) OR b) OR c) OR d) OR e) OR f) OR g) OR h) - YES

LOW: a) OR b) OR c) OR d) OR e) OR f) OR g) OR h) - NO

UNCLEAR: a) OR b) OR c) OR d) OR e) OR f) OR g) OR h) - UNCLEAR

DEV HIGH/LOW/UNCLEAR VAL HIGH/LOW/UNCLEAR

TEST ACCURACY DATA

If reported, please complete the following 2 x 2 contingency table. For studies investigating the accuracy of more than one index test or testing strategy, please complete a 2 x 2 table for each test/testing strategy (cut and paste the table as necessary). Imaging test results will be dichotomous.

LOWEST LEVEL OF AGGREGATION:

Fill in data as available.

 | REFERENCE STANDARD (ovarian cancer) | REFERENCE STANDARD (borderline) | REFERENCE STANDARD (benign) |
INDEX TEST/TESTING STRATEGY +ve for ovarian cancer | | | | TOTAL INDEX TEST +ve
INDEX TEST/TESTING STRATEGY +ve for borderline | | | | TOTAL INDEX TEST +ve
INDEX TEST/TESTING STRATEGY +ve for benign | | | | TOTAL INDEX TEST -ve
 | DISEASE +ve | TOTAL borderline | DISEASE -ve | TOTAL 'N'
Aggregation borderline +ve | TOTAL DISEASE +ve | TOTAL DISEASE -ve | TOTAL 'N'
Aggregation borderline -ve | TOTAL DISEASE +ve | TOTAL DISEASE -ve | TOTAL 'N'

INSERT ANOTHER MORE DETAILED TABLE WITH SUB-CATEGORIES OF OVARIAN CANCER FOR TYPE 1 AND TYPE 2

 | REFERENCE STANDARD (early-stage) | REFERENCE STANDARD (advanced-stage) |
INDEX TEST/TESTING STRATEGY +ve (early-stage) | | | TOTAL INDEX TEST +ve
INDEX TEST/TESTING STRATEGY +ve (late-stage) | | | TOTAL INDEX TEST -ve
 | DISEASE +ve | DISEASE -ve | TOTAL 'N'

 | REFERENCE STANDARD (Type 1) | REFERENCE STANDARD (Type 2) |
INDEX TEST/TESTING STRATEGY +ve (Type 1) | | | TOTAL INDEX TEST +ve
INDEX TEST/TESTING STRATEGY +ve (Type 2) | | | TOTAL INDEX TEST -ve
 | DISEASE +ve | DISEASE -ve | TOTAL 'N'
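
To illustrate how counts extracted at the lowest level of aggregation could feed the two borderline aggregations in the test accuracy tables above, here is a minimal Python sketch. It is not part of the protocol: the names `TwoByTwo` and `collapse` are hypothetical, the counts are invented purely for illustration, and the sketch assumes that borderline is reclassified in the same direction for both the index test result and the reference standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TwoByTwo:
    tp: int  # index test +ve, disease +ve
    fp: int  # index test +ve, disease -ve
    fn: int  # index test -ve, disease +ve
    tn: int  # index test -ve, disease -ve

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def collapse(counts: List[List[int]], borderline_positive: bool) -> TwoByTwo:
    """Collapse a 3 x 3 table into a 2 x 2 table.

    counts[i][j]: i is the index test category and j the reference standard
    category, both ordered as 0 = ovarian cancer, 1 = borderline, 2 = benign.
    Assumption: borderline is treated the same way on both axes.
    """
    positive = {0, 1} if borderline_positive else {0}
    tp = fp = fn = tn = 0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if i in positive and j in positive:
                tp += n
            elif i in positive:
                fp += n
            elif j in positive:
                fn += n
            else:
                tn += n
    return TwoByTwo(tp, fp, fn, tn)


# Invented counts, for illustration only (not data from any study).
example = [
    [40, 5, 10],
    [3, 8, 6],
    [2, 4, 122],
]
for label, flag in (("borderline +ve", True), ("borderline -ve", False)):
    table = collapse(example, borderline_positive=flag)
    print(label, table, round(table.sensitivity(), 3), round(table.specificity(), 3))
```

The sketch does not guard against a zero denominator, so it assumes each collapsed table contains at least one disease-positive and one disease-negative participant.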

What's new

Date | Event | Description
21 September 2016 | Amended | Contact details updated.

Contributions of authors

  • Guarantor of the review: SS, JD

  • Conceiving the idea: SS, CD, JD

  • Designing and coordinating the review: NR, CD, SS, JD

  • Designing search strategies: SB, NR, CD, SS, RN

  • Screening, data extraction, quality assessment: NR, RC, CD

  • Obtaining and screening data on unpublished studies: NR, RC

  • Data management of the review: NR

  • Analysis and interpretation of data: SM, KS, NR, SS, CD, JD

  • Writing the review: NR, CD, SS

  • Providing general advice on the review: RN, MB, SK, CD, SS, JD

  • Securing funding for the review: SS, CD, JD

Declarations of interest

This review and the participation of all authors in it have been funded as part of a programme of research (ROCkeTS - Refining Ovarian Cancer Test Accuracy Scores).

Moji Balogun - None known
Susan Bayliss - None known
Rita Champaneria - None known
Clare Davenport - Received funding from the NIHR HTA to support this review in a methodological capacity.
Jon Deeks - This work is a funded project, funded by the NIHR HTA Commissioning Board.
Sean Kehoe - Received payment from Roche and AstraZeneca for lecturing expenses, and Sanofi Pasteur paid expenses for attendance at the EUROGIN 2015 meeting.
Susan Mallett - Co-applicant on one government-funded grant (mpMRI imaging) and one recently submitted grant (circulating DNA) for the diagnosis of ovarian cancer.
Richard Neal - Holds, or has held, grants from a number of bodies including: HTA, NIHR, NISCHR, Cancer Research Wales, Tenovus, Cancer Research UK, Department of Health, Prostate Cancer UK. Has received a fee for lecturing for Manitoba Cancer Care.
Nirmala Rai - My participation in this review is funded by the NIHR grant listed.
Kym Snell - My participation in this review is funded by the NIHR grant listed.
Sudha Sundar - None known

Sources of support

Internal sources

  • None, Other.

External sources

  • National Institute for Health Research (HTA programme: 13/13/01), UK.

Ancillary