Blood CEA levels for detecting recurrent colorectal cancer

  • Review
  • Diagnostic

Abstract

Background

Testing for carcino-embryonic antigen (CEA) in the blood is a recommended part of follow-up to detect recurrence of colorectal cancer following primary curative treatment. There is substantial clinical variation in the cut-off level applied to trigger further investigation.

Objectives

To determine the diagnostic performance of different blood CEA levels in identifying people with colorectal cancer recurrence in order to inform clinical practice.

Search methods

We conducted all searches to January 29 2014. We applied no language limits to the searches, and translated non-English manuscripts. We searched for relevant reviews in the MEDLINE, EMBASE, MEDION and DARE databases. We searched for primary studies (including conference abstracts) in the Cochrane Central Register of Controlled Trials (CENTRAL), in MEDLINE, EMBASE, and the Science Citation Index & Conference Proceedings Citation Index – Science. We identified ongoing studies by searching WHO ICTRP and the ASCO meeting library.

Selection criteria

We included cross-sectional diagnostic test accuracy studies, cohort studies, and randomised controlled trials (RCTs) of post-resection colorectal cancer follow-up that compared CEA to a reference standard. We included studies only if we could extract 2 x 2 accuracy data. We excluded case-control studies, as the ratio of cases to controls is determined by the study design, making the data unsuitable for assessing test accuracy.

Data collection and analysis

Two review authors (BDN, IP) assessed the quality of all articles independently, discussing any disagreements. Where we could not reach consensus, a third author (BS) acted as moderator. We assessed methodological quality against QUADAS-2 criteria. We extracted binary diagnostic accuracy data from all included studies as 2 x 2 tables. We conducted a bivariate meta-analysis. We used the xtmelogit command in Stata to produce the pooled estimates of sensitivity and specificity and we also produced hierarchical summary ROC plots.

Main results

In the 52 included studies, sensitivity ranged from 41% to 97% and specificity from 52% to 100%. In the seven studies reporting the impact of applying a threshold of 2.5 µg/L, pooled sensitivity was 82% (95% confidence interval (CI) 78% to 86%) and pooled specificity 80% (95% CI 59% to 92%). In the 23 studies reporting the impact of applying a threshold of 5 µg/L, pooled sensitivity was 71% (95% CI 64% to 76%) and pooled specificity 88% (95% CI 84% to 92%). In the seven studies reporting the impact of applying a threshold of 10 µg/L, pooled sensitivity was 68% (95% CI 53% to 79%) and pooled specificity 97% (95% CI 90% to 99%).

Authors' conclusions

CEA is insufficiently sensitive to be used alone, even with a low threshold. It is therefore essential to augment CEA monitoring with another diagnostic modality in order to avoid missed cases. Trying to improve sensitivity by adopting a low threshold is a poor strategy because of the high numbers of false alarms generated. We therefore recommend monitoring for colorectal cancer recurrence with more than one diagnostic modality but applying the highest CEA cut-off assessed (10 µg/L).
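The trade-off described above can be made concrete with a small worked calculation. The sketch below takes the pooled sensitivity and specificity reported in the Main results and converts them into expected counts per 1000 people followed up. The recurrence prevalence of 30% is an assumption chosen for illustration (the lower end of the 30% to 50% range quoted in the Background), and the function name `expected_counts` is hypothetical; this is not a calculation performed in the review itself.

```python
# Pooled estimates from the Main results: threshold (µg/L) -> (sensitivity, specificity).
POOLED = {2.5: (0.82, 0.80), 5.0: (0.71, 0.88), 10.0: (0.68, 0.97)}


def expected_counts(sensitivity, specificity, prevalence, cohort=1000):
    """Return expected (TP, FN, FP, TN) counts for a cohort of the given size."""
    recurrences = cohort * prevalence
    non_recurrences = cohort - recurrences
    tp = sensitivity * recurrences      # recurrences flagged by CEA
    fn = recurrences - tp               # recurrences missed by CEA
    tn = specificity * non_recurrences  # correctly reassured
    fp = non_recurrences - tn           # false alarms
    return tp, fn, fp, tn


if __name__ == "__main__":
    prevalence = 0.30  # ASSUMPTION: illustrative recurrence rate, not a review result
    for threshold, (sens, spec) in POOLED.items():
        tp, fn, fp, tn = expected_counts(sens, spec, prevalence)
        print(f"{threshold:>4} µg/L: missed={fn:.0f}  false alarms={fp:.0f}")
```

Under these assumptions, lowering the threshold from 10 µg/L to 2.5 µg/L reduces the expected number of missed recurrences from about 96 to about 54 per 1000, but raises the expected number of false alarms from about 21 to about 140. This is the pattern the conclusions describe: even at the lowest threshold a substantial number of recurrences are missed, while false alarms increase several-fold.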

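Abstract (translated from Traditional Chinese)

Blood CEA levels for detecting recurrent colorectal cancer

Background

Testing for carcino-embryonic antigen (CEA) in the blood is a recommended test for detecting recurrence of colorectal cancer during follow-up after primary treatment. However, there is still substantial clinical variation in the cut-off concentration used to trigger further investigation.

Objectives

To determine the diagnostic performance of different blood CEA levels in identifying recurrence in people with colorectal cancer, in order to inform clinical practice.

Search methods

We conducted all searches to 29 January 2014, applied no language limits, and translated non-English articles. We searched for relevant reviews in the MEDLINE, EMBASE, MEDION and DARE databases. We also searched for primary studies (including conference abstracts) in the Cochrane Central Register of Controlled Trials (CENTRAL), MEDLINE, EMBASE, and the Science Citation Index & Conference Proceedings Citation Index – Science. We also identified ongoing studies by searching WHO ICTRP and the ASCO meeting library.

Selection criteria

We included cross-sectional diagnostic test accuracy studies, cohort studies, and randomised controlled trials (RCTs) of follow-up after colorectal cancer resection that compared CEA to a reference standard. We included only studies from which we could extract 2 x 2 accuracy data. We excluded case-control studies, because their ratio of cases to controls is determined by the study design, which makes the data unsuitable for assessing test accuracy.

Data collection and analysis

Two review authors (BDN, IP) independently assessed the quality of all articles and discussed any disagreements. Where consensus could not be reached, a third author (BS) acted as arbiter. We assessed methodological quality against the QUADAS-2 criteria. We extracted binary diagnostic accuracy data from all included studies as 2 x 2 tables and conducted a bivariate meta-analysis. We used the xtmelogit command in Stata to produce pooled estimates of sensitivity and specificity, and we also produced hierarchical summary receiver operating characteristic (ROC) plots.

Main results

In the 52 included studies, sensitivity ranged from 41% to 97% and specificity from 52% to 100%. Seven studies reported results for a threshold of 2.5 µg/L; pooled sensitivity was 82% (95% confidence interval (CI) 78% to 86%) and pooled specificity was 80% (95% CI 59% to 92%). Twenty-three studies reported results for a threshold of 5 µg/L; pooled sensitivity was 71% (95% CI 64% to 76%) and pooled specificity was 88% (95% CI 84% to 92%). Seven studies reported results for a threshold of 10 µg/L; pooled sensitivity was 68% (95% CI 53% to 79%) and pooled specificity was 97% (95% CI 90% to 99%).

Authors' conclusions

Even with a very low threshold, CEA is not sensitive enough to be used alone. It is therefore essential to augment CEA monitoring with another diagnostic modality in order to avoid missed cases. Trying to improve sensitivity by adopting a low threshold is not a good strategy, because it generates a large number of false alarms. We therefore recommend monitoring for colorectal cancer recurrence with more than one diagnostic modality, but applying the highest CEA cut-off concentration assessed (10 µg/L).

Translation notes

Reviewer: 刁茂盟
Affiliation: Kaohsiung Chang Gung Memorial Hospital, Pediatric Gastroenterology
Position: Director

This translation project is coordinated by Cochrane Taiwan at Taipei Medical University, the Taiwan Evidence-Based Medicine Association, and the East Asian Cochrane Alliance (EACA).
Contact e-mail: cochranetaiwan@tmu.edu.tw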

Plain language summary

Detecting recurrent colorectal cancer by testing for blood carcino-embryonic antigen (CEA).

Background

After surgery for cancer in the colon or rectum (colorectal cancer), most people are intensively followed up for at least five years to monitor for signs of the cancer returning. When this occurs, it usually causes a rise in a blood protein called CEA (carcino-embryonic antigen). An increased level of CEA can be picked up by a blood test, which is normally done every three to six months after colorectal cancer surgery. Those people with raised CEA levels are further investigated by x-ray imaging (usually a scan of the chest, abdomen and pelvis). We conducted this review to help decide what level of blood CEA should lead to further investigation.

Key Results

This review shows that setting a low cut-off point will increase the number of genuine cases of colorectal cancer recurrence that are detected (true positives), but a low cut-off will also cause unnecessary alarm by incorrectly classifying too many cases that are not actually recurrences (false positives). In addition, this review shows that a rise in CEA does not occur in up to 20% of patients with a true recurrence (false negatives). The current evidence supports using the highest cut-off point assessed (10 µg/L), but adding another diagnostic modality (e.g. a single scan of the chest, abdomen and pelvis at 12 to 18 months) is necessary in order to avoid missed cases.

Plain language summary (translated from Croatian)

Detecting recurrence of colorectal cancer by measuring carcino-embryonic antigen (CEA) in the blood

Background

After surgical removal of cancer in the large bowel (colon) or rectum (colorectal cancer), patients need to be followed up carefully for 5 years to detect any return of the disease. Recurrence of the cancer can cause a rise in the blood level of a protein called carcino-embryonic antigen (CEA). In these patients the test is repeated every 3 to 6 months. People with raised CEA concentrations also undergo additional X-ray investigations (images of the chest, abdomen and pelvis). In this Cochrane systematic review, the clinical studies conducted to date were analysed to investigate how CEA concentration bears on further investigation, that is, what CEA level should prompt referral of the patient for further tests.

Key results

Setting a low cut-off value for this test will increase the number of recurrences of colorectal cancer that are detected (true positives), but at the same time some people will be given a false diagnosis of a disease that is not actually present. In 20% of patients with recurrent disease, CEA does not rise (so-called false negatives). The existing evidence indicates that patients should be referred for additional investigation at a high CEA cut-off value (10 µg/L), but also that another diagnostic method is needed (a single scan of the chest, abdomen or pelvis 12 to 18 months after initial treatment) so that cases of recurrence are not missed.

Translation notes

Cochrane Croatia (Hrvatski Cochrane)
Translated by: Vesna Kušec
This summary was translated as part of a volunteer project to translate Cochrane summaries. Join the project and help us translate the many remaining Cochrane summaries that are still available only in English. Contact: cochrane_croatia@mefst.hr

Plain language summary (translated from German)

Detecting recurrent colorectal cancer by blood tests for carcino-embryonic antigen (CEA)

Background

After cancer surgery in the colon or rectum (colorectal carcinoma), most patients are closely monitored for at least five years in order to detect signs of the cancer returning at an early stage. When this happens, it usually causes a rise in a blood protein called CEA (carcino-embryonic antigen). A raised CEA level can be determined by a blood test, which is normally done every 3 to 6 months after colorectal cancer surgery. Patients with raised CEA levels are then examined more closely using imaging (usually a computed tomography (CT) scan of the chest, abdomen and pelvis). We conducted this review to find out at what CEA level closer investigation should follow.

Key results

This review shows that a low cut-off increases the number of genuine cases of recurrent colorectal cancer that are detected (true positives), but also creates unnecessary concern by classifying as suspicious too many cases that are not recurrences (false positives). In addition, this review shows that in up to 20% of patients with an actual recurrence the CEA level is not raised (false negatives). The current evidence supports using the highest cut-off assessed (10 µg/L), but also the need to use a further diagnostic tool (e.g. a CT scan of the chest, abdomen and pelvis after 12 to 18 months) in order to capture all cases.

Translation notes

S. Schmidt-Wussow, approved by Cochrane Switzerland.

Plain language summary (translated from Polish)

Detecting recurrence of colorectal cancer by measuring the blood concentration of carcino-embryonic antigen (CEA).

Background

Most patients who have had surgery for cancer of the colon or rectum (colorectal cancer) are followed up intensively for at least 5 years to monitor for recurrence of the cancer. Recurrence is usually associated with an increased blood concentration of a protein called CEA (carcino-embryonic antigen). An increased CEA concentration can be detected by a blood test, which is usually done every three to six months after colorectal cancer surgery. People found to have an increased CEA concentration undergo further diagnostic work-up by X-ray imaging (usually of the chest, abdomen and pelvis). The aim of this review is to establish at what blood CEA concentration further diagnostic work-up should be carried out.

Key results

The results of this review indicate that setting a low cut-off for CEA concentration will increase the number of correctly identified cases of colorectal cancer recurrence among people who have the disease (true positives), while also increasing the number of recurrences diagnosed in people who do not have one (false positives). In addition, the results indicate that in 20% of patients a true recurrence is not accompanied by a rise in blood CEA concentration (false negatives). Current evidence supports setting a high cut-off for blood CEA concentration (10 µg/L), but additional diagnostic testing is then necessary (e.g. a single CT scan of the chest, abdomen and pelvis within 12 to 18 months after surgery) to detect missed cases of recurrence.

Translation notes

Translation: Mateusz Świerz, Editing: Karolina Moćko

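Plain language summary (translated from Traditional Chinese)

Detecting recurrence of colorectal cancer by testing the blood carcino-embryonic antigen (CEA) level

Background

After surgery for cancer of the colon or rectum (colorectal cancer), most people are followed up intensively for at least five years to monitor for signs of the cancer returning. When this occurs, it usually causes a rise in a blood protein called CEA (carcino-embryonic antigen). A rise in CEA can be detected by a blood test, which is usually done every 3 to 6 months after colorectal cancer surgery. People with raised CEA levels are investigated further by X-ray imaging (usually of the chest, lower abdomen and pelvis). We completed this review to help define which blood CEA level should lead to further investigation.

Key results

This review shows that setting a low cut-off concentration will increase the number of genuine cases of colorectal cancer recurrence that are detected (true positives), but a low cut-off will also cause unnecessary false alarms by incorrectly classifying too many cases that are not actually recurrences (false positives). In addition, this review shows that a rise in CEA does not occur in up to 20% of patients with a true recurrence (false negatives). The current evidence supports using a high cut-off concentration (10 µg/L), but another diagnostic modality must be added (e.g. a single X-ray scan of the chest, lower abdomen and pelvis at 12 to 18 months) to avoid missing cases.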


Background

International guidelines recommend that blood carcino-embryonic antigen (CEA) levels are measured to detect recurrent colorectal cancer (CRC) as part of an intensive follow-up regimen (Duffy 2013b; Labianca 2010; Locker 2006; NCCN 2013; NICE 2011).

A previous Cochrane review (Jeffery 2007) of eight randomised controlled trials (RCTs) (Kjeldsen 1997; Makela 1995; Ohlsson 1995; Pietra 1998; Rodriguez-Moranta 2006b; Schoemaker 1998; Secco 2002; Wattchow 2006) evaluated the impact of follow-up strategy on overall survival and the number of recurrences detected. The analysis included very scant data on CEA; data on overall survival were only available from one trial (odds ratio (OR) 0.57, 95% confidence interval (CI) 0.26 to 1.29) and data on recurrence rate only from two (OR 0.85, 95% CI 0.58 to 1.25). The follow-up strategies implemented in each study were instead broadly classed as either intensive or minimal and the investigative modalities included in each strategy varied greatly between studies. Compared to minimal follow-up, it was estimated that an intensive regimen could significantly reduce five-year all-cause mortality (OR 0.73, 95% CI 0.59 to 0.91).

The validity of this conclusion has been questioned because the mechanism by which a mortality reduction of this magnitude could be achieved by treating asymptomatic recurrence is unclear. There is evidence from one trial that starting chemotherapy for recurrence at an asymptomatic rather than symptomatic stage increases length of survival by a median of five months (Glimelius 1992). There is also observational evidence that surgical resection of metastases when feasible is associated with over 40% survival at five years (Colibaseanu 2013; Gonzalez 2013; Kanas 2012), and one commentator has suggested that advances in chemotherapy, hepatic resection, and multidisciplinary CRC follow-up mean that the clinical benefits of intensive follow-up will be even greater today (Labianca 2010). It is certainly true that there are now a number of well-tolerated effective chemotherapy regimens for recurrent CRC in older populations (Cunningham 2010; Locker 2006). However, the authors of the CEASL (CEA second-look) trial argue that identifying and treating asymptomatic recurrence has the potential to increase overall mortality (Treasure 2014), and the FACS (Follow-up After Colorectal Surgery) trial suggests that the effect of follow-up on absolute mortality is much smaller than that suggested by the 2007 review (Primrose 2014).

Nevertheless, the FACS trial has re-awakened interest in CEA follow-up. It showed that measuring blood CEA three- to six-monthly for five years, augmented by a single CT (computed tomography) scan at 12 to 18 months, leads to earlier diagnosis of recurrence and increases by about three-fold the proportion of recurrences that can be treated with curative intent (Primrose 2014). As CEA monitoring does not involve x-rays, it can be done in the community, and is potentially more cost-effective than CT imaging. The FACS trial result has raised substantial interest in CEA as a first-line follow-up modality.

CEA is a glycoprotein involved in cell adhesion produced during foetal development. Production usually ceases at birth, but elevated levels can be detected in people with colorectal, breast, lung and pancreatic cancer, in smokers, and in people with benign conditions such as cirrhosis of the liver, jaundice, diabetes, pancreatitis, chronic renal failure, colitis, diverticulitis, irritable bowel syndrome, pleurisy and pneumonia (Newton 2011; Sturgeon 2009). Prior to first diagnosis, CEA levels may rise between four and eight months before the development of cancer-related symptoms (Goldstein 2005). Approximately 90% of colorectal cancers produce CEA (Dallas 2012). Predicting those people who do not secrete CEA is a challenge, with conflicting reports regarding whether well- or poorly-differentiated tumours are associated with increased secretion (Davidson 1989). During follow-up, CEA appears to be most sensitive for detecting hepatic and retroperitoneal metastases, and is least sensitive for local recurrences and peritoneal or pulmonary disease (Scheer 2009; Tsikitis 2009). However, CEA needs to be seen as a triage test (where a rise should lead to further investigation rather than initiation of therapy), as it gives no information about the location and extent of recurrence (Duffy 2013b).

Although serial CEA measurements are taken during follow-up, the decision to investigate further with imaging is usually based on a single elevated CEA measurement (although a repeat blood test is often done to confirm the raised level). An absolute threshold somewhere between 3 and 7 µg/L is typically used to trigger further investigation. In the FACS trial, the threshold used was based on the difference of the CEA level at a single time point from the postoperative baseline (Primrose 2014).

The most recent systematic review exploring the accuracy of CEA for diagnosing recurrent CRC includes a meta-analysis of 20 studies (Tan 2009). These studies implemented a wide range of thresholds (3 to 15 µg/L) and measured CEA using a variety of test kits. The pooled estimates of sensitivity and specificity were 64% (95% CI 61% to 67%) and 90% (95% CI 89% to 91%) respectively. The pooled area under the curve (AUC) was 0.79 (standard error = 0.054). A subgroup analysis of four studies that reported accuracy at a threshold of 3 µg/L gave an improved sensitivity of 73% (95% CI 69% to 77%) but at the expense of a reduced specificity of 68% (95% CI 65% to 72%). Based on a metaregression analysis, the authors suggest that a cut-off of 2.2 µg/L provides the ideal balance between sensitivity and specificity, but this is based on extrapolation beyond the data analysed, as the lowest threshold applied in any included study was 3 µg/L. We were also unable to identify some of the data included in the analysis from the published studies.

Target condition being diagnosed

Colorectal cancer is globally the third most common cancer, accounting for 9.8% of all detected cancers. In 2008, the age-standardised incidence rate was 17.3 cases per 100,000 (30.1 in high-income countries and 10.7 in low- or middle-income countries) (Ferlay 2013).

Colorectal adenocarcinoma arises in the colonic mucosa and progressively invades through the layers of bowel wall into surrounding structures, leading to peritoneal, neural, lymphatic and haematological metastasis (Gore 1997). This process provides the basis of the internationally recognised TNM (tumour node metastasis) staging system (Sobin 2009) and the earlier Dukes classification (Dukes 1932). The first site of haematological metastasis is the liver via the portal vein, after which distant metastasis occurs most commonly in the lungs but also in the bones and brain (Guthrie 2002). Prognosis is closely related to stage, with higher-grade metastatic tumours having a poorer prognosis (Maringe 2013). Approximately two-thirds of patients will present with a primary CRC amenable to radical surgery (Jeffery 2007).

Following surgery, however, 30% to 50% of patients will develop recurrence (Labianca 2010), although the results of the FACS trial suggest that perhaps half these cases result from inadequate preliminary staging and might have been detectable through more rigorous investigation at the time of primary treatment (Primrose 2014). The most common site for recurrence is the liver, followed by the lungs, but it can also occur in the abdomen and pelvis (Cunningham 2010; Jeffery 2007).

As stated in the Background, the effectiveness of treatment of recurrence is a matter of hot debate (Godlee 2014; Treasure 2014). In the absence of trials of treatment versus no treatment, most estimates of impact are based on observational data. Patients undergoing secondary surgery with curative intent have a median survival time of 35.8 to 84.8 months. Chemotherapy has been estimated to prolong life by one to two years (Arriola 2006; Cunningham 2010; Tsikitis 2009). However, apart from the Nordic trial showing that the initiation of chemotherapy at an asymptomatic stage increases survival (Glimelius 1992), there is no evidence from trials to confirm that treatment of early-diagnosed asymptomatic recurrence improves survival or other outcomes. There is a need therefore to determine the most accurate means of detecting early-stage recurrence before the impact of treatment strategies can be further explored.

Index test(s)

CEA is a relatively simple and low-cost biomarker that can be detected by a blood test. The analysis of CEA in clinical studies utilises the technique of immunoassay in a variety of forms and from a number of different manufacturers. Earlier methods were manual immunoassays, such as radio-immunoassay, but most laboratories now use fully automated non-isotopic methods. The reproducibility of these fully automated methods is generally superior to that of the older manual methods. Unfortunately, the details of the methods used in clinical studies and their analytical performance are often lacking (Wild 2013).

Data from external quality assessment schemes have repeatedly shown good precision for most methods at low CEA concentrations. In 2010, within-laboratory precision over a 12-month period at a concentration of 3 µg/L (equivalent to 54 U/L) was less than 9% on average for all major methods. A greater analytical challenge is the difference in method bias (Wild 2013). Despite the availability of an international reference preparation (IRP 73/601) since 1975 and its widespread use in commercial assays since the early 1990s, method bias may still be ± 20%, and the degree of this bias is often sample-dependent (Bormer 1991; Laurence 1975). CEA has a complex molecular structure and the antibodies used in the immunoassays recognise different epitopes of the molecule, which is considered to be a major source of method bias (Bormer 1991). Consequently, the interpretation of data from clinical studies, especially the use of any particular threshold, needs to take account of the actual method used. Due to the good reproducibility but significant method-dependent bias, it is advised that the same assay technique should be used throughout any follow-up period (Duffy 2013b).
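To see why method bias matters for a fixed threshold, the following sketch shows how one sample could be classified differently by two assays. It assumes, purely for illustration, that bias acts as a constant multiplicative factor of up to ± 20%; the text notes that real bias is often sample-dependent, so this is a simplification. The function names are hypothetical.

```python
def reported_range(true_concentration, relative_bias=0.20):
    """Range of readings (µg/L) that different methods might report for one sample."""
    low = true_concentration * (1 - relative_bias)
    high = true_concentration * (1 + relative_bias)
    return low, high


def may_disagree(true_concentration, threshold, relative_bias=0.20):
    """True if some methods could read at or below the threshold and others above it.

    ASSUMPTION: a result counts as 'raised' when the reading exceeds the threshold.
    """
    low, high = reported_range(true_concentration, relative_bias)
    return low <= threshold < high


if __name__ == "__main__":
    for concentration in (4.0, 5.5, 8.0):  # hypothetical sample concentrations
        low, high = reported_range(concentration)
        verdict = "could disagree" if may_disagree(concentration, 5.0) else "agree"
        print(f"{concentration} µg/L -> readings {low:.1f} to {high:.1f}: methods {verdict}")
```

In this simplified model, a sample near the threshold can be classified as raised by one method and normal by another, which is consistent with the advice to use the same assay technique throughout a patient's follow-up.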

Clinical pathway

Following radical surgery (with or without adjuvant therapy), there is wide variation in the recommended intensive follow-up regimen (Duffy 2013b; Labianca 2010; Locker 2006; NCCN 2013; NICE 2011).

The European Society for Medical Oncology (ESMO) recommends history, physical examination, and CEA determination every three to six months for the first three years, and every six to 12 months in years four and five. A colonoscopy is recommended at one year, then every three to five years looking for metachronous adenomas and cancers. A CT scan of the chest and contrast-enhanced ultrasound scan (USS) or CT scan of the abdomen is recommended every six to 12 months for the first three years in patients considered to be at higher risk. Other laboratory and radiological examinations are not recommended unless patients have suspicious symptoms (Labianca 2010).

The American Society of Clinical Oncology (ASCO) recommends that CEA testing is performed every three months for the first three years in patients with stage II or III disease if the patient is a candidate for surgery or systemic therapy, and that raised CEA levels (> 5 µg/L, confirmed by a repeat test) warrant further evaluation for metastatic disease (Locker 2006). Unlike ASCO, ESMO does not specify a threshold nor limit testing to specific tumour stages. The European Group on Tumour Markers (EGTM) specifies CEA measurement at baseline and then every two to three months for three years, then six-monthly for five years in patients with stage II to III disease who would tolerate further surgery or systemic therapy. EGTM recommends that any increase in CEA (confirmed by a repeat test) should trigger further investigations (Duffy 2013b).

The National Institute for Health and Clinical Excellence (NICE) recommended follow-up from four to six weeks following curative treatment, for all patients who could tolerate and accept the balance of risk and benefits of further treatment, including CEA measurement at least every six months in the first three years, two CT scans of the chest and abdomen in the first three years, and colonoscopy at one year and five years (NICE 2011).

Once recurrence is suspected on the basis of a raised CEA level, patients then undergo further diagnostic testing to confirm recurrence (Duffy 2013a). The modality used to provide a definitive diagnosis is usually either CT or USS, but could also be clinical assessment, colonoscopy, flexible sigmoidoscopy and barium enema, CT colonography, positron emission tomography–computed tomography (PET-CT), or magnetic resonance imaging (MRI).

Prior test(s)

As detailed above, CEA is often the most frequently undertaken modality within an intensive follow-up regimen. Prior testing in this context is irrelevant, because CEA is measured routinely within intensive follow-up programmes.

Role of index test(s)

As a triage test to prompt further investigation for CRC recurrence.

Alternative test(s)

Circulating tumour cells and cytokeratins have been examined as possible biomarkers of CRC recurrence, but the studies are few and limited. CA125 is regarded as an emerging biomarker for use in postoperative follow-up, but as yet evidence is limited (Duffy 2013b; Newton 2011). CT imaging is the only other test that meta-analysis suggests has potential to detect metastatic recurrence amenable to resection, but it is more expensive than measuring blood CEA. PET-CT is used in some centres, but will only be preferred to standard CT for routine follow-up if future evidence suggests much superior performance. Endoscopic imaging (colonoscopy) is routinely used as an adjunct to CEA or CT imaging or both in follow-up care to detect metachronous polyps or cancer (and rarely intraluminal recurrence). Clinical and ultrasound examination lack sensitivity. MRI can realistically be applied only to the liver and lacks strong evidence of effectiveness in detecting recurrence.

Rationale

This diagnostic test accuracy (DTA) review aims to clarify the accuracy of blood CEA as a triage test for CRC recurrence. If found to be sufficiently accurate, CEA could be a cost-effective means of reducing unnecessary, more expensive investigations.

Objectives

To determine the diagnostic performance of different blood CEA levels in identifying people with colorectal cancer recurrence in order to inform clinical practice.

Secondary objectives

To identify sources of between- and within-study heterogeneity to inform future study designs.

Methods

Criteria for considering studies for this review

Types of studies

We included cross-sectional diagnostic test accuracy studies, cohort studies, and RCTs that directly compared follow-up after CRC resection using CEA to a reference standard. We included studies only if we could extract 2 x 2 accuracy data. We excluded case-control studies, as the ratio of cases to controls is determined by the study design, making the data unsuitable for assessing test accuracy.
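One consequence of a design-determined case-to-control ratio can be shown with a short calculation. The sketch below holds a test's sensitivity and specificity fixed and changes only the fraction of cases in the sample; the predictive values shift accordingly, so they reflect the sampling scheme and not the clinical population. The accuracy values and the function name are hypothetical, and this illustrates only one of the problems with case-control data, not the review's full reasoning.

```python
def predictive_values(sensitivity, specificity, case_fraction):
    """Return (PPV, NPV) implied by fixed accuracy and the fraction of cases sampled."""
    tp = sensitivity * case_fraction
    fn = (1 - sensitivity) * case_fraction
    fp = (1 - specificity) * (1 - case_fraction)
    tn = specificity * (1 - case_fraction)
    return tp / (tp + fp), tn / (tn + fn)


# ASSUMPTION: hypothetical accuracy, held fixed while the case:control ratio changes.
SENS, SPEC = 0.70, 0.90

for label, fraction in [("1 case : 1 control", 0.5), ("1 case : 4 controls", 0.2)]:
    ppv, npv = predictive_values(SENS, SPEC, fraction)
    print(f"{label}: PPV={ppv:.3f}  NPV={npv:.3f}")
```

With these inputs the positive predictive value falls from 0.875 to about 0.636 simply because the investigators sampled more controls per case, which is why predictive values from such designs cannot be transferred to a follow-up cohort.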

Participants

Participants were adults with no detectable residual disease after primary treatment with surgical resection (with or without adjuvant therapy) who were being followed up for recurrence.

Index tests

Blood carcino-embryonic antigen (CEA).

Target conditions

Recurrence of colorectal cancer following curative resection, including locoregional recurrence and metastatic disease.

Reference standards

  1. Imaging done per protocol or to investigate for suspected recurrence (usually CT, MRI or PET-CT, but also endoscopy, CT colonography, ultrasound, and barium enema).

  2. The histological confirmation of recurrence following surgery or tissue biopsy.

  3. Routine clinical follow-up used as a reference standard to confirm negative index test values where imaging is not indicated as part of the follow-up schedule (standard protocols run for three to five years).

We had hoped to compare the results of using these different reference standards in a sensitivity analysis. However, the majority of studies (73%) reported a composite reference standard, including more than one of the three reference standards listed above, as part of a prespecified clinical pathway and so the specific reference standard applied varied between participants within the same study. Without individual patient data, identifying the exact investigative modality applied as the reference standard was not possible and so we did not conduct the planned sensitivity analysis.

We classified the chosen reference standard (or composite reference standard) used in each study as 'appropriate' (1 to 3 above), 'inappropriate' (a reference standard not included in 1 to 3 above), or 'not stated' for further subgroup analysis.

There were insufficient data available to classify deaths during follow-up as 'death from CRC', 'death with CRC', 'death from other causes', or 'death unspecified', as detailed in the original protocol.

Search methods for identification of studies

Electronic searches

Our information specialist (NR, trained in Cochrane DTA methodology) designed our search strategy, and conducted all searches to January 29 2014. We applied no language limits to the searches, and translated non-English manuscripts to assess suitability for inclusion.

We searched for relevant reviews in the MEDION database (www.mediondatabase.nl), using the search terms 'cea' OR 'carcinoembryonic' OR 'carcino-embryonic' and restricting to Malignancy OR Digestive. Using the same terms, we also searched MEDLINE (OvidSP) [1946 to current, In-process], and EMBASE (OvidSP) [1974 to current] using the Reviews Clinical Query, and the DARE database (the Cochrane Library, Wiley).

We searched for primary studies (including conference abstracts) in the Cochrane Central Register of Controlled Trials (CENTRAL), the Cochrane Library, Wiley (Appendix 1), MEDLINE (OvidSP) [1946 to current, In-process] (Appendix 2), EMBASE (OvidSP) [1974 to current] (Appendix 3), and the Science Citation Index & Conference Proceedings Citation Index - Science (Web of Science, Thomson) [1945 to current] (Appendix 4).

We identified ongoing studies by searching WHO ICTRP (apps.who.int/trialsearch/) using the following search terms: (Condition = (colorectal cancer OR colon cancer OR colorectal neoplas* OR colon neoplas* OR rectal cancer OR rectal neoplas*) AND Intervention = (cea OR Carcinoembryonic Antigen OR carcinoembryonic antibod*)), and by searching ClinicalTrials (clinicaltrials.gov) using the following search terms: (Condition = (colorectal cancer OR colon cancer OR colorectal neoplas* OR colon neoplas* OR rectal cancer OR rectal neoplas*) AND Intervention = (cea OR Carcinoembryonic Antigen OR carcinoembryonic antibod*)).

We conducted an additional search of the ASCO meeting library (meetinglibrary.asco.org/) for conference abstracts using the following search terms: (Title word search: "cea" OR "carcinoembryonic antigen" OR "carcinoembryonic antigen").

Searching other resources

Following the search of bibliographic databases, we checked reference lists of retrieved reviews and all included studies. In addition, we performed a 'Related articles' search on PubMed on all included studies.

In the protocol, we stated we would contact the principal investigators of all included studies to identify further relevant literature, clarify methodological queries if they exist and to ask for any unpublished data relevant to this review. Unfortunately, due to time constraints and the large number of studies included in our review, we were not able to do this.

Data collection and analysis

Selection of studies

To identify relevant studies, two review authors (BDN and IP) scanned all titles and excluded those studies clearly not relevant to the topic of CEA for the detection of CRC recurrence. Following this, the same two review authors (BDN, IP) independently assessed both the titles and abstracts of the selected studies and retrieved the full-text articles for those deemed to be relevant and for those where a decision could not be made on the basis of the title and abstract alone.

We assessed the remaining full-text articles to see whether 2 x 2 accuracy data were available and, if so, we included the study in the review and carried out full data extraction. Reasons for exclusions are detailed in Figure 1. A third review author (BS) resolved any disputes over which references should be included.

Data extraction and management

Full data extraction was guided by a background information sheet describing how each item should be interpreted. Two review authors piloted and refined this form, using three initial studies. A third review author resolved any disagreements over extracted data.

We extracted data into an Excel spreadsheet under the following headings: author, year, title, country, study design, setting, dates of data collection, population (n), inclusion criteria, exclusion criteria, included participants (n), age, smoking status, site of primary tumour, stage/grade of primary tumour, investigations done to ensure no residual disease, chemotherapy/radiotherapy, follow-up schedule, cases of recurrence (n), CEA timing, CEA technique, CEA threshold, reference standard, timing of CEA versus reference standard, true positives (TP), false positives (FP), true negatives (TN), false negatives (FN), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), AUC, QUADAS-2 items (including CEA laboratory technique, Appendix 5).
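The accuracy measures listed above all follow from the four cells of a 2 x 2 table. As an illustration only, the standard definitions can be sketched as follows; the class name and the example counts are hypothetical and are not taken from any included study or from the review's extraction form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one study's 2 x 2 table at a single CEA threshold."""
    tp: int  # CEA positive, recurrence confirmed by the reference standard
    fp: int  # CEA positive, no recurrence
    fn: int  # CEA negative, recurrence confirmed
    tn: int  # CEA negative, no recurrence

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=30, fp=10, fn=10, tn=50)
print(round(example.sensitivity(), 2), round(example.specificity(), 2))
print(round(example.ppv(), 2), round(example.npv(), 2))
```

The sketch does not guard against empty rows or columns (for example, a study with no CEA-positive patients), where the corresponding measure is undefined.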

In the protocol we stated we would contact authors if data were not available, but due to time constraints we were not able to do this.

Assessment of methodological quality

QUADAS-2 is a generic set of criteria for assessing the quality of diagnostic accuracy studies. It consists of four key domains: patient selection, index test, reference standard, and the flow of patients through the study and timing of the index test in relation to the reference standard. Signalling questions are provided to guide judgement of the risk of bias across these four domains (Whiting 2011).

We modified QUADAS-2 to exclude items not applicable to this review. A guide to the operational definitions for the modified QUADAS-2 items can be found in Appendix 5.

We included additional questions regarding index test repetition (4.A.1) and CEA laboratory technique (2.A.2 to 2.A.4). We modified "Was there an appropriate interval between index test(s) and reference standard?" (Yes/No/Unclear) to instead read "4.A.2. Was the timing between index test(s) and reference standard ascertainable?" (Yes/Unclear). We also modified "Did all patients receive a reference standard?" to instead read "Did all included patients who had at least one CEA measurement receive a reference standard?". We removed "Was a case-control design avoided?" from the original QUADAS-2 template as we excluded all case-control studies. We also removed "Were the index test results interpreted without knowledge of the results of the reference standard?" as knowledge of the reference test result would not bias the interpretation of a positive or negative CEA result, as CEA is an objective test using a predetermined dichotomous threshold.

For the index test domain, items were weighted so that the use of a prespecified threshold and a consistent method for CEA measurement had more influence on the overall judgement than the items regarding estimation of method reproducibility and indication of method accuracy. We made this decision as the latter two items were very rarely reported.

For the reference standard domain, items were weighted so that correctly classifying recurrent CRC had more influence on the overall judgement than whether the reference standard was interpreted without the knowledge of the index test. We made this decision as there were no blinded studies included in the review.

For the flow and timing domain, the five items were weighted so that the inclusion of all patients in the final analysis had the most influence and everyone receiving a reference standard was second most influential. Repetition of the index test prior to the reference standard, ascertainable timing between the index test and reference standard, and all patients receiving the same reference standard were weighted equally lower.

Signalling questions weighted as high priority determined the overall rating within each domain.

Two review authors (BDN, IP) assessed the quality of all articles independently, discussing any disagreements. Where they could not reach consensus, a third author (BS) acted as moderator. We used the results of the quality assessment for descriptive purposes to provide an evaluation of the overall quality of the included studies and to investigate potential sources of heterogeneity.

Statistical analysis and data synthesis

We used descriptive statistics to present summary data for each included study. The Characteristics of included studies tables detail patient sample, study design, CEA technique, follow-up characteristics and the CEA threshold(s) at which accuracy was reported. We extracted binary diagnostic accuracy data from all included studies as 2 x 2 tables. We present the risk of bias results for each of the four domains of the QUADAS-2 assessment graphically as described by Whiting 2011.

Inferential statistics were guided by Chapter 10 of the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (Macaskill 2010).

We used Review Manager 5 to produce forest plots showing the variability of sensitivity and specificity across primary studies, with corresponding 95% confidence intervals for visual comparison. For studies reporting more than one threshold, we extracted 2 x 2 data for all thresholds. We plotted sensitivity and specificity estimates from each study in ROC space, using the inverse standard error of each estimate to adjust the size of each box to represent precision. For both of these graphs, we included sensitivity and specificity at the threshold closest to 5 µg/L (the most commonly reported threshold). We did not conduct a meta-analysis across all of the included studies, as we had a sufficient number of studies to carry out meta-analyses at specific thresholds (see next section), which is clinically more informative.
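To make the two plotted quantities concrete, the sketch below computes a 95% interval for a single proportion (sensitivity or specificity) and the inverse standard error that could be used to scale a box. The interval method applied by Review Manager is not stated here, so the Wilson score interval is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def inverse_standard_error(successes: int, total: int) -> float:
    """1 / SE of a proportion; infinite when the proportion is exactly 0 or 1."""
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)
    return math.inf if se == 0 else 1 / se


# Hypothetical study: 30 of 40 recurrences were CEA positive (sensitivity 75%).
low, high = wilson_interval(30, 40)
print(round(low, 3), round(high, 3), round(inverse_standard_error(30, 40), 1))
```

A study reporting 100% specificity has a standard error of zero on this scale, which is why the sketch returns infinity rather than dividing by zero; a plotting routine would need to cap the box size in that case.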

We used the bivariate model to perform meta-analysis of sensitivity and specificity (Reitsma 2005). We conducted analyses using the xtmelogit command in Stata (Takwoingi 2013).
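For orientation, one common way of writing the bivariate model in the binomial form that a mixed-effects logistic regression command would fit is given below. The notation is ours and is an assumption for illustration: for study i, n_{1i} is the number of patients with recurrence and n_{0i} the number without.

```latex
\begin{gathered}
\mathrm{TP}_i \sim \mathrm{Binomial}\bigl(n_{1i},\ \mathrm{Se}_i\bigr), \qquad
\mathrm{TN}_i \sim \mathrm{Binomial}\bigl(n_{0i},\ \mathrm{Sp}_i\bigr), \\[4pt]
\begin{pmatrix} \operatorname{logit}(\mathrm{Se}_i) \\ \operatorname{logit}(\mathrm{Sp}_i) \end{pmatrix}
\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{A} \\ \mu_{B} \end{pmatrix},\
\begin{pmatrix} \sigma_{A}^{2} & \sigma_{AB} \\ \sigma_{AB} & \sigma_{B}^{2} \end{pmatrix}
\right)
\end{gathered}
```

Under this formulation the pooled sensitivity and specificity are the inverse logits of \mu_A and \mu_B, and the covariance term \sigma_{AB} captures the trade-off between sensitivity and specificity across studies.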

We estimated the absolute numbers of false alarms (false positives) and missed cases (false negatives) per 1000 patients tested for each three-monthly testing interval by applying the pooled sensitivity and specificity derived from this review to: 1) the observed median reported prevalence of recurrence divided by 15 (national guidance is to conduct 14 to 15 CEA tests during follow-up); 2) the incidence of recurrence data per follow-up period reported by Sargent 2007 (as in reality the proportion developing recurrence between tests is not constant but falls over time).
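The first of these calculations can be sketched as follows. This is a minimal illustration, assuming the 30% observed prevalence and the rounded pooled estimates reported in the Findings section; the function name is hypothetical.

```python
def per_1000(sensitivity: float, specificity: float, prevalence: float, n: int = 1000) -> dict:
    """Expected detected cases, missed cases and false alarms among n patients tested."""
    with_recurrence = n * prevalence
    without_recurrence = n - with_recurrence
    return {
        "detected": with_recurrence * sensitivity,
        "missed": with_recurrence * (1 - sensitivity),
        "false_alarms": without_recurrence * (1 - specificity),
    }


# A 30% prevalence spread evenly over 15 tests gives about 2% per testing interval.
prevalence_per_test = 0.30 / 15

# Rounded pooled estimates at 2.5 µg/L: sensitivity 82%, specificity 80%.
result = per_1000(0.82, 0.80, prevalence_per_test)
print({key: round(value) for key, value in result.items()})
# → {'detected': 16, 'missed': 4, 'false_alarms': 196}
```

Because the inputs here are rounded to whole percentages, results at other thresholds may differ by one case from figures calculated with unrounded pooled estimates.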

Investigations of heterogeneity

Based on the results of the quality assessment, we determined the following most likely sources of heterogeneity: effect of CEA threshold, whether a single CEA measurement or serial measurements were evaluated, and the laboratory techniques employed.

For each subgroup analysis, we conducted bivariate meta-analyses (Reitsma 2005), using the xtmelogit command in Stata to produce pooled estimates of sensitivity and specificity. Summary ROC plots and forest plots are reported to provide a basic picture of between-study variability in these accuracy estimates.

CEA Threshold

For tests producing a continuous outcome, the threshold at which a positive result is defined directly impacts on the accuracy of the test. The use of different thresholds between studies is therefore a key source of heterogeneity.

We investigated the effect of threshold by carrying out subgroup meta-analyses for thresholds where sufficient data were available. As some studies reported 2 x 2 data for more than one threshold, this analysis allowed us to include all of the available data. We used Review Manager 5 to produce a forest plot showing the variability of sensitivity and specificity across primary studies at specific thresholds.

Although the original plan was to apply a meta-analysis method incorporating more than one 2 x 2 table from a single study (Hamza 2009), this method requires data to be reported at consistent thresholds across all included studies, and this was not the case in our review.

Timing of CEA Measurement

Despite sequential CEA measurements being taken in the majority of studies, 2 x 2 data were not reported for each scheduled measurement in any of these studies.

Some studies provided 2 x 2 data for the CEA measurement taken closest to the time point at which recurrence was detected or, for patients who did not experience recurrence, their final follow-up measurement. Others looked across all of the measurements available for each individual to assess whether any of the sequential measurements had crossed the threshold during the entire follow-up period. This approach meant the time interval between a rise in CEA and confirmed recurrence was variable across individuals within the same study, but this interval was not reported in any study. Consequently, we classified a patient without confirmed recurrence during the follow-up period and at least one measurement above the threshold as a false positive in the 2 x 2 table, and a patient with confirmed recurrence but without any CEA rise above the threshold as a false negative.

As this information was not consistently reported in all studies, we could not include this variable in the metaregression analysis. Instead, we explored whether this had a significant impact on accuracy by carrying out a subgroup analysis on those studies that did provide this information. This analysis was also limited to studies reporting accuracy at 5 µg/L (the most commonly reported threshold) to avoid any threshold effects.

Laboratory Technique

The intention was to carry out subgroup analyses on studies using the same laboratory technique in order to assess the effect of technique on accuracy. However, given that so few studies provided sufficient detail regarding the laboratory technique employed, this was not possible. We were interested in exploring whether the implementation of IRP 73/601 reduced between-study variability in sensitivity and specificity. We therefore used the information provided in each study to assess whether laboratory methods predated the introduction of IRP (e.g. manual radioimmunoassay (RIA) and immunoradiometric assay (IRMA) methods) and whether the samples were analysed pre-1992. We then carried out a subgroup analysis and compared the widths of the 95% confidence intervals for the pooled estimates of sensitivity and specificity. We again limited this analysis to those studies reporting accuracy at 5 µg/L to avoid threshold effects.

Sensitivity analyses

To explore whether study quality biased the sensitivity and specificity of CEA, we planned a subgroup analysis to include those studies which had a low risk of bias across all four domains. We also carried out a metaregression analysis using the 'Metadas' macro in SAS, including all of the four domains as ordinal covariates (low risk, unclear, high risk).

Assessment of reporting bias

As described in the protocol and by Van Roon 2011, investigation of publication bias in DTA studies is known to be problematic, and so we have not included assessment of reporting bias in this review (Deeks 2005; Leeflang 2008; Song 2002).

Results

Results of the search

Figure 1 summarises the studies that we identified, screened and selected for this review. Our search resulted in 6782 hits, including 6571 primary studies, 128 reviews, 46 conference abstracts, and 37 registered trials. We identified 45 additional articles by checking the reference lists of retrieved reviews and by performing a 'Related articles' search in PubMed. We removed duplicates (n = 3016), leaving 3811 records for title and abstract screening. Of these, we requested 268 full-text articles for review, of which we excluded 216 (see Figure 1 for reasons for exclusion). Fifty-two studies met our inclusion criteria and are included in the final review.
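The counts in this paragraph are internally consistent, as the following arithmetic check shows (all figures are taken from the text above):

```latex
\begin{aligned}
6571 + 128 + 46 + 37 &= 6782 && \text{(database and register hits)} \\
6782 + 45 - 3016 &= 3811 && \text{(records screened after adding 45 and removing duplicates)} \\
268 - 216 &= 52 && \text{(full-text articles included)}
\end{aligned}
```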

Figure 1.

PRISMA flow diagram: results of the search for studies evaluating the diagnostic accuracy of blood CEA to detect recurrent colorectal cancer in patients following curative resection.

Included studies

Prevalence

Included studies were published between 1974 and 2014 and were conducted across 22 countries. All studies were conducted in secondary care, except one Norwegian prospective study (Johnson 1985) in which follow-up was conducted in both primary and secondary care. In total, 9717 patients were included, and 2951 recurrences detected. The median number of participants in the studies was 139 (interquartile range (IQR): 72 to 247) and the proportion of recurrences detected ranged from 13.5% (Fezoulidis 1987) to 72.3% (Ochoa-Figueroa 2012) (median: 29.5%, IQR: 24.3 to 36.3%).

Study Design

In 24 studies (46%) a prospective design was used, three of which were randomised controlled trials (RCTs) (McCall 1994; Ohlsson 1995; Steele 1982). One study prospectively followed up a cohort of patients of whom some were identified retrospectively (Tate 1982), while another sampled retrospectively from a prospective cohort (Korner 2007). The remaining 26 studies (50%) used a retrospective design.

Clinical features of included patients

Location of recurrence

The location of recurrence was reported in 25 studies (48%) including local, locoregional, and distant recurrence. However, the description of CRC recurrence was heterogeneous and all studies lacked 2 x 2 tables for the diagnostic accuracy of CEA to detect recurrence at each location (Characteristics of included studies).

Staging of primary colorectal cancer

Two studies (4%) included only patients with rectal cancer (Barillari 1992; Fezoulidis 1987); the remaining 50 studies (96%) included patients with both colon and rectal cancer.

Thirty-three studies (63%) used the Dukes staging to describe the primary CRC. A further 11 studies (21%) used the TNM grading system and one study (2%) used the Astler-Coller staging. The staging was unclear or not reported in the remaining seven studies (13%) (Carlsson 1983; Kohler 1980; Koizumi 1992; Li Destri 1998; Mittal 2011; Ochoa-Figueroa 2012; Wood 1980).

Of those using Dukes staging, seven included Dukes A - D (Banaszkiewicz 2011; Carpelan-Holmström 2004; Jubert 1978; Mach 1978; Mariani 1980; Seregni 1992; Yu 1992); 15 included Dukes A - C (Barillari 1992; Deveney 1984; Farinon 1980; Fezoulidis 1987; Fucini 1987; Graffner 1985; Hine 1984; Irvine 2007; Kato 1980; Korner 2007; Luporini 1979; Mackay 1974; McCall 1994; Ohlsson 1995; Triboulet 1983); three used Dukes B - C (Beart 1981; Steele 1982; Wang 1994); two used Dukes C (Hara 2008; Tobaruela 1997); one used Dukes A - C plus palliative cases (Johnson 1985); one used Dukes A - C plus unknown cases (Tate 1982); and four used Dukes A - D plus unknown cases (Bjerkeset 1988; Engarås 2003; Miles 1995; Minton 1985).

Of the 11 studies using the TNM grading system: five included TNM I - III (Kanellos 2006a; Ohtsuka 2008; Park 2009; Tang 2009; Yakabe 2010); four used TNM I - IV (Carriquiry 1999; Nishida 1988; Peng 2013; Staib 2000); and one included TNM II - III (Kim 2013). Only one study reported 2 x 2 tables by stage, reporting on TNM II and TNM III (Hara 2010).

The study that used Astler-Coller staging included A - C2 (Lucha 1997).

Smokers

Three studies explicitly excluded smokers (Kanellos 2006a; Mariani 1980; Staib 2000), four studies explicitly included some smokers (but there was no way of identifying these patients in the 2 x 2 tables), and the remaining studies did not report smoking status. In the two studies which gave precise figures for smoking prevalence, it was low at 2% smokers (Fucini 1987) and 9% heavy smokers (Mach 1978).

Investigations for residual disease

In 43 studies (83%) it was not clear which (if any) perioperative investigations were done to ensure there was no residual disease before entering follow-up. In the nine studies that reported this information, three reported using a persistent postoperative elevation of CEA as evidence of residual disease (Hara 2008; Irvine 2007; Steele 1982); one used "signs" of malignancy at the first follow-up examination (Tate 1982); one used preoperative colonoscopy to resect any lesions outside the section of bowel planned for resection (Banaszkiewicz 2011); one reported using the intraoperative detection of gross residual disease (Lucha 1997); one specified no gross residual disease and clear resection margins (Bjerkeset 1988); one used preoperative abdominal CT and intraoperative palpation to exclude liver metastases (Kanellos 2006a); and one reported using preoperative barium enema (BE), chest x-ray (CXR), liver function tests (LFTs) and CEA, and postoperative BE and colonoscopy to ensure there was no residual disease (Ohlsson 1995).

Treatment

In 14 studies (27%) some (but not all) patients received chemotherapy, and in no studies was a subgroup analysis performed comparing the diagnostic accuracy of CEA in those receiving chemotherapy compared to those who did not (Characteristics of included studies).

Reference standard

In 38 studies (73%) a composite reference standard was used, the composition of which varied greatly between studies (see Characteristics of included studies). In 12 of these, a predefined multimodal follow-up schedule was used for each patient (although the composition of these varied across studies) (Banaszkiewicz 2011; Carlsson 1983; Fucini 1987; Hara 2008; Irvine 2007; Jubert 1978; Kanellos 2006a; McCall 1994; Ohlsson 1995; Park 2009; Peng 2013; Steele 1982). In 26 studies (50%) a predefined composite follow-up schedule was used to trigger further investigations for suspected recurrence.

A single investigation was used in three studies (6%) (Mittal 2011; Ochoa-Figueroa 2012; Staib 2000), of which one reported 2 x 2 tables separately for PET and for CT (Ochoa-Figueroa 2012).

In the remaining 11 studies (21%), it was unclear what was used as a reference standard.

CEA measurement

The use of predefined follow-up schedules resulted in multiple CEA measurements being available for analysis.

Eight studies (15%) reported the accuracy of the CEA measurement closest to the time at which recurrence was detected by the reference standard, whilst nine studies (17%) defined CEA as positive if any CEA measurement crossed the threshold at any time within the follow-up period. In a subset of studies, the authors stated clearly that a single 'positive' measurement would be followed up by a repeat test to confirm the result.

For the remaining 35 studies (67%), it was impossible to unpick which CEA value had been used, due to limited reporting.

Reporting units

CEA studies have used both ng/mL and µg/L in their publications. Numerically these are the same value and for consistency we have used µg/L throughout the review.
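The equivalence of the two units follows directly from the SI prefixes:

```latex
1\ \mathrm{ng/mL}
= \frac{10^{-9}\ \mathrm{g}}{10^{-3}\ \mathrm{L}}
= 10^{-6}\ \mathrm{g/L}
= 1\ \mu\mathrm{g/L}
```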

Laboratory technique

Details regarding laboratory methods for CEA analysis were inconsistently reported across the included studies. Based on the available information relating to laboratory technique, we were able to group the studies as follows:

  1. Twenty-two studies (42%) analysed samples before the introduction of the international reference preparation (IRP) using manual RIA and IRMA methods;

  2. Seven studies (13%) used an identifiable laboratory technique following introduction of IRP;

  3. Eight studies (15%) used unfamiliar laboratory techniques after the introduction of IRP;

  4. Fifteen studies (29%) did not report laboratory technique.

For the seven studies reporting an identifiable laboratory technique following IRP introduction, six distinct techniques were used: Autodelfia post-year 2000 (Carpelan-Holmström 2004; Engarås 2003); Abbott automated instrumentation (Korner 2007); Bayer Immuno 1 (Irvine 2007); Siemens ADVIA centaur (Kim 2013); Roche elecsys (Mittal 2011); and Diasorin/byk santec liaison (Staib 2000). Across these, four thresholds were reported: 3 µg/L (Staib 2000); 5 µg/L (Carpelan-Holmström 2004; Kim 2013; Mittal 2011); 5.6 µg/L (Engarås 2003); and 10 µg/L (Irvine 2007; Korner 2007).

Forty-three studies (83%) reported neither an estimation of CEA method reproducibility nor an indication of method accuracy. Of the remaining nine studies, three (6%) reported both an estimation of reproducibility and an indication of method accuracy (Carpelan-Holmström 2004; Engarås 2003; Steele 1982), four (8%) clearly reported only an estimation of reproducibility (Fucini 1987; Hine 1984; Mach 1978; Mackay 1974), and the remaining two (4%) reported only the indication of method accuracy (Irvine 2007; Miles 1995).

Excluded studies

Of the 216 excluded full-text articles (Figure 1; Characteristics of excluded studies):

  • 152 studies (70%) did not report complete 2 x 2 data (74 (34%) reported no 2 x 2 data at all; 59 (27%) only reported recurrences; 16 (7%) only reported CEA-positive cases; and three (1%) only reported CEA-negative cases);

  • 23 studies (11%) did not conduct a single-point diagnostic test accuracy study (14 (6%) used alternative analyses (trend, nomogram, slope, or median CEA); five were case-control studies (2%); three (1%) were review articles; and one was an economic analysis);

  • 14 studies (6%) did not report an analysis of serum CEA measurements taken as part of a follow-up schedule (seven (3%) reported preoperative CEA measurements; six (3%) reported the prognostic value of one postoperative CEA measurement; and one used intraoperative portal vein sampling);

  • eight studies (4%) included fewer than 30 patients;

  • six studies were unavailable or needed translation (five studies (2%) were not retrieved after worldwide search by the British Library, and we were not able to translate the remaining study);

  • five studies (2%) did not clearly report colorectal cancer recurrence (three (1%) reported on only liver metastases; and two (1%) reported colorectal cancer recurrence together with other cancer types);

  • five studies (2%) reported datasets already included in the review;

  • three studies (1%) reported non-curative surgery.

We have not included two large RCTs in the review: the FACS trial (as 2 x 2 data were not reported in the published paper (Primrose 2014)), and the CEASL trial, which was published following our search and did not report on negative CEA cases (Treasure 2014).

Methodological quality of included studies

We assessed all 52 studies using the QUADAS-2 framework. Figure 2 shows the summary of overall risk of bias and applicability concerns, and Figure 3 presents the risk of bias and applicability concerns as overall percentages.

Figure 2.

QUADAS-2 risk of bias and applicability concerns summary including review authors' judgements about each domain for each included study

Figure 3.

QUADAS-2 risk of bias and applicability concerns graph including review authors' judgements about each domain presented as percentages across included studies

Three studies (6%), including 516 participants of whom 177 experienced recurrence, were assessed as being at low risk of bias and low concern regarding applicability across all domains (Barillari 1992; Irvine 2007; McCall 1994). Across these studies, each reported a different threshold (3, 10, and 5 µg/L respectively) using CEA test kits from three different manufacturers (with poor description of method accuracy). Each study applied a different but "appropriate" follow-up schedule to detect recurrence. Consequently, the planned subgroup analysis of high-quality studies (low risk of bias in all four domains) was not feasible.

Risk of bias

We judged 34 studies (65%) to be at high risk of bias in at least one of the four domains (Figure 3).

For the patient selection domain, items were weighted so that the presence of inappropriate exclusions had more influence on the overall judgement than the presence of a consecutive or random sample. Of the 27 studies judged to be at high risk of bias for patient selection (52%), inappropriate exclusions were based on:

There were no studies deemed to be at high risk of bias based on the judgements made about the index test.

There were no studies at high risk of bias based on the judgements made about the reference standard, and in 17 (33%) the risk was unclear.

Thirteen studies (25%) were deemed to be at high risk of bias based on flow and timing. In four studies, not all patients were included in the final analysis (Beart 1981; Bjerkeset 1988; Kohler 1980; Park 2009). In the remaining nine studies, a raised CEA value triggered the reference standard which could introduce work-up bias and result in false negative CEA results being misclassified as true negative results (Lucha 1997; Mackay 1974; Mariani 1980; Miles 1995; Tang 2009; Tobaruela 1997; Triboulet 1983; Wood 1980; Yu 1992).

Applicability concerns

We judged 37 studies (71%) to be at low risk of applicability concerns in all three domains (Figure 3). We rated only one study (Ochoa-Figueroa 2012) at high risk of applicability concerns in relation to patient selection, as it did not include all patients undergoing postoperative follow-up, but only those referred with suspected recurrence to the Department of Nuclear Medicine for fluoro-deoxy-glucose (FDG) PET-CT. There were no studies deemed to be at high risk for applicability based on the index test or reference standard.

Unclear risk

Of the 364 domain judgements (52 studies, each rated on four risk of bias domains and three applicability domains), we deemed 85 (23%) to be at unclear risk of bias or applicability. For the vast majority of these, poor reporting accounted for the unclear rating.

Findings

Diagnostic accuracy

The forest plot in Figure 4 (Analysis 1) shows the range of sensitivity and specificity of CEA for the detection of recurrent colorectal cancer across all 52 included studies.

Figure 4.

Forest plot for all 52 included studies for the threshold reported closest to 5 µg/L

TP = true positive; FP = false positive; FN = false negative; TN = true negative

The blue square depicts the sensitivity and specificity for each study and the horizontal line represents the corresponding 95% confidence interval for these estimates.

For studies reporting accuracy at more than one threshold, 2 x 2 data at the threshold closest to 5 μg/L are included in the plot (5 μg/L was the most commonly reported threshold).

Sensitivity ranged from 41% to 97% and specificity from 52% to 100%.

Figure 5 plots each of the 52 studies in ROC space. The size of each box is proportional to the inverse standard error for sensitivity and specificity for each study (a larger box indicates greater precision).

Figure 5.

Scatter plot of sensitivity versus specificity for all 52 studies, regardless of threshold.

Each box represents the 2 x 2 data extracted from each study, with the width of the boxes being proportional to the inverse standard error of the specificity and the height of the boxes proportional to the inverse standard error of the sensitivity.

Effect of CEA threshold on diagnostic accuracy

Forty-one studies (79%) reported accuracy at just a single threshold. A wide range of thresholds were reported (2 to 40 µg/L). Four studies (8%) did not report which threshold they used (Graffner 1985; Johnson 1985; Ohlsson 1995; Seregni 1992). Seven studies (13%) reported 2 x 2 data for more than one threshold:

The forest plots in Figure 6 (Analysis 2) show the range of sensitivity and specificity for studies reporting the accuracy of CEA at cut-off values of 2.5, 5 and 10 µg/L.

Figure 6.

Forest plot broken down by threshold: CEA at 2.5 µg/L, CEA at 5 µg/L, CEA at 10 µg/L.

TP = true positive; FP = false positive; FN = false negative; TN = true negative

The blue square depicts the sensitivity and specificity for each study and the horizontal line represents the corresponding 95% confidence intervals for these estimates.

The summary ROC curves and the summary estimates including confidence ellipses for the threshold values of 2.5, 5, and 10 µg/L (Analyses 3, 4 and 5) can be found in Figure 7, Figure 8 and Figure 9 respectively.

Figure 7.

Summary ROC plot of accuracy at a threshold of 2.5 µg/L.

Each box represents the 2 x 2 data extracted from each study. The width of the box is proportional to the number of patients who did not experience recurrence in each study, and the height is proportional to the number of patients that did develop recurrent CRC.

The filled circle is the pooled estimate for sensitivity and specificity and the line running through it is the summary ROC curve.

The smaller dotted ellipse represents the 95% confidence region around the summary estimate; the larger dashed ellipse represents the 95% prediction region.

Figure 8.

Summary ROC plot of accuracy at a threshold of 5 µg/L.

Each box represents the 2 x 2 data extracted from each study.

The width of the box is proportional to the number of patients who did not experience recurrence in each study, and the height is proportional to the number of patients that did develop recurrent CRC.

The filled circle is the pooled estimate for sensitivity and specificity and the line running through it is the summary ROC curve.

The smaller dotted ellipse represents the 95% confidence region around the summary estimate; the larger dashed ellipse represents the 95% prediction region.

Figure 9.

Summary ROC plot of accuracy at a threshold of 10 µg/L.

Each box represents the 2 x 2 data extracted from each study.

The width of the box is proportional to the number of patients who did not experience recurrence in each study, and the height is proportional to the number of patients that did develop recurrent CRC.

The filled circle is the pooled estimate for sensitivity and specificity and the line running through it is the summary ROC curve.

The smaller dotted ellipse represents the 95% confidence region around the summary estimate; the larger dashed ellipse represents the 95% prediction region.

In the seven studies reporting a threshold of 2.5 µg/L, the sensitivity ranged from 65% to 91% and specificity from 34% to 98%. The pooled sensitivity of these studies was 82% (95% CI 78% to 86%) and pooled specificity 80% (95% CI 59% to 92%). Assuming that the proportion of patients with recurrence in any single testing period is 2% (based on our observed prevalence of recurrence of 30% and national guidance to conduct 14 to 15 CEA tests during follow-up), for every 1000 patients tested at a threshold of 2.5 µg/L, 16 cases of recurrence will be detected, four cases will be missed, and there will be 196 false alarms (people referred unnecessarily for further testing). More precise estimates of test performance using the incidence data reported by Sargent 2007 can be found in Summary of findings 2.

In the 23 studies which reported the impact of applying a threshold of 5 µg/L, sensitivity ranged from 43% to 93% and specificity from 60% to 100%. The pooled sensitivity of these studies was 71% (95% CI 64% to 76%) and pooled specificity 88% (95% CI 84% to 92%). For every 1000 patients tested at a threshold of 5 µg/L, 14 cases of recurrence will be detected, six cases will be missed, and there will be 118 false alarms. More precise estimates of test performance using the incidence data reported by Sargent 2007 can be found in Summary of findings 3.

In the seven studies reporting the impact of applying a threshold of 10 µg/L, sensitivity ranged from 41% to 87% and specificity from 88% to 100%. The pooled sensitivity of these studies was 68% (95% CI 53% to 79%) and pooled specificity 97% (95% CI 90% to 99%). For every 1000 patients tested at a threshold of 10 µg/L, 14 cases of recurrence will be detected, six cases will be missed, and there will be 29 false alarms. More precise estimates of test performance using the incidence data reported by Sargent 2007 can be found in Summary of findings 4.
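
The per-1000 figures quoted above follow from simple arithmetic on the pooled estimates. A minimal Python sketch, assuming a 2% proportion of patients with recurrence at each test as stated in the text (the function name `outcomes_per_1000` is illustrative and not part of the review's methods):

```python
def outcomes_per_1000(sensitivity, specificity, prevalence, n=1000):
    """Expected outcomes when n people are each tested once."""
    with_recurrence = n * prevalence
    without_recurrence = n - with_recurrence
    detected = sensitivity * with_recurrence              # true positives
    missed = with_recurrence - detected                   # false negatives
    false_alarms = (1 - specificity) * without_recurrence  # false positives
    return detected, missed, false_alarms


# Pooled (sensitivity, specificity) reported in the text, keyed by threshold in µg/L
pooled = {2.5: (0.82, 0.80), 5: (0.71, 0.88), 10: (0.68, 0.97)}

for threshold, (sens, spec) in pooled.items():
    detected, missed, false_alarms = outcomes_per_1000(sens, spec, prevalence=0.02)
    print(f"{threshold} µg/L: detected {detected:.1f}, "
          f"missed {missed:.1f}, false alarms {false_alarms:.1f}")
```

The values are left unrounded here; the review reports them rounded to whole patients.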

Effect of the timing of CEA measurement

As previously described, we used two approaches when choosing which CEA measurement to include in the 2 x 2 tables. The first was to evaluate the CEA measurement taken closest to the time point at which recurrence was detected; the second was to look across all measurements to assess whether any had crossed the threshold during the entire follow-up period.

Including only those studies reporting accuracy at a threshold of 5 µg/L, we carried out a subgroup analysis for these two strategies.

We adopted the first strategy in eight studies, for which the pooled sensitivity and specificity were 69.0% (95% CI 57.3% to 78.7%) and 90.0% (95% CI 77.8% to 95.9%) respectively. We adopted the second strategy in nine studies, for which the pooled sensitivity and specificity were 64.5% (95% CI 55.2% to 72.9%) and 89.5% (95% CI 83.4% to 93.5%) respectively.

Effect of laboratory technique

We were unable to carry out a subgroup analysis based on specific laboratory techniques, as reporting was so limited that it was difficult to identify groups of studies where we could be confident that they had all used consistent methods.

For those studies reporting accuracy at a threshold of 5 µg/L, we carried out a subgroup analysis comparing the variability in accuracy before and after the introduction of the international reference preparation (IRP 73/601) calibration. We excluded one study (Li Destri 1998) from this analysis, as there was insufficient information about the timing of the sample analysis and laboratory technique. There were 11 studies predating the introduction of the IRP, providing a pooled sensitivity of 73.6% (95% CI 63.2% to 81.8%) and a pooled specificity of 88.5% (95% CI 83.2% to 92.2%), and 11 studies used methods which incorporated the IRP, resulting in a pooled sensitivity of 67.9% (95% CI 58.6% to 75.9%) and a pooled specificity of 88.6% (95% CI 80.0% to 93.7%). These results indicate no significant reduction in variability, and this was confirmed when we added it as a covariate in the metaregression (P = 0.958).

Effect of patient selection on diagnostic accuracy

When restricting the analyses to the 11 studies deemed to be at low risk of bias in the patient selection domain of the QUADAS-2 assessment, the sensitivity ranged from 43% to 93% and specificity from 61% to 99%.

We added the patient selection risk of bias item as an ordinal covariate (low risk = 6, unclear risk = 6 and high risk = 11) in the metaregression analysis for those studies reporting accuracy at 5 µg/L. The effect of this covariate was not significant (P = 0.771).

Effect of index test on diagnostic accuracy

There were no studies deemed to be at high risk of bias in the index test domain of the QUADAS-2 assessment. When restricting the analyses to the 37 studies (71%) deemed to be at low risk of bias in the index test domain of the QUADAS-2 assessment, the sensitivity ranged from 41% to 97% and specificity from 52% to 100%.

We added the index test risk of bias item as a covariate (low risk = 15, unclear risk = 8) in the metaregression analysis for those studies reporting accuracy at 5 µg/L. The effect of this covariate was not significant (P = 0.901).

Effect of the reference standard on diagnostic accuracy

There were also no studies deemed to be at high risk of bias in the reference standard domain of the QUADAS-2 assessment. When restricting the analyses to the 35 studies (67%) deemed to be at low risk of bias in the reference standard domain of the QUADAS-2 assessment, the sensitivity ranged from 41% to 97% and specificity from 52% to 100%.

We added the reference standard risk of bias item as a covariate (low risk = 17, unclear risk = 6) in the metaregression analysis for those studies reporting accuracy at 5 µg/L. The effect of this covariate was not significant (P = 0.292).

Effect of flow and timing on diagnostic accuracy

When restricting the analyses to the 25 studies (48%) deemed to be at low risk of bias in the flow and timing domain of the QUADAS-2 assessment, the sensitivity ranged from 41% to 95% and specificity from 52% to 100%.

We added the flow and timing risk of bias item as an ordinal covariate (low risk = 12, unclear risk = 6 and high risk = 5) in the metaregression analysis for those studies reporting accuracy at 5 µg/L. The effect of this covariate was not significant (P = 0.664).

Summary of findings

Summary of findings 1. Summary of results table: different cut-offs

Review question: What is the accuracy of single-measurement blood CEA as a triage test to prompt further investigation for colorectal cancer recurrence after curative resection?
Population: adults with no detectable residual disease after curative surgery (with or without adjuvant therapy)
Studies: cross-sectional diagnostic test accuracy studies, cohort studies, and RCTs, reporting 2 x 2 data
Index test: Blood carcino-embryonic antigen (CEA)
Reference standard: appropriate¹ imaging, histology, or routine clinical follow-up
Setting: primary or hospital care.

Interpretation: assuming a constant incidence of 2%² recurrence at each measurement point, testing 1000 people will have the following outcome depending on the CEA threshold applied.

| Subgroup | Number of participants (studies) | Sensitivity (95% CI) | Specificity (95% CI) | Interpretation |
|---|---|---|---|---|
| 2.5 µg/L | 1515 (7) | 82% (78 to 86) | 80% (59 to 92) | 16 cases of recurrence will be detected and 4 cases will be missed. 196 people will be referred unnecessarily for further testing |
| 5 µg/L | 4585 (23) | 71% (64 to 76) | 88% (84 to 92) | 14 cases of recurrence will be detected and 6 cases will be missed. 118 people will be referred unnecessarily for further testing |
| 10 µg/L | 2341 (7) | 68% (53 to 79) | 97% (90 to 99) | 14 cases of recurrence will be detected and 6 cases will be missed. 29 people will be referred unnecessarily for further testing |

¹ As defined in the Reference standards section of the Methods.
² Three-monthly prevalence is estimated as 2%, as the median prevalence amongst the included studies was 30% and a standard follow-up schedule will include 14 to 15 CEA tests over five years.

Summary of findings 2. Outcome of follow-up testing using a CEA threshold of 2.5 µg/L

Counts are per 1000 patients tested at a threshold of 2.5 µg/L.

| Month when CEA measured | Estimated recurrences¹ | Referrals for raised CEA | Cases of recurrence detected | Cases of recurrence missed | False alarms (cases investigated when cancer not present) | False alarm rate |
|---|---|---|---|---|---|---|
| Follow-up years 1 and 2: 3-monthly CEA testing | | | | | | |
| 3 | 19 | 212 | 16 | 3 | 196 | 92% |
| 6 | 19 | 212 | 16 | 3 | 196 | 92% |
| 9 | 39 | 224 | 32 | 7 | 192 | 86% |
| 12 | 39 | 224 | 32 | 7 | 192 | 86% |
| 15 | 37 | 223 | 30 | 7 | 193 | 87% |
| 18 | 37 | 223 | 30 | 7 | 193 | 87% |
| 21 | 31 | 219 | 25 | 6 | 194 | 89% |
| 24 | 31 | 219 | 25 | 6 | 194 | 89% |
| Follow-up years 3, 4 and 5: 6-monthly CEA testing | | | | | | |
| 30 | 46 | 229 | 38 | 8 | 191 | 83% |
| 36 | 36 | 223 | 30 | 6 | 193 | 87% |
| 42 | 27 | 217 | 22 | 5 | 195 | 90% |
| 48 | 25 | 216 | 21 | 4 | 195 | 90% |
| 54 | 17 | 211 | 14 | 3 | 197 | 93% |
| 60 | 14 | 208 | 11 | 3 | 197 | 95% |

¹ Estimates are based on data reported by Sargent 2007. Three-monthly data were unavailable, and so constant rates were assumed during each six-month period for the first two years. Estimates are rounded.

Summary of findings 3. Outcome of follow-up testing using a CEA threshold of 5 µg/L

Counts are per 1000 patients tested at a threshold of 5 µg/L.

| Month when CEA measured | Estimated recurrences¹ | Referrals for raised CEA | Cases of recurrence detected | Cases of recurrence missed | False alarms (cases investigated when cancer not present) | False alarm rate |
|---|---|---|---|---|---|---|
| Follow-up years 1 and 2: 3-monthly CEA testing | | | | | | |
| 3 | 19 | 131 | 13 | 6 | 118 | 90% |
| 6 | 19 | 131 | 13 | 6 | 118 | 90% |
| 9 | 39 | 143 | 28 | 11 | 115 | 80% |
| 12 | 39 | 143 | 28 | 11 | 115 | 80% |
| 15 | 37 | 142 | 26 | 11 | 116 | 82% |
| 18 | 37 | 142 | 26 | 11 | 116 | 82% |
| 21 | 31 | 138 | 22 | 9 | 116 | 84% |
| 24 | 31 | 138 | 22 | 9 | 116 | 84% |
| Follow-up years 3, 4 and 5: 6-monthly CEA testing | | | | | | |
| 30 | 46 | 147 | 33 | 13 | 114 | 78% |
| 36 | 36 | 142 | 26 | 10 | 116 | 82% |
| 42 | 27 | 136 | 19 | 8 | 117 | 86% |
| 48 | 25 | 135 | 18 | 7 | 117 | 87% |
| 54 | 17 | 130 | 12 | 5 | 118 | 91% |
| 60 | 14 | 128 | 10 | 4 | 118 | 92% |

¹ Estimates are based on data reported by Sargent 2007. Three-monthly data were unavailable, and so constant rates were assumed during each six-month period for the first two years. Estimates are rounded.

Summary of findings 4. Outcome of follow-up testing using a CEA threshold of 10 µg/L

Counts are per 1000 patients tested at a threshold of 10 µg/L.

| Month when CEA measured | Estimated recurrences¹ | Referrals for raised CEA | Cases of recurrence detected | Cases of recurrence missed | False alarms (cases investigated when cancer not present) | False alarm rate |
|---|---|---|---|---|---|---|
| Follow-up years 1 and 2: 3-monthly CEA testing | | | | | | |
| 3 | 19 | 42 | 13 | 6 | 30 | 70% |
| 6 | 19 | 42 | 13 | 6 | 29 | 70% |
| 9 | 39 | 55 | 27 | 13 | 29 | 52% |
| 12 | 39 | 55 | 27 | 13 | 29 | 52% |
| 15 | 37 | 54 | 25 | 12 | 29 | 53% |
| 18 | 37 | 54 | 25 | 12 | 29 | 53% |
| 21 | 31 | 50 | 21 | 10 | 29 | 58% |
| 24 | 31 | 50 | 21 | 10 | 29 | 58% |
| Follow-up years 3, 4 and 5: 6-monthly CEA testing | | | | | | |
| 30 | 46 | 60 | 31 | 15 | 29 | 48% |
| 36 | 36 | 53 | 24 | 12 | 29 | 54% |
| 42 | 27 | 48 | 19 | 9 | 29 | 61% |
| 48 | 25 | 46 | 17 | 8 | 29 | 63% |
| 54 | 17 | 41 | 11 | 6 | 30 | 72% |
| 60 | 14 | 39 | 10 | 5 | 30 | 75% |

¹ Estimates are based on data reported by Sargent 2007. Three-monthly data were unavailable, and so constant rates were assumed during each six-month period for the first two years. Estimates are rounded.
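
The rows in Summary of findings 2 to 4 appear to follow from applying the pooled sensitivity and specificity to the estimated number of recurrences per 1000 patients at each visit. A minimal Python sketch under that assumption (the function name `follow_up_row` and the definition of the false alarm rate as false alarms divided by referrals are inferences from the table layout, not taken from the review's methods):

```python
def follow_up_row(recurrences_per_1000, sensitivity, specificity):
    """Expected outcomes for 1000 patients tested at one follow-up visit."""
    detected = sensitivity * recurrences_per_1000
    missed = recurrences_per_1000 - detected
    false_alarms = (1 - specificity) * (1000 - recurrences_per_1000)
    referrals = detected + false_alarms  # everyone with a raised CEA
    return {
        "referrals": referrals,
        "detected": detected,
        "missed": missed,
        "false_alarms": false_alarms,
        # Assumed definition: share of referrals in which no cancer is present
        "false_alarm_rate": false_alarms / referrals,
    }


# Month 9 at 5 µg/L: 39 estimated recurrences, pooled sensitivity 71%, specificity 88%
row = follow_up_row(39, sensitivity=0.71, specificity=0.88)
print({name: round(value, 2) for name, value in row.items()})
```

Rounded to whole patients, the counts come out close to the month 9 row of Summary of findings 3; small differences in the percentage can arise from the order in which values are rounded.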

Discussion

Summary of main results

We include 52 studies in the meta-analysis, covering 9717 patients (median sample size = 139, IQR: 72 - 247). The median proportion of recurrences in each study was 29% (IQR: 24% - 36%), agreeing with previously reported recurrence rates (Labianca 2010).

The diagnostic accuracy of CEA was reported at 15 different thresholds, ranging from 2 to 40 µg/L. Seven studies (13%) reported accuracy at a threshold of 2.5 µg/L, providing a pooled sensitivity of 82% (95% CI 78% to 86%) and a pooled specificity of 80% (95% CI 59% to 92%). The most commonly reported threshold was 5 µg/L (23 studies, 44%), providing a lower sensitivity of 71% (95% CI 64% to 76%) and an increased specificity of 88% (95% CI 84% to 92%). Seven studies (13%) reported accuracy at a threshold of 10 µg/L. Implementing such a high threshold reduced sensitivity to 68% (95% CI 53% to 79%), but provided high specificity of 97% (95% CI 90% to 99%).

Reporting quality was insufficient in important areas such as laboratory techniques. Insufficient detail about laboratory techniques and the frequent use of composite reference standards made it impossible to conduct desirable subgroup analyses. An individual-patient data meta-analysis would be required to fully explore the influence of factors such as preoperative CEA levels, chemotherapy, site of recurrence and smoking status, that are known to impact on CEA levels in follow-up.

Our results compared with other reports

Tan 2009 carried out a meta-analysis of 20 studies that reported the accuracy of CEA for the diagnosis of colorectal cancer recurrence using the Moses-Littenberg Method (Moses 1993). Their pooled estimate for specificity at a threshold of 5 µg/L was the same as ours (88%). Our pooled estimate for sensitivity was higher (71% versus 63%), but this difference is not statistically significant.

The method used by Tan 2009 to identify 2.2 µg/L as the 'optimum' CEA threshold was based on linear extrapolation (the lowest threshold included in their study was 3 µg/L). We instead implement bivariate meta-analyses (Reitsma 2005), as recommended in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (Macaskill 2010). This method is statistically more rigorous than the method implemented in Tan 2009, and directly accounts for the within- and between-study variability in sensitivity and specificity.

We question the Tan 2009 recommendation of 2.2 µg/L (which was based on achieving high sensitivity) not just on the basis of the low specificity (and high false alarm rate), but also because there appears to be a 'ceiling' effect in terms of sensitivity - even at a threshold of 2.5 µg/L, around one in five cases of recurrence would be missed. The failure to exceed a sensitivity of about 80% even with a low threshold or poor specificity reflects the well-documented fact that some recurrent cancers are not associated with a rise in blood CEA levels.

Strengths and weaknesses of the review

Completeness

A key strength of this review is the comprehensiveness of our searches. We avoided the use of search filters and did not restrict our review to English-language publications. Two review authors screened all abstracts independently, with a third independently settling any disagreement over inclusion. We retrieved and analysed all full-text articles that we felt could be potentially relevant based on the title and abstract. We based additional searches on the citation of full-text articles to reduce the risk of missing relevant studies. Foreign-language articles were translated or assessed or both by colleagues of the authors proficient in the language in question.

It is not possible to estimate the impact of unpublished studies on our findings, as little is known about the mechanisms of publication bias for diagnostic accuracy studies (Allen 2013). Despite this, our included studies are likely to represent the vast majority of studies that provide evidence on this topic.

Two review authors then extracted data independently, and three authors independently performed QUADAS-2 assessment of the included studies, with subsequent discussion to reach consensus on overall judgements of risk of bias and applicability. The meta-analyses followed Cochrane DTA guidelines.

Variability

A major weakness of this review is that we considered many included studies to be at high risk of bias. There was also considerable between-study variation in the reporting of: 1) stage of primary disease included; 2) approach to ensuring no residual disease; 3) reporting of smoking; 4) reporting of chemotherapy treatment; and 5) the location of recurrence. All of these factors could plausibly have some influence on CEA levels, but corresponding 2 x 2 tables were not presented for these subgroups, and so it was not possible to adjust for this variation in our analyses.

The QUADAS-2 assessment of methodological quality highlighted the extent of the quality issues in the existing literature. Even the three studies that we assessed as having no risk of bias or applicability concerns were subject to considerable between-study heterogeneity: they each reported accuracy at different CEA thresholds, implemented different CEA laboratory techniques, and used differing composite reference standards to detect recurrence. The varying thresholds made it unfeasible to provide pooled diagnostic accuracy estimates for these high-quality studies.

Over half of the included studies (n = 27, 52%) were at high risk of selection bias, mainly due to inappropriate patient exclusions. We deemed a further 15 studies (29%) to be at unclear risk of bias for patient selection, due to poor reporting. This makes our accuracy estimates susceptible to selection bias, particularly if those excluded were at particularly high or low risk of recurrence. To investigate this further, we removed those studies at high and unclear risk of bias for patient selection in a sensitivity analysis. The pooled estimates were not significantly different from the overall pooled results (sensitivity = 73%, 95% CI 64% to 80%; specificity = 87%, 95% CI 79% to 92%).

The methods used to measure CEA were also poorly reported: three studies (6%) did not report the CEA threshold used to determine a positive result, 15 studies (29%) did not report which laboratory technique had been used, and 43 studies (83%) failed to report any indicator of method accuracy or an estimate of CEA reproducibility. It is well known that variability exists between laboratory methods and between laboratories, and without this information it is impossible to adjust for any bias that has been introduced by the differences in method. The IRP calibration (73/601, introduced in 1992) attempts to reduce between-laboratory and between-technique variability, so we performed a sensitivity analysis leaving only the studies that were conducted after its introduction. We did not find the pooled accuracy estimates to be significantly different from the overall analysis (sensitivity = 67.9%, 95% CI 58.6% to 75.9%; specificity = 88.6%, 95% CI 80.0% to 93.7%).

A likely source of bias in this review is the methods used to implement the reference standard. In nine studies, the reference standard was only carried out if a rise in CEA was detected, possibly causing false-negative results to be misclassified as true-negative results. Furthermore, most studies implemented a composite reference standard, but failed to consistently report which investigation (within the composite) actually diagnosed recurrence. In half of the studies (n = 26, 50%), positive results for certain reference tests triggered the use of other reference tests. These concerns over partial and differential verification were considered in the flow and timing domain of QUADAS-2, explaining why there were no studies deemed to be at high risk of bias in the reference standard domain.

The time between the CEA measurement and the reference test used in the 2 x 2 table was not reported in any of the studies. There is therefore a high chance of misclassification due to disease progression during the time between CEA and the reference test. Understanding this relationship is important in this setting as: a) a high-grade recurrence will progress more quickly than low-grade; b) this information is required to estimate lead time. Furthermore, no study reported 2 x 2 data for each three- to six-month period of follow-up, which would be desirable given that CRC recurrence is known to occur more commonly in the first two years of follow-up, suggesting that a variable threshold may have greater accuracy (Sargent 2007).

Applicability of findings to the review question

All of the studies identified were carried out in hospital outpatient clinics, except one that followed up patients in both primary and secondary care. As the patient population is so well defined in this review (postoperative curative colorectal cancer resection), it is unlikely that the actual clinical setting in which follow-up takes place would have any influence on the severity of disease seen or consequently on the accuracy of CEA.

Changing the setting of follow-up could affect the accuracy of the CEA measurement if blood samples taken in a community setting are stored suboptimally during transport and there are long delays in blood reaching the laboratory. But monitoring CEA in primary care is already common practice in many countries and these potential problems have been successfully addressed. Implementation of the reference standard might also vary if patients being followed up in hospital are more likely to be referred for further investigation for reasons other than a rise in CEA. However, the Australian multicentre RCT investigating GP versus surgical follow-up reported similar recurrence rates and times to detection, irrespective of place of follow-up (Wattchow 2006).

For these reasons, we regard the findings of this review as applicable to follow-up in the primary and specialist care setting.

To make sense of the meta-analysis results and calculate false-alarm rates, the pooled estimates of sensitivity and specificity need to be converted into predictive values, taking into account the incidence of disease in the relevant testing interval. In making this conversion, we assumed that sensitivity and specificity are constant during the follow-up period, which seems reasonable, as we are aware of no evidence that recurrences presenting at different time points have a different propensity to release CEA.

CEA is usually measured about 14 to 15 times during the five years following primary treatment (three-monthly for two years and then six-monthly) and so the crudest estimate of the number of recurrences potentially detectable in each testing interval is 2% (the median incidence of recurrence in the included studies of 30% divided by 15). However, in reality incidence is not constant at each testing point, but changes with time and follow-up interval. So, as some readers will wish to apply the findings of our review to a more precise estimate of incidence from actual clinical practice, we have reported estimates of test performance based on external data from Sargent 2007, which is the best data currently available on the incidence of recurrence at each point during follow-up.
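
The conversion described above can be written out as a short worked calculation. The symbols p (proportion with recurrence at a single test), Se (sensitivity) and Sp (specificity) are introduced here for illustration only; the numbers are the pooled estimates at 5 µg/L and the crude 2% per-test estimate.

```latex
\[
p = \frac{0.30}{15} = 0.02, \qquad
\mathrm{PPV} = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)}
             = \frac{0.71 \times 0.02}{0.71 \times 0.02 + 0.12 \times 0.98}
             = \frac{0.0142}{0.1318} \approx 0.11
\]
```

Under these assumptions, roughly 11% of referrals prompted by a raised CEA at this threshold would turn out to be a recurrence, and the remainder (1 - PPV, about 89%) would be false alarms.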

Authors' conclusions

Implications for practice

The most important conclusion from this review is that CEA has inadequate sensitivity to be used as the sole method of detecting recurrence. Most national guidelines already recommend that it should be used in conjunction with another mode of diagnosis (such as CT imaging of the thorax, abdomen, and pelvis at 12 to 18 months) to pick up the remaining cases. Our review supports this recommendation. If CEA is used as the sole triage test, a significant number of cases will be missed, whatever threshold is adopted for defining a positive test.

It is important to point out that this review provides no evidence to help choose which diagnostic modality to use for this supplementary testing, nor the frequency with which it should be undertaken. However, current recommendations are consistent with the results of the FACS trial which showed that regular CEA blood testing achieves similar diagnostic performance to regular CT imaging, if supplemented with a single CT scan at 12 to 18 months (Primrose 2014).

Supplementing CEA with another testing modality to improve sensitivity also makes it easier to adopt a threshold for defining a positive test which reduces the number of patients requiring further investigation with CT imaging or other more invasive investigations. This is important for minimising unnecessary anxiety and radiation hazard for patients. It is also important in health economies such as the NHS, because of the expense and limited capacity for investigations such as CT imaging and colonoscopy.

Current standard practice (based on national recommendations) is to apply a threshold of 5 µg/L. At this threshold, assuming that the proportion of patients with recurrence in any single testing period is about 2% (based on our observed prevalence of recurrence of 30% and national guidance to conduct 14 to 15 CEA tests during follow-up), then there would be 118 false alarms and six missed cases for every 1000 patients tested. Increasing the threshold to 10 µg/L reduces the number of false alarms to 29 at a cost of six missed cases (Summary of findings 1). It is possible (although beyond the scope of this review to assess) that these missed cases may be avoided by the strategy of supplementary testing with another investigative modality as recommended above. For those interested in reviewing national recommendations on testing frequency, and the optimal threshold to apply at each time point (which need not necessarily be constant), we have included more precise estimates of test performance derived from incidence data reported by Sargent 2007 for the thresholds of 2.5 µg/L (Summary of findings 2), 5 µg/L (Summary of findings 3), and 10 µg/L (Summary of findings 4).

One potential solution to improve the diagnostic performance of CEA that is not addressed by this review is to treat CEA as a monitoring test rather than a one-off diagnostic test. Studies excluded from this review (Characteristics of excluded studies) for not being DTA studies have investigated the utility of: CEA frequency (Carl 1983), CEA slope (Staab 1985a), CEA doubling time (Ito 2002; Koga 1999) and a CEA nomogram (Minton 1978a; Minton 1978b; Minton 1989). The authors of the FACS trial have more recently pointed out that taking account of the change in CEA results over time and setting a threshold on the basis of the trend in CEA level could have substantially improved CEA performance, with an area under the ROC curve increasing from 0.74 to 0.90 (Shinkins 2014).

Implications for research

It is clear that measuring blood CEA has insufficient sensitivity to be used alone. Future research needs to explore the optimal timing and extent of supplementary CT imaging. It is also becoming clear that using one-off CEA measurements is suboptimal. An analysis of the benefits of making decisions to further investigate on the basis of trends over time needs to be done, and to be augmented by cost-benefit analysis of different strategies for the timing of monitoring tests and the optimal combination of CEA blood testing and CT imaging.

The other clear outcome from this review is the overall poor quality of reporting of diagnostic accuracy studies in this field. This poor reporting is compounded by the considerable between-study heterogeneity and limitations of study quality. In response to the methodological limitations highlighted in this review, authors of future research investigating the diagnostic accuracy of CEA for CRC recurrence should take care to clearly report: the CEA threshold and technique used, with an indication of method accuracy and of CEA reproducibility; the reference test used in any 2 x 2 table reported; 2 x 2 tables for each time point that the index test is measured; and the timing of the CEA test in relation to the reference test (preferably as individual patient data).

The lack of significant improvement in diagnostic accuracy following sensitivity analysis using studies deemed to be at low risk of bias in the QUADAS-2 assessment also suggests that modifications to QUADAS-2 may be warranted in assessing the quality of diagnostic tests used for follow-up monitoring.

Acknowledgements

The review authors would like to thank Professor Paul Glasziou for his input, especially into the development of the modified QUADAS-2 assessment tool, Dr Clare Davenport for her input into the development of the Title Registration Form, Copy Edit Support (CES) and Henning Keinke Andersen (ME of the CCCG) for careful revision of the review.

Data

Presented below are all the data for all of the tests entered into the review.

Table Tests. Data tables by test

| Test | No. of studies | No. of participants |
|---|---|---|
| 1 CEA - all thresholds | 52 | 9717 |
| 2 CEA at 2.5µg/L | 7 | 1515 |
| 3 CEA at 5µg/L | 23 | 4585 |
| 4 CEA at 10µg/L | 7 | 1607 |

Test 1.

CEA - all thresholds.

Test 2.

CEA at 2.5µg/L.

Test 3.

CEA at 5µg/L.

Test 4.

CEA at 10µg/L.

Appendices

Appendix 1. Cochrane Central Register of Controlled Trials search strategy

#1 MeSH descriptor: [Colorectal Neoplasms] explode all trees
#2 (colorectal near/3 (neoplas* or cancer* or tumour* or tumor* or carcinoma*)):ti,ab,kw (Word variations have been searched)
#3 (colon* near/3 (neoplas* or cancer* or tumour* or tumor* or carcinoma*)):ti,ab,kw (Word variations have been searched)
#4 (bowel near/3 (neoplas* or cancer* or tumour* or tumor* or carcinoma*)):ti,ab,kw (Word variations have been searched)
#5 (rectal near/3 (neoplas* or cancer* or tumour* or tumor* or carcinoma*)):ti,ab,kw (Word variations have been searched)
#6 (rectum near/3 (neoplas* or cancer* or tumour* or tumor* or carcinoma*)):ti,ab,kw (Word variations have been searched)
#7 #1 or #2 or #3 or #4 or #5 or #6
#8 MeSH descriptor: [Carcinoembryonic Antigen] explode all trees
#9 cea:ti,ab,kw (Word variations have been searched)
#10 (carcinoembryonic near/3 antigen*):ti,ab,kw (Word variations have been searched)
#11 (carcinoembryonic near/3 antibod*):ti,ab,kw (Word variations have been searched)
#12 (carcino-embryonic near/3 antigen*):ti,ab,kw (Word variations have been searched)
#13 (carcino-embryonic near/3 antibod*):ti,ab,kw (Word variations have been searched)
#14 #8 or #9 or #10 or #11 or #12 or #13
#15 #7 and #14

Appendix 2. MEDLINE search strategy

| No. | Search | Results |
|---|---|---|
| 1 | colorectal neoplasms/ or exp adenomatous polyposis coli/ or exp colonic neoplasms/ or colorectal neoplasms, hereditary nonpolyposis/ or exp rectal neoplasms/ | 142383 |
| 2 | (colorectal adj3 (neoplas* or cancer? or tumour? or tumor? or carcinoma?)).ti,ab. | 69267 |
| 3 | (colon* adj3 (neoplas* or cancer? or tumour? or tumor? or carcinoma?)).ti,ab. | 56720 |
| 4 | (bowel adj3 (neoplas* or cancer? or tumour? or tumor? or carcinoma?)).ti,ab. | 3988 |
| 5 | (rectal adj3 (neoplas* or cancer? or tumour? or tumor? or carcinoma?)).ti,ab. | 18409 |
| 6 | (rectum adj3 (neoplas* or cancer? or tumour? or tumor? or carcinoma?)).ti,ab. | 4598 |
| 7 | 1 or 2 or 3 or 4 or 5 or 6 | 179150 |
| 8 | Carcinoembryonic Antigen/ | 13372 |
| 9 | cea.ti,ab. | 16371 |
| 10 | (carcinoembryonic adj3 antigen?).ti,ab. | 11442 |
| 11 | (carcinoembryonic adj3 antibod*).ti,ab. | 622 |
| 12 | (carcino-embryonic adj3 antigen?).ti,ab. | 431 |
| 13 | (carcino-embryonic adj3 antibod*).ti,ab. | 13 |
| 14 | 8 or 9 or 10 or 11 or 12 or 13 | 23958 |
| 15 | Neoplasm Recurrence, Local/ | 79823 |
| 16 | Recurrence/ | 155149 |
| 17 | recur*.ti,ab. | 381384 |
| 18 | relaps*.ti,ab. | 116217 |
| 19 | treatment failure/ | 25585 |
| 20 | Reoperation/ | 63998 |
| 21 | Follow-Up Studies/ and Postoperative Care/ | 5767 |
| 22 | reoperat*.ti,ab. | 23840 |
| 23 | ((local or distant) adj2 failure).ti,ab. | 3371 |
| 24 | ((therap* or treatment or surg*) adj3 fail*).ti,ab. | 58705 |
| 25 | ((therap* or treatment or surg*) adj3 (respond* or response*)).ti,ab. | 116904 |
| 26 | ((postoperat* or post-operat* or postsurg* or post-surg* or posttreat* or post-treat* or posttherap* or post-therap*) adj5 follow up).ti,ab. | 16723 |
| 27 | ((postoperat* or post-operat* or postsurg* or post-surg* or posttreat* or post-treat* or posttherap* or post-therap*) adj5 surveillance).ti,ab. | 1277 |
| 28 | ((postoperat* or post-operat* or postsurg* or post-surg* or posttreat* or post-treat* or posttherap* or post-therap*) adj5 monitor*).ti,ab. | 3604 |
| 29 | 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 | 802827 |
| 30 | 7 and 14 and 29 | 1993 |
| 31 | 7 and 14 | 6353 |
| 32 | limit 31 to "reviews (maximizes specificity)" | 41 |
| 33 | 30 not 32 | 1966 |

Appendix 3. Embase search strategy

| No. | Search | Results |
|---|---|---|
| 1 | exp colon cancer/ or exp rectum cancer/ | 172220 |
| 2 | (colorectal adj3 (neoplas* or cancer? or tumour? or tumor? or carcinoma?)).ti,ab. | 97898 |
| 3 | (colon* adj3 (neoplas* or cancer? or tumour? or tumor? or carcinoma?)).ti,ab. | 75721 |
| 4 | (bowel adj3 (neoplas* or cancer? or tumour? or tumor? or carcinoma?)).ti,ab. | 5761 |
| 5 | (rectal adj3 (neoplas* or cancer? or tumour? or tumor? or carcinoma?)).ti,ab. | 26610 |
| 6 | (rectum adj3 (neoplas* or cancer? or tumour? or tumor? or carcinoma?)).ti,ab. | 5978 |
| 7 | 1 or 2 or 3 or 4 or 5 or 6 | 234787 |
| 8 | carcinoembryonic antigen/ | 25911 |
| 9 | cea.ti,ab. | 22520 |
| 10 | (carcinoembryonic adj3 antigen?).ti,ab. | 13394 |
| 11 | (carcinoembryonic adj3 antibod*).ti,ab. | 657 |
| 12 | (carcino-embryonic adj3 antigen?).ti,ab. | 617 |
| 13 | (carcino-embryonic adj3 antibod*).ti,ab. | 21 |
| 14 | 8 or 9 or 10 or 11 or 12 or 13 | 36255 |
| 15 | cancer recurrence/ or tumor recurrence/ | 119064 |
| 16 | recurrent disease/ or relapse/ | 192303 |
| 17 | recur*.ti,ab. | 523223 |
| 18 | relaps*.ti,ab. | 174290 |
| 19 | exp treatment failure/ | 82867 |
| 20 | Reoperation/ | 53394 |
| 21 | follow up/ and (postoperative care/ or postoperative period/) | 38038 |
| 22 | reoperat*.ti,ab. | 31321 |
| 23 | ((local or distant) adj2 failure).ti,ab. | 4986 |
| 24 | ((therap* or treatment or surg*) adj3 fail*).ti,ab. | 83522 |
| 25 | ((therap* or treatment or surg*) adj3 (respond* or response*)).ti,ab. | 167374 |
| 26 | ((postoperat* or post-operat* or postsurg* or post-surg* or posttreat* or post-treat* or posttherap* or post-therap*) adj5 follow up).ti,ab. | 23063 |
| 27 | ((postoperat* or post-operat* or postsurg* or post-surg* or posttreat* or post-treat* or posttherap* or post-therap*) adj5 surveillance).ti,ab. | 1797 |
| 28 | ((postoperat* or post-operat* or postsurg* or post-surg* or posttreat* or post-treat* or posttherap* or post-therap*) adj5 monitor*).ti,ab. | 4961 |
| 29 | 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 | 1107887 |
| 30 | 7 and 14 and 29 | 2994 |
| 31 | (meta-analysis or systematic review or MEDLINE).tw. | 144743 |
| 32 | 7 and 14 and 31 | 78 |
| 33 | 30 not 32 | 2952 |

Appendix 4. Science Citation Index & Conference Proceedings Citation Index - Science search strategy:

#1

TOPIC: ((colorectal NEAR/3 (neoplas* or cancer* or tumour* or tumor* or carcinoma*))) OR TOPIC: ((colon* NEAR/3 (neoplas* or cancer* or tumour* or tumor* or carcinoma*))) OR TOPIC: ((bowel NEAR/3 (neoplas* or cancer* or tumour* or tumor* or carcinoma*))) OR TOPIC: ((rectal NEAR/3 (neoplas* or cancer* or tumour* or tumor* or carcinoma*))) OR TOPIC: ((rectum NEAR/3 (neoplas* or cancer* or tumour* or tumor* or carcinoma*)))

Indexes=SCI-EXPANDED, CPCI-S Timespan=All years

189,742
#2

TOPIC: (cea) OR TOPIC: ((carcinoembryonic NEAR/3 antigen*)) OR TOPIC: ((carcinoembryonic NEAR/3 antibod*)) OR TOPIC: ((carcino-embryonic NEAR/3 antigen*)) OR TOPIC: ((carcino-embryonic NEAR/3 antibod*))

Indexes=SCI-EXPANDED, CPCI-S Timespan=All years

23,879
#3

TOPIC: (recur*) OR TOPIC: (relaps*) OR TOPIC: (reoperat*)

Indexes=SCI-EXPANDED, CPCI-S Timespan=All years

511,568
#4

TOPIC: (((local or distant) NEAR/2 failure)) OR TOPIC: (((therap* or treatment or surg*) NEAR/3 fail*)) OR TOPIC: (((therap* or treatment or surg*) NEAR/3 (respond* or response*)))

Indexes=SCI-EXPANDED, CPCI-S Timespan=All years

200,865
#5

TOPIC: (((postoperat* or post-operat* or postsurg* or post-surg* or posttreat* or post-treat* or posttherap* or post-therap*) NEAR/5 "follow up")) OR TOPIC: (((postoperat* or post-operat* or postsurg* or post-surg* or posttreat* or post-treat* or posttherap* or post-therap*) NEAR/5 surveillance)) OR TOPIC: (((postoperat* or post-operat* or postsurg* or post-surg* or posttreat* or post-treat* or posttherap* or post-therap*) NEAR/5 monitor*))

Indexes=SCI-EXPANDED, CPCI-S Timespan=All years

17,719
#6

#5 OR #4 OR #3

Indexes=SCI-EXPANDED, CPCI-S Timespan=All years

699,223
#7

#6 AND #2 AND #1

Indexes=SCI-EXPANDED, CPCI-S Timespan=All years

1,518

Appendix 5. Operational guidance for modified QUADAS-2 tool

Unless otherwise specified, each item must be explicitly reported to achieve a “yes” answer.

DOMAIN 1: Patient Selection
A: Risk of Bias
1. Was a consecutive or random sample of patients enrolled? Yes/No/Unclear
2. Did the study avoid inappropriate exclusions?
 Yes: Patients are included in follow-up post radical CRC resection, OR exclusions were justified in the text and reviewers reached consensus on the appropriateness of any exclusions. Exclusions based on patient characteristics allowing subgroup analysis (e.g. tumour grade) should be deemed appropriate.
 No: Criteria for “yes” not achieved.
 Unclear: Exclusions not reported clearly.
OVERALL RISK OF BIAS: LOW/HIGH/UNCLEAR
B: Applicability
1. Is there concern that the included patients do not match the review question?
 Yes: Patients are not undergoing follow-up post radical CRC resection including CEA measurement.
 No: Patients are undergoing follow-up post radical CRC resection including CEA measurement.
 Unclear: The included population is not defined.
OVERALL CONCERN REGARDING APPLICABILITY: LOW/HIGH/UNCLEAR
 
DOMAIN 2: Index Tests
A: Risk of Bias
1. If a threshold was used, was it pre-specified? Yes/No/Unclear
2. Is the same method and instrument used for all CEA measurements? Yes/No/Unclear
3. Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations? Yes/No/Unclear
4. Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme? Yes/No/Unclear
OVERALL RISK OF BIAS: LOW/HIGH/UNCLEAR
B: Applicability
1. Is there concern that the index test, its conduct, or interpretation differ from the review question?
 Yes: Blood CEA is not interpreted as a stand-alone test to trigger investigation for CRC recurrence.
 No: Blood CEA is interpreted as a stand-alone test to trigger investigation for CRC recurrence.
 Unclear: It is unclear whether the index test differs from the review question.
OVERALL CONCERN REGARDING APPLICABILITY: LOW/HIGH/UNCLEAR
 
DOMAIN 3: Reference Standard
A: Risk of Bias
1. Is the reference standard likely to correctly classify the target condition?
- Can we confidently exclude recurrence on the basis of no clinical detection of recurrence when we are assessing the utility of CEA at detecting asymptomatic recurrence amenable to resection?
 Yes: An appropriate reference standard (as defined in the protocol) is used.
 No: An inappropriate reference standard is used.
 Unclear: The reference standard used is not clearly specified.
2. Were the reference standard results interpreted without knowledge of the results of the index test?
- If tests are done as part of a follow-up regime it must not be assumed that the interpretation of each test is independent of another. It must be clearly stated when reference test interpretation occurred.
 Yes: The reference standard results were interpreted without knowledge of the index test(s).
 No: The reference standard results were interpreted with knowledge of the index test(s).
 Unclear: It is not clear whether interpretation was blinded or not.
OVERALL RISK OF BIAS: LOW/HIGH/UNCLEAR
B: Applicability
1. Is there concern that the target condition as defined by the reference standard does not match the review question? Yes/No/Unclear
OVERALL CONCERN REGARDING APPLICABILITY: LOW/HIGH/UNCLEAR
 
DOMAIN 4: Flow and Timing
A: Risk of Bias
1. Was the index test repeated prior to the reference standard? Yes/No/Unclear
2. Was the timing between index test(s) and reference standard ascertainable?
 Yes: The timing was ascertainable.
 Unclear: Not reported, variable, or could not be clearly determined.
3. Did all included patients who had at least one CEA measurement receive a reference standard? Yes/No/Unclear
4. Did patients receive the same reference standard?
 Yes: >95% of patients received the same reference standard regardless of index test results or place within a follow-up schedule.
 No: >95% of patients did not receive the same reference standard regardless of index test results, or place within the follow-up schedule.
 Unclear: It is unclear whether all the included patients received the same reference standard regardless of index test results.
5. Were all patients included in the analysis? Yes/No/Unclear
OVERALL RISK OF BIAS: LOW/HIGH/UNCLEAR

Contributions of authors

NWR and BDN devised the search strategy.
BDN and IP reviewed titles, abstracts, and full-text articles, and extracted all data.
BS acted as moderator at all stages.
BDN, IP, and BS performed the QUADAS-2 assessment.
BS, BDN, and DM devised the statistical analysis.
BS conducted statistical analyses in R, Stata, and SAS.
BDN and BS wrote the initial draft of the review.
DM, TJJ, SM, IP, JP, and RP provided comments and edited the draft.

Declarations of interest

None

Sources of support

Internal sources

  • No sources of support supplied

External sources

  • HTA - 11/136/81, UK.

    This work is partly funded by the National Institute for Health Research (NIHR) Health Technology Assessment Programme project grant "What CEA level should trigger further investigation during follow up after curative treatment for colorectal cancer?" (HTA - 11/136/81).

  • National Institute for Health Research (NIHR) School for Primary Care Research (SPCR), UK.

    The Nuffield Department of Primary Care Health Sciences receives funding from the National Institute for Health Research (NIHR) School for Primary Care Research (SPCR).

Differences between protocol and review

We stated we would contact the principal investigators to clarify methodological queries and ask for any unpublished data relevant to this review. This has not yet been done, and we have stated this in the Methods section.

We were unable to apply the Hamza method, which allows data for multiple thresholds from a single study to be incorporated in the meta-analysis. This method requires 2 x 2 data at consistent thresholds across studies, but in our review accuracy was reported at a wide range of inconsistent thresholds.
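
The spread of thresholds can be seen by grouping studies by their reported cut-off. The following is a minimal sketch in Python, using only the CEA thresholds listed for nine of the included studies in the Characteristics of included studies tables; the variable names are illustrative and the grouping is not part of the review's analysis.

```python
from collections import defaultdict

# CEA thresholds (µg/L) as listed in the Characteristics of included studies tables.
thresholds = {
    "Banaszkiewicz 2011": 5.0,
    "Barillari 1992": 3.0,
    "Beart 1981": 5.0,
    "Bjerkeset 1988": 3.5,
    "Carlsson 1983": 3.0,
    "Carpelan-Holmström 2004": 5.0,
    "Carriquiry 1999": 5.0,
    "Deveney 1984": 5.0,
    "Engarås 2003": 5.6,
}

# Group study IDs by the threshold each one reported.
by_threshold = defaultdict(list)
for study, cutoff in thresholds.items():
    by_threshold[cutoff].append(study)

for cutoff in sorted(by_threshold):
    print(f"{cutoff} µg/L: {len(by_threshold[cutoff])} studies")
```

Even in this subset, four different cut-offs appear, which illustrates why a method that needs a common set of thresholds across studies is difficult to apply.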

In terms of sensitivity analyses, we did not consider it necessary to remove each study in turn from the analyses: our review includes a large number of studies, none of which is notably larger than the others, making it highly unlikely that any one study would heavily skew the overall pooled estimates.
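
For reference, a leave-one-out sensitivity analysis re-estimates the pooled result with each study removed in turn. The sketch below illustrates the idea in Python with hypothetical true-positive and false-negative counts and a naive pooled sensitivity based on summed counts. Both are assumptions made for brevity: the counts are not data from this review, and the pooling is not the bivariate model used in the review.

```python
# Hypothetical (true positives, false negatives) per study; illustrative only.
studies = {
    "Study A": (30, 20),
    "Study B": (25, 15),
    "Study C": (40, 10),
}

def pooled_sensitivity(counts):
    """Naive pooled sensitivity: summed TP over summed (TP + FN)."""
    counts = list(counts)
    tp = sum(t for t, _ in counts)
    fn = sum(f for _, f in counts)
    return tp / (tp + fn)

overall = pooled_sensitivity(studies.values())

# Recompute the pooled estimate with each study left out in turn.
leave_one_out = {
    name: pooled_sensitivity(c for n, c in studies.items() if n != name)
    for name in studies
}

for name, estimate in leave_one_out.items():
    print(f"without {name}: {estimate:.3f} (all studies: {overall:.3f})")
```

When no single study dominates the total sample, the leave-one-out estimates stay close to the overall estimate, which is the reasoning given above for omitting this analysis.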

Characteristics of studies

Characteristics of included studies [ordered by study ID]

Banaszkiewicz 2011

Study characteristics
Patient sampling

Country

Poland

Study design

Retrospective casenote review

Setting

Hospital

Dates of data collection

N/R

Population (n)

965

Inclusion criteria

Patients after radical surgery in whom prognosis following a possible second operation was good

Exclusion criteria

Non-radical surgery or concomitant disease making survival of a second operation unlikely

Participants included (n)

340

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes A - D

Perioperative Investigations done to ensure no residual disease

Endoscopic polypectomy

Chemotherapy/radiotherapy?

Radical

Recurrences (n)

112

Site of recurrences

Liver 44, Local 32, Lung 7, Disseminated 12, other 6, 2 sites 11

Index tests

CEA timing

CEA 3, 6, 12 months, then once a year up to 5 years

CEA technique

N/R

CEA threshold

5 µg/L

Definition of positive

N/R

Which CEA value (s) used?

N/R

Target condition and reference standard(s)

Follow-up schedule

Follow-up visits at 3, 6, 12 months, then once a year up to 5 years. Follow-up schedule included patient’s history and physical examination, measurement of CEA serum concentration and classic colonoscopy

Flow and timing

Timing of CEA vs reference standard (days)

per protocol

Comparative 
Notes 
Methodological quality
Item | Authors' judgement | Risk of bias | Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled? | No
Did the study avoid inappropriate exclusions? | No
   Low
DOMAIN 2: Index Test (All CEA thresholds)
If a threshold was used, was it pre-specified? | Unclear
Is the same method and instrument used for all CEA measurements? | Unclear
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations? | No
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme? | No
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition? | Yes
Were the reference standard results interpreted without knowledge of the results of the index tests? | No
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard? | Yes
Were all patients included in the analysis? | Yes
Was the index test repeated prior to the reference standard? | No
Was the timing between index test(s) and reference standard ascertainable? | Unclear
Did all patients receive a reference standard? | Yes
    

Barillari 1992

Study characteristics
Patient sampling

Country

Italy

Study design

Prospective

Setting

Hospital

Dates of data collection

N/R

Population (n)

66

Inclusion criteria

Rectal cancer treated for cure

Exclusion criteria

N/R

Participants included (n)

66

Patient characteristics and setting

Age range

62.3 yrs (mean)

Smoking status

N/R

Site of primary tumour

rectum

Stage of primary tumour

6 Stage A, 32 Stage B, 28 Stage C

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

33

Site of recurrences

Local 10, Metastatic 25 (Lungs 3, peritoneum 10, bones 2, liver 21, multiple 8)

Index tests

CEA timing

3-monthly CEA

CEA technique

CEA was analysed using a direct radioimmunologic method (CEA-PR; Sorin Biomedica)

CEA threshold

3 µg/L

Definition of positive

Any elevation of 1 of the antigen levels greater than the limit defined by the between assay coefficient of variation (calculated on the basis of 2 standard deviations) was defined as significant, and the assay was repeated after 10 days.

Which CEA value (s) used?

Repeated value.

Target condition and reference standard(s)

Follow-up schedule

3-monthly to 60 months: blood CEA, TPA, CA19.9 and clinical exam. 6, 18, 30, 42, 54 months: USS Abdomen, CXR, Barium Enema. 12, 24, 36, 48, 60 months: colonoscopy, CT body. 6, 18, 30, 42 months: Bone scan.

Reference standard

Abdominal or total body CT, a chest x-ray examination, a bone scan, an endoscopy, and a clinical examination were performed. An exploratory laparotomy was performed when all three markers were elevated, even if recurrence was not confirmed by total body CT scan and clinical examinations

Flow and timing

Timing of CEA vs reference standard (days)

per protocol

Comparative 
Notes 
Methodological quality
Item | Authors' judgement | Risk of bias | Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled? | Yes
Did the study avoid inappropriate exclusions? | Yes
   Low
DOMAIN 2: Index Test (All CEA thresholds)
If a threshold was used, was it pre-specified? | Yes
Is the same method and instrument used for all CEA measurements? | Yes
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations? | No
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme? | No
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition? | Yes
Were the reference standard results interpreted without knowledge of the results of the index tests? | No
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard? | No
Were all patients included in the analysis? | Yes
Was the index test repeated prior to the reference standard? | Yes
Was the timing between index test(s) and reference standard ascertainable? | Unclear
Did all patients receive a reference standard? | Yes
    

Beart 1981

Study characteristics
Patient sampling

Country

USA

Study Design

Prospective

Setting

Department of Surgery and Oncology, Mayo Clinic and Mayo Foundation

Dates of data collection

1976 - 1986

Population (n)

149

Inclusion criteria

Patients who underwent resection of Dukes' B2 or C colorectal carcinoma, followed from the time of operation until tumour recurrence or the writing of the published paper

Exclusion criteria

N/R

Participants included (n)

149

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Colon

Stage of primary tumour

Dukes B or C

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

Some got radiotherapy, chemotherapy, and/or immunotherapy. Numbers not specified

Recurrences (n)

34

Site of recurrences

Liver metastasis 14, Chest 6, Pelvic disease 12

Index tests

CEA Timing

At least every 15 weeks

CEA technique

N/R

CEA threshold

5 µg/L

Definition of positive

N/R

Which CEA value (s) used?

N/R

Target condition and reference standard(s)

Follow-up schedule

At least every 15 weeks a complete history was taken and physical examination was carried out. A CXR was obtained, and laboratory determinations included complete blood count, alkaline phosphatase, SGOT, SGPT, and CEA. LDH and proctoscopic examinations were done every 6 months. A BE and liver scanning were done annually

Reference standard

Additional tests including CT, laparoscopy, liver biopsy, and abdominal exploration were ordered as indicated by the history, physical examination, or positive laboratory results. All recurrent tumours were documented histologically

Flow and timing

Timing of CEA vs reference standard (days)

per protocol

Comparative 
Notes 
Methodological quality
Item | Authors' judgement | Risk of bias | Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled? | Unclear
Did the study avoid inappropriate exclusions? | Yes
   Low
DOMAIN 2: Index Test (All CEA thresholds)
If a threshold was used, was it pre-specified? | Unclear
Is the same method and instrument used for all CEA measurements? | Unclear
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations? | No
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme? | No
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition? | Yes
Were the reference standard results interpreted without knowledge of the results of the index tests? | No
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard? | No
Were all patients included in the analysis? | No
Was the index test repeated prior to the reference standard? | No
Was the timing between index test(s) and reference standard ascertainable? | No
Did all patients receive a reference standard? | Yes
    

Bjerkeset 1988

Study characteristics
Patient sampling

Country

Norway

Study Design

Prospective

Setting

Hospital

Dates of data collection

1976 - 1979

Population (n)

244

Inclusion criteria

colorectal cancer resection operated for cure

Exclusion criteria

Did not survive resection, residual disease, resection margins not clear, pre- and post-op CEA determination

Participants included (n)

164

Patient characteristics and setting

Age range

N/R

Smoking status

Some, but not quantified

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes A 58, B 76, C 48, D 50, unknown 12

Perioperative Investigations done to ensure no residual disease

Clear resection margins, no residual disease

Chemotherapy / Radiotherapy?

22 Dukes B - C randomised to 5-year follow-up; 21 had preoperative external radiation

Recurrences (n)

47

Site of recurrences

Liver 12, Lungs 10, Local 8, Local and distant 5, carcinomatosis 6, multiple 6

Index tests

CEA timing

3, 6, 12, 18, 24, then yearly

CEA technique

Roche RIA test; repeated if raised, and if the repeat was also raised then tested as described in follow-up

CEA threshold

3.5 µg/L

Definition of positive

Transient

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

3, 6, 12, 18, 24 months then yearly CEA, clinical, biochemical, immunological (immunoglobulins and complement). CXR. Colonoscopy 6, 18 months. DCBE 1, 3, 5 years. "complimentary radiographic, scintographic, ultrasonographic added if indicated."

Reference standard

If nothing found investigating for increased CEA, then a second-look operation was performed (laparotomy and biopsy) in 23, liver imaging 8, autopsy 5, clinical course 6

Flow and timing

Timing of CEA vs reference standard (days)

as per follow-up schedule

Comparative 
Notes 
Methodological quality
Item | Authors' judgement | Risk of bias | Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled? | Unclear
Did the study avoid inappropriate exclusions? | Unclear
   Low
DOMAIN 2: Index Test (All CEA thresholds)
If a threshold was used, was it pre-specified? | Yes
Is the same method and instrument used for all CEA measurements? | Yes
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations? | No
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme? | No
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition? | Yes
Were the reference standard results interpreted without knowledge of the results of the index tests? | No
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard? | No
Were all patients included in the analysis? | No
Was the index test repeated prior to the reference standard? | Yes
Was the timing between index test(s) and reference standard ascertainable? | No
Did all patients receive a reference standard? | Yes
    

Carlsson 1983

Study characteristics
Patient sampling

Country

Sweden

Study design

Prospective study

Setting

Hospital

Dates of data collection

N/R

Population (n)

163

Inclusion Criteria

Curative operation for colorectal cancer

Exclusion Criteria

Advanced age, moving away, death 3 months postop

Participants Included (n)

139

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

N/R

Perioperative Investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

No

Recurrences (n)

50

Site of recurrences

N/R

Index tests

CEA timing

Until 1977: blood tests at 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 42, 48, 60 months. From 1977: blood tests at 3, 6, 12, 18, 24, 30, 36, 42, 48, 60 months

CEA technique

Direct radio immunoassay method developed at the Department of Nuclear Medicine, Malmo General Hospital

CEA threshold

3 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

At time of recurrence

Target condition and reference standard(s)

Follow-up schedule

Until 1977: Follow-up exam and rectoscopy 3, 6, 9, 12, 15, 18, 21, 24, 26, 42, 48, 60 months. Double contrast enema 3, 12, 24, 36, 48, 60 months. CXR and blood tests 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 42, 48, 60 months. From 1977: Physical exam and rectoscopy 3, 12, 24, 36, 48, 60 months. Double contrast enema 3, 12, 24, 36, 48, 60 months. CXR and blood tests 3, 6, 12, 18, 24, 30, 36, 42, 48, 60 months

Flow and timing

Timing of CEA vs reference standard (days)

per protocol

Comparative 
Notes 
Methodological quality
Item | Authors' judgement | Risk of bias | Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled? | Unclear
Did the study avoid inappropriate exclusions? | No
   Low
DOMAIN 2: Index Test (All CEA thresholds)
If a threshold was used, was it pre-specified? | Yes
Is the same method and instrument used for all CEA measurements? | Yes
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations? | No
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme? | No
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition? | Yes
Were the reference standard results interpreted without knowledge of the results of the index tests? | No
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard? | Yes
Were all patients included in the analysis? | Yes
Was the index test repeated prior to the reference standard? | No
Was the timing between index test(s) and reference standard ascertainable? | Yes
Did all patients receive a reference standard? | Yes
    

Carpelan-Holmström 2004

Study characteristics
Patient sampling

Country

Finland

Study Design

Retrospective

Setting

Hospital

Dates of data collection

N/R

Population (n)

354

Inclusion criteria

Curative surgery, but unclear

Exclusion criteria

Palliative, followed up elsewhere, no preoperative serum samples, no serum at the time of recurrence

Participants included (n)

102

Patient characteristics and setting

Age range

29 - 88 yrs

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes A - D (16 Dukes A, 45 Dukes B, 34 Dukes C, and 7 Dukes D)

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

40

Site of recurrences

Local 17, Liver 10, Various 13

Index tests

CEA timing

N/R

CEA technique

CEA was measured with a time-resolved immunofluorometric assay (AutoDELFIA®; Wallac, Turku, Finland). The detection limit of the assay is 0.2 µg/L, and the inter-assay coefficient of variation is 3% in the concentration range 3 – 90 µg/L (total CV 4%)

CEA threshold

5 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

At time of recurrence

Target condition and reference standard(s)

Reference standard

Clinical follow-up

Flow and timing

Timing of CEA vs reference standard (days)

Unclear

Comparative 
Notes 
Methodological quality
Item | Authors' judgement | Risk of bias | Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled? | Unclear
Did the study avoid inappropriate exclusions? | No
   Unclear
DOMAIN 2: Index Test (All CEA thresholds)
If a threshold was used, was it pre-specified? | Yes
Is the same method and instrument used for all CEA measurements? | Yes
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations? | Yes
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme? | Yes
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition? | Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests? | No
   Unclear
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard? | Unclear
Were all patients included in the analysis? | Yes
Was the index test repeated prior to the reference standard? | No
Was the timing between index test(s) and reference standard ascertainable? | Yes
Did all patients receive a reference standard? | Unclear
    

Carriquiry 1999

Study characteristics
Patient sampling

Country

Uruguay

Study design

Retrospective casenote review.

Setting

Hospital

Dates of data collection

1985 - 1998

Population (n)

209

Inclusion criteria

Histologically proven colorectal carcinoma, 3 postoperative CEA measurements, minimum period of follow-up 24 months

Exclusion criteria

Postoperative death and Stage IV (unless radical resection of synchronous liver metastases), no preop CEA

Participants Included (n)

142

Patient characteristics and setting

Age range

30 - 91 yrs

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

TNM staging system: 32 patients had Stage I, 57 had Stage II, 86 had Stage III, and 27 had Stage IV disease

Perioperative Investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

52

Site of recurrences

N/R

Index tests

CEA timing

CEA 3-monthly for 24 months, 4-monthly for yrs 3 - 4, and once per year after this (strict adherence in only 42 patients)

CEA technique

Serum concentrations of CEA were determined by a standard commercially-available immunoenzymatic assay

CEA threshold

5 µg/L

Definition of positive

2 consecutive values above 5 regarded abnormal; repeated at 2 - 4 weeks

Which CEA value (s) used?

Repeated value at the time of recurrence

Target condition and reference standard(s)

Follow-up schedule

CEA 3-monthly for 24 months, 4-monthly for yrs 3 - 4, and once per year after this (strict adherence in only 42 patients). Clinical follow-up, rectoscopy and/or colonoscopy at 1 yr and 3 yrs

Reference standard

USS/MRI/CT indicated on basis of raised CEA or clinical suspicion. CEA second-look surgery never used

Flow and timing

Timing of CEA vs reference standard (days)

as per protocol

Comparative 
Notes 
Methodological quality
Item | Authors' judgement | Risk of bias | Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled? | Unclear
Did the study avoid inappropriate exclusions? | No
   Low
DOMAIN 2: Index Test (All CEA thresholds)
If a threshold was used, was it pre-specified? | Yes
Is the same method and instrument used for all CEA measurements? | Yes
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations? | No
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme? | No
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition? | Yes
Were the reference standard results interpreted without knowledge of the results of the index tests? | No
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard? | No
Were all patients included in the analysis? | Yes
Was the index test repeated prior to the reference standard? | Yes
Was the timing between index test(s) and reference standard ascertainable? | Yes
Did all patients receive a reference standard? | Yes
    

Deveney 1984

Study characteristics
Patient sampling

Country

USA

Study design

Prospective

Setting

Hospital

Dates of data collection

starting in 1978

Population (n)

N/R

Inclusion criteria

Resection for curable adenocarcinoma of the colon or rectum

Exclusion criteria

Dukes D

Participants included (n)

65

Patient characteristics and setting

Age range

67 yrs mean

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

8 Dukes A tumours, 34 had Dukes B tumours, and 20 had Dukes C tumours

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

Some got radio/chemotherapy

Recurrences (n)

23

Site of recurrences

N/R

Index tests

CEA timing

3-monthly in yr 1, then 6-monthly to year 5

CEA technique

N/R

CEA threshold

5 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

3-monthly in year 1, then 6-monthly to year 5: clinical history, examination, FOBT, LFT, CEA. 6-monthly CXR, CT abdomen, total colonoscopy. BE at 6 and 12 months then annually

Reference standard

A positive finding on any test prompted additional confirmatory tests, including laparotomy, thoracotomy, or percutaneous CT-directed biopsy

Flow and timing

Timing of CEA vs reference standard (days)

per protocol

Comparative 
Notes 
Methodological quality
Item | Authors' judgement | Risk of bias | Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled? | Yes
Did the study avoid inappropriate exclusions? | Yes
   Low
DOMAIN 2: Index Test (All CEA thresholds)
If a threshold was used, was it pre-specified? | Yes
Is the same method and instrument used for all CEA measurements? | Unclear
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations? | No
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme? | No
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition? | Yes
Were the reference standard results interpreted without knowledge of the results of the index tests? | No
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard? | No
Were all patients included in the analysis? | Yes
Was the index test repeated prior to the reference standard? | No
Was the timing between index test(s) and reference standard ascertainable? | No
Did all patients receive a reference standard? | Yes
    

Engarås 2003

Study characteristics
Patient sampling

Country

Sweden

Study design

Prospective

Setting

Hospital

Dates of data collection

1998 - 1990

Population (n)

151

Inclusion criteria

Surgery with curative intent with 5 years follow-up

Exclusion criteria

N/R

Participants Included (n)

132

Patient characteristics and setting

Age range

27 - 75

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes A 11, B 76, C 43, D 1, Undefined 1

Perioperative investigations done to ensure no residual disease

Not specified

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

39

Site of recurrences

N/R

Index tests

CEA timing

Monthly during year 1 and then at 18 and 24 months

CEA technique

Delfia® test kits (Wallac Oy, Turku, Finland). The accuracy of the assays was assessed by analysis of 2 control samples in each assay and by measurement of the coefficient of variation by duplicate analyses of the samples

CEA threshold

5.6 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

Monthly outpatient clinic visit during year 1; serum tests monthly during year 1, then at 18 and 24 months. Clinical examinations at 1 year and 2 years with CXR, sigmoidoscopy, BE, and CT liver.

Reference standard

Radiologic and/or endoscopic investigations at surgery or post mortem

Flow and timing

Timing of CEA vs reference standard (days)

As per follow-up schedule

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?Unclear  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?Yes  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?Yes  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No
Did all patients receive a reference standard?Yes  
    

Farinon 1980

Study characteristics
Patient sampling

Country

Italy

Study design

Retrospective

Setting

Hospital

Dates of data collection

N/R

Population (n)

87

Inclusion criteria

Preoperative CEA test > 6, operated on with end-to-end anastomosis

Exclusion criteria

N/R

Participants included (n)

35

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes A 3, B 26, C 6

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

No

Recurrences (n)

10

Site of recurrences

N/R

Index tests

CEA timing

3-monthly

CEA technique

CEA radioimmunoassay direct method

CEA threshold

6 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

CEA and colonoscopy every 3 months

Reference standard

Second look surgery if not clear from CEA + colonoscopy.

Flow and timing

Timing of CEA vs reference standard (days)

Per protocol

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?No  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No
Did all patients receive a reference standard?Yes  
    

Fezoulidis 1987

Study characteristics
Patient sampling

Country

Germany

Study design

Prospective

Setting

Hospital

Dates of data collection

1984 - 1986

Population (n)

48

Inclusion criteria

Radical surgery

Exclusion criteria

No exclusion criteria were defined; results from all 48 participants are included in the study

Participants included (n)

48

Patient characteristics and setting

Age range

Study does not describe age bands; median age is 64

Smoking status

N/R

Site of primary tumour

Rectum

Stage of primary tumour

Dukes A 9, Dukes B 16, Dukes C1 19, Dukes C2 4

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

5

Site of recurrences

5 local rectal recurrences

Index tests

CEA timing

6 weeks, then 3-monthly. The main text of the study mentions only that CEA was measured postoperatively, possibly only once; it is not clear whether there was a sequence of measurements. Table 4 looks more like a one-off measurement

CEA technique

Unknown

CEA threshold

2.5 µg/L

Definition of positive

Unclear

Which CEA value (s) used?

Probably 4 - 6 weeks postoperatively

Target condition and reference standard(s)

Follow-up schedule

4 - 6 weeks postoperatively, then 3-monthly clinical examination, CT, and CEA

Reference standard

4 patients underwent CT-guided biopsy, but at an unknown stage

Flow and timing

Timing of CEA vs reference standard (days)

Unclear

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?Yes  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Unclear  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?Unclear  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?Yes
Did all patients receive a reference standard?Yes  
    

Fucini 1987

Study characteristics
Patient sampling

Country

Italy

Study design

Retrospective

Setting

Hospital

Dates of data collection

1979 - 1983

Population (n)

64

Inclusion criteria

Potentially curative surgery

Exclusion criteria

Died or demonstrated recurrence before 1982 (introduction of TPA and CA19-9 assays)

Participants included (n)

52

Patient characteristics and setting

Age range

40 - 77

Smoking status

1 smoker

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes A 28, B 17, C 19

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

No

Recurrences (n)

10

Site of recurrences

N/R

Index tests

CEA timing

As per protocol, then repeated within 2 weeks considered positive

CEA technique

Double antibody method (CEA-PR, Sorin Biomedica)

CEA threshold

20 (95% control group)

Definition of positive

2 consecutive samples

Which CEA value (s) used?

At time of recurrence

Target condition and reference standard(s)

Follow-up schedule

CEA + TPA + CA19-9, clinical exam at 3, 7, 14 days then 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 42, 48, 54, 60 months. Blood count at 3, 6, 12, 18, 24, 26, 48, 60 months. Liver USS at 3, 6, 18, 30, 36, 42, 48, 54, 60 months. CXR at 3, 6, 12, 18, 24, 30, 36, 48, 60 months. DCBE at 18, 42, 60 months. Colonoscopy at 6, 12, 24, 36, 48, 60 months. APCT at 12, 24 months. Random perineal percutaneous needle biopsy (rectal cancer) at 6, 12, 18, 24, 36, 48, 60 months

Flow and timing

Timing of CEA vs reference standard (days)

Sensitivity uses CEA at the time of recurrence, specificity uses CEA over threshold at any time during follow-up

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Yes  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?No  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?Yes  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?Unclear  
Was the timing between index test(s) and reference standard ascertainable?Yes
Did all patients receive a reference standard?Yes  
    

Graffner 1985

Study characteristics
Patient sampling

Country

Sweden

Study design

Prospective

Setting

Hospital

Dates of data collection

N/R

Population (n)

190

Inclusion criteria

Curative resection, age able to attend follow-up

Exclusion criteria

Moved from area, died of intercurrent illness, did not follow the schedule

Participants included (n)

167

Patient characteristics and setting

Age range

55 - 74

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes A 24, B 89, C 77

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

47

Site of recurrences

Liver 18, anastomotic 4, perineal 7, lungs 4, skin 5, multiple organs 8, skeleton 1

Index tests

CEA timing

CEA every second month during the first 2 years and every third month thereafter

CEA technique

Radioimmunoassay

CEA threshold

Abnormal blood values (CEA used same method as Colleen et al 1979 "the reference value was calculated from serum sampled from 89 apparently healthy persons aged 25 to 69 years. It was 10+/- 2.5 ug/l (mean+/-S.D)") or a rise of CEA levels within the normal range of more than 50%

Definition of positive

1 elevated value

Which CEA value (s) used?

At time of recurrence

Target condition and reference standard(s)

Follow-up schedule

CEA, ESR, haemoglobin, ALP, glutamyltranspeptidase (GGT), orosomucoid, alpha-antitrypsin, and haptoglobin every second month during the first 2 years and every third month thereafter. Physical exam and rectoscopy 3, 6, 9, 12, 18, 24, 36, 48, 60 months. DCBE and CXR 12, 36, 60 months

Reference standard

CXR, CT liver, CT perineum, endoscopic investigation of anastomosis, DCBE, angiography and bone scintigraphy in selected cases

Flow and timing

Timing of CEA vs reference standard (days)

Reference standard triggered if abnormal CEA detected

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?No  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?Yes
Did all patients receive a reference standard?Yes  
    

Hara 2008

Study characteristics
Patient sampling

Country

Japan

Study design

Retrospective

Setting

Hospital

Dates of data collection

1990 - 2000

Population (n)

680

Inclusion criteria

Curative resection, Dukes C

Exclusion criteria

Multiple cancers, insufficient examinations, persistent post-op CEA, and SCC, randomised to pretest probability group

Participants included (n)

174

Patient characteristics and setting

Age range

60.6 ± 11.1

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

All Dukes C: Stage 1 18, 2 59, 3 232, 4 39

Perioperative investigations done to ensure no residual disease

Persistent CEA elevation excluded

Chemotherapy/radiotherapy?

No

Recurrences (n)

51

Site of recurrences

N/R

Index tests

CEA timing

3-monthly

CEA technique

N/R

CEA threshold

5 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

At time of recurrence

Target condition and reference standard(s)

Follow-up schedule

All patients were followed for more than 5 years or until death with routine serum CEA examination every 3 months. USS and/or CT and CXR examinations were performed every 3 - 6 months

Reference standard

Additional imaging was performed in patients with elevated postoperative CEA levels to determine whether recurrence was present

Flow and timing

Timing of CEA vs reference standard (days)

Per protocol

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Yes  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Unclear  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?No  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?Yes
Did all patients receive a reference standard?Yes  
    

Hara 2010

Study characteristics
Patient sampling

Country

Japan

Study design

Retrospective

Setting

Hospital

Dates of data collection

1990 - 2004

Population (n)

488

Inclusion criteria

Stage II or III curative resection

Exclusion criteria

Patients with squamous cell carcinoma, more than one cancer, or insufficient follow-up

Participants included (n)

Stage II: 167

Stage III: 136

Patient characteristics and setting

Age range

Stage II: 68.3 ± 10.5 (38 – 92)

Stage III: 63.4 ± 9.4 (44 – 88)

Smoking status

N/R

Site of primary tumour

Stage II: Colon 112, rectum 55

Stage III: Colon 89, rectum 47

Stage of primary tumour

Stage II: Depth T1 0, 2 0, 3 142, 4 23

Stage III: Depth T1 3, 2 89, 3 32, 4 12

Perioperative investigations done to ensure no residual disease

Not specified

Chemotherapy/radiotherapy?

No

Recurrences (n)

Stage II: 23

Stage III: 51

Site of recurrences

N/R

Index tests

CEA timing

Unclear

CEA technique

N/R

CEA threshold

5 µg/L

Definition of positive

N/R

Which CEA value (s) used?

N/R

Target condition and reference standard(s)

Follow-up schedule

All patients underwent routine serum CEA assays and radiological examination

Flow and timing

Timing of CEA vs reference standard (days)

Unclear

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Yes  
Did the study avoid inappropriate exclusions?Unclear  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Unclear  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?Unclear  
   Unclear
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Unclear  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?Unclear  
Was the timing between index test(s) and reference standard ascertainable?No
Did all patients receive a reference standard?Unclear  
    

Hine 1984

Study characteristics
Patient sampling

Country

UK

Study design

Prospective

Setting

Hospital

Dates of data collection

N/R

Population (n)

663

Inclusion criteria

Radical surgery for colorectal cancer

Exclusion criteria

SCC anus, tumours in the appendix. 6 were lost to clinical follow-up and 5 others were removed from the trial. Removal followed the development of unassociated conditions such as alcoholic cirrhosis which interfered with the interpretation of a significant CEA rise (3 patients) and in 2 patients the onset of psychiatric illness made the use of cancer chemotherapy inadvisable

Participants included (n)

626

Patient characteristics and setting

Age range

59

Smoking status

Unknown

Site of primary tumour

290 rectum, 373 colon

Stage of primary tumour

A in 38, B in 377 and C in 248

Perioperative investigations done to ensure no residual disease

Not specified

Chemotherapy/radiotherapy?

Patients with at least 2 progressively rising CEA values of > 35 ng/mL but no other definite evidence of recurrent malignancy were randomised in a prospective trial of cytotoxic therapy

Recurrences (n)

171

Site of recurrences

N/R

Index tests

CEA timing

At each follow-up visit

CEA technique

CEA was measured in the unextracted serum by a double antibody radio-immunoassay as developed by Egan et al. (1972) and adapted by Laurence et al. (1972). The inter- and intra-assay variation of the method was found to be < 10%. An upper limit of 15 µg/L will include 99% of a normal population and in the present study a level of > 20 µg/L was regarded as abnormal

CEA threshold

20 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

3-monthly for the first 2 postoperative years, then 6 - 12-monthly depending on the surgeon. Full clinical examination including sigmoidoscopy was performed

Reference standard

Diagnosis of recurrence was primarily made on the basis of symptoms and signs of disease, confirmed by other investigations when indicated (e.g. liver scan, bone scan, biopsy). Patients with raised CEA underwent thorough clinical examination including sigmoidoscopy. If this indicated recurrent malignancy, confirmatory investigations were ordered and management was initiated appropriate to the results. When clinical examination failed to reveal malignancy, the subsequent course of events depended on the degree of elevation of the CEA. If the level was > 20 ng/mL but < 35 ng/mL, the test was repeated at monthly intervals until it fell below 20 ng/mL or rose above 35 ng/mL. All patients with levels > 35 ng/mL and no clinical evidence of recurrence had a further CEA estimation, full blood count, erythrocyte sedimentation rate, liver function tests, barium enema, chest X-ray and isotope and/or ultrasound liver scan, together with bone scan and colonoscopy where indicated. If recurrence was diagnosed from the results of these investigations then appropriate management was instituted.

Flow and timing

Timing of CEA vs reference standard (days)

Patients with raised CEA were recalled to clinic within 2 months of the date of the first sample for clinical exam and sigmoidoscopy. If no recurrence was found, the frequency of testing was intensified while CEA remained in the 20 - 35 range. If > 35 but no signs of recurrence, then chemotherapy

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?Yes  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?Yes  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?Unclear  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?No  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No
Did all patients receive a reference standard?Unclear  
    

Irvine 2007

Study characteristics
Patient sampling

Country

UK

Study design

Retrospective

Setting

Hospital

Dates of data collection

1996 - 2000

Population (n)

150

Inclusion criteria

Curative surgery for colorectal cancer

Exclusion criteria

Palliative patients, non-operative patients, 11 who developed metastases or recurrences within 3 months of surgery, persistently elevated CEA postoperatively (deemed non-curative resection)

Participants included (n)

139

Patient characteristics and setting

Age range

22 - 87

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes A 10, B 82, C 47

Perioperative investigations done to ensure no residual disease

Development of metastases or recurrence within 3 months of surgery, persistently elevated CEA postoperatively

Chemotherapy/radiotherapy?

No

Recurrences (n)

46

Site of recurrences

N/R

Index tests

CEA timing

Postoperatively 3-monthly for 2 yrs, then 6-monthly to 5 yrs. The CEA measurements for each patient were analysed twice, once looking for a small rise in CEA and again looking for a CEA value that rose above the traditional normal limit (10 µg/L)

CEA technique

Bayer immunoassay, which at the levels in this study has an error rate of 2.3%

CEA threshold

10 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

At time of recurrence

Target condition and reference standard(s)

Follow-up schedule

6-monthly CT for 2 years, plus CEA 3-monthly for 2 years, then 6-monthly to 5 years

Flow and timing

Timing of CEA vs reference standard (days)

Per protocol

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Yes  
Did the study avoid inappropriate exclusions?Yes  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?Yes  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?Yes
Did all patients receive a reference standard?Yes  
    

Johnson 1985

Study characteristics
Patient sampling

Country

Norway

Study design

Prospective

Setting

Hospital + primary care

Dates of data collection

N/R

Population (n)

93

Inclusion criteria

Radical treatment for colorectal cancer

Exclusion criteria

Palliative, new cancers, no CEA monitoring

Participants included (n)

51

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Colon 49, rectal 44

Stage of primary tumour

Dukes A 28, B 27, C 21, palliative 17

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

15

Site of recurrences

N/R

Index tests

CEA timing

Postoperatively, then at 3 - 4-monthly intervals

CEA technique

N/R

CEA threshold

5 µg/L

Definition of positive

N/R

Which CEA value (s) used?

N/R

More data available?

N/R

Target condition and reference standard(s)

Follow-up schedule

Postoperatively, then at 3 - 4-monthly intervals; rising CEA resulted in further investigation (general clinical investigations, angiography of the liver, resection). No fixed schedule

Flow and timing

Timing of CEA vs reference standard (days)

CEA triggered investigation

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?Unclear  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Unclear  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Unclear
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?No  
Were all patients included in the analysis?Unclear  
Was the index test repeated prior to the reference standard?Unclear  
Was the timing between index test(s) and reference standard ascertainable?Unclear
Did all patients receive a reference standard?Unclear  
    

Jubert 1978

Study characteristics
Patient sampling

Country

USA

Study design

Retrospective

Setting

Hospital

Dates of data collection

N/R

Population (n)

97

Inclusion criteria

Colorectal cancer

Exclusion criteria

N/R

Participants included (n)

97

Patient characteristics and setting

Age range

Mean 65 (39 - 89)

Smoking status

Unknown

Site of primary tumour

Colon 56, rectum 41

Stage of primary tumour

Dukes A 10, B 42, C 34, D 6

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

5 chemo, 5 immuno

Recurrences (n)

20

Site of recurrences

7 liver, 13 non-liver

Index tests

CEA timing

At 6-week intervals postoperatively

CEA technique

N/R

CEA threshold

2.5 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

At time of recurrence

Target condition and reference standard(s)

Follow-up schedule

CEA is done preoperatively and at six week intervals postoperatively. In addition, patients are evaluated postoperatively at 6 to 8 week intervals by physical examination and the usual laboratory and radiological tests, and where indicated, suspicions of recurrence and/or metastasis are documented histologically for the most part.

Reference standard

"suspicions of recurrence and/or metastasis are documented histologically for the most part".

Flow and timing

Timing of CEA vs reference standard (days)

Per protocol

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?Unclear  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Unclear  
Is the same method and instrument used for all CEA measurements?Unclear  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Unclear
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Unclear  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?Yes
Did all patients receive a reference standard?Unclear  
    

Kanellos 2006a

Study characteristics
Patient sampling

Country

Greece

Study design

Prospective

Setting

Hospital

Dates of data collection

1991 - 1999

Population (n)

N/R

Inclusion criteria

Histologically proven colorectal cancer, no detectable liver metastasis, curative surgery for colorectal cancer

Exclusion criteria

Confirmed liver metastasis, peritoneal carcinomatosis, ascites, emergency surgery for obstruction or perforation, smokers, obstructive biliary disease or biliary surgery, or refused consent

Participants included (n)

73

Patient characteristics and setting

Age range

64.2 (SD: 9.7)

Smoking status

Non-smokers

Site of primary tumour

Colorectal

Stage of primary tumour

Stage I 14, II 37, III 22

Perioperative investigations done to ensure no residual disease

Pre-op abdominal CT, intraoperative liver palpation to exclude liver metastases

Chemotherapy/radiotherapy?

22 patients with stage III cancer had adjuvant chemo

Recurrences (n)

10

Site of recurrences

N/R

Index tests

CEA timing

3-monthly to 3 yrs, then 6-monthly to 5 yrs

CEA technique

Monoclonal antibody technique, using a solid-phase 2-site mouse monoclonal antibody radioimmunoassay kit

CEA threshold

5 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

At time of recurrence

Target condition and reference standard(s)

Follow-up schedule

Every 3 months for the first 3 years and every 6 months thereafter: clinical examination, routine biochemical analysis, CXR, and CT.

Flow and timing

Timing of CEA vs reference standard (days)

Simultaneous, per protocol.

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes  
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?Yes  
Did all patients receive a reference standard?Yes  
    

Kato 1980

Study characteristics
Patient sampling

Country

Japan

Study design

Prospective

Setting

Hospital

Dates of data collection

1977 - 1979

Population (n)

N/R

Inclusion criteria

Surgically treated for adenocarcinoma of the colon or rectum with curative intent

Exclusion criteria

Incomplete CEA dataset

Participants Included (n)

129

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes A, B, C

Perioperative investigations done to ensure no residual disease

Not specified

Chemotherapy/radiotherapy?

No

Recurrences (n)

32

Site of recurrences

N/R

Index tests

CEA timing

Unclear

CEA technique

RIA kit by Dynabot

CEA threshold

2.5 and 5 µg/L

Definition of positive

N/R

Which CEA value (s) used?

At time of recurrence

Target condition and reference standard(s)

Follow-up schedule

N/R

Flow and timing

Timing of CEA vs reference standard (days)

Unclear

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?Unclear  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear  
Were the reference standard results interpreted without knowledge of the results of the index tests?Unclear  
   Unclear
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Unclear  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?Unclear  
Was the timing between index test(s) and reference standard ascertainable?Yes  
Did all patients receive a reference standard?Unclear  
    

Kim 2013

Study characteristics
Patient sampling

Country

Korea

Study design

Retrospective

Setting

Hospital

Dates of data collection

2005 - 2009

Population (n)

N/R

Inclusion criteria

Radical resection

Exclusion criteria

Patients with stage 0, I or IV cancer, insufficient follow-up (less than 3 years), abnormal CEA in the first measurement after surgery (checked within three months after surgery), history of other cancers and/or history of preoperative concurrent chemoradiation therapy were excluded

Participants Included (n)

336

Patient characteristics and setting

Age range

Stage III: 29 - 81, Stage II: 33 - 83

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

Stage II 189, Stage III 147

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

79

Site of recurrences

Index tests

CEA timing

CEA levels were assayed with a 3-month interval for the first 2 years and every 6 months thereafter

CEA technique

Immunoassay method (ADVIA Centaur XP immunoassay system, Siemens AG, Erlangen, Germany)

CEA threshold

5 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

CEA levels were assayed with a 3-month interval for the first 2 years and every 6 months thereafter. Chest CT and abdomino-pelvic CT were performed with a 6-month interval for the first 2 years and every year thereafter

Reference standard

The diagnosis of a tumour recurrence was confirmed by biopsy and radiologic evidence

Flow and timing

Timing of CEA vs reference standard (days)

Per protocol

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes  
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No  
Did all patients receive a reference standard?Yes  
    

Kohler 1980

Study characteristics
Patient sampling

Country

USA

Study design

Retrospective casenote review.

Setting

Hospital

Dates of data collection

1971 - 1974

Population (n)

144

Inclusion criteria

Surgically confirmed adenocarcinoma of colon or rectum

Exclusion criteria

N/R

Participants Included (n)

49

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

N/R

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

22

Site of recurrences

N/R

Index tests

CEA timing

Not clear

CEA technique

Hansen radioimmunoassay

CEA threshold

2.5 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

N/R

Flow and timing

Timing of CEA vs reference standard (days)

N/R

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?Unclear  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Unclear  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear  
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Unclear
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Unclear  
Were all patients included in the analysis?No  
Was the index test repeated prior to the reference standard?Unclear  
Was the timing between index test(s) and reference standard ascertainable?No  
Did all patients receive a reference standard?Unclear  
    

Koizumi 1992

Study characteristics
Patient sampling

Country

Japan

Study design

Cross-sectional with follow-up of cases

Setting

Hospital

Dates of data collection

1986 - 1990

Population (n)

194

Inclusion criteria

Unclear

Exclusion criteria

Cases undergoing operation later, benign colorectal disease.

Participants Included (n)

77

Patient characteristics and setting

Age range

32 - 83

Smoking status

Unknown

Site of primary tumour

Colorectal

Stage of primary tumour

Unknown

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Follow-up schedule

N/R

Recurrences (n)

34

Site of recurrences

N/R

Index tests

CEA timing

N/R

CEA technique

N/R

CEA threshold

5 µg/L

Definition of positive

N/R

Which CEA value (s) used?

At time of recurrence

Target condition and reference standard(s)

Reference standard

Unclear

Flow and timing

Timing of CEA vs reference standard (days)

N/R

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?Unclear  
   Unclear
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Unclear  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear  
Were the reference standard results interpreted without knowledge of the results of the index tests?Unclear  
   Unclear
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Unclear  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?Unclear  
Was the timing between index test(s) and reference standard ascertainable?Yes  
Did all patients receive a reference standard?Unclear  
    

Korner 2007

Study characteristics
Patient sampling

Country

Norway

Study design

Prospective cohort with retrospective sampling

Setting

Hospital

Dates of data collection

1996 - 1999

Population (n)

314

Inclusion criteria

Surgically treated for adenocarcinoma of the colon or rectum with curative intent, age < 75 yrs, national guidelines followed

Exclusion criteria

Not systematically followed up for 5 years or until recurrence, incomplete CEA dataset. Dukes D

Participants included (n)

153

Patient characteristics and setting

Age range

< 75

Smoking status

N/R

Site of primary tumour

Colon 102, rectum 50

Stage of primary tumour

Dukes A 31, B 79, C 42

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

No

Recurrences (n)

37

Site of recurrences

N/R

Index tests

CEA timing

CEA 3, 6, 9, 12, 18, 24, 30, 36, 42, 48, 54, 60 months

CEA technique

Immunoassay kit from Abbott Diagnostics, IL, USA

CEA threshold

4 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

At time of recurrence

Target condition and reference standard(s)

Follow-up schedule

CEA 3, 6, 9, 12, 18, 24, 30, 36, 42, 48, 54, 60 months. USS Liver & CXR 6, 12, 18, 24, 30, 36, 42, 48, 54, 60 months. Colonoscopy 12, 60 months.

Reference standard

Biopsy and/or imaging studies to confirm recurrence, or disease-free interval of 60 months without proof of recurrence.

Flow and timing

Timing of CEA vs reference standard (days)

Not specified if different from protocol

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Yes  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes  
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?No  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?Yes  
Did all patients receive a reference standard?Yes  
    

Li Destri 1998

Study characteristics
Patient sampling

Country

Italy

Study design

Retrospective

Setting

Hospital

Dates of data collection

N/R

Population (n)

364

Inclusion criteria

Radical surgery for colorectal cancer; CEA measured postoperatively

Exclusion criteria

N/R

Participants included (n)

239

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

N/R

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

No

Recurrences (n)

45

Site of recurrences

Hepatic 18, non-hepatic 22, mixed 5

Index tests

CEA timing

CEA monitoring, conducted every 3 months for years 1, 2, and 3, every 6 months for years 4 and 5, then yearly up to year 10

CEA technique

The antigen was determined using the radioimmunoassay method.

CEA threshold

5 µg/L

Definition of positive

N/R

Which CEA value (s) used?

N/R

Target condition and reference standard(s)

Follow-up schedule

CEA monitoring, conducted every 3 months for years 1, 2, and 3, every 6 months for years 4 and 5, then yearly up to year 10.

Flow and timing

Timing of CEA vs reference standard (days)

N/R

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?Unclear  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear  
Were the reference standard results interpreted without knowledge of the results of the index tests?Unclear  
   Unclear
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Unclear  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?Unclear  
Was the timing between index test(s) and reference standard ascertainable?Unclear  
Did all patients receive a reference standard?Unclear  
    

Lucha 1997

Study characteristics
Patient sampling

Country

USA

Study design

Retrospective

Setting

Hospital

Dates of data collection

1981 - 1985

Population (n)

N/R

Inclusion criteria

Newly diagnosed colorectal cancer undergoing operative resection for cure (Astler-Coller A, B, C)

Exclusion criteria

Metastatic disease and synchronous cancers

Participants Included (n)

285

Patient characteristics and setting

Age range

66.8 (range, 31 - 96)

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

Astler-Coller Stage A 39, B1 57, B2 109, C1 15, C2 60

Perioperative investigations done to ensure no residual disease

Intraoperative criteria for curative resection included absence of gross residual disease

Chemotherapy/radiotherapy?

No

Recurrences (n)

66

Site of recurrences

N/R

Index tests

CEA timing

2-monthly for 2 years, 3-monthly for year 3, 6-monthly for years 4 - 5, annually afterwards. A repeat CEA was performed in patients who had an abnormal rise

CEA technique

Abbott

CEA threshold

5 µg/L

Definition of positive

2 consecutive samples

Which CEA value (s) used?

At time of recurrence

Target condition and reference standard(s)

Follow-up schedule

2-monthly for 2 years, 3-monthly for year 3, 6-monthly for years 4 and 5, annually afterwards. A detailed history and physical examination were performed, and CEA levels were monitored at each encounter.

Reference standard

Two successive CEA elevations were investigated with diagnostic imaging and / or endoscopy when indicated.

Flow and timing

Timing of CEA vs reference standard (days)

Per protocol

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Yes  
Did the study avoid inappropriate exclusions?Yes  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes  
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Unclear
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?No  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?Yes  
Did all patients receive a reference standard?No  
    

Luporini 1979

Study characteristics
Patient sampling

Country

Italy

Study design

Retrospective

Setting

Hospital

Dates of data collection

1974 - 1976

Population (n)

204

Inclusion criteria

Large bowel malignancies, radical resection

Exclusion criteria

N/R

Participants Included (n)

198

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Large intestine

Stage of primary tumour

Dukes A - B 11, C1 39, C2 30, CH (liver involvement) 32

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

Yes

Recurrences (n)

62

Site of recurrences

N/R

Index tests

CEA timing

N/R

CEA technique

N/R

CEA threshold

5 µg/L

Definition of positive

N/R

Which CEA value (s) used?

N/R

More data available?

N/R

Target condition and reference standard(s)

Follow-up schedule

N/R

Flow and timing

Timing of CEA vs reference standard (days)

Unclear

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?Unclear  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Unclear  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear  
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Unclear  
Were all patients included in the analysis?Unclear  
Was the index test repeated prior to the reference standard?Unclear  
Was the timing between index test(s) and reference standard ascertainable?Unclear  
Did all patients receive a reference standard?Unclear  
    

Mach 1978

Study characteristics
Patient sampling

Country

Switzerland

Study design

Retrospective

Setting

Hospital

Dates of data collection

1977 - 1978

Population (n)

200

Inclusion criteria

Histologically confirmed diagnosis of adenocarcinoma of colon or rectum

Exclusion criteria

Incomplete tumour resection

Participants Included (n)

66

Patient characteristics and setting

Age range

65

Smoking status

12 patients who had CEA levels fluctuating around the normal limit of 5 ng/ml during the last 2 or 3 years without a definite rise of CEA levels and also without clinical evidence of tumour relapse. Among them were 6 heavy smokers

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes ABCD

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

2 of the recurrences were reported to have chemo

Recurrences (n)

19

Site of recurrences

N/R

Index tests

CEA timing

3-monthly

CEA technique

The radioimmunoassay of CEA was performed according to the method of Goldz as modified by Mach et al. The major modification was that duplicates of 1 ml of plasma (10 ml of blood was collected in tubes containing 33 mg of dry E.D.T.A. K3), instead of 5 ml of serum, were extracted in perchloric acid. The sensitivity of the test is 1 µg/L. The normal value determined in 90 nonsmoking blood bank donors, unselected for age and sex, ranged from 0 to 3.5 µg/L. Our CEA assay is similar to the Hansen method, but our numerical values are slightly higher and should be divided by a factor of 1.5 in order to make a direct comparison.

CEA threshold

5 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

N/R

Flow and timing

Timing of CEA vs reference standard (days)

N/R

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?Yes  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Unclear  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?Yes  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear  
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Unclear
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Unclear  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?Unclear  
Was the timing between index test(s) and reference standard ascertainable?No  
Did all patients receive a reference standard?Unclear  
    

Mackay 1974

Study characteristics
Patient sampling

Country

UK

Study design

Prospective

Setting

Hospital

Dates of data collection

Approx 1970 - 1973

Population (n)

N/R

Inclusion criteria

Surgically resected colorectal carcinoma: (a) the operations were considered to be clinically curative; (b) pathological staging showed the carcinoma to fall into Dukes (1950) A, B, or C category; (c) the participants had been followed up for at least 12 months, and most for 24 months, either after the operation or after the first plasma CEA assay

Exclusion criteria

Inadequate follow-up time or because the plasma CEA values had risen temporarily to or remained at levels between 20 and 40 µg/L

Participants included (n)

220

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes ABC

Perioperative investigations done to ensure no residual disease

Unclear

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

53

Site of recurrences

Liver 31, lung 3, peritoneum and pelvis 17, bones 2, local 6, skin 2

Index tests

CEA timing

3-monthly

CEA technique

Double-antibody radioimmunoassay

CEA threshold

40 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

N/R

Reference standard

Recurrence of tumour was detected clinically or by radioisotope scanning or other radiographic techniques.

Flow and timing

Timing of CEA vs reference standard (days)

Reference standard triggered by a rise in CEA

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?Yes  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear  
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Unclear
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?No  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?Unclear  
Was the timing between index test(s) and reference standard ascertainable?No  
Did all patients receive a reference standard?No  
    

Mariani 1980

Study characteristics
Patient sampling

Country

Italy

Study design

Prospective

Setting

Hospital

Dates of data collection

N/R

Population (n)

N/R

Inclusion criteria

Histologically confirmed adenocarcinoma submitted for resection (included ± pre-op measurements)

Exclusion criteria

Heavy smokers (> 15 cigarettes/day) and patients with known or suspected alcoholic hepatitis

Participants included (n)

69

Patient characteristics and setting

Age range

60.2 ± 11.6 yrs

Smoking status

Excluded

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes A 5, B 18, C 14, D 2.

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

No

Recurrences (n)

24

Site of recurrences

N/R

Index tests

CEA timing

The 4th and 14th day after surgery. Subsequent blood samples were taken at regular intervals (every 2 - 3 months) in the following 12 - 20 months. Moreover, an increased CEA value was always confirmed by repeated assays of the same sample, and by assaying an additional sample obtained from the same patient

CEA technique

Radioimmunoassay (RIA), using commercial EAK kits (purchased through SORIN Biomedica, Saluggia, Italy)

CEA threshold

10 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

All patients had a blood sample taken for CEA assay preoperatively, then at the 4th and 14th day after surgery. Subsequent blood samples were taken at regular intervals (every 2-3 months) in the following 12-20 months with follow-up examinations; the complete work-up of the patients included physical examination, chest standard X-ray, recto-sigmoidoscopy, liver scan, hemogram and liver function tests; barium enema and bone scan were performed when indicated.

Flow and timing

Timing of CEA vs reference standard (days)

Not specified

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?No  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes  
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?No  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No  
Did all patients receive a reference standard?No  
    

McCall 1994

Study characteristics
Patient sampling

Country

Australia

Study design

Prospective RCT

Setting

Hospital

Dates of data collection

1984 - 1990

Population (n)

328

Inclusion criteria

Curative resection of colorectal cancers

Exclusion criteria

Patients with metastatic disease at presentation and those who for geographic or medical reasons were not able to be followed were excluded from the trial. Less than two years follow-up completed (16 patients: 10 died of unrelated causes; 6 withdrew consent or were lost to follow-up) and failure to obtain CEA levels (one patient).

Participants Included (n)

311

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes ABC

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

98

Site of recurrences

N/R

Index tests

CEA timing

Patients entered into both arms of the study had serum CEA levels measured for 5 consecutive years: every 3 months for the first 2 years, then every 6 months for the next 3 years

CEA technique

Enzyme immunoassay method (Abbott Laboratories, North Chicago, IL)

CEA threshold

5 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

Standard follow-up: clinical review plus CEA, liver function, and faecal occult blood, 3-monthly until 2 years, then 6-monthly until 5 years. CXR, liver CT, and colonoscopy at 0 and 5 years.

Aggressive follow-up: as for standard follow-up, plus CXR, liver CT, and colonoscopy annually.

Reference standard

Radiology, histology

Flow and timing

Timing of CEA vs reference standard (days)

per protocol

Comparative 
Notes 
Methodological quality
Item   Authors' judgement   Risk of bias   Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Yes  
Did the study avoid inappropriate exclusions?Yes  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No
Did all patients receive a reference standard?Yes  
    

Miles 1995

Study characteristics
Patient sampling

Country

Scotland

Study design

Retrospective notes review

Setting

Hospital

Dates of data collection

1988 - 1992

Population (n)

265

Inclusion criteria

Patients who underwent a resection, with curative intent.

Exclusion criteria

Patients were excluded where, on inspection of the patients' notes, it was found that primary surgery was palliative, follow-up was incomplete or there were fewer than 1 preoperative and 2 postoperative carcinoembryonic antigen level estimations

Participants included (n)

125

Patient characteristics and setting

Age range

69 (41 - 90)

Smoking status

Unknown

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes A 10, B 27, C 38, D 22, unknown 27

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

No

Recurrences (n)

53

Site of recurrences

N/R

Index tests

CEA timing

Not clear

CEA technique

Using international standard International Reference Preparation 73/601, National Institute for Biological Standards and Control

CEA threshold

10 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

History is recorded and clinical examination (including rectal examination and rigid sigmoidoscopy), faecal occult blood test and estimation of carcinoembryonic antigen level are undertaken

Reference standard

The presence of recurrent disease is confirmed by clinical examination, colonoscopy, biopsy, chest radiography, ultrasonography, computerized axial tomography scanning and laparotomy.

Flow and timing

Timing of CEA vs reference standard (days)

per protocol, CEA triggers reference standard.

Comparative 
Notes 
Methodological quality
Item   Authors' judgement   Risk of bias   Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?Yes  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?Unclear  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?No  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No
Did all patients receive a reference standard?No  
    

Minton 1985

Study characteristics
Patient sampling

Country

USA

Study design

Prospective

Setting

Hospital

Dates of data collection

1978 - 1983

Population (n)

400

Inclusion criteria

post-colorectal cancer resection

Exclusion criteria

N/R

Participants included (n)

400

Patient characteristics and setting

Age range

58 (18 - 84)

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes A 17, B1 91, B2 31, C1 119, C2 122, D 6, unknown 6

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

No

Recurrences (n)

130

Site of recurrences

Liver 49, anastomosis site or mesentery of bowel 26, peritoneum 7, pelvis 6, para-aortic nodes 2, mesenteric nodes 2, multiple 7, other 28, no disease found 3

Index tests

CEA timing

CEA performed every 2 months for the first 2 years, and then every 4 months for the next 3 years. To rule out laboratory variations, a repeat CEA value was required to confirm an abnormal CEA elevation

CEA technique

N/R

CEA threshold

2.5 µg/L

Definition of positive

Abnormal repeated

Which CEA value (s) used?

Unclear

Target condition and reference standard(s)

Follow-up schedule

Patients were evaluated postoperatively with each surgeon's customary follow-up procedures and frequency of CEA determinations.

Reference standard

Second-look surgery was performed on any potentially resectable recurrent cancer discovered by physical examination or symptoms of bowel or ureteral obstruction, gastrointestinal bleeding, or findings from rectal, vaginal, or colostomy examinations. In addition, second-look surgery was done when a persistently rising CEA value was detected. Before the second-look procedure was performed a careful physical examination complemented by chest roentgenogram, bone and brain scans, and appropriate gastrointestinal and genitourinary roentgenograms was done to rule out the possibility of unresectable metastases. A computerized axial tomography (CAT) scan of the abdomen was not required, but was considered appropriate for institutions with that capability.

Flow and timing

Timing of CEA vs reference standard (days)

not specified

Comparative 
Notes 
Methodological quality
Item   Authors' judgement   Risk of bias   Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?Unclear  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?No  
Is the same method and instrument used for all CEA measurements?Unclear  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?No  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?Unclear  
Was the timing between index test(s) and reference standard ascertainable?Unclear
Did all patients receive a reference standard?Unclear  
    

Mittal 2011

Study characteristics
Patient sampling

Country

India

Study design

Retrospective

Setting

Hospital

Dates of data collection

N/R

Population (n)

73

Inclusion criteria

Histologically proven postoperative CRC resection undergoing PET/CT and conventional imaging to detect suspected recurrence triggered by a rising CEA

Exclusion criteria

N/R

Participants included (n)

73

Patient characteristics and setting

Age range

25 - 80

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

N/R

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

No

Recurrences (n)

38

Site of recurrences

N/R

Index tests

CEA timing

Within 7 - 10 days of imaging

CEA technique

Electro-chemiluminescent immunoassay

CEA threshold

3 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

At point of recurrence

Target condition and reference standard(s)

Reference standard

PET/CT

Flow and timing

Timing of CEA vs reference standard (days)

within 7-10 days of CEA

Comparative 
Notes 
Methodological quality
Item   Authors' judgement   Risk of bias   Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?No  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?Unclear  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?Yes
Did all patients receive a reference standard?Yes  
    

Nishida 1988

Study characteristics
Patient sampling

Country

Japan

Study design

Prospective

Setting

Hospital

Dates of data collection

N/R

Population (n)

N/R

Inclusion criteria

Surgically treated for adenocarcinoma of the colon or rectum with curative intent and CEA measurements

Exclusion criteria

incomplete CEA dataset

Participants included (n)

66

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

Stage I - V

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

No

Follow-up schedule

N/R

Recurrences (n)

20

Site of recurrences

N/R

Index tests

CEA timing

CEA 1 month

CEA technique

RIA kit by Dynabot

CEA threshold

2.5 µg/L

Definition of positive

N/R

Which CEA value (s) used?

At time of recurrence

Target condition and reference standard(s)

Reference standard

N/R

Flow and timing

Timing of CEA vs reference standard (days)

Unclear

Comparative 
Notes 
Methodological quality
Item   Authors' judgement   Risk of bias   Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?Unclear  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?Unclear  
   Unclear
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Unclear  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?Unclear  
Was the timing between index test(s) and reference standard ascertainable?Yes
Did all patients receive a reference standard?Unclear  
    

Ochoa-Figueroa 2012

Study characteristics
Patient sampling

Country

Spain

Study design

Retrospective

Setting

Hospital

Dates of data collection

2007 - 2011

Population (n)

54

Inclusion criteria

Referred to the Department of Nuclear Medicine for FDG PET-CT with suspected CRC recurrence following surgical resection and subsequent histological confirmation

Exclusion criteria

Not possible to follow up, mixed malignancy of the salivary gland

Participants Included (n)

47

Patient characteristics and setting

Age range

63 (32 - 87)

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

N/R

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

38 chemo, 9 chemo and radio

Recurrences (n)

34

Site of recurrences

N/R

Index tests

CEA timing

CEA used as a marker of suspected recurrence or measured when recurrence suspected by CT

CEA technique

Radioimmunoanalysis

CEA threshold

10 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

Single measurement taken at point of recurrence

Target condition and reference standard(s)

Reference standard

Histopathology or Clinical evolution, FDG PET-CT

Flow and timing

Timing of CEA vs reference standard (days)

CEA measured prior to referral; timing not specified further

Comparative 
Notes 
Methodological quality
Item   Authors' judgement   Risk of bias   Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?No  
Did the study avoid inappropriate exclusions?Unclear  
   High
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Unclear
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?Yes
Did all patients receive a reference standard?Yes  
    

Ohlsson 1995

Study characteristics
Patient sampling

Country

Sweden

Study design

RCT

Setting

Hospital

Dates of data collection

1983 - 1986

Population (n)

107

Inclusion criteria

Resection with curative intent, recruited to follow-up group

Exclusion criteria

Patients operated with local excision or having demonstrable distant metastases were excluded, as were patients in whom age or severe illness was considered to preclude treatment of recurrent disease. Other exclusion criteria were:

Inability to cooperate, ulcerative colitis, Crohn's disease, familial polyposis, and incomplete colonoscopy together with uncertain findings at the barium enema examination

Participants Included (n)

53

Patient characteristics and setting

Age range

65.7 (40.6 - 83.3)

Smoking status

N/R

Site of primary tumour

Rectum 19, colon 34

Stage of primary tumour

Dukes A 10, B 21, C 22

Perioperative investigations done to ensure no residual disease

Preoperative investigation included barium enema, pulmonary x-ray, and blood tests for liver function test, carcinoembryonic antigen and colonoscopy

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

17

Site of recurrences

Local 11, liver 3, lung 3, peritoneum 2, ovary 1

Index tests

CEA timing

3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 42, 48, 60 months

CEA technique

Not specified

CEA threshold

N/R

Definition of positive

N/R

Which CEA value (s) used?

N/R

Target condition and reference standard(s)

Follow-up schedule

Physical examination, rigid proctosigmoidoscopy, blood tests (CEA, ALP, GGT), faecal haemoglobin,

CXR: 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 42, 48, 60 months.

Endoscopic control of the anastomosis: 9, 21, 42 months.

Colonoscopy: 3, 15, 30, 60 months.

CT Pelvis: 3, 6, 12, 18, 24 months.

Reference standard

CT/ Endoscopy/colonoscopy

Flow and timing

Timing of CEA vs reference standard (days)

per protocol, immediate diagnostic work-up did not reveal the site of recurrence in 4 asymptomatic patients with raised CEA levels; in these patients the time interval between elevation of CEA and symptoms of tumour recurrence varied between 0.2 and 4.7 (median 0.5) years

Comparative 
Notes 
Methodological quality
Item   Authors' judgement   Risk of bias   Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Yes  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Unclear  
Is the same method and instrument used for all CEA measurements?Unclear  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?No  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No
Did all patients receive a reference standard?Yes  
    

Ohtsuka 2008

Study characteristics
Patient sampling

Country

Japan

Study design

Retrospective

Setting

Hospital

Dates of data collection

2002 - 2005

Population (n)

138

Inclusion criteria

Curative resection, stage 0 – III according to the General Rules for Clinical and Pathological Studies on Cancer of the Colon, Rectum, and Anus, 7th edition, 2006, no residuals

Exclusion criteria

History of another malignancy before or after the operation, lost to follow-up

Participants Included (n)

97

Patient characteristics and setting

Age range

70 (37 - 86)

Smoking status

Chronic benign disease or smoking in 46 cases

Site of primary tumour

32 right colon, 32 left colon, 30 rectum, 3 multiple

Stage of primary tumour

0 in 8, I in 12, II in 37, and III in 40

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

Yes, but not described

Recurrences (n)

22

Site of recurrences

Index tests

CEA timing

Every 1 – 3 months during the initial 6 months after the operation, every 3 – 6 months from 6 months to 2 years, and every 6 – 12 months during 2 – 5 years after the operation

CEA technique

CEA, a latex immunoassay, Mitsubishi Chemical Ltd., Japan

CEA threshold

5 µg/L

Definition of positive

N/R

Which CEA value (s) used?

N/R

Target condition and reference standard(s)

Follow-up schedule

The follow-up schedule for tumour markers and physical examination after the operation was: every 1 – 3 months during the initial 6 months after the operation, every 3 – 6 months from 6 months to 2 years, and every 6 – 12 months during 2 – 5 years after the operation. Radiological examinations including abdominal ultrasonography, computed tomography (CT), chest X-ray, gastrointestinal series, and/or endoscopic evaluation were performed every 6 – 12 months during the follow-up period. Marker evaluations and physical/radiological examinations were performed at shorter intervals than those described above in patients with suspected recurrence, those undergoing chemotherapy, or those demonstrating marker elevations.

Reference standard

radiological examinations / histology

Flow and timing

Timing of CEA vs reference standard (days)

per protocol or reference standard triggered by rise in CEA

Comparative 
Notes 
Methodological quality
Item   Authors' judgement   Risk of bias   Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?Unclear
Did all patients receive a reference standard?Yes  
    

Park 2009

Study characteristics
Patient sampling

Country

Korea

Study design

Prospective

Setting

Hospital

Dates of data collection

N/R

Population (n)

1707

Inclusion criteria

curative resection for colorectal cancer followed by surveillance programme

Exclusion criteria

Patients with synchronous metastatic disease or patients undergoing palliative resection, and those with carcinoma in situ, inflammatory bowel disease, familial adenomatous polyposis or pathology other than adenocarcinoma were excluded, as were patients with T1 cancer treated by endoscopic mucosal resection or transanal excision. In addition, patients with chronic obstructive lung disease, chronic liver disease, peptic ulcer, and diabetes were excluded.

Participants Included (n)

1263

Patient characteristics and setting

Age range

61 (21 - 90)

Smoking status

N/R

Site of primary tumour

Colon 631, rectum 632

Stage of primary tumour

I 212, II 514, III 537

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

Yes, but not specified

Recurrences (n)

291

Site of recurrences

N/R

Index tests

CEA timing

per schedule

CEA technique

N/R

CEA threshold

7 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All, although at point of recurrence for 18.8%

Target condition and reference standard(s)

Follow-up schedule

Follow-up at 2- or 3-month intervals for the first 2 years and at 6-month intervals thereafter. At each visit, CEA levels are assayed, a full history is obtained, and a physical examination is performed. A serum CEA assay is performed with at least a 2-week interval after the administration of chemotherapy. Colonoscopy is performed within 6 months to 1 year following surgery, and every 3 years thereafter. Chest radiographs and abdominopelvic computed tomography (CT) are performed 6 months postoperatively and then at yearly intervals. Unscheduled CT or positron emission tomography (PET) scans were performed on patients with increased serum CEA concentrations or patients who were symptomatic.

Reference standard

Diagnosis of a tumour recurrence was confirmed by biopsy or examination of the resected specimen. Otherwise, tumour recurrence was documented from the first clinical or radiologic sign of disease that showed an unrelenting course leading to tumour progression and/or death. The criteria for establishment of recurrent disease included histologic confirmation, palpable disease, or radiographic evidence of disease with subsequent clinical progression and supportive biochemical data, particularly an increased CEA level.

Flow and timing

Timing of CEA vs reference standard (days)

per protocol

Comparative 
Notes 
Methodological quality
Item   Authors' judgement   Risk of bias   Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Unclear  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?No  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No
Did all patients receive a reference standard?Yes  
    

Peng 2013

Study characteristics
Patient sampling

Country

China

Study design

Retrospective comparative diagnostic accuracy study

Setting

Hospital

Dates of data collection

2006 - 2012

Population (n)

128

Inclusion criteria

Colorectal cancer with full response to primary surgery ± chemo, undergoing FDG-PET/CT for either elevated CEA levels or in patients with a suspicion of recurrence without CEA rise

Exclusion criteria

Unstable, severe DM, severe illness, 1 or more additional tumours, unable to remain supine for 30 mins

Participants Included (n)

96

Patient characteristics and setting

Age range

61 (34 - 85)

Smoking status

N/R

Site of primary tumour

Colon 53, rectum 42

Stage of primary tumour

0 in 1, I 15, II 31, III 39, IV 9, unknown 1

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

Yes, but not specified

Recurrences (n)

63

Site of recurrences

N/R

Index tests

CEA timing

3-monthly

CEA technique

N/R

CEA threshold

5 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

At time of recurrence

Target condition and reference standard(s)

Reference standard

FDG-PET/CT +/- histology

Flow and timing

Timing of CEA vs reference standard (days)

Detection of recurrent lesions within 6 months of the FDG-PET scan/CEA ± histology

Comparative 
Notes 
Methodological quality
Item   Authors' judgement   Risk of bias   Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Yes  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Unclear  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?Yes
Did all patients receive a reference standard?Yes  
    

Seregni 1992

Study characteristics
Patient sampling

Country

Italy

Study design

Retrospective

Setting

Hospital

Dates of data collection

1975 - 1990

Population (n)

431

Inclusion criteria

Curative resection

Exclusion criteria

N/R

Participants Included (n)

336

Patient characteristics and setting

Age range

21 - 92

Smoking status

N/R

Site of primary tumour

Colon 247, rectum 184

Stage of primary tumour

Dukes A 40, B 186, C 107, D 72

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

136

Site of recurrences

50 local recurrences, 136 distant recurrences

Index tests

CEA timing

N/R

CEA technique

N/R

CEA threshold

N/R

Definition of positive

Unclear

Which CEA value (s) used?

Unclear

Target condition and reference standard(s)

Reference standard

N/R

Flow and timing

Timing of CEA vs reference standard (days)

N/R

Comparative 
Notes 
Methodological quality
Item   Authors' judgement   Risk of bias   Applicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?No  
Did the study avoid inappropriate exclusions?Unclear  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Unclear  
Is the same method and instrument used for all CEA measurements?Unclear  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear
Were the reference standard results interpreted without knowledge of the results of the index tests?Unclear  
   Unclear
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Unclear  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?Unclear  
Was the timing between index test(s) and reference standard ascertainable?Unclear
Did all patients receive a reference standard?Unclear  
    

Staib 2000

Study characteristics
Patient sampling

Country

Germany

Study design

Prospective

Setting

Hospital

Dates of data collection

1994 - 1998

Population (n)

100

Inclusion criteria

Patients undergoing a whole-body PET scan for suspected relapse after curative resection of histologically confirmed colorectal cancer and who caused a “diagnostic problem”. The “diagnostic problems” of the patients that led to a PET scan were (1) staging of rest of the body in patients with known recurrence (n = 30); (2) suspected recurrence (n = 32); (3) increasing CEA level (n = 13); (4) unclear finding on pelvic CT (n = 7); and (5) confirmation of liver metastases (n = 12) and lung metastases (n = 6).

Exclusion Criteria

No CEA evaluation, uncontrolled DM, or acute inflammation

Participants Included (n)

98

Patient characteristics and setting

Age range

62 (32 - 80)

Smoking status

N/R

Site of primary tumour

Rectal 52, sigmoid 12, colon 22, lung or liver metastases 9, peritoneum 1

Stage of primary tumour

I 8, II 25, III 46, IV 21

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

Chemo/immunotherapy 25

Recurrences (n)

58

Site of recurrences

N/R

Index tests

CEA timing

N/R

CEA technique

Liaison Kit (Byk-Sangtec, Dietzenbach, Germany)

CEA threshold

3 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

At point of recurrence

Target condition and reference standard(s)

Follow-up schedule

Followed up with the department’s established follow-up program. The indication for a whole-body PET scan was given for patients with suspected relapse after curative resection of colorectal cancer and who caused a “diagnostic problem”

Reference standard

FDG-PET/CT

Flow and timing

Timing of CEA vs reference standard (days)

per protocol

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?No  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes  
Were the reference standard results interpreted without knowledge of the results of the index tests?Yes  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?Yes  
Did all patients receive a reference standard?Yes  
    

Steele 1982

Study characteristics
Patient sampling

Country

USA

Study design

RCT

Setting

Hospital

Dates of data collection

1975 - 1980

Population (n)

770

Inclusion criteria

Dukes' B2 or C colon or rectal cancer, 2 treatment arms: GITSG protocol 7175 was designed to evaluate adjuvant therapy (chemotherapy, radiotherapy, both, and none) following curative resection of Dukes' B2, C1, or C2 rectal carcinoma. Protocol 6175 was the study of the potential benefit of adjuvant therapy (chemotherapy, immunotherapy, both, and none) following clinically curative resection of Dukes' B2, C1, or C2 colon cancers.

Exclusion criteria

CEA not recorded post-op

Participants Included (n)

734

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Rectal 191, colon 543

Stage of primary tumour

N/R

Perioperative investigations done to ensure no residual disease

CEA < 5

Chemotherapy/radiotherapy?

Yes, but not described

Recurrences (n)

149

Site of recurrences

Colon

Index tests

CEA timing

In the active treatment arms, CEA values during and after treatment were to be obtained monthly during the first 3 months, every 3 months for the remainder of the first year, and every 6 months from then on. Patients in the control arms were to have CEA values obtained before operation, 1 week after operation, at weeks 5, 10, 15, and 25 after operation, and every 15 weeks thereafter

CEA technique

Hansen Z-gel technique. Interassay comparisons among the institutions and intra-assay analysis performed in the GITSG CEA reference laboratory at the Mallory Gastrointestinal Institute (Boston, Massachusetts) showed excellent reproducibility and acceptable variation among the various laboratories

CEA threshold

2.5 µg/L

Definition of positive

Maximum level of CEA

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

Patients in both protocols were scheduled for regular clinic visits every 5 weeks during the first 6 months after surgery and every 15 weeks for the remainder of the first year. Physical examination, complete blood count, and liver function tests were performed at each visit. Liver/spleen scan and chest postero-anterior and lateral roentgenograms were obtained every 6 months. Sigmoidoscopic examination and large-bowel contrast roentgenograms were performed every year. Histologic evidence of tumor was the fundamental criterion for recurrence. However, roentgenographic evidence was acceptable in cases of lung or bony metastases. In the rectal-cancer adjuvant study, liver metastases were also accepted on the basis of liver scan, and local recurrence was accepted on the basis of perineal pain occurring acutely after a pain-free interval.

Reference standard

Histology, XR for bony or lung mets, liver scan for liver mets in rectal study, or perineal pain occurring acutely after a pain-free interval.

Flow and timing

Timing of CEA vs reference standard (days)

per protocol

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?No  
Did the study avoid inappropriate exclusions?Unclear  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?Yes  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?Yes  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes  
Were the reference standard results interpreted without knowledge of the results of the index tests?Unclear  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?No  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?Unclear  
Was the timing between index test(s) and reference standard ascertainable?No  
Did all patients receive a reference standard?Yes  
    

Tang 2009

Study characteristics
Patient sampling

Country

Taiwan

Study design

Prospective

Setting

Hospital

Dates of data collection

1995 - 2007

Population (n)

N/A

Inclusion criteria

(1) Prior curative resection for histology-proven primary adenocarcinoma of the colorectum between 1995 and 2002, (2) availability of serial serum samples from before the operation and from after the surgery, and (3) follow-up with a definitive clinical outcome

Exclusion criteria

(1) synchronous or metachronous extracolonic cancers, (2) having neoadjuvant therapy for rectal cancer, and (3) fewer than 3 follow-up samples available for s-p53Ab analysis

Participants Included (n)

305

Patient characteristics and setting

Age range

20 - 90

Smoking status

N/R

Site of primary tumour

Colon 95, rectum 101, both 4

Stage of primary tumour

I 45, II 130, III 130.

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

76

Site of recurrences

Locoregional 7, intra-abdominal or retro-peritoneal 18, hepatic 29, pulmonary 17, brain or bone 9

Index tests

CEA timing

The CEA test was defined as positive if 2 consecutive postoperative CEA values were greater than 5 µg/L or the elevated preoperative CEA values had not returned to the normal level (5 µg/L) after surgery

CEA technique

Abbott Architect 2000 (Abbott Laboratories, Abbott Park, IL, USA)

CEA threshold

5 µg/L

Definition of positive

The CEA test was defined as positive if 2 consecutive postoperative CEA values were greater than 5 µg/L or the elevated preoperative CEA values had not returned to the normal level (5 µg/L) after surgery

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

All cases were followed up at the outpatient department every 3 – 6 months until death or until December 2007. All the patients were followed according to the hospital guidelines of care. Briefly, all patients underwent a follow-up protocol of outpatient visits every 3 – 6 months. The follow-up included physical examination and carcinoembryonic antigen tests as well as chest X-ray, abdominal sonography or abdominal computer-assisted tomography scan, and colonoscopy every 1 – 3 years after operation.

Reference standard

Relapse confirmed by histology or by an imaging study

Flow and timing

Timing of CEA vs reference standard (days)

Triggered by positive CEA

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?Unclear  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes  
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No  
Did all patients receive a reference standard?No  
    

Tate 1982

Study characteristics
Patient sampling

Country

UK

Study design

Prospective study after some retrospective sampling

Setting

Hospital

Dates of data collection

1973 - 1978

Population (n)

520

Inclusion criteria

Curative resection

Exclusion criteria

Dukes D, no follow-up information available, signs of malignancy on first postoperative examination, malignancy of other sites during follow-up

Participants Included (n)

468

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

N/R

Stage of primary tumour

A 94, B 226, C 128, unknown 20

Perioperative investigations done to ensure no residual disease

First postoperative exam

Chemotherapy/radiotherapy?

Not stated

Recurrences (n)

108

Site of recurrences

N/R

Index tests

CEA timing

At each follow-up visit

CEA technique

Assayed by a double-antibody radioimmunoassay system

CEA threshold

40 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

The follow-up procedure for each patient complied with the normal clinical practice for the hospital concerned and, in addition, at each follow-up examination a specimen of plasma was taken for CEA determination. Follow-up was at least 6-monthly.

Reference standard

Variable

Flow and timing

Timing of CEA vs reference standard (days)

Very variable

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Yes  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Unclear  
Were the reference standard results interpreted without knowledge of the results of the index tests?Yes  
   Unclear
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?No  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No  
Did all patients receive a reference standard?Unclear  
    

Tobaruela 1997

Study characteristics
Patient sampling

Country

Spain

Study design

Retrospective

Setting

Hospital

Dates of data collection

1988 - 1993

Population (n)

N/R

Inclusion Criteria

Colorectal cancer, curative surgery for Dukes C disease.

Exclusion Criteria

Dukes A, B, D

Participants Included (n)

60

Patient characteristics and setting

Age range

Preoperative CEA < 5: 60.9 (34 - 85); preoperative CEA > 5: 64.9 (47 - 83)

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes C = 60

Perioperative investigations done to ensure no residual disease

No

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

21

Site of recurrences

Hepatic 9, locoregional 6, combined 3, pulmonary 3

Index tests

CEA timing

As follow-up schedule

CEA technique

Enzyme-linked immunoassay

CEA threshold

5 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

Physical examination and CEA 3 monthly for 2 years, then 6 monthly up to 5 years. USS abdomen twice a year. CT if CEA increased

Reference standard

CT if CEA increased

Flow and timing

Timing of CEA vs reference standard (days)

per protocol

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?Yes  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes  
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No  
Did all patients receive a reference standard?No  
    

Triboulet 1983

Study characteristics
Patient sampling

Country

France

Study design

Prospective cohort study

Setting

Hospital

Dates of data collection

1976 - 1979

Population (n)

91

Inclusion criteria

Operated on with curative intent for colorectal cancer

Exclusion criteria

Conditions which could affect B2 microglobulin level: altered renal function (creatinine > 88.4 µmol/L); liver disease: chronic active cirrhosis, primary biliary cirrhosis, acute hepatitis. Metastasis or Dukes D cancers. Patients whose CEA had not returned to normal within 3 months of the operation.

Patient characteristics and setting

Participants included (n)

91

Age range

33 - 80

Smoking status

N/R

Site of primary tumour

Colon 65, rectum 26

Stage of primary tumour

Dukes A&B = 50; C = 41

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

No

Recurrences (n)

43

Site of recurrences

12 rectum, 31 colon

Index tests

CEA timing

Every 3 months

CEA technique

Radioimmunoassay (Sorin)

CEA threshold

20 µg/L

Definition of positive

N/R

Which CEA value (s) used?

N/R

Target condition and reference standard(s)

Follow-up schedule

CEA & B2m every 3 months for at least 2 years. Clinical and laboratory monitoring was ensured by the same physician during the first two years post-op in a pre-established protocol with a barium enema and/or an endoscopy during the first two years. CXR and liver USS annually. Further investigations if indicated (CT chest, bone scan)

Flow and timing

Timing of CEA vs reference standard (days)

Yearly CXR and liver USS; enema and/or endoscopy done at least once in the 2 year follow-up

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes  
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?Unclear  
Was the timing between index test(s) and reference standard ascertainable?Unclear  
Did all patients receive a reference standard?No  
    

Wang 1994

Study characteristics
Patient sampling

Country

Study design

Retrospective

Setting

Hospital

Dates of data collection

1981 - 1986

Population (n)

352

Inclusion criteria

Operated for histologically proven colorectal cancer

Exclusion criteria

No preoperative CEA or lost to follow-up, Dukes A, or Dukes D

Participants Included (n)

272

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

Dukes B 160, C 112

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

27

Site of recurrences

N/R

Index tests

CEA timing

Blood samples for CEA measurement were taken a few days before operation and about 1 month after operation and afterward at intervals of 3 - 4 months, combined with physical examination

CEA technique

Radioimmunoassay kit manufactured by Abbott Laboratory (Chicago, IL, USA)

CEA threshold

5 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

Blood samples for CEA measurement were taken a few days before operation and about one month after operation and afterward at intervals of three to four months, combined with physical examination. Other procedures such as colonoscopy, liver sonography, and chest x-ray were performed annually.

Reference standard

In the cases where we suspected recurrence the patient underwent additional abdominal computed tomography, bone scanning, or other diagnostic procedures.

Flow and timing

Timing of CEA vs reference standard (days)

per protocol

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?No  
Did the study avoid inappropriate exclusions?No  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes  
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?No  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No  
Did all patients receive a reference standard?Yes  
    

Wood 1980

Study characteristics
Patient sampling

Country

UK

Study design

Retrospective

Setting

Hospital

Dates of data collection

1974 - 1976

Population (n)

148

Inclusion criteria

Apparently curative surgery for adenocarcinoma of the colon and rectum without evidence of metastatic disease

Exclusion criteria

N/R

Participants Included (n)

148

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Colorectal

Stage of primary tumour

N/R

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

36

Site of recurrences

Local 17, local + liver 2, local + bone 2, local + metachronous primary 1, liver 8, bone 5, lung 2

Index tests

CEA timing

Each follow-up visit; 2 consecutive raised CEA values triggered investigation for recurrence

CEA technique

CEA levels were assayed by a double antibody radioimmunoassay on unextracted serum

CEA threshold

25 µg/L

Definition of positive

2 consecutively elevated values

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

CEA at 3 - 6 month intervals postoperatively for up to 56 months or until death.

Reference standard

If CEA positive then CXR, liver scan, and bone scan. If these are negative, additional BE and/or colonoscopy.

Flow and timing

Timing of CEA vs reference standard (days)

per protocol

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Unclear  
Did the study avoid inappropriate exclusions?Unclear  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes  
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No  
Did all patients receive a reference standard?No  
    

Yakabe 2010

Study characteristics
Patient sampling

Country

Japan

Study design

Prospective

Setting

Hospital

Dates of data collection

1999 - 2003

Population (n)

266

Inclusion criteria

Curative resection for colorectal cancer, TNM stages I - III, postoperative examinations according to the follow-up schedule

Exclusion criteria

Inappropriate follow-up

Participants Included (n)

227

Patient characteristics and setting

Age range

65.2 (± 10.8) years

Smoking status

N/R

Site of primary tumour

Colon 138, rectum 89

Stage of primary tumour

I 34, II 94, III 99

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

62

Site of recurrences

N/R

Index tests

CEA timing

Every 3 months for the first 3 years and every 6 months during years 4 and 5

CEA technique

Latex immunoassay, Mitsubishi Chemical Ltd, Japan

CEA threshold

4.5 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

History was taken and a physical examination and measurement of tumor markers were performed every 3 months for the first 3 years and every 6 months during years 4 and 5. Chest X-ray and abdominal computed tomography (CT) were done every 6 months for 5 years, and colonoscopy was performed at 1 and 3 years after surgery. Patients were observed until 5 years after surgery or until recurrence was confirmed.

Reference standard

Recurrence was confirmed histologically or radiologically

Flow and timing

Timing of CEA vs reference standard (days)

per protocol

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Yes  
Did the study avoid inappropriate exclusions?Unclear  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes  
Were the reference standard results interpreted without knowledge of the results of the index tests?Unclear  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No  
Did all patients receive a reference standard?Yes  
    

Yu 1992

    Abbreviations:

    ACBE: air contrast barium enema
    ALP: alkaline phosphatase
    APCT: abdominopelvic computed tomography
    BE: barium enema
    CT: computed tomography
    CXR: chest X-ray
    DCBE: double contrast barium enema
    DM: diabetes mellitus
    ESR: erythrocyte sedimentation rate
    FOBT: faecal occult blood test
    LDH: lactate dehydrogenase
    LFT: latex fixation test
    MRI: magnetic resonance imaging
    N/R: not reported
    RIA: radioimmunoassay
    SCC: squamous cell carcinoma
    SGOT: serum glutamic oxaloacetic transaminase
    SGPT: serum glutamate pyruvate transaminase
    TNM: primary tumour, regional nodes, metastasis
    TPA: tissue plasminogen activator
    µg/L: micrograms per litre
    USS: ultrasound scan

Study characteristics
Patient sampling

Country

China

Study design

Retrospective observational study

Setting

Teaching hospital in Shanghai

Dates of data collection

May 1988 - March 1990

Population (n)

216

Inclusion criteria

Primary colorectal cancer having curative surgery in the teaching hospital or other hospitals

Exclusion Criteria

N/R

Participants Included (n)

182

Patient characteristics and setting

Age range

N/R

Smoking status

N/R

Site of primary tumour

Rectal cancer 121, colon cancer 95

Stage of primary tumour

Only reported Dukes stage data for the 28 before-surgery cases (Table 1)

Perioperative investigations done to ensure no residual disease

N/R

Chemotherapy/radiotherapy?

N/R

Recurrences (n)

66

Site of recurrences

N/R

Index tests

CEA timing

N/R

CEA technique

RIA

CEA threshold

15 µg/L

Definition of positive

1 elevated value

Which CEA value (s) used?

All

Target condition and reference standard(s)

Follow-up schedule

CEA first measured at 6 weeks after curative surgery; then every 3 months, plus liver ultrasound test and basic health check.

Reference standard

Positive CEA and CA 19-9 trigger ultrasound and CT or colonoscopy

Flow and timing

Timing of CEA vs reference standard (days)

N/R

Comparative 
Notes 
Methodological quality
ItemAuthors' judgementRisk of biasApplicability concerns
DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?Yes  
Did the study avoid inappropriate exclusions?Unclear  
   Low
DOMAIN 2: Index Test All CEA thresholds
If a threshold was used, was it pre-specified?Yes  
Is the same method and instrument used for all CEA measurements?Yes  
Is there an estimation of reproducibility of the method, for example the % coefficient of variation at specific concentrations?No  
Is there an indication of method accuracy, for example, is there evidence of participation in an external quality assessment and proficiency testing scheme?No  
   Low
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?Yes  
Were the reference standard results interpreted without knowledge of the results of the index tests?No  
   Low
DOMAIN 4: Flow and Timing
Did all patients receive the same reference standard?Yes  
Were all patients included in the analysis?Yes  
Was the index test repeated prior to the reference standard?No  
Was the timing between index test(s) and reference standard ascertainable?No  
Did all patients receive a reference standard?No  
    

Characteristics of excluded studies [ordered by study ID]

Study | Reason for exclusion
Afsaneh 2012 | 2 x 2 data not ascertainable
Ahmed 2013 | Only CEA positive
Aitkin 2012 | Only CEA positive
Amin 2012 | 2 x 2 data not ascertainable
Arnaud 1979 | 2 x 2 data not ascertainable
Arnaud 1997 | 2 x 2 data not ascertainable
Arriola 2006 | 2 x 2 data not ascertainable
Auer 1977 | Stomach and colorectal cancer combined
Bakalakos 1999 | Liver metastases only
Barrillari 1996 | Only cases of recurrence
Beatty 1979 | Only cases of recurrence
Beets 1994 | Only cases of recurrence
Bhatavedekar 1992 | Alternative analysis - median CEA
Bivins 1974 | n < 30
Boey 1984 | Alternative analysis - slope
Borie 2004 | Only cases of recurrence
Brummendorf 1985 | 2 x 2 data not ascertainable
Brummendorf 1986 | Alternative analysis - doubling time
Bucci 1994 | 2 x 2 data not ascertainable
Camunas 1991 | Only cases of recurrence
Cangemi 1984 | n < 30
Cangemi 1987 | Case-control study
Carl 1983 | Alternative analysis - slope
Carpelan-Holmström 1996 | Only cases of recurrence
Castells 1998 | 2 x 2 data not ascertainable
Catania 1981 | 2 x 2 data not ascertainable
Chang 2012 | Only cases of recurrence
Chapman 1998 | Pre-operative CEA
Chen 2010 | Only CEA positive cases
Cho 2007 | Pre-operative CEA
Choi 1997 | Only CEA positive cases
Colombo 1986 | 2 x 2 data not ascertainable
Cossu 1984 | Alternative analysis
Dalton 2010 | 2 x 2 data not ascertainable
Dash 2012 | Only CEA negative cases
De Brauw 1987 | 2 x 2 data not ascertainable
De Levin 1982 | n < 30
De Salvo 1997 | Only cases of recurrence
Dhar 1972 | 2 x 2 data not ascertainable
Di Cristofaro 2012 | Alternative analysis - economic
Engarås 2001 | Only cases of recurrence
Farquharson 2012 | Only CEA positive cases
Fernandes 2006 | 2 x 2 data not ascertainable
Filella 1994 | 2 x 2 data not ascertainable
Filiz 2009 | Not follow-up for recurrence - prognostic value of postoperative CEA
Finlay 1983 | Not curative resection
Fiocchi 2011 | Not follow-up for recurrence - includes patients with suspicion of recurrence on CT
Florio 1988 | 2 x 2 data not ascertainable
Fora 2012 | 2 x 2 data not ascertainable
Forones 1997 | Preoperative CEA
Forones 1998 | n < 30
Fortner 1988 | Only cases of recurrence
Fournier 1999 | 2 x 2 data not ascertainable
Fucini 1983 | Duplicated dataset
Fucini 1984 | n < 30
Fucini 1985 | 2 x 2 data not ascertainable
Gail 1981 | Alternative analysis - modelling
Gajdukevich 2010 | Not curative surgery
Gaudagni 1999 | 2 x 2 data not ascertainable
Graham 1998 | Only cases of recurrence
Gray 1981 | Only cases of recurrence
Griesenberg 1999 | Only cases of recurrence
Grossetti 1981 | 2 x 2 data not ascertainable
Grossmann 2007 | 2 x 2 data not ascertainable
Haga 1990 | Only cases of recurrence
Hall 1994 | 2 x 2 data not ascertainable
Hara 2011 | Duplicate dataset
Herrera 1976 | Case-control study
Hida 1996 | 2 x 2 data not ascertainable
Hohenberger 1994 | Only cases of recurrence
Holt 2010 | Only cases of recurrence
Holubec 2000 | 2 x 2 data not ascertainable
Holyoke 1975 | n < 30
Houlbec 2001 | 2 x 2 data not ascertainable
Humphreys 2011 | Only CEA negative cases
Huyghe 1983 | 2 x 2 data not ascertainable
Iarumov 1998 | Unable to locate full text
Indinnimeo 1999 | Unable to locate full text
Ito 2002 | Alternative analysis - doubling time
Jaeger 1975 | Only cases of recurrence
Jiang 19892 x 2 data not ascertainable
Kanellos 2006bNot follow-up - portal CEA sampling
Karesen 19802 x 2 data not ascertainable
Kawamura 2010Only cases of recurrence
Kerr 20122 x 2 data not ascertainable
Khan 20092 x 2 data not ascertainable
Kimura 1986Only cases of recurrence
Kishimoto 20102 x 2 data not ascertainable
Koch 19772 x 2 data not ascertainable
Koch 1979Not follow-up for recurrence - prognostic value of postoperative CEA
Koch 1982Not follow-up for recurrence - prognostic value of postoperative CEA
Koga 1999Alternative analysis - doubling time
Korner 20052 x 2 data not ascertainable
Kumar 2011Only cases of recurrence
Lagache 1980Only cases of recurrence
Lauterbach 19872 x 2 data not ascertainable
Lavin 1981Case-control study
Lechner 20002 x 2 data not ascertainable
Leventakos 2013Only cases of recurrence
Levy 2012Duplicate dataset
Lipska 2007Only cases of recurrence
Lipska 2010Only cases of recurrence
Lorenz 1986Not follow-up for recurrence - prognostic value of postoperative CEA
Lunde 1982Only cases of recurrence
Ma 2006Not follow-up for recurrence - prognostic value of postoperative CEA
Mach 1974Case-control study
Makela 19952 x 2 data not ascertainable
Makis 20132 x 2 data not ascertainable
Mant 2013Duplicate dataset
Martin 1976Only CEA positive case
Martin 1979Only CEA positive case
Martin 1980Only CEA positive case
Marucci 1983Not follow-up for recurrence - prognostic value of postoperative CEA
May 20122 x 2 data not ascertainable
Mazilu 2012Unable to locate full text
McCarthy 19852 x 2 data not ascertainable
Meling 19922 x 2 data not ascertainable
Mentges 19862 x 2 data not ascertainable
Mentges 1988Only cases of recurrence
Metzger 1983Only cases of recurrence
Metzger 1985Only cases of recurrence
Minton 1978aAlternative analysis - nomogram
Minton 1978bOnly cases of recurrence
Minton 1989Alternative analysis - nomogram
Miwa 1980Only cases of recurrence
Moertel 1978Only cases of recurrence
Morelli 1985n<30
Moreno Carretero 19982 x 2 data not ascertainable
Moschl 19802 x 2 data not ascertainable
Nicolini 19952 x 2 data not ascertainable
Nicolini 20052 x 2 data not ascertainable
Nicolini 2010Only cases of recurrence
Northover 19852 x 2 data not ascertainable
Northover 1986Review article
Northover 2003Review article
Novis 1986Only cases of recurrence
Nowacki 19832 x 2 data not ascertainable
Ntinas 20042 x 2 data not ascertainable
O'Dwyer 19872 x 2 data not ascertainable
O'Dwyer 1988Only CEA positive cases
Obradovic 20112 x 2 data not ascertainable
Odariuk 1989Only CEA positive cases
Ovaska 1989Only cases of recurrence
Ozhiganov 1986Unable to translate
Ozkan 2012a2 x 2 data not ascertainable
Ozkan 2012b2 x 2 data not ascertainable
Park 2012Only cases of recurrence
Park 20132 x 2 data not ascertainable
Pecorella 19962 x 2 data not ascertainable
Peethambaram 19972 x 2 data not ascertainable
Pereira 2004Unable to locate
Persijin 19812 x 2 data not ascertainable
Pfeiffer 19792 x 2 data not ascertainable
Philips 19842 x 2 data not ascertainable
Pietra 19982 x 2 data not ascertainable
Plebani 19962 x 2 data not ascertainable
Pompecki 1980n < 30
Pribelsky 2002Only cases of recurrence
Primrose 2011Duplicate dataset
Primrose 20142 x 2 data not ascertainable
Quentmeier 1990Only cases of recurrence
Reddy 2013Only cases of recurrence
Revetria 1989Case-control stu dy
Rezamansourian 2011Review article
Rieger 1975Only cases of recurrence
Rockall 19992 x 2 data not ascertainable
Rocklin 1990Only cases of recurrence
Rocklin 19912 x 2 data not ascertainable
Rodriguez-Moranta 2006aOnly cases of recurrence
Rognum 1986Only cases of recurrence
Sagar 19892 x 2 data not ascertainable
Sandelewski 2005Only cases of recurrence
Sanli 2012Only CEA positive cases
Sardi 1989Only cases of recurrence
Sarikaya 2007Only CEA negative cases
Secco 19892 x 2 data not ascertainable
Secco 2000Only cases of recurrence
Segol 1977Not follow-up for recurrence - prognostic value of postoperative CEA
Shirley 20122 x 2 data not ascertainable
Simo 2002Only CEA positive cases
Sirisriro 1996Only CEA positive cases
Song 2010Alternative analysis - CEA trend
Sorensen 2010Only CEA positive cases
Staab 1985aAlternative analysis - slope
Staab 1985bAlternative analysis - slope
Stautner-Brückmann 1990Only cases of recurrence
Steele 1980Only CEA positive cases
Stuckle 20002 x 2 data not ascertainable
Su 2012Only cases of recurrence
Sugarbaker 1976Only CEA positive cases
Szymendera 1982 aOnly cases of recurrence
Szymendera 1982 b2 x 2 data not ascertainable
Szymendera 19852 x 2 data not ascertainable
Takashima 1982Only cases of recurrence
Tomoda 1981Non-curative surgery
Tsai 2009Only cases of recurrence
Tsikitis 2009Only cases of recurrence
Verberne 2013 aLiver metastases only
Verberne 2013 bLiver metastases only
Wan 19942 x 2 data not ascertainable
Wanebo 1978aOnly cases of recurrence
Wanebo 1978bOnly cases of recurrence
Wang 2007Not follow-up for recurrence - prognostic value of postoperative CEA
Wang 20102 x 2 data not ascertainable
Wedell 1981Only cases of recurrence
Weiss 19982 x 2 data not ascertainable
Wichmann 2000aOnly cases of recurrence
Wichmann 2000bPreoperative CEA
Wichmann 2002Preoperative CEA
Wolf 1997Only cases of recurrence
Wood 1975Unable to locate
Yu 2013Only cases of recurrence
Zeng 1993Only cases of recurrence
Zervos 20012 x 2 data not ascertainable
Ziegenbein 1980Alternative analysis - trend
Zuniga 19892 x 2 data not ascertainable