New stool tests for colorectal cancer screening: A systematic review focusing on performance characteristics and practicalness

Abstract

New stool tests may be promising tools for future colorectal cancer (CRC) screening. The aim of this review was to summarize current evidence of performance characteristics and practicalness in a population-based screening setting of recently developed stool tests. The MEDLINE database was searched for relevant articles published until July 2004. Studies were included if they comprised more than 10 cases and more than 10 controls. Details on study population, performance characteristics and stool collection procedure were taken into account. Overall, 29 studies, mostly retrospective, were included, investigating 17 different stool markers or marker combinations. Underlying study populations were very heterogeneous and mostly very small. Half of the studies reported sensitivity for adenomas in addition to sensitivity for CRC, and fewer than half reported sensitivity by tumor stage or location. Performance characteristics of stool tests varied widely. For most DNA-based markers, specificity was about 95% or higher, but sensitivity was mostly low even for invasive CRC. More studies with larger sample sizes were done for protein-based markers, which typically had lower specificity. In most studies, stool samples were frozen within a rather short time period after defecation. While promising performance characteristics have been reported for some tests, more persuasive evidence from larger, prospectively designed studies, which also consider aspects of practicalness, e.g., the possibility of mailing the samples, is needed. © 2005 Wiley-Liss, Inc.

With more than 900,000 new cases and about 500,000 deaths per year, colorectal cancer (CRC) is the third most common malignancy in the world.1 Due to its slow development from endoscopically removable precancerous lesions and from surgically curable early stages, screening provides the opportunity to reduce both morbidity and mortality of the disease. To realize these goals, acceptable and practical screening methods are needed. In contrast to invasive screening methods, stool testing meets crucial criteria of acceptability and practicalness. However, despite its proven efficacy for reducing CRC incidence and mortality,2, 3 the most widely used stool test, fecal occult blood testing (FOBT), has important limitations given its low sensitivity to detect precancerous or cancerous lesions.

Deeper insights into molecular changes during tumorigenesis have promoted the development of new stool tests intended to detect exfoliated neoplastic colon cells or cell products in stool. Since Sidransky et al.4 first demonstrated the feasibility of detecting mutations in the proto-oncogene K-ras in stool of CRC patients, various new stool markers, both DNA-based and protein-based, have been investigated. However, the literature published in this field is rather heterogeneous, and the main focus is often on technical challenges.

The purpose of this review was to summarize pertinent studies in order to describe the current evidence for new stool tests aimed at detecting colorectal neoplasms under screening conditions. Special attention was therefore given to qualities of prime importance for a successful, population-based screening setting which might minimize the burden of CRC.

Material and methods

Relevant studies were identified by searching the MEDLINE database for articles published until July 2004, using various combinations of the terms colorectal neoplasms, colorectal cancer, marker, biomarker, feces, stool, test, molecular, DNA, proteins, diagnosis, mass screening and screening. In addition, articles from the bibliographies of material identified by the MEDLINE search were reviewed.

The search was limited to studies on humans published in English. Only full-text articles were included because abstracts did not provide enough information for a more detailed evaluation. Another inclusion criterion was the examination of both cases and controls in the same study, allowing determination of both sensitivity and specificity. To roughly distinguish between mere proof-of-principle studies with very small sample sizes and studies providing estimates of performance characteristics with some minimum precision, only studies with more than 10 cases and more than 10 controls were included.

All studies meeting the inclusion criteria mentioned above were divided into different groups according to the type of marker(s) under examination. Whenever information was available, details regarding the study population, test performance characteristics and collection of stool samples were extracted from the articles. A general description of the study population comprised the number of cases and controls and their mean or median age. Cases were separated into CRC patients and people bearing adenomas. Stage and location of neoplastic lesions were taken into account, and if possible, tumor stages were uniformly classified according to Dukes' classification in order to simplify the comparison. Unless otherwise noted, the control group consisted of subjects who were endoscopically proven free of CRC/adenomas. Information on health status or the presence of symptoms was collected. Regarding performance characteristics of the new stool tests, sensitivity for CRC and adenomas and specificity were listed as reported in the article. In addition, 95% confidence intervals for all test performance characteristics were calculated based on the exact binomial distribution. Besides overall sensitivity for CRC, sensitivity stratified by tumor stage and by location was considered wherever possible.
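The exact binomial (Clopper-Pearson) interval mentioned above can be sketched as follows. This is a minimal illustration and not the software used for this review; the function names are hypothetical, and the bounds are found by simple bisection using only the Python standard library. The example counts (10 positive tests among 13 cases) are an assumption chosen to match a sensitivity of about 77% in a 13-case study.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f: Callable[[float], float], target: float,
                       iterations: int = 100) -> float:
    """Find p in [0, 1] with f(p) = target, assuming f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_binomial_ci(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Two-sided exact (Clopper-Pearson) interval for x successes in n trials."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("require n > 0 and 0 <= x <= n")
    # Lower bound: the p at which P(X >= x) equals alpha / 2 (0 if x == 0).
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2.0)
    # Upper bound: the p at which P(X <= x) equals alpha / 2 (1 if x == n).
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: 1.0 - binom_cdf(x, n, p), 1.0 - alpha / 2.0)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 10 positive tests among 13 cases.
    low, high = exact_binomial_ci(10, 13)
    print(f"sensitivity {10 / 13:.0%}, 95% CI {low:.0%} to {high:.0%}")
```

With small groups the interval is wide, which is why point estimates from studies with few cases or controls should be read together with their confidence limits.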

Moreover, information relevant to practicalness and suitability for large-scale application was taken into account, e.g., time limits between defecation and freezing or processing of samples and further details regarding the stool collection procedure. We also assessed potential sources of bias, including whether sampling among cases and controls was comparable, whether analyses were performed in a blinded fashion and whether bowel irritation due to recent endoscopy might have influenced the results.

Results

Overall, the initial search identified 56 potentially relevant studies. Of these, 29 studies investigating 17 different markers or marker combinations met the inclusion criteria. The remaining 27 articles were excluded for the following reasons: no controls (10 studies), sensitivity and/or specificity not reported (8 studies), 10 or fewer cases and/or controls (6 studies), article not in English (3 studies). A list containing the references for all excluded articles is available from the first author on request. Among all included studies, the potential of detecting CRC was investigated in 28 studies and the potential of detecting adenomas was investigated in 15. Information to assess stage-dependent sensitivity for CRC was available in 13 studies, and information to assess sensitivity by location was available in 9 studies.

Table I summarizes information from studies evaluating single DNA-based stool markers, which are subdivided into markers based on genetic mutations, markers based on epigenetic alterations and markers based on DNA integrity or DNA quantity.

Table I. Single DNA-Based Stool Markers
  CI, confidence interval; col., colonoscopy; quantity, quantity of native stool for further processing; SDNAI, stool DNA index; T, T categories of colorectal cancer (N and M stage n.r.); y, years; n.r., not reported.

Four studies reported performance characteristics for stool testing based on the detection of mutations in the proto-oncogene K-ras.5, 6, 7, 8 Sensitivity of this marker ranged from 40% to 56% for CRC, disregarding a study with only 5 CRC cases. Even though the number of cases per group was small, data provided by Ratto et al.5 showed a clear increase of sensitivity with tumor stage: sensitivity was 0%, 30% and 78% for Dukes' A, B and C, respectively.5 Sensitivity for adenomas, which was investigated in 3 studies, was about 30% and tended to be lower for small adenomas (<1 cm diameter) than for larger adenomas.6, 7, 8

Traverso et al.9, 10 analyzed 2 further markers related to the genetic pathway during the adenoma–carcinoma sequence. Analysis of mutations in the tumor-suppressor gene APC yielded sensitivity of 61% (41–79%) and 50% (26–74%) for CRC (Dukes' B2) and adenoma (≥1 cm diameter), respectively.9 BAT26, a marker of microsatellite instability, allowed detection of 37% (23–52%) of CRCs (all proximal), while none of the 69 people bearing adenomas had a positive test result.10 For all stool markers in this group, specificity was 95–100%.

A recently published study was the first to evaluate the suitability of epigenetic markers for stool testing. In this proof-of-principle study, which enrolled 13 cases and 13 controls, analysis of epigenetic alterations in the SFRP2 gene yielded both a sensitivity and a specificity of 77% (46–95%).11

The potential of simply using information on DNA quantity and DNA integrity for the detection of CRC was investigated by Loktionov et al.12 and by Boynton et al.,13 respectively. Both markers may be related to the exfoliation of nonapoptotic neoplastic colonocytes into stool, showing decreased DNA degradation. While analysis of DNA integrity resulted in a higher specificity, sensitivity was higher for DNA quantity.12, 13 To analyze the latter, stool was processed within 4 hr. After DNA quantity was observed to increase with age, cases and controls were matched by age retrospectively.12

Table II gives an overview of studies evaluating the combination of several DNA-based stool markers. Combining the detection of mutations in the genes p53 and APC, Koshiji et al.14 found a sensitivity of 88% (74–96%) for CRC and a specificity of 100% (78–100%). Ahlquist et al.,15 Calistri et al.16 and Tagore et al.17 analyzed marker combinations comprising mutations in the genes K-ras, p53 and APC, one or more markers of microsatellite instability and long DNA (l-DNA), a marker of DNA integrity based on the detection of DNA sequences longer than 200 bp. The high sensitivity of 91% (71–99%) for CRC and of 82% (48–98%) for adenomas reported by Ahlquist et al.15 in 2000 was not reproduced by more recent studies. Sensitivity for CRC in those studies was only about 60%.16, 17 In all 3 studies, l-DNA had a substantial impact on sensitivity.

Table II. Several DNA-Based Stool Markers Combined
  CI, confidence interval; col., colonoscopy; DIA, DNA integrity analysis; HGD, high-grade dysplasia; l-DNA, long DNA; LOH, loss of heterozygosity; MSI, microsatellite instability; n.r., not reported; o.a.a., other advanced adenomas; quantity, quantity of native stool for further processing; y, years.

Table III shows studies evaluating stool tests based on the detection of various proteins in stool and one further stool test. Two studies investigated the potential of determining concentrations of decay-accelerating factor (DAF), a membrane-bound glycoprotein regulating activation of the autologous complement cascade.19, 20 The enhanced expression of DAF in CRC may be related to growth factor stimulation.20 Quantitative measurement was performed by means of an ELISA. Testing of multiple samples in the second study increased sensitivity for CRC to 72% (62–81%),20 while specificity in both studies was about 90%.19, 20

Table III. Protein-Based Stool Markers and Other Stool Markers
  CA, cholic acid; CEA, carcinoembryonic antigen; CI, confidence interval; col., colonoscopy; DCA, deoxycholic acid; high-risk, high-risk adenoma (≥1 cm and/or severe dysplasia and/or villous components); low-risk, low-risk adenoma (not fulfilling the criteria for high-risk adenoma, see high-risk); n.r., not reported; quantity, quantity of native stool for further processing; STn, sialylated-Tn oligosaccharide; Tis, carcinoma in situ; Tumor M2-PK, Tumor M2 pyruvate kinase; y, years.

Seven studies, mainly conducted among high-risk populations, were included, reporting performance characteristics for calprotectin, a calcium-binding protein found primarily in granulocytes, macrophages and epithelial cells.21, 22, 23, 24, 25, 26, 27 Fecal concentrations of calprotectin were also measured by ELISA. Sensitivity ranged from 63% to 90% for CRC and from 26% to 80% for adenomas, while specificity was lower, ranging from 47% to 76%. Although calprotectin was described to be stable in feces for several days at room temperature,21 in most studies stool samples were frozen within a short time period.

Within the group of protein-based markers, the best performance characteristics were reported for the detection of minichromosome maintenance protein 2 (MCM2). In a study with 40 cases and 25 controls, sensitivity was 93% (80–98%) and specificity was 100% (86–100%).30 The time limit between defecation and sample processing in this study was 8 hr.

Regarding sensitivity by location, differences were either negligible9, 19, 20, 25 or, for some markers, sensitivity tended to be lower for lesions proximal to the splenic flexure.11, 16, 29, 30 However, the number of cases per group was mostly rather small.

Discussion

Since it has become obvious that, even in ideal circumstances, screening with FOBT would decrease CRC morbidity and mortality only by a moderate amount, much research is being done to develop novel stool tests aimed at realizing the high potential of stool testing for CRC. In this review, a variety of new stool tests proposed for the early detection of colorectal neoplasms were included. These tests are based on markers that cover a wide spectrum of biologic targets, e.g., specific DNA mutations, various proteins and recently also epigenetic alterations. Some initial results regarding performance characteristics have been encouraging. However, single estimates of sensitivity and specificity, often obtained in small studies, do not provide sufficient information to assess the suitability of a stool test for mass screening. To provide a differentiated summary of evidence for new stool tests and their potential for CRC screening, this review focuses not only on test performance characteristics but also on aspects of study design, precautions against potential bias and practicalness.

The different stool tests under investigation yielded a broad variety of estimates for sensitivity and specificity. It appears reasonable to compare performance characteristics of new tests to the existing stool test, FOBT, which shows a sensitivity of 30–50% for CRC and a specificity of 84–96% (depending on rehydration of slides).34 Compared to these values, sensitivities reported for new tests are mostly higher, whereas for specificity there is no clear trend. In part, the specificity of new tests appears to reflect the degree of association between the neoplastic process and the stool marker. For example, for most tests based on detection of genetic mutations occurring during tumorigenesis, specificities of 95% and higher are reported, while calprotectin, representing a marker of inflammation rather than a tumor-derived marker,35 showed false-positive rates of about 30%. However, direct comparison of performance characteristics is limited since most point estimates of sensitivity and specificity for new tests lack precision as they are based on studies with rather small sample sizes. In addition, the underlying study populations differ to a large extent, mainly regarding the presence of symptoms and the stage of disease.

The above-mentioned performance characteristics for both FOBT and new stool tests refer to one-time application. Repeated testing within a certain screening program would result in a higher sensitivity. At the same time, however, the overall rate of false-positive results would increase as well.
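The trade-off described above can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that test results in successive screening rounds are statistically independent; this assumption is not made in the studies reviewed and is unlikely to hold exactly in practice, since a lesion missed once may tend to be missed again. The function name and the input values (50% sensitivity, 95% specificity, 3 rounds) are hypothetical.

```python
from typing import Tuple


def cumulative_performance(sensitivity: float, specificity: float,
                           rounds: int) -> Tuple[float, float]:
    """Return (cumulative sensitivity, cumulative false-positive rate).

    Assumes independent results across rounds (an illustrative assumption):
    a case is detected if at least one round is positive, and a disease-free
    person is a false positive if at least one round is positive.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    cumulative_sensitivity = 1.0 - (1.0 - sensitivity) ** rounds
    cumulative_false_positive = 1.0 - specificity ** rounds
    return cumulative_sensitivity, cumulative_false_positive


if __name__ == "__main__":
    # Hypothetical one-time performance: 50% sensitivity, 95% specificity.
    for n in (1, 2, 3):
        sens, fp = cumulative_performance(0.50, 0.95, n)
        print(f"{n} round(s): sensitivity {sens:.1%}, false-positive rate {fp:.1%}")
```

Under these assumptions, three rounds raise sensitivity from 50% to 87.5%, while the proportion of disease-free subjects with at least one positive result rises from 5% to about 14%.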

Regarding practicalness in a screening setting, the simplest method of stool testing would be to collect one sample from one stool, which then is sent by mail to the laboratory for further analysis. For FOBT, sampling is more complicated as multiple samples from consecutive stools need to be analyzed and dietary and medication restrictions are required for the widely used guaiac-based FOBT (dietary restrictions are not required for the immunochemical FOBT). By contrast, practicalness of new stool tests appears to be more limited by stability issues. In many studies, stool samples were frozen at –80°C either immediately or within a very short time period, which could hardly be realized under screening conditions. It was rarely discussed whether this type of handling of samples was essential for analysis. Practicalness could also be affected by the quantity of native stool required for further processing, which ranged from 50 mg to the whole stool.

Certain aspects of practicalness, e.g., the sampling procedure, might be of importance concerning compliance with screening. It remains to be clarified whether the chief barrier to participation in screening is inherently related to the collection of stool samples or whether there are differences by sampling modality.

In this review, novel stool tests were classified according to the type of underlying marker(s), either DNA-based or protein-based. This classification partly reflects the complexity of analysis. Regarding DNA-based markers, sample preparation, which requires extraction and amplification of stool DNA, is time-consuming, and the detection of specific mutations is even more labor-intensive. Although technical advances are expected to streamline current methodology, the costs for these tests may still be relatively high. According to a cost–effectiveness analysis, fecal DNA testing is inferior to conventional CRC screening methods.36 By contrast, for many, but not all, protein-based markers, analysis including sample preparation is rather simple. Any protein-based marker that is reliable, stable and analyzable by a simple method, e.g., ELISA, could therefore have high potential to become suitable for mass screening with regard to large-scale application and related costs.

So far, most studies evaluating new stool tests have been based on a retrospective comparison of cases and controls. The value of such studies in allowing preselection of promising candidates is undisputed. However, they entail the problem that it often remains unclear whether samples from both groups were handled similarly. Only for 2 stool tests have larger prospective studies been carried out, and in both cases, performance characteristics were lower compared to those found in retrospective studies.26, 37

For some stool markers, studies have shown a much lower sensitivity for early stages compared to more advanced stages. This finding underlines the importance of taking into account stage-dependent sensitivity. Further studies should particularly focus on the detection of early and curable cancer stages, which is crucial to the benefit of screening. Similarly, it would be worthwhile to pay more attention to the potential of a stool test to detect precancerous lesions, allowing reduction of incidence by preventing cancer.

Another important aspect for future studies is the sensitivity by location, which is essential to decide whether a marker should only be used in combination with other stool markers or other screening methods. For example, degradation of certain markers during colon passage could result in a higher detection rate for distal lesions, as was observed for the marker K-ras.38 However, BAT26, a marker of microsatellite instability, appears to be more frequent in proximal lesions.15

Once certain stool tests prove suitable in pilot studies in consideration of the qualities mentioned above, their actual potential for CRC screening ought to be evaluated by prospective studies among subjects reflecting the population eligible for screening. Large sample sizes are required to ensure a precise statistical assessment of performance characteristics. Ideally, such a large prospective study would be designed to test various stool tests, including FOBT, in parallel. On the one hand, this study design would ensure the highest comparability. On the other hand, it would allow assessment of the potential of combining completely different stool markers in one test.

Information provided in this review may be limited due to incomplete reporting of data in the original articles. For example, for several stool tests, practicalness could not be assessed as details regarding sample collection were not mentioned. Furthermore, this review represents only a descriptive summary of studies. Results from several studies were not combined to generate pooled overall results owing to the limited number of studies per marker and to the heterogeneity of underlying study populations. In this review, the biologic background of the various markers is explained concisely. More detailed information for single markers is available from the original articles, while the basis of genetic markers is introduced elsewhere.39

In conclusion, during recent years much effort has been put into the development of new stool tests aimed at improving current stool testing for CRC with FOBT. According to early results, several tests appear promising with respect to overall performance characteristics. It will be of particular importance for further studies to consider sensitivity by stage and by location, as well as sensitivity for adenomas, and to ensure that handling of stool samples meets conditions practical in mass screening. Taking into account these crucial aspects at the very beginning may allow us to focus future research and large-scale evaluation on stool tests that actually have high potential for a successful population-based screening setting.
