Automated identification of circulating tumor cells by image cytometry

Authors

  • Tycho M. Scholtens,

    1. Faculty of Science and Technology, Department of Medical Cell BioPhysics, MIRA Research Institute, University of Twente, Enschede, The Netherlands
  • Frederik Schreuder,

    1. Faculty of Science and Technology, Department of Medical Cell BioPhysics, MIRA Research Institute, University of Twente, Enschede, The Netherlands
  • Sjoerd T. Ligthart,

    1. Faculty of Science and Technology, Department of Medical Cell BioPhysics, MIRA Research Institute, University of Twente, Enschede, The Netherlands
  • Joost F. Swennenhuis,

    1. Faculty of Science and Technology, Department of Medical Cell BioPhysics, MIRA Research Institute, University of Twente, Enschede, The Netherlands
  • Jan Greve,

    1. Faculty of Science and Technology, Department of Medical Cell BioPhysics, MIRA Research Institute, University of Twente, Enschede, The Netherlands
  • Leon W. M. M. Terstappen

    Corresponding author
    1. Faculty of Science and Technology, Department of Medical Cell BioPhysics, MIRA Research Institute, University of Twente, Enschede, The Netherlands
    • Faculty of Sciences and Technology, Medical Cell BioPhysics, University of Twente, P.O. Box 217, AE Enschede 7500, The Netherlands

Abstract

Presence of circulating tumor cells (CTC), as detected by the CellSearch® System, in patients with metastatic carcinomas is associated with poor survival prospects. CellTracks TDI, a dedicated image cytometer, was developed to improve the enumeration of these rare CTC. The CellSearch System was used to enumerate CTC in 7.5 mL blood of 68 patients with cancer and 9 healthy controls. Cartridges containing the fluorescently labeled CTC from this system were reanalyzed using the image cytometer, which acquires images with a TDI camera using a 40×/0.6 NA objective and lasers as light source. Automated classification of events was performed by the Random Forest method using Matlab. An automated classifier was developed to classify events into CTC, apoptotic CTC, CTC debris, leukocytes, and debris not related to CTC. A high agreement in classification was obtained between the automated classifier and five expert reviewers. Comparison of images from the same events in CellTracks TDI and CellTracks Analyzer II shows improved resolution in fluorescence images and improved classification by adding bright-field images. Improved detection efficiency for CD45-APC avoids the classification of leukocytes nonspecifically binding to cytokeratin as CTC. The correlation between number of CTC detected in CellTracks TDI and CellTracks Analyzer II is good with a slope of 1.88 and a correlation coefficient of 0.87. Automated classification of events by CellTracks TDI eliminates the operator error in classification of events as CTC and permits quantitative assessment of parameters. The clinical relevance of various CTC definitions can now be investigated. © 2011 International Society for Advancement of Cytometry

Circulating tumor cells (CTC) may be present in the peripheral blood of patients with cancer and their presence is associated with a reduced probability of survival (1–6). Reports on the number of tumor cells present in peripheral blood vary greatly between studies (7–14). This variation can be mainly attributed to the differences in sample preparation and analysis techniques. To minimize variability in rare cell analysis, standardized and automated sample preparation is essential. In the analysis of the samples, the difficulty is to set criteria by which an event can be classified as a tumor cell and which can be applied across samples, instruments, and operators. In this study, CTC were immunomagnetically enriched for expression of Epithelial Cell Adhesion Molecule (EpCAM) from 7.5 mL blood samples and fluorescently labeled with CD45-APC, Cytokeratin-PE (CK-PE), and DAPI with an automated sample preparation system (14, 15). Analysis of the samples is performed by the CellTracks Analyzer II, a semiautomated fluorescence microscope that presents images of events staining with CK-PE and DAPI to the reviewer for CTC classification (14, 15). The heterogeneity in the morphology of the CTC candidates, mainly caused by a variable degree of apoptosis, makes their classification difficult (13, 16, 17). In this study, we introduce an image cytometer that uses lasers and a Time Delay Integration (TDI) camera to achieve automated classification of CTC (18, 19).

MATERIALS AND METHODS

Patients and Controls

CTC were enumerated in 31 blood samples from patients with primary breast or colorectal cancer, 37 samples from patients with metastatic carcinomas, and 9 samples from healthy donors. All patients provided informed consent and the use of samples was approved by the ethics board of Medisch Spectrum Twente, Enschede, the Netherlands.

Sample Preparation

Blood was drawn in 10 mL CellSave Preservative evacuated blood collection tubes (Veridex LLC, Raritan, NJ). Samples were kept at room temperature and processed within 72 h. 7.5 mL aliquots of blood were centrifuged and placed in a CellTracks® AutoPrep® System. This system uses EpCAM antibody labeled ferrofluids to immunomagnetically enrich cells of epithelial origin. The magnetically enriched cells are fluorescently labeled with the nucleic acid dye DAPI, Phycoerythrin-labeled monoclonal antibodies directed against Cytokeratin 8, 18, and 19 (CK-PE), and Allophycocyanin-labeled monoclonal antibodies against CD45 (CD45-APC). After incubation, the sample is transferred into a CellTracks analysis cartridge, which is present inside the MagNest® cell presentation device. This device presents the magnetically labeled cells at the upper surface of the sample cartridge (20).

CellTracks Analyzer II

The CellTracks Analyzer II is a four-color, semiautomated fluorescence microscope (Veridex LLC, Raritan, NJ). The analyzer uses a 10×/0.45 NA microscope objective in combination with a CCD camera to acquire fluorescence images covering the entire surface of the sample chamber. After image analysis, a gallery of objects stained positive for both CK-PE and DAPI is shown to the operator. A trained operator makes the final decision whether the presented image classifies as a CTC. The CTC criteria require the event to have a diameter of at least 4 μm, round to oval morphology, positive staining for CK-PE, a clearly defined nucleus and negative staining for CD45-APC (14). CD45-APC is added to the assay to avoid classification of leukocytes nonspecifically staining with CK-PE as CTC.

CellTracks TDI Analyzer

After the initial scan with the CellTracks Analyzer II, the analysis cartridge is kept inside the MagNest and the sample is rescanned the same day using the CellTracks TDI system (18, 19). To increase the resolution and sampling density, the CellTracks TDI uses a 40×/0.6 NA objective instead of the 10×/0.45 NA objective used in the CellTracks Analyzer II. TDI image acquisition is used to obtain fluorescence images while the cartridge is scanned through the imaging plane, which minimizes image acquisition time. TDI image acquisition is also used in cytometry by the ImageStream system (21). Epifluorescence illumination is performed using three laser lines with wavelengths of 375, 532, and 639 nm. Beam homogenizing optics (Suss-MicroOptics, Neuchatel, Switzerland) are used to create a square, homogeneous illumination profile of 180 × 180 μm². Bright-field images are obtained by backlight illumination using a blue LED placed below the analysis cartridge. For each sample, 12-bit raw images of the whole cartridge surface are acquired for DAPI, CK-PE, CD45-APC, and bright-field. After image acquisition, the initial 16.1 GB of image data per sample is processed by an ImageJ script (22, 23). This script corrects for inhomogeneous illumination and subtracts the background from all images. The signal level of the CK-PE images is used as a threshold: image locations that have a CK-PE pixel intensity of at least 50 above the noise level and that are larger than 9 μm² are marked as possible locations for a CTC. Next, at each marked location an area of 30 × 30 μm² is saved for DAPI, CK-PE, CD45-APC, and bright-field. This smaller data set, with a size of ∼172 Mb for 1,000 events, is processed further and the remaining image data is discarded. The script then determines a region of interest (ROI) for each CK-PE event by means of an Otsu threshold algorithm (24). This ROI is used as a measure of the object size and serves as a mask to determine quantitative parameters for an event. The corresponding ROI for the nucleus stained with DAPI is obtained with the same Otsu threshold algorithm, with the restriction that the center of this ROI lies within the area of the CK-PE ROI. The center of the nucleus has to be located within a radius of 10 μm of the center of the CK-PE ROI, and its size is limited to 1.5 times the size of the CK-PE ROI. If no object is found within these restrictions, the same ROI is used for DAPI as for the CK-PE image. The same restrictions hold for the CD45-APC images. These restrictions ensure the presence of an ROI for each object and prevent the selection of ROIs for different colors that do not belong to the same event. The size restriction with respect to the CK-PE ROI is necessary to prevent selection of the whole 30 × 30 μm² image if no objects are found. For each event, using the ROIs for DAPI, CK-PE, and CD45-APC, several parameters are calculated in the following categories: quantitative, correlation, texture, and morphological. These parameters are then used in a classification algorithm to obtain automated event classification.
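The ROI step described above can be illustrated with a small sketch of Otsu thresholding. This is not the authors' ImageJ script: the function names `otsu_threshold` and `roi_mask` and the toy 4 × 4 crop are hypothetical, and the sketch only shows how a threshold that maximizes the between-class variance separates a bright CK-PE object from its background.

```python
from collections import Counter


def otsu_threshold(pixels):
    """Return the intensity t that maximizes the between-class variance.

    Pixels with value <= t count as background, pixels > t as foreground.
    """
    hist = Counter(pixels)
    levels = sorted(hist)
    total = len(pixels)
    total_sum = sum(level * count for level, count in hist.items())

    best_t, best_var = levels[0], -1.0
    n_bg = 0       # number of background pixels so far
    sum_bg = 0.0   # summed intensity of those background pixels
    for level in levels[:-1]:  # the last level would leave no foreground
        n_bg += hist[level]
        sum_bg += level * hist[level]
        n_fg = total - n_bg
        mean_bg = sum_bg / n_bg
        mean_fg = (total_sum - sum_bg) / n_fg
        between_var = n_bg * n_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, level
    return best_t


def roi_mask(image, threshold):
    """Binary mask (list of rows) of the pixels brighter than the threshold."""
    return [[value > threshold for value in row] for row in image]


# Hypothetical background-subtracted crop: dim background, bright 2 x 2 object.
image = [
    [10, 12, 11, 10],
    [11, 900, 950, 12],
    [10, 920, 940, 11],
    [12, 10, 11, 10],
]
threshold = otsu_threshold([v for row in image for v in row])
mask = roi_mask(image, threshold)
roi_area_px = sum(sum(row) for row in mask)  # ROI size in pixels
print(threshold, roi_area_px)
```

The resulting mask plays the role of the CK-PE ROI: its pixel count is a size measure, and it can be reused to sum intensities in the other channels. The additional constraints on the DAPI and CD45-APC ROIs (10 μm center distance, 1.5 times the CK-PE ROI size) are not modeled here.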

Automated Event Classification

The basis for automated classification is the availability of a set of manually classified events that is divided into a training set and a test set. We used an expert reviewer to classify events in the training and test sets into five different classes. These are defined as (1) CTC; DAPI positive, CK-PE positive, CD45-APC negative events with a round to oval nucleus, at least a large part of the cell lined with CK-PE and a nucleus inside the cytoskeleton; (2) Apoptotic CTC; events with a DAPI fluorescence intensity that is significantly lower than that of CTC and CK-PE staining with a dotted pattern; (3) CTC debris; consisting of small events, less than 4 μm in diameter, CK-PE positive, but DAPI negative and CD45-APC negative; (4) Leukocytes; DAPI positive, CK-PE dim, CD45-APC positive events with round to oval morphology and a nucleus inside the CD45-APC pattern; (5) Debris; representing anything other than cells of interest, usually aggregates of staining reagents, debris in the sample, or destroyed cells.

The automated classifier was developed in Matlab 2009a (Mathworks, Natick, MA). For classification of events we made use of the Random Forest (RF) method (25–29). This classification method has been used previously to classify hyper-spectral (30), bright-field (31), SPECT (32), and fluorescence images (33, 34). The RF method is an ensemble classifier that uses multiple classification trees for decision making. Each tree uses a random subset of the parameters and events present in the entire training set. At each node in the decision tree, a best split is made based on a single parameter. The tree is fully grown until all the parameters are used to form nodes in the tree. Events that were not used to grow the tree are run down the tree to determine the “out-of-bag” (OOB) error rate. This serves as a method for internal cross validation. Since RF classification is an ensemble classifier, multiple trees are grown and each tree votes for a specific class for each event. The total RF classification method uses many trees to achieve balanced voting. In this classifier we used 5,000 trees. This number generated a vote distribution that did not significantly change with the addition of more trees to the forest. This results in a majority vote for each event that is classified by the RF method, which determines its final class. The RF method is versatile as it accepts both categorical and continuous parameters. Also, internal cross-validation is quite accurate in predicting the accuracy of the final classifier, which is defined as the number of correctly determined events, in a single class or in all classes, by the RF classifier divided by the total number of events in the same class. Furthermore, variables like the importance of separate parameters and a measure to determine outliers in the data can give more insight in the data set. To determine the importance of a parameter, its values are randomized and the effect on classification is determined. 
Parameters that have a large effect on the classification accuracy are said to have high importance. Conversely, parameters that have a low importance might be removed from the parameter set without significantly influencing the classification accuracy. The outlier measure is a value that indicates how many standard deviations an event is from the center of its population. Events that are, for example, six or more standard deviations from the median of the population can be examined further to check the correctness of the manual classification. The final RF classifier consists of many trees that all vote for a specific class for each event. A parameter, termed margin, can be constructed that quantifies the “likelihood” of each class for a specific event. It is defined as the fraction of votes for the majority class minus the fraction of votes for the second largest class. Large values denote that the classifier is relatively sure about the classification, although the margin does not take into account whether the classification is correct, i.e., the same as the manual classification by the expert reviewer.
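The majority vote and the margin defined above can be written out in a few lines. The sketch below is illustrative only and is not the Matlab classifier used in the study: the helper names are hypothetical, and the 100-tree vote list is made up so that 81% of the votes go to CTC and 13% to apoptotic CTC, echoing the Row A example discussed with Figure 2.

```python
from collections import Counter

CLASSES = ["CTC", "apoptotic CTC", "CTC debris", "leukocyte", "debris"]


def vote_fractions(tree_votes):
    """Fraction of trees voting for each class (0 for classes without votes)."""
    counts = Counter(tree_votes)
    n_trees = len(tree_votes)
    return {cls: counts.get(cls, 0) / n_trees for cls in CLASSES}


def classify_with_margin(fractions):
    """Return (majority class, margin) for one event.

    Margin = fraction of votes for the top class minus the fraction of
    votes for the runner-up class.
    """
    ranked = sorted(fractions.items(), key=lambda item: item[1], reverse=True)
    (best_cls, best_frac), (_, second_frac) = ranked[0], ranked[1]
    return best_cls, best_frac - second_frac


# Hypothetical votes of a 100-tree forest for a single event.
votes = (["CTC"] * 81 + ["apoptotic CTC"] * 13
         + ["CTC debris"] * 4 + ["debris"] * 2)
label, margin = classify_with_margin(vote_fractions(votes))
print(label, round(margin, 2))
```

A margin close to 1 means nearly all trees agree; a margin near 0 means the two leading classes received almost the same number of votes.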

RESULTS

CellTracks TDI Data Analysis

CellTracks TDI identifies events at the surface of the analysis cartridge that have passed a threshold on CK-PE and measures several morphological parameters and quantitative fluorescence parameters from these events. The data set for the development of the automated event classification consisted of 11,872 events classified by an expert reviewer from 68 patient samples. One hundred fifty-three parameters were determined for each event. Data from 7,698 events were used as a training set, and the remainder was used as the test set. Events in patient samples were randomly assigned to either the test or training set. Two variables related to the RF classification process were optimized during training of the classifier: mtry, which is the number of parameters considered for each tree, and sampsize, which is the number of events (the same for all five classes) that were randomly selected from all the events in the training set. For mtry, the value was varied over the interval 1–24; the optimal value was 12, as it generated the lowest average internal error rate. This is very close to the recommended default value of mtry, which is the square root of the number of parameters (√153 ≈ 12.4).

Default RF classifiers randomly select about one-third of all the events during construction of a new tree. For imbalanced data sets, as is the case here, this is not optimal (35). This is due to the fact that the RF classifier tries to minimize the total error rate during training. Because the most frequent classes are leukocytes, debris and CTC debris, this would favor the correct classification of these classes. This implies that correct classification of less frequent classes is less important. Because the least frequent classes (CTC and apoptotic CTC) are the most important to get correct, a different sampling method has to be used during creation of a new tree, other than random sampling. The best classification is obtained when no class is favored during creation of a tree, resulting in the need for equal numbers of events from all five classes. This is determined by the variable sampsize. To determine the optimal value, it was varied over the range of 10–230 with intervals of 10. A smaller interval did not significantly improve the determination of the optimal value, but it did significantly increase computation time. The optimal value for sampsize was found to be 110.
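The balanced sampling described above can be sketched as follows. This is an assumption-laden illustration rather than the study's implementation: the function name `balanced_bootstrap` is hypothetical, and drawing with replacement is assumed here because the reported optimum of 110 events per class exceeds the number of training events in the smallest classes listed in Table 2 (92 CTC and 80 apoptotic CTC).

```python
import random
from collections import Counter, defaultdict


def balanced_bootstrap(labels, sampsize, rng):
    """Draw `sampsize` event indices per class, with replacement.

    Every class contributes the same number of events to the sample used
    for growing one tree, so no class dominates the error being minimized.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for indices in by_class.values():
        chosen.extend(rng.choices(indices, k=sampsize))
    return chosen


# Class sizes of the training set, taken from the row totals of Table 2.
train_counts = {
    "CTC": 92,
    "apoptotic CTC": 80,
    "CTC debris": 1496,
    "leukocytes": 1344,
    "debris": 4686,
}
labels = [cls for cls, n in train_counts.items() for _ in range(n)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = balanced_bootstrap(labels, sampsize=110, rng=rng)
per_class = Counter(labels[i] for i in sample)
print(per_class)
```

Each tree would receive a fresh sample of this kind; the events not drawn for a tree remain available for its out-of-bag error estimate.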

The importance of individual parameters for correct classification of a specific type of event is illustrated for CTC and leukocytes in Table 1. The higher the number, the more important the feature is for classification. The ratio of CK-PE total intensity divided by CD45-APC total intensity is, with a value of 12.6, by far the most important in classification of CTC. For leukocytes, the slope of the correlation between DAPI and CK-PE intensity is, with a value of 10.0, the most important. Analyzing parameters by their relative importance allows for optimization of parameters by removing unimportant parameters and possibly adding parameters in more important classes.

Table 1. Parameter importance in automated classification of CTC and leukocytes

                                   CTC                                                            Leukocyte
Category     Parameter             DAPI     DAPI/PE  PE       PE/APC   APC      APC/DAPI BF       DAPI     DAPI/PE  PE       PE/APC   APC      APC/DAPI BF
Morphology   Area                  0.1      -        0.6      -        0.1      -        -        0.9      -        1.5      -        0.9      -        -
             Perimeter             0.1      -        0.4      -        0.2      -        -        0.9      -        1.1      -        0.8      -        -
             Circularity           0.2      -        0.3      -        0.2      -        -        0.2      -        0.3      -        0.2      -        -
             Max. caliper          0.1      -        0.6      -        0.2      -        -        1.2      -        2.2      -        1.3      -        -
Texture      Contrast mean         1.5      -        0.8      -        1.1      -        -        1.5      -        0.2      -        0.8      -        -
             Correlation range     1.2      -        0.3      -        1.9      -        -        0.4      -        0.0      -        1.1      -        -
             Homogeneity mean      1.9      -        1.4      -        0.2      -        -        2.1      -        1.8      -        0.8      -        -
             Entropy mean          1.5      -        1.0      -        0.5      -        -        2.1      -        1.6      -        0.9      -        -
Quantitative Total intensity       3.8      -        1.3      -        3.8      -        0.2      3.9      -        3.1      -        3.5      -        0.2
             Standard deviation    3.3      -        1.2      -        0.6      -        0.1      5.3      -        1.5      -        1.0      -        0.0
             Maximum value         3.1      -        1.3      -        0.4      -        0.1      3.5      -        2.1      -        1.1      -        0.0
             Total intensity ratio -        0.5      -        12.6     -        3.9      -        -        9.3      -        6.1      -        2.4      -
Correlation  R²                    -        0.4      -        0.6      -        0.8      -        -        0.8      -        0.2      -        0.4      -
             Slope                 -        0.6      -        0.6      -        0.1      -        -        10.0     -        0.8      -        2.0      -

  1. DAPI/PE, PE/APC, and APC/DAPI: ratio parameters that either use two base parameters, in the case of the total intensity ratio, or use two fluorescence channels in their calculation, in the case of slope and R². The first-mentioned parameter is either used in the numerator of the fraction or as the first parameter in a calculation. BF: bright-field. A dash marks a combination for which no value is listed.

The final RF classifier was trained using the optimized parameters. OOB error rates were as follows: CTC: 17.4%, apoptotic CTC: 33.8%, CTC debris: 9.4%, leukocytes: 5.5%, debris: 10.9%; and total: 9.9%. To determine actual performance of the RF classifier, it was used on the test set which consists of 4,174 events. The resulting classes were compared to the manual classes as determined by the expert reviewer. The error rates are: CTC: 10.2%, apoptotic CTC: 34.1%, CTC debris: 9.5%, leukocytes: 4.0%, debris: 10.8%, and total: 9.6%. To compare the classification by the expert reviewer and the automated classifier a confusion matrix was constructed to illustrate the number and percentage of events that were classified into each class by the expert reviewer and the automated classifier, which is shown in Table 2. Results from all classified events are given in Table 2 as well as a separation into the training and test set. Values between brackets indicate the percentage of events in a specific field based on the total number of events as classified by the expert reviewer. Percentages on the diagonals of the three subtables indicate the recall rate from a specific class. Recall rate is defined as the number of events, for a specific class, correctly classified by the RF classifier divided by the total number of events determined to be in that class by the expert reviewer. Recall rate is a measure of sensitivity. Another value that can be calculated from Table 2 is precision. It is defined as the number of events, for a specific class, correctly classified by the RF classifier divided by the total number of events determined to be in that class by the RF classifier, giving a measure of specificity. Precision values calculated from the total data set are (range 0–1): 0.50 for CTC, 0.23 for apoptotic CTC, 0.87 for CTC debris, 0.86 for leukocytes, and 0.98 for debris. 
This shows that few events are falsely classified as debris; however, some events that are debris are incorrectly classified as CTC-related events.
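The recall and precision definitions above can be checked directly against the counts in the "All events" block of Table 2. The short sketch below is only an illustration (the function name is hypothetical); the counts are copied from Table 2, with rows for the expert reviewer and columns for the automated classifier.

```python
CLASSES = ["CTC", "apoptotic CTC", "CTC debris", "leukocytes", "debris"]

# Rows: expert reviewer; columns: automated classifier (Table 2, all events).
ALL_EVENTS = [
    [120, 19, 1, 1, 0],
    [22, 82, 13, 0, 7],
    [31, 134, 2064, 10, 39],
    [3, 2, 0, 1983, 99],
    [66, 121, 283, 314, 6458],
]


def recall_and_precision(matrix):
    """Per-class recall (diagonal / row total) and precision (diagonal / column total)."""
    n = len(matrix)
    recall, precision = [], []
    for k in range(n):
        row_total = sum(matrix[k])                       # events the reviewer put in class k
        col_total = sum(matrix[r][k] for r in range(n))  # events the classifier put in class k
        recall.append(matrix[k][k] / row_total)
        precision.append(matrix[k][k] / col_total)
    return recall, precision


recall, precision = recall_and_precision(ALL_EVENTS)
for name, r, p in zip(CLASSES, recall, precision):
    print(f"{name:14s} recall={r:.2f} precision={p:.2f}")
```

Rounded to two decimals, the precision values reproduce those quoted in the text (0.50, 0.23, 0.87, 0.86, and 0.98), and the recall values match the bracketed percentages on the diagonal of the "All events" block.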

Table 2. Confusion matrices showing the relation between manually determined classes (rows) and classes that were determined by the automated classifier (columns) in the training set, test set, and total data set

Assignment by    Automated assignment of events
expert reviewer  CTC       Apoptotic CTC  CTC debris   Leukocytes   Debris

Training set
CTC              76 (83)   14 (15)        1 (1)        1 (1)        0
Apoptotic CTC    14 (18)   53 (66)        9 (11)       0            4 (5)
CTC debris       20 (1)    87 (6)         1,356 (91)   7 (0)        26 (2)
Leukocytes       3 (0)     2 (0)          0            1,270 (94)   69 (5)
Debris           43 (1)    84 (2)         192 (4)      190 (4)      4,177 (89)

Test set
CTC              44 (90)   5 (10)         0            0            0
Apoptotic CTC    8 (18)    29 (66)        4 (9)        0            3 (7)
CTC debris       11 (1)    47 (6)         708 (91)     3 (0)        13 (2)
Leukocytes       0         0              0            713 (96)     30 (4)
Debris           23 (1)    37 (1)         91 (4)       124 (5)      2,281 (89)

All events
CTC              120 (85)  19 (13)        1 (1)        1 (1)        0
Apoptotic CTC    22 (18)   82 (66)        13 (10)      0            7 (6)
CTC debris       31 (1)    134 (6)        2,064 (91)   10 (0)       39 (2)
Leukocytes       3 (0)     2 (0)          0            1,983 (95)   99 (5)
Debris           66 (1)    121 (2)        283 (4)      314 (4)      6,458 (89)

  1. Values in brackets indicate the percentage of events based on manual classification.

An example of a measurement of a sample from a metastatic carcinoma patient is shown in Figure 1. Panel A shows the total DAPI intensity versus the ratio of the total CK-PE intensity divided by the total CD45-APC intensity and Panel B shows the total CD45-APC intensity versus the mean DAPI intensity. These four parameters were the most important parameters during training of the automated classifier: the parameter on the y-axis of Panel A is the most important, the parameter on the x-axis of Panel A is the second most important, and Panel B shows the third and fourth most important parameters. Each dot represents an event that has passed the threshold. The analysis algorithm identified five classes of events that are represented by different colors. Magenta represents events classified as CTC, green represents events classified as apoptotic CTC, orange represents events classified as CTC debris, blue represents events classified as leukocytes, and gray represents events classified as debris. The size of the dots varies between 3 and 9 pixels in diameter, depending on the margin with which that event was classified. The margin indicates the difference in the number of votes for the most likely and second most likely classes divided by the total number of votes. Events located away from the center of their respective clusters are clearly smaller in size, denoting lower certainty in classification.

Figure 1.

Scatter plots of CellTracks TDI CTC analysis of 7.5 mL of blood from a patient with metastatic carcinoma showing the four parameters with the highest importance during training of the RF classifier. The size of each dot varies and depicts the margin of the automated classification of the event. A larger dot indicates a higher confidence. In total, 246 events are shown in the figure: 22 CTC colored magenta, 18 apoptotic CTC colored green, 62 CTC debris colored orange, 75 leukocytes colored blue, and 69 debris events colored dark gray. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Figure 2 shows typical images from the five classes of events identified by the automated algorithm after they have passed the CK-PE threshold on the CellTracks TDI system. To the right of the images, a column chart indicates the percentage of votes assigned to each of the five classes for the corresponding event. Row A represents an event that has a high likelihood of being a CTC. The event has a clear nucleus (DAPI) and cytoskeleton (CK-PE), no signal for the leukocyte marker (CD45-APC), and a round to oval morphology (bright-field). The composite of the nucleus (blue) and cytoskeleton (green) is shown in the overlay and the bright-field image is added to the composite in the adjacent image. The certainty of the event being classified as a CTC is 81% with a margin of 81% − 13% = 68% compared to the next most likely apoptotic CTC class. Row B shows a typical apoptotic CTC with some nuclear material, punctate cytokeratin staining, no staining with CD45-APC, and a bright-field image showing a round to oval morphology. The typical cytokeratin staining can be attributed to the collapse of the cytoskeleton, which results in retraction of the cytokeratin filaments. The certainty of the event being classified as an apoptotic CTC is 69% with a margin of 52% compared to the next most likely CTC debris class. Row C shows CTC debris, with no staining of DAPI or CD45-APC, but clear staining with CK-PE. The diameter of this event is smaller than 4 μm and it has a round appearance in the bright-field channel. The certainty of the event being classified as CTC debris is 88% with a margin of 81% compared to the next most likely apoptotic CTC class. Row D shows an event with a clear nucleus and staining with both CK-PE and CD45-APC. The bright-field image shows a round to oval morphology consistent with that of a leukocyte. The overlay images are consistent with a leukocyte nonspecifically binding to CK-PE. 
The certainty of the event being classified as leukocyte is 90% with a margin of 82% compared to the next most likely debris class. Row E shows images associated with debris. These events frequently show similar staining in all three fluorescence channels. The certainty of the event being classified as debris is 69% with a margin of 56% compared to the next most likely leukocyte class.

Figure 2.

CellTracks TDI images of the five event classes. Row A shows an intact CTC; Row B an apoptotic CTC; Row C CTC debris; Row D a leukocyte; and Row E debris. The images show, from left to right, DAPI, CK-PE, and CD45-APC fluorescence, bright-field, overlay of DAPI (blue), CK-PE (green), and CD45-APC (red) and the same overlay with the bright-field image added. The column chart to the right shows the distribution of votes for each class by the automated classification method for each of the five images. Colors in columns correspond to those in Figure 1. aCTC, apoptotic CTC; CTCd, CTC debris. Scale bar represents 10 μm. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Inter- and Intra-operator Variability

The automated classification of events by the RF method is based on manual classification of those events by an expert reviewer. This expert reviewer uses several guidelines to determine the class of an event, like the presence of CD45-APC, size, and intensity of the nucleus (DAPI), and so forth. Also, the current CellTracks Analyzer II uses operator based review of images of events to obtain a final CTC count. This method introduces uncertainty due to variations between multiple reviewers. A statistical study has shown that the operator variability is a major factor in consistent analysis of patient CTC samples (16).

To determine the inter-reader variability and agreement on CellTracks TDI images, 1,000 events were randomly selected from the entire database of 11,872 events. These 1,000 events were then presented to five expert reviewers. They were asked to assign a class, 1 through 5, to each event. Possible classes were the same as those used in the automated classification, namely: CTC, apoptotic CTC, CTC debris, leukocytes, and debris. Every event had to be classified before the next event could be viewed. Table 3 shows the inter-reviewer agreement between the five expert reviewers and their agreement with the automated classifier. The average agreement between the five expert reviewers is 84.8% with a standard deviation of 5.1%. This means that, when presented with a random event, a second reviewer will classify the event the same as the first reviewer with an ∼ 85% chance. The intra-reviewer agreement was determined in only one expert reviewer and was found to be 96.6%. Second review of the same events was done on different days and after randomization of the data set to obtain a realistic value. The agreement between the five reviewers and the automated classifier is on average 85.4% with a standard deviation of 4.7%. Since there is no absolute definition of a correct class for each of the 1,000 events, we used the class that occurred most frequently among the five reviewers as being the “correct” class. Distribution of classification among reviewers was such that 67.9% of all events were agreed upon by all five reviewers (all five voted for the same class) and 21.5% of events were agreed upon by four of five reviewers. Thus, 89.4% of all 1,000 events were classified the same by at least four of five reviewers. To determine if one class was more present in one category or the other, we compared the 894 events that were confidently agreed upon and the 106 events that were not. 
A ratio was determined that compares the fraction of a specific class in each of the two categories, effectively determining which class occurred most often among the events that were not readily agreed upon. These ratios are: CTC: 6.8, apoptotic CTC: 12.7, CTC debris: 1.3, leukocytes: 1.0, and debris: 0.7. This shows that reviewers disagreed the most on the classes CTC and apoptotic CTC. Further statistical analysis of the classes as assigned by the five reviewers was carried out (36–39). Average agreement between reviewers is 84.8%; however, some agreement between reviewers is to be expected based on chance. Kappa scores (39) were determined to express the level of agreement between reviewers that exceeds chance, normalized to one minus the chance agreement. The overall kappa score is 0.75, indicating a reasonable to good agreement between reviewers. The kappa scores were also calculated for each category separately and were as follows: CTC: 0.55, apoptotic CTC: 0.38, CTC debris: 0.86, leukocytes: 0.73, and debris: 0.74. This confirms that the highest agreement between reviewers, above chance, is observed in CTC debris, leukocytes, and debris.
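The idea of chance-corrected agreement can be made concrete with a minimal sketch of Cohen's kappa for two reviewers. The text does not state which kappa variant was applied to the five reviewers, so the two-rater form below is an assumption chosen for simplicity, and the ten ratings are invented for illustration.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: probability that both raters pick the same class
    # if each rated independently according to their own class frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten events by two reviewers.
reviewer_1 = ["CTC"] * 4 + ["debris"] * 6
reviewer_2 = ["CTC"] * 3 + ["debris"] * 6 + ["CTC"]

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

In this toy case the raw agreement is 80%, but because 52% agreement is expected by chance, kappa is only about 0.58. This is why the kappa scores in the text are lower than the raw agreement percentages in Table 3.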

Table 3. Inter-reviewer agreement between five manual classifications and automated classification, using random forests, for a random selection of 1,000 events

          Rev. 2   Rev. 3   Rev. 4   Rev. 5   Auto
Rev. 1    91.0%    79.3%    89.5%    85.7%    89.8%
Rev. 2             79.6%    90.4%    87.4%    87.9%
Rev. 3                      80.4%    77.4%    77.6%
Rev. 4                               87.2%    86.7%
Rev. 5                                        84.9%

  1. Rev. 1: Reviewer 1, Auto: Automated classifier.

Comparison of CTC Analysis by CellTracks Analyzer II and CellTracks TDI

Identification of leukocytes

The efficiency of excitation of APC with a mercury arc lamp, as used in the CellTracks Analyzer II, is lower than with the 639 nm laser line used for APC excitation in the CellTracks TDI. This is illustrated by the signal intensity in the CD45-APC images in Figure 3. Blood samples from nine healthy donors were prepared for CTC analysis. The sample cartridges were first analyzed by the CellTracks Analyzer II followed by analysis by the CellTracks TDI. Data analysis of the CellTracks Analyzer II was modified such that only CD45-APC positive and DAPI positive events were presented for review. An operator classified the events as leukocytes when the morphology of the events was consistent with that of a cell. Leukocytes in the CellTracks TDI were identified as CD45-APC positive, DAPI positive cells. The number of leukocytes identified by the CellTracks TDI ranged from 873 to 5,245 (mean: 1,989, SD: 1,588), whereas the CellTracks Analyzer II identified only 286–3,383 (mean: 994, SD: 1,033) leukocytes in these samples. This corresponds to 30–80% (mean: 47%) of the leukocyte population identified by the CellTracks TDI.

Figure 3.

Comparison of images of leukocytes recorded with the CellTracks TDI and the CellTracks Analyzer II. Panel A shows three different cells analyzed by the CellTracks TDI. First column shows DAPI fluorescence, second column shows CK-PE fluorescence, and the last column shows CD45-APC fluorescence. Panel B shows the line profiles from the CD45-APC images in A. Panel C shows the same three cells now analyzed by the CellTracks Analyzer II. Panel D shows the corresponding line profiles. Scale bar represents 5 μm.

Identification of CTC

To compare the two instruments, the same patient samples were analyzed on CellTracks Analyzer II and CellTracks TDI. Figure 4 shows images of the same seven CTC acquired with the two different instruments. Images of the DAPI, CK-PE and CD45-APC channels as well as an overlay image of DAPI and CK-PE are shown for each CTC. In addition, a bright-field image is recorded and shown for CellTracks TDI. The detail in the images is clearly better in the CellTracks TDI images, as can be expected from the higher NA objective used in this instrument.

Blood samples from 68 patients with carcinoma and 9 healthy donors were prepared for CTC analysis by the CellTracks Autoprep and analyzed by the CellTracks Analyzer II. After analysis, the same sample cartridge was analyzed by the CellTracks TDI. Figure 5A shows the comparison of the number of CTC detected by both instruments for the 68 patients. It shows a good correlation (R² = 0.87) with a slope of 1.88 and an intercept of 0.2. Bland–Altman data (not shown) reveal the interchangeability between both methods with a bias of 1.56 and 95% confidence levels in CTC counts of −9.5 and 6.4. Figure 5B shows the relative occurrence of CTC, apoptotic CTC, and CTC debris after automated classification of patient samples. The samples were sorted by decreasing number of CTC. In 17 (25%) of the samples, CTC debris and/or apoptotic CTC were detected when no intact CTC were detected.
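For readers unfamiliar with the Bland–Altman comparison mentioned above, the following is a minimal sketch of how a bias and 95% limits are commonly computed from paired counts. It assumes the usual definition (mean difference ± 1.96 standard deviations of the differences). The text does not state the exact procedure or the direction of the difference, and the paired counts below are hypothetical values, not data from the study.

```python
import statistics


def bland_altman(counts_a, counts_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences (a - b); the limits are
    bias -/+ 1.96 sample standard deviations of those differences.
    """
    diffs = [a - b for a, b in zip(counts_a, counts_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired CTC counts for six samples (NOT study data).
analyzer_ii_counts = [0, 1, 3, 5, 12, 40]
tdi_counts = [0, 2, 5, 9, 20, 75]

bias, lower, upper = bland_altman(tdi_counts, analyzer_ii_counts)
print(f"bias: {bias:.2f}, 95% limits: {lower:.2f} to {upper:.2f}")
```

By construction the limits are symmetric around the bias, so a reported bias is expected to lie midway between the two limits.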

Figure 4.

Images of seven events classified as CTC or apoptotic CTC obtained with both CellTracks Analyzer II (left) and CellTracks TDI (right). Scale bar represents 5 μm. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Figure 5.

Comparison of CTC counts obtained with CellTracks Analyzer II and the automated classification of CTC with the CellTracks TDI (Panel A) and the variation in CTC count, apoptotic CTC count, and CTC debris count by the automated classification with the CellTracks TDI (Panel B) across 68 patient samples. Data points in Panel A were jittered by up to half a count and both axes have combined linear and logarithmic scales, changing between 0 and 1. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

DISCUSSION

Presence of CTC in blood of patients with metastatic carcinomas is associated with poor survival prospects, and their persistence after initiation of therapy strongly suggests an ineffective treatment (1–6). A correct assessment of the presence of CTC is therefore of utmost importance. The prospective multicenter studies leading to these findings were made possible through the use of standardized CTC enumeration by the CellSearch system. In this system, CTC classification relies on the final selection by an operator from a set of events pre-selected by computer based on cytokeratin and DNA content. Variability of CTC classification by different operators has been identified as the main source of error and is a likely explanation for why more than one CTC has to be detected before a sample is considered CTC positive. In the CellTracks Analyzer II, images presented to the operator are auto-scaled to increase visibility. However, the human eye can only discern approximately 100 gray levels, making it difficult for the operator to deduce quantitative information from such images. In auto-scaled images the relation between “true” signal and background is less apparent. A dim nucleus may appear to be just as bright as a brighter nucleus, and a slight difference in monitor settings may already result in a different interpretation by the operator. The large heterogeneity of the morphological appearance of CTC further contributes to the variation in classification of objects as CTC. In this study, CellTracks TDI was developed to automate the classification of events based on RF classification using several types of parameters. The major differences between the CellTracks Analyzer II and CellTracks TDI are the objective (10×/0.45 NA versus 40×/0.6 NA), the light source (a mercury arc lamp versus laser lines), and the camera (a regular CCD camera versus a TDI camera). These differences result in a higher resolution and an increase in sensitivity.
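The effect of auto-scaling described above can be illustrated with a small sketch. It assumes a simple linear min-max stretch to 256 display levels; the actual scaling used by the CellTracks Analyzer II is not specified in the text. The two intensity profiles are hypothetical line profiles across a dim and a bright nucleus on the same background.

```python
def auto_scale(pixels, levels=256):
    """Linearly stretch pixel values to the display range [0, levels - 1]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A flat profile carries no contrast; map everything to black.
        return [0 for _ in pixels]
    return [round((p - lo) * (levels - 1) / (hi - lo)) for p in pixels]


# Hypothetical line profiles in arbitrary camera counts (NOT study data).
# Both share a background of 100; the bright nucleus has 10x the signal.
dim_nucleus = [100, 110, 150, 200, 150, 110, 100]
bright_nucleus = [100, 200, 600, 1100, 600, 200, 100]

scaled_dim = auto_scale(dim_nucleus)
scaled_bright = auto_scale(bright_nucleus)

print(scaled_dim)
print(scaled_bright)
print("identical after auto-scaling:", scaled_dim == scaled_bright)
```

After scaling, both profiles map to the same display values even though the underlying signal differs tenfold, which is why quantitative classification is better done on the unscaled measurements than on the displayed image.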
The higher sensitivity for APC resulted in a higher efficiency of the detection of leukocytes identified by CD45-APC by CellTracks TDI. The lower efficiency of leukocyte detection by the CellTracks Analyzer II can most likely be attributed to granulocytes, which express the CD45 antigen at a lower density. The higher APC sensitivity of the CellTracks TDI results in the ability to identify DAPI positive, CK-PE negative, CD45-APC dim intact cells. CTC that express EpCAM but lack cytokeratin 8, 18, and 19 will be among these cells. CellTracks TDI can be used to investigate the frequency of such cells in patients with cancer and to verify whether or not they are indeed CTC by detection of chromosomal abnormalities (19,40,41).

CTC are extremely rare, and accurate detection of a few CTC in 7.5 mL of blood is strongly dependent on the Poisson distribution and the assay variability. Statistical calculations (16) have shown that one or more CTC, at 95% confidence, are detected in a 7.5 mL blood sample, after processing using CellSearch, for in vivo concentrations down to 2,600 CTC in 5 L of blood. This amounts to a concentration of ∼0.5 CTC/mL. Classifying objects as CTC was found to be the largest contributor to errors in CTC determination (16), and the automated CTC classification presented here can help to reduce this error. The CellSearch system recovers 85% of cells from the tumor cell line SKBR-3 spiked in 7.5 mL of blood (14). The specificity of the assay is 99.7% for healthy subjects and patients with nonmalignant disease. In healthy subjects, only 8 of 145 samples (5.5%) were found to have 1 CTC per 7.5 mL of blood and in all others no CTC were detected (14).
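The role of the Poisson distribution in this estimate can be sketched as follows. This is a simplified illustration, not the calculation from reference 16: it assumes pure Poisson sampling of a 7.5 mL draw and, optionally, applies the 85% SKBR-3 recovery as a stand-in for assay losses, which is an assumption for patient CTC. Other sources of assay variability included in the published calculation are ignored, so the numbers are not expected to match exactly.

```python
import math


def prob_at_least_one(ctc_per_ml, sample_ml=7.5, recovery=1.0):
    """P(at least one CTC detected), assuming Poisson-distributed counts.

    The expected count is concentration x volume x recovery fraction.
    """
    expected = ctc_per_ml * sample_ml * recovery
    return 1.0 - math.exp(-expected)


# 2,600 CTC in 5 L of blood, expressed per mL (from the text).
concentration = 2600 / 5000

p_ideal = prob_at_least_one(concentration)
p_with_recovery = prob_at_least_one(concentration, recovery=0.85)

print(f"concentration: {concentration:.2f} CTC/mL")
print(f"P(>=1 CTC), perfect recovery: {p_ideal:.3f}")
print(f"P(>=1 CTC), 85% recovery:     {p_with_recovery:.3f}")
```

Under these simplified assumptions both probabilities lie above 95%, which is consistent with the detection limit quoted in the text.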

The RF method that was used for automated classification of events proved to be versatile. It can, for example, be used to determine an outlier measure of events based on the original manual review of events. This could further improve classification in future measurements because the manual classification, which is the basis for automated classification, can be further optimized. Also, the parameters that are used in the classification can be optimized by determining their importance. Some parameters in the original set used in this article proved to be of little to no importance in classification. Initial experiments indicate that these parameters can be left out of the data set without any significant detrimental effect on classification accuracy, while reducing the time needed to train the classifier.

General classification accuracy of the automated classifier proved to be good, with excellent recall rates for all classes except apoptotic CTC, for which recall was only 66%. Precision of classification was excellent for CTC debris, leukocytes, and debris, mediocre (0.50) for CTC, and low for apoptotic CTC. The lower precision of classification for CTC and apoptotic CTC is mainly due to false positive events that are debris according to manual review but are not classified as such. Although the percentage of these events is low, their number relative to the total number of, for example, CTC is high because of the large number of debris events present in the patient samples. Further improvements to the CellTracks TDI system might therefore come from reducing the number of debris events in the patient samples after preparation. Also, the automated classifier could possibly be trained to prevent false positive classification of events as CTC or apoptotic CTC. However, we do not know in what way this will affect the recall rate of the classifier, which is currently 90.4% over the entire dataset.
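The way a small misclassification rate in a large class depresses precision in a rare class can be made concrete with a short sketch. All counts and rates below are hypothetical values chosen only to illustrate the effect; they are not taken from the study.

```python
def precision(true_pos, false_pos):
    """Fraction of events assigned to a class that truly belong to it."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Fraction of events truly in a class that are assigned to it."""
    return true_pos / (true_pos + false_neg)


# Hypothetical numbers (NOT study data).
n_ctc = 100                # events that are CTC according to manual review
n_debris = 10_000          # events that are debris according to manual review
ctc_recall = 0.90          # fraction of CTC correctly classified as CTC
debris_error_rate = 0.01   # fraction of debris misclassified as CTC

true_pos = ctc_recall * n_ctc            # CTC classified as CTC
false_neg = n_ctc - true_pos             # CTC missed by the classifier
false_pos = debris_error_rate * n_debris # debris classified as CTC

print(f"recall for CTC:    {recall(true_pos, false_neg):.2f}")
print(f"precision for CTC: {precision(true_pos, false_pos):.2f}")
```

In this illustration only 1% of debris events are misclassified, yet they outnumber the correctly identified CTC, so precision drops below 0.5 while recall stays at 0.90. This is the reason that reducing the number of debris events in the sample is expected to improve precision.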

Inter-reviewer variability and agreement results indicate that the agreement between the automated classifier and the expert reviewer who classified the entire dataset is similar to the agreement among the five reviewers. Also, the intra-reviewer agreement was shown to be 96.6%, whereas that of the automated classifier is of course 100%. These results indicate that the automated classification of events in the CellTracks TDI by the RF method is comparable to that of a human expert reviewer while always being consistent. This method could therefore be used to standardize classification of CTC in the CellTracks system, provided that separate instruments have similar output values with regard to, for example, fluorescence intensity, or that there is at least a way to measure and process the difference between them and standardize the output values.

Results from Figure 5B indicate that in 25% of patient samples, CTC debris was present without intact CTC. Destruction of intact CTC by mechanical stress during the immunomagnetic enrichment could lead to CTC debris. This is, however, unlikely, as such phenomena are not observed when cells from tumor cell lines spiked in blood are processed. Moreover, CTC debris can be observed in patients with large numbers of CTC that have not been immunomagnetically enriched (42). The presence of apoptotic bodies near the tumor site and in the blood stream has been reported by others (17,43–46), as has an increase of cytokeratin 18, 19 and caspase-cleaved cytokeratin in serum of patients with carcinomas (47–52), supporting the notion that the detected tumor debris is indeed directly related to the presence of a tumor. A recent retrospective analysis of EpCAM+, CD45−, CK+ objects detected with the CellSearch system in the prospective multicenter CTC study of castration-resistant prostate cancer showed that the smaller CTC-related objects also correlated with poor survival (13). Only CTC that express EpCAM as well as cytokeratin 8, 18, or 19 will be detected by this system. The frequency and the relation to clinical outcome of CTC that do not express these antigens are being explored through a variety of alternative approaches for the detection of CTC (53–60).

Although only viable CTC will have the potential to form metastases, the presence of apoptotic CTC or CTC debris might also be an indicator of a worse prognosis. However, this will have to be established in controlled clinical studies. CTC debris events occur more frequently than intact CTC and can therefore be enumerated with a smaller statistical error. The discrimination between the different CTC populations may also be of value for determining therapy effectiveness. Shortly after administration of a therapy, a decrease in intact cells and an increase in cell debris might suggest that the therapy is effective.

Acknowledgements

We thank Dr. M. de Groot from MST hospital in Enschede, The Netherlands, for kindly supplying blood specimens from the patients.
