Digital image analysis of cell nuclei is useful for obtaining quantitative information for the diagnosis and prognosis of cancer. However, the lack of reliable automatic nuclear segmentation is a limiting factor for high-throughput nuclear image analysis. We have developed a method for automatic segmentation of nuclei in Feulgen-stained histological sections of prostate cancer. A local adaptive thresholding with an object perimeter gradient verification step detected the nuclei, and was combined with an active contour model that featured an optimized initialization and worked within a restricted region to improve convergence of the segmentation of each nucleus. The method was tested on 30 randomly selected image frames from three cases, comparing the results from the automatic algorithm to a manual delineation of 924 nuclei. The automatic method segmented a few more nuclei than the manual method, and about 73% of the manually segmented nuclei were also segmented by the automatic method. For each nucleus segmented both manually and automatically, the accuracy (i.e., agreement with the manual delineation) was estimated. The mean segmentation sensitivity/specificity were 95%/96%. The results from the automatic method were not significantly different from the ground truth provided by manual segmentation. This opens the possibility of large-scale nuclear analysis based on automatic segmentation of nuclei in Feulgen-stained histological sections. © 2012 International Society for Advancement of Cytometry
DIGITAL pathology is one of the main application fields of automated medical image analysis. Steadily improving imaging technology has given us an increasing volume of high-quality medical images. This increasing volume of images, both in routine clinical work and in research and development, calls for an increasing degree of automation of image analysis. We are also on the brink of moving away from subjective evaluation at the microscope, toward automated and more objective diagnostic and prognostic work, provided that we can ensure the quality of quantitative measurements of relevant image features.
Digital image analysis of cell nuclei is a very useful method to obtain quantitative information for the diagnosis and prognosis of human cancer (1). But before we can extract meaningful parameters describing the cell nuclei of a specimen, we have to segment the nuclei from the rest of the image. Nuclei in tissue sections are in general difficult to segment by purely automatic means because: (1) the cells may be clustered, (2) the image background varies, (3) there are intensity variations within the nuclei, and (4) the nuclear boundary may be diffuse, either along the whole perimeter or in sectors of varying width. Fluorescence microscopy images of nuclei in tissue sections often show an uneven background, due to autofluorescence from the tissue and fluorescence from objects that are out of focus, while light-microscopy images of nuclei in tissue sections are even more complex due to the presence of a visible background (2). Therefore, segmentation of nuclei in light-microscopy images of thin sections is often done in a semiautomatic or interactive way (3–6). The lack of reliable automatic nuclear segmentation is a limiting factor for high-throughput nuclear image analysis in light-microscopy images of routine histopathological sections.
A large number of methods for the segmentation of cells and cell nuclei have been published for several biomedical applications. Many recently published segmentation methods are applied to fluorescence images (7–20); examples of applications on such images are spatial analysis of DNA sequences and nuclear structures, nucleus tracking, and studies of surface-stained living cells. At our institute, we have developed a system for high-throughput nuclear texture analysis to perform a prognostic classification of cancer patients, based on up to 50,000 measured nuclei/case. Using light-microscopy images of Feulgen-stained sections, we obtain more detailed morphological information than from fluorescence images.
Segmentation of cell nuclei can be viewed as an object modeling problem. Successful global thresholding requires that the nuclei have a range of intensities that is sufficiently different from the background. This is generally not true, since the background varies. The result may be improved by adaptive thresholding, but large intensity variations between and within the nuclei will cause the model to fail. Region growing methods, e.g., the watershed algorithm, are based on the assumption that the objects consist of connected regions of similar pixels. Region growing methods combined with region-merging are commonly used in segmentation of cells and nuclei from fluorescence images (7, 10). Again, large intensity variations between and within the nuclei may cause the model to fail. Edge-based segmentation models the nucleus as an entity bounded by an intensity gradient. In practice, the detected edges are not always sharp, they do not cover the whole circumference of all nuclei, and clustered cells pose a problem. Active contour models or snakes (21) are widely used in medical image segmentation (11, 12, 15, 16, 22, 23). However, these methods are sensitive to initialization of a start contour or a seed inside each nucleus. Baggett et al. (18) and McCullough et al. (19) proposed semiautomatic methods based on dynamic programming for segmentation of cells and nuclei, requiring the user to mark two points per cell (or nucleus), one approximately in the center and the other on the border.
Bengtsson et al. (2) discussed the relative advantages of different approaches to cell segmentation, and concluded that more robust segmentation can be obtained if a combination of cellular features such as intensity, gradient, and shape is used. Based on this, Bengtsson et al. (2) and Wählby et al. (7) proposed a seeded watershed algorithm as the most useful tool for incorporating such features into the cell model. They presented a seeded watershed segmentation of the gradient magnitude image in which seeds representing both object and background are created by combining morphological filtering of both the original image and the gradient magnitude image. Subsequent steps (merging of weak borders and cluster separation) refined and improved the results. Plissiti et al. (23) proposed a seeded watershed segmentation of nuclei in conventional Pap stained cervical smear images that combined intensity, gradient, and shape information. They found that the proposed method produced more accurate nuclear segmentation compared to a gradient vector flow deformable model and a region based active contour model. Malpica et al. (20) proposed a watershed algorithm combining gradient and shape information for splitting clusters of nuclei in blood and bone marrow preparations.
The aim of the present study has been to develop a method for automatic segmentation of cell nuclei in light-microscopy images of Feulgen-stained histological sections of prostate cancer. We have not found any recent studies on automatic segmentation of nuclei from such prostate sections. We have followed the combined approach of Bengtsson et al. (2), but we have combined adaptive thresholding with an active contour model. Initial tests using global thresholding, edge detection, watershed with different region-merging techniques, and also the method of Wählby et al. (2, 7) gave poor results on our images. But it is difficult to compare our approach with the method of Wählby et al., because the material, the staining, and the imaging modalities are so different.
Our combined approach for automatic segmentation of nuclei from Feulgen-stained sections of prostate cancer is based on a number of interacting steps drawn from several strategies traditionally used in image segmentation: a gradient-validated local adaptive thresholding and an active contour model that features an optimized initialization and works within a restricted annular region to improve convergence of the segmentation of each nucleus. We propose a method based on several steps to: (1) detect the nuclei, (2) optimize the initial contours for the snakes by a coarse segmentation, (3) optimize the convergence of the snakes, and (4) split overlapping segmentation masks.
To detect the nuclei and make binary “marker” images to find a start contour for each snake we use local adaptive thresholding. On the basis of the evaluation of Trier and Taxt (24) and Trier and Jain (25) we have used the Niblack method (26) with the postprocessing step of Yanowitz and Bruckstein (27). Initial tests using a traditional snake (21) required a very precise initialization, and therefore gave poor results. In the present study, we therefore used the gradient vector flow (GVF) snake (28, 29).
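As an illustration of the detection step, the following minimal Python sketch combines Niblack's local thresholding with a Yanowitz-Bruckstein-style perimeter-gradient validation. It is a simplified stand-in for the published algorithms, not our production code; the window size, the sign convention for δ (dark nuclei on a brighter background), and the default gradient threshold are assumptions for illustration.

```python
import numpy as np
from scipy import ndimage

def niblack_threshold(img, window=51, delta=0.2):
    """Niblack local adaptive threshold T = m - delta*s, where m and s
    are the mean and standard deviation in a sliding window. Keeping
    pixels below T assumes dark nuclei on a brighter background."""
    img = img.astype(float)
    mean = ndimage.uniform_filter(img, window)
    sq_mean = ndimage.uniform_filter(img ** 2, window)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    return img < mean - delta * std

def validate_objects(mask, img, grad_thresh=0.1):
    """Yanowitz-Bruckstein-style postprocessing: discard candidate
    objects whose mean gradient magnitude along the object perimeter is
    weak (a threshold of 0.1 assumes intensities scaled to [0, 1])."""
    gy, gx = np.gradient(img.astype(float))
    grad = np.hypot(gx, gy)
    labels, n = ndimage.label(mask)
    keep = np.zeros_like(mask, dtype=bool)
    for i in range(1, n + 1):
        obj = labels == i
        perimeter = obj & ~ndimage.binary_erosion(obj)
        if grad[perimeter].mean() > grad_thresh:
            keep |= obj
    return keep
```

The validation step is what removes the flat, low-contrast false objects that plain local thresholding tends to produce in nearly homogeneous background regions.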
The evaluation of the segmentation result is a very important issue, and a quantitative evaluation is vitally important, as image data sets are often much too large to inspect manually. Udupa et al. (30, 31) considered three factors when evaluating segmentation methods: precision (reproducibility in the presence of data variability), accuracy (agreement with the real object), and efficiency (computer and user time needed). In the evaluation of medical image segmentation results, we cannot always rely on manual segmentation as the "ground truth," as it is affected by intra- and interexpert variability. In addition, the merging of "ground truths" from several experts into "latent gold standards" is not trivial (32, 33). A number of validation frameworks and metrics have been proposed, but there is no general consensus on which approach to use. Metrics from general computer vision that treat under- and oversegmentation as equivalent errors of accuracy will run into trouble in some medical applications (31). Segmentation accuracy therefore has to be combined with knowledge about clinical relevance and the medical impact of different errors (34).
A very small number of samples seems to be a common characteristic of many papers on the evaluation of accuracy in medical image segmentation. Popovic et al. (34) suggest a new validation metric to assess segmentation accuracy, but include only a single CT image of calvarial tumors in six different patients as a case study. Udupa et al. (30) outline the methodology for evaluation, but apply it to a single MR image of a patient's brain. Warfield (33) proposed a method to estimate the accuracy of automated segmentation as compared to a group of expert segmentations, and simultaneously measure the quality of each expert, but illustrated its application on just three different MRI images. Einstein et al. (35) discuss and compare reproducibility (agreement among repeated measurements) and accuracy (agreement between measurement and external standard) in interactive cell segmentation. Cytological material from six patients with invasive ductal carcinoma of the breast was used, but only six nuclei per patient were segmented. The method of Wählby et al. (7) was tested on six different 2D images containing a total of 689 cells, and gave about 90% correct segmentation, in the sense that it tallied with manual counts from the same image fields. Thus, the geometrical accuracy of the segmentation of each cell was not considered.
Our segmentation method was developed using a “training set” of two frame images taken from one case of prostate cancer and was evaluated using 30 randomly selected frames (including the two training frames) from three cases. The evaluation was performed by comparing the segmentation results with a manual delineation of 924 nuclei from the 30 frames.
RESULTS
Two frame images were used to develop the segmentation method, and the first frame is used as an “example frame” in some of the figures (see Fig. 4). The method was tested on 28 randomly selected frame images from three cases (11 frames from Case 1, 8 from Case 2, and 9 from Case 3).
Manual delineation of the 30 frames resulted in a total of 924 manually segmented cell nuclei, of which 4 were overlapping nuclei and were therefore excluded from the evaluation below. From the first part of Table 1 we see that if we include all nuclei larger than 450 pixels, about 68% of the manually segmented nuclei were also segmented by the automatic method. The mean sensitivity (95%) and the mean specificity (96%), averaged over all nuclei segmented both manually and automatically in the 30 frames from the three cases, are (almost) the same as in the training frames. Figure 5 shows ROC data for each frame of the three cases. The automatic method segmented 96% more "nuclei" than the manual method. However, some of the "nuclei" that were segmented only by the automatic method were erroneous segmentations capturing only a part of the nucleus.
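The per-nucleus sensitivity and specificity reported here can be computed pixel-wise from a pair of binary masks, as in the minimal sketch below. Evaluating specificity within a bounded margin around the manual mask, rather than over the whole frame (where true negatives would dominate), is an illustrative assumption; the margin of 10 pixels is ours.

```python
import numpy as np

def mask_sensitivity_specificity(auto_mask, manual_mask, margin=10):
    """Pixel-wise agreement between an automatic and a manual nuclear
    mask. Sensitivity = fraction of manual (true) nucleus pixels also in
    the automatic mask; specificity = fraction of background pixels,
    within a margin around the manual mask, correctly left out."""
    ys, xs = np.nonzero(manual_mask)
    y0 = max(ys.min() - margin, 0)
    y1 = min(ys.max() + margin + 1, manual_mask.shape[0])
    x0 = max(xs.min() - margin, 0)
    x1 = min(xs.max() + margin + 1, manual_mask.shape[1])
    a = auto_mask[y0:y1, x0:x1].astype(bool)
    m = manual_mask[y0:y1, x0:x1].astype(bool)
    tp = np.count_nonzero(a & m)
    tn = np.count_nonzero(~a & ~m)
    fn = np.count_nonzero(~a & m)
    fp = np.count_nonzero(a & ~m)
    return tp / (tp + fn), tn / (tn + fp)
```

With sensitivity on the vertical axis and (1 - specificity) on the horizontal axis, one such point per nucleus (or per-frame mean) yields the ROC-style plots of Figure 5.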
Figure 5. ROC data computed from 13 image frames from "Case 1" (a), 8 frames from "Case 2" (b), and 9 frames from "Case 3" (c). The segmentation sensitivity on the vertical axis, and (1-specificity) on the horizontal axis, are the mean values computed from all nuclei of the frame that were both manually and automatically segmented. The graphs show only the 1% upper left corner of the ROC plane.
Table 1. The percentage of manually segmented nuclei that were also segmented by the automatic method, the mean sensitivity and specificity of the automatically produced segmentation masks, and the total number of automatically and manually segmented nuclei
| | Training | Case 1 | Case 2 | Case 3 |
| --- | --- | --- | --- | --- |
| Nuclear area > 450 pixels | | | | |
| % of manual segmentations | 73.4 | 71.3 | 68.7 | 64.9 |
| Automatic/manual (ratio) | 144/96 (1.5) | 770/426 (1.81) | 512/230 (2.23) | 530/268 (1.98) |
| Nuclear area > 2,400 pixels (diameter > 5.6 μm) | | | | |
| % of manual segmentations | 83.0 | 75.1 | 75.2 | 68.1 |
| Automatic/manual (ratio) | 73/61 (1.2) | 373/328 (1.14) | 186/172 (1.08) | 239/206 (1.16) |
If we include only the biologically most relevant nuclei, i.e., the nuclei larger than 2,400 pixels (corresponding to a nuclear diameter D > 5.6 μm), the automatic method segmented only 13% more candidates for nuclei compared to the manual method, about 73% of the manually segmented nuclei were also segmented by the automatic method, but the mean segmentation sensitivity/specificity remained at 95%/96% (second part of Table 1).
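As a consistency check on the size thresholds used above, the pixel-to-micrometer conversion implied by Table 1 can be worked out, assuming circular nuclei:

```python
import math

# Table 1 equates a nuclear area of 2,400 pixels with a diameter of
# 5.6 um. Assuming a circular nucleus, this fixes the pixel spacing:
area_um2 = math.pi * (5.6 / 2.0) ** 2          # ~24.6 um^2
pixel_size_um = math.sqrt(area_um2 / 2400.0)   # ~0.10 um/pixel

# The lower 450-pixel detection limit then corresponds to a circular
# diameter of roughly 2.4 um, well below typical epithelial nuclei:
d_450_um = 2.0 * math.sqrt(450.0 * pixel_size_um ** 2 / math.pi)
```

This makes explicit why the 450-pixel limit admits small blood cell nuclei while the 2,400-pixel limit restricts the analysis to the biologically most relevant (epithelial) nuclei.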
Comparing Nuclear Area of Nuclei Segmented by Both Methods
A correlation analysis indicates a very close relation between the nuclear area of the manual and automatic segmentation results on the same nuclei (r = 0.99). However, the high correlation may be quite misleading, and Bland–Altman scatter plots (41) of feature differences versus feature mean values are better suited to compare the results from nuclei segmented by both approaches.
In the Bland–Altman plot (Fig. 6), the estimated mean difference μd reveals that there is a small but systematic bias towards a slightly larger manual nuclear area. From Figure 6 we can also observe that the spread of the difference varies with the area, so that the “limits of agreement” (LOA = μd ± 2σd) of the area difference, and the 95% confidence intervals of these limits, cannot be used to assess whether the area accuracy is acceptable or not. However, the curves in Figure 6 illustrate that a large fraction (60–80%) of the spread seen in the Bland–Altman plot is within the error margin of a one pixel wide perimeter outside/inside each manually segmented object. So the area accuracy seems to be acceptable for both small and large nuclei.
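The Bland-Altman quantities used in Figure 6 can be computed in a few lines. The sketch below follows the standard formulation; the approximate standard error of each limit of agreement, σ_d·sqrt(3/n), is the textbook approximation and is included here for illustration.

```python
import numpy as np

def bland_altman(x, y):
    """Bland-Altman statistics for paired measurements: per-pair means
    and differences, the mean difference mu_d (the bias), the limits of
    agreement LOA = mu_d +/- 2*sigma_d, and the approximate standard
    error of each limit, sigma_d*sqrt(3/n)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mean = (x + y) / 2.0
    diff = x - y
    mu_d = diff.mean()
    sigma_d = diff.std(ddof=1)
    loa = (mu_d - 2.0 * sigma_d, mu_d + 2.0 * sigma_d)
    se_loa = sigma_d * np.sqrt(3.0 / diff.size)
    return mean, diff, mu_d, loa, se_loa
```

Plotting `diff` against `mean` per nucleus, with horizontal lines at `mu_d` and the two `loa` values, reproduces the layout of Figure 6.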
Figure 6. Bland-Altman plot of the difference in nuclear area between manually and automatically segmented nuclei from “Case 1” versus the average area from the two methods. The mean difference μd (red solid line), the “limits of agreement” LOA (green solid lines), and the 95% confidence intervals of the LOA (blue dashed lines) are also shown. The curves show the difference-versus-average of digitized circular objects with or without an outside/inside 4- (dashed curves) or 8-connected perimeter (solid curves). The smallest nuclei are regarded as blood cell nuclei, while the larger nuclei (with more than 2,400 pixels or a nuclear diameter D > 5.6 μm) are regarded as epithelial cancer cell nuclei.
Comparing Nuclear Gray Level Features of Nuclei Segmented by Both Methods
An interesting question is whether the observed bias and spread in nuclear area is reflected in various important nuclear features (see Fig. 7). The regression line in Figure 7a is given by DM(g) = 0.004 + 0.097 (±0.021) DM(a), with a correlation coefficient of r = 0.47. This implies that a 6% difference in area compared to the mean (which covers about 75% of the spread in the Bland–Altman area plot) will result in a mere 1% change in the difference-to-mean ratio of the mean nuclear gray level.
Figure 7. The difference-to-mean ratio of two nuclear features [(a) the mean gray level and (b) the gray level entropy)] for the manual and automatic segmentation versus the corresponding difference-to-mean ratio of the nuclear area for the nuclei of “Case 1.” The regression line (red), the 95% confidence interval of the regression line (blue dashed), and the 95% prediction interval (green dashed) are also given.
The regression line in Figure 7b is given by DM(H) = 0.008 + 0.119 (±0.017) DM(a), with r = 0.62. This implies that a 6% difference in area compared to the mean will result in only a 1.5% change in the difference-to-mean ratio of the nuclear entropy value. If the automatic segmentation is close to the real nuclear boundary, while the manual one is slightly larger, the latter will include some pixels from the brighter background, increasing the nuclear entropy. The difference-to-mean ratio of the gray level power-to-mean ratio (PMR = standard deviation divided by mean value), which is highly correlated with entropy, gave similar results.
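The difference-to-mean ratios and regression lines of Figures 7a and 7b follow this pattern; the helper names below are ours, and an ordinary least-squares fit stands in for whatever fitting procedure was actually used.

```python
import numpy as np

def dm_ratio(f_manual, f_auto):
    """Difference-to-mean ratio DM(f) = (f_manual - f_auto) / mean(f),
    computed per nucleus for a feature f (area, mean gray level, ...)."""
    f_manual = np.asarray(f_manual, float)
    f_auto = np.asarray(f_auto, float)
    return (f_manual - f_auto) / ((f_manual + f_auto) / 2.0)

def fit_line(x, y):
    """Least-squares regression line y = a + b*x, e.g. DM(feature)
    regressed against DM(area)."""
    b, a = np.polyfit(x, y, 1)
    return a, b
```

A shallow slope b in `fit_line(dm_area, dm_feature)` is exactly what supports the conclusion above: the residual area disagreement propagates only weakly into the gray level features.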
Comparing Nuclear Features of Nuclei Segmented by Only One of the Methods
The biologically most relevant nuclei, i.e., the nuclei larger than 2,400 pixels, were split into three groups (Supporting Information):
Group 1: Cell nuclei segmented by both methods (already analyzed)
Group 2: Cell nuclei segmented only by manual experts
Group 3: Cell nuclei segmented only by the automatic algorithm
We have checked if there are any systematic differences in characteristic nuclear parameters (e.g., nuclear area, mean nuclear gray level, and gray level entropy) between Group 1 and 2, and between Group 1 and 3. The differences between parameters obtained from the manual and automatic segmentation of nuclei of Group 1 have been analyzed above for all sizes of nuclei.
We have investigated whether the distributions in the two scattergrams: (a) mean nuclear gray level versus nuclear area, (b) gray level entropy versus nuclear area, are similar for the manual and the automatic segmentation of the group of nuclei that have been segmented both manually and automatically (Group 1), and whether there are differences between the scattergrams from manual segmentation in Group 2 and 1, and from Group 3 and 1.
We found that for Group 1 (the nuclei segmented by both methods), the differences between the manual and the automatic segmentation results are within acceptable limits. The comparison with Group 2 (the nuclei segmented only manually) shows that the distribution of nuclei that are left out by the automatic segmentation algorithm is not biased with respect to area, average gray level, or gray level entropy. Group 3 (the nuclei segmented only by the automatic algorithm), on the other hand, seems to be biased towards smaller area and a slightly larger range of mean nuclear gray level and gray level entropy.
An Expert Visual Evaluation
As a separate test, we have performed an expert visual inspection of both the manual and the automatic segmentations of the biologically most relevant nuclei (area > 2,400 pixels), where the 25% threshold on under-segmentation was not in effect. This corresponds to a situation where the "ground truth" is not available for comparison with the result from the automatic segmentation algorithm. About 2.5% of the manually segmented nuclei were judged to be erroneous, whereas 12% of the automatically segmented nuclei were erroneous; since the automatic method segmented more nuclei in total, it still yielded a few more correctly segmented nuclei than the manual method.
Evaluation of the Method on Four Cases of Prostate Cancer in Another Study
The manual inspection of the segmented nuclei of four cases of prostate cancer in another study showed that between 93 and 98% of the epithelial cancer cell nuclei were segmented with good quality (Table 2). We regard 2–7% erroneously segmented nuclei as a very good result. Depending on the downstream use in the nucleotyping system, one may choose to manually exclude these nuclei before further analysis, or follow the old principle of "doing more less well," allowing an almost unlimited number of nuclei since manual editing is not required.
Table 2. The automatic segmentation was applied to 260 cases of prostate cancer in another study

| Case | No. frames | No. nuclei | % very accurate | % acceptable | % error |
| --- | --- | --- | --- | --- | --- |
The nucleotyping system typically measured about 2,500–3,000 frames/case, and a maximum limit of 5,000 frames was set for the largest cases, resulting in about 30,000 nuclei/case. The segmentation of a typical frame image took about 2 min; using our cluster of 192 cores, about 200 frames were segmented every 2 min, and all the frames of the 260 cases were segmented in about 5 days.
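A rough back-of-the-envelope check shows that these throughput figures are mutually consistent:

```python
# Rough consistency check of the throughput figures quoted above:
# ~2 min per frame on one core, 192 cores, ~3,000 frames/case, 260 cases.
total_frames = 260 * 3000          # ~780,000 frames in total
frames_per_min = 192 / 2.0         # ~96 frames/min (~200 per 2 min)
days = total_frames / frames_per_min / (60 * 24)
# days comes out between 5 and 6, consistent with the ~5 days reported
```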
Robustness of the Method to Changes in Parameter Values
The result of the proposed segmentation method depends on a sequence of image processing steps containing a large number of parameter values. We have analyzed the importance of including each step and the robustness or stability of the method by changing one parameter value at a time, keeping the other parameter values constant. These tests were performed on the image frames from Case 1, considering all nuclei >450 pixels.
The adaptive thresholding of the gray level frame images is clearly important (Fig. 2a) and the postprocessing step of Yanowitz and Bruckstein (27) is crucial in removing false objects from the binary frame image (Fig. 2b). A δ value of 0.2 (in the Niblack method) gave good results, but changing this to 0.1 or 0.3 did not influence the sensitivity or specificity. The object validation step of Yanowitz and Bruckstein involves a threshold on the gradient along the perimeter of the object candidate (here chosen as 0.1). A 50% change in this parameter did not influence the sensitivity or specificity.
Another crucial component of the method is the GVF snake. Initial tests using a traditional snake (21) required a very precise initialization, and therefore gave poor results. Using the GVF snake with an annular edge map, on the other hand, produced very satisfactory results. Halving the elasticity and rigidity parameters (α = 5, β = 10) reduced the number of detected nuclei from 71 to 67% of the manually segmented nuclei, but did not affect sensitivity and specificity. Reducing the external force weight from κ = 0.6 to 0.5 increased the number of detected nuclei from 71 to 73% of the manually segmented nuclei, but again this did not affect sensitivity or specificity. The number of iterations (of the snake deformation) was set to 30. Increasing the number of iterations did not improve the results, but the same results were obtained by just 10 iterations. The GVF regularization coefficient, on the other hand, will influence the results. It was set to μ = 0.2, and increasing it to μ = 0.3 brought the sensitivity down from 95 to 94%, the specificity down from 96 to 89%, and the number of detected nuclei from 71 to 60% of the manually segmented nuclei. Decreasing it to μ = 0.1 lowered the number of detected nuclei from 71 to 69% of the manually segmented nuclei but did not affect the sensitivity or specificity.
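To make the role of the regularization coefficient μ concrete, the following sketch computes the GVF field in the iterative scheme of Xu and Prince (28, 29). It is a minimal illustration, not our implementation: the snake deformation itself, the annular edge map, and the stopping criteria are omitted.

```python
import numpy as np
from scipy import ndimage

def gvf_field(edge_map, mu=0.2, iters=80):
    """Gradient vector flow field (Xu & Prince): iteratively diffuse the
    gradient (fx, fy) of an edge map into homogeneous regions, so that a
    snake initialized some distance away is still pulled toward the
    nuclear boundary. mu is the regularization coefficient; larger mu
    gives a smoother field at the cost of edge localization."""
    fy, fx = np.gradient(edge_map.astype(float))
    u, v = fx.copy(), fy.copy()
    mag2 = fx ** 2 + fy ** 2   # data term weight: strong near true edges
    for _ in range(iters):
        u += mu * ndimage.laplace(u) - mag2 * (u - fx)
        v += mu * ndimage.laplace(v) - mag2 * (v - fy)
    return u, v
```

The diffusion term (weighted by μ) dominates in flat regions and extends the capture range of the snake, while the data term (weighted by the squared gradient magnitude) pins the field to the true edges; this is why the results are more sensitive to μ than to the snake's elasticity and rigidity parameters.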
DISCUSSION
We have developed a method for automatic segmentation of cell nuclei from Feulgen-stained tissue sections of prostate cancer. The algorithm was developed on a few frames and tested on a total of 30 frames from three patients. Comparing the results from the automatic algorithm to the manual segmentation of the same set of frames, the automatic method segmented a few more nuclei than the manual method, but about 73% of the manually segmented nuclei were also segmented by the automatic method.
A total of 924 cell nuclei were both manually and automatically segmented. An accuracy analysis indicated a very close correlation between the areas of the manual and automatic segmentations of the same nuclei (r = 0.99). The mean sensitivity was 95% and the mean specificity was 96%. There was, however, a systematic bias towards a slightly larger manual nuclear area, and a spread consistent with a 1–2 pixel wide perimeter outside/inside the manually segmented nuclei. Scattergrams of characteristic nuclear parameters have verified that for the nuclei segmented by both methods, the differences between the manual and the automatic segmentation results are within acceptable limits.
We have also verified that the distribution of nuclei that are left out by the automatic segmentation algorithm (i.e., the nuclei segmented only by manual experts) is not biased with respect to area, mean gray level or gray level entropy. The distribution of nuclei segmented only by the automatic algorithm, on the other hand, seems to be biased towards smaller area and a somewhat larger range of mean nuclear gray level and gray level entropy.
The objects that were segmented only by the automatic method could be: (i) correctly segmented nuclei, (ii) incorrectly segmented nuclei, or (iii) false objects (“nonnuclei”). Segmented objects that should not be included in further texture analysis are handled by the automatic cell classification system and the manual inspection performed after automatic segmentation and cell classification. An independent test of the method on four cases in another study showed that between 93 and 98% of the epithelial cancer cell nuclei were segmented with good enough quality for further texture analysis.
The reproducibility of both the manual and the automatic segmentations remains an interesting question. This could be studied by letting more individuals perform manual segmentations of the same image fields, and by repeating the automated segmentation of the same microscopic fields digitized multiple times, assessing possible statistical differences between the multiple manual and computer segmentations. Such a task is, however, a whole new study, and we have therefore not attempted to pursue this line in the present article.
A quantitative evaluation of the segmentation results is vitally important. This is true not only because we need a quantified agreement to replace manual segmentation with an automatic procedure. It is definitely needed if we are to prove the validity of a given method on a data set that is much too large to inspect and evaluate manually. Thus, the assessment of segmentation accuracy, as well as the intercomparison of the three groups of object parameter distributions, should be streamlined and incorporated into the segmentation algorithm. Then, the testing and resetting of algorithm parameters for convergence towards an optimal segmentation result could be more or less automatic. Ideally, the evaluation of accuracy and bias could be constructed as a feedback mechanism, using an automatic gradient search for an optimal set of parameters to the segmentation algorithm. But even at the present stage, we believe that our approach opens the possibility of large-scale nuclear feature analysis based on automatic segmentation of cell nuclei in Feulgen-stained histological sections.