Cervical cancer is the third most common cancer in women worldwide, with an estimated 529,000 new cases and 275,000 deaths in 2008. More than 90% of cervical cancers are caused by human papillomavirus (HPV) infections, which cause changes in epithelial cells before many types of cervical cancer develop. Therefore, in developed countries, cytological screening is the most common approach to preventing cervical cancer at a precancerous stage. However, population-wide screening is unavailable in low-resource regions and is suboptimal in developing countries. Consequently, more than 85% of new cases and about 88% of deaths from cervical cancer occur in developing and underdeveloped countries.
Screening of cervical cytology slides is “very labor intensive and demands that the cytotechnologist be capable of high levels of concentration for extended periods”. Automation-assisted reading techniques have the potential to reduce screening errors and increase availability, especially in developing countries. Currently, two Food and Drug Administration (FDA)-approved automated reading systems [4, 5] are commercially available. However, a large, prospective randomized trial found that although these systems can increase productivity, their sensitivity is lower than that of manual reading [6, 7].
Our goal is to explore a cost-effective and highly sensitive screening technique that is more likely to reach the population in developing countries. The proposed technique combines a considered choice of preparation and staining methods with automated image acquisition and assisted diagnosis.
Several different methods are clinically accepted for preparing cytological slides for cervical cancer screening. The sample may be smeared on a slide directly after collection, or it may be prepared using liquid-based cytology (LBC). The LBC may be manual (MLBC), which requires only a centrifuge, a vortex mixer, and a pipettor, or automated (ALBC), which requires a commercial machine. Pap smears have been used for many years and are clinically well accepted. However, cells are better dispersed on an LBC slide, making it easier for automated image analysis to identify individual cells. Recent studies suggest that the performance of these three methods is similar [10, 11].
Three stains are widely used for cervical cancer screening: Papanicolaou (Pap) stain, proprietary stains (provided with commercial machines), and hematoxylin and eosin (H&E) stain. Compared to Pap stain, H&E stain is much easier to prepare, lower in cost, and more consistent; it is therefore used most often in histopathology, but also in cytopathology. Proprietary stains are typically more costly than the other two stains. Figure 1 shows typical images from the conventional smear with Pap stain, the ALBC with proprietary stain, and the MLBC with H&E stain. It can be seen that the cells in Fig. 1(c) are more dispersed than in Fig. 1(a), but with more artifacts than in Fig. 1(b).
In choosing a specific slide preparation method and staining technique, we attempted to minimize overall cost and optimize clinical outcome. Although ALBC has its advantages, MLBC does not require advanced skill or experience and can be reliably performed by a technician after some simple training. Therefore, in a community health center (CHC) setting, the hardware cost of ALBC is not justified. Our informal survey showed that many CHCs and small hospitals are well prepared to perform MLBC. Finally, H&E stain is the least expensive and simplest to use of the three stains, and has some clinical acceptance. Therefore, in our study, we chose MLBC with H&E stain.
Previous Works in Cervical Cell Segmentation
A variety of segmentation methods for cervical cells have been proposed in recent years. Most cytoplasm segmentation methods use one or more of the following techniques: K-means [12, 13], edge detection, thresholding [15, 16], and active contours [13, 17]. Most of these works are designed for images of isolated cells, especially those in the Herlev data set. For segmentation in images containing multiple cells, thresholding [15, 16, 19] and level set techniques have been used. For nucleus segmentation, related works can be divided into three groups: 1) single-nucleus segmentation, which utilizes contour or shape information through an active contour model [13, 20], parametric fitting, and difference maximization; 2) multiple-nuclei segmentation, which uses thresholding [17, 22], the Hough transform, and morphological (watershed) [16, 23, 24] techniques; and 3) touching-nuclei splitting, which uses morphological erosion, Bayesian classification, and an active shape model.
Two of the aforementioned methods [15, 16] can segment cytoplasm, multiple nuclei, and touching nuclei. However, these methods were developed on healthy cells rather than on a mix of healthy and pathological cells. Since the size, shape, and chromatin of abnormal nuclei vary significantly, further development is needed to address the typical variations encountered in a clinical setting. So far, we have found only one previous study of automated segmentation of abnormal nuclei, and it reported only very limited data.
More recently, nuclei in histological images have been segmented using adaptive thresholding combined with an active contour model. This automated method is comparable to manual delineation in segmentation accuracy. The graph cut (GC) approach has recently become highly attractive for cell nucleus segmentation. The binarization of nuclei based on GC is addressed in Ref.  with results more accurate than global thresholding. Prior knowledge such as shape, manual annotation, and local image features can be incorporated into the GC framework to allow more robust segmentation.
Previous Works in Cervical Cell Classification
Several cervical cell classification methods have been reported. These methods can be divided into two categories according to their main tasks. The first task is to eliminate noncellular artifacts such as debris and inflammatory cell clusters. Typically, shape, size, and intensity features [16, 24, 33, 34] are exploited to identify artifacts. For classifier training, linear discriminant analysis, maximum likelihood, and support vector machines (SVMs) [16, 24] are used. A feature thresholding technique is used in Refs.  and  for artifact elimination in Pap smears. The second task is to classify cells as abnormal or normal. Extraction of cell features is the major area of research in this task. A wide range of feature types has emerged [3, 22, 36, 37], including optical density, size, shape, texture, contextual information, and whole image measurements. An important work is conducted in Ref. , where a benchmark data set and 20 features are constructed for cervical cells. On this data set, the most informative features are the nucleus/cytoplasm (N/C) ratio, the brightness of nuclei and cytoplasm, and the longest linear dimension and area of nuclei, as demonstrated in Ref.  using a genetic algorithm combined with a nearest neighbor classifier. Recently, in Ref.  it was proposed to classify segmented cell images using a linear plot of two-dimensional (2-D) Fourier and logarithmic transforms.
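As an illustration of the N/C ratio feature mentioned above, the following is a minimal sketch and not the procedure of any cited work. It assumes binary masks given as nested lists, a nucleus mask that lies inside the cell mask, and a definition of the ratio as nucleus area divided by cytoplasm area (some works divide by whole-cell area instead). The name `nc_ratio` is hypothetical.

```python
def count_pixels(mask):
    """Count the foreground (truthy) pixels in a binary mask given as nested lists."""
    return sum(1 for row in mask for value in row if value)


def nc_ratio(nucleus_mask, cell_mask):
    """Nucleus/cytoplasm area ratio.

    Assumption: the nucleus mask is a subset of the cell mask, and the
    cytoplasm is the cell region with the nucleus removed.
    """
    nucleus_area = count_pixels(nucleus_mask)
    cytoplasm_area = count_pixels(cell_mask) - nucleus_area
    if cytoplasm_area <= 0:
        raise ValueError("cell mask must be larger than the nucleus mask")
    return nucleus_area / cytoplasm_area


# Toy example: a 4x4 cell containing a 2x2 nucleus.
cell = [[1, 1, 1, 1] for _ in range(4)]
nucleus = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
ratio = nc_ratio(nucleus, cell)  # 4 nucleus pixels over 12 cytoplasm pixels
```

Under this definition, an enlarged nucleus in a cell of unchanged size gives a larger ratio.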
Most studies assume that accurate segmentation of cytoplasm and nuclei has already been obtained. A number of studies have investigated classification schemes that either refine classification results by cell patch matching or directly classify abnormal cells without segmentation. Recently, cell classification based only on nuclear features has been studied.
In this article, we propose the first integrated, automation-assisted system for cervical cancer screening on H&E-stained MLBC slides. The automatic selection of abnormal cells from cervical cytology specimens comprises three aspects: image acquisition, cell segmentation, and cell classification. An autofocusing method that rejects the coverslip and successfully finds the actual focal plane is introduced. A global and local scheme is proposed to segment both healthy and abnormal cervical cells. A classification framework is designed to improve the sensitivity of abnormal cell recognition and the specificity of normal cell recognition. The specific contributions of the presented work are:
- A Gaussian filter is used as the focus function, and a search method based on iterative comparison of image quality at specific locations is proposed to find the global maximum of the focus curve.
- The global multiway GC on a* channel enhanced images obtains effective cytoplasm segmentation when image histograms present a nonbimodal distribution, whereas the local adaptive GC (LAGC) method obtains accurate nucleus segmentation by combining intensity, texture, boundary, and region information.
- Feature selection and preprocessing techniques are used to improve the sensitivity, and features which capture contextual and cytoplasmic information are introduced to improve the specificity.
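The idea of searching a focus curve for its global maximum, as in the first contribution, can be sketched as a coarse-to-fine search. This is an illustrative sketch under stated assumptions and not the search method proposed in this article. It assumes the focus measure is available as a function of stage height `z`. The names `find_focus` and `synthetic_focus` are hypothetical, and the synthetic curve stands in for a real focus function, with a minor peak that mimics an out-of-plane surface such as the coverslip.

```python
import math


def find_focus(measure, z_min, z_max, steps=10, rounds=4):
    """Coarse-to-fine search for the z position that maximizes measure(z).

    Assumption: the first (coarse) pass samples densely enough to land
    near the global peak, so later rounds only refine around it.
    """
    lo, hi = z_min, z_max
    best_z, best_score = None, float("-inf")
    for _ in range(rounds + 1):
        step = (hi - lo) / steps
        for i in range(steps + 1):
            z = lo + i * step
            score = measure(z)
            if score > best_score:
                best_z, best_score = z, score
        # Narrow the window to one step on either side of the best sample.
        lo = max(z_min, best_z - step)
        hi = min(z_max, best_z + step)
    return best_z


def synthetic_focus(z):
    """Hypothetical focus curve: minor peak at z=20, main peak at z=68.5."""
    minor = 0.4 * math.exp(-((z - 20.0) / 3.0) ** 2)
    main = math.exp(-((z - 68.5) / 4.0) ** 2)
    return minor + main


focal_plane = find_focus(synthetic_focus, 0.0, 100.0)
```

Because the search keeps the best score seen over the whole range, the minor peak is ignored as long as the coarse pass samples the main peak at a higher value. A real system would need a coarse step matched to the width of the focus peak.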