Interest in understanding virus-host cell interactions has increased in recent years (1). It is believed that targeting specific proteins within the host cell instead of viral components may lead to significant improvements in antiviral treatments. Host functional genetic profiling using RNA interference (RNAi) screens provides a systematic approach to obtain a comprehensive overview of cellular pathways that are exploited by viruses. RNAi screening allows for the systematic knockdown of each gene of the host cell's genome (2) to determine the effect of cellular gene silencing on virus infection (3).
Our overall aim is the genome-wide identification of cellular genes involved in virus entry and replication. To screen for changes in virus infection when knocking down a certain cellular gene we use small interfering RNA (siRNA) cell arrays in combination with high-throughput fluorescence microscopy (4). With this approach, siRNA and transfection reagents are robotically spotted on a chamber plate at known locations in a grid pattern. As cells are cultured and treated on the printed plates, only those cells located within a printed spot area take up siRNA to undergo gene silencing. For each spot area one two-channel image is acquired using a fully automated fluorescence microscope. The advantages of cell arrays over multi-well plates, which are often used for RNAi screening (5, 6), are identical treatment of all samples with respect to cell seeding, infection, and staining, as well as significantly reduced costs due to lower reagent usage.
Genome-wide screens with more than 20,000 genes can generate more than 200,000 fluorescence images, and therefore fully automatic, efficient, and robust image analysis methods are needed. In recent years, a number of approaches for cell nuclei or whole cell segmentation of fluorescence microscopy images have been reported (e.g., 7–11). In high-throughput applications, approaches based on adaptive thresholding (e.g., 12, 13) yielded good results for cell nuclei segmentation, in particular, with unclustered nuclei. To separate clusters of nuclei, watershed-based techniques (e.g., 12, 14, 15) and approaches employing geometric properties (16) have been proposed. Recently, an approach based on multiscale entropy-based thresholding and region merging has been described (17). An approach based on the zero-crossings of the Laplacian was used in (18). For the segmentation of whole cells or the cytoplasm, approaches based on deformable models (19), Voronoi diagrams (20), or a combination of both (21) have been proposed. Nevertheless, none of these approaches can cope with all image types, and one has to carefully select and adapt algorithms for a particular application.
The key tasks for image analysis in our application are cell nucleus segmentation, detection of regions with transfected cells (siRNA spots), quantification of the virus infection level, and quality control of single images as well as whole plates. In this article, we describe the workflow and image analysis approaches of a system addressing these key tasks. In particular, we propose novel approaches for (i) the segmentation of cell nuclei using a gradient-based thresholding scheme which does not require subsequent postprocessing steps for separating clustered nuclei, (ii) the segmentation of viral protein expression in the cytoplasm based on a combination of a watershed algorithm and region growing, (iii) the localization of regions with transfected cells within cell array images, (iv) cell classification as infected or noninfected, and (v) the detection of out-of-focus images for the purpose of automatic quality control, which increases the discrimination power by taking into account only pixels near segmented boundaries. We have evaluated the main algorithms of our system and compared their performance with that of previous approaches. Our whole approach was applied to a large number of images from siRNA high-throughput screens of cells infected by either dengue virus or hepatitis C virus. To the best of our knowledge, this is the first image-based approach for automatic single-cell-based quantification of viral replication from large datasets that measures the level of virus replication in transfected cells only and allows for image quality control.
MATERIALS AND METHODS
Image Analysis Workflow
In our application, the input for image analysis consists of two-channel images acquired from a chamber plate with printed siRNA spots. Each image corresponds to one siRNA spot (Fig. 1). The first channel displays DAPI stained cell nuclei. The second channel represents expressed fluorescently stained viral protein. To analyze these images we have developed a workflow and image analysis approaches as depicted in the diagram in Fig. 2. First, the cell nuclei are segmented in Channel 1. Then, a binary mask of pixels defining the neighborhood of each nucleus is computed and the viral signal is measured in Channel 2. Next, siRNA spot areas with transfected cells are localized exploiting prior knowledge about the layout of the cell arrays. Then, the cells are classified as infected or noninfected. Finally, the quality of single images and of a whole plate is checked. Each step is described in detail below.
Segmentation of Cell Nuclei
The cell nuclei in our images manifest themselves as bright objects on a dark background. The average intensity is related to the DNA density; it varies between nuclei and depends on the cell cycle. Also, the images generally suffer from uneven illumination and varying staining. Therefore, it is difficult to find a single global threshold suitable for all nuclei. Even if adaptive thresholding approaches are used and the thresholds are determined in image sub-regions, it is difficult to cope well with clustered cell nuclei. Here, we propose an edge-based approach which analyzes gradient magnitude images instead of thresholding the original image intensities.
In the 1D case, edges can be determined by computing the gradient (first derivative) of the signal and locating points that have locally maximal gradient magnitudes. In the 2D case, edges correspond to ridge points of gradient magnitude images. Points close to edges can be determined by thresholding the gradient magnitude image. An alternative approach for edge finding is to calculate second derivatives of an image and to locate zero-crossings of the Laplacian operator. A useful property of the Laplacian operator is that zero-crossings always produce closed boundaries. Another property is that for bright objects on a dark background the Laplacian is positive on the dark side of an edge and negative on the bright side of an edge (22). In our approach we exploit this property to distinguish between interior and exterior cell nuclei parts and to improve the segmentation result for clustered cells.
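The sign property of the Laplacian can be illustrated with a small 1D example. The following is a minimal sketch, not part of the described system; the function name `second_difference` and the sample signal are hypothetical, and the discrete second difference is used as a stand-in for the Laplacian.

```python
def second_difference(signal):
    """Discrete 1D Laplacian s[i-1] - 2*s[i] + s[i+1] for interior samples."""
    return [signal[i - 1] - 2 * signal[i] + signal[i + 1]
            for i in range(1, len(signal) - 1)]

# Hypothetical smooth edge from a dark background (0) to a bright object (10).
signal = [0, 0, 0, 1, 3, 7, 9, 10, 10, 10]
laplacian = second_difference(signal)

# Positive values lie on the dark side of the edge, negative values on the
# bright side; the sign change sits where the slope of the signal is steepest.
print(laplacian)
```

Selecting only samples with a negative value therefore keeps the bright (object) side of the edge.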
The main idea of our approach is to detect pixels on the bright side of edges that have a large gradient magnitude and to subsequently morphologically process the result to obtain the final segmentation. We have observed that cell nuclei with different average intensities that are close to each other are often well separated by determining regions for which the Laplacian operator yields negative values (Fig. 3B). Relevant edge regions are detected by thresholding the gradient magnitude (Fig. 3C). Combining the results of these two operations improves the separation of clustered nuclei without requiring additional postprocessing steps.
The whole scheme for nuclei segmentation consists of six main steps (see Fig. 3). In the first step, a binary image f is obtained from the input image g by combining the results of the gradient magnitude and the Laplacian operator (Figs. 3B–D):
$$ f(x,y) = \begin{cases} 1 & \text{if } |\nabla g(x,y)| > T \text{ and } \nabla^2 g(x,y) < 0 \\ 0 & \text{otherwise} \end{cases} $$
where $\nabla$ denotes the Nabla operator, $|\nabla g| = \sqrt{g_x^2 + g_y^2}$, $\nabla^2 g = g_{xx} + g_{yy}$, with $g_x$, $g_y$, $g_{xx}$, $g_{yy}$ denoting first and second order partial derivatives of g. To reduce the influence of noise, the image is smoothed by a Gaussian before applying the Nabla and Laplacian operators. In both cases, we used σ = 1 for the standard deviation of the Gaussian (determined empirically). We automatically determine the threshold T by using the unimodal background symmetry method (23), which assumes that there is one dominant peak in the histogram corresponding to the background pixels (outside edges). The position p of the peak is found and the width w at half of the maximum is computed. The threshold T is set to p + kw, where k is a parameter (in our case we used k = 1/2).
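The first step can be sketched as follows. This is an illustrative sketch under several assumptions, not the implementation used in this work: it relies on NumPy, approximates the derivatives by central differences, interprets w as the full width of the histogram peak at half maximum, and uses the hypothetical function names `gaussian_smooth`, `background_symmetry_threshold`, and `nuclei_edge_mask`.

```python
import numpy as np

def gaussian_smooth(image, sigma=1.0):
    """Separable Gaussian filter with edge padding (kernel truncated at 3 sigma)."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def background_symmetry_threshold(values, k=0.5, bins=256):
    """T = p + k*w from the dominant histogram peak (assumed to be background)."""
    hist, edges = np.histogram(np.ravel(values), bins=bins)
    peak = int(np.argmax(hist))
    half = hist[peak] / 2.0
    left = right = peak
    while left > 0 and hist[left - 1] >= half:
        left -= 1
    while right < bins - 1 and hist[right + 1] >= half:
        right += 1
    p = 0.5 * (edges[peak] + edges[peak + 1])  # position of the peak
    w = edges[right + 1] - edges[left]         # assumed: full width at half maximum
    return p + k * w

def nuclei_edge_mask(g, sigma=1.0, k=0.5):
    """Pixels with large gradient magnitude on the bright side of an edge."""
    s = gaussian_smooth(g, sigma)
    gy, gx = np.gradient(s)                    # derivatives along rows and columns
    grad_mag = np.hypot(gx, gy)
    laplacian = np.gradient(gx, axis=1) + np.gradient(gy, axis=0)
    threshold = background_symmetry_threshold(grad_mag, k)
    return (grad_mag > threshold) & (laplacian < 0)
```

For a single bright disk on a dark background, the returned mask forms a ring just inside the disk boundary; the interior is filled only by the later closing and hole-filling steps.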
In the second step of our approach, small objects are removed (Fig. 3E). An object is considered small if it contains fewer pixels than a given constant. This constant was determined by analyzing the histogram of the sizes of segmented objects in a number of images after the first step (in our application, we used 80 pixels for this constant). Third, connected components of pixels are labeled (Fig. 3F). In this step a unique identifier is assigned to each 8-connected component of pixels to identify the objects. In the fourth step, each object is morphologically closed with a small disk structuring element, i.e., background structures that cannot contain the structuring element are added to the object (Fig. 3G). We used a 3 × 3 structuring element (24). In this step each object is treated individually, which prevents merging of objects with different identifiers. Fifth, holes in each object are filled by a standard hole-filling algorithm (24) (Fig. 3H), and finally, cell nuclei are identified (Fig. 3I) based on size, intensity level, and circularity. The appropriate range for each feature was determined based on an analysis of real image data and experimental experience.
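The second and third steps (removal of small objects and labeling of 8-connected components) can be sketched in plain Python as follows. The function names `label_components` and `remove_small_objects` are hypothetical, the binary image is assumed to be a list of rows, and the closing, hole-filling, and feature-based selection steps are not shown.

```python
from collections import deque

def label_components(mask):
    """Label 8-connected components; returns (labels, sizes), labels start at 1."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    sizes = {}
    next_label = 1
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or labels[y][x]:
                continue
            labels[y][x] = next_label
            queue = deque([(y, x)])
            count = 0
            while queue:  # breadth-first flood fill over the 8-neighborhood
                cy, cx = queue.popleft()
                count += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
            sizes[next_label] = count
            next_label += 1
    return labels, sizes

def remove_small_objects(mask, min_size):
    """Keep only components with at least min_size pixels."""
    labels, sizes = label_components(mask)
    return [[1 if lab and sizes[lab] >= min_size else 0 for lab in row]
            for row in labels]
```

With the constant given above, the call would be `remove_small_objects(f, 80)`, where `f` stands for the binary image from the first step.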
Quantification of Changes in Virus Infection
For quantifying changes in viral infection after knocking down a certain cellular gene, there exist two main approaches. Either (i) the level of expressed viral protein (i.e., the viral signal) can be measured for each cell and the changes are quantified for each siRNA spot, e.g., by comparing the mean viral signal over all cells in the siRNA spot or (ii) the percentage of infected cells (called infection rate) can be computed for each siRNA spot and the changes in infection rates are studied. In both cases the viral signal must be determined for each cell. In the second approach, the cells must also be classified as infected or noninfected.
In our case, we compute the viral signal for each segmented cell nucleus in its neighborhood by computing the mean of the pixel values in the virus signal channel (Channel 2). To prevent overlaps between neighborhoods of different cells we partition an image into influence zones (IZs) of segmented nuclei (Figs. 4A and 4B). The IZs are computed using a seeded watershed transform of the Gaussian filtered (σ = 1) and inverted virus channel (Channel 2) with the segmented cell nuclei as initial seeds. We have implemented three different approaches for defining the neighborhood of a cell nucleus based on: (i) dilating the cell nucleus mask (Fig. 4C), (ii) dilating the cell nucleus mask inside its IZ (Fig. 4D), and (iii) region growing inside the IZ (Fig. 4E). The region growing algorithm is started from the pixels at the cell nucleus boundary. For each IZ, the mean (μIZ) and standard deviation (σIZ) of the pixel values at the cell nucleus boundary are computed. All pixels within an IZ with intensities inside the range [μIZ − kσIZ, μIZ + kσIZ] that are connected to the cell nucleus boundary are included (we used k = 1). Pixels connected to the cell nucleus boundary are determined using morphological reconstruction by dilation (24).
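The region growing variant (iii) can be sketched for a single influence zone as follows. This is a simplified illustration and not the implementation used here: it replaces morphological reconstruction by an equivalent breadth-first search, assumes 4-connectivity and the population standard deviation, and always keeps the boundary pixels themselves as seeds. The function name `grow_from_nucleus` and all argument names are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev

NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def grow_from_nucleus(virus, nucleus, zone, k=1.0):
    """Return the set of (row, col) pixels grown from the nucleus boundary.

    virus   -- Channel 2 intensities (list of rows)
    nucleus -- binary mask of one segmented nucleus
    zone    -- binary mask of the influence zone of that nucleus
    """
    h, w = len(virus), len(virus[0])

    def inside(y, x):
        return 0 <= y < h and 0 <= x < w

    # Boundary pixels: nucleus pixels with at least one non-nucleus neighbor.
    boundary = [(y, x) for y in range(h) for x in range(w)
                if nucleus[y][x] and any(
                    not inside(y + dy, x + dx) or not nucleus[y + dy][x + dx]
                    for dy, dx in NEIGHBORS)]
    values = [virus[y][x] for y, x in boundary]
    mu, sd = mean(values), pstdev(values)
    low, high = mu - k * sd, mu + k * sd

    region = set(boundary)
    queue = deque(boundary)
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            if (inside(ny, nx) and (ny, nx) not in region and zone[ny][nx]
                    and low <= virus[ny][nx] <= high):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region
```

The viral signal of the cell would then be the mean of `virus[y][x]` over the returned pixels.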
The cells are classified as infected or noninfected according to the estimated viral signal. Cells with a viral signal less than a threshold are classified as noninfected, whereas the others are classified as infected. The threshold is determined automatically by maximizing the difference in infection rates between positive and negative controls. In positive controls the viral protein production is blocked and therefore the signal is reduced. In negative controls the virus replication is not altered. To detect changes in viral infection we use a measure denoted as infection rate ratio (IRR), which is defined by
$$ IRR_i = \frac{IR_i}{IR_N} $$
where $IR_i$ is the infection rate within the siRNA spot i and $IR_N$ is the normal infection rate of a virus, i.e., the percentage of infected cells without knocking down a certain gene.
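The threshold selection and the infection rate ratio can be sketched as follows. This is a minimal sketch with hypothetical function names and made-up signal values; it assumes that candidate thresholds are taken from the observed signal values of the control cells, which is one of several possible search strategies.

```python
def infection_rate(signals, threshold):
    """Fraction of cells classified as infected (signal not below threshold)."""
    return sum(s >= threshold for s in signals) / len(signals)

def best_threshold(negative_signals, positive_signals):
    """Threshold maximizing IR(negative controls) - IR(positive controls)."""
    candidates = sorted(set(negative_signals) | set(positive_signals))
    return max(candidates,
               key=lambda t: infection_rate(negative_signals, t)
               - infection_rate(positive_signals, t))

def infection_rate_ratio(ir_spot, ir_normal):
    """Infection rate of a spot relative to the normal infection rate."""
    return ir_spot / ir_normal

# Hypothetical per-cell viral signals from control spots.
negative = [80, 90, 100, 70]   # replication not altered
positive = [10, 15, 20, 30]    # viral protein production blocked
threshold = best_threshold(negative, positive)
print(threshold, infection_rate(negative, threshold),
      infection_rate(positive, threshold))
```

An infection rate ratio well below 1 then indicates a spot in which the knockdown reduced the infection.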
Determination of Circular Regions with Transfected Cells
Only cells that are located within printed siRNA spots can be transfected and only those should be quantified (note that changes in the viral signal can be observed only in transfected cells). Cells outside printed circular regions (siRNA spots) cannot take up siRNA (neglecting rare migration of cells) and thus should exhibit a normal virus signal. Therefore, it is important to localize siRNA spot areas with transfected cells to select cells that are located within printed siRNA spots. The problem of finding siRNA circular areas is not easy because cells are present on the whole plate and if a particular siRNA has no effect on virus infection there is no change in viral signal, which could be exploited to find the spot. On the other hand, if the knockdown of a certain gene leads to a change in virus signal a clear difference between transfected and nontransfected cells is observable (e.g., Fig. 1B).
A usual approach for localizing siRNA spot areas in cell arrays is to manually determine a circle within the first image and to apply the same position for the other images. In this case, the position of the first spot is marked on the plate with a pen by a biologist in a laboratory. However, using the same spot position for all images does not take into account possible tilting of the whole plate and hence errors are generally introduced. To improve the accuracy, we have developed an automatic approach for localizing siRNA spots directly from the image data. This approach combines local information extracted from single images and global information about the dimension and layout of the printed spots.
The idea is to detect siRNA spot areas in images with altered viral protein expression and to extrapolate the locations of the areas to other images by using prior knowledge about the printed grid (spot diameter, spotting distance). Our approach for localizing siRNA spots consists of two steps:
1. For each image g of a plate, a position [x, y] of a circle of known fixed diameter d is found, for which the difference $d_g$ between the mean viral signal $\mu_{IN}$ of cells inside the circle and the mean viral signal $\mu_{OUT}$ of cells outside the circle is maximal, i.e., $[x, y] = \arg\max_{[x', y']} d_g(x', y')$.
2. All images with differences $d_g$ in a certain range are selected and a grid of known parameters is fitted to the computed circle positions [x, y] using a least-squares approach. The idea behind selecting only differences in a certain range is to choose only those images from Step 1 for which the siRNA spot was detectable.
The appropriate range of differences was determined based on simulations. We found that too large or too small differences were correlated with erroneous estimations of circle positions.
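The grid fitting of Step 2 can be illustrated by a least-squares rigid fit. The sketch below is an assumption about one possible formulation, not the fitting procedure used in this work: it models the grid by a rotation and a translation (to account for plate tilt), expects circle positions in plate coordinates, and uses the hypothetical function names `fit_grid` and `grid_position`.

```python
import math

def fit_grid(indices, measured, spacing):
    """Least-squares rotation and translation of a regular grid.

    indices  -- (row, column) grid index of each selected spot
    measured -- detected circle centers (x, y) in plate coordinates
    spacing  -- known spotting distance
    """
    nominal = [(col * spacing, row * spacing) for row, col in indices]
    n = len(nominal)
    px = sum(p[0] for p in nominal) / n
    py = sum(p[1] for p in nominal) / n
    qx = sum(q[0] for q in measured) / n
    qy = sum(q[1] for q in measured) / n
    dot = cross = 0.0
    for (ax, ay), (bx, by) in zip(nominal, measured):
        ax, ay, bx, by = ax - px, ay - py, bx - qx, by - qy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)      # optimal rotation angle in 2D
    c, s = math.cos(theta), math.sin(theta)
    offset = (qx - (c * px - s * py), qy - (s * px + c * py))
    return theta, offset

def grid_position(index, spacing, theta, offset):
    """Predicted spot center for a grid index under the fitted transform."""
    row, col = index
    x, y = col * spacing, row * spacing
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + offset[0], s * x + c * y + offset[1])
```

Once the transform is fitted to the selected images, `grid_position` extrapolates the spot location to images in which no spot was detectable.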
Since in high-throughput screening applications a large number of images need to be analyzed and some of them may be of poor quality (e.g., out-of-focus, no cells in certain areas, image artifacts), we need algorithms that can assess the quality of the data to exclude failures from statistics. Quality checks can be carried out on two levels: on the whole plate level and on the single image level.
On the whole plate level, the main goal is to sort out unsuccessful experiments. To this end, statistical parameters can be computed based on the results for positive and negative controls (25). In our approach, we use measures which are directly computed from the images. First, we calculate the percentage of saturated pixels in Channel 2 (viral protein). To limit the effect of image artifacts we determine saturated pixels only in the neighborhoods of identified cells. A high percentage of saturated pixels indicates overexposure. Overexposed plates can be excluded and the images can be reacquired with decreased exposure times.
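The saturation measure can be sketched as follows. The function name `saturated_percentage` is hypothetical, and the saturation value `max_value` is a parameter because the bit depth of the camera is not stated here.

```python
def saturated_percentage(channel, neighborhood, max_value):
    """Percentage of pixels inside cell neighborhoods that reach max_value.

    channel      -- Channel 2 intensities (list of rows)
    neighborhood -- binary mask of the neighborhoods of identified cells
    max_value    -- assumed saturation level of the detector
    """
    values = [v for row_v, row_m in zip(channel, neighborhood)
              for v, m in zip(row_v, row_m) if m]
    if not values:
        return 0.0
    return 100.0 * sum(v >= max_value for v in values) / len(values)
```

Restricting the count to the neighborhood mask keeps bright artifacts outside cells from inflating the measure.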
To visualize whole plate related problems, e.g., due to improper staining or cell seeding, we have implemented a graphical user interface (GUI) which displays all images of a plate in one overview tiled image. A user can view the original data, the segmentation results, as well as quality tags of single images. The images are tagged automatically as described below. The GUI plays a significant role especially during optimization of the sample preparation and validation of the automatic quality control of an experiment.
On the single image level, we automatically tag images as “low quality” if (i) the number of cells is outside a given range or if (ii) the images are classified as “out-of-focus”. The first criterion excludes images with too few or too many cells, which is typically related to uneven seeding. This criterion also excludes most out-of-focus images because no or only a small number of cell nuclei are usually recognized in blurred images. The images satisfying criterion (i) are classified as “out-of-focus” if the average gradient magnitude calculated from image regions near the boundaries of segmented cell nuclei is lower than a threshold. Our approach is motivated by the fact that the boundaries of segmented objects are related to image edges and the gradient magnitude of an edge point is related to image sharpness. The threshold is computed as the mean minus three standard deviations of the gradient magnitude calculated from all images of a plate. To increase the robustness of estimating the mean and standard deviation we exclude average gradient magnitude values larger than a certain threshold. This threshold was determined by the unimodal background symmetry method (23) as described earlier (we used k = 1).
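The plate-wide out-of-focus threshold can be sketched as follows. This is a minimal sketch with a hypothetical function name and made-up sharpness values; the upper limit used to exclude large values is passed in as a parameter rather than derived from the histogram, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev

def out_of_focus_flags(sharpness, upper_limit):
    """Flag images whose sharpness is below mean - 3*std of the plate.

    sharpness   -- per-image average gradient magnitude near nucleus boundaries
    upper_limit -- values above this limit are ignored when estimating
                   the mean and standard deviation
    """
    kept = [s for s in sharpness if s <= upper_limit]
    threshold = mean(kept) - 3 * pstdev(kept)
    return threshold, [s < threshold for s in sharpness]

# Hypothetical plate: mostly sharp images, one blurred (4.0), one outlier (40.0).
plate = [10.0] * 10 + [9.0] * 5 + [11.0] * 5 + [4.0, 40.0]
threshold, flags = out_of_focus_flags(plate, upper_limit=20.0)
print(threshold, flags.count(True))
```

In this example the large outlier would otherwise inflate the standard deviation so much that the blurred image is no longer flagged, which is the motivation for excluding large values.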
Samples and Image Acquisition
We have applied our whole approach to real image data of hepatitis C virus (HCV) and dengue virus (DV) high-throughput screening experiments. The cell arrays were prepared on chambered cover glass tissue culture plates (Nalge Nunc, Wiesbaden, Germany). Transfection reagents together with siRNAs were printed typically in a 12 × 32 grid using a chip writer compact robot (Bio-Rad) with solid pins (Point Technologies) resulting in a spot diameter of ∼400 μm. The siRNAs used in our experiments were taken from kinase or cytoskeleton libraries (Ambion). Seven different plates are needed for a kinase experiment and four different plates for a cytoskeleton experiment. Each experiment was repeated several times (between four and eight repetitions). Cells of a human hepatoma cell line (Huh7 (26) for DV and Huh7.5 (27) for HCV, respectively) were seeded on spotted plates and incubated for a certain time period (24–48 h). Subsequently, the cells were infected with a virus, namely a green fluorescent protein (GFP)-tagged HCV (28) or with DV 2 (strain New Guinea C). After 24–48 h the cells were fixed and fluorescently stained. Cell nuclei (cellular DNA) were labeled by DAPI and a viral protein was labeled by immunofluorescence. For DV we used primary antibody anti DENV E mouse monoclonal, HB46 [American Type Culture Collection (ATCC), Manassas, VA] and secondary antibody anti-mouse Alexafluor 546 (Invitrogen, Karlsruhe, Germany). For HCV we used a monoclonal anti-GFP-antibody (Roche Diagnostics GmbH, Mannheim, Germany) and secondary antibody anti-mouse Alexafluor 546 (Invitrogen, Karlsruhe, Germany). For each siRNA spot area and each channel one grayscale image was acquired using the high-content scanning system Scan^R (Olympus, Heidelberg, Germany) with magnification 10×, NA = 0.40, and CCD camera pixel size 6.45 μm × 6.45 μm. The typical image size is 1,344 × 1,024 pixels.
As an output of the high-throughput scanning, we typically obtain a set of 384 two-channel images for each plate of HCV or DV experiments.
Prior to applying the overall approach to real image data, we tested and evaluated single algorithms and compared them with previous approaches.
Segmentation of Cell Nuclei
The cell nucleus segmentation algorithm was evaluated using real image data, where ground truth was obtained from two experts who marked cell nuclei in randomly selected real images from different experiments (in total 1,914 nuclei were used). We compared the results of our approach with two previous approaches, namely adaptive thresholding by Otsu's method (29) (ATO) as well as ATO followed by a three-step model-based strategy for separating cell clusters (SCC) based on the watershed transform as implemented in CellProfiler (15, 30). With ATO 78.5% of cell nuclei were correctly segmented. ATO followed by SCC yielded 95.8% and using our gradient-based thresholding (GBT) approach we achieved a segmentation accuracy of 97.1% (1,859 correctly segmented nuclei), see Table 1. Our approach was particularly superior to ATO in segmenting clustered cells (compare Figs. 5A and 5C). SCC significantly improved the results obtained by ATO (compare Figs. 5A and 5B), but failures in separating elongated cell nuclei were observed (Fig. 5B). The reason for this is that SCC assumes a circular shape of the nuclei. We also measured the computation time averaged over nine different images. ATO required 5.3 s and ATO followed by SCC needed 15.3 s. The computation time of our approach turned out to be 7.6 s (see Table 1).
Table 1. Performance comparison of approaches for cell nucleus segmentation
APPROACH     SEGMENTATION ACCURACY (%)     COMPUTATION TIME (s)
ATO          78.5                          5.3
ATO + SCC    95.8                          15.3
GBT          97.1                          7.6
For all approaches, a hole-filling step was included. The values for the computation time are mean values over 9 different images (Dual Core AMD Opteron Processor, 2.6 GHz, 64 bit Linux).
ATO, Adaptive thresholding by Otsu's method; ATO + SCC, ATO followed by a model-based strategy for separating cell clusters; GBT, Novel gradient-based thresholding scheme.
Quantification of the Virus Signal
We studied the effect of using different cell nucleus neighborhoods for quantifying the virus signal, which is computed as the mean intensity in Channel 2 in the given neighborhood. For 112,067 cells in total from one plate, we measured the virus signal for each cell by using three different types of neighborhoods: neighborhoods based on simple dilation (SD), restricted dilation (RD), and region growing (RG) in the IZ (see Materials and Methods). To compare the results, we computed the differences in the measured signal for each cell between the different types of neighborhood. It turned out that we obtain almost the same results with SD and RD (mean difference: 0%, standard deviation: 2.5% of the dynamic intensity range). With RG we measured on average a slightly higher virus signal level than with SD as well as RD (mean difference: 1% of the dynamic intensity range, standard deviation: 3.8%). The reason for this difference is that with RG we segment the cytoplasm and compute the mean intensity only from pixels inside the cytoplasm, whereas with SD as well as RD we compute the mean intensity from pixels near a cell nucleus without segmenting the cytoplasm and therefore background pixels are generally included. We do not consider these differences to be significant and therefore we mostly use the approach based on SD in our application because of its simplicity and significantly lower computation time.
Detection of Out-of-Focus Images
We have also analyzed the performance of our approach for the detection of out-of-focus images. We have determined the discrimination power of our gradient-based out-of-focus measure for 384 images from one plate and have compared it with another algorithm using the normalized variance of pixel values in the image (31). An expert classified the images into three different categories: “in-focus”, “out-of-focus”, and “hard to decide”. The latter class was mostly assigned to images where some cells were in-focus and some were out-of-focus. We computed the discrimination power as the maximal possible percentage of correctly distinguished “in-focus” and “out-of-focus” images by
$$ DP(m) = \max_{T_m} \frac{|\{i \in I : m_i \ge T_m\}| + |\{i \in O : m_i < T_m\}|}{|I| + |O|} \cdot 100\% $$
where m is the out-of-focus measure for which the discrimination power (DP) is computed, $m_i$ is its value for image i, I and O denote the sets of images classified by the expert as “in-focus” and “out-of-focus”, respectively, and $T_m$ is a threshold which is chosen such that DP is maximized. Using our approach we achieved DP = 99.4% whereas for the normalized variance we obtained DP = 97.4% (Fig. 6). Note that the latter approach was ranked best in the performance study in (31).
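The computation of the discrimination power can be sketched as an exhaustive search over candidate thresholds. The function name `discrimination_power` and the measure values are hypothetical; the sketch assumes that larger values of the measure indicate sharper images and that images in the “hard to decide” category have been left out.

```python
def discrimination_power(in_focus, out_of_focus):
    """Maximal percentage of correctly separated images over all thresholds.

    in_focus     -- measure values of images labeled "in-focus"
    out_of_focus -- measure values of images labeled "out-of-focus"
    Returns (DP in percent, threshold achieving it).
    """
    total = len(in_focus) + len(out_of_focus)
    # Observed values suffice as candidates; infinity covers the case in
    # which every image is called out-of-focus.
    candidates = sorted(set(in_focus) | set(out_of_focus)) + [float("inf")]
    best_dp, best_t = -1.0, None
    for t in candidates:
        correct = (sum(m >= t for m in in_focus)
                   + sum(m < t for m in out_of_focus))
        dp = 100.0 * correct / total
        if dp > best_dp:
            best_dp, best_t = dp, t
    return best_dp, best_t

# Hypothetical measure values for a handful of annotated images.
dp, t = discrimination_power([8, 9, 10, 7, 3], [1, 2, 4, 2.5])
print(round(dp, 1), t)
```

Because the threshold is chosen after seeing the labels, DP is an upper bound on the achievable separation and serves to compare measures rather than to estimate classification accuracy on new data.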
Application to High-Throughput Screens
Our overall approach has been applied to more than 55,000 images of kinase and cytoskeleton screens of cells infected either by HCV or DV using cell arrays with 384 spots. For both types of viruses we obtained a good agreement with decreased infection rate ratios (IRRs) in positive controls as compared to IRRs in negative controls. Besides positive controls, reduced IRRs were also observed in other siRNA spot areas targeting, for example, cellular genes involved in viral infection. This is illustrated for some siRNA spots in Fig. 7C which were selected from a kinase DV screening experiment presented in Fig. 7B. Seven different plates with the same siRNA spot layouts were prepared and imaged on different days. The mean values of these seven repetitions and their standard deviations are shown. Note that we obtained a clear decrease in IRR not only in positive controls but also for other spots (e.g., at position i12). This indicates the applicability of the whole approach.
We have described an automatic approach for analyzing image-based high-throughput screens using cell arrays to identify genes involved in virus entry and replication. The overall image analysis approach allows for fully automatic and accurate quantification of a large number of images on single cell basis. Analyzing phenotypes at the level of single cells is critical to determine and study distributions of measured quantities in contrast to using only average values (32). The approach presented here was designed based on requirements of a specific application and the individual algorithms were carefully selected and adapted. Nevertheless, we believe that our approach as well as the presented ideas and experimental comparisons may be applicable in other cytometry high-throughput applications.
In particular, we have described a novel gradient-based thresholding scheme for cell nucleus segmentation which does not require postprocessing steps for cluster separation. This approach does not use the image intensities directly, but is based on thresholding the gradient magnitude while only taking into account pixels where the Laplacian of Gaussian operator yields negative values. In our approach we assume that after the first step of the algorithm (result shown in Fig. 3D) there is exactly one connected component of pixels based on which a nucleus is identified in the subsequent steps. In our experiments we did not observe that a nucleus is separated into two different connected components after the first step. In general, however, this may occur, for example, if the threshold T used to obtain the result in Fig. 3C were chosen much higher. In this case, the approach could be extended by introducing an additional step comprising morphological reconstruction by dilation (24) of a binary image corresponding to the negative values of the Laplacian of Gaussian (Fig. 3B) from the intermediate result shown in Fig. 3D.
In our validation study, the proposed cell nucleus segmentation algorithm correctly segmented 1,859 nuclei from a total of 1,914 available nuclei, resulting in 97.1% accuracy. In comparison, a smaller number of correctly segmented cells were obtained by adaptive thresholding (78.5%) as well as by adaptive thresholding followed by model-based cluster separation (95.8%). The increase in performance using our approach was observed particularly for clustered cell nuclei which had different average intensities. Also, our algorithm is about two times faster than adaptive thresholding followed by model-based cluster separation. The segmentation accuracy we achieved is also better than the accuracy of 84% recently reported by Gudla et al. (17) using an algorithm that was designed to work in the presence of uneven illumination and clustered nuclei. Note, however, that the stated segmentation accuracy values are difficult to compare because different images were used.
To eliminate low quality images we described an approach for out-of-focus detection which measures the mean gradient magnitude from regions near segmented cell nucleus boundaries. In recent years, several autofocusing approaches have been proposed (31, 33, 34). In these approaches, a certain measure is typically calculated for several focal planes and the optimal plane is determined by maximizing the measure. The problem of out-of-focus detection addressed in this article is different in that we need to discriminate blurred and sharp images. Nevertheless, autofocus measures can also be used for the task considered here. We experimentally compared our approach with an approach based on the normalized variance of the image intensities, which was ranked best in (31). For our approach we obtained a higher discrimination power, namely of 99.4%, whereas using the normalized variance we obtained 97.4%. The main difference to other measures is that we exploit the result of image segmentation.
Our integrated approach has been applied to more than 55,000 images in different screening experiments with cells infected by either hepatitis C or dengue viruses. It turned out that the obtained results are in good agreement with the expected behavior and encourage the application to image datasets from other high-throughput experiments, in particular, genome-wide screens.
The authors thank Nathalie Harder for providing an implementation of adaptive thresholding by Otsu's method, Carolin Wohlfarth for her help with annotating the images, Nathan Brady for checking the English, and the Delft University, Netherlands, for providing the DIPImage toolbox.