Correlation-based methods of automatic particle detection in electron microscopy images with smoothing by anisotropic diffusion

Authors

  • W. V. Nicholson,

    Corresponding author
    1. School of Biomedical Sciences, University of Leeds, Worsley Building, Leeds LS2 9JT, U.K.
      William V. Nicholson. Tel.: +44 (0)113 343 3034; e-mail: william_v_nicholson@yahoo.com
  • R. Malladi

    1. School of Biomedical Sciences, University of Leeds, Worsley Building, Leeds LS2 9JT, U.K.
    2. Computing Science Department, University of California, Berkeley, CA 94720, U.S.A.


Summary

Two methods of correlation-based automatic particle detection in electron microscopy images are compared – computing a cross-correlation function or a local correlation coefficient vs. azimuthally averaged reference projections (either from a model or from experimental particle images). The ability of smoothing images by anisotropic diffusion to improve the performance of particle detection is also considered. Anisotropic diffusion is an effective method of preprocessing that enhances the edges and overall shape of particles while reducing noise. It is found that anisotropic diffusion improves particle detection by a local correlation coefficient when projections from a high-resolution reconstruction are used as references. When references from experimental particle images are used, a cross-correlation function shows a more marked improvement in particle detection in images smoothed by anisotropic diffusion.

Introduction

Cryo-electron microscopy and single-particle image analysis are emerging as powerful techniques in structural biology. In particular, these techniques can be applied to large macromolecular assemblies lacking symmetry, which are difficult to study by other methods. Ever-improving resolution is currently being achieved for a diverse range of macromolecular complexes. Among recent examples, resolutions of 11.5 Å have been reported for the E. coli ribosome (Gabashvilli et al., 2000), a very large, multi-subunit particle; 22 Å for bovine complex I (Grigorieff, 1998), an 890-kDa membrane protein; and 21 Å for the catalytic subunit of the DNA-dependent protein kinase (Chiu et al., 1998), a relatively small, 460-kDa soluble protein assembly. Some recently reported high-resolution results, e.g. 7.5 Å for the 50S ribosomal subunit (Matadeen et al., 1999) and 9.5 Å for the spliceosomal U1 small nuclear ribonucleoprotein particle (Stark et al., 2001), used significantly fewer particles than were required for the same resolution in, for example, previous work with virus particles. This discrepancy between the claimed resolution for a given number of particles and the values achieved in previous work has not yet been reconciled.

As radiation damage imposes strong limitations on the electron doses used to collect micrographs of biological macromolecules, images of individual particles have a low signal-to-noise ratio, which can only be overcome by averaging a large number of images. The number of particle images required for a three-dimensional (3D) reconstruction increases dramatically with the desired resolution (e.g. 73 000 particles were used in the ribosome reconstruction at 11.5 Å resolution). It is estimated that the number must be increased to about 1 million before it is even physically possible to reach ‘atomic’ resolutions, i.e. better than 4 Å (Henderson, 1995; Glaeser, 1999), when using images of the currently available quality.

As a first step in image processing, selection of particles from each micrograph is performed either manually, using software for interactive particle selection (Frank et al., 1996), or by computer-assisted, semi-automated methods. In either method, particle selection becomes a very labour-intensive step in image processing, as one works toward ever increasing resolutions. Thus, automation of particle selection will be necessary to avoid this stage becoming a serious bottleneck. Several approaches that have been proposed for automatic particle selection, which have met with varying degrees of success, were recently reviewed (Nicholson & Glaeser, 2001). The approaches include methods that make use of various forms of template matching, local comparison of intensity values, edge detection, quantitative measures of the local image texture/statistics and neural networks. In particular, various template-matching methods based on correlation have been tried, including cross-correlation with an azimuthally averaged reference (Frank & Wagenknecht, 1984; Thuman-Commike & Chiu, 1995) and the synthetic discriminant filter (Stoschek & Hegerl, 1997).

In this paper, a quantitative comparison is made of several different ways in which one can use both the cross-correlation function and the computation of a local correlation coefficient. It has been suggested that computing a local correlation coefficient can overcome the sensitivity of the cross-correlation function to intensity variations in the image. In this work, we also investigate the effectiveness of using different types of reference images (target objects): (a) experimental single-particle images, (b) projections from a 3D reconstruction, (c) multiple projections or (d) a Gaussian blob. In addition, we evaluate the use of an edge-preserving ‘smoothing’ operation, anisotropic diffusion, as a preprocessing step, which is applied to the data before using either the cross-correlation function or the correlation coefficient to search for particles.

Methods

Cross-correlation function

Numerous methods have been described for automatic particle selection by cross-correlating a reference image with the micrograph image:

\[ c(x, y) = \sum_{x'} \sum_{y'} f(x', y')\, g(x' - x,\, y' - y) \]

where f(x, y) is the image and g(x, y) is the reference. In this work, azimuthally averaged particle images or projections from a 3D model were used as references to avoid the computational cost of examining each possible orientation of (unaveraged) reference images (Frank & Wagenknecht, 1984; Thuman-Commike & Chiu, 1995). Alternatively, a reference image may be something as simple as a 2D Gaussian function whose size is close to that of the particles that are being sought.
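
As an illustration of the procedure described above, the following is a minimal NumPy sketch, not the SPIDER implementation used in this work. The function names `azimuthal_average` and `cross_correlate` are hypothetical, the azimuthal average is taken over integer-radius rings (an assumption), and the correlation is circular because it is computed with FFTs.

```python
import numpy as np

def azimuthal_average(img):
    """Replace each pixel by the mean over all pixels at the same integer radius.

    The radius is measured from the pixel at (rows // 2, cols // 2); binning
    by integer radius is an assumption made for this sketch.
    """
    n_rows, n_cols = img.shape
    yy, xx = np.indices(img.shape)
    r = np.hypot(yy - n_rows // 2, xx - n_cols // 2).astype(int)
    sums = np.bincount(r.ravel(), weights=img.ravel())
    counts = np.bincount(r.ravel())
    return (sums / counts)[r]

def cross_correlate(image, reference):
    """Circular cross-correlation c(x, y) = sum f(x', y') g(x' - x, y' - y).

    The reference is zero-padded to the image size and shifted so that its
    centre lies at the origin; a peak in the output then marks the centre
    of a matching particle.
    """
    h, w = reference.shape
    padded = np.zeros(image.shape, dtype=float)
    padded[:h, :w] = reference
    padded = np.roll(padded, (-(h // 2), -(w // 2)), axis=(0, 1))
    spectrum = np.fft.rfft2(image) * np.conj(np.fft.rfft2(padded))
    return np.fft.irfft2(spectrum, s=image.shape)
```

Because the reference is rotationally symmetric after averaging, a single correlation map covers all in-plane orientations, which is the computational saving referred to in the text.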

Correlation coefficients

Particle detection was carried out by evaluation of the 2D distribution of correlation coefficients between a template or reference image and the test image (Goshtasby et al., 1984; Gonzalez & Woods, 1992). In practice, this involves computing a cross-correlation function (using a fast Fourier transform (FFT)-based algorithm) and normalizing for high variations in the image intensity in the test image. The correlation coefficient is given by:

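\[ \gamma(x, y) = \frac{\sum_{x', y'} M(x' - x, y' - y)\,[f(x', y') - \bar{f}(x, y)]\,[g(x' - x, y' - y) - \bar{g}]}{\left\{ \sum_{x', y'} M(x' - x, y' - y)\,[f(x', y') - \bar{f}(x, y)]^{2} \; \sum_{x', y'} M(x', y')\,[g(x', y') - \bar{g}]^{2} \right\}^{1/2}} \]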

where f(x, y) is the image in which one wishes to carry out object detection, and g(x, y) is the reference image. f̄(x, y) is a local average, with the summation carried out within a small area, defined by a mask function, M, for each point in the micrograph image. ḡ is an average computed from g(x, y), with the summation carried out within the area defined by the mask function, M. The term

\[ \sum_{x', y'} M(x' - x, y' - y)\,[f(x', y') - \bar{f}(x, y)]^{2} \]

is the local variance in the micrograph image, and it can be computed efficiently using either one of two algorithms described by van Heel (1982). In this work the local variance was computed using the FFT-based algorithm described in van Heel's paper as this is easier to implement for a circular mask, although the sliding sums algorithm is faster.
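
The FFT-based route to the local correlation coefficient can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the code used in this work: the names `circular_mask` and `local_correlation` are hypothetical, the correlations are circular, and the small floor `eps` on the denominator is an added guard that is not part of the published method.

```python
import numpy as np

def _xcorr(image, kernel):
    """Circular cross-correlation with a small kernel centred at the origin."""
    h, w = kernel.shape
    padded = np.zeros(image.shape, dtype=float)
    padded[:h, :w] = kernel
    padded = np.roll(padded, (-(h // 2), -(w // 2)), axis=(0, 1))
    spectrum = np.fft.rfft2(image) * np.conj(np.fft.rfft2(padded))
    return np.fft.irfft2(spectrum, s=image.shape)

def circular_mask(size, radius):
    """Binary mask M: 1 inside a disc of the given radius, 0 outside."""
    yy, xx = np.indices((size, size))
    return (np.hypot(yy - size // 2, xx - size // 2) <= radius).astype(float)

def local_correlation(image, reference, mask, eps=1e-8):
    """Map of correlation coefficients between the reference and each masked window."""
    image = image.astype(float)
    n = mask.sum()
    # Masked reference with its in-mask mean removed (g - g_bar inside M).
    g = (reference - (reference * mask).sum() / n) * mask
    g_ss = (g ** 2).sum()
    # Local sums of f and f^2 under the mask give the local sum of squares
    # of (f - f_bar), i.e. the local variance term up to a factor of n.
    local_sum = _xcorr(image, mask)
    local_sq = _xcorr(image ** 2, mask)
    f_ss = np.maximum(local_sq - local_sum ** 2 / n, 0.0)
    numer = _xcorr(image, g)  # the f_bar term vanishes because g sums to zero
    denom = np.sqrt(f_ss * g_ss)
    # eps is an assumption of this sketch: it avoids division by ~0 in flat regions.
    return np.where(denom > eps, numer / np.maximum(denom, eps), 0.0)
```

A window that is an exact positive rescaling and offset of the reference inside the mask gives a coefficient of 1, which is the normalization property that distinguishes this map from the plain cross-correlation function.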

Multiple references

The use of multiple references, for different projection views of the target object (obtained either by computation from a 3D model or from experimental particle images), for the various template matching functions (cross-correlation function and local correlation coefficient) was evaluated in this work and compared with the use of single references. All references were azimuthally averaged. Reference projections generated from the 3D model were masked with a radius of 32 pixels and the experimental particle images used as references were masked with a radius of 35 pixels. An approximate suitable radius for the mask was determined by manual measurement from particle images. Empirical tests with a range of radii (30–35 pixels) led to the conclusion that manual measurement gives the best choice of radius. The various ribosome projections have radii of between 25 and 30 pixels. Particle images are less well centred than projections from a 3D model; therefore, a larger radius is suitable. When using multiple references, the template matching function is computed for each reference for each position in the test image and the maximum value of the template matching function is used at the given position, in the final map.
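
The combination rule for multiple references described above (take the maximum over references at each position) can be written compactly. In this sketch `best_match_map` and its `match_fn` argument are hypothetical names; `match_fn` stands for whichever template matching function is in use (cross-correlation function or local correlation coefficient) and is assumed to return a map of the same shape as the image.

```python
import numpy as np

def best_match_map(image, references, match_fn):
    """Combine per-reference matching maps by a pixel-wise maximum.

    Returns the final map and, for each position, the index of the
    reference that produced the maximum.
    """
    maps = np.stack([match_fn(image, ref) for ref in references])
    return maps.max(axis=0), maps.argmax(axis=0)
```

Returning the index of the winning reference is an addition for convenience; the final map alone is what is passed on to the peak search.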

Peak searches

Peaks in the maps for the various template matching functions are detected by determination of a suitable threshold and subsequent pruning based on proximity of peaks to one another. Peaks within a user-chosen distance of one another are eliminated in favour of the highest peak. In this work this distance was chosen to be 80 pixels (slightly larger than the approximate diameter of the particles). This choice of distance eliminates some aggregates and contaminants that are larger than true particles. Image processing and other computations were carried out using SPIDER (with some of the new required functionality supported in locally implemented SPIDER operations) and auxiliary Fortran programs.
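
One way to realize the thresholding and proximity pruning described above is a greedy pass over candidate positions in order of decreasing peak height. This is a sketch of one reasonable reading of the procedure, not the SPIDER operations used in this work; the function name `find_peaks` and the choice of a greedy strategy are assumptions.

```python
import numpy as np

def find_peaks(score_map, threshold, min_dist):
    """Return (row, col, value) for peaks above threshold, pruned by proximity.

    Candidates are visited from highest to lowest value; a candidate is kept
    only if it lies farther than min_dist from every peak already kept, so
    that nearby peaks are eliminated in favour of the highest one.
    """
    rows, cols = np.nonzero(score_map > threshold)
    values = score_map[rows, cols]
    kept = []
    for i in np.argsort(values)[::-1]:
        r, c = int(rows[i]), int(cols[i])
        if all((r - kr) ** 2 + (c - kc) ** 2 > min_dist ** 2 for kr, kc, _ in kept):
            kept.append((r, c, float(values[i])))
    return kept
```

With the value used in this work, the call would take the form `find_peaks(score_map, threshold, min_dist=80)`, where `threshold` is chosen for the matching function in use.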

Micrograph images

Micrograph images of ice-embedded ribosome particles were used as test data. The micrographs were part of a dataset made available to participants at the ‘Single Particle Reconstruction from Electron Microscope Images’ Course held at Pittsburgh Supercomputing Center on 21–24 July 1999. Comparable data are available as part of the SPIDER software distribution (Frank et al., 1996).

Anisotropic diffusion

Particle detection was attempted in micrograph images filtered using anisotropic diffusion. The effectiveness of using reference images filtered with anisotropic diffusion was also investigated. In this work, anisotropic diffusion by Beltrami flow was used.

The Beltrami flow equation is a non-linear diffusion method for image denoising and sharpening. It is an attempt to answer the following important question in early image analysis: what is the natural way to treat vector-valued images and images in higher dimensions? The result is a general mathematical framework, due to Sochen, Kimmel, and Malladi (Sochen et al., 1998; Kimmel et al., 1999), for feature-preserving image smoothing that applies seamlessly to grey-level, vector-value (colour) images, volumetric images and movies. The main idea is to view images as embedded maps between two Riemannian manifolds and to define an action potential that provides a measure on the space of these maps. The authors (Sochen et al., 1998) showed that many classical geometric flows emerge as special cases in this view as well as a new flow, the so-called Beltrami flow, that moves a grey-level image under a scaled mean curvature, and also succeeds in finding a natural coupling between otherwise decoupled component-wise diffusion that was often used in the past in vector-valued image diffusion.

In the present context we are interested in noise reduction and enhancement of cryo-EM images. This can be effectively done using the reaction-diffusion form of the Beltrami flow equation (Malladi & Ravve, 2002). We also employ the fast, unconditionally stable, semi-implicit schemes described in that work.

The reaction-diffusion form of the Beltrami flow equation is given by

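\[ \frac{\partial U}{\partial t} = (\cos\beta)\, \nabla h \cdot \nabla U + (\sin\beta)\, h\, \nabla^{2} U \]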

where U is the image intensity and h is the edge indicator function. The first, reaction term (cos β) · ∇h · ∇U is responsible for edge enhancement and the second, diffusion term (sin β) · h · ∇²U is responsible for smoothing. β is a parameter that controls the relative contribution of these terms. Images were smoothed by anisotropic diffusion with β = 63.4° following contrast enhancement by histogram equalization and initial filtering with a 5 × 5 box filter.
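
To make the roles of the two terms concrete, the following sketch advances an image by simple explicit Euler steps of the reaction-diffusion equation. It is deliberately not the semi-implicit scheme of Malladi & Ravve (2002) used in this work, and it omits the histogram equalization and box filtering. The edge indicator h = 1/√(1 + |∇U|²), the time step `dt` and the function names are all assumptions made for illustration.

```python
import numpy as np

def _laplacian(u):
    """Five-point Laplacian with edge-replicated boundaries."""
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u

def beltrami_step(u, beta_deg=63.4, dt=0.1):
    """One explicit Euler step of U_t = cos(b) grad(h).grad(U) + sin(b) h lap(U)."""
    beta = np.deg2rad(beta_deg)
    uy, ux = np.gradient(u)
    # ASSUMED edge indicator: close to 1 in flat regions, smaller at edges.
    h = 1.0 / np.sqrt(1.0 + ux ** 2 + uy ** 2)
    hy, hx = np.gradient(h)
    reaction = np.cos(beta) * (hx * ux + hy * uy)   # edge enhancement
    diffusion = np.sin(beta) * h * _laplacian(u)    # smoothing
    return u + dt * (reaction + diffusion)

def beltrami_smooth(u, n_steps=10, beta_deg=63.4, dt=0.1):
    """Apply n_steps explicit steps; small dt is needed for stability."""
    u = np.asarray(u, dtype=float)
    for _ in range(n_steps):
        u = beltrami_step(u, beta_deg=beta_deg, dt=dt)
    return u
```

An explicit scheme of this kind is only stable for small time steps, which is the limitation that the unconditionally stable semi-implicit schemes are designed to remove.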

Interactive particle selection

In order to evaluate the different automated methods of particle selection, the coordinates of a set of true particles are needed as a standard. In the case of experimental images, a set of interactively selected particles were used as the set of true particles. The coordinates were obtained by using the boxer program in EMAN for interactive particle selection (Ludtke et al., 1999). The boxer program has a graphical user interface that allows a human operator manually to select particles by pointing and clicking with a mouse on particles in a micrograph image displayed on a monitor.

Results

A high defocus image was used in tests to evaluate various methods for automatic particle selection. Figure 1(a) shows a 1024 × 1024 field from a larger (2048 × 2048) micrograph image that was used to test different methods of automatic particle detection. The larger image was smoothed using Beltrami flow and Fig. 1(b) shows the corresponding 1024 × 1024 field from the smoothed image. Figure 1(c) shows the particles that were selected interactively from the field that is shown in Fig. 1(a). Particles can be easily distinguished in high-defocus micrographs and manual selection of particles is relatively unambiguous.

Figure 1.

(a) A 1024 × 1024 field taken from a larger micrograph image showing ice-embedded ribosome particles. The micrograph was part of a dataset made available to participants at the ‘Single Particle Reconstruction from Electron Microscope Images’ Course held at Pittsburgh Supercomputing Center on 21–24 July 1999. Comparable data are available as part of the SPIDER software distribution (Frank et al., 1996). (b) The corresponding field to that shown in (a) taken from the same larger micrograph image after smoothing by Beltrami flow. (c) The same field as (a) with boxes at the locations of interactively selected particles.

The relative effectiveness of each of the particle-detection schemes tested here is presented in the form of parametric plots of hits (as a percentage of true particles) vs. false particles (as a percentage of true particles), with each variable depending on the threshold. Coordinates of peaks in the template matching function are accepted as hits when they are within a specified distance of the coordinates from ‘true particles’ that were selected manually earlier. The distance used is of the same order as the size of the particles.

The automatic particle detection methods were also evaluated by plotting percentage hits vs. false positives as a percentage of automatically selected particles. This is an alternative method of displaying the data that is more informative as to how ‘contaminated’ an automatically selected data set is, i.e. what percentage of the data consists of false particles.
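
The two ways of reporting performance described above can be computed from a list of detected peaks and a list of manually selected coordinates at a single threshold. The sketch below is illustrative only: the function name `score_detections` is hypothetical, and the one-to-one matching of each true particle to at most one detection is an assumption, since the matching rule is not spelled out beyond the distance criterion.

```python
import math

def score_detections(detected, true_particles, max_dist):
    """Percentage hits and false positives for one threshold.

    detected and true_particles are lists of (row, col) coordinates. A
    detection is a hit if it lies within max_dist of a true particle that
    has not already been matched (one-to-one matching is an assumption).
    """
    matched = set()
    hits = 0
    for r, c in detected:
        best, best_d = None, max_dist
        for i, (tr, tc) in enumerate(true_particles):
            d = math.hypot(r - tr, c - tc)
            if i not in matched and d <= best_d:
                best, best_d = i, d
        if best is not None:
            matched.add(best)
            hits += 1
    false = len(detected) - hits
    n_true = len(true_particles)
    return {
        "hits_pct": 100.0 * hits / n_true,
        "false_pct_of_true": 100.0 * false / n_true,
        "false_pct_of_selected": 100.0 * false / len(detected) if detected else 0.0,
    }
```

Repeating the calculation over a range of thresholds traces out the parametric curves of hits vs. false particles in either normalization.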

When azimuthally averaged experimental particle images were used as references, it was found that there is little significant difference between the results obtained with the various methods used here. Figure 2(a) shows plots of hits vs. false particles as a percentage of true particles for correlation-based methods and Fig. 2(b) shows results from the same tests in plots of hits vs. false particles as a percentage of auto-selected particles. Particle detection using smoothed experimental particle images is generally less successful than using non-smoothed experimental particle images, as is shown by the data presented in Fig. 3. In all tests that are shown in Figs 2 and 3, the results are markedly worse when a map of correlation coefficients is used to identify particles within an image that had been smoothed by anisotropic diffusion.

Figure 2.

One hundred experimental particle images used as references. (a,i) Plot of %Hits vs. %False of true particles using a single reference. (a,ii) Plot of %Hits vs. %False of true particles using multiple references. (b,i) Plot of %Hits vs. %False of auto-selected particles using a single reference. (b,ii) Plot of %Hits vs. %False of auto-selected particles using multiple references.

Figure 3.

One hundred smoothed experimental particle images used as references. (a) Plot of %Hits vs. %False of true particles. (b) Plot of %Hits vs. %False of auto-selected particles.

Obtaining a refined reconstruction may require a relatively large investment of effort; we therefore investigated whether using projections from an unrefined reconstruction is substantially better than using experimental particle images, or worse than using projections from a refined reconstruction. It was concluded that particle detection using azimuthally averaged reference projections from an unrefined 3D reconstruction is not substantially different in performance from using experimental particle images. Figure 4 shows plots of hits vs. false positives for particle detection by correlation-based methods using azimuthally averaged projections from the unrefined 3D ribosome reconstruction.

Figure 4.

Twenty projections from an unrefined 3D ribosome reconstruction used as references. (a) Plot of %Hits vs. %False of true particles. (b) Plot of %Hits vs. %False of auto-selected particles.

Particle detection using azimuthally averaged reference projections from a refined 3D reconstruction is nevertheless more successful than from an unrefined 3D reconstruction. Figure 5 shows plots of hits vs. false positives for particle detection by correlation-based methods using azimuthally averaged projections from the refined 3D ribosome reconstruction. In addition, when using projections from the refined reconstruction, particle detection is most successful in smoothed images when using correlation coefficients.

Figure 5.

Twenty projections from a refined 3D ribosome reconstruction used as references. (a) Plot of %Hits vs. %False of true particles. (b) Plot of %Hits vs. %False of auto-selected particles.

In smoothed images there are extended areas of little intensity variation except near the boundaries of particles, and the variance is very low in these areas. In the correlation coefficient, the variance appears in the denominator, so the computation may be sensitive to numerical error (when using the FFT-based algorithm for the local correlation coefficient), as small changes in the low variance cause large changes in the local correlation coefficient. In addition, smoothing may have removed internal features in the particles that can be useful in detection while making particles brighter compared with the background. The local correlation coefficient performs worse because of this, but the cross-correlation function, which is not divided by the local variance, is more effective in detecting the brighter particles (as a result of higher local variances rather than an improved match to the references). Anisotropic diffusion raises amplitudes in the Fourier transform at higher frequencies in the images that are not represented in the references (in those cases where the references are particles from the micrograph or projections from a low-resolution reconstruction). This has relatively little effect on the cross-correlation function: the Fourier transforms of those references are low or near zero at the higher frequencies, so the high values at higher frequencies in the images contribute little, being cancelled out on multiplication by the low or zero values in the Fourier transforms of the references. However, because the local correlation coefficient is divided by the variance (which is larger owing to greater contributions from the higher frequencies), detection is poorer. It is interesting that the local correlation coefficient is somewhat more effective at particle detection in images smoothed by anisotropic diffusion when high-resolution references are available. This suggests that it is better able to make use of high-resolution edges in the image for particle detection when high-resolution templates/references are available.

Particle detection by ‘blob correlation’, which uses a 2D Gaussian function as the template, performs similarly to correlation using azimuthally averaged reference projections; but it does seem to be quite sensitive to making an optimal match between the particle size and the Gaussian ‘blob’ size. The relatively high success rate in comparison with that obtained for azimuthally averaged reference projections may be because azimuthally averaging particle images produces a poor reference image or because the problems of structural heterogeneity in the ribosome particles mean that a choice of reference that has relatively little discrimination is better for detecting particles in the micrograph image. Figure 6(a) shows a plot of percentage hits vs. false positives as a percentage of true particles in the image. Particle detection is most successful when a radius of 40 pixels is used. Figure 6(b) shows that carrying out blob correlation with a smoothed image does not make a significant difference to the success of particle detection. In the tests of blob correlation, peaks that were too close to each other (within a distance of twice the radius in pixels) were eliminated in favour of the higher peaks as described in the Methods section. The maximum level of hits obtained with the larger choices of blob radius is significantly lower than 100%. This is because the particles are about 40 pixels in radius and some particles are fairly close, and therefore some detection peaks for true particles are eliminated as described earlier.

Figure 6.

Blob correlation. Gaussian blobs of varying radii were used as references. Plots of %Hits vs. %False of true particles. (a) Particle detection in the original, unsmoothed image. (b) Particle detection in the smoothed image.

Discussion

The results show that correlating with multiple references, at least in the case of globular particles such as the ribosome, has little advantage over using a single reference. Multiple references are not widely used in the literature, and the quantitative evaluations reported here seem to validate the practice of using only a single reference. A description of particle detection by cross-correlation with multiple references is provided in Ludtke et al. (1999), however. It is reasonable to expect that small numbers of reference projections may be useful when images have top views and side views that are very different in appearance, as they are for the GroEL particle.

In the case of correlation with reference projections from a refined ribosome reconstruction, particle detection is more effective in smoothed images (for levels of hits below 75%). Otherwise, smoothing images does not produce dramatic improvements in particle detection.

Particle detection using blob correlation with a suitable choice of blob radius gives results similar to those from correlation with reference projections from a 3D reconstruction. Results from using blob correlation were described earlier in Lata et al. (1995). In that work, the particles were subsequently screened by using linear discriminant analysis. Using blob correlation to obtain a desired level of hits for a given level of false positives without an additional screening step may be difficult in practice. It is unclear how one could design an automated thresholding method when the ribosome is not substantially similar to a blob, although it does have some similarity to the azimuthally averaged references used when computing the cross-correlation function and local correlation coefficients.

In general, computing local correlation coefficients does not substantially improve on the cross-correlation function. It is possible to obtain production micrographs for a reconstruction that do not vary substantially in the local background and do not contain bright contaminants, so the ability of the local correlation coefficient to overcome the sensitivity of the cross-correlation function to local intensity variations in the image is not an advantage.

Acknowledgements

We would like to thank Joachim Frank for permission to use the ribosome test image shown in Fig. 1 (comparable data are available on the Wadsworth Center web site at http://www.wadsworth.org/spider_doc/spider/docs/techniques.html) and Robert M. Glaeser for invaluable discussions. This work was supported in part by PACI subaward number 776 from the University of Illinois for NSF Cooperative Agreement number ACI-9619019.
