Relative labelling index: a novel stereological approach to test for non-random immunogold labelling of organelles and membranes on transmission electron microscopy thin sections


Professor Terry M. Mayhew. Tel.: +44 (0)115 970 9414; fax: +44 (0)115 970 9259; e-mail:


Simple and efficient protocols for quantifying immunogold labelling of antigens localized in different cellular compartments (organelles or membranes) and statistically evaluating resulting labelling distributions are presented. Two key questions are addressed: (a) is compartmental labelling within an experimental group (e.g. control or treated) consistent with a random distribution? and (b) do labelling patterns vary between groups (e.g. control vs. treated)? Protocols rely on random sampling of cells and compartments. Numbers of gold particles lying on specified organelle compartments provide an observed frequency distribution. By superimposing test-point lattices on cell profiles, design-based stereology is used to determine numbers of points lying on those same compartments. Random points hit compartments with probabilities determined by their relative sizes and so provide a convenient internal standard, namely, the expected distribution if labelling is purely random. By applying test-line lattices, and counting sites at which these intersect membrane traces, analogous procedures provide observed and expected labelling distributions for different classes of membranes. Dividing observed golds by expected golds provides a relative labelling index (RLI) for each compartment and, for random labelling, the predicted RLI = 1. In contrast to labelling densities of organelles (golds µm⁻²) or membranes (golds µm⁻¹), RLI values are estimated without needing to know lattice constants (area per point or length per intersection) or specimen magnification. Gold distributions within a group are compared by chi-squared analysis to test if the observed distribution differs significantly from random and, if it is non-random, to identify compartments which are preferentially labelled (RLI > 1). Contingency table analysis allows labelling distributions in different groups of cells to be compared. Protocols are described and illustrated using worked specimen examples and real data.

1. Introduction

Immunocytochemistry seeks to define the cellular or intracellular location of biochemically characterized antigens. In order to resolve intracellular compartments, immunocytochemistry must be combined with transmission electron microscopy (TEM) and an important criterion for judging this combination is the ability to permit quantification of labelling (Griffiths, 1993; Lucocq, 1994). Particulate markers are to be preferred provided they can be unambiguously identified and counted. In practice, colloidal gold particles have become the markers of first choice for visualizing and quantifying antibodies bound to thin sections of tissues or cells. Their high electron density makes them relatively easy to detect and their punctate nature makes them easy to count (Faulk & Taylor, 1971; Romano & Romano, 1977; Roth et al., 1978; Bendayan, 1984; Slot et al., 1989; Griffiths, 1993; Lucocq, 1994; Roth, 1996). Immunogold particles of different sizes (usually 5–15 nm) permit multiple labelling in order to distinguish different antigens and different compartments (Slot & Geuze, 1985).

The labelling process is relatively straightforward. Antibodies specific for the antigen(s) are raised and a given antibody binds to an antigen in the TEM ultrathin section. These antibodies are then located by a second-step reagent complexed to electron-dense colloidal gold. This second-step reagent is often protein A-gold but gold complexed to antibodies that bind specifically to the primary antibody is also commonly used. The gold marker is added after production of the ultrathin sections. The latter are mounted on carbon/plastic-coated EM grids prior to incubation. They are incubated first on drops of primary antibody and then on the electron-dense second-step reagent. Cellular structures are then contrasted by impregnation with heavy metal stains and the grids observed by using TEM.

A crucial factor in quantifying gold labelling is the signal-to-noise (SN) ratio, which depends on the specificity and affinity of the antibody for the antigen (Griffiths, 1993). Here, signal may be defined simply as those gold particles that are bound faithfully to the relevant antigen. The SN ratio also depends critically on the dilution of antibody used for labelling: as concentration increases, so does the signal and, generally, also the noise. In a well-designed study with a specific antibody, the noise should not exceed the signal and one should detect appreciably more labelling over some compartments (the signal) than over others considered not to label specifically (the background noise). In practice, the concentration that gives the best SN ratio is chosen by determining the highest concentration of antibody which gives a good signal and a reasonably low background. Here, background may be defined as a combination of non-specific adhesion and cross-reactivity with proteins (or other tissue ingredients or features) that mimic the specific antigen (e.g. Nigg et al., 1982). Provided that there is independent evidence to support the notion that the antibody is specific for the antigen(s) of interest, the question is whether, on statistical grounds, the labelling that appears to be ‘real’ and ‘specific’ (the signal) actually exceeds that considered to be merely background noise.

Ideally, gold counts should be expressed relative to the real 3D cell, a procedure that can also be used to estimate labelling efficiency (Lucocq, 1992). However, quantification of gold particles on thin TEM sections usually involves estimating labelling frequencies or labelling densities (LD). The former simply express the absolute or relative numbers of gold particles falling on different compartments. Although useful, especially as a quick and simple preliminary screen, the main disadvantage of this method is that observed labelling frequencies are ‘size-weighted’: more voluminous or extensive compartments occur, and are sampled, more often than smaller compartments. Therefore, even if two compartments contain the same concentration of antigen, more gold particles will tend to be seen and counted on the larger compartment. This danger to correct interpretation can be avoided by determining LD and identifying compartments that have a high SN ratio. LD values are variably expressed as numbers of golds per length of membrane trace or cytoskeletal filament/tubule on the section plane, per sectional area of organelle profile or, less often, per volume of compartment or reference space (Lucocq, 1992). They give an indication of antigen concentrations (golds µm⁻¹, µm⁻² or µm⁻³) in different labelled compartments. On this basis, a compartment considered to label significantly would show a higher LD than one which is assumed to be antigen-free.

LD estimation tends to be applied only if, after a preliminary qualitative examination, obvious differences in compartment labelling are seen. However, this may lead to more subtle, but nonetheless significant, differences being missed. Moreover, little attention has been paid to the question of whether or not observed labelling patterns differ from random and, if they do, of how to express statistical confidence in the apparent preferential compartmental localization.

LD requires two pieces of information: the number of gold particles on a compartment and some measure of compartment size (e.g. profile area, trace length). The latter can be obtained by computer-assisted methods in which, for example, membrane or organelle profiles of interest are traced manually on an electron micrograph or display screen. However, a better approach is to apply design-based stereological methods (Cruz-Orive & Weibel, 1990; Mayhew, 1991, 1992; Griffiths, 1993; Lucocq, 1993; Howard & Reed, 1998). These make use of lattices of test probes (points and lines), which are randomly superimposed on sectional images and used to identify and count chance encounters with compartments of interest. The stereological counting approach is not only highly efficient but also unbiased. Whatever the quantification method adopted, unbiased estimation depends crucially on randomised sampling of cells and subcellular compartments (Gundersen & Jensen, 1987; Mayhew, 1991; Gundersen et al., 1999). This is so because LD estimation depends on sampling organelle compartments according to their correct relative profile areas (and, hence, volumes) and membranes according to their relative trace lengths (and, hence, surface areas). Random sampling for both the position and orientation of section planes achieves these sampling conditions.

Here, we present a simple method for quantifying and evaluating observed gold labelling patterns by testing whether or not these conform to the convenient null hypothesis of random labelling. This is possible because when test points are applied randomly to sections, they provide not only a measure of organelle compartment size but also an ‘expected’ distribution, i.e. that which is predicted if gold particles are distributed randomly (or non-specifically) between compartments. The basis of statistical evaluation is to compare this expected distribution (the null hypothesis) with the actual (‘observed’) distribution of gold particles. From the two distributions, a relative labelling index (RLI) can be calculated for each compartment. The same principles can be used to compare gold and intersection counts in studies on membrane compartments and compartments made of filaments or tubules. This new index has the advantage over LD of not requiring a knowledge of the lattice constants used to convert test point counts into organelle profile areas or test line intersections into membrane trace lengths. In this strict sense, RLI is a more efficient (gives greater precision per unit cost) estimator than LD, although, clearly, the accuracy of both remains purely a function of the specificity of the primary antibody. The basic rationale of this novel approach to assessing gold distributions is first described and then the analytical methods are illustrated by worked examples based on artificial and real data.

2. Method

2.1. Defining the compartment category

The important initial step is to define the category of compartment with which antigen(s) are associated. Essentially, intracellular compartments are divisible into three main categories (see Griffiths, 1993): volume (organelle), surface (membrane) or linear (filament, tubule) compartments. The proposed method is described as two variants, the first devised for organelles and the second for membranes. The latter may be modified for studying linear structures such as cytoskeletal microfilaments and microtubules. The choice of compartment category also influences the specimen sampling strategy in a fundamental way: organelle compartments may be sampled by randomising only section location but membrane compartments must be sampled by randomising location and orientation (Cruz-Orive & Weibel, 1990; Mayhew, 1991).

2.2. Random sampling: its importance and implementation

Thin sectioning allows compartments to be visualized under the TEM at optimal lateral resolution. Unfortunately, ultrathin sections represent an extremely small fraction of the total specimen and, consequently, it is impossible to examine all cells and all their compartments. Therefore, sampling is obligatory. Fortunately, random sampling will give every part (and, in certain instances, every orientation) of the specimen exactly the same chance of being selected. Consequently, a random sample is an unbiased sample. However, systematic random sampling tends to be more efficient than simple random sampling. In systematic random sampling, the position and/or orientation of the first item is selected at random but a predetermined pattern then dictates the positions and/or orientations of all other items in the sample (Gundersen & Jensen, 1987; Mayhew, 1990, 1991). In the present context, random sampling at all stages is a prerequisite.
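The systematic random scheme described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the items (blocks, grid windows or fields) are taken to be numbered consecutively, and the function name `systematic_random_sample` and its parameters are hypothetical rather than part of any published protocol.

```python
import random


def systematic_random_sample(n_items, n_wanted, rng=random):
    """Pick n_wanted of n_items by systematic random sampling.

    The first position is random within the first sampling interval;
    all later positions follow at a fixed interval from that start.
    """
    if not 0 < n_wanted <= n_items:
        raise ValueError("n_wanted must be between 1 and n_items")
    interval = n_items / n_wanted
    start = rng.random() * interval  # random start in [0, interval)
    # Clamp guards against floating-point rounding at the upper end.
    return [min(int(start + k * interval), n_items - 1) for k in range(n_wanted)]


# Hypothetical use: choose 8 of 40 numbered grid windows.
windows = systematic_random_sample(40, 8)
```

Because only the start is random, every item still has the same chance of selection, while the fixed interval spreads the sample evenly across the specimen.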

Often, the specimen comprises a population of cultured cells that represents a particular group, e.g. an untreated (control) group or one of several treated groups (there may be different levels of treatment). Regardless of grouping, each specimen must be sampled randomly at successive stages, e.g. a pellet of cells provides TEM blocks which are cut to provide ultrathin sections which provide TEM fields for counting immunogold particles and stereological test probes. For successful random sampling, the initial pellet must be sampled in such a way that all parts of it (and all compartments within it) have equal chances of being selected. The same rule applies to sectioning blocks and recording fields from sections. With such a strategy, the validity of relative volume estimation from test point counts is assured (Howard & Reed, 1998). For analysing membrane compartments, it is necessary also to randomise directions of sectioning by means of suitable sampling tools (Baddeley et al., 1986; Mattfeldt et al., 1990; Nyengaard & Gundersen, 1992). This is important for estimating relative membrane surfaces from counts of test line intersections.

Appropriately prepared plastic sections, or thawed cryosections, are labelled with antibody and a colloidal gold reagent. Labelled grids are examined in the TEM and fields of view selected by systematic random sampling. When using support grids, the pattern of spaces (windows) between grid bars can be used to fix the sampling pattern. For example, a very convenient way of sampling is to focus on the portion of specimen that occupies a predetermined corner of a grid window and repeat this for all equivalent corners in other sampled windows. The encounters between grid bars and specimen section are randomised during the process of mounting the section on the grid and choosing a grid corner. Provided that a randomly selected field shows the cell type of interest, it must be analysed regardless of its content. The field cannot be moved in order to ensure that a complete cell profile, or an ‘interesting’ or ‘typical’ organelle, is contained within the field. Moreover, fields must not be restricted to cell profiles containing, say, a nuclear profile or Golgi complex. These would constitute component-biased samples (Mayhew, 1979).

Finally, when recording micrographs of random fields, it is more efficient to select the optimal magnification. This is the minimum magnification at which gold particles and cell compartments can be clearly identified. Higher magnification will not improve identification but will compromise sampling efficiency by increasing field-to-field variation (Cruz-Orive & Weibel, 1981). Clearly, practical decisions are necessary to ensure that appropriate magnifications are used for, say, small (5 nm) vs. large (15 nm) gold particles. In multiple-labelling experiments, the magnification used may have to be determined by the smaller golds. Similar issues influence the choice of magnifications when dealing with extremes of gold labelling, i.e. instances of small numbers of gold particles or of very intense labelling. To deal with the former, it may be sensible to count golds at high magnification (about × 15 000) but to monitor compartments at low power. For instances of intense labelling, it is more efficient to subsample golds rather than count hundreds of golds over some compartments and few golds over others. This can be undertaken using the fractionator principle (Gundersen, 1986). For example, quadrats occupying a known fraction of the grid area are sampled and golds associated with a given intensely labelled compartment are counted. The corresponding point (or intersection) totals can then be adjusted accordingly with the known sampling fraction.

2.3. Selecting subcellular compartments

Having sampled appropriately and selected optimal magnifications, gold particles are counted on all selected fields and assigned to particular compartments (Lucocq, 1993). At this stage, important decisions must be made about the types and numbers of compartments in a given category (organelle or membrane). Decisions will be determined partly by experience and expectation (based on knowledge of the antigens and antibodies or the findings of a preliminary pilot study) and the needs of a particular investigation. It might be argued that, to avoid unwarranted assumptions, no compartment must be excluded but, in practice, a sensible balance between resolution and noise must be struck. In this context, resolution refers to the number of compartments, and noise to the observed variation of gold or test probe counts within a compartment. Generally speaking, the greater the number of compartments, the greater the resolution (or more precise the localization) but the greater the noise associated with infrequently occurring, small or poorly labelled compartments. It remains the case that accuracy is determined by the specificity of the primary antibody.

For the above reasons, it is preferable to restrict the number of compartments to no more than 10–12 (if no prior information is available about the location of possible antigen sites). It is also important to sample compartments representing those likely and unlikely to be labelled specifically. This will make statistical testing effective. For low labelling densities, a quick and unbiased procedure for deciding on which compartments to include is to systematically sample 1–2 labelled grids and count about 100 golds, identifying the compartments with which they are associated. To ensure that all subcellular structures are covered by the analysis, interesting individual compartments may be distinguished from others by including a category ‘residual compartment’. All structures not of individual interest (and their associated gold counts) can then be classified as belonging to this artificial composite compartment.

2.4. Quantitative and statistical procedures

2.4.1. Model I: Organelle (volume) compartments

If volume-occupying compartments are selected as the most likely label targets, the next step is to count gold particles on those compartments on all randomly sampled fields from all randomly sampled grids and blocks representing the cell population of interest. The distribution of those counts between different compartments then represents the ‘observed’ distribution for purposes of statistical evaluation (see Fig. 1A). To obtain relative labelling intensities (RLI or LD) and the ‘expected’ distribution, the randomly sampled TEM fields are overlain with a lattice of test points. If this is positioned randomly on successive fields (essentially, this means nothing more than ensuring that lattices are superimposed on fields so that test point positions are independent of the compartments seen on the fields), the resulting distribution of points will represent that which would be expected if gold particles were distributed purely randomly throughout cell volume (Fig. 1B). In this way, estimates of both RLI (and, if necessary, LD) can be obtained. A convenient statistical test of the significance or otherwise of apparent differences between expected and observed distributions within an experimental group is provided by the chi-squared test (see below). In theory, if there is no specific labelling and no preferential (non-specific) adhesive labelling, application of gold marker alone should provide the expected (random) distribution. Therefore, if application of gold marker in the absence of primary antibody produces a non-random distribution, this approach can be used to adduce evidence of preferential but non-specific labelling (e.g. adhesion).

Figure 1.

The use of gold particles and test point probes to determine observed and expected gold labelling distributions in volume-(organelle-) based compartments. A cell profile, part of a larger randomly sampled set, is being analysed in terms of two compartments: nucleus and cytoplasm. In A, 14 gold particles (black circles) are distributed such that no[N] = 10 lie on nucleus and no[CYT]= 4 on cytoplasm (the observed gold distribution). In B, a lattice of test points (lower left angles of lattice squares) is superimposed on the cell image so as to be random in position. In total, 15 points hit the cell of which P[N]= 6 hit nucleus and P[CYT] = 9 hit cytoplasm. The point pattern determines the corresponding expected gold distribution (ne[N] = 6 × 14/15 = 5.6; ne[CYT] = 9 × 14/15 = 8.4). Observed and expected gold distributions from sets of cell profiles allow calculation of relative labelling indices, no/ne (without needing to know lattice constants or magnification) and chi-squared values. The latter are used to test whether or not observed golds are distributed differently from the expected (random) distribution. Labelling densities (golds per profile area) can be calculated only if the lattice area per test point and specimen magnification are known.
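The arithmetic in the Figure 1 legend can be reproduced with a short Python sketch. The function name `expected_golds` and the dictionary layout are illustrative assumptions; the counts are those quoted in the legend for a single cell profile, whereas a real analysis would pool counts over the whole set of sampled profiles.

```python
def expected_golds(probe_counts, total_golds):
    """Share out the observed gold total in proportion to probe counts."""
    total_probes = sum(probe_counts.values())
    return {
        name: count * total_golds / total_probes
        for name, count in probe_counts.items()
    }


# Counts taken from the Figure 1 legend.
observed = {"nucleus": 10, "cytoplasm": 4}   # gold particles, n_o
points = {"nucleus": 6, "cytoplasm": 9}      # test points, P

expected = expected_golds(points, sum(observed.values()))
rli = {name: observed[name] / expected[name] for name in observed}

for name in observed:
    print(f"{name}: n_e = {expected[name]:.1f}, RLI = {rli[name]:.2f}")
```

Neither the lattice area per point nor the magnification appears anywhere in the calculation, which is the practical advantage of RLI over labelling density.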

2.4.2. Model II: Membrane (surface) compartments

As for organelle compartments, the distribution of immunogold counts between different membrane compartments equates to the observed distribution (Fig. 2A). However, the expected distribution must now be obtained by superimposing lattices of test lines on TEM fields and counting intersections which these lines make with sectional traces of different membrane types (Fig. 2B). Provided that encounters between test lines and membrane surfaces are random in position and orientation (Cruz-Orive & Weibel, 1990; Mayhew, 1991, 1992; Howard & Reed, 1998), the resulting distribution of test intersections represents that which would be obtained if gold particles were distributed purely randomly over the aggregate surface of all selected types of membrane. If it is justified to assume that all types of membrane share the same surface orientation characteristics, sampling conditions can be relaxed so that only section position need be randomised. Otherwise, both position and section orientation must be randomised using appropriate sampling tools (Stringer et al., 1982; Baddeley et al., 1986; Mattfeldt et al., 1990; Nyengaard & Gundersen, 1992).

Figure 2.

The use of gold particles and test line probes to determine observed and expected gold labelling distributions on surface-(membrane-) based compartments. Here, the cell profile is analysed in terms of four compartments: plasma, nuclear, granule and residual membranes. In A, 13 gold particles (black circles) are distributed such that no[PM] = 11 lie on plasma membrane, no[NM] = 0 on nuclear membranes, no[GM] = 1 on granule membranes and no[RM] = 1 on other or residual membranes (the observed gold distribution). In B, a lattice of test lines is superimposed so as to be random in position and orientation. The lines (or, rather, their left-hand borders) make 52 intersections with membranes: I[PM] = 18 with plasma membrane, I[NM] = 8 with outer nuclear membranes, I[GM] = 10 with granule membranes and I[RM] = 16 with residual membranes. This pattern determines the expected gold distribution (ne[PM] = 18 × 13/52 = 4.5, ne[NM] = 8 × 13/52 = 2.0 and so on). Again, observed and expected gold distributions from sets of cell profiles permit calculation of relative labelling indices (no/ne) and chi-squared values. Labelling densities (golds per membrane trace length) can be calculated if lattice line lengths per intersection and specimen magnification are known.

The practical problem of estimating relative surface extents is similar in sampling terms to that of estimating lengths of filamentous structures. Therefore, to compare filamentous compartments (e.g. actin and intermediate filaments, microtubules) an expected distribution can also be constructed by counting test line intersections. Now, these will be intersections between test lines and images of linear features seen on sections.

2.4.3. Testing for differences between observed and expected distributions

A convenient way of comparing expected and observed distributions statistically is the chi-squared (χ²) test (Sokal & Rohlf, 1981). This can be used to compare distributions within an experimental group (e.g. control or treated) or, when used in conjunction with contingency tables (Sokal & Rohlf, 1981), to compare two or more groups (e.g. control vs. treated).
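For reference, the quantities used in these comparisons can be written compactly as follows. The subscripted notation is an assumption introduced here for clarity: i indexes compartments, P_i is the test point count for compartment i (replaced by the intersection count I_i for membranes), and R_i, C_j and G denote the row, column and grand totals of a contingency table with r rows and c columns.

```latex
% Within-group comparison (observed vs. expected golds)
n_{e,i} = \frac{P_i}{\sum_k P_k} \sum_k n_{o,k}, \qquad
\mathrm{RLI}_i = \frac{n_{o,i}}{n_{e,i}}, \qquad
\chi^2 = \sum_i \frac{(n_{o,i} - n_{e,i})^2}{n_{e,i}}

% Between-group comparison (contingency table)
n_{e,ij} = \frac{R_i \, C_j}{G}, \qquad
\chi^2 = \sum_i \sum_j \frac{(n_{o,ij} - n_{e,ij})^2}{n_{e,ij}}, \qquad
\mathrm{d.f.} = (r - 1)(c - 1)
```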

2.5. Specimen worked examples

2.5.1. Example 1: Within-group comparisons (model I – organelles)

Imagine that immunogold was used to label nine subcellular volume compartments in a group of macrophages: early endosomes (EE), late endosomes (LE), phagosomes (PH), endoplasmic reticulum cisternae (ER), the Golgi complex (GC), mitochondria (M), cytosol (C), nucleus (N) and the residuum (R). On analysing TEM fields, the following gold counts (no) were found on those compartments: no[EE] = 23, no[LE] = 20, no[PH] = 22, no[ER] = 14, no[GC] = 23, no[M] = 39, no[C] = 202, no[N] = 55 and no[R] = 3, giving a combined total of 401 gold particles. This represents the observed distribution. Next, imagine applying a test point lattice on which each test point occupied an area ap = 2 µm² on the scale of the specimen (ap is obtained by dividing the lattice test point area by M², where M is the linear magnification). With this lattice, the corresponding test point totals (P) for the same cellular compartments were found to be P[EE] = 4, P[LE] = 9, P[PH] = 32, P[ER] = 23, P[GC] = 10, P[M] = 43, P[C] = 270, P[N] = 127 and P[R] = 3, giving a combined total of 521 test points. When normalized for the total number of observed golds, these test point totals correspond to an expected gold count (ne) distribution of ne[EE] = 3.08, ne[LE] = 6.93, ne[PH] = 24.63, ne[ER] = 17.70, ne[GC] = 7.70, ne[M] = 33.10, ne[C] = 207.81, ne[N] = 97.75 and ne[R] = 2.31, combined total = 401. This is the distribution that is expected if there is random or non-specific labelling.

RLI values (no/ne, expressed as observed golds per expected gold) for the above dataset are given in Table 1, together with the partial (compartmental) and total χ² values. Labelling densities (no/(P·ap), expressed as numbers of gold particles µm⁻² of compartment sectional area) are provided for the sake of completeness and to emphasize that RLI and LD values express the same relative labelling but in slightly different ways. The partial χ² value in each row (Table 1) is obtained simply using the formula (no − ne)²/ne. For example, the partial χ² value for early endosomes is calculated as (23 − 3.08)²/3.08 = 396.81/3.08 = 128.83. The total χ² (obtained by summing the partial values) is 205.05 and, for 8 degrees of freedom (2 − 1 columns by 9 − 1 rows), this means that the null hypothesis of no difference between expected and observed gold particle distributions must be rejected at a probability level of P < 0.001. In short, the observed distribution of gold particles is not random. A glance at the partial χ² values indicates that the most important contributors to total χ² are the early and late endosomes and Golgi complex (Table 1). Values of RLI (and LD) show the nature of these contributions. If a compartment is randomly labelled, its RLI = 1 and its partial χ² = 0. If the compartment is preferentially labelled, RLI will have a value > 1 and its partial χ² will make an important contribution to total χ². On these criteria, it can be concluded that endosomal and Golgi compartments are preferentially labelled.

Table 1.  Observed and expected distributions of gold particles in organelle compartments and calculation of labelling density, relative labelling index and chi-squared values (see text Example 1). Asterisks (*) identify compartments that are preferentially labelled.
| Compartment | Observed golds, no | Observed points, P | Expected golds, ne | Labelling density, no/(P·ap), golds µm⁻² | Relative labelling index, no/ne | Partial chi-squared values |
|---|---|---|---|---|---|---|
| Early endosomes | 23 | 4 | 3.08 | 2.88 | 7.47* | 128.83* |
| Late endosomes | 20 | 9 | 6.93 | 1.11 | 2.89* | 24.65* |
| Phagosomes | 22 | 32 | 24.63 | 0.34 | 0.89 | 0.28 |
| Endoplasmic reticulum | 14 | 23 | 17.70 | 0.30 | 0.79 | 0.77 |
| Golgi complex | 23 | 10 | 7.70 | 1.15 | 2.99* | 30.40* |
| Mitochondria | 39 | 43 | 33.10 | 0.45 | 1.18 | 1.05 |
| Cytosol | 202 | 270 | 207.81 | 0.37 | 0.97 | 0.16 |
| Nucleus | 55 | 127 | 97.75 | 0.22 | 0.56 | 18.70 |
| Residuum | 3 | 3 | 2.31 | 0.50 | 1.30 | 0.21 |
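The Table 1 calculations can be checked with the following Python sketch, which uses only the gold and point counts given in Example 1. The variable names are illustrative assumptions. Because the code keeps expected golds at full precision, whereas the table works from values rounded to two decimal places, the partial and total χ² values it returns differ slightly from the tabulated ones (the total comes out at about 205.2 rather than 205.05).

```python
AREA_PER_POINT_UM2 = 2.0  # a_p from Example 1, in square micrometres

# compartment: (observed golds n_o, test points P), from Example 1
counts = {
    "EE": (23, 4), "LE": (20, 9), "PH": (22, 32),
    "ER": (14, 23), "GC": (23, 10), "M": (39, 43),
    "C": (202, 270), "N": (55, 127), "R": (3, 3),
}

total_golds = sum(n_obs for n_obs, _ in counts.values())
total_points = sum(points for _, points in counts.values())

rows = {}
for name, (n_obs, points) in counts.items():
    n_exp = points * total_golds / total_points  # expected golds, n_e
    rows[name] = {
        "expected": n_exp,
        "ld": n_obs / (points * AREA_PER_POINT_UM2),  # golds per square micrometre
        "rli": n_obs / n_exp,
        "partial_chi2": (n_obs - n_exp) ** 2 / n_exp,
    }

total_chi2 = sum(row["partial_chi2"] for row in rows.values())
degrees_of_freedom = len(counts) - 1

for name, row in rows.items():
    print(f"{name:>2}  n_e={row['expected']:7.2f}  LD={row['ld']:.2f}  "
          f"RLI={row['rli']:.2f}  chi2={row['partial_chi2']:.2f}")
print(f"total chi2 = {total_chi2:.2f} on {degrees_of_freedom} d.f.")
```

Note that the lattice constant enters only the LD column; removing `AREA_PER_POINT_UM2` leaves the RLI and χ² values unchanged.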

2.5.2. Example 2: Within-group comparisons (model II – membranes)

Imagine that the aim is to examine gold particle distributions between five selected membrane compartments in a group of secretory cells: plasma membrane (PM), Golgi membrane (GM), rough ER membrane (RERM, including the outer nuclear membrane), secretory granule membrane (SGM) and residual membrane (RM, including unidentifiable membranes). The following gold counts (no) on those membranes were detected: no[PM] = 49, no[GM] = 6, no[RERM] = 129, no[SGM] = 228 and no[RM] = 20, a combined total of 432 gold particles. Again, this represents the observed distribution. Now imagine that membrane traces on TEM sections were analysed using a test line lattice on which each intersection was equivalent to a membrane trace length (corrected for magnification) of li = 1.3 µm. With this lattice, the corresponding intersection totals (I) for the same membrane compartments were found to be I[PM] = 133, I[GM] = 18, I[RERM] = 111, I[SGM] = 112 and I[RM] = 14, total = 388 intersections. When adjusted to the observed total gold count, intersection counts correspond to expected gold counts (ne) of ne[PM] = 148.08, ne[GM] = 20.04, ne[RERM] = 123.59, ne[SGM] = 124.70 and ne[RM] = 15.59, combined total = 432 golds.

The values of RLI (no/ne, observed golds per expected gold) and LD (no/(I·li), expressed as golds µm⁻¹ of membrane trace length on sections) are provided in Table 2. The table also shows partial and total χ² values for the five membrane compartments. Total χ² is 163.19 which, for 4 degrees of freedom (2 − 1 columns by 5 − 1 rows), means that the null hypothesis must be rejected at a probability level of P < 0.001. Again, the observed distribution of gold particles is non-random. Inspection of partial χ² values shows that the most important contributor to total χ² is secretory granule membrane and the associated RLI (and LD) estimates show that these membranes are preferentially labelled. This is the only membrane compartment that has significantly more observed golds than expected golds.

Table 2.  Observed and expected distributions of gold particles on membrane compartments and calculation of labelling density, relative labelling index and chi-squared values (see text Example 2). Asterisks (*) identify the compartment that is preferentially labelled.
| Compartment | Observed golds, no | Observed intersections, I | Expected golds, ne | Labelling density, no/(I·li), golds µm⁻¹ | Relative labelling index, no/ne | Partial chi-squared values |
|---|---|---|---|---|---|---|
| Plasma membrane | 49 | 133 | 148.08 | 0.28 | 0.33 | 66.29 |
| Golgi membrane | 6 | 18 | 20.04 | 0.26 | 0.30 | 9.84 |
| Rough ER membrane | 129 | 111 | 123.59 | 0.89 | 1.04 | 0.24 |
| Secretory granule membrane | 228 | 112 | 124.70 | 1.57 | 1.83* | 85.57* |
| Residual membrane | 20 | 14 | 15.59 | 1.10 | 1.28 | 1.25 |
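To attach a probability to the total χ² of Example 2 without statistical tables, one option is the closed-form upper-tail probability of the chi-squared distribution, which exists when the degrees of freedom are even (as they are here, with 4). The sketch below is an illustration under that assumption; the function name `chi2_upper_tail_even_df` is hypothetical, and a statistics package would normally be used instead.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-squared variable with an even number of d.f.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!, which is exact
    for even df only.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form is valid only for positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# membrane: (observed golds n_o, intersections I), from Example 2
counts = {
    "PM": (49, 133), "GM": (6, 18), "RERM": (129, 111),
    "SGM": (228, 112), "RM": (20, 14),
}

total_golds = sum(n_obs for n_obs, _ in counts.values())
total_hits = sum(hits for _, hits in counts.values())

total_chi2 = 0.0
for n_obs, hits in counts.values():
    n_exp = hits * total_golds / total_hits  # expected golds, n_e
    total_chi2 += (n_obs - n_exp) ** 2 / n_exp

df = len(counts) - 1
p_value = chi2_upper_tail_even_df(total_chi2, df)
print(f"chi2 = {total_chi2:.2f}, d.f. = {df}, P < 0.001: {p_value < 0.001}")
```

The same function applies to the 8 degrees of freedom of Example 1, since 8 is also even.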

2.5.3. Example 3: Between-group comparisons by contingency table analysis (model I – organelles)

In a real example, it was decided to investigate the effects of antibody dilution on the immunogold labelling of nine organelle compartments in each of two groups of cells. The selected compartments were early endosomes (EE), late endosomes (LE), phagosomes (PH), endoplasmic reticulum cisternae (ER), Golgi complex (GC), mitochondria (M), cytosol (C), nucleus (N) and residuum (R). At higher antibody concentration, the following gold counts were obtained: no[EE] = 10, no[LE] = 0, no[PH] = 14, no[ER] = 36, no[GC] = 0, no[M] = 48, no[C] = 182, no[N] = 84 and no[R] = 48, total = 422. The corresponding totals at lower concentration were no[EE] = 20, no[LE] = 24, no[PH] = 12, no[ER] = 20, no[GC] = 0, no[M] = 16, no[C] = 197, no[N] = 54 and no[R] = 31, total = 374. The grand total of golds is 796. After test point counting, and within group comparisons (see above), it was found that both of these distributions were non-random. But does this mean that the distributions at different antibody dilutions are the same?

To answer this question, we must compare the two dilution distributions. However, it is not necessary to construct expected distributions from test point counts or to calculate RLI or LD in order to achieve this. Instead, a contingency table analysis can be undertaken. Table 3 provides a contingency table (two columns × nine rows) analysis of these results. In this analysis, the expected gold counts at each dilution are calculated from the row, column and grand totals. For example, for early endosomes at the 1 : 100 dilution the expected number of golds (15.90, Table 3) is calculated from the product of the column (422) and row (30) totals divided by the grand total (796), i.e. 422 × 30/796 = 15.90. The corresponding partial χ2 value is calculated as the square of the difference between observed and expected golds (5.90² = 34.81) divided by the expected golds (15.90), i.e. 34.81/15.90 = 2.19.

Table 3.  Observed and expected distributions of gold particles in organelle compartments of cells exposed to different dilutions of antibody and calculation of chi-squared values in contingency table analysis (see text Example 3).
| Compartment | Observed golds (1 : 100) | Expected golds (1 : 100) | Observed golds (1 : 400) | Expected golds (1 : 400) | Row Totals | Partial Chi-squared values (1 : 100, 1 : 400) |
|---|---|---|---|---|---|---|
| Early endosomes | 10 | 15.90 | 20 | 14.10 | 30 | 2.19, 2.47 |
| Late endosomes | 0 | 12.72 | 24 | 11.28 | 24 | 12.72, 14.36 |
| Phagosomes | 14 | 13.78 | 12 | 12.22 | 26 | 0.00, 0.00 |
| Endoplasmic reticulum | 36 | 29.69 | 20 | 26.31 | 56 | 1.34, 1.51 |
| Golgi complex | 0 | 0 | 0 | 0 | 0 | 0, 0 |
| Mitochondria | 48 | 33.93 | 16 | 30.07 | 64 | 5.83, 6.58 |
| Cytosol | 182 | 200.93 | 197 | 178.07 | 379 | 1.78, 2.01 |
| Nucleus | 84 | 73.16 | 54 | 64.84 | 138 | 1.61, 1.81 |
| Residuum | 48 | 41.88 | 31 | 37.12 | 79 | 0.89, 1.01 |
| Column Totals | 422 | 422 | 374 | 374 | 796 | 56.11 |
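
The contingency table arithmetic of Example 3 can be reproduced with the following sketch. It is an illustration under stated assumptions, not the authors' software: the function name `contingency_chi2` is hypothetical, the counts are those given for the two dilutions, and a row with no golds in either group (here the Golgi complex) is treated as contributing zero to χ2, as in Table 3.

```python
def contingency_chi2(table):
    """Expected counts, partial chi-squared values and total chi-squared.

    table is a list of rows (compartments), each row holding the observed
    gold counts for every group (columns). Illustrative helper only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected, partial = [], []
    for row, row_total in zip(table, row_totals):
        exp_row = [row_total * col_total / grand_total for col_total in col_totals]
        # An all-zero row has expected counts of 0 and adds nothing to chi2.
        par_row = [(o - e) ** 2 / e if e > 0 else 0.0 for o, e in zip(row, exp_row)]
        expected.append(exp_row)
        partial.append(par_row)
    total = sum(sum(par_row) for par_row in partial)
    return expected, partial, total


# Observed golds from Example 3: columns are the 1 : 100 and 1 : 400 dilutions.
observed = [
    [10, 20],    # early endosomes
    [0, 24],     # late endosomes
    [14, 12],    # phagosomes
    [36, 20],    # endoplasmic reticulum
    [0, 0],      # Golgi complex
    [48, 16],    # mitochondria
    [182, 197],  # cytosol
    [84, 54],    # nucleus
    [48, 31],    # residuum
]

expected, partial, total = contingency_chi2(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1) x (columns - 1)
print(f"early endosomes, 1 : 100: expected {expected[0][0]:.2f}, partial chi2 {partial[0][0]:.2f}")
print(f"total chi2 = {total:.2f} on {df} degrees of freedom")
```

Because the sketch carries unrounded expected values through the calculation, its total may differ in the second decimal place from the 56.11 obtained by summing the rounded entries of Table 3.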

Taken together, the data in Table 3 yield a total χ2 value of 56.11 and so the null hypothesis (no difference between labelling distributions obtained at alternative dilutions) must be rejected (P < 0.001 for 8 degrees of freedom). Although both dilution distributions are non-random, they differ in the manner in which they depart from randomness. The main differences between dilutions involve the labellings of late and early endosomes (which are lower than expected at the higher antibody concentration) and mitochondria (greater than expected at the higher concentration).

2.6. Practical application: Non-specific adhesion of gold to organelle compartments

It is well known that proteins, including antibodies, tend to adhere non-specifically to section surfaces. This is thought to be due primarily to hydrophobic and electrostatic interactions (Griffiths, 1993). For this reason, it is standard procedure in immunocytochemical labelling experiments to treat the grid (before applying the antibody) with a protein (or mixture of proteins) that can block the most adhesive sites on the sections. To test whether or not the RLI method could detect preferential but non-specific adhesion of gold to intracellular organelles, we undertook non-specific labelling of compartments in freeze-substituted liver cells. Livers from Wistar rats were perfusion-fixed (30 min, room temperature) via the portal vein with 0.5% glutaraldehyde in 0.2 M PIPES buffer (pH 7.2). Tissue blocks were cryoprotected in 0.8 M sucrose in 5 mM PIPES buffer and frozen in liquid nitrogen before freeze substitution and embedding in Lowicryl HM20. Sections were cut at 90 nm thickness and incubated first on fish-skin gelatine (0.5% w/v) in phosphate buffered saline (PBS) (10 min). Sections were then incubated only with the second-step reagent, 12 nm protein A-gold. This was prepared at higher-than-normal concentration (10-fold higher than that employed to produce minimal background) so as to promote non-specific adhesion. Sections were washed in PBS and distilled water, dried and contrasted using uranyl acetate and lead citrate.

Fields sampled by systematic random procedures were viewed at a final linear magnification of × 250 000. Within cell profiles, nine volume compartments were examined: nucleus, mitochondria, cytosol, endoplasmic reticulum, Golgi complex, endosomes, peroxisomes, lipid droplets and residuum. Counts of gold particles and test points were completed in roughly 30 min.

Results are summarized in Table 4. The observed distribution of gold particles did not conform to that expected for a random distribution (total χ2 = 274.98; 8 degrees of freedom; P < 0.001). Values of RLI and partial χ2 revealed preferential adhesion to lipid droplets, endoplasmic reticulum and nucleus.

Table 4.  Observed and expected distributions of 12 nm protein A-gold in organelle compartments of freeze-substituted liver cells. Sections were incubated with the second step reagent only and at higher-than-normal concentration. There is preferential sticking to lipid droplets, ER and nucleus (asterisks *) and this is non-specific.
| Compartment | Observed golds, no | Observed points, P | Expected golds, ne | Relative Labelling Index, no/ne | Partial Chi-squared values |
|---|---|---|---|---|---|
| Nucleus | 177 | 14 | 122.61 | 1.44* | 24.13* |
| Mitochondria | 194 | 25 | 218.94 | 0.89 | 2.84 |
| Cytosol | 710 | 90 | 788.20 | 0.90 | 7.76 |
| Endoplasmic reticulum | 148 | 9 | 78.82 | 1.88* | 60.72* |
| Golgi complex | 9 | 3 | 26.27 | 0.34 | 11.35 |
| Endosomes | 15 | 2 | 17.52 | 0.86 | 0.36 |
| Peroxisomes | 8 | 3 | 26.27 | 0.30 | 12.71 |
| Lipid droplets | 129 | 6 | 52.55 | 2.45* | 111.22* |
| Residuum | 20 | 9 | 78.82 | 0.25 | 43.89 |
| Column Totals | 1410 | 161 | 1410 | 1.00 | 274.98 |
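
The probability levels quoted for the total χ2 values can be checked without statistical tables. The sketch below uses the standard closed-form upper-tail probability of the chi-squared distribution, which holds only for an even number of degrees of freedom; that covers the 4 and 8 degrees of freedom met in the examples. The function name `chi2_upper_tail_even_df` is hypothetical, and for odd degrees of freedom printed tables or a statistics library would be needed instead.

```python
import math


def chi2_upper_tail_even_df(chi2, df):
    """P(X >= chi2) for a chi-squared variable with even degrees of freedom.

    Uses Q = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!, valid for even df only.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = chi2 / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Total chi-squared values and degrees of freedom reported in the text.
reported = {
    "Table 2 (membranes)": (163.19, 4),
    "Table 3 (dilutions)": (56.11, 8),
    "Table 4 (protein A-gold)": (274.98, 8),
}
p_values = {name: chi2_upper_tail_even_df(x, df) for name, (x, df) in reported.items()}
for name, p in p_values.items():
    print(f"{name}: P = {p:.3g}")
```

Each of the three reported totals gives a probability well below 0.001, in line with the conclusions drawn in the text.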

3. Discussion

This study has demonstrated a simple and effective method for quantifying immunogold labelling distributions, expressing them as indices of relative labelling (RLI) rather than labelling densities (LD), and testing whether or not distributions indicate random or differential labelling. The methods are applicable to compartments that occupy volumes (organelles) or surfaces (membranes) but may be adapted for linear features (microfilaments and microtubules). RLI differs from LD: the latter represents the observed number of gold particles µm−2 of compartment area (organelles), or µm−1 of compartment length (membranes or filaments), whereas RLI represents the observed number of golds per expected (randomly distributed) gold. It also differs from labelling index (Nemali et al., 1988; Ogiwara et al., 1999), which is the product of organelle LD and volume fraction. As such, labelling index merely changes the reference space from the specific organelle to a larger compartment and expresses gold density per unit area of reference space (e.g. cytoplasmic or cell profile area). Consequently, it conveys no more useful comparative information than organelle LD. In fact, as the number of golds will not correlate so well with the size of the expanded reference space, labelling index must be expected to be a less efficient estimator than LD or RLI. To calculate the latter, the expected distribution of golds is obtained simply from observed point or intersection counts because these represent distributions that are expected if gold particles distribute randomly between compartments. The worked examples illustrate how such analyses can be used to draw statistical comparisons between observed and expected distributions within a given experimental group and between observed distributions in different groups. A specimen study on liver cells revealed that protein A-gold applied at high concentration can exhibit preferential but non-specific adhesion to certain organelle compartments.

Both LD and RLI are calculated by counting test probes as well as gold particles. They are also generically similar in expressing the same relative compartmental labelling. The distinction is in the specific nature of the estimator and the fact that RLI can be calculated without needing to know the constants of test lattices employed for point or intersection counting or the magnification at which sections are analysed. In this sense, RLI is a more efficient estimator than LD, despite the fact that both ratio estimators share the same precision of estimation. Interestingly, the nature of RLI estimation allows a further practical increase in efficiency because gold particles, test points and intersections can all be counted directly at the microscope with little, if any, modification of the screen binocular. This makes the method more accessible to those less experienced in quantitative immunocytochemistry. The organelle-based variant of the RLI estimator is similar to an index developed to quantify silver grains on electron microscopic autoradiographs (Williams, 1977). Also, in an earlier immunocytochemical study (Slot et al., 1991), test points were used to predict random gold-labelling distributions in organelle and membrane compartments.
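
The independence of RLI from the lattice constant can be illustrated numerically. In the sketch below, which is an illustration only, LD is computed for two arbitrary (hypothetical) values of area per test point, and each compartment's LD is divided by the overall LD for all compartments combined. The lattice constant cancels in that ratio, which equals the RLI obtained directly from the raw counts. The counts are those of Table 4; the function names are hypothetical.

```python
def labelling_density(golds, points, area_per_point):
    """LD per compartment (golds per unit area); needs the lattice constant."""
    return [g / (p * area_per_point) for g, p in zip(golds, points)]


def rli_from_counts(golds, points):
    """RLI per compartment from raw counts alone; no lattice constant needed."""
    total_golds, total_points = sum(golds), sum(points)
    return [g / (total_golds * p / total_points) for g, p in zip(golds, points)]


# Gold and test point counts from Table 4.
golds = [177, 194, 710, 148, 9, 15, 8, 129, 20]
points = [14, 25, 90, 9, 3, 2, 3, 6, 9]

rli = rli_from_counts(golds, points)

ratios_by_constant = {}
for area_per_point in (0.01, 0.25):  # hypothetical lattice constants, µm² per point
    ld = labelling_density(golds, points, area_per_point)
    overall_ld = sum(golds) / (sum(points) * area_per_point)
    # Compartment LD relative to overall LD: the lattice constant cancels.
    ratios_by_constant[area_per_point] = [d / overall_ld for d in ld]

for area_per_point, ratios in ratios_by_constant.items():
    print(area_per_point, [round(r, 2) for r in ratios])
print("RLI ", [round(r, 2) for r in rli])
```

The LD values themselves change with the assumed lattice constant, whereas the ratios, and hence the RLI values, do not.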

The practical value of an estimator may be expressed in terms of two statistical qualities: bias (systematic error) and precision (random error). Generally speaking, biases affecting immunogold and test probe counts can arise from sampling inadequacies or technical limitations. The biases include the effects of antibody specificity. Precision (synonymous with accuracy only when there is no bias) is influenced by technical factors but also by sample sizes, e.g. numbers of blocks per cell group, grid sections per block, TEM fields per section, etc. (Shay, 1975; Cruz-Orive & Weibel, 1981; Gundersen & Østerby, 1981; Gupta et al., 1983; Gundersen & Jensen, 1987; Gundersen et al., 1999). Sampling bias is eliminated by randomised sampling at every stage of selection from the collection of cells to superimposing test lattice probes on TEM fields. Test point and intersection counts are influenced by the ability to recognize organelle profiles and membrane traces and this depends on factors such as image contrast, magnification (and lateral resolution) and membrane sectioning angle (Mall et al., 1977; Paumgartner et al., 1981; Mayhew & Reith, 1988). For these reasons and others, it is sensible to incorporate into the compartment categorization scheme a portmanteau compartment into which ambiguous structures can be placed.

On the immunocytochemical side, gold counts are influenced by factors such as antibody dilution, specificity and labelling efficiency (Griffiths, 1993). These factors have effects that determine whether or not a reproducible labelling distribution represents the ‘true’ distribution. Even the most comprehensive evidence for monospecificity of antibody (Western blotting, immunoprecipitation, etc.) does not guarantee the same specificity for antigen at the surface of the TEM section. It is a compelling argument of specificity when cells that totally lack antigen fail to label. However, there is, at present, no way to prove directly and unequivocally that all antibody molecules considered to bind specifically really do bind to the true antigen on the section surface. As arguments of specificity are indirect, we refer to the tissue–gold interactions simply as label.

We emphasize that, in the present context, detection of preferential labelling should not be interpreted as indicating specificity. This is reinforced by the findings illustrated in Table 4. Specificity is a distinct issue that is not addressed by the RLI (or LD) method alone. Instead, other methods and experiments will be required in order to tackle this issue. Qualitative methods include obtaining biochemical proof of specificity by means of immunoblotting or immunoprecipitation. Quantitative methods include modulating the amounts of antigen in situ, e.g. using transfection/expression systems based on transiently or stably expressing cell lines, producing physiological changes in antigen location, producing knock-out cells, and cell or specificity modulation. However, although the RLI method does not address the issue of specificity, it does offer a powerful way of testing whether or not labelling patterns are specific in terms of any specificity experiments that are undertaken.

Labelling efficiency represents the ratio between the number of gold particles and number of antigens. Unfortunately, the number of gold particles hardly ever corresponds to the number of antigens in a compartment because labelling efficiency is influenced by various technical factors involved in section preparation and labelling. Moreover, labelling efficiencies may vary between compartments (Griffiths & Hoppeler, 1986; Chang et al., 1988) making it difficult to draw, from relative gold counts, conclusions about relative amounts of antigen in different compartments. However, labelling conditions are likely to remain sufficiently constant within compartments (Posthuma et al., 1984) provided that processing and preparative conditions are standardized. Therefore, in between-group comparisons of the same compartment, changes in RLI (as in LD) should be commensurate with changes in antigen concentration.

The precision of gold and test probe distributions within and between compartments clearly depends on the numbers of golds and test probes that are counted. Independent stereological studies have demonstrated that acceptable levels of estimation precision can be obtained by counting as few as 100–200 test probes on a given compartment or reference space (Braendgaard & Gundersen, 1986; Gundersen, 1986; Gundersen & Jensen, 1987; Mayhew, 1990). However, this does not apply to a heterogeneous compartment or reference space comprising multiple (sub)compartments which need to be estimated individually. Consequently, RLI and LD will be estimated more precisely in those compartments with higher gold counts. In companion studies, we have found that reproducible observed distributions can be obtained counting as few as 100 golds on at least two grids. Therefore, the levels of counting effort illustrated in our worked examples are realistic. They may be achieved for a set of fields (viewed directly down the microscope or as micrographs) in less than an hour. The precision of gold localization depends, of course, on the number of compartments selected and on how homogeneous, or otherwise, those compartments are. Clearly, if practicable, better resolution of localization will be achieved by selecting compartments that are homogeneous (e.g. selecting early and late endosomes is preferable to having a portmanteau compartment entitled endosomes). This means that decisions will have to be made about balancing the number of compartments against the number of golds per compartment. Higher resolution is likely to require higher counts, more sections and, generally, more work.

Another factor determining the outcome of statistical testing (i.e. whether the null hypothesis is accepted or rejected) is the choice of compartments within a given class. Whilst all intracellular structures (labelled and unlabelled) must be covered by the analysis, it is not necessary that all compartments comprise a single type of structure. To achieve this, whilst maintaining a reasonable number of compartments, it is therefore sensible to create artificial composite compartments comprising different organelles (e.g. RER + SER + Golgi) or different membranes (e.g. RER + outer mitochondrial). It is important to appreciate that restricting analysis to a few compartments possessing similar gold labelling intensities, whilst excluding all other compartments, will compromise the value of statistical testing. For instance, if applied to compartments having the same RLI values, and ignoring non-labelled compartments, such an approach will lead to the outcome that total χ2= 0 and the absurd conclusion that there is pure random labelling. Including the unlabelled compartments (singly or as a composite) will reduce the chances of drawing this conclusion if there is real differential labelling of compartments.
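
The pitfall described above can be made concrete with invented numbers. In this sketch the counts are entirely hypothetical: three compartments are labelled with the same intensity (the same number of golds per test point) and a fourth, large compartment carries no gold. Restricting the analysis to the three labelled compartments gives a total χ2 of zero, whereas including the unlabelled compartment reveals the differential labelling.

```python
def total_chi2(golds, points):
    """Total chi-squared of observed golds against the point-based expectation."""
    total_golds, total_points = sum(golds), sum(points)
    expected = [total_golds * p / total_points for p in points]
    return sum((o - e) ** 2 / e for o, e in zip(golds, expected))


# Hypothetical counts: three labelled compartments with identical golds per
# point, plus one unlabelled compartment covering most of the cell profile.
labelled_golds = [50, 100, 150]
labelled_points = [10, 20, 30]
unlabelled_golds = [0]
unlabelled_points = [140]

restricted = total_chi2(labelled_golds, labelled_points)
complete = total_chi2(labelled_golds + unlabelled_golds,
                      labelled_points + unlabelled_points)

print(f"labelled compartments only: total chi2 = {restricted:.2f}")
print(f"all compartments included:  total chi2 = {complete:.2f}")
```

With the restricted set every RLI equals 1 and the test cannot detect anything; with the complete set the labelled compartments all have RLI > 1 and the total χ2 is large.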

Although the proposed methods offer a practical and effective way of quantifying immunoelectron images, they are only a first step and certain refinements seem appropriate for future consideration. First, the methods are suitable for antigens that are restricted to organelle or membrane compartments. However, some antigens translocate between volume- and surface-occupying compartments and this raises the question of how to derive expected distributions from mixtures of organelles and membranes. A possible solution is to treat a membrane as a volume rather than a surface compartment (Slot et al., 1991) and define a zone lying within a fixed distance on each side of the membrane trace. This needs to take account of potential biases due to oblique sectioning of membranes (Mayhew & Reith, 1988). Second, membrane orientation with respect to the section plane (or electron axis) affects labelling efficiency. Provided that the membrane compartments being investigated (a) are isotropic (show no preferred directionality) in 3D space, or (b) share the same directionality, or (c) are cut by isotropic section planes, or (d) are intersected by isotropic test lines, the present membrane-variant will suffice. However, it may be preferable to randomise section orientation by using the orientator (Mattfeldt et al., 1990) or isector (Nyengaard & Gundersen, 1992) sampling devices rather than randomise test line orientation by using vertical sections with sine-weighted line probes (Baddeley et al., 1986). It should also be noted that, if non-randomly orientated section planes are sampled, there are no generally valid procedures for subsequently correcting the observed intersection counts of different membranes possessing different orientation characteristics. Third, problems might arise if labelling of membrane surfaces is dependent on membrane orientation. Fourth, clearly, gold particle counting needs to be performed using unbiased counting rules (Gundersen, 1977; Howard & Reed, 1998) to ensure that particles are not counted by methods that are influenced by differences in particle size. An associated point counting rule would seem most appropriate for gold particles, particularly in multiple-labelling experiments involving gold particles of different sizes. Finally, non-specific sticking of golds is a real problem, which, hitherto, has not received rigorous quantitative analysis. The present method shows how this problem can be detected and quantified and opens up future possibilities for correcting its effects.


We thank colleagues who helped to produce this manuscript. The basic idea was born in June 1999 after a FEBS-sponsored course on Electron Microscopy and Stereology in Molecular Cell Biology held at the University of Oslo. It arose out of discussions held in the coffee bar of the Hotel Gyldenløve. Interestingly, the English translation of Gyldenløve is Golden Lion!