Optimizing imaging parameters for the separation of multiple labels in a fluorescence image

Authors

  • R. Neher,

    1. Max-Planck-Institut für biophysikalische Chemie, D37070 Göttingen, Germany
    • *

      Present address: Ludwigs-Maximilians-Universität München, Sektion Physik, Theresienstr. 37, 80333 München, Germany.

  • E. Neher

    Corresponding author
    1. Max-Planck-Institut für biophysikalische Chemie, D37070 Göttingen, Germany

Dr Erwin Neher. Tel.: +49 551201 1630; fax: +49 551201 1688; e-mail: eneher@gwdg.de

Summary

A theoretical analysis is presented on how to separate the contributions from individual, simultaneously present fluorophores in a spectrally resolved image. Equations are derived that allow the calculation of the signal-to-noise ratio of the estimates for such contributions, given the spectral information on the individual fluorophores, the excitation wavelengths and intensities, and the number and widths of the spectral detection channels. We then ask how such imaging parameters have to be chosen for optimal fluorophore separation. We optimize the signal-to-noise ratio or optimize a newly defined ‘figure of merit’, which is a measure of efficiency in the use of emitted photons. The influence of photobleaching on the resolution and on the choice of imaging parameters is discussed, as well as the additional resolution gained by including fluorescence lifetime information. A surprisingly small number of spectral channels are required for an almost optimal resolution, if the borders of these channels are optimally selected. The detailed consideration of photobleaching is found to be essential, whenever there is significant bleaching. Consideration of fluorescence lifetime information (in addition to spectral information) improves results, particularly when lifetimes differ by more than a factor of two.

Introduction

Molecular biology and protein biochemistry provide an ever-increasing number of specific fluorescence labels (Zhang et al., 2002). The need to separate fluorescence contributions from such individual fluorophores in multiply labelled microscopic samples is obvious, but the theoretical and instrumental basis regarding the quality of such a separation has not been sufficiently elaborated. The analysis of fluorescence resonance energy transfer (FRET)-signals (Förster, 1948) falls into the same class of problems, because it can be viewed as a problem of separating fluorescence into the components originating from FRET-pairs vs. those from isolated donors and acceptors (Erickson et al., 2001; Zimmermann et al., 2002), as well as from autofluorescence of the specimens.

Apart from instrumental requirements, which call for the highest possible photon collection efficiency (NA of objective, quantum yield of detector; spectral resolution) there are ‘theoretical’ problems, which originate from the fact that both emission and absorption spectra of simultaneously present dyes overlap and signal quality is limited by photon shot noise, as well as by photobleaching. Some of these problems have been analysed for the special case of spectral caryotyping (Garini et al., 1999), in which chromosomes are labelled with distinct combinations of fluorophores. A recent study addressed the general problem of the quality of linear unmixing by simulations (Zimmermann et al., 2003). Nevertheless, a theory is required, which in more general terms predicts the performance of a given microscope configuration for the separation of a given set of fluorophores.

We develop equations that allow the calculation of the signal-to-noise ratio (SNR) of the estimates for fluorophore concentrations at a given pixel, based on spectrally resolved image data and on prior knowledge of the spectra of participating fluorophores. In our model we allow for photobleaching and we assume that noise in the acquired images is dominated by photon shot noise, photomultiplier dark noise and some stray light. We then ask simple questions, such as:

  1. What is the optimal detection bandwidth for a single chromophore in the presence of background stray light?
  2. Do photons in the region of spectral overlap between two dyes improve or deteriorate the result, or should one exclude such spectral regions from the analysis?
  3. How many spectral detection channels are needed to separate optimally a certain number of fluorophores and how are the boundaries of such channels optimally selected?
  4. How much information is gained when spectral channels are further separated into channels according to fluorescence lifetime (such as by counting photons above and below a certain lifetime threshold)?

We conclude that for many of these questions ‘SNR’ is not a good measure for the quality of the separation of fluorescence contributions, because it depends strongly on the total excitation intensity. We therefore introduce a new measure for the efficiency of a given microscope or measurement configuration, which we call the ‘figure of merit, FoM’. The FoM, which relates to the estimate of a given fluorophore concentration, is the square of the SNR of that estimate, normalized with respect to the number of detectable photons originating from this fluorophore (see below). It is a measure of the efficiency with which emitted photons are used for the estimation of a dye concentration. Using both FoM and SNR we explore how the resolution depends on those imaging parameters, which are at the disposal of the user on a given microscope, such as

  1. the number of exposures (with different excitation wavelengths);
  2. relative excitation intensities and exposure times;
  3. number of spectral detection channels;
  4. spectral borders of the detection channels or else the choice of emission filters; and
  5. lifetime thresholds (see above).

Most of our conclusions are valid for both confocal and widefield microscopy, whereas in many instances, particularly when dealing with light detection, we describe our results in terms appropriate for the confocal microscope (i.e. for photomultipliers or photon counting units).

We found that many of our results confirm intuitive practice in the selection of excitation parameters and spectral windows of detection. However, we also encountered non-intuitive solutions, particularly when considering multiple exposures on mixtures of more than two dyes. We confirmed the conclusion of Zimmermann et al. (2003) that few spectral channels are sufficient to separate a small number of fluorophores and we expect that our quantitative treatment will become more important as more fluorophores contribute to a fluorescence image.

Theory and definition of the problem

The ‘optimum resolution’ for the separation of individual fluorophores on a given pixel, considering a limited supply of emitted photons, is the resolution that would be obtained if all and only those photons emitted by a fluorophore (limited by its photobleaching and collection efficiency of the microscope) are counted and assigned to a separate detector channel. This can be achieved when the emission spectra do not overlap. Then the resolution of the estimates for fluorophore concentrations depends mainly on photon shot noise and is readily calculated considering the influence of photon loss, detector quantum yield, dark counts, etc. (Young, 1996).

With spectral overlap we have in addition a photon-sorting problem, which sometimes can be solved by selecting narrow band pass emission filters, such that a given detector channel collects photons mainly from one fluorophore. This, however, eliminates a large proportion of photons, which do carry information. Also, with multiple fluorophores, it is often not possible to find spectral bands, which isolate signals from a given fluorophore. Thus, the individual spectral channels usually contain fluorescence light originating from several fluorophores, which have to be separated computationally. To do so, some microscopes now offer the possibility of ‘spectral unmixing’ or ‘emission fingerprinting’. In these techniques the contributions from different fluorophores at a given pixel are calculated, based on spectral information about the contributing fluorophores (Keshava & Mustard, 2000; Tsurui et al., 2000; Lansford et al., 2001; Hiraoka et al., 2002; Zimmermann et al., 2002, 2003). This involves the inversion of a system of linear equations of the form

$$ A x = y \tag{1} $$

where x is a column vector of unknown fluorophore concentrations, y is the vector of measured fluorescence signals in the different detection channels and A is a matrix with coefficients indicating how much a given dye at unit concentration contributes to a given detection channel (based on prior knowledge about the spectra of the fluorophores). This equation has to be solved for x at each individual pixel. This is readily performed when the number of unknown fluorophores is equal to the number of detection channels, and if the determinant of A is not vanishing (detA ≠ 0). When the number of detection channels is larger than the number of fluorophores a solution minimizing the least-square error (assuming Gaussian statistics) can be found by solving

$$ A^{T} A\, x = A^{T} y \tag{2} $$

where $A^{T}$ is the transpose of matrix A (Rao, 1973). If the errors associated with y are not the same for all elements of y, a better solution is obtained by minimizing χ² (see below).
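The least-squares step of Eq. (2) can be illustrated with a minimal sketch for two dyes and three detection channels. The function name `unmix_two_dyes` and the numbers in the mixing matrix are hypothetical and chosen only for illustration; the 2 × 2 normal equations are solved in closed form, which is an assumption of this sketch and not a description of any particular unmixing software.

```python
def unmix_two_dyes(A, y):
    """Least-squares estimate of two dye concentrations from n >= 2 channels.

    Solves the normal equations (A^T A) x = A^T y for a matrix A with
    two columns (one per dye) and one row per detection channel.
    """
    # Elements of the symmetric 2x2 matrix M = A^T A.
    m00 = sum(a0 * a0 for a0, _ in A)
    m01 = sum(a0 * a1 for a0, a1 in A)
    m11 = sum(a1 * a1 for _, a1 in A)
    # Elements of the vector v = A^T y.
    v0 = sum(a0 * yi for (a0, _), yi in zip(A, y))
    v1 = sum(a1 * yi for (_, a1), yi in zip(A, y))
    det = m00 * m11 - m01 * m01
    if abs(det) < 1e-12:
        raise ValueError("A^T A is singular: the two spectra cannot be separated")
    # x = M^{-1} v, written out for the 2x2 case.
    return ((m11 * v0 - m01 * v1) / det, (m00 * v1 - m01 * v0) / det)


# Hypothetical mixing matrix: rows are detection channels, columns are dyes.
A = [(0.70, 0.10), (0.25, 0.30), (0.05, 0.60)]
x_true = (2.0, 3.0)
# Noise-free signals generated from Eq. (1), y = A x.
y = [a0 * x_true[0] + a1 * x_true[1] for a0, a1 in A]
x_est = unmix_two_dyes(A, y)
print(x_est)
```

With noise-free input the estimate reproduces the concentrations used to generate y; with noisy input it returns the unweighted least-squares solution.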

The resolution of the estimates of dye concentrations in a given pixel is a problem of the condition of the matrices mentioned above, and of noise signals in the detection channels. The former depends on the elements of matrix A, which, in turn, are functions of excitation wavelengths and intensities, as well as the spectral windows of the detection channels. Below we develop equations for the expectation values of x as well as their noise considering photon shot noise as the main noise source. First, however, we discuss some simple aspects of resolution by considering fluorescence signals from a single fluorophore.

Resolution and the ‘figure of merit’

Regarding the problem of ‘optimum resolution’, a quantity is needed that describes the accuracy of an estimate of local dye concentration. One such quantity is the SNR. This, however, depends strongly on the excitation strengths and on the mean value of the dye concentration. Because this dependency obscures other influences, we calculate another measure, which indicates how well the information from a detectable photon is used in terms of resolution. We thus define a relative measure, the figure of merit or FoM of a measurement relative to the best possible resolution that can be achieved on a given microscope, by relating the actual SNR to a hypothetical optimal resolution. For simplicity, we introduce the FoM first in terms of photon numbers, as measured by a photon counting device, before returning to the more general notation (using analog signals and vectors, Eqs 1 and 2) in subsequent sections. The optimum resolution would be obtained if all the detectable photons NF, originating from a single fluorophore species F, were counted accurately and correctly associated with F. In that case the SNR of that measurement would be $\sqrt{N_F}$. To derive the FoM for such a measurement we first consider the square of the SNR. This is an apparent number napp of detected photons in the sense that it is equal to the number of detected photons in the ideal measurement. We divide this number by NF and define

$$ \mathrm{FoM} = n_{app} / N_F \tag{3} $$

where napp is the square of the SNR. Later we use this definition both to derive theoretical expressions for FoM and to calculate it from experimental data. Therefore, we have to specify what the quantities napp and NF mean in both cases: NF always is a hypothetical quantity, the number of detectable photons originating from a given fluorophore. Because we want to disregard unavoidable losses resulting from the limited efficiency of collection and detection of photons on a given microscope, NF should be considered as the number of photons that would actually be counted on that microscope if the whole emission spectrum of a fluorophore could be sampled and all the counts could be sorted into a single detection channel, one for each fluorophore. napp is the square of the SNR, $n_F^2/\sigma_n^2$, of the calculated dye concentration (calculated either on the basis of experimental data or from model calculations), where nF is the background-corrected mean photon count and $\sigma_n^2$ is the variance of the actually measured photon count n or a calculated variance, considering all relevant noise sources. Thus, $\sigma_n^2$ is the sum of variances from background noise (dark counts), shot noise from stray light, fluorescence light and all other conceivable sources of noise. When we extend the definition of FoM to dye mixtures, we calculate FoM for each dye individually, replacing $n_F^2/\sigma_n^2$ by the square of the SNR of the estimate for the concentration of that dye, as detailed in Appendix B.

For illustration of the concept we calculate here the FoM for the simple case of measurement of fluorescence from a single fluorophore in the presence of readout noise given by a ‘dark count’ NR and background light NB. The total photon count n, when nF fluorescence photons are measured, will be

$$ n = n_F + N_B + N_R \tag{4} $$

and we obtain from Eqs (3) and (4):

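$$ \mathrm{FoM} = \frac{n_{app}}{N_F} = \frac{(n - N_B - N_R)^2}{n \, N_F} \tag{5} $$

where napp, the squared SNR, was entered as the square of the background-corrected photon count, divided by the actual count number n (which is equal to $\sigma_n^2$).

We obtain:

$$ \mathrm{FoM} = \frac{n_F^2}{N_F \, (n_F + N_B + N_R)} \tag{6} $$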

It is readily seen that in the absence of any background noise (NB ≈ 0; NR ≈ 0) the FoM is nF/NF, which is 1 for the case that all detectable photons are detected, such as in the absence of an emission filter. Any filtering and any background noise will lead to FoM < 1. In this sense the FoM is analogous to the quantum efficiency of a photodetector. It therefore may be considered an ‘apparent quantum efficiency of detection’, which takes into account the deterioration of the detection process by background noise and losses from spectral overlap. In fact, the quantum efficiency of the detector can be readily adopted in this formalism by redefining NF (see Appendix C for a discussion of the influence of quantum efficiency of the detector).
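The limiting cases just described can be checked numerically. The following is a minimal sketch, assuming a photon-counting measurement in which the variance of the total count equals the count itself; the function name `figure_of_merit` and its argument names are hypothetical.

```python
def figure_of_merit(n_f, n_f_total, n_background=0.0, n_dark=0.0):
    """FoM of a single-dye photon-counting measurement.

    n_f          -- fluorescence photons actually detected (nF)
    n_f_total    -- detectable photons from the dye (NF)
    n_background -- background light counts (NB)
    n_dark       -- dark counts (NR)
    """
    # Total count n; under Poisson statistics this is also the variance.
    n_total = n_f + n_background + n_dark
    # Squared SNR (apparent photon number) divided by NF.
    n_app = n_f ** 2 / n_total
    return n_app / n_f_total


print(figure_of_merit(100, 100))         # all photons detected, no background
print(figure_of_merit(50, 100))          # half of the photons filtered out
print(figure_of_merit(100, 100, 50, 1))  # background and dark counts added
```

Without background the value is nF/NF, and any added background or filtering lowers it below 1, in line with the interpretation as an apparent quantum efficiency.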

Resolution for mixtures of fluorophores in the presence of noise

After discussing some properties of the FoM by considering background noise in the case of a single fluorophore we return to the problem of separating fluorescence contributions of several fluorophores, which are present simultaneously at a given pixel. To do so we return to the more general notation of Eqs (1) and (2), which includes intensity-based photodetectors. As discussed above this problem involves inversion of a system of linear equations (see Eqs 1 and 2). However, in reality contributions of stray light, spectral leaks, etc., have to be considered in addition to the contributions from fluorophores, such that Eq. (1) has to be extended to

$$ A x + y_o = y \tag{7} $$

where yo represents a vector of signals originating from such sources. yo is assumed to be known, once the excitation wavelengths and intensities have been selected, and is assumed to be the same for all pixels. By contrast, x and y will be pixel specific. Specimen-specific background, such as autofluorescence, has to be handled as one or several additional components of x (Zimmermann et al., 2003). The elements of x are the unknown concentrations (actually concentrations times focal volume) of fluorophores and ‘autofluorophores’ in a given pixel and A is a matrix (the same for all pixels), the elements aij of which give the contributions of fluorophore j (if present at unit concentration) to the signal in detection channel i. To obtain a unique solution, the number of detection channels must not be smaller than the number of components of x.

We want to be able to combine the data from several exposures to calculate the components of x. Each exposure provides as many equations to Eq. (7) as there are detection channels. Thus we have a system of linear equations for all the exposures together, which have to be solved simultaneously. For that purpose they can be considered as a single system of n · m equations, where n is the number of detector channels and m the number of exposures.

The question addressed in this paper is how excitation parameters and detection channels have to be selected in order that the solution x is as accurate as possible. In other words, we want to find for a given mix of fluorophores the optimum in SNR (or in the FoM, defined above) in the parameter space of all possible combinations of excitation wavelengths and intensities, spectral detection windows, etc., which are accessible on a given microscope. We will refer to this set of parameters as the ‘imaging parameters’ in the following text. We expect that in many cases the optimum set of imaging parameters will be close to standard practice. In these cases relatively narrow, well-separated bands of the emission spectrum are selected for the individual detection channels, in each of which the contribution from one fluorophore dominates. Linear unmixing allows us to correct for minor contributions of other dyes. However, linear unmixing also allows us to use spectral regions in which fluorophores strongly overlap, and the question then arises as to whether photons in such overlap regions carry useful information, and how the resolution depends on the number of spectral channels used.

To answer such questions and to find an optimum solution, we first have to consider the noise and sources of error in a fluorescence image. Initially, we make the assumption that dye spectra, detector sensitivities, background light (i.e. the elements of A and yo) are exactly known for a given microscope setting, such that the error in the calculation of x is fully determined by the noise in the detection channels. The ith component of the signal vector y is given by the background signal of the corresponding detection channel yo,i plus a product of a single photon signal s(λ) and the spectral density of detected photons n(λ) integrated over the spectral band b of the detection channel.

$$ y_i = y_{o,i} + \int_b s(\lambda)\, n(\lambda)\, d\lambda \tag{8} $$

In the case of a photon-counting device s(λ) is the counting unit, whereas for an intensity-based detector it is proportional to 1/λ (see Appendix B). The noise in a light measurement is dominated by the photon shot noise of the signal (Young, 1996; Garini et al., 1999) and by some detector noise, $\sigma_{o,i}^2$. The light-dependent variance of yi, under conditions of Poisson statistics, is the integral over the product of the single photon signal s(λ) and the signal density y(λ), which itself is given by s(λ) · n(λ). Therefore,

$$ \sigma_{y,i}^2 = \sigma_{o,i}^2 + \int_b s(\lambda)\, y(\lambda)\, d\lambda \tag{9} $$

where $\sigma_{o,i}^2$ is the background variance of the ith detection channel (which may contain an integral over the spectral band). In the case of light detectors, which measure light intensity, evaluation of Eq. (9) is not straightforward. In Appendix B it is shown how the wavelength-dependence of s(λ) can be allowed for by introducing a single photon signal so at reference wavelength λo and calculating the variance $\sigma_{y,i}^2$ of the ith component of y as

$$ \sigma_{y,i}^2 = \sigma_{o,i}^2 + s_o \frac{\lambda_o}{\bar{\lambda}_i} \left(1 + CV_s^2\right) \sum_j a_{ij} x_j \tag{10} $$

Here $\bar{\lambda}_i$ is the mean wavelength of the spectral window of detector channel i, and CVs is the coefficient of variation in amplitude of the single photon signal. The sum in Eq. (10) represents the light signal yi.

We see that the noise of the measurement depends on the presence and amount of fluorophores (xj). Therefore, the optimization of imaging parameters cannot be performed for a fluorophore combination in general, but only for a specific mix at a time. In other words, the microscope user has to specify for which component she/he wants to optimize the measurement in the presence of given amounts of other dyes. In practice, this is probably best done iteratively by performing a first measurement under default conditions and then optimizing parameters on the basis of the dye distribution in a specified region of interest. Once such default values for x have been specified, the errors in y are calculated according to Eq. (10) and the expected variances $\sigma_{x,j}^2$ of x are calculated (Rao, 1973) as the diagonal elements hjj of a matrix H, which is the inverse of the product of two matrices,

$$ H = (C^{T} C)^{-1} \tag{11} $$

C is a matrix, the elements cij of which are calculated according to:

$$ c_{ij} = a_{ij} / \sigma_{y,i} \tag{12} $$

and $C^{T}$ is its transpose.

This calculation is based on the minimization of χ², assuming Gaussian noise in the original data (Rao, 1973). This approximation is not quite correct for small signals, when there is a discrepancy between Poisson statistics (which is valid for shot noise at light levels well below fluorescence saturation) and Gaussian noise (which under the assumption of equality of variance and mean results in a simple algorithm). More complicated optimization procedures based on the correct noise statistics and/or including non-negativity constraints are available (Lawson & Hanson, 1974). However, for the purpose of optimizing imaging parameters a simple and rapid algorithm is necessary, because millions of parameter combinations, which are accessible on a given microscope, have to be tested to find the optimum. The microscope user may be interested in optimizing the concentration estimate of a single dye label in the presence of a specified background or may want simultaneously to optimize the resolution of several dyes. Below (under the heading ‘Three and more dyes’) we calculate the ratios $\sigma_{x,j}^2/x_j^2$ by the procedure described above or otherwise FoMj, the FoM of dye j (see Appendix B) and look for those combinations of imaging parameters that optimize these quantities under certain constraints, such as limits to total bleaching. Specifically we discuss optimal combinations for the sum S:

$$ S = \sum_j \frac{\sigma_{x,j}^2}{x_j^2} \tag{13} $$

and for individual FoMj values or their mean. The definition of S as a sum of squares of inverse SNRs is appropriate for simultaneous optimization of all dye resolutions by minimizing S.
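The error prediction described above (weighting each row of A by the standard deviation of its channel, inverting the product of the weighted matrix with its transpose, and summing the squared inverse SNRs) can be sketched as follows. This is a minimal illustration under a photon-counting assumption, in which the variance of each channel is its background variance plus its mean signal; the helper names `invert`, `estimate_variances` and `sum_inverse_snr_squared` are hypothetical, and the small Gauss-Jordan inversion is used only to keep the sketch free of external libraries.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular: the dyes cannot be separated")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def estimate_variances(A, x, background_var=None):
    """Diagonal of H = (C^T C)^{-1}, i.e. expected variances of the estimates.

    A[i][j] is the contribution of dye j (unit concentration) to channel i.
    Assumes photon counting: var(y_i) = background variance + mean signal,
    and that every channel has a non-zero variance.
    """
    n_ch, n_dyes = len(A), len(x)
    if background_var is None:
        background_var = [0.0] * n_ch
    var_y = [background_var[i] + sum(A[i][j] * x[j] for j in range(n_dyes))
             for i in range(n_ch)]
    # (C^T C)_{jk} = sum_i a_ij * a_ik / var(y_i), with c_ij = a_ij / sigma_i.
    ctc = [[sum(A[i][j] * A[i][k] / var_y[i] for i in range(n_ch))
            for k in range(n_dyes)] for j in range(n_dyes)]
    h = invert(ctc)
    return [h[j][j] for j in range(n_dyes)]


def sum_inverse_snr_squared(A, x, background_var=None):
    """S: sum over dyes of variance / concentration^2."""
    variances = estimate_variances(A, x, background_var)
    return sum(v / (xj * xj) for v, xj in zip(variances, x))


# Two dyes, two channels, 25% cross-talk (hypothetical numbers).
A = [[75.0, 25.0], [25.0, 75.0]]
print(estimate_variances(A, [1.0, 1.0]))
print(sum_inverse_snr_squared(A, [1.0, 1.0]))
```

Because only the predicted variances are needed, a parameter search can call such a routine for every candidate set of imaging parameters without ever unmixing real data.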

Here we actually do not use the linear unmixing algorithm because we are only interested in the errors of the prediction in order to find optimum conditions for the measurement. Once these optimum conditions are obtained, the ‘unmixing’ solution for a given pixel can be calculated according to

$$ x = H \, C^{T} (y' - y'_o) \tag{14} $$

where y′ and y′o are vectors, the elements of which are calculated (in analogy to those of C) according to

$$ y'_i = y_i / \sigma_{y,i}, \qquad y'_{o,i} = y_{o,i} / \sigma_{y,i} \tag{15} $$

Results and examples

Here we illustrate the influence of spectral overlap, bleaching and photodetector (background) noise by considering simple examples. For simplicity we first return to the case of a photon counting experiment, which avoids complications resulting from the wavelength dependence of the energy of the optical quantum. We also discuss some simple cases of single dye measurements and two-dye mixtures before returning to the general problem of separating the contributions from several dyes.

A single fluorophore and background noise

The first example is that of a single fluorophore with normalized emission spectrum e(λ). This is the case already discussed in the context of Eqs (3) and (4), where we assumed a constant readout noise, characterized by a dark count NR and a background light, which contributes NB photons. We now discuss how bandpass-filtering affects the FoM and the influence of background light on it. To do so, we have to write Eq. (4) in a spectrally resolved form with the normalized emission spectrum e(λ) and the background light spectrum b(λ):

$$ n = N_F \int_b e(\lambda)\, d\lambda + N_B \int_b b(\lambda)\, d\lambda + N_R \tag{16} $$

Here n is the number of detected photons and the integration extends over b, the spectral band considered. NF and NB are the expected photon counts when the integration is extended over the whole spectrum. For simplicity, a bandpass filter with infinitely sharp edges (box-like function) was assumed. Combining Eqs (4), (5) and (16) we obtain

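$$ \mathrm{FoM} = \frac{\left( \int_b e(\lambda)\, d\lambda \right)^2}{\int_b e(\lambda)\, d\lambda + \frac{N_B}{N_F} \int_b b(\lambda)\, d\lambda + \frac{N_R}{N_F}} \tag{17} $$

In the case of vanishingly small stray light (NB = 0) Eq. (17) simplifies to

$$ \mathrm{FoM} = \frac{\left( \int_b e(\lambda)\, d\lambda \right)^2}{\int_b e(\lambda)\, d\lambda + \frac{N_R}{N_F}} \tag{18} $$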

It is seen that emission filtering in this case always leads to a reduction of the FoM because the denominator in Eq. (18) is always larger than the numerator (which is < 1). This merely reflects the fact that no improvement is achieved by filtering out signal-carrying photons on a constant noise background.

This is different when white background light is present, which contributes more noise the larger the bandwidth. Considering the full Eq. (17) it is seen that the FoM has a maximum when plotted against the bandwidth of an emission filter (Fig. 1). NF was assumed to be 100, and the bandpass filter centred at the centre-wavelength of the chromophore. The emission spectrum e(λ) was a Gaussian (for simplicity) with an emission maximum at 500 nm and a sigma of 15 nm. The three curves shown are for the cases NB · b(λ) = 0, 1 and 5 photons per nm bandwidth. Both the optimum bandwidth and the maximum in FoM depend on the amount of background light. In particular for the case of large background noise, the FoM deteriorates if the bandwidth of the filter is increased beyond 45 nm. Nevertheless, bandwidth settings as wide as 80 nm gave better results than very narrow settings in all cases.

Figure 1.

The ‘figure of merit (FoM)’ for a single dye in the presence of white background light as a function of filter bandwidth. The calculation of FoM was performed according to Eq. (17) assuming a dye with a Gaussian spectrum centred at 500 nm with a sigma of 15 nm (halfwidth ∼35 nm). The excitation intensity was assumed to be such that 100 photons can be detected during the exposure at full bandwidth. The emitted fluorescence was filtered with a square-box bandpass filter with bandwidth b, centred at the centre of the emission (transmission = 1 for 500 − b/2 < λ < 500 + b/2 and zero otherwise). The broken curve shows the FoM in the absence of background light. It approaches 1 for large bandwidth (corresponding to an SNR of 10 for 100 detected photons). The curves FoM_01 (circles) and FoM_05 (diamonds) were calculated for white background light with intensities of 1 and 5 detected photons per nm bandwidth, respectively. In all cases a ‘readout noise’ NR of 1 photon was included, the influence of which, however, is almost negligible.
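The bandwidth dependence shown in Fig. 1 can be reproduced approximately with a short sketch. It assumes the settings stated in the text and caption (Gaussian spectrum with sigma 15 nm, 100 detectable photons, a readout count of 1, white background of 0, 1 or 5 photons per nm) and evaluates the FoM as the squared background-corrected count divided by the total count and by NF. The function names and the 1-nm search grid are choices made for this illustration only.

```python
import math


def band_fraction(bandwidth_nm, sigma_nm=15.0):
    """Fraction of a Gaussian emission spectrum passed by a centred box filter."""
    return math.erf(bandwidth_nm / (2.0 * sigma_nm * math.sqrt(2.0)))


def fom(bandwidth_nm, n_f_total=100.0, bg_per_nm=0.0, n_dark=1.0, sigma_nm=15.0):
    """FoM = nF^2 / (NF * n) for a box filter of the given width."""
    n_f = n_f_total * band_fraction(bandwidth_nm, sigma_nm)
    # White background: counts grow in proportion to the bandwidth.
    n_total = n_f + bg_per_nm * bandwidth_nm + n_dark
    return n_f ** 2 / (n_f_total * n_total)


def best_bandwidth(bg_per_nm, widths_nm=range(1, 201)):
    """Bandwidth (on a 1-nm grid, an assumption of this sketch) maximizing FoM."""
    return max(widths_nm, key=lambda b: fom(b, bg_per_nm=bg_per_nm))


for bg in (0.0, 1.0, 5.0):
    b = best_bandwidth(bg)
    print(f"background {bg:.0f} per nm: best bandwidth {b} nm, "
          f"FoM {fom(b, bg_per_nm=bg):.3f}")
```

Under these assumptions the optimum for the strongest background lies slightly above 45 nm and moves to wider filters as the background decreases, which is the qualitative behaviour described for Fig. 1.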

Surprisingly, the maximum of the FoM is at bandwidths for which the fluorescent light at the edge of the band is already significantly smaller than the background light (about 2 and 6 times, respectively). The optimum location for the edge of a bandpass can be calculated by setting the derivative of the FoM with respect to the cutoff wavelength to zero, which results in the condition

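$$ \frac{N_F\, e(\lambda_{max})}{N_B\, b(\lambda_{max})} = \frac{N_F \int_w e(\lambda)\, d\lambda}{N_F \int_w e(\lambda)\, d\lambda + 2 N_B \int_w b(\lambda)\, d\lambda + 2 N_R} \tag{19} $$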

The integration interval w in this case extends from some wavelength at the lower edge of the bandpass (which was assumed to be fixed in this calculation) to the upper edge, λmax, the influence of which on FoM we wish to analyse. This situation is slightly different from that in Fig. 1, in which both upper and lower edges of the bandpass were varied simultaneously. The equation shows that the optimum cutoff wavelength is always at a point where the number of fluorescence photons (per unit bandwidth) is smaller than the number of background photons, because the denominator on the right-hand side of Eq. (19) is a sum of positive terms, which includes the numerator. The ratio in Eq. (19) tends to unity if the number of fluorescence photons is far bigger than the number of background photons. A different result is obtained below when we consider the case of two dyes with overlapping spectra, in which photons from one dye (of not precisely known concentration) deteriorate the SNR of another dye, even if the latter contributes fewer photons than the first dye.

A slightly more complicated solution is obtained when asking for the optimum resolution when bleaching is considered. In the hypothetical case of no background noise, the best estimate of the amount of dye in the focal volume would be obtained for a long light exposure, counting all the photons that are observed until complete bleaching. In the case that the photodetector has a non-negligible dark signal, these counts will accumulate and eventually noise will increase without further increase in signal. To calculate the optimum exposure time the ratio $n_F^2/\sigma_n^2$ (the square of the SNR) can be calculated by evaluating the product FoM · NF:

$$ \frac{n_F^2}{\sigma_n^2} = \mathrm{FoM} \cdot N_F = \frac{\left( N_F \int_b e(\lambda)\, d\lambda \right)^2}{N_F \int_b e(\lambda)\, d\lambda + N_B \int_b b(\lambda)\, d\lambda + N_R} \tag{20} $$

For this purpose NF has to be considered as time dependent, to represent the total number of photons that can be detected within the exposure time t. Likewise, NR and NB are assumed to increase with time. Whereas the latter are likely to do so in a linear fashion, the bleaching time course may be quite complicated, depending on the point-spread function and the mobility of the dye. Figure 2 shows the result of a calculation assuming a 50-nm bandfilter, an exponential bleaching process with time constant of 1 s, and linearly increasing NR and NB, which within 1 s reach values of 1 and 1 per nm bandwidth, respectively. The number of photons that can be maximally detected (with full bleaching) is assumed to be 100. The optimal SNR of 5.8 is obtained for a 1.8-s exposure time. At this time the hypothetical dye is 84% bleached and the FoM is 0.41. The SNR is smaller both for larger and shorter exposure times. At shorter times the number of detected photons is lower than optimal, at longer times the additional photons from the dye do not outweigh the additional noise of the background photons. The result should be compared to an SNR of 10, which can be optimally obtained for 100 detected photons in the absence of background noise.

Figure 2.

Signal-to-noise ratio as a function of exposure time of a bleaching dye. Emission spectrum and bandpass filter were assumed to be the same as in Fig. 1 (with 50 nm bandwidth). The dye was assumed to bleach with a time constant of 1 s at the chosen illumination intensity and to emit 100 detectable photons before complete bleaching. The ratio mean/√Var was calculated using Eq. (20) as a function of exposure time. Background light was assumed to increase linearly with exposure time (see text for further details). In the absence of background noise the SNR would reach a value of 10 (for 100 detected photons). The background noise reduces this to an optimum value of 5.84, which is obtained for an exposure time of 1.8 s.
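The exposure-time optimum of Fig. 2 can be checked with a brief sketch under the stated assumptions: a 50-nm box filter on the Gaussian spectrum of Fig. 1, exponential bleaching with a time constant of 1 s, at most 100 detectable photons, and background and dark counts that grow linearly to 1 per nm and 1, respectively, within 1 s. The constant and function names and the 0.01-s search grid are hypothetical choices for this illustration.

```python
import math

TAU_S = 1.0            # bleaching time constant
N_MAX = 100.0          # detectable photons at complete bleaching, full spectrum
BAND_NM = 50.0         # width of the box bandpass filter
SIGMA_NM = 15.0        # sigma of the Gaussian emission spectrum
BG_PER_NM_S = 1.0      # background counts per nm bandwidth per second
DARK_PER_S = 1.0       # dark counts per second


def snr(t_s):
    """SNR of the dye estimate after an exposure of t_s seconds."""
    band_fraction = math.erf(BAND_NM / (2.0 * SIGMA_NM * math.sqrt(2.0)))
    # Fluorescence photons detected inside the band up to time t.
    n_f = N_MAX * (1.0 - math.exp(-t_s / TAU_S)) * band_fraction
    n_background = BG_PER_NM_S * BAND_NM * t_s
    n_dark = DARK_PER_S * t_s
    # Poisson statistics: variance of the total count equals the total count.
    return n_f / math.sqrt(n_f + n_background + n_dark)


# Grid search over exposure times from 0.01 s to 5 s in steps of 0.01 s.
times = [0.01 * k for k in range(1, 501)]
best_t = max(times, key=snr)
bleached = 1.0 - math.exp(-best_t / TAU_S)
print(f"best exposure {best_t:.2f} s, SNR {snr(best_t):.2f}, "
      f"bleached fraction {bleached:.2f}")
```

Under these assumptions the search lands close to the exposure time and SNR quoted for Fig. 2; other bleaching time courses would only require replacing the exponential term.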

Simple two-dye mixtures

Before describing procedures to optimize S or FoM for multiple dye mixtures, we discuss some principles of the photon sorting problem by means of the simplest possible example, which is a mix of two dyes in the absence of any background noise. The questions we ask with respect to overlapping emission spectra are:

  1. How much information about the fluorophores is lost, when emission spectra overlap?
  2. Is it advantageous for obtaining a good FoM to eliminate photons in the region of spectral overlap from the measurement, or do all photons contribute information about the dye mix?
  3. What are the rules for good selection of excitation wavelengths and intensities?

We start by considering two dyes and for simplicity assume that both dyes have rectangular emission spectra (see Fig. 3A). For one dye, emission starts at 450 nm and extends beyond 500. Its spectrum overlaps with that of another dye, the spectrum of which starts at some wavelength below 500 nm and extends to 550 nm. Therefore, the two spectra overlap in a band around 500 nm of total width o and we take this band to be symmetrical around 500 nm, as depicted in Fig. 3(A). First, we assume two spectral detection channels, one starting at 450 nm and extending up to 500 nm − g/2, the other one symmetrically above 500 nm, as depicted in Fig. 3(B). This leaves a gap of width g in the overlapping region, and we ask whether introducing the gap and widening it improves or deteriorates the resolution. For simplicity we assume ideal detectors, and the absence of any noise other than photon shot noise.

Figure 3.

The square of SNR and the FoM as a function of the gap-width between two detection channels. In this figure two detection channels are used. Part A shows the spectral shapes of the hypothetical dyes (see text for explanation). Part B shows the spectral windows of the detection channels and the gap between them. Part C shows the square of the SNR (left ordinate) or the FoM (right ordinate) for three cases: for trace 100_100 it was assumed that both dyes at unit concentration contribute a total of 100 photons each to the measured signal; for traces 100_50 and 100_20 the dye of interest contributes 100 photons and the other dye contributes 50 and 20 photons, respectively. Variances were calculated as the diagonal elements of matrix H (Eq. 11) using a Maple macro.

We excite both dyes, such that a total of 100 photons can be detected from each (with g = 0). If there were no spectral overlap the ratio (mean)²/Var for both dyes would be 100, which is the value expected for a mean of 100 Poisson-distributed photons. We set the overlap region to extend from 483 nm to 517 nm such that half of the photons from each dye are in the overlap region and half of them are in the non-overlapping region. If we set the gap width to 33 nm, such that none of the photons from the overlap region is included, we count an average of 50 photons in each channel with a σ of ∼7.1 (variance = 50). Calculating the ratio (mean)²/Var for one of the dyes we obtain the value of 50 and an FoM of 0.5 (taking the original mean of 100 photons from a given dye as NF). Varying the gap width (Fig. 3C, trace 100_100), we see that the widest gap actually results in the optimum value, with the FoM varying from 0.4 at zero gap to 0.5 with a gap that excludes all photons from the overlap region. It is clear that in this case (two dyes, two detection channels and equal signals from the two dyes in the overlap regions) the overlap photons degrade the signal.

The above result does not mean that photons from regions of spectral overlap are always counterproductive. If the second dye is present at a lower concentration, its influence on the resolution of the first dye is weaker and the FoM changes very little when the detection gap is varied (see Fig. 3C, traces 100_50 and 100_20, which were calculated for the cases that dye 2 contributes 50 and 20 photons, respectively). Nevertheless, FoM does not increase above 0.57 even if the second dye is present in only very small quantities. This is because there are only two detection channels available and the algorithm can determine the amount of dye 2 only very roughly when it contributes only a few photons. Note that this specific result for very small numbers of photons is quite inaccurate owing to the simplifying assumption of Gaussian noise; an algorithm that uses Poisson statistics and non-negativity constraints would probably do better.
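The two-channel example can be reproduced with a few lines of Python. This is a minimal sketch, not the Maple macro used for Fig. 3: it assumes that with two channels and two dyes the estimate is obtained by exact inversion of the 2 × 2 system, that the only noise is Poisson shot noise propagated linearly (the Gaussian approximation mentioned above), and that the overlap width is exactly 100/3 nm so that half of each spectrum lies in the overlap. The function names and the parameter `o` are illustrative only.

```python
def overlap(a0, a1, b0, b1):
    """Length of the intersection of the intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def fom_dye1(gap, n1=100.0, n2=100.0, o=100.0 / 3.0):
    """FoM of dye 1 for two detection channels separated by a gap (nm)."""
    # Rectangular spectra: (start, end, detectable photons at unit concentration)
    dyes = [(450.0, 500.0 + o / 2.0, n1),
            (500.0 - o / 2.0, 550.0, n2)]
    channels = [(450.0, 500.0 - gap / 2.0), (500.0 + gap / 2.0, 550.0)]
    # a[i][j]: mean photon count in channel i contributed by dye j
    a = [[n * overlap(c0, c1, d0, d1) / (d1 - d0) for d0, d1, n in dyes]
         for c0, c1 in channels]
    y = [row[0] + row[1] for row in a]  # Poisson counts: variance = mean
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    # x1 = (a11*y0 - a01*y1)/det, so Var(x1) follows from error propagation
    var_x1 = (a[1][1] ** 2 * y[0] + a[0][1] ** 2 * y[1]) / det ** 2
    # (mean)^2/Var with mean concentration 1, normalized to NF = n1
    return 1.0 / (var_x1 * n1)


for gap in (0.0, 10.0, 20.0, 100.0 / 3.0):
    print(f"gap {gap:5.1f} nm   FoM (trace 100_100) = {fom_dye1(gap):.3f}")
print(f"zero gap, dye 2 contributing 1 photon: FoM = {fom_dye1(0.0, n2=1.0):.3f}")
```

Under these assumptions the sketch gives an FoM of 0.40 at zero gap and 0.50 at the widest gap for trace 100_100, and about 0.57 at zero gap when dye 2 contributes only a single photon, in line with the values quoted above.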

The overlap photons do contribute information when we allow more detection channels, such that some channels measure one or the other dye outside the region of spectral overlap and others deal with the overlap. Figure 4 depicts such a measurement. The spectra of the two dyes are the same as above; however, there are now four detection channels, one each at the low- and high-wavelength ends. If both dyes contribute the same number of photons and the width of channels 1 and 4 is adjusted such that they exactly cover the region of non-overlap, then the FoM is 0.5, no matter whether the overlapping photons are used in channels 2 and 3 or not (see trace 100_100 in Fig. 4C). If, however, dye 2 contributes only half as many photons as dye 1, the FoM is improved from about 0.5 to 0.6 when more and more photons from the overlap region are included in the measurement (for smaller gaps). For the case that dye 2 contributes only 20 photons, this effect is even stronger (see traces 100_50 and 100_20 in Fig. 4C).

Figure 4.

The square of SNR and the FoM as a function of the gap-width. Here four detection channels are used (see part B). All other aspects are similar to those of Fig. 3.

Numerous other calculations of similar type showed that it only rarely improves the FoM if photons are eliminated from the measurement by a band pass filter, although in special cases (particularly with a very small number of detection channels) it may do so. By contrast, regions of spectral overlap are very important for adequate resolution when the spectra of two dyes differ mainly by their widths. An example is shown in Fig. 5(B,D), in which an FoM of 0.4 was achieved for a dye the photons of which came exclusively from a spectral band, which was strongly contaminated by photons from another dye.

Figure 5.

Comparison of spectral shift with a difference in spectral width. In all cases fluorescence from two dyes is measured with three detector channels, which cover the spectral range from 400 to 600 nm without gaps. One central channel, symmetrical around 500 nm, has a width that is varied in the calculations (parts C and D). The two other channels, also symmetrical around 500 nm, cover the spectral ranges below and above the central one. In parts A and C the two dyes have a spectral shift with respect to each other of 20 nm and half widths of 36 nm (Gaussians with sigma of 15 nm) and in parts B and D they have the same centre wavelength of 500 nm, but they differ in width. One (trace 500nm_15) has the same width as the spectra in part A; the other (trace 500nm_44) has a halfwidth of 103 nm (sigma = 44 nm). Parts C and D show the FoMs for one of the dyes (trace 500nm_15) calculated as a function of the width of the central detection channel for the cases of parts A and B, respectively. In the case of the spectral shift of 20 nm (parts A and C) the FoM goes through a maximum of 0.45 at a width of the central channel of 15 nm. In the other case the maximum is very broad, with an optimal value of 0.4 at 55–60 nm width of the central channel.

The examples given above show that mixing photons from another dye into the detection channel, which provides the main signal for a dye of interest, deteriorates the quality of the photon count of that dye. It does so in a manner that depends on other detection channels. Two special situations are discussed here. In both cases it is assumed that n2 photons are added on average to the average count n1 of the dye, which we want to measure. In the optimum case the mean of n2 is exactly known (for instance, because the amount of this dye is accurately measured in another detection channel). Such a case was previously considered in the context of Eq. (17), where we discussed the influence of a known background light on the FoM of a single dye. The equation allowed us to calculate at which wavelength near the edge of the spectrum of a dye the detection channel should be cut off in order to optimize the FoM. We concluded that we can extend the measurement up to a point at which the spectral density of photons from the dye is one-half to one-fifth of that of the background. This shows that in this case photons are ‘valuable’ unless there is a large excess of background photons (depending on the ratio of the total numbers of fluorescence and background photons; Eq. 19). The second case is the two-dye mixture described above, in which the photons in the overlap region were ‘useless’, even for the case n1 = n2. The reason is that in this case the mean of n2 was an unknown and was determined together with n1 with an accuracy not better than that of n1.

As another simple example of two-dye mixtures, we consider two dyes with Gaussian emission spectra, and we assume that we can excite these dyes such that each contributes 100 detectable photons. In the first case, the two Gaussians have the same sigma of 15 nm, but are shifted by 20 nm (Fig. 5A); in the second, both dyes have the same centre wavelength, but one has a halfwidth 2.9 times larger than the other (Fig. 5B). We use three detector channels in each case (one of width w centred at 500 nm and two further channels, adjacent at shorter and longer wavelengths, respectively, and symmetrical around 500 nm) and calculate the FoM, while varying the width w of the central channel. In both cases the FoM goes through a maximum (Fig. 5C,D), which has quite similar values in the two cases. We conclude that a difference in spectral shape can be as valuable for separating the fluorescence of two dyes as a spectral shift.
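The three-channel calculation for Gaussian spectra can be sketched as follows. The sketch assumes that the variance of a concentration estimate is the corresponding diagonal element of the inverse of AᵀΣ⁻¹A, with Σ the diagonal matrix of Poisson variances of the channel counts (weighted least squares, which is our reading of Eq. 11), that the dyes of Fig. 5(A) are centred at 490 and 510 nm, and that NF = 100 for each dye. All function and variable names are illustrative.

```python
import math


def gauss_cdf(x, mu, sigma):
    """Cumulative distribution of a Gaussian emission spectrum."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fom_first_dye(edges, dyes, photons=100.0):
    """FoM of the first of two dyes for contiguous channels bounded by edges (nm)."""
    # a[i][j]: mean photons in channel i from dye j at unit concentration
    a = [[photons * (gauss_cdf(hi, mu, s) - gauss_cdf(lo, mu, s))
          for mu, s in dyes]
         for lo, hi in zip(edges[:-1], edges[1:])]
    y = [row[0] + row[1] for row in a]  # Poisson variance of each channel
    # Elements of A^T Sigma^-1 A; empty channels carry no information
    f11 = sum(r[0] * r[0] / m for r, m in zip(a, y) if m > 0.0)
    f12 = sum(r[0] * r[1] / m for r, m in zip(a, y) if m > 0.0)
    f22 = sum(r[1] * r[1] / m for r, m in zip(a, y) if m > 0.0)
    var_x1 = f22 / (f11 * f22 - f12 ** 2)  # first diagonal element of the inverse
    return 1.0 / (var_x1 * photons)


def three_channels(w):
    """Channel boundaries for a central channel of width w centred at 500 nm."""
    return [400.0, 500.0 - w / 2.0, 500.0 + w / 2.0, 600.0]


shifted = [(490.0, 15.0), (510.0, 15.0)]  # (centre, sigma): 20 nm spectral shift
widths = [(500.0, 15.0), (500.0, 44.0)]   # same centre, different widths

for w in (0.0, 15.0, 30.0):
    print(f"shift case, w = {w:4.1f} nm: FoM = "
          f"{fom_first_dye(three_channels(w), shifted):.3f}")
for w in (30.0, 58.0, 90.0):
    print(f"width case, w = {w:4.1f} nm: FoM = "
          f"{fom_first_dye(three_channels(w), widths):.3f}")
```

With these assumptions the sketch yields an FoM of about 0.39 for the shifted dyes at w = 0 (two channels without a gap), about 0.45 at w = 15 nm, and about 0.40 for the narrow dye of the equal-centre pair at w = 58 nm. The equal-centre case must not be evaluated at w = 0, where the two outer channels measure identical signals and the matrix is singular.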

Replacing the central detection channel by a gap and measuring with only two detection channels leads to some loss of resolution. In the case of the spectral shift (Fig. 5A) the maximum FoM is 0.416 (at a gap width of 9 nm) and it is 0.39 at zero gap. This compares with the optimum of 0.44 for three channels, as discussed above. The case of the spectral broadening (part D) results in a singular matrix for two symmetrical detector channels with a gap in between because of the symmetry (the two channels measure identical signals).

In order to explore how temporal information contributes to the separation of fluorophores, we assume that one of the dyes of Fig. 5(A) has a fluorescence lifetime of 2 ns, and the other of 4 ns. We further assume that we have two photon counting devices at hand, each of which is able to separate the counts of its respective detector into classes, corresponding to photons with delays (after a short excitation pulse) shorter and longer than certain threshold values t1 and t2, respectively. We select the spectral ranges of the two channels to extend from 400 to 500 nm and from 500 to 600 nm, respectively, and investigate how much the resolution improves by treating these counts as four detection channels in the sense of our calculation (two lifetime classes each of two spectral channels) and how the values of t1 and t2 have to be selected in order to optimize the resolution. The optimum is an FoM of 0.45 for t1 = 5.2 ns (for the channel on the side of the 2-ns lifetime dye) and t2 = 3.5 ns (for the other channel). Thus, taking advantage of fluorescence lifetime information can improve the resolution between two dyes with a two-fold lifetime difference by about 15% (0.45 vs. 0.39). In the case that the two dyes have no spectral shift at all, but a two-fold difference in fluorescence lifetime, an FoM of 0.13 is obtained. Obviously, temporal information is most valuable when other means of photon sorting are inapplicable. The effect of lifetime is much larger if the ratio of lifetimes is larger. For a spectral shift of 20 nm and a 10-fold difference in lifetime the FoM increases to 0.72. Figure 6(A) shows the dependence of the FoM as a function of the ratio in lifetimes and Fig. 6(B) is a three-dimensional plot of the FoM for the case of a lifetime ratio of 3 as a function of the two time thresholds t1 and t2.
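The effect of the lifetime thresholds can be sketched in the same way by treating each combination of spectral band and delay class as one detection channel. The sketch assumes single-exponential decays, ideal timing, the dyes of Fig. 5(A) centred at 490 nm (2 ns) and 510 nm (4 ns), and the weighted least-squares variance used for the other examples; names are illustrative.

```python
import math

# (centre nm, sigma nm, lifetime ns) for the two dyes; centres are assumed
DYES = [(490.0, 15.0, 2.0), (510.0, 15.0, 4.0)]


def gauss_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def counts(lo, hi, t_lo, t_hi, photons=100.0):
    """Mean photons per dye with wavelength in [lo, hi] and delay in [t_lo, t_hi)."""
    out = []
    for mu, sigma, tau in DYES:
        spectral = gauss_cdf(hi, mu, sigma) - gauss_cdf(lo, mu, sigma)
        temporal = math.exp(-t_lo / tau) - math.exp(-t_hi / tau)
        out.append(photons * spectral * temporal)
    return out


def fom_dye1(t1, t2, photons=100.0):
    """FoM of dye 1 for two spectral bands, each split at a delay threshold (ns)."""
    rows = [counts(400.0, 500.0, 0.0, t1), counts(400.0, 500.0, t1, math.inf),
            counts(500.0, 600.0, 0.0, t2), counts(500.0, 600.0, t2, math.inf)]
    f11 = f12 = f22 = 0.0
    for a1, a2 in rows:
        y = a1 + a2              # Poisson variance of this channel
        if y > 0.0:              # an empty delay class carries no information
            f11 += a1 * a1 / y
            f12 += a1 * a2 / y
            f22 += a2 * a2 / y
    var_x1 = f22 / (f11 * f22 - f12 ** 2)
    return 1.0 / (var_x1 * photons)


print(f"no lifetime information:     FoM = {fom_dye1(math.inf, math.inf):.3f}")
print(f"t1 = 5.2 ns, t2 = 3.5 ns:    FoM = {fom_dye1(5.2, 3.5):.3f}")
```

Under these assumptions the thresholds quoted above give an FoM of about 0.45, compared with about 0.39 when the delay information is ignored (both thresholds set to infinity).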

Figure 6.

Separation of fluorophores by fluorescence lifetimes. Fluorescence lifetimes were assigned to the two dyes of Fig. 5(A). Dye 1 (centred at 490 nm) was assumed to have an exponentially distributed lifetime with a mean of 2 ns, whereas the mean lifetime of dye 2 was allowed to vary. Detection channel 1 (450–500 nm) was split into two channels, one accepting photons with lifetimes shorter than t1 and another collecting the remaining photons of this wavelength band. The same was performed with detection channel 2 (500–550 nm) using a threshold t2, such that a total of four detection channels were available. FoM values were calculated, as described in the text, varying the ratio of mean fluorescence lifetimes as well as t1 and t2. Part A plots the optimum FoM of dye 1 against the ratio of lifetimes, in each case optimizing t1 and t2. The lowest value of 0.4 is obtained when the dyes have the same lifetime (identical to the case of Fig. 5C with zero width of the central detection channel). Part B is a three-dimensional plot of FoM of dye 1 against the lifetime thresholds t1 and t2 for the case of a lifetime ratio of 3.

Three and more dyes

In order to demonstrate the power of the formalism described above, we discuss in detail the example of a more complicated dye mixture, which may be difficult to separate by conventional means. Although we make an effort to perform these calculations under ‘more realistic’ conditions including dye bleaching (see Appendix for details on how we calculate aij and yo,i) and realistic, intensity-based spectra, we are aware that quantitative measurements on the microscope will generate a number of additional complications.

The example we use is that of the three autofluorescent proteins cyan fluorescent protein (ECFP), green fluorescent protein (EGFP) and yellow fluorescent protein (EYFP). The excitation and emission spectra of the enhanced forms of these proteins, as supplied by Clontech (www.clontech.com/gfp/excitation.shtml), are shown in Fig. 7. We assume that we can excite the fluorescence of these proteins with the laser wavelengths that are typically available on a laser scanning microscope (LSM) and that the emission spectrum can be split into several detection channels, either by a cascade of beam splitters or by a Prism Spectrophotometer Detector (Leica), or by a photomultiplier array (Zeiss Meta, up to 32 channels). In some of the calculations we assume that the dyes can be simultaneously excited by two or three laser wavelengths (which is possible at minimum loss of detection bandwidth with an acousto-optical beam splitter). On a widefield microscope simultaneous excitation should be possible with suitable filter sets, but at substantial loss of emitted light.

Figure 7.

Resolution as a function of the number of detection channels. The example of the three autofluorescent proteins cyan fluorescent protein (ECFP), green fluorescent protein (EGFP) and yellow fluorescent protein (EYFP) was used to explore how the resolution of the dye concentration estimates depends on the number of detection channels. Parts A and B show excitation and emission spectra. It is assumed that dyes are present in the focal volume at concentrations such that when excited at their excitation maximum, the same number of photons (about 200) is detected for each dye. The dashed vertical lines in part A show the positions of the laser lines used for excitation. The dashed vertical lines in part B represent the boundaries of the detection channels, which were found to be optimal for the case of a single simultaneous exposure with two laser lines (458 and 514 nm) and only three detection channels. Part C shows the mean ratio of the variance of concentration estimates divided by the square of the concentration estimates, averaged over the three dyes (the quantity S, Eq. 13, divided by 3) as a function of the number of detector channels. The upper two curves are for a single exposure with simultaneous excitation at 458 and 514 nm. The lower three curves are for the case of two exposures in sequence. The values at the very right in each case represent 34 or 68 detection channels, respectively. Otherwise the abscissa shows the total number of detection channels, which for the case of the double exposure is twice the number of physically existing detection channels. Open squares are values for optimal gaps between spectral windows (see Table 1 for coordinates of spectral windows). Circles were calculated for contiguous spectral windows without gaps, and crosses represent values for which two sequential exposures using the same spectral windows without gaps were used.

We asked the following questions:

  1. What is the efficiency, in terms of FoM, in separating the contributions of these three dyes and how do laser wavelengths and intensities have to be selected to optimize this efficiency?
  2. How does the efficiency depend on the number of detection channels?
  3. How does simultaneous excitation with two or three wavelengths compare with sequential exposure using one wavelength at a time?
  4. What is the gain in efficiency when, during a sequence of two exposures, the spectral detection windows are optimized for each exposure (vs. the case of using the same windows in both exposures)?

To answer these questions we calculated FoM and S, the sum of the expected variances of dye estimates (Eq. 13; at normalized mean concentrations, x = 1, for each dye) for many possible combinations of imaging parameters and we selected the combination with the best S value. Because it is very computation-intensive to test all possible combinations of excitation wavelengths, excitation intensities, number of detection channels, boundaries of detection channels, etc., we adopted the following three-step procedure for a given set of excitation wavelengths.

Step 1.  Optimize the intensities of the excitation: to do so we first calculated FoM and S values using a large number of equally spaced detection channels (34 for each exposure), which covered the whole range of the emission spectra of the three dyes. We increased excitation intensities of the individual laser lines up to the point at which dye bleaching reached a certain limit. For these calculations we used the same relatively arbitrary bleaching constants γi (see Eq. A4) for all three dyes, which would typically lead to 50% bleaching during an exposure, in which about 200 photons from a given dye can be detected. While calculating FoM and S values for all eligible combinations of excitation intensities, we investigated the optimum S value for which the maximum bleaching of any of the three dyes remained smaller than a certain threshold value (80% in most of the cases described below). We did so for one or two consecutive exposures and stored optimum parameters in both cases. For 1 unit of excitation light, white background light of 1 photon per 20 nm bandwidth was included in all the simulations (linearly increasing with illumination intensity and bandwidth).

Step 2.  Optimization of the detection with fewer channels. Assuming that a large number of detection channels is able to extract all the available information about dyes and that the optimum excitation intensities determined with many detection channels should also be appropriate for fewer detection channels, we held the excitation parameters constant, reduced the number of detection channels and systematically varied their boundaries in order to determine the optimum boundaries for a given number of detection channels and a given number of exposures. The optimum thus achieved may not be the absolute optimum because it is conceivable that with a restricted number of detection channels other excitation parameters might yield slightly better values. However, our result (see below) that a small number of channels, optimized in this way, gives close to optimal results leaves little room for improvement.

Step 3.  Introduction of band gaps: from the simulation of simple two-dye mixtures described above, it seems likely that the introduction of gaps between detection channels may improve the results. We therefore introduced such gaps in the calculation in order to optimize the results further. In most cases this step did indeed improve the result. However, the improvement rarely exceeded a few per cent change in FoM or S (see below).
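The boundary search of Step 2 can be illustrated with a toy calculation. The sketch below does not use the ECFP, EGFP and EYFP spectra: it assumes three hypothetical Gaussian emission spectra, a single exposure, no bleaching, no background and no readout noise, and it restricts the candidate boundaries to the 5-nm grid of the many-channel reference. S is taken as the sum of the diagonal elements of the inverse of AᵀΣ⁻¹A at unit concentrations, which is our reading of Eqs (11) and (13).

```python
import itertools
import math

# Hypothetical toy spectra (centre nm, sigma nm); NOT the measured protein spectra
DYES = [(480.0, 20.0), (510.0, 20.0), (530.0, 20.0)]
PHOTONS = 200.0  # detectable photons per dye at unit concentration
GRID = [450.0 + 5.0 * i for i in range(35)]  # 34 channels of 5 nm, 450-620 nm


def cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def s_value(edges):
    """Sum of the variances of the three concentration estimates at x = 1."""
    f = [[0.0] * 3 for _ in range(3)]
    for lo, hi in zip(edges[:-1], edges[1:]):
        a = [PHOTONS * (cdf(hi, mu, s) - cdf(lo, mu, s)) for mu, s in DYES]
        y = sum(a)  # Poisson variance of the channel count
        if y <= 0.0:
            continue
        for j in range(3):
            for k in range(3):
                f[j][k] += a[j] * a[k] / y
    # Diagonal of the inverse of the 3 x 3 matrix via cofactors
    c00 = f[1][1] * f[2][2] - f[1][2] * f[2][1]
    c11 = f[0][0] * f[2][2] - f[0][2] * f[2][0]
    c22 = f[0][0] * f[1][1] - f[0][1] * f[1][0]
    det = (f[0][0] * c00
           - f[0][1] * (f[1][0] * f[2][2] - f[1][2] * f[2][0])
           + f[0][2] * (f[1][0] * f[2][1] - f[1][1] * f[2][0]))
    if det <= 0.0:
        return math.inf  # dyes cannot be separated with these channels
    return (c00 + c11 + c22) / det


def best_boundaries(n_channels):
    """Exhaustive search over grid boundaries for contiguous channels."""
    best_s, best_edges = math.inf, None
    for inner in itertools.combinations(GRID[1:-1], n_channels - 1):
        edges = [GRID[0], *inner, GRID[-1]]
        s = s_value(edges)
        if s < best_s:
            best_s, best_edges = s, edges
    return best_s, best_edges


s_many = s_value(GRID)
s3, edges3 = best_boundaries(3)
s4, edges4 = best_boundaries(4)
print(f"34 channels: S/3 = {s_many / 3:.4f}")
print(f" 4 channels: S/3 = {s4 / 3:.4f}, boundaries {edges4}")
print(f" 3 channels: S/3 = {s3 / 3:.4f}, boundaries {edges3}")
```

Because merging neighbouring channels can only discard information, the many-channel value of S is a lower bound for any coarser configuration on the same grid; the comparison therefore shows directly how much resolution is given up by using three or four optimally placed channels. Gaps (Step 3), bleaching, background and a second exposure would enter as additional search parameters and additional rows of A.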

General results.  After only a few simulations allowing simultaneous or sequential excitation at wavelengths of 458, 476, 488 and 514 nm, it became apparent that the optimum illumination was always such that only laser lines at 458 and 514 nm were used. Optimum intensity for both 476 and 488 nm was always zero. Note that this excitation is not optimal for resolving GFP, which is a consequence of the fact that we searched for the optimum S and not for that of a single dye. The exclusion of excitation at 476 and 488 nm is understandable because excitation at these wavelengths produces far more overlap in emission than excitation at either 458 or 514 nm (see Table 1 and Fig. 7C). In addition, it was apparent that sequential excitation (first at 458 nm and then at 514 nm) was generally about twice as efficient (in terms of FoM) as simultaneous excitation at 458 and 514 nm. This means that twice the number of photons have to be collected in the latter case in order to achieve the same resolution as in the former. For sequential excitation at 458 and 514 nm, the mean FoM for the three dyes was typically around 0.40, when the bleaching threshold was set to 80% (i.e. 20% of the dye remaining unbleached). This means that about 2.5 times as many photons have to be collected as would be required for the same resolution if the emission spectra do not overlap, and if no background light is present. This performance is comparable with that of the classical approach, using filter cubes for simultaneous observation of only two fluorescent proteins. Such filter sets usually employ emission filters with 20–50 nm bandwidth, which typically transmit 20–50% of the available photons. A three-dye filter cube would probably have much narrower bands if this was at all feasible for the three proteins considered here.

Table 1.  Imaging parameters, which optimize S values for overlapping spectra of three dyes. Spectra with the general properties of those of ECFP, EGFP and EYFP were assumed for the simulations. Relative excitation light intensities refer to spectra normalized to peak intensities (per nm) both for emission and excitation. We assumed nominal concentrations of 1 for each dye, which (in view of the normalization of the spectra) refer to those concentrations that, when excited at the peak of the excitation spectrum, excite the unit of light emission (per nm) at the peak of the emission spectrum. The upper half of a double-row corresponds to the parameters of the first exposure and the lower one to those of the second exposure. so · λo (see Eq. B3) was selected such that the ‘unit’ of emission light at 500 nm corresponds to 0.233 photons per nm band width. For simplicity, light quanta were assumed to be monodisperse at a given wavelength (CV² = 0). White stray light at an intensity of 1 photon per 20 nm bandwidth was added for each unit of excitation light as well as readout noise of 2 photons per detection channel (except for the case of ‘many channels’, where readout noise was neglected).

Type of simulation | Exposure | Excitation at 458 nm | Excitation at 488 nm | Excitation at 514 nm | Bleaching threshold (% bleached) | Detection channels (nm) | S/3 | Mean FoM
------------------ | -------- | -------------------- | -------------------- | -------------------- | -------------------------------- | ----------------------- | --- | --------
Strong illumination; many detection channels | first | 0 | 0 | 3 | 80% | 34 channels, 5 nm each | 0.0073 | 0.42
 | second | 4 | 0 | 0 | | from 450 to 620 nm | |
Strong illumination; 2 exposures; 3 channels each | first | 0 | 0 | 3 | 80% | 474–515; 519–525; 529–620 | 0.0077 | 0.401
 | second | 4 | 0 | 0 | | 454–485; 489–500; 504–618 | |
Strong illumination; 1 exposure; 4 channels | single | 4 | 0 | 3 | 80% | 454–480; 484–490; 494–520; 529–620 | 0.015 | 0.207
Weak illumination; 2 exposures; 3 channels each | first | 0 | 0.05 | 0.4 | 20% | 450–515; 515–525; 525–620 | 0.037 | 0.356
 | second | 0.5 | 0 | 0 | | 450–485; 485–500; 500–620 | |

How many detection channels are required to extract the bulk of the information? When optimum values for FoM and S obtained for many detection channels were compared with those from fewer ones, a remarkably small number of detection channels (if optimally selected) were able to achieve almost as good a resolution as a large number. Table 1 gives optimum FoM values, as well as some other parameters of the simulations, and Fig. 7(C) plots S/3 against the number of optimally configured detection channels. For the case of two sequential exposures (lower trace, circles) and individual optimization of channel boundaries, S/3 increases from a value of 0.0073 at 68 detection channels to 0.008 at four detection channels (two for exposure 1 and two for exposure 2). In this simulation we assumed readout noise to be zero. Including readout noise deteriorates the resolution. This effect has been shown to be worse for a higher number of channels (Zimmermann et al., 2003), such that there will be an optimal number of channels. This deterioration is obvious in the limiting case of very many detection channels, in which each channel receives only a small signal and readout noise is dominating. Binning of neighbouring channels in a detector array, which is equivalent to considering broader and fewer channels, will reduce the adverse effects of readout noise.

For a single exposure with simultaneous excitation at both 458 and 514 nm (upper trace, circles in Fig. 7C) the values are about twice as high as those resulting from a dual exposure. The trace with crosses gives the results for a very similar calculation, in which during a dual exposure the detection windows were forced to be the same for both exposures. This may be advantageous in practice in spite of some loss of resolution, because the change of emission filters may be time consuming and may introduce problems of alignment and reproducibility. ‘Freezing’ the detection channel boundaries compromises the resolution to any significant degree only for the lowest number of detection channels. The simulations for simultaneous excitation with two laser wavelengths are approximate in the sense that they do not consider the fact that fluorescence detection has to be suppressed in a window around the wavelength of the second laser line. However, the availability of an acousto-optical beam splitter (AOBS) brings this hypothetical case close to reality. An AOBS can blank out the detection in a window of only 5 nm. We simulated such emission filtering in a few cases and found that the optimum FoMs of ECFP and EGFP were typically decreased by about 10%, whereas that of EYFP was almost unchanged.

Bleaching sets a limit to the resolution.  As in the case of a single fluorophore (Fig. 2), increased excitation intensity or longer exposure time will improve the S value of a measurement up to the point at which most of the dye is bleached (the FoM, being a relative measure of detection efficiency of photons, is less sensitive to changes in excitation intensity). At longer exposure times S values may even deteriorate if substantial background light is present. We simulated the case of immobile fluorophores that were bleached with an exponential time course reaching a 50% value at a time when about 200 photons had been detected. This number should be valid for the case that 10 molecules of EGFP are in the focal volume, assuming that about 0.5% of all emitted photons can be detected (Harms et al., 2001). We performed separate simulations with different bleaching thresholds (which led to different illumination intensities; see above, the description of ‘step 1’ in the optimization procedure) and calculated the resulting S values as a function of these bleaching thresholds. It was found that an S value of 0.02 is obtained when 90% of the dyes are bleached. This does not improve further upon more illumination owing to background light, which in this calculation was assumed to be white and to contribute about 80 photons (over the whole spectral range, for the case of 90% bleaching). The SNR for an individual dye is therefore limited to about 12 [=√(3/S)] under these conditions.
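The relation between S, the mean SNR of an individual dye and the FoM used in this section amounts to simple arithmetic, which the following sketch spells out. It assumes, as in the text, that S is the sum of the relative variances of the concentration estimates at unit concentration and that the FoM is the ratio of the squared SNR to the number of detectable photons; the function names are illustrative.

```python
import math


def mean_snr(s, n_dyes=3):
    """Mean SNR of one dye when S is the sum of n_dyes relative variances."""
    return math.sqrt(n_dyes / s)


def photon_factor(fom):
    """Factor by which the photon count must rise to match the ideal case."""
    return 1.0 / fom


print(f"S = 0.02     -> SNR of about {mean_snr(0.02):.1f}")
print(f"S/3 = 0.0073 -> SNR of about {mean_snr(3 * 0.0073):.1f}")
print(f"FoM = 0.40   -> {photon_factor(0.40):.1f} times as many photons")
```

For S = 0.02 this gives an SNR of about 12, and an FoM of 0.40 corresponds to a 2.5-fold increase in the number of photons required.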

Differences in bleaching time course may help to separate fluorophores.  In the above calculation the bleaching time constants for the three dyes were assumed to be equal to each other, although it is known that there are distinct differences in bleaching between different fluorescent proteins (Harms et al., 2001; Patterson et al., 2001). We noticed that, when allowing for different bleaching time courses between dyes, the optimum solution quite frequently bleached one of the dyes more or less completely during a first exposure, which allowed us to separate the remaining two dyes during a second exposure without interference from the first one. This effect is not as pronounced when allowing ECFP to bleach because the spectra of ECFP are sufficiently distinct from those of the other dyes that exposure patterns can be found that allow at least one exposure with little interference from ECFP, even without bleaching it. However, when (merely to demonstrate the principle) EGFP was turned into a dye with high susceptibility to bleaching (by increasing γj, Eq. A4), significantly improved solutions could be found. As an example we compare two cases. First, the three dyes were assigned equal γj values and the bleaching threshold was set to 40% for all dyes (such that 60% remained unbleached). An optimum S value of 0.050 was obtained. When subsequently the bleaching rate of EGFP was tripled and EGFP was allowed to bleach more completely, while the bleaching threshold for the other two dyes was maintained at 40%, an S value of 0.0425 was obtained. For this calculation the concentration of EGFP was adjusted such that it emitted the same number of photons as in the control run despite its higher bleaching. The comparison shows that the temporal separation of photons from different dyes owing to differential bleaching in two consecutive exposures helps to avoid the ‘mixing’ of photons and therefore improves the resolution. It should be noted, however, that strong bleaching may cause additional problems when dyes are mobile, owing to the diffusional redistribution of non-bleached dyes.

Condition of the system of linear equations

In the preceding paragraphs we considered noise in light measurements and spectral overlap as the only reasons for inaccuracies in estimates of the concentrations of dyes. Other sources of error are incomplete knowledge of the dye spectra, the nature of the background light and, in particular, changes of dye properties in the environment of the cytoplasm. In practice one may have to introduce cellular autofluorescence as another ‘quasi fluorophore’ in the calculations and one may have to measure dye spectra ‘in vivo’. Nevertheless, errors in these quantities will remain, which will lead to errors in the elements of matrix A (Eq. 7) and in the background light vector yo. Such errors will be less detrimental to the resolution of the unknown vector x the better the condition of matrix AᵀA (Eq. 2). This, in turn, is largely determined by det(AᵀA), the determinant of matrix AᵀA, or some other measure of its condition, such as det(AᵀA)/(N/(n − 1))(n−1/2) (Bodewig, 1959; N is the trace and n is the dimension of AᵀA). There is a strong correlation between this condition number and S, the value optimized by the calculations above. Figure 8(A) plots 3/S against the condition number for about 137 000 values of 3/S, which were calculated during an optimization run. The best (largest) 3/S values also have large values of this condition number, although for satisfactory 3/S values the condition number can have a certain range of values. The correlation is far better between S and det(H⁻¹) (Eq. 16), which also includes information regarding noise in the experimental values (Fig. 8B). Therefore, the optimization of imaging parameters described here should also select for good cases with respect to the condition of AᵀA and H⁻¹ and thereby minimize adverse effects of errors in other properties, which are not explicitly discussed here.

Figure 8.

Resolution and the ‘Condition’ of matrices. Part A plots 3/S (see Eq. 13) against the condition number of matrix AᵀA (see text). The calculation was performed using spectra of the three autofluorescent proteins ECFP, EGFP and EYFP with two consecutive exposures and four detection channels each. It was repeated 137 000 times, while systematically varying the spectral borders of the detection channels and the excitation strengths of all three dyes during the second exposure (optimal values during the first exposure had been determined in a previous calculation). Part B plots values of 3/S against the determinant of matrix H⁻¹ (Eq. 11), which is CᵀC. In this experiment the bleaching threshold was set lower than that of Fig. 7 and Table 1. Therefore, the optimum resolution is somewhat lower.

Discussion

The method of ‘linear unmixing’ of the contributions from several fluorescent species, which may be present simultaneously at a given pixel in a microscopic image, has great potential for analysing images of multiply labelled biological samples (Keshava & Mustard, 2000; Tsurui et al., 2000; Lansford et al., 2001; Zimmermann et al., 2002, 2003). We have shown that a surprisingly small number of spectral detection channels is sufficient to calculate the individual fluorescence contributions in mixtures of three dyes (such as CFP, GFP, YFP) at near optimal resolution, even if the spectra of such dyes overlap strongly, and we describe procedures for optimal selection of the spectral ranges of such detection channels. To discuss the degree of resolution and the loss due to spectral overlap and various noise sources, we define a new quantity, the ‘figure of merit’ (FoM), which indicates how well a given set of imaging parameters performs for a given set of fluorophores relative to the case that these fluorophores are present singly and that their fluorescence can be measured noiselessly – except for unavoidable photon shot noise. The FoM is the ratio of the apparent number of detected quanta, estimated on the basis of the SNR of a given measurement, to the number of detectable quanta, as judged from the known properties of a given microscope and the spectral properties of the dye. For the three autofluorescent proteins ECFP, EGFP and EYFP and for two sequential exposures at 458 and 514 nm, the FoM is about 0.4, i.e. one would have to excite and detect about 2.5 times more photons to reach the resolution that would be obtained if the dyes were not mixed and if their fluorescence could be measured without background noise.

The FoM, as a measure of resolution, has the advantage over other similar measures, such as the SNR, that it is normalized with respect to the total number of individual photons during a given exposure and therefore is less dependent on illumination intensity. In this sense it is a measure of the efficiency of detectable photons in contributing to the information regarding the dye (similar to the quantum efficiency of a detector), whereas the SNR depends strongly on the cumulative effect of all detected photons.

Using the formalism developed in this paper on a real microscope requires that the microscope be calibrated, for the laser lines being used, in terms of the emission spectra, bleaching characteristics and excitation efficiencies of the dyes under consideration. Background light for all combinations of excitation and detection filters must be determined separately and the autofluorescence of the cells under study may have to be included as one or two additional ‘quasifluorophores’ (Zimmermann et al., 2003). On the other hand, it is expected that more background light can be tolerated (as compared with conventional techniques), because it is being corrected for in the unmixing algorithm. We think the effort of calibrating a microscope in the sense addressed above is worthwhile, because the ability to unmix the contributions of several fluorophores accurately will open up a number of possibilities for quantitative fluorescence microscopy. One of these is the determination of ion concentrations using indicator dyes in the presence of other fluorophores or autofluorescence because, for instance, the measurement of free Ca2+ concentration is merely the determination of the ratio of the concentration of Ca2+-bound indicator dye to that of free indicator dye. Treating both forms as two fluorescent species in the sense of the analysis presented here will allow us to determine this ratio and to estimate its accuracy in the presence of further dyes.

Another possible application is the analysis of fluorescence resonance energy transfer (FRET), in which the problem is to separate fluorescence contributions of FRET pairs (two fluorophores at a fixed spatial relation to each other, allowing FRET) from those of uncoupled donors and acceptors, as well as from autofluorescence and possibly the contributions from other labels (Zimmermann et al., 2002). In both of these special cases, additional complications will arise if illumination intensities are such that bleaching has to be considered. The complications arise because the fluorescent species considered here probably bleach at different rates and may interconvert on the time-scale of the measurement. This will require a reformulation of Eqs (A2)–(A4). With respect to FRET it will also be interesting to explore how much the sorting problem can be improved by temporal separation of photons in addition to the spectral separation. This is readily done (see Fig. 6) by introducing a lifetime threshold as another parameter (equivalent to a spectral border) and splitting the signal of the relevant detector channel into two, according to the known lifetime and the threshold. In this way fluorescence lifetime imaging (Clegg et al., 1992) can readily be combined with spectrally resolved measurements.
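The splitting of a detector channel by a lifetime threshold can be illustrated with a minimal Python sketch. It assumes a mono-exponential fluorescence decay after a short excitation pulse, so that a fraction exp(−T/τ) of the photons of a dye with lifetime τ arrives later than the threshold T. The function name, the lifetimes and the threshold are hypothetical and chosen for illustration only; they are not taken from the calculations of this paper.

```python
import math

def split_by_lifetime(a_ij, tau_ns, threshold_ns):
    """Split one matrix element a_ij into an 'early' and a 'late' sub-channel.

    Assumption: mono-exponential decay with lifetime tau_ns, so the fraction
    of photons detected after threshold_ns is exp(-threshold_ns / tau_ns).
    """
    late_fraction = math.exp(-threshold_ns / tau_ns)
    return a_ij * (1.0 - late_fraction), a_ij * late_fraction

# Hypothetical example: two dyes with identical signal in one spectral
# channel but different lifetimes (illustrative values only).
threshold = 2.0  # ns
early_1, late_1 = split_by_lifetime(1.0, tau_ns=1.5, threshold_ns=threshold)
early_2, late_2 = split_by_lifetime(1.0, tau_ns=3.0, threshold_ns=threshold)

# The original single row of matrix A was (1.0, 1.0) and could not
# distinguish the dyes; the two new rows differ between them.
print(f"dye 1: early={early_1:.3f}, late={late_1:.3f}")
print(f"dye 2: early={early_2:.3f}, late={late_2:.3f}")
```

In this picture the lifetime threshold plays the same role as a spectral border: it replaces one row of matrix A by two rows whose ratio depends on the dye.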

Given the enormous possibilities of molecular biology to label proteins in living cells in a specific manner and the need to determine subcellular localizations of such labels as well as to probe their interactions by FRET, we expect that analytical techniques for optimization of the ‘unmixing’ of such labels, as described here, will be extremely valuable tools in cellular biology.

Acknowledgements

We would like to thank Drs S. Hell, F. Wouters, M. Rupnik and R. Heintzmann for helpful comments on the manuscript. We also thank E. Lemke for providing microscope samples.

Appendices

Appendix A

Calculation of the elements aij of matrix A including effects of dye bleaching

The elements aij of matrix A (Eq. 7) represent the contributions of dye j to the signal measured in detection channel i under the illumination conditions chosen, if dye j is present at a certain standard concentration. They are calculated as products of a quantity inline image, which depends on the excitation efficiency and emission properties of fluorophore j within the spectral band of detection channel i and the degree of bleaching inline image of fluorophore j during exposure µ:

image((A1))

We include inline image in the matrix A because we want the unknown x in Eq. (7) to represent the concentrations of dye that were present at a given pixel at the beginning of an exposure or of a series of exposures. inline image carries an index µ because we want to include data from several consecutive exposures in the analysis and the degree of bleaching will change in such a series of exposures. Therefore, we define the vectors yo and y to have n · m elements, where n is the number of physically existing detector channels and m is the number of exposures. The index µ is incremented with each exposure from 0 to a maximum of m − 1. The quantities inline image will be the same for all rows i that belong to the same exposure. Specifically, index i is calculated as l + µn, where l is the number of a physical detector channel (0 ≤ l ≤ n − 1).
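The row layout described above can be sketched in a few lines of Python. The sketch assumes zero-based channel numbers l = 0, … , n − 1 and zero-based exposure numbers µ = 0, … , m − 1; the function names are hypothetical.

```python
def row_index(l, mu, n):
    """Row i of matrix A for physical channel l (0 <= l <= n - 1) in exposure mu."""
    if not 0 <= l < n:
        raise ValueError("channel index l must satisfy 0 <= l <= n - 1")
    return l + mu * n

def channel_and_exposure(i, n):
    """Inverse mapping: recover (l, mu) from the row index i."""
    mu, l = divmod(i, n)
    return l, mu

# Example: four detection channels and two consecutive exposures
# give n * m = 8 rows, numbered exposure by exposure.
n, m = 4, 2
rows = [row_index(l, mu, n) for mu in range(m) for l in range(n)]
print(rows)
print(channel_and_exposure(6, n))  # channel 2 of the second exposure
```

All rows belonging to one exposure are contiguous, which is why quantities that depend only on µ are the same within each block of n rows.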

Values for inline image are calculated from the emission ej(λ) and absorption spectra aj(λ) of dye j and the spectral window of channel i (= l + µn) according to

image((A2))

where λl,0 and λl,1 are the spectral borders of the physical detection channel l. ηl(λ) represents attenuation in the detection pathway such as emission filters and wavelength-dependent detection efficiencies and inline image is proportional to the total excitation of dye j during exposure µ. It may be the sum of excitations at several laser wavelengths λk (k = 1, … , number of excitation lines considered), each of which is used at intensity inline image during the exposure time tµ.

Thus

image((A3))

with aj(λk) representing the excitation spectrum of dye j.
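As a minimal sketch of how such a matrix element might be evaluated numerically, the Python code below assumes that the total excitation is the exposure time multiplied by the sum, over laser lines, of intensity times excitation spectrum, and that the element (before bleaching) is this excitation multiplied by the integral of ηl(λ)ej(λ) between the channel borders. This form is our reading of the verbal description, not a transcription of Eqs (A2) and (A3). The Gaussian spectra, laser line and channel borders are hypothetical stand-ins for measured data.

```python
import math

def gaussian(lam, centre, width):
    """Unnormalised Gaussian, used here as a hypothetical spectral shape."""
    return math.exp(-0.5 * ((lam - centre) / width) ** 2)

def emission(lam):
    """Hypothetical emission spectrum, normalised to unit area."""
    return gaussian(lam, 510.0, 15.0) / (15.0 * math.sqrt(2.0 * math.pi))

def absorption(lam):
    """Hypothetical excitation spectrum, normalised to a peak value of 1."""
    return gaussian(lam, 488.0, 20.0)

def excitation_strength(abs_spectrum, laser_lines, exposure_time):
    """Exposure time times the sum over lines of intensity * a_j(lambda_k)."""
    return exposure_time * sum(i_k * abs_spectrum(lam_k) for lam_k, i_k in laser_lines)

def channel_integral(em_spectrum, eta, lam0, lam1, steps=2000):
    """Trapezoidal integral of eta(lambda) * e_j(lambda) between the borders."""
    h = (lam1 - lam0) / steps
    f = lambda lam: eta(lam) * em_spectrum(lam)
    total = 0.5 * (f(lam0) + f(lam1)) + sum(f(lam0 + k * h) for k in range(1, steps))
    return total * h

laser_lines = [(458.0, 1.0)]  # (wavelength in nm, intensity in arbitrary units)
E = excitation_strength(absorption, laser_lines, exposure_time=1.0)
window = channel_integral(emission, lambda lam: 1.0, 495.0, 525.0)
a_tilde = E * window  # matrix element without the bleaching factor
print(f"excitation={E:.4f}, collected fraction={window:.4f}, element={a_tilde:.4f}")
```

With measured spectra, the two spectral functions would simply be replaced by interpolations of the tabulated data.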

The index µ is again included to allow for the fact that we want to use multiple exposures with different combinations of excitation wavelengths. In our actual implementation of the algorithm (programmed in C++, using Metrowerks CodeWarrior) we included a detector sensitivity function as a multiplier to e(λ), which we omitted in this description for simplicity.

The ‘bleaching parameters’ inline image are assumed to be determined by a dye-specific bleaching constant γj and the absorption of dye j during exposure µ. During a first exposure the dye concentration is assumed to decay exponentially with rate constant inline imageγj/tµ such that at the end the dye remaining is xjexp(–inline imageγj) and the mean during the whole interval is xj(1 − exp(–inline imageγj))/(inline imageγj). We therefore set

image((A4))

with inline image for all detection channels that correspond to the first exposure and

image((A5))

for all µ > 0, inline image representing the product of the bleaching factors of dye j during exposures preceding the µth one.
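The bleaching scheme described above can be sketched in Python as follows. The sketch assumes that the bleaching parameter of an exposure is the mean surviving fraction during that exposure, (1 − exp(−Eγ))/(Eγ), multiplied by the product of the end-of-exposure factors exp(−Eγ) of all preceding exposures, where E denotes the total excitation of the dye in the respective exposure. The function name and the numerical values are hypothetical.

```python
import math

def bleaching_parameters(excitations, gamma):
    """Mean surviving dye fraction in each exposure, for one dye.

    excitations: total excitation strength of the dye in each exposure.
    gamma: dye-specific bleaching constant.
    """
    params = []
    remaining = 1.0  # fraction of dye left at the start of the exposure
    for excitation in excitations:
        dose = excitation * gamma
        # Time average of exp(-dose * t / t_mu) over the exposure;
        # it tends to 1 when the dose tends to 0.
        mean_fraction = -math.expm1(-dose) / dose if dose > 0 else 1.0
        params.append(remaining * mean_fraction)
        remaining *= math.exp(-dose)  # dye left for the next exposure
    return params

# Two exposures with equal excitation and a hypothetical bleaching constant.
p = bleaching_parameters([1.0, 1.0], gamma=0.5)
print([round(value, 4) for value in p])
```

The second value is smaller than the first because the second exposure starts with the dye already depleted by the first one.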

The components of vector yo (Eq. 7) are calculated in analogy to the rows of matrix A as a sum of contributions from the individual excitation lines at exposure µ with intensity inline image and exposure time tµ:

image((A6))

Here, inline image are spectral functions of scattered light or leak-light, resulting from laser line k during exposure µ and inline image are excitation independent signals such as dark counts, offsets of detectors and room light.

It should be noted that, for simplicity, we assume bleaching to be strictly proportional to excitation (i.e. differences in quantum efficiency between dyes have to be accommodated in the ‘bleaching coefficient’ γj). Also, we would like to point out that bleaching may lead to changes in the spatial distribution of dyes during confocal laser scanning. Such effects, which depend on the mobility of dyes, scan speed and other factors, are not included in the current treatment of the bleaching problem.

Appendix B

Calculation of signal variance and FoM

For the calculation of signal amplitudes and the contributions to these from individual fluorophores (the matrix elements aij) it is irrelevant whether the detectors measure the energy of the light signal or photon counts, as long as the spectra used are based on the same type of measurement. In order to predict the noise of the measurement, however, the type of measurement has to be specifically considered. For Eqs (3)–(6) we were assuming that the signals represented photon counts and that emission spectra are normalized (∫e(λ)dλ = 1). We now discuss means by which to handle detector signals based on the measurement of light energy. We also need to specify what we mean by ‘normalized’ emission and excitation spectra in this case. Because we use the signals in a system of linear equations, we are quite free to define the units of excitation intensities and output signals; however, it is necessary to ascertain that individual signals (such as from different dyes, background and single photon signal) are specified in the correct relation to one another and to the bleaching process. On a real microscope an optimal choice for the definitions will be highly influenced by the instrumentation. For the simulations on the dye mixtures presented here, we took emission and excitation spectra from the literature and defined the unit of excitation light intensity as that of the peak of the excitation spectrum (note that in reality this is different for different dyes, such that values for bleaching constants have to be scaled accordingly). We defined the unit of the output signal as the peak of the emission spectrum and we set the single photon signal such that a typical detector signal would correspond to 100 photons. Similarly, we chose the bleaching coefficient such that 100% bleaching allowed us to collect 400 photons over the whole emission spectrum at unit dye concentration.

Considering fluorescence as a Poisson process, in which during a given exposure time n(λ) · Δλ individual single photon signals, each with energy s(λ), superimpose in the spectral window Δλ, we obtain for the total light energy y:

image((B1))

and for its variance inline image:

image((B2))

where we neglected background and have used the proportionality between s(λ) and 1/λ to replace s(λ) by soλo/λ, with so denoting the single photon signal at some reference wavelength λo. The factor inline image was introduced to allow for the fact that in a non-photon-counting detector the noise is increased owing to the dispersion of single photon signals, CVs representing the coefficient of variation (sigma/mean) of the single photon signal.
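The noise model can be sketched numerically on a discretized spectrum. The Python code below assumes the standard compound-Poisson result, in which the variance is the sum of s(λ)² n(λ) Δλ multiplied by an excess-noise factor (1 + CVs²); whether this is exactly the factor appearing in Eq. (B2) is an assumption on our part. The function name and the example values are hypothetical.

```python
def energy_signal_and_variance(wavelengths, photon_density, d_lambda, s_o, lambda_o, cv_s):
    """Mean light-energy signal and its variance for a Poisson photon stream.

    Each photon at wavelength lam contributes s(lam) = s_o * lambda_o / lam.
    Assumed variance: (1 + cv_s**2) * sum(s(lam)**2 * n(lam) * d_lambda).
    """
    y = 0.0
    var = 0.0
    for lam, n in zip(wavelengths, photon_density):
        s = s_o * lambda_o / lam
        y += s * n * d_lambda
        var += s * s * n * d_lambda
    return y, (1.0 + cv_s ** 2) * var

# 100 photons at the reference wavelength, ideal photon counting (CVs = 0):
y_ideal, var_ideal = energy_signal_and_variance([500.0], [100.0], 1.0, 1.0, 500.0, 0.0)
# Same light, but single photon signals scatter with CVs = 0.5:
y_real, var_real = energy_signal_and_variance([500.0], [100.0], 1.0, 1.0, 500.0, 0.5)

print(y_ideal ** 2 / var_ideal)  # squared SNR equals the photon number
print(y_real ** 2 / var_real)    # squared SNR reduced by 1 / (1 + CVs**2)
```

The second case shows why a detector with dispersed single photon signals behaves as if it had detected fewer photons than it actually did.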

We define soλon(λ)/λ as the light energy-based emission spectrum ey(λ):

ey(λ) ≡ soλo n(λ)/λ (B3)

and calculate the light signal originating from dye j (at concentration xj), which is excited at total excitation strength inline image during an exposure µ, as:

image((B4))
image((B5))

Here, we neglected bleaching and used some of the symbols introduced in Appendix A.

Considering the light originating from dye j in the spectral window of detector channel i (λi,0 < λ < λi,1) and introducing a filtering function ηi(λ) we arrive at Eq. (A2):

image((B6))

which is xjaij (however, without bleaching).

In order to express inline image in terms of aij we consider the mean weighted inverse wavelength on the spectral window i

image((B7))

and arrive at

image((B8))

as the contribution of dye j to the variance in channel i.

The total variance inline image of the signal in detection channel i is the sum of contributions from the fluorophores and from background light. All of these terms are of a nature similar to that of the fluorophore j above, such that

image((B9))

Here, for simplicity, we replaced inline image by 1/inline image and introduced inline image, the background variance of the signal in detector channel i.

The ‘figure of merit’ of dye j (FoMj) is the ratio of napp,j, the apparent number of detected photons from a given fluorophore j, to nmax,j, the maximal detectable number of photons (see Eqs 3–6).

To calculate FoMj, we take napp,j as the ratio of the square of the concentration of dye j and hjj, the diagonal element of matrix H (see Eq. 11). We calculate nmax,j as another ratio:

image((B10))

where yj,max is the maximum fluorescence signal from dye j (no emission filter; η(λ) ≡ 1) and inline image is the variance of the photon shot noise (no background noise). Introducing these conditions into Eqs (B4) and (B5) we obtain

image((B11))

The integration, which has to extend over the whole emission spectrum of dye j, is readily performed. However, in practice we replaced it by

image((B12))

which, for typical dye spectra, is almost indistinguishable from the former. For both Eqs (B6) and (B12) inline image has to be multiplied by inline image if bleaching is being considered.
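The calculation of the FoM can be illustrated for the simplest case of two dyes and photon-counting detection without background. The Python sketch below assumes pure Poisson noise (the variance in channel i equals its mean count), assumes that the channels together collect the whole emission, and takes H as the inverse of CᵀC with C equal to A scaled row-wise by 1/σi. The function name and the matrix entries are hypothetical and do not correspond to the dye spectra used in this paper.

```python
def figure_of_merit(A, x):
    """FoM of each of two dyes for photon-counting channels without background.

    A: rows = detection channels, columns = dyes (counts per unit concentration).
    x: concentrations of the two dyes.
    """
    y = [sum(a * xj for a, xj in zip(row, x)) for row in A]  # mean counts
    # Elements of the symmetric 2x2 matrix C^T C = sum_i a_ij * a_ik / y_i
    m00 = sum(row[0] * row[0] / yi for row, yi in zip(A, y))
    m01 = sum(row[0] * row[1] / yi for row, yi in zip(A, y))
    m11 = sum(row[1] * row[1] / yi for row, yi in zip(A, y))
    det = m00 * m11 - m01 * m01
    h = [m11 / det, m00 / det]  # diagonal elements of H = (C^T C)^-1
    n_app = [xj * xj / hjj for xj, hjj in zip(x, h)]  # apparent photon numbers
    n_max = [xj * sum(row[j] for row in A) for j, xj in enumerate(x)]
    return [na / nm for na, nm in zip(n_app, n_max)]

# No spectral overlap: every detected photon counts fully.
fom_separate = figure_of_merit([[100.0, 0.0], [0.0, 100.0]], [1.0, 1.0])
# Strong overlap: part of the information is lost in the unmixing.
fom_overlap = figure_of_merit([[80.0, 30.0], [20.0, 70.0]], [1.0, 1.0])

print(fom_separate)
print(fom_overlap)
```

The reciprocal of the FoM gives the factor by which the number of detected photons would have to be increased to reach the resolution obtainable for a single dye measured with shot noise only.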

Appendix C

The FoM as an ‘apparent quantum efficiency’

The FoM, being defined as a ratio of an apparent number of detected photons (as judged from the measured SNR) and the number of detectable photons, is closely related to the quantum efficiency of detection. In the main body of this paper we considered as detectable photons the number NF of photons that can be counted with a given detector, disregarding the fact that a real photodetector detects only a fraction of the available photons. With an ideal detector, the number of detectable photons would be larger by a factor 1/qe, where qe is the quantum efficiency. Thus the FoM of Eqs (3), (5) and (6) has to be multiplied by qe if the influence of detector efficiency is to be considered. Then the FoM can also be viewed as an ‘apparent quantum efficiency’ in the sense that it is a ratio of apparently detected photons (as judged from the SNR) over the actual number of photons arriving at the detector.

In fact, Tan et al. (1999) introduced such a measure, which they termed inline image:

image(9)

They pointed out that for a signal with no excess noise inline image is equal to qe and that for large signals (NF ≫ NR) the best detector is always the one with the highest quantum efficiency, whereas for small signals a detector with large read-out noise (such as a CCD) may have low inline image, even for qe close to 1 (see Tan et al., 1999, who compared several different detectors in an imaging set-up).
