The remarkable success of cytometry over the past 30 years is largely due to its uncanny ability to display populations that vastly differ in numbers and fluorescence intensities on one scale. The log transform, implemented in hardware as a log amplifier or in software, normalizes signals or channels so that these populations appear as clearly discernible peaks. With the advent of multiple fluorescence cytometry, spectral crossover compensation of these signals has been necessary to properly interpret the data. Unfortunately, because compensation is a subtractive process, it can produce negative and zero valued data. The log transform is undefined for these values and, as a result, forces computer algorithms to truncate them, creating several problems for cytometrists. Data truncation biases displays, making properly compensated data appear undercompensated and thus enticing many operators to overcompensate their data. Also, events truncated into the first histogram channel are not normally visible with typical two-dimensional graphic displays, thus hiding a large number of events and obscuring the true proportionality of negative distributions. In addition, the log transform creates unequal binning that can dramatically distort negative population distributions.
Methods and Results
The HyperLog transform is a log-like transform that admits negative, zero, and positive values. The transform is a hybrid type of transform specifically designed for compensated data. One of its parameters allows it to smoothly transition from a logarithmic to linear type of transform that is ideal for compensated data.
Since cytometry's infancy, scientists and engineers have known about the large dynamic range of many molecules on and within cells. If typical immunofluorescence data were presented on a linear scale, it would be difficult, if not impossible, to visually appreciate cell populations separated by large intensity differences for one or more measured parameters.
Solving this display problem required a display transform that not only compressed the absolute dynamic range of fluorescence signals from these molecules but also normalized their relative differences. The logarithmic (L) transform was almost the perfect mathematical solution to this display problem and cytometers were engineered so that some signals could be logarithmically amplified before digitization, storage, and display (1). The success of cytometry has been due in large part to this ability to visualize immunologically defined populations on a single log scale. The L transform and its inverse are typically defined as

L(x) = (r/d) · log10(x) for x ≥ 1, and L(x) = 0 otherwise;
E(y) = 10^(d · y/r),

where x, the relative linear channels, is in the set of all real numbers; L(x) is restricted to the interval [0, r), and E(y) to [1, 10^d); r is the analog-to-digital resolution; and d is the number of decades for the dynamic range of x.
The independent variable, x, of the L transform can admit numbers in the real domain, but only those numbers greater than or equal to some threshold, normally set to one, are evaluated by the log function. All other values of x cause the L transform to return a zero. The L transform can be implemented in hardware as a logarithmic amplifier or in software as shown above. Either way, the L transform must protect against taking the log of a value less than or equal to zero. The E transform is the inverse of L such that E(L[x]) = x for x ≥ 1. The restricted range of x in this equivalency creates a number of problematic issues for graphics and analysis of L-transformed data.
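The behavior described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the common form L(x) = (r/d)·log10(x) with a threshold of one; the function names and the default values r = 1000 and d = 3 (taken from the simulated examples) are illustrative choices, not part of any instrument software.

```python
import math

def log_transform(x, r=1000, d=3):
    """L transform: map a linear value x to a channel in [0, r).

    Values below the threshold of 1 are truncated to channel 0,
    which is the source of the display bias discussed in the text.
    """
    if x < 1.0:
        return 0.0
    return (r / d) * math.log10(x)

def exp_transform(y, r=1000, d=3):
    """E transform: inverse of L, valid only for original values x >= 1."""
    return 10.0 ** (d * y / r)

# Round trip: values below 1 do not survive, values >= 1 do.
for x in (-50.0, 0.0, 0.5, 1.0, 250.0):
    y = log_transform(x)
    print(x, y, exp_transform(y))
```

Every input below the threshold comes back from the round trip as 1, which is the information loss that the restricted range of x implies.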
Figures 1A and 1B illustrate the positive attributes of the L transform (M&M, see Example 1 for details). Figure 1A shows two hypothetical populations on a single-parameter linear scale. Events that constitute the rightmost population, H, are normally distributed with relatively arbitrary means and standard deviations (S.D.). The events that comprise the leftmost population, SL, are multiplicatively scaled to 1% of the H population, thus maintaining a constant coefficient of variation (C.V.). The large decrease in intensity and variance compresses the SL population into a few channels, making it difficult to appreciate as a separate population.
If the same data are L transformed (Fig. 1B), the SL population is visually distinct with the same apparent S.D. as H. The positive characteristics of the L transform are that it minimizes the mean and variance differences for populations with similar C.V.s.
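A first-order calculation shows why populations with equal C.V. have the same apparent S.D. on a log scale. The sketch below assumes the form L(x) = (r/d)·log10(x) and a population that is narrow relative to its mean (a delta-method approximation); the function name is hypothetical, and the population values are those of Example 1 with the SL S.D. taken as 1% of 50.

```python
import math

def apparent_log_sd(mean, sd, r=1000, d=3):
    """Approximate S.D., in channels, of a narrow population after the L transform.

    dL/dx = (r/d) / (x * ln 10), so sd_L ~= (r/d) * (sd / mean) / ln 10.
    Only the C.V. (sd / mean) enters the result.
    """
    return (r / d) * (sd / mean) / math.log(10)

sd_h = apparent_log_sd(600.0, 50.0)   # population H
sd_sl = apparent_log_sd(6.0, 0.5)     # population SL, same C.V. as H
print(round(sd_h, 2), round(sd_sl, 2))
```

Because only the ratio sd/mean appears, both populations occupy the same number of log channels even though their linear S.D. values differ 100-fold.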
Unfortunately, the L transform also has some well-known disadvantages. Zero and negative valued data are undefined and must be truncated to zero. In addition, there are pronounced binning effects that dramatically affect the distribution of populations near the origin.
To better illustrate these problems, Figure 1C shows a parent population, H. The H population events are translocated to 1% of their original intensity, forming a TL population that has negative and positive values (M&M, see Example 2 for details). When these data are L transformed (Fig. 1D), TL is distributed over much of the axis. A peak is observed in the middle portion of the axis that continually decreases in frequency until the origin, where about half the TL population is accumulated into the first histogram channel. For data that are translocated to or near the origin, the L transform is not suitable because low-intensity population S.D. values or variances are not preserved and their distributions are highly distorted.
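The "about half" figure can be checked directly from the normal distribution. The short sketch below assumes the Example 2 values given in Materials and Methods (TL mean 6, S.D. 50) and a log threshold of 1; it uses the standard-library `statistics.NormalDist` rather than simulated events.

```python
from statistics import NormalDist

# Example 2 population TL: mean 6, S.D. 50 (values from Materials and Methods).
tl = NormalDist(mu=6.0, sigma=50.0)

# Any event below the log threshold of 1 is truncated to channel 0.
fraction_truncated = tl.cdf(1.0)
print(f"{fraction_truncated:.1%} of TL events fall into the first channel")
```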
The ability to analyze and graph data defined over the real domain of numbers with a log-like transform has wide ranging applications in many scientific disciplines. A solution to the L problem was first published by Johnson in 1949 (2) who proposed using a generalized inverse hyperbolic sine (IHS) function, Su, to define a log-like transform that spanned the real number domain from negative to positive infinity.
In 1981 Bickel and Doksum (3) proposed a modification to the well-known Box-Cox (BC) transform (4) that was also log-like and would admit negative numbers. In 1989, Burbidge et al. (5) compared the IHS and BC transforms and came to the conclusion that the IHS had more merits. The IHS transform has since been selected as a surrogate L transform for signed data in numerous publications from various scientific disciplines ranging from agriculture to astrophysics (6, 7).
Many traditional statistical methodologies (e.g., regression, analysis of variance) require that data be normally distributed and have constant variance. As seen in Figure 1B, the L transform does have a “stabilizing” influence on S.D. or variance with multiplicatively scaled data and thus has been extensively used to transform this type of data into a form amenable to statistical analyses with the constant variance requirement. In some cases, however, data will have negative, zero, and positive values that complicate the use of the L transform. In these cases, other transformations such as the generalized logarithm (GLOG) (8), Started Logarithm (SL) (9, 10), and the log-linear hybrid (LL) (11) have been shown to be log-like by preserving the variance with multiplicatively scaled data that also can accept extreme positive and negative values. In 2003 the performance of these three different transforms was evaluated by Rocke and Durbin (12) who found that the GLOG transform was the best choice for gene-expression microarray data.
In the field of cytometry, this issue is important because compensation is a process that involves subtraction or translocation (13), resulting in negative, zero, and positive value data. Compensated data have multiplicative and translocated characteristics. The multiplicative attribute is fundamental to measurement processes such as found in cytometry. DNA histograms are an excellent example; the S.D. of the G2M population is very close to twice that of G0G1. The C.V. of a population is the ratio of its variability (S.D.) to its mean intensity; therefore, for the most part, C.V. is preserved in measured data. As shown in Figure 1B, this type of data is particularly amenable to the L transform because constant C.V. translates to constant S.D. values or variances, making the populations clearly distinguishable.
The other characteristic of compensated data is translocation. In the compensation process, a population is moved or translocated by subtraction along one or more of its parameters to or near the origin of an axis without significantly changing its variance. As shown in Figure 1D, the L transform is particularly unsuited for this type of data. An optimal transform for compensated data must be able to handle both characteristics.
In 2002 Parks and Moore (14) proposed using a modified IHS transform as a replacement for the L transform for compensated data, called the Logicle/Biexponential (LB) transform. Because it is not necessary to protect the family of IHS transforms from negative or zero data, there is no inherent bias in the transformed data. Thus, properly compensated data do not appear undercompensated with the LB transform, which eliminates a major source of error in cytometric data analysis. Parks and Moore generalized the IHS transform to better meet the needs of displaying compensated data. Parameters were added to the hyperbolic sine exponentials to better control the size of a linear window through the origin, which helped eliminate low intensity binning artifacts and controlled the variance of negative populations.
In 2003 the HyperLog (HL) was released in software (15) and later presented (16) as a log-like transform optimized for compensated data. The HL transform was engineered as a hybrid function. One component of the function is ideally suited to the multiplicative nature of compensated data, and the other to its translocated nature. The transform smoothly transitions from one type of transform to the other, depending on the analysis or graphics needs. The HL transform is similar but not identical to LL and fundamentally different from the IHS types of transforms. The HL transform is an inverse hybrid linear/exponential function that is defined over the real-numbered domain. The HL transform is flexible enough to represent unbiased compensated data with visually pleasing axes (Figs. 6A,B, 7).
The purpose of this paper is to mathematically describe the HL transform, investigate its properties, especially in eliminating binning artifacts, and demonstrate its usefulness in representing axes for cytometric data. Although the HL transform was designed specifically for the unique characteristics of compensated data, it is flexible enough to be used in a more general manner.
MATERIALS AND METHODS
Simulations were performed with MathCad 2001 Professional (MathSoft Engineering & Education, Inc., Cambridge, MA).
The following examples were designed only to demonstrate the problems associated with the L transform for translocated data. These examples do not incorporate data measurement errors due to photon counting or digitization.
Example 1: Multiplicatively scaled data (Fig. 1A, B).
Populations are normally distributed using the Box-Muller equation (17) from 10,000 synthesized events. The mean and S.D. of the unscaled population, H, are 600 and 50, respectively. The linear scale and L-transformed scales range between 0 and 1,000 (r = 1000, d = 3). As described for the L transform, r is the analog-to-digital resolution and d is the number of decades of dynamic range. The low-intensity population (SL) is formed by assuming constant C.V. and scaling the mean and S.D. to 1% of their original values, 6 and 0.5, respectively.
Example 2: Translocated data (Fig. 1C, D).
The same method as described for Example 1 was used to create the high-intensity population, H. The low-intensity population, TL, is formed by translocating H to 1% of its original value, 6, while preserving the original S.D. at 50.
Example 3: Data translocated to the origin (Fig. 5).
The same method was used as described for Example 2, except that the TL population was translocated to the origin.
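The three example populations can be synthesized along the following lines. This is a sketch in Python rather than the MathCad worksheets used for the paper; the seed, the function names, and the variable names are arbitrary, and the shifts of 594 and 600 are simply the subtractions needed to move a mean of 600 to 6 and to 0.

```python
import math
import random

def box_muller(rng):
    """Return one standard normal deviate from two uniform deviates."""
    u1 = 1.0 - rng.random()          # shift to (0, 1] so log(u1) is defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

def synthesize(mean, sd, n, rng):
    """Draw n normally distributed events with the given mean and S.D."""
    return [mean + sd * box_muller(rng) for _ in range(n)]

rng = random.Random(1)               # fixed seed, chosen arbitrarily
n = 10_000
h = synthesize(600.0, 50.0, n, rng)  # parent population H
sl = [0.01 * x for x in h]           # Example 1: scaled to 1%, C.V. preserved
tl = [x - 594.0 for x in h]          # Example 2: mean moved to 6, S.D. preserved
tl0 = [x - 600.0 for x in h]         # Example 3: mean moved to the origin
```

Scaling changes the mean and S.D. together, whereas translocation changes only the mean, which is the distinction the examples are designed to expose.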
The inverse of the basic HL transform is given by
where a and b are constants and y is defined over the real-numbered domain.
A more practical base-10 form of this transform is
The constant d, decades, has a similar definition as described for the L transform; at y = r, EH(y,b) is at its maximum.
The EH transform is continuous and symmetric about the origin as shown in Figure 2 and its zoomed inset. Three different b coefficients, 0, 35, and 100, are shown to demonstrate how b affects the transform through the origin (Fig. 2, inset). The E transform is also shown to illustrate how EH(y,b) approaches E(y) for y >> 0. Note that the number of decades for these transforms is 3 to correspond to the example data shown in Figure 1. With four or more decades, the relative effect of the linear term in the EH transform in the high intensity region will be much smaller than that shown in Figures 2 and 3.
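The properties listed above can be explored numerically. The sketch below assumes one specific base-10 form, EH(y,b) = 10^(d·y/r) + b·(d/r)·y − 1 for y ≥ 0, extended to negative y by odd symmetry. This form is an assumption inferred from the description (exponential plus linear term, zero at the origin, symmetric, approaching E for large y) and should be checked against the published equation before use. The function names are illustrative.

```python
def eh(y, b, r=1000, d=3):
    """Assumed base-10 inverse HyperLog: exponential plus linear term, odd in y."""
    k = d / r
    if y < 0:
        return -eh(-y, b, r, d)      # symmetry about the origin
    return 10.0 ** (k * y) + b * k * y - 1.0

def e_log(y, r=1000, d=3):
    """Inverse of the ordinary log transform, for comparison."""
    return 10.0 ** (d * y / r)

# Columns: b, EH(-100), EH(0), EH(100), and the ratio EH/E at the top of scale.
for b in (0, 35, 100):
    print(b, eh(-100, b), eh(0, b), eh(100, b), eh(1000, b) / e_log(1000))
```

With three decades the linear term still contributes visibly at the top of the scale for b = 100, which agrees with the remark that its relative effect shrinks when four or more decades are used.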
The HL transform is the inverse of EH, which is found by using a suitable root finding algorithm (18) and restricting the roots to nonimaginary values.
where root(…) is a standard root finding algorithm (18) that finds y such that EH(y) = x.
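A minimal sketch of the root-finding step, using plain bisection in place of the unspecified routine cited as (18). The form of `eh` is an assumption (exponential plus linear term with odd symmetry), and the names `eh`, `hl`, and `tol` are illustrative. Bisection is chosen only because the assumed function is strictly increasing, so the real root is unique.

```python
def eh(y, b, r=1000, d=3):
    """Assumed inverse HyperLog; odd and strictly increasing for b >= 0."""
    k = d / r
    if y < 0:
        return -eh(-y, b, r, d)
    return 10.0 ** (k * y) + b * k * y - 1.0

def hl(x, b, r=1000, d=3, tol=1e-9):
    """HyperLog of x: the real root y of eh(y) = x, found by bisection."""
    if x < 0:
        return -hl(-x, b, r, d, tol)  # use odd symmetry, solve for |x| only
    lo, hi = 0.0, float(r)
    while eh(hi, b, r, d) < x:        # widen the bracket for very large x
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eh(mid, b, r, d) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for x in (-40.0, 0.0, 40.0, 1000.0):
    print(x, hl(x, 0), hl(x, 35), hl(x, 100))
```

A faster method such as Newton iteration would also work here; bisection simply avoids any dependence on a starting guess.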
Figure 3 shows the HL transform with b coefficients 0, 35, and 100 and the corresponding L transform. For extreme values of x, the HL and L transforms have very similar characteristics.
Binning Effects and Population Splitting
Transformations of axes can radically change the shape of histograms due to unequal bin sizes as was shown for the L transform in Figure 1D. The reason unequal bin sizes cause distortion is depicted in Figure 4. Figure 4A shows a histogram with four equal-sized bins containing the same number of events in each bin. When a transform is applied to the x bin boundaries, the sizes for each histogram bin can change as shown in Figure 4B. Because the frequency in each bin must be preserved, the height of the histogram bins must change inversely as their widths change. In the case of the L transform, this binning effect creates a peak in the middle of the axis with a decreasing continuum approaching the origin as shown in Figure 1D.
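The inverse relation between bin width and bar height can be made concrete with a small sketch. The bin edges and counts below are hypothetical values chosen to mimic the four equal bins described for Figure 4A, and `transformed_histogram` is an illustrative helper name.

```python
import math

def transformed_histogram(edges, counts, transform):
    """Apply `transform` to the bin boundaries and rescale bar heights.

    The number of events in each bin is preserved, so each bar's height
    must change inversely with its new width.
    """
    new_edges = [transform(e) for e in edges]
    heights = [c / (hi - lo) for c, lo, hi in zip(counts, new_edges, new_edges[1:])]
    return new_edges, heights

# Four equal-width linear bins holding 100 events each (hypothetical values).
edges = [1.0, 10.0, 19.0, 28.0, 37.0]
counts = [100, 100, 100, 100]
new_edges, heights = transformed_histogram(edges, counts, math.log10)
print([round(e, 3) for e in new_edges])
print([round(h, 1) for h in heights])
```

A histogram that is flat on the linear axis becomes a rising staircase on the log axis, because the bins near the origin are stretched and those farther out are compressed.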
The HL transform with b = 0 also has binning effects near the origin, often splitting negative populations into two peaks (Fig. 5, b = 0 distribution). This artificial population splitting is a graphic distortion that should be minimized to unambiguously appreciate separate clusters.
By increasing the b coefficient of the HL transform, this peak splitting can be eliminated as shown in Figure 5 for b = 35, 100, 277, and 333. A simple approach to finding an appropriate b coefficient would be to manually set it to a value large enough to eliminate peak splitting for a number of datasets. For four-decade cytometric data, a single b coefficient of 100 can be used on a wide variety of datasets with different amounts of applied compensation (Fig. 6A,B).
The critical b coefficient, bc, that just eliminates peak splitting for an arbitrary population at the origin is calculated as follows.
Let F(i) return the number of events for the untransformed population's channel i. By using the chain rule for inverse functions and taking the derivative of the i ≥ 0 part of EH(i,b), the binning effect on F(i) is given by
for i ≥ 0.
If we assume that the S.D. of the negative population is sd, we can write a characteristic function, C, as
By using a suitable minimization method on C varying b (19), we can find bc. A negative distribution will be very flat between zero and sd when HL transformed with b = bc. If b < bc, the data will appear split. If b >> bc it will appear as a single population. For our example data, bc is calculated as 35.332. See Figure 5, b = 35, to appreciate the flat distribution that is associated with bc for Example 3 data.
A reasonable approach to finding an “appropriate” b coefficient might be to maximize the Fisher distance function for negative (mean2 = 0, sd2) and positive (mean1, sd1) populations in HL-transformed space. The Fisher distance function is given by
The distance is maximal when the means between the two populations are far apart and their respective variances are relatively small. The HL transform characteristic function that when maximized (19) yields the “optimal” Fisher distance between negative and positive distributions is given by
For H and TL populations in Example 3, the b coefficient that maximizes the Fisher distance is 276.67 (Fig. 5, b = 277).
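The dependence of the separation on b can be explored with a coarse scan. The sketch below rests on several assumptions that are not confirmed by the text: the form of `eh`, a Fisher distance of the form |mean1 − mean2| / sqrt(sd1² + sd2²), and evenly spaced normal quantiles as a deterministic stand-in for simulated events. All function names are illustrative, and the scan is not the minimization routine cited as (19).

```python
import math
from statistics import NormalDist, mean, pstdev

def eh(y, b, r=1000, d=3):
    """Assumed inverse HyperLog; odd and strictly increasing for b >= 0."""
    k = d / r
    if y < 0:
        return -eh(-y, b, r, d)
    return 10.0 ** (k * y) + b * k * y - 1.0

def hl(x, b, r=1000, d=3, tol=1e-9):
    """HyperLog of x by bisection on the assumed eh."""
    if x < 0:
        return -hl(-x, b, r, d, tol)
    lo, hi = 0.0, float(r)
    while eh(hi, b, r, d) < x:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eh(mid, b, r, d) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fisher_distance(pos, neg):
    """Assumed form: |mean1 - mean2| / sqrt(sd1^2 + sd2^2)."""
    return abs(mean(pos) - mean(neg)) / math.sqrt(pstdev(pos) ** 2 + pstdev(neg) ** 2)

def quantile_sample(mu, sigma, n=400):
    """n evenly spaced normal quantiles, used instead of random events."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

h = quantile_sample(600.0, 50.0)     # positive population H
tl0 = quantile_sample(0.0, 50.0)     # population translocated to the origin

def score(b):
    """Fisher distance between H and TL in HL-transformed space."""
    return fisher_distance([hl(x, b) for x in h], [hl(x, b) for x in tl0])

for b in (35, 100, 277, 333, 1000):
    print(b, round(score(b), 2))
```

Under these assumptions the distance is small for low b, where the negative population is spread widely, and settles toward its linear-scale value for very high b, so a maximum at an intermediate b is expected.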
The L transform normalizes extreme positive valued data and stabilizes the variance for multiplicatively scaled data. An interesting question is whether the HL transform with a suitable b coefficient, bt, can stabilize the variance of data translocated to the origin (Fig. 1C,D). When the b coefficient is very high, the HL transform will be close to linear and would conserve the variance in a way that is similar to that in Figure 1C. Unfortunately, the transform would then have very few log-like characteristics, which would be undesirable.
However, one can find a b coefficient, bt, where the linear S.D. or variance of a positive population is approximately the same as the S.D. of the translocated HL-transformed population. For this equality to hold, then
Solving for bt,
If the term d · sd/r ≪ log(sd), which it normally is for typical values of d, r, and sd, then the above equation simplifies to
Because bt is relatively independent of sd and the positive population mean, the above equation is quite general for most cases. For the Example 3 data, bt is calculated as 333.33 (Fig. 5, b = 333). Over the range sd = 1 … 100, the sd of the population translocated to the origin varied by less than 1% from its original sd with bt = r/d for the Example 3 data.
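The claim that bt = r/d preserves the S.D. to within about 1% can be checked numerically. The sketch below assumes the exponential-plus-linear form of `eh` and uses a simple proxy for S.D. preservation: a point one S.D. from the origin should map to nearly the same channel number. Both the form and the proxy are assumptions made for illustration.

```python
def eh(y, b, r=1000, d=3):
    """Assumed inverse HyperLog; odd and strictly increasing for b >= 0."""
    k = d / r
    if y < 0:
        return -eh(-y, b, r, d)
    return 10.0 ** (k * y) + b * k * y - 1.0

def hl(x, b, r=1000, d=3, tol=1e-9):
    """HyperLog of x by bisection on the assumed eh."""
    if x < 0:
        return -hl(-x, b, r, d, tol)
    lo, hi = 0.0, float(r)
    while eh(hi, b, r, d) < x:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eh(mid, b, r, d) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r, d = 1000, 3
bt = r / d                            # simplified estimate from the text

# Largest relative shift of the one-S.D. point over sd = 1 ... 100.
worst = max(abs(hl(float(sd), bt, r, d) - sd) / sd for sd in range(1, 101))
print(f"bt = {bt:.2f}, worst relative deviation = {worst:.4f}")
```

With b = r/d the linear term of the assumed function reduces to y itself, so the transform is close to the identity near the origin and the exponential term only adds a small correction.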
The ability of the HL transform to accept negative numbers usually requires that a lowest negative HL transformed value be determined. This boundary is set manually or calculated as the value that excludes some percentage (e.g., 5%) of negative events.
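One way to compute such a boundary is sketched below. The function name and the default of 5% are illustrative. Because a monotonic transform preserves rank order, the same event is selected whether the values are taken before or after the transform.

```python
def lower_display_bound(values, exclude_fraction=0.05):
    """Return the value below which `exclude_fraction` of the negative events lie.

    If there are no negative events the bound defaults to 0.
    """
    negatives = sorted(v for v in values if v < 0)
    if not negatives:
        return 0.0
    index = min(len(negatives) - 1, int(exclude_fraction * len(negatives)))
    return negatives[index]

# Hypothetical data: 100 negative and 100 non-negative events.
bound = lower_display_bound(range(-100, 100))
print(bound)
```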
Multiparameter data generated from four different fluorescent proteins are traditionally difficult to compensate because of their relatively weak fluorescence and significant signal crossover. Therefore, this type of data is a good test case for comparing HL and L transforms (Fig. 6). As shown in Figure 6A–C, HL axes can represent compensated data in an unbiased display format. Compare the same data with traditional log axes (Fig. 6D–F). The encircled population in Figure 6D clearly shows compensation bias for the L transform, making the data appear to be undercompensated. The corresponding population in Figure 6A does not show this bias. Figure 6F shows an isometric display of data in Figure 6D to better demonstrate the truncation and binning effects of the L transform, resulting in a high number of events directly on the axes. The HL-transformed data shown in Figure 6C do not show these undesirable effects.
HL axes also have a well-balanced appearance (Figs. 6A,B, 7). Part of the reason is that the linear axis distance is approximately the same as the adjacent log axis distance (Fig. 7, linear and log distance arrows). Stating this relation mathematically,
where R ∼ 1 for b = 0 … 100.
The rightmost inset in Figure 7 shows that R is approximately 1 for a wide range of b coefficients, 0 … 100, giving the axis its well-balanced and aesthetically pleasing appearance.
The HL transform is a log-like transform for display or analysis systems that need to span the real-numbered domain. The inverse of the HL transform, EH, is an exponential/linear hybrid function designed specifically for compensated data. Each value that comprises the HL transform is calculated by a suitable root-finding routine. If the transform is to be used in software, these transformed values are normally entered into a lookup table with linear interpolation. This requirement is not really a problem: transforms such as L and exponential are normally implemented with lookup tables anyway, because transcendental functions are computationally expensive.
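A lookup table with linear interpolation might be built as follows. The sketch assumes the exponential-plus-linear form of `eh`, a uniform grid in x with a spacing of one linear unit, and clamping at the table ends; these are illustrative choices, and a production table would likely use a grid better matched to the dynamic range.

```python
from bisect import bisect_right

def eh(y, b, r=1000, d=3):
    """Assumed inverse HyperLog; odd and strictly increasing for b >= 0."""
    k = d / r
    if y < 0:
        return -eh(-y, b, r, d)
    return 10.0 ** (k * y) + b * k * y - 1.0

def hl(x, b, r=1000, d=3, tol=1e-9):
    """HyperLog of x by bisection on the assumed eh (slow, used to fill the table)."""
    if x < 0:
        return -hl(-x, b, r, d, tol)
    lo, hi = 0.0, float(r)
    while eh(hi, b, r, d) < x:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eh(mid, b, r, d) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def build_table(b, x_min, x_max, n, r=1000, d=3):
    """Tabulate hl on n evenly spaced x values between x_min and x_max."""
    step = (x_max - x_min) / (n - 1)
    xs = [x_min + i * step for i in range(n)]
    return xs, [hl(x, b, r, d) for x in xs]

def lookup(x, xs, ys):
    """Linearly interpolate; inputs outside the table are clamped to its ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

xs, ys = build_table(b=100, x_min=-200.0, x_max=1000.0, n=1201)
print(lookup(123.4, xs, ys), hl(123.4, 100))
```

The root finder runs once per table entry, after which each event costs only a binary search and one interpolation.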
L and log-like transforms can create binning artifacts that can be confusing if not minimized or eliminated. For example, the HL transform with b = 0 can cause a population located at the origin to appear as two separate populations defined in the negative and positive domains (Fig. 5, b = 0). Negative population splitting is easily eliminated by increasing the b coefficient to some positive value (Fig. 5, b > bc).
The b coefficient smoothly transitions the HL transform between two different types of transforms. With a low b coefficient, the transform is more log-like and thus stabilizes population variances that are multiplicatively scaled (e.g., constant C.V.). As the b coefficient approaches bt, the transform becomes better suited to stabilizing the variance for translocation or subtractive scaling. The “appropriate” b coefficient is ultimately determined by which of these two types of scaling is most important for the analysis or graphics at hand.
For graphic representation of compensated data, the lower limit of this appropriate b coefficient is bc because negative population splitting is not a desired characteristic. The upper limit should be close to bt because optimizing the transform for translocated data represents an extreme for compensated data and results in a significant decrease in the separation between negative and positive populations. One possible approach to finding an optimal b coefficient that hopefully would fall between the extremes of bc and bt would be to maximize or minimize some characteristic function that has some desired functionality.
One obvious characteristic function would be the Fisher distance between a negative and a positive population. It makes intuitive sense that a good transform would attempt to separate these two populations as well as possible. At first this approach appears to be an ideal solution. Finding this optimal HL transform is relatively easy for the simple example described in this report, but it becomes complicated with real data. With real data there may be many positive populations with ill-defined boundaries and it may not be clear which one to use for the optimization. It also may be that it is more important to the investigator to separate two positive populations than a positive and a negative population.
Another possible approach would be to find the HL transform that minimizes the difference between all the observed population variances. This approach also has problems for real data. It requires some kind of method for identifying these populations, which may be very difficult to do automatically. For some, this approach might create an aesthetically pleasing graph; for others, it may distort the log nature of the transform and create strange looking axes. Another approach might be to find the lowest b coefficient that makes the negative population similar to a Gaussian (16). This approach would appeal to those who wish to minimally perturb the log-like nature of the transform.
Transforms defined over the entire real domain such as HL neither create nor destroy information; they only rearrange information. When used as a graphics transform, a family of HL transforms formed by different b coefficients is essentially equivalent from the point of view of information content, and the decision for finding the “best” b coefficient is somewhat arbitrary and largely influenced by a user's or designer's sense of aesthetics.
Complicating this issue is that any method that optimizes the HL transform or other similar log-like transforms based on data will ultimately cause the axis to change as the data changes. This variability can create problems for investigators who wish to compare data. Comparing data is so common in the scientific process that the “changing axis” problem in many ways outweighs any benefits in optimizing the transform.
Another aspect to this decision is the complexity of the transform axis. If the axis appears strange in any way, it tends to distract the viewer from appreciating the information contained in the data, a well-known problem in graphic design. If the b coefficient is a power of 10, the HL-transformed axis smoothly transitions from a complete linear scale to a log scale yielding an axis that is well-balanced and simple to interpret (Figs. 6A,B, 7).
When HL is used in graphic displays, the “best” b coefficient is normally the simplest. For four-decade log data, a value of 100 works for just about all cases. The axes are consistent from one dataset to another and thus have the important advantage of being comparable. The axes are interpretable by any scientist who is familiar with linear and logarithmic axes and do not appear needlessly complicated. Thus, even though the HL transform is amenable to optimization techniques, software engineers should be discouraged from forcing the user to use a data-dependent optimized result.
Other types of transforms (e.g., GLOG and LB) can also be used to display compensated data in an unbiased manner. The advantages and disadvantages of these transforms are relatively minor, especially for graphic displays. The ultimate success or failure of these approaches will largely be determined by the aesthetics of the axes they produce.
In summary, the HL transform is a log-like transform that was originally designed specifically for the display of compensated data. Its ability to smoothly transition between exponential and linear scales gives it desirable features that may have general application beyond the visualization of unbiased compensated data.
Mark Munson, Don Herbert, Chris Bray, and Ben Hunsberger played important roles in developing this report.