## INTRODUCTION

Classes of bioparticles are often defined by the type and quantity of biomarkers present in each analyzed particle. Flow cytometry (FC) typically quantifies the presence of these biomarkers by tagging them with fluorescent molecules. However, the raw FC measurements do not directly yield the biomarker quantity or label concentration; instead, they provide values that are proportional to the number of photons measured by the individual photodetectors.

The optical pathway of FC instruments is arranged in an attempt to separate signals from different fluorochromes by routing them into dedicated detectors; however, owing to spectral overlap and imperfect filters, a complete separation is almost never possible. Therefore, the fluorescence emitted by every fluorochrome may be simultaneously collected by more than one detector (in extreme cases, all the detectors). This process can be mathematically represented as a linear mixing of signals and is a subject of study in various fields of science ranging from chemometrics to imaging and remote sensing (1–6).

Let **r** denote the vector of observations of length *L* (the number of detectors employed in the FC system), **M** an *L* × *p* spectral-signature matrix (*p* being the number of labels used in an experiment), **α** the vector of abundances of length *p*, in which α_{i} represents the abundance (amount) of the *i*th label in the measured object, and **e** a noise vector of length *L*. With this notation, the phenomenon of “spectral spillover” that leads to signal mixing may be represented by the basic linear spectral mixture equation:

$$\mathbf{r} = \mathbf{M}\boldsymbol{\alpha} + \mathbf{e} \qquad (1)$$

The linear-mixture model assumes that the multiple signals measured from every particle can be expressed as a linear combination of spectral signatures weighted by the corresponding abundances α_{1}, α_{2}, …, α_{p}. The cytometry literature usually refers to these values, albeit incorrectly, as “compensated fluorescence.”
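As an illustration of the linear-mixture model, the following minimal sketch generates one observation vector from a made-up signature matrix. The detector count, the random signatures, the abundance values, and the Gaussian placeholder noise are all assumptions chosen for the example, not values from the reported experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

L, p = 8, 3  # hypothetical: 8 detectors, 3 labels

# Hypothetical spectral-signature matrix M (L x p): each column holds the
# response of all detectors to one label, scaled to unit sum.
M = rng.random((L, p))
M /= M.sum(axis=0, keepdims=True)

alpha = np.array([500.0, 1200.0, 50.0])  # hypothetical abundances of the p labels
e = rng.normal(scale=1.0, size=L)        # placeholder additive noise (assumption)

# Linear mixing: every detector sees a weighted sum of all labels plus noise.
r = M @ alpha + e

print(r.round(1))
```

Each entry of `r` mixes contributions from all three labels, which is the spillover that unmixing has to undo.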

In traditional polychromatic FC, the number of detectors employed is equal to the number of labeled markers (*L* = *p*), so the spectral-signature matrix (also called the mixing matrix) is square. To find the abundances (or values linearly correlated with abundances), the unmixing operation can therefore be performed by multiplying the measured data vectors (the raw fluorescence observations) by the inverse of the mixing matrix:

$$\hat{\boldsymbol{\alpha}} = \mathbf{M}^{-1}\mathbf{r} \qquad (2)$$

where $\hat{\boldsymbol{\alpha}}$ is the unmixed approximation of **α**. Although the mixing matrices are a priori unknown, they can be approximated by measuring single-stained controls and normalizing the resulting spectra. This process of recovering abundances is known as FC compensation and is described extensively in the FC literature (7, 8).
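The compensation procedure for the square case can be sketched as follows. The helper names `signature_matrix` and `compensate` are hypothetical, the unit-sum normalization is one possible choice of normalization, and the control data are synthetic and noise-free, so this is an illustration under stated assumptions rather than the procedure of any particular instrument.

```python
import numpy as np

def signature_matrix(controls):
    """Estimate M from single-stained controls (hypothetical helper).

    controls: list of p arrays, each of shape (n_events, L), one per label.
    Returns an L x p matrix whose columns are mean spectra scaled to unit sum.
    """
    M = np.column_stack([c.mean(axis=0) for c in controls])
    return M / M.sum(axis=0, keepdims=True)

def compensate(M, r):
    """Solve M @ alpha = r for a square mixing matrix (L == p)."""
    return np.linalg.solve(M, r)

# Made-up 3-detector, 3-label signatures; columns sum to one.
M_true = np.array([[0.80, 0.15, 0.05],
                   [0.15, 0.70, 0.15],
                   [0.05, 0.15, 0.80]])

# Synthetic noise-free controls: each event is a scaled copy of one signature.
amplitudes = np.array([100.0, 200.0, 300.0])
controls = [np.outer(amplitudes, M_true[:, j]) for j in range(3)]

M_est = signature_matrix(controls)
alpha_true = np.array([500.0, 1200.0, 50.0])
alpha_hat = compensate(M_est, M_true @ alpha_true)

print(alpha_hat)
```

Using `np.linalg.solve` avoids forming the inverse explicitly while giving the same result as multiplying by the inverse of the mixing matrix.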

However, the number of detectors employed in an FC experiment does not have to be limited to the number of fluorochromes and may be significantly larger. This type of optical arrangement is characteristic of an emerging class of spectral FC systems, which attempt to measure an approximation of the full spectrum emitted by every analyzed bioparticle. The measurements produced by a spectral system may represent fluorescence, Raman, or surface-enhanced Raman scattering characteristics (9–12).

An attempt to recover abundances from spectral measurements leads to a mixing model with matrices that are not square (*L* > *p*), resulting in an overdetermined system of equations. At first glance this is a trivial problem, as the standard compensation approach can easily be extended by using the pseudoinverse of the nonsquare mixing matrix, a process known as ordinary least-squares (OLS) minimization.
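A minimal sketch of pseudoinverse-based OLS unmixing for the overdetermined case is given below. The Gaussian-shaped signatures, the channel count, and the abundances are made up for illustration, and the observation is noise-free so that the OLS estimate can be checked against the known abundances.

```python
import numpy as np

L, p = 8, 3                          # hypothetical detector and label counts
channels = np.arange(L)
centers = np.array([1.5, 3.5, 5.5])  # hypothetical emission peaks (channel units)

# Hypothetical L x p signature matrix: overlapping Gaussian-shaped spectra,
# each column scaled to unit sum.
M = np.exp(-0.5 * ((channels[:, None] - centers[None, :]) / 1.2) ** 2)
M /= M.sum(axis=0, keepdims=True)

alpha_true = np.array([500.0, 1200.0, 50.0])
r = M @ alpha_true                   # noise-free observation of length L

alpha_pinv = np.linalg.pinv(M) @ r                   # explicit pseudoinverse
alpha_lstsq, *_ = np.linalg.lstsq(M, r, rcond=None)  # same OLS solution

print(alpha_pinv)
print(alpha_lstsq)
```

Both calls minimize the unweighted sum of squared residuals, so every detector contributes equally regardless of its signal level.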

Although overdetermined unmixing is new to FC analysis, it is routinely used in imaging techniques ranging from microscopy to remote sensing (2, 3). These techniques usually rely on OLS to find the optimal vector of abundances. However, the OLS method is valid only if the noise in Eq. (1) is Gaussian and has equal variance irrespective of the signal level. It is therefore reasonable to ask whether this widely accepted approach is appropriate for spectral FC and other techniques based on fluorescence.
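The equal-variance requirement can be probed with a small simulation. The sketch below assumes, purely for illustration, that detector readouts behave like Poisson photon counts; this noise model and the chosen signal levels are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumption: readouts follow Poisson photon-counting statistics.
means = np.array([10.0, 100.0, 1000.0])  # hypothetical mean signal levels
samples = rng.poisson(lam=means, size=(100_000, means.size))

# Empirical variance at each signal level.
variances = samples.var(axis=0)
for m, v in zip(means, variances):
    print(f"mean signal {m:7.1f} -> empirical variance {v:9.2f}")
```

Under this assumption the variance tracks the mean instead of staying constant, so bright and dim channels do not satisfy the equal-variance condition that OLS relies on.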

In this report, we will demonstrate that, owing to the physics of signal formation in cytometry, the OLS solution is biased and does not provide a correct estimation of abundances for spectral FC systems. Therefore, it should not be employed for fluorescence-, Raman-, or surface-enhanced Raman scattering-based cytometry. We will also propose and discuss alternative approaches: an approximation based on minimization of percentage error using weighted least squares (WLS), a technique explicitly addressing the distribution of the fluorescence signal and employing a generalized linear model (GLM), and a simplified solution using a variance-stabilization transformation commonly employed in image denoising.
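Two of the alternatives named above can be sketched in a few lines. The function names `unmix_wls` and `anscombe` are hypothetical, the weights 1 / *r*² are one plausible reading of "minimization of percentage error," the `floor` guard is an added safeguard, and the Anscombe transform stands in as a common variance-stabilizing transformation for Poisson-like data. None of this should be taken as the exact formulation used in this report.

```python
import numpy as np

def unmix_wls(M, r, floor=1.0):
    """WLS unmixing with weights 1 / max(r, floor)**2 (relative-error weighting).

    The weight choice and the `floor` guard are assumptions of this sketch.
    """
    w_sqrt = 1.0 / np.maximum(r, floor)  # square roots of the weights
    # Scaling rows by sqrt(weights) turns WLS into an ordinary lstsq problem.
    alpha, *_ = np.linalg.lstsq(M * w_sqrt[:, None], r * w_sqrt, rcond=None)
    return alpha

def anscombe(x):
    """Anscombe transform: maps Poisson counts to roughly unit variance."""
    return 2.0 * np.sqrt(np.asarray(x, dtype=float) + 3.0 / 8.0)

# Hypothetical 8-detector, 3-label signatures (Gaussian-shaped, unit-sum columns).
channels = np.arange(8)
centers = np.array([1.5, 3.5, 5.5])
M = np.exp(-0.5 * ((channels[:, None] - centers[None, :]) / 1.2) ** 2)
M /= M.sum(axis=0, keepdims=True)

alpha_true = np.array([500.0, 1200.0, 50.0])
alpha_wls = unmix_wls(M, M @ alpha_true)  # noise-free check

# Variance of Poisson counts before and after the transform.
rng = np.random.default_rng(2)
counts = rng.poisson(lam=[50.0, 5000.0], size=(100_000, 2))
raw_var = counts.var(axis=0)
stabilized_var = anscombe(counts).var(axis=0)

print(alpha_wls)
print(raw_var, stabilized_var)
```

The raw variances differ by roughly two orders of magnitude between the two signal levels, whereas the transformed values have variance close to one at both levels.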

The reported data represent simulations and real multispectral measurements obtained using a 32-channel experimental system designed at Purdue University (12). The goal of the simulations is to demonstrate the known and proposed approaches in a simple and straightforward fashion without reference to any particular biological application. The simulations also allow us to validate unmixing algorithms by comparing abundances known a priori with the values estimated by unmixing.

In the case of the real experimental data, we are able to compare the unmixed abundances to abundances obtained by measuring control samples. Additionally, the changes in distribution of estimated intensities introduced by different unmixing methodologies demonstrate their impact on the estimation of fluorochrome concentration and on the relative position of biological populations in the feature space.