Flow cytometry-based analysis has provided detailed knowledge of the phenotypic cellular composition of a large number of organs. In addition, subsequent prospective isolation of predetermined, phenotypically defined cellular subsets by fluorescence-activated cell sorting has given insight into the functional and genetic characteristics of these cells. The hematopoietic organ has been the subject of intense immunophenotypic investigation, revealing a highly complex composition. Herzenberg and coworkers pioneered the development and implementation of flow cytometry to identify the B-cell subsets that define the differentiation of a distinct hematopoietic lineage, and studies following similar principles have highlighted, and continue to highlight, the immense cellular and functional diversity associated with hematopoietic development and particularly with its specific effector cells. Together, these studies have paved the way for the implementation of flow cytometry in clinical practice as well, for instance in CD4 T-cell subset determination in human immunodeficiency virus infection or in the immunophenotyping of hematological malignancies.
Apart from the identification of hematopoietic effector cells, flow cytometry has been widely applied to identify the hierarchical stages of the developing hematopoietic organ. Because of the organ's large immunophenotypic diversity, multicolor flow cytometry is a prerequisite for identifying distinct cellular subsets at high resolution. The availability of an ever-increasing number of fluorophores and developments in commercially available flow cytometers now allow for the definition, and subsequent prospective isolation, of cellular subsets based on up to 17 distinct immunofluorescent parameters [6, 7]. Moreover, recent developments in mass spectrometry-based single-cell mass cytometry have at least doubled the number of parameters detectable in a single cell [8, 9]. However, all of these analyses are complex and place high demands on experimental procedures. Suboptimal experimental design, cell preparation, acquisition, and/or analysis carries the risk of flawed determination of cell surface protein expression levels, especially in very infrequent cellular subsets. It is therefore crucial to apply more standardized approaches when determining the frequencies of cellular subfractions.
Here, we discuss the value of multicolor flow cytometry in the identification and prospective isolation of the infrequent bone marrow-residing hematopoietic stem cells (HSCs). We focus on a set of aspects that are critical for accurate frequency determination of these rare cells. Although exemplified here for HSCs, most of these aspects should be more generally applicable.
HSC—The Value and Pitfalls of Multicolor Flow Cytometry
Bone marrow-residing HSCs, which have the capacity to functionally replenish all cell lineages within the hematopoietic system, constitute about 0.005–0.01% of all bone marrow cells [10, 11]. This infrequency has posed challenges for the immunophenotypic identification and isolation of these cells. In addition, it became clear from earlier studies that no single cell surface marker could specifically define the HSC, which still holds true. In mice, the first reports on the purification of HSCs by flow cytometry were based on binding of the lectin wheat germ agglutinin (WGA) and expression of H-2K, or on the expression of Sca1, low expression of Thy1.1, and the absence of lineage marker expression (such as B220, CD4, CD8, Gr1, Mac1, Ter119, CD5, and NK1.1) [13, 14]. Further enrichment was achieved by excluding cells that lacked expression of cKit. These cKit+/high, lineage−, and Sca1+/high (KLS) cells constitute about 0.05% of total bone marrow cells but still contain only about 10% bona fide HSCs.
The latter illustrates the necessity of additional markers to purify HSCs to homogeneity. Subsequent studies identified murine markers that further enriched for HSC activity within the KLS population. For instance, Nakauchi and coworkers identified KLSCD34−/low cells, Jacobsen and coworkers KLSCD34−Flt3− cells, and Morrison and coworkers KLSCD48−Slamf1+ cells as containing HSC activity at high frequencies. Goodell and coworkers combined the expression of some of these cell surface molecules with the dye-exclusion properties of HSCs (side population; recently reviewed in this journal). In both the human and the murine system, cellular subsets have been defined that, at the single-cell level, are able to reconstitute conditioned recipients at relatively high frequencies. Nevertheless, enrichment to homogeneity remains a challenge. Figure 1 shows an example of a flow cytometric analysis in which pre-enriched (cKit-enriched) mouse bone marrow cells were stained with antibodies against the indicated cell surface markers. The figure exemplifies the usefulness and necessity of multicolor flow cytometry and demonstrates the interrelation of some of the “HSC markers” mentioned above. As is clear from this figure, neither Flt3, Slamf1 (CD150), nor CD34 expression within the KLS compartment (Fig. 1, three middle plots) defines a “true” HSC phenotype on its own, as subsequent gating on KLSFlt3−, KLSSlamf1+, or KLSCD34− cells reveals events that express “non-HSC markers” (Fig. 1, three lower plots).
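Because no single marker defines the HSC, each of these phenotypes is an intersection of several positive and negative gates. The following is a minimal sketch of that combinatorial logic, assuming each event has already been called positive or negative for each marker. This is a simplification: as Figure 1 shows, many of these markers are graded, so real gates are set on continuous intensities. The marker names follow the text; the helpers `matches` and `frequency` and any event data are hypothetical.

```python
from typing import Dict, List

# One event = marker name -> True (inside the positive gate) / False (negative gate).
Event = Dict[str, bool]
Phenotype = Dict[str, bool]

# "Lin" stands for the combined lineage (dump) channel.
KLS: Phenotype = {"Lin": False, "cKit": True, "Sca1": True}

# Phenotypes named in the text, each built on top of the KLS gate.
PHENOTYPES: Dict[str, Phenotype] = {
    "KLSCD34-": {**KLS, "CD34": False},
    "KLSCD34-Flt3-": {**KLS, "CD34": False, "Flt3": False},
    "KLSCD48-Slamf1+": {**KLS, "CD48": False, "Slamf1": True},
}


def matches(event: Event, phenotype: Phenotype) -> bool:
    """True only if the event meets every marker requirement; unmeasured markers never match."""
    return all(event.get(marker) == state for marker, state in phenotype.items())


def frequency(events: List[Event], phenotype: Phenotype) -> float:
    """Fraction of all events that fall inside the combinatorial gate."""
    if not events:
        return 0.0
    return sum(matches(e, phenotype) for e in events) / len(events)
```

Adding a requirement to a phenotype can only shrink the gated fraction. This is why extra markers enrich for HSCs but, as Figure 1 shows, do not by themselves guarantee a homogeneous population.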
In the human system, too, past and recent studies now enable us to identify immature hematopoietic cells with increasingly high resolution. Figure 2 displays a composite, 17-parameter (including 13 separate fluorophores and a “dead-cell marker”) flow cytometric analysis of human bone marrow mononuclear cells based on published literature [20-25]. The figure illustrates the simultaneous detection of nine different hematopoietic cellular subsets (populations 1–9) in one sample, with population 9 representing cells highly enriched for HSC activity. However, an increasing number of fluorophores within one sample also means more spillover spreading error, which can compromise the ability to optimally resolve all the different “colors.” As exemplified in Figure 2, CD11b and CD4 give only very weak signals, although these antigens are normally highly expressed and detected at high resolution. This issue becomes increasingly important for antigens expressed at low levels, such as CD49f, a cell surface protein recently shown to be associated with human HSCs. In Figure 2, discrimination between CD49f-positive and CD49f-negative populations is difficult as a result of sample compensation, as well as the fact that other bright fluorophores had already been used to detect other cell surface antigens. This staining therefore illustrates that “more colors are not always better,” and that an optimal staining is restricted to the minimum number of colors/channels necessary. The use of a so-called “dump channel,” as discussed later, is one way to optimize a flow cytometry-based staining.
Experimental Setup and Layout
A number of considerations should be taken into account when identifying very infrequent cells in these multicolor experiments. Below, we discuss issues that, in our view, are important for obtaining reliable and reproducible frequency determinations.
A proper experimental design is a first and essential step of any experiment, and positive and/or negative controls (such as treated versus nontreated, or wild-type versus knockout cells) are necessary for assay evaluation. In a flow cytometry experiment, these controls sometimes overlap with the controls used for gate setting, that is, for defining an expressing (positive) versus a nonexpressing (negative) population.
Gate Setting Controls
“Gate setting” controls are often referred to as fluorescence minus one (FMO) controls (Fig. 3, lower middle plot). In short, FMOs are cell preparations stained with antibodies against all experimental markers except one. Table 1 illustrates the use of three different FMOs in a hypothetical experiment using antibodies against markers A, B, and C. Because the background (or nonspecific) signal in a given “channel” can differ considerably between subpopulations within the same stained sample, it is relevant to determine the background signal in the population of interest. These differences in background may be due to differences between cell types in autofluorescence or in nonspecific binding of staining reagents. Eosinophils, for instance, exhibit high autofluorescence compared with other leukocytes. In the experiment represented by Table 1, the background signal for marker C within the A+B+ compartment (Table 1, FMO−C) might differ from that in the A−B+ or A+B− compartment. This is an important issue when analyzing HSCs and other immature blood cell compartments [28, 29], because many of the markers used show graded expression levels, i.e., no distinct positive and negative populations can be observed (illustrated in Fig. 1).
Table 1. Staining strategy
Overview of compensation controls (SS: single stained) and gate setting controls (FMOs: fluorescence minus one) in a flow cytometric experiment using three different colors (A–C).
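The logic behind Table 1 generalizes to any panel. With n colors, there is one single-stained (SS) compensation control per color and one FMO per color, and each FMO omits exactly one antibody. The minimal sketch below builds such a plan. It also follows the point made above that background must be read in the population of interest: it places a positive/negative boundary at a high percentile of the FMO signal measured within that population. The helper names are hypothetical, and the 99.5th percentile is an assumed convention, not a value taken from this report.

```python
import math
from typing import Dict, List


def staining_plan(markers: List[str]) -> Dict[str, List[str]]:
    """Map each tube to the antibodies it receives (compensation and gate setting controls)."""
    plan: Dict[str, List[str]] = {}
    for m in markers:
        plan[f"SS-{m}"] = [m]  # single-stained compensation control
    for m in markers:
        plan[f"FMO-{m}"] = [x for x in markers if x != m]  # all antibodies except m
    plan["full stain"] = list(markers)  # the experimental sample
    return plan


def percentile(values: List[float], q: float) -> float:
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty list."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def fmo_threshold(fmo_signal_in_population: List[float], q: float = 99.5) -> float:
    """Boundary for one channel, from FMO events pre-gated on the population of interest
    (e.g. marker C read out in the A+B+ compartment of the FMO-C tube)."""
    return percentile(fmo_signal_in_population, q)


def fraction_positive(signal: List[float], threshold: float) -> float:
    """Fraction of events in the fully stained sample above the FMO-derived boundary."""
    return sum(v > threshold for v in signal) / len(signal) if signal else 0.0
```

Running `fmo_threshold` separately on the A+B+ and A−B+ gates of the FMO-C tube will generally give different boundaries. That difference is the subpopulation-dependent background described above.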
Internal Reference Populations (IRPs)
A potential drawback of FMOs is that nonspecific staining cannot be assessed for the very reagent that is omitted from, and evaluated by, the FMO control. Presence or absence of expression is instead defined in another sample (the FMO) rather than in the tested sample, and variation in staining quality between these samples could affect the analysis. In addition, some cellular subsets are identified by medium versus high expression levels of a specific marker, in which case an FMO is of little help. Here, IRPs can be of great value. IRPs are cell populations within the tested sample that can serve as a positive or negative reference, as illustrated in Figure 3. A requirement for the use of IRPs is pre-existing knowledge of the cell surface expression of the marker of interest within these IRPs, obtained for instance through the use of FMOs in earlier experiments. A potential drawback of IRPs is that the cells they contain differ from the population of interest and may therefore have different (nonspecific) binding properties for the antibodies used. Nevertheless, we have found IRPs to be an excellent complement to FMOs for defining boundaries between positive and negative cells. As a consequence, when good IRPs are at hand, isotype control antibodies are completely redundant.
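Gating against IRPs can be sketched in the same way, except that both references come from the tested tube itself. In this minimal sketch, the boundary is placed midway between the medians of a known-negative and a known-positive IRP on an already transformed (for example logicle or log-like) scale. The midpoint rule is an assumption chosen for simplicity, not a procedure prescribed here. It also presupposes the prior knowledge of IRP expression described above. The function names are hypothetical.

```python
import statistics
from typing import List


def irp_boundary(neg_irp: List[float], pos_irp: List[float]) -> float:
    """Boundary midway between the medians of a negative and a positive reference population
    taken from the same stained sample (values on a transformed display scale)."""
    neg_med = statistics.median(neg_irp)
    pos_med = statistics.median(pos_irp)
    if pos_med <= neg_med:
        raise ValueError("positive IRP must be brighter than negative IRP")
    return (neg_med + pos_med) / 2.0


def classify(values: List[float], boundary: float) -> List[str]:
    """Label events of the population of interest relative to the IRP-derived boundary."""
    return ["pos" if v > boundary else "neg" for v in values]
```

For a medium-versus-high distinction, the lower reference would simply be an IRP known to express the marker at medium levels, which an FMO cannot provide.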
As mentioned in connection with Figure 2, the inclusion of additional fluorochromes comes at a potential cost. Thoroughly planning an experiment with regard to the number of fluorophores used, fluorescence spillover, antigen expression levels, and fluorochrome brightness is therefore of substantial importance. Generally, when designing multicolor experiments, it is preferable to use bright fluorochromes for antibodies that bind weakly expressed antigens, and vice versa.
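The brightness rule of thumb amounts to rank matching between antigens and fluorochromes. Below is a minimal greedy sketch, assuming that relative antigen densities and fluorochrome brightness scores are known. The numbers in the usage example are purely illustrative. The sketch also deliberately ignores spillover and instrument configuration, which a real panel design must weigh as well.

```python
from typing import Dict


def assign_fluorochromes(antigen_density: Dict[str, float],
                         fluor_brightness: Dict[str, float]) -> Dict[str, str]:
    """Pair the most weakly expressed antigens with the brightest fluorochromes."""
    if len(fluor_brightness) < len(antigen_density):
        raise ValueError("not enough fluorochromes for all antigens")
    antigens = sorted(antigen_density, key=antigen_density.get)  # weakest antigen first
    fluors = sorted(fluor_brightness, key=fluor_brightness.get, reverse=True)  # brightest first
    return dict(zip(antigens, fluors))
```

For example, with hypothetical scores `assign_fluorochromes({"CD49f": 1.0, "CD34": 3.0, "CD45": 10.0}, {"PE": 9.0, "APC": 8.0, "FITC": 3.0})` returns `{"CD49f": "PE", "CD34": "APC", "CD45": "FITC"}`, so the dim CD49f signal gets the brightest label.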
Apart from biological and gate setting controls, controls for instrument setup are included, as shown in Table 1 (SS: single stained controls). Correction for spectral overlap (compensation) between channels was traditionally performed manually on older analog instruments, but multicolor experiments on more modern digital machines quickly demanded software compensation. In addition, software compensation can be regarded as objective, whereas manual compensation cannot, and we therefore suggest always performing software compensation. Using the same cell source as the sample to set compensation can be problematic when cell numbers are limited (for instance, with primary human cell sources), when autofluorescence differs between cellular subsets, or when the antibody binds only a very minor fraction of the cells. In most cases, so-called compensation (or capture) beads provide a better alternative. These commercially available beads are consistent, bind antibodies with high affinity, give high-intensity signals with a narrow coefficient of variation, and allow adequate numbers of events to be collected for proper compensation calculations. Importantly, these beads should only be used when they give a brighter signal than the stained cells, which usually holds true.
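What software compensation computes can be sketched in a few lines. Each single-stained control yields one row of a spillover matrix: the signal in every detector relative to the fluorochrome's own detector, after subtracting the unstained or negative population. Each event is then corrected by solving the linear system defined by that matrix. This is a minimal, dependency-free sketch under the usual linear-spillover assumption. The function names are hypothetical, and commercial packages differ in details such as how the positive and negative populations are selected.

```python
from statistics import median
from typing import List, Sequence

Matrix = List[List[float]]


def spillover_row(pos: Sequence[Sequence[float]], neg: Sequence[Sequence[float]],
                  primary: int) -> List[float]:
    """Spillover of one fluorochrome into every detector, from its single-stained control.

    pos / neg: events (one value per detector) of the stained and unstained bead or cell
    populations; primary: index of the fluorochrome's own detector (coefficient 1.0).
    """
    n = len(pos[0])
    delta = [median(e[j] for e in pos) - median(e[j] for e in neg) for j in range(n)]
    return [d / delta[primary] for d in delta]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def compensate(observed: List[float], spill: Matrix) -> List[float]:
    """Recover per-fluorochrome signals, where observed[j] = sum_i true[i] * spill[i][j]."""
    n = len(spill)
    transposed = [[spill[i][j] for i in range(n)] for j in range(n)]
    return solve(transposed, observed)
```

With a spillover matrix `[[1.0, 0.2], [0.05, 1.0]]`, an event observed as `[1005.0, 300.0]` compensates back to approximately `[1000.0, 100.0]`. After compensation, events can legitimately take negative values, which is relevant to the display issues discussed later.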
The cell number in the stained sample can affect staining quality. This is relevant when analyzing very infrequent cells, as these necessitate the staining of larger cell numbers. Antibodies should therefore be titrated using cell numbers (or concentrations) similar to those in the experimental samples. Sometimes it is desirable to enrich for the cell type of interest, for instance when sorting infrequent cell types. Here, enrichment for the target cells increases sorting speed and, in our experience, also sorting yield and purity. The choice of method (positive selection for a marker of interest or depletion of unwanted cells) depends on the experimental question. In our hands, positive selection usually provides higher purities, whereas depletion gives lower purities but eventually a higher yield of the infrequent cellular subsets in cell sorting experiments. Positive selection (for instance, using magnetic beads) could in some cases trigger unwanted activating events and should therefore be performed and evaluated with care.
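One way to compare titration points is a stain index, which relates the separation of the positive and negative populations to the spread of the negatives. This is a widely used titration metric rather than one specified in this report. The form used below, (median of positives − median of negatives) / (2 × SD of negatives), is one common variant. The requirement the text does state is that each titration point be stained at the cell number or concentration used in the real experiment. The function names and data layout are hypothetical.

```python
from statistics import median, stdev
from typing import Dict, List, Tuple


def stain_index(pos: List[float], neg: List[float]) -> float:
    """Separation of positive and negative populations relative to the spread of the negatives."""
    return (median(pos) - median(neg)) / (2 * stdev(neg))


def best_titer(titration: Dict[float, Tuple[List[float], List[float]]]) -> float:
    """Antibody amount (key) whose (positive, negative) intensities give the highest stain index."""
    return max(titration, key=lambda amount: stain_index(*titration[amount]))
```

In practice, each entry of `titration` would come from a tube stained at the experimental cell number. The highest stain index is a guide, and the saturating amount is sometimes preferred.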
To obtain appropriate statistical power when determining the frequencies of very infrequent events, a large number of events often has to be acquired. This results in very large files, especially in multicolor flow cytometry. Previously, this posed problems both for storage capacity and for data analysis. Today, however, storage capacity is rarely a limiting factor, and hardware and software developments allow for the acquisition and analysis of increasingly large data files.
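Counting statistics give a rough guide to what "a large number of events" means. If the n events falling in a gate are treated as a Poisson count, the coefficient of variation (CV) of that count is about 1/√n. A target CV therefore requires roughly 1/CV² gated events, and the total number of acquired events scales with the inverse of the population frequency. The sketch below works under that assumption, so it covers counting error only, not staining or biological variation. The frequency used as an example is the HSC estimate implied by the KLS numbers given earlier: 0.05% × about 10% ≈ 0.005%.

```python
import math


def gated_events_for_cv(cv: float) -> int:
    """Events needed in the gate so that the Poisson counting CV (1/sqrt(n)) is <= cv."""
    return math.ceil(1.0 / cv ** 2)


def total_events_needed(frequency: float, cv: float) -> int:
    """Total (pre-gated) events to acquire for a population at the given frequency."""
    return math.ceil(gated_events_for_cv(cv) / frequency)


def counting_cv(n_gated: int) -> float:
    """Expected Poisson CV when n events are observed in the gate."""
    return 1.0 / math.sqrt(n_gated) if n_gated > 0 else math.inf
```

At a 10% CV this means about 100 HSC events and therefore on the order of 2 × 10^6 pre-gated bone marrow events (`total_events_needed(5e-5, 0.10)`). Halving the CV to 5% quadruples both numbers.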
Analysis of Data
Whereas instrumentation and fluorochrome availability have improved greatly in recent years, developments in analysis software have been somewhat disappointing, despite increasing demand as analyses have become more complex. As discussed by Chattopadhyay et al., progress is being made in computer-driven analysis, but it still has its downsides. Manual data analysis therefore remains, to date, the most reliable method for multicolor determination of infrequent subpopulations. It is important to have a clear, predefined hypothesis that guides such analyses. Data analysis often places demands on both software and hardware, especially with large files. As shown in Figure 1, we propose “pre-gating” to remove unwanted events before determining the expression levels of the different markers. These steps are, in our view, essential and apply to most flow cytometric analyses of any type of material.
Based on the signal characteristics (area versus width and/or height) of either the forward scatter (FSC) or side scatter (SSC) signal, it is possible to distinguish single cells from doublets, triplets, etc. (Fig. 1, upper left plot). This is relevant because two different cells expressing mutually exclusive markers could be read as one double-positive cell. Also, gating on cell size and complexity, by FSC and SSC, respectively, excludes smaller irrelevant particles (Fig. 1, upper plots second from the right). Exclusion of dead cells is important, as dead cells tend to stick to other cells and bind antibodies nonspecifically. Several approaches are available for dead cell exclusion, of which DNA-binding dyes (such as propidium iodide; Fig. 1, upper plots second from the left) and amine-reactive dyes (such as ViViD and Aqua Blue) are commonly used. For the quantification of rare cells, the use of a viability marker is key, as false-positive events may otherwise constitute a substantial proportion of the investigated population (or gate).
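The pre-gating sequence can be expressed as a chain of filters that also records how many events survive each step, which is useful for reporting. The parameter names (`FSC-A`, `FSC-H`, `viability`) and all threshold values below are hypothetical placeholders. In practice these gates are drawn on plots such as those in Figure 1, not set as fixed numbers.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, float]


def pre_gate(events: List[Event], max_area_height_ratio: float = 1.2,
             min_fsc_a: float = 30_000.0, max_viability: float = 1_000.0
             ) -> Tuple[List[Event], List[Tuple[str, int]]]:
    """Apply singlet, scatter, and viability gates in turn; return survivors and step counts."""
    steps: List[Tuple[str, Callable[[Event], bool]]] = [
        # Doublets passing together give a disproportionately large area for their height.
        ("singlets", lambda e: e["FSC-H"] > 0 and e["FSC-A"] / e["FSC-H"] <= max_area_height_ratio),
        # Small irrelevant particles and debris sit at low forward scatter.
        ("cells (FSC)", lambda e: e["FSC-A"] >= min_fsc_a),
        # Dead cells take up a DNA-binding dye (e.g. propidium iodide) and are bright in that channel.
        ("live", lambda e: e["viability"] <= max_viability),
    ]
    kept = events
    counts = [("all", len(events))]
    for name, keep in steps:
        kept = [e for e in kept if keep(e)]
        counts.append((name, len(kept)))
    return kept, counts
```

Reporting the per-step counts alongside the final frequency makes it easier to judge how much of a rare-cell gate could still be residual debris, doublets, or dead cells.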
Many cell types analyzed by multicolor flow cytometry are defined by a combinatorial pattern of present and absent marker expression. Widely applied in hematopoietic flow cytometry, and highly recommended here, is a so-called “dump channel,” which allows cells positive for any of several proteins not expressed on the cells of interest to be gated away simultaneously. For instance, murine HSCs express no, or only low levels of, CD3, CD4, CD8, B220, CD19, Gr1, CD11b, and Ter119 (so-called lineage markers) [13, 14]. Antibodies against these markers can easily be conjugated with the same fluorochrome (Fig. 1, “Lineage-channel”) and visualized in the same channel as the dead-cell marker. This decreases the number of different fluorochromes used in the staining and thereby potentially improves the quality of the other signals. As mentioned before, such a dump channel for lineage markers was not used in the human sample (Fig. 2), which negatively affected our ability to optimally identify the HSC population.
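Two consequences of the dump channel can be sketched: the number of detectors the panel needs, and the single gate that replaces one negative gate per lineage marker. This is a simplified model. It assumes that the dump detector simply sums the contributions of all antibodies sharing the fluorochrome (and of the dead-cell dye, if combined). The lineage list follows the text; the rest of the panel, the helper names, and the threshold are illustrative.

```python
from typing import Dict, Iterable, List

LINEAGE = ["CD3", "CD4", "CD8", "B220", "CD19", "Gr1", "CD11b", "Ter119"]


def channels_needed(markers: Iterable[str], dumped: Iterable[str]) -> int:
    """Detectors used when every marker in `dumped` shares one fluorochrome."""
    marker_list = list(markers)
    dump_set = set(dumped)
    separate = [m for m in marker_list if m not in dump_set]
    shared = any(m in dump_set for m in marker_list)
    return len(separate) + (1 if shared else 0)


def dump_signal(contributions: Dict[str, float]) -> float:
    """Simplified model: the dump detector reads the sum of all dumped reagents' signals."""
    return sum(contributions.values())


def dump_negative(events: List[Dict[str, float]], threshold: float) -> List[Dict[str, float]]:
    """A single gate instead of one negative gate per lineage marker and viability dye."""
    return [e for e in events if dump_signal(e) <= threshold]
```

For a hypothetical mouse panel of the eight lineage markers, a viability dye, and cKit, Sca1, CD34, Flt3, CD48, and Slamf1, combining lineage markers and viability dye in one channel reduces the detectors needed from 15 to 7.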
All “pre-gating” steps discussed above eliminate most, but not all, nonspecific or unwanted events. This implies a risk when a cell type is defined based only on the absence of a number of markers. We strongly advise including at least one (though preferably more) positive marker to define the cell type of interest. Still, when evaluating very infrequent events, nonspecific events could contaminate the positive gate, and, as discussed previously in this journal, defining a real positive population can be challenging. The physical location of these events within the positive gate, or, more objectively, their mean fluorescence intensity, can help determine whether or not they are the actual cells of interest. One illustrative example from HSC biology is peripheral blood reconstitution analysis following murine bone marrow transplantation at limiting doses, where the expected levels of donor contribution can be very low. Here, it is important to standardize the definition of a truly positive reconstituted mouse (how many positive events out of how many total blood cells), as proposed by Eaves and coworkers.
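One way to make the definition of a positive reconstituted mouse explicit is to compare the observed number of donor-type events with what the background of negative control samples would produce by chance. This minimal sketch assumes that background events occur as a Poisson process at a rate estimated from non-transplanted controls. The minimum event count and the significance level are hypothetical parameters. They are not the criteria proposed by Eaves and coworkers.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def is_reconstituted(k_positive: int, n_total: int, background_rate: float,
                     min_events: int = 5, alpha: float = 0.01) -> bool:
    """Call a mouse positive if it has enough donor events and they are unlikely to be background.

    background_rate: donor-gate events per total event seen in negative control blood.
    """
    expected_background = background_rate * n_total
    return k_positive >= min_events and poisson_sf(k_positive, expected_background) < alpha
```

With a background of 1 event per 10^5 cells and 10^5 cells analyzed, 10 donor-gate events would be called positive, whereas 2 events would not. The `min_events` floor keeps a handful of events from being called positive merely because the estimated background happens to be zero.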
Flow cytometric frequency determination, and subsequent isolation of infrequent immature cell populations, is a powerful tool when studying, for instance, transgenic mice [35, 36]. Ideally, several replicates, or cell preparations from multiple individuals, are used in these types of experiments. The more infrequent a cell population, the larger the risk of variation between samples. Therefore, as discussed previously, it is advisable to collect a sufficient number of events within the gate of interest. Once a gate setting strategy is defined, the exact same gates should be applied to all samples in the analysis. This is important, although sometimes a subject of debate. For instance, in KIT (W41/W41) mice, which are characterized by decreased cKit signaling activity, cKit-expressing cells are reduced both in frequency and in cKit expression levels compared with wild-type cells [37, 38]. In cases like these, careful consideration of gate setting strategies, combined with functional comparison of the cell population of interest, is necessary for proper comparative analysis.
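Applying the same gates to all samples can be enforced by defining each gate once and reusing that object for every sample. The following minimal sketch uses a hypothetical one-dimensional interval gate and hypothetical helper names. Summary statistics across replicates then reflect the samples rather than differences in gate placement.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

Gate = Tuple[float, float]  # inclusive (low, high) bounds on one transformed channel


def gated_fraction(values: List[float], gate: Gate) -> float:
    """Fraction of a sample's events inside the gate."""
    low, high = gate
    return sum(low <= v <= high for v in values) / len(values) if values else 0.0


def summarize(samples: Dict[str, List[float]], gate: Gate) -> Tuple[Dict[str, float], float, float]:
    """Apply one fixed gate to every sample; return per-sample fractions, their mean, and SD."""
    fractions = {name: gated_fraction(vals, gate) for name, vals in samples.items()}
    values = list(fractions.values())
    sd = stdev(values) if len(values) > 1 else 0.0
    return fractions, mean(values), sd
```

Where a fixed gate is biologically questionable, as in the KIT (W41/W41) example, this sketch only makes the choice explicit. It does not resolve it, and functional comparison remains necessary.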
When presenting flow cytometric data of very infrequent events, some further issues are worth considering. Most analysis software can present data in different types of plots. Contour plots (including outliers; Fig. 1, upper plots) are useful and preferred over dot plots when there are many events, as they give a good reflection of the event intensity distribution. However, when presenting a limited number of events, dot plots (Fig. 1, lower plots) are preferred, because single outliers could otherwise appear as “islands” rather than dots, or could even be visually absent.
Both multicolor experimental setups and digital correction of spectral overlap create events, with both positive and negative values, around the lower end of the axes. When these are displayed on traditional log scales, differential expression levels lose resolution and events even appear “stuck” against the axes. Biexponential, or logicle, display, which uses alternative scaling of the lower end of the axis and allows negative values to be presented, is a good means of circumventing these issues. It is used throughout the figures in this report and illustrated in Supporting Information Figure 1.
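The logicle transform itself is more involved. The behavior it aims for is linear-like scaling near zero, log-like scaling at high intensities, and a definition that covers negative values. That behavior can be illustrated with the simpler inverse hyperbolic sine (arcsinh) scaling, which is widely used for the same purpose. Note that arcsinh is a swapped-in stand-in, not the logicle function. The cofactor of 150 is an assumed, commonly used value for fluorescence data.

```python
import math


def arcsinh_scale(x: float, cofactor: float = 150.0) -> float:
    """Display value for a compensated intensity; defined for negative x, unlike log10."""
    return math.asinh(x / cofactor)


def log10_scale(x: float) -> float:
    """Traditional log display: undefined at or below zero, so such events pile up on the axis."""
    if x <= 0:
        raise ValueError("log display cannot show zero or negative values")
    return math.log10(x)
```

Around zero, `arcsinh_scale` is approximately x/cofactor, so small positive and negative compensated values spread symmetrically instead of being clipped. Far from zero it behaves like ln(2x/cofactor), a log scale.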
In published literature, conclusions are regularly drawn based on the position of “positive versus negative” gates. Although often omitted, presentation of the negative control that forms the basis for gate positioning, such as an FMO or an untreated control, is highly illustrative and allows the reader to better understand and judge the presented data.
Another consideration when presenting infrequent cell subset analyses is whether to present data as frequencies or as absolute numbers. For instance, when comparing bone marrow cellular subsets between individuals of the same size and age and with similar bone marrow cellularities, presenting frequencies is adequate. However, such analyses are complicated when comparing, for instance, murine wild-type reference animals with smaller or larger experimental animals, such as Lig4 (Y288C) or p18 (INK4C) mice, respectively. In those cases, where the size of the population in which the frequency is determined might differ, presenting absolute cell numbers instead of (or in addition to) cell frequencies might be more informative.
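Converting frequencies to absolute numbers requires only the total cell count of the starting material and the chain of gate frequencies. In the minimal sketch below, the function name and the cellularities in the usage example are hypothetical. The example is chosen to show how identical frequencies translate into different absolute numbers when cellularity differs.

```python
from math import prod
from typing import Iterable


def absolute_number(total_cells: float, gate_fractions: Iterable[float]) -> float:
    """Absolute count = total cells x product of each gate's fraction of its parent gate."""
    return total_cells * prod(gate_fractions)
```

Take the KLS frequency of 0.05% with about 10% HSCs within KLS. A hypothetical wild-type mouse with 4 × 10^7 bone marrow cells would then yield `absolute_number(4e7, [0.0005, 0.1])`, about 2,000 HSCs. A mutant with half that cellularity would have about 1,000, a difference that the identical frequencies alone would hide.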
Implementation of recent developments in flow cytometry in experimental science and clinical practice has enhanced the possibilities for extracting biological information from a test sample. However, these developments place increasing demands not only on the setup and acquisition of samples but also on the analysis and presentation of the data. Analyses of very infrequent cellular subsets, like bone marrow-derived HSCs, are particularly sensitive to suboptimal experimentation and therefore necessitate optimal experimental procedures. Here, we have discussed a set of issues that we believe should be taken into consideration in order to obtain, analyze, and present flow cytometry results of the highest possible quality.