Polychromatic flow cytometry: A rapid method for the reduction and analysis of complex multiparameter data
Laboratory of Molecular and Tumor Immunology, Robert W. Franz Cancer Research Center, Earle A. Chiles Research Institute, Providence Cancer Center and Providence Portland Medical Center, Portland, Oregon
Recent advances in flow cytometry have resulted in the development of reliable techniques for performing polychromatic (5–17 color) flow cytometry analysis. However, the data reduction and analysis required to resolve the hundreds of possible cellular subphenotypes identified with a single polychromatic flow cytometry staining panel present a major obstacle to the successful application of this technology.
To generate two distinct collections of T cell populations with differentially expressed surface markers, cryopreserved lymph node cells from 5 melanoma patients vaccinated with the modified gp100209-2M melanoma peptide were stimulated with cognate peptide and cultured in either IL-21 + low-dose IL-2 or IL-15 + low-dose IL-2. In vitro stimulated (IVS) cells were interrogated using 8-color flow cytometry. Data were analyzed using Winlist Hyperlog™ and FCOM™ software, and 32 T cell subsets were resolved for each culture condition. Hierarchical clustering analysis was applied to the relative percentages of each subphenotype for both IVS conditions to determine if unique cell surface marker expression signatures were produced for each IVS culture.
Sequential data analysis using Hyperlog™ and FCOM™ demonstrated that lymphocytes cultured in IL-21 + IL-2 had a distinctively different set of subphenotype signatures compared to cells grown in IL-15 + IL-2 for all 5 patients. Importantly, subsequent cluster analysis of all 32 subphenotype frequencies in each IVS test condition for all 5 patients reproducibly demonstrated that cellular subphenotypes produced after IL-21 + IL-2 IVS partitioned separately from subphenotypes produced by IL-15 + IL-2 IVS.
Two- to four-parameter flow cytometry data are commonly analyzed by distributing the acquired events through sequential one- or two-parameter gated histograms, with positive and negative staining regions delineated using appropriate negative staining controls for each fluorescence parameter. Thus, in the case of 2-parameter histogram analysis of a 4-color staining experiment, cells stained for expression of epitopes “A” and “B” would be distributed into four subsets: A+/B+, A+/B−, A−/B+, and A−/B−. Each of these phenotypes could be further partitioned into four more subphenotypes based on their distribution in a second 2-parameter Cartesian array specific for staining of epitopes “C” and “D”. This simple example would result in 16 different subphenotypes. Although polychromatic flow cytometry (>4 colors) is now readily achievable in many laboratories, this same laborious data processing strategy remains the basic approach used for ever more complex multiparameter data analysis.
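The combinatorial growth described above can be illustrated with a minimal Python sketch. The marker names A–D are the placeholders used in the text; the function name `enumerate_subphenotypes` is an illustrative assumption and is not part of any cytometry software package.

```python
from itertools import product

def enumerate_subphenotypes(markers):
    """Return every +/- combination of the given markers, e.g. 'A+/B-'."""
    labels = []
    for signs in product("+-", repeat=len(markers)):
        labels.append("/".join(m + s for m, s in zip(markers, signs)))
    return labels

# Four markers give 2**4 = 16 subphenotypes, as in the A/B/C/D example.
subsets = enumerate_subphenotypes(["A", "B", "C", "D"])
print(len(subsets))
print(subsets[0])
```

Each additional marker doubles the number of labels, which is why manual gating quickly becomes impractical as colors are added.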
The development of small, powerful diode lasers and the optimization of new fluorophore molecular species with widely separated emission spectra have resulted in the development of reliable techniques for performing polychromatic flow cytometry analysis. Multiparameter analysis can require the quantification of frequency distributions of as many as 32–1024 possible subphenotypes if 5- to 10-parameter analysis is performed. Faster, more efficient data reduction and analysis procedures are required, especially if polychromatic analysis involves the interrogation and comparison of multiple test samples, as might occur in clinical studies involving many patients in different treatment cohorts.
The recent development of Winlist Hyperlog™ data analysis software (Verity Software House) and the similar FlowJo biexponential software (Tree Star) allows more accurate quantification of flow cytometry fluorescence data, especially when weak positive staining is expressed for one or more parameters (1, 2). Additionally, the Winlist software contains FCOM™, an analysis tool that rapidly reduces multiparameter data to a series of event acquisition histograms, one for every possible subphenotype defined by the number of fluorescent parameters in a given multicolor staining protocol. FCOM™-generated event acquisition histograms can subsequently be exported to simple spreadsheet and graphics software for comparative analysis of small numbers of test samples. Tabular spreadsheet summaries of multiple FCOM™ subphenotype acquisition frequencies can also be exported to cluster analysis software to compare subphenotype staining patterns across a large number of test samples.
Clustering analysis comprises a large family of methods for partitioning a large set of test items [reviewed in (3)]. Among them, hierarchical clustering analysis is the most widely used method because it allows a convenient visual representation of the data. In hierarchical clustering, the data set is split into groups and, within them, into subgroups. Since the initial application (4), hierarchical cluster algorithms have been employed in a wide variety of gene expression studies (5–9). In general, gene expression studies generate relative expression signals against a preselected baseline detection background (e.g. Affymetrix GeneChips). These gene expression values are used for the cluster analysis. A cell population analyzed by polychromatic flow cytometry can be divided into multiple different subphenotypes. The number of cells within each phenotype subset can be given as a relative frequency of the total preselected (gated) population of interest. These relative frequencies are amenable to cluster algorithms, as shown in recent studies that analyzed lymphocyte subsets defined using 4-color flow cytometry in patients with colorectal cancer (10) and in patients with B cell chronic lymphoproliferative disorders (11). Similarly, applying unsupervised average-linkage hierarchical clustering (Pearson correlation) to absolute-valued data sets (e.g. flow cytometry frequency data) should be comparably effective in the processing of more complex 5–17 parameter polychromatic flow cytometry subphenotype analysis (3, 11–13). Practical, more efficient approaches to cluster analysis of large subphenotype arrays (1,024 in a 10-color experiment) are made possible if automated data reduction techniques such as FCOM™ analysis are first employed to resolve the much larger number of subphenotypes delineated in a 5–17 color analysis compared to 4-color flow cytometry.
To demonstrate the analytical power of using combined automated data reduction analysis with follow-on cluster analysis in the context of highly complex multicolor flow cytometry, we here describe 8-color flow cytometry analysis of in vitro stimulated (IVS) lymph node (LN) lymphocytes cultured in either IL-21 + IL-2 or IL-15 + IL-2. Previous studies in our laboratory have demonstrated that both IVS conditions generate distinctly different cell-surface marker signatures in cultured T cells (unpublished data), which should be detectable with cluster analysis algorithms.
MATERIALS AND METHODS
Human Memory/Effector T Cell Staining Panel
Fluorescent antibodies were purchased or, in the case of CD27, fluorescently labeled in-house with the use of commercially available QDot antibody conjugation kits (Invitrogen, Molecular Probes, Eugene, OR). The staining panel consisted of the following: CCR7-FITC (R&D Systems, Minneapolis, MN), HLA-A2-restricted gp100209-2M (IMDQVPFSV) tetramer-PE (iTAg, Beckman Coulter, San Diego, CA), CD8β-PE-TR and CD28-PE-Cy7 (Beckman Coulter, Fullerton, CA), CD14-PE-Cy5 (Beckman Coulter, Marseille, France), CD19-PE-Cy5 (eBioscience, San Diego, CA), CD45RA-APC, affinity-purified CD27, and streptavidin-APC-Cy7 (BD Bioscience, Pharmingen, San Diego, CA), and CD57-biotin (BD Bioscience, Immunocytometry Systems, San Jose, CA). The pairing of fluorochromes to antibodies was determined by established staining profiles of each antibody to allow for detection of bright, dim, and negative populations. Spectral overlap between fluorescent dyes was also considered. Each fluorescent antibody was titered carefully at optimal PMT voltage settings to determine the antibody concentration that produced optimal fluorescence resolution with minimal nonspecific background staining. A “dump cocktail” of CD19- and CD14-PE-Cy5 in combination with 5 μg/ml of 7-aminoactinomycin D (7-AAD) (Invitrogen, Molecular Probes, Eugene, OR) in 1× PBS was employed to stain cells with high cell-surface Fc receptor-mediated, nonspecific binding of antibodies, and to discriminate between live and dead cells. Since PE-Cy5 and 7-AAD fluoresce at similar wavelengths, a single PMT was assigned to collect both signals and serve as a “dump” channel to exclude dead cells and cells with high Fc receptor-mediated nonspecific binding (B cells and monocytes) from experimental and control samples. Bright, high-density surface markers were selected to pair with far-red emitting fluorochromes, and thus to counter the broadening distributions of fluorescence in the far-red channels of PE-Cy7 and APC-Cy7 (14).
All samples were acquired using Summit 4.2 software on a Dako Cyan ADP Flow Cytometer equipped with three diode lasers (488, 635, and 407 nm), and modified with optimal bandpass and dichroic filters (Dako, Fort Collins, CO).
Cell Samples and In-Vitro Stimulation Cultures
Thirty-five HLA-A2+ patients with resected stage I–III melanoma were randomized to receive melanoma peptide vaccinations every 2 weeks (13 vaccines) or every 3 weeks (9 vaccines) for 6 months. The vaccine included the HLA-A2-binding modified melanoma peptide gp100209-2M mixed in Montanide ISA 51 adjuvant (15). Sentinel LN biopsies were collected from 12 patients after the second vaccine (16). Lymphocytes from LN were isolated and cryopreserved. Five patients from the vaccination trial with the highest frequencies of gp100209-2M specific CD8+ T cells were included in this analysis. Collected LN lymphocytes from these 5 patients were subjected to a 9-day IVS prior to being stained for phenotype analysis. Cryopreserved lymphocytes were thawed and washed in dPBS (BioWhittaker, Walkersville, MD) with 2% human AB sera (Irvine Scientific, Santa Ana, CA). The cells were cultured in X-VIVO-15 (BioWhittaker, Walkersville, MD), 5% human AB sera with 1 μg/ml gp100209-2M peptide (Invitrogen, Eugene, OR) and either 50 ng/ml recombinant human IL-15 (Peprotech, Rocky Hill, NJ) or 50 ng/ml recombinant human IL-21 provided by Zymogenetics (Seattle, WA). The cells were plated at 12.5 × 10⁴ cells/ml in 200 μl per well in a round-bottom polystyrene 96-well cell-culture plate for 2 days in a 37°C humidified chamber with 5% CO₂. On day 2, the cells were washed to remove the peptide and suspended in culture media containing low-dose rhIL-2 (60 IU/ml, PROLEUKIN®, Chiron Corporation, Emeryville, CA) and either rhIL-15 or rhIL-21 for 7 days in a 37°C humidified chamber with 5% CO₂. On day 8, the T cells were harvested by adding 2 mM ethylenediaminetetraacetic acid (EDTA) (Sigma Chemical, St Louis, MO) to the wells for 5 min, and transferred to 5-ml polystyrene tubes to wash out the EDTA before staining and phenotype assessment.
Staining, Compensation, and Gating Strategy
After IVS, the cells were stained using an 8-color human T cell phenotype panel consisting of CCR7, gp100209-2M tetramer, CD8β, CD28, CD45RA, CD57, and CD27, and a dump channel cocktail of CD14 and CD19 antibodies conjugated to PE-Cy5. 7-AAD, which fluoresces at a similar wavelength to PE-Cy5, was added prior to acquisition for live/dead discrimination, and 7-AAD fluorescence was collected in the same PMT as the PE-Cy5 signal. Viable CD14−/CD19− lymphocytes were gated through CD8β and gp100209-2M tetramer staining to preselect for gp100209-2M specific CD8β+ T cells. This population of cells was further interrogated for staining by the remaining five cell-surface markers to determine their subphenotypes. All data were acquired in FCS format using Summit™ 4.2 software and analyzed using Winlist™ 5.0 software (Verity Software House, Topsham, ME). Computer-assisted digital compensation was performed using single-color staining controls via the Hyperlog™ transform as previously described (17). FMO (“fluorescence minus one”) controls were used to set hinged gating and define histogram regions that distinguished positive from negative events for experimental samples and fidelity controls (1). Fidelity controls were used to ensure that there was no loss of staining frequency and intensity between lower-order panels and the corresponding fluorescence for each mAb in the 8-color panel.
Data Reduction and Cluster Analysis
FCOM™ (which stands for “combination function”) is an analysis tool in Winlist™ that can be used to categorize and bin a fluorescent cellular event based on the combinations of all the predefined (pregated) fluorescent histogram regions that contain the event.
This function uses pregated positive regions of multiple fluorescence parameters to enumerate all possible subphenotypes for mixed cell populations as defined by the number of positive gated regions. Since an event is either located inside or outside of a given fluorescent region, its status can be represented by a single digit: 0 = outside, and 1 = inside. FCOM™ assigns each cellular event a numerical value based on all possible cellular gated polychromatic combinations that contain that event. These numerical values are then arrayed as discrete single parameter histogram peaks where each peak represents a defined polychromatic subphenotype. Regions used to delineate the positive events are set on either 1-parameter or 2-parameter histograms based on FMO negative control stains for each antibody. The absolute number of gated cells of each defined subphenotype can be assessed and displayed as a discrete peak on the FCOM™ histogram array. The relative percentages of all subphenotypes are also calculated in FCOM™ and displayed as a phenotype frequency register.
Since the number of subphenotypes for any polychromatic panel is an exponential function of the number of gated parameters (2ⁿ, where n = number of parameter gates), 32 subphenotypes were generated with the five antibodies described previously. Using the “Create Results Array” tool in FCOM™, regions were automatically drawn around each subphenotype, and the frequency and absolute number were displayed in a tabular register format.
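The inside/outside encoding described above can be sketched in Python as follows. This is not the FCOM™ implementation, whose internals are not described here; it is a minimal illustration that assumes each positive region can be reduced to a single hypothetical intensity threshold per marker, whereas the actual regions are drawn on 1- or 2-parameter histograms. All function names, thresholds, and event values are invented for illustration.

```python
from collections import Counter

MARKERS = ["CCR7", "CD45RA", "CD57", "CD27", "CD28"]  # the five gated parameters

def event_code(event, thresholds):
    """Encode one event as an integer: bit i is 1 if marker i is inside its positive region."""
    code = 0
    for i, marker in enumerate(MARKERS):
        if event[marker] >= thresholds[marker]:  # assumption: region = simple threshold
            code |= 1 << i
    return code

def code_label(code):
    """Translate an integer code back into a readable phenotype label."""
    return "/".join(m + ("+" if (code >> i) & 1 else "-") for i, m in enumerate(MARKERS))

def frequency_register(events, thresholds):
    """Return {code: (count, percent)} for all 2**n possible subphenotypes."""
    counts = Counter(event_code(e, thresholds) for e in events)
    total = len(events)
    return {c: (counts.get(c, 0), 100.0 * counts.get(c, 0) / total if total else 0.0)
            for c in range(2 ** len(MARKERS))}

# Hypothetical thresholds and events (arbitrary fluorescence units).
thresholds = {m: 100 for m in MARKERS}
neg = {m: 10 for m in MARKERS}
events = [
    {**neg, "CD27": 500, "CD28": 500},
    {**neg, "CD27": 500, "CD28": 500},
    dict(neg),
    {**neg, "CCR7": 500},
]
register = frequency_register(events, thresholds)
for code, (count, pct) in register.items():
    if count:
        print(code_label(code), count, f"{pct:.1f}%")
```

Because every event maps to exactly one integer code, the percentages in the register sum to 100, and empty subphenotypes still appear with a count of zero.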
The relative percentages of each of the 32 subphenotypes defined by FCOM™ were used for unsupervised cluster analysis. Average-linkage hierarchical clustering (Pearson correlation) was performed with The Institute for Genomic Research MultiExperiment Viewer software [TIGR MeV 3.1, www.tm4.org, (18)]. Data files generated by FCOM™ were reformatted to Tab Delimited, Multiple Sample (.txt) Files (TDMS Format), and were uploaded to TIGR MeV software.
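For readers unfamiliar with the clustering step, average-linkage hierarchical clustering with a Pearson correlation distance can be sketched in plain Python. This is not the TIGR MeV implementation; it is a small illustrative reimplementation, and the sample names and four-value frequency profiles below are invented toy data (the study used 32 subphenotype frequencies per sample).

```python
from math import sqrt

def pearson_distance(x, y):
    """1 - Pearson correlation between two equal-length frequency profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)  # undefined for constant profiles

def average_linkage(profiles):
    """Agglomerative clustering; returns the merge tree as nested tuples of sample names."""
    names = list(profiles)
    # Pairwise distances between the original samples.
    dist = {frozenset((a, b)): pearson_distance(profiles[a], profiles[b])
            for i, a in enumerate(names) for b in names[i + 1:]}

    def linkage(m1, m2):
        # Mean of all sample-to-sample distances between two clusters.
        return sum(dist[frozenset((a, b))] for a in m1 for b in m2) / (len(m1) * len(m2))

    clusters = [(name, [name]) for name in names]  # (tree, member samples)
    while len(clusters) > 1:
        pairs = [(linkage(clusters[i][1], clusters[j][1]), i, j)
                 for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)  # closest pair of clusters
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Toy relative frequencies (percent) for four hypothetical samples.
profiles = {
    "P1_IL21": [60, 25, 10, 5],
    "P2_IL21": [55, 30, 10, 5],
    "P1_IL15": [5, 10, 30, 55],
    "P2_IL15": [10, 5, 25, 60],
}
tree = average_linkage(profiles)
print(tree)
```

In this toy example the two IL-21-like profiles merge with each other before joining the IL-15-like pair, mirroring the kind of grouping by culture condition that the analysis is intended to reveal.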
Conventional Data Analysis
A representative example of conventional sequential 2-parameter dot-plot analysis of an 8-color staining experiment is displayed in Figure 1. IL-21 + low-dose IL-2 cultured cells from Patient EA32 were stained and used in the polychromatic phenotype analysis. The conventional analysis strategy consisted of filtering all live, gp100209-2M tetramer-positive CD8β+ T cells through a series of sequential gated 2-parameter dot-plots, with the initial screen occurring through a CCR7 staining gate (Fig. 1). The events in the CCR7-negative (Fig. 1A) and CCR7-positive (Fig. 1B) regions were then arrayed onto a 2-parameter histogram of CD45RA vs. CD57 fluorescence. Hinged gating based on FMO controls divided the populations into four subregions: CD45RA+/CD57−, CD45RA+/CD57+, CD45RA−/CD57−, CD45RA−/CD57+. Cells in these respective regions were then further arrayed onto additional 2-parameter histograms of CD27 vs. CD28 staining. This method of determining the subphenotype frequencies requires one to multiply frequencies of gated events from each sequential dot-plot to determine the final frequency of a given selected subphenotype. For example, to determine the frequency of CCR7−/CD45RA−/CD57−/CD28−/CD27− cells in the gp100209-2M specific, viable, CD8β+ T cell population (Fig. 1A), the CCR7−/tHLA+/CD8+ frequency (96.53%) is multiplied by the CD45RA−/CD57− frequency (70.94%); this product is then multiplied by the frequency of CD27−/CD28− dual negative cells in the CD45RA−/CD57− compartment (3.0%). The final frequency of CD8+, tetramer+ cells that have a CCR7−/CD45RA−/CD57−/CD27−/CD28− phenotype is thus calculated as: 96.53% × 70.94% × 3.0% = 2.02%. A similar calculation would be repeated for the other 15 subpopulations in the CCR7− population, and for the 16 CCR7+ populations (Fig. 1B). IL-21 + IL-2 IVS conditions result in high dominant expression of double positive CD27+/CD28+ CD8+ T cells for all four CD45RA vs. CD57 staining regions; the three major subphenotypes were CD45RA−/CD57−/CD27+/CD28+, CD45RA−/CD57+/CD27+/CD28+, and CD45RA+/CD57−/CD27+/CD28+ for both CCR7− (Fig. 1A) and CCR7+ cells (Fig. 1B). By contrast, a similar conventional dot-plot analysis of the same 8-color stain of IL-15 + IL-2 IVS CD8+ T cells from the same patient (Figs. 2A and 2B) showed predominantly double negative CD27−/CD28− and single positive CD28+ staining of cells from all four compartments of the CD45RA vs. CD57 dot-plot for both CCR7− and CCR7+ T cells. This basic approach to the resolution of 32 subpopulations of pregated, viable, antigen-specific CD8+ T cell subsets for both IVS conditions revealed clear phenotype subset differences between the two IVS test groups. However, the procedure is very cumbersome and time consuming, especially if multiple patient lymphocyte samples are tested and compared.
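The serial multiplication described above for Figure 1A can be written as a short Python sketch; the function name `serial_frequency` is an illustrative assumption. Note that with the rounded gate frequencies quoted in the text the product evaluates to approximately 2.05%, slightly above the reported 2.02%; the difference presumably reflects rounding of the displayed gate percentages (for instance the 3.0% value), although this cannot be confirmed from the text alone.

```python
def serial_frequency(*gate_percentages):
    """Multiply successive gate frequencies (in percent) into a final percent of the parent population."""
    fraction = 1.0
    for pct in gate_percentages:
        fraction *= pct / 100.0
    return 100.0 * fraction

# Rounded gate frequencies quoted for Fig. 1A:
# CCR7- gate, CD45RA-/CD57- gate, CD27-/CD28- gate.
final = serial_frequency(96.53, 70.94, 3.0)
print(f"{final:.2f}%")
```

The same function would have to be called once for each of the 32 terminal gates per sample, which illustrates why the manual procedure scales poorly.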
FCOM Data Analysis
An FCOM™ analysis was generated for the same data shown in Figure 1 by preselecting the same gp100209-2M-specific, viable, CD8+ T cell population, and screening the data through the five remaining staining parameter gates (CCR7, CD45RA, CD57, CD27, and CD28) after defining positive regions for each parameter using FMO controls. The FCOM™ analysis algorithms rendered tabular registers for absolute cell numbers and for relative percentages for all 32 subphenotypes in the sample (Fig. 3). Individual subphenotype FCOM™ event histograms of the absolute number of cells for all 32 subphenotypes are shown for the same patient sample stimulated with IL-21 + IL-2 (Fig. 4A) and with IL-15 + IL-2 (Fig. 4B). FCOM™ automatically adjusts the event scaling factor to include the full range of absolute cell counts for each subpopulation within a given test sample. Thus, in the examples shown (Figs. 4A and 4B) IL-21+ low-dose IL-2 generated much higher numbers of tetramer positive CD8+ T cells for some subphenotypes (>1,000) compared to the absolute numbers produced by IL-15 + low-dose IL-2 (<600). FCOM™ analysis thus provides a rapid approach for the generation of a series of event acquisition histograms for all 32 subphenotypes, and also offers the potential to compare efficiently the relative frequency distribution pattern of these subphenotypes between two or more samples by exporting the frequency register data into standard analysis software (Fig. 4C). The FCOM™-derived phenotype data shown in Figure 4C delineated ten major subphenotype differences between IL-21 + IL-2 vs. IL-15 + IL-2 CD8+ cells. These were exactly the same 10 major subphenotype signature differences which could be derived by conventional analysis of 2-parameter dot-plots shown in Figures 1 and 2. Thus, FCOM™ analysis rendered the same results as those obtained using conventional gating procedures. 
To verify the equivalence of conventional 2-parameter dot-plot analysis calculations and automated FCOM™ analysis, the frequencies of all 32 subphenotypes from patient EA32 stimulated with IL-21 + IL-2 and, separately, with IL-15 + IL-2 were first calculated from dot-plot arrays (Figs. 1 and 2). The correlation between subphenotype frequencies determined using FCOM™ and frequencies derived from conventional analysis procedures was R² = 0.999 for IL-21 + IL-2 IVS and R² = 0.9987 for IL-15 + IL-2 IVS (Fig. 5). Similarly high concordance was found for the other four patient IVS samples (data not shown). Tabular FCOM™ registers show both the number and frequency of cells in each subphenotype found for different test samples interrogated with a given combination of multiparameter specific antibodies (Fig. 3). This allows for rapid quantification of the data, and export to spreadsheet software to subsequently organize and visualize the data with simple graphic techniques when relatively small numbers of test samples are involved. Alternatively, FCOM™ frequency register data for each subset can be exported into cluster analysis software to analyze and compare staining frequency patterns between large numbers of test samples, and thus detect groupings or clusters of related staining phenotypes.
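A concordance check of this kind can be reproduced with a few lines of Python. The sketch below computes the squared Pearson correlation between two lists of subphenotype frequencies; the numbers are invented placeholders rather than the study data, and the function name `r_squared` is an assumption.

```python
def r_squared(x, y):
    """Squared Pearson correlation between paired frequency measurements."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical frequencies (percent) for four subphenotypes from each method.
conventional = [2.0, 10.0, 35.0, 53.0]
automated = [2.1, 9.8, 35.3, 52.8]
print(f"R^2 = {r_squared(conventional, automated):.4f}")
```

A value close to 1 indicates that the two methods rank and scale the subphenotype frequencies almost identically, although it does not by itself rule out a constant offset between them.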
Data Analysis and Visualization by Cluster Algorithm
Cluster algorithms can be employed to compare relative changes between and within individual samples for all identifiable subphenotypes, and can thus group large sample sets with many expression parameters per sample. Unsupervised average-linkage hierarchical clustering (Pearson correlation) was used to group the IVS cell samples in this study. The cluster algorithm generated two major clusters, A and B (Fig. 6). Cluster A included the two IVS samples from patient EA29, whereas the IVS samples from patients EA13, EA32, EA34, and EA35 were separately grouped in cluster B. The second generation of cluster trees grouped the samples according to their IVS condition. All samples stimulated with IL-15 + IL-2 were grouped in clusters Ac and Bc, whereas all samples stimulated with IL-21 + IL-2 were grouped in clusters Ad and Bd. Technical replicates of samples EA13 (EA13rep) and EA35 (EA35rep) were included in the analysis and clustered in major cluster B. The subgrouping of replicate samples was also in accordance with the IVS condition (Fig. 6). To verify the reproducibility and the stability of the analysis, the same type of cluster algorithm analysis was performed excluding the technical repeats, EA13rep and EA35rep, for both IVS conditions (Fig. 7A). The second generation of cluster trees again partitioned according to the IVS condition. Reciprocal analysis was performed with the technical repeats only. Here also, the second-generation cluster partitioning correlated with the two different IVS conditions (Fig. 7B). Importantly, the 10 major subphenotype differences between the two IVS conditions determined by conventional dot-plot and FCOM™ analysis were also resolved by visual inspection of the heat map (a two-dimensional matrix of phenotype frequencies and cluster trees for data interpretation) shown in Figure 6.
The recent developments in instrument design and fluorochrome chemistry for polychromatic flow cytometry now permit the simultaneous resolution and analysis of a wide range of subphenotypes in a given cell population. Modern multiparameter flow cytometry signal processing results in highly standardized and technically controlled data acquisition. However, because data acquired by polychromatic flow cytometry is also highly complex, more advanced, rapid, data reduction approaches are needed for the efficient interpretation of multiparameter data. We here demonstrate how multiple subphenotypes (32) generated by 8-color polychromatic flow cytometry analysis of cells from multiple samples can be analyzed by comparing relative differences in subphenotype staining patterns within and between individual test samples. The test samples for this study consisted of LN lymphocytes obtained from 5 melanoma patients, following vaccination with gp100209-2M peptide antigen. Lymphocytes from all 5 patients were cultured with cognate antigen in two different IVS conditions to generate data sets of cell populations with two distinct phenotype pattern distributions.
Conventional gating strategies using two-dimensional Cartesian dot-plot arrays resulted in the resolution of different frequency profiles for 32 CD8+ T cell subphenotypes generated with IL-21 + low-dose IL-2 IVS and, similarly, in IL-15 + low-dose IL-2 IVS cultures. Throughout our analysis we employed hinged gating of two-parameter biexponential scaling arrays for more precise assessment of positively and negatively stained cell populations. This approach reflects the growing consensus that such “logicle” arrays are superior to the more conventional logarithmic displays, since they allow for more accurate compensation of the sample and do not result in overcompensation of dim fluorescence signals [reviewed in (2)]. Hinged gating of biexponential plots may provide less of an advantage over conventional orthogonal gating of logarithmic displays when positive fluorescence signals are bright and well resolved from background fluorescence. However, most polychromatic staining arrays contain multiple antibodies which individually produce either dim or bright fluorescence, and many antibodies will simultaneously delineate discrete subsets of both dimly and brightly stained cells. Thus, a single analysis strategy is required to most accurately set compensation and gating for both bright and dim fluorescence in the same sample. Hinged gating was set on FMO controls for each parameter, which provide the most accurate negative control since they include all the combined nonspecific fluorescence effects of all antibodies in the test except the antibody for which the positive/negative threshold is being determined (2). This background nonspecific fluorescence is collected in the detector assigned to the fluorescence signal of the missing antibody.
The manner in which polychromatic panels are designed directly impacts the success in obtaining well-resolved, efficient separation of all positive fluorescence events for each parameter. Thus, the FMO background fluorescence can be very high if the FMO control for a given dimly fluorescent staining parameter is made up of several antibodies recognizing high-density antigenic epitopes conjugated to bright fluorochromes. High background FMO fluorescence in a given channel can in turn make it difficult to resolve all the dimly positive antigen-specific fluorescence events collected in the assigned detector. The potential undercounting of dimly positive fluorescent events against a high fluorescent background is a problem that is not resolved by the analysis and gating strategy described herein. However, by carefully matching the dimmest fluorochromes to epitopes with the highest expression density, and otherwise optimizing the reduction of inter- and intra-laser nonspecific fluorescence into the channels collecting the dimmest antigen-specific fluorescence signals, it is possible to minimize the undercounting of dimly positive cells. Such efforts to balance and optimize the amount of overlapping fluorescence in the staining panel are presently the best solution to increase the efficiency with which dimly positive events are counted.
Acquired data were processed by automated FCOM™ data reduction algorithms into a series of 32 event acquisition histograms, one for each subphenotype. The FCOM™ automated data reduction step is a critical new software tool for the rapid processing and resolution of frequency distributions for all subphenotypes in a given multicolor experiment. The 5 patient samples in our study yielded 320 separate phenotype frequency calculations for 32 subphenotypes (i.e. 32 × 5 patients × 2 culture conditions). Conventional analysis techniques used to calculate these frequencies include the serial multiplication of percentages as one moves through the gating hierarchy, or the alternative technique of dividing the number of events in the final subphenotype by the total events in the initial parent population of interest (in our study, all CD8β+/gp100209-2M tetramer+/7-AAD− T cells). Either procedure is time consuming, and becomes more so as the number of parameters increases. After setting just one positive and one negative boundary region for each parameter, the use of FCOM™ reduces the multiparameter analysis time from several hours in many cases to minutes (usually less than 30–60 min for most experiments). The example presented here indicates that FCOM™ is a key component that allows the rapid, efficient comparison of polychromatic flow cytometry data from large sets of test samples, and provides the basis for any subsequent data processing (such as cluster analysis). We here demonstrated high correlation between conventional and automated FCOM™ subphenotype resolution. Reliable automated data reduction techniques such as FCOM™ analysis open a wide spectrum of data processing strategies to subsequently group, visualize, and finally interpret complex multiparameter flow cytometry data.
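The two conventional techniques mentioned above (serial multiplication of gate percentages and direct division of final events by parent events) are arithmetically equivalent, which the following Python sketch illustrates. The event counts are invented for illustration and do not come from the study; only the 32 × 5 × 2 tally is taken from the text.

```python
# Hypothetical event counts along one gating path.
parent_events = 10000          # all events in the parent population of interest
ccr7_neg_events = 9000         # of those, events in the CCR7- gate
cd45ra_cd57_neg_events = 6000  # of those, events in the CD45RA-/CD57- gate
final_events = 300             # of those, events in the CD27-/CD28- gate

# Technique 1: multiply the frequency observed at each step of the hierarchy.
serial = (ccr7_neg_events / parent_events) \
    * (cd45ra_cd57_neg_events / ccr7_neg_events) \
    * (final_events / cd45ra_cd57_neg_events)

# Technique 2: divide the final event count by the parent event count.
direct = final_events / parent_events

print(f"serial: {100 * serial:.2f}%  direct: {100 * direct:.2f}%")

# Number of such calculations in the study design described in the text.
n_calculations = 32 * 5 * 2
print(n_calculations)
```

The intermediate counts cancel in the serial product, so both routes give the same frequency when unrounded values are used; small discrepancies arise only when rounded percentages are multiplied.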
One possible tool to process complex flow cytometry data, once it has been reduced to a register of event frequency histograms for all subphenotypes, is the use of cluster algorithms (10, 11). The unsupervised cluster algorithms applied here separated the patient samples into two distinct clusters. Sample EA29 was repeatedly grouped separately, indicating a distinct phenotype cluster. Analysis of a larger sample pool may detect samples similar to EA29, which could then be correlated to clinical or biological data. The other major cluster included the remaining 4 patients, with two subcluster groupings of similar subphenotype expression patterns that correlated separately with the two IVS conditions.
Visual inspection of the heat maps in this study delineated 10 obvious phenotypes that appeared to be differentially expressed between the two IVS culture conditions; these were the same 10 major phenotype differences initially determined by FCOM™ analysis (Fig. 4C). Besides visual inspection of heat maps, other software algorithms such as “Self Organizing Maps” (SOM) or “Clustering Affinity Search Technique” can be used to detect differentially expressed subphenotypes in larger, more complex sample sets (19, 20). Applying the SOM algorithm to our data set resolved many of the same phenotype differences and reconfirmed the identity of the distinguishing subphenotypes initially detected by visual inspection of the heat map.
Because of the small sample size, we performed technical replicates of two samples to demonstrate the stability of the grouping approach. With or without the technical replicates, the grouping was in concordance with the IVS condition. The heat map and the cluster trees indicate minor differences between the original samples and the replicates, which may be attributed to intraexperimental variance. More importantly, in all three analyses (Figs. 6 and 7) the major grouping of replicates was still in accordance with the IVS condition, demonstrating the potential to detect reproducible expression signatures.
A recent report has described the use of cluster analysis to show the correlation between flow cytometry phenotype data from lymphocytes of tumor-draining LNs in colon cancer patients and the presence of metastatic disease (10). Based on cluster analysis of phenotype data from 4-color flow cytometry, a correlation was found between expression levels of 65 T cell markers and metastatic status. Malignant hematopoietic diseases have commonly been classified based on flow cytometry analysis for the expression of specific cell surface markers (21). More recently, 3- and 4-color flow cytometry has been used in combination with cluster analysis to define subtypes of hematopoietic malignancies (11). As reported, the analysis of different cases of chronic lymphocytic leukemia with multicolor flow cytometry combined with grouping through cluster algorithms helped to achieve two clinical goals: (a) databases can be generated to more rapidly and efficiently identify characteristic expression patterns for diagnostic purposes, and (b) known, as well as newly identified, expression patterns can be correlated to clinical outcome to describe new, clinically relevant subgroups of hematopoietic malignancies. These two recent reports suggest that there may be a wide range of possible applications for the data reduction and interpretation strategy presented herein.
Both recently published examples of colorectal cancer and chronic lymphocytic leukemia analysis show the clinical value of the combination of flow cytometry with subsequent analysis through cluster algorithms. However, neither of the 3–4 color studies employed a method to initially reduce complex data in a rapid automated manner based on precise gating in the context of more highly evolved 5–17 color flow cytometry. Such an approach would be required prior to cluster analysis to more rapidly process very large numbers of subphenotype frequency distributions for a large number of test samples. The three-step analysis strategy presented here, including Hyperlog™ gating, automated FCOM™ data reduction, and subsequent grouping through cluster algorithms is a feasible strategy for the more rapid and precise interpretation of multiparameter flow cytometry data, and opens a wide range of clinical and scientific applications to the newly emerging analytical power of polychromatic flow cytometry.