### Abstract

The design of a panel to identify target cell subsets in flow cytometry can be difficult when no marker is unique to each cell subset, so that a combination of parameters must be used to identify the target cells of interest and exclude irrelevant events. The ability to objectively measure the contribution of a parameter or group of parameters toward target cell identification, independent of any gating strategy, could therefore be very helpful for both panel design and gating strategy design. In this article, we propose a discriminative information measure evaluation (DIME) based on statistical mixture modeling. DIME is a numerical measure, derived from the fitted posterior distribution of a Gaussian mixture model, of the contribution of different parameters toward discriminating a target cell subset from all others. Informally, DIME measures the “usefulness” of each parameter for identifying a target cell subset. We show how DIME provides an objective basis for inclusion or exclusion of specific parameters in a panel, and how ranked sets of such parameters can be used to optimize gating strategies. An illustrative example of the application of DIME to streamline the gating strategy for a highly standardized carboxyfluorescein succinimidyl ester (CFSE) assay is described. © 2010 International Society for Advancement of Cytometry

### INTRODUCTION

Multiparameter flow cytometry (FCM) technology has seen dramatic advances in recent years, with five or more color assays now performed routinely in many basic and translational research laboratories (1). Standardization of all aspects of FCM, from instrument setup to data analysis, is an ongoing effort by multiple organizations, because standardization is necessary for consistent data comparison across sites (2). Multiple investigators have pioneered the use of advanced techniques and technologies to accurately evaluate immune cell subsets in multicenter programs for well over two decades, including the use of backgating for measuring purity and recovery combined with checksums (3), use of CD45 in three-color (4) and four-color assays (5), use of a single platform technology for absolute counts (6, 7), panleukogating (8), and the use of prealiquoted lyophilized reagents (2).

In the context of FCM assays performed in good clinical laboratory practice (GCLP)-compliant laboratories, demonstration of reproducibility is critical for clinical acceptance (9–11). The reproducibility of FCM assays relies on key elements of the assay being standardized and well-characterized, including instrument and reagent qualification, sample preparation processes, and analysis protocols (12–15). Over the past few years, multicenter standardization studies for many types of flow assays have consistently shown that suboptimal data analysis methods are one of the most significant sources of variability (2, 16, 17). Variability can be reduced by collection of sufficient events, use of appropriate controls, careful parameter selection, and optimized gating strategies (18–21). In the context of analysis, the use of highest purity and lowest contamination measures as well as backgating can aid in the design of appropriate gates.

Simple gates in any two projected dimensions may not suffice to minimize false-positive events and maximize true-positive events. Hence, in the drive to maximize recovery and purity (22), gating strategies can become increasingly complex even when relatively few parameters are being measured, with arbitrarily shaped gates and multiple gating generations being used. Although this is effective within a single laboratory, it is difficult to apply complex gating strategies consistently across different instruments, operators, and laboratories. Thus, the ability to objectively measure the contribution of a specific parameter or combination of parameters toward target cell identification, independent of any gating strategy, could be very helpful for both panel and gating strategy design.

Recent developments in computational statistics allow us to discover and monitor target cell subsets directly in multiple dimensions without the use of a sequence of gates. Several groups, including ours, have recently published gating-free model-based approaches to cell subset identification using statistical mixtures of Gaussian, t, or skewed distributions (23–26). Here, we show that the predictive density resulting from such model-based approaches can be exploited to perform a discriminative information measure evaluation (DIME) for FCM parameters. DIME analysis allows us to evaluate parameter usefulness for identifying a target cell subset that can be specified as some collection of mixture components. From a biological perspective, DIME provides insight into optimal parameter combinations that characterize a cell subset in a way that is independent of any particular gating strategy. Practically, DIME provides an objective basis for standardizing the analysis of FCM panels in multicenter clinical trials and can contribute to improved assay reproducibility.

We show the application of DIME to the design of a simplified gating strategy for a carboxyfluorescein succinimidyl ester (CFSE)-based assay designed to measure CD4 and CD8 T lymphocyte proliferation following antigen challenge. The context for this proof-of-concept analysis was a three-center pilot study (BD Biosciences, Université de Montreal/NIML, and Duke University) sponsored by DAIDS to standardize the assessment of T lymphocyte proliferation using a panel for CD3, CD4, CD8, CFSE, and an amine viability stain. Experts at the three centers had, through careful evaluation of their collective data, developed a standard consensus gating strategy designed to reduce background and enhance detection of specific proliferation.

### DISCUSSION

We have described a novel and effective statistical method for evaluating the usefulness of any given parameter or group of parameters for discriminating a target cell subset; the method uses a discriminative measure calculated from the density of a fitted mixture model. To the best of our knowledge, this is the first description of an automated, quantitative approach to evaluate how useful any given parameter in a flow cytometric panel is for identifying cell subsets of interest. We also demonstrated, using a CFSE proliferation assay as an example, that such a measure is potentially useful for both panel optimization and gating strategy design, especially in the context of assay standardization.

In the chosen example, as expected, CFSE, CD4, and CD8 were the most informative parameters. That the scatter and amine parameters were among the least informative is also reasonable, because these parameters are used primarily to reduce interference and background by negatively selecting dead/dying cells and cellular debris and are not part of positive selection for the target CD4 or CD8 proliferating cell populations. However, a surprising and very interesting result of our analysis was that, in the presence of CD4 and CD8, CD3 did not contribute materially to the discrimination of proliferating and nonproliferating lymphocytes in a CFSE assay. From a technical perspective, the samples used in the CFSE assays were PBMCs that would have included general populations of lymphocytes and monocytes. In this experiment, PBMC samples were analyzed after a 6-day stimulation with antigen-specific peptide or anti-CD3/anti-CD28 mAbs. Potentially contaminating monocytes, NK cells, and B cells would be either strongly adherent or dead by the time of sample staining and acquisition. In hindsight, therefore, it is biologically reasonable that CD3 was not necessary to discriminate CD4+ and CD8+ lymphocytes in our CFSE experiments, as it is likely that only CD4+ and CD8+ T cells remained after the 6-day stimulation. However, we strongly caution that the lack of information provided by CD3 in this experimental example should not be generalized outside of this highly standardized CFSE assay. The considerable effort invested in optimizing and standardizing each aspect of the assay used in our analysis yields highly reproducible results that were not previously possible (27), and applying the same parameter usefulness analysis to an otherwise identical data set acquired in a less standardized setting may not yield similar results. Rather, the point of this example is that commonly used parameters may not add value to the panel, and the potential usefulness of each parameter used in a given panel is best determined with an objective measure like DIME.

The principle underlying DIME is conceptually simple. In contrast to the sequential 2D approach of conventional gating, multidimensional statistical analysis identifies cell populations using all parameters simultaneously. Given a fitted model, DIME formalizes the following idea: if we “drop” one or more parameters at a time and re-evaluate the likelihood that an event in the target cell population still belongs to the population it was assigned to when all parameters were present, we learn how much the “dropped” parameters contribute to target cell subset identification. DIME thus extends the multidimensional analysis to provide a quantitative measure of the contribution each parameter makes toward identifying each cell subset of interest, giving an objective basis for panel and gating strategy design.
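The “drop a parameter and re-evaluate” idea can be illustrated with a minimal sketch. This is not the DIME formula from the Materials and Methods section; it is a simplified stand-in that assumes a Gaussian mixture with diagonal covariances, so that marginalizing a parameter out of a component amounts to omitting that dimension. The function names (`target_posterior`, `drop_one_scores`), the toy mixture, and the toy events are all hypothetical and chosen only for illustration.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def target_posterior(event, weights, means, variances, target, keep):
    """Posterior probability that `event` belongs to any component in `target`,
    computed from the dimensions listed in `keep` only.

    Assumption of this sketch: covariances are diagonal, so dropping a
    parameter simply omits its term from each component's log density.
    """
    log_terms = [
        math.log(w) + sum(log_normal_pdf(event[d], mu[d], var[d]) for d in keep)
        for w, mu, var in zip(weights, means, variances)
    ]
    top = max(log_terms)  # subtract the maximum for numerical stability
    probs = [math.exp(t - top) for t in log_terms]
    return sum(probs[k] for k in target) / sum(probs)


def drop_one_scores(events, weights, means, variances, target):
    """For each parameter, the mean target posterior retained by target events
    after that parameter is dropped. A lower score means the dropped parameter
    contributed more to identifying the target subset."""
    all_dims = list(range(len(means[0])))
    # Events assigned to the target subset when all parameters are present.
    members = [
        e for e in events
        if target_posterior(e, weights, means, variances, target, all_dims) > 0.5
    ]
    if not members:
        raise ValueError("no events were assigned to the target subset")
    scores = {}
    for dropped in all_dims:
        keep = [d for d in all_dims if d != dropped]
        retained = [
            target_posterior(e, weights, means, variances, target, keep)
            for e in members
        ]
        scores[dropped] = sum(retained) / len(retained)
    return scores


# Toy two-component mixture (made-up values): parameter 0 separates the
# components, parameter 1 is identical in both and carries no information.
weights = [0.5, 0.5]
means = [(0.0, 5.0), (10.0, 5.0)]
variances = [(1.0, 1.0), (1.0, 1.0)]
events = [(0.2, 5.1), (-0.5, 4.0), (1.0, 6.2), (9.8, 5.0), (10.5, 4.4)]

scores = drop_one_scores(events, weights, means, variances, target=[0])
for dim, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"parameter {dim}: retained posterior after dropping = {score:.3f}")
```

In this toy setting, dropping parameter 0 leaves the target events indistinguishable from the other component, whereas dropping parameter 1 changes essentially nothing. A model with full covariance matrices would need the corresponding sub-vector of the mean and sub-matrix of the covariance instead of simply omitting a term.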

An objective, automatically computable measure of parameter usefulness has many benefits. In the context of clinical FCM assays in GCLP-compliant laboratories, DIME is useful for reducing the amount of trial-and-error in optimal gating strategy design by identifying the most informative parameters. As the number of parameters increases, the availability of DIME can significantly reduce the effort to find a target cell subset by identifying the subgroups of maximally informative parameters. The use of an objective basis for rationally designing a gating strategy is also likely to increase acceptance of that strategy by flow experts at different institutions. In turn, acceptance and use of a common gating strategy contributes to reducing the variability in multicenter studies. In some cases, DIME may reveal that certain parameters provide no additional information and can be dropped from the panel when cost is an issue. Panel reduction may also result in increasing the applicability of an assay by making the assay feasible on less sophisticated cytometers.
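The search for subgroups of maximally informative parameters can be sketched in the same spirit. The example below is an illustration under stated assumptions rather than the published procedure: it again assumes a diagonal-covariance Gaussian mixture, scores each candidate subset by the mean posterior that target events retain when only that subset is used, and enumerates subsets exhaustively, which is practical only for small panels. The names `target_posterior` and `rank_subsets` and all numerical values are hypothetical.

```python
import math
from itertools import combinations


def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def target_posterior(event, weights, means, variances, target, keep):
    """Posterior probability of the target components using only the
    dimensions in `keep` (diagonal covariances assumed in this sketch)."""
    log_terms = [
        math.log(w) + sum(log_normal_pdf(event[d], mu[d], var[d]) for d in keep)
        for w, mu, var in zip(weights, means, variances)
    ]
    top = max(log_terms)  # subtract the maximum for numerical stability
    probs = [math.exp(t - top) for t in log_terms]
    return sum(probs[k] for k in target) / sum(probs)


def rank_subsets(events, weights, means, variances, target, size):
    """Rank every parameter subset of the given size, best first, by the mean
    target posterior that target events retain when only that subset is used."""
    all_dims = range(len(means[0]))
    # Events assigned to the target subset when all parameters are present.
    members = [
        e for e in events
        if target_posterior(e, weights, means, variances, target, all_dims) > 0.5
    ]
    if not members:
        raise ValueError("no events were assigned to the target subset")
    scored = []
    for subset in combinations(all_dims, size):
        retained = [
            target_posterior(e, weights, means, variances, target, subset)
            for e in members
        ]
        scored.append((subset, sum(retained) / len(retained)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy mixture (made-up values): parameter 0 separates the components strongly,
# parameter 1 weakly, and parameter 2 not at all.
weights = [0.5, 0.5]
means = [(0.0, 0.0, 5.0), (4.0, 1.0, 5.0)]
variances = [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0)]
events = [(0.2, 0.1, 5.0), (-0.5, -0.3, 4.2), (1.0, 0.4, 6.1), (4.1, 1.2, 5.0)]

ranking = rank_subsets(events, weights, means, variances, target=[0], size=2)
for subset, score in ranking:
    print(f"parameters {subset}: mean retained posterior = {score:.3f}")
```

For larger panels, exhaustive enumeration grows combinatorially, so a greedy forward or backward selection would be a natural substitute; that choice is left open here.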

DIME can also be used to evaluate whether different markers perform equivalently, and to assess the potential loss or gain of sensitivity and specificity that results from swapping markers. This can then be used to inform decisions on panel construction. For example, in cell sorting applications, it is necessary to use live cells, and hence a useful question that can be answered using discriminative measures could be “What is the impact of swapping an intracellular marker, which requires permeabilization and fixation to identify, with one or more cell surface markers that may be used to identify viable cells?” Computationally, the discriminative measure provides a natural mechanism for feature selection, and we are currently investigating the extent to which this can be used for adaptive dimension reduction in clustering applications.

An obvious caveat is that DIME can only tell us about the usefulness of parameters for the particular data sets actually analyzed. If a parameter makes no contribution in the majority of data samples but is critical for occasional anomalous samples, this will not be reflected by DIME unless the anomalous data samples are also analyzed. Clearly, DIME should not be used to exclude such parameters that are known to be of high informational value in anomalous data samples but are otherwise redundant in typical data sets. However, if the test data set on which DIME is assessed is representative of the universe of data samples, then DIME provides robust information to guide decisions on marker inclusion, exclusion or exchange. Similarly, the measure of parameter usefulness is always with respect to a particular target cell subset. For different target cells, different parameters may be relevant, and this will be reflected in the DIME analysis of that target cell subset. In particular, this article has evaluated the usefulness of DIME for a highly standardized data set, and the practicality of this statistical tool for less standardized “real world” data needs further confirmation.

It is also clear that DIME is only as good as the clustering algorithm and will only provide a reliable guide if the cluster of interest has high sensitivity and specificity with respect to the “true” target cells. Finally, we admit that the use of DIME is highly restricted at present, because it relies on an existing fitted statistical model of the data, and such analysis is at present only carried out in specialized research settings. However, research in statistical mixture modeling of flow data is rapidly progressing and will become increasingly important and relevant as data dimensionality and throughput increase.

Fully automated gating techniques are currently being intensively researched at several institutions including ours, but manual gating is likely to remain the standard practice for some time. We show here that model-based approaches to automated cell subset identification can be complementary to manual gating and provide useful information to guide both parameter selection and gating strategy design.