Keywords:

  • hierarchical clustering;
  • minimal residual disease;
  • acute lymphoblastic leukemia;
  • support vector machines;
  • multiparameter flow cytometry

Abstract

Flow cytometry is a valuable tool in research and diagnostics including minimal residual disease (MRD) monitoring of hematologic malignancies. However, its gradual advancement toward increasing numbers of fluorescent parameters leads to information rich datasets, which are challenging to analyze by standard gating and do not reflect the multidimensionality of the data.

We have developed a novel method to analyze complex flow cytometry data, based on hierarchical clustering analysis (HCA) but with a new underlying algorithm that uses the Mahalanobis distance measure. HCA is scalable to complex multiparameter datasets (here demonstrated on up to 12-color flow cytometry and on a 20-parameter synthetic dataset).

We have validated this method by comparison with standard gating approaches when performed independently by expert cytometrists. Acute lymphoblastic leukemia blast populations were analyzed in diagnostic and follow-up datasets (n = 123) from three centers. HCA results correlated very well (Passing–Bablok correlation coefficient = 0.992, slope = 1, intercept = −0.01) with standard gating data obtained by the I-BFM FLOW-MRD study group. To further improve the performance in follow-up samples with low MRD levels and to automate MRD detection, we combined HCA with support vector machine (SVM) learning.

HCA in combination with SVM provides a novel diagnostic tool that not only allows analysis of increasingly complex flow cytometry data but also is less observer-dependent compared with classical gating and has potential for automation. © 2011 International Society for Advancement of Cytometry

In childhood acute lymphoblastic leukemia (ALL), response to therapy as measured by minimal residual disease (MRD) monitoring is an important biomarker for predicting relapse and stratifying treatment (1–5). MRD can be assessed by molecular analysis of B- and T-cell receptor gene rearrangements or by flow cytometric analysis of aberrant immunophenotypes. Flow cytometric MRD monitoring is a fast and sensitive method and has been incorporated in several large childhood ALL clinical trials (1, 6, 7). However, flow cytometry generates increasingly large and information-rich datasets, which provide new challenges for analysis. Modern multilaser flow cytometers are able to simultaneously measure up to 12 or more parameters and acquire such information from millions of single cells (8, 9). Traditional gating of populations on two-parameter plots is tedious (e.g., 28 plots in six-color flow cytometry, 66 plots for 10-color analysis, 91 plots for 12-color analysis, etc.) and does not reflect the multidimensionality of the data. Moreover, both the setting of the gates and interpretation of the results are observer-dependent and require intensive training and high levels of expertise. Therefore, new analytical tools are needed that are less observer-dependent, reflect the multidimensionality of the data, and enable automation. This would facilitate the more widespread use and applicability of flow cytometry for MRD monitoring within large international multicenter trials.
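
The plot counts quoted above follow from choosing two axes out of all recorded parameters. A minimal sketch in Python, assuming that each panel records forward and side scatter in addition to the fluorescence channels (the function name `pairwise_plot_count` is illustrative and not part of the published method):

```python
from math import comb

def pairwise_plot_count(n_colors, n_scatter=2):
    """Number of distinct two-parameter plots for one antibody panel.

    Assumes every fluorescence channel plus `n_scatter` light-scatter
    parameters (forward and side scatter) can serve as a plot axis.
    """
    return comb(n_colors + n_scatter, 2)

for colors in (6, 10, 12):
    print(colors, "colors:", pairwise_plot_count(colors), "plots")
```

Under this assumption, the six-, 10-, and 12-color panels give 28, 66, and 91 plots, matching the figures in the text.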

Alternative analytical methods have been tested for flow cytometry data (10–16), but most of them rely on prior knowledge of the number of clusters (cell populations) expected in the sample. These methods produce a limited number of clusters without information on their inner hierarchy (subpopulations). Although hierarchical clustering methods can place each cell in its hierarchical context within the analyzed dataset, their downside is that they often fail to reflect the elliptical shapes of flow cytometric populations. One suggested solution lies in the use of Mahalanobis distance measurement for computing the distance of clusters (17); however, this method was unable to cluster samples from single events, a feature necessary for the detection of small populations such as in monitoring MRD in patients with leukemia. Therefore, we have developed a novel algorithm that retains the advantages of hierarchical clustering and uses Mahalanobis distance measurement, but which is able to cluster data from single events. With this approach, it is possible to present complex flow data in one figure while still allowing easy separation of subpopulations and their quantification. Here, we validate this novel analysis algorithm in 123 MRD datasets from 45 patients with ALL (including data from 38 patients from the I-BFM FLOW-MRD study group QC investigations, similar to recently published work) (18, 19).

DESIGN AND METHODS

Samples and Flow Cytometry

Samples from bone marrow or peripheral blood were collected, processed, and analyzed by flow cytometry using standard protocols (18, 20). Samples were contributed by laboratories from the respective national reference centers. An example of a typical gating strategy is shown in Supporting Information Figure 1. All work was approved by local Ethics Committees, and informed consent was obtained from patients or patients' parents or legal guardians according to the Declaration of Helsinki.

Data Preprocessing

We analyzed flow cytometry datasets that were acquired with FACSCalibur, CantoII, or LSRII cytometers (Becton Dickinson, San Jose, CA) or Dako Cyan ADP (Beckman Coulter, Brea, CA). Data were exported from their respective operating software (Cell Quest Pro, DiVa, or Summit) into FCS files, version 2.0 or 3.0. Raw data were extracted from FCS files and imported into the MATLAB environment (MathWorks, Natick, MA), in which all subsequent steps were carried out. Data from the FACSCalibur were exported as compensated and log-transformed datasets. Data from the CantoII and Dako Cyan were exported as linear, noncompensated datasets. For compensation, the fluorochrome spill-over table exported from the operating software was transposed and inverted to form a compensation matrix, which was then applied to the data. Linear data were processed by the HyperLog transform (21) to allow better visualization of the data distributions. A logarithmic scale display is very useful for medium to high values but does not correctly display low and negative values, for which a linear scale is more appropriate. The HyperLog transformation combines both scales. It is an inverse hybrid linear/exponential function that is defined over the real-numbered domain. It allows for a smooth transition from negative and low positive values on a linear scale to higher values on a logarithmic scale, resulting in a single continuous display. The resulting display of data is based on the number of decades to be displayed, the resolution of the cytometer (or analog-to-digital conversion), and the coefficient (here, the fifth percentile of negative values in each parameter) controlling the range of linearly displayed data. All datasets were normalized (z-score) before further analysis.
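
The compensation and normalization steps can be illustrated with a small two-channel example. This is a sketch in Python rather than the authors' MATLAB code. It assumes that entry `spillover[i][j]` of the spill-over table is the fraction of dye i detected in channel j, and that the z-score uses the sample standard deviation; all function names are hypothetical. The HyperLog transform is not reproduced here.

```python
from statistics import mean, stdev

def transpose(m):
    return [list(col) for col in zip(*m)]

def invert_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("spill-over matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def compensate(event, spillover):
    """Recover dye signals from the detector readings of one event.

    Assumes spillover[i][j] is the fraction of dye i seen in channel j,
    so that observed = transpose(spillover) * true. The compensation
    matrix is therefore the inverse of the transposed table.
    """
    comp = invert_2x2(transpose(spillover))
    return [sum(c * v for c, v in zip(row, event)) for row in comp]

def z_score(values):
    """Center one parameter to mean 0 and scale to unit sample deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Dye 0 leaks 20% into channel 1; dye 1 leaks 10% into channel 0.
spillover = [[1.0, 0.2], [0.1, 1.0]]
observed = [105.0, 70.0]  # readings produced by true signals of 100 and 50
print(compensate(observed, spillover))
print(z_score([1.0, 2.0, 3.0]))
```

For this toy event, compensation recovers signals of approximately 100 and 50. A real panel needs a general matrix inverse for as many channels as there are fluorochromes.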

Hierarchical Clustering

Hierarchical clustering analysis (HCA) is an alternative method to traditional gating for identifying cells with similar characteristics. This method measures the similarity of cells based on the complete profile of all recorded parameters (i.e., both light scatter and all fluorescence channels) rather than based on consecutive single or dual parameter comparisons. HCA builds a hierarchical tree (dendrogram) of cell populations by bottom-up merging of clusters of cells based on their similarities. This merging process starts from single cells and results in a single cluster consisting of all the cells. The internal structure of the resulting dendrogram reflects the merging process that, in turn, reflects the hierarchy of populations in the input dataset.

We developed a new adaptive linkage algorithm for HCA, called Mahalanobis-average linkage, which is especially suitable for flow cytometry data that often contain cell populations of elongated multidimensional ellipsoid shapes (Supporting Information Fig. 2). The Mahalanobis-average linkage algorithm proved to be superior to other HCA metrics as tested on a synthetic dataset (Supporting Information Fig. 3). This translates into correct population recognition when compared with other HCA metrics (Supporting Information Fig. 4). It uses a scale-invariant Mahalanobis distance (22) to define the proximity of clusters. The Mahalanobis distance between an ellipsoid (fitted to a cell cluster) and a point (a single cell) is the Euclidean (ordinary) distance of the point from the center of the ellipsoid, scaled by the length of the ellipsoid in the direction from the center to the point. This means, for example, that all the points at the “surface” of the ellipsoid have the same Mahalanobis distance to the center of the ellipsoid. During the merging process, the distance of two clusters is computed as the mean of Mahalanobis distances between all data points from one cluster and the ellipsoid fitted to the other cluster and vice versa. For single cells (or observation vectors) and clusters containing small numbers of cells, ellipsoids that would correctly fit (i.e., represent) such small clusters could not be computed, so Euclidean distance was used. Therefore, Mahalanobis-average linkage starts in a pure HCA fashion from single datapoints. When computing intercluster distance, the linkage smoothly shifts from Euclidean through weighted Euclidean/Mahalanobis to Mahalanobis distance measurement. The shift is controlled by a threshold above which only Mahalanobis distance is used. This threshold parameter is proportional to the dataset size. For 10⁴ events, the threshold was set at 0.1% of all events. This means that a minimum of 10 events is used to define a fitted multidimensional ellipsoid needed for Mahalanobis distance measurements. The distance to clusters consisting of fewer observations is computed as the weighted mean of Mahalanobis and Euclidean distances: the fewer observations in the cluster, the weaker the contribution of the Mahalanobis distance.
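
The linkage rule described above can be sketched for two-parameter data as follows. This is an illustration in Python under stated assumptions and not the authors' implementation. The text does not give the exact weighting function, so a linear weight `len(cluster) / threshold` is assumed. The fallback to Euclidean distance for degenerate covariances and all function names are also assumptions.

```python
import math

def fit_ellipsoid(cluster):
    """Return the center and sample covariance (sxx, sxy, syy) of 2-D points."""
    n = len(cluster)
    cx = sum(p[0] for p in cluster) / n
    cy = sum(p[1] for p in cluster) / n
    denom = max(n - 1, 1)  # avoids division by zero for a single event
    sxx = sum((p[0] - cx) ** 2 for p in cluster) / denom
    syy = sum((p[1] - cy) ** 2 for p in cluster) / denom
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in cluster) / denom
    return (cx, cy), (sxx, sxy, syy)

def euclidean(point, center):
    return math.hypot(point[0] - center[0], point[1] - center[1])

def mahalanobis(point, center, cov):
    """Distance from the center, scaled by the ellipsoid's extent in that direction."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - center[0], point[1] - center[1]
    # Quadratic form d^T * inverse(cov) * d, written out for the 2x2 case.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(d2, 0.0))

def point_to_cluster(point, cluster, threshold=10):
    """Blend Euclidean and Mahalanobis distance according to cluster size."""
    center, cov = fit_ellipsoid(cluster)
    d_euclid = euclidean(point, center)
    sxx, sxy, syy = cov
    if sxx * syy - sxy * sxy <= 1e-12:
        return d_euclid  # no usable ellipsoid (e.g., a single event)
    # ASSUMPTION: linear weight; pure Mahalanobis once the threshold is reached.
    weight = min(1.0, len(cluster) / threshold)
    return weight * mahalanobis(point, center, cov) + (1.0 - weight) * d_euclid

def linkage_distance(cluster_a, cluster_b, threshold=10):
    """Mean distance of all points of each cluster to the other cluster's ellipsoid."""
    distances = [point_to_cluster(p, cluster_b, threshold) for p in cluster_a]
    distances += [point_to_cluster(p, cluster_a, threshold) for p in cluster_b]
    return sum(distances) / len(distances)

# An elongated cluster: (2, 0) and (0, 1) lie on the same fitted ellipse.
cluster = [(-2.0, 0.0), (2.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
center, cov = fit_ellipsoid(cluster)
print(mahalanobis((2.0, 0.0), center, cov), mahalanobis((0.0, 1.0), center, cov))
```

The two printed distances agree even though the Euclidean distances from the center are 2 and 1. This is the "surface of the ellipsoid" property that makes the measure suitable for elongated populations.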

We analyzed datasets with n = 10⁴ events because storing the pairwise distances between data points requires O(n²) memory, which makes it impractical (or even impossible) to analyze larger datasets on current desktop computers.
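
The quadratic memory growth can be made concrete with a short calculation. The sketch below assumes that each pairwise distance is stored once as an 8-byte floating-point value; the actual storage layout of the authors' implementation is not stated in the text.

```python
def distance_memory_gib(n_events, bytes_per_value=8):
    """Memory needed to hold every pairwise distance once, in GiB.

    ASSUMPTION: one 8-byte value per unordered pair of events.
    """
    n_pairs = n_events * (n_events - 1) // 2
    return n_pairs * bytes_per_value / 2**30

for n in (10**4, 10**5):
    print(n, "events:", round(distance_memory_gib(n), 2), "GiB")
```

Under these assumptions, 10⁴ events need about 0.37 GiB, whereas 10⁵ events would need about 37 GiB, a 100-fold increase for a 10-fold larger dataset.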

The use of Mahalanobis-average linkage allowed us not only to build the hierarchy from single events but also to retain the advantage of using Mahalanobis distance measurements to compute the distance between larger clusters. The resulting hierarchy of the cells is displayed as a dendrogram (hierarchical tree), accompanied by the dataset table. This table is in the form of a heatmap, in which the individual parameter values are color-coded (e.g., blue, low expression; red, high expression). This display, which we term a dendroheatmap, allows visualization of the hierarchically clustered flow data with all parameters displayed in a single plot. Cell populations are then selected by the investigator as clusters (branches of the dendrogram) based on their inner compactness and distance to outer clusters. Alternatively, clusters can be selected automatically by cutting the dendrogram at an absolute intercluster distance threshold (which is either chosen or computed). These clusters can then be plotted on traditional scatter plots.

Supervised Learning

To automate detection of clusters of interest, for example, MRD in follow-up samples, a support vector machine (SVM) was used as a supervised learning method (using the Spider SVM package for MATLAB) (23). SVM classifiers are trained on a known class in the training dataset (here: leukemic blast populations in diagnostic samples). The classifiers are then able to recognize the class of interest in test datasets (MRD populations in follow-up samples), as was also reported previously (24). In our study, classifiers were built based on leukemic populations identified as clusters by HCA. In 10-fold cross-validation, two kernel functions were used to train classifiers: radial basis function kernel [sigma (0.25–8), C (5–100)] and polynomial kernel [order (1–8), C (5–50)]. Cross-validation was performed on a training dataset, which was split 10 times to form subsets with the same representation of class-positive and class-negative events (balanced splits). On each subset of the training dataset, all kernel functions and their respective parameters were tested. The best classifier (and therefore kernel function and its parameters) was then chosen based on the lowest estimated rate of misclassification. Misclassification means erroneous classification, that is, situations where the classifier assigns an event to the wrong class, and the rate of misclassification is the frequency of such errors. Before testing new samples for the presence of the cluster of interest (i.e., presence of an MRD population in new follow-up samples), data were scaled to fit the parameter ranges of the training data (diagnostic dataset). Training of SVM classifiers was done on 10⁴ data points. Testing for the presence of the cluster of interest was performed on 10⁴ events (to match HCA, which is limited in the size of dataset to be analyzed), as well as on whole datasets (5 × 10⁴ to 5 × 10⁵ events). Resulting class estimates (MRD values) were compared with both an independent HCA and the original MRD analyses, performed by specialists from the contributing flow cytometry centers.
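
Two elements of this procedure, the balanced splits and the misclassification rate, can be sketched without an SVM library. The Python code below is illustrative only: the function names and the round-robin assignment of events to folds are assumptions, and the Spider package used in the study is not reproduced.

```python
import random

def balanced_folds(labels, k=10, seed=0):
    """Split event indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal the events of this class to the folds in turn.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

def misclassification_rate(true_labels, predicted_labels):
    """Fraction of events assigned to the wrong class."""
    errors = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return errors / len(true_labels)

# Toy training set: 20 blast events (1) among 80 other events (0).
labels = [1] * 20 + [0] * 80
folds = balanced_folds(labels)
print([sum(labels[i] for i in fold) for fold in folds])
print(misclassification_rate([1, 0, 1, 0], [1, 1, 1, 0]))
```

Each of the 10 folds in this toy example receives two blast events and eight other events. In a full workflow, every candidate kernel and parameter set would be scored by its mean misclassification rate across the folds, and the lowest-scoring combination retained.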

All scripts needed to reproduce presented work (MATLAB M-files) are available from the authors on request.

Descriptive Statistics

As current cytometer operating software does not allow easy identification of individual events (even though they are clearly defined in the FCS files by the order in which they were recorded), we could not compare the assignment of individual events. Instead, we compared overall percentages and checked the agreement of LAIPs (leukemia-associated immunophenotypes) where possible.

For assessing agreement between different analytical methods, Passing and Bablok (PB) regression was used. Results are reported in the text as the correlation coefficient (r), slope (s), and intercept (i) of the regression.
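
For readers unfamiliar with PB regression, the point estimates of slope and intercept can be computed as a shifted median of all pairwise slopes. The following Python sketch is a simplified illustration: it omits confidence intervals and the correlation coefficient, and it assumes positively correlated data with few pairwise slopes below −1. It is not the software used in this study.

```python
from statistics import median

def passing_bablok(x, y):
    """Point estimates (slope, intercept) of Passing-Bablok regression."""
    slopes = []
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical points carry no slope information
            if dx == 0:
                slopes.append(float("inf") if dy > 0 else float("-inf"))
                continue
            s = dy / dx
            if s != -1:  # slopes of exactly -1 are discarded
                slopes.append(s)
    slopes.sort()
    m = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset that shifts the median
    if m % 2:
        slope = slopes[(m - 1) // 2 + k]
    else:
        slope = 0.5 * (slopes[m // 2 - 1 + k] + slopes[m // 2 + k])
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept

# Two methods in perfect agreement give slope 1 and intercept 0.
gating = [0.5, 1.0, 5.0, 20.0, 60.0]
print(passing_bablok(gating, gating))
```

A slope close to 1 and an intercept close to 0, as in the example, indicate that two methods agree without proportional or constant bias.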

RESULTS

Hierarchical Clustering of Multidimensional Flow Cytometry Data

Three major populations are usually detected in peripheral blood and bone marrow based on forward (FSC) and side (SSC) light scattering characteristics: lymphoid, monocytoid, and granulocytic cells. As a first test of our method, nine normal bone marrow samples (i.e., samples from end of treatment or samples without disease involvement in bone marrow) were analyzed for the presence of these populations. HCA of six-color (eight-parameter) ungated flow cytometry data allowed us to identify these three populations as prominent clusters in all samples (e.g., Figs. 1A and 1B). Using color coding, the relative FSC and SSC values of all clusters (i.e., populations) can be visualized on the heatmap.

Figure 1. HCA: main hematopoietic populations in normal BM. A: Dendrogram with heatmap from HCA of 10⁴ ungated events acquired from normal BM. Heatmap shows relative levels of all eight parameters (columns) in all 10⁴ events (rows) in color coding (blue, low expression; red, high expression). Dendrogram shows the hierarchy of cells based on their similarity in all parameters measured. The x-axis under the dendrogram represents similarity distance. Colored branches of the dendrogram are selected clusters, as displayed in B. B: The main populations (as identified by HCA) are displayed on a conventional forward versus side scatter dot plot. C: All two-parameter combination plots of a defined cluster (marked by red rectangle in A). This population is negative for most of the antibodies from the B cell panel and may be difficult to detect by standard gating as it overlaps with other populations in all 28 two-parameter plots. Nevertheless, it is a compact, homogeneous population most likely of T cell origin. This red population was drawn on top of the remaining cells (gray). D: Mirror image to C. The gray population was drawn on top of the red population. Fluorescent dyes used were: CD10-FITC, CD22-PE, CD117-PerCP-Cy5.5, CD38-PE-Cy7, CD34-APC, CD19-APC-Cy7, and in all cases pulse height was used.

Moreover, in all normal samples, we found clusters corresponding to candidate cell populations from B-cell development (Supporting Information Fig. 8).

Next, the shape of these clusters (populations) was analyzed on scatter plots. Clusters (populations) defined by HCA reflect similarity in all parameters and, therefore, form compact populations with near lognormal distributions. Hierarchical clustering using Mahalanobis-average distance measurements in the linkage process also proved able to define populations with elongated ellipsoidal shapes, a feature not easily obtainable by conventional HCA metrics, but essential for defining biologically relevant populations.

One advantage of the multidimensionality of HCA is that it can visualize populations that are otherwise difficult to detect (hidden populations). This is exemplified in Figure 1. Although a B-cell antibody panel was applied for this analysis, a distinct CD38+B-lin population was identified. It overlapped with other populations in all 28 two-parameter plots but is formed by cells with similar characteristics in all parameters, and it most likely represents T cells (25).

Detection and Quantification of Leukemic Blast Populations by HCA of Flow Cytometry Data

To test the relevance of our method for clinical use, datasets from ALL samples at diagnosis (n = 48) were analyzed. The mean reference blast population, as determined by gating analysis in the participating laboratories, was 67.4% (standard deviation: 23.8%). HCA was performed in a blinded fashion, without prior knowledge of the expected percentages or of the leukemia-associated immunophenotypes (LAIP). Using HCA, we have correctly identified the leukemic blast populations in all samples analyzed. The blast populations were manually selected based on their distinct pattern on the heatmaps and their clear separation in the dendrograms. When clusters were plotted on traditional dot plots, it was confirmed that their LAIP matched that of the original populations as determined in the reference laboratories. The correlation between blast percentage data derived from standard gating and HCA was 0.968 (PB: r = 0.968, s = 1.01, i = −0.52).

ALL MRD Monitoring by HCA

To further validate our method in a clinical trial setting, we analyzed flow cytometry datasets from 23 BCP-ALL and 15 T-ALL sample pairs included in the most recent QC investigation of the I-BFM FLOW-MRD study group (18). Diagnostic (see also above) as well as day 15 datasets of patients treated with the AIEOP-BFM-ALL 2000 protocol (26) were again analyzed in a blinded fashion. There was excellent concordance between the HCA results and the reported values from the QC trial (Fig. 2). In BCP-ALL diagnostic samples (range of expected values: 23.92–90.21%, median: 63.83%), the correlation was 0.984 (PB: r = 0.984, s = 1, i = 0.52), and in day 15 MRD samples (range: <0.01–19.37%, median: 0.68%), the correlation was 0.995 (PB: r = 0.995, s = 1.02, i = −0.03). In T-ALL diagnostic samples (range of expected values: 28.00–92.00%, median: 82.00%), the correlation was 0.913 (PB: r = 0.913, s = 1.17, i = −14.45), and in day 15 MRD samples (range: <0.01–81.00%, median: 14.60%), the correlation was 0.996 (PB: r = 0.996, s = 1.01, i = −0.02).

Figure 2. Correlation of HCA results with standard gating. 123 samples from three independent centers, including 38 sample pairs from the most recent QC investigations of the I-BFM FLOW-MRD study group, were analyzed by standard gating (5 × 10⁴ to 5 × 10⁵ events) and HCA (1 × 10⁴ events). The reported blast population percentages are calculated as percentage of all events analyzed. A: The Passing and Bablok regression. B: Bland–Altman plot showing the differences between the two measures (clustering and gating) against their mean values.

The overall correlation of all samples analyzed (n = 123, including 76 samples from QC trials) was 0.992 (PB: r = 0.992, s = 1, i = −0.01).

ALL MRD Monitoring by HCA and SVM

To further improve MRD monitoring, we combined the advantages of HCA in characterizing the leukemic population with fast and automatic recognition of the blast population by SVM classifiers. The workflow for the combined use of HCA and SVM analyses is illustrated in Figure 3. First, HCA was performed on the flow dataset of the original diagnostic sample to define the leukemic population as a cluster, and then this cluster was used to train an SVM classifier. The trained classifier was then used in all follow-up samples from the same patient to automatically detect MRD. The MRD estimates were compared with results from independent HCA and with the standard gating analysis from the different reference laboratories.

Figure 3. Example of detection and quantification of ALL blasts. Diagnostic (left) and day 15 (right) bone marrow samples from one patient with ALL. A: Gates used for conventional gating of the leukemic population (red) using a standard antibody panel (CD58, CD10, CD45, CD34, CD19, CD20). B: Result of independent HCA of the same flow data. Clusters were selected from the hierarchy as branches of the dendrogram. Cells included in such a branch are colored red. The red clusters are displayed on dot plots similar to those used for the standard gating. (Note the slightly different shapes of populations, due to differences between biexponential (DiVa) and HyperLog transformations. This is also the reason for the better visualization of inner heterogeneity in HCA and SVM of the d15 blast population.) C: SVM for automatic detection of the MRD population in the day 15 sample. First, the classifier is trained based on known class labels (here: the red cluster from HCA at diagnosis). Second, the classifier is asked to automatically assign class distribution in a test sample (here: the day 15 sample from the same patient). The results are also displayed in two relevant dot plots, where coloring is the result of automatic class selection (here: class “MRD” in red).

MRD levels were analyzed using SVM classifiers in 15 follow-up samples from five patients with ALL with persistent leukemia throughout induction, as defined by conventional flow MRD analysis (Fig. 4). SVM classifiers trained on diagnostic populations from BM or PB performed well in the follow-up samples, regardless of whether these populations constituted the majority or the minority of cells in the sample (range 3.51–92.38%). HCA and SVM showed good concordance with conventional gating in follow-up samples, with the percentage of persisting blasts ranging from 0.004 to 57.54%, median 0.65% (as estimated by gating). In samples with low MRD levels (less than 0.5%), SVM (performed on the complete dataset) correlated better with standard gating than HCA (performed on 10⁴ events only): the correlations were 0.967 (PB: r = 0.967, s = 1.03, i = 0.01) and 0.910 (PB: r = 0.91, s = 1.33, i = 0), respectively (Fig. 4).

Figure 4. Comparison of MRD monitoring by gating, HCA, and SVM. Datasets from five patients were obtained and analyzed without prior knowledge of the immunophenotype or percentage of the leukemic blast population. After HCA of the diagnostic sample, the leukemic cluster was used to train classifiers to be applied in the follow-up samples. In the follow-up samples, SVM was performed on 10⁴ events as well as on all recorded events (5 × 10⁴ to 5 × 10⁵). A: An example of a patient's follow-up monitoring by all three methods. B–D: Comparisons of HCA and SVM to standard gating.

Current Challenges in Flow MRD Monitoring

One of the key challenges of modern multiparameter flow cytometry is the increasing complexity of its datasets. HCA can be used for analysis of datasets with increasing numbers of parameters, as was shown with six- and eight-parameter datasets as well as with 10-parameter datasets (Supporting Information Figs. 5 and 6). As proof of principle, we also successfully applied this method to a 20-parameter synthetic dataset (Supporting Information Fig. 7).

Specific challenges of flow cytometric MRD monitoring are the identification of leukemic blasts that have undergone an immunophenotypic shift during therapy and the discrimination of leukemic blasts from regenerating hematogones with a related immunophenotype. There were two such examples in our cohort. Figure 5 shows HCA and SVM analyses of samples in which the leukemic blast population downregulated CD34 expression during treatment. Despite this antigen shift, both methods were able to correctly identify and quantify the MRD population. Similarly, analyses of a postinduction bone marrow (day 78 of ALL-BFM 2000 protocol) are shown in Figure 6, in which HCA correctly distinguished between leukemic blasts and hematogones by placing them on separate branches of the dendrogram. Because of their immunophenotypic similarity, these two populations were otherwise very close in the cluster hierarchy. On the other hand, the SVM classifier was not trained to distinguish between the two populations, because the regenerating population was not present at diagnosis, and it therefore assigned both the leukemic and the normal regenerating population to one class.

Figure 5. HCA and SVM correctly identify the MRD population despite a shift in CD34 expression. Histogram of CD34 expression on the leukemic populations at diagnosis (black) and at week 11 (gray). The insert provides a comparison of MRD with HCA and SVM (SVM not present at d0 as it is trained at diagnosis).

Figure 6. Regenerating hematogones are distinguished from leukemic cells by HCA. A: Populations in a day 78 bone marrow as defined by standard gating (i) and HCA (ii). B: Enlarged section of the dendrogram with heatmap corresponding to populations in ii. Red, leukemic population; blue, regenerating hematogones (lower CD10, CD20 expression); green, debris (SYTO41 negative).

DISCUSSION

Both for hematological research and diagnostics, modern multiparameter flow cytometry is a powerful tool for phenotyping normal and leukemic cell populations (27). One of its key applications lies in the monitoring of MRD as a clinically important biomarker for relapse prediction and treatment stratification (1–5).

However, the rapidly rising complexity of multiparameter flow cytometry datasets creates new challenges. Currently, the analytical standard in the field involves gating, in which one or more gates are defined in each histogram or dual-parameter plot and a sequence or combination of gates defines the population of interest. This process is tedious or even unfeasible, requires highly experienced cytometrists, and is observer dependent. Most importantly, sequential gating is limited in its ability to reflect the multidimensionality of the data. A solution to both the multidimensionality of flow data and the observer dependency lies in the use of unsupervised learning methods.

A number of methods have been suggested for use in flow cytometry (10–16). Most methods rely on an estimate of the number of populations, leaving behind the hierarchical nature of a complex biological sample, or use the hierarchy of clusters only as a proxy for building models with an estimated number of components/clusters (28). HCA, on the other hand, offers a picture of all recorded events in a hierarchically organized fashion so that cells with similar characteristics reside close to each other. HCA in its standard form measures distances between data points once and then uses these distances in the linkage part of the algorithm for merging clusters and thus building the hierarchy. This setting, however, does not reflect the ellipsoidal shape of populations in flow cytometry datasets and, therefore, there has been a lack of unsupervised learning methods applicable to flow cytometry.

We have developed a new algorithm for HCA, using adaptive Mahalanobis-average linkage, to cluster flow data. In this algorithm, merging of clusters is based on distances of data points of one cluster to an ellipsoid fitted to another cluster and vice versa. Mahalanobis-average linkage allows the formation of clusters starting from single cells. This feature has two important advantages over some previous methodologies (17). First, and most importantly, clustering from single cells increases the sensitivity of population detection, which has important implications for both MRD monitoring and explorative analyses of flow data. Second, Mahalanobis-average linkage allows clustering in a pure HCA manner without the need for initial data splitting, thus avoiding any possible introduction of errors in this first step.

As a marker of quality, populations chosen as clusters from HCA using Mahalanobis-average linkage usually have an even distribution of measured cell parameters. HCA dissects such populations not only when they are relatively well separated from each other but also when they overlap with other populations in the sample and would therefore be difficult or even impossible to find using a traditional gating approach. This was exemplified by the distinct CD38+B-lin population in Figure 1.

Using this novel HCA algorithm, it became possible to correctly assign both the immunophenotype and the percentage of leukemic blasts, in a large cohort of diagnostic and follow-up samples from children with ALL. There was an excellent correlation between leukemic levels determined by traditional gating and HCA. The correlation was equally high for both BCP-ALL and T-ALL samples, even if T-ALL is traditionally more challenging due to interfering normal cell populations, blast heterogeneity or loss of immaturity associated markers. Most importantly, this correlation compared well with the interlaboratory QC investigations recently reported from the I-BFM FLOW-MRD study group (18). This comparison served as validation of our method, demonstrating the potential of this new approach for clinical use.

The biggest challenge for HCA using this algorithm is the size of the dataset that can be analyzed on current desktop computers, which is limited to ∼2 × 10⁴ events. The sensitivity of any flow cytometry analysis is limited by the number of events recorded and by the minimal number of cells recognized as a population. For the 10⁴-event datasets analyzed by HCA in this work, using 10 cells as a threshold for defining a population, the sensitivity is 0.1%. This can be overcome in MRD monitoring, where high numbers of cells must be analyzed, by the use of SVM. SVM is a supervised learning method able to automatically detect populations of interest in datasets with high numbers of acquired events (>10⁶, data not shown). The combination of hierarchical clustering and SVM allowed detection of low levels of residual disease in follow-up ALL samples. Other possibilities to overcome cell number limitations, not presented here, are either to use SVM on populations derived from other methods (e.g., binning) or to split the data before HCA is applied (either manually by subgating, or by the sequence HCA on a representative subset > SVM of the chosen subset > HCA on the subset defined by SVM).
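
The sensitivity figure quoted above is the ratio of the smallest recognizable population to the number of events analyzed. A short Python sketch of this arithmetic follows. Applying the same 10-cell minimum to larger event counts is an assumption made here for illustration only, and the function name is hypothetical.

```python
def detection_limit_percent(n_events, min_population_size=10):
    """Smallest detectable population as a percentage of analyzed events."""
    return 100.0 * min_population_size / n_events

for n in (10**4, 5 * 10**5, 10**6):
    print(n, "events:", detection_limit_percent(n), "%")
```

With 10⁴ events, the limit is 0.1%. Under the stated assumption, 5 × 10⁵ events would lower it to 0.002%, which illustrates why analysis of whole datasets is attractive for MRD monitoring.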

Other challenges for flow cytometric MRD monitoring and thus for HCA or SVM analysis include immunophenotypic shifts in ALL blasts following therapy, as well as the discrimination of persisting leukemic blasts from regenerating normal hematogones (29–31). In principle, any supervised learning is prone to be affected by significant changes in the parameters of the population of interest, and these issues will need further investigation. In our hands, immunophenotype modulation did not hamper HCA or SVM analysis (Fig. 5). Regenerating populations, detected with HCA, were, however, more challenging for SVM (Fig. 6). This difficulty may be addressed by training the classifiers on several classes simultaneously, for example, both by positive training on the class of interest (here: the malignant population) and by negative training on the remaining data points (here: residual normal bone marrow).
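The idea of positive and negative training can be sketched with a two-class classifier. The example below is a minimal linear soft-margin SVM trained by sub-gradient descent on the hinge loss, written in plain Python. It is an assumption-laden illustration, not the classifier used in this study: the kernel, parameters, and implementation used here are not restated in this section, and all marker values, names, and settings in the sketch are hypothetical.

```python
def decision_value(weights, bias, event):
    """Signed score of an event; positive means the class of interest."""
    return sum(w * x for w, x in zip(weights, event)) + bias

def train_linear_svm(events, labels, epochs=200, learning_rate=0.01, reg=0.01):
    """Linear soft-margin SVM via sub-gradient descent on the hinge loss."""
    weights = [0.0] * len(events[0])
    bias = 0.0
    for _ in range(epochs):
        for event, label in zip(events, labels):
            margin = label * decision_value(weights, bias, event)
            # L2 regularisation shrinks the weights slightly at every step.
            weights = [w * (1.0 - learning_rate * reg) for w in weights]
            if margin < 1.0:
                # The event violates the margin: move the boundary toward it.
                weights = [w + learning_rate * label * x
                           for w, x in zip(weights, event)]
                bias += learning_rate * label
    return weights, bias

def classify(weights, bias, event):
    """Return +1 for the class of interest and -1 for everything else."""
    return 1 if decision_value(weights, bias, event) > 0 else -1

# Hypothetical two-marker intensities (made-up values, arbitrary units).
blast_events = [(5.0, 1.0), (5.5, 1.2), (4.8, 0.8), (5.2, 1.5)]    # positive class
normal_events = [(1.0, 4.0), (1.5, 4.5), (0.8, 3.8), (1.2, 4.2)]   # negative class

training_events = blast_events + normal_events
training_labels = [1] * len(blast_events) + [-1] * len(normal_events)
weights, bias = train_linear_svm(training_events, training_labels)

# Classify two new hypothetical events, one near each training population.
print(classify(weights, bias, (5.1, 1.1)), classify(weights, bias, (1.1, 4.1)))
```

Here the negative class stands in for residual normal bone marrow; in practice the two training classes would be populations selected beforehand, for instance clusters chosen from HCA.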

Despite these challenges, HCA using Mahalanobis-average linkage opens up a new perspective on how to view flow cytometry data. The inherent multidimensional nature of the analysis leads to identification of homogeneous populations not only on histograms or two-parameter dot plots but also in the n-dimensional space (with n representing the number of parameters analyzed). This method can be scaled up and is easily applicable to modern, cutting-edge, high-parameter flow cytometry that is otherwise difficult to analyze. In addition, HCA can show sample populations not only in the hierarchical context of other populations but also with their inner hierarchy. Dissecting tumor heterogeneity is key to understanding clonal evolution (32) and the development of drug resistance (33), and to identifying candidate leukemia-propagating stem cell populations (34).

Most important for its clinical applicability, HCA is less observer dependent than traditional gating. By reanalyzing 23 follow-up samples from the I-BFM FLOW-MRD study group (18), we have been able to achieve high concordance with standardized flow cytometry analysis, without participating in the respective intensive training and feedback framework. SVM testing is completely observer independent once the classifier is trained for the recognition of the residual disease population, and it can be fully automatic. However, the choice of population to train the classifier is still observer dependent and relies on the method used. As HCA clusters form compact populations that provide optimal classes for SVM, it appears to be advantageous to combine HCA with SVM. HCA can identify various relevant populations without previous knowledge of the sample, whereas SVM, once the classifier is trained, can perform the test automatically, speedily, and on very large datasets.

The hierarchy of flow cytometry events produced by HCA is independent of the person or laboratory performing the analysis. The cluster selection can be formalized (as a description of dendrogram branches), and population size and structure can be discussed. This will allow a new level of easy and formally precise standardization and quality control in large international multicenter trials.

In summary, we have developed a new algorithm that allows HCA to be applied to flow cytometry and opens new opportunities for the scientific and clinical use of flow cytometry. Most importantly, this approach reflects the multidimensionality of the data and enables analysis of complex, multiparameter flow data. It provides a new tool to study tumor heterogeneity. From a clinical perspective, in combination with SVM learning, HCA using Mahalanobis-average linkage is applicable to leukemia diagnostics and MRD monitoring. It has been validated against a standardized set of flow MRD data from the I-BFM FLOW-MRD study group, and it has the potential for automatization.

Author Contributions

K.F. designed research, wrote MATLAB scripts, analyzed data, prepared the figures, and wrote the paper; T.S. developed the Mahalanobis-average algorithm, wrote MATLAB scripts, and critically reviewed the manuscript; A.S. retrieved data; B.W., J.I., E.M., and M.N.D. provided data and critically reviewed the manuscript.

Acknowledgements

The authors are grateful to Marian Case for retrieving flow cytometry datasets. This work contains data that were generated within an international cooperative study of the I-BFM ALL FLOW-MRD study group, represented by M.N. Dworzak (source of data files), G. Basso (Univ. of Padova), G. Gaipa (Tettamanti Research Center, Monza), and L. Karawajew (Robert-Roessle Clinic, Medical University of Berlin Charité). Senior author: Michael N. Dworzak, coordinator of the I-BFM ALL FLOW-MRD network.

Literature Cited

Supporting Information

Additional Supporting Information may be found in the online version of this article.

Filename: CYTO_21147_sm_SuppInfo.doc
Size: 4841K
Description: Supporting Information

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.