Three individually successful techniques in the fields of biotechnology and drug discovery are high-throughput (HT) screening platforms (1), flow cytometry (2–4), and data-mining tools, such as hierarchical clustering (5–8). In this study, these three techniques have been combined and used successfully to detect cell states induced by multifactor combinations. The creation of multifactor combinations and analysis of the results of cell-based in vitro assays can better elucidate how cells behave in a complex microenvironment (9–11). Cells behave differently when treated with single factors as compared to treatment with multiple factors either in sequence or in combination (12, 13). Because traditional experimentation evaluates only a limited set of treatment combinations, it does not approximate a cell's natural environment, and in vitro results may therefore not reflect the in vivo situation (14–16).
For cell-based studies, HT combinatorial platforms aided by automation are more useful than traditional manual experimentation because of three advantages (7, 17–19). First, HT-automated platforms have enabled the creation of large numbers of multifactor combinations that can elucidate how cells behave in a complex microenvironment. They thereby reduce the limitations of traditional experimentation, in which only one or two variables are manipulated at a time, and improve correlations between in vitro and in vivo results. Second, automated platforms have provided a hands-off method for measuring and dispensing minute, and often viscous, samples with increased accuracy and precision. Third, HT platforms require sample miniaturization, resulting in a lower cost per sample and thus allowing analysis of more samples and exploration of a larger experimental space.
High-throughput platforms have not been used together with flow cytometry until recently (20). A well-known advantage of flow cytometry is its ability to evaluate multiple parameters for each cell, such as cell size, cell granularity, DNA content, cell viability, differentiation, proliferation, production of soluble antigen, and expression of cell-surface antigens (which can also indicate activation) (3, 4, 21–23). This advantage is particularly useful in the analysis of hematopoietic cell populations, because cell populations can be potentially analyzed using over 200 CD (cluster of differentiation) cell-surface markers. With the introduction of HT samplers to flow cytometry, a large diverse set of treatment conditions can be analyzed for a large number of read-outs simultaneously (20). By expanding both the number of input variables and the number of readouts, a larger experimental space can be explored, increasing the probability of finding conditions that can produce diverse phenotypic profiles.
In spite of these potential benefits, flow cytometry has disadvantages that have likely contributed to its non-use in HT platform-based approaches. Flow cytometry generates large, bulky data sets that are typically analyzed using two-dimensional scatter plots. This makes it difficult, or impossible, to identify and compare trends in the data efficiently. Thus, traditional data analysis, consisting of generating graphs in 2-D format, may not be useful for analysis of data from HT flow cytometry. Increasingly, traditional data analysis is being replaced by high-powered data-mining techniques in order to analyze the large amounts of data generated using HT, automated platforms.
One example of a data-mining technique is hierarchical clustering (24, 25). Hierarchical clustering is a common method used to determine clusters of similar data points in multidimensional spaces and allows for easy handling of large, bulky data sets. Rather than merely confirming already known relationships, hierarchical clustering increases the probability of finding rare, unexpected, or nonlinear events in complex high-dimensional data sets. The incorporation of HT experimental platforms should greatly increase the likelihood of finding improved correlations between in vitro and in vivo systems. Data-mining tools, such as hierarchical clustering, can help to identify novel responses and alternate formulations that can produce a particular cellular profile.
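As an illustration of the idea, the following is a minimal sketch of agglomerative (bottom-up) hierarchical clustering. It is not the software used in this study. The average-linkage rule, the Euclidean distance, the function names, and the percent-positive values are all assumptions chosen for the example.

```python
from math import dist


def average_linkage(a, b, profiles):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    total = sum(dist(profiles[i], profiles[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], profiles)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical percent-positive profiles for four markers, one row per treatment.
profiles = [
    (5, 90, 85, 80),
    (8, 88, 90, 75),
    (70, 60, 20, 5),
    (75, 65, 25, 10),
    (2, 3, 4, 1),
    (4, 2, 6, 3),
]
clusters = agglomerate(profiles, 3)
print(clusters)
```

Treatments whose profiles are close in marker space end up in the same cluster, which is the grouping that the clustered plots described later rely on.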
Here, HT platforms, flow cytometry, and data-mining tools, such as hierarchical clustering, are all used to analyze the effects of complex treatments from combinations of factors on cell states, by analyzing both large data sets from multiple read-outs and experiment-wide responses to many different treatments in one plot. Through the use of data-mining techniques, rare, unexpected response profiles and nonlinear interactions were found in complex high-dimensional data sets. Finally, flow cytometry is shown to be an effective tool for defining discrete phenotypic states that can be related through informatics to the formulations that produced them.
The acute promyelocytic leukemia cell line HL-60 was selected as the biological model for several reasons (26, 27). First, it is a very well-studied cell line and reproducing previously published data was necessary to validate the system. Second, the HL-60 cell line can respond to multiple biological and pharmacological factors and can differentiate along multiple pathways within the myeloid lineage (28–32). This multiplicity is necessary to assess multiple inputs and multiple endpoints in parallel. Third, this cell line shows evidence of nonlinear, non-obvious interactions, essential to prove TransForm's ability to explore the combinatorial space (33–40).
The HL-60 cells were exposed to five well-investigated pharmacological agents known to initiate differentiation in these cells: DMSO, vitamin D3, PMA, sodium butyrate (pH 7.8), and ATRA. These agents, alone or in limited combinations, have been shown repeatedly to promote differentiation of HL-60 cells along three distinct pathways within the myeloid lineage: neutrophil, monocyte, and eosinophil/basophil (9, 29–32, 36, 41).
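To give a sense of how quickly the combinatorial space grows, the sketch below enumerates formulations built from these five agents. The three dose levels per agent ("lo", "med", "hi") and the cap at ternary mixtures are assumptions made for the example, not the actual plate design of this study.

```python
from itertools import combinations, product

FACTORS = ["DMSO", "vitamin D3", "PMA", "sodium butyrate", "ATRA"]
LEVELS = ["lo", "med", "hi"]  # assumed: three dose levels per factor


def formulations(factors, levels, max_order):
    """Yield every mix of 1..max_order distinct factors, each at one level."""
    for order in range(1, max_order + 1):
        for chosen in combinations(factors, order):
            for doses in product(levels, repeat=order):
                yield tuple(zip(chosen, doses))


all_formulations = list(formulations(FACTORS, LEVELS, max_order=3))

# Count formulations by the number of factors they contain.
counts = {}
for formulation in all_formulations:
    counts[len(formulation)] = counts.get(len(formulation), 0) + 1

print(counts, len(all_formulations))
```

Under these assumptions there are 15 unary, 90 binary, and 270 ternary formulations, far more than is practical to prepare by hand.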
Tartan plots and heat maps can be generated by clustering either all cell-surface markers or a constrained subset of markers. Figure 2a shows a tartan plot that clusters all cell-surface markers. All individual red squares outlined in black are clusters of treatments that yielded a similar cellular profile but have a distinctly different phenotypic profile from the next red square outlined in black (compare squares 1–7). The squares are separated from each other by areas of yellow, green, or blue. The associated heat map in Figure 2b shows the relative intensities of marker expression for the treatment combinations in each square. For example, treatments 68–72 (lower white box) can be compared with square 4 in Figure 2a and treatments 73–82 (upper white box) can be compared with square 5 in Figure 2a.
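One way to picture a tartan plot is as a treatment-by-treatment similarity matrix whose rows and columns follow the clustering order, so that similar treatments form blocks along the diagonal. The sketch below builds such a matrix. The similarity scaling, the ordering, and the profile values are assumptions for illustration, and no claim is made that the software used here computes its plots this way.

```python
from math import dist


def similarity_matrix(profiles, order):
    """Pairwise similarity in [0, 1], with rows and columns arranged by `order`."""
    ordered = [profiles[i] for i in order]
    distances = [[dist(a, b) for b in ordered] for a in ordered]
    largest = max(max(row) for row in distances) or 1.0
    return [[1.0 - d / largest for d in row] for row in distances]


# Hypothetical percent-positive profiles; similar treatments are interleaved.
profiles = [
    (5, 90, 85, 80),
    (70, 60, 20, 5),
    (2, 3, 4, 1),
    (8, 88, 90, 75),
    (75, 65, 25, 10),
    (4, 2, 6, 3),
]
order = [0, 3, 1, 4, 2, 5]  # assumed output of a prior clustering step
matrix = similarity_matrix(profiles, order)

for row in matrix:
    print(" ".join(f"{value:.2f}" for value in row))
```

After reordering, high similarity values sit in 2 x 2 blocks on the diagonal, which correspond to the red squares in the plot, and low values fill the off-diagonal regions.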
If the analysis is constrained to cell-surface markers that represent a particular cell lineage (for example, markers representing the myeloid lineage), further stratification of phenotypes is observed. Tartan plots and heat maps were generated from clustering cell-surface markers that are restricted to myeloid cells (CD66b, CD11b/Mac-1, CD13, CD14) (Figs. 3a and 3b) (13, 42–45). The Tartan plot shows a series of red squares along the diagonal separated by small areas of yellow and large areas of blue, meaning that distinct cellular profiles arise based on experimental treatments (Fig. 3a). In fact, a pattern that is most representative of monocytes (CD66b−, CD11b/Mac-1+, CD13+, CD14+) (13, 42–45) can be viewed in the heat map (compare formulations 29–45 in the black box in Fig. 3a with the same formulations in the white box in Fig. 3b).
Figure 3. Hierarchical clustering using cell-surface markers for a specific cell lineage. (a) Tartan visualization. To validate the expected cell states, the data were clustered using markers specific for cells of the myeloid lineage (CD66b, CD11b/Mac-1, CD13, and CD14). Red boxes along the diagonal line indicate formulations producing states similar to one another. (b) A heat map shows the percentage of positive cells for each of the indicated markers associated with the clustering Tartan plot in Figure 3a. For example, formulations 30–45 show low expression of CD66b and high expression of CD11b, CD13, and CD14, indicative of a monocyte cell. The black box around formulations 30–45 in Figure 3a corresponds to the white box around the same formulations in Figure 3b.
One convenient feature developed in this software is the ability to retrieve the treatment combinations giving rise to cells exhibiting a profile of interest (Fig. 4a, blue highlighted area). In this case, treatments 29 through 45, which produced the “monocyte-like” pattern in the heat map, contained vitamin D3, PMA, or both. These results are consistent with literature findings demonstrating that monocytes are produced following treatment of HL-60 cells with vitamin D3, PMA, or both. The difference in expression patterns between treatment wells can be examined by viewing the composite signatures for each of the wells that produced a particular profile (Fig. 4b).
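The retrieval step can be thought of as a lookup from the treatments in a cluster back to their ingredients. The sketch below is a hypothetical illustration of that lookup. The treatment numbers, the table contents, and the function name are invented for the example and do not reproduce the formulation viewer.

```python
def summarize_cluster(cluster_ids, formulation_table):
    """Return the factors present in every member and in at least one member."""
    members = [formulation_table[i] for i in cluster_ids]
    in_all = frozenset.intersection(*members)
    in_any = frozenset.union(*members)
    return in_all, in_any


# Hypothetical treatment-to-factor table (IDs are illustrative only).
formulation_table = {
    1: frozenset({"vitamin D3"}),
    2: frozenset({"PMA"}),
    3: frozenset({"vitamin D3", "PMA"}),
    4: frozenset({"DMSO"}),
}

in_all, in_any = summarize_cluster([1, 2, 3], formulation_table)
print(sorted(in_all), sorted(in_any))
```

An empty `in_all` together with a small `in_any` corresponds to a description such as "vitamin D3, PMA, or both", whereas a non-empty `in_all` would indicate factors that every treatment in the cluster shares.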
Figure 4. Retrieval of formulations producing a particular cell type. (a) Clicking on a square cluster in a Tartan plot opens the formulation viewer, which shows the formulations that give rise to that particular cell type (blue highlighted area). (b) A composite signature viewer for that particular formulation and marker set can also be displayed, so that each individual composite signature for a particular treatment in a cluster can be quantitatively viewed. Each line is a composite signature for an individual treatment. All the signatures resulting from treatments 30–45 are very similar, which means that the cellular profiles are similar.
An advantage of data-mining is the ability to discover nonlinear or unexpected events, such as factor dominance, unexpected cell cooperativity, or synergies and anergies (data not shown). Factor dominance occurs when the combination of two treatment factors produces a profile that resembles a profile produced by one of the factors and not the other. This was observed when PMA, 0.81 nM (PMAlo) (Fig. 5a) was combined with sodium butyrate, 100 μM, (sodium butyratelo) (Fig. 5b). The resultant profile was most similar to that of PMAlo (Fig. 5c). Dominance by one factor in one treatment did not, however, predict the outcome when that factor was used in a different treatment. When DMSO, 0.18 M (DMSOmed) (Fig. 5d) was combined with PMAlo (Fig. 5a), the resultant profile was most similar to that of DMSO (Fig. 5e), indicating that, for this treatment paradigm, DMSO and not PMA was the dominant factor.
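Factor dominance, as defined above, can be checked numerically by asking which single-factor profile lies closest to the profile of the combination. The sketch below does this with a Euclidean distance over percent-positive values. The distance choice, the names, and the numbers are assumptions for illustration and are not the measured data from Figure 5.

```python
from math import dist


def closest_single_factor(combo_profile, single_profiles):
    """Return the single-factor name nearest to the combination, plus all distances."""
    distances = {
        name: dist(combo_profile, profile)
        for name, profile in single_profiles.items()
    }
    nearest = min(distances, key=distances.get)
    return nearest, distances


# Hypothetical percent-positive values for four myeloid markers.
single_profiles = {
    "PMA_lo": (10, 80, 70, 60),
    "NaBut_lo": (40, 30, 20, 10),
}
combo_profile = (12, 75, 72, 55)

nearest, distances = closest_single_factor(combo_profile, single_profiles)
print(nearest, distances)
```

A combination that is far from every single-factor profile would not count as dominance under this check; it would instead suggest the kind of unexpected cooperativity discussed next.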
Figure 5. Combinations reveal nonlinear, non-obvious interactions between factors. (a–e) Binary combinations can demonstrate factor dominance. Signature profiles show the percentage of cells positive for the cell-surface markers CD66b, CD11b/Mac-1, CD13, and CD14. The signature profile of cells treated with a combination of PMAlo + sodium butyratelo (c) is more similar to that of cells treated with PMAlo alone (a) than to that of cells treated with sodium butyratelo alone (b). PMA does not always act as the dominant factor, however. When PMAlo (a) is combined with DMSOmed (d), DMSO behaves as the dominant factor (e). (f–h) Binary combinations result in non-obvious interactions between factors. A “bin” is a grouping of antibodies with the same fluorescent label due to the low probability for expression in these populations. Here CD34, CD3, B220, and CD56 were all labeled with FITC. Neither sodium butyratemed (f) nor DMSOmed (g) alone produces cells positive for CD125w, whereas cells treated with both sodium butyratemed and DMSOmed (h) produce cells expressing high levels of CD125w. Higher order combinations result in unexpected interactions between differentiation factors. (i, j) The ternary combination of DMSOhi+NaButlo+RAlo produced a phenotype distinct from those resulting from respective unary and binary inputs. DMSOhi+NaButlo+RAlo treatment results in downregulation of CD83 and CD235 (black arrows), suggesting negative synergies between these factors. These synergies would not have been predicted from the unary and binary data. (j) Changing concentrations of the differentiation factors (DFs) in a ternary mix produces distinct phenotypes, illustrating the complexity of the interactions among DFs.
Unexpected cell cooperativity occurs when a combination treatment produces a unique profile different from the profiles of the individual treatments. This is illustrated by sodium butyrate, 300 μM (sodium butyratemed) (Fig. 5f) and DMSOmed (Fig. 5g). When present alone, each produced cells negative for cell-surface marker CD125w. However, when used in combination, these factors produced a population expressing high levels of the cell-surface marker CD125w (Fig. 5h).
Another example of unexpected cooperativity is the ternary combination of DMSO, 0.15 M (DMSOlo), sodium butyrate, 500 μM (sodium butyratehi), and ATRA, 500 nM (ATRAhi), which produced a profile with much lower expression of CD83 and CD235a than the profiles of the individual factors and of the binary combinations (black arrows, Fig. 5i). Furthermore, the combination of DMSO, sodium butyrate, and ATRA at different concentrations resulted in different signature profiles (Fig. 5j). These results show that factor type, combination, and concentration are all important for determining cell phenotype.
Data mining allowed us to identify and retrieve atypical cellular profiles that are not obvious on visual inspection alone. For example, some treatment combinations produced cell populations exhibiting high expression of lymphocyte markers (CD3, B220), hematopoietic stem cell (HSC) markers (CD34), and erythrocytic markers (CD235) (Fig. 6a). Moderate to high expression of myeloid lineage markers (CD66b, CD11b, CD33, CD13, CD14) was also observed for these profiles (Fig. 6a). At the time of publication, no known cell population expressing high levels of multiple lineage markers had been reported, and such cell populations are considered atypical. Interestingly, all treatments producing this atypical cellular profile contained both PMA and DMSO, whereas treatments containing either PMA or DMSO alone did not produce this atypical cell profile (Fig. 6b, highlighted in blue).
Figure 6. Hierarchical clustering reveals functional cell states. (a) HL-60 cells treated with binary combinations of PMA and DMSO produce abnormal composite signatures. (b) The highlighted blue box shows the formulations that produced the abnormal composite signatures in Figure 6a. In this case, all formulations contained both PMA and DMSO. (c) Flow cytometric analysis of HL-60 cells treated with 2 ng/ml idarubicin. Extent of apoptosis is indicated by 7-AAD and Annexin V markers. HL-60 cells treated with both PMA and DMSO are more sensitive to apoptosis following treatment with idarubicin than those treated with PMA or DMSO alone. (i) Undifferentiated cells, no idarubicin; (ii) Undifferentiated cells, 2 ng/ml idarubicin; (iii) Cells treated to differentiate with DMSOhi + 2 ng/ml idarubicin; (iv) Cells treated to differentiate with PMAlo + 2 ng/ml idarubicin; (v) Cells treated to differentiate with PMAlo + DMSOhi, no idarubicin; (vi) Cells treated to differentiate with PMAlo + DMSOhi + 2 ng/ml idarubicin. (d) HL-60 cells treated to undergo differentiation with both PMA and DMSO show a synergistic response at low doses of idarubicin compared to cells treated with PMA or DMSO alone and control cells that have not been stimulated to differentiate.
The identification of unusual cellular phenotypes provides a launching point to explore the biology of those populations. We proposed that unusual phenotypes may be associated with altered sensitivity to chemotherapeutic drugs (39, 46, 47). To test this hypothesis, differentiation was induced in the presence of PMA alone, DMSO alone, or mixtures of PMA and DMSO over 5 days (Fig. 6). The differentiated cell populations were subsequently treated with idarubicin, an anthracycline-derived antibiotic commonly used to promote apoptosis in leukemic cells (37). Treatments containing both PMA and DMSO accelerated cell apoptosis, compared with treatments of either PMA or DMSO alone (Fig. 6c). This acceleration was most evident at idarubicin doses of less than or equal to 10 ng/ml, whereas higher idarubicin doses induced all cells to undergo apoptosis (Fig. 6c). Recent studies by Marekova et al. have shown that PMA and DMSO individually render HL-60 cells resistant to idarubicin-induced apoptosis (47), indicating that the combination of PMA and DMSO changes the susceptibility of cells to idarubicin (9, 37). Furthermore, similar increases in idarubicin-induced apoptosis were observed in combinations of PMA and other factors (ATRA, sodium butyrate, or vitamin D3). No such increases occurred with DMSO or combinations of DMSO with other factors (data not shown).
The three techniques of HT experimentation, flow cytometry, and hierarchical clustering have been combined and used successfully to detect different cell phenotypes resulting from various combinations of multifactor treatments. Incorporating HT-automated dispensing platforms and informatic data mining tools into traditional assays has allowed the execution and analysis of more complex experiments which are thus becoming more mainstream. In this study, HT experimentation was extended to another application: flow cytometry.
Despite the pronounced advantages of HT experiments for cell-based assays, flow cytometry has not been widely used in HT experimentation until recently (20). The introduction of HT samplers allows experiments to be performed at higher throughput and on a smaller scale, thus using less material and becoming more economical. The adoption of high-throughput applications to flow cytometry will result in the generation of larger and more complex data sets. Conventional flow cytometry data-analysis applications are inadequate to handle this increased data complexity; data sets can quickly become too difficult to manage, and trends within them can often be missed. Here, hierarchical clustering and other data-mining applications that are now commonplace (e.g., in genomics research) were applied to these large, complex data sets.
Hierarchical clustering not only identified trends in the data sets but also revealed nonlinear, non-obvious responses in the data. In this case, experiment-wide patterns of response under different treatment scenarios were examined. The expression of all cell-surface markers for a particular treatment and the expression of any one cell-surface marker over all treatments can be assessed from one plot. To confirm the validity of this approach, both the positive controls and the treatments that produced them were correctly identified and correlated to previously published results (31, 32, 36, 40, 42–45, 48, 49).
Data mining allowed us to identify and retrieve atypical cellular profiles that are not obvious on visual inspection alone. The ability to discover atypical cell profiles and explore their physiological relevance is facilitated by hierarchical clustering and provides a launching point to explore the biology of a particular cell population. Without the added ability to mine large data sets, opportunities to discover and provide insight into interesting biological phenomena may be missed. In this paper, this principle was illustrated by identifying a subpopulation of HL-60 cells that had enhanced susceptibility to chemotherapy-induced apoptosis in follow-up manual experimentation. Therefore, these discovery methods may be a useful tool for identifying treatments of future clinical value. This could be accomplished either by identifying new combination treatments or, if the mechanisms of the factors producing the desired effect are known, by using other agents that act on the same pathways, or molecules known to interact with these pathways, in subsequent combinatorial screens.
The incorporation of flow cytometry into a high-throughput platform has advantages over other common techniques that incorporate high-throughput dispensing platforms or advanced data-mining techniques. One such technique is molecular profiling, in which discrete phenotypes are identified by changes in gene expression rather than protein expression. Because gene expression profiles do not account for post-translational modification, they may not solely determine the final phenotype. Flow cytometry has the advantage that phenotypes are captured on the basis of protein expression, a more physiologically relevant parameter.
Molecular profiling of individual cells is also cumbersome. Most cell populations are heterogeneous mixtures of subpopulations. This heterogeneity is not compensated for in RNA and DNA microarray analyses, so the results reflect a mixture, rather than a single population of cells. Consequently, absolutely pure cell populations must be analyzed. Flow cytometry has the advantage that it does not require pure cell populations because it can quickly sort and examine large numbers of cells individually and separate them according to specific antigenic profiles.
Another advantage of incorporating flow cytometry is that it allows for multiparametric data capture of cell populations exposed to multifactor combinations. Multifactor combinations are rarely tested in traditional experiments, which are often limited to testing only one or two components at a time. Thus, the in vivo microenvironment cannot be fully captured in vitro, resulting in poor correlation between in vitro and in vivo experiments. For example, in vivo, cells are targeted with multiple stimuli that occur either in combination or sequentially and include soluble and insoluble factors (50, 51), mechanical and electrical signals (50), osmotic pressure (52), hydrophobicity (53), and oxygen tension (52). In these experiments, the experimental design included both multiparameter inputs and multiparameter outputs. Performing experiments using multiple inputs and outputs is important, particularly in cell-based assays, and provides rich data sets for the evaluation of trends.
While attempting to improve the overall sensitivity of the algorithms for detecting phenotypic profiles, some limitations of the technique were detected. One limitation was that not all formulations known to produce monocytes were discovered. For example, treatments containing vitamin D3lo alone were not observed in the monocyte cluster, although they do, in fact, produce monocytes. The most persistent limitation was that some profiles fell into multiple clusters. This limitation occurred because some differences between pairs of 16-dimensional histograms were lost in data reduction. For instance, the measure works well when using cell-surface markers that shift either from zero to positive or from positive to zero surface marker expression. However, more subtle shifts in the mean fluorescence intensity are not captured as well by the measure. Alternatively, a distribution-free similarity measure, such as the Kolmogorov-Smirnov measure, which evaluates the difference between cumulative distributions from the profiles, would also be able to handle cases like these (54). In future work, algorithms will be designed that better assess small shifts in mean fluorescence intensity for a particular marker.
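For readers unfamiliar with the Kolmogorov-Smirnov measure, the sketch below computes the two-sample statistic, which is the largest vertical gap between two empirical cumulative distributions. It is a generic illustration rather than an algorithm from this study, and the intensity values are invented.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # Both step functions are flat between observations, so it is enough
    # to evaluate them at every observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical fluorescence intensities for one marker in two wells.
control = [1, 2, 3, 4, 5]
slightly_shifted = [2, 3, 4, 5, 6]
fully_shifted = [10, 11, 12, 13, 14]

small_gap = ks_statistic(control, slightly_shifted)
large_gap = ks_statistic(control, fully_shifted)
print(small_gap, large_gap)
```

Because the statistic compares whole distributions rather than a positive or negative call, a modest shift in intensity still yields a non-zero value, which is the property that makes it attractive for subtle shifts.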
Another limitation of hierarchical clustering was the selection of cell biomarkers. The proper selection of cell biomarkers is crucial for accurate experimental analysis. The markers should be selected only if they are found exclusively on the phenotype being sought. For instance, a cell-surface marker found on multiple cellular phenotypes is not as helpful as one found on a cell induced to differentiate into only one cell type. Finally, the proper labeling of cells by the antibodies must be verified. If an antibody is defective, then it provides unreliable information, and data may be misinterpreted.
Calibration and normalization of antibodies were not a concern for this application in flow cytometry. The responses of the antibodies in each well were normalized by comparing the fluorescence intensity distribution with untreated, unstained wells (negative controls) on the same plate. Day-to-day variation in fluorescence readings was assessed and corrected for by using cumulative distribution plots of control wells across all plates. A composite signature profile was generated for each unique treatment combination and compared against every other treatment combination in the experiment. Therefore, relative rather than absolute values of antibody expression were most important for our purposes.
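The sketch below shows one common way to express a stained well relative to a negative control on the same plate: a positivity threshold is taken from the control distribution, and the fraction of stained events above it is reported. The 95th-percentile cutoff, the function names, and the intensity values are assumptions for illustration; the exact normalization procedure used in this study is not specified at this level of detail.

```python
def positive_threshold(negative_control, percentile=95):
    """Intensity at the given percentile of the negative-control events."""
    ordered = sorted(negative_control)
    # Integer arithmetic avoids floating-point surprises in the index.
    index = min(len(ordered) - 1, (percentile * len(ordered)) // 100)
    return ordered[index]


def percent_positive(stained, threshold):
    """Percentage of stained events brighter than the control threshold."""
    return 100.0 * sum(1 for value in stained if value > threshold) / len(stained)


# Hypothetical intensities: an unstained control well and one stained well.
negative_control = list(range(1, 101))
stained = [50, 90, 120, 150, 200, 300, 97, 96, 10, 500]

threshold = positive_threshold(negative_control)
result = percent_positive(stained, threshold)
print(threshold, result)
```

Since every well is scored against controls from its own plate, the resulting percentages are relative values, which is consistent with the emphasis on relative rather than absolute antibody expression.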
The experimental goal for this paper was to validate our approach using the HL-60 cell line. The benefits of using a cell line for this purpose included easier use and handling, less sensitivity to environmental manipulation, result predictability, and a generous literature landscape for comparison of results. However, there are exciting potential applications for using HT combinatorial platforms together with HT flow cytometry and informatic data mining in primary isolated cells. For example, the approach could be used to find the combination of factors necessary to sustain embryonic stem cells in their undifferentiated state without the need for a feeder layer, or to find the combinations of factors that provide more directed cell differentiation (55). Another application that could benefit from this type of approach is autologous cell therapy, a process in which an autograft is incubated in a medium containing specific cytokines, allowing for selective amplification of particular cell populations, which can then be returned to the donor (56, 57). This technique has often been hindered by the difficulty of optimizing the many conditions necessary to produce a sufficiently large number of cells of the desired population (58). The common challenge of these two applications is to develop and define methods that promote directed differentiation or expand the desired populations of cells without contamination by undesired cell populations. By implementing the approach described here, perhaps some unknown parameters concerning these fields of research can be revealed.
New automated ways to manipulate complex input variables, along with data-mining tools that can analyze the large amounts of data produced from these experiments, have proven successful. They can be used to model the in vivo situation more accurately than is possible with other methods. Therefore, HT flow cytometry coupled with hierarchical clustering has an important place in the investigation of biological systems in which multiple experimental parameters may be manipulated.