From transcriptome to cytome: Integrating cytometric profiling, multivariate cluster, and prediction analyses for a phenotypical classification of inflammatory diseases



Gene expression studies of peripheral blood cells in inflammatory diseases have revealed a large array of new antigens as potential biomarkers useful for diagnosis, prognosis, and therapy stratification. In general, their validation at the protein level remains largely restricted to hypothesis-driven approaches. State-of-the-art multicolor flow cytometry makes it attractive to validate candidate genes at the protein and single-cell level combined with a detailed immunophenotyping of blood cell subsets. We developed multicolor staining panels including up to 50 different monoclonal antibodies that allowed the assessment of several hundred phenotypical parameters in a few milliliters of peripheral blood. Up to 10 different surface antigens were measured simultaneously by the combination of seven different fluorescence colors. In a pilot study, blood samples of ankylosing spondylitis (AS) patients were compared with those of normal donors (ND). A special focus was set on establishing a suitable bioinformatic strategy for storing and analyzing hundreds of phenotypical parameters obtained from a single blood sample. We established a set of multicolor stainings that allowed monitoring of all major leukocyte populations and their corresponding subtypes in peripheral blood. In addition, antigens involved in complement and antibody binding, cell migration, and activation were acquired. The feasibility of our cytometric profiling approach was demonstrated by a successful classification of AS samples with a reduced subset of 80 statistically significant parameters, some of which are involved in antigen presentation and cell migration. Furthermore, these parameters allowed an error-free prediction of independent AS and ND samples originally not included for parameter selection. This study demonstrates a new level of multiparametric analysis in the post-transcriptomic era. The integration of an appropriate bioinformatic solution, presented here as the combination of a custom-made Access database with cluster- and prediction-analysis tools, predestines our approach to promote the human cytome project. © 2008 International Society for Analytical Cytology

During the last decade, non-hypothesis-driven high-throughput techniques such as genomics, transcriptomics, and proteomics dominated the bioscientific community and generated huge amounts of data, which, unfortunately, can currently be exploited only with difficulty for a detailed functional understanding of complex physiological and pathophysiological processes. The concept of cytomics addresses this void by using methods that enable molecular phenotyping at the single-cell level (1–3). In the field of cytomic research, mainly cytometric and bioimaging techniques have been applied (3). Cytometry allows a phenotypical characterization of cell populations at the single-cell level by measuring extra- and intracellular antigens alone or in combination with functional parameters, such as cytokine synthesis, cell proliferation, apoptosis, phagocytosis, and anabolic and catabolic activities (4–6).

The versatility and power of multiparameter flow cytometry were boosted by the latest technical developments in benchtop cytometers enabling simultaneous assessment of up to 17 different fluorescence colors (7, 8). Now, this technology opens up new vistas to dissect and phenotype leukocyte lineages and their malignant derivatives in more detail by the simultaneous detection of surface antigen and cytokine profiles (9).

In the field of chronic-inflammatory rheumatic diseases, numerous transcriptomic studies have been published using inflamed tissue, whole blood, or pre-enriched mononuclear cells (10–15). A definite functional interpretation of these data is partially limited by the heterogeneous cellular composition of the sample material used, a general problem of microarray studies addressed in various reviews (16–18). Therefore, we followed the concept of cell-specific expression arrays and dissected whole blood of different chronic-inflammatory rheumatic diseases into its major leukocyte populations (19). The finding that even a limited number of differentially expressed genes was sufficient for disease classification and prediction prompted us to develop a multiparametric cytometric approach for profiling suitable genes at the protein level. This challenge comprises not only the validation of gene expression data, but also the monitoring of a set of lineage- and activation-dependent antigens in terms of a personalized immunophenotyping.

Besides multicolor staining protocols, we present a software solution based on an Access database which allows storage and statistical analyses of preanalyzed flow cytometric data. As exemplified by a pilot study comparing cytometric profiles of blood from ankylosing spondylitis (AS) patients and healthy controls, our database identified overrepresented and hitherto unknown antigen combinations. These were further analyzed by classification tools originally used for the analysis of DNA microarrays with regard to their value as disease classifiers and predictors.


Study Subjects

For a pilot study, seven AS patients (four males, three females, mean age 46 years) were included. All had clinically active disease and were positive for HLA-B27. They were treated with nonsteroidal anti-inflammatory drugs (NSAIDs) such as ibuprofen or diclofenac, but not with disease-modifying antirheumatic drugs (DMARDs), such as sulfasalazine, or with antibody-based medications, such as TNF-alpha blockers.

The control group included seven healthy individuals (five males, two females, mean age 40 years) not treated with any medications and without any clinical signs of infection. Two of them were analyzed as replicates for determination of reproducibility.

The Ethical Committee of the Medical Faculty of the Charité, Berlin, approved this study, and written informed consent was obtained from all patients and healthy donors.

Blood Collection and Staining of Cells

Total leukocytes were separated from 10 ml of freshly collected, heparinized peripheral blood after lysis of erythrocytes at 4°C with a hypotonic buffer (EL-buffer; Qiagen, Hilden, Germany) according to the instructions of the manufacturer.

Absolute cell numbers were estimated with a CASY® cell counter (Schärfe System, Reutlingen, Germany) and were compared with cell numbers obtained by the cytometric TruCount method (BD Biosciences, Heidelberg, Germany) and, alternatively, with values obtained from a differential blood cell counter (Ortho Diagnostics Systems, Neckargemünd, Germany). Cells were treated with Fc-binding reagent (Beriglobin, CSL Behring GmbH, Hattersheim am Main, Germany) to block nonspecific staining. Next, cells were divided into 11 aliquots: 10 multicolor staining cocktails and 1 unstained sample. Staining was performed on ice within 10 min under light protection. For staining of up to 50 different cell-surface markers, appropriate monoclonal antibodies (mAbs) conjugated to seven fluorochromes were combined in defined combinations (Table 1). The following dyes were used: FITC, PE, PE-Cy5, PE-Cy7, APC, APC-Cy7, and Alexa 405, excited by three lasers. After staining, cells were fixed in 1% paraformaldehyde and stored at 4°C in the dark until measurement within 24 h.

Table 1. Configuration of eight staining cocktails used for the pilot study comparing blood samples from AS patients and normal donors
[Table 1 columns, by excitation laser: 405 nm, 488 nm, 633 nm]

Multiparametric Flow Cytometry Analysis

All measurements were performed on an LSR II instrument equipped with four fixed-alignment lasers (488 nm, 633 nm, 405 nm, and 355 nm) and filter configurations as shown in Table 2. To ensure optimal performance of the cytometer, quality control measurements were run with Rainbow Calibration Particles (Gerlinde Kiesker, Steinfurt, Germany) before and after the acquisition of every experiment. Once established, the optimal baseline PMT voltages were not changed except for the forward (FSC) and side scatter (SSC) channels.

Table 2. Cytometer setup as used in this study with four lasers, 15 photomultiplier tubes (PMT), and longpass-mirrors and bandpass-filters as indicated
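| Laser | PMT | Longpass mirror | Bandpass filter | Fluorochromes |
|---|---|---|---|---|
| Blue (488 nm) | A | 735 | 780/60 | **PE-Cy7** |
| | B | 650 | 670/14 | **PE-Cy5**, PerCP |
| | | | 695/40 | PE-Cy5.5, PerCP-Cy5.5 |
| | C | 600 | 585/42 | PE-TexasRed, PE-Alexa 610 |
| | E | 500 | 530/30 | **FITC**, Alexa 488 |
| | F | | 488/10 | SSC |
| Red (633 nm) | A | 735 | 780/60 | **APC-Cy7** |
| | B | 680 | 680/30 | Alexa 660 |
| | C | | 655/15 | **APC**, Cy5, Alexa 647 |
| Violet (405 nm) | A | 685 | 710/50 | Qdot 705 |
| | B | 505 | 545/70 | Alexa 430, Cascade Yellow |
| | C | | 440/40 | **Alexa 405**, DAPI |
| UV (355 nm) | A | 635 | 660/20 | Qdot 655 |
| | B | 550 | 610/20 | Qdot 605 |
| | C | | 440/40 | Alexa 350, DAPI |

Fluorochromes used in this study are highlighted in bold.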

Between 600,000 and 1,000,000 events were acquired for each staining cocktail. All antibodies used were commercially available (BD Biosciences) with the exception of the CD3 antibody (UCHT-1), which was conjugated with Alexa 405 using a commercially available labeling kit (Invitrogen, Karlsruhe, Germany).

Primary Data Analysis

Primary data analysis was performed with BD FACS Diva software v 4.1 (BD Biosciences, Heidelberg, Germany). Primary gates were set according to the FSC and SSC characteristics of typical lymphocyte, monocyte, and granulocyte populations. Because of the diversity of antibodies used, we did not use isotype controls, but always acquired unstained cells. Proper compensation was achieved with antibody capture beads (CompBeads; BD). These polystyrene microparticles bind to the kappa chain of the fluorochrome-conjugated antibodies and provide distinct positively and negatively stained populations (the latter from negative control beads), which were used by the instrument setup software to set compensation levels automatically.

Statistical Analysis

Gate and quadrant statistics including total event numbers and mean and median fluorescence intensities were exported via Excel into a custom-made Access database. A more detailed description is included in the results section. Genes@Work was used for sample classification by hierarchical clustering of phenotypical parameters (20). Distance measures were Euclidean distance or Pearson correlation with z-normalized parameter vectors, combined with average linkage.
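The clustering settings named above (z-normalized vectors, Euclidean or Pearson-based distance, average linkage) can be illustrated with a minimal sketch in plain Python. It is not the Genes@Work implementation; the function names and the toy sample values are hypothetical and serve only to show how the distance measures and the linkage rule interact.

```python
import math
from itertools import combinations

def z_normalize(vec):
    """Scale a vector to mean 0 and standard deviation 1 (fails on constant vectors)."""
    mean = sum(vec) / len(vec)
    sd = math.sqrt(sum((v - mean) ** 2 for v in vec) / len(vec))
    return [(v - mean) / sd for v in vec]

def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b):
    """1 - Pearson correlation, computed from z-normalized vectors."""
    za, zb = z_normalize(a), z_normalize(b)
    r = sum(x * y for x, y in zip(za, zb)) / len(a)
    return 1.0 - r

def average_linkage(items, dist):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = [[i] for i in range(len(items))]
    merges = []
    while len(clusters) > 1:
        best = None
        for ca, cb in combinations(clusters, 2):
            # Average linkage: mean of all pairwise distances between the two clusters.
            d = sum(dist(items[i], items[j]) for i in ca for j in cb) / (len(ca) * len(cb))
            if best is None or d < best[0]:
                best = (d, ca, cb)
        d, ca, cb = best
        clusters.remove(ca)
        clusters.remove(cb)
        clusters.append(ca + cb)
        merges.append((sorted(ca), sorted(cb), d))
    return merges

# Hypothetical toy profiles: three parameters per sample (values are invented).
samples = {
    "ND1": [1.0, 2.0, 3.0],
    "ND2": [1.1, 2.1, 2.9],
    "AS1": [3.0, 1.0, 0.5],
    "AS2": [2.9, 1.2, 0.4],
}
names = list(samples)
merges = average_linkage([samples[n] for n in names], euclidean)
for a, b, d in merges:
    print([names[i] for i in a], "+", [names[i] for i in b], round(d, 3))
```

Passing `pearson_distance` instead of `euclidean` switches the distance measure without changing the linkage step.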

The predictive power of differentially regulated parameters was tested with the class prediction tool PAM (Prediction Analysis for Microarrays, version 1.20) (21). This method is based on a shrunken centroid approach and uses cross-validation to choose the set of parameters with the smallest estimated misclassification error. In this way, 75 significant parameters were selected for an error-free classification of 5 + 2 ND (five donors plus two replicate measurements) and 5 AS samples in the training data set. The PAM analysis was validated using an independent test set of 2 ND and 2 AS samples, which were not included in the primary parameter selection.
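The shrunken centroid idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the code of the PAM package: the threshold `delta` is fixed by hand instead of being chosen by cross-validation, the offset `s0` and the scaling factor `m_k` are assumptions, and all data values and function names are invented.

```python
import math

def soft_threshold(d, delta):
    """Move d toward zero by delta; anything within +/-delta becomes 0."""
    return math.copysign(max(abs(d) - delta, 0.0), d)

def fit_shrunken_centroids(X, y, delta):
    """X: list of samples (lists of parameter values); y: class label per sample."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    rows = {k: [x for x, lab in zip(X, y) if lab == k] for k in classes}
    overall = [sum(x[i] for x in X) / n for i in range(p)]
    cent = {k: [sum(x[i] for x in rows[k]) / len(rows[k]) for i in range(p)]
            for k in classes}
    # Pooled within-class standard deviation of each parameter.
    s = [math.sqrt(sum((x[i] - cent[lab][i]) ** 2 for x, lab in zip(X, y))
                   / (n - len(classes))) for i in range(p)]
    s0 = sorted(s)[p // 2]  # offset: a middle value of s (assumption)
    scale = [si + s0 for si in s]
    shrunk_d, centroids = {}, {}
    for k in classes:
        m_k = math.sqrt(1.0 / len(rows[k]) - 1.0 / n)  # assumed scaling factor
        shrunk_d[k] = [soft_threshold((cent[k][i] - overall[i]) / (m_k * scale[i]), delta)
                       for i in range(p)]
        centroids[k] = [overall[i] + m_k * scale[i] * shrunk_d[k][i] for i in range(p)]
    priors = {k: len(rows[k]) / n for k in classes}
    return {"centroids": centroids, "shrunk_d": shrunk_d, "scale": scale, "priors": priors}

def predict(model, x):
    """Assign x to the class with the smallest standardized distance to its shrunken centroid."""
    def score(k):
        dist = sum(((xi - ci) / sc) ** 2
                   for xi, ci, sc in zip(x, model["centroids"][k], model["scale"]))
        return dist - 2.0 * math.log(model["priors"][k])
    return min(model["centroids"], key=score)

def active_parameters(model):
    """Indices of parameters that survive shrinkage in at least one class."""
    p = len(model["scale"])
    return [i for i in range(p)
            if any(d[i] != 0.0 for d in model["shrunk_d"].values())]

# Invented toy data: parameter 0 differs between groups, parameters 1 and 2 do not.
X = [[1.0, 5.0, 2.0], [1.2, 5.2, 2.1], [0.8, 4.9, 1.9], [1.0, 5.1, 2.0],
     [3.0, 5.0, 2.0], [3.2, 5.1, 2.1], [2.8, 5.2, 1.9], [3.0, 4.9, 2.0]]
y = ["ND"] * 4 + ["AS"] * 4
model = fit_shrunken_centroids(X, y, delta=2.0)
print(active_parameters(model), predict(model, [2.9, 5.0, 2.0]))
```

Raising `delta` removes more parameters from the classifier, which is how the method arrives at a reduced parameter set; in practice the threshold would be tuned by cross-validation as described above.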


No Activation and Loss of Cells During Lysis of Erythrocytes

For our cytometric profiling approach, we deliberately omitted pre-enrichment of leukocytes by density gradient centrifugation in order to acquire all mono- and polynucleated blood cell populations. Only erythrocytes and platelets were removed, by hypo-osmotic cell lysis and washing steps that were performed in a temporally standardized manner at 4°C. Counting the cells after lysis and comparing these numbers with blood cell counts determined in whole blood by the TruCount method or by a hemocytometer showed that no significant cell loss occurred (data not shown). Furthermore, for monocytes and CD4 lymphocytes isolated by magnetic or cytometric sorting procedures, lysis of erythrocytes at 4°C induced no significant transcriptional alterations compared with cells isolated directly from whole blood, as measured by a custom-made DNA microarray covering nearly 900 immunologically relevant genes (data not shown).

Establishment of Multiple Multicolor Stainings with Seven Fluorochromes and Detecting up to 12 Parameters per Staining

In total eight fixed, standardized multicolor stainings were used for an immunophenotyping of blood cells in a particular sample. Thus, we could monitor a wide array of leukocyte populations, such as memory and naïve B- and T-cell subsets, inflammatory and resident monocyte populations, NK cells, plasma cells, plasmacytoid and myeloid dendritic cells, neutrophils, and eosinophils (Fig. 1A and 1B). In addition, antigens involved in complement and antibody binding, migration, and cell activation were determined.

Figure 1.

(A) This array of dot plots shows a representative experiment of a normal donor and exemplifies the gating strategy for staining cocktail No. 1 (for antibody composition see Table 1). It is based on a seven-color and 12-parameter analysis. (B) This scheme summarizes, in a hierarchical manner, all cell populations and their subtypes covered by staining cocktail No. 1.

The first multicolor staining cocktail is presented in more detail as an example. It comprised antibodies against 10 different surface antigens, which were conjugated to only seven different fluorochromes (Table 1). Two colors, APC-Cy7 and PE, were used simultaneously for two and three antigens, respectively, since these antigens are lineage specific and can be clearly discriminated by appropriate gating as shown in Figure 1A. Figure 1B schematically summarizes, in a hierarchical manner, all major leukocytes and their corresponding subpopulations covered by this analysis.

In addition, a flexible number of multicolor stainings were included for the validation of candidate genes. These were usually combined with the main lineage markers CD3, CD4, CD14, CD19, and CD16.

Data Export, Storage, and Statistical Analyses by Access-Database

The hierarchical gating structure generated multidimensional data sets, which were tabulated by the FACS Diva software. For further downstream analyses, an Access database (db) was established that hosts mean and median fluorescence intensities and absolute cell numbers of appropriately gated cell populations along with mean and median fluorescence intensities of rainbow beads and unstained cells. Usually, the FACS Diva software acquired a data volume of 300–400 MB per analyzed sample, but only 64 KB were imported from CSV files into the Access-db. Thus it was ensured that all fluorescence signals are available, in an unbiased manner, for every population defined by appropriate gate or quadrant settings. As a form of data normalization, the db converted absolute fluorescence signals to relative values by generating ratios based on the fluorescence signals obtained from the rainbow beads. Relative cell numbers were calculated as percentages of all leukocytes and of major subtype populations, respectively. Absolute cell numbers were related to 1 μl of whole blood.
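The three derived quantities described above (bead-normalized intensities, relative cell numbers, and absolute cell numbers per μl) can be written down as a short Python sketch. The function names and all numbers are hypothetical; the actual calculations were carried out inside the Access database, and its exact formulas are not given here.

```python
def relative_intensity(population_mfi, bead_mfi):
    """Ratio of a population's fluorescence intensity to the rainbow-bead signal in the same channel."""
    if bead_mfi <= 0:
        raise ValueError("bead intensity must be positive")
    return population_mfi / bead_mfi

def percent_of(events, parent_events):
    """Event count of a gated population as a percentage of a parent population."""
    return 100.0 * events / parent_events

def cells_per_microliter(events, leukocyte_events, leukocytes_per_microliter):
    """Scale a population's share of all leukocytes to an absolute count per microliter of blood."""
    return events * leukocytes_per_microliter / leukocyte_events

# Invented example values for one gated population.
cd4_events = 80_000
lymphocyte_events = 200_000
leukocyte_events = 800_000
leukocytes_per_ul = 6_000  # assumed value from an independent cell count

print(relative_intensity(1200.0, 4800.0))          # normalized intensity
print(percent_of(cd4_events, leukocyte_events))    # percent of all leukocytes
print(percent_of(cd4_events, lymphocyte_events))   # percent of the parent subtype
print(cells_per_microliter(cd4_events, leukocyte_events, leukocytes_per_ul))
```

Dividing by the bead signal of the same channel makes intensities comparable between measurement days, provided the same bead lot and detector settings are used.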

Because Access databases are limited to 2 GB, the capacity of our Access-db is restricted to ∼3,500 samples, but migrating to MSSQL Server would overcome this limitation. In addition to the FACS data, basic clinical data, such as age, gender, diagnosis, and medications, are included. In principle, the db allows comparative analyses of groups or single samples, for example before versus after treatment or diseased versus healthy.

The current db structure contains 38 tables for data and information storage, 254 queries and unions for basic functions and calculations, 40 forms for interaction with db users, 214 macros for user-friendly functionalities, and 39 VBA modules with 9,942 lines of Visual Basic code (Fig. 2). The db can be extended with new parameters and new staining sets to accommodate future developments.

Figure 2.

Ten of the 40 different forms for interaction with users of the Access database are shown here as examples. The Main Form (in the middle) contains 23 different sheets, which contain all control elements for user interactions and data analysis. Top left is the form for data input of the exported FACS files from BD (after automatically converting them to Excel files). This form allows importing the complete set of data derived from all single antibody cocktails at once, or importing data of individual cocktails. The next form in clockwise direction is for adding new phenotypic parameters as defined by the primary FACS data software. Below that is the form for entering reference values. Another form is used for entering all relevant data of the Rainbow calibration beads, which are needed for data normalization. The database is prepared for adding completely new antibody cocktail tables with an entirely new set of populations. The next two forms are relevant for entering data of a new sample/individual. In the last form presented, absolute leukocyte numbers per blood volume are stored.

Feasibility Study Comparing Ankylosing Spondylitis and Normal Controls

The power of our approach was evaluated by analyzing a group of seven clinically well-characterized AS patients and gender-matched healthy control donors. In total, 894 parameter combinations per sample, defined by appropriate gate and quadrant settings and including 99 control parameters, were exported from the FACS Diva software. These multidimensional parameters may be described by averages of relative event numbers, absolute cell numbers per μl whole blood, and relative fluorescence intensities of all fluorochromes applied. As shown in Figure 3, a set of 80 statistically significant (Welch t-test P-value < 0.022) multidimensional parameters was obtained that allowed a clear classification of the diseased and healthy groups. This classification was visualized by unsupervised hierarchical clustering of the 80 parameters with the Genes@Work software normally used for analyzing gene expression data (20).
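To make the parameter selection step concrete, the following Python sketch computes the Welch t statistic and its approximate degrees of freedom for each parameter and ranks parameters by the absolute statistic. The parameter names and values are invented. Converting the statistic into a P-value additionally requires the t distribution, which is available, for example, through `scipy.stats.ttest_ind` with `equal_var=False`; that step is left out here to keep the sketch free of dependencies.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test.

    Both groups need at least two values and must not both be constant.
    """
    va = variance(a) / len(a)   # squared standard error of group a
    vb = variance(b) / len(b)   # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical parameters: (values in group 1, values in group 2).
parameters = {
    "param_A": ([10.0, 12.0, 11.0, 13.0, 12.0], [20.0, 22.0, 21.0, 19.0, 23.0]),
    "param_B": ([5.0, 6.0, 5.0, 6.0, 5.0], [5.0, 6.0, 6.0, 5.0, 5.0]),
}

# Rank parameters by the magnitude of the group difference.
ranked = sorted(parameters,
                key=lambda name: abs(welch_t(*parameters[name])[0]),
                reverse=True)
for name in ranked:
    t, df = welch_t(*parameters[name])
    print(name, round(t, 2), round(df, 1))
```

With several hundred parameters tested per sample, a cutoff on the P-value alone does not control the number of false positives, which is one reason why validation on independent samples matters.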

Figure 3.

Classification of blood samples obtained from five ankylosing spondylitis patients (indicated as AS) and five healthy controls (indicated as ND) by 80 phenotypic parameters obtained by cytometric profiling. Two ND samples were analyzed in duplicate at different time points of blood collection. Classifying parameters were identified by appropriate statistical analysis using Welch t-tests. Parameters (rows) and patient IDs (columns) were hierarchically clustered by Genes@Work. The color bar ranges from red to green, in which green indicates a relative down-regulation and red a relative up-regulation of parameters. The two independent AS and two independent ND samples framed in yellow were also used for prediction analysis (PAM) and were originally not included in the primary parameter selection process (for more details see Fig. 4).

Furthermore, the predictive power of the identified differentially regulated parameters was verified by a PAM analysis (21), and validated by the inclusion of four new independent samples originally not included for the parameter selection. Seventy-five out of 80 parameters were used and allowed an error-free prediction of AS patients and normal donors (ND) (Fig. 4). A minimum of nine parameters was sufficient to obtain a correct classification of all 7 + 2 ND and 7 AS datasets with PAM (data not shown). Clustering of these four new samples together with the primary samples also showed a correct allocation to the AS and ND branches within the cluster tree (Fig. 3).

Figure 4.

This figure shows the results of the PAM analysis performed with 75 out of 80 phenotypic parameters also used for the hierarchical cluster analysis (Fig. 3). The same samples as described in Figure 3 were used for learning (A) and prediction of new, independent samples (B) (train and test datasets). All AS (red dots) and all ND (blue dots) were correctly classified with a cross-validated probability of 1 (A) or a test probability of 1 (B).

When the results of the cluster analysis were examined for antigens primarily responsible for the successful discrimination of AS and healthy control samples, the following parameters stood out: CD1c in B lymphocytes, CD69 in CD4 lymphocytes, and CXCR4 in CD8 memory and effector lymphocytes (defined by CCR7 and CD45RA) and in a subpopulation of NK cells that was double-positive for CD62L and CD45RA. Furthermore, shifts between cell populations were observed within the CD8 compartment. While the proportion of memory cells was decreased in the AS patients, the pool of terminally differentiated effector cells (defined as CCR7− and CD45RA+) was significantly increased.


The term Cytomics was originally created in 2001 by molecular botanists (22) and covers the earlier introduced term of “system cytometry” (23). This new level of “omics” research aims to describe the molecular phenotype of single cells on a multidimensional level in health and disease. So far, promising approaches to put the idea of cytome research into practice are scarce in the literature. In this study we present an integrated approach combining state-of-the-art multichromatic flow cytometry with bioinformatic tools originally developed for the analysis of DNA microarray data. This experimental strategy was designed to detect disease-dependent changes in the representation of specific subsets of lymphocytes, monocytes, and NK cells along with a cytometric profiling of candidate genes obtained by transcriptome studies in the field of chronic-inflammatory rheumatic diseases (19). Our approach is an adaptation of the flow cytometry array (FCA) described by Hofmann and Zerwes (24), but includes far more parameters covering almost all major leukocyte populations in human peripheral blood. Moreover, it is the first study that demonstrates the feasibility of using 50 CD antigens in an unbiased manner to classify and predict an inflammatory disease, such as AS, in comparison with healthy control samples by implementing a powerful Access-db along with suitable DNA microarray software tools for parameter clustering and prediction analyses. Our study, together with a growing number of recently published reports, demonstrates the usefulness of bioinformatic tools that emerged from the requirements of global gene expression studies for analyzing multiparametric cytometric data sets (25–27). The methods described mainly encompass hierarchical clustering methods, functional component and prediction analyses, and data-mining techniques (for a review see Ref. 28). Hence, our study takes up the suggestions made in a recent communication by Lizard (29) to accompany the latest developments in multichromatic flow cytometry with new strategies for data analysis.

We did not intend to discuss the biological significance of the distinctive cellular markers identified so far, since the primary focus of this work was on demonstrating the proof of principle of our cytometric profiling approach. The successful prediction of independent AS and ND blood samples makes it reasonable to assume that our approach may also be useful for predicting responsiveness to modern biologicals, such as TNF-alpha blockers or B cell-directed antibodies. Therefore, the monitoring of treatment studies for risk and therapy stratification in the field of chronic-inflammatory rheumatic diseases is currently under way using the same antibody cocktails as described here for our pilot study.

Predictive gene signatures have already been identified by global gene expression studies in the field of rheumatological disorders (30–32). At present, it remains questionable for several reasons whether this technology can be adapted for diagnostic and prognostic applications (33–36). To our knowledge, at the time of preparing this manuscript no licensed expression array is available for clinical applications. This gap may be closed by a suitable cytometric approach as presented here, since it is cheaper, more flexible in parameter setup, and much easier to perform than the procedure applied for gene expression arrays. Finally, unlike gene expression studies, a cytometric profiling approach requires no time-consuming cell sorting steps.

In our experience from this study, primary data analysis with conventional flow cytometry software, such as FACS Diva or FlowJo, which is so far necessary for correcting compensation and adjusting the gates of particular cell populations, is the most time-consuming step before complex group comparisons can be performed. Therefore, we would go well beyond the suggestions of Lizard (29) and state that, for future developments, it seems mandatory to develop software solutions that work independently of any user-defined gate settings. Such software should include artificial intelligence-based algorithms for the automated identification of populations defined by any conceivable parameter combination, limited only by the total number of antibodies included. Roederer and coworkers already addressed this particular challenge with an algorithm based on multivariate probability binning (37, 38). Thus, a new level of complexity in multidimensional data analysis would be achieved and could allow a completely unbiased mode of data generation comparable to that known from global gene expression studies.

Finally, to overcome the physical restrictions imposed by fluorescence as the detection principle, new physical methods should be established that would permit the combination of unlimited numbers of antibodies without any interference within the detection system. Realizing these prospects would raise cytomics to a new level and thereby inspire all disciplines in the field of systems biology.


The authors acknowledge the contribution of Sylvia Pade and Lothar Goldschmidt for their careful coordination of patient acquisition.