Generation of flow cytometry data files with a potentially infinite number of dimensions

Authors

  • Carlos E. Pedreira,

    1. Faculty of Medicine and COPPE, Engineering Graduate Program, UFRJ/Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
  • Elaine S. Costa,

    1. Instituto de Pediatria e Puericultura Martagão Gesteira and Departamento de Clínica Médica, UFRJ/Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
  • Susana Barrena,

    1. Cytometry Service, Department of Medicine and Cancer Research Center (IBMCC, University of Salamanca-CSIC), University of Salamanca, Salamanca, Spain
  • Quentin Lecrevisse,

    1. Cytometry Service, Department of Medicine and Cancer Research Center (IBMCC, University of Salamanca-CSIC), University of Salamanca, Salamanca, Spain
  • Julia Almeida,

    1. Cytometry Service, Department of Medicine and Cancer Research Center (IBMCC, University of Salamanca-CSIC), University of Salamanca, Salamanca, Spain
  • Jacques J. M. van Dongen,

    1. Department of Immunology, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
  • Alberto Orfao

    Corresponding author
    1. Cytometry Service, Department of Medicine and Cancer Research Center (IBMCC, University of Salamanca-CSIC), University of Salamanca, Salamanca, Spain
    • Centro de Investigación del Cáncer, Paseo de la Universidad de Coimbra, s/n, Campus Miguel de Unamuno, 37007 Salamanca, Spain

  • Disclosure of Information: None of the authors is employed by Cytognos S.L., nor does any of them own a commercial stake in this company. Nevertheless, Cytognos S.L. is part of the EU-supported EuroFlow Research Consortium, has implemented some of the algorithms described in the present study in its proprietary software INFINICYT, and holds a contractual license for several patents owned by the University of Salamanca, of which A. Orfao, C. E. Pedreira, and E. S. Costa are inventors.

Abstract

Immunophenotypic characterization of B-cell chronic lymphoproliferative disorders (B-CLPD) is associated with the use of increasingly larger panels of multiple combinations of 3 to ≥6 monoclonal antibodies (MAb), data analysis being separately performed for each of the different stained sample aliquots. Here, we describe and validate an automated method for calculation of flow cytometric data from several multicolor stainings of the same cell sample, i.e., the merging of data from different aliquots stained with partially overlapping combinations of MAb reagents (focusing on ≥1 cell populations), into one data file as if it concerned a single “super” multicolor staining. The performance of the method was evaluated in a group of 60 B-CLPD cases studied at diagnosis with 18 different reagents in a panel containing six different 3- and 4-color stainings, which systematically contained CD19 for the identification of B-cells. Our results show a high degree of correlation and agreement between originally measured and calculated data about cell surface stainings, providing a basis for the use of this approach for the generation of flow cytometric data files containing information about a virtually infinite number of stainings for each individual cellular event measured in a sample, using a limited number of fluorochrome stainings. © 2008 International Society for Advancement of Cytometry

At present, flow cytometric immunophenotyping is an essential tool for the diagnostic characterization of neoplastic cells from patients with B-cell chronic lymphoproliferative disorders (B-CLPD) (1–5), for their prognostic evaluation (6, 7) and for disease monitoring prior to or after therapy (8). In addition, flow cytometric immunophenotyping is widely used to evaluate the expression of antigens targeted by antibody-based therapies, particularly in primary lymphomas (9–12). Because of the relatively high number of different B-CLPD entities and the complexity of their phenotypes (5, 13, 14), an increasingly large number of monoclonal antibody (MAb) reagents is typically used at diagnosis for the identification and characterization of these entities and their aberrant phenotypes (13–17). Accordingly, panels of up to 20 or more MAb are commonly evaluated (13–17). Despite this, flow cytometry instruments currently used in clinical diagnostic laboratories have relatively limited multicolor capabilities, being capable of simultaneously measuring the expression of between three and six antigens (13–19); more recently, instruments that can simultaneously measure up to nine different markers have become available in some clinical laboratories (20). Because of the multicolor limitations of flow cytometers, panels of reagents for the characterization of B-CLPD must contain two or more combinations of MAb. In such cases, inclusion of backbone reagents (e.g., anti-CD19 and/or anti-CD20), aimed at the identification of the cell population of interest (e.g., B-cells) in all MAb combinations in a panel, is recommended and frequently used (14, 15). The inclusion of common backbone reagents in each of the combinations of MAb used to stain cells present in a sample increases the reproducibility of the gating strategy used to select specific cell populations in a sample for the evaluation of their overall phenotypic characteristics (15).
In addition, this strategy has been recently shown to require strict supervision by an experienced operator, since automated adjustment of gates between different data files may result in inappropriate detection of specific cell populations (21). However, in a sample for which two or more aliquots were separately stained with different combinations of MAb, single cellular events are associated with only part of all the information/parameters evaluated, and the correlation between the patterns of antigen expression observed in one aliquot and those of another staining cannot be directly obtained and requires an experienced operator (21, 22). In some cases, a new aliquot of the sample must be stained with the precise combination of reagents whose evaluation requires a direct correlation in single cells. This can only be done if the reagents available at that moment in the laboratory are conjugated with different, compatible fluorochromes.

In the present paper, we describe and validate an automated method for calculation of flow cytometric data from several multicolor stainings of the same cell sample, i.e., the merging of data from different aliquots stained with partially overlapping combinations of MAb reagents (focusing on one or more cell populations), into one data file as if it concerned a single “super” multicolor staining. Based on a series of 60 peripheral blood (PB) and bone marrow (BM) samples from an identical number of B-CLPD patients, we show a high degree of agreement between the originally measured values and the calculated data, providing support for the possibility of generating data files containing information about a virtually infinite number of stainings for each individual cellular event measured in a sample, using a limited number of fluorochrome stainings.

DESIGN AND METHODS

Peripheral Blood Samples

A total of 60 EDTA-anticoagulated PB (n = 48) and BM (n = 12) samples from an identical number of patients (32 males and 28 females; mean age of 67 years, ranging from 42 to 89 years) diagnosed with B-CLPD were analyzed in the present study. All patients were studied at diagnosis and they were grouped according to the WHO criteria (23), as follows: B-cell chronic lymphocytic leukemia (B-CLL), 29 patients (23 typical and 6 atypical B-CLL cases); mantle cell lymphoma (MCL), 8; splenic marginal zone lymphoma, 3; mucosa-associated lymphoid tissue (MALT) lymphoma, 3 cases; follicular lymphoma (FL), 4 patients. Seven cases showed biclonal/composite B-CLPD, corresponding to the coexistence of two different B-CLL clones in three cases, a B-CLL coexisting with a diffuse large B-cell lymphoma (DLBCL) in two patients and a FL coexisting with a DLBCL and a B-CLL plus an unclassifiable B-CLPD in one patient, each. From the remaining cases, two had an unclassifiable B-CLPD and the other four corresponded to BM samples carrying undetectable infiltration by neoplastic B-cells from patients with DLBCL, MALT lymphoma, MCL, and Burkitt lymphoma, respectively. Median white blood cell (WBC) and lymphocyte counts were 42 × 10⁹ leukocytes/L (range: 2.9–363 × 10⁹ leukocytes/L) and 25 × 10⁹ lymphocytes/L (range: 0.8–329 × 10⁹ lymphocytes/L), respectively. Overall, the median percentage of neoplastic B-cells in the 56 infiltrated specimens was 34 ± 29% (range: 0.1–92%), being similar in PB and BM samples: median of 36% (range 0.1–92%) versus 33% (range 2.2–86%), respectively. The study was approved by the local Ethical Committee of the University Hospital of Salamanca (Salamanca, Spain) and all individuals gave their informed consent prior to entering the study.

Multiparameter Flow Cytometric Immunophenotyping Studies

Multiparameter flow cytometric analysis of each PB and BM sample was performed using the following panel of three- and four-color combinations of MAb reagents, listed as fluorescein isothiocyanate (FITC)/phycoerythrin (PE)/peridinin chlorophyll protein-cyanin 5.5 (PerCP/Cy5.5)/allophycocyanin (APC): FMC7/CD24/CD19/CD34, antihuman surface immunoglobulin λ light chains (sIgλ)/sIgκ/CD19/CD5, CD22/CD23/CD19/CD20, CD103/CD25/CD19/CD11c, CD43/CD79b/CD19/–, cytoplasmic (Cy) Bcl2/CD10/CD19/CD38. In addition, a tube containing only CD19-PerCP/Cy5.5 was stained and analyzed in parallel for each sample to evaluate the autofluorescence levels of both normal and neoplastic B-cells present in the sample. In a subgroup of samples (n = 4), additional multiple stainings (–/CD5/CD19/CD11c; –/CD20/CD19/CD11c; FMC7/–/CD19/CD43; –/CD23/CD19/sIgκ; –/CD24/CD19/CD79b and –/CD24/CD19/sIgκ) were performed for further comparison of actually measured and calculated data for the same markers.

Briefly, pretitrated amounts of each MAb in a combination were added to separate aliquots containing between 0.5 and 1 × 10⁶ WBC in 100 μL of PB or phosphate-buffered saline (PBS; pH = 7.4)-diluted BM. After gentle mixing, the different sample aliquots were incubated with the corresponding MAb mixtures for 15 min at room temperature (RT) in the dark. After this incubation period, 2 mL of FACS lysing solution [Becton/Dickinson Biosciences (BDB), San José, CA] diluted 1/10 (v/v) in distilled water was added and another 10 min incubation was performed at RT, in the dark. Samples were then washed once in 4 mL PBS/aliquot (5 min at 540 g) and measured in a FACSCalibur flow cytometer (BDB). For the staining of sIgλ and sIgκ, samples were washed twice with 2 mL PBS containing 0.2% bovine serum albumin prior to adding the antibody reagents. For the staining of Cy-Bcl2, the Fix & Perm reagent kit (Invitrogen, Carlsbad, CA) was used, strictly following the recommendations of the manufacturer, after staining for surface CD10, CD19, and CD38. For each sample aliquot, information about a total of 5 × 10⁴ leukocytes was acquired and stored using the CellQUEST software program (BDB). For samples with a low (<10%) B-cell percentage, additional information about a total of 5 × 10⁴ CD19+/SSClo B-cells was acquired through an electronic live-gate set on a CD19 versus SSC dot plot and stored using the CellQUEST software, as previously described (15).

Merge of Flow Cytometry Data Files and Calculation of Flow Cytometric Data

Sequential merge of all data files from several multicolor stainings of the same cell sample was performed using the INFINICYT™ software program (Cytognos SL, Salamanca, Spain), following a slightly modified version of a previously reported approach (24, 25). Similarly, the calculation function of the INFINICYT™ software, based on nearest-neighbour statistical tools (26, 27), was used to calculate the information about each individual parameter not actually measured in an individual event for the overall panel of markers analyzed; such calculation was done for each event measured. Accordingly, three parameters were measured in common in all multicolor stainings of the same cell sample: forward light scatter (FSC), side light scatter (SSC) and CD19-PerCP/Cy5.5; all other parameters were measured only for that subset of cellular events corresponding to the specific three- and/or four-color staining where they were specifically assessed. Briefly, for each event, a vector in a three-dimensional space was built based on the data measured for the three common parameters (FSC, SSC, and CD19-associated red fluorescence). Then, the nearest neighbor for each individual event in a data file/sample aliquot was calculated as that event in another file/aliquot showing the shortest distance to it in the three-dimensional space generated by the parameters (FSC, SSC, and CD19) measured in common in both data files/sample aliquots. Finally, for each individual event in a data file, the values obtained for the closest events in the other data files were assigned to each of those parameters not actually measured in the former event.

The process followed for matching events from different data files was as follows. Formally, the three common parameters (FSC, SSC, and CD19-PerCP/Cy5.5) were labeled as k = 1, k = 2, and k = 3, respectively. The remaining 14 parameters were labeled in sequence from k = 4 to k = 17. Let $x_k^j(i)$ denote parameter k for the ith observation in tube j. Denote $\mathbf{x}^j(i) \equiv (x_1^j(i), x_2^j(i), x_3^j(i)) \in \Re^3$ as a vector containing the measurements of the common parameters for the ith observation in tube j. Let ℑj be the set of parameters not present in tube j [e.g., for the third tube (j = 3), ℑ3 = {4, 5, 6, 7, 8, 9, 13, 14, 15, 16, 17}]. Let #j be the number of observations in tube j and #ℑj the number of elements in set ℑj. Then proceed with the sequential steps listed below.

Step 1: Take tube j (start by taking the first tube, i.e., set j = 1).
Step 2: Determine the parameter to be estimated in tube j by setting k = ℑj(s). Start by setting s = 1 [e.g., for the third tube (j = 3), start with k = ℑ3(1) = 4].
Step 3: Take an observation in tube j (start by setting i = 1).
Step 4: Find a tube h ≠ j that contains parameter k [e.g., to estimate parameter k = 10 in tube 1 (j = 1), one would have s = 4, ℑ1(4) = 10 and h = 3, corresponding to the third tube].
Step 5: Find the element r in tube h for which the distance between $\mathbf{x}^j(i)$ and $\mathbf{x}^h(r)$ is minimized. Set $x_k^h(r)$ as the calculated value for $x_k^j(i)$. Note that the maximum distance accepted between these two elements was 90 (the maximum distance possible in the $\Re^3$ space composed of the common parameters is $1{,}024 \times \sqrt{3}$).
Step 6: Increment i = i + 1. If i ≤ #j, go to Step 3; else set i = 1.
Step 7: Increment s = s + 1. If s ≤ #ℑj, go to Step 2; else set j = j + 1.
Step 8: If j ≤ 6, go to Step 1, and proceed again with the sequence described until information for all parameters in all events has been calculated.
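As a minimal sketch, the matching procedure can be written in Python as follows. The parameter numbering in `TUBE_PARAMS` is an assumption chosen to agree with the examples given in the text (ℑ3 and ℑ1(4) = 10); the function names and the representation of an event as a dictionary keyed by k are hypothetical, and the brute-force search stands in for whatever search the INFINICYT software actually performs. The sketch loops over events first and missing parameters second, which yields the same assignments as the parameter-then-event order of the steps. The text does not state what happens when no neighbour lies within the accepted distance, so such values are left as `None` here.

```python
import math

# Assumed numbering: k = 1..3 are FSC, SSC and CD19; k = 4..17 are tube-specific.
TUBE_PARAMS = {
    1: [4, 5, 6],      # FMC7, CD24, CD34
    2: [7, 8, 9],      # sIg-lambda, sIg-kappa, CD5
    3: [10, 11, 12],   # CD22, CD23, CD20
    4: [13, 14, 15],   # CD103, CD25, CD11c
    5: [16, 17],       # CD43, CD79b
}
COMMON = (1, 2, 3)
MAX_DISTANCE = 90.0  # acceptance threshold quoted in the text


def missing_params(tube):
    """Sorted list of parameters not measured in `tube` (the set I_j)."""
    measured = set(TUBE_PARAMS[tube])
    every = sorted(k for ks in TUBE_PARAMS.values() for k in ks)
    return [k for k in every if k not in measured]


def source_tube(k):
    """Tube h in which parameter k was actually measured."""
    return next(h for h, ks in TUBE_PARAMS.items() if k in ks)


def common_vector(event):
    """Vector of the three common parameters for one event."""
    return tuple(event[k] for k in COMMON)


def calculate_missing(tubes):
    """tubes: {tube number: [event, ...]}, each event a dict {k: value}.

    Returns copies of the events in which every missing parameter is taken
    from the nearest neighbour (Euclidean distance on the common parameters)
    in the tube that measured it, or None if no neighbour is close enough.
    """
    result = {}
    for j, events in tubes.items():
        filled = []
        for event in events:
            new_event = dict(event)
            x = common_vector(event)
            for k in missing_params(j):
                candidates = tubes.get(source_tube(k), [])
                best = min(
                    candidates,
                    key=lambda e: math.dist(x, common_vector(e)),
                    default=None,
                )
                close = best is not None and math.dist(x, common_vector(best)) <= MAX_DISTANCE
                new_event[k] = best[k] if close else None
            filled.append(new_event)
        result[j] = filled
    return result
```

The exhaustive search is quadratic in the number of events; for files of 5 × 10⁴ events per aliquot a spatial index over the three common parameters would be the natural replacement, at the cost of a less transparent example.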

The data file obtained after merging the original three- and four-color (five- and six-parameter) data files and calculating the values for each event in the data file was a file containing information about all parameters measured in all multicolor stainings for each of the events recorded. In practice, each merged/calculated data file contained information about all parameters measured, for each of the ≥2.5 × 10⁵ events analyzed per sample (five aliquots/sample × ≥5 × 10⁴ events/aliquot).

In the present study, data obtained with intracellular stainings were analyzed separately, because two of the three common parameters (FSC and SSC) showed variable positions for B-cells: unlike the other (surface) stainings, cells from this particular aliquot underwent additional fixation and permeabilization steps, which resulted in variable changes in the light scatter parameters.

Generation of Simulated Data Files

Two simulated data files were computationally created, one of them in ℜ⁶ and the other with the same parameters except one (in ℜ⁵). The first data file was generated by randomly assigning values to 50,000 events according to five Gaussian probability distribution functions (GPDF) in ℜ⁶, by using the mean and covariance matrix values obtained from the analysis of a real data file containing flow cytometry measurements of events corresponding to lymphocytes, monocytes, eosinophils, and basophils from a normal peripheral blood stained with HLA-DR FITC, CD33 PE, CD45 PerCP-Cy5.5, and CD14 APC. The second simulated data file was generated by randomly assigning values to 50,000 events according to five GPDF in ℜ⁵ (10,000 events for each GPDF), by using the mean and covariance matrix values obtained for all the above listed parameters except CD33-PE, measured for the cell populations from the real data file mentioned above. Subsequently, the two simulated files were merged. Values corresponding to the parameter excluded from the second data file (CD33) were calculated according to the procedures described above (Fig. 1).
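The generation of such simulated files can be sketched as follows, using only the Python standard library. The sketch draws events from multivariate Gaussians through a Cholesky factor of the covariance matrix; the function names are hypothetical, and the mean and covariance values are supplied by the caller, since the values estimated from the real data file are not given in the text. Dropping one parameter from a Gaussian amounts to deleting its entry in the mean vector and its row and column in the covariance matrix.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L * L^T == cov (cov symmetric positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def sample_population(mean, cov, n_events, rng):
    """Draw n_events vectors from a Gaussian with the given mean and covariance."""
    low = cholesky(cov)
    dim = len(mean)
    events = []
    for _ in range(n_events):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        events.append(
            [mean[i] + sum(low[i][m] * z[m] for m in range(i + 1)) for i in range(dim)]
        )
    return events


def drop_parameter(mean, cov, index):
    """Marginal Gaussian obtained by removing one parameter."""
    keep = [i for i in range(len(mean)) if i != index]
    return [mean[i] for i in keep], [[cov[i][j] for j in keep] for i in keep]


def simulate_file(populations, n_per_population, rng, drop=None):
    """populations: list of (mean, cov) pairs, one per cell population."""
    events = []
    for mean, cov in populations:
        if drop is not None:
            mean, cov = drop_parameter(mean, cov, drop)
        events.extend(sample_population(mean, cov, n_per_population, rng))
    return events
```

With five `(mean, cov)` pairs in six dimensions and 10,000 events per population, `simulate_file(populations, 10_000, rng)` would give a 50,000-event file in ℜ⁶, and the same call with `drop` set to the index of CD33 would give its ℜ⁵ counterpart.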

Figure 1.

Bivariate dot plot histograms illustrating the results of calculating flow cytometric data for events contained in a simulated data file for a parameter for which originally no values were assigned. In panels A–C, bivariate dot plot representations of two (parameter A and B; SSC and CD14-APC, respectively) parameters in common in the original two simulated data files are displayed for data file 1 (panel A) and for data file 2 (panels B and C). In panels D and G, data corresponding to events displayed in the original data file 1 used to calculate the values for parameter C (CD33-PE) for those events contained in data file 2 are displayed. Panels E and H show that no information about parameter C (CD33-PE) existed originally for those events contained in data file 2. Panels F and I show the calculated values for parameter C (CD33-PE) for the individual events contained in data file 2.

Flow Cytometric Data Analysis

The FACSDiva software program (BDB) was used for the analysis of both the originally measured and the calculated data. For that purpose, merged/calculated data files acquired in a FACSCalibur flow cytometer were imported into the FACSDiva software by converting the imported data into relative linear units scaled from 0 to 262,144. For each individual sample, the percentage of neoplastic B-cells was recorded together with the mean fluorescence intensity (MFI; arbitrary relative linear units scaled from 0 to 262,144) and the coefficient of variation (CV) of the amount of expression observed for each individual antigen, after separately gating for the neoplastic and normal residual B-cells (and B-cell precursors in case of BM samples) present in each sample. For this purpose, total B-cells were identified as those CD19+ events showing low to intermediate FSC and SSC values, after specifically excluding platelets and cell debris, according to previously described methods (15). Normal mature B-lymphocytes were identified as those events being CD19+, CD20hi, CD10−, CyBcl2+, CD38−/lo, CD22++, CD23−/lo, CD43−, CD79b+, FMC7+, CD103−, CD25−/lo, CD11c−/+, CD5−/lo, and sIgκ+ or sIgλ+ with a sIgκ+/sIgλ+ B-cell ratio of between 2:1 and 1:1. In turn, normal BM B-cell precursors were identified as those CD19+, CD20−/lo, CD10+, CyBcl2lo, CD38hi, CD22lo, CD23−, CD43+, CD79b−, FMC7−, CD103−, CD25−, CD11c−, CD5−, and sIg−/+ events (15). Neoplastic B-cells were all other mature B-lymphocytes showing an aberrant phenotype, as illustrated in Figure 2 for a PB sample from a patient with B-CLL (panels A–F) and for a BM sample from a patient with MCL (panels G–L). The following gating strategy was used for data analysis. Firstly, for each staining, both a “P1 gate” (for actually measured events) and a “P2 gate” (for calculated events) were generated, using the FSC versus file number dot plot (Fig. 3A).
Then a third gate was set in the whole merged/calculated data file (i.e., for both actually measured and calculated events) to define CD19+ cells; this gate (CD19+ gate) was set in FSC versus SSC and SSC versus CD19 bivariate dot plot histograms (Fig. 3B). In the following step, three hierarchical gates were established for those events contained in the “CD19+ gate” aimed at the identification of normal mature B lymphocytes, B-cell precursors, and neoplastic B cells, according to the criteria described above. Of note, in some merged/calculated data files/samples, one or more of these three B-cell subpopulations could not be identified. Accordingly, B-cell precursors were not identified in data files corresponding to PB samples; neoplastic B-cells were not detected in the four samples that did not show detectable tumor infiltration; and normal mature B lymphocytes could not be identified in 27 samples, where these cells were outnumbered by their neoplastic counterpart. Afterward, each of the three B-cell subset gates was intersected with the “P1 gate” to create the following three new gates: “actually measured (AM) normal mature B-lymphocytes,” “AM B-cell precursors,” and “AM neoplastic B-cells.” In a similar way, each of the three B-cell gates was intersected with the “P2 gate” to establish three new gates for the identification of “calculated normal mature B-lymphocytes,” “calculated B-cell precursors,” and “calculated neoplastic B-cells.” This sequence was repeated for the analysis of the data corresponding to each staining in the merged/calculated data file.
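The gate intersections described above can be illustrated with a short sketch in which every gate is a predicate on an event and an intersection keeps the events accepted by all gates. The field names, the file numbers and the numeric thresholds below are hypothetical placeholders chosen for illustration; they are not the gates drawn in the FACSDiva software for this study.

```python
def intersect(events, *gates):
    """Keep the events that fall inside every gate (each gate is a predicate)."""
    return [e for e in events if all(gate(e) for gate in gates)]


def measured_gate(file_number):
    """'P1 gate': events acquired in the aliquot whose staining is analyzed."""
    return lambda e: e["file"] == file_number


def calculated_gate(file_number):
    """'P2 gate': events from the other aliquots, whose values were calculated."""
    return lambda e: e["file"] != file_number


# Hypothetical thresholds, for illustration only.
def cd19_gate(e):
    return e["CD19"] >= 500 and e["SSC"] <= 300


def neoplastic_gate(e):
    return e["CD23"] >= 1000


events = [
    {"file": 3, "SSC": 120, "CD19": 800, "CD23": 4000},  # measured, neoplastic
    {"file": 1, "SSC": 110, "CD19": 820, "CD23": 3900},  # calculated, neoplastic
    {"file": 1, "SSC": 130, "CD19": 790, "CD23": 200},   # calculated, other B-cell
    {"file": 3, "SSC": 600, "CD19": 50, "CD23": 100},    # measured, not a B-cell
]

# CD23 was stained in file 3 in this toy example.
am_neoplastic = intersect(events, cd19_gate, neoplastic_gate, measured_gate(3))
calc_neoplastic = intersect(events, cd19_gate, neoplastic_gate, calculated_gate(3))
```

Keeping the measured and calculated gates as complements of each other on the file number ensures that every CD19+ event of a subset falls into exactly one of the two groups being compared.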

Figure 2.

Bivariate dot plot histograms illustrating the gating strategy used for the identification of different B-cell subpopulations present in a PB sample from a patient diagnosed with typical B-cell chronic lymphocytic leukemia (B-CLL) (panels A–F) and in a BM sample from a patient diagnosed with mantle cell lymphoma (MCL) (panels G–L). Firstly, B-cells were identified based on CD19 expression and their unique scatter characteristics (panels A, B, G, and H). In panels A–F, normal mature B-lymphocytes (red dots) and monoclonal sIgκlo B-CLL cells (green dots) were identified based on their unique phenotypic features. As shown in panels C–E, normal mature B-lymphocytes were identified as being CD23−/lo, CD22+, CD20++, CD5−, and either sIgκ+ or sIgλ+ (sIgκ+/sIgλ+ ratio of 1), whereas neoplastic B-cells showed a CD23+, CD22lo, CD20lo, CD5lo/+, and sIgκlo phenotype. In panels G–L, B-cell precursors (blue dots), normal mature B-lymphocytes (red dots), and monoclonal sIgλ+ MCL cells (green dots) were also identified based on their unique phenotypic features. As shown in panels I–K, B-cell precursors were identified as CD23−, CD22+lo, CD20−/+, CD5−, and sIg−; normal mature B-lymphocytes were identified as described above and neoplastic B-cells showed a CD23−, CD22+, CD20++, CD5+, and sIgλ+ phenotype. Panels F and L show the distribution of CD19+ events corresponding to the different B-cell populations present in the sample in the distinct sample aliquots measured.

Figure 3.

Illustrative example of the merging and calculation processes performed on a set of five original data files corresponding to five aliquots of a representative PB sample from a patient diagnosed with a biclonal/composite B-CLPD carrying a B-CLL together with a splenic marginal zone B-cell lymphoma. The data corresponding to the merged and calculated data files are shown in panel A. Data about those CD19+/SSClo events stained with the CD22/CD23/CD19/CD20 combination, prior to data calculation, are depicted in panels B–D, showing that, while CD22 and CD23 were originally measured simultaneously in this group of events (panel C), that was not the case for CD20 and CD5 (panel D). In the following dot plots (panels E–G), the same gating strategy is shown for CD19+/SSClo events stained with combinations of monoclonal antibodies other than CD22/CD23/CD19/CD20 after data calculation; as shown in panels F and G, the pattern of expression of CD22 and CD23 observed for the calculated data is identical to that of the actually measured, original data (panel C). In addition, two-dimensional dot-plot representations, corresponding to combinations of antibodies conjugated with the same fluorochrome but not obtained by direct staining of cells, were generated (e.g., panel G); the patterns of antigen expression observed were in line with what could be expected for both populations of neoplastic B-cells (panels H–M).

Statistical Methods

All numerical and coded data derived from flow cytometric immunophenotyping studies were entered into a database using the SPSS program (SPSS 12.0, Chicago, IL) and MATLAB program (Mathworks, Natick, MA). For each continuous variable analyzed, mean values and their standard deviation, as well as the median and 95% confidence interval, were calculated. To assess the statistical significance of the differences observed between groups, the Mann–Whitney U test was used. Pearson correlation and Bland–Altman plots were used for further comparison of originally measured and calculated flow cytometry data and to assess the degree of agreement between the different sets of measured and calculated data, respectively. For Pearson correlations, r², slope and Y-intercept (Yint) were recorded. Y-intercept values were normalized by the maximum value observed for actually measured (AM) data (VMax) of each variable evaluated, as follows: normalized Y-intercept = (Yint/VMax) × 100. P-values <0.05 were considered to be associated with statistically significant differences.
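A minimal sketch of these comparison statistics is given below. It assumes that the measured data are taken as the independent variable and the calculated data as the dependent variable, and that the Bland–Altman limits of agreement are the conventional mean difference ± 1.96 standard deviations; neither choice is stated in the text, and the function names are hypothetical.

```python
import statistics


def linear_fit(measured, calculated):
    """Least-squares line calculated = slope * measured + intercept, with r squared."""
    mx = statistics.fmean(measured)
    my = statistics.fmean(calculated)
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in calculated)
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, calculated))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy)
    return r2, slope, intercept


def normalized_intercept(intercept, measured):
    """Normalized Y-intercept = (Yint / VMax) x 100, VMax being the largest AM value."""
    return intercept / max(measured) * 100.0


def bland_altman(measured, calculated):
    """Mean difference (bias) and assumed 1.96-SD limits of agreement."""
    diffs = [y - x for x, y in zip(measured, calculated)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

For example, with measured values 1, 2, 3, 4 and calculated values 3, 5, 7, 9 the fit has slope 2 and Y-intercept 1, so the normalized Y-intercept is (1/4) × 100 = 25.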

RESULTS

As illustrated in Figure 1, no differences were found in the simulated data files between the original and the calculated data for the CD33 parameter, the correlation coefficient (r²) observed between the original and calculated data being 0.93 (slope 0.978; Y-intercept 1). Figure 3 illustrates the strategy used for the identification of CD19+ B-cells present in each individual sample analyzed and the specific analysis of the phenotypic characteristics of the different B-cell populations contained among CD19+ events. This figure also shows the comparison between the patterns of expression of individual markers for those parameters originally measured in individual multicolor stainings and the values calculated for the same parameters for those events included in other aliquots of the same cell sample stained with different combinations of MAb; as displayed, data calculation allowed us to build new bivariate dot plots (Figs. 3H–3M) containing data from different multicolor stainings that would have been impossible to obtain using direct flow cytometry measurements (e.g., evaluation of combined expression of two or more markers conjugated with the same fluorochrome). Comparison between the patterns of antigen expression obtained in these originally impossible bivariate dot-plots using calculated data versus real measurements obtained after staining for the same pairs of antibodies conjugated with different fluorochromes (n = 4) showed that, despite expected differences in fluorescence intensity due to the use of different fluorochromes, similar profiles were observed when real versus calculated data were plotted, as illustrated in Figure 4.

Figure 4.

Illustrative example of the performance of the data calculation approach used in this study: comparison between merged and calculated data files and real data files obtained with the same antibodies conjugated with different fluorochromes. Bivariate dot-plots corresponding to the real versus calculated data for the same pairs of monoclonal antibodies (conjugated with different fluorochromes) are shown in panels A–C (CD5-PE vs. CD11c-APC, CD20-PE vs. CD11c-APC, and FMC7-FITC vs. CD43-APC, respectively), panels G–I (sIgκ-APC vs. CD23-PE, CD24-PE vs. CD79b-APC, and CD24-PE vs. sIgκ-APC, respectively), panels D–F (CD5-APC vs. CD11c-APC, CD20-APC vs. CD11c-APC, and FMC7-FITC vs. CD43-FITC, respectively), and panels J–L (sIgκ-PE vs. CD23-PE, CD24-PE vs. CD79b-PE, and CD24-PE vs. sIgκ-PE, respectively). These latter panels (D–F and J–L) correspond to “originally impossible” stainings.

In line with this, comparison of originally measured versus calculated data showed no significant differences (P > 0.05) with regard to the intensity and expression pattern of individual antigens (Table 1). Moreover, a high degree of correlation was found for the percentage of all B-cell subpopulations identified (Table 2), r² correlation coefficients being consistently >0.95 for all stainings. This was also associated with a degree of agreement of >75% for all B-cell populations analyzed (Table 2). In addition, immunophenotypic diagnosis could be correctly performed in all samples by exclusively analyzing the information contained in the common parameters and calculated data; this included detection of more than one neoplastic B-cell clone in those seven cases with biclonal/composite B-CLPD, as illustrated in Figure 3.

Table 1. Comparison between originally measured and calculated data regarding the relative distribution observed for the different PB and BM B-cell populations as well as the amount and pattern of expression of individual antigens on the different B-cell populations identified, as reflected by its mean fluorescence intensity (MFI) and coefficient of variation (CV), respectively
| Parameter | B-cell precursors (n = 12), originally measured | B-cell precursors (n = 12), calculated | Mature B-lymphocytes (n = 33), originally measured | Mature B-lymphocytes (n = 33), calculated | Neoplastic B-cells (n = 56), originally measured | Neoplastic B-cells (n = 56), calculated |
| --- | --- | --- | --- | --- | --- | --- |
| % of total cells | 0.3 ± 0.8 (0.1–3.9) | 0.3 ± 0.8 (0.1–4) | 0.7 ± 0.9 (0.2–4.1) | 0.7 ± 0.9 (0.1–4.0) | 33 ± 28 (0.19–92.9) | 32 ± 27 (0.35–91.7) |
| FMC7 MFI | 152 ± 174 (60–578) | 139 ± 138 (63–475) | 9452 ± 5761 (2587–23471) | 9609 ± 5984 (2293–24711) | 1227 ± 2347 (67–13963) | 1252 ± 2375 (66–14272) |
| FMC7 CV | 87 ± 36 (44–180) | 86 ± 34 (51–171) | 67 ± 19 (30–116) | 65 ± 19 (27–113) | 85 ± 31 (45–191) | 85 ± 32 (44–195) |
| CD24 MFI | 15981 ± 3209 (12009–21316) | 16450 ± 3265 (12612–22371) | 1937 ± 1616 (502–5315) | 1896 ± 1531 (501–5321) | 5840 ± 6813 (30–29998) | 5865 ± 6917 (31–29204) |
| CD24 CV | 40 ± 11 (17–50) | 41 ± 11 (17–55) | 106 ± 34 (31–181) | 104 ± 37 (31–175) | 73 ± 32 (30–177) | 76 ± 32 (30–176) |
| CD34 MFI | 38239 ± 83296 (4654–259797) | 38665 ± 83174 (4041–259797) | 219 ± 90 (41–400) | 222 ± 97 (51–451) | 186 ± 118 (54–760) | 188 ± 116 (57–731) |
| CD34 CV | 69 ± 21 (46–112) | 69 ± 22 (48–117) | 86 ± 48 (52–118) | 85 ± 49 (49–112) | 78 ± 16 (46–131) | 78 ± 16 (48–139) |
| CD22 MFI | 216 ± 85 (82–334) | 226 ± 86 (77–335) | 2785 ± 1625 (1047–7901) | 2679 ± 1530 (1034–7468) | 679 ± 577 (131–3019) | 675 ± 549 (128–3046) |
| CD22 CV | 47 ± 7 (37–58) | 46 ± 6 (37–56) | 63 ± 19 (38–118) | 61 ± 18 (32–98) | 55 ± 13 (32–98) | 56 ± 17 (30–120) |
| CD23 MFI | 118 ± 73 (41–276) | 124 ± 66 (44–261) | 451 ± 378 (67–1098) | 449 ± 387 (67–1126) | 4130 ± 4810 (76–20053) | 4161 ± 4840 (69–20373) |
| CD23 CV | 55 ± 19 (28–9) | 56 ± 17 (31–79) | 101 ± 32 (40–168) | 102 ± 35 (32–172) | 124 ± 58 (52–396) | 125 ± 58 (51–398) |
| CD20 MFI | 624 ± 366 (135–1071) | 622 ± 366 (137–1092) | 63253 ± 33432 (18696–191902) | 60304 ± 31834 (19206–176978) | 25018 ± 30379 (513–107567) | 25053 ± 30458 (539–108934) |
| CD20 CV | 116 ± 45 (48–184) | 112 ± 43 (49–170) | 59 ± 15 (26–96) | 59 ± 18 (14–100) | 88 ± 44 (32–249) | 88 ± 43 (34–209) |
| CD103 (a) MFI | ND | ND | 104 ± 36 (56–197) | 108 ± 39 (55–204) | 126 ± 71 (53–497) | 129 ± 71 (57–514) |
| CD103 (a) CV | ND | ND | 56 ± 19 (29–125) | 56 ± 20 (35–145) | 58 ± 19 (37–140) | 59 ± 22 (36–156) |
| CD25 (a) MFI | ND | ND | 91 ± 64 (30–285) | 91 ± 64 (30–287) | 763 ± 643 (91–3434) | 770 ± 648 (98–3563) |
| CD25 (a) CV | ND | ND | 48 ± 56 (15–390) | 46 ± 55 (15–380) | 70 ± 22 (37–180) | 70 ± 21 (37–175) |
| CD11c (a) MFI | ND | ND | 171 ± 109 (32–447) | 178 ± 124 (32–498) | 717 ± 801 (49–4320) | 729 ± 802 (58–4324) |
| CD11c (a) CV | ND | ND | 84 ± 31 (25–182) | 81 ± 29 (25–194) | 112 ± 30 (39–197) | 111 ± 29 (39–194) |
| CD43 MFI | 635 ± 173 (463–979) | 663 ± 196 (449–1009) | 105 ± 36 (58–205) | 108 ± 40 (45–217) | 750 ± 565 (40–2310) | 784 ± 598 (42–2497) |
| CD43 CV | 84 ± 17 (64–119) | 84 ± 15 (55–105) | 58 ± 11 (36–81) | 56 ± 12 (32–84) | 60 ± 14 (36–122) | 60 ± 12 (40–110) |
| CD79b MFI | 248 ± 58 (157–313) | 251 ± 64 (157–330) | 14701 ± 9159 (2397–36741) | 15114 ± 9456 (2052–39327) | 3587 ± 6333 (134–30502) | 3687 ± 6572 (132–31583) |
| CD79b CV | 58 ± 18 (36–82) | 57 ± 16 (32–74) | 70 ± 23 (29–131) | 68 ± 26 (25–131) | 66 ± 21 (26–139) | 67 ± 21 (26–145) |
| sIgλ MFI | 127 ± 36 (67–166) | 126 ± 39 (66–178) | 8262 ± 9515 (82–35610) | 8232 ± 9497 (85–35086) | 1725 ± 3141 (89–15809) | 1775 ± 3278 (91–16754) |
| sIgλ CV | 64 ± 13 (47–88) | 58 ± 19 (21–89) | 79 ± 25 (33–157) | 77 ± 25 (30–157) | 72 ± 27 (39–166) | 70 ± 24 (38–151) |
| sIgκ MFI | 111 ± 36 (71–181) | 113 ± 42 (57–193) | 6363 ± 8973 (45–36442) | 7087 ± 11973 (41–81508) | 3267 ± 6134 (78–29160) | 3518 ± 6701 (87–29893) |
| sIgκ CV | 61 ± 18 (39–93) | 56 ± 22 (20–91) | 78 ± 20 (38–119) | 77 ± 20 (38–118) | 79 ± 26 (36–210) | 78 ± 25 (38–203) |
| CD5 MFI | 241 ± 178 (120–748) | 243 ± 198 (111–837) | 974 ± 1856 (46–12377) | 931 ± 1815 (33–13003) | 4290 ± 4041 (111–17475) | 4389 ± 4012 (108–16169) |
| CD5 CV | 86 ± 30 (51–141) | 75 ± 34 (39–146) | 112 ± 48 (37–197) | 108 ± 50 (32–197) | 107 ± 118 (34–625) | 108 ± 117 (33–660) |

  • ND, not determined.

  • Results expressed as mean value ± one standard deviation and range between brackets. No statistically significant differences (P > 0.05) were observed for any of the comparisons performed between originally measured and calculated data.

  • (a) B-cell precursors could not be identified with this four-color combination of monoclonal antibodies.

Table 2. Comparison between the relative distribution of different subpopulations of B-cells upon comparing originally measured versus calculated data

MAb combination (FITC/PE/PerCP-Cy5.5/APC) | B-cell precursors (n = 12): r²/slope/Y-intercept | B-cell precursors: % of agreement | Mature B-lymphocytes (n = 33): r²/slope/Y-intercept | Mature B-lymphocytes: % of agreement | Neoplastic B-cells (n = 56): r²/slope/Y-intercept | Neoplastic B-cells: % of agreement
FMC7/CD24/CD19/CD34 | 0.99/1.04/0 | 88 | 0.97/1.05/−1.5 | 90 | 0.99/1.00/1.1 | 96
CD22/CD23/CD19/CD20 | 1/1.01/0 | 91 | 1/1.01/0 | 88 | 1/1.01/0.5 | 92
CD103/CD25/CD19/CD11c (a) | ND | ND | 0.97/1.03/−1.4 | 93 | 0.97/0.98/2 | 85
CD43/CD79a/CD19 | 1/0.97/−0.6 | 100 | 0.95/1.12/−0.2 | 89 | 0.98/1.06/0.3 | 91
sIgλ/sIgκ/CD19/CD5 (b) | 0.99/0.93/−1.1 | 83 | 0.97/1.13/−0.6 | 76 | 0.96/0.96/−0.5 | 81

  • FITC, fluorescein isothiocyanate; PE, phycoerythrin; PerCP-Cy5.5, peridinin chlorophyll protein-cyanin 5.5; APC, allophycocyanin; MAb, monoclonal antibody; ND, not determined.

  • Results expressed as r² Pearson correlation coefficient/slope/Y-intercept and % agreement (Bland–Altman test).

  • (a) B-cell precursors could not be identified with this four-color combination of monoclonal antibodies.

  • (b) r²/% of agreement for neoplastic and normal mature sIgλ+ and sIgκ+ B-lymphocytes of 0.95/80%, 0.91/86%, 0.98/72%, and 0.98/84%, respectively.


Similarly, upon comparing the median fluorescence intensity (MFI) obtained for originally measured and calculated data for each individual marker analyzed in the different subpopulations of B-cells identified, a significant correlation (r² ≥ 0.9) and a high degree of agreement (≥80%) were observed (Table 3). Also, the pattern of expression of individual antigens in the distinct B-cell subpopulations present in the samples analyzed was in line with what an expert flow cytometrist would expect (Fig. 2); this was reflected by the high degree of correlation (r² ≥ 0.83) and agreement (≥82%) observed when we compared the CV of originally measured versus calculated data for the individual markers analyzed in both normal and neoplastic B-cell subpopulations (Table 3). These results were independent of whether or not normal B-cells were identified in each particular case, as well as of the relative size of the normal B-cell population within the total B-cell population in those cases in which normal B-cells could be identified. Of note, in all samples in which minor normal B-cell populations were identified in the actually measured data sets, they were also present in the calculated data sets, and vice versa. However, comparison of originally measured versus calculated data between intracellular and surface membrane stainings was associated with a significantly lower degree of correlation and agreement for neoplastic B-cells (r² between 0.75 and 0.98; agreement of 63–75%), normal mature B-lymphocytes (r² between 0.52 and 0.99; agreement of 54–81%), and normal BM B-cell precursors (r² between 0.45 and 0.94; agreement of 25–83%).
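
As an illustration of the statistics reported in Tables 2 and 3, the following minimal Python sketch computes the squared Pearson correlation, the slope and Y-intercept of a least-squares line, and Bland–Altman bias and limits of agreement for paired measured and calculated values. The function name, the toy data, and the `tolerance` parameter are hypothetical. In particular, the text does not state how the "% agreement" was derived from the Bland–Altman analysis, so the definition used here (share of cases whose difference lies within a fixed tolerance) is only an assumption.

```python
import numpy as np

def compare_measured_vs_calculated(measured, calculated, tolerance):
    """Summarize agreement between paired measured and calculated values."""
    x = np.asarray(measured, dtype=float)
    y = np.asarray(calculated, dtype=float)

    # Least-squares line: calculated = slope * measured + intercept
    slope, intercept = np.polyfit(x, y, 1)
    # Squared Pearson correlation coefficient
    r2 = np.corrcoef(x, y)[0, 1] ** 2

    # Bland-Altman quantities: mean difference (bias) and 95% limits of agreement
    diff = y - x
    bias = diff.mean()
    sd = diff.std(ddof=1)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)

    # ASSUMPTION (not from the paper): "% agreement" is taken as the share of
    # cases whose measured/calculated difference is within +/- tolerance.
    pct_agreement = 100.0 * np.mean(np.abs(diff) <= tolerance)

    return {
        "r2": float(r2),
        "slope": float(slope),
        "intercept": float(intercept),
        "bias": float(bias),
        "limits": (float(limits[0]), float(limits[1])),
        "pct_agreement": float(pct_agreement),
    }

# Hypothetical toy values (e.g., % of B-cells per case), not data from the study
measured = [10.0, 20.0, 30.0, 40.0]
calculated = [11.0, 19.0, 31.0, 45.0]
summary = compare_measured_vs_calculated(measured, calculated, tolerance=2.0)
```

With these toy values, three of the four differences fall within the tolerance, so `summary["pct_agreement"]` is 75.0.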

Table 3. Degree of correlation and agreement between the pattern of expression of individual antigens observed for both normal and neoplastic B-cells for actually measured versus calculated data

Antigen variables | B-cell precursors (n = 12): r²/slope/Y-intercept | B-cell precursors: % of agreement | Mature B-lymphocytes (n = 33): r²/slope/Y-intercept | Mature B-lymphocytes: % of agreement | Neoplastic B-cells (n = 56): r²/slope/Y-intercept | Neoplastic B-cells: % of agreement
FMC7
 MFI | 1.00/1.15/−3.8 | 100 | 0.99/0.96/0.1 | 93 | 1.00/0.99/−0.1 | 96
 CV | 0.97/1.03/1.2 | 90 | 0.91/0.91/6.3 | 90 | 0.99/1.00/−0.1 | 95
CD24
 MFI | 0.97/0.87/7.6 | 90 | 0.99/1.05/−1 | 87 | 1.00/0.98/0.3 | 100
 CV | 0.96/0.96/2.3 | 90 | 1.00/0.85/9 | 100 | 1.00/0.99/−1.2 | 93
CD34
 MFI | 1.00/1.00/−0.2 | 100 | 0.99/0.92/4.3 | 92 | 1.00/1.02/−0.9 | 98
 CV | 0.99/0.95/3.6 | 100 | 0.98/0.86/10 | 92 | 0.95/0.96/2.5 | 98
CD22
 MFI | 0.98/0.96/2.7 | 91 | 0.99/1.05/0.9 | 88 | 1.00/0.99/−0.2 | 92
 CV | 0.88/1.14/23 | 100 | 0.93/0.91/8 | 90 | 1.00/1.00/0.1 | 95
CD23
 MFI | 0.99/1.07/−1.5 | 90 | 0.99/0.99/−0.8 | 76 | 0.98/0.99/0.1 | 98
 CV | 0.97/0.90/7 | 90 | 0.83/0.85/−3.1 | 87 | 1.00/0.99/0 | 90
CD20
 MFI | 0.99/1.00/−0.1 | 90 | 0.97/1.03/0.5 | 80 | 0.99/1.00/0 | 93
 CV | 0.99/1.04/1.2 | 89 | 0.90/0.91/−1.9 | 87 | 1.00/0.99/−0.1 | 100
CD103 (a)
 MFI | ND | ND | 1.00/0.85/6.6 | 91 | 1.00/1.00/−0.7 | 95
 CV | ND | ND | 0.99/0.85/6 | 93 | 0.99/0.95/2.3 | 91
CD25 (a)
 MFI | ND | ND | 0.91/1.00/0.1 | 93 | 1.00/0.99/0 | 96
 CV | ND | ND | 0.99/1.01/0.2 | 95 | 0.97/1.00/−0.2 | 96
CD11c (a)
 MFI | ND | ND | 1.00/0.80/6.4 | 91 | 0.99/0.99/−0.1 | 92
 CV | ND | ND | 0.94/0.96/3.1 | 95 | 0.99/1.00/0.5 | 97
CD43
 MFI | 0.96/0.85/7 | 100 | 0.99/0.85/6 | 88 | 0.98/0.95/0.5 | 97
 CV | 0.95/0.98/1.5 | 86 | 0.89/0.85/15 | 90 | 0.99/1.01/−1 | 98
CD79b
 MFI | 0.91/0.89/8.6 | 100 | 0.99/0.96/0.5 | 88 | 1.00/0.96/0.1 | 95
 CV | 0.93/1.01/−2.4 | 100 | 0.92/0.84/12 | 84 | 0.98/0.99/0 | 95
sIgλ
 MFI | 0.98/0.81/18 | 92 | 0.99/1.00/0.2 | 85 | 0.99/0.96/0.2 | 88
 CV | 0.93/0.80/29 | 83 | 0.92/0.89/7 | 82 | 0.98/1.00/0.4 | 93
sIgκ
 MFI | 1.00/0.90/11 | 100 | 0.92/0.86/0.1 | 86 | 1.00/0.98/0 | 90
 CV | 0.97/0.82/23 | 92 | 0.97/0.91/6.2 | 92 | 0.99/1.00/−0.1 | 95
CD5
 MFI | 0.97/0.89/3.6 | 100 | 0.98/1.00/0.4 | 85 | 0.99/1.00/−0.5 | 95
 CV | 0.89/0.82/17 | 92 | 0.97/0.91/7 | 83 | 0.97/0.99/−0.1 | 88

  • MFI, median fluorescence intensity; CV, coefficient of variation of fluorescence intensity; ND, not determined.

  • Results expressed as the r² correlation coefficient/slope/Y-intercept and % agreement (Bland–Altman test).

  • (a) B-cell precursors could not be specifically identified in the four-color staining in which this antigen was included.

DISCUSSION

In the last decade, the complexity of the panels used for multiparameter flow cytometric immunophenotyping of different hematological malignancies, including B-CLPD, has dramatically increased (1–6, 15–17). This is due to the inclusion of phenotypic criteria in the currently used WHO classification of mature B-cell disorders (23), and to the demonstrated utility of an increasingly higher number of markers for the classification and prognostic evaluation of B-CLPD (3–17). In addition, monitoring of minimal residual disease in leukemic B-CLPD demands a more complete knowledge of the aberrant phenotypes expressed by the neoplastic B-cells (8–15). Also, the increasing use of antibody-based targeted therapies requires assessment of additional specific markers (11, 12). Altogether, this has resulted in panels of between 15 and 25 markers for the diagnostic characterization of B-CLPD (1–8, 13–17). Interestingly, such an increase in the number of markers used for the characterization of individual samples is not restricted to B-CLPD, and it is even more pronounced in other disease conditions, such as myelodysplastic syndromes (28–30).

Despite the introduction of increasingly larger panels of reagents, flow cytometers currently employed in most clinical laboratories have a relatively limited number of fluorescence detectors (4, 5, 16). Inclusion of extra fluorochromes is associated with the need for new high-quality fluorochrome-conjugated MAb and more complex compensation matrices between the fluorescence emissions of the combined fluorochromes (31, 32). Finally, a broader reagent stock would be required for performing all potentially useful combinations of stainings; at the same time, reagents are either available, or work properly, only in a limited number of fluorochrome-conjugated formats (33–36). This is particularly true for antigens that require highly sensitive fluorochromes for their correct evaluation (34–36).

In the present paper, we propose a statistical approach to generate flow cytometry data files in which every single cellular event has information about all parameters measured in different multicolor stainings of the same cell sample. This approach generates data files with information on single cells about a virtually infinite number of parameters, without the need for increasing the number of fluorochromes, fluorescence detectors, or lasers used in a conventional four-color immunophenotypic approach. The main statistical tool used in this approach, calculation of the nearest neighbor, has long been described (26, 27). For a given cellular event measured in a specific multicolor staining, it allows calculation of parameters that were not directly measured for that cell, provided that such parameters have been assessed, in other multicolor stainings of the same cell sample, on cells that are similar to it. Evaluating the similarity between cells in different multicolor stainings of the same cell sample requires parameters that are measured in common in the different stainings and that identify the cell population of interest; these common parameters are what make the inference achievable. Interestingly, this approach also allows visualization of cellular events in the same dot-plot of two (or more) parameters derived from different multicolor stainings with the same or with incompatible fluorochromes, which is not possible in current practice. In principle, our new calculation strategy can transform immunophenotyping with five three- and four-color tubes that each contain one marker in common into a single 15-color immunostaining of the target population, which can be defined by scatter and one common marker.
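
The nearest neighbor calculation described above can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the function name `impute_from_tube`, the array layout, and the toy values are hypothetical, and the brute-force distance matrix is only practical for small inputs. For each event of one staining (tube A), the sketch finds the closest event of another staining (tube B) in the space of the common parameters and copies that event's tube-B-only parameters.

```python
import numpy as np

def impute_from_tube(common_a, common_b, extra_b):
    """Assign to each tube-A event the tube-B-only parameters of its nearest
    tube-B event, using Euclidean distance on the common parameters.

    common_a: (n_a, k) common parameters (e.g., scatter and CD19) in tube A
    common_b: (n_b, k) the same common parameters in tube B
    extra_b:  (n_b, m) parameters measured only in tube B
    """
    common_a = np.asarray(common_a, dtype=float)
    common_b = np.asarray(common_b, dtype=float)
    extra_b = np.asarray(extra_b, dtype=float)

    # Squared Euclidean distance between every A event and every B event
    diff = common_a[:, None, :] - common_b[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)

    # Index of the nearest tube-B event for each tube-A event
    nearest = dist2.argmin(axis=1)
    return extra_b[nearest]

# Hypothetical toy data: two well-separated populations in the common parameters
common_a = np.array([[1.0, 1.0], [9.0, 9.0]])
common_b = np.array([[8.5, 9.2], [1.2, 0.9], [0.5, 1.5]])
extra_b = np.array([[500.0], [20.0], [30.0]])

# Merged file: tube-A events now also carry a calculated tube-B parameter
merged = np.hstack([common_a, impute_from_tube(common_a, common_b, extra_b)])
```

For files with hundreds of thousands of events, the full distance matrix would not fit in memory, and a spatial index such as a k-d tree would be a more realistic choice for the neighbor search.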

Concerning the nearest neighbor search, the Euclidean distance was adopted because it is intuitive and simple, and no clear advantage was found in choosing more sophisticated metrics (e.g., the Mahalanobis distance). Since our goal was to search for the nearest neighbors, distances were calculated around the centers of mass, with the result that the actual values obtained using the Euclidean and Mahalanobis metrics were almost equal. Furthermore, calculation of Mahalanobis distances involves estimating the covariance matrices of the populations; this would have introduced an unnecessary numerical burden for the largest populations, while it would have been less accurate for populations with fewer events.
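
To make the difference between the two metrics concrete, the short Python sketch below defines both distances. The function names and numbers are illustrative only and do not reproduce the comparison performed in the study. It shows that the Mahalanobis distance needs a covariance matrix as extra input, and that it reduces to the Euclidean distance when that matrix is the identity.

```python
import numpy as np

def euclidean(x, y):
    """Euclidean distance between two events."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ d))

def mahalanobis(x, y, cov):
    """Mahalanobis distance between two events for a given covariance matrix."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = d instead of inverting the covariance matrix explicitly
    z = np.linalg.solve(np.asarray(cov, dtype=float), d)
    return float(np.sqrt(d @ z))

x, y = [3.0, 4.0], [0.0, 0.0]

d_euc = euclidean(x, y)
# Identity covariance: Mahalanobis and Euclidean distances coincide
d_mah_identity = mahalanobis(x, y, np.eye(2))
# Unequal variances (hypothetical values): each axis is rescaled by its variance
d_mah_scaled = mahalanobis(x, y, np.diag([9.0, 16.0]))
```

In practice the covariance matrix would have to be estimated from the events of each population, which is the additional step, and the source of inaccuracy for small populations, referred to above.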

Several potential limitations associated with the use of the nearest neighbor approach should be considered. Firstly, it requires that the common parameters be adapted to precisely identify the subset of events containing the cell populations of interest. In addition, the total number of events analyzed and stored in each data file should be chosen according to the relative frequency of the cell populations of interest, so that they are clearly identifiable. Finally, sample preparation and staining techniques should be designed so that only small variations occur in the patterns of staining for the common parameters obtained for different aliquots of a sample.
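
The second point is simple arithmetic, sketched below in Python. The function name and the minimum of 100 events per population are hypothetical choices made for illustration; the text does not specify a minimum cluster size.

```python
import math

def events_to_acquire(min_events_in_population, population_frequency):
    """Total events to acquire so that a population with the given relative
    frequency (a fraction between 0 and 1) is expected to contain at least
    `min_events_in_population` events."""
    if not 0 < population_frequency <= 1:
        raise ValueError("population_frequency must be in (0, 1]")
    return math.ceil(min_events_in_population / population_frequency)

# ASSUMPTION: at least 100 events are wanted in a population that
# represents 0.1% of all cells in the sample.
total_events = events_to_acquire(100, 0.001)
```

Under these assumed values, 100,000 events would need to be stored per data file; rarer populations or larger minimum cluster sizes scale this number proportionally.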

In the present paper, we evaluated and validated this statistical approach in 60 patients with B-CLPD and in computationally simulated data files. The panel used has been previously described in detail, and it has been shown to allow the distinction between normal and neoplastic mature B-cells (13–15), while providing relevant information for the classification of B-CLPD. Overall, our results showed a high correlation and degree of agreement between originally measured and calculated data after generating files containing information about 17 different parameters on each of the ≥250,000 cellular events measured, even when minor populations (e.g., residual normal mature B-cells in cases with very high frequencies of neoplastic B-cells) were present. However, a lower correlation was found when four-color combinations of MAb containing information about surface-only and surface-plus-intracellular antigens were used to calculate nonmeasured parameters on events corresponding to B-cells. This was due to variable changes in the light scatter properties of the cellular events measured in the intracellular stainings, mainly caused by the use of different sample preparation techniques. In contrast, it did not occur for those stainings containing reagents aimed at detecting surface immunoglobulins, despite the use of a sample preparation protocol that included an additional washing step with PBS prior to staining. In line with this, previous studies have shown that cell fixation and permeabilization prior to staining of intracellular antigens are associated with significant and uncontrolled changes in the light scatter properties of individual cells (37, 38). Thus, our results clearly show the need for well-standardized sample preparation techniques, common to all sample aliquots to be used for data calculation. Alternatively, a larger number of common backbone MAb could potentially be used to properly identify the cells of interest and overcome the limitations associated with changes in their light scatter properties. Such studies with eight-color immunostainings are currently ongoing in the EuroFlow project, entitled “Flow cytometry for fast and sensitive diagnosis and follow-up of haematological malignancies,” supported by the European Commission.

In summary, we describe and demonstrate the reliability of a new statistical approach that may be used for the automated generation of flow cytometry data files containing information on single events about a virtually infinite number of parameters. This new strategy opens the door to all applications of multiparameter flow cytometry for which a large number of parameters are needed, provided that the cell population (or cell populations) of interest can be identified with a relatively limited number of markers.

Acknowledgements

The authors thank Prof. Nelson Spector (Departamento de Clínica Médica, Federal University of Rio de Janeiro, Brazil) for his helpful support.

Ancillary