Multiparametric flow cytometry has become an indispensable but complex tool for the diagnosis of acute leukemias. Interpretation of immunophenotypic data within a six-parameter analytical space relies on the standardization and validation of the instrument, the reagents, and the procedure. To address whether or not residual normal lymphocytes, usually present within leukemic samples, can serve as internal quality control for fluorescence intensity, 116 leukemic and 35 normal samples were analyzed.
Eight laboratories participated in the study and recruited a total of 151 individuals including 29 patients with B-cell precursor acute lymphoblastic leukemia (BCP-ALL), 77 with acute myeloid leukemia (AML), 10 with T-cell precursor acute lymphoblastic leukemia (T-ALL), and 35 normal bone marrow donors. Lymphocytes were gated according to the CD45hi/SSClo gating strategy, after which median fluorescence intensities (MFI) as well as percentages of positive cells (%positive) for CD19, CD22, CD7, and CD3 were recorded. Nonparametric statistics were used to compare variation within and between laboratories.
Normal lymphocytes within leukemic samples do not show substantial differences compared to lymphocytes from normal controls with respect to expression of CD19, CD22, CD7, and CD3. In particular, longitudinal control charts of MFI values for CD3 antigen provide useful information on analytical and instrument performance.
Flow cytometry immunophenotyping is an essential tool in the diagnosis of acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML), and its results are of great utility for therapeutic decision-making in these diseases (1–3). Recent technological developments in laser optics, fluidics, antibody generation, fluorochromes, and software have overcome many difficulties that previously hampered the widespread availability of multiparameter flow cytometry (MFC); with such advances, most laboratories involved in the diagnosis of acute leukemias currently apply three- or four-color antibody panels (4–7). With so many variables, standardization and validation of instrumentation and methodology are essential to ensure the technical quality of the results and the correct diagnosis (8). Therefore, MFC has to be performed with appropriate quality control measures to avoid misinterpretation. Not only should individual reagents be well characterized with respect to specificity and performance, but all combinations of fluorochrome-conjugated monoclonal antibodies must also be critically evaluated for staining intensity, spectral overlap, and instrument compensation. Furthermore, normal and pathological fluorescence (FL) patterns must be characterized for the diagnostic interpretation of specific combinations of antigens. In an attempt to standardize these procedures, extensive recommendations have been published in Europe (9) and in America (10–16). Despite these efforts, there are still many concerns regarding everyday practice, and operating procedures vary widely among laboratories involved in the flow cytometric immunophenotyping of acute leukemias. Most studies that have evaluated interlaboratory variation have focused on the enumeration of lymphocyte subpopulations or CD34+ hematopoietic cells (17–19).
Only a few trials have addressed interlaboratory variation in flow cytometric immunophenotyping of acute leukemias. Quality control is particularly needed for this highly complex assay, which involves multiple methodological steps and requires careful interpretation of the flow cytometric data together with clinical, morphological, and cytogenetic findings (20–23).
The current study emerged from a multicenter trial aimed at evaluating the diagnostic power of a limited four-color panel of only 13 monoclonal antibodies for immunophenotyping of acute leukemias. The study relied largely on common written protocols, without external quality control or staff training. In the course of the study, considerable variation between the laboratories became evident, which raised the question of how performance could be controlled within the given experimental setting.
In this respect, this multicenter trial offered the opportunity to retrospectively analyze the inter- and intralaboratory variability of immunophenotyping procedures of acute leukemias. Here, we analyzed the utility of normal lymphocytes present in leukemic samples as an internal quality control for semiquantitative measurements of the intensity of marker expression in the setting of immunophenotyping of acute leukemias.
PATIENTS, MATERIALS, AND METHODS
Eight European laboratories participated in the study. Between April 2000 and June 2001, a total of 151 consecutive samples from adults and children were analyzed. These included samples from 29 patients with B-cell precursor ALL (BCP-ALL), 77 with AML, 10 with precursor T-cell ALL (T-ALL), and 35 normal bone marrow (BM) donors. From all individuals or their guardians, informed consent was obtained according to the requirements of the local ethical committees. BM from healthy donors was obtained as part of the workup for allogeneic hematopoietic cell donation. Three centers using Beckman Coulter XL flow cytometers (Beckman Coulter, Miami, FL) recruited a total of 68 individuals, including 17 patients with BCP-ALL, 31 with AML, 5 with T-ALL, and 15 normal BM donors. The other five centers, using FACSCalibur flow cytometers (BD Biosciences, San Jose, CA), recruited a total of 83 individuals, including 12 patients with BCP-ALL, 46 with AML, 5 with T-ALL, and 20 normal BM donors. Detailed data on sample recruitment are presented in Figure 1.
BM aspirate samples containing more than 30% blast cells by morphology were obtained at diagnosis, prior to any chemotherapy. The samples had been anticoagulated with EDTA or heparin. All samples were passed twice through a 25-gauge needle to homogenize the distribution of BM blast cells in the sample. Cell counts were performed to adjust the cell concentration to 10⁷ cells/ml. All samples were processed and analyzed within 24 h after BM aspiration, and sample quality was evaluated by morphology prior to sample preparation.
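The concentration adjustment is simple C1·V1 = C2·V2 arithmetic, and it is what makes 100 μl of diluted sample carry 10⁶ nucleated cells into each staining tube. The sketch below assumes the counted sample is more concentrated than the 10⁷ cells/ml target and is diluted with PBS; the function names and the example count are hypothetical.

```python
TARGET_CELLS_PER_ML = 1e7   # final concentration stated in the protocol
STAINING_VOLUME_ML = 0.1    # 100 μl of diluted sample added per tube


def pbs_to_add_ml(count_per_ml, sample_volume_ml, target_per_ml=TARGET_CELLS_PER_ML):
    """PBS volume that brings a sample down to the target concentration (C1*V1 = C2*V2)."""
    if count_per_ml < target_per_ml:
        raise ValueError("sample is below the target concentration; dilution cannot fix this")
    final_volume_ml = count_per_ml * sample_volume_ml / target_per_ml
    return final_volume_ml - sample_volume_ml


def cells_per_tube(concentration_per_ml=TARGET_CELLS_PER_ML, volume_ml=STAINING_VOLUME_ML):
    """Number of nucleated cells delivered into one staining tube."""
    return concentration_per_ml * volume_ml


# Hypothetical: 1.0 ml of BM counted at 4.5 x 10^7 nucleated cells/ml
print(pbs_to_add_ml(4.5e7, 1.0))  # 3.5 (ml of PBS)
print(cells_per_tube())           # 1000000.0
```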
Staining for Surface Antigens
Two tubes were stained exclusively for surface antigens with the following antibody combinations for FACSCalibur users, fluorescein isothiocyanate (FITC)/phycoerythrin (PE)/PE-cyanine 5 (PC5)/allophycocyanin (APC): (1) CD7 (BD Biosciences; 10 μl undiluted)/CD22 (BD Biosciences; 10 μl undiluted)/CD45 (Immunotech, Marseille, France; 10 μl 1/10 diluted with PBS)/CD33 (Immunotech; 5 μl undiluted) and (2) CD15 (BD Biosciences; 10 μl undiluted)/CD13 (BD Biosciences; 10 μl undiluted)/CD45 (Immunotech; 10 μl 1/10 diluted)/CD34 (BD Biosciences; 5 μl undiluted). For XL users, the combinations were as follows, FITC/PE/PE-Texas red (ECD)/PC5: (1) CD7 (BD Biosciences; 10 μl undiluted)/CD22 (BD Biosciences; 10 μl undiluted)/CD45 (Immunotech; 10 μl undiluted)/CD33 (Immunotech; 10 μl undiluted) and (2) CD15 (BD Biosciences; 10 μl undiluted)/CD13 (BD Biosciences; 10 μl undiluted)/CD45 (Immunotech; 10 μl undiluted)/CD34 (Immunotech; 10 μl undiluted). A third tube was processed in parallel to measure the autofluorescence levels of the blast cells. This tube was stained only with CD45-PC5 (Immunotech; 10 μl 1/10 diluted) for FACSCalibur users and CD45-ECD (Immunotech; 10 μl undiluted) for XL users.
Antibody titration experiments (in PBS) were performed beforehand to determine the appropriate amount of each antibody in each combination. Antibodies were not premixed but placed separately into each tube before 100 μl of PBS-diluted BM sample containing 10⁶ nucleated cells was added. Tubes were gently mixed for 10 sec and incubated at room temperature (RT) in the dark for 15 min. After addition of 2 ml/tube Quicklysis (Cytognos, Salamanca, Spain) and gentle vortexing, the tubes were incubated a second time for 10 min in the dark at RT. Then, they were centrifuged once for 5 min at 500g and the supernatant was discarded. The cell pellet was resuspended in 2 ml/tube of filtered PBS and centrifuged again at 500g for 5 min; afterwards, the supernatant was discarded with a Pasteur pipette and the cell pellet was resuspended in 0.5 ml/tube of filtered PBS containing 1% paraformaldehyde (PFA). Flow cytometric acquisition of the sample was performed either immediately or within 24 h after storage at 4°C. All monoclonal antibodies were centrally obtained, aliquoted, and distributed to each participating center.
Simultaneous Staining for Surface and Cytoplasmic Antigens
Two tubes were stained simultaneously for surface and cytoplasmic antigens, with the following antibody combinations for FACSCalibur users (FITC/PE/PC5/APC): (1) cyCD3 (DakoCytomation, Glostrup, Denmark; 5 μl undiluted)/cyCD79a (DakoCytomation; 10 μl undiluted)/CD45 (Immunotech; 10 μl 1/10 diluted)/CD117 (BD Biosciences; 3 μl undiluted) and (2) CD19 (DakoCytomation; 10 μl undiluted)/cyMPO (DakoCytomation; 3 μl undiluted)/CD45 (Immunotech; 10 μl 1/10 diluted)/CD5 (BD Biosciences; 5 μl undiluted). For XL users, the combinations were as follows (FITC/PE/ECD/PC5): (1) cyCD3 (DakoCytomation; 5 μl undiluted)/cyCD79a (DakoCytomation; 10 μl undiluted)/CD45 (Immunotech; 10 μl undiluted)/CD117 (Immunotech; 5 μl undiluted) and (2) CD19 (DakoCytomation; 10 μl undiluted)/cyMPO (DakoCytomation; 3 μl undiluted)/CD45 (Immunotech; 10 μl undiluted)/CD5 (Immunotech; 5 μl undiluted). A third tube was processed in parallel to measure the autofluorescence levels of the blast cells. This tube was stained only with CD45-PC5 (Immunotech; 10 μl 1/10 diluted) for FACSCalibur users and CD45-ECD (Immunotech; 10 μl undiluted) for Coulter XL instrument users. All monoclonal antibodies were centrally obtained, aliquoted, and distributed to each participating center. After staining for surface markers, cells were fixed and permeabilized with the Fix & Perm reagent kit (Caltag Laboratories, San Francisco, CA), strictly following the manufacturer's recommendations.
Standard Protocol for Instrument Set Up and Color Compensation (24)
First, the fluidics were rinsed and the time-delay calibration was performed for the FACSCalibur flow cytometer, according to the manufacturer's instructions. The amplification modes and threshold were defined as follows: forward light scatter (FSC) and sideward light scatter (SSC) were expressed on a linear amplification scale. FL intensities were expressed as relative linear channels on a four-decade logarithmic scale. The FSC threshold was set at channel 52 on a linear scale of 1,024 histogram channels. For the appropriate positioning of the FSC vs. SSC window of analysis, a representative unstained cell suspension (e.g., whole blood) was run and the FSC photodiode amplifier gain and SSC photomultiplier (PMT) settings were adjusted so that all relevant populations (neutrophils, monocytes, lymphocytes, and blast cells) fell on scale. After selection of the lymphocyte population, the PMT settings for each FL parameter were adjusted so that the unstained lymphocytes were placed in the first log decade, slightly away from the axes, with all compensation settings at zero. Fluorescent reference beads (RFP-60-1; Spherotech, Libertyville, IL) were run under the obtained FL PMT settings with electronic compensation settings still at zero, and the mean channels for the four FL channels were recorded after acquisition of 5,000 singlet beads. These channels represented the “application-specific initial target channels” for subsequent instrument set-up procedures.
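The MFI values reported in this study are relative linear values read from channels on a four-decade logarithmic scale. A minimal sketch of that mapping follows. It assumes a 1,024-channel resolution for the log-amplified FL parameters (the text states 1,024 channels only for the linear FSC scale), so that each decade spans 256 channels; the function names are illustrative.

```python
import math

N_CHANNELS = 1024  # assumed ADC resolution of the log-amplified FL parameters
N_DECADES = 4      # four-decade logarithmic amplification


def channel_to_relative(channel):
    """Map a log-scale channel (0..N_CHANNELS) to relative linear units (1..10^4)."""
    return 10 ** (N_DECADES * channel / N_CHANNELS)


def relative_to_channel(value):
    """Inverse mapping: relative linear units back to the log-scale channel."""
    return N_CHANNELS * math.log10(value) / N_DECADES


# Decade boundaries fall at channels 0, 256, 512, 768 and 1024.
print(channel_to_relative(256))          # 10.0
# The overall median CD3-FITC MFI of 21.5 sits near channel 341.
print(round(relative_to_channel(21.5)))  # 341
```

Under the same assumption, one channel corresponds to a factor of 10^(4/1024) ≈ 1.009, so the ±10-channel agreement used for compensation spans roughly ±9% in relative intensity.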
Color compensation was set using blood samples stained in four single-color tubes (CD8-FITC, CD8-PE, CD8-PC5 or CD8-ECD, and CD8-APC or CD8-PC5, according to the fluorochrome combination of FACSCalibur or Coulter XL users, respectively). Lymphocytes were selected by FSC vs. SSC gating, and the electronic compensation parameters were adjusted such that, in each nondetecting FL channel, the mean FL channels of the CD8-negative and CD8-bright lymphocyte populations agreed within ±10 channels. A control tube with CD3-FITC/CD19-PE/CD4-PC5 or CD4-ECD/CD8-APC or CD8-PC5 (FL channels 1 to 4, for FACSCalibur or Coulter XL users, respectively) was run afterwards to verify appropriate compensation, i.e., that each of the following pairs of means was equal: FL2 mean of CD3-FITC-positive and CD3-FITC-negative cells; FL1 and FL3 means of CD19-PE-positive and CD19-PE-negative cells; FL2 and FL4 means of CD4-ECD/PC5-positive and CD4-ECD/PC5-negative cells; and FL3 mean of CD8-APC/PC5-positive and CD8-APC/PC5-negative cells. The fluorescent reference beads were then acquired once more with the obtained instrument settings and with electronic compensation activated. The mean channels obtained for each FL detector represented the “application- and instrument-specific target channels” for subsequent controls of the positioning of the FL window of analysis. For subsequent instrument controls, RFP-60-1 beads were acquired prior to each experiment, and the mean FL channels of the reference beads were plotted in Levey–Jennings charts. PMT settings and color compensation were readjusted only if the reference beads repeatedly fell outside tolerance zones according to Westgard's rules (25).
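Westgard's multirule procedure (25) can be applied directly to the bead mean channels plotted on the Levey–Jennings charts. The sketch below implements a commonly used subset of the rules (1_3s, 2_2s, R_4s, 4_1s, 10_x) against a target mean and SD estimated from baseline runs. Which rules and how many baseline runs the laboratories actually used is not specified here, and the bead values are hypothetical.

```python
from statistics import mean, stdev


def westgard_flags(values, target, sd):
    """Flag control values that violate common Westgard rules.

    Returns {run index: [rule names]}. R_4s is applied across consecutive
    runs here, a simplification of the classical within-run rule.
    """
    z = [(v - target) / sd for v in values]
    flags = {}

    def flag(i, rule):
        flags.setdefault(i, []).append(rule)

    for i, zi in enumerate(z):
        if abs(zi) > 3:
            flag(i, "1_3s")  # one value beyond +/-3 SD
        if i >= 1:
            prev = z[i - 1]
            if (zi > 2 and prev > 2) or (zi < -2 and prev < -2):
                flag(i, "2_2s")  # two consecutive values beyond the same 2 SD limit
            if (zi > 2 and prev < -2) or (zi < -2 and prev > 2):
                flag(i, "R_4s")  # two values straddle the +/-2 SD limits (range > 4 SD)
        last4 = z[i - 3:i + 1]
        if i >= 3 and (all(x > 1 for x in last4) or all(x < -1 for x in last4)):
            flag(i, "4_1s")  # four consecutive values beyond the same 1 SD limit
        last10 = z[i - 9:i + 1]
        if i >= 9 and (all(x > 0 for x in last10) or all(x < 0 for x in last10)):
            flag(i, "10_x")  # ten consecutive values on the same side of the mean
    return flags


# Hypothetical FL1 mean channels of the reference beads
baseline = [512, 515, 510, 514, 511, 513, 512, 516, 509, 512]
target, sd = mean(baseline), stdev(baseline)  # 512.4 and about 2.17
daily = [513, 511, 520, 518, 530]
print(westgard_flags(daily, target, sd))
# {2: ['1_3s'], 3: ['2_2s'], 4: ['1_3s', '2_2s']}
```

Only rejection rules are encoded; the 1_2s warning rule could be added the same way if a laboratory uses it to trigger closer inspection.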
According to the guidelines formulated by the EWGCCA (24), instrument performance with respect to FL intensity measurements was checked periodically by running calibration beads (a 1:1 mixture of Spherotech RCP-30-1L [three low-intensity peaks] and RCP-30-1H [four high-intensity peaks]) as well as a blank bead (Spherotech BCP-60-1). Using this procedure, the participating laboratories verified that background FL was on scale and that the four instrument performance parameters for FL intensity measurements (as detailed by Kraan et al. (24)) were within specifications.
Flow Cytometric Data Analysis, Interpretation, and Statistics
Each site consecutively enrolled leukemic and nonleukemic samples for the trial and generated listmode (LMD) data on site. The LMD data were analyzed centrally (at the Berlin site) using EXPO32 software (Beckman Coulter). For both surface and intracellular stainings, the data analysis protocol included bivariate dot plots in addition to the CD45/SSC dot plot. Lymphocytes were gated as CD45hi/SSClo (26, 27), and the expression of CD7, CD3, CD19, and CD22 within the lymphocyte population was recorded (Fig. 2). In a few cases of T-ALL, normal T-lymphocytes could not be separated from blast cells by CD45 vs. SSC alone; in these cases, normal T cells were identified by their lower CD7 expression compared with blast cells in a CD45 vs. CD7 plot. Antigen-negative controls were used to set the threshold between positive and negative populations. For the positive population, the percentage of positive cells (%positive) and the median fluorescence intensity (MFI) were determined on a four-decade logarithmic scale. Ratios of %positive cells for CD7/CD3 and CD19/CD22 were also calculated. All statistical calculations were performed with SPSS software (SPSS 11.0, Chicago, IL), including exploratory data analysis with boxplot diagrams and comparison of means (28). Nonparametric statistics were used to evaluate interlaboratory variability, as indicated in the text.
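The two quantities recorded for each marker within the CD45hi/SSClo gate can be computed as follows. This is a sketch assuming the events are already gated and that the positivity threshold has already been derived from the antigen-negative control (the exact derivation is not detailed here); the event values are hypothetical.

```python
from statistics import median


def positive_stats(events, threshold):
    """Return (%positive, MFI) for gated lymphocyte events.

    events: FL intensities (relative linear units) of events in the CD45hi/SSClo gate
    threshold: positivity cut-off set from the antigen-negative control
    The MFI is the median of the positive population only.
    """
    if not events:
        raise ValueError("no gated events")
    positives = [v for v in events if v > threshold]
    pct_positive = 100.0 * len(positives) / len(events)
    mfi = median(positives) if positives else float("nan")
    return pct_positive, mfi


# Hypothetical CD3-FITC intensities of ten gated lymphocytes
cd3_events = [1.5, 2.0, 18, 22, 25, 30, 19, 21, 2.2, 24]
print(positive_stats(cd3_events, threshold=3.0))  # (70.0, 22)
```

Ratios such as %CD7/%CD3 are then simple quotients of the %positive values returned for each marker.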
We compared the MFI of CD19, CD22, CD7, and CD3 of normal lymphocytes in normal and leukemic samples between laboratories. The ratios MFI-CD19/MFI-CD3 and MFI-CD7/MFI-CD3 were also calculated. Figures 3a–3c display, as an example, the values of MFI-CD19, MFI-CD3, and MFI-CD19/MFI-CD3 for lymphocytes in normal and leukemic samples. The interlaboratory comparison showed high variability for these parameters. In contrast, there were no significant intralaboratory differences between normal and leukemic samples, which allowed a pooled analysis of lymphocytes from normal and leukemic samples, as compiled in Table 1. The highest MFIs were measured for CD22-PE (overall median ± SD, 36.4 ± 16.7) and CD3-FITC (21.5 ± 10.5). The highest CD22-PE MFIs were recorded in centers VII and III (67.6 ± 36.5 and 50.5 ± 51.5, respectively), whereas the lowest (15.5 ± 5.0) was recorded in center II. The lowest FL intensities were measured with the FITC-conjugated antibodies against CD19 and CD7 (overall median ± SD, 5.3 ± 2.4 and 7.4 ± 3.8, respectively). In center IV, the CD7 MFI of 14.3 ± 4.3 was almost four times that obtained in center I and twice that in center III. The lowest MFI values for CD19 were recorded in centers I and II (2.2 ± 0.9 and 2.1 ± 0.8, respectively), whereas center IV recorded a maximum value of 10.2 ± 2.4, about five times higher.
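The text does not name the nonparametric tests used. As one plausible choice, the within-laboratory comparison of lymphocyte MFI between normal and leukemic samples could use a two-sided Mann–Whitney U test. The pure-Python sketch below assigns average ranks to ties and uses the tie-corrected normal approximation without continuity correction, which is adequate only for moderate sample sizes; the MFI values are hypothetical.

```python
import math


def rank_with_ties(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U for sample x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


# Hypothetical lymphocyte MFI-CD3 values from one laboratory
normal = [20.1, 22.3, 19.8, 21.0, 23.5]
leukemic = [21.4, 20.7, 22.9, 19.5, 24.0, 21.8]
u, p = mann_whitney_u(normal, leukemic)
print(u, p)  # U = 13.0 and a p-value well above 0.05
```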
Table 1. Comparison of Interlaboratory Results With Respect to the Percentage of Positive Cells, Ratios CD7/cyCD3, CD19/CD22, cyCD3/CD19 and the Median Fluorescence Intensities of CD19, CD22, CD7 and cyCD3 Within the CD45hi/SSClo Gate Obtained for Each of the Eight Participating Centers
A further comparison of MFI values for all 151 samples, grouped by diagnostic category, showed no significant differences in MFI for CD19, CD3, CD7, or CD22 between categories (Fig. 3d). Therefore, the large variation in MFI observed between centers cannot be explained by differences in the diagnostic categories of recruited samples and may instead be due to (i) differences in instrument set-up and performance and (ii) differences in sample preparation.
In addition to MFI data, Table 1 provides the percentages of cells positive (%positive) for B- and T-cell antigens. The %positive values for CD3 and CD7 were considerably higher than those for CD19 and CD22 and did not vary significantly between centers, indicating the predominance of T-lymphocytes within the CD45hi/SSClo lymphocyte gate of the investigated samples.
To evaluate interlaboratory variation, interquartile ranges (IR) were used. The IR reflects the spread of measurements independently of their median and is robust to outliers, which makes it an adequate tool for comparing laboratory performance. A small IR indicates low variability of the measurements, whereas a large IR indicates increased variation; good intralaboratory performance therefore corresponds to a low IR. We compared the IR of MFI-CD3 between laboratories (Fig. 4a). The IR ranged from <5 in center I to >20 in centers V and VIII. Using arbitrarily defined criteria (good performance, IR < 10; intermediate performance, IR between 10 and 20; poor performance, IR > 20), we identified three good-performing laboratories (centers I, II, and VI), three intermediate-performing laboratories with IR above 10 (centers III, IV, and VII), and two poor performers (centers V and VIII) with IR above 20, especially for the intracytoplasmic staining of CD3.
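A sketch of the IR-based grading, using the cut-offs stated above (IR < 10 good, 10 to 20 intermediate, > 20 poor). Quartile conventions differ between software packages, and SPSS offers several definitions, so the use of Python's `statistics.quantiles` with `method="inclusive"` is an assumption; laboratories near a cut-off could be graded differently under another convention. The center names and values are hypothetical.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1) using the 'inclusive' quartile convention."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def grade(iqr_value):
    """Performance grade using the study's arbitrary cut-offs."""
    if iqr_value < 10:
        return "good"
    if iqr_value <= 20:
        return "intermediate"
    return "poor"


# Hypothetical MFI-CD3 series from two centers
mfi_cd3 = {
    "center X": [20, 21, 22, 23, 24],
    "center Y": [5, 10, 30, 40, 60],
}
for center, values in mfi_cd3.items():
    r = iqr(values)
    print(center, r, grade(r))  # center X: 2.0 good; center Y: 30.0 poor
```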
Figure 4 also shows longitudinal control charts of MFI-CD3 for all samples measured by different centers. The charts clearly demonstrate the low variation in centers I, II, and VI (“good performance,” Fig. 4b) as compared with centers III, IV, and VII (“intermediate performance,” Fig. 4c), and centers V and VIII (“poor performance,” Fig. 4d).
For center IV, the MFI values showed a similar longitudinal pattern of variation in all four FL channels, most pronounced for the first four samples (Fig. 5a). Notably, inspection of the instrument settings stored in each FCS 2.0 LMD file from center IV revealed that the PMT voltages had been changed after the first four patients were measured (Fig. 5b).
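Instrument settings of this kind can be recovered from the TEXT segment of each FCS file. The sketch below reads the header offsets and TEXT keyword/value pairs, then flags changes in the `$PnV` (detector voltage) keyword between consecutive files. It assumes the acquisition software wrote `$PnV`; instruments of that era sometimes stored settings in vendor-specific keywords instead, which would need separate mapping. The helper `make_fcs` is hypothetical and builds a minimal synthetic file for demonstration only.

```python
import re


def parse_fcs_text(raw):
    """Return the keyword/value pairs of an FCS 2.0/3.0 TEXT segment."""
    if raw[:3] != b"FCS":
        raise ValueError("not an FCS file")
    begin = int(raw[10:18])  # ASCII, right-justified TEXT start offset
    end = int(raw[18:26])    # inclusive TEXT end offset
    text = raw[begin:end + 1].decode("latin-1")
    delim = text[0]
    body = text[1:].replace(delim * 2, "\x00")  # doubled delimiter = escaped literal
    tokens = [t.replace("\x00", delim) for t in body.split(delim)]
    if tokens and tokens[-1] == "":
        tokens.pop()  # trailing delimiter closes the last value
    return {tokens[k].upper(): tokens[k + 1] for k in range(0, len(tokens) - 1, 2)}


def detector_voltages(meta):
    """Map parameter names ($PnN) to detector voltages ($PnV), where present."""
    volts = {}
    for key, value in meta.items():
        m = re.fullmatch(r"\$P(\d+)V", key)
        if m:
            name = meta.get(f"$P{m.group(1)}N", f"P{m.group(1)}")
            volts[name] = float(value)
    return volts


def voltage_changes(files_meta):
    """List (file index, parameter, old, new) whenever a voltage differs from the previous file."""
    changes, prev = [], None
    for i, meta in enumerate(files_meta):
        cur = detector_voltages(meta)
        if prev is not None:
            changes += [(i, n, prev[n], v) for n, v in cur.items() if n in prev and prev[n] != v]
        prev = cur
    return changes


def make_fcs(text):
    """Hypothetical helper: minimal FCS 2.0 header + TEXT segment, no DATA."""
    begin, end = 58, 58 + len(text) - 1
    header = f"FCS2.0    {begin:>8}{end:>8}{0:>8}{0:>8}{0:>8}{0:>8}"
    return header.encode("ascii") + text.encode("latin-1")


f1 = make_fcs("/$PAR/2/$P1N/FL1-H/$P1V/650/$P2N/FL2-H/$P2V/600/")
f2 = make_fcs("/$PAR/2/$P1N/FL1-H/$P1V/700/$P2N/FL2-H/$P2V/600/")
metas = [parse_fcs_text(f) for f in (f1, f1, f2)]
print(voltage_changes(metas))  # [(2, 'FL1-H', 650.0, 700.0)]
```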
The goal of this investigation was to provide an internal quality control procedure for studies addressing FL intensity in the setting of immunophenotyping of acute leukemia. It arose from a multicenter trial originally intended to evaluate whether a limited four-color panel of only 13 monoclonal antibodies could produce reliable and specific results for distinguishing acute leukemias from normal BM and for the lineage assignment of acute leukemias. During data exploration, it became clear that the trial also offered the opportunity to investigate inter- and intralaboratory variability in an international setting of eight European centers involved in immunophenotyping of acute leukemias. Instrument set-up and calibration for multicolor immunophenotyping were performed in each center according to the guidelines formulated by the EWGCCA (24). In each participating center, the analytical procedure used a standardized supply of titrated monoclonal antibodies and standardized operating procedures, including cell preparation with whole blood lysis, staining, instrument set-up, and FL compensation. LMD files were acquired locally in each center and analyzed centrally in one center. The expression of the T- and B-lymphocyte markers CD7, CD3, CD19, and CD22 within the CD45hi/SSClo lymphocyte gate was analyzed to assess both inter- and intralaboratory variation in this multicenter study.
A prerequisite for the comparability of measurements in eight different laboratories is the correct positioning of the window of analysis on each flow cytometer. In general, instrument performance with regard to optical alignment, FL resolution, and sensitivity is monitored with reference standard beads (8, 24). These records can be used for trend analysis of laser power and voltage settings to indicate potential technical problems. An extensive quality control of individual instruments was outside the scope of this trial. Each participant was provided with a detailed protocol plus standardized particles for instrument set-up and calibration of FL channels, including blank beads for verifying FL background signals. However, participants were asked to report these results only if they did not meet the specifications of the protocol (24). No such deviations were reported during the study.
Our approach to monitoring the quality and consistency of assay performance, including instrument set-up and calibration, was to use normal lymphocytes, which can be distinguished from the blast populations in AML, BCP-ALL, or T-ALL samples by their high CD45 expression and low SSC (26, 29, 30). Low or negative CD45 expression occurs in most cases of BCP-ALL and provides an easy way to discriminate normal lymphocytes (31). In the present study, normal CD45hi lymphocytes were clearly separated from the bulk CD45lo blast population in BCP-ALL. Our results also showed that normal lymphocytes from patients with BCP-ALL, AML, or T-ALL did not differ in their composition or in their FL intensities for CD19, CD22, CD7, and CD3 from those of normal control individuals.
In general, this close similarity between lymphocytes from patients with acute leukemias and those from normal controls makes it possible to use them as internal biological control particles for quantitative studies of marker expression intensity. Whereas reference beads are appropriate for monitoring instrument performance, lymphocytes are better suited for controlling FL compensation, cell preparation, and staining (23). Moreover, they are usually present within leukemic samples, so no additional acquisition is needed. Their FL parameters can therefore be plotted in longitudinal control charts to visualize variations not only in cell preparation and staining but also in the positioning of the window of analysis.
Despite the standardized operating procedures for cell preparation, staining, instrument set-up, and FL compensation, considerable variation between centers and flow cytometers was not unexpected. Previous studies (32) have shown that FL intensities measured in cellular immunophenotyping are highly variable between instruments, despite similar instrument set-up and closely similar FL regression lines. This has been attributed to a varying combination of factors, including differences between instruments in laser power, optical filters, and fluidic performance (17, 33). Moreover, institutional differences in sample acquisition and storage conditions (e.g., temperature and time to evaluation) may contribute to variability between sites. The current multicenter trial confirms these observations, with MFI values varying widely between centers. The highest MFI was observed for the PE-conjugated anti-CD22 antibody, with especially high values in centers VII and VIII. The lowest MFI values were recorded for the FITC-conjugated anti-CD19 and anti-CD7 antibodies, whereas the MFI of CD3-FITC was more than twice that of CD7-FITC. This may reflect differences in the amount of each antigen expressed per cell and in the FL properties of the individual fluorochromes.
Longitudinal control charts are suitable tools for tracking down such variations. Charts were constructed for each center, displaying sample-to-sample variation over time for each parameter and each antibody tested. Aligning these variations with the instrument set-up and compensation settings recorded in the FCS 2.0 LMD files showed that changes in PMT voltage and compensation can explain, at least in part, considerable variations in FL intensity. This occurred, for example, in center IV, where the instrument settings were changed after the first four samples, resulting in a decrease in CD3 MFI together with changes in other markers.
Given the predominance of T-lymphocytes within the CD45hi/SSClo lymphocyte population, T-cell markers appear to be the most suitable QC parameters for both intra- and interlaboratory control. With the antibody panel applied in this study, CD3 may be used as the QC parameter for combined surface membrane and cytoplasmic staining, and CD7 for surface membrane staining only. To standardize FL intensity data, MFI results may be corrected for differences between centers by expressing them as ratios or percentages of the relevant QC parameter measured with the same fluorochrome (e.g., MFI-CD10-FITC of blasts/MFI-CD3-FITC of lymphocytes).
A prerequisite for applying this approach is that each panel contains, for each fluorochrome used, at least one reagent that stains the lymphocyte population. Meeting this prerequisite may require adapting many staining panels currently in clinical use.
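The ratio normalization works because a center-specific multiplicative shift, such as a change in detector gain acting on relative linear intensities, affects the blast MFI and the lymphocyte MFI measured in the same fluorochrome channel equally and therefore cancels. A minimal sketch with hypothetical values for a CD10-FITC blast marker and the CD3-FITC lymphocyte QC parameter:

```python
def normalized_mfi(blast_mfi, qc_lymph_mfi):
    """Blast MFI relative to the lymphocyte QC MFI measured with the same fluorochrome."""
    if qc_lymph_mfi <= 0:
        raise ValueError("QC MFI must be positive")
    return blast_mfi / qc_lymph_mfi


# Hypothetical "true" relative intensities for one sample
true_blast_cd10, true_lymph_cd3 = 40.0, 20.0

# Center B reads everything 2.5-fold brighter (e.g. higher PMT voltage)
for center, gain in {"A": 1.0, "B": 2.5}.items():
    ratio = normalized_mfi(true_blast_cd10 * gain, true_lymph_cd3 * gain)
    print(center, ratio)  # both centers give 2.0

# An additive offset (e.g. 5 units of residual spillover) does not cancel
print(normalized_mfi(true_blast_cd10 + 5, true_lymph_cd3 + 5))  # 1.8
```

Because additive errors such as miscompensation or autofluorescence survive the division, ratio normalization complements careful compensation rather than replacing it.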
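The prerequisite can be checked mechanically for a given panel. The sketch below encodes the four FACSCalibur tubes described in the Methods (autofluorescence tube omitted) together with a set of markers assumed to be expressed by normal lymphocytes (T-, B-, or pan-leukocyte markers); that reference set is an assumption, not part of the protocol. Because "each panel" can be read as the whole panel or as each tube, both readings are shown.

```python
# Assumed lymphocyte-reactive markers (hypothetical reference set)
LYMPHOCYTE_POSITIVE = {"CD45", "CD7", "CD5", "cyCD3", "CD19", "CD22", "cyCD79a"}

# FACSCalibur tubes from the Methods: fluorochrome -> antibody
FACSCALIBUR_TUBES = {
    "surface 1": {"FITC": "CD7", "PE": "CD22", "PC5": "CD45", "APC": "CD33"},
    "surface 2": {"FITC": "CD15", "PE": "CD13", "PC5": "CD45", "APC": "CD34"},
    "surface+cyto 1": {"FITC": "cyCD3", "PE": "cyCD79a", "PC5": "CD45", "APC": "CD117"},
    "surface+cyto 2": {"FITC": "CD19", "PE": "cyMPO", "PC5": "CD45", "APC": "CD5"},
}


def missing_internal_controls(tubes, positive=LYMPHOCYTE_POSITIVE):
    """Whole-panel reading: fluorochromes with no lymphocyte-reactive reagent anywhere."""
    used = {fl for tube in tubes.values() for fl in tube}
    covered = {fl for tube in tubes.values() for fl, ab in tube.items() if ab in positive}
    return sorted(used - covered)


def per_tube_gaps(tubes, positive=LYMPHOCYTE_POSITIVE):
    """Per-tube reading: fluorochromes lacking a lymphocyte-reactive reagent in each tube."""
    return {name: sorted(fl for fl, ab in tube.items() if ab not in positive)
            for name, tube in tubes.items()}


print(missing_internal_controls(FACSCALIBUR_TUBES))  # []
for name, gaps in per_tube_gaps(FACSCALIBUR_TUBES).items():
    print(name, gaps)
```

Under the whole-panel reading every fluorochrome is covered; under the per-tube reading, the second surface tube (CD15/CD13/CD45/CD34) has no lymphocyte-reactive reagent in FITC, PE, or APC.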
Furthermore, these parameters can be used to define a tolerance zone to accept or reject study samples. The tolerance zone may be defined, e.g., as the 90% confidence limits of a sufficiently large number of observations.
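One way to implement such a tolerance zone is to read the 90% limits as the central 90% of a reference set of QC observations (5th to 95th percentiles) and to accept a sample whose QC value, for example the lymphocyte MFI-CD3, falls inside it. This is an assumption about how the limits would be computed; a parametric interval (mean ± 1.645 SD) is an alternative when the data are approximately normal. The reference values below are synthetic, chosen so the percentiles are easy to check.

```python
from statistics import quantiles


def tolerance_zone(reference, low_pct=5, high_pct=95):
    """Empirical percentile limits of a reference set of QC values."""
    cuts = quantiles(reference, n=100, method="inclusive")  # 99 cut points: 1st..99th percentile
    return cuts[low_pct - 1], cuts[high_pct - 1]


def accept(value, zone):
    """True if a sample's QC value lies within the tolerance zone."""
    low, high = zone
    return low <= value <= high


reference_qc = [float(x) for x in range(101)]  # synthetic reference observations
zone = tolerance_zone(reference_qc)
print(zone)                                    # (5.0, 95.0)
print(accept(50.0, zone), accept(97.5, zone))  # True False
```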
In conclusion, normal lymphocytes present within leukemic samples do not show substantial quantitative or qualitative differences compared to lymphocytes from normal controls with regard to the expression of CD19, CD22, CD7, and CD3. Control charts registering FL intensity parameters for these markers can provide useful information for the interpretation of both inter- and intralaboratory variations. Residual normal lymphocytes are a suitable QC parameter for monitoring intralaboratory performance and for the evaluation of interlaboratory variability in the setting of immunophenotyping of acute leukemias.
The authors acknowledge the support of Beckman Coulter and BD Biosciences, especially for supplying monoclonal antibodies free of charge to all participants of the study. They also thank M. Martin and K. Liebezeit (Berlin) for excellent technical assistance.