Conflict of Interest: Cytognos S.L. is part of the EU-supported EuroFlow Research Consortium and has implemented some of the algorithms described in the present study in its proprietary software INFINICYT; Cytognos S.L. holds a license contract for several patents owned by the University of Salamanca, of which A Orfao, CE Pedreira, and ES Costa are inventors. The other authors declare no competing financial interests.
A probabilistic approach for the evaluation of minimal residual disease by multiparameter flow cytometry in leukemic B-cell chronic lymphoproliferative disorders†
Article first published online: 3 OCT 2008
Copyright © 2008 International Society for Advancement of Cytometry
Cytometry Part A
Volume 73A, Issue 12, pages 1141–1150, December 2008
How to Cite
Pedreira, C.E., Costa, E.S., Almeida, J., Fernandez, C., Quijano, S., Flores, J., Barrena, S., Lecrevisse, Q., Van Dongen, J.J.M. and Orfao, A. (2008), A probabilistic approach for the evaluation of minimal residual disease by multiparameter flow cytometry in leukemic B-cell chronic lymphoproliferative disorders. Cytometry, 73A: 1141–1150. doi: 10.1002/cyto.a.20638
- Issue published online: 13 NOV 2008
- Article first published online: 3 OCT 2008
- Manuscript Accepted: 10 APR 2008
- Manuscript Revised: 12 FEB 2008
- Manuscript Received: 6 JUN 2007
- European Commission (EuroFlow). Grant Number: LSHB-CT-2006-018708
- The Instituto de Salud Carlos III, Ministerio de Sanidad y Consumo, Madrid, Spain. Grant Number: ISCIII-RTICC RD06/0020/0035-FEDER
- Ministerio de Educación y Ciencia, Madrid, Spain (Programa Hispano-Brasileño de Cooperación Universitaria). Grant Number: PHB 2004-0800-PC
- CAPES/Ministerio da Educação, Brasília, Brazil
- CNPq-Brazilian National Research Council, Brasília, Brazil
- FAPERJ-Rio de Janeiro Research Foundation, Rio de Janeiro, Brazil
- Fundación Marcelino Botín, Madrid, Spain
- minimal residual disease;
- flow cytometry;
- principal component analysis;
- pattern classification;
- Bayes theorem
Multiparameter flow cytometry has become an essential tool for monitoring response to therapy in hematological malignancies, including B-cell chronic lymphoproliferative disorders (B-CLPD). However, depending on the expertise of the operator, minimal residual disease (MRD) can be misidentified, given that data analysis is based on the definition of expert-based bidimensional plots, where an operator selects the subpopulations of interest. Here, we propose and evaluate a probabilistic approach based on pattern classification tools and the Bayes theorem for automated analysis of flow cytometry data from a group of 50 B-CLPD versus normal peripheral blood B-cells under MRD conditions, with the aim of reducing operator-associated subjectivity. The proposed approach provided a tool for MRD detection in B-CLPD by flow cytometry with a sensitivity of ≤8 × 10⁻⁵ (median of ≤2 × 10⁻⁷). Furthermore, in 86% of B-CLPD cases tested, no events corresponding to normal B-cells were wrongly identified as belonging to the neoplastic B-cell population at a level of ≤10⁻⁷. Thus, this approach based on the search for minimal numbers of neoplastic B-cells similar to those detected at diagnosis could potentially be applied with both a high sensitivity and specificity to investigate for the presence of MRD in virtually all B-CLPD. Further studies evaluating its efficiency in larger series of patients, where reactive conditions and non-neoplastic disorders are also included, are required to confirm these results. © 2008 International Society for Advancement of Cytometry
In recent years, multiparameter flow cytometry immunophenotyping has become an essential tool for the diagnosis and monitoring of response to therapy in a wide spectrum of diseases, including leukemic B-cell chronic lymphoproliferative disorders (B-CLPD) (1–3). Among other advantages, flow cytometry immunophenotyping allows for a rapid quantitative assessment of multiple characteristics of millions of cells, information being recorded for individual cellular events (4). This provides a tool for accurate multiparameter identification and characterization of neoplastic cells among normal cells in peripheral blood (PB) and bone marrow (BM), even when neoplastic cells are present at very low frequencies (≤10⁻⁴) among a major population of normal cells, i.e., minimal residual disease (MRD) (5–8).
Detection of MRD by multiparameter flow cytometry immunophenotyping is based on the existence of different patterns of protein expression in normal versus neoplastic cells. In the last decade, it has been shown that MRD evaluation is of great clinical utility to predict disease recurrence and patient outcome in B-CLPD such as chronic lymphocytic leukemia (CLL) (7–12). To achieve the sensitivity required for MRD investigation, large numbers of cells (typically hundreds of thousands to millions) have to be analyzed (5, 7–13). Accordingly, information about several (≥6) different cell-associated features is typically obtained and stored in a digital list mode data file format for several hundreds of thousands to millions of cells measured. Such data files typically contain >10⁶ individual data points, a number of entries that is far larger than that of a typical data set containing information about a sample analyzed by DNA oligonucleotide microarray techniques (14–17).
In the past few years, important advances have been achieved in flow cytometers allowing for the measurement of an increasingly high number of parameters (≥10) in a more rapid way, through the evaluation of tens of thousands of cells per second (18). In contrast, analysis of the data recorded has not attained the same level of progress and it is still based on strategies, which were defined more than 20 years ago (4, 18, 19). Accordingly, with a few exceptions (20–26) analysis of flow cytometry immunophenotypic data typically relies on the definition of a variable number of bidimensional plots, where an experienced operator selects the subpopulations of interest (25–27). Often, depending on the expertise of the operator, specific cell populations—particularly those present at low frequencies—can be misidentified. Overestimation and/or underestimation of specific minor cell populations has a direct impact on the assessment of MRD with potential clinical/diagnostic consequences (5, 7, 28). More recently, we have described an alternative, automated method for analysis of flow cytometry immunophenotypic data (22, 25). With this new automated approach, we could detect neoplastic B-cells present in peripheral blood (PB) samples from patients with increased PB absolute lymphocyte counts, with a high efficiency and an increased reproducibility, by reducing expert-based data-analysis decisions. However, this method was only able to detect neoplastic cells in PB when they were present at relatively high frequencies (≥5% of the whole sample cellularity) (22, 25), its sensitivity being insufficient for MRD evaluation.
At present, consensus exists about the basic requirements for an adequate MRD technique. Ideally, MRD approaches should allow clear and specific identification of neoplastic cells at frequencies ≤10⁻⁴, an MRD level which has proven to be clinically relevant in B-CLPD (1, 3, 8). In such cases, flow cytometry measurements require a minimum number of events corresponding to the neoplastic cell population (e.g., ≥10 cells) to define it as present, and at least 100 events to achieve statistical precision in quantifying the neoplastic cells (1, 3).
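As a rough worked example of these requirements, the sketch below computes how many total events would have to be acquired for the expected number of neoplastic events to reach a given minimum at a given MRD frequency. The function name and the use of exact fractions are illustrative choices, not part of the original study, and the calculation ignores sampling variation.

```python
import math
from fractions import Fraction


def min_events_to_acquire(mrd_frequency, required_neoplastic_events):
    """Smallest total event count whose expected neoplastic content
    reaches `required_neoplastic_events` (expected value only)."""
    return math.ceil(required_neoplastic_events / mrd_frequency)


# MRD frequency of 10^-4, written as an exact fraction to avoid float rounding.
frequency = Fraction(1, 10_000)

events_to_detect = min_events_to_acquire(frequency, 10)     # >= 10 events to call MRD present
events_to_quantify = min_events_to_acquire(frequency, 100)  # ~100 events to quantify it

print(events_to_detect, events_to_quantify)
```

Under these assumptions, detection at 10⁻⁴ calls for about 10⁵ acquired events and precise quantification for about 10⁶, which is in line with the "hundreds of thousands to millions" of cells mentioned in the text.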
In this article, we describe an automated strategy for the detection of MRD in B-CLPD based on pattern classification tools and the Bayes theorem (29). With this probabilistic approach, we were able to systematically identify MRD by flow cytometry with a sensitivity of ≤8 × 10⁻⁵, a sensitivity of ≤2 × 10⁻⁷ being reached for the large majority of cases (80%). Furthermore, this approach allows “a priori” definition (e.g., at diagnosis) of the sensitivity that will be reached for each case later on during follow-up of the disease.
MATERIALS AND METHODS
Patients and Samples
A total of 50 EDTA-anticoagulated diagnostic PB samples from 50 patients (28 males and 22 females; mean age of 51 years, ranging from 40 to 89 years) with different subtypes of leukemic B-CLPD were included in this study. Patients were classified according to the WHO criteria (30) into the following diagnostic categories: B-cell chronic lymphocytic leukemia (B-CLL), 31 patients (26 typical and 5 atypical B-CLL cases); mantle cell lymphoma (MCL), 8; splenic marginal zone lymphoma (SMZL), 3; mucosa-associated lymphoid tissue (MALT) lymphoma, 3; diffuse large B-cell lymphoma (DLBCL), 2; and follicular lymphoma (FL), 1. The other two cases had an unclassifiable B-CLPD. Median white blood cell (WBC) and lymphocyte counts were 31 × 10⁹ leukocytes/L (range: 2.9 × 10⁹ to 183 × 10⁹ leukocytes/L) and 23 × 10⁹ lymphocytes/L (range: 0.8 × 10⁹ to 164 × 10⁹ lymphocytes/L), respectively. Overall, the median percentage of neoplastic B cells in the 50 infiltrated specimens was 40% (range: 0.4–92%). Moreover, PB samples from three patients diagnosed with B-CLL, which were collected after therapy under MRD conditions, were also included in this study.
In addition, EDTA-anticoagulated PB samples from a total of 5 healthy adult individuals (3 males and 2 females) were also collected. Median WBC and lymphocyte counts as well as B-cell percentages and absolute counts were 6.4 × 10⁹ leucocytes/L (range: 4.6 × 10⁹ to 10 × 10⁹ leucocytes/L), 2.1 × 10⁹ lymphocytes/L (range: 1.6 × 10⁹ to 2.8 × 10⁹ lymphocytes/L), 3.8% of WBC (range: 3–4.7%), and 0.24 × 10⁹ B-cells/L (range: 0.16 × 10⁹ to 0.47 × 10⁹ B-lymphocytes/L), respectively.
All healthy volunteers and B-CLPD patients gave their informed consent prior to entering this study, which was previously approved by the local Ethical Committee of the University Hospital of Salamanca (Salamanca, Spain).
Multiparameter Flow Cytometry Immunophenotypic Studies
Multiparameter flow cytometric studies were performed for each PB sample using a panel of five combinations of fluorochrome-conjugated monoclonal antibodies (MAb), in the format fluorescein isothiocyanate (FITC)/phycoerythrin (PE)/peridinin chlorophyll protein-cyanin 5.5 (PerCP-Cy5.5)/allophycocyanin (APC), which systematically included CD19PerCP-Cy5.5 in every combination, in addition to FMC7FITC/CD24PE/CD34APC, sIgλFITC/sIgκPE/CD5APC, CD22FITC/CD23PE/CD20APC, CD103FITC/CD25PE/CD11cAPC, and CD43FITC/CD79bPE, for a total of 15 different markers, of which one (CD19) was common to all combinations. The techniques used to stain the cells have been previously described in detail by Sanchez et al. (13). For each staining corresponding to each individual sample analyzed, information about >5 × 10⁴ cells was acquired in a FACSCalibur flow cytometer (Becton/Dickinson Biosciences, BDB, San José, CA) and stored using the CellQUEST software program (BDB). Overall, for each sample, information about 17 different biological features (15 immunophenotypic and 2 light scatter characteristics) was measured and stored for a total of >2.5 × 10⁵ events. For samples with low percentages of B cells (i.e., normal samples), an additional second-step acquisition was specifically performed to measure higher numbers of the B-cells contained in each stained aliquot. In this latter step, an electronic live-gate set on a CD19 versus side light scatter (SSC) bivariate dot plot histogram was used to collect information about the B-cells contained in a total of 10⁶ events/sample aliquot. Accordingly, for each PB sample, five different data files were stored, each containing information about five or six cell attributes: two light scatter attributes, forward light scatter (FSC) and sideward light scatter (SSC), plus three or four fluorescence attributes.
Based on the panel of reagents used, three out of the 17 attributes measured were common to all five data files (FSC, SSC and CD19); all other 14 parameters varied among the data files according to the reagents used in each staining from the panel described above.
Data files from the five multicolor stainings corresponding to the same PB sample (from either B-CLPD patients or healthy individuals) were merged using the INFINICYT™ software program (Cytognos SL, Salamanca, Spain), as previously reported (26). Subsequently, the “calculation” function of the INFINICYT software based on the nearest neighbor principle (26, 31, 32) was applied to the merged data files, to calculate the information about each individual attribute not actually measured for individual events in the merged data files, for the whole panel of markers tested. As a result, for each PB sample, a single data file was obtained containing information about all 17 cellular attributes measured for each event recorded.
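The nearest neighbor principle behind this step can be illustrated with a small sketch. This is not the INFINICYT implementation, which is proprietary and not described in detail here; it only assumes that an attribute missing from one tube is copied from the event in another tube that lies closest in the space of the shared attributes (FSC, SSC, CD19). The function name, the dictionary layout, and the numeric values are hypothetical.

```python
import math


def fill_missing_by_nearest_neighbor(target_events, donor_events,
                                     common_keys, donor_only_keys):
    """For each target event, copy the donor-only attributes from the donor
    event closest to it (Euclidean distance) on the shared attributes."""
    completed = []
    for event in target_events:
        position = [event[k] for k in common_keys]
        nearest = min(
            donor_events,
            key=lambda d: math.dist(position, [d[k] for k in common_keys]),
        )
        filled = dict(event)              # keep the measured values
        for key in donor_only_keys:
            filled[key] = nearest[key]    # estimated, not measured
        completed.append(filled)
    return completed


# Toy events (made-up channel values): tube 1 measured CD5, tube 2 measured CD23.
tube_1 = [{"FSC": 300.0, "SSC": 120.0, "CD19": 650.0, "CD5": 710.0}]
tube_2 = [
    {"FSC": 305.0, "SSC": 118.0, "CD19": 645.0, "CD23": 480.0},
    {"FSC": 520.0, "SSC": 300.0, "CD19": 90.0, "CD23": 15.0},
]

merged = fill_missing_by_nearest_neighbor(
    tube_1, tube_2, common_keys=["FSC", "SSC", "CD19"], donor_only_keys=["CD23"]
)
print(merged[0])
```

In this toy case the tube 1 event receives the CD23 value of the first tube 2 event, its closest match on FSC, SSC, and CD19. A brute-force search like this scales poorly; a spatial index would be the natural choice for files with hundreds of thousands of events.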
For each patient data file, a minimum of 2.5 × 10⁵ cellular events were measured. The number of cellular events corresponding to B-cells within the total events measured in each file varied between 3.2 × 10³ and 6.5 × 10⁵ (mean of 4.2 × 10⁵ ± 2.2 × 10⁵) according to the percentage of B-cells present in each PB sample (mean of 34% ± 29%; range: 0.1–92%).
Data on normal cells was computationally generated by merging data from five different data files, each containing 1 × 10⁶ events corresponding to a normal PB sample (total events in the merged data file of 5 × 10⁶). The merged data files contained events corresponding to normal PB cells from five different normal PB samples obtained from an identical number of healthy volunteers. From this merged file, a pool of 79,856 normal B-cell events, denoted from here on as “normal-B-cell-pool data,” was obtained.
In diagnostic PB samples from B-CLPD patients, very few normal B-cells can typically be found, because almost all B-cells present in the sample are neoplastic. For this reason, artificial “B-CLPD diagnostic-files” were built electronically for each patient by mixing events corresponding to neoplastic B-cells from the patient with events corresponding to normal B-cells from the “normal-B-cell-pool file” at a 1:1 proportion. In addition, further dilutions of neoplastic B-cell events in the “normal-B-cell-pool data” were also performed at different concentrations, to simulate progressively lower levels of MRD (“MRD-files”), as described below in this section.
For manual (operator dependent) data analysis, the INFINICYT software program was used. Briefly, during this process, total B cells were identified as those CD19+ events showing low to intermediate FSC and SSC values, after specifically excluding platelets and cell debris (11). Normal PB B cells were identified as being CD19+, CD20hi, CD22++, CD23−/lo, CD43−, CD79b+, FMC7+, CD103−, CD25−/lo, CD11c−/+, CD5−/lo, and either sIgκ+/sIgλ− or sIgλ+/sIgκ−. Neoplastic B cells were all other PB B-lymphocytes showing an aberrant phenotype. After this step, B-cells present in each sample were gated and the information about those events corresponding to B-cells was stored in a new data file.
For automated analysis, a Principal Component Analysis (PCA) Transformation was applied to each of the artificial “B-CLPD diagnostic-files” (33). In sequence, first we restricted our attention to the data projection into the space defined by the first versus second principal components. This PCA projection of diagnostic-files containing a 1/1 mixture of normal and neoplastic B-cells, systematically showed the presence in the data file of three clearly defined groups of B-cell events (Fig. 1). In fact, normal PB B-cells constantly displayed a bimodal distribution where two subpopulations defined by the isotype of the immunoglobulin light chain (sIg) expressed (sIgκ+ or sIgλ+), were detected; the third group of B-cell events corresponded to neoplastic B cells.
The goal of our automated data analysis strategy was first to split the two normal subpopulations (sIgκ+ or sIgλ+) for each of the 50 merged diagnostic-files, by using a classical k-means algorithm (34). Then, the normal B-cell population (e.g., blue and green events in Fig. 1C) which appeared to be localized closer to the neoplastic B-cells (e.g., red events in Fig. 1C) was identified for each “B-CLPD diagnostic-file” using the distance between the means of each B-cell population in the data file in ℝ¹⁷; the other, far-off normal B-cell population (e.g., blue events in Fig. 1C) was temporarily discarded. Accordingly, at this point, we ended up with two B-cell populations for each case: one of the two normal B-cell populations and that of neoplastic B-cells.
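A minimal sketch of these two steps is given below, assuming a plain k-means with k = 2 and Euclidean distance between population means. It uses toy two-dimensional points rather than the 17 measured attributes, and the seeding rule, function names, and data are illustrative assumptions rather than details taken from the study.

```python
import math


def mean_vector(points):
    """Component-wise mean of a list of equally sized points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def two_means(points, iterations=20):
    """Plain k-means with k = 2, seeded with the first point and the point
    farthest from it (an arbitrary but deterministic choice)."""
    centers = [points[0], max(points, key=lambda p: math.dist(p, points[0]))]
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            nearest = 0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            groups[nearest].append(p)
        # Keep the previous center if a group happens to be empty.
        centers = [mean_vector(g) if g else c for g, c in zip(groups, centers)]
    return groups


def closest_normal_cluster(normal_clusters, neoplastic_events):
    """Normal cluster whose mean lies nearest to the neoplastic mean."""
    neoplastic_mean = mean_vector(neoplastic_events)
    return min(normal_clusters,
               key=lambda c: math.dist(mean_vector(c), neoplastic_mean))


# Toy data: two normal subpopulations and one neoplastic population.
kappa_like = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
lambda_like = [(5.0, 1.0), (5.1, 1.2), (4.9, 0.8)]
neoplastic = [(6.0, 4.0), (6.2, 4.1)]

clusters = two_means(kappa_like + lambda_like)
retained = closest_normal_cluster(clusters, neoplastic)
print(retained)
```

Here the second normal group is retained because its mean is closer to the neoplastic mean, and the other group would be set aside, mirroring the selection described in the text.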
Afterward, we calculated the mean and the covariance matrices for the two populations of normal B-cells (sIgκ+ or sIgλ+) as well as for each population of neoplastic B-cells from individual B-CLPD diagnostic data files (n = 50). On the basis of these results, we estimated the likelihoods for both populations, i.e., the probability distribution functions (pdf) p(x|normal) and p(x|neoplastic). Here, p(x|normal) is the pdf of an event assuming a value x, given that the population is known to be normal (and not neoplastic). This strategy may be applied because well-established conventional approaches for the investigation of MRD in B-CLPD indicate that, in general, under MRD conditions after therapy, one should search for a neoplastic population with similar features to those observed at diagnosis (7–13). Accordingly, we assume that the pdf p(x|normal) and p(x|neoplastic) remain unchanged from diagnosis to sequential follow-up PB samples obtained after therapy. Note that what one really wants to know is P(normal|x), i.e., the probability that an event belongs to the normal population after measuring the attributes of this event. This goal may be achieved by applying the Bayes theorem (35) as follows (1):
$$P(\mathrm{normal} \mid x) = K\, p(x \mid \mathrm{normal})\, P(\mathrm{normal}), \qquad P(\mathrm{neoplastic} \mid x) = K\, p(x \mid \mathrm{neoplastic})\, P(\mathrm{neoplastic}) \tag{1}$$
Here, K is a constant that makes P(normal|x) + P(neoplastic|x) = 1, and P(neoplastic) and P(normal) are the “a priori” probabilities of the two classes (neoplastic and normal B-cells, respectively). These reflect the probability of finding an event in one of the two classes before being able to “see” the real data. In many applications, these a priori probabilities can be easily estimated from the relative frequencies of the classes in the sample. However, in the MRD setting, we are interested in estimating the “a priori” probabilities in the “B-CLPD diagnostic-files” to be applied, not at diagnosis, but after therapy during the follow-up period, where the relative proportion between normal and neoplastic B-cells is expected to change and variably increase. This introduces a new challenge, because the number of neoplastic B-cell events is exactly what we are looking for. To overcome this difficulty, we introduced the following iterative procedure:
Step 1: Let N_i denote the number of neoplastic B-cell events at iteration i. Set counter i = 0 and initialize N_0 such that the frequency of neoplastic B-cell events is initially overestimated.
Step 2: Estimate P(neoplastic) based on N_i and apply the Bayes theorem as in (1) to calculate P(neoplastic|x) and P(normal|x).
Step 3: Allocate all observations to the neoplastic or the normal B-cell populations by applying the Optimal Bayesian Decision Rule (36): an observation x is assigned to the neoplastic B-cell population if P(neoplastic|x) > P(normal|x); otherwise, it is assigned to the normal population.
Step 4: Increase i and recalculate N_i.
Step 5: If […] go to Step 2; otherwise, STOP.
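The iterative procedure above can be sketched as follows. Several points are assumptions made only for illustration: the class-conditional densities are taken to be Gaussians with diagonal covariance (the study estimated full covariance matrices), the loop is assumed to stop once the count of events classified as neoplastic no longer changes (the stopping condition is not recoverable from the text), and the prior is kept strictly between 0 and 1 so that its logarithm is defined. All names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (a simplification)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def iterative_bayes_count(events, normal, neoplastic,
                          initial_count=1000, max_iter=50):
    """`normal` and `neoplastic` are (mean, variance) pairs estimated at
    diagnosis. Returns the final neoplastic count and per-event labels."""
    total = len(events)
    count = initial_count                      # Step 1: overestimated start
    labels = [False] * total
    for _ in range(max_iter):
        # Step 2: prior from the current count (clamped; assumption).
        prior_neo = min(max(count, 1), total - 1) / total
        prior_norm = 1.0 - prior_neo
        # Step 3: Bayes decision rule; the constant K cancels in the comparison.
        labels = [
            math.log(prior_neo) + log_gaussian_diag(x, *neoplastic)
            > math.log(prior_norm) + log_gaussian_diag(x, *normal)
            for x in events
        ]
        new_count = sum(labels)                # Step 4: recalculate the count
        if new_count == count:                 # Step 5: assumed stopping rule
            break
        count = new_count
    return count, labels


# Toy one-dimensional data: 998 normal-like events and 2 neoplastic-like events.
events = [[0.1 * (i % 21) - 1.0] for i in range(998)] + [[9.8], [10.3]]
normal_model = ([0.0], [1.0])
neoplastic_model = ([10.0], [1.0])

n_neoplastic, labels = iterative_bayes_count(events, normal_model, neoplastic_model)
print(n_neoplastic)
```

Working in log space avoids numerical underflow when many attributes are multiplied together, and comparing the two posteriors directly makes the normalizing constant K unnecessary.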
For all experiments, we started with an initial estimate of N_0 = 1000 neoplastic B-cell events, assuming that one is seeking a residual neoplastic B-cell population smaller than this number. To test the sensitivity of the method in detecting progressively lower levels of MRD, we selected decreasing quantities of observations corresponding to neoplastic B-cells from each diagnostic data file and added these events to the pool of normal events (“normal-B-cell-pool data” file). Accordingly, for each of the 50 B-CLPD samples, files containing only the neoplastic B-cell events were used to randomly build 88 different sets of data with 1, 2, 3, …, 50, 60, 70, …, 300, 350, 400, …, 1000 neoplastic B-cell events. Each of these sets was merged with the “normal-B-cell-pool data” file to generate files containing progressively lower levels of MRD (“MRD-files”). Consequently, for each of the 50 patients, 88 “MRD-files” were generated containing known proportions of between 1 and 1000 neoplastic B cells in the pool of 5 × 10⁶ normal cells (MRD frequencies of between 2 × 10⁻⁴ and 2 × 10⁻⁷). The overall procedure run for each individual B-CLPD case is summarized in Figure 2.
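The electronic dilution can be illustrated with a short sketch. It assumes that neoplastic events are drawn at random without replacement and appended to the normal pool, and that the MRD frequency is expressed relative to the 5 × 10⁶ normal cells from which the pool was derived. Function names and the toy events are hypothetical.

```python
import math
import random


def build_mrd_file(neoplastic_events, normal_pool, n_neoplastic, rng):
    """Spike `n_neoplastic` randomly chosen neoplastic events into the pool."""
    spiked = rng.sample(neoplastic_events, n_neoplastic)
    return normal_pool + spiked


def mrd_frequency(n_neoplastic, total_normal_cells=5_000_000):
    """Frequency of spiked events relative to the normal cells represented."""
    return n_neoplastic / total_normal_cells


# Toy events: each event is a tuple of attribute values.
rng = random.Random(0)                      # fixed seed for reproducibility
neoplastic_events = [(float(i), 1.0) for i in range(50)]
normal_pool = [(0.0, float(i)) for i in range(200)]

mrd_file = build_mrd_file(neoplastic_events, normal_pool, 3, rng)
highest = mrd_frequency(1000)               # largest dilution level in the text
lowest = mrd_frequency(1)                   # smallest dilution level in the text
print(len(mrd_file), highest, lowest)
```

With these definitions, 1000 and 1 spiked events correspond to frequencies of 2 × 10⁻⁴ and 2 × 10⁻⁷, matching the range quoted in the text.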
Two measures of performance were used for the 50 B-CLPD cases. The first relates to the minimal number of true neoplastic B-cell events the system is able to identify in a total of 5 × 10⁶ events (sensitivity of the method); for this purpose, we defined the sensitivity of the method as the minimum number of true neoplastic B-cell events added to the “normal-B-cell-pool data” file that is associated with the detection of more than 60% of the neoplastic B-cell events merged. The second measure relates to the level of agreement observed between the number of neoplastic B-cell events added to the “normal-B-cell-pool data” file and the corresponding number of neoplastic B-cell events actually detected by the proposed procedure in that specific merged data file. For each B-CLPD case, we compared the MRD dilution frequencies of neoplastic B-cell events with the number of neoplastic B-cells identified as present by the approach described here in each of the individual MRD-files, and calculated the degree of correlation between the two measures (Pearson correlation coefficients).
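Both performance measures can be computed from paired lists of added and detected event counts, as in the sketch below. The numbers are invented for illustration and are not results from the study; the function names are hypothetical.

```python
import math


def sensitivity_level(added, detected, min_fraction=0.60):
    """Smallest number of added neoplastic events for which more than
    `min_fraction` of them were detected; None if no level qualifies."""
    qualifying = [a for a, d in zip(added, detected) if d > min_fraction * a]
    return min(qualifying) if qualifying else None


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented example: events added to the pool vs. events detected.
added = [1, 2, 3, 5, 10, 50, 100]
detected = [0, 1, 2, 4, 9, 49, 98]

level = sensitivity_level(added, detected)
r = pearson_r(added, detected)
print(level, level / 5_000_000, round(r, 4))
```

In this invented case the first dilution level at which more than 60% of the added events are recovered is 3 events, which would correspond to a frequency of 6 × 10⁻⁷ in 5 × 10⁶ cells.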
Overall, a high degree of correlation was observed between the number of diluted and the number of computationally identified neoplastic B-cells for all 50 B-CLPD cases included in this study. In most cases (45/50 cases; 90%), Pearson correlation coefficients (r²) higher than 0.999 were detected. Those five cases showing the lowest correlation coefficients (r² ≥ 0.964 and ≤ 0.999) are displayed in Figure 3. Of note, for all these latter five cases, bivariate projections of the first versus the second principal components (Fig. 3, Column A) showed a clear, partial overlap between the neoplastic B-cells and one of the normal B-cell populations in ℝ¹⁷ (either sIgκ+ or sIgλ+ normal B-cells).
Regarding sensitivity, in most cases (40/50 cases; 80%), MRD detection at levels as low as 1 event in 5 × 10⁶ normal PB cells (2 × 10⁻⁷) was achieved. Interestingly, those five cases showing coefficients of correlation (r²) ≤ 0.999, which are represented in Figure 3, were among those 10 cases that showed a lower sensitivity, between 8 × 10⁻⁵ (0.008%) and 6 × 10⁻⁶ (0.0006%) (Fig. 3, Column D). The other five cases in which a sensitivity >2 × 10⁻⁷ was achieved and which had correlation coefficients (r²) ≥ 0.999 showed sensitivity levels of between 6 × 10⁻⁶ (0.0006%) and 6 × 10⁻⁷ (0.00006%) (Fig. 4). Accordingly, overall sensitivity was >1 × 10⁻⁶ in only 7 cases (14%) with an upper limit of 8 × 10⁻⁵. Of note, cases with a lower sensitivity (of between 8 × 10⁻⁵ and 6 × 10⁻⁷) corresponded to three cases of MCL (3/8 patients), three SMZL (3/3 cases), one FL (1/1 patient), and 2 B-CLL (2/31 cases).
To evaluate the specificity for MRD detection of the proposed approach, the same strategy was applied to each “MRD-file,” after subtracting all neoplastic B-cell events in the PCA projection. The aim was to verify whether any event corresponding to normal B-cells would then be erroneously classified as a neoplastic B-cell event. Of note, in most cases (43/50 cases; 86%), no events were wrongly identified as belonging to the neoplastic B-cell populations. In the remaining seven patients, only one (3/50 cases; 6%), four (2/50 cases; 4%), and at maximum five (2/50 cases; 4%) out of 5 × 10⁶ events were wrongly identified as belonging to the neoplastic B-cell population (≤1 × 10⁻⁶).
To validate the method in patients with resistant disease as well as real MRD cases, PB samples from three different B-CLL patients were sequentially evaluated at diagnosis and after treatment. In all three cases, with the probabilistic approach here proposed, we were able to identify the presence of neoplastic B-cells after therapy (Fig. 5). For individual cases, 99%, 97%, and 92% of all neoplastic B-cells present in the follow-up samples according to an expert operator were correctly identified with the automated probabilistic approach. Furthermore, in all three cases, no events corresponding to normal B-cells as defined by an expert operator were wrongly classified as neoplastic B-cells. The specific percentages detected in the follow-up samples from these patients by an experienced operator versus the automated probabilistic approach here proposed were of 0.45%, 0.11%, and 48.9% vs. 0.45%, 0.1%, and 45%, respectively.
In the last decade, MRD detection by flow cytometry has been increasingly used to monitor response to therapy (5–11, 37). Among other advantages over molecular approaches used for MRD detection, flow cytometry is based on a relatively rapid and simple interrogation of hundreds of thousands to millions of cells, information being collected for each individual cell measured (1, 5, 27). In addition, it can be applied to the great majority of all leukemia/lymphoma patients (1, 5, 11, 37). Finally, it has been shown that conventional flow cytometry approaches for MRD detection are highly sensitive, allowing for the identification of down to around 1 × 10⁻⁴ (0.01%) neoplastic B-cells among a major population of normal PB and BM hematopoietic cells (1, 5, 7, 8, 38). However, it should be highlighted that for MRD investigation by flow cytometry, an expert operator with extensive and detailed knowledge about the patterns of protein expression associated with normal versus neoplastic B-cells is typically required for adequate data analysis (1, 5, 11, 28). This, together with the variable patterns of phenotypic aberrations detected among different subtypes of B-CLPD, has limited the establishment and implementation of standardized data analysis procedures that would facilitate the extended use of standardized flow cytometry MRD approaches in routine clinical diagnostic laboratories.
Here, we propose the use of a probabilistic approach for the evaluation of MRD by multiparameter flow cytometry in B-CLPD, through automated analysis of patient data files measured at diagnosis and the dilution of neoplastic B-cell events, at increasingly lower concentrations, into data files containing information about normal PB B-cells. Overall, our results show that this approach can be applied to virtually all B-CLPD with both a high sensitivity and specificity, whenever the search for minimal numbers of neoplastic B-cells similar to those detected at diagnosis is required. Accordingly, the strategy proposed here reached a sensitivity of 2 × 10⁻⁷ in 80% of the cases. Such sensitivity is significantly greater than the best sensitivity described in the literature for MRD by flow cytometry (1 × 10⁻⁵) in B-CLPD (7–13). Of note, for the remaining cases, which displayed a lower sensitivity (between 8 × 10⁻⁵ [0.008%] and 6 × 10⁻⁷ [0.00006%]), sensitivity could potentially be improved with additional markers. Such markers should be capable of increasing the differences already observed at diagnosis between normal and neoplastic B-cells. Inclusion of markers aimed at detecting aberrant B-cell phenotypes in SMZL, MCL, and FL (e.g., bcl2 and CD10) (39) would be particularly useful. To a great extent, this high sensitivity was reached because of the high specificity achieved with the proposed approach, because aberrant phenotypes were detected in most cases, where no normal B-cell events were misclassified. Of note, with the automated approach proposed, information about the specificity and sensitivity associated with each patient is obtained already at diagnosis, to be used later on for MRD evaluation. This is particularly important because it is not possible to base the a priori evaluation on the prevalence of each cell population at diagnosis as in Tosetto et al. (40); in this regard, in our study we introduced a scheme that provides an iterative estimation for the a priori probability. However, it should be noted that normal control B-cells were obtained from a relatively small number of healthy adults and thus, further studies evaluating the efficiency of the approach proposed here in larger series of patients, where PB B-cells from reactive conditions and non-neoplastic disorders are also included as control B-lymphocytes, are still necessary to confirm our results.
The proposed approach is aimed at identifying MRD in B-CLPD based on the probability of each individual event measured to correspond to either a normal or neoplastic B-cell. In most cases, one event was sufficient to establish the presence of MRD. Nowadays, with conventional four-color approaches for MRD detection, ≥10 events presenting similar immunophenotypic features are required to define presence of MRD; in addition, around 100 events are necessary to precisely quantify their percentage among other cells in a sample (1–3). By increasing the number of parameters simultaneously measured, we also decreased the number of events required to define the presence of an abnormal cell population down to five events (maximum number of misclassified normal B-cells in a sample) for a 100% efficiency. Furthermore, each event identified as corresponding to a neoplastic B-cell by the probabilistic approach can be further examined by an expert operator using conventional methods of manual data analysis, based on the visualization of multiple bidimensional plots. In this way, the expert operator can use the established knowledge about the phenotypic features displayed by normal B-cells and the aberrant patterns of protein expression found in B-CLPD, to evaluate the consistency of the probabilistic approach here proposed, in real individual cases. Interestingly, similar results were obtained by diluting neoplastic B-CLL samples in normal PB (n = 2), prior to staining, instead of using electronic dilution of events corresponding to neoplastic B-cells in a pool of electronic events corresponding to normal PB B-cells (data not shown).
A potential limitation of the strategy proposed here is that, after therapy, neoplastic cells from B-CLPD patients may display variations in their immunophenotypic attributes with respect to those observed at diagnosis. Previous reports have shown that this may actually occur in a few B-CLPD patients (40–42). In such cases, the proposed approach could be associated with false negative results. Because of this, we tested the probabilistic approach by analyzing a few cases where both diagnostic and MRD samples from the same patient were available, confirming the reproducibility of the approach in real MRD samples. Further studies are required to investigate the utility of this strategy to detect infiltration by B-CLPD of tissues other than PB (e.g., bone marrow, cerebrospinal fluid) for a more reproducible staging of the disease (1). In line with this, the proposed probabilistic approach could also be useful in the evaluation of MRD in other hematological malignancies. This would be particularly useful in acute myeloid leukemias where manual data analysis is much more complex due to coexistence of different abnormal cell populations in the same sample and a marked phenotypic heterogeneity (6, 28) and to the effects of therapy (43).
In summary, here we propose and evaluate a probabilistic approach aimed at automating and standardizing the search for MRD, based on the immunophenotypic features of neoplastic cells observed at diagnosis in B-CLPD. Overall the proposed strategy was associated with a higher specificity and sensitivity than previously defined with expert-based manual data analysis approaches and points out its potential utility also in other minimal disease situations in both B-CLPD and other hematological malignancies.
- 3. H43-A2 Clinical Flow Cytometric Analysis of Neoplastic Hematolymphoid Cells; Approved Guideline, 2nd ed. Clinical and Laboratory Standards Institute: Wayne, PA; 2007.
- 12. Eradication of minimal residual disease with alemtuzumab in B-cell chronic lymphocytic leukemia (B-CLL) patients: The need for a standard method of detection and the potential impact of bone marrow clearance on disease outcome. Cancer Invest 2005; 23: 488–496.
- 29. Bayesian decision theory. In: Duda RO, Hart PE, Stork DG, editors. Pattern Classification, 2nd ed. New York: Wiley; 2001. pp 20–82.
- 31. A method for generating flow cytometry data files containing an infinite number of dimensions based on data estimation. U.S. Pat. No. 11/240,167; 2007.
- 32. Generation of flow cytometry data files with a potentially infinite number of dimensions derived from the fusion of a group of separate flow cytometry data files and their multidimensional reconstruction with both actually measured and estimated flow cytometry data. Eur. Pat. No. EP1,770,387; 2007.
- 33. Maximum-likelihood and Bayesian parameter estimation. In: Duda RO, Hart PE, Stork DG, editors. Pattern Classification, 2nd ed. New York: Wiley; 2001. pp 115–116.
- 34. Unsupervised learning and clustering. In: Duda RO, Hart PE, Stork DG, editors. Pattern Classification, 2nd ed. New York: Wiley; 2001. pp 526–527.