A computer simulation for exploring the detection of monoclonal B-cell lymphocytosis by flow cytometry


  • How to cite this article: Champion PD. A computer simulation for exploring the detection of monoclonal B-cell lymphocytosis by flow cytometry. Cytometry Part B 2010; 78B (Suppl. 1): S110–S114.



Monoclonal B-cell lymphocytosis (MBL) is defined by the presence of monoclonal B-cells in peripheral blood in the absence of hematologic disease. MBL is detected by flow cytometry with increasing frequency as the number of B-cells acquired increases.


Computer simulations written in the R language were used to examine the impact of increasing the number of B-cells acquired on the sensitivity of detecting MBL and to explore the possibility of detecting distinct B-cell clones among polyclonal B-cell populations.


With simulated populations containing 0.1%–1.0% monoclonal B-cells, the number of clonal B-cells detected was approximately normally distributed when many clonal cells were acquired and became more nearly log-normal as the distributions approached the lower bound of 0. The distributions peaked around the count expected from the clonal prevalence. The detection of MBL increased sharply with a small increase in the total number of B-cells acquired when the number of clonal cells acquired was near the MBL cutoff point. MBL could be detected in log-normally distributed polyclonal B-cell populations.


Sampling variability in detecting monoclonal B-cells can be investigated through simulation. The observed population prevalence of MBL can be approximated with reasonable assumptions about the distribution of clonotypes in the circulating B-cell compartment. © 2010 International Clinical Cytometry Society

Monoclonal B-cell lymphocytosis (MBL), an essential precursor to chronic lymphocytic leukemia (1), is defined as the presence of a peripheral blood monoclonal B-cell population in the absence of hematologic or immunologic disease (2). Although the early population-based studies using routine clinical methods of flow cytometry found the prevalence of MBL to be less than 1% (3), subsequent studies using more sensitive research methods have found much higher levels (4). A recent population-based study in which five million leukocytes were analyzed from each individual found MBL in 14% of older adults (5). Mathematical modeling of these empirical data suggests that MBL would be detected in virtually all older adults if a sufficient number of leukocytes were analyzed (A. Orfao, personal communication).

The ability to detect a B-cell clone by flow cytometry depends on acquiring enough B-cells with a sufficiently distinct clonotype, defined in principle by either phenotypic or genotypic characteristics. Because all B-cells arise from clonal expansion, the distinction between MBL and a normal polyclonal population with one or more dominant clones is somewhat arbitrary. The methods used to detect MBL have a profound influence on its reported prevalence and natural history (4, 6). In particular, the number of B-cells acquired is a critical factor in the sensitivity of detection (4). However, no systematic analysis of how this parameter affects sensitivity has been attempted, nor has the influence of differing population distributions within the peripheral blood B-cell compartment been explored.

This report describes computer simulations for investigating the detection of MBL by flow cytometry. Two scenarios are simulated: one simple and the other more complex. The simple scenario assumes that an MBL clone constitutes 0.1%, 0.5%, or 1.0% of all circulating B-cells, which reflects the practical analytical situation. The simulation then examines the effect of increasing the number of B-cells acquired on the sensitivity of detecting MBL. The more complex scenario explores the probability of identifying any monoclonal population within the circulating polyclonal B-cell compartment. The simulation assumes that a distinct phenotype or genotype, collectively referred to as a clonotype, can be identified for every B-cell clone.


All simulations were coded in the open-source R language (7). The code is available through the Cytometry web site.

We chose parameters for the simulation based on the report by Nieto et al. (5) because of the extensive B-cell sampling used in this general population study. To simulate the simple case of detecting one monoclonal population against a background of polyclonal B-cells, we assumed that the clonal prevalence (i.e., the proportion of monoclonal cells among all B-cells) constituted 0.1%, 0.5%, or 1.0% and that the detection of a monoclonal B-cell was random. We created a B-cell population based on a uniform distribution that included the specified proportion of monoclonal B-cells. We then simulated the detection of MBL cells among varying numbers of total B-cells ranging from 1,000 to 128,000 in increments of half a binary log (i.e., 1,000 multiplied by 2 raised to the 0, 0.5, 1.0, 1.5, etc., power, up to 7). The number of simulated samples analyzed for a given number of B-cells collected was determined initially by evaluating the smoothness of the resulting frequency distributions and then empirically establishing a formula that produced a sufficiently smooth distribution for all collection sizes (Table 1).
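The sampling step described above can be sketched in a few lines. The original simulations were written in R; the Python below is only an illustrative stand-in, and the function names (`simulate_clonal_counts`, `fraction_meeting_cutoff`) are hypothetical. It assumes that each acquired B-cell is clonal independently with probability equal to the clonal prevalence, which is one plausible reading of "random" detection against a uniform distribution.

```python
import random
from collections import Counter


def simulate_clonal_counts(n_acquired, prevalence, n_samples, seed=0):
    """Tally how many virtual samples contain each clonal-cell count."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_samples):
        # One uniform draw per acquired B-cell; the cell is counted as
        # clonal when the draw falls below the clonal prevalence
        # (an assumption about how "random detection" was modeled).
        clonal = sum(rng.random() < prevalence for _ in range(n_acquired))
        tally[clonal] += 1
    return tally


def fraction_meeting_cutoff(tally, cutoff=50):
    """Share of virtual samples with at least `cutoff` clonal cells."""
    total = sum(tally.values())
    return sum(v for k, v in tally.items() if k >= cutoff) / total


# Small illustrative run: 1% prevalence, 4,000 B-cells, 200 virtual samples.
tally = simulate_clonal_counts(4_000, 0.01, 200)
print(sorted(tally.items()))
print(fraction_meeting_cutoff(tally))
```

The tally corresponds to one distributional curve, and the fraction at or above the cutoff corresponds to one point on a detection curve. The number of virtual samples here is far smaller than in the study and is chosen only to keep the example quick.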

Table 1. Number of Virtual Samples in the Simple MBL Simulation
B-cells acquired    Number of samples (a)
(a) Calculated by SizesSampleNum = (SampleSize × 64)/SqRt(SampleSize/1,000).
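The acquisition sizes and the footnote formula can be tabulated directly. The sketch below is in Python rather than the R used for the study, and it assumes that "increments of half a binary log" means sizes of 1,000 × 2^(k/2) for k = 0 to 14. That reading reproduces the 90,509 B-cells mentioned in the caption of Figure 1 (k = 13). The function names are hypothetical.

```python
import math


def acquisition_sizes(start=1_000, stop=128_000):
    """B-cell counts from start to stop in half-log2 steps: start * 2**(k/2)."""
    n_steps = round(2 * math.log2(stop / start))
    return [start * 2 ** (k / 2) for k in range(n_steps + 1)]


def virtual_sample_count(sample_size):
    """Table 1 footnote: (SampleSize * 64) / sqrt(SampleSize / 1,000)."""
    return sample_size * 64 / math.sqrt(sample_size / 1_000)


sizes = acquisition_sizes()
total = sum(virtual_sample_count(n) for n in sizes)
for n in sizes:
    print(f"{n:>9,.0f} B-cells -> {virtual_sample_count(n):>9,.0f} samples")
print(f"total virtual samples: {total:,.0f}")
```

Under these assumptions the formula gives 64,000 virtual samples at 1,000 B-cells, and the total over the 15 sizes comes to about 4.21 million, in line with the total number of simulated samples reported in the Results.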


To explore the more complex situation of detecting monoclonal B-cells among the clonal families constituting the entire compartment of circulating B-cells, we considered a clonal family to include all the cells originating from successful IgH and IgL rearrangements in pre-B-cells.

We simulated populations in which the number of B-cells in each clonal family was log-normally distributed. Each simulated population had sufficient clonal families to total 800 million circulating B-cells (Table 2), reflecting the average B-cell count reported by Nieto et al. (5) and assuming a total blood volume of 5 L. We further assumed that the peripheral blood B-cell compartment comprised 800,000 different clonal families, an arbitrary value that lies within the wide range of estimates based on experimental data and the combinatorial varieties produced by immunoglobulin genes. Based on this assumption, the mean for the number of B-cells in each clonal family was initially set at 1,000. Using the rlnorm() function in the R language, the log sigma parameter (standard deviation) was empirically evaluated at values between 1.0 and 1.3 to explore distributions with increasing right-hand skewness, i.e., an extended right tail containing the most highly populated clonal families. As in the simple scenario, we varied the total number of B-cells acquired from 1,000 to 128,000. The detection of MBL was defined as the acquisition of at least 50 cells from the same clonal family.

Table 2. Parameters for the Complex MBL Simulation
Peripheral blood B-cell count                   160 cells/μL
Total blood volume                              5 L
Total circulating B-cells                       800 million
B-cells acquired for analysis                   1,000–128,000
Distribution of B-cells among all clones        Log normal (natural log base)
Standard deviation of log distribution          1.0–1.3
Total number of clonotypes among all B-cells    800,000
Average number of B-cells per clone             1,000
Clonal cells required for MBL designation       50
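The parameters in Table 2 can be turned into a rough sketch of the complex scenario. This is not the study's R code. It is a Python illustration under stated assumptions: the log-mean is chosen so that the arithmetic mean family size is 1,000 (for a log-normal, mean = exp(mu + sigma²/2)), family sizes are left as real numbers, and cells are acquired with replacement in proportion to family size. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import accumulate

# Arithmetic behind Table 2.
B_CELLS_PER_UL = 160
BLOOD_VOLUME_UL = 5 * 1_000_000                    # 5 L in microliters
TOTAL_B_CELLS = B_CELLS_PER_UL * BLOOD_VOLUME_UL   # 800 million
N_FAMILIES = 800_000
MEAN_FAMILY_SIZE = TOTAL_B_CELLS // N_FAMILIES     # 1,000


def make_population(n_families=N_FAMILIES, mean_size=MEAN_FAMILY_SIZE,
                    sigma=1.15, seed=0):
    """Log-normally distributed clonal family sizes with a fixed mean."""
    rng = random.Random(seed)
    # Assumption: solve mean = exp(mu + sigma^2 / 2) for mu.
    mu = math.log(mean_size) - sigma ** 2 / 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_families)]


def largest_clone_counts(family_sizes, n_acquired, top=4, seed=0):
    """Acquire cells at random; return counts of the `top` largest families."""
    rng = random.Random(seed)
    cum = list(accumulate(family_sizes))
    # Sampling with replacement approximates drawing a small sample
    # from a very large circulating compartment.
    drawn = rng.choices(range(len(family_sizes)), cum_weights=cum, k=n_acquired)
    return [count for _, count in Counter(drawn).most_common(top)]


def has_mbl(family_sizes, n_acquired, cutoff=50, seed=0):
    """True when any single clonal family contributes >= cutoff cells."""
    return largest_clone_counts(family_sizes, n_acquired, top=1, seed=seed)[0] >= cutoff


if __name__ == "__main__":
    population = make_population(sigma=1.15)
    print(largest_clone_counts(population, 128_000))
    print(has_mbl(population, 128_000))
```

Repeating `make_population` and `has_mbl` over many seeds for each sigma would give the proportion of simulated samples designated as MBL. How closely that tracks the published figures depends on the assumptions listed above.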


The results from a total of 4,212,727 simulated samples for the simple scenario are depicted graphically (Fig. 1) as both distributional curves (left) and as cumulative relative frequency curves (right). As expected, the distributional curves peaked around the overall mean of clonal B-cells in each curve. For instance, at 1% clonal prevalence, the acquisition of 8,000 B-cells showed a peak around 80 clonal cells, the acquisition of 4,000 B-cells showed a peak around 40 clonal cells, and the acquisition of 2,000 B-cells showed a peak around 20 clonal cells. The curves appeared normally distributed in the higher range of the number of clonal cells detected. However, as the chance of detecting any clonal cell approached 0, the distributions appeared more log-normal.
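The location and shape of these curves are what binomial sampling predicts. The Python sketch below is illustrative only and is not taken from the study's R code. It assumes that each acquired B-cell is clonal independently with probability p, so that the clonal count is binomial with mode floor((n + 1)p) and skewness (1 − 2p)/√(np(1 − p)).

```python
import math


def binomial_mode(n, p):
    """Most probable clonal-cell count for n acquired cells: floor((n + 1) * p)."""
    return math.floor((n + 1) * p)


def binomial_skewness(n, p):
    """Skewness of the binomial count; near 0 means nearly symmetric."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))


# Peaks at 1% prevalence, then a low-count case (0.1% prevalence, 1,000 cells).
for n, p in ((8_000, 0.01), (4_000, 0.01), (2_000, 0.01), (1_000, 0.001)):
    print(n, p, binomial_mode(n, p), round(binomial_skewness(n, p), 3))
```

The skewness grows as the expected clonal count np shrinks toward 0, which matches the shift from symmetric, normal-looking curves to right-skewed curves described above.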

Figure 1.

Results from the simple computer simulation for detecting monoclonal B-cells. The left panel of graphs shows the distribution of monoclonal B-cells detected in each analysis of up to 90,509 B-cells acquired. The right panel of graphs shows the same results depicted as cumulative relative frequency distributions. The x-axis (number of monoclonal B-cells detected) is centered around 50, the criterion used in the study on which the simulation was based (5).

The cumulative relative frequency curves reveal a surprising relationship between the total number of B-cells counted and the estimated probability of finding MBL. For instance, with a population containing 1% clonal B-cells, the chance of detecting 50 clonal cells is less than 10% if 4,000 B-cells are counted, but the chance increases to over 80% if just 1,700 additional B-cells are counted, and it approaches 100% if the count is doubled to 8,000 (Fig. 2). This nonintuitive result occurs when the means of the two frequency distributions lie on different sides of the clonal B-cell count criterion. From an epidemiologic standpoint, the nonlinear impact of sampling variability near a cutoff point complicates the relationship between the true population prevalence of MBL and the predictive value of laboratory methods used to detect it.
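The steepness near the cutoff can be checked without simulation by computing the binomial tail probability directly. The Python sketch below is illustrative rather than the study's method. It assumes independent sampling at 1% clonal prevalence and evaluates the probability mass in log space to avoid overflow. The function names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of exactly k clonal cells among n acquired."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def prob_at_least(cutoff, n, p):
    """Probability of acquiring at least `cutoff` clonal cells."""
    return 1.0 - sum(binom_pmf(k, n, p) for k in range(cutoff))


for n in (4_000, 5_700, 8_000):
    print(n, round(prob_at_least(50, n, 0.01), 4))
```

With 4,000 cells the expected clonal count (40) lies below the cutoff of 50, and with 5,700 cells the expected count (57) lies above it. The probability of meeting the criterion therefore moves from the upper tail of one distribution to the bulk of the other.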

Figure 2.

Proportion of MBL samples in the simple simulation that met the criteria of ≥50 monoclonal cells detected as a function of the number of B-cells acquired.

For the complex scenario of detecting 50 or more monoclonal B-cells among those that constitute a normal polyclonal B-cell population, 100 simulated B-cell populations were generated using the parameters in Table 2 for each of the seven increasingly dispersed distributions (sigma 1.0–1.3). The B-cell populations in the seven distributions showed comparable values for the total number of clonal families and for the mean number of B-cells per clone (Table 3). However, the maximum number of B-cells per clone increased substantially as a function of sigma, reaching more than 10-fold higher at sigma 1.3 than at sigma 1.0. When these B-cell populations were analyzed by simulated flow cytometry in the same manner as those of the simpler scenario, the number of samples containing clonal B-cell counts equal to or greater than 50 increased markedly with increasing values of sigma (Fig. 3). At sigma = 1.15, 14% of the samples showed at least one clonal family with 50 or more cells. Because this sigma value best reflected the observations from which the parameters were chosen (5), we explored a larger set of simulated populations (600) in this range of sigma values. At sigma 1.16, 14.7% of these contained at least one clonal family with a count of 50 or more B-cells (Fig. 4), the closest agreement with reported values. The details of B-cell clonality in these simulated populations are compared with those observed in the general population in Table 4. The most highly populated clonal families in the simulation accounted for a much smaller proportion of total B-cells than reported (median 0.04% vs. 0.38%), and among these, the number of biclonal observations was also smaller (4.5% vs. 19%). This result suggests that the log-normal distribution approximates that of the overall B-cell compartment but underestimates its right-hand skewness. This effect could be explained if the majority of clonal families in the circulating B-cell compartment were populated in a log-normal fashion while the most highly populated families were undergoing active clonal expansion due to antigenic stimulation.
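A small piece of arithmetic helps in reading the clonal proportions. A family that meets the criterion of 50 cells among n acquired B-cells must make up at least 50/n of the sample. The Python sketch below is only illustrative, and the function name is hypothetical.

```python
def min_detectable_percent(n_acquired, cutoff=50):
    """Smallest observed clonal proportion (%) that can meet the cutoff."""
    return 100 * cutoff / n_acquired


for n in (8_000, 32_000, 128_000):
    print(n, min_detectable_percent(n))
```

At 128,000 acquired cells this floor is about 0.039%, which is of the same order as the simulated percentiles in Table 4 (0.04%–0.05%). Whether Table 4 was generated at that acquisition size is an assumption here, not something the text states.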

Figure 3.

Percentage of samples in the complex simulation that contained at least one clonal family with ≥50 monoclonal cells detected as a function of the sigma value for the log-normal distribution.

Figure 4.

Cumulative relative frequency plots showing the number of clonal B-cells detected per family among the four most highly populated monoclonal families in each of the 600 simulated samples using a sigma value of 1.16. The most highly populated family (red) contains ≥50 cells in 14.7% of the simulated samples. The next two most highly populated families (blue and violet) contain ≥50 cells in 0.7% and 0.2% of the simulated samples. None of the simulated samples contain ≥50 cells in their fourth most highly populated family (green).

Table 3. Characteristics of B-Cell Populations in the Complex MBL Simulation
Sigma    Number of clonal families    Number of B-cells per clonal family

Table 4. Clonal B-Cell Detection in the Complex MBL Simulation
                                                     Simulation    Observed (5)
Samples with detectable clones (% of all samples)    14.80         14.30
Biclonal (%)                                         4.50          19.20
Proportion of clonal B-cells (% of all B-cells)
  25th percentile                                    0.04          0.14
  50th percentile                                    0.04          0.38
  75th percentile                                    0.05          4.20

In this complex simulation, we assumed that flow cytometric analysis of individual cells could distinguish every clonotype in the circulating B-cell compartment. Although this is true in theory (e.g., through single-cell in situ sequencing of IGVH regions), such an analysis is clearly not practical and perhaps not even possible. However, the increasing number of multiplexed parameters that can be measured by flow cytometry does provide ever-greater resolution of B-cell properties. Quantitative measures of expression add yet another dimension to help discriminate clonal families of B-cells (8). A more robust comparison between results from this type of simulation and actual laboratory observations may be possible in the future.


Flow cytometry can be readily simulated to investigate the detection of rare or uncommon events. Results from a more complex simulation based on the theoretical assumption that all unique clonotypes can be enumerated show that the observed population prevalence of MBL can be approximated with reasonable assumptions about the distribution of clonotypes in the circulating B-cell compartment. Further refinement of such simulations that better reflect findings from population-based studies should increase our understanding of B-cell homeostasis and pathology.

Results from the simple simulation are relevant to both epidemiologic research and clinical practice. These results show that dispersion near the cutoff point due only to stochastic sampling error can have a profound influence on dichotomous classification of MBL. This influence was not related to population prevalence because all the simulated samples contained at least 0.1% monoclonal B-cells. Further, the simulations did not take into account any source of laboratory bias or imprecision, and we assumed that MBL could be identified without error. Epidemiologic studies of MBL prevalence must take into account the number of B-cells acquired and the criteria for designating MBL (i.e., the number of monoclonal B-cells detected). In the clinical setting, these same concerns could have an important impact on identifying individuals who should be closely followed for progression to a B-cell malignancy.