Impact of two independent bone marrow samples on minimal residual disease monitoring in childhood acute lymphoblastic leukaemia

Authors


Vincent H. J. van der Velden, Department of Immunology, Erasmus MC, Dr Molewaterplein 50, 3015 GE Rotterdam, the Netherlands.
E-mail: v.h.j.vandervelden@erasmusmc.nl

Summary

Minimal residual disease (MRD) diagnostics are used for risk group stratification in several acute lymphoblastic leukaemia (ALL) treatment protocols. It is, however, unclear whether MRD is homogeneously distributed within the bone marrow (BM) and whether this affects MRD diagnostics. We, therefore, analysed MRD levels in 141 paired BM samples (two independent punctures at different locations) from 26 ALL patients by real-time quantitative polymerase chain reaction (PCR) analysis of immunoglobulin and T-cell receptor gene rearrangements. MRD levels were comparable in 112 paired samples (79%), whereas two paired samples (both taken at day 15) had MRD levels that differed more than threefold. In the remaining 27 paired samples, MRD could be quantified or detected in one sample only. In four patients, MRD-based risk group classification was dependent on the site of BM puncture. Repetition of MRD analyses using 10-fold replicates instead of triplicates resolved most differences. In conclusion, MRD levels in paired BM samples were highly comparable, indicating that it is sufficient to analyse MRD in a single sample only. Nevertheless, MRD-based risk group classification can differ between paired BM samples, mainly because of variation below the quantitative range of the PCR assay rather than a different distribution of leukaemic cells within the BM.

Several studies have shown that detection of minimal residual disease (MRD) in childhood acute lymphoblastic leukaemia (ALL) is clinically relevant in de novo ALL, relapsed ALL and in ALL patients undergoing stem cell transplantation (Cave et al, 1998; Van Dongen et al, 1998; Coustan-Smith et al, 2000; Panzer-Grumayer et al, 2000; Goulden et al, 2003). Based on these results, MRD diagnostics is now implemented in many front-line ALL treatment protocols, in which patients are stratified according to MRD levels in bone marrow (BM) samples obtained during and after induction therapy (Schrappe, 2002). In most of these MRD-based stratification studies, MRD diagnostics is performed by real-time quantitative polymerase chain reaction (RQ-PCR) analysis of immunoglobulin (Ig) and T-cell receptor (TCR) gene rearrangements (Szczepanski et al, 2002; Van der Velden et al, 2003).

It is often assumed that leukaemic cells are randomly distributed throughout the BM compartment. However, animal models indicate that leukaemic cells may show a non-homogeneous distribution in the BM, especially during and after therapy (Martens et al, 1987). Whether such non-homogeneous distribution is also present in patients with ALL is not known. Consequently, it is unclear whether such non-homogeneous distribution would affect MRD diagnostics and MRD-based risk group stratification. Therefore, we analysed MRD levels in paired BM samples (two independent punctures at different locations) obtained from paediatric ALL patients during the first year of therapy.

Patients, materials and methods

Patients and sample processing

Twenty-six paediatric ALL patients, diagnosed at the Erasmus MC – Sophia between September 2001 and January 2003, were included in this study. The diagnosis of ALL was based on standard morphological, cytochemical, and immunological criteria. All patients (20 precursor-B-ALL and six T-ALL) were treated according to the Dutch Childhood Oncology Group (DCOG) ALL9 protocol (Jansen et al, 2005). After obtaining informed consent according to the guidelines of the Medical Ethical Committee (Erasmus MC, Rotterdam, The Netherlands), paired BM samples were obtained by performing two independent punctures from the left and right pelvic bone under general anaesthesia. The paired BM samples were obtained at pre-defined time points during the first year of the protocol, i.e. day 15 (n = 20), day 28 (n = 23), day 42 (n = 19), 3 months (n = 22), 6 months (n = 22), 8 months (n = 7), 9 months (n = 13), and/or 12 months (n = 15) after start of therapy. Samples were immediately transported to the laboratory, where mononuclear cells (MNC) were separated and DNA was isolated (Verhagen et al, 1999).

MRD analysis

Minimal residual disease levels were determined by real-time quantitative PCR (RQ-PCR) analysis of Ig and/or TCR gene rearrangements. Diagnostic samples were screened for the presence of IGH, IGK–Kde, TCRG, TCRD, Vδ2–Jα, and TCRB rearrangements using PCR heteroduplex analysis followed by sequencing of clonal rearrangements (Pongers-Willemse et al, 1999; Verhagen et al, 2000; Szczepanski et al, 2001, 2004; Van Dongen et al, 2003). Patient-specific primers were designed and used in combination with germline probes and primers as described previously (Verhagen et al, 2000; Van der Velden et al, 2002a,b; Bruggemann et al, 2004; Szczepanski et al, 2004). RQ-PCR analyses were generally performed in triplicate, and the data were analysed according to the guidelines of the European Study Group on MRD detection in ALL (V. H. J. van der Velden, unpublished observations; Van der Velden et al, 2003). Briefly, the quantitative range of the assay was defined as the lowest dilution of the standard curve (a) giving good amplification, (b) with ΔCt of replicates <1·5, (c) at least 3·0 Ct apart from the lowest Ct of normal MNC (analysed in sixfold), (d) with the mean Ct value between −3·0 and 4·0 apart from the previous 10-fold dilution, and (e) resulting in a standard curve with a slope between −3·1 and −3·9 and a correlation coefficient >0·98. The sensitivity was defined as the lowest dilution (a) giving good amplification, (b) at least 1·0 Ct apart from the lowest Ct of normal MNC, (c) with at least one well positive (ΔCt replicates not relevant), and (d) with the lowest Ct value within 20 cycles from the undiluted sample. MRD levels in follow-up samples were quantified if the mean Ct value of the replicates was within the quantitative range of the assay and if ΔCt of the replicates was <1·5. Follow-up samples with a Ct value of at least one of the replicates within 4·0 Ct from the sensitivity and with at least one Ct value ≥1·0 lower than the lowest Ct of normal MNC were considered as positive, with MRD levels below the quantitative range of the assay but without the possibility for accurate quantification. These samples were referred to as ‘positive, below quantitative range’. All other samples were considered as MRD negative.
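The three-way call for a follow-up sample described above can be sketched in a few lines of Python. This is an illustrative reading of the criteria, not the ESG-MRD-ALL implementation: the class and function names are hypothetical, "within the quantitative range" is taken to mean a mean Ct no higher than the Ct of the lowest quantifiable dilution, "within 4·0 Ct from the sensitivity" is taken to mean no more than 4·0 Ct above the Ct of the sensitivity dilution, and a sample is treated as quantifiable only if every replicate well amplified.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class AssayLimits:
    """Per-target Ct limits taken from the standard curve (names are hypothetical)."""
    ct_quantitative_range: float  # mean Ct of the lowest dilution inside the quantitative range
    ct_sensitivity: float         # lowest Ct of the dilution that defines the sensitivity
    ct_lowest_normal_mnc: float   # lowest Ct of normal MNC; use float("inf") if none amplified


def call_follow_up(cts: Sequence[Optional[float]], limits: AssayLimits) -> str:
    """Classify one follow-up sample from its replicate Ct values.

    `cts` holds one entry per well; None marks a well without amplification.
    """
    amplified = [ct for ct in cts if ct is not None]
    if not amplified:
        return "negative"

    # Quantifiable: mean Ct inside the quantitative range and replicate spread < 1.5 Ct.
    # Requiring all wells to amplify is an assumption of this sketch.
    if (len(amplified) == len(cts)
            and mean(amplified) <= limits.ct_quantitative_range
            and max(amplified) - min(amplified) < 1.5):
        return "quantifiable"

    # Positive but not quantifiable: at least one well close to the sensitivity
    # and at least one well >= 1.0 Ct below the background of normal MNC.
    near_sensitivity = any(ct <= limits.ct_sensitivity + 4.0 for ct in amplified)
    clear_of_background = any(ct <= limits.ct_lowest_normal_mnc - 1.0 for ct in amplified)
    if near_sensitivity and clear_of_background:
        return "positive, below quantitative range"
    return "negative"
```

With hypothetical limits such as `AssayLimits(33.0, 36.5, 40.0)`, a triplicate of `[31.0, 31.4, 31.2]` is quantifiable, whereas `[36.0, None, None]` is scored ‘positive, below quantitative range’ on the strength of a single well, which is exactly the category in which most discordant pairs in this study fell.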

A quantitative range of at least 10⁻⁴ for at least one MRD–PCR target was obtained in 19 of 26 patients; in 25 patients at least one MRD–PCR target had a quantitative range of 5 × 10⁻⁴. In the remaining patient (DCOG 8202), MRD levels were analysed by only one target with a quantitative range of 10⁻³; no further optimisation was performed since all follow-up samples showed high levels of MRD-positivity in this patient. Twenty-three patients were monitored by two MRD–PCR targets; in these cases the highest MRD level of the two MRD–PCR targets was used for analysis.

Risk group classification

MRD-based risk group classification followed the risk groups defined by the International Berlin–Freiburg–Münster Study Group (I-BFM-SG) (Van Dongen et al, 1998; Willemse et al, 2002). Patients were considered high-risk if the MRD level at both day 42 and 3 months was ≥5 × 10⁻⁴. Patients were considered low-risk if the BM samples at both day 42 and 3 months were MRD negative (at least one target with a quantitative range ≤10⁻⁴). The remaining patients, in whom high-risk and low-risk classification could be excluded, were considered medium-risk. It should be noted that this MRD-based classification was not used for treatment risk group assignment in the DCOG-ALL9 protocol but was only used for evaluation of MRD diagnostics.
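As a minimal sketch, the classification rule can be written as a small function. The encoding of an MRD result (a float for a quantified level, the string ‘positive, below quantitative range’, or None for MRD negative) and all names are assumptions made for illustration; the sketch also assumes that the requirement of a target with a quantitative range ≤10⁻⁴ has been checked beforehand.

```python
from typing import Union

HIGH_RISK_CUTOFF = 5e-4  # MRD level that must be reached at both time points for high-risk
BELOW_QR = "positive, below quantitative range"

# A result is a quantified level (float), BELOW_QR, or None for MRD negative.
MrdResult = Union[float, str, None]


def is_high(result: MrdResult) -> bool:
    """True only for a quantified level at or above the high-risk cut-off."""
    return isinstance(result, float) and result >= HIGH_RISK_CUTOFF


def classify_risk(day42: MrdResult, month3: MrdResult) -> str:
    """Assign the MRD-based risk group from the day 42 and 3 month results."""
    if is_high(day42) and is_high(month3):
        return "high-risk"
    if day42 is None and month3 is None:
        return "low-risk"
    # Everything else, including 'positive, below quantitative range'.
    return "medium-risk"
```

Two kinds of borderline input are sensitive to small measurement differences. With a hypothetical day 42 level of 1e-3, a 3 month level of 6e-4 gives `high-risk` while 4e-4 gives `medium-risk`; and `classify_risk(None, None)` is `low-risk` while `classify_risk(BELOW_QR, None)` is `medium-risk`.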

Results

MRD levels in paired BM samples

Analysis of 141 paired BM samples showed that quantitative MRD levels were detected in 50 pairs (Fig 1A). Generally, MRD levels in these 50 paired BM samples were very comparable and differed less than threefold (Pearson's correlation coefficient: 0·94). Only two pairs (arrows in Fig 1A), both obtained at day 15, had MRD levels that differed more than threefold (DCOG 7384, VH1.2–JH5: 5 × 10⁻² vs. 6 × 10⁻³; Vδ2–Dδ3: 2 × 10⁻² vs. 4 × 10⁻³; DCOG 8032, VH3.30–JH6: 6 × 10⁻³ vs. 1 × 10⁻³; no second target). In 18 paired BM samples, MRD levels were detectable but below the quantitative range of the assay in both samples, and 46 pairs were MRD negative.
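The threefold criterion can be illustrated with the three target-level comparisons reported for the two outlying pairs. The helper below is a hypothetical sketch; it simply divides the higher level by the lower one and does not depend on which puncture site produced which value.

```python
def fold_difference(level_a: float, level_b: float) -> float:
    """Ratio of the higher to the lower of two quantified MRD levels."""
    if level_a <= 0 or level_b <= 0:
        raise ValueError("fold difference is only defined for quantified (positive) levels")
    return max(level_a, level_b) / min(level_a, level_b)


# MRD levels quoted in the text for the two pairs that differed more than threefold.
outlying_pairs = {
    "DCOG 7384, VH1.2-JH5": (5e-2, 6e-3),
    "DCOG 7384, Vd2-Dd3": (2e-2, 4e-3),
    "DCOG 8032, VH3.30-JH6": (6e-3, 1e-3),
}

for label, (first, second) in outlying_pairs.items():
    fold = fold_difference(first, second)
    print(f"{label}: {fold:.1f}-fold, within threefold: {fold < 3.0}")
```

All three comparisons come out between fivefold and just over eightfold, well outside the threefold window that the other 48 quantifiable pairs satisfied.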

Figure 1.

Minimal residual disease (MRD) levels in paired bone marrow (BM) samples. (A) The MRD level of the first BM sample, indicated on the x-axis, generally showed a good correlation with the MRD level in the second BM sample (y-axis). In two paired samples, quantitative MRD levels differed more than threefold (arrows); in seven paired samples MRD levels could only be quantified in one sample, the other sample was positive but with MRD levels outside the quantitative range of the assay (boxes A); and in 20 samples low MRD levels (outside the quantitative range) could only be detected in one sample (boxes B). (B) MRD levels in paired BM samples taken during and immediately after induction therapy (day 15, day 28, day 42 and 3 months). (C) MRD levels in paired BM samples taken at later time points (6, 8, 9 and 12 months). Open squares: MRD data from patients in complete remission; closed circles: MRD data from patients who ultimately relapsed.

In seven paired BM samples (boxes A in Fig 1A), MRD levels were around the quantitative range of the assay (5 × 10⁻⁴/10⁻⁴) and could be quantified in one sample only, while the other sample was assigned ‘positive, <(5 ×) 10⁻⁴’. Repetition of the MRD analysis in 10-fold replicates instead of triplicates showed that the actual MRD levels were similar in five out of the seven paired BM samples (Table I) and no significant differences in Ct values between the paired samples were observed. In the sixth pair (DCOG 7288, day 15), a maximal MRD level of 1 × 10⁻⁴ was detected in one BM sample, whereas the maximal MRD level in the other sample was ‘positive, <10⁻⁴’. For both targets (quantitative range 10⁻⁴), there was a significant difference in Ct values between the two paired BM samples (VH1.3–JH3: mean Ct values: 35·6 vs. 34·5; VH5–JH6: 36·4 vs. 35·2; P < 0·05), suggesting a twofold (2^ΔCt) difference in MRD levels. Moreover, in the seventh pair (DCOG 7400, day 42), a maximal MRD level of 2 × 10⁻⁴ was detected in one BM sample, whereas the maximal MRD level in the other sample was ‘positive, <10⁻⁴’. For the VH3.53–JH6 target (quantitative range 10⁻⁴), there was a significant difference in Ct values between the two paired BM samples (mean Ct values: 37·2 vs. 34·3; P < 0·05), suggesting an eightfold (2^ΔCt) difference in MRD levels. The second target (Vκ2·30–Kde) had a quantitative range of 5 × 10⁻⁴; consequently in both samples MRD could not be quantified.
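The fold differences quoted above follow from the Ct differences under the usual assumption that the amount of product doubles in every PCR cycle (100% amplification efficiency), so that a difference of ΔCt cycles corresponds to a 2^ΔCt-fold difference in target. A short sketch, with a hypothetical function name:

```python
def fold_from_ct(ct_a: float, ct_b: float) -> float:
    """Fold difference in target amount implied by two mean Ct values,
    assuming perfect doubling per cycle (an idealisation)."""
    return 2.0 ** abs(ct_a - ct_b)


# Mean Ct values reported for the two pairs that remained discordant.
print(round(fold_from_ct(35.6, 34.5), 1))  # DCOG 7288, VH1.3-JH3: about 2.1
print(round(fold_from_ct(36.4, 35.2), 1))  # DCOG 7288, VH5-JH6: about 2.3
print(round(fold_from_ct(37.2, 34.3), 1))  # DCOG 7400, VH3.53-JH6: about 7.5
```

The first two values round to the twofold difference and the third to the roughly eightfold difference given in the text. With a real standard curve whose slope deviates from the ideal, the base of the exponent would be somewhat lower than 2.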

Table I.  Minimal residual disease (MRD) analysis in seven discordant samples (number versus positive, but no quantitation possible)ᵃ.

| Patient | Time pointᵇ | Target 1 | Quantitative range target 1 (initial) | Quantitative range target 1 (repeat) | Target 2 | Quantitative range target 2 (initial) | Quantitative range target 2 (repeat) | Maximal MRD level BM1 (initial) | Maximal MRD level BM2 (initial) | Maximal MRD level BM1 (repeat) | Maximal MRD level BM2 (repeat) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 7288 | d15 | VH1.3–JH3b | 10⁻⁴ | 10⁻⁴ | VH5–JH6b | 5 × 10⁻⁴ | 10⁻⁴ | Positive | 2 × 10⁻⁴ | Positive | 1 × 10⁻⁴ |
| 7400 | d42 | VH3.53–DH3.9–JH6b | 10⁻⁴ | 10⁻⁴ | VK2.30–Kde | 5 × 10⁻⁴ | 5 × 10⁻⁴ | Positive | 2 × 10⁻⁴ | Positive | 2 × 10⁻⁴ |
| 8011 | d28 | Vδ2–Dδ3 | 5 × 10⁻⁴ | 10⁻⁴ | Vγ5–Jγ2·3 | 10⁻³ | 5 × 10⁻⁴ | Positive | 5 × 10⁻⁴ | 2 × 10⁻⁴ | 2 × 10⁻⁴ |
| 8108 | d15 | Vβ5.4–Jβ2.1 | 10⁻⁴ | 5 × 10⁻⁴ | VH1.69–JH4b | 5 × 10⁻⁴ | 5 × 10⁻⁴ | Positive | 3 × 10⁻⁴ | Positive | Positive |
| 8119 | 6 mo | SIL–TAL1 | 10⁻⁴ | 5 × 10⁻⁴ | Vβ4.3–Jβ2.1 | 10⁻³ | 5 × 10⁻⁴ | 1 × 10⁻⁴ | Positive | Positive | Positive |
| 8119 | 8 mo | SIL–TAL1 | 10⁻⁴ | 5 × 10⁻⁴ | Vβ4.3–Jβ2.1 | 10⁻³ | 5 × 10⁻⁴ | 2 × 10⁻⁴ | Positive | Positive | Positive |
| 8119 | 12 mo | SIL–TAL1 | 10⁻⁴ | 5 × 10⁻⁴ | Vβ4.3–Jβ2.1 | 10⁻³ | 5 × 10⁻⁴ | Positive | 1 × 10⁻⁴ | Positive | Positive |

d, day; mo, months.
ᵃ Seven samples showing discordant MRD results in the initial analysis (BM1-initial versus BM2-initial; using triplicates) were re-analysed, using 10 wells for each target (BM1-repeat versus BM2-repeat).
ᵇ Time point after diagnosis.

Finally, in 20 paired samples (boxes B in Fig 1A), low MRD levels outside the quantitative range of the assay could be detected in only one of the two BM samples. To discriminate between statistical variation and biological variation, the initial experiment (performed in triplicate) was repeated in 10-fold replicates (if sufficient DNA was still available). This resulted in concordant MRD results in 15 paired BM samples and discordant MRD results in five paired samples (Table II). In three pairs (DCOG 7305, 3 months; DCOG 7372, 9 months; DCOG 8096, day 42), the same discordant result was observed, whereas in two pairs (DCOG 7305, 6 months; DCOG 8108, 12 months) the initial MRD negative sample became MRD positive and the initial MRD positive sample became MRD negative, respectively (Table II). In four of the five paired BM samples with discordant results, MRD could be detected using one target in a single well only; all the remaining wells were negative (Table II). Only in DCOG 8096 (day 42) did one BM sample show MRD positivity in three of 15 analysed wells, whereas the other BM sample (six wells analysed) was MRD negative.

Table II.  Minimal residual disease (MRD) analysis in 20 discordant samples (positive versus negative)ᵃ.

| Patient | Time pointᶜ | Target 1 | Target 2 | Positive (+) results, initial (triplicates) and repeat (10-fold replicates), per target for BM1 and BM2 (column positions not preserved) | Positive wellsᵇ BM1 | Positive wellsᵇ BM2 |
| --- | --- | --- | --- | --- | --- | --- |
| 7305 | d28 | Vδ2–Jα49 | VH3.21–JH6 | ++++ | 4/20 | 2/20 |
| 7305 | 3 mo | Vδ2–Jα49 | VH3.21–JH6 | ++ | 1/20 | 0/10 |
| 7305 | 6 mo | Vδ2–Jα49 | VH3.21–JH6 | ++ | 1/19 | 0/19 |
| 7351 | 3 mo | Vδ2–Dδ3 | VH3.48–JH5 | +++ | 1/20 | 1/18 |
| 7351 | 6 mo | Vδ2–Dδ3 | VH3.48–JH5 | + | 0/20 | 0/20 |
| 7351 | 9 mo | Vδ2–Dδ3 | VH3.48–JH5 | + | 0/20 | 0/20 |
| 7372 | d28 | Vδ2–Jα61 | VH2.26–JH6 | +++ | 1/20 | 2/20 |
| 7372 | 9 mo | Vδ2–Jα61 | VH2.26–JH6 | ++ | 1/20 | 0/20 |
| 7384 | 3 mo | VH1.2–JH5 | Vδ2–Dδ3 | + | 0/20 | 0/20 |
| 7400 | 6 mo | VH3.53–JH6 | Vκ2.30–Kde | + | 0/20 | 0/20 |
| 7424 | 9 mo | Dδ2–Dδ3 | | + | 0/10 | 0/10 |
| 8032 | d42 | VH3.30–JH6 | | + | 0/10 | 0/10 |
| 8108 | 3 mo | VH1.69–JH4 | Vβ5.4–Jβ2.1 | + | 0/20 | 0/20 |
| 8108 | 12 mo | VH1.69–JH4 | Vβ5.4–Jβ2.1 | ++ | 0/20 | 1/20 |
| 8121 | d42 | VH3.22–JH5 | Vδ2–Jα9 | +++++ | 7/20 | 6/20 |
| 8121 | 9 mo | Vδ2–Jα9 | VH3.22–JH5 | ++++ | 1/20 | 2/20 |
| 8096 | d42 | Dβ1–Jβ1.5 | Vδ1–Jδ1 | ++ | 0/6 | 3/15 |
| 8096 | 8 mo | Dβ1–Jβ1.5 | Vδ1–Jδ1 | +++ | 1/6 | 2/20 |
| 8097 | d28 | Vδ1–Jδ1 | Vδ3–Jδ1 | + | 0/14 | 0/6 |
| 8097 | 8 mo | Vδ3–Jδ1 | Vδ1–Jδ1 | + | 0/17 | 0/17 |

d, day; mo, months.
ᵃ Twenty samples showing discordant MRD results in the initial analysis (boxes B in Fig 1; using triplicates) were re-analysed, using 10 wells for each target (if sufficient material was available; otherwise fewer wells were tested).
ᵇ Number of positive wells/total number of wells tested (combined data for both targets).
ᶜ Time point after diagnosis.

Since the clinical use of MRD for risk group stratification is often based on MRD levels during and shortly after induction therapy, we also analysed the early and late time points separately (see Fig 1B and C). As expected, MRD levels were generally higher at the early time points. However, there were no apparent differences in the comparability of the MRD levels in the paired BM samples obtained during early or later phases of therapy.

MRD-based risk group classification

Classification according to MRD levels at day 42 and 3 months was possible in 22 patients; two patients did not have day 42 and/or 3 months follow-up samples and in two patients the quantitative range of the targets was not sufficient. In four of 22 patients, MRD-based risk group classification would have been different depending on the site of the BM puncture (Table III). These differences concerned two medium-risk versus low-risk (DCOG 7351 and DCOG 8121), one low-risk versus medium-risk (DCOG 8032) and one medium-risk versus high-risk classification (DCOG 8119).

Table III.  Minimal residual disease (MRD)-based risk group classification based on BM1 or BM2.

Based on data from the initial experiments (triplicates)

| BM1 \ BM2 | HR | MR | LR |
| --- | --- | --- | --- |
| HR | 5 | 0 | 0 |
| MR | 1 | 9 | 2 |
| LR | 0 | 1 | 4 |

Based on data including the repeated experiments (10-fold replicates)

| BM1 \ BM2 | HR | MR | LR |
| --- | --- | --- | --- |
| HR | 5 | 0 | 0 |
| MR | 1 | 11 | 0 |
| LR | 0 | 0 | 5 |

HR, high-risk; MR, medium-risk; LR, low-risk. Patients were classified according to the MRD-based risk group classification defined by the I-BFM-SG (see Patients, materials and methods), based on MRD levels in BM1 or BM2. The upper panel shows the MRD-based risk group distribution based on the initial experiments (performed in triplicate), whereas the lower panel shows the MRD-based risk group distribution based on the repeated experiments (performed in 10-fold replicates).

If the results of the repeated experiments (performed in 10-fold replicates instead of triplicates) were used for the MRD-based risk group classification, three initially discordant results (low-risk versus medium-risk) became concordant (two medium-risk, one low-risk). In patient DCOG 8119, MRD-based risk group classification still differed between high-risk and medium-risk (MRD level at 3 months: Vβ4.3–Jβ2.1: 4 × 10⁻⁴ vs. 4 × 10⁻⁴; SIL–TAL1: 6 × 10⁻⁴ vs. 4 × 10⁻⁴). Of note, this patient relapsed 14 months after diagnosis.
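The effect of the repeated analyses on agreement between the two puncture sites can be checked directly from the counts in Table III. The sketch below reads the table as two 3 × 3 cross-tabulations (rows BM1, columns BM2, in the order high-, medium-, low-risk) and takes empty cells as zero; the variable and function names are illustrative only.

```python
# Rows: BM1 classification; columns: BM2 classification; order HR, MR, LR.
initial_triplicates = [
    [5, 0, 0],
    [1, 9, 2],
    [0, 1, 4],
]
repeated_tenfold = [
    [5, 0, 0],
    [1, 11, 0],
    [0, 0, 5],
]


def agreement(table):
    """Return (concordant patients, total patients) for a square cross-tabulation."""
    total = sum(sum(row) for row in table)
    concordant = sum(table[i][i] for i in range(len(table)))
    return concordant, total


for label, table in (("initial", initial_triplicates), ("repeat", repeated_tenfold)):
    concordant, total = agreement(table)
    print(f"{label}: {concordant}/{total} concordant ({concordant / total:.0%})")
```

Read this way, 18 of 22 patients (82%) received the same risk group from both samples in the initial analysis and 21 of 22 (95%) after the repeated analyses, the single remaining discordance being DCOG 8119.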

Discussion

The present data demonstrate that MRD levels are highly comparable between BM samples obtained from different sites and that most differences occur with MRD levels at the lower border or below the quantitative range (reproducible part) of the RQ-PCR assay. These data indicate that for MRD diagnostics, it is sufficient to analyse a single BM sample only.

Only two pairs (DCOG 7384, DCOG 8032), both obtained at day 15, had MRD levels that differed more than threefold. In one paired sample (DCOG 8032), MRD could also be analysed by flow cytometry and, in agreement with the molecular MRD analysis, a clear difference in MRD levels was observed between the two BM samples: 0·34% versus 0·09% (using TdT-CD10-CD19-CD20 labelling). Moreover, in patient DCOG 7400 (day 42) and possibly DCOG 8096 (day 42), MRD levels differed more than threefold, although precise quantification of MRD levels was not possible in all samples. The difference in MRD level in these four paired samples is probably related to the hypocellularity of the BM during early induction therapy, resulting in a higher chance of obtaining BM samples ‘contaminated’ with peripheral blood (Van Wering et al, 2001).

The vast majority of discordant results was observed in paired samples with MRD levels around or below the quantitative range of the assay (generally 10⁻⁴). Since the amount of DNA analysed in each well was 600 ng (corresponding to approximately 10⁵ cells), the assay has a theoretical maximal sensitivity of 10⁻⁵, and samples with low MRD levels in this range may, purely by chance, be positive or negative. Indeed, repetition of the MRD analyses in the cases with discordant results using 10 wells instead of three resolved most of these differences, confirming that the RQ-PCR assay is not reproducible below its defined quantitative range. In ALL treatment protocols using MRD-based risk group stratification based on cut-off levels below the quantitative range of the RQ-PCR, MRD data may thus become more accurate if the MRD analysis is performed in 10-fold replicates instead of the now generally used triplicates.
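The sampling argument can be made concrete with a simple Poisson model. This is an illustrative calculation, not an analysis performed in the study: it assumes that each well contains DNA from about 100 000 cells, that leukaemic cells are distributed at random, that every target copy present in a well is detected, and that both samples of a pair have the same true MRD level. The function names are hypothetical.

```python
import math


def p_all_wells_negative(mrd_level: float, wells: int = 3,
                         cells_per_well: float = 1e5) -> float:
    """Probability that no well contains a leukaemic cell (Poisson sampling)."""
    expected_leukaemic_cells = mrd_level * cells_per_well * wells
    return math.exp(-expected_leukaemic_cells)


def p_discordant_pair(mrd_level: float, wells: int = 3,
                      cells_per_well: float = 1e5) -> float:
    """Probability that exactly one of two samples with the same true MRD
    level is scored negative purely through sampling."""
    p_negative = p_all_wells_negative(mrd_level, wells, cells_per_well)
    return 2.0 * p_negative * (1.0 - p_negative)


for wells in (3, 10):
    print(wells,
          f"{p_all_wells_negative(1e-5, wells):.5f}",
          f"{p_discordant_pair(1e-5, wells):.5f}")
```

Under these assumptions, a sample with a true MRD level of 10⁻⁵ is missed by a triplicate analysis in about 5% of cases, and a pair of such samples gives a positive-versus-negative result in about 9% of cases; with 10 wells both probabilities fall below 0.01%. In practice detection of a single copy is not guaranteed, so the real figures would be less favourable, but the direction of the effect matches the improvement seen with 10-fold replicates.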

This study only included samples obtained during the first year of therapy. In most ALL treatment protocols involving MRD diagnostics, patients are stratified based on MRD levels during or shortly after induction therapy, i.e. based on the initial kinetics of the tumour load reduction. Our data show that this tumour load reduction is comparable between two independently obtained BM samples. However, it may well be that, in patients who ultimately relapse, the re-growth of the leukaemic cells is more focal (Martens et al, 1987) and that a single BM sample is not sufficient to detect an upcoming relapse. Although five patients in our study relapsed, we did not obtain paired BM samples during the 6 months preceding the relapse and consequently could not evaluate the potential focal re-growth of leukaemic cells.

If patients were classified according to MRD levels at day 42 and 3 months, discordant MRD-based risk group classifications were obtained in four of 22 patients with paired BM samples available at day 42 and 3 months. One patient (DCOG 8119) would have been classified as high-risk or medium-risk depending on the site from which the BM sample was taken. In this patient, the MRD level at 3 months was just around the applied cut-off level of 5 × 10⁻⁴. Because of the inherent variation of the RQ-PCR assay, such levels may sometimes be scored just above or just below this cut-off value. In this particular case, it appeared that the uncorrected MRD levels as determined by the two targets were in fact very similar, but the difference in MRD levels was due to a slightly higher result of the albumin control gene RQ-PCR in one sample (550 ng/well vs. 660 ng/well). Repetition of the two samples in the albumin RQ-PCR produced very comparable levels (770 and 680 ng/well). Consequently, if the DNA amounts of the repeated albumin RQ-PCR had been used, corrected MRD levels would have been similar in the paired 3 months sample and the risk group stratification would have been medium-risk in both cases. However, it should be noted that this type of laboratory variation will always be present and is, therefore, inevitable. In the remaining three discordant cases [two medium-risk versus low-risk (DCOG 7351 and DCOG 8121) and one low-risk versus medium-risk (DCOG 8032)], the discrepancy was caused by one BM sample being assigned ‘positive, below the quantitative range’, the other being MRD-negative. As indicated above, such differences can be expected if a sample has a very low MRD level (around 10⁻⁵).

In clinical protocols where MRD-based low-risk patients are offered treatment reduction (such as the current DCOG ALL10 protocol), false-negative results should be avoided and guidelines for RQ-PCR data interpretation should be strict and thoroughly applied as in this study. However, in clinical treatment protocols where MRD-positive patients are offered treatment intensification, guidelines for RQ-PCR data interpretation should be less strict and should aim at the prevention of false-positive MRD results. Within the European Study Group on MRD detection in ALL (ESG–MRD–ALL), both types of guidelines have been developed (V. H. J. van der Velden, unpublished observations). Of note, if the less strict criteria for RQ-PCR data interpretation had been applied in this study, only one pair of BM samples (DCOG 7372, day 28) of the 20 discordant BM pairs (positive versus negative MRD result) would remain discordant.

In conclusion, MRD levels in paired BM samples obtained during the first year of therapy are very comparable, indicating that it is sufficient to analyse MRD in a single BM sample only. Nevertheless, MRD-based risk group stratification can differ between paired BM samples. In most cases this is not due to a different distribution of the leukaemic cells over the BM, but rather reflects variation in the RQ-PCR assay, particularly when MRD is detected below the quantitative range of the assay (generally 10⁻⁴).

Acknowledgements

We gratefully acknowledge Annemarie Wijkhuijs, Phary Hart, and Maaike de Bie for excellent technical assistance, Marieke Comans-Bitter for preparation of the figure, and Bibi van Bodegom for secretarial assistance. We thank the members of the Molecular Immunology Unit (Department of Immunology, Erasmus MC, The Netherlands) for helpful discussions. We gratefully acknowledge all the paediatric oncologists and Henk Westerhof and Rolina Stigter in the Erasmus MC – Sophia for obtaining patient samples. This work was supported by the Dutch Cancer Society (SNWLK2000-2268) and the Dutch insurance companies.

Disclosure

The authors declare no conflict of interest.
