We analysed 43 studies examining the prognostic value of the early response to initial treatment for childhood ALL, assessed in terms of peripheral blast persistence, bone marrow blast persistence or MRD. None of these studies fulfilled the quality criteria stipulated by Simon and Altman (1994) for the acceptability of prognostic markers in clinical practice.
The reproducibility of the different assays was not demonstrated in 42 of the 43 studies, and was only partially reported in one (Lilleyman et al, 1997).
Assessment of peripheral blast persistence depends on the reproducibility of the peripheral blast count. Total circulating white cell counts are now carried out automatically. Sample dilution, storage temperature, storage duration and the nature of the anticoagulant can all lead to variations (Henderson & Wood, 1986; Nelson et al, 1989). Two methods can be used to evaluate the percentage of blasts among leucocytes: the visual method, which is subjective and has poor reproducibility (Rumbke, 1985), and automated methods, the results of which depend on identifying blast cells among young forms (D'Onofrio et al, 1987; Kawarabayashi et al, 1987).
The evaluation of bone marrow blast persistence requires a bone marrow sample. Cytological diagnosis of leukaemia is perfectly standardized and qualitatively reproducible, but evaluation of the proportion of abnormal cells is less reliable. It depends on the amount of marrow sampled (Batinic et al, 1990), the quality of the smear, the heterogeneity of the disease from one bone marrow site to another (Bernard & Mathe, 1951; Hann et al, 1977; Jacobs, 1977) and the examiner. This is reminiscent of the practical problems encountered during marrow sampling for therapeutic purposes. The volume of marrow taken on each occasion and the puncture site influence the cellularity of the sample (Bacigalupo et al, 1992). Finally, it should be noted that centralized marrow smear review does not eliminate all sources of variability.
All studies of residual disease by means of immunophenotyping or molecular biology, with the exception of one (Brisco et al, 1997), were based on marrow samples. Only one study, based on an animal model (leukaemic rats), explored the reproducibility of an MRD assay based on molecular biology (Maartens et al, 1987), and it showed major variability among and within animals. Technical considerations such as the type and number of probes and the number of PCR cycles may also contribute to the lack of reproducibility (Freeman et al, 1999).
Another limitation of MRD studies based on gene rearrangements is the lack of stability of the rearrangement itself, which may change between diagnosis and relapse in 25% to 65% of cases (Beishuizen et al, 1994; Baruchel et al, 1995).
In most of the studies, which were conducted during prospective therapeutic trials, the laboratory assays appear to have been performed blind to the clinical data and outcome. In the retrospective studies, this information was lacking and could not be inferred from the papers (Brisco et al, 1994; Farahat et al, 1998; Goulden et al, 1998).
Many studies were restricted to subgroups of patients defined by the use of several markers (white blood cell count, cytogenetic abnormalities, immunophenotype, etc.), making it difficult to extrapolate their results. Most teams reported few missing data on peripheral blast persistence. As regards persistence of bone marrow blasts and the three techniques used to study residual disease, the percentage of missing data was almost always > 25% and sometimes > 50%, that is to say, far above the 15% maximum recommended by Simon & Altman (1994).
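The missing-data criterion can be illustrated with a minimal sketch, assuming the percentage of missing data is computed as the share of eligible patients without an assay result. The function names and the patient counts in the usage note are hypothetical and are not taken from any of the reviewed studies; only the 15% limit comes from the text.

```python
# Illustrative sketch only: helper names are hypothetical.
# The 15% limit is the one attributed to Simon & Altman (1994) in the text.
MAX_MISSING_FRACTION = 0.15


def missing_fraction(n_eligible: int, n_with_result: int) -> float:
    """Return the fraction of eligible patients with no assay result."""
    if n_eligible <= 0:
        raise ValueError("n_eligible must be positive")
    if not 0 <= n_with_result <= n_eligible:
        raise ValueError("n_with_result must lie between 0 and n_eligible")
    return (n_eligible - n_with_result) / n_eligible


def exceeds_missing_limit(
    n_eligible: int,
    n_with_result: int,
    limit: float = MAX_MISSING_FRACTION,
) -> bool:
    """True when the missing-data fraction is strictly above the limit."""
    return missing_fraction(n_eligible, n_with_result) > limit
```

For example, a hypothetical study with 200 eligible patients and 140 assay results would have 30% missing data and would fail the criterion, whereas one with 180 results (10% missing) would meet it.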
Choice of cut-off points and timing of the measurement
The most frequent definition of persistence of peripheral blasts (i.e. > 1000 blasts on d 8 of treatment) was derived from a pilot study (Riehm et al, 1986). This choice was subsequently endorsed in other studies and can be considered as a standard. As regards persistence of bone marrow blasts, both the cut-off points and the assay dates varied. MRD assessments based on immunophenotyping and molecular methods used cut-off points ranging from 10⁻² to 10⁻⁶ and assay dates ranging from 4 to 7 weeks after the start of treatment.
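The effect of the cut-off on classification can be shown with a short sketch. It assumes that the blast count is expressed per microlitre and that an MRD level equal to the cut-off counts as positive; neither detail is stated in the text, and the function names and example values are hypothetical.

```python
# Illustrative sketch only: names, units and boundary handling are assumptions.
DAY8_BLAST_CUTOFF = 1000  # "> 1000 blasts on d 8"; unit assumed to be per microlitre


def has_peripheral_blast_persistence(
    day8_blast_count: float,
    cutoff: float = DAY8_BLAST_CUTOFF,
) -> bool:
    """True when the day-8 peripheral blast count is strictly above the cut-off."""
    return day8_blast_count > cutoff


def is_mrd_positive(mrd_level: float, cutoff: float) -> bool:
    """True when the MRD level (fraction of leukaemic cells) reaches the cut-off.

    Treating a level equal to the cut-off as positive is an assumption.
    """
    return mrd_level >= cutoff


# The same hypothetical MRD level is classified differently depending on
# which cut-off within the reported 1e-2 to 1e-6 range a study chose.
example_mrd_level = 5e-4
classification_by_cutoff = {
    cutoff: is_mrd_positive(example_mrd_level, cutoff)
    for cutoff in (1e-2, 1e-3, 1e-4, 1e-5, 1e-6)
}
```

With these assumptions, a patient with an MRD level of 5 × 10⁻⁴ is classed as negative by studies using a cut-off of 10⁻² or 10⁻³ and as positive by studies using 10⁻⁴ or lower, which illustrates why results obtained with different cut-off points are hard to compare.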