We read with interest the paper by Russo et al (2005) reporting on their multicentre phase III trial of fludarabine, cytarabine and idarubicin (FLAI) versus idarubicin, cytarabine and etoposide (ICE) for induction treatment of younger newly diagnosed acute myeloid leukaemia patients. The authors described two groups that were relatively well matched for demographics and disease-related features and randomised to receive either FLAI or ICE as first remission induction.
After a single induction course, they reported that a complete response (CR) was attained in 42 of the 57 patients (74%) randomised to receive FLAI and in 28 of the 55 patients (51%) randomised to receive ICE. CR rates rose to 46/57 (81%) and 38/55 (69%) after patients in both groups had received a second induction course with high-dose cytarabine. Excluding five early relapses in the FLAI arm and seven in the ICE arm, the authors went on to identify the patients in each group eligible for intensification of treatment with subsequent autologous or allogeneic stem cell transplantation (alive and in continuing CR) as 41/46 (89%) of the FLAI group and 31/55 (56%) of the ICE group. This difference in the percentage of patients who reached the stage of eligibility for intensification treatment was highlighted in both Tables II and IV of the text and a P-value (Chi-squared) of 0·0003 was quoted for this difference (Russo et al, 2005).
However, different denominators were used to derive these percentages for each group. For the FLAI group, the denominator used was the number of patients who achieved CR after two induction courses. In contrast, for the ICE group, the denominator used was the total number of patients initially randomised to receive ICE.
There are two denominators that could be used in this situation – the total number of patients randomised to receive each first induction regime, or alternatively, the number of patients in each group alive and in CR after two induction courses.
If either of these denominators were used for both groups then there would be no statistically significant difference between the groups (Table I).
Table I. Number of patients eligible to proceed to intensification therapy, by group denominator and first induction regime.

|Group denominator|FLAI|ICE|P-value|
|---|---|---|---|
|Total number of patients randomised to induction regime|41/57 (72%)|31/55 (56%)|0·09|
|Patients alive and in CR after 2 induction courses|41/46 (89%)|31/38 (82%)|0·48|
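The effect of the denominator choice can be checked with a short calculation. The sketch below is illustrative only: it assumes an uncorrected Pearson chi-squared test with one degree of freedom, which neither the original paper nor this letter specifies, so the P-values it prints may differ somewhat from those quoted above. The function name `pearson_chi2_p` and the variable names are hypothetical; the counts are those given in the text.

```python
from math import erfc, sqrt


def pearson_chi2_p(a, n1, b, n2):
    """P-value for comparing the proportions a/n1 and b/n2.

    Assumption: uncorrected Pearson chi-squared test on a 2x2 table (1 df).
    """
    c, d = n1 - a, n2 - b  # patients NOT eligible in each group
    n = n1 + n2
    chi2 = n * (a * d - b * c) ** 2 / (n1 * n2 * (a + b) * (c + d))
    # For 1 degree of freedom the chi-squared tail area equals erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


# Counts taken from the letter.
ELIGIBLE = {"FLAI": 41, "ICE": 31}     # alive and in continuing CR
RANDOMISED = {"FLAI": 57, "ICE": 55}   # randomised to each regime
IN_CR = {"FLAI": 46, "ICE": 38}        # in CR after two induction courses

# Each entry gives the (FLAI denominator, ICE denominator) pair.
comparisons = {
    "mixed, as published": (IN_CR["FLAI"], RANDOMISED["ICE"]),
    "randomised": (RANDOMISED["FLAI"], RANDOMISED["ICE"]),
    "in CR after two courses": (IN_CR["FLAI"], IN_CR["ICE"]),
}

results = {}
for label, (n_flai, n_ice) in comparisons.items():
    p = pearson_chi2_p(ELIGIBLE["FLAI"], n_flai, ELIGIBLE["ICE"], n_ice)
    results[label] = p
    print(
        f"{label}: FLAI {ELIGIBLE['FLAI']}/{n_flai} "
        f"({ELIGIBLE['FLAI'] / n_flai:.0%}) vs ICE {ELIGIBLE['ICE']}/{n_ice} "
        f"({ELIGIBLE['ICE'] / n_ice:.0%}), P = {p:.4f}"
    )
```

Under this assumed test, only the mixed-denominator comparison falls below the conventional 0·05 threshold; with either consistent denominator it does not.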
In the discussion, Russo et al (2005) went on to comment on the significant difference between the induction groups in the percentage of patients eligible for intensification, while noting that the final outcome of patients following intensification treatment did not differ between the groups. This was as expected, as the analysis of final outcome correctly used the total number of patients randomised to induction therapy as the denominator for each group.
Drawing valid conclusions from the results of clinical studies depends not only on correct design and adequate statistical powering of the study but also upon the correct analysis of the results obtained (Gehan, 1997).
When groups of patients are compared in any clinical study, it is of critical importance that the same type of denominator is used for each group, to avoid generating spurious differences and arriving at invalid conclusions.