Testing Sokal's and the new prognostic score for chronic myeloid leukaemia treated with α-interferon: comments


We read with great interest the article by the Italian Cooperative Study Group on Chronic Myeloid Leukaemia (CML) (2000), who applied the new CML score (Table I) of Hasford et al (1998) to a data set of 272 patients treated with interferon alpha (IFN-α).

Table I.  Correction of the coefficients given for age and basophils in Table I of the article of the Italian Cooperative Study Group on CML (2000): the value of the new CML score is calculated as follows.
New CML score = 1000 × [0·6666 × (0, when age < 50 completed years; 1, otherwise)
+ 0·0420 × (spleen size in cm under left costal margin)
+ 0·0584 × (% blasts in peripheral blood)
+ 0·0413 × (% eosinophils in peripheral blood)
+ 0·2039 × (0, when basophils < 3% in peripheral blood; 1, otherwise)
+ 1·0956 × (0, when platelet count < 1500 × 10⁹/l; 1, otherwise)]
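
As a minimal sketch, the score in Table I can be computed as the sum of the six weighted terms multiplied by 1000. The function name, the parameter names and the example patient are hypothetical and serve only as an illustration; the coefficients and thresholds are those of Table I.

```python
def new_cml_score(age_years, spleen_cm, blasts_pct, eosinophils_pct,
                  basophils_pct, platelets_e9_per_l):
    """Return the new CML score from the six variables listed in Table I.

    Parameter names are illustrative assumptions:
      age_years           completed years of age
      spleen_cm           spleen size in cm under the left costal margin
      blasts_pct          % blasts in peripheral blood
      eosinophils_pct     % eosinophils in peripheral blood
      basophils_pct       % basophils in peripheral blood
      platelets_e9_per_l  platelet count in units of 10^9/l
    """
    # Three variables enter as 0/1 indicators, three as continuous values.
    age_term = 0 if age_years < 50 else 1
    basophil_term = 0 if basophils_pct < 3 else 1
    platelet_term = 0 if platelets_e9_per_l < 1500 else 1

    weighted_sum = (
        0.6666 * age_term
        + 0.0420 * spleen_cm
        + 0.0584 * blasts_pct
        + 0.0413 * eosinophils_pct
        + 0.2039 * basophil_term
        + 1.0956 * platelet_term
    )
    return 1000 * weighted_sum


# Hypothetical patient, not taken from any of the cited data sets.
example_score = new_cml_score(
    age_years=45, spleen_cm=5, blasts_pct=2,
    eosinophils_pct=3, basophils_pct=2, platelets_e9_per_l=400,
)
print(round(example_score, 1))
```

Because age, basophils and platelet count are dichotomized, a patient just below and a patient just above one of these thresholds can differ by several hundred score points on that variable alone.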

The credibility of a prognostic model is increased when confirmation comes from outside the institution at which the prognostic model was evaluated. The Italian Cooperative Study Group on CML (2000) was the first to publish such external validation for the new CML score. Among their patients, the classification of the new CML score led to three risk groups with statistically significantly different survival times. Their results substantially support the new CML score's validity.

Because of the particular inclusion criterion ‘age < 56 years’, 78% of the patients in the Italian sample fell into the same age category (< 50 years). Given this restricted variability in age, the performance of the new CML score in the Italian sample is even more remarkable.

However, the gist of the paper by the Italian Cooperative Study Group on CML (2000) was a detailed comparison between the new CML score and the Sokal score (Sokal et al, 1984). Hasford et al (1998) sought a new prognostic model because the Sokal score could not provide satisfactory risk group discrimination for survival of IFN-treated patients in a couple of samples. The paper of the Italian Cooperative Study Group on CML (2000) shows the same result: survival differed statistically significantly between the low-risk group and the intermediate-risk group of the new CML score, but not between the corresponding groups of the Sokal score. Their data suggest why the test results differed: compared with the Sokal score, the new CML score placed 28 (21%) more patients but only six more observed deaths in the low-risk group. This finding, together with basically identical low-risk survival curves with the same median survival time (105 months) for either score, indicates that the new CML score's allocation of 160 patients to the low-risk group was justified. They also showed that 23 of these 28 additional patients had intermediate risk according to the Sokal score. Leaving a substantial number of actual low-risk patients in an intermediate-risk group, of course, increases its median survival time and decreases the statistical difference between the survival curves of the two risk groups. On the other hand, an intermediate-risk group that includes low-risk patients increases the difference in survival when compared with the high-risk group. The high-risk groups of both scores had a median survival of 45 months, but 15 of 62 high-risk patients according to the Sokal score (24%) had a survival time of more than 60 months, three of them with censored survival times > 96 months. With the new CML score, only 4 of 25 high-risk patients (16%) had survival times > 60 months, all of whom died before month 96.
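
The percentages quoted above can be rechecked with a few lines of arithmetic. This is a sketch under one assumption: the 21% is taken relative to the Sokal low-risk group, whose size (160 − 28 = 132) is derived here and not quoted in the text. Variable names are illustrative.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


# Figures quoted in the text.
low_risk_new = 160    # low-risk patients according to the new CML score
extra_low_risk = 28   # additional low-risk patients compared with Sokal

# Derived, not quoted: size of the Sokal low-risk group.
low_risk_sokal = low_risk_new - extra_low_risk

extra_share = pct(extra_low_risk, low_risk_sokal)  # additional low-risk patients
sokal_long_survivors = pct(15, 62)   # Sokal high-risk, survival > 60 months
new_long_survivors = pct(4, 25)      # new CML score high-risk, survival > 60 months

print(low_risk_sokal, extra_share, sokal_long_survivors, new_long_survivors)
```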

The Sokal score was developed within a sample of chemotherapy-treated patients. This is generally accepted as a reason why it does not work satisfactorily with patients treated otherwise. However, the two scores also differ in the methodological approach used to define boundaries. For the new CML score, the minimal P-value approach was applied, a statistical procedure that identifies the boundaries maximizing the difference between the resulting groups with respect to the investigated outcome variable. Thus, boundaries were suggested by real data and the three risk groups with the most different survival curves were selected. In contrast, Sokal et al (1984) ‘divided into three subgroups of roughly similar size, using hazard ratios of 0·8 and 1·2 as boundaries.’ This indicates that the choice of boundaries was driven by the idea of establishing risk groups of similar size and, maybe, by choosing boundaries that were equidistant from the hazard ratio 1·0. Their three risk groups also differed with regard to survival, but the boundaries were suggested by Sokal et al (1984) rather than by real data. It is improbable that this presumably non-statistical procedure for defining risk groups reflects the ‘real’ percentages of patients who should belong to the low-risk or high-risk groups as well as the new CML score does. Moreover, Sokal et al (1984) assigned mean values for missing items that ‘may have introduced a slight bias against prognostic variation’.
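
The idea behind the minimal P-value approach can be sketched as follows. This is a simplified illustration and not the procedure of Hasford et al (1998): it searches for a single boundary (two groups instead of three), uses a two-group log-rank test as the criterion, and runs on invented toy data. The function names `logrank_chi2`, `p_value_1df` and `best_cutpoint` are hypothetical.

```python
import math


def logrank_chi2(times, events, in_group):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times    : follow-up times
    events   : True if death was observed, False if censored
    in_group : True if the patient belongs to the group above the boundary
    """
    observed = expected = variance = 0.0
    death_times = sorted({t for t, e in zip(times, events) if e})
    for t in death_times:
        at_risk = [g for ti, g in zip(times, in_group) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        deaths = [g for ti, e, g in zip(times, events, in_group) if e and ti == t]
        d, d1 = len(deaths), sum(deaths)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance if variance > 0 else 0.0


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def best_cutpoint(scores, times, events, min_group=2):
    """Return (boundary, chi2, p) for the boundary with the smallest P-value."""
    best = None
    for cut in sorted(set(scores)):
        above = [s > cut for s in scores]
        if min(sum(above), len(above) - sum(above)) < min_group:
            continue  # skip boundaries that leave one group too small
        chi2 = logrank_chi2(times, events, above)
        if best is None or chi2 > best[1]:
            best = (cut, chi2, p_value_1df(chi2))
    return best


# Invented toy data: higher scores go with shorter survival (months).
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8]
toy_times = [90, 100, 95, 110, 20, 25, 15, 30]
toy_events = [True] * 8

cut, chi2, p = best_cutpoint(toy_scores, toy_times, toy_events)
print(cut, round(chi2, 2), round(p, 4))
```

Because the boundary is selected to minimize the P-value, the raw P-value of the selected boundary is optimistic and is usually adjusted for the number of boundaries tried; this sketch makes no such adjustment.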

Hence, we cannot agree with the conclusions of the Italian Cooperative Study Group on CML (2000). We do not think that the Sokal score remains useful for this kind of patient and are concerned by the authors' implicit recommendation to use the Sokal score instead of the new CML score for the identification of high-risk patients.

The patients to whom the two scores attribute different risk groups are the patients for whom the clinician's choice of prognostic model really matters. If a clinician relies on the wrong high-risk group allocation of the Sokal score, he or she might turn down a therapy including IFN-α for his or her patient. This could be harmful to the patient: patients with intermediate risk or low risk according to the new CML score have the potential to become haematological or cytogenetic responders when treated with IFN-α and, within either risk group, both kinds of response lead to a statistically significantly better survival compared with non-responders.

A sample of ‘younger’ patients with a high-risk group of less than 10% should be taken as good news. It does not make sense to try to artificially enlarge this risk group at the cost of patients who are in fact intermediate-risk patients and could respond to IFN-α.
