A new staging system for multiple myeloma patients based on the Southwest Oncology Group (SWOG) experience

Authors


Dr J.L. Jacobson, Southwest Oncology Group Statistical Center, 1730 Minor Avenue, Suite 1900, Seattle, WA, 98101–1468, USA. E-mail: jothj@crab.org

Abstract

Summary. We aimed to develop and evaluate a staging system for multiple myeloma (MM) based on easily obtained laboratory measures. The Durie–Salmon stage is most commonly used and is an effective system of patient stratification for clinical trial research. However, the criteria are complex and many laboratory parameters are required to properly stage patients. In this analysis, we focused on two common measures with prognostic importance in MM: serum β2 microglobulin (β2m) and serum albumin. Pre-study data on 1555 previously untreated MM patients enrolled on four recent South-west Oncology Group (SWOG) phase III trials were used in the analysis. Staging models were developed and validated using regression tree methods for survival outcomes. SWOG stages were defined as: stage 1, β2m < 2·5 mg/l (14% of patients, median overall survival of 55 months); stage 2, 2·5 ≤ β2m < 5·5 (43% of patients, median overall survival of 40 months); stage 3, β2m ≥ 5·5 and albumin ≥ 30 g/l (32% of patients, median overall survival of 24 months); and stage 4, β2m ≥ 5·5 and albumin < 30 g/l (11% of patients and median overall survival of 16 months). This staging scheme was also predictive of event-free survival, first-year mortality and long-term (≥ 5 years) event-free survival. We conclude that although the SWOG stage does not represent a new prognostic marker for MM (cytogenetics, FISH), it could provide a simple alternative to the Durie–Salmon stage for patients with previously untreated MM. Additional evaluation in other MM patient populations is needed to confirm results.

Durie–Salmon (DS) stage (Durie & Salmon, 1975) is the most commonly used staging system for patients with multiple myeloma. Developed in the mid-1970s, it has proven to be an effective system of patient stratification for clinical trial research. As the survival rate for patients with multiple myeloma varies greatly, a staging system such as DS is important in the evaluation of trial results, allowing researchers to prospectively identify patients with survival characteristics at either end of the spectrum. However, the DS staging system is complex and requires knowledge of myeloma cancer biology to properly stage patients. Some of the criteria are difficult and inconvenient to evaluate on a routine basis. For instance, for evaluating the number of lesions to be noted on the bone survey, the system does not specify if all the lesions should be in different skeletal organs or if presence in one organ suffices. Staging a patient under the DS system requires results from a bone marrow biopsy, bone survey, serum electrophoresis, and values for haemoglobin, haematocrit and serum calcium.

Since the development of the DS staging system, new prognostic factors have been identified as having importance in the pretreatment evaluation of patients with multiple myeloma. Among these factors are routine laboratory measures, such as serum β2 microglobulin (β2m) and serum albumin. A staging system based on a combination of these laboratory variables could prove to be a useful tool with the classification power of the DS system but based on more simple criteria and more easily obtained tests. In this analysis, we developed and evaluated a staging system based on two laboratory measures with prognostic importance in multiple myeloma: serum β2m and serum albumin.

β2m has been one of the few factors in univariate and multivariate analysis found to have independent prognostic importance for survival. Studies correlating β2m levels with myeloma stage, disease status and survival suggest that β2m may be a product of myeloma cells and can be used as a tumour marker to predict the course of the disease (Berggarrd & Beam, 1968; Peterson et al, 1972; Cassuto et al, 1978; Kin et al, 1979; Norfolk et al, 1980; Bataille et al, 1982; Child et al, 1983; Peest et al, 1986). Its predictability for prognosis and its involvement in tumour cell growth make it a reasonable candidate as a main factor for patient staging (Mori et al, 1999; Facon et al, 2001).

Serum albumin is an indirect indicator of interleukin 6 (IL-6) levels, liver function and the nutritional status of the patient. It is an easily obtainable, standardized test. IL-6 is a pro-inflammatory cytokine that normally fluctuates within a narrow range and is expressed at low levels, except during infection, trauma or other stress related situations. IL-6 is a potent mediator of inflammatory processes, and it has been proposed that the age-associated increase in IL-6 accounts for certain phenotypic changes associated with advanced age, particularly those that resemble chronic inflammatory disease, such as decreased lean body mass, osteopenia, low-grade anaemia, decreased serum albumin and cholesterol, and increased inflammatory proteins such as C-reactive protein (CRP) and serum amyloid A. Furthermore, the age-associated rise in IL-6 has been linked to osteoporosis, lymphoproliferative disorders and multiple myeloma (Ershler & Keller, 2000). IL-6 is a potent myeloma cell growth factor and serum levels of IL-6 reflect disease severity in both myeloma and related disorders (Bataille et al, 1989). It is of particular interest that serum IL-6 levels are inversely proportional to the serum albumin levels. Moreover, the serum albumin level is a significant indicator of the patient's nutritional status. Serum albumin level inversely correlates with dietary well-being (Mazolewski et al, 1999). Therefore, low serum albumin correlates with both rapid myeloma growth and the patients' overall performance status.

The development of a staging system involves decisions on not only which clinical factors to include, but also which statistical methods are most appropriate. The choice of a statistical modelling method is important as the results often depend on the method itself. The most commonly used method for identifying prognostic groups based on survival data is to fit a Cox proportional hazards model to the group of potential factors and to then use the resulting regression equation to divide the patients into groups. This technique often results in prognostic groups that can be difficult to characterize and define. In the development of our staging system, we employed statistical techniques designed to identify and classify data based on certain outcomes of interest that have been recently adapted to the analysis of survival data.

Patients and methods

Patient population.  Pre-study data from four recent South-west Oncology Group (SWOG) multiple myeloma phase III trials were used in the analysis (Table I). Patients eligible for these studies had untreated, newly diagnosed multiple myeloma of any stage (≥ DS stage I). SWOG S8229 (Salmon et al, 1990) evaluated VMCP (vincristine, melphalan, cyclophosphamide and prednisone) and VBAP [vincristine, carmustine (BCNU), adriamycin and prednisone] for remission induction therapy followed by VMCP versus sequential half-body radiotherapy + vincristine–prednisone in patients who achieved remission status with chemotherapy, or sequential half-body radiotherapy + vincristine–prednisone in patients who failed to achieve remission. SWOG S8624 (Salmon et al, 1994) was a comparison of VMCP/VBAP with either VAD (vincristine, adriamycin, dexamethasone) or VMCPP/VBAPP (VMCP/VBAP with prednisone between cycles) for induction followed by alpha-2b interferon or no therapy for maintenance, or alpha-2b interferon + dexamethasone for incomplete or non-responders. SWOG S9028 (Salmon et al, 1998) compared VAD with VAD/verapamil/quinine for induction, followed by alpha-2b interferon or alpha-2b interferon plus prednisone for remission maintenance. The VAD + verapamil + quinine arm of the S9028 study was closed early as a result of excessive mortality related to quinine toxicity. This arm was not included in the analysis. SWOG S9210 (Berenson et al, 2002) compared VAD + prednisone (VAD-P) with VAD-P/quinine for induction, followed by a randomization of prednisone dose intensity for remission maintenance.

Table I.  Patients enrolled on SWOG myeloma protocols.

| SWOG study | Description | Total registered | Total eligible |
|---|---|---|---|
| S8229 | VMCP + VBAP/VMCP + Lev vs vincristine + prednisone | 621 | 614 |
| S8624 | VMCP + VBAP vs VAD vs VMCPP + VBAPP/interferon | 522 | 509 |
| S9028 | VAD vs VAD + Ve + Q/interferon | 233 | 182* |
| S9210 | VADP vs VADP + Q/prednisone | 262 | 250 |
| Total | | 1638 | 1555 |

*The VAD + verapamil + quinine arm of S9028 was closed early as a result of excessive mortality related to quinine toxicity. The arm was not included in the analysis.

The primary outcome of interest in this analysis was overall survival (OS), measured as the time from initial study registration to time of death from any cause or last contact. The prognostic variables of interest in this analysis were common laboratory measurements collected prior to treatment on all patients registered on the SWOG multiple myeloma trials that were identified as variables with potential prognostic value, including albumin, calcium, creatinine, haemoglobin, platelet count and β2m. Other patient characteristics such as age, sex, race and disease type were examined as covariates.

To evaluate and further validate the new staging scheme in a population of non-SWOG patients, data were examined from 231 patients enrolled in the University of Arkansas Total Therapy program for newly diagnosed multiple myeloma patients (Barlogie et al, 1999). Patients were enrolled between 1990 and 1995, and treated with an induction regimen of VAD, followed by tandem transplant and a maintenance regimen of interferon alpha-2b.

Statistical methods.  OS was calculated as the time from study registration to death from any cause or last contact. Event-free survival (EFS) was calculated as the time from study registration to either progression of disease or death from any cause or last contact. Survival curves were estimated by the product-limit method (Kaplan & Meier, 1958) and compared using the log-rank test (Mantel, 1966). Cox proportional hazards regression (Cox, 1972) was used to assess the influence of prognostic factors on survival outcome. Tree regression models were based on regression tree methods adapted to survival data, as described in Crowley et al (1997). Tree pruning was based on methods described in LeBlanc (LeBlanc & Crowley, 1993). Permutation and bootstrap methods were used to calculate a bias-adjusted splitting statistic and P-value.
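The product-limit estimate and the median survival times quoted throughout can be illustrated with a minimal sketch. This is not the authors' software; the function names `kaplan_meier` and `median_survival` and the toy follow-up data are assumptions made for illustration only.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  : follow-up time for each patient (e.g. months from registration)
    events : 1 if death was observed at that time, 0 if censored at last contact
    Returns a list of (time, estimated survival) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 50% or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Hypothetical follow-up data (months); 0 marks a censored observation.
curve = kaplan_meier([6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1])
print(curve)
print(median_survival(curve))
```

Censored patients leave the risk set without lowering the curve, which is why OS and EFS here are defined up to "death from any cause or last contact".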

Results

Patient characteristics

A total of 1555 protocol-eligible patients were available for analysis (Table I). The median age was 62 years (range 26–87 years), 61% were men and 19% were African–American. Laboratory characteristics are presented in Table II, and disease characteristics, including DS stage, in Table III. Laboratory variable distributions did not differ between the four studies, with a few notable exceptions. β2m was higher in older studies (65% ≥ 4 mg/l in S8229 vs 57% ≥ 4 mg/l in S9210), as was haemoglobin (64% ≥ 10 g/dl in S8229 vs 46% ≥ 10 g/dl in S9210). While a similar number of patients had more than three lytic lesions in the four studies, a greater percentage of patients had no bone lesions in the more recent studies (25% in S9210) than in the older studies (16% in S8229). Disease characteristics [immunoglobulin (Ig) isotype, light chain isotype] were similar across studies.

Table II.  Patient and laboratory characteristics (all patients, n = 1555).

| Characteristic | n | Median (min, max) | Cut-off | Percentage of patients |
|---|---|---|---|---|
| Age | 1555 | 62 (26, 87) | ≥ 65 years | 39% |
| Albumin | 1534 | 3·6 (0·3, 7·2) | < 30 g/l | 20% |
| Calcium | 1545 | 2·4 (0·6, 4·7) | ≥ 2·5 mmol/l | 34% |
| Creatinine | 1549 | 115 (27, 1947) | ≥ 177 µmol/l | 22% |
| Haemoglobin | 1551 | 10·3 (3·7, 20·0) | < 10 g/dl | 42% |
| Platelets | 1533 | 235 (14, 1161) | < 200 × 10⁹/l | 35% |
| Serum β2m | 1415 | 4·8 (0·0, 63·7) | ≥ 4 mg/l | 61% |

| Bone lesions (n = 1513) | Number of patients | Percentage of patients |
|---|---|---|
| None | 285 | 19% |
| Osteoporosis | 108 | 7% |
| ≤ 3 lytic lesions | 207 | 18% |
| > 3 lytic lesions | 850 | 56% |

n, number of patients tested.

Table III.  Disease characteristics (all patients, n = 1555).

| Characteristic | Category | Number of patients | Percentage of patients |
|---|---|---|---|
| Serum light chain (n = 1464) | None | 167 | 11% |
| | Kappa | 809 | 55% |
| | Lambda | 485 | 33% |
| Serum heavy chain (n = 1539) | None | 305 | 20% |
| | IgG | 894 | 58% |
| | IgA | 325 | 21% |
| | IgM/IgD/IgE | 15 | 1% |
| Renal stage (n = 1549) | A | 1147 | 74% |
| | B | 402 | 26% |
| DS stage (n = 1549) | I–II | 420 | 27% |
| | IIIA | 789 | 51% |
| | IIIB | 240 | 22% |

n, number of patients tested.

Median OS in the combined data set was 33 months. OS varied slightly between the four studies (Fig 1). OS by DS stage in this patient population is presented in Fig 2. Median OS by DS stage ranged from 57 months in stage I patients to 21 months in stage IIIB patients. The distribution of DS stage differed slightly between studies, with a smaller percentage of patients identified as either stage I–II or stage IIIB in the more recent studies (32% stage I–II in S8229 vs 28% stage I–II in S9210; 24% stage IIIB in S8229 vs 18% stage IIIB in S9210).

Figure 1.

OS by SWOG multiple myeloma protocol. Median OS ranged from 31 months to 38 months. The percentage of surviving patients is shown on the Y-axis.

Figure 2.

Kaplan–Meier survival curves comparing DS stage and SWOG stage in the model data set, validation data set, and overall population for OS, EFS and Total Therapy I patients. The percentage of surviving patients is shown on the Y-axis and years since registration on the X-axis. Durie–Salmon stage I (___), II (.…), IIIA (_._._.) and IIIB (___ _), and SWOG stage 1 (___), 2 (.......), 3 (_._._.) and 4 (___ _) are shown as separate curves.

Prognostic factors

DS stage is a statistically significant (P < 0·001) predictor of OS in the overall population, although it proved to be less predictive in some subgroups (women, IgA isotype). Univariate Cox regression analysis showed that pre-study albumin, calcium, creatinine, haemoglobin, platelets, β2m and the number of bone lesions had prognostic significance at P < 0·05, both as continuous variables and as dichotomous indicators split at clinically accepted values (Table IV). No statistically significant survival differences were found for either sex or race.

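Table IV.  Cox regression results for OS.

Univariate

| Variable | n | Continuous HR | Continuous P-value | Dichotomous split | Dichotomous HR | Dichotomous P-value |
|---|---|---|---|---|---|---|
| Age | 1555 | 1·02 | < 0·001 | ≥ 65 years | 1·32 | < 0·001 |
| Sex | 1555 | | | Women | 0·98 | 0·072 |
| Race | 1555 | | | African–American | 1·0 | 0·99 |
| Albumin (g/l) | 1534 | 0·76 | < 0·001 | < 30 g/l | 1·63 | < 0·001 |
| Calcium (mmol/l) | 1545 | 1·11 | < 0·001 | ≥ 2·5 mmol/l | 1·52 | < 0·001 |
| Creatinine* (µmol/l) | 1549 | 1·51 | < 0·001 | ≥ 177 µmol/l | 1·61 | < 0·001 |
| Haemoglobin (g/dl) | 1551 | 0·90 | < 0·001 | < 10 g/dl | 1·47 | < 0·001 |
| Platelets (× 10⁹/l) | 1533 | 0·99 | < 0·001 | < 150 × 10⁹/l | 1·82 | < 0·001 |
| Serum β2m* (mg/l) | 1415 | 1·48 | < 0·001 | ≥ 4 mg/l | 1·71 | < 0·001 |
| Bone lesions (num) | 1513 | 1·12 | < 0·001 | > 3 lesions | 1·36 | < 0·001 |

Multivariate

| Variable | Continuous step | Continuous HR | Continuous P-value | Dichotomous step | Dichotomous HR | Dichotomous P-value |
|---|---|---|---|---|---|---|
| Albumin | 2 | 0·82 | < 0·001 | 4 | 1·41 | < 0·001 |
| Calcium | 6 | 1·05 | 0·011 | 5 | 1·33 | < 0·001 |
| Haemoglobin | 5 | 0·96 | 0·010 | 6 | 1·19 | 0·006 |
| Platelets | 4 | 0·99 | < 0·001 | 2 | 1·59 | < 0·001 |
| Serum β2m* | 1 | 1·30 | < 0·001 | 1 | 1·37 | < 0·001 |
| Bone lesions | 3 | 1·13 | < 0·001 | 3 | 1·34 | < 0·001 |

*Log( ) of variable used.
HR, hazard ratio; n, number of patients; num, number of bone lesions.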

Elevated calcium and β2m, decreased albumin, haemoglobin and platelet count, and more than three lytic lesions were identified as having independent prognostic effects in the multivariate (forward) stepwise Cox model analysis (Table IV). β2m was the first variable entered into the stepwise multivariate model for both the continuous and dichotomous factor models. An interesting result of the multivariate dichotomous factor models is the similarity of hazard ratios among all of the factors.

Tree regression models

In order to validate the predictive potential of the new staging scheme developed in this analysis, the full data set was split to create training and validation data sets. A random sample of approximately two-thirds of the data (1000 patients) was taken from the full data set to create the training data set; the remaining one-third comprised the validation data set (n = 555 patients). The random sampling did not lead to important differences in the distributions of the lab variables, patient and disease characteristics, or outcome characteristics between the training and validation data sets.
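A random training/validation split of this kind can be sketched as follows. The helper name `split_train_validation`, the fixed seed and the use of integer patient identifiers are assumptions for illustration; the source does not describe how the sample was actually drawn.

```python
import random


def split_train_validation(patient_ids, n_train, seed=0):
    """Randomly assign n_train patients to training and the rest to validation."""
    rng = random.Random(seed)      # fixed seed only so the sketch is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# 1555 eligible patients: 1000 for model building, 555 held out for validation.
training, validation = split_train_validation(range(1555), n_train=1000)
print(len(training), len(validation))
```

After such a split, the distributions of the laboratory variables and outcomes in the two sets would be compared, as described above, to confirm that the sampling did not introduce an imbalance.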

The recursive partitioning process behind tree regression modelling proceeds by first splitting the predictor space into two regions or nodes, based on a specified rule. As the proportional hazards model proposed by Cox (Cox, 1972) is the most commonly used analysis tool for survival data, the Cox model log-rank statistic was used as the splitting rule. The log-rank statistic was calculated for each potential split point for each variable of interest. The maximum log-rank value from this group of log-rank values was chosen as the first split. In this training data set, the first split was chosen to be a β2m of 5·5 mg/l.
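The search for a first split can be sketched as a scan over candidate cut-off points, computing a two-sample log-rank statistic at each one and keeping the maximum. The code below uses a plain log-rank chi-square statistic, which may differ in detail from the Cox model statistic the authors computed; the function names, the `min_group` parameter and the toy data are assumptions for illustration.

```python
def logrank_statistic(times, events, in_group):
    """Two-sample log-rank chi-square statistic (1 degree of freedom).

    times    : follow-up time per patient
    events   : 1 if death observed, 0 if censored
    in_group : True for patients in the first group (e.g. value below the cut)
    """
    death_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in death_times:
        n = sum(1 for x in times if x >= t)                              # at risk
        n1 = sum(1 for x, g in zip(times, in_group) if x >= t and g)     # at risk, group 1
        d = sum(1 for x, e in zip(times, events) if x == t and e)        # deaths at t
        d1 = sum(1 for x, e, g in zip(times, events, in_group)
                 if x == t and e and g)                                  # deaths at t, group 1
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance


def best_split(values, times, events, min_group=1):
    """Return (cut, statistic) maximising the log-rank statistic for 'value < cut'."""
    best_cut, best_stat = None, 0.0
    for cut in sorted(set(values)):
        below = [v < cut for v in values]
        n_below = sum(below)
        if n_below < min_group or len(values) - n_below < min_group:
            continue  # skip splits that leave one side too small
        stat = logrank_statistic(times, events, below)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat


# Hypothetical toy data: a marker value, survival in months, all deaths observed.
marker = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
months = [60, 55, 50, 12, 10, 8]
died = [1, 1, 1, 1, 1, 1]
print(best_split(marker, months, died))
```

Because the maximum is taken over many candidate cut-offs, the resulting statistic is optimistically biased, which is why an adjusted P-value is needed before a split is declared significant.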

As a diagnostic tool, it is often helpful to look at a plot of the log-rank values for the possible split points of a given variable. Figure 3 plots the log-rank values for potential split points for β2m. This plot shows that the overall log-rank maximum is achieved at a β2m around 5·5 mg/l, but this maximum does not necessarily distinguish itself from many other potential split points. Also included on this plot are lines representing significance thresholds for α = 0·05 and α = 0·001 level tests. These values have been adjusted upwards for all multiple comparisons on the variable, but it is still clear that all split points would be considered statistically significant for β2m based on these thresholds. The most likely explanation is that very low values of β2m (typically < 2·5 mg/l) represent the favourable prognosis group, and for values above that level only the choice of best separation between two prognostic groups remains.

Figure 3.

Plot of the Cox model log-rank statistic for potential split points in β2m based on the OS model. The log-rank maximum is at β2m = 5·5 mg/l, but the plot shows that most cut-off points identify distinct groups, as the log-rank statistic is consistently above the P = 0·001 reference line.

The rule was then applied recursively to the resulting nodes until the space had been split into a large number of nodes with a few observations each. As this large tree represents an overfit to the data, an algorithm for pruning back the branches of the tree to choose the ‘best’ subtree is then applied. This pruning method keeps only those splits deemed to be statistically significant (P < 0·05) after correcting for the potential bias that could be introduced by fitting such a large number of Cox models. Figure 4 presents the pruned tree for the β2m and albumin model. The terminal nodes of the tree identified four groups as distinct prognostic groups based on β2m and albumin values. Good prognosis patients are those with a β2m < 2·5 mg/l. The middle prognosis patients are those with a β2m between 2·5 and 5·5 mg/l (HR = 1·48), or β2m ≥ 5·5 mg/l and albumin ≥ 30 g/l (HR = 2·27). Poor prognosis patients are those with a β2m ≥ 5·5 mg/l, with the worst being those who also have an albumin < 30 g/l (HR = 3·32). OS by these four tree-defined groups is presented in Fig 2. This model seems to define four distinct groups.

Figure 4.

Pruned regression tree based on β2m and albumin in model data set. HR refers to hazard ratio comparing each group with the best prognosis group (β2m < 2·5 mg/l).

To validate the model developed in the training data set of patients, the tree-defined groups were applied to patients in the validation data set (Fig 2). The prognostic groups are well defined with clear separation between the survival curves estimated on the test data.

Using the results of this tree regression model, a new staging system was defined using β2m split at 2·5 mg/l and 5·5 mg/l, and albumin split at 30 g/l. Patients with β2m < 2·5 mg/l were defined as the best prognosis group (stage 1) with a median OS of 55 months, including 16% of the patients (Fig 2). Patients with 2·5 mg/l ≤ β2m < 5·5 mg/l had the next best prognosis (stage 2), with a median OS of 42 months, including 37% of the patients. In the group of patients with β2m ≥ 5·5 mg/l, albumin split at 30 g/l identifies two distinct prognosis groups. Patients with β2m ≥ 5·5 mg/l and albumin ≥ 30 g/l formed a poor prognosis group (stage 3) with a median OS of 25 months, including 35% of the patients. Patients with β2m ≥ 5·5 mg/l and albumin < 30 g/l formed the worst prognosis group (stage 4), with a hazard ratio of > 3·0 when compared with the β2m < 2·5 mg/l group, and a median OS of 18 months, including 12% of the patients. For no other reason than that SWOG data were used to create it, this new staging scheme is subsequently referred to as the ‘SWOG stage’.
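The staging rule reduces to three comparisons, which can be written as a short function. The name `swog_stage` and its parameter names are hypothetical; the cut-off points are those stated in the text, with β2m in mg/l and albumin in g/l (an albumin value reported in g/dl would need to be multiplied by 10 first).

```python
def swog_stage(b2m_mg_per_l: float, albumin_g_per_l: float) -> int:
    """Assign SWOG stage 1-4 from serum beta-2 microglobulin and serum albumin."""
    if b2m_mg_per_l < 2.5:
        return 1                      # best prognosis, albumin not used
    if b2m_mg_per_l < 5.5:
        return 2                      # 2.5 <= b2m < 5.5, albumin not used
    # b2m >= 5.5 mg/l: albumin separates stage 3 from stage 4
    return 3 if albumin_g_per_l >= 30 else 4


# Hypothetical patients: (b2m in mg/l, albumin in g/l)
for b2m, albumin in [(2.0, 25), (4.0, 38), (6.2, 35), (9.1, 27)]:
    print(b2m, albumin, swog_stage(b2m, albumin))
```

Albumin only enters the rule for patients with β2m ≥ 5·5 mg/l, mirroring the hierarchical structure of the pruned tree.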

To see how the SWOG staging scheme would change if all variables found to have an independent prognostic effect in the multivariate Cox models were included, a regression tree was created and pruned using the variables from Table IV. This all-inclusive tree had a very similar structure to the tree created using only β2m and albumin, adding only bone lesions and calcium to the final tree (see Fig 5). This result appeared to indicate that β2m and albumin represent the prognostic information contained in the larger set of lab measures well, and that if an additional lab characteristic were to be added to the list of variables considered, a measure of bone disease might be the best choice, as both the number of bone lesions and calcium appear in this more inclusive tree model.

Figure 5.

Pruned regression tree considering all variables from multivariate regression model. HR refers to hazard ratio comparing each group with best prognosis group (β2m < 2·5 mg/l).

A logical question for this new staging scheme is how well it did (or did not) approximate the DS staging system. While there was a statistically significant correlation between SWOG stage and DS stage (P < 0·001), Table V shows that patients classified by SWOG stage are fully distributed among the DS stage, and vice-versa.

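Table V.  Comparison of DS stage with SWOG stage.

Distribution of DS stage

| Patient group | DS stage I | DS stage II | DS stage IIIA | DS stage IIIB |
|---|---|---|---|---|
| Percentage of all patients | 6% | 21% | 51% | 22% |
| β2m < 2·5 mg/l | 10% | 29% | 55% | 6% |
| β2m ≥ 5·5 mg/l | 4% | 16% | 42% | 39% |
| Albumin < 30 g/l | 3% | 13% | 55% | 29% |
| SWOG stage 1 | 10% | 29% | 55% | 6% |
| SWOG stage 2 | 8% | 25% | 60% | 7% |
| SWOG stage 3 | 4% | 17% | 38% | 42% |
| SWOG stage 4 | 2% | 7% | 46% | 45% |

Distribution of SWOG stage

| Patient group | SWOG stage 1 | SWOG stage 2 | SWOG stage 3 | SWOG stage 4 |
|---|---|---|---|---|
| Percentage of all patients | 14% | 43% | 32% | 11% |
| DS stage I | 23% | 55% | 19% | 3% |
| DS stage II | 19% | 52% | 25% | 4% |
| DS stage IIIA | 15% | 51% | 24% | 10% |
| DS stage IIIB | 3% | 14% | 61% | 22% |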

Progression-free survival

Progression-free survival (PFS) is an outcome commonly examined in multiple myeloma, usually in the context of evaluating treatment effects. Identifying groups with a poor PFS allows researchers to exclude the patient populations that have little chance of benefiting from treatment and may dilute any potential treatment benefit in comparisons with standard therapies. On the other hand, this would identify a group of patients that might need a different treatment strategy for their aggressive disease. As can be seen in Fig 2, the DS stage for the most part identified distinct prognostic groups for PFS, with stage IIIB patients remaining progression free and alive for a median of 13 months. Likewise, the SWOG stage identified distinct prognostic groups for PFS, with stage 4 patients having a median PFS of only 10 months (Fig 2).

Total Therapy patients

As external validation, the new SWOG staging scheme was applied to patients on the University of Arkansas Total Therapy program (Barlogie et al, 1999). Patients on this Total Therapy program had a better OS than SWOG patients (median OS 5·7 years) and, in general, presented with an earlier stage of disease (43% DS stage I–II). Figure 2 shows the OS by DS stage for this group of patients. Aside from stage IIIB patients, the DS staging system did not define distinct prognostic groups (in terms of OS). Figure 2 also shows the OS by SWOG stage in this group of Total Therapy patients. While we did not see the distinct separation found in the SWOG patient data, the SWOG stage does appear to have better prognostic value in this group of patients. Patients with a β2m ≥ 5·5 mg/l had a distinctly worse survival than those with a β2m < 5·5 mg/l. Further classification by β2m < 2·5 mg/l or albumin < 30 g/l was not important. However, a smaller percentage of patients in this population had a β2m ≥ 5·5 mg/l, and these small numbers make evaluating the prognostic potential of albumin in patients with an elevated β2m more difficult.

Secondary outcomes

The ability of the DS staging system and SWOG staging system to identify distinct groups based on the secondary response and survival outcomes of interest is examined in Table VI. An association between a lower stage and a higher complete remission (CR) rate was not observed with either staging system; both were marginally better at predicting durable CR (defined as CR lasting at least 4 years). Both DS stage IIIB and SWOG stage 4 identified groups with a comparatively high 1-year mortality (35% and 40% respectively) and low long-term (≥ 5 years) EFS (8% and 4% respectively). SWOG stage 1 identified a group of patients with a low 1-year mortality (8%), while DS stage showed little difference in 1-year mortality rates between stages I–IIIA. Both DS stage I and SWOG stage 1 identified a group of patients with a superior long-term EFS (29% and 24% respectively).

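Table VI.  Comparison of complete response rate and survival outcomes.

| Stage | Percentage of patients | Complete response rate | Durable CR | 1-year mortality | Long-term EFS |
|---|---|---|---|---|---|
| DS stage I | 6% | 28% | 33% | 16% | 29% |
| DS stage II | 21% | 38% | 23% | 15% | 21% |
| DS stage IIIA | 51% | 46% | 15% | 18% | 11% |
| DS stage IIIB | 22% | 37% | 18% | 35% | 8% |
| SWOG stage 1 | 14% | 42% | 19% | 8% | 24% |
| SWOG stage 2 | 43% | 44% | 18% | 13% | 15% |
| SWOG stage 3 | 32% | 37% | 17% | 27% | 11% |
| SWOG stage 4 | 11% | 40% | 8% | 40% | 4% |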

Discussion

The objective of this analysis was to develop a system for staging patients with previously untreated multiple myeloma using only widely available laboratory tests. DS stage has proven to be an effective and informative prognostic tool, but the overall complexity of the criteria used to stage patients makes its implementation less than straightforward at times. The staging system proposed here is simple and would be applicable to all patients. Serum β2m and albumin are known markers of disease in multiple myeloma, with β2m being a tumour marker (Durie et al, 1990) and albumin being a marker of rapid disease growth (Bataille et al, 1989). Serum albumin is also a marker of a patient's nutritional status and was found to have a high degree of negative correlation with SWOG performance status (P < 0·0001).

A staging system for multiple myeloma patients based on β2m and albumin levels is not a new idea; a system very similar to the one proposed here was explored in 1986 by Bataille (Bataille et al, 1986). The staging system proposed by this group differed somewhat and used a slightly higher cut-off point for β2m. They proposed a low-risk stage as a β2m < 6 mg/l and albumin > 30 g/l, an intermediate-risk stage as a β2m ≥ 6 mg/l and albumin > 30 g/l, and a high-risk stage as albumin ≤ 30 g/l. In comparison with a number of staging systems for multiple myeloma (DS, Medical Research Council and Merlini–Waldenstrom–Jayakar), they found that the β2m and albumin combination was the most predictive. This prognostic model was again proposed in an analysis of SWOG 8229 data by Durie (Durie et al, 1990).

The development of the SWOG staging system differs in that adaptive statistical methods were used to select cut-off points for β2m and albumin, essentially letting the data dictate how the most prognostically significant groups should be defined. Most prior analyses that have been used to develop new prognostic groups used predefined cut-off points on the variables of interest and then went about the process of selecting variables through multivariate regression modelling. Groups based on regression equation tertiles or on a sum-of-bad-independent-prognostic-factors approach may be easy to implement, but can result in groups that are more difficult to describe. Tree regression modelling results in groups that are easy to explain and interpret. In addition, tree regression introduces a hierarchical structure to the model that addresses and accounts for interactions in the data.

The prognostic abilities of the SWOG staging system could potentially be improved by including other factors of prognostic importance, as we saw in the regression tree based on a more complete set of prognostic factors (Fig 5). From those results, it appears that measures of bone disease may add prognostic information to the staging system. However, the added information is small and adds variables to the criteria. Recent studies have shown the dramatic prognostic importance of additional measures such as plasma cell labelling index (PCLI) and cytogenetics (chromosome 13 deletion). Once again, these measures are more difficult to obtain and would reduce the simplicity of the proposed SWOG staging system.

As would be expected of a useful staging system, the SWOG stage had prognostic importance when evaluating outcomes in addition to OS in the data used for analysis. For EFS, SWOG stage 4 identified a group of patients with a very poor progression-free duration (median 10 months) and also a comparatively high 1-year mortality rate. SWOG stage was not as efficient as DS stage in identifying patients with an improved chance of durable CR and, in addition, DS stage I patients had the worst CR rate among the stages, which are both signs indicating that stage may not be a good predictor of response. However, numerous analyses have shown little association between CR and OS in myeloma patients.

The SWOG stage could be a simple alternative to DS stage for patients with previously untreated multiple myeloma. It has an advantage in that it is based on common laboratory measurements that are widely available and can be reliably determined and reproduced. Improved simplicity can be helpful when a measure of disease stage is used to prospectively identify and stratify patients as part of a registration process for clinical trials. For this reason, the SWOG stage is currently used to stratify patients to SWOG front-line myeloma protocols. This staging system should appeal to clinicians as a tool to identify a patient's potential prognosis when evaluating potential therapies. More complex systems (e.g. molecular/genetic) may only be necessary if selective targeted therapy is proposed, for example, therapies for patients with deletion 13.

This analysis on a large data base of previously untreated myeloma patients confirms previous analyses on smaller groups of patients that also identified β2m and albumin as the most important prognostic factors in multivariate analysis. Additional evaluation in other myeloma data sets is needed to confirm this proposed staging system. In data presented at the 2002 ASH meeting, Dimopoulos (Dimopoulos et al, 2002) showed a high utility for SWOG stage in 397 myeloma patients from the Greek Myeloma Study Group (GMSG). The data that were used to develop the SWOG stage system are now part of an International Prognostic Index project coordinated by the International Myeloma Foundation (IMF), where myeloma data from all over the world will be used to identify and define prognostic staging systems with similar applications to that of SWOG stage.
