Allogeneic transplantation for renal cell carcinoma has shown encouraging preliminary results. We reviewed the published literature to evaluate the impact of patient selection. Most studies did not include information on prognostic factors. We used patient entry rank within individual studies as a novel surrogate for patient selection, motivated by our own experience of an apparent impact of entry rank. One hundred patients were identified from nine studies. Twenty-six per cent of patients demonstrated either a partial or complete response. Median overall survival was 12·3 months. Grade 2–4 acute graft-versus-host disease correlated with an increased likelihood of response (odds ratio: 5·4, 95% confidence interval: 1·6–18·1, P = 0·006) but not survival. Earlier patient entry rank on each trial was associated with a higher probability of response (P = 0·004) and superior survival (P = 0·004). Patient entry rank served as a powerful prognostic factor, suggesting a bias in patient selection that evolved over the course of each study. Further studies are warranted to determine the influence of order of patient entry in other early clinical trial settings.
We hypothesised that patient selection is the most important determinant of transplant outcome. Preliminary data at our own institution on 18 patients with RCC who underwent a uniform transplant regimen demonstrated that traditional prognostic factors, such as anaemia and reduced performance status, strongly influenced outcome (Artz et al, 2005). Moreover, the dramatically different response rates observed in the literature, despite fairly similar non-myeloablative conditioning regimens, further suggested that patient selection, rather than treatment differences, accounted for the observed variability in outcomes.
The absence of adequate reporting of prognostic factors across studies, a common problem in early phase studies, limited direct analysis of patient selection. Motivated by the observation that good-risk patients were clustered among the early entrants on our own study, we investigated the impact of the order of patient enrolment within the published literature as a possible indirect surrogate for patient selection. Although the phenomenon of ‘patient drift’ has been reported (Kalish & Begg, 1987), we are unaware of a rigorous analysis quantifying the importance of patient entry rank.
Patients and methods
We selected published articles that included data on response and survival for individual patients. MEDLINE was searched in January 2005 by combining the term ‘renal cell carcinoma’ with ‘transplant’, ‘allogeneic transplant’, ‘non-myeloablative transplant’ and ‘reduced-intensity transplant’. All articles retrieved by this search were reviewed. Reports were excluded if they lacked detailed individual-level data, were case reports, or were published solely in abstract form.
The primary outcomes were tumour response (partial and complete, excluding minor responses) and overall survival as defined by the authors. No independent verification or central review was used.
Data on individual patients, including each patient’s entry rank on a trial, were extracted from the summary tables in each article without knowledge of calendar time. No additional follow-up beyond the published series was obtained. Although, by convention, published tables for early phase studies typically list patients in accrual sequence, we contacted the individual authors for confirmation. Six of nine authors responded, and all confirmed that the published sequence reflected the actual accrual sequence; these six studies represented 74 of the 100 patients.
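As a minimal sketch of how a verified accrual order can be turned into the entry-rank covariates used in the statistical analysis (log-transformed rank, an indicator for the first five patients, and a within-study quartile), the Python below assumes each study’s patients are already listed in accrual order. The function name, the hypothetical study labels and sizes, and the integer ceiling rule used to assign quartiles are illustrative assumptions; the text does not state how quartile boundaries were drawn.

```python
import math


def entry_rank_covariates(study_sizes):
    """Derive entry-rank covariates for every patient (illustrative sketch).

    study_sizes maps a (hypothetical) study label to its number of patients.
    Patients are assumed to be listed in accrual order, so rank 1 = first enrolled.
    """
    rows = []
    for study, n in study_sizes.items():
        for rank in range(1, n + 1):
            rows.append({
                "study": study,
                "rank": rank,
                # Natural log of rank: log(1) = 0 for the first patient.
                "log_rank": math.log(rank),
                # Indicator for the first five patients enrolled on this study.
                "first_five": int(rank <= 5),
                # Within-study quartile 1-4 via integer ceiling of 4 * rank / n.
                # ASSUMPTION: this boundary rule is not given in the text.
                "quartile": (4 * rank + n - 1) // n,
            })
    return rows


# Hypothetical study sizes, chosen only for illustration.
rows = entry_rank_covariates({"study_A": 8, "study_B": 12})
for row in rows[:8]:
    print(row["study"], row["rank"], round(row["log_rank"], 3),
          row["first_five"], row["quartile"])
```

With eight patients, ranks 1 and 2 fall in the first quartile and ranks 1 to 5 carry the first-five indicator; for the smallest study size reported (seven patients), only rank 1 falls in the first quartile under this rule.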
Response differences across centres were compared using Fisher's exact test because of the small number of patients enrolled at each centre. Multivariate logistic regression was used to estimate the influence of predictor variables on response and to obtain odds ratios (OR), 95% confidence intervals (CI) and P values. As no responses were observed in two studies (Pedrazzoli et al, 2002; Hentschke et al, 2003), these studies were excluded when modelling response as an outcome. The reduction in heterogeneity after excluding these two studies was assessed by partitioning the chi-square statistic (Cochran, 1954).
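Fisher’s exact test applied to a table with more than two rows, such as centres by response status, is usually called the Fisher–Freeman–Halton test. The brute-force Python sketch below is not the authors’ Stata computation; it enumerates every table with the same centre sizes and total number of responders and sums the probabilities of tables no more likely than the one observed. The function name and the toy counts are hypothetical, and full enumeration is practical only for small tables.

```python
from math import comb


def freeman_halton_rx2(responders, totals):
    """Two-sided exact test of equal response proportions across r groups.

    responders[i] of totals[i] patients responded in group i. All tables with
    the same group sizes and total responders are enumerated; the P value is
    the summed probability of tables no more probable than the observed one.
    """
    n_total = sum(totals)
    r_total = sum(responders)
    # Each table has probability prod(C(n_i, x_i)) / C(N, R), so integer
    # numerators can be compared exactly, with no floating-point tolerance.
    denominator = comb(n_total, r_total)
    observed = 1
    for x, n in zip(responders, totals):
        observed *= comb(n, x)

    # capacity[i] = most responders that groups i..end can hold in total.
    capacity = [sum(totals[i:]) for i in range(len(totals))] + [0]
    extreme = 0

    def enumerate_tables(i, remaining, numerator):
        nonlocal extreme
        if i == len(totals):
            if remaining == 0 and numerator <= observed:
                extreme += numerator
            return
        low = max(0, remaining - capacity[i + 1])
        high = min(totals[i], remaining)
        for x in range(low, high + 1):
            enumerate_tables(i + 1, remaining - x, numerator * comb(totals[i], x))

    enumerate_tables(0, r_total, 1)
    return extreme / denominator


# Hypothetical toy tables: two groups of four, and three groups of three.
print(freeman_halton_rx2([3, 1], [4, 4]))
print(freeman_halton_rx2([0, 0, 3], [3, 3, 3]))
```

With two rows the function reduces to the ordinary two-sided Fisher exact test.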
Median, 100-day and overall survival were estimated from the day of transplant by the method of Kaplan and Meier, using the time-to-event data provided in each article. Cox proportional hazards models were used to examine variables influencing survival and to estimate hazard ratios (HR), 95% CI and P values. Unless otherwise stated, each multivariate logistic and Cox regression model included only the predictor variable of interest plus study indicator variables; the study indicators were included in all models to adjust for between-centre differences. The reduction in heterogeneity resulting from exclusion of the study by Pedrazzoli et al (2002) was assessed using a 1-degree-of-freedom Wald test based on the Cox regression model with all nine studies.
A smoothed plot of martingale residuals against patient entry rank based on Cox proportional hazards regression indicated that log-transformation of patient entry rank provided the most appropriate functional form (Therneau & Grambsch, 2000). For consistency, logistic regression models also used log-transformed patient entry rank. The slope of the smoothed martingale residuals against untransformed patient entry rank differed for the first five patients compared with those of higher rank, suggesting that the effect of patient entry rank might differ between early and late enrolment. Because we had observed in our own data that the first five patients accrued fared better than patients enrolled later, we defined an indicator variable for the first five patients on each study. As study sizes varied, we also investigated, in a similar fashion, the effect of being among the first quartile of patients enrolled compared with subsequent enrolment. HR, OR, 95% CI and P values are reported for these dichotomous variables. Other variables considered in multivariate models, because they were consistently available, were recipient age, recipient sex, presence of grade 2 or greater acute graft-versus-host disease (aGVHD) and the study in which patients were enrolled. Because almost all patients had clear cell carcinoma and had undergone nephrectomy, there was insufficient variation to model these parameters. The cumulative probability of aGVHD could not be modelled precisely, as only occasional reports documented the date of onset of aGVHD. Forward model selection was used, and all models were adjusted for study membership.
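The functional-form check can be sketched in Python by computing martingale residuals from a Cox model with no covariates, which reduce to each patient’s event indicator minus the Nelson–Aalen cumulative hazard at that patient’s follow-up time, and then summarising them by entry rank. This is an illustration under assumptions: the text does not say whether the residuals came from a null or a study-adjusted model, and the per-rank mean below is a hypothetical, crude stand-in for the scatterplot smoother (such as lowess) that would normally be plotted against rank and log rank.

```python
def null_martingale_residuals(times, events):
    """Martingale residuals from a Cox model with no covariates.

    For patient i: residual = event_i - H0(t_i), where H0 is the Nelson-Aalen
    cumulative hazard evaluated at that patient's follow-up time.
    """
    cumhaz = {}
    h = 0.0
    for t in sorted(set(times)):
        at_risk = sum(1 for s in times if s >= t)
        deaths = sum(e for s, e in zip(times, events) if s == t)
        h += deaths / at_risk
        cumhaz[t] = h
    return [e - cumhaz[t] for t, e in zip(times, events)]


def mean_residual_by_rank(ranks, residuals):
    """Crude smoother (hypothetical stand-in for lowess): mean residual per rank."""
    groups = {}
    for rank, r in zip(ranks, residuals):
        groups.setdefault(rank, []).append(r)
    return {rank: sum(v) / len(v) for rank, v in sorted(groups.items())}


# Hypothetical toy data: two small studies of three patients each.
times = [50, 80, 120, 200, 300, 400]
events = [1, 0, 1, 1, 0, 1]
ranks = [1, 2, 3, 1, 2, 3]
residuals = null_martingale_residuals(times, events)
print(mean_residual_by_rank(ranks, residuals))
```

Null-model martingale residuals always sum to zero; a roughly linear trend of the smoothed residuals against log rank, but not against raw rank, would support the log scale.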
All statistical tests were two-sided with an alpha value of 0·05, unless otherwise stated. Statistical analyses were performed using Stata Statistical software, Version 8 (Stata Corporation, College Station, TX, USA).
The data consistently available included patient entry number, recipient age, recipient sex, aGVHD, response, death, nephrectomy status, histology and survival time. The following prognostic factors and variables were not uniformly available and thus were not included in our analysis: performance status, presence of anaemia, lactate dehydrogenase, disease progression before transplant, comorbidity, presence of hypercalcaemia, cell dose infused, donor age, date of onset of aGVHD, presence of chronic GVHD, date of onset of response and donor sex.
Reporting of eligibility for studies
Eight of the nine studies described eligibility criteria, with Hentschke et al (2003) being the exception. Inclusion criteria uniformly included metastatic RCC with at least one prior therapy and availability of a human leukocyte antigen (HLA)-matched donor.
The level of histological detail varied. Nevertheless, almost all patients had clear cell carcinoma or a combination of clear cell carcinoma and another histological subtype. One report stated only that patients had ‘adenocarcinoma’ (Hentschke et al, 2003).
Almost all patients received fludarabine-based conditioning regimens in combination with a variety of additional agents. GVHD prophylaxis always included a calcineurin inhibitor (cyclosporin A or tacrolimus), usually in combination with at least one other immunosuppressive agent and/or anti-thymocyte globulin. Schedules for tapering immunosuppression varied considerably, ranging from as early as 30 days (usually in cases of disease progression) to later than 90 days after transplant.
Donors were HLA-identical siblings in 89 cases, one-antigen-mismatched siblings in three cases, matched unrelated donors in seven cases and a one-antigen-mismatched unrelated donor in one case. Mobilised peripheral blood stem cells were used in 99 patients and bone marrow in one patient.
Response, early mortality and survival
Twenty-six of 100 patients (26%) responded, but response rates among studies ranged from 0% to >50% (Table II). Twenty per cent died by day 100, with only one patient censored before day 100 because of lack of follow-up. Figure 1 depicts overall survival curves stratified by study. The curves for eight studies were fairly similar, but the study by Pedrazzoli et al (2002) showed inferior survival. Thus, most analyses were performed both with and without this study. Median overall survival for all patients was 370 days, and 413 days when the study by Pedrazzoli et al (2002) was excluded.
Differences across trials
The response rates differed significantly across all nine studies (P = 0·017, Fisher's exact test). However, after excluding the two studies without responses, no statistically significant difference was observed among the remaining seven studies, comprising 83 patients (P = 0·143), and the reduction in heterogeneity was significant (P = 0·007). Overall survival also varied significantly across studies (P < 0·0001, log-rank test). This difference disappeared upon exclusion of the study by Pedrazzoli et al (2002) (P = 0·47), in which all patients died before day 100. Across the studies, 100-day mortality ranged from 0% to 100% (P < 0·001), but this difference was not significant after excluding the study by Pedrazzoli et al (2002) (P = 0·50). A significant reduction in heterogeneity of both 100-day mortality (P < 0·0001) and overall survival (P < 0·0001) was found when the study by Pedrazzoli et al (2002) was excluded.
Patient entry rank
Patients enrolled earlier had an increased probability of response and superior overall survival. Logistic regression adjusted for study (which required dropping the two studies without responses, because their study indicators perfectly predicted non-response) demonstrated an association of earlier patient entry with an increased probability of response (OR = 2·99, 95% CI: 1·42–6·30, P = 0·004). Similarly, earlier patient entry was associated with better survival (HR = 0·55, 95% CI: 0·36–0·83, P = 0·004; Fig 2A). After excluding the study by Pedrazzoli et al (2002), the effect of patient entry on survival remained significant (HR = 0·57, 95% CI: 0·37–0·88, P = 0·011). The associations with response (OR = 2·72, 95% CI: 1·28–5·77, P = 0·009) and survival (HR = 0·54, 95% CI: 0·36–0·82, P = 0·003) also persisted after adjustment for sex and age.
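Because these odds ratios and hazard ratios are reported with 95% confidence intervals, the underlying Wald statistics can be roughly recovered, assuming each interval was computed symmetrically on the log scale as log(estimate) ± 1·96 × SE. The Python sketch below, with a hypothetical helper name, applies this to the two unadjusted entry-rank estimates above. Because the published bounds are rounded to two decimals, recovered P values agree with the reported ones only approximately; the survival HR, for example, gives a value slightly above the reported 0·004.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def wald_from_ci(estimate, lower, upper, z_crit=Z_975):
    """Back-calculate SE(log ratio), Wald z and two-sided P from a ratio and its CI.

    ASSUMPTION: the interval is a symmetric Wald interval on the log scale,
    i.e. log(estimate) +/- z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Entry-rank estimates as reported in the text (log-transformed rank).
for label, est, lo, hi in [("OR, response", 2.99, 1.42, 6.30),
                           ("HR, survival", 0.55, 0.36, 0.83)]:
    se, z, p = wald_from_ci(est, lo, hi)
    print(f"{label}: SE(log) = {se:.3f}, z = {z:.2f}, P = {p:.4f}")
```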
For ease of clinical interpretation, patient entry was also modelled by comparing the first five patients on a given trial with patients subsequently enrolled on the same trial. Among the first five, 17/45 (38%) responded, compared with only 9/55 (16%) of those enrolled subsequently. Being among the first five patients increased the probability of response (OR = 9·1, 95% CI: 2·4–34·4, P = 0·001) and was a favourable prognostic factor for survival (HR = 0·48, 95% CI: 0·26–0·90, P = 0·021). It remained a favourable prognostic factor for both response (P = 0·002) and survival (P = 0·019) after adjustment for sex and age.
Because of differences in study sample size (range 7–19), we additionally investigated whether being among the first 25% of patients enrolled in a study predicted response or survival. Compared with the subsequent quartiles, the first quartile showed a benefit in response (OR = 9·9, 95% CI: 2·8–35·0) and overall survival (HR = 0·33, 95% CI: 0·16–0·68, P = 0·003). The response rates were 50%, 16%, 15% and 20% for the first, second, third and fourth quartiles, respectively (P = 0·005, Wald test adjusted for study). Compared with the first quartile, HRs for survival were similar for the second, third and fourth quartiles (HR = 3·24, 95% CI: 1·47–7·12; HR = 2·85, 95% CI: 1·26–6·48; and HR = 2·67, 95% CI: 1·00–7·13, respectively). Estimated survival curves are shown in Fig 2B.
Acute graft-versus-host disease
Grade 2–4 aGVHD, as defined by the authors, occurred in 41% of individuals. Response occurred in 18/41 (44%) patients with aGVHD compared with only 8/59 (14%) of those without aGVHD (P = 0·001, chi-squared test). By logistic regression adjusted for study, the occurrence of aGVHD correlated with an increased probability of response (OR = 5·4, 95% CI: 1·6–18·1, P = 0·006). The effect of aGVHD on survival could not be modelled precisely because of inconsistent reporting of the day of onset of aGVHD. However, when aGVHD was treated as a baseline variable, no association was found between survival and the presence of grade 2–4 aGVHD (HR = 1·39, 95% CI: 0·71–2·68, P = 0·33). In an alternative analysis, we used day 50 as the starting point (as aGVHD had usually occurred before day 50) and treated aGVHD as a baseline variable for those surviving beyond day 50. The results were similar (HR = 1·29, 95% CI: 0·65–2·55, P = 0·47). Finally, adjusting for aGVHD had no impact on the influence of patient entry rank.
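The unadjusted comparison of response by aGVHD status can be reproduced from the counts given above with a Pearson chi-squared test on a 2 × 2 table. The Python sketch below assumes no continuity correction, which the text does not specify; under that assumption the statistic is about 11·6 with P ≈ 0·0007, consistent with the reported P = 0·001 after rounding. The helper name is hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared test without continuity correction for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability for 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: grade 2-4 aGVHD yes / no; columns: responders / non-responders.
chi2, p = pearson_chi2_2x2(18, 41 - 18, 8, 59 - 8)
print(f"chi-squared = {chi2:.2f}, P = {p:.4f}")
```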
Recipient sex and recipient age
Logistic regression demonstrated a non-significant trend towards more frequent responses in male recipients (P = 0·063). The association weakened after accounting for accrual sequence (P = 0·14), as men tended to have earlier entry ranks. No correlation was detected between response and age (P = 0·37). Similarly, overall survival was not significantly associated with either sex or age (P = 0·77 and P = 0·90, respectively).
The purpose of this overview of studies employing NST for metastatic RCC was to test the hypothesis that patient selection heavily influenced outcome. Our data provide a descriptive summary of the published literature and demonstrate the prognostic importance of patient entry rank. They also indirectly suggest that healthier patients were enrolled earlier in these studies.
Nine published studies comprising 100 patients were analysed. The overall response rate among individual patients was 26%, with 100-day survival of 80% and median survival of 1 year. Response and survival differed significantly across series, with patients doing particularly poorly in the study by Pedrazzoli et al (2002). Even after excluding this study, large variation existed across the other eight studies, with response rates ranging from 0% to over 50% (P = 0·035) and 100-day mortality ranging from 0% to 28%.
A recent report by Blaise et al (2004), not included in our analysis because of a lack of individual patient data, demonstrated only an 8% response rate among 25 RCC patients undergoing T-cell-depleted NST, raising the possibility that the conditioning regimen or GVHD prophylaxis may influence anti-tumour activity. Although the studies reviewed here all used fludarabine-based NST and a calcineurin inhibitor for post-transplant immunosuppression, aspects of conditioning, additional immunosuppressive agents (e.g. anti-thymocyte globulin) and tapering of immunosuppression were highly variable. The limited numbers of patients and responses prevented precise estimation of the effect of treatment-related parameters on transplant outcome. The strong association between aGVHD and response does offer further evidence of a graft-versus-tumour effect (OR = 5·4, 95% CI: 1·6–18·1, P = 0·006). Interestingly, 100-day mortality was generally low in series with a higher incidence of aGVHD. Despite this, the presence of aGVHD was not associated with improved long-term survival (HR = 1·39, 95% CI: 0·71–2·68, P = 0·33).
With fairly similar transplant regimens but widely varying results, we hypothesised that patient selection was a critical, but possibly unrecognised, determinant of outcome. Unfortunately, the studies supplied scant data on known RCC prognostic factors or patient health status, preventing direct analysis of the effect of these factors on outcome. Traditional adverse prognostic factors for RCC include elevated serum lactate dehydrogenase, elevated serum calcium, anaemia, lack of nephrectomy and, most importantly, reduced performance status (Naito et al, 1991; Citterio et al, 1997; Motzer et al, 1999, 2004; Ljungberg et al, 2000; Mejean et al, 2003). Long-term follow-up of a small cohort of 18 patients from the University of Chicago transplant series identified reduced performance status and anaemia as adverse prognostic factors, confirming that previously identified adverse factors for RCC apply even to the select group of patients undergoing allogeneic transplant (Artz et al, 2005). Further, in Pedrazzoli et al (2002), all seven subjects had a decreased performance status and no responses were observed.
We explored a novel approach to indirectly assessing patient selection, using patient entry rank on each trial as a surrogate marker, motivated by the observation in our own series that the patients accrued initially fared remarkably well despite a uniform treatment protocol. Our analysis confirmed a strong association between earlier patient entry and an increased likelihood of response (P = 0·004) and even improved overall survival (P = 0·004). Consistent results were obtained whether entry rank was examined as a continuous variable, as the first five patients enrolled, or as the first quartile of enrolled patients. These results, in conjunction with the University of Chicago data describing the importance of known prognostic factors, suggest that a clinically significant bias exists towards enrolling ‘better’ patients first. The higher response rate among patients accrued earlier probably reflects better prognostic factors and/or longer expected survival, which provides more opportunity to observe a response, as responses are typically delayed.
Our analysis has several limitations. First, the coding of patient entry rank assumes that the accrual order reported in each study actually reflects the allocation sequence. Fortunately, we were able to verify this for the majority of patients by contacting the authors. Of note, some trials also included patients with other types of solid tumours, whose enrolment was not considered in our assignment of rank. However, erroneously assigning an early rank to a patient would be expected to reduce, not increase, our ability to detect differences in outcome between early and late patients. Second, the effect of patient entry rank may simply represent a time trend, i.e. the influence of an unknown variable that unfavourably affected response and survival over time. This explanation would require the unlikely scenario of a systematic, adverse treatment change across diverse studies conducted on three different continents. One possible treatment change that may have occurred systematically is either more or less aggressive control of aGVHD. Although information on aGVHD, and thus its analysis, is limited, the association with patient entry rank persisted after crude adjustment for aGVHD in multivariate models. Third, a regression-to-the-mean phenomenon combined with publication bias offers an alternative explanation for our findings. If studies with promising early results exhibit regression to the mean, while trials with poor early results preferentially stop early and/or are not published, patient entry may spuriously appear to be a prognostic factor. Finally, the data are derived from a highly novel treatment protocol for a single disease performed at selected institutions. The dynamics of patient selection do not necessarily apply to other treatments and/or diseases. We suspect that the influence of entry rank would be most pronounced in early phase trials of highly experimental therapies.
While much has been published on selection bias in phase II or randomised trials, almost no data exist on selection bias within a study. One may postulate that a highly experimental therapy, strict eligibility criteria, the need for a matched donor and the availability of a transplant facility all promote patient selection onto a study. The initial University of Chicago report quantified this process: 284 patients were seen, 84 were screened and 15 eventually underwent transplant (three additional patients were transplanted at other centres) (Rini et al, 2002). Yet these factors do not necessarily explain preferential selection within a given trial. Physicians may choose better candidates to be the first to undergo an experimental therapy. This effect may be most pronounced when a new trial opens and a pool of potential candidates exists, permitting selection of the healthiest subjects. After initial successes are observed, more high-risk patients may be enrolled and/or the pool of potential subjects may be depleted. An analysis correlating patient entry rank with known prognostic factors, such as performance status, would enable this hypothesis to be tested. However, unidentified prognostic factors may also influence both outcome and entry rank.
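The within-trial selection mechanism proposed above can be illustrated with a deliberately simple, hypothetical simulation in Python: candidates in a pool have a latent prognostic score, physicians enrol the healthiest remaining candidate first, and everyone receives identical treatment whose response probability depends only on that score. Every element (pool size, number enrolled, the normal score distribution, the logistic link and its intercept) is an assumption chosen for illustration, not an estimate from the studies reviewed.

```python
import math
import random


def simulate_ranked_selection(pool_size=40, n_enrolled=15, seed=1):
    """Expected response probability for early versus later enrolees (hypothetical model)."""
    rng = random.Random(seed)
    # ASSUMPTION: latent prognostic score per candidate; higher means healthier.
    pool = [rng.gauss(0.0, 1.0) for _ in range(pool_size)]
    # ASSUMPTION: the healthiest remaining candidate is always enrolled next,
    # so entry rank 1 is the best-prognosis patient in the pool.
    enrolled = sorted(pool, reverse=True)[:n_enrolled]
    # Identical treatment for all: response probability depends only on the
    # score, through an assumed logistic link with intercept -1.
    probs = [1.0 / (1.0 + math.exp(-(-1.0 + score))) for score in enrolled]
    first_five = sum(probs[:5]) / 5
    later = sum(probs[5:]) / len(probs[5:])
    return first_five, later


early, late = simulate_ranked_selection()
print(f"mean response probability, ranks 1-5: {early:.2f}; ranks 6-15: {late:.2f}")
```

Because the link is increasing and candidates are enrolled in descending order of score, the first five patients always have the higher expected response rate in this toy model even though treatment never changes; separating this mechanism from the alternatives discussed above would still require the prognostic factors themselves.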
The need to adjust for order of patient allocation when a significant effect exists and the consequences for sequential response adaptive designs have been described (Kalish & Begg, 1987; Karrison et al, 2003). To our knowledge, this is the most detailed report confirming the association between patient entry rank and outcome. The analysis also reiterates the importance of reporting known pre-treatment prognostic variables to define the cohorts under study, rather than relying on eligibility criteria.
In conclusion, order of patient entry was the strongest prognostic factor in this analysis of NST for RCC. This observation suggests that significant patient selection led to encouraging results among the initial patients transplanted. Analysis of patient entry rank represents a novel technique for evaluating patient selection in early phase clinical trials. Whether entry rank bias applies to early phase trials for other diseases or treatments necessitates further study.