VISTA Steering committee members: A. Alexandrov, P.W. Bath, E. Bluhmki, L. Claesson, J. Curram, S.M. Davis, G. Donnan, H.C. Diener, M. Fisher, B. Gregson, J. Grotta, W. Hacke, M.G. Hennerici, M. Hommel, M. Kaste, K.R. Lees, P. Lyden, J. Marler, K. Muir, R. Sacco, A. Shuaib, P. Teal, N.G. Wahlgren, S. Warach, and C. Weimar.
Subject Codes: Thrombolysis.
Conflicts of interest: None declared.
Background and Purpose
Approved use of intravenous alteplase for ischemic stroke offers net benefit. Pooled analysis of randomized controlled trial data suggests that additional patients could benefit, but others could be harmed, when treatment is initiated beyond 4·5 h after stroke onset. We used a prognostic score to develop a strategy for patient selection.
We selected 500 patients treated with intravenous alteplase and 500 controls from the Virtual International Stroke Trials Archive, matching their modified Rankin score outcomes to those of the pooled randomized controlled trial data for 4·5–6 h. We ranked patients by prognostic score and chose score limits that optimized our sample for a net treatment benefit significant at P = 0·01 by Cochran–Mantel–Haenszel test and by ordinal regression. For validation, these limits were applied to the pooled randomized controlled trial data for 4·5–6 h, testing for net benefit by Cochran–Mantel–Haenszel test, by ordinal regression, and by dichotomized outcomes: modified Rankin score 0–1, mortality, and parenchymal hemorrhage type 2 bleeds. All analyses were adjusted for age and National Institutes of Health Stroke Scale.
In the training dataset, limits of 56–95 on a prognostic score retained 714 patients in whom there was net benefit significant at P = 0·01. When applied to the 1120 patients in the pooled randomized controlled trial 4·5–6 h dataset, score limits of 56–95 retained 711 patients and gave odds ratio for improved modified Rankin score distribution of 1·13, 95% confidence interval 0·87–1·47, Cochran–Mantel–Haenszel P = 0·89. More patients achieved modified Rankin score 0–1 (odds ratio 1·44, 1·02–2·05, P = 0·04) but mortality and parenchymal hemorrhage type 2 bleeds were increased: odds ratio 1·56, 1·01–2·40, P = 0·04; odds ratio 15·6, 3·7–65·8, P = 0·0002, respectively.
Selection of patients between 4·5 and 6 h based on simple clinical measures failed to deliver a population in whom the alteplase effect would be safe and effective.
Treatment of patients with acute cerebral ischemia using intravenous alteplase is safe and effective if initiated within three hours or 4·5 h of stroke onset. Pooled analysis of individual patient data from these trials and others that treated up to six hours from stroke onset suggests that, at an onset to treatment time (OTT) of 4·5–6 h, some additional patients may benefit while others may be harmed. Enhanced patient selection may identify subgroups of patients who could benefit at later OTT without suffering harm. Imaging approaches using mismatch between perfusion- and diffusion-weighted magnetic resonance scans, or computed tomography (CT) perfusion scans, are under study. However, simpler clinical criteria could also be helpful and would be more readily applied in routine clinical practice.
Absolute effects of treatment can be estimated from the product of the probability of an outcome without treatment and the relative effect of treatment; for an uncommon outcome, the odds ratio approximates the relative risk. For example, if the risk of symptomatic bleeding under placebo is 1·5% and the odds ratio for bleeding with alteplase is 4·0, the risk of bleeding with alteplase is approximately 6·0%, an absolute increase in risk of 4·5%. The net benefit is the absolute increase in favorable functional outcome less any absolute increase in risk of bleeding or death. This creates an opportunity: if selection is optimized according to prognostic factors such as age and severity to minimize the absolute risk of adverse events (death and bleeding), the viable onset to treatment time might be extended to the 4·5–6 h window.
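The bleeding arithmetic above can be checked with a short sketch. It assumes only the numbers given in the text: the exact conversion applies the odds ratio to the baseline odds and converts back to a probability, while the shortcut multiplies baseline risk by the odds ratio, which is close only for uncommon outcomes. The function names are illustrative and hypothetical, and `net_benefit` is simplified to a single harm term rather than bleeding and death separately.

```python
def risk_with_treatment(baseline_risk: float, odds_ratio: float) -> float:
    """Exact: scale the baseline odds by the odds ratio, then convert back to a risk."""
    treated_odds = baseline_risk / (1.0 - baseline_risk) * odds_ratio
    return treated_odds / (1.0 + treated_odds)


def approx_risk_with_treatment(baseline_risk: float, odds_ratio: float) -> float:
    """Shortcut for uncommon outcomes: baseline risk multiplied by the odds ratio."""
    return baseline_risk * odds_ratio


def net_benefit(baseline_good: float, or_good: float,
                baseline_harm: float, or_harm: float) -> float:
    """Absolute gain in favorable outcome minus absolute increase in harm,
    both computed with the exact odds conversion (hypothetical helper)."""
    gain = risk_with_treatment(baseline_good, or_good) - baseline_good
    harm = risk_with_treatment(baseline_harm, or_harm) - baseline_harm
    return gain - harm


if __name__ == "__main__":
    # Bleeding example from the text: 1.5% under placebo, odds ratio 4.0
    print(round(approx_risk_with_treatment(0.015, 4.0), 4))  # shortcut
    print(round(risk_with_treatment(0.015, 4.0), 4))         # exact
```

With these inputs the shortcut reproduces the 6·0% in the text, while the exact conversion gives about 5·7%, so the stated 4·5% absolute increase is a slight overestimate that is acceptable for an uncommon outcome.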
We hypothesized that we could generate patient selection criteria for thrombolysis using a prognostic score based on simple clinical measures, developing the optimal criteria in cohorts of treated and untreated patients from an observational dataset, and that we could validate these criteria in a separate, existing dataset of pooled randomized controlled trial (RCT) data.
Because no data were available to us on patients treated with thrombolysis after 4·5 h, we had to disregard OTT when developing and initially testing our strategy. In light of the assumptions we had to invoke in generating our selection criteria, we planned prospective validation among patients treated in the 4·5–6 h time window from the pooled RCT dataset. Our objective was to use simple clinical measures to select patients in whom intravenous alteplase initiated 4·5–6 h from stroke onset would deliver a net benefit after offsetting any harm.
We conducted our study according to a prespecified analysis plan shared with the Virtual International Stroke Trials Archive (VISTA) steering committee and RCT investigators. We completed the development phase and declared our results including prognostic thresholds before accessing the RCT data for validation. As we were working with existing anonymized data, we did not require ethical review under institutional rules.
Data source and patients
For the development phase, we gathered demographics, clinical data, and functional outcome measures from neuroprotection trials in ischemic stroke conducted in the period 1998–2007. We obtained these data, anonymized in relation to patients and trials, from VISTA. We excluded trials that had tested effects of thrombolysis or of any drug now known to influence outcome after stroke. We retained patients who received intravenous thrombolysis as standard care; at the time of analysis, the time of alteplase administration was unavailable, but it was known to be <3 h from stroke onset. These patients formed our ‘VISTA thrombolysis group’ and were complemented by VISTA patients who were managed without alteplase, the ‘VISTA controls’. Trials included within VISTA stipulated only basic imaging requirements for patient eligibility and did not routinely record detailed imaging parameters. Finally, we excluded patients who lacked relevant baseline and outcome information: enrolment National Institutes of Health Stroke Scale score (NIHSS), age, modified Rankin score (mRS) at day 90, and NIHSS at day 90. Death was recorded as mRS grade 6.
For the validation phase, we used the pooled individual patient data from the published randomized trials of thrombolysis [1-3, 5-9]. These trials enrolled patients with a defined stroke onset time of <6 h and a CT scan excluding hemorrhage.
Such analyses require separate data for the development and validation stages, first to avoid data mining and second to ensure that the methods are applicable externally.
Our main outcome measure was mRS, analyzed as an ordinal scale, with dichotomizations at mRS 0–1 vs. 2–6 and at 0–5 vs. 6 (i.e. mortality) as secondary end-points. Symptomatic hemorrhage, defined as parenchymal hemorrhage type 2 (PH2), was also a secondary outcome.
Development of prognostic score thresholds
We began with the known distribution of mRS outcomes for thrombolysed versus control patients treated 4·5–6 h after stroke onset (Fig. 1), taken from the recently published analysis of pooled RCT data. As the time of alteplase administration was not recorded in VISTA, though it was required to be within three hours of stroke onset, we needed to generate a development population comparable with the RCT data for later-treated patients. We populated a training dataset from VISTA to match the known distribution of outcomes in the 4·5–6 h treated group: on the basis of their 90-day mRS, 500 VISTA control patients and 500 VISTA thrombolysis patients were selected at random to supply the correct number of patients within each outcome category to match the pooled RCT data. We did not permit the prognostic factors of these patients to influence their selection. This generated a ‘VISTA trial population’ of 1000 patients whose outcomes almost exactly matched the known outcomes of late-treated RCT patients, differing appropriately between alteplase and control groups.
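The outcome-matched selection described above amounts to stratified random sampling by 90-day mRS grade. The sketch below is an illustration under stated assumptions, not the authors' code: patients are hypothetical `(patient_id, mrs_day90)` pairs, the per-grade target counts stand in for the Fig. 1 distribution (the actual counts are not reproduced here), and the routine would be run once on the control pool and once on the thrombolysis pool, each with targets summing to 500.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def sample_to_match(pool: Sequence[Tuple[str, int]],
                    target_counts: Dict[int, int],
                    seed: int = 0) -> List[str]:
    """Draw patients at random so that each mRS grade supplies exactly the
    target number of patients. Only the outcome grade drives selection;
    age and NIHSS are never consulted."""
    rng = random.Random(seed)
    by_grade: Dict[int, List[str]] = defaultdict(list)
    for patient_id, grade in pool:
        by_grade[grade].append(patient_id)

    chosen: List[str] = []
    for grade, needed in sorted(target_counts.items()):
        available = by_grade.get(grade, [])
        if len(available) < needed:
            raise ValueError(
                f"only {len(available)} patients with mRS {grade}, need {needed}")
        # Sampling without replacement within this outcome grade
        chosen.extend(rng.sample(available, needed))
    return chosen
```

Fixing the seed makes the draw reproducible. Because selection depends only on the outcome grade, the baseline prognostic factors in the sample follow whatever relationship they have with outcome in VISTA.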
Rather than developing a new prognostic score, we chose a previously published score that had been validated on different datasets, to strengthen the reliability of our analysis. Weimar et al. had developed a score to describe the probability of functional independence after ischemic stroke, using age and baseline NIHSS score. This prognostic score has been validated by König et al. on data from VISTA and can be summarized as score = 145 − 0·46 × age − 2·5 × NIHSS. The authors limited the score to these two variables after rigorous analysis showed no other assessed factor to be an independent predictor of outcome alongside age and baseline NIHSS. Because trials within VISTA did not routinely record detailed imaging parameters, we could not incorporate imaging data into the prognostic score. We chose this validated score for its simplicity, practicality, and relevance to our data.
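As a worked illustration of the score formula above, the sketch below evaluates it for a few hypothetical patients. It implements only the two-variable formula as stated in the text; the function name and the example ages and NIHSS values are assumptions for illustration.

```python
def weimar_score(age_years: float, nihss: float) -> float:
    """Prognostic score as summarized in the text: 145 - 0.46*age - 2.5*NIHSS.
    Higher values predict a better functional outcome."""
    return 145.0 - 0.46 * age_years - 2.5 * nihss


if __name__ == "__main__":
    # Hypothetical patients: (age in years, baseline NIHSS)
    for age, nihss in [(70, 10), (80, 20), (85, 25)]:
        print(age, nihss, round(weimar_score(age, nihss), 1))
```

For instance, a 70-year-old with NIHSS 10 scores 87·8, inside the 56–95 window derived in this study, whereas an 85-year-old with NIHSS 25 scores 43·4 and falls below it. Each NIHSS point moves the score about as much as 5·4 years of age (2·5 / 0·46).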
We used receiver operating characteristic (ROC) curves to confirm that outcome was associated with prognostic score and that this association was similar across all levels of mRS.
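One way to check that the association holds across all levels of mRS is to compute the area under the ROC curve at each of the six dichotomizations of mRS. The sketch below is an assumption about how such a check could be done, not the authors' procedure. It uses the rank (Mann–Whitney) form of the AUC, in which a higher prognostic score should accompany a lower (better) mRS. The pairwise loop is quadratic in sample size, so it suits small illustrations only.

```python
from typing import Dict, Sequence


def auc(better: Sequence[float], worse: Sequence[float]) -> float:
    """Rank-based AUC: probability that a patient with the better outcome has
    the higher prognostic score; ties count one half."""
    wins = 0.0
    for b in better:
        for w in worse:
            if b > w:
                wins += 1.0
            elif b == w:
                wins += 0.5
    return wins / (len(better) * len(worse))


def auc_by_mrs_cut(scores: Sequence[float], mrs: Sequence[int]) -> Dict[int, float]:
    """AUC for each dichotomization mRS 0..k versus k+1..6, for k = 0..5."""
    result: Dict[int, float] = {}
    for k in range(6):
        better = [s for s, m in zip(scores, mrs) if m <= k]
        worse = [s for s, m in zip(scores, mrs) if m > k]
        if better and worse:  # skip cut points where one group is empty
            result[k] = auc(better, worse)
    return result
```

Broadly similar AUC values across the six cut points would support treating the score's association with outcome as uniform over the mRS scale, which is also the intuition behind the proportional odds assumption used later.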
We ranked patients in VISTA according to predicted prognosis using Weimar's prognostic score. We then excluded patients with the worst prognosis and, at the other end of the range, patients with the best prognosis, effectively applying a prognostic score ‘window’ within which we retained the patients in our treatment sample. We aimed to maximize the size of this retained sample, because statistical power depends partly on sample size, while also maximizing the treatment effect size by excluding ‘minimal responders’, because power also depends on the magnitude of the effect. We assumed that the optimal window would represent a compromise between criteria that delivered a large sample with a modest average treatment effect and criteria that delivered a small sample with a large treatment effect.
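The ranking and windowing step described above can be sketched as follows. The record layout `(patient_id, age_years, baseline_nihss)` is hypothetical, and the bounds are treated as inclusive, matching the "56–95 inclusive" wording in the Results. Whether the authors rounded scores before comparison is not stated, so this sketch compares unrounded values.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout: (patient_id, age_years, baseline_nihss)
Record = Tuple[str, float, float]


def prognostic_score(age_years: float, nihss: float) -> float:
    """Weimar score as summarized in the Methods (higher = better predicted outcome)."""
    return 145.0 - 0.46 * age_years - 2.5 * nihss


def apply_window(records: Sequence[Record], lower: float, upper: float) -> List[str]:
    """Rank patients from best to worst predicted prognosis, then drop both tails:
    keep only those whose score lies in the inclusive window [lower, upper]."""
    ranked = sorted(records, key=lambda r: prognostic_score(r[1], r[2]), reverse=True)
    return [pid for pid, age, nihss in ranked
            if lower <= prognostic_score(age, nihss) <= upper]
```

Excluding the top tail removes patients likely to recover regardless of treatment, and excluding the bottom tail removes those with a likely fixed deficit. Both groups are the "minimal responders" the text describes.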
We used an iterative approach to select the optimal prognostic score window, first to identify a patient sample that had a treatment benefit identifiable at a statistical threshold of P < 0·05 and then a more restricted dataset with benefit detectable at P < 0·01.
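The iterative search above can be approximated by a brute-force scan over candidate lower and upper bounds. This is a sketch under explicit assumptions. The text does not say how the iteration proceeded. `p_value` is a placeholder for the CMH test or ordinal regression. The selection rule used here, which keeps the window retaining the most patients while reaching the significance threshold, is one hypothetical way to encode the size-versus-effect compromise.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical record: (prognostic_score, treated_flag, mrs_day90)
Patient = Tuple[float, bool, int]


def search_window(patients: Sequence[Patient],
                  candidate_bounds: Sequence[int],
                  p_value: Callable[[Sequence[Patient]], float],
                  alpha: float) -> Optional[Tuple[int, int, int]]:
    """Return (lower, upper, n_retained) for the inclusive score window that keeps
    the most patients while p_value(retained) < alpha, or None if none qualifies."""
    best: Optional[Tuple[int, int, int]] = None
    for lower in candidate_bounds:
        for upper in candidate_bounds:
            if upper <= lower:
                continue
            retained = [p for p in patients if lower <= p[0] <= upper]
            if not retained:
                continue
            if p_value(retained) < alpha and (best is None or len(retained) > best[2]):
                best = (lower, upper, len(retained))
    return best
```

Running the search first with `alpha=0.05` and then with `alpha=0.01` mirrors the two-stage approach in the text. In the stricter pass, only windows narrow enough to concentrate the effect survive. A full grid of this kind tests many windows, which is one reason the thresholds needed independent validation.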
Using ordinal logistic regression with the proportional odds model, adjusted for age and baseline NIHSS, we generated odds ratios with 95% confidence intervals for more favorable mRS in thrombolysed versus control patients, for each selected (better predicted outcome) or excluded (poorer predicted outcome) population. Significance was assessed by Cochran–Mantel–Haenszel (CMH) test.
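For readers unfamiliar with the CMH test on an ordinal outcome, the sketch below implements one common variant, the "row mean scores differ" statistic for two groups, using integer mRS grades 0–6 as scores. The text does not state which CMH variant or which strata were used. The stratification here, for example by NIHSS band, is an assumption, and the function is hypothetical rather than taken from any package.

```python
import math
from typing import Sequence, Tuple


def cmh_mean_score_test(
        strata: Sequence[Tuple[Sequence[int], Sequence[int]]]) -> Tuple[float, float]:
    """CMH row-mean-scores statistic for treated vs control across strata.

    strata: one (treated_mrs, control_mrs) pair per stratum, mRS used as the score.
    Returns (Q, p); Q is referred to chi-square with 1 degree of freedom.
    A negative sum of (observed - expected) means lower (better) mRS on treatment."""
    diff = 0.0  # sum over strata of observed minus expected treated score total
    var = 0.0   # sum over strata of the hypergeometric variance of that total
    for treated, control in strata:
        n1, n2 = len(treated), len(control)
        n = n1 + n2
        if n1 == 0 or n2 == 0:
            continue  # a one-arm stratum carries no information on the comparison
        scores = list(treated) + list(control)
        mean = sum(scores) / n
        ss = sum((y - mean) ** 2 for y in scores)
        diff += sum(treated) - n1 * mean
        var += n1 * n2 * ss / (n * (n - 1))
    if var == 0.0:
        return 0.0, 1.0
    q = diff ** 2 / var
    # Upper tail of chi-square(1) equals P(|Z| > sqrt(Q)) = erfc(sqrt(Q / 2))
    p = math.erfc(math.sqrt(q / 2.0))
    return q, p
```

The statistic compares, within each stratum, the summed mRS of treated patients with its expectation under no treatment effect, then pools these comparisons. This is why it can be combined with adjustment for prognostic factors through the choice of strata.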
We supplied the chosen prognostic score thresholds to an independent statistician who undertook CMH test, ordinal logistic regression, and dichotomized analysis of the individual patient data from the pooled RCT for patients treated 4·5–6 h from stroke onset.
Results are expressed as odds ratios and 95% confidence intervals (CIs) for more favorable mRS distribution under alteplase versus control, with P-values from the CMH test. Secondary dichotomized outcomes (mRS 0–1, mortality and PH2 rate) were analyzed as previously described.
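For the dichotomized outcomes, an odds ratio and its 95% CI can be illustrated from a 2 × 2 table using Woolf's log method. This is a simplification. The analyses in this study were adjusted for age and NIHSS by logistic regression, whereas this sketch is unadjusted. The zero-cell correction is an added assumption that matters for sparse outcomes such as PH2 among controls.

```python
import math
from typing import Tuple


def odds_ratio_woolf(a: float, b: float, c: float, d: float,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with Woolf (log-scale) confidence interval.

    a, b: treated patients with / without the outcome
    c, d: control patients with / without the outcome"""
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

The standard error is dominated by the smallest cell. This explains why a rare event such as PH2 among controls yields very wide intervals, like the 3·7–65·8 reported for the selected 4·5–6 h population.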
We ran exploratory analyses among 3–4·5 h and 0–3 h (0–90′ and 91–180′ combined) groups in a similar manner, recognizing that here the small sample sizes undermined power.
From the VISTA dataset, we sampled 1000 patients for our trial population to match the outcome distributions in the RCT data. Their baseline characteristics were: control, age 69 ± 13, baseline NIHSS 10 (interquartile range [IQR] 7–16); alteplase, age 68 ± 13, baseline NIHSS 13 (IQR 9–18).
The pooled RCT data consisted of 3670 patients, of whom 1120 were treated with alteplase or placebo between 4·5 and 6 h; patient demographics have been described previously.
Development of prognostic score boundaries
Outcome in both treatment groups varied according to prognostic score. Preliminary inspection confirmed that the simulated treatment effect was greatest among patients with moderate prognostic scores and was lost at extremes (data not shown).
Table 1 shows the most promising boundaries of prognostic score for selecting a population that appeared to benefit from thrombolysis. From this exploratory work, we identified the prognostic score range of 56–95 inclusive as delivering a population who may derive treatment benefit, significant at P < 0·01 and representing over 70% of the available patients. Wider limits of 47–104 inclusive retained approximately 90% of the population and appeared to offer benefit significant at P < 0·05.
Table 1. Finding boundaries of prognostic score above and below which alteplase should not be given, with 99% confidence limits and significance level of 0·01
When applied to the 1120 patients in the pooled RCT 4·5–6 h dataset, score limits of 56–95 retained 711 patients (64%; Fig. 2) and gave an odds ratio (OR) for improved mRS distribution of 1·13, 95% CI 0·87–1·47, CMH P = 0·89. More patients treated after 4·5 h who fulfilled the score limits achieved mRS 0–1 (OR 1·44, 1·02–2·05, P = 0·04) than in the overall population (OR 1·15, 0·88–1·51, P = 0·30). However, PH2 bleeds showed a trend to increase further, from OR 7·67, 2·99–19·7, P < 0·0001 to OR 15·6, 3·7–65·8, P = 0·0002. Applying the selection score did not reduce the elevated odds of mortality observed in the overall population; the confidence interval simply widened, from OR 1·58 (1·07–2·33), P = 0·02, to OR 1·56 (1·01–2·40), P = 0·04.
The wider prognostic boundaries of 47–104 gave ordinal OR 1·13 (0·90–1·41, CMH P = 0·40) in 988 patients (88%) (Fig. 3).
When applied to the 1620 patients in the pooled RCT 3–4·5 h dataset, in whom benefit is established, score limits of 56–95 retained 1013 patients (63%) and gave an OR for improved mRS distribution of only 1·05, 95% CI 0·84–1·30, CMH P = 0·27. Odds of achieving mRS 0–1 decreased as the boundaries were applied, and there was no advantageous effect on mortality or PH2 bleeds (Table 2).
Table 2. Odds ratios, 95% CIs and P-values from ordinal and logistic regressions for improvement on mRS, achieving mRS 0–1, mortality and occurrence of PH2 bleeds; all analyses adjusted for baseline NIHSS and age on admission. Score I represents the initial prognostic cut points of 56–95 and score II the secondary cut points of 47–104.
When applied to the 930 patients in the pooled RCT 0–3 h dataset, score limits of 56–95 retained 624 patients (67%) and similarly offered no advantage in terms of any outcome (Table 2).
Selection of patients for treatment initiation between 4·5 and 6 h based on simple clinical measures developed from an observational dataset failed to deliver a population in whom the alteplase effect would be both safe and effective.
We had postulated that by concentrating treatment among patients with low absolute risk of adverse outcomes, the adverse consequences of delayed treatment initiation might be limited sufficiently to uncover a net benefit. When we sought to validate this approach, we found that although favorable outcome (mRS 0–1) improved, the risk of PH2 bleeds was also greatly increased, and there was no net benefit. This mirrors the effects observed in the unselected 4·5–6 h dataset. It is possible that our approach correctly identifies patients with neither a fixed deficit nor almost certain recovery, but that restoring perfusion in these remaining patients simply carries a substantial risk of bleeding. Because these simple selection criteria could not improve the risk/benefit ratio for treatment within this time window, our findings undermine the use of ‘clinical judgement’ to choose patients for delayed treatment, unless the patient prefers to accept a higher risk of mortality in an attempt to achieve functional independence. Indeed, our approach also failed validation in the earlier time windows. Although we recognize that imaging could be an instructive tool, our approach here does not inform the debate on the use of more sophisticated imaging methods for patient selection. However, at this stage, perfusion–diffusion mismatch imaging has not developed sufficiently to be incorporated into routine practice for this purpose.
Our selection of prognostic score boundaries derives strength from the large, independent sample used to generate them and from the clinically relevant size of the selected population: around two-thirds of the patients treated within this time window in the RCTs. Weaknesses include the nonrandomized nature of our VISTA treatment groups; the restriction of our prognostic score to two variables (age and NIHSS) rather than including imaging parameters or other clinical variables; an absence of reliable data on PH2 bleeds among controls; and, possibly most crucial, the absence of real data on treatment administered beyond 4·5 h. Although adding further variables such as blood glucose and blood pressure to the prognostic score could give a more precise answer, these explain only a limited proportion of the variability and were previously discounted for the prognostic score by Weimar et al.
It is likely that most VISTA thrombolysis patients were treated within three hours of stroke onset. On average, NIHSS scores fall over the first hours after stroke onset. By using data from patients examined within three hours of onset to generate a simulated population treated 4·5–6 h from onset, without allowing for this average improvement, we will have slightly overestimated baseline severity.
Although our analysis based on ordinal outcomes failed to deliver a population in whom treatment >4·5 h was safe and effective, analysis of the dichotomized outcome (mRS 0–1) showed a significant benefit. Analysis of trial data by dichotomized outcomes has both proponents and opponents, the latter arguing that it may conceal useful treatment effects among subpopulations. Unless we can prospectively select patients for these subpopulations, the ordinal approach may remain optimal, as it better reflects the true outcomes of clinical practice.
R.L.Fulton is supported by studentships from Wyeth/Pfizer and Johnson and Johnson.