Using continuous data on tumour measurements to improve inference in phase II cancer studies

In phase II cancer trials, tumour response is either the primary or an important secondary endpoint. Tumour response is a binary composite endpoint determined, according to the Response Evaluation Criteria in Solid Tumors, by (1) whether the percentage change in tumour size is greater than a prescribed threshold and (2) (binary) criteria such as whether a patient develops new lesions. Further binary criteria, such as death or serious toxicity, may be added to these criteria. The probability of tumour response (i.e. ‘success’ on the composite endpoint) would usually be estimated simply as the proportion of successes among patients. This approach uses the tumour size variable only through a discretised form, namely whether or not it is above the threshold. In this article, we propose a method that also estimates the probability of success but that gains precision by using the information on the undiscretised (i.e. continuous) tumour size variable. This approach can also be used to increase the power to detect a difference between the probabilities of success under two different treatments in a comparative trial. We demonstrate these increases in precision and power using simulated data. We also apply the method to real data from a phase II cancer trial and show that it results in a considerably narrower confidence interval for the probability of tumour response.

3. The log tumour size ratio is a bivariate normal random variable, but the covariance matrix for an individual is

$$\begin{pmatrix} 0.5\left(\dfrac{z_{i0}}{E(z_{i0})}\right)^{2} & 0.5\left(\dfrac{z_{i0}}{E(z_{i0})}\right)^{2} \\ 0.5\left(\dfrac{z_{i0}}{E(z_{i0})}\right)^{2} & \left(\dfrac{z_{i0}}{E(z_{i0})}\right)^{2} \end{pmatrix}.$$

Thus, the standard deviation of the log tumour size ratio increases linearly with the baseline tumour size, but the average standard deviation across all patients is as before. The means of the two treatments are chosen as in the main paper.
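The scaling of the covariance matrix by baseline tumour size can be illustrated with a short simulation. This is only a minimal sketch: the function name `simulate_log_ratios`, the baseline sizes and the mean vector are hypothetical choices for illustration (the actual means are those of the main paper and are not reproduced here), and only the Python standard library is used.

```python
import math
import random

# Unscaled covariance of the (interim, final) log tumour size ratios,
# as read from the text: [[0.5, 0.5], [0.5, 1.0]].
BASE_COV = ((0.5, 0.5), (0.5, 1.0))


def simulate_log_ratios(z0, mean_z0, mu, rng):
    """Draw (interim, final) log tumour size ratios for one patient.

    The covariance is BASE_COV multiplied by (z0 / mean_z0) ** 2, so both
    standard deviations grow linearly with the baseline size z0.
    """
    scale = z0 / mean_z0
    # Cholesky factor of BASE_COV, written out by hand for the 2x2 case.
    l11 = math.sqrt(BASE_COV[0][0])
    l21 = BASE_COV[1][0] / l11
    l22 = math.sqrt(BASE_COV[1][1] - l21 ** 2)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    y1 = mu[0] + scale * l11 * e1
    y2 = mu[1] + scale * (l21 * e1 + l22 * e2)
    return y1, y2


# Hypothetical illustration: means and baseline sizes are assumptions.
rng = random.Random(2024)
mu = (0.5 * math.log(0.7), math.log(0.7))
for z0 in (0.5, 1.0, 2.0):
    finals = [simulate_log_ratios(z0, 1.0, mu, rng)[1] for _ in range(10000)]
    mean = sum(finals) / len(finals)
    sd = math.sqrt(sum((v - mean) ** 2 for v in finals) / len(finals))
    print(f"baseline {z0}: sample SD of final log ratio = {sd:.2f}")
```

In this sketch a patient whose baseline size is twice the average has standard deviations twice as large as a patient at the average, which is the linear relationship described above.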

Sensitivity analyses
The augmented binary approach makes several assumptions. We conducted a sensitivity analysis to three of these assumptions: 1) the hazard of non-shrinkage failure depends only on the most recent tumour size observation; 2) the various binary reasons for failure (new lesions, toxicity and death) can be combined and modelled as one process; and 3) the log tumour size ratios are bivariate normally distributed with constant covariance matrix.

Sensitivity to assumption that probability of failure depends on most recent tumour size observation
To investigate the sensitivity of the augmented binary method to assumption 1), we simulated tumour shrinkage data and failure data for patients as if they were observed at four post-baseline timepoints (referred to in the following text as timepoints 1, 2, 3 and 4) instead of two. Timepoints 2 and 4 are the interim and final timepoints from the two-timepoint trial. The tumour size data were simulated assuming that the log tumour size ratios were multivariate normal with mean $(0.25\delta_1, 0.5\delta_1, 0.75\delta_1, \delta_1)$ and covariance [covariance matrix not recoverable]. We simulated non-comparative data with $n = 75$, $\delta_1 = \log(0.7)$, and no dropout for non-failure reasons. The parameters used in the failure model, $\alpha_D$ and $\gamma_D$, were set to either $(\alpha_D, \gamma_D) = (-2, 0.1)$ or $(\alpha_D, \gamma_D) = (-2.75, 0.2)$. The positive $\gamma_D$ values mean that patients with large tumour sizes at the previous timepoint are more likely to fail at the next timepoint than are those with smaller tumour sizes.
For each simulation replicate, the shrinkage data and failure data were simulated for all four timepoints. The data from timepoints 1 and 3 were then discarded, so that patients were observed only at the baseline, interim and final timepoints. A patient who failed at timepoint 1 or 3 was recorded as a failure at the interim or final timepoint, respectively. The potential for bias here is that a patient's tumour size may be quite different at baseline and at timepoint 1 (say), but the augmented binary model will only use the baseline tumour size.
However, from 5000 simulation replicates, the coverage of the augmented binary approach was 94.46% when $\gamma_D = 0.1$ and 94.48% when $\gamma_D = 0.2$, which is not noticeably different from the results in the main paper. Thus it does not appear that the augmented binary method is particularly sensitive to this assumption.
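One way to judge whether coverage estimates such as these differ noticeably from a target is to compare them with the Monte Carlo error of a proportion estimated from 5000 replicates. The sketch below assumes a nominal coverage of 95%, which is an assumption made for illustration and is not stated in this section; the helper name `mc_interval` is likewise hypothetical.

```python
import math


def mc_interval(p_nominal, n_reps, z=1.96):
    """Approximate 95% Monte Carlo range for an estimated proportion
    when the true value is p_nominal and n_reps replicates are used."""
    se = math.sqrt(p_nominal * (1.0 - p_nominal) / n_reps)
    return p_nominal - z * se, p_nominal + z * se


# Assumed nominal level of 0.95; 5000 replicates as reported in the text.
lower, upper = mc_interval(0.95, 5000)
for observed in (0.9446, 0.9448):
    inside = lower <= observed <= upper
    print(f"coverage {observed:.4f} within [{lower:.4f}, {upper:.4f}]: {inside}")
```

Under that assumption the Monte Carlo standard error is about 0.3 percentage points, so both reported coverages lie within sampling error of 95%.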

Sensitivity to modelling different reasons for failure as one category
To test the sensitivity of type-I error rate and power to assumption 2) we simulated data using separate models for the probability of new lesions and probability of death or toxicity. For the sensitivity analysis, we simulated data such that the parameters representing the effect of treatment on failure due to new lesions, and failure due to death/toxicity respectively have the same magnitude, but have opposite signs. This means that the overall probability of failure is the same in each treatment arm, but that the probability of new lesions is lower on one arm, and the probability of toxicity or death is lower on the other arm. The two differences cancel each other out, so that the probability of detecting a difference in success probability between treatments should be the type-I error rate. A range of treatment and tumour size parameter values were considered. For each set of parameter values, we estimated the probability of rejecting the null hypothesis when there was no difference in tumour shrinkage between treatments, and when the value of x was 0.35.
The results of this analysis are given in table 2. They show that violating the assumption that two distinct reasons for non-shrinkage failure can be combined into one outcome has little noticeable effect on the type-I error rate of any of the three approaches. When the magnitude of the effect of tumour size is large, there is a small inflation in the type-I error rate of the logistic regression approach. In the extreme case where $\gamma_D = 0.2$ and $\beta_D = 1$, the type-I error rate was estimated to be 0.067 for the logistic regression approach. This inflation is explained by the fact that the fitted model is mis-specified. The power of the augmented binary approach fell modestly when the direction of effects differed, whereas the power of the logistic regression approach fell more sharply. Thus, although the type-I error rate and power of the augmented binary approach are somewhat sensitive to this assumption, the sensitivity is no greater than that of the logistic regression approach.
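The cancellation of opposite-signed treatment effects described in this section can be checked numerically in a simplified setting. The sketch below is not the simulation model used for table 2: it assumes two independent logistic failure processes with a common intercept, no tumour size effect, and a treatment indicator coded as -0.5 and +0.5. All of these, along with the name `failure_probs` and the parameter values, are assumptions made for illustration.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def failure_probs(arm, alpha, beta):
    """Return (P(new lesions), P(death/toxicity), P(any failure)).

    arm is coded -0.5 or +0.5 (hypothetical coding). The treatment effect
    is +beta for new lesions and -beta for death/toxicity, and the two
    failure processes are assumed independent.
    """
    p_new_lesion = expit(alpha + beta * arm)
    p_death_tox = expit(alpha - beta * arm)
    p_any = 1.0 - (1.0 - p_new_lesion) * (1.0 - p_death_tox)
    return p_new_lesion, p_death_tox, p_any


# Illustrative parameter values only.
for arm in (-0.5, 0.5):
    p_new, p_tox, p_any = failure_probs(arm, alpha=-2.0, beta=1.0)
    print(f"arm {arm:+.1f}: new lesions {p_new:.3f}, "
          f"death/toxicity {p_tox:.3f}, any failure {p_any:.3f}")
```

Under these assumptions the two arms swap their cause-specific failure probabilities while the overall failure probability is identical, so any rejection of the null hypothesis of equal success probabilities is a type-I error.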

Table 2: Estimated type-I error rate and power when the treatment effect parameters in the models used to simulate failure due to new lesions and failure due to death/toxicity have the same magnitude, but different sign.

Sensitivity to normality of log tumour shrinkages
To test sensitivity to the assumptions made about the distribution of tumour shrinkages, we simulated tumour shrinkages using three different models: 1) the tumour-size ratios (rather than the logarithm of the ratios) are bivariate normal with constant covariance matrix; 2) the differences in tumour size between each visit are bivariate normal with constant covariance; 3) the log tumour-size ratios are bivariate normal, but the covariance matrix is a linear function of the baseline tumour size. The estimated type-I error rate and power are shown in table 3. The type-I error rate of the augmented binary approach shows no inflation for models 2) and 3). There was a small inflation (the observed type-I error rate was 0.057) for model 1). The power advantage of the augmented binary approach over logistic regression is sensitive to the endpoint used: under model 2), the power of the augmented binary approach (0.554) was lower than that of the logistic regression (0.580). These results indicate that when analysing real data, assessing the plausibility of the assumption of normality of the log tumour-size ratio is advisable. If there is evidence against the assumption of normality of the log tumour size ratio, the augmented binary approach can be easily modified so that a different function of the tumour shrinkage is used in the model represented in equation 1 in section 2.1 of the main paper.

Table 3: Estimated type-I error rate and power when the mechanism for generating tumour shrinkages differs. Scenario 1: the tumour-size ratios (rather than the logarithm of the ratios) are bivariate normal with constant covariance matrix; scenario 2: the differences in tumour size between each visit are bivariate normal with constant covariance; scenario 3: the log tumour-size ratios are bivariate normal, but the covariance matrix is a linear function of the baseline tumour size.
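As a rough illustration of how the plausibility of normality might be assessed, the sketch below computes the sample skewness and excess kurtosis of the log tumour-size ratios, both of which are close to zero for normally distributed data. This is a crude descriptive check chosen for illustration, not a procedure taken from the paper; the function names and the simulated data are hypothetical, and a formal test or a normal quantile plot could be used instead.

```python
import math
import random


def skewness_and_excess_kurtosis(values):
    """Moment-based sample skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def log_ratios(baseline, follow_up):
    """Log of follow-up tumour size divided by baseline tumour size."""
    return [math.log(f / b) for b, f in zip(baseline, follow_up)]


# Hypothetical data: follow-up sizes generated so that the log ratios
# are normal, which makes the untransformed ratios right-skewed.
rng = random.Random(11)
baseline = [rng.uniform(1.0, 10.0) for _ in range(5000)]
follow_up = [b * math.exp(rng.gauss(math.log(0.7), 0.5)) for b in baseline]

logs = log_ratios(baseline, follow_up)
raw = [f / b for b, f in zip(baseline, follow_up)]
print("log ratios (skew, excess kurtosis):", skewness_and_excess_kurtosis(logs))
print("raw ratios (skew, excess kurtosis):", skewness_and_excess_kurtosis(raw))
```

If the log ratios showed marked skewness or heavy tails while another function of the shrinkage did not, that would point towards modelling the alternative function, in line with the modification suggested above.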