Separation and the information theory surrogate evaluation approach: A penalised likelihood solution

Surrogate evaluation is an important topic in clinical trials research; the use of a surrogate in place of a primary endpoint of interest is common but remains a contentious and much-debated issue. Statistical techniques to assess potential surrogates are closely scrutinised by the research community given the complexities of such an assessment. One such technique is the information theory surrogate evaluation approach, which is well-established, practical and theoretically sound. In the context of discrete outcomes, we investigated issues of bias due to inefficiency, overfitting and separation (sparse data) that have not been recognised or addressed previously. The most serious cause of bias is separation in trial information. We outline the concerns surrounding this bias and conduct a simulation study to investigate whether a penalised likelihood technique provides an appropriate solution. We found that removing trials with separation from surrogacy evaluation resulted in a large amount of discarded data. Conversely, the penalised likelihood technique allows retention of all trial information and enables precise and reliable surrogate estimation. The information theory approach is a critical tool for conducting surrogate evaluation. This work strengthens its practical application, allowing analyses to be adapted or results summarised with appropriate caution to mitigate the biases highlighted, especially where separation occurs. The adoption of the penalised likelihood technique into information theory surrogate evaluation is a useful addition that solves an issue likely to arise frequently in the context of categorical endpoints.


| INTRODUCTION
Surrogates are measures of treatment effect that can be evaluated early and inform on the treatment effect on the primary outcome of interest. The use of a valid surrogate in place of the primary outcome offers potentially huge cost and time benefits. However, the use of an invalid surrogate could be extremely detrimental to the drug development process and to patient safety. Evaluating surrogates is as crucial as it is difficult: complexities in treatment mechanisms of action can mask potential inadequacies in a surrogate.
One well-established practical approach to surrogate evaluation is a multi-trial approach using information theory.1,2 This approach generates estimates of surrogacy at two levels. Trial-level surrogacy quantifies the association between treatment effect estimates on the surrogate and true outcome in a trial, while individual-level surrogacy measures the correlation at the individual patient level after adjusting for treatment. The information theory approach has been extended to the case of continuous,3 binary,4 ordinal,5 time-to-event6 and longitudinal outcomes.7 All of these settings have been thoroughly investigated via case studies and simulations. These multi-trial methodologies have also been used frequently in real applications to inform clinical trial practice,8 and there are calls for their use to be a requirement of regulatory bodies for studies investigating new drugs based on a surrogate.9-11 Finally, SAS code and an R package help the applied researcher implement this methodology.12
We previously extended the information theory approach to the case of a binary surrogate and ordinal true outcome (the binary-ordinal setting).5 We identified three forms of potential bias in assessing trial-level surrogacy, due to inefficiency, overfitting and the impact of separation in discrete outcomes. Underestimation (of the strength of surrogacy) occurred when data were available from a large number of trials, due to the loss of efficiency inherent in discrete outcomes and the two-stage nature of the modelling. Overestimation was present when only small numbers of trials were available, due to overfitting in the second stage of modelling. The most serious of the three issues identified was the impact of separation (e.g., a zero cell in a cross-tabulation of two binary outcomes). Discrete outcomes are very common in medical practice, but the usual logistic regression analysis of these is biased in the presence of separation.
These issues require thorough elucidation so that the analysis of discrete outcomes, and in particular the information theory approach to surrogate evaluation, can be optimised. We investigate the causes of the biases due to inefficiency, overfitting and separation, the form they take and the conditions and settings under which they are strongest and most prevalent. Finally, we offer a solution to the most serious form of bias identified, that which occurs in the presence of separation.
In Section 2 we summarise the information theory surrogate evaluation approach and show how it is applied in the binary-ordinal setting to evaluate trial-level surrogacy. In Section 3 we outline how bias has a serious effect on surrogate evaluation in the presence of separation and present a penalised likelihood technique as a solution. In Section 4 we discuss how inefficiency and overfitting affect estimation. In Section 5 we present a simulation study to explore issues of bias in more detail, with conclusions in Section 6.

| THE INFORMATION THEORY APPROACH
Since the biases identified affect only trial-level surrogacy, this is the only information theory surrogate evaluation measure we derive in this section.
In what follows, Y represents a discrete random variable with values k_b, b = 1, …, m_y, and probability of occurrence p_b for each value. We represent a putative surrogate as S, the treatment group indicator as Z and the true ordinal outcome as T. The categories of T are denoted by w = 1, …, W. In the multi-trial context there are i = 1, 2, …, N trials, and j = 1, 2, …, n_i patients per trial.
Surrogate evaluation was previously proposed using a meta-analytical approach with a joint mixed model of the true and surrogate outcomes regressed on treatment. However, the model was found to be computationally burdensome.13 Tibaldi et al.14 suggested that a two-stage fixed effects approach13 would be preferable to the full mixed effects model as it is more computationally feasible, with only a minor loss of statistical efficiency (for normally distributed outcomes). This two-stage approach was found to work well in various settings (e.g., with binary, continuous or time-to-event outcomes). However, at the individual level, different measures of association in different settings meant there was no consistent interpretation. The information theory approach1 was developed to resolve this inconsistency.
Information theory15 concerns information, choice and uncertainty in a draw from a random process. Entropy is a key concept that quantifies the amount of information gained from such a draw. Entropy can be expressed mathematically as H(Y) = −∑_{b=1}^{m_y} p_b log(p_b), where Y is a discrete random variable with values k_1, k_2, …, k_{m_y} and probabilities p_1, p_2, …, p_{m_y} respectively. A full list of the properties of entropy can be found in Shannon and Weaver.15 An extension of the concept of entropy to continuous outcomes is a measure called entropy power (EP), which is used to compare random variables.15 Mutual information is a key concept that quantifies the amount by which the uncertainty in one variable is expected to be reduced if information about another variable is known. It is defined as I(X, Y) = H(Y) − H(Y|X), where H(Y|X) is the conditional entropy of Y given X. In the case of surrogate outcomes, this quantity can be considered as the information in T that is shared by S.
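As an illustration of these quantities, entropy, conditional entropy and mutual information for discrete variables can be computed directly from a joint distribution. The sketch below is our own (not part of the cited software); the joint probabilities are arbitrary illustrative values.

```python
import math

def entropy(probs):
    """Shannon entropy H(Y) = -sum p_b * log(p_b), skipping zero cells."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X, Y) = H(Y) - H(Y|X) for a joint distribution {(x, y): p}."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    # Marginal entropy H(Y).
    h_y = entropy([sum(joint.get((x, y), 0.0) for x in xs) for y in ys])
    # Conditional entropy H(Y|X) = sum_x p(x) * H(Y | X = x).
    h_y_given_x = 0.0
    for x in xs:
        p_x = sum(joint.get((x, y), 0.0) for y in ys)
        if p_x > 0:
            h_y_given_x += p_x * entropy(
                [joint.get((x, y), 0.0) / p_x for y in ys])
    return h_y - h_y_given_x

# Perfect dependence: knowing X removes all uncertainty in Y, I(X,Y) = H(Y).
perfect = {(0, 0): 0.5, (1, 1): 0.5}
print(mutual_information(perfect))  # log(2) ≈ 0.693

# Independence: knowing X tells us nothing about Y.
indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(mutual_information(indep))  # 0.0
```

In the surrogacy context, Y plays the role of T and X the role of S: the larger I(S, T), the more of the uncertainty in the true outcome is shared with the surrogate.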
Alonso and Molenberghs1 proposed a trial-level measure of surrogacy based on these concepts, called R²_ht:

R²_ht = (EP(β) − EP(β|α)) / EP(β),    (1)

where α_i and β_i are the treatment effects in trial i on the surrogate and true outcome respectively, EP(β|α) is the entropy power of the distribution of β_i given the distribution of α_i, and EP(β) is the entropy power of the distribution of β_i. R²_ht can be interpreted as the proportion of uncertainty in the treatment effects on T removed by adjusting for treatment effects on S. In the bivariate continuous setting, R²_ht can be shown to reduce to the trial-level surrogacy measure proposed under the meta-analytical approach. These concepts are consistent with the aims of surrogate evaluation, which is concerned with increasing our knowledge about the treatment effect on the true outcome using the surrogate.
In order to estimate R²_ht, Alonso and Molenberghs1 suggested using the likelihood reduction factor (LRF) introduced by Alonso et al.16 This provides consistent estimation of surrogacy; ranges in the unit interval; has a common interpretation across settings; and avoids evaluating the high-dimensional integrals and joint models for X and Y that would otherwise be required when fitting the models of the information theoretic approach.

| Likelihood reduction factor: binary-ordinal setting
Here we estimate the likelihood reduction factor using the two-stage approach outlined by Alonso and Molenberghs1 but applied to the binary-ordinal setting described in Ensor and Weir.5 At the trial level, we focus on the treatment effects on the surrogate in relation to the treatment effects on the true outcome. At the first stage, interest is in the intercept and treatment effects for each trial on the binary surrogate and ordinal true outcome: μ_Si, μ_Ti, α_i and β_i respectively. These are found by regressing the surrogate and the true outcome on treatment using the logistic regression model (2) and the proportional odds model (3) respectively.
At the second stage, the parameter estimates from stage one are used in two further models: the intercept-only model (4) of the treatment effects on the true outcome for each trial; and model (5), in which the treatment effects on the true outcome are regressed on the intercept and treatment effect estimates for the surrogate for each trial.
where γ_3 and γ_0 are the intercept parameters with and without adjustment for the surrogate, and γ_1 and γ_2 are the parameters for the surrogate intercept and treatment effects. We then calculate the difference in −2 log-likelihood between these two models, denoted G², to determine the LRF.
The LRF is defined as

LRF = 1 − exp(−G²/N),    (6)

where N is the total number of trials. In this case, the difference in −2 log-likelihood summarises the amount of information on the treatment effect estimates on the true outcome, β̂_i, explained by the addition to the model of the treatment effect estimates on the surrogate, α̂_i. The LRF therefore links conceptually to R²_ht, which is defined in Equation (1) as the 'proportion of uncertainty in the treatment effects on T removed by adjusting for treatment effects on S'.
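For intuition, the stage-two computation can be sketched in a few lines. The sketch below is our own illustration using the reduced stage-two model (β̂_i regressed on α̂_i alone), so that ordinary least squares has a closed form; for normal linear models the difference in −2 log-likelihood is G² = N log(RSS₀/RSS₁), and LRF = 1 − exp(−G²/N) then coincides with the classical R² of the stage-two regression. The stage-one estimates here are invented for illustration.

```python
import math, random

def stage_two_lrf(beta_hat, alpha_hat):
    """G^2 and LRF for the reduced stage-two model: beta_hat_i on alpha_hat_i
    (one pair of treatment effect estimates per trial)."""
    n = len(beta_hat)
    mean_b = sum(beta_hat) / n
    mean_a = sum(alpha_hat) / n
    # Closed-form OLS slope and intercept for a single predictor.
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in zip(alpha_hat, beta_hat))
    sxx = sum((a - mean_a) ** 2 for a in alpha_hat)
    slope = sxy / sxx
    intercept = mean_b - slope * mean_a
    # Residual sums of squares: intercept-only model vs reduced model.
    rss0 = sum((b - mean_b) ** 2 for b in beta_hat)
    rss1 = sum((b - (intercept + slope * a)) ** 2
               for a, b in zip(alpha_hat, beta_hat))
    g2 = n * math.log(rss0 / rss1)   # difference in -2 log-likelihood
    lrf = 1.0 - math.exp(-g2 / n)    # likelihood reduction factor
    return g2, lrf

# Hypothetical stage-one estimates for 20 trials, strong association.
random.seed(1)
alpha_hat = [random.gauss(0.0, 1.0) for _ in range(20)]
beta_hat = [0.8 * a + random.gauss(0.0, 0.3) for a in alpha_hat]
g2, lrf = stage_two_lrf(beta_hat, alpha_hat)
print(round(lrf, 3))  # lies in [0, 1]; high here, as simulated
```

Because the LRF here equals the classical R² of the stage-two regression, the overfitting behaviour of R² discussed in Section 4 carries over directly to R²_ht.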
Confidence intervals can be calculated using an approach by Kent;17 the construction of these is detailed in Ensor and Weir.5

| SEPARATION
Where separation or quasi-complete separation of categorical variables occurs, there is no unique maximum likelihood.18 Let us consider the case of two binary variables where one is regressed on the other as in Equation (2). Complete and quasi-complete separation relate to the existence of empty cells in the cross-tabulation of S and Z. In Table 1A-C respectively we present the case of: no separation, with no empty cells; complete separation, where the binary variable Z perfectly predicts S; and quasi-complete separation, where one cell is empty. For two binary variables, the maximum likelihood estimate of the log odds ratio is

φ̂ = log( (n₁₁ n₀₀) / (n₁₀ n₀₁) ),    (7)

where n_sz denotes the number of patients with S = s and Z = z. If a zero occurs in the denominator or numerator of (7) the function is undefined (φ̂ = ∞ in the case of the denominator and φ̂ = log(0) = −∞ in the case of the numerator). Therefore, there is no maximum likelihood estimate in the presence of separation.18 Quasi-complete separation can occur when an ordinal variable is regressed on a binary one, in a similar manner and with similar consequences to the binary case. In the information theory approach, separation can therefore arise when fitting model (2) for the binary surrogate, or model (3) for the ordinal true outcome, within any individual trial.
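The breakdown of the estimate in (7) is easy to demonstrate on 2 × 2 cross-tabulations of the kind shown in Table 1. The snippet below is an illustrative sketch with invented cell counts.

```python
import math

def log_odds_ratio(n11, n10, n01, n00):
    """phi_hat = log((n11 * n00) / (n10 * n01)); infinite (undefined as an
    MLE) whenever a zero cell makes the numerator or denominator zero."""
    num, den = n11 * n00, n10 * n01
    if den == 0:
        return math.inf    # zero in the denominator: phi_hat = +inf
    if num == 0:
        return -math.inf   # log(0): phi_hat = -inf
    return math.log(num / den)

# No separation: all four cells occupied, a finite estimate exists.
print(log_odds_ratio(8, 2, 3, 7))    # ≈ 2.23
# Quasi-complete separation: one empty cell, no finite MLE.
print(log_odds_ratio(8, 0, 3, 7))    # inf
# Complete separation: Z perfectly predicts S.
print(log_odds_ratio(10, 0, 0, 10))  # inf
```

In practice a logistic regression routine does not return these infinities directly; as described below, it iterates towards them until an iteration limit is reached, returning a very large estimate with a very large standard error.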

| Impact of separation on information theory surrogate evaluation
When complete or quasi-complete separation occurs, this typically causes problems with maximum likelihood estimation for generalised linear models. In the typical scenario, the model iterates repeatedly, trying to converge.18 The affected parameter estimate increases on each iteration and continues to do so until a fixed iteration limit is reached. By this point, the parameter estimate will generally be large and its standard error very large.
TABLE 1 Examples of complete and quasi-complete separation in the binary and ordinal setting

At the first stage of surrogacy estimation, S and Z (both binary) and T and Z (one ordinal, one binary) are regressed on one another for each trial in models (2) and (3). This returns treatment effect estimates on the binary surrogate and the ordinal true outcome. However, in the presence of separation these estimates will be biased, and since they are used in the modelling at stage two they tend to produce outlying points in the stage-two regression. The LRF, Equation (6), is then based on models with potentially highly influential outliers. This leads to unreliable estimation of R²_ht, with a tendency to underestimate the true value. For a visual representation of the impact of separation see the results of a surrogacy assessment in Figure 1. The randomised trial Clots in Legs Or sTockings after Stroke (CLOTS 3)20 aimed to determine whether compression aids reduced the occurrence of deep vein thrombosis in immobile patients who had suffered stroke. We assessed whether binary measures of deep vein thrombosis taken within 30 days of a stroke could be used as a surrogate in place of an ordinal measure of death and disability at 6 months post stroke. If centres (which can be used in place of trials in surrogacy evaluation21) with separation are retained in the usual information theory surrogacy assessment, various outlying points can be seen at the left and right ends of the x-axis in the second stage of modelling. These are due to separation and could strongly influence the regression parameter estimates. This issue occurs in several centres in this real-life example. In the presence of separation R²_ht = 0.145, 95% CI (0.027, 0.325), whereas using the penalised likelihood approach R²_ht = 0.077, 95% CI (0.003, 0.231).

| Penalised likelihood solution to separation issues
Various possible solutions to the issue of separation18 include deleting problematic variables; combining categories (in our case trials); reporting only the likelihood ratio statistics; and using exact logistic regression, penalised maximum likelihood or Bayesian estimation. Given the variables of interest, a desire to retain trial-specific information and the parameter estimation required in surrogacy evaluation, only the latter three options are available to us. Allison18 found that a Bayesian approach with uninformative priors led to convergence problems. Furthermore, parameter estimation is reportedly better for penalised maximum likelihood than for exact logistic regression.22
The penalised likelihood technique of Firth19 was originally introduced to reduce bias in maximum likelihood estimates in logistic regression. In particular, it applies to small samples, where bias increases away from zero; infinite parameter estimates in the case of separation can be thought of as an extreme example.22 Consider a scalar parameter θ of an exponential family model with log-likelihood l(θ); the sufficient statistic t then affects the location but not the shape of the score function U(θ). At the true value of θ we have E(U(θ)) = 0, so the score function is unbiased. It is also true that if the score function is linear in θ then E(θ̂) = θ. However, when the score function is curved in θ, as is the case under separation, Firth notes that the unbiasedness of the score function combined with this curvature induces a bias in θ̂, so that E(θ̂) ≠ θ. Firth19 favoured a systematic modification of the score function (adding a bias-correction term) to prevent bias in θ̂, rather than correcting an already biased estimate of θ. Specifically, for a vector of parameters θ_r, r = 1, …, p, the maximum likelihood estimates are usually determined from the score equations U(θ_r) = ∂ log l/∂θ_r = 0.
For the exponential family, Firth suggested a modified score equation, equivalent to maximising the penalised likelihood L*(θ) = L(θ)|i(θ)|^{1/2}, where i(θ) is the Fisher information and the penalty term |i(θ)|^{1/2} is the Jeffreys invariant prior.23 The influence of this penalty is asymptotically negligible. Firth19 showed that this technique removed the overall bias in parameter estimation.
Heinze and Schemper22 applied the technique of Firth19 to deal with instances of separation in the logistic regression context. In their assessment, Heinze and Schemper22 showed that it was 'an ideal solution to separation', producing finite parameter estimates that are superior overall to those from alternative methods. Therefore, we apply the penalised likelihood technique of Firth19 to resolve surrogacy estimation issues which result from separation.
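To make the modification concrete, the sketch below implements Firth's modified-score iteration for a logistic regression with an intercept and a single covariate, as in model (2) fitted to one trial's data. This is our own minimal implementation, for illustration only (real analyses should use logistf or equivalent); it uses the standard form of the modified score for logistic regression, U*(β) = Σᵢ (yᵢ − pᵢ + hᵢ(½ − pᵢ)) xᵢ, where hᵢ are the leverages of the weighted hat matrix.

```python
import math

def firth_logistic(z, y, max_iter=50, tol=1e-8):
    """Firth-penalised logistic regression of binary y on covariate z with
    an intercept: two-parameter Newton iteration on the modified score."""
    b0, b1 = 0.0, 0.0  # intercept, slope
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * zi))) for zi in z]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information I = X'WX for design matrix X = [1, z].
        i00 = sum(w)
        i01 = sum(wi * zi for wi, zi in zip(w, z))
        i11 = sum(wi * zi * zi for wi, zi in zip(w, z))
        det = i00 * i11 - i01 * i01
        a, b_, c = i11 / det, -i01 / det, i00 / det  # explicit 2x2 inverse
        # Leverages h_i = w_i * x_i' I^{-1} x_i of the weighted hat matrix.
        h = [wi * (a + 2.0 * b_ * zi + c * zi * zi)
             for wi, zi in zip(w, z)]
        # Modified score U*(beta) = X'(y - p + h(1/2 - p)).
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * zi for ri, zi in zip(r, z))
        # Newton step: beta <- beta + I^{-1} U*.
        d0 = a * u0 + b_ * u1
        d1 = b_ * u0 + c * u1
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1

# Complete separation (as in Table 1B): z perfectly predicts y, so the
# ordinary MLE is infinite, yet the Firth estimates remain finite.
z = [0] * 10 + [1] * 10
y = [0] * 10 + [1] * 10
b0, b1 = firth_logistic(z, y)
print(b0, b1)  # finite estimates despite complete separation
```

For this saturated 2 × 2 example the converged slope equals log((10.5 × 10.5)/(0.5 × 0.5)) ≈ 6.09, matching the familiar ½-cell correction of the cross-tabulation. A production implementation such as logistf additionally uses step-halving and profile penalised likelihood confidence intervals, which this sketch omits.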
Penalised likelihood techniques may be implemented for generalised linear models in R using the logistf function in the logistf package and pordlogist in the OrdinalLogisticBiplot package.24

| OVERFITTING AND INEFFICIENCY IN THE INFORMATION THEORY APPROACH
Besides separation, we also previously identified issues of underestimation and overestimation.5 Underestimation worsened as the number of trials increased, presumably due to the inefficiency of the two-stage approach compounded by the presence of discrete outcomes. We found that the general literature supported this assessment: Molenberghs et al.25 investigated partitioning a large dataset and applying a logistic regression to each partition to obtain multiple estimates of the parameter of interest. The mean of these estimates was then compared to the estimate from the model using all the data. They showed that inefficiency occurred when the number of partitions (in our case trials) was large compared to the size of the partitions. In addition, parameter estimation is less efficient if binary or ordinal outcomes are used in place of continuous outcomes.26,27
Overestimation occurred in the presence of weak surrogacy and a small number of trials. This overestimation was due to overfitting of the regression at the second stage of modelling to a small number of data points (one for each trial). Since the second-stage models affected by overfitting are normal linear models, irrespective of the type of outcome being studied, such overfitting would be expected to be present in all settings. The classical R² coefficient of determination is known to be biased and inflated, particularly with small sample sizes and/or too many predictors. The information theory approach was introduced to address issues of unified interpretation at the individual level; at the trial level, the calculation of R²_ht has been shown to be consistent with the classical R² measure,28 and therefore R²_ht also suffers this bias. The classical R² can be adjusted by calculating the required shrinkage to provide an unbiased estimator of the population R², which we denote R²_adjC:

R²_adjC = 1 − (1 − R²)(N − 1)/(N − p − 1),

where N is the number of trials (the number of data points at the second stage) and p is the number of predictors.
Unbiased estimation of the population surrogacy strength is the primary focus of surrogacy evaluation. In order to assess overfitting of R²_ht we will also present R²_adjC in simulations. Methodologists have previously discussed a full and a reduced model at the second stage of analysis. The full model is shown in Equation (5) and has two explanatory variables; the reduced model is the same equation without the trial-specific intercept estimates, μ̂_Si. The calculation of R²_ht for the reduced model proceeds in the same manner as for the full model but is based on the reduced regression. Tibaldi et al.14 explored simplified means of surrogacy assessment and concluded that in general the full model confers a small benefit and should be used in practice; this has since been the convention. Previous simulations focused only on strong surrogacy, predominantly R²_ht = 0.9, which explains why the issue of overestimation was not identified previously. A model based on fewer explanatory variables is likely to suffer less inflation of R²_ht; we therefore revisit this convention by presenting results for a reduced R², namely R²_ht:R. For completeness we also include the adjusted estimate for the reduced model, R²_adjC:R, in our simulations.
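The shrinkage adjustment underlying R²_adjC can be illustrated numerically. The sketch below is ours, using the classical adjusted R² with N second-stage data points (trials) and p predictors, and shows the kind of inflation seen when only five trials are available.

```python
def adjusted_r2(r2, n_trials, n_predictors):
    """Classical adjusted R^2: 1 - (1 - R^2)(N - 1)/(N - p - 1)."""
    return (1.0 - (1.0 - r2) * (n_trials - 1)
            / (n_trials - n_predictors - 1))

# With only 5 trials, an apparent R^2 of 0.60 under the full model
# (p = 2 predictors) shrinks dramatically once adjusted:
print(round(adjusted_r2(0.60, 5, 2), 3))   # 0.2
# The same apparent R^2 with 30 trials barely moves:
print(round(adjusted_r2(0.60, 30, 2), 3))  # 0.57
```

This mirrors the simulation finding reported below, where a mean R²_ht of 0.600 from five trials adjusts to approximately 0.2.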

| SIMULATION
We investigated the practical worth of the penalised likelihood technique via a simulation study in R, based on the approach of Tilahun et al.4 Various scenarios were simulated to study estimation of the R²_ht surrogacy measure. Trial sizes were set to 10, 20, 60, 100 and 300 patients. There were 5, 10, 20 or 30 trials in each simulated dataset and 250 datasets were simulated for each scenario. (Note that surrogacy data from multiple individual trials with the same design and treatment classes are not commonly available; researchers therefore tend to use centres within a trial in place of trials and evaluate surrogacy using data from multiple centres instead.5 Centres within trials can vary in size from small to large, hence the trial size scenarios above are representative of realistic settings that occur in practice.) We present the mean point estimates and the variance. The joint mixed model (8) forms the basis for the data generation:

S_ij = μ_S + m_Si + (α + a_i) Z_ij + ε_Sij,
T_ij = μ_T + m_Ti + (β + b_i) Z_ij + ε_Tij,    (8)

where (μ_S, μ_T) and (α, β) are fixed intercepts and treatment effects respectively, (m_Si, m_Ti) and (a_i, b_i) are random intercepts and treatment effects for the ith trial respectively, and the error terms are jointly normally distributed. Intercept and treatment effect parameters for S and T were set to μ_S = 0.50, μ_T = 0.45, α = 0.05 and β = 0.03. Surrogacy was simulated to be either strong at both the trial level (R²_ht = ρ² = 0.90) and the individual level (R²_h = ψ² = 0.64), or weak on both measures, with R²_ht = ρ² = 0.30 and R²_h = ψ² = 0.30. After simulating continuous S and T, these were dichotomised or categorised to represent a binary S and a seven-category ordinal T. To create the binary surrogate, the simulated continuous surrogate was dichotomised at the mean, in keeping with previous publications in this field.4,29,30 To create ordinal outcomes, the continuous variables were categorised using six evenly spaced cut-off points, determined according to the quantiles of the true outcome variable.
Two techniques to deal with separation were investigated. The first was to apply a penalised likelihood technique that allowed trials in which separation occurred to be retained in analysis. The second was to remove trials where separation occurred. If fewer than three trials remained following trial removal the simulation was set to return a null value. For the removal technique the simulation was run until 250 datasets were simulated with three or more trials available.
To investigate the other biases discussed in Section 4, underestimation and overestimation, results for the binary-ordinal setting are compared with the information-rich continuous-continuous setting. To investigate further whether the underestimation seen for large trial sizes is due to inefficiency, a much larger trial size of 3000 patients per trial was also simulated.

| Results
For both strong and weak surrogacy settings, the penalised likelihood technique was compared to the removal of trials technique. The penalised likelihood technique was based on the full dataset in each case, whereas the trial removal technique was often based on a much reduced dataset due to the removal of data from trials in which separation occurred. For small numbers of patients per trial, often fewer than half of the trials were retained in analysis; see Tables 2 and 3. Furthermore, sometimes where there were fewer than three trials available for analysis under the trial removal technique the second stage of modelling could not proceed: in some scenarios this occurred up to 90% of the time. Even where there were large numbers of patients per trial (e.g., 300 patients) the median number of trials retained in the removal technique was generally lower than the number simulated, see Tables 2 and 3. This represents a frequent loss of information when the trial removal technique is implemented and shows how often separation can occur. The penalised likelihood technique by comparison allows the retention of all trials in the analysis.
Comparing R²_ht between the penalised likelihood and trial removal techniques is not straightforward given the conflicting issues of bias and the inclusion of different numbers of trials under each method. For instance, in Table 2, with 30 simulated trials and small trial sizes, fewer than half of the trials are typically retained under the trial removal technique, and one might expect overfitting to have an influence; conversely, all 30 trials contribute to the penalised likelihood approach, where one might expect the dominant issues to be inefficiency and underestimation. When we look instead at the mean R²_adjC, which at least removes the issue of overfitting and inflation of R², we see that the penalised likelihood technique (R²_adjC = 0.481) is in fact outperforming the trial removal technique (R²_adjC = 0.367). To obtain a clearer comparison of the techniques, we turn our attention instead to the R²_adjC results. In the case of strong surrogacy, as presented in Table 2, the R²_adjC for the penalised likelihood approach shows less bias and more precision in all settings with 60 or fewer participants per trial. The benefits of the penalised likelihood approach increase as the number and size of the trials reduce.
In the case of weak surrogacy, presented in Table 3, the R 2 adjC for the penalised likelihood approach displays less bias and more precision in all settings. As in the strong surrogacy setting, the benefit of the penalised likelihood approach increases as the number and size of the trials decreases.
In Table 4 the results in the binary-ordinal case underestimate trial-level surrogacy in comparison to the continuous-continuous case, and show less precision, in all settings. This underestimation worsens as the number of trials increases and, although particularly bad for small numbers of patients per trial, can even be seen in the case of 300 patients per trial. Looking at the additional simulation with 3000 patients per trial, the results are much closer to the true value of 0.90 and more in line with those seen in the continuous case with 300 patients per trial. This is evidence that inefficiency is the cause of the bias.
To investigate the overestimation seen predominantly for weak surrogacy and small numbers of trials, we focus first on settings where R²_ht = 0.30. We see that R²_ht is inflated even when the size of the trials is large. The inflation is particularly bad when the number of trials is small, such that the surrogate might erroneously appear to be moderately good. The reduced model, R²_ht:R, removes some but not all of the inflation of R²_ht. For instance, for 5 trials and 300 participants the mean R²_ht = 0.600, R²_adjC = 0.198, R²_ht:R = 0.407 and R²_adjC:R = 0.207. The estimate based on the reduced model, R²_ht:R, is still large compared to the true value of 0.30, although not as poor as the full-model R²_ht estimate. Under both the full and reduced models, R²_adjC appears to remove all the overestimation in the results. When surrogacy is strong, see Table 4, the inflation of R²_ht relative to R²_adjC is noticeable (mostly where the number of trials is small) but not as severe as in the weak surrogacy setting. Again R²_ht:R gives less biased estimation than R²_ht, but not as good as R²_adjC with regard to overfitting for the smaller trial sizes and numbers of trials. We also compared weak surrogacy in the binary-ordinal setting to the continuous-continuous setting, see Table 5. Overestimation behaved similarly in both settings with respect to the values of R²_ht and the comparative advantages of R²_ht, R²_adjC and R²_ht:R described above. Since the adjusted measures remove bias due to overfitting, we can compare R²_adjC with R²_adjC:R to see clearly the benefit of the full versus reduced models. In all the continuous-continuous settings, and in the discrete setting where surrogacy is weak, the full model shows little benefit. However, in the case of strong surrogacy, see Table 4, R²_adjC is very marginally but consistently less biased than R²_adjC:R.
Presumably this is due to the trial intercept variable including a small amount of information that helps counteract some of the inefficiency present in the discrete setting. Conversely, in all settings, continuous or discrete, the precision of R²_adjC:R is slightly better than that of R²_adjC, increasingly so as the number of trials decreases.
Table S1 outlines the percentage of separation in the various settings; as might be expected, the rates were similar regardless of the number of trials. Quasi-complete separation in the binary case occurred in approximately 60% of trials with 10 participants per trial, falling to approximately 5% with 300 participants per trial. Complete separation occurred in only 2% of trials with 10 participants per trial and 0% in all other settings. Separation in ordinal outcomes is always defined as semi-complete31 and occurred in approximately 20% of trials with 10 participants per trial, falling to approximately 1% with 60 participants per trial and 0% otherwise.

| DISCUSSION
Surrogacy assessment is a complex issue and many statistical approaches have been suggested. An accessible, practically sound and well-developed approach is based on information theory. This approach assesses surrogacy at both the individual patient and trial levels. At the trial level and in the context of discrete outcomes we have identified three issues concerning the information theory approach that have not previously been recognised or investigated.
The first of these is underestimation, which counter-intuitively increases as the number of trials increases and is worse where there are few patients compared to the number of trials. We demonstrated through simulation and investigation of the wider literature that this was due to inefficiency of the two-stage nature of the approach and the use of uninformative discrete outcomes. Continuous outcomes by comparison were minimally affected.
The second issue of bias is that of overestimation. This was worse for small numbers of trials and a weak level of surrogacy. The overestimation was because of overfitting in the second stage of modelling, due to having too few data points available (one per trial) where the number of trials is small. The simulated results for five trials and a weak surrogate are such that a poor surrogate under investigation may erroneously appear to be moderately good. We showed that overfitting of weak level surrogacy was also present for continuous outcomes, supporting our hypothesis that this issue applies across all types of outcome.
Previously, researchers have suggested that a 'full' model should be used instead of a 'reduced' model with fewer explanatory variables. Our simulations show that in fact the reduced models give less biased estimation. This is certainly true in the presence of overfitting, and in general it is hard to see much, if any, benefit of the full model. We also showed that an adjusted R² based on the classical coefficient of determination removed issues of overfitting in estimation. A comparison of the adjusted versus unadjusted R² shows the large impact overfitting has on the results and allows us to attribute this bias clearly to overfitting. Based on these simulations, we advise that surrogacy assessments be based on reduced models, alongside a check of the adjusted R² for the reduced model to ensure that the results are not overly optimistic (especially where the number of participants or the number of trials is small).
Finally, we outlined how poor estimation of treatment effects in the presence of separation at the first stage of trial level surrogacy evaluation leads to biased estimation of the level of surrogacy. We proposed the penalised likelihood technique of Firth 19 as a solution to this. Under simulation investigation, we found an alternative method-removal of trials containing separation from the evaluation-resulted in a large amount of discarded data. This demonstrated how frequently separation can impact on the information available from a trial; given the value of this information, identifying a solution to this issue was critically important. The penalised likelihood technique provides improved estimation and precision without the loss of precious trial information. While this benefit was greater where the number and size of trials is small, these are realistic settings for surrogacy assessments. This technique provides a practical and effective solution to the pervasive issue of separation when assessing surrogacy using the information theory approach for discrete outcomes, and can be easily adopted for any combination of binary or ordinal surrogate and true outcomes. Furthermore, our work indicates that this technique could improve analysis of discrete outcomes in clinical trials research more generally where sparse data is an issue. We developed the command FixedDiscrDiscrIT in the R package Surrogate 32 using the above methodology, allowing practical application of information theoretic surrogacy evaluations in the presence of sparse data.
The use of unvalidated surrogates is an issue that is much debated in clinical trials research. Efforts to encourage researchers to adopt statistical approaches to surrogacy evaluation continue and are only strengthened through refinement of the approaches available. The information theory approach has been centre stage during this undertaking. Our work will further underpin this approach through better understanding of its application in practice and resolution of the issues caused by separation in discrete outcomes.