Assessing efficacy in non‐inferiority trials with non‐adherence to interventions: Are intention‐to‐treat and per‐protocol analyses fit for purpose?

Non‐inferiority trials comparing different active drugs are often subject to treatment non‐adherence. Intention‐to‐treat (ITT) and per‐protocol (PP) analyses have been advocated in such studies but are not guaranteed to be unbiased in the presence of differential non‐adherence.


INTRODUCTION
Non-inferiority trials are used to assess whether an experimental intervention is not worse than a proven comparator by more than a clinically acceptable amount (known as the non-inferiority margin).[2][3] Participants not receiving their randomly assigned intervention according to the protocol (referred to as non-adherence or non-compliance), such as missing several doses of a prescribed medication or not undergoing a surgical procedure as planned, is a particular concern in these studies because it may dilute the size of any effect between trial arms and inflate the risk of falsely concluding non-inferiority.[4,5] The terms adherence and compliance are often used interchangeably, though adherence is preferred here since it better reflects the partnership between participants and healthcare providers.[7] A common approach to handling non-adherence is to assess outcomes among different analysis populations, with consistent results providing greater confidence in the trial conclusions.[8] Within the setting of non-inferiority trials, the intention-to-treat (ITT) and per-protocol (PP) populations have been advocated and are used extensively. However, in the presence of differential (or non-random) non-adherence, where the factors leading to non-adherence are also associated with outcomes, both of these approaches can result in bias in the same direction.[11] More sophisticated statistical methods that attempt to account for the impact of non-adherence have been proposed. A recent systematic review identified a range of techniques that can be applied for this purpose within non-inferiority trials, including: instrumental variable (IV) methods, rank-preserving structural failure time models (RPSFTM) and G-estimation, inverse-probability-of-treatment weighting (IPTW), modeling adherence as a time-varying covariate in a time-to-event analysis, and adjusting for observed adherence within a regression model.[12] Details of such methods, their inherent assumptions, and their advantages and disadvantages were described as part of the review. Crucially, few studies have compared the performance of these alternative methods under different patterns of non-adherence in non-inferiority analyses, and so it remains unclear when they might be applied appropriately.
In the current study, we focus on a setting where the outcome of interest is binary, the desired effect estimate is the absolute risk difference, and non-adherence to the experimental intervention does not result in switching to the control intervention (and vice versa). Despite this type of non-adherence being common in trials comparing different active drugs (where non-adherence typically occurs in the form of missed doses or permanent discontinuation of the assigned medication), research into the impacts of treatment non-adherence has focused predominantly on crossovers from one trial arm to another.[14][15] Instead, multiple imputation (MI), IPTW and doubly-robust (DR) methods will be used to estimate treatment effects under the hypothetical situation that all participants had received 100% of their randomly assigned intervention, and the performance of these techniques will be compared with that of the ITT and PP approaches. Despite the well-recognized limitations of adjusting for observed adherence within a regression model, this method is also included in order to assess its performance in a non-inferiority setting.[12] The aforementioned techniques will be evaluated under a range of non-adherence scenarios using computer simulations designed to replicate REMoxTB, a non-inferiority trial that assessed the safety and efficacy of novel tuberculosis (TB) regimens.[16] While we have chosen a TB treatment trial as the motivating example for this simulation study, the findings are likely to be relevant to other disease areas where non-inferiority trials with binary outcomes are used to compare different active drugs.

THE REMOXTB TRIAL
For adults with drug-susceptible tuberculosis (DS-TB), effective treatment typically requires several drugs to be taken together for 6 months.[17][18] However, these trials are often subject to treatment non-adherence, which has been associated with an increased risk of unfavorable outcomes.[19,20] REMoxTB was a randomized, double-blind, phase III non-inferiority trial that evaluated two 4-month experimental regimens (one isoniazid-based and the other ethambutol-based) in comparison with a standard 6-month control regimen among adults with newly diagnosed, previously untreated DS-TB.[16] Each experimental regimen consisted of four TB drugs prescribed daily for 17 weeks, followed by 9 weeks of placebo, whereas the control regimen consisted of daily TB medications for 26 weeks. The primary efficacy endpoint, a composite unfavorable outcome of treatment failure or recurrence within 18 months post-randomization, was assessed using both PP (the primary analysis, which excluded participants receiving less than ∼80% of their allocated regimen) and modified intention-to-treat (mITT) analyses. Further details regarding the prescribed regimens and definitions of the analysis populations, non-adherence, and primary endpoint are provided in Data S1 (Table S1). The two experimental regimens failed to achieve non-inferiority in both the PP and mITT analyses. However, between 8% and 11% of the participants within each group met the protocol definition of treatment non-adherence, and it is unclear how this may have influenced the trial results.
We conducted a simulation study based on the REMoxTB trial to assess the performance of different statistical methods that can be used to account for the impact of non-adherence to interventions in non-inferiority trials. By simulating various non-adherence scenarios, the study aimed to: (i) identify effective methods to address potential bias introduced by treatment non-adherence in non-inferiority trials comparing active drugs, (ii) assess how each method affects the probability of correctly concluding non-inferiority (statistical power) or falsely claiming non-inferiority (type I error), (iii) assess the impacts of unobserved confounding, treatment-adherence interactions, and misspecification of the adherence and outcome models on the performance of each method, and (iv) identify which methods are most appropriate for use in non-inferiority trials comparing active drugs in the presence of treatment non-adherence.

Study design
Computer simulations were used to replicate a two-arm randomized non-inferiority TB trial with a composite unfavorable outcome consisting of treatment failure or recurrence within 18 months following randomization. Participants were randomized in a 1:1 ratio to either a 4-month experimental regimen (EXP) or a standard 6-month control regimen (CON), and the proportions of unfavorable outcomes occurring in the two groups were compared by estimating the risk difference.
Letting $\pi_1$ represent the true probability of unfavorable outcomes with EXP and $\pi_0$ the true probability of unfavorable outcomes with CON, the risk difference is defined as $\pi_1 - \pi_0$. This quantity can be estimated using the difference in the observed proportions of unfavorable outcomes among the EXP ($p_1$) and CON ($p_0$) groups, $p_1 - p_0$. Data from REMoxTB were obtained from the Platform for Aggregation of Clinical TB Studies (TB-PACTS; https://c-path.org/programs/tb-pacts/) and used to inform the data generation process. Since the required parameters were similar in the two experimental groups of REMoxTB, data from these arms were combined before estimating the parameters to be used for the EXP group. Furthermore, it was assumed that the 4-month EXP regimen was followed by 2 months of placebo and therefore the adherence levels within this treatment group would be comparable to those observed in the experimental arms of REMoxTB. The simulation study was designed, conducted and reported using the "ADEMP" framework proposed by Morris and colleagues, and all analyses were conducted using Stata SE version 17.0.[21]

Data generation
Participant-level data were generated within each simulated data set. First, three binary covariates (C) representing age ($C_1 = 0$ represents <30 years, $C_1 = 1$ represents ≥30 years), smoking status ($C_2 = 0$ represents never smokers, $C_2 = 1$ represents ever smokers), and human immunodeficiency virus (HIV) status ($C_3 = 0$ represents HIV negative, $C_3 = 1$ represents HIV positive) were simulated to reflect the characteristics of participants included in the mITT population of REMoxTB. Each of these characteristics was identified as an important predictor of both adherence and unfavorable outcomes within the trial data set and therefore acts as a potential confounder of this association. Participants were randomly allocated to treatment (Z = 0 represents allocation to CON, Z = 1 represents allocation to EXP) with equal probability and then adherence to the assigned treatment (A, defined as the overall percentage of prescribed doses received) was simulated conditional on the three covariates (see Data S1 for details). Initially, the distributions of adherence were assumed to be similar in the CON and EXP groups, given that the observed distributions in the control and experimental arms of REMoxTB were also comparable. Next, adherence was converted from a continuous variable (0 to 100%) to an ordinal variable (<80%, 80% to <100%, or 100%) and each participant's probability of an unfavorable outcome was estimated using Equation (1), in which $y_i = 1$ represents an unfavorable outcome occurring for participant i (i = 1, 2, … , n) with probability $\pi_i$, TE is the assumed treatment effect for receiving EXP vs CON, $x_i = 1$ if the ith participant receives EXP and $x_i = 0$ if they receive CON, $a_{1i} = 1$ if the ith participant receives 80% to <100% of doses and $a_{1i} = 0$ otherwise, and $a_{2i} = 1$ if the ith participant receives <80% of doses and $a_{2i} = 0$ otherwise. Different values of TE were explored ranging from 0 (non-inferiority of EXP) to 0.06 (inferiority of EXP) (Table 1). The remaining coefficients were estimated using the REMoxTB data set as described in Data S1. Finally, the outcome (Y = 0 represents a favorable outcome, Y = 1 represents an unfavorable outcome) was simulated using a Bernoulli pseudo-random variable taking the value one according to the probabilities estimated in Equation (1).
Based on a 15% risk of unfavorable outcomes in the EXP and CON groups and assuming 85% power, a one-sided type I error of 2.5%, and a non-inferiority margin of 6% on the risk difference scale (chosen to emulate REMoxTB), 1280 participants were required per simulated data set. In order to be 95% confident that the type I error would be between 1.8% and 3.2%, 2000 simulated data sets were generated for each non-adherence scenario explored.
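The two design calculations above can be approximated with a short sketch. It assumes the usual normal-approximation formula for a non-inferiority comparison of two proportions with equal true risks; the formula actually used by the authors is not stated here, and the function names are illustrative. Under this assumption the result is about 636 participants per arm (1272 in total), close to but not identical to the 1280 reported.

```python
from math import ceil, sqrt
from statistics import NormalDist


def ni_sample_size_per_arm(risk, margin, alpha=0.025, power=0.85):
    """Per-arm sample size for a non-inferiority test of a risk difference.

    Assumes equal true risks in both arms and a normal approximation
    (an assumption of this sketch, not necessarily the authors' formula).
    """
    z = NormalDist().inv_cdf
    variance = 2 * risk * (1 - risk)  # sum of the two binomial variances
    return ceil((z(1 - alpha) + z(power)) ** 2 * variance / margin ** 2)


def monte_carlo_interval(p, n_sim, level=0.95):
    """Interval expected to contain an estimated proportion p over n_sim repetitions."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    mcse = sqrt(p * (1 - p) / n_sim)  # Monte Carlo standard error
    return p - z * mcse, p + z * mcse


n_per_arm = ni_sample_size_per_arm(risk=0.15, margin=0.06)
low, high = monte_carlo_interval(p=0.025, n_sim=2000)
print(n_per_arm, round(100 * low, 1), round(100 * high, 1))
```

With 2000 repetitions and a nominal type I error of 2.5%, the interval from this sketch rounds to 1.8% to 3.2%, matching the range quoted in the text.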
A range of non-adherence scenarios were explored by varying different elements of the data generation process one at a time (Table 1). Among the factors varied were the quantity of treatment adherence in the EXP group, the assumed effect of receiving EXP vs CON, and the presence/absence of unobserved confounding of the adherence-outcome association. The level of adherence in the EXP group was increased by adding a random quantity to the percentage of doses received among individuals who did not originally receive all doses. This random quantity was drawn from a normal distribution with a mean of 3.28 and a variance of 1 (simulating >100 000 observations indicated that this would result in a 3% improvement in the median percentage of doses received). In order to reduce the level of adherence in the EXP group, the percentage of doses received was reduced among all individuals originally receiving less than 100% of doses, plus 20% of those originally receiving all doses. The latter were sampled so that participants with risk factors for being non-adherent (≥30 years old, ever smokers or HIV positive) were more likely to be selected. After selecting the aforementioned individuals, a random quantity drawn from a normal distribution with a mean of 2.48 and a variance of 1 was subtracted from their original percentage of doses received (simulating >100 000 observations indicated that this would result in a 3% reduction in the median percentage of doses received). Unobserved confounding of the adherence-outcome association was induced by omitting HIV status, the covariate with the largest effect on the risk of unfavorable outcomes, from the relevant analyses. As well as assuming that the effects of non-adherence on unfavorable outcomes were the same in the CON and EXP groups, a treatment-adherence interaction was also explored. This interaction was produced by using Equation (1) to calculate the risks of unfavorable outcomes among the EXP group, but multiplying the coefficients of $a_1$ and $a_2$ by a factor of 0.8 when calculating the corresponding risks among the CON group. Consequently, the effect of non-adherence on unfavorable outcomes was assumed to be greater for EXP than CON, which is plausible given that the EXP regimen contains fewer doses in total and so each missed dose is likely to be more important. Finally, misspecification of the models used to predict adherence and outcomes was explored by omitting all interaction terms (see Section 3.4). Further details regarding the data generation process are provided in Data S1.

Estimand
The ICH E9(R1) addendum provides a structured framework for defining estimands and sensitivity analyses in trials.[22] Where possible, we define each of the five attributes used to construct an estimand in accordance with the mITT approach used in REMoxTB:
i. Treatment: A 4-month experimental regimen (EXP) will be compared to a standard 6-month control regimen (CON) with each participant assigned to receive one of the treatments at random.
ii. Population: Adults with newly diagnosed, previously untreated DS-TB.
iii. Outcome: A binary outcome ('favorable' or 'unfavorable'), where a favorable outcome is defined as participants with culture negative status at 18 months post-randomization, who had not already been classified as having an unfavorable outcome (see intercurrent events below), and whose last positive culture was followed by at least two negative culture results. Culture negative status is defined as two negative culture results without an intervening positive result. Missing values of the outcome due to withdrawals of consent (experienced by 1.8% of REMoxTB participants) or losses to follow-up (0.5%) will be treated as unfavorable outcomes.
iv. Intercurrent events: In this simulation study, the intercurrent event of interest is non-adherence to the randomly assigned treatment, which will be handled by estimating the effect of EXP vs CON under the hypothetical scenario that all participants had received 100% of their allocated regimen. Changes or extensions to the assigned regimen (7.4%), non-violent deaths during treatment (0.9%), and TB-related deaths during follow-up (0.1%) will all be treated as unfavorable outcomes. Participants experiencing reinfections with a new strain of TB (2.2%) or non-TB-related death (0.4%) will be classified as having non-assessable (neither favorable nor unfavorable) outcomes and will be excluded from simulations.[23]
v. Population-level summary: The effect of EXP vs CON had all participants been fully adherent to their allocated regimen will be estimated using a risk difference and its corresponding two-sided 95% confidence interval (CI). Non-inferiority of EXP vs CON will be concluded if the upper limit of the 95% CI for the risk difference is less than the non-inferiority margin of 6%.
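The decision rule in the population-level summary can be illustrated with a minimal sketch. It uses a simple Wald interval for an unadjusted two-group risk difference, which is an assumption made here for illustration; the event counts are invented toy values, not trial data.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(events_exp, n_exp, events_con, n_con, level=0.95):
    """Risk difference (EXP minus CON) with a two-sided Wald confidence interval."""
    p1, p0 = events_exp / n_exp, events_con / n_con
    rd = p1 - p0
    se = sqrt(p1 * (1 - p1) / n_exp + p0 * (1 - p0) / n_con)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return rd, rd - z * se, rd + z * se


def is_non_inferior(upper_limit, margin=0.06):
    """Non-inferiority is concluded when the upper CI limit lies below the margin."""
    return upper_limit < margin


# Toy example: 15% unfavorable outcomes in both arms of 640 participants.
rd, lower, upper = risk_difference_ci(events_exp=96, n_exp=640, events_con=96, n_con=640)
print(round(rd, 4), round(upper, 4), is_non_inferior(upper))
```

Because the two-sided 95% interval is compared against the margin on one side only, the rule corresponds to a one-sided test at the 2.5% level.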

Analysis methods
The MI, IPTW, and DR methods were applied in order to target the estimand described in Section 3.3. Other statistical approaches which do not target this estimand but are commonly employed in non-inferiority trials were also assessed, namely ITT and PP analyses, and adjustment for observed adherence within a regression model. In the subsequent descriptions, the regimen received is denoted by X (X = 0 represents receiving CON, X = 1 represents receiving EXP) and the percentage of doses received is denoted by $A_{100}$ ($A_{100} = 0$ represents receiving less than 100% of doses, $A_{100} = 1$ represents receiving 100% of doses). Therefore, there are four potential outcomes, where $Y^{X=0,A_{100}=0}$ represents the potential outcome if less than 100% of doses of CON are received, $Y^{X=0,A_{100}=1}$ the potential outcome if 100% of doses of CON are received, and $Y^{X=1,A_{100}=0}$ and $Y^{X=1,A_{100}=1}$ the corresponding potential outcomes for EXP.

Intention-to-treat and per-protocol analyses
The ITT analysis incorporates all participants according to their randomly allocated regimen, regardless of the treatment they actually receive. The effect of Z on Y is estimated as $\Pr(Y = 1 \mid Z = 1) - \Pr(Y = 1 \mid Z = 0)$. In contrast, the PP analysis excludes participants who are considered to be non-adherent to their allocated regimen. Three separate definitions of non-adherence were considered based on participants receiving less than 100%, 90%, or 80% of their prescribed doses (denoted PP100, PP90, and PP80, respectively). For instance, the PP100 analysis estimates $\Pr(Y = 1 \mid Z = 1, A_{100} = 1) - \Pr(Y = 1 \mid Z = 0, A_{100} = 1)$. The ITT and PP analyses were conducted using generalized linear models (GLM) for binary outcomes with identity link functions in order to estimate risk differences.
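As a minimal sketch of how the ITT and PP populations differ, the following computes unadjusted risk differences from participant-level records. The records are invented toy data and the helper names are hypothetical; for an unadjusted two-group comparison, the point estimate from an identity-link binomial GLM should coincide with this simple difference in proportions.

```python
def make_records(z, adherence, n, events):
    """Build n toy participants in arm z with a given adherence (%); the first `events` have y = 1."""
    return [{"z": z, "adherence": adherence, "y": 1 if i < events else 0} for i in range(n)]


def risk(records):
    """Observed proportion of unfavorable outcomes."""
    return sum(r["y"] for r in records) / len(records)


def risk_difference(records, min_adherence=0):
    """Risk difference (EXP minus CON) among participants meeting the adherence threshold.

    min_adherence=0 keeps everyone (ITT); 100, 90, or 80 give PP100, PP90, or PP80.
    """
    kept = [r for r in records if r["adherence"] >= min_adherence]
    exp = [r for r in kept if r["z"] == 1]
    con = [r for r in kept if r["z"] == 0]
    return risk(exp) - risk(con)


trial = (
    make_records(z=1, adherence=100, n=6, events=1)
    + make_records(z=1, adherence=90, n=2, events=1)
    + make_records(z=1, adherence=70, n=2, events=2)
    + make_records(z=0, adherence=100, n=8, events=1)
    + make_records(z=0, adherence=85, n=2, events=1)
)

itt = risk_difference(trial)
pp100 = risk_difference(trial, min_adherence=100)
pp90 = risk_difference(trial, min_adherence=90)
pp80 = risk_difference(trial, min_adherence=80)
print(round(itt, 3), round(pp100, 3), round(pp90, 3), round(pp80, 3))
```

In this toy data set, outcomes are worse among non-adherent participants and non-adherence differs between arms, so the four analyses give noticeably different estimates.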

3.4.2 Adjustment for observed adherence within a regression model
Adjustment for participants' observed levels of adherence was performed using a logistic regression model in the ITT population. The model contained an indicator variable for allocated treatment and an ordinal variable for the percentage of prescribed doses received (<80%, 80% to <100%, or 100%). Predicted probabilities of the counterfactual outcomes $Y^{Z=0}$ and $Y^{Z=1}$ were obtained for each participant and the difference in the mean predicted probabilities between the EXP and CON groups calculated (ie, a risk difference), along with a 95% CI estimated using delta-method SEs. This approach is referred to as an adjusted ITT analysis henceforth.

Multiple imputation of outcomes
MI was used to impute outcomes for participants receiving less than 100% of doses under the counterfactual scenario that they had been fully adherent. Log-odds ratios for the effects of the covariates (C) and adherence ($A_{100}$) on unfavorable outcomes were estimated using logistic regression models fitted separately within the CON and EXP groups, which contained indicator variables for C and $A_{100}$, and all of their possible interactions. A new set of coefficients was then drawn from a multivariate normal distribution using the estimated coefficients and variance-covariance matrix from the logistic regression model as the mean and variance, respectively, and these new coefficients were used to predict participants' log-odds of an unfavorable outcome. After transforming these log-odds into probabilities using the expit function, a participant-level bias correction was applied (in the form of a second-order Taylor series expansion of the logit function) to account for the fact that the expectation of these predicted probabilities is not equal to the expectation of the predicted probabilities from the initial logistic regression model (see Data S1 for details). Within each stratum of C and $A_{100}$, the predicted probabilities for those receiving all doses were then used to impute outcomes among those receiving less than 100% of doses. Ten imputations were performed per simulated data set and the estimated risk differences were combined using Rubin's rules.[24] If the participants within a particular stratum of C who received 100% of doses all experienced the same outcome (either all favorable or all unfavorable), a decision was made to impute unfavorable outcomes with risks of 1% and 90%, respectively.
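The final pooling step can be sketched as follows. This shows only the standard form of Rubin's rules for a scalar estimate (pooled point estimate and total variance); the imputation model itself and the degrees-of-freedom calculation are not shown, and the input values are invented for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m imputation-specific estimates and their variances using Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Toy example with three imputed risk differences and their squared SEs.
pooled_rd, total_var = pool_rubin([0.02, 0.03, 0.04], [0.0004, 0.0004, 0.0004])
print(round(pooled_rd, 4), round(total_var, 6))
```

The between-imputation term inflates the variance to reflect uncertainty about the imputed outcomes, which is why the pooled SE is larger than the average within-imputation SE.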

3.4.4 Inverse-probability-of-treatment weighting
IPTW was used to re-weight the outcomes of participants receiving all doses so that the re-weighted pseudo-population contained no individuals that were non-adherent. First, predicted probabilities of receiving all doses were estimated within each stratum of C using logistic regression models fitted separately in the CON and EXP groups, which contained indicator variables for each covariate in C and all of their possible interactions. Weights were calculated as the inverse of these predicted probabilities and the risk difference was estimated using a weighted GLM for a binary outcome with an identity link function and robust SEs to account for weighting. The IPTW estimator for a given treatment group weights the outcome of each fully adherent participant by $1/\hat{p}_i$, where $\hat{p}_i = \Pr(A_{100} = 1 \mid C_i)$ is the predicted probability that the ith participant receives all doses. If the participants within a particular stratum of C all received 100% of doses, a decision was made to re-weight these individuals using a weight equal to 1.01 (equivalent to a predicted probability of receiving all doses of 0.99) in order to incorporate some random variability.
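A minimal sketch of the weighting step within one treatment group is given below. It relies on the fact that a saturated logistic model (all covariates and all their interactions) reproduces the observed proportion of fully adherent participants in each covariate stratum, so those proportions are used directly. The normalized (weighted-mean) form of the estimator is an assumption of this sketch, and the records are invented toy data.

```python
from collections import defaultdict


def iptw_risk(records):
    """Weighted risk of an unfavorable outcome in ONE arm.

    records: list of (stratum, full_adherence, outcome) tuples, where
    full_adherence is 1 if all doses were received and 0 otherwise.
    Only fully adherent participants contribute, each weighted by the
    inverse of the stratum-specific probability of full adherence.
    Strata with no fully adherent participants are not represented.
    """
    n_total = defaultdict(int)
    n_full = defaultdict(int)
    for stratum, a, _ in records:
        n_total[stratum] += 1
        n_full[stratum] += a

    weighted_events = weight_sum = 0.0
    for stratum, a, y in records:
        if a != 1:
            continue
        if n_full[stratum] == n_total[stratum]:
            weight = 1.01  # rule described in the text when a whole stratum is fully adherent
        else:
            weight = n_total[stratum] / n_full[stratum]
        weighted_events += weight * y
        weight_sum += weight
    return weighted_events / weight_sum


arm = [
    ("A", 1, 0), ("A", 1, 1), ("A", 0, 1), ("A", 0, 1),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 1, 0), ("B", 0, 0), ("B", 0, 1),
]
print(round(iptw_risk(arm), 3))
```

In this toy arm, the unweighted risk among fully adherent participants is 2/6, whereas the weighted risk is 0.35 because stratum A, where full adherence is less common and risk is higher, is weighted up to its share of the whole arm. The risk difference would be obtained by applying the same function to each arm and subtracting.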

Doubly-robust estimator
The DR estimator was used to combine properties of the MI and IPTW methods. This approach involves three key steps: (i) predicted probabilities of receiving all prescribed doses are estimated separately in the CON and EXP groups as described for the IPTW method, and then inverted to obtain weights, (ii) predicted probabilities of unfavorable outcomes are estimated separately in the CON and EXP groups as described for the MI method, and (iii) each predicted probability of the outcome (from Step ii) is weighted by the predicted probability of receiving all doses (from Step i) in order to produce a weighted average of the two models. The phrase "doubly robust" refers to the fact that this estimator requires at least one of the models used to predict adherence or outcome to be correctly specified, but not necessarily both. The DR estimator for a given treatment group, $\hat{\Delta}_{DR}$, combines the weights from Step i with $\hat{Y}_i$, the predicted probability of an unfavorable outcome for the ith participant (from Step ii). $\hat{\Delta}_{DR}$ was estimated separately in each treatment group using the teffects command in Stata and the risk difference was calculated as the difference between the two group-specific estimates (EXP minus CON). The variance for the risk difference was calculated as the sum of the group-specific variances of $\hat{\Delta}_{DR}$.
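To make the three steps concrete, here is a sketch of the standard augmented inverse-probability-weighted (AIPW) form of a doubly-robust estimator for one treatment group. This is an assumed textbook form, not necessarily the exact expression used in the paper or by Stata's teffects; the function name and the toy inputs are hypothetical.

```python
def dr_risk(records):
    """Augmented IPW estimate of the risk under full adherence in ONE arm.

    records: list of (a, y, p_hat, y_hat) tuples, where
      a     = 1 if all doses were received, 0 otherwise
      y     = observed outcome (used only when a == 1)
      p_hat = predicted probability of receiving all doses (Step i)
      y_hat = predicted probability of an unfavorable outcome under full adherence (Step ii)
    """
    total = 0.0
    for a, y, p_hat, y_hat in records:
        weighted_outcome = a * y / p_hat           # inverse-probability-weighted term
        augmentation = (a - p_hat) / p_hat * y_hat  # correction from the outcome model
        total += weighted_outcome - augmentation
    return total / len(records)


def build_arm(p_a, p_b, yhat_a, yhat_b):
    """Toy arm with two covariate strata (A: 4 participants, B: 6 participants)."""
    stratum_a = [(1, 0), (1, 1), (0, 1), (0, 1)]
    stratum_b = [(1, 1), (1, 0), (1, 0), (1, 0), (0, 0), (0, 1)]
    return ([(a, y, p_a, yhat_a) for a, y in stratum_a]
            + [(a, y, p_b, yhat_b) for a, y in stratum_b])


both_correct = dr_risk(build_arm(0.5, 2 / 3, 0.5, 0.25))
outcome_model_wrong = dr_risk(build_arm(0.5, 2 / 3, 0.0, 0.0))
adherence_model_wrong = dr_risk(build_arm(0.5, 0.5, 0.5, 0.25))
print(round(both_correct, 3), round(outcome_model_wrong, 3), round(adherence_model_wrong, 3))
```

In this toy example the estimate is 0.35 whether both working models match the data, only the adherence model does, or only the outcome model does, which illustrates the "doubly robust" property described above.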

Performance measures
To assess the performance of the different analysis methods under each non-adherence scenario, we calculated the following: the mean estimated risk difference, the difference between the mean estimated risk difference and the true risk difference (the bias), the empirical SE (the SD of the estimated risk differences), the mean of the estimated SEs, the loss of information (using the variance of the risk difference assuming perfect adherence as the comparator), the mean squared error (bias squared plus the square of the empirical SE), and the type I error or power.[21] To assess the probability of a type I error, unfavorable outcomes were generated assuming that EXP was inferior to CON (6% treatment effect) and analyses which concluded non-inferiority were deemed to have committed a type I error. To examine power, unfavorable outcomes were generated assuming that EXP was non-inferior to CON (0% or 3% treatment effect) and analyses which failed to conclude non-inferiority were deemed to have committed a type II error. Power is the probability of avoiding a type II error. Monte Carlo Standard Errors (MCSE) were calculated for the bias and type I error, and used to estimate corresponding 95% CIs.
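These performance measures can be computed from the per-repetition results along the following lines. The sketch uses standard simulation-study definitions; the function names and toy inputs are illustrative only. The mean squared error is computed directly as the average squared deviation from the true value, which equals the bias squared plus the squared empirical SE up to a factor of (n − 1)/n on the latter.

```python
from math import sqrt
from statistics import mean, stdev


def performance(estimates, std_errors, true_value):
    """Bias, empirical SE, mean model SE, and MSE across simulation repetitions."""
    n_sim = len(estimates)
    empirical_se = stdev(estimates)  # SD of the estimated risk differences
    return {
        "bias": mean(estimates) - true_value,
        "bias_mcse": empirical_se / sqrt(n_sim),  # Monte Carlo SE of the bias
        "empirical_se": empirical_se,
        "mean_model_se": mean(std_errors),
        "mse": mean([(e - true_value) ** 2 for e in estimates]),
    }


def non_inferiority_rate(upper_limits, margin=0.06):
    """Proportion of repetitions concluding non-inferiority, with its Monte Carlo SE.

    This proportion is the type I error when EXP is truly inferior
    and the power when EXP is truly non-inferior.
    """
    rate = mean([1.0 if u < margin else 0.0 for u in upper_limits])
    return rate, sqrt(rate * (1 - rate) / len(upper_limits))


summary = performance([0.05, 0.06, 0.07, 0.06], [0.02, 0.02, 0.02, 0.02], true_value=0.06)
rate, rate_mcse = non_inferiority_rate([0.05, 0.07, 0.055, 0.08])
print(summary, rate, rate_mcse)
```

A 95% CI for the bias or the type I error then follows as the estimate plus or minus 1.96 times its MCSE.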

Similar quantities of non-adherence in the control and experimental groups (no unobserved confounding or treatment-adherence interaction)
In simulations where the quantities of non-adherence were assumed to be similar among trial arms, a median of 7.4%, 20.2%, and 72.3% of the participants in each treatment group received <80%, 80% to <100%, and 100% of their prescribed doses, respectively. All of the analysis methods were unbiased when the treatment effect was assumed to be either 3% or 6%, though the type I error rates tended to be slightly higher than the nominal value of 2.5% (Figure 1; Tables S2 and S3). In particular, the PP80 and PP90 approaches appeared to inflate the type I error rate, whereas the ITT analysis was unbiased and maintained a type I error rate close to 2.5%. When no difference in the risks of unfavorable outcomes was assumed between treatment groups, all methods were unbiased and resulted in estimates of power that were at least 85% (the level of power assumed in the sample size calculation; Table S4).

TABLE 2 Performance of analysis methods assuming better adherence in the experimental group (EXP) than the control group (CON), a treatment effect of 6%, no unobserved confounding of the adherence-outcome association, and no treatment-adherence interaction.
Abbreviations: CI, confidence interval; DR, doubly robust; IPTW, inverse-probability-of-treatment weighting; ITT, intention-to-treat; MI, multiple imputation; MSE, mean squared error; PP100, per-protocol defined as receiving 100% of prescribed doses; PP80, per-protocol defined as receiving at least 80% of prescribed doses; PP90, per-protocol defined as receiving at least 90% of prescribed doses; SE, standard error.
a The SD of the estimated risk differences.
b Using the variance of the risk difference assuming perfect adherence as the comparator.
c Adjusted for observed levels of adherence as a covariate.

FIGURE 1 Mean bias, type I error, and statistical power of each analysis method assuming similar quantities of non-adherence in the control and experimental groups, no unobserved confounding of the adherence-outcome association, and no treatment-adherence interaction. CI, confidence interval; DR, doubly robust; IPTW, inverse-probability-of-treatment weighting; ITT, intention-to-treat; MI, multiple imputation; PP100, per-protocol defined as receiving 100% of prescribed doses; PP80, per-protocol defined as receiving at least 80% of prescribed doses; PP90, per-protocol defined as receiving at least 90% of prescribed doses. * Adjusted for observed levels of adherence as a covariate.

Better adherence in the experimental group (no unobserved confounding or treatment-adherence interaction)
In simulations assuming better adherence in the EXP group, a median of 7.2%, 5.9%, and 86.8% of participants in this arm received <80%, 80% to <100%, and 100% of their prescribed doses, respectively (corresponding values for the CON group were unchanged from Section 4.1). IPTW and the DR estimator were the only unbiased methods under both 3% and 6% treatment effects (Figure 2; Table S5 and Table 2). The MI approach was also unbiased when the treatment effect was assumed to be 3%, but overestimated the treatment effect of 6% by a small amount (bias = 0.11%; 95% CI 0.02% to 0.19%). The ITT, PP, and adjusted ITT analyses were all biased under both 3% and 6% treatment effects, with the magnitude of bias appearing to be similar for the ITT, PP80, and PP90 approaches; the corresponding type I error estimates were inflated approximately 3-fold. Similar patterns were observed when no effect of EXP was assumed, with only the MI, IPTW, and DR methods providing unbiased estimates of the treatment effect, and the ITT, PP80, and PP90 approaches appearing to be biased by similar amounts (Table 3). All of the analysis methods resulted in estimates of power that were greater than 90%.

Worse adherence in the experimental group (no unobserved confounding or treatment-adherence interaction)
In simulations assuming worse adherence in the EXP group, a median of 7.6%, 34.5%, and 57.9% of participants in this arm received <80%, 80% to <100%, and 100% of their prescribed doses, respectively (corresponding values for the CON group were unchanged from Section 4.1). The MI, IPTW, and DR approaches were the only unbiased methods under both 3% and 6% treatment effects (Figure 3; Table S6 and Table 4). The ITT, PP, and adjusted ITT analyses were all biased under both 3% and 6% treatment effects, with the magnitude of bias appearing to be similar for the ITT, PP80, and PP90 approaches; the corresponding type I error estimates were less than 1%. When no difference in the risks of unfavorable outcomes was assumed between treatment groups, similar patterns were observed, though the MI approach overestimated the treatment effect by a small amount (bias = 0.10%; 95% CI 0.01% to 0.19%) and resulted in a moderate loss of power (80.9%; 95% CI 79.2% to 82.6%). IPTW and the DR estimator remained unbiased and resulted in estimates of power that were close to 85% (Table 5). In contrast, the ITT analysis resulted in a considerable loss of power (73.0%; 95% CI 71.1% to 74.9%).

TABLE 4 Performance of analysis methods assuming worse adherence in the experimental group (EXP) than the control group (CON), a treatment effect of 6%, no unobserved confounding of the adherence-outcome association, and no treatment-adherence interaction.
Abbreviations: CI, confidence interval; DR, doubly robust; IPTW, inverse-probability-of-treatment weighting; ITT, intention-to-treat; MI, multiple imputation; MSE, mean squared error; PP100, per-protocol defined as receiving 100% of prescribed doses; PP80, per-protocol defined as receiving at least 80% of prescribed doses; PP90, per-protocol defined as receiving at least 90% of prescribed doses; SE, standard error.
a The SD of the estimated risk differences.
b Using the variance of the risk difference assuming perfect adherence as the comparator.
c Adjusted for observed levels of adherence as a covariate.

TABLE 5 Performance of analysis methods assuming worse adherence in the experimental group (EXP) than the control group (CON), a treatment effect of 0%, no unobserved confounding of the adherence-outcome association, and no treatment-adherence interaction.
Abbreviations: CI, confidence interval; DR, doubly robust; IPTW, inverse-probability-of-treatment weighting; ITT, intention-to-treat; MI, multiple imputation; MSE, mean squared error; PP100, per-protocol defined as receiving 100% of prescribed doses; PP80, per-protocol defined as receiving at least 80% of prescribed doses; PP90, per-protocol defined as receiving at least 90% of prescribed doses; SE, standard error.
a The SD of the estimated risk differences.
b Using the variance of the risk difference assuming perfect adherence as the comparator.
c Adjusted for observed levels of adherence as a covariate.

FIGURE 2 Mean bias, type I error, and statistical power of each analysis method assuming better adherence in the experimental group than the control group, no unobserved confounding of the adherence-outcome association, and no treatment-adherence interaction. CI, confidence interval; DR, doubly robust; IPTW, inverse-probability-of-treatment weighting; ITT, intention-to-treat; MI, multiple imputation; PP100, per-protocol defined as receiving 100% of prescribed doses; PP80, per-protocol defined as receiving at least 80% of prescribed doses; PP90, per-protocol defined as receiving at least 90% of prescribed doses. * Adjusted for observed levels of adherence as a covariate.

Unobserved confounding of the adherence-outcome association
In the presence of unobserved confounding of the adherence-outcome association, the MI, IPTW, and DR approaches were all unbiased under treatment effects of 0%, 3%, and 6% when the quantities of non-adherence were similar among trial arms (Table S7). In simulations assuming better adherence in the EXP group than the CON group, all three methods overestimated the treatment effects by a small amount, with the corresponding estimates of bias ranging from 0.08% to 0.21% (Table S8). In contrast, the three methods tended to be unbiased in simulations assuming worse adherence in the EXP group than the CON group, except for IPTW and the DR estimator, which both underestimated the treatment effect of 3% by a small amount (bias = −0.14%; 95% CI −0.23% to −0.05%) (Table S9). Regardless of the assumed level of adherence in the EXP group, unobserved confounding did not affect the estimated type I error rates substantially (range: 1.9%-3.1%).

Presence of treatment-adherence interaction
When the effect of non-adherence on unfavorable outcomes was assumed to be greater for EXP than CON, the ITT, PP, and adjusted ITT analyses tended to be biased, sometimes substantially (Figures S1-S3 and Tables S10-S18). For example, the mean bias for the ITT analysis was ∼1.2% under all three treatment effects (0%, 3%, and 6%) when the treatment-adherence interaction was combined with similar quantities of adherence between trial arms; the corresponding type I error estimate was roughly eight times smaller than the desired 2.5% and power was estimated to be 16% lower than the desired value of 85% (Tables S10-S12). The ITT, PP, and adjusted ITT analyses were most biased in scenarios that combined the treatment-adherence interaction with worse adherence in the EXP group than the CON group (Tables S16-S18). In contrast, the MI, IPTW, and DR approaches were unbiased and maintained type I error rates close to 2.5% in most of the scenarios explored, resulting in estimates of power that were at least 83%. Results for these three methods were also comparable in the presence of unobserved confounding of the adherence-outcome association, though a small amount of bias remained in some scenarios (Tables S19-S21).

Mean bias, type I error, and statistical power of each analysis method assuming worse adherence in the experimental group than the control group, no unobserved confounding of the adherence-outcome association, and no treatment-adherence interaction. CI, confidence interval; DR, doubly robust; IPTW, inverse-probability-of-treatment weighting; ITT, intention-to-treat; MI, multiple imputation; PP100, per-protocol defined as receiving 100% of prescribed doses; PP80, per-protocol defined as receiving at least 80% of prescribed doses; PP90, per-protocol defined as receiving at least 90% of prescribed doses. * Adjusted for observed levels of adherence as a covariate.

Misspecification of the adherence and outcome models (no unobserved confounding or treatment-adherence interaction)
When the adherence and outcome models were misspecified by omitting interaction terms, the IPTW and DR methods remained unbiased and maintained type I error rates close to 2.5% regardless of the assumed level of adherence in the EXP group (Tables S22-S24). In contrast, the MI approach remained unbiased when the quantities of non-adherence were similar in the EXP and CON groups but tended to result in a small amount of bias when adherence levels differed between trial arms (range: −0.23% to 0.21%), which consequently under- or over-inflated the type I error rate. All three methods resulted in estimates of power that were at least 83%.

DISCUSSION
Using computer simulations to replicate a real non-inferiority trial with an active control regimen, this study found that when non-adherence differed between trial arms, ITT and PP analyses often resulted in non-trivial bias in the estimated treatment effect, which consequently under- or over-inflated the type I error rate. Depending on the patterns of non-adherence, it was possible for ITT and PP analyses to be biased in the same direction by similar amounts, and, generally, the presence of a treatment-adherence interaction exacerbated the issues with these approaches. Adjustment for observed adherence led to similar issues, whereas the MI, IPTW, and DR methods were able to correct bias under most non-adherence scenarios but could not always eliminate bias entirely in the presence of unobserved confounding. The bias correction applied as part of the MI approach appeared to work well, but not perfectly; the inclusion of higher order terms from the Taylor series expansion of the logit function may further reduce the magnitude of any bias observed. IPTW and the DR estimator were generally unbiased, maintained desired type I error rates, and did not result in any meaningful losses of statistical power.
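As a purely illustrative calculation of how ITT and PP analyses can err in the same direction, the Python sketch below computes the expected ITT and PP risk differences in closed form. All inputs (the binary prognostic factor U, the risks, the adherence probabilities, and the additive penalty for missed doses) are hypothetical and are not taken from the trial or from the simulation study; the scenario assumes that low-risk participants in the experimental arm are the ones who tend to miss doses.

```python
# Hypothetical inputs for illustration only; none are taken from the trial.
P_U = {0: 0.5, 1: 0.5}            # prevalence of a binary prognostic factor U
BASE_RISK = {0: 0.05, 1: 0.15}    # CON risk of an unfavorable outcome under full adherence
TRUE_EFFECT = 0.03                # EXP minus CON risk under full adherence
NONADHERENCE_PENALTY = 0.10       # extra risk when doses are missed (same in both arms)

# Probability of full adherence given U, by arm. In EXP, low-risk
# participants (U = 0) are assumed to be the ones who stop treatment.
P_ADHERE = {
    "CON": {0: 0.9, 1: 0.9},
    "EXP": {0: 0.5, 1: 0.9},
}


def risk(arm, u, adherent):
    """Risk of an unfavorable outcome for one arm, stratum, and adherence status."""
    r = BASE_RISK[u] + (TRUE_EFFECT if arm == "EXP" else 0.0)
    return r if adherent else r + NONADHERENCE_PENALTY


def itt_risk(arm):
    """Expected risk among everyone randomized to `arm`."""
    total = 0.0
    for u, pu in P_U.items():
        pa = P_ADHERE[arm][u]
        total += pu * (pa * risk(arm, u, True) + (1 - pa) * risk(arm, u, False))
    return total


def pp_risk(arm):
    """Expected risk among fully adherent participants in `arm`."""
    num = sum(P_U[u] * P_ADHERE[arm][u] * risk(arm, u, True) for u in P_U)
    den = sum(P_U[u] * P_ADHERE[arm][u] for u in P_U)
    return num / den


itt_rd = itt_risk("EXP") - itt_risk("CON")
pp_rd = pp_risk("EXP") - pp_risk("CON")

print(f"True effect under full adherence: {TRUE_EFFECT:.4f}")
print(f"ITT risk difference: {itt_rd:.4f} (bias {itt_rd - TRUE_EFFECT:+.4f})")
print(f"PP risk difference:  {pp_rd:.4f} (bias {pp_rd - TRUE_EFFECT:+.4f})")
```

Under these assumed inputs the ITT risk difference is 5.0% and the PP risk difference is about 4.4%, against an effect of 3.0% under full adherence, so both analyses overestimate the effect. Other choices of the adherence probabilities can push the two biases in opposite directions.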
Our results are consistent with those of Mo et al. despite differences in the study designs.5 For instance, when differential non-adherence occurred due to confounding factors which increased the probability of an adverse outcome, Mo and colleagues also found that ITT and PP analyses tended to result in bias that frequently occurred in the same direction. IPTW and IV estimation were able to minimize this bias, but the former approach could only eliminate it entirely if all confounders were appropriately adjusted for. IV methods were not considered in the current study since standard approaches are only able to account for treatment crossovers, a different type of non-adherence to that considered in our study.
Other simulation studies have assessed the performance of RPSFTM and G-estimation for handling treatment crossovers in non-inferiority trials with time-to-event outcomes compared with conventional survival analyses.13,14 While RPSFTM and G-estimation are not appropriate analysis methods for the binary outcomes considered in our study design, these simulations were consistent with our results and showed that in the presence of differential non-adherence, ITT and PP analyses can result in either conservative or anti-conservative type I error rates. Similar to our conclusions, the authors of both studies deduced that in non-inferiority trials with treatment non-adherence, neither ITT nor PP analyses can guarantee the validity of non-inferiority conclusions.
Although IPTW and the DR estimator were the best performing methods in this study, these techniques have some important limitations. First, they can eliminate bias only if all confounders can be appropriately adjusted for, and this will often not be possible. Potential confounders of the adherence-outcome association need to be carefully considered at the design stage of trials, relevant data collected as fully as possible, and their effects modeled correctly when predicting adherence and outcomes. For IPTW, misspecification of either of these models, such as the omission of important interactions or incorrectly specifying the functional form of covariates, may lead to treatment groups that are imbalanced with regard to potential confounders and consequently bias in the estimated treatment effect.25 A key advantage of the DR estimator is that only one of the models used to predict adherence and outcomes needs to be correctly specified, but not necessarily both. Finally, IPTW and the DR methods cannot be used if there are strata of the covariates for which all participants are fully adherent (violating the positivity assumption). To overcome this issue, we imputed a large predicted probability of receiving all doses for individuals within such strata (probability of 0.99), which performed well in the simulations.
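To make the weighting and the 0.99 cap concrete, the following is a minimal Python sketch, not the implementation used in this study. It assumes a single categorical confounder (here called `stratum`), estimates the probability of full adherence as the observed proportion within each arm-by-stratum cell rather than from a fitted regression model, and weights fully adherent participants by the inverse of that probability. The function name, argument names, and toy data are all hypothetical.

```python
import numpy as np


def iptw_risk_difference(arm, adherent, outcome, stratum, cap=0.99):
    """Risk difference (EXP minus CON) among fully adherent participants,
    each weighted by 1 / estimated probability of full adherence.

    Assumption: the adherence probability is the observed proportion of
    adherers in each arm-by-stratum cell, capped at `cap` so that cells in
    which everyone adhered do not receive a probability of exactly 1.
    """
    arm = np.asarray(arm)
    adherent = np.asarray(adherent, dtype=bool)
    outcome = np.asarray(outcome, dtype=float)
    stratum = np.asarray(stratum)

    risks = {}
    for a in ("EXP", "CON"):
        in_arm = arm == a
        strata_a = stratum[in_arm]
        adh_a = adherent[in_arm]
        p_adhere = np.empty(in_arm.sum())
        for s in np.unique(strata_a):
            cell = strata_a == s
            p_adhere[cell] = min(adh_a[cell].mean(), cap)
        # Only adherers contribute; their cell probability is always > 0.
        weights = 1.0 / p_adhere[adh_a]
        risks[a] = np.average(outcome[in_arm][adh_a], weights=weights)
    return risks["EXP"] - risks["CON"]


# Toy data (hypothetical). In EXP stratum 1 everyone adheres, so the cap applies.
arm      = ["EXP"] * 6 + ["CON"] * 4
stratum  = [0, 0, 0, 0, 1, 1,   0, 0, 1, 1]
adherent = [1, 1, 0, 0, 1, 1,   1, 0, 1, 0]
outcome  = [0, 1, 1, 1, 0, 0,   0, 1, 1, 0]

rd = iptw_risk_difference(arm, adherent, outcome, stratum)
print(f"IPTW risk difference (EXP - CON): {rd:.4f}")
```

In practice the adherence probabilities would usually come from a regression model with several covariates, and the sketch gives no variance estimate; both are deliberately left out here to keep the weighting step visible.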
ITT analyses play an important role in estimating treatment effects in clinical practice, but we have shown that they pose several limitations in the context of non-inferiority trials. They rely on strong assumptions, such as trial adherence being reflective of real-world behavior, yet adherence is frequently observed to be better in trials.26 Consequently, non-inferiority conclusions based on an ITT approach may not apply to real-world scenarios with different adherence patterns. In addition, a particular concern in non-inferiority trials is that non-adherence can dilute estimated treatment effects and increase the risk of type I errors (accepting an inferior intervention). In our study, using an ITT analysis inflated the risk of a type I error in some scenarios, particularly when adherence to the experimental regimen was better than the comparator. Instead, we advocate for estimating treatment effects under full adherence to ensure that non-inferiority exists when participants receive their interventions as intended, avoiding any bias caused by non-adherence. This estimand translates easily from a trial to real-world setting and is likely to be of greater interest to patients and healthcare professionals than ITT effects dependent on observed adherence.
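The link between a biased risk difference and a type I error can be seen from the decision rule itself. The sketch below assumes a simple Wald confidence interval for the risk difference and declares non-inferiority when the upper bound of the two-sided 95% interval (a one-sided 2.5% test) lies below the margin. The interval method, the margin value, and the event counts are illustrative assumptions, not the analysis specification of the trial.

```python
import math


def noninferiority_test(events_exp, n_exp, events_con, n_con, margin, z=1.96):
    """Return (risk difference, upper CI bound, non-inferior?) for
    unfavorable-outcome risks, EXP minus CON, using a Wald interval.

    Non-inferiority is declared when the upper bound is below `margin`.
    """
    p_exp = events_exp / n_exp
    p_con = events_con / n_con
    rd = p_exp - p_con
    se = math.sqrt(p_exp * (1 - p_exp) / n_exp + p_con * (1 - p_con) / n_con)
    upper = rd + z * se
    return rd, upper, upper < margin


# Hypothetical counts and a hypothetical 6% margin.
rd_a, upper_a, ni_a = noninferiority_test(30, 300, 30, 300, margin=0.06)
rd_b, upper_b, ni_b = noninferiority_test(45, 300, 30, 300, margin=0.06)

print(f"Equal risks:  RD={rd_a:.3f}, upper bound={upper_a:.3f}, non-inferior={ni_a}")
print(f"EXP 5% worse: RD={rd_b:.3f}, upper bound={upper_b:.3f}, non-inferior={ni_b}")
```

Because the decision depends only on where the upper bound falls relative to the margin, any analysis that shifts the estimated risk difference towards zero makes it easier to declare non-inferiority for a regimen that is truly inferior under full adherence.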
In future TB trials, shorter regimens and novel approaches to adherence monitoring may result in improved adherence.27 One would expect adherence levels to improve as duration of treatment shortens, in which case any bias in ITT and PP analyses should diminish. However, some non-adherence is likely even with shorter regimens, as was seen in a more recent short-treatment TB trial.28 Therefore, the ITT and PP results of trials assessing shorter regimens are not guaranteed to be free from bias caused by non-adherence. In addition, recent developments in digital adherence technologies such as digital pillboxes show mixed evidence that they may lead to improved adherence.29,30,32,33 Our findings are also of interest beyond TB trials, where non-adherence to treatment is likely to remain a concern for the foreseeable future.
The current study has several strengths, including the use of a real non-inferiority trial to inform the design, the direct comparison of different statistical methods, and the range of non-adherence scenarios explored. However, it has some limitations. First, our simulation study relies on assumptions made in the data generation process and analysis, although we consider these assumptions plausible. Second, we calculated adherence using the overall percentage of prescribed doses received. Other patterns of adherence, such as the timing of missed doses, may be important.34,35 Third, adherence was dichotomized (100% vs <100% of doses received) before being included in the models used to predict adherence and outcomes. It is plausible that utilizing a continuous functional form of adherence, for instance through fractional polynomials or generalized propensity score methods, may improve the ability of the more sophisticated methods to eliminate bias due to treatment non-adherence.36,37 Future work should assess the additional benefits of these approaches. In addition, further research should compare the performance of different statistical methods for handling treatment non-adherence in non-inferiority trials with active control regimens and time-to-event outcomes (where treatment crossovers are often not permitted).

CONCLUSION
In non-inferiority trials where treatment non-adherence differs between arms, ITT and PP analyses can produce biased estimates of efficacy that can occur in the same direction, potentially leading to the acceptance of inferior treatments or efficacious regimens being missed. IPTW and the DR estimator, which are relatively straightforward methods to implement, were able to correct bias under most non-adherence scenarios and should be used to supplement ITT and PP approaches in ongoing non-inferiority trials with active control regimens and binary outcomes. Future studies should utilize more sophisticated methods for handling non-adherence in the primary analysis, with a number of sensitivity analyses conducted (including ITT and PP approaches) in order to explore the impacts of their different assumptions.

TABLE 1 Simulation study scenarios (columns: element of data generation process; scenarios explored).
a Defined as the overall percentage of prescribed doses received.
Performance of analysis methods assuming better adherence in the experimental group (EXP) than the control group (CON), a treatment effect of 0%, no unobserved confounding of the adherence-outcome association, and no treatment-adherence interaction.
Abbreviations: CI, confidence interval; DR, doubly robust; IPTW, inverse-probability-of-treatment weighting; ITT, intention-to-treat; MI, multiple imputation; MSE, mean squared error; PP100, per-protocol defined as receiving 100% of prescribed doses; PP80, per-protocol defined as receiving at least 80% of prescribed doses; PP90, per-protocol defined as receiving at least 90% of prescribed doses; SE, standard error.
a The SD of the estimated risk differences.
b Using the variance of the risk difference assuming perfect adherence as the comparator.
c Adjusted for observed levels of adherence as a covariate.
TABLE 3