Simple Bayesian models for missing binary outcomes in randomized controlled trials

Missing outcomes are commonly encountered in randomized controlled trials (RCT) involving human subjects and present a risk for substantial bias in the results of a complete case analysis. While response rates for RCTs are typically high there is no agreed upon universal threshold under which the amount of missing data is deemed to not be a threat to inference. We focus here on binary outcomes that are possibly missing not at random, that is, the value of the outcome influences its possibility of being observed. Salient information that can assist in addressing these missing outcomes in such situations is the anticipated response rate in each study arm; these can often be anticipated based on prior research in similar populations using similar designs and outcomes. Further, in some areas of human subjects research, we are often confident or we suspect that response rates among RCT participants with successful treatment outcomes will be at least as great as those among participants without successful treatment outcomes. In other settings we may suspect the opposite relationship. This direction of the differential response between those with successful and unsuccessful outcomes can further aid in addressing the missing outcomes. We present simple Bayesian pattern‐mixture models that incorporate this information on response rates to analyze the relationship between a binary outcome and an intervention while addressing the missing outcomes. We assess the performance of this method in simulation studies and apply this method to the results of an RCT of a smoking abstinence intervention.

can be subject to appreciable response bias in intervention effect estimates if the difference between the respondents and nonrespondents is large enough. The intent-to-treat principle, where all individuals randomized to participate in the trial are included in the analysis as part of the arm to which they were randomized, has become standard in most research fields. The challenge in applying this principle is how to include cases with missing outcomes in the analysis.
[3][4] The term missing at random (MAR) labels situations where whether an outcome is missing depends on completely observed variables, but not on the value of the outcome itself given these other variables. Missing not at random (MNAR) labels situations where the value of the outcome directly influences whether the outcome is observed; these types of missing data are also labeled as nonignorable or informative missing data.[6][7][8][9] Pattern mixture analyses make assumptions about how the association between intervention and outcomes differs between those with observed outcomes and those with missing outcomes. Selection models make assumptions about how observation of the outcome is related to the assigned intervention and the value of the outcome. In practice, it can be difficult to confidently specify mathematical models for these associations, which then necessitates sensitivity analyses assessing results across a range of plausible models. Synthesizing results across the range of modeling assumptions incorporated in the sensitivity analysis can be a significant challenge when the results vary greatly.
Researchers conducting an RCT can find these methods challenging to implement.[11][12] This may be due to a misplaced belief in the appropriateness of the employed methods or to a lack of skill or confidence in applying the more appropriate but complex methods. Incorporation of additional information, such as information on anticipated response rates, may help simplify implementation of appropriate methods to address the missing outcomes. Researchers are often confident the response rates for a trial will fall in ranges specifiable prior to study implementation; these anticipated response rates are used in study planning and are informed by prior research on similar interventions, designs, and populations. Further, response rates will depend in large part upon the intensity of the follow-up protocol employed. While RCTs employ protocols aimed at yielding high response rates, there are tradeoffs between response and both costs and participant acceptance of specific protocols, so there will generally be some nonresponse. For a given follow-up protocol and study population, we may be comfortable specifying upper bounds less than 100%, such as 90% or 95%, on the response rate for a given group.
We focus on binary outcomes, with values here labeled as successful or unsuccessful, where observation of the outcome may depend on both the value of the outcome and the intervention assigned. We build into the analyses assumptions on the differences in response rates between those with successful and unsuccessful outcomes. If we consider a range of positive and negative values for this difference and give balanced weight to positive and negative values, we will generally obtain overall point estimates comparable to those from the complete case analysis, but with a variance for the estimate that increases with the range of values considered. More information may be gained by also considering ranges for the differences in response rates that are shifted in either direction. Researchers in some fields are often willing to assume that response rates for participants with successful outcomes are comparable to or larger than response rates for those with unsuccessful outcomes.[18][19][20] A cautionary hedge against such assumptions would then be to also implement analyses where response rates among those achieving abstinence are assumed to be comparable to or smaller than response rates among those not achieving abstinence. Alternatively, in some fields the common perception may be reversed, with those with successful outcomes viewed as less likely to respond.
Given information or assumptions on the likely response rates in each study arm, assumptions on whether those with successful outcomes are more or less likely to respond than those with unsuccessful outcomes, and upper bounds on the response rates for those more likely to respond, we can generate, for each arm, intervals that contain the difference between the response rate in the higher responding group and the overall response rate in that arm. We discuss a pattern mixture model (PMM) for a two-arm RCT with a binary outcome that uses the intervals for the difference in response between those with successful outcomes and those with unsuccessful outcomes to address the missing outcomes. These PMM analyses retain the analytic simplicity inherent to an RCT and require no modeling techniques more complex than Monte Carlo sampling from readily derived Beta posterior distributions. The PMMs can be extended to include additional arms and covariates to further model the missing data mechanism. However, our intent here is to discuss a simple approach to appropriately address missing outcomes that can be employed by a wide population of researchers and does not require higher levels of statistical computation skill and knowledge.

SIMPLE PMM FOR MNAR OUTCOMES
We consider a binary outcome $Y$, binary intervention $Z$, and binary indicator for response $R$. We assume intervention assignment $Z$ is always known. When the value of $Y$ is missing, $R$ has value 0; when the value of $Y$ is observed, $R$ has value 1. The conditional distribution of $Y$ given $Z$ is specified by $p_z = P(Y = 1 \mid Z = z)$ and the conditional distribution of $Y$ given $Z$ and $R$ by $p_{zr} = P(Y = 1 \mid Z = z, R = r)$. Similarly, we can specify the conditional distribution of $R$ given $Y$ and $Z$ by $q_{zy} = P(R = 1 \mid Z = z, Y = y)$ and the conditional distribution of $R$ given $Z$ by $q_z = P(R = 1 \mid Z = z)$. We assume $0 < p_z, q_z, p_{zr}, q_{zy} < 1$ for all $z$, $r$, and $y$.
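The joint distribution for $(Y, R)$ given $Z$ can be denoted $P(Y = y, R = r \mid Z)$. PMMs and selection models (SMs) are both based on this joint conditional distribution for $Y$ and $R$ given $Z$ but use different factorizations to model it. A PMM models how $R$ depends on $Z$ and how $Y$ depends on both $Z$ and $R$. For an SM, this order of dependency is reversed. These are given by the following factorizations:

$$\text{PMM:}\quad P(Y = y, R = r \mid Z) = P(Y = y \mid R = r, Z)\, P(R = r \mid Z),$$

$$\text{SM:}\quad P(Y = y, R = r \mid Z) = P(R = r \mid Y = y, Z)\, P(Y = y \mid Z).$$

We consider a PMM specifying logistic regression models for the respective conditional distributions:

$$\operatorname{logit} q_z = \gamma_0 + \gamma_1 z, \tag{1}$$

$$\operatorname{logit} p_{z1} = \beta_{01} + \beta_{11} z, \tag{2}$$

$$\operatorname{logit} p_{z0} = \beta_{00} + \beta_{10} z. \tag{3}$$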
We can readily fit Bayesian models for the distribution of $R$ given $Z$ and for the distribution of $Y$ given $Z$ and $R = 1$ in Equations (1) and (2) by specifying vague or noninformative prior distributions on the model parameters. Generally, we would want to specify vague or noninformative prior distributions for these parameters in the analysis of an RCT to ensure the derived posterior distributions are based primarily on the observed data, with little to no dependence on the specified prior distributions. Difficulties in fitting the full model arise from the cases with missing outcomes, where $R = 0$. For these cases, no observations are available for $Y$, so any prior distributions we specify for the comparable model cannot be updated by the observed data in generating posterior distributions. Using vague prior distributions for $\beta_{00}$ and $\beta_{10}$ does not work well in this setting: since these prior distributions are not updated by the data, the posterior distribution will give considerable weight to both small and large values of the parameters.

Kaciroti and Raghunathan [21] discuss using PMMs with outcomes following a distribution in an exponential family. For the logistic regression models considered here, they model Equation (3) by relating its parameters to those in the model for $Y \mid Z = z, R = 1$. This is done by specifying prior distributions, for both $z = 0$ and $z = 1$, on the odds ratio for the association between $R$ and $Y$ given $Z = z$, denoted $\theta_z$. The relationships $\log \theta_0 = \beta_{01} - \beta_{00}$ and $\log \theta_1 = \log \theta_0 + \beta_{11} - \beta_{10}$ relate these two sets of parameters. For each value of $Z$, then, this approach must specify a distribution on the odds ratio, or the log odds ratio, for the association between $Y$ and $R$ given $Z$. As above, specifying diffuse vague prior distributions for the $\theta_z$ generally does not work well, as the resultant posterior distributions will give considerable weight to very small and very large values of the $\theta_z$. Specifying informative prior distributions on the $\theta_z$, or vague prior distributions on more likely values for the $\theta_z$, can be difficult as well. The recommended approach is then to consider a series of distinct plausible odds ratios, specify prior distributions tightly centered around these individual odds ratios, and examine how results vary across these prior distributions. The results derived across this series of prior distributions for each of the $\theta_z$ can vary greatly and, as discussed above, synthesizing these results can be difficult. This PMM approach is similar to the frequentist imputation approach discussed by Hedeker et al [22] and extended by Zhang et al [23] to add biochemical verification of self-reported outcomes. In contrast to the difficulties in specifying prior distributions for $\theta_z$, specifying plausible ranges for differences in response rates between those with successful and unsuccessful outcomes, and sensible prior distributions on these ranges, may be easier in some settings.
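To make the odds-ratio sensitivity analysis described above concrete, the following minimal sketch computes the success probability among nonrespondents implied by a fixed odds ratio relating $R$ and $Y$ within an arm, and the resulting marginal success probability. The function names and the input values (75% response, 20% success among respondents) are hypothetical and chosen only for illustration; this is not the cited authors' implementation.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_missing_from_odds_ratio(p_observed, odds_ratio):
    """P(Y=1 | Z=z, R=0) implied by P(Y=1 | Z=z, R=1) and the R-Y odds ratio."""
    # Within an arm: log(odds ratio) = logit P(Y=1 | R=1) - logit P(Y=1 | R=0).
    return expit(logit(p_observed) - math.log(odds_ratio))


def marginal_success(q, p_observed, odds_ratio):
    """P(Y=1 | Z=z), mixing respondents (weight q) and nonrespondents."""
    p_missing = p_missing_from_odds_ratio(p_observed, odds_ratio)
    return q * p_observed + (1.0 - q) * p_missing


# Hypothetical arm: 75% response rate, 20% success among respondents.
for odds_ratio in (1.0, 1.5, 2.0, 3.0):
    print(odds_ratio, round(marginal_success(0.75, 0.20, odds_ratio), 4))
```

Sweeping the odds ratio in this way shows how strongly the marginal estimate can move across plausible values, which is the synthesis difficulty the text describes.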

A SIMPLE BAYESIAN PMM ADDING RESPONSE RATE INFORMATION
Here we consider how to use the information or assumptions discussed above concerning the study response rates and upper bounds on response rates in the simple Bayesian PMM above. Follow-up assessment rates among those with successful outcomes and among those with unsuccessful outcomes will depend on several contextual elements of a study, including study population, intervention assigned, follow-up assessment protocol intensity, and protocol duration. Most follow-up protocols will not reach all participants. For example, Tomson et al [24] noted that a commonly reported reason for initial nonresponse was simply a breakdown in the follow-up process, with the participant not receiving the follow-up material or the study team not receiving the returned material, rather than avoidance or refusal by the participant. Some populations are more transitory and difficult to reconnect with, and shorter, less intense protocols will have less success making contact than more intense protocols. The PMM developed here uses specified upper bounds on the higher of the response rates among the outcome subgroups in each arm, and a research team may be confident in specifying upper bounds on the response rates that are lower than 1 based on these considerations.
If response rates are known or assumed to be higher for those with $Y = 1$ for a given $z$, then $q_{z1} - q_z \ge 0$. We assume $q_z$ is known to fall in a specified interval $(a_z, b_z)$ and the response rate among those with successful outcomes is known to be bounded above by the quantity $Q_z$. This combination of information implies the difference $d_z = q_{z1} - q_z$ is bounded above by $Q_z - a_z$, so that $d_z \in (d_z^m, d_z^M) = (0, Q_z - a_z)$. As an example, if prior studies involving similar interventions, methodologies, and populations have consistently obtained response rates in the range 70% to 80% and we are confident that response rates when $Y = 1$ are bounded above by 95%, then we may feel confident in specifying that $d_z \in (0.0, 0.25)$.
In practice we would consider alternate values for $d_z^m$ based on contextual considerations and assumptions we want to examine in sensitivity analyses. For example, a more cautious set of assumptions would be that response rates are either higher for those with $Y = 1$ or comparable to response rates for those with $Y = 0$, such that $q_{z1} - q_z \ge c$, where $c < 0$ is a subjectively chosen bound for comparability of rates. In this case we would set $d_z^m = c$. Similarly, we may want to consider upper bounds smaller than $Q_z - a_z$ to assess how results vary with the interval specified to contain $d_z$. In the example above, if we observe response rates near 80% at the end of the study, we may implement sensitivity analyses specifying narrower intervals $d_z \in (0.0, 0.15)$ or $d_z \in (0.0, 0.2)$. Sensitivity analyses assessing variation in the results of the PMM method developed here across different intervals for the $d_z$ are always warranted.
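As a small illustration of how these intervals might be assembled, the sketch below builds the interval for the difference between the response rate among successes and the overall response rate from a lower bound on the overall response rate and an upper bound on the response rate among successes, with an optional negative lower bound for comparability and an optional narrower upper bound. The helper name and its arguments are hypothetical, not part of the original method description.

```python
def differential_response_interval(a_z, Q_z, lower=0.0, upper=None):
    """Interval (d_m, d_M) for d_z = q_z1 - q_z.

    a_z   : lower end of the range assumed for the overall response rate q_z
    Q_z   : assumed upper bound on the response rate among successes, q_z1
    lower : 0 by default; a negative value c allows 'comparable' rates
    upper : optional narrower upper bound for sensitivity analyses
    """
    if not 0.0 < a_z < Q_z <= 1.0:
        raise ValueError("need 0 < a_z < Q_z <= 1")
    widest = Q_z - a_z
    if upper is None:
        upper = widest
    if not lower < upper <= widest + 1e-12:
        raise ValueError("need lower < upper <= Q_z - a_z")
    return (lower, upper)


# Example from the text: response of 70% to 80% and q_z1 bounded above by 95%.
low, high = differential_response_interval(0.70, 0.95)
print(low, round(high, 2))

# More cautious lower bound (c = -0.05) and a narrower upper bound of 0.15.
print(differential_response_interval(0.70, 0.95, lower=-0.05, upper=0.15))
```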
Assume then we are willing to specify that $d_z$ will fall in an interval $(d_z^m, d_z^M)$ for a given $z$. The usefulness of this information for addressing the missing outcome data becomes clearer when we note, for given $q_z$ and $d_z$ with $q_{z1} = q_z + d_z$, that

$$p_{z0} = \frac{p_{z1}\, q_z\, (1 - q_z - d_z)}{(q_z + d_z)(1 - q_z)}. \tag{4}$$

The distribution specified by $p_{z0}$ can thus be expressed in terms of $p_{z1}$, $q_z$, and $d_z$. Using a Bayesian framework, $q_z$ and $p_{z1}$ are random variables with prior distributions generated from the prior distributions specified for the regression parameters $\gamma_0$, $\gamma_1$, $\beta_{01}$, and $\beta_{11}$ in the models for $R$ given $Z$ and for $Y$ given $Z$ and $R = 1$. The posterior distributions for $p_{z1}$ and $q_z$ given the observed data are derived using the specified prior distributions and likelihood models above. Let $\pi_{q_z}(\cdot \mid \text{Data})$ and $\pi_{p_{z1}}(\cdot \mid \text{Data})$ denote these posterior distributions. We can approximate the full posterior distribution for the $q_z$ and $p_{zr}$ by Monte Carlo sampling from the individual posterior distributions and the distribution for $d_z$. If we specify a prior distribution, $\pi_{d_z}$, for $d_z$ with mass focused on $(d_z^m, d_z^M)$, then we can use this distribution with $\pi_{q_z}(\cdot \mid \text{Data})$ and $\pi_{p_{z1}}(\cdot \mid \text{Data})$ to derive posterior distributions for $q_{z1}$ and $p_{z0}$. Specifically, to sample values for $q_{z1}$ and $p_{z0}$ given sampled values for $q_z$ and $p_{z1}$ drawn from $\pi_{q_z}(\cdot \mid \text{Data})$ and $\pi_{p_{z1}}(\cdot \mid \text{Data})$, we can sample a value $d_z$ from $\pi_{d_z}$, set $q_{z1} = q_z + d_z$, and construct $p_{z0}$ using Equation (4). In practice we can adjust these $q_{z1}$ to $\min(q_z + d_z, 1 - \epsilon_z)$ with $\epsilon_z$ small. Specific study circumstances may add information on the more likely values for the $d_z$, but uniform prior distributions on the interval $(d_z^m, d_z^M)$ often represent an adequate encapsulation of our prior information or beliefs about the location of $d_z$.
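The relationship the text refers to as Equation (4) can be recovered from Bayes' rule applied within arm $z$. The short derivation below is a sketch that uses only the definitions of $p_z$, $p_{zr}$, $q_z$, and $q_{zy}$ given earlier; it is offered as a check on the algebra rather than as the authors' own presentation.

```latex
\begin{align*}
p_{z1} &= P(Y = 1 \mid Z = z, R = 1) = \frac{q_{z1}\, p_z}{q_z}
  \quad\Longrightarrow\quad p_z = \frac{p_{z1}\, q_z}{q_{z1}}, \\[4pt]
p_{z0} &= P(Y = 1 \mid Z = z, R = 0) = \frac{(1 - q_{z1})\, p_z}{1 - q_z}
        = \frac{p_{z1}\, q_z\, (1 - q_{z1})}{q_{z1}\, (1 - q_z)}, \\[4pt]
\text{and with } q_{z1} = q_z + d_z: \quad
p_{z0} &= \frac{p_{z1}\, q_z\, (1 - q_z - d_z)}{(q_z + d_z)(1 - q_z)} .
\end{align*}
```

Setting $d_z = 0$ gives $p_{z0} = p_{z1}$, which is the case of no differential response by outcome.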
As discussed above, we may instead want to assume that response rates among those with $Y = 1$ are lower than for those with $Y = 0$, based either on prior knowledge or as a cautionary sensitivity analysis for the assumptions made above. The approach above can be readily modified to consider the specification that $\tilde{d}_z = q_{z0} - q_z$ falls in an interval $(\tilde{d}_z^m, \tilde{d}_z^M)$ for a given $z$. This leads to the modification that

$$p_{z0} = 1 - \frac{(1 - p_{z1})\, q_z\, (1 - q_z - \tilde{d}_z)}{(q_z + \tilde{d}_z)(1 - q_z)} \tag{5}$$

for given $q_z$ and $\tilde{d}_z$ with $q_{z0} = q_z + \tilde{d}_z$. Here, to sample values for $q_{z0}$ and $p_{z0}$ given sampled values for $q_z$ and $p_{z1}$ drawn from $\pi_{q_z}(\cdot \mid \text{Data})$ and $\pi_{p_{z1}}(\cdot \mid \text{Data})$, we can sample a value $\tilde{d}_z$ from a specified prior distribution $\pi_{\tilde{d}_z}$, set $q_{z0} = q_z + \tilde{d}_z$, and use Equation (5) to construct a value for $p_{z0}$. The method can then combine different directional assumptions across the study arms, where those with successful outcomes are assumed to be more responsive in one arm and less responsive in the other arm. We would specify an interval for $d_z$ and use Equation (4) to sample values for $p_{z0}$ in the one arm, and specify an interval for $\tilde{d}_z$ and use Equation (5) to sample values for $p_{z0}$ in the other arm. Generally, the intervention effects of interest can be specified in terms of the $p_z = q_z p_{z1} + (1 - q_z) p_{z0}$, so we can simulate the posterior distribution for any effect measure of interest using Monte Carlo sampling. Posterior distributions for $\beta_{00}$ and $\beta_{10}$ can be approximated in similar fashion. For example, if we specify intervals $(d_z^m, d_z^M)$ to contain $d_z$ for both arms, then given the $q_z$, the $q_{z1}$, $\beta_{01}$, and $\beta_{11}$, and writing $p_{01} = \operatorname{expit}(\beta_{01})$ and $p_{11} = \operatorname{expit}(\beta_{01} + \beta_{11})$,

$$\beta_{00} = \operatorname{logit}\!\left(\frac{p_{01}\, q_0\, (1 - q_{01})}{q_{01}\, (1 - q_0)}\right), \qquad \beta_{10} = \operatorname{logit}\!\left(\frac{p_{11}\, q_1\, (1 - q_{11})}{q_{11}\, (1 - q_1)}\right) - \beta_{00}.$$
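The two directional constructions of $p_{z0}$ can be checked numerically. The sketch below implements both (function names are hypothetical) and verifies, for one population with known response probabilities, that each recovers the same $p_{z0}$ and hence the true marginal $p_z$. The expressions are the Bayes-rule forms of the relationships the text labels Equations (4) and (5).

```python
import random


def p_missing_if_successes_respond_more(p_observed, q, d):
    """p_z0 when d = q_z1 - q_z is specified (successes respond more)."""
    q1 = q + d
    return p_observed * q * (1.0 - q1) / (q1 * (1.0 - q))


def p_missing_if_successes_respond_less(p_observed, q, d_tilde):
    """p_z0 when d_tilde = q_z0 - q_z is specified (failures respond more)."""
    q0 = q + d_tilde
    return 1.0 - (1.0 - p_observed) * q * (1.0 - q0) / (q0 * (1.0 - q))


def marginal_success(p_observed, q, p_missing):
    """p_z = q_z * p_z1 + (1 - q_z) * p_z0."""
    return q * p_observed + (1.0 - q) * p_missing


# Known population for one arm: p_z = 0.10, q_z1 = 0.80, q_z0 = 0.70.
p, q1, q0 = 0.10, 0.80, 0.70
q = p * q1 + (1.0 - p) * q0        # overall response rate (0.71)
p_observed = p * q1 / q            # P(Y=1 | R=1)
p_mis_a = p_missing_if_successes_respond_more(p_observed, q, q1 - q)
p_mis_b = p_missing_if_successes_respond_less(p_observed, q, q0 - q)
# Both routes should recover the population value 0.1 up to rounding.
print(round(marginal_success(p_observed, q, p_mis_a), 6),
      round(marginal_success(p_observed, q, p_mis_b), 6))

# One Monte Carlo step with stand-in posterior draws (values are illustrative).
rng = random.Random(11)
q_draw, p_obs_draw = 0.71, 0.11
d_draw = rng.uniform(0.0, 0.24)    # d_z drawn uniformly on (d_m, d_M)
print(round(p_missing_if_successes_respond_more(p_obs_draw, q_draw, d_draw), 4))
```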
The discussion above assumed prior knowledge of overall response rates in the development of the method but, in practice, if we feel confident in specifying upper bounds, say $Q_z$, on the $q_{z1}$ but are less confident in specifying intervals containing the $q_z$, we can use the observed follow-up rates, $\hat{q}_z$, in each study arm without great loss of generality. The PMM approach uses specification of which of these response rates is, or should tend to be, the larger in each arm. Often we may feel confident about the direction of these differences, but sensitivity analyses around this assumption will generally be warranted.
The discussion above outlined how to find an initial interval, for example, the $(d_z^m, d_z^M) = (0, Q_z - a_z)$ interval above, but sensitivity analyses considering different values for the bounds of the interval, including changing the direction of the differential response within arms, are also warranted. Note also that in practice we would avoid using only intervals centered around 0, as such a set of intervals generally yields point estimates comparable to those from the complete case analysis, with a change in the measures of precision for the estimate, relative to the complete case analysis, that is a function of the width of the specified intervals.
The differential response rates $d_z$ drive the potential response bias. Consideration of ranges of possible differences in response rates will encompass the range of potential response bias. Specification of the intervals $(d_z^m, d_z^M)$ will depend on the specific settings of the RCT. The upper bound $d_z^M$ will vary with our precision in specifying the range for $q_z$ and the upper bound we can specify for $q_{z1}$. The lower bound $d_z^m$ can vary with how much differential response, due to social desirability or other biases, we are confident is present and willing to incorporate in the analysis.

A BAYESIAN PMM WITH CONJUGATE PRIOR DISTRIBUTIONS
We can view the sample from a different vantage and use vague conjugate prior distributions to formulate a version of the pattern mixture model that is simple to implement and does not require specialized software for fitting Bayesian models.
As above, the joint distribution factors as

$$P(Y = y, R = r \mid Z = z) = P(Y = y \mid R = r, Z = z)\, P(R = r \mid Z = z),$$

and we can view the variables for participants as i.i.d. observations drawn from the joint distribution for $(Y, R)$ given $Z = z$. Given the sample size, $N_z$, for the number of participants in the group with $Z = z$, we can view the number of participants with observed outcomes, $n_z$, as a binomial random variable with

$$n_z \mid N_z, q_z \sim \text{Binomial}(N_z, q_z).$$

Further, among these $n_z$ individuals, the number of participants with $Y = 1$, denoted $n_{z1}$, is another binomial random variable with probability of success $p_{z1}$ and

$$n_{z1} \mid n_z, p_{z1} \sim \text{Binomial}(n_z, p_{z1}).$$

If we specify a $\text{Beta}(\alpha_z, \beta_z)$ prior distribution for $q_z$ and a $\text{Beta}(a_z, b_z)$ prior distribution for $p_{z1}$, so that

$$\pi(q_z) \propto q_z^{\alpha_z - 1}(1 - q_z)^{\beta_z - 1}, \qquad \pi(p_{z1}) \propto p_{z1}^{a_z - 1}(1 - p_{z1})^{b_z - 1},$$

then the posterior joint density function for the parameters is the product of two Beta densities,

$$\pi(q_z, p_{z1} \mid \text{Data}) \propto q_z^{n_z + \alpha_z - 1}(1 - q_z)^{N_z - n_z + \beta_z - 1}\; p_{z1}^{n_{z1} + a_z - 1}(1 - p_{z1})^{n_z - n_{z1} + b_z - 1}.$$

The first is a $\text{Beta}(n_z + \alpha_z, N_z - n_z + \beta_z)$ distribution and the second is a $\text{Beta}(n_{z1} + a_z, n_z - n_{z1} + b_z)$ distribution. Among the $m_z = N_z - n_z$ participants with missing outcomes, the number with $Y = 1$, denoted $m_{z1}$, is another binomial random variable with probability of success $p_{z0}$ and

$$m_{z1} \mid m_z, p_{z0} \sim \text{Binomial}(m_z, p_{z0}).$$

Again though, with no observations on $Y$ when $R = 0$, we cannot update any prior distribution placed on $p_{z0}$. Using the approach above, given $q_z$ we can sample $d_z$ from the prior distribution $\pi_{d_z}$ with mass focused on the interval $(d_z^m, d_z^M)$ and set $q_{z1} = q_z + d_z$. Given $q_z$, $q_{z1}$, and $p_{z1}$, we have

$$p_{z0} = \frac{p_{z1}\, q_z\, (1 - q_{z1})}{q_{z1}\, (1 - q_z)}.$$

Inference about the distribution of $Y$ conditional on $Z = z$ can then be based on the updated posterior density function

$$\pi(q_z, p_{z1}, d_z \mid \text{Data}) \propto q_z^{n_z + \alpha_z - 1}(1 - q_z)^{N_z - n_z + \beta_z - 1}\; p_{z1}^{n_{z1} + a_z - 1}(1 - p_{z1})^{n_z - n_{z1} + b_z - 1}\; f_{d_z}(d_z),$$

where $f_{d_z}$ is the probability density function for the distribution $\pi_{d_z}$. If $\pi_{d_z}$ is, for example, a uniform distribution on $(d_z^m, d_z^M)$, then sampling from the joint distribution is trivial and we can readily simulate the posterior distribution for $Y$ given $Z = z$. We can use the pair of joint posterior distributions for $Y$ given $Z = z$ to simulate the posterior distributions for any effect parameter of interest that is a function of the $p_z$; specifically, given a draw from the posterior distribution for $q_z$, $p_{z1}$, $d_z$, we have

$$p_z = q_z\, p_{z1} + (1 - q_z)\, p_{z0} = \frac{q_z\, p_{z1}}{q_z + d_z}.$$

Alternatively, as above, we may instead want to assume $\tilde{d}_z \in (\tilde{d}_z^m, \tilde{d}_z^M)$. In this case we can sample $\tilde{d}_z$ from a specified distribution $\pi_{\tilde{d}_z}$ with mass focused on the interval $(\tilde{d}_z^m, \tilde{d}_z^M)$ and set $q_{z0} = q_z + \tilde{d}_z$. Given $q_z$, $p_{z1}$, and $\tilde{d}_z$,

$$p_{z0} = 1 - \frac{(1 - p_{z1})\, q_z\, (1 - q_z - \tilde{d}_z)}{(q_z + \tilde{d}_z)(1 - q_z)}$$

and

$$p_z = 1 - \frac{q_z\, (1 - p_{z1})}{q_z + \tilde{d}_z}.$$

The characteristics of Beta distributions inform our specification of the prior distributions used in application of this approach. If we specify a $\text{Beta}(a, b)$ prior distribution for $p_{z1}$ with $a > b$, then before any data are incorporated in the model we have increased the prior probability that $p_{z1} > 0.5$, and if $a < b$ then the model favors the event that $p_{z1} < 0.5$. Specifying the hyperparameters to be equal yields a distribution symmetric around 0.5, with the probability of $Y = 1$ in group $z$ with $R = 1$ comparable to a fair coin flip. We can also view the sum of the hyperparameters in the Beta prior distribution as an a priori sample size. A small a priori sample size then leads the posterior distribution for $p_{z1}$ to be informed primarily by the data. Thus, to not favor success over failure and to have an uninformative prior, we specify $a = b = 0.5$ for the conjugate prior distributions of $p_{z1}$ and $q_z$; the information these prior distributions contribute to the posterior distribution is then equivalent to adding a prior sample size of 1.
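The conjugate formulation can be simulated with nothing beyond Beta and uniform random draws. The following is a minimal sketch, assuming Beta(0.5, 0.5) priors, a uniform prior for the differential response, and made-up arm-level counts; the function names, the counts, and the guard that caps the implied success probability among nonrespondents at 1 are illustrative assumptions rather than part of the published method.

```python
import random


def arm_draws(N, n, n1, d_low, d_high, n_draws, rng, prior=0.5, eps=1e-6):
    """Posterior draws of p_z = P(Y=1 | Z=z) for one arm.

    N: participants randomized, n: with observed outcome, n1: observed successes.
    d = q_z1 - q_z is drawn uniformly on (d_low, d_high).
    """
    draws = []
    for _ in range(n_draws):
        q = rng.betavariate(n + prior, N - n + prior)        # response rate q_z
        p_obs = rng.betavariate(n1 + prior, n - n1 + prior)  # p_z1 = P(Y=1 | R=1)
        d = rng.uniform(d_low, d_high)
        q1 = min(max(q + d, eps), 1.0 - eps)                 # keep q_z1 inside (0, 1)
        p_mis = p_obs * q * (1.0 - q1) / (q1 * (1.0 - q))    # p_z0 given q_z, q_z1, p_z1
        p_mis = min(p_mis, 1.0)                              # guard added for this sketch
        draws.append(q * p_obs + (1.0 - q) * p_mis)
    return draws


def summarize(values):
    """Return the 2.5th percentile, median, and 97.5th percentile."""
    s = sorted(values)
    k = len(s)
    return s[int(0.025 * k)], s[k // 2], s[int(0.975 * k)]


rng = random.Random(2024)
# Hypothetical counts: 1000 per arm, about 71% and 72% response.
p0 = arm_draws(1000, 710, 80, 0.00, 0.24, 4000, rng)
p1 = arm_draws(1000, 720, 114, 0.05, 0.23, 4000, rng)
odds_ratios = [(b / (1.0 - b)) / (a / (1.0 - a)) for a, b in zip(p0, p1)]
low, median, high = summarize(odds_ratios)
print("posterior median OR and 95% interval:",
      round(median, 2), (round(low, 2), round(high, 2)))
```

Any other effect measure that is a function of the two arm-level probabilities, such as a risk difference, can be summarized from the same draws.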

APPLICATION OF THIS BAYESIAN PATTERN MIXTURE MODEL
We present results of a simulation study to demonstrate the performance of this Bayesian PMM method. We consider situations where overall response rates are in the neighborhood of 70% to 75%. We make use of a selection model formulation for the alternate conditional distributions to generate the data for these simulations:

$$\operatorname{logit} p_z = \mu + \tau z, \qquad \operatorname{logit} q_{zy} = \lambda + \delta z + \eta y + \phi z y.$$
The PMM and SM stem from the same joint conditional distribution for $(Y, R)$ given $Z$, so the sets of parameters are directly related to one another; translations between parameters for the formulations are presented in the Appendix. Using this selection model formulation, with $\lambda$, $\delta$, $\eta$, and $\phi$ denoting the intercept, intervention, outcome, and interaction coefficients in the model for $\operatorname{logit} q_{zy}$, we can show that the relative nonresponse bias in estimating the odds ratio in a complete case analysis (CCA) is

$$\frac{OR_{YZ \mid R=1} - OR_{YZ}}{OR_{YZ}} = \frac{(e^{\phi} - 1)(e^{\lambda} + 1) - e^{\lambda} e^{\phi} (e^{\delta} - 1)(e^{\eta} - 1)}{(e^{\lambda} + 1)(e^{\lambda + \delta + \eta + \phi} + 1)},$$

as outlined in the Appendix. We used this to set the bias present in the scenarios developed for the simulation studies. We consider scenarios with $(p_0, OR_{YZ}) = (0.10, e^{0.8})$ and $(p_0, OR_{YZ}) = (0.30, e^{0.45})$. Under the first of these, the probability that $Y = 1$ when $Z = 0$ is 0.10 and the effect of the intervention is given by an odds ratio of $e^{0.8}$, so the probability that $Y = 1$ when $Z = 1$ is approximately 0.20. In the second, the probability that $Y = 1$ when $Z = 0$ is 0.30, the effect of the intervention is given by an odds ratio of $e^{0.45}$, and the probability that $Y = 1$ when $Z = 1$ is approximately 0.40. Table 1 presents the scenarios considered for the response probabilities among the different combinations of $Y$ and $Z$, together with the resultant odds ratios conditional on $R = 1$, the response bias expected in the sample odds ratio, and the proportional response bias expected in $OR_{YZ \mid R=1}$ relative to $OR_{YZ}$. We considered a total sample size of $N = 2000$ and a Bernoulli(0.5) distribution for the intervention variable $Z$. For the first scenarios there are differences in response rates between those with and without successful outcomes but, since these rates do not vary by intervention arm, there is no response bias for the odds ratio for the association between outcome and intervention; $\delta$ and $\phi$ in the selection model formulation are both zero and the percent response bias is zero. We then altered these response rates for the intervention and outcome combinations to generate increasing magnitudes of response bias in the odds ratios for the other scenarios.
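The expected complete case bias can also be computed directly from the response probabilities without the closed-form expression, since among respondents the odds of success in arm $z$ are the population odds multiplied by $q_{z1}/q_{z0}$. The sketch below (the function name is hypothetical) applies this to the Scenario 2 response probabilities listed in Table 1 with a true odds ratio of $e^{0.8}$.

```python
import math


def complete_case_bias(p0, log_odds_ratio, q00, q01, q10, q11):
    """Complete case odds ratio, its bias, and its proportional bias.

    q_zy = P(R=1 | Z=z, Y=y); for example q01 is the response rate for Z=0, Y=1.
    """
    true_or = math.exp(log_odds_ratio)
    odds0 = p0 / (1.0 - p0)            # odds of Y=1 when Z=0
    odds1 = odds0 * true_or            # odds of Y=1 when Z=1
    cc_odds0 = odds0 * q01 / q00       # odds of Y=1 among respondents, Z=0
    cc_odds1 = odds1 * q11 / q10       # odds of Y=1 among respondents, Z=1
    cc_or = cc_odds1 / cc_odds0
    bias = cc_or - true_or
    return cc_or, bias, bias / true_or


cc_or, bias, proportional = complete_case_bias(
    0.10, 0.8, q00=0.70, q01=0.75, q10=0.70, q11=0.80)
print(round(cc_or, 2), round(bias, 2), round(proportional, 2))  # 2.37 0.15 0.07
```

When the response rates differ by outcome but not by arm, as in Scenario 1, the two ratios $q_{z1}/q_{z0}$ cancel and the complete case odds ratio is unbiased.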
We have no specific context to use to specify the intervals for $d_z$ here, so we used upper bounds of 90% and 95% for the response rates for combinations of intervention and outcome, as these seem to be generally appropriate upper bounds on response rates. Using the observed sample response rates when prespecified intervals are unavailable is an alternative approach. Here, for a given sample we would expect the number of observations in each intervention arm to be near 1000 and the proportion of nonrespondents in each arm to have a sample standard error of approximately 1.5% or less. The sample proportion of respondents in each arm should then be close to the population model proportions, $q_z$, in Table 1. For a given scenario, we considered one set of intervals given by $(d_0^m, d_0^M) = (0, 0.95 - q_0)$ and $(d_1^m, d_1^M) = (0.05, 0.95 - q_1)$. The intervals for the $d_z$ convey a sense of confidence that there is some differential response, with somewhat greater confidence of a differential response when $Z = 1$, but include smaller degrees of differential response in addition to more substantial and concerning differences in response. A second set of intervals specified $(d_0^m, d_0^M) = (-0.05, 0.90 - q_0)$ and $(d_1^m, d_1^M) = (0, 0.95 - q_1)$, shifting the intervals to the left to put more weight on situations where response rates are more comparable between those with successful and unsuccessful outcomes. A third set of intervals used a large shift to the left, adding more weight to situations where response rates are higher among those with unsuccessful outcomes, specifying $(d_0^m, d_0^M) = (-0.10, 0.90 - q_0)$ and $(d_1^m, d_1^M) = (-0.10, 0.95 - q_1)$. For each scenario and each interval set, we drew 250 random samples from the specified population model and, for each sample, implemented an analysis using the full data with no nonresponse, a CCA setting $Y$ to a missing value when $R = 0$, and analyses using the Bayesian PMM methods with the intervals for $(d_z^m, d_z^M)$ specified in Table 2.

For the models using the full data and the observed data we specified a logistic regression of the outcome including intercept and intervention effect parameters. Prior specifications for the intercept and intervention effect were both normal distributions with mean 0 and variance $10^4$. We implemented a PMM analysis as outlined above in Section 3, specifying vague normal distributions for the regression parameters $\gamma_0$, $\gamma_1$, $\beta_{01}$, and $\beta_{11}$ and uniform distributions on the intervals $(d_z^m, d_z^M)$; we refer to this approach as the Vague Normal Prior (VNP) PMM in the tables and discussion below. We also implemented a beta-binomial model with noninformative Beta prior distributions for the response and observed outcome rates and a uniform distribution on $(d_z^m, d_z^M)$. We used Beta(0.5, 0.5) prior distributions for the proportions of response and of positive outcome given a response, $R = 1$. In the discussion and tables below, we refer to this method as the Conjugate PMM.

TABLE 1
Response probabilities $q_{zy}$, arm-level response rates $q_z$, differences $d_z = q_{z1} - q_z$, complete case odds ratio, proportional response bias, and response bias for each scenario and true odds ratio.
Scenario  q_00  q_01  q_10  q_11  OR_YZ   q_0    d_0    q_1   d_1   OR_YZ|R=1  Proportional bias  Bias
1         0.70  0.80  0.70  0.80  e^0.8   0.71   0.09   0.72  0.08  2.23       0.00               0.00
1         0.70  0.80  0.70  0.80  e^0.45  0.73   0.07   0.74  0.06  1.57       0.00               0.00
2         0.70  0.75  0.70  0.80  e^0.8   0.705  0.045  0.72  0.08  2.37       0.07               0.15
2         0.70  0.75  0.70  0.80  e^0.45  0.715  0.035  0.74  0.06  1.67       0.07               0.10
3         0.70  0.80  0.65  0.85  e^0.8   0.71   0.09   0.69  0.16  2.55       0.14               0.32
3         0.70  0.80  0.65  0.85  e^0.45  0.73   0.07   0.73  0.12  1.79       0.14               0.23
4         0.75  0.75  0.70  0.85  e^0.8   0.75   0.00   0.73  0.12  2.70       0.21               0.48
4         0.75  0.75  0.70  0.85  e^0.45  0.75   0.00   0.76  0.09  1.90       0.21               0.34

TABLE 2
Prior intervals for $d_0$, $d_1$ considered in simulation study.
Interval set 1  $d_0 \in (0, 0.95 - q_0)$  $d_1 \in (0.05, 0.95 - q_1)$
Interval set 2  $d_0 \in (-0.05, 0.90 - q_0)$  $d_1 \in (0, 0.95 - q_1)$
Interval set 3  $d_0 \in (-0.10, 0.90 - q_0)$  $d_1 \in (-0.10, 0.90 - q_1)$
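The data-generating step of a single simulation replicate can be sketched as follows, assuming the Scenario 3 response probabilities and a true odds ratio of $e^{0.8}$. The helper names, the seed, and the 0.5 added to each cell of the 2x2 table are choices made for this illustration only; the published analyses used Bayesian logistic regression and the PMM methods described above rather than this simple sample odds ratio.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_trial(n_total, p0, log_odds_ratio, q, rng):
    """Return a list of (z, y, r); y is generated for everyone, r flags observation."""
    p1 = expit(math.log(p0 / (1.0 - p0)) + log_odds_ratio)
    records = []
    for _ in range(n_total):
        z = int(rng.random() < 0.5)                # Bernoulli(0.5) assignment
        y = int(rng.random() < (p1 if z else p0))  # outcome given arm
        r = int(rng.random() < q[(z, y)])          # response given arm and outcome
        records.append((z, y, r))
    return records


def sample_odds_ratio(records, respondents_only):
    """Sample odds ratio for Y by Z, with 0.5 added to each cell."""
    counts = {(z, y): 0.5 for z in (0, 1) for y in (0, 1)}
    for z, y, r in records:
        if r == 1 or not respondents_only:
            counts[(z, y)] += 1
    return (counts[(1, 1)] * counts[(0, 0)]) / (counts[(1, 0)] * counts[(0, 1)])


# Scenario 3 response probabilities, keyed by (z, y).
q_scenario3 = {(0, 0): 0.70, (0, 1): 0.80, (1, 0): 0.65, (1, 1): 0.85}
rng = random.Random(7)
records = simulate_trial(2000, 0.10, 0.8, q_scenario3, rng)
print("full data OR:", round(sample_odds_ratio(records, respondents_only=False), 2))
print("complete case OR:", round(sample_odds_ratio(records, respondents_only=True), 2))
```

Repeating this step 250 times and applying each analysis to every replicate would give the averages of bias, coverage, and interval length that the simulation study reports.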
Figures 1 and 2 present the results of the simulations. For the first scenario in each figure, where there is differential response by outcome but no overall response bias in the odds ratio, the PMM methods incorporating the assumptions on the response rates exhibit bias in the point estimates unless the intervals for the $d_z$ place more equitable weight on both positive and negative values. In the other three scenarios in each figure, as the response bias present in the population model increases, we see increasing bias in the CCA odds ratio estimates and increasingly poor coverage for the corresponding interval estimates. For the PMM approach, we see a reduction in bias in the point estimates relative to the CCA when the specified intervals more accurately capture the differential response, and bias comparable to the CCA bias when the intervals are wider and centered nearer to the origin. The coverage of the posterior credible intervals for the PMM methods incorporating the response rate information is higher than that of the CCA and comparable to or higher than that of the full data analyses. This increased coverage comes with corresponding increases in the width of the intervals.
The remaining bias in the PMM point estimates is not surprising given the lack of specificity in the intervals. The performance of the interval coverage, though, is reassuring. The performance of this method would be expected to diminish as the width of the intervals specified to contain $d_z$ increases or as the center of the specified interval is shifted from the true value for $d_z$. Similarly, the performance would be improved if the specified intervals are more narrowly centered around the true values for the $d_z$. To illustrate these shifts in performance we implemented additional simulations for scenarios 1 and 3 above using the intervals specified in Table 3. The first set of intervals for each scenario shifts the posited location for the $d_z$ away from the true values, while the second set of intervals more narrowly focuses the intervals around the true values for the $d_z$ than considered in the initial set of simulations. We implemented the same process as above. The results for these simulations are illustrated in Figure 3. For the highly misspecified intervals we see that the bias in the estimate derived from the PMM method is substantial in the first scenario and larger than the CCA bias in the third scenario; the interval coverage remains high, though this is not always the case with highly misspecified intervals. For the intervals more narrowly centered around the true values for the $d_z$ we see that the bias in the point estimate for the PMM method is small, the interval coverage remains high, and the intervals are narrower than the intervals in the simulations above and comparable in length to the CCA intervals.

FIGURE 1 Summary of Simulation Results considering Odds Ratio $OR_{YZ} = e^{0.8}$. Results are presented for each scenario considering an odds ratio of $e^{0.8}$ for the association between $Y$ and $Z$. The plots present the average bias, interval coverage, and interval length from 250 replications of a Bayesian analysis of the full data set using vague normal priors (⬦), a Bayesian analysis of the observed data using vague normal priors (△), an analysis using a vague normal prior PMM (•), and a conjugate priors PMM analysis (◼).

FIGURE 2 Summary of Simulation Results considering Odds Ratio $OR_{YZ} = e^{0.45}$. Results are presented for each scenario considering an odds ratio of $e^{0.45}$ for the association between $Y$ and $Z$. The plots present the average bias, interval coverage, and interval length from 250 replications of a Bayesian analysis of the full data set using vague normal priors (⬦), a Bayesian analysis of the observed data using vague normal priors (△), an analysis using a vague normal prior PMM (•), and a conjugate priors PMM analysis (◼).
In the Web Supplement we present results of simulations considering these situations with a 10% response rate when Z = 0 but with smaller associations between Y and Z. For the sample size of 2000, we evaluated the performance of the conjugate PMM for log odds ratios in {0, 0.15, 0.30, 0.45, 0.60}. The proposed PMM method performed similarly relative to CCA in the scenarios with smaller true odds ratios, and the patterns observed in Figures 1 and 2 were also observed in these simulations. Moreover, when we considered an overall sample size of 500, the results continued to follow the patterns observed in simulations with the larger sample size.
FIGURE 3 Summary of Simulation Results with Less Accurate and More Accurate Interval Specification. Results are presented for scenarios 1 and 3 with an odds ratio of $e^{0.8}$ for the association between Y and Z and intervals specified in Table 3. Labels A and B refer to intervals for $d_z$ that have locations far away from and narrowly focused around the true values of $d_z$, respectively. The plots present the average bias, interval coverage, and interval length from 250 replications of Bayesian analyses of the full data set using a beta-binomial model (◼), a Bayesian analysis of the observed data using a beta-binomial model (△), and an analysis using the conjugate priors PMM analysis (•).
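As an illustration of how performance measures of the kind summarized in the figures can be computed, the following Python sketch estimates average bias, interval coverage, and interval length for a complete case beta-binomial analysis over 250 replications. It is a minimal sketch, not the simulation code used for the results above: the control-arm success probability, the response probabilities by arm and outcome, the per-arm sample size, and the use of equal-tailed 95% intervals are all assumptions chosen for illustration, and the function name `cca_performance` is hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def cca_performance(true_log_or=0.8, p0=0.2,
                    resp=((0.70, 0.80), (0.60, 0.80)),
                    n_per_arm=1000, n_reps=250, n_draws=2000, seed=0):
    """Bias, coverage, and length of a complete case beta-binomial analysis.

    resp[z][y] is an ASSUMED response probability P(R=1 | Y=y, Z=z);
    p0 is an ASSUMED success probability in arm Z=0.
    """
    rng = np.random.default_rng(seed)
    # Success probabilities implied by p0 and the true log odds ratio.
    p = (p0, expit(np.log(p0 / (1.0 - p0)) + true_log_or))
    est, cover, length = [], [], []
    for _ in range(n_reps):
        logit_draws = []
        for z in (0, 1):
            y = rng.random(n_per_arm) < p[z]
            # Outcome-dependent response: missing not at random.
            r = rng.random(n_per_arm) < np.where(y, resp[z][1], resp[z][0])
            succ, fail = np.sum(y & r), np.sum(~y & r)
            # Conjugate Beta(0.5, 0.5) posterior using responders only.
            theta = rng.beta(0.5 + succ, 0.5 + fail, n_draws)
            logit_draws.append(np.log(theta / (1.0 - theta)))
        log_or = logit_draws[1] - logit_draws[0]
        lo, hi = np.percentile(log_or, [2.5, 97.5])  # equal-tailed interval
        est.append(log_or.mean())
        cover.append(lo <= true_log_or <= hi)
        length.append(hi - lo)
    return {"bias": np.mean(est) - true_log_or,
            "coverage": np.mean(cover),
            "length": np.mean(length)}

result = cca_performance()
print(result)
```

Under these assumed response probabilities the complete case estimate of the log odds ratio is shifted upward, because those with successful outcomes respond relatively more often in the intervention arm than in the control arm.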

APPLICATION TO THE OPT-IN TRIAL
The Offering Proactive Treatment Intervention (OPT-IN) Study 25 RCT compared the effects of a proactive tobacco cessation intervention, relative to usual care, on smoking cessation rates among a population-based sample of 2406 individuals currently smoking cigarettes and enrolled in the Minnesota Health Care Programs, the state's publicly funded healthcare programs for low-income populations. The proactive intervention comprised personalized mailings and telephone calls as outreach to smokers to foster use of tobacco cessation treatments. Participants were randomized to study arms, usual care (n = 1206) or proactive intervention (n = 1200), within strata based on age, gender, and program enrollment within the Minnesota Health Care Programs. Randomization resulted in a roughly even distribution across intervention arms for demographic and smoking history variables including cigarettes smoked per day, types of cigarettes smoked, and cessation treatments previously used. At the 12-month postrandomization follow-up assessment, 16.5% of respondents in the proactive intervention arm and 12.1% of respondents in the usual care arm reported 6-month smoking abstinence, yielding an observed increase of 4.4 percentage points in abstinence rates in the proactive outreach arm. Analysis of the observed data yielded an estimated odds ratio of 1.47 with 95% confidence interval (1.12, 1.93). However, follow-up assessment was completed for only 68% of the proactive outreach arm and 78% of the usual care arm. A range of selection models fit using the Expectation Maximization (EM) algorithm of Ibrahim and Lipsitz, 26 incorporating a breadth of participant covariates, resulted in estimates ranging from 50% (OR = 1.50, 95% CI [1.16, 1.95]) to 68% (OR = 1.68, 95% CI [1.31, 2.16]) for the increase in the odds of smoking abstinence in the proactive outreach arm compared to the usual care arm.
We applied the Conjugate Prior PMM method to the reported OPT-IN trial results assuming that those who continued smoking cigarettes would generally be less likely than, or about as likely as, those who had quit smoking cigarettes to respond to the study follow-up assessment. We considered a collection of different sets of intervals $(d_z^m, d_z^M)$ to assess the potential impact of response bias. For each set of intervals, we implemented the PMM approach using the conjugate beta-binomial formulation with Beta(0.5, 0.5) prior distributions and uniform distributions on the intervals for the $d_z$, with 5000 draws from the generated posterior distributions. The sets of intervals considered are presented in Table 4. Given the large difference in response rates between the two arms, in the first three scenarios for $d_z$ we considered greater differences in response rates between those abstinent from smoking and those continuing to smoke in the proactive outreach arm than we did in the usual care arm. For these post hoc analyses we chose an upper bound of 90% on the response rate among those who quit smoking based on the overall response rates in the arms, the limited contact with the usual care arm over the course of the study, and the potentially high transitoriness of enrollment and engagement of the population with the healthcare programs.
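The following Python sketch shows one way an analysis of this form could be carried out. It rests on assumptions that should be checked against the model definition given earlier: we take $d_z$ to be the difference in response probability between those with and without a successful outcome in arm z, $d_z = P(R=1 \mid Y=1, Z=z) - P(R=1 \mid Y=0, Z=z)$. Writing $p = P(Y=1 \mid Z=z)$, $\pi = P(Y=1 \mid R=1, Z=z)$, and $q = P(R=1 \mid Z=z)$, Bayes' rule gives $d_z = q[\pi/p - (1-\pi)/(1-p)]$, which rearranges to $d_z p^2 - (d_z + q)p + q\pi = 0$. The function names, the discarding of draws that imply response probabilities above 1, and the interval endpoints are hypothetical; the counts are back-calculated from the rounded percentages reported above and are therefore approximate, not the trial's exact data.

```python
import numpy as np

def marginal_success_prob(pi_obs, q, d):
    """Root in [0, 1] of d*p**2 - (d + q)*p + q*pi_obs = 0.

    Written in a form that remains well defined at d = 0,
    where it reduces to p = pi_obs.
    """
    s = d + q
    return 2.0 * q * pi_obs / (s + np.sqrt(s**2 - 4.0 * d * q * pi_obs))

def conjugate_pmm_or(arms, d_intervals, n_draws=5000, a=0.5, b=0.5, seed=None):
    """Posterior draws of the odds ratio (arm 1 versus arm 0).

    arms: per arm, (n randomized, n responders, n successes among responders).
    d_intervals: per arm, (lower, upper) bounds of the uniform prior on d_z.
    """
    rng = np.random.default_rng(seed)
    keep = np.ones(n_draws, dtype=bool)
    p = []
    for (n, n_resp, n_succ), (d_lo, d_hi) in zip(arms, d_intervals):
        q = rng.beta(a + n_resp, b + n - n_resp, n_draws)            # response rate
        pi_obs = rng.beta(a + n_succ, b + n_resp - n_succ, n_draws)  # success rate among responders
        d = rng.uniform(d_lo, d_hi, n_draws)                         # differential response
        p_z = marginal_success_prob(pi_obs, q, d)
        # ASSUMPTION: drop draws implying response probabilities above 1.
        q1 = pi_obs * q / p_z
        q0 = (1.0 - pi_obs) * q / (1.0 - p_z)
        keep &= (q1 <= 1.0) & (q0 <= 1.0)
        p.append(p_z)
    odds = [p_z[keep] / (1.0 - p_z[keep]) for p_z in p]
    return odds[1] / odds[0]

# Approximate counts reconstructed from the reported percentages (usual care first).
arms = [(1206, 941, 114), (1200, 816, 135)]
# HYPOTHETICAL intervals for (d_0, d_1); these are not the Table 4 values.
d_intervals = [(0.0, 0.10), (0.0, 0.20)]

or_draws = conjugate_pmm_or(arms, d_intervals, seed=1)
print(or_draws.mean())
```

Setting both intervals to (0, 0) removes the differential response and reproduces a complete case beta-binomial analysis, which gives a convenient check on the implementation.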
In the first scenario we considered somewhat comparable rates by smoking status in the usual care arm and a shift to higher nonresponse among those continuing to smoke in the proactive arm. The second scenario considers a range of larger differences in response in both arms but with the same upper bounds. The third and fourth scenarios consider variations on these specifications, including smaller differences in response rates in the arms. These scenarios then investigate whether the larger rate of abstinence in the proactive outreach arm may be driven by higher rates of nonresponse among those continuing to smoke in this arm. The seventh through ninth scenarios consider equal intervals for each arm with the range of differences closer to zero. The eighth scenario assumes rates are generally comparable between those who quit smoking and those who continue to smoke, while the seventh and ninth consider various shifts and expansions of the intervals. The fourth through sixth scenarios fall between these prior two sets. The common assumption in tobacco cessation studies is that those continuing to use tobacco are less likely to respond. The ninth scenario considers a set of intervals in contrast to this assumption. The tenth scenario examines somewhat larger shifts from the supposed direction for the proactive outreach arm than for the usual care arm, with generally comparable rates in the usual care arm.
The results of these analyses are presented in Table 4. The table presents the mean of the posterior distribution for the odds ratio along with 95% and 90% highest posterior density intervals and posterior probabilities that the odds ratio is larger than 1, 1.1, and 1.2. The eighth scenario assumes the response rates do not differ much with the outcome, and the point estimate for the odds ratio is similar to that obtained from a CCA. The seventh and ninth scenarios consider the same intervals for $d_0$ and $d_1$, shift these slightly off the origin and, in the seventh scenario, increase the width of the interval. Overall, though, the results for these scenarios are fairly comparable to those from the eighth scenario, with slightly wider interval estimates and lower probabilities of the odds ratio exceeding the presented thresholds. As the intervals specified for the $d_z$ shift to larger positive values, with greater shifts for $d_1$, from Scenario 6 down to Scenario 1, the point estimates for the odds ratio and the associated posterior probabilities of the odds ratio surpassing the larger thresholds diminish. However, outside of Scenario 1, with its large shift to positive values for the proactive arm and a narrow interval around the origin for the usual care arm, the second through fifth scenarios yield reduced point estimates for the odds ratio between 1.23 and 1.33, with posterior probabilities greater than 0.90 for the odds ratio being greater than 1 and posterior probabilities greater than 0.75 for the odds ratio being greater than 1.1. There is some attenuation in the magnitude of the likely odds ratio in these scenarios, but there remains substantial posterior probability of an intervention effect. The results for the ninth and tenth scenarios indicate that if those with negative outcomes are more likely to have responded, with larger differences in the proactive outreach arm, then the observed results may underestimate the intervention effect.
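Summaries of this kind can be computed directly from the posterior draws of the odds ratio. The Python sketch below is illustrative only: the function names are hypothetical, the highest posterior density interval is taken as the shortest interval containing the required share of the sorted draws (which assumes a unimodal posterior), and the draws themselves are simulated placeholders rather than OPT-IN results.

```python
import numpy as np

def hpd_interval(draws, mass=0.95):
    """Shortest interval containing `mass` of the draws (unimodal posterior assumed)."""
    x = np.sort(np.asarray(draws))
    n = x.size
    k = int(np.ceil(mass * n))             # number of draws the interval must contain
    widths = x[k - 1:] - x[:n - k + 1]     # width of each window of k consecutive sorted draws
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

def summarize_or(draws, thresholds=(1.0, 1.1, 1.2)):
    """Posterior mean, 95% and 90% HPD intervals, and exceedance probabilities."""
    draws = np.asarray(draws)
    summary = {
        "mean": draws.mean(),
        "hpd95": hpd_interval(draws, 0.95),
        "hpd90": hpd_interval(draws, 0.90),
    }
    for t in thresholds:
        summary[f"P(OR>{t})"] = np.mean(draws > t)
    return summary

# PLACEHOLDER draws for illustration; not results from the trial.
rng = np.random.default_rng(0)
or_draws = np.exp(rng.normal(0.3, 0.14, 5000))
summary = summarize_or(or_draws)
print(summary)
```

Because the posterior of an odds ratio is typically right-skewed, the highest posterior density interval will generally sit slightly to the left of the equal-tailed interval with the same coverage.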
The PMM analyses implemented here tended to result in lower point estimates for the association between intervention and outcome than the EM algorithm selection models employed in Fu et al, 25 though there is substantial overlap between the interval estimates for the models presented here and those for the models selected for focus in the initial OPT-IN analysis based on model fit criteria; the credible intervals here cover the point estimates obtained from the EM algorithm selection models thought to have the best fit. In line with the discussion in the introduction, the numerous selection models fit to the OPT-IN data in Fu et al, using different collections of covariates and interactions, exhibit a wide spread of odds ratio estimates. We see a similarly wide spread of point estimates in the PMM results presented here. Notably, the results presented by Fu et al with the larger odds ratios, which had the worse fit criteria statistics, are more consistent with the models presented here in which response rates would be higher among those who continued to smoke cigarettes. This correspondence is useful for interpreting the analyses assessing the impact of missing data. The 10 models presented here were fit in a matter of seconds, whereas the EM algorithm models take orders of magnitude longer to fit. The PMM approach presented here can serve as a readily implemented method to augment more complex methods, and it does not add significant computational burden to a more robust assessment of the impact of missing data on study results.

DISCUSSION AND CONCLUSIONS
We have discussed the structure and examined the performance of PMMs that incorporate knowledge or assumptions on response rates among groups with different outcomes. The approach uses a simple extension of the RCT structure to include a classification of participants as responders and nonresponders and models the differences in outcome between responders and nonresponders using response rates as prior information. This approach retains the simplicity of the RCT framework and has a substantial computational advantage over more mathematically complex approaches to missing outcomes, with perhaps little cost in terms of the complexity of results. The PMM method generates point estimates with lower bias relative to the CCA estimates when the intervals for the differential response are accurately located. The resultant interval estimates perform well in terms of coverage but can be wide. Implementation of the beta-binomial form of the PMM is simple and does not require complex modeling techniques beyond Monte Carlo sampling from posterior distributions. This approach may provide an attractive and appropriate inferential method for those without extensive expertise or comfort with the more complex approaches to addressing missing data in RCT analyses.
We observed that the vague normal prior PMM typically underperformed in comparison to the conjugate PMM, even though the former method is more readily extended to include covariates. Furthermore, the PMM methods can be combined in a single analysis in which we first obtain initial estimates of the odds ratio of outcome success and the estimated effect of the outcome values on response from a conjugate PMM. Following this step, we could employ the vague normal prior PMM approach while adjusting for covariates, using the conjugate PMM's estimated posterior distributions of the odds ratios for intervention success and of the effect of the outcome value on response as prior distributions. Given the results here, though, alternate prior distributions may perform better than a normal prior distribution; a prior distribution with naturally heavier tails, such as the t-distribution, may reduce shrinkage of relevant non-null effects.
There is a trade-off here relative to the use of the more complex approaches. We reemphasize that the beta-binomial PMM structure is much less computationally intensive to implement than multiple imputation, PMM, and SM approaches using many covariates. We assign relatively vague prior distributions for the differences in response rates in each arm. This stands in contrast to the use of other PMM and SM models, such as those discussed by Kaciroti and Raghunathan 21 and Linero, 9 where series of highly informative prior distributions are used without a clear sense of how to synthesize disparate results. PMM or SM using extensive sets of covariates may result in decreased bias of the point estimates and possibly narrower interval estimates than the PMM proposed here. However, these analyses require specification of plausible selection or pattern mixture models via prior knowledge, their results can be sensitive to the models employed, and their results can vary substantively across possible models, just as the results of the PMM analyses for the OPT-IN trial varied across the intervals considered. The relative benefits of the different approaches in practice may then be difficult to assess. Assessing the sensitivity of results to plausible choices for these intervals, rather than taking on the complexity of specifying and implementing more elaborate pattern mixture or selection models and assessing variation in their results, could be seen as a fair trade. Alternatively, this simpler, readily implemented PMM approach can be used to augment the more complex approaches.

TABLE 1 Population model scenarios for simulation studies. Column groups: scenarios with $OR_{YZ} = e^{0.8}$ and scenarios with $OR_{YZ} = e^{0.45}$. Columns: $q_{zy}$, $q_z$, $d_z$, $OR_{ZY|R=1}$, %Bias, Bias. (Table body not recovered.)

TABLE 4 Summary of PMM analyses of the OPT-IN prolonged abstinence outcome.