We used a data-generating process identical to one used in a prior study that examined optimal caliper widths for propensity-score matching [10]. Briefly, we simulated data sets in which approximately 25 per cent of the sample was exposed to the treatment. The data-generating process was designed to induce a specific average treatment effect for the treated (ATT), the measure of effect that is estimated when propensity-score matching is used [11]. Furthermore, in the simulated data sets, the marginal probability of the outcome would be approximately 0.29 if no subject in the population were exposed. We examined scenarios in which the risk differences due to treatment in treated subjects were 0, −0.02, −0.05, −0.10 and −0.15 (i.e. absolute reductions in the probability of the outcome due to treatment of 0, 0.02, 0.05, 0.10 and 0.15). First, we randomly generated 10 covariates (*X*_{1}−*X*_{10}) from independent standard normal distributions for each of 10 000 subjects. We then assumed that the following logistic regression model related the probability of treatment to these 10 baseline covariates:

We then generated a treatment status indicator (*Z*_{i}) for each subject from a Bernoulli distribution with subject-specific probability equal to *p*_{i, treat}. Subjects with *Z*_{i} = 1 are the treated subjects, in whom the ATT is defined. We assumed that the following logistic regression model related the probability of the outcome to these covariates and to an indicator variable (*Z*) denoting treatment:

In the above regression model, *p*_{i, outcome} denotes the probability of the outcome for the *i*th subject and β denotes the log-odds ratio relating treatment to the outcome. We then generated subject-specific outcomes from a Bernoulli distribution with probability *p*_{i, outcome}. The regression coefficients for the baseline covariates in the above two regression models were set as follows: α_{L} = log(1.1), α_{M} = log(1.25), α_{H} = log(1.5) and α_{VH} = log(2). These are intended to reflect low, medium, high and very high effect sizes. We fixed α_{0, treat} at −1.344090 so that approximately 25 per cent of the subjects would be treated. We fixed α_{0, outcome} at −1.098537 so that the probability of the event occurring in the population if all subjects were untreated would be approximately 0.29. To induce a risk difference of 0, β was set to 0. For risk differences of −0.02, −0.05, −0.10 and −0.15, the required values of β were log(0.90619), log(0.7795362), log(0.6001387) and log(0.45292), respectively. A more detailed explanation of how these values of β were determined is given elsewhere [12].
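The data-generating process described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code. In particular, the text as it survives here does not state which covariates carry which coefficient, so the allocation below (α_{L} for *X*_{1}−*X*_{3}, α_{M} for *X*_{4}−*X*_{6}, α_{H} for *X*_{7}−*X*_{9}, α_{VH} for *X*_{10}) is an assumption, as are the names `simulate`, `ALPHA` and `expit`. The function also returns the expected risk difference among treated subjects, computed from the subject-specific outcome probabilities with and without treatment.

```python
import numpy as np

# ASSUMPTION: the allocation of coefficients to covariates is not given in the
# text. Here X1-X3 get alpha_L, X4-X6 alpha_M, X7-X9 alpha_H, X10 alpha_VH.
ALPHA = np.log([1.1] * 3 + [1.25] * 3 + [1.5] * 3 + [2.0])
ALPHA0_TREAT = -1.344090    # intercept of the treatment-selection model
ALPHA0_OUTCOME = -1.098537  # intercept of the outcome model


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def simulate(n=10_000, beta=np.log(0.7795362), seed=0):
    """Simulate one data set under the independent normal covariates scenario."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, 10))          # 10 independent N(0, 1) covariates
    lin = x @ ALPHA                           # covariate part of both models

    p_treat = expit(ALPHA0_TREAT + lin)       # subject-specific P(treatment)
    z = rng.binomial(1, p_treat)              # treatment status Z_i

    p0 = expit(ALPHA0_OUTCOME + lin)          # P(outcome) if untreated
    p1 = expit(ALPHA0_OUTCOME + beta + lin)   # P(outcome) if treated
    y = rng.binomial(1, np.where(z == 1, p1, p0))

    # Expected risk difference in the treated (the ATT on the risk scale).
    att_rd = float(np.mean(p1[z == 1] - p0[z == 1]))
    return {"x": x, "z": z, "y": y, "p0": p0, "att_rd": att_rd}


if __name__ == "__main__":
    sim = simulate()
    print("proportion treated:      ", sim["z"].mean())
    print("P(outcome) if untreated: ", sim["p0"].mean())
    print("risk difference (treated):", sim["att_rd"])
```

If the assumed allocation matches the original models, the three printed quantities should fall close to the targets stated in the text (about 0.25, 0.29 and −0.05 for the default β = log(0.7795362)), which gives a quick check on the intercepts and on β.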

The above scenario assumed that the 10 covariates (*X*_{1}−*X*_{10}) were independently distributed standard normal random variables. We also examined four additional covariate scenarios. In the second covariate scenario, the 10 covariates were drawn from a multivariate normal distribution such that the mean and variance of each random variable were equal to 0 and 1, respectively, while the correlation between each pair of random variables was equal to 0.25. In the third covariate scenario, the first five covariates (*X*_{1}−*X*_{5}) were assumed to be independent Bernoulli random variables with parameter 0.5, while the last five covariates (*X*_{6}−*X*_{10}) were assumed to be independent standard normal random variables. In the fourth covariate scenario, the first nine covariates were assumed to be independent Bernoulli random variables with parameter 0.5, while the tenth covariate was a standard normal random variable. In the fifth covariate scenario, all 10 covariates (*X*_{1}−*X*_{10}) were independent Bernoulli random variables with parameter 0.5. In each of these scenarios, the values of α_{0, treat}, α_{0, outcome} and β were modified in order to preserve the proportion of treated subjects, the marginal probability of the outcome, and the required treatment effect. We refer to the five covariate scenarios as the independent normal covariates scenario, the correlated normal covariates scenario, the first mixed covariates scenario, the second mixed covariates scenario and the independent Bernoulli covariates scenario, respectively. Within each of the five covariate scenarios and for each of the five absolute risk reductions, we randomly generated 1825 data sets, each consisting of 10 000 subjects. We refer to the above set of 25 scenarios as the scenarios with a 0.29 outcome probability and a weak treatment-selection model.
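The five covariate scenarios, and the recalibration of an intercept to preserve a target proportion, might be sketched as follows. The scenario labels, the function names `draw_covariates` and `calibrate_intercept`, and the use of bisection are assumptions made for illustration; the text does not say how the modified intercepts were obtained. The coefficient allocation in `ALPHA` is likewise assumed (three low, three medium, three high and one very high coefficient, in covariate order).

```python
import numpy as np

# ASSUMPTION: allocation of coefficients to X1..X10 is not stated in the text.
ALPHA = np.log([1.1] * 3 + [1.25] * 3 + [1.5] * 3 + [2.0])


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def draw_covariates(scenario, n, rng):
    """Return an (n, 10) covariate matrix for one of the five scenarios."""
    if scenario == "independent_normal":
        return rng.standard_normal((n, 10))
    if scenario == "correlated_normal":
        cov = np.full((10, 10), 0.25)   # pairwise correlation 0.25
        np.fill_diagonal(cov, 1.0)      # unit variances
        return rng.multivariate_normal(np.zeros(10), cov, size=n)
    # Remaining scenarios: the first k covariates are Bernoulli(0.5),
    # the rest are independent standard normal.
    k = {"mixed_1": 5, "mixed_2": 9, "independent_bernoulli": 10}[scenario]
    binary = rng.binomial(1, 0.5, size=(n, k)).astype(float)
    normal = rng.standard_normal((n, 10 - k))
    return np.hstack([binary, normal])


def calibrate_intercept(lin, target, lo=-10.0, hi=10.0, tol=1e-8):
    """Bisection for the intercept a with mean(expit(a + lin)) == target.

    The mean is increasing in a, so bisection on [lo, hi] converges.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.mean(expit(mid + lin)) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    for name in ["independent_normal", "correlated_normal",
                 "mixed_1", "mixed_2", "independent_bernoulli"]:
        lin = draw_covariates(name, 200_000, rng) @ ALPHA
        a_treat = calibrate_intercept(lin, 0.25)   # 25 per cent treated
        a_out = calibrate_intercept(lin, 0.29)     # 0.29 if all untreated
        print(name, round(a_treat, 4), round(a_out, 4))
```

The intercepts found this way are Monte Carlo approximations based on one large simulated sample. Recalibrating β so that the risk difference in treated subjects hits its target is a separate step that is not shown here.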

We also examined three additional sets of 25 scenarios (the five covariate scenarios crossed with the five absolute risk reductions). In the first of these sets, the data-generating process was modified so that the probability of the outcome if all subjects were untreated was 0.15. We refer to this set of 25 scenarios as the scenarios with a 0.15 outcome probability and a weak treatment-selection model. We then modified both sets of 25 scenarios with a weak treatment-selection model by changing the weak treatment-selection model to a strong treatment-selection model. A strong treatment-selection process will induce greater differences in baseline covariates between treated and untreated subjects in the unmatched sample. In the two resulting sets of 25 scenarios, the coefficients for the treatment-selection model and the outcomes model were changed to: α_{L} = log(1.5), α_{M} = log(1.75), α_{H} = log(2), and α_{VH} = log(2.5). In the two sets of simulations in which there was a strong treatment-selection model, we observed low percentages of treated subjects successfully matched to untreated subjects in some of the covariate scenarios. Therefore, in these two sets of scenarios, we modified the data-generating process slightly to increase the number of potential control subjects. Within each replication of the Monte Carlo simulations, the initially generated data set was of size 1000 (rather than 10 000). Ten additional copies of each untreated subject were then created and added to the simulated sample. For each of these 10 copies, the outcome was generated independently using the same subject-specific probability of the outcome as for the original untreated subject.
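The augmentation step used in the strong treatment-selection scenarios can be illustrated as follows. This is a minimal sketch under the reading that each copy keeps the original untreated subject's covariates and outcome probability and receives its own independent Bernoulli outcome. The function name `augment_controls` and its arguments are hypothetical.

```python
import numpy as np


def augment_controls(z, p_outcome, y, n_copies=10, seed=0):
    """Append n_copies copies of every untreated subject.

    z          : 0/1 treatment indicators of the initial sample
    p_outcome  : subject-specific outcome probabilities of the initial sample
    y          : 0/1 outcomes of the initial sample
    Returns (idx, z_aug, y_aug), where idx maps every row of the augmented
    sample back to a row of the initial sample (use x[idx] for covariates).
    """
    rng = np.random.default_rng(seed)
    untreated = np.flatnonzero(z == 0)
    copy_idx = np.repeat(untreated, n_copies)       # each control n_copies times

    # Outcomes of the copies are drawn independently, but with the same
    # subject-specific probability as the subject they were copied from.
    y_copies = rng.binomial(1, p_outcome[copy_idx])

    idx = np.concatenate([np.arange(len(z)), copy_idx])
    z_aug = np.concatenate([z, np.zeros(len(copy_idx), dtype=z.dtype)])
    y_aug = np.concatenate([y, y_copies])
    return idx, z_aug, y_aug


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 1000                                        # initial sample size
    z = rng.binomial(1, 0.25, size=n)               # placeholder treatment draw
    p = rng.uniform(0.05, 0.5, size=n)              # placeholder probabilities
    y = rng.binomial(1, p)
    idx, z_aug, y_aug = augment_controls(z, p, y)
    print(len(z_aug), int(z_aug.sum()), int((z_aug == 0).sum()))
```

Starting from 1000 subjects of whom about 25 per cent are treated, the augmented sample keeps the same treated subjects while the pool of potential controls grows elevenfold. The treatment and probability draws in the usage example are placeholders, not the models described in the text.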