Barendregt proposes a new method to define an input distribution for a relative risk (RR), as used in probabilistic sensitivity analysis (PSA). He suggests his method is non-Bayesian, and as such does not use subjective, prior information. This suggestion is also furthered by the use of the term “parametric bootstrapping” instead of PSA, as bootstrapping is in essence a non-Bayesian method.
Although I sympathize with his goal of making analyses as objective as possible, and of avoiding the use of subjective information, unfortunately, as I will argue in this commentary, such objectivity is not achieved by the method described. And although the use of the term “bootstrapping” also implies non-Bayesian methodology, I do not believe that the term “parametric bootstrapping” in this context is the right term to use. Parametric bootstrapping, as I know it, is a method in which, first, a parametric model is fitted to the data. Then, the fitted model is used to generate a new set of data (mimicking the original data). In this context, this would mean that a set of individual patient data is generated from the fitted relative risk model. After that, the quantity of interest (e.g., a cost-effectiveness ratio) is calculated based on the newly generated individual patient data, and this procedure is repeated many times to obtain a distribution of cost-effectiveness ratios, which in turn can be used to obtain non-Bayesian confidence intervals of these cost-effectiveness ratios. An essential feature of this parametric bootstrapping procedure is that the RR used is the same during all repeating loops in the procedure, namely equal to the RR that results from fitting the model to the original data.
In contrast, in the uncertainty analysis described by Barendregt, a probability distribution for the RR is defined, and a different value of the RR is drawn from this distribution in each iteration of the uncertainty analysis. It is important to note that the term “distribution of the RR,” and therefore also the entire procedure, implies a Bayesian view, as for non-Bayesians the RR has only a single (although unknown) value, its true value, which always has a probability of 100%.
The derivation of Barendregt starts with the non-Bayesian confidence interval of the RR in Equation 4. From this equation, Equations 5 and 6 are derived as the “natural candidate” for the distribution of the RR. However, the distribution given in Equations 5 and 6 can only be derived from the confidence interval in Equation 4 when one assumes that an x% confidence interval means that there is x% probability that the RR lies in this interval. This is not the case. An x% confidence interval is not the interval in which the RR lies with x% probability; it is an interval with the property that if one repeats the experiment (or the observational study) many times, x% of those intervals will contain the true RR. In other words, the x% is a property of the group of similarly calculated intervals, and cannot be applied to a particular individual interval: each individual interval either contains the true RR or does not contain it.
In order to state the probability that a parameter lies in a particular interval, one needs information on the prior probability of the parameter. An example might clarify this: take the case where one randomly draws 10 balls (with replacement) from an urn containing white and black balls, and where the experiment results in 6 black and 4 white balls. Standard statistical analysis then delivers a point estimate for the prevalence of black balls of 0.6, with an (exact) 95% confidence interval of 0.26–0.88. Now, if all possible prevalence rates of black balls are equally likely (= prior information), the probability that the prevalence is between 0.26 and 0.88 is indeed approximately 95%. But if one knew that there are 10 balls in the urn, and that at most two of them can be black, then it is clear that the probability that the true prevalence is in the 95% confidence interval from this experiment is zero. Nevertheless, even with this knowledge, the interval 0.26–0.88 from the classical procedure is still a valid (non-Bayesian) 95% confidence interval. This case is just one of the unlucky, rare cases where the true value is not contained in the 95% confidence interval. Of course, in this particular example, the prior knowledge can be used to devise more powerful non-Bayesian procedures for this particular situation, yielding only intervals up to 0.2. However, the point I want to make with the example is that the probability that a parameter is in a standard confidence interval depends on more than the confidence interval itself: it also depends on a priori knowledge of the possible values of the parameter one is estimating.
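The numbers in the urn example can be checked with a short sketch. It assumes that the “exact” interval is the Clopper-Pearson interval, which I compute here by bisection on the binomial tail probabilities; the function names are mine and purely illustrative. The sketch also computes how often the procedure covers a true prevalence of 0.2 (two black balls out of ten) over repeated experiments.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k black balls in n draws with prevalence p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(k, n, p):
    """P(X >= k); increasing in p."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))

def solve_increasing(f, target, iterations=60):
    """Find p in (0, 1) with f(p) = target for an increasing function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion."""
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: prob_at_least(k, n, p), alpha / 2)
    # P(X > k) is increasing in p; the upper limit has P(X <= k) = alpha / 2.
    upper = 1.0 if k == n else solve_increasing(
        lambda p: prob_at_least(k + 1, n, p), 1 - alpha / 2)
    return lower, upper

def exact_coverage(true_p, n, alpha=0.05):
    """Share of repeated experiments whose interval contains true_p."""
    total = 0.0
    for k in range(n + 1):
        lower, upper = clopper_pearson(k, n, alpha)
        if lower <= true_p <= upper:
            total += binom_pmf(k, n, true_p)
    return total

lower, upper = clopper_pearson(6, 10)
print(f"95% interval for 6 black out of 10: {lower:.2f} to {upper:.2f}")
print(f"coverage when the true prevalence is 0.2: {exact_coverage(0.2, 10):.4f}")
```

The interval for the observed 6 out of 10 lies entirely above 0.2, yet the procedure as a whole still covers a true prevalence of 0.2 in at least 95% of repeated experiments. The 95% therefore describes the procedure, not this one interval.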
This would not be too bad if there were a unique way to define the prior probability of the parameter under ignorance. Unfortunately, that is not possible. At first sight, the prior assumption that all prevalence rates are equally likely might seem to be such a unique prior. However, it is not unique: one could also propose, for instance, the slightly different assumption that all odds of white against black balls are equally likely. Therefore, unambiguous “uninformative” priors do not exist.
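That the two assumptions differ follows from a change of variables. In the sketch below, the symbols are my own notation: $p$ is the prevalence of black balls and $o = (1-p)/p$ is the odds of white against black.

```latex
% Flat prior on the prevalence p, expressed on the odds scale o = (1-p)/p:
p = \frac{1}{1+o}, \qquad
\left|\frac{dp}{do}\right| = \frac{1}{(1+o)^{2}}, \qquad
f_P(p) = 1 \;\Longrightarrow\; f_O(o) = \frac{1}{(1+o)^{2}}, \quad o > 0 .

% Flat prior on the odds o, expressed on the prevalence scale:
\left|\frac{do}{dp}\right| = \frac{1}{p^{2}}, \qquad
f_O(o) \propto 1 \;\Longrightarrow\; f_P(p) \propto \frac{1}{p^{2}}, \quad 0 < p < 1 .
```

A prior that is flat on one scale is thus far from flat on the other, and the flat prior on the odds cannot even be normalized to a proper distribution.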
The mistake of interpreting an x% confidence interval as meaning that the parameter value lies in this interval with x% probability is one that is often made by users of statistics, especially as the correct interpretation does not seem very useful. Users are not looking for a statement on repeat performance of a statistical procedure, but for a statement on the value of a parameter, in this case the RR. In the century before the term “Bayesian” came into use some 50 years ago, this was called the “inverse probability,” indicating that it gives the probability of the parameter given the data, rather than the probability of the data given the true parameter value. At that time, it was already widely recognized that it is impossible to estimate this probability from data or objective facts only, and that this can only be done after making assumptions on the probability of different parameter values “under ignorance” [2–4]. In modern terms, in order to derive a probability distribution for a RR from data, one needs a prior probability distribution for the RR. There is no non-Bayesian way around this. Non-Bayesians basically solve the problem by declaring it inadmissible, as parameters like a relative risk for them only have an unknown true value, not a distribution.
So the new correction method proposed by Barendregt in his Equations 12 and 13 is based on the Bayesian notion that a parameter has a distribution, and thus his method unavoidably places an implicit prior distribution on the RR. The question is: is this implicit prior reasonable? If I reason back from the posterior distribution defined in Equations 12 and 13 to the prior behind it, then I find a prior that places the lowest probability on the observed RR, and higher probabilities on RR values farther removed from the observed RR. This does not seem a very sensible prior to me, and therefore I would not use the distribution defined by Equations 12 and 13.
If one wanted to carry out a non-Bayesian uncertainty analysis, bootstrapping would be an option. In the example given, it is possible to do parametric bootstrapping by sampling cases and controls from Poisson distributions with expectations of 20 and 40 cases, respectively. In this simple example, the parametric bootstrapping would yield results equal to those of nonparametric bootstrapping (presuming that the real data are generated by a Poisson process). Nonparametric bootstrapping is cited by Barendregt as the gold standard for uncertainty analysis. Such an approach yields an average RR of 0.513, closer to the average of 0.519 when using the distribution from Equations 5 and 6 than to the “desirable” point estimate of 0.5.
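The two averages can be reproduced approximately with a minimal sketch. The assumptions here are mine: the RR in each replicate is the ratio of the two sampled counts, and the distribution of Equations 5 and 6 is a lognormal with log-scale mean ln(20/40) and log-scale variance 1/20 + 1/40. The Poisson sampler is a plain implementation of Knuth's multiplication method, used only to avoid external libraries.

```python
import math
import random

def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def bootstrap_mean_rr(mean_a=20, mean_b=40, n_rep=50_000, seed=1):
    """Average RR over parametric bootstrap replicates of the two counts."""
    rng = random.Random(seed)
    total, kept = 0.0, 0
    for _ in range(n_rep):
        a = poisson(mean_a, rng)
        b = poisson(mean_b, rng)
        if b == 0:  # practically impossible for a mean of 40; guards the division
            continue
        total += a / b
        kept += 1
    return total / kept

def lognormal_mean_rr(a=20, b=40):
    """Mean of a lognormal with log-mean ln(a/b) and log-variance 1/a + 1/b."""
    return (a / b) * math.exp((1 / a + 1 / b) / 2)

print(f"bootstrap mean RR: {bootstrap_mean_rr():.3f}")
print(f"lognormal mean RR: {lognormal_mean_rr():.3f}")
```

Both averages lie above the point estimate of 0.5, because the ratio of two counts is a skewed quantity under either approach.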
A disadvantage, in my opinion, of the bootstrap approach is that it only delivers a (non-Bayesian) confidence interval on the cost-effectiveness ratio (or any other outcome), which can only be interpreted as an interval that contains the true value with a particular frequency, but not as a statement on the probability of a particular cost-effectiveness ratio. A Bayesian confidence (credibility) interval, on the other hand, has a much wider usefulness, and can be used, for instance, in value-of-information analysis [5,6].
Apart from these epistemological objections, the reasoning behind the proposed method does not convince me. Barendregt lists as desirable properties for the distribution to be used:
1. The type of distribution is based on the kind of variable, and the way the point estimate and confidence interval were obtained.
2. The distribution returns a mean that is equal to the point estimate.
3. The distribution returns an uncertainty interval that replicates the confidence interval of the point estimate.
As justification for the latter two criteria, he refers to consistency of the point estimate and the results from the uncertainty analysis, and to “neither under- nor over-representing the uncertainty implied by the confidence interval.” I agree with his first criterion, but the arguments given for the last two elude me. Both the point estimate and the confidence interval are theoretical constructs in the context of a particular statistical analysis, rather than truths that need to be reproduced exactly.
A minor comment on the method is that Barendregt states in the discussion that his second method returns an uncertainty interval with a width that replicates the width of the confidence interval of the original point estimate. Strictly speaking, this cannot be true, as the method (Equations 12 and 13) delivers a lognormal distribution with a variance equal to that of the original lognormal distribution (Equations 5 and 6). As the shapes of the two distributions differ, this implies unequal widths of, for instance, the 95% confidence intervals. In the example given by Barendregt, the difference is small, but its existence can be illustrated with a more extreme example (2 and 4 cases, instead of 20 and 40). Here, the 95% confidence interval from Equations 5/6 would have been 0.09–2.73 (a width of 2.64), while the second correction method from Equations 12/13 would yield a confidence interval of 0.03–2.36, a width of 2.33.
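The comparison of widths can be sketched as follows. This is my reconstruction from the description above, not code or formulas taken from Barendregt's paper. I assume that Equations 5 and 6 define a lognormal with log-mean ln(a/b) and log-variance 1/a + 1/b, and that Equations 12 and 13 define the lognormal whose mean equals the point estimate a/b and whose variance equals that of the first distribution. With these assumptions the sketch reproduces the intervals quoted for 2 and 4 cases.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile

def original_params(a, b):
    """Assumed form of Equations 5 and 6: log-scale mean and SD of the RR."""
    return math.log(a / b), math.sqrt(1 / a + 1 / b)

def mean_matched_params(a, b):
    """Assumed form of Equations 12 and 13: same variance, mean equal to a/b."""
    mu, sigma = original_params(a, b)
    variance = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
    target_mean = a / b
    sigma2_new = math.log(1 + variance / target_mean**2)
    mu_new = math.log(target_mean) - sigma2_new / 2
    return mu_new, math.sqrt(sigma2_new)

def lognormal_interval(mu, sigma):
    """Central 95% interval of a lognormal with the given log-scale parameters."""
    return math.exp(mu - Z * sigma), math.exp(mu + Z * sigma)

for a, b in [(20, 40), (2, 4)]:
    lo1, hi1 = lognormal_interval(*original_params(a, b))
    lo2, hi2 = lognormal_interval(*mean_matched_params(a, b))
    print(f"{a} and {b} cases:")
    print(f"  Equations 5/6:   {lo1:.2f} to {hi1:.2f}, width {hi1 - lo1:.2f}")
    print(f"  Equations 12/13: {lo2:.2f} to {hi2:.2f}, width {hi2 - lo2:.2f}")
```

Matching the variance on the natural scale while shifting the mean changes the log-scale standard deviation, which is why the two intervals cannot have exactly the same width.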
In summary, defining the distribution of a RR or any other model parameter without being a Bayesian is epistemologically impossible. This means that being explicit about the prior distributions used for deriving those distributions, and justifying them, is a necessary part of suggesting new ways to derive distributions. Bootstrapping offers possibilities to be non-Bayesian, but at the price of giving only non-Bayesian answers. The method presented by Barendregt, however, cannot be seen as a bootstrapping approach.