In this issue of the Journal, Bayesian models for predicting malignancy of ovarian tumors are presented1. A Bayesian approach to statistical analysis is based on Bayes' theorem and differs fundamentally from the traditional approach. Bayes' theorem, which resulted in a paradigm shift in statistics, is rooted in an essay by Thomas Bayes published posthumously2. Why do we believe that Bayes' theorem is relevant to clinical decision making? Consider a diagnostic test. Suppose people with disease G are identified by a positive test result T in 95% of cases. Let us formally denote this by saying that P(T|G) = 0.95. However, this does not mean that a positive test result is associated with a 95% chance of having disease G: P(T|G) ≠ P(G|T). This inequality is explained by Bayes' theorem:

P(G|T) = P(T|G) P(G) / P(T)
Thus, one needs to estimate P(G), the prior probability of having G (this can be based on the estimated prevalence of G), and combine this probability with the test result P(T|G) to arrive at P(G|T), the posterior probability of having disease G. P(T), the overall probability of a positive test result, serves as a normalization factor and can be computed as P(T|G)P(G) + P(T|not G)P(not G). Bayesian statisticians use the theorem as the core of their procedures and extend its use to statistical analysis itself, while traditional (or orthodox) statisticians confine its use to decision-making problems like the one described here.
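To make this concrete, the calculation can be sketched in a few lines of Python. The 95% sensitivity P(T|G) comes from the text above; the prevalence P(G) and the false-positive rate P(T|not G) are illustrative assumptions, not values from the article:

```python
# Sensitivity P(T|G) is taken from the text; the other two numbers
# are assumed purely for illustration.
p_t_given_g = 0.95      # sensitivity, from the text
p_g = 0.01              # assumed prevalence of disease G
p_t_given_not_g = 0.05  # assumed false-positive rate

# Normalization factor P(T): total probability of a positive test
p_t = p_t_given_g * p_g + p_t_given_not_g * (1 - p_g)

# Bayes' theorem: P(G|T) = P(T|G) P(G) / P(T)
p_g_given_t = p_t_given_g * p_g / p_t
print(f"P(G|T) = {p_g_given_t:.3f}")
```

Even with a highly sensitive test, the posterior probability of disease stays modest when the disease is rare, which is exactly the gap between P(T|G) and P(G|T) that the theorem makes explicit.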
The central difference between traditional and Bayesian statisticians is their view of probability. Traditional statistics uses a ‘frequentist’ approach to probability; that is, the probability of an event is interpreted as the fraction of times the event occurs in an infinitely repeated set of identical trials. While this seems reasonable enough, this interpretation can give rise to problems in procedures such as significance tests. For example, a 95% confidence interval means that, if we were to repeat a study many times, the confidence interval would contain the true value in 95% of the replications. This does not mean that there is a 95% chance that the true value lies inside the interval, even though such a conclusion is often (incorrectly) drawn. Owing to its view of probability, traditional statistics often does not answer the questions that we really want to ask. Bayesian statisticians define probability as a degree of belief. The Bayesian view of probability is therefore a more general one, in which probabilistic reasoning about statistical models themselves is natural.
Suppose that, as an example, we want to compare the mean age of conception in primigravida whose parents are less than 20 years older (Group 1) with primigravida whose parents are at least 20 years older (Group 2). In general terms, the Bayesian approach3, 4 works in the following way:
- 1. A model for data analysis is defined that is thought to be suitable for the problem. In our example, we could postulate that, in both groups, the age of first pregnancy has a Gaussian distribution with population mean µ1 (Group 1) or µ2 (Group 2), and with an equal but unknown standard deviation σ. This model has three parameters: µ1, µ2 and σ.
- 2. A prior probability distribution on the model parameters is specified, reflecting our knowledge or beliefs about likely values of these parameters. In our example, we would define a prior probability distribution p(µ) for µ1 and µ2, centered at the ages an expert believes are the most likely values for these parameters. The width of the priors represents the uncertainty of the expert's belief (Figure 1a, bold line).
- 3. Next, we look at the data (D) we have collected. We can compute how likely our data are for different assumed values of the parameters: if, in our example, the true mean age in Group 1 is 22 years, what is the probability of obtaining the data we have collected for this group? This yields the likelihood function p(D|µ). The computation of this function makes use of the specified model and its assumptions.
- 4. The combination (by multiplication) of the prior distribution (representing the information outside the collected data) and the likelihood function (representing the information inside the collected data) yields the posterior probability distribution for the parameter; it reflects how likely different parameter values are, after taking into account one's prior beliefs and the information in the collected data. To obtain a proper probability distribution, this product needs to be normalized to have an area of 1. From this distribution we can draw probabilistic conclusions. For example, a 95% ‘probability interval’ can be constructed and interpreted as the interval that contains the true parameter value with a probability of 0.95; probability interval refers to the Bayesian counterpart of the confidence interval5. Or, in our example, one can use the posterior distributions to estimate the probability that µ1 − µ2 is smaller than zero.
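The four steps above can be sketched programmatically. The article's model treats σ as unknown (which leads to Student t distributions); the minimal sketch below makes the simplifying assumption that σ is known, so that the prior-times-likelihood product has a closed-form Gaussian posterior. The function name and the numbers in the usage line are illustrative, not from the article:

```python
import math

def gaussian_posterior(prior_mean, prior_sd, data, sigma):
    """Steps 2-4 for one group: combine a Gaussian prior on the mean
    with the Gaussian likelihood of the data (sigma assumed known)."""
    n = len(data)
    xbar = sum(data) / n                       # step 3: the data speak
    prior_prec = 1.0 / prior_sd ** 2           # precision = 1 / variance
    data_prec = n / sigma ** 2                 # precision carried by the data
    post_var = 1.0 / (prior_prec + data_prec)  # step 4: multiply and normalize
    post_mean = post_var * (prior_prec * prior_mean + data_prec * xbar)
    return post_mean, math.sqrt(post_var)

# Illustrative use: prior centered at 25 years, a handful of observations
mean, sd = gaussian_posterior(25.0, 2.0, [22.0, 24.0, 23.0], sigma=3.0)
print(f"posterior mean {mean:.2f}, posterior sd {sd:.2f}")
```

The posterior mean is a precision-weighted average of the prior mean and the sample mean, so it always lies between the two; as the prior widens, the weight shifts toward the data.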
In traditional statistics, one believes that the data should speak for themselves, so that there is no need for a prior distribution. Rather, only the data are used to find estimates for the parameter values. This yields a single parameter estimate, as opposed to the probability distribution obtained in a Bayesian approach. Therefore, one advantage of the Bayesian approach is that it incorporates uncertainty about the true parameter value. This is intuitively more satisfactory, since in general we have no exact knowledge of the true value of any parameter. Traditional statisticians often use P-values to describe the results of their analyses. A P-value is the probability of obtaining data at least as extreme as the collected data given that a hypothesis (usually the hypothesis of no effect) is true. This is a probability statement about data given a hypothesis, whereas Bayesian analysis, by contrast, results in probability statements about a hypothesis given the data5. The latter is more consistent with natural reasoning and more informative, for example, for decision making: ‘What is the probability that treatment A is better than treatment B?’
Let us work out our example. Suppose we have collected data on 20 women in Group 1 (ages in years: 19, 26, 25, 20, 21, 21, 29, 24, 23, 23, 26, 22, 22, 18, 30, 24, 22, 24, 23, 26) and 20 women in Group 2 (ages 22, 23, 31, 27, 20, 22, 25, 26, 26, 28, 26, 28, 27, 27, 23, 20, 29, 25, 25, 28). The sample mean age of conception in primigravida is 23.4 years in Group 1 (SD = 3.05) and 25.4 years in Group 2 (SD = 2.96). Our model is the one specified above, with three parameters: µ1, µ2 and σ. We will focus only on the results for µ1 and µ2. Traditional statisticians would estimate the population mean age to be 23.4 years for women in Group 1 and 25.4 years for women in Group 2. To test whether the population means differ, traditional statisticians could use a t-test (with the null hypothesis stating that both groups have the same mean age), resulting in a P-value of 0.0485. These results suggest that the mean age of conception in primigravida may differ between the two groups, but beware of blind interpretation of P-values6. If the null hypothesis is true, there is less than a 5% chance of collecting data that are at least as extreme as the collected data. However, this statement is not really of interest to clinicians, since what they want to know is the probability of a difference in age given the data that have been observed. Thus it makes sense to undertake a Bayesian analysis. Figure 1a shows such an analysis. The bold line represents the prior distributions: the expert believes that the mean age is most likely 25 years in both groups (he or she does not believe the groups differ much), with a similar degree of uncertainty. Therefore the prior distribution for the mean age is identical in both groups (in fact, the prior distribution is a Student t-distribution because σ is unknown). Using the assumed model, the likelihood functions are computed and plotted (dashed lines). Notice that these (Student t-) distributions are centered around the sample mean ages.
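The summary statistics and the frequentist test statistic above can be verified directly from the listed ages. This sketch uses only the Python standard library and the pooled-variance two-sample t statistic, which is assumed here to match the equal-variance model specified in the text:

```python
import statistics as st

group1 = [19, 26, 25, 20, 21, 21, 29, 24, 23, 23,
          26, 22, 22, 18, 30, 24, 22, 24, 23, 26]
group2 = [22, 23, 31, 27, 20, 22, 25, 26, 26, 28,
          26, 28, 27, 27, 23, 20, 29, 25, 25, 28]

n1, n2 = len(group1), len(group2)
m1, m2 = st.mean(group1), st.mean(group2)    # sample means: 23.4 and 25.4
s1, s2 = st.stdev(group1), st.stdev(group2)  # sample SDs: ~3.05 and ~2.96

# Pooled variance, matching the model's equal-variance assumption
sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
t = (m1 - m2) / se   # two-sample t statistic with n1 + n2 - 2 = 38 df
print(f"means {m1:.1f} vs {m2:.1f}, SDs {s1:.2f} vs {s2:.2f}, t = {t:.2f}")
```

Converting the t statistic into the reported P-value additionally requires the Student t distribution function, which is not in the standard library, so it is omitted here.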
Using Bayes' theorem we can derive the posterior distributions (plotted as the thin solid lines). Using these (Student t-) distributions we can draw probabilistic conclusions: there is a 92.3% chance that the mean age in Group 1 is lower than the mean age in Group 2. For Group 1 women, the posterior distribution is centered around 24.2 years (90% probability interval: 23.38–25.02); for Group 2 women, the posterior is centered around 25.2 years (90% probability interval: 24.48–26.02). The 90% probability interval for the difference between the two mean ages runs from −2.15 to 0.15: we conclude that there is a 90% chance that the true mean age difference lies inside this interval.
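The posterior statement about the difference can be checked against the reported interval. The sketch below approximates the article's Student t posterior for µ1 − µ2 with a normal distribution whose center and scale are read off the reported 90% interval (−2.15 to 0.15), so a small discrepancy with the exact 92.3% figure is expected:

```python
from statistics import NormalDist

lo, hi = -2.15, 0.15              # reported 90% probability interval
center = (lo + hi) / 2            # -1.0, the posterior mean difference
z90 = NormalDist().inv_cdf(0.95)  # ~1.645: z for a central 90% interval
scale = (hi - lo) / 2 / z90       # half-width divided by z

# P(mu1 - mu2 < 0): chance that Group 1's mean age is the lower one
p_lower = NormalDist(center, scale).cdf(0.0)
print(f"P(mu1 < mu2) ~ {p_lower:.3f}")
```

This normal approximation lands within a fraction of a percentage point of the article's 92.3%, illustrating how a posterior interval and a posterior probability are two readings of the same distribution.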
The use of prior distributions is often criticized as being subjective. Several points can be raised here. First, the use of prior information is necessary to arrive at probabilistic conclusions, and in fact all analyses make use of prior information (e.g. the analysis of a positive test result mentioned above), but in Bayesian analysis this information is made explicit. Second, orthodox statistics is less objective than it may seem: it offers different possibilities for data analysis (the choice of estimator, test and model) which may sometimes yield different conclusions7. Bayesian inference, by contrast, follows a consistent and unified procedure based on Bayes' theorem, even though the choice of model is just as important in Bayesian analysis. Third, making prior information explicit is an open procedure that stimulates discussion and thought7. Fourth, sensitivity analyses can be performed to check the robustness of the results to changes in the prior distributions. In Figure 1b, another expert has put less uncertainty in his or her prior distribution (it is narrower and higher). The resulting posterior distributions can be compared with those of the initial analysis: there is now an 87.7% chance that the mean age in Group 1 is lower than that in Group 2, and the 90% probability interval for the difference of the two means now runs from −1.62 to 0.28. Fifth, convincing data and/or larger data sets make the posterior distribution depend more on the information inside the data (i.e. the likelihood) than on the prior information used (Figure 1c, based on three times as many data, obtained by cloning the original data set). It is also natural to choose broader, less informative prior distributions (with more uncertainty) when there is less prior knowledge about the parameters, which again makes the likelihood term more important (Figure 1d, which uses a flat prior containing no information at all; the posterior and the likelihood are then identical).
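The fourth and fifth points can be illustrated numerically. Under a simplified known-σ Gaussian model (an assumption made for this sketch, since the article's exact posteriors are Student t distributions), widening the prior shifts the posterior mean for Group 1 from the expert's 25 years toward the sample mean of 23.4 years, and a nearly flat prior reproduces the likelihood alone:

```python
# Group 1 summary statistics from the text; known-sigma simplification
xbar, n, sigma = 23.4, 20, 3.05
prior_mean = 25.0                    # expert's best guess

for prior_sd in (0.5, 1.0, 5.0, 1e6):      # 1e6 ~ a flat prior
    prior_prec = 1.0 / prior_sd ** 2       # precision of the prior
    data_prec = n / sigma ** 2             # precision carried by the data
    post_mean = ((prior_prec * prior_mean + data_prec * xbar)
                 / (prior_prec + data_prec))
    print(f"prior sd {prior_sd:>9}: posterior mean {post_mean:.2f}")
```

With a tight prior the posterior mean stays near 25; with the essentially flat prior it coincides with the sample mean, mirroring the point made about Figure 1d.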
The quantification of prior beliefs into a prior probability distribution can be a difficult step. A prior distribution may be based on the available literature and data, and on the experience of experts7–11. For example, one can collect the beliefs of several experts by asking for their best estimate of the parameter and for the values they consider unlikely. This information can then be summarized into a prior probability distribution reflecting the experts' ideas and the uncertainty thereof. Another possibility is to use the results of a meta-analysis on the subject. If our prior information is vague, this uncertainty can be incorporated quantitatively by making the prior distribution broader. Remember that the prior often has a limited effect, which prevents overly optimistic or pessimistic prior estimates from unduly influencing the result.
We conclude that the use of Bayesian models opens new possibilities for clinical research. Drawbacks are the mathematical complexity and the time-consuming nature of the analyses.