Statistical analyses of data and making sense of medical data have received much attention in the medical literature, but nevertheless have caused confusion among practitioners. Each researcher provides a different method for comparing treatments. For example, when the end point is binary, such as disease versus no disease, the common measures are odds ratios, relative risk, relative risk reduction, absolute risk reduction, and the number needed to treat. The question faced by the practitioner is then: Which one will help me in choosing the best treatment for my patient?
The purpose of this paper is to illustrate, using examples, how each measure is used, what it means, and what are its advantages and disadvantages.
Some pairs of measures present equivalent information. Furthermore, it is shown that different measures result in different impressions.
It is recommended that researchers report both a relative and an absolute measure and present these with appropriate confidence intervals.
In recent years, the amount of available information in medical literature has increased rapidly, and as more studies are performed, the results have become more easily accessible. Even patients in the Internet era are aware of current research. The problem is how to judge the evidence in various published studies and decide whether it justifies changing the existing treatment for a new one. “At the mention of the term ‘statistics’, most physicians react with a groan of confusion and annoyance.”
The main difficulty in the comparison of different treatments lies in the fact that they are almost never compared directly against each other in a preplanned study. Instead, most studies compare the new treatment with a placebo. Moreover, the end points of the studies may differ, the initial severity of the disease may be different, the studies look at different subsets of patients who had previously been exposed to different drugs, and the outcome criteria may be different. These problems imply limitations to any systematic review of placebo-controlled trials designed for regulatory purposes. Obviously, the best way to compare several treatments is to design a study that will include all treatments to be compared, but that is a hard task to accomplish for regulatory purposes.
In the past few years, the issue of integrating the results of several independent studies has been the topic of many articles. While some suggest using only relative risk or absolute risk reduction, others advocate use of the number needed to treat criterion [5,6], and some consider the odds ratio to be the method of choice. Obviously, the choice of method is linked to the type of study and its design. For retrospective studies and for cross-sectional studies, in which the aim is to look at the association rather than differences, the odds ratio is recommended, because a relative risk or risk difference cannot be meaningfully calculated. Risk calculations are only meaningful in follow-up studies. Odds ratio is also used in case-control studies, in which the relative risk cannot be estimated.
In the present article, we are mainly concerned with controlled studies. We will describe the different measures of treatment effects along with their advantages and disadvantages and summarize some of the debates regarding which one is to be used, as reported in the medical literature in recent years. Because the choice of treatment depends on the measure being used, it is important that the practitioner and the patient understand the differences between the measures. We hope that this understanding will help in choosing the proper measure for the case and recommend that both a relative and an absolute measure be reported to give a more complete picture.
Absolute risk reduction. The basic and simplest measure is the absolute risk reduction (ARR), also called the risk difference. That is, as a result of using the treatment, is the risk of an event reduced by a clinically meaningful amount? The calculation is just the difference between the risk of an event in the control group and the risk of an event in the treated group.
The advantage of the estimated ARR is that it is easy to compute, the confidence interval obtained is easy to interpret (and is readily available with standard statistical packages), it reflects both the underlying risk without treatment and the risk reduction associated with treatment, and it has a clear meaning, which makes it appealing to the practitioner. A confidence interval that contains zero means that there is no significant difference between the treatment and the placebo in terms of risk. One disadvantage is that a difference in risk of fixed size may have greater importance when the risks are close to 0 or 1 than when they are near the middle of the range. A difference between 0.010 and 0.001, when considering the risk that people suffer serious side effects, is more noteworthy than the difference between 0.410 and 0.401.
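As a minimal sketch of the calculation described above, the following Python snippet computes the ARR and checks whether its 95% CI contains zero. It assumes the usual normal-approximation (Wald) interval for a difference of two proportions; the function names and the event counts are hypothetical and chosen only for illustration.

```python
import math

def arr_wald_ci(events_control, n_control, events_treated, n_treated, z=1.96):
    """ARR (control risk minus treated risk) with a Wald confidence interval.

    Assumption: the normal approximation is adequate for these sample sizes.
    """
    p_c = events_control / n_control
    p_t = events_treated / n_treated
    arr = p_c - p_t
    se = math.sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treated)
    return arr, arr - z * se, arr + z * se

def excludes_zero(lower, upper):
    """True when the interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0

# Hypothetical large trial: 300/1000 events on control, 50/1000 on treatment.
large = arr_wald_ci(300, 1000, 50, 1000)
# Hypothetical small trial: 6/20 events on control, 4/20 on treatment.
small = arr_wald_ci(6, 20, 4, 20)

for label, (arr, lower, upper) in [("large", large), ("small", small)]:
    verdict = "significant" if excludes_zero(lower, upper) else "not significant"
    print(f"{label}: ARR = {arr:.3f}, 95% CI = ({lower:.3f}, {upper:.3f}) -> {verdict}")
```

In the small hypothetical trial the interval straddles zero, so the observed reduction cannot be distinguished from no effect, even though the point estimate is positive.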
Number needed to treat. A related measure, based on the absolute risk reduction, is the number needed to treat (NNT), which is defined as the reciprocal of the absolute risk reduction. The meaning of this measure is the number of patients that need to be treated to get the desired outcome in one patient who would not have benefited otherwise. Also, when the outcome is binary, the cost-effectiveness ratio becomes the product of the incremental costs and the NNT. A confidence interval (CI) for NNT can be obtained by inverting the upper and lower confidence limits of absolute risk reduction. The NNT has both advantages and disadvantages that are discussed in the medical literature. It can be easily understood and used and “[...] should help us to make the best clinical decisions with our patients.” Elferink and Van Zwieten-Boot encourage the use of NNT and state that NNT takes into account the absolute benefit and is a meaningful measure because it addresses both statistical and clinical significance in a way that is easily interpreted. It is worth noting that the numerical value of NNT is a function of the disease, the intervention, and the outcome. A NNT of 10 when the outcome is very serious may be judged differently than a NNT of 5 for a milder outcome. Therefore, it is only appropriate to compare NNTs directly when treatments for the same condition, severity, and outcome are compared.
When there is no difference in risk between the treatment and control, the absolute risk reduction is zero and NNT is infinite. Also, when the difference is not significant, the CI for absolute risk reduction will include zero. Because a CI for NNT is obtained by taking reciprocals of the limits of the CI for ARR, we may get an ARR of 0.1, with a 95% CI of −0.05 to 0.25, which yields a NNT of 10 and a 95% CI of −20 to 4. There are two problems with this interval. First, NNT should be positive, and second, the CI does not include the point estimate (NNT is equal to 10 in this case). For such cases, McQuay and Moore suggest using only point estimates. However, it is not satisfactory for a CI to be presented only when the result is significant. The interpretation of a negative value for NNT is as follows: if NNT patients are treated with the new treatment, one fewer patient will benefit than if they were all treated with the control. When NNT is negative, its absolute value is called NNH, the number needed to harm. As ARR approaches zero, there is almost no difference between the new treatment and the control, and therefore infinitely many patients need to be treated for one to get well who otherwise would not have. The problem of interpreting a CI such as (95% CI, −20 to 4) still exists, because an ARR of zero translates into a NNT equal to infinity. One simple solution is to report two separate intervals: NNH (20–∞) and NNT (4–∞). Altman proposes combining both intervals into one statement: NNTH 20 to ∞ to NNTB 4.
To overcome the disadvantages, it has been suggested that NNTs be accompanied by the control group event rate to which they apply and the relative risk and CI from which they are derived. Newcombe suggests that absolute risk reduction is a more basic quantity, with much less potential to be misunderstood, and is preferable to the NNT because of the NNT's singularity problem. He suggests that NNT and its CI be used as an alternative presentation when the absolute risk reduction is well away from zero.
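The conversion from an ARR interval to an NNT statement, including the case where the interval for ARR crosses zero, can be sketched as follows. This is an illustrative helper, not a standard library routine: the function name and output format are assumptions, and the NNTH/NNTB wording follows the combined statement attributed to Altman above.

```python
import math

def reciprocal(x):
    """1/|x|, with infinity when x is zero (the NNT singularity)."""
    return math.inf if x == 0 else 1 / abs(x)

def nnt_statement(arr, arr_lower, arr_upper):
    """Describe NNT and its CI from an ARR point estimate and CI limits."""
    label = "NNTB" if arr >= 0 else "NNTH"
    point = f"{label} {reciprocal(arr):.1f}"
    if arr_lower > 0:
        # Entire ARR interval shows benefit: limits swap when inverted.
        return f"{point} (95% CI {reciprocal(arr_upper):.1f} to {reciprocal(arr_lower):.1f})"
    if arr_upper < 0:
        # Entire ARR interval shows harm.
        return f"{point} (95% CI {reciprocal(arr_lower):.1f} to {reciprocal(arr_upper):.1f})"
    # ARR interval includes zero: the NNT interval passes through infinity.
    return (f"{point} (95% CI NNTH {reciprocal(arr_lower):.1f} to ∞ "
            f"to NNTB {reciprocal(arr_upper):.1f})")

# The non-significant example from the text: ARR 0.1, 95% CI -0.05 to 0.25.
print(nnt_statement(0.1, -0.05, 0.25))
# A hypothetical significant result: ARR 0.25, 95% CI 0.2 to 0.3.
print(nnt_statement(0.25, 0.2, 0.3))
```

Reporting the interval this way keeps the point estimate inside the stated range, which the naive interval (−20 to 4) does not.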
Relative risk and relative risk reduction. The next two popular measures are relative risk (RR) and relative risk reduction (RRR). The relative risk of a treatment is the ratio of risks of the treated group and the control group, also called the risk ratio. The relative risk reduction is derived from the relative risk by subtracting it from one, which is the same as the ratio between the ARR and the risk in the control group.
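RR is easy to compute and interpret and is included in standard statistical software. The CI is calculated by exponentiating the lower and upper limits of the CI for log(RR), which has the general form

$$\log(\mathrm{RR}) \pm z_{1-\alpha/2}\,\sqrt{\operatorname{Var}[\log(\mathrm{RR})]},$$

where $z_{1-\alpha/2} = 1.96$ for a 95% CI.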
One disadvantage of RR is that its value can be the same for very different clinical situations. For example, a RR of 0.167 would be the outcome for both of the following clinical situations: 1) when the risks for the control and treated groups are 0.3 and 0.05, respectively; and 2) when the risk is 0.84 for the control group and 0.14 for the treated group. RR is clear on a proportional scale, but has no real meaning on an absolute scale. Therefore, it is generally more meaningful to use relative effect measures for summarizing the evidence and absolute measures for application to a concrete clinical or public health situation.
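A short sketch makes the point concrete. It takes 0.30 and 0.84 as control-group risks and 0.05 and 0.14 as treated-group risks (the values used for the hypothetical studies summarized in Table 1); the function name is an assumption made for illustration.

```python
def relative_measures(risk_control, risk_treated):
    """Return (RR, RRR, ARR) for a pair of risks."""
    rr = risk_treated / risk_control      # relative risk (risk ratio)
    rrr = 1 - rr                          # relative risk reduction
    arr = risk_control - risk_treated     # absolute risk reduction
    return rr, rrr, arr

situations = [(0.30, 0.05), (0.84, 0.14)]
summary = [relative_measures(rc, rt) for rc, rt in situations]

for (rc, rt), (rr, rrr, arr) in zip(situations, summary):
    print(f"control {rc:.2f}, treated {rt:.2f}: "
          f"RR = {rr:.3f}, RRR = {rrr:.3f}, ARR = {arr:.3f}")
```

Both situations give the same RR and RRR, while the ARR differs by almost a factor of three, which is why a relative measure alone can give an incomplete picture.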
Odds ratio. Odds ratio (OR) is a common measure of the size of an effect and may be reported in case-control studies, cohort studies, or clinical trials. It can also be used in retrospective studies and cross-sectional studies, where the goal is to look at associations rather than differences. The odds, the natural scale for effect size in logistic regression modeling, can be interpreted as the ratio between the number of patients who fulfill the criteria and the number who do not, or the number of events relative to the number of nonevents. The odds ratio is the ratio between the odds of the treated group and the odds of the control group. It can be obtained, along with its confidence interval, using standard statistical software. Both odds and odds ratios are dimensionless. An odds ratio less than 1 means that the odds have decreased, and similarly, an OR greater than 1 means that the odds have increased. It should be noted that ORs are hard to comprehend and are frequently interpreted as a relative risk. Although the odds ratio is close to the relative risk when the outcome is relatively uncommon, there is a recognized problem that odds ratios do not give a good approximation of the relative risk when the initial risk is high [13,14]. Furthermore, an odds ratio will always exaggerate the size of the effect compared to a relative risk [15,16]. When the OR is less than 1, it is smaller than the RR, and when it is greater than 1, the OR exceeds the RR. However, the interpretation will not, generally, be influenced by this discrepancy, because the discrepancy is large only for large positive or negative effect size, in which case the qualitative conclusion will remain unchanged.
It is worthwhile to note that RR and OR are related as follows:

$$\mathrm{RR} = \mathrm{OR} \times \frac{1 + n_{21}/n_{22}}{1 + n_{11}/n_{12}},$$
where n11 is the frequency of (yes, group 1); n21 is the frequency of (yes, group 2); n22 is the frequency of (no, group 2); and n12 is the frequency of (no, group 1).
This formula explains why OR approximates RR well when n11 and n21, the frequencies of the “yes” outcome, are small relative to n12 and n22, respectively. This is known as the “rare outcome assumption.”
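The rare outcome assumption can be checked numerically. The sketch below uses the identity RR = OR × (1 + n21/n22) / (1 + n11/n12), which follows directly from the definitions of RR and OR given the cell frequencies defined above; the two sets of counts are hypothetical, and treating group 1 as the treated group is an assumption.

```python
def rr_or_and_correction(n11, n12, n21, n22):
    """RR, OR, and the factor linking them for a 2x2 table.

    n11/n12: yes/no counts in group 1; n21/n22: yes/no counts in group 2.
    """
    rr = (n11 / (n11 + n12)) / (n21 / (n21 + n22))
    odds_ratio = (n11 * n22) / (n12 * n21)
    correction = (1 + n21 / n22) / (1 + n11 / n12)
    return rr, odds_ratio, correction

# Hypothetical rare outcome: 5/1000 events in group 1, 10/1000 in group 2.
rare = rr_or_and_correction(5, 995, 10, 990)
# Hypothetical common outcome: 140/1000 events in group 1, 840/1000 in group 2.
common = rr_or_and_correction(140, 860, 840, 160)

for label, (rr, odds_ratio, correction) in [("rare", rare), ("common", common)]:
    print(f"{label}: RR = {rr:.3f}, OR = {odds_ratio:.3f}, "
          f"OR x correction = {odds_ratio * correction:.3f}")
```

With the rare outcome the correction factor is close to 1 and the OR nearly equals the RR; with the common outcome the OR is far smaller than the RR.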
The odds ratio is the only measure of association directly estimated from a logistic model, without requiring special assumptions and regardless of whether the study design is follow-up, case-control, or cross-sectional. Risks can be estimated only in follow-up designs. In case-control and cross-sectional designs, the OR is a ratio, which depends on four probabilities as follows:

$$\mathrm{OR} = \frac{\hat{P}(E=1 \mid D=1)\,/\,\hat{P}(E=0 \mid D=1)}{\hat{P}(E=1 \mid D=0)\,/\,\hat{P}(E=0 \mid D=0)},$$

where E = 1 if the patient was exposed, E = 0 otherwise, D = 1 if the patient has the disease, and D = 0 otherwise. It is worthwhile to note that risk cannot be estimated from case-control and cross-sectional studies, because risk estimates require conditional probabilities of the type $\hat{P}(D \mid E)$, which are not available.
A hypothetical example, used in part by McQuay and Moore, will be used to illustrate the different measures accompanied by their CIs. The study aims to compare the recurrence of migraine headaches in a control group receiving placebo and a treated group receiving a new antimigraine preparation. For the sake of illustration, we examine four different possible outcomes for the control and treatment groups, denoted by C1 and M1 for study 1, C2 and M2 for study 2, C3 and M3 for study 3, and C4 and M4 for study 4. It is assumed that each group consisted of 1000 individuals.
At the end of the study, migraine recurred in 30% of control group C1 (risk, 0.3), 5% of treatment group M1, 84% of control group C2, 14% of treatment group M2, 10% of control C3, 1.7% of treatment group M3, and in 95% and 70% for C4 and M4, respectively, as summarized in Table 1. The measures used are absolute risk reduction with 95% CI, risk, number needed to treat with 95% CI, relative risk with 95% CI, risk reduction, odds, and odds ratio with 95% CI.
| |C1|M1|C2|M2|C3|M3|C4|M4|
|---|---|---|---|---|---|---|---|---|
|Risk of event|0.3|0.05|0.84|0.14|0.1|0.017|0.95|0.7|
It can be seen that:
1. The first three cases have the same relative risk and relative risk reduction, while case 4 is significantly different. However, the absolute risk reduction, NNT, and odds ratios are significantly different in the three cases studied. (For odds ratios, case 2 is different from cases 1 and 3, which are similar.)
2. Cases 1 and 4 have the same absolute risk reduction, NNT, and odds ratios, but very different relative risk, relative risk reduction, and risk at baseline.
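These observations can be reproduced from the risks alone. The following sketch computes the point estimates for the four hypothetical studies; confidence intervals are omitted for brevity, and the dictionary layout and function name are assumptions made for illustration.

```python
# (control risk, treated risk) for each hypothetical migraine study.
studies = {
    "Study 1": (0.30, 0.05),
    "Study 2": (0.84, 0.14),
    "Study 3": (0.10, 0.017),
    "Study 4": (0.95, 0.70),
}

def effect_measures(risk_control, risk_treated):
    """Point estimates of the measures discussed in the text."""
    arr = risk_control - risk_treated
    rr = risk_treated / risk_control
    odds_control = risk_control / (1 - risk_control)
    odds_treated = risk_treated / (1 - risk_treated)
    return {
        "ARR": arr,
        "NNT": 1 / arr,
        "RR": rr,
        "RRR": 1 - rr,
        "OR": odds_treated / odds_control,
    }

results = {name: effect_measures(*risks) for name, risks in studies.items()}

for name, measures in results.items():
    row = "  ".join(f"{key} = {value:.3f}" for key, value in measures.items())
    print(f"{name}: {row}")
```

Studies 1 to 3 share essentially the same RR, whereas studies 1 and 4 share the same ARR, NNT, and OR despite very different baseline risks.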
The following example is a prospective study that compares the incidence of dyskinesia after ropinirole (ROP) or levodopa (LD) in patients with early Parkinson's disease. The results show that 17 of 179 patients who took ropinirole and 23 of 89 who took levodopa developed dyskinesia. The data are summarized in Table 2.
|Treatment|Presence of dyskinesia: yes|Presence of dyskinesia: no|Total|
|---|---|---|---|
|ROP|17|162|179|
|LD|23|66|89|
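The risk of having dyskinesia among patients who took LD is 23/89 = 0.258, whereas the risk of developing dyskinesia among patients who took ROP is 17/179 = 0.095.
Therefore, the absolute risk reduction is

$$\mathrm{ARR} = 0.258 - 0.095 = 0.163.$$

The variance of ARR is given by

$$\operatorname{Var}(\mathrm{ARR}) = \frac{p_{LD}(1 - p_{LD})}{n_{LD}} + \frac{p_{ROP}(1 - p_{ROP})}{n_{ROP}} = \frac{0.258 \times 0.742}{89} + \frac{0.095 \times 0.905}{179} = 0.00263.$$

Therefore, a 95% confidence interval for the difference in proportions is given by

$$0.163 \pm 1.96\sqrt{0.00263} \approx (0.063,\ 0.264),$$

where 1.96 is the upper 2.5% point (the 97.5th percentile) of the standard normal distribution, used for 95% CIs.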
The number needed to treat and its CI are obtained from ARR and its CI by taking the reciprocals: NNT = 1/ARR = 1/0.163 = 6.13, and its CI is given by (1/0.264, 1/0.063) = (3.79, 15.87). The relative risk is 0.095/0.258 = 0.368.
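The ARR and NNT calculations for the dyskinesia data can be checked with a few lines of Python. This is a sketch that assumes the normal-approximation (Wald) variance for a difference of proportions; the variable names are illustrative. Because it carries unrounded proportions through, the last digits can differ slightly from the hand calculation above.

```python
import math

# Dyskinesia counts: 23 of 89 on levodopa (LD), 17 of 179 on ropinirole (ROP).
events_ld, n_ld = 23, 89
events_rop, n_rop = 17, 179

risk_ld = events_ld / n_ld
risk_rop = events_rop / n_rop

# Absolute risk reduction and its Wald variance (assumed form).
arr = risk_ld - risk_rop
var_arr = risk_ld * (1 - risk_ld) / n_ld + risk_rop * (1 - risk_rop) / n_rop
half_width = 1.96 * math.sqrt(var_arr)
arr_lower, arr_upper = arr - half_width, arr + half_width

# NNT and its CI: reciprocals of ARR and of its limits (order swaps).
nnt = 1 / arr
nnt_lower, nnt_upper = 1 / arr_upper, 1 / arr_lower

print(f"ARR = {arr:.3f}, 95% CI = ({arr_lower:.3f}, {arr_upper:.3f})")
print(f"NNT = {nnt:.2f}, 95% CI = ({nnt_lower:.2f}, {nnt_upper:.2f})")
```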
The confidence interval is obtained as follows: a CI for the log of RR is obtained, and the lower and upper limits are transformed to obtain the desired interval.
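The variance of log(RR) is given by

$$\operatorname{Var}[\log(\mathrm{RR})] = \frac{1 - p_{ROP}}{n_{ROP}\,p_{ROP}} + \frac{1 - p_{LD}}{n_{LD}\,p_{LD}} = \frac{1}{17} - \frac{1}{179} + \frac{1}{23} - \frac{1}{89} = 0.0855.$$

Therefore, a 95% CI for log(RR) is given by

$$\log(0.368) \pm 1.96\sqrt{0.0855} = -1.000 \pm 0.573 = (-1.573,\ -0.427).$$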
Exponentiating the lower and upper confidence limits provides the 95% CI for RR: (0.207–0.653).
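The log-scale procedure for RR can be sketched as below. It assumes the standard large-sample variance of log(RR), namely 1/a − 1/n1 + 1/c − 1/n2 for a events out of n1 and c events out of n2; the function name is hypothetical. Small differences from the reported limits come from rounding RR to 0.368 in the hand calculation.

```python
import math

def risk_ratio_ci(events_1, n_1, events_2, n_2, z=1.96):
    """Risk ratio of group 1 versus group 2 with a log-scale CI."""
    rr = (events_1 / n_1) / (events_2 / n_2)
    # Large-sample variance of log(RR) (assumed formula).
    var_log_rr = 1 / events_1 - 1 / n_1 + 1 / events_2 - 1 / n_2
    half_width = z * math.sqrt(var_log_rr)
    lower = math.exp(math.log(rr) - half_width)
    upper = math.exp(math.log(rr) + half_width)
    return rr, lower, upper

# ROP (17 of 179) versus LD (23 of 89).
rr, rr_lower, rr_upper = risk_ratio_ci(17, 179, 23, 89)
print(f"RR = {rr:.3f}, 95% CI = ({rr_lower:.3f}, {rr_upper:.3f})")
```

Working on the log scale keeps both limits positive and gives an interval that is asymmetric around RR, as expected for a ratio.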
The odds of having dyskinesia for LD patients is 23/66 = 0.348. The odds of having dyskinesia for ROP patients is 17/162 = 0.105, and therefore the odds ratio OR is 0.105/0.348 = 0.302.
The procedure for obtaining a confidence interval is as follows: a CI for the log of OR is obtained, and the lower and upper limits are then transformed to obtain the desired interval.
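The variance of log(OR) is given by

$$\operatorname{Var}[\log(\mathrm{OR})] = \frac{1}{17} + \frac{1}{162} + \frac{1}{23} + \frac{1}{66} = 0.1236.$$

Therefore, a 95% CI for log(OR) is given by

$$\log(0.302) \pm 1.96\sqrt{0.1236} = -1.197 \pm 0.689 \approx (-1.886,\ -0.508).$$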
Exponentiating the lower and upper limits, we obtain the 95% CI for OR as (0.151–0.602).
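The same steps for the odds ratio can be sketched as follows, assuming the usual large-sample variance of log(OR), the sum of the reciprocals of the four cell counts. The function name is hypothetical, and the last digits can differ slightly from the reported interval because the hand calculation uses rounded odds.

```python
import math

def odds_ratio_ci(yes_1, no_1, yes_2, no_2, z=1.96):
    """Odds ratio of group 1 versus group 2 with a log-scale CI."""
    odds_ratio = (yes_1 / no_1) / (yes_2 / no_2)
    # Variance of log(OR): sum of reciprocal cell counts (assumed formula).
    var_log_or = 1 / yes_1 + 1 / no_1 + 1 / yes_2 + 1 / no_2
    half_width = z * math.sqrt(var_log_or)
    lower = math.exp(math.log(odds_ratio) - half_width)
    upper = math.exp(math.log(odds_ratio) + half_width)
    return odds_ratio, lower, upper

# ROP: 17 with dyskinesia, 162 without; LD: 23 with, 66 without.
or_est, or_lower, or_upper = odds_ratio_ci(17, 162, 23, 66)
print(f"OR = {or_est:.3f}, 95% CI = ({or_lower:.3f}, {or_upper:.3f})")
```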
Because RR is clear on a proportional scale, but has no real meaning on an absolute scale, it might be best to report both: a relative effect measure for summarizing the evidence and an absolute measure for applying it to a concrete clinical or public health situation. For our example, all the statistics show that ROP is better at preventing dyskinesia. However, it is best to report that the risk with LD is almost three times the risk with ROP (1/0.368 ≈ 2.7) and that, by using ROP, the risk of developing dyskinesia is reduced by 16 percentage points (ARR = 0.163). These two pieces of information complete the picture.
An interesting study reported by Malenka et al. tested whether a patient's perception of benefit is influenced by how the benefit is presented, in relative or absolute terms. They found that the framing of benefit or risk in relative versus absolute terms may have a major influence on patient preference. The medication whose benefits were expressed in relative terms was chosen by 56.8% of patients, whereas 14.7% chose the medication whose benefit was expressed in absolute terms.
The discussion above and the hypothetical example were aimed at showing that choice of treatment depends on the measure being used. Therefore, it is important that the practitioner understands what the different measures really express and which ones may be more appropriate for a specific patient setting. For example, ARR and NNT are absolute measures, whereas RR and RRR are relative measures. It is recommended that both a relative and an absolute measure be reported, to portray a more complete picture.
The author thanks Dr Rivka Inzelberg whose valuable comments helped improve this paper.