Attributable fractions for partitioning risk and evaluating disease prevention: a practical guide


  • Geir E. Eide

    Corresponding author
    1. Centre for Clinical Research, Haukeland University Hospital, Bergen, Norway
    2. Section for Epidemiology and Medical Statistics, Department of Public Health and Primary Health Care, University of Bergen, Bergen, Norway

  • Conflicts of interest
    The author has declared that there are no conflicts of interest.

Geir Egil Eide, Dr. Philos., Centre for Clinical Research, Haukeland University Hospital, Armauer Hansen's House, N-5021 Bergen, Norway.
Tel: +47 55975534
Fax: +47 55976088
email: Geir.Egil.Eide@Haukeland.No


Introduction:  The attributable fraction (AF) is used for quantifying the fraction of cases of disease ascribable to one or more exposures. The methodology and software for its estimation have undergone considerable development during the last decades.

Objectives:  To introduce methods for: (i) apportioning excess risk to multiple exposures, groups of exposures and subpopulations; (ii) graphical description; and (iii) attributable fractions with survival data.

Results:  Adjusted, sequential and average AFs are reasonable measures obtainable with standard software. The latter two both sum up to the combined AF for a set of exposures. The average AFs are independent of the exposures' ordering. For an ordered, preventive strategy, scaled sample space cubes illustrate the effects on the risk of disease from stepwise exposure removal. Pie charts illustrate the portions of the total risk ascribed to different exposures or risk-profiles. Attributable hazard fraction, AF before time t, and AF within study incorporate time to disease and interventions.

Conclusions:  The practice of crude calculations of AFs in epidemiology should be abandoned. Further development of methods for AFs with survival data and possibly linking it to causal modelling is of interest.

Please cite this paper as: Eide GE. Attributable fractions for partitioning risk and evaluating disease prevention: a practical guide. The Clinical Respiratory Journal 2008; 2: 92–103.


The wish to quantify the fraction of diseased or deceased individuals that can be ascribed to a given exposure or risk factor has been fundamental to respiratory epidemiologic research, and the investigation of potential cause-effect relationships is an ultimate goal. Levin (1) seems to have been the first to publish a measure for this parameter based on probabilistic considerations. Levin's interest was in quantifying the proportion of lung cancer cases in the population that could be ascribed to smoking, and he formulated his measure in terms of the relative risk and the probability of exposure. MacMahon and Pugh (2) proposed an alternative formulation in terms of the total risk of disease and the risk of disease in the unexposed. The two formulations, which Leviton (3) proved to be algebraically equivalent, constitute the classical definitions of the attributable fraction (AF) and are given in probabilistic notation as follows:

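Levin's formula

$$\mathrm{AF} = \frac{P(E)\,(\mathrm{RR} - 1)}{1 + P(E)\,(\mathrm{RR} - 1)} \qquad (1)$$

MacMahon & Pugh's formula

$$\mathrm{AF} = \frac{P(D) - P(D \mid \bar{E})}{P(D)} \qquad (2)$$
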
Here, D denotes the event of disease, E the event of exposure, and an overline the complementary event (so Ē is non-exposure), while the relative risk is RR = P(D|E)/P(D|Ē). For instance, D might be getting or having lung cancer and E being exposed to smoking in some well-defined sense. Maximum likelihood estimators (MLEs) are easily obtained by substituting the corresponding sample proportions for the probabilities. Levin's formula is therefore suitable when the relative risk can be estimated, as in a case-control study, while MacMahon and Pugh's formula is more convenient for a cross-sectional or a one-sample cohort study. Alternative formulations have been proposed to serve different sampling designs optimally. A new formulation, useful when only exposure probabilities are available, was given by Eide and Heuch (4).
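To make the equivalence of the two classical formulations concrete, the following Python sketch computes the AF from a single 2 × 2 table in both ways, substituting sample proportions for probabilities as described above. The counts are hypothetical, and the `Table2x2` container and the function names are illustrative assumptions, not part of any package.

```python
from dataclasses import dataclass


@dataclass
class Table2x2:
    """Counts from a cross-sectional or one-sample cohort 2 x 2 table."""
    a: int  # exposed, diseased
    b: int  # exposed, not diseased
    c: int  # unexposed, diseased
    d: int  # unexposed, not diseased

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def af_levin(t: Table2x2) -> float:
    """Levin: AF = P(E)(RR - 1) / (1 + P(E)(RR - 1))."""
    p_e = (t.a + t.b) / t.n
    rr = (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))
    return p_e * (rr - 1) / (1 + p_e * (rr - 1))


def af_macmahon_pugh(t: Table2x2) -> float:
    """MacMahon and Pugh: AF = (P(D) - P(D | not E)) / P(D)."""
    p_d = (t.a + t.c) / t.n
    p_d_unexposed = t.c / (t.c + t.d)
    return (p_d - p_d_unexposed) / p_d


# Hypothetical counts: P(E) = 1/3, RR = 3, P(D) = 1/6
tab = Table2x2(a=60, b=140, c=40, d=360)
print(round(af_levin(tab), 4), round(af_macmahon_pugh(tab), 4))  # 0.4 0.4
```

Any table with non-zero margins gives the same value from both functions, which is Leviton's algebraic equivalence in numerical form.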

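Miettinen (5) formulated the attributable fraction in the exposed (AFE), confining the proportion to those exposed (i.e. all subjects with the event E) rather than the total population. The formulations in the manner of Levin (1) and of MacMahon and Pugh (2), respectively, can be written as

$$\mathrm{AFE} = \frac{\mathrm{RR} - 1}{\mathrm{RR}} \qquad (3)$$

$$\mathrm{AFE} = \frac{P(D \mid E) - P(D \mid \bar{E})}{P(D \mid E)} \qquad (4)$$
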
Miettinen (5) also related AF to AFE by the equation

$$\mathrm{AF} = P(E \mid D)\,\mathrm{AFE} \qquad (5)$$

For the AF, Walter (6, 7) derived the asymptotic distributions of the MLEs in the cross-sectional, cohort and case-control designs, providing approximate standard errors and confidence intervals (CIs). For the AFE, which is just a transformation of the RR, standard errors and confidence intervals are easily obtained by transforming them from the RR scale.
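As an illustration of how an AFE interval can be obtained on the RR scale and then transformed, here is a minimal Python sketch. It assumes a cohort or cross-sectional 2 × 2 table and uses the usual large-sample (Wald) interval for log RR; because AFE = (RR − 1)/RR increases monotonically with RR, the RR limits map directly to AFE limits. It also checks Miettinen's relation AF = P(E|D) · AFE. Function names and counts are hypothetical.

```python
import math


def afe_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """AFE = (RR - 1) / RR with a CI transformed from a Wald CI for log RR.

    a, b: exposed diseased / not diseased; c, d: unexposed diseased / not diseased.
    """
    rr = (a / (a + b)) / (c / (c + d))
    # Large-sample standard error of log RR for a cohort / cross-sectional table
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    rr_lo = math.exp(math.log(rr) - z * se_log_rr)
    rr_hi = math.exp(math.log(rr) + z * se_log_rr)

    def to_afe(r: float) -> float:
        # Increasing in r, so RR confidence limits map to AFE confidence limits
        return (r - 1) / r

    return to_afe(rr), (to_afe(rr_lo), to_afe(rr_hi))


def af_via_afe(a: int, b: int, c: int, d: int) -> float:
    """Miettinen's relation: AF = P(E | D) * AFE."""
    afe, _ = afe_with_ci(a, b, c, d)
    return (a / (a + c)) * afe


afe, (lo, hi) = afe_with_ci(60, 140, 40, 360)
print(round(afe, 4), round(lo, 4), round(hi, 4))
print(round(af_via_afe(60, 140, 40, 360), 4))  # 0.4
```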

The theory so far was univariate, describing the total elimination of only one exposure. Thus, the attributable fraction as defined above is considered to be crude, unadjusted or ‘marginal’. However, most often the situation is multi-expositional, i.e. there are many factors influencing the probability of disease, and Walter (8) was the first to discuss this problem in probabilistic terms. Some exposures may be considered to be modifiable, others not; and the adjusted attributable fraction, as first defined by Whittemore (9), was designed to quantify the effect of removing one exposure while the others remained unchanged. Whittemore (9, 10) also developed the asymptotic distribution of the maximum likelihood estimator from case-control data. Morgenstern and Bursic (11) suggested the slightly more general concept of ‘potential impact fraction’ reflecting the possibility of imperfect prevention of exposure.

Moreover, in the multifactorial case, a multiple logistic model was often fitted for the risk of disease, and Bruzzi et al. (12) showed how this could be applied to estimate adjusted attributable fractions with case-control data. Benichou and Gail (13) were the first to apply the delta method to find the asymptotic variance of a model-based adjusted AF with case-control data, and Basu and Landis (14) extended this methodology to cohort and cross-sectional data.
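A short sketch of the case-based formula associated with Bruzzi et al., AF = 1 − Σⱼ ρⱼ/RRⱼ, where ρⱼ is the proportion of all cases falling in exposure stratum j and RRⱼ is the (adjusted) relative risk of that stratum relative to the reference stratum (RR = 1). With case-control data the RRⱼ would in practice be approximated by odds ratios from the logistic model. The inputs below are hypothetical; for a single binary exposure the formula reduces to P(E|D)(RR − 1)/RR, i.e. Miettinen's relation.

```python
def af_case_based(case_props, rrs):
    """AF = 1 - sum_j rho_j / RR_j.

    case_props: proportion of all cases in each exposure stratum (sums to 1).
    rrs: relative risk of each stratum versus the reference stratum (RR = 1).
    """
    if len(case_props) != len(rrs):
        raise ValueError("one relative risk per stratum is required")
    if abs(sum(case_props) - 1) > 1e-9:
        raise ValueError("case proportions must sum to 1")
    return 1 - sum(rho / rr for rho, rr in zip(case_props, rrs))


# One binary exposure: 40% of cases unexposed (RR = 1), 60% exposed (RR = 3)
print(round(af_case_based([0.4, 0.6], [1.0, 3.0]), 4))  # 0.4

# Hypothetical three-level exposure (reference, moderate, heavy)
print(round(af_case_based([0.5, 0.3, 0.2], [1.0, 1.5, 4.0]), 4))  # 0.25
```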

Despite these developments, much confusion prevailed when trying to apportion an excess risk to single exposures in a multi-expositional setting. Some calculated the crude AF for each exposure, and some calculated the AF for each exposure adjusted for the rest. Either method gives AFs for the single exposures that may sum to more (or less) than the AF for all of them together, and even to more than 1 (15). Some authors ‘normalised’ the calculated single-factor AFs so that they were forced to sum to this total combined AF (mentioned, but not advocated, by Kjuus et al. (16)). An elegant solution to this problem was, however, first given by Cox Jr. (17–19) by adapting a principle from game theory (20). Also, Kruskal (21, 22) discussed a parallel solution for ranking independent variables in a multiple regression model according to their individual contributions to the total explained variance. Recently, Rowe et al. (23) provided an updated discussion; it is, however, incomplete, as it does not mention the elegant solution from game theory.

At the end of the 1980s, no standard software was available for displaying or calculating estimates of attributable fractions of any kind. With regard to graphical presentation, Kjuus et al. (24) and Olsen and Kristensen (25) included some nice figures illustrating basic concepts of AFs instructively, using pie charts and risk vs. exposure plots, respectively, thus paving the way for the later development of computerised graphics routines.

In the classical epidemiologic literature, the AF has mostly been a static measure giving the proportion of cases at a given time point that could have been prevented by a hypothesised intervention on the exposure distribution. However, an intervention may have immediate, short-term and long-term effects on the occurrence of a disease, and episodes of disease may come and go to a greater or lesser extent (chronic diseases). Also, subjects may be exposed for shorter or longer periods of time and to varying amounts. Thus, there is a need to develop the classical concept of the attributable fraction further, using more dynamic and flexible modelling of the effects of putative preventive interventions that also takes time to disease into account.

The terminology has not been uniform over the years, and different authors prefer different terms for the AF, the most popular being etiologic fraction and (population) attributable risk.

This review has three aims: (i) to give an introduction to today's standard methods for apportioning excess risk to multiple exposures, groups of exposures and subpopulations; (ii) to give an introduction to graphical displays that are useful with AF-estimation; and (iii) to discuss future possible development of AF methodology that is relevant to the situation with survival time data. The review will be highly coloured by the author's own work with these topics through the years, but hopefully still useful in practice for any researcher in the field of respiratory epidemiology.

Material and methods

The most prominent dataset in the examples described later stems from the Hordaland Study of Obstructive Lung Disease (HSOLD) (26), and one part was given in detail by Eide and Gefeller (27).

Most analyses were done using Stata (Stata Corp, Texas); however, Excel (Microsoft Corporation, Washington) and Maple (Waterloo Maple Inc, Canada) have also been applied. The most advanced graphics were done in Maple.


This section is divided into three parts, according to the three aims of the review stated above.

Attributable fractions with multiple risk factors

A systematic development of terminology and probability theory concerning the effect on a binary response of hypothesised manipulations of one or several risk factors in a population was given by Eide and Heuch (28). The manipulations consisted either of lowering the harmful effect of an exposure or of reducing the exposure's extent in the population. Such modifications might be carried out stepwise, removing one exposure at a time. In this case, the adjusted AF quantifies the effect of removing one exposure in step 1 while leaving the rest unchanged. The combined adjusted attributable fraction measures the combined effect of removing several (or all) exposures while adjusting for the remaining ones. With stepwise removal of one exposure at a time in a pre-specified order, the important concept of sequential attributable fractions (SAFs) arises: the SAFs for the risk factors in an ordered stepwise preventive strategy are defined as the differences between the combined adjusted AFs at two consecutive steps. Giving no exposure any priority in the set of exposures to be eliminated leads to the concept of the average attributable fraction (AAF), defined for each exposure as the average of its SAFs over all possible orderings of the risk factors in the set. Both the SAFs and the AAFs for a set of risk factors have the appealing property of summing to the combined AF for the set. However, only the AAFs are independent of the ordering and thus give a unique apportioning of the combined, possibly adjusted, attributable fraction to the single risk factors eliminated. The usefulness of these measures for evaluating and choosing between different preventive strategies is demonstrated in the examples below.
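In symbols (the notation is introduced here for illustration and is not taken from the cited papers), let N = {1, …, n} be the set of exposures, let AF(S) denote the combined adjusted AF for removing the exposures in S ⊆ N, with AF(∅) = 0, and let π = (π₁, …, πₙ) be an ordering of N. The SAFs and AAFs described above can then be written as follows; the second form of the AAF weights each increment by the number of orderings in which exactly the exposures in S precede exposure j.

$$\mathrm{SAF}_{\pi}(\pi_k) = \mathrm{AF}(\{\pi_1,\dots,\pi_k\}) - \mathrm{AF}(\{\pi_1,\dots,\pi_{k-1}\}), \qquad k = 1,\dots,n$$

$$\mathrm{AAF}(j) = \frac{1}{n!}\sum_{\pi}\mathrm{SAF}_{\pi}(j) = \sum_{S \subseteq N\setminus\{j\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,\bigl[\mathrm{AF}(S\cup\{j\}) - \mathrm{AF}(S)\bigr]$$

$$\sum_{k=1}^{n}\mathrm{SAF}_{\pi}(\pi_k) = \sum_{j\in N}\mathrm{AAF}(j) = \mathrm{AF}(N)$$

The SAFs along any ordering telescope to AF(N), which is why both sets of measures add up to the combined AF; the weighted-subset form of the AAF has the same structure as the Shapley value from game theory.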

Further theoretical justification for the SAF and the AAF has been provided by the research group of Gefeller (29–31), relating their theoretical properties to the optimality of the Shapley value in game theory (20). This group has also extended the concept of AAF to a multiplicative, rather than additive, variant (32) as well as a variant for grouped exposure variables (33).

A Bayesian extension has also been suggested, in which the different ordered strategies are given weights according to an a priori consensus (34, 35). Recent evaluations of the different approaches have been provided by Rabe and Gefeller (36) and Rabe et al. (37).

Routines for computing estimators based on multiple logistic and Poisson regression models have been programmed in Stata (38), and a brief user guide exists (39). Stata also provides unadjusted estimates for various study designs; examples of its use are given below.


All analyses in this example were done with Stata/SE 8.2 for Windows. Technical details of the analyses were given by Eide (39). The data come from the cross-sectional random sample of the Hordaland Study of Obstructive Lung Disease in 1985. The data analysed below were given in table 1 of Eide and Gefeller (27) and comprise four variables: one response variable, chronic cough (no = 0, yes = 1), and three explanatory variables: residence (rural = 0, urban = 1), smoking (never = 0, ex = 1, light = 2, moderate = 3, heavy = 4) and occupational exposure to dust or gas (0 = unexposed, 1 = exposed). The aim is to estimate and discuss the different kinds of attributable fractions of the prevalence of chronic cough with respect to urban residence, smoking and dust or gas exposure.

First, Stata was used to estimate unadjusted attributable fractions based on the 2 × 2 table of each exposure against the response. For instance, for the multi-categorical smoking variable, the four exposure categories were combined into one, and the Stata command cs was used to produce the output in Fig. 1.

Figure 1.

Stata output for estimation of attributable fractions from a cross-sectional study by using the command: cs cough smoking [fweight = freq].

Note that Stata gives confidence limits only for the AFE, not for the AF in the population. For the other two (dichotomous) exposures, setting up the 2 × 2 tables is straightforward, and the AFs were estimated as 0.202 for urban residence and 0.219 for gas/dust exposure.

A combined attributable fraction for all three exposures can be obtained by constructing the dichotomous no-exposure/any exposure variable and the corresponding 2 × 2 table. The result is a combined attributable fraction for all three exposures of 0.648, which is much less than the sum of the three unadjusted AFs (0.447 + 0.202 + 0.219 = 0.868).

To obtain CIs for these unadjusted, data-based attributable fractions, the formulas of Walter (7) could be applied; equivalently, one may run a simple logistic regression in Stata, as shown below. Note that the Stata documentation for the cs command contains an error: the response variable and the exposure variable have been switched.

Finally, it is worth knowing that the formulas for estimates and confidence limits with case-control data (matched and unmatched) differ from those for cohort or cross-sectional studies. Thus, with case-control data, the Stata commands also differ from those described above. Details were given by Eide (39).

We proceed to describe estimation of adjusted, sequential and average attributable fractions based on a logistic regression model when the data come from a randomly sampled cohort or a cross-sectional study. The data of table 1 of Eide and Gefeller (27) may be given to Stata in various formats, be it tabulated or case-wise. An example of the former is given in Table 1, which consists of one line per combination of the values of the response variable (chronic cough) and the explanatory variables, that is, altogether 2 × 2 × 5 × 2 = 40 lines. The first column identifies the 20 different ‘exposure classes’ by numbering them from 0 to 19. The next three columns give the values of the three explanatory variables residence, smoking and dust, respectively. The fifth column is the response variable, and the last column gives the observed frequency for each response × exposure class combination. To analyse the five-level variable smoking as a nominal variable, four indicator (dummy) variables were created and named es (ex-smoker), ls (light smoker), ms (moderate smoker) and hs (heavy smoker), respectively; the Stata command logit was then used to perform a logistic regression analysis, resulting in the output of Fig. 2.
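The tabulated layout just described can be sketched in Python as follows. This is a minimal illustration under assumptions: the order in which the 20 exposure classes are numbered (residence varying slowest, dust fastest) is assumed and may differ from Table 1, and the frequencies are left as `None` placeholders because the observed counts are not reproduced here. The point is the structure: one row per response × exposure class, with smoking expanded into the indicators es, ls, ms and hs, never-smokers being the reference.

```python
from itertools import product

# Indicator names for smoking levels 1-4; never-smokers (0) are the reference
SMOKING_DUMMIES = {1: "es", 2: "ls", 3: "ms", 4: "hs"}


def build_layout():
    """One row per exposure class x response value (2 x 2 x 5 x 2 = 40 rows).

    Assumed class numbering: residence varies slowest, dust fastest.
    Frequencies are placeholders (None) to be filled with the observed counts.
    """
    rows = []
    classes = product((0, 1), range(5), (0, 1))  # residence, smoking, dust
    for cls, (residenc, smoking, dust) in enumerate(classes):
        for cough in (0, 1):
            row = {"class": cls, "residenc": residenc, "smoking": smoking,
                   "dust": dust, "cough": cough, "freq": None}
            for level, name in SMOKING_DUMMIES.items():
                row[name] = int(smoking == level)
            rows.append(row)
    return rows


layout = build_layout()
print(len(layout), layout[-1])
```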

Table 1.  Example of input data file to Stata
 . . . 
Figure 2.

Stata output for estimation of logistic regression model by using the command: logit cough residenc es ls ms hs dust [fweight = freq].

Then, to obtain the estimated combined attributable fraction of the prevalence of chronic cough due to smoking and dust/gas exposure, adjusted for residence, the aflogit command was applied, resulting in the output of Fig. 3. The estimated combined AF is 0.512 with 95% CI (0.414, 0.595). The output also contains the AF for dust adjusted for smoking and residence (0.1674), while the corresponding AF for smoking must be calculated as the sum of its components (−0.0425 + 0.0169 + 0.2292 + 0.2089 ≈ 0.4126; the rounded components shown sum to 0.4125). Ex-smoking is ‘preventive’ for chronic cough and has a negative AF component.
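For cohort or cross-sectional data, one common model-based estimator of an adjusted AF compares the expected number of cases under the fitted model with the expected number when the exposures to be removed are set to their reference level, keeping all other covariates as observed. The Python sketch below shows that computation. The coefficients and cell counts are hypothetical (they are not the estimates of Fig. 2), and we do not claim that it reproduces the internal algorithm of aflogit.

```python
import math

# Hypothetical logistic-model coefficients (NOT the estimates of Fig. 2)
COEF = {"_cons": -2.5, "residenc": 0.4, "es": -0.1, "ls": 0.5,
        "ms": 1.0, "hs": 1.2, "dust": 0.6}
SMOKING_TERMS = {"es", "ls", "ms", "hs"}


def covariates(residenc, smoking, dust):
    """Covariate dict with smoking (0-4) expanded into es/ls/ms/hs indicators."""
    x = {"residenc": residenc, "dust": dust}
    for level, name in ((1, "es"), (2, "ls"), (3, "ms"), (4, "hs")):
        x[name] = int(smoking == level)
    return x


def risk(coef, x):
    """Predicted probability of disease from the logistic model."""
    eta = coef["_cons"] + sum(coef[k] * v for k, v in x.items())
    return 1 / (1 + math.exp(-eta))


def adjusted_af(coef, cells, remove):
    """1 - (expected cases with `remove` set to 0) / (expected cases as observed)."""
    expected = sum(n * risk(coef, x) for x, n in cells)
    counterfactual = sum(
        n * risk(coef, {k: (0 if k in remove else v) for k, v in x.items()})
        for x, n in cells
    )
    return 1 - counterfactual / expected


# Hypothetical exposure classes with counts: (covariates, number of subjects)
CELLS = [(covariates(0, 0, 0), 120), (covariates(1, 1, 0), 60),
         (covariates(1, 2, 1), 40), (covariates(0, 3, 0), 50),
         (covariates(1, 4, 1), 30)]

# Smoking and dust removed, residence kept as observed (i.e. adjusted for residence)
print(round(adjusted_af(COEF, CELLS, SMOKING_TERMS | {"dust"}), 4))
# Removing ex-smoking alone gives a negative AF because its coefficient is negative
print(round(adjusted_af(COEF, CELLS, {"es"}), 4))
```

With coefficients actually fitted by maximum likelihood (model with intercept), the expected number of cases in the denominator equals the observed number of cases, so the denominator can equally be taken from the data.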

Figure 3.

Stata output for estimation of logit-model-based attributable fraction by using the command: aflogit es ls ms hs dust [fweight = freq].

Now, using this procedure, adjusted attributable fractions were estimated for each single-risk factor, each pair of risk factors, and the combined attributable fraction for all three risk factors. The results are summarised in Table 2.

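Table 2.  Attributable fractions (AFs) of chronic cough from Stata

| Exposure | Unadjusted AF | 95% CI: LL | 95% CI: UL | Adjusted AF | 95% CI: LL | 95% CI: UL |
|---|---|---|---|---|---|---|
| Residence (urban) | 0.2020 | 0.0982 | 0.2939 | 0.1671 | 0.0629 | 0.2598 |
| Smoking (ever) | 0.4466 | 0.3417 | 0.5347 | 0.4126 | 0.3020 | 0.5056 |
| Dust (exposed) | 0.2190 | 0.1501 | 0.2822 | 0.1674 | 0.0963 | 0.2329 |
| Residence and smoking | 0.6097 | 0.4599 | 0.7180 | 0.5181 | 0.4034 | 0.6108 |
| Residence and dust | 0.3697 | 0.2390 | 0.4779 | 0.3133 | 0.1993 | 0.4110 |
| Smoking and dust | 0.5190 | 0.4009 | 0.6139 | 0.5124 | 0.4135 | 0.5946 |
| Any of the three | 0.6476 | 0.4800 | 0.7612 | 0.6017 | 0.5000 | 0.6828 |

LL, UL: lower and upper 95% confidence limits. For pairs of exposures, the adjusted estimate is the combined AF adjusted for the third exposure.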

From Table 2, one may study different ordered, stepwise preventive strategies by calculating sequential and average attributable fractions according to Eide and Gefeller (27). For instance, removing smoking first (without changing dust exposure or residence) would reduce the prevalence by 41.3%. Further removing dust/gas exposure (without changing residence) would give a total reduction of 51.2%, i.e. a sequential reduction of about 10% (0.5124 − 0.4126 = 0.0998) when removing dust after smoking. Finally, removing urban residence as well would give a total reduction of 60.2% in the prevalence of chronic cough in the population. Altogether, with three exposures there are six different ordered removal strategies, and averaging the sequential effects of removing smoking over all strategies gives an average attributable fraction due to smoking of 34.96%. Stata does not calculate these, but for the average attributable fraction, Grömping and Weimann (40) derived the asymptotic distribution and wrote a SAS procedure for its estimation that may facilitate its wider use in the future. Also, the freeware pARtial package (41), developed in R, can compute these measures together with bootstrap and jackknife estimates. A recent paper (42) summarised the historical development and compared the coverage of different interval estimators, concluding that confidence intervals based on computer-intensive methods may be worth considering when estimating the adjusted attributable fraction.
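The sequential and average AFs quoted above can be reproduced from the adjusted combined AFs of Table 2. The Python sketch below enumerates all six orderings of residence (R), smoking (S) and dust (D); the dictionary simply re-enters the Table 2 estimates, and the function names are illustrative. Because the tabulated AFs are rounded to four decimals, the averages obtained (about 0.350, 0.127 and 0.125) can differ slightly in the last decimal from the 34.96%, 12.75% and 12.47% reported for Fig. 5, but they sum exactly to the combined AF of 0.6017.

```python
from itertools import permutations

# Adjusted (combined) AFs from Table 2, keyed by the set of removed exposures:
# R = urban residence, S = smoking, D = dust/gas exposure
TABLE2_AF = {
    frozenset(): 0.0,
    frozenset({"R"}): 0.1671,
    frozenset({"S"}): 0.4126,
    frozenset({"D"}): 0.1674,
    frozenset({"R", "S"}): 0.5181,
    frozenset({"R", "D"}): 0.3133,
    frozenset({"S", "D"}): 0.5124,
    frozenset({"R", "S", "D"}): 0.6017,
}


def sequential_afs(order, af=TABLE2_AF):
    """SAF of each exposure when exposures are removed in the given order."""
    removed, safs = frozenset(), {}
    for x in order:
        safs[x] = af[removed | {x}] - af[removed]
        removed = removed | {x}
    return safs


def average_afs(factors=("R", "S", "D"), af=TABLE2_AF):
    """AAF: the SAF of each exposure averaged over all orderings."""
    orders = list(permutations(factors))
    totals = dict.fromkeys(factors, 0.0)
    for order in orders:
        for x, saf in sequential_afs(order, af).items():
            totals[x] += saf
    return {x: total / len(orders) for x, total in totals.items()}


# Smoking first, then dust, then residence (the strategy discussed above)
print({x: round(v, 4) for x, v in sequential_afs(("S", "D", "R")).items()})
aaf = average_afs()
print({x: round(v, 4) for x, v in aaf.items()}, round(sum(aaf.values()), 4))
```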

From the beginning, there has been a duality between the AF and the AFE, and also between attributing risk to variables and to categories. While the theory of adjusted, sequential and average AFs was well developed, no similar development had been carried out for the AFE, although for the unadjusted versions equation (5) gives a simple relationship. In Eide and Heuch (43), a probabilistic development of adjusted, sequential and average AFEs was carried out in parallel to that for the AF. In principle, the AFE for each exposure subclass is partitioned into average AFEs for all the factors responsible for the excess risk of this subclass; and by piecing together the AFEs from all the subclasses for one particular factor, the average AF in the total population for this factor is restored. Thus, the relationships between the average AF concepts for the exposed and for the population were disclosed, and a complete theory for adjusted, sequential and average attributable fractions was established. This resolves not only the duality between partitioning the AF and the AFE to multiple risk factors, but also the duality between attributing risk to variables or to categories. Indeed, a complete decomposition of the AF for the chronic cough example above was given in table 3 of Eide and Heuch (43).

Graphical description

The risk-exposure plot of Eide and Gefeller (27) was formalised by Eide and Heuch (28) and termed the ‘scaled Venn diagram’. This diagram consists of a unit square representing the sample space, in which the horizontal scale carries the probability metric of the joint distribution of the explanatory variables and the vertical scale that of the response (disease/no disease) distribution within each exposure class. For discrete exposure variables, the probabilities of the various events are mapped as areas of the corresponding rectangles in the unit square, thus directly depicting the sizes of the diseased and exposed groups in the population. A variant with one continuous exposure variable was also described. The problem of generalising the diagram to more than one continuous explanatory variable has, however, not been resolved.
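A minimal Python sketch of the geometry just described, assuming discrete exposure classes: each class gets a column whose width is its probability and whose diseased part has height P(D | class), so that areas equal joint probabilities and the diseased areas add up to P(D). The `excess_area` helper, which measures diseased area above a chosen reference risk, and all numbers are hypothetical illustrations; placing the diseased part at the bottom of each column is an arbitrary choice here.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    label: str
    diseased: bool
    x0: float
    x1: float
    y0: float
    y1: float

    @property
    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)


def scaled_sample_space_square(classes):
    """classes: list of (label, P(class), P(D | class)), class probabilities summing to 1.

    Each class becomes a column of width P(class); the diseased part has height
    P(D | class), so its area is P(D and class).
    """
    rects, x = [], 0.0
    for label, p_class, risk in classes:
        x_next = x + p_class
        rects.append(Rect(label, True, x, x_next, 0.0, risk))
        rects.append(Rect(label, False, x, x_next, risk, 1.0))
        x = x_next
    return rects


def excess_area(classes, reference_risk):
    """Diseased area above the reference risk level, summed over classes."""
    return sum(p * (risk - reference_risk) for _, p, risk in classes)


classes = [("unexposed", 2 / 3, 0.1), ("exposed", 1 / 3, 0.3)]
rects = scaled_sample_space_square(classes)
p_d = sum(r.area for r in rects if r.diseased)
print(round(p_d, 4), round(excess_area(classes, 0.1) / p_d, 4))  # 0.1667 0.4
```

Taking the unexposed risk as reference, the excess area divided by the total diseased area is the AF from MacMahon and Pugh's formula.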

The term ‘scaled Venn diagram’ was chosen because its construction was inspired by, and reminded the authors of, the classical Venn diagram (John Venn 1834–1923). The term is, however, slightly misleading, since the classical Venn diagram differs in some important respects: it has no metric scale (it is scale-less), no outer border for the ‘universe’, a symmetrical appearance, and events (sets) are illustrated by possibly intersecting circles or ellipses rather than rectangles (44). A contemporary competitor to Venn's diagram was the ‘Lewis Carroll diagram’ (Lewis Carroll 1832–1898), in which the ‘universe’ was confined within a square and the events were represented by rectangles. A metrical Venn diagram was suggested by Edwards and Edwards (45) as a square diagram enabling visual comparison of expected and observed frequencies in a 2 × 2 × 2 contingency table. However, neither these nor later developments of the classical Venn diagram (44) are convenient for illustrating excess risk or attributable fractions. Rather, for discrete exposure variables, the ‘scaled Venn diagram’ is a two-dimensional (2D) Mondrian plot (46) in which all combinations of the values of the explanatory variables generate the first dimension and the response variable the other. To also embrace the variant with a continuous explanatory variable, the term scaled sample space square (SSSS) may be more appropriate. Eide and Heuch (28) gave examples of the suggested diagrams for cross-sectional as well as case-control data, relaxing the requirement of a unit square.

In Eide and Heuch (47), the SSSS of Eide and Heuch (28) was combined with the mosaic plot (48, 49) to create a three-dimensional (3D) display of the multivariate association structure within the exposure variables as well as between the disease variable and the exposure variables. Moreover, this so-called scaled sample space cube (SSSC) directly depicts the sizes of the diseased and non-diseased subpopulations in various exposure groups, making it especially useful for illustrating excess risk at any level and combination of exposures.

The mosaic plot is a version of the spine plot (50) designed to disclose graphically possible associations in multidimensional contingency tables (51). In the mosaic plot (48), the tiles are separated by gaps to improve visual discrimination. This is especially useful with empty or sparse categories, but is less convenient when a probability metric is desired in the plot. Without gaps, the mosaic display is termed a Mondrian diagram (reminiscent of the paintings of the Dutch painter Piet Mondrian, born Pieter Cornelis Mondriaan, 1872–1944) (46).

The resulting scaled sample space cube shows the joint exposure distribution as a Mondrian diagram in the first two dimensions and the conditional response distribution in the third dimension. Probabilities of the various events appear as volumes of 3D rectangular boxes within the cube. In Eide and Heuch (47), it was demonstrated how the scaled sample space cube can be used to illustrate excess risk and the potential impact on the risk of disease of hypothesised interventions on the exposures in the population. Examples were given by Eide, Heuch and Eagan (52) and are shown in Fig. 4. The figure uses data from the Hordaland Study of Obstructive Lung Disease (53) on the 11-year incidence of attacks of dyspnea and illustrates the effect of removing exposures according to all possible ordered preventive strategies. The volume of the yellow boxes represents the proportion of disease that might have been prevented. The figure shows that the most effective strategy would be to remove smoking first (directly estimated adjusted AF: 0.297, 95% CI (0.173, 0.420)) and, thereafter, female gender (adjusted combined AF: 0.441, 95% CI (0.242, 0.640)). Gender is, however, not a realistic target for a preventive campaign, and this strategy is mostly of academic interest. Nevertheless, one might speculate whether female gender could in this case be a proxy for other, possibly modifiable, exposures that are yet to be identified. For dust/gas and smoking, the combined adjusted AF is 0.344 with 95% CI (0.201, 0.488). (Note that setting the incidence among persons aged at least 30 years in 1985 to the level of those below 30 years increased the overall incidence and gave negative attributable fractions for age, whereas choosing age above 50 years as the reference gave positive attributable fractions, as shown in Fig. 4.)
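The volume bookkeeping behind the cube can be sketched in Python for two binary exposures A and B with hypothetical probabilities: A spans the first axis with widths P(A = a), B the second axis with depths P(B = b | A = a), and the third axis carries the risk P(D | A = a, B = b). Removing B while leaving A unchanged lowers each exposed box to the risk of the unexposed box with the same value of A; the removed volume (the analogue of the yellow boxes) divided by the total diseased volume is then the AF for B adjusted for A. Names and numbers are illustrative assumptions, not data from Fig. 4.

```python
# Hypothetical two-exposure population: A on the first axis, B | A on the second
P_A = {0: 0.5, 1: 0.5}
P_B_GIVEN_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}
RISK = {0: {0: 0.05, 1: 0.15}, 1: {0: 0.10, 1: 0.25}}  # P(D | A = a, B = b)


def cube_boxes(p_a, p_b_given_a, risk):
    """Boxes (a, b, width, depth, height); width * depth * height = P(D, A = a, B = b)."""
    return [(a, b, pa, pb, risk[a][b])
            for a, pa in p_a.items()
            for b, pb in p_b_given_a[a].items()]


def removed_volume(boxes, risk, exposed=1, reference=0):
    """Diseased volume removed when B = exposed gets the B = reference risk within its A."""
    return sum(w * dep * (h - risk[a][reference])
               for a, b, w, dep, h in boxes if b == exposed)


boxes = cube_boxes(P_A, P_B_GIVEN_A, RISK)
diseased_volume = sum(w * dep * h for _, _, w, dep, h in boxes)  # equals P(D)
af_b_adjusted_for_a = removed_volume(boxes, RISK) / diseased_volume
print(round(diseased_volume, 4), round(af_b_adjusted_for_a, 4))
```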

Figure 4.

Scaled sample space cubes showing the effects of stepwise removal of exposures. Volumes of yellow boxes are potentially removed excess risks.

Finally, Eide and Gefeller (27) constructed a simple pie chart for displaying average attributable fractions. The complete decomposition to risk factors and subpopulations obtained by Eide and Heuch (47) invites further development of this pie chart, as shown in Fig. 5. In panel A (left), the pie chart is sorted according to the 20 exposure classes and further subdivided into single exposures within each class. In panel B (right), the original pie chart of average attributable fractions (27) is further subdivided by risk profiles. As a pie chart cannot display negative AF components, such components are subsumed into larger subgroups so that only positive values are shown.

Figure 5.

Pie chart showing complete decomposition of total AF × 100% = 60.18% to risk factors and exposure classes. Left panel sorted according to exposure class numbering, right panel according to the risk factors, i.e. residence (green: 12.75%), smoking (red: 34.96%) and occupational exposure to dust or gas (blue: 12.47%).

The scaled sample space cube is especially beneficial when used interactively on a computer screen. Rotation and zooming often disclose hidden features of the data or the model. Interactivity is even more important with the volume-based scaled sample space cube than with the area-based scaled sample space square. In print, however, the advantages of such interactive manipulation are less apparent, but Fig. 6 shows an example of rotating the scaled sample space cube of Fig. 4 that discloses the small group of women who were above 50 years of age and smokers in 1985 but had no incident cases of breathlessness during 1985–1996: a survival-of-the-fittest effect? An example of the usefulness of zooming with the scaled sample space square was given by Eide and Heuch (28).

Figure 6.

Rotation of scaled sample space cube to better display the extreme risks in the small exposure groups in the back of the cubes of Fig. 4. For females over 50 years, there were no cases of breathlessness in the smoking group, but a large proportion among the ex-smokers.

As with the mosaic plot or the double-decker plot (54), plotting the scaled sample space cube for different orderings of the explanatory factors may disclose information that might otherwise go undetected. Indeed, in the context of attributable fractions, this can point to more efficient preventive strategies. In the cubes of Fig. 4, the ordering of the variables in the base mosaic is the same for all preventive strategies, to make visual comparisons between the different removal strategies easier. When illustrating only one selected preventive strategy, one may adopt the convention of using the same order in the base mosaic as in the actual strategy, because eliminating one exposure from the mosaic will then give the mosaic display corresponding to the remaining exposure factors. Another ambiguity is the ordering of categories within each categorical variable. For an ordinal variable, it seems natural to keep the original ordering, but for a nominal variable the ordering does not follow automatically. One reasonable convention would then be to order the categories by increasing risk of disease, be it hypothesised, observed or estimated.

As of today, neither mosaic plots, spine plots, Mondrian diagrams nor SSSCs can be found in standard statistical software, although specialised software exists. Indeed, this author could not even find a standard statistical package with the option of creating a univariate histogram with bin widths varying between categories, nor a 3D bivariate histogram with an irregular baseline grid. Incorporating such flexibility into plotting routines, for example drawing one- or two-dimensional histograms with pre-selected or data-driven varying bins, should be a natural challenge for statistical software developers.

Statistical graphics are distinguished from other graphics by their universality (55): they are valid for any data measured on a nominal, ordinal or continuous scale and are not tailored to one specific application. The 3D scaled sample space cube is such a method; like a histogram, it can also be used for continuous distributions, provided that reasonable categorisations, or possibly smoothing, are made. Developing further interactive facilities, such as querying, selection and linking, and varying the plot characteristics by rescaling, resizing, zooming, reordering or re-colouring (50), might further enhance its applicability.

Attributable fractions and time

The ultimate goal of traditional epidemiology concerns relationships between exposure and disease that are causal, and a common requisite for causality is that the individual is exposed before the disease occurs. However, time from exposure to disease is not an ingredient in the definitions of attributable fractions in the tradition of Levin (1). Moreover, the time from a suggested intervention to its impact on the population studied has seldom, if ever, been taken into account when estimating attributable fractions. Thus, the attributable fraction is a static measure, considering the disease situation at one point in time and what it could have been if the risk distribution at the same time point, or at some other fixed (but seldom clearly defined) time point, had been different. The need for a more dynamic attributable fraction measure is obvious. The field of event history analysis (or survival analysis) offers a set of models that describe the evolution of disease in individuals over time. Such models describe the time from one event (the starting time) to another (disease), and estimation procedures have been developed especially to account for the censored observation times that are so frequently present in survival data.

Samuelsen and Eide (56) set out to transfer the concepts of attributable fraction to the situation with survival data. To accomplish this, three different concepts were proposed: the attributable hazard fraction (AHF), the attributable fraction before time t (AFB) and the attributable fraction within study (AFW). Traditionally, one would think of these AFs as the potential effects of removing an exposure at the starting time (t = 0), i.e. that some kind of intervention takes place at the intervention time s = 0. In Samuelsen and Eide (56), this situation is modelled within the framework of time-to-event analysis and extended to include the possibility of an intervention at a later time point, s > 0. The different definitions will be illustrated and interpreted by extending an example of Chen et al. (57).


Figure 7 illustrates some of these concepts. The idea is that individuals in the population are followed from a starting time point (t = 0) to the time of the event. This could be, e.g., the time from start of exposure to diagnosis of the disease, the time from diagnosis to death, or the time from birth to the first symptoms of the disease (age). In the latter two cases, the 'exposure' could be a personal characteristic such as smoking or low birth weight, respectively. At a time point s > 0, one may want to study the effect of an intervention that removes the exposure or compensates for it by treatment, e.g. smoking cessation or a special intervention aimed at children with low birth weight. Figure 7 shows that all the proposed AFs are time-dependent. With intervention at time s, all measures are zero for t < s.

Figure 7. Attributable fractions with intervention at t = s.

The attributable hazard fraction at time t with intervention at time s, AHF(t,s), is interpreted as the fraction of the instantaneous risk of disease at time t, among those free of disease before t, that can be prevented by the intervention at time s. For t ≥ s, AHF(t,s) decreases monotonically and approaches zero.
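To make the definition concrete, the following Python sketch computes AHF(t,s) in a deliberately simple, hypothetical model that is not taken from Samuelsen and Eide (56): a binary exposure with prevalence p at t = 0, constant hazards lam0 for the unexposed and hr * lam0 for the exposed, and an intervention at time s that immediately returns exposed survivors to the baseline hazard. The function name ahf and all parameter values are illustrative assumptions.

```python
import math


def ahf(t: float, s: float, p: float, hr: float, lam0: float) -> float:
    """AHF(t, s) in a hypothetical toy model (illustration only).

    Assumptions: binary exposure with prevalence p at t = 0, constant
    baseline hazard lam0 for the unexposed, constant hazard hr * lam0 for the
    exposed, and an intervention at time s that returns every exposed
    survivor to the baseline hazard lam0.
    """
    if t < s:
        return 0.0  # nothing can be prevented before the intervention
    # Disease-free survivors at time t without intervention, by exposure group
    surv_exp = p * math.exp(-hr * lam0 * t)
    surv_unexp = (1 - p) * math.exp(-lam0 * t)
    # Population hazard among those still free of disease at t
    lam_obs = (surv_exp * hr * lam0 + surv_unexp * lam0) / (surv_exp + surv_unexp)
    # After the intervention every survivor has the baseline hazard lam0
    return 1.0 - lam0 / lam_obs


if __name__ == "__main__":
    # Hypothetical values: 30% exposed, HR = 2, baseline hazard 0.1, s = 2
    for t in (1.0, 2.0, 5.0, 10.0, 50.0):
        print(f"t = {t:5.1f}: AHF(t, 2) = {ahf(t, 2.0, 0.3, 2.0, 0.1):.3f}")
```

Because exposed individuals become diseased faster, the proportion exposed among survivors falls over time, so in this model AHF(t,s) decreases towards zero after s; at t = s = 0 it reduces to Levin's formula p(HR - 1)/(1 + p(HR - 1)).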

The attributable fraction before time t with intervention at time s, AFB(t,s), is interpreted as the fraction of all cases of disease occurring up to time t that could have been prevented by removing the exposure through the intervention at time s. For t ≥ s, AFB(t,s) increases continuously from zero to a maximum cumulative effect and then decreases towards zero in the long run.
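Under a simple hypothetical model (binary exposure with prevalence p, constant hazards lam0 and hr * lam0, and an intervention at s that gives exposed survivors the baseline hazard from then on), AFB(t,s) has a closed form: the excess of the disease-free survival curve with intervention, S*(t), over the one without, S(t), divided by the cumulative incidence 1 - S(t). This sketch, including the name afb and the parameter values, is an illustrative assumption and not the estimator of Samuelsen and Eide (56).

```python
import math


def afb(t: float, s: float, p: float, hr: float, lam0: float) -> float:
    """AFB(t, s) in a hypothetical toy model (illustration only)."""
    if t <= s:
        return 0.0  # nothing prevented yet (also avoids 0/0 at t = s = 0)
    # Disease-free survival S(t) without intervention
    surv = p * math.exp(-hr * lam0 * t) + (1 - p) * math.exp(-lam0 * t)
    # S*(t) with intervention: exposed at hazard hr * lam0 until s, then lam0
    surv_int = (p * math.exp(-hr * lam0 * s - lam0 * (t - s))
                + (1 - p) * math.exp(-lam0 * t))
    # Cases prevented up to t divided by all cases up to t, F(t) = 1 - S(t)
    return (surv_int - surv) / (1.0 - surv)


if __name__ == "__main__":
    # Hypothetical values: 30% exposed, HR = 2, baseline hazard 0.1, s = 2
    for t in (2.0, 3.0, 10.0, 20.0, 100.0):
        print(f"t = {t:5.1f}: AFB(t, 2) = {afb(t, 2.0, 0.3, 2.0, 0.1):.3f}")
```

In this toy setting the curve starts at zero at t = s, rises to a maximum and then declines towards zero, because everybody eventually becomes diseased with or without the intervention.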

The attributable fraction within study with intervention at time s, AFW(t,s), is defined as the fraction of all cases of disease in a study with maximum follow-up time t for all subjects that could have been prevented by the intervention at time s. This measure depends on the censoring scheme of the particular study and may or may not approach zero as the maximum follow-up time tends to infinity. Fig. 7 shows one curve for a study with only administrative censoring and one for a study with additional losses to follow-up.
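AFW(t,s) can be sketched in a hypothetical toy model (binary exposure with prevalence p, constant hazards lam0 and hr * lam0, intervention at s returning exposed survivors to lam0) by weighting each potential case at time u by the probability K(u) that a subject is still under observation at u, assuming censoring independent of disease: AFW = ∫K(u)[f(u) - f*(u)]du / ∫K(u)f(u)du over [0, t], where f and f* are the densities of time to disease without and with the intervention. The censoring function staggered_entry (entry uniform over the study period, optionally combined with losses to follow-up at a constant rate) and all names and parameter values are illustrative assumptions; integration uses the midpoint rule.

```python
import math
from typing import Callable


def incidence_density(u: float, s: float, p: float, hr: float, lam0: float,
                      intervene: bool) -> float:
    """Density f(u) of time to disease in the hypothetical toy model."""
    if intervene and u > s:
        # exposed survivors have had the baseline hazard lam0 since time s
        dens_exp = p * lam0 * math.exp(-hr * lam0 * s - lam0 * (u - s))
    else:
        dens_exp = p * hr * lam0 * math.exp(-hr * lam0 * u)
    return dens_exp + (1 - p) * lam0 * math.exp(-lam0 * u)


def afw(t: float, s: float, p: float, hr: float, lam0: float,
        cens_surv: Callable[[float], float], n_grid: int = 4000) -> float:
    """AFW(t, s): expected prevented cases / expected observed cases, where
    cens_surv(u) is the probability of still being observed at u <= t."""
    h = t / n_grid
    observed = prevented = 0.0
    for i in range(n_grid):
        u = (i + 0.5) * h  # midpoint rule on [0, t]
        k = cens_surv(u)
        f = incidence_density(u, s, p, hr, lam0, intervene=False)
        f_int = incidence_density(u, s, p, hr, lam0, intervene=True)
        observed += k * f * h
        prevented += k * (f - f_int) * h
    return prevented / observed


def staggered_entry(t_max: float, loss_rate: float = 0.0) -> Callable[[float], float]:
    """Hypothetical K(u): potential follow-up uniform on [0, t_max]
    (administrative censoring), times optional constant-rate losses."""
    def k(u: float) -> float:
        return (1.0 - u / t_max) * math.exp(-loss_rate * u)
    return k


if __name__ == "__main__":
    for t_max in (10.0, 50.0, 200.0):
        admin = afw(t_max, 2.0, 0.3, 2.0, 0.1, staggered_entry(t_max))
        lossy = afw(t_max, 2.0, 0.3, 2.0, 0.1, staggered_entry(t_max, 0.05))
        print(f"t = {t_max:5.0f}: admin only {admin:.3f}, with losses {lossy:.3f}")
```

With K(u) = 1 (all subjects followed to t) the sketch reproduces AFB(t,s). In this toy setting, AFW tends to zero as t grows when there is only administrative censoring, but to a positive limit when losses occur at a constant rate, which is one way the measure may or may not approach zero.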

Samuelsen and Eide (56) also demonstrated the bias that results from the common practice of substituting the adjusted hazard ratio from a Cox proportional hazards model for survival data (58) for the RR in Levin's classical definition (1).
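The size of this bias can be illustrated in a hypothetical toy model with constant hazards and no confounding, so that the true hazard ratio HR is known exactly. The sketch compares Levin's formula with HR plugged in for RR against AFB(t,0), the fraction of cases up to time t (t > 0) prevented by removing the exposure for everyone at t = 0. Function names and parameter values are assumptions for illustration.

```python
import math


def levin_af(prevalence: float, rr: float) -> float:
    """Levin's classical AF, p(RR - 1) / (1 + p(RR - 1))."""
    excess = prevalence * (rr - 1.0)
    return excess / (1.0 + excess)


def afb_at_start(t: float, p: float, hr: float, lam0: float) -> float:
    """AFB(t, 0), t > 0, in a hypothetical toy model: exposure with prevalence p,
    constant hazard ratio hr and baseline hazard lam0, removed for all at t = 0."""
    surv = p * math.exp(-hr * lam0 * t) + (1 - p) * math.exp(-lam0 * t)
    surv_int = math.exp(-lam0 * t)  # everyone now has the baseline hazard
    return (surv_int - surv) / (1.0 - surv)


if __name__ == "__main__":
    p, hr, lam0 = 0.3, 2.0, 0.1
    print(f"Levin with HR plugged in: {levin_af(p, hr):.3f}")
    for t in (0.1, 1.0, 10.0, 30.0):
        print(f"t = {t:4.1f}: AFB(t, 0) = {afb_at_start(t, p, hr, lam0):.3f}")
```

With p = 0.3, HR = 2 and a baseline hazard of 0.1, the plug-in value is about 0.23 regardless of t, whereas AFB(10, 0) is only about 0.10; the two agree only as t approaches zero, i.e. while few cases have occurred.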

Samuelsen and Eide (56) proposed estimators for the three AF measures and showed how various point-wise confidence intervals can be obtained with the bootstrap. The proposed concepts and their estimators were illustrated using a real data set on the age at which cash benefit for hearing impairment was granted. Receiving cash benefit served as a proxy for developing hearing impairment, and the impact on the risk of hearing impairment of preventing harm from low birth weight was estimated under several scenarios.
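The bootstrap idea can be sketched as follows. This is not the estimator of Samuelsen and Eide (56): it assumes simulated, hypothetical data with complete follow-up (no censoring) and no confounding, estimates AFB(t,0) nonparametrically as 1 - F0(t)/F(t), where F0 and F are the empirical cumulative incidences among the unexposed and in the whole sample, and forms percentile intervals by resampling subjects with replacement. Repeating the interval at several values of t gives point-wise, not simultaneous, confidence limits. All function names are hypothetical; the sketch requires NumPy.

```python
import numpy as np


def simulate_cohort(n: int, p: float, hr: float, lam0: float,
                    rng: np.random.Generator):
    """Hypothetical cohort: exposure ~ Bernoulli(p); exponential time to
    disease with rate lam0 (unexposed) or hr * lam0 (exposed)."""
    exposed = rng.random(n) < p
    rate = np.where(exposed, hr * lam0, lam0)
    time = rng.exponential(scale=1.0 / rate)
    return exposed, time


def afb0_hat(exposed: np.ndarray, time: np.ndarray, t: float) -> float:
    """Plug-in estimate of AFB(t, 0) = 1 - F0(t) / F(t), assuming complete
    follow-up to t and no confounding."""
    f_all = np.mean(time <= t)
    f_unexposed = np.mean(time[~exposed] <= t)
    return float(1.0 - f_unexposed / f_all)


def percentile_bootstrap_ci(exposed, time, t, n_boot=1000, level=0.95, rng=None):
    """Percentile bootstrap interval for AFB(t, 0) at a single time t."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(time)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        stats[b] = afb0_hat(exposed[idx], time[idx], t)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(stats, [alpha, 1.0 - alpha])
    return float(lower), float(upper)


if __name__ == "__main__":
    rng = np.random.default_rng(2008)
    exposed, time = simulate_cohort(2000, p=0.3, hr=2.0, lam0=0.1, rng=rng)
    for t in (2.0, 5.0, 10.0):
        est = afb0_hat(exposed, time, t)
        lower, upper = percentile_bootstrap_ci(exposed, time, t, rng=rng)
        print(f"t = {t:4.1f}: AFB = {est:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```

With censored data, the empirical proportions would have to be replaced by, e.g., Kaplan-Meier or model-based estimates of the cumulative incidence; the resampling step stays the same.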


First, this article describes reasonable and practical methods for apportioning excess risk to different risk factors, to groups of risk factors and to subpopulations identified by given risk profiles. Second, it describes a graphical methodology for depicting the excess risks attributable to different exposure groups in a multi-exposure setting. This methodology gives a dynamic illustration of how the risk of disease in a population changes when exposures are removed stepwise in an ordered preventive strategy. It also paves the way for simple pie charts showing the portions of the total risk ascribed to different exposures or risk profiles. Finally, the article indicates how the classical concept of attributable fraction from epidemiology can be developed from a static measure into one that handles dynamic scenarios, with risks that develop over time and preventive interventions whose effects on risk change over time.
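The apportioning of excess risk can be illustrated with sequential and average AFs. In this sketch, which assumes independent binary exposures with hypothetical prevalences and stratum-specific risks (not data from the article), removing an exposure means giving the exposed the risk of the otherwise identical unexposed stratum. Sequential AFs follow a chosen removal order; average AFs are the mean of the sequential AFs over all orders. Both sets sum to the combined AF. All names and numbers are illustrative assumptions.

```python
from itertools import permutations, product
from typing import Dict, List, Sequence, Set, Tuple

Profile = Tuple[int, ...]


def risk_after_removal(removed: Set[int], prev: Sequence[float],
                       risk: Dict[Profile, float]) -> float:
    """Population risk if the exposures indexed by `removed` are eliminated.
    Assumes independent binary exposures with prevalences `prev`."""
    total = 0.0
    for profile in product((0, 1), repeat=len(prev)):
        weight = 1.0
        for level, p in zip(profile, prev):
            weight *= p if level else 1.0 - p
        # Eliminated exposures are set to 0; the others keep their level
        target = tuple(0 if j in removed else level for j, level in enumerate(profile))
        total += weight * risk[target]
    return total


def combined_af(prev, risk) -> float:
    base = risk_after_removal(set(), prev, risk)
    return (base - risk_after_removal(set(range(len(prev))), prev, risk)) / base


def sequential_afs(order: Sequence[int], prev, risk) -> Dict[int, float]:
    """Excess risk removed at each step of an ordered strategy, divided by
    the total risk; the values sum to the combined AF."""
    base = risk_after_removal(set(), prev, risk)
    removed: Set[int] = set()
    current = base
    afs: Dict[int, float] = {}
    for j in order:
        removed.add(j)
        after = risk_after_removal(removed, prev, risk)
        afs[j] = (current - after) / base
        current = after
    return afs


def average_afs(prev, risk) -> List[float]:
    """Mean of the sequential AFs over all removal orders."""
    orders = list(permutations(range(len(prev))))
    avg = [0.0] * len(prev)
    for order in orders:
        for j, af in sequential_afs(order, prev, risk).items():
            avg[j] += af / len(orders)
    return avg


if __name__ == "__main__":
    prev = (0.3, 0.4)  # hypothetical prevalences of A (index 0) and B (index 1)
    risk = {(0, 0): 0.02, (1, 0): 0.05, (0, 1): 0.04, (1, 1): 0.12}  # hypothetical
    print("combined:", round(combined_af(prev, risk), 3))
    print("A then B:", sequential_afs((0, 1), prev, risk))
    print("B then A:", sequential_afs((1, 0), prev, risk))
    print("average:", average_afs(prev, risk))
```

With these invented numbers the combined AF is 0.023/0.043, about 0.535. The sequential AF of A is about 0.349 when A is removed first but about 0.209 when it is removed second, whereas the average AFs (about 0.279 for A and 0.256 for B) do not depend on the order.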

In summary, the practice of simple and crude calculations of attributable fractions in epidemiology should be abandoned. Modern statistical software should be extended with procedures for illustrating the different kinds of attributable fractions and for estimating the more advanced types. Finally, further development of the statistical theory for attributable fractions with survival data, possibly linked to causal modelling, should be of great interest.


The author is grateful to the journal editors for the invitation to give this synopsis of his work on attributable fractions. Prof. Ivar Heuch is especially acknowledged for his longstanding contributions to, and discussions of, most of the ideas presented in this article.