## Introduction

The wish to quantify the fraction of diseased or deceased individuals that can be ascribed to a given exposure or risk factor has been fundamental to respiratory epidemiologic research, and the investigation of potential cause-effect relationships is an ultimate goal. Levin (1) seems to have been the first to publish a measure of this parameter based on probabilistic considerations. Levin's interest was in quantifying the proportion of lung cancer cases in the population that could be ascribed to smoking, and he formulated his measure in terms of the relative risk and the probability of exposure. MacMahon and Pugh (2) proposed an alternative formulation in terms of the total risk of disease and the risk of disease in the unexposed. The two formulations, which were proved to be algebraically equivalent by Leviton (3), constitute the classical definitions of the *attributable fraction* (AF), and are given in probabilistic notation as follows:

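Levin's formula:

$$\mathrm{AF} = \frac{P(E)\,(\mathrm{RR} - 1)}{1 + P(E)\,(\mathrm{RR} - 1)}$$

MacMahon & Pugh's formula:

$$\mathrm{AF} = \frac{P(D) - P(D \mid \bar{E})}{P(D)}$$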

Here, D denotes the event of disease, E the event of exposure, and an overline the complementary event (so that *Ē* is the event of no exposure), while the relative risk is RR = P(D|E)/P(D|Ē). For instance, D might be getting or having lung cancer and E being exposed to smoking in some well-defined sense. Maximum likelihood estimators (MLEs) are easily obtained by replacing probabilities with the corresponding sample proportions. Thus, Levin's formula is suitable when the relative risk can be estimated, as in a case-control study, while MacMahon and Pugh's formula is more convenient for a cross-sectional or a one-sample cohort study. Alternative formulations have been proposed to optimally serve different sampling designs. A new formulation, useful when only exposure probabilities are available, was given by Eide and Heuch (4).
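As a minimal sketch of the plug-in estimation described above, the following Python snippet computes the AF from a 2×2 table using both classical formulas in their standard forms, AF = P(E)(RR − 1)/[1 + P(E)(RR − 1)] and AF = [P(D) − P(D|Ē)]/P(D). The counts are hypothetical and the function names are chosen here for illustration only; the data are treated as a cross-sectional sample so that all probabilities can be estimated by sample proportions.

```python
def af_levin(p_exposed, rr):
    """Levin's formula: AF from the exposure probability and the relative risk."""
    excess = p_exposed * (rr - 1.0)
    return excess / (1.0 + excess)


def af_macmahon_pugh(p_disease, p_disease_unexposed):
    """MacMahon & Pugh's formula: AF from total risk and risk in the unexposed."""
    return (p_disease - p_disease_unexposed) / p_disease


# Hypothetical cross-sectional counts (illustration only, not real data)
a, b = 60, 340   # exposed:   diseased, non-diseased
c, d = 30, 570   # unexposed: diseased, non-diseased
n = a + b + c + d

# Sample proportions replace the probabilities (plug-in estimators)
p_e = (a + b) / n          # P(E)
p_d = (a + c) / n          # P(D)
p_d_e = a / (a + b)        # P(D|E)
p_d_not_e = c / (c + d)    # P(D|not E)
rr = p_d_e / p_d_not_e     # relative risk

af_1 = af_levin(p_e, rr)
af_2 = af_macmahon_pugh(p_d, p_d_not_e)

print(f"Levin: {af_1:.4f}  MacMahon & Pugh: {af_2:.4f}")
```

Both estimates coincide for the same table, which illustrates the algebraic equivalence of the two formulations.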

Miettinen (5) formulated the *attributable fraction in exposed* (AFE), confining the proportion to those exposed (i.e. all subjects with the event E) rather than the total population. The formulations analogous to those of Levin (1) and MacMahon and Pugh (2), respectively, can be written as

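$$\mathrm{AFE} = \frac{\mathrm{RR} - 1}{\mathrm{RR}}$$

and

$$\mathrm{AFE} = \frac{P(D \mid E) - P(D \mid \bar{E})}{P(D \mid E)}$$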

Miettinen (5) also related AF to AFE by the equation

$$\mathrm{AF} = P(E \mid D) \cdot \mathrm{AFE}$$

For the AF, Walter (6, 7) developed asymptotic distributions for the MLEs in the cross-sectional, cohort and case-control designs, providing approximate standard errors and confidence intervals (CIs). For the AFE, which is just a transformation of the RR, standard errors and confidence intervals are easily obtained by transforming them from the RR scale.
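The transformation from the RR scale can be sketched as follows, using the standard relation AFE = (RR − 1)/RR and Miettinen's relation AF = P(E|D) · AFE. The counts are hypothetical, and the interval for the RR is the usual large-sample Wald interval on the log scale for cohort-type data; that choice is an assumption made here for illustration and is not prescribed by the text.

```python
import math


def afe_from_rr(rr):
    """Attributable fraction in the exposed as a transformation of the RR."""
    return (rr - 1.0) / rr


# Hypothetical cohort-type counts (illustration only, not real data)
a, b = 60, 340   # exposed:   diseased, non-diseased
c, d = 30, 570   # unexposed: diseased, non-diseased

rr = (a / (a + b)) / (c / (c + d))

# Large-sample standard error of log(RR); assumed method, not from the text
se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
z = 1.96  # approximate 97.5th percentile of the standard normal
rr_low = math.exp(math.log(rr) - z * se_log_rr)
rr_high = math.exp(math.log(rr) + z * se_log_rr)

# AFE is increasing in RR, so the RR limits map directly to AFE limits
afe = afe_from_rr(rr)
afe_ci = (afe_from_rr(rr_low), afe_from_rr(rr_high))

# Miettinen's relation: AF = P(E|D) * AFE
p_e_given_d = a / (a + c)
af = p_e_given_d * afe

print(f"AFE = {afe:.3f}, 95% CI ({afe_ci[0]:.3f}, {afe_ci[1]:.3f}), AF = {af:.3f}")
```

Because (RR − 1)/RR is monotone in RR, no separate variance calculation is needed for the AFE; a CI for the AF itself requires the design-specific asymptotic results mentioned above.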

The theory so far was univariate, describing the total elimination of only one exposure. Thus, the attributable fraction as defined above is considered to be crude, unadjusted or ‘marginal’. However, most often the situation is multi-expositional, i.e. there are many factors influencing the probability of disease, and Walter (8) was the first to discuss this problem in probabilistic terms. Some exposures may be considered to be modifiable, others not; and the adjusted attributable fraction, as first defined by Whittemore (9), was designed to quantify the effect of removing one exposure while the others remained unchanged. Whittemore (9, 10) also developed the asymptotic distribution of the maximum likelihood estimator from case-control data. Morgenstern and Bursic (11) suggested the slightly more general concept of ‘potential impact fraction’ reflecting the possibility of imperfect prevention of exposure.

Moreover, in the multi-factorial case a multiple logistic model was often estimated for the risk of disease, and Bruzzi *et al.* (12) showed how this could be applied to estimate adjusted attributable fractions with case-control data. Benichou and Gail (13) were the first to apply the delta method to find the asymptotic variance for a model-based adjusted AF with case-control data, and Basu and Landis (14) extended this methodology to cohort and cross-sectional data.

Despite these developments, much confusion prevailed when trying to apportion an excess risk to single exposures in a multi-expositional setting. Some calculated the crude AF for each exposure, and some calculated the AF for each exposure adjusted for the rest. Each method gives AFs for the single exposures involved that might sum to more (or less) than the AF for all of them combined, and even to more than 1 (15). Some authors ‘normalized’ the calculated single-factor AFs so that they were forced to sum to this total combined AF (mentioned, but not advocated, by Kjuus *et al.* (16)). An elegant solution to this problem was, however, first given by Cox Jr. (17–19) by adapting a principle from game theory (20). Also, Kruskal (21, 22) discussed a parallel solution for ranking independent variables in a multiple regression model according to their individual contributions to the total explained variance. Recently, Rowe *et al.* (23) provided an updated discussion, which was incomplete, however, as it did not mention the solution from game theory.
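The summation problem and a game-theoretic remedy can be illustrated with a small sketch. It assumes that the principle referred to is the Shapley value, which the text does not name explicitly: each exposure is credited with its incremental reduction in AF, averaged over all orders in which the exposures could be removed. The risks and exposure prevalences below are hypothetical, and all function names are chosen here for illustration.

```python
from itertools import permutations

# Hypothetical disease risks by (exposure A, exposure B); illustration only
risk = {(0, 0): 0.02, (1, 0): 0.06, (0, 1): 0.04, (1, 1): 0.20}
# Hypothetical joint distribution of the two exposures in the population
prev = {(0, 0): 0.4, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.2}


def population_risk(removed):
    """Risk of disease if every exposure indexed in `removed` is set to 0."""
    total = 0.0
    for levels, p in prev.items():
        counterfactual = tuple(0 if i in removed else x
                               for i, x in enumerate(levels))
        total += p * risk[counterfactual]
    return total


def af(removed):
    """AF for removing a set of exposures while the others stay unchanged."""
    baseline = population_risk(set())
    return (baseline - population_risk(set(removed))) / baseline


def average_af(n_factors):
    """Shapley-type shares: mean incremental AF over all removal orders."""
    shares = [0.0] * n_factors
    orders = list(permutations(range(n_factors)))
    for order in orders:
        removed = set()
        for i in order:
            before = af(removed)
            removed.add(i)
            shares[i] += af(removed) - before
    return [s / len(orders) for s in shares]


single = [af({0}), af({1})]   # each exposure removed on its own
combined = af({0, 1})         # both exposures removed together
shares = average_af(2)

print("single-factor AFs:", [round(x, 3) for x in single], "sum:", round(sum(single), 3))
print("combined AF:", round(combined, 3))
print("averaged shares:", [round(x, 3) for x in shares], "sum:", round(sum(shares), 3))
```

With these hypothetical numbers the single-factor AFs sum to more than 1, whereas the averaged shares sum exactly to the combined AF, so no ad hoc normalization is required.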

At the end of the 1980s, there was no standard software available for displaying or calculating estimates of attributable fractions of any kind. With regard to graphical presentation, Kjuus *et al.* (24) and Olsen and Kristensen (25) included some figures illustrating basic concepts of AFs in instructive ways by using pie charts and risk vs exposure plots, respectively, thus paving the way for the later development of computerised graphical routines.

In the classical epidemiologic literature, the AF has mostly been a static measure giving the proportion of cases at a given time point that could have been prevented by a hypothesized intervention on the exposure distribution. However, an intervention may have immediate, short-term and long-term effects on the occurrence of a disease, and episodes of disease may come and go (as in chronic diseases). Also, subjects may be at risk of exposure for shorter or longer periods of time and in varying amounts. Thus, there is a need to develop the classical concept of the attributable fraction further by using more dynamic and flexible modelling of the effects of putative preventive interventions, modelling that also takes time to disease into account.

The terminology has not been consistent through the years, and different authors prefer different terms for the AF, the most popular being etiologic fraction and (population) attributable risk.

This review has three aims: (i) to give an introduction to today's standard methods for apportioning excess risk to multiple exposures, groups of exposures and subpopulations; (ii) to give an introduction to graphical displays that are useful with AF-estimation; and (iii) to discuss future possible development of AF methodology that is relevant to the situation with survival time data. The review will be highly coloured by the author's own work with these topics through the years, but hopefully still useful in practice for any researcher in the field of respiratory epidemiology.