### INTRODUCTION

Determining the functional form of the relationship between a variable that represents cumulative duration of past exposure to an agent and the hazard of an adverse outcome is an important issue in environmental and occupational epidemiology, public health, and clinical medicine. Important examples of such exposures include tobacco smoke, industrial toxins, environmental pollutants, radiation, and medications taken for chronic medical conditions. Correctly describing the functional form of the relationship between such exposures and the risk of an adverse event can inform regulators, public health officials, clinicians and patients, and can identify whether or not cumulative toxicity is important. When it is, an accurate characterization of the relationship between cumulative duration of past exposure and toxicity can help to identify safe levels of cumulative exposure, levels of cumulative duration of past exposure at which the risk becomes clinically important, and the total duration of exposure beyond which the risk of the outcome increases more rapidly.

Common study designs for the analysis of the effects of exposures on the risk of adverse events are the cohort design, the case-control design and the nested case-control design. When using a cohort design, the Cox proportional hazards regression model is often used to describe the association between exposure and the hazard of an outcome. When using a case-control (or nested case-control) design in which one or more controls have been individually matched to each case, conditional logistic regression can be used to determine the association between total cumulative duration of past exposure and the risk of the outcome.

The Cox proportional hazards regression model is frequently used in biomedical research to model the effect of explanatory variables on the hazard of the occurrence of a time-to-event outcome.[1] A primary advantage of this model is that one is freed from making specific assumptions about the functional form of the hazard function or about the specific parametric family from which the distribution of event times arises. Two secondary advantages of this model are its ability to incorporate time-varying covariate effects and time-dependent (or time-varying) covariates.[2] The former refers to a variable whose effect on the hazard of the outcome is allowed to change over the duration of follow-up. The latter refers to a variable whose value itself changes over the duration of follow-up.

Dichotomous or categorical time-dependent variables occur frequently in biomedical research. Each level of the variable represents a discrete state, and patients can move between different discrete states (e.g. receipt of an organ transplant, current use of chronic medication, and current use of different medications within the same therapeutic class). However, a time-dependent variable may also be a continuous variable. A particularly important type of continuous time-dependent variable is one denoting cumulative duration of past exposure. While incorporating categorical time-dependent variables into a Cox proportional hazards regression model is relatively straightforward, incorporating a continuous time-dependent variable is more complex. In particular, one must decide how to model what may be a non-linear relationship between a continuous time-dependent covariate and the log hazard of the outcome.

Different methods have been proposed for estimating the effect of cumulative duration of exposure on the risk of adverse outcomes. Hauptmann *et al*. used a B-spline to model the effect of incremental exposure on disease risk.[3] The method was subsequently applied in a case-control study examining the effect of smoking on lung cancer. Richardson *et al*. used parametric latency functions to estimate the incremental effect of increasing exposure on disease risk.[4] The method was subsequently illustrated by examining the association between radon exposure and lung cancer mortality in uranium miners. Richardson *et al*. note that exposure is often lagged to allow for a latency period in studies examining the effect of cumulative exposure on disease risk; they developed methods for jointly estimating the parameters describing the exposure–outcome association and the latency distribution.[5] The method was illustrated by examining the association between cumulative asbestos exposure and lung cancer mortality in textile workers. Finally, Abrahamowicz *et al*.[6] and Sylvestre and Abrahamowicz[7] developed a method for modelling the cumulative effects of time-dependent exposures in cohort studies, in which past exposures are weighted by their recency and the resulting weighted cumulative exposure is represented by a time-dependent covariate in a Cox proportional hazards model. The function that assigns weights to previous doses is estimated using cubic regression splines. This method has been used to assess the cumulative effects of exposure to benzodiazepines on the risk of fall-related injuries in the elderly.[8]

When fitting regression models, analysts frequently categorize continuous explanatory variables, either out of convenience or in the interest of simplicity. However, several authors have criticized this practice.[9-12] Drawbacks to categorization include difficulties in deciding how best to categorize the continuous variable, incorrect inferences, and the loss of information. However, assuming a linear relationship between the continuous variable and the outcome can also yield misleading analyses. Several different approaches have been proposed to account for non-linear relationships between fixed or time-invariant continuous explanatory variables and the outcome. These include the use of generalized additive models, restricted cubic smoothing splines, and fractional polynomials (FPs).[12-14] Each of these approaches allows for flexible modelling of the relationship between a continuous fixed or time-invariant covariate and the outcome.

The objective of the current study was to examine the utility of FPs to model the relationship between cumulative duration of past exposure and the risk of an adverse outcome. We examine the use of these methods in two common study designs: the cohort design and the nested case-control design. We first provide a brief review of FPs for modelling the effects of continuous covariates. We then describe two case studies in which we examine the relationship between cumulative duration of past exposure to a specific medication and the risk of an adverse event.

### FRACTIONAL POLYNOMIALS

Fractional polynomials were proposed by Royston and Altman as a restricted set of transformations of a single continuous variable.[15, 16] Given a continuous explanatory variable $x > 0$, a transformation of the form $x^{p}$ for $p \in S = \{-2, -1, -0.5, 0, 0.5, 1, 2, 3\}$ (with the convention that $x^{0}$ denotes $\log(x)$) is referred to as an FP transformation of degree 1. An FP1 function, defined as $\varphi_1^{*}(x;p) = \beta_0 + \beta_1 x^{p} = \beta_0 + \varphi_1(x;p)$, can be included in a regression model as an explanatory variable. There are eight FP transformations of degree 1.
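To make the FP1 definition concrete, here is a minimal Python sketch (not the authors' code) of the transformation $x^{p}$ over the power set $S$, with $x^{0}$ taken as $\log(x)$. The names `FP_POWERS`, `fp_power` and `fp1` are hypothetical, and the coefficients in any call are arbitrary illustrative values rather than estimates.

```python
import math

# The FP power set S of Royston and Altman; power 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_power(x: float, p: float) -> float:
    """Return the FP transformation x^(p), using the convention x^(0) = log(x)."""
    if x <= 0:
        raise ValueError("FP transformations require x > 0")
    return math.log(x) if p == 0 else x ** p


def fp1(x: float, p: float, beta0: float, beta1: float) -> float:
    """Evaluate an FP1 function beta0 + beta1 * x^(p)."""
    return beta0 + beta1 * fp_power(x, p)


# The eight FP1 transformations of a cumulative duration of 30 days.
for p in FP_POWERS:
    print(p, round(fp_power(30.0, p), 4))
```

In a Cox model the intercept `beta0` would be dropped, as the text notes below; the transformed value `fp_power(x, p)` is simply supplied as the covariate.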

Royston and Altman extended the definition of FP transformations to FP$m$ transformations, where $m$ is an integer $\geq 2$. An FP2 transformation of $x$ with powers $p = (p_1, p_2)$ yields the vector $x^{p} = (x^{p_1}, x^{p_2})$ when $p_1 \neq p_2$, and the vector $x^{p} = (x^{p_1}, x^{p_1}\log(x))$ when $p_1 = p_2$ (repeated powers). An FP2 function (for $p_1 \neq p_2$), defined as $\varphi_2^{*}(x;p) = \beta_0 + \beta_1 x^{p_1} + \beta_2 x^{p_2} = \beta_0 + \varphi_2(x;p)$, can be included in a regression model as an explanatory variable (for $p_1 = p_2$, the term $\beta_2 x^{p_2}$ is replaced by $\beta_2 x^{p_1}\log(x)$, in line with the vector above). The powers $p_1$ and $p_2$ are taken from the same set $S$ described in the previous paragraph. There are 28 FP2 transformations with distinct powers ($p_1 \neq p_2$) and 8 FP2 transformations with equal powers ($p_1 = p_2$); together with the eight FP1 transformations, there are consequently a total of 44 FP1 and FP2 transformations. FP1 functions are always monotonic, while FP2 functions may be monotonic or unimodal. In medical applications, FP1 and FP2 transformations are used almost exclusively, with higher-order transformations being used rarely. FP1 and FP2 functions allow representation of a wide range of non-linear relationships. For greater detail, the reader is referred to the comprehensive reference by Royston and Sauerbrei.[14] Once a specific FP1 or FP2 function is incorporated into a regression model (for Cox proportional hazards models, the intercept is omitted from the FP function), the regression model can be estimated using conventional methods for the regression model at hand.
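The following Python sketch, with hypothetical helper names and plain Python floats, builds the two FP2 covariates (including the repeated-power case) and enumerates the candidate power vectors to check the count of 44 given above. It illustrates the definitions only; fitting the resulting models is left to the regression software at hand.

```python
import math
from itertools import combinations_with_replacement
from typing import Tuple

FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_power(x: float, p: float) -> float:
    """x^(p) with the convention x^(0) = log(x); requires x > 0."""
    if x <= 0:
        raise ValueError("FP transformations require x > 0")
    return math.log(x) if p == 0 else x ** p


def fp2_terms(x: float, p1: float, p2: float) -> Tuple[float, float]:
    """Return the two covariates of an FP2 transformation with powers (p1, p2).

    For repeated powers (p1 == p2) the second covariate is x^(p1) * log(x).
    """
    first = fp_power(x, p1)
    if p1 == p2:
        return first, first * math.log(x)
    return first, fp_power(x, p2)


# Candidate power vectors: 8 FP1, plus 36 FP2 (28 distinct pairs, 8 repeated).
fp1_candidates = [(p,) for p in FP_POWERS]
fp2_candidates = list(combinations_with_replacement(FP_POWERS, 2))
print(len(fp1_candidates), len(fp2_candidates),
      len(fp1_candidates) + len(fp2_candidates))  # prints: 8 36 44
```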

A closed test procedure, known as RA2, has been proposed for selecting the most appropriate FP function for inclusion in a regression model.[17, 18] In this procedure, which is described in greater detail in these references, a linear function is the default choice. The test procedure preserves an overall family-wise type I error rate of *α*. One fits each of the 44 possible regression models, each containing a different FP function as an explanatory variable (possibly in addition to other covariates), and determines the deviance of each fitted model. We reproduce Royston and Sauerbrei's[14] description of this selection procedure:

- Compare the best FP2 model for *x* against the null model using a test with 4 degrees of freedom. If the test is not significant, then one stops the process and concludes that the effect of *x* is not significant at the *α* level. Otherwise proceed.
- Test for the best FP2 for *x* against a linear relationship at the *α* level using a test with 3 degrees of freedom. If the test is not statistically significant, then one stops, with the final model being a straight line. Otherwise continue.
- Test for the best FP2 for *x* against the best FP1 at the *α* level using a test with 2 degrees of freedom. If the test is not significant, the final model is FP1; otherwise, the final model is FP2. At this point the procedure terminates.
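The three steps above can be sketched in Python as follows, assuming the deviance of every candidate model has already been obtained from whatever software fits the Cox or conditional logistic regression models. Each step refers a deviance difference to a chi-square distribution with 4, 3 or 2 degrees of freedom. To keep the sketch dependency-free, `chi2_sf` computes the chi-square upper tail from a standard recurrence rather than calling a statistics library; `chi2_sf` and `ra2_select` are hypothetical names, and the deviances in the usage lines are invented for illustration.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    k = 2 if df % 2 == 0 else 1
    q = math.exp(-x / 2) if k == 2 else math.erfc(math.sqrt(x / 2))
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def ra2_select(dev_null, dev_fp1, dev_fp2, alpha=0.05):
    """Closed test procedure (RA2) for one continuous covariate x.

    dev_null: deviance of the model without x.
    dev_fp1:  dict {(p,): deviance} for FP1 models; must include the linear model (1,).
    dev_fp2:  dict {(p1, p2): deviance} for FP2 models.
    Returns (label, powers); powers is None when x is not significant.
    """
    best_fp1 = min(dev_fp1, key=dev_fp1.get)
    best_fp2 = min(dev_fp2, key=dev_fp2.get)
    d_fp2 = dev_fp2[best_fp2]
    # Step 1: best FP2 versus the null model, 4 df.
    if chi2_sf(dev_null - d_fp2, 4) >= alpha:
        return "not significant", None
    # Step 2: best FP2 versus a straight line (FP1 with power 1), 3 df.
    if chi2_sf(dev_fp1[(1,)] - d_fp2, 3) >= alpha:
        return "linear", (1,)
    # Step 3: best FP2 versus best FP1, 2 df.
    if chi2_sf(dev_fp1[best_fp1] - d_fp2, 2) >= alpha:
        return "FP1", best_fp1
    return "FP2", best_fp2


# Invented deviances for illustration; a real analysis would supply all 8 FP1
# and all 36 FP2 models.
fp1_dev = {(-2,): 95.0, (-1,): 94.0, (-0.5,): 92.0, (0,): 80.0,
           (0.5,): 88.0, (1,): 95.0, (2,): 97.0, (3,): 98.0}
fp2_dev = {(-0.5, 0): 79.0, (0, 0): 79.5, (1, 2): 90.0}
print(ra2_select(100.0, fp1_dev, fp2_dev))  # ('FP1', (0,))
```

Here the best FP2 clearly beats both the null model and the straight line, but not the best FP1 (log *x*), so the sketch returns the FP1 model.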

The preceding discussion of FPs, and of the procedure for selecting the best FP transformation, applies to settings in which there is a single continuous variable that one is seeking to model using FPs. It is also possible to use FPs to model the effects of several continuous variables on the outcome simultaneously. However, we do not consider this approach here, as the current study considers only a single continuous variable denoting cumulative duration of past exposure.

### DISCUSSION

We demonstrated that FPs can be used to model the relationship between cumulative duration of past exposure to a pharmaceutical agent and the risk of an adverse outcome. These methods can be used in cohort studies in which cumulative duration of past exposure is treated as a continuous time-dependent covariate. The methods are also applicable in case-control designs in which total cumulative duration of past exposure is treated as a continuous variable. Our focus was on modelling the relationship between current cumulative duration of medication use (in days) and the risk of an adverse event. However, the methods can be applied to more general problems. The methods could also be applied to model the relationship between cumulative dose (e.g. in milligrams of active ingredient) and the risk of an adverse outcome. Furthermore, in cohort designs, the use of FPs may allow one to model the relationship between any continuous time-dependent covariate (which takes on strictly positive values) and the hazard of an outcome.

We illustrated the application of our methods in both a cohort design and a nested case-control design. There are advantages to each of these two designs. The use of a cohort design will, in general, result in greater statistical power and efficiency because information from all of the cohort members is used, while a nested case-control design will result in greater computational efficiency.[25, 26] Conducting the FP analyses in the cohort of amiodarone users required over 400 times as much processor time as did conducting these analyses in the nested case-control study of bisphosphonate users. The substantial increase in processing time for the cohort design arises because, for these analyses, the data set must be structured with one row of data for each day of follow-up, so that the cumulative duration of past exposure can be calculated separately for each day. Thus, for a cohort of 1000 subjects, each of whom is followed for an average of 1000 days, the resulting data set will have 1 000 000 records. A further advantage of the nested case-control design is that, when the necessary data on all subjects in the cohort are not readily available, this design will be more economical than the cohort design because additional data need to be collected only for the cases and their matched controls.[27]
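To illustrate the person-day layout described above, the following Python sketch expands one subject's exposure history into one row per day of follow-up, carrying the running cumulative duration of exposure. It is an illustration under stated assumptions: the function name `expand_person_days`, the inclusive (start, end) day intervals and the choice to count the current day in the cumulative total are all hypothetical. Because FP transformations require a strictly positive argument, any rows with zero cumulative exposure would need separate handling.

```python
def expand_person_days(subject_id, follow_up_days, exposure_intervals):
    """Build one row per day of follow-up with the cumulative days of exposure so far.

    exposure_intervals: inclusive (start_day, end_day) pairs, days numbered from 1.
    Each row is (subject_id, day, cumulative exposed days up to and including day).
    """
    exposed = set()
    for start, end in exposure_intervals:
        exposed.update(range(start, end + 1))
    rows = []
    cumulative = 0
    for day in range(1, follow_up_days + 1):
        if day in exposed:
            cumulative += 1
        rows.append((subject_id, day, cumulative))
    return rows


# Hypothetical subject: exposed on days 1-3 and 6-7, followed for 8 days.
rows = expand_person_days("A", 8, [(1, 3), (6, 7)])
print([r[2] for r in rows])  # [1, 2, 3, 3, 3, 4, 5, 5]
```

Each subject contributes as many rows as days of follow-up, which is why the cohort data set grows to roughly (number of subjects) × (mean days of follow-up) records.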

In our second case study, which used a nested case-control design, we compared the use of FPs with the use of restricted regression splines. However, we did not conduct a similar comparison in our first case study, which used a cohort design. The reason for this omission highlights an advantage of using FPs to model the effects of continuous time-dependent covariates. The use of (restricted) cubic regression splines requires that the locations of knots be specified. Frequently, these knots are placed at specified quantiles of the distribution of the continuous variable.[12] However, when the value of the continuous variable changes over the duration of follow-up, it is unclear how these quantiles should be determined. When the continuous variable denotes cumulative duration of past exposure, its distribution shifts to the right (or upwards) over time. If one bases the knots on the quantiles of the distribution of cumulative duration of past exposure at the end of follow-up, then the large majority of the values of the variable during the early phases of follow-up will lie to the left of the first or second knot, limiting the flexibility of the estimate. In contrast, the use of FPs does not rely on computing any such quantiles: one simply applies the selected FP1 or FP2 transformation to the value of the continuous time-dependent covariate at each follow-up time. In a retrospective design such as the nested case-control study, by comparison, the cumulative duration of past exposure is fixed at the index date (the event date for the cases). Therefore, the requisite quantiles can be computed, allowing one to use (restricted) cubic regression splines to model the non-linear relationship between cumulative duration of past exposure and the risk of the outcome.
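The knot-placement problem can be illustrated with a small hypothetical cohort. In this Python sketch, 100 subjects are assumed to use the medication continuously for follow-up periods of 10, 20, ..., 1000 days, so that cumulative duration on follow-up day *d* equals *d*. Knots are placed at the quartiles of the end-of-follow-up distribution purely as an arbitrary illustrative choice. By construction a quarter of the end-of-follow-up values lie below the first knot, but a much larger share of the person-days used in a time-dependent analysis do.

```python
import statistics

# Hypothetical cohort: continuous users followed for 10, 20, ..., 1000 days,
# so cumulative duration on day d of follow-up is simply d.
follow_up = list(range(10, 1001, 10))

# Knots placed at quartiles of cumulative duration at the END of follow-up.
end_values = follow_up
knots = statistics.quantiles(end_values, n=4)
first_knot = knots[0]

# Share of end-of-follow-up values versus share of all person-days below the first knot.
share_end = sum(v < first_knot for v in end_values) / len(end_values)
person_days = [d for f in follow_up for d in range(1, f + 1)]
share_left = sum(d < first_knot for d in person_days) / len(person_days)
print(first_knot, round(share_end, 3), round(share_left, 3))  # 252.5 0.25 0.439
```

In this toy example the first spline segment covers about 44% of person-time rather than 25%, and every person-day in the first 250 days of follow-up falls in that segment.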

In the current study, we explored the use of FPs for modelling the effect of time-dependent covariates on the log hazard of an outcome. An advantage of FPs in the current context is that, with negative powers, they can model curves that tend to a horizontal asymptote as the explanatory variable tends to infinity (i.e. the curve becomes increasingly flat as the explanatory variable becomes very large). This may be useful in examining the effects of cumulative duration of past exposure if there are settings in which there is some ‘saturation’ level of cumulative duration of past use, above which the risk of an event does not change. Such a setting could not be modelled using cubic regression splines, which do not permit accurate approximation of curves with a horizontal asymptote: a cubic regression spline tends to either negative or positive infinity as the explanatory variable becomes arbitrarily large. While we believe our use of FPs to model the effects of time-dependent covariates to be original, it bears noting that FPs have been proposed for modelling time-varying covariate effects.[14, 28]
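A minimal numeric illustration of the horizontal asymptote, assuming a hypothetical FP1 term with power −1 and an invented negative coefficient: the hazard ratio relative to a reference duration $x_{\mathrm{ref}}$ is $\exp\{\beta_1(x^{-1} - x_{\mathrm{ref}}^{-1})\}$, which rises quickly at first and then levels off towards $\exp(-\beta_1/x_{\mathrm{ref}})$. A polynomial term of positive degree, such as a cubic, would instead grow without bound. The names `BETA1`, `X_REF` and `hazard_ratio` are illustrative only.

```python
import math

# Hypothetical FP1 term with power -1; BETA1 and X_REF are invented values.
BETA1 = -60.0   # negative coefficient on x^(-1): the log hazard rises with x
X_REF = 30.0    # reference cumulative duration (days)


def hazard_ratio(x: float) -> float:
    """Hazard ratio at cumulative duration x versus X_REF when log h includes BETA1 * x^(-1)."""
    return math.exp(BETA1 * (1.0 / x - 1.0 / X_REF))


# As x -> infinity, 1/x -> 0, so the hazard ratio approaches exp(-BETA1 / X_REF).
plateau = math.exp(-BETA1 / X_REF)
for days in (30, 90, 365, 1825, 3650):
    print(days, round(hazard_ratio(days), 3))
print("plateau", round(plateau, 3))
```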

The primary limitation of the method, as currently described, is that it does not take into account the recency of the exposure. Thus, a given cumulative duration of past exposure that occurred entirely in the distant past has the same effect on the current hazard of the outcome as the same cumulative duration of exposure that occurred recently. As noted in the Introduction, Hauptmann *et al*.,[3] Richardson *et al*.,[5] Abrahamowicz *et al*.[6] and Sylvestre and Abrahamowicz[7] have developed methods that allow one to account for recency of exposure or latency. In future work, the methods described in this paper should be extended to incorporate the timing of past exposure when using FPs to model the exposure–outcome relationship.
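The limitation can be seen in a small Python sketch. Two hypothetical 180-day exposure histories, one long ago and one ending on the current day, yield the same unweighted cumulative duration and therefore the same FP covariate. A recency-weighted summary distinguishes them. The exponential-decay weight with a 90-day half-life used here is an arbitrary assumption for illustration, not the spline-based weight function estimated by Abrahamowicz *et al*., and the function names are hypothetical.

```python
def cumulative_duration(exposed_days, today):
    """Unweighted cumulative duration: number of exposed days on or before `today`."""
    return sum(1 for d in exposed_days if d <= today)


def recency_weighted(exposed_days, today, half_life=90.0):
    """Hypothetical recency-weighted exposure using an exponential-decay weight.

    Each past exposed day contributes 0.5 ** (days_elapsed / half_life). This weight is
    an arbitrary illustrative choice, not the spline-estimated weight of Abrahamowicz et al.
    """
    return sum(0.5 ** ((today - d) / half_life) for d in exposed_days if d <= today)


TODAY = 1000
distant = range(1, 181)     # 180 days of use that ended long ago
recent = range(821, 1001)   # 180 days of use ending today

print(cumulative_duration(distant, TODAY), cumulative_duration(recent, TODAY))  # 180 180
print(round(recency_weighted(distant, TODAY), 3), round(recency_weighted(recent, TODAY), 3))
```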

In conclusion, FPs can be used to model the nature of the relationship between cumulative duration of past exposure to an agent and the risk of the occurrence of an outcome. These methods can be employed in cohort designs when cumulative duration of past exposure is treated as a time-dependent covariate in a Cox proportional hazards model. The methods can also be applied in case-control designs in which total cumulative duration of past exposure is assessed for each case and control. Increased use of these methods will provide investigators in clinical medicine, public health, and epidemiology with tools to examine the form of the relationship between cumulative duration of past exposure and the risk of an outcome.