Estimating risk factor progression equations for the UKPDS Outcomes Model 2 (UKPDS 90)

To estimate 13 equations that predict clinically plausible risk factor time paths to inform the United Kingdom Prospective Diabetes Study (UKPDS) Outcomes Model version 2 (UKPDS‐OM2).


| INTRODUCTION
Computer simulation models allow users to predict disease progression, health outcomes and costs in individuals with type 2 diabetes beyond the time constraints of clinical trials and observational cohorts. Such models form the cornerstone of economic evaluation of therapies for the treatment of diabetes and its complications. One of the most widely used simulation models is the United Kingdom Prospective Diabetes Study (UKPDS) Outcomes Model (UKPDS-OM). 1 It has been used in a wide variety of applications, including cost-effectiveness evaluations of diabetes interventions, 2-4 healthcare service planning 5 and as a long-term prognostic tool. 6 Its structure and published equations form the basis of other health economic diabetes simulation models, including many of those that are part of the Mt Hood diabetes modelling network. 7 In 2013, a new version of the UKPDS Outcomes Model was published (version 2, UKPDS-OM2) based on data from 5102 UKPDS participants in the 20-year trial and all 4031 surviving participants entering the 10-year post-trial monitoring (PTM) period. 8 The updated model also showed the importance of new risk factors (such as estimated glomerular filtration rate [eGFR], heart rate, micro- and macro-albuminuria and white blood cells [WBCs]) in predicting diabetes-related complications and death. The model did not include a set of equations for updating risk factor time paths, and so diabetes simulation modellers have had to make assumptions, often with a limited evidence base. Although there has been an attempt to estimate time paths for a subset of risk factors, this involved only four of the 13 risk factors used in the UKPDS model, and these equations have not been widely adopted into health economic diabetes simulation models. 9 The Mt Hood network has identified this area as a research priority 7 and recommended in its recent guidelines on transparency in diabetes simulation modelling 10 that descriptive statistics of risk factor time paths by treatment be routinely reported in all economic evaluations of diabetes interventions.
We aimed to estimate progression equations for each of these 13 clinical risk factors using individual participant data from the UKPDS trial and PTM period. An immediate use for such equations is to incorporate them into the UKPDS-OM2 (or related diabetes simulation models), permitting them to make predictions of future event rates and life expectancy without making arbitrary assumptions about the trajectory of these risk factors. The UKPDS study involved the use of a range of anti-diabetes therapies (including metformin, sulfonylureas or insulin) representing a standard of care against which many new therapies are seeking to improve. Hence, estimated risk factor time paths using UKPDS data could provide a benchmark to evaluate the additional benefits of newer therapies and policies.

Conclusions:
The new equations allow risk factor time paths to be projected beyond observed data, which should improve modelling of long-term health outcomes for people with type 2 diabetes when using the UKPDS-OM2 or other models.

KEYWORDS
blood glucose, complications, patient-level simulation, risk modelling, survival, UKPDS

Novelty statement
What is already known?
• Computer simulation models allow users to predict disease progression, health outcomes and costs in individuals with type 2 diabetes. The UKPDS Outcomes Model 2 (UKPDS-OM2) requires progression equations for 13 clinical risk factors.
What this study has found?
• We have estimated 13 new risk factor progression equations for people with type 2 diabetes. The predictive accuracy of UKPDS-OM2 improved significantly when using the new risk equations compared with holding the risk factors constant.
What are the implications of the study?
• The equations allow risk factor time paths to be projected beyond observed data, which should improve modelling of long-term health outcomes for people with type 2 diabetes when using the UKPDS-OM2 or other models.

Outcomes model version 2
The UKPDS was a clinical trial evaluating different glycaemic and blood pressure management regimens for individuals newly diagnosed with type 2 diabetes. Briefly, the UKPDS recruited 5102 participants, aged 25-65 years, between 1977 and 1991 and followed them until the trial concluded in 1997. 11 At the end of the trial, all 4031 surviving participants entered a 10-year PTM period (1997-2007) with no attempt to maintain their previously allocated trial regimen. 12 Clinical risk factor levels were collected for study participants until 2002, giving up to 24 years of longitudinal data. All participants provided written informed consent. Approval was obtained from the ethics committees at all 23 clinical centres, and the study conformed to the Declaration of Helsinki guidelines.
UKPDS-OM2 (version 2) was developed using individual participant data from the UKPDS main trial and the 10 years of PTM data. 8 It is a computer simulation model for forecasting the occurrence of major diabetes-related complications (myocardial infarction, ischaemic heart disease, stroke, congestive heart failure, amputation, blindness, diabetic foot ulcer and renal failure) and death in participants diagnosed with type 2 diabetes. In brief, UKPDS-OM2 is based on an integrated system of parametric equations that predict the annual probability of any of the above complications and uses Monte Carlo methods to predict the occurrence of events. The likelihood of these events is based on patient demographics, duration of diabetes, clinical risk factor levels (HbA1c, LDL-cholesterol and HDL-cholesterol values, systolic blood pressure, etc.) and history of diabetes-related complications. Different treatment and management strategies are evaluated through their impact on risk factor levels. Elements of UKPDS-OM1 and UKPDS-OM2 have been widely used in many other prediabetes and diabetes simulation models. 13-18

| Clinical risk factors
We used the 13 clinical risk factors found to be associated with the risk of complications of type 2 diabetes and death in UKPDS-OM2 (

| Statistical analysis
The estimation procedure involved (1) re-estimating three original UKPDS-OM1 equations for risk factor time paths (HbA1c, SBP and smoking status) over a longer follow-up period; (2) estimating new equations, not in UKPDS-OM1, for the remaining 10 risk factors (LDL-cholesterol, HDL-cholesterol, eGFR, BMI, heart rate, WBC, haemoglobin, PVD, micro-/macro-albuminuria and atrial fibrillation); (3) comparing the predicted time paths of the 13 risk factors in UKPDS-OM2 (from baseline data) with the observed time paths and (4) after integration of the new risk factor equations into UKPDS-OM2, comparing the predicted and observed incidence of diabetes-related complications and deaths. We followed the approach of UKPDS-OM1. 1 Linear and non-linear dynamic models estimated the time path of nine risk factors. Generally, these models included the value or status of the risk factor in the previous period, number of years since diagnosis (in log form for improved fit), first risk factor value collected post-randomisation (from the first year of follow-up), sex and ethnicity. We estimated models using risk factor data that excluded the initial treatment effect (from randomisation to the first year of follow-up). We also used multivariable parametric proportional hazards survival models to estimate the risk of developing PVD, micro- or macro-albuminuria (MIC ALB), atrial fibrillation (AT FIB) and eGFR below 60 ml/min/1.73 m² (eGFR <60). We also tested for significant differences in time paths of HbA1c, BMI, SBP and lipids for individuals allocated initially to one of the four UKPDS treatment groups: diet, metformin, sulfonylureas and insulin. To test for differences, we restricted the sample to the UKPDS trial period (excluding PTM), and where a significant difference between the treatment groups was identified, we re-estimated the model for that particular group within the UKPDS trial. See Online Appendix for more details ("Details on modelling and statistical approach").
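As a reading aid, the general structure described above can be written out as follows. This is a sketch in our own illustrative notation (the symbols and the exact covariate set are assumptions, not the fitted specification); the estimated form of each equation is the one reported in the Appendix tables.

```latex
% Dynamic model for a continuous risk factor y of participant i in period t
% (t = years since diagnosis; y_{i,1} = first value collected post-randomisation)
y_{i,t} = \beta_0 + \beta_1\, y_{i,t-1} + \beta_2 \ln(t) + \beta_3\, y_{i,1}
          + \beta_4\, \mathrm{female}_i + \boldsymbol{\beta}_5^{\top} \mathbf{e}_i
          + \varepsilon_{i,t}

% Proportional hazards model for a binary risk factor (e.g. PVD),
% with baseline hazard h_0(t) and covariate vector x_i
h_i(t) = h_0(t)\, \exp\!\left( \mathbf{x}_i^{\top} \boldsymbol{\gamma} \right)
```

Here \mathbf{e}_i stands for ethnicity indicators and \varepsilon_{i,t} for the error term; both are placeholders introduced for illustration.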
Finally, we captured parameter uncertainty by bootstrapping (with replacement) the UKPDS patient-level data and re-estimating all equations to derive sets of fully correlated regression coefficients. These sets of regression coefficients could be used to obtain a distribution of risk factor values from which 95% confidence intervals (CI) are derived. All analyses were carried out using Stata version 13.0 software (StataCorp LP).
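The bootstrap step can be illustrated with a minimal Python sketch. It is not the Stata code used for the analyses: the data are synthetic, the model is reduced to a single lagged term, and resampling whole participants (rather than individual observations) is our assumption about how the patient-level data would be resampled. The function names are hypothetical.

```python
import numpy as np

def fit_lag_model(y_prev, y_curr):
    """Ordinary least squares fit of y_t = a + b * y_{t-1}; returns [a, b]."""
    X = np.column_stack([np.ones_like(y_prev), y_prev])
    coef, *_ = np.linalg.lstsq(X, y_curr, rcond=None)
    return coef

def bootstrap_coefficients(panel, n_boot=200, seed=0):
    """Resample participants (rows) with replacement and re-fit the model.

    panel has shape (n_participants, n_periods). Each bootstrap replicate
    yields one jointly estimated coefficient set, so correlation between
    the coefficients is preserved across replicates.
    """
    rng = np.random.default_rng(seed)
    n = panel.shape[0]
    draws = np.empty((n_boot, 2))
    for b in range(n_boot):
        sample = panel[rng.integers(0, n, size=n)]
        # Pair each value with its own previous-period value.
        draws[b] = fit_lag_model(sample[:, :-1].ravel(), sample[:, 1:].ravel())
    return draws

# Synthetic panel standing in for patient-level data (values are made up).
rng = np.random.default_rng(1)
n_participants, n_periods = 300, 10
panel = np.empty((n_participants, n_periods))
panel[:, 0] = rng.normal(7.0, 1.0, n_participants)
for t in range(1, n_periods):
    panel[:, t] = 0.8 + 0.9 * panel[:, t - 1] + rng.normal(0.0, 0.3, n_participants)

draws = bootstrap_coefficients(panel)
# Percentile 95% interval for the intercept and the lag coefficient.
ci_low, ci_high = np.percentile(draws, [2.5, 97.5], axis=0)
```

Each row of `draws` plays the role of one set of correlated regression coefficients; running a simulation once per row would give a distribution of predicted risk factor values from which intervals can be read off.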

| Internal validation using the UKPDS data and UKPDS-OM2
For the internal validation of each of the continuous risk factors, we divided the value range of the first observed value (post-randomisation) into quintiles and compared the predicted values obtained using the initial values against the observed values for up to 20 years of follow-up. Although data were available for up to 24 years, these were quite sparse towards the end, and therefore, we imposed a pragmatic limit of 20 years. Smoking status for each participant in each 3-year period was modelled probabilistically from previous smoking status, using the logistic model and 2000 random draws from a uniform (0,1) distribution. The predictions were compared against the observed proportion of current smokers. The risk factor predictions from the parametric survival models were converted into binary outcomes using the same approach as in UKPDS-OM2 (unconditional probability of event occurring between t and t+1 compared with a random number) and the predicted cumulative failure was compared with the observed (Kaplan-Meier) cumulative failure.
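The conversion of survival model predictions into binary outcomes can be sketched as below. This is an illustration under stated assumptions: a Weibull cumulative hazard is assumed (the text specifies only "parametric proportional hazards"), the interval probability is taken as 1 - exp(H(t) - H(t+1)), and the coefficient values are invented. The same "compare a probability with a uniform random number" step would apply to the smoking status simulation.

```python
import numpy as np

def weibull_cum_hazard(t, linear_predictor, shape):
    """Assumed Weibull cumulative hazard H(t) = exp(xb) * t**shape."""
    return np.exp(linear_predictor) * np.power(t, shape)

def annual_event_probability(t, linear_predictor, shape):
    """Probability of the event between t and t+1, given none before t."""
    h_now = weibull_cum_hazard(t, linear_predictor, shape)
    h_next = weibull_cum_hazard(t + 1, linear_predictor, shape)
    return 1.0 - np.exp(h_now - h_next)

def simulate_first_event_year(linear_predictor, shape, horizon, rng):
    """Return the year of the first simulated event, or None if event-free."""
    for t in range(horizon):
        p = annual_event_probability(t, linear_predictor, shape)
        if rng.uniform() < p:  # event occurs when the draw falls below p
            return t + 1
    return None

def simulated_cumulative_failure(linear_predictor, shape, horizon, n_draws, seed=0):
    """Share of simulated replications with an event by the end of each year."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(horizon)
    for _ in range(n_draws):
        year = simulate_first_event_year(linear_predictor, shape, horizon, rng)
        if year is not None:
            counts[year - 1:] += 1
    return counts / n_draws

# Hypothetical linear predictor and shape parameter, for illustration only.
cf = simulated_cumulative_failure(linear_predictor=-4.0, shape=1.2,
                                  horizon=20, n_draws=2000, seed=42)
```

The resulting curve `cf` is the simulated counterpart of an observed cumulative failure curve; with enough draws it approaches 1 - exp(-H(t)) for the assumed hazard.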
We also assessed the internal validity of the risk factor models together with the 15 event equations from UKPDS-OM2 by using them simultaneously to (i) simulate the risk factor time paths of 4629 individuals in the UKPDS trial who had complete risk factor data at baseline and to (ii) produce UKPDS-OM2 predictions of event rates. As a sensitivity analysis, and to reflect common practice in diabetes modelling, we held all risk factors constant from baseline and repeated UKPDS-OM2 prediction of event rates. We compared simulated cumulative failure of each of the major outcomes of the model with the observed (Kaplan-Meier) cumulative failure of events up to 25 years. We judged the fit acceptable if the model predictions were within the 95% CI of the observed rate. To minimise Monte Carlo error, we performed 50,000 replications for each individual participant.
Table 2 shows the risk factor values for the 5102 UKPDS participants during the 24 years of measurement. Age at diagnosis of diabetes was 51.9 years (SD 8.8), and the proportion of women was 41.0%. The person-years of follow-up data available for analysis varied between 12,869 (heart rate) and 65,252 (BMI). The between-participant variance was higher than the within-participant variance for 11 of the 13 risk factors. For BMI, micro- and macro-albuminuria, smoking status and atrial fibrillation, the proportion of the total variance explained by variation between individuals was above 90%. Tables S1 and S2 in Appendix report the mean time paths for all risk factors over 20 years.
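The acceptance rule used in the internal validation (a prediction is acceptable if it lies within the 95% CI of the observed Kaplan-Meier rate) can be illustrated with a short sketch. The Greenwood variance and the plain symmetric interval are our assumptions about how such an interval might be computed; the function names and the toy data are hypothetical.

```python
import numpy as np

def km_cumulative_failure(time, event, at):
    """Kaplan-Meier cumulative failure at time `at` with a 95% CI.

    time:  follow-up time per participant
    event: True if the event was observed, False if censored
    Returns (cumulative_failure, ci_lower, ci_upper).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    survival, greenwood = 1.0, 0.0
    for t in np.unique(time[event]):          # sorted distinct event times
        if t > at:
            break
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        survival *= 1.0 - deaths / at_risk
        if at_risk > deaths:
            greenwood += deaths / (at_risk * (at_risk - deaths))
    se = survival * np.sqrt(greenwood)        # Greenwood standard error
    cf = 1.0 - survival
    return cf, max(cf - 1.96 * se, 0.0), min(cf + 1.96 * se, 1.0)

def fit_acceptable(predicted_cf, time, event, at):
    """True if the predicted cumulative failure lies inside the observed CI."""
    _, lower, upper = km_cumulative_failure(time, event, at)
    return lower <= predicted_cf <= upper

# Toy data: five participants, three observed events, two censored.
toy_time = [1, 2, 3, 4, 5]
toy_event = [True, False, True, False, True]
observed_cf, lower, upper = km_cumulative_failure(toy_time, toy_event, at=3)
```

With real data, `predicted_cf` would be the simulated cumulative failure at the same time point, and the check would be repeated for each event type and horizon.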

| Risk factor equations
The risk factor equations, including constants, coefficients (p < 0.05) and standard errors are reported in Tables S3 and S4 (Appendix). The previous (lag) values of all risk factors except haemoglobin were significantly associated with their current value (p < 0.05) and suggested convergence to the overall mean over time (absolute value <1), holding all else constant. Being female was associated with higher values of all continuous risk factors except for haemoglobin, holding all else constant. Ethnicity (Afro-Caribbean and/or Asian) was associated with all continuous risk factors except heart rate. For the binary risk factors, older age at diagnosis was associated with higher rates of PVD, atrial fibrillation and micro- or macro-albuminuria (hazard ratios [HRs] 1.06, 1.10 and 1.01, respectively) (see Tables S4 and S5 in Appendix). Higher BMI was also associated with higher rates of PVD, atrial fibrillation and micro- or macro-albuminuria (HRs 1.02, 1.08 and 1.18, respectively). Being a current smoker was associated with higher rates of developing PVD and micro- and macro-albuminuria (HRs 2.38 and 1.39) but not atrial fibrillation. Being female, older age at diagnosis, higher SBP and higher LDL-cholesterol were associated with a higher risk of progressing to an eGFR <60 ml/min/1.73 m² (HRs 2.18, 1.08, 1.09 and 1.01, respectively). Asian or African-Caribbean ethnicities were associated with a lower risk of an eGFR <60 (HRs 0.73 and 0.37, respectively). We also found significant differences between initial treatment allocation groups for HbA1c (diet and insulin), BMI (diet and insulin) and HDL-cholesterol (metformin) (see Table S6 in Appendix). Sets of fully correlated regression coefficients for all equations are available as a supplementary file (see correlated_regression_coefficients.xlsx). Finally, fully worked examples of how to use the equations to predict the risk factors are presented in the online Appendix (see 'Predicting risk factors').
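The convergence property noted above for lag coefficients with absolute value below 1 can be seen in a simplified first-order form. The derivation below is a sketch that drops all covariates other than the lag, with α and β as illustrative symbols; in the full equations the remaining terms (such as log years since diagnosis) shift the level towards which the path converges.

```latex
% Simplified equation with only a constant and a lag term
y_t = \alpha + \beta\, y_{t-1}, \qquad |\beta| < 1

% Fixed point (long-run level): set y_t = y_{t-1} = y^{*}
y^{*} = \frac{\alpha}{1 - \beta}

% Subtracting y^{*} from both sides gives a geometric decay of the gap
y_t - y^{*} = \beta \left( y_{t-1} - y^{*} \right)
\quad\Longrightarrow\quad
y_t = y^{*} + \beta^{\,t} \left( y_0 - y^{*} \right)
```

Because |β| < 1, the term β^t shrinks towards zero, so participants starting above or below the long-run level move towards it at a rate governed by β.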
The predicted values were estimated based on the initial observed risk factor value. The average time path for HbA1c increases markedly over time, whereas LDL-cholesterol, WBC and heart rate appear to remain fairly constant. Overall, the predictions were similar to the observed values for all risk factors. In Figures S1-S3 (Appendix), we report the comparison of predicted and observed risk factor values divided by quintiles of first observed value (Tables S7 and S8 in Appendix), which were consistent with the findings for the overall mean. Figure S4 (Appendix) reports the simulated and observed KM cumulative failure curves for the binary variables (micro- and macro-albuminuria, PVD and atrial fibrillation) and the simulated proportion of current smokers using the equations in Table S4 (Appendix) compared with the observed mean. Overall, the predictions were consistent with observed events.
Figure 2 shows the simulated and observed KM cumulative failure curves for each of the diabetes-related complications and all-cause mortality. The estimated number of individuals having each complication and mortality was based on UKPDS-OM2 predictions incorporating the risk factor values at baseline and the time-varying paths predicted by the risk factor equations in Tables S3 and S4 (Appendix). For every type of event, the predicted numbers were within the 95% CI surrounding the cumulative failure curves for the UKPDS population at 5, 10 and 15 years. For six out of eight events with data up to 25 years, the predicted numbers were also within the 95% CI surrounding the observed rates. After 20 years, the predicted cumulative failure for heart failure was slightly overestimated. At 25 years, the predicted cumulative failure for death was slightly underestimated.
TABLE 2 Risk factor values in the 24 years of UKPDS and PTM data

| Internal validation
As a sensitivity analysis, we also report the simulated and observed KM cumulative failure curves for each complication and death holding all risk factors fixed at their baseline values (Figure S5 in the Appendix). In this sensitivity analysis, the predicted cumulative failure for all event types was underestimated after 10 to 15 years of follow-up. Hence, the UKPDS-OM2 event predictions based on the new risk factor time path equations performed significantly better compared with holding all risk factors constant.

| DISCUSSION
The primary aim of this analysis was to estimate a set of type 2 diabetes risk factor progression equations that are clinically plausible and consistent with the observed UKPDS data (overall and when data are subdivided, e.g. by baseline risk factor quintiles). The equations have been estimated to become an integrated part of the UKPDS-OM2 and thus facilitate long-term predictions of diabetes-related complications and mortality that are consistent with participants treated in the UKPDS.
Health economic diabetes simulation models rely on predicting changes in metabolic risk factors, such as HbA1c, to capture treatment effects for multiple interventions. For example, a recent systematic review identified over 65 cost-effectiveness studies using type 2 diabetes simulation models that quantified the changes in HbA1c associated with blood glycaemia-related interventions in terms of quality-adjusted life-years (QALYs) or life expectancy. 19 Understanding risk-factor progression based on what might be considered "usual care" is critical to quantifying the gains from new interventions and therapies. In this regard, we would see the equations reported in this study as representing a way to obtain historical controls which can be used to evaluate the benefits of additional metabolic control that could be achieved by further intensifying treatment or using newer therapies. When these risk-factor equations are incorporated into type 2 diabetes computer simulation models, 7,10 the benefits of improvements in risk factor profiles can be translated into metrics such as QALYs, which are commonly used for economic evaluations.
Incorporating individual treatments would require developing further equations to simulate how these change over time or adopting algorithms for treatment pathways. Several other diabetes models have tackled this issue by explicitly including algorithms to capture important diabetes treatment pathways. 20-22 Instead, we explicitly excluded data on the initial effect of treatments in the UKPDS on risk factors when estimating our models and focused on simulating subsequent trajectories. The aim was to allow users to use their own data on the impact of treatments in the first year and then use our estimated models afterwards if needed. For users interested in historical differences between treatments, we report separate equations for treatment groups in the UKPDS trial that we found to have significant differences in risk trajectories after the first year of follow-up. It is important to note that the equations are relatively parsimonious, including primarily lagged values and some routinely recorded demographic factors. Increasing the number of covariates may increase the goodness of fit but would also increase the risk of overfitting and hinder the usability of the equations if other users lacked the full set of covariates required. Hence, we have tried to strike a balance and rely primarily on covariates such as sex, age at diagnosis, ethnicity and baseline values of the risk factors of interest, which are most likely to be available to other users.
FIGURE 2 Observed (black lines) and simulated* (red dashed lines) KM cumulative failure (CF) curves and 95% CI (grey lines) for diabetes complications and death over 30 years. *simulated using the predicted risk factor time paths from baseline values. Observed data for diabetic foot ulcers are only available for 12 years. Observed data for remaining events up to 25
Our work shows that, compared with the commonly used modelling assumption of holding risk factors constant over time, the predicted time paths using these equations produce notably different predictions of complications. The predictive performance of UKPDS-OM2 when using these risk factor equations was significantly improved relative to an assumption of holding risk factors constant and more accurately reflects the deterioration in metabolic control observed in the UKPDS cohort. More recent therapies or interventions may enable participants to improve the control of risk factors, and many recent economic evaluations involve comparisons of new therapies with the glucose-lowering medications that were included in the UKPDS. 23-26
Our analysis has a number of limitations. First, the data used to estimate the equations are increasingly historical and may in some cases reflect values or trajectories that are less commonly seen in contemporary populations. Validation and re-estimation of the equations in large but more contemporary data sets are needed. However, the historical nature of the data can be an advantage if the purpose of the economic evaluation is to assess new interventions relative to those previously used. Second, the sample sizes are not particularly large in comparison with many more recent trials and cohort studies, and some measures were only taken every 3 years. More generally, the trajectories described by these equations represent the outcome of a mixture of prevailing treatments over the duration of UKPDS. Finally, the equations will not capture the random variation in risk factor trajectories across individuals with the same characteristics and values at baseline. This places the onus on the analyst to incorporate this variation if using the trajectories to inform personalised care at individual level.
Set against these limitations, the strength of the UKPDS data continues to be the low levels of missing data and of participant attrition; the generally representative nature of the newly diagnosed patients with type 2 diabetes in the United Kingdom when recruited into the study 27 ; frequent and careful repeated measures of risk factors; long-term follow-up; and the careful ascertainment and formal adjudication of all relevant clinical events.
Looking to the future, larger and more contemporary data sets that also have these attributes may serve as platforms from which to estimate new risk factor equations and to externally validate or replace this set. Until then, these equations, apart from their intrinsic interest, give modellers and the wider research community a useful additional tool when trying to simulate the long-term effects of type 2 diabetes and its therapies.