Estimating the association between blood pressure variability and cardiovascular disease: An application using the ARIC Study

The association between visit‐to‐visit systolic blood pressure variability and cardiovascular events has recently received a lot of attention in the cardiovascular literature. But, blood pressure variability is usually estimated on a person‐by‐person basis and is therefore subject to considerable measurement error. We demonstrate that hazard ratios estimated using this approach are subject to bias due to regression dilution, and we propose alternative methods to reduce this bias: a two‐stage method and a joint model. For the two‐stage method, in stage one, repeated measurements are modelled using a mixed effects model with a random component on the residual standard deviation (SD). The mixed effects model is used to estimate the blood pressure SD for each individual, which, in stage two, is used as a covariate in a time‐to‐event model. For the joint model, the mixed effects submodel and time‐to‐event submodel are fitted simultaneously using shared random effects. We illustrate the methods using data from the Atherosclerosis Risk in Communities study.


INTRODUCTION
Systolic blood pressure (SBP) is universally recognised as an important risk factor for cardiovascular disease (CVD) and is routinely included in risk scores for CVD risk prediction. 1,2 The prognostic value of SBP is primarily based on the mean of measurements over multiple outpatient visits, whereas substantially less attention has been given to the variability of SBP across visits (ie, visit-to-visit SBP variability). However, there is increasing evidence of associations between greater visit-to-visit SBP variability and CVD outcomes in community-based and clinical settings. [3][4][5][6] Therefore, in addition to assessing mean clinic SBP levels over time, measurements of visit-to-visit SBP variability may improve the accuracy of CVD risk prediction, which is crucial for the optimisation of patient care.
The standard deviation (SD), coefficient of variation (CV), average real variability (ARV), and SBP variability independent of the mean across multiple visits have been widely used to quantify visit-to-visit SBP variability. [3][4][5][6][7] However, these measures are estimated on a person-by-person basis and are therefore subject to considerable measurement error. This measurement error causes regression dilution bias in the estimated association between visit-to-visit SBP variability and CVD. 8 Mixed effects models have been proposed, which allow the within-individual variability to differ between individuals. For example, Hedeker et al introduced the mixed location scale model, where the within-individual variances may be assumed to depend on an additional random effect, as well as time-constant and time-varying covariates. 9,10 More recently, Goldstein et al have explored the mixed location scale model using a Bayesian framework in the context of growth curve models. 11 We consider extensions of the mixed location scale model to estimate the association of SBP variability with CVD. Repeated SBP measurements are modelled using a mixed effects model with a random component on the within-individual SD, thus allowing the borrowing of information across individuals. The model allows for each individual to have different within-individual variation, which describes the variability in their SBP measurements. We propose two methods for estimating the association of this SBP variability with CVD. The first is a two-stage method, where, in stage one, we obtain estimates of the usual level of SBP and the SBP variability for each individual from the mixed effects model, and in stage two, these are included as covariates in a survival model for the time to the first CVD event. The second method is a joint model where the repeated SBP measurements and the time to first CVD event are modelled simultaneously using shared random effects. 12 There are advantages and disadvantages in the use of two-stage versus joint models. The heavy computational burden associated with simultaneous analysis of repeated measurements and time-to-event data makes the two-stage method a more feasible option for large datasets. The two-stage method could also be readily incorporated into a landmarking approach for dynamic risk prediction, 13 for example, in the dynamic model for CVD risk prediction recently proposed by Paige et al. 14 However, two-stage methods for the analysis of repeated measurements and time-to-event data have been shown to lead to bias in estimated covariate effects compared to joint models. 15 This bias is due to two factors: (1) regression dilution bias caused by residual measurement error in the estimates of random effects from the mixed effects model and (2) bias caused by informative truncation of the repeated measurements by the event of interest. In the two-stage approach, this second source of bias can be removed by separating longitudinal follow-up from survival follow-up if data allows, as we describe in Section 2.
The outline of this paper is as follows. In Section 2, we describe the methods, including the naive method that has predominantly been used in the CVD literature, and our proposed two-stage and joint model methods. In Section 3, we compare the methods using simulation studies and explore the accuracy of all methods in a variety of scenarios. We apply the two-stage and joint models to data from the Atherosclerosis Risk in Communities (ARIC) Study in Section 4, and we present a final discussion in Section 5.

METHODS
The association of SBP with CVD has previously been investigated using mixed effects models to analyse repeated measurements of SBP. [16][17][18] In this section, we first present the naive methods for estimating SBP variability that have been commonly used in the literature, and then, we explore how the mixed effects model can be extended to allow SBP variability to differ between individuals. We propose a two-stage method and a joint modelling method for estimating the association between SBP variability and CVD events. Although we focus here on the association between SBP variability and CVD, the methods and results presented would be relevant for any application where the association between the variability in a longitudinal outcome and a time-to-event is of interest. We consider estimating the association of the variability of a longitudinal outcome with a time-to-event, adjusted for the "usual level" of the longitudinal outcome. This "usual level" could be the average value, which may be assumed constant in time, or the current value or baseline value if time-dependence is to be taken into account.

Notation
Given a dataset with N individuals and n i SBP measurements from the ith individual, i = 1, … , N, let Y i j , j = 1, … , n i , be the jth measurement of individual i taken at measurement time t ij . Each individual is followed up from baseline until time where T i is the time of the event for individual i and C i is the censoring time.

Estimating longitudinal variability 2.2.1 Naive method
In the cardiovascular literature, the usual level of SBP is typically estimated as the mean of an individual's repeated measurementsB n i and the SBP variability as their SDB The repeated measurements may have been taken prior to follow-up of the time-to-event, or concurrent with the time-to-event follow-up, in which case the repeated measurement process is terminated by an event. Other measures of visit-to-visit SBP variability have been proposed, such as the CV (defined as the SD/mean), the maximum and minimum difference, and the ARV (defined as the average change between successive visits), 7 but all are usually estimated using within-individual information only.

Mixed effects models
An alternative method to estimate the usual level and variability for each individual is to use a mixed effects model for the repeated measurements from all individuals, allowing information to be borrowed across individuals. The mixed effects model also allows longitudinal trajectories to be modelled by including time-dependent terms in the model. Consider a standard linear mixed effects model where X i j is a covariate vector for the fixed effects and Z i j is a covariate vector for the random effects b i , assumed normally The residual errors i j are assumed independent and normally distributed, i j ∼ N(0, 2 ). We can allow variability in the repeated measurements to differ between individuals by replacing the residual SD with an individual-specific residual SD i and assuming that the i are randomly distributed. Note that, in this model, we do not distinguish between true variability in an individual's repeated measurements and measurement error. We assume a log-normal distribution for the residual SD distribution, ensuring positivity of the SDs, i ∼ logN( , 2 ). The choice of log-normal distribution also allows a natural extension of the model to incorporate correlation between the usual level and the residual SD by assuming a multivariate normal distribution for the random effects and log residual SD where Σ b is a vector of covariances between the random effects and the random residual errors. Alternative distributions have been proposed for the residual SD, such as the half-Cauchy distribution. 19 (See also the work of Hedeker et al 9 and recent work of Goldstein et al, 11 where a log-normal distribution was assumed for the residual variances 2 i .) Specific examples of the model specified by Equations (1) and (2) that we will consider are a random-intercept model where the random intercept b 0i can be interpreted as the usual level of the repeated measurements, and adjustment is made for baseline covariates X i . We consider a model (LMM1) that does not account for the correlation between b 0 i and i LMM1: and a model (LMM2) that does account for the correlation LMM2: We also consider a random intercept and slope model where the random intercept b 0i is now the baseline value (minus the population average), and i now measures the variability around the individual's average trajectory. Again, we can either ignore or allow for correlations between the random effects and the residual SD's LMM3:

Estimating the association between SBP variability and CVD
We consider two approaches for estimating the association between SBP variability and the time to the first CVD event, ie, a two-stage approach and a joint model.

Two-stage approach
We fit the models in two stages. In stage one, we either (i) estimate the usual level and SD of the repeated measurements for each individual using the naive method or (ii) fit a mixed effects model and estimate the usual level, residual SD, and other random effects for each individual, as described in Section 2.2. In the second stage, we use the estimated usual level and residual SD as covariates in a standard proportional hazards Cox regression for the time to the first event where U i is a vector containing the estimated usual level, residual SD and other random effects for each individual, is a vector of parameters describing the association between the mixed effects model and the time-to-event model and W i is a vector of baseline covariates.
We estimate mixed effects model parameters using Bayesian Markov chain Monte Carlo (MCMC) implemented using JAGS Version 3.4.0 20 and the R2jags package in R. 21 We assume diffuse priors for all model parameters. The usual level The SBP measurement process is truncated informatively by the event of interest, as illustrated in Figure 1A. In order to avoid this informative truncation, we can divide all follow-up into follow-up of the repeated measurements and follow-up of the survival outcome, separated at separation time t sep , as shown in Figure 1B. Only repeated measurements taken before t sep are used in estimation of the mixed effects model, and only events taking place after the separation time are used in estimation of the time-to-event model conditional on surviving until t sep . Separating the longitudinal follow-up from the survival follow-up in this way involves discarding information from repeated measurements following t sep and individuals experiencing events before t sep . In practice, t sep should be chosen to minimise this information loss.

Joint model
In the joint model, we use a shared parameter approach to link the mixed effects and time-to-event models. The mixed effect submodel is defined by Equations (1) and (2). The time-to-event submodel is defined by where b i are the random effects shared with the mixed effects submodel. We assume a piecewise constant baseline hazard; we choose cut-points t k , 0 ≤ k ≤ K, at the K-quantiles of the observed event times and assume that the baseline hazard is constant in the time intervals between cut-points, The piecewise-constant assumption ensures that the hazard function can be integrated analytically, thus avoiding the need for numerical integration when evaluating the likelihood function. In practice, the number of cut-points can be chosen by assessing model fit, for example, using the deviance information criterion. 22 We again estimate model parameters using Bayesian MCMC implemented using JAGS Version 3.4.0 20 and the R2jags package in R 21 and assuming diffuse priors for all model parameters.

Regression dilution bias in association parameters
In the two-stage approach, because SBP variability is estimated with error, we expect bias in the estimated association with CVD due to regression dilution. We aim to reduce regression dilution bias by using a mixed effects model that allows information to be borrowed across individuals. The joint model accounts for this measurement error by including the underlying random effects in the time-to-event model, rather than estimates of those random effects. However, misspecification of the mixed or joint models could introduce other sources of bias. In Section 1 of the Supplementary Materials, we show for the simplified case of linear regression that ignoring the correlation between the random effects and the residual SD could also lead to bias in the estimated effects of both the usual level and the variability on the outcome.

SIMULATION STUDIES
We use simulation studies to explore the bias in parameter estimates due to imprecise estimation of SBP variability and due to ignoring correlation between SBP and SBP variability in the mixed effects model. We consider two scenarios. In Scenario 1, each individual has the same number of repeated measurements, thus bias in the two-stage approach arises solely from measurement error in the estimates of the usual level and variability of SBP (cf Figure 1B). In Scenario 2, the repeated measurement process is truncated by the event times, so that bias in the two-stage approach arises from both measurement error and informative truncation of the repeated measurements (cf Figure 1A). We generated simulated datasets to explore the effect of the number of measurements per individual, the value of the association parameters and the extent of the correlation between the usual level and the variability.
We take a simple model for the repeated measurements with a random intercept only and random residual error SD where the residual error i ∼ N(0, 2 i ), and we take a bivariate normal distribution for b 0i and log , Here, b 0i is the usual level and i represents the variability in the longitudinal outcome. Based on results from the ARIC study, we take 0 = 120, 0 = 15, = 2, and = 0.5. Event times are drawn from a Weibull distribution with shape parameter k and scale parameter . A log-linear effect of the covariates on gives a proportional hazards model We introduce administrative censoring at 20 years. To maintain an event rate of 20% before censoring, we take 0 = −10.26 and k = 2 with default association parameters 0 = 0.02 and = 0.05, similar to those found for the ARIC dataset. For Scenario 1, we used all repeated measurements for all individuals. For Scenario 2, we assumed repeated measurements were taken equidistantly between baseline and 18 years follow-up and discarded all repeated measurements following event times. In each simulation setup, we generated 1000 datasets, each consisting of 1500 individuals.
We analysed each generated dataset using the two-stage method from Section 2.3.1, with (i) naive and (ii) mixed model estimates of SBP usual levels and variabilities, and the joint models from Section 2.3.2. For the mixed effects and joint models, we fitted models ignoring the correlation between SBP and SBP variability (mixed effects model LMM1 and joint model JM1 used Equations (3) and (4)), and allowing for the correlation (mixed effects model LMM2 and joint model JM2 used Equations (3) and (5)). The joint models were fitted using 15 time intervals for the baseline hazard. Model convergence was checked using the Gelman-Rubin statistic as modified by Brooks and Gelman. 23 We also fitted the survival models using the true values of b 0i and i as covariates, representing the best achievable results from the time-to-event model when all covariates are known precisely. For all scenarios, we calculated the mean and SD of the estimated log hazard ratios and the root mean squared errors (RMSEs) and coverage probabilities at the 95% level. Table 1 shows results for n = 4, n = 7, and n = 10 measurements per individual for Scenario 1. With 4 measurements per individual, using the naive method leads to a slight bias in the log hazard ratio (logHR) for the usual level, but substantial negative bias and associated loss of coverage in the logHR for variability, which is most likely caused by   regression dilution due to imprecise measurement of longitudinal variability. Using the mixed effects model LMM1 leads to a positive bias in both the usual level logHR and the variability logHR, as we would expect from the arguments given in Section 1 of the Supplementary Materials, because correlation between these quantities was not accounted for in their estimation. Using method LMM2, which accounts for the correlation, results in minimal bias, even with only 4 measurements per individual. For both two-stage models, the coverage is a little below the nominal level, but the RMSE is larger for the variability logHR in LMM2 due to greater variation in effect estimates. Results for the joint models are similar to the two-stage models in terms of the bias, but coverage is closer to the nominal levels. Results for n = 7 and n = 10 are similar, with the naive model giving reduced but still substantial bias in the variability logHR with a greater number of measurements. Table 2 shows results for n = 4, n = 7, and n = 10 measurements per individual for Scenario 2. The two-stage model LMM2 now also gives negative bias in estimating the variability logHR due to informative truncation of the repeated measurements by the CVD event. The joint model JM2, which accounts for this informative truncation, gives consistent results. Our results suggest that the bias incurred in the two-stage approach by informative truncation is more considerable here than the bias incurred by error in the estimates of SBP variability from the linear mixed models (LMMs).
In Section 2 of the Supplementary Materials, we present further results investigating performance of the models for different levels of association between the longitudinal trajectories and the CVD event (Supplementary Table 1) and for different levels of correlation between the usual level and the variability (Supplementary Table 2). All results are given for datasets with n = 4 measurements. In brief, all methods performed well with no association between times-to-events and longitudinal variability. But, when the associations between the time-to-event and the usual level and variability of the longitudinal trajectories was substantially stronger, all the methods struggled to give consistent results with only 4 repeated measurements. The joint models performed the best in this case with the least bias and highest coverage (Supplementary Table 1). When there is negative correlation between usual levels and variabilities, the direction of bias is reversed. Bias in the usual level logHR using methods LMM1 and JM1 increases with increasing . But, for the variability logHR, the pattern is less clear because of the interplay between regression dilution bias and the bias incurred by ignoring correlation in the mixed effects model (Supplementary Table 2).  In summary, the method that overall resulted in the least bias was JM2. For parameter values similar to those observed in the data example, method LMM2 performed well. But, for all methods, biases were observed with strong association parameters with n = 4 measurements per individual, so more measurements per individual would be required in this scenario. The interplay between biases due to multiple causes is complex and depends on the correlation parameter . The RMSE, however, is generally slightly higher for LMM2 than for LMM1 and for JM2 than for JM1, suggesting a trade-off between the bias caused by ignoring the correlation for models LMM1 and JM1 and the higher variance induced by the increased complexity of models LMM2 and JM2.

EXAMPLE: ARIC STUDY
We illustrate our methods using data from the ARIC study. 24 Briefly, 15 792 mostly black and white adults aged 45-64 years were enrolled into the ARIC study between 1987 and 1989 via probability sampling from 4 US communities: Washington County, Maryland; Forsyth County, North Carolina; Minneapolis, Minnesota, suburbs; and Jackson, Mississippi. Participants underwent five examinations during 25 years of follow-up (ie, Visit 1, 2, 3, 4, and 5 examinations), with an annual contact by telephone. In the current analysis, we used SBP measurements from Visit 1 (1987Visit 1 ( -1989 through Visit 4 (1996Visit 4 ( -1998. We analysed both the full ARIC dataset and a reduced dataset where longitudinal follow-up was truncated at the last SBP measurement taken at Visit 4 (at t sep = 11.9 years from baseline) and the time origin for the survival follow-up was taken at the same time-point ( Figure 1B). For the reduced dataset, to have enough measurements to estimate the SD of repeated measurements using the naive method, we restricted to individuals with three or four nonmissing measurements recorded and also to individuals who did not experience an event before t sep . In total, our full dataset consisted of 13 161 individuals and our reduced dataset consisted of 10 019 individuals. Supplementary Table 3 shows the baseline characteristics of the reduced and full datasets.
We analysed both the full and reduced datasets using two-stage and joint models. For the two-stage approaches, we used the repeated measurements of SBP to estimate the usual SBP level and the SBP variability using the naive method and the linear mixed effects models without and with accounting for the correlation between SBP and SBP variability (we again label these models LMM1 and LMM2, respectively). We also fitted joint models both without and with correlation between SBP and SBP variability (again JM1 and JM2, respectively). We applied the mixed effects models/submodels in four ways.
1. With a random intercept only, not adjusting for baseline CVD risk factors. 2. Including a slope term in the model as both a fixed and random effect. The covariance matrix between the random effects and the residual log-standard deviation was taken to be unstructured, allowing for correlations between the random slopes and the random intercept and between the random slopes and the SBP variability. 3. Adjusting the random intercept mixed effects model for baseline CVD risk factors (age, sex, diabetes status, smoking status, baseline total cholesterol, and baseline HDL cholesterol). 4. Adjusting the random intercept and slope mixed effects model for baseline CVD risk factors.
Time-to-event models were adjusted for age, diabetes status, smoking status, baseline total cholesterol, baseline HDL cholesterol, and sex.
For the Bayesian estimation, we used diffuse uniform prior distributions U[0, 100] for SDs, uniform U[ −1, 1] prior distributions for correlation parameters, and diffuse normal prior distributions N(0, 100 2 ) for all other parameters. Priors were specified for the bivariate and trivariate normal distributions by expressing them as two and three conditional univariate normal distributions, respectively. We used a burn-in of 1000 MCMC updates for the mixed effects models and 2000 MCMC updates for the joint models. We calculated results from 1000 sampled updates and checked convergence using the Gelman-Rubin statistic as modified by Brooks and Gelman. 23 For the joint models, there was a high degree of autocorrelation for the usual SBP and SBP variability hazard ratios, so MCMC chains were thinned by 4 updates. Figure 2 shows histograms of the estimated SBP usual level and SBP variability using the naive method. Also plotted are fitted normal and log-normal distributions, respectively. The plots suggest that the assumption of a normal distribution for the random intercepts and a log-normal distribution for the residual SDs is appropriate. The naive usual level estimates are plotted against the naive variability estimates in Figure 3. There is a positive correlation with those with higher SBP tending to have greater SBP variability. The correlation between the naive estimates is 0.42, but the true correlation without measurement error is likely to be higher. Estimates of usual levels and variabilities from the mixed effects models are compared with the naive estimates in Figure 4. The plotted lines are the lines of agreement between the two methods. Note that there is more difference between the methods for SBP variability estimates than for usual level estimates, with lower SBP variabilities being underestimated by the naive method and higher variabilities being overestimated.
Results from all models applied to the reduced ARIC dataset are shown in Table 3. As expected, the joint models took considerably longer to run than the two-stage models with runtimes of between 17 and 20 hours compared to between 4 and 10 minutes, respectively. All methods find an increased risk of CVD events with higher SBP variability. The methods which include random slopes find no evidence for an association between SBP gradient and CVD risk after adjusting for the usual level of SBP and SBP variability. The estimated association of SBP variability with CVD events is smaller for the naive model, suggesting regression dilution bias in this parameter estimate. Method LMM1, which does not account for the correlation between the SBP usual level and variability, gives a higher estimate of the association with the usual level of SBP than method LMM2, which does account for the correlation, as would be expected from the arguments given in Section 1 of the Supplementary Materials. The estimate of the association with SBP variability is generally lower for LMM2 than LMM1, suggesting a slight bias in the LMM1 estimates. The standard errors are, however, considerably larger for LMM2 than for LMM1. There is little difference between the results from the joint models and the results from the two-stage LMMs. Adjusting for other CVD risk factors in the mixed effects model makes little difference to the estimated association with SBP usual level and variability.
Results from all models applied to the full ARIC dataset are shown in Table 4. For the models without random slopes, the logHRs for the SBP usual level and SBP variability are similar to those from the truncated dataset. Standard errors are slightly smaller reflecting the increased sample size. But, for the models with random slopes, the logHRs for the effect of slope on the hazard of CVD are very different for the full dataset, with more positive slopes now leading to a lower risk of a CVD event when adjusted for the SBP usual level and SBP variability. This is counterintuitive to what we would expect; as higher SBP is associated with greater risk of CVD, we would expect those with increasing SBP over time to also have a higher risk.

DISCUSSION
We have proposed two-stage and joint modelling methods to estimate the association between visit-to-visit SBP variability and CVD, both of which use extensions of standard mixed effects models to allow different individuals to have different SBP variabilities. Both methods reduce regression dilution bias in the estimated hazard ratio compared to naive methods that have previously been used where SBP variability is estimated on an individual-by-individual basis. In addition, the joint modelling methods allow for informative truncation of the repeated measurement process by the event of interest. In practice, the method of choice will depend on the likely extent of the biases incurred by the two-stage approach vs the computational tractability of the joint modelling approach.   (1)  An important assumption of the two-stage method is that truncation of the observation process is noninformative; otherwise, the number of observations may depend on the underlying risk level, which may lead to bias. Informative truncation can be avoided in the two-stage approach if distinct periods of follow-up for longitudinal and survival data can be defined, but this may only be possible at the cost of discarding information from individuals with events during longitudinal follow-up and repeated measurements taken during survival follow-up. In the clinical literature, analyses have been conducted both using survival follow-up recorded subsequently to the repeated observations of SBP and using survival follow-up recorded concurrently with SBP measurements. As an example of the former, Rothwell et al explored the use of 7 compared with 10 SBP measurements but commented that, when 10 SBP measurements were used, survival follow-up was shorter. 5 Another important assumption of our method is that the measurement schedule is noninformative, which will not be satisfied in some studies. Some clinical studies investigating the association between SBP variability and CVD have used data, which has not been collected for research purposes, such as data from electronic health records. For example, Gosmanova et al used data from around 3 million US veterans receiving healthcare from the US Veterans Health Administration 25 and Hippisley-Cox et al used SBP variability as a risk predictor in the most recent QRISK tool for predicting CVD based on around 8 million individuals attending general practices in the UK. 2 In such cases, additional bias may be incurred by informative observation of the repeated measurements, where the act of measurement is more (or less) likely in individuals at greater risk of disease. In these cases, it may be necessary to extend the joint model to include an additional submodel for the recurrent event process of an SBP measurement being taken. However, for cohort studies with a pre-specified measurement schedule, such as the ARIC study analysed in this paper, such additional model complexity is unlikely to be warranted. We therefore leave this investigation for future research.
Results from our simulation studies suggest that the best model giving the least bias and coverage closest to nominal levels is the joint model that allows for correlation between the SBP usual level and the SBP variability. The naive method gives considerable bias in the association between SBP variability and CVD, as expected. Results previously reported in the CVD literature using the naive method could be subject to considerable bias in both the estimated association and its standard error. Our simulation results also indicate that there may be a complex interplay between different sources of bias. While the use of mixed effects models and joint models can reduce regression dilution bias, model misspecification, such as not accounting for correlation between these variables, may introduce additional sources of bias.
Our analyses of the ARIC data found evidence of a positive association between higher SBP variability and the risk of CVD events. For this dataset, results between the two-stage LMM approach and the joint modelling approach were similar, suggesting that the biases incurred by use of the two-stage approach were minimal in this example. However, use of the joint model comes at a cost of considerably longer computational time. This might be reduced by running multiple MCMC chains in parallel. If computational time is an issue, then a possible modelling strategy would be to use the two-stage method for model selection and the joint model for inference. If there was also clinical interest in identifying explanatory variables associated with the longitudinal variability, then the mixed effects models could be extended to incorporate a linear predictor in the mean of the log i distribution, ,i = T X i . Similar models have previously been considered by Hedeker et al 9,10 and Goldstein et al. 11