Construction of a risk assessment model of cardiovascular disease in a rural Chinese hypertensive population based on lasso‐Cox analysis

Abstract Many assessment tools have been used to predict cardiovascular risk in the general population, but their applicability to patients with hypertension needs further evaluation. In the current study, a cardiovascular risk assessment model was constructed in a hypertensive population. This prospective cohort study conducted baseline cardiovascular examinations in rural northeast China in 2012 and 2013, and collected cardiovascular events at follow‐up in 2015 and 2018. Data were derived from 4763 hypertensive patients who were free of cardiovascular disease (CVD) at baseline and completed follow‐up. After lasso regression was used to screen baseline risk factors for CVD, a multivariate Cox regression risk model was established and a nomogram was developed. The model was validated using an independent test set (one third of the data, not used for model building). Among the 4763 patients, 354 (7.43%) had a cardiovascular event during a median follow‐up of 4.66 years. Nine risk factors were selected by lasso regression: sex, age, current smoking, body mass index (BMI), history of transient ischemic attack (TIA), family history of hypertension, family history of stroke, physical labor intensity, and high low‐density lipoprotein cholesterol (LDL‐C). The c‐index of the CVD model was 0.707, and that of an updated model with baseline blood pressure was 0.732. In the validation cohort the respective c‐indexes were 0.665 and 0.714. A CVD risk assessment model was established in a hypertensive population; it may provide a new prevention strategy for hypertensive populations in rural China and help to reduce the CVD burden.

in China. 4 The high morbidity and mortality of CVD have imposed a heavy economic burden on China. Therefore, CVD prevention and control in rural areas of China is highly desirable.
Hypertension has been recognized as an independent risk factor for cardiovascular events. 5 The risk of cardiovascular events in hypertensive patients is 2-5 times higher than that in normotensive patients. 6 More than half of CVD cases in China are related to hypertension. 7 Hypertension is a major cause of CVD and an important cause of premature death in China. 2 The latest national data show that the Chinese hypertensive population has exceeded 300 million. 8 The prevalence of hypertension in rural areas of northeast China is as high as 50%. 9 Therefore, increased prevention and control of cardiovascular events in Chinese rural hypertensive populations is required, and overall cardiovascular risk assessment and risk stratification are important strategies for the prevention and treatment of CVDs.

Study population
In the present analysis, participants with normal blood pressure (n = 4976) or with CVD (n = 610) were excluded, leaving 4763 participants with hypertension who were free of CVD at baseline. The data were randomly divided into a training set (n = 3176, two thirds of the data) for model construction and an independent test set (n = 1587, one third of the data) for model validation. Figure S1 shows the patient sample size and exclusion criteria. The Ethics Committee of China Medical University (Shenyang, China) approved the research protocol, and written informed consent was obtained from all participants.
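The random two-thirds/one-third division can be illustrated with a short Python sketch. This is not the authors' code: the function name, the seed, and the choice to round the training-set size up are assumptions made only so that the example reproduces the reported set sizes.

```python
import math
import random


def split_two_thirds(ids, seed=2012):
    """Randomly split ids into a training set (about 2/3) and a test set (the rest).

    The seed and the round-up rule are illustrative assumptions,
    not details reported in the study.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)        # reproducible shuffle
    n_train = math.ceil(len(shuffled) * 2 / 3)   # assumption: round up
    return shuffled[:n_train], shuffled[n_train:]


# 4763 hypothetical participant identifiers
train_ids, test_ids = split_two_thirds(range(4763))
print(len(train_ids), len(test_ids))
```

Fixing the seed keeps the split reproducible, which matters because the test set must never be used during model building.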

Study variables
Physical examinations and detailed methodology on the data collection process have been described elsewhere. 9

Definitions
Hypertension was defined as systolic BP (SBP) ≥ 140 mm Hg and/or diastolic BP (DBP) ≥ 90 mm Hg and/or the use of antihypertensive drugs within the previous 2 weeks. 11 Diabetes mellitus was defined as a fasting glucose level ≥ 7.0 mmol/L and/or a self-reported diagnosis previously made by a physician. 12 Physical labor intensity was defined as occupational physical activity intensity and categorized into three groups (light, moderate, heavy), as described elsewhere. 9 Current smoking was defined as self-reported smoking of at least one cigarette per day. 13
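The thresholds in these definitions can be written as simple predicates. The Python sketch below is illustrative only; the function and parameter names are hypothetical and are not taken from the study's data dictionary.

```python
def has_hypertension(sbp_mmhg, dbp_mmhg, on_bp_drugs_last_2_weeks):
    """SBP >= 140 mm Hg and/or DBP >= 90 mm Hg and/or recent antihypertensive use."""
    return bool(sbp_mmhg >= 140 or dbp_mmhg >= 90 or on_bp_drugs_last_2_weeks)


def has_diabetes(fasting_glucose_mmol_l, physician_diagnosed):
    """Fasting glucose >= 7.0 mmol/L and/or a prior physician diagnosis."""
    return bool(fasting_glucose_mmol_l >= 7.0 or physician_diagnosed)


def is_current_smoker(cigarettes_per_day):
    """Self-reported smoking of at least one cigarette per day."""
    return cigarettes_per_day >= 1


# A treated patient with controlled readings still counts as hypertensive.
print(has_hypertension(128, 82, on_bp_drugs_last_2_weeks=True))
```

Because the criteria are joined by "and/or", any single criterion is sufficient, which is why treated patients with normal readings remain in the hypertensive cohort.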

Adjudication of outcomes
The primary outcome in the current study was incident CVD. The median follow-up period was 4.66 years. CVD was defined as fatal or nonfatal coronary heart disease (CHD) or stroke. 10 Adjudication of the occurrence of CHD and stroke has been described previously. 10

Statistical analysis
Continuous variables that conformed to a normal distribution were summarized as mean and standard deviation and compared using Student's t-test or one-way analysis of variance.
Categorical variables were summarized as the proportion in each category and compared using the chi-square test or Fisher's exact test.
Candidate variables were screened via lasso regression to reduce dimensionality, optimize the model, and prevent over-fitting. The selected variables were then entered into a multivariate Cox regression model. The glmnet package of R was used for variable selection with the lasso method, and the rms package of R was used to draw the nomogram and perform internal validation (c-index and calibration plot). ROC curves were drawn with the survivalROC package.
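To make the c-index concrete, the following pure-Python sketch computes Harrell's concordance for right-censored data on a toy example. It is a simplified illustration, not the rms implementation: pairs with tied follow-up times are skipped, and the data and function name are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of usable pairs in which the subject who failed earlier
    was given the higher predicted risk (tied risks count as 0.5).

    A pair (i, j) is usable when subject i had an observed event and
    subject j was followed for longer than subject i.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a pair
        for j in range(n):
            if times[j] > times[i]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data: follow-up years, event indicator (1 = CVD event), predicted risk
toy_times = [2, 4, 5, 7]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.6, 0.1]
c_toy = harrell_c_index(toy_times, toy_events, toy_risk)
print(round(c_toy, 3))  # 4 of 5 usable pairs are concordant -> 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the c-indexes reported for the two models.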
Comparisons of the performance among the nomograms were conducted by calculating the category-free net reclassification improvement (NRI) 16 and integrated discrimination improvement (IDI). The Z test was used to calculate p values of IDI. Cox proportional hazards regression modeling was conducted using the survival package.
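The IDI can be illustrated with a minimal Python sketch. It assumes a binary outcome with no censoring, which is a simplification of the 2-year and 4-year survival setting used in the study; the data are hypothetical and the function name is an assumption.

```python
def integrated_discrimination_improvement(events, p_old, p_new):
    """IDI = (mean rise in predicted risk among events)
           - (mean rise in predicted risk among non-events).

    Binary-outcome simplification: censoring is ignored.
    """
    def mean(values):
        return sum(values) / len(values)

    rise_events = mean([n - o for e, o, n in zip(events, p_old, p_new) if e])
    rise_nonevents = mean([n - o for e, o, n in zip(events, p_old, p_new) if not e])
    return rise_events - rise_nonevents


# Hypothetical predicted risks from an old and an updated model
idi_events = [1, 1, 1, 0, 0, 0, 0]
idi_p_old = [0.10, 0.20, 0.30, 0.10, 0.20, 0.05, 0.15]
idi_p_new = [0.15, 0.25, 0.25, 0.05, 0.25, 0.04, 0.10]
idi_toy = integrated_discrimination_improvement(idi_events, idi_p_old, idi_p_new)
print(round(idi_toy, 4))
```

An IDI greater than 0 means the updated model separates the average predicted risk of events and non-events more widely than the old model.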
The main statistical analysis software used in this study was R version 3.5.1, and two-tailed p values < .05 were considered statistically significant.

Baseline characteristics
Baseline characteristics of the participants in the development cohort and the validation cohort are shown in between the two cohorts.

Variable selection
Lasso-penalized Cox analysis was performed in the development cohort to narrow the set of candidate independent variables. Because all participants were hypertensive, BP level was not entered into the lasso screening. Instead, BP levels were incorporated into an updated model to highlight the importance of blood pressure control for CVD outcomes. Different BP levels correspond to different levels of CVD risk, 17,18 so baseline SBP and DBP levels were used for the subsequent model update. The process of independent variable selection via lasso regression is shown in Figure S2. Nine variables were selected: sex, age, current smoking, BMI, history of TIA, family history of hypertension, family history of stroke, physical labor intensity, and high LDL-C. In multivariate Cox regression analysis, these nine variables were risk factors for CVD in patients with hypertension in the development cohort (Table S1).
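How a lasso penalty removes candidate variables can be illustrated with the soft-thresholding operator, the shrinkage step used inside coordinate-descent lasso solvers. The Python sketch below is a conceptual illustration, not a lasso-Cox fit: the coefficient values, the penalty, and the variable names are hypothetical.

```python
import math


def soft_threshold(z, lam):
    """Shrink z toward zero by lam and set it to exactly zero when |z| <= lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


# Hypothetical unpenalized coefficients (log hazard ratios), for illustration only
raw_coefs = {"age": 0.45, "bmi": -0.12, "unrelated_marker": 0.03}
penalty = 0.05  # hypothetical value of the lasso penalty

shrunk = {name: soft_threshold(beta, penalty) for name, beta in raw_coefs.items()}
selected = [name for name, beta in shrunk.items() if beta != 0.0]
print(selected)
```

Coefficients smaller in magnitude than the penalty become exactly zero, so the corresponding variables drop out; larger coefficients survive in shrunken form.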

Constructing and validating a predictive nomogram (model A)
A nomogram was constructed to predict 2-year and 4-year CVD incidence in 3176 participants with hypertension using the above-

Updating and reevaluation of the model (model B)
In univariate Cox regression mean SBP and mean DBP were risk factors for CVD, with respective hazard ratios of 1.024 and 1.019 (p < .001; Table S2). Model A was updated with SBP and DBP grading levels. The multivariate Cox regression model after the incorporation of updated variables is shown in Table S3. The updated predictive nomogram is shown in Figure 3, and the calibration plots are shown in  Figure S4C, S4D). The calibration and c-index of model B were better than those of model A.
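The univariate hazard ratios look small because they describe a single unit of blood pressure. Assuming they are expressed per 1 mm Hg and that the effect is log-linear (both are assumptions, since the unit is not stated here), the Python sketch below rescales them to a 10 mm Hg difference; the function name is hypothetical.

```python
import math


def scale_hazard_ratio(hr_per_unit, units):
    """Hazard ratio for a difference of `units`, assuming a log-linear effect."""
    return math.exp(math.log(hr_per_unit) * units)


# Assumption: the reported HRs (1.024 for SBP, 1.019 for DBP) are per 1 mm Hg
hr_sbp_10 = scale_hazard_ratio(1.024, 10)
hr_dbp_10 = scale_hazard_ratio(1.019, 10)
print(round(hr_sbp_10, 2), round(hr_dbp_10, 2))
```

Under these assumptions a 10 mm Hg higher SBP corresponds to roughly a one-quarter higher hazard, which is easier to interpret clinically than the per-unit figure.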

Indicators for comparison between the original model A and the updated model B: NRI and IDI
NRI was defined from the proportions of participants with events who were correctly assigned a higher probability (NRI+) and of participants without events who were correctly assigned a lower probability (NRI-) by the updated model compared with the old model. 19,20 The total NRI was statistically significant for both the 2-year and the 4-year cumulative incidence of CVD (Table 2), which showed that model B was better than model A. NRI- was statistically significant, which indicated that the prediction accuracy of model B for individuals without events was better than that of model A. In addition, the reclassification results of the updated model compared with the primary model indicated that the IDI in the development cohort and in the validation cohort were both greater than 0 for 2-year and 4-year cumulative CVD incidence (all with p < .05). This indicated that reclassification by the updated model after the addition of SBP and DBP was more accurate and that the model performed better.
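The category-free NRI can be sketched in a few lines of Python. The example assumes a binary outcome without censoring, which simplifies the survival setting of the study; the data and the function name are hypothetical.

```python
def category_free_nri(events, p_old, p_new):
    """Return (NRI+, NRI-, total NRI) for an updated versus an old model.

    NRI+ = P(risk up | event)       - P(risk down | event)
    NRI- = P(risk down | non-event) - P(risk up | non-event)
    """
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for event, old, new in zip(events, p_old, p_new):
        if event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_ne += 1
            up_ne += new > old
            down_ne += new < old
    nri_plus = (up_e - down_e) / n_e
    nri_minus = (down_ne - up_ne) / n_ne
    return nri_plus, nri_minus, nri_plus + nri_minus


# Hypothetical predicted risks from an old and an updated model
nri_events = [1, 1, 1, 0, 0, 0, 0]
nri_p_old = [0.10, 0.20, 0.30, 0.10, 0.20, 0.05, 0.15]
nri_p_new = [0.15, 0.25, 0.25, 0.05, 0.25, 0.04, 0.10]
nri_plus, nri_minus, nri_total = category_free_nri(nri_events, nri_p_old, nri_p_new)
print(round(nri_plus, 3), round(nri_minus, 3), round(nri_total, 3))
```

In this toy data most of the gain comes from NRI-, that is, from non-events being moved to lower predicted risk, which mirrors the pattern described for model B.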

DISCUSSION
Since the development of Kannel  Since the publication of SPRINT research results, 25 24,28,29 This is consistent with the current study. Obesity is an established risk factor for cardiovascular disease. 30 The obesity paradox does exist in patients with hypertension, that is, the prevalence of adverse events in thin or normal-weight patients with hypertension is higher than that in obese patients. 31 of the baseline data was lost. Although multiple imputation was used, these missing data may limit interpretation of the results. Lastly, the definition of CVD did not include incident heart failure, which may limit our results.

AUTHORS CONTRIBUTION
Yingxian Sun and Nanxiang Ouyang contributed to the conception or design of the work. Guangxiao Li and Chang Wang contributed to the acquisition, analysis, or interpretation of data for the work. Nanxiang Ouyang drafted the manuscript. All authors gave final approval and agree to be accountable for all aspects of the work, ensuring its integrity and accuracy.

CONFLICTS OF INTEREST
There are no conflicts of interest.