Redundancy among risk predictors derived from heart rate variability and dynamics: ALLSTAR big data analysis

Background Many indices of heart rate variability (HRV) and heart rate dynamics have been proposed as cardiovascular mortality risk predictors, but the redundancy between their predictive powers is unknown. Methods From the Allostatic State Mapping by Ambulatory ECG Repository project database, 24‐hr ECG data showing continuous sinus rhythm were extracted and SD of normal‐to‐normal R‐R interval (SDNN), very‐low‐frequency power (VLF), scaling exponent α1, deceleration capacity (DC), and non‐Gaussianity λ25s were calculated. The values were dichotomized into high‐risk and low‐risk values using the cutoffs reported in previous studies to predict mortality after acute myocardial infarction. The rate of multiple high‐risk predictors accumulating in the same person was examined and was compared with the rate expected under the assumption that these predictors are independent of each other. Results Among 265,291 ECG data from the ALLSTAR database, the rates of subjects with high‐risk SDNN, DC, VLF, α1, and λ25s values were 2.95, 2.75, 5.89, 15.75, and 18.82%, respectively. The observed rate of subjects without any high‐risk value was 66.68%, which was 1.10 times the expected rate (60.74%). The ratios of observed rate to the expected rate at which one, two, three, four, and five high‐risk values accumulate in the same person were 0.73 times (24.10 and 32.82%), 1.10 times (6.56 and 5.99%), 4.26 times (1.87 and 0.44%), 47.66 times (0.63 and 0.013%), and 1,140.66 times (0.16 and 0.00014%), respectively. Conclusions High‐risk predictors of HRV and heart rate dynamics tend to cluster in the same person, indicating a high degree of redundancy between them.

The HRV indices are classified into time-domain and frequency-domain indices (Camm et al., 1996). The time-domain indices consist of various statistical measures of the variations in normal-to-normal (N-N) R-R interval (R-R interval of consecutive sinus rhythms). The representatives are the standard deviation of 24-hr N-N interval (SDNN) (Kleiger et al., 1987) and deceleration capacity (DC) (Kantelhardt et al., 2007). A decrease in these indices predicts increased mortality risk after acute myocardial infarction (AMI) (Bauer et al., 2006). The frequency-domain indices are calculated by power spectral analysis of the N-N interval time series and are quantified as the power of frequency components.
Among such components, a reduction in the power of the very-low-frequency (VLF) band (0.0033-0.04 Hz) is the most powerful predictor of post-AMI mortality (Bigger et al., 1992). The indices of HR dynamics include various nonlinear indices that capture the qualitative features of fluctuation. Detrended fluctuation analysis (DFA) (Peng et al., 1995) quantifies the scaling exponents of fractal-like HR dynamics, and a reduction in the short-term (4-11 beats) exponent (α1) is associated with increased mortality risk in post-AMI patients (Huikuri et al., 2000). The non-Gaussianity index (λ) quantifies the probability density function for abrupt large HR changes (Kiyono et al., 2007), and an increase in λ predicts increased mortality risk in patients with heart failure (Kiyono et al., 2008) and in those after AMI (Hayano et al., 2011).
Earlier reports advocating new mortality predictors have shown that their predictive power is independent, at least partly, of that of predictors reported earlier. However, no large-scale study has systematically verified the relationships between the predictors, and there is no reliable evidence as to whether they are independent of each other or whether there is redundancy among them.
In this study, we analyzed the inter-relationships and examined the degree of redundancy among the major prognostic predictors of HRV and HR dynamics. For these purposes, we used 24-hr ECG big data obtained from the Allostatic State Mapping by Ambulatory ECG Repository (ALLSTAR) project (Hayano, Kisohara, Ueda, & Yuda, 2020; Hayano et al., 2018).

| ALLSTAR database
We obtained 24-hr ECG data from the ALLSTAR database, which   Labour and Welfare, Japan, December 22, 2014), the purpose and information utilized in this project have been made public through the project's homepages (http://www.suzuken.co.jp/product/holter/detail/ and http://www.med.nagoya-cu.ac.jp/mededu.dir/allstar/), where research subjects are given the opportunity to refuse the use of their information.
The 24-hr ECG data in this database were recorded for some clinical purpose(s) by medical facilities and were referred for analysis to three ECG analysis centers (Suzuken Co., Ltd.) located in Tokyo, Nagoya, and Sapporo in Japan. The data were anonymized by the centers and stored with accompanying information, including age, sex, and recording date, time, and location (postal code). Table 1 shows the characteristics of ALLSTAR subjects, including underlying cardiac diseases, cardiovascular risk factors, and medications, obtained from a randomized survey of 73,582 (17%) subjects.
All data were recorded with the Cardy series of Holter ECG recorders (Cardy 2, Cardy 2P, Cardy 203, Cardy 301, Cardy 302 Mini and Max, Cardy 303 pico and Cardy 303 pico+, Suzuken Co., Ltd., Nagoya, Japan), by which multi-channel ECG data were digitized at 125 Hz with 10-bit resolution (0.02 mV/digit). The digitized data were sent to the analysis centers and analyzed with Holter ECG analyzers (Cardy Analyzer 05, Suzuken Co., Ltd.); the temporal positions of all R waves were determined, rhythm annotations were given to all QRS complexes, and all errors in the automated analysis were corrected manually by skilled medical technologists. Suspicious analysis results were reviewed by contracted cardiologists.

| Data selection
From the ALLSTAR database, 24-hr ECG data were selected for this study with the following criteria.
Data were included only if all of the following were met:
1. Subject age at ECG recording >20 years
2. The first ECG recording, if there were repeated recordings
3. Record length >21.6 hr (90% of 24 hr)
4. Sinus rhythm for >19.2 hr (80% of 24 hr)
Data were excluded if the ECG showed at least one of the following:
1. Evidence of artificial pacemaker implantation
2. Non-sinus rhythm beats >20% of total recorded beats
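The inclusion and exclusion criteria can be read as a single predicate applied to each recording. The following is a minimal sketch of that reading, not the selection code used for the ALLSTAR data; the `Recording` record and its field names are hypothetical, and only the thresholds are taken from the criteria listed here.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    # Hypothetical per-recording summary; field names are illustrative only.
    age_years: float
    is_first_recording: bool
    record_length_hr: float
    sinus_rhythm_hr: float
    has_pacemaker: bool
    non_sinus_beat_fraction: float  # fraction of total beats, 0.0 to 1.0


def is_eligible(rec: Recording) -> bool:
    """Return True if a recording meets all inclusion and no exclusion criteria."""
    included = (
        rec.age_years > 20
        and rec.is_first_recording
        and rec.record_length_hr > 21.6   # 90% of 24 hr
        and rec.sinus_rhythm_hr > 19.2    # 80% of 24 hr
    )
    excluded = rec.has_pacemaker or rec.non_sinus_beat_fraction > 0.20
    return included and not excluded
```

Keeping the inclusion and exclusion parts as separate expressions mirrors the two lists, so a change to one threshold stays local.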

| Computations of predictors
We studied the time-domain and frequency-domain indices of HRV and the nonlinear indices of HR dynamics that are known as the major predictors of increased cardiovascular mortality risk. They were computed according to the recommended standard (Camm et al., 1996) and the methods of earlier studies (Iyengar, Peng, Morin, Goldberger, & Lipsitz, 1996; Kantelhardt et al., 2007; Kiyono et al., 2007; Peng et al., 1995). Briefly, from the ECG data, the time ces, we calculated the fractal correlation properties of HR dynamics using the DFA method and measured the short-term (4 to 11 beats) scaling exponent (α1) (Iyengar et al., 1996; Peng et al., 1995). We also calculated the non-Gaussianity index λ, setting the scale at 25 s (λ25s), according to the previous study (Hayano et al., 2011).
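To illustrate what the short-term scaling exponent measures, the sketch below estimates α1 by first-order DFA over window sizes of 4 to 11 beats. It is a simplified textbook version that assumes an already cleaned N-N interval series and uses non-overlapping windows with linear detrending; the function name `dfa_alpha1` is hypothetical, and the exact settings used in this study may differ.

```python
import numpy as np


def dfa_alpha1(nn_intervals, scales=range(4, 12)):
    """Estimate the short-term DFA exponent from an N-N interval series."""
    x = np.asarray(nn_intervals, dtype=float)
    # Integrate the mean-subtracted series to obtain the profile.
    profile = np.cumsum(x - x.mean())
    log_n, log_f = [], []
    for n in scales:
        n_windows = len(profile) // n
        # Non-overlapping windows of n beats; the leftover tail is dropped.
        windows = profile[: n_windows * n].reshape(n_windows, n).T
        t = np.arange(n)
        # Fit a least-squares line to every window at once.
        slope, intercept = np.polyfit(t, windows, 1)
        trend = np.outer(t, slope) + intercept
        # Root-mean-square fluctuation around the local linear trends.
        f_n = np.sqrt(np.mean((windows - trend) ** 2))
        log_n.append(np.log(n))
        log_f.append(np.log(f_n))
    # alpha1 is the slope of log F(n) against log n.
    return np.polyfit(log_n, log_f, 1)[0]
```

With this estimator, uncorrelated noise yields a value near 0.5 and a random walk a value near 1.5, with some bias expected at such short window sizes.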

| Assessment of inter-relationships between predictors
To assess the relationships between the predictors, Pearson's correlation coefficients (r) were calculated and the similarities between them were defined as (1 − |r|). To visualize the inter-relationships, a relationship map of the predictors, in which the distances between them best match the similarities between them, was obtained by an exhaustive computer search through all possibilities. The mutual independence of the predictors was also assessed by the squared multiple correlation coefficient (R2), calculated by multiple regression analysis of each predictor with all other predictors as explanatory variables.
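The two numerical quantities in this assessment, the (1 − |r|) matrix and the R2 of each predictor regressed on all the others, can be sketched as follows. This is an illustrative NumPy version under the assumption that the predictors are held as columns of one array; the study itself used the SAS Corr and Reg procedures, the function name `correlation_summary` is hypothetical, and the exhaustive search that produces the relationship map is not reproduced here.

```python
import numpy as np


def correlation_summary(values):
    """values: 2-D array, one row per subject and one column per predictor."""
    x = np.asarray(values, dtype=float)
    # Pearson correlation matrix between predictor columns.
    r = np.corrcoef(x, rowvar=False)
    # The (1 - |r|) measure used for the relationship map.
    one_minus_abs_r = 1.0 - np.abs(r)
    r_squared = []
    for j in range(x.shape[1]):
        y = x[:, j]
        others = np.delete(x, j, axis=1)
        # Ordinary least squares with an intercept term.
        design = np.column_stack([np.ones(len(y)), others])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        residual = y - design @ beta
        # Share of the variance of predictor j explained by the others.
        r_squared.append(1.0 - residual.var() / y.var())
    return r, one_minus_abs_r, np.array(r_squared)
```

A small R2 for a column indicates that the other predictors explain little of its variance, which is the sense of independence used in this analysis.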

| Assessment of redundancy
To assess the redundancy between their predictive powers, the values of each predictor were dichotomized into high risk and low risk using the cutoff reported in earlier studies to predict post-AMI mortality. We used SDNN < 65 ms (Huikuri et al., 2000), DC ≤ 2.5 ms (Bauer et al., 2006), VLF < 5.75 ln(ms²) (Huikuri et al., 2000), DFA α1 < 0.75 (Huikuri et al., 2000), and λ25s > 0.6 (Hayano et al., 2011) as high risk. Note that SDNN, DC, VLF, and DFA α1 represent high risk when the values are below the thresholds, while λ25s represents high risk when the value is above the threshold (Hayano et al., 2011).
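Because the direction of the comparison differs between predictors, the dichotomization is easy to get wrong in code. The sketch below encodes each cutoff together with its comparison operator; the dictionary keys and the function name `high_risk_flags` are hypothetical, while the thresholds and directions are those stated here.

```python
import operator

# Cutoffs and comparison directions as stated in the text.
# Units: SDNN and DC in ms, VLF in ln(ms^2); alpha1 and lambda25s are unitless.
CUTOFFS = {
    "SDNN": (operator.lt, 65.0),
    "DC": (operator.le, 2.5),
    "VLF": (operator.lt, 5.75),
    "alpha1": (operator.lt, 0.75),
    "lambda25s": (operator.gt, 0.6),
}


def high_risk_flags(values):
    """Map each predictor name to True when its value is in the high-risk range."""
    return {
        name: compare(values[name], threshold)
        for name, (compare, threshold) in CUTOFFS.items()
    }


def count_high_risk(values):
    """Number of predictors (0 to 5) flagged as high risk for one subject."""
    return sum(high_risk_flags(values).values())
```

For example, a subject with DC exactly equal to 2.5 ms is flagged as high risk, whereas SDNN exactly equal to 65 ms is not, because only the DC cutoff is inclusive.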
Then, the appearance rate of the high-risk value for each predictor was measured. Using the obtained rate, the expected probability that 0 to 5 high-risk predictors out of 5 would appear in the same person was calculated under the assumption that the predictors are independent of each other. Then, we compared the expected and observed risk clustering rates.
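Under independence, the probability of a given pattern of high-risk and low-risk values is the product of the individual rates, and the probability of exactly k high-risk values is the sum over all patterns with k high-risk items. A minimal sketch of this calculation is given below; the function name `expected_count_distribution` is hypothetical, and the example rates are the per-predictor high-risk rates reported in this study for SDNN, DC, VLF, α1, and λ25s.

```python
def expected_count_distribution(rates):
    """Probability of exactly k high-risk values (k = 0..len(rates)),
    assuming the predictors are mutually independent."""
    dist = [1.0]  # with no predictors, zero high-risk values is certain
    for p in rates:
        updated = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            updated[k] += prob * (1.0 - p)   # this predictor is low risk
            updated[k + 1] += prob * p       # this predictor is high risk
        dist = updated
    return dist


# Observed high-risk rates for SDNN, DC, VLF, alpha1, and lambda25s.
rates = [0.0295, 0.0275, 0.0589, 0.1575, 0.1882]
expected = expected_count_distribution(rates)
```

With these rates, `expected[0]` comes out close to the 60.74% expected rate for subjects without any high-risk value, and dividing each observed rate by the corresponding entry gives the observed-to-expected ratio.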

| Statistical analysis
The SAS program package (SAS Institute) was used for statistical analyses.
The Freq, Corr, and Reg procedures were used for assessing occurrence rates, correlation coefficients, and multiple correlations, respectively. A value of p < .05 was used as the criterion of statistical significance.

| Characteristics of the sample population
recordings. Of the remaining 321,220 data, 21,014 (6.54%) were excluded due to insufficient data length, 20,034 (6.23%) due to atrial fibrillation or flutter, 13,658 (4.25%) due to frequent (>20%) ectopic beats, and 1,323 (0.41%) due to an implanted pacemaker, and the 265,291 data that met all criteria were finally used in this study. Subjects of the used data were aged 65 ± 16 (mean ± SD) years and comprised 116,554 males and 148,737 females.
The explained variances by the other predictors (R2), whose small values represent independence from the others, were the smallest for λ25s and the largest for VLF.
Table 5 shows the observed rates of low-risk and high-risk subjects for each predictor. From these rates, the expected rates of multiple high-risk predictor items accumulating in the same person were calculated under the assumption that the predictors are independent of each other. As shown in Table 6, the observed rate of subjects with no high-risk item was 66.68%, which was 1.10 times the expected rate (60.74%). The observed rate was lower (0.73 times) than the expected rate for subjects who had only one high-risk item.

| Redundancy of predictive powers
For the subjects with multiple (two to five) high-risk items, the observed rates were always greater than the expected rates, and for the subjects with all five high-risk items, the observed rate was 1,140.66 times the expected rate.
Table 7 shows the expected and observed rates of all possible combinations of high-risk predictor items. For all combinations of four high-risk items, the observed rates were >19 times higher than the expected rates. On the other hand, the observed rate of subjects with a single isolated high-risk value of SDNN, DC, α1, or λ25s was lower than the expected rate.

FIGURE 1 Relationship mapping of HRV and HR dynamics indices, where the distances between the indices best match the similarities between them (1 − |r|). The placement of indices was determined by a computer search through all possibilities. Dotted lines indicate a negative correlation. The abbreviations are explained in the footnote to Table 2.

| DISCUSSION
To examine the redundancy between the major mortality risk predictors derived from HRV and HR dynamics, we analyzed the inter-relationships between the predictors and the tendency of multiple risks to cluster in the same person using the 24-hr Holter ECG big data from the ALLSTAR database. We observed substantial inter-relationships among the predictors; in particular, a close positive correlation between SDNN and VLF, moderate positive correlations among VLF, DC, and DFA α1, and negative correlations of λ25s with the other predictors. The inter-relationships could be visualized by a relationship mapping of the predictors (Figure 1). We also observed that multiple high-risk values of predictors clustered in the same person at a much higher rate than the expected rates calculated assuming their mutual independence. Our observations support a high degree of redundancy among their predictive powers.
This study is the first large-scale systematic analysis of the inter-relationships of risk predictors derived from HRV and HR dynamics in 24-hr Holter ECG. Earlier studies analyzed the relationships among subsets of the HRV and HR dynamics indices. In the first study to report the prognostic value of DC for post-AMI mortality, Bauer et al. (2006) examined the independent predictive value of DC and SDNN by the Cox proportional hazards regression model. In all three of their post-AMI cohorts, SDNN ≤ 70 ms was a significant univariate predictor of mortality risk, but it no longer had independent predictive power when DC ≤ 2.5 ms was entered in the regression models.
Although the independence of prognostic power was not examined directly, Huikuri et al. (2000) reported that α1 < 0.75 predicted post-AMI arrhythmic death independently of established clinical risk factors, while none of the time- or frequency-domain HRV indices had independent predictive power. Finally, in a previous study in patients after myocardial infarction (Hayano et al., 2011), we demonstrated that λ25s > 0.6 was a significant predictor of mortality even in the Cox hazards regression models with coexisting SDNN, DC, VLF, and DFA α1. These studies suggest both dependence and independence among the predictors, but the overall relationships were unclear from them.
In this study, we used the relationship mapping to visualize the inter-relationships of all predictors. It was generated so that the distances represent the similarity between predictors. A thorough computer search produced several placements with similar optimality, but they differed only by rotation in the plane. Despite its simplicity, the map separates predictors that reflect the quality (DFA α1 and λ25s) and quantity (SDNN and VLF) of HR fluctuations into the upper left and lower right, and it separates those that reflect sympathetic (λ25s) (Kiyono et al., 2008) and parasympathetic (SDNN and DFA α1) (Huikuri, Perkiomaki, Maestri, & Pinna, 2009) functions vertically. It also suggests the relative independence of λ25s from the other predictors and the similarity between SDNN and VLF. For λ25s, higher values represent increased risk (Hayano et al., 2011; Kiyono et al., 2008). Thus, it shows negative correlations with the other risk predictors, which are indicated with dotted lines in the relationship mapping.
This study showed that high-risk values of HRV and HR dynamics tend to cluster in the same person. The probability of multiple high-risk predictor values appearing in the same person was higher than that expected assuming independence between predictors.
Furthermore, the ratio of the observed rate to the expected rate increased as the number of clustered risks increased. These findings indicate that there is an association between predictors and that multiple factors may reflect common features. This is also consistent with the fact that the observed rate of subjects with a single isolated high-risk value of SDNN, DC, α1, or λ25s was lower than the expected rate, indicating that these high-risk values are unlikely to occur alone. Together, these findings support a high degree of redundancy between the predictors derived from HRV and HR dynamics.
This study has limitations. First, the results of this study were obtained from a single database. Thus, the relationships and mutual dependencies that were found between the predictors derived from HRV and HR dynamics indices may be specific to the

| CONCLUSIONS
In the ALLSTAR 24-hr Holter ECG big data, we found that the risk predictors derived from HRV and HR dynamics show a strong tendency for their high-risk values to cluster in the same person. Our observations support a high degree of redundancy between the predictors.

TABLE 7 Expected and observed rates of subjects by the combination of high-risk values
Data for no abnormality and for the combination of all five abnormalities are presented in Table 6 as numbers of abnormalities of 0 and 5, respectively.
*Ratio of observed rate to expected rate. Abbreviations are explained in the footnote to Table 2.