Prediction of death rates for cardiovascular diseases and cancers

Abstract
Background: To estimate cardiovascular and cancer death rates by region and time period.
Design: Novel statistical methods were used to analyze clinical surveillance data.
Methods: A multicenter, population-based medical survey was performed. Annual recorded deaths from cardiovascular diseases and cancers were analyzed for all 195 countries of the world. Such data are challenging to model, and few mathematical models can be applied, because cardiovascular disease and cancer data are generally not normally distributed.
Results: A novel approach to assessing biosystem reliability is introduced and has been found to be particularly suitable for analyzing multiregion environmental and healthcare systems. Whereas traditional methods for analyzing temporal observations of multiregion processes do not handle high dimensionality efficiently, our methodology has been shown to cope with this challenge.
Conclusions: Our novel methodology can be applied to public health and clinical survey data.

Assessing the reliability of healthcare systems and estimating excess mortality from cardiovascular diseases (CVDs) using conventional statistical methods is challenging [30][31][32][33][34][35]. To achieve the latter goal over large areas, many degrees of freedom, that is, random variables governing dynamic biological systems, typically have to be accounted for. In principle, the reliability of a complex biological system can be accurately estimated from sufficient measurements or by Monte Carlo simulation. For CVDs and cancers, however, data before 1990 are scarce [30]. Against this background, we introduce a novel method for assessing the reliability of biological and healthcare systems, to aid the prediction and management of excess mortality from CVD. This study focused on cross-correlations in CVD and cancer deaths among countries within the same climatic zone. Worldwide health data and related research are readily available online [30].
Lifetime data analysis using extreme value theory is widespread in medicine and engineering [30]. A recent paper presented the arguments for and against using the upper tail of the distribution of life expectancy data [1]. A bivariate lifetime distribution is often assumed when analyzing statistical data [3]. A new approach that uses Clayton, Gumbel, and inverse Gaussian power variance functions, together with conditional sampling and numerical approximation, has been applied to survival analysis [2]. However, few studies have aimed to predict excess CVD and cancer mortality; this paper addresses that gap.
In this paper, excess mortality from CVD is viewed as an unexpected event that may occur in any country at any time. The nondimensional factor λ is used to predict CVD risk. Biological systems are influenced by environmental parameters that can be modeled as ergodic processes. CVD and cancer incidence data for 195 countries for the period 1990-2019 were retrieved [30]. The biological system considered here can be regarded as a multi-degree-of-freedom (MDOF) dynamic system with highly interrelated regional components (dimensions). This study focused on predicting excess mortality rather than symptoms.

| METHODS
Consider an MDOF biosystem subjected to random ergodic environmental influences. Alternatively, the process can be viewed as depending on specific environmental parameters whose variation in time may itself be modeled as an ergodic process. The MDOF biomedical response vector process $\vec{R}(t) = (X(t), Y(t), Z(t), \ldots)$ is measured and/or simulated over a sufficiently long time interval $(0, T)$. The unidimensional global maxima over the entire time span $(0, T)$ are denoted as $X_T^{\max} = \max_{0 \le t \le T} X(t)$, $Y_T^{\max} = \max_{0 \le t \le T} Y(t)$, $Z_T^{\max} = \max_{0 \le t \le T} Z(t)$, and so on. A sufficiently long time $T$ primarily means a value of $T$ that is large with respect to the autocorrelation time of the dynamic system.
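Let $X_1, \ldots, X_{N_X}$ be the consecutive (in time) local maxima of the bioprocess $X(t)$ at monotonically increasing discrete time instants $t_1^X < \ldots < t_{N_X}^X$ in $(0, T)$. Analogous definitions hold for the other components: $Y_1, \ldots, Y_{N_Y}$ at $t_1^Y < \ldots < t_{N_Y}^Y$; $Z_1, \ldots, Z_{N_Z}$ at $t_1^Z < \ldots < t_{N_Z}^Z$; and so on. For simplicity, all components of $\vec{R}(t)$, and therefore their maxima, are assumed to be nonnegative. The aim is to estimate the system failure probability $1 - P$, where

$$P = \iiint_{(0,0,0,\ldots)}^{(\eta_X,\eta_Y,\eta_Z,\ldots)} p_{X_T^{\max},Y_T^{\max},Z_T^{\max},\ldots}(x, y, z, \ldots)\,dx\,dy\,dz\cdots \quad (1)$$

is the probability of nonexceedance of the critical values $\eta_X, \eta_Y, \eta_Z, \ldots$, and $p_{X_T^{\max},Y_T^{\max},Z_T^{\max},\ldots}$ is the joint probability density of the global maxima over $(0, T)$. In practice, this joint density cannot be estimated directly because of its high dimensionality and the limitations of the available data set. In other words, at the moment when either $X(t)$ exceeds $\eta_X$, or $Y(t)$ exceeds $\eta_Y$, or $Z(t)$ exceeds $\eta_Z$, and so on, the system is regarded as immediately failed. The fixed failure levels $\eta_X, \eta_Y, \eta_Z, \ldots$ are individual to each unidimensional response component, with $X_{N_X}^{\max} = \max\{X_j;\ j = 1, \ldots, N_X\} = X_T^{\max}$, $Y_{N_Y}^{\max} = \max\{Y_j;\ j = 1, \ldots, N_Y\} = Y_T^{\max}$, $Z_{N_Z}^{\max} = \max\{Z_j;\ j = 1, \ldots, N_Z\} = Z_T^{\max}$, and so on; see Naess and Gaidai [32] and Naess and Moan [49].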
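The two building blocks defined above, the local maxima of each component and the union ("or") failure criterion, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `local_maxima`, `global_max`, and `system_failed`, the plateau convention (a flat peak is counted once, at its first point), and the dictionary representation of components and limits are all hypothetical choices.

```python
from typing import Dict, List, Sequence


def local_maxima(series: Sequence[float]) -> List[int]:
    """Indices of interior local maxima: strictly above the left neighbour
    and not below the right one, so a flat peak is counted once."""
    return [i for i in range(1, len(series) - 1)
            if series[i - 1] < series[i] >= series[i + 1]]


def global_max(series: Sequence[float]) -> float:
    """Global maximum over the whole record, e.g. X_T^max."""
    return max(series)


def system_failed(values: Dict[str, float], limits: Dict[str, float]) -> bool:
    """Union ("or") criterion: the system fails as soon as any component
    exceeds its own fixed limit eta."""
    return any(values[name] > limits[name] for name in limits)


x = [0.1, 0.4, 0.2, 0.5, 0.3]
print(local_maxima(x), global_max(x))  # [1, 3] 0.5
print(system_failed({"X": 0.5, "Y": 0.2}, {"X": 0.6, "Y": 0.1}))  # True
```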
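Next, the local maxima time instants $[t_1^X < \ldots < t_{N_X}^X;\ t_1^Y < \ldots < t_{N_Y}^Y;\ t_1^Z < \ldots < t_{N_Z}^Z;\ \ldots]$ are sorted in monotonically nondecreasing order into one merged synthetic time vector $t_1 \le \ldots \le t_N$, with $N = N_X + N_Y + N_Z + \ldots$. Each $t_j$ is then the time of a local maximum of one of the MDOF biosystem response components, either $X(t)$, $Y(t)$, or $Z(t)$, and so on. That is, given the time record $\vec{R}(t)$, one only needs to screen continuously and simultaneously for the local maxima of each unidimensional response component and record any exceedance of the MDOF limit vector $(\eta_X, \eta_Y, \eta_Z, \ldots)$ in any of its components $X, Y, Z, \ldots$. The local unidimensional maxima are merged into one temporally nondecreasing (time-ordered) vector $\vec{R} = (R_1, R_2, \ldots, R_N)$ in accordance with the merged time vector $t_1 \le \ldots \le t_N$; each $R_j$ is the local maximum actually encountered in either $X(t)$, $Y(t)$, or $Z(t)$, and so on. Correspondingly, each component $\eta_j$ of the unified limit vector $(\eta_1, \ldots, \eta_N)$ equals $\eta_X$, $\eta_Y$, or $\eta_Z$, and so on, depending on which component the $j$th maximum belongs to. Scaling all limits by a common factor $0 < \lambda \le 1$, that is, $\eta_j^\lambda = \lambda \eta_j$, defines the nonexceedance probability $P(\lambda)$ as a function of $\lambda$, with $P \equiv P(1)$ from Equation (1).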
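The merging step can be sketched as follows. This is a hypothetical implementation, assuming all components are sampled at the same time instants and that each maximum is divided by its own component's limit, so that a limit scaled by a common factor λ becomes the plain threshold λ for every entry of the merged vector. The function name `merge_maxima` and the tuple layout `(t_j, component, R_j / eta_j)` are illustrative choices.

```python
from typing import Dict, List, Sequence, Tuple


def local_maxima(series: Sequence[float]) -> List[int]:
    """Indices of interior local maxima (flat peaks counted once)."""
    return [i for i in range(1, len(series) - 1)
            if series[i - 1] < series[i] >= series[i + 1]]


def merge_maxima(components: Dict[str, Sequence[float]],
                 times: Sequence[float],
                 limits: Dict[str, float]) -> List[Tuple[float, str, float]]:
    """Merge the local maxima of all components into one time-ordered list
    of (t_j, component name, R_j / eta_j)."""
    merged = []
    for name, series in components.items():
        for i in local_maxima(series):
            merged.append((times[i], name, series[i] / limits[name]))
    # list.sort is stable, so maxima at equal times keep component order
    merged.sort(key=lambda item: item[0])
    return merged


times = [0, 1, 2, 3, 4, 5]
comps = {"X": [0, 2, 0, 3, 0, 0], "Y": [0, 0, 1, 0, 4, 0]}
merged = merge_maxima(comps, times, {"X": 2.0, "Y": 4.0})
print(merged)  # [(1, 'X', 1.0), (2, 'Y', 0.25), (3, 'X', 1.5), (4, 'Y', 1.0)]
```

Because every entry is already divided by its own limit, checking all components against their scaled limits reduces to comparing one merged sequence against a single number.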
Equation (5) presents successive refinements of the statistical independence assumption. Such approximations capture the statistical dependence between neighboring maxima with increasing accuracy. Since the original MDOF bioprocess $\vec{R}(t)$ was assumed ergodic and therefore stationary, the probability

$$p_k(\lambda) := \mathrm{Prob}\{R_j > \eta_j^\lambda \mid R_{j-1} \le \eta_{j-1}^\lambda, \ldots, R_{j-k+1} \le \eta_{j-k+1}^\lambda\}, \quad j \ge k,$$

is independent of $j$ and depends only on the conditioning level $k$. Thus, the nonexceedance probability can be approximated as in the Naess-Gaidai method [32,49]:

$$P_k(\lambda) \approx \exp\left(-N\, p_k(\lambda)\right), \quad k \ge 1. \quad (6)$$

Equation (6) follows from Equation (1) by setting $\mathrm{Prob}(R_1 \le \eta_1^\lambda) \approx 1$, as the design failure probability is usually very small. Further, it is assumed that $N \gg k$. Equation (6) is similar to the well-known mean up-crossing rate equation for the probability of exceedance [32,49]. Convergence is observed with respect to the conditioning parameter $k$:

$$P = \lim_{k \to \infty} P_k(1), \qquad p(\lambda) = \lim_{k \to \infty} p_k(\lambda). \quad (7)$$

For $k = 1$, Equation (6) turns into the well-known nonexceedance probability relationship with the mean up-crossing rate function

$$P(\lambda) \approx \exp\left(-\nu(\lambda)\, T\right), \qquad \nu(\lambda) = \frac{1}{T}\sum_{j=1}^{N} p_1(\lambda) = \frac{N\, p_1(\lambda)}{T}, \quad (8)$$

where $\nu(\lambda)$ is the mean up-crossing rate of the response level $\lambda$ for the nondimensional vector $\vec{R}(t)$ assembled from the scaled MDOF biosystem response $(X/\eta_X, Y/\eta_Y, Z/\eta_Z, \ldots)$. The proposed methodology can also treat nonstationary cases, as follows. Consider a scatter diagram of $m = 1, \ldots, M$ bioenvironmental states, each short-term bioenvironmental state having probability $q_m$, so that $\sum_{m=1}^{M} q_m = 1$. The corresponding long-term equation is then

$$p_k(\lambda) \equiv \sum_{m=1}^{M} p_k(\lambda, m)\, q_m, \quad (9)$$

with $p_k(\lambda, m)$ being the same function as in Equation (7) but corresponding to a specific short-term environmental state with number $m$. This statistical model has already been validated [47,50,51,52].
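The conditional exceedance probability can be estimated from the merged nondimensional sequence by simple counting, and then plugged into the exponential nonexceedance approximation and the long-term weighting over short-term states. The sketch below assumes an empirical ratio estimator (exceedances of λ preceded by k-1 nonexceedances, divided by the number of positions whose k-1 predecessors are all nonexceedances), which is one common way of estimating such conditional probabilities; it is not necessarily the exact estimator behind the paper's figures, and the function names are hypothetical.

```python
import math
from typing import Sequence


def p_k(r: Sequence[float], lam: float, k: int) -> float:
    """Empirical p_k(lam) for a merged nondimensional sequence r.

    Counts positions j whose k-1 predecessors are all <= lam ("trials") and,
    among them, those with r[j] > lam ("hits"); returns hits / trials.
    For k = 1 this is simply the fraction of maxima exceeding lam.
    """
    if k < 1:
        raise ValueError("conditioning level k must be >= 1")
    hits = trials = 0
    for j in range(k - 1, len(r)):
        if all(r[i] <= lam for i in range(j - k + 1, j)):
            trials += 1
            if r[j] > lam:
                hits += 1
    return hits / trials if trials else 0.0


def nonexceedance(r: Sequence[float], lam: float, k: int) -> float:
    """P_k(lam) ~= exp(-N * p_k(lam)) with N = len(r) merged maxima."""
    return math.exp(-len(r) * p_k(r, lam, k))


def long_term_p_k(p_by_state: Sequence[float], q: Sequence[float]) -> float:
    """Weighted sum over short-term states m: sum_m p_k(lam, m) * q_m."""
    if not math.isclose(sum(q), 1.0):
        raise ValueError("state probabilities q_m must sum to 1")
    return sum(p * w for p, w in zip(p_by_state, q))


r = [0.2, 0.9, 0.95, 0.3, 0.5, 0.1, 0.4, 0.6]
print(p_k(r, 0.8, 1), p_k(r, 0.8, 2))  # 0.25 0.2
```

In this toy sequence, two of eight maxima exceed λ = 0.8, so the k = 1 estimate is 0.25; with k = 2 the back-to-back exceedances 0.9 and 0.95 are counted only once (one hit among five eligible positions), which is how conditioning suppresses clustered, correlated exceedances.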

| RESULTS
Prediction of CVD and cancer has long been a target in epidemiology and mathematical biology. Public health systems are dynamic, highly nonlinear, multidimensional, and spatially diverse, which makes them challenging to analyze. Previous studies have used a variety of approaches to predict CVD and cancer cases. In this section, the methodology described above is applied to real-world CVD data sets for all countries of the world.
The statistical data in this section are from the "Our World in Data" website [30], which provides annual CVD death rates for all countries for the period 1990-2019. The death rates for the 195 countries (components $X, Y, Z, \ldots$) constitute 195-dimensional (195D) data for a dynamic biological system.
Setting general failure limits $(\eta_X, \eta_Y, \eta_Z, \ldots)$, that is, CVD thresholds, is less intuitive than setting a failure limit for each individual country according to its population, where $X, Y, Z, \ldots$ are the annual death rates of the individual countries. The death rate for cancer is lower than that for CVD, but dying from cancer is typically more painful. In this paper, the "failure limit" for cancer is lowered fourfold to match that for CVD.
Next, the local maxima from all nondimensionalized time series are merged into a single time series using Equation (5): each maximum, such as $\max\{X_j^{\text{cardio}}, X_j^{\text{cancer}}\}$, is inserted into the single time series according to its temporal occurrence (denoted by subscript $j$).
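The combination of CVD and cancer series per country, the fourfold adjustment, and the merge across countries can be sketched as below. Several assumptions are made for illustration only: each annual value is used directly as that year's maximum, a per-country limit `limits[country]` is supplied by the caller (its actual choice is not specified here), the fourfold lowering of the cancer limit is implemented equivalently by multiplying cancer rates by 4 before comparing, and the countries "A" and "B" with their rates are hypothetical toy data.

```python
from typing import Dict, List, Sequence, Tuple

# Lowering the cancer limit fourfold is equivalent to scaling cancer rates x4
CANCER_FACTOR = 4.0


def merge_country_rates(cardio: Dict[str, Sequence[float]],
                        cancer: Dict[str, Sequence[float]],
                        years: Sequence[int],
                        limits: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """For each country and year take max(CVD, 4 * cancer), divide by the
    (hypothetical) country limit, and merge all countries in year order."""
    merged = []
    for country, cvd_rates in cardio.items():
        for year, x_cvd, x_cancer in zip(years, cvd_rates, cancer[country]):
            combined = max(x_cvd, CANCER_FACTOR * x_cancer)
            merged.append((year, country, combined / limits[country]))
    merged.sort(key=lambda item: item[0])  # stable: countries keep dict order
    return merged


years = [1990, 1991]
cardio = {"A": [0.30, 0.32], "B": [0.50, 0.45]}  # hypothetical % of population
cancer = {"A": [0.10, 0.07], "B": [0.10, 0.12]}
merged = merge_country_rates(cardio, cancer, years, {"A": 0.6, "B": 0.6})
for entry in merged:
    print(entry)
```

Here country A in 1990 is governed by the scaled cancer rate (4 × 0.10 = 0.40 > 0.30), while country B in 1991 is governed by cancer as well (0.48 > 0.45).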
Figure 1 presents the annual deaths from CVD and cancer by country and year. Figure 2 presents the number of new deaths as a 195D vector $\vec{R}$. Data for Uzbekistan were excluded from the analysis because they were regarded as outliers. $\vec{R}$ was assembled from the different regional components, that is, the CVD data sets. Index $j$ is a running index of the local maxima encountered in the nondecreasing (time-ordered) time series.
Overall, there is a clear East-West divide in CVD death rates. Rates across North America and Western/Northern Europe tended to be lower than those across Eastern Europe, Asia, and Africa. For most of Latin America, the rates were moderate. For example, in France the age-standardized CVD death rate was around 86 per 100,000 in 2017, whereas across Eastern Europe it was around five times higher (400-500 per 100,000). Uzbekistan had the highest rate, 724 per 100,000.
Figure 3 presents the predicted annual CVD death rates (as a percentage of the entire population of a given country) over 100 years, extrapolated from Equation (10). A cut-off value of $\lambda = 0.6\%$ was used, and the 95% confidence intervals (CIs) were calculated. According to Equation (6), $p_k(\lambda)$ is directly related to the target failure probability $1 - P$ derived from Equation (1). Therefore, the system failure probability can be estimated as $1 - P \approx 1 - P_k(1)$. Note that, in Equation (6), $N$ corresponds to the total number of local maxima in the response vector $\vec{R}$. A conditioning parameter of $k = 3$ was found to be sufficient because of the observed convergence with respect to $k$ (see Equation 6). In Figure 3, the 95% CIs are relatively narrow, which is an advantage of the proposed method. Table 1 compares 100-year predictions based on the 15- and 30-year data sets. The 15-year data set was derived from the full 30-year data set by omitting odd years. As expected, the 95% CIs were wider for the truncated data set. The predicted average annual CVD death rate over the next 100 years, across all years and countries, was below 1%. Our methodology uses the available data efficiently by treating healthcare system data sets as multidimensional, and it extrapolates death rates even when the data set is relatively limited. The predicted nondimensional factor λ, indicated by the star in Figure 3, represents the probability of excess CVD mortality for any given country. Our method could also be applied to predict cancer clusters, rather than merely death rates over time, which would be of high practical importance.

FIGURE 1 Annual deaths from cardiovascular disease and cancer as a percentage of the population for 195 countries.
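The extrapolation function of Equation (10) is not reproduced in this section. In the Naess-Gaidai literature the tail of $p_k(\lambda)$ is commonly extrapolated with the form $q \exp\{-a(\lambda - b)^c\}$; the sketch below assumes that form and fits it by least squares in log space (a grid over $b$ and $c$, with $\ln q$ and $a$ solved in closed form for each grid pair). It also shows the failure probability estimate $1 - P \approx 1 - \exp(-N p_k(1))$, a normal-approximation 95% CI for an empirical probability (an assumption; not necessarily the CI construction used for Figure 3), and the 15-year subset obtained by omitting odd years. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_tail(lams: Sequence[float], ps: Sequence[float],
             b_grid: Sequence[float],
             c_grid: Sequence[float]) -> Tuple[float, float, float, float]:
    """Fit ln p = ln q - a * (lam - b)**c by least squares.

    For fixed (b, c) the model is linear in (ln q, a), so it is solved in
    closed form; the grid pair with the smallest squared error is kept.
    """
    ys = [math.log(p) for p in ps]  # requires every p > 0
    best = None
    for b in b_grid:
        if b >= min(lams):
            continue  # (lam - b)**c must be defined for every fitted lam
        for c in c_grid:
            xs = [(lam - b) ** c for lam in lams]
            n = len(xs)
            mx, my = sum(xs) / n, sum(ys) / n
            sxx = sum((x - mx) ** 2 for x in xs)
            if sxx == 0.0:
                continue
            slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
            ln_q = my - slope * mx
            sse = sum((y - (ln_q + slope * x)) ** 2 for x, y in zip(xs, ys))
            if best is None or sse < best[0]:
                best = (sse, math.exp(ln_q), -slope, b, c)
    if best is None:
        raise ValueError("no admissible (b, c) pair on the grid")
    _, q, a, b, c = best
    return q, a, b, c


def p_tail(lam: float, q: float, a: float, b: float, c: float) -> float:
    """Extrapolated p_k(lam) from fitted tail parameters."""
    return q * math.exp(-a * (lam - b) ** c)


def failure_probability(n_maxima: int, p_at_one: float) -> float:
    """1 - P ~= 1 - P_k(1) = 1 - exp(-N * p_k(1))."""
    return 1.0 - math.exp(-n_maxima * p_at_one)


def ci95(p_hat: float, n_trials: int) -> Tuple[float, float]:
    """Normal-approximation 95% CI for a binomial proportion (assumed)."""
    half = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / n_trials)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)


# 15-year subset: keep the even years of 1990-2019, omitting odd years
years = list(range(1990, 2020))
years_15 = [y for y in years if y % 2 == 0]
```

Fitting in log space keeps the very small tail probabilities on a comparable scale, and the closed-form inner solve keeps the search to a cheap grid over only the two nonlinear parameters.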

| CONCLUSIONS
Traditional methods for assessing the reliability of healthcare systems from time series data do not deal efficiently with systems characterized by high dimensionality and cross-correlations. The main advantage of our methodology is its ability to assess the reliability of high-dimensional nonlinear dynamic systems. Despite its simplicity, the novel multidimensional modeling strategy introduced herein can be used for accurate forecasting of CVD death rates in individual countries.
We analyzed 195D data, that is, CVD and cancer death rates for 195 countries worldwide, for the period 1990-2019. A novel method for analyzing the reliability of a multidimensional biosystem was applied, and its mechanisms were described in detail. Direct measurements and Monte Carlo simulations are both suitable for assessing the reliability of dynamic biological systems; however, the complexity and high dimensionality of such systems call for the further development of robust and accurate techniques that can use limited data sets efficiently.
This study predicted an average annual CVD death rate of about 1% over a 100-year period across countries and years. Under current national health management approaches, CVDs will continue to threaten the health of the world population.
This study introduced a general-purpose, robust, and easy-to-apply method for analyzing the reliability of multidimensional systems. The method has previously been validated on a wide range of simulation models, but only in the context of one-dimensional systems; in general, highly accurate predictions were obtained. Both measured and numerically simulated time series data can be analyzed. Applying the method to the data set used in this study yielded reasonable confidence intervals, indicating that it could serve as a useful tool for reliability studies of various nonlinear dynamic biological systems. Finally, the suggested methodology has many potential public health applications beyond the prediction of CVD death rates.

FIGURE 2 Left: Cross-correlations between cardiovascular disease (CVD) and cancer cases as a percentage of the population. Right: Annual death rates as a 195-dimensional vector $\vec{R}$, as a percentage of the population of the corresponding country. The cancer rate was increased fourfold to match that of CVD.

FIGURE 3 Death rate predictions over 100 years extrapolated from $p_k(\lambda)$. The critical level is indicated by a star. The 95% confidence intervals are indicated by dotted lines. The horizontal axis represents the percentage of the population. Left: predictions based on 30 years of data; Right: predictions based on 15 years of data.

TABLE 1 Predicted cardiovascular disease death rates over 100 years based on 30- and 15-year data sets.