Stochastic differential equation modeling the prevalence of rare chronic diseases with an application to systemic lupus erythematosus



In this article, we describe a system of stochastic differential equations to model the age-specific prevalence of rare chronic diseases from incidence and mortality rates. As an application, the age profile of the prevalence of systemic lupus erythematosus in England and Wales in 1995 is calculated. The results are in good agreement with the observed epidemiological measures.

1 Introduction

With a view to basic epidemiological parameters such as incidence, prevalence and mortality of a disease, it has proven useful to consider so-called state models or compartmental models (Fix and Neyman, 1951; Chiang, 1968). The model used here is also termed the illness-death model (Kalbfleisch and Prentice, 2002). It consists of the three states Normal, Disease and Death and the transitions between them. Normal means non-diseased with respect to the disease under consideration. The numbers of persons in the Normal and Disease states are denoted by S (susceptibles) and C (cases), respectively. The transition intensities (synonymously: rates) are denoted as shown in Figure 1: i is the incidence rate, and m0 and m1 are the mortality rates of the non-diseased and diseased persons, respectively. In general, the intensities depend on calendar time t and age a, with the mortality m1 also depending on the duration d of the disease.

Figure 1.

Illness-death model of a chronic disease with three states. Persons in the state Normal are healthy with respect to the considered disease. In the state Disease, they suffer from the disease. In the most general case, the transition rates depend on the calendar time t, age a, and in case of the disease-specific mortality m1, also on the duration d of disease.

When the rates do not depend on calendar time t, the model is called time-homogeneous. Then, with the additional assumption that there is no dependence of m1 on the duration, Murray and Lopez (1994) have considered a system of ordinary differential equations to relate the changes of the numbers of healthy and diseased persons with the rates of the inflows and outflows of the corresponding states:

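\[
\begin{aligned}
\frac{\mathrm{d}S}{\mathrm{d}a} &= -\bigl(i + m_0\bigr)\,S,\\
\frac{\mathrm{d}C}{\mathrm{d}a} &= i\,S - m_1\,C.
\end{aligned}
\tag{1}
\]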

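Age a plays the role of temporal progression. The linear system in Equation (1) looks relatively harmless, but the impression is misleading. In most cases, only the age-specific mortality m of the general population is well known, and the rate m1 is epidemiologically accessible only as a relative risk. Since the general mortality is the prevalence-weighted average m = (1 − p) m0 + p m1 with p = C/(S + C), expressing m0 and m1 through these quantities makes the coefficients depend on S and C, and the system becomes nonlinear. Once the functions S and C are known, the age-specific prevalence p(a) = C(a)/(S(a) + C(a)) can be calculated.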

The benefits of these equations are twofold. First, for smooth incidence and mortality rates plus an initial condition, the age profile of the number of patients, and hence of the prevalence, is uniquely determined. To state it clearly, the ‘forces’ incidence and mortality uniquely prescribe the prevalence, not only qualitatively but also quantitatively. This is the forward problem: we deduce the effect, namely the number of diseased persons, from the causes (the forces). Second, the reverse direction means inferring the incidence from the numbers of diseased persons. This is the inverse problem, as we infer from the effect to the cause.
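The forward problem can be sketched numerically. The following is a minimal illustration, assuming the system has the form dS/da = −(i + m0) S, dC/da = i S − m1 C and using a classical fourth-order Runge–Kutta scheme; the function name `solve_forward` and all rate functions below are hypothetical toy choices, not the SLE data used later.

```python
import math

def solve_forward(i, m0, m1, s0=1.0, c0=0.0, a_max=100.0, n_steps=10_000):
    """Integrate dS/da = -(i + m0) S, dC/da = i S - m1 C with classical RK4.

    i, m0 and m1 are callables of age a (rates per person-year).
    Returns lists of ages, S, C and the prevalence p = C / (S + C).
    """
    h = a_max / n_steps

    def rhs(a, s, c):
        return -(i(a) + m0(a)) * s, i(a) * s - m1(a) * c

    ages, S, C = [0.0], [s0], [c0]
    s, c = s0, c0
    for k in range(n_steps):
        a = k * h
        k1s, k1c = rhs(a, s, c)
        k2s, k2c = rhs(a + h / 2, s + h / 2 * k1s, c + h / 2 * k1c)
        k3s, k3c = rhs(a + h / 2, s + h / 2 * k2s, c + h / 2 * k2c)
        k4s, k4c = rhs(a + h, s + h * k3s, c + h * k3c)
        s += h / 6 * (k1s + 2 * k2s + 2 * k3s + k4s)
        c += h / 6 * (k1c + 2 * k2c + 2 * k3c + k4c)
        ages.append((k + 1) * h)
        S.append(s)
        C.append(c)
    prev = [cc / (ss + cc) for ss, cc in zip(S, C)]
    return ages, S, C, prev

# Toy rates (assumptions for illustration only): constant incidence,
# Gompertz-type mortality, and a mortality ratio m1/m0 of 2.5.
ages, S, C, prev = solve_forward(
    i=lambda a: 5.0e-5,
    m0=lambda a: 1.0e-4 * math.exp(0.085 * a),
    m1=lambda a: 2.5e-4 * math.exp(0.085 * a),
)
```

For constant rates the system has a closed-form solution, which offers a simple check of such an integrator before realistic rates are plugged in.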

This paper is structured as follows: in the next section, we describe the illness-death model of Figure 1 in terms of a new system of two stochastic differential equations (SDEs). As an application, in section 3, we solve a forward problem to estimate the age-specific prevalence of systemic lupus erythematosus (SLE) in England and Wales from published data. This allows the calculation of the mean age at onset of SLE, the mean duration and the burden of SLE in terms of the number of diseased persons. These values are compared with the empirically measured values.

2 Stochastic description of the illness-death model

In this section, we consider S(a) and C(a), the numbers of susceptibles and diseased persons at age a, a ≥ 0, as random variables. Let X(a) := (S(a), C(a))^t be the composite vector (the superscript t denotes transposition). For Δa > 0 define the vector ΔX of increments:

\[
\Delta X := X(a + \Delta a) - X(a).
\]

Now, we follow the reasoning of Allen (1999) and Allen (2008), who applied the theory presented here in the field of infectious disease modeling.

Choose Δa > 0 sufficiently small, such that at most one person can change the state. If a person changes the state, three cases can occur: the person may die without the disease, the person may die with the disease or the person may contract the disease. Thus, in accordance with the definition of the rates i, m0 and m1, the following assumptions about the probability distribution P are made:

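\[
P(\Delta X = x) =
\begin{cases}
m_0\,S\,\Delta a & \text{for } x = (-1, 0)^t,\\
m_1\,C\,\Delta a & \text{for } x = (0, -1)^t,\\
i\,S\,\Delta a & \text{for } x = (-1, 1)^t,\\
1 - (m_0\,S + m_1\,C + i\,S)\,\Delta a & \text{for } x = (0, 0)^t.
\end{cases}
\]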

The last case is the situation that nobody changes the state. If we further assume that the increments are normally distributed, we obtain the expected value

\[
E(\Delta X) =
\begin{pmatrix}
-(i + m_0)\,S\\
i\,S - m_1\,C
\end{pmatrix}
\Delta a
\]

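and, neglecting terms of order (Δa)², the covariance matrix

\[
V =
\begin{pmatrix}
(i + m_0)\,S & -i\,S\\
-i\,S & i\,S + m_1\,C
\end{pmatrix}
\Delta a.
\]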
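The two moments can be checked by direct enumeration of the possible transitions. The sketch below assumes that the three events (death without the disease, death with the disease, onset of the disease) occur with probabilities m0·S·Δa, m1·C·Δa and i·S·Δa; the function name and the numerical values are hypothetical and serve only as an illustration.

```python
def increment_moments(S, C, i, m0, m1, da):
    """Mean, second-moment matrix and covariance of (dS, dC) over a step da.

    Assumes the transition probabilities m0*S*da, m1*C*da and i*S*da.
    """
    events = [
        (-1, 0, m0 * S * da),  # a person dies without the disease
        (0, -1, m1 * C * da),  # a person dies with the disease
        (-1, 1, i * S * da),   # a person contracts the disease
    ]
    if sum(prob for _, _, prob in events) > 1.0:
        raise ValueError("da is too large for the one-transition assumption")
    # The 'nobody changes state' event has increment (0, 0) and adds nothing.
    mean = [
        sum(ds * prob for ds, _, prob in events),
        sum(dc * prob for _, dc, prob in events),
    ]
    second = [
        [sum(ds * ds * prob for ds, dc, prob in events),
         sum(ds * dc * prob for ds, dc, prob in events)],
        [sum(dc * ds * prob for ds, dc, prob in events),
         sum(dc * dc * prob for ds, dc, prob in events)],
    ]
    cov = [[second[r][c] - mean[r] * mean[c] for c in range(2)]
           for r in range(2)]
    return mean, second, cov

# Hypothetical state and rates, chosen only for illustration
S, C, i, m0, m1, da = 1.0e5, 50.0, 5.0e-5, 0.01, 0.025, 1.0e-6
mean, second, cov = increment_moments(S, C, i, m0, m1, da)
```

The exact covariance differs from the second-moment matrix only by the outer product of the mean with itself, which is of order (Δa)²; this is why the second-moment matrix can serve as the covariance for small Δa.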

The matrix V is symmetric and positive definite. Hence, there exists a uniquely determined matrix square root V^{1/2}. Because of the normal distribution assumption, the vector X fulfills

\[
X(a + \Delta a) = X(a) + E(\Delta X) + V^{1/2}\,\xi,
\tag{2}
\]

where ξ = (ξ1, ξ2)^t has normally distributed components ξi ∼ N(0, 1), i = 1, 2.

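Under certain smoothness conditions on the coefficient functions i, m1 and m0, the difference equation (2) is an Euler approximation to the Itō SDE system

\[
\begin{aligned}
\mathrm{d}S &= -\bigl(i + m_0\bigr)\,S\,\mathrm{d}a + b_{11}\,\mathrm{d}W_1 + b_{12}\,\mathrm{d}W_2,\\
\mathrm{d}C &= \bigl(i\,S - m_1\,C\bigr)\,\mathrm{d}a + b_{21}\,\mathrm{d}W_1 + b_{22}\,\mathrm{d}W_2.
\end{aligned}
\tag{3}
\]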

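In this expression, W1 and W2 are independent Wiener processes (Kloeden and Platen, 1999), and the 2 × 2 matrix B = (bij) is the uniquely determined square root of the covariance matrix divided by Δa:

\[
B = \left(\frac{V}{\Delta a}\right)^{1/2} =
\begin{pmatrix}
(i + m_0)\,S & -i\,S\\
-i\,S & i\,S + m_1\,C
\end{pmatrix}^{1/2}.
\]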
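A path of S and C can be simulated along these lines. The following is a minimal sketch, assuming the drift (−(i + m0) S, i S − m1 C) and the diffusion matrix B as the square root of the matrix with entries (i + m0) S, −i S and i S + m1 C. It uses the closed-form square root of a symmetric positive definite 2 × 2 matrix M, namely (M + √det(M)·I) / √(tr(M) + 2√det(M)). The function names, the toy rates and the clamping of the counts at zero are assumptions made for this illustration and are not taken from the text.

```python
import math
import random

def sqrt_spd_2x2(a, b, d):
    """Entries (r11, r12, r22) of the symmetric square root of [[a, b], [b, d]]."""
    s = math.sqrt(max(a * d - b * b, 0.0))  # square root of the determinant
    t = math.sqrt(a + d + 2.0 * s)          # trace of the square root
    if t == 0.0:                            # zero matrix has a zero root
        return 0.0, 0.0, 0.0
    return (a + s) / t, b / t, (d + s) / t

def simulate_path(i, m0, m1, s0, c0, a_max, n_steps, rng):
    """One Euler-Maruyama path (ages, S, C); i, m0, m1 are callables of age."""
    h = a_max / n_steps
    ages, S, C = [0.0], [s0], [c0]
    s, c = s0, c0
    for k in range(n_steps):
        a = k * h
        ia, m0a, m1a = i(a), m0(a), m1(a)
        # Diffusion matrix B at the current state
        b11, b12, b22 = sqrt_spd_2x2((ia + m0a) * s, -ia * s, ia * s + m1a * c)
        # Independent Wiener increments with variance h
        dw1 = rng.gauss(0.0, math.sqrt(h))
        dw2 = rng.gauss(0.0, math.sqrt(h))
        s_new = s - (ia + m0a) * s * h + b11 * dw1 + b12 * dw2
        c_new = c + (ia * s - m1a * c) * h + b12 * dw1 + b22 * dw2
        # Assumption (not in the text): clamp at zero to keep counts non-negative
        s, c = max(s_new, 0.0), max(c_new, 0.0)
        ages.append((k + 1) * h)
        S.append(s)
        C.append(c)
    return ages, S, C

# Toy setting (hypothetical rates and cohort size), fixed seed for repeatability
rng = random.Random(2024)
ages, S, C = simulate_path(
    i=lambda a: 5.0e-5,
    m0=lambda a: 1.0e-4 * math.exp(0.085 * a),
    m1=lambda a: 2.5e-4 * math.exp(0.085 * a),
    s0=1.0e6, c0=0.0, a_max=100.0, n_steps=1000, rng=rng,
)
prevalence = [cc / (ss + cc) if ss + cc > 0 else 0.0 for ss, cc in zip(S, C)]
```

Repeating `simulate_path` with different seeds gives an ensemble of prevalence paths from which pointwise summaries can be computed.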

The system (3) can be solved numerically, and the associated ‘prevalence path’ p(a) = C(a)/(S(a) + C(a)) can be derived. Once the prevalence is known, epidemiological characteristics can be calculated easily. The total number of persons with the disease is given by

\[
C^{\mathrm{tot}} = \int_0^\infty p(a)\,N(a)\,\mathrm{d}a,
\tag{4}
\]

where N(a) := S(a) + C(a) denotes the number of persons aged a. Moreover, the mean duration of the disease is the number of person-years of all diseased persons divided by the total number of persons who ever contracted the disease. Thus, it holds

\[
\bar{D} = \frac{\int_0^\infty p(a)\,N(a)\,\mathrm{d}a}{\int_0^\infty i(a)\,\bigl(1 - p(a)\bigr)\,N(a)\,\mathrm{d}a}.
\tag{5}
\]

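Similarly, the mean age at onset of the disease may be computed by

\[
\bar{A} = \frac{\int_0^\infty a\,i(a)\,\bigl(1 - p(a)\bigr)\,N(a)\,\mathrm{d}a}{\int_0^\infty i(a)\,\bigl(1 - p(a)\bigr)\,N(a)\,\mathrm{d}a}.
\tag{6}
\]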
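These characteristics can be approximated by numerical integration over an age grid. The sketch below is an illustration under stated assumptions: it reads the total number of cases as the integral of p(a) N(a), the number of new cases as the integral of i(a) (1 − p(a)) N(a), the mean duration as their ratio, and the mean age at onset as the onset-weighted mean of a. The trapezoidal rule, the function names and the constant toy inputs are hypothetical choices.

```python
def trapezoid(y, x):
    """Trapezoidal rule for samples y on the grid x."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k])
               for k in range(len(x) - 1))

def disease_characteristics(ages, prev, inc, pop):
    """Total cases, mean duration and mean age at onset from age profiles.

    ages: age grid; prev: prevalence p(a); inc: incidence rate i(a);
    pop: number of persons N(a) per unit of age.
    """
    cases = [p * n for p, n in zip(prev, pop)]
    onsets = [r * (1.0 - p) * n for r, p, n in zip(inc, prev, pop)]
    total_cases = trapezoid(cases, ages)
    new_cases = trapezoid(onsets, ages)
    mean_duration = total_cases / new_cases
    mean_onset_age = trapezoid([a * o for a, o in zip(ages, onsets)], ages) / new_cases
    return total_cases, mean_duration, mean_onset_age

# Toy input (assumption): flat prevalence, incidence and population profiles
ages = [float(k) for k in range(101)]
prev = [0.001] * 101
inc = [1.0e-4] * 101
pop = [1000.0] * 101
total_cases, mean_duration, mean_onset_age = disease_characteristics(ages, prev, inc, pop)
```

Because the prevalence enters the onset weights only through 1 − p(a), path-to-path variation in p barely changes the mean age at onset when the disease is rare, whereas the total number of cases and the mean duration scale directly with p.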

3 Application to systemic lupus erythematosus

In this section, the SDE system (3) is applied to epidemiological data of SLE in England and Wales. SLE is a severe rheumatic disease with a variety of clinical manifestations. Despite several therapy options, patients are often heavily restricted in quality of life and ability to work. Epidemiological data are rare. Here, the incidence data for male and female individuals are taken from the UK General Practice Research Database for the years 1990–1999 as reported by Somers et al. (2007). The mortality m1 of SLE patients is modeled by the relative mortality as reported by Bernatsky et al. (2006). In order to apply our model, the duration of SLE is not taken into account.

For the mortality of the non-diseased persons, we take the mortality of the general population. Because of the low prevalence of SLE, this is legitimate. Then, 5000 solution paths of the SDE system (3) are simulated by the Euler–Maruyama method (Kloeden and Platen, 1999), and the corresponding age-specific prevalences are calculated. This is carried out for male and female individuals separately.

To give an illustration, Figures 2 and 3 each show an example path of the prevalence (dotted line). Furthermore, the graphs show the regions where 95% of the 5000 solution paths are located. The upper and the lower dashed curves indicate the 97.5% and 2.5% quantiles of the 5000 prevalence paths, respectively. That is, for each age a, the corresponding quantiles of the empirical distribution of the 5000 values at age a are calculated. Additionally, Figures 2 and 3 show the curve of the median (solid line).
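The pointwise quantile curves can be computed as in the following sketch. It assumes that all paths share the same age grid and uses linear interpolation between order statistics, which is one common quantile convention and not necessarily the one used for the figures; the function names and the toy paths are hypothetical.

```python
def quantile(sorted_vals, q):
    """Empirical q-quantile of an ascending list, with linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def pointwise_band(paths, lower=0.025, upper=0.975):
    """For each age index, return (lower quantile, median, upper quantile)."""
    n_ages = len(paths[0])
    band = []
    for k in range(n_ages):
        vals = sorted(path[k] for path in paths)  # values of all paths at age k
        band.append((quantile(vals, lower),
                     quantile(vals, 0.5),
                     quantile(vals, upper)))
    return band

# Toy ensemble (assumption): 101 constant 'paths' with values 0, 1, ..., 100
paths = [[float(j)] * 3 for j in range(101)]
band = pointwise_band(paths)
```

Note that the band is built age by age, so a single simulated path may leave the 95% region at some ages while staying inside it at others.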

Figure 2.

Age-specific prevalence of systemic lupus erythematosus in male individuals.

Figure 3.

Age-specific prevalence of systemic lupus erythematosus in female individuals.

The median curves indicate the large difference in SLE prevalence between male and female individuals. This is because gender is a risk factor for SLE, and the incidences in male and female individuals differ strongly. The hazard ratio (female versus male) is about 10 in the age group of 25–35 years. It decreases to about 5 in the following age classes up to 65 years and drops to about 2 thereafter.

To measure the burden of SLE in England and Wales, we estimate the total number of persons with SLE by Equation (4). The number N(a) is obtained from official vital statistics for the year 1995 (Office for National Statistics, 2011).

Figures 4 and 5 show the distributions of the total number of persons with SLE obtained from the 5000 paths. Again, the enormous difference between male and female individuals becomes obvious. While a median of 50.1 thousand female individuals are affected in England and Wales (interquartile range (IQR): 40.3–60.9), for male individuals, the corresponding number is 9.2 (IQR: 6.6–12.5) thousand. The mean duration of SLE in male and female individuals is 23.2 (IQR: 16.7–31.5) and 23.9 (IQR: 16.2–29.1) years, respectively. Thus, genders do not differ much in that respect. With respect to the mean age at onset, the 5000 paths yield 51.750 (IQR: 51.748–51.752) and 46.108 (IQR: 46.102–46.113) years in male and female individuals, respectively. It is striking that the mean age at onset has a relatively low variability in both genders. This is due to the factor 1 − p(a), which is close to unity irrespective of the variations in p.

Figure 4.

Histogram of the number of male individuals with systemic lupus erythematosus.

Figure 5.

Histogram of the number of female individuals with systemic lupus erythematosus.

4 Discussion

In the domain of infectious diseases, the theory of deterministic differential equations was generalized towards SDEs more than a decade ago (Allen, 1999). In this article, this transformation has been accomplished in the field of chronic diseases. The numbers of healthy and diseased persons have been modeled by a new system of two Itō SDEs. In rare chronic diseases, such as SLE, a stochastic formulation might be preferable to a deterministic one. Even if the incidence and mortality rates are well known, statistical fluctuations in the number of diseased persons have a strong impact on the age course of the prevalence. This becomes obvious in Figures 4 and 5, where the distributions of the total numbers of male and female individuals with SLE in England and Wales in 1995 have been estimated. The middle 50% of the distribution spans about 6 and 20 thousand male and female individuals, respectively. Additionally, other disease characteristics have been calculated. The mean age at onset as derived in our theoretical model is 51.8 and 46.1 years for male and female individuals, respectively. The corresponding empirical values of 52.2 and 46.3 years observed in the register data by Somers et al. (2007) are in good agreement. Another hint for the appropriateness of the methods described here comes from the basic epidemiological equation that overall prevalence equals the product of overall incidence and duration of the disease (Szklo and Nieto, 2007, chapter 2). The overall prevalence in male and female individuals can easily be obtained from Equation (4) and the age pyramid N(a). In our model, the overall prevalence divided by the mean duration (cf. Equation (5)) yields overall incidences of 1.58 and 7.95 per 100 000 person-years for male and female individuals, respectively. Again, this is close to the empirically observed values of 1.60 and 8.01 per 100 000 person-years (Somers et al., 2007, Table 1).

When relating the results of this study to other epidemiological data from the UK, the overall incidence for female individuals in particular appears high. Hopkinson, Doherty, and Powell (1993) found a value of only about 6.5 per 100 000 person-years. However, it has to be noted that the data of Hopkinson et al. (1993) are, in a way, inconsistent: if we calculate the mean duration of SLE in female individuals by the basic epidemiological equation from their data, we find a mean duration of only about 7 years, which contradicts common survival times of persons with SLE (Cervera et al., 2003).

Although there is an ongoing debate about the differences between childhood-onset and adult-onset SLE (Livingston et al., 2011), as well as about the duration of SLE as a risk factor for comorbidities and mortality, it has to be noted that reliable epidemiological data about age at onset, duration of the disease, mean age of diseased persons and so on are sparse or even lacking. As an example, the systematic review by Danchenko, Satia, and Anthony (2006) about the global burden and epidemiology of SLE found just one study from Germany, the European country with the most inhabitants. The associated publication (Zink et al., 2001) was about prevalent cases but did not mention a prevalence estimate. Incidence was not addressed.

Theoretical models such as the one presented here may help to at least roughly estimate the burden and characteristics of rare chronic diseases. This is especially true in countries with few epidemiological or administrative data. However, the approach described here has several limitations. First, the SDE system (3) does not take into account calendar time trends. In the application to SLE, it has been shown that the relative mortality of SLE patients undergoes a secular trend (Bernatsky et al., 2006). Second, the same publication reports that the relative mortality in persons with SLE depends on the duration of the disease: the longer a person has been diseased, the lower the relative mortality risk. Duration dependence is not modeled in Equation (3). Hence, the new approach may be used as an approximation only, and more evaluations of the method are necessary to examine the validity and applicability of the model. Nevertheless, the disease characteristics derived by the new method in this work are consistent with the empirical data and suggest a promising direction for further work.