Concept and solution of digital twin based on a Stieltjes differential equation

We introduce the concept of a digital twin by using Stieltjes differential equations. A precise mathematical definition of solution to the problem is presented. We also analyze the existence and uniqueness of solutions, introduce the concept of main digital twin. As a particular case, the classical compartmental SIR (susceptible, infected, recovered) epidemic model is considered and we study the interrelation between the digital twin and the system. In doing so, we use Stieltjes derivatives to feed the data from the real system to the virtual model which, in return, improves it in real time. Numerical simulations with real data of the COVID‐19 epidemic show the accuracy of the proposed ideas.


INTRODUCTION
A digital twin (DT) can be defined as an evolving digital profile of the historical and current behavior of a physical object or real process that helps optimize the performance of the real process. It can also be defined as an avatar of a real physical system which exists in the computer. 1,2 It is important to warn that a DT is not just a mathematical model or a virtual counterpart. It may be a classical mathematical model, but with the plus of real-time inter-connections and feedback from the real product or system to the virtual model and vice versa. A DT is, in other words, a virtual model representing its physical counterpart which interacts with it in a continuous loop. The name comes from a Roadmap report by John Vickers of NASA in 2010 in which the authors goal was to improve the physical model simulation of spacecraft. 2,3 A DT consists of three distinct parts, namely: the physical object or real system, the digital/virtual model, and connections between these two objects. This method has been applied to, for example, aircraft engines, 4,5 wind turbines, 6 and networks, 7 just to cite some of them-cf. previous study. 8 The main idea is that the real and digital objects are related in a way such that data flows from the real system to the virtual one, and the information from the digital one is available to understand and predict the evolution of the real system, or even act on it to modify its behavior. In this way, both objects interact in manner which is twice beneficial: allowing the obtaining of a more accurate model and shaping the real objects behavior accordingly.
The study of pandemics by using mathematical compartmental models can be traced back to Kermack and McKendrick. 9 The objective of this method is to divide the total population in disjoint groups or compartments, depending on the state of health, and then study the spread level of a certain disease among the population. In this setting, every individual in the same group has the same characteristics, which simplifies the study of the model. The simplest epidemiological model is usually referred as SIR model and divides the total (constant) population in three compartments, namely: susceptible to infection (those individuals who have not been previously exposed to the disease), infected (those individuals who have been infected by the illness), and recovered (those who have passed the disease). Despite its simplicity, the SIR model can yield powerful insight in what managing a pandemic concerns, so it has been applied to analyze the spread of COVID-19, 10 among others. That said, there are many variations of the basic SIR model, and some of those include other compartments such as exposed, hospitalized, or superspreaders, 11 with temporary immunity and Lévy jumps, 12 which consider imperfect vaccination, 13 that take into account fractional derivatives, 14 difference equations, 15,16 and other variants as the stochastic SIS epidemic models. 17 The classical derivative can be also substituted by other operators such as Stieltjes derivatives. 18,19 This kind of derivatives (assuming that g is a continuous function at x) can be pictured as taking the value for some nondecreasing and left-continuous function g ∶ R → R. This mathematical tool possesses many of the important properties of the usual derivative. Moreover, it is extremely helpful when it comes to analyzing systems of differential equations with impulses or latency periods. 20 In our case, the impulses can be understood as corrections made to the initial model in order to take into consideration the new (real) data. Thereby, link between DT and Stieltjes derivatives appears in a natural way.
The main aim of this work is to introduce a precise mathematical definition of a solution to a digital twin, by using the Stieltjes derivative. Existence and uniqueness of solution are obtained by considering the proposed definition. Reinterpreting the concept of the digital twin as a Stieltjes differential equation allows us to study issues such as the existence and uniqueness of the said concept from a mathematical point of view. As an example, we will consider the simple SIR model to which we will feed real COVID-19 data which will illustrate, in turn, the utility of this approach. Numerical experiments are also presented to show the proposed definition and results.
The structure of this work is as follows: In Section 2, we introduce the basic definitions and notations. More precisely, we present the classical SIR model by using classical derivatives, and introduce Stieltjes derivatives. In Section 3, we give a new interpretation of the digital twin(s) by using the aforementioned Stieltjes derivatives and present a definition of solution, as well as a result concerning the existence of solution to the corresponding system of differential equations. Numerical simulations are presented in Section 4, comparing this approach to real data of the COVID-19 pandemic. Finally, we include the conclusions in Section 5.

BASIC DEFINITIONS AND NOTATIONS
Let us consider the SIR compartmental model representing the evolution of a disease at each time t ∈ [0, T]. 21 This is the so-called nominal model and it is calibrated at an initial time. Here, as usual, we have considered three compartments for the population, namely: S(t) denotes the population of susceptible individuals to an infectious, but not deadly, disease at time t; I(t) stands for the population of infected individuals at time t; and R(t) is used to represent the recovered individuals at time t. We have used classical derivatives in the SIR model, and we have denoted by the rate at which an infected person infects a susceptible person, and by the rate at which infected people recover from the disease. There is no explicit solution to the SIR model 22 and the numerical solution is usually calculated by using an appropriate numerical method requiring initial conditions for S(t), I(t) and R(t), or by considering a power series expansion. 23 We would like to recall that the definition of basic reproduction number, which can be understood as the expected number of secondary cases produced by a single infection in a completely susceptible population. We will be working with Stieltjes derivatives, so it is necessary to introduce some concepts regarding this kind of derivative. Let I ⊂ R be an interval and g ∶ I → R a left-continuous non-decreasing function. We will refer to such functions as derivators. Denote by u the usual topology of R. We define the g-topology g 18 as the family of those sets U ⊂ I such that for every x ∈ U there exists ∈ R + such that, if ∈ I satisfies |g( ) − g(x)| < , then ∈ U.
We say is g-continuous at x 0 ∈ I if for every ∈ R + there exists ∈ R + such that | ( ) − (x)| < if |g( ) − g(x)| < . We say is g-continuous on I if it is g-continuous at every point in I. It can be checked that a function ∶ (I, g ) → (R, u ) is continuous if and only if it is g-continuous. 24, Lemma 6 We define the set We also define D g as the set of all discontinuity points of g. Observe that, given that g is nondecreasing, we can write , t ∈ R, and g(t + ) denotes the right handside limit of g at t.
provided the corresponding limits exist. In that case, we say that is g-differentiable at t.
Associated to g we consider the Lebesgue-Stieltjes measure space (I,  g , g ), where  g and g are the -algebra and measure constructed in an analogous fashion to the classical Lebesgue measure, where the length of [a, b) is given by . We must emphasize that for g(t) = t we recover the classic Lebesgue measure space and ′ g (t) = ′ (t) is the usual derivative. Given a derivator g, we say that a function ∶ I → R is g-absolutely continuous on I ( ∈  g (I)) if given > 0 there exists > 0 such that for every pairwise disjoint family of subintervals {(a n , b n )} m n=1 such that ∑ m n=1 (g(b n ) − g(a n )) < we have that ∑ m n=1 | (b n ) − (a n )| < . We refer the reader to the reference 19 for details concerning the Lebesgue-Stieltjes measure space.

THE DIGITAL TWIN OF A SIR MODEL
Given the epidemiological compartmental model of SIR type detailed in (1), we can construct its digital twin as where t ∈ [0, T] and t s ∈ [0, T] are, respectively, the system time and what we shall call, for lack of a better name, "slow time." 1 The slow time t s can be thought of as a time variable which is much slower than t, such as, for instance, the number of cycles of the epidemic, some prescribed times at which the experimental data is available, or just a different time scale of some sort which carries with it qualitative and quantitative changes in the system which we have to take into consideration. These changes are represented by the fact that (t s ) and (t s ) change with t s .
Thus, S, I, and R are functions of two variables, that is, the solution will depend also on t s , so we can write S(t, t s ), I(t, t s ), R(t, t s ). As a consequence, the equations of the dynamical system are expressed in terms of the partial deriva-tives with respect to the time variable t. This will not imply that the SIR model (1) has turned into a partial differential equation system since the infinitesimal dependence is restricted to the variable t.
For t s = 0 (the initial time) the digital twin reduces to the original model since the only information available at that time is the initial conditions. It is important to observe that the rates (t s ) and (t s ) are unknown, so it will be necessary to estimate them from the data.
It is assumed that changes in (t s ) and (t s ) are so slow that the dynamics of (3) are effectively decoupled from these functional variations. As a consequence, both (t s ) and (t s ) are constant as far as the instantaneous dynamics of (3), as a function of the time t, is concerned.
The system receives real data at times determined by t s . We can assume that t s ∈ T s , where T s = T s ⊂ [0, T] is a locally finite set. Since, in this case, we are taking as time domain [0, T] instead of R or [0, ∞), this implies that T s is finite. We write T s = {t k s } k≥0 and assume that t 0 s = 0 and t k s < t k+1 s , for k ≥ 1. As previously mentioned, at each time t k s , with k ≥ 1, the system receives data that we can use to estimate the parameters (t s ) and (t s ) and also to correct the evolution of the digital twin. For example, suppose that at each of the times t k s the system receives real data from the variables S, I and R: where, if we assume that the times in T s are evenly spaced by a time span t > 0, t k = t k s − t, = 0, … , M, are the set of measurement times and S k , I k and R k , = 0, … , M are real data at times t k , = 0, … , M. In particular, t k s = t k 0 and it may happen that t k−1 s ∈ {t k ∶ = 0, … , M} if we consider overlap in the data that we will use to adjust the coefficients (t s ) and (t s ). For instance, if the digital twin receives a data packet every two days made up of real data of four days back and the periodicity of the data is daily ( t = 1), we are in the previous situation (M + 1 = 4). For the first two elements of T s we have that: Note that in the above situation, real data at times 2 and 4 are present in the data packets that are received at times t 1 s = 4 and t 2 s = 6. However, the real data is often corrected if errors are detected in the measurements. For example, in the context of the COVID pandemic, the government of Spain frequently corrected previous real data when they detected errors in the information received from the local governments. So, at each time step t k , we have the opportunity of correcting possible errors. With these real data, we can estimate the value of the parameters (t k s ) and (t k s ): At time t 0 s , we assume that (t 0 s ) and (t 0 s ) are known and also the initial data (S 0 0 , I 0 0 , R 0 0 ). Additionally, in order to fit the digital twin to the real data at the points t k s , we will assume that for k ≥ 0. Therefore, we can add to (3) the previous assumptions and consider the following system: Next, we shall introduce the concept of solution for the previous system (4). On the one hand, it only makes sense to consider the solution at pairs (t, t k s ), with t ≥ t k s , with k ≥ 0. Indeed, the system receives real data at time t k s . Following the principle of causality, it does not make sense to consider the solution at (t, t k s ) with t < t k s for the data provided by t k s is not yet available at time t k . On the other hand, given an element t ∈ [0, T], there exists a finite number of pairs (t, t k s ) with t > t k s . Thus, we have a finite number of values for S, I and R to choose from at time t: As a consequence, we can define the solution to (4) in the following way.

Definition 2.
We define the solution to the problem (4) as a multivalued map: where ⊔ denotes the disjoint union and and each triple (S(·, t k s ), I(·, t k s ), R(·, t k s )) is a solution to system (1)  Remark 1. If we are willing to lose the information regarding which value t s generates which point in R 3 , we can simplify the definition to a multivalued map where From the previous multivalued solution, we can define the concept of the main digital twin (MDT) as the "best forecast" that we can make at all times. We assume that this best forecast at a time t corresponds to the branch associated with the closest previous t k s element. Definition 3. Under the previous hypotheses, we define the main digital twin (MDT) as follows: Observe that, given an element t ∈ (t k s , t k+1 s ] (analogous for t ∈ (sup T s , T]), and this limit is not, in general, equal to MDT(t k s ) = (S(t k , t k−1 s ), I(t k , t k−1 s ), R(t k , t k−1 s )). In some way, the main digital twin can be interpreted as the solution to a system of differential equations with impulses. Indeed let us define We have that the main digital twin MDT(t) = (S(t), I(t), R(t)) is the solution to where m is the Lebesgue measure on [0, T]. If we define x(t) = (x 1 (t), x 2 (t), x 3 (t)) = (S(t), I(t), R(t)), , Equation (7) admits a classical formulation of a system of differential equations with impulses: Differential equations with impulses are a special case of differential equations with Stieltjes derivatives. 18,19 Therefore, we can define the concept of solution to problem (8) in terms of a system of differential equations with Stieltjes derivatives. We have the following result.

Theorem 1. Let us assume that sup T s < T, consider the derivator
and define

Then x is a solution to (8) if and only if x solves
Proof. The proof is a particular case of Márquez-Albés, 25, Theorem 3.43 which itself draws from López Pouso and Rodríguez. 19, Section 3. 2 We include an adaptation for the convenience of the reader. First of all, note that the derivator (9) is well defined in the sense that it is non decreasing function and left-continuous. We must observe that, given an element t k s ∈ T s ∖{t 0 s }, g(t k s + ) − g(t k s ) = 2 −k > 0. So let us see now each of the conditions separately: Let x be an element that solves (8). Given an element t ∉ T s ∖{t 0 s } we have, since T s = T s , that Therefore, we have that (x ) ′ g (t) exists if and only if x ′ (t) exists and both derivatives are equal, so Finally, given an element t k s ∈ T s ∖{t 0 s }, we have that and then x be an element that solves (10). Given an element t ∈ [0, T]∖(T s ∖{t 0 s }), we can proceed as in the previous case and we have that Finally, for elements t k s ∈ T s ∖{t 0 s }: which concludes the proof.
Therefore, the concept of the main digital twin (4) can be understood in the following sense.

Definition 4 (Main digital twin). We say that a function x ∈
Remark 2. Observe that • We have encoded in the derivator g s the times at which the main digital twin interacts with the real model. • In the term F s we have coded how the real data correct the evolution of the main digital twin (MDT). This correction is materialized in the modification of the parameters and , and in the correction of the MDT at the times at which it interacts with the real model.
Next we show a local result of existence and uniqueness for the main digital twin associated to (4) in the case T s finite.

Theorem 2. (Local existence and uniqueness of the main digital twin for (4)). Assume T s finite. Then there exists ∈ (0, T] such that (4) has an unique solution in the interval [0, ).
Proof. The proof is a particular case of Frigon and López Pouso. 18,Theorem 7.4 For r > 0 arbitrary we shall check the hypothesis of Frigon and López Pouso. 18,Theorem 7.4 In this case, the function F s occurring in (11) is given by For every x ∈ B ( , where we are considering the Euclidean norm in R 3 , the function F s (·, x) is clearly Borel measurable, in particular, g-measurable. Indeed ([0, T],  g , g ) is a Borel regular space by Rudin. 26,Theorem 2.18 It is clear from (6) that s , s ∈  1 g ([0, T)), so F s (·, (S 0 0 , I 0 0 , R 0 0 )) ∈  1 g ([0, T)) as well. Given x, y ∈ B ( , we have for t ∈ [0, T]∖(T s ∖{t 0 s }): On the other hand, given an element t k s ∈ T s ∖{t 0 s }, Thus if we denote by and also From now on, in order to simplify the matters, we will assume that [0, T) the maximal interval in which Theorem 2 is fulfilled.

NUMERICAL SIMULATIONS
In this section we will present the numerical results that we have obtained using real data from the evolution of COVID in Galicia (Spain). In our case we have considered real data from the May 2, 2020 (day 1) to April 11, 2021 (day 344). It is important to mention that real data were available since March 13, 2020, however, due to the change in the data collection methodology that took place on April 29, 2020, we have considered it appropriate to only take into account data from May 2, 2020 on. In Figures 1 and 2, the evolution of the susceptible, infectious and recovered individuals is represented in the time period described above.
For the numerical simulations, we have considered that the digital twin receives data each 7 days and we have used M + 1 = 15 days for adjusting the coefficients and . Thus, t 0 s = 1, t 1 s = 15, t k s = t k−1 s + 7, for k ≥ 2. The adjustment of the coefficients (t 1 s ) and (t 1 s ) will be carried out using the data collected at times {1, … , 15}, for adjusting (t 2 s ) and (t 2 s ) we have used data collected at times {8, … , 22}, and so on.   To fit the data we have used a classical fitting method based on solving the following least squares optimization problem: where being (S(t), I(t), R(t)) the numerical solution to To solve numerically the previous equation we have used the Matlab command ode45 and, for solving the optimization problem (12), we have used the fminsearch Matlab function. Computations have been performed on a 2019 MacBook Pro (2,5 GHz Intel Core i5 with 4 kernels). The CPU time required to make the 48 adjustments was 6.88 s. We observe that in the fitting process, we have not used the S variable due to two reasons. On the one hand, the different scale in the data could produce unsatisfactory adjustments and, on the other, the variable S can be obtained from the previous ones taking into account the size of the population (2.702.592 individuals), so it is not necessary at the optimization step. In Figures 3 and 4 and Figures 5 and 6, we include two examples of the fitting process for t 1 s = 15 and t 17 s = 127, respectively. In the next figures, we present the evolution of the digital twin (DT) solution to the multivalued problem (4): in Figure 7, we find the evolution of the multivalued solution for the susceptible individuals (note that we have restricted the scale for a better display of the results); in Figures 8 and 9, we show the evolution for the infectious and recovered individuals,   respectively. Observe that given a time t ∈ [t s , t +1 s ), we have + 1 forecasts {(S(t, t k s ), I(t, t k s ), R(t, t k s )), k = 0, 1, … , }. In these figures, the continuous non-emphasized lines correspond to realities that would have occurred if the conditions for the spread of the pandemic had not been modified (confinement, social distancing, use of masks, etc.). In this way, the multivalued solution introduced (Definition 2) implies that the same situation can be reached at a given moment from different previous situations. Of course, some potential realities were avoided due to different measures implemented by the authorities. Now, it is also the case with vaccination against COVID-19. We would also like also to highlight that the basic reproduction number (2) changes in the period of analysis, as represented in Figure 10. Now, we will consider the Main Digital Twin (MDT) as the best forecast that we can make at all times (5). In Figure 11, we represent the evolution of the main digital twin for the susceptible individuals; in Figure 12, the evolution of MDT associated to the infectious individuals and, in Figure 13, the evolution for the recovered individuals. All of the numerical simulations have been done using the ode45 Matlab function. The total CPU time required to perform the complete simulation of MDT was 0.18 s.
Finally, we will compare the main digital twin in terms of definition (5) with the solution of the Stieltjes differential Equation (11). The numerical approximation of the solution to a system of Stieltjes differential equations was introduced in Fernández and Tojo 27 where a predictor-corrector numerical scheme to approximate the solution to a Stieltjes differ-  x + i, = x i,k + F s,i (t , x )Δ + g s (t ), x * i, +1 = x + i,k + F s,i (t + , x + )(g s (t +1 ) − g s (t + )), x i, +1 = x + i, + 1 2 ( F s,i (t + , x + ) + F s,i (t − +1 , x * +1 ) ) (g s (t +1 ) − g s (t + )), for every = 0, … , N and i = 1, 2, 3, being x = (x 1, , x 2, , x 3, ). For the numerical simulations that we will present below, we have considered h = 0.5. In Figures 14-16, the comparison between the numerical solution to (5) using ode45 and the solution to (11) using the scheme (13) for the susceptible, infectious and recovered individuals. Although the calculation of the distance between the numerical approximation of (5) and the numerical approximation (11) is not relevant since both are numerical approximations, in the simulations carried out we have observed a maximum difference of 1.07e+01 for the susceptible individuals, 4.08e+00 for the infectious individuals and 2.48e+00 for the recovered individuals. If we consider h = 0.25, we obtain a maximum difference of 5.56e−01, 5.65e−01, and 2.24e+00 for, respectively, the susceptible, infectious, and recovered individuals. The CPU time required to perform the simulation in the case h = 0.5 was 0.14 s and 0.15 s in the case h = 0.25.

CONCLUSIONS
The rise of digital twins motivated by industrial requirements puts into focus the need for this relatively new concept to be addressed and formalized in a clear and rigorous way. In this letter we propose a digital twin for a SIR mathematical compartmental model. In doing so, Stieltjes derivatives have been used intensively giving rise to a precise definition of solution. We have also analyzed the existence and uniqueness of the solution to the mathematical problem. Moreover, we have applied the proposed ideas to a specific problem: the spread of the pandemic of COVID-19 in Galicia. The numerical experiments performed show that the proposed definitions and results can be used to explain other pandemics.