Change time estimation uncertainty in nonlinear dynamical systems with applications to COVID‐19

The impact that each individual non‐pharmaceutical intervention (NPI) had on the spread rate of COVID‐19 is difficult to estimate, since several NPIs were implemented in rapid succession in most countries. In this article, we analyze the detectability of sudden changes in a parameter of nonlinear dynamical systems, which could be used to represent NPIs or mutations of the virus, in the presence of measurement noise. Specifically, by taking an agnostic approach, we provide necessary conditions for when the best possible unbiased estimator is able to isolate the effect of a sudden change in a model parameter, by using the Hammersley–Chapman–Robbins (HCR) lower bound. Several simplifications to the calculation of the HCR lower bound are given, which depend on the amplitude of the sudden change and the dynamics of the system. We further define the concept of the most informative sample based on the largest ℓ2 distance between two output trajectories, which is a good indicator of when the HCR lower bound converges. These results are thereafter used to analyze the susceptible‐infected‐removed model. For instance, we show that performing analysis using the number of recovered/deceased, as opposed to the cumulative number of infected, may be an inferior signal to use since sudden changes are fundamentally more difficult to estimate and seem to require more samples. Finally, these results are verified by simulations and applied to real data from the spread of COVID‐19 in France.


INTRODUCTION
The current COVID-19 pandemic has shown that most countries in the world were poorly prepared to deal with spread detection, analysis, and mitigation. For example, some reports state that the virus was present in Europe weeks before the official counting started. 1 This miss in detection can be attributed to the virus's relatively large asymptomatic spread. 2 Typically, hosts seek medical attention only once they show symptoms, and testing asymptomatic hosts is not prioritized due to limited testing capabilities. These limitations makes it difficult to obtain a good overview of the current infection state in a population. Furthermore, different countries measure cases and deaths of COVID-19 differently, making it difficult to obtain a good comparison between countries with regards to, for instance, the virus spread, mortality rate, and recovery rate. 3 Due to the quick spread of the virus, which led to hospital beds saturating in many countries, [4][5][6] virtually all countries implemented some form of non-pharmaceutical interventions (NPI). 7 The goal of these interventions was mainly to "flatten the curve," 8,9 implying that a reduction of the virus spread could be achieved through a change of behavior in the population. Mathematical models of the arguably most impactful NPI, namely lockdown, shows mixed results on how much it affects the spread. [10][11][12] These discrepancies arise from the different assumptions and models that are used to model the dynamics of the disease spread. For instance, Flaxman et al. 7 measure the impact of different NPIs on the reproduction number, R 0 , using raw data from 11 European countries that were dealing with the COVID-19 pandemic during the first wave of infections in the spring of 2020 by a priori assuming that the de facto change in the spread parameter occurs the same day that the NPI is implemented. Their analysis attributes most of the actual change in the spread to the final NPI which could be due to the inherent delays in the system, either because it takes some time for an NPI to be effective, or because a change in the spread only manifests itself in the measurements a few days later. Additionally, the separability of multiple changes implemented in rapid succession is hard to assess, which makes it difficult to confidently attribute parts of the change to a specific NPI. Therefore, since the de facto change occurs much later than the intervention dates, the cumulative change tends to become aggregated in the last considered time step, implying that there is a large bias in the estimation of the NPIs' impact. Finally, the Bayesian framework, which is used in Flaxman et al. requires an assigned prior on the estimates, 13 which could be subject to systemic error based on prior assumptions about the spread.
These problems highlight the need for different tools in the analysis of the NPI efficiency, which is the main motivation for this article. Specifically, we aim to provide fundamental analyses that shine light on what underlying structures in a dynamical system determine estimation uncertainty. By properly understanding estimation uncertainty, we may answer common questions that were posed during the COVID-19 pandemic, such as; how long should the NPIs remain in effect, 14 when it will be known if an NPI has worked, and, finally, if a specific NPI has worked at all. [15][16][17] Given a disease spread model, these questions can be linked to the uncertainty in particular parameters of the model. A popular parametric model of diseases is the susceptible-infected-removed (SIR) model. 18 This disease model is described by two parameters, the spread rate between infected and susceptible population, and the removal rate which captures either a deceased or recovered population who is now immune from becoming infected again. The ratio between these two parameters gives the R 0 spread parameter, the reproduction number, which is frequently used in the media since it quantifies the average number of people an infected person spreads the disease to the course of their infectious period. 19 The SIR model, and its variants, have been used together with physical data to analyze the spread of a wide variety of different diseases, 20 how they spread using common resources between populations, 21 and how diseases spread across networks. 21,22 In this article, we derive fundamental limitations in the estimation of change time in a dynamical system. Specifically, we analyze whether it is possible to attribute changes of the transmission rate of COVID-19 to specific NPIs using time series of the cumulative number of infections, and the number of recovered and deceased. It is based on our previous work in input privacy, 23,24 where additive noise was used to hide when step inputs were applied to a dynamical system. The theoretical tools that were developed for those problems allowed us to provide fundamental detectability limitations. We used these limitations to analyze how system dynamics and noise variance affects the estimation quality for all estimators. The similarity to how an NPI affects the way a disease spreads in a population comes from the proposition that we can treat the change of the disease spread parameter as a change in the input to a dynamical system which represents the disease spread in a population. Although we will focus on the SIR model to describe the disease transmission in a population, the theory we develop to do so will be done for general dynamical systems. The results will therefore also be of interest to and could be applied in areas outside the scope of this article. Additionally, we choose to work with the standard SIR model since it has been used as a starting point in many works predicting the spread of diseases, most recently for COVID-19 25 Due to the immediate nature of the NPIs that were implemented during the COVID-19 pandemic 7 (lockdowns, closure of schools, and banning public events, for instance), we can model them as step changes of the spread parameter in the SIR model, since they suddenly change the dynamics of the spread. Thus, these theoretical tools can also be used to analyze the detectability of NPIs, and therefore also how certain we are of their effectiveness. Also, although we focus on the impact that NPIs have on the spread parameter, the analysis can also be used to detect an increase in the disease spread due to, for instance, new variants of the virus.
Typically, fundamental limitations on the estimation quality are represented by a lower bound on the estimation variance. In our previous work, this limitation was obtained through the Hammersley-Chapman-Robbins (HCR) lower bound. 26,27 A more popular method is to calculate the Fisher information matrix, whose inverse also gives a lower bound on the estimation variance 28 and is called the Cramér-Rao (CR) lower bound. In fact, the CR bound may be attainable using a maximum likelihood estimator with the number of samples tending to infinity. 29 One may use it to explicate the underlying model structure that determines the uncertainty. For example, additive noise has previously been used to hide parameters from an adversary. 30 In particular, the Fisher information matrix was exploited to find the uncertainty of the estimated parameters, and noise was injected until this lower bound was above an acceptable threshold. The same idea was implemented in order to hide the state variables from an adversary, where the Fisher information was used to quantify how difficult it was to estimate the state. 31 Several other types of confidentiality breaches have been quantified by the Fisher information. 32,33 The limitation of Fisher information is that it can only provide a lower bound to parameters that fulfill a certain regularity condition, for instance being continuous. Many quantities, especially in disease modeling, usually only take discrete values. Examples of this are the number of infected, hospitalized, or recovered. Additionally, data that is released by health organizations throughout the world is usually done on a daily, weekly, monthly, or yearly basis. Therefore, it is necessary to be able to also capture changes of parameters that only take discrete values. The HCR bound, on the other hand, does not have this regularity restriction and could be applied to parameters that take discrete values. However, the HCR bound is more difficult to compute, since its obtained by solving a non-convex optimization problem. Additionally, if the parameters only take discrete values, then the HCR bound is obtained by solving a combinatorial optimization problem.
Here, we are not specifically interested in the uncertainty of how large the impact of an NPI is, but rather, we are interested in the uncertainty of when an NPI was applied. The reason why we are interested in the temporal uncertainty is because we want to know how far apart the NPIs must be separated in time to be able to draw conclusions on their efficiency. We will take an agnostic view of the step changes and assume that we do not know when they were implemented within a certain time window. The motivation for this assumption comes from the results reported by Flaxman et al., 7 where the effect of multiple NPIs for COVID-19, represented by step changes in rapid succession, were investigated. For most of the cases, they concluded that only one of the changes, typically the last one, was overwhelmingly larger than the others. Thus, it is reasonable to represent this problem as one change that occurs over a specific time window. Therefore, we restrict ourselves to only one change and investigate how inaccurate the best possible estimator would be if this is the only change that occurs. Directly characterizing the properties of the best estimator of the change point using finite data is a difficult problem, even in relatively simple scenarios. 34,35 Hence, we will instead compute lower bounds on the estimation error variance that apply to all estimators. The change time is a discrete parameter, meaning that the regularity condition required by the CR bound is violated here. 36 The HCR bound 26,27 offers the solution to this problem and generally provides a tighter bound than the CR bound. 37 In fact, when applicable, the CR bound can become a special case of the HCR bound. 36 Previously, the HCR bound has been used to answer questions regarding privacy, 23,24 quality of estimation, 37,38 and biochemistry. 27 Therefore, although we will specifically analyze the SIR model in this article, the problem statement will be formulated in general terms and the subsequent results that are derived may therefore be used to answer similar questions for other applications as well.

Contributions and organization of the article
The contributions of this article are the following.
The first contribution directly extends our previous results 23,24 to nonlinear dynamical systems, where the property that the HCR lower bound only depends on the difference between alternative trajectories in the 2 -norm carries over to nonlinear dynamical systems. When there are multiple abrupt changes, our second contribution can be used to determine if one abrupt change in the dynamical model can noticeably affect the estimation uncertainty of another abrupt change. Such an analysis in the temporal direction is missing from most works trying to measure the impact of NPIs. 7,8,[10][11][12] Once the estimation uncertainty has converged, we show that any additional samples will not reduce the estimation uncertainty of the abrupt change. Hence, if a second, subsequent abrupt change occurs after this convergence, the two changes will be sufficiently decoupled so that the effects of each individual change can be independently estimated, which typically increases the reliability of its estimated amplitude. The third contribution helps us reduce the search space of the combinatorial optimization problem that is required to obtain the HCR bound. Finally, the fourth contribution may help to explain why there are contradictions in the literature about the effectiveness of different NPIs. In most countries, such as France, the NPIs were implemented within 1-2 days of each other. For those cases, our rudimentary analysis indicates that estimation of NPI effectiveness based on the SIR model may not be possible.
The organization of the article is as follows, in Section 2, we introduce the HCR bound which is the main mathematical tool that we use for the analysis. We also introduce the SIR model in Section 2, which will serve as an example system that the theoretical results are applied on. The problem setup is formulated in Section 3, both for the general case and the SIR model. In Section 4 we derive the lower bound on the estimation variance for the change time in nonlinear systems. A measure of information with regards to the change time is introduced in Section 5, which can be used for analysis in a low-noise case, or as an initial analysis. How estimators utilize this information quantity is analyzed in Section 6, where it is shown that when the measurement noise is large an unbiased estimator is not able to use all of the generated information about the change. Also, we apply the proposed theoretical analysis and show that one may conduct a simplified analysis on two types of measurements. This simplified analysis is then used in Section 7, where an SIR model is determined using data from the first wave of the COVID-19 pandemic in France. Finally, the article is summarized and concluded in Section 8.

PRELIMINARIES
In this section, we go over the mathematical tools that are used in this article. Specifically, we present the HCR bound and provide some intuition with regards to its different components. After stating the HCR bound, we present the SIR model and its sampled approximation.

HCR bound
The HCR bound is a classical result, which we state here for the completeness of this article. The proof of this theorem can be found in the original papers. 26,27 Theorem 1. Let ∈ Θ ⊂ R q be a vector of unknown parameters and let x ∼ p(x | ), where p(x | ) is the probability density of x given the parameter ∈ Θ. Then, the estimation variance of any estimator, (x) ∶=̂, of the parameters using the sample x is lower bounded by, where, In other words, this theorem states that the variance of the estimated parameters is lower bounded by the largest fraction between E 2 and the 2 -divergence between the true and an alternative probability distribution. Using Neyman's version, the 2 -divergence is defined as, which can be simplified to, The 2 -divergence has historically been used as an information measure with regards to hypothesis testing. Note that the 2 -divergence, or simply the denominator of (1), does not depend explicitly on what estimator is used. The estimator does, however, appear explicitly only in the numerator (2), which essentially measures how the estimator bias changes for alternative parameters. For simplicity, we will only consider the unbiased HCR bound, E 2 = 2 , in this article. It is always possible to pick a biased estimator so that the estimation variance becomes arbitrarily small, for instance by choosing a trivial estimator that always gives the same estimate, independently of the data. Bias may not always influence the estimation negatively, since a small amount of bias may reduce the variance substantially, depending on the bias-variance trade-off.

SIR model
In this section, we state the standard SIR model together with some equivalent formulations that are obtained by variable changes which will prove to be useful for the analysis in Sections 5 and 6. The continuous-time dynamical equations of the standard SIR model are given by where S ∈ R + is the number of individuals who are susceptible to an infection, I ∈ R + is the number of infected individuals, and R ∈ R + is the number of individuals who have recovered or died from the disease. The parameter denotes the spreading rate of the disease, denotes the removal rate, and P denotes the total number of individuals in the population. It might be hard to work with the model representation (4). We can rewrite it to a more manageable form by exploiting that the population number P is constant. A constant P implies that the model ignores processes that might affect P, such as assuming that births and deaths from other causes than the disease occur at an equal rate, or do not occur at all. Either way, this assumption may affect the model's long-term predicting capability negatively. We will assume P to be known, which means that we can write S = P − I − R. Inserting this expression for S and rewriting the dynamical equations we get which is possible to do since I > 0. Thus, we only need to work with a second-order system. Performing a change of variables to ln(I) = i we get We can also change the second state into Z = I + R, which gives the following system Both representations (5) and (6) will be used in this article, where i will be treated as an internal state of the system, and R and Z will be the measured states. Specifically, we will determine which states give better estimation certainty and therefore, should be measured to provide as much information as possible about a change in the spread parameter .
In order to obtain discrete-time dynamics, we will approximate the time derivative using the Euler forward stepping method with length h, which means that for a signal ṡ Applying this result to the SIR model (5) gives the following discretized model, and for the state representation in (6), we have Occasionally, we will also use a transformed representation of the susceptible state, w = ln S, as the internal state instead of i. The dynamics of that new variable is given by,

PROBLEM FORMULATION
Consider a nonlinear, discrete time-varying system with additive noise on the output where x k ∈ R n are internal states of the system, z k ∈ R p is the noiseless system output, y k ∈ R p is the noisy system output and e k ∈ R p is the additive zero-mean Gaussian noise with E [ e k e ⊤ l ] = Σ if k = l and zero otherwise. The system is defined through the model parameters, which are represented by the vector ∈ R 2q and the vector ∈ R r . An observer that measures the noisy outputs, y k , knows the form of the model M(⋅, ⋅, ⋅), the initial state x 0 , and the known parameters . However, the observer does not know the unknown parameters .
Assume that the dynamics have the following form, for some unknown time step ∈ {1, … , N}. Since our unknown parameter is time step of the change, we will for clarity denote it as k * ∶= ∈ Θ. Let z k be the noiseless output of an alternative trajectory, meaning that this output follows the same model which is described in (11), but with a different change time, namely k * + ∈ Θ. With this notation, we can write the output of the original trajectory as z 0 k , or simply z k . Let us denote the set of models where is unknown as , where M(x 0 , , ⋅) ∈ .
Assume that the observer uses an estimator, (Y , M(x 0 , , ⋅)), where Y = {y k } N k=1 are the corrupted measurements from system (10), in order to estimate . We propose the following problem, Problem 1a. Consider an estimator, ∶ R pN ×  → Θ, that takes measurements, Y ∶= {y k } N 1 , and a partially known model, M(x 0 , , ⋅) ∈ , as input in order to produce an estimate of the parameter . What is the lowest possible variance for such an estimator as a function of the samples, Y ? Remark 1. We will use the HCR bound to answer Problem 1a. However, the HCR bound is not attainable in general. The solutions which we provide will only provide a bound to Problem 1a. Therefore, we will only be able to conclude when an estimate is not reliable.
The generality of the statement in Problem 1a means that we can apply the answer to similar questions for a wide variety of dynamical models. Answers to Problem 1a will naturally depend on what model type  is used, which we will show through two examples in Section 4. Asymptotic properties of the estimation uncertainty, which we will derive, will therefore depend on the dynamics of the system. It will be shown that the SIR model fulfills these conditions and one may therefore use those results to conduct an analysis of the mathematical model. Remark 2. If is also unknown, then Problem 1a would still be of interest, since for dynamics like (11) would act as a nuisance parameter, which means that it would need to be estimated together with . Therefore, the necessary conditions that we derive will still hold and the estimation uncertainty will be larger than for the case when is known.

Change time estimation for the SIR model
In this article, we will illustrate the theoretical results mainly by using the SIR model as an example. Specifically, we want to apply Problem 1a to the SIR model in order to quantify the fundamental uncertainty of estimating the change time for a change in a model parameter. Additionally, we want to quantify how the estimation improves as the number of samples N grow. Since NPIs suddenly affect how the virus spreads, the parameter of interest will be the infectivity rate .
For the examples where the SIR model is used, we will answer the following question, which is a special case of Problem 1a.
Problem 1b. Consider an estimator, ∶ R pN ×  → Θ, that takes sample measurements, Y ∶= {y k } N 1 , and the discrete SIR model (7) or (6) as an input in order to produce an estimate of the parameter = k * . What is the lowest possible variance for such an estimator as a function of the samples, Y ?
Implicitly, Problem 1b states that in the theoretical development for the SIR model, we will for simplicity assume that is known both before and after the change, ∈ { , }. Therefore, the change time = k * will be the only unknown parameter. Assuming that is known is not a large infringement on the problem as was discussed in Remark 2, since the change time is typically a nuisance parameter that needs to be estimated before estimating . In that case, the results in this article will still hold, but they will be more conservative.

A LOWER BOUND ON CHANGE TIME ESTIMATION
Bounding the estimation variance through the HCR bound has previously been successfully applied to linear dynamical systems. 23,24,[37][38][39] One can analogously derive the HCR bound for system (11). For brevity, from now on we will omit the explicit dependence on the model M(x 0 , , ⋅). However, the reader is encouraged to keep in mind that the results still explicitly depend on dynamics of the system.
be an unbiased estimator of the change time, k * , where the measurements Y are generated by (10). The variance of any such estimator is lower bounded by where and is the set indices corresponding to time steps of the samples in Y.
Proof. The proof follows from inserting (11) into the calculation of (1). First, note that for an unbiased estimator, we have that E [ (x) | k * ] = k * , which gives that Second, we calculate the 2 -divergence (3). Note that the probability density is given by which in turn implies that . Now, taking the integral of this, we get, which, when combined with (3), gives the denominator in (12). Finally, since the supremum is taken over a finite, discrete set of , we can change it to a maximization. This concludes our proof. ▪ The lower bound (12) highlights a couple of interesting properties. For example, a sampling instant that is able to lower the bound substantially may be thought of as containing a lot of information. Necessarily, this happens for time steps k where is large, for some . The bound implies that if the difference in the weighted 2 -norm between the true and some possible alternative trajectory is large, then it is easier to distinguish them in the presence of measurement noise. Conversely, when the difference (13) is small, the sample will not improve the lower bound substantially since that particular sample will not help an estimator to distinguish the true trajectory from the alternative ones. Due to the maximization over , a problem with the HCR bound is that it is difficult to make general theoretical predictions without checking all possible values of . However, we will show that if the dynamical system (11) fulfills some conditions, at least asymptotically, then we can restrict to a subset of values. Specifically, we will look at two cases that are exemplified by the SIR model. Consider Figure 1, where a simulation of the SIR dynamics is shown and note the asymptotic properties of the different states. The recovered and the susceptible states, R k and S k , respectively, converge to a non-zero steady state value, whereas the infected state I k , generally converges to zero. To gain some intuition behind which will be optimal in (12) when we measure these signals, we will now look at two corresponding, but simpler, cases. Let us start with an example that has a non-zero steady state. Example 1. Consider the system given by where e k ∼  (0, 2 ), and a ∈ R. Note that the noiseless output trajectory difference becomes The HCR bound for an unbiased estimator is then given by,  where c can take one of two possible values, c ∈ , ⌊⋅⌋ and ⌈⋅⌉ denotes rounding down and up to the nearest integer in R ⧵ {0}, respectively, and W is a real constant that can be obtained by evaluating a Lambert W -function. Thus, the optimal is given by * = ±c, ∀ 2 ∈ R + , and ∀a ∈ R ⧵ {0}.
Example 1 illustrates two specific properties of the HCR lower bound applied to dynamical systems which converge to steady state. First, we get that when  ( , k * ) ∝ , the optimal argument * will be approximately proportional to 2 a 2 , in the sense that we can write In Appendix A, we will show that once  ( , k * ) starts to increase linearly, it will be possible to restrict which we need to consider. Second, note that  ( , k * ) does not depend on any samples other than {k * , … , k * + | * |} (or possibly {k * − | * |, … , k * }). If one was to estimate the change time in Example 1, the only important samples for the estimation variance of the change time are the * samples immediately after the change, {k * , … , k * + | * |}. The sufficiency of these samples shows that eventually, additional samples will not reduce the uncertainty. In Figure 1, the states R k and S k converge to a non-zero steady state. Therefore, these two states will asymptotically behave similarly to the step response in (14). Now, let us consider the other asymptotic property, namely when the steady state tends to zero.
Example 2. Consider the system given by and where e k ∼  (0, 2 ). The HCR bound for an unbiased estimator is then given by, For very small 2 , the which maximizes (17) will parametrize an alternative trajectory that deviates the least from the true trajectory. The closest alternative trajectory is then obtained by = 1, and when 2 → 0, we get that * = 1.
Thus, there is some value in finding the closest alternative trajectory even for cases like the one in Example 2, since it will give the HCR bound for the low-noise case. For larger noise levels, the corresponding to the closest alternative trajectory can be used to bound * , since | * | can only increase as the noise variance 2 increases. Therefore, the low-noise case will serve as a baseline for the theoretical development and will be the topic of the next section. Thereafter, we will use the low-noise case as a stepping stone for analyzing systems with large noise levels.

INFORMATION CONTENT
As we have seen in Example 2, the quantity (13) can be used as the basis for a measure of information that can be extracted at time step k once the optimal has been found. For low noise levels, 2 → 0, the optimal in (12) is completely determined by the denominator, which is independent of what estimator is used. This quantity is therefore intrinsic to the signal and forms a fundamental limit on how much information can be extracted by an estimator. Therefore, based on the denominator, we make the following definition: The information content pertaining to the change time k * for a set of samples , sampled at time steps k, where k ∈ , and corrupted by zero-mean Gaussian noise, is given by the smallest 2 -divergence, namely where 0 ∉  . Note that since is finite, we have that the set of candidates , where k * + ∈ , is also finite.
The minimizer of (18), which we denote by I , parametrizes an alternative trajectory which is closest to the true trajectory in the 2 -norm during time steps , Thus, one can interpret I as the parameter which corresponds to an alternative trajectory that is most easily confused with the true trajectory. Under that interpretation, the quantity (18) measures the amount of information available in the signal which can be used to discriminate between the true and its most similar trajectory. This information measure is a special case of the Barankin information matrix. 40 Specifically, it is recovered from the Barankin bound by using a single test vector which maximizes the Barankin bound in the low-noise limit. Evaluating the information content (18) requires that the 2 -divergence is calculated N − 1 times. For very large data sets, this can be computationally expensive, which motivates the need for a restriction in . Example 1 shows that some samples could be removed without impacting the information content. Specifically, this also implies that the corresponding can be removed from the calculations. Therefore, if there exists alternative trajectories z k that converge to the true trajectory, z k , during for some ∈  , then there also must exist a subset of  that does not need to be considered when evaluating (18). The next lemma characterizes the set of candidates of which are sufficient to consider when evaluating (18).

Lemma 2.
The set  of candidates which minimizes (18) can be restricted to and = min( I ) fulfills the following condition The minimizer to (18), I , is then in this new set, I ∈  I .
Proof. First, let us consider . The lemma states that all candidates ∈  I which are larger than , can be removed. This is true if 2 ,k * , ≤ 2 ,k * , + , ∀ ∈ N, which when written out explicitly becomes Simplifying this inequality gives (19). The second inequality (20) is obtained by using the same argument. The inequality 2 ,k * , ≤ 2 ,k * , − , ∀ ∈ N, can be written as which concludes the proof. ▪ Lemma 2 is essentially a reformulation of the statement that the minimizer to (18), I , parametrizes an alternative trajectory that produces a smaller 2 -difference compared to any other . However, the way the condition is presented in (19) gives us insight into how we can derive a useful tool for restricting . Let z k > z + k ; then the last factor in each term is positive. The first factor, however, is simply the difference between z k and the midpoint between z k and z + k . Thus, if z k is closer to z k than z + k , then the first factor is also positive, thus making the product positive. Subsequently, if the sum of all these products is positive, then we get that parametrizes a trajectory with a smaller 2 -difference than + . We use this lemma to establish a sufficient condition which gives us a tool for simplifying the calculation of information content for certain dynamical systems. Specifically, we use the following result.
Proof. First, consider the case where z k ≥ z i k ≥ z i+1 k , ∀k and ∀i ≥ 1. Additionally, let  N 1 = {1, … , k * − 1, k * + 1, … , N}. Then we have that, since the first factor is non-negative and the third factor is non-negative. Letting i = max( N 1 ) − 1 = N − 1, implies that we can remove the largest value in  N 1 by using Lemma 2. Now consider  N−1 1 and redo the process until i = 1, then we will be left with,  k * +1 1 . Similarly, we can remove the elements in  k * +1 1 from below using (20), since z i−2 k ≥ z i−1 k ≥ z k , ∀k and ∀i ≥ 0, leaving us with the set  k * +1 k * −1 . Finally for the other case, namely when z k ≤ z i+1 k ≤ z i+2 k and z i−2 k ≤ z i−1 k ≤ z k , ∀k and ∀i ≥ 1, we get an analogous proof. ▪ Remark 3. Note that the result in Theorem 2 can be generalized, simply by changing the conditions to ∀k and ∀i ≠ , for some ∈  , the same result will still hold.
As a result of Theorem 2, we are able to significantly simplify the analysis for the class of systems with parameter changes that do not cause trajectories to cross each other, since we do not need to consider a changing | | > 1. Additionally, this property extends to when future samples are added as well, implying that an analysis of the information content for these systems does not need to consider the case where changes when more samples are added. The SIR model provides two such signals that fulfill the condition in Theorem 2, which therefore are suitable to use to detect changes in the spread parameter. Consider one of the states of the SIR model, namely, the recovered state, R k . Essentially, members of that state are people in the population who are not able to contract, nor transmit the disease to a member of the susceptible population. The state is essentially an integrator of the infected state, I k . An integrator typically reduces the estimation variance of the change time, 23 mostly due to the fact that changes in pure integrator system produces alternative outputs that are persistently different from the true trajectory. This property allows us to state the following result. Theorem 3. Let the disease spread be captured by the SIR model and let the measurement variable be the number of recovered and deceased, R k . It is then sufficient to only consider the closest alternative trajectories of R k , that is, = ±1 in (18).
Proof. We will show that the conditions in Theorem 2 hold for R k . Also, we will only prove the theorem for the case for when the infection rate increases, > , however the proof is analogous for the opposite case, < . First, let w k denote an alternative trajectory of w k , which is defined by (9), in the sense that w k changes → one time step earlier than w k . Similarly, we let R k denote an alternative trajectory of R k where the change happens one time step earlier. Let the change occur at time step k * + . Then we have that w k * + = w k * + and R k * + = R k * + which gives that Note that the last factor is positive, P − R k − exp w k > 0, ∀k, since it is simply the number of infected, I k . The relation (22) is intuitive, if the disease begins to spread more quickly, > , then in one time step the size of the susceptible population will decrease more compared to the slower spreading dynamics. Now, we need to show that these two alternative trajectories do not cross each other during the outbreak, which is achieved by first noting that the dynamics fulfill the following equality, where C is a constant which could be obtained from the initial conditions of R k and w k . Therefore, we can rewrite the dynamics for w k , as and recall that for R k , We can see that which we obtain by combining (23) with (22), and observing that R k * + +1 = R k * + +1 gives us C − C = w k * + +1 − w k * + +1 < 0 from (22). From (24), since all the three terms are negative, we get that w k − w k < 0, ∀k > k * + . For R k , we analogously get, We know from the previous expression that the last term is positive for all k > k * + . Therefore, using the initial condition R k * + +1 = R k * + +1 , we get that R k − R k > 0, ∀k > k * + + 1. Finally, the proof is concluded by noting that R k (1) ≤ R k (2) ≤ … ≤ R k (N), ∀k, where R k (l) denotes the output of R at time step k, for a system where the change time is l. ▪ Theorem 2 does not apply to the SIR model where the infected state I k is the measured output, because all of its alternative trajectories will cross each others' paths. The integrating action that the recovered state R k introduces helps to ensure that no such crossing occurs between its alternative trajectories. The consequence of non-crossing trajectories is that the 2 -norm increases as | | increases, which can be seen in Figure 2 for a change in the spread parameter when R k is measured. It shows 2 ,50, -divergence, as a function of the time step k ∈ after a change in the model parameter at time Step 50 along the vertical axis, and the alternative trajectory parametrized by along the horizontal axis. First, we can see that Theorem 3 is verified, since the minimum occurs at = ±1, ∀k. Additionally, we can see that 2 ,50, ,50, -divergence as a function of the alternative change times, parametrized by , and the number of samples after the change, given by k ∈ . The true change time occurs at time step k * = 50, indicated by the vertical white strip. Note that for each time step k, the minimum occurs at = 1, which is given by the red crosses, and that the 2 ,50, -divergence increases as we obtain more samples k is larger for negative , as opposed to the corresponding positive , indicating that = 1 is typically favorable in the maximization of (12) rather than = −1. Finally, note that the 2 ,50, -divergence increases as we collect more samples and that it eventually converges to some value depending on . This convergence indicates that information about the change time is not present in the signal for all future time steps, and to obtain a good estimate of the change, one needs to start sampling relatively early.
Another type of measurement that is typically available is the total number of cases, Z k = I k + R k , which contains both the R k state, thus it contains an integrator, and the current number of infected which is a measurement that changes relatively quickly. In contrast to R k , where the change is usually visible once the infected people have recovered, the addition of the current number of infected means that changes should be able to be detected earlier. In the next corollary we show that this signal has the same useful property as R k and thus is equally easy to analyze. Corollary 1. Let the disease spread be captured by the SIR model and let the measurement variable be the part of the population that have at some point been sick, Z k = I k + R k . It is then sufficient to only consider the closest alternative trajectories of Z k , that is, = ±1 in (18).
Proof. The result follows from Z k = N − S k and (24), where it was shown that w k = ln(S k ) never crosses paths with any of its adjacent trajectories, w k (1) ≤ w k (2) ≤ … ≤ w k (N), ∀k, where w k (l) denotes the output of w at time step k, for a system where the change time is l. Therefore, alternative trajectories of Z k will never cross paths. Thus, the conditions of Theorem 2 are fulfilled. ▪ We have produced two states of the SIR model whose HCR bound on the estimated change time is relatively easy to analyze. During an ongoing epidemic, coincidentally, these two states are typically measured nevertheless, in order to understand the fatality risk and the disease progression through the population. A natural question one could ask is which of these two states provide the most certain estimate. Obviously, different levels of noise for the two signals will skew the estimation uncertainty in favor of one or the other. However, for similar noise conditions, we have a signal that definitely performs better, at least initially. Before we state the next result, we remind the reader that the superscript in, for instance, I k , denotes the signal's change time, k * + .

Proposition 1.
Consider the measurement sequences for the number of recovered, R k , and the total number of people that are or have been sick, Z k . For estimation of the change time, the measurement sequence Z k is more informative than R k until I k − I 1 k switches sign.
Proof. First, let use the same notation as in the proof of Theorem 3, namely that, for instance, R k denotes an alternative trajectory of R k where the change happens one time step earlier. For the first sample after the change time, R k * +1 − R k * +1 = 0, while, using ln where, since the second factor is always positive, the change in I k has the same sign as − . Additionally, the first sample of Z k − Z k = R k − R k +Ī k − I k =Ī k − I k after the change is more informative than the first sample of R k − R k = 0. Looking at the dynamics of the difference,̄k − i k for k > k * , we see that it is an integrator of z k , Note that if the spread parameter increases, > , the differencēk − i k becomes an integrator of a negative quantity. And sincēk − i k ≥ 0 initially, then eventually the sign will change since Corollary 1 implies that Z k does not cross paths with its alternative trajectories. This implies that Z k is more informative than R k until̄k − i k changes sign. ▪ Remark 4. Actually, Z k may potentially produce more information about the change compared to R k , ∀k. Specifically, in the continuous time domain, the difference in alternative trajectories of Z(t) are larger in the L 2 -norm compared to R(t). This can easily be seen by the relation R + I = R + 1Ṙ and for some T > 0 and ΔR 0 = 0. However, this useful property remains elusive when we discretize the dynamics using the Euler forward method.
Proposition 1 gives a time interval for when Z k definitely produces better estimates than R k . The time step where we can no longer guarantee a better estimation quality for Z k turns out to be important since the sample at this time step will decrease the HCR bound the most out of all other time steps, which we will see in the next subsection. Additionally, the same time step is important for convergence of the information content, which is shown in the next section.

Most informative sample
The term 2 ,k * , increases monotonically when more data points are sampled, which implies that the information content ( , k * ) also increases monotonically by adding more samples. Notice that there is subadditivity between the samples:

Lemma 3. Consider two sets of measurements, which have been sampled at time steps  and , then we have that
Additionally, we have that samples which do not increase the information content in a sample set contain no information on their own: Proof. For the subadditivity, we use the Taylor expansion of an exponential function, e x = ∑ ∞ n=0 x n n! . We get that where, in the first inequality, we have used that (x + y) n ≥ x n + y n , for x, y ≥ 0, and, in the second inequality, we have used min x f (x) + g(x) ≥ min x f (x) + min x g(x), which proves (26). For the second claim, we can simply use the inequality (26) and the positivity of the information measure to obtain the result, which concludes our proof. ▪ The set  in (27) corresponds to the time steps where there is no information about the change time. In Example 1, the data set sampled at  in (27) represents samples of (14) which are taken when k ≠ {k * , … , k * + | * |}, when both the true trajectory and the closest alternative trajectory in the weighted 2 -norm have either not had a change, or reached the same steady state after the change. Sampling when the closest alternative trajectory has reached steady state will therefore not add any new information about the change time. However, this claim can be relaxed by considering the time steps at which the trajectories have converged to each other. Proposition 2. The earliest time step at which no more additional information (k, k * ) is generated, is given by the k, such that where I is the minimizer of (18).
Proof. The result follows from the fact that we can split up any sample set into two sets,  and , where (, k * ) = ( , k * ) and {l, … , N} = , for some l ∈ N. Then, if there exists an l such that ( ∪ , k * ) = (, k * ), we can use Lemma 3 to obtain which is equivalent to (28). Finally, we set k = min{l ∶ ( ∪ , k * ) = (, k * )}. ▪ Proposition 2 provides us with a key insight that is needed for the development in the next section, namely, that it is possible to predict how future samples impact the information content if the future trajectory is known. Alternative trajectories of some dynamical systems only approach each other asymptotically. Therefore, we need another way to determine the convergence of the information content. Since the information content relies on the weighted 2 -difference between the true and closest alternative trajectory, we could use the time step that provides the largest difference. Any single sample after this time step will produce a smaller information gain due to the smaller 2 -difference and thus, the information content will start to converge. In fact, this time step will be the same one that gives the largest increase of (18) compared to other samples in the set, which we use as the formal definition.
Explicitly, the MIS for the dynamical system (10) corresponds to the sample l which maximizes the following expression, where I is the minimizer of (18). Note that the MIS should only be compared to other individual samples in the set. For instance, a combination of several other less informative samples could increase the information content substantially more than the MIS can do on its own. For our signals of interest in the SIR system, we can show when the MIS occurs. • For R k : • For Z k : Thus, the first time step that fulfills these conditions is the most informative one.
Proof. The difference of the two alternative trajectories of R k , given by ΔR k , never cross each others' paths. Therefore, it is only necessary to find the time steps when the two trajectories start to approach each other. Also, ΔR k increases while ΔI k > 0, since it is a integrator of ΔI k . Therefore, a maximum occurs when ΔI k , switches sign. For Z k we have from (8) that This difference starts to decrease when the second term is negative, thus implying that the difference is increasing up until that time step. This concludes our proof. ▪ Given Proposition 1 in the previous section, where it was shown that Z k produces better estimates than R k initially, it seems intuitive that most of the information content appears much earlier in Z k . In fact, we show that this is the case in the next proposition.

Proposition 4. Consider the measurement sequences for the number of recovered, R k , and the total number of people that have been sick, Z k . For the estimation of the change time, the measurement sequence Z k achieves its maximum information content earlier than R k . Additionally, the maximum information content for Z k is at least as large as R k .
Proof. The result follows from Proposition 3, where the condition for the maximum information for R k implies that the condition for Z k becomes Ẑk − Z 1 k > 0, which is always true. Additionally, note that the maximum information content for R k occurs when ΔI k = 0, which for Z k gives the difference ΔZ k = ΔR k + ΔI k = ΔR k . Thus, Z k will always produce at least one sample that has more information content than the maximum information content in one sample for R k . ▪ Proposition 4 states that Z k will stop generating information about the change earlier than R k . However, its most information dense samples will have more information than the MIS for R k . This concentration of information also explains why Z k produces better estimations of the change time initially, compared to R k .

INFORMATION USAGE FOR LARGE NOISE VARIANCE
As was eluded to in Section 5, the numerator in (1) depends on the estimator bias, which is 2 for unbiased estimators. In the low-noise variance case, it acts as a scaling factor to account for the distance between k * and k * + I . However, as we increase the noise variance we will come to a point where, effectively, the estimator no longer can distinguish between the true and the most similar alternative trajectory, given the current amount of samples. In the lower bound (12), this deterioration is characterized by a change in the optimal to a larger integer, I < * . As was seen in Example 1, when the noise level is increased the new optimal corresponds to the difference between the true and second closest alternative trajectory that has a larger than the closest trajectory. Therefore, in order to effectively distinguish the true trajectory from an alternative one given a certain noise level, a sufficiently large gap in the 2 -difference needs to occur. This effect is shown in Figure 3, where we have plotted the HCR bound for R k , using the same SIR model that was used in Figure 2. The HCR bound is plotted as a function of the number of samples along the vertical axis and the alternative change times along the horizontal axis. Note that initially, when there are few samples in relation to the noise level, we get the optimum for * = N. Thus, the estimator will not be able to effectively distinguish between the true and any other the true change time is * = 50. For the initial time steps after the change, the maximizer of the HCR lower bound becomes the latest time step, * = k, which is indicated by the red crosses. Eventually, when the number of samples becomes large enough, the maximizer becomes * = 2, before it converges to * = 1, which is the same as the minimizer of the information content I , as seen in Figure 2 alternative change times. However, as the estimator collects more information, * eventually becomes 2, before finally converging to 1, which is the same that minimizes (18). Figure 3 shows that Lemma 2 cannot always be used to restrict the set of possible for (12) although the conditions of Theorem 2 are still fulfilled. However, we may be able to recover a similar result if we let the number of samples become large enough. Specifically, recall Example 1, where we obtained a restriction in because  ( , k * ) increases linearly in . In this section, we show that this property holds generally for the HCR bound on all dynamical systems. In order to prove it, we will first need an equivalent result of Lemma 2 for the large-noise variance case.

Lemma 4.
The set  of candidates which maximize (12) can be restricted to and = min( * ) fulfills The maximizer to (1) is then in this new set, * ∈  * .
Proof. Let us again start by considering . Plugging it into the HCR bound (12) we find that in order to remove every that is larger than , the following must hold, which is the same as (30). Similarly, for the lower bound, we have that which is the same as inequality (31). ▪ Note that the set  * contains the elements from the set  * I that were given by Lemma 2, which contained the that minimized (18),  * I ⊆  * . This fact follows from which is a consequence of the extra importance the unbiased estimator puts on the early and late measurements. These inequalities give us a clear starting point for finding the maximizer for (12), The set of candidates of ∈  , can be restricted to | | ≥ | I |, where I minimizes (18).
In other words, the which parametrizes the alternative trajectory that is used to calculate the information content (18) also acts as the smallest | | that should be considered when evaluating the HCR bound. Thereafter, one can iterate the calculations of (30) for increasing to get a tighter lower bound until a maximum has been reached. The information content is therefore a good starting point to perform a first analysis of how the uncertainty evolves over time in order to gain intuition for when the quality of the estimation will improve. After that, one can use the analysis of information content as a stepping stone for calculating tighter lower bounds until the HCR bound is obtained.
Remark 5. For completeness, we present two methods in Appendix A to further restrict for general nonlinear dynamical systems. These results could be used to simplify the calculations on how future samples affect the HCR bound.
Lemma 4 gives us a property which enables us to project how the HCR bound will evolve asymptotically when future samples are collected.

Corollary 2.
Consider the noiseless, scalar, output of the dynamical system (10), z k , and the alternative noiseless outputs, z k . Suppose the same conditions as in Theorem 2 hold and the alternative outputs do not converge to the true output, for some > 0, some ∈ N, and ∀i ∈ {−(N − 1), … , −2, 2, … , N − 1}. Then, as N → ∞, it is sufficient to consider = ±1, which represents the trajectory closest to the original in the weighted 2 -norm, when evaluating (12).
Corollary 2 is the final piece of information that is needed for simplifying the analysis of the SIR model. With it, we obtain the following result. Corollary 3. The HCR bound for the estimator (Y ) using measurement Z k is equal or less than the HCR bound using R k until time step k > k * , where I k − I * k = 0, if the noise variance is the same for both types of measurements.
Proof. Let us prove this claim by contradiction. First, note that the 2 -norm for Z k − Z k is larger than R k − R k , ∀ before I k − I k changes sign for the first time. Now let * Z maximize the HCR bound for when Z k is used and similarly, let * R maximize the HCR bound for R k . Assume that the HCR bound using R k is smaller than the HCR bound using Z k , where  Z and  R correspond to the 2 -differences using outputs Z k and R k , respectively, and max( ) = k − 1. Then, since which implies that * R is not the optimizer of (12). Therefore, we conclude that (32) must be false. ▪

F I G U R E 4
The HCR lower bound for R k is shown together with the MUIS for both R k and Z k , as a function of the time step k and the size of the parameter change. First, note that the HCR bound stops decreasing approximately where the MUIS for R k occurs. Second, note that the MUIS for Z k always occurs earlier than for R k , implying that the HCR bound converges earlier to a steady state for Z k than for R k Corollary 3 gives us an answer to which measurement one should use to most reliably estimate a change time during the first time steps. Similarly as in Remark 4, the time-continuous SIR model again gives that ΔZ(t) is larger than ΔR(t) for any interval of time in the L 2 -norm. Therefore, it is possible that, at least for fast sampling, the discrete version of the SIR model would exhibit similar properties.
In Section 5, we defined the notion of MIS, which is a single sample in the sample set that increases (18) the most. However, the sample that is obtained by that specific definition might not be useful in the large-noise case. Fortunately, an equivalent notion may be defined for the sample that decreases the HCR bound the most: Definition 3. The most useful informative sample (MUIS) in a set of , with regards to the change time k * , is defined as where B(⋅, ⋅) is defined in (12).
For dynamical systems, the MUIS has the same expression as (29), with the difference that * maximizes (12) instead of minimizing (18). By useful, we mean that it is the information which the estimator actually extracts from the signal, which could be less than the total amount of information that is present in the signal.
For the SIR model, equivalent results to Propositions 3 and 4 can be derived using the that maximizes the HCR bound (12). The results will only differ through the change of I → * . Similarly to MIS, MUIS will not necessarily reduce the variance by a significant amount. If the MUIS occurs late during the sampling, then it will not play a major role in reducing the variance since the estimator has already gained a lot of information from the previous samples to conclude when the change happened. In Figure 4, one can see the impact of the MUIS for R k . Notice that the improvements in the HCR bound quickly become smaller after the MUIS, unless it has already converged. Since the MUIS is able to represent the boundary between large and small changes in the HCR bound, one may therefore use the MUIS as a measure to determine how long the sample horizon must be to sufficiently ensure that the HCR bound starts to converge.
For comparison, the MUIS for Z k is also shown in Figure 4. It verifies Proposition 4, which states that the sample set of Z k is more informative than R k , if they are sampled with the same amount of noise. The MUIS for Z k occurs much earlier than the MUIS for R k , which means that the lower bound may converge sooner for Z k than for R k .

APPLICATION TO COVID-19 DATA
We will now apply our theoretical results to real data from the COVID-19 pandemic in France. The data in this section is taken from the European Center of Disease Prevention and Control. 41 The two signals, Z k and R k , were obtained based on the cumulative number of cases, (d Z k ) N k=1 , and the cumulative number of deaths, (d R k ) N k=1 , where the sampling period is one day and the first time step k = 1, corresponds to March 1, 2020. However, before we can apply our results, we have to address two major systematic artifacts. First, the dynamic model (4) requires knowledge of the size of the population in which the virus is spreading. Typically within one country, only certain geographical areas are affected by the disease at a time. In Italy, for instance, the regions that were affected the most in the first wave managed to have a relatively small number of cases in the second wave, 42 indicating that the first wave only affected a subpopulation living in certain regions of Italy. To deal with this discrepancy, we let the population number in (4), P, be a tunable variable. Choosing P that best represents the data can be interpreted as finding the size of the subpopulation of France that was most affected by the COVID-19 spread during the first wave. Additionally, the data is missing a large number of unreported cases, for instance, the number of recovered individuals are not included in d R k and the number of cumulative cases are under-reported. In order to account for this deficiency, a scaling of the signals Z k and R k is needed. By assuming that the number of unreported cases are proportional to the reported ones, we can also account for this discrepancy through by tuning P. However, the choice of P for the two signals Z k and R k needs to be made separately to account for the different number of under-reported cases for the two measurements. The population P was chosen so that the output trajectory of the estimated SIR model using data for the entire horizon is able to follow the physical data. Finally, the equivalent susceptible population was chosen to be P Z = 168,000, and P R = 36,000. These choices of P resulted in the estimated removal rate being the same magnitude as the recovery rate of the disease, namely 1 to 6 weeks, which could be used as a verification that the choices of P Z and P R are reasonable.
The second artifact which has to be addressed is that many infection cases and deaths which are discovered during weekends are typically not reported until the following one or two weekdays. This phenomenon of under and over reporting of cases affects the variance of the time series notably, as seen in Figure 5, where we have plotted the histogram of where (d Z k ) N k=1 is the 7-day rolling average of (d Z k ) N k=1 . A similar noise distribution appears for d R k . One can see that there are two small peaks in addition to the central bell-like histogram; one to the left of the origin (under-reporting), and F I G U R E 5 The difference between the raw data signal of d Z k and its corresponding rolling 7-day average, d Z k , which is used to estimate the Gaussian noise variance, is shown. One can see that it approximately has a normal distribution. Additionally, note that there are two smaller additional bell-like curves at approximately ±0.008, which is due to the over-and under-reporting of cases around weekends one to the right of the origin (over-reporting). Figure 5 illustrates that the error could be approximated by a Gaussian, zero-mean measurement noise with some data contamination which gives rise to the additional small peaks. In order to accurately estimate the variance of the measurement noise, we use ê k as the estimated instance of the noise, thus making the estimation variance not depend on the model of the underlying process. Additionally, to deal with the contaminated part of the data, we use a robust estimator called the Median Absolute Deviation (MAD)-estimator, 43 which has historically proven to be able to be more resistant against a small number of outliers. The variance is then calculated aŝ

Certainty of NPI
We use the data for infection spread in France to perform the analysis. The reason for this choice is that the French government implemented their NPIs in rapid succession over 5 days. The rapid succession means that we can model these multiple NPIs with a single NPI corresponding to the cumulative change. Additionally, we only consider the first 80 days of the breakout in the first wave since the SIR model (7) does not take geographical spread into account since P is constant. If we consider longer periods of time, the infection spread will behave in a way that is not perfectly captured by the single-population SIR model (7). For instance, by spreading to geographically remote populations of a country which have not come into contact with the virus before an entirely new infection cycle could start. An additional factor is that an increase in the rate of testing also leads to an increase in the number of currently infected in the data. This increase is another effect that is not captured by the SIR model and therefore introduces additional modeling error for longer time horizons. Finally, we mention that the estimated variance, 2 × 10 −5 , of the measurement noise was low enough so that only = ±1 was needed to be considered when conducting the analysis, which verifies Corollary 2.
Consider Figure 6, which shows the temporal evolution of the number of deceased d R k , and the cumulative number of infected d Z k , together with the NPIs of case-based self-isolation, encouraged social distancing, banning of public events, school closures, and lockdown. Again, one may see that all the NPIs were implemented within 5 days. In Figure 7 one may see the running estimated change time based on the two signals. For the first 35 days, one can see that the plot has a trend that is increasing, indicating that the estimator has not detected a significant change of the viral spread in the population and therefore does not produce a consistent result as the number of samples grow. However, once time Step

F I G U R E 7
The plot shows the running estimates of change times for the total number of recovered/deceased, R k , based on d R k , and the total number of infected, Z k , based on d Z k . The fundamental uncertainty, quantified here by the HCR lower bound, for the two corresponding signals are shown as the shaded area in red for R k , and in black for Z k . Note that the lower bound on estimation variance tends to zero as the number of time steps grows, which occurs almost immediately for Z k . For R k , the uncertainty was almost zero on May 1st, nearly 45 days after the NPIs 36 is reached (April 5th), the change time estimates starts to be consistent at around time Step 32 for Z k . However for R k , the estimated change time becomes approximately 36 and the volatility decreases much later at k > 60.
In Figure 7, the HCR bound for the removed R k and cumulative state Z k is shown in addition to the estimated change time. Notice that the measurement of the total number of infections, Z k , has a much lower HCR bound than for the removed state, R k . The fact that Z k provides better estimates than R k is consistent with Proposition 1, which states that it should, at least initially, provide more informative samples. Additionally, one can see that the fluctuations of the estimations correlate closely with how large the HCR bound is. For instance, at day 60, the large fluctuations in R k stop, which coincides with when the HCR bound becomes small. Thus, the HCR bound successfully predicts that R k will fluctuate heavily based on the realization of the noise up until day 60. On the other hand, the HCR bound for Z k does not predict that it will have large fluctuations and the estimates of the change time reflect this. Therefore, during most of the time steps for Z k and after time Step 60 for R k , the limitation on the estimation uncertainty disappears, implying that there could be an efficient estimator that is able to perfectly estimate the change time and thus is better than the one defined by (34) and (35).
Finally, note that the estimated change time is shifted 2-3 weeks compared to when the actual implementation of the NPIs occurred. This delay seems intuitive, since most testing was done once an individual shows symptoms, which can happen up to 2 weeks after exposure. Similarly, since recovery and death from the disease is additionally delayed from the time step when symptoms start, the change in R k due to the NPIs is delayed further compared to Z k . Parts of the delay may be captured by using an SEIR model, where the additional E k state constitutes members of the population which have been in contact with the disease but not yet started to show symptoms. In practice, however, people who are detected while in the asymptomatic phase are typically put into the infected state I k (with regards to reporting confirmed cases). Therefore, an NPI could immediately show up as a change in I k due to the (small) portion of the population that are asymptomatic and have been detected. This effect could explain why Z k also seems to estimate a change time early on with a relatively low HCR bound, at roughly time Step 15, which fits well with when the NPIs were actually implemented in France. Thus, an estimator that detects this early change with high certainty could potentially exist. Additionally, this estimation persists until 2-3 weeks after the change, which corresponds to when the asymptomatic phase ends for people that were infected after the implementation of the NPIs. However, R k becomes more prevalent than Z k in the latter time steps, which skews the result closer to the detected change time for R k .
As was previously mentioned, the detected change is actually made up of four smaller changes that occurred over 5 days. A key question we posed in Section 3 was if it was possible to separate the effects of these NPIs to attribute reductions in the spread parameter to particular NPIs. Consider Figure 8, where we have plotted the HCR bound for Z k as a function of the time step after the change and percentage of the total change, based on the estimation of the change at time Step 70 using d Z k . One may see that if the total change is attributed to one NPI, then it may be possible to estimate the change time the very next day, since the HCR bound says that the estimation uncertainty is lower bounded to the order of ±10 −1 days. However, since there were four NPIs in rapid succession, at least one of them will contribute to at most 25% of the total change. Figure 8 implies that a change of that magnitude will at the earliest be reliably detected (variance less than ±1 day 2 ) after 3 days. Furthermore, if one of the NPIs had no effect on the spread, then some of the remaining ones will at most affect 33% of the total change, which corresponds to an earliest detection at least 2 days after the implementation according to Figure 8. The results are confirmed by using d R k as well, as seen in Figure 9, where the full change (100%) is detectable only after 4 days. For the worst case scenario, where each NPI contributes equally to the change, the change time could potentially be reliably estimated only 7 days after the change. Hence, we have obtained another confirmation of Corollary 3, which stated that the HCR bound for Z k would be lower than the HCR bound for R k .

Most informative sample
As previously discussed, it is important to be able to know if the HCR bound has converged, implying that additional samples will no longer improve the lower bound. From Section 4, we show through an example that once the MIS has been collected, then no major improvements to the HCR bound can be expected from future samples. Figure 10 shows when the MIS is expected to occur, based on the samples up until day k, where {d Z k ∶ ∀k < k} and {d R k ∶ ∀k < k}. Recall that since * = I , the MUIS will be the same as the MIS. Throughout the time window, one can see that the MIS is expected to occur after 29 days for both types of measurements. As time Step 29 is approached, the estimated MIS increases. From the plot for Z k , one may see that the MIS is eventually estimated to be between time Steps 40-45. Note that the curve for the estimated change time using d Z k has converged well before day 40, implying that the information collected before , for up to 15 days after the change. For instance, for a change that is half of the total detected one, Δ ( − ) = 0.5, this plot shows that the earliest day we can detect it is on the second day after the change, where the HCR bound is approximately 1. Therefore, 2 days after the change, an unbiased estimator will produce an uncertainty of at least ±1 day F I G U R E 9 The plot shows the HCR lower bound for signal R k of the estimated change time as a function of the relative size of the change, Δ , for up to 15 days after the change. For instance, for a change that is half of the total detected one, Δ ( − ) = 0.5, this plot shows that the earliest day we can detect it is on the fifth day after the change, where the HCR bound is approximately 1. Therefore, 5 days after the change, an unbiased estimator will have an uncertainty of at least ±1 day occurs. For instance, the MIS for measurements based on Z k is initially estimated to be at approximately time Step 29, but eventually converges at day 42, indicating that the HCR bound for Z k should start to converge by then. However, one may see that the HCR bound for Z k already has converged at day 37. The MIS based on R k is initially estimated to be at approximately time Step 40-50, but it eventually settles to be approximately at time Step 57, which corresponds well with when the HCR bound converges and when the volatility of the estimated change time becomes small the MIS is sufficient for some estimator to perfectly estimate the change. Therefore, the MIS does not add any additional information about the change compared to what has already been collected. For measurements based on R k , the MIS happens a few time steps later than for measurements based on Z k , verifying Proposition 4. The MIS for R k is eventually estimated to be between time Steps 50-60. Here, however, the impact of the MIS is very noticable. In Figure 7, one can see that the HCR bound converges shortly after time Step 60, which is a few time steps after the MIS has been collected. We can therefore conclude that the MIS is a sufficient indicator of when the HCR bound starts to converge, if it has not converged yet.

CONCLUSIONS
We have analyzed the detectability of multiple change times for sudden parameter changes in nonlinear systems using the HCR lower bound. Using the lower bound, we defined the notion of information content pertaining to change time in a measurement signal, which determines the quality of estimation in a low-noise scenario. The information content is identical to the Neyman 2 -divergence between the true signal and the most similar alternative signal. Typically, both the lower bound and the information content are calculated by solving a combinatorial problem. We have presented results which helps to reduce the number of candidates that needs to be evaluated, which can be used to speed up calculations of the HCR bound. We then gave conditions for dynamical systems that only need one candidate to be considered when calculating the HCR bound, which enabled us to compare how the quality of detection evolves over time. Additionally, we defined the notion of most (useful) informative samples which, according to simulations, seems to be a good indicator of when the information content and HCR bound starts to converge. This theoretical framework was then applied to detecting changes to the infection parameter in the SIR model, where it was shown that the measurements of the total number of infected, Z = I + R generates more information and provides a higher quality estimate of the change time than measurements of the number of recovered or deceased R. These changes could be model the effects that NPI (decrease in ) or new variants (increase in ) have on the spread of an epidemic. The theory was then applied to real data from the COVID-19 pandemic in France during the first wave in spring 2020. It was verified that measuring Z gave a higher quality of the estimate than R in multiple instances. Additionally, there exists no estimator which is able to separate the four NPI which were implemented over a 5-day period. However, the calculations show that for Z, an estimator would at the earliest be able to separate the NPIs if they had been implemented with at least a 3-day delay between them. For R, estimating the corresponding change would need to have at least a 7-day delay between the implementations. Proposition 6. The set of candidates of  can be restricted to  + , where * ∈  + is the which maximizes (12), such that = max( + ) = I + ⌈ ⌉. Here, I = max( I ) from Lemma 2 and > 0 fulfills the following equation ) .

(A2)
If there is no that fulfills this equation, then we can set = N. Similarly, the set can be restricted from below using the same expressions, but by replacing I , , and with I , − , and − , respectively.
Proof. We will only prove this proposition for the upper bound, . The proof for the lower bound I is analogous. First, note that we want to find for which l ≥ 1, the following inequality holds,