Keywords:

  • Infectious disease epidemiology;
  • Real-time surveillance;
  • Reporting delay;
  • Truncation

Summary

  1. Top of page
  2. Summary
  3. 1 Introduction
  4. 2 Nowcasting
  5. 3 Time Homogeneous Delay Distribution
  6. 4 Joint Bayesian Modeling of Epidemic Curve and Delay Distribution
  7. 5 Evaluating the Nowcasts
  8. 6 Nowcasting during the STEC O104:H4 Outbreak
  9. 7 Discussion
  10. 8 Supplementary Materials
  11. Acknowledgements
  12. References
  13. Supporting Information

A Bayesian approach to the prediction of occurred-but-not-yet-reported events is developed for application in real-time public health surveillance. The motivation was the prediction of the daily number of hospitalizations for the hemolytic-uremic syndrome during the large May–July 2011 outbreak of Shiga toxin-producing Escherichia coli (STEC) O104:H4 in Germany. Our novel Bayesian approach addresses the count data nature of the problem using negative binomial sampling and shows that right-truncation of the reporting delay distribution under an assumption of time-homogeneity can be handled in a conjugate prior-posterior framework using the generalized Dirichlet distribution. Since, in retrospect, the true number of hospitalizations is available, proper scoring rules for count data are used to evaluate and compare the predictive quality of the procedures during the outbreak. The results show that it is important to take the count nature of the time series into account and that changes in the delay distribution occurred due to intervention measures. As a consequence, we extend the Bayesian analysis to a hierarchical model, which combines a discrete time survival regression model for the delay distribution with a penalized spline for the dynamics of the epidemic curve. Altogether, we conclude that in emerging and time-critical outbreaks, nowcasting approaches are a valuable tool to gain information about current trends.


1 Introduction

During May–July 2011, Germany was confronted with a large outbreak of gastrointestinal disease caused by Shiga toxin-producing Escherichia coli (STEC) O104:H4 associated with sprout consumption. A total of 2987 cases of diarrhea without the hemolytic uremic syndrome (HUS) complication and 855 cases of HUS were attributable to the outbreak, making this one of the largest STEC outbreaks ever reported (Buchholz et al., 2011; Frank et al., 2011). During the outbreak, it was vital to have daily information on current epidemic trends in order to judge whether the outbreak was ongoing, assess the impact of control measures, and perform capacity planning. However, such real-time tracking is complicated by the inherent delay in public health reporting systems between the occurrence of the event, for example, time of symptom onset or hospitalization, and the time the report becomes available in the public health surveillance database. Following Donker et al. (2011), such delay-adjusting tracking procedures are called nowcasts in the public health setting.

We address the nowcasting task in the statistical framework of the occurred-but-not-reported-events problem (Lawless, 1994). Here, estimation of the delay distribution takes the inherent right-truncation of the data generating process into account. The problem originates from actuarial science, where it is known as claims reserving modeling (England and Verrall, 2002), but has also found application in a biostatistical context when analyzing the AIDS/HIV epidemic (Brookmeyer and Damiano, 1989; Kalbfleisch and Lawless, 1989; Zeger, See, and Diggle, 1989). Applications can also be found in noninfectious disease modeling such as cancer registry data (Midthune et al., 2005) or mortality monitoring (Lin, Yip, and Huggins, 2008). Besides the already mentioned use during the 2009 H1N1 influenza pandemic in Donker et al. (2011), nowcasting procedures have also been used to assess excess mortality during heatwaves (Green et al., 2012) and, in a somewhat different form, to perform influenza-like illness surveillance (Nunes, Natario, and Lucilia Carvalho, 2013).

Figure 1. Daily number of hospitalizations due to HUS during the outbreak as available in retrospect. Also shown in dark gray is the number of hospitalization reports available at the RKI as of 2011-06-02 (indicated by the crosshairs symbol).

Figure 1 shows the daily number of HUS hospitalizations during the outbreak; 658 of the 827 (i.e., 79.6%) now confirmed HUS cases have information on the date of hospitalization available. Since hospitalization can be assumed for HUS cases, missingness merely reflects that the event occurrence was not recorded, for example, because it is not a mandatory reporting field or because it was not possible to ascertain with sufficient precision (for the 169 HUS cases without hospitalization date, 101 were recorded as having been hospitalized, 29 had missing information on this matter, and only 39 were indicated as not having been hospitalized). Also shown in the figure is the number of hospitalizations available in the central German surveillance database SurvNet@RKI as of 2011-06-02. The discrepancy between the two curves is due to reporting delay: the notification from the hospital goes to the local health department, from which it is then forwarded to the Robert Koch Institute (RKI) via the state health department. The goal of nowcasting is to predict the true counts from the currently available counts. The corresponding animation of available reports as a function of time (in days) in the period 2011-05-01 to 2011-07-06 can be found as Web Animation 1.

The aim of our work is—in hindsight—to evaluate the quality of the nowcasting procedures on HUS hospitalizations during the outbreak. We chose the date of hospitalization for HUS as the target, since it constituted a more reliable and outbreak specific indicator compared to, for example, onset of symptoms in all STEC cases. In the present work, we propose a novel Bayesian hierarchical model for the nowcasting problem, which better reflects uncertainty due to both estimation of the delay distribution and the count data nature of the hospitalization data. A unique feature of our work is that, on the basis of the full epidemic curve, we are able to retrospectively perform a quantitative evaluation of different nowcast schemes based on proper scoring rules (Czado, Gneiting, and Held, 2009). This allows us to address both aspects of sharpness and calibration of the nowcasts and permits a ranking of the procedures.

Our work is structured as follows: Section 'Nowcasting' introduces nowcasting and its mathematical notation. Section 'Time Homogeneous Delay Distribution' considers frequentist and Bayesian nowcasting when the delay distribution can be assumed to be time-homogeneous, whereas Section 'Joint Bayesian Modeling of Epidemic Curve and Delay Distribution' contains a joint Bayesian modeling of both time-varying delay distribution and epidemic curve. Proper scoring rules, which we use to evaluate the predictive quality of the nowcasts, are explained in Section 'Evaluating the Nowcasts'. In Section 'Nowcasting during the STEC O104:H4 Outbreak', all methods are applied and evaluated on the STEC outbreak data. Finally, a discussion of the results and an outlook in Section 'Discussion' completes the work.

2 Nowcasting

We adapt the notation of Lawless (1994) to describe the nowcasting problem. Let inline image be the number of cases that occur on Day t and become available with a delay of d days, that is, the case reports arrive on Day inline image. Here, the indices span inline image, with T being the index of the current day, that is, now, and inline image. It is typical to assume that delays can occur only up to a maximum of D days; for example, because larger delays cannot be estimated reliably or because cases with a longer delay provide information about times too far in the past to be relevant. Hence, the last category corresponding to D days often covers the case “D days or more.” We shall return to this issue in Section 'Nowcasting during the STEC O104:H4 Outbreak'. While nowcasting can be performed repeatedly for different “nows” as described in Sections 'Evaluating the Nowcasts' and 'Nowcasting during the STEC O104:H4 Outbreak', the current description focuses on a single “now” T. Figure 2 illustrates the reporting triangle of inline image's emerging from the progress of time and delay. In particular, the numbers inline image are unknown when inline image. Hence, we shall denote the observed data as inline image, where

  • display math

is the right-angled trapezoidal observation region at time T when using a moving window of size m.

Figure 2. Reporting triangle at time T. The right-angled trapezoid spanned by inline image, and inline image defines the available observations. Delays larger than D are not considered. The shaded box inline image indicates the observations, which at time t are not yet available.

Denote by inline image the number of cases, which occurred on t and which are reported until time T. The aim of nowcasting is to predict the total number of cases occurred on day t, inline image, given the information available at time T. Formally, this number is inline image and consequently is the currently missing part inline image.
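To make the reporting triangle concrete, the following minimal Python sketch tabulates cases by day of occurrence and delay and marks the cells that cannot yet be observed at time T. The function names and the toy case list are hypothetical and only illustrate the bookkeeping; cells with t + d > T are stored as None, and delays above D are pooled into the last category as described above.

```python
def reporting_triangle(cases, T, D):
    """Tabulate cases by occurrence day t (rows) and delay d (columns).

    cases: iterable of (occurrence_day, report_day) integer pairs.
    Cells with t + d > T are not yet observable at time T and are set to None.
    """
    tri = [[0 if t + d <= T else None for d in range(D + 1)]
           for t in range(T + 1)]
    for t, r in cases:
        if 0 <= t <= r <= T:              # the report has arrived by "now"
            tri[t][min(r - t, D)] += 1    # last category: "D days or more"
    return tri


def observed_total(row):
    """Number of cases of one occurrence day that are reported by time T."""
    return sum(c for c in row if c is not None)


# Toy data (hypothetical): (day of occurrence, day of report).
cases = [(0, 0), (0, 1), (1, 3), (3, 3), (3, 4), (4, 4), (4, 6)]
tri = reporting_triangle(cases, T=4, D=2)
totals = [observed_total(row) for row in tri]

for t, row in enumerate(tri):
    print(t, row, totals[t])
```

The case occurring on Day 4 and reported on Day 6 is absent from the triangle at T = 4; it is exactly this kind of missing count that a nowcast has to predict.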

The reporting delay (in days) of a case occurring at time t follows a distribution with probability mass function (PMF) inline image, inline image, where inline image. Considering time t and conditioning on the number of cases inline image observed by time T, the observed inline image's in the reporting triangle due to the right-truncation have a multinomial distribution with size inline image and cell probabilities inline image for inline image.

If one, as in Kalbfleisch and Lawless (1989) or Zeger et al. (1989), assumes an underlying inhomogeneous Poisson process on the occurrence of new cases such that inline image, the above conditional formulation of the inline image's originating from multinomial sampling can be reformulated as an unconditional likelihood corresponding to an incomplete contingency table

  • display math(1)

Nowcasting can thus be divided into steps of determining the inline image's, the sequence of inline image's and finally predicting the unobserved inline image's in order to compute the total inline image.

We shall address the problem in Section 'Time Homogeneous Delay Distribution' by first assuming a time-homogeneous delay distribution such that inline image for all time points within the moving window. The starting point will be the work of Lawless (1994), which uses the conditional formulation to estimate the delay distribution without any assumptions about the distribution of the inline image's. In a second step, we extend the delay distribution modeling in Section 'Joint Bayesian Modeling of Epidemic Curve and Delay Distribution' to a discrete survival modeling context, which allows us to handle changes in the delay distribution over time. Here, the unconditional formulation becomes more natural to use, since the estimation of the epidemic curve and the delay can be formulated as a joint problem in the context of hierarchical log-linear modeling. This also allows us to impose smoothness constraints on the inline image's by penalized splines.

3 Time Homogeneous Delay Distribution

For the current “now” T, we will in this section assume time-homogeneity of the delay distribution within inline image, that is, we only take into account those cases that occurred during the last inline image days. In other words, we use as lower nodes of the right-angled trapezoid the observations inline image and inline image in Figure 2. For the further derivation of the delay estimation, we follow the notation in Lawless (1994) by introducing

  • display math

that is, inline image denotes the rectangle spanned by the observations inline image, inline image, inline image, and inline image. This comprises all considered cases with delay less than or equal to d. Furthermore, inline image denotes the right edge of this rectangle. This corresponds to all considered cases with delay equal to d. Section B.1 in the Web Appendix contains an example calculation of these quantities for a specific reporting triangle.

Since at time T we observe a right-truncated version of the delay distribution, we define the reverse-time discrete hazard function (Lagakos et al., 1988; Kalbfleisch and Lawless, 1991; 1994),

  • display math

where inline image denotes the cumulative distribution function (CDF) of the delay distribution. We note that inline image. The CDF and PMF of the delay distribution are now given as

  • display math(2)
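As an illustration of how a reverse-time hazard determines the delay distribution, the sketch below assumes the standard relations g(d) = P(delay = d | delay ≤ d), F(d) = ∏_{k>d} (1 − g(k)) and p(d) = g(d) F(d), with g(0) = 1 and F(D) = 1. These are the usual definitions for right-truncated discrete data; the function name and the example hazard values are hypothetical.

```python
def delay_dist_from_reverse_hazard(g):
    """Convert a reverse-time hazard into the CDF and PMF of the delay.

    g[d] = P(delay = d | delay <= d) for d = 0, ..., D, where g[0] must be 1.
    Returns (F, p) with F[d] = P(delay <= d) and p[d] = P(delay = d).
    """
    D = len(g) - 1
    F = [1.0] * (D + 1)                    # F[D] = 1: no delays beyond D
    for d in range(D - 1, -1, -1):
        F[d] = F[d + 1] * (1.0 - g[d + 1])
    p = [g[d] * F[d] for d in range(D + 1)]
    return F, p


# Hypothetical hazards for D = 2.
g = [1.0, 0.5, 0.25]
F, p = delay_dist_from_reverse_hazard(g)
print(F)  # [0.375, 0.75, 1.0]
print(p)  # [0.375, 0.375, 0.25]
```

Working backwards from the longest delay is what makes this parameterization convenient under right-truncation: the hazard at delay d only involves cases whose delay is known to be at most d.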

3.1 Frequentist Nowcasting

For the reader's convenience, in this section we recapitulate and discuss existing results on performing nowcasting for right-truncated delays. Lawless (1994) showed that the maximum likelihood estimate (MLE) for inline image when inline image is inline image. The MLE for the delay distribution can now be obtained by plugging these estimates into (2). Note that if the window width m is very close to the maximum delay D, the MLE for the long delays may be based on few observations. Based on the MLE, Lawless (1994) presented the following nowcast procedure

  • display math

which can be linked to inverse probability weighting. This link also provides solutions for the case when inline image is zero (Elliot and Little, 1999). If inline image, that is, no events are reported to have occurred at t by time T, then inline image equals zero. Otherwise, the predictive distribution can be computed based on an asymptotic normal approximation for the prediction error. This is done by computing the variance of inline image. From this, a inline image prediction interval for inline image is inline image, where inline image is the inline image quantile of the standard normal distribution. We shall obtain a discrete predictive distribution for inline image by discretizing this Gaussian predictive distribution: we take the difference of the CDF evaluated at the integers and attribute all mass below inline image to inline image.
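The frequentist procedure can be sketched in a few lines of Python. The sketch rests on several assumptions that are not taken from the text: the estimation window covers all days 0, ..., T; the hazard estimate is the ratio "cases with delay equal to d" over "cases with delay at most d" among days whose delay-d cell is observable; a zero denominator falls back to a hazard of zero; and the point nowcast divides the observed count by the estimated probability of being reported so far (inverse probability weighting). The standard deviation passed to the discretization is a hypothetical input, since the variance formula is not reproduced here.

```python
import math


def truncation_adjusted_mle(tri, T, D):
    """Reverse-time hazard estimates and the implied delay CDF.

    tri[t][d] is the count for occurrence day t and delay d; only cells with
    t + d <= T are read.
    """
    g_hat = [1.0] * (D + 1)
    for d in range(1, D + 1):
        rows = range(0, T - d + 1)                   # delay-d cell is observed
        n_d = sum(tri[t][d] for t in rows)           # delay equal to d
        N_d = sum(tri[t][k] for t in rows for k in range(d + 1))  # delay <= d
        g_hat[d] = n_d / N_d if N_d > 0 else 0.0     # fallback is an assumption
    F_hat = [1.0] * (D + 1)
    for d in range(D - 1, -1, -1):
        F_hat[d] = F_hat[d + 1] * (1.0 - g_hat[d + 1])
    return g_hat, F_hat


def point_nowcast(tri, t, T, D, F_hat):
    """Observed count so far, inflated by the estimated reporting probability."""
    d_max = min(T - t, D)
    observed = sum(tri[t][d] for d in range(d_max + 1))
    return observed / F_hat[d_max]


def discretized_normal_pmf(mean, sd, lower, upper):
    """Differences of the normal CDF at the integers; mass below `lower` is
    attributed to `lower`, mass above `upper` is dropped."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))
    pmf = {lower: cdf(lower)}
    for k in range(lower + 1, upper + 1):
        pmf[k] = cdf(k) - cdf(k - 1)
    return pmf


# Toy triangle (hypothetical), T = 3, D = 2; unobserved cells hold 0 and are
# never read by the functions above.
tri = [[2, 1, 1],
       [1, 2, 1],
       [3, 1, 0],
       [2, 0, 0]]
g_hat, F_hat = truncation_adjusted_mle(tri, T=3, D=2)
nowcast_today = point_nowcast(tri, t=3, T=3, D=2, F_hat=F_hat)
pmf = discretized_normal_pmf(mean=nowcast_today, sd=1.5, lower=2, upper=30)
print(g_hat, F_hat, nowcast_today)
```

For the toy triangle, the hazards are 4/10 at delay 1 and 2/8 at delay 2, so only an estimated 45% of the cases occurring today are reported today and the two observed cases are inflated to about 4.4.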

3.2 Bayesian Nowcasting

As a competitor to the above method, we propose a novel Bayesian hierarchical model to directly address the combination of delay estimation and count data nature of the forecast. In particular, we will show that the generalized Dirichlet (GD) distribution is a conjugate prior-posterior distribution for the reversed delay under right-truncated multinomial sampling.

The density of the inline image distribution (Wong, 1998) with parameters inline image and inline image is proportional to

  • display math

where the probabilities inline image and inline image are all nonnegative and sum to one. Here, inline image for inline image and inline image. In the above, inline image denotes the reverse delay order, that is, where delay d is mapped to the new scale inline image in analogy with the reverse time hazard function.
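A draw from a GD distribution can be generated by stick-breaking with independent beta variables, which is the construction underlying the density above. The following minimal sketch assumes the usual parameterization with vectors alpha and beta of equal length k; it does not reproduce the reverse delay ordering used in the text, and the parameter values are hypothetical.

```python
import random


def sample_generalized_dirichlet(alpha, beta, rng):
    """Draw a probability vector of length k + 1 from GD(alpha, beta).

    Stick-breaking: Z_j ~ Beta(alpha_j, beta_j) independently, and
    p_j = Z_j * prod_{i<j} (1 - Z_i); the last component is what remains.
    """
    p, remaining = [], 1.0
    for a, b in zip(alpha, beta):
        z = rng.betavariate(a, b)
        p.append(z * remaining)
        remaining *= 1.0 - z
    p.append(remaining)
    return p


rng = random.Random(1)
alpha = [2.0, 3.0, 1.5]      # hypothetical parameters, k = 3
beta = [6.0, 4.0, 2.0]
draw = sample_generalized_dirichlet(alpha, beta, rng)
print(draw, sum(draw))
```

With beta_j equal to the sum of the Dirichlet parameters of all later components, the same construction yields an ordinary Dirichlet draw.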

Property 3 in the Web Appendix shows that the posterior of inline image under right-truncated multinomial sampling of Section 'Nowcasting' is again a GD distribution with parameters inline image and inline image,

  • display math

With the above GD posterior, the marginal posterior expectation for inline image when using the improper prior inline image can be shown to be equal to the ML estimate defined in the previous section (see Web Appendix).

Note that the posterior in this case is only proper if inline image and inline image for all inline image. In practice, we shall use the informative prior inline image, inline image and inline image, inline image, where inline image is a known constant describing the prior concentration. In this case, the GD prior reduces to a symmetric Dirichlet distribution with concentration parameter inline image (see Wong (1998) or Web Appendix). A property of this prior is that inline image for inline image. Note that if one uses a Dirichlet prior and pretends that the column sums in the reporting triangle originate from multinomial sampling, then the posterior is again Dirichlet (and hence also GD) and represents a Bayesian estimate ignoring right-truncation (see Web Appendix).

Assume now that, by conjugate prior-posterior updating, we have obtained an estimate of the delay distribution, that is, inline image. In order to predict the unknown inline image from the incomplete inline image, we assume the following model hierarchy for inline image

  • display math

where inline image is the proportion reported within a delay of inline image days and inline image are known constants. In this hierarchy, the marginal (prior) distribution of inline image is negative binomial with mean inline image and variance inline image. Furthermore, given inline image the distribution of inline image is compound binomial-negative binomial. The marginal posterior for inline image, inline image, is

  • display math(3)

where by application of Bayes' theorem and using inline image

  • display math(4)

for inline image and zero otherwise. Note that inline image in (3) is not available in analytic form but we know that inline image due to previous developments. We hence solve the integration in (3) by Monte Carlo sampling jointly inline image for all inline image using the following algorithm.

  1. For inline image:
    1. Draw inline image by the algorithm of Wong (1998) and calculate inline image for inline image.
    2. Calculate the nonnormalized density inline image for inline image and inline image, where inline image is sufficiently large.
    3. Set inline image, where inline image is the normalization constant.
  2. Approximate inline image for inline image and inline image.
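The steps above can be sketched as follows for a single day t. The sketch assumes a gamma-Poisson-binomial hierarchy: a Gamma(a, b) prior (shape a, rate b) on the expected number of cases, a Poisson total given this expectation, and a binomial number of already reported cases with success probability equal to the proportion reported so far. The hyperparameter names a and b, the upper bound n_max and the toy inputs are hypothetical. In particular, the draws of the reported proportion come from a beta distribution purely as a stand-in; in the procedure described above they would be computed from GD posterior draws of the delay distribution.

```python
import math
import random


def log_unnormalized_posterior(n, n_obs, F, a, b):
    """log of Binomial(n_obs | n, F) times NegBin(n | a, b) for n >= n_obs."""
    F = min(max(F, 1e-12), 1.0 - 1e-12)          # guard the logarithms
    log_binom = (math.lgamma(n + 1) - math.lgamma(n_obs + 1)
                 - math.lgamma(n - n_obs + 1)
                 + n_obs * math.log(F) + (n - n_obs) * math.log1p(-F))
    log_negbin = (math.lgamma(n + a) - math.lgamma(a) - math.lgamma(n + 1)
                  + a * math.log(b / (1.0 + b)) - n * math.log1p(b))
    return log_binom + log_negbin


def nowcast_pmf(n_obs, F_draws, a, b, n_max):
    """Average the normalized PMF over the K draws of the reported proportion."""
    support = list(range(n_obs, n_max + 1))
    avg = [0.0] * len(support)
    K = len(F_draws)
    for F in F_draws:
        logw = [log_unnormalized_posterior(n, n_obs, F, a, b) for n in support]
        m = max(logw)
        w = [math.exp(x - m) for x in logw]      # stabilized exponentiation
        c = sum(w)                               # normalization constant
        for i, wi in enumerate(w):
            avg[i] += wi / (c * K)
    return dict(zip(support, avg))


rng = random.Random(42)
F_draws = [rng.betavariate(30, 10) for _ in range(200)]   # stand-in draws
pmf = nowcast_pmf(n_obs=12, F_draws=F_draws, a=2.0, b=0.1, n_max=100)
mode = max(pmf, key=pmf.get)
print(mode, sum(pmf.values()))
```

Because each draw contributes an entire normalized probability vector rather than a single sampled count, the averaged PMF is smooth even for moderate K.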

Since the Monte Carlo sampling in inline image is based on entire probability vectors and not just samples, only a small K, say 100 or 1000, is needed to obtain accurate results for the multivariate integration. Altogether, our Monte Carlo based procedure is fast, accurate and easy to use in practice.

4 Joint Bayesian Modeling of Epidemic Curve and Delay Distribution

In outbreak situations there is often prior knowledge about the shape of the epidemic curve (here measured as daily number of hospitalizations): typically, one expects the curve to initially increase, level off, and then decrease. An advantage of such shape assumptions is that they can help to improve the estimation of the delay distribution and hence the nowcasts (Kalbfleisch and Lawless, 1989). However, for foodborne outbreaks such shape assumptions are only partially applicable: the epidemic curve is very much dominated by how the contaminated food item is distributed, its shelf life and consumer awareness. Person-to-person transmission often only plays a minor role. Hence, one of the concerns during the STEC outbreak was whether several waves could occur. As a consequence, our modeling of the epidemic curve is restricted to a semi-parametric approach imposing some degree of smoothness.

4.1 Epidemic Curve

In Section 'Bayesian Nowcasting', we used inline image as prior for the expected number of cases. Instead, we now model inline image as a quadratic spline based on the truncated power (TP) basis with knots placed at inline image to describe the evolution in inline image. We place inline image knots evenly between time zero and T. In order to avoid over-extrapolations at the end, where uncertainty is large, we remove any knots which are between time inline image and T. Then,

  • display math(5)

The first part of (5) is a global 2nd order polynomial reflecting that the epidemic curve initially rises, levels off, and then decreases again. The second part contains terms representing the deviation from this global polynomial by having the 2nd order terms change at each knot. Automatic smoothing can be obtained by treating the inline image's as independent random effects, that is, inline image with the amount of smoothing being proportional to inline image. By setting up design matrices inline image and inline image it becomes obvious that this corresponds to a Poisson mixed model, which can be handled in a Bayesian framework by appropriate specification of priors (Fahrmeir et al., 2013, chap. 8.1.9).
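The structure of the quadratic truncated power basis can be illustrated with a short sketch. It assumes that the spline is placed on the log scale of the expected counts (the natural choice for a Poisson mixed model), that "evenly placed" means equally spaced interior points, and that knots within a fixed distance of T are dropped; the function names, the number of knots, the cutoff and the coefficient values are all hypothetical.

```python
def place_knots(T, n_knots, cutoff):
    """Equally spaced interior knots on (0, T); knots later than T - cutoff
    are removed to avoid over-extrapolation near "now"."""
    knots = [T * j / (n_knots + 1) for j in range(1, n_knots + 1)]
    return [k for k in knots if k <= T - cutoff]


def tp_quadratic_basis(times, knots):
    """Design rows [1, t, t^2, (t - k_1)_+^2, ..., (t - k_J)_+^2]."""
    return [[1.0, t, t * t] + [max(t - k, 0.0) ** 2 for k in knots]
            for t in times]


def log_expected_counts(times, knots, beta, b):
    """Global quadratic (coefficients beta) plus knot-wise deviations (b)."""
    coef = list(beta) + list(b)
    return [sum(x * c for x, c in zip(row, coef))
            for row in tp_quadratic_basis(times, knots)]


knots = place_knots(T=30, n_knots=5, cutoff=7)
row = tp_quadratic_basis([5.0], [2.0, 4.0, 6.0])[0]
curve = log_expected_counts([10.0], knots, beta=[0.0, 0.3, -0.01],
                            b=[0.0] * len(knots))
print(knots)   # [5.0, 10.0, 15.0, 20.0]
print(row)     # [1.0, 5.0, 25.0, 9.0, 1.0, 0.0]
print(curve)
```

With all deviation coefficients set to zero the curve reduces to the global quadratic; shrinking the deviations toward zero through a random effects distribution is what produces the smoothing.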

4.2 Delay Distribution Modeling

In Section 'Time Homogeneous Delay Distribution', we assumed the same delay distribution for all times t. To generalize this, we instead use a discrete time survival regression model to describe the truncated delay distribution. We shall use a model which combines the nonparametric-like feature of the Dirichlet distribution (i.e., a parameter for each delay time) with parametric features for modeling changes in the distribution. Specifically, we model the discrete time hazard inline image, inline image, for a case occurring at time t as

  • display math

Here, inline image denotes a inline image vector of covariates depending on time and delay and inline image are the corresponding covariate effects. The corresponding delay distribution is then computed as

  • display math(6)

As an example: On May 23, 2011, there was an intervention toward faster reporting, where local and state health authorities were asked to communicate suspected HUS cases directly and without the otherwise necessary quality checking. This situation could now be represented by a single change-point such that inline image. Extension of the above modeling to include day-of-the-week effects or case specific covariates, for example, age or county of residence, is immediate.
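How a discrete time hazard model with a change-point translates into a delay distribution can be sketched as follows. The sketch assumes a logit link, one baseline parameter per delay d = 0, ..., D − 1, and a hazard of one at the maximum delay D so that the probabilities sum to one. The change-point covariate is assumed here to switch on once the potential report day t + d reaches the change-point; an indicator depending on t alone would be an equally plausible reading. All names and parameter values are hypothetical.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def changepoint_covariate(t, d, t_cp):
    """Single indicator: 1 if the potential report day is at or after t_cp."""
    return [1.0 if t + d >= t_cp else 0.0]


def delay_pmf(gamma, eta, t, t_cp):
    """Delay PMF for a case occurring on day t.

    gamma: baseline logit hazards for delays 0, ..., D - 1.
    eta:   covariate effects (here a single change-point effect).
    """
    pmf, surv = [], 1.0
    for d, g in enumerate(gamma):
        w = changepoint_covariate(t, d, t_cp)
        h = expit(g + sum(e * x for e, x in zip(eta, w)))
        pmf.append(h * surv)          # reported at delay d, not earlier
        surv *= 1.0 - h
    pmf.append(surv)                  # hazard at delay D is set to one
    return pmf


gamma = [-1.0, -1.0, -1.0]            # hypothetical baseline, D = 3
eta = [1.0]                           # positive effect: faster reporting
pmf_before = delay_pmf(gamma, eta, t=5, t_cp=10)
pmf_after = delay_pmf(gamma, eta, t=12, t_cp=10)
mean_before = sum(d * p for d, p in enumerate(pmf_before))
mean_after = sum(d * p for d, p in enumerate(pmf_after))
print(pmf_after, mean_before, mean_after)
```

A positive change-point effect raises the hazard of being reported at each delay and therefore shifts probability mass toward short delays.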

4.3 Implementation of the Hierarchical Bayesian Nowcasting Model

Multivariate normal priors on inline image and inline image for the model (6) were combined with the spline model in (5) resulting in a Bayesian hierarchical Poisson model for nowcasting based on (1). Inference per T was done using MCMC as implemented in JAGS (Plummer, 2003) using the runjags package (Denwood, 2013). Three separate chains were run in parallel on three CPU cores. To obtain better convergence, the time covariate was centered as inline image in the TP spline. Furthermore, the number of inline image's in the discrete survival model was halved by using an alternative parameter vector inline image of length inline image and then assuming inline image in order to stabilize and speed up the computations.

5 Evaluating the Nowcasts

We assess the quality of the nowcasts by using proper scoring rules for count data (Czado et al., 2009) while comparing with the complete data at the end of the outbreak. Such scoring rules have become the standard way of evaluating probabilistic forecasts in count data time series applications, because they address both the aspect of calibration and sharpness; see, for example, Paul and Held (2011) for an application in infectious disease modeling. Here, calibration refers to the consistency between the distributional forecast and the observation that materializes. Sharpness refers to the concentration of the forecast distribution.

We denote by inline image the set of days at which to do the nowcasting. For each inline image, we consider the k time points inline image, where l is the first lag for which the forecasts are considered reliable and inline image is the last lag at which nowcasts are considered relevant. In our case, we will use inline image and inline image since nowcasts for delays between 0–2 days were too dominated by chance to be communicable.

Considering a specific time T and a single inline image, let inline image be the predictive distribution for time inline image based on the information available at T and with inline image being the true value. In the evaluations we will use the logarithmic score (logS)

  • display math

where inline image is the PMF of the predictive distribution inline image. This rule is very intuitive—the higher the probability of the forecast distribution for the actual observed value the better. However, the rule is local, because no credit is given for high forecast probabilities near the true value. The ranked probability score (RPS)

  • display math

is a less local proper scoring rule, since comparison between the CDF of the predictive distribution, inline image, and the empirical one-value CDF happens over the entire support. If the predictive distribution is calculated by tabulating samples, for example, MCMC output, another disadvantage of logS is that it becomes infinite if no samples are equal to the true value. This can quickly happen if the true predictive probability is low.
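Both scores are easy to compute for a tabulated predictive PMF. The sketch below assumes the standard definitions for count data, that is, logS as the negative log probability of the observed value and RPS as the sum over k of the squared difference between the predictive CDF at k and the indicator that the observed value is at most k. The function names and the toy PMF are hypothetical.

```python
import math


def log_score(pmf, x):
    """Negative log predictive probability of the observed value x."""
    p = pmf.get(x, 0.0)
    return math.inf if p == 0.0 else -math.log(p)


def ranked_probability_score(pmf, x):
    """Sum over k of (F(k) - 1{x <= k})^2, up to the largest relevant count."""
    upper = max(max(pmf), x)
    cdf, total = 0.0, 0.0
    for k in range(upper + 1):
        cdf += pmf.get(k, 0.0)
        total += (cdf - (1.0 if x <= k else 0.0)) ** 2
    return total


pmf = {0: 0.2, 1: 0.5, 2: 0.3}          # toy predictive distribution
logs_hit = log_score(pmf, 1)            # -log(0.5)
rps_hit = ranked_probability_score(pmf, 1)
logs_miss = log_score(pmf, 5)           # value outside the tabulated support
rps_miss = ranked_probability_score(pmf, 5)
print(logs_hit, rps_hit, logs_miss, rps_miss)
```

The last two values illustrate the difference in locality: an observation outside the tabulated support gives an infinite logS, whereas the RPS remains finite and grows with the distance between forecast and observation.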

The output of individual scoring rules can be combined by averaging. For example, the mean score of a specific nowcast method at lag d is

  • display math

where inline image is the scoring rule. An alternative is to perform the summation over the lags instead of the elements in inline image; one then obtains a mean-score per T, that is, inline image. In some cases one wants to average over all scores, that is, calculate inline image. Note that in such overall mean calculations, the same true value inline image enters once for each inline image, where t is included, that is, up to k times. Furthermore, one should be aware that averaging over all lags means that one considers nowcasts far away from “now” to be just as important as nowcasts about days close to “now”. In practice, one might be more interested in getting good predictions about the near past.
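The three ways of averaging can be sketched with a dictionary of scores indexed by nowcast day and lag. The evaluation period 2011-06-02 to 2011-07-04 is taken from Section 'Nowcasting during the STEC O104:H4 Outbreak'; the lags 3, ..., 12 are an assumed reading that is consistent with lags 0–2 being excluded and with the 330 nowcasts reported there. The score values are dummies (each score is simply set to its lag) and carry no empirical meaning.

```python
from datetime import date, timedelta
from statistics import mean

first, last = date(2011, 6, 2), date(2011, 7, 4)
nowcast_days = [first + timedelta(days=i)
                for i in range((last - first).days + 1)]
lags = range(3, 13)                     # assumed: first lag 3, ten lags

# Dummy scores indexed by (nowcast day T, lag d); not real results.
scores = {(T, d): float(d) for T in nowcast_days for d in lags}


def mean_scores(scores):
    """Mean score per lag, per nowcast day, and overall."""
    by_lag, by_day = {}, {}
    for (T, d), s in scores.items():
        by_lag.setdefault(d, []).append(s)
        by_day.setdefault(T, []).append(s)
    return ({d: mean(v) for d, v in by_lag.items()},
            {T: mean(v) for T, v in by_day.items()},
            mean(list(scores.values())))


per_lag, per_day, overall = mean_scores(scores)
print(len(nowcast_days), len(scores), overall)   # 33 330 7.5
```

A weighted overall mean that gives more weight to short lags would be a straightforward variation if predictions about the near past matter most.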

6 Nowcasting during the STEC O104:H4 Outbreak

We evaluate the quality of different nowcast procedures during the outbreak based on the date of hospitalization for HUS, because this proved to be a reliable and specific indicator for outbreak cases. Evaluation starts on 2011-06-02, which was the day of the first nowcast produced during the outbreak (Robert Koch Institute, 2011), and continues to 2011-07-04, when the outbreak was officially declared to be over. For each day, all currently available cases are included for the estimation of the delay distribution. Delays up to inline image days are considered; if any of the available cases have a longer delay, we set their delay to the upper limit of 15 days. This occurred for 11.9% of the 658 HUS cases with an available hospitalization date. Note: The specific choice of inline image is motivated by a combination of a median delay of 5–6 days, estimation robustness and a lack of relevance in adjusting total hospitalization counts too far back. However, information about cases with very long delays is not lost in the delay distribution estimation, since the last category corresponds to “Delay inline image.” If the interest had been in adjusting counts further back, one could have used a larger D, but would then have had to categorize delays in coarser time units (e.g., as in Section 'Implementation of the Hierarchical Bayesian Nowcasting Model') to ensure stability of the estimation.

Retrospectively, based on all cases, it was possible to assess the delay distribution inline image for each day t by tabulating the delays for all cases occurring at time t in the final reporting triangle. Figure 3 contains the daily median of the smoothed empirical delay distribution obtained from tabulating all cases occurring within a moving window of inline image.

Figure 3. Median delay (as well as 10% and 90% quantiles) of the smoothed empirical delay distribution as a function of time of occurrence. Also shown is the resulting median when the posterior mean of the single change-point model (fitted on 2011-07-04) is inserted into (6).

We observe a distinct delay reduction due to the intervention on May 23, 2011. This reduction is well captured by the changepoint model. The plot also reveals some unexpected variation in the delay distribution between June 10 and June 20, 2011. However, since these estimates are based on just a few cases and no a priori explanations exist for these changes, we decided not to add extra changepoints in the delay distribution model. Daily estimates of the delay distribution from the bayes.trunc.ddcp procedure are found as Web Animation 2. Furthermore, daily ML estimates of the delay distribution with and without right-truncation adjustment are illustrated in Figure 2 of the Web Appendix.

As an example of how the delay estimates are translated into nowcasts for the number of hospitalizations, Figure 4 shows the predictive distribution of inline image when T= 2011-06-02 and t= 2011-05-28 for the Lawless approach and the three Bayesian procedures: GD-NegBin Bayes ignoring truncation (abbreviated: bayes.notrunc), truncation adjusted GD-NegBin Bayes (bayes.trunc) and the hierarchical Bayes procedure (bayes.trunc.ddcp) with delay distribution changepoint and smoothed epidemic curve. Furthermore, we chose inline image in the PMF computation of the non-MCMC Bayes procedures.

Table 1. Mean score for the period from 2011-06-02 to 2011-07-04
              bayes.notrunc  lawless  bayes.trunc  bayes.trunc.ddcp    unif
logS                   2.28      Inf         2.02               Inf    5.69
RPS                    1.79    15.23         1.77              1.39   95.10
dist.median            2.37     2.50         2.51              1.87  143.94
outside.ci             0.07     0.16         0.05              0.09    0.84

Figure 4. Predictive distribution of the number of hospitalizations inline image when T= 2011-06-02 and t= 2011-05-28, that is, the considered delay is inline image days. The true number of hospitalizations is inline image.

An animation showing the median and 95% credibility interval of the nowcasts obtained with the bayes.trunc.ddcp procedure is shown in Web Animation 3. In the animation, the delay distribution is re-estimated for each day based on all currently available observations. The animation also contains a plot of the posterior median of inline image and an equal tailed 95% credibility interval, which gives a good indication of the current epidemic trend (up, stable, or down) and its associated uncertainty.

6.1 Nowcast Comparisons Using Scoring Rules

The following procedures are compared: the truncation-adjusted Lawless (1994) procedure and the three previously described Bayes procedures (bayes.notrunc, bayes.trunc, and bayes.trunc.ddcp). To form a baseline for our comparisons, we also assess the behavior of the discrete uniform (unif) on inline image as predictive distribution. We evaluate the procedures between 2011-06-02 and 2011-07-04 using inline image and inline image. Altogether, inline image nowcasts are computed.

To quantify the comparisons, we use four scores: the logarithmic score and the RPS as proper scoring rules, together with two more heuristic measures, namely the median of the absolute deviations and the proportion of times the observed value lay outside an equal-tailed 95% prediction interval. The mean score over all 330 nowcasts is given in Table 1.
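The scores above can be illustrated with a minimal sketch for a single nowcast. It assumes the predictive distribution is given as a PMF tabulated on 0, 1, ..., K with no mass beyond K. The function names and the example PMF are hypothetical, and the absolute distance between the predictive median and the observation is only our reading of the dist.median row.

```python
import math


def log_score(pmf, x):
    """Logarithmic score -log p(x); infinite when the PMF has zero mass at x."""
    p = pmf[x] if 0 <= x < len(pmf) else 0.0
    return math.inf if p == 0.0 else -math.log(p)


def rps(pmf, x):
    """Ranked probability score: sum over k of (F(k) - 1{x <= k})^2."""
    total, cdf = 0.0, 0.0
    for k, p in enumerate(pmf):
        cdf += p
        total += (cdf - (1.0 if x <= k else 0.0)) ** 2
    # Beyond the tabulated support F(k) = 1, so each k with
    # len(pmf) <= k < x contributes (1 - 0)^2 = 1.
    return total + max(0, x - len(pmf))


def quantile(pmf, q):
    """Smallest k whose cumulative probability reaches q."""
    cdf = 0.0
    for k, p in enumerate(pmf):
        cdf += p
        if cdf >= q:
            return k
    return len(pmf) - 1


def heuristic_scores(pmf, x, level=0.95):
    """Distance of the predictive median to x, and whether x lies outside
    the equal-tailed prediction interval (assumed reading of the text)."""
    alpha = (1.0 - level) / 2.0
    abs_dev = abs(quantile(pmf, 0.5) - x)
    outside = not (quantile(pmf, alpha) <= x <= quantile(pmf, 1.0 - alpha))
    return abs_dev, outside


# Hypothetical predictive PMF on 0..4 and a hypothetical observed count.
pmf = [0.1, 0.2, 0.4, 0.2, 0.1]
observed = 2
print(log_score(pmf, observed), rps(pmf, observed), heuristic_scores(pmf, observed))
```

Averaging such per-nowcast values over all evaluated time points and lags would give mean scores of the kind reported in Table 1; smaller values are better for all four measures.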

Based on RPS, bayes.trunc.ddcp has the overall best performance. It also becomes clear that the Lawless procedure does not perform well for the HUS outbreak data. This is partly because an observation of zero counts leads to a point mass forecast distribution at zero. As a consequence, the Lawless procedure has an infinite mean log-score. Altogether, the approximate Gaussian distribution of the predictive appears unwarranted in the small count data setting. The bayes.trunc.ddcp procedure also has an infinite mean log-score, because the predictive PMF is computed by tabulating 30,000 MCMC samples at each instance. Hence, in some situations where the probability is low, the tabulated PMF is zero at the observed value, which leads to an infinite log-score.
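The mechanism behind the infinite log-score of a tabulated predictive PMF can be shown with a small sketch. The list of draws below is hypothetical and far shorter than the 30,000 MCMC samples mentioned above, and the function names are ours.

```python
import math
from collections import Counter


def empirical_pmf(samples):
    """Tabulate samples into relative frequencies; unseen values get no entry."""
    counts = Counter(samples)
    n = len(samples)
    return {value: count / n for value, count in counts.items()}


def log_score(pmf, x):
    """Logarithmic score -log p(x), with p(x) = 0 for values never sampled."""
    p = pmf.get(x, 0.0)
    return math.inf if p == 0.0 else -math.log(p)


# Hypothetical posterior predictive draws of a count.
draws = [3, 4, 4, 5, 5, 5, 5, 6, 6, 7]
pmf = empirical_pmf(draws)

scores = [log_score(pmf, 5), log_score(pmf, 9)]  # 9 was never sampled
print(scores, sum(scores) / len(scores))
```

A single observation outside the sampled values is enough to make the mean log-score infinite, whereas the RPS stays finite because it is computed from the cumulative distribution.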

An investigation of the effect of the lag inline image on the RPS score is shown in the left panel of Figure 5. To further interpret this figure, we show corresponding results for simulated data, where the delay distribution is time-constant. We obtain the simulated data by using the actual event time for each case, but simulating the report time according to a PMF equal to the delay distribution of all 658 cases (delays exceeding 15 days were set to 15 days). The right panel in Figure 5 shows the corresponding results. We note that for the real outbreak data, the procedure ignoring truncation performs better for time points closer to “now” (typically the more interesting ones to nowcast) than the purely right-truncation adjusting method. For the simulated data, the right-truncation adjusting procedures perform better for all lags, which is a further indication that the delay distribution changed during the outbreak and that the bias of the nonadjusting procedures (too much weight on the short delays) coincides with the real delays getting shorter. Except for long lags in the actual data, bayes.trunc.ddcp outperforms the two other procedures; for the simulated data, this result emphasizes the effect of the epidemic curve on the nowcasts.
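The construction of the simulated data can be sketched as follows. This is a minimal illustration under stated assumptions: days are integers, the input delays and event days are hypothetical, and the function names are ours rather than part of the original analysis code.

```python
import random
from collections import Counter

MAX_DELAY = 15  # delays exceeding 15 days are set to 15 days, as in the text


def capped_delay_pmf(delays, max_delay=MAX_DELAY):
    """Empirical PMF of the observed delays after capping at max_delay."""
    capped = [min(d, max_delay) for d in delays]
    counts = Counter(capped)
    n = len(capped)
    return [counts.get(d, 0) / n for d in range(max_delay + 1)]


def simulate_report_days(event_days, pmf, rng):
    """Keep each event day and draw its delay from the time-constant PMF."""
    delays = rng.choices(range(len(pmf)), weights=pmf, k=len(event_days))
    return [t + d for t, d in zip(event_days, delays)]


# Hypothetical observed delays (in days) and event days.
observed_delays = [0, 2, 2, 3, 5, 7, 20, 40]
event_days = [1, 1, 2, 4, 4, 5, 8, 9]

pmf = capped_delay_pmf(observed_delays)
rng = random.Random(1)  # fixed seed so the sketch is reproducible
report_days = simulate_report_days(event_days, pmf, rng)
print(list(zip(event_days, report_days)))
```

Because every simulated delay comes from the same PMF regardless of the event day, any difference between the procedures on such data cannot be attributed to a changing delay distribution.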

Figure 5. Mean RPS Score by lag. Left panel shows the results for the actual data. For comparison, the right panel shows results for simulated data with stationary delay distribution.

As a final comparison, Figure 6 shows the mean RPS score as a function of the nowcasted time points in inline image. We observe high values of the mean score for the early time points, where the number of cases is high. Altogether, when averaging over the 10 included lags, the bayes.trunc.ddcp procedure appears to perform best. However, early in the outbreak the procedure is not superior, even though it is the only procedure adjusting for the change in delay distribution. A closer examination reveals that the combination of a flexible delay distribution model and the 2nd order polynomial leads to over-extrapolation in this situation. Stronger priors might be a way to obtain better results in this situation. Note also that the further the scoring moves away from May 23, 2011, the more the differences between the bayes.trunc and bayes.trunc.ddcp procedures are due to the penalized TP spline acting on the epidemic curve. Altogether, it appears that once the peak has been reached the hierarchical model works very well, while in the early phases it might contain too much flexibility.

Figure 6. Mean-score as a function of time T, where the nowcasting is performed.

In conclusion, ignoring truncation for the outbreak data is an ad hoc way to compensate for the fact that, early in the outbreak, delays reduced significantly. However, right-truncation adjusted procedures are to be recommended as proper adjustment methods. A simple but robust way to handle time-varying delay distributions with the methods from Section 'Time Homogeneous Delay Distribution' is to use a moving window in the delay distribution estimation. Figures 4 and 5 in the Web Appendix contain the results of the daily updated delay distribution estimate when using a moving window of 25 days. For comparison, Table 2 in the Web Appendix contains the mean scores when using a moving-window procedure of 25 days and Figure 6 in the Web Appendix shows the mean scores by lag. A small improvement is noticed for all Bayes procedures. However, a moving-window procedure remains a compromise between including only recent events and having enough events to obtain a reliable estimate. Altogether, the reporting dynamics of the STEC outbreak emphasize the need to handle time-varying reporting delay distributions, either using simple moving-window approaches or more complex regression models for the delay distribution. Furthermore, the results show that imposing a smoothness restriction on the epidemic curve as done in Section 'Joint Bayesian Modeling of Epidemic Curve and Delay Distribution' helps improve the nowcasts. At the early stages of the outbreak, when only few observations are available, such flexible models are, however, difficult to identify and the interplay between epidemic curve component and delay distribution component in the estimation is nontrivial.
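The moving-window idea can be sketched as a selection step that precedes the delay estimation. The sketch below makes several assumptions that are not taken from the text: days are integers, the window is defined on the event day, and the tabulated PMF is the naive estimate that ignores right-truncation (in the spirit of bayes.notrunc); a truncation-adjusted estimator would be applied to the same selected cases. All names and data are hypothetical.

```python
from collections import Counter


def window_cases(cases, now, window=25):
    """Keep cases already reported at `now` whose event day lies in the
    last `window` days. Each case is a pair (event_day, report_day)."""
    return [(t, r) for t, r in cases if r <= now and now - window < t <= now]


def naive_delay_pmf(cases, max_delay):
    """Tabulate delays of the selected cases, ignoring right-truncation."""
    if not cases:
        return None  # no events in the window, so nothing to estimate
    counts = Counter(min(r - t, max_delay) for t, r in cases)
    n = len(cases)
    return [counts.get(d, 0) / n for d in range(max_delay + 1)]


# Hypothetical (event_day, report_day) pairs and a hypothetical "now".
cases = [(0, 3), (10, 12), (30, 31), (38, 45), (39, 40)]
selected = window_cases(cases, now=40, window=25)
pmf = naive_delay_pmf(selected, max_delay=15)
print(selected, pmf)
```

The example also shows the compromise noted above: of five cases only two remain in the window, so the estimate tracks recent reporting behavior but rests on very few events.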

7 Discussion

Real-time tracking of the epidemic curve during the STEC/HUS outbreak was an important component of the outbreak management. The results of the nowcasts were circulated among the stakeholders at the RKI on a daily basis and were hence part of the body of evidence used to assess epidemiological trends each day. If a more qualitative output for the epidemic trend needs to be communicated, an additional decision procedure could categorize the nowcasts into, for example, the trend states “up,” “stable,” and “down.” Such answers are immediate from the penalized TP spline for inline image as visualized in Web Animation 3. Because the outbreak was a time-limited event, it was in retrospect possible for us to determine the true number of daily hospitalizations due to HUS. Having the true number of hospitalizations allowed a unique evaluation of the quality of the nowcasts by proper scoring rules.
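One possible form of such a decision procedure is sketched below. The rule is an assumption made purely for illustration and is not described in the paper: given posterior samples of the current day-to-day change in the smoothed epidemic curve, the trend is called “up” or “down” only if the equal-tailed 95% credibility interval excludes zero. The sample values and function names are hypothetical.

```python
import math


def equal_tailed_interval(samples, level=0.95):
    """Empirical equal-tailed interval from a list of posterior samples."""
    ordered = sorted(samples)
    n = len(ordered)
    alpha = (1.0 - level) / 2.0
    lower = ordered[math.floor(alpha * (n - 1))]
    upper = ordered[math.ceil((1.0 - alpha) * (n - 1))]
    return lower, upper


def classify_trend(change_samples, level=0.95):
    """Map posterior samples of the change in the curve to a trend state."""
    lower, upper = equal_tailed_interval(change_samples, level)
    if lower > 0:
        return "up"
    if upper < 0:
        return "down"
    return "stable"  # interval contains zero: no clear direction


# Hypothetical posterior samples of the current change in the epidemic curve.
rising = [0.1 + 0.01 * i for i in range(100)]
falling = [-x for x in rising]
unclear = [i / 100 - 0.5 for i in range(101)]
print(classify_trend(rising), classify_trend(falling), classify_trend(unclear))
```

Other thresholds or a rule based on posterior probabilities would be equally plausible; the point is only that the categorization can be derived directly from the posterior of the curve.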

Our comparisons showed that the proposed Bayesian nowcast procedures, which took the count data nature into account, provided better results than the discretized Lawless procedure.

Within the Bayesian procedures, addressing right-truncation under the assumption of a time-homogeneous delay distribution provided only small improvements for the STEC/HUS outbreak. The main reason is that the actual delay pattern was time-varying, because of intervention measures to ensure faster reporting and other complex factors, for example, awareness of hospital doctors toward HUS and its reporting, resources at local health institutions, media attention, and case definition. Altmann et al. (2011) contain a more detailed analysis of the reporting delay during the outbreak. Since the intervention toward faster reporting occurred at an early stage of the outbreak, a moving-window approach is one way to address the problem. The model-based approach reflecting knowledge about the intervention, however, showed better results and appears less ad hoc. The Poisson formulation of the nowcasting problem also makes it obvious that a frequentist alternative could have been to fit a generalized additive model containing a penalized spline for the trend and a factor for the delay (Wood, 2006). However, the hierarchical Bayes approach is more flexible, because it allows the combination of complex models for both delay and epidemic curve. Furthermore, predictive distributions are immediately available due to the data-augmenting nature of the Bayes approach.

The evaluations based on scoring rules depend on the specific time period investigated, that is, it makes a difference whether the early outbreak with many cases and fluctuating reporting patterns or the late outbreak period with few cases and less dramatic changes is investigated. Our choice of evaluation period for the scoring was based on an interest in which method, in retrospect, would have been the best way to address the nowcasting problem as posed by the circumstances of the outbreak then. As easy as it is to model details in hindsight, we underline that during outbreaks consistency is more important than accuracy (Cowden, 2010) and hence robustness is often more important than the best fit. To promote application of nowcasting procedures in future outbreak situations or even noninfectious disease applications, the proposed methods are available in the R package surveillance (Höhle, 2007) as function nowcast.

8 Supplementary Materials

Web Appendices, Tables, Figures, and Animations referenced in Sections 'Introduction', 'Time Homogeneous Delay Distribution', and 'Nowcasting during the STEC O104:H4 Outbreak' are available with this paper at the Biometrics website on Wiley Online Library. Furthermore, an R file for analyzing the data corresponding to Figure 1 of the Web Appendix is available from this website.

Acknowledgements

We thank Leonhard Held for fruitful discussions and helpful suggestions at the initial stages of our work. Thanks also go to Doris Altmann for preparing the SQL query, which extracted the necessary data for our analysis from the SurvNet@RKI database. Comments and suggestions by the associate editor and two anonymous reviewers helped to improve the manuscript substantially.

References

  • Altmann, M., Wadl, M., Altmann, D., Benzler, J., Eckmanns, T., Krause, G., Spode, A., and an der Heiden, M. (2011). Timeliness of surveillance during outbreak of shiga toxin producing Escherichia coli infection, Germany, 2011. Emerging Infectious Diseases 17, 1906–1909.
  • Brookmeyer, R. and Damiano, A. (1989). Statistical methods for short-term projections of AIDS incidence. Statistics in Medicine 8, 23–34.
  • Buchholz, U., Bernard, H., Werber, D., Böhmer, M. M., Remschmidt, C., Wilking, H., Deleré, Y., an der Heiden, M., Adlhoch, C., Dreesman, J., Ehlers, J., Ethelberg, S., Faber, M., Frank, C., Fricke, G., Greiner, M., Höhle, M., Ivarsson, S., Jark, U., Kirchner, M., Koch, J., Krause, G., Luber, P., Rosner, B., Stark, K., and Kühne, M. (2011). German outbreak of Escherichia coli O104:H4 associated with sprouts. New England Journal of Medicine 365, 1763–1770.
  • Cowden, J. M. (2010). Some haphazard aphorisms for epidemiology and life. Emerging Infectious Diseases 16, 174–177.
  • Czado, C., Gneiting, T., and Held, L. (2009). Predictive model assessment for count data. Biometrics 65, 1254–1261.
  • Denwood, M. J. (in review). runjags: An R package providing interface utilities, parallel computing methods and additional distributions for MCMC models in JAGS. Journal of Statistical Software.
  • Donker, T., van Boven, M., van Ballegooijen, W. M., Van't Klooster, T. M., Wielders, C. C., and Wallinga, J. (2011). Nowcasting pandemic influenza A/H1N1 2009 hospitalizations in the Netherlands. European Journal of Epidemiology 26, 195–201.
  • Elliot, M. R. and Little, R. J. A. (1999). Weight trimming in a random effects model framework. In Proceedings of the Survey Research Methods Section, American Statistical Association, 365–370.
  • England, P. and Verrall, R. J. (2002). Stochastic claims reserving in general insurance. British Actuarial Journal 8(3), 443–544.
  • Fahrmeir, L., Kneib, T., Lang, S., and Marx, B. (2013). Regression: Models, Methods and Applications. Berlin Heidelberg: Springer.
  • Frank, C., Werber, D., Cramer, J. P., Askar, M., Faber, M., an der Heiden, M., Bernard, H., Fruth, A., Prager, R., Spode, A., Wadl, M., Zoufaly, A., Jordan, S., Kemper, M. J., Follin, P., Müller, L., King, L. A., Rosner, B., Buchholz, U., Stark, K., and Krause, G. (2011). Epidemic profile of shiga-toxin producing Escherichia coli O104:H4 outbreak in Germany. New England Journal of Medicine 365, 1771–1780.
  • Green, H. K., Andrews, N. J., Bickler, G., and Pebody, R. G. (2012). Rapid estimation of excess mortality: Nowcasting during the heatwave alert in England and Wales in June 2011. Journal of Epidemiology and Community Health 66, 866–868.
  • Höhle, M. (2007). Surveillance: An R package for the surveillance of infectious diseases. Computational Statistics 22, 571–582.
  • Kalbfleisch, J. D. and Lawless, J. F. (1989). Inference based on retrospective ascertainment: An analysis of the data on transfusion related AIDS. Journal of the American Statistical Association 84, 360–372.
  • Kalbfleisch, J. D. and Lawless, J. F. (1991). Regression models for right truncated data with applications to AIDS incubation times and reporting lags. Statistica Sinica 1, 19–32.
  • Lagakos, S. W., Barraj, L. M., and De Gruttola, V. (1988). Nonparametric analysis of truncated survival data, with application to AIDS. Biometrika 75, 515–523.
  • Lawless, J. F. (1994). Adjustments for reporting delays and the prediction of occurred but not reported events. The Canadian Journal of Statistics 22, 15–31.
  • Lin, H., Yip, P. S. F., and Huggins, R. M. (2008). A double-nonparametric procedure for estimating the number of delay-reported cases. Statistics in Medicine 27, 3325–3339.
  • Midthune, D. N., Fay, M. P., Clegg, L. X., and Feuer, E. J. (2005). Modeling reporting delays and reporting corrections in cancer registry data. Journal of the American Statistical Association 100, 61–70.
  • Nunes, B., Natário, I., and Lucília Carvalho, M. (2013). Nowcasting influenza epidemics using nonhomogeneous hidden Markov models. Statistics in Medicine 32, 2643–2660.
  • Paul, M. and Held, L. (2011). Predictive assessment of a nonlinear random effects model for multivariate time series of infectious disease counts. Statistics in Medicine 30, 1118–1136.
  • Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing, Hornik, K., Leisch, F., and Zeileis, A. (eds), Vienna.
  • Robert Koch Institute (2011). Final presentation and evaluation of epidemiological findings in the EHEC O104:H4 outbreak, Germany, 2011. Technical Report, Berlin: Robert Koch Institute.
  • Wong, T.-T. (1998). Generalized Dirichlet distribution in Bayesian analysis. Applied Mathematics and Computation 97, 165–181.
  • Wood, S. (2006). Generalized Additive Models: An Introduction with R. Boca Raton: Chapman & Hall/CRC.
  • Zeger, S. L., See, L. C., and Diggle, P. J. (1989). Statistical methods for monitoring the AIDS epidemic. Statistics in Medicine 8, 3–21.

Supporting Information

Additional Supporting Information may be found in the online version of this article.

Filename                            Size  Description
biom12194-sm-0001-SuppData.zip      464K  Supporting Information.
biom12194-sm-0001-SuppDataCode.zip   13K  Supporting Information.

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.