Nowcasting the COVID‐19 pandemic in Bavaria

Abstract To assess the current dynamics of an epidemic, it is central to collect information on the daily number of newly diseased cases. This is especially important in real-time surveillance, where the aim is to gain situational awareness, for example, whether cases are currently increasing or decreasing. Reporting delays between disease onset and case reporting hamper our ability to understand the dynamics of an epidemic close to now when looking at the number of daily reported cases only. Nowcasting can be used to adjust daily case counts for occurred-but-not-yet-reported events. Here, we present a novel application of nowcasting to data on the current COVID-19 pandemic in Bavaria. It is based on a hierarchical Bayesian model that considers changes in the reporting delay distribution over time and associated with the weekday of reporting. Furthermore, we present a way to estimate the effective time-varying case reproduction number Re(t) based on predictions of the nowcast. The approaches are based on previously published work, which we considerably extended and adapted to the current task of nowcasting COVID-19 cases. We provide methodological details of the developed approach, illustrate results based on data of the current pandemic, and evaluate the model based on synthetic and retrospective data on COVID-19 in Bavaria. Results of our nowcasting are reported to the Bavarian health authority and published on a webpage on a daily basis (https://corona.stat.uni-muenchen.de/). Code and synthetic data for the analysis are available from https://github.com/FelixGuenther/nc_covid19_bavaria and can be used to adapt our approach to different data.


Introduction
We perform an evaluation of the Bayesian hierarchical nowcasting based on synthetic data mimicking the reported Bavarian COVID-19 data and retrospectively based on the official data from the LGL that was reported until July 31.

Data generating process
For the simulation of the synthetic data, we utilized a smoothed version of the observed number of reported disease onsets per day and specified a reporting delay model similar to the model described in equation 3 of the manuscript. In this discrete-time hazard model for the reporting delay, we specified a linear time effect on the log-hazard with five changepoints over time. The aggregated number of newly diseased cases (green, solid) and the number of reported cases per day (red, dotted) are shown in Figure 1. The changepoints in the linear time effect lead to smooth changes in the delay distribution. We did not add any further effects (e.g., of weekdays) to the delay distribution model and simulated reporting dates for each case with disease onset on day t directly from the delay distribution, without adding any further variability.
The utilized delay distribution can be illustrated by plotting the empirical median and the 25% and 75% quantiles of the sampled reporting delay over time (Figure 2).
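The data-generating process described above can be sketched in code. The following is a minimal illustration, not the simulation code from our repository: the baseline log-hazards, knot positions and values of the time effect, and the onset curve are made-up stand-ins. It converts discrete-time log-hazards with a piecewise-linear time effect into a delay distribution and samples a reporting delay for every case.

```python
import numpy as np

rng = np.random.default_rng(1)

def delay_pmf(log_hazard):
    """Discrete-time hazard model: P(D = d) = h_d * prod_{j<d} (1 - h_j),
    with h_d = logistic(log_hazard[d]); remaining mass goes to the max delay."""
    haz = 1.0 / (1.0 + np.exp(-np.asarray(log_hazard, dtype=float)))
    surv = np.concatenate(([1.0], np.cumprod(1.0 - haz)))[:-1]
    pmf = np.append(haz * surv, max(0.0, 1.0 - (haz * surv).sum()))
    return pmf / pmf.sum()

# Piecewise-linear time effect on the log-hazard with five changepoints
# (knot positions and values are illustrative, not those of the paper)
knots = np.array([0, 20, 40, 60, 80, 120])
knot_values = np.array([0.0, 0.4, -0.2, 0.1, -0.4, 0.0])

def time_effect(t):
    return np.interp(t, knots, knot_values)

base_log_hazard = np.linspace(-2.5, 0.5, 21)  # delays 0..20 days
days = np.arange(121)
# Smooth stand-in for the observed epidemic curve of disease onsets
onsets = (150 * np.exp(-((days - 45) / 25) ** 2)).astype(int) + 5

# Sample a reporting delay for each case directly from the delay distribution
sampled = [rng.choice(22, size=n, p=delay_pmf(base_log_hazard + time_effect(t)))
           for t, n in zip(days, onsets)]

# Empirical 25%, 50%, 75% quantiles of the sampled delay per day (cf. Figure 2)
delay_quartiles = np.array([np.percentile(d, [25, 50, 75]) for d in sampled])
```

In the actual simulation, the reporting date of a case is then its onset day plus the sampled delay, and a nowcast at day T only sees cases with reporting date up to T.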

Results
We estimated nowcasts for all days T from March 17 (22 days after the disease onset of the first case) until June 30 by restricting the data to all cases reported until the respective date and compared the nowcast predictions for all days t = T−6, ..., T−2 to the actual true numbers of newly diseased cases per day.
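Restricting the data to cases reported until day T corresponds to working with the reporting triangle of onset/reporting-day pairs. A minimal sketch of its construction (the function name and the maximum delay of 21 days are our own illustrative choices, not taken from the repository):

```python
import numpy as np

def reporting_triangle(onset_days, report_days, T, max_delay=21):
    """N[t, d] = number of cases with disease onset on day t reported with
    delay d = report - onset, using only reports available up to day T.
    Cells with t + d > T are the occurred-but-not-yet-reported counts
    that the nowcast has to predict."""
    N = np.zeros((T + 1, max_delay + 1), dtype=int)
    for onset, report in zip(onset_days, report_days):
        delay = report - onset
        if report <= T and 0 <= onset <= T and 0 <= delay <= max_delay:
            N[onset, delay] += 1
    return N

# Toy example: three cases, data as available on day T = 2;
# the case reported on day 3 is not yet observed and is excluded.
N = reporting_triangle(onset_days=[0, 0, 1], report_days=[1, 2, 3], T=2)
```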
Nowcasts are performed based on six different models (see description in the manuscript), and the performance of the models is compared via proper scoring rules (the continuous ranked probability score, CRPS, and the logarithmic score), the root mean squared error, and coverage frequencies of 95% prediction intervals.
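These evaluation metrics can be computed directly from posterior predictive samples. The following are generic sample-based implementations (a sketch, not the exact scoring code of our analysis); the CRPS uses the standard sorted-sample identity:

```python
import numpy as np

def crps_samples(samples, obs):
    """Sample-based CRPS: E|X - y| - 0.5 * E|X - X'|, estimated from
    posterior predictive draws via the sorted-sample (Gini) identity."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    e_abs_err = np.abs(x - obs).mean()
    gini = 2.0 * np.sum((2 * np.arange(1, n + 1) - n - 1) * x) / (n * n)
    return e_abs_err - 0.5 * gini

def pi_coverage(samples_per_day, observed, level=0.95):
    """Fraction of days whose observation lies inside the central
    `level` posterior predictive interval."""
    alpha = 100 * (1 - level) / 2
    hits = [np.percentile(s, alpha) <= y <= np.percentile(s, 100 - alpha)
            for s, y in zip(samples_per_day, observed)]
    return float(np.mean(hits))

def rmse_of_median(samples_per_day, observed):
    """Root mean squared error of the posterior median prediction."""
    med = np.array([np.median(s) for s in samples_per_day])
    return float(np.sqrt(np.mean((med - np.asarray(observed)) ** 2)))
```

As a sanity check, for a point-mass predictive distribution the CRPS reduces to the absolute error between prediction and observation.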
In Table 1, we present results based on the nowcasts for all dates and restricted to the time period between March 17 and April 30, where most of the dynamics in the epidemic curve happened.

Table 1: Quantification of the performance of six different nowcast models on synthetic data. Shown are the average metrics over all nowcast dates (*_f) and restricted to the period until May 1 (*_m). Reported scores are the continuous ranked probability score, the logarithmic score, the root mean squared error of the posterior median, and coverage frequencies of 95% prediction intervals. Estimated models are Poisson and Negative Binomial models with 1) an (assumed) constant delay distribution, 2) linear time effects with changepoints every two weeks before now (*_cp2W), 3) daily changes in the delay distribution based on a first-order random walk prior (*_rW), and 4) the true changepoints of the data-generating process (*_cptrue).

In Figure 3, we show the nowcast predictions and 95% PIs 3 days before the current date over time for each model and compare them with the true number of newly diseased cases. Note that in the quantitative evaluation based on scores and coverage frequencies, we focus on nowcasts 2-6 days before now and not only on the most current prediction 3 days before now.
We can also compare the estimated delay distribution (for the most current date) with the empirical delay distribution from the sampled data (Figure 4). Note that on each nowcast day T, the delay distribution is in general estimated for the complete past.

We summarize the results as follows: when we supply the true, in reality unknown, changepoints of the delay distribution to the model fitting, the nowcast performs best with respect to our evaluation metrics. It shows the lowest logarithmic and CRPS scores, the lowest root mean squared error, and the desired coverage frequencies of the 95% prediction intervals. With the models assuming changepoints in the linear time effect on the reporting delay every two weeks before T, we obtain a similar, but slightly worse, performance. This approach appears able to capture the moderate changes in the delay distribution well.

Modeling the changes on a daily basis performs slightly worse with respect to the CRPS and the PI coverage frequencies. The prediction intervals are wider, and there is some evidence for convergence problems at the beginning of the nowcast period, where the median delay was strongly overestimated, leading to an upward bias in the predicted number of new disease onsets per day. This might be related to the overly complex model for the reporting delay and the very few observations available to estimate the reporting delay up to this time.

Assuming a constant reporting delay distribution over time and ignoring the changes in the data-generating process leads to the worst performance, with the largest scores and low coverage frequencies of the prediction intervals. The number of estimated disease onsets is overestimated during the whole nowcasting period starting at the end of March. When an adequate model for the delay distribution is specified, the distributional assumption regarding N_{t,d} plays a minor role for the performance on the synthetic data.

Retrospective evaluation based on Bavarian data
For the retrospective evaluation of nowcasting, we utilize all official data that was reported until July 31 and restrict the evaluation period to June 30, assuming that for all earlier days the true numbers of disease onsets are contained in the data available at the end of July. Furthermore, we focus on all reported cases with available disease onset information. The aggregated case counts in this period are given in Figure 5. The empirical reporting delay distribution can be illustrated as for the synthetic data (Figure 6). Note that the illustrated delay is, in contrast to the delay reported in Table 2 of the manuscript, the time between disease onset and reporting at the LGL (the regional health authority). In Table 2 of the manuscript, we described the delay between disease onset and reporting at the local health authority, which is relevant for the disease onset imputation model.

Results
We estimated nowcasts for all days T from March 17 (22 days after the disease onset of the first case) until June 30 by restricting the data to all cases reported until the respective date and compared the nowcast predictions for all days T−6, ..., T−2 to the numbers of disease onsets per day as reported until July 31. Nowcasts are performed based on six different models (see description in the manuscript), and the performance of the models is compared as above. In addition, we compute the coverage frequencies of the 95% credibility intervals of the estimated time-varying reproduction number with respect to the estimate obtained from all available data until July 31 (Table 2).

Table 2: Retrospective quantification of the performance of six different nowcast models on Bavarian COVID-19 data. Shown are the average metrics over all nowcast dates (*_f) and restricted to the period until May 1 (*_m). Reported scores are the continuous ranked probability score, the logarithmic score, the root mean squared error of the posterior median, and coverage frequencies of 95% prediction intervals, as well as coverage frequencies of the estimated R(t) at the most current date. Estimated models are Poisson and Negative Binomial models with 1) an (assumed) constant delay distribution, 2) linear time effects with changepoints every two weeks before now (*_cp2W), 3) daily changes in the delay distribution based on a first-order random walk prior (*_rW), and models 2) and 3) with additional effects of the weekday of case reporting (*_wd).

The comparison of the daily nowcast predictions 3 days before now with the retrospective truth, as well as the estimated delay distribution, are shown in Figure 7 and Figure 8. We find that the Poisson model assuming no changes in the reporting delay distribution performs poorly: daily case numbers are strongly overestimated. This is in line with the apparent changes over time in the reporting delay between disease onset and reporting at the LGL.
Comparing the Poisson model with two-week changepoints to a similar model using a Negative Binomial distribution for N_{t,d}, we find the latter to perform better with respect to the evaluation metrics. Furthermore, it is apparent that the prediction intervals of the Poisson model are too narrow; this improves when using a Negative Binomial model with overdispersion. Adding weekday effects to the delay distribution improves the performance of the models as well. Comparing the Negative Binomial model with daily changes in the delay distribution to the two-week changepoint model, we found better coverage frequencies for the former (mainly because of wider prediction intervals) but a lower CRPS and RMSE for the latter. The estimation of the reporting delay allowing for daily changes based on a first-order random walk appeared to be somewhat unstable at the beginning of the nowcasting period, as also seen in the evaluation based on synthetic data.
Looking visually at the predictions of the nowcast, we find that the predictions close to now (e.g., 3 days before now), as illustrated in Figure 7, overestimate daily case counts in the crucial time period between March 15 and April 1. This is also true for our preferred model with two-week changepoints and weekday effects in the delay distribution. In this period, the number of disease onsets stabilized, but daily reported cases were still increasing steadily. The situation of a stabilizing number of new disease onsets and a simultaneously decreasing average reporting delay between disease onset and case registration (which becomes apparent now, in retrospect) is not easily identifiable based on incomplete data in real-time surveillance.

It is, however, apparent that the nowcasting approach is valuable for understanding the dynamics of the pandemic better than the daily counts of newly reported cases allow. It helps to illustrate the uncertainty with respect to the current state of the pandemic, and the predictions of our preferred models that account for changes in the reporting delay are not too far off. Furthermore, coverage frequencies of 95% prediction intervals close to 90% appear acceptable. Looking more closely at the results of, e.g., the Negative Binomial model with two-week changepoints and weekday effects in the delay distribution, we find that, based on the nowcast, the pandemic situation appears to stabilize from around March 20 on, and the predictions close to now already start to decrease around April 1, when the daily numbers of newly reported cases were still at their peak.
Comparing the estimated R(t) at the most current day based on the different nowcast models with the retrospective truth based on all reported data, we find coverage probabilities of the 95% credibility intervals close to or greater than 90% for all models that consider changes in the delay distribution over time (Table 2, Figure 9). The estimation of R(t) is, however, biased when it is based on a biased nowcasting approach, e.g., one that ignores changes in the delay distribution. In the Poisson model assuming no changes in the delay distribution, R(t) is biased upwards during the whole time period.

Figure 9: Estimated R(t) over time based on all nowcast models at the most current date and associated 95% CIs. Comparison with R(t) based on all reported disease onsets until July 31.
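The link from nowcast predictions to a time-varying reproduction number can be illustrated with a deliberately simplified window-ratio estimator, assuming a generation time of 4 days. This is an illustrative stand-in, not the estimator used in the manuscript:

```python
import numpy as np

def re_window_ratio(onsets, gen_time=4):
    """Crude R(t) illustration: onsets in the last `gen_time` days divided
    by the onsets in the `gen_time` days before that (a simplified stand-in
    for a proper reproduction-number estimator)."""
    onsets = np.asarray(onsets, dtype=float)
    re = np.full(onsets.size, np.nan)  # undefined before two full windows
    for t in range(2 * gen_time - 1, onsets.size):
        infectors = onsets[t - 2 * gen_time + 1 : t - gen_time + 1].sum()
        infectees = onsets[t - gen_time + 1 : t + 1].sum()
        if infectors > 0:
            re[t] = infectees / infectors
    return re

# A flat onset curve yields R(t) = 1; doubling per generation yields R(t) = 2
flat = re_window_ratio(np.full(20, 10.0))
growing = re_window_ratio([1, 1, 1, 1, 2, 2, 2, 2])
```

Applying such an estimator to each posterior sample of the nowcasted onset curve and summarizing the resulting draws yields credibility intervals for R(t) that reflect the nowcast uncertainty, which is the spirit of the comparison in Table 2.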