Inferring UK COVID-19 fatal infection trajectories from daily mortality data: were infections already in decline before the UK lockdowns?

The number of new infections per day is a key quantity for effective epidemic management. It can be estimated relatively directly by testing of random population samples. Without such direct epidemiological measurement, other approaches are required to infer whether the number of new cases is likely to be increasing or decreasing: for example, estimating the pathogen effective reproduction number, R, using data gathered from the clinical response to the disease. For Covid-19 (SARS-CoV-2) such R estimation is heavily dependent on modelling assumptions, because the available clinical case data are opportunistic observational data subject to severe temporal confounding. Given this difficulty it is useful to retrospectively reconstruct the time course of infections from the least compromised available data, using minimal prior assumptions. A Bayesian inverse problem approach applied to UK data on first wave Covid-19 deaths and the disease duration distribution suggests that fatal infections were in decline before full UK lockdown (24 March 2020), and that fatal infections in Sweden started to decline only a day or two later. An analysis of UK data using the model of Flaxman et al. (2020, Nature 584) gives the same result under relaxation of its prior assumptions on R, suggesting an enhanced role for non pharmaceutical interventions (NPIs) short of full lock down in the UK context. Similar patterns appear to have occurred in the subsequent two lockdowns. Estimates from the main UK Covid statistical surveillance surveys, available since original publication, support these results. Replication code for the paper is available in the supporting information of doi/10.1111/biom.13462.


Introduction
Clinical data on the number of cases of Covid-19 (SARS-CoV-2) are subject to severe temporal confounding, as the rate of testing and criteria for testing have been changing rapidly on the same time scale as the infections, particularly in the early weeks and months of the epidemic. Because these are samples of convenience where the ascertainment fraction is changing and unknown, the data can clearly not be used to infer the actual number of infections. Neither, under normal circumstances, would statisticians recommend attempting to estimate the effective reproduction number of the pathogen from such data, since given the data problems the estimates must necessarily be driven strongly by the modelling assumptions (see e.g. Levine et al., 2001, §1.6 for general discussion of the problems with inference from non-random samples). Indeed generically it is often very difficult to infer epidemiological parameters from clinical data, without the results being informed as much by the prior beliefs encoded in the model as by the data (e.g. Wood et al., 2020). Much less problematic are estimates based on randomized surveillance testing, as now conducted in the UK by the office for national statistics (see Supporting Information for discussion of inferring incidence from testing data).
However some clinical data directly measure the quantity of epidemiological interest. This is the case for deaths with Covid-19 and for fatal disease duration. While not perfect, these data are less compromised than the data on cases. Deaths are reliably recorded and clinical grounds for suspecting Covid-19 are relatively clear for fatal cases, although accurately attributing death to a single cause is clearly not always possible. Good records are also often kept for such cases, with the result that there are several published studies on fatal disease duration , see section 2). Although only possible with a delay of some weeks, it is of interest to establish what these relatively high quality data imply about the time course of infections, without strong modelling assumptions.
Two types of daily death data are available. Daily reported deaths (e.g. Worldometer, 2020) typically show marked weekly fluctuations as a result of weekly patterns in reporting delays, and may exclude deaths in some locations (such as nursing homes). Registered death data, such as the ONS data in the UK (Office for National Statistics, 2020), contain deaths in all locations and record exact date of death. NHS (2020) publishes equivalent data for hospital deaths in England. The weekly cycle is less pronounced in these data, but their release is necessarily delayed relative to the daily reported deaths, although recent work partially overcomes this delay problem, by modelling the delays to enable 'now-casting' of deaths by actual death date: see Stoner et al. (2020). The right column of Figure 2 shows ONS, NHS and Swedish daily deaths by date of death (without now-casting).
The purpose of this paper is to show how a relatively straightforward statistical approach can be used to infer the fatal infection trajectory in the UK in a data driven way that makes the minimum of strong modelling assumptions. The approach is also applied to data from Sweden, the western European country offering the greatest policy contrast to the UK. Sweden never implemented full lockdown, sticking to less restrictive NPIs (broadly aimed at 'optimal mitigation' rather than 'suppression' in the terms used by Walker et al., 2020, who projected around 40 thousand deaths for this policy). Meaningful quantification of the aggregate strength of restrictions that are intrinsically multivariate is difficult, but in terms of their aggregate economic impact, Swedish GDP dropped by about 2.9% in 2020 compared to about 9.9% for the UK. Particular questions of interest are when the decline in fatal infections started in the UK and Sweden, whether UK infections were in substantial decline before full lockdown, whether the pathogen reproduction number was below 1 before lockdown, and how the timing of fatal incidence decline relates to the timing of the easing of lockdown.
Answers to these questions may contribute to judging the proportionality of lockdown measures in the UK context, where there is strong statistical evidence for very large preventable life loss being associated with economic deprivation, and of economic deprivation being increased by economic shocks. This evidence is reviewed in detail in Marmot et al. (2020). For example the deprivation related life loss that the current UK population was due to suffer before the Covid crisis was 140-240 million life years (or 2-3.5 years per capita, see Marmot et al., 2020, figure 2.3, for example). The range depends on whether the life expectancy of the lower decile or the lower half of the deprivation distribution is used as the reference for achievable life-expectancy. In examining the effects of the 2008 financial crisis and its aftermath, Marmot documents sharp reductions in life expectancy growth in the UK, which would imply a life loss burden in the 10s of millions of years. However attribution of such reduction-relativeto-trend is obviously very difficult. Less problematic is the 9 million life year loss implied by the increase in life expectancy gap between the more and less deprived halves of the UK population since 2008 (7 weeks per capita, see Marmot et al., 2020, figure 2.5, for example): given the evidence presented in the review, this is more difficult to attribute to causes unrelated to the 2008 economic shock. The Bank of England estimates the shock to the UK economy caused by the response to Covid-19 to have been the largest for over 300 years, so there is a clear danger of substantial life loss being caused, given the historical data for the UK. For example, a feature of the 2008 crisis already repeated in 2020, is the reliance on a large programme of quantitative easing. Quantitative easing is credibly argued to directly increase economic inequality via mechanisms related to asset price inflation (e.g. Fontan et al., 2016;Domanski et al., 2016). There is some literature attributing some short-term life saving to recessions (see e.g. Anon, 2020), but the effects are modest relative to the long term effects reviewed by Marmot. For comparison with the above figures, the life loss that might have occurred from a minimally mitigated Covid-19 epidemic appears to be in the region of 3 million life years (2.5 weeks per capita). This is based on Office for National Statistics (2019) lifetables, ONS Covid-19 fatality by age data, a mid range infection fatality rate estimate of 0.006, a somewhat high herd immunity threshold of 0.7 and a 1 year lower bound life expectancy adjustment for co-morbidities based on Hanlon et al. (2020). It is broadly in line with the UK government estimates (Anon, 2020). Given that 9 million life years, associated with the substantially smaller economic shock of 2008, is not negligible relative to 3 million life years potentially losable to Covid-19, there is obviously a delicate balance to be struck in the UK context, and evidence based on assumption light inference should probably play a role in shaping that balance. Another indicator of the difficulty of achieving the right balance is that the usual UK threshold for approving a pharmaceutical intervention is £30,000 per life year saved. On the basis of economic costs detailed in OBR (2020), and the preceding life loss figures, the non-pharmaceutical interventions used in the UK appear to have a cost per life year saved that is an order of magnitude higher than this (excess government borrowing is projected to peak at £660 Billion in the OBR central scenario, for example). This discrepancy in willingness to pay may lead to a problem of opportunity cost, as the same money can not be spent on preventing other life loss, such as that associated with economic hardship.
The remainder of the paper is structured as follows. Section 2 discusses the available information on the distribution of fatal disease durations, and how to combine it while adequately characterizing the associated uncertainty. Section 3 introduces a simple generalized additive model for direct modelling of the daily deaths trajectories, and shows how it can be extended to infer the trajectory of fatal infections, either directly or by inferring the trajectory of the pathogen effective reproduction number, R, in a simple epidemic model. Since the extensions are not standard models, and are relatively expensive to compute with using standard Bayesian software, section 4 outlines methods allowing computationally efficient inference with the models. Section 5 presents the main results on infection trajectories, and also the estimation of R. Section 6 discusses possible problems with the approach, in particular examining whether smoothness assumptions could be leading to substantial bias in inferred timings. Replication code and data are provided in the Supporting Information.

Fatal disease duration
Data on the incubation period from infection to onset of symptoms are analysed in many papers, for example Lauer et al. (2020) found that the period is 2 to 11 days for 95% of people, with a median of 5.2 days. A meta-analysis by  suggests a log-normal distribution with log scale mean and standard deviation of 1.63 and 0.50. The uncertainty in this distribution is negligible in comparison to the uncertainty in the distribution of times from onset of symptoms to death discussed next.
Several studies estimate the distribution of time from onset of symptoms to death, while properly controlling for the right truncation in the fatal duration data.  found that the distribution of time from onset of symptoms to death for fatal cases can be modelled by a gamma density with mean 17.8 and standard deviation 8.44, based on 24 patients from Wuhan.  suggested a gamma density model with mean 20 and standard deviation 10 based on 41 patients from Wuhan.  found that a log normal model offers a slightly better fit, and estimated a mean of 20.2 days and standard deviation of 11.6 days from 34 patients internationally. These distributions are shown in the left panel of Figure 1. A simple meta-analysis approach was used to combine the models. Samples of the correct size were simulated from each model and a log normal model was estimated by maximum likelihood for the combined resulting sample (n = 99). A further log normal was also fitted    found it to be a better model than the gamma. In addition, under strict conditions, I was able to access data on fatal disease durations for deaths occurring in English hospitals. Access to data with hospital acquired infections filtered out was not possible, so is was necessary to treat these data as a mixture of hospital and community acquired infections, as detailed in the Supporting Information. The resulting inferred fatal disease duration distribution for community acquired infection is plotted in blue in Figure 1 and is consistent with the published studies.
Finally, after the peer reviewed version of this paper was published, I realized that Figure 12 of Pritchard et al. (2020) gives the cumulative distribution function of the time from hospitalization to death for 24421 patients in the ISARIC study who had died by the time of the report. Furthermore this is at a point in time at which recruitment to the study was at a low rate, so that right truncation problems should be relatively minor. The study also reports the mean and standard deviation of time from onset to hospitalization. Combining this information as described in the Supporting Information, gives the densities shown as dashed blue curves in Figure 1. These data offer strong support for the distributions used in this paper, and in fact suggest that the results may be somewhat more certain than the plotted uncertainty bands imply.

Models
This section first introduces a simple generalized additive model for daily death trajectories, and then shows how this can be extended to directly infer the trajectory of fatal infections (fatal incidence), without having to assume any particular dynamic model for the epidemic. The resulting model is no longer a generalized additive model and is the model that this paper advocates using. Its structure is such that any method for inference with the model can also be used for inference with the dynamic model of Flaxman et al. (2020), with appropriate restriction of the incidence trajectory to one representable with that model. The Flaxman model is presented to allow comparison of the results from the infection trajectory model with the apparently contradictory results of Flaxman et al., but not to advocate its use. Basic deaths series model. Let y i denote the deaths or reported deaths on day i, assumed to follow a negative binomial distribution with mean µ i and variance µ i + µ 2 i /θ. Let where f is a smooth function of time measured in days, and f w is a zero mean cyclic smooth function of day of the week, D i ∈ {1, 2, . . . , 7}, set up so that f w (7), where k = 0, 1 or 2 denotes order of derivative. f (i) represents the underlying log death rate, while f w describes the weekly variation about that rate. The functions f and f w can be represented using splines with associated smoothing penalties λ f ′′ (t) 2 dt and λ w f ′′ w (D) 2 dD. Hyper-parameters λ and λ w control the smoothness of the functions. The model is a straightforward generalized additive model and (λ, λ w ) can be estimated as part of model fitting using a standard empirical Bayes approach as described in . The model provides a good fit to both the reported deaths and ONS data. As expected f w is greatly attenuated for the ONS data (it vanishes for Swedish exact death date data).
Infection trajectory model. To estimate the daily infection trajectory the model is extended by expressing f (i) in terms of the time course of earlier infections. Let f c (i) be the function describing the variation in the number of eventually fatal infections over time. Let B be the square matrix such that B ij = π(i − j + 1; µ, σ 2 ) if i ≥ j and 0 otherwise . π denotes an infection-to-death log normal density as discussed above. For the moment its parameters, µ and σ 2 , are treated as fixed but this will be relaxed in section 4.3. Given the continuity of the log normal, the given form for B ij can be viewed as approximating an integral of π over each day, using the midpoint of the integrand -it is straightforward to approximate the integral more accurately, but given that π is originally estimated from durations discretized to whole days, any precision gain is illusory.
is the expected number of deaths on day i. log f c (i) can be represented using a spline basis, again with a cubic spline penalty. Working on the log scale ensures that f c is positive, but is also appealing because it means that a cubic spline penalty on log f c (i) can be interpreted as a first derivative penalty r ′ (t) 2 dt, acting on the epidemiologists 'intrinsic growth rate', r. The final infection trajectory model is then obtained by simply substituting f (i) = log δ(i) into (1). B is rank deficient, so inferring f c can be viewed as an inverse problem: without regularization multiple solutions that oscillate from day-to-day are possible. This ambiguity is removed by the smoothing penalty on log f c .
Relaxed Flaxman model. Since this work was originally undertaken in late April 2020, the work of Flaxman et al. (2020) has appeared. Flaxman et al. make inferences about the reproduction number, R, and hence incidence rates, based on death trajectories and the fatal infection duration distribution of , but do so by modelling the pathogen effective reproduction number R t within a simple epidemic 'renewal model'. Flaxman et al. (2020) represent R t as a step function with steps allowed each time the government announced new containment interventions, and a sparsity prior promoting a small number of steps. In the notation of Flaxman et al. the expected number of infections each day (now total, rather than fatal) are denoted c t . Given an initial c 1 the model is iterated from t = 2 as follows where N is the total initially susceptible population, g 1 = 1.5 0 γ(x)dx and g j = j+.5 j−.5 γ(x)dx for j > 1. γ is the p.d.f. of a Gamma distribution with shape parameter 6.5 × 0.62 2 and scale parameter 0.62 −2 . The c t values multiplied by the assumed infection fatality rate give f c . The level of the IFR only matters for the damping term in the first bracket of the expression for c t -this has almost no effect in practice, a mid range value of 0.006 was used. The original assumptions about R t can be relaxed by representing log R t using a spline basis, with associated penalty as for the other models, while log c 1 is also treated as a free parameter. Hence f c in the infection trajectory model can simply be replaced by the Flaxman model with log R t represented as a spline function. The model is otherwise unchanged. This model is presented only to allow comparison of this paper's results with those of Flaxman et al. (2020): its simple single compartment structure clearly does not meet the aim of inferring incidence with minimal assumptions.

Methods
The infection trajectory and Flaxman renewal models are not standard models estimable with standard software. They can be implemented in Bayesian software, such as JAGS or STAN, but inference typically takes several hours if this is done. Dealing adequately with the uncertainty in the disease duration distribution multiplies this cost by 1-2 orders of magnitude. To avoid these problems an empirical Bayes approach can be employed.

Basic inferential framework
Direct inference about (1) uses the empirical Bayes approach of Wood et al. (2016) in which the smooth functions are estimated by penalized likelihood maximisation (e.g. Green and Silverman, 1994), with the smoothing parameters and θ estimated by Laplace approximate marginal likelihood maximization. Writing β for the combined vector of basis coefficients for f and f w , the penalized version of the log likelihood, l(β), can be written where S λ = λS f + λ w S w : S f and S w are known constant positive semi-definite matrices. Smoothing parameters, λ and λ w , control the smoothness of f and f w . Let β be the maximizer of the penalized log likelihood, and H its negative Hessian at β. Viewing the penalty as being induced by an improper Gaussian prior, β ∼ N (0, S − λ ), β is also the MAP estimate of β. Furthermore in the large sample limit Writing the density in (3) as π g , and the joint density of y and β as π(y, β), the Laplace approximation to the marginal likelihood for the smoothing parameters λ and θ is π(λ, θ) = π(y, β)/π g (β|y). Nested Newton iterations are used to find the values of log(λ), θ maximizing π(λ, θ) and the corresponding β (for details see Wood et al., 2016). Given (3) credible intervals for f are readily computed, but it is also straightforward to make inferences about when the peak in f occurs. Simply simulate replicate coefficient vectors from (3) and find the day of occurrence of the peak for each corresponding underlying death rate function, f .

Extension for the infection and Flaxman models
While inference about (1) using the preceding framework requires little more than a call to the gam function in R package mgcv, its application to the other models, which are not generalized additive models, requires more work. For the model formulated in terms of f c this requires expressions for the negative binomial deviance (or log likelihood) and its derivative vector and Hessian matrix w.r.t. the model coefficients.
First consider the negative binomial deviance for observation i, These need to be transformed into derivatives w.r.t. β, as follows: Writing X f and X w for the model matrices for the smooth terms log f c and f w , , division and multiplication are applied element-wise to vectors), and When using the relaxed Flaxman model, the preceding derivatives of f c have to be replaced with derivatives of f c w.r.t. the coefficients of the spline representing log R t . Routine application of the chain rule to (2) gives corresponding iterations for the derivatives of c t , and hence f c , w.r.t these spline coefficients and log c 1 .
Given these expressions and the penalties, β can be obtained by Newton iteration, given smoothing parameters. To estimate smoothing parameters, the simplest approach is to fix the negative binomial θ at its estimate from model (1), and use Wood and Fasiolo (2017), alternating generalized Fellner Schall updates of the smoothing parameters with updates of β given those smoothing parameters. This finds the smoothing parameters to approximately maximise the model marginal likelihood. The non-linearity of the renewal equation model means that some effort is required to get non-absurd starting values. I got these by a few minutes of experimentation with simple step functions for the initial log R t to get death trajectories of roughly the shape and amplitude of the true trajectories (a close initial fit is not required: initial deviances 2 orders of magnitude greater than for the final fit were unproblematic).
Given θ and the smoothing parameters, the approximate posterior (3) could be used directly, or as the basis for the proposal distribution in a simple Metropolis Hastings sampler. A fairly efficient sampler results from alternating fixed proposals based on (3) with random walk proposals based on zero mean Gaussian steps with a shrunken version of the posterior covariance matrix. By this method, effective sample sizes > 5000 for each coefficient took about 40 seconds computing on a low specification laptop. This was the approach used for the infection trajectory model. The results were indistinguishable from those produced at the cost of several hours of computing using JAGS (Plummer, 2003;Plummer et al., 2006) to simulate from the model posterior.

Disease duration distribution uncertainty
The methods so far perform inference conditional on fixed values for the parameters µ and σ 2 of the log normal density describing the infection to death duration distribution. In reality there is uncertainty about these parameters. To incorporate this uncertainty into the infection trajectory model, inference was run for each of the 100 sample distributions shown in grey in the right hand panel of Figure 1, and the resulting posterior samples were pooled, to give a sample from the unconditional posterior distribution of the model. Figure 2 shows the results of applying the model to the Office for National Statistics daily Covid-19 death data for the UK, to the NHS England hospital data and to the daily death data for Sweden from Folkhälsomyndigheten (2020). The results include the uncertainty about the disease duration distribution shape. ONS and NHS data are up to 27th June -including later data simply narrows the uncertainty, while making negligible difference to the overall conclusions. The most notable feature of the results is that fatal infections are inferred to be in substantial decline before full lockdown (the same result was obtained by this method in early May 2020, based on the first 50 days of reported daily death data). Sweden appears most likely to have peaked only one or two days later (barring some systematic difference in fatal disease durations for Sweden), having introduced NPIs well short of full lockdown. The results also emphasise the fact that the infection trajectory is not simply a time shifted version of the death trajectory (assuming it was might lead to unwarranted delay in easing lockdown, for example). The difference in timing and shape of the inferred profile between the ONS and NHS data reflects the fact that the latter contain care home data. There is an argument for preferring hospital data for inferring community fatal infections, in that the care home epidemic is now known to have special features with at least some of the infection not coming from normal community transmission. See in particular Comas-Herrera et al.

Results
(2020) for a discussion of care home deaths internationally, including the UK. In addition, in the UK, care home deaths were often attributed to Covid-19 without a test, especially after death certification guidelines were changed to encourage reporting of suspected, rather than confirmed Covid-19 deaths. The care home data therefore have some under-reporting of Covid deaths, followed by over-reporting (the signal of this is visible in ONS data in the change in non-Covid pneumonia deaths being reported, relative to normal, for example).
Taken together the results for the UK and Sweden raise the questions of firstly whether full lockdown was necessary to bring infections under control, or whether more limited measures might have been effective, and secondly whether the several month duration of full lockdown was appropriate. These emphasise the desirability of statistically well founded direct measurement of epidemic size through randomized testing. Had such testing being carried out leading up to lockdown it would have been clearer if the measures preceding lockdown (see Figures 2 and 3) were working, or whether stronger restrictions were needed. Similarly such testing might have given earlier indication of when lockdown could be eased. Instead management was reliant on a complex modelling synthesis of expert judgement and problematic clinical case data. Less statistically problematic reconstructions, like the one presented here, are clearly only possible weeks after the fact. Note that while it is natural to interpret these fatal infection trajectories as proportional to the overall infection trajectories, that will only be the case if the infection fatality rate is constant over time. There is evidence for improvements in hospital care from late March onwards that suggest that this is might not be the case (see Dennis et al., 2021). The Supporting Information includes a sensitivity analysis of this issue: it has the potential to right shift the peak incidence by up to a day and to lead to somewhat less rapid decay of the incidence trajectory.    (incidence), f c . The assumed mean time to infectivity was 1/γ = 3 days and the mean infectivity duration was 1/δ = 5 days. The labelled vertical bars show policy change dates in March 2020. Given the rapidity of policy change relative to the epidemic's dynamic time scale, and government policy sometimes lagging behaviour, casual over interpretation of these timings should be avoided. Right: sensitivity analysis. Dashed blue -time to infectivity was varied from 1 to 5 days. Grey -duration of infectivity was varied from 2 to 10 days. Logs are natural. R appears to be below 1 before full lockdown, but fell further after it.

Inferring R
Much public debate has focused on the effective reproduction number, R, and in theory it is possible for a decline in the rate of infections to be only temporary as a result of R dropping but remaining above one. Could it be that the declines in f c seen before lockdown were of this short term type, and that renewed increase would therefore have occurred without full lockdown? The answer appears to be no. R is all but impossible to measure directly, so inference about it requires assumption of an epidemic model. However, given an epidemic model it can be directly inferred from the reconstructed infection profile. For example consider a simple SEIR model:Ṡ = −βSI,Ė = βSI − γE,İ = γE − δI (here δI is the rate of recovery or progression to serious disease). f c is a direct estimate of βSI (to within a constant of proportionality), so by solvingĖ is readily computed (any constant of proportionality cancels in R). A different epidemic model could be used here of course: see Diekmann et al. (1990) for calculation of R in general from a model. Figure 3 shows the results using f c for the English hospital data for plausible values of average time to infectivity of 1/γ = 3 days and mean duration of infectiousness of 1/δ = 5 days, along with sensitivity analysis for these values. The credible intervals shown include the uncertainty about the fatal disease duration distribution. R appears to be below 1 before full lockdown. A useful feature of the R estimates is to emphasise that the analysis in this paper in no way suggests that lockdown did not have an effect on transmission. Even if R was below one before lockdown, full lockdown can only have reduced it further, and the estimates in Figure 3 are obviously consistent with this. Note, however that the recovery in R after the post lockdown dip is to be expected, given the simple fact that R is the number of new infections created per infection, averaged over the population of infections, not the population of people. Broadly speaking, at lockdown the population of people, and infections, was split into the locked down population, where infections could create few new infec- tions, and the 'unlocked' population where the reproductive rate of the pathogen was higher (assuming lockdown had an effect). An initial average over all infections is then dominated by those infections in the locked down population, giving a low R (especially once the possibilities for infecting locked down household members have been exhausted). As the infections in the locked down population die out, the proportion of all infections that are in the unlocked population must increase -so that an average over all infections must yield a higher R again.  (2). Within that model they assume that R is constant between the imposition of interventions, but can undergo a step change at each intervention: the steps are free model parameters. This model for R is quite restrictive. In particular it does not allow R to change after lockdown, despite the fact that at lockdown the population has been stratified in a way that the renewal model does not represent, so that some compensating flexibility in R is likely to be required to avoid modelling artefacts. At the same time the model is rather underdetermined preceding lockdown, because of the frequent intervention changes. This indeterminacy in the model is addressed by using a sparsity promoting prior on the step changes in R, which favours few larger changes, rather than several smaller changes (see the supplementary material for Flaxman et al. for a description of this prior). When using the model to simultaneously model multiple European countries there is a further assumption that the intervention effects are the same for all countries (despite the different order of their implementation) and that only the lockdown effect varies between countries. It seems likely to be difficult to pick up effects of the interventions preceding lockdown from such a model structure.

The Flaxman model
A relaxed version of the Flaxman model in which log R t is a continuous function is described in section 3. The results from using this model for inference using the NHS hospital data are shown in Figure 4. The relaxation of the assumptions on R brings the results (for the UK) into alignment with those in the rest of this paper, and into broad consistency with developments later in the year, which are otherwise difficult to square with Flaxman et al. (2020). tier system lockdown end Figure 5: Inference for the English hospital deaths data up to mid February 2021, including disease duration uncertainty. Top: Inferred fatal incidence. Grey symbols are the hospital deaths from which incidence is inferred. Red vertical lines mark the start of each of the three English lockdowns. Note that improvements in medical treatment mean that the IFR is very likely not to be constant between the first and later waves, so comparing their relative sizes is difficult. Bottom: the inferred R using the simple SEIR approach. NPI impositions short of lockdown are marked by dotted vertical lines, relaxations are marked by dashed lines.

Later infection waves
While the initial motivation for this work was to provide reasonably timely analysis for the first wave, based on the limited data available by May 2020, the methods scale readily to the much longer data runs available by early 2021. The only change is that it makes sense to use an adaptive smoother (see e.g. Wood, 2017, §5.3.5) for f (t), in which the degree of smoothness is allowed to vary with time. The longer data runs make it feasible to estimate the multiple smoothing parameters that this entails. Using an adaptive smooth guards against artefacts driven by the smoothness that is appropriate on average, for all the data, not being appropriate at times of rapid change.
The results of this application are shown in Figure 5. Note that likely changes in infection fatality rate as a result of improved hospital treatment mean that the relative sizes of the fatal infection incidence curves in the first and subsequent waves can not be interpreted as reflecting the relative sizes of total incidence (the later incidence curves would need to be scaled up somewhat). Causal over-interpretation of the R curves should be avoided, not least because there is no reason to expect Covid-19 not to display the seasonality in transmission common to other respiratory illnesses. However, the results are obviously inconsistent with full lockdowns having caused R < 1, since cause should not happen after effect. Further, the drop in R seen after the initial NPIs were introduced, but before full lockdown, does seem consistent with the levels of R later achieved while measures short of lockdown were in place. The interesting feature of R apparently increasing from quite early in the second lockdown, might relate to the spread of the new variant, but of course also occurs at a time when respiratory infections generally start to increase. Likewise the further increase until mid December, may well be due to the new variant, but increased activity in the run up to Christmas is also likely to be a factor -incidence appears to peak over the Christmas to New Year period. Vaccine rollout seems virtually certain to be a major factor in pushing down R and fatal incidence from December. The vaccine has been given to those most at risk first, so the constant IFR assumption required to interpret fatal incidence as proportional to total incidence obviously no longer holds. This further implies that the inferred R is in some sense only the R relevant to the 'at serious risk' population. Of course, it could be argued that for epidemic management purposes, the fatal incidence and the corresponding R are of primary interest.
Interestingly the pattern observed at the second lockdown and in the preceding months is consistent with the results reported by  who analysed regionally stratified death, hospital occupancy and testing data for 2020 up until December, using a highly detailed age structured SEIR with added health service compartments. The entire trajectory up until December is also consistent with the results of Wood and Wit (2021), who re-implemented the Knock et al. model, but removed some of its very strong modelling assumptions around the first lockdown.

Model checking
While standard residual checks indicate no problem with the model from the point of view of statistical fit, there are three issues which could potentially undermine the results, and a further issue relating to interpretation.
The first relates to the infection to death interval distribution and the fact that the death data contain an unknown proportion of patients whose infection was hospital acquired. These patients are likely to have had shorter disease durations, since they were already sufficiently unwell or frail to be in hospital. This paper has inferred when the fatal infections would have occurred if they were all community generated, since it is the community infections that are of interest with respect to the effects of lockdown, social distancing etc. Without knowing even the proportion of deaths from hospital acquired infection it is anyway not possible to do otherwise.
The presence of hospital infections in the death data will bias inference about the dynamics of community fatal infections if it substantially changes the shape of the deaths profile, relative to what would have occurred without hospital infection. Broadly, if the trajectory of hospital acquired infection deaths peaked earlier than the overall trajectory, then the community infection peak will be estimated to be earlier than it should be (since the true community infection death peak is then later). Conversely, if the hospital acquired infection deaths peaked later, then the community infection peak will be estimated as being later than it should be. The degree of bias will depend on the proportion of hospital acquired infections and the degree of mismatch in timings. It is difficult to judge which alternative is more likely: standard epidemiological modelling assumptions would imply that the more community acquired cases are hospitalised the more hospital infections would occur and that hospital infections will lag community cases. But against this, hospital acquired fatal disease durations are likely to contain a higher proportion of shorter durations. In any case the proportion of hospital acquired infections in the death series would have to be quite high for the issue to substantially modify the conclusions.
The second issue is that age dependency in the duration distribution coupled with shifts in the age structure of deaths over time could also be problematic. However, as documented in the Supporting Information, the data for England and Wales show remarkably little variation in the age structure of Covid-19 fatalities over the course of 2020, while analysis of English hospital data apparently shows little evidence for age dependence in the disease duration distribution.
The third issue is whether the smoothing penalty on log f c would lead to systematic mis-timing of  Figure 2. The blue symbols are the simulated death data used for inference. The bottom row is for the NHS England hospital data under the time dilated model. Even this model deliberately modified to promote a very abrupt change at lockdown suggests that the infection rate was probably declining before lockdown.
the estimated peak under the scenario of a very asymmetric peak in the true infection profile around lockdown. To investigate this, data were simulated from a model in which the underlying infection rate increased geometrically, doubling every 3 days until lockdown, when the rate dropped immediately to 0.2 of its peak value, shrinking thereafter by 5% per day. Fatal infections were simulated as Poisson deviates with the given underlying rate. This model is an extreme scenario, in which measures prior to full lockdown had no effect, and the effect of lockdown was instant, as if the locked down population (i.e. those not in essential work) had isolated alone, rather than increasing their contact with members of their household while drastically reducing it with everyone else. However it is the scenario implicit in much public discussion in the UK, at least at the time that this work was originally conducted. Under this scenario, the method does indeed tend to incorrectly estimate the infection peak as 2 to 3 days before lockdown, rather than the day before, as it struggles to accommodate the drop.
The naive approach to this issue is to introduce a parameter at lockdown representing an instantaneous drop in infections. However doing so introduces a very strong structural assumption into the model, undermining the aim of avoiding strong assumptions. This approach also has the serious side effect of introducing non-parametric smoothing boundary effects on both sides of the break. These boundary effects severely compromise inference in the most interesting region of the infection profile, while simultaneously increasing the importance of the structural assumption at the expense of the data. Indeed when such a model is built it estimates a large drop even from data simulated from a smooth infection profile. It also estimates such a drop if the drop's location is moved (for simulated or real data).
A better approach is to use a smooth time-dilation to relax, but not eliminate, the model smoothness assumptions in the vicinity of lockdown. The dilation is made sufficient that the model can accurately capture the extreme scenario in the simulation, but without imposing a break and boundary effects. In particular f c and its smoothing penalty are computed with respect to a version of time which makes the day before, of and after lockdown count as 3.5, 6 and 3.5 days, respectively. Obviously regular un-dilated time is used for mapping infections to deaths. For the extreme simulation, the model then correctly gives most posterior probability to the day before lockdown as the peak. In contrast the same model for the real data has very low probability of the peak being the day before lockdown rather than earlier. Figure 6 shows the results from fitting the time dilated model to the extreme simulation scenario and to the NHS England hospital data. Even this model, deliberately modified to favour a very abrupt change at lockdown, suggests that infections started to decline before lockdown, with the most likely day for the peak only 1 day later than with the un-dilated model. The Supporting Information includes similar checks for the Flaxman et al. (2020) model, with similar conclusions. Finally, interpretation of the fatal incidence trajectories as proportional to the overall incidence trajectories rests on an assumption that the infection fatality rate is constant over time. There is evidence that the hospitalized case fatality rate declined in the two months or so after the peak of the first wave of infections (Dennis et al., 2021), with this effect not explicable by any detectable change in patient characteristics. However, on the ground changes in the severity threshold for admission would be very difficult to detect, seem likely at times when some hospital's were at or near capacity, and could also contribute to such a pattern. The Supporting Information includes a check of the impact that the reported improvements would have on the shape of inferred overall incidence. The peak incidence could be shifted by as much as a day later, and there would be a somewhat slower decline in incidence relative to the results plotted in Figure 2.

Discussion
The analysis in this paper does not absolutely prove that substantial decline in fatal infections in the UK preceded the first full lockdown by several days, but it very strongly suggests that this is what happened, and the ISARIC fatal disease duration distribution data, not available in the peer reviewed version of   Ward et al. (2021), lagged by the mean 5.8 days from infection to first symptoms. Day 1 is Jan 1 2020. Red lines are the lockdowns. Rather than simple lagging, this paper's method could be applied to the onset dates to infer infection dates given the published infection to onset distribution, but because none of the onset peaks are after lockdown this can not change the qualitative conclusion (although it would sharpen the peaks a little). Right ONS published estimated incidence and 95% confidence limits. Red lines are at lockdowns 2 and 3 (the survey was not running at lockdown 1). the paper, only strengthens this point further. Since the peer reviewed publication, information has also become available from the UK's two main Covid surveys based on proper randomized statistical sampling. The Office for National Statistics PCR prevalence survey has now published its estimates of incidence. For the second and third lockdown these support the analysis in this paper. See Figure 7. Furthermore the REACT-2 survey (Ward et al., 2021) has reconstructed numbers of newly symptomatic cases per day from survey participants testing positive for Covid-19 antibodies. Lagging these by the published mean time from infection to first symptoms gives the left plot of figure 7. This agrees with the results of this paper for all three lockdowns. Some differences in relative peak sizes between the series are to be expected if there are changes in infection fatality rate, and as a result of the vaccination program. Taken together these results, from the 3 highest quality data sources available for the UK epidemic, leave little room for reasonable doubt that incidence was in decline before each lockdown. Apparent demonstrations of the contrary result appear to rely on building it into the modelling assumptions, or on highly informal reasoning about timings or correlations, or on a view that scientific opinion, in sufficient quantity, outweighs data and measurement.
The timing result in turn suggests that the measures, and possibly the spontaneous reactions to rising deaths and cases, preceding lockdowns were probably sufficient to bring the epidemic under control, and that community infections, unlike deaths, were probably at a low level well before the first lockdown was eased. Such a scenario would be consistent with the infection profile in Sweden, which began its decline in fatal infections shortly after the UK, but did so on the basis of measures well short of full lockdown.
The analysis does not in itself say what would have happened without full lockdowns, and must obviously be weighed against other evidence. No currently available analysis will conclusively determine what would have happened without full lockdowns, and the state of the art in causal inference is obviously a very long way from being able to answer this question. Models based on approximations to the mechanisms of epidemic transmission do not allow reliable answers to these causal questions either. This is particularly so given the paucity of data with which to validate their component assumptions -a paucity that only grows more acute as more detail is included in the models. These are not weather or climate models, based on the bulk properties of enormous numbers of physically well understood interactions of simple molecules, tested and refined against huge quantities of carefully measured calibration data collected worldwide over decades. Rather they are best working approximations constructed by experts given the limited information that could be rapidly assembled in a matter of months, and subject to all the uncertainty this implies. A model does not become a valid basis for casual inference merely by being described as mechanistic. As the above reanalysis using the Flaxman model serves to emphasise: the inclusion of model structure aiming to represent mechanism is no guarantee of improved statistical inference, and certainly not a justification for treating inference with mechanism based models as causal.
Since this work was first undertaken other low assumption analyses have appeared, in particular looking for the coincidence of NPI introductions and changepoints in incidence, for example in Germany and Spain. The results of this paper are in some alignment with such analyses for Germany (Wieland, 2020;Küchenhoff et al., 2020), which also suggest that a decline in incidence preceded the first full lockdown. Both are based on case data, which are problematic even in Germany which had mass (but not randomized) testing in place from the start of the epidemic. However it seems likely that the biases in case data would lead to the start of decline in incidence being estimated as later than it really was, rather than earlier, so the qualitative conclusion is likely to be robust. In Spain, Santamaría and Hortal (2020) also identify substantial changes in rate of change of incidence before Spanish lockdown based on death series, but not sufficient to suggest a decline in incidence before lockdown. Based on pre-print versions of the current paper a number of researchers have also attempted to employ the basic idea of dynamic model free inference about incidence profiles, but via a simplified method. This method tries to impute date of infection by subtracting a random draw from the fatal duration distribution from each deceased patient's death date. This process is replicated to obtain an expected incidence profile. The method is invalid, as duration of disease is not independent of time of death, and it will tend to incorrectly show much less steep, or no, decline before lockdown. See the Supporting Information for a full discussion.
The results of applying the method to data up to mid February 2021, as well as being consistent with the independent measurements in Figure 7, also provide a picture consistent with the results for the first lockdown. In particular the results preceding the first lockdown appear consistent with how the epidemic progressed under later restrictions short of lockdown. This is not the case for the published analyses suggesting high R and surging incidence on the eve of the first lockdown. The fact that school re-opening does not appear to be followed by an increase in R is interesting: whether it relates to people deciding to keep school children apart from the vulnerable, which is anecdotally plausible, or to other factors, is unclear. While tempting, it is difficult to interpret the later patterns in terms of the new, apparently more infectious, variant that emerged in late 2020: there is confounding with seasonality of transmission, behavioural changes around the end of year holidays and with the roll out of effective vaccines from late December onwards. Greater clarity on these issues may emerge in future, particularly if the UK ONS Covid surveillance data eventually becomes public in raw form.

Acknowledgments
Thanks to Nicole Augustin, John Coggon, Dan Cookson, Peter Green, Dirk Husmeier, Anna Jewell, Jason Matthiopoulos, Jonty Rougier and Ernst Wit for various suggestions and discussions of aspects of the paper. I am grateful to Robert Verity for his unsolicited offer to help get access to the duration data for England, for making the effort to get the permissions and follow through on this, and for being willing to repeat the original data extraction to remove right truncation problems. I am also very grateful to a non-attributable source who provided key information on the necessity of filtering out patients with onset after hospitalization from such data. I thank two fact checkers who, although they declined to correct an article at fullfact.org to remove misrepresentations of this paper,  and Flaxman et al. (2020) and some other related statistical flaws, alerted me to the ISARIC study.
The peer review process for this paper was unusual, with 5 referees and 3 editors/associate editors providing comments, which influenced the paper as follows. The paper was first rejected by JRSSC (Applied Statistics) in early May 2020 for insufficient methodological novelty, and then by Nature and Nature Communications, all without refereeing. In late May it was sent to PLOS ONE and reviewed by an epidemiological referee: they argued that the paper should not be published as, although the substance of the paper appeared correct, a reader might interpret it as implying that lockdowns were unnecessary. They also argued that the drop in incidence could have occurred despite R > 1 before lockdown: this latter point inspired the simple R calculation presented in the paper. Editor Abdallah M. Samy rejected the paper (at the end of July) on the basis of this first report, a decision formally overturned on appeal in early August. A second referee then strongly recommended publication in a detailed review (dated late August), and provided useful references on the care home epidemic. At the same round a third referee commented briefly on the paper, stating that recessions save lives and consequently suggesting removal of one paragraph of discussion from the paper. On 20 November Abdallah M. Samy formally invited a major revision, but added his own report stating that the paper was unacceptable, arguing that the discussion of possible human costs of lockdown was inappropriate, that incidence can be meaningfully inferred from case data (how was not stated), that statistical analysis could suggest nothing whatsoever about what might have happened without lockdown (a dynamic epidemic model is apparently required), that the study needed an ethics committee statement, and that the paper was remiss in not citing Flaxman et al (the update including the Flaxman model having been supplied 2 months earlier). These points motivated the transfer of the paper to the statistical journal Biometrics and the discussion of causal inference in the discussion. One referee and the AE at Biometrics made the very useful suggestion to analyse the data up to mid February 2021, and provided helpful extra references, substantially improving the paper. The editor's comments were also helpful. A further (theoretical epidemiologist) referee understood the paper's aim to be implementation of the Flaxman et al. model as a GAM, motivating much clearer signposting of several technical aspects. Their comments on age structure effects and the feasibility of PCR testing based incidence estimation led to the sections in the Supplementary Material on these points, and they also suggested giving calendar dates on plots making them much more readable.
Supporting Information for 'Inferring UK COVID-19 fatal infection trajectories from daily mortality data' 1 Feasible direct inference of incidence from randomized PCR testing Useful estimates of incidence can be obtained from properly randomized PCR surveillance testing, even using numbers of tests well within the laboratory capacity available early in the epidemic. This section provides a simple illustration of this, by sketching a method and showing its ability to capture incidence profiles at the sort of levels that are important for decision making -i.e. at a level slightly over 1 per 1000 per day. For illustrative purposes I consider a very simple model of PCR positivity in which the proportion, P , of people potentially testing positive is governed by the simple ODE model (ẋ denoting the time derivative of x)Ṗ = f (t) − δP where f is the incidence (strictly speaking of potential PCR positivity) as a proportion of the population, and 1/δ is the mean duration of positivity. One could of course substitute any number of alternative models for the assumption of an exponential distribution of the time that subjects are PCR positive, without changing the basic approach discussed here. With only slightly more effort a stochastic formulation could also be substituted (although is likely to add little, given the large numbers involved). The number testing positive in random samples of size N from the population is then given by where α is the test sensitivity (which is measurable in a reasonably direct manner). As in the main paper, we can represent f semi-parametrically, e.g. using a smoothing spline, so that where X t is a row vector of spline basis functions evaluated at time t. Writing the derivative of P w.r.t. β j as P β j we have an ODEṖ β j = f (t)X tj − δP β j for each such derivative (known as sensitivities in this context). Given any value of β it is straightforward to solve for P and the sensitivities, for example by 4th order Runge-Kutta integration. Hence the log likelihood and its derivative are readily evaluated, and the empirical Bayes approach given in the main paper can be used to find the posterior models,β, an appropriate smoothing parameter and the large sample posterior covariance matrix. To avoid requiring the second derivative ODE system,β can be obtained by quasi-Newton optimization, with the Hessian required for smoothing parameter update obtained from the first derivative of the log likelihood by finite differencing.
By way of illustration, data were simulated from such a model for 100 days, with 400 tests per day (2800 per week) conducted on randomly selected people from a general population subject to the true incidence curve shown in red in figure 8, and δ = 0.1. The method was then used to reconstruct the incidence curve (here 100% sensitivity was assumed, since sensitivity is a simple scale parameter in this problem). Three random replicate reconstructions are shown in figure 8. Uncertainty is wide at the end of the data, but usable for 10 days earlier. Of course the swab to testing lag adds to this. Larger sample sizes would be needed if local/regional estimates are required, but for the 'whole country' picture considered in the main paper such direct estimation is clearly feasible.

How not to infer fatal incidence
Several researchers picked up the pre-print version of this paper (Wood, 2020) and have attempted to use the basic idea of inferring fatal incidence directly from death trajectories and the fatal disease distribution, but via a simple 'imputation' method. Suppose the ith patient died on day t i . A random draw from the fatal disease duration distribution, τ i , is subtracted from their death day to give an imputed infection day, t i −τ i . Repeating this for all deaths generates an imputed fatal incidence curve. Repeating the imputation many times allows an expected incidence curve to be generated. This method is not valid. It is completely plausible that duration of disease is independent of time of infection, but not of time of death. Further, unless incidence and deaths are at some constant equilibrium, duration of disease can not be independent of both time of infection and time of death: when deaths are rising, we inevitably see the deaths from short duration diseases before those from longer durations. Since the imputation method assumes independence of t i and τ i it can not be valid. Figure 9 shows that this is not a minor concern. It shows incidences reconstructed using the described imputation method. I then added random draws from the fatal duration distribution to the imputed days of infection, to obtain the expected daily deaths implied by the imputed incidence trajectory (essentially the 'sanity check' applied in the main paper). The expected daily deaths are an exceedingly poor fit to the data.

Fatal disease duration distribution
Fatal disease duration data for England are available in the CHESS 1 database, access to which is restricted to particular research groups under strict conditions. With the kind help of Robert Verity from Imperial College I was able to access information on the distribution of fatal disease durations for 3274 deaths that occurred before 10 June 2020 with recorded symptom onset before 1 May. The information provided was a bar chart of the duration distribution by day, on condition that only the information about the model fitted to the data be distributed further. The data were not filtered to remove hospital acquired infections, but it was not possible to obtain data only for those with onset before hospitalization. This is problematic for two reasons. Firstly, for inferring the time course of community acquired fatal infections it is the distribution of fatal disease durations for community acquired infections that is required, which the raw data do not provide: for example, they contain substantial proportions of durations of 1-3 days that appear clinically implausible for deaths from community acquired COVID-19 (see, e.g. Huang et al., 2020;Wang et al., 2020;Zhou et al., 2020;Tay et al., 2020). Secondly the raw data are from a relatively small proportion of the total deaths. It is very unlikely that the ratio of hospital to community acquired infections in this sample is representative: for hospital acquired infections the onset of symptoms is presumably almost always known, and hence more likely to be recorded than for community acquired infections. This makes the raw distribution unrepresentative of the distribution for all deaths and also not usefully informative about the proportion of all deaths that are from hospital acquired infection. Note also that without more extensive data access it is not possible to rule out that some proportion of what appear to be hospital acquired infections really represent data problems (for example recording onset day as hospital admission day).
To deal with these issues a two component mixture model was fitted to data digitized from the bar chart, consisting of a gamma distribution (representing hospital acquired infections) and a log-normal distribution (representing community acquired infections). Parameterization was such that the lognormal had the longer mean duration. The higher the gamma mixture proportion the larger the lognormal mean. To find the shortest mean community acquired duration defensible from the data, the gamma mixture proportion was reduced to the point at which the log likelihood was about 4 below the MLE (decreasing further decreases the log-likelihood sharply, pushes a χ 2 goodness of fit statistic into the significant range, and starts to suggest rather high probabilities of very short disease durations for the log-normal mixture component). This point has about 0.7 of the mixture contributed by the community infection component. The resulting log-normal community infection fit has a mean of 21 days and a standard deviation of 12.7. Longer durations would be slightly more consistent with the data under the mixture model, but given the aims of this paper it is better to use conservatively short estimates here. Figure 10 shows the various estimated distributions over the duration range observed in the CHESS data. The log-normal model has an earlier mode, but longer tail, than the Verity et al. (2020) model used in earlier versions of this paper.
It should perhaps be noted that this model was obtained before I was aware of  and . Note also that the data for this model were obtained before the decision to attribute deaths to Covid-19 only if there was a positive test within the 28 days preceding death: this may be the reason for the model's slightly heavier tail. Otherwise the results are broadly in agreement with those from the published studies.
Assuming independence of incubation period and onset to death period, the preceding fit and the  . The mixture model was estimated by maximum likelihood, with the hospital acquired mixture proportion reduced until the profiled log likelihood was reduced to 4 below the MLE, to obtain the shortest mean community acquired duration consistent with the data under this model. The black curves in no way inform the red curve in the fitting.
More recently results of Robert Verity's own more detailed analysis of the CHESS data have appeared in . The full fitted distribution is not given, but the figures that are reported imply a slightly shorter mean duration of just over 24 days. This is just under a day and a half less than for the mean duration for the average distribution used in the main paper, and within the uncertainty range considered in the paper.
A second less problematic source of information is the ISARIC study, which I was unaware of before the peer reviewed version of this paper was published. Pritchard et al. (2020) summarizes information on the patients enrolled in this study up until October 2020. At this time point the recruitment rate to the study was quite low, so that right truncation problems should have a fairly low impact. Figure 12 of the study plots the observed cumulative distribution function for time from hospitalization to death for the 24421 study patients who had died up to the time of the report. This can be digitized and converted to a probability function, which can be well approximated by a log-normal density (mean 12.47 and standard deviation 10.97) -see Figure 11. Pritchard et al. (2020) report that the time from onset of symptoms to hospitalization had a mean of 7.7 days and standard deviation of 6.1. So the mean time from onset to death is about 20.2 days. Assuming independence of the two durations the standard deviation of the onset to death duration is then 12.55.
Modelling the onset to hospital duration as log normal, the best fit lognormal approximation to the distribution of time from infection to death then has a mean of 27.7 and standard deviation 12.0 (an alternative gamma approximation was a poor fit). Note that this distribution estimate does not take into account the small amount of right truncation present in the data, so might be slightly biased towards lower durations.  Figure 11: Empirical probability function of hospitalization to death interval distribution obtained by differencing the CDF digitized from Figure 12 of Pritchard et al. (2020), fitted (to minimize KL-divergence) by a log normal density.

Possible age structure effects
One possible concern is that if the distribution of fatal disease duration is strongly age dependent and the age distribution shifts over time, then the results of the paper's analysis could be biased in ways that could be difficult to correct. In fact Dennis et al. (2021) looked for temporal changes in patient characteristics including age as possible explanations for the mortality improvements that they report in the early months of the epidemic, but did not find age distribution changes in hospitalized patients. Additionally  analysed English hospital data to parameterize a detailed age structured epidemic and hospital model, but while they report age effects on rates of hospitalization and transfer to ICU, with different distributions of time to death for ICU and general ward patients, those distributions are not reported to be age dependent.
While both the cited analyses rely on confidential data with stringently controlled access, it is possible to look for evidence of age distribution shifts in the weekly England and Wales Covid-19 deaths by age data publicly available from the UK Office for National Statistics 2 . These data give total England and Wales Covid-19 deaths each week in 20 age bands, < 1, 1-4, 5-9 , . . . , 85-89 and 90+. They also record the total number of Covid-19 deaths each week in care homes for the elderly in England and Wales. To look for age distribution changes in hospitalized patients, it is necessary to remove the care home deaths from the weekly totals. The care home deaths are not broken down by age, so I simply reduced the total deaths in the last three age classes by the same proportion, in order to reduce the each weekly total deaths by the correct amount.
A negative binomial generalized additive model was then fitted to the data, with the structure where a i denotes the age class (a number from 1 to 20) and w i is the week. f 1 and f 2 are univariate splines, while f 3 is a tensor product interaction spline, (without the main effects). Thus f 3 represents any change in age distribution of deaths over time. See Wood (2017) section 5.6.3 for details. f 3 is statistically significant, but the effect size is too small to be biologically significant. Figure 12 shows the estimated model components. Leaving the care home deaths in the totals leads to a slightly stronger interaction ranging from about -0.6 to 0.4 in the early weeks. This reflects the somewhat different dy-  Figure 12: Generalized additive model term estimates on the log link scale, from fitting to England and Wales hospital death by age data. Left is the effect of week, middle of age group and right of their interaction. The interaction ranges over approximately -0.2 to 0.2, and is clearly a very small effect relative to the others.
namics of the care home epidemic relative to the community epidemic, as discussed in the main paper. The main effects are essentially unchanged from those shown.

Further model checking of relaxed Flaxman model
The time dilation check from the Model checking section of the paper was also applied to the relaxed Flaxman et al. (2020) model, with the results shown in the upper panel of figure 13. Again the results are qualitatively similar to the undilated case, despite modifying the model to favour sharp change in R at lockdown. Although highly problematic for the reasons discussed in the paper, the results of a check using a model in which a step change was forced to occur at lockdown is also shown in the lower row of figure 13. The boundary condition artefacts that this introduces are clearly visible, but in any case the inferred R on the eve of lockdown is about 1.5. This is substantially below the Flaxman et al. estimates of close to 3.

Sensitivity to mortality rate reductions
There is evidence for reductions in the hospital mortality rates in England from the week of 29th March 2020 until the end of June, with this reduction apparently not being attributable to any change in patient characteristics: Dennis et al. (2021) report mortality rates reducing by a multiplicative factor of about .985 per day (before then, if anything the death rates were increasing). While this does not undermine inference of fatal infection incidence, it obviously means that fatal disease incidence should probably not be interpreted as proportional to overall incidence. Given the uncertainties in the Dennis et al. (2021) results, a direct correction is difficult. Furthermore ruling out changes in severity of disease required for admission over the first wave is also not possible: for example, general practitioners (family doctors) were initially working with central guidance on when patients should self isolate, but not when they should be sent to hospital, so it seems unlikely that on the ground admission criteria were constant, especially at times when some hospitals were at or near capacity. However a sensitivity test is straightforward. The observed deaths each day can simply be scaled up by the ratio of the number of deaths expected without improvements to the number expected with improvements (assuming .985 per day im-  Figure 14: Sensitivity of the results to improvements in the IFR. To interpret the fatal incidence trajectories as proportional to overall disease incidence, the IFR has to be constant. There is evidence for this not being the case as hospital care has improved. These plots show inferred incidence from death data 'corrected' for the mortality rate improvements estimated in Dennis et al. (2021). Note the very slight rightward shift in the peak timing distribution, and somewhat slower decay in the incidence profile. provement from 29th March). This has the effect of making the downward tail of the adjusted deaths series decay more slowly than for the observed deaths (see right panel of fig 14). Applying the method to the English hospital data then gives the results in figure 14. There is a shift in the inferred peak incidence to later, and the incidence decays more slowly, relative to the results shown in the main paper. Note that the mortality improvements only apply to hospital deaths, not care home deaths.