Inference under Superspreading: Determinants of SARS-CoV-2 Transmission in Germany

Superspreading complicates the study of SARS-CoV-2 transmission. I propose a model for aggregated case data that accounts for superspreading and improves statistical inference. In a Bayesian framework, the model is estimated on German data featuring over 60,000 cases with date of symptom onset and age group. Several factors were associated with a strong reduction in transmission: public awareness rising, testing and tracing, information on local incidence, and high temperature. Immunity after infection, school and restaurant closures, stay-at-home orders, and mandatory face covering were associated with a smaller reduction in transmission. The data suggests that public distancing rules increased transmission in young adults. Information on local incidence was associated with a reduction in transmission of up to 44% (95%-CI: [40%, 48%]), which suggests a prominent role of behavioral adaptations to local risk of infection. Testing and tracing reduced transmission by 15% (95%-CI: [9%,20%]), where the effect was strongest among the elderly. Extrapolating weather effects, I estimate that transmission increases by 53% (95%-CI: [43%, 64%]) in colder seasons.


Introduction
At the point of writing this article, the world records a million deaths associated with Covid-19 and over 30 million people have tested positive for the SARS-CoV-2 virus. Societies around the world have responded with unprecedented policy interventions and changes in behavior. The reduction of transmission has arisen as a dominant strategy to prevent direct harm from the newly emerged virus. Yet, quantitative evidence on the determinants of transmission remains scarce.
Individual variation in transmission (dispersion, or potential for so-called superspreading [25]) is one of the reasons why the collection of adequate evidence is challenging. Overdispersion (high likelihood of superspreading) is well documented for SARS-CoV-1 [27,24] and is shown to be a dominant feature of SARS-CoV-2 (see also [1]). Under superspreading, single cases convey little information, as transmission kinetics are driven by a few outliers. Thus, conclusions drawn from anecdotal evidence with few primary cases (e.g., [16] and [19]) are subject to substantial uncertainty.
In the absence of sufficiently large and detailed contact tracing data, some research has relied on surveillance data with aggregated cases [30,29,5] or deaths [13,5]. However, such data provide additional challenges to inference, including a time lag in reporting, under-reporting of cases, and a lack of established methods to account for superspreading.
I address those issues with the following three points. First, by aggregating cases based on symptom onset instead of reporting date, the timing of transmission is estimated more sharply. Second, I account for underreporting by modelling infections as a compartmentalized, unobserved latent process. Age compartments make it possible to identify age-specific growth rates if reporting differs between age groups. Finally, the paper proposes a probabilistic infection process that extends well-established models for superspreading [25] to aggregated case counts. I apply the model to German surveillance data and find that transmission was reduced predominantly by public awareness rising and voluntary behavioral adaptations to publicly reported local incidence. Furthermore, extrapolation suggests strong seasonal effects and potential for testing and tracing.
In the following, I introduce the model. Afterwards, the data is described, and estimation results are illustrated and discussed. A supplementary document contains additional detail in Sections S1 through S6.

Model
The model distinguishes between cases and infections. Infections at time $t$ are denoted by $i_t$. Infections are unobserved. Instead, we observe the number of reported cases $c_t$ with symptom onset at time $t$. The full model (Supplementary Section S1) uses daily data and accounts for timing as standard in the literature [7,13]. In this section, I present a simplified version to illustrate the implications of superspreading for statistical inference. In particular, the generation time (from primary infection to secondary infection) and incubation period (from infection to symptom onset) are fixed to one time step. Further, indices for age and location are dropped. This simplified model is illustrated in Figure 1. A primary infection causes new infections in the following time interval. The average number of secondary infections caused by one primary infection is denoted by the reproductive number $R_t$. The probability of an infection being reported as a case is denoted by the reporting rate $r_t$. In a popular transmission model [25] secondary infections follow a negative binomial distribution with mean $R_t$ and dispersion $\Psi$, i.e. $\mathrm{NB}(R_t, \Psi)$. Dispersion $\Psi$ describes the individual variation in transmission, where small values are associated with high variation. The reproductive number $R_t$ is determined by the conditions that transmissions at time $t$ are subject to. Given those conditions, I assume that the number of secondary infections caused by one primary infection is independent across individuals. This implies for infections $i_t$ that $i_t \sim \mathrm{NB}(i_{t-1} R_t,\ i_{t-1} \Psi)$.
Thus, infections are more dispersed under a small infection count $i_{t-1}$. For large infection counts $i_{t-1}$ the model converges to a Poisson model with no overdispersion.
The growth rate $i_t / i_{t-1}$ constitutes a random variable with mean $R_t$ and variance $\frac{R_t(\Psi + R_t)}{i_{t-1}\Psi}$. Variation in the growth rate is caused by variation in the mean $R_t$ and noise influenced by individual dispersion $\Psi$. We can separate the two factors, as the former influences growth rates irrespective of the current infection count, while the latter has less impact for many infections $i_{t-1}$. Empirical weekly growth rates in Germany fit this model and can be used to estimate the dispersion parameter (see Section S2.2).
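The variance expression can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the estimation code: it uses the standard negative binomial variance (mean plus squared mean over dispersion) and contrasts the dispersion that scales with the infection count against a constant-dispersion variant. The function name and the values of `R` and `psi` are illustrative assumptions, not estimates.

```python
def growth_rate_variance(R, psi, i_prev, scale_dispersion=True):
    """Variance of the growth rate i_t / i_{t-1} for a negative binomial i_t.

    i_t has mean i_prev * R. Its dispersion is i_prev * psi when dispersion
    scales with the infection count, and psi in the constant-dispersion variant.
    """
    mean = i_prev * R
    k = i_prev * psi if scale_dispersion else psi
    var_count = mean + mean ** 2 / k   # negative binomial variance
    return var_count / i_prev ** 2     # variance of the ratio i_t / i_{t-1}


R, psi = 1.2, 0.5  # illustrative values, not estimates
for i_prev in (10, 100, 1000):
    v_scaled = growth_rate_variance(R, psi, i_prev)
    v_const = growth_rate_variance(R, psi, i_prev, scale_dispersion=False)
    print(i_prev, round(v_scaled, 4), round(v_const, 4))
```

With scaled dispersion the variance of the growth rate shrinks in proportion to $1/i_{t-1}$, whereas with constant dispersion it approaches $R_t^2/\Psi$ however many infections there are.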
The model presented in this study deviates from the literature, where overdispersion is often ignored [5,17,8] or assumed constant [13]. A constant dispersion of aggregated cases arises under the assumption that the number of secondary infections is identical across primary cases. In this case it follows that $i_t \sim \mathrm{NB}(i_{t-1} R_t, \Psi)$. This alternative assumption of a constant dispersion is inherent, but often unappreciated, in inference based on negative binomial regression, the endemic/epidemic model introduced in [15], and epidemiological models with random effects (e.g., the model in [12]). The model in this study does not feature infections between compartments. Supporting this simplification, a large-scale contact tracing study in India found that most transmissions occur within the same age group [21]. I discuss extensions with importation in Section S5.2. The remaining parts of the model are standard. Cases $c_t$ constitute a sum of Bernoulli trials that is approximated by a Poisson distribution for computational convenience.
The reproductive number is modelled as a function of covariates, where the effect is assumed to be multiplicative. Each location and age group features a basic reproductive number $R_0$ in the absence of all interventions and under average weather conditions. Effect estimates can be interpreted as the ratio of prevented/added infections at a particular time.
The main obstacles for inference include high correlation between covariates, unobservables, and an unknown reporting rate (see Section S5 for details).

Reporting rate
Reproductive numbers are identified even if only a fraction of infections is ascertained as cases [31]. Importantly, said fraction has to be constant over time. As the likelihood of developing symptoms changes with age (Figure S2), the reporting rate $r_t$ cannot be assumed to be constant across age groups.
I chose a reporting rate of 25% for the model. Case fatality rates can provide some support for this model assumption. If the infection fatality rate is constant over time, the case fatality rate identifies changes in the reporting rate. The observed case fatality rate in Figure 2 does not suggest major changes over time. A comparison to the age-specific infection fatality rates (estimated based on multiple international serological studies [22]) indicates a reporting rate close to 25%. Similar reporting rates arise based on first evidence of unpublished serological studies in Germany [18]. Identification of the reproductive number relies on the assumption that symptomatic infections were equally likely to be reported over time within a specific age group and location.
Table 1: Estimated basic reproductive number $R_0$, dispersion $\Psi$, respective ratio of primary infections actually infecting, and ratio of secondary infections from the 20% most infecting primary cases for different age groups. The ratios of secondary infections were computed assuming a constant reproductive number of 1.
For the age groups 15-34 and 35-59 we observe high dispersion, with more than half of all cases infecting nobody and 20% of primary cases initiating 70 to 80% of secondary cases. Older age groups show less tendency for superspreading. Previous work studied dispersion abstracting from age differences. For comparison, I simulated the marginal distribution of secondary infections, which is a mixture negative binomial distribution [14]. Assuming equally distributed infections across age groups, its mean-to-variance ratio is 0.378 (95%-CI: [0. 33 . Global datasets of outbreaks suggested a higher potential for superspreading with a dispersion parameter of 0.10 (95%-CI: [0.05, 0.20]) [10]. Notably, the aforementioned studies do not account for changing reproductive numbers and are prone to overestimate the prominence of superspreading. Further, clusters might be more likely to be traced, while diffuse community spread is harder to identify.
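The two summary ratios used above (the share of primary cases infecting nobody, and the share of secondary infections caused by the 20% most infectious primary cases) follow directly from the negative binomial probability mass function. The sketch below is a hedged illustration with a reproductive number of 1; the function names, the truncation point `n_max`, and the dispersion values in the loop are assumptions chosen for illustration and are not the estimates of Table 1.

```python
import math


def nb_pmf(n, mean, k):
    """Negative binomial pmf with mean `mean` and dispersion `k`."""
    log_p = (math.lgamma(n + k) - math.lgamma(k) - math.lgamma(n + 1)
             + k * math.log(k / (k + mean)) + n * math.log(mean / (k + mean)))
    return math.exp(log_p)


def share_not_infecting(mean, k):
    """Probability that a primary case causes no secondary infection."""
    return nb_pmf(0, mean, k)


def share_from_top(mean, k, top=0.2, n_max=5000):
    """Share of secondary infections caused by the `top` most infectious cases."""
    remaining = top   # probability mass of primary cases still to be allocated
    caused = 0.0      # expected secondary infections from the allocated mass
    for n in range(n_max, 0, -1):   # start with the largest offspring counts
        take = min(nb_pmf(n, mean, k), remaining)
        caused += take * n
        remaining -= take
        if remaining <= 0:
            break
    return caused / mean


for k in (0.1, 0.5, 1.0):  # illustrative dispersion values
    print(k, round(share_not_infecting(1.0, k), 3), round(share_from_top(1.0, k), 3))
```

Smaller dispersion values concentrate transmission in fewer primary cases, which is why both ratios rise as `k` falls.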
Other limitations apply to the approach presented here. The model does not allow for overdispersion in reporting and is therefore prone to overestimate the dispersion in transmission. Moreover, changes in the reproductive number might be inadequately modelled. If the reproductive number is over-fitted, individual variation is under-estimated (and vice versa). Many covariates are strongly correlated (Figure S20). However, the correlation matrix of effect estimates suggests that effects are identified (Figure S21).
Figure 3: Change in transmission associated with covariates. The plot depicts the mean and 95%-CIs for age-weighted effect estimates of covariates on the reproductive number $R_t$. Real valued variables (r) and standardized variables (s) are marked.

Effects of policy interventions
The strongest effect was associated with public awareness rising. Transmission fell by 58% (95%-CI: [53%, 62%]) when government officials gave their first speeches asking for decided behavioral changes from the public. Simultaneously implemented changes, like the staggered declaration of international risk areas, might bias this result.
The closure of restaurants was associated with additional reductions in transmission of 15% (95%-CI: [5%, 23%]). The effect was strongest for the age group 15-59.
School and daycare closures were associated with a 12% (95%-CI: [−5%, 28%]) reduction in transmission. The intervention affected the age group 15-79 equally, which suggests that the reduction stems from behavioral changes, instead of the absence of transmission in the educational setting, which would affect other age groups one generation time later.
Limiting sports was associated with additional reductions in transmission. The closing of non-essential shops, mandatory distancing and limitation of gatherings in public spaces, however, showed no evidence for reducing transmission. In the age group 15-34, transmission actually increased by 25% (95%-CI: [−1%, 54%]) when public distancing became mandatory. This might explain why no significant effect of stay-at-home orders nor of business closure was found in the United States [5]. The regulation could induce more private contacts with higher transmission risk. In some states a mild stay-at-home order was imposed, allowing individual sport and work, which was associated with a 9% (95%-CI: [4%, 13%]) reduction in transmission across all age groups.
Initially, testing was covered for patients with exposure or within a risk group. Later, any symptomatic case, regardless of exposure, was eligible. The narrow testing regime was associated with a 16% (95%-CI: [7%, 26%]) increase in transmission. This effect is driven by younger cohorts, whereas transmission in the age group 80+ was reduced.
The reopening of daycare and churches with precautionary measures had no significant impact. Some evidence for increased transmissions was associated with school reopenings.
Masks (mouth and nose cover including cloth masks) became mandatory in supermarkets and public transport. Compliance was found to be high [3] and timing of implementation varied across states and counties. A small effect of −6% (95%-CI: [−17%, 7%]) was associated with this policy. If only a small share of transmissions occurs in these settings, this is consistent with a household study in China, where mask wearing before symptom onset reduced transmission by 80%. As a comparison, mandatory masks for employees were associated with a reduction in transmission by 10% in the United States [5].

Effects of information and testing and tracing
Testing and tracing has been argued to be crucial [11]. As symptom onset date and reporting date (when health departments were informed about a positive test) are available, we know for each case the days of potential infectiousness and when testing and tracing was initiated (for details see the supplementary material). This makes it possible to construct a daily location-specific proxy for the ratio of traced infectious and to provide empirical evidence on the effectiveness of testing and tracing that corroborates results of modelling studies [20].
The ratio of traced infectious was associated with a reduction of transmission by 33% (95%-CI: [22%, 43%]). As the ratio does not reflect unreported infections, the impact of testing and tracing on an infectious individual arises after adjusting for the reporting rate. Extrapolating to unobserved infections, testing and tracing reduced secondary infections by 84% (95%-CI: [35%, 100%]) for the age group 15-59. The effect was strongest for the high risk group over 60 years, where the ratio of traced infectious reduced transmission by 44% (95%-CI: [27%, 59%]). Adjusting for unreported infections, this effect corresponds to an eradication of transmission to older age groups for tested and traced infectious individuals. The hypothetical extrapolation to all infections should be interpreted with caution.

Figure 4: Total effect of testing and tracing (ratio of traced infectious), information (logarithm of reported local incidence), and season (average temperature and relative humidity). 95% confidence bands are shown. Panels A and B show the total effect given the data. Panel C extrapolates the total effects of weather variables in an out-of-sample prediction based on average daily weather in the past three years, where confidence bands represent uncertainty in effect estimation, and results are smoothed with a 14-day rolling average.
Recent modelling studies for Germany concluded that 30-50% of transmission can be reduced by testing and tracing [6]. I find that the total effect of testing and tracing increased from March to May (Panel A of Figure 4). In May 2020, testing and tracing was attributed an average transmission reduction of 15% (95%-CI: [9%, 20%]). For the age group over 80 years, the impact is larger with a reduction of 28% (95%-CI: [17%, 37%]).
In general, testing and tracing capabilities are limited. If infections surpass capacities, the ratio of traced infectious decreases. The results from above imply that unmitigated spread increases the speed of transmission, especially in older age groups.
Another potential factor for transmission is voluntary behavioral response to risk of infection [5]. Individual risk of infection depends on local incidence. In line with this argument, I find that publicly reported incidence reduced transmission. This information effect was highest in April (Panel B in Figure 4) when estimates indicate a reduction of 44% (95%-CI: [40%, 48%]). An interesting counterfactual is the level of incidence that is sufficient to stop growth in the absence of policy interventions. The results suggest that a reported incidence of 300 to 1000 weekly cases in 100,000 induces sufficient voluntary behavioral change. This incidence corresponds to 3,600 to 12,000 weekly deaths in Germany (based on a reporting rate of 50% and an infection fatality rate of 0.75% guided by [22] and the German age distribution from 1 to 80 years). The result is speculative, as it extrapolates beyond the support of the data (the 99% quantile of incidence is 160 cases in 100,000).
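The conversion from reported incidence to weekly deaths is plain arithmetic and can be retraced as follows. This is a sketch under one loudly stated assumption: the population size is not given in the text, and the round figure of 80 million used here is merely the value that reproduces the quoted range. The function name is hypothetical.

```python
def weekly_deaths(weekly_cases_per_100k, population, reporting_rate, ifr):
    """Weekly deaths implied by a reported weekly incidence per 100,000."""
    reported_cases = weekly_cases_per_100k / 100_000 * population
    infections = reported_cases / reporting_rate  # scale up to all infections
    return infections * ifr


POPULATION = 80_000_000  # assumption: round figure, not stated in the text
for incidence in (300, 1000):
    # reporting rate of 50% and infection fatality rate of 0.75% as quoted above
    print(incidence, round(weekly_deaths(incidence, POPULATION, 0.5, 0.0075)))
```

Under these inputs the two incidence levels map to roughly 3,600 and 12,000 weekly deaths.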
Cumulative incidence (in percentage points) was negatively associated with transmission. Random mixing, full immunity after infection, and the absence of underreporting would be consistent with an effect of −1%. The data does not allow a sharp identification of this effect as the highest measured cumulative incidence was 1.2%, which corresponds to a reduction of 20% (95%-CI: [4%, 35%]).

Effects of weather and seasonality
Seasonal effects have been discussed to play a role in SARS-CoV-2 spread [2,4]. I find that low temperature was associated with higher transmission. Relative humidity was associated with a small increase in transmission. As the latter effect is mostly driven by the age group 60-79, there is no strong evidence for relative humidity being an important determinant of transmission in Germany. The extrapolated change in transmission due to seasonality is 53% (95%-CI: [43%, 64%]) (based on average temperature and relative humidity in January and July). The expected seasonal effects over the year are shown in Panel C of Figure 4. These findings are consistent with other studies. In an international cross-city comparison, low temperature and humidity were associated with community spread [28]. A cross-state comparison in the United States found that high temperature and UV-light reduced incidence [30].
The effect estimate for seasonality conflates the interaction of the virus with environmental circumstances and behavioral changes due to weather [4]. Moreover, extrapolating effects identified by daily weather to entire seasons can induce substantial biases. In line with the findings presented here, a study based on monthly case data in countries on both hemispheres found that the season of other human coronaviruses was associated with low temperature and high relative humidity [23].

Recent data as robustness check
The effect estimates presented so far are subject to substantial uncertainty and suffer from high correlation (compare Figure S21). Model misspecification may play a role as other factors can influence behavior contemporaneously.
The main model is estimated on data until May 2020 as detailed intervention data is not available for the remaining time period. However, the covariates for weather, information, and testing and tracing are available. As a robustness check, I consider data from May to August 2020 including the age group 5-14 and with the same model specifications, but substituting policy interventions with week fixed effects. This study covers another 30,000 cases in 141 counties. See the supplementary material for details.
I find that results are consistent for information on incidence, temperature, and testing and tracing. Seasonality effects were weaker, which might indicate non-linear effects. School children below 15 years exhibited higher transmission during school holidays. This suggests that reopened schools constituted a relatively low risk of transmission.

Discussion
Empirical evidence is paramount to inform modelling studies and policy decisions. I rely on features of German surveillance data to improve upon early empirical studies analyzing SARS-CoV-2 transmission [13,17,29,9]. In particular, symptom onset allows sharper estimation of infection timing. Moreover, age groups improve identification of growth rates and enable the estimation of age-dependent effects. Inference in such highly compartmentalized data should account for superspreading. As with any structural estimation, results should be interpreted with caution and a keen eye on the model's limitations. The Bayesian framework makes it possible to integrate alternative assumptions (e.g., priors on the reporting rate) and to assess their impact on the results presented here.
A large set of covariates was analyzed. Previous studies found stronger effects of policy interventions without controlling for information, tracing, or seasonality. For example, the shut-down in France was attributed a 77% reduction in transmission based on hospitalization and death data [29]. An international comparison of death data estimated an effect of 81% [13]. A reduced form approach on case growth found heterogeneous effects of policy interventions in China, South Korea, Italy, Iran, France and the United States [17]. As noted in other studies, growth rates in Germany fell substantially before major policy interventions were put in place [9]. If not controlled for, adaptation to local risk of infection can be wrongly attributed to policy interventions [5]. I find that higher reported incidence reduced transmission. This explains the common observation that incidence rarely exhibits prolonged exponential growth. Spread is unmitigated as long as it is undetected. In Germany, the reduction in transmission was driven mainly by behavioral adaptations to reported incidence instead of immunity by infection. If transmission keeps decreasing with higher incidence, there exists an equilibrium value for reported incidence where the reproductive number is 1. If information or behavioral adaptation is delayed, incidence moves in waves around this equilibrium value.
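The equilibrium argument can be illustrated with a deliberately simple toy model. The sketch below is not the estimated model: it assumes, purely for illustration, that the reproductive number falls as a power of the reported incidence observed `delay` generations earlier. The functional form, the parameter names `r0`, `gamma`, `delay` and `reference`, and all numerical values are hypothetical.

```python
def simulate_incidence(r0, gamma, delay, start, steps, reference=1.0):
    """Toy feedback loop: R falls as the (delayed) reported incidence rises.

    Assumed form: R_t = r0 * (incidence_{t-delay} / reference) ** (-gamma),
    with one step corresponding to one generation time.
    """
    path = [start] * (delay + 1)   # pad the history so delayed values exist
    for _ in range(steps):
        observed = path[-1 - delay]
        r_t = r0 * (observed / reference) ** (-gamma)
        path.append(path[-1] * r_t)
    return path[delay:]


def equilibrium(r0, gamma, reference=1.0):
    """Incidence at which the assumed reproductive number equals 1."""
    return reference * r0 ** (1.0 / gamma)


eq = equilibrium(2.0, 0.6)
no_delay = simulate_incidence(2.0, 0.6, delay=0, start=1.0, steps=60)
delayed = simulate_incidence(2.0, 0.6, delay=1, start=1.0, steps=60)
print(round(eq, 3), round(max(no_delay), 3), round(max(delayed), 3))
```

In this toy setting, immediate adaptation lets incidence approach the equilibrium from below, while a delay of a single generation makes it overshoot and oscillate around the same value before settling.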
Individual risk of contracting (or spreading) Covid-19 is a function of behavior and local incidence. If behavior is constant, incidence grows until immunity slows further expansion. Behavioral changes to prevent transmission are costly to individuals and their peers, but decrease incidence and are therefore potentially beneficial to society. Thus, policy interventions can be warranted to share and direct the costs of transmission prevention. As behavior was found to change with local incidence, indirect social and economic consequences of the pandemic can be expected even in the absence of policy interventions. Ultimately, the impact of policy interventions depends on their effect on transmission and their ability to allocate costs of prevention effectively.
Estimating the costs of interventions is beyond the scope of this study, but some remarks regarding effectiveness seem warranted: Public awareness rising was crucial, while little effect of closing shops and public distancing rules was found. Daycare reopening showed no impact and school reopening relatively little. Testing and tracing had a large effect. As many infections occur before or around symptom onset, timely testing and fast turnaround are important. These results support the call for investments in alternative testing technologies [26].
Seasonality of SARS-CoV-2 transmission has been a highly politicized topic [4]. Given the high rate of susceptibility, seasonal effects on SARS-CoV-2 kinetics are limited [2]. While weather was only one of the key determinants of transmission in the study period, its extrapolated effect could be decisive. Importantly, seasonality increases the equilibrium incidence where information stalls growth. As testing and tracing capacities are limited and crucial to prevent transmission (especially to older age groups), higher incidence could lead to an additional increase in growth rates and a disproportionate increase of exposure in high risk groups.

S1 Model
The model comprises three parts. The transmission model denotes the transmission dynamic over time. It accounts for dispersion in secondary transmissions and a probabilistic latent period, but does not include importation (across location or age). The key feature is a time dependent and age-specific instantaneous reproductive number. Infections are unobserved, and have a probabilistic connection to reported cases in the measurement model. The measurement model features a probabilistic incubation period and a probability of symptom development being detected and reported. The effect model describes how covariates (e.g., interventions, weather, information, etc.) influence the instantaneous reproductive number. All other characteristics (e.g. dispersion, latent period distribution, etc.) are assumed to be constant.

S1.1 Transmission
The number of infections in location $l$ and age group $a$ is denoted by $i_t^{l,a}$. Infections $i_t^{l,a}$ lead to additional infections by transmission. The transmission model has two key features: Firstly, the current average growth rate is represented in the instantaneous reproductive number $R_t^{l,a}$. Secondly, the model allows for dispersion (compare [23]) of secondary infections, as overdispersion is hypothesized to be a crucial component of Covid-19 infection dynamics [9,2].
Let $i_{t',t}$ denote the number of infections at time $t$ caused by primary cases infected at time $t'$, where we omit location and age for notational convenience. We have $i_t = \sum_{t' < t} i_{t',t}$. Let $i_{j,t',t}$ for $j = 1, \dots, i_{t'}$ denote the offspring at time $t$ of individual $j$ infected at time $t'$. Let $i_{t',j} = \sum_{t} i_{j,t',t}$ denote the sum of secondary cases of individual $j$. All distributional statements for random variables realizing at time $t$ are meant given $R_t$ and previous infections $i_{t-1}, i_{t-2}, \dots$ if not noted otherwise. Let $D_i$ denote some constant generation time distribution with positive support [7].
Figure S1: Model illustration. All nodes directly connected to new infections $i_t$ are shown. Each infection $i_{t'}$ with $t' < t$ transmits on average to $D_i(t - t') R_t$ new cases, where $D_i$ denotes the generation time distribution. Infections at time $t$ develop symptoms at time $t' > t$ with probability $D_s(t' - t)$ and are reported with probability $r_t$.
The negative binomial distribution of secondary infections can be motivated as a generalization of the Poisson model (as a mixture of Poisson distributions, where the Poisson rate follows a gamma distribution [23]), or as the mechanistic outcome of a linear growth process [11].
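The mixture motivation can be made concrete with a small simulation. The following is a minimal sketch using only the Python standard library: each individual draws a gamma-distributed rate and then a Poisson-distributed number of offspring, and the sample mean and variance are compared with the negative binomial moments. The helper names, the seed, and the values of `R` and `k` are illustrative assumptions.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_offspring(mean, k, rng):
    """Gamma-distributed individual rate, then Poisson-distributed offspring."""
    rate = rng.gammavariate(k, mean / k)  # shape k, scale mean / k
    return sample_poisson(rate, rng)


rng = random.Random(1)
R, k = 2.0, 0.5  # illustrative values, not estimates
draws = [sample_offspring(R, k, rng) for _ in range(50_000)]
m = sum(draws) / len(draws)
v = sum((d - m) ** 2 for d in draws) / (len(draws) - 1)
print(round(m, 2), round(v, 2), R + R ** 2 / k)  # sample mean, sample variance, NB variance
```

The sample variance should lie close to $R + R^2/k$, well above the Poisson variance $R$, which is the overdispersion that the dispersion parameter captures.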
Assumption: The offspring $i_{j,t',t}$ has a negative binomial distribution with mean $R_t D_i(t - t')$ and dispersion $\Psi D_i(t - t')$ and is independent for $t' < t$ and $j \in \{1, \dots, i_{t'}\}$.
The argument is as follows: As $i_{j,t',t}$ for $j = 1, \dots, i_{t'}$ are independent and identically distributed, it follows that
$$i_{t',t} = \sum_{j=1}^{i_{t'}} i_{j,t',t} \sim \mathrm{NB}\big(i_{t'} R_t D_i(t - t'),\ i_{t'} \Psi D_i(t - t')\big).$$
As $i_{t',t}$ for $t' < t$ are independent and have a negative binomial distribution with identical parameter $p_{t',t} = \frac{\Psi}{R_t + \Psi}$, the distribution of the sum of all transmissions at day $t$ is denoted by
$$i_t = \sum_{t' < t} i_{t',t} \sim \mathrm{NB}(R_t L_t,\ \Psi L_t), \qquad L_t = \sum_{t' < t} i_{t'} D_i(t - t'),$$
where $L_t$ denotes the viral load. Note the difference to the widely used parametrization of dispersion in [23], which models the dispersion of the amount of secondary infections without generation time. If the reproductive number is constant, i.e. $R_t = R$, a single infection induces $\sum_{t} i_{j,t',t} \sim \mathrm{NB}(R, \Psi)$ secondary infections, coinciding with the aforementioned standard model of dispersion in the seminal paper [23]. Thus, priors on the dispersion $\Psi$ can be informed by studies working in the standard framework and posterior results can be compared conveniently. Notably, if the instantaneous reproductive number is varying over time, the number of secondary infections constitutes a mixture negative binomial distribution instead as denoted in [14].
If $D_i(1) = 1$ (a common assumption for weekly case counts), the arguments presented here simplify. Specifically, the viral load is $L_t = i_{t-1}$ and the assumption reduces to $i_{j,t-1,t}$ being independent given $R_t$ and having a negative binomial distribution with mean $R_t$ and dispersion $\Psi$. We note the similarity to [32], who derive a negative binomial distribution for the time series SIR model first introduced in [11] from a linear birth process, considering only a single time step, in which case the model presented here would reduce to $i_t \sim \mathrm{NB}(R_t i_{t-1},\ i_{t-1} \Psi)$, which recovers the time series SIR model for $\Psi = 1$. Thus, the model presented here can be seen as a generalization of the time series SIR model with flexible dispersion $\Psi$ and accounting for a generation distribution $D_i$.
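The role of the viral load can be sketched in code. The snippet below assumes that the viral load is the sum of past infections weighted by the generation time distribution, $L_t = \sum_{t' < t} i_{t'} D_i(t - t')$, which is inferred from the special case $L_t = i_{t-1}$ for $D_i(1) = 1$ and should be read as an assumption. Function names and example values are hypothetical.

```python
def viral_load(past_infections, generation_dist):
    """L_t as the sum over lags s >= 1 of i_{t-s} * D_i(s) (assumed definition).

    past_infections: [..., i_{t-2}, i_{t-1}], most recent value last
    generation_dist: [D_i(1), D_i(2), ...], probabilities summing to 1
    """
    load = 0.0
    for lag, weight in enumerate(generation_dist, start=1):
        if lag <= len(past_infections):
            load += past_infections[-lag] * weight
    return load


def nb_parameters(past_infections, generation_dist, r_t, psi):
    """Mean and dispersion of new infections i_t given the infection history."""
    load = viral_load(past_infections, generation_dist)
    return r_t * load, psi * load


history = [5, 8, 13]  # illustrative infection counts
print(nb_parameters(history, [1.0], r_t=1.5, psi=0.4))            # D_i(1) = 1
print(nb_parameters(history, [0.2, 0.5, 0.3], r_t=1.5, psi=0.4))  # spread over three lags
```

With the whole generation time mass on lag one, the parameters reduce to $R_t i_{t-1}$ and $\Psi i_{t-1}$, the single-step case discussed above.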
Consider the alternative assumption generating standard models from the literature: It follows that As $i_{t',t}$ for $t' < t$ are independent and have a negative binomial distribution with identical parameter $p_{t',t} = \frac{\Psi}{R_t + \Psi}$, the distribution of the sum of all transmissions at day $t$ is denoted by If transmission is considered only between subsequent time intervals, we obtain the standard assumption of inference based on negative binomial regression, the endemic/epidemic model introduced in [18], or epidemiological models with random effects (e.g., the model in [12]), i.e. $i_t \sim \mathrm{NB}(R_t i_{t-1}, \Psi)$. A similar point applies to [13], where dispersion is independent of the total count in the death reports.

S1.2 Measurement
As illustrated in Figure S1, it is assumed that an infection at time $t$ leads to symptom onset being reported at time $t'$ with probability $D_s(t' - t)\, r_t$, where $D_s(t' - t)$ denotes the distribution of the incubation period and $r_t$ the likelihood of developing symptoms and being positively tested and reported as a case. The aggregation of all cases $c_t$ constitutes a sum of Bernoulli trials, following a Poisson binomial distribution, which is approximated by a Poisson distribution for computational convenience. Extensions of the model to include additional observable case counts are straightforward. The paper focuses on symptom onset of Covid-19 cases and argues that this provides sharper information on the timing of infections than death counts or hospital admissions. Death counts would require that the infection fatality rate is constant or adequately modelled. Hospital admissions and death counts suffer additionally from lower numbers, especially among younger age groups. Aggregated reported cases (without symptoms) include asymptomatic cases, but require knowledge on the timing from infection to reporting and are subject to larger distortions if the proportion of asymptomatic infections ascertained as cases changes due to the testing regime.
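The measurement step amounts to spreading each day's infections over later onset days and scaling by the reporting rate. The sketch below computes the expected number of reported cases per onset day, which is the mean a Poisson approximation of the Poisson binomial would use. It assumes a constant reporting rate for simplicity; the function name, the incubation distribution, and the infection counts are hypothetical.

```python
def expected_cases(infections, incubation_dist, reporting_rate):
    """Expected reported cases by day of symptom onset.

    infections: [i_0, i_1, ...] daily infections
    incubation_dist: [D_s(1), D_s(2), ...] onset probability s days after infection
    reporting_rate: probability that an infection is reported (held constant here)
    """
    expected = [0.0] * len(infections)
    for day, count in enumerate(infections):
        for lag, prob in enumerate(incubation_dist, start=1):
            onset = day + lag
            if onset < len(expected):
                expected[onset] += count * prob * reporting_rate
    return expected


# 100 infections on day 0, illustrative incubation distribution, reporting rate of 25%
print(expected_cases([100, 0, 0, 0], [0.5, 0.3, 0.2], 0.25))
```

The expected case counts sum to the reporting rate times the number of infections once the full incubation distribution has elapsed.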

S1.3 Effects
The instantaneous reproductive number is modelled as a function of input covariates $x_1, \dots, x_J$, where the effect is assumed to be multiplicative. Each location and age group has an individual basic reproductive number $R_0^{l,a}$. The effect of covariate $x_j$ on the instantaneous reproductive number of age group $a$ is denoted by $\beta_j^a$. Covariates are usually standardized or dummies. A coefficient of 0.5 would signify that an increase from 0 to 1 in $x_j$ increases the instantaneous reproductive number by 50%. A coefficient of −0.5 would signify that the same change in $x_j$ is associated with a 50% reduction in transmission.
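One functional form that matches this reading of the coefficients is a product of factors $(1 + \beta_j^a x_j)$ applied to the basic reproductive number. The sketch below uses that form as an assumption, since it reproduces the stated interpretation (a coefficient of 0.5 raises the reproductive number by 50% when the covariate moves from 0 to 1). The function name and the example values are hypothetical.

```python
def reproductive_number(r0, coefficients, covariates):
    """Multiplicative effect model, assumed form: R = R0 * prod_j (1 + beta_j * x_j)."""
    r_t = r0
    for beta, x in zip(coefficients, covariates):
        r_t *= 1.0 + beta * x
    return r_t


r0 = 2.0  # illustrative basic reproductive number
print(reproductive_number(r0, [0.5], [1]))              # one covariate switched on, +50%
print(reproductive_number(r0, [-0.5], [1]))             # one covariate switched on, -50%
print(reproductive_number(r0, [-0.5, 0.2], [1, 1]))     # two effects combine multiplicatively
```

Because effects multiply, the combined impact of several covariates is the product of their individual factors rather than the sum of their coefficients.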

S2 Data
The study combines information from three main sources. Case reports are obtained from the Robert Koch Institute (RKI), weather data from the German weather service, and policy interventions were specifically catalogued for this study. The data is available in an accompanying R-package.
Figure S2: Ratio of asymptomatic cases by age group. Ratio of cases reported without symptom onset. This may include asymptomatic cases, presymptomatic cases that were not followed up on, or symptomatic cases where symptom onset was not reported.
February. It has been argued that Germany had relatively large testing capacities in the early phase of the pandemic [33]. This is also supported by excess death data [30], which provides no evidence for undetected Covid-related deaths. The data was obtained by the RKI, the research institute responsible for disease control and prevention that is subordinate to the Federal Ministry of Health in Germany. Laboratories are required by law to report positive test results within 24 hours. The data is gathered by local health departments (Gesundheitsamt) responsible for collecting reports on notifiable diseases (diseases required by law to be reported to government authorities) and subsequently passed along to the RKI. The health departments are organized on a subregional level (county or "Landkreis").
The data includes case-specific information on date of symptom onset, date of reporting to the health department, age bracket, county, and death. More detailed information (occupation, likely transmission environment, etc.) is provided to the RKI, but is not publicly available [26].
Not all cases reported a date of symptom onset. Generally, entries are updated after the first date of reporting, which suggests that symptom onset occurring after testing is also recorded. The remaining cases without symptom onset are either asymptomatic cases, false-positive tests, or cases where symptoms developed but were not consistently reported. Interestingly, the ratio of asymptomatic cases is age-dependent, as illustrated in Figure S2. This suggests that the likelihood of developing symptoms differs across age groups, with the age group 35 to 59 years being most likely to exhibit symptoms. An age-dependent likelihood of developing symptoms provides another argument for modelling the growth of Covid-19 with age compartments, in order to disentangle a changing age distribution from changing incidence.

Figure S3 shows the 7-day incidence of symptomatic cases across age groups (population data was obtained from the Regional Database [10]). Notably, the reported cases were first dominated by the middle age groups, overtaken by the elderly in April, and more recently driven mainly by the age group 15 to 34 years.

Note that the age groups 0-4 and 5-14 were not included in the main model, as relatively few cases were observed in the study period until May 2020. Further, the low infection fatality rate makes an assessment of the reporting rate by the case fatality rate (see Figure 2 in the main text) more challenging. A large-scale serological study in the state of Bavaria (Bayern) found that children were 6 times more likely to be sero-positive than expected from case reports [19], which suggests that their reporting rate was about half as large as that of adults.

Figure S4 shows growth rates computed on a 7-day window to account for weekday effects. The left column is based on reporting dates, the right column on symptom onset.

S2.1.1 Outbreak in Heinsberg
The first major outbreak recorded was in the municipality of Gangelt in the county of Heinsberg [31]. Indeed, 47% of the 734 cases that were officially recorded with symptom onset in February 2020 were reported in this county. Figure S5 illustrates the advantage of using symptom onset to judge the timing of infections. We observe a clear pattern of constant growth rates of new cases with symptom onset until February 26. With an incubation period of 4-7 days, this suggests that reproductive numbers were high during carnival, which is celebrated most intensely between Thursday (February 20) and Saturday (February 22). On February 26, local authorities became aware of the first cases and closed schools and daycares. In the following days, a large number of people were quarantined. The figure illustrates that the reporting date shows a significant lag and is not appropriate for connecting infections to specific circumstances.
As the situation in Heinsberg was extraordinary, with a large media exposure and drastic adaptations beyond the officially recorded measures, the county is excluded from the main analysis.

S2.2 The variance of growth rates
In the model section of the main text, where generation time is assumed to be 1, it is argued that the realized growth rate $g_t = i_t / i_{t-1}$ has a variance of

$$\sigma^2_{g_t} = \frac{R_t(\Psi + R_t)}{\Psi\, i_{t-1}}.$$

This point is investigated in the data. Weekly symptomatic cases were computed such that the assumption of a generation time and incubation period of 1 seems reasonable and weekday effects are accounted for. First, we estimate the variance for each value of $i_{t-1}$ in the data by its empirical analogue. Figure S6 illustrates the variance estimates and their fit with the model derived in the main text.
The finding can also be used to estimate the dispersion over time. In particular, it follows directly that dispersion is a function of the variance of growth rates:

$$\Psi = \frac{R_t^2}{\sigma^2_{g_t}\, i_{t-1} - R_t}.$$

We evaluate the development of dispersion over time using the equation above. $R_t$ is assumed to be constant within a state and is estimated by the average growth rate across counties. As estimator for $\sigma^2_{g_t}$ we use its sample analogue within state and month. In other words, the average number of secondary transmissions is assumed to be constant within a state, and the variance of growth rates across counties within a state is used to estimate the dispersion parameter. Results can be seen in Figure S7. Without accounting for specific effects of interventions and covariates, the dispersion $\Psi$ is estimated to be around 0.25 in the first months, lower in June, and increasing in summer. Those estimates are slightly below the one obtained in the full model, where the average across all age groups for cases until mid-May is consistent with a dispersion parameter of 0.47.
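The moment relation between the variance of growth rates and dispersion can be checked numerically. The sketch below assumes a negative binomial offspring distribution with mean R and dispersion Ψ and a generation time of 1, so that the variance of the growth rate given i_{t-1} previous infections is R(Ψ + R)/(Ψ i_{t-1}). Function names are hypothetical and the code is not taken from the replication package.

```python
import math


def growth_rate_variance(r, psi, i_prev):
    """Variance of g_t = i_t / i_{t-1} under negative binomial offspring counts."""
    return r * (psi + r) / (psi * i_prev)


def dispersion_from_variance(r, var_g, i_prev):
    """Invert the variance formula to recover the dispersion parameter psi."""
    excess = var_g * i_prev - r   # variance beyond the Poisson level
    if excess <= 0:
        return math.inf           # no overdispersion detectable
    return r ** 2 / excess


variance = growth_rate_variance(r=1.2, psi=0.25, i_prev=40)
recovered = dispersion_from_variance(r=1.2, var_g=variance, i_prev=40)
```

A variance at or below the Poisson level R/i_{t-1} yields an infinite estimate, which corresponds to the absence of individual variation in transmission.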

S2.3 Cumulative incidence
Several explanatory variables were constructed based on the case data provided by the RKI. This includes the cumulative incidence, which was computed for each location (ignoring age) and is assumed to impact the instantaneous reproductive number two weeks later. The lag was introduced to distinguish local saturation and information effects from immunity in the general population.

S2.4 Information on incidence
Additionally, the historic case data based on reporting date (instead of symptom onset) allow the reconstruction of the publicly available information about the county-specific case load. Reported cases are assumed to influence behaviour the following day. The logarithm of the county-specific publicly known 7-day incidence is used to represent this information about the local risk of infection. Specifically, the transformation $\log_{10}(1 + \text{cases}/\text{pop} \times 10^5)$ was used, where cases denotes the cases accumulated over 7 days and pop the population of the location at hand. This variable takes the value 0 if no infections were reported in a week.
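As a small illustration, the transformation can be written as a function. The function and argument names are hypothetical.

```python
import math


def log_incidence(cases_7d, population):
    """log10(1 + 7-day cases per 100,000 inhabitants)."""
    per_100k = cases_7d * 1e5 / population
    return math.log10(1.0 + per_100k)


no_cases = log_incidence(0, 250_000)       # no reported infections in the week
nine_per_100k = log_incidence(9, 100_000)  # 9 cases per 100,000 inhabitants
```

Adding 1 inside the logarithm keeps the variable defined for weeks without reported cases; an incidence of 9 per 100,000 maps to a value of 1.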

S2.5 Ratio of traced infectious
Previous modelling studies suggested that delays between symptom onset and confirmation are important factors in outbreak control [1]. For each day, the data allow the construction of the ratio of local cases that have been reported to the health department among the potentially currently infectious cases. I refer to this ratio as the ratio of traced infectious. Individuals are assumed to be infectious from one day before until 6 days after symptom onset, a window chosen to avoid weekday effects. This assumption is in line with virus shedding [17] and contact tracing [6] studies. As the health department is responsible for contact tracing, the ratio of traced infectious may hold substantive information about the immediacy of contact tracing and the speed of testing. Importantly, the ratio is computed based on the reported cases. The ratio of traced infectious among all infections arises after dividing by the reporting rate. This has implications for the interpretation of the effect estimate of the ratio of traced infectious. The effect estimate of testing and tracing on an individual primary case arises after multiplying with the inverse of the reporting rate, as illustrated in Section S4.4.
The development of the ratio of traced infectious over time is shown in Figure S8. In March, the average ratio increased from 0 to 30%. By the end of April the ratio reached its peak at 50% before staying mostly constant for the remaining time. Importantly, the regional variation illustrated by the 80%-confidence intervals is substantial.
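The construction of the ratio of traced infectious can be sketched as follows. This is an illustration under assumptions: a case counts as traced on a given day if its reporting date is on or before that day, the input is a list of (symptom onset, reporting date) pairs for one county, and all names are hypothetical.

```python
from datetime import date, timedelta


def ratio_traced_infectious(cases, day):
    """Share of currently infectious reported cases already known to the health department.

    cases -- list of (onset_date, report_date) tuples for one location
    day   -- the day for which the ratio is computed
    """
    infectious = 0
    traced = 0
    for onset, reported in cases:
        # infectious from one day before until six days after symptom onset
        if onset - timedelta(days=1) <= day <= onset + timedelta(days=6):
            infectious += 1
            if reported <= day:   # assumption: reported on or before `day` counts as traced
                traced += 1
    return traced / infectious if infectious else None


county_cases = [
    (date(2020, 4, 1), date(2020, 4, 3)),    # reported two days after onset
    (date(2020, 4, 2), date(2020, 4, 10)),   # reported after the infectious window
]
ratio = ratio_traced_infectious(county_cases, date(2020, 4, 4))
```

On April 4 both example cases fall in the infectious window, but only the first has been reported, so the ratio is one half. Days without infectious cases return `None` rather than a ratio.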

S2.6 Weather
Daily location-specific weather data were obtained from the Climate Data Center (CDC) of the German Weather Service (Deutscher Wetterdienst, DWD). For each county, the temperature and relative humidity from any weather station within 50 kilometers were considered and daily averages were computed. Figure S9 plots example time series of weather covariates for two regions of Germany to illustrate the substantial variation in daily weather. Figure S20 illustrates the covariance structure between covariates and shows that relative humidity and average temperature exhibit an empirical correlation of −0.32, which suggests that there is sufficient independent variation in the two weather variables to distinguish their associations with the instantaneous reproductive number.
A summary of all real-valued covariates can be found in Table S1. The variation of each covariate that is not associated with location or time can be considered most valuable for a ro-

S2.7 Interventions
Policy interventions were specifically catalogued for this study. The full data set with references can be accessed online. A descriptive summary of the most important interventions can be found in Table S2. Many intervention effect estimates rely almost entirely on variation across time. A description of each intervention can be found in Table S3.  The responsibility for public health interventions lies mostly on the state level (Bundesland). Some policy measures (like testing regime) were decided on a national level. Many were implemented simultaneously after state leaders coordinated their response. A few counties (sub-regional level) deviated by imposing additional restrictions (e.g. earlier mandate for masks in public).
Information on the timing of the most important interventions is given as a timeline in Figure S10 and further details can be found in Table S2. Full information on the timing of enactment in different locations is provided in Figures S23 and S24.
All interventions are coded as active from the point at which they can be assumed to have an impact. For example, the speeches given on March 12 are coded as active starting March 13. If the closing of schools was announced as entering into force immediately, it was coded as active from the day after.
Closing of schools, daycares, shops, and sports is still coded as active when the respective reopening takes place. Thus, estimates for the reopening can be directly understood as the effect of reopening. One exception is the stay-at-home order, which was lifted after a relatively short amount of time and was not implemented in all states.
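The coding rule can be sketched with step dummies. Apart from the March 12 speeches mentioned in the text, the dates below are hypothetical and only illustrate the scheme; the function name is likewise hypothetical.

```python
from datetime import date, timedelta


def step_dummy(days, first_active_day):
    """1 from `first_active_day` onwards, 0 before."""
    return [1 if day >= first_active_day else 0 for day in days]


days = [date(2020, 3, 10) + timedelta(days=k) for k in range(6)]  # March 10 to 15

# Speeches given on March 12 are coded as active from March 13.
speech = step_dummy(days, date(2020, 3, 12) + timedelta(days=1))

# Hypothetical dates: a closure and its later reopening enter as two step dummies,
# and the closure dummy stays at 1 after the reopening.
closure = step_dummy(days, date(2020, 3, 11))
reopening = step_dummy(days, date(2020, 3, 14))
```

Because the closure dummy is never switched off, the coefficient of the reopening dummy measures the change relative to the closed state.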

S3 Prior choice
Priors were chosen with the goal of enabling identification through information in the literature, while allowing for adaptation in the context of the study. For effects, dispersion, and initial conditions, weakly informative priors were chosen. Table S4 lists all prior choices. The generation time distribution $D_i$ and the incubation period distribution $D_s$ are assumed to have a gamma shape with a standard deviation of 2 and a mean that has a normal prior with a mean of 5.5 days and a standard deviation of 0.1. For a review on the incubation period see [25].
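To make the gamma assumption concrete, the sketch below converts a mean and standard deviation into shape and rate parameters and evaluates daily weights. The discretization (density evaluated at whole days and renormalized) is an assumption for illustration; the text does not specify how the model discretizes the distributions, and the function names are hypothetical.

```python
import math


def gamma_shape_rate(mean, sd):
    """Moment matching: shape = (mean / sd)^2, rate = mean / sd^2."""
    return (mean / sd) ** 2, mean / sd ** 2


def gamma_pdf(x, shape, rate):
    """Gamma density, computed on the log scale for numerical stability."""
    if x <= 0:
        return 0.0
    log_density = (shape * math.log(rate) + (shape - 1) * math.log(x)
                   - rate * x - math.lgamma(shape))
    return math.exp(log_density)


def daily_weights(mean, sd, max_days):
    """Density at days 1..max_days, renormalized to sum to one (assumed discretization)."""
    shape, rate = gamma_shape_rate(mean, sd)
    density = [gamma_pdf(d, shape, rate) for d in range(1, max_days + 1)]
    total = sum(density)
    return [value / total for value in density]


shape, rate = gamma_shape_rate(5.5, 2.0)
weights = daily_weights(5.5, 2.0, 30)
```

With a mean of 5.5 days and a standard deviation of 2, moment matching gives a shape of 7.5625 and a rate of 1.375.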
Infections are initialized when the first symptomatic case was reported. The six previous days have a prior for initial infections that is exponentially distributed with mean $\mu_{\text{init}}/6$, where $\mu_{\text{init}}$ has a positive normal prior with mean 4 (mirroring the reporting rate of 0.25) and standard deviation 4.
Effects of interventions and covariates are equipped with a normal prior with mean 0 and standard deviation 0.2. Additionally, the model has an error term with standard deviation 0.1 to prevent misspecification of $R_t^{l,a}$ from being attributed to individual dispersion. Results without this error term are largely robust (not shown here).
| parameter | description | prior or parameter choice |
|---|---|---|
| $\beta_j^a$ | effect of covariate $j$ on age group $a$ | $N(0, 0.2)$ |
| $\beta_{t,l}^a$ | multiplicative error term for $R_t^a$ at location $l$ | $N(0, 0.1)$ |
| $r_t$ | reporting rate | 0.25 |
| $D_s$ | incubation period distribution | Gamma shaped $\Gamma(\mu_s, \sigma_s)$ |
| $\mu_i$ | mean generation time | $N(5.5, 0.1)$ |
| $\mu_s$ | mean incubation period | $N(5.5, 0.1)$ |
| $\sigma_i$ | standard deviation generation time | 2 |
| $\sigma_s$ | standard deviation incubation period | 2 |
| $\Psi^a$ | dispersion parameter for age group $a$ | $N^+(0, 5)$ |
| $\mu_{\text{init}}$ | expected initial infections | $N^+(4, 4)$ |
| $d_{\text{init}}$ | number of days for initial infections | 6 |
| $i_t$ | initial infections for $t = t_0, \ldots, t_0 + d_{\text{init}}$ | exponential with mean $\mu_{\text{init}} / d_{\text{init}}$ |

Table S4: Parameter and prior choices. $N(\mu, \sigma)$ denotes a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$. $N^+(\mu, \sigma)$ denotes the respective half-normal distribution.

S4.1 Implementation of MCMC
The data was prepared and the results were analyzed with the statistical software R 3.6.3 [28]. The MCMC sampler was constructed with JAGS 4.3 [27]. The burn-in phase comprised 10,000 iterations. A further 10,000 iterations were sampled and then thinned to 1,000 draws for inference. The maximum Rhat value among the monitored variables (excluding the latent infection process for computational reasons) was 1.16, and visual diagnostics were used to assess convergence. Replication code is available online. 6
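For readers unfamiliar with the diagnostic, the sketch below shows thinning and a basic Gelman-Rubin statistic. It assumes that several chains of equal length were run, which the text does not state, and it implements the textbook (non-split) version; the value reported by the software used in the study may be computed with additional corrections. Function names are hypothetical.

```python
from statistics import mean, variance


def thin(draws, keep):
    """Keep `keep` evenly spaced draws, e.g. every 10th of 10,000."""
    step = len(draws) // keep
    return draws[::step][:keep]


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one parameter.

    chains -- list of equally long lists of draws, one list per chain
    """
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])   # W: average within-chain variance
    between = n * variance(chain_means)                    # B: between-chain variance
    pooled = (n - 1) / n * within + between / n            # pooled variance estimate
    return (pooled / within) ** 0.5


thinned = thin(list(range(10_000)), 1_000)
agreeing = gelman_rubin([[0, 1, 2, 3], [0, 1, 2, 3]])
disagreeing = gelman_rubin([[0, 1, 2, 3], [10, 11, 12, 13]])
```

Values close to 1 indicate that the chains agree; chains centred at different values push the statistic well above 1.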

S4.2 Transmission
Results for incubation period and generation time are illustrated in Figure S11. Notably, symptom onset is self-reported and therefore prone to biases. In particular, patients are incentivized to report symptoms to increase their likelihood of receiving a test and full health insurance coverage for the associated expenses. Further, it can be assumed that symptom onset is often elicited after the positive test and that unrelated symptoms are recorded as first symptoms, which would lead to an overestimation of the incubation period.
The basic reproductive number is estimated separately for each location and age compartment. Table S5 shows a meta regression that explains variation in the basic reproductive number, the timing of the first symptomatic case, and the number of infections before the first case became symptomatic.
The age-averaged basic reproductive number (in the absence of interventions and under average macro conditions) is 2.53. In general, the explanatory variables account for very little of its variation (R-squared of 0.08). As expected, population density has a small positive effect. The ratio of 15-34 year olds has a small negative effect, which might indicate the reduced susceptibility of younger age groups. The timing of initial (detected) exposure varies more strongly, where early initial infections are explained by high population density and a large ratio of young inhabitants. The accumulated effect of one standard deviation in population density and age groups amounts to about 6-7 days earlier initial infection. Given the basic reproductive number in winter, this accumulates to about 3-4 times as much initial exposure. Finally, the number of infections already ongoing when the first case develops symptoms is on average 4.6 (driven by the assumption that the reporting rate $r_t$ is 0.25), and no significant covariates were detected.
No evidence for a significant difference for Eastern Germany can be found, which suggests that the stark difference in incidence between Eastern and Western Germany is mostly driven by initial exposure and potentially population density.

S4.3 Policy interventions
In the following, the age-specific effect estimates shown in Figure S12 are discussed. For additional details on interventions see Table S3. All effect estimates should be understood as average changes in transmission associated with the situation the interventions were implemented in. It is open to discussion whether the same effect can be expected to manifest itself under different circumstances.
Average effects of covariates are shown in the main text in Figure 3 and are based on the German age distribution. As age-specific effects for any single intervention were uncorrelated, the marginal effects are estimated more sharply than the age-specific effects. In the following, the most important differences in age are discussed.
Narrow testing, which restricted the availability of tests to risk groups, health care workers, and individuals exposed to a confirmed case, increased transmission to younger age groups and reduced transmission to older age groups. The return to symptomatic testing at the end of April mitigated those differences.
Holidays show some evidence for increased transmission in younger age groups, but decreased transmission in the age group above 80 years.
The closing of schools and daycares was associated with a reduction in all age groups, except above 80 years. Notably, there is little evidence in the age group 15-34. The lack of evidence for children below 15 years complicates drawing any decisive conclusions.
Limiting sport activities is associated with a decrease in transmission for the age group 15-34. Surprisingly, an even larger effect is found for the age group 80+, which might be due to misspecification of the model. Elderly care was subject to substantial changes during the pandemic, and making those changes available as a data set is beyond the scope of this study, but should be considered for future work.
Restaurant closure is associated with a moderate decrease in transmission, where older age groups show no evidence for changes. Bar closures were mostly applied in parallel. The effect estimate is mostly driven by the small implementation differences in mid-March, where bar closure arguably had very little potential to reduce bar attendance.
Closing and limiting of events shows no evidence for reducing transmission. In fact, for the first recommendations the association is positive, which might stem from the fact that early transmission was mostly due to importation from high-risk areas, which is subject to different dynamics. The other policy interventions were implemented in an environment where case dynamics were mostly driven by local infections.

Table S5: Meta regression of locations. A regression of the mean of the basic reproductive number $R_0$, the timing of the first symptomatic case (days after 2020-02-15), and the number of infected when the first case developed symptoms. Explanatory variables are population density, dummies for rural county and Eastern Germany (including Berlin), and the ratio of different age groups. Continuous covariates are standardized. Basic reproductive number and number of initial infections are weighted averages over age-specific results.

Figure S12: Changes in transmission. The plot depicts average effects and 95% confidence intervals for different age groups for all covariates excluding fixed effects. The grey shade denotes the respective confidence interval of the prior. Descriptions use abbreviations for closed (cl.), limited (lim.), recommended (rec.), and cumulative incidence (cum. inc.).
The opening of schools and daycares and the allowing of religious gatherings are associated with little to no increase in transmission. A small increase in transmission associated with school openings can be detected for the age groups 35-79. Notably, all openings took place under safety concepts adapted to the risk of SARS-CoV-2 transmission.
The first speeches of the German president and health minister asking for major behavioural changes are associated with a strong reduction in transmission for all age groups except the age group 80+. The second major speech, by chancellor Angela Merkel, shows no effect on average, but is associated with some increase in the younger population. It should be noted that other interventions that are not in the data might factor into the reduction associated with the first speech. This includes a reduction of importation of cases from international travel, as the RKI designated a number of European areas as risk areas and quarantine rules for returning tourists with symptoms were put in place.
The closing of shops is associated with an increase in transmission for younger age groups. Relatedly, mandatory distance in public spaces is associated with an increase in younger age groups, which suggests that the reduction of public interactions might have been substituted with private interaction that was subject to a higher risk of transmission. The possibility should be considered that younger individuals in particular, who face less individual risk, are prone to increasing transmission when public interaction is substituted by private interaction. In line with this thought, a stay-at-home order (with exceptions including individual sport and work) reduces transmission for all age groups.

S4.4 Testing and Tracing
The ratio of traced cases has a strong impact on all age groups, which is especially strong for older age groups. This suggests that tracing reduces transmission to older, more vulnerable groups. Naturally, this variable conflates two different effects that cannot be distinguished: individual behavioural changes due to the positive test, and the effect of contact tracing as administered by the health department.
Further, the ratio only measures the traced cases among reported cases, not among all infections. If the effect is assumed to be the same among the individuals not reported, the effect that testing and tracing has at the individual level can be extrapolated by dividing the effect estimate by the reporting rate. This allows inference of the relative reduction in transmission for an infectious individual through testing and tracing. It should be noted that most cases are reported between 2 and 6 days after symptom onset, as shown in Figure S13. The effect estimate of tracing is therefore driven mostly by this region. It may be possible that earlier testing and tracing is disproportionately more (or less) effective.

Figure S13: Average and 80% confidence bands of the delay between symptom onset and day of reporting. Delay is measured in days until first notice of the local health department. The horizontal lines denote the critical phase from one day before until 6 days after symptom onset, which is hypothesized to be the days of highest infectiousness.

S4.5 Information and cumulative incidence
Publicly reported local incidence decreased transmission. The effect is sharply estimated and stronger for younger age groups.
Cumulative incidence is associated with a stronger reduction for old age groups. Cumulative incidence is given in percentage points. Thus, under random mixing, full immunity, and the absence of underreporting an effect of 1% should be expected.

S4.6 Weather
High average temperature reduced transmission. This effect is consistent across age groups. In the model period, average temperature ranged from −2 °C to 18 °C (0.95-CI). Relative humidity is significant on average and for the age group 60-79. The potential role of relative humidity in the transmission dynamics of SARS-CoV-2 is discussed in [3]. The results here suggest a minor role of relative humidity in Germany. External validity for the effect of relative humidity beyond the range observed here (0.95-CI: [43%, 90%]) is in doubt.

S4.7 Robustness check with more recent data
As a robustness check the model was applied to data from May to September 2020. The instantaneous reproductive number R t is modelled as a function of weather variables and of the ratio of traced infectious and local information as before. As there is not sufficient data on regional interventions at this point in time, those were substituted by week fixed effects. The model also contains weekday fixed effects and noise terms as before.
In this model, the age group 5-14 years was included, as schools were partly open and the ratio of younger age groups among the cases increased compared to the low incidence in the first surge of cases in March and April.
Similar to the main study, the mean incubation period was found to be slightly less than the mean generation time, with 4.6 and 6.6 days respectively. The age-specific results can be found in Table S6.

Table S6: Estimated basic reproductive number $R_0$, dispersion $\Psi$, respective ratio of primary infections actually infecting secondary infections, and ratio of secondary infections from the 20% most infecting primary cases for different age groups. Results based on data from May to August 2020. The ratios of secondary infections were computed assuming a constant reproductive number of 1.
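The two ratios described in the caption of Table S6 can be derived from a reproductive number and a dispersion parameter. The sketch below assumes a negative binomial distribution of secondary infections with mean R and dispersion Ψ; the function names and the truncation point `k_max` are hypothetical, and the numbers in Table S6 are not reproduced here.

```python
import math


def nb_pmf(k, r, psi):
    """P(k secondary infections) for a negative binomial with mean r and dispersion psi."""
    log_p = (math.lgamma(k + psi) - math.lgamma(psi) - math.lgamma(k + 1)
             + psi * math.log(psi / (psi + r))
             + k * math.log(r / (psi + r)))
    return math.exp(log_p)


def share_infecting_anyone(r, psi):
    """Ratio of primary cases that cause at least one secondary infection."""
    return 1.0 - nb_pmf(0, r, psi)


def share_from_top_cases(r, psi, top=0.2, k_max=2000):
    """Ratio of secondary infections caused by the `top` most infecting primary cases."""
    remaining = top      # fraction of primary cases still to be allocated
    infections = 0.0
    # walk from the most to the least infectious cases; mass above k_max is ignored
    for k in range(k_max, 0, -1):
        p = nb_pmf(k, r, psi)
        take = min(p, remaining)
        infections += take * k
        remaining -= take
        if remaining <= 0:
            break
    return infections / r


anyone = share_infecting_anyone(1.0, 0.47)
concentrated = share_from_top_cases(1.0, 0.1)    # strong superspreading
homogeneous = share_from_top_cases(1.0, 10.0)    # little individual variation
```

Smaller values of Ψ concentrate transmission in fewer primary cases, so the share attributable to the top 20% rises as Ψ falls.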
The effect estimates for the covariates can be seen in Figure S14. The results are largely consistent with the main study. Testing and tracing is associated with a strong reduction in transmission. The effect is strongest for the oldest and the youngest age groups. This effect is slightly stronger than in the main study, which might indicate that the reporting rate is higher, as this would make the ratio of traced cases a better proxy for the ratio of traced infections.
Information on local incidence is again found to be strongly correlated with a reduction in transmission. As before, this effect is mostly found among younger age groups. The age group 80+ shows no evidence for a successful adaptation to the current risk of infection.
Average temperature shows a similar pattern for the age groups 15-34 and 35-59. For the other age groups no significant effect of weather can be detected. Arguably, transmission to older and younger age groups is mostly driven by the middle age groups in professional or household settings, which would explain the pattern. The predicted seasonal effect of weather (Figure S15) is smaller than in the main study, which suggests non-linear effects of temperature and relative humidity.
Holidays vary strongly from state to state in Germany. Interestingly, there is a strong positive effect on school children below the age of 15, which suggests that school is a less risky environment for transmission than holidays. It should be noted that free tests were available at the border for German citizens returning from international vacation, which might lead to a higher reporting rate for school children during holidays and an upward bias in the effect of holidays.

Figure S14: Effects on reproductive number for the model applied to data from May to August 2020 (axis: intervention/covariate). The plot depicts average effects and 95% confidence intervals for different age groups for all covariates excluding fixed effects. The shaded area indicates the 95% confidence intervals of the prior distribution.

Figure S15: Total effect of testing and tracing (ratio of traced infectious), information (logarithm of reported local incidence), and season (average temperature and relative humidity). Estimates based on age group 15-59. 95% confidence bands are shown. Panels A and B show the total effect given the data. Panel C extrapolates the total effects of weather variables in an out-of-sample prediction based on average daily weather in the past three years, where confidence bands represent uncertainty in effect estimation, and results are smoothed with a 14-day rolling average.

S5 Assumptions
In the following, the main assumptions of the model are listed.

S5.1 Reporting rate
The key assumption to identify the reproductive number from report data is a correctly modelled probability of reporting $r_t^{l,a}$. In the main specification the reporting rate $r_t^{l,a}$ is assumed to be constant over time.
Previous studies argued that deaths are more reliable than case data [13]. One limitation of death data is that identification of growth rates relies on a constant fatality rate, which is violated if age groups are not taken into account. As shown in Table S7 for German data, 27% of cases over 80 years died, compared with only 0.03% of cases under 35 years. If age groups are affected differently by interventions, relying on death data could induce strong biases. Effects on younger age groups are essentially undetectable. Further, improved hospital care is likely to have reduced the infection fatality rate over time. Figure S16 shows the symptomatic case fatality rate. The standard case fatality rate includes asymptomatic cases and requires adjustment by time, as cases are often reported earlier than deaths. The RKI case data allows the fatality rate to be computed by symptom onset. The symptomatic case fatality rate is less prone to changes in testing and suggests that the infection fatality rate decreased over time.
Naturally, using case data is subject to a similar critique, as detection rates of infections may change over time. One advantage of using symptom onset instead of reporting date to aggregate case data is that changing detection rates of asymptomatic cases do not impact the results.
The number of executed tests in Germany is published weekly by the RKI [29]. Consistent reporting of those numbers started in mid-March. Between mid-March and mid-June there was only moderate variation, with between 327 and 431 thousand tests each week. Early test data is not available, and case data may suffer from significant changes in the testing regime, which in turn may impact effect estimates for early interventions. In summer 2020 the test numbers increased further and reached a million each week in August. Figure S17 shows the number of tests necessary to find one symptomatic case. Notably, this number increased in spring (where test numbers stayed largely constant) and stayed constant in summer (where test numbers increased). It was hypothesized that interventions reducing the dose of viral inoculum, e.g. face masks, increase the likelihood of asymptomatic cases [15]. Focusing on symptomatic cases captures this effect. If asymptomatic cases are equally (or more) likely to spread the virus, this could lead to a higher rate of transmission in the second generation that would not be captured by the model.
As can be seen in Figure S18, the ratio of reported cases without symptom onset increased over time. If the likelihood of developing (and reporting) symptoms remains constant over time, this suggests that more asymptomatic cases were found over time. In this situation, relying on reported cases instead of symptomatic cases might bias inference on growth rates.

Figure S17: Average tests in a week for one reported case with symptom onset in that week (vertical axis: tests for one symptomatic case). Weekly number of tests as reported by the RKI and obtained from [24].

S5.2 Absence of importation
The model proposed here ignores importation. Naturally, transmission between locations and age groups occurs. Arguably, the obtained instantaneous reproductive number $R_t^{l,a}$ should only be read as a reduced-form summary of the current growth rate. Extensions of the model incorporating importation across compartments are straightforward but require strong additional assumptions for the identification of reproductive numbers. Consider a transmission model in which $R_t^{l,a,a'}$ denotes the reproductive number from age group $a$ to age group $a'$. If detection rates are heterogeneous, identification of $R_t^{l,a,a'}$ requires that the ratio of detection rates between age groups is known. The implementation of such approaches would require reliable prevalence data that allow identification of the detection rates of the German reporting system for PCR-positive tests, stratified by age and location.

Figure S18: Ratio of asymptomatic cases over time.

The data provides mixed evidence for transmission dynamics between age groups. Figure S19 shows 7-day incidence based on symptom onset for the six regions with the highest incidence in Germany. The data is standardized to allow for different detection rates within age and location. While the changes across age groups suggest that the middle age groups infected the elderly and children in the high-incidence phase in March and April (distancing rules active and schools closed), the pattern becomes unclear in later stages during summer.

S5.3 Homogeneity
Under the assumption of random mixing, i.e. each infected individual is equally likely to infect any other member of the population, the instantaneous reproductive number $R^{l,a}_t$ can be interpreted as a random draw from the current transmission situation in the population. In a more realistic setting, transmission is heterogeneous. A contact tracing study from Hong Kong finds, for example, that infections in social settings were associated with more secondary cases compared to household infections [2].
As transmissions are more likely to occur within a cluster (household, workplace, location, ethnicity, social class, etc.), $R^{l,a}_t$ can be expected to exhibit autocorrelation in its error when read as a description of the current average transmission dynamic in the entire population. For the same reason, local immunity acquired within a cluster of infections (e.g., within a household) may lead to a temporary underestimate of the expected transmission dynamic in the entire population, which will govern the process when infections are more equally distributed. Such heterogeneity in transmission, as modelled for example via a spatial [20] or a network structure [8], can lead to underestimating the effect of interventions that have a stronger effect on inter-cluster/long-distance transmission. The reduced transmissions by the second generation of infections occur later and would not be attributed correctly to the initial intervention.
Figure S19: Standardized incidence heatmap for age groups over time. Incidence is standardized within age and location compartment.
Similarly, interventions might become more or less effective over time. In particular, the weather variables, which are identified based on daily variation in weather, are subject to the critique that long-term effects might differ substantially from short-term effects.
Local saturation (e.g. in households or local communities) through immunisation can also play a role [16]. The usage of a high number of subregional compartments can only partly control for that.

S5.4 Unobservables
Ultimately any transmission dynamics can be attributed to individual behaviour (potentially in interaction with external factors). The covariates considered here to explain the instantaneous reproductive number may omit other shared drivers of individual behaviour. One additional factor that may drive the spread of SARS-CoV-2 is the prevalence of new variants of the virus [21].
As a summary for the reader, Figure S20 illustrates the correlation matrix of the main covariates and Figure S21 the correlation matrix of the effect estimates of the main covariates.

S5.5 Interaction effects
It is assumed that each covariate increases or decreases a share of the secondary transmissions on a particular day. This ignores interaction effects, which are likely to be of high importance for many interventions. Effect estimates for policy interventions should be understood as estimating the association between the intervention and transmission under the circumstances in which it was implemented. External validity to other circumstances is subject to discussion.
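To illustrate what the absence of interaction effects implies, the following minimal sketch assumes a purely multiplicative form in which each active covariate scales the reproductive number by a fixed share. The functional form, the function name `reproduction_number`, the covariate names, and all numbers are hypothetical and are not taken from the estimated model.

```python
def reproduction_number(base_r, effects, active):
    """Reproductive number when each active covariate scales it by (1 + share).

    base_r  : reproductive number with no covariate active (hypothetical value)
    effects : dict mapping covariate name -> share (negative = reduction)
    active  : dict mapping covariate name -> bool
    """
    r = base_r
    for name, share in effects.items():
        if active.get(name, False):
            # No interaction term: the scaling does not depend on other covariates.
            r *= 1.0 + share
    return r


# Hypothetical effect sizes, for illustration only.
effects = {"intervention_a": -0.20, "intervention_b": -0.10}

r_none = reproduction_number(1.5, effects, {})
r_a = reproduction_number(1.5, effects, {"intervention_a": True})
r_b = reproduction_number(1.5, effects, {"intervention_b": True})
r_ab = reproduction_number(1.5, effects, {"intervention_a": True, "intervention_b": True})

# Relative effect of intervention_a without and with intervention_b in place.
effect_a_alone = r_a / r_none
effect_a_given_b = r_ab / r_b
```

In this sketch the relative effect of `intervention_a` is identical whether or not `intervention_b` is in place. If the true effect of an intervention depends on which other measures are active, a single share per covariate can only describe the circumstances observed in the data.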

S5.6 Constant characteristics
Throughout, the model presented here assumes that properties of the infection are time-independent, in an effort to recover time-dependent dynamics of the instantaneous reproductive number. This is a simplification. There is, for example, evidence that the generation time distribution may be shortened by policy interventions [4]. Further, most interventions can be argued to reduce the risk of superspreading events, thereby reducing dispersion. Estimates of dispersion and generation time should be understood as empirical averages in the considered time period.
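The link between dispersion and superspreading can be illustrated with a small simulation. The sketch below assumes that the number of secondary cases per infected individual follows a negative binomial distribution with mean `R` and dispersion parameter `k`, a common way to describe superspreading; it is not the estimation code of this paper, and the function names and parameter values are hypothetical.

```python
import numpy as np


def simulate_offspring(R, k, n_cases, seed=0):
    """Draw secondary-case counts from a negative binomial with mean R and dispersion k."""
    rng = np.random.default_rng(seed)
    # NumPy parameterization: mean = n * (1 - p) / p, so p = k / (k + R) gives mean R.
    p = k / (k + R)
    return rng.negative_binomial(k, p, size=n_cases)


def share_from_top(offspring, top_fraction=0.2):
    """Share of all secondary cases caused by the most infectious top_fraction of cases."""
    ordered = np.sort(offspring)[::-1]
    n_top = max(1, int(round(top_fraction * len(ordered))))
    total = ordered.sum()
    return ordered[:n_top].sum() / total if total > 0 else float("nan")


if __name__ == "__main__":
    # Smaller k means stronger overdispersion at the same mean R.
    for k in (0.1, 0.5, 5.0):
        x = simulate_offspring(R=1.0, k=k, n_cases=100_000)
        print(f"k={k}: mean={x.mean():.2f}, "
              f"share with no secondary case={np.mean(x == 0):.2f}, "
              f"share of transmission from top 20%={share_from_top(x):.2f}")
```

With the mean held fixed, a smaller `k` concentrates transmission in fewer individuals. An intervention that mainly prevents large events would therefore show up as a change in `k` as well as in `R`, which a time-constant dispersion parameter can only capture on average.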