nosoi: A stochastic agent-based transmission chain simulation framework in R

Abstract The transmission process of an infectious agent creates a connected chain of hosts linked by transmission events, known as a transmission chain. Reconstructing transmission chains remains a challenging endeavour, except in rare cases characterized by intense surveillance and epidemiological inquiry. Inference frameworks attempt to estimate or approximate these transmission chains, but the accuracy and validity of such methods generally lack formal assessment on datasets for which the actual transmission chain was observed. We here introduce nosoi, an open-source R package that offers a complete, tunable and expandable agent-based framework to simulate transmission chains under a wide range of epidemiological scenarios for single-host and dual-host epidemics. nosoi is accessible through GitHub and CRAN, and is accompanied by extensive documentation, providing help and practical examples to assist users in setting up their own simulations. Once infected, each host or agent can undergo a series of events during each time step, such as moving (between locations) or transmitting the infection, all of these being driven by user-specified rules or data, such as travel patterns between locations. nosoi is able to generate a multitude of epidemic scenarios that can, for example, be used to validate a wide range of reconstruction methods, including epidemic modelling and phylodynamic analyses. nosoi also offers a comprehensive framework to leverage empirically acquired data, allowing the user to explore how variations in parameters can affect epidemic potential. Aside from research questions, nosoi can provide lecturers with a complete teaching tool to offer students a hands-on exploration of the dynamics of epidemiological processes and the factors that impact them. Because the package does not rely on mathematical formalism but uses a more intuitive algorithmic approach, even extensive changes of the entire model can be easily and quickly implemented.


| INTRODUCTION
Infectious disease events, especially those resulting from novel emerging pathogens, have significantly increased over the past few decades, possibly as a result of alterations in various environmental, biological, socioeconomic and political factors (Chan et al., 2010).
By definition, infectious agents need to spread through transmission between hosts. If successful, the resulting transmission process creates a connected chain of hosts linked by transmission events, usually called a transmission chain. Transmission is highly stochastic and can be influenced by a wide array of intrinsic and extrinsic factors, such as within-host dynamics and environmental or host behavioural factors. Reconstruction of transmission chains, however, remains difficult to achieve, except in certain rare cases characterized by intense surveillance and epidemiological inquiry (Mollentze et al., 2014; Worby et al., 2016).
Molecular data may represent a critical asset in reconstructing the transmission history of a pathogen (Campbell, Cori, Ferguson, & Jombart, 2019; De Maio, Worby, Wilson, & Stoesser, 2018; Didelot, Fraser, Gardy, & Colijn, 2017; Didelot, Gardy, & Colijn, 2014; Worby et al., 2016). Often, however, the relationship between individual cases is too distant to allow for the perfect reconstruction of a transmission chain. In that context, the study of infectious agents' genomic sequences can be used to reconstruct, under an evolutionary model, their likely evolutionary history. These reconstructions rely on evolution occurring on the same time-scale as the epidemic or transmission process, which is the case for most fast-evolving pathogens such as RNA viruses (Romero-Severson, Skar, Bulla, Albert, & Leitner, 2014; Ypma, van Ballegooijen, & Wallinga, 2013). The inferred evolutionary history has been used in recent years to estimate the timing, the origin or the effectiveness of mitigation measures of several epidemics (Dellicour et al., 2018; Dudas, Carvalho, Rambaut, & Bedford, 2018; Grubaugh et al., 2019; Hill et al., 2019).
The accuracy, validity or limitations of both currently available and future methods, however, generally lack formal assessment on datasets for which we have been able to observe the actual geographical spread and the complex factors that shaped its pattern.
In that context, a simulated dataset is extremely useful as the exact transmission history is known and can be compared to the histories inferred from different software packages. The last decade has seen the development of several integrated epidemic and genetic simulation tools that can be used to assess the performance of some of these models, such as TreeSim (Stadler & Bonhoeffer, 2013), Seedy (Worby & Read, 2015), outbreaker2 (Campbell et al., 2018) or FAVITES (Moshiri, Ragonnet-Cronin, Wertheim, & Mirarab, 2019).
While undoubtedly useful, these tools fall short in accommodating a wide range of epidemiological scenarios. In particular, arboviral (e.g. Zika, dengue or yellow fever) outbreaks, where two types of hosts participate in the epidemic process, are poorly modelled. These hosts are characterized by drastically different behaviour or infection dynamics and cannot be accurately modelled using a single host type. Furthermore, geographical location diffusion is simulated in these tools, when possible, on a contact network or in discrete space. Yet, recent years have seen the development of methods taking advantage of phylogeographical diffusion in continuous space (Dellicour, Rose, Faria, Lemey, & Pybus, 2016; Lemey, Rambaut, Welch, & Suchard, 2010), creating a need for epidemiological simulations in a continuous space.
To enable the performance assessment of these methods under complex and realistic scenarios, including spread in continuous space or arbovirus outbreaks, we present nosoi, a flexible agent-based transmission chain simulator implemented as an open-source R package (R Core Team, 2019).

| CHARACTERISTICS
nosoi generalizes and significantly extends a basic model that allowed individual humans and mosquitoes, each characterized by a unique set of infection parameters, to interact within a simulated environment (Fontaine et al., 2018). It was initially designed to model real-world arboviral epidemics unfolding under varying within-host dynamics (Fontaine et al., 2018).
nosoi employs agent-based modelling, which focuses on the individual active entities of a system, known as (autonomous) agents, and defines their behaviour and the interactions between them.
The main interest then lies in the global dynamics of the system and in the complex phenomena that emerge from the interactions of the many individual behaviours. Within nosoi, the agents' behaviour is governed by user-specified rules that can accommodate high levels of stochasticity at each level of the epidemic process. Agents can experience dual-host dynamics, such as those from human and mosquito populations, and exist in structured populations, with different behaviours according to host type and/or structure. Population structure can either be absent, discrete (e.g. different categories) or continuous (such as geographical space). In these structures, agents can trigger a movement, a contact or a transmission event, with the probability of such an event occurring being potentially host-, individual-, structure- and/or time-dependent. These agents are recruited when infected and can either recover or die from the infection.
In essence, nosoi allows the user to simulate and keep track of one or more transmission chains occurring during an infectious disease outbreak and, as such, to store and output a (collection of) transmission tree(s). Genetic data can be subsequently simulated along each transmission tree using sequence simulation software such as πbuss (Bielejec et al., 2014) or SantaSim (Jariani et al., 2019), which can then serve as input for phylodynamic inference methods. nosoi is accompanied by extensive tutorials, helping the user to set up and visualize their simulation, available as documentation in the package, or at https://slequime.github.io/nosoi/.
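The agent-based logic described above can be illustrated with a minimal sketch. It is written in Python for illustration only and does not use or reproduce nosoi's R interface: the `Host` record, the `simulate_chain` function and all parameter names and default values are hypothetical. The sketch assumes an unstructured, single-host population, a constant exit probability, a fixed number of contacts per time step and a constant transmission probability, whereas nosoi lets each of these depend on the host, the structure or time.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    host_id: int
    infected_by: Optional[int]       # None marks the index case
    time_infected: int
    time_exit: Optional[int] = None  # step at which the host recovered or died


def simulate_chain(max_time: int = 30, max_infected: int = 500,
                   p_exit: float = 0.1, n_contacts: int = 2,
                   p_transmit: float = 0.2, seed: int = 1) -> List[Host]:
    """Simulate one transmission chain in an unstructured population."""
    rng = random.Random(seed)
    hosts = [Host(host_id=0, infected_by=None, time_infected=0)]
    for t in range(1, max_time + 1):
        # Hosts infected during this step only become active at the next one.
        active = [h for h in hosts if h.time_exit is None]
        if not active or len(hosts) >= max_infected:
            break
        for host in active:
            # 1. Exit: the host recovers or dies and leaves the simulation.
            if rng.random() < p_exit:
                host.time_exit = t
                continue
            # 2. Contact and transmission: each contact is a new individual
            #    who is recruited as an agent only if infection succeeds.
            for _ in range(n_contacts):
                if len(hosts) < max_infected and rng.random() < p_transmit:
                    hosts.append(Host(len(hosts), host.host_id, t))
    return hosts


chain = simulate_chain()
print(f"{len(chain)} hosts infected")
```

Because every host stores the identifier of its infector and its time of infection, the returned list is itself a record of the transmission chain and can be turned into a transmission tree.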

| PRACTICAL EXAMPLE
We here showcase nosoi with the starting scenario of a single human infected with an Ebolavirus-like pathogen in West Africa.
The simulated epidemic unfolds in a geographically structured host population, specifically in a continuous geographic space, for 365 days, each day being one discrete time step. Within-host dynamics, influencing the probability of exiting the simulation (dying or recovering) and the between-host transmission probability, are modelled according to published literature that describes Ebolavirus infection in humans (Casillas, Nyamathi, Sosa, Wilder, & Sands, 2003; Skrip et al., 2017). The remaining parameters (number of daily contacts, probability of movement and standard deviation of the random walk in continuous space) were empirically set. The number of daily contacts is restricted by the number of people living in the area, as provided by spatial demographics data obtained from WorldPop (www.worldpop.org), to avoid reaching locally unrealistic counts of infected humans. The complete specification and accompanying code for this simulation are available as a document on nosoi's website (https://slequime.github.io/nosoi/articles/examples/ebola.html).

FIGURE 1 Schematic of status and location assessment for each agent (in case of a structured population), or host, during each discretized time step of the simulation. Optional steps in the simulation framework are shown in shades of green and are only performed in case of a structured (either discrete or continuous) population. Several factors (embedded in the gray box), either individually or globally set, can influence these steps according to user-specified settings.

FIGURE 2 Visualization of a simulated Ebolavirus-like transmission chain in West Africa at three time-points (91, 228 and 365 days after the introduction of the first infected host), represented as (a) a network, (b) a tree or (c) a tree mapped on the continuous space in which the simulation took place.
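Two ingredients of this scenario, movement as a random walk in continuous space and a cap on daily contacts set by the local population, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the code of the published simulation: the function names, the isotropic Gaussian step, the capping rule and the numerical values of `p_move` and `sd` are all hypothetical.

```python
import random


def random_walk_step(position, sd, rng):
    """Return a new (x, y) position after one isotropic Gaussian step."""
    x, y = position
    return (x + rng.gauss(0.0, sd), y + rng.gauss(0.0, sd))


def capped_contacts(desired_contacts, local_population, local_infected):
    """Limit daily contacts to the number of not-yet-infected people nearby."""
    remaining = max(local_population - local_infected, 0)
    return min(desired_contacts, remaining)


rng = random.Random(42)
p_move = 0.1   # assumed daily probability of moving
sd = 0.5       # assumed standard deviation of one step, in map units
position = (0.0, 0.0)
track = [position]
for day in range(365):
    # Each day the host moves with probability p_move, otherwise it stays put.
    if rng.random() < p_move:
        position = random_walk_step(position, sd, rng)
    track.append(position)

print(f"final position after {len(track) - 1} days: {track[-1]}")
```

In a full simulation the local population would be read from a demographic raster at the host's current position, so that hosts in sparsely populated areas cannot generate more infections than there are inhabitants.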
Over the course of 365 days, the simulation yielded 3,603 infected agents. The average number of secondary cases per agent is 1.12, which is roughly consistent with epidemiological estimates of R0 for previous Ebolavirus outbreaks (Van Kerkhove, Bento, Mills, Ferguson, & Donnelly, 2015). The number of infected agents increases exponentially, as would be expected given the specifications of the model, that is, the absence of intervention strategies or changes in the simulated environment.
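The average number of secondary cases can be computed directly from a table recording who infected whom. The Python sketch below is illustrative only: the `(host_id, infected_by)` record layout, the function name and the toy chain are assumptions and do not reproduce the package's own output format or summary functions. The optional restriction to a subset of hosts reflects the fact that hosts still active when a simulation ends may not yet have produced all of their secondary cases, so including them biases the average downwards.

```python
from collections import Counter


def mean_secondary_cases(records, hosts_to_count=None):
    """Average number of direct infectees per host.

    records: iterable of (host_id, infected_by) pairs; infected_by is None
    for the index case. hosts_to_count optionally restricts the average to
    a subset of hosts (for example, those that already left the simulation).
    """
    records = list(records)
    # Number of hosts directly infected by each infector.
    offspring = Counter(src for _, src in records if src is not None)
    host_ids = [host for host, _ in records]
    if hosts_to_count is not None:
        host_ids = [host for host in host_ids if host in hosts_to_count]
    if not host_ids:
        raise ValueError("no hosts to average over")
    # Hosts that infected nobody contribute a count of zero.
    return sum(offspring[host] for host in host_ids) / len(host_ids)


toy_chain = [(0, None), (1, 0), (2, 0), (3, 1), (4, 1), (5, 3)]
print(mean_secondary_cases(toy_chain))             # averaged over all six hosts
print(mean_secondary_cases(toy_chain, {0, 1, 2}))  # averaged over hosts 0, 1 and 2
```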
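The transmission chain can be represented either as a network, as a tree or as a tree mapped onto the continuous space in which the simulation took place (Figure 2).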

| USES
Trends in globalization, including expansion in international travel and trade, have extended the reach and increased the pace at which infectious diseases spread (Chan et al., 2010). These trends provide infectious agents with ample opportunities to establish and spread successfully, but many practical difficulties remain in accurately inferring key aspects of an epidemic. Standard testing of models of spread typically focuses on using that same model to generate simulated data, which offers important but limited insights and mostly provides a test of proper implementation and a way to compare different methodologies. nosoi, however, is a phylogenetic model-independent agent-based simulation framework that offers realistic and complex epidemiological scenarios.
As such, it enables accurate testing of popular inference methods in both discrete and continuous phylogeography using either maximum-likelihood (Ishikawa, Zhukova, Iwasaki, & Gascuel, 2019) or Bayesian inference (Lemey, Rambaut, Drummond, & Suchard, 2009; Lemey et al., 2010; Suchard et al., 2018), which are widely used in pathogen phylodynamics. In that regard, an interesting application of our proposed simulation framework could be to study the increasingly popular structured coalescent models (Müller, Rasmussen, & Stadler, 2017), and to compare their accuracy under realistic epidemiological transmission scenarios against discrete phylogeographical inference.
nosoi enables the simulation of real-life scenarios of viral outbreaks, and we provide several example scenarios to showcase its capabilities to generate a single transmission chain using different settings. An important aspect is that the resulting transmission tree, which describes the transmission events between infected hosts, differs from the phylogenetic tree, which describes the ancestral genetic relationships between pathogens sampled from these hosts.
In that regard, it is crucial to acknowledge the growing number of methods that infer either phylogenetic trees, transmission trees or jointly estimate both (for an overview, we refer to Baele, Suchard, Rambaut, and Lemey (2017)).
Apart from assessing the performance of various methods in reconstructing geographical spread or the dynamics of an infectious agent, nosoi can prove useful for assessing the performance of classic deterministic SIR and SIRS compartmental models (Kermack & McKendrick, 1927). These epidemiological models estimate the theoretical number of people infected with a contagious illness in a closed population over time under some assumptions. For example, the original SIR model assumes that the population size is fixed, that the incubation period of the infectious agent is instantaneous and that the duration of infectivity is the same as the length of the disease. It also assumes a completely homogeneous population with no age, spatial or social structure.
These assumptions can be matched as closely as possible by the user-defined settings in nosoi or be violated in more realistic settings, allowing the user to examine the sensitivity of the deterministic models to these assumptions under a complex and fine-tuned epidemiological scenario.
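For reference, the deterministic SIR model against which stochastic simulations can be compared is sketched below in Python. The forward Euler scheme, the function name and the parameter values are illustrative assumptions; `beta` denotes the transmission rate and `gamma` the removal rate, so that the basic reproduction number under this model is `beta / gamma`.

```python
def sir_euler(beta, gamma, s0, i0, rec0=0.0, days=100, dt=0.1):
    """Integrate the SIR equations with a forward Euler scheme.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    Returns a list of (time, S, I, R) tuples.
    """
    n = s0 + i0 + rec0  # closed population of fixed size
    s, i, r = float(s0), float(i0), float(rec0)
    trajectory = [(0.0, s, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        new_infections = beta * s * i / n * dt
        new_removals = gamma * i * dt
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        trajectory.append((step * dt, s, i, r))
    return trajectory


curve = sir_euler(beta=0.3, gamma=0.1, s0=999, i0=1)
peak_time, _, peak_infected, _ = max(curve, key=lambda row: row[2])
print(f"peak of {peak_infected:.0f} infected at t = {peak_time:.1f}")
```

Averaging the number of infected hosts over many stochastic runs configured to match these assumptions, and comparing that average with the deterministic curve, is one simple way to quantify how much each violated assumption matters.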
nosoi also offers, in line with its initial purpose (Fontaine et al., 2018), a comprehensive framework to leverage empirically acquired data. A pathogen's within-host dynamics characterized in laboratory settings can be embedded into a full stochastic epidemiological model, allowing the user to explore how variation can affect its epidemic potential.
Aside from research questions, nosoi can provide lecturers with a complete teaching tool to offer students a hands-on exploration of the dynamics of epidemiological processes and the factors that impact them. Because the package does not rely on mathematical formalism but uses a more intuitive algorithmic approach, even extensive changes of the entire model or part of it can be easily and quickly implemented. The documentation provides suggestions for visualization using well-known external R packages, such as ggplot2 (Wickham, 2009) or ggtree (Yu, Lam, Zhu, & Guan, 2018; Yu, Smith, Zhu, Guan, & Lam, 2016). The package is also fully integrated in the R and phylogenetic environments, and, through the use of the treeio and tidytree R packages (Wang et al., 2019), simulated transmission trees can be exported in a wide variety of formats for downstream analyses, such as the BEAST or jplace (Matsen, Hoffman, Gallagher, & Stamatakis, 2012) formats.
In summary, nosoi provides a complete, tunable and expandable framework to simulate epidemiological processes based on transmission chains, in a user-friendly manner. Accessible through GitHub and CRAN, the code is well covered by unit tests and accompanied by extensive documentation, providing help and practical examples to users. Open-source and coded in the widely used R language, it allows users to customize their model by implementing new mechanisms for all or part of the core model. In addition, and contrary to other available tools, by decoupling sequence evolution from the epidemiological process, it can connect to any external sequence simulator, allowing the user to choose a tool and model that can address the biological question of interest.

ACKNOWLEDGEMENTS
The authors would like to thank Maude Jacquot and Albin Fontaine

AUTHORS' CONTRIBUTIONS
S.L. designed and conceived the package, and wrote its documentation; P.B. and S.D. provided editing and optimization to the package R code; P.L. and G.B. supervised and guided the project; S.L. and G.B. wrote the initial draft. All authors contributed critically to the drafts and gave final approval for publication.

DATA AVAILABILITY STATEMENT
The package is available on GitHub (https://github.com/slequime/nosoi) and CRAN (https://cran.r-project.org/package=nosoi).
The reviewed version of the package presented in this manuscript is available through Zenodo (https://doi.org/10.5281/zenodo.3860006). The complete specification and accompanying code for the simulation presented in this manuscript are available as a document on nosoi's website (https://slequime.github.io/nosoi/articles/examples/ebola.html).