Mapping influenza activity in emergency departments in France using Bayesian model‐based geostatistics

Background Maps of influenza activity are important tools to monitor influenza epidemics and inform policymakers. In France, the availability of a high‐quality data set from the Oscour® surveillance network, covering 92% of hospital emergency department (ED) visits, offers new opportunities for disease mapping. Traditional geostatistical mapping methods such as Kriging ignore underlying population sizes, are not suited to non‐Gaussian data and do not account for uncertainty in parameter estimates. Objective Our objective was to create reliable weekly interpolated maps of influenza activity in the ED setting, to inform Santé publique France (the French national public health agency) and local healthcare authorities. Methods We used Oscour® data of ED visits covering the 2016‐2017 influenza season. We developed a Bayesian model‐based geostatistical approach, a class of generalized linear mixed models, with a multivariate normal random field as a spatially autocorrelated random effect. Using R‐INLA, we developed an algorithm to create maps of the proportion of influenza‐coded cases among all coded visits. We compared our results with maps obtained by Kriging. Results Over the study period, 45 565 (0.82%) visits were coded as influenza cases. Maps resulting from the model are presented for each week, displaying the posterior mean of the influenza proportion and its associated uncertainty. Our model performed better than Kriging. Conclusions Our model allows producing smoothed maps where the random noise has been properly removed to reveal the spatial risk surface. The algorithm was incorporated into the national surveillance system to produce maps in real time and could be applied to other diseases.


| INTRODUC TI ON
At least three million people are severely affected by seasonal influenza each year, leading to substantial morbidity and mortality and inducing important stress on healthcare structures. 1 Seasonal influenza can result in patient overload in secondary healthcare settings, in particular hospital emergency departments (EDs). During these epidemics, it is essential for healthcare authorities to have at their disposal an accurate representation of the localized risk of influenza, ideally in real time, to wisely adjust the healthcare offer by increasing bed capacity, reallocating human resources or postponing nonurgent care. As part of a surveillance toolbox, maps of influenza activity could prove useful in such context. Disease maps provide a visual summary of complex geographic information, to facilitate interpretation of the data, to highlight existing patterns across space and to identify areas of elevated risk. 2 Beyond their importance for surveillance and decision-making, disease maps can also be interesting communication tools towards clinicians, partners, healthcare managers and the general public. Because of their visual and intuitive appeal, maps released in official surveillance reports are often published by mainstream media during outbreaks. 3,4 Generating reliable disease maps requires two important features: the availability of high-quality data and the use of sound statistical methods. Epidemiological surveillance is often based on sentinel systems, in which a network of selected general practitioners (GPs) or healthcare facilities report cases. Sentinel surveillance systems have made it possible to considerably improve our understanding of epidemic dynamics. For instance, surveillance data from the French Sentinelles network was used to quantify the impact of school closure on influenza epidemics. 5 However, some challenging aspects remain while working with sentinel data, such as the small proportion of participating GPs, irregular reporting and coarse spatial resolution. The availability of new types of standardized data on hospital ED visits offers new opportunities. In France, the Oscour ® network, a syndromic surveillance system, represents a complementary source of disease surveillance data, covering 92% of all emergency hospital visits in the country, with automatic and near real-time transmission of individual-level data. 6 It does not monitor influenza in the general population but makes it possible to obtain a very detailed picture of influenza activity in secondary health care, by providing high-quality data with larger volume, higher spatial resolution and fewer reporting delays.
When working with spatial point data, the objective of disease mapping is often to predict the continuous (ie, interpolated) risk surface of the disease over the study region, where the noise has been properly filtered. Due to the complex nature of spatial data and management of uncertainty in the estimates, the generation of such maps requires the use of appropriate statistical methods. In geostatistics, the most widely used tool, known as Kriging, allows to carry out spatial interpolation or smoothing of observed values, by constructing a linear predictor for unobserved values of a continuous spatial process and estimating the covariance structure of the data with a tool known as the variogram. 7,8 However, traditional Kriging is less appropriate when considering non-Gaussian outcomes (eg, disease counts or proportions) and does not account fully for inherent uncertainties, such as those arising from the uneven distribution of underlying population and those associated with estimating the variogram parameters. 9 Despite these limitations, Kriging is still used to map influenza activity. [10][11][12] We propose to rely on an alternative statistical approach, Bayesian model-based geostatistics (MBG), a class of generalized linear mixed models, with a multivariate normal random field as a spatially autocorrelated random effect. Our objective was to create reliable weekly interpolated maps of influenza activity in the ED setting in France using emergency hospital data from the Oscour ® network, to inform Santé publique France, the French national public health agency, and local healthcare authorities. We developed an algorithm to routinely produce weekly maps that can be used as surveillance, decision-making and communication tools, and integrated it into MASS, a web application used at Santé publique France for monitoring influenza activity. 13 Although this model was developed for influenza in France, the methodological framework we describe in this study could be more widely applied, both to other diseases and to other surveillance systems.

| DATA
Following the 2003 heat wave in France, Santé publique France set up a new syndromic surveillance system, which included the Oscour ® network ("Organisation de la Surveillance Coordonnée des Urgences," Coordinated Health Surveillance of Emergency Departments), based on hospital EDs. The creation of this network was motivated by the need to provide high-quality information to public health authorities to help with evidence-based decision-making and to have a real-time assessment of the situation in secondary health care. 14 This surveillance network has already been described elsewhere. 6,13 Briefly, data are collected directly from patients' computerized medical files filled in during medical consultations. For each patient, the collected data include date, age, gender, zip code, ED identification number, reason for emergency visit, main and associated medical diagnosis based on the tenth edition of the International Classification of Diseases (ICD-10), and whether the patient was admitted for hospitalization after discharge. Encrypted data are transmitted daily to Santé publique We included influenza-coded cases, which gathered visits with ICD-10 codes J09, J10 and J11 as clinical diagnosis. We used, as a measure of risk, the weekly proportion of influenza-coded cases among all coded visits, which comprises visits with all diagnostic types: diseases, accidents, injuries, etc. This indicator allowed us to account for the variability in the volumes of visits among EDs and over time. We used the number of coded visits as a denominator rather than the total number of visits (coded and noncoded), so that the measured risk is not biased by the proportion of coding, which can vary over space and time (Section 2 in Appendix S1).

| Modelling framework
Our objective was to create weekly continuous surfaces of influenza activity in metropolitan France, using point-referenced data at each ED locations. Geostatistics capitalize on the spatial correlation between observations to carry out spatial interpolation or smoothing of the attribute of interest, filtering the noise in the observations and highlighting existing patterns. The basis of our approach is a body of theory known as Bayesian MBG. 15 MBG combine the efficiency of classical geostatistical interpolation algorithms for spatial prediction with the formalization and flexibility of generalized linear modelling and allow the application of Bayesian methods of statistical inference for parameter estimation and spatial prediction. 16 Uncertainty is rigorously handled at all stages of the modelling process.

| Model description
In the geostatistical framework, the point-referenced data are realizations of an underlying spatial process (or random field) {U(s), s ∈ D} characterized by a spatial index s which varies continuously in the fixed domain D. Here, the location s was a two-dimensional vector with latitude and longitude. For each week, we assumed that our observations (number of influenza-coded cases) available at n spatial locations and represented by the vector y = (y(s 1 ),…, y(s n )), where the set (s 1 ,…, s n ) indicates the locations of EDs, followed a binomial distribution: with N i the total number of coded visits in location i and p i the influenza probability for location i. The linear predictor was defined as the logistic transformation of p i and included an intercept α, a random effect represented by U(s i ) which is the realization of a random field U at the location s i and an unstructured random error (residual noise) e i : The component e i was modelled as Gaussian with zero mean and variance 2 e . The random field was modelled as a Gaussian field (GF), so that the vector (U(s 1 ),…,U(s n )) followed a multivariate normal distribution with zero mean and spatially structured covariance matrix Σ: The covariance matrix Σ was defined by the Matérn spatial covariance function: where ‖ s i -s j ‖ is the Euclidean distance between two locations s i and s j and σ 2 is the marginal variance. The term K λ denotes the modified Bessel function of the second kind and order λ > 0. The parameter λ measures the degree of smoothness of the process and is usually kept fixed due to poor identifiability. Conversely, κ > 0 is a scaling parameter related to the spatial range r, that is the distance at which the spatial correlation becomes almost null (ie, <0.1) via the empirically derived definition r ≈ √ 8 . 17 Bayesian specification was then completed by assigning prior distributions to parameters and hyperparameters (Section 3 in Appendix S1).

| Implementation
The model was implemented using the integrated nested Laplace for the response variable were predicted at each pixel of a regular grid covering France. We used a grid of 2 km resolution (572 × 530 pixels) to obtain a smooth posterior surface. Every week was fitted independently. All stages were coded using the R-INLA package (www.r-inla.org). Model theory and implementation in R have already been thoroughly described. [18][19][20] We provide a summary of key features and the R code for the model in Section 3 in Appendix S1. Two complementary maps were generated: in the first map, we used the posterior mean of the proportion of influenza-coded cases as a point estimate for each pixel, while in the second map, we represented the coefficient of variation (relative standard deviation) to highlight areas with more or less uncertainty. The coefficient of variation is a standardized measure of dispersion of a probability distribution and is defined as the ratio of the standard deviation to the mean. It shows the extent of variability in relation to the mean.

| Model assessment
spatial prediction method, we compared the observed proportion of influenza-coded cases at the district level (N = 96) to the grid predictions averaged by district. Three statistics were computed: Pearson's linear correlation coefficient, the mean error (same unit as the data), which indicates whether the predictions are biased by being on average too low or too high, and the mean absolute error, which measures the average magnitude of prediction errors.
Third, we checked that our model was not overfitting the data, by

| RE SULTS
Over the study period (7 November  credible intervals for the model parameters are provided in Table 1.
The posterior mean of the range was 168 km for week 49, indicating that the spatial correlation became negligible beyond this distance.
It increased to 176 and 227 km for weeks 50 and 51, respectively.  Figure 3C). Overall, our model performed better than the standard Kriging method (Section 4 in Appendix S1). In particular, Kriging predictions at the district level underestimated influenza activity (mean error of -0.25%, 15 times higher than the MBG mean error) and the correlation coefficient for the leave-one-out predictions at the ED locations was lower (r = 0.84) than with the MBG model.

| D ISCUSS I ON
Disease maps are more and more often used in infectious diseases epidemiology, as they are efficient surveillance and control tools.
However, due to the complex nature of spatial data and management of uncertainty in the estimates, the generation of such maps administrative boundaries. This makes the interpretation of the data easier. The model has been incorporated into the national syndromic surveillance system, as described below. We provided the R code so that the model can be more widely applied to other countries or diseases.
Spatial interpolation of disease rates can be performed using a variety of simple techniques, including inverse distance-weighted methods, splines or trend polynomial surfaces. The main limitations of these deterministic methods is that they do not provide a measure of the reliability of the predictions and do not take advantage of the spatial structure of the variable. 10,22 In 1992, Carrat and Valleron were the first to apply Kriging to the spatial analysis of an infectious disease. At that time, Kriging was a real improvement over the existing approaches as it made use of the spatial correlation between observations and allowed for estimation of the interpolation error. 10 However, traditional Kriging is not well suited to the analysis of disease rates as it does not take into account the underlying population size, while rates computed from sparsely populated areas tend to be less reliable, 22 and works best for data that follow a normal distribution, which is hardly fulfilled with counts or proportions. 23  They have been shown to be valuable methodologies for generating predictive prevalence and risk maps for malaria, 16,26 shistosomiasis 27 or poverty, 28 among others. In our model, the spatial dependence was defined using a Matérn covariance function. We observed that the spatial range r, a parameter of this covariance function, was increasing during the ascending phase of the epidemic, as the epidemic spread through the country and the affected areas become larger. As spatial correlation is mainly due to population connectivity between locations, parameters of a mechanistic model explicitly describing flows of individuals would be fixed over time. But here, this underlying mechanism is approximated with a phenomenological model, leading r to vary over time.

Parameter
Week 49 Week 50 Week 51 Intercept, α  In this study, we used as a risk indicator the proportion of influenza-coded cases among all coded visits, as previously carried out by others. 30,31 This allowed us to account for the variability in the volumes of visits among EDs, and the proportion of coding, which can vary over space and time (Section 2 in Appendix S1).
One limitation of this measure of risk is its sensitivity to the "case mix," that is the variability in the number of noninfluenza cases, which can bias the proportion of influenza-coded cases if unusual variations are observed. The number of noninfluenza cases did not display large temporal variations over the study period and therefore should not substantially affect the observed trends (Section 2 in Appendix S1). However, we cannot rule out that the proportion of influenza-coded cases might be impacted by the spatial variations in the case mix. The ICD-10 codes used to classify cases as influenza are not perfectly sensitive nor specific to influenza but we have no reason to think that coding practices significantly differed among EDs. In addition, the proportion of influenza-coded cases is a mixture of the true influenza risk in the general population and the care-seeking behaviours, which depend on (a) the severity of the virus (ED surveillance systems tend to capture more severe cases), (b) the ease of access to an ED (the risk might be underestimated in areas where access to EDs is more difficult, due to lower density of health services and longer travel time to hospitals) and (c) the socioeconomic status (the risk might be overestimated in neighbourhoods with lower socioeconomic status that have no access to a GP). Thus, our maps must be primarily seen as the spatial representation of the influenza risk in the ED setting.
This data set is complementary to another influenza surveillance system, the Sentinelles network of volunteer GPs, which monitors influenza consultations in general practice and produces maps using Kriging. 10 It has been shown that data from both sources followed similar temporal patterns. 6 Spatial patterns, however, are not expected to be directly comparable. First, they do not measure the same risk and populations are different regarding age distribution and severity profiles. 13 Second, the Oscour ® network generates a larger volume of data (number of cases is five times higher than in the Sentinelles network) and has a better spatial resolution (on average ~650 participating EDs by week, compared to ~260 GPs participating in the Sentinelles network by week). 32 Third, the Sentinelles maps are produced by applying Kriging to incidence estimated at the district level (ie, only 96 spatial points), without taking into account their precision (which depends on the number of participating GPs by district). 10 For our results to efficiently support surveillance and be accessible to public health professionals, it is essential that they are integrated in an effective information flow where data are gathered, analysed and reported on a weekly basis. This is performed through the intranet online application, called MASS, which has been developed by Santé publique France to provide their epidemiologists with an easy access to up-to-date surveillance data from specific and syndromic surveillance systems and to the results of statistical analyses of the epidemiologic risk. 13 One aim of this study was to complete this surveillance tool by developing automated disease mapping. The algorithm that we developed in R was thus added to MASS so that weekly disease maps can now be visualized. Along with the smoothed influenza activity map, the algorithm output also provides an uncertainty map. Indeed, it is important that scientists better communicate about uncertainties in model estimates as this might potentially affect interpretation of the data. Uncertainty in our results arises from at least four sources: low density of data sources in some areas, low volume of visits in some EDs, uncertainty in the spatial parameters and inherent spatial heterogeneity in influenza proportions that occurs over short spatial scale and that cannot be explained by the data and modelling approaches. This latter component of variation is captured as "noise" (randomness) by the geostatistical model and the smoothing process causes the loss of local details of the spatial variation in risk. But the model ensures that the degree of randomness is measured and incorporated in the predicted posterior distributions at each pixel. 16 This framework can be the basis for future developments, to include the temporal dimension, as well as population movements and connectivity for instance. Our algorithm could also be easily applied to other countries or any other diseases monitored by a surveillance system where cases are geographically referenced. Although our approach was mainly descriptive, with the aim of supporting surveillance, further research should aim at developing methods for predicting influenza activity in the ED setting in order to support local healthcare planning.

ACK N OWLED G EM ENTS
We thank the staff of Santé publique France implicated in influenza surveillance and the partners involved in the Oscour ® network. This work was supported by the LabEx "Integrative Biology of Emerging Infectious Diseases (IBEID)" (Grant Number ANR-10-LABX-62-IBEID).