Nonannual seasonality of influenza‐like illness in a tropical urban setting

Abstract
Background: In temperate and subtropical climates, respiratory diseases exhibit seasonal peaks in winter. In the tropics, with no winter, peak timings are irregular.
Methods: To obtain a detailed picture of influenza-like illness (ILI) patterns in the tropics, we established an mHealth study in community clinics in Ho Chi Minh City (HCMC). During 2009-2015, clinics reported daily case numbers via SMS, with a subset performing molecular diagnostics for influenza virus. This real-time epidemiology network absorbs 6000 ILI reports annually, one or two orders of magnitude more than typical surveillance systems. A real-time online ILI indicator was developed to inform clinicians of the daily ILI activity in HCMC.
Results: From August 2009 to December 2015, 63 clinics were enrolled and 36 920 SMS reports were received, covering approximately 1.7M outpatient visits. Approximately 10.6% of outpatients met the ILI case definition. ILI activity in HCMC exhibited strong nonannual dynamics with a dominant periodicity of 206 days. This was confirmed by time series decomposition, stepwise regression, and a forecasting exercise showing that median forecasting errors are 30%-40% lower when using a 206-day cycle. In ILI patients from whom nasopharyngeal swabs were taken, 31.2% were positive for influenza. There was no correlation between the ILI time series and the time series of influenza, influenza A, or influenza B (all P > 0.15).
Conclusion: This suggests, for the first time, that a nonannual cycle may be an essential driver of respiratory disease dynamics in the tropics. An immunological interference hypothesis is discussed as a potential underlying mechanism.


| INTRODUCTION
One of the most important challenges facing big-data studies in all fields is that the larger the data set the less precisely targeted each data point is in answering a specific question. Nowhere is this more apparent than in the big-data approaches used in infectious disease surveillance, where the volume of data has allowed many types of associations to be investigated, [1][2][3][4][5][6] but the distance between the source data (an online search, a news story, a social media post) and the presupposed condition (infection with a pathogen) is large enough to warrant additional inquiry into the validity of the association. Indeed, this has been done by several research groups for Google's flu prediction algorithm Google Flu Trends. 7-10 Critiques of the algorithm included its reliance on Internet search behavior remaining constant, an overfitting effect that may have given too much weight to associations that were present in training data sets only, as well as specific examples of incorrect forecasts. 9,11 The challenge in big-data disease surveillance is to narrow the gap between the infection and the data point describing it and to find a way to generate large data sets where the data points are grounded in the presence of virus, genetic material, an antibody profile, or a set of symptoms. This study presents an attempt at narrowing this gap, and like some of the early big-data studies 1,[12][13][14][15] is focused on respiratory disease and influenza virus.
In temperate countries, influenza virus is one of the most studied disease systems, exhibiting a predictable wintertime transmission season and a robust relationship between syndromic and molecular surveillance. Little is known about the epidemiology of influenza virus in the tropics despite a renewed research interest in tropical influenza over the past decade resulting from increased availability of influenza surveillance and sequence data. [16][17][18][19][20] To date, research on tropical influenza has concentrated on whether influenza epidemics exhibit annual seasonality [21][22][23][24][25][26][27][28][29] and whether influenza viruses show patterns of year-round persistence. [30][31][32][33][34] A third question that has received less attention is whether syndromic influenza-like illness (ILI) surveillance has the same peaks and troughs as molecular surveillance for influenza virus in these regions. In temperate countries, public health agencies are able to rely on ILI reporting to signal the onset of the influenza season, 1,35,36 but it is not known whether ILI and influenza correlate in tropical countries. 37,38
The majority of epidemiological studies looking at influenza and/or respiratory disease in the tropics have two major drawbacks. The first is ignoring absolute case counts and reporting only the percentage of samples (nose/throat swabs) that test positive for influenza. 26,29,[38][39][40][41] Ignoring case counts makes it impossible to determine whether samples are being taken during an influenza season or outside of it. The second drawback is underpowering the analysis using a short time series or monthly data or both. [37][38][39][40][42][43][44][45][46] Monthly data are normally too coarse to infer the presence of an annual transmission season or other periodic trends (if these exist) unless the time series is very long.
In fact, this is one of the reasons for disagreement in the current literature as some studies on respiratory disease in the tropics claim support for an annual transmission season 21,26,29,39,40,42,[47][48][49] while others show mixed or no evidence. 22,27,46,[50][51][52][53][54] Among these, some of the more weakly supported results are being used in public health policy to advocate for particular vaccination timings based on incorrectly identified seasonal signals. 29,49 For influenza virus specifically, studies with sufficient data 27,28,55 have generally found that annual seasonal signals are not supported in the tropics.
Understanding the dynamics of respiratory disease and influenza in the tropics, especially the presence or absence of annual seasonality, may allow the forecasting methods currently deployed in temperate countries [56][57][58][59] to be used for tropical influenza. Current forecasting methods rely on mechanistic susceptible-infected-recovered (SIR) models and known/inferred climate associations to accurately predict increases in influenza virus infections. In the tropics, it is not known whether influenza dynamics obey classic SIR models, whether they are characterized by low-level persistence, or a combination of the two. It is also not known which climate-influenza associations are expected to be present in tropical countries despite accumulating evidence that absolute humidity may be the most influential climate factor. 28,60 Essentially, the absence of winter in tropical countries makes respiratory disease forecasting much more difficult than in temperate or subtropical climates. If the intrinsic epidemiological dynamics and the presence/absence of climate associations can be understood in the tropics, forecasting of influenza epidemics may be possible. Thus far, the only attempt at influenza forecasting for the subtropics reported that the majority of forecast attempts (lead time >2 weeks before epidemic peak or onset) had accuracies below 50% when predicting the timing, onset, magnitude, or duration of an influenza epidemic, 61

reporting of ILI as simple as possible in order to encourage frequent reporting and wide participation and to create a real-time ILI surveillance system that could be used by health professionals in Ho Chi Minh City. Our study is most similar to the clinic-centered mHealth systems set up in Senegal 45 and Madagascar, 64 and the benefits of this type of real-time, big-data epidemiology can be seen in the dengue hotline system recently described by Rehman et al. 65 The purpose of our study was to build a long-term consistent time series of both ILI reports and influenza molecular confirmations. We analyzed the data with traditional time series decomposition to detect periodic signals, with stepwise regression analyses to determine the importance of climate and other covariates, and with regression-based forecasting to determine the predictability of ILI trends in Ho Chi Minh City.

following general symptoms (i) fever with axillary temperature above 37.5°C, (ii) malaise, (iii) headache, and (iv) myalgia; and (c) one or more of the following respiratory symptoms (i) cough, (ii) rhinorrhea, (iii) sore throat, and (iv) dyspnea. To encourage enrollment and reduce dropout, clinics are advised to send daily reports by standard mobile phone short message service (SMS) text messages; reporting with log books and email is also available. SMS messages are automatically passed to a text-parsing and data-cleaning system that was set up and is still actively managed by the Oxford University Clinical Research Unit (OUCRU) in HCMC.
Every day, ILI reports are manually approved by a qualified project team member at OUCRU; on approval, they are automatically entered into a MySQL database that holds all data points for the study. A small number of clinics (about 8%) did not use SMS reporting (by their request) and instead emailed ILI numbers to the project team or wrote them down in a daily logbook provided by OUCRU. As part of the data processing pipeline, reports by email or logbook were regularly merged into the main MySQL database. There was no apparent difference in ILI numbers when comparing clinics that used SMS, email, and logbook reporting.

| ILI data
Community engagement meetings were run for the first several years of the study to distribute and explain the study protocol, and a basic leaflet outlining the goals of the study and the reporting methodology was distributed to interested physicians. All documents were translated into Vietnamese, and annual reports and ILI trends were fed back to the clinics on a regular basis. A total of 63 clinics were enrolled in the initial study period (August 2009-December 2015). Clinics that reported frequent zeros (>50%), or withdrew too early (contributed <200 reports), were not considered for the analysis. The clinics included mostly single-doctor clinics, some that were open early morning and late evening only (to accommodate a full-time working schedule for that doctor at a city hospital) and some that were open day-time hours as that clinician's primary source of income. A few of the clinics were larger polyclinics with several doctors (three to five) and several nurses (five to ten) on staff, a waiting area, one or two patient beds for day-time only inpatient stay, and the ability to see between 100 and 200 patients per day. The presenting symptoms for patients attending the clinics in this study included ILI, fever, rash, skin infections, nausea, diarrhea, dehydration, conjunctivitis, muscle ache, joint pain, and physical cuts/scrapes/injuries from motorbike (or other) accidents.
In May 2012, a new study component was launched for 24 clinics that agreed to periodic collection of nasopharyngeal (NP) swabs so that a subset of ILI patients could be molecularly confirmed as positive or negative for influenza virus. A swabbing schedule was made at random every year, so that each clinic would be visited an approximately equal number of times, with two clinics selected for swabbing each week. In other words, each clinic was visited two or three times per year, and each week (excepting holidays and the early months of the swabbing substudy), there were two clinic visits lasting 3 days each; the schedule was designed in this way so that no single clinic would have too many visits, as some doctors viewed these as disruptive to the clinic's normal patient flow. Numbers of NP swabs collected each week depended on the numbers of ILI cases presenting at the clinics as well as patient consent.

The research protocol was approved by the Oxford Tropical Research Ethics Committee at the University of Oxford and by the Scientific and Ethical Committee of the Hospital for Tropical Diseases in Ho Chi Minh City.

| Molecular confirmation
Respiratory specimens (nasal/throat swabs) were collected from ILI patients at outpatient clinics, transported the same day on ice to OUCRU, and stored in −80°C freezers for a maximum of 3 months before RNA extraction and influenza A and B PCR testing. All specimens were tested by real-time PCR using primers, probes, and reagents recommended by the World Health Organization (WHO) and the Centers for Disease Control and Prevention (CDC). The sequences of the probes and primers used are listed in Table S1.

| Climate data
Data on daily mean temperature (T) and relative humidity (RH) were collected from Weather Underground for Ho Chi Minh City, Vietnam (http://www.wunderground.com), from the beginning of 2000 until the end of 2015. Absolute humidity (AH) was calculated from relative humidity and temperature. The series of daily climate data was smoothed with a 15-day moving average before being used in our analyses.
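As an illustration of the climate preprocessing described above, the following minimal sketch derives AH from T and RH and then applies a 15-day moving average. The Magnus-type approximation and its constants, the centered window with shrinking edges, and the function names are all assumptions made for this sketch; they are not the formula or code used in the study.

```python
import math


def absolute_humidity(temp_c, rh_percent):
    """Approximate absolute humidity (g/m^3) from temperature (deg C) and RH (%).

    Assumption: Magnus-type saturation vapour pressure combined with the
    ideal gas law for water vapour. This is a common approximation and is
    not taken from the study.
    """
    # Saturation vapour pressure in hPa (Magnus approximation).
    e_sat = 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))
    # Actual vapour pressure in hPa.
    e = e_sat * rh_percent / 100.0
    # Ideal gas law for water vapour, with e in hPa and temperature in kelvin.
    return 216.74 * e / (temp_c + 273.15)


def centered_moving_average(values, window=15):
    """Centered moving average; the window shrinks at the series edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical daily readings, for illustration only.
daily_t = [27.0, 28.0, 29.0, 28.5, 27.5]
daily_rh = [70.0, 75.0, 80.0, 78.0, 72.0]
daily_ah = [absolute_humidity(t, rh) for t, rh in zip(daily_t, daily_rh)]
smoothed_ah = centered_moving_average(daily_ah, window=15)
```

Whether the study's window was centered or trailing is not stated, so the centered form here is only one plausible reading.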

| Time series detrending and standardization
A total of 28 regularly reporting clinics (those that submitted at least 200 reports from 2010 to 2015 and reported positive ILI numbers at least half of the time) were included in the time series analysis. A 29th clinic that met these inclusion criteria was removed for quality control reasons. The ILI data of 2009 were not used in the analysis due to the small number of reporting clinics during the first 5 months of the study. Each clinic's time series was converted to a z-score scale by computing the z-score of each ILI percentage inside a 12-month moving window (centered at the calculated data point), thus removing long-term trends in the data; we verified that window sizes of 6, 9, 15, and 18 months did not have any qualitative effects on the overall ILI trends. The daily z-scores were averaged across clinics and smoothed using a 15-day window to construct the ILI z-score time series that we used in our subsequent analysis (see Figure S1 for effects of different smoothing windows).
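The detrending and standardization step can be sketched as follows. This is a minimal illustration under stated assumptions: missing reports are represented as `None`, the 12-month window is taken as 365 days, the population standard deviation is used, and all function names are hypothetical. The final 15-day smoothing is omitted.

```python
import statistics


def rolling_zscore(series, window=365):
    """z-score of each value relative to a window centered on that value.

    `series` is one clinic's daily ILI percentage; None marks a missing report.
    """
    half = window // 2
    z = []
    for i, x in enumerate(series):
        if x is None:
            z.append(None)
            continue
        chunk = [v for v in series[max(0, i - half): i + half + 1] if v is not None]
        if len(chunk) < 2:
            z.append(None)
            continue
        sd = statistics.pstdev(chunk)
        z.append((x - statistics.fmean(chunk)) / sd if sd > 0 else 0.0)
    return z


def average_across_clinics(z_by_clinic):
    """Daily mean of the available clinic z-scores (None if no clinic reported)."""
    n_days = len(z_by_clinic[0])
    combined = []
    for day in range(n_days):
        values = [z[day] for z in z_by_clinic if z[day] is not None]
        combined.append(statistics.fmean(values) if values else None)
    return combined


# Two hypothetical clinics with different baselines and a slow trend.
clinic_a = [10.0 + (i % 2) + 0.01 * i for i in range(100)]
clinic_b = [5.0 + (i % 2) - 0.02 * i for i in range(100)]
city_z = average_across_clinics(
    [rolling_zscore(clinic_a, window=11), rolling_zscore(clinic_b, window=11)]
)
```

Because each clinic is standardized against its own local window, clinics with different baselines or slow trends contribute on a common scale before averaging.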
The time series was validated by verifying that it was not white noise (P-value < 10^−15, Box-Ljung test) and by showing that the majority of individual clinics had a higher correlation to the aggregate time series than would be expected if reporting were random (Figure S2).

| Statistical analysis and forecasting
Periodicity and frequency decomposition in the smoothed 6-year ILI trend were assessed with a standard autocorrelation function (ACF) and a discrete Fourier transform (DFT). The ILI z-score time series was regressed (linear link function) onto linear and nonlinear variants of the climate variables (T, RH, AH, √T, √RH, √AH, T^2, RH^2, and AH^2) to determine which nonlinear effects were present, as there is some evidence of nonlinear effects of climate on ILI. 67 In addition, a time-dependent fixed effect α_j mimicking the dominant periodicity identified by the ACF (here, 206 days) was included on the right-hand side of the regression equation. Twenty-one α_j were allowed for in the model, meaning that periodicity in the system is modeled with a piecewise constant function taking 21 different values during a full period of 206 days. This is equivalent to having 21 fixed-effect terms in a regression, each multiplied by an indicator variable describing whether that data point belongs to that period, ensuring that only one fixed-effect term is added at a time. The piecewise constant function has an advantage over the sinusoidal approach traditionally used in epidemiological analyses because the stepwise nature of the α_j allows the periodicity in the system to take any shape determined by the data and does not require the forcing function to be sinusoidal or continuous. In exploring the shape of this function, it was found that more than

was defined simply as the median of the absolute differences between the predicted z-score time series and the real z-score time series. We varied the size of the training set to determine how many years of data would be needed to achieve robustness in predictability (Figure S4).
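To make the periodicity analysis concrete, the sketch below computes a sample ACF and a few DFT coefficients on a synthetic series with a known 206-day cycle. It also builds the 21 indicator variables that select which α_j applies on a given day. The synthetic data, the equal-width mapping of days to steps, and the function names are assumptions for illustration; the study's own implementation is not shown in this text.

```python
import cmath
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


def dft_power(series, k):
    """Squared magnitude of the k-th discrete Fourier coefficient."""
    n = len(series)
    coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
    return abs(coeff) ** 2


def phase_indicators(day, period=206, n_steps=21):
    """Indicator variables for the piecewise constant cycle: exactly one is 1."""
    j = int((day % period) * n_steps / period)
    return [1 if i == j else 0 for i in range(n_steps)]


# Synthetic z-score series: six full cycles of a 206-day sinusoid.
z = [math.sin(2 * math.pi * t / 206) for t in range(6 * 206)]

# Lag with the highest autocorrelation in a plausible search range.
best_lag = max(range(100, 301), key=lambda lag: autocorrelation(z, lag))

# Frequency index with the most power; the period is the series length divided by k.
best_k = max(range(1, 51), key=lambda k: dft_power(z, k))
dominant_period = len(z) / best_k
```

In a regression, each day's row would contain these 21 indicators alongside the climate covariates, so that only one α_j contributes to any given data point.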

| Bootstrapping climate data
To test the robustness of this prediction to changes in the annual cli-   (Table 1).

| RESULTS
To create a single ILI time series for Ho Chi Minh City, we detrended and standardized each clinic's ILI percentages to a z-score scale and then aggregated these into a single z-score time series.
Several internal validations were carried out to ensure that the data followed certain expected behaviors for multisite syndromic reporting and that arbitrary or random reports were not being sent during the course of the study (see Materials and Methods).
In particular, note that individual clinic time series correlated with each other, and replacing a single clinic with a white noise signal of equal variance reduced the correlation between that clinic and the aggregate ILI trend (Figure S2). ILI trends in Ho Chi Minh City (Figure 1) suggest that there are typically multiple ILI peaks per year, as has been observed in other tropical and subtropical regions. 28,30,61 Visually, no seasonal or annual cycle appears in these data.

The terms with explanatory power were the daily temperature, relative humidity (RH and √RH), the interaction term between RH and temperature, lagged climate terms, and the nonannual cycle (see Table 2).
When factoring in interactions and nonlinear terms, the effects of climate are not very strong. At 75% relative humidity, an increase of 1°C is associated with a 0.085 decrease in ILI on the z-score scale. At 28°C and 75% relative humidity, a 10% increase in relative humidity is associated with a 0.034 increase in the ILI z-score. The association between the nonannual cycle and the ILI trend is statistically significant, and the nonannual effect is identified using the Akaike information  Figure 4). Thus, the nonannual cycle is the key characteristic of this dynamical system that enables accurate forecasting.
TABLE 2 Estimates of coefficients from regressing the smoothed daily ILI z-scores (2010-2012) onto two climate variables, an interaction term, and the temporal indicator variables that were used to construct a periodic 206-d forcing function in the time series

Several robustness tests were performed. Figure S6 shows that forecasting using a 202-day intrinsic nonannual cycle in combination with bootstrapped climate data gives the most accurate forecasts and that a 211-day cycle was optimal when forecasting ILI trends using real weather data. These results are robust to whether mean or median prediction error is used as an evaluation criterion (Figure S7).
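The comparison of candidate cycle lengths can be illustrated with a stripped-down forecasting exercise. The sketch fits a 21-step piecewise constant cycle on a training window, forecasts the remaining days, and scores each candidate period by the median absolute error. It is a simplification under stated assumptions: climate covariates and lags are left out, so each step value is just the mean of the training data in that step. The data are synthetic and all names are hypothetical.

```python
import math
import statistics


def phase_step(day, period, n_steps=21):
    """Index of the piecewise constant step that a given day falls in."""
    return int((day % period) * n_steps / period)


def fit_cycle(train, period, n_steps=21):
    """With indicator variables only, least squares gives the mean of each step."""
    sums = [0.0] * n_steps
    counts = [0] * n_steps
    for day, value in enumerate(train):
        j = phase_step(day, period, n_steps)
        sums[j] += value
        counts[j] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def median_abs_forecast_error(series, n_train, period, n_steps=21):
    """Train on the first n_train days, then score forecasts of the rest."""
    alphas = fit_cycle(series[:n_train], period, n_steps)
    errors = [
        abs(series[day] - alphas[phase_step(day, period, n_steps)])
        for day in range(n_train, len(series))
    ]
    return statistics.median(errors)


# Synthetic z-score series with a true 206-day cycle (six cycles).
z = [math.sin(2 * math.pi * t / 206) for t in range(6 * 206)]
n_train = 3 * 206

# Score candidate cycle lengths around the suspected period.
errors = {p: median_abs_forecast_error(z, n_train, p) for p in range(196, 217)}
best_period = min(errors, key=errors.get)
```

A candidate period that is off by even a few days accumulates phase drift between the training and forecast windows, which is why the error curve is informative about the underlying cycle length.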
Using a simpler regression model with no lags and no nonlinear climate terms, a 201-day cycle gave the lowest prediction errors (Figures S8 and S9). All analyses provided support for the existence

| DISCUSSION
Our study demonstrates the value of community epidemiology studies for describing fine-scale dynamics of ILI in tropical settings where respiratory disease dynamics are nonannual and difficult to predict. We were able to show that a network of community clinics can generate a high-quality syndromic time series that can be used to understand local transmission patterns of respiratory disease and that such a network can generate a significantly larger data set (~6000 data points per year) than traditional surveillance systems that report weekly or monthly measures of incidence. This volume of data increases statistical power to detect ILI associations as well as the presence of nonannual forcing in the system. The present study does not achieve the data volume seen in "big-data" study designs 1,4,5,74 which can have tens of millions of observations per year, but the specificity of our data signal is higher than in the aforementioned studies as each data point in our study corresponds to a patient, seen by a physician, determined to have met or not met the clinical criteria for ILI.
The major quality control challenge we encountered was accounting for long-term trends in ILI (we had a downward trend in our data).
In a multisite time series, detrending must be carried out carefully, and changes in a site's reporting patterns must be investigated individually. From discussions with the reporting physicians in our study, the putative causes of the decreasing trend in ILI were likely to have been (a) a more than doubling of patient visit costs that would have reduced the likelihood of reporting a minor respiratory illness, (b) increased clinical specialization at some sites, or (c) more conservative interpretation of ILI guidelines after molecular diagnostics were introduced in May 2012. In addition, during 2011 and 2012, a few large clinics were enrolled in the study, and some of these had higher patient volumes but lower ILI percentages. All of these features of community-based syndromic reporting systems need to be considered for both study design and surveillance purposes. Detrending with a 12-month moving average appears to be the simplest way to detrend and preserve any potential annual structure in the data.
The lack of correlation between influenza trends and ILI trends suggests that the transmission dynamics of respiratory disease differ between tropical and temperate zones, consistent with the past decade's literature on this topic. 24,27,28,30,60,63 Given the observed pattern of multiple ILI peaks in our data, some of which are influenza epidemics and some of which are not, the natural hypothesis explaining this pattern is that multiple respiratory pathogens cocirculate and cause asynchronous epidemics. It is unknown whether in such a system multiple respiratory

The second major question that arises from the basic correlational analysis between ILI and influenza is why high influenza periods should be observed when ILI is low. To the best of our knowledge, this pattern has not been observed in other surveillance systems, as a wave of influenza infections is normally sufficient to generate a substantial uptick in the ILI signal. The likely explanation for a high-influenza low-ILI period is a larger than expected prevalence of other respiratory viruses among the reported ILI cases; this is possible as the community clinics in our study are almost exclusively outpatient and likely to see many mild cases of respiratory disease. If influenza infection represents only a small fraction of respiratory disease among these outpatients, a wave of influenza alone would not generate an ILI peak.
In general, community-based studies of respiratory disease should aim to characterize the contribution of all respiratory viruses to the ILI trend to determine whether it is a particular pathogen's dominance or synchrony among certain pathogens that generates an ILI peak.
The major finding in our study is that the dominant periodicity observed in our ILI time series is nonannual. This is the first report of a

should be investigated in other locations to determine whether a period with particular climatic features can result in an increase or decrease in viral transmission that is detected by larger case numbers several weeks later. Much work remains to be done before respiratory disease outbreaks in the tropics can be forecast accurately; our hope is that the nonannual signal identified in this study will help in this endeavor.
A second limitation in the current study design is the lack of age information. We experimented with several different reporting methods (SMS, email, log books) for this study, but only the logbook method was able to capture age information consistently.
Unfortunately, this method was adopted by a minority of the clinics in our study, and it was not compatible with real-time reporting.
The age distribution of ILI cases represents a critical data gap in our study and in other mHealth studies that aim at real-time reporting, as the age distribution could tell us whether the major disease burden skews toward childhood respiratory diseases or general respiratory diseases like influenza. As tropical countries have younger age distributions than temperate countries, this difference may have a profound epidemiological effect on differences in ILI dynamics between temperate and tropical zones, as well as the proportion of ILI cases that are caused by influenza vs other respiratory viruses.
The public health value of our mHealth reporting system is that ILI results can be fed back in real time to participating physicians and the community of health professionals in Ho Chi Minh City. Real-time ILI trends from our study are publicly available and updated daily. The two key questions raised by our study are (a) to what extent the transmission of noninfluenza respiratory viruses in the tropics is a potential driver of a complex multipathogen transmission system and (b) whether it is useful to attempt to time influenza vaccination in an epidemiological scenario where influenza epidemics occur irregularly. We aim to investigate the first of these questions by introducing more respiratory virus diagnostics into our study. The second question can be evaluated with a mathematical model of influenza epidemiology, but will necessitate a longer influenza time series and a better understanding of the key drivers of influenza virus dynamics in tropical settings.

ACKNOWLEDGEMENTS
We are very grateful to all the participating clinicians in this study,

CONFLICT OF INTEREST
MFB has acted as a consultant to Visterra Inc in Cambridge, MA.

AUTHOR CONTRIBUTIONS
MFB and JF designed the study. HML and AW analyzed the data.