The meteorological observations consisted of temperature and RH recorded every ten minutes. We summarized these data by computing the daily mean and the daily range, considering the minimum and maximum recorded values, for two observed meteorological variables, air temperature and air RH, and one calculated variable, the SVPD, obtaining ten typical weather variables for each day.
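As an illustration of this summarizing step, the sketch below groups ten-minute readings by day and derives the mean, minimum, maximum and range. The function name `daily_summary`, the record layout and the sample readings are hypothetical, and the exact subset of summaries retained in the study is not reproduced here.

```python
from collections import defaultdict
from statistics import mean

def daily_summary(records):
    """Summarize ten-minute readings per day.

    records: iterable of (day, value) pairs (hypothetical layout).
    Returns {day: {"mean", "min", "max", "range"}}.
    """
    by_day = defaultdict(list)
    for day, value in records:
        by_day[day].append(value)
    summary = {}
    for day, values in by_day.items():
        lo, hi = min(values), max(values)
        summary[day] = {"mean": mean(values), "min": lo, "max": hi, "range": hi - lo}
    return summary

# Made-up air-temperature readings (deg C), not data from the study
readings = [("2005-03-01", 25.0), ("2005-03-01", 27.0), ("2005-03-01", 29.0),
            ("2005-03-02", 26.0), ("2005-03-02", 30.0)]
temp_summary = daily_summary(readings)
```

The same function would be applied to each variable in turn (air temperature, RH, SVPD).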

#### Climate data for Fortaleza at the city scale

The region of Fortaleza has a typical tropical wet and dry climate (Figure 1) with high temperatures and high RH throughout the year. December and January are the warmest months, with a mean range of 25° C to 31° C. On a yearly average, the RH is 77% and the total amount of rainfall is 1,378 mm. The precipitation occurs during the first six months of the year, when relative humidity is high. A strong rainy season spans from February to May, with rainfall particularly intense in March and April. The climate is generally dry during the last six months of the year with very little rainfall in that period (Hastenrath and Greishar 1993).

The monthly number of DEN cases recorded for the city of Fortaleza remained very low during the first three months of the year (Figures 2a, 2b). This number grew to 4,000 cases/month in July and August 2005, a few months after the end of the rainy season. The year 2007 was somewhat different, with a more gradual temporal distribution and an attenuated seasonal peak (about 2,000 cases/month from May to July). We therefore distinguish the temporal pattern of DEN notification for the city of Fortaleza-CE (Figure 2c).

Measured at the meteorological station of the Universidade Federal do Ceará (UFC) (Site #0 in Figure 1 from Degallier et al. (2012)), weekly air temperature (AT) and air RH were relatively stable during 2005–2007, with a yearly average of 27° C and 78%, respectively. The seasonal variation, with ranges of about 8° C for AT and 35% for RH, followed the same seasonal variation as that of rainfall, with lowest (highest) temperature and highest (lowest) humidity observed during the wet (dry) season. The saturated vapor pressure deficit (SVPD) showed a similar seasonal variation with low values (average ∼ 6 mb) during the four months of the rainy season, and higher values (up to 12 mb) during the remaining eight months.

#### Experimental procedure at the local scale

During the three years of the study (2005–2007), we conducted twelve experiments (EXP1–12), equally distributed between the wet and dry seasons, at seven sites (#1 to #7; Degallier et al. (2012)) in Fortaleza, separated by 3 to 10 km. Sites #1 to #6 were operated in private or institutional locations within a natural air environment, while Site #7 was operated inside a closed air-conditioned room at the Ceará State Secretary of Health. AT and RH were recorded every ten minutes. Additional information about local conditions and the difficulties encountered during the experiments is provided in Degallier et al. (2012).

Briefly, 40–50 mosquitoes (35–45 females and approximately five males) were released in netted wooden cages (30×30×30 cm) at the Laboratory of Entomology of SESACE (Secretaria Estadual de Saúde do Ceará). The mosquitoes were taken from their rearing cage two days after hatching. Just before release in the experimental cages, the females were offered anaesthetized quail for blood-feeding for two hours. On the same day as blood-feeding, the cages were installed at the experimental sites. Cotton plugs with ten percent sugar were changed daily and dead mosquitoes were counted. A small tube with filter paper and tap water was renewed daily in each cage to collect eggs. Most of the twelve experiments lasted until the death of the last mosquito in the last cage. As there was no large-scale outbreak of dengue during the experiments, the cages were not exposed to insecticides.

Significant seasonal differences were recorded at all sites where mosquitoes were exposed to natural air conditions. Daily variations were larger during the wet season, when rainy episodes lasting a few days could induce a drop in AT associated with higher humidity. In contrast, the climate parameters were quite stable during the dry season. Independent of season, the daily variations of the measured variables differed somewhat between sites. For instance, the daily temperature range was systematically higher at Site #2 than at Site #1, as was the daily relative humidity range, in both the dry and the wet seasons (Degallier et al. 2012).

#### Discrete hazard and survival functions

We used the framework of survival analysis to analyze the influence of weather conditions and age on mortality. Because the factors whose influence on mortality we analyzed vary in time, lifetime observations could not be used as the response variable and a regression model was required. Hence, two statistical analyses were performed in this study: (1) using as a response the dummy variable representing death occurrence of a given individual on a given day; (2) using as a response the discrete variable representing the number of death occurrences on a given day. We then attempted to explain these variables using the ten climate factors as predictors.

To model the effect of continuous variables *X*_{1},…,*X*_{k} on the dummy variable *Y*, associated with the mosquito's risk of death, we used binary logistic regression (Agresti 2002) as well as proportional hazard modeling. To analyze the influence of climate and age on mortality, we used the framework of survival analysis (Lawless 2003), the branch of statistics dealing with death in biological organisms, and in particular the concepts of survival and hazard rate functions. The survival function, conventionally denoted by *s*, is defined as *s(t) = 1 − F(t)*, where *F(t) = P(T ≤ t)* is the cumulative distribution function and *T* is the random variable denoting the time of death. The hazard function (also known as force of mortality or hazard rate) is defined as the event rate at time *t*, conditional on survival until time *t* or later: *P(t ≤ T < t + δt*│*T ≥ t)*. Implementing this framework, we estimated an empirical hazard function.
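A minimal sketch of how an empirical discrete hazard and the corresponding survival function can be obtained from daily death counts is given below. It assumes a simple estimator (deaths on a day divided by the number alive at the start of that day); the function name `empirical_hazard` and the counts are hypothetical and do not come from the experiments.

```python
def empirical_hazard(deaths_per_day, n_start):
    """Empirical discrete hazard and survival from daily death counts.

    hazard[j]   = deaths on day j / mosquitoes alive at the start of day j
    survival[j] = product of (1 - hazard) up to and including day j
    """
    at_risk = n_start
    surv = 1.0
    hazard, survival = [], []
    for deaths in deaths_per_day:
        if at_risk == 0:
            break  # nobody left at risk, so the hazard is undefined from here on
        h = deaths / at_risk
        surv *= 1.0 - h
        hazard.append(h)
        survival.append(surv)
        at_risk -= deaths
    return hazard, survival

# Made-up counts for one cage of 40 mosquitoes
hazard, survival = empirical_hazard([0, 4, 6, 10], n_start=40)
```

With these counts, half of the cage is still alive after the fourth day, which matches the direct ratio 20/40.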

Proportional hazards models are a class of survival models. Survival models relate the time that passes before some event occurs to one or more covariates that may be associated with that quantity (Comfort 1979). In a proportional hazards model, the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate. Survival models can be viewed as consisting of two parts: the underlying hazard function, often denoted *h*_{0}*(t*_{j}), describing how the hazard (risk) changes over time at baseline levels of covariates, and the effect parameters, describing how the hazard varies in response to explanatory covariates. The proportional hazards condition (Breslow 1975) states that covariates are multiplicatively related to the hazard. The covariates are not restricted to binary predictors; in the case of a continuous covariate *X*_{i}, the hazard responds exponentially, so that each unit increase in *X*_{i} results in proportional scaling of the hazard. The effect of covariates estimated by any proportional hazards model can thus be reported as hazard ratios. Cox (1972) observed that if the proportional hazards assumption holds (or is assumed to hold), then it is possible to estimate the effect parameters without any consideration of the baseline hazard function. This approach to survival data is called the Cox proportional hazards model, sometimes abbreviated to Cox model or to proportional hazards model.

#### The logistic hazard model

Cox (1972) proposed an extension of the proportional hazards model to discrete time by working with the conditional odds of dying at each time *t*_{j} given survival up to that point. The model is given by:

$$\frac{h_T(t_j \mid X_i)}{1 - h_T(t_j \mid X_i)} = \frac{h_0(t_j)}{1 - h_0(t_j)} \exp\{X_i \beta\}$$

where *h*_{T}(*t*_{j}│*X*_{i}) is the hazard at time *t*_{j} for an individual with covariate values *X*_{i}, *h*_{0}*(t*_{j}) is the baseline hazard at time *t*_{j}, and exp {*X*_{i}β} is the relative risk associated with covariate values *X*_{i}. Taking logs, we obtain a model on the logit (l) of the hazard or conditional probability of dying at *t*_{j} given survival up to that time: *l*[*h*_{T}*(t*_{j}│*X*_{i})] =α_{j}+*X*_{i}β, where α_{j}=*l[h*_{0}*(t*_{j}*)]* is the logit of the baseline hazard and *X*_{i}β is the effect of the covariates on the logit of the hazard. Note that the model essentially treats time as a discrete factor by introducing one parameter α_{j} for each possible time of death *t*_{j}. Interpretation of the parameters β associated with the other covariates follows along the same lines as in the logistic regression. Time-varying covariates and time-dependent effects can be introduced in this model along the same lines as before. In the case of time-varying covariates, note that only the values of the covariates at the discrete times *t*_{1} <*t*_{2} <…<*t*_{j-1} <*t*_{j} are relevant. Time-dependent effects are introduced as interactions between the covariates and the discrete factor (or set of dummy variables) representing time (Therneau and Grambsch 2000).
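Fitting this model with ordinary logistic regression software generally requires one record per individual per day at risk. The sketch below shows such a person-period expansion; the function name `expand_person_period`, the tuple layout and the two example mosquitoes are hypothetical. The response is 1 only on the day of death, and an individual still alive at the end of observation contributes only zeros.

```python
def expand_person_period(lifetimes):
    """Expand lifetimes into one row per individual per day at risk.

    lifetimes: list of (mosquito_id, last_day, died), days counted from 1.
    Returns rows (mosquito_id, day, y) where y = 1 only on the day of death.
    """
    rows = []
    for mosquito_id, last_day, died in lifetimes:
        for day in range(1, last_day + 1):
            y = 1 if (died and day == last_day) else 0
            rows.append((mosquito_id, day, y))
    return rows

# "m1" dies on day 3; "m2" is still alive when observation stops on day 2
rows = expand_person_period([("m1", 3, True), ("m2", 2, False)])
```

Daily weather covariates would then be attached to each row by matching on the day, and the day itself would enter the model as a discrete factor, giving one parameter α_{j} per time point.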

Logistic regression analyses binomially distributed data, where the numbers of Bernoulli trials *n* are known and the probabilities of success *p* are unknown. The model proposes that for each trial there is a set of explanatory variables that might inform the final probability. The model then takes the form:

$$p = E\left(\frac{Y}{n} \,\middle|\, X_1, \ldots, X_k\right)$$

One can transform the output of a linear regression to be suitable for probabilities by using a logit link function. The logit (the natural logarithm of the odds) of the unknown binomial probabilities is modelled as a linear function of the explanatory variables *X*_{1},…,*X*_{k}:

$$\operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k$$

For a real-valued explanatory variable *X*_{1}, the intuition is that a unit additive change in the value of the variable should change the odds by a constant multiplicative amount. The logit function is invertible, so

$$p = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k)\}}$$

The parameters of the model {β_{0},β_{1},…,β_{k}} are estimated from the data by maximum likelihood. Binary logistic regression is thus a useful way to describe the relationship between one or more independent variables and a binary response variable, expressed as a probability. The logistic function is defined as

$$f(z) = \frac{1}{1 + e^{-z}}$$

The input is *z* and the output is *f*(*z*), which is confined to values between 0 and 1; *f*(*z*) represents the probability of a particular outcome, given the set of explanatory variables. The variable *z* is a measure of the total contribution of all the independent variables used in the model and is known as the logit. In this study the variable *z* was defined as *z*=β_{0}+β_{1}*X*_{1}+…+β_{k}*X*_{k}.
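The relationship between the logit, the logistic function and the multiplicative effect on the odds can be checked numerically. The sketch below uses hypothetical coefficients `beta0` and `beta1` for a single predictor; they are illustrative values, not estimates from this study.

```python
import math

def logistic(z):
    """Logistic function f(z) = 1 / (1 + exp(-z)), with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Natural logarithm of the odds; the inverse of the logistic function."""
    return math.log(p / (1.0 - p))

def odds(p):
    return p / (1.0 - p)

beta0, beta1 = -3.0, 0.2  # hypothetical coefficients (assumption)
x = 10.0                  # hypothetical predictor value

p = logistic(beta0 + beta1 * x)               # probability at X1 = x
p_next = logistic(beta0 + beta1 * (x + 1.0))  # probability at X1 = x + 1

# A one-unit increase in X1 multiplies the odds by exp(beta1)
odds_ratio = odds(p_next) / odds(p)
```

Here `logit(p)` recovers *z* = β_{0}+β_{1}*X*_{1}, and `odds_ratio` equals exp(β_{1}) up to rounding, whatever the starting value of `x`.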

From a technical point of view, there is no error term in a logistic regression, unlike in classic linear regression. The logistic regression is useful when we are predicting a binary outcome from a set of continuous predictor variables. It is frequently preferred over discriminant function analysis because of its less restrictive assumptions.

#### The Cox proportional hazard model

Cox proportional-hazards regression allows the effect of several risk factors on survival to be analyzed. The probability of the endpoint (death) is called the hazard. The hazard function for the Cox proportional hazard model is modeled as:

$$h_T(t_j \mid X_i) = h_0(t_j) \exp\{X_i \beta\}$$

where *X*_{i} is a collection of predictor variables and *h*_{0}*(t*_{j}) is the baseline hazard at time *t*_{j}, representing the hazard for a sample unit (mosquito) with the value 0 for all the predictor variables. By dividing both sides of the above equation by *h*_{0}*(t*_{j}) and taking logarithms, we obtain:

$$\ln\left[\frac{h_T(t_j \mid X_i)}{h_0(t_j)}\right] = X_i \beta$$

One calls

$$\frac{h_T(t_j \mid X_i)}{h_0(t_j)} = \exp\{X_i \beta\}$$

the hazard ratio. The coefficients {β_{1},…,β_{k}} are estimated by Cox regression and can be interpreted in a similar manner to those of multiple logistic regression. For a single covariate with coefficient β, the quantity exp{β} is the instantaneous relative risk of an event, at any time, for an individual whose covariate value is one unit higher than that of another individual, given that both individuals are the same on all other covariates.

The Cox proportional hazards regression model assumes that the effects of the predictor variables are constant over time. Furthermore, there should be a linear relationship between the log hazard and the predictor variables. Predictor variables that have a highly skewed distribution may require logarithmic transformation to reduce the effect of extreme values. This model is robust and is a safe choice in many situations. Because of the model form

$$h_T(t_j \mid X_i) = h_0(t_j) \exp\{X_i \beta\}$$

the estimated hazards are always non-negative. Even though *h*_{0}*(t*_{j}) is unspecified, we can estimate {β_{1},…,β_{k}} and thus compute the hazard ratio. The *h*_{T}(*t*_{j}│*X*_{i}) and *S*_{T}(*t*_{j}│*X*_{i}) can be estimated for a Cox model using a minimum of assumptions. In survival analysis, the Cox model is preferred to a logistic model, since the latter ignores survival times.

The proportional hazard model is the most general of the regression models because it is not based on any assumptions concerning the nature or shape of the underlying survival distribution. The model assumes that the underlying hazard rate, rather than survival time, is a function of the independent variables (covariates), and no assumptions are made about the nature or shape of the hazard function. Thus, Cox's regression model may also be considered as a nonparametric method. The model may be written as:

$$h(t \mid X_1, \ldots, X_k) = h_0(t) \exp(\beta_1 X_1 + \ldots + \beta_k X_k)$$

where *h*(*t*) denotes the resultant hazard, given the values of the *k* covariates for the respective case (*X*_{1},…,*X*_{k}) and the respective survival time (*t*). The term *h*_{0}*(t)* is called the baseline hazard, the hazard for the respective individual when all independent variable values are equal to zero. The baseline hazard is an unspecified function that does not depend on *X* but only on *t*. The exponential involves the *X* but not *t*; the *X* are time-independent. As in the logistic hazard model, the unknown parameters {β_{1},…,β_{k}} are estimated by maximizing a likelihood, which for the Cox model is the partial likelihood.

Although the Cox model is non-parametric to the extent that no assumptions are made about the form of the baseline hazard, there are still a number of important issues which need to be assessed before the model results can be safely applied. First, the model specifies a multiplicative relationship between the underlying hazard function and the log-linear function of the covariates. This assumption is also called the proportionality assumption. In practical terms, it is assumed that, given two observations with different values for the independent variables, the ratio of the hazard functions for those two observations does not depend on time. The second assumption is that there is a log-linear relationship between the independent variables and the underlying hazard function.

A hypothesis of the proportional hazard model is that the hazard function for an individual depends on the values of the covariates and the value of the baseline hazard, *h*_{0}*(t)*. Given two individuals with particular values for the covariates, the ratio of the estimated hazards over time will be constant, hence the name of the method: the proportional hazard model. The validity of this hypothesis may often be questionable.
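The constancy of the hazard ratio over time can be illustrated numerically. The sketch below evaluates the Cox hazard for two individuals under a made-up baseline hazard; the values in `baseline`, `beta`, `x_a` and `x_b` are hypothetical and serve only to show that the baseline cancels in the ratio.

```python
import math

def cox_hazard(h0_t, x, beta):
    """Cox hazard h(t | X) = h0(t) * exp(beta_1*X_1 + ... + beta_k*X_k)."""
    return h0_t * math.exp(sum(b * xi for b, xi in zip(beta, x)))

baseline = [0.01, 0.02, 0.05, 0.10]  # hypothetical baseline hazards h0(t_j)
beta = [0.3]                         # hypothetical coefficient
x_a, x_b = [2.0], [1.0]              # two individuals one covariate unit apart

# The baseline hazard cancels, so the ratio is the same at every time point
ratios = [cox_hazard(h0, x_a, beta) / cox_hazard(h0, x_b, beta) for h0 in baseline]
```

Every entry of `ratios` equals exp(0.3) up to rounding, although the baseline hazard itself changes tenfold over the four time points. A ratio that drifts over time in real data would indicate that the proportionality hypothesis is questionable.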

After compiling the data for all individuals, we applied logistic regression to the dataset, using each available climate variable one at a time as a predictor, to analyze their effects separately. We also considered potential delayed effects of predictors by applying the regression to lagged variables (Martinussen and Scheike 2006), with lags ranging from one to five days. Finally, we considered cumulative effects using the sum of the variable over the past two to five days as a predictor.
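The lagged and cumulative predictors can be sketched as follows. The helper names `lagged` and `cumulative` are hypothetical, and the window convention used here (the `window` days ending on the current day) is an assumption, since the exact alignment is not specified above.

```python
def lagged(series, lag):
    """Value observed `lag` days earlier; None where the history is too short."""
    return [series[i - lag] if i >= lag else None for i in range(len(series))]

def cumulative(series, window):
    """Sum over the `window` days ending on the current day (assumed convention);
    None where fewer than `window` days are available."""
    return [sum(series[i - window + 1:i + 1]) if i >= window - 1 else None
            for i in range(len(series))]

# Made-up daily values of one weather variable
daily = [1.0, 2.0, 3.0, 4.0, 5.0]
lag2 = lagged(daily, 2)      # predictor delayed by two days
cum3 = cumulative(daily, 3)  # three-day cumulative predictor
```

Days with `None` have no complete history and would be dropped from the regression for that predictor.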

Following the statistical analysis, we estimated the logistic regression coefficients for the daily weather attributes on the complete sample. For this, we selected the model using the Akaike Information Criterion (AIC) in a stepwise algorithm (Venables and Ripley 2002), where the seasonal effect was statistically significant. It is worthwhile to note that multicollinearity in the logistic regression model (as well as in the Cox model) is a result of strong correlations between explanatory variables. Multicollinearity inflates the variances of the parameter estimates. This may result in wrong signs and magnitudes of the regression coefficient estimates and, consequently, in incorrect conclusions about the relationships between independent and dependent variables. In this study, air temperature and relative humidity were collinear and were therefore combined into a single variable, the saturated vapor pressure deficit.
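As an illustration of how temperature and humidity can be combined into a single deficit variable, the sketch below computes an SVPD in mb from AT and RH. The formula used in the study is not given here; the Tetens-type approximation for saturation vapor pressure and its constants are an assumption, as are the function names.

```python
import math

def saturation_vapor_pressure_mb(temp_c):
    """Saturation vapor pressure in mb from air temperature in deg C.
    Tetens-type approximation (assumed, not taken from the study)."""
    return 6.1078 * math.exp(17.269 * temp_c / (temp_c + 237.3))

def svpd_mb(temp_c, rh_percent):
    """Saturated vapor pressure deficit: saturation pressure minus actual pressure."""
    return saturation_vapor_pressure_mb(temp_c) * (1.0 - rh_percent / 100.0)

# Yearly averages reported for the UFC station: 27 deg C and 78% RH
svpd_mean = svpd_mb(27.0, 78.0)
```

Under this assumed formula, the yearly averages give a deficit of a few mb, of the same order as the seasonal values reported for the station, and the deficit falls to zero at saturation.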