Rift Valley fever (RVF) is a viral disease of animals and humans and a global public health concern due to its ecological plasticity, adaptivity, and potential for spread to countries with a temperate climate. In many places, outbreaks are episodic and linked to climatic, hydrologic, and socioeconomic factors. Although outbreaks of RVF have occurred in Egypt since 1977, attempts to identify risk factors have been limited. Using a statistical learning approach (lasso-regularized generalized linear model), we tested the hypotheses that outbreaks in Egypt are linked to (1) River Nile conditions that create a mosquito vector habitat, (2) entomologic conditions favorable to transmission, (3) socio-economic factors (Islamic festival of Greater Bairam), and (4) recent history of transmission activity. Evidence was found for effects of rainfall and river discharge and recent history of transmission activity. There was no evidence for an effect of Greater Bairam. The model predicted RVF activity correctly in 351 of 358 months (98.0%). This is the first study to statistically identify risk factors for RVF outbreaks in a region of unstable transmission.
Rift Valley fever (RVF) is a vector-borne viral zoonosis that infects livestock and humans (Flick and Bouloy 2005). Economic losses in endemic regions can be great due to death of the host and/or abortion in livestock, particularly cattle, sheep, camels, and goats. Recent epidemics have resulted in tens to hundreds of thousands of human cases, of which <1% are fatal (Flick and Bouloy 2005). Transmission of RVF virus (RVFV) is typically through the bite of an infected mosquito or direct contact with body fluids of infected animals during slaughter. The ecology of RVFV varies greatly among regions. In areas of endemism, like sub-Saharan Africa, primary vectors are biting mosquitoes in the genus Aedes, which may also transmit the virus vertically, enabling persistence during periods of drought (Linthicum et al. 1999, Favier et al. 2006, Métras et al. 2011). In the Horn of Africa, outbreaks are associated with extreme climatic conditions, particularly above average rainfall associated with the warm phase of the El Niño/Southern Oscillation (Linthicum et al. 1999, Linthicum et al. 2007, Anyamba et al. 2009). Ecologic forecasting of outbreaks in endemic regions has been well studied and is widely considered to be effective (Anyamba et al. 2009). By contrast, in North Africa and the Arabian peninsula, including Egypt, Culex spp. are the dominant vectors (Hoogstraal et al. 1979, Meegan et al. 1980, Hanafi et al. 2011). Culex do not transmit transovarially, however, and persistence requires a permissive vertebrate reservoir. In these regions, outbreaks are more likely to be associated with importation of new infections and secondary transmission than with amplification of viruses already circulating (Abdo-Salem et al. 2011). A reliable model to forecast the origin and duration of outbreaks of Rift Valley fever in such regions would be of considerable value to public health planning, the targeting of vector control operations, and reducing economic losses. 
The development of such models is an active area of research.
Although Rift Valley fever has historically been limited to sub-Saharan Africa, introduction to other parts of the world is a concern (EFSA 2005, Linthicum et al. 2007, Hartley et al. 2011). Particularly, sporadic epidemics/epizootics have occurred in Egypt since 1977, when RVFV appeared in southern Egypt and then spread to the Nile Delta causing a severe outbreak in humans (∼200,000 human infections and ∼600 deaths) and livestock (Meegan 1979, Darwish and Hoogstraal 1981). Currently, Egypt represents the northernmost extent of RVFV. Unsurprisingly, transmission of RVFV in Egypt is unstable, giving rise to sporadic outbreaks which presumably occur only when the pathogen is introduced under the right entomologic and hydrologic conditions. Although there have been five major outbreaks of Rift Valley fever in Egypt since 1977 (Table 1), attempts to discern risk factors associated with outbreaks have been limited (Métras et al. 2011). Hypothesized risk factors include hydrologic conditions favorable to the creation of mosquito vector habitat and socio-economic factors favorable to the importation of the pathogen (Hoogstraal et al. 1979, Sellers et al. 1982, Gad et al. 1986), both of which have been identified as risk factors in other regions (Métras et al. 2011, Abdo-Salem et al. 2011). Particularly, since the first two Rift Valley fever epidemics in Egypt coincided with the timing of the Greater Bairam, an Islamic religious festival during which animal sacrifice and feasting are customary, we hypothesized that the timing of the Greater Bairam, when large numbers of livestock are imported from Sudan and sub-Saharan Africa, might be a risk factor (Abdo-Salem et al. 2011). This hypothesis is consistent with a recent study in which chance of introduction of the virus to Yemen was considered greater during festival periods (Abdo-Salem et al. 2011). 
Since importation of livestock for the Greater Bairam occurs in advance of the festival, we supposed that introduction of the virus might lead the Greater Bairam holiday by as much as two months. The presence of entomologic conditions favorable to the transmission of the virus, previously reported to correlate with RVF transmission activity (Kenawy et al. 1987, Gad et al. 1999, Hanafi et al. 2011), was also included as a covariate. Finally, if the virus is either temporarily or permanently persistent, then recent transmission activity should be a good predictor of current transmission intensity. Lagged RVF status was therefore also considered an important candidate predictor.
Table 1. Summary of outbreaks of Rift Valley fever in Egypt, 1977–2011.
Outbreak duration (months)
Primary Epidemiological References
Other African nations reporting RVF in the same year5
The first outbreak occurred from July–December 1977. A second outbreak occurred from July–December 1978. Phylogenetic evidence suggests that these two outbreaks represented a single introduction event (Grobbelaar et al. 2011).
Two peaks of transmission to humans reported (see Figure 3). Outbreak duration calculated as time elapsed between first and final months of transmission activity.
Outbreak confined to domestic ruminants.
Sources: Swanepoel & Coetzer (1994), Fontenille et al. 1998, Thonnon et al. 1999, Dar et al. 2013, Gerdes (2004)
RVF was also reported from Sudan in 1976, the year prior to the Egypt outbreak.
To test these hypotheses and construct a statistical forecasting system, we trained a statistical model to make one-month-ahead predictions of the status of Rift Valley fever in Egypt. We constructed a data set of 24 potential predictors representing a range of social, hydrologic, and entomologic factors. As the number of candidate predictors is of the same order as the number of months during which RVF was active, the associated regression model is poorly conditioned. Therefore, we adopted a regularization procedure (lasso-regularized generalized linear model; Park and Hastie 2007, Friedman et al. 2010) commonly used in statistical learning to select the best subset of predictors. The objectives of our model were (1) to predict the continuation (alternately, termination) of outbreaks, and (2) to predict the onset of new outbreaks. The model was optimized to provide the maximum performance in leave-one-out cross-validation. Although this model generally failed to predict the starting point of an epidemic, the ongoing status of infection-free vs infected state was predicted with 98% accuracy; the month of the termination of outbreaks was predicted correctly in three out of the five outbreaks that occurred during the 30-year study period.
MATERIALS AND METHODS
Two response variables were considered in this study. First, as an aid to short-term forecasting (continuation or termination of outbreak), all months between August, 1975 and July, 2005 (n=360) were scored as RVF+ (evidence of local transmission of Rift Valley fever to either animals or humans as indicated by official authorities and/or published work, usually including Ministry of Health and/or Ministry of Agriculture officials as authors; see particularly Hoogstraal 1978, Hoogstraal et al. 1979, Laughlin et al. 1979, Meegan 1979, Meegan et al. 1980, Darwish and Hoogstraal 1981, Sellers et al. 1982, Arthur et al. 1993, Abd el-Rahim et al. 1997, Gad et al. 1999, Okda et al. 2006, Kamal 2011, Hanafi et al. 2011) or RVF- (absence of evidence of local transmission of Rift Valley fever). Second, as an aid to anticipating new outbreaks (onset), the time series of RVF activity was broken into intervals of consecutive activity, i.e., runs of RVF+ or RVF-. From these intervals we determined whether each month was the start of a new outbreak of Rift Valley fever, i.e., the first month in a series scored RVF+.
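The onset variable described above can be illustrated with a minimal Python sketch. It is not the authors' code (the analysis was performed in R); the function name `onset_flags` and the toy series are hypothetical, and the month preceding the series is assumed to be RVF-.

```python
def onset_flags(status):
    """Return 1 for each month that starts a run of RVF+ months, else 0.

    `status` is a sequence of 0/1 values in time order (1 = RVF+).
    """
    flags = []
    previous = 0  # assumption: the month before the series was RVF-
    for current in status:
        flags.append(1 if current == 1 and previous == 0 else 0)
        previous = current
    return flags


# Toy series (not the study data): two outbreaks lasting 3 and 2 months.
toy_status = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0]
toy_onsets = onset_flags(toy_status)
print(toy_onsets)  # [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
```

Only the first month of each run is flagged, so the onset response has far fewer positive months than the monthly RVF+ response.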
Hydrologic data were obtained from records of the Ministry of Water Resources and Irrigation for a range of locations upstream and downstream of Aswan High Dam in Egypt, where conditions might contain the signal of future outbreak conditions (Figure 1). Together with entomologic and social indicators, these data yielded the following variables:
Mosquito activity: An indicator variable for the seasonal fluctuations in abundance of Cx. pipiens and Cx. antennatus mosquitoes (December-April=0, May-November=1)
Greater Bairam: An indicator variable for the celebration of Greater Bairam
Rift Valley fever: An indicator variable for the detection of RVF transmission in Egypt (used later to construct lagged variables that can be used in prediction)
RVF × Mosquito: An indicator variable taking the value 1 if both Mosquito Activity and Rift Valley fever are 1; otherwise 0
Greater Bairam × Mosquito: An indicator variable taking the value 1 if both Greater Bairam and Mosquito are 1; otherwise 0
Monthly rainfall at Gambeila: Continuous variable in units of mm precipitation month⁻¹ (mean: 125.1; sd: 97.6)
Monthly river discharge at Malakal station: Continuous variable in millions of m³ month⁻¹ (mean: 2584.4; sd: 720.1)
Monthly river discharge at Dongola station: Continuous variable in millions of m³ month⁻¹ (mean: 5775.0; sd: 5719.7)
Natural inflow at Aswan Dam: Continuous variable in millions of m³ month⁻¹ (mean: 6991.9; sd: 5741.3)
Total water arriving at Aswan Dam: Continuous variable in millions of m³ month⁻¹ (mean: 5957.5; sd: 5597.6)
Monthly river discharge at Gaafra: Continuous variable in millions of m³ month⁻¹ (mean: 4821.9; sd: 1460.1)
Monthly river discharge at Damietta: Continuous variable in millions of m³ month⁻¹ (mean: 816.6; sd: 395.2)
Monthly river discharge at Rosetta: Continuous variable in millions of m³ month⁻¹ (mean: 525.9; sd: 515.9)
Average water level at Malakal station: Continuous variable in m (mean: 11.8; sd: 0.8)
Average water level at Dongola station: Continuous variable in m (mean: 10.6; sd: 1.6)
Maximum monthly water level at Lake Nasser: Continuous variable in m (mean: 172.2; sd: 6.2)
Average water level at Gaafra: Continuous variable in m (mean: 82.7; sd: 0.9)
Average water level at Damietta: Continuous variable in m (mean: 13.3; sd: 0.3)
Average water level at Rosetta: Continuous variable in m (mean: 13.1; sd: 0.3)
One record was missing for Monthly Rainfall at Gambeila (July, 2004) and was replaced by the average July rainfall calculated over all other years. To enable evaluation of the relative importance of selected variables, continuous variables were rescaled by dividing each observation by the variable standard deviation. Thus, both continuous and categorical variables were confined to a comparable range. Additional variables were then constructed from this list, including:
RVF-Lag1 and Mosquito-Lag1: One-month lagged variables for RVF status and mosquito density, respectively
GB-Lag1 and GB-Lag2: One- and two-month lagged variables for Greater Bairam holidays
RVF-Lag1 × Mosquito-Lag0: A constructed feature positive for positive RVF status at lag one month and high mosquito density at current time
RVF-Lag1 × Mosquito-Lag1: A constructed feature positive for RVF status at lag one month and high mosquito density at lag one
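The rescaling and the constructed features listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R code: the helper names (`lag`, `interaction`, `scale_by_sd`) and all values are hypothetical, and the sample standard deviation (n-1 denominator) is assumed because the text does not say which form was used.

```python
from statistics import stdev


def lag(series, k):
    """Shift a series k months into the past; the first k entries are None."""
    return [None] * k + list(series[:len(series) - k])


def interaction(a, b):
    """Product of two indicator series; None wherever either input is missing."""
    return [None if x is None or y is None else x * y for x, y in zip(a, b)]


def scale_by_sd(series):
    """Divide each observation by the sample standard deviation (assumed form)."""
    sd = stdev(series)
    return [x / sd for x in series]


# Toy values (not the study data) for six consecutive months.
rvf = [0, 1, 1, 0, 0, 1]
mosquito = [0, 0, 1, 1, 1, 0]
rainfall = [120.0, 80.0, 200.0, 40.0, 10.0, 150.0]

rvf_lag1 = lag(rvf, 1)
mosquito_lag1 = lag(mosquito, 1)
rvf_lag1_x_mosq_lag0 = interaction(rvf_lag1, mosquito)
rvf_lag1_x_mosq_lag1 = interaction(rvf_lag1, mosquito_lag1)
rainfall_scaled = scale_by_sd(rainfall)

print(rvf_lag1_x_mosq_lag0)  # [None, 0, 1, 1, 0, 0]
print(rvf_lag1_x_mosq_lag1)  # [None, 0, 0, 1, 0, 0]
```

Months for which any lagged value is missing (`None` in this sketch) cannot be used for fitting, which is why lagging shortens the usable record.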
We initially studied models estimated using linear discriminant analysis, recursive partitioning, and generalized linear models. In these models, variable selection and model structure were guided by iterative fitting, inspection of coefficients, and significance tests. Ultimately, this approach proved too unwieldy. We therefore turned to a systematic search via lasso-regularized generalized linear modeling (Park and Hastie 2007, Friedman et al. 2010). One interpretation of this estimation procedure is that it is a penalization or “shrinkage” method that trades bias for reduced variance, i.e., the unbiasedness of the maximum likelihood estimates is lost, but the precision of most or all estimated coefficients is improved (Hastie et al. 2009). A desirable consequence of this estimation scheme is that it favors sparse solutions to poorly conditioned regression problems, i.e., coefficients on many variables shrink to zero (Hastie et al. 2009). Since a coefficient of zero implies no effect, this feature may be exploited for variable selection in addition to improving model identifiability.
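For reference, the lasso-penalized binomial objective can be written as below, following the form given by Friedman et al. (2010). The notation is introduced here for illustration and does not appear elsewhere in this article: N is the number of months, y_i is the 0/1 RVF status in month i, x_i is the vector of p predictors, β_0 is the intercept, and λ is the tuning parameter.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Bigl[\,y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr)\Bigr]
+ \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The first term is the negative mean binomial log-likelihood; the second is the penalty that shrinks coefficients toward zero, with larger λ setting more of them exactly to zero.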
Recognizing that hypothesized interactions are already coded in the variables RVF × Mosquito and Greater Bairam × Mosquito, we fit a binomial model with first order terms only to the combined data set consisting of the original and constructed variables. Lasso regularization requires setting a tuning parameter (λ) that governs the severity of the penalty function (Hastie et al. 2009). To optimize the model, we searched for the value of λ that minimized average residual deviance (equivalently, minimized classification error) on the withheld observation by iteratively fitting the model to all points but one (leave-one-out cross-validation; Hastie et al. 2009). Because we explored covariates with time lags up to two months, the number of records available for model fitting was reduced from n=360 to n=358. As predictor variables include lagged covariates, it was impossible to avoid information leakage from test data. Given this and the additional constraints imposed by the relatively small number of observations, we determined that model performance should be reported as the fit of the model to all the training data after optimization using leave-one-out cross-validation rather than by designating a test data set withheld from model fitting. That is, performance estimates cover the entire data set, but each record was predicted by a model fit without that record. For comparison, we also estimated null accuracy, which we define as the accuracy the model would have achieved if we had simply chosen to classify every point according to the majority class (in our case RVF-). Fitting was performed with the ‘glmnet’ package in R (R Development Core Team 2008). Data and source code to perform the analysis may be obtained from the lead author of this article.
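The tuning procedure can be sketched in Python as follows. This is a toy illustration, not the analysis reported here: the study used the ‘glmnet’ package in R, whereas this sketch substitutes a simple proximal-gradient solver written for the purpose. The function names, the fixed step size, the λ grid, and the data are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, amount):
    """Shrink value toward zero by amount (proximal step for the L1 penalty)."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_lasso_logistic(X, y, lam, step=0.5, iterations=500):
    """Proximal-gradient fit of an L1-penalized logistic regression.

    Returns (intercept, coefficients); the intercept is not penalized.
    The fixed step size is an assumption that suits these toy data.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iterations):
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                for j in range(p)]
        b0 -= step * grad0
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return b0, beta


def loo_deviance(X, y, lam):
    """Mean deviance of each record, predicted by a model fit without it."""
    total = 0.0
    for i in range(len(X)):
        b0, beta = fit_lasso_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        prob = sigmoid(b0 + sum(b * x for b, x in zip(beta, X[i])))
        prob = min(max(prob, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += -2.0 * (y[i] * math.log(prob)
                         + (1 - y[i]) * math.log(1.0 - prob))
    return total / len(X)


# Toy data (not the study data): two predictors, ten records.
X_toy = [[0.1, 1.0], [0.3, 0.2], [0.5, 0.9], [0.8, 0.1], [1.2, 0.8],
         [1.5, 0.9], [1.9, 1.1], [2.2, 0.4], [2.6, 0.7], [3.0, 0.2]]
y_toy = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

lambda_grid = [0.5, 0.1, 0.02, 0.005]  # hypothetical grid
cv_scores = {lam: loo_deviance(X_toy, y_toy, lam) for lam in lambda_grid}
best_lambda = min(cv_scores, key=cv_scores.get)
print(best_lambda, cv_scores[best_lambda])
```

Each candidate λ is scored only on records excluded from the corresponding fit, which mirrors the way performance is reported in this study.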
Performance of models
Although the best fit model for predicting the continuation of RVF activity was obtained at λ=0.00729, the range of values for λ that result in nearly equivalent models is broad (Figure 2). This suggests that many variables are nearly equivalent in their predictive ability. The model that minimized average deviance on withheld observations was selected for further analysis. Variables in this model included Monthly rainfall at Gambeila, Monthly discharge at Gaafra, RVF-Lag1 × Mosquito-Lag0, and RVF-Lag1 × Mosquito-Lag1. Performance of this model was high overall, but selective. Specifically, the model predicted RVF activity correctly in 351 of 358 months (98.0% accuracy), achieved with a model specificity of 99.4% and model sensitivity of 84.4%. This performance was achieved mostly by correct prediction of the continuation of outbreaks and inter-outbreak intervals. To illustrate, Figure 3 shows rainfall at Gambeila (the most important hydrologic indicator detected in this analysis) plotted through time with outbreak intervals designated in red. Misclassified points are circled, showing that the start of new outbreaks is considerably more difficult to predict than the continuation or termination of an outbreak. Since both categorical and continuous observations are scaled to a common range, variable importance may be evaluated by inspection of the magnitudes of the fit coefficients. Table 2 shows that importance of RVF-Lag1 × Mosquito-Lag1 exceeds that of the next most important variable (Monthly rainfall at Gambeila) by more than eight-fold. No variables could be identified that predicted the onset of the epidemic, either when all RVF active months were used or when the model was trained to classify only onset months.
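The relationship among the three performance figures can be checked with a short calculation. In the Python sketch below, the four cell counts are hypothetical: they were inferred here so that they reproduce the reported percentages and the total of 351 correct out of 358, and they are not stated in the text.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Assumed counts (inferred, not reported): 32 RVF+ months and 326 RVF- months.
model = classification_metrics(tp=27, fn=5, tn=324, fp=2)
for name, value in model.items():
    print(f"{name}: {100 * value:.1f}%")
# accuracy: 98.0%
# sensitivity: 84.4%
# specificity: 99.4%
```

Under these assumed counts, five of the seven misclassified months are missed RVF+ months, in line with the observation that outbreak onsets are the hardest months to predict.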
Table 2. Variables included in the best fit model in declining order of importance. No other covariates were detected to have any effect. The effect of the regularization parameter on the number of selected coefficients is shown in Figure 2. The theory of glmnet does not provide a method for calculating standard errors on the coefficients.
RVF-Lag1 × Mosquito-Lag0
Monthly rainfall at Gambeila
RVF-Lag1 × Mosquito-Lag1
Monthly river discharge at Gaafra
Comparison with alternative models
The null accuracy, i.e., the accuracy the model would have achieved if we had simply chosen to classify every point according to the majority class, for this model was 91% (sensitivity: 0%, specificity: 100%). Further insight into the parsimony of the best fit model is obtained by comparing it with the simple rule, “If RVF-Lag1=1 then RVF+, otherwise RVF-”. Such a rule of thumb, although superior to the null model, achieves an accuracy of only 97.2% (sensitivity: 84.4%, specificity: 98.5%). These comparisons show the statistically optimized model to be superior to these naïve alternatives.
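The behavior of the two naïve alternatives can be seen on a toy series. The Python sketch below is illustrative only; the series, the function name `confusion`, and the variable names are hypothetical and are not the study data.

```python
def confusion(predicted, observed):
    """Count true/false positives and negatives for paired 0/1 sequences."""
    counts = {"tp": 0, "fn": 0, "tn": 0, "fp": 0}
    for p, o in zip(predicted, observed):
        if o == 1:
            counts["tp" if p == 1 else "fn"] += 1
        else:
            counts["fp" if p == 1 else "tn"] += 1
    return counts


# Toy series (not the study data): two outbreaks lasting 3 and 2 months.
status = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]

observed = status[1:]            # months that have a one-month-lagged value
persistence = status[:-1]        # rule of thumb: predict last month's status
majority = [0] * len(observed)   # null model: always predict RVF-

print(confusion(persistence, observed))  # {'tp': 3, 'fn': 2, 'tn': 4, 'fp': 2}
print(confusion(majority, observed))     # {'tp': 0, 'fn': 5, 'tn': 6, 'fp': 0}
```

The persistence rule necessarily misses the first month of every outbreak and raises one false alarm in the month after each outbreak ends. With five outbreaks, that pattern would give five errors of each kind, which is consistent with the reported sensitivity of 84.4% and specificity of 98.5% if 32 of the 358 months were RVF+ (an inference made here, not a figure stated in the text).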
The development of statistical systems for predicting emerging and re-emerging infectious diseases is an active area of research. Here we have shown that a statistical model can predict the continuation and termination of outbreaks of Rift Valley fever in Egypt, a region where transmission is unstable but periodic introduction and subsequent outbreaks have caused a significant public health and economic burden. Specifically, months were misclassified only seven times in nearly thirty years of forecasts. These findings provide evidence that a statistical system for Rift Valley fever may be capable of accurate forecasting of the duration of outbreaks after they have begun.
We hypothesize that RVF emerges in Egypt when a constellation of interconnected hydrologic, entomologic, and social conditions are met. Due to the different periods of the hydrologic cycle (which determine mosquito abundance) and the Islamic calendar (which determines the dates of the Greater Bairam, the putative condition for importation of the virus, and possibly also increases the population of susceptible livestock thereby boosting amplification), the circumstances enabling the emergence of RVF are infrequently and irregularly met. The predictors identified by our model relate to hydrologic conditions occurring both upstream (rainfall at Gambeila) and downstream (discharge at Gaafra) of Aswan High Dam and their effects on vectors and disease prevalence. Previous studies indicate that the sudden appearance or a surge in quantity of environmental water is associated with outbreaks or resurgence through increasing vector populations (Digoutte and Peters 1989, Mondet et al. 2005). Causes of such increases in water quantities include, inter alia, unusually heavy rainfall (Linthicum et al. 1999), flooding following dam construction (Digoutte and Peters 1989), and rainfall patterns typical of the rainy season (Mondet et al. 2005). Our finding that the continuation of RVF outbreaks is well predicted by river discharge is not surprising as the water budget for much of Egypt, which is heavily irrigated, is dependent on rainfall at upper Nile reaches including Gambeila (Said 1993). Once in Egypt, water discharges downstream to the Nile Valley and Delta depend on normal requirements as well as storage capacity of the Aswan Dam. In cases where water storage capacity reaches critical levels, more water is released to reduce physical pressure on the body of the Dam (Aziz and Sadek 2003). Thus, with above normal rainfall in Africa, Egypt receives higher budgets allowing for more surface water availability downstream.
Linkages between water availability, creation of mosquito breeding habitats, and increased RVF vector species abundance are well documented (Davies et al. 1985, Bicout and Sabatier 2004, Elfadil et al. 2006, Abdo-Salem et al. 2011).
Our demonstration of a strong statistical association between RVF activity and both upstream and downstream hydrological factors in Egypt highlights the importance of hydrology in RVF epidemiology (and probably for other vector-borne diseases, too). RVF virus is present in a variety of ecosystem types, including arid areas (Northern Senegal, Yemen), irrigated areas (Egypt/Aswan, Delta of Senegal River), forest (Madagascar), and wetlands called dambos (Kenya, South Africa; Métras et al. 2011). In Egypt, the 1977 and 1993 RVF outbreaks occurred at the time of the first and second filling of the Aswan High Dam to spillway elevation following high floods in Africa (Jobin 1999). A series of successive high floods occurred in the years 1999–2000, 2000–2001, and 2001–2002 (Amer 2003) which may have contributed to the 2003 RVF epidemic. In all studies analyzed by Métras et al. 2011, extrinsic factors associated with RVF or RVF virus exposures were found to be water-related. However, the mechanism(s) through which hydrological factors influence vector populations and disease events vary considerably among regions because the ecological habits of the vector species, hydrological systems, and water management practices differ. For example, rainfall in Africa is linked to RVF epidemics through inundation of dambos and hatching of vertically infected Aedes spp. females (Linthicum et al. 1999). Here we show that rainfall in Africa is also linked to RVF activities in Egypt, in areas located many miles away through the River Nile hydrological system. Such remote hydrological linkages (ca. 5,500–6,500 km) help to explain how large-scale climate-related events like ENSO lead to vector-borne disease outbreaks (Linthicum et al. 1999, Anyamba et al. 2009).
On the vector side, ongoing work by one of the authors (ANH) demonstrates that the hydrologic control system of Aswan High Dam may contribute to temporal abundance of mosquito vectors. The operation rules governing the discharge of water into Egypt allow the flow of calculated quantities that vary throughout the year according to water requirements and storage capacity. Interestingly, the water discharged downstream increases gradually starting from the months of March-April each year and peaks in September. Maximum density of Culex pipiens and Cx. antennatus, two known vectors of RVF (see Hoogstraal et al. 1979 and Hanafi et al. 2011, respectively), were found to follow such a pattern of increase in water levels by a one-month lag period (ANH, unpublished data). Hence, hydrological monitoring may continue to provide a useful tool for vector and disease surveillance and prediction.
Recent RVF activity is another significant factor identified in our study. Maintenance of the virus during inter-epidemic periods is still an epidemiological mystery, not only in Egypt but also in other areas with a similar epidemiological situation. In areas where floodwater mosquitoes of the genus Aedes exist, vertical transmission through drought-resistant eggs is responsible for both maintenance and emergence of the virus in the system. Our laboratory observations on Aedes (Ochlerotatus) spp. in Egypt indicate that their eggs may resist desiccation and hatch when soaked in water (ANH, unpublished observations). Under laboratory conditions, Ae. caspius has been shown to be capable of transmitting the virus (Turell et al. 1996, Gad et al. 1987). Under field conditions, this species was suggested to play a role in RVF transmission as a bridge vector between humans and bovines (Gad et al. 1999). The ability of Aedine species to vertically transmit RVF virus in Egypt is an important topic for future research.
Although Greater Bairam was not among the significant RVF predictors in this study, we suggest that the contribution of festive occasions to RVF emergence and resurgence should nevertheless remain an open topic for study (compare with Gad et al. 1986, Jobin 1999, and Davies 2006). Particularly, animal importation may be associated with virus introduction/reintroduction, increased virus amplification in sheep populations, increased vector foraging behavior related to increased host availability, and/or higher human exposure. Kamal (2011) suggested that importation of animals infected with RVF from Sudan is among the main causes of resurgence of RVF epizootics in Egypt and that non-compliance with rules requiring that imported live animals be slaughtered upon arrival leads to such animals being mixed with native breeds, creating conditions for spread throughout the country. This provides support to earlier studies implicating animal movement between Sudan and Egypt (whether legal or illegal) in RVF virus introduction (Gad et al. 1986). In fact, the soon-to-be-operational land transportation route between the two countries would facilitate animal movement, thereby increasing the risk of outbreaks.
The onset of RVF outbreaks in areas of unstable transmission is not only influenced by natural and socioeconomic phenomena but is also complicated by a country's political agenda and vigilance of health systems. Due to fears of potential economic losses, countries may prevent or delay the official announcement of an outbreak (Kamal 2011). This situation limits the availability of precise information about outbreak events. For example, Egypt has not officially recognized the RVF epidemic of 2003 (Hanafi et al. 2011, Kamal 2011). Additionally, the onset of some outbreak events in Egypt has been attributed to live animal vaccine (Kamal 2011), although not the first and largest outbreak, which occurred prior to the use of live animal vaccine. It follows that modeling capability to predict the onset of RVF events in Egypt would improve with proper epidemiological information and with alternative modeling approaches. We hypothesize that Rift Valley fever occurs in Egypt when a constellation of entomologic, hydrologic, and social conditions are simultaneously met. Due to the different periods of the hydrologic cycle (which determines mosquito abundance) and the Islamic calendar (which determines the dates of the Greater Bairam, the putative condition for importation of the virus in infected livestock, as well as large numbers of livestock that may accelerate virus amplification), the circumstances enabling the emergence of Rift Valley fever are infrequently and irregularly met. A better understanding of this interaction will be crucial for developing more effective forecasting systems in non-endemic regions.
The failure of this model to predict the onset of outbreaks does not diminish its potential use in anticipating the continuation and termination of outbreaks. We propose that potential improvements to the model might be obtained through the development of mechanistic models for transmission (Favier et al. 2006, Gaff et al. 2007). Specifically, since the hypothesized emergence scenario is specific, one promising approach is to parametrically encode the timing of introduction into a model for transmission, which could then be fit to data such as have been reported here. Such a model would benefit greatly, however, from quantitative information about the relative prevalence of Rift Valley fever or transmission activity.
One potential explanation for the failure of our model to anticipate the onset of outbreaks is that they were caused by live animal vaccine (Kamal 2011), not introduction in livestock, although clearly this explanation would not apply to outbreaks that occurred before the use of vaccines. This would explain why outbreaks do not occur annually when entomologic conditions are met. Of course, importation of susceptible (non-vaccinated) livestock hosts might still be a necessary condition for amplification, in which case a three-fold set of conditions must be met for an outbreak to begin (ecologic conditions favorable to the vector, expectation of Greater Bairam building up levels of susceptible hosts, and reversion of attenuated vaccine-derived virus to an infectious state to initiate the outbreak). Since live-virus vaccines are no longer in use, it is anticipated that future outbreaks are more likely to be sparked by introductions. The very good performance of our model during epidemic periods underscores its usefulness regardless of the mechanism of introduction, an important consideration in regions where transmission is unstable.
The authors thank Chris Cosner and Kenneth Linthicum for comments on an earlier draft of this paper. This study was supported by a grant from the National Institutes of Health (R01GM093345).