An assessment of the probabilistic prediction skill of seasonal temperature extremes over Southern Africa is presented. Verification results are presented for six run-on seasons (September to November, October to December, November to January, December to February, January to March, and February to April) over a 15-year retroactive period. Comparisons are drawn between downscaled seasonal 850 hPa geopotential height field forecasts from a two-tiered system and downscaled height forecasts from a coupled ocean–atmosphere system. The ECHAM4.5 atmospheric general circulation model (GCM) is used in both systems: in the one-tiered system, ECHAM4.5 is directly coupled to the ocean model Modular Ocean Model version three (MOM3), whereas in the two-tiered system, ECHAM4.5 is forced with Van den Dool sea surface temperature (SST) hindcasts. Model output statistics equations are developed using canonical correlation analysis (CCA) to correct for system deficiencies. Probabilistic verification is conducted using the relative operating characteristic (ROC) and reliability diagrams. The coupled model performs best in capturing seasonal maximum temperature extremes. The seasons with the highest ROC scores coincide with the period of highest seasonal temperatures over Southern Africa. The above-normal category of the one-tiered system shows the highest skill in predicting maximum temperature extremes, implying that the coupled model predicts skilfully when there is a high likelihood of extremely high seasonal maximum temperatures during mid to late summer. The downscaled coupled maximum temperature hindcasts are additionally evaluated in terms of their monetary value and their quality for the general public. The seasonal forecast system presented in this study should be able to reduce risk in decision making by the health sector in Southern Africa.
Dramatic improvements in supercomputer power, together with a better understanding of atmospheric processes, have led to substantial improvements in seasonal forecasts over the past decade (Cane et al., 1994; Hastenrath et al., 1995; Barnston and Smith, 1996; Jury, 1996; Mason et al., 1996, 1999; Hunt, 1997; Makarau and Jury, 1997; Mason, 1998; Mattes and Mason, 1998; Jury et al., 1999; Landman and Mason, 1999; Landman and Tennant, 2000; Landman et al., 2001a; Landman and Goddard, 2002). A forecast at any timescale can never be 100% accurate, because inherent internal variability is a key characteristic of the atmosphere (Doblas-Reyes et al., 2000); seasonal climate simulations therefore need to be represented probabilistically (e.g. Mason et al., 1999; Goddard et al., 2001; Goddard and Mason, 2002). General circulation model (GCM) ensembles, if correctly configured, allow such probabilistic forecasts, because ensemble forecasting provides a viable way to approximate the probability spread of atmospheric states (Brankovic and Palmer, 2000; Landman and Beraki, 2012). The prediction skill of these forecasts can also be improved by expanding the understanding of local climate systems and their processes, in this case over Southern Africa (Klopper et al., 1998).
Dynamic numerical models that include the interactions between the atmosphere, ocean, and land should, in theory, provide better seasonal forecasts than purely statistical approaches, because they can handle a vast range of both linear and nonlinear interactions and are likely to be more robust to a changing climate (Barnston et al., 1999). The use of dynamic models in seasonal forecasting has therefore become increasingly popular (Stockdale et al., 1998; Landman et al., 2009). However, even with such improvements, model errors remain a substantial source of problems (Latif et al., 2001; Palmer et al., 2004), and it is still unclear to what degree the current generation of numerical forecast models can challenge and improve upon existing empirical methods in seasonal forecasting (Van Oldenborgh et al., 2004). Barnston et al. (1999) concluded that dynamic models did not capture the 1997/1998 El Niño event, or follow the subsequent La Niña event, better than statistical models (Van Oldenborgh et al., 2004).
GCMs have demonstrated skill at a variety of scales at resolutions of approximately 100–300 km (Landman and Goddard, 2002; Landman and Beraki, 2012; Landman et al., 2012). However, GCMs cannot accurately capture local, smaller-scale features, and a common consequence is the overestimation of certain variables such as rainfall over Southern Africa (Joubert and Hewitson, 1997; Mason and Joubert, 1997; Landman et al., 2009). Rainfall at mid to high latitudes is highly complex to represent and is more often than not poorly predicted (Graham et al., 2000; Goddard and Mason, 2002). Such systematic biases have made statistical recalibration and downscaling of GCM simulations a necessity, particularly over regions such as Southern Africa (Landman and Beraki, 2012). A viable method of correcting these biases is a model output statistics (MOS) approach (e.g. Landman and Goddard, 2002; Landman and Beraki, 2012). Previous research on forecasting seasonal rainfall over Southern Africa with global models downscaled using a MOS approach found that the forecasts performed better during El Niño and La Niña seasons than during neutral years (neither an El Niño nor a La Niña event) or years without a strong El Niño or La Niña influence (Landman and Beraki, 2012).
Seasonal temperature forecasts over Southern Africa, however, have been neglected since the seminal paper by Klopper et al. (1998), which presented a deterministic assessment of forecast skill. A more recent study of temperature variation over Africa shows that temperatures have increased significantly over the continent and that this increase is not attributable solely to variations in the El Niño–Southern Oscillation, but also to other natural climate variability and/or human activity (Collins, 2011). This paper presents a probabilistic assessment of the forecast skill of both minimum and maximum seasonal temperatures using state-of-the-art GCM configurations.
Progress in accurate seasonal sea surface temperature (SST) anomaly prediction has made it possible to produce seasonal-average weather forecasts by using the predicted SSTs to force atmospheric GCMs (Graham et al., 2000; Goddard and Mason, 2002). Such a two-tiered modelling system has been used operationally in South Africa for many years to predict regional seasonal characteristics (e.g. Landman et al., 2001b). The one-tiered system, which followed the two-tiered system, is a fully coupled ocean–atmosphere model (e.g. Stockdale et al., 1998; Saha et al., 2006; Weisheimer et al., 2009). The fundamental feature of coupled models is their ability to describe interactions between the atmosphere and the ocean, and they therefore aim to yield more reliable seasonal forecasts; two-tiered systems exclude this interaction but do include the atmosphere's response to SSTs (Copsey et al., 2006; Troccoli et al., 2008). Coupled ocean–atmosphere models are at the top of the modelling ladder in terms of complexity and computational expense (DeWitt, 2005). In theory, fully coupled models should outperform the two-tiered forecasting system, as they represent reality more accurately and should therefore produce more accurate forecasts (DeWitt, 2005).
Global models are employed in this study to predict seasonal minimum and maximum temperature extremes over Southern Africa during the austral summer period from September through April. Winter has been omitted because of the minimal skill found in austral winter forecasts (Mason et al., 1996; Landman et al., 2012). Moreover, extremes in minimum and maximum temperatures (particularly maximum temperatures) have a direct impact on human health; an application of this study may therefore be anticipating the health hazards associated with extremely hot summer seasons over the region.
Exposure to high temperatures can affect human health in many ways, leading to symptoms such as fatigue, dizziness, and cramps, as well as heat illnesses such as heat exhaustion and heatstroke. High ambient temperatures, including those experienced during a heat wave, have been associated with increases in mortality. These increases are not due only to heatstroke, but also to cardiovascular, cerebrovascular, and respiratory diseases, and have also been observed in all-cause mortality (Smoyer-Tomic and Rainham, 2001; Diaz et al., 2002; Medina-Ramón and Schwartz, 2007; Baccini et al., 2008; Ballester et al., 2011; Rocklöv et al., 2011; Vaneckova et al., 2011). The heat–health relationship varies between locations and populations, and it is therefore critical to develop these relationships from local health and meteorological data. Those most vulnerable to heat have generally been found to be the elderly, people living in urban areas, people with pre-existing cardiovascular and respiratory disease, and those with compromised coping capacities (Semenza et al., 1996; Basu and Samet, 2002; Naughton et al., 2002; Kovats and Hajat, 2008). In addition to increased mortality at higher ambient temperatures, there is some evidence of increases in hospital and emergency admissions for specific heat-related illnesses (such as respiratory diseases), particularly amongst vulnerable groups (Michelozzi et al., 2009; Green et al., 2010; Wichmann et al., 2011). However, there are fewer studies on non-fatal illnesses and their relationship to high ambient temperature, and thus less consensus on the impact.
In a study in London, increases in admissions during periods of high ambient temperature were not of the same magnitude as increases in mortality, which led the authors to suggest that many heat-related deaths may occur before medical attention is received (Kovats et al., 2004). Analyses of the 1995 and 1999 Chicago heat waves found that social isolation, such as living alone, not leaving home every day, and being confined to bed, was among the strongest risk factors for heat-related death (Semenza et al., 1996; Naughton et al., 2002). Thus, to protect public health from high temperatures, it is critical both to empower the public, through education and alerts, to protect their own health effectively, and to increase assistance and prevention measures, focusing particularly on those who are vulnerable and isolated.
Planning measures, such as heat warnings and heat-health plans, have been implemented in many countries to help prevent increases in morbidity and mortality during high-temperature events (e.g. Kalkstein et al., 1996; Smoyer-Tomic and Rainham, 2001; Sheridan and Kalkstein, 2004; Tan et al., 2004; Pascal et al., 2006). Some weather bureaus issue a heat warning when temperatures are forecast to exceed a certain threshold, and some cities have more elaborate intervention plans that are activated by such warnings. For example, in Philadelphia, PA, USA, 10 activities are enacted once the US National Weather Service has issued a warning, ranging from media announcements, to halting suspensions of utility services, to increased emergency medical service staffing (Kalkstein et al., 1996; Sheridan and Kalkstein, 2004). An evaluation of the Philadelphia heat watch/warning system concluded that, for 1995–1998, issuing a warning saved 2.6 lives on average, with high benefits and low costs (Ebi et al., 2004).
After the August 2003 heat wave in France, during which excess mortality reached 14 800 deaths, the French government developed a Heat Health Watch Warning System (Pascal et al., 2006). In July 2006, France experienced another severe, though less intense, heat wave, and the system was used. Although 2100 excess deaths were still recorded for that period, this was approximately one-third of the excess deaths predicted by a health model for the same period (Fouillet et al., 2008). While it is difficult to determine the exact reason for this decrease, it may be due to increased public awareness, preventive measures, and the heat warning system developed by the government (Fouillet et al., 2008). In general, heat warning systems are difficult to evaluate, because many factors may influence changes in mortality. However, as the impact of high temperature on mortality is well documented, and temperatures and heat waves are expected to increase in the future because of climate change, the need for effective heat-health warnings and plans is growing if large public health impacts from rising temperatures are to be reduced.
Most research on increases in mortality at high temperatures has been conducted in temperate regions. There is thus a general lack of knowledge of how populations living in more tropical environments might be affected by increasing temperatures, although the limited research available suggests that increases in temperature do affect mortality rates (Kynast-Wolf et al., 2010; Vaneckova et al., 2011). In Southern Africa, little is known about how public health is affected by high ambient temperatures; only one study, in Cape Town, has investigated the heat–mortality relationship (McMichael et al., 2008). Previous work has focused on occupational health and heat stress, particularly in the mining industry (e.g. Wyndham, 1965; Mathee et al., 2010). Moreover, as the interior regions of Southern Africa are projected to experience temperature increases of as much as 4–6 °C under the A2 emission scenario by the end of the century, the occurrence of heat events is also projected to increase (Engelbrecht and Bopape, 2011).
Increased temperatures will not only affect health directly through more heat-related illnesses; many other health outcomes (e.g. malaria, cholera, dengue fever, and malnutrition) are also influenced by climate factors. Accurate seasonal forecasts of climate variables are therefore needed to plan effectively, in advance, for health impacts. For heat impacts on health, much of the international focus has been on short-term forecasting; however, longer-range forecasting would aid planning (McGregor et al., 2004, 2006). Little research has applied seasonal forecasts of extreme events to health impacts, although seasonal forecasts have been used in developing early warning systems for infectious diseases (e.g. Kuhn et al., 2005; Thomson et al., 2006; Connor et al., 2008; Kelly-Hope and Thomson, 2008). Seasonal forecasts provide information with long lead-times, giving the health community enough time to plan. Myers et al. (2000) suggest that for epidemic forecasting, where health workers must be given enough time to plan for unexpectedly high (or low) numbers of cases, a lead-time of 2–6 months is most useful. Longer lead-times are also required for decisions such as increasing the health care sector budget (Myers et al., 2000). Such seasonal forecasts can then be followed up by shorter-range forecasts, which provide more precise spatial and temporal information for more focused response measures. This combination provides information across decision timescales so that negative health impacts can be prepared for and mitigated effectively.
Adequate forecast skill is critical for developing effective plans on both seasonal and shorter timescales. Coelho and Costa (2010) describe the challenges of integrating seasonal forecasts into applications in areas such as agriculture and health. The first two challenges they identify are achieving adequate skill in seasonal forecasts of the information used in application models, and downscaling these forecasts spatially and temporally. This paper assesses the skill of forecasting extremely high temperatures over Southern Africa on a seasonal timescale.
2.1.1. Temperature data
The temperature data consist of 3-month seasonal averages of 2 m minimum and maximum temperatures from the Climatic Research Unit (CRU) TS3.1 dataset (Mitchell and Jones, 2005; Harris et al., 2013). These are on a high-resolution 0.5° × 0.5° grid and allow variations in climate to be compared with variations in other phenomena (Harris et al., 2013). Data were obtained for the six run-on seasons of interest (September to November, October to December, November to January, December to February, January to March, and February to April). These seasons include the period when Southern Africa is mainly controlled by tropical influences, which makes it a period of relatively high predictability and hence ideal for seasonal predictability studies over the region (Landman and Beraki, 2012; Landman et al., 2012). The data were extracted in a format compatible with the Climate Predictability Tool (CPT). CRU TS3.1 minimum and maximum temperature data are available from January 1901 to June 2009; however, data were extracted only for 1982/1983–2008/2009 because of the limited availability of global model data.
2.1.2. Atmospheric general circulation model data
Comparisons are drawn between the statistically downscaled seasonal forecasts of an atmospheric GCM (AGCM) and the statistically downscaled forecasts of a fully coupled system. The ECHAM4.5 AGCM (Roeckner et al., 1996) is used in both the coupled and two-tiered systems. The AGCM hindcast set, available from January 1957 to November 2012, consists of 24 ensemble members and is obtained by forcing the model with SST anomalies predicted using constructed analogues (Van den Dool, 2007). Three-month averages of 2 m temperature and 850 hPa geopotential height were extracted for the same six run-on seasons (September to November, October to December, November to January, December to February, January to March, and February to April) at lead-times of up to 3 months for each season, for 1982/1983 to 2008/2009, giving 27 years of hindcast data. The largest possible number of years was extracted, as longer archived records improve the chance of developing robust empirical downscaling equations (Landman and Beraki, 2012). These data were acquired from the data library archive of the International Research Institute for Climate and Society (IRI) in a CPT-compatible format, and seasonal means of the model data are used in this study. For this two-tiered system, forecasts are produced near the beginning of the month; for a 1-month lead-time there are therefore roughly 3 weeks between dissemination of the forecast and the start of the forecast season. For example, a 1-month lead-time forecast for December to February is produced in the first week of November, a 2-month lead-time forecast is produced at the beginning of October, and so forth.
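The lead-time convention described above reduces to simple month arithmetic. The Python sketch below encodes it, assuming (as stated for the two-tiered system) that a forecast is issued in the first week of the month lying `lead_months` before the first month of the target season. The function name `issue_month` is hypothetical and is not part of the CPT or the IRI data library.

```python
import calendar

def issue_month(target_start_month: int, lead_months: int) -> int:
    """Return the month (1-12) in whose first week a seasonal forecast is issued.

    Assumes forecasts are produced near the beginning of the month that lies
    `lead_months` months before the first month of the target season.
    """
    if not 1 <= target_start_month <= 12:
        raise ValueError("target_start_month must be in 1..12")
    # Work with 0-based months so that the modulo wraps correctly across years.
    return (target_start_month - 1 - lead_months) % 12 + 1

# December to February (DJF) starts in December (month 12).
for lead in (1, 2, 3):
    print(f"DJF, {lead}-month lead: issued early {calendar.month_name[issue_month(12, lead)]}")
```

For seasons starting in January the issue month falls in the previous calendar year; for example, January to March at a 3-month lead-time is issued in early October.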
2.1.3. Coupled ocean–atmosphere model data
The ocean model, the Geophysical Fluid Dynamics Laboratory (GFDL) Modular Ocean Model version three (MOM3) (Pacanowski and Griffies, 1998), is directly coupled to ECHAM4.5 (DeWitt, 2005) using the Ocean Atmosphere Sea Ice Soil (OASIS) coupling software (Terray et al., 1999) provided by the European Centre for Research and Advanced Training in Scientific Computation (CERFACS). The coupled system has 12 ensemble members. Further details of the coupled model's configuration can be found in DeWitt (2005). The ocean–atmosphere model used in this research is therefore ECHAM4.5–MOM3; data for this fully coupled forecasting system are available only from 1982 until July 2012. The model fields obtained are 2 m temperature and 850 hPa geopotential height. Three-month seasonal averages were extracted for the six run-on seasons (September to November, October to December, November to January, December to February, January to March, and February to April) at lead-times of up to 3 months for each season, from 1982/1983 to 2008/2009. These data were also acquired from the IRI data library archive in a CPT-compatible format. Seasonal means are used for this one-tiered system, whose forecasts are also produced near the beginning of the month, so the same lead-time convention applies as for the two-tiered forecasting system.
The hindcasts from both the one-tiered and two-tiered forecasting systems are statistically downscaled to Southern Africa's seasonal minimum and maximum temperatures for the six run-on seasons (September to November, October to December, November to January, December to February, January to March, and February to April). The forecasts are downscaled from a resolution of approximately 2.5° × 2.5° to 0.5° × 0.5°. Owing to the coarse spatial resolution of coupled models (Palmer et al., 2004), downscaling global model output to a higher resolution is essential to meet the needs of end users and to further improve the forecasts where possible (Landman and Goddard, 2002) by correcting systematic deficiencies in the global models. To address this requirement, MOS equations (Wilks, 2006) are employed, which correct for deficiencies in the global models directly in the regression equations (Landman and Goddard, 2002; Wilks, 2006; Landman and Beraki, 2012). In this MOS approach, predictor values from the global models are used in both the development and the forecast stages (Landman et al., 2012).
Choosing an appropriate predictor variable or model field is very important and requires careful attention (Landman et al., 2012). Raw model temperature forecasts are influenced by topography and other factors such as soil moisture, which are poorly resolved, and therefore may not be good predictors of seasonal temperatures at ground level. Temperature fields may be complex and contain structures on spatial scales well below those resolved by models. However, variables such as the large-scale circulation may be better simulated by models than temperature, and should therefore be considered as predictors in a MOS system to predict seasonal temperatures. In this study, 2 m temperatures were initially used as the predictor fields; however, the large-scale circulation, in the form of 850 hPa geopotential height fields, proved a better predictor and showed more skill in capturing seasonal minimum and maximum temperatures.
Canonical correlation analysis (CCA), available as an option in the CPT, is used to establish the MOS equations (Barnston and Smith, 1996). The CPT was developed at the IRI (http://iri.columbia.edu). The predictor fields from each global model used in the MOS are confined to a domain extending from the equator to 40 °S and from 15 °W to 60 °E. The minimum and maximum temperature data (predictand fields) cover a smaller domain of 12 °S–35 °S and 11 °W–41 °E. The predictor domain is larger than the predictand domain so that surrounding large-scale circulation patterns that could potentially influence the smaller domain are included. The MOS process begins by performing empirical orthogonal function (EOF) analysis on both the predictor and predictand fields (in this case the model-forecast 850 hPa geopotential height fields and the CRU temperature data, respectively), followed by CCA (Landman and Beraki, 2012). The number of EOF and CCA modes to retain is chosen through cross-validated skill sensitivity tests using the CPT's CCA tool (Landman and Beraki, 2012). In this study, four CCA modes were retained.
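The EOF pre-filtering followed by CCA can be sketched in a few lines of NumPy. This is a minimal illustration of the general technique, not the CPT's implementation: predictor and predictand fields are reduced to standardized principal components (PCs), and because each PC set then has identity covariance, the canonical correlations are simply the singular values of the cross-covariance matrix. The numbers of EOF modes (6 and 5) and the synthetic fields are assumptions for illustration only; four CCA modes are kept to match this study. In an operational forecast stage, new predictor fields would be projected with the training-period mean and EOFs.

```python
import numpy as np

def eof_basis(field, n_modes):
    """EOF pre-filter for a (years x gridpoints) field: mean, leading EOFs, PC std devs."""
    mean = field.mean(axis=0)
    _, s, vt = np.linalg.svd(field - mean, full_matrices=False)
    eofs = vt[:n_modes]                                  # (modes x gridpoints)
    pc_std = s[:n_modes] / np.sqrt(field.shape[0] - 1)   # std dev of each PC
    return mean, eofs, pc_std

def standardized_pcs(field, mean, eofs, pc_std):
    """Project anomalies onto the EOFs; PCs have unit variance on training years."""
    return (field - mean) @ eofs.T / pc_std

def pcs_to_field(pcs, mean, eofs, pc_std):
    """Map standardized PCs back to gridpoint values."""
    return (pcs * pc_std) @ eofs + mean

def cca_on_pcs(xp, yp, n_cca):
    """CCA between whitened predictor PCs (xp) and predictand PCs (yp)."""
    cxy = xp.T @ yp / (xp.shape[0] - 1)
    a, r, bt = np.linalg.svd(cxy, full_matrices=False)
    return r[:n_cca], a[:, :n_cca], bt[:n_cca].T      # correlations, x and y weights

def cca_predict(xp_new, r, a, b):
    """MOS-style prediction of standardized predictand PCs from predictor PCs."""
    u = xp_new @ a          # canonical predictor scores
    return (u * r) @ b.T    # expected predictand scores mapped back to PC space

# Synthetic stand-ins: 27 seasons, 200 "height" points, 80 "temperature" points.
rng = np.random.default_rng(0)
signal = rng.standard_normal((27, 2))
x_field = signal @ rng.standard_normal((2, 200)) + 0.5 * rng.standard_normal((27, 200))
y_field = signal @ rng.standard_normal((2, 80)) + 0.5 * rng.standard_normal((27, 80))

xbasis, ybasis = eof_basis(x_field, 6), eof_basis(y_field, 5)
xp, yp = standardized_pcs(x_field, *xbasis), standardized_pcs(y_field, *ybasis)
r, a, b = cca_on_pcs(xp, yp, n_cca=4)
print(np.round(r, 2))   # canonical correlations, largest first
y_hat = pcs_to_field(cca_predict(xp, r, a, b), *ybasis)
```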
Verification is the final stage of analysis and is performed over a test period that is completely distinct from the training period. This minimizes any artificial inflation of forecast skill, because predictions are evaluated against observations without using any information from after the forecast year (Landman et al., 2012). A true operational forecasting environment is thus emulated, in which no information about the upcoming season is available (Landman et al., 2012). This process is known as retroactive forecasting (e.g. Landman and Beraki, 2012). The procedure is similar to that described by Landman et al. (2012), except that minimum and maximum temperatures, rather than rainfall, are predicted. Taking December to February maximum temperatures as an example, the models are first trained with data from 1982/1983 to 1993/1994, providing 12 years of training data. December to February maximum temperatures for the following season, 1994/1995, are then predicted using these trained models. The MOS equations are then retrained using data up to and including 1994/1995 in order to predict the 1995/1996 maximum temperatures. This process is repeated until 2008/2009, resulting in 15 years of independent forecasts.
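The retroactive scheme is an expanding-window loop: for each test season, the statistical model is refitted on all earlier seasons only, and then used to forecast that season. The Python sketch below shows the loop structure with 27 seasons, 12 initial training seasons, and therefore 15 independent forecasts, as in the text. The ordinary least-squares model (`ols_fit`, `ols_predict`) is a hypothetical stand-in for the CCA-based MOS equations retrained in the CPT, and the data are synthetic.

```python
import numpy as np

def retroactive_forecasts(x, y, n_initial, fit, predict):
    """Expanding-window (retroactive) forecasting.

    x: predictors (seasons x features); y: predictands (seasons,).
    For each test season t >= n_initial the model is trained only on
    seasons 0..t-1, so no information from later seasons is used."""
    forecasts = []
    for t in range(n_initial, len(y)):
        model = fit(x[:t], y[:t])                  # training data strictly before t
        forecasts.append(predict(model, x[t:t + 1]))
    return np.concatenate(forecasts, axis=0)

# Hypothetical stand-in for the MOS equations: ordinary least squares with intercept.
def ols_fit(x, y):
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def ols_predict(coef, x):
    return np.column_stack([np.ones(len(x)), x]) @ coef

rng = np.random.default_rng(1)
x = rng.standard_normal((27, 3))                   # 27 seasons, 3 predictor PCs
y = x @ np.array([0.8, -0.3, 0.1]) + 0.3 * rng.standard_normal(27)
fc = retroactive_forecasts(x, y, n_initial=12, fit=ols_fit, predict=ols_predict)
print(fc.shape)  # (15,)
```

Passing the model as `fit`/`predict` callables keeps the leakage-free loop separate from the choice of statistical model.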
The prediction skill of the minimum and maximum temperature forecasts for the six run-on seasons is evaluated by placing the observed and predicted values into three categories: above-normal, near-normal, and below-normal (e.g. Landman et al., 2012). These categories are not equally sized: the above- and below-normal thresholds are the 75th and 25th percentiles, respectively, of the climatological record assessed in this study (i.e. the 27-year climatology) (Landman et al., 2012), so each outer category contains about a quarter of the climatological seasons and the near-normal category about half. This tests the models' ability to predict extremes in seasonal minimum and maximum temperatures.
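The percentile-based categorization can be illustrated with a short NumPy sketch. It assumes NumPy's default linear interpolation for percentiles and treats values falling exactly on a threshold as near-normal; both are assumptions, and the CPT's conventions may differ. With a toy 27-season climatology, the outer categories each receive 7 seasons and the near-normal category 13.

```python
import numpy as np

def categorize(values, climatology):
    """Assign -1 (below-normal), 0 (near-normal) or +1 (above-normal).

    Thresholds are the 25th and 75th percentiles of the climatological
    record; values exactly on a threshold count as near-normal (assumption)."""
    lower, upper = np.percentile(climatology, [25, 75])
    values = np.asarray(values, dtype=float)
    return np.where(values > upper, 1, np.where(values < lower, -1, 0))

clim = np.arange(1.0, 28.0)          # toy 27-season climatology: 1, 2, ..., 27
cats = categorize(clim, clim)
print({c: int(np.sum(cats == c)) for c in (-1, 0, 1)})   # {-1: 7, 0: 13, 1: 7}
```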
Because the seasonal climate is inherently probabilistic, seasonal forecasts need to be verified probabilistically. The key attributes of probabilistic forecasts include reliability, which indicates whether the confidence communicated by the forecasts is appropriate and whether there are systematic biases (Landman et al., 2012); resolution, which shows whether the forecasts contain useful information; discrimination, which indicates whether events are distinguished from non-events; and sharpness, which reflects the confidence level of the forecasts (Wilks, 2006; Troccoli et al., 2008). Forecasts in this study are verified using the relative operating characteristic (ROC) (Mason and Graham, 2002) and the reliability diagram (Hamill, 1997; Wilks, 2006). A high ROC score indicates that the forecast probability of an extreme seasonal minimum or maximum temperature event was generally higher when the event occurred than when it did not. The ROC therefore detects whether a set of forecasts has the attribute of discrimination (Mason and Graham, 2002). If predicted probabilities are consistent with the observed frequencies of an event, the forecasts are considered reliable (Hamill, 1997; Wilks, 2006).
3.1. Retro-active forecast verification
The ROC measures the ability of a forecasting system to discriminate events from non-events; ROC graphs are created by plotting the hit rates of the forecasts against their false-alarm rates (Wilks, 2006). The ROC score is the area beneath the ROC curve and is used as a measure of discrimination between events and non-events. If the ROC score is ≤ 0.5, the forecasts are classified as having no skill, while scores above 0.5 indicate increasing skill, up to perfect discrimination at a ROC score of 1.0 (Landman et al., 2012). The ROC score in this study can be interpreted as the probability that the forecasting system successfully discriminates above- (or below-) normal seasons from other seasons.
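The two readings of the ROC score given above, the area under the hit-rate versus false-alarm-rate curve and the probability of correctly discriminating an event from a non-event, give the same number. The NumPy sketch below computes both on a small made-up example. It is an illustration of the standard definitions rather than the CPT's code, and it assumes that at least one event and one non-event are present.

```python
import numpy as np

def roc_points(probs, events):
    """False-alarm and hit rates for every distinct probability threshold."""
    probs = np.asarray(probs, dtype=float)
    events = np.asarray(events, dtype=bool)
    n_ev, n_non = events.sum(), (~events).sum()
    far, hr = [0.0], [0.0]
    for t in np.unique(probs)[::-1]:           # highest threshold first
        warn = probs >= t                      # event "forecast" if prob >= t
        hr.append(np.sum(warn & events) / n_ev)
        far.append(np.sum(warn & ~events) / n_non)
    return np.array(far), np.array(hr)

def roc_area(far, hr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(far) * (hr[1:] + hr[:-1]) / 2))

def roc_score(probs, events):
    """Probability that a random event got a higher forecast probability
    than a random non-event (ties count one half)."""
    probs = np.asarray(probs, dtype=float)
    events = np.asarray(events, dtype=bool)
    pe = probs[events][:, None]
    pn = probs[~events][None, :]
    return float((np.sum(pe > pn) + 0.5 * np.sum(pe == pn)) / (pe.size * pn.size))

p = [0.6, 0.5, 0.2, 0.4, 0.1, 0.5]   # forecast probabilities of above-normal
o = [1, 1, 0, 1, 0, 0]               # 1 = above-normal season observed
far, hr = roc_points(p, o)
print(round(roc_area(far, hr), 3), round(roc_score(p, o), 3))   # 0.833 0.833
```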
Figures 1(a) and (b) show the ROC scores for the above- and below-normal categories of maximum and minimum temperatures, respectively, for both forecasting systems and for five run-on seasons (October to December, November to January, December to February, January to March, and February to April), calculated over the 15-year retroactive test period. Only five of the six seasons investigated are shown, because spring (September to November) showed little to no skill in capturing either minimum or maximum temperatures and was therefore omitted.
Figure 1(a) shows encouraging skill in predicting seasonal maximum temperature extremes. The coupled model (one-tiered forecasting system) produces the highest overall skill, as shown by the largest areas of dark grey and black in the second and fourth panels of Figure 1(a). Moreover, the seasons with the highest ROC scores are December to February, January to March, and February to April, which coincide with the period of highest seasonal temperatures normally found over Southern Africa. The above-normal category of the one-tiered system shows the highest overall skill, implying that the coupled model is able to predict skilfully when there is a high likelihood of extremely high seasonal maximum temperatures during mid-summer. The coupled system is also useful in predicting the likelihood of extremely high maximum temperatures during the second half of the summer season, that is, January to March. Interestingly, although skill was highest at a 1-month lead-time, it remained high at lead-times of up to 3 months when using the coupled model, particularly for late summer.
Figure 1(b) shows that skill in discriminating seasonal minimum temperature extremes from non-extremes is found only from mid-summer onwards, where ROC scores exceed the 0.5 threshold. Neither the one- nor the two-tiered forecasting system clearly outperformed the other in capturing minimum temperature extremes, with both the above- and below-normal categories performing equally poorly.
Both the one- and two-tiered forecasting systems have been considered for predicting seasonal minimum and maximum temperature extremes; however, the coupled model performs best at predicting seasonal maximum temperature extremes for mid and late summer, and therefore has the most potential to support decisions in the health sector. For these reasons, the remainder of this paper focuses only on the coupled model forecasts and only on the prediction of seasonal maximum temperature extremes.
The ROC score is sometimes criticized as a measure of forecast performance because it is insensitive to reliability (Troccoli et al., 2008); reliability diagrams are therefore also included in this study. Reliability diagrams are created by plotting the observed relative frequencies against the forecast probabilities, and thus show how consistent the two are (Troccoli et al., 2008). The diagonal lines in Figure 2 indicate perfect reliability between observations and forecast probabilities. Reliability diagrams also indicate whether over-forecasting or under-forecasting occurs. When the reliability curve (solid black line: above-normal; dotted black line: below-normal) lies above the diagonal, the event is observed more frequently than forecast, indicating under-forecasting. When the reliability curve lies below the diagonal, the forecast probability exceeds the observed frequency, indicating over-forecasting.
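A reliability diagram is built by binning the forecast probabilities and, within each bin, comparing the mean forecast probability with the observed relative frequency of the event; the bin counts give the accompanying frequency histogram. The NumPy sketch below does this for a tiny made-up example. The number of bins is an assumption, and the CPT may bin differently.

```python
import numpy as np

def reliability_curve(probs, events, n_bins=10):
    """Per bin: mean forecast probability, observed relative frequency, count."""
    probs = np.asarray(probs, dtype=float)
    events = np.asarray(events, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # np.digitize returns 1..n_bins+1; clip so a probability of 1.0 lands in the last bin
    idx = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    mean_fc = np.full(n_bins, np.nan)
    obs_freq = np.full(n_bins, np.nan)
    counts = np.bincount(idx, minlength=n_bins)      # frequency histogram
    for k in range(n_bins):
        in_bin = idx == k
        if in_bin.any():
            mean_fc[k] = probs[in_bin].mean()
            obs_freq[k] = events[in_bin].mean()
    return mean_fc, obs_freq, counts

p = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
o = [0, 1, 1, 1, 0, 0]
mean_fc, obs_freq, counts = reliability_curve(p, o, n_bins=5)
under = obs_freq > mean_fc   # above the diagonal: under-forecasting
over = obs_freq < mean_fc    # below the diagonal: over-forecasting
```

Empty bins are left as NaN, so they compare as False in both `under` and `over`.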
Figure 2 shows three reliability diagrams, with accompanying frequency histograms, of the coupled model for December to February, January to March, and February to April at a lead-time of 2 months. The December to February maximum temperature reliability diagram shows good reliability for the above-normal category: the solid black line follows the diagonal fairly consistently, but drops below it at around the 60% forecast probability threshold, with considerable over-forecasting at the higher forecast probabilities. The January to March and February to April reliability diagrams both indicate very strong reliability for the above-normal category, as the solid black lines follow the diagonal almost exactly, although they tend to over-forecast at very high forecast probabilities. Below-normal seasonal maximum temperature extremes are captured less consistently in all three diagrams, with over-forecasting dominating as the dotted reliability curve lies below the diagonal more often than not.
The January to March season shows the highest reliability overall, indicating a high level of consistency in capturing seasonal maximum temperature extremes in this late summer season. This corresponds well to the high ROC scores found for this season over Southern Africa. These results show that the coupled model has prediction skill for certain configurations and seasons, notably January to March. The remaining questions are how economically valuable these forecasts are, and whether a policymaker or health practitioner can understand and measure that value. These questions are addressed with two further scores: the cumulative profits score and the two-alternative forced choice (2AFC) score.
3.2. Forecast performance for decision makers
The cumulative profit analysis of the coupled model hindcasts presented in this study may be used to communicate the monetary value of the forecasts. The analysis is described in detail by Hagedorn and Smith (2009) and can be summarized as follows. A cumulative profit score is used so that the focus is not solely on the general skill of the forecast but also on aspects of the potential economic value of the forecasting system. It provides an intuitive way to communicate the skill of probabilistic forecasts to experts and, more particularly, to non-experts. This matters because research results need to be presented in a way that end users, including the general public, can use effectively. The score evaluates probabilistic forecasts by expressing their skill as an effective daily interest rate. Capital is invested in a specific probabilistic forecast or forecasting system, and a return on the investment is obtained depending on how well the forecast performs. The more capital that is placed on the outcome that verifies, the higher the profit, compounded through the effective interest rate; a positive return indicates to customers, such as forecast users, that the forecast is worth the money spent on it.
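The sketch below illustrates the idea under stated assumptions. It follows the "weather roulette" formulation of Hagedorn and Smith (2009), assuming three equiprobable (tercile) categories, that all capital is spread across the categories in proportion to the forecast probabilities each season, and that the payout odds are fair with respect to climatology. The function name and its inputs are hypothetical; the exact configuration behind Figure 3 is not specified in this section.

```python
import numpy as np

def weather_roulette(fcst_probs, observed_cat, clim_probs=(1/3, 1/3, 1/3), initial_capital=1.0):
    """Cumulative profit (%) and effective per-forecast interest rate.

    fcst_probs   : (n_seasons, n_categories) forecast probabilities
    observed_cat : index of the category that verified in each season
    clim_probs   : climatological probabilities, which set fair payout odds
    """
    fcst = np.asarray(fcst_probs, dtype=float)
    clim = np.asarray(clim_probs, dtype=float)
    obs = np.asarray(observed_cat, dtype=int)
    # Capital is staked on every category in proportion to its forecast
    # probability; only the stake on the verifying category is paid out, at
    # odds of 1 / climatological probability. The per-season multiplier is:
    returns = fcst[np.arange(len(obs)), obs] / clim[obs]
    capital = initial_capital * np.cumprod(returns)
    cumulative_profit_pct = 100.0 * (capital / initial_capital - 1.0)
    # Effective interest rate: geometric mean of the returns minus one.
    # A zero probability on a verifying category gives log(0) = -inf, i.e. all capital lost.
    effective_rate = np.exp(np.mean(np.log(returns))) - 1.0
    return cumulative_profit_pct, effective_rate
```

A forecast that always issues climatological probabilities returns exactly its stake each season and so earns nothing, whereas one that puts 50% on a category that verifies multiplies the capital by 1.5. Assigning zero probability to a category that then verifies loses all capital, which is one reason this type of score penalizes overconfident forecasts.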
Figure 3 shows the cumulative profits of the three run-on summer seasons from 1995 to 2009. January to March is clearly the season with the highest cumulative profit for a forecast user (around 20%) by the end of the test period. If investment were made in forecasts of seasonal maximum temperature extremes over Southern Africa in January to March, the return on the investment would therefore be the highest, which indicates to a non-expert that the forecast is most skilful in this season. The sharp increase in January to March profits from 2002 may reflect more accurate seasonal forecasts during this period, and the slight drop in 2007 is due to an underestimation of an extremely warm season (cf. Figure 5); the cumulative profits are therefore directly affected by the accuracy of the seasonal forecast. By expressing the skill of the forecasting system in monetary terms, the score gives non-experts more understanding of, and confidence in, the January to March forecasts of seasonal maximum temperature extremes. December to February and February to April produce much smaller returns, with cumulative profits of approximately 5% from 2004 onwards. Nevertheless, the return in these seasons is still positive.
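If the cumulative profit P (in %) in Figure 3 is read as the total relative gain in capital after N hindcast seasons (an assumption about how the figure is scaled), it can be converted into an effective per-season interest rate r, the geometric mean of the individual per-season returns r_t:

```latex
1 + \frac{P}{100} = \prod_{t=1}^{N} \left(1 + r_t\right) = \left(1 + r\right)^{N}
\quad\Longrightarrow\quad
r = \left(1 + \frac{P}{100}\right)^{1/N} - 1
```

For January to March, P ≈ 20 and N = 15 give r = 1.20^{1/15} − 1 ≈ 0.012, about 1.2% per season, while P ≈ 5 gives about 0.33% per season. Small per-forecast rates therefore compound into the larger cumulative figures.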
The final score analysed for these forecasts is the 2AFC score (Mason and Weigel, 2009). This score was chosen because it was designed for administrative purposes: it indicates forecast quality to the general public and communicates changes in forecast quality to officials. It is therefore useful and informative not only to atmospheric scientists but also to a variety of stakeholders.
To calculate the 2AFC score, two forecast-observation pairs are taken and it is determined whether the forecasts can be used to discriminate between the observations. For example, given two seasons, one warmer than the other, the question is whether the warmer season can be identified from the forecasts. The score compares all available sets of two forecast-observation pairs, asks the same question every time, and calculates the proportion of times the question is answered correctly. This proportion is the probability of a correct decision, and the question asked is the 2AFC test. If the observations are distinguishable, the chance of identifying the correct season from unskilful forecasts is 50%; any value above 50% therefore indicates a skilful forecast, as it is better than guessing.
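A minimal sketch of the pairwise procedure just described, for the case of continuous forecasts (e.g. predicted seasonal maximum temperature) and continuous observations. Mason and Weigel (2009) give versions of the score for other forecast and observation types, and which version produced Figure 4 is not stated here, so the function below (a hypothetical name) is illustrative only. Pairs with identical observations are skipped because they cannot be distinguished, and tied forecasts are scored as 0.5, equivalent to guessing.

```python
from itertools import combinations

def two_afc(forecasts, observations):
    """Proportion of distinguishable season pairs whose warmer observed season
    also received the higher forecast (continuous forecasts and observations)."""
    correct = 0.0
    pairs = 0
    for i, j in combinations(range(len(observations)), 2):
        if observations[i] == observations[j]:
            continue  # indistinguishable observations: question cannot be asked
        pairs += 1
        # Orient the pair so that season i is the warmer observed season
        if observations[i] < observations[j]:
            i, j = j, i
        if forecasts[i] > forecasts[j]:
            correct += 1.0   # forecasts identify the warmer season
        elif forecasts[i] == forecasts[j]:
            correct += 0.5   # tied forecasts: equivalent to a coin toss
    if pairs == 0:
        raise ValueError("need at least two distinct observations")
    return correct / pairs
```

Forecasts ranked in the same order as the observations score 1.0, reversed forecasts score 0.0, and constant forecasts score 0.5, the no-skill value.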
Figure 4 shows that the highest 2AFC scores over Southern Africa are found in December to February and January to March. This is the mid to late summer season over the region, which corresponds well to the time when seasonal maximum temperature extremes are at their highest. The north-eastern regions have the highest 2AFC scores, indicating the greatest ability of the forecasts to discriminate between observations there. A large percentage of the forecasts discriminate correctly over Gauteng Province and adjacent regions, one of the most densely populated areas of Southern Africa. The general public can therefore have confidence in the forecasts of seasonal maximum temperature extremes over this region, specifically in December to February and January to March. The south-western parts of Southern Africa tend to have less useful forecasts of seasonal maximum temperature extremes from the forecast system described in this study.
To determine whether the coupled maximum temperature system is practicable, it makes sense to focus on a large metropolis such as Pretoria and assess the predictions for this location over a recent 15 year period; these give a good indication of how the forecast would have performed had it been issued operationally. Pretoria was chosen because it is a large urban centre with a population of approximately 3 million people that could be affected adversely by maximum temperature extremes. It also falls within the high predictability region, so forecasts for it can be made with more confidence. Pretoria is located at 25.7256 °S, 28.2439 °E.
In Figure 5(a), a retroactive deterministic forecast of January to March maximum temperatures for Pretoria, the observations follow the forecasts relatively closely and remain within the upper and lower one standard deviation bounds in 12 of the 15 cases. The most obvious discrepancies between the forecasts and observations in Figure 5(a) occurred in 2000 and 2007, when the observations were much lower and much higher than the forecasts, respectively. Figure 5(b), the probabilistic maximum temperature hindcasts for Pretoria, shows that in 2000 there was an enhanced probability of extremely low maximum temperatures, so the probabilistic forecast captured the possibility of low maximum temperatures better than the deterministic forecast. Similarly, in 2007 the probabilistic forecast captured the higher probability of an extremely high maximum temperature for that January to March season. Knowing the probability that a predicted category will occur adds forecast value (Mason and Graham, 1999), because probabilistic forecasts exhibit considerably higher reliability than the corresponding deterministic forecasts (Murphy, 1998). The 1999 forecast, in contrast, predicted a lower January to March maximum temperature than actually occurred. In fact, 1999 was the third hottest season of the test period, even though it was a La Niña year. During La Niña years Southern Africa tends to be cooler and wetter than usual, and this was reflected in the temperature forecast presented in this study as well as in a rainfall forecast determined previously (Landman et al., 2012), as most seasonal forecast models are strongly influenced by ENSO (Landman and Beraki, 2012).
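This section does not describe how the probabilistic hindcasts in Figure 5(b) were derived from the downscaled output. One common approach, shown here only as an assumed illustration, treats the outcome as normally distributed around the deterministic forecast, with a standard deviation estimated from (for example, cross-validated) hindcast errors, and integrates that distribution beyond the category thresholds. It shows why a probabilistic forecast can still assign noticeable probability to an extreme even when the central deterministic value does not reach it. The function names, thresholds, and input values are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Normal(mu, sigma) distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def category_probabilities(forecast, error_sd, lower, upper):
    """Return (below, middle, above) probabilities, assuming the outcome is
    Normal(forecast, error_sd). `lower` and `upper` are hypothetical category
    thresholds, e.g. climatological percentiles of the observed record."""
    if error_sd <= 0 or lower >= upper:
        raise ValueError("need error_sd > 0 and lower < upper")
    p_below = normal_cdf(lower, forecast, error_sd)
    p_above = 1.0 - normal_cdf(upper, forecast, error_sd)
    return p_below, 1.0 - p_below - p_above, p_above

# Hypothetical values for illustration only (not Pretoria data), in deg C
below, middle, above = category_probabilities(forecast=29.0, error_sd=1.0, lower=28.0, upper=30.0)
```

Here the deterministic value sits in the middle category, yet about 16% probability remains on each extreme category. Under the same Gaussian assumption, and if the envelope in Figure 5(a) reflects the forecast error standard deviation, roughly 68% of outcomes would be expected to fall inside the ±1 standard deviation bounds, so the reported 12 of 15 cases (80%) is in line with that expectation.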
Figures 5(a) and (b) indicate that the deterministic forecast captured the nature of the seasonal maximum temperature extreme better than the probabilistic forecast in only a few cases (e.g. 1995, 2002, 2003, and 2004). Had maximum temperature predictions been produced for a large metropolis such as Pretoria over this recent 15 year period, the results presented here suggest that decision makers would have done better to rely on the probabilistic forecasts than on the deterministic forecasts.
4. Discussion and conclusions
When considering seasonal maximum temperature extremes, considerable skill is found in capturing these extremes, particularly with the coupled model. The 1 month lead-time shows the highest overall skill; however, with the coupled model, seasonal maximum temperature extremes can be predicted up to a 3 month lead-time with almost the same skill as at 1 month. Forecasts of seasonal maximum temperatures can therefore be made skilfully approximately 3 months in advance with the fully coupled model. The above-normal category shows the highest overall skill, implying that the forecast system is able to capture when there is a high likelihood of experiencing high seasonal maximum temperatures in a particular season. The coupled model proves useful in predicting the likelihood of extremely high seasonal maximum temperatures during the second half of summer, from around January to March, with spring and winter showing very little to no skill. The seasons for which the model has the highest overall skill are December to February, January to March, and February to April, which coincide with the peak of the annual cycle of seasonal temperature over Southern Africa. Seasonal minimum temperatures showed very little skill and were therefore excluded from further analysis.
The coupled model also proves the most reliable in predicting seasonal maximum temperature extremes over Southern Africa, with the above-normal category being the most reliable, again implying that with the coupled model it can be said with confidence when there is a high likelihood of extremely high seasonal maximum temperatures in a particular season. The most reliable season was January to March, with close to perfect reliability at a 2 month lead-time; again, January to March coincides with the highest seasonal maximum temperatures over Southern Africa. Most of the forecasts did, however, tend to over-forecast, especially at high forecast probabilities. There was also an apparent discrepancy between the relative operating characteristic (ROC) and two-alternative forced choice (2AFC) scores: the below-normal category performed better according to the 2AFC score, whereas the above-normal category performed better according to the ROC scores. This discrepancy may arise because the ROC scores and reliability diagrams are calculated over the entire Southern African region, whereas the 2AFC score is mapped as a geographical distribution of skill over the chosen domain, which makes the two difficult to compare directly. However, for the Pretoria area, the above-normal category performs best according to both the ROC and the 2AFC scores.
Most forecasts are verified using scientific procedures, whereas less attention is given to scores that may be useful to a non-expert audience such as government officials and the general public (Mason and Weigel, 2009). The scientific verification procedures used in this study were the ROC and reliability diagrams, which showed which forecasting system and which season were most skilful and reliable. In addition, the cumulative profits and 2AFC scores were analysed; these are more intuitive and help end users understand the forecasts. Both scores indicated that January to March would be the most worthwhile season in which to invest in forecasts of seasonal maximum temperature extremes, which agrees well with the ROC and reliability results.
A further analysis over Pretoria showed high predictability of seasonal maximum temperatures at this location when the probabilistic forecast is used. This matters because Pretoria is part of a large urban centre where millions of people live in houses without air-conditioning, making the community potentially vulnerable to extreme temperatures. This paper has shown that a coupled probabilistic forecasting system can predict seasonal maximum temperatures that may adversely affect such a city well in advance, with high skill and confidence, and can provide profitable and useful information to non-experts.
The health industry is directly affected by temperature extremes, because higher maximum temperatures are associated with more health problems. With skilful prediction of maximum temperature extremes, the health sector can be better prepared for the associated health risks, such as heat stress and respiratory problems, and therefore better able to deal with them. It is thus critical for South Africa, and Africa as a whole, to develop Heat Health Watch Warning Systems; this is highlighted as a priority by the South African government in a recently released Climate Change Response White Paper (Government of the Republic of South Africa, 2011).
One key component in developing such a system is the ability to forecast indicators, such as maximum temperature, that can be related to health outcomes. This paper describes the ability to forecast maximum temperatures on a seasonal timescale, a critical first step in developing and refining the required forecasting capability. Such forecasts will be useful for alerting the South African government and public health stakeholders, with a 3 month lead-time, to summers that are expected to have above-normal temperatures. This will allow sufficient time to prepare to enact heat-health alerts and prevention measures. Beyond this, forecasts of critical indicators on a timescale of a few days will be required in order to activate the alerts and prevention plans.
In addition to the ability to forecast key indicators, knowledge of how South Africans respond to heat is required to determine when an alert should be activated, and information on the vulnerability of populations to heat is required to know where, and on which populations, the alert should be focused. The temperature at which health is affected differs across countries and populations (Basu and Samet, 2002; Medina-Ramón and Schwartz, 2007; Baccini et al., 2008; McMichael et al., 2008), as do the best meteorological indicators for predicting health outcomes (Vaneckova et al., 2011). For example, the French system uses a combination of forecast minimum and maximum temperatures, chosen both for its suitability for forecasting and for the indicator's performance in determining excess mortality (Pascal et al., 2006). Indices that combine multiple meteorological parameters (e.g. temperature, relative humidity, and wind speed) have been created to describe how hot it feels. One such index, apparent temperature (AT), has been used by a number of countries to develop heat alerts and has been found to be related to excess mortality (Steadman, 1979; Smoyer-Tomic and Rainham, 2001; Watts and Kalkstein, 2004; Ballester et al., 2011). However, a study in Brisbane, Australia, found that average temperature performed similarly to composite indices such as AT in detecting excess mortality days. There is therefore no agreement on the best meteorological indicator, as it depends strongly on the forecasting ability and on the population's response to heat.
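To make the idea of a composite index concrete, the sketch below computes a simplified shade (non-radiation) form of apparent temperature, as published by the Australian Bureau of Meteorology, which builds on Steadman's work. This particular formulation is an assumption for illustration and is not necessarily the version used in the studies cited above. Inputs are air temperature in °C, relative humidity in %, and 10 m wind speed in m/s; the function names are hypothetical.

```python
import math

def vapour_pressure_hpa(ta_c, rh_pct):
    """Water vapour pressure (hPa) from air temperature (deg C) and relative humidity (%)."""
    return (rh_pct / 100.0) * 6.105 * math.exp(17.27 * ta_c / (237.7 + ta_c))

def apparent_temperature(ta_c, rh_pct, wind_ms):
    """Shade apparent temperature (deg C): humidity raises it, wind lowers it."""
    e = vapour_pressure_hpa(ta_c, rh_pct)
    return ta_c + 0.33 * e - 0.70 * wind_ms - 4.00

at_humid_calm = apparent_temperature(30.0, 50.0, 2.0)  # roughly 31.6 deg C
```

At 30 °C, 50% relative humidity, and a 2 m/s wind the index is about 31.6 °C, whereas the same air temperature in dry, still air gives 26 °C. This illustrates why a combined index and plain temperature can rank the same days differently, and hence why the choice of indicator matters for heat alerts.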
Thus, before a heat-health plan can be developed in South Africa, historical health and meteorological data need to be collected to investigate the relationship between heat and mortality and morbidity across the country. This relationship also needs to be examined for multiple meteorological indicators. The forecasting work in this paper must be continued to investigate, and where needed improve, the ability to forecast these meteorological indicators seamlessly from seasonal to weather timescales. Health alert and prevention plans can then be developed through collaboration between forecasters and health researchers.
In conclusion, one-tiered forecasting systems (coupled models) outperform two-tiered systems in predicting seasonal maximum temperature extremes over Southern Africa. This paper shows that coupled models, when analysed probabilistically, exhibit skill in capturing maximum temperature extremes over Southern Africa in the mid to late summer season. This modelling contribution therefore demonstrates that it is feasible to direct some of the available research and modelling funds and effort towards developing and implementing operational seasonal forecasting systems that incorporate fully coupled models, as well as health alert and prevention strategies.
This work was partly supported financially by the National Research Foundation (NRF) of South Africa, the Applied Centre for Climate and Earth System Science (http://www.access.ac.za), a Council for Scientific and Industrial Research (CSIR) grant, and the Peter Carpenter Scholarship for African Climate Change. The computing time used to produce the retrospective forecasts from IRI was largely provided by grants from the multiagency Climate Simulation Laboratory (CSL) programme, with Dave DeWitt as PI.