A semiparametric multivariate, multisite weather generator with low-frequency variability for use in climate risk assessments


  • Scott Steinschneider,

    Corresponding author
    1. Department of Civil and Environmental Engineering, University of Massachusetts Amherst, Amherst, Massachusetts, USA
    • Corresponding author: S. Steinschneider, Department of Civil and Environmental Engineering, University of Massachusetts Amherst, 130 Natural Resources Rd., Amherst, MA 01002, USA. (scottsteinschneider@gmail.com)

  • Casey Brown

    1. Department of Civil and Environmental Engineering, University of Massachusetts Amherst, Amherst, Massachusetts, USA


[1] A multivariate, multisite daily weather generator is presented for use in decision-centric vulnerability assessments under climate change. The tool is envisioned to be useful for a wide range of socioeconomic and biophysical systems sensitive to different aspects of climate variability and change. The proposed stochastic model has several components, including (1) a wavelet decomposition coupled to an autoregressive model to account for structured, low-frequency climate oscillations, (2) a Markov chain and k-nearest-neighbor (KNN) resampling scheme to simulate spatially distributed, multivariate weather variables over a region, and (3) a quantile mapping procedure to enforce long-term distributional shifts in weather variables that result from prescribed climate changes. The Markov chain is used to better represent wet and dry spell statistics, while the KNN bootstrap resampler preserves the covariance structure between the weather variables and across space. The wavelet-based autoregressive model is applied to annual climate over the region and used to modulate the Markov chain and KNN resampling, embedding appropriate low-frequency structure within the daily weather generation process. Parameters can be altered in any of the components of the proposed model to enable the generation of realistic time series of climate variables that exhibit changes to both lower-order and higher-order statistics at long-term (interannual), mid-term (seasonal), and short-term (daily) timescales. The tool can be coupled with impact models in a bottom-up risk assessment to efficiently and exhaustively explore the potential climate changes under which a system is most vulnerable. An application of the weather generator is presented for the Connecticut River basin to demonstrate the tool's ability to generate a wide range of possible climate sequences over an extensive spatial domain.

1. Introduction

[2] The reluctance of the global community to mitigate greenhouse gas emissions, together with the legacy of past emissions, spurs the need for climate change adaptation. Recently, bottom-up or “decision-centric” approaches to identifying robust climate change adaptations have become more popular in the literature [Jones, 2001; Johnson and Weaver, 2009; Lempert and Groves, 2010; Prudhomme et al., 2010; Wilby and Dessai, 2010; Brown et al., 2011; Brown and Wilby, 2012]. These approaches focus on a system of interest (e.g., agricultural lands, an ecosystem, a reservoir) and systematically identify its vulnerabilities to climate; this contrasts with “scenario-led” methods that limit the analysis to a set of climate model projections that may or may not reveal a system's climate sensitivities. A critical step in decision-centric methods involves testing the performance of a system over a range of plausible climate changes to identify harmful climate states that could cause the system to fail. Because the literature on this topic is relatively young, few tools have been investigated for producing altered climate time series over which to conduct the vulnerability assessment. This study presents a new stochastic weather generator specifically designed to aid these assessments. The model can be used to generate time series of weather expressing various changes in the climate at multiple temporal scales. Such time series may be especially useful for exploring changes that are expected to occur, such as increasing intensity and decreasing frequency of precipitation consistent with the acceleration of the hydrologic cycle, as well as changes to low-frequency climate variability that are not well simulated in current global climate model projections.

[3] Bottom-up or vulnerability-based approaches to climate change adaptation form a relatively new area of research that attempts to appraise possible adaptations of a system to climate stressors by first identifying the climate vulnerabilities of that system over a wide range of potential climate changes. After system vulnerabilities are identified, different adaptation strategies can be evaluated over threatening climate states in order to identify robust adaptation measures. The likelihood of harmful climate conditions can also be assessed using available climate information, including the most up-to-date climate modeling results (e.g., global circulation model (GCM) projections). By detaching the identification of system vulnerabilities from climate projections produced by GCMs, bottom-up approaches differ from more traditional top-down approaches that depend on a limited number of internally consistent climate scenarios to explore the range of potential climate change impacts [Christensen et al., 2004; Wiley and Palmer, 2008]. It has been argued that bottom-up methods are better equipped to provide more decision-relevant information useful in identifying robust adaptation measures under deep future uncertainty [Lempert et al., 1996]. In part, this is because bottom-up approaches can better explore a full range of plausible climate changes, whereas GCM projections provide only a limited view and do not delimit the possible range (although they are often interpreted to do so) [see Stainforth et al., 2007; Deser et al., 2012].

[4] Despite the growing interest in decision-centric approaches, technical methods for actually conducting the vulnerability assessment (i.e., generating perturbed climate sequences over which to test system vulnerability) are relatively underdeveloped. To date, only a handful of methods have been utilized. The most popular approach has been to apply simple change factors to the historic record of precipitation and temperature, effectively testing system sensitivities to mean climate shifts [Johnson and Weaver, 2009; Gober et al., 2010; Lempert and Groves, 2010; Brown et al., 2012]. Other studies have explored more detailed changes, including shifts in intraannual climate [Prudhomme et al., 2010] and higher-order statistics (e.g., variance, serial correlation) of annual hydroclimate data [Moody and Brown, 2013]. While all of these approaches were appropriate for their specific applications, they exhibit limited ability to perturb the entire distribution of climate variables or to alter their behavior at multiple temporal scales. For instance, none of the methods mentioned is equipped to simulate climates exhibiting shifts in both long-term (decadal) precipitation persistence and extreme daily precipitation amounts. Yet both of these changes are possible under climate change [Timmermann et al., 1999; Collins, 2000; Intergovernmental Panel on Climate Change, 2007] and may be important in a climate sensitivity analysis for a particular system (e.g., a reservoir jointly managed for flood risk reduction and water supply). Thus, there is a need for more generalized and comprehensive tools to conduct climate vulnerability assessments for systems sensitive to different climate variables across multiple temporal scales.

[5] We propose stochastic weather generators as one possible tool that can fulfill this need. Stochastic weather generators are computer algorithms that produce long series of synthetic daily weather data. The parameters of the model are conditioned on existing meteorological records to ensure the characteristics of historic weather emerge in the daily stochastic process. Weather generators are a popular tool for extending meteorological records [Richardson, 1985], supplementing weather data in regions of data sparsity [Hutchinson, 1995], disaggregating seasonal hydroclimatic forecasts [Wilks, 2002], and downscaling coarse, long-term climate projections to fine-resolution, daily weather for impact studies [Wilks, 1992; Kilsby et al., 2007; Groves et al., 2008; Fatichi et al., 2011, 2013]. Their use for climate sensitivity analysis of impact models has also been tested, particularly in the agricultural sector [Semenov and Porter, 1995; Mearns et al., 1996; Riha et al., 1996; Dubrovsky et al., 2000; Confalonieri, 2012]. These sensitivity studies systematically change parameters in the model to produce new sequences of weather variables (e.g., precipitation) that exhibit a wide range of changes in their characteristics (e.g., average amount, frequency, intensity, duration). By incrementally manipulating one or more parameters in the model, many climate scenarios can be simulated that exhaustively explore potential futures differing slightly in nuanced climate characteristics, such as the intensity and frequency of daily precipitation, the serial correlation of extreme heat days, or the recurrence of long-term droughts. Previous bottom-up climate impact assessments, which have relied heavily on simple change factors to generate new climate sequences, have not been able to test system vulnerabilities over such a wide range of plausible climate changes. To the authors' knowledge, only one study has used a weather generator to investigate a system's climate sensitivity in the context of a decision-centric climate change analysis [Jones, 2000], and this study only examined changes in mean temperature and precipitation. The potential of weather generators for driving vulnerability assessments in bottom-up climate change studies has not yet been adequately explored, particularly with respect to nuanced aspects of climate variability.

[6] While the use of stochastic weather generators for bottom-up risk assessments is very attractive in theory, there are many challenges that arise in practical application. As mentioned earlier, socioeconomic and biophysical systems are often vulnerable not only to changes in mean climate but also to changes in nuanced climate variability. Therefore, the chosen weather generator should be able to easily perturb any of these climate characteristics, which not all models in the literature can easily accomplish [Wilks and Wilby, 1999]. Additionally, impact models often require sequences of several weather variables at multiple locations that exhibit a realistic covariance structure between variables and across sites. The production of spatially distributed, correlated weather variables continues to challenge certain approaches to stochastic weather generation [Beersma and Buishand, 2003]. Weather variables can also exhibit long-term persistence [Hurst, 1951; Koutsoyiannis, 2003] on timescales up to decades that can significantly impact system performance, requiring that the chosen weather generator be capable of replicating (and possibly altering in a bottom-up analysis) structured low-frequency climate variability.

[7] The literature is rich with examples of stochastic weather generators that can address some subset of the challenges listed above. Both parametric and nonparametric models have been proposed to maintain correlation structures between variables and across sites [Wilks, 1998, 1999; Rajagopalan and Lall, 1999; Buishand and Brandsma, 2001; Wilby et al., 2003; Apipattanavis et al., 2007]. Some have argued that nonparametric models may be more capable than their parametric counterparts of reproducing the spatial covariance structure of multivariate weather variables [Buishand and Brandsma, 2001], but specifying distributional shifts in weather variables is often more straightforward with parametric approaches [Wilks and Wilby, 1999]. Several models have also been proposed to preserve low-frequency variability observed in the historic record [Hansen and Mavromatis, 2001; Dubrovsky et al., 2004; Wang and Nathan, 2007; Chen et al., 2010; Fatichi et al., 2011; Kim et al., 2011], but these approaches have not been generalized to multisite applications. After a substantial literature review, the authors were able to identify only one stochastic weather generator with the ability to specify distributional shifts in weather variables while simultaneously maintaining low-frequency climate variability and intervariable and intersite correlations [Srikanthan and Pegram, 2009], and the simulation of multidecadal climate persistence may still be difficult with this model formulation. In the context of vulnerability-based climate change assessments, a new model is required that can simultaneously simulate weather variables exhibiting accurate correlations between variables and across sites, appropriate long-term persistence at interannual and interdecadal timescales, and shifted distributional characteristics hypothesized under climate change.

[8] This study presents a stochastic weather generator with greater ability to support bottom-up vulnerability assessments under climate change for a wide range of socioeconomic and biophysical systems sensitive to different aspects of climate variability and change. The proposed stochastic model addresses all of the challenges mentioned above with several components, including (1) a wavelet decomposition coupled to an autoregressive model to account for structured, low-frequency climate oscillations, (2) a Markov chain and k-nearest-neighbor (KNN) resampling scheme to simulate spatially distributed, multivariate weather variables over a region, and (3) a quantile mapping procedure to enforce long-term distributional shifts in weather variables under climate change. Parameters that govern each model component can be altered to perturb various statistics of the climate system at different temporal scales. The tool can be coupled with impact models in a decision-centric risk assessment to determine the potential climate changes under which a system is most vulnerable. This allows the analyst to evaluate system performance over a wide range of possible climate changes to identify risk or to investigate specific climate change effects that are of concern (e.g., less frequent but more intense rainfall). An application of the weather generator is presented for the Connecticut River basin to demonstrate the tool's ability to generate a wide range of possible climate sequences over an extensive spatial domain. The remainder of the paper proceeds as follows. The proposed weather generator is presented in section 2. The model is evaluated in section 3, and section 4 demonstrates the ability of the model to produce various climate sequences for use in a bottom-up climate change analysis. The article then concludes with a discussion in section 5.

2. The Weather Generator

[9] A flexible weather generator is desired that can accurately reproduce various characteristics of the historic climate regime while introducing the capacity to alter many of these characteristics in a decision-centric climate change analysis. The model considered in this work couples an autoregressive wavelet decomposition [Kwon et al., 2007] for extracting and simulating low-frequency structure in annual climate with a multivariate weather generator [Apipattanavis et al., 2007] that effectively captures daily weather characteristics, including dry and wet spell statistics, cross correlations between weather variables, and spatial correlations across multiple sites. The two models are linked by conditioning the daily weather generator on simulations of annual climate produced by the autoregressive wavelet decomposition. Time series of weather variables produced by the coupled modeling approach are then altered in a third step used to enforce distributional shifts in the climate. For precipitation, a quantile mapping procedure is utilized to implement this change. Long-term shifts in other variables are enforced using simpler additive and scaling methods. A flow diagram of the overall modeling framework is given in Figure 1. The various submodels and algorithms used are described in detail below.

Figure 1.

Schematic flowchart of the daily weather generation process conditional on annual simulations of climate and subject to postprocess distributional adjustments.

2.1. Wavelet Autoregressive Model for the Preservation of Low-Frequency Structure

[10] Most daily weather generators produce simulations that underestimate interannual variability (the so-called overdispersion problem, in which the observed variance exceeds that of the simulations) and fail to reproduce observed low-frequency persistence. Several studies have proposed methods to correct for overdispersion in weather simulations [Hansen and Mavromatis, 2001; Dubrovsky et al., 2004; Wang and Nathan, 2007; Chen et al., 2010; Fatichi et al., 2011; Kim et al., 2011]. This study utilizes a relatively new approach put forth in Kwon et al. [2007] that extracts low-frequency signals in climate data using wavelet decomposition and then stochastically simulates each signal using autoregressive time series models. By simulating each signal separately, the wavelet autoregressive model (WARM) can better reproduce a time series of climate exhibiting a similar spectral signature to the observed data. In our methodology, the WARM approach is applied to annual, area-averaged precipitation over the region of interest. Each year of generated annual precipitation is then used to inform a single-year simulation of the daily weather generator (described below), embedding appropriate low-frequency structure within the daily weather generation process.

[11] Let $P_{t_a}$ represent a time series of annual, area-averaged precipitation for a region, indexed by year $t_a$. The WARM approach decomposes this series into H orthogonal component series, $z_h$, that represent different low-frequency signals, as well as a residual noise component ε.

$$P_{t_a} = \sum_{h=1}^{H} z_{h,t_a} + \varepsilon_{t_a} \qquad (1)$$

[12] A simulation of $P_{t_a}$ is generated with time series models of each low-frequency component and the residual noise. Following Kwon et al. [2007], we consider linear autoregressive (AR) models for each term:

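$$\begin{aligned} z_{h,t_a} &= \sum_{u=1}^{\rho_h} \phi_{h,u}\, z_{h,t_a-u} + e_{h,t_a}, \qquad h = 1, \dots, H \\ \varepsilon_{t_a} &= \sum_{u=1}^{\rho} \beta_u\, \varepsilon_{t_a-u} + \xi_{t_a} \end{aligned} \qquad (2)$$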

[13] Here, $\rho_h$ is the order of the AR model for the hth low-frequency component, ρ is the model order for the residual noise term, e and ξ are independently and identically distributed white noise processes, and $\phi_{h,u}$ and $\beta_u$ are the AR model coefficients. Wavelet decomposition is used to generate the low-frequency components and residual noise term in equation (2). The wavelet transform is an analysis tool that enables the decomposition of a signal into orthogonal components in both the time and frequency domain [Torrence and Compo, 1998]. In-depth details on the implementation of the wavelet transform and its use in the WARM approach can be found in the supporting information for this article. Time series models can be fit to each low-frequency component and the residual noise term using well-documented model-fitting procedures [Box and Jenkins, 1970]. A simulated time series of annual precipitation, $\hat{P}_{t_a}$, can then be generated by summing the simulations of each component.
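The fitting and summation implied by equations (1) and (2) can be illustrated with a minimal Python sketch. It assumes the wavelet decomposition has already produced the component series $z_h$ and the residual ε (that step is described in the supporting information and is not reproduced here), and it substitutes ordinary least squares with Gaussian white noise for the Box-Jenkins fitting procedures cited above. The function names `fit_ar`, `simulate_ar`, and `simulate_warm` are hypothetical.

```python
import numpy as np


def fit_ar(series, order):
    """Fit an AR(order) model to a mean-removed series by ordinary least squares.

    Returns (mean, coefficients, residual standard deviation); coefficient u
    multiplies the value lagged by u + 1 years.
    """
    x = np.asarray(series, dtype=float)
    mu = x.mean()
    xc = x - mu
    n = len(xc)
    # Column u holds the series lagged by u + 1 steps, aligned with xc[order:].
    X = np.column_stack([xc[order - u - 1:n - u - 1] for u in range(order)])
    y = xc[order:]
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    return mu, coefs, resid.std(ddof=order)


def simulate_ar(mu, coefs, sigma, n_years, rng, burn_in=100):
    """Simulate an AR process driven by Gaussian white noise, discarding a burn-in."""
    order = len(coefs)
    total = n_years + burn_in
    x = np.zeros(total + order)
    noise = rng.normal(0.0, sigma, size=total)
    for t in range(order, total + order):
        # x[t-order:t][::-1] is (x[t-1], ..., x[t-order])
        x[t] = coefs @ x[t - order:t][::-1] + noise[t - order]
    return mu + x[-n_years:]


def simulate_warm(components, residual, orders, noise_order, n_years, seed=0):
    """Sum independent AR simulations of each low-frequency component and the noise term."""
    rng = np.random.default_rng(seed)
    total = np.zeros(n_years)
    for comp, p in zip(components, orders):
        total += simulate_ar(*fit_ar(comp, p), n_years, rng)
    total += simulate_ar(*fit_ar(residual, noise_order), n_years, rng)
    return total
```

Each component is simulated independently, which mirrors the separate treatment of the orthogonal wavelet components; the burn-in discards the start-up transient of each AR recursion.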

[14] The daily weather generator (presented in section 2.2) must be conditioned on the annual climate simulations produced using WARM to embed appropriate low-frequency structure within the daily weather generation process. To achieve this, the WARM simulation is used to generate a new climate data set for each simulation year that is composed of a weighted resampling of historic years. The daily weather generator is then iteratively fit to each new data set for a given simulation year and run for 365 days. The methodology for conditioning the daily weather generator on WARM simulations proceeds as follows:

[15] 1. Generate a simulation of annual precipitation of length $T_a$ using the WARM procedure.

[16] 2. For simulation year $t_a$, calculate the Euclidean distances $d_n = \lvert \hat{P}_{t_a} - P_n \rvert$, $n = 1, \dots, N$, between the WARM-simulated area-averaged precipitation value, $\hat{P}_{t_a}$, and the vector of annual, historic, area-averaged precipitation, $(P_1, \dots, P_N)$, where N is the number of historic years.

[17] 3. Order the distances from smallest to largest and assign weights to the k smallest distances using a discrete kernel function given as

$$W(j) = \frac{1/j}{\sum_{i=1}^{k} 1/i}, \qquad j = 1, \dots, k \qquad (3)$$

[18] Here, j indexes the first k ordered distances $d_j$. These weights, which are greatest for the nearest neighbor and smallest for the kth neighbor, sum to 1 and thus form a discrete probability mass function. We follow the heuristic approach suggested by Lall and Sharma [1996] and set k equal to the square root of the number of years of historic data.

[19] 4. Sample with replacement 100 of the k nearest neighbors based on the kernel weights from step 3. Determine the associated years of the 100 selected neighbors. Gather all of the daily data from the 100 selected years into a new data set to be associated with simulation year $t_a$. We note that data may be repeated in this new data set because years can be sampled more than once.

[20] 5. Build the daily weather generator using this conditional data set and run it for 1 year.

[21] 6. Repeat steps 2–5 for each of the $T_a$ years of the annual WARM simulation.
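Steps 2–4 above can be sketched in a few lines of Python, assuming the historic annual totals are held in a NumPy array. The helpers `kernel_weights`, `resample_years`, and `conditional_dataset`, the dictionary `daily_by_year`, and the rounding of $k = \sqrt{N}$ to the nearest integer are illustrative assumptions, not details taken from the original implementation.

```python
import numpy as np


def kernel_weights(k):
    """Discrete kernel of equation (3): W(j) = (1/j) / sum_{i=1..k} (1/i)."""
    inv = 1.0 / np.arange(1, k + 1)
    return inv / inv.sum()


def resample_years(sim_value, hist_annual, hist_years, n_samples=100, rng=None):
    """Steps 2-4: draw historic years whose annual totals are near one WARM value."""
    rng = np.random.default_rng() if rng is None else rng
    hist_annual = np.asarray(hist_annual, dtype=float)
    k = max(1, int(round(np.sqrt(hist_annual.size))))  # k = sqrt(N), rounded (assumption)
    dist = np.abs(hist_annual - sim_value)               # 1-D Euclidean distance
    nearest = np.argsort(dist)[:k]                       # indices of the k nearest years
    picks = rng.choice(nearest, size=n_samples, replace=True, p=kernel_weights(k))
    return np.asarray(hist_years)[picks]


def conditional_dataset(years, daily_by_year):
    """Step 4: stack the daily records of the sampled years (repeats kept).

    daily_by_year is a hypothetical mapping from year to an array of daily records.
    """
    return np.concatenate([daily_by_year[y] for y in years])
```

Sampling with replacement means a historic year that is very close to the simulated annual total can appear many times in the conditional data set, which is what pulls the daily generator toward that year's behavior.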

2.2. Semiparametric Multivariate and Multisite Weather-Generating Algorithm

[22] The daily weather generation process utilized in this study is based on the methods proposed in Apipattanavis et al. [2007]. That study coupled a Markov chain and KNN resampling scheme to simulate spatially distributed, correlated, multivariate weather variables over a region. The Markov chain is used to better represent wet and dry spell statistics while the KNN bootstrap resampler preserves the covariance structure between the weather variables and across space. Since the details of the method can be found in Apipattanavis et al. [2007], only a brief overview will be provided here.

[23] Assume a simulated daily time series of R weather variables, $\{x_{i,t,l}\}$ for $i = 1, \dots, R$, $t = 1, \dots, T$, and $l = 1, \dots, L$, is desired at L different locations, where $x_{i,t,l}$ represents the ith weather variable (e.g., precipitation) at time t and location l, and T is the length of the simulation. A weather generation scheme is designed to simulate area-averaged weather variables, $\bar{\mathbf{x}}_t = (\bar{x}_{1,t}, \dots, \bar{x}_{R,t})$, that can then be immediately disaggregated to individual locations. The weather generation approach is based on the common practice of first simulating precipitation occurrence, $S_t$, as a chain-dependent process. A three-state (extremely wet ($S_t = 2$), wet ($S_t = 1$), or dry ($S_t = 0$)) Markov chain of order 1 is used to simulate the occurrence of area-averaged precipitation across the L locations. The number of states and chain order can be chosen to maximize performance while maintaining model parsimony using quantitative criteria such as Akaike's information criterion [Akaike, 1974], though this study simply follows the chain structure suggested in Apipattanavis et al. [2007]. Nine transition probabilities ($p_{00}, p_{01}, p_{02}, p_{10}, p_{11}, p_{12}, p_{20}, p_{21}, p_{22}$) for the three-state Markov chain are fit to the area-averaged precipitation occurrence time series by month using the method of maximum likelihood. Here, $p_{ab}$ denotes the probability of precipitation state b occurring, given the occurrence of state a on the previous day. A threshold of 0.3 mm is chosen to distinguish between wet and dry days at the area-averaged scale, while the 80th percentile of area-averaged precipitation (by month) is used as the threshold for extremely wet conditions. Again, these values are taken directly from Apipattanavis et al. [2007].
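A minimal sketch of the occurrence model follows, assuming area-averaged daily precipitation in millimeters and a calendar-month label for each day. Two details are assumptions not stated in the text: the 80th-percentile threshold is computed over all days of the month (dry days included), and each transition is attributed to the month of its second day. For a first-order chain, the maximum likelihood estimates of the transition probabilities reduce to transition counts normalized by row. The function names are hypothetical.

```python
import numpy as np


def classify_states(precip, months, wet_threshold=0.3, extreme_q=80):
    """Area-averaged daily precipitation -> state 0 (dry), 1 (wet), 2 (extremely wet)."""
    precip = np.asarray(precip, dtype=float)
    months = np.asarray(months)
    states = (precip >= wet_threshold).astype(int)
    for m in np.unique(months):
        in_m = months == m
        # Assumption: percentile taken over all days of month m, dry days included.
        thresh = np.percentile(precip[in_m], extreme_q)
        states[in_m & (states == 1) & (precip > thresh)] = 2
    return states


def fit_transitions(states, months, n_states=3):
    """MLE transition matrices by month: p_ab = n_ab / sum_b n_ab."""
    states = np.asarray(states)
    months = np.asarray(months)
    P = {}
    for m in np.unique(months):
        idx = np.flatnonzero(months[1:] == m) + 1  # day t falls in month m
        counts = np.zeros((n_states, n_states))
        np.add.at(counts, (states[idx - 1], states[idx]), 1)
        rows = counts.sum(axis=1, keepdims=True)
        # Assumption: a state never observed in month m gets a uniform row.
        P[int(m)] = np.divide(counts, rows, out=np.full_like(counts, 1.0 / n_states),
                              where=rows > 0)
    return P


def simulate_states(P, months, s0, rng):
    """Simulate the first-order chain, using the transition matrix of each day's month."""
    out = np.empty(len(months), dtype=int)
    out[0] = s0
    for t in range(1, len(months)):
        row = P[int(months[t])][out[t - 1]]
        out[t] = rng.choice(len(row), p=row)
    return out
```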

[24] Area-averaged precipitation occurrence can be simulated from the fitted Markov chain using standard procedures well documented in the previous weather generation literature. After simulating the occurrence of area-averaged precipitation states, a vector of weather variables $\bar{\mathbf{x}}_t$ must be simulated and then disaggregated to each of the L locations. A lag-1 KNN resampling algorithm is used to generate the values for all the weather variables. This algorithm follows a six-step process:

[25] 1. Let $\bar{\mathbf{x}}_{t-1}$ be the vector of area-averaged weather variables already simulated for day t−1. Also assume, without loss of generality, that the Markov chain has simulated day t−1 and day t as wet days.

[26] 2. Partition the historic record to find all pairs of consecutive days in a 7-day window centered on day t (if day t is 15 January, then the window includes all historic days from 12 to 18 January) that have the same sequence of area-averaged precipitation states simulated by the Markov chain for day t−1 and day t (in this case, two wet days in a row). Assume there are Q such pairs, each containing 2 days of area-averaged weather, $\bar{\mathbf{y}}_{q,1}$ and $\bar{\mathbf{y}}_{q,2}$, for $q = 1, \dots, Q$.

[27] 3. Calculate the weighted Euclidean distance, $d_q$, between the simulated, area-averaged vector of weather variables, $\bar{\mathbf{x}}_{t-1}$, and each of the Q vectors of historic, area-averaged variables:

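$$d_q = \left\{ \sum_{i=1}^{R} \left[ w_i \left( \bar{x}_{i,t-1} - \mu_i \right) - w_i \left( \bar{y}_{i,q,1} - \mu_i \right) \right]^2 \right\}^{1/2} \qquad (4)$$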

[28] Here, $\bar{x}_{i,t-1}$ denotes the ith area-averaged weather variable already simulated for time t−1, $\bar{y}_{i,q,1}$ denotes the same area-averaged weather variable on the first day of the qth historic pair sampled in step 2, $\mu_i$ is the mean of the ith area-averaged weather variable across all time steps, and $w_i$ denotes the weight. In this study, each weight $w_i$ is set equal to the inverse of the standard deviation of the ith weather variable, though there are methods in the literature for selecting weights in KNN resampling procedures to produce optimal forecasts [Karlsson and Yakowitz, 1987]. By centering each variable in the distance equation about its mean and dividing by its standard deviation, we standardize values and give near-equal importance to each variable in the nearest-neighbor calculation. Prior to normalization, transformations may be required for non-Gaussian weather variables.

[29] 4. Order the distances $d_q$ from smallest to largest. The k smallest distances are assigned weights using the same discrete kernel function presented in equation (3). Again, we follow the heuristic approach suggested by Lall and Sharma [1996] and set $k = \sqrt{Q}$.

[30] 5. Sample one of the k nearest neighbors based on the weights developed in step 4 and record the historic date associated with that selected neighbor. Then, use the vectors of weather variables observed at each of the L locations on the day following the recorded date (i.e., the second day of the selected historic pair) to simulate the multivariate, multisite weather for day t.

[31] 6. Repeat steps 1–5 for all T days of the simulation.

[32] To begin the algorithm and generate initial values for all weather variables, data for a random day in the starting month of the simulation are selected from the historic record, subject to the day being consistent with the first precipitation state simulated by the Markov chain.
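One pass through steps 2–5 of the KNN algorithm can be sketched as below. The array layout, the circular ±3-day calendar window based on a 365-day year, and the function name `knn_step` are assumptions made for illustration. Because the weights are $w_i = 1/\sigma_i$, the means in equation (4) cancel and the distance reduces to a standardized Euclidean distance, which is what the code computes.

```python
import numpy as np


def knn_step(x_prev, hist_area, hist_sites, hist_states, hist_doy,
             s_prev, s_cur, doy, sd, rng, half_window=3):
    """One lag-1 KNN resampling step (steps 2-5 of the algorithm).

    x_prev      : (R,) area-averaged variables simulated for day t-1
    hist_area   : (N, R) historic area-averaged variables, in date order
    hist_sites  : (N, L, R) historic values at each of the L sites
    hist_states : (N,) historic area-averaged precipitation states (0, 1, 2)
    hist_doy    : (N,) day of year of each historic day
    sd          : (R,) standard deviation of each area-averaged variable
    """
    # Step 2: consecutive historic pairs (d, d+1) whose second day lies within
    # +/- half_window calendar days of day t and whose states match (s_prev, s_cur).
    d = np.arange(len(hist_states) - 1)
    gap = np.abs(hist_doy[d + 1] - doy)
    gap = np.minimum(gap, 365 - gap)  # wrap around the year end (365-day assumption)
    match = (gap <= half_window) & (hist_states[d] == s_prev) & (hist_states[d + 1] == s_cur)
    cand = d[match]
    if cand.size == 0:
        raise ValueError("no historic pairs match this state sequence and window")

    # Step 3: weighted Euclidean distance (eq. 4) with w_i = 1/sd_i; the means cancel.
    dist = np.sqrt((((x_prev - hist_area[cand]) / sd) ** 2).sum(axis=1))

    # Step 4: kernel weights (eq. 3) on the k = sqrt(Q) nearest neighbors.
    k = max(1, int(round(np.sqrt(cand.size))))
    nearest = np.argsort(dist)[:k]
    w = 1.0 / np.arange(1, k + 1)
    w /= w.sum()

    # Step 5: sample one neighbor and return the day after it (area average and all sites).
    chosen = cand[nearest[rng.choice(k, p=w)]]
    return hist_area[chosen + 1], hist_sites[chosen + 1]
```

In a full simulation this function would be called once per day, feeding the returned area-averaged vector back in as `x_prev`; the returned site values are the disaggregated output for day t, so intersite and intervariable covariance comes directly from the resampled historic day.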

2.3. Quantile Mapping Technique to Enforce Long-Term Climate Changes

[33] Using only the coupled models of sections 2.1 and 2.2, it is not possible to generate weather outside the range of historic variability, nor is it possible to change the distribution of those variables. In the context of a vulnerability assessment, this capability is critically important, particularly for precipitation, which often dominates system performance. The approach developed here incorporates a quantile mapping method to alter the distribution of daily precipitation. Alterations to other weather variables are treated more simply using standard additive or multiplicative factors.

[34] Let $x_{m,l,t}$ be the daily, nonzero precipitation values for month m and location l simulated by the daily weather generator. Assume the simulated precipitation amounts can be modeled by a theoretical cumulative distribution function (cdf) $F(x;\eta)$ with parameters η. A “target” cdf, $F^*(x;\eta^*)$, is introduced that represents the projected distribution of future precipitation under a climate change. For simplicity, we assume that $F$ and $F^*$ belong to the same distribution family but differ in their parameter sets, η and η*. The parameter set η* can be altered to control how the distribution of future precipitation differs from the historic observations. Many changes in precipitation characteristics can be imposed through adjustments to η*, including shifts in the mean, standard deviation, or extremes. For example, assume historic and projected precipitation for month m follow two-parameter Gamma distributions with shape and scale parameters η = {κ, θ} and η* = {κ*, θ*}. The parameter set η can be estimated by fitting a Gamma distribution to $x_{m,l,t}$. Then, a new mean $\mu^*$ and variance $\sigma^{*2}$ can be specified for the target Gamma distribution, and the parameter set η* can be inferred from the relationships between the parameters and the first two moments, $\mu^* = \kappa^*\theta^*$ and $\sigma^{*2} = \kappa^*\theta^{*2}$. If changes in the first two moments do not sufficiently account for particular shifts in higher-order statistics that are of interest, the target parameter set η* can be further tailored to better impose this change. Once the parameter set η* of the target distribution is specified, a quantile mapping procedure can be used to alter the distribution $F$ of simulated nonzero precipitation to match that specified by $F^*$ (Figure 2). To do this, we first determine the cumulative (non-exceedance) probability of the tth value of synthesized precipitation for month m, $x_{m,l,t}$, from the cdf $F$. Then, the target cdf $F^*$ is used to map this probability to a new precipitation amount, $x^*_{m,l,t}$, that is consistent with the specified distribution for climate-altered monthly precipitation:

$$x^*_{m,l,t} = F^{*-1}\!\left( F\left(x_{m,l,t};\eta\right); \eta^* \right) \qquad (5)$$

Figure 2.

The quantile mapping procedure to adjust daily, nonzero precipitation values. (a) A sample of an original time series of April precipitation simulated by the weather generator. The blue point represents a sample precipitation value to be adjusted. (b) The cdf of the gamma distribution fitted to the original simulation of April precipitation (black), as well as the target cdf used to make the adjustments (red). (c) The rectangle delimits an inset, shown in detail. Here, the precipitation value represented by the blue point in Figure 2a is mapped to a new precipitation value via four steps. (d) The new, adjusted precipitation time series, including the adjusted point (blue), is shown.

[35] This procedure is repeated for each nonzero precipitation amount synthesized by the weather generator.
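The Gamma example can be sketched with `scipy.stats.gamma` and its `fit`, `cdf`, and `ppf` methods. Expressing the target moments as multiplicative factors `mean_change` and `var_change` on the fitted moments is a hypothetical parameterization chosen for illustration, as are the function names; the clipping of probabilities away from 0 and 1 is a numerical safeguard, not part of the published procedure.

```python
import numpy as np
from scipy import stats


def target_gamma(mean_star, var_star):
    """Invert mean = kappa*theta and var = kappa*theta**2 for (kappa*, theta*)."""
    theta_star = var_star / mean_star
    kappa_star = mean_star / theta_star
    return kappa_star, theta_star


def quantile_map(wet_values, mean_change=1.0, var_change=1.0):
    """Equation (5) applied to the nonzero values of one month and location."""
    x = np.asarray(wet_values, dtype=float)
    # Gamma fitted to the simulated values, with the location parameter fixed at zero.
    kappa, _, theta = stats.gamma.fit(x, floc=0)
    mean, var = kappa * theta, kappa * theta ** 2
    kappa_s, theta_s = target_gamma(mean * mean_change, var * var_change)
    # Cumulative probability under F, kept strictly inside (0, 1) so ppf stays finite.
    u = np.clip(stats.gamma.cdf(x, kappa, scale=theta), 1e-12, 1 - 1e-12)
    # Same probability mapped through the inverse target cdf F*.
    return stats.gamma.ppf(u, kappa_s, scale=theta_s)
```

For instance, `quantile_map(x, mean_change=1.1, var_change=1.3)` would impose a 10% increase in the mean and a 30% increase in the variance of nonzero daily amounts. Because the mapping is monotone, the rank order of the simulated values, and hence the timing of wet-day extremes, is preserved.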

3. Model Evaluation

[36] To evaluate the performance of the proposed weather generator, we apply it to daily weather data distributed across the Connecticut River basin in the New England region of the United States. Daily precipitation and maximum and minimum temperature are the variables included in the analysis. The data are available from 1 January 1949 to 31 December 2010 as gridded observations with a spatial resolution of approximately 144 km² [Maurer et al., 2002]. The Connecticut River basin drains over 31,000 km² and contains 260 grid cells, enabling an evaluation of the multisite performance of the approach. Because the spatial extent of the application is large, adequate performance at this scale supports the use of the model for vulnerability assessments of large, spatially expansive systems. For evaluation, the model is run 50 separate times, each 62 years long (the length of the historic record). We examine the reproduction of multiple characteristics of each weather variable at several different timescales.

[37] Figure 3 shows the mean, standard deviation, and skew of nonzero daily precipitation amounts, daily maximum temperature, and daily minimum temperature for all combinations of months and grid cells. The median values of these statistics are taken over the 50 different simulations for comparison against the historic statistics. The results suggest good performance for all variables and statistics except for the skew of daily precipitation, which tends to be underestimated in the simulations for some grid cells.
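The comparison in Figure 3 can be reproduced per grid cell with a short sketch like the one below, assuming each simulation is a daily series sharing one calendar of month labels. The helper names are hypothetical, and `scipy.stats.skew` (its default, biased estimator) stands in for whichever skewness estimator was actually used.

```python
import numpy as np
from scipy import stats


def monthly_stats(values, months, nonzero=False):
    """Mean, standard deviation, and skew of a daily series for each calendar month."""
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    out = {}
    for m in np.unique(months):
        v = values[months == m]
        if nonzero:  # e.g., precipitation: keep wet-day amounts only
            v = v[v > 0]
        out[int(m)] = (v.mean(), v.std(ddof=1), stats.skew(v))
    return out


def median_over_simulations(sims, months, nonzero=False):
    """Median of each monthly statistic across simulations sharing one calendar."""
    per_sim = [monthly_stats(s, months, nonzero) for s in sims]
    return {m: tuple(float(np.median([p[m][j] for p in per_sim])) for j in range(3))
            for m in per_sim[0]}
```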

Figure 3.

Daily performance statistics for all grid cells and months, including the mean, standard deviation, and skew of precipitation, maximum temperature, and minimum temperature. Median values across the 50 different simulations are shown against the observed values.

[38] Correlations of a given variable across sites and cross correlations between different variables at a given site are shown in Figure 4. Again, median values across the 50 simulations are shown. Both types of correlation are very well preserved, as expected given the resampling techniques used to generate the daily weather sequences. The simulations also capture the average number of dry and wet days across all sites and months rather well (Figure 5). The average lengths of wet and dry spells are slightly underestimated, particularly for grid cells with longer spells, but the bias is less than a day.
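The Figure 5 statistics are run-length summaries of a wet/dry indicator series. The sketch below computes them for one site. `spell_stats` and its `wet_threshold` argument are hypothetical: the wet-day threshold used in the study is not given here, so a day is treated as wet when precipitation exceeds the threshold, with a default of 0. Spells cut off by the start or end of the record are counted as ordinary spells.

```python
from itertools import groupby


def spell_stats(precip, wet_threshold=0.0):
    """Wet/dry day counts and mean spell lengths for one site's daily precipitation series."""
    wet = [p > wet_threshold for p in precip]
    spells = [(state, sum(1 for _ in run)) for state, run in groupby(wet)]
    wet_spells = [n for state, n in spells if state]
    dry_spells = [n for state, n in spells if not state]

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return {
        "wet_days": sum(wet),
        "dry_days": len(wet) - sum(wet),
        "mean_wet_spell": mean(wet_spells),
        "mean_dry_spell": mean(dry_spells),
    }
```

For the series `[0, 1.2, 3.0, 0, 0, 0, 5.0]`, the wet spells have lengths 2 and 1 and the dry spells have lengths 1 and 3, so the mean wet and dry spell lengths are 1.5 and 2.0 days.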

Figure 4.

Intersite correlations for daily precipitation, maximum temperature, and minimum temperature, as well as cross correlations between each pair of variables. Median values across the 50 different simulations are shown against the observed values for all grid cells. Correlations are taken across the entire simulation/observed record.

Figure 5.

Average number of dry and wet days per month, as well as the average dry and wet spell length per month, across all grid cells. Median values across the 50 different simulations are shown against the observed values.

[39] The spread of lag-1 autocorrelations across the 50 simulations is shown in Figure 6. For each variable, the distribution is shown for the lag-1 autocorrelation averaged across all sites. There is a slight negative bias in the lag-1 autocorrelations for daily precipitation. The simulations also consistently underestimate the autocorrelation in the temperature fields, although this bias is likewise small in magnitude. The slight underestimation of serial correlation for all variables could likely be reduced by increasing the order of the Markov chain, but no such correction was made here.
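Each point in the Figure 6 distributions is a site-averaged lag-1 autocorrelation from one simulation. The sketch below uses the standard estimator, which divides the lag-1 cross-product sum by the total sum of squares about the series mean. The function names are hypothetical, and the study may have used a slightly different normalization.

```python
from statistics import fmean


def lag1_autocorr(xs):
    """Lag-1 autocorrelation: lagged cross-products over the total sum of squares."""
    m = fmean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs[:-1], xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


def site_average_lag1(series_by_site):
    """Average the lag-1 autocorrelation over all sites (one value per simulation)."""
    return fmean(lag1_autocorr(s) for s in series_by_site)
```

Applying `site_average_lag1` to each of the 50 simulations produces the ensemble distribution, which is then compared with the single value computed from the observations.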

Figure 6.

Distributions of lag-1 serial correlation values for precipitation and maximum and minimum temperature across the 50 model simulations. The average serial correlation across all grid cells is shown. Observed values are shown by the red triangles. All serial correlations are taken across the entire simulation/observed record.

[40] To explore the reproduction of extremes, Figure 7 shows the distributions of the 10- and 20-year annual maximum precipitation events, as well as the average number of extreme heat days, across the 50 simulations. The precipitation extreme value estimates were developed for each grid cell by fitting a Generalized Extreme Value (GEV) distribution to the time series of annual maximum precipitation at that location. The temperature extremes were taken as the average number of days per year above 32°C. The distributions of these statistics, averaged across all locations, are shown for the ensemble of 50 simulations. The model tends to underestimate the magnitude of extreme rainfall events, although the spread of simulations contains the observed value for the 10-year event and nearly reaches the observed value for the 20-year event. For temperature extremes, the model again shows a slight negative bias, although the range of simulations does contain the observed value. Overall, there is a moderate negative bias in the extremes, an effect that often emerges in weather generators that rely on data resampling [Lee et al., 2012].
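Once a GEV has been fitted to a grid cell's annual maxima, the T-year event is the quantile at nonexceedance probability 1 − 1/T. The sketch below evaluates that closed-form quantile for given location, scale, and shape parameters. It does not include a fitting step, and the parameter values in the usage note are illustrative only. The shape sign follows the convention F(x) = exp(−[1 + ξ(x − μ)/σ]^(−1/ξ)). Note that `scipy.stats.genextreme` uses the opposite sign for its shape parameter c. A heat-day counter for the 32°C threshold is included. All function names are hypothetical.

```python
import math
from statistics import fmean


def gev_return_level(T, loc, scale, shape):
    """T-year return level of a GEV with F(x) = exp(-[1 + shape*(x - loc)/scale]^(-1/shape))."""
    y = -math.log(1.0 - 1.0 / T)  # -ln(nonexceedance probability)
    if abs(shape) < 1e-12:  # Gumbel limit as shape -> 0
        return loc - scale * math.log(y)
    return loc + scale / shape * (y ** (-shape) - 1.0)


def mean_heat_days_per_year(tmax_by_year, threshold=32.0):
    """Average count of days per year with maximum temperature strictly above threshold (deg C)."""
    return fmean(sum(t > threshold for t in year) for year in tmax_by_year)
```

For a standard Gumbel distribution (location 0, scale 1, shape 0), `gev_return_level(10, 0.0, 1.0, 0.0)` is about 2.2504, and the 20-year level is larger.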

Figure 7.

Distributions of the 10- and 20-year precipitation events, as well as the average number of extreme heat days per year (>32°C), across the 50 model simulations. The average of each extreme statistic across all grid cells is shown. Estimates from the observed data are shown by the red triangles. All precipitation extreme value estimates are derived from a fitted GEV distribution.

[41] Statistical comparisons for annual precipitation totals and temperature averages are shown in Figure 8. The mean precipitation and temperature fields are well preserved at the annual timescale. The standard deviation of precipitation is adequately captured for all but a few grid cells. The standard deviation of both temperature fields tends to be underestimated, particularly for grid cells exhibiting greater annual temperature variability. The model does not capture the skew of any of the three variables well, although there is significant uncertainty in the observed skew values because only a small number of annual observations are available to calculate them. For precipitation and maximum temperature, the skew is overestimated for grid cells with small skew values and underestimated for grid cells with larger skew values. This discrepancy may arise because basin-averaged climate fields are used to drive the model over a large and somewhat heterogeneous region.

Figure 8.

Annual performance statistics for all grid cells, including the mean, standard deviation, and skew of cumulative precipitation, maximum temperature, and minimum temperature. Median values across the 50 different simulations are shown against the observed values.

[42] Finally, the power spectra of annual precipitation are examined in Figure 9. One low-frequency component (H = 1), with significant periods between 1 and 4 years, was modeled in the WARM approach. The mean simulated power spectrum across the 50 simulations matches the observed spectrum reasonably well. Most importantly, the mean simulated spectrum falls below the significance level at about the same period (∼4 years) as the observed spectrum. Furthermore, the observed spectrum lies entirely within the 95% uncertainty bounds of the ensemble.
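The significance line in Figure 9 is a red-noise (AR(1)) background spectrum scaled by a chi-square quantile. The sketch below uses the normalized red-noise form popularized by Torrence and Compo (1998). It assumes a single spectral estimate with 2 degrees of freedom, so the 95% factor is χ²₂(0.95)/2 = −ln 0.05 ≈ 3.0. Time-averaged wavelet spectra, such as those used with WARM, have more degrees of freedom and a correspondingly lower factor, so this code illustrates the construction only and does not reproduce the exact curve in the figure. The function names are hypothetical.

```python
import math


def red_noise_spectrum(freq, alpha):
    """Normalized AR(1) background spectrum at frequency freq (cycles per time step, here per year)."""
    return (1 - alpha**2) / (1 + alpha**2 - 2 * alpha * math.cos(2 * math.pi * freq))


def significance_level(freq, alpha, variance=1.0):
    """95% level for one spectral estimate with 2 degrees of freedom: chi2_2(0.95) / 2 = -ln(0.05)."""
    return variance * red_noise_spectrum(freq, alpha) * -math.log(0.05)
```

With a positive lag-1 coefficient (for example 0.19, the observed value quoted for Figure 11), the background rises toward longer periods. A low-frequency peak therefore has to clear a higher bar than a peak at short periods before it counts as significant.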

Figure 9.

Power spectra for annual precipitation. The observed spectrum (black solid) is compared against the mean power spectrum (dashed blue) of the 50 simulations, along with the range bounded by the 2.5th and 97.5th percentiles of the ensemble power spectra (gray). Also shown is the 95% significance level (red dotted) derived from a red noise background process. The spectra of the observations and simulations are statistically significant where they rise above the red dotted line.

[43] Overall, the performance of the model for most statistics is either good or adequate, with only some moderate discrepancies in the higher-order statistics. This is promising given that the model is being applied to a very large region subject to various changes in topography, which can often be quite challenging for weather generation procedures. Furthermore, we note that these performance statistics are comparable to those seen in the weather generator presented in Srikanthan and Pegram [2009], which is the only other weather generator in the literature with the ability to specify distributional shifts in weather variables while simultaneously maintaining low-frequency climate variability and intervariable and intersite correlations.

4. Model Demonstration for a Climate Stress Test

[44] The daily weather generator was specifically designed to facilitate a decision-centric climate risk assessment of systems sensitive to several components of the climate at various temporal scales. In the modeling framework presented here, an emphasis was placed on altering precipitation patterns in the climate system because this variable often dominates the performance of biophysical and socioeconomic systems. Several parameters can be adjusted in the model to vary different components of precipitation (see Table 1). These include the parameters for the target distribution in the quantile mapping scheme, the transition probabilities of the Markov chain, the coefficients of the AR model for low-frequency components, and the standard deviation of white noise for those AR models. By changing these parameters, shifts in daily precipitation amounts, daily persistence, interannual persistence, and interannual variability can be implemented in a bottom-up climate change assessment. The exact outcome of some of these perturbations, such as those made through the quantile mapping procedure, is known a priori. Outcomes from other perturbations can only be approximated before simulation because of the stochastic formulation of the model. This is the case for changes in annual persistence forced by alterations to the parameters of the WARM model. Furthermore, scaling factors and delta shifts can be applied to other climate fields (e.g., daily temperatures, wind speeds, etc.) to explore other system sensitivities to potential climate changes. Many of these changes, including those related to the quantile mapping, delta shifts, and transition probabilities, can be implemented differently by month, allowing seasonal climate changes to be explored.
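To show how the WARM parameters act on an interannual signal, the sketch below simulates a single low-frequency component as an AR(p) process, z_t = Σ_h α_h z_{t−h} + e_t, with white noise of standard deviation σ_e. This is an assumption-laden simplification. In the full model the component series comes from the wavelet decomposition and is recombined with the other components. The function name and all parameter values are hypothetical. Because the process is linear, drawing the same noise sequence and scaling σ_e scales the entire series by the same factor.

```python
import random


def simulate_ar(alphas, sigma_e, n_years, seed=0, burn_in=100):
    """Simulate z_t = sum_h alphas[h] * z_{t-1-h} + e_t with e_t ~ N(0, sigma_e^2)."""
    rng = random.Random(seed)
    z = [0.0] * len(alphas)  # zero initial conditions, discarded along with the burn-in
    for _ in range(n_years + burn_in):
        e = sigma_e * rng.gauss(0.0, 1.0)
        z.append(sum(a * z[-1 - h] for h, a in enumerate(alphas)) + e)
    return z[len(alphas) + burn_in:]


# Hypothetical baseline and the two Table 2 style perturbations for one component.
baseline = simulate_ar([0.5], 1.0, 62, seed=1)
stronger = simulate_ar([0.5], 1.3, 62, seed=1)        # sigma_e increased by 30%
less_persistent = simulate_ar([0.3], 1.0, 62, seed=1)  # alpha_1 lowered by 0.2
```

Changing σ_e changes the amplitude of the low-frequency signal. Changing α_1 changes its persistence, and the realized annual autocorrelation can only be checked after simulation, as noted above.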

Table 1. Model Parameters That Can Be Altered to Perturb the Climate System at Various Temporal Scales

| Climate Field | Model Component | Parameter | Effect | Timing: Daily | Timing: Seasonal | Timing: Interannual |
|---|---|---|---|---|---|---|
| Precipitation | Quantile mapping | Target distribution parameters (η*) | Change entire distribution of daily precipitation by month | X | X | |
| Precipitation | Daily weather generator | Transition probabilities (p_ab) | Alter the daily persistence of precipitation by month | X | X | |
| Precipitation | WARM | Coefficients of the AR model (α_h) | Adjust the persistence of low-frequency signals | | | X |
| Precipitation | WARM | Standard deviation of AR white noise (σ_e) | Adjust the magnitude of low-frequency signals | | | X |
| Temperature | Daily weather generator | Delta shifts (δ_t) | Shift the daily temperature values by month | X | X | |

[45] To demonstrate how this model could be used in a decision-centric climate risk assessment, the weather generator is used to generate several sequences of weather representing various types of climate change for the Connecticut River basin. Five types of climate change are examined here: alterations to the mean of daily precipitation, the coefficient of variation of daily precipitation, the daily persistence of precipitation, the magnitude of low-frequency variability, and the level of persistence in that low-frequency variability. All adjustments are applied as step changes in the model rather than trended changes. The model parameters being changed and the magnitude of their perturbation are given in Table 2. Various combinations of these changes are presented below to illustrate the types of climate change that can be explored with the tool, as well as the potential unintended consequences that the imposed parameter changes may have for other variables.

Table 2. Climate Changes Included in the Stress Test^a

| Climate Change | Model Parameter Adjusted | Size of Adjustment^b |
|---|---|---|
| Mean precipitation | Mean of daily precipitation (μ*) | ±30% |
| Precipitation variability | Coefficient of variation of daily precipitation | +30% |
| Daily precipitation persistence | Transition probabilities p_0,1 and p_0,0 | −0.2 (p_0,1); +0.2 (p_0,0) |
| Magnitude of low-frequency variability | Standard deviation of white noise for all AR models (σ_e, σ_ξ) | ±30% |
| Persistence of low-frequency variability | Lag-1 coefficient for low-frequency component (α_1) | −0.2 |

^a All adjustments are applied as step changes in the model rather than trended changes.
^b All values show the size of the change above baseline values.

[46] Figures 10a and 10b show the changes to the distribution of nonzero daily precipitation at one grid cell in April caused by increasing the mean and the coefficient of variation, respectively, for that month by 30% in the quantile mapping procedure. All other components of the climate system were kept unchanged from their historic, fitted values. Comparisons are made against a baseline model run with no changes imposed. When the mean value is increased in the quantile mapping approach, the entire distribution of daily precipitation values is shifted upward (Figure 10a). The values are shifted in such a way as to ensure that the variability of precipitation (i.e., the coefficient of variation) does not change. Correlations between precipitation and maximum temperature are examined to determine whether mean changes under the quantile mapping procedure degrade relationships between precipitation and other variables (Figure 10d). For mean changes, these relationships appear well preserved. The distribution of daily April precipitation looks quite different when the mean is kept constant but the coefficient of variation is increased (Figure 10b). Here, the distribution is stretched to raise the largest events (above the 0.85 nonexceedance level) while lowering all of the remaining, smaller precipitation values so that the mean stays the same. This stretching of the distribution distorts the correlations between precipitation and temperature, producing a negative bias in the correlation values across most grid cells (Figure 10e).
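The contrast between Figures 10a and 10b follows from the gamma parameterization, assuming the target distribution is a gamma with shape k and scale θ set from a target mean and coefficient of variation. This is consistent with the gamma fits in Figure 2, but the exact parameterization of η* is not restated here. A mean change with the CV held fixed is a pure rescaling of every quantile. A CV change with the mean held fixed lowers the shape parameter, which pushes mass toward zero and lengthens the upper tail.

```latex
\begin{gathered}
\mu = k\theta, \qquad \sigma^{2} = k\theta^{2}, \qquad
\mathrm{CV} = \frac{\sigma}{\mu} = \frac{1}{\sqrt{k}}
\quad\Longrightarrow\quad k = \frac{1}{\mathrm{CV}^{2}}, \qquad \theta = \mu\,\mathrm{CV}^{2} \\[4pt]
\text{Mean } +30\%,\ \mathrm{CV}\ \text{fixed:}\quad k^{*} = k, \quad \theta^{*} = 1.3\,\theta
\quad\Longrightarrow\quad F_{*}^{-1}\bigl(F(x)\bigr) = 1.3\,x \\[4pt]
\mathrm{CV}\ +30\%,\ \text{mean fixed:}\quad k^{*} = \frac{k}{1.3^{2}} = \frac{k}{1.69}, \quad \theta^{*} = 1.69\,\theta,
\quad k^{*}\theta^{*} = k\theta
\end{gathered}
```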

Figure 10.

Intended (first row) and unintended (second row) changes to various weather characteristics due to forced changes in model parameters, including the mean of daily precipitation (first column), the coefficient of variation (CV) (second column), and transition probabilities in the Markov chain (third column). Comparisons are made between a model run with the change imposed and a baseline run without any parameter changes. (a, b) Baseline (black solid) and adjusted (red dashed) empirical distributions of nonzero April precipitation for a single grid cell. (c) The average number of dry days per month across all grid cells. (d, e) The cross correlation between nonzero precipitation and maximum temperature at each grid cell. (f) The average number of extreme heat days (>32°C) per year across all grid cells.

[47] Figure 10c shows the average number of dry days per month across all grid cells for a model run under baseline transition probabilities in the Markov chain and a run with increased persistence in dry days. As expected, the run with a greater persistence in dry days exhibits an increased number of these events. Unlike the results from the quantile mapping procedure, however, the change in this statistic for each grid cell can only be determined after imposing the alternative model parameterization and exploring the resulting climate sequence, because daily precipitation persistence is being modeled (and altered) at the basin-average scale. We also note that alterations to daily precipitation persistence can change the distribution of certain temperature statistics that depend on the occurrence of precipitation. For instance, increases in dry day persistence also lead to more extreme heat days (>32°C) across most grid cells (Figure 10f).
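For a first-order, two-state chain, the effect of the Table 2 perturbation on the basin-average occurrence process can be worked out in closed form. Grid-cell effects still require simulation, as noted above. The sketch assumes that p_ab is the probability of moving from state a today to state b tomorrow, with 0 = dry and 1 = wet. Under that convention p_0,0 = 1 − p_0,1, so lowering p_0,1 by 0.2 is the same as raising p_0,0 by 0.2. The function names and the baseline values p_0,1 = 0.4 and p_1,0 = 0.5 are hypothetical.

```python
def stationary_dry_fraction(p01, p10):
    """Long-run fraction of dry days for a first-order, two-state (0 = dry, 1 = wet) Markov chain."""
    if not (0.0 < p01 <= 1.0 and 0.0 < p10 <= 1.0):
        raise ValueError("transition probabilities must lie in (0, 1]")
    return p10 / (p01 + p10)


def mean_dry_spell(p01):
    """Expected dry-spell length in days: dry spells are geometric with exit probability p01."""
    return 1.0 / p01


# Hypothetical 30-day month: baseline versus p01 lowered by 0.2 (p00 raised by 0.2).
baseline_dry_days = 30 * stationary_dry_fraction(0.4, 0.5)
perturbed_dry_days = 30 * stationary_dry_fraction(0.4 - 0.2, 0.5)
```

With these illustrative values, the expected number of dry days rises from about 16.7 to about 21.4 per month, and the mean dry spell doubles from 2.5 to 5 days. More dry days also means fewer days on which precipitation can suppress temperature, which is consistent with the increase in extreme heat days shown in Figure 10f.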

[48] Finally, we present a sample of model runs exhibiting changes to the magnitude, variability, and frequency of annual precipitation. The model runs are compared against an ensemble of GCM projections to show that the weather generator can produce a much wider range of potential climate changes than the limited view afforded by the GCMs. Figure 11 shows the mean, coefficient of variation, and lag-1 autocorrelation coefficient for annual precipitation averaged over the entire Connecticut River basin. Statistics are presented for several climate scenarios: the observed record, 234 downscaled GCM projections for the 2050–2099 period, and many different weather generator runs. The GCM projections were gathered from the World Climate Research Program's (WCRP's) Coupled Model Intercomparison Project phase 5 (CMIP5) multimodel data set and were downscaled using the bias-correction spatial disaggregation technique [Wood et al., 2004; Reclamation, 2013]. Three 20-member ensembles of weather generator runs, each 62 years long, are presented. The first set is run under baseline conditions. The second set is run with a 30% reduction in mean precipitation and a 30% increase in the standard deviation of annual precipitation. The final ensemble is run with a 30% increase in mean precipitation, a 30% reduction in the standard deviation of annual precipitation, and a significant decrease in the lag-1 autocorrelation of annual precipitation.

Figure 11.

The mean, coefficient of variation, and lag-1 serial correlation coefficient of annual precipitation. Statistics for several climate scenarios are shown, including (1) the observed record (red), (2) future (brown) BCSD downscaled GCM projections from the CMIP5 archive, (3) 20 baseline weather generator simulations (blue), (4) 20 simulations with a decreased mean and increased standard deviation (green), and (5) 20 simulations with an increased mean, decreased standard deviation, and decreased autocorrelation (magenta). The observed lag-1 serial correlation is 0.19.

[49] Several conclusions emerge from the results in Figure 11. First, the ensemble of 2050–2099 GCM runs shows an increase in mean precipitation over the historic average: projected means average 110% of the historic value and range from 100% to 122%. These projections show a slight decline in the average coefficient of variation, but this change is largely driven by an increase in the mean with little change in the standard deviation. The projections also exhibit much lower serial correlation than the observed record, and only a handful of scenarios show comparable levels of persistence. The historic (1950–2000) period of these projections (not shown) exhibits the same low level of persistence as the future scenarios. This suggests that the downscaled GCM projections may not reproduce realistic higher-order climate characteristics over an aggregate region. Importantly, the magnitude, variability, and persistence of annual precipitation under these future GCM projections span only a limited range of possible outcomes. This narrow view of the possible future climate limits the utility of these projections in a climate change risk analysis, in which all climate possibilities, particularly high-impact, low-probability events, are important to the discovery and quantification of risk.

[50] In contrast, the 20-member ensemble of weather generator runs under baseline conditions exhibits climate characteristics that are directly comparable to the observed record. The magnitude, variability, and lag-1 autocorrelation of annual precipitation are all relatively unbiased. Furthermore, the ensemble presents a range of plausible climates that could occur even without climate change. This gives an analyst climate sequences that could be used to test the robustness of a system to internal climate variability.

[51] A much wider range of possible future outcomes can be explored using the proposed weather generator. Figure 11 exhibits two possible combinations of change simulated by the model, including a set of climate sequences with significantly less but more variable annual precipitation, as well as a set of climate sequences with more annual precipitation, but with depressed variability and persistence. These two sets of changes are just a sample of what could be simulated by the weather generator, but their expansive range across climate change space demonstrates how the model could be used to explore a wide range of possible climate outcomes under climate change. This affords analysts more flexibility in how they examine the weaknesses of a system of interest and enables a more thorough exploration of climate risk. Given the tendency of planners and managers to underestimate the possibility of potential hazards, we feel that there are significant advantages to exploring system weaknesses over a wide range of possible climate outcomes, an analysis made possible by the proposed weather generator.

5. Discussion

5.1. Model Limitations

[52] It is important to recognize the limitations of any tool when trying to draw insight from model results. The weather generator presented in this study was designed to simulate multiple forms of climate variability at several different time scales, but some components of climate variability are still challenging for the model to represent or modulate. First, a resampling algorithm drives the model, so at the daily time scale the tool implicitly assumes that the spatial correlation structure of the weather variables is stationary. This may not hold under future climate change, yet such a change cannot be simulated with this model. At interannual timescales, the tool currently simulates low-frequency variability based on an annual precipitation time series and ignores any signal in the annual temperature data. It may also be difficult to estimate robust parameters for certain low-frequency signals in the WARM model if the annual precipitation time series is not sufficiently long. One approach to circumvent both of these issues would be to replace the annual precipitation time series with an alternative climate proxy that relates to both precipitation and temperature (such as an ENSO index) and for which more data are available through climate reconstructions [Kwon et al., 2009]. This requires, however, that a significant climate proxy with a long record can be found for the region of interest. Additionally, if monotonic trends, as opposed to quasi-oscillatory variability, are present in the annual data, then the WARM approach may identify spurious low-frequency components [Kwon et al., 2007]. Such trends, if identified, should be removed from the data before building the WARM model, but distinguishing trends from low-frequency oscillations is not straightforward. Finally, the model is data intensive and therefore may be difficult to use in data-sparse regions.
Despite these limitations, however, this tool does provide a step forward in the simulation of climate across multiple temporal and spatial scales for use in vulnerability assessments of human and ecological systems.

5.2. Determining Scenario Plausibility, Selecting the Scenario Range, and Linking to Climate Science

[53] The model presented here was designed to support decision-centric climate change studies by enabling an analyst to test a system under a wide range of plausible climate scenarios and identify potential climate hazards. However, the analyst faces two immediate questions when conducting this "climate stress test": (1) what constitutes a plausible climate change, and (2) how large should the range of climate changes be? Determining how far the climate can be perturbed before a scenario should be considered implausible is difficult. Expert opinion may be useful in defining these bounds, as may very large simulation ensembles of simpler (computationally faster) climate models [Piani et al., 2005]. However, the plausibility of each climate change scenario may not be critical when identifying system hazards, as long as implausible changes are discounted or disregarded later in the analysis when developing estimates of climate risk [Brown et al., 2012]. The important factor is to determine how far the climate must change before the system no longer functions properly, so that the analyst is aware of the potential climate hazards. Therefore, a promising strategy in bottom-up approaches may be to identify the climate variables and time scales that influence system performance and then extend the range of climate changes for those variables far enough to stress the system to failure. When failures emerge, judgments can be made about the plausibility of the conditions causing them; these judgments need not be made earlier. In practice, exploring so many scenarios may pose computational challenges, but with parallel computing the cost of an additional simulation run is often small. Adaptive sampling techniques may also be used to reduce the number of simulations needed to discover performance thresholds in climate change space.
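As one simple form of adaptive sampling, the sketch below bisects along a single climate change dimension, such as the fractional change in mean precipitation, between a change known to be safe and one known to cause failure. The sketch assumes the system response is monotone along that dimension. `fails` stands for a hypothetical callable that runs the weather generator and an impact model and reports whether a performance criterion is violated, so each call costs one simulation. Real stress tests span several dimensions at once and would need multidimensional schemes. Bisection is used here only as a clear illustration and is not a method proposed in this study.

```python
from typing import Callable, Tuple


def find_threshold(
    fails: Callable[[float], bool],
    safe: float,
    unsafe: float,
    tol: float = 0.01,
    max_runs: int = 50,
) -> Tuple[float, int]:
    """Bisect between a non-failing and a failing change; return (threshold estimate, runs used)."""
    if fails(safe) or not fails(unsafe):
        raise ValueError("need fails(safe) == False and fails(unsafe) == True")
    runs = 2
    while abs(unsafe - safe) > tol and runs < max_runs:
        mid = 0.5 * (safe + unsafe)
        runs += 1
        if fails(mid):
            unsafe = mid
        else:
            safe = mid
    return 0.5 * (safe + unsafe), runs


# Hypothetical impact model: the system fails once mean precipitation drops by 23% or more.
threshold, runs = find_threshold(lambda d: d <= -0.23, safe=0.0, unsafe=-0.5)
```

Here 8 runs locate the threshold to within 0.01, whereas a uniform grid at that spacing over the same range would need about 50.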

[54] Once performance thresholds in climate change space are identified, information on the likelihood of harmful climate states can be used to estimate climate risks facing the system. If certain scenarios used in the stress test are truly implausible, then the likelihood assessment should reveal this and discount these scenarios when estimating climate risk. Downscaled GCM projections are a logical starting place to garner this likelihood information, and recently, there have been significant efforts in the climate science community to develop formal probability distributions of global and regional climate variables from these projections. These approaches utilize initial condition ensembles [Stainforth et al., 2005], perturbed physics ensembles [Rougier et al., 2009], multimodel ensembles [Tebaldi et al., 2005], or combinations thereof [Sexton et al., 2012] to develop pdfs of response variables. Expert opinion can also be valuable in forming these likelihood estimates, as can data from the paleorecord. In addition, imprecise probabilities could be utilized to express uncertainty regarding the estimated values [Rinderknecht et al., 2012]. Potentially, more reliable probability estimates may be developed for discrete thresholds (i.e., the likelihood of climate change beyond a threshold associated with system failure), rather than continuous probabilities across the entire climate space. In all of these cases, the probabilities of change should likely be considered subjective, but they can still be coupled with the results of the vulnerability assessment to quantitatively appraise the robustness of different adaptation measures across the range of climate change space [Moody and Brown, 2013]. More research is needed to explore approaches for gathering this probabilistic information and coupling it with the results of an extensive vulnerability assessment.

6. Conclusion

[55] The most recent scientific knowledge suggests that the impacts of climate change on socioeconomic and biophysical systems could be very significant, yet they remain highly uncertain. Recently, decision-analytic approaches have been proposed to better handle this uncertainty and frame adaptation studies under climate change in terms more relevant for decision makers. These approaches, often bottom-up by design, require an understanding of system sensitivities to various changes in the climate system to better identify vulnerabilities and develop an understanding of potential risks to the system. However, technical methods for conducting these vulnerability assessments are relatively underdeveloped in the literature. This study presented a stochastic weather generator that can help facilitate the discovery of system vulnerabilities to several components of the climate system. When coupled with impact models, the weather generator enables a more complete identification of system vulnerabilities that can help inform risk management strategies and the selection of robust adaptation measures.

[56] The tool is designed to work not only for specific sites but also for systems that cover large spatial extents, such as trans-state river basins or ecosystems. However, future work is needed to explore how spatially expansive the model can be made before its skill degrades. Future studies will also utilize the weather generator tool to conduct stress tests on various socioeconomic and biophysical systems in order to appraise potential improvements from available adaptation measures.

[57] As climatic records continue to show increasing nonstationarity in their probabilistic behavior, decision makers across a range of fields will seek actionable information that directly informs a choice between measures they can take to safeguard their systems from further shifts in the climate. The high degree of uncertainty surrounding these changes limits the utility of a traditional predict-then-act framework for adaptation decision making. A shift in philosophy may be needed to provide the information required to adapt our society to environmental changes that we cannot foresee. This study aims to add to a developing body of literature exploring new methods to analyze and present climate change adaptation information that can better inform decision makers as they navigate an uncertain future.


[58] We thank three anonymous reviewers for their thoughtful criticisms and advice that helped to significantly improve this article. The work of the authors was partially supported by the National Science Foundation grant CBET-1054762 and the Department of Defense Strategic Environmental Research and Development Program (SERDP) project RC-2204.