We assess hydroclimatic projections for the Murray-Darling Basin (MDB) using an ensemble of 39 Intergovernmental Panel on Climate Change AR4 climate model runs based on the A1B emissions scenario. The raw model output for precipitation, P, was adjusted using a quantile-based bias correction approach. We found that the projected change, ΔP, between two 30 year periods (2070–2099 less 1970–1999) was little affected by bias correction. The range for ΔP among models was large (∼±150 mm yr−1) with all–model run and all-model ensemble averages (4.9 and −8.1 mm yr−1) near zero, against a background climatological P of ∼500 mm yr−1. We found that the time series of actually observed annual P over the MDB was indistinguishable from that generated by a purely random process. Importantly, nearly all the model runs showed similar behavior. We used these facts to develop a new approach to understanding variability in projections of ΔP. By plotting ΔP versus the variance of the time series, we could easily identify model runs with projections for ΔP that were beyond the bounds expected from purely random variations. For the MDB, we anticipate that a purely random process could lead to differences of ±57 mm yr−1 (95% confidence) between successive 30 year periods. This is equivalent to ±11% of the climatological P and translates into variations in runoff of around ±29%. This sets a baseline for gauging modeled and/or observed changes.
 The Murray-Darling Basin (MDB) located in southeast Australia (Figure 1) is Australia's food bowl, with almost 40% of Australia's agricultural production. The region supports extensive grazing, dryland cropping, and, most importantly, a variety of irrigated crops. Acute water shortages in the basin in recent years as a result of drought and overallocation have focused attention on the long-term sustainability of activities within the basin [e.g., Murray-Darling Basin Commission, 2009; Potter et al., 2010; Maxino et al., 2008]. Superimposed on that are concerns about the possible impact of climate change on water availability in the future.
 One key part of the information base used to evaluate likely future conditions is the projections from state-of-the-art coupled atmosphere-ocean general circulation models. The most recent compilation of model simulations has been made available through the World Climate Research Programme's Coupled Model Intercomparison Project phase 3 (CMIP3) multimodel data set [Meehl et al., 2007]. The same database was used to prepare the Fourth Assessment Report of the Intergovernmental Panel on Climate Change [IPCC, 2007] (hereafter referred to as IPCC AR4 models) and is widely used to assess the hydrologic impact of climate change [e.g., Groves et al., 2008; Chiew et al., 2009].
 For precipitation in particular, different models or, on occasion, different runs of the same model give very different regional-scale simulations for the historical period and projections for the future as documented in the global water atlas [Lim and Roderick, 2009] and elsewhere [Johnson and Sharma, 2009]. An interesting point here is that in terms of globally integrated precipitation, there is little practical difference between various simulations (i.e., for the historical period) and projections (i.e., for the future) of climate models [Lim and Roderick, 2009]. The large differences in model simulations and model projections occur at regional scales.
 Two basic approaches have been used to make regional-scale projections. The first, here called the ranking approach, is based on the idea that some models give better representations of the recent past in a given region. The ranking is based on comparing the model output with observations over the historical period. Previous research on precipitation scenarios for the MDB has generally followed this approach [e.g., Maxino et al., 2008; Chiew et al., 2009; Smith and Chandler, 2010]. The ranking approach can have some interesting consequences. For example, as summarized by Smith and Chandler [2010, Table 2], the model rankings for precipitation simulations over the MDB are different from the ranking for the entire Australian continent. Taking this to the limit, it might turn out that the most highly ranked model for a particular purpose would vary from one region to the next.
 The second, here called the bias correction approach, is based on the idea that the statistical properties of the model output can be adjusted to be identical with observations over the historical period. This approach has been widely used in climate change impact studies [Wood et al., 2004; Maurer and Hidalgo, 2008]. The simulation from each individual model run is adjusted so that the overall mean and the variance match observations for the historical period.
 In comparing the two approaches, each model will have the same ranking after bias correction, and each model will therefore contribute equally to the ensemble. In contrast, the whole aim of model ranking is to change the relative weights and, in some cases, even remove models (i.e., zero weight) from the ensemble. This could have a major impact on the projected hydroclimatic changes. In that respect, the aim of this paper is to develop an understanding of the properties of the individual model runs and the statistical properties of the ensemble of the simulations, projections, and projected changes. To do that we use data from the global water atlas [Lim and Roderick, 2009].
2. Data and Methods
2.1. Climate Model Database
 The global water atlas [Lim and Roderick, 2009] is based on monthly climate model output for precipitation (P), evaporation (E), and land area fraction available in the multimodel climate data archive for the IPCC AR4 models [Meehl et al., 2007]. In preparing the atlas, there was no a priori model selection; that is, all models having available data (for P and E) were used. In total, the output for 39 paired model runs from 20 different climate models (Table 1) was available for the historical period known as the 20C3M scenario (climate in the 20th century) and for one future scenario, the A1B scenario [IPCC, 2000]. The A1B scenario (the 720 ppm stabilization scenario) assumes midrange emissions for 2000–2099. The 39 individual model runs are here called the all–model run ensemble. Multiple runs were available for eight of the models (Table 1), with each run representing a different set of initial conditions [e.g., Rotstayn et al., 2007, p. 5]. Hence, those multiple runs can be used to examine the sensitivity to initial conditions. Output from the 20 models (where multiple runs were averaged) is here called the all-model ensemble.
Table 1. Summary of the Climate Model Output Showing Number of Monthly Runs Available for Each Model-Scenario Combination
Model and Country
INGV-ECHAM4, Europe, ECMWF
 The model output was resampled into a common geographic grid of dimensions 2.5° × 2.5° (∼270 km × 270 km). The monthly model output was aggregated into annual time series, and projected changes were calculated from the difference in P, E, and P − E between the end of the 20th (1970–1999) and 21st (2070–2099) centuries. A geographic mask defining the MDB (Figure 1) was used in conjunction with the model-specific land area fraction to extract model output for the MDB region.
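The aggregation and differencing steps described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the processing code used for the atlas: the function names, the array layout (months by grid boxes), the summing of monthly totals, and the use of a single weight per grid box (for example basin mask times land area fraction) are all hypothetical choices, and any latitude-dependent area weighting is ignored.

```python
import numpy as np

def annual_basin_series(monthly, weights):
    """Collapse monthly grid-box totals, shape (n_months, n_boxes), to an annual basin series.

    weights: hypothetical per-box weights, e.g. basin mask * land area fraction.
    """
    monthly = np.asarray(monthly, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n_years = monthly.shape[0] // 12
    # Sum the 12 monthly totals of each year (assumes units of mm per month).
    annual = monthly[: n_years * 12].reshape(n_years, 12, -1).sum(axis=1)
    # Weighted spatial mean over the basin grid boxes.
    return annual @ weights / weights.sum()

def projected_change(annual, first_year, base=(1970, 1999), future=(2070, 2099)):
    """Difference between two period means: future minus base."""
    years = first_year + np.arange(len(annual))
    in_base = (years >= base[0]) & (years <= base[1])
    in_future = (years >= future[0]) & (years <= future[1])
    return annual[in_future].mean() - annual[in_base].mean()

# Synthetic example: 200 years (1900-2099), 2 grid boxes, 40 mm every month,
# then 1 mm per month wetter from 2070 onward.
monthly = np.full((200 * 12, 2), 40.0)
monthly[170 * 12:] += 1.0
annual = annual_basin_series(monthly, weights=[1.0, 0.5])
delta_p = projected_change(annual, first_year=1900)
```

In this synthetic case the annual totals step from 480 to 492 mm yr−1, so the computed change is 12 mm yr−1.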
 Initial calculations revealed a problem with the estimates of E. When integrated over the MDB, the 30 year averages in E were greater than P in many models. Our investigation found that the problem was caused by the way the climate model output is archived. The climate models presumably calculate P and E separately for the land and ocean fractions of each grid box and add them, adjusting for land area fraction as appropriate to obtain the total P and E for each grid box. However, the separate grid box level estimates for the land and ocean components are not archived. When reconstructing the estimates, the base level assumption is that the land and ocean E and P scale directly with the respective area fractions. This ignores the fact that over the long term, E does not exceed P over land but can over the ocean (or lake) where water is always available for evaporation. The worst-case scenario occurs along dry coastal regions where E from the ocean will be much higher than E from land. This is very clear in global maps made using the raw outputs of P and E (e.g., for their difference, see Held and Soden [2006, Figure 7]) where there are clear discontinuities along arid coastlines such as around much of Australia or in the Middle East.
 We tried various schemes, and the final approach to reconstructing the climate model output for E is described in detail in Appendix A. In brief, as a “work-around,” after calculating E in each grid box within the MDB, we tested whether the 100 year average E was greater than P. If so, then the 100 year average E was reset to be equal to the 100 year average P. It should be noted that by following this procedure, we may still have E greater than P in any given year or even in a 30 year average. Hence, our approach is not ideal, but the alternative, of ignoring the problem, led to unphysical, and unrealistic, results for the MDB water balance. This problem is a regional one and could be resolved if the land and ocean components of P, but especially E, were archived separately for each grid box in the climate model output.
2.2. Precipitation Observations
 We obtained the time series of observed annual precipitation for the MDB from the Bureau of Meteorology of Australia (1900–2008, http://www.bom.gov.au). To aid with the bias correction, we also obtained the monthly precipitation data from the Global Precipitation Climatology Center (GPCC) database [Rudolf and Schneider, 2005; Rudolf et al., 2010]. This data set is developed from rain gauge–based precipitation data interpolated on a 2.5° × 2.5° grid from 1901 to 2007. We compared the 107 year time series of annual precipitation for the MDB from the two sources and found that they were, for all practical purposes, identical (linearly regressed with the slope 1.02, intercept −6.03 mm yr−1, and determination coefficient 0.996). Hence, the GPCC monthly precipitation data set was used to undertake the bias correction of precipitation data as described in section 2.3.
2.3. The Bias Correction Method
 Traditional quantile-based mapping bias correction approaches adjust the mean and variance of a model simulation to agree with the statistical properties of the observations. Specifically, the cumulative distribution function (CDF) of the model output is adjusted to agree with the CDF of the observations [Wood et al., 2004; Maurer and Hidalgo, 2008]. In detail, for a given grid box and month, one first locates the percentile value for the model simulation and then replaces the simulated monthly precipitation with the observed monthly precipitation from the same percentile in the (observed) CDF. This is the bias-corrected output. The remaining challenge is how to adjust the model projections. Recently, Li et al. [2010, Figure 3] proposed an approach where the difference over time in the model output (CDF model projection minus CDF model simulation) is preserved in the projection. For the future projection, one first constructs the CDF for the model projection, simulation, and the observations and then locates the corresponding percentile values in the three CDFs. The bias-corrected output is calculated by adding the difference between the model projection and simulation to the observation at the same percentile. Hence, projected changes in the model output should be preserved.
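The two mapping steps in this paragraph can be written compactly. The sketch below is an illustrative reading of the description, not the code of Wood et al. [2004] or Li et al. [2010]: the helper names and the empirical plotting positions (i + 0.5)/n are assumptions, the functions would be applied separately to each grid box and calendar month, and no special handling is included for corrected precipitation falling below zero.

```python
import numpy as np

def _percentiles(sample):
    """Empirical percentile of each value within its own sample (hypothetical helper)."""
    sample = np.asarray(sample, dtype=float)
    ranks = np.argsort(np.argsort(sample))
    return (ranks + 0.5) / len(sample)

def _value_at(sample, pct):
    """Value of the empirical CDF of `sample` at percentile(s) `pct` (hypothetical helper)."""
    s = np.sort(np.asarray(sample, dtype=float))
    grid = (np.arange(len(s)) + 0.5) / len(s)
    return np.interp(pct, grid, s)

def quantile_map(model_hist, obs):
    """Historical period: replace each model value by the observation at the same percentile."""
    return _value_at(obs, _percentiles(model_hist))

def correct_projection(model_proj, model_hist, obs):
    """Future period: observation at the same percentile plus (projection - simulation)."""
    model_proj = np.asarray(model_proj, dtype=float)
    pct = _percentiles(model_proj)
    return _value_at(obs, pct) + (model_proj - _value_at(model_hist, pct))

# Synthetic monthly values for one grid box and one calendar month.
rng = np.random.default_rng(0)
hist = rng.gamma(2.0, 20.0, size=50)
obs = rng.gamma(2.0, 25.0, size=50)
hist_bc = quantile_map(hist, obs)
proj_bc = correct_projection(hist + 10.0, hist, obs)  # projection = simulation + 10
```

With a projection that is simply the simulation shifted by 10, the corrected projection equals the corrected simulation plus 10, which is the sense in which the modeled change is preserved.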
 Following the Li et al. [2010] method, we used monthly P estimates (GPCC, 1901–2007) to perform the bias corrections. We then aggregated to annual data and calculated P for the relevant 1970–1999 and 2070–2099 periods. Note that on completion of the procedure, the (monthly) variance in each individual model run for the historical period will, by construction, equal the (monthly) variance in observations. However, after the bias correction, the variance of the annual time series of each model run will not necessarily be equal to the variance in the (annual) observations because the bias correction method used monthly data [Sun et al., 2010].
3.1. Hydrologic Balance of Raw Model Outputs
 The hydrologic summaries of the raw outputs from the 39 climate model runs for the MDB are described in Table 2. In the 39 model runs examined, P for the 1970–1999 period varied from 230.1 to 994.4 mm yr−1. The all–model run average was 578.4 ± 176.2 mm yr−1 (mean ± standard deviation, σ) compared with the observed value of 517.4 mm yr−1. Of the 39 model runs, 22 showed increases in P to the end of the 21st century while 17 showed decreases. The average change for the all–model run ensemble was a very small increase in MDB annual precipitation of 4.9 mm yr−1 by the end of the 21st century. However, the range is large, with some model runs projecting a drop in annual precipitation of as much as 150 mm (e.g., CSIRO-Mk3.0, CSIRO-Mk3.5, and MPI-ECHAM5 Run2), while other model runs simulate increases of about the same amount (e.g., MIROC3.2_MEDRES).
Table 2. Summary of Raw Hydrologic Outputs for the MDB (mm yr−1)
Columns: P − E, P − E, Δ(P − E)
The All–Model Run Ensemble: mean across all runs; number of runs showing increases; number of runs showing decreases
The All-Model Ensemble: mean across all models; number of models showing increases; number of models showing decreases
 The results for the all-model ensemble for the 1970–1999 period were virtually identical, with a large range (230.1 to 984.3 mm yr−1) and very similar mean and standard deviation (556.2 ± 179.3 mm yr−1). The projected future change, averaged over all models, was also very small (−8.1 mm yr−1) within a large range (−109.8 to 153.3 mm yr−1).
 The evaporation results are not as reliable for the reasons outlined previously, and even after our adjustment, there are small negative values of P − E for a 30 year period in some model runs. With that caveat in mind, the all–model run runoff (P − E) for the period 1970–1999 varied from −1.6 to 156.4 mm yr−1 with an average of 44.6 mm yr−1 compared with the observed runoff of ≈27 mm yr−1 (M. L. Roderick and G. D. Farquhar, A simple framework for relating variations in runoff to variations in climatic conditions and catchment properties, submitted to Water Resources Research, 2010). Of the 39 model runs examined, 16 showed increases in P − E to the end of the 21st century while 23 showed decreases. The average change for the all–model run ensemble was a very small change in MDB annual runoff of −0.5 mm yr−1 by the end of the 21st century. However, again we emphasize that the range (−20.9 to 32.5 mm yr−1) is large relative to the average change. The overall results for the all-model ensemble were more or less the same (Table 2).
 The time series (P, E) for all 39 model runs are shown in Figure S1 in Text S1 in the auxiliary material and document the diversity of simulations and projections described above. That diversity is further explored in Figure 2, where we show changes in the grid box level hydrologic balance for the model run with the largest projected increase in P (MIROC3.2_MEDRES Run2, ΔP = 155.4 mm yr−1) as well as the largest projected decrease (MPI-ECHAM5 Run2, ΔP = −158.5 mm yr−1) along with the mean of the all–model run ensemble. In summary, the raw outputs of the IPCC AR4 models show a large range of simulations and projections for the MDB. The overall conclusion about the hydrologic changes projected for the MDB (small change in ensemble mean but with large variation among individual runs/models) is the same as found for the whole of Australia [Lim and Roderick, 2009].
 Given the above noted land-ocean boundary problems in using simulations or projections for E (and P − E), we focus on P, the most important of the hydrologic variables in the MDB, in the remainder of this paper. In terms of P, the generic feature of the projections is that the average across all model runs or all models shows little change (4.9 and −8.1 mm yr−1) within a very large range (up to ±150 mm yr−1) by the end of the 21st century.
3.2. Precipitation Ensemble Using Raw Model Output
 The P ensemble (Figure 3) has been constructed using the raw model output for all 39 runs (Figure 3a). Note that the mean of the all–model run ensemble is virtually constant from year to year and is more or less the same as the mean of the all-model ensemble (Figures 3b and 3c and Table 2). The mean of the (all–model run or all-model) ensemble is larger than observed, and this is adjusted later using the bias correction approach (method in section 2.3; results in section 3.3).
 Earlier work reported that the lag one (year) autocorrelation was ≈0 for both (annual) precipitation and runoff in all subbasins within the MDB [Potter et al., 2008]. Here we extend that result with the finding that the autocorrelation in (annual) precipitation is also ≈0 for lags up to 36 years in both observations and climate model output for the MDB (Figure 3d). The implication is that over the last century, the MDB annual precipitation time series closely approximates a purely random time series (of a given variance). Hence, precipitation in a given year is (statistically) independent of that in the previous year(s). What is especially important is that (nearly) all of the 39 model runs also share the same basic characteristic (Figure 3d). Hence, from a statistical viewpoint, the model runs examined here also approximate purely random time series.
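The autocorrelation check described above can be reproduced with a short routine. The following is a minimal sketch, assuming the standard sample autocorrelation estimator and the usual large-sample white-noise band of ±1.96/√n; the function names are illustrative, and the paper does not state which estimator or significance band was used.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation of an annual series at lags 1..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[:-k] * x[k:]) / denom for k in range(1, max_lag + 1)])

def white_noise_band(n, z=1.96):
    """Approximate 95% band for the autocorrelation of n independent values."""
    return z / np.sqrt(n)

# Synthetic, purely random "annual precipitation" series of 100 years.
rng = np.random.default_rng(1)
p_annual = rng.normal(500.0, 110.0, size=100)
r = autocorrelation(p_annual, max_lag=36)
fraction_inside = np.mean(np.abs(r) < white_noise_band(len(p_annual)))

# Deterministic check: a strictly alternating series has lag-1 autocorrelation near -1.
r_alt = autocorrelation([1.0, -1.0] * 50, max_lag=2)
```

For a series that behaves like a purely random process, most of the lagged values are expected to fall inside the band, which is the pattern the text reports for Figure 3d.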
3.3. Precipitation Ensemble Using Bias-Corrected Output
 The bias evident in the raw output time series (Figure 3b and Figure S1 in Text S1) was removed using the quantile-based method [Wood et al., 2004; Li et al., 2010]. The before and after results for each of the 39 model runs are summarized in Table 3 and are shown in Figure S2 in Text S1. As anticipated, the variability in model simulations (for 1970–1999) is vastly reduced in the bias-corrected output (minimum 435.0, mean 486.6, and maximum 546.6 mm yr−1) compared to the raw model output (minimum 230.1, mean 578.4, and maximum 994.4 mm yr−1). The same holds for the 2070–2099 period. The statistical properties of the ensemble after bias correction are depicted in Figure 4. The overall reduction in the spread of model simulations and projections is clearly evident by comparing Figures 4a, 4b, and 4c with Figures 3a, 3b, and 3c. Importantly, the autocorrelation analysis shows the same basic pattern as found in the raw model output (compare Figures 4d and 3d).
Table 3. MDB Precipitation Summary Before and After Bias Correction (mm yr−1)
The All–Model Run Ensemble: mean across all runs; number of runs showing increases; number of runs showing decreases
The All-Model Ensemble: mean across all models; number of models showing increases; number of models showing decreases
 Despite the changes induced by the bias removal procedure, the statistics of the projected difference, ΔP, were more or less unchanged (Table 3). For example, in the all–model run ensemble, the statistics of ΔP calculated from raw model output (mean 4.9, σ = 68.3, minimum −158.5, and maximum 155.4 mm yr−1) are for all practical purposes identical to those calculated after the bias correction procedure (mean 5.5, σ = 69.2, minimum −156.1, and maximum 158.1 mm yr−1) (Table 3). The same holds for the all-model ensemble.
 Another key feature of the results is that the ensemble (either the all–model run or all-model) mean is more or less constant over time both before (Figure 3c) and after bias correction (Figure 4c). This is a very interesting result. A literal interpretation, of relevance only to the bias-corrected output, is that the ensemble average across a number of model runs (e.g., 39) in a given year is a good approximation to the corresponding average of the 30 year time series for a given model run.
 The interannual variance of each individual model run was, in some instances, changed markedly by the bias correction procedure (Table 4). However, the statistical properties of the all–model run ensemble of the variances in the simulations, projections, and projected changes were all, more or less, unchanged (Table 4).
Table 4. Summary of Variance of Annual Precipitation Time Series Over the MDB From the Raw Output and After Bias Correction ((mm yr−1)2)
The All–Model Run Ensemble: mean across all runs; number of runs showing increases; number of runs showing decreases
 In summary, the bias correction scheme forced the ensemble to more or less replicate the statistical properties of the observations (Figure S2 in Text S1). The correction scheme did slightly alter the projections of ΔP for individual model runs (Table 3). However, those changes were small, and the statistical properties of the projections of ΔP (Table 3) were virtually unchanged by the bias correction procedure. We examine the variability in projections of ΔP further in section 3.4.
3.4. Understanding Variability in Future Projections of ΔP
 The finding, via the autocorrelation analysis, that the observed, simulated, and projected P time series are from a statistical viewpoint more or less random, has important implications. For example, as a base case, assume that the P time series were to remain a stationary time series into the foreseeable future. On that basis, we could anticipate differences between 30 year averages of P, for example, between the 1970–1999 and 2070–2099 periods, just by chance. How large could the differences be?
 To investigate the differences, we calculated the variance of the observed MDB annual time series. Then, by assuming a purely random process with that variance, we numerically generated a time series of 130 random numbers. Averages were taken over the first 30 numbers and the last 30, and the difference was taken. We repeated that process 10,000 times to simulate a statistical distribution for ΔP. In the initial calculations we used a normal distribution to generate the random number sequence, and the resulting distribution of ΔP was also normal. Subsequent tests with other assumed distributions for the random time series (e.g., uniform, gamma) showed that regardless of that choice, the resulting distribution of ΔP was always very close to normal. The results for the MDB (observed variance (1900–1999) 12,510 (mm yr−1)2) were for a mean ΔP of 0 (as per the assumption) and for a variance of the ΔP distribution of 826 (mm yr−1)2. The equivalent 2σ bound (95% confidence interval) was ∼±57 mm yr−1, and the overall bounds (assessed at the 0.01% percentiles) were ∼±110 mm yr−1. The resulting interpretation is that under the stationary assumption employed here, we could expect 95% of all changes in P over successive 30 year periods in the MDB to be within ∼±57 mm yr−1, with a further 5% up to the outer bounds (∼±110 mm yr−1) of the distribution. Changes beyond that would immediately identify that the time series cannot be stationary.
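The numerical experiment in this paragraph is straightforward to repeat. The sketch below assumes normally distributed, independent annual values and uses the observed variance quoted in the text; the function name, the seed, and the vectorized layout are illustrative choices rather than the authors' implementation. For independent values, the variance of the difference between two 30 value means is 2σ²/30, which for σ² = 12,510 gives 834 (mm yr−1)2, close to the simulated 826 reported above.

```python
import numpy as np

def simulate_delta_p(variance, n_years=130, window=30, n_trials=10_000, seed=0):
    """Difference (last `window` mean minus first `window` mean) of random stationary series."""
    rng = np.random.default_rng(seed)
    # Each row is one synthetic 130-value series with zero mean and the given variance.
    series = rng.normal(0.0, np.sqrt(variance), size=(n_trials, n_years))
    return series[:, -window:].mean(axis=1) - series[:, :window].mean(axis=1)

observed_variance = 12_510.0                 # (mm/yr)^2, value quoted in the text
delta_p = simulate_delta_p(observed_variance)

simulated_var = delta_p.var()
analytic_var = 2.0 * observed_variance / 30  # 834 (mm/yr)^2 for independent values
two_sigma = 2.0 * delta_p.std()              # compare with the ~57 mm/yr bound in the text
```

Swapping `rng.normal` for another zero-mean generator with the same variance (for example a shifted uniform or gamma draw) would be one way to repeat the distribution-sensitivity test mentioned in the text.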
3.5. Variability in Future Projection of ΔP in Climate Models Over the MDB
 Following the above example for the MDB, we can immediately see that the magnitude of the bounds (e.g., 2σ for 95% confidence or 0.01% for outer limits) must scale with the variance of the original time series. This has profound consequences for interpreting changes in the individual model runs. In particular, visual inspection of the raw output time series for each of the 39 model runs (Figure S1 in Text S1) shows that the variance in the model runs can be as much as double (e.g., CSIRO-Mk3.5 Run1 and MPI-ECHAM5 Run2) or as little as a quarter (e.g., IPSL-CM4 Run1 and GISS-AOM Run1 and Run2) of the observed variance. To test further, we calculated the variance of the annual time series for each of the 39 model runs in both the 20th and 21st centuries (Table 4). The results show the large range in the variance of the raw model output relative to observations over the 20th century.
 Using those insights, we prepared a plot showing the projected change ΔP (2070–2099 less 1970–1999) for each of the 39 model runs versus the variance of that model run for the 1900–1999 period. Overlaid on that plot are calculations (per the above description) of the statistical bounds (±1σ, ±2σ, ±3σ, maximum, and minimum) assuming a stationary distribution (Figure 5). Key features, some alluded to previously, emerge immediately. For example, the results for model 12, (one run of) model 16, (two runs of) model 14, and (two runs of) model 15 all fall outside the outer bounds for a random stationary process. The statistical characteristics of those time series have changed substantially, and those projections cannot be considered stationary.
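Because the bounds depend only on the variance of each run, the position of a run on the difference-variance plot can be classified analytically. The sketch below is an approximation under stated assumptions: it uses the independent-values result σ(ΔP) = √(2·variance/30) in place of the Monte Carlo bounds, and it places the outer bound at the normal quantile for a 0.01% tail (about 3.72σ), whereas the paper read the outer bounds from the simulated distribution. The function names and the example ΔP values are hypothetical and are not taken from Table 3 or Table 4.

```python
import math
from statistics import NormalDist

# Normal-approximation multiplier for the 0.01% tail (assumption; about 3.72).
OUTER_Z = NormalDist().inv_cdf(1 - 1e-4)

def delta_p_sigma(variance, window=30):
    """Standard deviation of the difference of two `window`-year means of independent values."""
    return math.sqrt(2.0 * variance / window)

def classify_run(delta_p, variance, window=30):
    """Place one model run relative to the stationary-process bounds."""
    z = abs(delta_p) / delta_p_sigma(variance, window)
    if z > OUTER_Z:
        return "outside outer bounds"
    if z > 2.0:
        return "outside 2 sigma"
    return "within 2 sigma"

# Hypothetical runs, all with the observed variance of 12,510 (mm/yr)^2.
labels = [classify_run(dp, 12_510.0) for dp in (30.0, 80.0, -150.0)]
```

With the observed variance, the analytic 2σ bound is about 58 mm yr−1, consistent with the simulated value of about 57 mm yr−1; a run with half that variance would have bounds narrower by a factor of √2.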
 A closer examination of the above noted model runs is warranted (see Figure S1 in Text S1). The time series for model 12 (IPSL-CM4) shows a long, steady decline in the mean with little change in variability around that trend. This model run violated the stationary assumption because the mean changed. However, the variability in the model run is much smaller than observed over the historical period, and the model simulation is not convincing. The time series for both runs of model 14 (MIROC3.2_MEDRES) show that the stationary assumption was violated because of a marked upward trend in both mean and variance. The time series for (two runs of) model 15 (MIUB-ECHO-G) show that the stationary assumption was violated because of a steady increase in the mean with little change in variability. The contrast here is interesting; each model run had readily understood reasons for violating the stationary assumption. However, the reasons were different. The remaining (one run of) model 16 (MPI-ECHAM5 Run2) is an enigma. This model run projected the largest of all decreases in P, and examination of the time series (Figure S1 in Text S1) shows a marked decline in the mean with perhaps a slight increase in variability around the mean. The enigma is that each of the four runs of this model gave markedly different results. In contrast, multiple runs from the other seven models tend to cluster together (Figure 5).
 The 2σ bounds approximate the 95% confidence interval. Using that as a guide, we can also identify many other model runs that are unlikely to be stationary, including models 2, 4, 5, and 6, (one run of) model 7, (one run of) model 8, (one run of) model 15, (two runs of) model 16, (two runs of) model 17, and models 19 and 20. In summary, the projected change ΔP falls outside the bounds of an assumed stationary process in six model runs and outside the ±2σ range in a further 13 model runs. The remaining 20 model runs fall within ±2σ bounds. Given the previous results for the autocorrelation analysis, it would be difficult to distinguish those time series from one generated by a purely random process.
 The single runs of the two CSIRO models present an interesting case study because both models projected large decreases in P that approximate the ±3σ level, implying that the projected time series has changed substantially. Results for model 5 (CSIRO-Mk3.5) show a large decrease in variability about the mean (Figure S1 in Text S1 and Table 4). However, model 4 (CSIRO-Mk3.0) shows the opposite with a large increase in variability about the mean (Figure S1 in Text S1 and Table 4). The central difficulty here is that both models have a variance during the historical period that is larger than observed. With only a single run available, it is difficult to come to any firm conclusions. More recent research with a slightly different variant of the Mk3.0 model has shown quite large differences in ΔP between multiple runs [Rotstayn et al., 2007, Figure 19].
 The bias correction procedure changed the variance of the annual time series in the climate model output (Table 4). We also used the bias-corrected model outputs to prepare Figure 6 analogous to Figure 5. While many of the details of the plot are different (compare Figure 5), the overall pattern and the general conclusions drawn above remain.
 A previous study on the MDB evaluating the capacity of different IPCC AR4 climate models to simulate temperature and precipitation found the IPSL-CM4 and CSIRO-Mk3.0 to be the best overall models [Maxino et al., 2008]. Of those, the IPSL-CM4 model was ranked the best for precipitation [Maxino et al., 2008, Table II]. We used the difference-variance framework to investigate the P annual time series in the (single) IPSL-CM4 model run in considerable detail. We found that this particular time series was not very convincing. The mean P was less than half the observed value, and more importantly, there was little year-to-year variation in P: the variance was around 1/6 of the observed value (Table 4 and Figure S1 in Text S1). The totally different conclusions highlighted above arise because the earlier work considered monthly, i.e., intra-annual, P, while we considered annual P, i.e., the interannual variation.
 Of the 20 models examined here, 8 had multiple runs available. Of those, the results for 7 models show that while there was some variation between different runs, the overall projections still tended to converge in a “region” of the difference-variance plot (Figure 5). There was an exception: multiple runs from the MPI-ECHAM5 model (model 16 in Figures 5 and 6) diverged. That exception raises an important point. Would any of the 12 models having a single run behave the same as the MPI-ECHAM5 model if multiple runs were submitted? Of course, we do not know the answer. In that respect, we believe that there is, at least at the moment, some reason to be cautious about overinterpreting the results from single runs of a climate model.
5. Summary and Conclusions
 The results presented here are based on 39 model runs from 20 different IPCC AR4 climate models (Table 1). For each of the 39 model runs, the main results are all derived from annual P time series for the historical period (1900–1999) and for a future (2000–2099) that follows the IPCC A1B emissions scenario. Other emission scenarios could have been used, but here we pursued an understanding of the statistical nature of the ensembles and of the simulations and projections for P. Our main focus was the projected change ΔP (defined as the difference 2070–2099 less 1970–1999).
 Of the 20 models, 12 contributed a single run, while two or more runs were available from the other 8 models. The model population was extremely diverse, with 7 (of the 39) runs being contributed by a single model. It was thus possible that a simple average over all runs would be biased toward those models contributing the most runs. To investigate this possibility, we created two P ensembles. The first, called the all–model run ensemble, included all 39 model runs. The second was formed by averaging the multiple runs (where necessary) across each model to create a 20 member all-model ensemble. Despite the heterogeneous nature, the statistical properties of the simulations and projections for P were, for all practical purposes, identical for both ensembles (Table 2). On that basis, we only summarize results from the all–model run ensemble.
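The difference between the two ensembles is only in how runs are weighted, which a small example makes concrete. This is an illustrative sketch with hypothetical model names and ΔP values; it is not drawn from Table 1 or Table 3.

```python
from collections import defaultdict
from statistics import mean

def ensemble_means(runs):
    """runs: list of (model_name, delta_p). Returns (all-model-run mean, all-model mean)."""
    # All-model run ensemble: every run counts once.
    all_run = mean(dp for _, dp in runs)
    # All-model ensemble: average the runs of each model first, then average the models.
    by_model = defaultdict(list)
    for model, dp in runs:
        by_model[model].append(dp)
    all_model = mean(mean(values) for values in by_model.values())
    return all_run, all_model

# Hypothetical case: model "A" contributes three runs, model "B" only one.
runs = [("A", 10.0), ("A", 20.0), ("A", 30.0), ("B", -20.0)]
all_run_mean, all_model_mean = ensemble_means(runs)
```

Here the run-weighted mean is 10 while the model-weighted mean is 0, showing how a model with many runs could in principle dominate a simple average; for the MDB the two ensembles turned out to be nearly identical.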
 The range in P among the raw model output was large (Figure 3a), with obvious bias relative to the observed time series (Figures 3b and 3c). After bias correction, the climatological range was much reduced (Figures 4b and 4c). Finally, while the bias correction did change the simulation (1970–1999) and projection (2070–2099) for P, it did not materially alter the projected change ΔP (Table 3).
 The main findings (Figures 5 and 6) are derived from the autocorrelation analysis (Figures 3d and 4d). We found that the observed annual P time series for the MDB could be considered to be a purely random time series with no time dependence at any of the (time) lags considered (Figure 3d). Just as importantly, we found that (nearly) all of the annual time series from the 39 model runs also shared the same basic characteristic. We emphasize that these results held in both observations and models, both before (Figure 3d) and after (Figure 4d) bias correction. The consequences are important.
 First, by assuming that the MDB annual P time series remains stationary into the future, one can estimate the probabilistic variations in ΔP that result solely from random fluctuations. This provides an extremely useful base case against which to assess the projections for ΔP by climate models. Second, it is important to consider the variance of the model time series when comparing different model estimates of ΔP because the magnitude of random fluctuations scales with the variance. In particular, the variance in individual model runs for the historical period was up to twice, or as little as 1/6, the observed value. Hence, it is virtually impossible to interpret differences in ΔP between different model runs without considering the variance of each model run. To address this situation, we developed the “difference-variance” plot (Figures 5 and 6). This enabled us to rapidly identify those model runs showing large changes in ΔP relative to their variance.
 What does all this imply for projections of ΔP in the MDB? In terms of the all–model run ensemble, there was a large range (∼±150 mm yr−1) in projected change ΔP (Table 2 and Figures 5 and 6). When the projections of ΔP are averaged over all model runs, the result is, more or less, zero change (Table 2). One contribution arising from this work is the “base case” scenario. For the MDB, we anticipate that a purely random process could lead to differences of ±57 mm yr−1 (95% confidence) between successive 30 year periods. This is equivalent to ±11% of the climatological P and, with all else constant, translates into variations in runoff of around ±29% (∼7.7 mm yr−1 on a catchment-wide basis and equivalent to ∼7700 GL yr−1) (Roderick and Farquhar, submitted manuscript, 2010). This sets a baseline for gauging modeled and/or observed changes.
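The percentages and volumes quoted in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the choice of 517.4 mm yr−1 as the climatological P and of 27 mm yr−1 as the reference runoff are assumptions about which values were used, and the ±29% runoff sensitivity is taken from the cited submitted manuscript rather than derived here.

```python
import math

p_clim = 517.4         # observed 1970-1999 mean P, mm/yr (assumed reference value)
bound = 57.0           # 95% bound on delta P between 30 year periods, mm/yr
runoff_obs = 27.0      # observed runoff, mm/yr (approximate, from section 3.1)
runoff_frac = 0.29     # runoff variation quoted from the submitted manuscript

frac_p = bound / p_clim                # fraction of climatological P, about 0.11
runoff_mm = runoff_frac * runoff_obs   # about 7.8 mm/yr, close to the ~7.7 quoted
ratio = runoff_frac / frac_p           # implied runoff-to-precipitation sensitivity

# Catchment area implied by equating ~7.7 mm/yr with ~7700 GL/yr (1 GL = 1e6 m^3).
implied_area_km2 = (7700 * 1e6) / 7.7e-3 / 1e6
```

The last line shows that the quoted depth and volume are mutually consistent for a basin area of about one million square kilometers.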
Appendix A: Adjustment for Evaporation in Mixed Land-Ocean (Water) Grid Boxes
 For mixed grid boxes with land and ocean or lake, the 100 year average (e.g., 1900–1999, 2000–2099) of P − E in most climate models initially was negative. As a consequence, the evaporation estimates for the land component of mixed grid boxes required adjustment for both periods of 1900–1999 and 2000–2099 to make the results physically realistic. The procedure adopted is described below.
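 Generally, for every grid box, the whole-box values are the area-weighted sums of the land and ocean components,
$$E = \alpha_f E_L + (1 - \alpha_f) E_O, \qquad P = \alpha_f P_L + (1 - \alpha_f) P_O,$$
where E, EL, and EO are annual evaporation for the whole grid box, land area (whose fractional area is αf), and ocean area of the grid box, respectively; P, PL, and PO are annual precipitation with the same meaning. Here, we assume, with the same respective meanings, P = PL = PO.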
 In the initial step, we set E = EL = EO. For the grid boxes where αf > 0 and EL > PL, we set
where E′L is the adjusted land evaporation for the grid box and $\bar{E}_L$ and $\bar{P}_L$ are the 100 year average annual land evaporation and precipitation for the grid box, respectively. For those grid boxes, the ocean evaporation will be
Then for the whole adjusted period (1900–1999) we have
$$\bar{E}'_L = \bar{P}_L.$$
Note that for a different period, e.g., 1970–1999, the averaged and adjusted evaporation is not necessarily equal to the averaged precipitation for that period.
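One way to realize the adjustment described in this appendix is sketched below. It is a plausible reading, not the authors' procedure: the proportional rescaling of land evaporation and the assignment of the remainder to the ocean fraction (so that the grid box total is unchanged) are both assumptions, chosen because they satisfy the stated constraint that the 100 year average of the adjusted land evaporation equals the 100 year average precipitation. The function name and the example values are hypothetical.

```python
import numpy as np

def adjust_land_evaporation(e_box, p_box, land_frac):
    """Adjust annual E for one grid box; returns (land E, ocean E) annual series."""
    e = np.asarray(e_box, dtype=float)
    p = np.asarray(p_box, dtype=float)
    # Initial step from the text: E = EL = EO (and P = PL = PO).
    e_land, e_ocean = e.copy(), e.copy()
    if land_frac > 0 and e_land.mean() > p.mean():
        # ASSUMPTION: proportional rescaling so the long-term mean of E'L equals mean PL.
        e_land = e_land * (p.mean() / e_land.mean())
        if land_frac < 1:
            # ASSUMPTION: the ocean part absorbs the remainder, keeping the box total fixed.
            e_ocean = (e - land_frac * e_land) / (1.0 - land_frac)
    return e_land, e_ocean

# Hypothetical three-year series for a half-land grid box (mm/yr).
e = np.array([600.0, 700.0, 800.0])
p = np.array([500.0, 400.0, 600.0])
e_land, e_ocean = adjust_land_evaporation(e, p, land_frac=0.5)
```

In this example the adjusted land evaporation averages 500 mm yr−1, matching the precipitation average, yet in the second year it still exceeds that year's precipitation, which illustrates why E can remain greater than P over shorter periods.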
 We acknowledge the modeling groups, the Program for Climate Model Diagnosis and Intercomparison (PCMDI), and the WCRP's Working Group on Coupled Modeling (WGCM) for their roles in making available the WCRP CMIP3 multimodel data set. Support of this data set is provided by the Office of Science, U.S. Department of Energy. This research was supported by the Murray-Darling Basin Authority (contract MD1318) and by the Australian Research Council (DP0879763).