We investigated whether seasonal soil moisture forecasts derived from a land surface model forced by seasonal climate forecast model outputs are more skillful than benchmark forecasts derived from the same land surface model but with forcings taken from resampled climatological precipitation, temperature and low level winds. For most forecast leads and over the western United States, soil moisture forecasts based on seasonal climate forecasts are no more skillful than the benchmark. For relatively short (one month) leads, the climate model-based forecasts are more skillful than the benchmark along a swath from the Gulf States to the Tennessee and Ohio Valleys and the Southwest monsoon region, where the climate model has skillful precipitation forecasts.
 Drought is among the costliest natural disasters in the United States, with average losses exceeding $10 billion per event [National Climatic Data Center, 2011]. Drought early warning systems based on hydroclimate forecasts can help local and federal governments reallocate resources to mitigate drought impacts [Hayes et al., 2004]. Currently, both the Environmental Modeling Center (EMC) of the National Centers for Environmental Prediction (NCEP) and the University of Washington (UW) routinely produce hydroclimate forecasts of soil moisture (SM) and runoff to support the Climate Prediction Center (CPC) operational Seasonal Drought Outlook. The EMC uses the hydrological prediction system developed by the Princeton University group [Luo et al., 2007], which is based on the NCEP Climate Forecast System version 1 (CFSv1). The UW uses the Ensemble Streamflow Prediction (ESP) method to predict SM and runoff on seasonal time scales [Wood and Lettenmaier, 2006]. Both systems use the Variable Infiltration Capacity (VIC) model as the core of their hydroclimate forecast systems.
 NCEP recently upgraded the CFS to version 2 (CFSv2), with improved model physics and higher spatial resolution (http://www.cfs.ncep.noaa.gov/cfsv2.info). Yuan et al. examined the forecast skill of 2 m temperature (T2m) and precipitation (P), and found a substantial improvement in month-1 forecast skill relative to CFSv1 over the conterminous United States (CONUS). The question we raise in this paper is whether these improvements in P and T2m forecasts lead to an improved ability to forecast SM, a primary variable required for agricultural drought forecasting.
 For seasonal SM forecasting, skill comes from the initial hydrologic conditions (IHCs) and from climate forecast (CF) skill. Koster et al. performed multimodel experiments to examine the influence of the IHCs on CF skill at subseasonal time scales. They found that better initialization clearly improved the skill of temperature forecasts, with more modest improvements in precipitation forecast skill. They did not address the effect of IHCs on SM forecast skill specifically. Shukla and Lettenmaier compared the SM and runoff forecast skill of ESP, a method widely used in hydrology that is based solely on knowledge of IHCs (no CF) as represented by a land surface hydrology model, with that of a method that Wood and Lettenmaier termed reverse ESP (rESP), which is based on climatology for IHCs but perfect CF. Comparing SM forecast skill for ESP and rESP isolates the contributions due to IHCs and CF. They found that the IHCs dominate SM forecast skill at leads of 1 to 2 months, and CF thereafter. For some parts of CONUS, such as the western interior region, IHCs play an important role even at longer leads. It should be emphasized that while ESP is a practical tool that is widely used in hydrology, rESP is not, because it assumes perfect forecasts. In this paper, rather than perfect CFs, we assess SM forecast skill relative to ESP for SM forecasts in which CFSv2 is the CF source.
2.1. VIC Simulation
 We used VIC model version 4.0.6 [Liang et al., 1994] to perform the forecast experiments. This is the same version of VIC that is used in the UW quasi-operational Surface Water Monitor (SWM; http://www.hydro.washington.edu/forecast/monitor). We ran the model in water balance mode (essentially meaning that the effective surface temperature is assumed to be equal to the surface air temperature) at a spatial resolution of 0.5 degrees. To spin up the model's SM and snow storages, the VIC model was run from 1 January 1979 to 1 December 2010, with initial conditions on 31 December 1978 taken from UW's SWM archive. Forcings for the simulation were derived from index station observations using the procedure outlined in Wood and Lettenmaier. This long-term simulation is labeled VIC(SIM). The SM taken from VIC(SIM) was also used for verification and to derive parameters for downscaling and error correction.
2.2. Bias Correction and Spatial Downscaling (BCSD) Method
 The BCSD method is a quantile mapping approach that is commonly used to correct biases in hydroclimate forecasts [Wood et al., 2002; Wood and Schaake, 2008]. It corrects the full probability distribution of the variable in question rather than only its mean. We applied BCSD to the CFSv2_VIC forcings to correct biases in the CFSv2 P and T2m forecasts.
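The quantile mapping idea can be illustrated with a minimal sketch. It is not the BCSD implementation used here: it assumes a simple empirical (step-wise) distribution with Weibull plotting positions, and the function names `quantile` and `quantile_map` are hypothetical. A forecast value is assigned its non-exceedance probability in the model climatology and is then replaced by the value with the same probability in the observed climatology.

```python
from bisect import bisect_right


def quantile(sample, prob):
    """Value of `sample` at non-exceedance probability `prob`.

    Assumes Weibull plotting positions r / (n + 1) and linear interpolation
    between order statistics; values outside the range are clamped.
    """
    ordered = sorted(sample)
    n = len(ordered)
    pos = prob * (n + 1) - 1.0          # fractional zero-based index
    pos = min(max(pos, 0.0), n - 1.0)   # clamp to the sampled range
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def quantile_map(value, model_climatology, observed_climatology):
    """Map a model value onto the observed distribution (illustrative only)."""
    ordered_model = sorted(model_climatology)
    n = len(ordered_model)
    rank = bisect_right(ordered_model, value)   # number of model values <= value
    rank = min(max(rank, 1), n)                 # keep probability inside (0, 1)
    prob = rank / (n + 1)
    return quantile(observed_climatology, prob)
```

In this sketch a model value at the median of the model climatology is mapped to the median of the observed climatology, whatever the bias between the two distributions.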
2.3. ESP_VIC and CFSv2_VIC Experiments
 All experiments were carried out for the period during which CFSv2 hindcasts are available (1982–2009). We examined forecasts initiated on 1 January and 5 July. Both the ESP_VIC and CFSv2_VIC experiments have the same IHCs, obtained from VIC(SIM) on the same forecast date for the target year.
 For a given target year, we formed the ESP_VIC ensembles by selecting at random N years from the historical period 1950–2009, excluding the target year. For each ensemble member, VIC forcings (P, Tmax and Tmin) were taken directly from the gridded data for the forecast period, beginning with the forecast initialization date and proceeding for a 3-month period. Other variables required to force VIC, such as downward solar and longwave radiation, were indexed to the daily mean temperature and temperature range following the approach outlined in Maurer et al., while surface wind was taken from the lowest vertical level of the NCEP/NCAR reanalysis, which starts in 1948. These forcings were then used to drive the VIC model to obtain SM values for each day of the forecast period. The process was repeated for each of the N ensemble members. The ensemble average SM forecast is the equally weighted mean of all members. We tested the ESP forecasts for N = 10, 20, 40 and 60 and found that N of about 20 produced stable results, which is approximately consistent with the 16 ensemble members available for CFSv2 (see below).
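The ensemble construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the experiments: the function names are hypothetical, years are drawn without replacement (the text does not say whether sampling was with or without replacement), and the requested ensemble size is capped at the number of available years.

```python
import random


def select_esp_years(target_year, n_members=20,
                     first_year=1950, last_year=2009, seed=None):
    """Pick historical years at random for one ESP ensemble.

    The target year is excluded so that the forecast never sees its own
    forcing. Sampling without replacement is an assumption of this sketch.
    """
    candidates = [year for year in range(first_year, last_year + 1)
                  if year != target_year]
    rng = random.Random(seed)
    n_draw = min(n_members, len(candidates))
    return rng.sample(candidates, n_draw)


def ensemble_mean(member_series):
    """Equally weighted mean across members, computed time step by time step.

    `member_series` is a list of equal-length SM series, one per member.
    """
    n_members = len(member_series)
    return [sum(values) / n_members for values in zip(*member_series)]
```

Each selected year would supply the P, Tmax and Tmin forcing for one VIC run started from the same IHCs.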
 For the CFSv2_VIC forecasts, the VIC forcings were derived from the CFSv2 seasonal hindcast archive at the National Climatic Data Center (NCDC). The archived CFSv2 seasonal hindcasts were initialized every 5 days from 1 January 1982 to 27 December 2009 with a frozen model and data assimilation system. On each initialization day, four forecast runs were started at 0Z, 6Z, 12Z and 18Z. Each run lasts for 9 months. To obtain a total of 16 ensemble members, we used the four members initialized on the nominal forecast date (Dfcst) together with the four members initialized on each of Dfcst-5, Dfcst-10 and Dfcst-15. Time series of 6-hourly P, T2m and 850-hPa winds were obtained from the CFSv2 hindcast archive for each ensemble member. The forecast date Dfcst is always the first day in the target month on which a set of four ensemble members was initiated. For each ensemble member, the daily Tmax, Tmin, P and 850-hPa winds were bilinearly interpolated from the CFSv2 grid (with an approximate resolution of 1 degree) to the VIC grid, which has a spatial resolution of 0.5 degrees. Monthly mean P and T2m at each lead were corrected for bias using the BCSD method. We chose to correct monthly means instead of daily means because the 28-year record is not long enough to establish a stable daily climatology, and to avoid problems with misrepresenting interactions among the three primary variables. The correction was distributed equally to all days within the month. Forcings derived from the bias-corrected daily P, Tmax and Tmin were then used to drive the VIC model to obtain the SM forecasts.
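The lagged 16-member ensemble can be enumerated with a short sketch. It assumes that the initialization days are obtained by simple date subtraction from Dfcst, which may not match how the archive handles leap years, and the function name `lagged_ensemble_members` is hypothetical.

```python
from datetime import date, timedelta


def lagged_ensemble_members(forecast_date,
                            lags_days=(0, 5, 10, 15),
                            cycles_utc=(0, 6, 12, 18)):
    """List (initialization day, UTC hour) pairs for the lagged ensemble.

    Four cycles on each of four initialization days give 16 members.
    """
    members = []
    for lag in lags_days:
        init_day = forecast_date - timedelta(days=lag)
        for hour in cycles_utc:
            members.append((init_day, hour))
    return members


# Example: the members used for a forecast with Dfcst = 5 July 1995.
for init_day, hour in lagged_ensemble_members(date(1995, 7, 5)):
    print(init_day.isoformat(), f"{hour:02d}Z")
```

For a 1 January forecast the lagged days fall in the preceding December, for example 27 December for Dfcst-5.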
 The monthly mean CFSv2_VIC and ESP_VIC SM forecasts were corrected using BCSD to the probability distribution of the historical simulation, VIC(SIM). Even though P, Tmax and Tmin are bias corrected for CFSv2_VIC, the relationship between SM forecasts and the forcings is not linear. For example, errors in evapotranspiration will feed back to SM forecasts. Therefore, we chose to perform a second-stage error correction. For CFSv2_VIC, we found that the skill is generally higher after correction. For ESP_VIC, we found that the second-stage error correction did not result in statistically significant differences in the forecasts. For consistency, however, we applied the second-stage correction to both sets of forecasts.
 The SM hindcasts from both experiments (ESP_VIC and CFSv2_VIC) were cross-validated against VIC(SIM) for the target year. The root-mean-square error (RMSE) and the correlation between hindcasts and VIC(SIM) were used to estimate forecast skill. For the 5 July initial conditions, the July means for both the hindcasts and the corresponding verification from VIC(SIM) are 26-day means. We normalized the RMSE by the standard deviation of the SM anomalies from VIC(SIM). If the normalized RMSE is greater than 1, then there is no skill because the errors are larger than the interannual variability. We considered alternative forecast evaluation strategies, for instance the use of a multimodel, rather than a single-model (VIC), simulation. However, the forecast errors in general are much larger than the uncertainties among the models. We therefore chose to verify forecasts against VIC(SIM) as outlined above.
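The two skill measures can be written compactly. The sketch below is illustrative only: it works on anomaly series for a single grid cell, uses the population standard deviation (dividing by n) for the normalization, which is an assumption, and omits the cross-validation step. The function names are hypothetical.

```python
import math


def normalized_rmse(forecast_anomalies, verification_anomalies):
    """RMSE of the forecasts divided by the standard deviation of the
    verification anomalies. Values above 1 indicate no skill."""
    n = len(verification_anomalies)
    mse = sum((f - v) ** 2
              for f, v in zip(forecast_anomalies, verification_anomalies)) / n
    mean_v = sum(verification_anomalies) / n
    var_v = sum((v - mean_v) ** 2 for v in verification_anomalies) / n
    return math.sqrt(mse) / math.sqrt(var_v)


def correlation(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)
```

With this normalization, a forecast of zero anomaly every year (climatology) has a normalized RMSE of exactly 1 when the verification anomalies have zero mean, which is why 1 serves as the no-skill threshold.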
 To measure the relative skill of the two experiments, we calculated the RMSE ratio R between the two experiments. Let RMSE(i) be the RMSE for hindcasts produced by experiment i (i = 1, 2). Then

$$R = \frac{\mathrm{RMSE}(1)}{\mathrm{RMSE}(2)}.$$

If R is less than 1, then experiment 1 has higher skill than experiment 2. The reverse is true if R is greater than 1 [Shukla and Lettenmaier, 2011].
 Let the variance S_i be equal to (RMSE(i))^2 (assuming bias is small as a result of bias correction). We tested whether the difference between the variances S_1 and S_2 of the two experiments is statistically significant at the 5% level using Bartlett's test as applied by Lettenmaier and Burges.
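The ratio and the variance test can be sketched together. This is a generic two-sample form of Bartlett's test and may differ in detail from the application by Lettenmaier and Burges. It assumes that R is RMSE(1) divided by RMSE(2), that the forecast errors of the two experiments are treated as independent samples, and that the test statistic is compared with a chi-square distribution with one degree of freedom. The function names are hypothetical.

```python
import math


def rmse_ratio(errors_1, errors_2):
    """R = RMSE(1) / RMSE(2), from forecast-minus-verification errors."""
    rmse_1 = math.sqrt(sum(e * e for e in errors_1) / len(errors_1))
    rmse_2 = math.sqrt(sum(e * e for e in errors_2) / len(errors_2))
    return rmse_1 / rmse_2


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def bartlett_two_sample(errors_1, errors_2):
    """Bartlett's test for equal variances of two samples.

    Returns (statistic, p_value). With two groups the statistic has one
    degree of freedom, so the chi-square tail probability reduces to
    erfc(sqrt(statistic / 2)).
    """
    n1, n2 = len(errors_1), len(errors_2)
    s1, s2 = sample_variance(errors_1), sample_variance(errors_2)
    dof = n1 + n2 - 2
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / dof
    numerator = (dof * math.log(pooled)
                 - (n1 - 1) * math.log(s1)
                 - (n2 - 1) * math.log(s2))
    correction = 1.0 + (1.0 / 3.0) * (1.0 / (n1 - 1) + 1.0 / (n2 - 1) - 1.0 / dof)
    statistic = max(numerator / correction, 0.0)  # guard against round-off
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

A difference would be called significant at the 5% level when the returned p-value is below 0.05. When the bias is small, the sample variance of the errors is close to the squared RMSE, which is the quantity compared in the text.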
3. Forecast Skill
Figure 1 shows the normalized RMSE of the ESP_VIC forecasts initialized on 1 January and 5 July; the corresponding skill for CFSv2_VIC is given in Figure 2. The correlations for ESP_VIC at Lead 1 are also given (Figures 1d and 1h). There is a close correspondence between the correlations and the RMSEs. Forecasts in areas where the normalized RMSE was greater than 1 were considered unskillful. Skill is regionally dependent: it is higher over the western interior of CONUS, where forecasts are skillful even at leads longer than one month, and lower over the eastern U.S. At Lead 1, forecasts from both ESP_VIC and CFSv2_VIC generally are skillful. At Lead 2, both forecasts are skillful mostly over the western interior of CONUS. The areas where CFSv2_VIC has low skill are also areas where ESP_VIC has low skill. For July, both forecasts show no skill at Lead 2 east of about 100°W.
Figures 1c and 1g show the RMSE ratio for ESP relative to persistence. For persistence forecasts, the SM anomaly at Lead 1 is equal to the SM anomaly for the previous month. At Lead 1 the ratio R (ESP/persistence) is less than 1 essentially everywhere. ESP_VIC forecasts are more skillful than persistence because they use full knowledge of the IHCs and the climatological forcing, whereas persistence carries forward only the previous month's SM anomaly.
 The question we seek to answer is whether SM forecasts derived from CFSv2_VIC are more skillful than those derived from ESP_VIC, which resamples its forcing from climatology. Figures 3a–3d show contours of the RMSE ratio (ESP_VIC/CFSv2_VIC). The red (black) contour lines indicate where CFSv2_VIC (ESP_VIC) generally has higher skill. Areas where the differences between the variances of CFSv2_VIC and ESP_VIC are statistically significant at the 5% level, based on the Bartlett test as applied by Lettenmaier and Burges, are shaded blue.
 Overall, the differences in skill between the two experiments are statistically significant only at Lead 1. For January, the areas with statistically significant skill differences cover the North Central and the western interior regions. For July, they are limited to the area west of 115°W. In these areas, ESP_VIC is more skillful than CFSv2_VIC (ratio < 0.9, black contour lines). CFSv2_VIC adds more skill over the eastern CONUS, which arguably is more dynamically active (RMSE ratio > 1.1, red contour lines), but only where the P forecasts are skillful. These are the regions where the RMSE of rESP is low relative to that of ESP, that is, where CF is especially important to overall forecast skill [Shukla and Lettenmaier, 2011]. The ESP approach is generally most skillful in the areas where SM is persistent and is less skillful over the dynamically active areas where the interannual variability of soil moisture is low compared with that of precipitation during the forecast period. This latter condition leads to low ESP forecast skill along a swath from the Gulf States to the Tennessee and Ohio Valleys even at Lead 1. Precipitation over these areas depends on the path and strength of moisture transport from the Gulf of Mexico, which is determined by dynamic forcings. For the Southwest, SM increases after the monsoon onset, which varies from late June to early August [Higgins et al., 1997]. The timing of monsoon onset and retreat depends on the establishment of the monsoon circulation. ESP_VIC does not have that information, and arguably for this reason ESP_VIC forecast skill is lower than that of CFSv2_VIC over Arizona and New Mexico for July.
Figures 3e–3h show the cross-validated normalized RMSE of the CFSv2 P forecasts, after BCSD bias correction of the monthly means, verified against the P analyses. There is a good correspondence between the forecast skill for P and the RMSE ratio R for SM (Figures 3a–3d). The areas where the P forecasts have high skill are also the regions where CFSv2_VIC SM forecasts have higher skill than ESP_VIC. For January, ESP_VIC is more skillful than the CFSv2_VIC forecasts over the interior of the West and the North Central CONUS at Lead 1 (Figure 3a). In these regions, P forecasts have low skill, with normalized RMSE > 1.2 (Figure 3e). On the other hand, CFSv2 P forecasts are generally skillful over the Gulf coast and the Ohio and Tennessee Valleys (even at Lead 2 for January forecasts). These are regions where the CFSv2_VIC SM forecasts are more skillful than ESP_VIC. For July, ESP_VIC has higher skill over most of the West, where the P forecast skill is low.
4. Discussion and Conclusions
 We have evaluated ESP_VIC and CFSv2_VIC SM forecasts over the CONUS for January and July for the period 1982–2009. As pointed out by Shukla and Lettenmaier, SM forecast skill is regionally and seasonally dependent. Overall, predictive skill is higher over the western part of CONUS for both ESP_VIC and CFSv2_VIC and lower over the eastern part of CONUS.
 To estimate the persistence, we computed the characteristic time To [Trenberth, 1984] from the autocorrelation R(i) at lag i (in months) for i = 1 to N, with N = 30:

$$T_o = 1 + 2 \sum_{i=1}^{N} \left(1 - \frac{i}{N}\right) R(i).$$
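A minimal sketch of this calculation follows. It assumes the Trenberth-style weighted sum of autocorrelations with triangular weights (1 - i/N) and N = 30 monthly lags, and a simple lagged autocorrelation estimator normalized by the full-series variance. The function names are hypothetical, and the input is assumed to be a monthly SM anomaly series for one grid cell.

```python
def autocorrelation(series, lag):
    """Lagged autocorrelation of a series, normalized by its total variance."""
    n = len(series)
    mean = sum(series) / n
    anomalies = [x - mean for x in series]
    denominator = sum(a * a for a in anomalies)
    numerator = sum(anomalies[t] * anomalies[t + lag] for t in range(n - lag))
    return numerator / denominator


def characteristic_time(series, max_lag=30):
    """Characteristic time, in time steps of the input series.

    Assumed form: To = 1 + 2 * sum_{i=1..N} (1 - i/N) * R(i), with N = max_lag.
    """
    total = 0.0
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / max_lag
        total += weight * autocorrelation(series, lag)
    return 1.0 + 2.0 * total
```

For a series with no memory the sum is close to zero and To is close to 1 month, while a strongly persistent series gives a To of many months.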
 Over the CONUS, there are two distinct hydroclimate regimes. The interior of the West is dry and has high water holding capacity. The characteristic time To computed from VIC(SIM) SM is about 2 years over the western U.S. The eastern U.S. is wetter, with more frequent precipitation. There, SM is less persistent, but To for SM nonetheless is about 6 months, which is much longer than To for precipitation. This accounts for the regional differences in SM forecast skill.
 Persistence has commonly been used as a baseline for evaluation of forecast skill because of its simplicity and availability [Schubert et al., 1992]. For SM, persistence can produce relatively skillful forecasts. However, ESP forecasts are more skillful than persistence (Figures 1c and 1g) because they exploit full knowledge of the IHCs and the seasonal cycle of climatologic forcing. Furthermore, it is always possible to obtain reliable IHCs (assuming the existence of consistent long-term model forcing data, which is generally the case over CONUS) from a land surface model such as VIC. Therefore, we argue that for SM, ESP is a more relevant benchmark than persistence. It sets a higher bar for alternative methods, such as CFSv2_VIC.
 Does CFSv2_VIC add any value to the ESP_VIC forecasts? Over the western interior of CONUS, ESP_VIC generally is superior to CFSv2_VIC because of the strong persistence of SM and the low skill of the CFSv2 P forecasts for January and July. Figure 3 shows that there is a good correspondence between areas where the P forecasts have skill and areas where CFSv2_VIC has RMSE ratios greater than 1.1, especially at Lead 1. When and where P forecasts from the CFSv2 are skillful, CFSv2_VIC does add value to SM forecasts.
 The CFSv2 hindcasts were performed every 5 days. In the 16-member ensemble, the oldest members are about 15 days old. The ensemble therefore does not fully exploit the skillful weather forecasts at the beginning of the forecast period [Shukla et al., 2012]. Better weighting of the CFSv2 hindcasts may improve forecast skill.
 This project was supported by the NOAA (CTB) grant J8R1RP7-P01 to the Climate Prediction Center, and by NOAA (CTB) grant NA10OAR4310245 to the University of Washington.
 The Editor thanks the two anonymous reviewers for their assistance in evaluating this paper.