## 1. Introduction

[2] Without effective options for energy storage, providers of wind energy must rely heavily on accurate short-term forecasts of wind speed to effectively maximize the proportion of energy captured from wind in the load of existing power grids [*Porter and Rogers*, 2010]. Both numerical and statistical models have been applied to the problem of weather forecasting, each with strengths relevant to wind forecasts over different time scales. Statistical models provide very accurate and computationally inexpensive probabilistic wind speed forecasts over short time scales (on the order of a few hours) using a combination of observed data and persistence; however, their accuracy tends to degrade quickly with increasing forecast lead time [e.g., *Brown et al*., 1984; *Kretzschmar et al*., 2004]. *Gneiting et al*. [2006] and *Hering and Genton* [2010] fine-tuned these techniques by incorporating directional components into the predictive function, adding a physical component to the forecast. Numerical weather prediction more robustly incorporates a physical representation of the system and, while computationally more expensive than statistical models, can provide skillful forecasts over longer periods of time with applications ranging from long-term global climate prediction to short-term, high-resolution localized wind forecasts [*Giebel et al*., 2003], even over complex terrain [*Clark et al*., 1997; *Grønås and Sandvik*, 1999]. Combining statistical and numerical forecast tools into statistically tuned physical representations of the atmospheric system can further improve wind forecasts by applying statistical postprocessing techniques to numerical ensemble outputs to generate sharpened and more reliable probabilistic forecasts [e.g., *Gel et al*., 2004; *Hamill and Whitaker*, 2006; *Berrocal et al*., 2007; *Pinson and Madsen*, 2009; *Sloughter et al*., 2010; *Delle Monache et al*., 2013].
Ensemble simulations, in general, add a probabilistic component that can be used to evaluate the uncertainty of a forecast and are used operationally by, for example, the European Centre for Medium-Range Weather Forecasts [*Molteni et al*., 1996] and the National Centers for Environmental Prediction [*Toth and Kalnay*, 1997], and in more specialized applications such as flood forecast models [e.g., *Bao et al*., 2011]. Monte Carlo techniques are common in hydrological applications, are described in stochastic hydrogeology textbooks [e.g., *Rubin*, 2003], and are used in advanced applications such as multimodel hydrologic ensembles to enhance the skill of predictions with Bayesian model averaging schemes [*Ajami et al*., 2007; *Duan et al*., 2007].

[3] The primary limitations in using numerical weather prediction to generate forecasts are the computational expense (a high-resolution forecast over a large spatial extent often requires the resources of high-performance computing environments) and model error stemming from simplifications and parameterizations in the numerical model's physical representation of the system it simulates [*Hanna and Yang*, 2001; *Eckel and Mass*, 2005]. Improvements in numerical models can be made by replacing assumptions and parameterizations with the mathematical representations of the physical processes they simulate. Of interest in this work is the representation in the modeling system of interactions between the land surface and the atmosphere. Work by *Chen and Avissar* [1994] clearly demonstrated a strong connection between soil moisture distribution on the land surface and regional scale precipitation and wind patterns. *Patton et al*. [2005] showed more specifically that heterogeneous surface soil moisture patterns are connected with very different development in the atmospheric boundary layer when compared with homogeneous land surface initializations, and *Holt et al*. [2006] further demonstrated that physically based land surface schemes that better represent vegetative processes and soil moisture initializations tend to yield more accurate forecasts of regional atmospheric disturbances. To capture the land surface-atmosphere physical connection, *Maxwell et al*. [2011] developed a fully coupled hydrologic and atmospheric model using the ParFlow (PF) hydrologic model [*Ashby and Falgout*, 1996; *Jones and Woodward*, 2001; *Kollet and Maxwell*, 2006; *Maxwell*, 2013] and the Weather Research and Forecasting (WRF) model [*Skamarock and Klemp*, 2008; *Skamarock et al*., 2008] in order to physically represent hydrologic processes from bedrock to the top of the atmosphere. 
This fully coupled model, PF.WRF, was used by *Williams and Maxwell* [2011] to show that reduction in uncertainty in subsurface characterization of hydraulic conductivity propagates into atmospheric variables, specifically wind speed.

[4] Assimilation of observations into forecast models represents another avenue by which improvements can be made to the forecasts produced by numerical simulations. In operational settings, 3-D and 4-D variational analysis (3D-Var and 4D-Var) are commonly used to assimilate observed quantities into the forecast model state, assuming an isotropic and static spatial distribution of error statistics. The adjoint of the forward model can be applied in these least squares techniques, allowing for time evolution of the error statistics in the case of 4D-Var. This Bayesian technique is not used in 3D-Var, which provides a computational savings [*Lorenc et al*., 2000; *Lorenc*, 2003]. The Kalman filter and its extension, the extended Kalman filter (detailed in section 2), eliminate the need for linear and adjoint models to be applied to the error covariance calculations and also eliminate the assumption of isotropy and stationarity [*Lorenc*, 2003; *Hamill*, 2006]. The application of Monte Carlo techniques makes the Kalman filtering technique more tractable with the ensemble Kalman filter (EnKF) [*Evensen*, 1994; *Anderson and Anderson*, 1999; *Anderson*, 2001; *Evensen*, 2003]. Adding a linear operator to the ensemble Kalman filter to reduce the unrestricted propagation of error due to artificially added noise through the system yields better results, allowing for smaller ensembles [*Anderson*, 2001], though, as we will show, larger ensembles are preferable and provide better results. *Evensen*'s [1992] experiments with the extended Kalman filter and the ensemble Kalman filter [*Evensen*, 1994] were focused on oceanographic applications, but these filters are also used for meteorological applications [*Lakshmivarahan and Stensrud*, 2009].
*Houtekamer and Mitchell* [1998] experimented with the error covariance characteristics of the ensemble Kalman filter in an atmospheric ensemble assimilating satellite and radiosonde observations. They used a dual filter technique in which two ensembles are used to generate error statistics and the error statistics of one ensemble are applied to the calculations of the other in an effort to reduce the effects of statistical “inbreeding.” The ensemble Kalman filter has also been applied to hydrologic problems ranging from back calculation of porosity and hydraulic conductivity from head and concentration measurements [*Li et al*., 2012] to assessment of variably saturated groundwater flow regimes to track contaminant transport [*Hendricks-Franssen et al*., 2011; *Kollat et al*., 2011]. *Reichle et al*. [2002] used the ensemble Kalman filter to update soil moisture fields in a land surface model forced with offline atmospheric data.

[5] To date, a robust data assimilation scheme has not been applied to a coupled hydrologic and atmospheric modeling system. *Wang et al*. [2012] used 3D-Var to integrate observations in a coupled flood forecasting system employing atmospheric and hydrologic models coupled in a one-way offline scheme (i.e., atmospheric outputs force the hydrologic model). The PF.WRF model is coupled online such that feedbacks between the atmosphere and the surface and subsurface are simulated. This coupling is described further in section 2. Using the EnKF functionality provided by the Data Assimilation Research Testbed (DART) [*Anderson et al*., 2009], an open-source software package that includes a selection of several algorithms to assimilate observational information (and contains interfaces for several models including WRF), we extend the existing WRF-DART framework to include the dynamic hydrology of ParFlow, improving the physical representation of the simulated system and incorporating observed data to improve the predictive skill in a numerical weather forecast. We use this fully coupled data assimilation framework to study interactions between subsurface processes and the atmospheric processes coupled with them, which can be traced by analyzing the propagation of uncertainty reduction from soil moisture fields through surface energy fluxes to downwind measurements of wind speed.

[6] We selected the EnKF as the data assimilation technique for the fully coupled modeling system because of its flexibility: it takes a Bayesian approach to calculating a flow-dependent forecast error covariance matrix, using prior information from the model state vector derived from ensemble error statistics. By using an ensemble technique as opposed to a single realization forecast, these error statistics are easier to calculate as needed at each observation update cycle, resulting in computational efficiency. We have extended the existing WRF-DART interface so that it can work with the fully coupled PF.WRF model and assimilate hydrologic variables not included in the standard WRF-DART interface. This paper presents the methodology for constructing the fully coupled hydrologic and atmospheric modeling system with advanced data assimilation algorithms and describes the process for verifying it by assimilating soil moisture observations in an idealized test case to evaluate the responses of the model to the assimilation of those observations.
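
The idea of deriving a flow-dependent error covariance from ensemble statistics can be illustrated with a minimal sketch. The example below is not the PF.WRF-DART implementation; it assumes a perturbed-observation form of the ensemble Kalman filter analysis step (in the spirit of *Evensen* [1994]), which may differ from the filter variant used through DART. The function name `enkf_update`, the two-variable toy state (soil moisture and wind speed), and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def enkf_update(ensemble, obs, obs_index, obs_var, rng):
    """Perturbed-observation EnKF analysis step for one scalar observation.

    ensemble  : (n_members, n_state) array of prior (forecast) states
    obs       : observed value of state variable `obs_index`
    obs_var   : observation error variance
    rng       : numpy random Generator used to perturb the observation
    """
    n_members = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)

    # Ensemble values and anomalies of the observed variable.
    hx = ensemble[:, obs_index]
    hx_anom = anomalies[:, obs_index]

    # Sample covariance between every state variable and the observed one.
    # This is the flow-dependent part: it is recomputed from the ensemble
    # at each update rather than prescribed in advance.
    cov_x_hx = anomalies.T @ hx_anom / (n_members - 1)
    var_hx = hx_anom @ hx_anom / (n_members - 1)

    # Kalman gain for a single scalar observation.
    gain = cov_x_hx / (var_hx + obs_var)

    # Each member assimilates its own noisy copy of the observation.
    perturbed_obs = obs + rng.normal(0.0, np.sqrt(obs_var), size=n_members)
    return ensemble + np.outer(perturbed_obs - hx, gain)

# Toy prior (assumption): wind speed is linearly tied to soil moisture plus noise.
rng = np.random.default_rng(0)
n_members = 200
soil = rng.normal(0.30, 0.05, n_members)                      # volumetric soil moisture
wind = 8.0 - 10.0 * (soil - 0.30) + rng.normal(0.0, 0.2, n_members)  # wind speed, m/s
prior = np.column_stack([soil, wind])

# Assimilate one soil moisture observation; wind speed is never observed.
posterior = enkf_update(prior, obs=0.25, obs_index=0, obs_var=0.01**2, rng=rng)

print("soil moisture spread:", prior[:, 0].std(), "->", posterior[:, 0].std())
print("wind speed spread:   ", prior[:, 1].std(), "->", posterior[:, 1].std())
```

In this toy setting, only soil moisture is observed, yet the ensemble spread of wind speed also shrinks because the gain is built from the sampled cross-covariance between the two variables. The linear soil moisture and wind relation here is an arbitrary assumption, not a result of the coupled model.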