## 1. Introduction

[2] Earth observation, whether from in situ networks, intensive (but usually sporadic) field campaigns, or satellite-based remote sensing retrievals, provides an inherently discontinuous stream of data. Remote sensing based retrievals of the terrestrial, ocean, atmosphere and subsurface systems have significant potential to inform a variety of Earth system modeling applications [*Brunner et al.*, 2004; *Li et al.*, 2009; *Milzow et al.*, 2009]. From a terrestrial hydrological perspective, satellite retrievals have been used to characterize spatially and temporally varying fields such as soil moisture [*Jeu et al.*, 2008; *Liu et al.*, 2011], evapotranspiration [*Kalma et al.*, 2008; *Su et al.*, 2007], rainfall [*Huffman et al.*, 1995; *Kummerow et al.*, 2000], and radiation [*Diak et al.*, 1996; *Weymouth and Le Marshall*, 2001], and even to seek observationally based hydrological closure [*Sahoo et al.*, 2011; *Sheffield et al.*, 2009]. However, one of the confounding problems with the use of such observations is the presence of spatial discontinuities, caused by incomplete coverage of the domain resulting from satellite orbital characteristics or by occlusion from cloud cover and other atmospheric effects. Such discontinuities often make satellite-based observations difficult to integrate within traditional modeling frameworks, which prefer spatially and temporally continuous data fields.

[3] The problem of gap filling in spatially discontinuous data sets, including those inherent in retrievals from Earth observing systems, has been the focus of many research investigations [e.g., *Boloorani et al.*, 2008; *Maxwell et al.*, 2007; *Wang et al.*, 2012; *Yuan et al.*, 2011; *Zhang et al.*, 2007]. In general terms, the gap-filling problem can be formulated as determining the value of a pixel subject to spatial constraints (it must be coherent with the surrounding values), temporal constraints (it must be coherent with the preceding values), and constraints related to any dependence on covariates (which is not necessarily linear). For example, topography is a covariate known to influence the spatial distribution of rainfall and air temperature [*Goovaerts*, 2000]. In many gap-filling studies, the reconstruction problem is relatively well defined for one or more of the following reasons: (1) the variable of interest is available at a time step that is close, relative to the temporal variability of the studied phenomenon, (2) the spatial extent of the gaps is small relative to the size of the features being reconstructed, (3) some strongly informative or linearly correlated covariates are available, and (4) there is only a single unknown variable to reconstruct, so the problem of preserving relationships between different uninformed variables does not arise. These types of problems can be described as strongly constrained gap filling, because the amount of information available to guide the interpolation is considerable. In such cases, relatively simple methods such as image compositing can successfully address the problem [*Cihlar*, 2000; *Du et al.*, 2001]. While cokriging generally gives better results than image compositing, the highly constrained nature of the problem is similar [*Pringle et al.*, 2009].

[4] In this paper, we address a more challenging class of problems, referred to herein as weakly constrained gap-filling problems. Weakly constrained problems include phenomena that change at subdaily time scales, where exhaustive measurements during preceding days may not be available or informative enough to fill gaps on some other day, and where the missing values cannot be inferred from linear correlation with a covariate. Another characteristic of weakly constrained gap filling is that the gaps are large compared to the size of the structures present in the image.

[5] As a result of the weak constraints imposed on the interpolation problem, the solution is necessarily nonunique, which stresses the need to quantify uncertainty. In previous studies, the high determinism of so-called strongly constrained gap-filling problems meant that the question of uncertainty in the interpolation results was generally not confronted. We adopt the framework of geostatistics, which offers the means of evaluating interpolation uncertainty, either through an estimation variance in the case of kriging, or through Monte Carlo analysis.
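
As an illustration of the Monte Carlo route, the following minimal Python sketch summarizes an ensemble of gap-filled realizations by its pixel-wise mean and standard deviation. The function name `ensemble_uncertainty`, the array layout, and the use of the standard deviation as the uncertainty measure are assumptions made for this example; they are not taken from the method described in this paper.

```python
import numpy as np

def ensemble_uncertainty(realizations, gap_mask):
    """Pixel-wise mean and spread over an ensemble of gap-filled fields.

    realizations : array of shape (n_real, ny, nx), each a complete field
    gap_mask     : boolean array (ny, nx), True where the pixel was missing
    (Both names and shapes are assumptions of this sketch.)
    """
    realizations = np.asarray(realizations, dtype=float)
    mean = realizations.mean(axis=0)
    # Sample standard deviation across realizations (requires n_real >= 2).
    std = realizations.std(axis=0, ddof=1)
    # Observed pixels are not simulated, so report zero spread there.
    return mean, np.where(gap_mask, std, 0.0)
```

A large standard deviation inside a gap indicates that the available constraints leave the reconstruction poorly determined at that location.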

[6] A popular approach to gap filling is cokriging [*Zhang et al.*, 2012]. *Zhang et al.* [2009] applied the technique to multispectral images, imposing correlation with a gap-free image of the same area taken four months earlier. However, kriging and its variations have two major limitations: they are smoothing interpolators, and they can only account for linear relations with covariates [*Chilès and Delfiner*, 1999; *Goovaerts*, 1997]. Using kriging can result in the interpolated areas being visibly distinct from the rest of the image, presenting unrealistically smooth textures and, if point measurements are available, artifacts near these locations. Other geostatistical techniques such as stochastic simulation represent the textures present in the data more faithfully. However, the underlying models are based on two-point statistics and may therefore not reproduce the complex spatial patterns present in known parts of the domain [*Journel and Zhang*, 2006]. Moreover, when dealing with multiple variables, these methods often assume linear relationships, which are oversimplifications in many environmental modeling applications [*Rivoirard*, 2001].

[7] In this paper we investigate the use of multiple-point geostatistics for gap-filling applications. The method employed here is the direct sampling approach [*Mariethoz and Renard*, 2010; *Mariethoz et al.*, 2010]. The technique exploits intrinsic relationships between linked observations and offers the capacity to provide more realistic spatially continuous fields from remote sensing based platforms, and thereby to broaden their effective use and integration within the Earth sciences. Multiple-point geostatistics methods use training images to describe a time-varying data set for periods other than the missing time, which then allows for the identification of specific spatial patterns that might be expected to recur in subsequent scenes. The spatial patterning and image structure can then be used to improve the gap-filling procedure. The supplementary use of multiple covariates, which are themselves incomplete, is at the foundation of the approach presented in this paper.
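
To make the idea of sampling patterns from a training image concrete, the following Python sketch shows a heavily simplified, univariate variant of direct sampling. It is not the implementation used in this paper: covariates are ignored, candidate locations are drawn at random with replacement instead of following a scan path, and the function name and the parameters `n_neighbors`, `threshold` and `max_fraction` are assumptions chosen for the example. The training image is assumed to be complete and at least as large as the image being filled.

```python
import numpy as np

def direct_sampling_fill(image, training, n_neighbors=8, threshold=0.05,
                         max_fraction=0.5, seed=0):
    """Fill NaN pixels of `image` with values sampled from `training`.

    Simplified sketch: `training` must be NaN-free and at least as large
    as `image` in both dimensions (assumptions of this example).
    """
    rng = np.random.default_rng(seed)
    out = np.array(image, dtype=float)
    training = np.asarray(training, dtype=float)
    ty, tx = training.shape
    assert ty >= out.shape[0] and tx >= out.shape[1]
    # Range of the training values, used to normalize pattern distances.
    value_range = float(training.max() - training.min()) or 1.0
    n_scan = max(1, int(max_fraction * training.size))

    gaps = np.argwhere(np.isnan(out))
    rng.shuffle(gaps)  # visit the missing pixels along a random path
    for y, x in gaps:
        known = np.argwhere(~np.isnan(out))
        if len(known) == 0:
            # No conditioning data at all: draw an unconditional value.
            out[y, x] = training[rng.integers(ty), rng.integers(tx)]
            continue
        # Data event: lag vectors and values of the closest informed pixels.
        lags = known - np.array([y, x])
        nearest = np.argsort((lags ** 2).sum(axis=1))[:n_neighbors]
        lags = lags[nearest]
        values = out[known[nearest, 0], known[nearest, 1]]
        # Training locations for which every lag stays inside the image.
        y_lo, y_hi = max(0, -lags[:, 0].min()), min(ty, ty - lags[:, 0].max())
        x_lo, x_hi = max(0, -lags[:, 1].min()), min(tx, tx - lags[:, 1].max())
        cand_y = rng.integers(y_lo, y_hi, size=n_scan)
        cand_x = rng.integers(x_lo, x_hi, size=n_scan)
        best_d, best_v = np.inf, np.nan
        for cy, cx in zip(cand_y, cand_x):
            pattern = training[cy + lags[:, 0], cx + lags[:, 1]]
            d = np.abs(pattern - values).mean() / value_range
            if d < best_d:
                best_d, best_v = d, training[cy, cx]
            if d <= threshold:
                break  # accept the first sufficiently similar pattern
        out[y, x] = best_v  # previously filled pixels condition later ones
    return out
```

Because each accepted value is copied from a location whose neighborhood resembles the data event, the filled region tends to inherit the texture of the training image, in contrast to the smoothing behavior of kriging. Running the function with different seeds yields different realizations.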

[8] In this preliminary assessment of the application of the technique to Earth system data, we apply the reconstruction method to regional climate model (RCM) simulations over southeastern Australia [*Evans and McCabe*, 2010] and use these as a synthetic surrogate for remote sensing based retrievals. The advantage of using synthetic model output as opposed to actual satellite data is the capacity (1) to artificially impose distributions of gaps that can reflect both expected orbital features and atmospheric conditions and (2) to assess subsequent image reconstructions against a spatially continuous modeled “truth.” An especially important aspect of using synthetic imagery is that it ensures we address a weakly constrained problem, by imposing gaps typically larger than the spatial structures present. It also allows the production of data sets where gaps occur simultaneously across multiple nonlinearly related variables, and the subsequent validation of the results against the known nonlinear relationship. Such a validation would be extremely difficult with real data.
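
The synthetic evaluation strategy can be sketched in a few lines of Python: gaps are imposed on a complete field, and a reconstruction is then scored only over the pixels that were hidden. The gap geometry (a band of missing columns standing in for an orbital gap, plus square patches standing in for clouds), the function names, and the choice of root-mean-square error as the score are illustrative assumptions and do not reproduce the scenarios used in this paper.

```python
import numpy as np

def impose_gaps(field, swath=(10, 20), n_clouds=3, cloud_size=5, seed=0):
    """Return a copy of `field` with NaN gaps, and the boolean gap mask.

    All parameters are hypothetical: `swath` is a range of missing columns,
    and the clouds are randomly placed squares of side `cloud_size`.
    """
    rng = np.random.default_rng(seed)
    gappy = np.array(field, dtype=float)
    ny, nx = gappy.shape
    mask = np.zeros((ny, nx), dtype=bool)
    mask[:, swath[0]:swath[1]] = True  # orbital-type gap
    for _ in range(n_clouds):
        cy = rng.integers(0, ny - cloud_size + 1)
        cx = rng.integers(0, nx - cloud_size + 1)
        mask[cy:cy + cloud_size, cx:cx + cloud_size] = True  # cloud-type gap
    gappy[mask] = np.nan
    return gappy, mask

def gap_rmse(truth, reconstruction, mask):
    """Root-mean-square error evaluated only over the imposed gaps."""
    diff = np.asarray(reconstruction, dtype=float)[mask] \
        - np.asarray(truth, dtype=float)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))
```

Restricting the score to the masked pixels avoids rewarding a method for simply returning the observed values unchanged.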

[9] In sections 2 and 3 we detail the structure and logic behind the direct sampling approach and then develop realistic scenarios based on these synthetic data to describe and assess the potential application of the technique to remote sensing retrievals.