Sparse geologic dictionaries provide a novel approach for subsurface flow model representation and calibration. Learning sparse dictionaries from prior training data sets is an effective approach to describe complex geologic connectivity patterns in subsurface imaging applications. However, the computational cost of sparse learning algorithms becomes prohibitive for large models. Performing the sparse dictionary learning process on smaller image patches (segments) provides a simple approach to address this problem in image processing applications. However, in underdetermined subsurface flow model calibration inverse problems, reconstruction of a segmented image can introduce significant structural distortion and discontinuity at the boundaries of the segments. This paper proposes an alternative sparse learning approach where the sparse dictionaries are learned from low-rank representations of the large-scale training data set in spectral domains (e.g., frequency domain). The objective is to develop a computationally efficient dictionary learning approach that emphasizes large-scale spatial connectivity patterns. This is achieved by removing the strong spatial correlations in the training data, thereby eliminating a large number of insignificant components from the sparse learning computation. In addition to improving the computational complexity, sparse learning from low-rank training data sets suppresses the small-scale details from entering the reconstruction of large-scale connectivity patterns, thereby providing a regularization effect in solving the resulting ill-posed inverse problems. We apply the proposed approach to travel-time tomography inversion and nonlinear subsurface flow model calibration inverse problems to demonstrate its effectiveness and practicality.
Reliable prediction of subsurface flow and transport plays an important role in effective resource development planning and implementation. In particular, the spatial distribution of subsurface flow properties, and their connectivity, dominates the flow and transport behavior in geologic formations, so these properties must be represented properly. To infer the connectivity patterns in subsurface flow properties, an inverse modeling framework is applied to combine flow and transport response measurements with a forward model that relates the observed quantities to the subsurface properties of interest. Formulation and solution of the resulting inverse problem have been widely discussed in the subsurface flow and transport literature [e.g., Yeh, 1986; McLaughlin and Townley, 1996; Carrera et al., 2005; Oliver and Chen, 2011; Dai and Samper, 2004].
A major difficulty in solving subsurface characterization inverse problems is data limitation, which is caused by the inconvenience and high cost of data acquisition. In general, the number of parameters used to describe distributed subsurface properties with a high enough resolution overwhelmingly exceeds the number of available measurements, rendering the resulting inverse problems severely ill-posed. As a consequence, an infinite number of subsurface property maps can be conceived that reproduce the measurements but provide very distinct predictions. The solution set of such ill-posed inverse problems is typically constrained by incorporating prior information to discriminate against implausible outcomes. However, the prior model carries its own uncertainty. A general approach to incorporate the uncertainties in all prior knowledge (as well as data and model errors) is the probabilistic approach [Kaipio and Somersalo, 2007; Tarantola, 2005a; Kitanidis, 2012; Harp et al., 2008; Ye and Khaleel, 2008; Dai et al., 2010].
Deterministic formulations of ill-posed inverse problems, on the other hand, use prior knowledge to constrain the solution set, for example, by adding, either implicitly or explicitly, information about the attributes of the desired solution or by using the prior knowledge to reduce the number of parameters (i.e., parameterization). Classical regularization methods such as Tikhonov regularization [Tikhonov and Arsenin, 1977] of zeroth, first, and second order promote the shortest, smoothest, and flattest solutions, respectively. These classical methods are useful when strong evidence suggests that the properties of interest should exhibit gradual spatial variability. In formations characterized by abrupt spatial changes (e.g., in facies distribution), more sophisticated techniques must be employed. Parameterization methods such as zonation [Jacquard, 1965; Yeh and Yoon, 1981; Tsai et al., 2003], level-set [Berre et al., 2007; Dorn and Villegas, 2008; Cardiff and Kitanidis, 2009], sparse geological indicator [Dai et al., 2005], and truncated Gaussian/pluri-Gaussian [Galli et al., 1994] have been introduced for abruptly changing geologic formations (or facies types). However, these methods generally rely on knowledge about the number, location, and shape of the facies, and are known to be inflexible and, in most cases, hard to interpret geologically. In recent years, multiple-point geostatistics has been proposed for generating subsurface facies maps from a conceptual model that experienced geologists can infer from outcrop surveys and process-based modeling [Strebelle, 2002; Caers, 2003; Ronayne et al., 2008].
Prior training methods have been widely used in machine learning and computer vision. A popular example is the face recognition problem, where a large training data set of human faces is used to inform the face recognition inversion algorithm of the expected patterns in the solution, thereby constraining the feasible solution set. Similarly, reliable prior information can significantly enhance the plausibility and quality of the inversion solutions in subsurface imaging problems, in particular when complex geologic connectivity patterns are considered. For example, a training data set that characterizes the general shape, connectivity, and geometric attributes of the expected geologic patterns can be adopted to significantly constrain the connectivity in the solution. In a recent publication [Khaninezhad et al., 2012], sparse geologic dictionaries were introduced as a flexible approach for summarizing and incorporating prior geologic knowledge into the solution of subsurface flow model calibration inverse problems. The sparse learning approach is motivated by the recent developments in the sparse representation and approximation literature, formalized under the compressed sensing paradigm [Donoho, 2006; Candès and Tao, 2005; Candès et al., 2006; Elad, 2010]. In particular, learned sparse dictionaries [Tošić and Frossard, 2011; Aharon et al., 2006] have been introduced as an alternative to generic image compression bases, where specialized dictionaries are learned from application-specific prior data sets for a more effective representation of patterns similar to those in the prior data set.
Several approaches have been introduced to implement sparse dictionary learning in image processing [Aharon et al., 2006; Tošić and Frossard, 2011; Khaninezhad et al., 2012]. In Khaninezhad et al. [2012], the formulation and advantages of learned sparse dictionaries for solving ill-posed subsurface flow inverse problems are discussed using the K-SVD algorithm [Aharon et al., 2006]. The computational cost of learning sparse dictionaries can, however, limit their application to large (full-size) images. To circumvent this problem, in image processing, sparse dictionaries are learned from smaller image segments/patches that are combined to form a full image. In data-deficient inverse modeling applications, such as subsurface model calibration, spatial image segmentation can lead to structural distortion and discontinuity at the boundaries of the image patches. Therefore, an approach that honors the intrinsic and desired properties of the application of interest is more appropriate. For example, in data-scarce geoscience applications where the focus is primarily on identifying the overall shape and connectivity of the large-scale geologic features in the solution, an alternative approach to spatial segmentation should be sought. In this paper, we address this issue by learning sparse dictionaries from low-rank (low-dimensional) spectral representations of the training data set. To this end, we first consider the spectral representation of the prior data set in a linear compressive basis and discard the expansion dimensions that correspond to insignificant details to arrive at an effective low-rank representation of the full-size image library. This step significantly reduces the dimension of the training data set without introducing a noticeable loss of quality in the prior model of facies connectivity. The subsequent sparse learning from the low-dimensional representation of the prior geologic connectivity can be performed efficiently. In addition to computational gain, working with the proposed low-rank data representations eliminates small-scale details from the solution and facilitates the estimation of large-scale connectivity features.
2.1. Dictionary Learning
Conventionally, generic compression bases such as the discrete cosine transform (DCT) [Britanak et al., 2006; Jafarpour and McLaughlin, 2009] and the discrete wavelet transform (DWT) [Mallat, 2009] are used to provide highly sparse representation or approximation of a given signal or image. The generality of these transforms offers the robustness that is required for compressing arbitrary natural images, which explains their application in the JPEG and JPEG2000 compression standards. However, in many applications where compact representation of specific types of images (e.g., face or fingerprint images) is of interest, tailored bases or dictionaries that are learned from a training data set with the same class of images are more effective. Sparse dictionary learning can achieve exactly this goal; that is, to learn sparse representations of a class of images from a given training data set consisting of instances of similar images as prior information. For example, a training data set can be constructed by vectorizing and stacking L images of size N (pixels) as columns of a matrix $U = [u_1, \ldots, u_L] \in \mathbb{R}^{N \times L}$. This training data set can then be used to learn a dictionary $D$ such that a sparse set of words (elements) from this dictionary can be combined to approximate any image $u_i$ in the training data set (Table 1 lists the main notations that are used in this paper). Examples of methods used to construct sparse dictionaries from U are the method of optimal directions (MOD) [Engan et al., 2000], the K-SVD algorithm [Aharon et al., 2006; Tošić and Frossard, 2011; Khaninezhad et al., 2012], and the online dictionary learning approach [Mairal et al., 2010].
Table 1. Notations
Symbol    Description
U         Training data set containing L (vectorized) training samples
Φ         Low-rank basis (truncated DCT or SVD transform matrix)
V         Low-rank representation of the training data set (U = ΦV)
D         Learned dictionary either from the training data set U or the low-rank data set representation V
θ         Coefficients of the sparse representation in dictionary D
A         Measurement matrix (mapping model to data)
g(·)      Nonlinear mapping from permeability to flow data
G         Sensitivity matrix of the nonlinear mapping function g(·)
Given the training data set U, any dictionary learning process essentially solves the following optimization problem for a fixed value of ε (a small approximation error):
$$\min_{D,\,\Theta} \sum_{i=1}^{L} \|\theta_i\|_0 \quad \text{subject to} \quad \|u_i - D\theta_i\|_2 \le \epsilon, \quad i = 1, \ldots, L, \tag{1}$$
where $\Theta = [\theta_1, \ldots, \theta_L]$ and $\|\theta_i\|_0$ denotes the number of nonzero entries in the vector $\theta_i$. In this paper, we loosely define
$$\|x\|_p = \Big(\sum_{i} |x_i|^p\Big)^{1/p} \tag{2}$$
as the $\ell_p$ norm of the vector x (note that for p < 1 the definition of norm does not apply). For simplicity, we write $\|x\| = \|x\|_2$ throughout this paper. The optimization in equation (1) can be equivalently expressed as:
$$\min_{D,\,\Theta} \sum_{i=1}^{L} \|u_i - D\theta_i\|_2^2 \quad \text{subject to} \quad \|\theta_i\|_0 \le S, \quad i = 1, \ldots, L, \tag{3}$$
where S is the number of active (nonzero) elements used to approximate each training image. In fact, different dictionary learning techniques use different methods to solve the optimization problems in equations (1) and (3). Numerical examples show that the MOD and K-SVD have comparable performance in terms of solution and computational complexity, with the K-SVD having a small advantage [Elad, 2010]. Another approach is the online dictionary learning algorithm, which is less accurate but computationally more efficient than the other methods, especially for large training sample sizes L.
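As a small illustration of the sparsity measures used in these formulations, the sketch below evaluates the count of nonzero entries and the ℓp measure for a hypothetical coefficient vector. The function name `lp_measure` and the example vector are assumptions made for illustration only.

```python
import numpy as np

def lp_measure(x, p):
    """Return (sum_i |x_i|^p)^(1/p) for p > 0, or the number of nonzeros for p == 0."""
    x = np.asarray(x, dtype=float)
    if p == 0:
        return float(np.count_nonzero(x))
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

# Hypothetical sparse coefficient vector with two active elements.
theta = np.array([0.0, 3.0, 0.0, -4.0])
l0 = lp_measure(theta, 0)  # number of active (nonzero) elements
l1 = lp_measure(theta, 1)  # sum of absolute values
l2 = lp_measure(theta, 2)  # Euclidean length
```

Here the sparsity level S in the constrained formulation would bound `l0` for every training sample.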
Sparse dictionary learning algorithms are computationally intensive, particularly for large image sizes N. Hence, in image compression, sparse dictionaries are learned for small image segments, typically of size 64 = 8 × 8 or 256 = 16 × 16, that are combined to form a full image [Elad, 2010]. Image approximation is then achieved at a high compression rate and with minimal structural distortion and boundary effects when relatively dense measurements are available (note that in many image processing applications, either the full image or a noisy version of it is available and a compressed representation of it is desired). In ill-posed inverse modeling applications where limited data are available, image segmentation can result in significant boundary effects and discontinuity. Furthermore, in subsurface flow and transport modeling, the connectivity of the parameters that represent the flow properties (hydraulic conductivity or permeability) is critical in predicting the flow and transport behavior in the underlying geologic formations. This insight can be exploited to formulate an effective and computationally efficient sparse dictionary learning approach for subsurface imaging applications. In this paper, we capitalize on the intrinsic properties of geologic formations and the related inverse problems to develop an efficient sparse geologic dictionary learning framework using the K-SVD algorithm described in Aharon et al. [2006], Tošić and Frossard [2011], and Khaninezhad et al. [2012].
The K-SVD algorithm is designed to find a dictionary $D$ containing m elements that can sparsely represent each of the training samples in $U$, by solving
$$\min_{D,\,\Theta} \|U - D\Theta\|_F^2 \quad \text{subject to} \quad \|\theta_i\|_0 \le S, \quad i = 1, \ldots, L,$$
where $\Theta = [\theta_1, \ldots, \theta_L]$ collects the sparse coefficient vectors and $\|\cdot\|_F$ is the Frobenius norm. The K-SVD algorithm divides the above optimization problem into two subproblems: (i) sparse coding and (ii) dictionary update. In the sparse coding step, for the current dictionary, a pursuit algorithm is used to find the sparse representation of each member of the training data set. In the dictionary update step, the sparse representation obtained in the first step is fixed and the dictionary elements are updated to reduce the sparse approximation error. These two steps are repeated until convergence. Table 2 summarizes the K-SVD algorithm. Further details about the K-SVD algorithm may be found in Aharon et al. [2006].
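Table 2. K-SVD Algorithm
Initialization: Initialize dictionary $D^{(0)}$ with $\ell_2$-normalized columns. Set j = 1.
REPEAT until the stopping criterion is met
Sparse coding: Using a pursuit algorithm (e.g., OMP), compute $\theta_i$, $i = 1, \ldots, L$, as the solution of
$\min_{\theta_i} \|u_i - D^{(j-1)}\theta_i\|_2^2$ subject to $\|\theta_i\|_0 \le S$
Dictionary update: For each column k = 1,2,…,m in $D^{(j-1)}$
– Define the group of prior model instances that use the element, $\omega_k = \{\, i \mid 1 \le i \le L,\ \theta_T^k(i) \ne 0 \,\}$
– Compute the residual matrix $E_k = U - \sum_{l \ne k} d_l\, \theta_T^l$,
where $\theta_T^k$ is the kth row of $\Theta$.
– Restrict $E_k$ by choosing only the columns corresponding to $\omega_k$, i.e., find $E_k^R$
– Apply SVD decomposition to $E_k^R = A \Sigma B^T$
– Update the dictionary element $d_k = a_1$, and the sparse representation $\theta_R^k = \sigma_1 b_1^T$ (the entries of $\theta_T^k$ on $\omega_k$)
Set j = j + 1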
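The two alternating steps of Table 2 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the helper names `omp` and `ksvd`, the initialization from randomly selected training samples, the fixed iteration count, and the synthetic library are all hypothetical choices.

```python
import numpy as np

def omp(D, u, S):
    """Greedy orthogonal matching pursuit: at most S nonzeros in theta, u ~ D @ theta."""
    residual, support, coef = u.copy(), [], None
    for _ in range(S):
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:          # residual is already (numerically) explained
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], u, rcond=None)
        residual = u - D[:, support] @ coef
    theta = np.zeros(D.shape[1])
    if support:
        theta[support] = coef
    return theta

def ksvd(U, m, S, n_iter=5, seed=0):
    """Alternate sparse coding and element-wise SVD updates (sketch of Table 2)."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: m randomly chosen, l2-normalized training samples.
    D = U[:, rng.choice(U.shape[1], size=m, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        # (i) Sparse coding with the current dictionary.
        Theta = np.column_stack([omp(D, U[:, i], S) for i in range(U.shape[1])])
        # (ii) Dictionary update, one element at a time.
        for k in range(m):
            omega = np.flatnonzero(Theta[k, :])   # samples that use element k
            if omega.size == 0:
                continue
            # Residual without element k, restricted to the columns in omega.
            E = U[:, omega] - D @ Theta[:, omega] + np.outer(D[:, k], Theta[k, omega])
            A, sig, Bt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = A[:, 0]
            Theta[k, omega] = sig[0] * Bt[0, :]
    return D, Theta

# Synthetic (hypothetical) library: 200 samples that are 3-sparse in a random dictionary.
rng = np.random.default_rng(0)
D_true = rng.standard_normal((20, 30))
D_true /= np.linalg.norm(D_true, axis=0)
Theta_true = np.zeros((30, 200))
for i in range(200):
    Theta_true[rng.choice(30, size=3, replace=False), i] = rng.standard_normal(3)
U = D_true @ Theta_true

D, Theta = ksvd(U, m=30, S=3)
rel_err = np.linalg.norm(U - D @ Theta) / np.linalg.norm(U)
```

The rank-one SVD update changes only the coefficients that were already active, so the sparsity pattern found in the coding step is preserved during the dictionary update.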
2.2. Low-Rank Approximation of Training Data
In subsurface modeling applications, the training database is composed of hundreds or thousands of high-dimensional geological images that show the expected connectivity patterns in the formation of interest. These images can be constructed by stochastically integrating various sources of incomplete information, including quantitative and qualitative static data as well as geologic expertise and process-based modeling. As a more appropriate alternative to segmentation, we propose to use a low-rank spectral representation of the prior training data to speed up geologic dictionary learning. In this approach, large-scale prior models are first projected onto an effective low-dimensional subspace (e.g., a subspace defined by low-frequency DCT basis elements) that preserves the important connectivity features of the training data.
 Learning sparse dictionaries from low-rank spectral representations of the training data set (e.g., prior realizations of permeability images) can be achieved by projecting each training sample onto a low-dimensional compression basis Φ. Generic compression bases, such as DCT and DWT, yield reduced representations of natural images that exhibit fast decay properties in these bases. In particular, correlated subsurface physical property maps are known to have compressed approximations in these bases. The learning is then performed on the projection coefficients of the samples. Geologic facies and their physical properties typically exhibit piecewise continuous spatial distributions with distinctive geometry and connectivity. Therefore, after projection onto a spectral domain, a good approximation of their large-scale connectivity is obtained by eliminating many of the high frequency (small scale) details to significantly reduce the dimension of the prior samples. Data limitation in model calibration implies that one cannot hope to resolve high resolution details in these images; thus, removing the details from the prior representation should not result in a major loss.
Figure 1 (top) shows a reference square slowness map $X \in \mathbb{R}^{n \times n}$ (or $x \in \mathbb{R}^{N}$ after vectorization), where $N = n^2$. The model represents facies distribution with distinctive slowness properties in an areal layer of a given formation. Figure 1 (bottom) contains five samples from a library of training data with 1000 samples, denoted as U1000. These samples are drawn from a large training image using the single normal equation simulation (snesim) multiple-point geostatistical simulation algorithm [Strebelle, 2002; Caers, 2003] and, hence, share similar statistical patterns that represent the prior knowledge about the reference image.
The low-rank approximation of the reference map (or its vectorized form $x$) is simply obtained by projecting x onto a low-dimensional subspace span(Φ), i.e.,
$$\hat{x} = \Phi \Phi^{\dagger} x,$$
where Φ† is the Moore-Penrose pseudoinverse of Φ. Figure 2a illustrates the low-rank approximations of the reference model with different numbers of low-frequency (truncated) DCT basis elements, i.e., 66, 91, 190, 276, and 325. Approximations with more than 190 lowest frequency DCT components do not result in any noticeable improvement in representing the structural connectivity in the reference map; that is, including additional basis components in representing the prior library minimally improves the structural connectivity; nonetheless, significant and insignificant elements equally contribute to the computational cost of dictionary learning. Therefore, removing insignificant elements results in substantial computational gains with little loss in quality of the learned dictionary.
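The projection onto a truncated DCT subspace can be sketched as below. The sketch assumes a 64 × 64 grid and keeps the 2-D DCT modes whose frequency indices satisfy kx + ky < 19, which gives 190 elements; the triangular selection rule is an assumption (the quoted counts 66, 91, 190, 276, and 325 are all triangular numbers), and the straight "channel" image is a hypothetical stand-in for a geologic map.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal 1-D DCT-II matrix (n x n); each row is one basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def lowfreq_dct_basis(n, t):
    """Columns are vectorized 2-D DCT modes (kx, ky) with kx + ky < t (assumed rule)."""
    C = dct_matrix(n)
    modes = [(kx, ky) for kx in range(n) for ky in range(n) if kx + ky < t]
    return np.column_stack([np.outer(C[kx], C[ky]).ravel() for kx, ky in modes])

n, t = 64, 19
Phi = lowfreq_dct_basis(n, t)            # N x m with N = 4096, m = 190

# Hypothetical reference map: one straight high-slowness channel.
X = np.zeros((n, n))
X[20:28, :] = 1.0
x = X.ravel()

# Low-rank approximation: projection of x onto span(Phi).
x_hat = Phi @ (np.linalg.pinv(Phi) @ x)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
```

Because the columns of this Φ are orthonormal, the pseudoinverse coincides with the transpose, so the projection reduces to `Phi @ (Phi.T @ x)`.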
Alternatively, the low-rank basis Φ can be acquired from the prior geologic information in the library. A classical example is the low-rank representation with singular value decomposition (SVD). Denoting the library U that contains L training samples with the same dimension as x (i.e., $U \in \mathbb{R}^{N \times L}$), it is straightforward to show that among all rank-m approximation bases, the leading m left singular vectors of U give the smallest root-mean-squared error (RMSE) over the entire library. Expressing the SVD of the library U as
$$U = P \Sigma Q^{T},$$
the rank-m SVD basis is defined through the set of basis vectors $\Phi = [p_1, \ldots, p_m]$, i.e., the first m columns of $P$. We consider a library containing 1000 training samples, denoted as U1000, where $U_{1000} \in \mathbb{R}^{N \times 1000}$.
The first 10 leading singular vectors (corresponding to the 10 largest singular values) of the library with samples shown in Figure 1 (bottom), U1000, are depicted in Figure 3. Figure 2b shows the performance of low-rank SVD approximation for m = 66, 91, 190, 276, and 325. For orthogonal Φ, such as the truncated DCT or SVD bases, we get
$$\hat{x} = \Phi \Phi^{T} x = \Phi v, \qquad v = \Phi^{T} x,$$
for $v \in \mathbb{R}^{m}$. While the coefficients $v$ are low-dimensional, they are not learned from the training data set. We construct a sparse dictionary $D$ from $V = \Phi^{T} U$ by effectively learning the patterns in the projected coefficients, such that $v \approx D\theta$ has sparse coefficients $\theta$. A simple approach is to apply the K-SVD algorithm to the truncated DCT or SVD coefficients to obtain the dictionary D. In our examples, we pick m = 190 DCT or SVD basis elements.
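The SVD-based alternative and the projected coefficients used for learning can be sketched as follows. The library here is a small synthetic matrix (an assumption for illustration), and the comparison with a random orthonormal basis of the same rank only illustrates the optimality of the leading left singular vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, m = 50, 200, 5

# Hypothetical library with strongly correlated samples (columns).
U = rng.standard_normal((N, 8)) @ rng.standard_normal((8, L))

# Rank-m SVD basis: the m leading left singular vectors of the library.
P, sigma, Qt = np.linalg.svd(U, full_matrices=False)
Phi_svd = P[:, :m]

# Low-dimensional coefficients on which the dictionary would be learned.
V = Phi_svd.T @ U                               # m x L instead of N x L
err_svd = np.linalg.norm(U - Phi_svd @ V)

# Any other orthonormal rank-m basis gives an error at least as large.
Q_rand, _ = np.linalg.qr(rng.standard_normal((N, m)))
err_rand = np.linalg.norm(U - Q_rand @ (Q_rand.T @ U))
```

The approximation error of the SVD basis equals the root of the sum of the discarded squared singular values, which is why a fast-decaying spectrum permits aggressive truncation.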
Figures 4a and 4b summarize the approximation quality of dictionaries learned from the low-rank training data after projection onto the truncated DCT and SVD bases. These approximations are performed using the orthogonal matching pursuit (OMP) algorithm for the m = 190 dimensional D with different sparsity levels. Figure 4c shows OMP reconstruction results obtained by applying the K-SVD algorithm to the original high-dimensional training data U (used to learn a dictionary with k elements) for different sparsity levels. The results show that the dictionary D learned from the low-rank representation of the training data gives results comparable to those obtained from the dictionary learned directly from the original high-dimensional samples, which is obtained at a much higher computational cost. In this paper, we take k = m = 190 to make a fair comparison.
2.3. Computational Complexity
The main objective of the low-rank prior representation is to speed up the slow computation of sparse dictionary learning for large images [Elad, 2010; Aharon et al., 2006]. However, a main consideration is to do so while accounting for the intrinsic properties of geologic formations and without introducing the structural distortion that can result from image segmentation in ill-posed inverse problems. To demonstrate the gain in computational complexity, we consider K-SVD dictionary learning as an example. To learn a dictionary from a library $U \in \mathbb{R}^{N \times L}$, the algorithm alternates between the sparse coding and the dictionary update steps for a prespecified number of iterations [Aharon et al., 2006]. The computational complexity of each iteration depends on the sample dimension N, the library size L, the dictionary size, and the sparsity level S [Rubinstein and Elad, 2008]. In the low-rank dictionary learning, one approximates $U$ with $\Phi V$. In our approach, we apply K-SVD to learn a dictionary from the compressed library in the transform domain, i.e., V. In this case, the sample dimension that enters the per-iteration complexity drops from N to m, where $r = N/m$ is the compression rate applied to the training data. The computational complexity of the low-rank dictionary learning is improved by approximately a factor of r. Our examples use m = 190 and L = 1000. The sparse learning step from the original full-size library took 1456.7 s. On the other hand, sparse learning from the SVD and DCT truncated libraries took approximately 155.1 and 153.6 s, respectively, both around 10% of the computation required to generate a similar dictionary from the full-size training data. All experiments were performed on MATLAB R2011b with a 3.4 GHz Intel quad-core CPU and 16 GB RAM. This analysis does not consider the computational cost of converting the full images to their low-rank representations, which is relatively insignificant for the precomputed generic compression transforms (e.g., DCT), but can become important for learned projection bases (e.g., SVD).
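The reported timings can be put side by side with the compression rate using a few lines of arithmetic. The image size N = 64 × 64 is an assumption inferred from the segmentation example (16 patches of size 16 × 16 per map); the timings are the values quoted above.

```python
# Assumed image size: 16 patches of 16 x 16 pixels per map, i.e., 64 x 64 pixels.
N = 64 * 64
m = 190                      # retained DCT or SVD elements
r = N / m                    # compression rate applied to the training data

# Reported K-SVD learning times in seconds.
t_full, t_svd, t_dct = 1456.7, 155.1, 153.6

frac_svd = t_svd / t_full    # fraction of the full-size learning time (SVD library)
frac_dct = t_dct / t_full    # fraction of the full-size learning time (DCT library)
speedup_svd = t_full / t_svd
speedup_dct = t_full / t_dct
```

Under this assumption, r is about 21.6, while the measured speedups are about 9.4 and 9.5, so the observed gain has the same order of magnitude as the nominal compression rate.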
Next, we explore the performance of this method using a straight ray travel-time tomography as a linear inverse modeling example and a two-phase subsurface flow model calibration as a nonlinear inversion.
A few remarks are in order at this point. First, in the examples of this paper we focus on reconstruction of channel facies, which is known to be more challenging than estimation of Gaussian-type heterogeneous property distributions. However, the proposed approach is also applicable to sparse representation and reconstruction of Gaussian-type random fields. Second, the main emphasis of this manuscript is on sparse learning from low-rank representations for improved computation and more consistent formulation of the inverse problem; several aspects of constructing sparse geologic dictionaries, such as the sparsity level or the size of the dictionary, have been discussed in Khaninezhad et al. [2012]. Third, generating a library of prior models can have practical limitations, especially for large-scale models. Such geologic libraries can be obtained from stochastic treatment of the geological modeling process and geostatistical simulation techniques. The exact number of samples to include in the library is primarily controlled by the available computational resources and the complexity and diversity of the existing features in the model. Finally, an important property of inversion with sparse geologic dictionaries is the robustness to outliers or inconsistent samples in the prior library [Khaninezhad et al., 2012]. A detailed discussion of these points falls beyond the scope of the current paper.
2.4. Inversion Formulation
2.4.1. Linear Inversion: Straight Ray Tomography
We consider two-dimensional straight ray travel-time tomography as a linear inversion example. In this case, the travel time of a wave through a medium is obtained by integrating the slowness of the cells that are intersected by a straight line that connects each pair of transmitter and receiver at fixed locations. The resulting travel time can be expressed as:
$$t = \int_{\text{ray}} s(x, y)\, dl,$$
where x and y are spatial Cartesian coordinates in the horizontal direction, dl is the differential distance along the ray, and $s(x, y)$ is the slowness (inverse of velocity) at point (x, y) [Bording et al., 1987]. With the simple straight ray path assumption, this formulation leads to travel times (measurements) that are linearly related to the slowness of the medium (parameters). Mathematically, we can write the resulting system of equations as y = Ax, where x and y now denote the parameter and data vectors (not the spatial coordinates), and A contains the distances traveled by each source/receiver ray within the grid cells and maps the model parameters x onto data y. In general, for high resolution characterization of the slowness map, the number of unknown parameters is overwhelmingly larger than the number of measurements, resulting in a severely underdetermined inverse problem.
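A simple way to build an approximate measurement matrix A for straight rays is to sample each ray densely and accumulate the path length per cell, as sketched below. The transmitter and receiver layout (left and right edges of the grid), the unit cell size, and the function name `ray_matrix` are assumptions for illustration; an exact implementation would compute ray and cell intersections analytically.

```python
import numpy as np

def ray_matrix(n, n_tx, n_rx, samples=2000):
    """Approximate path-length matrix for straight rays across an n x n unit-cell grid.

    Assumed layout: transmitters on the left edge, receivers on the right edge.
    """
    tx_y = np.linspace(0.5, n - 0.5, n_tx)
    rx_y = np.linspace(0.5, n - 0.5, n_rx)
    A = np.zeros((n_tx * n_rx, n * n))
    s = (np.arange(samples) + 0.5) / samples          # midpoints along each ray
    for a, y0 in enumerate(tx_y):
        for b, y1 in enumerate(rx_y):
            length = np.hypot(n, y1 - y0)             # total ray length
            col = np.minimum((s * n).astype(int), n - 1)
            row = np.minimum((y0 + s * (y1 - y0)).astype(int), n - 1)
            # Each sample contributes an equal share of the ray length to its cell.
            np.add.at(A[a * n_rx + b], row * n + col, length / samples)
    return A

n = 64
A = ray_matrix(n, n_tx=9, n_rx=9)          # 81 rays, 4096 unknowns
slowness = np.full(n * n, 2.0)             # hypothetical homogeneous medium
travel_times = A @ slowness                # y = A x
```

With 81 rays and 4096 cells, the system y = Ax has far fewer equations than unknowns, which is the underdetermined setting described above.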
A common method to alleviate data limitation is to incorporate qualitative and/or quantitative prior knowledge about the unknown parameters to regularize the solution. In this paper, we assume the prior knowledge is available in the form of a training data set that has patterns similar to those expected in the solution.
Assuming x has a sparse approximation in ΦD, where Φ is the truncated basis for low-rank representation of the training data and D is the dictionary learned from it, the sparse representation of x is θ and can be expressed as $x \approx \Phi D \theta$. The least-squares formulation of the tomography inversion is then obtained by minimizing an ℓ2-norm data misfit function
$$\min_{\theta} \|y - A\Phi D\theta\|_2^2.$$
To find a sparse solution for θ, one approach is to minimize an objective function consisting of the support of the solution in addition to the data misfit norm, that is
$$\min_{\theta} \|y - A\Phi D\theta\|_2^2 + \lambda \|\theta\|_0,$$
where λ is a regularization parameter that balances the data misfit and the regularization term. Standard methods for specifying λ include the L-curve and the generalized cross validation (GCV) [Menke, 1989; Parker, 1994; Tarantola, 2005b].
From the recent findings in the sparse reconstruction and compressed sensing literature [Donoho, 2006; Candès and Tao, 2005; Candès et al., 2006; Elad, 2010], under some mild conditions, the discontinuous term $\|\theta\|_0$ in the objective function can be replaced with a more well-behaved sparsity-promoting term $\|\theta\|_p^p$ (with $0 < p \le 1$), that is
$$\min_{\theta} \|y - A\Phi D\theta\|_2^2 + \lambda \|\theta\|_p^p,$$
which yields the same solution as the original problem with $\ell_0$ regularization. A special case is p = 1, in which case the objective function becomes convex and reducible to a linear programming problem that can be solved efficiently [Donoho, 2006; Elad, 2010]. An algorithmically effective representation of the objective function for p = 1 is the weighted norm-2 representation
$$J(\theta) = \|y - A\Phi D\theta\|_2^2 + \lambda\, \theta^{T} W \theta, \tag{13}$$
where $W$ is a diagonal weighting matrix whose entries at iteration k, i.e., $w_i^{(k)}$, are approximately proportional to the inverse of the unknown coefficients from the previous iteration, $|\theta_i^{(k-1)}|$. This choice simplifies the objective function and improves its convergence property while ensuring that at convergence $\theta^{T} W \theta$ reduces to $\|\theta\|_1$.
To avoid the singularity due to the form $w_i^{(k)} = 1/|\theta_i^{(k-1)}|$, Li and Jafarpour [2010] use $w_i^{(k)} = 1/(|\theta_i^{(k-1)}| + \epsilon)$, with a small positive $\epsilon$, to adaptively control the weighting matrix during the iterations. We use a variant of the iteratively reweighted least squares (IRLS) algorithm in Table 3 [Li and Jafarpour, 2010] to minimize the objective function in equation (13). The iterative form of the Newton algorithm for minimizing the objective function in equation (13) is obtained by applying the necessary condition
$$\frac{\partial J}{\partial \theta} = -2K^{T}(y - K\theta) + 2\lambda W\theta = 0, \tag{14}$$
which results in iterations of the form
$$\theta^{(k)} = \left(K^{T}K + \lambda W^{(k)}\right)^{-1} K^{T} y, \tag{15}$$
where K = AΦD.
Table 3. IRLS Algorithm for Linear Travel-Time Tomographic Inverse Problem
Input: Φ and D.
Initialization: Start with an appropriate initial guess θ(0), and set K = AΦD
Compute the weight matrix W
WHILE the convergence criterion is not met
Update $\theta^{(k)} = \left(K^{T}K + \lambda W^{(k)}\right)^{-1} K^{T} y$
Update the weight matrix W
Update λ (optional)
Compute the convergence criterion
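The reweighting loop of Table 3 can be sketched as follows for p = 1. This is an illustrative sketch rather than the authors' code: the weight form 1/(|θ| + ε), the fixed λ, the minimum-norm initial guess, the stopping rule, and the random matrix standing in for K = AΦD are all assumptions.

```python
import numpy as np

def irls_sparse(K, y, lam=1e-2, eps=1e-6, n_iter=100, tol=1e-8):
    """Minimize ||y - K theta||^2 + lam * theta^T W theta with W reweighted toward l1."""
    theta = np.linalg.lstsq(K, y, rcond=None)[0]       # minimum-norm initial guess
    for _ in range(n_iter):
        w = 1.0 / (np.abs(theta) + eps)                # diagonal entries of W
        theta_new = np.linalg.solve(K.T @ K + lam * np.diag(w), K.T @ y)
        converged = np.linalg.norm(theta_new - theta) <= tol * (1.0 + np.linalg.norm(theta))
        theta = theta_new
        if converged:
            break
    return theta

# Hypothetical test problem: 40 measurements, 80 unknowns, 3 active coefficients.
rng = np.random.default_rng(0)
K = rng.standard_normal((40, 80)) / np.sqrt(40)        # stand-in for A @ Phi @ D
theta_true = np.zeros(80)
theta_true[[5, 27, 61]] = [2.0, -3.0, 1.5]
y = K @ theta_true

theta_hat = irls_sparse(K, y)
support_hat = np.sort(np.argsort(np.abs(theta_hat))[-3:])
rel_misfit = np.linalg.norm(K @ theta_hat - y) / np.linalg.norm(y)
```

Coefficients that shrink toward zero receive large weights in the next iteration, which pushes them further toward zero and concentrates the solution on a few active elements.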
Next, we consider a similar formulation for nonlinear problems. In this case, the formulation departs from the theoretical conditions specified by compressed sensing [Donoho, 2006] for the linear case. However, similar guidelines are incorporated in deriving a sparsity-promoting algorithm. The details of the formulation are discussed in Khaninezhad et al. [2012].
2.4.2. Nonlinear Inversion: Subsurface Flow Model Calibration
Of particular interest in this paper is the subsurface flow model calibration, which is posed as a nonlinear inverse problem. The problem we consider in this section is a two-phase immiscible and incompressible flow system. The underlying equations for these examples can be used to describe modeling and simulation of hydrocarbon reservoirs as well as groundwater remediation systems where water injection is used to clean up contaminated aquifers. In this paper, we describe the problem as a waterflooding example that is commonly performed in hydrocarbon reservoirs. The unknown of interest is the spatial permeability distribution of the geologic formation x. We denote the vectors of measured and simulated data (pressure and water saturation) as $y_{\text{obs}}$ and $g(x)$, respectively. The governing flow equations g(·) provide a nonlinear mapping from the parameter (permeability) space onto the data (pressure and saturation) space. In our examples, the governing equations g are solved numerically using a finite element discretization of the reference model, following the same approach as outlined in Aarnes et al. The observation vector is denoted as $y_{\text{obs}} = [y_s^{T}\; y_p^{T}]^{T}$, where $y_s$ and $y_p$ are the column vectors of water saturation and pressure measurements. The corresponding simulated data are defined as $g(x) = [g_s(x)^{T}\; g_p(x)^{T}]^{T}$, where $g_s(x)$ and $g_p(x)$ are the simulated water saturations and pressures.
Also in this case, assuming x has a sparse approximation in ΦD, where Φ and D are defined as in the tomography example, the sparse representation of x is θ and can be expressed as $x \approx \Phi D \theta$. To incorporate the prior knowledge (sparsity in ΦD), the least-squares formulation of the flow inverse problem (ℓ2-norm data misfit function) is augmented by a sparsity-promoting regularization term. The resulting cost function (in its iteratively reweighted form) is expressed as:
$$J(\theta) = \|y_{\text{obs}} - g(\Phi D\theta)\|_2^2 + \gamma\, \theta^{T} W \theta, \tag{16}$$
which now represents a nonlinear nonconvex objective function. Also in this case, we follow the classical iterative Newton approach to minimize the specified cost function. The diagonal weighting matrix W is similar to that of the tomographic inversion, with entries $w_i^{(n)} = 1/(|\theta_i^{(n)}| + \epsilon)$, where $\epsilon$ is a small positive constant.
We use a variant of the iteratively reweighted least squares (IRLS) algorithm in Table 4 [Li and Jafarpour, 2010] to minimize the objective function in equation (16). The resulting update equation at the (n + 1)th iteration of Newton's method can be expressed as
$$\theta^{(n+1)} = \left(K_n^{T}K_n + \gamma W^{(n)}\right)^{-1} K_n^{T}\left(y_{\text{obs}} - g(x^{(n)}) + K_n\theta^{(n)}\right), \tag{17}$$
where $K_n = G\Phi D$, and G is the Jacobian matrix, also known as the sensitivity matrix, of g(x) with respect to x around $x^{(n)} = \Phi D\theta^{(n)}$. Note that the first-order Taylor approximation of the nonlinear terms is invoked, i.e., $g(x) \approx g(x^{(n)}) + G\,(x - x^{(n)})$. The required first-order derivative information is computed using an efficient adjoint formulation that was described in Li and Jafarpour [2010].
Table 4. IRLS Algorithm for the Nonlinear Subsurface Flow Inverse Problem
Input: Φ and D.
Initialization: Set appropriate θ(0).
Compute the weight matrix W
WHILE the convergence criterion is not met
Update γ (optional)
Update sensitivity matrix G
Compute the convergence criterion
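The loop in Table 4 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the paper: it assumes an objective of the form ||d_obs − g(ΦDθ)||² + γ θᵀWθ with weights w_i = 1/(θ_i² + ε), uses a finite-difference Jacobian in place of the adjoint formulation, and replaces the flow simulator with a toy nonlinear map. All names (`irls_gauss_newton`, `numerical_jacobian`, `A` for the product ΦD, `eps`) are hypothetical.

```python
import numpy as np

def numerical_jacobian(g, x, h=1e-6):
    """Finite-difference stand-in for the adjoint-based sensitivity matrix G."""
    g0 = g(x)
    G = np.empty((g0.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h
        G[:, j] = (g(xp) - g0) / h
    return G

def irls_gauss_newton(g, A, d_obs, theta0, gamma=1e-2, eps=1e-3,
                      n_iter=30, tol=1e-8):
    """Assumed objective: ||d_obs - g(A theta)||^2 + gamma * theta^T W theta."""
    theta = theta0.copy()
    for _ in range(n_iter):
        W = np.diag(1.0 / (theta**2 + eps))   # assumed reweighting rule
        x = A @ theta                         # current property map, x = A theta
        G = numerical_jacobian(g, x)          # sensitivity of g around x
        J = G @ A                             # sensitivity with respect to theta
        r = d_obs - g(x) + J @ theta          # linearized (first-order Taylor) data
        theta_new = np.linalg.solve(J.T @ J + gamma * W, J.T @ r)
        done = np.linalg.norm(theta_new - theta) <= tol * (1.0 + np.linalg.norm(theta))
        theta = theta_new
        if done:                              # convergence criterion (assumed)
            break
    return theta

# Toy problem: a random dictionary and a mildly nonlinear forward model.
rng = np.random.default_rng(0)
n_x, n_atoms, n_obs = 30, 60, 15
A = rng.standard_normal((n_x, n_atoms)) / np.sqrt(n_x)
H = rng.standard_normal((n_obs, n_x))
g = lambda x: H @ np.tanh(x)                  # hypothetical stand-in for g(x)
theta_true = np.zeros(n_atoms)
theta_true[[3, 17, 42]] = [1.0, -0.8, 0.6]
d_obs = g(A @ theta_true)
theta_hat = irls_gauss_newton(g, A, d_obs, np.zeros(n_atoms))
print("data misfit:", np.linalg.norm(d_obs - g(A @ theta_hat)))
```

The sketch takes full Newton steps without a line search and keeps γ fixed; a practical implementation would likely add step control and the optional γ update listed in Table 4.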
 In the next section, we apply the above inversion formulations to numerical examples. The examples are based on reference facies models representing a fluvial formation with dimensions . A challenging aspect of estimating the geologic features in fluvial formations is the sudden change in facies type (e.g., from sand channels to shale), which is not easy to describe and capture in an inverse problem. Simulation and estimation of such low-entropy discrete patterns are nontrivial and usually require strong priors. In our examples, we use a training library U1000 that contains features with statistically similar patterns or spatial connectivity. We assume that the high and low property values are equal to the values in the reference map and that the location and connectivity of the underlying features are the main unknowns.
3. Results and Discussion
3.1. Straight Ray Tomography Example
 As discussed earlier, segmentation of an image into small patches, as performed in image compression, is not suitable for ill-posed inverse modeling in geological settings because undesirable structural distortion may be introduced due to data deficiency. Figures 5 and 6 illustrate this point by examining the approximation performance of the segmentation approach under two scenarios: (i) when the reference image is assumed known and only its compression is desired (Figure 5) and (ii) when the true image is not known and is retrieved by solving an underdetermined travel-time tomography inverse problem (Figure 6). In these examples, each training map of slowness is divided into 16 patches of size 16 × 16. In Figure 5, an undercomplete K-SVD dictionary is learned from the small patches. The OMP algorithm is applied to approximate each patch with sparsity levels of 5, 10, 15, 20, and 25. Even in this case, where the original image is known and inversion is not needed, the boundary effects are quite evident at high compression (approximation) rates. Figure 6a shows a sample configuration of transmitters and receivers used in our travel-time tomography examples. The number of observations equals the number of transmitters times the number of receivers. Figure 6b shows the solution of the travel-time tomography inversion when 81 arrival-time measurements (corresponding to arrays of nine evenly spaced transmitters and nine evenly spaced receivers) are used in the IRLS inversion algorithm with the above image segmentation approach. The structural distortion along the boundaries of the segments is evident and is attributed to the lack of sufficient data resolution to preserve the continuity across the patches.
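The segmentation and measurement counts described above can be checked with a short sketch. It assumes a 64 × 64 slowness map (inferred from 16 patches of 16 × 16 cells) and uses a random array as a placeholder for a training map; the helper names `split_into_patches` and `merge_patches` are hypothetical.

```python
import numpy as np

def split_into_patches(image, patch):
    """Split a 2-D array into non-overlapping patch x patch blocks (row-major order)."""
    n_rows, n_cols = image.shape
    if n_rows % patch or n_cols % patch:
        raise ValueError("image size must be a multiple of the patch size")
    blocks = image.reshape(n_rows // patch, patch, n_cols // patch, patch)
    return blocks.transpose(0, 2, 1, 3).reshape(-1, patch, patch)

def merge_patches(patches, shape):
    """Inverse of split_into_patches: tile the blocks back into one image."""
    patch = patches.shape[1]
    n_rows, n_cols = shape
    blocks = patches.reshape(n_rows // patch, n_cols // patch, patch, patch)
    return blocks.transpose(0, 2, 1, 3).reshape(n_rows, n_cols)

rng = np.random.default_rng(0)
slowness = rng.random((64, 64))            # placeholder map, not real training data
patches = split_into_patches(slowness, 16)
print(patches.shape)                       # 16 patches of 16 x 16 cells

# Each patch is coded independently, so nothing ties neighboring patches
# together unless the data resolve the shared boundary.
restored = merge_patches(patches, slowness.shape)

n_transmitters, n_receivers = 9, 9
n_observations = n_transmitters * n_receivers   # one travel time per ray path
print(n_observations, "observations for", slowness.size, "unknown cells")
```

With 81 travel times and 4096 unknown cells, the problem is heavily underdetermined, which is why the patch boundaries are poorly constrained in the inversion.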
 Figure 7 summarizes the reconstruction results obtained with our proposed approach of learning dictionaries from low-rank spectral representations of the training data. Figure 7a shows the reconstruction results obtained from the observation configuration in Figure 6a. From the first to the third column, the results are shown for K-SVD dictionaries learned from the truncated DCT coefficients, the truncated SVD coefficients, and the original training samples. To show the importance of learning (from prior data), the reconstruction results (without any learning) using only truncated DCT and truncated SVD bases are also included in the fourth and fifth columns. Figures 7b and 7c show similar results obtained with fewer measurements (a total of 49 and 25 measurements from arrays with seven and five evenly spaced transmitters and receivers, respectively). To illustrate the reconstruction quality as a function of the number of measurements, Table 5 summarizes the computed root-mean-squared error (RMSE) for the slowness maps recovered using different dictionaries and different numbers of measurements.
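The two low-rank representations that feed the dictionary learning step can be sketched as follows. This is a minimal illustration, assuming that "truncated DCT" means keeping the lowest-frequency block of 2-D DCT coefficients and that "truncated SVD" means projecting onto the leading left singular vectors of the training matrix. The function names, the truncation sizes, and the random training images are placeholders; the K-SVD step itself is not shown.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ v is the 1-D DCT of v."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def truncated_dct_features(images, keep):
    """Keep the keep x keep lowest-frequency 2-D DCT coefficients of each image."""
    n = images.shape[1]
    C = dct_matrix(n)[:keep]                   # low-frequency rows only
    coeffs = C @ images @ C.T                  # broadcasts over the sample axis
    return coeffs.reshape(len(images), -1)

def truncated_svd_features(images, rank):
    """Project each image onto the leading left singular vectors of the library."""
    X = images.reshape(len(images), -1).T      # pixels x samples
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    basis = U[:, :rank]
    return basis, (basis.T @ X).T              # (pixels x rank), (samples x rank)

rng = np.random.default_rng(0)
images = rng.standard_normal((50, 32, 32))     # placeholder training library
dct_feats = truncated_dct_features(images, keep=8)
svd_basis, svd_feats = truncated_svd_features(images, rank=20)
print(images[0].size, "->", dct_feats.shape[1], "(DCT) or", svd_feats.shape[1], "(SVD)")
```

A dictionary learned on `dct_feats` or `svd_feats` works with far fewer dimensions per training sample than one learned on the raw pixels, which is the source of the computational saving discussed in the text.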
Table 5. Slowness Reconstruction RMSE in the Tomography Examples
DCT + K-SVD
SVD + K-SVD
81 = 9 × 9
49 = 7 × 7
25 = 5 × 5
 The significance of using learned dictionaries becomes more pronounced as the number of measurements decreases. As the number of measurements decreases from 81 to 25, the first three columns in Figure 7, which incorporate sparse learning (from prior data), tend to preserve the major connectivity in the solution, whereas the last two columns, which do not incorporate sparse learning, exhibit a loss of facies connectivity. More importantly, the results with dictionaries learned from the low-rank approximation of the training data are comparable to those obtained with dictionaries learned from the full-rank training data, even though the latter demand significantly more computation time. In addition, better continuity is achieved when low-rank data representations are used, especially in the case of training data approximated with low-frequency DCT components. Another key point is that in data-deficient inverse problems, where the high-resolution details are not recoverable from low-resolution data, these detail components can remain in the solution and form implausible artifacts that are not supported by the data or the physics of the problem. Such details are suppressed when dictionaries are learned from low-rank descriptions.
3.2. Flow Model Calibration Example
 For the nonlinear model calibration examples, two sets of numerical experiments resembling waterflooding of an oil reservoir were carried out. In each case, a 640 m × 640 m × 10 m reservoir domain is discretized into a two-dimensional 64 × 64 × 1 uniform grid block system with 10 m × 10 m × 10 m block sizes. In both examples, the forward flow simulations were performed using an in-house simulator with an adjoint implementation for immiscible two-phase (oil/water) systems. A total simulation time of 600 days was considered, and the waterflooding experiments were designed to inject water into the reservoir at a rate of one pore volume per year. Data were collected at 50 intervals of 12 days each over the simulation period. The well configuration was based on a repeated uniform five-spot flooding pattern. Figure 8a illustrates the well configuration and reference permeabilities for the two examples: Reference Model A and Reference Model B. The corresponding oil saturation distributions after 120, 240, and 360 days are shown in Figure 8b. Table 6 summarizes the general simulation parameters that were used for the two examples.
Table 6. Parameters of Reservoir A and B
Reservoir A and B
64 × 64 × 1
10 m × 10 m × 10 m
Initial oil saturation
Number of injectors
Number of producers
Observation at injection wells
Observation at production well
Pressure and Saturation
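The grid and schedule quantities quoted for the two reservoirs are mutually consistent, as the following small sketch shows. It only restates numbers given in the text; the class name `WaterfloodSetup`, its field names, and the 365-day year used to convert the injection rate are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WaterfloodSetup:
    """Simulation setup as stated in the text (field names are hypothetical)."""
    domain_m: tuple = (640.0, 640.0, 10.0)
    block_m: tuple = (10.0, 10.0, 10.0)
    n_intervals: int = 50
    interval_days: float = 12.0
    pore_volumes_per_year: float = 1.0

    def grid_shape(self):
        # Number of grid blocks along each axis.
        return tuple(round(length / size)
                     for length, size in zip(self.domain_m, self.block_m))

    def report_times(self):
        # End of each data-collection interval, in days.
        return [self.interval_days * (i + 1) for i in range(self.n_intervals)]

    def pore_volumes_injected(self, days_per_year=365.0):
        # Assumes a constant injection rate and a 365-day year.
        return self.pore_volumes_per_year * self.report_times()[-1] / days_per_year

setup = WaterfloodSetup()
print(setup.grid_shape())                 # grid blocks per axis
print(setup.report_times()[-1])           # total simulated time in days
print(round(setup.pore_volumes_injected(), 2), "pore volumes injected")
```

Under these assumptions, roughly 1.6 pore volumes of water are injected over the simulation period.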
 We first consider the segmentation approach for flow model calibration. The reconstruction results using image segmentation are shown in Figures 9a and 9b for the two reference models. While the identified features generally resemble the trend in the reference models, the connectivity of the channel features is lost when moving from one patch to the next. This is expected because the data resolution is too low to inform the reconstruction about the correct continuity at the boundaries. The connectivity in the solution may be improved by imposing smoothness or continuity constraints along the edges of each segment. However, such measures are useful only for ensuring local continuity and do not enforce the expected large-scale field connectivity, which is a major limitation of the segmentation approach.
 The reconstruction results using different dictionaries are compared in Figures 10a and 10b for Reference Models A and B, respectively. The results clearly show the distinction between the dictionaries generated with sparse learning (first three columns) and those obtained without sparse learning (last two columns). The truncated DCT and SVD bases without K-SVD learning fail to capture the correct connectivity. However, when sparse learning is implemented, the estimated maps show marked improvements and capture the correct trend in the permeability distribution. It is important to note that in the case of truncated SVD (fourth column), the basis components are constructed from the training data set, but without imposing sparsity as a constraint. Nonetheless, the estimation results are clearly inferior to those obtained when sparse learning with K-SVD is applied. In both examples, dictionaries that use low-rank training data representations (first two columns) perform similarly to the dictionary obtained by applying K-SVD to the full-rank prior images (third column).
 Figure 11 shows the sparse coefficients recovered over the learned dictionaries using the IRLS algorithm (red lines). For comparison, a 30-sparse representation in the same dictionary, computed with the well-known OMP algorithm, is also plotted (blue lines). That is, the OMP solution assumes that the reference models are known and provides a 30-sparse approximation. Figures 11a and 11b use the dictionary learned from the truncated DCT coefficients for Reference Models A and B, respectively, while Figures 11c and 11d use the dictionary learned from the truncated SVD coefficients for the same reference models. The obtained solutions are clearly sparse, indicating that the algorithm is successful in promoting sparsity. In addition, in these examples about 10 of the significant components obtained by IRLS and OMP overlap, implying that some of the significant components of the solutions are correctly identified.
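The comparison in Figure 11 relies on two operations: a k-sparse OMP approximation of a known model, and a count of shared significant atoms between two coefficient vectors. The sketch below shows a textbook form of both. It uses a random unit-norm dictionary and a synthetic 30-sparse vector as placeholders, and the names `omp` and `significant_support` are hypothetical.

```python
import numpy as np

def omp(D, x, k):
    """Greedy k-sparse approximation of x over the (unit-norm) columns of D."""
    residual = np.asarray(x, dtype=float).copy()
    support, coef = [], np.zeros(0)
    for _ in range(k):
        if np.linalg.norm(residual) < 1e-12:     # already an exact fit
            break
        scores = np.abs(D.T @ residual)          # correlation with the residual
        if support:
            scores[support] = 0.0                # never pick an atom twice
        support.append(int(np.argmax(scores)))
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    theta = np.zeros(D.shape[1])
    theta[support] = coef
    return theta

def significant_support(theta, n):
    """Indices of the n largest-magnitude coefficients."""
    return set(np.argsort(-np.abs(theta))[:n].tolist())

rng = np.random.default_rng(0)
D = rng.standard_normal((256, 512))
D /= np.linalg.norm(D, axis=0)                   # unit-norm atoms
theta_ref = np.zeros(512)
theta_ref[rng.choice(512, size=30, replace=False)] = rng.standard_normal(30)
x_ref = D @ theta_ref                            # synthetic "reference model"

theta_omp = omp(D, x_ref, k=30)
shared = significant_support(theta_omp, 30) & significant_support(theta_ref, 30)
print(len(shared), "significant atoms shared")
```

In the paper's setting, one of the two coefficient vectors would come from the IRLS inversion rather than from a synthetic reference, so the overlap measures how many dominant atoms the inversion identifies without access to the true model.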
 To evaluate the prediction performance of the solutions, we switched the injection and production wells and used the reconstructed permeability models to forecast the well responses, which we then compared with those obtained from the reference model. Changing the well types in the prediction mode provides a less biased test than the original setup that was used in the model calibration step. A statistical summary of the prediction results is provided in Table 7 and shows that, overall, sparse training over the DCT-based low-rank representations of the training data provides very good results at a much lower computational cost. Figures 12a and 12b show the predicted pressure and water saturation in a sample well. Overall, the prediction results are in accord with the conclusions derived from examining the estimated permeability maps; that is, the reconstruction performance is superior when sparse dictionary learning is used. The results obtained with the low-rank and full-rank representations of the training data sets are comparable. The estimates with low-rank representations have much lower computational complexity and tend to provide better continuity, which is useful in detecting facies distribution.
Table 7. RMSE of the Pressure P and Water Saturation Sw Over All Production Wells Using Inverted Five-Spot Injection Pattern
DCT + K-SVD
SVD + K-SVD
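The RMSE metric reported in Table 7 can be computed as in the sketch below. The well-response arrays are synthetic placeholders with an assumed layout of wells by report times; they are not results from the paper. Whether the paper normalizes pressure and saturation before computing the RMSE is not stated, so no normalization is applied here.

```python
import numpy as np

def rmse(predicted, reference):
    """Root-mean-squared error over all entries (e.g., all wells and times)."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))

rng = np.random.default_rng(0)
n_wells, n_times = 4, 50                          # assumed layout: wells x report times
sw_reference = rng.random((n_wells, n_times))     # placeholder saturation responses
sw_predicted = sw_reference + 0.05 * rng.standard_normal((n_wells, n_times))
print("Sw RMSE:", rmse(sw_predicted, sw_reference))
```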
 Sparse geologic dictionaries offer a flexible approach for solving ill-posed inverse problems when prior geologic models are uncertain. However, a major difficulty is the computational complexity of constructing such dictionaries for large-scale inverse modeling applications. In this paper, we presented a low-rank prior model approximation approach that addresses this issue by eliminating a large fraction of the insignificant dimensions that contribute heavily to the computational burden of sparse dictionary learning algorithms. The low-rank spectral representation of training data for sparse dictionary learning removes insignificant details that do not contribute to the large-scale connectivity in the facies distribution. These details are hard to resolve in the underdetermined inverse problems typically encountered in geoscience applications, including subsurface flow model calibration. We showed that spatial image segmentation, which is the image processing solution to the computational burden of sparse dictionary learning algorithms, can lead to structural distortion (discontinuities) of the solution when ill-posed inverse problems are considered. In addition, unsupervised image segmentation does not provide a mechanism to honor large-scale geologic connectivity, which is of significant interest in many subsurface applications. We proposed transform-domain low-rank representation of the full-scale property maps or “images” as an alternative approach that is more appropriate for subsurface inverse modeling applications.
 In data-deficient subsurface inverse problems, low-rank dictionary learning in spectral domains (e.g., truncated DCT or SVD bases) reduces the computational cost of sparse dictionary learning algorithms without compromising the recovery of connected large-scale structures. We showed examples from geophysical tomography and subsurface flow inverse problems in which learning from low-rank representations provided comparable, and in some cases superior, solutions to those obtained with dictionaries learned from full-size original training images. The main motivation of the proposed approach is to gain computational efficiency by removing redundancy and unwarranted, and in most cases unresolvable, details of the training data from the dictionary learning process. In general, balancing computation, data resolution, and reconstruction quality to maintain important solution attributes (e.g., connectivity structure) is ultimately application-specific and must be approached according to the main requirements and purpose of each application. In applications where small-scale features are important and must be included, for example, natural fractures as flow pathways or thin shale lenses as barriers, alternative parameterization methods must be adopted to represent those features.
 An interesting and important characteristic of geologic formations is the multiscale nature of the heterogeneity in rock physical properties. To accurately predict the flow and transport performance of subsurface systems, the multiscale nature of these properties should be properly represented in the predictive model. Removing the features corresponding to any scale introduces a departure from the true behavior of the system. The main challenge lies in evaluating the importance of different scales for the process of interest in light of the available data resolution and sensitivity. For geoscience applications in which large-scale facies connectivity is to be inferred from low-resolution flow data, removing the spectral dimensions that are only useful for representing fine-scale details not only reduces the computational complexity but also improves the estimation results by suppressing less relevant and unresolvable small-scale features. However, removing fine-scale details may not be warranted in a geologic setting where those details dominate the flow and transport behavior of the system.
 This work is partially supported by research grants from the Department of Energy and the Society of Petroleum Engineers.