Corresponding author: P. C. Leube, Institute for Modelling Hydraulic and Environmental Systems (LH2), SimTech, University of Stuttgart, Pfaffenwaldring 61, DE-70569 Stuttgart, Germany. (firstname.lastname@example.org)
Many hydro(geo)logical problems are highly complex in space and time, coupled with scale issues, variability, and uncertainty. Time-dependent models in particular often consume enormous computational resources, but model reduction techniques can alleviate this problem. Temporal moments (TM) offer an approach to reduce the time demands of transient hydro(geo)logical simulations. Provided the governing equations are linear with time-independent coefficients, TM reduce them to steady state and directly simulate the temporal characteristics of the system. This is achieved by an integral transform that projects the dynamic system response onto monomials in time. In comparison to classical approaches of model reduction that involve orthogonal base functions, however, the monomials leading to TM are nonorthogonal, which might impair the quality and efficiency of model reduction. Thus, we raise the question of whether there are temporal base functions more suitable than the monomials that lead to TM. In this work, we derive theoretically that only a limited class of temporal base functions can reduce hydro(geo)logical models. By comparing those to TM, we conclude that, in terms of gained efficiency versus maintained accuracy, TM are the best possible choice. While our theoretical results hold for all systems of linear partial or ordinary differential equations (PDEs, ODEs) with any order of space and time derivatives, we illustrate our study with an example of pumping tests in a confined aquifer. For that case, we demonstrate that two (four) TM are sufficient to represent more than 80% (90%) of the dynamic behavior, and that the information content strictly increases with increasing TM order.
 In recent years, there has been a dramatic increase in the computational complexity of hydro(geo)logical models. This has been driven by new problems addressing large-scale relationships like global warming, reactive transport on the catchment scale, or CO2 sequestration. Computational model complexity becomes even more drastic when facing the ubiquitous need for uncertainty quantification and risk assessment in the environmental sciences [Christakos, 1992; Oreskes et al., 1994; Rubin, 2003] or stochastic inverse techniques for incorporating field data into uncertain hydrogeological models [e.g., Kitanidis, 1995; Gómez-Hernández et al., 1997; Evensen, 2007; Franssen et al., 2009]. For that reason, reducing the complexity of hydro(geo)logical models has been the focus of many research efforts [e.g., Hooimeijer, 2001].
The goal of mathematical (not conceptual) model reduction is to reduce the computational costs or, alternatively, to admit more conceptual complexity, finer resolution, or larger domains at the same computational costs, or to make a brute force optimization task more feasible [Razavi et al., 2012]. The computational demand of stochastic models for space-/time-dependent hydro(geo)logical systems shall serve as an example. The corresponding computational demand can be broken down into contributions from spatial, temporal, and stochastic resolution, e.g., spatial grid resolution, time step size, and the number of repeated simulations dedicated to uncertainty. The latter may involve, for example, Monte Carlo (MC) simulation [e.g., Freeze, 1975; Smith and Schwartz, 1980, 1981; Robert and Casella, 2004], polynomial chaos expansions (PCE) [e.g., Wiener, 1938; Li and Zhang, 2007; Oladyshkin et al., 2011a, 2012], or statistical moment generating equations [e.g., Neuman and Orr, 1993; Neuman, 1993; Zhang, 2002]. Reducing model complexity with adequate techniques (but not merely by reducing the spatial resolution) allows the modeler or investigator to almost maintain the required numerical prediction quality while controlling the computational costs.
In this work, we will focus on temporal complexity which is owed to the dynamic character of hydro(geo)logical systems. The dynamic character appears in time-dependent system response curves. Some examples include aquifer reactions due to recharge events, tidal pumping, or changing river stages [e.g., Yeh et al., 2009], drawdown curves (DC) due to the excitation of the subsurface water level in pumping tests [e.g., Fetter, 2001], solute breakthrough curves (BTC) during the injection of water-borne tracers and contaminant spills [e.g., Fetter, 1999], or reactions of river discharge to precipitation in hydrological models [e.g., Nash and Sutcliffe, 1970].
In our opinion, the most powerful contribution to temporal model reduction has been made by Harvey and Gorelick [1995]. Their approach converts dynamic models to steady state models by applying a Laplace transformation to the time dimension. After a Taylor expansion of the Laplace coefficients (LC), one can directly simulate characteristics of the time-dependent response curves, the so-called temporal moments (TM), with steady state equations. Alternatively, TM can be derived by projecting the time-dependent governing equations onto a series of monomials t^k of order k [e.g., Cirpka and Kitanidis, 2000a]. The generating equations for TM are steady state equivalents of the original governing equations, and so allow for swift evaluation. TM are said to capture the most significant aspects of the system response, such as strength, delay, and duration, and often have well-defined physical meanings (see section 2.2). Thus, TM intuitively bear a high information density, dramatically reducing computational costs at a comparatively small loss of information. The only prerequisites are that the governing equations must be linear (systems of) PDEs or ODEs, and the coefficients must be time invariant.
We observed that almost all applications involve hypothetical data and scenarios to test and demonstrate their methods, and almost no field applications exist. So we raised the question as to why there are only so few applications. Many models used by practitioners indeed satisfy the prerequisites for applying TM. We believe that hydro(geo)logists have simply not become comfortable yet with TM-based methods and that reservations exist about the possible loss of information when using only a few TM. The goal of the current study is to overcome these reservations, and to push the use of TM, including higher orders, further toward practical application. Among all work known to us, only Varni and Carrera, Cunningham and Roberts, and Nowak and Cirpka compared their numerical simulations against field measurements of groundwater age, grain-size distributions, and solute breakthrough curves, respectively. Nowak and Cirpka and Yin and Illman reduced measured real breakthrough curves and experimental drawdown curves, respectively, and then used TM in geostatistical inversion.
Furthermore, all of the applications of TM predominantly employed low-order moments. For example, the studies by Zhu and Yeh and Yin and Illman stated that using TM induces a loss of information in inverse modeling, based on an analysis up to the first-order TM. All of the studies we could find rarely provided reasons for the choice of order, and none of them assessed the information lost by not looking at higher orders. The fact that no systematic assessment has been performed raises the research question: How many TM are required to properly capture the behavior of a system?
Another way of removing the time dependence from the governing equation is to apply Laplace transform techniques. They have proven suitable for forward model reduction (including reconstruction of the full time series from simulated LC) when considering sequences of more than 10 and up to 100 LC [e.g., Sudicky, 1989]. Here the questions are: How many, and which, LC are required to properly represent the system?
In addition to integral transformations, increasing attention has been drawn by snapshot-based model reduction methods [Vermeulen et al., 2004; McPhee and William, 2008]. Via proper orthogonal decomposition (POD) into dominant spatial patterns [Papoulis, 1991], the model is reduced to some number of orthogonal base functions in physical space with time-dependent coefficients. Within other disciplines, this method is referred to as principal component analysis (PCA) [Pearson, 1901] or the Karhunen-Loève transform (KLT) [Loève, 1955]. We refer to these methods as spatial reduction methods since the model is, in effect, reduced in physical space while the time-related model complexity remains untouched. The scope of this work, however, is strictly limited to temporal reduction methods. This strict focus is legitimate because reduction methods in time can be evaluated independently of spatial methods. Reduction techniques in space and in time can be arbitrarily combined because space and time are independent coordinates. For that reason, we do not focus any further on spatial reduction methods.
A principal difference between POD and TM (in addition to working in space or time), however, is that TM employ nonorthogonal base functions. The advantage of working with orthogonal base functions is that they have very elegant properties in the solution of many mathematical and physical problems (e.g., Hermite polynomials are the optimal set of base functions when expanding functions of normally distributed variables [Cameron and Martin, 1947]). This leads to the next research question: Would other (and possibly orthogonal) base functions in temporal reduction work better than the nonorthogonal monomials that lead to TM?
We will answer the above research questions in the following order: We first address the question of whether other base functions in temporal reduction work better than the nonorthogonal monomials. To this end, we derive theoretically the classes of temporal base functions that can reduce hydro(geo)logical models, and compare those to the structural aspects of TM. We perform this analysis solely in terms of computational efficiency, with no attention paid to reconstruction options for recovering the original time series from the reduced model.
 Second, we discuss the research question: How many TM or LC are required to properly capture the dynamic system behavior? This is answered by measuring the information density of TM with regard to the underlying fully resolved time series. For this analysis, we apply a novel method called PreDIA (pre-posterior data impact assessor) [Leube et al., 2012] in order to assess the information density of TM. PreDIA is a nonlinear Bayesian filtering scheme originally developed for data worth analysis. We apply it here to measure the informational value of knowing any given number of TM from an unknown curve. PreDIA, as used here, may be seen as a generalized analysis of variance (ANOVA) [Harris, 1994] technique. We perform this analysis on an example featuring aquifer pumping tests. This example is based on a single, specific linear PDE and, hence, can by no means be seen as a general assessment for all linear PDEs involved in hydro(geo)logical problems. However, the very same analysis can easily be applied to other cases, e.g., the advection-dispersion equation or linear hydrological models.
 These research questions need to be mirrored against the purpose and context of model reduction. Sometimes, TM directly correspond to the physical quantities of interest. In some cases of forward modeling, however, the ability to reconstruct a full time series from the reduced model will be relevant. In inverse modeling, only the information content of measured time series captured by the reduced model will matter. We will address these issues throughout our study wherever appropriate.
 The remainder of this paper is organized as follows: Section 2 summarizes the concept of TM. Section 3 discusses possible alternative integral transformations, and section 4 compares their adequacy for model reduction to the approach of TM (both addressing the first research question). Section 5 features the analysis of information density on the example of drawdown curves in aquifer tests, thereby addressing the second research question.
2. Temporal Moments
2.1. Definition of Temporal Moments
Let f(x, t) be a space- and time-dependent response of a system to an external excitation starting at time t = 0 (e.g., a drawdown curve, solute breakthrough curve, or river discharge after a rainfall event). The vector x includes all considered spatial coordinates. The k-th temporal moment of f(x, t) is defined by

m_k(x) = ∫_0^∞ t^k f(x, t) dt,   (1)
where t^k is a monomial of order k, which is used as a base function. Then, the k-th raw moment is calculated by normalization with the zeroth moment m_0(x),

m_k'(x) = m_k(x) / m_0(x).   (2)
The normalization by m_0(x) makes f(x, t)/m_0(x) a function with density properties, i.e., it integrates to unity over time, similar to probability density functions. Raw temporal moments are then closely related to statistical moments, simply applied to time rather than to the values of some random variable. Typically, higher-order (temporal or statistical) moments are then centralized to the mean m_1'(x) and standardized by powers of the response duration using the binomial transform [e.g., Papoulis, 1991],

m_k^c(x) = Σ_{i=0}^{k} (k choose i) (−m_1'(x))^{k−i} m_i'(x),   (3)

m_k^cs(x) = m_k^c(x) / (m_2^c(x))^{k/2},   (4)
where m_k^c are centralized and m_k^cs are centralized and standardized moments. This is analogous to image pattern recognition, where algebraic moment invariants are calculated in order to make image features invariant with respect to scale, translation, and rotation [Prokop and Reeves, 1992]. TM can also be derived from the moment-generating function (MGF) via the Laplace transformation [Kubo, 1962; Harvey and Gorelick, 1995] or as the Taylor series coefficients in the spectral domain [Kendall and Stuart, 1977]. For additional details, we refer to Appendix A.
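As a numerical illustration of these definitions (not part of the original derivation), the moments of a sampled response curve can be computed by simple quadrature. The curve below is hypothetical; the check confirms that the binomial-transform centralization agrees with direct centering:

```python
import numpy as np

# Hypothetical sampled response curve (e.g., a breakthrough curve).
t = np.linspace(0.0, 60.0, 6001)
f = np.zeros_like(t)
mask = t > 0
f[mask] = np.exp(-(np.log(t[mask]) - 2.0) ** 2 / 0.5)  # lognormal-shaped pulse

def integrate(y, t):
    """Trapezoidal quadrature, int y(t) dt."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def temporal_moment(f, t, k):
    """k-th raw (non-normalized) temporal moment m_k = int t^k f(t) dt."""
    return integrate(t ** k * f, t)

m0 = temporal_moment(f, t, 0)                    # response strength
m1n = temporal_moment(f, t, 1) / m0              # characteristic response time
m2c = temporal_moment(f, t, 2) / m0 - m1n ** 2   # squared response duration

# The binomial-transform centralization must agree with direct centering:
m2c_direct = integrate((t - m1n) ** 2 * f, t) / m0
```

The two routes to m_2^c are algebraically identical, so they agree to floating-point precision on the same sampled data.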
2.2. Physical Meaning of Temporal Moments
Based on the order k of the respective base function t^k, temporal moments capture different individual features of the response curves f(x, t). As summarized in the Introduction, most existing applications only consider lower-order TM. Figure 1 illustrates the zeroth through second moments. The zeroth temporal moment m_0 is a simple integral of the response over time and so measures the overall response strength. This is marked as the shaded area under the example response curve in Figure 1. The first raw (normalized) temporal moment m_1' provides information on the time between excitation and bulk response (vertical dashed line), i.e., a characteristic response time of the system, whereas the second normalized and centralized TM m_2^c is the square of a characteristic response duration. Higher-order temporal moments represent the asymmetry (skewness or tailing) and peakedness (kurtosis) of the response curve and other higher-order characteristics also known in statistics.
In many situations, these features can be related to the governing physical flow and transport processes. The most well-known cases are TM from aquifer testing (drawdown curves) and TM from tracer experiments (e.g., breakthrough curves). For a drawdown curve obtained from slug-like aquifer tests, m_0 is related to the steady state drawdown that would result from continuous pumping. The characteristic relaxation time is given by the first normalized moment, which also bears some transient information that is needed to estimate the storativity [Li et al., 2005].
In porous-media or fractured-porous-media solute transport, temporal moments summarize the information on the transport from the point (volume) of solute injection to the point (area or volume) of observation, yielding a path-integrated measure of transport characteristics. Here the zeroth TM m_0 is the total observed mass at a given point x. The first TM is related to the bulk arrival time along the travel path, reflecting the apparent average seepage velocity [Aris, 1958; Cirpka and Kitanidis, 2000b; Goode, 1996]. The second TM is physically related to local dilution and may be used to define the corresponding dispersion coefficients [e.g., Cirpka and Kitanidis, 2000b]. Higher TM represent more complex information on structural properties of porous media, e.g., caused by nonuniform grain-size distributions [Cunningham and Roberts, 1998]. Luo et al. derived a direct relation between the TM of breakthrough curves (BTC) and the TM of memory functions in multirate mass transfer (MRMT). A great advantage in this context is that TM remove the nonlocality in time from MRMT equations.
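These relations can be checked numerically. The sketch below (our own illustration, with arbitrary parameter values) uses the inverse-Gaussian flux breakthrough curve of 1-D advective-dispersive transport, a textbook solution whose first normalized TM equals the bulk arrival time L/v and whose second centralized TM equals 2DL/v^3, so that an apparent velocity and dispersion coefficient can be recovered from the moments alone:

```python
import numpy as np

# Illustrative parameters: travel distance L, seepage velocity v, dispersion D.
L, v, D = 10.0, 1.0, 0.05

t = np.linspace(1e-6, 30.0, 30001)
# Inverse-Gaussian breakthrough curve for an instantaneous unit injection.
btc = L / np.sqrt(4.0 * np.pi * D * t ** 3) \
    * np.exp(-(L - v * t) ** 2 / (4.0 * D * t))

def integrate(y):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

m0 = integrate(btc)                            # recovered mass (= 1 here)
m1n = integrate(t * btc) / m0                  # bulk arrival time, approx L / v
m2c = integrate(t ** 2 * btc) / m0 - m1n ** 2  # approx 2 D L / v**3

v_app = L / m1n                                # apparent seepage velocity
D_app = m2c * v_app ** 3 / (2.0 * L)           # apparent dispersion coefficient
```

For this advection-dominated setting, v_app and D_app reproduce the input v and D up to quadrature and truncation error.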
2.3. Simulating Temporal Moments
In addition to their intuitive interpretation, physical meaning, and significance in theoretical analysis, temporal moments have the advantage that they can be simulated at very low computational costs. Let us consider a generic linear dynamic and distributed system (e.g., described by a system of coupled linear PDEs or a single linear PDE such as the groundwater flow equation [Bear, 1972]),

S ∂f(x, t)/∂t = L f(x, t) + w(x, t),   (5)
with the generic initial and boundary conditions,
with storage coefficient S, a linear spatial differential operator L of arbitrary order (including adequate coefficients such as hydraulic conductivity), and forcing term w(x, t); the remaining symbols denote the corresponding prescribed values on the boundaries. Without loss of generality, we may subtract a reference solution from the equation. This is allowable for all linear PDE systems and makes most intuitive sense if the reference represents a steady state. Applying equation (1) to equations (5)–(8) reduces the transient PDE (equation (5)) to a set of steady state equations,

−k S m_{k−1}(x) = L m_k(x) + w_k(x), with m_{−1}(x) := 0,   (9)
with boundary conditions,
where w_k(x) is the k-th moment of the forcing term, now including the model forcing by the initial condition (if it is not equal to the steady state at t = 0), and the boundary values are replaced by their k-th moments. Quite obviously, the considered moments of the forcing and boundary terms have to be finite, which is satisfied if a finite forcing persists over a finite time. The detailed steps that lead to equations (9)–(12) are provided for arbitrary base functions in section 3. Auxiliary conditions required for integrating the time derivative are t^k f(x, t) → 0 and t^k ∂f(x, t)/∂t → 0 as t → ∞, i.e., the system response has to decay to zero asymptotically faster than the highest power of t used in the analysis as time approaches infinity.
Overall, this allows us to simulate TM at the computational costs of a few steady state simulations, avoiding the costly need for time-marching schemes in transient simulations. As a matter of fact, TM are calculated recursively, where the previous TM of order (k − 1) serves as source term for the current TM of order k. Applications to specific problems can be found elsewhere, as reviewed in the Introduction. For drawdown from pumping tests in confined aquifers, which is governed by a parabolic PDE, TM reduce equations (8)–(10) to an elliptic PDE with formally time-dependent boundary conditions.
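A minimal numerical sketch of this recursion (our own illustration, with an arbitrary 1-D diffusion problem standing in for confined flow, unit storage coefficient S, zero boundary values, and a hypothetical pulse forcing): the zeroth and first moments obtained from two steady-state solves agree with those integrated from a full time-stepping run.

```python
import numpy as np

# 1-D stand-in for confined flow: S dh/dt = d2h/dx2 + w(x, t).
N, S = 101, 1.0
x = np.linspace(0.0, 1.0, N)
dx = x[1] - x[0]

# Discrete Laplacian with h = 0 Dirichlet boundaries (interior nodes only).
A = (np.diag(-2.0 * np.ones(N - 2)) + np.diag(np.ones(N - 3), 1)
     + np.diag(np.ones(N - 3), -1)) / dx ** 2

q = np.exp(-((x[1:-1] - 0.3) / 0.05) ** 2)  # spatial footprint of the forcing
t_p = 0.05                                  # pulse duration of the forcing

# --- Steady-state route: recursive moment-generating equations ------------
#   k = 0:  0      = A m0 + w0,  with w0 = q * t_p
#   k = 1:  -S m0  = A m1 + w1,  with w1 = q * t_p**2 / 2
m0_s = np.linalg.solve(A, -q * t_p)
m1_s = np.linalg.solve(A, -q * t_p ** 2 / 2 - S * m0_s)

# --- Transient route: implicit Euler time stepping, then time integration --
dt, t_end = 1e-3, 4.0
h = np.zeros(N - 2)
m0_t = np.zeros(N - 2)
m1_t = np.zeros(N - 2)
Minv = np.linalg.inv(np.eye(N - 2) - dt * A / S)
t = 0.0
while t < t_end:
    w = q if t < t_p else 0.0 * q
    h = Minv @ (h + dt * w / S)
    t += dt
    m0_t += h * dt            # accumulate int h dt
    m1_t += t * h * dt        # accumulate int t h dt

err0 = np.max(np.abs(m0_t - m0_s)) / np.max(np.abs(m0_s))
err1 = np.max(np.abs(m1_t - m1_s)) / np.max(np.abs(m1_s))
```

The two steady solves replace thousands of time steps; the remaining few-percent mismatch is the time-discretization error of the transient reference run, not of the moment equations.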
3. Coupling Structure of Resulting Equations
Now we pursue our first research question, i.e., what constraints does a set of arbitrary base functions have to meet such that computational costs can best be reduced? It can be anticipated from equations (1)–(10) that choosing base functions other than the monomials t^k will lead to other, more general, temporal characteristics than TM. The key question is whether the resulting generating equations are fully coupled, recursively coupled (such as for TM), or independent. In order to analyze this issue, we replace the monomials t^k in equations (1)–(10) with a set of yet unspecified base functions φ_k(t), and repeat all of the steps analogously. This leads to a definition for arbitrary characteristics,

m_k^φ(x) = ∫_0^∞ φ_k(t) f(x, t) dt,   (13)
with their corresponding generating equations,
The boundary conditions are again,
Integrating terms (2) and (3) and the boundary conditions is trivial, since the coefficients are independent of time and the time integral can be moved into the spatial differential operator. This leads to a differential expression for the new temporal characteristics m_k^φ(x). Term (1) requires an integration by parts and leads to terms (4) and (5) in the following equation:
where m_k^φ(x) are the temporal characteristics of order k that correspond to TM in equation (1) when setting φ_k(t) = t^k, and w_k^φ(x) are the corresponding characteristics of the forcing function w(x, t). When the auxiliary conditions are adapted accordingly, term (4) vanishes. The associated boundary conditions become
Equation (17) can be solved without reverting to the time-dependent solution of equation (5) if and only if the remaining term (5) can be expressed through a combination of characteristics with arbitrary orders k ranging from 0 to K, such that all time-related differential and integral operators disappear,

dφ_K(t)/dt = Σ_{k=0}^{K} c_Kk φ_k(t),   (20)
where c_Kk are linear coefficients. Applying equation (13) and the auxiliary conditions to both sides of equation (20) allows the replacement of f(x, t) by its characteristics and the rewriting of equation (20) as a system of ordinary differential equations (ODEs),
This set of equations will finally allow the replacement of all remaining time-related operators in term (5) of equation (14), and lead to a coupling between the K replicates of equation (14) for all orders k.
4. Different Coupling Cases
We will now investigate the specific coupling cases that can occur in equation (21). The goal is to find the set of base functions that allows the most swiftly simulated temporal characteristics from equation (14), while summarizing the dynamic behavior of f(x, t) as well as possible with a small number of characteristics. In this way, we wish to find the most efficient approach to model reduction in time.
 Putting equation (21) into matrix notation reveals different cases of coupling schemes between the replicates of equation (14) as illustrated in Figure 2. Four specific cases are of particular relevance for further analysis and will be discussed in the following sections.
4.1. Fully Populated Case
 In the most general case (a), term (5) can only be expressed as a linear combination of all lower- and higher-order characteristics, leading to a fully populated coupling matrix. This will occur only if the base functions are nonpolynomial, e.g., rational, trigonometric, etc., such that none of their time derivatives vanish.
For the final purpose of simulating temporal characteristics, this will lead to a fully coupled system of equations in equation (14). This is infeasible, because it will be much more expensive to solve than recursively coupled systems or decoupled equations (see the other cases). Also, it may exclude commercial software packages from being used if they do not allow for solving coupled equations. Therefore, we can immediately remove the general case (a) from our further considerations.
4.2. Lower-Order Case
Case (b) represents the situation where term (5) can be expressed as a linear combination of characteristics of order smaller than k only, leading to a lower triangular coupling matrix. This can only occur if the base functions are polynomials of order k (or polynomial approximations of trigonometric, hyperbolic, square root, logarithmic, or any other arbitrary base functions, truncated at order k and sorted in ascending order). From equation (21), it can be seen that in this case the first line directly leads to a constant φ_0, such that φ_1 must be of first order in t, and so on.
Let us now consider an arbitrary polynomial base function φ_k(t) expressed as a linear combination of monomials t^i,

φ_k(t) = Σ_{i=0}^{k} a_ki t^i,

with time-independent coefficients a_ki. When pursuing this approach, we get

m_k^φ(x) = ∫_0^∞ Σ_{i=0}^{k} a_ki t^i f(x, t) dt.

Pulling the sum and the coefficients a_ki outside the integral yields

m_k^φ(x) = Σ_{i=0}^{k} a_ki ∫_0^∞ t^i f(x, t) dt,

which can be expressed as a linear combination of TM of order i ≤ k,

m_k^φ(x) = Σ_{i=0}^{k} a_ki m_i(x).
From this it follows that any temporal characteristic based on arbitrary polynomial base functions obeying case (b) can be mimicked by TM through linear recombination. Therefore, arbitrary polynomials of order k will capture the same temporal information, yet at slightly higher computational costs (due to the treatment of multiple source terms in equation (17)). A drawback of polynomial base functions is that reconstruction techniques for the full time series from the characteristics are satisfactory only when choosing adequate shape assumptions (e.g., Harvey and Gorelick, Enzenhöfer et al., and Mohammad-Djafari), which may introduce subjectivity into the analysis. For data compression in inverse modeling, however, this problem is insignificant.
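This linear-recombination argument is easy to check numerically (our illustration; the response curve f(t) = t e^{−t} and the polynomial coefficients are arbitrary). For this curve the raw moments are m_k = (k + 1)!, so the characteristic for φ(t) = 1 − 2t + 3t^2 must equal 1·1 − 2·2 + 3·6 = 15:

```python
import numpy as np

t = np.linspace(0.0, 40.0, 4001)
f = t * np.exp(-t)                      # example decaying response curve

def integrate(y):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

a = [1.0, -2.0, 3.0]                    # phi(t) = 1 - 2 t + 3 t^2
phi = a[0] + a[1] * t + a[2] * t ** 2

direct = integrate(phi * f)             # characteristic for base function phi
via_tm = sum(a_i * integrate(t ** i * f) for i, a_i in enumerate(a))
```

Both routes perform the same linear operations, so they agree to floating-point precision, and both match the analytical value 15.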
4.3. Appell Case
Case (c) is a special case of (b) involving the so-called Appell sequences [Appell, 1880]. Appell sequences include the monomials that lead to TM, Hermite polynomials, Bernoulli polynomials, and Euler polynomials. They are, in fact, defined via an ODE system that is simpler than equation (21), dφ_k(t)/dt = k φ_{k−1}(t), occupying only the secondary diagonal of the coupling matrix. By the nature of this coupling, it is obvious that the recursive coupling is computationally the most efficient way to simulate temporal characteristics, together with the last case (d).
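For instance, the probabilists' Hermite polynomials He_k satisfy the Appell relation dφ_k/dt = k φ_{k−1} exactly, which can be checked with NumPy (our illustration):

```python
import numpy as np
from numpy.polynomial.hermite_e import HermiteE

# Appell property: d/dt He_k(t) = k * He_{k-1}(t), i.e., the coupling matrix
# of equation (21) has entries only on the secondary diagonal.
for k in range(1, 6):
    lhs = HermiteE.basis(k).deriv()     # d/dt He_k
    rhs = k * HermiteE.basis(k - 1)     # k * He_{k-1}
    assert np.allclose(lhs.coef, rhs.coef)
```

The monomials t^k obey the same relation (d/dt t^k = k t^{k−1}), which is what makes the TM-generating equations recursively, rather than fully, coupled.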
4.4. Laplace Diagonal Case
Case (d) considers the situation where the coupling terms only occupy the main diagonal. Guaranteeing that dφ_k(t)/dt is proportional to φ_k(t) itself can be fulfilled if and only if φ_k(t) = exp(−s_k t), which directly leads to the Laplace transformation (LT). The relation of the LT to TM is recalled in Appendix A. In brief, the LT yields the spectrum of the system response, and TM are the Taylor expansion coefficients of the spectrum. As a consequence of diagonal coupling, the Laplace coefficients (LC) can be determined independently (uncoupled, nonrecursively). However, it is unclear which LC summarize the dynamic behavior best. As a direct consequence, applications employing the LT typically used between 10 and 40, sometimes even 100, LC to accurately restore the original response [e.g., Li et al., 1992; Sudicky and McLaren, 1992].
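The Taylor-coefficient relation between TM and the Laplace spectrum, F(s) = Σ_k (−1)^k m_k s^k / k!, can be verified for a curve with known moments (our illustration; for f(t) = t e^{−t}, the moments are m_k = (k + 1)! and the spectrum is F(s) = 1/(1 + s)^2):

```python
import numpy as np
from math import factorial

t = np.linspace(0.0, 60.0, 60001)
f = t * np.exp(-t)                    # example curve with F(s) = 1/(1+s)^2

def integrate(y):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

s = 0.1
F_num = integrate(f * np.exp(-s * t))  # one Laplace coefficient, F(s)
# Taylor series of the spectrum around s = 0, built from temporal moments:
F_tm = sum((-s) ** k / factorial(k) * integrate(t ** k * f)
           for k in range(12))
```

For small s, the moment series converges rapidly to the numerically evaluated LC, illustrating that TM and LC carry the same spectral information in different arrangements.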
We rate the Laplace case as practical for the purpose of reconstructing the full time series when considering sequences of more than 10 and up to 100 LC, because there are swift and accurate algorithms for the inverse Laplace transform [e.g., Crump, 1976]. For only a few LC, however, the reconstruction lacks accuracy. Another disadvantage is that the inverse Laplace transform cannot guarantee nonnegativity (compare section 4.6). For the purpose of data compression and model reduction in inverse modeling, the reconstruction options offer no advantage at all. Also, no physical meaning has been attributed to LC, to the best of our knowledge.
4.5. Orthogonal Case
Characteristics should summarize the dynamic behavior of f(x, t) as well as possible already with a small number of terms. Only then can the sequence of considered characteristics be truncated at a low order, leading to a small set of replicates of equation (14) to be solved. In analogy to signal processing, we refer to this desired property as optimal compression. A prerequisite for optimal compression is that the temporal characteristics, taken in the order of the sequence, add large and possibly nonredundant units of information, sorted from most to least significant. Using the terminology of Fourier, Laplace, or more general integral transforms [e.g., Debnath and Bhatta, 2007], the spectrum has to decay as fast as possible with increasing order k, by an adequate choice of the base functions φ_k(t). The goal of nonredundancy can be achieved by guaranteeing orthogonality among the respective base functions.
We will now investigate whether any orthogonal base functions exist that allow us to reduce equation (14). Orthogonality between base functions φ_j(t) and φ_k(t) is defined as

∫ φ_j(t) φ_k(t) w(t) dt = N_k δ_jk,
with respect to the weighting function w(t); δ_jk denotes the Kronecker delta. N_k is the squared weighted L2 norm and depends on the choice of the base function φ_k(t). In this context, the optimal choice of the base functions strongly depends on the associated weighting function and its own moments [e.g., Abramowitz and Stegun, 1972; Oladyshkin et al., 2011b], and on the integration interval.
Orthogonal base functions have already been exploited in fields different from ours. In image processing, Teague established orthogonal polynomials in order to derive moments invariant with respect to image translation. In the context of object reconstruction, Prokop and Reeves observed that monomials are highly correlated and thus introduced orthogonalized moments in order to reduce the information redundancy among conventional moments. Furthermore, they concluded that orthogonal moments are more suitable in image reconstruction and may be used to determine the minimum number of moments required to adequately reconstruct, and thus uniquely characterize, a given image. In chromatography, Kucera suggested expanding the time-dependent response in order to analytically solve the advection-dispersion equation, using orthogonal Hermite polynomials to this end.
For most dynamic, distributed systems of interest in hydro(geo)logical applications, we have an infinite time interval t ∈ [0, ∞) and a constant weighting function w(t) = 1. Under these conditions, it is impossible to define orthogonal base functions because the moments of w(t) are infinite (compare Oladyshkin et al. [2011b]). Orthogonal base functions can thus be excluded from our considerations, since they do not exist for the class of problems we are interested in. An interesting case for future research are response curves truncated at some upper time T, such that t ∈ [0, T]. This leads to so-called truncated TM [Luo et al., 2006], which may be accessible to orthogonalization.
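The truncated case can be illustrated numerically (our sketch, with T = 1 and w(t) = 1): the Gram matrix of the monomials on [0, 1] is the notoriously ill-conditioned Hilbert matrix, whereas the shifted Legendre polynomials P_k(2t − 1) are an orthogonal counterpart on the same interval:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

# Gram matrix of the monomials t^i on [0, 1]: G_ij = 1 / (i + j + 1),
# the Hilbert matrix, whose condition number explodes with the order K.
K = 8
G = np.array([[1.0 / (i + j + 1) for j in range(K)] for i in range(K)])
cond = np.linalg.cond(G)              # already ~1e10 for K = 8

# Orthogonalized counterpart: shifted Legendre polynomials on [0, 1].
t = np.linspace(0.0, 1.0, 20001)
P2 = Legendre.basis(2)(2.0 * t - 1.0)
P3 = Legendre.basis(3)(2.0 * t - 1.0)

def integrate(y):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

cross = integrate(P2 * P3)            # = 0 by orthogonality
norm2 = integrate(P2 * P2)            # = 1 / (2*2 + 1) = 0.2
```

The extreme ill-conditioning quantifies the redundancy among monomial-based characteristics on a finite interval; on the infinite interval relevant here, even this orthogonalization route is unavailable.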
4.6. Cumulant Case
For the sake of completeness, we recall the case of cumulants [Van Kampen, 2007]. As summarized in Appendix A, cumulants are also related to the Laplace transform (LT). Applying the natural logarithm to the moment-generating function (MGF) yields the spectrum of the so-called cumulants. Cumulants are not able to reduce equation (14) because applying the logarithm converts terms (1) and (2) to mixed integro-differential expressions and, hence, irreversibly changes the character of the parabolic PDE.
Cumulants have the elegant property that they allow reconstruction of the dynamic response using the so-called Edgeworth expansion, which has highly advantageous convergence properties for nearly Gaussian problems [Kendall and Stuart, 1977]. Cumulants can be obtained from TM via the relation provided in equation (A5) in Appendix A. However, nonnegativity of the response is often a physical requirement, and the Edgeworth expansion cannot guarantee nonnegativity. For these reasons and for our purposes, we can also exclude cumulants.
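The moment-to-cumulant conversion can be sketched numerically for an exponential response f(t) = e^{−t}, whose cumulants are known to be κ_k = (k − 1)! (our illustration; the conversion formulas below are the standard statistical relations, used here in place of equation (A5)):

```python
import numpy as np

t = np.linspace(0.0, 50.0, 50001)
f = np.exp(-t)                        # exponential response, cumulants (k-1)!

def integrate(y):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

m = [integrate(t ** k * f) for k in range(5)]  # raw moments, m[k] = k!
m = [mk / m[0] for mk in m]                    # normalize by m0

# First four cumulants from normalized raw moments:
k1 = m[1]
k2 = m[2] - m[1] ** 2
k3 = m[3] - 3 * m[1] * m[2] + 2 * m[1] ** 3
k4 = (m[4] - 4 * m[1] * m[3] - 3 * m[2] ** 2
      + 12 * m[1] ** 2 * m[2] - 6 * m[1] ** 4)
```

The recovered values approach κ_1 = 1, κ_2 = 1, κ_3 = 2, κ_4 = 6, confirming that cumulants are linear-algebraic recombinations of TM and therefore add no independent reduction capability.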
4.7. Conclusions on the Basis of Coupling Analysis
The derivations in equations (13)–(21) hold for a very generic form of linear PDEs. For example, this form includes linear parabolic PDEs (e.g., representing a dynamic confined groundwater model). The examples by Harvey and Gorelick and Luo et al. for advective-dispersive contaminant transport and MRMT, respectively, illustrate how this concept also holds for other parabolic or partial integro-differential equations. In fact, the results apply to any (system of) linear PDEs with the following properties: the spatial derivatives may have any arbitrary order; there may be an arbitrary number of arbitrary-order time derivatives; and, for the integration by parts to work out, the coefficients must be independent of time and independent of the solution f(x, t).
 From all of the considered cases in sections 4.1 through 4.6, we draw the following conclusions:
 1. Any temporal characteristic based on arbitrary polynomial base functions or on cumulants can be mimicked by TM through linear recombination, and would not offer improved computational efficiency compared to TM.
 2. Polynomial-based temporal characteristics in general contain the same information as TM, simply arranged in different linear combinations. Hence, they cannot capture more information from the dynamic system.
 3. As an overall consequence, there is no temporal model reduction for dynamic systems based on arbitrary integral transforms with polynomial base functions that is better than the monomials leading to TM.
 4. The only remaining integral transform that reduces equation (5)to a noncoupled system of steady state PDEs is the Laplace transform. In all applications of the Laplace transform that we could find, the number of characteristics necessary to capture the dynamic behavior was in the range of tens to hundreds. This is only satisfying when the goal is to reconstruct the full time series from the reduced-model simulations. For sets of characteristics smaller than 10, the reconstruction from LC lacks accuracy. Nonnegativity, which is often a physical requirement, cannot be guaranteed. Also, the possible information content in the LC is hard to access because it is a priori unknown which parts of the Laplace spectrum to use.
 Thus, two open questions remain: (a) How does the compression rate converge, or in other words, how many TM or LC are necessary to achieve a sufficient degree of compression? And (b) do TM, unlike the Laplace transform, provide the most important information units first, and possibly in a strictly ordered fashion? We will pursue and answer these research questions in section 5 using a specific example. On the basis of a single example, of course, we cannot claim generality for the results of section 5.
5. Compression Efficiency
 In order to answer the remaining two questions raised in section 4.7, we numerically measure the compression efficiency of TM or LC against the underlying highly resolved time series. In other words, we measure the error due to the temporal compression on only a few characteristics. In general, the number of necessary characteristics to achieve an acceptably small compression error (question a) and the convergence behavior (question b) will depend on the properties of the system under investigation along with its initial, boundary, and forcing conditions, and on the given application context. For illustration, but without claiming generality, we choose an example from groundwater flow. The approach we pursue in the following, however, can be suggested as a general methodology to assess the compression efficiency for arbitrary systems of interest.
5.1. Efficiency Measures
 The most intuitive way to measure the compression efficiency of TM would be through the L2 norm of the approximation error,
where the reconstruction of the time series at a given location is based on a set of TM. The individual choice of reconstruction technique, e.g., maximum entropy [Jaynes, 1957] or Edgeworth series expansion [Kendall and Stuart, 1977], however, inevitably introduces an error of its own. To make our analysis independent of the error of the specific reconstruction technique chosen, we replace the L2 norm by a statistically motivated norm based on Bayesian principles, the conditional standard deviation (CStD),
which is based on the conditional variance of the response given a set of TM (or other characteristics) at the location of interest. The CStD reflects the requirement that TM or other characteristics should at least be informative enough to identify response curves within an ensemble of physically plausible random response curves. Such a sample of random time series realizations typically arises in stochastic problems or systems with random coefficients. To make our analysis independent of any one specific set of TM values used for conditioning, we average over a corresponding spectrum of TM values, yielding the compression error,
which is related to equation (28) via the law of total variance [Weiss, 2006]. Note that, for notational simplicity, we no longer carry the dependence on location.
 This procedure is a specific case of the framework (PreDIA) developed by Leube et al. to measure the level of information carried by data (here TM or LC) in the context of optimal design of experiments. It may be viewed as a generalized ANOVA [Harris, 1994] technique. In this context, PreDIA is used to analyze the explanatory power of a set of TM in restricting the prediction variance of drawdown curves. The underlying analysis relies on Monte Carlo (MC) simulations of random system responses in combination with a likelihood-based reweighting [Smith and Gelfand, 1992] of the MC ensemble that performs the conditioning needed in equations (28) and (29). For details on the numerical implementation and a deeper discussion, we refer to the original publication.
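The conditioning step just described can be sketched in a few lines. The following Python snippet is a hypothetical toy version only: damped-exponential curves stand in for the MODFLOW drawdown ensemble, and the 5% Gaussian error model and all variable names are our own assumptions, not the PreDIA implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in ensemble: the real study uses MODFLOW drawdown
# curves; here, random damped-exponential curves play that role.
nt, nmc = 100, 5000
t = np.linspace(0.0, 10.0, nt)
dt = t[1] - t[0]
a = rng.lognormal(0.0, 0.3, nmc)
b = rng.lognormal(-1.0, 0.3, nmc)
curves = a[:, None] * np.exp(-b[:, None] * t)        # shape (nmc, nt)

# Temporal moments m_k = int t^k r(t) dt for k = 0..K-1 (simple Riemann sum)
K = 3
tm = np.stack([(t**k * curves).sum(axis=1) * dt for k in range(K)], axis=1)

# Likelihood-based reweighting [Smith and Gelfand, 1992]: condition the MC
# ensemble on "observed" TM values with an assumed Gaussian error model.
tm_obs = tm[0]                    # pretend realization 0 supplied the data
sigma = 0.05 * np.abs(tm_obs)     # hypothetical 5% measurement error
loglik = -0.5 * (((tm - tm_obs) / sigma) ** 2).sum(axis=1)
w = np.exp(loglik - loglik.max())
w /= w.sum()

# Conditional vs. unconditional variance of r(t), and a normalized
# compression-error analog (cf. equations (28)-(31))
mean_c = w @ curves
var_c = w @ (curves - mean_c) ** 2
var_u = curves.var(axis=0)
Rn = np.sqrt(var_c.sum() / var_u.sum())
```

Conditioning on only three noisy TM already shrinks the ensemble spread substantially (Rn well below 1), which is the effect the compression-error measures quantify.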
 Finally, the compression error is normalized by its value for k = 0, i.e., by the uncertainty in the absence of any TM or LC, yielding the normalized compression error
For time-integrated analysis, we first integrate over time, normalize again, and define the total normalized compression error by
Figure 3 illustrates the above-introduced measures. We consider a function described by three unknown parameters a through c,
where a through c are assumed to be log-normally distributed. The uncertainty in y(t) due to the lack of knowledge of a through c steadily decreases when learning about the parameters, eventually reaching zero when all three parameters are known. Storing only zero to three parameters instead of the entire curve resembles a data compression, and the resulting uncertainty in y(t), integrated over time, is the related compression error. Normalized by the compression error for zero parameters, this yields the normalized compression error in equation (31).
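This illustration can be reproduced numerically. The sketch below is purely hypothetical: the paper's equation (32) is not reproduced here, so a generic three-parameter curve y(t) = a exp(-bt) + c with lognormal a, b, c stands in for it, and all distribution parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 50)

def y(p):
    # hypothetical three-parameter curve standing in for equation (32)
    return p[..., 0:1] * np.exp(-p[..., 1:2] * t) + p[..., 2:3]

def avg_cond_var(n_known, n_outer=200, n_inner=200):
    # law of total variance: average, over the 'known' (stored) parameters,
    # the variance of y(t) over re-drawn values of the remaining parameters
    known = rng.lognormal(0.0, 0.3, size=(n_outer, 1, 3))
    p = rng.lognormal(0.0, 0.3, size=(n_outer, n_inner, 3))
    p[:, :, :n_known] = known[:, :, :n_known]
    return y(p).var(axis=1).mean()

base = avg_cond_var(0)   # uncertainty when no parameter is stored
Rt = [np.sqrt(avg_cond_var(j) / base) for j in range(4)]
# Rt decreases from ~1 (nothing stored) to 0 (all three parameters stored)
```

The list Rt is the toy analog of the normalized compression error in equation (31): it starts near one and drops monotonically to zero as more parameters are stored.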
5.2. Physical Scenario
 As a simple illustrative example of a dynamic distributed and linear system, we consider groundwater flow in a 2D depth-averaged heterogeneous aquifer with random transmissivity during a pumping test,
with a sink term, locally isotropic transmissivity, and drawdown as the dependent variable. The initial and boundary conditions are
where an extraction well pumps at a constant rate (in m3 s−1) during a finite pumping interval (in h), while the response is monitored at a separate observation location. We define lnT as a discretized random space function represented by cell-wise values on a fine numerical grid. Following classical geostatistical ideas, we take the mean of lnT as a known constant and assume lnT to be second-order stationary with an isotropic Gaussian covariance function C(h) that only depends on the separation vector h [e.g., Kitanidis, 1997].
 For the variation of S, we follow the suggestion of Li et al., who reviewed the sparse literature on the variability of the storage coefficient S and recommended using a spatially constant S with a lognormal distribution and a prescribed log-variance. Table 1 summarizes the relevant geostatistical parameters associated with the ensemble generation.
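One common way to generate such second-order stationary Gaussian fields is FFT-based spectral sampling. The sketch below is a simplified, assumption-laden illustration of that idea (it is not the exact Dietrich/Newsam circulant-embedding algorithm used in the study, and the function name, grid size, and parameter values are our own choices):

```python
import numpy as np

def gaussian_random_field(n, L, var, lam, seed=None):
    """Sample a 2D stationary Gaussian field on an n x n grid of side
    length L, with isotropic Gaussian covariance
    C(h) = var * exp(-|h|^2 / lam^2), by FFT spectral synthesis."""
    rng = np.random.default_rng(seed)
    dx = L / n
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    # spectral density (2D Fourier transform) of the Gaussian covariance
    S = var * np.pi * lam**2 * np.exp(-(kx**2 + ky**2) * lam**2 / 4.0)
    amp = np.sqrt(S) / L
    noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    # sum of spectral modes with random complex amplitudes; ifft2 carries
    # a 1/n^2 factor, hence the n^2 rescaling
    return np.real(np.fft.ifft2(amp * noise)) * n**2

# e.g., one log-transmissivity realization: lnT = mean + fluctuation field
lnT = 1.0 + gaussian_random_field(64, 10.0, var=1.0, lam=1.0, seed=7)
```

The normalization is chosen so that the point variance of the field matches var whenever the grid resolves the spectral density (dx much smaller than lam, L much larger than lam).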
Table 1. Parameters for the Flow and Geostatistical Model. Parameter values marked by stars are varied within the scenario analysis (see Table 2); the entries include the integral scales of lnT and lnS.
 The transient problem (equation (33)) and its corresponding TM (equations (9) and (10)) are solved in parallel by MODFLOW-2005 [Harbaugh, 2005] on 80 cores with 2.8 GHz each. To guarantee a highly accurate sampling of the response and its TM, we employ a Monte Carlo ensemble of 250,000 realizations. Realizations of lnT are generated with the same implementation of fast Fourier transform (FFT)-based methods [Newsam and Dietrich, 1994] as used by Nowak et al. Owing to the weighting-based importance resampling used in PreDIA, the accuracy of the analysis degrades with increasing TM sequence length K or, in general, with stronger conditioning on data [Leube et al., 2012]. These limitations, also known as the “curse of dimensionality” or “filter degeneracy,” have been the subject of many studies in the past [e.g., Liu, 2008; Snyder et al., 2008; Van Leeuwen, 2009]. We ensure that our results are unaffected by this problem by assessing the associated MC error of computing R, Rn, and Rt by means of the bootstrap method [Efron, 1982]. To the best of the authors' knowledge, this has not been done before in the context of any reweighting-based MC analysis.
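The bootstrap assessment of the MC error can be sketched compactly. In the toy Python version below, a weighted variance of synthetic samples stands in for the PreDIA output statistic; the data, weights, and resampling size are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical weighted-ensemble statistic: a conditional variance computed
# from importance weights w and samples x (stand-ins for PreDIA output).
x = rng.lognormal(0.0, 0.5, 2000)
w = rng.exponential(1.0, 2000)

def weighted_var(x, w):
    w = w / w.sum()
    m = w @ x
    return w @ (x - m) ** 2

# Bootstrap [Efron, 1982]: resample realizations with replacement and
# recompute the statistic to estimate its Monte Carlo error.
B = 500
stats = np.empty(B)
for b in range(B):
    idx = rng.integers(0, x.size, x.size)
    stats[b] = weighted_var(x[idx], w[idx])

mc_error = stats.std(ddof=1)   # one MC standard deviation of the statistic
```

Resampling sample/weight pairs jointly is the key point: it propagates the degeneracy of the importance weights into the error estimate.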
 In order to investigate the sensitivity of our analysis to the choice of the underlying geostatistical scenario, we repeat the analysis in several scenario variations in which we vary the most relevant geostatistical parameters, as summarized in Table 2.
Table 2. Definition of Scenario Variations in Test Cases. Cases 1a, 1b, and 1c have identical values for μS and λS,1,2.
5.3. Cumulative Efficiency Analysis for TM
 In the next three sections, we present and discuss the results of our analysis. In this section, we consider TM sequences of increasing highest order K to answer question (a) raised at the end of section 4, whereas in section 5.4 we consider individual TM of specific orders k to answer question (b). Section 5.5 compares the results to the performance of LC.
Figure 4 (left) shows, for scenario 1a, the expected response curve (r(t), here the drawdown at the monitoring location) in the absence of any TM data (dashed-dotted line), enveloped by its uncertainty (lightest gray-shaded area). Using TM sequences of increasing highest order K then reveals many more details of the dynamic response, as illustrated by the shrinking shaded areas for increasing values of K. Obviously, a relatively large reduction of the compression error is achieved by the first two TM in a time interval around the mean response time (zeroth TM only) and the peak time (zeroth and first TM). The additional information gained when adding higher TM can hardly be seen in this type of visualization.
Figure 4 (right) shows the time-averaged compression error plotted against the highest TM order K for all four scenarios 1a through 2. The MC error, estimated by the jackknife method, is visualized by the gray-shaded areas, which represent ±2 standard deviations of the estimated values. For all cases, the compression error decreases strictly with increasing K, i.e., the longer the TM sequence considered. This is expected, since longer TM sequences carry more information about the underlying time series. The first two TM (the zeroth and first) convey more than ∼80% of the information in all four cases, whereas the second and third TM contribute another 10%. The remaining 10% of the information is distributed among an unquantifiable number of higher moments.
 Comparing the different geostatistical scenarios 1a through 2, we find the compression errors to be generally similar, with only slight differences (±3%). Although these differences appear small, they allow some meaningful insight into the driving physical processes: case 1c ranks comparatively best (its error curve lies below that of 1a) in the sense that the overall information is concentrated best in the lower-order TM. This is because case 1c involves less variability (a smaller log-variance) in the possible dynamic shapes and features of the drawdown curves than case 1a. With less variability in dynamic features, fewer units of information (a smaller number of TM) suffice to infer the actual shape of the dynamic response.
 The opposite behavior is observed for scenarios 1b and 2. They introduce more variability than case 1a (a higher log-variance for case 1b and an uncertain S for case 2), causing more variable dynamic features. Case 2 produces even more drastic dynamic features through a much stiffer system with less diffusive behavior. Hence, both scenarios require additional information, i.e., more TM, in order to achieve the same level of information.
 All of the above analyses may suffer to some extent from filter degeneracy, making our results for longer TM sequences slightly less reliable. On the basis of our bootstrap error estimate, however, we found that critical levels of filter degeneracy do not occur for TM sequence lengths below 10. By then, most of the information (>90%) has already been captured, so that the conclusions drawn above remain unaffected.
5.4. Individual Efficiency Analysis for TM
 In section 5.3, we analyzed the compression error of entire TM sequences up to highest order K. The final remaining question is: Does the order of TM given by the recursive character of equations (9) and (10) provide the most informative TM first (question (b) raised at the end of section 4)? To this end, we analyze the compression error individually for every TM of order k. The results are shown in Figure 5 (top).
 Obviously, for scenario 1a, the lowest-order TM are again the more informative ones: the compression error of individual TM increases steadily with increasing order k and eventually climbs to 90%. The same behavior can be observed for scenarios 1b through 2.
 For scenarios 1b and 2, higher-order TM convey more important information than for scenarios 1a and 1c, while they contribute much less in the cumulative analysis in section 5.3. This apparent inconsistency is explained by the fact that TM are not statistically independent: they are not orthogonal, convey partially redundant information, and their information content is not simply additive. Apparently, the more variable scenarios 1b and 2 produce more redundancy among the different TM.
Figure 5 (bottom) shows the time dependence of the compression error for TM of increasing order k using the example of scenario 1a. The curves indicate that each TM has specific time ranges in which it contributes the most information. For higher orders, the conveyed information is shifted to later times due to the increased leverage of higher-order monomials at later times. Thus, higher-order TM capture later-time features of the response curve. This is important in slug-like pumping test analyses, where even the late-time features of drawdown recovery still contribute valuable information for estimating transmissivity [Oliver, 1993; Wu et al., 2005; Zhu and Yeh, 2006]. Also, the late-time behavior of solute breakthrough curves is important to identify non-Fickian transport phenomena, e.g., in multirate mass transfer models [Haggerty and Gorelick, 1995; Luo et al., 2008].
 When working with TM of noisy time series measured in the field, e.g., in the context of inverse modeling, higher-order TM may be subject to large errors. Such errors can compromise their information content, requiring additional TM to compensate for the loss. This is, however, beyond the scope of our study, and further research is needed to address this issue.
5.5. Comparison to Laplace Case
 As described in section 4.4, LC are computationally attractive since they can be computed independently of each other. However, it is a priori unknown which parts of the spectrum, or precisely which LC, will be most informative. This raises the question of whether there exists a set of LC (and if so, which one) that is superior to TM in terms of information content.
 We mimic the lack of knowledge about the optimal choice of LC sets by randomly sampling from a large spectrum of potential Laplace variables u. We repeat this 500 times and measure the total normalized compression error for different sequence lengths K in each repetition. We use case 1a as the physical scenario.
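Computing one such randomly chosen LC set from a given response curve is straightforward. The Python sketch below is illustrative only: the response curve, the sampling range of u, and the set size K are our own assumptions, not the values of the actual experiment.

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.linspace(0.0, 50.0, 2000)
dt = t[1] - t[0]
r = t * np.exp(-t / 5.0)     # hypothetical drawdown-like response curve

def laplace_coeff(r, t, u):
    # LC(u) = int_0^inf r(t) exp(-u t) dt, approximated on the finite
    # time grid by a simple rectangle rule
    return float(np.sum(r * np.exp(-u * t)) * dt)

# one random set of K Laplace variables, mimicking the 500-set experiment
K = 4
u_set = rng.uniform(0.05, 5.0, size=K)
lc = np.array([laplace_coeff(r, t, u) for u in u_set])
```

For this particular r(t), the transform is known in closed form (the Laplace transform of t exp(-at) is 1/(u + a)^2), which makes the quadrature easy to verify.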
Figure 6 shows the resulting total normalized compression error for the ensemble of 500 sets (gray lines) and the ensemble mean (dashed line). For comparison, we include the results for the TM (dashed-dotted line) obtained in section 5.3. We observe that only a small fraction of the LC sets performs slightly better than the TM. The ensemble mean (the expected Laplace performance), however, performs considerably worse than the TM. As the optimal set of LC is unknown in practical applications, the possibly better performance of LC cannot be exploited.
 Whether or not this disadvantage of LC can be outweighed by the advantage of easier curve reconstruction will depend on the specific application context.
6. Summary and Conclusions
 This work is situated in the field of model reduction, where we build on the promising approach of TM. TM are designed to efficiently reduce dynamic models by projecting their temporal responses onto monomial base functions in time. This yields temporal characteristics with practical application areas ranging from field characterization and swift forward modeling to the characterization of flow and transport processes. Since the application of TM has been limited to lower orders in most past studies, and since TM are based on very simple monomial base functions, we pursued the following research questions: How many TM are required to properly capture the system's dynamic behavior, and would orthogonal or other base functions perform better in temporal model reduction?
 In answering both questions, we find the following conclusions most important:
 1. Only polynomial base functions and the Laplace transform effectively reduce models in time because all other possible base functions lead to a highly inconvenient coupled system of equations for the temporal characteristics.
 2. Temporal characteristics from arbitrary polynomial base functions, other than the monomials that lead to TM, have to be solved recursively in increasing leading order of the polynomials, just like TM.
 3. Any temporal characteristic based on arbitrary polynomial base functions other than monomials (including the case of orthogonal polynomials or cumulants) can be mimicked by the TM through linear recombination.
 4. These findings hold for all dynamic models based on sets of linear PDEs with an arbitrary number and degree of spatial and temporal derivatives. Hence, within this class of systems, there is no temporal model reduction based on other polynomial base functions that is better than TM.
 5. In an example from groundwater flow, the first two TM cover more than 80% of the information required to identify a drawdown curve. Considering up to four TM captures 90% or more of the overall information. The remaining 10% of the information is distributed among an unquantifiable number of higher moments. The lowest-order TM are always the most informative.
 6. The distribution of information content over time differs among the TM orders. Late-time behavior can mainly be inferred from higher orders. The relevance of higher-order TM has to be judged in light of any specific application task.
 7. This is far better than what we found for LC. One advantage of LC is that their equations are fully decoupled, such that arbitrary coefficients can be chosen in arbitrary order. This is, at the same time, their greatest disadvantage, because it is a priori unknown which ones are the most informative. Hence, it is close to impossible to pick an optimal set of LC that could compete with TM.
 8. These findings need to be weighed against the purpose of model reduction. For accurate curve reconstruction from many (10…100) temporal characteristics, the Laplace technique may be the better choice, because swift and accurate algorithms for the inverse Laplace transform exist. For fewer characteristics, however, inverse Laplace transformation is not satisfactory. In addition, the inverse Laplace transform cannot guarantee nonnegativity, which is often a physical requirement. Thus, we strongly encourage the use of TM. Also, in many cases, TM directly correspond to physical quantities of interest [Cirpka and Kitanidis, 2000b]. In inverse modeling, no curve reconstruction is necessary, and we generally suggest using TM.
 On the basis of these findings, we hope to encourage more studies to work with the concept of TM. Especially because the number of studies listed in section 1 that employ TM with real data is vanishingly small, improved tests on existing datasets [e.g., Liu et al., 2007] should be performed. Also, we hope to encourage those who have limited their TM applications to lower-order TM to consider longer moment sequences. Our results specifically suggest that hydraulic tomography studies under transient conditions should use TM up to the fourth order. This might alleviate the loss of accuracy used as an argument against TM by Yin and Illman. We agree with their argument that the temporal reduction of data to only a few TM entails a loss of information. However, there is no reason not to use TM up to higher orders than they did. In the context of inverse modeling via TM of noisy measured time series, the question of assigning meaningful measurement errors to higher-order TM poses a challenge for further research. We expect similar results for inverting tracer data based on TM. This hypothesis is supported in part by the study of Nowak and Cirpka, who showed that including the second TM of tracer breakthrough curves in geostatistical inversion leads to better results.
Appendix A: Derivation of the Moment-Generating Equation
 The Laplace transform (LT) of a response curve $r(t)$ is expressed as
$$\tilde{r}(u) = \int_0^\infty r(t)\, e^{-ut}\, \mathrm{d}t, \tag{A1}$$
where $u$ is the Laplace variable. Inserting the Taylor expansion of $e^{-ut}$ about $t = 0$,
$$e^{-ut} = \sum_{k=0}^{\infty} \frac{(-ut)^k}{k!},$$
into equation (A1) decomposes the spectral domain. The TM $m_k$ now occur as coefficients of the Taylor series and yield the so-called moment-generating equation [Van Kampen, 2007],
$$\tilde{r}(u) = \sum_{k=0}^{\infty} \frac{(-u)^k}{k!}\, m_k. \tag{A2}$$
Taking the $k$-th derivative of $\tilde{r}(u)$ with respect to $u$ at $u = 0$ yields the $k$-th TM,
$$m_k = (-1)^k \left.\frac{\mathrm{d}^k \tilde{r}(u)}{\mathrm{d}u^k}\right|_{u=0}.$$
 Applying the natural logarithm to both sides of equation (A2) and then performing another Taylor expansion (of the logarithm) about $u = 0$ yields the cumulants $\kappa_k$ [Van Kampen, 2007],
$$\ln \tilde{r}(u) = \ln m_0 + \sum_{k=1}^{\infty} \frac{(-u)^k}{k!}\, \kappa_k.$$
Cumulants can be expressed as nonlinear recombinations of equal- and lower-order normalized TM, e.g., $\kappa_1 = m_1/m_0$ and $\kappa_2 = m_2/m_0 - (m_1/m_0)^2$.
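This recombination is easy to verify numerically. For the gamma-shaped test curve $r(t) \propto t\,e^{-t/3}$ (shape 2, scale 3), the first three cumulants are analytically $\kappa_1 = 6$, $\kappa_2 = 18$, and $\kappa_3 = 108$; the curve itself is a hypothetical illustration, not one of the study's responses.

```python
import numpy as np

t = np.linspace(0.0, 60.0, 6000)
dt = t[1] - t[0]
r = t * np.exp(-t / 3.0)     # hypothetical gamma-shaped response curve

# raw temporal moments m_k and normalized moments mu_k = m_k / m_0
m = np.array([np.sum(t**k * r) * dt for k in range(4)])
mu = m / m[0]

# the first three cumulants as nonlinear recombinations of normalized TM
k1 = mu[1]                                          # mean arrival time
k2 = mu[2] - mu[1]**2                               # temporal variance
k3 = mu[3] - 3.0 * mu[1] * mu[2] + 2.0 * mu[1]**3   # third cumulant
```

The computed values reproduce the analytic cumulants up to quadrature and truncation error, confirming that cumulants add no information beyond the TM themselves.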
Appendix B: Orthogonal Polynomials
 An arbitrary orthogonal base function with time-independent coefficients (similar to equation (22)), meeting the required orthogonality constraints, would have to fulfill the following condition (according to equation (26)):
and so there is no real-valued nonzero base function for this case.
Acknowledgments
 The authors would like to thank the German Research Foundation (DFG) for financial support of the project within the Cluster of Excellence in Simulation Technology (EXC 310/1) at the University of Stuttgart.