LPMLEn–A Frequency Domain Method to Estimate Vertical Streambed Fluxes and Sediment Thermal Properties in Semi‐Infinite and Bounded Domains

This paper presents the LPMLEn, a new method to estimate vertical flux and thermal diffusivity from streambed temperature time series using the frequency domain. The main advantages of this new method are: (a) the use of multiple frequencies and multiple sensors for the parameter estimation; (b) noise/uncertainty handling in an optimal way; (c) the possibility to estimate the parameters with both semi‐infinite and bounded domain models; and (d) the compensation for temperature drifts in the data known as transients. The capabilities of the LPMLEn are demonstrated using both synthetic and field data, highlighting the advantages of the bounded domain model over the semi‐infinite domain model in the parameter estimation process.

apply the local polynomial (LP) method to estimate uncertainties and reduce leakage. Where Sohn and Harris (2021) as well as Vandersteen et al. (2015) considered the subsurface as semi-infinite, Schneidewind et al. (2016) developed an early bounded (finite) domain model with the idea to estimate vertical flux for distinct streambed sections. This early bounded model (LPMLE3) utilized information from three (noisy) sensors, two for the input (top and bottom boundaries) and the third for the output.
Here, we introduce LPMLEn, which builds upon works from Vandersteen et al. (2015) and Schneidewind et al. (2016) by extending the bounded LPMLE3 method to include temperature data from n sensors in the parameter estimation process and by considering noisy input and output data for both semi-infinite and bounded domains. The possibility to use n sensors better constrains parameter uncertainty whereas the lower boundary condition better insulates the parameter estimation process from spatial and temporal heterogeneities outside the probed domain. This is especially helpful when flux and thermal diffusivity are estimated within small vertical subsections of the streambed to detect streambed heterogeneity. The advantage of using n sensors within bounded domains is beneficial in the analysis of flux within heterogenous sediment when data are collected with temperature probes containing many sensors such as multilevel temperature lances (Munz et al., 2016) or FO-DTS (Selker et al., 2006).

The LPMLEn Method
The LPMLEn combines the LP method with an MLE to estimate 1D vertical streambed fluxes and thermal diffusivities using data from n temperature sensors. It operates in the frequency domain and can use multiple frequencies and sensors simultaneously for the parameter estimation process. The LP method is applied to reduce leakage (transients) and estimate the noise levels at the selected frequency components of the measured temperature signals. The LPMLEn is provided here with two models: (a) the semi-infinite domain model where only an upper temperature boundary condition is specified to estimate the parameters and (b) a bounded (finite) domain model where an additional lower local temperature boundary condition is assigned to estimate the parameters for a distinct section of the streambed.
Like other 1D models, LPMLEn assumes local thermal equilibrium (or a slow-moving equilibrium without significant change of diffusivity and vertical flux), steady water flow and constant thermal parameters over the model domain. While it provides information on the vertical flux, we would like to remind potential users that in natural settings water flow has vertical and nonvertical flow components (horizontal or lateral). While flowlines at the center of a stream are formed in such a way that they often indicate predominantly vertical flow (Cuthbert & Mackay, 2013;Shanafield et al., 2010), it has been shown that nonvertical flow components can often become significant closer to streambanks or in areas of high hyporheic flow (Lautz, 2010;Reeves & Hatch, 2016;Roshan et al., 2014). Nonvertical flow does not invalidate 1D heat transport estimates, however curvature in the flow path may, depending on the degree of curvature (Cuthbert & Mackay, 2013). Areas with high curvature in flow paths warrant the use of more complex and data-intensive 3D heat transport models (Ghysels et al., 2021;Karan et al., 2014).
Coupled vertical (1D) water flow and heat transport defined after Stallman (1965) is given by the advection-diffusion equation with [ML −1 T −2 Θ −1 ] as the volumetric heat capacity of the water-sediment mix, [ML −1 T −2 Θ −1 ] the volumetric heat capacity of water, and [LT −1 ] as the vertical flux (a positive value means downward flow). The parameter [L 2 T −1 ] represents the effective thermal diffusivity and can be described by where [ML T −3 Θ −1 ] is the bulk thermal conductivity and is a function based on and the thermal dispersivity [L]. Here, we use the LPMLEn to estimate without prior knowledge of either thermal conductivity or dispersivity. If all thermal parameters in (3) were known (e.g., through prior field or lab measurements), the LPMLEn could also be used to only estimate the Darcy flux, but this is discouraged because and are coupled. However, this prior knowledge could be used to validate the estimation method and model choice (Luce et al., 2013).
As the solution to (1) is determined by its boundary conditions, we first cover the analytical solution for two common choices, (a) the semi-infinite domain and (b) the bounded domain with Dirichlet (temperature specified) boundary conditions. As we provide the analytical solutions in the frequency domain, but the measured data are time series, a transformation to the frequency domain and additional processing are required, for which we use the Fast Fourier Transform (FFT) and the LP method, respectively. The unknown parameters are then estimated using the MLE considering multiple frequencies, n sensors and their uncertainty (the MLEn part).

The Models and Their Analytical Solutions
For well-posedness, i.e., to obtain a unique solution to (1), two boundary conditions are required. The two choices commonly applied are: (a) the semi-infinite domain where the upper boundary condition at depth is given by the function ( ) and the lower boundary condition is set to approach zero at infinity (Hatch et al., 2006;Luce et al., 2013;Sohn & Harris, 2021;Stallman, 1965;Vandersteen et al., 2015) and (b) the bounded domain where the upper boundary condition at depth is given by the function ( ) and the lower boundary at depth by the function ( ) . The semi-infinite domain model  is given by and the bounded domain model  by Using a transfer function notation, the solution to both models is given according to van Berkel et al. (2014a) and van Berkel et al. (2013) by where Θ( ) is the Laplace transform of ( ) , while ( ) and ( ) are the Laplace transforms of the functions describing the upper and lower boundary conditions, respectively. The transfer functions ( ) and ( ) for the semi-infinite domain model  are given by and for the bounded (finite) domain model  by with ] as a simplification of the parameters for the eigenvalues, described in detail in van Berkel et al. (2014a). The aim is to estimate = [ ] , which then can be transformed back to and . However, to estimate the unknown parameters, the model needs to be linked to the temperature time series. In practice, the Laplace variable can only be measured on the imaginary axis, thus = , where = √ −1 and is the angular frequency. Any measured temperature time series can be transformed to the frequency domain using the FFT such that it is a function of . However, the measured streambed temperature data are commonly not perfectly periodic due to natural conditions such as temperature fluctuations that are slower than the measurement period, introducing spectral leakage (Pintelon & Schoukens, 2012). Furthermore, the measured temperatures can be subject to transients due to the initial condition, sensor drift and noise. To reduce spectral leakage and remove the other unwanted contributions to the temperature signal, we apply the LP method to assess and improve the quality of the data set.

Processing the Data Set With the LP Method
A standard method to reduce spectral leakage is windowing, which is known to introduce systematic (bias) and random (noise leakage) errors when transforming data to the frequency domain (Pintelon & Schoukens, 2012). The LP method (Pintelon et al., 2010a(Pintelon et al., , 2010b outperforms windowing techniques by assuming linear relationships (transfer functions) between a given noiseless reference signal and the measured response, i.e., the LP considers that the signal originates from a system rather than it being independent. The LP method splits the measured signal into a forced response, (circular complex normally distributed) noise, and transients (i.e., the signal part that cannot be explained by the reference signal). Similar to Vandersteen et al. (2015) and Schneidewind et al. (2016), we use the temperature measurement Θ ( 1, ) at the top of the streambed as the "noiseless" reference signal, as this signal contains the largest temperature perturbation and as such is least subjected to measurement noise. As a result, the LP returns the (estimated) forced response Θ ( , ) and the corresponding covariance matrix Θ( ) for the remaining = 2, …, sensors, which will be used for the parameter estimation process. This is in contrast to Vandersteen et al. (2015) where the "noiseless" reference signal is included in the estimation of and . Furthermore, the obtained covariance matrix allows us to assess the quality of the data set and to take the measurement uncertainty consistently into account during parameter estimation performed subsequently using the MLE.

The n-Point MLE
The aim of the MLE is to find those parameters ̂= [̂̂] for which the output of the selected model (in our case  or  ) is the most likely to generate the measured temperatures. An MLE has been used previously by van Berkel et al. (2013van Berkel et al. ( , 2014a to estimate the thermal transport coefficients considering slab geometries on a semi-infinite domain with two sensors (MLE2) and a bounded domain with three sensors (MLE3). They were later incorporated into the LPML (Vandersteen et al., 2015) and the LPMLE3 , respectively. A cylindrical representation of the MLE3 has also been put forward (van Berkel et al., 2019).
In a similar fashion, for the LPMLEn, we construct the log-likelihood cost function  ( ) that considers the uncertain boundary inputs and as in Schneidewind et al. (2016) and determines the likelihood for the n-outputs , which are the remnant measurements of the system. For the sake of clarity, we define the new variable ( ) = [ ( ) , ( ) , ( ) ] that contains both the inputs (boundary conditions) and the outputs (other temperature measurements) and has the corresponding covariance matrix . Thus, for and for  : ] .
Using this general notation, the cost function for the log-maximum likelihood (ML) is given as in Pintelon and Schoukens (2012) by with the Hermitian transpose , the angular frequencies that contain relevant information , ∈ , which contains a total number of frequencies. The error is then given by with its corresponding error covariance matrix where denotes the identity matrix of size × . The transfer function vectors and are given by where , = 1, …, are the depths of the sensors considered as output of the model. Note that the cost function is constructed from the sum of squares of the error that is normalized by its (co)variance. Each of the × equations in the cost function will cause a 2 distribution with × degrees of freedom. Under the assumption that we have no modeling error, the expected value of the cost function is × − 2 2 . Here, the correction factor 2 2 originates from the two real parameters that are estimated.
All things considered, we have now defined the complete set of noise and transfer functions belonging to the n-point estimator with constant transport coefficients. Note that the LPML in Vandersteen et al. (2015) results in a much simpler form of (15) and (16) , 0] and that in the LPMLE3 in Schneidewind et al. (2016) the matrix products were written out considering only one of n sensors resulting in scalar expressions for (15) and (16), which were independently derived at the time.
As with the LPML and LPMLE3, the parameters are estimated by minimizing the cost function  The cost function  is nonconvex and is therefore often optimized using nonlinear least squares minimization techniques such as Gauss-Newton or Levenberg-Marquardt methods (Fletcher, 1980;Levenberg, 1944;Marquardt, 1963). As the number of parameters is limited and the analytical transfer functions are known, the analytical Jacobian is determined and used to improve computational efficiency. Furthermore, the analytical Jacobian is used to determine the variance of the estimated parameters following: with its Jacobian where 1∕2 is a square root of the positive definite matrix , i.e., = . To obtain ̂′ , ̂ is transformed viâ where the uncertainty of the variable ̂′ is obtained using propagation of uncertainty, i.e., 10.1029/2021WR030886 6 of 13 with the Jacobian of the transformation given by Hence, we can estimate the most likely vertical flux and thermal diffusivity including the corresponding covariance matrix for both parameters.

Application of the LPMLEn
In this section, we estimate the thermal diffusivity and vertical flux by applying the LPMLEn on three synthetic and one experimental data set using both, the semi-infinite domain model  and the bounded domain model  . The utilization of synthetic data in 1D heat transport modeling to test methodological limitations and better understand the modeling process is a common approach also used by others (Glose et al., 2019(Glose et al., , 2021Irvine et al., 2020;Lautz, 2010).
Here, the synthetic data sets are used to show that the semi-infinite domain model can estimate the flow in the wrong direction even though thermal diffusivity and Darcy flow are constant on the domain where the measurements are taken. Moreover, the semi-infinite domain can be seen as a special case of the bounded domain, i.e., on a semi-infinite domain, the bounded domain will still be exact while the inverse is not true. This gives the LMPLEn the unique capability to diagnose whether the use of a semi-infinite domain approach is justified for a given data set and whether the estimated parameters can be deemed accurate. This is demonstrated by applying the LPMLEn on an experimental data set and comparing the estimation results and model fits.

Synthetic Data Sets
We apply the LPMLEn with both models on four synthetic data sets. The first two synthetic data sets consist of five temperature signals at depths = [0.15, 0.17, 0.20, 0.25, 0.35] m, generated by evaluating (6) with the transfer functions for the semi-infinite domain model (7), (8) resulting in data set I, and the bounded domain model (9), (10) resulting in data set II. The used input signals are taken from an experimental data set. Here, the lower boundary condition for data set II (bounded domain) does not originate from a semi-infinite domain model. Stated differently, the temperature signal at the lower boundary condition contains information that cannot be explained by the semi-infinite domain model, e.g., by a contribution of hyporheic flow. For the sake of clarity, we did not add noise to these data sets and used the identity matrix as the covariance matrix. More information about the generation of the synthetic data sets I and II is presented in the in Supporting Information S1. The simulated and estimated transport parameters for both data sets are shown in Table 1. For completeness, the model fits are presented in Figure S2 in Supporting Information S1. The estimation results show that if the underlying physics-based model is the semi-infinite domain model (data set I), both models will estimate the same transport parameters. On the contrary, for data set II where the lower boundary condition contributes to the solution, the flux estimate obtained with the semi-infinite domain model is in the wrong direction.
Data sets I and II are based on constant parameters over the entire model domain. For the semi-infinite domain, this means that diffusivity and flux need to be constant up to infinity and deviations from this assumption can profoundly affect the estimates. We show this by generating synthetic data set III where the diffusivity is changed from 1.5 10 −6 m 2 s −1 to 0.8 10 −6 m 2 s −1 at = 0.275 m. The data set is generated using (6) with a numerical approximation of the transfer functions using a central finite difference scheme in MATLAB with 10,001 points. The upper boundary condition at = 0 m is taken from an experimental data set and the lower boundary condition at = 10 m is set to zero to mimic a semi-infinite domain. Details about the generation of this data set can be found in Supporting Information S1 and our MATLAB model is similar to a numerical model presented in Das et al. (2019). Furthermore, the temperature is known, i.e., measured, every 0.05 m from 0.1 to 0.5 m.
A single set of transport parameters is estimated at each depth using three neighboring sensors. The estimation results are plotted in Figure 1 at the middle sensor location, e.g., at 0.15 m, the estimated transport parameters are shown for the estimate using = [0.10, 0.15, 0.20] m. In Figure 1, we clearly see that in shallower depths, In addition to synthetic data set III and Figure 1, Figure S3 in Supporting Information S1 contains the estimation results of the diffusivity change in the opposite direction (data set IV), i.e. from 0.8 ⋅ 10 −6 to 1.5 ⋅ 10 −6 m 2 s −1 .
The results are similar to Figure 1, but now with the semi-infinite domain model overestimating the transport parameters instead of underestimating them.
In conclusion, we have shown two scenarios where the LPMLEn can help us detect when estimates of the semi-infinite domain approach are inaccurate compared to using a bounded domain model. Additionally, the LPMLEn with the bounded domain model might prove extremely helpful analyzing the vertical flux of distinct small (local) domains with variations in streambed sediment taking multiple sensors into account, especially for temperature data with very high vertical spatial resolution (e.g., in the cm range) as is typically the case for data collected with FO-DTS where the fiber-optic cable is coiled around a PVC core (Briggs et al., 2012;Folegot et al., 2018;Vogt et al., 2010).

Experimental Data Set
Here, the LPMLEn is demonstrated on a 90-day temperature time series (location ML1) from the Slootbeek, a small tributary to the Aa River, Belgium. The field site and flow system of the River Aa and Slootbeek have been described in detail in previous studies (Anibas et al., , 2018Ghysels et al., 2021). Temperature was measured using a multilevel temperature lance (UIT, Dresden, Germany) from 17 February to 12 May 2012 every 10 min at the streambed top and six additional depths (see Figures 2a and 2b). To determine following (22), we used = 4.18 × 10 6 Jm −3 K −1 and = 3.07 × 10 6 Jm −3 K −1 . The value for is based on previous research from Vandersteen et al. (2015) and closely represents a sandy loam (Ren et al., 2000;Stonestrom & Constantz, 2003).
Similar to Vandersteen et al. (2015), we analyze the experimental data set using a rectangular 10-day window. To reduce the computational load during parameter estimation, only the relevant or informative frequencies should be included in the analysis. To select the relevant frequencies, the data set is transformed to the frequency domain using the FFT and then processed using the LP method with sensor 1 as noiseless reference signal. As the LP method provides the covariance matrix of the processed Fourier coefficients for the other sensors, the ratio between the amplitude squared and the variance of the Fourier coefficients, i.e., signal-to-noise ratio (SNR), shown in Figure 2c for one 10-day window, can be used to select the informative frequencies for the parameter estimation process. Similar to Vandersteen et al. (2015), we choose to include all frequencies up to 1.5 d −1 . While selecting the relevant frequencies and sensors for the parameter estimation is ultimately the choice of the modeler this selection will influence the parameter estimates. However, including noninformative frequencies in the parameter estimation process will have little impact on the estimates because the MLEn uses the error (co) variances as weighting, hence signals with a low SNR have a smaller contribution to the solution. They will influence (increase) the expected value of the cost function and delay the computation. On the other hand, discarding informative frequencies will influence the estimation depending on the (co)variance.
Comparatively, excluding a sensor has a more significant effect on the estimates than excluding noninformative frequencies as it also changes the domain on which the parameters are estimated. In Vandersteen et al. (2015), the parameters are estimated using sensors 1-7, while we here estimate the parameters using sensors 2-6 as sensor 1 is used as noiseless reference signal for the LP method and sensor 7 is excluded due to the low SNR at some intervals. In addition, our bounded domain model  uses the bottom sensor as boundary condition, i.e., = 3 , whereas the semi-infinite domain model  considers this sensor as an output ( = 4) . Consequently, the semi-infinite domain model is fitted over one additional sensor depth. For this reason, we also estimate the parameters with the semi-infinite domain model using sensors 2-5, such that the number of outputs between semi-infinite and bounded domain models is the same ( = 3) . However, note that this estimate is then based on a reduced data set compared to the bounded domain model. The diffusivity, vertical flux, and the (normalized) Figure 2. (a) Multilevel temperature lance used to collect data, (b) temperature data from the Slootbeek at ML1 (both modified from Vandersteen et al., 2015), (c) example of signal-to-noise ratio (SNR) plot for one 10-day window after using the local polynomial method. It can be used to select informative frequencies. We chose to include all frequencies up to 1.5 d −1 , indicated by the vertical dashed line.
cost function value are obtained by moving the 10-day window along the data set ( Figure 3). As the number of equations is much larger than the correction factor 2 2 , the expected value  ≈ 1 if there are no modeling errors. Hence, the difference with this expected value reflects the modeling error.
From Figure 3c, we see that, in general, the bounded domain model has the smallest modeling error. Moreover, while on some intervals, estimates with the semi-infinite and bounded domain models agree, at other intervals the flow is estimated in opposite directions (Figure 3b, e.g., between day 10 and day 20). As shown with the synthetic data set, if both models agree in their estimates, the estimates can be deemed trustworthy. However, in case the two models do not agree, the bounded domain model is more likely to estimate the correct flow. To verify this, we can look at how the models fit the data. For clarity, the model fits at 1 = 9.2569 and 2 = 17.2153 are shown in Figure 4 without the error bars on the measured data, while a figure with error bars is shown in Figure S4 in Supporting Information S1. In these figures, we see that at 1 all models fit the data very similarly, while at 2 the bounded domain model describes the phase behavior better than the semi-infinite domain models do.
Additional evidence is provided by looking at the joint variation of the estimated diffusivity and flux from Figures 3a and 3b, presented in Figure 5. As mentioned previously, we estimated the thermal diffusivity instead of the bulk thermal conductivity as there is a physical basis to expect variation of the diffusivity with flux (3), namely, hydraulic dispersion. Two common notations of the thermal dispersivity are given by de Marsily (1986) and Rau et al. (2012)  The estimates originating from the bounded domain model  follow that pattern much more strongly than the semi-infinite domain model  , suggesting that some of the variations estimated by the semi-infinite domain model are spurious. A linear least squares fit of (25) and (26) on the bounded domain estimates is shown along the estimates in Figure 5. As both fits do not perfectly fit the data, further research is necessary to establish more insight into the joint variation of the estimated parameters. However, one should note that the observed variations are not the result of small variations within the data set, but originate from larger trends as small variations, e.g., estimates originating from the next and previous 10-day windows, would be close to each other (within each others 95% confidence ellipses, see Figure S5 in Supporting Information S1).
Combining the knowledge from the synthetic data set, the model fits and the variation of the diffusivity with flux, indicates to us that the semi-infinite domain model estimates the flow in the wrong direction and that the estimates originating from the bounded domain model are more trustworthy. However, as with any model no absolute guarantee exists that all underlying physical processes in our natural system are described sufficiently well.

Conclusion
The LPMLEn extends previous frequency domain approaches to estimate vertical streambed fluxes and thermal diffusivities by including temperature data from n sensors in the multifrequency parameter estimation process.  It contains the commonly used semi-infinite domain model, and a more robust and physically based bounded (finite) domain model. Processing the data with the LP method and systematically taking the measurement uncertainty into account by minimizing the (log) ML cost function result in an estimate of the parameters and their corresponding uncertainties.
The application of the two models to the same data set is crucial in verifying whether the use of the semi-infinite domain approach is justified and the estimated diffusivity and Darcy flux can be deemed accurate. The semi-infinite domain model is a special case of the more general, bounded (finite) domain model; hence, the semi-infinite domain model is more likely to estimate the flux erroneously whereas the bounded domain model has a higher likelihood to estimate the direction of the flux correctly as is demonstrated using synthetic data sets as well as field measurements. In case estimates obtained with both model types agree, model fits and cost function values are also very similar. Where the estimated parameters do not agree, the bounded domain model fits the data better and has a much lower cost function value.
As such, the LPMLEn with the bounded model can be used to verify assumptions made for the semi-infinite domain model. Additionally, if the data set allows for it, the bounded domain model can easily be used to estimate parameters for distinct vertical streambed sections, which might shed light on the contribution of shallow hyporheic flux and deeper groundwater upwelling to the overall flux estimate in future studies.

Data Availability Statement
For the moment, LPMLEn is provided in MATLAB but future improvements will include a Python-based GUI that allows the user to compare various 1D temperature models and choose specific windowing techniques. The MATLAB version (van Kampen et al., 2022) is available at HydroShare: http://www.hydroshare.org/ resource/3b13760174174c31988120baeb84e2e8. and effort to review our manuscript and suggest invaluable improvements. This work has been carried out within the framework of the EUROfusion Consortium, funded by the European Union via the Euratom Research and Training Programme (Grant 101052200-EUROfusion). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the European Commission can be held responsible for them. DIFFER is part of the Institutes Organization of NWO. U.S. was supported by a fellowship from the German Research Foundations (403826296). The views and opinions expressed herein do not necessarily reflect those of the German Research Foundation. The authors declare no competing interest.