Choice of Pedotransfer Functions Matters when Simulating Soil Water Balance Fluxes

Modeling of the land surface water‐, energy‐, and carbon balance provides insight into the behavior of the Earth System, under current and future conditions. Currently, there exists a substantial variability between model outputs, for a range of model types, whereby differences between model input parameters could be an important reason. For large‐scale land surface, hydrological, and crop models, soil hydraulic properties (SHP) are required as inputs, which are estimated from pedotransfer functions (PTFs). To analyze the functional sensitivity of widely used PTFs, the water fluxes for different scenarios using HYDRUS‐1D were simulated and predictions compared. The results showed that using different PTFs causes substantial variability in predicted fluxes. In addition, an in‐depth analysis of the soil SHPs and derived soil characteristics was performed to analyze why the SHPs estimated from the different PTFs cause the model to behave differently.

It has been shown that PTFs are highly accurate for the area (or the input data range) they were developed for, but have limited accuracy if applied outside these regions (Vereecken et al., 2010). Several reviews about the accuracy and reliability of PTFs for the van Genuchten model (VGM) have already been published (e.g., Donatelli et al., 2004;Schaap, 2004;Wösten et al., 2001). Hereby, the predicted hydraulic function from the PTFs were compared to the measured data and the goodness of fit of the prediction was evaluated. The authors used two metrics to determine the performance of the PTF: 1) the term accuracy was related to the comparison between predicted and measured values of water content or hydraulic conductivity that were used to develop the PTF; 2) reliability was related to the evaluation of PTFs on measured values that were different from those that were used to develop the PTFs (Wösten et al., 2001;Zhang et al., 2020). Reliability studies are typically validation studies such as those performed by Tietje and Tapkenhinrichs (1993), Wösten et al. (2001), and Wagner et al. (2001). Despite much progress in developing PTFs and in identifying appropriate PTF predictor candidates, some unresolved or unexplained variability still exists at the level of the soil sample (Schaap & Leij, 1998), which plays and important role when functional aspects of soils are being studied and analyzed using numerical models (e.g., Christiaens & Feyen, 2001). Functional aspects already studied are the impact of PTFs on water supply capacity (Vereecken et al., 1992), ground water recharge (Vereecken et al., 1992), and aeration (Wösten et al., 2001). In the study of Vereecken et al. (1992), the authors showed that 90% of the variation in the predicted soil water supply was attributed to estimation errors in hydraulic properties using the PTFs developed by Vereecken et al. (1989Vereecken et al. ( , 1990. Chirico et al. (2010), on the other hand, evaluated the effect of PTF prediction uncertainty on the components of the soil water balance at the hillslope scale. One major result was that the simulated evaporation was much more affected by the PTF model error than by errors resulting from uncertainty in the input data (e.g., soil texture).
Land surface models (LSMs), when embedded in numerical weather prediction or climate models, generally operate at large scales (regional, continental to global scales) and rely on PTFs to predict the hydraulic functions needed to solve the Richards equation for the water flow. Different LSMs use different PTFs for this purpose (Vereecken et al., 2019). As Vereecken et al. (2019) showed, not only different PTFs but also different hydraulic models (Campbell, Brooks and Corey, or Mualem-van Genuchten) are in use. Knowing that different PTFs and/or the choice of the hydraulic model will impact the outcome of the water flow simulations (e.g., Guber et al., 2006;Yakirevich et al., 2013), a key question is how recently launched LSM inter-comparison activities of the Land Surface Schemes (LSSs) embedded in LSMs, such as those under the Global Energy and Water Exchanges (GEWEX) GLASS project (https://www.gewex.org/panels/global-landatmosphere-system-study-panel/) or model inter-comparisons initiated by the World Climate Research Program (https://www.wcrp-climate.org/wgcm-cmip/wgcm-cmip6, of which GEWEX is part), such as CMIP6 and its predecessors, will be impacted by the choice of PTF.
Therefore, the aim of this study is to systematically analyze the functional sensitivity to the choice of different PTFs using a physically based numerical model. As the "truth" of this model exercise is unknown, the performance of the model runs with its hydraulic parameters derived from a set of individual PTFs will be evaluated against the ensemble mean as best predictor, as well as against the 70 and 90 tolerance intervals of the ensemble range. The numerical exercise is structured in the following way: 1) model runs for homogeneous soil profiles without vegetation, 2) homogeneous soil profile covered with grass and wheat, 3) layered bare soil, 4) layered vegetated soil (grass and wheat), and 5) influence of a fluctuating water table in a layered grass vegetated soil. Finally, additional soil physical properties were calculated based on the estimated soil hydraulic parameters obtain from the PTFs, which were used to explain the differences observed in simulated water fluxes. As some LSMs also use class PTFs (van Looy et al., 2017;Vereecken et al., 2019), we will also analyze the use of this type of PTF, and the associated errors when simulating water fluxes. We formulate three hypotheses 1) the use of different PTFs will lead to systematically different hydrological states and fluxes (e.g., net infiltration, evapotranspiration, root zone water availability, drainage), 2) some PTFs can be identified which perform distinctively differently from the ensemble spread in terms of 90% tolerance interval outliers, and 3) the differences in predicted states and fluxes simulated with inputs from different PTFs will be reduced with increasing model setup complexity.

Hydraulic Functions
Three pairs of hydraulic functions are widely used in hydrological modeling, namely those developed by Brooks and Corey (1964), Campbell (1974), and van Genuchten (1980). The Brooks and Corey (BC) (1964) water retention function is given by: where α is the reciprocal of the air entry value (or bubbling pressure) (cm −1 ), n is a dimensionless shape parameter (−) (related to 1/b for the original Brooks-Corey b parameter), h is the pressure head (cm), and S e is the effective saturation (−) given by: where θ is the actual water content (cm 3 cm −3 ), θ r is the residual water content (cm 3 cm −3 ), and θ s is the saturated water content (cm 3 cm −3 ).
The unsaturated hydraulic conductivity is given by: where K s is the saturated hydraulic conductivity (cm d −1 ).
The Campbell (1974) water retention function is a modification of that introduced by Brooks and Corey (1964), with θ r set to 0.
The Mualem van Genuchten function (MvG) (van Genuchten, 1980) is given by: where m is a shape factor related to n by m = 1−1/n. For the VGM model, the unsaturated hydraulic conductivity is calculated by: where λ is the tortuosity factor [−].

Pedotransfer Functions
For this study 13 pedotransfer functions (PTFs) were used, whereby eight predict the hydraulic parameters for the MvG (van Genuchten, 1980) and five PTFs predict the parameters in the BC (Brooks & Corey, 1964) or Campbell (1974) functions. Out of these 13 PTFs, four were so called class-transfer functions, where only the USDA textural classes will be used as input for the prediction of the hydraulic parameters. It has to be noted that the PTF of Clapp and Hornberger (1978) (from here on Clapp&Hornberger) does not specify hydraulic parameters for the silt class. All other PTFs use textural information (gravimetric percentage of sand, silt, and clay) as basic inputs. Additionally, some PTFs require information about BD such as the PTFs of Schaap et al. (2001) (here referred to as Rosetta SSC + BD), Wösten et al. (1999) (here Woesten), Weynants et al. (2009) andWeihermüller et al. (2017) (here Weynants), and that of Tóth et al. (2015) for the topsoil (here "Toth continuous"). Others need information about the organic carbon content (C org ), which are the Woesten, Weynants, and "Toth continuous" PTFs. Here, it has to be noted that an updated version of Rosetta (Rosetta3) is also available (Zhang & Schaap, 2017), which provides more accurate soil hydraulic parameters compared with the estimation from the original Rosetta model. Nevertheless, we decided to use the older Rosetta version, as it is widely in use and also imbedded in some hydrological software such as HYDRUS .
Soil organic carbon is used as a predictor for the PTFs as it affects soil BD, hydraulic conductivity, and water retention because of its effect on soil structure and adsorption properties (van Genuchten & Pachepsky, 2011).
Total porosity ϕ, as estimated from BD, is used only by Rawls and Brakensiek (1985) (here Rawls MvG), and pH and cation exchange capacity (CEC) are inputs in "Toth continuous." An overview of all used PTFs, their abbreviations, and their inputs is provided in Table 1. The region from where data were taken to train the PTF are from either the USA or Europe. Rosetta is the only PTF combining two data regions, whereas Weynants PTF is based on samples from Belgium only. In addition, the number of samples used for PTF development greatly differs, ranging from 5320 for Rawls PTFs to 166 for Weynants. Important for the PTF development is the data used to generate the PTFs, whereby either only retention data (θ (h) ) or a combination of retention (θ (h) ) and unsaturated hydraulic conductivity (K (h) ) data was used. θ (h) and K (h) were used in the development of the PTFs Rosetta, Woesten, Weynants, and Toth, whereby the percentage of available K (h) data is typically low compared to the availability of θ (h) data, generally due to the more complex and laborious procedures required to determine K (h) . Even though in some cases both types of data (θ (h) and K (h) ) were used in the development of some PTFs, the data were not jointly inverted to estimate the hydraulic parameters, meaning that Rosetta, Woesten, and Toth fitted the hydraulic parameters sorely on the retention curve and used the fitted α and n values of the Mulaem van Genuchten equation to predict K (h) . In contrast, Weynants used joint inversion of both hydraulic characteristics (θ (h) and K (h) ) simultaneously to estimate the parameters including a near saturation hydraulic conductivity K s * at a predefined pressure head of −6 cm. All other PTFs either used the closed form expression of van Genuchten (1980) or Brooks and Corey (1964) to predict K (h) , using the estimated parameters from the retention data, together with measured K s values, to estimate the unsaturated hydraulic conductivity based on either van Genuchten (1980) or Brooks and Corey (1964).

of Used Pedotransfer Functions (PTFs) With Input Parameters
In this study, we will compare model simulations for 12 soil textural classes. For the estimation of hydraulic parameters from texture based continuous PTFs a representative soil texture was used for each soil class located in the center of the respective class area in the textural triangle; bulk density and C org were set to 1.4 g cm −3 and 1%, respectively. The texture of the corresponding class is depicted in Figure 1 and the predicted hydraulic parameters for all applied PTFs are listed in Annex Table 1 and Annex Table 2.
In general, it is known that relatively small changes in the shape of the soil water retention curve near saturation can significantly affect the results of numerical simulations of water flow for variably saturated soils, including the performance of the numerical stability and rate of convergence (Schaap & van Genuchten, 2006;Vogel et al., 2001). To address this problem, especially in fine textured soils, the estimated air entry value (i.e., the reciprocal of α) from the PTF for the van Genuchten formulation (Equation 4) was set to −2 cm as proposed by Vogel et al. (2001), whenever the originally proposed set of hydraulic properties from the PTF did not lead to numerical convergence.

Numerical Modeling
For the simulation of vertical water flow, the one-dimensional Richards equation (Equation 6) was solved using the finite element code HY-DRUS-1D : where z represents the vertical coordinate (cm), positive in the downward direction, Q is the source/sink term, θ is the volumetric water content (cm 3 cm −3 ), and K (θ) is the unsaturated hydraulic conductivity (cm d −1 ) as a function of water content.
WEIHERMÜLLER ET AL.  Wesseling (1991) A 200 cm soil profile was simulated, and the lower boundary condition of the flow domain was defined as free drainage, which is typically used when the ground water table is far below the soil surface. As a second option, a fluctuating ground water where A s is the amplitude of the sine curve, which is defined as the maximum displacement of the function from its center position, and shows the height of the curve (fluctuation) (here, A s = 100 cm), 2π is the natural period of the sine curve, f is the frequency (here, f = 1/365 days −1 ), and t is the time period of the sine curve (here, t = 1-10,988 days).
The upper boundary condition in HYDRUS was set to an atmospheric boundary with surface runoff. The domain was non-equally discretized with 401 nodes, with finer discretization at the top to account for the stronger flow dynamics close to the soil surface. Pressure head was used for initialization of the soil profile with linearly decreasing potentials between the bottom (0 cm) and the top (200 cm) node (i.e., hydrostatic equilibrium).
For the simulations, different setups were chosen with varying complexity. A simple homogeneous soil profile without vegetation was selected as the simplest case, to study the impact of the choice of different PTFs. Complexity was increased by adding different vegetation covers (grass and wheat). In both cases, growth was not simulated and both crops covered the soil throughout the entire year. Potential evapotranspiration, ET 0 , was split into soil evaporation, E, and transpiration, T, by setting T to 75% of ET 0 . Also, rooting depth was assumed to be the same (0-30 cm) for both vegetation covers with a linear decrease in root density from the top soil layer to the maximum rooting depth. The root water uptake reduction model of Feddes et al. (1974) was used, based on the parameter values of Wesseling (1991) for both grass and wheat vegetation, as taken from the HYDRUS embedded look up table (see Table 2). Therefore, the only difference between both vegetation scenarios was the root water uptake. This simplification in terms of growing season and rooting depth was done to simplify the comparison of the simulation results, by ensuring that root water uptake will not be from different soil layers when grass is replaced by wheat. In a next step of increasing complexity, soil layering was introduced, whereby two layering schemes were assumed. 1) Sandy loam over silt loam overlaying a loamy sand and 2) silt loam over silty clay loam overlaying a silty clay, respectively. For the layered profiles the first layer was set to extend from 0 to 50 cm, the second layer from 50 to 100 cm, whereas the third layer occupied the rest of the profile (100-200 cm). Again, the same vegetation parameters as for the homogeneous soil were used. Finally, the layered system covered by wheat with a fluctuating groundwater table was simulated. Figure 2 shows a schematic view of the seven model scenarios used in this study.
WEIHERMÜLLER ET AL.  Thirty years (10,988 days) of daily climatic data (comprising precipitation and Penmen-Monteith potential ET) from 1982 to 2011 were taken from North Rhine-Westphalia, Germany (mean NRW climatic data) as used by Hoffmann et al. (2016) and Kuhnert et al. (2017). The climate is humid temperate.

Statistics and Data Evaluation
The HYDRUS-1D outputs were used to analyze the fluxes of the soil water balance, i.e. actual evaporation (E a ), actual evapotranspiration (ET a ), drainage (D), and runoff (R). For the data of the 13 different PTFs the cumulative flux was selected and the arithmetic mean, 0.05 and 0.95 percentiles (which spans the 90% tolerance interval), and the 0.15 and 0.85 percentiles (which spans the 70% tolerance interval), were calculated for the last data point of the cumulative fluxes (t end = day 10,988) for each soil textural class. In a next step, outliers based on the percentiles were calculated and flagged according to Equation (8) and (9) where flux(x) is the cumulated flux (e.g., actual evapotranspiration) at t end for each individual PTF used to simulate the soil class for each model scenario. In other words, if the flux value, flux(x), at t end exceeds (is less than) the percentile span calculated, the simulation was flagged with a 1 (−1), whereas if the flux value, flux(x), at t end lies within the given percentile, the simulation was flagged with a 0. The same procedure as for the 90% tolerance interval was repeated for the 70% tolerance interval. In order to present variability of the hydraulic parameters or simulated fluxes, the coefficient of variation (CV) was calculated, relating the standard deviation to the mean value. The CV was expressed as a percentage.
For the analysis of differences in soil hydraulic properties estimated by the PTFs, the comparison of two group means was performed with the non-parametric Wilcoxon rank sum test using Matlab® ranksum function using the probability p = 0.05. Principal component analysis (PCA) was performed, also with Matlab®.
To help interpret the differences between model runs with regards to the water fluxes, the matric flux potential, MFP, the characteristic length, L C , the sorptivity, S, and characteristic time t grav was calculated as explained next.
The matric flux potential MFP (cm 2 d −1 ) is a convenient bulk soil hydraulic property that is often used in soil water movement studies (e.g., Pullan, 1990;Raats, 1977;Grant & Groenevelt, 2015), which is defined as the integral of the hydraulic conductivity K (h) (cm d −1 ) over the pressure head h (cm) starting at an arbitrary reference pressure head h ref : where h ref was set to permanent wilting point at h = −15,000 cm according to Pinheiro et al. (2018) and h set to full saturation (h = 0).
As a second soil physical feature, the characteristic length of bare soil evaporation, L C (cm), was calculated. According to Lehmann et al. (2008Lehmann et al. ( , 2018, L C is the maximal extent of the capillary flow region to supply water to evaporating surface. L C is determined by the range of capillary pressure between large and small pores driving the capillary flux against gravity (expressed as head difference, denoted as gravity length L G ) and the hydraulic conductivity K eff of the supply region. Formally, L C is defined via: where α and m are the van Genuchten parameters used in Equation (4) and E 0 is potential evaporation rate.
To calculate effective conductivity, K eff was estimated as 4K crit (Haghighi et al., 2013;Lehmann et al., 2018), whereby K crit is the unsaturated hydraulic conductivity at critical water content and capillary pressure when capillary pathways start to disconnect (Lehmann et al., 2008). The critical capillary pressure and gravity length are determined based on linearization of the soil water retention curve. For the van Genuchten formulation used in Equation (11), the linearized retention curve consists of the tangent to the inflection point, and L G and h crit can be expressed analytically. For Brooks and Corey, the values are determined numerically with a line passing through the air-entry value (S e = 1, h = h b ) and a particular point on the retention curve that is closest to (S e = 0, h = h b ).
The soil sorptivity, S (cm d 0.5 ), which is defined as a measure of the capacity of a porous medium to absorb or desorb liquid by capillarity, was calculated assuming a soil column with uniform initial water content and infinite length, following the approach of Parlange (1975) where D(θ) (cm 2 d −1 ) is the diffusivity defined by Klute (1952) as: For the initial water content, θ i , the maximum reported residual water content (θ r ) for all MvG parameters used in this study (0.192 cm 3 cm −3 ) was used for all PTFs based on MvG and 0.12 cm 3 cm −3 for all PTFs based on BC.
Finally, the so-called characteristic time (t grav ) was calculated according to Philip (1957), which determines the "time" t where gravitational forces become dominant (t ≫ t grav ), while for t ≪ t grav capillary forces remain dominant over gravitational forces (Rahmati, Groh, et al., 2020;: Two final related soil properties that were calculated were the characteristic time for the attainment of FC, τ FC (d), and the elapsed time required for attainment of FC, denoted as t FC (d), see Assouline and Or (2014).
To do so, the effective soil saturation at FC S FC (Equation 15) and the unsaturated hydraulic conductivity at FC K(S FC ) (Equation 16) were calculated by: Moreover, the quantity of drainable water from the soil profile to depth z at FC, Q FC , expressed as equivalent water depth (dimensions of length), has to be calculated as: Here, z was set to 30 cm to match the maximal rooting depth for convenience, as z only scales with total Q FC .
Consequently, a characteristic time (d) for the attainment of FC, τ FC , can be deduced from the ratio of these two quantities, Q FC and K FC : The drainage dynamics can now be linked to the elapsed time (d) required for attainment of FC, denoted as t FC : where K m is the effective hydraulic conductivity that represents the mean value of K(S FC ) weighted by S FC (representing the available relative cross section of flow):

Predicted Hydraulic Parameters and Hydraulic Functions
In a first step, the retention and hydraulic conductivity curves for the 13 PTFs and the 12 USDA soil classes were plotted based on the estimated soil hydraulic parameters listed in Annex Tables 1 and 2. As an example, the retention and hydraulic conductivity curves for the USDA textural class sand is plotted in Figure 3 (see Annex Figure 1 for all retention and hydraulic conductivity curves of all USDA textural classes). Figure 3 shows that the retention curves based on the 13 PTFs greatly differ along the entire pressure head range. For these curves, the saturated water content, θ s , varies between 0.472 for Rawls MvG and 0.375 cm 3 cm −3 for Cosby SSC with a mean of 0.417 cm 3 cm −3 over all PTFs. The corresponding coefficient of variation (CV) is 7.5% for the sandy soil. The residual water content, θ r , varies between 0 for those PTFs setting θ r to 0 such as Weynants, Clapp&Hornberger, Cosby SC, and Cosby SSC, to 0.061 cm 3 cm −3 for the "Toth class" PTF with a mean of 0.029 cm 3 cm −3 and a CV of 80.1%, indicating a much higher variability in terms of CV in θ r compared to θ s . An even larger variability can be found in the saturated hydraulic conductivity, K s , for which we found a maximum value of 1520.6 cm d −1 for Clapp&Hornberger and a minimum of 8.3 cm d -1 for the "Toth class" with a mean of 315.8 cm d −1 (CV = 129.2%). In general, the smallest CV values (data not shown) for all USDA textural classes were found for θ s , with lowest ranging from 2.4% for the silty clay loam class to CV = 7.5% for the sand class. (Larger variability was observed in θ r with the lowest coefficient of variation in the loamy sand class (CV = 76.3%) and highest in the silty clay loam class (CV = 106.6%). As expected, K s showed the largest variability, with lowest CV in the loam class (91.7%) and largest in the clay loam class (215.2%). The larger CV values for the K s estimation is not surprising as a large uncertainty in predicted K s has been already widely reported (e.g., Ahuja et al., 1985;Jaynes & Tyler, 1984;Schaap , 1998;Tietje & Hennings, 1996). Furthermore, it has to be noted that the PTF developed by Weynants et al. (2009) predicts K s * instead of K s , where K s * is a hydraulic conductivity acting as a matching point at suction head h = −6. Therefore, some slightly lower K s (here K s *) value will be predicted by Weynants' PTF.
On the other hand, there seems to be a clear grouping among the class PTFs, with regards to the estimation of K s . Clapp&Hornberger predicted the highest values for six classes (sand, loamy sand, sandy loam, loam, silt loam, and silty clay loam), followed by the PTF of Woesten for five soil classes (sandy clay loam, clay loam, sandy loam, silt loam, and clay). Even more pronounced is the picture for the prediction of lowest K s , whereby the PTF of Rawls MvG estimated the smallest K s values for 11 soil textural classes, except for sand, whereas the "Toth class" PTF showed the lowest K s values. Unfortunately, the α and n (or 1/b) values cannot be directly compared between the BC and MvG approaches, as both parameters have a slightly different physical meaning.

Numerical Model Performance
As numerical stability of the simulation is one of the crucial aspects in the choice and application of the PTF, especially for large scale modeling, we analyzed each PTF with respect to numerical convergence, when using HYDRUS. For each PTF and the seven model scenarios (see Figure 2), 44 individual model runs were performed: for each PTF, the three homogeneous soil layer model scenarios were modeled for each soil textural class (these are 36 model runs). In addition, the four layered configurations are run for a coarse and a fine soil layering, resulting in eight model runs; hence, a total of 44 model runs per PTF were obtained. Note that for the Clapp and Hornberger (1978) PTF, only 41 model runs were performed as no parameters were reported for the silt class. For 486 model runs, out of the total 569 model runs (i.e., 85%), convergence was achieved. A total of 184 out of 217 (85%) of the model runs for the BC and 279 out of 352 (79%) for the MvG parameterization converged, even though it has been reported that the BC type function sometimes prevents rapid convergence and might therefore cause numerical problems. This was deemed to be caused by the discontinuity present in the slope of both the soil water retention and unsaturated hydraulic conductivity curves (van Genuchten, 1980).
For those cases where the simulation did not converge for the MvG parameters, the air entrance value (the inverse of α) was set to −2 cm, and the model was rerun. This procedure increased the total number of converged MvG simulation runs to 302 (86%), which is a similar percentage to that obtained for BC. Looking at individual PTFs (see Table 3), we can see that the Rosetta SSC + BD and Cosby SSC seem to be numerically very stable with 100% converged runs. The Woesten and "Toth continuous" function converged for > 95% of the runs, after setting the 1/α to −2 cm. On the other hand, the lowest convergence was found for the Rawls MvG and Rawls BC with 43% and 39%. Unfortunately, using 1/α = −2 cm did not improve convergence for Rawls MvG.
The reason why some PTFs prohibited the HYDRUS model from converging is quite apparent for some cases. For example, Rawls MvG and Rawls BC yielded very low K s values of 0.8 cm day −1 for the loam and sandy clay class and ≤0.3 cm day −1 for clay loam, silt, and silt loam class. Extremely low values were obtained for silty clay loam (0.04 cm day −1 ) and clay (0.004 cm day −1 ), almost allowing no infiltration at all. Another extremely low K s value was predicted by the "Toth class" PTF for silty clay, with 0.01 cm day −1 ; these unrealistically low values again led to numerical instabilities.
It has to be noted that the reported convergence here is only valid for the numerical model (HYDRUS-1D) used in this exercise with the given numerical (convergence) default criteria, vertical discretization and temporal resolution, and atmospheric boundary conditions. The performance of these PTFs may change if a different numerical scheme, e.g. solving the Richards equation in the mixed or diffusivity form were used, or a different spatial discretization and/or temporal resolution. Furthermore, the lack of certain processes in our simulations (e.g. coupled heat and water transport or evaporation from the wet canopy), or the nature of the atmospheric forcings (e.g. a difference in rainfall frequency and amount) will affect the likelihood of convergence.

Simulated Fluxes over Time
First, the simulated cumulative fluxes were analyzed. ET a (vegetated surface) or E a (bare soil) is a key flux as it indirectly contains information of the net infiltration into the soil profile (net daily infiltration = daily sum of precipitation-daily sum of ET a or E a ), deep drainage (over long-run), and plant available water in the root zone. Furthermore, ET a or E a determines the return of water from the soil profile to the atmosphere, and as such affects the land surface energy budget. Cumulative ET a or E a data for each scenario/soil class combination was plotted and the arithmetic mean of all data (model ensemble mean, MEM) for each combination, as well as the spread of the data, was calculated by the 70% and 90% tolerance interval according to Equation 8 and 9.
As an example of the high variability in simulated fluxes, the simulated cumulative E a , for the homogeneous bare soil scenario of loamy sand texture, over the entire simulation period of 30 years, that ends on day 10,988 (t end ), is plotted in Figure 4a. There is a large variability between the various simulations based on the 13 PTFs. MEM at t end is 1692 cm (564 mm year −1 ). The smallest simulated cumulative E a was 1,273 cm (424 mm year −1 ) for the Carsel&Parrish PTF, and largest, with 2,043 cm (681 mm year −1 ), for Weynants. The difference of 257 mm year −1 between the largest and smallest simulated E a , and their deviation of 140 and 117 mm year −1 from MEM clearly indicates that the choice of PTF substantially affects the estimation of the E a for this soil class. In contrast, low variability was found for cumulative E a for the bare homogeneous clay loam (see Figure 4b). Notably, two out of the 13 simulations did not converge (Rawls MvG and Rawls BC), which potentially also impacts the variability in simulated fluxes. Nevertheless, for the remaining 11 simulations the lowest simulated flux was 1744 cm (581 mm year −1 ) for Rawls class PTF and largest for Weynants PTF with 2,041 cm (680 mm year −1 ) (for this soil, MEM = 1,893 cm or 631 mm year −1 ). Overall, the difference between the largest and smallest flux is only 99 mm year −1 , i.e., 2.5 times smaller than the difference found for the loamy sand. As E a will be also be influenced by the precipitation entering the soil (total precipitation over 30 years = 2,479.7 cm (827 mm yr −1 )), we also looked at the cumulative runoff. For most soil textural class/PTF combinations, still for the homogeneous bare soil scenarios, runoff is low or negligible, with zero runoff, or values <1 cm over 30 years, for 121 model runs, which is equivalent to 88% of runs. Nine simulations (7%) returned a runoff >1 cm but <10 cm, and eight exceeded 10 cm over 30 years (7%).

Table 3
Overview of Converged Simulations for the Different PTFs Listed in Table 1 The highest cumulative runoff was generated for the Rawls MvG/silt combination with 675 cm (27% of total precipitation) followed by Rawls silt loam with 664 cm and Rawls MvG loam with 389 cm. These three combinations have also been flagged as outliers of the lower 0.15 percentile for total cumulative evaporation at t end , which can be explained by the fact that less water enters the soil, and therefore less will be evaporated. The same holds for the Carsel&Parrish PTF and silty clay loam (runoff = 135.1 cm) and sandy clay (runoff = 70 cm) combinations. On the other hand, the combination "Toth class" PTF/silt loam generated 122.9 cm runoff but was classified as an upper 0.95 percentile outlier, generating more E a . Finally, Rawls BC/silt and "Toth continuous"/silty clay combinations generated runoff of 156.1 and 43.7 cm, respectively, yet are not classified as outliers. In general, runoff generation is linked to low K s values (see Annex Tables 1  and 2). An overview of all cumulative E a fluxes at t end for the bare soil scenarios is plotted in Fig. Annex 2. Our findings with regards to ET a for the vegetated scenarios (grass and wheat), still with a homogeneous soil profile (Annex Figures 3 and 4), were comparable. For some soil classes, such as clay, clay loam, and silty clay loam, variability between PTFs was low, for both grass and wheat, whereas the ET a for the sandy and sandy loam soils showed consistently high variability. In contrast, ET a for the loamy sand class exhibited relatively high variability for grass (as was also the case for bare soil) and a slightly smaller one for the wheat scenario configuration. For the other soil textural classes, the picture is less clear. . Surprisingly, eight out of 12 soil class/PTF combinations yielding runoff >100 cm were not flagged as outliers for the 90% tolerance interval for the grass and four out of 10 for the wheat.
There are some unexpected findings, namely that the simulation for the Toth continuous' PTF yielded 46.1 and 11.7 cm runoff, respectively, for the silty clay and silty clay loam under wheat vegetation, despite the fact that the ET a flux at t end was flagged as an outlier of the upper 0.95 percentile, indicating relatively high evaporation with respect to the model ensemble.
Finally, the simulation for the scenario of sandy loam overlying silt loam and loamy sand plotted in Annex Figure 5 showed much lower variability for E a compared to E a of the homogenous profile with the texture of the uppermost layer (silt loam in Annex Figure 2). This indicates that soil layering will reduce the effect of the choice of PTF on the cumulative evaporation. This holds true even more for the layered bare soil scenario where silt loam overlies silty clay loam that is overlying silty clay. Again, the variability in E a for the layered system is much lower than that of the homogeneous silt loam, that forms the first layer in the vertically heterogeneous soil profile. Besides, it can clearly be seen that when vegetation is introduced, variability increases slightly, which is reflected in the coefficient of variation (CV) of the flux at t end , where for the layered profile topped by sandy loam the CV increased from 3.9, via 5.1% to 4.9% for the bare, grass, and wheat vegetation scenario. For the profile with the first layer consisting of silt loam, CV values were 0.5%, 3.7%, and 3.7% for the bare, grass, and wheat vegetation, respectively. Introducing a fluctuating ground water table increased the CV substantially to 13.6% for both vegetated layered systems. Variability in simulated E a or ET a for the layered scenarios can partly be explained by a large reduction in runoff. In total only two simulations (2%) for the Carsel&Parrish (sandy loam topped layered profile under grass vegetation (290.2 cm) and sandy loam topped layered profile for the wheat vegetation and ground water fluctuation (302.6 cm)) exceeded runoff of 100 cm. A further four exceeded the runoff threshold of 1 cm (3 combinations for Carsel&Parrish and one for Rawls BC).
Overall, the choice of PTF substantially affects the simulated values of E a or ET a for most soil classes, irrespective of the fact whether the soil was bare, where the water (vapor) can only leave the soil column via the pore-space at the soil surface, or vegetated, where a considerable proportion of the water being returned to the atmosphere consist of water taken up from the deeper rooted parts of the soil profile.

Outliers per Scenario
As shown above, substantial variability in simulated E a or ET a fluxes occurred for different PTFs and model scenarios. The fluxes exceeding the 70 or 90% tolerance intervals, respectively, were marked as outliers and calculated for each scenario and soil class according to Equation 8 and 9. The number of outliers were counted for each scenario individually in a first step.
The number of outliers varies greatly between PTFs with regards to E a fluxes for the homogeneous bare soil (see Figure 5a); these fluxes were shown in Figures 4a and 4b. Naturally, more outliers are detected for the 70 than for the 90% tolerance interval. For this scenario, Rosetta SSC, Weynants, and "Toth class" exceed the upper 0.95 percentile, whereby Weynants exceeded this percentile for all soil classes where the model had converged, except for silt and silt loam. Rosetta SSC exceeded the upper 0.95 percentile for clay, whereas the "Toth class" PTF exceeded it for silt and silt loam, respectively. Looking at the lower 5 percentile, Carsel&Parrish PTF exceeded this threshold for eight soil classes, and further outliers were found for Rawls MvG (N = 6) and Rawls BC class (N = 4). Two outliers were calculated for Cosby SSC, Rawls BC, and one for Rosetta SSC and Woesten PTFs.
Finally, only Rosetta SSC + BD, "Toth continuous," and Cosby SC indicate no outliers for the upper and lower 70% tolerance interval.
As some simulation runs did not converge (see discussion above), the comparison in terms of total number of outliers is limited. Therefore, the total number of outliers was normalized to the number of converged simulations for each scenario and PTF combination. Again, we present the relative number of outliers for the homogeneous bare soil profile simulations in Figure 5b as an example (all others are shown in Annex  Figures 6-8).
Here, the Weynants PTF shows the largest percentage of outliers for the upper 0.95 and 0.85 percentile with 82% and 100% outliers, respectively. For the "Toth class" PTF, we found 20% and 50% outliers for the upper 0.95 and 0.85 percentile, respectively, indicating that also this PTF simulated larger fluxes with respect to the ensemble. On the other hand, Rawls MvG shows the largest percentage of outliers at the lower end (86% for the 0.15% and 57% for the 0.05 percentile) followed by Carsel&Parrish PTF with 73% for the 0.15% and 45% for the 0.05 percentile. However, Rawls BC and Rawls class also show substantial percentages of outliers for the 0.15 percentile. By comparing the relative (converged only) and absolute (all runs) number of outliers, it can be seen that despite equal or even lower or higher absolute number of outliers for different PTFs, the relative numbers differ due to non-converged simulation runs for some PTFs. For instance, "Toth class" for the 0.95 percentile showed two outliers yielding 20% relative outliers as two simulations (silty clay and silty clay loam) did not converge, whereby one outlier for Rosetta SSC yielded only 8% relative outliers as all simulations converged.

10.1029/2020MS002404
15 of 30 As there is no clear trend in the analysis of the absolute or relative outliers for the individual scenarios (see Figure 5b and Annex Figures 6-8b) which PTF generates most outliers, from here on the outliers over all scenarios for all soil textural classes were calculated for converged simulation runs only and expressed in relative terms. Figure 6a shows the outliers of the 90% tolerance interval (sum of upper and lower outliers), combined for all textural classes, for the seven scenarios for the 13 PTFs for E a and ET a at t end . In this figure, the PTFs of the two main hydraulic formulations are clustered: those based on the Mualem van Genuchten (MvG) on the left and those based on Brooks Corey (BC) formulation on the right. Furthermore, two lines are added, dividing the results into three groups: i) those PTFs with relative number of outliers <10%, classified as "robust," ii) those PTFs with 10%≥ outliers ≤ 20%, classified as "intermediate robust," and iii) the PTFs with relative number of outliers >20%, classified as "non-robust." It has to be noted that these thresholds (10% and 20%) were chosen arbitrarily, but may help to formulate the final recommendations for the choice of preferred PTF, to be used in land surface models, for example.  for the 90% tolerance interval, and can be therefore classified as "robust" with respect to the ensemble behavior (spread). Interestingly, all PTFs using BC formulation show low relative numbers of outliers below 10%. Woesten PTF did not show any outliers at all, indicating that this PTF is very robust with respect to the PTF ensemble used. On the other hand, the "Toth class" PTF was classified as intermediate robust, and three PTFs (Carsel&Parrish, Rawls MvG, and Weynants) were classified as non-robust, whereby Rawls MvG produced most outliers (32%).
The results for the 70% tolerance interval are shown in Figure 6b and followed the same approach as for the 90% tolerance interval discussed above. Four PTFs are characterized as robust (Rawls BC, Clap-p&Hornberger, Cosby SC, and Cosby SSC). Again, all these four PTFs serve to produce parameters for the BC hydraulic formulation. The intermediate robust grouping includes Rosetta SSC, Rosetta SSC + BC, Woesten, and "Toth continuous," that provide parameters for the Mualem van Genuchten formulation. Finally, Carsel&Parrish, Rawls MvG, Weynants, "Toth class," and Rawls BC class are those PTFs classified as non-robust. There are two class, rather than continuous, PTFs here, indicating that continuous PTFs are more likely to be robust. Also, the Weynants PTF was based on a relative small number of samples, for Belgium only.
Based on the results presented above, it can be concluded that the use of different PTFs results in different hydraulic properties that predict considerably different E a or ET a fluxes leading to different soil water contents in the root zone but also to differences in deep percolation (or ground water recharge). Furthermore, PTFs such as Carsel&Parrish, Rawls MvG and Weynants can be identified as systematically less robust. In contrast, others, such as Woesten or all PTFs using the BC formulation (except Rawls BC class) seem to be robust with respect to the ensemble of PTFs used in this study.

10.1029/2020MS002404
17 of 30 To facilitate the identification of outliers, all outliers per PTF, scenario, and textural class combination were color coded and plotted in Table 4. Again, Weynants overestimates E a or ET a fluxes (brown color for dryer soil conditions) for nearly all textural soil classes except for clay, and silt. On the other hand, Rawls MvG shows underestimation (blue color for wetter soil conditions) for loam and silt loam over all three homogeneous soil scenarios and for silt and sandy loam for two out of the three homogeneous soil scenarios. The Carsel&Parrish' PTF, on the other hand, results in over-and underestimation, depending on soil class.
In the study of Zheng et al. (2020), who evaluated also a set of 13 PTFs using independent retention data (data not used in the development of the PTFs), the Carsel&Parrish PTF showed largest RMSE in predicted volumetric water content of the retention data points, which is in agreement to our findings that those PTF is characterized as less robust with respect to simulated fluxes. On the other hand, Weynants PTF was the one with lowest RMSE, whereas Weynants was characterized as less robust in our case. The reason why Weynants behaves differently between the study presented by Zheng et al. (2020) and our study, might be that only the retention characteristics were analyzed by Zheng et al. (2020), whereas in the function sensitivity performed here, also the hydraulic conductivity function plays a virtual role. The reasons why Weynants is characterized as less robust is further discussed in the following sections.

10.1029/2020MS002404
18 of 30 Note. Numbers for the individual pedotransfer/soil class/scenario combinations depict the % of total 90% tolerance interval outliers for the instantaneous E a /ET a flux. NaN are non-converged simulations. % spread is the spread in % between minimum and maximum cumulative E a /ET a at tend over one soil class/scenario combination.

Simulated Spread with Respect to Scenario
We raised the hypothesis that differences (variability) in simulated fluxes from using different PTFs will be reduced with increasing model complexity. Increasing complexity was generated by introducing vegetation (grass or wheat), soil layering, or the assumption of a fluctuating ground water table, for the layered vegetated soil scenario only. As only the homogeneous scenarios (bare, grass, and wheat) used all soil classes, we restrict the analysis on these three scenarios.
For the analysis, again the simulated cumulative actual E a or ET a data at t end was taken and the model ensemble mean (MEM) for E a to ET a at t end over all PTFs was calculated for each individual soil class and scenario. Based on the MEM value for E a or ET a , as well as the individual E a or ET a value at t end for each model run, the % difference from the MEM (100/MEM*E a @t end or ET a @t end ) was calculated and visualized using boxplots in Figure 7, where the red line indicates the median, the box indicates the 0.25 and 0.75 percentiles, the whiskers represent the most extreme data points not considered as outliers, and crosses represent the outliers (value is more than 1.5 times the interquartile range). From the boxplots, two types of information can be deduced: i), the variability of predicted E a or ET a over all PTFs for one soil class/scenario and ii), the change in variability (spread) resulting from a change in scenario complexity (bare, grass, or wheat vegetation).
In general, the largest variability in predicted E a or ET a was found for the bare soil conditions, which is most pronounced for the loam, loamy sand, sand, sandy clay, clay loam, and sandy loam class. Minor differences were found between bare and vegetated scenarios for the other soil classes. The silty clay soil class for the grass scenario showed the smallest overall spread between minimum and maximum predicted E a (or ET a ) with a value of 7% (min = 97% and max 104%). On the other hand, the largest variability was found for the combination sandy soil/bare soil scenario with 53% (min = 72% and max 125%). All spreads, throughout the 13 PTFs, for different soil classes and scenarios are provided in the final column of Table 4. Overall, bare scenarios show a mean spread of 30%, whereby the grass and wheat vegetated scenarios have only 23% spread over all soil classes. A possible explanation for the reduced spread in simulated E a or ET a with increasing model complexity (in this case vegetation) is that for the vegetated profiles water is extracted from the rooted portion of the soil profile, whereas under bare soil the water can only leave the soil profile at the soil surface. In the latter case, differences in the soil hydraulic properties, especially in unsaturated hydraulic conductivity, which is highly variable (on the order of magnitudes) between PTFs, close to the surface will impact the E a flux more substantially. As shown earlier, runoff will occur in both scenarios (bare and vegetated) and is even slightly larger for the vegetated scenario, and therefore, cannot explain the reduced variability.
Next, for the layered soil scenarios (bare, grass, and wheat, without fluctuating groundwater table) a clear reduction in the variability was observed, for both profiles (by sandy loam or silt loam). The bare and the vegetated scenarios showed nearly the same spread (mean 13.4% for bare, 18.5% for grass, and 15.5% for the wheat). In general, the sandy loam overlaying silt loam and loamy sand showed always higher variability compared to the silt loam overlaying silty clay loam and silty clay, which is consistent to the finding that the sandy loam of the homogeneous soil profile also showed higher variability compared to the homogeneous silt loam scenarios.
Overall, the results indicate that adding vegetation reduces the variability in the simulated E a or ET a flux, even if runoff occurs more frequently. This conclusion also holds for adding more complexity in terms of soil layering, although the latter has to be regarded with some caution due to the low number of soil combinations selected for these model runs. However, taking into account that large portions of our global land surface is covered by vegetation, differences in predicted fluxes, as a result of differences in PTFs used to generate the hydraulic parameters, will most likely be smaller compared to an "unvegetated world." In contrast, adding a fluctuating ground water table to the layered wheat scenario greatly increased variability in ET a flux, for both soil layering to 48% and 35% for the sandy loam overlaying silt loam and loamy sand, and silt loam overlaying silty clay loam and silty clay, respectively.

Differences in Instantaneous Fluxes
Cumulative fluxes at t end will only provide long-term systematic under or overestimation, but will not provide information on how the instantaneous fluxes fluctuate compared to the MEM. Therefore, the instantaneous fluxes were also analyzed. The same analyses as conducted for the cumulative fluxes were performed, i.e., calculation of the MEM and the 0.95 and 0.05 percentiles for time step i, whereby i runs from day 1 to 10,988. Second, the total number as well as the upper and lower percentile outliers were counted. As an example, the outliers of E a for the sandy loam of the homogeneous bare soil scenario were plotted in Figure 8, for the different PTFs. Carsel&Parrish PTF shows a substantial number of outliers for the lower 0.05 percentile (N = 2020 or 18% of all days), indicating that for these days less water will evaporate and return to the atmosphere, which would have implications for the cloud forming processes of a numerical weather prediction or climate model if a LSM using this PTF were to be embedded within it. On the other hand, Weynants has 3053 outliers for the upper 0.85 percentile (28%) but also a smaller number of outliers for the lower 0.05 percentile (N = 309 or 3%), leading to larger E a flux. A large number of 0.05 percentile outliers were also found for Rawls MvG (N = 2,992 or 27%), again combined with a lower number of upper 0.95 percentile outliers (N = 391 or 4%). Cosby SC, Cosby SSC, and Rawls BC showed only low number of outliers (N < 10) for the upper and lower percentiles. Even though the model runs for which the hydraulic parameters were derived from Carsel&Parrish and Rawls MvG PTFs exhibit large numbers of outliers, both are not flagged as 90% tolerance interval outliers when the cumulative flux at t end was analyzed. This means that the non-flagged instantaneous E a fluxes compensate for the lower fluxes determined as outliers in Figure 8, or that the outliers are close to the 0.15 percentile, which is reflected by the fact that the total sum of underestimated flux (outlier flux-flux for the lower 0.05 percentile for each outlier day) is low, amounting to 5.7 and 2.4 cm over the 30-year period, respectively. Moreover, both PTFs show runoff exceeding a total of 1 cm in 60% (Carsel&Parrish) and 36% (Rawls MvG) of all converged simulations, respectively. For the Rawls MvG the nine simulations with runoff even exceed the 100 cm threshold, with runoff ranging between 388.6 and 859.2 cm. Looking at all textural classes (data not shown) for the homogeneous bare soil scenario, 29 soil class/PTF combinations out of the total 151 do not exhibit any outliers at all for the instantaneous E a flux. These outliers are clustered in three soil classes only (clay, silty clay, and silty clay loam). Interestingly, out of these 29 with zero outliers in instantaneous flux, five are flagged as outliers for the cumulative flux at t end (Rosetta SSC clay, Cosby SSC clay, Weynants silty clay and silty clay loam, as well as Carsel&Parrish silty clay loam), meaning that these PTFs over-or underestimate instantaneous E a only very modestly, yet consistently throughout the simulation period.
The percentage of all 90% tolerance outliers (sum of upper and lower outliers) summed over all days for all three homogeneous soil scenarios (bare, grass, and wheat) for all soil classes and PTFs are provided in Table 4. Over all soil classes and PTFs, the bare soil scenario has the lowest total number of outliers (N = 119,930 days or 6.5% over all days and scenarios) followed by the homogeneous wheat configuration (N = 17,3961 days or 10.1%) and the homogeneous grass scenario (N = 17,8249 days or 10.4%). This finding is perhaps in contradiction to the finding that the percental spread in cumulative E a or ET a at t end was larger for the bare soil scenario, compared to the vegetated ones. Furthermore, for some texture classes the total number of outliers increased remarkably when vegetation was implemented, such as for the clay class, where the bare soil scenario has no outliers (0%), while the percentage of outliers increased to 14% for the homogeneous grass and wheat scenario, respectively. This indicates that the differences in available root zone water, affecting actual transpiration, are the main driver for differences between PTFs, compared to fluxes over the soil surface E a . On the other hand, only the silty clay and the silty clay loam showed no outliers at all for the instantaneous flux for all scenarios. Looking at all soil class/PTF/scenarios combinations, no clear trend in the total number of outliers in instantaneous evapo(transpi)ration flux, and flagged outliers for the cumulative E a or ET a flux at t end can be observed. This leads to the conclusion that the outliers in instantaneous flux alone do not necessarily sum up to a cumulative flux flagged as an outlier.

Explaining Variability and Outliers by Soil Physical Properties
As has been shown, substantial variability exists in cumulative and instantaneous fluxes, and some PTFs are found to be more robust than others. In this section, we discuss in more detail the reasons for the differences between the predicted soil water fluxes, resulting from the use of different PTFs, by analyzing the estimat-ed hydraulic parameters K s , λ (MvG tortuosity parameter) and the soil physical characteristics. In general, variability between estimated K s for the different PTFs is quite low (Figure 9a), and values for Rawls MvG and BC only are significantly lower than all other PTFs. These lower values may explain the poor numerical convergence for these simulations, and the prevalence of lower E a fluxes as well as a high number of lower 0.05 percentile outliers at t end as depicted in Table 4, especially for Rawls MvG. Clapp&Hornberger K s values are significantly higher than those estimated by the Weynants PTF, "Toth class," and Cosby SC and SSC, yet did not show any high outliers for E a fluxes. Interestingly, Cosby SC and SSC were developed based on the same water retention and K s data as Clapp&Hornberger, as both used data from Holtan et al. (1968), nevertheless estimated K s values are quite different. One reason might be that Clapp&Hornberger only used textural classes, and averaged K s for those classes, whereas Cosby SC and SSC is a continuous PTF. Coming back to the outliers listed in Table 4, those runs based on Weynants PTF indicate larger E a fluxes at t end and a large number of upper 0.95 percentile outliers, whereas their estimated K s is not significantly different from most other PTFs. Here, it has to be noted that Weynants did not estimate K s but rather estimated a near saturation hydraulic conductivity K s * that is mainly controlled by textural properties and which is lower that K s . The results suggest that variability in K s alone cannot explain the flux differences simulated. Looking at the λ value used in MvG formulation, two different classes of PTFs can be distinguished, those setting λ to 0.5 as originally proposed by van Genuchten (1980) (Carsel&Parrish, Rawls MvG, and "Toth continuous") and those who fitted λ as an additional free parameter (Rosetta SC and SSC, Woesten, Weynants, and "Toth class"). The variability in λ is plotted in Figure 9b. It shows that the λ estimates of Weynants' PTFs are significantly lower than those from the other four PTFs estimating λ, except for "Toth class." "Toth class" λ values are significantly lower than those calculated by Rosetta SC and SSC, and then those setting λ to 0.5, whereas Woesten is significantly lower than Rosetta SC and SSC, and <0.5. The more negative λ values for Weynants appear strongly related to the larger number of upper 0.95 percentile outliers listed in As λ is correlated to the n parameter, and n directly impacts the hydraulic properties and hence L G , L C , τ FC , and t FC , and to a less extend S, the correlation between λ and these soil characteristics was calculated. The results indicated (data not shown) that λ is not significantly correlated to L C , t grav , τ FC , and t FC but moderately correlated to L G (R 2 = 0.31, p = 0.05) and S (R 2 = 0.30, p = 0.05), whereas λ is not correlated to the flux E a at t end .
For the calculated soil characteristics L G , Weynants shows large variability and high median and significantly differs from Woesten, Rawls MvG, Rawls BC class, Rawls BC, and Cosby SC and SSC. In contrast, Rawls BC and BC class show low L G , and Rawls BC class is significant different from Rosetta SSC and Clapp&Hornberger (see Figure 10a). Here, it has to be kept in mind that L G solely depends on the water retention characteristics and hence the n and α values play a crucial role in the calculation. As n is positively correlated with λ, and Weynants shows the smallest λ values, the significant difference, with regards to L G , between Weynants and most other PTFs seems logical. Large L G values occur for very fine textures, which are classically associated to low K values that limit water supply to the evaporating surface, which is reflected by the higher number of upper 0.95 percentile outliers for Weynants, leading to a drier soil profile. Lower E a fluxes at t end , and therefore, a wetter profile occurred frequently for Carsel&Parrish and Rawls (MVG and BC), whereby all these PTFs are also located in the low L G range.
The calculation of L C is based on knowledge of L G and the actual hydraulic conductivity distribution above the evaporation front. Therefore, K s plays also an important role in the calculation of L C . The impact of K s on L C is clearly reflected in the high L C values for Clapp&Hornberger, which exhibit high K s values across all soil classes compared to all other PTFs (see Figure 10b). At the other end of the spectrum, the impact of K s on L C is also apparent for Rawls MvG and Rawls BC which do not indicate much spread and are characterized by low K s and hence low L C . Surprisingly, Clapp&Hornberger are not classified as outliers when looking at cumulative fluxes (see Table 4), whereas the low L C for Rawls MvG corresponds to the number of outliers detected. On the other hand, Weynants, which was characterized as the PTF with most outliers at the upper 0.95 percentile, lies in the middle of the range of L C values depicted in Figure 10b, indicating that L C might not be a good indicator for flagged outliers. As stated in Lehmann et al. (2008), L C longer than 1 m are considered as unrealistic (evaporative extraction of water by capillary flow across several meters is unlikely). Interestingly, only the Clapp&Hornberger PTF show L C >1 m, while all other PTFs give realistic values.
The analysis of MFP shows a quite different picture (Figure 10c). Here, the PTFs based on BC group together and exhibit a higher MFP compared to the MvG based PTFs. Testing on significance showed that Rawls BC class, Clapp&Hornberger, and both Cosby PTFs are significantly different from all others and that only Rawls BC is not significantly different from those using MvG formulation, except for Rawls MvG. This is of interest, as Rawls MvG is only a "translation" of the BC to van Genuchten parameters from Rawls BC according to Morel-Seytoux (1986), while keeping K s . As the Weynants PTF showed substantial outliers, as listed in Table 4, one would also expect Weynants to be different with regards to MFP as the λ value is much smaller compared to all other PTFs, while K s does not differ (see Figures 9a and 9b). One reason for the fact that MFP for Weynants does not differ from the other PTFs might be its relatively low n value, as λ and n are positively correlated. The impact of λ as opposed to the effect of MFP becomes clearer when we compare Weynants and Woesten, which show no significant difference in MFP, yet larger K s values for Woesten and lower λ for Weynants. Overall, the MFP cannot explain the outliers detected and depicted in Table. 4 as only Rawls MvG is systematically different and exhibits large number of outliers, whereas Weynants MFP are in the center of the range of values found for the different PTFs. On the other hand, MFP values for Clapp&Hornberger, as well as for both PTFs from Cosby, are significantly higher, yet do not stand out in Table 4.
With regards to the sorptivity S (Figure 11a), there is a large variability in S for Clapp&Hornberger, which is significantly different from all other PTFs. Small variabilities in S, however, are found for Woesten, Rawls MvG and BC, Weynants, "Toth class" and both Cosby PTF. In general, S is moderately correlated to L C (R 2 = 0.40).
Rawls BC shows a high t grav , which is significantly different from all other PTFs, except for Rawls MvG. Both Cosby PTFs and both Rawls continuous functions (Rawls MvG and BC) show relatively large variability (Figure 11b). The higher t grav for Rawls MvG fits with the larger number of outliers listed in Table 4, whereas for BC this pattern is not clear, maybe due to the lack of numerical convergence. In general, larger t grav values are associated with more fine-grained soils such as loam and clays (Alastal, 2012), whereas the low t grav of Woesten characterizes more coarse soils such as sands.
High τ FC were calculated for Rawls MvG (Figure 11c), whereby the large τ FC is associated with extremely low predicted K s values. Extremely high values were found for Rawls MvG with τ FC exceeding 3 million days, whereby Rawls MvG has K s values of 0.01 and 0.004 cm d −1 for the silty clay and clay class, respectively, and also did not converge. For the two soil classes, silt and silt loam, where the model run did converge τ FC is also extremely large (>44,000 days) and for these soils again low K s values of 0.2 and 0.3 cm d −1 , respectively, were estimated. Additionally, these two model runs are also outliers at the lower 0.05 percentile. Clapp&Hornberger PTF resulted in the smallest τ FC , whereas the K s predictions are in general higher as for the other soils (see Figure 9a) and none of the simulations were flagged as outliers. On the other hand, all other PTFs have comparable τ FC values, and the outliers detected in Table 4 seem not to be linked with τ FC . Finally, t FC was analyzed, which shows the same pattern as τ FC , which is to be expected as t FC and τ FC are linearly correlated as also shown by Assouline and Or (2014).
As these soil physical characteristics were calculated to help explain differences in simulated E a at t end , all characteristics were correlated against E a at t end (see Figure 12). Only log 10 (L G ) shows a moderate correlation to E a at t end (R 2 = 0.52, p = 0.05) and a weak correlation was found for log 10 (L C ), with R 2 = 0.29 (p = 0.05). As E a , and also drainage D at t end , will be biased if runoff is generated (because less water will infiltrate into the soil profile and be available for evaporation and drainage), E a and drainage at t end were normalized (E a_norm , D norm ) by dividing E a or drainage at t end by the difference of precipitation at t end (2479.72 cm) and runoff at t end . By doing so, the correlation between λ and E a_norm increased to R 2 = 0.31 (p = 0.05). For the derived soil characteristics the correlation also increased (to R 2 = 0.57; p = 0.05) for log 10 (L G ) but decreased for log 10 (L C ), to R 2 = 0.10. On the other hand, the correlation slightly increased for t FC , from R 2 = 0.09 to 0.22.
In a next step, a principal component analysis (PCA) using all converged model runs and soil hydraulic parameters available for MvG and BC (θ r , θ s , and K s ) as well as all soil characteristics (L c , L G , MFP, S, and t grav , t FC , and τ FC ) and fluxes (E a_tend , E a_norm , D tend , and D norm ) was performed on log transformed data (except θ r , θ s , MFP, E a_norm , and D norm ) and the results are plotted in Figure 13. The first three components explain 76% of the variability in the data and the important loadings on PC 1 (42.9% of variability) are t FC (0.38), K s (−0.35), and L G (0.33). PC 2 (24.5% of variability) includes the important loadings L C (0.47), E a at t end (0.40), and S (0.31). PC 3 explains only 8.6% of the variability and t grav (0.48) and θ s (0.47) are the important loadings. The PCA triplot shows scatter of the individual PTFs around the origin of the triplot but also distinct PTF clusters, whereby Weynants (black circle) is oriented along the PC 1 in a fairly small volume and is positively correlated to t FC and τFC and negatively to D at t end (as drainage D at t end is negative per definition). Rawls (MvG and BC) is oriented in the same direction as Weynants but it exhibits larger scatter, whereas Clapp&Hornberger (red solid markers) is oriented along PC 2 and correlates positively with K s . reported by Clapp&Hornberger are amongst the highest compared to all other PTFs as already discussed in relation to Figure 9a.
Out of these 13 PTFs, three (Clapp&Hornberger, Weynants, and Rawls) can be identified as being distinctive from all others in the triplot as they do not cluster around the origin. Furthermore, they do not only differ considerably in their estimated soil hydraulic parameters (e.g., λ and n values for Weynants, and K s for Rawls and Clapp&Hornberger) but also in the soil characteristics derived from these parameters, whereby in all soil characteristics either the n value (remember that n is correlated to λ) as well as K s are directly or indirectly integrated. For example, the low L C values for Rawls PTFs indicate that the maximum extent of the flow region sustaining evaporation is much smaller than for all other PTFs. This results in low E a at t end compared to other PFTs and larger number of outliers as depicted in Table 4.
Finally, a multiple regression was performed to test whether E a at t end can be predicted by the soil hydraulic parameters and/or characteristics, whereby only one of those parameters or characteristics were used in turn, i.e. those that were available for MvG and BC. As per Figure 13, all entries were log transformed except for θ r , θ s , and MFP, and the best regression was selected using bootstrapping. The best predictive model was found by E a @ t end = 1252.13 + 183.30 log 10 (L G ) + 367.88 log 10 (L C ) − 405.22 log 10 (S) with an R 2 of 0.88 (see Figure 14) pointing to the fact that the soil characteristics L G , L C , and S describe well the physical behavior of soils with regards to actual evaporation. Using E a_norm instead of E a decreased the predictive power of the multiple regression (R 2 = 0.75).

Summary and Conclusion
In this study, 13 PTF were used to populate the hydraulic parameters required in the HYDRUS model that was then used to simulate the water fluxes for 12 USDA soil classes, for different model scenarios that varied in complexity (homogeneous or layered soil profile, with and without vegetation) over a period of 30 years. Plotting the hydraulic functions (water retention and hydraulic conductivity curves) for all PTFs revealed large differences, especially for the hydraulic conductivity curve, leading to the hypothesis that the different PTFs will also show substantial differences in simulated fluxes.
It turned out that some PTFs generated parameters that rendered the HYDRUS model numerically unstable, so that it failed to converge for certain soil class/configuration combinations, especially those reported by Rawls and Brakensiek (1985) (Rawls MvG) and by Rawls et al. (1982) (Rawls BC), which converged only in less than 44% off all simulation runs. Surprisingly, PTFs using the BC formulation resulted in higher convergence rates, compared to those based on Mualem van Genuchten, even though BC is in general perceived to be less stable.
In a next step, differences in simulated actual evaporation E a or evapotranspiration ET a between the model runs were analyzed, as E a and ET a indirectly contain information on the net infiltration, deep drainage (over long-term) and water stored in the root zone. Therefore, the cumulative E a or ET a at the end of the simulation period (t end = 10,988 days) was selected and the 90% and 70% tolerance interval as well as the model ensemble mean were calculated. Fluxes exceeding the tolerance limits were flagged and counted. The results indicate that some PTFs (Rawls MvG, Weynants, and Carsel&Parrish) were classified as non-robust, as the fluxes generated by the parameters derived from these PTFs exceeded a defined threshold of 20% of the 90% tolerance interval outliers over all scenarios and soil classes. On the other hand, all PTFs using the BC formulation (Rawls BC, Rawls BC class, Clapp&Hornberger, Cosby SC, and Cosby SSC) are classified as robust, as they generally result in a low percentage of 90% tolerance outliers. The PTF of Woesten performed best, and it showed no outliers at all for the 90% tolerance interval. A hypothesis raised at the beginning of the study was that increasing model complexity will reduce the variability in predicted fluxes. Therefore, the individual simulated E a and ET a fluxes at t end were compared to the model ensemble mean (MEM), and the relative spread of the individual simulations was calculated. The results show that the bare soil scenarios exhibit the highest mean percentage spread (30%), whereas the grass and wheat vegetated scenarios had a reduced spread (23%), averaged over all soil classes. The reduction in relative spread with the inclusion of vegetation can be explained by the fact that for these runs the water leaving the soils can be extracted from the entire rooted soil profile (after which it gets transpired via the vegetation), whereas under bare soil conditions it can only leave the soil profile at the soil surface. In the latter case, differences in the soil hydraulic properties close to the surface, especially in unsaturated hydraulic conductivity, which is highly variable (in order of magnitudes) between PTFs, will impact the E a or ET a flux more substantially.
The instantaneous E a or ET a fluxes over time were also analyzed, whereby again the 90% tolerance outliers were calculated and counted. The results indicate that some PTF/soil class/model scenario combinations showed substantial outliers in the instantaneous fluxes, yet were not flagged as outliers for the cumulative flux at t end , indicating that the non-flagged instantaneous fluxes compensate these outliers. On the other hand, other PTF/soil class/scenario combinations showed no outliers for the instantaneous fluxes, but were Figure 14. Predicted E a at t end (cm) by multiple regression of soil characteristics log 10 (L G ), log 10 (L C ), and log 10 (S) versus simulated E a at t end [cm]. flagged as outliers for the cumulative case, indicating that even small over-or underestimations in instantaneous flux can sum up to large errors in the long-run.
To explain differences in simulated E a for the homogeneous bare soil scenario, different soil characteristics were calculated, and a PCA was conducted using all simulated fluxes, soil hydraulic parameters and soil characteristics available for both MvG and BC. The PCA revealed three distinct PTFs clusters, namely Weynants, Rawls, and Clapp&Hornberger, whereby Weynants and Rawls were also characterized by a large number of tolerance outliers. Weynants correlates positively to gravity time of infiltration t FC and τ FC and negatively to drainage D at t end , whereby Clapp&Hornberger is oriented in the opposite direction and correlated with the saturated conductivity K s . For Rawls a reasonable correlation with t FC and τ FC is found, but due to the large scatter for this PTF the interpretation is less clear.
Finally, a multiple regression was performed, showing that the gravitational length L G , characteristic length of evaporation L C and sorptivity S together explain almost 90% of the variability in simulated E a at t end .
Overall, our results provide insights in the functional behavior of the PTFs as a bases for the selection of PTFs in land surface modeling, but also for large scale hydrological or crop models, where considerations regarding the numerical stability, model behavior and performance over the long run and instantaneously should be balanced against each other. Based on this, Rosetta SSC + BD, Woesten, and "Toth continuous" seem to be the most robust PTFs for the Mualem van Genuchten function and Cosby SC for BC. Note, however, that our study is in essence a sensitivity analysis; it does not include model verification using measured fluxes, and it employs one model only.
In any case, the results clearly demonstrate that the choice of PTF can substantially affect the simulated fluxes, and as a consequence, the water content stored in the soil profile with part of that available for root water uptake and crop growth. Therefore, we strongly recommend to harmonize the PTFs used in land surface, large scale hydrological, or crop model inter-comparison studies to avoid artifacts originating from the choice of PTF rather than from model structures. Additionally, our study should motivate future studies, where measured verification fluxes are available from lysimeters and or eddy covariance stations.