Pedotransfer functions developed for calculating soil saturated hydraulic conductivity in check dams on the Loess Plateau in China

Soil saturated hydraulic conductivity (Ks) is a key soil hydraulic property that determines the hydrological cycle of check dam–dominated catchment areas. However, Ks data are lacking because this variable is difficult to measure directly in deep soil layers. In this study, 45 soil profiles (0–200 cm) in 15 check dams in three typical watersheds (Xinshui River, Zhujiachuan, and Kuye River) in a hilly gully region on the Chinese Loess Plateau were selected, and a total of 586 soil samples were collected along the soil profiles. Backpropagation neural network (BPNN) and support vector regression (SVR) models optimized with the genetic algorithm (GA) were tested, and pedotransfer functions (PTFs) for Ks estimation were established for check dams on the Loess Plateau. Basic soil characteristics (soil depth, sand, silt, clay, soil organic matter, and bulk density) were adopted as the model inputs to estimate Ks. Combinations of these parameters estimated Ks suitably, and models with relatively few soil characteristics achieved accuracy similar to that of models with more inputs. Compared with GA-BPNN, the GA-SVR model was more practical and more stable in Ks prediction (the geometric mean error ratio was between 0.942 and 1.101; RMSE was between 0.069 and 0.073). These results can contribute to land restoration and watershed governance on the Chinese Loess Plateau.


INTRODUCTION
The Loess Plateau of China is the region with the most severe land degradation in the world, in part because of its unique climate. Many check dams possess no spillway, and all the sediments entering from upstream of the dams are deposited (Sun et al., 2014). A sediment layer of considerable thickness (i.e., at least several centimeters) extending across reservoir beds is produced (Tang et al., 2018). Therefore, the hydrological cycle within check dams involves a very complex process. The soil saturated hydraulic conductivity (Ks) is one of the crucial hydraulic properties for the assessment of water and solute transport in soil (Bagarello et al., 2017). It is particularly important for the special sedimentary deposits in check dams (Schimelpfenig et al., 2014). Reliable Ks data are required to study the hydrological cycle and the hydraulic processes that occur in check dams. However, it is difficult to obtain accurate and direct Ks measurements in deep soil profiles (Patil & Singh, 2016; C. Zhao et al., 2020). Hence, pedotransfer functions (PTFs) should be developed to obtain indirect Ks estimates of deep soil profiles in check dams on the Loess Plateau.
In recent decades, PTFs have been developed and increasingly adopted as an alternative approach to indirectly estimate Ks through more easily measurable soil characteristics, such as soil texture, bulk density (BD), soil organic matter (SOM), and soil pH (Klopp et al., 2020; Lee, 2005; Wösten et al., 2001). Y. Li et al. (2007) applied PTFs to estimate the Ks of representative Fengqiu County soils in the North China Plain. Yao et al. (2015) developed local PTFs with a higher accuracy and suitability according to the influencing factors of Ks at their experimental sites. Y. Wang et al. (2012) predicted Ks based on data from the Chinese Loess Plateau using multiple linear regression methods. However, the previously developed PTFs for Ks were mainly designed for cultivated layers and upper soil layers (depths <60 cm), and most of these PTFs were established using traditional linear regression methods based on datasets collected in a large area. Therefore, it is necessary to explore new methods to establish PTFs and test their validity for deep layers of check dams on the Loess Plateau.
Support vector regression (SVR) is a nonlinear machine learning algorithm that has been recognized as a proper tool for predictions (de Santana et al., 2021; Elbisy, 2015; Ghaedi et al., 2016). This approach can attain generalizability with only a few data samples. However, SVR has shortcomings in its application because its performance depends on the choice of the insensitive loss coefficient ε, the error penalty factor C, and the kernel parameter σ. Traditional parameter selection methods are inefficient and subjective, and their results are not yet satisfactory. The backpropagation neural network (BPNN) is also a nonlinear approach (Huang et al., 2021). The advantages of BPNN include a strong adaptive ability, simple learning rules, and ease of implementation (Reda et al., 2021). Thus, the BPNN is one of the most commonly applied architectures for artificial neural networks (W. Zhang & Goh, 2016). However, its convergence speed is low, and the solution process easily becomes trapped in local extreme values, which causes network training to fail (Chen et al., 2010). The evolutionary algorithm known as the genetic algorithm (GA) can address both the difficulty of determining the parameters of the traditional SVR method and the slow convergence of the BPNN (H. Xu et al., 2020; Zou et al., 2020). The GA searches for the optimal solution by imitating the crossover and mutation processes of chromosomal genes in biological evolution (Momenbeik et al., 2010). The GA has been adopted for the feature selection and hyperparameter optimization of regression models for different classification problems due to its good convergence and strong robustness (Sajan et al., 2015). In this study, GA-backpropagation neural network (GA-BPNN) and GA-support vector regression (GA-SVR) models were adopted to derive PTFs for estimating the Ks of check dams on the Loess Plateau.
The objectives of this study were (a) to determine the variations in the soil characteristic parameters of check dams with soil depth, (b) to quantify the contributions of basic soil characteristics to Ks, and (c) to establish PTFs for check dams in three typical watersheds on the Loess Plateau.

Study area and sampling
We selected 15 check dams in coarse sandy hilly catchments on the Loess Plateau distributed in the Xinshui River, Zhujiachuan, and Kuye River watersheds (Figure 1). These three watersheds are tributaries with severe soil erosion in the middle reaches of the Yellow River and represent typical coarse sand areas on the Loess Plateau. The catchments in the Xinshui River, Zhujiachuan, and Kuye River watersheds are located in southeastern Shanxi Province, northwestern Shanxi Province, and northern Shaanxi Province, respectively. These three typical watersheds feature a midlatitude warm temperate and semihumid continental monsoon climate, with heavy rains concentrated in summer and autumn and arid windy conditions in winter and spring. The soil is loose and barren, and surface vegetation is scarce, leading to substantial soil erosion and a large sediment transport volume. In this study, six, seven, and two check dams were selected in the Xinshui River, Zhujiachuan, and Kuye River watersheds, respectively. Three soil profiles (0-2 m depth, at equal intervals) were selected in sequence from in front of the dam to the mouth of the ditch in the main channels for analysis. Among these profiles, the profile near the dam was adopted as the main profile, and the other two profiles functioned as auxiliary profiles. As shown in Figure 1, a typical small watershed in Liudaogou was chosen as an example; three sample soil profiles of check dams are marked in the figure. Soil cores and sediment couplets were classified in detail, and samples were collected. An undisturbed soil core with a volume of 100 cm³ and ∼500 g of disturbed soil were collected from each sedimentary layer of the check dams (Figure 2).

FIGURE 1 Map of the study area and three typical watersheds on the Loess Plateau, northwest China. The numbers 1, 2, and 3 indicate the three soil profiles of the check dams
The sampling interval was appropriately reduced in thin siltation layers, and a total of 586 samples were collected from 45 soil profiles. Thus, 268, 268, and 50 soil samples were collected from check dams in the Xinshui River, Zhujiachuan, and Kuye River watersheds, respectively. A grain-size deposition diagram of the main profiles of the 15 check dams is shown in Figure 3. None of the plots within the dammed areas of the sampled small watersheds has been disturbed by human activities (e.g., farming) since the dams were constructed.

Laboratory analysis
Soil saturated hydraulic conductivity was determined based on Darcy's law with the constant- and falling-head methods (Bagarello et al., 2006). The SOM content was measured with the dichromate oxidation method (Walkley & Black, 1934). The BD value of the check dams in the three typical watersheds was measured at different depths with the volumetric ring method (W. Li et al., 2008). Soil particle sizes were analyzed with a laser particle size analyzer (Mastersizer 2000, Malvern, Ltd.) to obtain fractions with a diameter <0.002 mm (clay fraction), 0.002-0.05 mm (silt fraction), and 0.05-2 mm (sand fraction). Three replicates were determined for each soil sample. The soil texture was determined based on the USDA soil texture classification (Figure 3).
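As a rough illustration of a Darcy-based determination, the sketch below computes Ks for a constant-head and a falling-head test using the textbook forms of Darcy's law. The function names, parameter names, units, and example readings are assumptions made for illustration; the calculation details of the study itself are not reported here.

```python
import math


def ks_constant_head(volume_cm3, length_cm, area_cm2, head_cm, time_h):
    """Constant-head test: Ks = V * L / (A * dH * t), returned in cm/h."""
    return volume_cm3 * length_cm / (area_cm2 * head_cm * time_h)


def ks_falling_head(standpipe_area_cm2, core_area_cm2, length_cm,
                    time_h, h_start_cm, h_end_cm):
    """Falling-head test: Ks = (a * L) / (A * t) * ln(h1 / h2), in cm/h."""
    geometry = (standpipe_area_cm2 * length_cm) / (core_area_cm2 * time_h)
    return geometry * math.log(h_start_cm / h_end_cm)


# Hypothetical readings for a 100 cm^3 core (20 cm^2 cross-section, 5 cm long).
print(ks_constant_head(volume_cm3=100, length_cm=5, area_cm2=20,
                       head_cm=10, time_h=1))  # 2.5
```

The constant-head form suits more permeable (sandy) samples, whereas the falling-head form is usually preferred when outflow is too slow to measure reliably.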

Prediction models
In this research, GA-SVR and GA-BPNN models were used. When establishing PTFs for Ks calculation, the objective function involved minimization of the mean square error between the measured value and the predicted value. The kernel function is one of the key functions in the SVR algorithm; it maps the data into a feature space in which a regression hyperplane can be fitted (Akinpelu et al., 2020). The radial basis function is the most commonly used kernel function. It was selected as the kernel function because it requires fewer hyperparameters to be defined and has performed well in practical applications (Tan et al., 2015).

FIGURE 2 Schematic diagram of the main soil profiles of the check dams. The numbers 1-6, 7-13, and 14-15 denote the check dams of the Xinshui River, Zhujiachuan, and Kuye River watersheds, respectively

FIGURE 3 Distribution of the soils investigated in this study (n = 586) based on USDA textural classes

The cycle of GA fitness scaling, crossover, mutation, and selection processes was allowed to continue until the mean square error was minimized. Through these processes, the parameters of SVR (ε, C, and σ) were determined (Supplemental Table S1). Similarly, the establishment of the GA-BPNN model entailed optimization of the structural parameters of the BPNN with the GA fitness scaling, crossover, mutation, and selection processes, followed by optimization of the initial weights of the BPNN and the resulting output. A flow chart of the GA-SVR and GA-BPNN models for estimation is shown in Figure 4. In this study, the architectures of the GA-SVR and GA-BPNN models were implemented in Python (version 3.7).
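The GA cycle of selection, crossover, and mutation can be sketched with the Python standard library alone. This is a minimal, generic real-coded GA and not the implementation used in the study: the operators (tournament selection, arithmetic crossover, Gaussian mutation, elitism), the search bounds, and all names are assumptions. In particular, `stand_in_mse` is a hypothetical placeholder for the real fitness, which would be the mean square error of an SVR trained with the candidate (ε, C, σ).

```python
import random


def genetic_minimize(objective, bounds, pop_size=30, generations=40,
                     crossover_rate=0.8, mutation_rate=0.2, seed=0):
    """Minimize `objective` over the box `bounds` with a simple real-coded GA."""
    rng = random.Random(seed)

    def tournament(scored):
        # Selection: the better of two randomly drawn individuals wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best_score, best = float("inf"), None
    for _ in range(generations):
        scored = [(objective(ind), ind) for ind in population]
        gen_score, gen_best = min(scored, key=lambda s: s[0])
        if gen_score < best_score:
            best_score, best = gen_score, list(gen_best)
        next_pop = [list(best)]  # elitism: keep the best solution found so far
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                # Arithmetic crossover: blend the two parents.
                w = rng.random()
                child = [w * x + (1 - w) * y for x, y in zip(p1, p2)]
            else:
                child = list(p1)
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    # Gaussian mutation, clipped back into the search box.
                    child[i] += rng.gauss(0, 0.1 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i]))
            next_pop.append(child)
        population = next_pop
    return best, best_score


def stand_in_mse(params):
    """Hypothetical smooth surrogate for the SVR mean square error."""
    epsilon, c, sigma = params
    return (epsilon - 0.1) ** 2 + (c - 10.0) ** 2 / 100 + (sigma - 0.5) ** 2


# Assumed search ranges for (epsilon, C, sigma).
bounds = [(0.001, 1.0), (0.1, 100.0), (0.01, 5.0)]
best_params, best_mse = genetic_minimize(stand_in_mse, bounds)
print(best_params, best_mse)
```

Replacing `stand_in_mse` with a function that trains an SVR and returns its validation error would turn this sketch into a GA-SVR tuner; elitism guarantees that the returned error never exceeds the best error of the initial population.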

Statistical analysis
Pearson correlation analysis was used to determine the degree of correlation between Ks and the related basic soil characteristics and was performed using SPSS (version 17.0). The partial least squares regression (PLSR) method can be applied to eliminate collinearity among the independent variables when these variables are strongly intercorrelated (Subedi & Fox, 2016).
The Ks of all check dam profiles in the three watersheds was selected as the response variable in the PLSR analysis, and the soil depth and measured basic soil physicochemical characteristics were adopted as independent variables. The PLSR-related operations and analysis were carried out in SIMCA-P (Umetrics AB) data analysis software. Variable importance for projection was considered to express the predictive importance of the independent and dependent variables of the PLSR model, and regression coefficients were calculated to reflect the direction and intensity of the influence of each variable of the PLSR model. One-way ANOVA of PTFs was performed using SPSS (version 17.0).

FIGURE 4 Flow chart of the applied models for estimating the soil saturated hydraulic conductivity (Ks). BD, bulk density; BPNN, backpropagation neural network; GA, genetic algorithm; SOM, soil organic matter; SVR, support vector regression

All input data were normalized using a function of the minimum and maximum observed values (Besalatpour et al., 2013; Ye et al., 2018), in which x_i denotes the normalized data, and M(x_min) and M(x_max) are the minimum and maximum observed values, respectively. The data of the main profile (209 samples) were selected to establish the PTFs for Ks estimation, with 80% of the data randomly divided into a training dataset (167 samples) and the remaining 20% assigned to a test dataset (42 samples).
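The normalization and the random 80/20 split can be illustrated as follows. The linear min-max scaling to [0, 1] is an assumed form consistent with the symbols defined above; the exact function used in the study may include additional scaling constants. The function names, the seed, and the integer sample identifiers are hypothetical.

```python
import random


def min_max_normalize(values):
    """Scale values linearly to [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(samples, train_fraction=0.8, seed=42):
    """Shuffle a copy of the samples and split it into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# 209 main-profile samples, represented here by placeholder identifiers.
sample_ids = list(range(209))
train, test = train_test_split(sample_ids)
print(len(train), len(test))  # 167 42
```

In practice the minimum and maximum would be taken from the training data and reused for the test data, so that no information from the test set enters model fitting.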

Evaluation of the model performance
To evaluate and compare the adopted approaches, we calculated three statistical performance metrics: the geometric mean error ratio (GMER), geometrical standard deviation error ratio (GSDER), and RMSE. The GMER indicates the average factor by which the model over- or underestimates the measured values and equals 1 in the best case. A GSDER of 1 corresponds to perfect agreement, and the GSDER increases with deviation from the measured data. The RMSE should be as low as possible and close to 0. The best model will, therefore, yield a GMER near 1 and low GSDER and RMSE values.
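The three metrics can be sketched in Python as follows. The forms used are those common in the PTF literature (ratio of estimated to measured values, with logarithmic averaging for GMER and GSDER); treating them as the forms used in this study is an assumption, and the function names and sample values are illustrative only.

```python
import math


def _log_ratios(measured, predicted):
    """Natural logs of the error ratios P(x_i) / M(x_i); values must be > 0."""
    return [math.log(p / m) for m, p in zip(measured, predicted)]


def gmer(measured, predicted):
    """Geometric mean error ratio: exp(mean(ln(P/M))). 1 means no bias."""
    logs = _log_ratios(measured, predicted)
    return math.exp(sum(logs) / len(logs))


def gsder(measured, predicted):
    """Geometric SD of the error ratio: exp(sample SD of ln(P/M)). 1 is ideal."""
    logs = _log_ratios(measured, predicted)
    mean_log = sum(logs) / len(logs)
    variance = sum((x - mean_log) ** 2 for x in logs) / (len(logs) - 1)
    return math.exp(math.sqrt(variance))


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


# Made-up values: one overestimate, one exact match, one underestimate.
measured = [1.0, 2.0, 4.0]
predicted = [2.0, 2.0, 2.0]
print(gmer(measured, predicted), gsder(measured, predicted),
      rmse(measured, predicted))
```

In this toy case the over- and underestimates cancel on the log scale, so the GMER is 1 even though the GSDER (2) and RMSE show a poor fit, which is why the metrics are read together.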
In these metrics, M(x_i) and P(x_i) are the measured (true) values and model-estimated values, respectively, and n is the number of data points. Multiple linear regression is a traditional method for developing PTFs for soil parameters (Santra & Das, 2008; Y. Wang et al., 2012). Therefore, based on the available soil data in this study, four widely applied PTFs were selected to assess their performance in Ks estimation and were compared with the measured Ks values of check dams in the study area. Three statistical parameters (GMER, GSDER, and RMSE) were considered as evaluation criteria.

Note (Table 1). BD, bulk density; CV, coefficient of variation; Ks, soil saturated hydraulic conductivity; SOM, soil organic matter.

... another part of the catchment. This soil profile continuously changes because of high erosion and continuous sedimentation, and therefore soil properties might not be related to soil depth.

Contributions of the influencing factors of Ks
The correlations between Ks and depth, sand, silt, and clay were significant at the .01 level, and the correlation with BD was significant at the .05 level. However, Ks was not closely related to SOM. The Ks value was significantly positively correlated with sand and negatively correlated with silt and clay.
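For readers who want to reproduce this kind of screening outside SPSS, a Pearson correlation coefficient can be computed as in the short sketch below. The function name and the sand and Ks values are invented for illustration and are not data from this study; the significance test reported above is not included.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sand contents (%) and Ks values (cm/h) for five layers.
sand = [12.0, 25.0, 40.0, 55.0, 70.0]
ks = [0.10, 0.35, 0.80, 1.60, 2.90]
print(round(pearson_r(sand, ks), 3))
```

A coefficient near +1 or -1 indicates a strong linear association; because Ks often spans orders of magnitude, correlating against log-transformed Ks is a common alternative.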
A preliminary correlation analysis revealed significant correlations among the independent variables at p < .05 or p < .01 (Table 2). Therefore, PLSR analysis was used to eliminate the influence of collinearity among the independent variables. In the PLSR model for Ks estimation (Figure 6), the variables with variable importance for projection >1 were, in descending order, soil depth, clay, sand, and silt, and their regression coefficients were 0.16, −0.08, 0.07, and −0.06, respectively.

FIGURE 6 Variable importance for projection and regression coefficients between the soil parameters and the soil saturated hydraulic conductivity. Error bars denote SD

Pedotransfer functions for Ks
In this study, soil depth was considered a permanent input variable in each combination because the regression coefficient between the soil depth and the Ks value in the model was the highest at 0.159 (Figure 6). The different input-output combinations formulated for the establishment of the GA-SVR and GA-BPNN models were based on the arrangement and combination of the soil depth and other soil physical characteristics as inputs and Ks as the permanent output (Table 3). The PTFs for the 16 combinations established based on the GA-SVR model were not significantly different from each other, except for significant differences between Combinations 6 and 7 and Combination 9 (Figure 7). Compared with those of the GA-SVR model, the PTFs developed by the GA-BPNN using these soil characteristics had a larger range of fluctuations in the predicted data. There was also no significant difference between the mean values of the combinations and the measured data, except for Combination 16, which significantly overestimated the measured values (GMER = 1.204). Although Combinations 14, 15, and 16 considered particle composition, BD, and SOM, their accuracy did not improve significantly. Therefore, when establishing the PTFs of Ks, only a few easy-to-obtain soil characteristics, such as the depth+sand+silt combination, need to be considered, because they achieve similar accuracy.

TABLE 3 Performance of pedotransfer functions with different input variables for the genetic algorithm-support vector regression (GA-SVR) and genetic algorithm-backpropagation neural network (GA-BPNN) models

FIGURE 7 Estimated soil saturated hydraulic conductivity with the genetic algorithm-backpropagation neural network and genetic algorithm-support vector regression models. The number 0 represents the normalized measured data; 1, 2, 3, ..., 16 represent the 16 combinations of pedotransfer functions; the performance was computed from the test dataset
When the input was optimal, the GA-SVR model yielded a GMER (0.990) near 1 and a GSDER (1.394) smaller than that of the GA-BPNN model, indicating that the GA-SVR model was more applicable. The GMER values of the GA-SVR model were relatively good under the different input combinations (between 0.942 and 1.101). A comprehensive comparison of the GMER and GSDER indicators revealed that BD was the most important input variable in the GA-BPNN model.

Differences in the soil properties of check dams
In small dam-controlled watersheds on the Loess Plateau, the soil originating from slopes, valleys, and inter-ditch areas is eroded by rainfall or floods, transported, and finally deposited in check dams (Zema et al., 2018). In this study, the Ks value of the check dams in the Kuye River watershed was the highest at different depths, followed by the Ks values of check dams in the Xinshui River and Zhujiachuan watersheds. These results were attributed to the fact that the soil particles of check dams in the Kuye River watershed mainly consist of sand and the soil pores were large, which increased the number of soil water movement channels and enhanced Ks (Yao et al., 2015). In our study, the variability in the Ks and SOM of check dams in the three watersheds was moderate to high, and the CV was >10%. Except for the high variability of the silt and clay in check dams in the Kuye River watershed, the soil particle size distribution showed relatively low variability, especially for the silt fraction in the Xinshui River check dams (CV, 8.7%). Moreover, higher variability was observed for SOM, presumably due to the variability in land use types and management practices (C. Zhang et al., 2013). In contrast, BD was relatively stable. The Ks values of check dams in the three watersheds also showed moderate to high variability, which may be attributed to the combined influence of soil properties, natural factors, and human activities (Hu et al., 2013; Wei et al., 2016).
Human activities, climate change, and other factors influence sediment transport to check dams, thus affecting the change in Ks. The Ks value of check dams in the Xinshui River watershed was low and relatively stable (0.02-1.87 cm h⁻¹) between 0 and 60 cm, whereas the BD values of these soil layers were high and varied greatly (1.24-1.93 g cm⁻³). The BD values of the check dams in the Zhujiachuan watershed decreased within the 0-to-60-cm depth, which is consistent with the research results of Liu et al. (2017) obtained on the Loess Plateau. However, the BD values of check dams in the Xinshui River and Kuye River watersheds at depths of 140-200 cm were similar to those at depths of 0-20 cm. Only at the 20-to-40-cm and the 40-to-60-cm depths was the BD higher, but this might be because there was a different soil material in these layers. The SOM contents in check dams of these two watersheds also increased with increasing soil depth. This increase occurred because soil erosion caused SOM instability on the surfaces of the check dams (C. Wang et al., 2012). Most of the SOM was transported and deposited into the dammed areas during sediment movement (McCorkle et al., 2016). The results were consistent with those reported in previous studies on check dams in a loess hilly gully catchment (Liu et al., 2018).
In our results, the Zhujiachuan and Kuye River watersheds had high sand contents, whereas the Xinshui River watershed had high silt contents (Table 1; Figure 3). This finding is closely related to their soil types. According to USDA soil taxonomy, the soil type in the Zhujiachuan and Kuye River watersheds, which are located on the northern Loess Plateau, is sandy loam soil, and the soil type in the Xinshui River watershed is silt loam soil. Thus, soil type is the main reason for the difference in soil particle composition among the three watersheds.

Effects of the soil types and models on PTF selection
The physical, biological, and hydrological properties of soil are intrinsically connected (Yang et al., 2021; C. Zhao et al., 2013). Soil saturated hydraulic conductivity exhibited significant correlations with the sand, clay, and silt contents and with BD, which indicated the feasibility of applying these basic soil characteristics to estimate Ks (Qiao et al., 2018b; Santra & Das, 2008). Soil studies have demonstrated a correlation between Ks and SOM, which has been the basis for the establishment of PTFs for Ks determination (Saxton et al., 1986; Vereecken et al., 1990; Wösten et al., 2001), but SOM was not the main factor influencing Ks in this study. This finding may be attributed to the relatively high variability in SOM with depth in our research, which was greatly affected by soil formation (Qiao et al., 2018b). The soil formation process in check dams is affected by rainfall and erosion. Owing to the high erosion rates and sediment yields, the amounts of sediment mobilized by rainfall events and deposited behind check dams are substantial and produce sediment layers of a considerable thickness (i.e., several centimeters or more) (Y. Wang et al., 2017). Therefore, the soil formation process in check dams differs from that of natural soil, resulting in great variation in SOM. Soil depth is an important variable for establishing PTFs, especially for the special sedimentary deposits in check dams (Szabo et al., 2021). In this study, soil depth was selected as a fixed input variable of the PTFs for Ks.

TABLE 4 Selection of the widely applied pedotransfer functions (PTFs) for the prediction of saturated hydraulic conductivity

Abbreviations: BD, bulk density; GMER, geometric mean error ratio; GSDER, geometrical standard deviation error ratio; SOM, soil organic matter (see Table 4 for the PTFs).
There was no significant difference in the prediction accuracy of the 16 different combinations of PTFs for Ks, demonstrating that Ks could be predicted under limited conditions using a small number of variables that were relatively easy to obtain. Even though soil science and many related fields have used intelligent regression models, the GA has rarely been adopted to predict Ks. In this study, GA-SVR and GA-BPNN models were adopted to predict Ks, and the results obtained were better than those (RMSE = 1.41) of Y. Li et al. (2007). On the Loess Plateau of China, C. Zhao et al. (2016) established PTFs for Ks determination via multiple linear regression and artificial neural network methods, and Qiao et al. (2018a) developed the first PTF for BD estimation in deep layers (50-200 m). Neither study focused on the variation in Ks or on the development of PTFs for check dams. In contrast, PTFs for check dams in three typical, small watersheds on the Loess Plateau were the subject of our research, and the results were promising, laying a theoretical framework for future sustainable ecosystem development and research.

Evaluation of the performance of published PTFs
In the past few decades, researchers have developed and tested various types of models to predict soil properties (Besalatpour et al., 2013; Cosby et al., 1984). Much attention has also been paid to PTF comparisons (Wagner et al., 2001; X. Zhang et al., 2019). However, a PTF is a tool suitable for a particular area, and the method that best suits the studied soil should be chosen (Picciafuoco et al., 2019). Figure 8 shows the measured Ks values and the Ks values estimated with the published PTFs (Table 4). The estimation capability of the four PTFs was generally low, and the PTF of Cosby et al. (1984) was inferior to the other three PTFs (GMER = 3.10), which further verified that the PTFs exhibited regional differences. These four PTFs either overestimated or underestimated the Ks values of this study. The existing PTFs for Ks estimation in the literature are not always applicable to other regions at an acceptable level of accuracy, and the published PTFs tested in this study support this point. The ability of PTFs to accurately predict soil properties often greatly depends on the underlying data (X. Zhang et al., 2019). Our research is based on a dataset that covers the basic soil characteristics and Ks values of representative check dams on the Loess Plateau. The soil conditions and climatic environment in this area remain consistent or vary only within a narrow range. Therefore, similar to many PTFs developed for specific regions, the PTFs for Ks estimation developed in this study should be applied to other areas with caution and only after validation.

CONCLUSIONS
In this study, the Ks values and basic soil physicochemical characteristics (0-200 cm depth) of check dams in the Xinshui River, Zhujiachuan, and Kuye River watersheds were significantly different. Pearson's correlation coefficient analysis revealed that Ks exhibited significant positive correlations with soil depth and sand and significant negative correlations with BD, silt, and clay, whereas Ks was not closely related to SOM. Combinations of these characteristics could estimate Ks well and could be used to establish PTFs for check dams in the three watersheds on the Loess Plateau. The performance of PTFs generated with the GA-SVR model was better than that of PTFs generated with the GA-BPNN model. The basic soil characteristics, such as soil particle sizes, were the most important input variables for Ks prediction. Thus, we developed the first PTFs based on GA and machine learning techniques for estimating Ks in check dams on the Loess Plateau. This development could have practical value for further research on sustainable land restoration under global environmental change, on the Loess Plateau and potentially in other regions.

ACKNOWLEDGMENTS
This study was supported by the Chinese Academy of Sciences (CAS) Light of West China Program.

CONFLICT OF INTEREST
The authors declare that they have no conflicts of interest.