Experimental and statistical study of saturated hydraulic conductivity and relations with other soil properties of a desert soil

Agricultural expansion in the Kharga Oasis, in the western desert of Egypt, depends strongly on irrigation. Soil hydraulic conductivity is therefore a key property for reclaiming desert land and planning irrigation schemes. Soil samples collected at 10‐m intervals in a 120 m by 120 m plot were analysed for hydraulic conductivity together with 12 other basic physical and chemical soil properties. The resulting data were analysed statistically using Pearson correlation, principal component analysis and linear regression. The hydraulic conductivity values varied over four orders of magnitude and, because of the saline–sodic nature of the soil, were about two orders of magnitude smaller than what is generally reported in the literature for similar soil textures. Results showed that the hydraulic conductivity was correlated significantly with soil variables that relate to soil structure, such as wilting point, field capacity and SAR, and less to variables that relate to soil texture, such as silt and clay fractions. Pedotransfer functions for hydraulic conductivity were derived by stepwise multiple linear regression and fitted by residual maximum likelihood. The first model was based directly on soil properties, whereas the second model was based on principal components. Both models showed that part of the variation in hydraulic conductivity encountered in the field could be explained, but with large uncertainty probably resulting from sampling and measurement errors, randomness and soil heterogeneity, or other soil properties that were not observed. The most significant predictors for the first model were wilting point, SAR and silt fraction. The second model used the first two principal components of the soil variables as predictors, the first one related to soil structure and the second to soil texture.


Introduction
Saturated hydraulic conductivity (K) determines the capacity of a soil to transmit water and as such is an essential soil physical property that affects all soil-plant-water relations and processes. However, it is also one of the most variable soil properties because it is associated with soil texture and structure, but is also affected first few principal components from PCA usually contain most of the variance in a dataset with interrelated variables. Consequently, it is used to reduce the dimensionality of the data and to determine the strength of the relations among these variables (Jolliffe, 2002). For instance, Duffera et al. (2007) used PCA to identify relations between K, bulk density and porosity, Rogiers et al. (2012) used it for estimating K from particle-size data and Wang et al. (2013) used it to identify soil properties affecting K in loessial soils of China.
Pedotransfer functions relate simple soil properties to complex soil properties such as K (Wösten et al. 2001). Several pedotransfer functions have been developed to predict K from other soil properties (Sobieraj et al., 2001). A common approach is to use multiple linear regression analysis to relate K to soil properties such as texture classes, bulk density, field capacity and wilting point. For instance, Jabro (1992) used stepwise regression to predict K from silt and clay fractions and bulk density, Singh et al. (1992) to predict K from bulk density, organic carbon and the sand fraction, Li et al. (2007) to predict K from bulk density, soil organic carbon, and silt, clay and sand fractions and Wang et al. (2012) to predict K from bulk density, soil organic carbon and soil texture. Comprehensive pedotransfer functions for K have been presented by Rawls et al. (1982) based on 5350 datasets collected in the USA, Wösten et al. (1999) with 5521 sets of soil data from Europe and Schaap et al. (2001) based on 1306 soil samples from North America and Europe.
The purpose of this research was to investigate the hydraulic conductivity of a desert soil in the Kharga Oasis in Egypt, where crop production on reclaimed soil depends strongly on irrigation. The objective was to determine K and its relations with other soil properties. We applied correlation and PCA to quantify the relations among the different soil properties and to identify those that relate to K. We derived pedotransfer functions to predict K by multiple linear regression analysis with related soil properties or their principal components, and we cross-validated the prediction results.

Study area
This study was carried out in the Kharga Oasis in the western desert of Egypt (Figure 1). The oasis is a wind-ablated depression about 18-km long and 15-30-km wide carved into Cretaceous-Tertiary marine strata that form the Libyan Plateau (Elgabaly & Khadr, 1962). The oasis was formed by structural differences and subsequent exposure to wind, and dry and wet seasons. The present prevailing climatic conditions are arid, with strong winds and sand storms in spring and temperatures up to 40 ∘ C in summer; the mean annual rainfall is < 1 mm, but occasionally torrential storms occur (Lamoreaux et al., 1985).
The soil of the Kharga Oasis relates primarily to the different parent materials derived from pre-existing sediments of fluvial origin mixed with windborne material (Elgabaly & Khadr, 1962). The soil in the depressions is classified as Chromic Haplotorrert or Typic Haplotorrert with moderate capability to produce common cultivated crops, whereas the soil on the depression margins is classified as Typic Torripsamment with very severe limitations for crop cultivation (USDA, 1999;Gad, 2015).

Data collection and laboratory analyses
The present study concentrates on a small plot of bare soil near to El-Monera Village in the north of the Kharga Oasis ( Figure 1). Samples were taken at 10-m intervals in a 120 m × 120 m plot to give 144 sampling locations in total. At each site the topsoil was removed and three samples were taken at a depth of 5-10 cm. Disturbed samples of about 1 kg were taken for particle-size and chemical analyses. Two undisturbed cylindrical soil cores were also taken; one sample of 5 cm in diameter and 2.5 cm in length for determining soil water retention characteristics and dry bulk density, and another sample of 5 cm in diameter and 5 cm in length to determine K.
The disturbed soil samples were air-dried and passed through a 2-mm mesh sieve. Part of the sample was used for particle-size analysis. Hydrogen peroxide and hydrogen chloride were added to remove organic matter and carbonates, and silt and clay fractions were determined by the pipette method. Subsequently, soil textural classes were determined according to the USDA soil classification system (USDA, 1987). The remaining part of the soil was used to determine chemical properties. Soil organic carbon content (expressed as CaCO 3 ) was determined by the modified Walkley-Black method using a Collins calcimeter. Sodium adsorption ratio (SAR) was determined with saturated soil paste extracts and a Perkin-Elmer flame photometer (Shelton, CT, USA) to determine Na + , whereas Ca 2+ and Mg 2+ were determined by titration. Soil pH was determined in a 1:2.5 soil:water suspension with a pH electrode. Soil electrical conductivity of saturation soil extracts was determined with a conductivity cell.
The undisturbed soil samples of 2.5-cm length were used to determine soil water characteristics with a pressure cell apparatus. Water content at saturation was determined after saturating the soil cores, and field capacity and wilting point were determined after subjecting the samples to suctions of 33 and 1500 kPa, respectively. Afterwards, the samples were completely air dried to determine the dry soil bulk density as the ratio of dry mass and bulk volume. The other series of undisturbed soil samples of 5-cm length was used for determining K, whereby samples were saturated, mounted in a constant head permeameter and subjected to 5 cm of water head. The outflow was collected and K values were derived using Darcy's law.

Data analyses
Classification of the soil samples with the USDA soil classification system indicated 101 samples of sandy loam, 19 of sandy clay loam, 15 of loamy sand, five of loam and four of sand. Figure 2 shows the soil samples plotted in a texture triangle, indicating that the samples are scattered from sand to loam, but are predominantly sandy loam soil. Basic summary statistics of the soil variables are presented in Table 1: minimum, mean, maximum, standard deviation, skewness and Kolmogorov-Smirnov test to verify whether the observations are normally distributed. The skewness for a normal distribution should be zero, but a value between minus and plus one is deemed acceptable in statistical analyses, whereas for the Kolmogorov-Smirnov test with 144 samples the value should be less than 0.113 at the 0.05 probability level. Soil organic matter content and K only were not normally distributed. We disregarded the non-normality of soil organic matter content because it was very small and unimportant in this study. Hydraulic conductivity values  were transformed to common logarithms (log 10 ), and the two impervious samples were excluded from further analysis ( Figure 3). The strength of interrelations between the observed soil variables was examined by a Pearson correlation matrix (Table 2), and principal component analysis (PCA) was carried out to identify the most important variables and interrelations. These analyses are only exploratory because the variables may be spatially correlated and hence not independent. Principal component analysis transforms the observed variables linearly into orthogonal uncorrelated variables known as principal components (PCs), which maintain the total variance in the original data. The PCA was performed on the correlation matrix, which in effect standardizes data measured on different scales to unit variance. Therefore, the PCs become independent of the scale and units of the observed variables. The output from PCA comprises the eigenvalues, eigenvectors and PC scores. The eigenvalues give the variance accounted for by each component and the PCs are ranked according to this (Table 3). The first PC accounts for most of the variation, and subsequent components are orthogonal to one another and uncorrelated, with reducing variance accounted for. Principal components with eigenvalues < 1 can be disregarded because they account for less information than the original variable; this in effect reduces the dimensionality of the transformed data. The eigenvectors contain the cosines of the angles between the original axes and the new principal component axes. The absolute magnitude of the eigenvectors can help to provide an interpretation of the PCs. We used Varimax rotation, an orthogonal linear transformation that maintains the total variance in the data, but aligns PCs along directions that emphasize the relation between the PCs and the observed data. We did this to improve the interpretation of our results. We converted the eigenvectors to the product-moment correlation coefficients between the principal components and original data to aid the interpretation further (Table 4).
A pedotransfer function relating log 10 (K) to the other soil variables was derived by multiple linear regression analysis with stepwise variable selection. Both forward selection and backward elimination were used. Forward stepwise variable selection starts with no predictor variables in the model and at each step a predictor can be added or deleted if the t-statistic is significant or no longer significant, respectively, at a prespecified probability level (P = 0.15 in this study). The variable added is the predictor that has the smallest t-statistic. The procedure stops when all variables not in the model have insignificant Student's t-statistics (P > 0.15). Backward stepwise variable elimination starts with all candidate variables. At each step the predictor with the largest t-statistic is deleted provided it is insignificant (P > 0.15). The procedure stops when no more variables can be removed from the model. All models were fitted by residual maximum likelihood (REML) because the sample data came from a small, intensively sampled plot, which could result in spatial dependence between sites. In contrast to ordinary least squares regression, REML does not   assume that model residuals are spatially independent. Instead, REML fits a model by maximizing the likelihood conditioned on an appropriate form for the covariance matrix of the residuals (Lark & Cullis, 2004). The spatial variation of the model residuals was described by an isotropic spherical variogram with nugget and sill variances, and a range of spatial dependence, also estimated by REML. The goodness of fit of the models was assessed by the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the log-likelihood. For the REML procedure we used the nlme package in R (Pinheiro et al., 2017). Linear regression with stepwise variable selection does not necessarily produce the best model if there are many redundant predictors, which was the case in this study because the variables related to soil structure in particular were strongly inter-correlated. In this case, it might be better to do the multiple linear regression on PCs because these are uncorrelated. For this we needed to redo the PCA with all the soil variables except for log 10 (K). A pedotransfer function relating log 10 (K) to PCs was derived by the same procedure as explained above.
A cross-validation was carried out to assess the robustness of the pedotransfer functions. A five-fold cross-validation was used, whereby the observations were randomly partitioned into five (more or less) equal-sized subsamples. The cross-validation procedure selected each subsample in turn as validation data and the remaining subsamples as training data. Pedotransfer functions were derived with the training data (including PCA in the case of pedotransfer functions based on PCs) and used to estimate log 10 (K) with the validation data. The robustness of the pedotransfer functions was verified by comparing the predictions with the original model predictions. This procedure has the advantage that it uses all observations for validation.

Results and discussion
Descriptive statistics Table 1 shows that the average silt and clay fractions are 13 and 15%, respectively, indicating that the soil is predominantly sandy (i.e. coarse textured). The organic matter content is very small (0-0.5%), which is characteristic for a bare desert soil. The carbonate content (4-23%) is large, which is typical for a dry and warm climate. The SAR varies considerably (8-71), with a mean of 35, indicating that the soil is predominantly sodic. This is important because high sodicity causes clay to swell when wetted, causing aggregates to break down and soil structure to collapse, resulting in poor physical properties, in particular reduced hydraulic conductivity from the clogging of pores (e.g. Rhoades et al., 1992;Bagarello et al., 2006). The pH values indicate that 99 samples (69%) are very strongly alkaline (pH > 9), 42 (29%) are strongly alkaline (8.5 < pH < 9) and 3 (2%) are moderately alkaline (7.9 < pH < 8.5), indicating the dominating presence of sodium carbonate. This is confirmed by the electrical conductivity values, which range from 7 to 133 S m −1 with a mean of 59 S m −1 because the soil is predominantly saline-sodic. Saturated water content varies between 0.34 and 0.64 m 3 m −3 , which is in the range for these soil types. Field capacity is between 0.09 and 0.44 m 3 m −3 , which is also typical, although the maximum value is somewhat large. Wilting point varies between 0.06 and 0.37 m 3 m −3 , which is rather large and clearly relates to the high salinity (greater retention of water). Dry bulk density varies from 1138 to 1728 kg m −3 with an average value of about 1400 kg m −3 , which is typical for a coarse-textured soil.
Hydraulic conductivity values vary from around zero to 2.4 m day −1 with a mean of 0.19 m day −1 . Two soil samples (sandy loam) were almost impervious because no water flow could be detected during the permeameter experiments, which was attributed to cementation by calcium carbonate (caliche) or clogging by high sodicity, or both. The K values of the other samples were smaller than what is typically expected for these soil textures. Values reported in the literature for these soil texture classes range from about 0.1 to 10 m day −1 ; for example, Rawls et al. (1998), Saxton & Rawls (2006), USDA Natural Resources Conservation Service (www.nrcs.usda.gov), FAO (www.fao.org) and US Salinity Laboratory Rosetta model (www.ars.usda.gov).
The K values of our data are about one to two orders of magnitude smaller, which could result from cementation by carbonates, but more likely from clogging of pores because of high sodicity.
The standard deviation (Table 1) for most soil properties is large, indicating moderate to strong variability, which is not uncommon for soil (e.g. Mulla & McBratney, 2002). Soil bulk density, pH and saturated water content are the least variable, whereas field capacity, wilting point and carbonate content are moderately variable. The variability of silt and clay content is large and for the SAR and electrical conductivity very large. The very large variability of K is common for soil (e.g. Mulla & McBratney, 2002;Deb & Shukla, 2012;Wang et al., 2013); Figure 4 shows the cumulative frequency distribution of the observed log 10 (K) values for the different soil texture classes. For sand and loam soil types there were not enough observations for the cumulative frequency distributions to be meaningful, but for loamy sand, sandy loam and sandy clay loam the distributions appear to be normally distributed. Although the median values deviate by about a factor of 1, there is considerable overlap because the range for each texture class is about three to four, which means three to four orders of magnitude for the corresponding K values. The most comprehensive set of K values reported in the literature are those for the Rosetta model (Schaap et al., 2001;www.ars.usda.gov), which gives average log 10 (K) values from −1 to 0 and ranges of 2 to 3 for the soil textures identified in this study. The log 10 (K) values observed in this study are considerably smaller and more widely distributed than predicted by the Rosetta model. Table 2 shows that log 10 (K) is moderately correlated with saturated water content, field capacity and wilting point and weakly correlated with silt, carbon content, SAR and bulk density. The negative correlation of log 10 (K) with wilting point, field capacity and saturated water content indicates dependence of K on the presence of micropores, which is revealed by the large wilting point, in particular. It is likely, however, that clogging of pores from sodicity also plays a role. The latter is indicated by the negative correlation of K with SAR. The negative correlation of log 10 (K) with silt indicates the dependence of K on soil texture. The negative correlation of log 10 (K) with calcium content might indicate cementation of soil pores.

Correlation and principal component analysis
Results of the PCA (Table 3) show that the first four PCs have eigenvalues ≥ 1 and these account for 70% of the variation in the data. When the eigenvectors were plotted initially, they did not provide a clear interpretation of the axes and that was why we rotated the axes by Varimax. After rotation, we converted the eigenvectors to correlation coefficients between the soil variables and the first four PCs; they are given in Table 4. Sodium adsorption ratio (SAR), saturated water content, field capacity and wilting point are strongly positively correlated with PC1, and bulk density and log 10 (K) are strongly negatively correlated with it. This indicates that a large part of the variation and interrelation in the data relates to soil structure, especially the effect of sodicity. The silt fraction and carbonate content are strongly positively correlated with PC2, which indicates that there is a second, less important source of variation that relates to soil texture. There is a very weak correlation only between log 10 (K) and PC2, suggesting that K is only weakly related to soil texture. The third PC appears to reflect variation in the data related to redox conditions in the soil, which is unrelated to K and can be disregarded for this study. The fourth PC is correlated with the clay and soil organic matter contents, which, however, are small in this soil; therefore, this PC appears to be unrelated to K and can also be disregarded.
The first two principal components appear to be linked to only K and therefore require further consideration. Figure 5 shows the correlations between the soil properties and the first two PCs plotted in the plane of PCs 1 and 2 as vectors. The correlations for SAR, saturated water content (WS), field capacity (FC) and wilting point (WP) show a strong positive relation with PC1 and with each other, and bulk density has a strong negative relation with PC1. These soil properties are related to soil structure; therefore, PC1 can be considered to represent soil structure. Vectors for silt fraction and Figure 5 Correlations of the soil variables with the principal components after Varimax rotation plotted in the plane of PC1 and PC2: silt fraction (Silt), clay fraction (Clay), soil organic matter (SOM), carbonate content (CAR), sodium adsorption ratio (SAR), electrical conductivity (EC), saturated water content (WS), field capacity (FC), wilting point (WP), bulk density (BD) and (K) hydraulic conductivity; SAR, WS, FC, WP and BD align along PC1, which appears to relate to soil structure; Silt and CAR align along the PC2, which appears to relate to soil texture; log 10 (K) relates more strongly to PC1, suggesting that it depends more on soil structure than on soil texture.

Figure 6
Plot of log 10 (K) values and soil texture classes (USDA, 1987) in the plane of PC1 and PC2. The symbols represent the soil texture classes (S, sand; LS, loamy sand; SL, sandy loam; SCL, sandy clay loam; L, loam; USDA, 1987) and the colours represent various values of log 10 (K) / m day −1 ; log 10 (K) increases from right to left along PC1 and soil texture classes vary from sand to loam in the direction of increasing PC2. carbonate content, which relate to soil texture, align positively with PC2; therefore, it can be considered to represent soil texture. The vector for log 10 (K) is more closely aligned to PC1 than to PC2, indicating a closer relation to soil structure than to soil texture. Figure 6 shows a plot of the PC scores in the plane of the first two principal components. The scores represent the sampling sites, and the texture for each site is illustrated by a symbol. The symbols are overlain by values of observed log 10 (K). The soil texture classes vary from sand to loam in the direction of increasing PC2, whereas log 10 (K) increases mainly from right to left along PC1. There is no clear separation between the texture classes of the samples, whereas the log 10 (K) values occupy more clearly defined parts of the multivariate space.

Pedotransfer functions
Linear regression of log 10 (K) by stepwise forward selection or backward elimination of the other observed soil properties leads to the same model:  Tables 5 and 6. Table 5 gives the coefficients  of the predictors in the regression equation, the standard error of the coefficients and the corresponding Student's t-statistic and P level. Table 6 gives the estimated sill and nugget variances and the range of spatial dependence of the residuals, and the model fitting criteria AIC, BIC and log-likelihood. There are three predictor variables only in the pedotransfer function given by Equation (1): wilting point, SAR and silt fraction. The Student's t-statistic for the regression coefficient of WP is much larger than for SAR and Silt. Therefore, the most important predictor for log 10 (K) is wilting point. This result is in contrast with other pedotransfer functions for K cited in the literature, which are usually based predominantly on variables associated with soil texture (e.g. Rawls et al., 1982;Jabro, 1992;Wösten et al., 1999;Schaap et al., 2001). This shortcoming had already been criticized by Vereecken et al. (1989), for example, who recommended improvement of models with additional information such as water retention. Pachepsky et al. (2006) also recommended the need for predictors that relate to soil structure.   (1) plotted against observed values, together with the model's 95% confidence bands. For every predicted value of log 10 (K) the uncertainty is about ±1.5, or about three orders of magnitude for the corresponding K values. This large uncertainty can be attributed to sampling or experimental errors, soil heterogeneity or other soil properties that have not been observed. The results of the five-fold cross-validation, also shown in Figure 7(a), suggest that the pedotransfer function is robust because predictions are close to the original predictions.
Linear regression of log 10 (K) by stepwise forward selection or backward elimination of the PCs leads to: for which the first two PCs only were selected as significant predictors. Figure 7(b) shows the predicted log 10 (K) values plotted against observed values with 95% confidence intervals. Details of the model results are given in Tables 5 and 6. Notice that because the PCs have a zero mean and are uncorrelated, the intercept in Equation (2) equals the mean of the observed log 10 (K) values (Table 1) and the standard errors of the regression coefficients are equal. However, Student's t-statistic for the regression coefficient of PC1 is much larger than for PC2. Therefore, the most important predictor is PC1, which represents soil structure, whereas PC2 representing soil texture is less important. The prediction uncertainty revealed by the 95% confidence bands is very large and of the same magnitude as for Equation (1). However, the results of the five-fold cross-validation for Equation (2) appear to deviate more from the original model predictions than for Equation (1). This is expected because PCA was redone for each of the five cross-validation runs, which resulted in more scatter of the predictions. The estimated parameters for the spatial dependence of the residuals (Table 6) are almost the same for both models. The sill and nugget variances are slightly larger for the pedotransfer function based on PCA. The range is small (about 11 m), indicating that there is spatial autocorrelation between neighbouring sampling sites only. The model fitting criteria obtained by REML (Table 6) indicate that Equation (2) performs better than Equation (1). The value for the log-likelihood is about the same, indicating no essential difference in goodness of fit. However, the AIC and BIC criteria, which express a trade-off between the goodness of fit and the complexity of the model, are substantially smaller for Equation (2) than for Equation (1). The relative likelihood L derived from the AIC values is given by: which shows that the model given by Equation (2) is almost five times more probable than that given by Equation (1). Also, the difference in BIC values of 6.0 also provides strong evidence that Equation (2) is more probable than Equation (1).

Conclusion
The desert soil of the Kharga Oasis, Egypt, is particularly saline-sodic, which has been shown to have a marked effect on its properties. The saturated hydraulic conductivity is considerably smaller than that commonly expected from soil texture alone. If information from the literature was used to estimate K for this soil the results would be orders of magnitude too large, and any design for soil reclamation through irrigation schemes based on such predictions would lead to failure. Principal component analysis showed that the first PC mainly depended on field capacity, wilting point, bulk density and SAR, which relate to soil structure, and the second PC mainly depends on silt fraction and carbonate content, which relate to soil texture. Hydraulic conductivity related considerably more strongly to the variables related to soil structure.
Pedotransfer functions for log 10 (K) explained part of the observed variation in K, but with a large uncertainty, which can be attributed to sampling error, experimental error, soil heterogeneity or to other soil characteristics that were not measured. The models indicated that soil properties related to soil structure were more important than those related to soil texture for the prediction of K. The first model based directly on soil properties required three predictors only that can be obtained easily in practice (i.e. wilting point, SAR and silt fraction). The second model was based on the first two principal components only as predictors. This model was more likely than the first model, but was less practical because many soil properties are usually used for the analysis.