Exchangeable acidity and pedotransfer functions for the soils of Ghana

Soil exchangeable acidity (EA) is an indicator of aluminium toxicity potential in acidic soils. Predicting the distribution and dynamics of EA is needed for the identification and management of acidic soils. In this study, we used datasets of 355 pedons from across Ghana and the Cubist rule-based algorithm to generate pedotransfer functions (PTFs) of EA. Eight soil properties (pH, organic carbon, calcium, sodium, magnesium, total exchangeable bases, cation exchange capacity [CEC] to clay ratio and soil depth) were used to predict EA. We first used the whole dataset to construct generic PTFs and then stratified the dataset based on World Reference Base-Reference Soil Groups (WRB-RSGs) to generate soil-specific PTFs. Goodness-of-fit statistics comprising the root mean squared error (RMSE), Lin's concordance correlation coefficient (ρC) and coefficient of determination (R2) were used to evaluate the prediction accuracy and reliability of the developed PTF models on both calibration and validation datasets. The Fluvisols EA-PTF exhibited lower performance metrics in the validation (RMSE = 0.17 cmolc kg−1, ρC = 0.19, R2 = 0.24), whereas the EA-PTFs for all other WRB-RSGs and whole dataset had above-average performance metrics in the validation (0.05 ≤ RMSE ≤ 0.97 cmolc kg−1; 0.34 ≤ ρC ≤ 0.94; 0.52 ≤ R2 ≤ 0.96). Soil pH sufficed for predicting EA in soils with pH above 5.0, but in soils with a pH < 5.0, the levels of exchangeable bases (e.g., Na+, K+, Mg2+, Ca2+), CEC to clay ratio (CCR) and soil depth improved the prediction of EA. The developed EA-PTFs are useful for estimating the missing values of EA in soil databases.


| INTRODUCTION
Excessive leaching and intense weathering in the humid tropics result in acid soils with low inherent fertility. The greatest portion of the total soil acidity is contributed by H+ and Al3+, commonly referred to as exchangeable acidity (EA). The EA is a function of soil pH and the exchange capacity (Blume et al., 2016; Carter & Gregorich, 2007). Sparks (2002) defined the EA as the amount of the total cation exchange capacity (CEC) contributed by H+ and Al3+.
Soil pH is a rapid indicator of the active acidity (concentration of H+ ions) in soils. The pH does not provide quantitative information on the soil acidity (Blume et al., 2016), nor does it provide information on plant nutrient availability (Barrow & Hartemink, 2023; Hartemink & Barrow, 2023). In acid soils, the EA provides a measure of the concentration of both H+ and Al3+ ions in the soil. By considering both the H+ and Al3+ ions, the EA provides important quantitative information that is useful for assessing the potential for aluminium (Al) toxicity in acidic soils. For example, a single Al3+ ion has a strong tendency to hydrolyze and release up to three H+ ions to lower the pH of acid soils (Brady & Weil, 2017). Kamprath (1984) noted that the pool of exchangeable cations in acidic tropical soils consists primarily of Al3+ ions. For this reason, the EA is a more accurate indicator than soil pH for diagnosing the detrimental effects of soil acidity on nutrient availability and plant growth (Alley & Zelazny, 1987).
In acidic soils where much of the base cations (e.g., Na+, K+, Mg2+, Ca2+) have been replaced by acid cations (e.g., H+ and Al3+), the CEC determined in the laboratory will give values that are considerably higher than what can be observed in the field. This common phenomenon in acid soils is usually accounted for by summing the EA values together with the values of the base cations in what is referred to as the effective CEC (ECEC) (Carter & Gregorich, 2007; McKenzie et al., 2004). The relevance of soil pH for investigating the negative effects of soil acidity should be based on the relationship between pH and exchangeable Al3+ saturation of the ECEC (Kamprath, 1984). The soil pH-Al3+ relationship has been the subject of previous studies, which have reported that exchangeable Al3+ increases with decreasing pH, but that the concentration of Al3+ at the same pH value can vary significantly among soils (Kariuki et al., 2007; Pierre et al., 1932). Adams (1984) pointed out that base cations, particularly Ca2+, exhibit a decisive ameliorating effect on the dynamics of Al3+ ions depending on the pH and the prevailing soil conditions.
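As a rough illustration of the ECEC bookkeeping described above, the following sketch sums EA with the base cations and expresses exchangeable Al as a percentage of the ECEC. The function names and the horizon values are hypothetical and are not taken from the study data; all quantities are assumed to be in cmolc kg−1.

```python
def effective_cec(ea, ca, mg, k, na):
    """ECEC as exchangeable acidity plus the sum of the base cations."""
    return ea + ca + mg + k + na


def al_saturation(exch_al, ecec):
    """Exchangeable Al expressed as a percentage of the ECEC."""
    if ecec <= 0:
        raise ValueError("ECEC must be positive")
    return 100.0 * exch_al / ecec


# Hypothetical horizon (illustrative values only, in cmolc/kg).
ecec = effective_cec(ea=2.0, ca=1.5, mg=0.8, k=0.2, na=0.5)
al_sat = al_saturation(exch_al=1.5, ecec=ecec)
print(f"ECEC = {ecec:.2f} cmolc/kg, Al saturation = {al_sat:.1f}%")
```

Keeping exchangeable Al as a separate input reflects that EA includes both H+ and Al3+, so EA itself is not substituted for Al in the saturation calculation.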
The optimum pH range for neutralizing the detrimental effects of soil acidity is reached when the exchangeable Al3+ ions have been eliminated by liming (Alley & Zelazny, 1987). Thus, liming of acidic soils causes the pH to rise, whereas the EA becomes neutralized (Schroth et al., 2000). Given the significant role of acid cations in the dynamics of soil reaction, the EA can be used as a proxy in the determination of the amount of agricultural lime (CaCO3) that will be required to correct the pH of acidic soils (Bleam, 2016). In general, soil acidity can have severe impacts on crop production with primary effects on plant growth (Yerima et al., 2020). Jones (1984) pointed out that even when the topsoil has been limed, high acidity in the subsoil horizons that may not have been accounted for can affect root growth and make field crops vulnerable to drought-induced stress.
Soils in Ghana have acidity problems that may have resulted from natural and anthropogenic factors (Buri et al., 2005). High rainfall, geology, clay mineralogy, soil texture and buffering capacity are important factors that influence the development of soil acidity (Hartemink & Barrow, 2023). Certainly, the availability of definitive data on soil acidity for different soils is needed to provide a comprehensive understanding of the important factors involved in the development of acidic soils (Adams, 1984).
Long-term pedon data covering more than 90% of the soil resources of Ghana are available from decades of detailed reconnaissance soil surveys (Asiamah, 1987). This inventory of legacy soil information has supported major agricultural development and soil management needs of the country over the past decades (Owusu et al., 2020). Although soil databases provide valuable information for understanding soil properties at specific locations, the full potential of soil information is realized when predictive models are developed from soil databases to derive insightful inferences about soil properties and make generalizations across broader geographic areas (Minasny & Hartemink, 2011; Yerima et al., 2020).
Predictive models such as pedotransfer functions (Bouma, 1989) can be used to add value to basic soil information and estimate missing data in soil databases (McBratney et al., 2002; Minasny & Hartemink, 2011). Estimating the missing data in soil databases can eliminate the need to collect new soil data at the cost of time and resources for laboratory analysis. Much work has been done on developing pedotransfer functions (PTFs) for the estimation of soil physico-chemical properties (McBratney et al., 2002); however, information in the literature regarding soil exchangeable acidity PTFs (EA-PTFs) is scarce. Owing to the availability of country-wide datasets (Figure 1), we took the opportunity to develop predictive models of EA for Ghana.

Highlights
• Pedotransfer functions (PTFs) were developed for predicting soil exchangeable acidity (EA).
• Soil pH sufficed for predicting EA in soils with pH above 5.0.
• For pH < 5.0, the levels of exchangeable bases, CEC to clay ratio and soil depth improved EA prediction.
• The PTFs are useful for estimating the missing values of EA in soil databases.

The objectives of this study were to examine the distribution patterns of EA in the soils of Ghana and develop EA-PTFs from the soil datasets. The PTFs presented in this study are useful for predicting the missing values of EA from available physico-chemical properties in a soil database. Additionally, the developed EA-PTFs provide an essential context and valuable insights needed to understand the distribution and dynamics of EA in different soils.

| Study area
Ghana is a tropical country in West Africa situated on the central coast of the Gulf of Guinea. The country is divided into six agro-ecological zones (AEZs) based on climate at different locations. Climatic factors such as rainfall and temperature exert appreciable influence on soil formation in Ghana. Soils of the high rainforest AEZ experience intense weathering and excessive leaching that promote the formation of acid soils. In the interior savannah AEZ, high ambient temperatures coupled with low rainfall regimes result in the formation of groundwater laterites, plinthites and somewhat less-acid soils. The semi-deciduous forest AEZ is interspersed with less-acid soils and more-acid soils in a complex pattern. The coastal savannah AEZ has varied soil types ranging from heavy-textured to salt-affected soils depending on the location. Buri et al. (2005) provided preliminary evidence indicating that soil pH in many parts of Ghana has fallen below the values recorded in the same locations about three decades earlier. They further noted that soil acidity has become a major concern in Ghana. In this context, the present study used long-term pedon data to explore the distribution patterns of soil acidity in Ghana and the potential to predict EA in the soils.

| Soil classification
The World Reference Base (WRB) for soil resources (IUSS Working Group WRB, 2015) is the soil classification system widely adopted in Ghana, although many other international soil classification systems have been correlated with the soils of Ghana. Because the majority of pedons used in the study had been classified based on the WRB, we decided to use WRB Reference Soil Groups in our study. Therefore, all 355 pedons were grouped at the Reference Soil Group (RSG) level based on the 2014 version of the WRB (Table 1). In most cases, the original WRB-RSG that came with the pedon data was maintained when the pedon morphological data were found to be consistent with the 2014 version of the WRB; otherwise, the WRB-RSG was reassigned. Consistency in the pedon classification was necessary for grouping the EA dataset, to ensure that soils with similar characteristics were used to develop soil-specific PTFs.

| Harmonization of the pedon data
The pedon data used for developing the PTFs in this study were obtained from the soil data repository at CSIR-Soil Research Institute, Kwadaso-Kumasi, Ghana. All the pedons described before 1990 had location information in the form of text descriptions, which were georeferenced as outlined by Zhao et al. (2015). Of the 1582 georeferenced pedon descriptions, only 355 had recorded entries of EA and corresponding covariate data. These 355 georeferenced pedons were used in this study (Figure 1).
The full list of pedon physico-chemical and morphological datasets considered for use as covariates in the modelling is shown in Table S1. However, some of the covariate data were rejected by the stepwise covariate selection method, and only those datasets that showed correlation with the target soil property were subsequently used in the modelling (Table 2). EA (cmolc kg−1) of the soil samples was determined by titration of an unbuffered 1.0 M KCl extract, as described in detail by Cottenie (1980).

| Exploratory analysis of the pedon data
EA data for the study area are available from 1953 onwards, with a gap in data between 1953 and 1967. The highest intensity of the EA data corresponds to the 1980s (Figure 2a). The soil depth interval showed that about 80% (n = 284) of the 355 georeferenced pedons have an average depth of approximately 100 cm (Figure 2b). Data on pedons belonging to Acrisols, Ferralsols, Solonetz, Lixisols, Gleysols, Vertisols and Nitisols generally showed a soil depth of about 200 cm. However, the data on pedons belonging to Plinthosols, Leptosols, Planosols, Arenosols and Fluvisols generally showed a soil depth of about 150 cm (Figure S1).
TABLE 2 Correlation between the covariates and measured exchangeable acidity used for developing the pedotransfer functions (PTFs).

A soil profile depth of 100 cm or more is consistent with international standards for field survey and soil description (Jahn et al., 2006). Therefore, the EA data from the 355 pedons can be considered representative of the soil variability and diverse environmental conditions in the geomorphic areas where the soils were sampled (Table 1). The soil depth interval was calculated as the average value taken for the upper and lower horizon boundaries (Figure 2b). For instance, for a topsoil horizon comprising 0-20 cm depth interval, the average depth corresponds to (0 + 20)/2 = 10 cm. Similarly, for a subsoil horizon comprising 34-66 cm depth interval, the average depth is equivalent to (34 + 66)/2 = 50 cm. The soil depth intervals were used for the assessment of the depth distribution and dynamics of EA in the different WRB-RSGs (Figure S1). The pedon data do not include repeated sampling for any of the georeferenced sites. Therefore, it is not possible to deduce any temporal changes in EA from the 355 pedons used in this study (see Figure S2). Owusu et al. (2020) described the soil sampling scheme adopted for the pedon characterization and the limitations of such long-term datasets. In essence, these national soil survey datasets serve as a reservoir of soil information that basically provides a set of snapshot evidence about soil properties (Tomer et al., 2017). Nevertheless, the pedons were sampled in diverse geomorphic areas and land use types. This makes the available soil survey data a valuable resource for developing PTFs at the national scale (Chakraborty et al., 2020).
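The depth-interval averaging described in this section can be sketched in a few lines. The function name is hypothetical; the two horizons are the worked examples given in the text.

```python
def mean_horizon_depth(upper_cm, lower_cm):
    """Average depth of a horizon from its upper and lower boundaries (cm)."""
    if lower_cm < upper_cm:
        raise ValueError("lower boundary must not be shallower than upper boundary")
    return (upper_cm + lower_cm) / 2


# The two horizons used as examples in the text: 0-20 cm and 34-66 cm.
horizons = [(0, 20), (34, 66)]
depths = [mean_horizon_depth(upper, lower) for upper, lower in horizons]
print(depths)
```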

| Development of the PTF models with Cubist
The EA-PTFs were developed by implementing the rule-based machine learning algorithm Cubist (Quinlan, 1992). Before Cubist was selected, we performed an exploratory analysis of the data with multiple linear regression (MLR) and the random forest (RF) algorithm (not shown). In this initial analysis, Cubist achieved the lowest prediction errors and highest model performance among the three algorithms.
Cubist is a data partitioning algorithm that uses a hybrid approach by combining machine learning decision trees and linear regression analysis to model complex non-linear relationships between predictor variables (covariates) and the target soil property (Kuhn & Johnson, 2013). The Cubist algorithm works by recursively partitioning the original input data into internally homogeneous subsets of the target soil property (EA) and predictors. The decision trees create a series of interpretable rules to define the data partitions and then fit a linear regression model to each data partition. The final model produced by the Cubist algorithm is a combination of decision trees and linear regression models derived from different parts of the data. This results in a regression tree model with final nodes as multivariate linear models instead of discrete values. The rules extracted from the decision tree, along with the associated multivariate linear models, collectively form a Cubist model. This means that the EA-PTFs are Cubist models created by the Cubist algorithm.
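To make the structure of such a rule-plus-linear-model PTF concrete, the sketch below represents each rule as a condition paired with a linear model and averages the outputs of all rules that cover a sample. It is only an illustration of the model form, not the Cubist implementation or the fitted PTFs: the class and function names are hypothetical, the pH 5.3 split is borrowed from the generic EA-PTF reported in the results, and every intercept and coefficient is an invented placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Sample = Dict[str, float]


@dataclass
class Rule:
    """One rule: a condition on the covariates plus a linear model."""
    condition: Callable[[Sample], bool]
    intercept: float
    coefficients: Dict[str, float]

    def predict(self, sample: Sample) -> float:
        # Linear model evaluated on the covariates named in this rule.
        return self.intercept + sum(
            weight * sample[name] for name, weight in self.coefficients.items()
        )


def predict_ea(rules: List[Rule], sample: Sample) -> float:
    """Average the linear-model outputs of every rule that covers the sample."""
    matched = [rule.predict(sample) for rule in rules if rule.condition(sample)]
    if not matched:
        raise ValueError("no rule covers this sample")
    return sum(matched) / len(matched)


# Placeholder rules: the pH 5.3 split mirrors the generic EA-PTF, but every
# intercept and coefficient below is invented for illustration only.
rules = [
    Rule(lambda s: s["pH"] > 5.3, intercept=2.0, coefficients={"pH": -0.3}),
    Rule(lambda s: s["pH"] <= 5.3, intercept=9.0,
         coefficients={"pH": -1.6, "depth_cm": 0.005}),
]

ea_less_acid = predict_ea(rules, {"pH": 6.0, "depth_cm": 10.0})
ea_more_acid = predict_ea(rules, {"pH": 4.5, "depth_cm": 50.0})
```

Because each rule carries its own linear model, a fitted PTF of this form can be read directly as "if the condition holds, apply this equation", which is what makes the rules interpretable.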
Previous studies have demonstrated that the multivariate linear models created by Cubist are accurate, parsimonious and interpretable (Landré et al., 2018; Minasny & McBratney, 2008). For example, Xiao et al. (2023) found that Cubist outperformed Random Forest and Gradient Boosted Machine when the three machine learning algorithms were compared for their accuracy in developing PTFs. The PTFs presented in this study are the multivariate linear models (Cubist models) that encompass all training cases in each soil type.
The PTFs were developed using (1) the whole dataset and (2) the soil-specific dataset for each of the WRB-RSGs. In the first case, the 355 pedons were randomly split into 70% for calibration (n = 247 pedons; 1355 samples) and 30% for independent validation (n = 108 pedons; 582 samples) using the sample function in R Statistical Software (R Core Team, 2022). Similarly, in the second case, for each of the WRB-RSGs, 70% of the pedons were randomly selected as the calibration dataset and the remaining 30% of the pedons were used as the validation dataset. The horizons (samples) from the same soil profile were kept together in either the model training or testing to ensure independent validation.
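A pedon-level split of this kind can be sketched as follows. The study used the sample function in R; this Python version is only a stand-in, and the function name, the `pedon_id` key, the seed and the synthetic data are all assumptions made for illustration.

```python
import random


def split_by_pedon(samples, calibration_fraction=0.7, seed=0):
    """Split horizon samples so that all horizons of a pedon stay together."""
    pedon_ids = sorted({sample["pedon_id"] for sample in samples})
    rng = random.Random(seed)
    rng.shuffle(pedon_ids)
    n_calibration = round(len(pedon_ids) * calibration_fraction)
    calibration_ids = set(pedon_ids[:n_calibration])
    calibration = [s for s in samples if s["pedon_id"] in calibration_ids]
    validation = [s for s in samples if s["pedon_id"] not in calibration_ids]
    return calibration, validation


# Synthetic example: 10 pedons with 3 horizons each (not the study data).
samples = [
    {"pedon_id": pedon, "horizon": horizon}
    for pedon in range(10)
    for horizon in range(3)
]
calibration, validation = split_by_pedon(samples)
```

Splitting on pedon identifiers rather than on individual horizons is what prevents horizons of the same profile from appearing on both sides of the split.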
To assess uncertainties in the PTF models, the data were randomly split into calibration and validation in 50 replications, resulting in the development of 50 PTF models for each soil type. The mean and 90% confidence interval (CI) of the RMSE values from these 50 PTF models were calculated to quantify the model uncertainties of the PTFs. For each soil type, the PTF model with the lowest RMSE values across the calibration and validation was selected as the final model from the pool of 50 replicates.
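One way to summarize replicate RMSE values is sketched below. The text does not state how the 90% CI was computed, so the empirical 5th to 95th percentile interval used here is an assumption, as are the function name and the synthetic RMSE values.

```python
import statistics


def summarize_rmse(rmse_values):
    """Return the mean and an empirical 90% interval of replicate RMSE values."""
    # n=20 yields cut points at 5%, 10%, ..., 95%; keep the outer two.
    cuts = statistics.quantiles(rmse_values, n=20, method="inclusive")
    return statistics.mean(rmse_values), cuts[0], cuts[-1]


# Synthetic RMSE values standing in for 50 replicate models (not study results).
rmse_values = [0.50 + 0.01 * i for i in range(50)]
mean_rmse, lower, upper = summarize_rmse(rmse_values)
print(f"mean RMSE = {mean_rmse:.3f}, 90% interval = [{lower:.3f}, {upper:.3f}]")
```

A t-based interval around the mean RMSE would be an equally plausible reading of the text and would generally be narrower than the percentile interval shown here.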

| Performance evaluation of the PTF models
The predictive performance of the PTF models was evaluated with goodness-of-fit statistics comprising the root mean squared error (RMSE), bias, Lin's concordance correlation coefficient (ρC) and coefficient of determination (R2). These were calculated for the calibration and validation datasets to characterize the accuracy and reliability of the developed PTF models, respectively. In the definitions of these statistical measures, y_i is the observed value, ŷ_i is the predicted value, n is the total number of observations, i is the ith observation and ȳ is the mean of the observations. The R2 quantifies the proportion of the variation in the dependent variable explained by independent variables in the model. The RMSE quantifies the average magnitude of errors between the predicted and observed values. Bias represents a systematic error in the model predictions. It occurs when the model consistently over- or underpredicts the observed values, even when averaged over multiple predictions. A good PTF model has low RMSE values, bias close to 0 and a high R2. Lin's concordance correlation coefficient (ρC) is defined in terms of ρ, the correlation between the predicted and observed values; their respective means; and their respective variances, σ²_ŷ and σ²_y. The concordance correlation coefficient (ρC) measures the agreement between the predicted and observed data points along a 45-degree line from the origin. ρC can range from −1 to 1, and values closer to 1 indicate agreement between the model predictions and observed data.
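For reference, the conventional forms of these statistics, written with the symbols defined above, are sketched below. The sign convention for bias (predicted minus observed) and the residual-based form of R2 are assumptions; the study may have used equivalent variants, such as the squared correlation for R2.

```latex
\begin{align}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}} \\
\mathrm{bias} &= \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right) \\
R^{2} &= 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}} \\
\rho_C &= \frac{2\,\rho\,\sigma_{\hat{y}}\,\sigma_{y}}{\sigma_{\hat{y}}^{2} + \sigma_{y}^{2} + \left(\bar{\hat{y}} - \bar{y}\right)^{2}}
\end{align}
```

Written this way, ρC equals ρ only when the predicted and observed values share the same mean and variance; any shift or scale difference lowers ρC below ρ.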
In general, the distribution of EA in the soils varies with the soil depth interval (Figure 4). For the topsoils, 0-30 cm depth interval, Ferralsols have distinctly high EA values, followed by Solonetz and Acrisols (Figure 4a). At the 30-60 cm depth interval, Solonetz have markedly high levels of EA, followed by Ferralsols and Acrisols (Figure 4b). Within the subsoils, 60-100 cm depth interval, Acrisols show high levels of EA followed by Solonetz and Ferralsols (Figure 4c). In deeper soil horizons, 100-200 cm depth interval, Acrisols together with Ferralsols remain high in the observed values of EA (Figure 4d).

| Soil EA PTFs
Soil EA PTFs were developed for the whole dataset (generic EA-PTFs) (Table 3). At soil pH > 5.3, the generic EA-PTF prediction ranged from 0.02 to 4.79 cmolc kg−1 with a mean value of 0.49 cmolc kg−1 and estimated error¹ of 0.27. At pH ≤ 5.3, the generic PTF prediction ranged from 0.04 to 7.50 cmolc kg−1 with a mean value of 1.65 cmolc kg−1 and estimated error of 0.89 (see Figure 5). The soil pH, T.E.B, Ca and Avg.Dpth were the most important attributes in the generic EA-PTFs (r = 0.68; see Table S2).
Additionally, EA-PTFs were developed for 12 WRB-RSGs (Table 3). For soils classified as Acrisols, the EA-PTF prediction ranged from 0.05 to 7.70 cmolc kg−1 with a mean value of 1.98 cmolc kg−1 and estimated error of 0.88 (see Figure 5). The soil pH and Avg.Dpth were the key attributes in the Acrisols EA-PTFs (r = 0.89; see Table S2). For soils classified as Ferralsols, the EA-PTF prediction ranged from 0.05 to 6.65 cmolc kg−1 with a mean value of 1.95 cmolc kg−1 and estimated error of 0.79 (see Figure 5). The soil pH, CCR, Avg.Dpth and O.C were the main attributes in the Ferralsols EA-PTFs (r = 0.90; see Table S2).
For soils classified as Solonetz, the EA-PTF prediction ranged from 0.12 to 6.95 cmolc kg−1 with a mean value of 2.20 cmolc kg−1 and estimated error of 1.11 (see Figure 5). The soil pH, CCR and Na were the main attributes in the Solonetz EA-PTFs (r = 0.90; see Table S2). For soils classified as Vertisols, the EA-PTF prediction ranged from 0.02 to 7.30 cmolc kg−1 with a mean value of 0.91 cmolc kg−1 and estimated error of 0.55 (see Figure 5). The soil pH and Ca were the main attributes in the Vertisols EA-PTFs (r = 0.89; see Table S2). For soils classified as Plinthosols, the EA-PTF prediction ranged from 0.05 to 4.34 cmolc kg−1 with a mean value of 0.88 cmolc kg−1 and estimated error of 0.66 (see Figure 5). The soil pH was the most important attribute in the Plinthosols EA-PTFs (r = 0.83; see Table S2). For soils classified as Gleysols, the EA-PTF prediction ranged from 0.02 to 7.50 cmolc kg−1 with a mean value of 0.86 cmolc kg−1 and estimated error of 0.51 (see Figure 5). The soil pH was the most important attribute in the Gleysols EA-PTFs (r = 0.82; see Table S2).
For soils classified as Lixisols, the EA-PTF prediction ranged from 0.08 to 6.30 cmolc kg−1 with a mean value of 1.55 cmolc kg−1 and estimated error of 0.95 at pH ≤ 5.0. The Lixisols EA-PTF prediction ranged from 0.02 to 4.79 cmolc kg−1 with a mean value of 0.58 cmolc kg−1 and estimated error of 0.31 at pH > 5.0. The soil pH, T.E.B, Ca, Mg and Na were the main attributes in the Lixisols EA-PTFs (r = 0.83; see Table S2). For soils classified as Nitisols, a pH value of 6.2 was used as the primary criterion in creating the EA-PTFs (Table 3). The soil pH, Ca and Mg were the main attributes in the Nitisols EA-PTFs (r = 0.90; see Table S2). For the soils classified as Leptosols, Planosols, Arenosols and Fluvisols, the soil pH was the key attribute in the EA-PTFs (Table 3).

| PTF model uncertainties
Leptosols > Nitisols > Arenosols > Fluvisols (Table 4). Generally, PTF model uncertainties were lowest in the calibration, with CIs indicating a narrower range of potential errors, compared with the validation phase (Table 4).

Saigusa et al. (1980) demonstrated that EA can be used as a realistic measure of aluminium (exch-Al) toxicity potential in soils. According to FAO (1979), exch-Al > 2.0 cmolc kg−1 can be considered toxic for plant growth. Based on this information, it can be established that the measured EA values presented in this study are indicative of potential Al toxicity in Acrisols, Ferralsols, Solonetz, Vertisols, Plinthosols, Gleysols and Lixisols, but the levels of potential toxicities depend on the soil depth interval (Figure 4). It is evident that Ferralsols can be expected to show high levels of EA in the topsoils, whereas Acrisols may have high EA in the subsoils. In contrast, the EA values in Nitisols, Leptosols, Planosols, Arenosols and Fluvisols are below the levels considered to present potential toxicity problems (Table 1; Figure 4).

Supplementary pedon characterization data have been used to deduce the influence of parent material on the levels of EA in the soils. It was observed that Tarkwaian rocks, Volta alluvium, phyllite, shale, sandstone, alluvial soil and granite appear to naturally predispose associated soils to high and moderate levels of EA. However, soils formed from gneiss rocks, schist and colluvium show low levels of EA (see Figure S4). Obiri-Nyarko (2012) reviewed the causes of soil acidity in Ghana and listed weathering of parent rocks, leaching, acid rain, organic matter decomposition, crop removal and application of acid-forming fertilizers among the most important factors to be considered. Meng et al. (2019) added that areas where high rainfall amounts lead to excessive leaching of base cations are naturally prone to soil acidification.

| Accuracy and reliability of the PTFs
Accurate and reliable prediction of EA in different soils is needed for diagnosing the potential for Al toxicity in acidic soils. The EA-PTF models for Ferralsols, […] (Table 3; see Figure 5). The low errors and high explained variance achieved by the PTFs for predicting EA in Ferralsols, Plinthosols, Nitisols, Leptosols and Arenosols may be attributed to the representativeness of the EA data for these WRB-RSGs. At the same time, the EA-PTF models for Acrisols, Solonetz, Gleysols, Vertisols, Lixisols, Planosols and the whole dataset demonstrated moderately good to mixed performance, with varying RMSE, moderate ρC and relatively high R2 values in the cross-validation. The moderate performance achieved by the EA-PTFs for Acrisols, Solonetz, Gleysols, Vertisols, Lixisols and Planosols may have resulted from misclassification of some of the pedons in these WRB-RSGs. For example, some Gleysols pedons may have been misclassified as Planosols, and Acrisols and Lixisols may have been confused in the same way. Errors in the classification of the pedons in these WRB-RSGs may have propagated through the PTF models and, by reducing the representativeness of the EA data, resulted in the moderate accuracy and reliability of the models developed for these soils. In the case of Fluvisols, the EA-PTF was accurate but had the lowest reliability in the cross-validation. Reliability of the Fluvisols EA-PTF could not be optimized from the model replications. Overall, except for Fluvisols, the soil-specific EA-PTFs derived for the WRB-RSGs generally showed commendable accuracy and reliability in the cross-validation when compared to the generic EA-PTFs derived from the whole dataset (Table 3; see Figure 5).
To the best of our knowledge, there is a dearth of EA-PTFs in the published literature.However, there exist PTFs for harmonizing soil pH data to the GlobalSoilMap standard (Libohova et al., 2014) and PTFs for estimation of soil pH from geochemical indices (Wu & Liu, 2019).

| Uncertainty assessment
Uncertainty assessment is important because PTF models are subject to prediction errors that can be quantified (Schaap, 2004). In this study, we quantified the uncertainties in the PTF models by calculating the average error magnitude (RMSE) of the predictions within 90% CIs where the true RMSE value may fall (Table 4). The observed variation in the widths of CIs across the derived PTF models underscored varying levels of uncertainties in their predictions. PTF models with narrower CIs, such as the Arenosols EA-PTF, Nitisols EA-PTF and Leptosols EA-PTF, indicated lower uncertainties in their predictions. Conversely, PTF models with wider CIs, such as the Solonetz EA-PTF and Vertisols EA-PTF, indicated higher uncertainties and underscored a wider margin of error and variability in their predictions. The high levels of uncertainties in the Solonetz EA-PTF for both the calibration and validation suggest that there are potential inconsistencies in the quality of the Solonetz EA data. Also, the observed low levels of uncertainties in the Fluvisols EA-PTF may be specious considering the lack of harmony in the statistical measures of model performance in the validation (Table 3). Ultimately, the uncertainties associated with the derived PTF models presented in this study may partly be due to misclassified pedons and/or heterogeneous soil composition in the pedons of the same soil type (see Figure S1).

| Limitations and perspectives for future research
The EA-PTFs developed in this study are primarily useful for predicting the missing values of EA in existing soil databases. However, the PTFs may have the potential to be used for predicting EA for new soil data. One major limitation of the present study is that the EA-PTFs have been developed from datasets collected over the past seven decades, which provide only snapshot evidence of the distribution and dynamics of EA in the different soils. Where EA has changed over time and the changes are not proportional across the range of soil properties used as covariates, the predictive accuracy of the EA-PTFs will be limited when tested on recently collected soil data. Another limitation of the study is the stepwise covariate selection method adopted for selecting the smallest set of covariates used for developing the EA-PTFs. The stepwise selection method relies on sequential addition or removal of covariates based on statistical significance, which potentially leads to the omission of important covariates with non-linear relationships with EA. Therefore, it is possible that the smallest set of covariates used for developing the EA-PTFs in this study may not reflect the range of soil properties needed to account for EA in the different soils. It is important to recognize these limitations in order to use the developed EA-PTFs effectively.
To address these limitations, it is considered worthwhile to update the developed EA-PTFs for site-specific applications. Further research is needed to revise and improve the EA-PTFs by integrating site-specific factors and other relevant soil properties that were not considered as covariates in the present study. Site-specific factors such as parent material, land use and precipitation must be considered when updating the EA-PTFs. In further studies, it is important to explore the potential of modern tree-based variable selection methods, which are considered better alternatives to the classical stepwise variable selection. Furthermore, the importance of soil

FIGURE 1 Distribution of the 355 pedons across the study area (Ghana).

FIGURE 2 Attributes of the 355 pedons used in the study: (a) temporal extent of soil sampling and (b) overview of soil depth intervals for the pedons.
FIGURE 3 Methodological steps used to develop the pedotransfer functions in the study. The flowchart contains the following elements:
Data analysis: 1. Data analysis of the soil datasets in landscape units of the study area. 2. Pearson correlation analysis for all numerical datasets. 3. Graphical diagnosis of depth distribution characteristics of the target soil property. 4. Evaluation of Tukey HSD mean (general comparisons) of target soil property groups. 5. Plotting of histograms, boxplots, etc.
Soil data available for use as covariates/predictors: pH, O.C, Ca2+, Mg2+, Na+, depth of the soil profile, TEB, CCR, N, P, K, sand, silt, clay.
Modelling: Cubist rule-based multivariate linear models as pedotransfer functions (PTFs), with model evaluation on training data (correlation coefficient, average error, relative error).
Pedotransfer functions (PTFs): 1. Generic PTFs developed for use in any soil type (i.e., useful when the WRB-RSG is not known). 2. Custom stratified PTFs based on WRB Reference Soil Groups (WRB-RSGs).
Cross-validation (CV): model calibration and model validation, each reported with goodness-of-fit (R2, ρC, RMSE) and uncertainty (RMSE at 90% CI). NB: A good PTF must have commendable performance in the CV (i.e., performance comparison of the calibration and validation goodness-of-fit statistics).
FIGURE 4 Observed values of soil exchangeable acidity at different soil depth intervals. (a) Ferralsols have distinctly high levels of exchangeable acidity in the 0-30 cm topsoil; (b) Solonetz have markedly high levels of exchangeable acidity in the 30-60 cm depth interval; (c) Acrisols show high levels of exchangeable acidity in the 60-100 cm depth interval; (d) Acrisols together with Ferralsols remain high in the observed values of soil exchangeable acidity in the 100-200 cm depth interval. As the exchangeable acidity values can indicate toxicity problems in soils, the large values (which may be considered outliers) are shown here to represent potentially toxic levels of soil acidity that are possible in different soils.

¹ The estimated error represents prediction uncertainty associated with the PTF.
TABLE 4 Pedotransfer function model uncertainties expressed as root mean squared error (RMSE) at 90% confidence intervals (CIs).
TABLE 1 Pedon distribution and summary statistics of exchangeable acidity in the soils.
TABLE 3 Pedotransfer functions for the prediction of exchangeable acidity in general soil types (based on whole dataset) and 12 World Reference Base (WRB) Reference Soil Groups.