Prediction of soil pH patterns in nature areas on a national scale

Question: To assess the acidification process, nationwide information about soil pH on a site level is called for. Measurements of soil pH may be used, however there are not sufficient measurements available to map soil pH nationwide on site level. Instead we developed a soil pH map based on vegetation data. Location: Natural terrestrial areas in The Netherlands. Methods: 271,693 vegetation plots were used to estimate average soil pH per plot with indicator values, based on field measurements, of plant species. By spatial interpolation average pH values between the plots, with the soil type, groundwater table and vegetation management type as ancillary explanatory variables we created a soil pH map. The map covers all terrestrial nature areas (all areas that are not built up areas, agricultural areas and infrastructural areas) in the Netherlands with a map res-olution of 25 × 25 m 2

This programme was necessary to restore and create environmental conditions that make the sustainable conservation of biodiversity possible. To evaluate the effects of the measures taken or still planned to be taken as a result of government policies, a map of the status of the environmental conditions contains essential information.
High soil pH values, in combination with calcium, may reduce the phosphorous availability, whereas low soil pH values may cause metals, like the toxic aluminium, to become available and harm plant growth. Soil pH also influences bacterial activity thus influencing the form of available nitrogen: nitrate versus ammonium (Marschner, Crowley, & Yang, 2004;Stevens et al., 2011). These direct and indirect effects of soil pH determine plant species occurrence (Falkengren-Grerup, 1986;Roem & Berendse, 2000;Silvertown et al., 2006;Wamelink, Goedhart, van Dobben, & Berendse, 2005). As a consequence, plant species occurrence can be used as indicators for soil pH (see e.g., Wamelink et al., 2005;Wasof et al., 2013).
Soil pH in natural areas is not only, indirectly, influenced by industrial activities, but since long by other anthropogenic activities, such as land use, manuring, nitrogen deposition and nowadays even human induced climate change Falkengren-Grerup, 1986;Roem & Berendse, 2000;Van Den Berg et al., 2005).
Soil pH is also influenced by natural processes such as mineralisation, that causes a decrease in pH, weathering, soil displacement (for example by digging of animals), erosion and sedimentation, that may cause an increase in soil pH (Van Breemen, Mulder, & Driscoll, 1983).
Furthermore, soil pH depends on the soil type, the supply of carbonate rich water, and plant litter.
For the evaluation of the effects of measures taken or planned in the field (environmental scenario's) as a result of government policies to mitigate acidifying effects or nature restoration insight in the soil pH is necessary. Therefore, we developed a soil pH map for terrestrial nature areas in the Netherlands. Because of the lack of sufficient costly field measurements, we used the Dutch vegetation database and the responses of plant species established by Wamelink et al. (2005Wamelink et al. ( , 2012 to estimate soil pH. In combination with information about groundwater table, soil type and vegetation type we predicted a soil pH map for terrestrial nature areas in the Netherlands. We define terrestrial nature areas here as all terrestrial areas that are not built up areas, agricultural areas and infrastructural areas.

| Overview of the method
The basis for the soil pH map is National Dutch vegetation (NDV) database (Hennekens & Schaminée, 2001;Schaminée, Hennekens, & Ozinga, 2012) and the indicator values for soil pH developed by Wamelink et al. (2005). The NDV contains vegetation relevés from different vegetation types and are spread over all nature areas in the Netherlands. The soil pH of these relevés is unknown, the soil pH is instead estimated. For this we use the indicator values for soil pH. These indicator values are based on field measurements of the soil pH (see further Appendix S1 and Wamelink et al., 2005). The estimated soil pH, as the average of the indicator values of the species present in the relevé, where the basis for the estimation of the pH map. To go from point data to the grid cell level we applied spatial interpolation, to interpolate between data points. This model does not only include the estimated pH but also groundwater table, soil type and vegetation type. The standard error of the estimated soil pH with the indicator values was used to weigh the estimated soil pH. With the regression model a soil pH and standard error was estimated for each terrestrial nature area grid cell. Subsequently a cross validation and a validation on independent data were carried out. The map was applied to evaluate the current state of the Dutch nature areas by comparing the pH requirements of the vegetation types with the estimated soil pH on our map.
The applied method to estimate the soil pH map is in short: 1. Our data are vegetation derived mean pH values at almost 300,000 locations including an associated standard error. The model to obtain these pH values and standard errors is given in Wamelink et al. (2005) standard errors of step 1 into account (Delhomme, 1978). Hence, more weight is given to locations with smaller standard errors.

| Database and spatial prediction of soil pH
The vegetation plots (relevés) used for the estimation of soil pH were retrieved from the National Dutch vegetation database (Hennekens & Schaminée, 2001;Schaminée et al., 2012). The database has an entry in the "Global index of vegetation-plot databases (GIVD)" under number EU-NL-001 (https://www.givd.info/ID/EU-NL-001).
This database contains relevés where at least all higher plants were identified, a part of them also include bryophytes and lichens. They were made by many authors for many different purposes. All relevés were made following Braun-Blanquet (1964) method or similar methods. We only used georeferencedrelevés made after 1 January 1990. were used in the statistical analyses for mapping soil pH. The standard error was used to give different weights to the mean soil pH.
A pH map was made for terrestrial and semi-terrestrial vegetation management types only (including e.g., swamps, fens and so on). For the spatial interpolation of soil pH we used a soil type, groundwater table and vegetation type as ancillary information. A more extensive description of the data and the used method can be found in Appendix S1.

| Soil type and groundwater table
Soil type and soil pH are related and can therefore be used to improve prediction of soil pH. We used the national Dutch soil map scale 1:50,000 (De Vries, 1999;De Vries, de Groot, Hoogland, & Denneboom, 2003) generalized to 43 soil mapping units. We also used the groundwater table class of the soil map that distinguishes eight different classes of depths at which groundwater tables seasonally fluctuate.

| Vegetation management type
The vegetation management map consists of 37 terrestrial vegetation management types (Appendix S2). This map was primarily set up for the benefit of nature managers and one of the major goals was to appoint subsidies to the nature areas for vegetation management such as mowing and grazing and the improvement of the nature (abiotic) quality. The map was set up by field experts and is independent from the other variables used. The vegetation management types are rather broad and not defined at the more precise phytosociological classes (association) level. For instance, pine, oak and beech forest have comparable management in the Netherlands so they ended up in one vegetation management type 'pine, oak and beech forest' (N15.02). Most of these types are more or less semi-natural, since all are managed, even the raised bogs (removal of trees, hydrological management). Some nature areas on the map are defined at a landscape level, and have their own section in the typology. Because these landscape types are too broad for our purposes, we added additional information to split these types up to the level of the seminatural vegetation management types. This was done for Dune and Salt Marsh landscapes (N01.02), River and Marsh landscapes including flood plains (N01.03) and Sand and Chalk Landscape (N01.04).
This downscaling was carried out based on soil type, groundwater table and satellite data.

| Statistical analyses
Spatial interpolation is often used to predict values at grid points given values at neighbouring points. The result is a raster map. To apply spatial interpolation the polygon and point maps we used, were all transformed into raster maps with 25 × 25 m 2 cells. A wellknown spatial interpolation technique is kriging. We applied this geostatistical technique to predict pH-H 2 O on a regular grid for the Netherlands. Our general model is: where: y(s): the pH at location s, with s a vector of spatial co-ordinates m(.; s): a model that gives the spatial trend of the pH at location s x 1 … x p : explanatory variables that are related to y(s) r(s): a spatially correlated residual at location s, i.e., the spatial variation not accounted for by model m(.;s).
Our geostatistical model is often referred to as kriging with an external drift (Goovaerts, 1997). It is well known that kriging predictions generally benefit from taking auxiliary explanatory variables x 1 , …, x p into account. The deterministic trend model m(x 1 , x 2 , …, x p ; s) is usually written as a weighted linear combination of p explanatory variables x i , In our approach we will take a different view by writing this expression as is a random forest model (Breiman, 2001). A random forest model is a machine learning model that uses bootstrap samples from the original data to build an entire forest of regression trees models. The explanatory variables, x i , i = 1…p, are run down each tree to get predictions of pH. The predictions of all trees are combined to an ensemble prediction of pH. A random forest model is able to capture nonlinear structures in the data and is relatively easy to fit. On top of that, it provides out-of-bag validation statistics which can be used to judge the fit.
On the downside, a random forest prediction may be biased, but this potential bias will be corrected for by parameters b 0 and b 1 of our geostatistical model.
We took soil types, vegetation management types and ground water table classes as explanatory variables in our random forest model f(.; s) to predict the spatial trend of pH for each grid cell. We used the random Forest package (Liaw & Wiener, 2002) in R (R Core Team, 2016) for fitting the random forest model. The gstat-package (Pebesma, 2004) in R is subsequently used for semivariogram estimation/modelling and kriging. We used the functions randomForest and krige for, random forest and kriging respectively. The uncertainty in the "observations" y(s), which are in fact estimates from indicator values of pH for species, is explicitly taken into account by incorporating the (square of the) standard errors of the pH-estimates in the kriging equations following the procedure of Delhomme (1978), referred to as "kriging with uncertain data."

| Uncertainty
Solving our geostatistical model not only leads to predictions of pH, but also to the associated variance of the prediction error.
Smaller variances indicate more accurate predictions. The variances of the prediction error are usually zero at the observations points. In our situation, this is generally not the case, as we took the standard error of each pH estimate explicitly into account in the kriging system.

| Cross-validation
To validate the predictions of the pH-map, we used leave-one-out cross-validation (Efron & Gong, 1983;Efron & Tibshirani, 1994;Wehrens, Putter, & Buydens, 2000). This is a well-established validation method where each observation is successively removed from the data and its value is predicted given the remaining data. In this way, an error can be computed for each data location. The errors are then summarised by a statistic like the mean error, as a measure of bias, or the root mean squared error, as a measure of accuracy.

| Validation
The predictions of the pH-map were also validated against an independent dataset. The BIS dataset (http://maps.bodemdata.nl/ bodemdatanl/index.jsp, last visited 17-6-2018) contains field measurements of soil pH that were not used for the estimation of the map. The BIS database contains only a limited amount of measurements in nature areas, * in total, since it mostly focusses on agricultural areas. The database contains data of soil pH measured in H 2 O, CaCl 2 en KCl extract, all three were used for the validation. Field data were linked to the pH map via the coordinates of the field data. National assessment of the soil pH, linked to soil quality. The predicted soil pH map was used to assess the abiotic quality of designated vegetation management types for all terrestrial nature areas in the Netherlands. We used threshold values per vegetation type to assess the quality. The threshold values were derived from vegetation relevés that are linked to and are exemplary for the vegetation types.
A response curve was estimated per vegetation type and the threshold values were defined as the 5 and 25 percentiles of the response curve (spline; Wamelink et al., 2011). The defined threshold values of the optimal conditions for soil pH per vegetation management type at a certain raster cell were compared with the predicted soil pH of the same raster cell. This was only done for the vegetation management types that have defined threshold values of the optimal conditions for soil pH (Appendix S2). If the predicted soil pH is above the maximum threshold value for pH for the vegetation management type, we assume that the abiotic quality for soil pH is considered good. If the predicted soil pH lies between the threshold values for maximum and minimum pH for the vegetation management type, the abiotic quality for soil pH is considered moderate. If the predicted value is below the minimum threshold value for the vegetation type the abiotic quality is considered bad; The threshold values are based on field measurements similar to the estimation of the response functions for the plant species (Wamelink et al., 2005.

| Patterns in predicted soil pH
The soil pH, based on the indicator values (Wamelink et al., 2005), was predicted in a raster map for all natural areas by means of kriging and varies between 3.0 and 8.6 ( Figure 2). As was expected the sandy soils in the East show on average lower pH values than the clayish areas in the West and the North, and the more calcareous soils in the South and in the coastal dune area. The map also indicates that soil pH may vary largely within a few kilometres, also linked to differences in vegetation management type, soil type and groundwater table. In the dune area (Appendix S3) the highest pH values are found closest to the sea, in the youngest yellow dunes. The older dunes in the eastern part have lower values, which can be expected from older grey dunes. Here natural processes lead to succession to heathers and forest with leaching of calcium as a result of natural acidification due to decomposition and build-up of organic matter (Kreutzer 1995;Xu, Tang, & Chen, 2006). In the East of the Netherlands, where the nature areas are scattered in an agricultural landscape, soil pH also seems locally highly variable within the individual areas (Appendix S4). This indicates that the method in even such a fragmented landscape is able to create gradients. The Dutch nature areas have two distinct large peaks in their soil pH, one around pH 4 and one around pH 6, with a minor peak around pH 7.7 (Appendix S5). This agrees with the many more acid sandy soils that occur in The Netherlands than light acidic clayey soils under nature. The pH peak of 7.7 is caused by calcareous soils in the South-East and the salty calcareous soils (dunes and salt marshes) in the coastal zone. Hence, the predicted soil pH patterns meet all our expectations based on expert knowledge.

| Uncertainty
For each area we also estimated the uncertainty in the predicted soil pH (Figure 3

| Cross-validation
The difference between predicted minus observed pH ranges from −2.82 till 2.36 pH-units, with an RMSE of 0.42 pH-units. One third of the predictions has a difference between −0.1 and 0.1 pH-units and 83% has a difference between −0.5 and 0.5 units (see also Appendix S7 for the distribution of the difference and Appendix S5). The difference between predicted and observed shows a normal distribution with zero mean, indicating that there is no systematic bias present. Major prediction errors seem to be present all over the country and not directly related to one of the underlying maps or the pH value itself (Appendix S8). Given the fact that the error in a pH measurement is about 0.1 pH-units the predicted soil pH values do resemble the observed values rather good, though at some sites the difference between predicted and field values may be large.

| Validation
The correlation between field data and predicted pH are quite reasonable till good and ranges between 0.66 till 0.85 (Table 1; see also Appendix S9). The mean difference between predicted and measured soil pH is for pH H 2 O almost zero indicating that the predicted pH is almost the same as the measured pH. The mean error is negative for KCl and CaCl 2 . This means that the predicted pH based on vegetation is on average higher than pH CaCl 2 and KCL of the soil.
That is expected as both K and Ca will remove protons (H) from the adsorption complex by cation exchange.
The period of the validation data 2000-2009 is a subset of the period of the pH data that has been used to create the map (1990 and later). This may influence the outcome of the validation. The validation are legacy data and have not been collected with validation of the pH map in mind. Therefore, the data are not evenly distributed over the Netherlands. Sandy areas are for instance over-represented. This may bias the validation results to these areas. The validity of the results are therefore limited to the validation areas.

| National assessment of the soil pH, linked to soil quality
The quality of soil pH is in most cases considered good for the present or planned vegetation management types (Figure 4). There is no pattern visible in soil pH that is considered bad; small areas can be found scattered all over the country, both in areas with a natural high soil pH as well as areas with a low soil pH. When looking at the vegetation management types separately, especially the semi natural grasslands still seem to suffer from soil pH values that are too low as well as part of the swamps ( Figure 5)

| D ISCUSS I ON
Acidification is considered one of the main pressures causing biodiversity loss. To reduce this pressure the Dutch government took measures to decrease acid emission and restore soil pH in nature areas since the 1980s of the last century. The predicted soil pH conditions, based on indicator values (Wamelink et al., 2005), are considered good in most ecosystems, and hence, we assume that acidification of the soil is not a major problem on a national scale anymore in the Netherlands. Though, in some areas the soil pH is still too low.
Especially, for semi natural grasslands and swamps the pH is too low.
In these case extra measures, such as liming (Kreutzer, 1995), may be necessary. Our findings that acidification is not a major problem anymore agrees with earlier research inside and outside the Netherlands Likens, Driscoll, & Buso, 1996;Wamelink et al., 2013). However, this is not the case everywhere; especially in Asian countries acid deposition is still high and expected to raise in the future (Larssen et al., 2006). We think the predicted soil pH map is a useful tool for evaluations of both government policies as well as nature conservation management on a regional till national scale. Despite an on average high data density, there are still nature areas without any available data. Here the predicted soil pH is based on the soil type, groundwater table and vegetation map and interpolation of data of nearby nature areas with relevés. As might be expected the uncertainty in the predicted soil pH is larger in these areas.
We choose 1990 as our starting year to include relevés, hence we used relevés made over a 25 year period. The period used is considered in relation to what we thought was an optimal data density. Given the data density it also should be possible to make a pH map based on a less optimal data density for example data collected every 5 or 10 years. If there is sufficient data available in the Netherlands to do this from 1950 this may lead to a series of maps that could show a change in soil pH over time as a result of acidification. Another problem related to this is the so called time-lag of plant species as indicators of soil circumstances. Since some species are long-lived they may have germinated and grown up under circumstances that are different than the present ones. Consequently their presence may be an echo of the past giving a false indication, the less species present in a relevé the bigger this chance is. A shorter interval in years, as suggested 5 or 10 years may overcome this problem partly.
To create a soil pH map, if possible, direct measurements of soil pH should be used. Currently, there are not sufficient measurements available to map soil pH nationwide on a site level. However, it is possible to improve our method by using field data as extra information.
The data can then be used with a higher weight than the estimated pH values of the relevés.
Cross-validation shows that the predicted soil pH are accurate in relation to estimated pH and the pH patterns meet our expectations.
Also the validation on a limited number of independent pH measurements shows that the soil pH is predicted quite well. Thus we assume that the results are robust enough to make an accurate soil pH map on a national level. An earlier validation of predicted soil pH compared with field measurements on a European scale also showed a good relation between predicted and measure soil pH (Wamelink et al., 2005). This opens the way of the evaluations of both government policies as well as direct field management on a regional till national scale. Our results show clearly that the measures taken to mitigate acidification has had its positive effect and that in many areas acidification does not lower the soil pH anymore. Acidification may still have an impact on base cation availability and thus affecting both plant and animal species occurrence.

ACK N OWLED G EM ENTS
We like to thank all people (way over 100) that were willing to share their valuable field measurements for this research. This paper was