A matter of time: Recovery of plant species diversity in wild plant communities at declining nitrogen deposition

High levels of nitrogen deposition have been responsible for important losses of plant species diversity. It is often assumed that reduction of ammonia and nitrogen oxide emissions will result in the recovery of the former biodiversity. In Western Europe, N deposition peaked between 1980 and 1988 and declined thereafter. In a 60‐year experiment in hay meadows, we tested the hypothesis that increasing and declining nitrogen deposition had negative, respectively, positive effects on plant species diversity.


| INTRODUC TI ON
The emissions of ammonia from intensive animal husbandry and nitrogen oxides from traffic and industry increased markedly from 1950 onwards with dramatic impacts on many wild plant populations both through soil acidification and eutrophication (Bakker & Berendse, 1999;Berendse et al., 2001;Bobbink et al., 2010;Duprè et al., 2010;Stevens et al., 2004Stevens et al., , 2010. Long-term experiments addressing the impacts of nitrogen inputs and soil acidification are rare, but all experiments covering more than one decade show that nitrogen addition leads to a rapid decline of plant species diversity (Bobbink, 1991;Clark & Tilman, 2008;Isbell et al., 2013;Storkey et al., 2015;Tian et al., 2016) or to important changes in species composition (Avolio et al., 2014).
Given the strong decline of sulphur oxide deposition after 1975 (Engardt et al., 2017), it might be expected that the observed decrease of ammonium and N oxide deposition has resulted, at least partly, in the recovery of the former biodiversity and that future policy measures to reduce ammonia emissions would facilitate further recovery of species richness. Storkey et al. (2015) observed in the Park Grass Plot Experiment in Rothamsted that after 1990 species diversity recovered in the plots where no nitrogen was applied.
However, other studies in grasslands, heathlands and boreal forests showed that recovery of vascular plant species composition after cessation of nitrogen addition can be extremely slow (Strengbom et al., 2001;Power et al., 2006;Clark & Tilman, 2008; see for a review Stevens, 2016).
Elevated nitrogen deposition has two important impacts on wild plant communities. The first effect is increased productivity and dominance of fast growing, tall species, overtopping many slow growing species that subsequently disappear (Berendse, 1994;Grime, 2001). In hay meadows and other short vegetation, species diversity often declines when aboveground biomass (including litter) exceeds 750 g m −2 y −1 (Al-Mufti et al., 1977;Grime, 2001). The second effect is at least as important. Deposition of nitrogen and sulphur oxides have strong acidifying impacts on the soil, but also ammonium deposition can contribute to soil acidification. In N-saturated soils, ammonium accumulates in the soil solution. This leads to increased activity of ammonium oxidizing bacteria (e.g., Nitrosomonas) that oxidize ammonium to nitrate producing two H + ions per ammonium ion, resulting in rapid soil acidification (Van Breemen et al., 1982).
Depending on soil characteristics acidification can lead to increased and even toxic concentrations of exchangeable aluminium and losses of magnesium, potassium and calcium ions through leaching (Haynes & Swift, 1986). Many plant species in Western Europe cannot cope with these conditions and disappear subsequently (Grime, 2001).
We hypothesized that elevated atmospheric N deposition increases productivity and reduces soil pH and that both these impacts may lead to the local extinction of plant species and reduced diversity.
The Ossekampen Grassland Experiment in the Netherlands (Elberse et al., 1983)  In the presented study, we investigate in the control plots the changes in plant species diversity over a period of 60 years and examine the changes in soil pH and productivity as the two most plausible explaining variables. We also examined the changes in plant species composition and the dynamics of the abundance of individual species.
The response of the vegetation to the raise and decline in N deposition is possibly confounded with the impacts of climatic change. To unravel this complex relationship, we investigated the impacts of changes in seasonal temperature and precipitation on the species diversity and the abundance of the individual plant species in the control plots. The experiment included fertilization and liming treatments that had diverging impacts on productivity or soil pH or both. We tested whether these treatment effects (e.g. N addition and liming) confirmed the observed correlations and the hypothesized interactions.
In this paper, we attempt to answer the following questions.
1. Does the observed increase (before 1987) and decline (after 1987) in atmospheric nitrogen deposition have the hypothesized negative, respectively, positive effects on species richness, species diversity and the abundance of short forbs and legumes? 2. Does the impact of atmospheric nitrogen deposition on species diversity act through its effect on aboveground biomass or through its effect on soil acidity? 3. Do the impacts of the liming, N addition and other treatments confirm the observed effects of the increase and decline in atmospheric nitrogen deposition on species richness and diversity in the control plots?

| Location and design of the experiment
The Ossekampen Grassland Experiment (Wageningen, the Netherlands) started in 1958 in a grassland on river basin clay that K E Y W O R D S grasslands, hay meadows, long-term dynamics, nitrogen deposition, oscillations, plant species abundance, plant species diversity, productivity, soil acidification was extensively grazed, and where the hay was harvested every other year (Elberse et al., 1983;Pierik et al., 2011). It is located at the Ossekampen Experimental Farm near Wageningen, in the centre of the Netherlands (51°58′15″N; 5°38′18″E). The soil consists of heavy river basin clay with soil pH-KCl 4.9, soil organic matter content 21%, P-content 15 mg P kg −1 soil and K-content 116 mg K kg −1 soil (as measured at the start of the experiment). During the first year, the dominant plant species were Holcus lanatus, Anthoxanthum odoratum, Festuca rubra and Agrostis capillaris (Table S1 gives all plant species present in 1958).
In 1958, the experiment started with 12 plots of 40 m 2 (16 × 2.5 m) that received six different fertilization treatments.
The plots with the different treatments were randomly located within each of two replicated blocks. The treatments were as follows: control (no fertilizer or liming), Ca, K, P, PK and NPK1 (Ca: 715 kg Ca ha −1 y −1 ; K: 332 kg K ha −1 y −1 ; P: 52 kg P ha −1 y −1 : N: 160 kg N ha −1 y −1 ). After 1981, lower amounts of Ca, P and K were applied, while the applied amounts of N were kept equal (Table S2). Fertilizers were applied each year, but in 1958 no fertilizers were applied to enable assessment of the initial vegetation composition and soil characteristics. Ca, K, P and N were applied as lime marl, potassium chloride, superphosphate and ammonium nitrate, respectively. In 1966, two other treatments were initiated: N-only and NPK2. These plots were duplicated as well in the two replicated blocks. The amounts of added N, P and K in the new plots were equal to those in the NPK1 treatment (Table S2).
Between 1958 and 1966, these new plots were managed like the control plots. The aboveground biomass in all plots was harvested twice each year: in the last week of June or the first week of July and in October.

| Measurements and modelling of atmospheric N and S deposition
The deposition data (wet and dry) of ammonium and ammonia (NH x ), nitrogen oxides (NO y ) and sulphur oxides (SO x ) were obtained by combining measurements and modelling (Environmental Data Compendium, 2017a). For the periods from 1993 onwards, between 1981 and 1993, and before 1981, we applied different approaches depending on the availability of measured data. The presented data from 1981 to 2017 are country-wide averaged values of the modelled deposition of nitrogen and sulphur oxides per grid cell (5 × 5 km). The deposition per grid cell is calculated using large-scale emission data maps for the entire European continent and data on local emission contributions. The model simulates atmospheric transport between the different grid cells, chemical conversions in the atmosphere and wet and dry deposition processes (Velders et al., 2010).
From 1993 onwards, the modelled deposition data were calibrated annually using measured atmospheric and rainwater concentrations of ammonia and ammonium, nitrogen oxides and sulphur oxides (Velders et al., 2010) The data between 1981 and 1993 were produced by model simulation using yearly data on the emissions of NH 3 and NO x , but without annual calibration to measured atmospheric concentrations.
Estimates of NO y deposition before 1981 were extrapolated backwards using data on human population numbers, car use, industrial production and electricity production available from the public data base Statline (CBS, 2012;Noordijk, 2007). Ammonium deposition in the years before 1981 was extrapolated backwards using the livestock numbers in Statline (CBS, 2012) and historical reports on land use and applied amounts of fertiliser. Sulphur oxide deposition data from the period before 1981 are less reliable since they are based on assessments that did not take into account changes in sulphur content of the combusted coal (Anonymus, 1999).

| Soil measurements
Before 1985, soil samples were collected every 4-7 years and after 1985 every 3 years. At each sampling occasion, 50 soil samples (2.4 cm diameter; 5 cm deep) were randomly collected in each plot.
The samples were taken in early spring before the application of fertilizers. The 50 samples were pooled. Soil pH-KCl was measured after extracting 10 g field moist soil with 50 ml 1 N KCl.

| Vegetation measurements
The aboveground biomass in each plot was measured by cutting a strip of 1.25 × 16 m at the end of June or beginning of July (before the first harvest) and again in October (before the second harvest).
Fresh weights of the harvested material were then determined.
After measuring fresh and dry weights of subsamples, the total dry mass was calculated using the measured dry matter contents. From 1977 onwards, total N concentrations in each harvested sample was measured after digestion of 200 mg ground material by 30 N sulphuric acid and a mixture of sodium sulphate, copper sulphate, selenium and salicylic acid. Ammonium concentrations in the diluted digests were measured colorimetrically using indophenol blue with salicylate.
Plant species abundances were measured in 50 subsamples (25 cm 2 ) taken along three parallel lines within each plot during the first half of May. In each subsample, all species were recorded and the abundance of each species was expressed as its frequency (F i for the i th species), that is the fraction of subsamples in which it was found (De Vries, 1948). In addition, we distinguished five functional groups: legumes, short (<1 m) and tall (>1 m) forbs and short (<1 m) and tall (>1 m) grasses using the maximum plant heights in the LEDA database (Kleyer et al., 2008).
Plant species richness (S) was estimated as the number of species present in the 50 subsamples per plot. The second measure of diversity that we use is the weighted species richness (D) that was calculated by summing the observed species weighted by their This index D is equal to the expected number of species in one single random subsample (Solow & Smith, 1991). D varies between 1/N (one species present in one of N subsamples) and S (all species present in all N subsamples). S and D (diversity in this paper) are the two extremes of the Standardised Species Count family of diversity indices (Kempton, 1979). This index fits well with the subsampling method that we applied (Solow & Smith, 1991).

| Statistics
To answer the first question asked in the introduction, we analysed the relationships between on the one hand atmospheric N deposition and on the other hand productivity, soil pH, species richness, diversity and the frequencies of the functional groups in the control plots. For this purpose, we choose to apply non-parametric tests (Kendall's test) since the N and S deposition data were modelled and extrapolated data so that we could not assume normal or log-normal distributions. In all cases, Kendall's test produced similar results as Pearson's and Spearman's tests did. To address the second question, we examined the impacts of productivity and soil pH on species richness and diversity by ANOVA testing for the effects of productivity*treatment and pH*treatment interactions using the quasi-Poisson log-linear model (Jongman et al., 1995). Subsequently, we fitted smoothing splines using the species richness and diversity data of all treatments. We performed model comparisons using deviance-based F-tests (Jongman et al., 1995).
In addition, we investigated whether the effects of N and S deposition acted by their impacts on soil pH or on productivity by using the Structural Equation Model (SEM) approach. We constructed two models. The first model included only the hypothesized interactions. The second model included all possible interactions and also the effects of year on diversity since both local extinction and new establishment of species can take many years. These models were constructed applying a linear SEM model to the data collected in the control plots from 1965 onwards. Restriction to the data from 1965 onwards makes the relation between N deposition and pH approximately linear and limits the possible effects of the transition from pasture to hay meadow at the start of the experiment. Differences between blocks were minor and never significant. Therefore, the variable block was not added to the model.
In order to answer the third question, soil pH, productivity, species richness, diversity and frequencies of functional groups were compared between treatments using a repeated measures ANOVA with year as within-subject factor and fertilization treatment as between-subjects fixed factor. We performed this analysis for the periods 1958-1987 and 1988-2017, separately, since N deposition increased before 1987, while it declined after 1987. Relationships with time per treatment were fitted using smoothing splines. To test the differences between the slopes of soil pH, species richness, diversity and functional group abundances in the different treatments, we calculated linear regression coefficients for each individual plot applying piecewise linear regression (Montgomery & Peck, 1992) with breakpoints in 1987. The differences between regression coefficients were analysed using ANOVA with treatment as fixed factor and block as random factor which was followed by pairwise t-tests.
We also tested the differences between these regression coefficients before and after 1987 using t-tests.

| Ordination of plant species composition
To address questions 1 and 3, we investigated the changes in plant species composition over time and the impacts of the treatments applying Principal Component Analysis (PCA) (Jongman et al., 1995;ter Braak, 1994) and Principal Response Curve analysis (PRC) (Van den Brink & ter Braak, 1999) using the log-transformed frequencies of each species (log(F j + 0.02)). The analyses were complemented with variance decomposition and permutational ANOVA tests.
The changes over time were displayed by trajectories in ordination diagrams. The typical shape of trajectories in eight initial PCAs (one per treatment) was also found in a single partial PCA of all 589 vegetation samples in which plot (i.e., treatment) was the conditioning factor. This plot-adjusted PCA analyses the within-plot variation only, allowing it to visualize plot and treatment dependent year effects. A second partial PCA of all samples had year as conditioning factor. This year-adjusted PCA analyses the within-year variation only, allowing it to visualize the time-dependent plot and treatment effects. The results of these analyses were interpreted with the help of the supplementary variables diversity, pH, production and N deposition. Species scores were standardized post hoc, that is the original scores were divided by each species' standard deviation, so that distance to the graph origin is an approximate measure of fit.
Variance decomposition and testing of effects was carried out by redundancy analysis (Peres-Neto et al., 2006). Treatment main effects were tested using random permutations of the 16 plots, keeping years together, using the 1966-2017 data. For this period, data of all eight treatments were available. Year and treatment-byyear effects were tested using independent cyclic (time series) shifts of years within plots. Ordinations and tests were carried out using Canoco 5.12 (Ter Braak & Šmilauer, 2018), in which conditioning factors are termed covariates.
We complemented the second partial PCA by a PRC analysis with associated tests which tested the time-dependent treatment effects ( Van den Brink & Ter Braak, 1998, 1999. Further details are given in the Supporting Information (Figures S8 and S9).

| Atmospheric N and S deposition
In the Netherlands, country-wide atmospheric deposition of NH x and NO y compounds increased sharply from 1950 onwards as a result of the increases in livestock, traffic and industrial production after the Second World War (Figure 1). During the entire period, the average contribution of NH x to total N deposition was about 65%. Deposition of NH x was highest in 1988 (2,100 mole N ha −1 y −1 ). Peak deposition of NO y was observed in 1987 (1,100 mole N ha −1 y −1 ). Thereafter, N deposition declined. The data before 1993 have greater uncertainties than the data after 1993 since the applied model was calibrated using measured data from 1993 onwards.
Annual deposition of sulphur oxides reached its peak in the 1960s (approximately 4,000 mol ha −1 y −1 ) and had declined in 2017 to ca. 7% of this maximum. From 1980 onwards, the acidifying impact of sulphur oxide deposition was less than that of atmospheric nitrogen deposition. The total deposition of acidifying N-and Scompounds increased from about 3,000 mol ha −1 y −1 in 1950 up to levels that were twice as high between roughly 1965 and 1985 and decreased thereafter to ca. 2,000 mol ha −1 y −1 in recent years. After 1990, ammonium dominates the acidifying deposition. Its acidifying effect depends on the oxidation of ammonium to nitrate, so that the NH x deposition data should be considered as potential acidifying deposition.

| Soil pH
Soil pH in the control plots was negatively correlated with atmospheric N deposition in the preceding year (Kendall's τ = 0.728, n = 18, p < .001). In all non-limed plots, soil pH dropped significantly between 1958 and 1987 ( Figure S1; Table S4). Addition of only N resulted in a strong decline in soil pH, from 4.4 in 1970 to 3.4 in 2017. Over the whole period , this decline was strongly significant (p < .001), although the decrease during the first period  was only marginally significant (p = .054) due to the small number of years after the start of the N addition treatment. After 1987 soil pH recovered in the control plots and the K, P and PK plots, but not in the three N addition treatments (Tables S3 and S4). In both periods, liming had a strong and P addition had a small positive effect on soil pH, while K and PK fertilization had no effects.

| Aboveground productivity
In the control plots, there was a weak negative-instead of positivecorrelation between atmospheric N deposition and productivity Productivity in the four NPK plots almost doubled relative to the control plots ( Figure S2). On average, it exceeded the production in the control plots by 4.2 ton ha −1 y −1 and after 1987 it increased even further (t = 3.42; p < .001). Addition of N only had a very small positive impact on productivity between 1966 and 1987, but this effect disappeared after 1987 (Table S5). K addition had no effect but liming and P and PK fertilization increased productivity by 1 ton ha −1 y −1 in the Ca and P treatment and by 1.4 ton ha −1 y −1 in the PK plots.
F I G U R E 1 Changes in atmospheric wet and dry deposition of total nitrogen (NH x + NO y ; black) and sulphur oxides (orange) in the Netherlands during the 20th and 21th century. Presented values are country-wide averaged values per grid cell (5 × 5 km)

| Species richness and diversity
Species richness (S, the number of species in 50 samples) and diversity (D, weighting each species by its frequency F j ) in the control plots were negatively correlated with atmospheric N deposition in the preceding year (S: Kendall's τ = −0.543, n = 38, p < .001; D: Kendall's τ = −0.566, n = 38, p < .001), but were not significantly related to any of the seasonal temperature variables that we investigated. After the start of the experiment, plant species richness (S) and diversity (D) dropped fast in all plots (Figure 2; Figure S4).
Remarkably, from 1987 onwards species richness and diversity recovered in all treatments (Figure 2). The slopes of the relationships of species richness and diversity versus time switched from negative before 1987 to positive between 1987 and 2005 (Tables S6 and S7).
In the two NPK1 plots, 72% and 76% (26 and 31 species, respectively) of the species that were initially present disappeared between 1958 and 1987, as compared to 11% and 21% (4 and 7 species, respectively) in the two control plots. The decline in species richness was significantly steeper in the N and NPK plots (−0.87 ± 0.08 species/y) than in the plots without N addition (−0.42 ± 0.05 species/y; Table S7).
In all non-limed plots, the increase in species richness after 1987 was relatively slow (0.28 ± 0.05 species/y), but highly significant  (Table S7).
The within-treatment relationships between species richness or diversity and productivity were weak, while there were no significant effects of the productivity-by-treatment interaction on species richness (p = .06) and diversity (p = .39). Overall, the relationship was unimodal (Figure S5a,b). The fitted splines deviated significantly from log-linear lines (both S and D: p < .001). Higher productivity increased species diversity when the harvested biomass was below 6 ton/ha, while its impact was negative when the biomass exceeded 6 ton/ha. Declining soil pH negatively affected species richness and diversity ( Figure S5c,d). The within-treatment relationships of species richness and diversity with pH were all positive. The pH-bytreatment interactions were not significant (S: p = .35; D: p = .72).
The fitted splines ( Figure S6c,d) deviated significantly from a horizontal line (both S and D: p < .001).
The SEM with the hypothesized (indirect) impacts of N and SO 2 deposition on diversity (D) has coefficients that deviate significantly from zero and correspond in sign with the hypothesized effects, except for the non-significant link between SO 2 deposition and soil pH ( Figure S6a). If all possible direct and indirect effects are included, only the negative impact of atmospheric N deposition on soil pH is significant ( Figure S6b). The direct effect of N deposition on diversity in this SEM is large and negative but not statistically significant, also when the effect of SO 2 deposition is removed from the model.

| Functional group composition
In the control plots, legumes (p < .001) and short forbs (p = .01) decreased during the first 30 years but they recovered after 1987 F I G U R E 2 Changes in (a) species richness (the number of species in 50 samples) and (b) species diversity (weighting each species with its frequency F j ) in the control plots and in the plots that received Ca, N and NPK additions. The fitted lines are smoothing splines with 5 degrees of freedom using a log-linear model (p < .001 and p = .003, respectively; Figure 3a,b). The abundances of legumes, short forbs and short grasses in the control plots were negatively related to atmospheric N deposition (legumes: Kendall's τ = −0.471, n = 39, p < .001; short forbs: τ = −0.274, n = 39, p = .014; short grasses: τ = −0.389, n = 39, p = .001). The abundance of legumes, short forbs, tall forbs and short grasses was not related to any of the seasonal temperature variables that were investigated. The abundance of tall grasses was weakly correlated with average spring temperature (p = .047).
Galium verum and Leontodon saxatile) and short graminoids (e.g. Luzula campestris) disappeared during the first decade, while the short grasses were replaced by tall grass species (Arrhenatherum elatius, Dactylis glomerata, Elymus repens). The tall forbs Anthriscus sylvestris and Heraleum sphondylium established themselves in these highly productive plots 20 and 15 years, respectively, after the start of the treatments. During the last decade, a few short grass (e.g., Anthoxanthum odoratum, Festuca rubra) and short forb species (Plantago lanceolata, Leucanthemum vulgare) that had disappeared earlier, re-established. The legume species that had disappeared did not return in these plots.
In the N addition plots, all legumes went locally extinct within a few years and the short forbs strongly declined (Figure 3a,b). The short forb species that disappeared were characteristic for relatively high soil pH values: for example, Lysimachia nummularia and Ranunculus repens (Kruijne et al., 1967). The recovery of species richness and diversity after 1987 in the N-only plots was mainly due to an increase in short forbs (p < .001), such as Hypochaeris radicata and Leontodon saxatile, characteristic of soils with low P supplies (Kruijne et al., 1967). None of the legume species did recover in these plots.
During the first decades, the tall grasses expanded in the Ca, K, P and PK plots (p < .001) but not in the control plots (p = .823; Table S9a). After 1987, short forbs recovered in the limed and in the K and P plots (p < .001) and reached in the Ca and P treatments higher abundances than in the control plots ( Figure S7b, Table S8d).

| Plant species composition
Variance decomposition of the plant species composition showed that the treatment main effects (df = 7) explained 37% of the total variation in species composition, followed by the year main effects (22%, df = 38) and the treatment-by-year interaction (21%, df = 252).
In terms of mean squares (taking into account differences in df), the treatment-by-year effects are small compared to the treatment and year main effects. Nevertheless, all three effects were highly significant (p < .001, using 999 permutations; Table S10). The block effect was only tiny (2%, df = 1).
The first, plot-adjusted PCA summarizes the dynamics in species composition ( Figure 4a). It reveals that there is a strong trend over the years that is shared by all plots and treatments ( Figure S9

| Dynamics of plant species abundances
Subsequently, we analysed the dynamics of the 40 species found in more than 2% of all samples in the control plots. These species revealed amazingly diverging temporal patterns. These dynamics vary from steady increase or decline to seeming random dynamics or regular oscillations with remarkably constant intervals (Figure 6; for all species and all plots Figure S10). Almost all dicot species (20 out of 22 species) followed more or less regular fluctuations, while most grass species (12 out of 14 species) showed no clear patterns.  (Table S11). However, the residuals of the regressions of species abundance on temperature or precipitation showed fluctuations similar to the species abundances (data not shown), indicating that fluctuating weather conditions could not explain the observed oscillation patterns.

| The impacts of N deposition on plant species composition in the control plots
In Western Europe, nitrogen deposition has declined during the last decades (Engardt et al., 2017), but in many other parts of the world it is expected to increase further in the near future (Galloway et al., 2004) while the impacts on biodiversity at a global scale might be far reaching. In our control plots, decline and recovery of species richness, diversity, legumes and short forbs followed the increase and decline of atmospheric N deposition. In the 1980s when N deposition and total acid deposition reached their maximum, not only the different diversity measures and the abundance of short forbs and legumes declined to their lowest levels, but in these years also the trajectories of the total plant species composition (in ordination space) reached their turning points in all plots. Unfortunately, we cannot make a proper comparison with the dynamics of hayfields that experienced continuously low N deposition during the last 60 years and hence could serve as an appropriate control. During this period, all parts of the Netherlands (and neighbouring countries) were exposed to very high levels of atmospheric N deposition about ten times higher than the N deposition in preindustrial times.
After 1987, plant species diversity in the control plots increased again, but the former species composition did not recover completely.
After 60 years, diverse plant communities had developed again, but with a species composition that differed from the communities at the start of the experiment, although in all years the grasses Festuca rubra, Agrostis capillaris and Anthoxanthum odoratum remained the dominant species. During the first decennia species characteristic of grazed pastures (e.g. Trifolium repens, Lolium perenne, Poa trivialis) disappeared and did not return, while during the last decennia species of relatively phosphorus-poor hay meadows established themselves (e.g. Danthonia decumbens, Carex panicea) or increased (e.g. Lotus corniculatus, Luzula campestris; Kruijne et al., 1967). The close relationships that we found between plant species diversity and atmospheric N deposition seem to confirm the results of many other studies that investigated the impacts of high N deposition on F I G U R E 5 Trajectories of plots (a) and the positions of selected species (b) resulting from a principal components analysis of logtransformed plant species frequencies, adjusted for year effects. Selected species have more than 20% of their variance explained. These diagrams display 46% of the residual variation. Selected species have more than 15% of their variance explained. Filled circles: start of trajectory (1958 or 1966); filled squares: end of trajectory (2017); up triangle: 1987; down triangle: 2005. Species abbreviations are given in Table S1 [Colour figure can be viewed at wileyonlinelibrary.com] plant species diversity Clark & Tilman, 2008;Duprè et al., 2010;Stevens et al., 2004Stevens et al., , 2010Storkey et al., 2015;Tian et al., 2016).
It is possible that the new species composition reflected not only the changed soil conditions but also the warming climate. We found that the frequency of a large part of the 40 investigated species was positively or negatively related to seasonal temperatures, which is in line with the results of many other studies (e.g. Parolo & Rossi, 2008;Tamis et al., 2005;Thomas et al., 2004). However, there were no effects of any of the seasonal temperature variables on species richness, diversity or the abundance of legumes or short forbs.
The dynamics of the individual species were in most cases not related to the changes in nitrogen deposition. Only one of the grass species (Holcus lanatus) followed the changes in nitrogen input and soil pH. In most dicot species, we observed more or less regular fluctuations in abundance. Oscillations in animal populations are often explained by predator-prey or host-parasite interactions (Turchin, 2003). A rapidly increasing body of literature suggests that such predator-prey cycles also occur in plants and are driven by the dynamics of root parasitic fungi and nematodes which affect and are affected by the different plant species (Bever et al., 2012;Bezemer et al., 2006;Van de Voorde et al., 2012). The observed regular fluctuations confirm earlier theoretical predictions based on Lotka-Volterra models for plant-pathogen interactions (Bever et al., 2012).
It is possible that the steep recovery of legume species in the control plots after 1995 has been facilitated by the near absence of legumes between 1975 and 1995. During these two decades, root parasitic nematodes and fungi specialized on the legumes might have disappeared relaxing the pathogen pressure on these species (de Deyn et al., 2004;Mommer et al., 2018;Sarathchandra et al., 1995).

| The impacts of productivity and soil acidity on plant species diversity
The next question to be answered is whether the impact of N deposition on soil pH or its impact on plant production caused the vegetation diversity and species composition to change in the observed direction. Most studies assume that an increase in standing biomass and enhanced above-ground competition for light is the primary mechanism responsible for the decline in plant species richness and diversity when N supply increases (Borer et al., 2014;Hautier et al., 2009;Suding et al., 2005).
Remarkably, in our experiment we found no positive correlation between N deposition and productivity in the control plots, while we measured only negligible effects of N addition on biomass production and clear negative impacts on diversity. NPK fertilization did increase productivity very strongly which resulted in a sharp decline of diversity, and the local extinction of all short forb and legume species. Over all treatments, we found a weak hump-shaped relationship between species diversity and productivity with species diversity declining at high productivity levels (in the NPK plots) which confirms classical knowledge (Al Mufti et al., 1977;Grime, 2001), but in the control plots changes in productivity were not correlated with the loss or recovery of species diversity. These results agree with Grace et al. (2012Grace et al. ( , 2016 who found that only a small fraction of the variance in species richness in their grassland plots was explained by the species richness-productivity relationship. They emphasized the need of understanding the underlying mechanisms. On the other hand, species richness, diversity and frequencies of short forbs and legumes in the control plots were strongly correlated with soil pH, while soil pH was negatively correlated with F I G U R E 6 Dynamics of the abundance (F i ) of three grass species (Anthoxanthum odoratum (a), Festuca rubra (b), Holcus lanatus (c) and three forb species (Rumex acetosa (d), Leucanthemum vulgare (e), Plantago lanceolata (f)) in one of the control plots. See Figure S1 for all species and all control plots [Colour figure can be viewed at wileyonlinelibrary.com] atmospheric N deposition in the preceding year. Treatments (Ca and P) with positive effects on soil pH had positive impacts on the two diversity measures and the abundance of legumes and short forbs, while N addition resulted in rapid soil acidification and severe losses of diversity and complete disappearance of legumes. During the first decennia (1958)(1959)(1960)(1961)(1962)(1963)(1964)(1965)(1966)(1967)(1968)(1969)(1970)(1971)(1972)(1973)(1974)(1975) not only N deposition, but also the former high S deposition contributed to the observed soil acidification.
After 1975 the contribution of sulphur oxide deposition to total acid deposition decreased rapidly, so that we did not find any correlation between sulphur oxide deposition and soil pH.
The first Structural Equation Model that we built confirms these results. There was a negative relationship between N deposition and soil pH and no effects on plant production, while diversity significantly decreased with declining soil pH. The second model including all possible interactions confirmed again the significant effects of N deposition on soil acidity and the absence of any effect on productivity. In this model, none of the impacts on diversity were significant, which can probably be attributed to the nonlinear effects of time that were not covered by the linear relationship between diversity and year that was assumed in the model. In Section 4.4, we discuss these important non-linear temporal effects in more detail.
Summarizing, we conclude that in our experiment the increase and decline of nitrogen deposition acted by its impacts on soil acidity and not by its effects on productivity or standing biomass.

| The effects of liming and N addition treatments on plant species diversity
So far, our results suggest that increased N deposition leads to soil acidification with important negative effects on species diversity, while reduction of N deposition leads to recovery of soil pH and subsequently to increased plant species diversity. However, the treatment effects in our experiment blur this viewpoint. In all non-limed plots soil pH dropped significantly between 1958 and 1987 confirming the acidifying effects of atmospheric N deposition and its negative effects on species richness and diversity, but also in the limed plots where soil acidification was more than compensated, severe losses of species richness and diversity were measured. Apparently, not only soil acidification, but also other factors contributed to a rapid decline of species diversity and the local extinction of species.
A second important observation is that also in the N and NPK treatments species richness and diversity increased again after 1987, while in these plots the amounts of added N were four times higher than the N inputs by atmospheric deposition. This does not justify the conclusion that the decline and increase of diversity are merely due to the increase and decline of atmospheric N deposition. Nevertheless, at the end of the experiment the diversity in the N plots was about one-third and in the NPK plots about two-third lower than in the control plots, confirming the negative impacts of increased N deposition on plant species diversity.
A similar recovery of diversity after N additions had stopped was observed in other long-term studies as well (Clark & Tilman, 2008;Power et al., 2006;Stevens, 2016;Storkey et al., 2015;Strengbom et al., 2001). In the Park Grass Plot experiment, established in 1856 in the United Kingdom (Silvertown et al., 2006), a recovery of species diversity took place from 1990 onwards, which coincided with the decline in atmospheric N deposition in the UK (Storkey et al., 2015). However, in this same experiment diversity also recovered in the plots where N fertilization was continued, although recovery was faster in the plots where N addition had stopped. These findings agree with our observations, although Storkey et al. (2015) concluded-without any restriction-that declining N deposition resulted in the recovery of diversity.

| Local extinction and establishment: a matter of time
It is striking that the pattern of decline and recovery of diversity, short forbs and legumes was independent of the applied treatments which manipulated productivity or soil acidity. This finding strongly suggests that at least part of this dynamics is the consequence of the slow response of the plant community to the new conditions created at the start of the experiment or changes in environmental factors that we did not investigate, but nevertheless changed during the experimental period.
During the first decades, many species disappeared since they were not adapted to the new hay meadow conditions created when the experiment started in 1958. Plant species characteristic of grazed pastures (e.g. Cynosurus cristatus, Lolium perenne, Phleum pratense; Kruijne et al., 1967) that were present at the start of the experiment had only disappeared after 10-20 years. It is important to note that such changes probably occur at the beginning of each field experiment due to the start of homogeneous and consistent management and the intensive monitoring that each field experiment requires.
The new species observed during the last three decades apparently needed several decades to arrive and establish. Hay meadow species of relatively nutrient-poor substrates only established after 1987 (e.g. Danthonia decumbens, Carex panicea, Ajuga reptans). It seems that both local extinction and new establishment of species adapted to the changed environmental conditions can need several decades.
A convincing example is provided by the two NPK treatments that had started in two different years (1958 and 1966).
Productivity in these plots reached high productivity levels immediately after start of the fertilization, but in both treatments the tall forbs Anthriscus sylvestris and Heracleum sphondylium, characteristic of very fertile soils, needed 15-20 years after start of the treatment to establish themselves. In these fully fertilized plots, analysis of the dynamics of the individual species revealed another striking phenomenon. Species of intermediate soil fertility levels (e.g. Leucanthemum vulgare and Anthoxanthum odoratum) had first disappeared due to the strongly increased biomass but re-established themselves during the last years. Possibly, strong natural selection in these highly productive plots had favoured genotypes that are able to survive on these fertile soils. Future research can test this hypothesis by collecting shoots and seeds in the control and the NPK plots and re-introduce these species in low and highly productive vegetation. The growth response following introduction might reveal possible genetic differences between the two plant populations.

| CON CLUS IONS
Our findings confirm the conclusions of Stevens (2016) who reviewed the studies that analysed the impacts of reduced nitrogen inputs. He found that soil chemistry can recover rapidly but that recovery of vascular plant species composition can take many years, possibly due to the lack of sufficient seed sources in the neighbourhood. In the landscape that nowadays surrounds the Wageningen Grassland Experiment, source populations of almost all grassland species characteristic of low and intermediate soil fertility have disappeared due to strong agricultural intensification. However, we observed that establishment of some species can take many years, even when the species was present in other plots within a radius of less than 15 m. For example, in 1973 Anthriscus sylvestris newly established in one of the two replicated NPK plots but it took 15 years (1988) before this species also established in the other replicated NPK plot.
In a meta-analysis of studies on recovery after pulse disturbances, Hillebrand and Kunze (2020) showed that the recovery of species composition is often much slower than that of ecosystem functions, also after other kinds of disturbance. In our experiment that spanned a much longer time period than most other experiments, this long-term recovery of plant species diversity becomes visible.
Our results show that policies aiming to reduce ammonia and nitrogen oxide emissions will contribute to the restoration of species-rich, flowering meadows, but our study also provides strong arguments for long-standing patience after measures have been taken to restore plant species diversity in wild plant communities.

ACK N OWLED G EM ENTS
We thank all people who were involved during the last 60 years in the vegetation sampling, the sorting of the biomass samples and the identification of species.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are openly available