A comparison between data requirements and availability for calibrating predictive ecological models for lowland UK woodlands: learning new tricks from old trees

Abstract
Woodlands provide valuable ecosystem services, and it is important to understand their dynamics. To predict the way in which these might change, we need process‐based predictive ecological models, but these are necessarily very data intensive. We tested the ability of existing datasets to provide the parameters necessary to instantiate a well‐used forest model (SORTIE) for a well‐studied woodland (Wytham Woods). Only five of SORTIE's 16 equations describing different aspects of the life history and behavior of individual trees could be parameterized without additional data collection. One age class, seedlings, was completely missed as they are shorter than the height at which Diameter at Breast Height (DBH) is measured. The mensuration of trees has changed little in the last 400 years (focussing almost exclusively on DBH) despite major changes in the nature of the source of value obtained from trees over this time. This results in there being insufficient data to parameterize process‐based models in order to meet the societal demand for ecological prediction. We do not advocate ceasing the measurement of DBH, but we do recommend that those concerned with tree mensuration consider whether additional measures of trees could be added to their data collection protocols. We also see advantages in integrating techniques such as ground‐based LIDAR or remote sensing techniques with long‐term datasets to both preserve continuity with what has been performed in the past and to expand the range of measurements made.


Introduction
The link between ecosystems and human society and the benefits that the latter obtains from the former are clearly acknowledged (Millennium Ecosystem Assessment 2005; TEEB 2010; DEFRA 2011). To understand the ways in which ecosystems will change, along with the goods and services that human societies obtain from them, models are needed that can be projected into the future (Clark et al. 2001; Evans 2012; Evans et al. 2012, 2013a). These models will typically be process based, as these are more appropriately projected into novel conditions than statistical models. Such process-based models are highly data intensive, and data availability may constrain the ability to develop, parameterize, and test them (Evans et al. 2014; Lonergan 2014).
Forests and woodlands are important from global to local scales and provide many goods and services, ranging from their role in the global carbon cycle to their esthetic and amenity value as well as timber. The UK's recent National Ecosystem Assessment suggested that woodlands provide many ecosystem services varying from provisioning services such as timber and fuel wood, to regulating services such as climate regulation and flood regulation, to cultural services such as recreation and tourism (DEFRA 2011). The value placed on the carbon sequestration service provided by UK woodlands was £680 million/year, with a further £77 million/year due to the carbon sequestered in harvested wood products, while the value of timber production was estimated as £113-131 million/year (DEFRA 2011; Quine et al. 2011). A more recent analysis of the economic benefit of woodlands in the UK (Economics Europe 2015) suggested that the total value of UK woodland is £270 billion.
While the language of ecosystem services has only been current for 20 years or so (Daily 1997), the goods and services provided by woodlands and forests have been valued for a long time. An example from the 18th century reads: "forests. . .are of considerable service to neighbourhoods that verge upon them by furnishing them with peat and turf for their firing; with fuel for the burning of their lime; and with ashes for their grasses; and by maintaining their geese and their stock of young cattle at little or no expense" (White 1977, p. 24). As this quote illustrates, the services that were valued in the 18th century were fuel and shelter. An earlier text by John Evelyn (1665 [but presented at the Royal Society in 1662]) elaborates the many services and goods obtained from woodland in the 17th century (Evelyn 2012). These include timber for building ships, dwellings and weapons, fuel, food and shelter. Evelyn's book also details how foresters measured trees in the 17th century. Several pages (pp. 82-87) are given over to describing the sizes of particular trees in terms of the diameters and heights of their trunks and the diameter of their canopies. These measurements are then used to calculate either the amount of timber (and its value) that could be obtained from the tree or the number of animals that could be provided with shelter under its canopy. Four hundred years ago, it is clear that the main good obtained from trees was timber: it was a major construction material, and as Evelyn makes clear, the construction of ships and the security of the country depended on having good quality timber available.
The main modern reference work on forest mensuration in the UK deals with, for standing trees, the measurement of trunk diameter (DBH), basal area (area at breast height), timber height (height to which usable timber extends), tree height, and timber volume (Matthews and Mackie 2006). There is no mention of tree measurement for any purpose other than timber production in this handbook. A wide-ranging review of this subject includes the measurements listed above and additionally includes mention of methods to measure tree crown area, crown depth, and radial growth and to estimate leaf surface area, leaf weight, and sapwood area (Laar and Akca 2007). Again there is little, if any, mention of an ecosystem service other than timber production.
DBH and tree height are the two measurements made on trees in the UK's Environmental Change Network (ECN). ECN was established in 1992 and makes standardized measurements at fixed intervals (for trees, every 3 years for DBH and every 9 years for height). ECN was one of the original members of the European network of the International Long-Term Ecological Research network (ILTER). ILTER has adopted a comparable monitoring approach since its launch in 2003, as does the US-LTER that has been operating since 1980. Also in 1980, the Smithsonian Tropical Research Institute (STRI) established a forest plot on Barro Colorado Island, Panama. STRI has developed tree census techniques in which all trees in a plot have DBH measured every 5 years (Condit 1998). This project has now expanded from the original plot to include over 60 plots globally (including the one referred to in this article as the Oxford plot). These comprise the Global Earth Observatory network (ForestGEO), which is now collecting data on carbon pools and fluxes on some of its plots. These various projects, aimed at collecting long-term data using forestry techniques to address ecological questions, have already been extremely valuable; for a review of the ECN, see Morecroft et al. (2009) and of ForestGEO, see Anderson-Teixeira et al. (2015).
To address concerns about the likely impact of environmental changes in the future, it will be necessary to move from describing ecosystems to predicting their likely future state in changed conditions. Ecology has not traditionally focussed on prediction despite repeated calls to do so (Simberloff 1981; Judson 1994; Grimm 1999; Clark et al. 2001; Evans et al. 2013a). For an ecosystem containing taxa with long-lived individuals, it is almost impossible to envisage how prediction can be achieved without the use of computational models.
Ecological models that are capable of being projected into the future, possibly into novel conditions outside the parameter space within which the data were collected, will have to be process based (Evans 2012; Evans et al. 2013a). The problems associated with projecting statistical models outside the bounds of the data collection are well known (Rice 2004). Process-based models are extremely demanding of data, as there are often many interacting processes each requiring parameterization (Evans et al. 2014; Lonergan 2014). For long-lived species, such as trees, parameterization is especially demanding as most processes occur slowly, and so require long-term datasets to ensure that robust estimates of the relevant rates can be obtained (Moustakas et al. 2006). It is rare that datasets exist for creating such models, and so data, the collection of which was originally motivated by some other purpose, usually need to be identified and processed in a manner that makes them suitable for inclusion.
Here, we test the capability of existing data to parameterize a widely used predictive individual-based model, SORTIE (Pacala et al. 1996; Moorcroft et al. 2001; Strigul et al. 2008; Kunstler et al. 2009; Tanentzap et al. 2013). Similar data would have been required for parameterizing other similar models (see Bugmann (2001) and Snell et al. (2014) for reviews and Table 1 for a summary), but we use SORTIE as an example. We have based this calibration on a well-studied woodland, Wytham Woods in Oxfordshire, UK (Savill et al. 2010), for which we have data from a set of ECN plots and a ForestGEO plot. Our purpose was to determine whether a well-established process-based model could be parameterized for the UK with available data, and if not, what additional data would be required. It is therefore a test of our current capability to create models such as those that were found to be lacking by the NEA (DEFRA 2011; Evans et al. 2013a).

Data availability
The data that we have available are three datasets from Wytham Woods (Oxfordshire, UK) supplemented by one from Alice Holt (Hampshire, UK): Dawkins, Field, and Kirby (Dawkins and Field 1978; Horsfall and Kirby 1985; Kirby et al. 1996; Kirby 2004).
In total, there were data available on 21,614 individual trees, each of which had been measured on up to seven occasions. The data from the two ECN plots and the Oxford plot have been published elsewhere.

Model's data requirements
SORTIE is an individual-based model initially developed for the Great Mountain Forest (Connecticut) and used in the USA (Pacala et al. 1996; Strigul et al. 2008), New Zealand (Coomes et al. 2009; Kunstler et al. 2009, 2011; Forsyth et al. 2015), Canada (Canham et al. 1999; Bose et al. 2015), Scotland (Tanentzap et al. 2013), and the Pyrenees (Ameztegui and Coll 2011; Ameztegui et al. 2015). It produces projections of the community structure of a forest and its carbon flux. Its basic assumptions are that trees compete for light, that adult trees grow, survive, and reproduce in relation to their size, and that saplings and seedlings grow and survive in response to their light environment. Trees are divided into three age classes: seedlings (trees <1.35 m tall), saplings (trees with DBH <10 cm), and adults (trees with DBH >10 cm). Table 2 summarizes the algorithms, parameters, and data that are required for every tree species in order to be used in SORTIE. In total, there are six pieces of data required for every seedling, five for every sapling, and seven for every adult. Examination of the data available showed:
(1) Seedlings: No data exist in any of the available datasets on trees that are shorter than the height at which DBH is measured. Therefore, no seedlings could be included in SORTIE UK.
(2) Saplings: DBH data exist in all datasets and height in the ECN datasets. Therefore, it was possible to estimate growth rate, but this could not be related to light environment without light data. An alternative would be to use SORTIE's capability to estimate the light environment at any specific location. To achieve this, the trees would need to be mapped accurately and canopy openness would need to be known. Using the DBH records as records of presence, we have been able to estimate mortality. However, as the allometric relationships of saplings all relate to diameter at 10 cm above ground (D10) and this is not included in the available datasets, no allometric relationships could be derived. This meant the height data were not usable without further data collection.
(3) Adults: DBH exists in all datasets, and height data exist in the ECN datasets. Therefore, it was possible to fully parameterize the height-DBH allometric equation, but the two other allometric equations could not be parameterized. We could use the DBH records to estimate growth rate, and we were able to estimate the effect of size on growth rate. As with saplings, we have been able to estimate mortality using DBH records as records of presence.
Therefore, using the available data, we were able to parameterize five of the 16 equations required to run SORTIE. Over the last 4 years, we have collected the missing data from all the ECN trees and a subset of the trees in OXF, and we are now in a position to use SORTIE at Wytham Woods.
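To illustrate what parameterizing a height-DBH allometric equation from paired measurements involves, the following is a minimal sketch rather than the procedure used in this study. It assumes a saturating curve of the form H = 1.35 + (H_max - 1.35)(1 - exp(-b DBH)); that functional form, the function names, and the decision to fix H_max in advance are all illustrative assumptions and are not taken from the text. Only trees with both a DBH and a height record contribute to the fit.

```python
import math

BREAST_HEIGHT_M = 1.35  # height threshold for DBH measurement, as given in the text


def predict_height(dbh_cm, h_max_m, b):
    """Assumed saturating height-DBH curve (illustrative form, not from the source)."""
    return BREAST_HEIGHT_M + (h_max_m - BREAST_HEIGHT_M) * (1.0 - math.exp(-b * dbh_cm))


def fit_slope(records, h_max_m):
    """Estimate b for a fixed, assumed h_max_m by least squares through the origin.

    Rearranging the assumed curve gives
        -ln(1 - (H - 1.35) / (h_max - 1.35)) = b * DBH,
    so b is the slope of a straight line through the origin.
    records: iterable of (dbh_cm, height_m) pairs for trees with both measurements.
    """
    sum_xy = 0.0
    sum_xx = 0.0
    for dbh_cm, height_m in records:
        frac = (height_m - BREAST_HEIGHT_M) / (h_max_m - BREAST_HEIGHT_M)
        if not 0.0 < frac < 1.0:
            continue  # tree lies outside the range the assumed curve can represent
        y = -math.log(1.0 - frac)
        sum_xy += dbh_cm * y
        sum_xx += dbh_cm * dbh_cm
    if sum_xx == 0.0:
        raise ValueError("no usable (DBH, height) pairs")
    return sum_xy / sum_xx
```

In practice, both parameters would normally be estimated jointly with a nonlinear fitting routine; fixing H_max here simply keeps the sketch dependency-free.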

Data volumes
The data demands outlined in the previous section need to be met for each tree species that one wishes to include in the model. We have focussed our efforts on the eight commonest deciduous species in Wytham that between them represent over 99% of all individuals. Even for these common species, the total number of usable data records for each species remains low (Table 3). The reasons for this are the following: Firstly, to estimate the parameters for any relationship usually at least two different pieces of data are needed for each individual (either data on two different variables, e.g., DBH and height or measurements of the same variable at two different time points), thus reducing the number of usable data below the sample size that might exist for either individually. Secondly, each species has to be parameterized for all age classes separately, which obviously reduces the sample size available for any age class below the total for the species. This means that at present, of the 21,614 trees in the datasets, only 726 have provided complete data for the production of the model.
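The way these two requirements shrink the usable sample can be sketched in a few lines. The example below is a hypothetical illustration: the record layout (dictionaries with `species`, `dbh_cm`, and `height_m` keys) is an assumption, not the structure of the Wytham datasets, and a DBH of exactly 10 cm is assigned to adults only because the text leaves that boundary case unspecified.

```python
from collections import Counter

BREAST_HEIGHT_M = 1.35   # seedlings are shorter than this (from the text)
ADULT_MIN_DBH_CM = 10.0  # sapling/adult threshold (from the text)


def life_stage(height_m, dbh_cm):
    """Assign a tree to one of the three age classes described in the text."""
    if dbh_cm is None:
        # No DBH can be recorded on a tree shorter than breast height.
        if height_m is not None and height_m < BREAST_HEIGHT_M:
            return "seedling"
        return None  # stage cannot be determined from the available data
    # Assumption: DBH of exactly 10 cm is treated as adult.
    return "sapling" if dbh_cm < ADULT_MIN_DBH_CM else "adult"


def count_usable_pairs(trees):
    """Count trees with BOTH DBH and height, per (species, age class).

    Trees missing either variable cannot contribute to a height-DBH
    relationship, so they are dropped before counting.
    """
    counts = Counter()
    for tree in trees:
        dbh, height = tree.get("dbh_cm"), tree.get("height_m")
        if dbh is None or height is None:
            continue
        counts[(tree["species"], life_stage(height, dbh))] += 1
    return counts
```

Each additional variable required by an equation adds another filter of this kind, which is why the usable total falls so far below the number of trees measured.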

Discussion
It is perhaps unsurprising that the data collected by ECN and ForestGEO at their plots in Wytham Woods do not allow the parameterization of the model we have used. These surveys were not set up with this purpose and so should not be criticized for not meeting its needs. However, while it may seem elementary that "plants stand still to be counted and do not have to be trapped, shot, chased, or estimated" (Harper 1977), monitoring and analyzing long-term datasets often involves accounting for data collected for other purposes. Today we measure trees in much the same way as that recorded by Evelyn in the middle of the 17th century (Evelyn 2012), and the main measurement is a convenient diameter of the tree at some distance up its trunk (Matthews and Mackie 2006). The use of this measurement seems to be ubiquitous among those involved with monitoring woodlands (Hiley 1954; Matthews and Mackie 2006), but it is important to ask why we are using it, and whether it continues to be of utility. Given the huge changes in the relative values of the goods and services provided by woodlands to society over 350 years, it is perhaps surprising that our methods of tree mensuration have changed so little.
Wytham Woods is well studied and has good data availability; it is likely that if models cannot be parameterized at this location, they will be difficult to parameterize for any site in the UK without additional data collection. One criticism of what we have carried out could be that we are attempting to use a model that is unusually demanding of data. However, competing models are broadly comparable with SORTIE (Table 1; Bugmann 2001; Snell et al. 2014). The rationale of our work is that there is a societal demand for projections of the future state of ecosystems (DEFRA 2011). To meet this demand, several key issues need to be addressed, principally the modeling framework and data availability. At present, the data even at a relatively well-studied location do not seem to be adequate to parameterize models that could be used to project the state of ecosystems into the future. To achieve this, there would seem to be a need to bring modeling and measurement closer together and to allow the former without jeopardizing the latter. It is hoped that this article can be seen as an attempt to do so.
The use of any measurement of tree size that is taken at some distance from the ground will automatically exclude any individual that is shorter than the height at which this measurement is made, and this is why there are no data on seedlings in any dataset to which we have access. Obviously, the higher the measurement is taken, the greater the number of individuals that will be excluded. This is shown in Figure 1, which demonstrates that a substantial number of individuals are predicted to exist below the currently recorded minimum size classes. Seedling data from ECN taken within the same plots on which larger trees are measured reveal that for some species, seedlings make up a very large fraction of the individuals in the population; for example, for ash, 26% of all individuals are seedlings (Table 4). Overall, the lack of lower size classes is a serious data omission not only for predictive modeling calibration but also for forest assessment, as it is impossible to detect whether there is a lack of recruitment for some species.
The second point worth noting about DBH is that for modeling purposes, it could be replaced with any trunk diameter measurement. All relationships that use DBH could use any available trunk diameter including D10. Had there been good data on D10 for all species, then all allometric equations could be related to D10 rather than DBH, which would then be redundant. There are many reasons to continue measuring DBH: it is simple, ergonomically undemanding, and can be assessed with greater accuracy than measurements lower on trunks. These reasons, in addition to the long-term datasets that have been created, justify continuing the measurement of DBH, but we would argue for new measurements to be added.
The final point emerging from this analysis is that measuring only DBH with no other measurements allows very few parameters to be estimated. We have shown that it is possible to use DBH records to assess the mortality of trees, but other than that, little parameter estimation is possible: only annual diameter growth rate and the effect of size on growth rate can be estimated, and then only in adults if light data are missing for saplings. However, note that the light data can be estimated from SORTIE given information on canopy openness and tree positions. ECN has measured tree height in addition to DBH, and this allows the parameters of the height-DBH equation to be estimated for adults. The variables that most increase the ability to calibrate allometric equations and thus develop predictive models are D10 and light (Table 5). These both add an additional three parameterizable equations to the ones that can be parameterized with DBH and height. Figure 2 illustrates how the realism of the modeled forest increases as additional variables are considered.
It is inevitable that as one includes additional information, then one can achieve greater realism, but active decisions should be made about how to trade off realism against the costs of data collection, processing, and simulation (Evans et al. 2013b; Weisberg 2013).
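The two quantities that DBH-only records do support, mortality and annual diameter growth, can be sketched as follows. This is an illustrative outline under stated assumptions, not the analysis performed for this study: it treats a tree as having survived if it has a DBH record in the later census (DBH records used as records of presence), annualizes mortality with the standard compound-rate formula, and uses hypothetical tree identifiers and function names.

```python
def annual_mortality(n_start, n_survivors, years):
    """Annualized mortality m such that n_survivors = n_start * (1 - m) ** years."""
    if n_start <= 0 or years <= 0:
        raise ValueError("need a positive starting count and census interval")
    return 1.0 - (n_survivors / n_start) ** (1.0 / years)


def annual_diameter_growth(dbh_first_cm, dbh_last_cm, years):
    """Mean annual diameter increment (cm/year) between two censuses."""
    return (dbh_last_cm - dbh_first_cm) / years


def summarise_censuses(first, second, years):
    """Compare two censuses given as {tree_id: dbh_cm} dictionaries.

    Assumption: a tree present in `first` but absent from `second` has died.
    Trees that appear only in `second` are recruits and are ignored here.
    Returns (annual mortality rate, {tree_id: annual growth in cm/year}).
    """
    survivors = [tree_id for tree_id in first if tree_id in second]
    mortality = annual_mortality(len(first), len(survivors), years)
    growth = {
        tree_id: annual_diameter_growth(first[tree_id], second[tree_id], years)
        for tree_id in survivors
    }
    return mortality, growth
```

Relating the resulting growth rates to tree size requires nothing beyond the same DBH records, whereas relating them to light requires the additional data discussed above.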
We are not advocating an end to measuring DBH. If we were to do so, the valuable long-term datasets that have been built up to now would be lost. At the very least, an alternative measurement protocol would need to run alongside one using DBH for a period of time to allow the interconversion of one set of measurements to the other. What we do suggest is giving some thought to the addition of other measurements to standard protocols. In the relatively near future, it is likely that the utility of remote sensing data will increase to a point where it would be desirable to integrate remotely sensed data with these long-term datasets. An example would be the ability of a ground-based LIDAR system to measure DBH, basal area, woody biomass, stand height, foliage profile, crown diameter, and stem count (Zhao et al. 2011; Yang et al. 2013). This would seem to be a cost-effective means to collect detailed information on woodland structure, but repeated surveys would be needed to build up adequate data to assess all parameters and the way in which they change with time. The recently announced European Space Agency Biomass satellite may provide a further option. Biomass will use P-band synthetic aperture radar (420-450 MHz) to map the woody parts of forests with a return rate of 6 months over a projected 5-year period. Unfortunately, due to conflicts with military radar use, this will apparently not be available over northern parts of North America or Europe, including the UK.
If it matters to society that we understand how ecosystems and the services they provide might change into the future, then we need the data to develop the models to do so (Evans et al. 2014; Lonergan 2014). To project systems into the future, into potentially unknown conditions, process-based models are needed (Evans et al. 2013a). By their nature these are demanding of data; datasets (even ones of long duration) will be of no utility if they only contain data on a single variable. Our conclusion is that despite the huge efforts that have gone into measuring and recording data, the data that were available in this ecosystem at the start of this project were insufficient to parameterize a widely used process-based model. To do so, a significant amount of additional effort was required even for this well-studied ecosystem. This means that it will be difficult, without substantial additional data collection, to develop the models required to make projections of woodland ecosystems that were felt to be desirable by, for example, the UK's National Ecosystem Assessment (DEFRA 2011).

Conflict of Interest
None declared.

Figure 2. The nature of the modeled forest, and hence our ability to both predict and understand it, increases in realism and complexity as data on more parameters are considered. If we consider DBH alone (bottom), then the forest is simply a series of trunks in cross section; if we additionally include height, D10, crown measurements, and light, then a simplified but recognizable forest appears.