Using legacy data to reconstruct the past? Rescue, rigour and reuse in peatland geochronology

There is a growing interest in the rescue and reuse of data from past studies (so‐called legacy data). Data loss is alarming, especially where natural archives are under threat, such as peat deposits. Here we develop a workflow for reuse of legacy radiocarbon dates in peatland studies, including a rigorous quality assessment that can be tailored to specific research questions and study regions. A penalty is assigned to each date based on criteria that consider taphonomic quality (i.e., sample provenance) and dating quality (i.e., sample material and method used). The weights of quality criteria may be adjusted based on the research focus, and resulting confidence levels may be used in further analyses to ensure robustness of conclusions. We apply the proposed approach to a case study of a (former) peat landscape in the Netherlands, aiming to reconstruct the timing of peat initiation spatially. Our search yielded 313 radiocarbon dates from the 1950s to 2019. Based on the quality assessment, the dates—of highly diverse quality—were assigned to four confidence levels. Results indicate that peat initiation for the study area first peaked in the Late Glacial (~14,000 cal years BP), dropped during the Boreal (~9,500 cal years BP) and showed a second peak in the Subboreal (~4,500 cal years BP). We tentatively conclude that the earliest peak was mostly driven by climate (Bølling–Allerød interstadial), whereas the second was probably the result of Holocene sea level rise and related groundwater level rise in combination with climatic conditions (hypsithermal). Our study highlights the potential of legacy data for palaeogeographic reconstructions, as it is cost‐efficient and provides access to information no longer available in the field. However, data retrieval may be challenging, and reuse of data requires that basic information on location, elevation, stratigraphy, sample and laboratory analysis are documented irrespective of the original research aims.


| INTRODUCTION
Data rescue in the geosciences is a field of rapidly growing interest (Wyborn et al., 2015). Data that have been collected in the past are often referred to as 'legacy' data (Griffin, 2015; Smith et al., 2015).
Many researchers are realising both the scientific potential of reusing data from past studies, and the increasing threat of data loss, particularly concerning data from the pre-digital era. Data loss is alarming, particularly in landscapes where natural archives are degrading or at risk, such as peatlands. Peatlands are under ongoing threat of excavation, drainage, pollution and climate change (e.g. Bragazza et al., 2006;Swindles et al., 2019).
The long-term archives of past environments contained in peat deposits are in some regions largely lost, as is evident from the relatively minute remnants of the once extensive peatlands of northwest Europe (Casparie, 1972;Vos, 2015). Consequently, studies on the formation, dynamics and palaeoenvironmental characteristics of these landscapes could greatly benefit from data rescue, as legacy data may contain information that can no longer be obtained in the field.
Additionally, access to field sites may be difficult due to strict nature conservation regulations in protected peat remnants. Furthermore, limited understanding of how representative these remnants are for the former intact landscape makes field-based studies challenging. Hence, data rescue potentially offers a starting point for peatland research, may provide new insights through meta-analyses (e.g. Ruppel et al., 2013;Tolonen & Turunen, 1996) and can identify knowledge deficits to address with future research. However, data reuse is often challenging due to changing research methods, limited information on data quality, and difficulties regarding data access and retrieval.
Here we aim (1) to develop a workflow for reuse of legacy radiocarbon data in peatland studies, including rigorous quality assessment, and propose ways to tailor the workflow to specific research questions and case studies; (2) to test and evaluate the proposed approach by applying it to a case study of a (former) peat landscape in the northern Netherlands, for which we build a comprehensive dataset of legacy radiocarbon dates.

| BACKGROUND
In this section we briefly discuss the use of legacy data in geoscience, introduce processes of peat formation, review use of (legacy) radiocarbon dates in peatland research, and provide a short introduction to radiocarbon dating. In the last paragraph the case study is introduced.

| Legacy data in geoscience
In the geosciences, legacy data may play a role when analysing landforms or processes of the past or that change through time, and when reinvestigating previous work (cf. Smith et al., 2015). The distinction between 'new' and 'legacy' data is somewhat artificial, and partly the result of many practical issues such as unknown data storage locations, lack of accessibility, physically degrading storage media or unreadable data formats, and unwritten information on records that disappeared from the scientific community when researchers retired or passed away (Griffin, 2015; Wyborn et al., 2015). Other important factors causing the artificial separation of legacy data include the continuous change in research methods, the technological advances to refine and develop new equipment, and ever-increasing computational power.
Data that were passed on by previous generations of scientists may be used for purposes that diverge from the original research objective for which the data collection was designed (Wyborn et al., 2015). Meta-analyses based on legacy data may yield insights that require a bird's-eye view on the subject matter, crossing boundaries of time and place that limit many case studies. This particularly applies when information is no longer available in the field, or when long-term records are needed to describe and quantify how systems changed through time. However, this requires adequate data access and retrieval, transformation of data to current digital formats, and ways to evaluate data quality and effects of changing research methods to ensure robust meta-analyses. To quote Griffin (2015), "[...] it is up to our community to remove [...] the artificial barriers that presently prevent the access that research requires simultaneously to all of its data."

| Processes of peat formation
Peatlands form distinctive ecosystems at the interface between land and water. Their initiation is primarily dependent on the decay rate of biomass (and resulting production-decay balance), which is predominantly influenced by moisture level (Charman, 2002a). Factors that may influence moisture status and consequently peat growth potential include climate (e.g. Weckström et al., 2010), changes in hydrological base level (such as sea level rise, e.g. Berendsen et al., 2007) or regional groundwater changes (e.g. Van Asselen et al., 2017), landforms and surface topography (e.g. Almquist-Jacobson & Foster, 1995; Loisel et al., 2013; Mäkilä, 1997), impermeable deposits or resistant layers in the soil profile (e.g. Breuning-Madsen et al., 2018; Van der Meij et al., 2018), and anthropogenic influence (e.g. Moore, 1975; Moore, 1993). Some of these factors such as climate may act at larger spatial scales, whereas others, for instance impermeable layers, could also have more local effects.
Given favourable boundary conditions, peat initiation may occur through (a combination of) terrestrialisation (also known as infilling), paludification and primary mire formation (Charman, 2002a; Rydin & Jeglum, 2013a). Terrestrialisation refers to the process where peat forms in or at the edge of existing water bodies. Paludification does not include a true aquatic phase; instead, peat develops directly on previously dry mineral substrate following changes in moisture status that led to waterlogging. Primary mire formation refers to peat formation on newly exposed land (as opposed to paludification, where previous vegetation was present) that has been waterlogged since initial exposure, for instance after deglaciation or land uplift from sea.
Over time, peatlands grow vertically and may reach a point where their surface rises above groundwater level. Isolation from groundwater and resulting strong dependence on rainwater leads to ombrotrophication (Charman, 2002b; Rydin & Jeglum, 2013a).
In addition to vertical growth, peatlands may expand laterally to cover larger areas. Poor drainage adjacent to the peatland may cause paludification of surrounding soils. This is referred to as an autogenic process (Charman, 2002b), but the degree to which this happens and the rate of lateral spread are dependent on allogenic factors such as climate and topography (e.g. Korhola, 1994).
Reconstructing the period of peat initiation requires dating the peat base (also referred to as basal peat; note that this definition of basal peat is much broader than the 'Basisveen Bed' as known in Dutch stratigraphy [TNO-GSN, 2021a]). Peat initiation and subsequent lateral expansion are often not easily distinguished, as both require basal peat dates for reconstruction. Lateral expansion can only be deduced from a series of basal dates (e.g. Chapman et al., 2013; Mäkilä, 1997; Mäkilä & Moisanen, 2007), which in fact is also needed to determine which date indicates the age and location of peat initiation.
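As an illustration of how a series of basal dates can be read, the following minimal Python sketch picks the oldest basal date as the apparent initiation point and derives an apparent lateral expansion rate towards each younger site. The site identifiers, coordinates and ages are invented for illustration, and the straight-line distance and the neglect of dating uncertainty and topography are simplifying assumptions, not part of the approach described in the text.

```python
from math import hypot

# Hypothetical basal peat dates: (site id, x in m, y in m, age in cal years BP).
basal_dates = [
    ("A", 0.0, 0.0, 9000),
    ("B", 300.0, 400.0, 8000),
    ("C", 600.0, 800.0, 7000),
]


def initiation_site(dates):
    """Return the record with the oldest basal age (largest cal BP value)."""
    return max(dates, key=lambda d: d[3])


def apparent_expansion_rates(dates):
    """Map each younger site to distance from the oldest site / age difference (m per year)."""
    origin = initiation_site(dates)
    rates = {}
    for site, x, y, age in dates:
        lag = origin[3] - age  # years between initiation and arrival at this site
        if site == origin[0] or lag <= 0:
            continue  # skip the origin itself and contemporaneous dates
        distance = hypot(x - origin[1], y - origin[2])  # straight-line distance in m
        rates[site] = distance / lag
    return rates


rates = apparent_expansion_rates(basal_dates)
```

In practice a peatland may have several initiation loci, so the single-origin assumption in this sketch would need to be checked against the stratigraphy of each site.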

| Legacy data in peatland research
Meta-analyses of composite datasets often provide new supraregional insights and may point to knowledge gaps that need to be addressed by future research. For instance, in an extensive study by Tolonen & Turunen (1996) [...] (Berendsen et al., 2007; Hijma & Cohen, 2019; Meijles et al., 2018) or to increase insights into peat compaction and land subsidence (e.g. Koster, 2017). Ruppel et al. (2013) studied trends in peatland initiation in North America and northern Europe, through analyses of 1,400 retrieved basal peat dates. Their results not only provided insights into spatiotemporal trends in peat initiation but also indicated a lack of (retrieved) data for the Northwest European Plain.
Future studies, including the research presented in this paper, may complement this picture and further develop our understanding of peatlands through space and time, the influence of autogenic processes and feedbacks, and allogenic causes for changes in peatland dynamics.

| Radiocarbon dating
For environmental reconstructions based on peat archives, radiocarbon (¹⁴C) dating is the preferred method to connect stratigraphies to an absolute time scale. We provide a concise explanation of the radiocarbon method, as well as a summary of the development of its measurement techniques (Figure 1). This is relevant when using data obtained with techniques subject to methodological changes.
Radiocarbon dating is based on the radioactive decay of ¹⁴C. This isotope is produced in the upper atmosphere by cosmic radiation. It oxidises to ¹⁴CO₂, which is incorporated into living organisms through photosynthesis and the food chain. Upon death of the organism, uptake stops, and the radioactive decay of ¹⁴C makes it possible to derive its age (i.e., timing of death).
Although the principle is relatively simple, complications do exist.
First, changes in cosmic ray flux and geomagnetic field strength cause variations in the production rate of ¹⁴C through time (De Vries, 1958).
This requires ¹⁴C dates to be calibrated in order to express them in calendar years. This is primarily done by ¹⁴C dating of tree rings, which are dated absolutely by dendrochronology. Second, isotopic fractionation (mass-dependent effects) during photosynthesis leads to depletion of the heavy isotopes ¹³C and ¹⁴C in plants, the latter causing age aberrations for which measurements need to be corrected.
Third, the half-life of ¹⁴C is 5,730 ± 40 years, whereas originally a value of 5,568 years was used by Libby et al. (1949), who developed the ¹⁴C method.
In the early days of the method, ¹⁴C dates were reported in BP (Before Present, defined as 1950), using the natural ¹⁴C content as a reference. It soon became clear that this is problematic because of the complications mentioned above. These are resolved by the Radiocarbon Convention, which defines the ¹⁴C timescale (Stuiver & Polach, 1977; Van der Plicht & Hogg, 2006):
i. The ¹⁴C radioactivity is measured relative to that of a modern reference material, i.e. Oxalic Acid with a radioactivity of 0.226 Bq/g C;
ii. From this measured radioactivity, the 'radiocarbon date' is calculated using a half-life of 5,568 years;
iii. Radiocarbon dates are corrected for fractionation using the stable isotope ¹³C (to a reference value δ¹³C = −25‰, see below);
iv. Radiocarbon dates are expressed in the unit BP.
The original half-life value was chosen to keep the meaning of earlier reported dates unchanged. The chosen δ¹³C reference value is that of charcoal, wood and plants (including peat). The Convention means that BP should not be taken literally: ¹⁴C years differ from calendar years, and present is not today (or 1950). Calibration converts ¹⁴C dates into calendar dates. These are expressed in cal BP, which is defined as calendar years before 1950 CE. The calibration curves are updated regularly (Figure 1).
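Point (ii) of the Convention can be made concrete with a short Python sketch. It computes a conventional ¹⁴C age from the ratio of the (fractionation-normalised) sample activity to the reference activity using the 5,568-year half-life. The function and variable names are illustrative assumptions, and the sketch deliberately stops at the conventional age: converting to calendar years requires a calibration curve, not a different half-life.

```python
import math

LIBBY_HALF_LIFE = 5568.0      # years, fixed by the Radiocarbon Convention
CAMBRIDGE_HALF_LIFE = 5730.0  # years, the more accurate physical value


def mean_life(half_life):
    """Mean life (years) corresponding to a half-life (years)."""
    return half_life / math.log(2)


def conventional_age(activity_ratio):
    """Conventional 14C age (BP) from the sample/reference activity ratio."""
    if activity_ratio <= 0:
        raise ValueError("activity ratio must be positive")
    return -mean_life(LIBBY_HALF_LIFE) * math.log(activity_ratio)


# A sample with half the reference activity is one Libby half-life old.
age_bp = conventional_age(0.5)

# Ratio between ages computed with the two half-lives (about 1.03).
half_life_ratio = CAMBRIDGE_HALF_LIFE / LIBBY_HALF_LIFE
```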
In radiocarbon practice, the δ¹³C and %C values are indicators for sample integrity (Mook & Streurman, 1983). When these values are not within the accepted range, the organic sample material is usually degraded, or there is contamination. They are therefore an integral part of ¹⁴C dating, also for legacy data. ¹³C is a stable isotope, thus its concentration is time independent. It can therefore be used as a measure of fractionation during photosynthesis. Since δ¹⁴C = 2δ¹³C, we then also know the fractionation effect for ¹⁴C and thus the age deviation caused by this process.
In the early days of radiocarbon (the 1950s, i.e., before the Convention), δ¹³C was not measured, and fractionation correction was not applied. The significance of δ¹³C is dependent on the type of photosynthesis used by plants, known as the C₃ and C₄ pathways. For C₃ plants, the δ¹³C value is around −25‰, not very different from the reference value, so that fractionation corrections are small, within the measurement uncertainty and negligible. Therefore, our peat dates that were not corrected for isotopic fractionation (i.e., measured before the Convention) are still useful. For completeness we note that for C₄ plants, the δ¹³C value is around −10‰, which leads to large fractionation corrections; here the difference from the reference value (δ¹³C = −25‰) is around 15‰, which corresponds to an effect of 240 ¹⁴C years and cannot be neglected. Thus, for regions containing C₄ plants, it will be necessary to correct previously uncorrected dates for the fractionation effect.
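The size of the fractionation effect quoted above can be checked with a few lines of Python. This is a sketch under stated assumptions: it uses the relation δ¹⁴C = 2δ¹³C from the text and the mean life that follows from the 5,568-year half-life, and the function name is illustrative rather than taken from any existing script.

```python
import math

LIBBY_MEAN_LIFE = 5568.0 / math.log(2)  # about 8033 years


def fractionation_age_shift(delta13c, reference=-25.0):
    """Approximate shift in 14C age (years) when normalising to the reference d13C.

    delta13c and reference are in per mil. A positive result means the
    corrected age is older than the uncorrected one.
    """
    delta14c_shift = 2.0 * (delta13c - reference)  # per mil, using d14C = 2 * d13C
    return -LIBBY_MEAN_LIFE * math.log(1.0 - delta14c_shift / 1000.0)


c3_shift = fractionation_age_shift(-25.0)  # C3 plant at the reference value
c4_shift = fractionation_age_shift(-10.0)  # C4 plant, 15 per mil from the reference
```

For a C₄ value of −10‰ the sketch gives a shift of roughly 245 years, in line with the order of magnitude given in the text, whereas a C₃ sample within a few per mil of −25‰ shifts by a few decades at most.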
The %C refers to the organic C content of the sample after the pre-treatment (ideally ABA, the acid-base-acid method) designed to isolate the pure datable fraction. This is different from the organic content of the original peat, such as that measured by loss-on-ignition, where dried untreated material is weighed before and after combustion at high temperature (e.g. Chambers et al., 2011; Kennedy & Woods, 2013). The lower the carbon content of the ¹⁴C sample, the larger the effect of contamination (i.e., all the carbon that was not related to the sample when it was alive) will be (Lanting & van der Plicht, 1994).
Initially, radiocarbon concentrations were measured by radiometry. This method requires large quantities (typically 1 g) of carbon (Cook & van der Plicht, 2013), meaning that only bulk samples could be dated. In the 1970s, accelerator mass spectrometry (AMS) was developed for direct measurement of ¹⁴C concentrations in a sample.
This method is much more efficient, enabling dating samples of typically 1 mg (Tuniz et al., 1998). The most recent development is that of mini-AMS systems (MICADAS), based on the same technology but much smaller machines. Radiometry is still applied at some laboratories. Radiocarbon laboratory codes (available at www.radiocarbon.org) provide unique identifiers for dates and immediately provide information on where the date was measured and often also on the measurement method (conventional or AMS).
FIGURE 1 Timeline of developments in radiocarbon dating (right). Changes at the radiocarbon laboratory in Groningen (left) are relevant for the case study

2.5 | Case study selection and aims
Palaeogeographic maps are often built through integration of (legacy) data from various sources (e.g. Pierik & Cohen, 2020).
Current palaeogeographic reconstructions of the Netherlands (Vos, 2015; Westerhoff et al., 2003; Zagwijn, 1986) were created with a strong focus on the development of river deltas (e.g. Berendsen & Stouthamer, 2000, 2001) and the coastal area (e.g. Hijma, 2009; Cohen et al., 2014; Pierik et al., 2017). In contrast, reconstructions of inland peatlands remain uncertain due to limited data for these areas (Spek, 2004; Van Beek, 2009; Vos, 2015). Increased understanding of their spatiotemporal dynamics is needed to refine representation of these landscapes on the palaeogeographic map series, for the development and validation of peat growth models (e.g. Kleinen et al., 2012), and related quantification of their role in past, present and future carbon cycles (e.g. Erkens et al., 2016; Yu et al., 2011). Furthermore, insight into peatland palaeogeography is key to understand [...]

Parts of the region belong to a national park, a UNESCO Global Geopark and several Natura 2000 reserves.
During part of the Saalian (MIS 6), the northern Netherlands was covered by a continental ice sheet, leading to deposition of glacial till (Rappol, 1987; Rappol et al., 1989; TNO-GSN, 2021a; Van den Berg & Beets, 1987). The central part of the study area is known as the Drenthe Plateau or till plateau (Bosch, 1990; Ter Wee, 1972). Meltwater scoured deep valleys east and south of the Drenthe Plateau, the Hunze valley (Bosch, 1990) and palaeo-Vecht valley (Bosch, 1990; Kuijer & Rosing, 1994; Ter Wee, 1966) respectively. The area east of the Hunze valley is also known as the Hunze Plain (Groenendijk, 1997). Here, fluvio(peri)glacial sands were deposited during the Saalian (in this part of the study area glacial till is only sporadically found; Bosch, 1990). In the Weichselian, the Drenthe Plateau became dissected by incising rivers; consequently, glacial till is largely absent in river valleys (Klijnstra, 1979). During the coldest phase of the Weichselian, coversand was deposited with thicknesses varying from 0.5 to 2 m (Ter Wee, 1979; TNO-GSN, 2021b). This deposit is present at the surface in the north-eastern, eastern and south-eastern parts of the Netherlands (Figure 2c) and the larger European Sand Belt (Koster, 1988, 2005). Peat deposits in the study area formed on both the low- and high-lying plains (e.g. Casparie, 1972, 1993), in river valleys (e.g. Candel et al., 2017) and in fossil pingos (e.g. De Gans & Sohl, 1981). Based on historical data, peat thickness on the plains appears to have reached at least 7 m at some sites (Fochteloërveen; Douwes & Straathof, 2019), maximally 7 m in the largest pingos (Stokersdobbe; Paris et al., 1979) and locally at least 7 m in river valleys that were deeply incised prior to the Holocene (Drentsche Aa river; Candel et al., 2017).
In the northern Netherlands, large-scale peatland reclamations took place from the eleventh and twelfth century onwards. These were initiated by monasteries and local landlords, originally for agricultural purposes. From the late sixteenth century onwards, commercial-scale peat-cutting for fuel became dominant (Gerding, 1995). As a result, only small remnants of the former peat landscapes remain (Figure 2d and 2e).

| APPROACH AND METHODS
We propose a workflow for data rescue in geochronological peatland research (Figure 3, Section 3.1), which involves a rigorous quality assessment of legacy data. The rationale of this procedure is threefold:
i. To assist in systematically recording properties of legacy dates, using quantitative information and uniform qualitative categories where possible;
ii. To enable evaluation of data on various quality aspects, either determined by technical aspects of the date, properties of the date related to its landscape position, or both (Section 3.1.2);
iii. To assign a penalty score to each date based on case-specific weights for quality aspects (Section 3.1.3). This enables taking data quality into account in subsequent meta-analyses, to test for sensitivity using subgroups of data with different quality levels, and to safeguard robustness of conclusions.
To evaluate the power of the proposed methods, we apply the workflow to a case study, for which we have formulated three research questions on spatiotemporal peatland dynamics (Section 2.5.1). To answer these questions, we tailor the proposed workflow as explained in Section 3.2. Based on the process of data rescue and meta-analysis in the case study, we identify research deficits to address during future studies and evaluate the value of data rescue in geochronological peatland research.

[...] (Table 1);
ii. A complementary set of quality criteria with flexible weights to suit specific research questions (Table 2; see Section 3.1.3 for more explanation on the use of weights);
iii. A script for automated quality assessment of the recorded legacy data using the weights defined in point (ii), to make the approach suitable for evaluating large legacy datasets.
Based on the literature discussed in Section 2.4, we propose quality criteria that consider technical aspects of radiocarbon dating and [...] (Table 2). To allow automation of the quality screening process, quantitative information is used in the database where possible. Additionally, discrete and Boolean categories were defined that can be used to standardize qualitative descriptions. The quality assessment was scripted in Python, to automatically assign a penalty score to each date.

Definition of quality
For constructing the quality assessment one has to decide what quality means, and for which properties it must apply. According to the Cambridge Dictionary (2020), quality means "how good or bad something is" or "a characteristic or feature of [...] something". Both definitions are used in our quality assessment (see below).
When considering radiocarbon dates of peat layers, each date's quality may be assessed for its dating quality (Q_d) and taphonomic quality (Q_T). Dating quality refers to technical aspects of the radiocarbon date, i.e., sample characteristics and the way it was processed in the laboratory. Taphonomy, a term originating from palaeontology, is the science of how materials (or fossils) become embedded in their surroundings (e.g. Martin, 1999). The taphonomic quality therefore refers to characteristics of where the sample came from, e.g. its location and stratigraphical position. The degree to which a radiocarbon date represents the event of interest is determined by its dating and taphonomic quality. Both Q_d and Q_T are determined by the approach and methods that were followed by the researchers from the original study the date was obtained from. Figure 4 provides a visualisation of the effects of methodology on the resulting Q_d and Q_T. As dating approaches were tailored to answer a particular research question (with a certain required level of certainty), Q_d and Q_T may diverge for radiocarbon dates originating from different studies.
In textbook examples where a bullseye is used to illustrate accuracy and precision, these concepts usually apply to a set of replicate measurements. Note that, in Figure 4 and the explanation above, Q_d and Q_T apply to the accuracy of a single measurement (i.e. the degree to which a date represents the true age of the event of interest), and that precision (i.e. the degree to which replicate measurements lead to the same result) is not indicated in Figure 4. The possibility to replicate a date is however fully dependent on the information contained in Q_d and Q_T; therefore, a high penalty score for Q_d and Q_T will most likely result in low precision (e.g. if location is poorly known, a replicate measurement cannot be performed with high precision).
To ensure accuracy and robustness of conclusions derived from meta-analyses, quality assessment may provide insight into sources of error and allows data analyses to be expanded based on subsets of data with increasing uncertainty. To make the quality assessment flexible enough to answer a variety of research questions, we have created an adaptable, twofold approach. First, each date is evaluated for aspects that are considered negative (i.e. in line with the abovementioned definition "how good or bad something is"), for which penalties are assigned (for comparable approaches, see e.g. Small et al., 2017). For instance, a bulk sample is considered less reliable than a plant macrofossil sample (Törnqvist et al., 1992, 1998). Second, each date is assessed on the availability of information that allows informed choices to be made with regard to data analysis (i.e. "a characteristic or feature of something"). In case of missing information or a low level of detail, a penalty is assigned. In this case the focus is not on the implication of the property (for instance, the location itself is not judged), but on knowledge about the property (do we know the location well or not). Depending on the information that is available (and the resulting penalty score), data may be filtered prior to data analysis (for example, first including only sites with well-known location, then analysing sites with uncertain location as well). This allows a purposeful assignment of dates to various analyses.

FIGURE 3 Proposed workflow for data rescue, quality assessment and meta-analysis in geochronological [...]

[...]

Add LabCode
Any additional laboratory codes that were assigned to the date.

YearOfDating
Year when the sample was dated by the laboratory. In case a period is given, the earliest year is recorded.

CopyrightLicense
Licence under which the data was published (if applicable).

ConsultedLab¹
Either 'yes' or 'no' to indicate whether the laboratory was consulted for additional details on the sample.

Reference
Original publication(s) that mentioned the date.

ReliabATOrigPub
Any comments about whether the date was reliable according to its original publication.

Notes
Any other comments (e.g. whether a lithological or lithogenetic cross-section is available, short description of the definition for basal peat as used by the original publication that mentioned the date).

AGE

14CMean
Reported mean age in conventional ¹⁴C years (BP).

14CSD
Reported standard deviation of the mean age in ¹⁴C years.

ReservoirCorrected14CMean
If applicable and reported, the mean age in ¹⁴C years (BP) after applying a reservoir correction.

CalMean
Reported mean calibrated age in years before common era (BCE)/common era (CE).

CalSD
Reported standard deviation of the mean age in calendar years.

UnknownMean
Reported mean age, unknown whether the data represent uncalibrated or calibrated ages.

UnknownSD
Reported standard deviation of the mean age, unknown whether the data represent uncalibrated or calibrated ages.

Delta13
¹³δ value used to correct the dating result for isotopic fractionation.

Delta13source
Either 'measured' or 'estimated' to indicate whether the reported ¹³δ value was quantified in the laboratory or that a standard estimate was applied.

CarbonContent
Measured carbon content (%C) of the dated sample material.

LocationDescription
Description of the location where the sample was taken.

X²
Column listing all X-coordinates in the Dutch RD-new projection, either reported or transformed from reported longitude.

Y²
Column listing all Y-coordinates in the Dutch RD-new projection, either reported or transformed from reported latitude.

XYuncertainty
Reported uncertainty of measured X and Y coordinates.

Lat
Column listing latitude, in case location is only reported as geographical coordinates.

Lon
Column listing longitude, in case location is only reported as geographical coordinates.

LLuncertainty
Reported uncertainty of measured latitude and longitude, in case location is only reported as geographical coordinates.

Landform
Either [...]

Stratigraphy
Either 'Upperlimit', 'Lowerlimit', or 'Within' to indicate whether the sample was taken at the top of the peat layer, the base of the peat layer, or somewhere within the peat layer, respectively.

SampleThickness⁴
Sample thickness in cm.

SampleDetails
Any information on the sample material, species, etc.

MeasuredFraction
Details on measured fraction as provided by the referenced paper for the date.

SampleType
Either 'bulk' or 'macro' to distinguish between bulk samples and samples consisting of plant macrofossils.

SpeciesType⁵
In case of a 'bulk' SampleType, this automatically becomes 'undefined'. In case of a 'macro' SampleType, this column lists either 'terrestrial', 'aquatic', 'both' or 'NR'. Terrestrial and aquatic distinguish between plant species that might be affected by a reservoir effect.

Aboveground
In case of a 'bulk' SampleType, this automatically becomes 'undefined'. In case of a 'macro' SampleType, either 'yes', 'no' or 'NR' must be selected to distinguish between seeds/leaves/stems or roots, respectively.

Pretreatment
Either 'ABA' or 'onlyA' to indicate whether the ABA protocol was used or only the first A, respectively.

¹ In case laboratory data differed from data provided by publications, the laboratory was followed. For one entry in the case study, this led to a major change in the age (GrN-5460, the datelist in which this date is mentioned contains an error).
² In case no coordinates were provided by the original publications but the field of sampling was indicated, coordinates were retrieved based on georeferenced historical maps where the old field boundaries were indicated. If only a nearby village was mentioned, its central coordinates were registered.
³ These categories may after the quality assessment be replaced by a case-study-dependent value, to be used as uncertainty range in further analyses (e.g. 'field' might be replaced by an uncertainty value of 100 m).
⁴ In case no sample thickness was retrieved, a default sample thickness can be defined after the quality assessment to use in further analyses.
⁵ To determine whether terrestrial or aquatic plants were used for dating, the species was looked up in the Dutch online encyclopaedia of species (Soortenbank.nl, 2020). Samples that consisted of Sphagnum mosses that lacked further information on terrestrial or aquatic growth habit were listed as 'Undefined'.
Design of the quality criteria
Age. This category contains criteria for three properties: Mean and SD, Delta13 and Carbon content ([...]

| Weights in the quality assessment and interpretation of penalties
For each data entry, the taphonomic quality Q_T and dating quality Q_d are calculated using the quality criteria and (case-specific) weights listed in Table 2. In case a specific criterion is irrelevant for the research questions to be answered, it can be assigned a weight of zero and will then no longer be considered. Depending on the case study and research aims, weights may be adapted to tailor the quality assessment.
The total penalty score Q results from the sum of Q_T and Q_d. Q is normalized to 1, i.e. the minimum value is 0 (no penalties, reflecting highest quality) and the maximum possible value is 1 (poorest quality).
Due to this normalisation, the maximum values of Q_T and Q_d are always below 1 and do not need to be equal, as they depend on the chosen weights. For instance, in our case study (Section 3.2) the weights listed in Table 2 [...] (Table 3).
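The weighting and normalisation described above can be sketched in Python as follows. This is a minimal illustration, not the script used in the study: the criterion names, their split into taphonomic and dating groups, the weight values and the convention that each criterion yields a penalty level between 0 and 1 are all assumptions made for the example (the actual criteria and weights are those of Table 2).

```python
# Hypothetical criteria and weights, chosen only for illustration.
TAPHONOMIC_WEIGHTS = {"location": 3.0, "elevation": 2.0, "stratigraphy": 2.0}
DATING_WEIGHTS = {"age": 3.0, "sample_type": 2.0, "pretreatment": 1.0}


def penalty_scores(penalties, taph_w=TAPHONOMIC_WEIGHTS, dating_w=DATING_WEIGHTS):
    """Return (Q_T, Q_d, Q) for one date.

    penalties maps a criterion name to a penalty level between 0 (best) and
    1 (worst). A criterion without an entry receives the full penalty, so
    missing information is penalised. A weight of zero removes a criterion.
    """
    total_weight = sum(taph_w.values()) + sum(dating_w.values())
    if total_weight == 0:
        raise ValueError("at least one criterion needs a non-zero weight")
    q_t = sum(w * penalties.get(name, 1.0) for name, w in taph_w.items()) / total_weight
    q_d = sum(w * penalties.get(name, 1.0) for name, w in dating_w.items()) / total_weight
    return q_t, q_d, q_t + q_d  # Q lies between 0 and 1 by construction


# Example date: well located, elevation partly known, bulk sample, ABA pretreatment.
example = {"location": 0.0, "elevation": 0.5, "stratigraphy": 0.0,
           "age": 0.0, "sample_type": 1.0, "pretreatment": 0.0}
q_t, q_d, q = penalty_scores(example)
```

Dividing both partial sums by the total weight is what keeps the maxima of Q_T and Q_d below 1 while their sum can reach exactly 1, which matches the behaviour described in the text.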
TABLE 2 Criteria used in the quality assessment, based on the categories listed in

Gentle (A only)
No pre-treatment applied or not retrieved.
3.2 | Application of the workflow to a case study

| Case study: Data rescue and quality assessment
The data search scope was determined by the spatial definition of the study area presented above (Figure 2). All acquired dates were recorded irrespective of their measured radiocarbon age (no restrictions in time period were applied during the search phase). Data originate from 1955 to 2019 and stem from a wide variety of environmental and archaeological studies, including scientific literature, books and reports from contract-based archaeology. We used the database set-up of Table 1 and recorded dates from peat layers (i.e. excluding dated archaeological artefacts originating from peat layers).
Most of the retrieved dates were measured at the radiocarbon facility of Groningen University (Centre for Isotope Research and its predecessors). The history of this laboratory is shown in Figure 1.
Developments are also reflected in laboratory codes, moving from

| Case study: Meta-analysis
To answer the case study research questions (Section 2.5.1), we chose the criteria weights listed in Table 2. In this way, the penalty contribution of each criterion is ordered based on the qualities we consider most important to answer the case study research questions.
For these questions, age and location are crucial, followed by elevation, stratigraphy and landform. To prevent qualities from becoming irrelevant, we kept the difference in weight between criteria relatively small.
Based on the penalty scores, each date was assigned to one of four confidence levels as defined in Table 3, where Q T,lim and Q d,lim were set at 50% of their respective maximum values.
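The assignment to confidence levels can be sketched in Python as follows. The colour semantics follow the description of the quality assessment results (green: both scores low; orange: adequate taphonomic information but limited dating detail; purple: the reverse; red: both scores high). Whether a score exactly on the limit counts as low is an assumption here, as is the function name.

```python
def confidence_level(q_t, q_d, q_t_max, q_d_max, fraction=0.5):
    """Assign one of four confidence levels from the two penalty scores.

    The limits are set at `fraction` of the respective maximum scores
    (50% in the case study). Assumption: a score equal to its limit
    still counts as low.
    """
    q_t_lim = fraction * q_t_max
    q_d_lim = fraction * q_d_max
    taphonomy_ok = q_t <= q_t_lim
    dating_ok = q_d <= q_d_lim
    if taphonomy_ok and dating_ok:
        return "green"
    if taphonomy_ok:
        return "orange"   # taphonomy well documented, dating detail lacking
    if dating_ok:
        return "purple"   # dating well documented, taphonomic detail lacking
    return "red"
```

For example, with maxima of 0.5 for both scores, a date with Q T = 0.1 and Q d = 0.4 falls in the orange level.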
After completing the quality assessment, filtering was applied based on (1) confidence level, (2) Stratigraphy (to select only basal peat dates), (3) SampleMaterial (to distinguish peat initiation processes, explained below), and (4) Landform (to derive landform-specific age trends). For analyses on the relationship between age and elevation, we calculated elevation relative to m O.D. for samples that were only retrieved with depth from the (former) surface. To this end, we derived the surface elevation from the digital elevation model (DEM) and subtracted the sample depth. For basal dates, elevation is not affected by compaction. Dates from within or at the top of the peat might be affected by compaction; however, because we used these data only for a general overview of the elevation range from which samples were retrieved, they were not corrected for compaction effects.
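The elevation calculation amounts to a single subtraction, sketched below in Python. The function name is hypothetical, and the sketch assumes that both values are in metres and that depth is measured positively downwards from the surface.

```python
def sample_elevation(surface_elevation_m, sample_depth_m):
    """Sample elevation (m O.D.) from DEM surface elevation and sample depth.

    Assumption: depth is given in metres below the surface (positive down).
    """
    if sample_depth_m < 0:
        raise ValueError("sample depth must be zero or positive")
    return surface_elevation_m - sample_depth_m
```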
All ages were (re-)calibrated in OxCal (version 4.4; Bronk Ramsey, 1995) using IntCal20 (Reimer et al., 2020). To analyse trends of peat initiation, dates of basal peat layers (i.e. entries registered with stratigraphy 'lowerlimit') were selected and summarised using kernel density estimation (KDE) with the KDE_Model function in OxCal (Bronk Ramsey, 2017). To test the sensitivity of the model outcome to the previously assessed data quality, the data subsets from the four confidence levels (Figure 6) were added to the model in separate runs and the outcomes were compared.
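To illustrate the general idea of kernel density estimation, the sketch below applies a plain Gaussian KDE to single age values per date. It is deliberately simpler than, and not equivalent to, OxCal's KDE_Model, which works with the full calibrated probability distributions. The default bandwidth uses the normal reference rule, which is a choice made for this sketch only.

```python
import math
import statistics


def gaussian_kde(ages, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `ages` on `grid`.

    Simplification: each date is represented by one value (e.g. its
    calibrated mean), so calibration uncertainty is ignored.
    """
    n = len(ages)
    if n < 2:
        raise ValueError("need at least two ages")
    if bandwidth is None:
        # Normal reference rule; an assumption of this sketch.
        bandwidth = 1.06 * statistics.stdev(ages) * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - a) / bandwidth) ** 2) for a in ages)
        for x in grid
    ]
```

Running the function separately on subsets of dates (for instance one subset per confidence level) and comparing the resulting curves mirrors the sensitivity test described above, albeit in a much cruder form.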
To derive spatiotemporal insights on peat initiation, data were plotted in geographic information system (GIS) software (ESRI ArcMap, version 10.6) using the chronostratigraphy shown in Table 4.
To assign dates to the listed periods the μ value of the calibration was used for simplicity (i.e. instead of the 2σ age range). Similarly, μ was used to construct age-elevation plots.
To determine which peat initiation process (terrestrialisation, primary mire formation or paludification) was responsible for peat formation at a specific site, the sediment underlying basal peat often provides indications (Ruppel et al., 2013). Typically, peat from terrestrialisation is underlain by lake sediments such as gyttja. Primary mire formation starts on inorganic sediment where fresh parent material is exposed, whereas paludification occurs on inorganic sediment where soils have formed through time, sometimes with litter layers of past vegetation. Unfortunately, information on soil horizons underlying peat deposits is limited for our case study data. To determine the prevalence of these three processes in the study area, we therefore assigned basal peat dates to each initiation process based on registered SampleMaterial (Table 5).

| Data rescue for case study region
We compiled a dataset consisting of 313 legacy radiocarbon dates.
The majority (85%) of the retrieved dates indicates peat layers of

| Quality assessment
Based on the values for Q d and Q T, each date was subsequently assigned to one of four confidence levels (Table 3, Figure 6). For green dates, both Q d and Q T were fairly low, meaning that sufficient information is available regarding dating aspects and taphonomic characteristics. At the other extreme, red dates have high penalty scores for Q d and Q T, indicating that information for these dates is very limited. Orange dates have sufficient information regarding taphonomy but lack detail regarding dating aspects, and vice versa for purple dates.

It appears that in the 1990s fewer peat dates were obtained; however, some large studies were published that were (partly) initiated in the 1980s (e.g. Groenendijk, 1997; Van Geel et al., 1998). This relates to certain geographic foci (Figure 5b), for example the eastern part of the province of Groningen (Groenendijk, 1997), and the Bargerveen (Dupont, 1986) and Fochteloërveen

| Large-scale trends of peat initiation
To deduce spatiotemporal trends in peat initiation, we focused analyses on dates from basal peat layers only (n = 74, see 'lower limit' in Figure 5d). The estimated distribution of these ages is shown in Figure 8a to 8c, based on green dates with applied filter for aboveground remains of terrestrial macrofossils (n = 12), green dates without filtering applied (n = 50) and dates from all confidence levels combined (n = 74).

Based on the available information, most peat initiation sites appear to result from either primary mire formation or paludification (Table 5). However, one would expect the number of terrestrialisation sites to be larger, as 19 dates were collected in topographic depressions such as pingos (Table 6; apparently gyttja was only found/sampled at some of the pingo sites). As the study area has been deglaciated since the penultimate glacial, all land in this region has been exposed for the past 130,000 years. Paludification was therefore probably the dominant peat formation process in the study area.

| Peat initiation trends for different landforms and elevations
FIGURE 4 Effect of methodology (dating and taphonomic quality) on representation of the true age of the event of interest (bull). The distance to the bull indicates how robust a date is, i.e. the degree to which the date corresponds with the true age of the event of interest. Note that multiple black dots (i.e. potential dating results) were drawn for the purpose of illustrating the effect of dating quality and taphonomic quality, whereas in reality they apply to a single measurement. In case of the lower left for example, the approach ensures a sample is collected from the right position (e.g. location, elevation, stratigraphical level), and strict methods are applied with regard to sample selection and laboratory procedures. With low dating and/or taphonomic quality, dates will deviate more from the true age of the event of interest. Our approach aims at attributing penalties to those dates in the quality assessment, as a way to characterise their trustworthiness and usefulness to answer a specific research question [Color figure can be viewed at wileyonlinelibrary.com]

We grouped landforms into four categories (Table 6). For both green confidence level dates and dates from all confidence levels combined, KDE models were constructed (Figure 9, showing only models from all confidence levels combined). Too few data were available to model only green confidence level dates that were filtered for aboveground terrestrial macrofossils. The distribution for 'Peatlands (unspecified)' in Figure 9c shows two peaks similar to the model outcomes in

| DISCUSSION
Here, we first discuss the main findings of the case study, followed by experiences regarding data retrieval, representativity of the resulting dataset, and effect of the quality assessment. Based on this, we evaluate the proposed workflow.

| Main findings on peatland development
The legacy dataset indicates peat initiation in the study area from at least the Late Glacial onwards (Figure 5). The KDE model results show a bimodal distribution of basal peat dates, with a first peak during the Late Glacial, a low in the Boreal period, followed by a rise starting in the Atlantic and finally a peak during the Subboreal (Figure 8). Most data points are located in the northern half of the study area. Here, several spatial clusters indicate areas with simultaneous peat initiation, for example during the Atlantic and Subboreal in the east of the province of Groningen (Groenendijk, 1997).
When considering peat initiation for landform groups, several trends can be distinguished (Figure 9) (Figure 10b and 10c, Meijles et al., 2018) and the hypsithermal (Holocene Thermal Maximum; 9,000 to about 5,000-6,000 years ago; Renssen et al., 2009; Wanner et al., 2008). Given favourable climatic conditions for peat growth, combined with sea level rise and related groundwater level rise, peat deposits increasingly filled (higher located) river valleys (Figure 9f and Figure 10c) and eventually formed on high-lying plains (Figure 9d and Figure 10c). The drop of the second peak coincides with neoglacial cooling (5,000-6,000 years ago to pre-industrial time; Wanner et al., 2008), perhaps indicating less favourable climatic conditions. However, as peat covered an increasingly large area, further initiation and expansion may also have become limited due to lack of sites suitable for peat growth. Casparie & Streefkerk (1992) state that for the Netherlands two main phases of climate-induced mire initiation occurred, from 7,000-6,500 BCE (~9,000-8,500 cal years BP, start of Atlantic) and around 5,000 BCE (~7,000 cal years BP, middle Atlantic). Both periods fall between the start of the second peak and the 'bump' prior to its maximum in Figure 8c, but the legacy dataset shows no indication of a drop of peat initiation between 8,500 and 7,000 cal years BP. Van  advocate that the 2,800 cal years BP event is a cause for peat initiation. Locally, peat may have initiated at this time, but their sampling location may also have been a site overgrown through lateral expansion of a pre-existing, older peatland. Presence of a main initiation period around 2,800 cal years BP is not supported by the bimodal distribution of the legacy data. Based on detailed palynological investigations in the Bargerveen peat remnant (indicated in Figure 2b), Dupont (1986) concludes that human influences can be traced in arboreal pollen data only from 5,500 cal years BP onwards, which suggests that human impact on peat initiation was probably limited in the study area.

TABLE 5 Classification of SampleMaterial to derive peat initiation process. Columns: Peat initiation process; SampleMaterial filter; Green confidence level, n; All confidence levels, n.

FIGURE 5 Overview of age and location of case study legacy data points (n = 313). (a) Locations of data points binned based on chronostratigraphy, using definitions listed in Table 4. Uncertainty of locations (see text) not shown for legibility. Note that several data points overlap (i.e. multiple samples collected at [nearly] the same location). Basal peat date means stratigraphical position is 'lower limit'. Background map shows the reconstructed palaeogeography of the Netherlands for ~2,500 cal years BP (also see Figure 2c).
On the Dutch national palaeogeographical map series (Table 7), peat initiation in the study area starts at the earliest around 7,500 cal years BP, slightly later than the rise of the second peak in the bimodal distribution in Figure 8. No peat deposits are present on the maps prior to 7,500 cal years BP, whereas our results indicate that a peat initiation peak during the Late Glacial must have resulted in peat cover prior to this date (mainly in topographic depressions and river valleys, Figure 9e and Figure 9f). According to the map series, the maximum extent of peatlands was reached between 3,250 and 2,500 cal years BP (Table 7). The basal dates in the legacy dataset are mostly older than this, indicating that the majority of peatlands in the study area

Based on what could be derived from the legacy data, and considering the surface exposure of the study area for 130,000 years (Ter Wee, 1962), paludification seems to have been the most prominent process causing peat formation in the study area. Paludification may result from environmental factors but also from autogenic processes leading to lateral expansion of peatlands (see Section 2.2). For our case study, it is often unclear whether dates stem from the same former peatland, as this would already require a clear view of their palaeogeography. Consequently, the dataset is not suitable to draw inferences on local peat initiation versus lateral expansion of existing peatlands.
The legacy dataset leads us to tentatively conclude that the study area witnessed two major phases of peat initiation, where the earliest peak was probably mostly driven by climate whereas the second was the result of climate in combination with Holocene sea level rise. We did not consider presence of impermeable deposits in the study area; these may have further enhanced the potential for peat growth, but the degree to which this contributed and on which spatial scale remains unclear.

FIGURE 6 Overview of the quality assessment of the case study dataset (n = 313) showing resulting Q d and Q T values for each data point (note that some points overlap). Limits of the confidence levels are defined in Table 3, with Q d,lim and Q T,lim set at 50% of their respective maximum normalised values. The coloured quadrants indicate the four confidence levels that were used in subsequent data analyses.

5.1.2 | Experiences regarding data retrieval
Most scientific publications from which data were collected were fairly easy to find using basic literature searches and keyword queries.
Reports from contract-based archaeology were easily accessible; however, owing to the vast number of reports available, it was generally difficult to find relevant information.
Irrespective of data source, we were able to retrieve the laboratory code for all samples, thus providing insights into the uncalibrated dating results. In case of ambiguities, dates could be retrieved from the Groningen databases. The bulk SampleType was mainly deduced from laboratory codes. Details for macrofossil samples were retrieved from publications and laboratory archives. Overall, we found many more dates than anticipated.
Unfortunately, quite often location and sample elevation were not documented in great detail (Figure 5e). For our GIS analyses, the spatial error was considered irrelevant due to the fairly large scale of the study area. However, location was needed to calculate former were present at this location at the dated age, but further implications are much more difficult to deduce.

| Representativity of the legacy dataset
The meta-analysis of Ruppel et al. (2013) indicated a lack of data for the northwest European Plain. The legacy dataset of our case study demonstrates that this image is not entirely valid: our search revealed 74 basal peat dates in the studied region. Additionally, sea level research such as the reconstructed RSL curve for the Wadden Sea (Meijles et al., 2018) is based on elaborate datasets of (legacy) basal peat dates.
However, despite our efforts a limited number of dates was found in the southern half of the study area. This is probably due to two major factors. As can be deduced from Figure

To address these biases in the dataset, future studies may include (legacy) dates that were not performed on peat deposits directly, but on archaeological artefacts that were retrieved from peat layers or from underneath them. It has, for example, been demonstrated that the coversand landscape underlying the northern part of the former Bourtangermoor (Dutch-German border area, the surviving remnant on Dutch territory is the Bargerveen, Figure 2b) is very rich in Mesolithic sites (Groenendijk, 2003). Such finds provide a terminus-post-quem for peat initiation, even though potential hiatuses must be taken into account. Well-preserved overgrown cultural landscapes are also known from northern Germany (e.g. Pantzer, 1986). Well-dated archaeological finds from peat layers may provide a terminus-ante-quem (for underlying peat layers) and/or terminus-post-quem (for overlying layers), depending on the local stratigraphy.
As archaeological finds from peatlands were often recovered in the distant past during peat-cutting, they do require a quality assessment of their own, tailored for archaeological aspects in addition to taphonomic (Q T ) and dating (Q d ) quality.

FIGURE 8 Outputs of KDE models for basal peat ages in the case study dataset. Results are based on model runs of basal ages with (a) green confidence level that were based on aboveground remains of terrestrial macrofossils (n = 12), (b) dates with green confidence level with no further filtering applied (n = 50) and (c) all confidence levels combined (n = 74). The dark-grey area indicates the sampled KDE estimated distribution. The blue line shows the mean of the KDE distribution, the lighter-blue band shows the ±1σ range. The red crosses show the central values for the entered dates, the black crosses show the medians of the marginal posterior distributions for every dated event. The calibration curve is indicated for reference (Reimer et al., 2020).

| Effect of the quality assessment
The quality assessment shows that the data points are dispersed through the four confidence levels, indicating that for some samples taphonomic quality is relatively low whereas for others problems lie in the dating quality (Figure 6). A substantial part of the data points received a green confidence level (n = 121 out of 313), which allows the most detailed filtering options, as sufficient information is available for many aspects.
The KDE modelling runs with different confidence level groups

| Evaluation of approach
The proposed workflow and quality assessment demonstrate the balancing act to reach robustness without being too strict and consequently discarding the majority of data. All data points contain information, the question is how to extract it adequately. The quality assessment has a flexible set-up, and depending on the research questions to be answered, assessment criteria can be included, excluded or made more impactful using the weights. Subsequent filtering allows tailor-made and informed decisions for data analysis. For instance, if for a certain research question (e.g. reconstructing a sea level curve) it is unnecessary to know a detailed location of the date but crucial to know its elevation and stratigraphical position, weights may be adjusted accordingly, which will result in a higher penalty for dates that do not match these criteria.
The case study shows that varying criteria have been used to define peat initiation and to subsequently select samples, resulting in divergent approaches to date the onset of peat accumulation.
Consequently, this led to a range in taphonomic (Q T ) and dating (Q d ) quality in our quality assessment. The methods of the studies from which dates were retrieved partly depend on their research objectives, but also reflect methodological possibilities at the time of dating, for instance use of bulk sampling prior to the development of AMS.
Discussions on methodological aspects of dating and 'best practices' are reflected in the quality criteria. For instance, a bulk sample receives a penalty for SampleType, as bulk samples are generally large and consist of an uncharacterised mixture of organic compounds (e.g. Törnqvist et al., 1992, 1998). Inherently, this means a penalty is also assigned for SpeciesType and Aboveground, as it is unknown which species and which plant tissues are contained within the sample. If for a given peatland a reservoir effect is expected (Blaauw et al., 2004; Kilian et al., 1995), then weights for these properties can be increased, filtering can be applied (to exclude all samples with unknown and aquatic species) or both.
It is important to note that the penalty score is cumulative, not exclusive. For instance, if it is known that a sample consisted of macrofossils, it will receive no penalty for the property SampleType.
However, for a sample that consisted of bulk material, the overall penalty score may still be low (and the resulting confidence level green) if other properties (with an assigned weight above zero) were well known and few further penalties were assigned. In case SampleType is crucial to answer the research question, either its weight should be increased substantially, or a filter should be applied after the quality assessment to generate a list of dates, for instance those with green confidence level and only macrofossils as SampleType.
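Such a filter after the quality assessment could look like the Python sketch below. The record layout (dictionaries with the keys "confidence" and "SampleType") and the value "macrofossils" are assumptions made for illustration; they do not reproduce the structure of the published dataset.

```python
def select_dates(records, confidence="green", sample_type="macrofossils"):
    """Keep only dates with the requested confidence level and SampleType.

    `records` is a list of dicts; the keys used here are hypothetical.
    Records lacking either key are excluded.
    """
    return [
        r for r in records
        if r.get("confidence") == confidence
        and r.get("SampleType") == sample_type
    ]


# Made-up example records (identifiers are placeholders, not real lab codes).
EXAMPLE_RECORDS = [
    {"id": "date-001", "confidence": "green", "SampleType": "macrofossils"},
    {"id": "date-002", "confidence": "green", "SampleType": "bulk"},
    {"id": "date-003", "confidence": "red", "SampleType": "macrofossils"},
]
```

Filtering in this way keeps the weights unchanged, so the confidence levels remain comparable between analyses while the subset is tailored to the question at hand.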
It is also important to realise that the more strictly the boundaries of the confidence levels are defined, and the more subsequent filtering is applied, the smaller the resulting subset of data points will be. This may also result in over-representation of samples from a few studies from a specific area (as these have comparable taphonomic and dating quality), which may affect how representative outcomes are for the study area as a whole.
TABLE 6 Landform groupings, specifying applied Landform filter and number of dates with green confidence level and all confidence levels combined (only basal dates)

Table 6, for interpretation of KDE plots see caption Figure 8). Results are from model runs where dates from all confidence levels were included. (g) Comparison of model outcomes for landform type 'Plains and ridges' when only basal dates are included versus dates from all stratigraphical positions.

FIGURE 10 Comparison of peat initiation data with δ18O and sea level rise curves. (a) δ18O curve (GICC05 NGRIP δ18O data accessed through OxCal). The bimodal distribution of peat initiation dates (including all confidence levels) is shown in (b); see Figure 8c for details. In the age-elevation plot in (c) only basal peat dates are included (n = 73; note that in (b) n = 74 as for one date no elevation information is available). Data points are coloured by landform. In (d) peat dates from all stratigraphical positions are shown (n = 302), data points are coloured by confidence level. Note that sample elevation in (d) for non-basal dates is only indicative as it was not corrected for potential compaction effects. The RSL curve (data from Meijles et al., 2018) was added to (c) and (d). The data points that were used by Meijles et al. (2018) to generate the RSL curve are not part of our case study dataset.

Finally, the quality assessment only makes a difference if the dates actually differ for the selected criteria, otherwise the majority will receive the same penalty. This means that the combination of criteria used (i.e. turned on and off by reducing the weight to zero) is crucial to really distinguish dates based on their quality.

| Implications and recommendations
Data rescue and reuse lead to improved continuity of data (Gil et al., 2016) and development of new, overarching insights (e.g. Ruppel et al., 2013; Tolonen & Turunen, 1996; this study). Based on the process of data rescue and meta-analysis of the case study, it appears that the two largest peat remnants in this area, Fochteloërveen and Bargerveen, have so far only been considered by two studies dating one and two vertical cores, respectively (Dupont, 1986; Van Geel et al., 1998). These remnants are the main storage sites of the remaining peat archives and have scientific potential yet to be discovered.
The properties that are recorded and their level of detail always depend on the research question to be answered. Additionally, awareness of what is relevant to report may differ between disciplines.
However, based on experiences with data reuse in our case study, we emphasise the importance of recording detailed information on basic properties such as geographical location, elevation, stratigraphical position and sample details. With peat soils further diminishing in spatial extent but also in thickness, we underline the importance of registering coordinates, and where possible elevation in m O.D.
Without this information, options for future peat studies that require field data are further reduced. Additionally, sharing data based on the FAIR (findability, accessibility, interoperability and reusability) principles is key (Gil et al., 2016; Wilkinson et al., 2016), otherwise options for reuse decrease rapidly (Savage & Vickers, 2009).

| CONCLUSIONS
We developed a workflow for reuse of legacy geochronological data in peatland studies, including rigorous quality assessment. The latter can easily be tailored to specific research questions by adjusting the relative weights assigned to penalised aspects.
The proposed approach was tested on a case study of (former) peatlands in the Netherlands. Peat growth started in the Late Glacial (~14,000 cal years BP), dropped during the Boreal (~9,500 cal years BP) and showed a second peak in the Subboreal (~4,500 cal years BP). Peat initiation occurred in the Late Glacial and throughout the Holocene in river valleys, but only during the Subboreal on plains and ridges. We tentatively conclude that the earliest peak was mostly driven by climate (Bølling-Allerød interstadial), whereas the second was probably the result of Holocene sea level rise and related groundwater level rise in combination with climatic conditions (hypsithermal).
Studies that reuse legacy data may yield new insights that require a bird's-eye view to be discovered. However, their success depends on data retrieval. We therefore emphasise the importance of FAIR sharing of detailed information on basic properties such as geographical location, elevation, stratigraphical position and sample details.
These should be recorded irrespective of research aim, to prevent further data loss from peat archives that are at risk of disappearing.

AUTHOR CONTRIBUTIONS
Funding was secured by RvB. CQ drafted the outline for the research,

We thank Annemie Kersten and Bert Groenewoudt for help with the literature search, several authors whose work was included in the case study for providing details on radiocarbon samples, and Kim Cohen for the discussion and information about databases for legacy radiocarbon dates. We thank Harm Jan Pierik and an anonymous reviewer for their efforts in reviewing an earlier version of this manuscript and the dataset; their feedback was highly appreciated.

CONFLICT OF INTEREST
The authors declare that they have no conflicts of interest.

DATA AVAILABILITY STATEMENT
All data from this study are available under CC-BY 4.0 license at the 4TU.Centre for Research Data; see Quik et al. (2021). The automated quality assessment (Python script) and OxCal scripts are also included.
TABLE 7 Comparison of peatland initiation and expansion in the study area as indicated by three Dutch national palaeogeographical map series
Map series: Zagwijn (1986); Westerhoff et al. (2003); Vos et al. (2020), Vos (2015)
Nr. of maps/timeframes: 10; 6; 13
Peat initiation (1): ~7,500 cal years BP; ~6,500 cal years BP; ~7,500 cal years BP
Maximum extent (1): ~3,250 cal years BP; ~2,600 cal years BP; ~2,500 cal years BP
(1) For our study area