The Landscape Data Commons: A system for standardizing, accessing, and applying large environmental datasets for agroecosystem research and management

Understanding where, when, and why agroecosystems are changing requires quality information about ecosystems that span land tenure, ecological processes, and spatial scales. Over the past two decades, land management agencies and research groups have adopted a suite of standardized methods for monitoring rangelands, which have been implemented at over 85,000 monitoring locations globally. However, the ability to use these data to understand agroecosystem dynamics and change across scales and across land ownership has been limited because, until now, these data have not been available in a harmonized, accessible format for analyses, modeling, and decision‐support tools. We present the Landscape Data Commons, a cyberinfrastructure platform that harmonizes and aggregates standardized agroecosystem data, enables linkages to models, and facilitates analysis and interpretation of data within decision‐support tools. The Landscape Data Commons provides a community platform for users to contribute data and develop next‐generation tools to support agroecosystem management through the 21st century.


INTRODUCTION
Shared data provide unprecedented opportunities to understand agroecosystem responses to weather, climate, land use, disturbance, and management. Ecosystems globally are threatened by the dual challenges of land degradation and climate change, which are reducing ecosystem resilience to drought, increasing soil erosion rates, and contributing to productivity and biodiversity loss (Bestelmeyer et al., 2015; Cowie et al., 2011; Webb et al., 2017). Data on the status and condition of ecosystems are needed to identify trends, manage threats, and provide a basis for adaptation strategies (Karl et al., 2012; Verstraete et al., 2011). Data-enabled advances include the development of predictive models of ecosystem change (Peck et al., 2019; Peters et al., 2020), seasonal and long-term forecasts (Ash et al., 2007; Dietze et al., 2018), quantitative benchmarks (Bestelmeyer et al., 2017; Webb et al., 2020), and decision-support frameworks for assessing management trade-offs and synergies (Musumba et al., 2017). There is now a great opportunity to improve ecosystem research and management through data systems that leverage modern cyberinfrastructure (i.e., networked software and hardware computational systems, servers, and databases; Atkins, 2003) and contemporary data sharing ideals (i.e., findable, accessible, interoperable, and reusable [FAIR]; Wilkinson et al., 2016) to aggregate data across communities, interactively connect data to models, and integrate data and models with decision-support tools. Yet, no applications integrate multiple data types and models across agroecosystems (e.g., rangelands, pasturelands, and croplands), ecosystem processes, and stakeholders.
There is a clear need for an integrated data system, or data commons, that (1) houses data describing multiple agroecosystem processes and (2) can be used to evaluate the impacts of management practices and agroecosystem attributes. In the United States, the adoption by researchers and land managers of standard monitoring methods that describe the ecosystem attributes of soil and site stability, hydrologic function, and biotic integrity (Herrick et al., 2018) has transformed agroecosystem monitoring and assessment (McCord & Pilliod, 2022), provided new research and collaboration opportunities, and enabled data-informed land management (Herrick et al., 2010; Toevs et al., 2011; Webb et al., 2016). However, data are housed in agency-specific databases (e.g., Kachergis et al., 2020; USDA NRCS, 2020), research data repositories (e.g., Delgado et al., 2018), or individual data management systems (e.g., Courtright & Van Zee, 2011), and the process of locating, obtaining, aggregating, and formatting the data is a barrier to use. Previous efforts to aggregate ecological data have either focused on specific attributes (e.g., vegetation traits, species occurrence), remained unavailable to land managers and/or researchers who do not contribute data (e.g., Bruelheide et al., 2019; Robertson et al., 2014), or brought data together by aggregating metadata while leaving data harmonization to the users (e.g., Michener et al., 2012). Data harmonization, whereby aggregated data collected using standardized methods are brought into a composite dataset and format, can support FAIR data and could increase data use in research and management. An integrated data system would provide a shared, accessible knowledge base where researchers, land managers, conservation planners, and other partners contribute and access data and run analyses and models.

Core Ideas
• Managers and researchers need monitoring data that are directly connected to models and decision-support tools.
We developed the Landscape Data Commons to address the critical need for a common data portal and decision-support infrastructure for ecosystem research and land management. The primary purpose of the Landscape Data Commons is to harmonize disparate agroecological datasets and make them available through cyberinfrastructure to support knowledge and model development and data-informed decision-making. The Landscape Data Commons makes agroecosystem data available in a single, comprehensive, analysis-friendly dataset in which barriers, including disparate data formats, repository structures, and permissions, have been surmounted. Another objective is to support new analysis tools and mechanisms for integrating agroecosystem knowledge. Here, we describe the Landscape Data Commons with examples of current applications in rangelands. We invite the broader scientific, management, and education communities to use the Landscape Data Commons to explore data, conduct analyses, and develop and improve tools across a diversity of ecosystems.

LANDSCAPE DATA COMMONS
The Landscape Data Commons enables research and management communities to efficiently explore ecosystem processes and ecosystem management strategies through (1) data harmonization and aggregation, (2) data access, (3) model connections, and (4) integration with other datasets and analysis tools (Figure 1).

Data harmonization
At present, the Landscape Data Commons houses global data from standardized plot-based monitoring methods collected primarily in rangeland ecosystems. These core methods include line-point intercept, canopy gap, vegetation height, species inventory, and field wet aggregate soil stability (Herrick et al., 2018). In the United States, across research institutions, state and federal agencies, and non-governmental organizations, data have been collected at more than 85,000 locations since 2004 (Figure 2). An estimated 3000 monitoring locations have also been sampled internationally (Cleverly et al., 2019; Densambuu et al., 2018; Oliva et al., 2020). Although data collected using these methods are comparable across programs, they are not interoperable because they have often been stored in different formats. Accordingly, the first task of the Landscape Data Commons is to harmonize these datasets by transforming data stored using different schemas into a single, analysis-friendly dataset (Figure 3; McCord et al., 2022). We also harmonize other information co-located with some core methods plots, such as meteorological and soil erosion data collected by the National Wind Erosion Research Network (Webb et al., 2016), soil profile characterizations (Herrick et al., 2018), ecological site or site potential identifications (Caudle et al., 2013), and rangeland health assessments (Pellant et al., 2020).
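The schema transformation described above can be illustrated with a minimal sketch. It assumes two hypothetical contributing programs that store vegetation height records under different column names and units. The program names, column names, and target schema are invented for illustration and are not the Landscape Data Commons schema.

```python
# Illustrative only: source names, column names, and the target schema
# are hypothetical, not the actual Landscape Data Commons schema.
COLUMN_MAPS = {
    "program_a": {"PlotID": "plot_key", "VisitDate": "date_visited", "HeightCm": "height_cm"},
    "program_b": {"site": "plot_key", "sampled_on": "date_visited", "height_m": "height_cm"},
}

# Per-source unit conversions, keyed by target column name.
CONVERTERS = {
    "program_b": {"height_cm": lambda meters: meters * 100.0},
}


def harmonize(record, source):
    """Rename one source record's fields to the shared schema and convert units."""
    out = {"source": source}
    for src_col, target_col in COLUMN_MAPS[source].items():
        value = record[src_col]
        convert = CONVERTERS.get(source, {}).get(target_col)
        out[target_col] = convert(value) if convert else value
    return out


def harmonize_all(batches):
    """Combine records from every source into one list with identical keys."""
    return [
        harmonize(record, source)
        for source, records in batches.items()
        for record in records
    ]


rows = harmonize_all({
    "program_a": [{"PlotID": "A1", "VisitDate": "2019-06-01", "HeightCm": 42.0}],
    "program_b": [{"site": "B7", "sampled_on": "2019-06-03", "height_m": 0.5}],
})
```

Keeping the column maps and unit converters as data rather than code means a new contributing program could, under these assumptions, be added by registering one more mapping.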
After harmonizing raw measurements from field data collections, we aggregate the data into a single dataset and produce commonly used plot-level indicators as part of the harmonized dataset (Toevs et al., 2011). These indicators include vegetation cover and composition, soil surface cover, canopy gaps in different size classes, vegetation height by species and plant functional group, and wet soil stability estimates. In cases where models that rely on inputs from the Landscape Data Commons produce plot-level estimates, model outputs are stored alongside plot-level indicators (see the Model connections section below).
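As a rough illustration of how a plot-level indicator can be derived from harmonized raw records, the sketch below estimates total foliar cover from line-point intercept data as the percentage of pin drops whose top canopy hit is a plant. The record layout (`plot_key`, `top_canopy`) and the use of `None` for "no plant intercepted" are assumptions made for this example. The indicator calculations actually used follow the agency standards described in McCord et al. (2022).

```python
from collections import defaultdict


def total_foliar_cover(lpi_rows):
    """Percent of pin drops per plot whose top canopy hit is a plant.

    Each row is one pin drop; ``top_canopy`` is a species code, or None
    when no plant was intercepted (a convention assumed for this sketch).
    """
    drops = defaultdict(int)
    plant_hits = defaultdict(int)
    for row in lpi_rows:
        drops[row["plot_key"]] += 1
        if row["top_canopy"] is not None:
            plant_hits[row["plot_key"]] += 1
    return {plot: 100.0 * plant_hits[plot] / drops[plot] for plot in drops}


# Hypothetical pin drops for two plots.
lpi = [
    {"plot_key": "P1", "top_canopy": "BOER4"},
    {"plot_key": "P1", "top_canopy": None},
    {"plot_key": "P1", "top_canopy": "PRGL2"},
    {"plot_key": "P1", "top_canopy": "BOER4"},
    {"plot_key": "P2", "top_canopy": None},
    {"plot_key": "P2", "top_canopy": None},
]
cover = total_foliar_cover(lpi)
```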

Data access
The Landscape Data Commons provides access to harmonized data through a web data portal (www.landscapedatacommons.org) and an application programming interface (API; https://api.landscapedatacommons.org/, https://api.landscapedatacommons.org/api-docs/) (Figure 3). Users can access the harmonized raw data as well as environmental indicators and model outputs produced from the raw data. The calculation of environmental indicators follows U.S. Bureau of Land Management Assessment, Inventory, and Monitoring and U.S. Natural Resources Conservation Service standards (see McCord et al., 2022). Within the data portal, users can spatially select data using custom polygons, preview the selection, and then download the data. Federal Geographic Data Committee standard metadata and project description tables are included in the download package. Data with open use policies are freely available for all visualization and analysis purposes, while a range of data access permissions and embargo periods support data sovereignty, active research, and other legal data protections (Carroll et al., 2020). The Landscape Data Commons accommodates mixed access within a dataset; for instance, where geographic locations are not open to all users due to land ownership (e.g., for some plots located on private or indigenous lands), observations can still be included in summarized distributions (e.g., boxplots) provided by a data visualization tool.
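The mixed-access idea can be sketched as follows: every plot contributes to a boxplot-style summary, but coordinates are released only for plots whose locations are open or for users with permission. This is an illustrative sketch, not the portal's implementation. The field names (`open_location`, `lat`, `lon`) and the permission flag are hypothetical.

```python
import statistics


def summarize_for_display(plots, user_can_see_locations=False):
    """Return a five-number summary of all plots plus only the releasable locations."""
    values = sorted(p["total_foliar_cover"] for p in plots)
    q1, median, q3 = statistics.quantiles(values, n=4)
    summary = {
        "n": len(values),
        "min": values[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": values[-1],
    }
    # Restricted plots still inform the summary above, but their
    # coordinates are withheld unless the user has permission.
    locations = [
        {"plot_key": p["plot_key"], "lat": p["lat"], "lon": p["lon"]}
        for p in plots
        if p["open_location"] or user_can_see_locations
    ]
    return summary, locations


# Hypothetical plots; the third sits on land with restricted locations.
plots = [
    {"plot_key": "P1", "total_foliar_cover": 10.0, "lat": 32.6, "lon": -106.7, "open_location": True},
    {"plot_key": "P2", "total_foliar_cover": 20.0, "lat": 32.5, "lon": -106.8, "open_location": True},
    {"plot_key": "P3", "total_foliar_cover": 30.0, "lat": 32.4, "lon": -106.9, "open_location": False},
    {"plot_key": "P4", "total_foliar_cover": 40.0, "lat": 32.3, "lon": -106.6, "open_location": True},
    {"plot_key": "P5", "total_foliar_cover": 50.0, "lat": 32.2, "lon": -106.5, "open_location": True},
]
summary, locations = summarize_for_display(plots)
```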

Model connections
The Landscape Data Commons provides harmonized data formats that enable researchers and managers to efficiently run models from monitoring data inputs (Figures 2 and 3). We currently support three model types through the Landscape Data Commons, with efforts ongoing to integrate others (e.g., Musumba et al., 2017). The first type comprises plot-based process models that require inputs from the monitoring data and produce estimates of wind erosion (Edwards et al., 2022) or water erosion (Hernandez et al., 2017) for a given monitoring location. Model outputs are supplied back to the Landscape Data Commons for use by others, which increases accessibility to critical soil erosion information (Figure 2). Second, the Landscape Data Commons supports the development of modeling products that are stored and accessed on other platforms. For instance, the Landscape Data Commons was used to train and validate the Rangeland Analysis Platform (Allred et al., 2021), which produces spatially and temporally explicit estimates of vegetation cover indicators (Figure 2). Finally, the Landscape Data Commons is increasingly used in the development of conceptual models of ecosystem dynamics and services used to assess conservation practice effectiveness and improve conservation planning (e.g., Fletcher et al., 2020; Heller et al., 2022).

Integration with other datasets and analysis tools
The Landscape Data Commons is designed to support the linkage of ecosystem inventory and monitoring data to analysis tools and decision-making frameworks. This includes presenting field data with model outputs alongside conceptual and knowledge systems (Figures 3 and 4). One such knowledge system consists of major land resource areas and ecological site descriptions, a classification of land potential and ecosystem dynamics, that the U.S. Department of Agriculture and others use for conservation planning (Brown & Havstad, 2016; Kachergis et al., 2020; Spiegal et al., 2016). The Landscape Data Commons links indicator datasets to ecological sites through the Ecosystem Dynamics Interpretive Tool (https://edit.jornada.nmsu.edu/) so that quantitative and qualitative understandings of ecosystems can be explored.

FIGURE 1
The Landscape Data Commons enables researchers and managers to better understand agroecosystem dynamics by harmonizing and aggregating standardized monitoring data (Herrick et al., 2018; McCord et al., 2022), facilitating connections to models including wind (Aeolian EROsion) and water (Rangeland Hydrology and Erosion Model) erosion models and artificial intelligence remote sensing models, making data and model outputs available to users through a data portal and application programming interface (API), and providing links to analysis and decision support tools.

FIGURE 2
The Landscape Data Commons connects research datasets and models to land management monitoring programs so that standardized field measurements can be extended to understand new indicators (e.g., wind and water erosion; Edwards et al., 2022; Hernandez et al., 2017) and extend existing indicators of vegetation cover across space and through time (e.g., Allred et al., 2021).

FIGURE 3
The Landscape Data Commons cyberinfrastructure relies on a set of workflows and code that leverage different languages, including R, Python, and JavaScript. Where possible, workflows are automated to increase efficiency and maintain data quality. Without these tools, workflows, and databases, individual researchers and managers with the same goals as the Landscape Data Commons would be required to build their own data harmonization tools (e.g., McCord et al., 2022), develop quality assurance (QA) and quality control (QC) procedures (McCord et al., 2021), maintain a data storage and data sharing mechanism, and establish connections to a range of models and analysis tools (Allred et al., 2021; Edwards et al., 2022; Hernandez et al., 2017).

CASE STUDY: ASSESSING WIND EROSION RISK IN THE CHIHUAHUAN DESERT
Wind erosion and resultant blowing dust are important threats to ecosystems and human health in the Chihuahuan Desert of New Mexico (Figure 4). Wind erosion selectively removes fine soil, nutrients, and carbon (Webb et al., 2012), while blowing dust exposure is linked to adverse health effects such as asthma and Valley fever infections and is a safety hazard to road transportation (Tong et al., 2023). Ecologically, the region is dominated by grasslands and savannas, which have experienced extensive shrub encroachment and loss of perennial grasses that are desirable for livestock production (Bestelmeyer et al., 2018). In the last century, vegetation change from grass- to shrub-dominated ecological states has been accelerated by drought episodes (Bestelmeyer et al., 2011; Christensen et al., 2023). This shift in vegetation composition has resulted in a reduction of total foliar cover, which enables wind erosion to act as a positive feedback mechanism promoting shrub encroachment on sandy soils across the region (Okin et al., 2006). Land management to ensure regional ecosystem and human health relies on identifying thresholds that can be used as benchmarks to indicate where rangelands might be susceptible to accelerated wind erosion and ecological state change (Webb et al., 2020). Benchmarks can be used with monitoring data to assess wind erosion risk and develop appropriate management strategies from site to regional scales. However, without access to monitoring datasets that span land ownership in the region or linked erosion models, managers have had to rely on datasets that may not represent the range of ecological conditions, and they may not have the time or expertise to harmonize data and run models to assess erosion risk, which reduces the information available to inform management decisions.

FIGURE 4
An example application of the Landscape Data Commons to assess wind erosion risk in the context of drought at the site (Jornada), ecological site group (Sandy ESG), major land resource area (MLRA 42B), and ecoregion (Chihuahuan Desert) scales in New Mexico (a). We used 678 plots, sampled from 2011 to 2019, to establish a benchmark value for total foliar cover that indicates an increased risk of horizontal aeolian sediment mass flux, Q (g m⁻¹ day⁻¹), produced from the Aeolian EROsion (AERO) model (b). The established benchmark (30% total foliar cover) was confirmed by local experts. We then applied this benchmark to the remaining Landscape Data Commons points in the study area to determine how wind erosion risk is changing across scales (c) and in the context of ongoing drought pressure (d; data from the US Drought Monitor, available at https://droughtmonitor.unl.edu/).
We used the data and model results from 1,963 plots in the Landscape Data Commons to assess risk of wind erosion and associated ecological state change in southern New Mexico (Figure 4) at four scales: the US EPA Level III ecoregion (Omernik, 1987), major land resource area 42B (MLRA 42B; Salley et al., 2016), the sandy ecological site group (ESG; Salley et al., 2016), and a long-term research site on the Jornada Experimental Range (Webb et al., 2016). First, we established a benchmark for wind erosion by comparing total foliar cover estimates to modeled wind erosion estimates from the Aeolian EROsion model (Edwards et al., 2022) at 678 plots sampled between 2011 and 2019 (Figure 4b). We found that horizontal sediment mass flux (Q) increased when total foliar cover was below 30%. This value was validated by local experts familiar with wind erosion processes in the Chihuahuan Desert. We then applied this benchmark to the remaining 1,185 plots and evaluated total foliar cover relative to the 30% benchmark between 2015 and 2022 to determine the proportion of sampled plots that were meeting the benchmark and were therefore at lower risk of wind erosion (Figure 4c). We found that at the landscape scales (ESG, MLRA, and ecoregion), years dominated by extreme or exceptional drought conditions (Figure 4d) corresponded to increased risk of wind erosion, but this risk decreased as drought conditions eased. For the long-term research site on the Jornada Experimental Range, no site visits since 2016 met the wind erosion risk benchmark, indicating that a shift from grass-dominated to shrub-dominated states may already have occurred and that the site is at risk of further soil loss due to wind erosion.
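The benchmark application step described above reduces to a simple calculation, sketched here under stated assumptions: each plot visit is compared with the 30% total foliar cover benchmark, and the proportion of visits meeting it is reported by year. The visit records are invented for illustration and are not case-study data. Treating a value of exactly 30% as meeting the benchmark is also an assumption.

```python
from collections import defaultdict

BENCHMARK_PCT = 30.0  # total foliar cover benchmark reported in the case study


def proportion_meeting_benchmark(visits, benchmark=BENCHMARK_PCT):
    """Fraction of plot visits per year with total foliar cover at or above the benchmark."""
    total = defaultdict(int)
    met = defaultdict(int)
    for visit in visits:
        total[visit["year"]] += 1
        if visit["total_foliar_cover"] >= benchmark:
            met[visit["year"]] += 1
    return {year: met[year] / total[year] for year in sorted(total)}


# Hypothetical plot visits, not data from the case study.
visits = [
    {"plot_key": "P1", "year": 2018, "total_foliar_cover": 22.0},
    {"plot_key": "P2", "year": 2018, "total_foliar_cover": 35.0},
    {"plot_key": "P3", "year": 2018, "total_foliar_cover": 12.5},
    {"plot_key": "P4", "year": 2018, "total_foliar_cover": 41.0},
    {"plot_key": "P1", "year": 2019, "total_foliar_cover": 31.0},
    {"plot_key": "P2", "year": 2019, "total_foliar_cover": 30.0},
]
by_year = proportion_meeting_benchmark(visits)
```

Grouping the same calculation by ecological site group, MLRA, or ecoregion instead of by year alone would give scale-specific proportions of the kind summarized in Figure 4c.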
This case study reveals the utility of harmonizing and aggregating monitoring datasets and erosion models in the Landscape Data Commons. Harmonized and aggregated regional datasets linked with erosion models can provide data to support benchmark development to assess threats to rangeland social-ecological systems and provide insights into regional responses to climate and land management. Access to site-specific information in the context of the ecoregion enables users of the Landscape Data Commons to maximize the amount and spatial extent of data available to identify areas that are resilient to drought and other pressures and then initiate efforts to deploy site-specific management changes where needed.

DISCUSSION
The Landscape Data Commons cyberinfrastructure has the potential to transform scientific collaboration among researchers, land managers, and policymakers (National Academy of Sciences, Engineering & Medicine, 2016). Harmonized data in the Landscape Data Commons currently support ecosystem service evaluations (Fletcher et al., 2020; Metz & Rewa, 2019), rangeland wind and water erosion assessments (Edwards et al., 2022; Nearing et al., 2011), biodiversity assessments (Condon & Pyke, 2020), multiscale species distribution modeling (McMahon et al., 2021), and the USDA Long-Term Agroecosystem Research network (Spiegal et al., 2022). The Landscape Data Commons is unique in agroecological contexts in that we harmonize ecological monitoring data into a common format that both managers and researchers can use. Harmonized data enable scientists to support conservation and ecosystem management through knowledge co-production, where land management communities contribute data and assist in the interpretation and conceptual advances using those data (Herrick et al., 2017; Peters et al., 2020). While there is an increasing number of initiatives, such as the National Ecological Observatory Network, that simultaneously established standardized methods, data collection frameworks, and databases, and efforts such as the Agricultural Model Intercomparison and Improvement Project that are making large datasets available to researchers for modeling applications, few have successfully compiled data collected by multiple entities and made the data and model outputs accessible in a way that allows managers to easily analyze and interpret the data. In contrast, the Landscape Data Commons provides both data harmonization and analytical capabilities. Future expansions of the Landscape Data Commons may include contributions from compatible monitoring programs worldwide (e.g., Cleverly et al., 2019; Densambuu et al., 2018; Oliva et al., 2020), field-based aboveground production, soil health, and other measurements. We encourage the international community of land managers and researchers to use and contribute data to the Landscape Data Commons to advance ecosystem analyses, modeling, and assessments.
Data sharing and the addition of new data types will extend the utility of the Landscape Data Commons. For example, vegetation and soils data in the Landscape Data Commons are more easily interpreted alongside historical records documenting conservation and restoration practices (e.g., Pilliod et al., 2017), information on land management or protection status (U.S. Geological Survey, 2022), and conceptual models of ecological potential (Bestelmeyer et al., 2016). Creating multiple points of harmonized data access through APIs and a data portal ensures that a broad range of data users can interact with these datasets. Finally, data harmonization and sharing efforts are successful if they have direct applications to models and decision-support frameworks. Considering those connections in advance and working collaboratively with both the modeling and land management communities will ensure the Landscape Data Commons can maximize benefits to researchers and conservation planners.
As an infrastructure that can connect to both knowledge resources for managers and scientific modeling advances, the Landscape Data Commons provides a conduit for improving the accessibility of scientific research to managers (Figure 4). Future applications of the Landscape Data Commons could contribute to knowledge co-production in an adaptive management context, where data collected by land managers can help researchers improve models and advance our understanding of ecosystem processes to inform data-supported decision-making.

• Standardized monitoring protocols present an opportunity to understand cross-scale agroecosystem dynamics.
• The Landscape Data Commons provides data harmonization, data access, and model connections.
• Standardized data and modeled indicators enable managers to leverage quantitative data in decision-support tools.
• Shared data and model infrastructure can support collaborative adaptive management on agroecosystems globally.