Building a multi‐scale, collaborative, and time‐integrated digital crust: The next stage of the Macrostrat data system

Macrostrat is a platform for deep‐time geoscientific research that integrates stratigraphic columns and geologic maps into a digital description of the crust. The database and supporting software track crustal evolution and provide location‐based geological information to geoscience end users. Macrostrat houses multiple scales of mapping and stratigraphic data, from continent‐ and basin‐level summaries to single quadrangles and measured sections. Currently, Macrostrat's primary data holdings consist of regional stratigraphic columns with a spatial footprint weighted heavily to North America. While the data are of sufficient scale and resolution to generate insights about Earth evolution, increasing resolution and expanding spatial coverage will allow a new generation of scientific and interpretive uses. The next phase of Macrostrat's development will increase the detail and complexity of Macrostrat's multiscale data holdings, largely by engaging a wider range of geoscientists in entering stratigraphic data. To support broad collaboration, we are building new web‐based software to assemble and visualize regional stratigraphic sequences, refine multiple working age models, and compose regional records from measured stratigraphic sections. These tools will allow Macrostrat to draw on the expertise of a wide range of geoscience workers and grow a dataset with global relevance and a variety of end uses. New capabilities will pave the way to processes for submission, review, coordination, and assimilation of community‐contributed stratigraphic datasets. Digital compilation of geological maps and columns requires substantial effort, and well‐designed systems for distributing this work in the geoscience community will allow Macrostrat to build more adaptable and scientifically relevant products.


MACROSTRAT DATA SYSTEM
The Macrostrat data system describes the basic geologic elements of the Earth's crust in space and time. It consists of a relational geospatial database that holds information about rock units and their chronostratigraphic context, an age model that sits atop these column records, and a linked set of geologic mapping data (Peters et al., 2018).
The system provides an integrated platform for deep-time crustal research that describes the rock record quantitatively and in an explicit spatial and temporal context.
Macrostrat supports innovative deep-time Earth science (Section 1.2), and its data have entered wide community use through publicly accessible portals. However, its scope and usefulness are limited by incomplete and low-resolution data coverage. Here, we outline 'Macrostrat v2', a collaborative software infrastructure that extends Macrostrat into a comprehensive, multiscale stratigraphic archive. Carefully designed software will provide new ways for users to capture, visualize, and benefit from stratigraphic datasets at all scales. This software will support a new phase of collaborative data capture, engaging a broad community of researchers in digitally describing the Earth's stratigraphic framework.

| Fundamental elements
Macrostrat's stratigraphic and geologic mapping system allows geologic units, their properties, and spatial and temporal links between them to be captured and tracked. Data have been harmonized from many sources over ~15 years since the system's inception.

1.1.1 | Stratigraphic columns
Macrostrat's column data model consists of a set of chronostratigraphically arranged columns that describe successions of rocks (Figure 1; Peters et al., 2018). Columns are composed of stacked 'units', which describe physically and time-bound bodies of rock or sediment. Each unit can carry attributes, such as dominant and subsidiary lithologies, thickness (or range of thicknesses, if applicable), and internal characteristics like sedimentary structures, fossil occurrences, geochemical measurements, or palaeocurrent directions. Units can also be linked to a variety of proxy records that describe important features of rock units (Section 5.1.2).
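The column and unit structure described above can be sketched as a small set of types. This is a minimal illustration only: the type and field names (`StratColumn`, `StratUnit`, `minThickness`, and so on) are hypothetical and do not mirror Macrostrat's actual database schema or API responses.

```typescript
// Hypothetical types sketching the column/unit model described in the text.
// Field names are illustrative, not Macrostrat's real schema.
interface Lithology {
  name: string;       // e.g. "sandstone"
  proportion: number; // fraction of the unit, between 0 and 1
}

interface StratUnit {
  id: number;
  name: string;
  lithologies: Lithology[]; // dominant and subsidiary lithologies
  minThickness: number;     // metres
  maxThickness: number;     // metres; equals minThickness for a single value
  attributes: string[];     // e.g. sedimentary structures, fossil occurrences
}

interface StratColumn {
  id: number;
  name: string;
  units: StratUnit[]; // stacked units, listed from top to bottom
}

// The dominant lithology is taken to be the one with the largest proportion.
function dominantLithology(unit: StratUnit): string | undefined {
  let best: Lithology | undefined;
  for (const lith of unit.lithologies) {
    if (best === undefined || lith.proportion > best.proportion) best = lith;
  }
  return best?.name;
}

// Sum the per-unit thickness ranges to get a range for the whole column.
function thicknessRange(column: StratColumn): [number, number] {
  let min = 0;
  let max = 0;
  for (const u of column.units) {
    min += u.minThickness;
    max += u.maxThickness;
  }
  return [min, max];
}
```

Storing thickness as a range rather than a single number keeps the uncertainty attached to each unit, so aggregate quantities can be reported as bounds.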

A principled accounting of geologic time
The stratigraphic superposition of units can be tracked both by physical relationships (stratigraphic height) and chronostratigraphic ordering relative to time interval boundaries. Columns can optionally be subset into unconformity-bound 'packages' with a continuous representation of time within. This schema describes the physical layout of strata using relationships that have been understood since the genesis of modern geology (Steno's law), building a descriptive digital record of stratigraphy that is adaptable for a variety of purposes.
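As an illustration of how unconformity-bound packages could be derived from unit ages, the sketch below groups units whose age ranges overlap or abut, and starts a new package wherever a gap in time occurs. The names `TimedUnit`, `Package`, and `groupIntoPackages` are hypothetical, and this is not a description of how Macrostrat itself computes packages.

```typescript
// Ages are in Ma, so larger numbers are older; bAge is the base, tAge the top.
interface TimedUnit {
  id: number;
  bAge: number;
  tAge: number;
}

interface Package {
  bAge: number;
  tAge: number;
  unitIds: number[];
}

// Group units into packages of continuous time. A unit joins the current
// package if its base is at least as old as the package's current top;
// otherwise a hiatus separates them and a new package begins.
function groupIntoPackages(units: TimedUnit[]): Package[] {
  const sorted = [...units].sort((a, b) => b.bAge - a.bAge); // oldest first
  const packages: Package[] = [];
  for (const u of sorted) {
    const current = packages[packages.length - 1];
    if (current !== undefined && u.bAge >= current.tAge) {
      current.unitIds.push(u.id);
      current.tAge = Math.min(current.tAge, u.tAge);
    } else {
      packages.push({ bAge: u.bAge, tAge: u.tAge, unitIds: [u.id] });
    }
  }
  return packages;
}
```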
This physical description of stratigraphic relationships is augmented by a simplistic but capable time-age model which is applied to all columns in the Macrostrat database. Some geochronologic subdivisions, such as stage boundaries (Gradstein et al., 2020), have well known age constraints, while others acquire age information from their positions relative to these reference points. This system enables positional stratigraphic information to be mapped into continuous absolute time by interpolation between age constraints. This hierarchy of relative age relationships allows the Macrostrat age model to be dynamic, with constraints that automatically update as they are refined. Ongoing work toward tracking multiple age models will allow Macrostrat to support a wide range of ways to account for geologic time (Section 5.2.1).
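The interpolation step can be illustrated with a short sketch. It assumes simple linear interpolation between bracketing constraints, which the text does not specify; `AgeConstraint` and `interpolateAge` are hypothetical names, and `position` stands for any monotonic positional measure (for example, stratigraphic height or relative position within an interval).

```typescript
// A tie point between a stratigraphic position and an absolute age (Ma).
interface AgeConstraint {
  position: number;
  age: number;
}

// Linearly interpolate an age at `position` between the two bracketing
// constraints. Returns undefined when the position is not bracketed.
function interpolateAge(constraints: AgeConstraint[], position: number): number | undefined {
  const sorted = [...constraints].sort((a, b) => a.position - b.position);
  for (let i = 0; i + 1 < sorted.length; i++) {
    const lo = sorted[i];
    const hi = sorted[i + 1];
    if (lo === undefined || hi === undefined) continue;
    if (position >= lo.position && position <= hi.position) {
      if (hi.position === lo.position) return lo.age;
      const fraction = (position - lo.position) / (hi.position - lo.position);
      return lo.age + fraction * (hi.age - lo.age);
    }
  }
  return undefined;
}
```

Because ages are computed on demand from the constraints, revising a single reference age changes every dependent interpolated age the next time it is evaluated, which is the sense in which such a model is dynamic.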

A multiscale representation of stratigraphy
Like other geoscience data systems (e.g., Schonwalder-Angel et al., 2019; Walker et al., 2019), Macrostrat recognizes that geological phenomena often follow similar rules across scales. The additional ability to track superposition against either height or geologic time allows Macrostrat to ingest virtually any stratigraphic record, from regional composite columns with no direct physical correspondence to high-resolution core logs or field measured sections (Aswasereelert et al., 2013) in the same fundamental data structure.

KEYWORDS
collaboration, geology, macrostratigraphy, rock record, stratigraphy, visualization

Despite this flexibility, the current iteration of Macrostrat is primarily a chronostratigraphic project, and most Macrostrat columns are regional-scale records organized in geologic time. Columns and their constituent units are sourced primarily from regionally defined, representative stratigraphic summaries compiled at basin and continental scales (e.g., COSUNA; Childs, 1985). However, more localized scales of data are increasingly being ingested as part of data-assimilation projects (Segessenman, 2020, Section 4.1.2).

Project-based infrastructure
Macrostrat's 'project' infrastructure allows new datasets to be developed and compiled separately from the canonical ('core') Macrostrat column dataset. It is being used to organize ocean-drilling cores (Section 4.1.2) and mesoscale compilations targeting Ediacaran and Mesozoic basins in North America (Segessenman, 2020) as essentially stand-alone data products within the larger Macrostrat cyberinfrastructure. Altogether, 3,992 columns have been entered into various Macrostrat projects that are not yet part of the core public dataset of 1,534 columns. These columns cover over 18% of Earth's continental crust and much of the deep sea (Figure 1).

| Geologic maps
Geologic maps carry information on crustal physical structure that complements stratigraphic columns and provides important geological context. Macrostrat's geologic map database (visual exploration at https://macrostrat.org/map) integrates maps produced by a variety of organizations into a single, multiscale dataset. To date, 294 distinct geologic maps have been ingested and incorporated into the active mapping environment. These range from quadrangle-scale maps (e.g., Stone et al., 2017) to regional and global compilations (e.g., Chorlton, 2007; Horton et al., 2017). This multiscale map collection is stored in a PostGIS-enabled PostgreSQL relational database and integrated into a topologically consistent product. The composite map is publicly available as a 'Google Maps'-like global, multiscale dataset at four main levels of detail. Currently, 2,540,323 geologic map polygons, constituting ~40 GB of spatial data, are available from all map sources, and over 15,000 geologic units have been linked from the mapping datasets to stratigraphic columns. The strong age framework provided by Macrostrat's stratigraphic data system allows units in the mapping dataset to be augmented with more precise ages calculated by the Macrostrat age model; Macrostrat column units, in turn, are augmented with additional descriptive data, which often include richly detailed lithological descriptions of map units.
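One way a map unit's broad age range might be tightened using linked column units is sketched below. The rule used here (intersecting the map unit's range with the envelope of linked column-unit ages, and keeping the original range if they do not overlap) is an assumption for illustration, not a description of Macrostrat's actual procedure; `AgeRange` and `refineAge` are hypothetical names.

```typescript
// Ages in Ma: bAge is the older bound, tAge the younger bound.
interface AgeRange {
  bAge: number;
  tAge: number;
}

// Narrow a map unit's age range to its overlap with the combined age
// envelope of the column units linked to it. If there are no links, or
// the ranges do not overlap, the original map age is returned unchanged.
function refineAge(mapUnit: AgeRange, linked: AgeRange[]): AgeRange {
  if (linked.length === 0) return mapUnit;
  const envelopeBase = Math.max(...linked.map(u => u.bAge));
  const envelopeTop = Math.min(...linked.map(u => u.tAge));
  const bAge = Math.min(mapUnit.bAge, envelopeBase);
  const tAge = Math.max(mapUnit.tAge, envelopeTop);
  return bAge >= tAge ? { bAge, tAge } : mapUnit;
}
```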
The Macrostrat map dataset provides global coverage of generalized mapping and a consistent framework into which detailed maps can be readily added and discovered. This allows map visualization software and derivative data analyses to be developed against a common target, taking advantage of continuous improvements to maps. However, detailed mapping is currently available only in certain areas where such products have been ingested (Figure 1b).

| A tool for deep-time science
The primary goal of Macrostrat has been to enable novel science by providing comprehensive accountings of the quantity and character of rocks in the Earth's upper crust (Peters & Husson, 2018). Geologists have captured the basic lithologic, geochemical, and biological properties of crustal rocks since the mid-1800s. This accumulated body of knowledge informs our understanding of the long-term evolution and present-day condition of the integrated Earth system. Typically, geological records are composited from individual observations (e.g., stratigraphic columns, structural relationships, and sample-based proxies) which provide strong evidence for the local state of the Earth's environment at a particular time. Macrostrat aims to complement these disconnected records with a comprehensive digital description of crustal rocks that allows the estimation of aggregate properties over space and time. This framework, in turn, can be augmented with proxy records to better characterize important geologic features (Section 5.1.2).
Tabulating comprehensive global data on the volume, age, and composition of rocks can track the evolution of the sedimentary system through time and provide baseline expectations for the distribution of sample-based proxy records. The first steps in this direction were outlined in summary results published by a group of Russian geoscientists led by Alexander Ronov in the 1950s through 1980s (Ronov et al., 1980). This approach, 'macrostratigraphy' (Peters, 2006; Peters et al., 2022), has evolved in tandem with the Macrostrat data system and drives most of its science results. Macrostrat's continent-scale data allow bounds to be put on the actual quantity of sedimentary material that was deposited and eroded through Earth history.
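The kind of tabulation involved can be sketched as a time-binned sum. The example below totals the area of all rock records whose age range overlaps each time bin; the record structure, the `areaKm2` field, and the function name are hypothetical, and published macrostratigraphic analyses use more careful treatments than this minimal version.

```typescript
// A rock body with an age range (Ma) and a representative area.
interface RockRecord {
  bAge: number;    // base (older) age
  tAge: number;    // top (younger) age
  areaKm2: number; // hypothetical area attribute
}

// Total area of rock present in each time bin. Bin i spans
// [i * binWidth, (i + 1) * binWidth) Ma; a record contributes its full
// area to every bin that its age range overlaps.
function areaThroughTime(records: RockRecord[], maxAge: number, binWidth: number): number[] {
  const nBins = Math.ceil(maxAge / binWidth);
  const totals: number[] = new Array<number>(nBins).fill(0);
  for (const r of records) {
    for (let i = 0; i < nBins; i++) {
      const binTop = i * binWidth;       // younger edge of the bin
      const binBase = binTop + binWidth; // older edge of the bin
      if (r.tAge < binBase && r.bAge > binTop) {
        totals[i] = (totals[i] ?? 0) + r.areaKm2;
      }
    }
  }
  return totals;
}
```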
Tabulations of rock area have allowed new investigations into topics including the coevolution of life and the environment (Peters, 2005, 2008), the structure of the Great Unconformity (Keller et al., 2019; Peters & Gaines, 2012) and the contribution of igneous rocks to Earth's crustal framework through deep time (Peters et al., 2021). Estimates of rock abundance through time provide extra power when combined with proxy datasets, such as organic carbon (Husson & Peters, 2017) or stromatolite occurrences (Peters et al., 2017).

| A community-accessible index of physical geology
Though the Macrostrat database is primarily geared towards supporting global-scale integrative science, the compilation of a global set of stratigraphic and mapping data helps the geosciences in other ways. A record of the basic physical properties of rocks can be used as reference material or to provide context for more localized work. Compilations of mapping and stratigraphic data are time-consuming to assemble and put to use. Aggregating datasets on a common platform increases discoverability and provides a single data endpoint that can serve a wide variety of uses.
Macrostrat can be accessed using a freely available 'application programming interface' (API; https://macrostrat.org/api) that exposes its mapping and stratigraphic datasets in a consistent, machine-readable format. This public view of Macrostrat's resources drives a wide range of software that depends on stratigraphic and geologic mapping data. Some examples include our geologic map interface (https://macrostrat.org/map) and the Rockd mobile app (https://rockd.org), which combines Macrostrat contextual data with the ability to collect and share basic outcrop information (Figure 2).
Macrostrat's data resources are in wide community use for research, teaching, and exploration. The Macrostrat API serves more than 100,000 requests daily and provides geological context to a variety of user-facing software, including data portals (e.g., https://macrostrat.org/map and MinDat) and mobile apps that provide geologic context in educational and field settings, such as StraboSpot (Walker et al., 2019), Mancos, Flyover Country, and Rockd.
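As a minimal sketch of how client software might address the API, the helper below assembles a request URL from a route and a set of query parameters. Only the base URL comes from the text; the route `/v2/units` and the parameter names `col_id` and `response` are assumptions used for illustration and should be checked against the API's own documentation.

```typescript
// Base URL as given in the text.
const API_BASE = "https://macrostrat.org/api";

// Build a request URL with URL-encoded query parameters. Parameters are
// sorted by name so that the same request always yields the same URL.
function buildQuery(route: string, params: Record<string, string | number>): string {
  const query = Object.keys(params)
    .sort()
    .map(key => `${encodeURIComponent(key)}=${encodeURIComponent(String(params[key]))}`)
    .join("&");
  return query.length > 0 ? `${API_BASE}${route}?${query}` : `${API_BASE}${route}`;
}

// Hypothetical request for the units of one column (route and parameter
// names are assumptions, not confirmed by the text).
const unitsUrl = buildQuery("/v2/units", { col_id: 483, response: "long" });
```

The resulting string can be passed to any HTTP client; keeping URL construction separate from the request itself makes the query logic easy to test without network access.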

OVERCOME DATA LIMITATION
Although Macrostrat's data products are expressive and widely used by geologists, the system has reached limits in scope and functionality that require a change in approach to overcome. Despite Macrostrat's strong data model, the centralized nature of its data-gathering limits the scope of effort and new knowledge that can be brought to bear in assembling new column datasets. As a result, the Macrostrat database is primarily composed of low-resolution columns with incomplete spatial coverage (Figure 1).
To improve the completeness of Macrostrat's data holdings at all scales and broaden the impact of its capabilities for global stratigraphic analysis, a more participatory approach to curating Macrostrat's stratigraphic dataset must be taken. New, user-friendly workflows for data assimilation will allow contributions from workers with a range of geological backgrounds and expertise. This new community-oriented approach, along with the software tools needed to support it, constitutes a major update to Macrostrat's capabilities. The next phase of Macrostrat's evolution will comprise a substantial community infrastructure for collaboratively creating, integrating, and using digital geologic columns and maps. We refer to this new phase of work as 'Macrostrat v2'.

| Goals of data expansion
The envisioned expansion of Macrostrat focuses on stratigraphic columns, but a parallel approach will also be taken for Macrostrat's geologic map system, which is also limited (though to a lesser degree) by centralized data ingestion processes. Expanding column data holdings has two main goals: to produce better global coverage of stratigraphic data and to better represent multiple scales in Macrostrat's archive. Increasing the global coverage of Macrostrat's regional column dataset will support better-constrained characterizations of Earth history, accounting for the stratigraphic information available in all sedimentary basins. At the same time, integrating higher scale measured stratigraphic data will support new capabilities for regional and local science. The information gathered in this push will provide richer opportunities to explore the geologic record through Macrostrat's publicly available software and services (Section 1.3).

| Global coverage
Currently, the spatial footprint of Macrostrat columns is heavily weighted to North America, although recent work has expanded the dataset in South America, New Zealand, and the deep sea (Section 4.1.2). Overall, only ~20% of the Earth's continental surface is covered by a stratigraphic column, complicating global inference. Building a truly global dataset will open the door to richer studies of global change less likely to be biased by continent-specific records. Critically, it will provide a chance to determine whether signals that dominate the current North America-focused Macrostrat database, such as the 'boring billion', Great Unconformity, and Sloss tectonostratigraphic sequences, manifest as strongly when substantial continental area outside of North America is integrated. Improving this coverage towards a comprehensive global record will require engagement from geologists with local expertise focused on various regions.

2.1.2 | Multiple spatial scales
Macrostrat's key dataset of summary stratigraphic columns has sufficient scale, chronostratigraphic sampling, and resolution to generate insights about the continents' evolution through Earth history. Macrostrat also supports stratigraphic datasets much more detailed than these regional-scale summaries, including measured sections (Section 1.1.1). However, apart from ocean-drilling cores (Section 4.1.2), relatively few high-resolution data have been entered into Macrostrat thus far.
A stratigraphic dataset with increased spatial and temporal resolution will support new modes of scientific usage. Part of the reason Macrostrat's scientific focus has thus far leaned heavily towards the global 'macrostratigraphic' perspective is that such science is well supported by the regional composite data that are currently available. Macrostrat column datasets are often at too coarse a scale to capture high-resolution details (Figure 6a and b), leading to a different view of the geologic record than that derived from localized scrutiny (Flowers et al., 2020). Far from being an intrinsic shortcoming of Macrostrat's data management approach, this stems from the fact that Macrostrat is often data-limited at the detailed scales necessary to generate insights relevant to regional and local studies.
Increasing the scale and fidelity of data tracked by Macrostrat will allow the system to be used in localized and basin-scale studies, using the same tools and approaches that have been developed to study global change (e.g., Aswasereelert et al., 2013). Eventually, the digital capture of high-resolution column datasets will allow the automatic construction of stratigraphic composites from local constraints (Section 5.2.1).

| Limitations of current data entry approaches
Column data are currently entered into Macrostrat via manual database inserts, scripts, and web forms. These are efficient for expert users, but are aged, poorly documented, and fragmented. Only a handful of people, mostly Macrostrat lab graduate students and close collaborators, have been sufficiently versed in these techniques to enter data into the system; this approach cannot scale to a large collection of workers.
Increasing the spatial coverage of Macrostrat's stratigraphic datasets requires engaging with regional literature and expertise, and in some cases, seeking translation support. Data entry limitations are compounded for high-resolution data: although measured stratigraphy and drill cores are abundant and straightforward to digitize, localized columns represent such small areas of the globe that far more must be captured to create useful composite datasets.
As the Macrostrat dataset has grown in size and complexity, the staff involved in its curation do not have sufficient capacity or local expertise to add more detailed stratigraphic data worldwide. To grow, Macrostrat must prioritize collaborative data entry to involve a distributed community of scientists in gathering and validating datasets.

| Feedback between software and engagement
Effectively harnessing distributed community effort will greatly increase the quality and specificity of stratigraphic data tracked in Macrostrat, but will require a well-designed collaborative infrastructure. Successful knowledge-curation platforms generally support an engaged community of contributors with innovative, purpose-built software. Since the early 2000s, Wikipedia has built a best-in-class encyclopaedia through broad-based collaboration (e.g., Laniado & Tasso, 2011; Wilkinson & Huberman, 2007), with innovation supported by the MediaWiki software that powers the project (Barrett, 2008). Similarly, the OpenStreetMap collaborative geospatial index (Haklay & Weber, 2008) and user-curated geoscience databases such as the Paleobiology Database (Uhen, 2018) and Neotoma (Williams et al., 2018) have also built web-based software tools to curate community data resources. Macrostrat's new collaborative design additionally follows the principles of 'visual analytics' (Heer & Agrawala, 2008; Keim et al., 2008), in which data-rich visualizations are used to convey information in a way that assists understanding and continued enhancement. Following these patterns, new software tools for editing digital column datasets (Section 3) can support collaborative data stewardship by a wider range of geologists (Section 4). With rich graphical displays of stratigraphic columns (Section 3.2) and user-friendly editing components (Section 3.3), geologists will receive guided feedback for building and correcting column datasets. This 'virtuous cycle' will speed the production and improve the quality of digital stratigraphic columns.

COLLABORATION
The first iteration of Macrostrat software infrastructure focused on assembling the core stratigraphic and mapping databases and building APIs to support their public use (Peters et al., 2018). 'Macrostrat v2' shifts this focus to user-facing software to enable collaboration. Macrostrat's current column production pipeline, which consists of scripts tailored for expert use that provide minimal feedback, is not suitable for collaborative use. Correcting this deficiency requires careful attention to the user-facing software that mediates data workflows, to ensure effective representation of geological details and a simple user experience.
Here, we describe a new generation of software for building, visualizing, and using stratigraphic column datasets. New data entry software prioritizing ease of use will allow geologists to input columns with minimal guidance. A column visualization framework will showcase the results of this effort in a familiar and information-dense graphical format, encouraging contributions and data revisions. Lastly, improved tools to use stratigraphic datasets, link them to other geological information, and build Earth-system models will enhance the scientific value of Macrostrat column records, both individually and in aggregate (Section 5.1). This user-friendly software will provide the basic architecture for collaborative data-gathering efforts for column data (Section 4).

| Elements of system design
Macrostrat's user-facing applications (Table 1) capitalize on modern web technologies. User interfaces are developed in TypeScript to run in the web browser. Most applications use React, a robust and commonly used library for component-based web design, which allows pages to be built from nested, composable blocks. Web components are easy to share across applications, allowing visually rich interfaces to be rapidly assembled. Macrostrat maintains its own library of web components targeted at manipulating geological data (Table 1c). Such components are becoming standard elements of web software design; they will be maintained and extended to support future needs.
Broadly, the user-facing components described here are supported by the public Macrostrat API and authenticated extensions for stratigraphic data editing. Information will be stored in a PostgreSQL relational database alongside Macrostrat's other data holdings, and applications will run in Docker containers to ensure portability and maintainability. Macrostrat's software is versioned using git and our code is open-source and publicly available on GitHub.

| Visualizing stratigraphic columns
To serve the interpretive needs of geologists, stratigraphic columns must be presented using established conventions for data-dense, graphical display. Software to visualize stratigraphic columns in accepted, human-readable formats is necessary to drive broad usage of digital column datasets. Most existing software to digitally render stratigraphic columns focuses on producing static representations of individual columns as part of a pre-production process, either for measured stratigraphic logs (Duncan et al., 2021; Jobe et al., 2021; Lewis et al., 2011) or composite columns and correlation charts (e.g., Zehady et al., 2020). Macrostrat's data model and API responses can represent both types of stratigraphic summary, but its output is not optimized for human consumption. In our new visualization approach, we dynamically construct graphical stratigraphic columns from canonical computer-readable records, transmitted over Macrostrat's API.
The @macrostrat/column-components library (part of our web components system, Table 1c) focuses on building expressive, high-fidelity stratigraphic column visualizations using standardized representations of lithology, unconformities, and symbologies (Federal Geographic Data Committee, 2006). The module is written in TypeScript, uses web-based visualization tools such as d3 (Data Driven Documents; Bostock et al., 2011), and produces HTML and SVG outputs for interactive display in modern web browsers. A modular, component-based design (Section 3.1) allows column visualizations to be produced for different purposes from the same underlying data (Figure 3). Column-related information such as chemostratigraphy and sedimentary textures can also be integrated (Figure 3b and c). The adaptability of component-based visualizations allows graphical columns to balance expressiveness and approachability. Simplified column visualizations will be added to Macrostrat's web interface and the Rockd app. More complex versions, integrated closely with editing tools, will allow geologists to efficiently notice and correct deficiencies in stratigraphic column datasets (Section 2.3).
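The idea of constructing a graphical column directly from a machine-readable record can be illustrated with a deliberately simple sketch that emits SVG markup as a string. This does not use, and is not a substitute for, the @macrostrat/column-components library or d3; the `DrawableUnit` type, its fields, and the `renderColumnSVG` function are hypothetical.

```typescript
// A unit reduced to what is needed for drawing; units are listed top to bottom.
interface DrawableUnit {
  name: string;
  thickness: number; // metres
  fill: string;      // any SVG colour, e.g. "#e6c36a"
}

// Escape characters that are significant in XML text and attributes.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render units as stacked rectangles whose heights scale with thickness.
// Each rectangle carries its unit name as an SVG <title> (shown on hover).
function renderColumnSVG(units: DrawableUnit[], width: number, pixelsPerMetre: number): string {
  let y = 0;
  const rects: string[] = [];
  for (const u of units) {
    const height = u.thickness * pixelsPerMetre;
    rects.push(
      `<rect x="0" y="${y}" width="${width}" height="${height}" fill="${escapeXml(u.fill)}">` +
        `<title>${escapeXml(u.name)}</title></rect>`
    );
    y += height;
  }
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${y}">${rects.join("")}</svg>`;
}
```

Because the drawing is a pure function of the data, the same record can be rendered at different scales or levels of detail simply by changing the parameters, which is the property that component-based designs exploit.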

| Ingesting stratigraphic data
Alongside new visualization capabilities, a new generation of web user interfaces will enable collaborative data entry workflows for Macrostrat columns. Usability will be prioritized by establishing guided workflows for data curation tasks. These tools will increase the efficiency of digitizing stratigraphic columns, allowing more geoscientists to produce records that can be ingested into Macrostrat. Allowing flexibility while guiding users to produce meaningful, well-linked datasets requires sophisticated user-interface design. Here, we describe several examples of user-friendly interfaces that support complex data production workflows.

| Capturing column datasets
The centrepiece of Macrostrat's collaborative data entry infrastructure is the prototype 'column-builder' web application (Figure 4a; Table 1d), which substantially reworks Macrostrat's current data entry interfaces (Section 2.2) for increased usability and interactivity. In column-builder, stratigraphic columns are easily constructed and modified through forms and drag-and-drop interactions that streamline complex data entry tasks. Unit attributes, such as stratigraphic names, lithologic descriptions, and environmental interpretations, can be selected from searchable lists. Form-based workflows are also defined for project management and attribution to the published literature. By simplifying the process of capturing interpreted column datasets, this application will enable a broader set of collaborators to contribute to Macrostrat's core column datasets.

| Editing column footprints
Stratigraphic columns in Macrostrat are defined within a geographic area of validity. For spatial continuity, column footprints must be non-overlapping and have consistent boundaries. Macrostrat's column footprint editor (Figure 4b; Table 1e) streamlines this cumbersome topological operation. The software, consisting of a TypeScript application atop a PostGIS core, allows users to quickly create topologically accurate footprints for collections of stratigraphic columns. A set of basic drawing tools allows column outlines to be created and edited. Where columns are adjacent, these inputs are used to dynamically generate a seamless tessellation of column footprints. Existing geometries (shapefiles and GeoJSON) can also be imported for editing.
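The notion of 'consistent boundaries' can be made concrete with a partial topological check. In a valid tessellation of consistently wound (here, counter-clockwise) polygons, each directed edge belongs to exactly one polygon, and a boundary shared by two neighbours appears once in each direction. The sketch below applies that rule; it is not the footprint editor's algorithm (which relies on PostGIS), it does not detect overlaps between polygons that share no vertices, and all names are hypothetical.

```typescript
type Point = [number, number];

// A column footprint as a counter-clockwise ring of vertices. The ring is
// implicitly closed: the first point is not repeated at the end.
interface Footprint {
  colId: number;
  ring: Point[];
}

interface TopologyReport {
  sharedEdges: number; // boundaries used by two neighbouring footprints
  conflicts: string[]; // directed edges claimed by more than one footprint
}

function edgeKey(a: Point, b: Point): string {
  return `${a[0]},${a[1]}>${b[0]},${b[1]}`;
}

// Check directed-edge uniqueness across all footprints and count shared
// boundaries (edges that also occur in the reverse direction).
function checkFootprints(footprints: Footprint[]): TopologyReport {
  const owner = new Map<string, number>();
  const conflicts: string[] = [];
  for (const f of footprints) {
    const n = f.ring.length;
    for (let i = 0; i < n; i++) {
      const a = f.ring[i];
      const b = f.ring[(i + 1) % n];
      if (a === undefined || b === undefined) continue;
      const key = edgeKey(a, b);
      if (owner.has(key)) conflicts.push(key);
      else owner.set(key, f.colId);
    }
  }
  let sharedEdges = 0;
  owner.forEach((_colId, key) => {
    const [from, to] = key.split(">");
    // Count each undirected boundary once by only testing from < to.
    if (from !== undefined && to !== undefined && from < to && owner.has(`${to}>${from}`)) {
      sharedEdges++;
    }
  });
  return { sharedEdges, conflicts };
}
```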

| Incorporating column visualization
Pairing high-fidelity, standards-based stratigraphic data visualizations (Section 3.2) with robust editing capabilities will support continued enhancement of column datasets (Section 2.3). The prototype Stratiform application (Figure 4c; Table 1f), which prioritizes graphical input over tabular column displays, is a step in this direction. Like Macrostrat's basic column editor (Section 3.3.1), this application captures column datasets, but it is geared towards extracting stratigraphic information from measured logs and published bed-scale columns. This software will eventually support digitizing the large amounts of detailed stratigraphic data that are needed to expand Macrostrat's stratigraphic archive across multiple scales (Section 2.1.2).

| Linking stratigraphic names
Many aspects of column curation encode substantial scientific interpretation, and robust data entry workflows provide important means of ensuring fidelity. This can be demonstrated for many types of column-associated data, but an important example is capture of well-characterized stratigraphic names. A well-linked set of stratigraphic information requires consistent and accurate descriptions of stratigraphic name relationships. Communication of geologic information has benefited from well-maintained lexicons containing canonical stratigraphic names for specific regions, which are typically maintained by geologic surveys (e.g., Soller & Berg, 2005). Macrostrat unifies many of these resources into an index that facilitates processes such as linking stratigraphic and mapping datasets (Section 1.1) and assigning sample-based proxy data to geologic units (Section 5.1.2).
The varied usage of stratigraphic names complicates their digital description. One key issue is homonyms: official lexicons for the United States (Soller & Berg, 2005) and Australia (Lenz, 1996) both include an Admiral Formation, referring to completely separate bodies of rock. Conversely, many names record locally varying descriptions of the same entity. For example, Macrostrat tracks four distinct representations of the St. Peter Formation in midcontinent North America. Nesting of stratigraphic names (e.g., Apex Basalt Formation → Salgash Subgroup → Warrawoona Group) requires close examination to capture data with the needed level of specificity. Although officially curated stratigraphic names are most desirable, linking to lexicons is not always feasible, especially for new studies at localized scales. Indeed, 6% of stratigraphic names in Macrostrat lack an associated 'concept' linking to a canonical version. However, prioritizing links where possible is important to maintaining a coherent description of geology.
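Two of these complications, nesting and homonyms, can be illustrated with a small sketch. The record structure below (`StratName`, with `parentId`, `region`, and `conceptId` fields) is hypothetical rather than Macrostrat's schema; the example names and regions are those given in the text.

```typescript
// A hypothetical stratigraphic name record.
interface StratName {
  id: number;
  name: string;
  region: string;
  parentId?: number;  // the enclosing name in the hierarchy, if any
  conceptId?: number; // link to a canonical lexicon entry, if any
}

function indexById(names: StratName[]): Map<number, StratName> {
  const index = new Map<number, StratName>();
  for (const n of names) index.set(n.id, n);
  return index;
}

// Walk from a name up through its parents, returning the nested lineage.
// A visited set guards against accidental cycles in the data.
function lineage(index: Map<number, StratName>, id: number): string[] {
  const chain: string[] = [];
  const seen = new Set<number>();
  let current = index.get(id);
  while (current !== undefined && !seen.has(current.id)) {
    seen.add(current.id);
    chain.push(current.name);
    current = current.parentId === undefined ? undefined : index.get(current.parentId);
  }
  return chain;
}

// Homonyms: all records sharing a name string, which must then be told
// apart by other properties such as region.
function findHomonyms(names: StratName[], name: string): StratName[] {
  return names.filter(n => n.name === name);
}
```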
FIGURE 4 Editing interfaces for digitizing different aspects of stratigraphic columns: (a) Tabular interface for editing regional chronostratigraphic columns (Table 1d). (b) Topological editing of column spatial footprints (Table 1e). (c) Entering measured stratigraphic data from field-gathered graphical logs and observations (Table 1f).

One way to address such complexities is to centre user-interface design in column creation workflows and build interactions that facilitate the discovery of contextually relevant stratigraphic names (Figure 5). A new name linking interface allows open search through all names known to Macrostrat, prioritizing those present in nearby maps and columns or linked to an official lexicon (Figure 5a). Coloured tags show brief summaries of link status, parent names, and matched locations to assist in finding the best match. Easily accessible information panels show detailed descriptions of the selected geologic unit, including hierarchy and lexicon metadata (Figure 5c). The interface provides many visual cues to help users quickly evaluate and select candidate names for a geologic unit. Similar care will be invested in user workflows for other metadata-linking tasks, allowing well-linked column datasets to be created with relative ease (Section 3.1).
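The prioritization described above could be implemented in many ways; the sketch below shows one minimal possibility. The candidate structure, the substring matching, and the scoring weights (proximity counted more heavily than a lexicon link) are all assumptions for illustration and do not describe the actual interface logic.

```typescript
// A hypothetical search candidate for a stratigraphic name.
interface NameCandidate {
  name: string;
  inNearbyColumnOrMap: boolean; // present in nearby maps or columns
  hasLexiconLink: boolean;      // linked to an official lexicon
}

// Return candidates whose name contains the query (case-insensitive),
// ordered so that nearby and lexicon-linked names come first. Ties are
// broken alphabetically. The weights are arbitrary illustrative choices.
function rankCandidates(query: string, candidates: NameCandidate[]): NameCandidate[] {
  const q = query.trim().toLowerCase();
  const score = (c: NameCandidate): number =>
    (c.inNearbyColumnOrMap ? 2 : 0) + (c.hasLexiconLink ? 1 : 0);
  return candidates
    .filter(c => c.name.toLowerCase().indexOf(q) !== -1)
    .sort((a, b) => score(b) - score(a) || a.name.localeCompare(b.name));
}
```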

ENGAGEMENT
Software that enables geoscientists to add stratigraphic column datasets with relatively little training lays the groundwork for broader collaboration among geoscientists towards assembling a multiscale stratigraphic archive.

| Current collaborative efforts
Since its inception, supporting use by a wide range of geologists has been a core goal of the Macrostrat platform (Peters et al., 2018), with notable success (Section 1.3). However, engaging a broader set of geologists in developing the data resource is a relatively new focus. Here, we outline some ongoing collaborative projects to improve Macrostrat, which provide a template for further efforts.

| PalaeobioDB integration
Recent efforts to align Macrostrat with the Paleobiology Database (PBDB; paleobiodb.org) are emblematic of the community-led approach we seek to build. PBDB is a long-standing community data resource focused on occurrences of macroscopic life on past Earth (Uhen, 2018), with over 400 contributors and 430 official publications to date. The need to contextualize fossil occurrence data within a geological framework has driven the development of substantial links to Macrostrat for rock units, timescales, and palaeogeographic reconstructions. Macrostrat has led development of the PBDB API, and these emerging links between data systems have contributed to science integrating stratigraphic and palaeobiology approaches (e.g., Heim & Peters, 2011; Peters & Heim, 2010; Peters & McClennen, 2016). As Macrostrat gains the tools to hold different representations of stratigraphy, we plan to deepen these links, fully replacing PBDB's internal stratigraphic model with Macrostrat's equivalent. This will allow us to draw on expertise from the palaeobiology community in developing new column datasets.

| The eODP project
The ongoing eODP project (Fraass et al., 2020) has provided an opportunity to test both the technical and collaborative aspects of community-led data ingestion into Macrostrat. A team of scientists at the Integrated Ocean Drilling Program (IODP) is working with Macrostrat and PBDB to digitize, clean, and standardize stratigraphic data for the global ocean-drilling expeditions under their stewardship, as well as data from the other drilling epochs (ODP, DSDP). Through this effort, 888 columns (offshore drilling holes) containing over 150,000 staged, cm-scale lithological units and their properties have been integrated into Macrostrat (Figure 3d). These columns represent physical descriptions of logged cores rather than regionally composited stratigraphy. Still, they sit comfortably alongside Macrostrat's canonical dataset in a separate project.

| The Rockd mobile app
One of the most substantial forays that Macrostrat has made into collaborative data curation has been the Rockd mobile app. The app seeks to help users discover their geological surroundings and is accessible to students, enthusiasts, and professional geologists. It also enlists these same users in capturing outcrop-scale observations.
Rockd uses Macrostrat's composite geologic map and public API to provide rich contextual information about nearby geologic features. Alongside these core datasets, Rockd also hosts user-provided 'checkins' documenting notable geological features, and includes in-app tools to create and share them. Field observations shared by Rockd users add rich local details to Macrostrat's geological information and increase the app's effectiveness as a discovery platform. Since checkins are dominated by outcrop-scale, easily digestible snippets of geological information, they are particularly valuable for outreach and education (Cohen et al., 2018; Schott, 2017).
The Rockd checkin system has become the basis for a major collaborative exercise to compile information about important outcrops. As of March 2023, over 95,000 users have created Rockd user accounts, and active users have contributed 33,000 observations and photos at 19,000 locations. The resulting dataset constitutes a substantial and rapidly growing global index of important field sites. Through maintaining Rockd, Macrostrat staff have accrued substantial expertise in building user-facing software systems (e.g., user management, authentication, and data quality control), which will be brought to bear for a collaborative Macrostrat platform (Section 5.3).

| Engaging new research communities
In many other parts of the geosciences, researchers have begun to develop community-level aggregate datasets; these parallel efforts are ripe targets for integration with Macrostrat. One such potential integration is with the Sedimentary Geochemistry and Paleoenvironments (SGP) project (Farrell et al., 2021), which seeks to combine geochemical proxy data from many sources to build deep-time palaeoenvironmental records. Its research consortium structure has allowed it to rapidly achieve the scale of established collaborations such as PBDB. As of late 2022, over 150 geoscience researchers worldwide are engaged in compiling this dataset. SGP does not have a unified time-stratigraphic model, and the fidelity of its proxy dataset would improve substantially if linked to Macrostrat's sophisticated representation of stratigraphy. With user-friendly column entry tools, SGP researchers could compile stratigraphic datasets from the literature alongside their targeted proxy records.
Another axis of community engagement will be the creation of working groups to digitize stratigraphic records for specific areas of the globe. Often, this will mean working with geologic surveys that maintain the canonical representations of stratigraphy in various regions. Scientific working groups will ingest finer scale stratigraphic compilations (e.g., for a specific basin) to address specific geologic research questions. The openness and flexibility of the Macrostrat system will allow data compiled independently by many users to be freely applied to other scientific purposes.

| Enhancing stratigraphic data
As new, more refined data are published using Macrostrat's tools, they immediately become available on publicly accessible APIs. New data added to the system will, therefore, receive community-wide visibility as part of a global dataset, and curators will gain attribution accordingly. Straightforward access to up-to-date, expert-compiled stratigraphic records will greatly benefit the geologic community, who will immediately be able to use the data through platforms like Macrostrat's web applications and Rockd.
Importing stratigraphic data into Macrostrat produces a standardized record that is automatically linked to other expressions of geologic history, such as map units, palaeogeographic rotations, and geochemical proxies. Building capabilities for stratigraphic analysis and expression will provide more tools for users of the Macrostrat system, contributing to its attractiveness and to the scope of science that is feasible on the platform. Here, we outline some emerging capabilities that will potentially make the platform more useful to deep-time geoscience researchers, especially as the core column dataset expands.

| Plate reconstructions
Palaeogeographic reconstructions have been integrated into Macrostrat's API (Peters et al., 2018), enabling rotations to past continental positions for Macrostrat, Rockd, the PBDB Navigator, and other web-based data analysis and visualization software. Macrostrat palaeogeography capabilities were previously based on the GPlates web service (Müller et al., 2018), but have been expanded using Corelle (Table 1g) (https://rotate.macrostrat.org), a GPlates-compatible system geared towards efficient web visualization for multiple plate rotation models. These capabilities are a key building block for data-driven palaeoenvironmental maps derived from columns, maps, and other environmental data.
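The geometric core of such a rotation can be sketched briefly. Plate rotation models express the motion of each plate as rotations about Euler poles; the example below applies a single finite rotation to a point on a unit sphere using Rodrigues' rotation formula. It is a generic illustration, not the Corelle or GPlates API: the function name `rotate_point` and its signature are assumptions, and a real reconstruction would additionally look up the plate to which a point belongs and compose rotations through time.

```python
import math


def rotate_point(lat, lon, pole_lat, pole_lon, angle_deg):
    """Rotate (lat, lon) about an Euler pole by angle_deg on a unit sphere.

    Angles are in degrees; positive angles are counterclockwise when
    looking down on the pole from outside the sphere.
    """
    def to_xyz(la, lo):
        la, lo = math.radians(la), math.radians(lo)
        return (
            math.cos(la) * math.cos(lo),
            math.cos(la) * math.sin(lo),
            math.sin(la),
        )

    p = to_xyz(lat, lon)            # point to rotate
    k = to_xyz(pole_lat, pole_lon)  # unit vector along the rotation axis
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    dot = sum(ki * pi for ki, pi in zip(k, p))
    cross = (
        k[1] * p[2] - k[2] * p[1],
        k[2] * p[0] - k[0] * p[2],
        k[0] * p[1] - k[1] * p[0],
    )
    # Rodrigues' formula: p' = p cos(t) + (k x p) sin(t) + k (k . p)(1 - cos(t))
    r = tuple(
        p[i] * cos_t + cross[i] * sin_t + k[i] * dot * (1.0 - cos_t)
        for i in range(3)
    )
    new_lat = math.degrees(math.asin(max(-1.0, min(1.0, r[2]))))
    new_lon = math.degrees(math.atan2(r[1], r[0]))
    return new_lat, new_lon
```

For instance, rotating a point on the equator at the prime meridian by 90° about the north pole moves it to 90° E while leaving its latitude unchanged.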

| Alignment with proxy datasets
Aligning Macrostrat's stratigraphic archives with proxy datasets (e.g., geochronology, taxa first appearances, and stable-isotope geochemistry) is important to answering a variety of science questions (Section 1.2). Macrostrat provides the ability to link proxy records to rock units within its column data structure (Figure 6), establishing chronostratigraphic context for measurements and providing new descriptive information for rock units.
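As a rough illustration of what linking a proxy record to a rock unit involves, the sketch below attaches a measurement to whichever unit of a column contains the sample's stratigraphic height. The `Unit` record and `link_measurement` function are hypothetical and do not reflect Macrostrat's database schema; they only show the containment lookup that gives a measurement its stratigraphic context.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical column unit bounded by heights in metres above the column base."""
    name: str
    bottom_m: float
    top_m: float
    measurements: list = field(default_factory=list)


def link_measurement(units, height_m, value):
    """Attach a proxy value to the unit whose height range contains the sample.

    Returns the matching unit, or None if the sample lies outside the column.
    """
    for unit in units:
        # Half-open interval so a sample on a contact belongs to the upper unit.
        if unit.bottom_m <= height_m < unit.top_m:
            unit.measurements.append((height_m, value))
            return unit
    return None
```

Once a measurement is attached to a unit in this way, it inherits that unit's age constraints and lithological description, and the unit in turn gains a new descriptive attribute.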
Macrostrat maintains a large collection of proxy datasets, especially focusing on whole-rock geochemistry and detrital zircon geochronology compilations (Puetz, 2018); we also compile data directly from the primary literature. However, compiling proxy data is not Macrostrat's primary expertise, and we seek to shift the locus of this work into the communities that produce these data. Recently, by partnering with SGP (Section 4.2) and taking a leadership role in building data systems for geochemical labs (Quinn et al., 2019; https://sparrow-data.org), we have sought to build data integrations that allow Macrostrat to benefit from large-scale community efforts to digitize proxy datasets. Linking dedicated compilations of proxy data to Macrostrat can situate records of Earth history within a time-dynamic crustal framework, building new capabilities for deep-time research (Figure 7).
Macrostrat's integrated age model (Peters et al., 2018; Section 1.1.1) is readily applied to all stratigraphies tracked in Macrostrat. However, its simplistic approach sidesteps process-based systematics like orders-of-magnitude variation in deposition rates (Sadler, 1981) and ignores age constraints from fossil occurrences, geochronology, and sequence-stratigraphic correlations that are not incorporated into published unit ages. This approach is most appropriate for summary datasets which gain their general age reference frame prior to entering the Macrostrat system. However, combining high-resolution stratigraphic columns and unit-referenced proxy records could allow Macrostrat to take advantage of sophisticated age-modelling approaches, such as constrained optimization against the ordering of biostratigraphic events (Sadler, 2001, 2006), chemostratigraphic pattern-matching (Hay et al., 2019), and Bayesian inference (Haslett & Parnell, 2008; Trayler et al., 2020).
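To make concrete the kind of simplification that the more sophisticated approaches aim to overcome, the sketch below assigns an age to a sample by linear interpolation between the ages of a unit's base and top, which amounts to assuming a constant deposition rate with no hiatuses. This is a generic baseline written for illustration, not Macrostrat's own age-model algorithm; the function `interpolate_age` and its parameters are assumptions.

```python
def interpolate_age(height_m, bottom_m, top_m, bottom_age_ma, top_age_ma):
    """Age (Ma) at height_m, assuming constant deposition rate within the unit.

    Heights are in metres above the column base; ages are in millions of
    years, with the base of the unit older than its top.
    """
    if not bottom_m <= height_m <= top_m:
        raise ValueError("height lies outside the unit")
    if top_m == bottom_m:
        return bottom_age_ma  # zero-thickness unit: no interpolation possible
    fraction = (height_m - bottom_m) / (top_m - bottom_m)
    return bottom_age_ma + fraction * (top_age_ma - bottom_age_ma)
```

A sample halfway up a 10 m unit spanning 340 to 330 Ma would be assigned 335 Ma under this assumption. Approaches such as Bayesian age-depth modelling replace the fixed straight line with a family of plausible accumulation histories conditioned on dated horizons.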
An extension of Macrostrat's age-modelling framework to measured-section scale lays the groundwork for generative processes to apportion time through the geologic record, correlating major stratigraphic boundaries and generalizing units from field to regional scale.This approach would allow replication of the current Macrostrat dataset and major Earth-history synthetic products like the Geologic Timescale (Gradstein et al., 2020) directly from field constraints using a repeatable, software-encoded process.The resulting first-principles digital chronostratigraphic model for the entirety of Earth history would be a satisfying evolution of the Macrostrat project.

| Geologic framework models
One area that has seen relatively little work so far is the creation of 3D models from Macrostrat's stratigraphic and geologic datasets, which would give spatial, depth-domain context to what might exist in the subsurface. A growing set of available constraints from new stratigraphic datasets should allow the construction of high-quality geologic framework models. Other efforts to produce generative framework models from dynamic constraints have seen impressive results (Boyd, 2019). Making progress in this direction will fulfil a longstanding goal for Macrostrat to become a fully defined spatial and temporal index of the Earth's rock record.

| Digitally enabled stratigraphic publication
Taken together, the approaches described here allow Macrostrat to mediate significant aspects of geoscience data collaboration in a web-based software environment. This is a substantial advance on current practices, in which stratigraphic data are managed manually throughout their compilation and peer review. The Macrostrat system may ultimately provide a platform to model draft data in a digital framework throughout the interpretive process.
As more workers ingest local to mesoscale stratigraphic datasets, there is an increasing likelihood that conflicting stratigraphic models based on differing interpretations will be encountered. Even with the limited number of independent projects currently in Macrostrat, these conflicts are already evident (Figure 6a,b). These disagreements are an inevitable part of the scientific process. If the community begins contributing to Macrostrat at the scale envisioned here, production of coherent community-level column datasets will require resolving these conflicts using tools and processes to assess, review, and synthesize competing stratigraphic models (Section 5.3).
One potential endpoint of this approach could be a coordination process similar to literature peer review integrated with Macrostrat's user interfaces, and a staged process for assimilation into Macrostrat's canonical map and column datasets.Well-designed software for organizing this work may allow Macrostrat datasets to be credited as a scientific product, providing traceability of the interpretive process and a further incentive for researcher engagement.

| Conclusion
Macrostrat is a foundational software platform for the description of the Earth's crust, but its lack of both global coverage and high-resolution local stratigraphy limits its research and contextual value. Here, we have outlined a community-driven collaborative approach that will allow us to move this science-enabling resource past these data limitations. This new systematic approach requires the construction of substantial new infrastructure, deemed 'Macrostrat v2', to harness this collaborative effort. The enhanced Macrostrat system will include rich data visualizations and column editing software, potentially extending to include systems of review.
This extension of Macrostrat's software infrastructure is already underway. Several prototype applications for data visualization and column entry have been developed, prioritizing increased usability. Once finalized, these tools will enable more vigorous collaboration to create and maintain stratigraphic records. Greater community involvement, especially 'crowd-sourced' local geologic expertise, will be directed towards filling gaps in global datasets and enriching Macrostrat's store of local-scale data.
Ultimately, building a multiscale stratigraphic compilation is a key step towards producing a spatiotemporally complete digital record of the Earth's crustal framework. Such a 'digital twin' can situate geological models of many types and is of great value to future geoscience research. Realizing this vision requires the engagement of software developers, funding agencies, and geoscientists to extend digital infrastructure and contribute to community data resources. We expect that the new capabilities of Macrostrat outlined here will shape these efforts in the years to come.

FIGURE 1 Global distribution of Macrostrat data. (a) Core stratigraphic column dataset showing a heavy weighting towards North America. Inset shows the temporal organization of a single column. (b) Sources for the multiscale geologic map dataset, with highest levels of detail in selected areas of western North America.

FIGURE 2 User-facing software for exploring the Earth's geologic record. (a) Macrostrat's web interface (https://macrostrat.org; Table 1a) showing our geologic map atop a north-oriented oblique globe view. (b) Same software showing a 3D view of Golden, Colorado with contextual information panel. (c) Map and location context views for the mobile platform in the Rockd app (https://rockd.org; Table

FIGURE 3 Expressions of stratigraphic columns produced by Macrostrat's web-based renderer (Table 1c). (a) Macrostrat regional column dataset (vertical axis = time). (b) A measured section and associated chemostratigraphy (vertical axis = height, horizontal axis = grain size, colours = facies). (c) Detailed measured column with aligned graphic log and field notes. (d) Ocean-drilling core log (vertical axis = depth below seafloor) (Section 4.1.2).

FIGURE 5 Stratigraphic name selection workflow, designed to guide users towards linking a Macrostrat geologic unit to existing lexicons. (a, b) Searching stratigraphic names by fuzzy string matching. Formations known to occur near the column location are prioritized, and tags show the source and relevance of each name. (c) Modal view of detailed information for a specific stratigraphic name, including source and name hierarchy. (d, e) Alternate flow for creating a new stratigraphic name, for cases when a lexicon match is not available. (f) Summary panel for a successfully linked stratigraphic name.

FIGURE 7 Spatial and temporal overlay of Macrostrat geologic framework and proxy records. (a) Corelle (Table 1g) web visualization showing a Gondwana-centred south polar view rotated to 341 Ma (reconstruction from Wright et al., 2013) overlain by Macrostrat column footprints and a density plot of PBDB fossil collections (grey), Macrostrat measurements (blue), and SGP samples of Visean (lower Carboniferous) age. (b) Macrostrat total sedimentary area through geologic time (black line); TOC measurements from SGP, summarized with a loess regression (red). Sparse spatial and temporal coverage emphasizes the need for new data compilation.
TABLE 1 Software managed by Macrostrat and mentioned in this article.
Note: Repositories are relative to https://github.com/UW-Macrostrat.