A Graph Theoretical Intercomparison of Atmospheric Chemical Mechanisms

Graph‐theoretical methods have revolutionized the exploration of complex systems across scientific disciplines. Here, we demonstrate their applicability to the investigation and comparison of three widely used atmospheric chemical mechanisms of varying complexity: the Master Chemical Mechanism v3.3, GEOS‐Chem v12.6, and the Super‐Fast chemical mechanism. We investigate these mechanisms using a class of graphical models known as species‐reaction graphs and find similarities between these chemical reaction systems and other systems arising in nature. Several graph theoretical properties are consistent across mechanisms, including strong dynamical system disequilibrium and clustering of chemically related species. This formalism also reveals key differences between the mechanisms, some of which have characteristics inconsistent with domain knowledge; e.g., isoprene and peroxy radical chemistry exhibit substantially different graph properties in each mechanism. Graph‐theoretical methods provide a promising set of tools for investigating atmospheric chemical mechanisms, complementing existing computational approaches, and potentially opening new avenues for scientific discovery.

than 25 species and reactions to >10,000. These larger mechanisms frequently include contributions from teams of researchers for a variety of purposes, and do not always follow consistent design principles. As the complexity of these mechanisms grows, understanding the behavior of these systems through manual inspection of the chemical reactions becomes intractable. Thus, the emergent properties and characteristics of the chemical system are typically investigated using computational approaches, e.g., where the system of differential equations for chemical abundances represented by the mechanism is solved for a given set of initial conditions (Brasseur & Jacob, 2017).
In this work, we present an alternative approach to investigating complex atmospheric chemical mechanisms. We show how discrete mathematical techniques from the field of graph theory can provide insights into the inherent structural properties of atmospheric chemical mechanisms that are independent of any specific chemical state, can be used to evaluate the consistency of a mechanism with existing domain knowledge, and can potentially provide new avenues for insight into atmospheric chemical systems. We begin by introducing chemical graph theory in the context of atmospheric chemistry, followed by several examples demonstrating how these graph-theoretical methods can allow for the quantitative exploration and intercomparison of gas-phase atmospheric chemical mechanisms across scales.

Chemical Graph Theory
Graph theory provides a set of tools for understanding the relationships within any type of complex system, such as a chemical mechanism, that can be represented as a "graph." In general terms, a graph, G = (V, E), is a defined mathematical construct that is composed of a set of vertices, V, interconnected by pairwise relationships expressed by a set of edges, E. Edges connect vertices whose interaction we are interested in understanding. Mathematical operations can be applied to graphs to provide insight into complex system behaviors (Estrada, 2016). As a set of data analytical tools, graph-theoretical methods are widely used in a variety of domains, including social network analysis, neuroscience, genetics, and ecology (Estrada, 2016). For a more detailed introduction to graph theory, see Wilson (2010) and Estrada (2016).
Graph-theoretical methods have already been applied in a variety of chemical applications, including the investigation of reaction mechanisms (e.g., Lu & Law, 2005, 2006), and particularly for organic chemical synthesis (Bajczyk et al., 2018; Fuller et al., 2012; Grzybowski et al., 2009). For example, Grzybowski et al. (2009) compiled a list of all known reactions in the synthesis of organic molecules into a graph, and illustrated the vast complexity of this chemical system and the potential utility of graph-theoretical methods for chemical synthesis. In an atmospheric context, Dobrijevic et al. (1995, 2010) used similar methods in studies aimed at exploring the important production pathways of organic species in the atmospheres of Neptune and Titan.
Chemical mechanisms can be represented by a specific type of graph known as a species-reaction graph (Feinberg, 2019; Sakamoto et al., 1988). In a species-reaction graph, each chemical species and each reaction equation is represented as a vertex in the graph, and edges connect each reaction with its chemical reactants and products. The edges are directional, with edges pointing from reactants to each of the reaction equations they participate in, and edges pointing from the reaction equations to each of their products. Species-reaction graphs belong to a class of graphs termed a "directed bipartite network," where vertices belong to two distinct sets (i.e., chemical species, and reaction equations), and where every edge connects a vertex from one set (a chemical species) to a vertex from the other set (a reaction). The species-reaction graph is distinct from simply connecting reactants to products, as has been done in previous work (Grzybowski et al., 2009). Using the bipartite representation with intermediate reaction vertices is preferable, as it directly represents the interactions between reactants, and is thus a more realistic representation of the complex system (Estrada, 2016; Feinberg, 2019).
An example of a simple species-reaction graph for a two-reaction mechanism is shown in Figure 1. The reactants CH2O and OH are connected to their reaction vertex R1, which is connected to its products, H2O, CO, and HO2. The second reaction, the oxidation of CO: CO + OH, is encoded analogously.
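The encoding described above can be sketched in a few lines of Python. This is a minimal illustration rather than the workflow used in this work (which relied on gephi and igraph). The function name and data layout are hypothetical, and the products listed for the second reaction are an assumption made only so the example is complete.

```python
# Each reaction is (label, reactants, products).
# R1 follows the text; the products of R2 are assumed for illustration only.
REACTIONS = [
    ("R1", ["CH2O", "OH"], ["H2O", "CO", "HO2"]),
    ("R2", ["CO", "OH"], ["CO2", "HO2"]),  # assumed products
]


def build_species_reaction_graph(reactions):
    """Return (species, reaction_labels, edges) of a directed bipartite graph."""
    species, reaction_labels, edges = set(), [], []
    for label, reactants, products in reactions:
        reaction_labels.append(label)
        for s in reactants:
            species.add(s)
            edges.append((s, label))  # edge: reactant -> reaction vertex
        for s in products:
            species.add(s)
            edges.append((label, s))  # edge: reaction vertex -> product
    return species, reaction_labels, edges


species, reaction_labels, edges = build_species_reaction_graph(REACTIONS)
```

Every edge joins one species vertex and one reaction vertex, which is the bipartite property discussed in the text.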

Chemical Mechanisms
In this work, we present a graph theoretical analysis of three gas-phase atmospheric chemical mechanisms from models of varying complexity. All graph theoretical analyses presented here were completed using two freely available software packages: gephi 0.9.2 (Bastian et al., 2009) and igraph 1.0.0 (Csardi & Nepusz, 2006).
The simplest chemical mechanism used here is the "Super-Fast" mechanism (Cameron-Smith et al., 2006). It was designed for computational efficiency and to be used in multidecadal or multicentury global climate model simulations, with a major aim of predicting the concentration of ozone and the chemical control on the atmospheric lifetime of methane. It is currently an option for use in two major U.S. climate models: the Community Earth System Model and the Energy Exascale Earth System Model. The mechanism has 18 chemical species and 20 unique chemical reactions; for more details, see Cameron-Smith et al. (2006). Although it is a simplified representation of atmospheric chemistry, the Super-Fast mechanism produces reasonable estimates of the concentrations of pollutants and the chemical lifetimes of greenhouse gases (Brown-Steiner et al., 2018).
As an example of intermediate complexity chemical mechanisms, we use the mechanism from the GEOS-Chem model (www.geos-chem.org; Bey et al., 2001). This mechanism is predominantly used in an offline chemical transport model (i.e., with prescribed atmospheric transport and physics) and typically is run over time periods ranging from a few days to a few years. These differences from multidecadal climate model simulations enable chemical transport models to afford the increased computational burden required to achieve greater chemical complexity and realism. More specifically, we use the "Tropchem" mechanism from GEOS-Chem v12.6, which was designed to simulate the chemical composition of the atmosphere to high fidelity in the global troposphere, with a detailed representation of HOx-NOx-BrOx-VOC-Halogen chemistry (Mao et al., 2013; Sherwen et al., 2016; Travis et al., 2016). This mechanism is an order of magnitude more complex than the Super-Fast mechanism, with ca. 200 species and 750 reactions.
The most detailed representation of atmospheric chemistry used in this work is the "Master Chemical Mechanism," or MCM (http://mcm.leeds.ac.uk/; Jenkin et al., 1997). We use the MCMv3.3, which was designed as a zero-dimensional model for atmospheric chemistry and is too computationally expensive to be routinely used in 3-D chemistry and climate models. The MCM is one of the most comprehensive existing atmospheric chemical mechanisms, containing nearly 6,000 species and 17,000 reactions. It has a state-of-the-science representation of known atmospheric chemical reactions (Bloss et al., 2005; Jenkin et al., 1997, 2012, 2015; Saunders et al., 2003), and is frequently used as a reference case when developing reduced-complexity mechanisms (e.g., Chan Miller et al., 2017).
The species-reaction graphs for each of the three mechanisms described above are visualized in Figure 2. The large range in chemical complexity is visually apparent, as the MCM graph is far more densely connected than the Super-Fast graph.

Global Graph Properties
Global properties of entire graphs can provide useful summary statistics of the overall complex system, including the presence of highly clustered vertices, dynamical instabilities, and more. We begin by describing a set of scalar quantities known as modularity and reciprocity for each chemical mechanism and linking them to known design considerations in the development of chemical mechanisms, followed by a brief description of other graph theoretically interesting properties. The graph properties in this section are calculated directly from the structure of the graphs and are independent of any particular chemical state. SILVA ET AL.

10.1029/2020GL090481
Figure 1. A two-reaction chemical mechanism and the corresponding species-reaction graph. Each chemical species, as well as each reaction equation, is represented by a single "vertex" (shown as a box), while each reactant or product relationship between a species and a reaction is represented by a directed "edge" (shown as an arrow in the diagram). Red and blue colors highlight the reaction equation vertices and the edges belonging to the corresponding chemical reaction.

Modularity
Modularity is a measure of clustering (or community structure) in graphs, aimed at characterizing graphs with densely connected groups or clusters of vertices (Newman, 2006). Modularity scores range from 0 to 1, with higher values indicative of a more "modular" graph. A graph with completely random connections between vertices has a modularity score of zero. For more details on the calculation of modularity see Estrada (2016) and Newman and Girvan (2004). The modularity of each chemical mechanism investigated in this work is given in Figure 2. As chemical mechanism complexity increases and more classes of reactions are considered, there is an associated increase in modularity, a result consistent with a previous analysis of simplified chemical mechanisms of Earth and other planetary atmospheres (Estrada, 2016). This indicates that the more complex representation (i.e., MCM) is both less random in its organization and contains more clusters of interrelated vertices than the simple mechanisms (Super-Fast). This reflects in part the fact that many closely interrelated chemical species are grouped together (known as "lumping") in simpler chemical mechanisms (e.g., Emmerson & Evans, 2009). This lumping effectively replaces clusters of highly interconnected vertices or chemically similar species with a single vertex, which reduces the mechanism's modularity.
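To make the modularity score concrete, the sketch below evaluates the Newman modularity Q = Σ_c [e_c/m − (d_c/2m)²] for a fixed partition, where e_c is the number of edges inside community c, d_c is the summed degree of its vertices, and m is the total number of edges. Several simplifying assumptions apply: the graph is treated as undirected, the partition is supplied by hand rather than found by an optimization algorithm, and the six-vertex graph is a hypothetical toy example, not one of the mechanisms studied here.

```python
def modularity(edges, communities):
    """Newman modularity Q of an undirected graph for a given partition.

    edges: list of (u, v) pairs; communities: dict mapping vertex -> community id.
    """
    m = len(edges)
    internal = {}    # number of edges with both endpoints in the same community
    degree_sum = {}  # summed degree of the vertices in each community
    for u, v in edges:
        cu, cv = communities[u], communities[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical toy graph: two triangles joined by a single edge.
TOY_EDGES = [("a", "b"), ("b", "c"), ("c", "a"),
             ("d", "e"), ("e", "f"), ("f", "d"),
             ("c", "d")]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {v: 0 for v in two_clusters}

q_two = modularity(TOY_EDGES, two_clusters)
q_one = modularity(TOY_EDGES, one_cluster)
```

Placing every vertex in a single community gives Q = 0, while the two-triangle partition scores higher because most edges fall inside the communities.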

Reciprocity
Reciprocity is a measure of what fraction of the edges from one vertex to another point in both directions. In the context of directed graphs of complex systems such as the species-reaction graph, low reciprocity typically indicates that the system is highly unstable (Estrada, 2016). Chemically, reciprocity represents the fraction of single step reactions in a given mechanism where at least one product is also a reactant. In atmospheric chemical mechanisms, reciprocal reactions exist for two main reasons: they either are reactions that require the presence of chemically inert third bodies (e.g., H2O or N2), or are approximations of fast chemistry in reduced-complexity mechanisms (e.g., rapid catalytic reaction cycles or radical recycling approximated by a single reaction step). Reciprocity values shown in Figure 2 indicate that while all three graphs have low reciprocity, reciprocity decreases with graph complexity, consistent with a simplified representation of fast chemical processes that are more directly resolved in the detailed mechanisms (e.g., OH recycling and isoprene chemistry).
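As a small worked example, reciprocity can be computed as the fraction of directed edges whose reverse edge is also present. The sketch below assumes this edge-based definition and uses a hypothetical two-reaction toy mechanism (not taken from any of the three mechanisms) in which a third body M appears as both reactant and product.

```python
# Hypothetical toy mechanism; M is an inert third body.
REACTIONS = [
    ("R1", ["O", "O2", "M"], ["O3", "M"]),  # M is both reactant and product
    ("R2", ["NO", "O3"], ["NO2", "O2"]),
]


def directed_edges(reactions):
    """Set of directed edges of the species-reaction graph."""
    edges = set()
    for label, reactants, products in reactions:
        edges.update((s, label) for s in reactants)  # reactant -> reaction
        edges.update((label, s) for s in products)   # reaction -> product
    return edges


def reciprocity(edges):
    """Fraction of directed edges (u, v) for which (v, u) is also an edge."""
    if not edges:
        return 0.0
    mutual = sum(1 for (u, v) in edges if (v, u) in edges)
    return mutual / len(edges)


toy_edges = directed_edges(REACTIONS)
toy_reciprocity = reciprocity(toy_edges)
```

Only the pair M -> R1 and R1 -> M is reciprocated here, so two of the nine edges count toward the score.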

Other Properties of Interest
There are other properties of chemical mechanisms that are interesting from a graph theoretical perspective, and aid in relating these graphs to wider classes of well-studied graphs. These include the approximate power-law scaling of the degree distributions, and the small-world classification of the graphs.
The number of edges connected to a given vertex is known as the vertex degree. The distribution of the degrees of all vertices in a graph is known as the "degree distribution." For all three graphs studied here, much of the degree distribution (plotted for all vertices in each bipartite graph in Figure S1) roughly follows a power-law scaling, where the majority of vertices have a low degree, meaning they participate in few reactions, and a small subset of vertices has a very high degree, meaning they participate in many reactions. This decreasing power-law degree distribution (or "fat tailed" degree distribution) is commonly found throughout graphs in the natural world, including planetary chemical reaction networks (Broido & Clauset, 2019; Solé & Munteanu, 2004). To characterize these degree distributions, we fit the decreasing power-law portion of the distributions following $n \propto D^{\gamma}$, where D is the in-degree or out-degree, n is the number of vertices with that degree, and γ is the free exponent parameter. The γ for all three mechanisms is summarized in Figure 2. We find that as the graphs grow in complexity, their exponent γ becomes more negative. This indicates that larger graphs are more interconnected and less heterogeneous (e.g., Estrada, 2010), consistent with higher chemical process resolution.
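One simple way to estimate such an exponent is an ordinary least-squares fit in log-log space. The sketch below assumes the scaling takes the form n ∝ D^γ, with n the number of vertices of degree D; the fitting procedure and function names are illustrative assumptions, not necessarily the method used to obtain the values in Figure 2.

```python
import math
from collections import Counter


def degree_histogram(degrees):
    """Map each positive degree D to the number of vertices n with that degree."""
    return dict(Counter(d for d in degrees if d > 0))


def fit_power_law_exponent(degree_counts):
    """Least-squares slope of log(n) against log(D), i.e., gamma in n ~ D**gamma."""
    xs = [math.log(d) for d in degree_counts]
    ys = [math.log(n) for n in degree_counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic counts that follow n = 1000 * D**-2 exactly (illustration only).
synthetic_counts = {1: 1000, 2: 250, 5: 40, 10: 10}
gamma = fit_power_law_exponent(synthetic_counts)
```

In practice only the decreasing portion of the distribution would be passed to the fit, and a log-log regression is known to be a rough estimator for heavy-tailed data.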
In addition to approximate power-law degree distributions, these chemical mechanisms have properties consistent with a class of graphs known as "small-world" graphs (Watts & Strogatz, 1998), which are commonly observed in natural systems (Telesford et al., 2011). Mathematically, small-world graphs are graphs where the average path length between vertices scales proportionally to the natural log of the number of vertices in that graph (Watts & Strogatz, 1998); this relationship is visualized for the mechanisms in this work in Figure S2. Small-world graphs have vertices that largely only directly interact with a few vertices, but all other vertices in the graph are quite close (i.e., a small number of edges away). This is consistent with the highly coupled nature of atmospheric chemical reactions, where despite generally having only a few direct chemical interactions, a perturbation to a given chemical species can quickly have a cascading effect across many species in the reaction network.
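The average path length that enters the small-world criterion can be computed with a breadth-first search from every vertex. The sketch below is a minimal illustration on a hypothetical four-vertex directed cycle; averaging only over ordered pairs that are actually reachable is an assumption made here, since the text does not specify how unreachable pairs are handled.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest directed path length (in edges) from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(adjacency):
    """Mean shortest path length over all ordered, reachable vertex pairs."""
    vertices = set(adjacency)
    for targets in adjacency.values():
        vertices.update(targets)
    total, pairs = 0, 0
    for s in vertices:
        for t, d in bfs_distances(adjacency, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical toy graph: a directed cycle a -> b -> c -> d -> a.
cycle = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}
cycle_apl = average_path_length(cycle)
```

Comparing this quantity with the natural log of the vertex count across graphs of different size gives the kind of relationship plotted in Figure S2.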

Vertex Characterization
In addition to global properties, graph-theoretical methods can be used to characterize the relationships between individual vertices and the entire system. In this section, we use these methods to compare the treatment of various chemical species within the three chemical mechanisms. As with the global graph properties, these vertex characterization metrics are calculated independent of any particular chemical state. This is in contrast to other common methods of analyzing chemical mechanisms (e.g., integrating the systems of differential equations), which ultimately require some information regarding the chemical state.
One of the most commonly used vertex characterization methods is known as "vertex centrality." Centrality is a metric that is calculated for each vertex in a graph, which quantifies its importance within the structure of the system (Estrada, 2016; Grzybowski et al., 2009). While many centrality metrics exist, we focus here on the "out-degree centrality." Degree centrality is simply the number of edges connected to a given vertex (i.e., the degree of that vertex). For species vertices, the degree centrality corresponds to the number of times a species is represented in a reaction mechanism in any way. Out-degree centrality is the number of edges leaving a given vertex. For species vertices, the out-degree centrality is the number of times a species is represented in a reaction mechanism as a reactant. Vertices with higher out-degree centrality are commonly identified as important vertices within a graph (Estrada, 2016), and we use this metric here to quantify the important species within the structure of the chemical reaction mechanisms. We focus on out-degree centrality over the degree centrality to maintain a focus on reactively important species. Species with a high degree centrality, by contrast, include those that are present in many reactions as products but are not particularly relevant to the gas-phase chemistry (e.g., CO2 in the GEOS-Chem mechanism, which is included in certain reactions to balance carbon). We acknowledge that this centrality-based metric of structural importance is quite simple, and likely misses some important indirect interactions in the system, particularly as they relate to less reactive species that are known to strongly influence the atmospheric chemical state (e.g., CH4).
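For species vertices, these centralities reduce to counting reactions. The sketch below tallies out-degree (appearances as a reactant) and in-degree (appearances as a product) for the two-reaction example of Figure 1 and ranks species by the result. The function names are hypothetical, and the products of the second reaction are an assumption made for illustration.

```python
from collections import Counter

# R1 follows the text; the products of R2 are assumed for illustration only.
REACTIONS = [
    ("R1", ["CH2O", "OH"], ["H2O", "CO", "HO2"]),
    ("R2", ["CO", "OH"], ["CO2", "HO2"]),  # assumed products
]


def species_degree_centrality(reactions):
    """Return (out_degree, in_degree) counters for the species vertices."""
    out_degree, in_degree = Counter(), Counter()
    for _label, reactants, products in reactions:
        # set() keeps one edge per species-reaction pair (e.g., for OH + OH).
        out_degree.update(set(reactants))  # species -> reaction edges
        in_degree.update(set(products))    # reaction -> species edges
    return out_degree, in_degree


def rank_by_centrality(centrality, top=5):
    """Species sorted by decreasing centrality, ties broken alphabetically."""
    return sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


out_degree, in_degree = species_degree_centrality(REACTIONS)
ranking = rank_by_centrality(out_degree)
```

Even in this tiny example OH has the highest out-degree, since it is a reactant in both reactions.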
The five species with the highest out-degree centrality for each of the three chemical mechanisms are shown in Table 1. In each mechanism the chemical species OH has the highest centrality score. This structural importance is consistent with domain expertise in atmospheric chemistry, since OH is well-known to be the atmosphere's most important oxidant (Jacob, 1999; Seinfeld & Pandis, 2016). After the highest-ranked vertex of OH, the mechanisms differ in their ranking of species by out-degree centrality. All three include HO2 and NO among the top five species, which reflects the importance of the HOx family (OH + HO2) and reactive nitrogen chemistry in the structure of all three models.
In the MCM, HOx and reactive nitrogen species (NO, NO2, and NO3) comprise the entirety of the five highest-ranked species. In the GEOS-Chem and Super-Fast mechanisms, peroxy radicals are among the highest-ranking species: methylperoxy radical in the Super-Fast, and methylperoxy (MO2) and peroxyacetyl radicals (MCO3) in GEOS-Chem. The MCM includes a highly detailed representation of peroxy radical chemistry, with >1,200 total peroxy radical species. GEOS-Chem includes >20 peroxy radical species, and Super-Fast includes only one (methylperoxy radical). This reduction of species increases the relative importance of peroxy radicals in the simpler schemes. If peroxy radicals were similarly grouped together as a single vertex in the graph of the MCM mechanism (i.e., as "RO2"), it would similarly appear in the top five as in the Super-Fast and GEOS-Chem. Much of the difference between the GEOS-Chem and MCM mechanisms lies in this treatment of peroxy radicals. Consistent with atmospheric chemistry domain knowledge, in the MCM all organic peroxy radicals systematically react with NO, NO2, NO3, HO2, or an organic peroxy radical. To reduce complexity, not all of these organic peroxy radical pathways are included in GEOS-Chem or Super-Fast.
These graph theoretical importance metrics can yield insights into the structure of these mechanisms independent of any particular chemical conditions. By solely characterizing structural features of the graph, these methods can be more robust to compensating errors in mechanism construction, e.g., elevated production and loss terms canceling out during model integration, which would be obscured by other computational analysis methods. It should be noted that while the characterization metrics are independent of chemical state, the mechanism structure is the result of design choices that are implicitly driven by the atmospheric chemical state. These choices are largely influenced by current understanding of the reactive chemistry of the atmosphere, and by the relative importance assigned by experts to various aspects of atmospheric chemistry in the mechanism design. For example, the higher rank of NO for GEOS-Chem is indicative of design choices to emphasize high-NOx urban chemistry over low-NOx remote chemistry.
These structural importance metrics ultimately have implications for the computational efficiency of a given mechanism. In the case of GEOS-Chem, the large NO out-degree centrality importance ranking means that for all chemical conditions, the mechanism calculates many reactions including NO. Even in regions or use cases where the ultimate prediction of the atmospheric chemical state is not sensitive to NO concentrations, the GEOS-Chem mechanism still represents a relatively large number of these reactions, leading to reduced computational efficiency for a given level of target mechanism performance in certain chemical regimes.
Species centrality rankings like those in Table 1 can be used to confront the process representation within the mechanisms with a priori assumptions and knowledge of atmospheric chemical reactions. This is not specifically limited to out-degree centrality. For example, in the GEOS-Chem mechanism CO2 has an in-degree centrality of 31, meaning that in GEOS-Chem there are 31 reactions that produce CO2. This conflicts with domain knowledge in atmospheric chemistry where only one reaction is the dominant chemical source of CO2, namely the oxidation of CO by OH, and reflects choices made to balance carbon in the mechanism. Similar conclusions can be reached investigating the Super-Fast mechanism, where the compound isoprene has an in-degree centrality greater than zero, even though no major chemical sources of isoprene are thought to exist in the atmosphere. Both of these inconsistencies with domain knowledge exist to achieve a given design goal of the mechanisms, specifically carbon closure in the GEOS-Chem mechanism (Safieddine et al., 2017) and the maintenance of reactive carbon in the Super-Fast mechanism (Cameron-Smith et al., 2006).
To further investigate the differences between the three chemical mechanisms, we explore the graph lineages of several species that are well-known to be important in atmospheric chemistry. In the context of graph theory, lineage corresponds to the minimum number of edges between a given vertex and all other vertices in the graph. Lineage distances in the backwards direction are known as ancestor distances, and those in the forward direction are known as descendant distances. The ancestor distance indicates how many reaction steps a species is from all possible precursors, and the descendant distance indicates how many reaction steps a species is from all possible products. To facilitate comparison between mechanisms of varying complexity, ancestor and descendant distances are normalized by the natural log of the average path length in each graph. This accounts for the fact that larger graphs can have larger lineage distances simply due to their increased size. These are most readily visualized through histograms, shown in Figure 3 for the following species commonly studied in atmospheric chemistry: CH4, isoprene (C5H8), ozone (O3), and OH.
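Ancestor and descendant distances can be obtained with a breadth-first search that follows edges backward or forward, respectively. The sketch below applies this to the two-reaction example of Figure 1 and reports raw edge counts without the normalization described above. The function names are hypothetical, and the products of the second reaction are an assumption made for illustration.

```python
from collections import deque

# R1 follows the text; the products of R2 are assumed for illustration only.
REACTIONS = [
    ("R1", ["CH2O", "OH"], ["H2O", "CO", "HO2"]),
    ("R2", ["CO", "OH"], ["CO2", "HO2"]),  # assumed products
]


def build_adjacency(reactions):
    """Return (forward, backward) adjacency maps of the species-reaction graph."""
    forward, backward = {}, {}

    def add_edge(u, v):
        forward.setdefault(u, set()).add(v)
        backward.setdefault(v, set()).add(u)

    for label, reactants, products in reactions:
        for s in reactants:
            add_edge(s, label)  # reactant -> reaction
        for s in products:
            add_edge(label, s)  # reaction -> product
    return forward, backward


def lineage_distances(adjacency, start):
    """Minimum number of edges from start to every reachable vertex."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[start]  # the start vertex is not its own ancestor or descendant
    return dist


forward, backward = build_adjacency(REACTIONS)
descendants_of_ch2o = lineage_distances(forward, "CH2O")  # follows edges forward
ancestors_of_co2 = lineage_distances(backward, "CO2")     # follows edges backward
```

Because reaction vertices sit between species, one reaction step corresponds to two edges in these raw distances. A species with no chemical source in a mechanism returns an empty ancestor dictionary.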
The differences in these chemical mechanisms are visually apparent in Figure 3. In general, the largest outlier is the Super-Fast mechanism, with GEOS-Chem and the MCM representing similar relative lineages of the four species.
In both GEOS-Chem and the MCM, methane has the largest average ancestor distance, or number of reaction steps, reflecting the fact that the small gas-phase chemical source of methane (oxidation of large molecules such as acetaldehyde and vinylhydroperoxide) is not tightly coupled to most of the reactions that occur in the mechanisms. In Super-Fast, the chemical source of methane is not represented at all, a simplification related to the overwhelming fraction of the methane source from direct emissions. From the ancestor distances, it is apparent that the majority of species in all three mechanisms are in some way chemical products related to methane chemistry. This is largely achieved through the influence of methane on atmospheric oxidation. The Super-Fast chemistry shows a much larger spread in the relative number of reaction steps, and as the mechanisms increase in size, the fraction of chemical species influenced by methane oxidation grows.
The lineage analysis also immediately reveals some other key ways in which the simplified Super-Fast mechanism differs from the more complex GEOS-Chem and MCM mechanisms, particularly in its treatment of the formation of both isoprene and ozone. Isoprene has no chemical sources in the real atmosphere, nor in the two more complex mechanisms, and therefore has no ancestors (as seen in Figure 3). However, in the Super-Fast mechanism, isoprene has chemical precursors driven in some way by ∼65% of the mechanism. This nonphysical relationship was implemented in the Super-Fast mechanism to meet the design goal of maintaining the abundance of reactive carbon in the atmosphere in order to reproduce ozone concentrations and the chemical loss of methane at low computational cost (Cameron-Smith et al., 2006). For ozone, the dominant production pathway in all mechanisms is through direct photolysis of NO2. This is an assumption made for computational simplicity, as the reaction of O + O2 is exceedingly fast in the atmosphere and the atmospheric reservoir of O2 is large. This is the only formation mechanism for ozone in the Super-Fast mechanism, and as such there are no gas-phase reaction precursors to ozone production. There are precursors in the GEOS-Chem and MCM mechanisms, however, because their chemistry is sufficiently detailed to represent other possible formation pathways (e.g., OH + OH).
Of the four species described here, the representation of OH within all three mechanisms is the most similar from a lineage perspective. Consistent with domain knowledge in atmospheric chemistry, OH is tightly coupled to most species in the mechanism, with low average ancestor and descendant distances. As the mechanisms increase in complexity, the fraction of the mechanism related to the formation of OH increases from ∼0.5 in the Super-Fast to 0.95 in the MCM. The same holds for the fraction of species that OH influences, which increases from ∼0.8 in the Super-Fast to 1.0 in the MCM.

Summary and Future Outlook
Graph-theoretical methods provide a quantitative set of tools for investigating chemical mechanisms relevant to atmospheric chemistry. We demonstrate their applicability through comparison of three chemical mechanisms that vary across a wide spectrum of complexity using a class of graphical models known as species-reaction graphs. The mechanisms are similar at a high level but have substantial differences apparent upon investigation of graph theoretical properties. These differences are structural within the mechanisms and are independent of chemical state. Interrogation of chemical mechanisms with graph-theoretical techniques can provide objective metrics for assessing the properties of the mechanisms against domain knowledge and prior beliefs. Beyond the possibilities for additional insights and perspectives in atmospheric chemistry, a key advantage of these graph-theoretical methods is their modest computational requirements: any part of the chemical analysis we presented in this paper can be completed in <5 min of total compute time using only a laptop computer with freely available open source software. Ultimately, graph-theoretical techniques enable a new set of tools that can be used in conjunction with traditional computational approaches for the investigation and interpretation of atmospheric chemical mechanisms.
In our future work, we aim to generalize the methods proposed in this paper across a variety of atmospheric conditions, while enhancing an additional set of promising graph-theoretical tools relevant to atmospheric chemistry. We plan to directly incorporate chemical information about reaction rates and net chemical fluxes into the analysis presented in this paper, including the impacts of processes external to the gas-phase kinetic mechanism such as photolysis, heterogeneous chemistry, and chemical sources and sinks. A particularly interesting direction for future work is the application of graph clustering (e.g., Barber, 2007;Newman & Girvan, 2004) to atmospheric chemistry. Graph clustering on species-reaction graphs may enable the development of reduced-complexity chemical mechanisms, with new reproducible and quantitative methods of model simplification, building on previous literature from other fields of chemistry (Feinberg, 2019;Lu & Law, 2006;Sander et al., 2019). Additional future work will explore how graph-theoretical techniques can be applied to chemical mechanisms to rapidly diagnose errors in chemical mechanism construction (e.g., missing or incorrect edges or reactions, Chapelle et al., 2010), the model's sensitivity to process uncertainties (e.g., reaction rates, fluxes, etc., Weir et al., 2017), and the influence of new reactions on the complex system (e.g., through influence maximization techniques, Minutoli et al., 2019). As the modern understanding of the chemistry of the atmosphere grows more complex, there is an increasingly pressing need for novel approaches to gain insight into, and simplify, this complexity. Graph-theoretical techniques thus represent a valuable complement to existing tools in computational atmospheric chemistry research.

Data Availability Statement
All the data used in this work, specifically the mechanism graph files, are available at 10.5281/zenodo.3970944. The gephi software is available at https://gephi.org/users/download/ and the igraph software is available at https://igraph.org/r/.