Joint CO2 Mole Fraction and Flux Analysis Confirms Missing Processes in CASA Terrestrial Carbon Uptake Over North America

Terrestrial biosphere models (TBMs) play a key role in the detection and attribution of carbon cycle processes at local to global scales and in projections of the coupled carbon‐climate system. TBM evaluation commonly involves direct comparison to eddy‐covariance flux measurements. We use atmospheric CO2 mole fraction ([CO2]) measured in situ from aircraft and tower, in addition to flux‐measurements from summer 2016 to evaluate the Carnegie‐Ames‐Stanford‐Approach (CASA) TBM. WRF‐Chem is used to simulate [CO2] using biogenic CO2 fluxes from a CASA parameter‐based ensemble and CarbonTracker version 2017 (CT2017) in addition to transport and CO2 boundary condition ensembles. The resulting “super ensemble” of modeled [CO2] demonstrates that the biosphere introduces the majority of uncertainty to the simulations. Both aircraft and tower [CO2] data show that the CASA ensemble net ecosystem exchange (NEE) of CO2 is biased high (NEE too positive) and identify the maximum light use efficiency Emax a key parameter that drives the spread of the CASA ensemble in summer 2016. These findings are verified with flux‐measurements. The direct comparison of the CASA flux ensemble with flux‐measurements confirms missing sink processes in CASA. Separating the daytime and nighttime flux, we discover that the underestimated net uptake results from missing sink processes that result in overestimation of respiration. NEE biases are smaller in the CT2017 posterior biogenic fluxes, which assimilate observed [CO2]. Flux tower analyses reveal an unrealistic overestimation of nighttime respiration in CT2017 which we attribute to limited flexibility in the inversion strategy.

and with flux priors have been the focus of the atmospheric inversion community (Baker et al., 2006;Gurney et al., 2002;Philip et al., 2019;Schuh et al., 2019). Flux priors are commonly obtained from the simulated net ecosystem exchange (NEE) from terrestrial biosphere models (TBMs).
TBMs (a "bottom up" approach) simulate surface CO 2 fluxes from site level to global scale as they integrate ecological and meteorological drivers (Fung et al., 1987). Flux priors in atmospheric inversions are commonly obtained from the NEE simulated by TBMs. TBMs also simulate the land carbon component in Earth system models used for climate projections. Joint assimilation systems have also been developed to benefit from both ecosystem measurements and atmospheric mole fractions (Kaminski et al., 2002). The optimization procedure relies on an adjoint model of the biogeochemical processes such as the Biosphere Energy-Transfer Hydrology model included in the Carbon Cycle Data Assimilation System at large scales (Knorr, 2000). Such a system was enhanced to assimilate satellite ecosystem products available over the globe (Kaminski et al., 2012) or eddy-flux and mole fraction measurements over continents (Koffi et al., 2013). These approaches optimize a number of model parameters in part of the underlying TBMs, and have been applied using remote [CO 2 ] from towers or satellites (e.g., Scholze et al., 2019). On sub-continental scales, atmospheric [CO 2 ] require fine-resolution models to simulate the complex atmospheric dynamics (e.g.,  and dense tower networks (Andrews et al., 2014) combined with eddy-flux tower networks (e.g., AmeriFlux) covering a wide array of ecosystems and climatic zones for evaluations TBMs have been shown to vary widely in their projections of terrestrial CO 2 sink strengths, not only in magnitude but also even in sign (Baker et al., 2006;Gurney et al., 2002). Huntzinger et al. (2011) evaluated flux variability from four TBMs over North America and the potential impact on the inversion results. They found that the diurnal variability in surface fluxes within the near field of tower [CO 2 ] observations appear to have a significant impact on the high-frequency variations in the atmospheric data, and, thus, the inversion needs to adjust the temporal (and spatial) variability of the prior fluxes.  compared the modeled [CO 2 ] errors attributed to biogenic CO 2 fluxes, fossil fuel emissions, atmospheric transport, and large-scale boundary inflow from daily to annual timescales and discovered that the biogenic CO 2 fluxes dominate the model errors across timescales, implying that [CO 2 ] observations hold promise for evaluating and improving TBMs. Global inversions usually optimize CO 2 fluxes on monthly to annual timescales at the grid scale in the magnitude of a few degrees by a few degrees. Higher spatiotemporal resolutions in the prior are often not altered. As the resolution of inversion system improves and the need for accurate flux data at high resolution increase, evaluating the higher-resolution spatial and temporal structure of TBMs is increasingly important. TBMs with more accurate, high-resolution surface biogenic CO 2 fluxes will improve our diagnosis, attribution, and projection of terrestrial carbon dynamics, and improve "top-down" analysis systems that rely on TBMs for a priori surface flux estimation. Several multiple-model intercomparison projects have been conducted to characterize or synthesize current understanding of land-atmosphere carbon exchange and inform the uncertainty or confidence surrounding projections of future exchange and feedbacks with the climate system, such as the Multi-Scale Synthesis and Terrestrial Model Intercomparison Project (Huntzinger et al., 2013) Trends in Net Land-Atmosphere Carbon Exchange (TRENDY1; Sitch et al., 2008), the regionally focused Large Scale Biosphere Atmosphere-Data Model Intercomparison Project (LBAMIP2; Gonçalves et al., 2013), and the International Land-Atmosphere Benchmarking Project (ILAMB; Luo et al., 2012). These model intercomparison projects were built upon the protocols that specify standard model inputs, simulations and simulation setup procedures. These model intercomparison exercises have been used to explore the uncertainty in model simulations that arises from internal variability, boundary conditions, and parameter values for structural uncertainty from different model fluctuations (Schwalm et al., 2015). However, due to the complexity of TBMs, it is challenging to trace errors in individual models to misrepresentation of specific ecological processes or inappropriate model parameters through these model intercomparison projects. This merits deep exploration of just one modeling framework but with a perturbed parameter ensemble.
Recently, Zhou, Williams, Lauvaux, Davis, et al. (2020) introduced an ensemble of biogenic CO 2 fluxes simulated by the Carnegie-Ames-Stanford-Approach (CASA) biosphere model for North America at the resolutions of 5 km for North America and ∼500 m for the US CONUS region from 2003 to 2019 by perturbing three model parameters-maximum light use efficiency E max , optimal temperature of photosynthesis T opt , and temperature response of respiration Q 10 -on the basis of the CASA biome types. Those parameters were chosen as a result of a sensitivity test of simulated biogenic CO 2 fluxes to a series of CASA parameters. Furthermore, the range of E max values was determined at the ecosystem level by comparison with eddy-covariance measurements of net CO 2 flux. Zhou, Williams, Lauvaux, Davis, et al. (2020) illustrated that the pruned L2 ensemble has good agreement with the flux data and outperforms many other TBMs at diurnal and annual scales, while overestimating model errors as represented by comparison with flux measurements.
Atmospheric [CO 2 ] measurements provide another opportunity for evaluating modeled biogenic CO 2 fluxes, as evidenced by the fact that continental [CO 2 ] gradients are mainly attributed to the biosphere (e.g., Feng, Lauvaux, Davies, et al., 2019;. The major differences between mole fraction and flux measurements fall in the size of the surface influence area, or footprint, that influences a given measurement, and also the upwind memory of the samples. The size of a flux tower footprint is only about 1 km (e.g., McCaughey et al., 2006), and the measurements carry nearly instantaneous information of surface fluxes. Owing to these two factors, the flux measurements do not directly represent regional to continental fluxes, as the local fluxes captured may not be representative of broader scale patterns. One objective of this study is to explore the coherence between CO 2 flux and mole fractions with respect to modeled biogenic CO 2 flux evaluations. The CO 2 flux measurements selected for this study are from the AmeriFlux network (https://ameriflux.lbl.gov/), which has more than 150 active flux sites sampling a wide range of sites from the Amazonian rainforests to the North Slope of Alaska.
An in situ [CO 2 ] tower typically has a footprint of hundreds of kilometers (e.g., Gloor et al., 2001;Sweeney et al., 2015) and carries the integrated atmospheric [CO 2 ] signals from day and night. Aircraft measurements have even larger footprints with broad spatial sampling. TBM-modeled CO 2 fluxes can be evaluated against [CO 2 ] measurements by means of Lagrangian and forward Eulerian transport modeling. Both are subject to transport model errors (e.g., Pillai et al., 2012). In Lagrangian modeling, the modeled biogenic CO 2 mole fractions ([CO 2 bio] hereafter) can be directly calculated by convolving biogenic CO 2 fluxes with the influencing areas of the [CO 2 ] measurements that are simulated by an Eulerian transport model (Uliasz et al., 1994). The difficulty of this approach is to define the observed [CO 2 bio] due to mixed signals in the [CO 2 ] measurements (e.g., Ogle et al., 2015). In Eulerian transport modeling, the CO 2 transport is treated as a passive tracer (Sarrat et al., 2007). The modeled total [CO 2 ] is the sum of biogenic, fossil fuel, oceanic, fire CO 2 components in conjunction with boundary conditions as described in Feng, Lauvaux, Davies, et al. (2019) and . The modeled error therefore can be from any or multiple components in addition to model transport (Feng, Lauvaux, Davies, et al., 2019;. Note that these model error sources are also of concern in the "top-down" estimation. Another objective of this work is to explore to what degree the atmospheric [CO 2 ] data can be used to evaluate TBMs. Here we use CASA (Potter et al., 1993) for demonstration and adopt the Eulerian transport model WRF-Chem (Grell et al., 2005;Skamarock et al., 2008) to serve this objective.
In this study, we employ both ground-based and aircraft in situ [CO 2 ] data for the model evaluations. The NASA-funded Atmospheric Carbon and Transport (ACT)-America project was designed to improve the CO 2 and methane (CH 4 ) flux estimates by reducing transport and flux uncertainties. Two aircraft measured atmospheric CO 2 , CH 4 , and other gas species over Mid-Atlantic, Mid-West, and Southern Gulf regions in fair and frontal weather regimes. Typical flights encompassed 4-6 h of midday conditions, encompassing 400-800 km in the horizontal and altitudes ranging from 300 to 9,000 m above ground level. Two aircraft flew together, collecting two to four vertical levels (level legs), often stacked one above the other, and typically about 8-12 vertical profiles per flight day. Pal et al. (2020) found that large horizontal and vertical gradients of [CO 2 ] exist across frontal boundaries based on the data collected from the summer 2016 campaign. The cross-frontal [CO 2 ] contrasts are greatest in the atmospheric boundary layer (ABL), ranging from 5 to 30 ppm, while the contrasts are about 3-5 ppm in the free troposphere (FT). In the vertical dimension, higher [CO 2 ] appears in the FT than in the ABL in the cold sector while the opposite pattern appears in the warm sector. Averaged ABL-to-FT [CO 2 ] differences can be about 12 and −6 ppm in the warm and cold sectors, respectively. These unique flights were designed to be highly sensitive to the seasonal magnitudes of regional-scale carbon fluxes and to provide broad spatial coverage that cannot be obtained with the current long-term observing network. The third objective is to explore to what degree the aircraft [CO 2 ] measurements and tower measurements yield consistent evaluations of the modeled fluxes. Merging these two [CO 2 ] observing systems lends more confidence to our conclusions regarding the modeled fluxes and provides a test of the two observing systems.
The three objectives of this study are addressed through comparing the WRF-Chem [CO 2 ] simulated using the CASA L2 flux ensemble members to the airborne and tower-based [CO 2 ] data for the ACT-America summer 2016 campaign period (July 18 to August 28, 2021). The WRF-Chem model setup and methods are described in Section 2. All the data used in this work are described in Section 3. The modeled and observed [CO 2 ], CASA parameter constraints, and causes of the model errors are illustrated in Section 4. Sections 5 and 6 are discussion and conclusions, respectively.

Transport Model Setup
All transport model simulations use WRF-Chem version 3.6.1 (Grell et al., 2005;Skamarock et al., 2008). The modification made to transport greenhouse gases as passive tracers (Lauvaux et al., 2012) allows us to carry ensemble tracers for biogenic CO 2 fluxes (Section 2.2) and [CO 2 ] boundary conditions (Section 2.4) in one transport run (Section 2.3). In each transport run, WRF-Chem carries 39 CO 2 tracers for the ensembles of boundary conditions and biogenic fluxes, ocean, fossil fuel, and biomass burning fluxes. The modeled total [CO 2 ] is the sum of a boundary condition, a biogenic flux, and the oceanic flux, fossil fuel emission, and biomass burning CO 2 tracers as described in Feng, Lauvaux, Davies, et al. (2019) and . The [CO 2 ] boundary condition tracers are propagated into WRF-Chem hourly with the consideration of the conservation of mass (Butler et al., 2020). Five global CO 2 inversion/reanalysis systems are used for [CO 2 ] boundary conditions (Section 2.4) and 29 biogenic CO 2 fluxes are used for the biogenic CO 2 tracers (Section 2.2). The CO 2 oceanic flux, fossil fuel emission, and biomass burning are taken from CarbonTracker version 2017 (CT2017; Peters et al., 2007).
The same model configurations in Feng, Lauvaux, Davies, et al. (2019) and  are used except for the meteorological initial and boundary conditions. In this study, we used the ERA5 reanalysis (Hersbach et al., 2020) and benchmark the model transport by nudging the WRF-Chem simulation to ERA5. The wind evaluations show that nudging clearly improves model transport (see Text S1 in supporting information). A suite of transport runs is created for uncertainty quantification (Section 2.3). Choices of the model physics schemes are summarized in Table 1. All WRF-Chem simulations have a horizontal resolution of 27 × 27 km for the period from July 18 to August 28, 2016 covering the ACT-America summer 2016 aircraft campaign at hourly resolution. Vertically, WRF-Chem has 50 levels from surface to 50 hPa with 29 levels within 2 km above ground.

Biogenic CO 2 Flux Ensemble: CASA and CT2017
We include 29 biogenic CO 2 fluxes in each transport run as separate tracers: the 27-member CASA L2 NEE ensemble, the mean of the CASA NEE ensemble, and the CT2017 posterior biogenic CO 2 flux. Option used Reference

Table 1 WRF-Chem Model Physics Parameterization Choices
The CASA ensemble members  were generated by perturbing the maximum light use efficiency (E max ), optimal temperature of photosynthesis (T opt ), and temperature response of respiration (Q 10 ) with the consideration of the biome types in CASA, as described in Zhou, Williams, Lauvaux, Davis, et al. (2020). These three perturbed parameters were determined to dominate the sensitivity of modeled CO 2 fluxes to the model parameters according to an Extended Fourier Amplitude Sensitivity Testing analysis. The initial range for each parameter was broadly sampled for the L1 ensemble. Parameter ranges were subsequently narrowed to those consistent with AmeriFlux data, resulting in a L2 ensemble. CASA simulates gross primary productivity (GPP), total ecosystem respiration (Re), and NEE at monthly resolution. The monthly GPP and Re fluxes were then downscaled to 3-hourly resolution by 3-hourly air temperature and shortwave downward radiation from the North American Regional Reanalysis (Mesinger et al., 2006) using the Olsen and Randerson (2004) method. Two sets of flux products are included in the official release of the CASA flux ensemble, one at 500-m resolution covering the US CONUS region and the other at 5-km covering a broader swath of North America. We used the L2 5-km CASA ensemble for this study. Details about the CASA ensemble products can be found in Zhou, Williams, Lauvaux, Davis, et al. (2020) and Zhou, Williams, .
Unlike CASA that directly simulates biogenic CO 2 fluxes ("bottom-up"), CT2017, a "top-down" flux estimate, optimizes the a priori fluxes (CASA; Potter et al., 1993) by assimilating observed [CO 2 ] (Peters et al., 2007). CT2017 global 3-hourly posterior biogenic fluxes at 1° × 1° are used in this study. Note that, unlike the CASA ensemble used spun up to equilibrium, the CT2017 biogenic flux prior from CASA include an assumed global terrestrial biospheric sink of 2 PgC/yr.
Both CASA and CT2017 biogenic CO 2 fluxes have a 3-hourly temporal resolution. To downscale to hourly fluxes for the transport model simulations, values in each 3-hourly flux file are repeated hourly over the period they represent.

Transport Ensemble
The transport ensemble runs are generated using the combination of multiple physical parameterizations and the stochastic kinetic energy backscattering scheme (SKEBS; Berner et al., 2009;Shutts, 2005). Through this combination, model meteorological initial conditions and physics were perturbed at the same time, introducing transport uncertainty due to the model dynamics and physics. Previous studies demonstrated that model errors can be best captured by a combination of multi-physics and SKEBS (Berner et al., 2011(Berner et al., , 2015 as opposed to a single perturbation scheme. Feng, Lauvaux, Davies, et al. (2019)

Boundary Condition Ensemble
A suite of optimized [CO 2 ] from five global inversion systems are collected for this study. They are CT2017 (Peters et al., 2007, with updates documented at http://carbontracker.noaa.gov), TM5 as described in Basu et al. (2016), GEOS-Chem developed in the Carbon Monitoring System (Liu et al., 2014), GEOS-Chem as described in Schuh et al. (2019), and PCTM as described in Barker et al. (2004). These global modeled [CO 2 ] fields are propagated into WRF-Chem hourly as separate tracers following Butler et al. (2020).

Footprint Analysis
To understand the fluxes influencing the observations used in this study, the Lagrangian Particle Dispersion Model (Uliasz et al., 1994) is used to create surface influence functions (footprints) for each flight in the Summer 2016 ACT aircraft campaign (see description in Section 3.1). Particles are released along the time of each aircraft transect within the boundary layer, and are traced backwards in time over a two-week period using meteorology provided by the WRF nudged-transport simulation. The influence by the surface is represented by the number of particles interacting with the surface grid, that is, below 50 m above ground, are summed up, providing a temporal and spatial function that relates the signal observed by the aircraft to the surface fluxes responsible for that signal (Seibert & Frank, 2004).
In addition to aircraft [CO 2 ] measurements, we also use a subset of the ground-based [CO 2 ] tower measurements from the NOAA ObsPack GlobalViewPlus package (Cooperative Global Atmospheric Data Integration Project, 2019; see Section 3.2). We create footprints for the towers by releasing particles at 21 UTC backward in time over two-week period using the nudged transport from July 18 to August 28, 2016 across 23 different tower sites in the US. The selection of 21 UTC limits the tower observations to well-mixed ABL conditions, minimizing model transport uncertainty.

Model Evaluation Metrics
To demonstrate the importance of the uncertainty in the biosphere, we first compare the contribution of biosphere, transport, and boundary conditions to the modeled [CO 2 ] uncertainty. We use the RMSD of a given component to illustrate the ensemble spread and associated uncertainty.
We focus on the performance of the individual biogenic CO 2 flux members through the biome-based model biases, which is determined by the CASA biome map and the footprint of the measurements. The uncertainties in modeled [CO 2 ] associated with the individual biogenic flux members are determined by the spread of the transport and boundary conditions ensembles.
To investigate the causes of modeled [CO 2 ] biases and eliminate the potential interference of transport and boundary conditions inherent in [CO 2 ] comparisons, we also directly compare the CASA flux members and CT2017 fluxes member with AmeriFlux eddy-covariance flux measurements. This comparison tests for consistency between [CO 2 ] and eddy-covariance flux analyses and lends more insight into the diel cycle of CO 2 fluxes. We group these analyses according to the dominant biomes surrounding each [CO 2 ] tower.
Due to the uneven distribution of flux and [CO 2 ] towers, flux towers cannot be always found in the footprint of a given [CO 2 ] tower. We therefore propagate the flux biases at each [CO 2 ] tower location using Equation 1 below. By assuming that a given biome type in the CASA model and CT2017 have a similar behavior (bias) everywhere, we group the flux towers influenced by the same dominant biome together and calculate the overall flux, T F , and bias for the given biome. The flux biases at a given [CO 2 ] tower location i B , can be expressed in the following equation.
where T W and t W are the spatial and temporal weighting functions that propagate the flux biases from the biome level to the [CO 2 ] site level, respectively. The subscript T denotes the biome: Indexes 1 to 14 are the biome in CASA following Zhou, Williams, Lauvaux, Davis, et al. (2020); index 0 represents water bodies. T W are the fractional areas of individual biomes relative to the entire summer-averaged influence area of FENG ET AL.

10.1029/2020GB006914
tower-based, daytime [CO 2 ] observations ( Figure S2). In the analysis, we find that some towers can be influenced by up to six biome types according to the footprint of the tower [CO 2 ] measurements. The temporal weighting function, t W , is essentially equal across the 24 h of a day due to the indistinguishable diel cycle after averaging the footprint over the period of interest ( Figure S3b and Text S2 in SI). We acknowledge that a more thorough analysis should consider day-to-day variation in t W for each tower observation. We hypothesize that the day-to-day variation in t W will have minimal impact on the bias analysis we focus on in this study.

ACT-America Aircraft Data
Two aircraft, the NASA Langley Beech-craft B200 King Air and NASA Goddard Space Flight Center's C-130H Hercules aircraft, were used to collect high quality in situ measurements of greenhouse gases, other gas species, and meteorological fields over Mid-Atlantic (MA), Mid-West (MW), and South Gulf (South) regions of the United States. The flight dates and patterns can be found Table S1. All the flights during this time took off around noon and landed around 5 p.m. local time. In this study, we used ACT-America L3 Merged in situ Atmospheric Trace Gases and Flask Data . This product provides integrated measurements and metadata flag information, including flight pattern, airmass type, and boundary layer information at five-second intervals. More information about the ACT-America campaign and measurements can be found at https://actamerica.ornl.gov/ and Davis et al. (2021). The nearest point interpolation is applied to extract modeled [CO 2 ] along the flight tracks.

NOAA ObsPack GlobalViewPlus [CO 2 ] Product
We also use tower-based in situ [CO 2 (Table 3) from this data package were selected for investigations. The Ob-sPack product collects greenhouse gas data from providers around the globe and reformats the data into the ObsPack framework in support of carbon cycle modeling studies . The [CO 2 ] data are first organized into hourly data, and then nearest point interpolation is applied to extract modeled [CO 2 ] at the tower locations.

AmeriFlux CO 2 Flux Measurements
We also include CO 2 flux measurements to evaluate our findings regarding biogenic CO 2 flux members. The CO 2 flux data are obtained from the eddy-covariance measurements from the AmeriFlux network (https:// ameriflux.lbl.gov). Seventy-one flux tower sites from the domain of interest were used. The data providers are responsible for data quality control. The locations and information of the sites can be found in Figure 1 and Table S2. We obtained a single estimate of NEE for each flux tower location from the non-gap-filled NEE values reported by AmeriFlux, with a preference for eddy-covariance measurements with a storage correction. Tower-measured NEE was averaged to three-hour intervals to match the time resolution of CASA and CT2017. No intervals were excluded if they had any reported data.

Spatiotemporal Variability of [CO 2 ]
We select five of the 25 research flights (Figure 2) to illustrate the typical flight patterns for fair and frontal weather regimes. As expected, both models and observations show large [CO 2 ] gradients in the ABL, leading to large variations in [CO 2 bio]. Pal et al. (2020) reported that an elevated [CO 2 ] band was repeatedly observed along the cold frontal boundary, a feature also captured by the simulations. The 8/4/2016 frontal case shows a narrow, elevated [CO 2 ] band at the frontal boundary (∼25 ppm difference in [CO 2 ] across the front). We examined this frontal case with a 3 × 3-km (cloud-resolving) resolution model and showed that this elevated [CO 2 ] band has a maximum width of ∼200 km and a length of over 800 km extending from northeastern Kansas to northeastern Iowa (Samaddar et al., 2021). Figure 2 shows that the frontal boundaries are associated not only with elevated [CO 2 ] but also with highly variable [CO 2 bio]. More than 5 ppm RMSD of [CO 2 bio], caused by variability among the CASA ensemble fluxes, appears in the ABL along the frontal boundaries. Enhanced by the baroclinic instability, the cold airmass on the west lifted the warm airmass aloft. ABL [CO 2 ] penetrates into the free troposphere along the frontal lifting. RMSD in [CO 2 bio] greater than 3 ppm, an indicator of strong surface influence, reaches up to 3.5 km above sea level on 7/18 and 1.7 km on 8/4 (Figure 2). On the contrary, both [CO 2 ] and [CO 2 bio] have less variability for the fair-weather cases.
We use the ABL [CO 2 ] observations to evaluate biogenic CO 2 fluxes since, as Figure 2 illustrates, these data are the most sensitive to variations in the biological CO 2 fluxes. We limit our work to ABL [CO 2 ] for the rest of this analysis. Figure 3 shows the averaged [CO 2 ] sampled by aircraft and simulations. Note that the modeled [CO 2 ] are from the transport-nudged simulation (described in Section 2.1), while the uncertainties are determined by spreads of the ensemble runs associated with different components described in Sections 2.2, 2.3 and 2.4. All ACT-America aircraft collected afternoon samples aiming at well-mixed ABL conditions. The most outstanding feature overall is that the biosphere is the most uncertain component in the simulation for the fair and frontal weather regimes except for three cases in the South and one in the Midwest. The footprint analysis ( Figure S4)    Observations tend to be encompassed by the ensemble model spread except on four flight days: 7/22, 7/25, 7/26, and 8/12, on which the model shows large discrepancy with observations. Our preliminary investigation indicates that the disagreements on 7/25 and 7/26 are mainly caused by unrealistically strong uptake to the west of the Appalachia area in the CT2017 biogenic fluxes; 8/12 is due to errors in the long-range transport.
In the following sections, we focus on investigating the coherence between aircraft and tower [CO 2 ] data and evaluating the model biases across the CASA ensemble members. CT2017 serves as the reference for this exercise. As an inversion product that is constrained by atmospheric data, we expect that CT2017 should agree well with observed [CO 2 ] even though the ACT-America aircraft data were not assimilated in CT2017. Figure 4 shows    Table S1. The observations used in (b) are from the [CO 2 ] tower measurements at afternoon hours (19-22 UTC). The locations and information can be found in Figure 1 and Another feature worth noting is that the CASA-simulated [CO 2 ] are clustered into two groups. Some high E max members fall into the low E max group, and the rest falls in the medium E max group. This grouping is evident in both aircraft and tower [CO 2 ] comparisons. Given the same transport and boundary conditions were used, the clustering is likely driven by the difference in biogenic fluxes associated with the values of the three parameters used for generating the CASA flux ensemble.

Identification of the CASA Key Parameters
The modeled [CO 2 ] biases associated with individual CASA flux members and CT2017 with the uncertainty bounds that are determined by the transport and boundary condition ensembles are shown in Figures S6 and S7 for each observation from aircraft and tower comparison, respectively. Applying the footprint analysis to aircraft and tower measurements and the CASA-defined biome map, we obtain the model biases for each biome. We list the top 2 biomes with the most influence on each aircraft and tower measurements in Tables 3 and 4, respectively. Croplands (CR), deciduous broadleaf forest (DB), and grasslands (GL) are the major biome types sampled by aircraft; CR, DB, GL, and evergreen needleleaf forest (EN) are mainly sampled by towers ( Figure 5). The samples primarily influenced by water bodies are removed from the results. Two aspects stand out in the comparisons, modeled [CO 2 ] from all the flux members are positively biased across all biome types, and one group of the CASA flux members has better agreement with the observations than others. These two aspects are illustrated consistently in the comparisons of both aircraft and tower measurements.
For the first aspect, we find that all flux estimates, including CT2017 and CASA ensemble, overestimate [CO 2 ]. This is also reflected in Figure 3. Figure 3 also displays that both biogenic flux and transport can play an important role in modeled [CO 2 ] errors. We first focus on the relative performance across the ensemble members to isolate the influence from model transport. The model biases seem scaled with the degree of plant productivity, given that we find larger biases associated with CR and DB and smaller biases associated with EN and GL. Zhou, Williams, Lauvaux, Davis, et al. (2020) also reported that the monthly averaged NEE of the CASA ensemble averaged over 13 years had a larger positive bias in CR and DB than EN and GL in summer. When comparing aircraft-based biases to tower-based biases, the WRF-Chem simulated [CO 2 ] is more positively biased. Assuming that spatial variability in [CO 2 ] is related to temporal variability, as towers and aircraft basically observe the same weather systems, the larger aircraft-based biases might be caused by different geographic sampling or by the enhanced variability in aircraft data (designed to sample frontal systems) leading to larger biases. As expected, CT2017 has better agreement with the observations overall since its fluxes have been optimized using atmospheric [CO 2 ] data. However, the fact that WRF-Chem performs better when coupled to optimized CT2017 biogenic CO 2 fluxes confirms that transport model differences remain much smaller than flux differences.  Figure 2. For the second aspect, we discover that distinct groups of CASA members reflect the three parameter values for maximum light use efficiency, E max . The flux members that show the best agreement with the observations mostly have medium E max values while the groups with low and high E max values correspond to larger model biases. Both aircraft and tower measurements identify that E max is the dominant parameter in the CASA ensemble, which is consistent with the sensitivity results of Zhou, Williams, Lauvaux, Davis, et al. (2020).
The performance of the individual flux members is summarized by ranking them as a function of bias ( Figure 6). Further investigating the top-performing 11 ensemble members according to Q 10 values (Figure 7), we find that in general, the flux members with low Q 10 value (Q 10 = 1.2) are ranked high, followed with medium Q 10 (Q 10 = 1.4), and then high Q 10 value (Q 10 = 1.6). In contrast, no T opt -driven grouping is visible (not shown). We conclude that Q 10 plays a secondary role in CASA-simulated summer NEE.
FENG ET AL.
In summary, the aircraft and tower data deliver consistent results. Given the multiple level-leg sampling and profiling strategies available in the aircraft data in addition to the large vertical gradients in [CO 2 bio] that appear in data set (Figure 2), there is potential to impose additional constraints on biogenic fluxes and mixing heights using the vertical gradients of [CO 2 ]. Although outside the scope of this study, these vertical gradients will be examined in future work.

The Causes of Modeled [CO 2 ] Biases
The model-tower comparisons show the modeled [CO 2 ] biases for each [CO 2 ] tower we included in the previous analysis in Figure 8a. All the CASA members lead to an overestimate of [CO 2 ] except at three towers: BAO, ETL, and OSI. The CASA flux members result in modeled [CO 2 ] that is similar to CT2017 for GL and EN sites with the exception of three sites in the South Gulf region (GCI01, GCI03, and GCI04). Variability in the TBM parameters have more impact on CR and DB where they have the larger sink strength and plant productivity. A similar pattern of biases across biome types can be seen in CT2017, though CT2017 shows slightly better agreement with the observations.
Model transport, boundary condition, and other CO 2 flux components also contribute to modeled [CO 2 ] errors. Although we illustrated that the contribution is less than that from the biosphere, it can potentially shift the biases uniformly up or down. Therefore, to root out the potential interference, following Equation 1, we calculate flux biases associated with each [CO 2 ] tower over the same time period following Equation 1 (Figure 8b). Consistent with the [CO 2 ] analysis, all the flux members show positive biases across the [CO 2 ] towers, indicating the positive biases in modeled [CO 2 ] are mainly due to the fact that CASA and CT2017 underestimate net uptake. For CASA, this can be attributed to weak plant productivity and/ or strong respiration. Zhou, Williams, Lauvaux, Davis, et al. (2020) pointed out that there is a missing net sink due to the lack of crop harvest and forest recovery in the CASA model, yielding a net overestimate in annual respiration. Additionally, the relative performance among the CASA E max groups and CT2017 with respect to flux tower data is consistent with the comparison to [CO 2 ]. The low E max values lead to the largest biases in both [CO 2 ] and flux space, and CT2017 shows smaller biases for CR and DB. Such coherence across measurement space (mole fraction and flux) and platforms (aircraft, [CO 2 ] towers, and flux towers) lends a high degree of confidence to these results.
We break the daytime and nighttime flux biases apart to explore what causes the overall weak NEE uptake in CASA. In summer, the daytime fluxes are a combination of GPP and ecosystem respiration; the nighttime fluxes are driven completely by respiration. In the flux observations, the all time, daytime, and nighttime averaged fluxes range from −2 to −1 µ mol m −2 s −1 , from −8 to −3 µ mol m −2 s −1 , and from 0.5 to 2.5 µ mol m −2 s −1 , respectively ( Figure  estimation of the daytime net sink (Figure 8c): decreasing E max values leads to larger biases across biomes (sites), and increasing E max values do not offset the weak net uptake in CASA, indicating that tuning the parameters cannot fully counteract the lack of the harvest sink in CASA for summer 2016. The nighttime bias of CASA's downscaled respiration has a smaller bias than the daytime productivity plus respiration, bias is still large, especially when considered as a portion of the averaged nighttime flux magnitude. On average, CASA underestimates the daily averaged sink by 1-2 µ mol m −2 s −1 for EN, CR, and DB and by 1 µ mol m −2 s −1 or less for GL. The different magnitudes of flux biases across the biomes suggest that the flux bias is scaled with the strength of the seasonal uptake and/or the strength of the annual net carbon exchange (i.e., missing sink).
CR and DB have more distinct E max ranking patterns for the daytime and nighttime flux biases. For CR, low E max leads to larger biases in both daytime and nighttime fluxes due to low daytime net uptake and nighttime respiration. Since E max in CASA directly impacts GPP and indirectly impacts respiration, increasing both GPP and respiration is favorable for capturing CR fluxes in CASA. This is also reflected by the combination of high E max and high Q 10 values (P25 and P27) being ranked in the top group in Figure 6. For DB, however, low E max leads to large, positive flux biases in daytime but the smallest positive bias at night, suggesting that the bigger issue for DB is that the CASA model tends to respire carbon faster than in the actual ecosystem.
In both mole fraction and flux space, CT2017 agrees more closely overall with the observations than the CASA members after averaging ( Figure 8). However, once we break nighttime and daytime flux apart, CT2017 overestimates the nighttime respiration for these four biome types and daytime net uptake for DB.
FENG ET AL.

10.1029/2020GB006914
15 of 21  Table 3. A mixed type is listed if the difference between the fractions of the top two biome types is within 10%.

Discussion
Multiple CO 2 observation platforms (i.e., aircraft and tower [CO 2 ] and flux tower data) across measurement space (i.e., concentration vs. flux) identify that E max is the most important parameter driving the spread of the CASA flux ensemble for summer 2016. In concentration space, the modeled [CO 2 ] biases can be contributed by biogenic flux, transport, boundary conditions and other CO 2 flux components. Although Figure 3 demonstrates that, in the sampled cases, the biosphere dominates the model errors, the transport errors are not negligible. The model transport errors can increase or decrease modeled [CO 2 ] biases in Figure 5 as a whole. We choose to focus on the relative performance among these flux members to minimize the impact of the potential transport errors on interpreting the results. However, we acknowledge that the complexity the model transport introduces in the results are not eliminated.
We argue that the transport error is unlikely to change our major conclusion that the biogenic fluxes are biased or change the relative performance across the biogenic CO 2 flux members. This argument is supported in the consistent results from the flux comparison in Section 4.3. This is also reflected by the results that CT2017 has smaller biases against measured [CO 2 ] (Figures 5 and 6). While the CT2017 fluxes were optimized with the TM5 transport model (Krol et al., 2005), we consistently found that the associated WRF-simulated [CO 2 ] bias was the lowest compared to the CASA members. We also note that a slightly different version of CASA (Potter et al., 1993) is used as a prior in CT2017. In our transport model setup, we nudged the WRF-Chem model to the state-of-the-art reanalysis product ERA5 in order to improve the large-scale dynamics. The comparison between the nudged and free run modeled wind fields illustrates the effectiveness and efficiency of this meteorological constraint. Finally, the ensemble of perturbed transport simulations confirms the secondary role of transport errors in [CO 2 ] model-data residuals. Hence, it seems highly unlikely that transport errors could be the cause of our findings regarding the CASA ensemble.
The CarbonTracker (CT) inversion system works by estimating scaling factors that multiply the prior model (CASA)'s NEE. During the growing season, an increase in the terrestrial sink is equivalent to amplifying the prior model's diel cycle, to perhaps unrealistic levels. These scaling factors are constant for each week and are independent from one week to the next. Thus, the CT fluxes may yield periods of unrealistically large diel cycles, possibly followed by periods with abnormally small daily cycles of NEE. The shape of the day-night flux differences and day-to-day changes cannot be adjusted by CT. To alleviate this problem, some regional inverse flux estimation pioneers have applied different strategies. Gourdji et al. (2012) first estimated diel cycle of the fluxes at a 3-hourly resolution for the North American continent. In Schuh et al. (2013), the two regional inversion techniques were used to minimize the artifact in the diel cycle of the estimated fluxes: the Colorado State University inversion technique optimized for GPP and respiration separately, while the Pennsylvania State University inversion optimized daytime NEE and nighttime NEE on a 7.5-day time scale. Most inversion systems that optimize NEE with prescribed diel cycle may potentially alleviate the unrealistic diel cycles but, on the other side, may be subject to temporal aggregation errors (e.g., Byrne et al., 2020;Peters et al., 2007;Schuh et al., 2010). Additionally, since CT relies heavily on simulated transport, it is also possible to retrieve exaggerated diurnal cycles in fluxes if that model ventilates the PBL too strongly. While this process could affect fluxes throughout the day, thin and stable nighttime boundary layers are particularly difficult to represent in models of this class. Section 4.3 provides more insight into the causes of the modeled [CO 2 ] biases for summer 2016. The biggest assumption we made in Equation 1 is homogeneity across the entire biome. The optimal method for calculating the flux biases at a [CO 2 ] tower should be based on the flux towers in a [CO 2 ] tower's footprint (shown in Figure S2). However, due to the uneven distribution of the flux towers in the domain of interest, this direct calculation is not possible. The method we used to derive the flux biases at each [CO 2 ] tower takes the influence from multiple biomes into account and results in a bias ranking among the flux members that is consistent with the ranking in mole fraction space, indicating the reliability of this method.
The CASA simulations were spun up equilibrium, and the parameters were adjusted to match the observations. CASA's overestimation of summer NEE likely derives from missing carbon sink processes because of the balanced-biosphere equilibrium starting condition resulting in carbon pools that are too large rather than a problem in the model's parameters (Pietsch & Hasenauer, 2006;Wutzler & Reichstein, 2007;Zhou, Williams, Lauvaux, Davis, et al., 2020). The range of the three perturbed parameters, E max , T opt , and Q 10 , were determined by comparison to flux measurements over 13-year simulation period. Tuning these parameters cannot solve the fact that carbon stocks are actually dynamic and out of equilibrium. Our results may erroneously indicate that a particular parameter set has the smallest bias when the core problem may be that the carbon pool is out of equilibrium. This is consistent with the finding that CT2017, with its imposed net global land sink, is less biased than the balanced biosphere CASA L2 ensemble. We therefore do not intend to stress which parameter set is the best. Instead, the highlight of this work is to demonstrate that atmospheric [CO 2 ] data can be used, with flux tower measurements, to diagnose the performance of TBMs. This study demonstrates the utility of multiple observation platforms for TBM evaluation.
The rankings in Section 4.2 show that increasing E max does not necessarily lead to better agreement with the measurements. Increasing E max increases GPP, but the carbon seems to respire away quickly in CASA and does not reside long enough, perhaps because harvest is not represented in this simulation. Key missing processes include the effects of management and land use such as (a) agricultural sinks from management that removes crops and crop residues thus decreasing the size of the carbon pool that might be respired, (b) pastureland sinks from cattle and other grazers that consume plant biomass and store it in their body mass, (c) forest carbon storage as trees and stands mature with sequestration in wood. Additional candidates include stimulation of ecosystem carbon sinks by growth enhancement factors such as rising CO 2 concentration, nitrogen deposition, fertilizer additions, and the like. Including these missing processes in the model framework would require a re-evaluation of all model parameters, and we might find that a different E max parameter set was best relative to what was identified in this study and might lead to a better match with this suite of observation. In addition, the temporal downscaling of CASA's monthly fluxes suggests that day-night variation in respiration may not be correctly captured in the temporal downscaling to hourly variations. The modeled respiration is primarily related to temperature. However, previous studies have shown that the correlation plots of respiration and temperature are scattered. The high-biased points tend to occur in daytime, and the low-biased points tend to occur at nighttime because carbon fixation processes are more active during daytime when more labile carbon is readily available to be respired.
Simplistic temporal downscaling of CASA's monthly carbon fluxes likely contributes to biases in the diel cycle at sub-daily scale. The 3-hourly CASA ensemble flux products were downscaled from the native monthly resolution using the Olsen and Randerson (2004) method, in which GPP is downscaled with downward shortwave radiation, Re is downscaled with air temperature and Q 10 , and NEE is the sum of GPP and Re. This makes it difficult to use the diel cycle analyses in Section 4.3 to make conclusions about GPP and Re in CASA at fine scales. However, redistribution of respiration to shift more to nighttime would appear insufficient to address the flux biases. In the flux diurnal analysis, we found a slight over-estimation of nighttime fluxes (bias was near zero for most ecosystems expect for DB) and a significant under-estimation of the daytime uptake. If we transfer respiration from daytime to nighttime to fix the daytime sink under-estimation, the net slightly positive flux bias during nighttime will increase, which is contrary to the expected near-zero correction at night or would exacerbate the overestimation where it already exists (again, mainly DB). Hence, changing the diurnal cycle of respiration is insufficient to address the day/night flux mismatches. Consequently, only an increase in GPP (or an overall decrease in respiration) can fix the bias, requiring model structural corrections to include the missing processes mentioned above. Also, note that the biases in the study period of six weeks are nonetheless on a similar timescale as direct estimation from native monthly CASA GPP and Re.
The ability to diagnose errors in GPP and respiration will require models that resolve the diel cycle and possibly additional data dedicated to the photosynthetic process. Carbonyl sulfide (COS) has been proposed as an independent proxy for GPP as it diffuses into leaves in a fashion very similar to CO 2 , but in contrast to the latter, is generally not emitted by respiration. Campbell et al. (2017) presented a global, measurement-based estimate of GPP growth during the twentieth century that is based on long-term atmospheric COS records, derived from ice-core, firn and ambient air samples and found that the observation-based COS record is most consistent with simulations of climate and the carbon cycle. Recently, Spielmann et al. (2019) used concurrent ecosystem-scale flux measurements of CO 2 and COS at four European biomes for a joint constraint on CO 2 flux partitioning. Their results demonstrated the importance of using multiple approaches for constraining present-day GPP due to a systematic underestimation under low light conditions with the classical approaches relying merely on CO 2 fluxes. Other studies have used the δ 13 C of nocturnal whole-ecosystem respiration as a proxy from which to derive carbon isotope discrimination associated with photosynthesis (Alstad et al., 2007;Bowling et al., 2002;Flanagan et al., 1996). Joint atmospheric gas constraints may improve our diagnoses of the causes of biases in TBM flux estimates. Studies using ACT-America airborne biogenic tracers, including CO 2 , COS, and CO, are submitted to this collection (Parazoo et al., 2020).

Conclusions
We evaluate the modeled [CO 2 ] biases associated with CASA TMB biogenic CO 2 flux ensemble members and the CT2017 posterior biogenic flux using aircraft and tower in situ [CO 2 ] jointly with eddy-covariance flux data from July 18 to August 28 in summer 2016. Aircraft and tower in situ [CO 2 ] were influenced by GL, EN, DB, and CR. While the mole fraction-based analyses revealed a systematic underestimation of carbon uptake by the balanced-biosphere CASA runs, the flux-based analyses identified a combined effect from an overestimation of respiration and under-estimation of productivity. The joint observational analysis yields strong confidence that these results span large spatial domains and multiple ecosystems due to the availability of long aircraft transects and a wide network of ground-based measurements. The results from analyzing both mole fraction and flux model-data residuals were consistent. The systematic errors in CASA that span all parameter values suggest that missing processes cannot be properly simulated by adjusting the existing parameters. Analyses indicate that modeled [CO 2 ] biases are related to biome productivity; the models tend to be biased more for high productivity biomes (DB and CR) and biased less for GL and EN. In particular, the summer harvest sink absent from CASA seems to be responsible for large biases in the Midwest. Lastly, CT2017, an inversion product that is constrained by atmospheric [CO 2 ] data and has an imposed net biogenic carbon sink, shows better agreement with [CO 2 ] mole fraction data compared to CASA flux ensemble members. However, the flux-based analyses revealed that the diurnal variations of CT were unrealistically large. We suggest that the scaling of the net daily fluxes in large-scale inversions must be further decomposed into day and night sub-components to reproduce the diel cycle of photosynthetic and ecosystem respiration processes. Lastly, our setup does not resolve the subgrid biome variability, which is a shortcoming at the scale. The CASA ensemble provides an alternative flux data set at 500 × 500 m. A sensitivity study with both sets of the CASA flux ensemble can be tested in future.