Evaluating the Performance of the Canadian Land Surface Scheme Including Biogeochemical Cycles (CLASSIC) Tailored to the Pan‐Canadian Domain

Canada's boreal forests and tundra ecosystems are responding to unprecedented climate change with implications for the global carbon (C) cycle and global climate. However, our ability to model the response of Canada's terrestrial ecosystems to climate change is limited and there has been no comprehensive, process‐based assessment of Canada's terrestrial C cycle. We tailor the Canadian Land Surface Scheme Including Biogeochemical Cycles (CLASSIC) to Canada and evaluate its C cycling performance against independent reference data. We utilize skill scores to assess model performance against reference data alongside benchmark scores that quantify the level of agreement between the reference data sets to aid in interpretation. Our results demonstrate CLASSIC's sensitivity to prescribed vegetation cover. They also show that the addition of five region‐specific Plant functional types (PFTs) improves CLASSIC's skill at simulating the Canadian C cycle. CLASSIC performs well when tailored to Canada, falls within the range of the reference data sets, and meets or exceeds the benchmark scores for most C cycling processes. New region‐specific land cover products, well‐informed PFT parameterizations, and more detailed reference data sets will facilitate improvements to the representation of the terrestrial C cycle in regional and global land surface models. Incorporating a parameterization for boreal disturbance processes and explicitly representing peatlands and permafrost soils will improve CLASSIC's future performance in Canada and other boreal regions. This is an important step toward a comprehensive process‐based assessment of Canada's terrestrial C cycle and evaluating Canada's net C balance under climate change.


Evaluating the Performance of the Canadian Land Surface Scheme Including Biogeochemical Cycles (CLASSIC) Tailored to the Pan-Canadian Domain
To date, we are not aware of a comprehensive, process-based assessment of Canada's terrestrial C cycle. Preexisting regional or global C cycling assessments using process-based models, have a relatively coarse spatial resolution (0.5° or more) and have a limited representation of region-specific vegetation types, disturbance, and ground surface-related processes (Chaste et al., 2017;Friedlingstein et al., 2019;Hayes et al., 2012;Huntzinger et al., 2012;Peng et al., 2014). Inventory-based estimates, atmospheric inversion (top-down) models, and data-driven models have been used to estimate C fluxes and stocks in Canada, in some cases at extremely high resolution (1 km or less) but each with their own limitations (J. Chen et al., 2000;J. M. Chen et al., 2003;Ju & Chen, 2008;Kurz et al., 2009;Shiga et al., 2018;Sothe et al., 2022;Xiao et al., 2014). The inventory-based approach cannot disentangle the relative contributions of CO 2 fertilization and climate to vegetation growth, the inversion approach operates at coarse regional resolution, and the data-driven approach is dependent upon the quality of the training data and has difficulty disentangling CO 2 fertilization and climate impacts on vegeta tion. All three alternative approaches additionally have a limited ability to make future projections. Conversely, regional processes-based models have several advantages that can circumvent some of the drawbacks of the other approaches. They can run at high resolution compared to their global counterparts, include model parameters optimized based upon region-specific data, include regional Plant functional types (PFTs) that better capture the distribution of vegetation on the landscape, and utilize region-specific data sets (i.e., meteorological or disturbance history drivers) (Koca et al., 2006;Kuntoro et al., 2009;Morales et al., 2007;Santini et al., 2014;Seiler et al., 2014Seiler et al., , 2015. Developing a higher resolution process-based model tailored to the pan-Canadian domain is an important step toward disentangling the impact of different processes on Canada's net C balance and projecting how the Canadian C cycle will respond to future climate change. et al., 2022). Moreover, the PFTs used in LSMs have historically been developed to represent global patterns of vegetation and their associated traits (Bonan et al., 2002;E. O. Box, 1996;Melton et al., 2020;Wullschleger et al., 2014). Region-specific PFTs can enhance model realism, more accurately represent the diversity of vegetation on the landscape, and include more informed parameterizations that act to reduce regional biases Epstein et al., 2001;Mekonnen et al., 2021;Meyer et al., 2021;Peng et al., 2014;Rezende et al., 2016;Rogers, 2014). For these region-specific PFTs to improve model performance and robustness they require sufficient data or expert knowledge to inform their parameterization and specify their distribution. Adding a region-specific PFT increases the number of parameters used in the model. Therefore region-specific PFTs must be carefully specified by balancing realism and parsimony and avoiding issues of equifinality (Anderegg et al., 2022;Prentice et al., 2015). Currently, CLASSIC and indeed we are not aware of any other LSMs that include PFTs tailored to explicitly represent Canada's boreal forests, tundra sedges, and shrubs (Melton et al., 2020;Meyer et al., 2021;Wullschleger et al., 2014).
In this study, we tailor CLASSIC to the pan-Canadian domain by improving its representation of the distribution and traits of Canada's vegetation to enhance CLASSIC's representation of the Canadian C cycle. Given the wide array of land cover products that exist for Canada, we evaluate the performance of CLASSIC when run with prescribed land cover from four land cover products. We also evaluate the impact of including five new PFTs in CLASSIC including shrub, sedge, and region-specific tree PFTs. We compare our offline model simulations to independent remotely sensed and data-driven products to demonstrate the skill of the region-specific model configuration in representing the pan-Canadian domain relative to the standard model setup. Finally, we evaluate the model biases to determine where to implement future improvements. This is the first study to use these five new CLASSIC PFTs together, in a scaled-up high-resolution process-based model tailored to the pan-Canadian domain (i.e., in gridded simulations) and extensively evaluate their performance at the pan-Canadian scale.

The CLASSIC Model
CLASSIC is the community open-source successor to the coupled Canadian Land Surface Scheme (CLASS) (D. Verseghy, 2017;D. L. Verseghy, 2000D. L. Verseghy, , 2007D. L. Verseghy et al., 1993) and Canadian Terrestrial Ecosystem Model (CTEM) (Arora, 2003;Melton & Arora, 2016). A detailed description and evaluation of CLASSIC v1.0 can be found in Melton et al. (2020) and Seiler et al. (2021). The description below highlights updates since CLASSIC v1.0 that include or improve the representation of certain processes. Within CLASSIC, CLASS simulates the energy and water balances of the land surface and CTEM simulates the biogeochemical processes.
CLASS is the physics sub-model that, when driven by meteorological data, simulates the fluxes of heat, momentum, and water on and within the land surface. CLASS simulates four possible subareas within each grid cell: bare ground, snow-covered ground, canopy-covered ground, and snow-covered canopy, typically on a 30-minute time step in offline simulations. The heat and water fluxes are calculated for each of these four sub-areas and weighted by their fractional coverage within the grid square to find the grid cell average values. The model is set up to use 20 ground layers which gradually increase from 10 layers of 0.1 m thickness to a 30 m thick layer for a total depth of over 61 m. CLASS simulates water fluxes between the soil layers down to the depth of the underlying impermeable bedrock layers. The water fluxes use an improved first-order Lagrange interpolation to discretize the Richards equation for unsaturated vertical flow (MacKay et al., 2022). CLASS simulates heat transfer within all ground layers including the underlying bedrock. The soil textures and permeable depth used within the model come from the SoilGrids250m data set (Hengl et al., 2017). CLASS models the canopy as a single layer and in the default configuration uses four PFTs for the model physics: needleleaf trees, broadleaf trees, grasses, and crops. One of our model runs includes one new region-specific CLASS PFT in addition to those described above: Broadleaf shrubs (see Section 2.4 below; Table 1).
CTEM is a dynamic vegetation model which simulates the biogeochemical processes within CLASSIC. CLASS is coupled to CTEM on a daily time step. CLASS provides CTEM with information about the mean daily soil moisture, soil temperature, and net radiation on the land surface. CTEM in turn provides CLASS with information about the overlying vegetation including its height, leaf area index (LAI), biomass, and rooting depth. The vegetation is dynamically simulated by CTEM as a function of environmental conditions through its simulation of photosynthetic fluxes on the physics timestep, and daily simulation of C allocation to three live vegetation 10.1029/2022MS003480 4 of 24 components: leaves, stems, and roots. CTEM incorporates non-structural and structural carbohydrate pools within the three live vegetation components (Asaadi et al., 2018). CTEM allocates C first to the non-structural pool, and the model then simulates the flux of C to the structural pool. The non-structural pools buffer the supply of C to improve the seasonality of simulated LAI (Asaadi et al., 2018). CTEM also simulates daily autotrophic respiration (Ra) from the live vegetation components and heterotrophic respiration (Rh) fluxes from the litter and soil C pools. CTEM has a fire module that simulates fires and their associated C fluxes based on climate conditions, population density, and lightning strike frequency (Arora & Boer, 2010;. The simulation of the C pools and fluxes within the model utilizes a user-determined number of PFTs. In its default configuration, CTEM utilizes nine biogeochemical PFTs that map onto the physics (CLASS) PFTs (Table 1). One of our model runs utilizes new region-specific CTEM PFTs, in addition to the standard nine, including broadleaf deciduous shrubs, broadleaf evergreen shrubs, continental needleleaf evergreen trees, interior needleleaf evergreen trees, and sedges (see Section 2.4 below; Table 1).

Meteorological Forcings and Simulation Protocol
CLASSIC requires seven meteorological forcing variables for its simulation of matter, energy, and momentum exchanges between the land surface and the atmosphere: incoming shortwave radiation, incoming longwave radiation, air temperature, precipitation rate, air pressure, specific humidity, and wind speed. As described by Meyer et al. (2021), the daily meteorological forcing used in our simulations is from a merged data set (GSWP3-W5E5-ERA5). The 1901-1978 portion of the meteorological forcing comes from the Inter-Sectoral Impact Model Intercomparison Project 0.5° GSWP3-W5E5 bilinearly interpolated to a 0.25° grid (Kim, 2017;Lange, 2019Lange, , 2020aLange, , 2020b. The 1979-2018 portion comes from the 0.25° ERA5 time series bias corrected to match the means of the overlapping period of the GSWP3-W5E5 data set (ECMWF, 2019). Using the nearest neighbor method we interpolate the bias-corrected meteorological forcing to a 0.22° common grid using the Climate Data Operators suite (Schulzweida et al., 2006). We disaggregated the meteorology from daily to half-hourly time steps following the methodology of Melton and Arora (2016) except for incoming longwave radiation which is linearly interpolated. The fire module utilizes time-varying lightning-to-ground strike data from the Optical Transient Detector and Lightning Imaging Sensor Climatology Data Set (Cecil et al., 2014) and time-varying population density from the Trends in the land carbon cycle 2021 (TRENDY) protocol based on the History database of the Global Environment (Hyde) version 3.2 (Chini et al., 2021;Friedlingstein et al., 2022). Note. The mapping between CLASS and CTEM PFTs is shown along with the parameters for the maximum rate of carboxylation by Rubisco (v max ; μmol CO 2 m −2 s −1 ), maximum cold stress leaf loss rate (Ω C,max ; day −1 ), and the maximum drought stress leaf loss rate (Ω D,max ; day −1 ). a See Meyer et al. (2021) for additional details regarding the sedge and shrubs PFTs and associated parameters.
In our simulations, we prescribe the spatial distribution of PFTs using four different land cover products to better elucidate the influence of the new PFTs (see Sections 2.3 and 2.4 below). CLASSIC's fire module simulates fires and their associated C fluxes during our model runs. Although CLASSIC can simulate nitrogen (N) cycling and land use change (LUC), we did not use either in these simulations (Arora & Boer, 2010;Asaadi & Arora, 2021

Land Cover Products
We use four different land cover products to explore their impact on CLASSIC's ability to represent C cycling-related processes over Canada: Global Land Cover 2000 land cover (GLC 2000), North American Land Change Monitoring System land cover (NALCMS), European Space Agency Climate Change Initiative land cover (ESA CCI), and a hybrid land cover with the default 9 CLASSIC PFTs (Hybrid-9PFT). GLC 2000 is a 1 km resolution global land cover product with 22 classes. It was generated by Bartholomé and Belward (2005) from Satellite Pour l'Observation de la Terre VEGETATION (SPOT-VEG) data collected from November 1999 to December 2000 using an unsupervised image classification method. Its overall accuracy globally against a random sampling of reference sites is estimated to be 68.6% (Mayaux et al., 2006). GLC2000 is known to underestimate needleleaf forest cover in northwestern Canada, overestimate it in the Hudson Plains, and contains large areas mapped as mixed or mosaic forest cover, which are difficult to assign PFT fractional covers to due to the heterogeneity inherent to the cover classes (L. Wang et al., 2019). NALCMS is a 30 m resolution North American regional product with 19 classes (Latifovic et al., 2017). It was generated by the Canada Center for Remote Sensing from Landsat imagery collected from 2009 to 2011 using a supervised classification method (i.e., random forest machine learning with local optimization). We utilize the NALCMS product corresponding to 2010 in this study, which has an overall accuracy of 76.6% (Latifovic et al., 2017). NALCMS suffers from some uncertainty in its forest cover mapping in wetland regions because treed wetlands are not separated from herbaceous wetlands in its legend (Myneni et al., 2002;L. Wang et al., 2019). ESA CCI is a 300 m resolution global product with 22 classes and 15 sub-classes which covers the period from 1992 to 2018 (European Space Agency, 2017). It was generated by the European Space Agency by applying a combination of supervised machine learning classification and unsupervised image classification methods to three land cover products: Environmental Satellite (ENVISAT,2003(ENVISAT, -2012, SPOT-VEG (1999, and Project for On-Board Autonomy Vegetation (PROBA-V, 2013(PROBA-V, -2018. ESA CCI has an overall accuracy of 71% when compared to the GlobCover validation database (European Space Agency, 2017). We utilized the ESA CCI product corresponding to 2010.
Hybrid is a Canada-specific product generated by L. Wang et al. (2022). It combines NALCMS with a land cover classification generated by Hermosilla et al. (2018) using the Virtual Land Cover Engine (VLCE). The VLCE product was generated with a supervised random forest-based classification method using Landsat time-series data and informed by forest change and digital elevation information derived from the Advanced Spaceborne Thermal Emission and Reflection Radiometer. VLCE covers the years 1984-2012 and we utilize the product corresponding to 2010 in creating the Hybrid product. The overall accuracy for VLCE is estimated to be 70.3% in 2004, which is the year with the most reference samples (Hermosilla et al., 2018). Croplands are not separated from grasslands in VLCE and the grassland and bare ground classifications in VLCE's legend are not as detailed as in NALCMS. Therefore, the Hybrid product which has 17 classes is designed to blend the detailed land cover classification in NALCMS with a more accurate forest cover mapping by VLCE (L. Wang et al., 2022). Based on field survey data and expert knowledge of global biomes and class descriptions, we use cross-walking tables to convert each data set's land cover classes into the nine default PFTs in CLASSIC (Table 1) (A. Wang et al., 2006;L. Wang et al., 2019L. Wang et al., , 2022. Information about shrub fractional cover is available in the underlying Hybrid product, however, the default nine PFTs do not include shrubs so we assign a fraction of the shrub cover to the tree PFTs and the remainder to the C3 grass PFT in Hybrid-9PFT.

Additional PFTs
We implement five additional PFTs and evaluate how region-specific PFTs improve CLASSIC's performance in Canada and increase the model's realism. Three of the additional PFTs are non-tree PFTs including broadleaf evergreen shrubs, broadleaf deciduous shrubs, and sedges. The two shrub PFTs were broadly parameterized by Meyer et al. (2021) to represent broadleaf evergreen and broadleaf deciduous shrub genera globally, whereas the sedge PFTs were parameterized to represent mid to high-latitude sedge genera. The PFTs were extensively evaluated at a low Arctic eddy-covariance tower site by Meyer et al. (2021). We specify the fractional coverage of these three PFTs by creating a cross-walking table for the Hybrid product that includes 12 PFTs (i.e., the default nine plus the three non-tree new PFTs; Table S1 in Supporting Information S1) (L. Wang et al., 2019Wang et al., , 2022. The other two additional PFTs are needleleaf trees: continental needleleaf evergreen trees and interior needleleaf evergreen trees. The interior needleleaf evergreen tree PFT parameterization comes from Peng et al. (2014) and assumes 50% lower rates of leaf loss from cold and drought in the CTEM phenology model compared to the standard needleleaf evergreen tree PFT (Table 1). This PFT roughly corresponds to the pines (Pinus spp.), spruces (Picea spp.), subalpine fir (Abies lasiocarpa), interior Douglas fir (Pseudotsuga menziesii var. glauca), western hemlock (Tsuga heterophylla), and western red cedar (Thuja plicata) that occupy the interior of British Columbia (BC). We specify the fractional cover for this PFT by splitting the interior needleleaf evergreen PFT from the needleleaf evergreen tree cover in the 12 PFT version of Hybrid using land cover classifications encompassing these species or subspecies in BC's biogeoclimatic ecosystem classification map (MacKenzie & Meidinger, 2018;Salkfield et al., 2016). The continental needleleaf evergreen tree PFT parameterization is based on Qu et al. (2021) and has a lower maximum carboxylation rate of Rubisco (V max ; Table 1) than the default needleleaf evergreen tree PFT. This PFT primarily corresponds to black spruce (Picea mariana), which occupies the continental interior of Canada. We calculate the fraction of the total needleleaf evergreen tree cover that is white or black spruce using gridded species composition data from Canada's National Forest Inventory (NFI) for areas within Canada, and from the Scenarios Network for Alaska and Arctic Planning for areas within Alaska (Beaudoin et al., 2018;Land Cover v0.2, 2021). To estimate the fractional cover of the continental needleleaf evergreen PFT, we apply this fractional value to the needleleaf evergreen tree cover in the 12 PFT version of Hybrid. The resulting land cover product and associated model runs are hereafter referred to as the Hybrid land cover with 14 PFTs (Hybrid-14PFT).

Reference Data Sets
We evaluate the CLASSIC outputs against in situ and gridded observation-based data (hereafter termed reference data) available within the pan-Canadian domain. The 33 reference data sets contain information about 12 variables relevant to the energy, C, and water cycle including above-ground biomass (AGB), the fraction of area burnt (BURNT), gross primary productivity (GPP), latent heat flux (HFLS), leaf area index (LAI), net surface longwave radiation (RLS), net surface radiation (RNS), net surface shortwave radiation (RSS), sensible heat flux (HFSS), shortwave albedo (ALBS), snow water equivalent (SNW), and soil carbon (CSOIL). These data sets include either monthly mean values or are simply a snapshot in time (Table 2) and are versions of those detailed in Seiler et al. (2021Seiler et al. ( , 2022. The gridded reference data sets are bilinearly interpolated onto the 0.22° model grid. The in situ point-based reference data sets are not interpolated because we directly compare each observation to its overlaying model pixel (see Section 2.6). Our analysis focuses on AGB, CSOIL, GPP, and LAI as these variables are particularly relevant to the C cycle and multiple gridded reference data sets are available for each which allows us to consider observational uncertainty.
The GPP reference data sets are from the Moderate Resolution Imaging Spectroradiometer (MODIS) (Y. Zhang et al., 2017), the FluxCom initiative (FluxCom) (Jung et al., 2019), the Global Orbiting Carbon Observatory-2 Solar-induced Chlorophyll Fluorescence (GOSIF) (Li & Xiao, 2019), and the Global Land Surface Satellite Product Suite (GLASS) (Liang et al., 2021). MODIS GPP was calculated from a range of MODIS and reanalysis products using a light-use efficiency model which considers the efficiency with which vegetation uses light absorbed by chlorophyll to fix carbon via photosynthesis. GOSIF GPP was calculated based on a statistical model which relates GPP measurements from eddy covariance towers to solar-induced chlorophyll fluorescence from the global Orbiting Carbon Observatory-2 (OCO-2). FluxCom GPP was upscaled from eddy covariance towers using an ensemble of six machine learning models and an array of MODIS-derived remotely sensed products and meteorological data from the Climate Research Unit National Centers for Environmental Prediction version 8. We pre-process FluxCom GPP by calculating the median of the six ensemble members. GLASS GPP was calculated from a range of remotely sensed products detailing direct and diffuse radiation fluxes, vapor pressure deficit, and atmospheric CO 2 concentrations using an eddy covariance-derived light use efficiency model. All of these GPP data sets directly integrate or were originally validated against eddy covariance tower data which exhibits some spatial bias against far north regions in its sampling distribution (Jung et al., 2020;Keenan & Williams, 2018).
The LAI reference data sets are from MODIS (Myneni et al., 2002), the Advanced Very High-Resolution Radiometer (AVHRR) (Claverie et al., 2016), and the European Space Agency's Copernicus Global Land Service (Copernicus) (Verger et al., 2015(Verger et al., , 2016. The LAI reference data sets were all derived from surface reflectance based on satellite imagery. MODIS LAI was calculated by inverting a three-dimensional canopy radiative transfer model. Claverie et al. (2016) derived AVHRR LAI from AVHRR surface reflectance using an artificial neural network trained using LAI from MODIS (MCD15A2) and calibrated using in situ data from Baret et al. (2006). Finally, Copernicus LAI was generated from SPOT-VEG satellite imagery using an artificial neural network. The Copernicus LAI product was filtered to remove artifacts due to snow cover or poor illumination. At high latitudes, an additional correction was applied where the pixels were fixed at their minimum values when the sun's zenith angle was >70°. Gap-filling was also applied, but our analysis only uses non-gap-filled records.
The gridded AGB reference data sets come from the Global Carbon Observation and Analysis System (GEOCAR-BON) (Avitabile et al., 2016;Santoro et al., 2015), the work of Huang et al. (2021) (Huang2021), and the work of Y. Zhang and Liang (2020) (Zhang). The first in-situ AGB reference data set comes from Canada's NFI (Gillis et al., 2005) whereas the second (FOSXue) is a combination of two sets of observations compiled by  Note. The acronyms given here are defined in Section 2.5.

of 24
Schepaschenko et al. (2019) and Xue et al. (2017). These AGB reference data sets are diverse both in terms of the methodologies applied and the underlying field data or remote sensing covariates used. GEOCARBON AGB was created by harmonizing two pre-existing AGB data sets from Santoro et al. (2015) for boreal regions and Avitabile et al. (2016) for tropical regions. Therefore in our region of interest, it is primarily informed by the Envisat Advanced Synthetic Aperture Radar (SAR) derived estimates of Santoro et al. (2015). Huang2021 AGB was developed from Santoro et al. (2018) which, in turn, was derived from Advanced Land Observing Satellite and Envisat SAR. The SAR retrievals were used to estimate the volume of wood on the landscape. Then AGB was calculated based on wood density and a biomass expansion factor derived by upscaling in situ data. When validated against in-situ data, Huang2021 AGB performed better in boreal regions (relative root mean square difference; Rel. RMSD = 54%) than in tropical, subtropical, and temperate regions (Rel. RMSD = 57%-91%) . Zhang AGB used data fusion to integrate 10 pre-existing aboveground biomass maps that were then extensively evaluated against in situ observations and LIDAR observations. The pre-existing AGB products fused in Zhang exhibit large differences in AGB in boreal regions and positive biases globally. FosXue and NFI AGB are both in situ point-based reference data sets that were derived by upscaling field measurements using allometric equations. The NFI has excellent spatial coverage of forested areas within Canada and consists of approximately 20,000 plots located on a 20 × 20 km grid. The FOSXue data combined in-situ observations from Xue et al. (2017) and Schepaschenko et al. (2019). It has fairly limited spatial coverage within Canada encompassing <50 sites concentrated in southern forests.
The CSOIL reference data sets come from the Harmonized World Soil Database (HWSD) (Todd- Brown et al., 2013), and the SoilGrids system at 250m resolution (SG250m) (Hengl et al., 2017). HWSD CSOIL was created by combining soil survey data with the FAO Soil Map of the World to calculate the soil C content of the top 100 cm of soil. SG250m CSOIL was created by upscaling 150,000 soil survey data records using an ensemble of machine learning models and 150 remotely sensed covariates. We process SG250m to only include the first 100 cm of soil and make it comparable to HWSD. These two data sets are known to differ in the extent to which they represent peatlands, river floodplains, and permafrost soils leading to lower CSOIL in HWSD when compared to SG250m (Seiler et al., 2022;Tifafi et al., 2018).
The BURNT reference data sets come from the Global Fire Emissions Database (GFED4S) (Giglio et al., 2013), and the ESA CCI (Chuvieco et al., 2018). The SNW reference data sets come from a blended product developed at Environment and Climate Change Canada (ECCC) which combines four other gridded SNW products (Brown et al., 2003;Brun et al., 2013;Gelaro et al., 2017;Takala et al., 2011), and in situ SNW measurements compiled by Mortimer et al. (2020) (Mortimer). The surface energy balance-related reference data sets come from the Clouds and the Earth's Radiant Energy System (CERES) (Kato et al., 2013), the Global Energy and Water Cycle Experiment-Surface Radiation Budget (T. Zhang et al., 2011), the Conserving Land-Atmosphere Synthesis Suite (Hobeichi et al., 2020), FluxCom and MODIS (Strahler et al., 1999).

The Automated Model Benchmarking R Package
The Automated Model Benchmarking R package (AMBER) assesses model performance against the reference data sets and calculates skill scores (Seiler et al., 2021). The package calculates a total of six scores: the bias score (S bias ), the root-mean-square-error score (S rmse ), the phase score (S phase ), the interannual variability score (S iav ), the spatial distribution score (S dist ), and the overall score (S overall ). S bias assesses the difference between the reference and modeled mean values. S rmse evaluates the residuals of the reference and modeled time series. S phase assesses how well the model reproduces the seasonality in the reference time series. S iav assesses how well the model reproduces the interannual variability in the reference time series. S dist evaluates how well the model captures the pattern of a variable across space compared to the reference data. Finally, S overall is a weighted average of the other five scores where S rmse is weighted by a factor of two commensurate with its perceived importance in assessing model performance. S rmse , S phase , and S iav are omitted for the CSOIL and AGB reference data sets which are a snapshot in time. The scores are dimensionless and on a scale from 0 to 1. The scores express the level of agreement between the model and reference data with a higher value implying better performance. Lower values are, however, not necessarily a product of poor model performance as the scores are also affected by uncertainties in the forcing and reference data. Further details regarding the AMBER R package as well as the skill score equations are presented in Seiler et al. (2021) and Seiler (2019).
We also calculate benchmark scores for the reference data sets compared to one another. We calculate the benchmark scores by taking the minimum S overall from all possible comparisons of a particular reference data set and the other reference data sets available for that same variable. For example, for MODIS LAI the benchmark score is the minimum S overall for two comparisons: MODIS versus AVHRR and MODIS versus Copernicus. We use the minimum rather than the mean value to ensure we capture the full range of uncertainty. These scores quantify the level of agreement between the reference data sets. They are indicative of the S overall that is achievable given the uncertainty between the reference data sets. The benchmark scores for a single variable (i.e., GPP) can vary among the reference data sets due to the calculations involved in normalizing each statistical metric (Seiler et al., 2022). If the model skill scores reach the benchmark scores then the level of disagreement between the model and the reference data set is of similar magnitude to the uncertainty between the individual reference data sets. The model scores can exceed the benchmark scores when the model falls within the uncertainty range of the reference data.
We also visualize the average bias for AGB, CSOIL, GPP, LAI, and BURNT versus all available relevant gridded reference data sets (i.e., excluding the point-based NFI and FOSXue AGB reference data sets). The biases are calculated for each reference data set as the per-grid cell differences between the time-mean model and reference values (Seiler et al., 2021). If the reference data sets differ in their temporal coverage, each bias calculation uses different time-mean model values taken from the portion of the model run which overlaps that particular reference data. The biases across all the reference data sets (e.g., for CSoil the two raster layers bias model -HWSD, and bias model -SG250m ) are then averaged per-grid cell yielding average bias rasters.

The Spatial Distribution of Land Cover
The four land cover products differ in terms of the fractional cover of the 9 CTEM PFTs and their dominance. In all four land cover products, needleleaf deciduous trees, broadleaf drought/dry deciduous trees, C4 crops, and C4 grasses PFTs are for the most part found at lower latitudes and are not present or have a negligible fractional cover in the pan-Canadian domain ( Figure S1 in Supporting Information S1). The broadleaf evergreen tree PFT is only present in GLC 2000 and ESA CCI, with limited fractional cover (Figures S1a and S1c in Supporting Information S1).
Needleleaf evergreen trees dominate western and mid-latitude Canada, whereas broadleaf cold deciduous trees dominate southern Ontario and Quebec (Figure 1). The fractional cover of needleleaf evergreen and broadleaf cold deciduous trees is generally higher in GLC 2000 when compared to the other three land covers. C3 crops dominate southeastern and south-central Canada, however, the fractional cover of C3 crops is lower in GLC 2000 ( Figure 1a) than in the other three data sets (Figure 1b-1d). C3 grass is dominant in parts of south-central Canada and the Arctic; however, its fractional cover differs widely between the four data sets. In GLC 2000, C3 grass cover in south-central Canada is higher and more widespread, likely due to its lower C3 crop cover when compared to the other three data sets (Figure 1a). GLC 2000 also uses a mix of C3 grass and broadleaf cold deciduous trees at high latitudes ( Figure 1a). ESA CCI has consistent C3 grass cover at higher latitudes ( Figure 1c) whereas NALCMS and Hybrid-9PFT feature a peak ∼70°N and a gradual decline in C3 grass cover at higher latitudes (Figures 1b and 1d).

Comparisons of Model Simulations With Different PFT Cover
The AMBER scores of the CLASSIC model runs using the four different prescribed PFT covers vary when compared to an array of reference data sets (Figure 2). The model run using Hybrid-9PFT has the best overall performance for C cycling-related reference data sets. Hybrid-9PFT has the highest overall score for three out of the four GPP reference data sets with an average improvement of 0.013 when compared to the land cover with the lowest score (Figures 2b and 2c). Similar improvements are seen for LAI (3/3 data sets), AGB (2/5 data sets), and CSOIL (2/2 data sets). These improvements are primarily a result of improvements in the spatial distribution (S dist ) of these C cycling variables and in the bias (S bias , Figure 2b). The CSOIL and AGB reference data sets are snapshots in time and we cannot evaluate their seasonal or inter-annual performance (i.e., S rmse , S phase , and S iav ). Rather we focus on their spatial distribution and bias. The overall score differences are generally large ranging from 0.02 to 0.08. The NALCMS and ESA CCI model runs rank second or third against C cycling-related reference data sets with approximately equal frequency ( Figure S2 in Supporting Information S1). The score differences between the first and second-ranked model runs are often small (i.e., <0.01) but are eclipsed by large differences between the first and third-ranked model runs (i.e., >0.01).
ESA CCI consistently improves the model's performance in terms of surface energy balance-related comparisons and has the highest overall score for RNS (3/4 data sets) and ALBS (3/3 data sets; Figures 2b and 2c). These improvements are primarily due to changes in the spatial distribution of RNS and ALBS (Figure 2c). The differences in the overall scores are lower ranging from <0.01 to 0.03. The Hybrid-9PFT and NALCMS often rank second and third against these surface energy balance-related data sets and exhibit similar performance when compared to the top-ranked model run (i.e., score differences <0.01; Figure S2 in Supporting Information S1). Looking across all of the comparisons, GLC 2000 is the lowest-scoring land cover (22/33 comparisons; Figure 2d).
Average AGB ranges from 1.9 to 5.7 kg C m −2 in the various gridded reference data sets masked to the same spatial extent. In the point data average, AGB is 6.0 for FosXue, and 4.6 kg C m −2 for NFI. The spatial extent of the FosXue data, which is concentrated in southern Canada, is markedly different from that of the NFI data, which covers most forested areas in Canada ( Figure S3 in Supporting Information S1). The NFI point data has the widest range of any AGB reference data set (0-36.7 kg C m −2 ; Figure S3 in Supporting Information S1). The model runs fell into a smaller range toward the higher end of that found within the reference data (4.3-5.0 kg C m −2 ). In both the model runs and the reference data, AGB generally declines with increasing latitude (Figure 3a). For the model simulations, the slope of this decline is steepest for GLC 2000. Average CSOIL ranges from 15 to 50 kg C m −2 in the various reference data sets while the model runs fall into a small range (13-17 kg C m −2 ) at the lower end of the reference data. The model-simulated CSOIL is generally similar to the HWSD reference data set from 45° to 65°N but has lower values at higher latitudes (Figure 3b). The CSOIL reference data sets differ dramatically amongst themselves at mid to high latitudes with HWSD consistently lowest. The average GPP in the various reference data sets ranges from 1.3 to 1.7 g C m −2 day −1 while the model simulates a smaller range from 1.4 to 1.5 g C m −2 day −1 . GPP declines with increasing latitude in both the model runs and the reference data sets (Figure 3c). The model generally estimates higher GPP than the reference data sets at <60°N and is within the range of reference data sets at higher latitudes. GLC 2000 has the steepest decline in GPP with increasing latitude. The average LAI in the various reference data sets ranges from 0.9 to 1.3 m 2 m −2 and the model runs fall into a smaller range from 1.4 to 1.5 m 2 m −2 . All the model runs have higher LAI than the reference data from 45° to 60°N. The Copernicus reference data is substantially closer to the modeled values than MODIS or AVHRR (Figure 3d).

Additional Plant Functional Types
The Hybrid-14PFT vegetation cover product has more heterogeneous vegetation cover patterns than the baseline Hybrid-9PFT. In Hybrid-14PFT, needleleaf deciduous, broadleaf drought/dry deciduous, and broadleaf evergreen trees are again not present in Canada whereas some limited C4 crop cover is present in central Canada and southern Ontario ( Figure S4 in Supporting Information S1). Needleleaf evergreen trees in Hybrid-9PFT are largely replaced by continental needleleaf evergreen trees in the central mid-latitudes of Canada and interior needleleaf evergreen trees in western Canada in Hybrid-14PFT (Figures 4a and 4b). C3 grass PFT cover is largely replaced by broadleaf deciduous, and to a lesser extent, broadleaf evergreen shrub cover throughout Canada. In Hybrid-14PFT, the Arctic is now dominated by a mix of sedge, broadleaf deciduous shrub, and broadleaf evergreen shrub cover which replaces the homogenous C3 grass cover in Hybrid-9PFT (Figure 4b). In Hybrid-14PFT, broadleaf deciduous shrubs dominate the low arctic, but their fractional cover declines and is largely supplanted by sedges at high latitudes.

Model Performance With Additional Plant Functional Types
The addition of five CTEM PFTs to the model improves its performance against reference data for several C cycling-related variables ( Figure 5). The overall scores for three of the five AGB reference data sets improve between 0.04 and 0.14. This is primarily a result of large improvements (i.e., up to 0.14) in the spatial distribution and bias of modeled AGB (Figures 5b and 5c) This came at the cost of a performance loss against the Zhang and FOSXue reference data. The overall scores for three of the four GPP reference data sets also improve by between 0.04 and 0.06 due to improvements in the spatial distribution, interannual variability, and bias of modeled GPP. GLASS is the only GPP reference data product to show an overall score decrease with Hybrid-14 over Hybrid-9. Hybrid-14 improves its HWSD CSOIL bias score when compared to Hybrid-9, but decreases in the spatial distribution score against both CSOIL data sets lead to a decrease in Hybrid-14's overall score against both CSOIL reference data sets (Figure 5c). The overall scores now meet or exceed the benchmark scores for most GPP (4/4) and AGB (4/5) reference data sets, but fewer CSOIL (1/2) and LAI (0/3) reference data sets ( Figure 6). The C cycling-related overall scores consistently exceed the original GLC 2000 model run, except for CSOIL ( Figure 6, Figures S5 and S6 in Supporting Information S1).
There are smaller (i.e., <0.1) changes in the overall scores of surface energy balance-related variables except for latent heat flux (HFLS) where changes in its modeled distribution and inter-annual variability lead to overall score declines between 0.03 and 0.04 (Figures 5b and 5c). The HFLS overall scores nonetheless still exceed the benchmark scores for both reference data sets ( Figure 6).
The AGB for CLASSIC simulations with 14 PFTs is lower on average (3.1 kg C m −2 ) than that with 9 PFTs (4.5 kg C m −2 ). There is a similar decline in AGB with increasing latitude in both, but with 14 PFTs, the model is now closer to the middle estimate provided by the reference data ( Figure 7a). With 14 PFTs, CLASSIC simulated CSOIL (11 kg C m −2 ) is also lower on average than simulations with 9 PFTs (16 kg C m −2 ). Both model runs generally cluster around the HWSD CSOIL reference data (Figure 7b). With the 14 PFTs, CLASSIC simulated GPP is lower on average (1.1 g C m −2 day −1 ) than estimated with the 9 PFTs (1.4 g C m −2 day −1 ). The additional PFTs move GPP to within the range of the reference data at <60°N, where the model run with 9 PFTs generally is above the range of the reference data ( Figure 7c). The simulated LAI with the 14 PFTs CLASSIC run is lower on average (1.1 m 2 m −2 ) than with 9 PFTs (1.4 m 2 m −2 ) and is biased low compared to the Copernicus reference data from 45° to 60°N (Figure 7d).
Use of the 14 PFT land cover and associated parameterizations in CLASSIC significantly reduces regional biases in simulated AGB, GPP, and LAI across Canada (Figure 8). With the 14 PFTs model setup, AGB, CSOIL, and GPP are within the 95% confidence interval of the gridded reference data across the majority of Canada (Figure 8). Exceptions include interior BC where the model under-predicts AGB (Figure 8a) and southeastern and south-central Canada where GPP exhibits significant negative biases (Figure 8c). CSOIL did not exhibit distinct regional biases between the two model runs and large disagreements between the two reference data sets likely confound the CSOIL significance tests (Figure 8b). The largest absolute bias in CSOIL occurs in the Hudson Bay Lowlands region. Modeled LAI and BURNT exhibit strong, often significant biases across much of Canada in both model runs. With the CLASSIC 14 PFTs simulation, strong positive LAI biases remain in boreal and western Canada (Figure 8d). BURNT exhibits consistent strong negative biases in the mid-latitude boreal region of Canada and strong positive biases in the plains region (Figure 8e). BURNT falls outside the 95% confidence interval of the reference data across the majority of Canada in line with its low scores (Figures 5 and 6, Figure S6 in Supporting Information S1).

Discussion
We evaluate CLASSIC's performance across the pan-Canadian domain and demonstrate its skill at simulating C cycling at regional scales. Comparing CLASSIC runs using different prescribed PFT covers demonstrates the model's sensitivity to prescribed vegetation cover (Figures 1-3). The addition of five region-specific PFTs further improves CLASSIC's skill at simulating regional C cycling compared to the 9 PFT model runs and demonstrates that a well-informed regional parameterization can reduce biases (Figures 4-8). For Hybrid-14PFT, the overall scores (S overall ) for the majority of C-cycling (9/14) and many surface energy balance (7/15) processes meet or exceed the benchmark scores, further highlighting the skill of our regional parameterization (Figure 6, Figures  S5 and S6 in Supporting Information S1). Some processes (i.e., LAI, CSOIL, BURNT) continue to exhibit biases similar to those observed in CLASSIC and other LSMs at global scale (Figures 8b, 8d, and 8e) (Seiler et al., 2022).

Vegetation Cover Impacts the Modeled C Cycle
Differences in the distribution and fractional cover of PFTs can impact an LSM's simulated fluxes of matter and energy (Fritz et al., 2011;Gou et al., 2019;Hartley et al., 2017;Jung et al., 2007;Ottlé et al., 2013;Quaife et al., 2008;L. Wang et al., 2022). Our results clearly demonstrate CLASSIC is sensitive to differences in prescribed PFT cover which produce wide-ranging impacts across the model outputs. GLC 2000 exhibits consistently higher tree PFT cover and lower crop and grass cover than other land cover products (Figure 1). The higher tree PFT cover biases the simulated fluxes of matter and energy in the model resulting in the GLC 2000 run consistently scoring the lowest (Figure 2). In the GLC 2000 run AGB, GPP, and LAI fall above the range of the reference data sets at mid-latitudes where tree cover is highest, and below the range of the reference data at higher latitudes where C3 grasses are prescribed to dominate (Figures 1-3). CLASSIC is particularly sensitive to mid-latitude differences in prescribed tree cover owing to its parameterization and the growth forms prominent role on the landscape (Huntzinger et al., 2012;Melton & Arora, 2016;Melton et al., 2020). The Hybrid-9PFT run falls near or within the range of the AGB, GPP, and LAI reference data owing to its lower tree cover and gradual decline in C3 grass cover with increasing latitude (Figures 1-3). As a result, the Hybrid-9PFT run consistently scores higher in C cycling-related comparisons. These results demonstrate that the use of realistic land cover products in LSMs can help reduce regional or global C cycling biases.
Differences in the distribution and cover of PFTs are known to be a significant source of uncertainty between LSMs (Hartley et al., 2017;Teckentrup et al., 2021). GLC 2000, which generally is the lowest-scoring run in this study, has been employed in previous versions of CLASSIC and CanESM (Arora et al., 2009;A. Wang et al., 2006;L. Wang et al., 2022). It is the oldest and has the lowest spatial resolution of the five land cover products considered here and the only product created using unsupervised classification methods alone (Bartholomé & Belward, 2005;European Space Agency, 2017;Latifovic et al., 2017;L. Wang et al., 2022). It also does not include changes in land cover due to disturbance or agricultural LUC, which have occurred since 2000 and are included in the other land cover products which correspond to 2010. Hybrid-9PFT, on the other hand, is Canada-specific and integrates more recently produced higher-resolution land cover products. Therefore, advances in remote sensing which yield higher resolution, region-specific information, and more accurately characterize the vegetation on the landscape can represent a potential boon for improving the accuracy of LSM simulations on regional to global scales (Fritz et al., 2011;Lu & Weng, 2007;Macander et al., 2022;Ottlé et al., 2013;Ustin & Gamon, 2010). Methods that blend land cover products with varying extents, classes, or data types could allow global LSMs with regional biases to benefit from these advances (Hartley et al., 2017;L. Wang et al., 2022;X. Wang et al., 2017;Y. Zhang & Liang, 2020). Finally, model evaluation methods similar to those employed here that is, Seiler et al. (2021), Seiler (2019), and Collier et al. (2018, present a powerful tool for determining the impact of different land cover products on LSMs.

Region-Specific PFTs Improve the Representation of C Cycle Processes
We also demonstrate that PFTs designed for use in global models can exhibit biases when used in regional scale simulations while region-specific PFTs can reduce these biases by better representing the traits of vegetation on the landscape (Epstein et al., 2001;Harper et al., 2018;Peng et al., 2014;Rezende et al., 2016;Rogers, 2014;Wullschleger et al., 2014). The five additional PFTs in this study address significant sources of bias in AGB and GPP. This is possible because sufficient information is available to inform their incorporation into the model (i.e., Beaudoin et al., 2018 have the additional benefit of improving the LAI biases while not exacerbating the existing CSOIL biases when compared to reference data ( Figure 5).
In our baseline model runs there is substantial positive bias in AGP, GPP, and LAI across the forested region of central Canada (Figures 6 and 7). This aligns with results by Qu et al. (2021) showing that the default needleleaf evergreen tree PFT in CLASSIC has a high V max which leads to an overestimated GPP when compared to eddy covariance observations in Canadian boreal forest stands (predominantly spruce trees). Incorporating the continental needleleaf evergreen tree PFT reduces these biases (Figures 4, 7, and 8). Similar positive GPP biases in boreal Canada and Eurasia were observed in TRENDY LSMs (Seiler et al., 2022). V max for needleleaf evergreen tree PFTs also varies widely in LSMs (Rogers, 2014). The interior needleleaf evergreen tree PFT more accurately represents the leaf traits of needleleaf evergreen trees within interior BC and reduces the negative AGB biases in interior BC (Peng et al., 2014;Reich et al., 1995) (Figure 8).
The shrub and sedge PFTs improve model realism in high-latitude regions. These PFTs have been shown to improve the representation of soil temperatures, soil moisture, CO 2 , and energy fluxes in tundra ecosystems in site-level simulations (Meyer et al., 2021). The shrub and sedge PFTs more realistically represent the heterogeneous vegetation cover in tundra, which is often modeled using a single C3 grass PFT Meyer et al., 2021;Myers-Smith et al., 2011;Wullschleger et al., 2014). These ecosystems are particularly significant given shrub expansion and complex greening patterns and browning observed across the Arctic (Berner et al., 2020;Jia et al., 2003Jia et al., , 2009Mekonnen et al., 2021;Tape et al., 2006).
To the best of our knowledge, our modeling analysis and evaluation is at a larger scale (i.e., the pan-Canadian domain) compared to other efforts that implement region-specific PFTs, with the exception of modeling efforts within Europe (  et al., 2014, 2015). These studies have similarly benefited from the implementation of region-specific PFTs, which improve the representation of fluxes and the climate responses of vegetation. Unlike our analysis, which uses prescribed vegetation cover to focus on PFT's impacts on fluxes, other analyses have implemented region-specific PFTs to better dynamically represent the distribution of vegetation at regional scales (Koca et al., 2006;Santini et al., 2014;Seiler et al., 2014). Similar to our analysis, these region-specific parameterizations were required to balance realism and parsimony in their implementation of PFTs. Those which implement broader categories of region-specific PFTs similar to ours do so to address biases or specific patterns of vegetation on the landscape (Koca et al., 2006;Santini et al., 2014;Seiler et al., 2014Seiler et al., , 2015. Implementing a selection of species-specific PFTs, on the other hand, requires balancing the use of parameters tailored to broad groups of vegetation (i.e., groups of trees with different shade-tolerances) and parameters for which species-specific data is available (Koca et al., 2006). Further model evaluation and meta-analysis will help determine if global LSMs will see a similar benefit from these region-specific PFTs and knowledge transfer from regional models.

Will Further Regional Parameterization Improve Performance?
While there is significant diversity in tree genera across Canada, data quantifying how this diversity translates into differences in traits and plant function is limited (Beaudoin et al., 2018;Fisher et al., 2018;Iversen & McCormack, 2021;Iversen et al., 2017;Kattge et al., 2020). Additional region-specific PFTs need to be well-informed, ideally by field data, to balance realism and parsimony (Anderegg et al., 2022;Prentice et al., 2015). The addition of five PFTs brings the model runs within the range of available observation-based estimates provided by the AGB, LAI, and GPP reference data (Figures 6 and 7). As a result, further improvements in model performance against one data set are likely to degrade performance against another.
The GPP reference data sets have high benchmark scores but are relatively clustered, possibly due to the similar underlying data sets used to create and validate them (Jung et al., 2019;Li & Xiao, 2019;Liang et al., 2021;Y. Zhang et al., 2017). Hybrid-14PFT exceeds the GPP benchmark scores ( Figure 6). The improvements in simulated GPP, especially for latitudes <60°N for three-quarters of the data sets, come at the expense of performance versus GLASS which has generally higher GPP (Figures 5 and 7c). The AGB data sets have lower benchmark scores and vary more in their estimates possibly due to the diversity of methodologies and underlying data used to create them (Avitabile et al., 2016;Gillis et al., 2005;Huang et al., 2021;Santoro et al., 2015;Schepaschenko et al., 2019;Xue et al., 2017;Y. Zhang & Liang, 2020). Hybrid-14PFT exceeds the AGB benchmark scores in the majority of cases ( Figure 6). The improvements in simulated AGB against 3/5 of the reference data sets, likely come at the expense of performance versus the Zhang and spatially limited FOSXue which have higher average AGB (Figures 5 and 7c).
For LAI, Hybrid-14PFT improves slightly against MODIS and AVHRR at the expense of performance against Copernicus (Figures 5 and 7). Hybrid-14PFT approaches but does not yet meet, the benchmark scores for these data sets ( Figure 6). There is disagreement between MODIS/AVHRR, which are derived using similar methods, and Copernicus LAI, which employs additional filtering and correction at high latitudes (Claverie et al., 2016; Myneni   Verger et al., 2015Verger et al., , 2016. The positive LAI biases here are similar to those observed by Seiler et al. (2022), but are difficult to interpret given the disagreement between the individual LAI reference data sets. For CSOIL there is disagreement between the reference data due to differences in the extent to which peatlands, river floodplains, and permafrost soils are represented (Seiler et al., 2022;Tifafi et al., 2018) (Figures 6-8).
These processes are likewise not represented within the CLASSIC framework used in our study. As a result, Hybrid-14PFT falls close to the values for CSOIL HWSD and exceeds the benchmark score for that data set. The benchmark scores for CSOIL are relatively low and above 50°N we observe large differences in SG250m and HWSD CSOIL (Figures 3b and 6-8). SG250m and HWSD CSOIL both use spatially explicit estimates of soil C concentration and bulk density to calculate soil C stocks. The broader differences between these two data sets can be attributed to much higher soil C concentrations above 50°N in SG250m (Tifafi et al., 2018). Further field data collection, data-driven modeling, and more detailed data set intercomparison could help reconcile and improve these CSOIL reference data sets. For both CSOIL and AGB, the reference data sets utilized are snapshots in time, rather than monthly time series. As a result, we are unable to evaluate the model's seasonal and inter-annual performance (i.e., S rmse , S phase , and S iav ). The processes regulating the CSOIL and AGB pools are likely less responsive at monthly to annual time scales when compared to fluxes like GPP so we instead focus on their size (S bias ) and spatial distribution (S dist ) (Seiler et al., 2021(Seiler et al., , 2022. CSOIL and AGB data sets with greater temporal resolution could yield insight into the model's performance for these variables at annual and seasonal timescales and allow us to assess historical CSOIL and AGB trends. Ultimately, efforts to make field data for model parameterization more widely available and to create more accurate reference data sets are key for further regional parameterization (Kattge et al., 2020;Kyker-Snowman et al., 2021;Seiler et al., 2021Seiler et al., , 2022. Our results highlight areas in which further work could improve model realism and performance in the Canadian domain. First, disturbance processes (i.e., fire, harvest, and insect damage) have significant impacts on the net C balance of Canada's forests (Chaste et al., 2017;Giglio et al., 2013;Giles-Hansen & Wei, 2022;Ju & Chen, 2008;Kurz et al., 2008Kurz et al., , 2009Landry et al., 2016;Weber & Flannigan, 1997;White et al., 2017). Harvest affected 3% of Canada's land mass from 1985 to 2010, and fire affected 7% of Canada's land mass from 1985 to 2010 (White et al., 2017). Insect damage which often does not completely kill and replace stands affected 25% of Canada's land mass from 1990 to 2010 (CCFM: National Forestry Database, 2022). These three processes are underrepresented and biased in CLASSIC simulations for Canada (Figures 6 and 8). The absence of these disturbance processes likely contributes to the remaining positive AGB, GPP, and LAI biases. Second, despite the large uncertainty in the CSOIL reference data, the largest absolute CSOIL biases are in peatlands (i.e., the Hudson Bay Lowlands) and tundra ( Figure 8). These CSOIL biases mirror those observed in other TRENDY models and can likely be improved by explicitly representing peatland, river floodplains, permafrost C, and yedoma (Melton et al., 2019;Seiler et al., 2022;Wu et al., 2016). Future efforts to incorporate disturbance and high latitude soil C processes within CLASSIC in Canada will improve its representation of these globally important soil C pools and Canada's terrestrial C cycle more broadly.

Conclusions
Canadian ecosystems are critical components of the global carbon cycle which are responding to unprecedented climate change. We developed the first parameterization of a process-based LSM tailored to Canada. We demonstrate that region-specific land cover products and region-specific PFTs improve CLASSICs' performance against independent reference data. Our model evaluations show that future work focused on incorporating a parameterization for boreal disturbance processes (i.e., fire and harvest) and explicitly representing peatlands and permafrost soils are important next steps in tailoring CLASSIC for optimal performance over Canada with potential improvements for other boreal regions. We argue that developing further region-specific land cover products, well-informed PFT parameterizations, and more detailed reference data sets will facilitate improvements to the representation of the terrestrial C cycle in regional and global LSMs. Ultimately this is an important step toward a comprehensive process-based assessment of Canada's terrestrial C cycle and understanding the response of Canada's net C balance to climate change.