Improved Near‐Surface Continental Climate in IPSL‐CM6A‐LR by Combined Evolutions of Atmospheric and Land Surface Physics

This work is motivated by the identification of the land‐atmosphere interactions as one of the key sources of uncertainty in climate change simulations. It documents new developments in related processes, namely, boundary layer/convection/clouds parameterizations and land surface parameterization in the Earth System Model of the Institut Pierre Simon Laplace (IPSL). Simulations forced by prescribed oceanic conditions are produced with different combinations of atmospheric and land surface parameterizations. They are used to explore the sensitivity to the atmospheric physics and/or soil physics of major biases in the near surface variables over continents, the energy and moisture coupling established at the soil/atmosphere interface in not too wet (energy limited) and not too dry (moisture limited) soil moisture regions also known as transition or “hot‐spot” regions, the river runoff at the outlet of major rivers.


Introduction
Earth's climate and its evolution are determined by interactions between the ocean, the atmosphere, ice caps, and land surfaces under the external solar forcing and the atmospheric composition. For these reasons, numerical models need to couple all these components of the system when they are used for running climate projections to anticipate the impacts of climate change. In this general framework, the land surface-atmosphere interactions strongly modulate the regional climate (e.g., Seneviratne et al., 2010); they particularly control climate hazards, and their consequences (Jaeger & Seneviratne, 2011;Miralles et al., 2014) impact the freshwater discharge into the oceans and, in turn, the thermohaline circulation (Peterson et al., 2002). They rely on complex overlap of multiple land-atmosphere feedback processes and depend on the representation of the interactions between the soil moisture and the boundary layer through the partition of the available energy at the surface in sensible and latent heat, the impact on radiation (Betts et al., 1996;Eltahir, 1998;Schär et al., 1999), the representation of the convection and its sensitivity to subgrid scale heterogeneities (Guillod et al., 2015;Taylor et al., 2011Taylor et al., , 2012, the representation of soil moisture, and the possible interplay with the atmospheric circulation (Boé, 2013;Hohenegger & Stevens, 2018). The complexity and the variety of processes involved make the land-atmosphere interactions one of the key sources of uncertainty in climate change simulations at regional scale. As an example, analyses of CMIP5 models revealed considerable spread in the ability of models to reproduce observed correlation between precipitation and soil moisture in the tropics (Williams et al., 2012).
The various phases of the Climate Model Intercomparison Project (CMIP) give important milestones to measure the progress and the remaining open questions concerning the climate modeling and in particular the parameterization of the land surface-atmosphere interactions. Between Phases 5 and 6 of CMIP, significant efforts have been devoted to improving the atmospheric , the land surface, and hydrological components of the Earth System Model of Institut Pierre Simon Laplace (IPSL) and to tuning the Climate Model (CM). When the fully coupled model is used, compensating errors can hide the role played by the subgrid scale processes that regulate a large part of the exchanges of energy, water, and matter between the surface and the free atmosphere or constrain the related parameterizations to work in unrealistic conditions (e.g., Diallo et al., 2017;Roehrig et al., 2013). On the contrary, working with individual components impedes the activation of important couplings and feedbacks. Configurations with prescribed sea surface temperature (SST) and sea ice concentration (SIC) allow us to overcome these difficulties. These configurations are referred to as Atmospheric Model Intercomparison Project (AMIP) configurations. Together with an intermediate configuration, such as a nudged configuration in which the large-scale dynamics (i.e., the zonal and meridional wind components) is nudged towards reanalysis Coindreau et al., 2007;Diallo et al., 2017), these AMIP-like configurations are used here to assess how realistic the continental surface-atmosphere interactions simulated by the IPSL-CM are and to help interpret the fully coupled simulations produced with the atmospheric (LMDZ) and land surface (ORCHIDEE) components of the IPSL-CM.
The focus of the present analysis is put on the processes which control the energy and moisture exchange at the surface. Main features of the near surface climate over continents in the historical simulations done with the full IPSL-CM are documented in a companion paper (Boucher et al., 2020), where the biogeochemical aspects of the land surface-atmosphere coupling are considered.
In section 2, the major changes of LMDZ, ORCHIDEE, and their interface are summarized, and the simulations used for the analyses are described. The evolution of the main biases in near-surface variables since CMIP5 is analyzed in the third section, and sensitivity studies are used to identify the source of these biases. The role of the parameterizations and of the adjustment or tuning  is discussed. In the fourth section, the impact of the modified parameterizations on essential variables of the coupling (radiation, evaporation, precipitation, and surface soil moisture) is discussed for hotspot regions (Koster et al., 2004) such as Central North America and a region in the Sahel where the land surface coupling is strong but largely model-dependent (Boé & Terray, 2008;Hohenegger et al., 2009). The fourth section also deals with river discharge and its response to precipitation. It is a central target for a CM for several reasons: One of them is that the freshwater discharge into the Arctic Ocean from the boreal rivers affects the global climate system by impacting the thermohaline circulation (Peterson et al., 2002). It is also a valuable source of information for utilization of global water resources and prevention of floods and drought which can both increase the risk for populations in the context of climate change (Arnell & Gosling, 2013;Schewe et al., 2014). For some basins, it is possible to compare the results with observations which provides an assessment of the hydrological cycle over major watersheds. In the last section, the results are summarized, and directions for further improvements are presented.

The Atmospheric Model
LMDZ is the atmospheric General Circulation Model (GCM) that has been developed for about 30 years at the Laboratoire de Météorologie Dynamique (LMD). The versions of LMDZ used for Phases 5 and 6 (hereafter called 6A) of CMIP together with the configuration adopted are described in Hourdin et al. (2006) and Hourdin et al. (2020). The main contribution of IPSL to CMIP5 was done with a package hereafter called AP and referred to as "IPSL-CM5A" in the CMIP5 database. Version 6A is an improved version of the "new physics" package, a preliminary version of which has been tested for CMIP5  and is referred to as "IPSL-CM5B-LR" in the CMIP5 database. The changes from the AP to the "new physics" version are linked to a complete rethinking of the parameterizations of turbulence, convection, and clouds and are described in Hourdin et al. (2013). The main model modifications between the "new physics" and 6A are the revision of the eddy diffusion Yamada (1983) 1.5 order turbulent scheme already implemented in the new physics, the introduction of a stochastic triggering designed to make the frequency of occurrence of new convective systems within a mesh aware of the grid cell size , a modification of the thermal plume model for the representation of stratocumulus clouds (Hourdin et al., 2019), the introduction of the latent heat release associated with water freezing (not accounted for so far), and a new parameterization of non orographic gravity waves targeting the representation of the quasi-biennial oscillation (QBO). These changes were accompanied by a significant refinement of the vertical grid, both for the QBO issue and for a better representation of boundary layer clouds. The radiative codes in LMDZ are inherited from the ECMWF weather forecast model. In version AP a "wide band" spectral model was used both in the thermal infrared and in the shortwave (SW) spectrum (Morcrette, 1991). In version 6A, the infrared part was replaced by the RRTM code (Mlawer et al., 1997), based on a k-correlated scheme with 16 spectral bands. For the SW radiation the number of spectral intervals increased from 2 to 6 in order to better distinguish near infrared, visible, and ultraviolet radiation.
For the setting of the 6A version, particular attention was paid to the very stable boundary layers that occur over the ice sheet plateaus, sea ice, and boreal lands. Such boundary layers can experience very weak and intermittent turbulence even close to the ground surface, pushing the current state-of-the-art subgrid mixing parameterizations and underlying physical assumptions to their limits and even beyond. Together with the refinement of the vertical grid, the computation of the eddy diffusion in the Yamada (1983) scheme was revised. Minimum threshold values of the mixing length and of the stability functions of the eddy diffusion coefficient have been significantly decreased (Table 2) to allow for a cutoff of turbulence at a few meters above the surface in the very stable conditions encountered over the Antarctic Plateau and to obtain more realistic sharp vertical gradients in very stable atmospheric boundary layers (ABLs) . Such threshold values are often set in operational numerical models to compensate for the nonrepresentation of subgrid mixing processes and to prevent excessive near-surface cooling over land in winter (e.g., Sandu et al., 2013). The sensitivity of the continental temperature at seasonal and diurnal scale to the values of the thresholds will be discussed in sections 3.2 and 3.4. Moreover, a new numerical treatment of the Turbulent Kinetic Energy (TKE) equation in the new scheme prevents an artificial cutoff of the turbulence at standard time step values that could occur in previous model versions even at moderate stability Vignon, 2017).
The need to remove thresholds in the turbulence scheme to properly model the stable ABL over the Antarctic Plateau also raises the need to parameterize more explicitly the additional sources of mixing in other regions of the globe such as orography-induced small scales gravity-wave drag (Steeneveld et al., 2008) or the drag induced by vertical obstacles penetrating the boundary layers such as trees (Masson & Seity, 2009;Nepf, 1999). Due to the refinement of the vertical grid of the model several layers can intersect high vegetation. The loss of large-scale kinetic energy due to these drags is converted into TKE. The evaluation of the orography-induced gravity-wave drag is based on the scheme developed by Lott (1999), while the drag due to high vegetation is set proportional to the vegetation fraction, which penetrates the boundary layer. The two subgrid scale mixing processes generate TKE, which is accounted for in the prognostic equation (see Appendix A for details). The impact of these new developments on near-surface atmospheric variables are illustrated in section 3.4.

The Land Surface Model
The land surface is described by the ORCHIDEE model v2.0. The ORCHIDEE model v2.0 computes primarily the fluxes of energy, water, and carbon that are exchanged between the different soil and plant reservoirs and the exchange of these fluxes with the atmosphere. In addition, it computes the stocks of water and carbon in the different soil and plant reservoirs and the energy stored in the different soil and snow layers. Model state variables are prognostic, including the Leaf Area Index (LAI), as they are updated at each time step after the calculation of the fluxes between all reservoirs. The module computing dynamically the LAI, the vegetation albedo, and the soil water stress function applied to transpiration is activated for all CMIP experiments, except for the HighResMIP one (see section 2.4) that uses prescribed values. The vegetation properties are defined by plant functional types (PFTs), and their fraction within each grid cell is globally set from land cover maps that were derived specifically for the CMIP6 simulations (Lurton et al., 2020). These maps combine the historical maps from the land use harmonization database (LUH2v2h, Hurtt et al., 2011) and the maps derived from satellite observations (Bontemps et al., 2015). See https://orchidas. lsce.ipsl.fr/dev/lccci/, for more information. The water and energy budgets are computed at the same timestep as the atmospheric physics  using classical soil-vegetation-atmosphere transfer (SVAT) parameterizations. The most relevant modification since the version used for CMIP5 is related to the soil hydrology, the snow scheme, and the background albedo. The 2-layer conceptual parameterization (hereafter referred to as "Choi", (Ducoudré et al., 1993) used for CMIP5 is a double bucket model that has an upper layer with a varying depth that can appear at the surface after a rainfall event to deal with short-time processes and disappears after dry spells (Manabe, 1969). "Choi" refers to the scheme that Choisnel developed and tested for cultivated area over France. Laval (1988) showed that this model improved the sensible and latent heat flux computation on the original bucket model when introduced into the LMD Atmospheric GCM. In the version used for CMIP6, the vertical water transport is described using Richard's equation (De Rosnay et al., 2002;d'Orgeval et al., 2008) discretized with 11 layers. The layer thickness increases downwards and is doubled between each consecutive layer. The soil moisture column is active over 2 m; a free drainage condition is imposed at the bottom of the reservoir. This scheme hereafter called ctrl, as it is now the reference version for IPSL-CM, is sometimes referred to as the "11-layer" ORCHIDEE scheme. The potential of improvement of an early version of this scheme coupled with the AP and "new physics" versions of LMDZ has been tested in Cheruy et al. (2013) and Campoy et al. (2013). The soil thermodynamics and in particular the soil thermal properties have been revised by Wang et al. (2016). They have a significant impact on the surface temperature and its high frequency variability in all regions except for the moist regions (Cheruy et al., 2017). The vertical discretization for temperature is now identical to that adopted for water, with a minimum soil depth increased to 10 m (and even 90 m when the soil freezing is accounted for) so that the condition of zero flux at the bottom can be checked globally and annually. The soil properties (hydraulic and thermal) depend on soil moisture and soil texture, with three possible classes (sandy loam, loam, and clay loam). The dominant soil texture is assigned to each grid cell, based on the 1°soil texture map of Zobler (1986). The soil heat capacity is parameterized as a function of the heat capacity of the dry soil and the liquid water profile and when necessary the ice profile. The soil freezing is allowed and diagnosed in each soil layer following a scheme proposed by Gouttevin et al. (2012), but the latent heat release/consumption associated with water freezing/thawing is not accounted for. The freezing state of the soil mainly impacts the computation of soil thermal and hydraulic properties, reducing for instance the water infiltration capacity at soil surface. Wang et al. (2013) replaced the snow scheme of Chalita and Le Treut (1994) by a three-layer scheme of intermediate complexity largely inspired by that proposed by Boone and Etchevers (2001). A routing module Polcher, 2003) transforms the total runoff in each subbasin into river discharge through the ocean. This routing scheme relies on a cascade of linear reservoirs along the river network (stream reservoirs), complemented in each grid cell by two local reservoirs, to account for the delay between surface runoff and drainage, on the one hand, and overland and groundwater flow to the stream reservoir, on the other hand. When using Choi, which does not separate total runoff into surface runoff and drainage, an arbitrary partitioning is imposed, with 5% feeding the fast reservoir and 95% feeding the slow reservoir (Guimberteau et al., 2014). In the multilayer version of ORCHIDEE, evaporation from bare soil following a supply and demand pattern that is controlled by the moisture present in the surface layers of the soil (the four soil layers of the model closest to the surface), which evaporates at the potential rate if the soil moisture supply meets the demand.
The continental ice-covered surfaces (ice sheets and glaciers) are not included in ORCHIDEE, but they are treated in a specific module within LMDZ. Momentum and heat roughness heights as well as visible and near infrared albedos are set to constant values representative of snow conditions over the Antarctic Plateau (Vignon et al., 2018). The heat transfer in the snow and ice is parameterized as a conductive process with a fixed thermal inertia (Hourdin, 1992). The vertical grid is made of 11 vertical levels to represent the e-folding damping of thermal waves with typical periods from 1,800 seconds to 240 years. The value of the snow thermal inertia was calibrated to obtain realistic surface temperature and diurnal cycle amplitudes in Antarctica .
Le Quéré et al. (2018) have recently used a version of ORCHIDEE (referred to as Orchidee-Trunk), which is simular to the version used for CMIP6 in an intercomparison project focussing on the carbon and water fluxes where 15 other land surface models (LSMs) were involved. The skill scores obtained by ORCHIDEE are among the highest for most of the variables considered in this study and in particular for evapotranspiration, LAI, and runoff (see table B2 in Le Quéré et al., 2018), which are directly involved in our study.

The Coupling with the Surface
In the surface layer the boundary layer model uses Monin-Obukhov theory and bulk formulations proposed by Louis et al. (1982) to parameterize turbulent fluxes. Several modifications were made in the representation of the surface layer of LMDZ as well. First, and consistently with the changes done in the boundary layer to allow strong decoupling in stable atmospheres, the so-called "long-tail" stability functions from Louis et al. (1982) that artificially enhance the surface turbulent fluxes in stable conditions were replaced by more realistic "short-tail" functions from King et al. (2001). This was shown to significantly improve the representation of surface temperature on the very flat ice sheet of the Antarctic plateau.
A second important change is related to the computation of surface roughness height z 0 . At the surface itself, heat and humidity transfer are dominated by molecular diffusion, which is less efficient than the momentum transfer due to the pressure forces that are related to the geometry of the roughness elements of the surface (Garratt & Hicks, 1973). For these reasons the roughness heights for the momentum are currently much higher than that of heat or humidity. While a unique value was used in former versions for all the model state variables, a different value is now used for horizontal momentum z 0m and thermodynamical variables z 0h or tracers z 0a for all individual type of subsurface (land, sea, sea-ice, continental-ice). For each PFT ORCHIDEE used a prescribed value for the roughness height for heat and moisture independent of the development of the vegetation over continents. For a grid point composed of different types of vegetation, an effective surface roughness is calculated based on the flux conservation over the grid point. This value was also used for z 0m in LMDZ. Measurement campaigns often suggest that the roughness height for heat should be one-tenth of that for momentum for homogeneous surfaces and even less for heterogeneous surfaces (Malhi, 1996). Some studies propose that over vegetated areas the roughness height can be parameterized as a function of the LAI. This is the case for the model proposed by Massman (1999) and tested by Su et al. (2001), which has been implemented in ORCHIDEE v2.0. In forced mode, the dynamic roughness heights computed for each PFT as a function of the LAI help reduce latent heat calculated in winter on temperate sites, in good agreement with multiannual Fluxnet measurements (Figure S1, (https://fluxnet.fluxdata.org/data/la-thuile-dataset/)). Still in forced mode, the dynamic roughness heights impact the river discharge at the scale of individual watersheds with significant improvements for the Danube and the Mississippi watersheds (not shown). The impact of activating the dynamical roughness height in coupled simulations is limited for the considered space and time scales (see section 3.4), but the option is activated for all CMIP6 simulations.
Another important change in ORCHIDEE is related to the individual albedo of the bare soil and of the 14 vegetated PFT, which is now optimized with respect to MODIS observations. This calculation of the individual albedo leads to significant improvements, especially over deserts and semiarid areas where the albedo was significantly underestimated. The improvement is illustrated by a comparison of the time series of the albedo simulated with the CMIP5 and the CMIP6 versions of the model and measured at the sites in the Sahel (Figure 1). The simulation is forced to follow the synoptic variability by relaxing the large-scale circulation toward meteorological analyses, which allows direct comparison of the time series of the observations and the simulations Coindreau et al., 2007).
The near-surface (i.e., 2 m) temperature (CMIP6 variable tas, which is one of the most analyzed variables in CMs especially in relation to climate change impacts) is diagnosed through a procedure based on the Monin-Obukhov theory (Hess, 1995). The procedure involves both surface and first model-level variables. In situations when turbulence is very weak and the atmospheric layer above the surface is dry but the surface soil moisture is far from the residual moisture, the procedure occasionally fails, leading to nonphysical values for one timestep. When the procedure fails screen level temperature can reach 450 K and screen level specific humidity becomes negative (see Appendix B). Since the problem occurs rarely, and when it does occur it is only during one timestep in the day (very exceptionally it can occur during two or more timesteps in the day), it was undetected in the final version used for the production of the CMIP6. The problem affects the maximum daily near-surface temperature, the minimum daily near-surface relative humidity, and marginally the daily averages of these quantities. It occurs approximately 1,700 times (respectively, 2,200 times) in a simulated year of the AMIP (respectively, PreIndustrial Control [piControl], Eyring et al., 2016) experiments, which is very rare compared to the (365 × 144 × 142 × 96) times that the calculation is performed in one year of simulation. The CMIP6 experiments presented in this paper have not been rerun due to the time constraint imposed by the CMIP exercise; however, an a posteriori correction method has been developed. All the CMIP6 data that have been affected by this problem have been either unpublished or corrected a posteriori and republished.
The a posteriori correction method applied to the published data is given in Appendix B with an estimation of the associated uncertainties, which is of the order of several tenths of degrees for the daily values. Due to the low value of the reconstruction errors for the monthly mean values (30 times less than the reconstruction errors of the daily values), it was decided not to make a correction to the monthly values. The great advantage is that the monthly tas values are absolutely consistent across all CMIP6 experiments, regardless of whether the daily values have been corrected or not. According to these investigations we are confident that all the published values can be used safely for climate analysis.

Setup of the Simulations
To document the impact of the changes described in the previous sections, simulations forced by observed SST and SIC are produced by combining final versions of atmospheric physics (AP and 6A) and of the soil hydrology (Choi and ctrl), namely, APChoi (corresponding to the IPSL-CM5A in the CMIP5 database), APctrl, 6AChoi, and 6Actrl (corresponding to the IPSL-CM6A in the CMIP6 database) ( Table 1). A monthly mean climatology of SST and SIC calculated over the years 1978-2008 is used for the simulations in order to minimize the impact of the interannual variability in the evaluation. The 6Actrl experiment is also compared with the results of the AMIP experiment for which a 20-member ensemble has been produced and is published in the CMIP6 database. The impact of the new developments is documented thanks to an additional set of sensitivity experiments with the 6A physics, where the new features of the LSM and the ABL are individually tested ( Table 2). The horizontal  Deactivating dynamical roughness heights (Su et al., 2001) and using prescribed values with z 0m = z 0h 6Arsol Activating resistance to bare soil evaporation 6Aric No increased mixing in the stable PBL: ric = 0.20 (reference = 0.18) and lmximin = 0 6Aric83lmx Artificially increased mixing in the stable PBL:   . We also document simulations performed with a much finer grid of 50 km for the HighResMIP part of the CMIP6 exercise (Haarsma et al., 2016). Comparison of these low and high resolution versions allows us to distinguish the part of the model bias linked to the coarse resolution from that more fundamentally related to the model physical content. Nudged simulations in which the large-scale wind fields (zonal and meridional wind components) are relaxed towards the ERA-Interim reanalyzed winds (ERA-I, Table 3) with a time constant of 3 hours are also used and help assess a possible contribution from large-scale circulation deficiencies to the continental bias. Based on previous experience, it is known that a time constant of several hours (3-12) is short enough to constrain the large scale circulation and long enough for the physical parameterizations to fully operate (for wind nudging at least). More details on this approach can be found in Diallo et al. (2017). The first 3 years of all experiments, corresponding to the spin-up time of the hydrological model, are disregarded in the analysis.

Reference Data Sets
The sets of global gridded data used as a reference to evaluate the sensitivity experiments are listed in Table  3. They consist of a site-observations upscaled products for evaporation (Jung et al., 2011), satellite-based land evaporation, and surface soil moisture derived through data assimilation processes in the Global Land Evaporation Amsterdam Model (GLEAM) (Martens et al., 2017) and the ESA-CCI blended active and passive microwave retrieval of surface soil moisture (Dorigo et al., 2017), CERES-EBAF for surface SW radiation (Kato et al., 2013), and the Global Precipitation Climatology Project (GPCP) monthly product resulting from an integration of various satellite data sets and a gauge measurements analysis over land (Adler et al., 2003) for the precipitation. The total column integrated water vapor is evaluated using the reanalysis and extension of the NASA Water Vapor Project (NVAP) data set which comprises a combination of radiosonde observations, Television and Infrared Operational Satellite (TIROS) Operational Vertical Sounders (TOVS), and Special Sensor Microwave/Imager (SSM/I) data sets (Vonder Haar et al., 2012). The river discharges are extracted from the Global Runoff Data Center (GRDC) database (Milliman & Farnsworth, 2011). The Snow Cover Extent (SCE) is extracted from the output from the Interactive Multisensor Snow and Ice Mapping System (IMS) at the National Ice Center (NIC) processed at Rutgers University and included in the NOAA Climate Data Record (CDR) of Northern Hemisphere (NH) Snow Cover Extent. For the minimum and maximum daily temperature, we used the Climatic Research Unit (CRU) Time-Series (TS) version 4.01 of high-resolution gridded data (Harris et al., 2014).
The observations cover a period of at least 10 years compatible with the SST and SIC climatology used to force the model. We suggest that not considering the exact same periods for the simulations and the observations only has a minor impact on the results given that the model internal variability is damped due to the use of a climatological SST and SIC.
In the supplementary material, ERA-5 data (https://cds.climate.copernicus.eu/cdsapp#!/home) are used as a benchmark in addition to ERA-Interim data to evaluate the bias in the air-temperature (supporting information Figure S2).

Impact of the Revision of the Eddy Diffusion Parameterization
The improvements resulting from the revision of the turbulent scheme between the AP version and the 6A version of LMDZ are illustrated in Figure 2, showing the mean seasonal cycle of the air temperature for the first three atmospheric levels of LMDZ (version AP and 6A) together with the measurements recorded at six levels on the 45 m height mast at Dome C (75.1S,123.3E), Antarctic Plateau. For version AP (Figure 2a) an overall winter-time warm bias (up to 10 K) in the surface layer is noticeable. This version was also unable to properly reproduce the dynamical behavior of the very stable Antarctic boundary layers (Vignon et al., 2018), and further analysis of the vertical temperature profile in the first few hundred meters above Dome C revealed a significant underestimation of the climatological temperature inversion (not shown). For version 6A, both the near-surface temperature and its vertical gradient are in good agreement with observations from the surface up to the top of the mast ( Figure 2). The dynamics of very stable boundary layers is also much better simulated (Vignon et al., 2018) than in version AP. Figure 3 shows how version AP and 6A perform in a single-column configuration used to simulate the test case of DIurnal land-atmosphere Coupling Experiment (DICE) (Kansas, latitude 37.65°N, longitude 263.265°E) far from the ice sheets regions. The simulations cover a period of 3 days and three nights and the last night which is stable, and cloudfree is well suited to test the boundary layer scheme under stable condition. The hydrological scheme is bypassed by prescribing the ratio β of evaporation to potential evaporation and the surface thermal inertia to a value adjusted to the DICE case during the full run (Aït-Mesbah et al., 2015). For night time the near-surface temperature inversion is much stronger in 6A than for the AP run ( Figure 3a). The sensible heat flux is reduced with the 6A version and closer to the observations than the AP version (Figure 3b), which produced a too strong vertical mixing.

Relative Impact of Atmospheric and Land Surface Components on the Biases of Near-Surface Variables
Most of the biases in evaporation, 2 m temperature, SW downward radiation at the surface, surface albedo, precipitation, and total precipitable water can be analyzed by inspecting zonal mean variables over the continents ( Figure 4). To further comment regional aspects, maps of mean annual, JJA and DJF bias in 2 m temperature are depicted in Figure 5. The corresponding bias maps are displayed in the supplementary  Figure S7). The maps corresponding to the first member of the AMIP ensemble are also plotted, in order to confirm the representativity of the 6Actrl experiment with respect to the AMIP published data. We verified that the features discussed hereafter are shared by the other members of the AMIP ensemble. The statistical significance of the changes caused by the new land surface and atmospheric physics is assessed geographically for each studied variable in Figures 6 (6Actrl-6AChoi) and S8 (6Actrl-APctrl), with very similar results to the differences maps for APctrl-APChoi and 6AChoi-APChoi (not shown).
A clear improvement of the CMIP6 reference configuration 6Actrl is related to the radiation budget. This improvement is illustrated by the reduction of the bias in the downward SW radiation at the surface in 6Actrl (Figures 4d and 4j) and can be attributed to the improvement of the representation of the Cloud Radiative Effect (CRE) coming from the modification in the parameterizations and to the improved tuning of the model free parameters targeting the CRE . Consistently with the overall reduction in the SW radiation bias ( Figure S3) and in the evaporation bias especially over the continental United States ( Figure S4) the strong warm bias over the midlatitudes in summer ( Figure 5) that was shared by many models participating in CMIP5 (Cheruy et al., 2014) is reduced in the 6Actrl configuration used for CMIP6. Over the continental United States, Al-Yaari et al. (2019) also showed that the general agreement between areas of strong warm bias and areas of strong precipitation and soil moisture deficits is good. In the 6Actrl configuration the precipitation deficit is also significantly reduced ( Figure S9) and the surface soil moisture is in better agreement with the observations (see section 4.1). In connection with the developments on the vertical diffusion scheme, the warm bias that extended over a large part of the polar and boreal regions in winter is reduced or even replaced by a cold bias over part of the Arctic continent and Ocean, Greenland, and Antartica ( Figure 5). The cold bias is probably overestimated over Greenland, the Artic Ocean, and Antartica due to a warm bias diagnosed in ERA-I (Jakobson et al., 2012;Reeves Eyre & Zeng, 2017;Vignon et al., 2018). When using ERA-5, as reference data set instead of ERA-I, the bias over Greenland is  . Zonal mean bias in December-January-February (DJF) and June-July-August (JJA) over continents for the evaporation (a, g), the precipitation (b, h), the air temperature (c, i), the downward shortwave (SW) radiation (d, j), the surface albedo (e, k), the total precipitable water (f, l) in 6Actrl (thick black curve), 6AChoi (dashed black curve), APctrl (thick red curve), and APChoi (dashed red curve). For precipitation the blue curve corresponds to the absolute value of the observations (y-axis on the right side). The references are described in Table 3.

10.1029/2019MS002005
Journal of Advances in Modeling Earth Systems reduced in DJF (not shown). Over the Arctic continent Lindsay et al. (2014) report that ERA-I has a bias of less than 0.5 K compared to the observations.
As a result of the new snow scheme and of the optimization with respect to the MODIS observations, the surface albedo in the ctrl model is improved in most regions in winter (Figures 4e and S5) and over deserts (notably the Sahara) over the year. The new snow scheme improves the snow cover which was significantly underestimated with Choi ( Figure 7). With the exception of the surface albedo and to a lesser extent the evaporation, the overall structure of the bias is only marginally sensitive to the land surface scheme whose impact is mostly relevant at the regional scale ( Figures 5, 6, S3, S4, S5, S6, and S7).
When considering the continents globally, Choi and ctrl both overestimate the evaporation (especially in winter) regardless of the atmospheric model with which it is coupled. This overestimation is slightly less Figure 5. Mean multiannual bias in 2 m temperature (tas) for December-January-February (first column) and for June-July-August (middle column ) and for the full year (last column) in the APChoi (first row), APctrl (second row), 6AChoi (third row), 6Actrl (fourth row), and AMIP configurations (fifth row). The reference is given by ERAI reanalysis averaged for the 1979-2014 period.

10.1029/2019MS002005
Journal of Advances in Modeling Earth Systems Figure 6. Significance of the multiannual differences between configurations 6Actrl and 6Achoi, with gray coloring where the difference is not statistically significant based on Student's t test (with a p value < 0.05). The first five maps show the yearly mean differences for the air temperature, evaporation, precipitation, downward SW radiation at the surface, surface albedo, and the bottom right map shows the winter (DJF) difference in surface albedo. The displayed means and standard deviations are calculated over the whole globe including non significant points.  (Table 4), albeit this result is modulated at regional scale ( Figure 6). Investigating the minimum and maximum daily temperature shows a widespread warm bias of daily minimum temperature over the midlatitude (Figure 8). This bias is present over the whole year for the AP physical package used for CMIP5, and only in JJA for the 6A package used for CMIP6, it is very marginally sensitive to the land surface scheme. This is consistent with the reduction of the turbulent mixing in the PBL for the stable boundary layers obtained with the 6A atmospheric physics and with the results of Wei et al. (2017), which suggested that a bias in the simulated PBL mixing could very likely contribute to the temperature bias common to most of the models that participated to CMIP5 with AMIP experiments. The moist atmospheric bias over the midlatitudes in JJA (Figure 4l) could contribute to the warm bias of daily minimum temperature by minimizing the nocturnal radiative cooling, but further investigation is needed to explain this bias, which is shared by other models participating in CMIP6.

Atmospheric Process Sensitivity to the LSM Choice
The above analysis has shown that, for most variables and skills considered, the changes due to the atmospheric physics are larger and more broadly significant than the ones dues to the land surface physics, as confirmed by the comparison between Figures S8 and 6, respectively. We detail here the sensitivity of the 6A atmospheric physics to the LSM choice ( Figure 6). The differences between 6AChoi and 6Actrl are statistically significant (at the 5% level) over most of the continents for all variables but precipitation. The seasonal differences (not show but for winter surface albedo) are very consistent with the yearly differences. Ctrl induces a significant cooling over an extended region going from Siberia to China (up to 3 K locally in absolute value). This cooling is accompanied with an increase in evapotranspiration, some local reductions of the downward SW radiation, but also a widespread decrease of surface albedo (mostly driven by the summer season thus by vegetation), which is probably overruled by the large increase in albedo in winter. In contrast, large land areas of the southern hemisphere exhibit a significant warming from Choi to ctrl, along with an evaporation decrease, a decrease in surface albedo, and a downward SW radiation increase. Two exceptions can be isolated to the cooling/warming response to evaporation increase/decrease. The first one is the Sahara, where air temperature is reduced with ctrl, despite significant reduction of evaporation and increase of incoming SW radiation: The reason is the substantial albedo increase in this area, like in most sparsely vegetated zones. The second exception comprises the humid equatorial areas (intertropical convergence zone), where surface air temperature decreases without any significant evaporation change: There, the main driver seems to be the reduction of incoming surface radiation, likely related to precipitation increases, although these changes are rarely significant, and mostly in JJA. Precipitation is also significantly impacted by the choice of the LSM over monsoon regions, like Western Africa in JJA and Southern Amazonia in DJF, where ctrl tends to reduce evaporation and precipitation. The few spots over tropical oceans where the change in precipitation and evaporation are significant are probably due to slight modifications of the circulation in response for instance to the temperature changes. However, the amplitude of the changes is very low with respect to the typical oceanic values in those regions. When considering the continents globally, Choi and ctrl both overestimate the evaporation (especially in winter). This overestimation is slightly less for ctrl (Table 4), albeit this result is modulated at regional scale ( Figure S4).

Sensitivity Experiments
In order to further interpret the above results we use the sensitivity experiments described in Table 2.
Sensitivity simulations to the strength of the decoupling in stable condition were performed by changing the values of the minimal mixing length (lmixmin) and of the critical Richardson number (ric) above which the stability functions of the turbulent diffusion coefficient reach their lower-bound value (see Figure 2 in Vignon et al., 2017 for details). Those two thresholds enhance the mixing and prevent the turbulence cutoff in very stable conditions (Table 2, 6Aric, 6Aric83lmx). Figure 9 shows the impact of the sensitivity Note. APChoi corresponds to the IPSL-CM5A configuration used for CMIP5, and 6Actrl corresponds to the IPSL-CM6A configuration used for CMIP6. The last line corresponds to the mean value over continents calculated with the observation described in Table 3.

Journal of Advances in Modeling Earth Systems
experiments on the zonal means in JJA and DJF. North of 60°N, the near-surface temperature is highly sensitive to these thresholds. Allowing less decoupling (Table 2, 6Ari83lmix) significantly reduces the cold bias over continental areas in winter, but it deteriorates the vertical temperature gradient over the Antartic Plateau shown in Figure 2. A further increase of the decoupling with respect to the configuration adopted for CMIP6 (Table 2, 6Aric) leads to a reduction of the winter-time minimal temperature but does not impact the night-time bias in summer for the 6A version (not shown).
As expected, the orography-induced TKE production ( Table 2, NoOro, also see Appendix A) tends to warm the midlatitude and boreal-latitude in winter which partially counterbalances the effect of the reduced vertical diffusion for the stable boundary layers (not shown). The impact of deactivating the drag induced by the vegetation penetrating the boundary layer (Table 2, NoTree) is negligible for the near-surface temperature (not shown).
A sensitivity simulation focusing on the evaporation for the ctrl model was also designed to target the bare soil evaporation, which can reach the potential rate when the moisture in the first four layers of the soil is higher than the residual moisture (Table 2, 6Arsol). It is likely that the potential rate of evaporation leads to an overestimation of evaporation when patches of soil begin to dry out in the grid cell. To overcome this defect a resistance to bare soil evaporation can be added to the aerodynamic resistance. This approach has been implemented in ORCHIDEE using the formulation proposed by Sellers et al. (1986). The activation of this option reduces the evaporation (Figures 9a and 9g). However, amplifying the SW radiation bias at the surface over the midlatitude north (Figures 9d and 9j) and reducing the evaporative cooling results in  . Zonal mean bias in DJF and JJA over the continents for the evaporation (a, g), the precipitation in percent and the absolute value of the observation (b, h), the air temperature (c, i), the downward SW radiation at the surface(d, j), the surface albedo (k, e), the total precipitable water (f, l) in the reference experiment 6Actrl (thick black curve), in sensitivity experiments for the turbulent mixing 6Aric (thick blue curve), in 6Aric83lmx (blue thin curve), and in sensitivity experiment with a resistance to bare soil evaporation activated (thick pink curve).

Journal of Advances in Modeling Earth Systems
a strong warm bias (Figure 9c and 9i). The impact of deactivating the dynamical roughness height (Table 2, noz0Su) is detected at the regional scale on the evaporation and the temperature, but it is quite limited for the considered space and time scales (not shown).

Specific Regional Changes
Several biases mostly rely on regional features and are discussed in this section.
• Sahara: The cold bias between 15°N and 30°N in all simulations and for all seasons (Figures 4c and 4i) is mostly the signature of a cold bias over the Sahara ( Figure 5). It may be due to aerosol specification or to the failure to consider emissivity lower than the unit in the radiative transfer calculations. The cold bias is less pronounced with the AP physics because the strong positive bias in the downwelling SW radiation and the underestimation of the surface albedo compensate the excessive surface cooling due to the overestimated value of the surface emissivity. • Tibetan Plateau and High-mountain Asia: There is an overall cold bias in winter between 30°N and 40°N (Figures 4i and 5), which is particularly strong over the Tibetan plateau and High-mountain Asia where it is associated with a surface albedo bias ( Figure S5) and an overestimation of the snow fraction ( Figure 7). The albedo bias is not present in ORCHIDEE stand-alone simulations (not shown) and is lower when the old snow scheme is activated. This is consistent with a difficulty of the land-atmosphere model to melt snow leading to a too high albedo inducing a positive feedback on the temperature because of a deficit in net SW radiation at the surface. The weaker bias produced by the old snow scheme is consistent with the underestimation of the snow albedo, which was already documented by Wang et al. (2013). These results confirm that land surface atmosphere feedbacks play a significant role in this region. The temperature and the albedo biases are weaker in the HighResMIP simulations (not shown) and in nudged simulations ( Figures S10 and S11). The weakening of the bias obtained with the increase in resolution or with the wind nudging confirms that the regional circulation is an important component of the High-mountain Asia climate. A high resolution allows us to represent more realistic contrasts of the snow cover between the lowlands and the high mountains. It is also a way to better simulate the role of the orographic barrier played by the High-mountain Asia that stops the northward transport of moisture originating from the Indian subcontinent. This barrier explains the dryness of the Tibetan plateau (Krishnan et al., 2019;Ménégoz et al., 2014;Sabin et al., 2013), where an excess of moisture flux is simulated at coarse resolution, inducing a positive bias of snow cover that is enhanced by surface feedback. In the same way, by correcting the regional circulation, the nudging can reduce the positive bias of snow cover which impacts the surface albedo. • Central Asian lowlands: The more realistic representation of the snow albedo and the increased decoupling for stable boundary layers help obtain more realistic near-surface temperatures but does not eliminate the strong warm bias present in winter on the Central Asian lowlands in CMIP5 ( Figure 5, DJF). The temperature bias is further reduced when the large-scale circulation is relaxed toward meteorological analysis. The nudging reduces also the total precipitable water (not shown) that is greatly overestimated in this region. These results suggest that the large-scale dynamics contribute to the bias by a too strong moisture advection, the latter limiting radiative cooling. A residual negative bias in surface albedo ( Figure S5 DJF) can also contribute to the warm bias. In summer, the warm bias is also present, but it is mainly associated with an excess of SW radiation at the surface. • Eastern Siberia: Regardless of the model version, a strong warm bias persists in the extreme north-east of Siberia, north of the Sea of Okhotsk, and north of the Bering Sea. The bias is not present in the nudged-bywind simulations ( Figure S9), and it is less marked when the new snow scheme and the soil freezing are activated (NoSnowFreez experiment in Table 2, not shown). The bias is also reduced when the decoupling is increased. This suggests that both large-scale circulation and local processes and their interactions play a significant role in this region. Journal of Advances in Modeling Earth Systems that participated in CAUSES are able to correctly represent the diurnal cycle of the precipitation evaluated with the Atmospheric Radiation Measurement Best Estimate (Xie et al., 2010). The nudging does not allow to reduce the bias. In this region, rainfall comes from two different convective regimes. The first regime is associated with a local triggering of convection induced by daytime heating, and the second regime corresponds to the propagating systems over the Great Plains, initiated in the lee of the Rockies (Klein et al., 2006). The precipitation associated with the first regime is fairly well represented by the LMDZ model with a maximum delayed in the afternoon, which is a robust improvement of the convective scheme Rio et al., 2009). The night-time maximum due to propagative systems is absent in most models of CMIP5/CMIP6 and in particular in the simulations of LMDZ, which has no parameterization for this type of propagative system. • Amazonia and Central Africa: The strong warm bias present in the simulation with the AP atmospheric physics does not exist in the simulation with the 6A physics ( Figure 5) as a result of the strong reduction of the downward SW radiation at the surface discussed in section 3.2.

Tuning of the Global Model and Near-Surface Temperature Over Land
Significant efforts have been made to improve the physical content of the parameterizations. Yet, they remain an idealized and approximate representation of processes. As a consequence adjustment and tuning are unavoidable when all the atmospheric, land surface, and oceanic components are coupled .
A tuning of subgrid scale orography (SSO) was performed to better represent the atmospheric heat transport toward the Arctic Ocean, which is a key region for sea-ice formation and melting. The SSO schemes are applied to represent the blocking effect of orography at low levels and the breaking of gravity waves (see Lott, 1998 and more details in Appendix A). The sensitivity experiments (Table 2) reveal that the SSO tuning has an impact on the near-surface temperature mostly during the cold season, from November to March (see Figure 10), which is consistent with the established impacts of orography onto the large scale atmospheric circulation (Holton, 2004). Increasing the blocking effect of orography through the drag scheme cools Eurasia and warms western North America. This is consistent with a large blocking effect over the Rockies when increasing the drag, inducing anomalous southerly warm advection upstream, and northerly cold advection downstream (Holton, 2004). The sensitivity to the lift that modifies the flow direction shows different effects, with warm anomalies upstream of the Rockies and Himalayas and cold anomalies downstream of the Rockies and Himalayas, but with a larger amplitude, different location, and with a zonal-wavenumber 2 structure. The lift effect results from applying a force perpendicular to the local flow over orographic barriers. It causes larger meridional flow anomalies than the drag, which explains the stronger impact in terms of surface air temperature. The tuning of the version 6A was mainly done by increasing the drag and slightly reducing the lift parameter so that the tuning may have contributed to enhance the cold bias over Siberia, while reducing it over North America. However, the temperature anomalies explained by the new tuning remain small when compared to the bias itself.
An essential aspect of the tuning is to ensure that the radiative budget at the top of the atmosphere is in equilibrium and that the latitudinal distribution of each component of the radiative budget is as close as possible to the observations. A particular care was given to the tuning of free model parameters impacting the top of the atmosphere (TOA) radiation budget . Interestingly, none of the sensitivity studies described above strongly impacted the TOA radiative budget. This indicates that specific tuning targeting the land surface processes can be done independently to some extent. Such an approach has not been adopted for the 6A version of the IPSL-CM, but it could improve the performances of the model and reduce some bias in future versions of the model (Li et al., 2019).

Improvement of the Realism of the Hydrological Cycle in the Coupled Continental Surface-Atmosphere System
The impact of the more physical hydrological scheme (ctrl) used for CMIP6 (section 2.2) and the impact of the more realistic convective precipitation documented in Hourdin et al. (2020) on the hydrological cycle are addressed in this section in two specific ways: the analysis of moisture and energy coupling at the surface at regional spatial-scale and monthly time-scale and the analysis of the seasonal cycle of precipitation and river discharge at the scale of individual watersheds.

Soil Moisture-Evaporation-Radiation-Precipitation Coupling
The impact of the modified parameterizations on surface soil moisture, net SW radiation at the surface, evaporation, and precipitation is documented at regional scale in order to ensure homogeneous climate conditions to prevail. We focus on two hotspot regions (Koster et al., 2004) where the soil moisture-atmosphere coupling is strong: the Central North America (CNA) region as defined in the Special Report on Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation (SREX) (Seneviratne et al., 2012) and a box in the Sahel (−10, 30°E, 0-20°N). A third region corresponding to Western Europe (WE) where the coupling is weaker is also considered. The grid points corresponding to the WE box are selected according to the KoeppenGeiger climate classification system (Kottek et al., 2006) (Region 21 in Figure S12).
First considering the distribution for the simulated surface soil moisture itself, the regional histograms of the surface soil moisture show that the Choi land surface hydrological scheme produces a very narrow distribution with unrealistic null value of the surface soil moisture for lower boundary and low maximum values of the surface soil moisture (not shown). These characteristics of the surface soil moisture with the Choi scheme can be explained by the crude representation of the hydrology by this scheme for which the

Journal of Advances in Modeling Earth Systems
surface layer exists only intermittently. When considering the GLEAM and ESA-CCI soil moisture products, the width of the distribution is significantly smaller with the ESA-CCI product than with the GLEAM product (not shown). The fact that GLEAM takes the upper 0-10 cm into account while ESA-CCI correlates better with soil moisture up to 5 cm depth (Dorigo et al., 2017) while the surface soil moisture in the simulations corresponds to the moisture in the top 10 cm of soil might contribute to the differences. In addition, GLEAM and ESA-CCI soil moisture products are considered as observations, but they are highly dependent on the underlying models used to produce them and therefore suffer limits identified by Koster et al. (2009) that call for great caution regarding the reliability of the absolute values retrieved. For these reasons we prefer using the standardized Soil Moisture Index (SMI) defined in R. D. Koster et al. (2009;see their Equation 1), and the soil-moisture information at monthly time scale is mostly used to discriminate between very dry, moderately dry, moderately moist, and very moist soils in the corresponding regional distributions for evaporation, net SW radiation at the surface, and precipitation.
For soil moisture in the Sahel region ( Figure 11) the summer observations feature a U-shaped distribution in which dry and saturated states prevail. This U-shaped distribution is reproduced by both schemes (Choi and  [2001][2002][2003][2004][2005][2006][2007][2008][2009][2010]. Each row is dedicated to a particular variable: surface standardized soil moisture (first row), net SW radiation at the surface (second row), evaporation (third row), and precipitation (fourth row). The first four columns correspond to the reference experiments, and the last two columns correspond to the different sets of observations indicated above the corresponding histograms. The colors depict the PDF from the minimum to first quartile (dark red) from first quartile to the median (pale orange), from median to third quartile (cyan line), and from the third quartile to the maximum (blue line). For soil moisture, the y-axis is cut at .25 (representing 25% of the quartile) for the sake of readability but the driest quartile peaks at 0.8 (corresponding to 80% of the quartile) for APctrl and the moister quartile peaks at .8 for APChoi and APctrl. For evaporation they-axis is cut at .14 (corresponding to 14% of a quartile), but 55% (APChoi) and 90% (APctrl) of the evaporation associated with the first quartile is less than 0.1 mm/day. For the precipitation, the y-axis is cut at .12, but 70% , 85%, 15 %, and 40 % of the precipitation associated with the driest soil moisture quartile are less then 0.1 mm/day for APChoi, APctrl, 6AChoi, and 6Actrl and 20% and 10 % for GLEAM and ESA-CCI.

Journal of Advances in Modeling Earth Systems
ctrl) with strong differences: Choi favors the moistest contents much more than the observations while ctrl leads to a tri-modal distribution. This feature has been observed for several other regions with different climate and is the signature of using one dominant soil texture among three possible ones in each grid cell of the region, while in reality, many different soil textures coexist and lead to a mixed behavior. However, the U-shape is also present indicating that the scheme tends to favor dry or saturated situations for each texture as well. For all regions, the highest value of the net SW radiation is overestimated by as much as 20 Wm −2 . This holds for both AP and 6A versions of the model and for each soil moisture quartile. Various hypothesis can be formulated: This bias can either rely on a difficulty in processing CERES observations to retrieve the net radiation at the surface or rely on LMDZ. In this case, a problem with the radiative transfer code or a lack of simulated clouds or an underestimation of their radiative impact can be invoked. For the Sahel, when the surface is moist, the 6Actrl configuration tends to underestimate the occurrence of situations with an elevated evaporation rate and overestimate the occurrence of situations with low values of the net SW radiation (second and third columns in Figure 11). This feature can be interpreted as a too frequent occurrence of radiation-limited evaporative regimes with  [2001][2002][2003][2004][2005][2006][2007][2008][2009][2010]. Each row is dedicated to a particular variable: surface standardized soil moisture (mrsos, first row), net SW radiation at the surface (second row), evaporation (third row), and precipitation (fourth row). The first four columns correspond to the four reference experiments and the last two columns to the different sets of observations indicated above the corresponding histograms. The colors depict the PDF from the minimum to first quartile (dark red) from first quartile to the median (pale orange), from median to third quartile (cyan line), and from the third quartile to the maximum (blue line).

Journal of Advances in Modeling Earth Systems
respect to the soil moisture-limited evaporative regimes in this region. For CNA and for AP physics radiation is either insensitive to soil moisture (APChoi) or the low radiation is surprisingly associated with the driest soil quartile (APctrl) while, for 6A physics, low radiation associated with extended cloud cover is rather associated with the wettest quartile, which is consistent with the CERES product. Over both CNA and Sahel and for the driest surface soil moisture quartile, AP tends to favor little or no rainfall at the monthly time scale, probably over-simulating dry events. This feature is much weaker with 6A physics and in better agreement with the observations (bottom row in Figures 11 and 12). When Choi hydrology is activated with 6A physics, dry soils tend to have a sustained rate of evaporation, while the 11-layer hydrology also allows low evaporation rates consistent with the observations. When AP physics is activated, dry soils tend to be associated with too weak evaporation rates, this feature being more pronounced with 11-layer hydrology. Additional information concerning the evolution of the performances of the atmospheric model in the Sahel with the AP and 6A atmospheric physics are given in Diallo et al. (2017). In the hotspot regions, the 6Actrl configuration used for CMIP6 is the closest to observations due to both improved atmospheric physics and representation of soil hydrology. Low precipitation rates (at monthly time-scale) associated with dry soil are also overestimated in Western Europe. In this region where the soil moisture-atmosphere coupling is expected not to be dominant, the simulated net SW radiation, the simulated evaporation, and the simulated precipitation appear to be more sensitive to the soil moisture than the observed ones ( Figure S13). Figure 13 shows the seasonal cycle of precipitation observed and simulated by the four sensitivity experiments described above for 14 major watersheds together with the seasonal cycle of the river discharge observed and simulated at 14 stations on the rivers of the same major basins. For four out of the five boreal basins (Yukon, McKenzie, Ienisei, and Lena) the precipitation is often overestimated in all configurations. For some basins including Mississippi, Congo, and Amazonia, the seasonal cycle of simulated precipitation is significantly improved in volume or in phase in the configuration used for CMIP6 (6Actrl). For instance, in Tocantins basin in the Cerrado, the duration of the dry season is now reduced in agreement with the observations. This improvement can be attributed to changes in the parameterizations of the atmospheric physics. The impact of the LSM is limited, except over some midlatitude basins such as the Danube where the volume of precipitation is controlled by atmospheric physics and continental hydrology and is overestimated with the 6Actrl configuration used for CMIP6. With the 6Actrl configuration, simulated river discharges are also improved for the Mississippi, Amazonia, and Congo, owing to improved precipitation volume. The seasonal timing of river flow is different from that of rainfall because of the time needed for water to circulate in soils and along river systems after it has reached the ground. This timing is usually correct, with errors resulting from those of the simulated precipitation (e.g., intensity and location of rainfall events inside the watersheds), simulated land surface processes (e.g., snowmelt dynamics, permafrost, and transit times in the soil), and the fact that residence times of the routing reservoirs only depend on the type of reservoir (stream, overland, and groundwater) and the grid cell slope, while other regional factors can be important. In particular, the absence of floodplains in all the simulations largely explains the overestimation of river discharge in the Niger (d 'Orgeval et al., 2008) and Congo and may contribute to the early peak flows of the Amazon . The parameterizations of the land surface processes have a major effect in the five Arctic rivers, with a higher flow and earlier maximum when ground freezing is activated. This effect improves the simulated discharge in the two basins with the largest fraction of permafrost (Yenisei and Lena, in eastern Siberia). In the other three basins (Ob, Yukon, and McKenzie), the extent of frozen soils may be overestimated, and the overestimation of the river discharge by 6Actrl can also be related to the lack of dams and floodplains in the model (Gouttevin et al., 2012), with a potential feedback on permafrost extent, since a stronger cooling is required to freeze a wet soil than a dry soil. The Bramahputra (India) discharge shows improved volume and seasonality with the 6Actrl configuration, while the maximum of the precipitation is underestimated. For this particular river that originates from the Angsi glacier located in Tibet, the change in atmospheric physics improves the timing while the maximum discharge is improved (reduced) with the activation of the soil freezing. This nonintuitive impact of soil freezing is caused by an atmospheric feedback, with less precipitation in the watershed if the freezing is activated. Yet, the positive bias of all simulated discharges might rather be related to massive irrigation in this basin , which is not taken into account in these simulations.

10.1029/2019MS002005
Journal of Advances in Modeling Earth Systems Figure 13. Multiannual mean seasonal cycle of the precipitation (upper panel) and of river discharge (lower panel) observed and simulated for 14 major river basins and for the four reference experiments: 6Actrl, Choi6A, APctrl, and ChoiAP and for the "NoSnowFreez" experiment described in Table 2. The observations refer to the GPCP product for precipitation and to the Global Runoff Data Center (GRDC) database for the river discharges (Milliman & Farnsworth, 2011). The gray shaded areas indicate the interannual variability of the observed precipitation in the basin area (upper panel) and the interannual variability of the river discharge at the measurement stations.

10.1029/2019MS002005
Journal of Advances in Modeling Earth Systems

Concluding Discussion
The quality of the coupled atmosphere-land continental surface system implemented in the IPSL-CM for CMIP6 is evaluated, and the relative role of atmospheric and land surface processes in controlling the coupling at the surface is analyzed and quantified. The following conclusions are reached: • The improvement of the radiative balance and in particular the surface downward SW radiation makes it possible to reduce several temperature biases, some of which were shared by many models that participated in the CMIP5 exercise (e.g., summer bias in midlatitudes Stouffer et al., 2017). This confirms the essential role of the radiation and its interactions with clouds for continental climates. • The temperature in the surface layer of the polar regions is significantly improved thanks to the refined turbulent diffusion scheme for stable situations and to the new longwave radiative scheme in LMDZ version 6A (Vignon et al., 2018). The boreal regions respond with a slightly excessive reduction of the daily minimum temperature while in CMIP5, several models including LMDZ shared a warm bias (Wei et al., 2017). A more detailed consideration of the turbulent mixing linked to the subgrid orography or high vegetation may help to partially compensate for this cooling, but further tests and evaluation are necessary. • With the exception of the surface albedo, the snow cover, and to a lesser extent the evaporation, the overall structure of the near-surface biases is only marginally sensitive to the land surface scheme whose impact is mostly relevant at the regional scale. However, for a given description of the atmospheric physics, the differences induced by the change in the LSM are statistically significant (at the 5% level) over most of the continents for all variables examined but precipitation. • The multilayer hydrology gives a representation of the surface soil moisture in better agreement with available observations than the Choi scheme, and the representation of evaporation in regions of strong coupling of the continental surface with the atmosphere is significantly improved. • The snow scheme of intermediate complexity implemented in ORCHIDEE leads to a better description of the snow cover on the continents. Mountainous regions and in particular the Tibetan Plateau and High-mountain Asia remain challenging because radiative feedbacks and an imperfect description of the circulation in these regions at regional scale induce a strong cold bias. Further refinements of the snow scheme over complex terrains and of the atmospheric circulation are required to reduce these biases. • The calculation of the fraction of frozen water in the soil implemented in the multilayer hydrology combined with the improved realism in volume and seasonality of the precipitation simulated with the 6A version of LMDZ has improved the seasonal cycle of rivers discharge in several major river basins.
Further developments based on the current version of the coupled atmosphere-land continental surface system are also identified: • The attempt to take into account sources of turbulent mixing such as orography-induced small scales gravity-wave drag (Steeneveld et al., 2008) or the drag induced by vertical obstacles penetrating the boundary layers such as trees needs to be further refined. • The benefit of using the dynamical roughness lengths as proposed by Massman (1999) and tested locally by Su et al. (2001) over homogeneously vegetated surfaces (shrub, cotton, grass) has still to be thoroughly evaluated in the context of the imperfect heterogeneous land-atmosphere coupling. In this context, the bulk formulae for flux calculation use a unique value of the roughness length, aggregated over possibly highly heterogeneous subgrid surfaces, and a potentially wide range of contrasting subgrid surfaces sees the same boundary layer properties. The development of more robust parameterizations for flux calculation over heterogeneous surfaces could benefit in the future from high-resolution simulations such as Large Eddy Simulations. • For CMIP6, even though it would have reduced an overall overestimation of the evaporation, we considered it preferable not to activate the evaporation resistance of the bare soil in its current state to avoid reinforcing a warm bias in summer that would affect the quality of the simulations. Further work is needed to better calibrate the intensity of the evaporation resistance, which also impacts the ratio of transpiration to total evapotranspiration, shown to exert a key influence on biophysical feedback strength in both present and future climates . Owing to the number of intricated parametrizations in a CM, such work cannot be done in isolation, and our results show that particular attention must be paid to the uncertainties of cloud parameterizations and cloud-radiation interactions.
• The multilayer hydrology enables to consider new developments for the CM. One of them is the introduction of realistic groundwater description, which may alleviate some biases by means of enhanced evapotranspiration owing to capillary rise from the water table Wang et al., 2018). The inclusion of irrigation in the simulations could also help reducing persistent biases (Puma & Cook, 2010), especially in places where it is fed by groundwater abstraction at non-renewable rates, like in India or the U.S. Great Plains (Al-Yaari et al., 2019;Famiglietti, 2014). • In the version of ORCHIDEE used for CMIP6, the soil freezing is diagnosed in each soil layer, but the latent heat release/consumption associated with water freezing/thawing is not accounted for. This is, together with the better description of soil organic matter decomposition (Guimberteau et al., 2018), a preliminary step to account for the biogeochemical implications and positive feedback to global warming due to permafrost disappearance. • Since the CMIP6 version, a description of the nitrogen cycle and its coupling to the carbon cycle has been implemented in ORCHIDEE (Vuichard et al., 2019). The impact of soil nitrogen availability (and more generally of soil nutriments) is crucial for plant growth but also for the energy and water cycle. Very recently we also included an ensemble of developments to improve the representation of forest dynamic and forest management with the inclusion of (i) a new canopy radiative transfer scheme (two streams model), (ii) a new carbon allocation scheme based on observed allometric relationships, and (iii) age and diameter classes and management practices (from natural to coppices). These developments described in Naudts et al. (2015) have a direct impact on the surface climate, changing the albedo of forest, the roughness length (varying with tree height dynamic), the latent and sensible heat fluxes, and the overall surface temperature (see an application over Europe in Naudts et al., 2016). • Interestingly, none of the sensitivity tests to the surface processes described in this paper significantly impacted the TOA radiative budget, an essential target of the tuning of global CMs. This indicates that there is latitude for independent tuning for TOA radiation and for the land surface processes. Such an approach has not been adopted for the 6Actrl version of the IPSL-CM, but it could improve the performance of the model and reduce some bias in future versions of the model (Li et al., 2019). The tuning of the free parameters is now recognized as necessary step in model development  that should not rule out the improvement of the physical content of parameterizations.

Appendix A: Gravity-Wave and High-Vegetation Drag Induced TKE
LMDZ deals with two effects of the subgrid orography on the atmospheric flow: 1. the orographic blocking effect (called drag), 2. the orographic effect on the wind direction (called lift).
The drag and lift effects are described in Lott (1999). These two effects were modified during the tuning process (Gastineau et al., 2020). The drag and lift parameterizations (Lott & Miller, 1997) encompass two processes: (i) the "blocking" of the flow leading to a flow separation at the relief flanks and (ii) the orographic gravity-wave drag. The latter accounts for the drag due to wave breaking in the middle atmosphere as well as for the drag induced by low-level dissipation and breaking of trapped lee waves (Lott, 1998). The drag effect is calculated applying a local force opposed to the local flow, and it is used in all CMs . The lift effect is less widely used and involves a force perpendicular to the local flow.
For the setup of the sixth version of the model, the effect of the drag exerted by vegetation protruding into the first model layers has also been parameterized in LMDZ following Nepf (1999) and Masson & Seity (2009).
Orographic gravity-wave breaking and dissipation (e.g., Epifanio & Qian, 2008;Sun et al., 2015) as well as flow-canopy interactions (Finnigan, 2000) have been shown to be common paths to turbulence generation. More generally, every drag exerted on an air flow is associated to a loss of large scale kinetic energy and to an energy cascade from large scale kinetic energy to small scale turbulence (TKE) and ultimately to dissipation by molecular viscosity and conversion into enthalpy (Stull (1990)

Journal of Advances in Modeling Earth Systems
where ρ is the air density and u and v are the zonal and meridional components of the wind vector, respectively. The loss of kinetic energy k in an atmospheric layer associated to the parameterized drag "dg" thus reads where δz is the depth of the considered atmospheric layer. We will see hereafter that Γ K is an exchange term between large scale kinetic energy and TKE while Ψ corresponds to the vertical divergence of power associated to the parametrized stress. Once integrated over a whole atmospheric column, as (Boville & Bretherton, 2003).
To guarantee energy conservation in LMDZ version 6A, ∂ t Kj dg was initially calculated for each drag parameterization and then converted into enthalpy in each atmospheric column. To account for a more realistic mixing in the boundary-layer and to preclude artificial thermal decouplings over the continents, the loss of energy associated to the high-vegetation and orographic gravity-wave drag was then transferred to subgrid TKE before being converted into enthalpy, thereby enhancing the mixing in the boundary-layer. Practically, this is done as follows.
The parameterization of the vertical turbulent mixing in LMDZ version 6A is based on a local diffusion scheme combined with a mass-flux scheme for convective boundary layers, the so-called "thermal plume model" (Hourdin et al., 2002;Rio et al., 2010). The local diffusion scheme is a 1.5 order closure K-gradient scheme developed by Yamada (1983) in which the diffusion coefficients depend on the TKE calculated with a prognostic equation where c is a real constant, θ v the virtual potential temperature, u′w′ and v′w′ the components of the turbulent momentum flux, g θ v w′θ′ v the buoyancy flux, and K e a turbulent diffusion coefficient.
The conversion of large scale energy into TKE due to the orographic gravity-wave drag and high-vegetation drag can therefore be taken into account by including the Γ K terms associated to those parameterizations as additional "shear production" terms into Equation A6. For the vegetation, the drag coefficient is proportional to the fraction of protruding vegetation in the grid-box. More details and sensitivity tests can be found in Vignon (2017). One might also want to add the TKE tendency due to the flow-blocking component of the 10.1029/2019MS002005

Journal of Advances in Modeling Earth Systems
subgrid orographic drag scheme. However, the underlying physical mechanism responsible for the energy cascade associated to flow blocking is not a priori obvious. This aspect deserves further investigation.

B1 Diagnostics at the Screen Level
The calculation of the screen-level variables, tas (2 m temperature), huss (2 m specific humidity), and uas and vas (eastward and northward surface wind) is done iteratively following Hess (1995). It is based on the Monin-Obukhov similarity theory for the surface layer and the bulk formulation of the turbulent flux proposed by Louis et al. (1982). The 2 m relative humidity, hurs, is then diagnosed from huss and the saturated specific humidity at temperature tas.
The wind, the temperature, and the specific humidity profiles in the surface layer follow equations: with κ the empirical von Karman constant, L the Monin-Obukhov length, and Ψ the stability functions for the stability parameter ζ = z L . u * is the friction velocity, Θ * the temperature scale, and q * the humidity Figure B1. Cumulated histogram of the reconstruction errors for daily 2 m t. The y-axis is logarithmic. The red curve corresponds to the difference between the daily mean obtained with the original run and with the instantaneous values bounded with the surface and first atmospheric level temperature (ON experiment). The blue curve corresponds to the a posteriori correction.

Journal of Advances in Modeling Earth Systems
scale. An empirical formulation for the stability functions is given by Dyer (1974). According to the Monin-Obukhov theory, L, u * , Θ * , and q * are evaluated at the surface and are independent of z in the constant flux layer. A first guess of the screen variables is estimated owing to Equation B1. Then the Louis bulk formulation and the scale variables are used to calculate an updated value of the screen level variables. In situations where the turbulence is vanishing and the atmosphere above the surface is dry but the surface soil moisture is significantly above the residual value, a wrong diagnostic of q surf in ORCHIDEE led to inconsistencies in the stability diagnostics between the first-guess evaluation and the use of the Louis formulation. In such conditions, the calculation can produce unrealistic (overestimated) values of tas up to 450 K together with negative values of relative humidity. Luckily, apart from a few exceptional events, this occurs only one time a day at most. Thanks to that it was possible to a posteriori correct the screen level values for simulations for which the minimum daily relative humidity was archived. In the vast majority of cases, these failures occur in stable conditions. In such conditions Ψ H = − 5z L and one shows easily that Θ is a monotonous function of z, which implies that Θ is comprised between Θ s , the surface temperature, and Θ 1 , the temperature at the first atmospheric level of the model. A simulation where the screen level temperature is bounded at each timestep with the surface and the air temperature at the first atmospheric level will then be used to validate the a posteriori correction.

B2. A Posteriori Correction for the Screen-Level Variables
The a posteriori reconstruction algorithm is described hereafter. The general idea of the algorithm is to replace the erroneous values (daily maximum air temperature, tasmax or surface daily minimum relative humidity, hursmin) by an interpolation between the previous and the following day without failure. The rarity of the failure of the screen variable calculation makes this approach feasible. The erroneous values (failure) are detected by looking for negative values of hursmin.
• Step 1: We detect possible failure by identifying all the grid points and days (of index k) for which the estimated near-surface humidity is negative. • Step 2: We correct the daily mean temperature by correcting the maximum in the daily mean using information from the last and next day without failure as follows.For the derivation, we denote by T the 2 m temperature, tas.The daily average value will be noted T = Σ N 1 T i =N, where N is the number of timesteps i within a day and the maximum T max . We introduce the daily maximum anomaly D = T max − T . We apply the interpolation in time between the last (l) and next (n) day without failure to D, leading to the corrected value D * = (1 − a)D l + aD n with a = (k − l)/(n − l).Then we compute the corrected daily averaged temperature as T * noticing that • Step 3: We correct the maximum temperature from the corrected daily mean temperature T * and interpolated daily anomaly D * as which can be written as well using Equation B3 as For the daily values of tas, this approach leads to replacing a potential error of about 1 96 × 150 K (for a maximum error of 150 K on the instantaneous value of the temperature, 96 being the number of timesteps in one day for LMDZ version 6) by an uncertainty of at most 1 96 of the daily maximum anomaly (that is about 10K 96 ), that is, more than 10 times less. For tasmax the reconstruction procedure avoids creating extremes based on erroneous (and irrealistic) screen variable values.
A similar three-step approach is applied to correcting the relative humidity but taking as an information from the last and previous day without failure, the ratio R ¼ RH min RH , so that the procedure reads • Step 1: Same as for T. • Step 2: We apply the interpolation in time between the last (l) and next (n) day without failure to R, leading to the corrected valueR * = (1 − a)R l + aR n with a = (k − l)/(n − l). Then we compute the corrected daily averaged relative humidity as RH * noticing that and the corrected values are calculated with the following equations.
Step 3: We correct the minimum relative humidity from the corrected daily mean relative humidity RH * and interpolated ratio R * as RH * min ¼ R * * RH * : (B8)

B3. Evaluation of the Uncertainty Relying on the A Posteriori Correction
The a posteriori reconstruction (hereafter called OFF) is evaluated against the results of the near-surface temperature diagnosed on-line in the model (hereafter called ON) and bounded at each timestep with the surface and the air temperature at the first atmospheric level. In the ON experiment the bounding is applied only for diagnostic purpose and does not affect the behaviour of the model.
For each grid point and each day of the 36 years of an AMIP experiment, the reconstruction error is evaluated with the difference between the OFF and the ON experiments. Figure B1 shows the cumulated histogram of reconstruction errors with the a posteriori method and the ON bounding method for the daily mean and maximum daily temperature. For the majority of grid points and days, the OFF and ON methods give similar results. The reconstruction error lies within the range (−0.2 K, 0.4 K). These small differences between the two methods for daily values show that the near-surface temperature is not fundamentally modified by the OFF correction compared to what would be obtained with an on-line correction. The reconstruction errors for the monthly mean near-surface temperature would have been negligible compared to the daily errors, being 30 times smaller than daily errors.

Data Availability Statement
The version of LMDZ and ORCHIDEE used for the production of CMIP6 will be made available at the following address (http://www.lmd.jussieu.fr/∼lmdz/pub). In the ORCHIDEE community, the model is referred as "Orchidee Trunk," which is the official version developed at IPSL. The version used for the specific simulations runs for this paper is the "svn" release 3427 in the LMDZ6/branches/IPSLCM6.0.15 and the "svn" release 5626 in the tags/ORCHIDEE_2_0/ORCHIDEE_OL branche. Simulations data used in the present paper will be made available with a DOI if the paper is accepted for publication.

Acknowledgments
This work was supported by the DEPHY2 project funded by the French national program LEFE/INSU. The CMIP6 project at IPSL used the HPC resources of TGCC under the allocations 2016-A0030107732, 2017-R0040110492, and 2018-R0040110492 (project gencmip6) provided by GENCI. One of the authors (Y. Zhao) benefited from the French state aid managed by the ANR under the "Investissements d'avenir'' program with the reference ANR-11-IDEX-0004 -17-EURE-0006. This study benefited from the ESPRI (Ensemble de Services Pour la Recherche l'IPSL) computing and data centre (https://mesocentre.ipsl.fr), which is supported by CNRS, Sorbonne Universite, Ecole Polytechnique, and CNES and through national and international grants. Analysis presented in 4.14.1 benefited from the support of the CMUG and ESA-CCI programs (http://www.esa-cmug-cci. org). We also thank two anonymous reviewers for their helpful comments on the original manuscript.