Long‐Term Experimental Evaluation of a High‐Resolution Atmospheric General Circulation Model From a Hydrological Perspective

The reproducibility of Atmospheric General Circulation Model (AGCM) results in the current climate was evaluated to assess annual and monthly mean climate values from a hydrological perspective and to elucidate the factors affecting them. Reproducibility was examined for precipitation, air temperature, and runoff, which were compared with observations as basin-average values to describe deviations in the AGCM data. The AGCM was applied continuously over 65 years of the current climate; annual mean precipitation has a positive bias in most basins, and annual mean air temperature has a negative bias. However, runoff shows no clear pattern of bias. For monthly means, precipitation has both positive and negative biases in July in the Northern Hemisphere. In January in the Southern Hemisphere, precipitation has a positive bias. In both months, air temperature has a negative bias. Factors contributing to this bias are discussed. From a hydrological perspective, the annual mean bias in air temperature is better explained by the bias in apparent evapotranspiration (i.e., precipitation minus runoff) than by the bias in precipitation. In the tropics, the air temperature bias has a correlation coefficient of −0.176 with the precipitation bias and −0.406 with the apparent evapotranspiration bias (more strongly negative values indicate a better correlation). However, this was not the case for the monthly average air temperature bias, possibly because of climatological influences or the inadequate representation of runoff in land surface models. The results show that runoff bias may contribute to air temperature bias. Accordingly, we propose a new method for comparing runoff bias and climate bias.

Thus, while there have been many evaluations of meteorological variables such as AGCM precipitation and temperature biases, few have directly evaluated AGCM runoff bias. Recent studies have considered the runoff bias, which has led to the development of runoff-related products such as Linear Optimal Runoff Aggregate (LORA; Hobeichi et al., 2019) and Global Runoff Reconstruction (GRUN; Ghiggi et al., 2019). On the other hand, in hydrological models, runoff is seldom used among the AGCM outputs, and other meteorological factors (e.g., precipitation and air temperature) are used with bias correction. In addition, the evaluation of runoff in land surface models is undertaken mainly by comparison with observed river discharge, and it is unclear how the runoff bias affects climate conditions. Against this background, this study focuses on runoff, which has not previously been adequately considered, in conducting an evaluation of meteorological variables from an AGCM from a hydrological rather than a climatological perspective. To this end, this study used a method that enables direct comparison of areal meteorological elements and river discharge point data by averaging the evaluation target over a river basin. This method may not fully reflect the accuracy of meteorological elements such as precipitation and air temperature, depending on the size of the river basin area. Therefore, we evaluate the method using MRI-AGCM3.2, which is a high-resolution AGCM and a participant in HighResMIP (Haarsma et al., 2016). HighResMIP has been used for extreme events and regional studies because of its high resolution (e.g., Avila-Diaz et al., 2023; Mizuta et al., 2022; Roberts et al., 2020; Squintu et al., 2021), whereas global-scale assessments of climate values have yet to be performed.
Here we summarize specific weather elements and relevant studies. In this evaluation of the accuracy of AGCMs, we consider the reproducibility of precipitation, air temperature, and runoff data. Precipitation and air temperature were selected because bias corrections have focused mainly on these meteorological elements to date (e.g., Grillakis et al., 2013; Maraun et al., 2017; Miao et al., 2016; Pierce et al., 2015) and they provide the most reliable observed values. Soil moisture observations are also available, but they are unsuitable for basin-scale assessments because of their point-scale nature. Soil moisture is heterogeneous, even at small scales, which makes it difficult to obtain average values at the scale of a river basin. On the other hand, since river discharge is the amount of runoff collected from a river basin, it can be converted to runoff height by dividing by the river basin area and then compared with areal variables such as precipitation and air temperature. Our aim is to identify hydrological information needed for annual and monthly mean climate values and their biases, as suggested by Lopez et al. (2009) and Kiem and Verdon-Kidd (2011). The AGCM used in the study provides results at different resolutions: the higher-resolution version provides only a single result from present to future, and the lower-resolution version provides four ensemble calculations in the present climate. Since the high-resolution version yields only one calculation result, it has a high uncertainty. Therefore, the low-resolution ensemble was used to approximate how the high-resolution version would behave as an ensemble. Results of the lower-resolution version were compared with those of the higher-resolution version for the present climate.
The paper is organized as follows. Section 2 describes the models and observation products used in the evaluation and describes the evaluation methodology. Section 3 presents the results of model validation, including global trends and the reproducibility of precipitation, air temperature, and runoff within Köppen climate zones. Section 4 compares results of the lower- and higher-resolution versions of the model. Section 5 discusses evaluation and factor analysis of the higher-resolution model in the current climate from a hydrological perspective based on the results of Sections 3 and 4. Finally, Section 6 provides the conclusions.

Data and Evaluation
This evaluation was based on monthly and annual average climatological values in the present climate period (1950-2014; hereafter the period of interest), as AGCMs are not intended to reproduce individual past climatic events.

Model Description
The AGCM studied here was the MRI-AGCM3.2 model (Mizuta et al., 2012) developed at the Meteorological Research Institute, Ibaraki, Japan. The model includes simulations with grid resolutions of 20 and 60 km, which are referred to hereafter as AGCM20 and AGCM60, respectively. This model participated in the World Climate Research Programme Coupled Model Intercomparison Project Phase 6 (CMIP6) high-resolution intercomparison project (HighResMIP; Haarsma et al., 2016) and was used here for modeling the 1950-2099 period, with 1950-2014 used as the present climate and 2015-2099 as the future climate. With AGCM60, four ensemble calculations were undertaken for the present climate, whereas for the future climate, projections were conditional on the CMIP5 emission scenarios Representative Concentration Pathway (RCP) 2.6, RCP 4.5, RCP 6.0, and RCP 8.5. In AGCM20, limited computational resources meant that one calculation was undertaken for the present climate, and RCP 8.5 was projected for the future climate. Sea surface temperatures (SSTs) used as boundary conditions were consistent with the HighResMIP protocol. Specifically, 0.25° daily SSTs were used for the present climate, with the HadISST2.2 data set (Kennedy et al., 2017) used for sea ice. The same settings as those used in CMIP6 historical experiments (O'Neill et al., 2016) were used for ozone, volcanic aerosols, greenhouse gases, and solar activity. For non-volcanic aerosols, monthly mean three-dimensional data from MRI-ESM2 (Yukimoto et al., 2019) historical experiments were used, rather than the HighResMIP protocol. For the future climate, average CMIP5 model increments were added to observed SSTs and sea ice concentrations. For ozone, volcanic aerosols, greenhouse gases, and solar activity, protocols from the CMIP6 Shared Socioeconomic Pathway (SSP) SSP 585, SSP 460, SSP 245, and SSP 126 experiments (Riahi et al., 2017) were used. For non-volcanic aerosols, monthly averaged non-volcanic aerosols from the SSP 585, SSP 460, SSP 245, and SSP 126 experiments with MRI-ESM2 were used. The model setup has been described in detail by Mizuta et al. (2022).

Observational Data
The AGCM was evaluated hydrologically, so observed base-station precipitation, air temperature, and river discharge data were selected for comparison with values predicted by the model. Details of the observed data used (i.e., resolution and time periods) are summarized in Table 1.

Precipitation Data
Commonly used global precipitation products include those of the Global Precipitation Climatology Project (GPCP; Adler et al., 2018), the Climate Prediction Center (CPC) Unified Precipitation Project (Chen et al., 2008), and the CPC Merged Analysis of Precipitation (CMAP; Xie & Arkin, 1997); however, these data sets include ocean areas. As this study involved basin analysis where only land data were applicable, 2020 Global Precipitation Climatology Centre (GPCC; Schneider et al., 2017) land-only data (resolution 0.25°) were used.

Air Temperature Data
Land-only air temperature data were available from the Climatic Research Unit Time-Series (CRU TS; Harris et al., 2020) and from the University of Delaware (UoD), USA (Matsuura & Willmott, 2018), both with a resolution of 0.5°. The CRU TS data include other meteorological variables, and interpolation is performed using climatological values for sites with insufficient observations; as a result, basin-based assessments yielded no air temperature variation in some basins. This was not a problem with the UoD data, so those data were used in this study.

River Discharge Data
Monthly river discharge data were obtained from the Global Runoff Data Centre (GRDC, 2023; https://www.bafg.de/GRDC/EN/Home/homepage.html). Unlike precipitation and air temperature data, river discharge data have missing values. Since climate values are the target of evaluation here, monthly and annual averages of river discharge were calculated from the available values, ignoring missing values. The data points used in the analysis are described in Section 2.3.1.
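The handling of discharge described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "without considering missing values" means gaps are simply skipped when averaging, that discharge is given as a monthly mean in m^3/s, and that runoff height is wanted in mm. The function names and the input layout are hypothetical.

```python
import math

SECONDS_PER_DAY = 86_400


def monthly_climatology(series):
    """Average each calendar month over all years, skipping missing values.

    `series` maps (year, month) -> mean discharge in m^3/s, with None or NaN
    marking a gap (assumed layout). Months with no valid value are omitted.
    """
    sums, counts = {}, {}
    for (_year, month), q in series.items():
        if q is None or math.isnan(q):
            continue  # gap: contributes nothing to the climatological mean
        sums[month] = sums.get(month, 0.0) + q
        counts[month] = counts.get(month, 0) + 1
    return {m: sums[m] / counts[m] for m in sums}


def discharge_to_runoff_depth(q_m3s, basin_area_km2, days):
    """Convert mean discharge (m^3/s) into runoff depth (mm) over `days` days."""
    volume_m3 = q_m3s * SECONDS_PER_DAY * days   # water volume leaving the basin
    area_m2 = basin_area_km2 * 1.0e6             # km^2 -> m^2
    return volume_m3 / area_m2 * 1000.0          # m -> mm
```

For instance, a mean discharge of 100 m^3/s from an 8,640 km^2 basin over a 30-day month corresponds to a runoff depth of about 30 mm, which can then be compared directly with basin-average precipitation.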

Validation Method
Model validation involved comparison of calculated and observed values on a basin-by-basin basis, based on AGCM20 results supplemented by AGCM60 results (Section 2.4).

Evaluation Points
In evaluating an AGCM from a hydrological perspective, it is desirable to have corresponding river discharge data, so areas upstream of points where river discharge was observed were used as comparative basin units. River discharge observation points were those registered by the GRDC, which also provides basin geometry data for the upstream areas. There are a total of 2,576 points for which basin shape data are provided by the GRDC. Some basins include data from multiple observation points, so the single point with the largest basin area was selected (per river basin). This reduced the number of data points for basin shape to 1,116. Basins smaller than or similar in size to the computational grid were excluded; specifically, basins with an area of <4,000 km² (as provided by the GRDC) were excluded. This criterion was chosen because the AGCM60 results were also used, and the area of an AGCM60 calculation grid is ∼3,600 km²; consequently, an area of 4,000 km² was adopted as a round number. A final total of 364 sites were used for validation of precipitation, air temperature, and runoff, as shown by the Köppen climate classification in Table 2. For the Köppen climate classification, we used GIS data for the period 1976-2000 (Rubel & Kottek, 2010), which is close to the period of interest. The climate classification that made up the highest percentage of the area of the river basin was used as the climate classification for that river.
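The selection steps above (largest station per river basin, area threshold, dominant climate class) can be sketched in a few lines. This is an illustrative sketch only; the dictionary keys and function names are hypothetical and do not correspond to GRDC field names.

```python
GRID_KM = 60.0                    # AGCM60 grid spacing
GRID_AREA_KM2 = GRID_KM ** 2      # about 3,600 km^2, as stated in the text
MIN_BASIN_AREA_KM2 = 4000.0       # round-number threshold adopted in the text


def select_sites(stations):
    """Keep the largest-area station per river basin, then drop small basins.

    `stations` is a list of dicts with hypothetical keys
    'station', 'basin', and 'area_km2'.
    """
    largest = {}
    for st in stations:
        best = largest.get(st["basin"])
        if best is None or st["area_km2"] > best["area_km2"]:
            largest[st["basin"]] = st
    return [st for st in largest.values() if st["area_km2"] >= MIN_BASIN_AREA_KM2]


def dominant_class(class_areas):
    """Return the Köppen class covering the largest share of the basin area.

    `class_areas` maps a class label (e.g. 'A', 'B') to the area it covers.
    """
    return max(class_areas, key=class_areas.get)
```

With this rule, a basin whose area is split 1,200 km^2 tropical, 3,000 km^2 temperate, and 800 km^2 arid would be treated as temperate.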

Basin Average Values
The method for combining calculated and observed values into a single value for a basin unit is described here. For calculated and observed precipitation and air temperature, the basin average value (Equation 1) is calculated over the grids for which the basin shape overlaps with the computational or observed grid (Figure 1). For example, if only a portion of a grid is covered by the basin, the basin area within the grid is calculated using the Python package geopandas (version 0.9.0; Jordahl et al., 2021), which can calculate the overlapped area. If the entire grid is covered by the basin, the area of the grid is used. The calculated or observed value is then multiplied by the overlapped area in each grid. The values for all grids are added together and then divided by the river basin area to obtain the basin unit value, which is the "basin average value" used in AGCM evaluation, derived as follows:

$$V_w = \frac{1}{A_w} \sum_{i=1}^{n} V_i A_i \qquad (1)$$

where $V_w$ is the basin average value, $A_w$ is the GRDC basin area, $V_i$ is the calculated or observed grid value, $A_i$ is the overlap area of each grid and basin, and $n$ is the number of overlaps between basins and calculated or observed grids.
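The area weighting in Equation 1 can be illustrated with a short sketch. It assumes the overlap areas have already been obtained from a polygon intersection (the text uses geopandas for that step); the function name is hypothetical and the numbers are invented for illustration.

```python
def basin_average(grid_values, overlap_areas, basin_area):
    """Area-weighted basin average: sum of V_i * A_i divided by A_w.

    grid_values   -- V_i, calculated or observed value in each overlapping grid
    overlap_areas -- A_i, area of the grid cell lying inside the basin
    basin_area    -- A_w, basin area (same area unit as overlap_areas)
    """
    if len(grid_values) != len(overlap_areas):
        raise ValueError("grid_values and overlap_areas must have equal length")
    weighted_sum = sum(v * a for v, a in zip(grid_values, overlap_areas))
    return weighted_sum / basin_area


# Three grid cells overlapping a 5,000 km^2 basin (illustrative numbers).
example = basin_average([2.0, 4.0, 6.0], [1000.0, 2500.0, 1500.0], 5000.0)
```

Note that the denominator is the reported basin area rather than the sum of the overlap areas, following the definition of $A_w$ in the text; the two agree only when the basin polygon and the reported area are consistent.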

Evaluation Methods
In the evaluation of basin average values, it would have been too complicated to evaluate each basin for all 364 sites, so global situations were evaluated. Average climatological values were targeted for each month and year and evaluated using global maps and scatterplots to illustrate global situations. Evaluation indicators (Table 3) were used, together with overall averages and 10-yr moving averages for the study period. Ten-year moving averages were used because there was no significant difference between them and the 30-yr averages or values obtained by shifting the 30-yr period by 5 years before averaging. The results were classified according to the Köppen climate classification. There was negligible difference between 10-yr moving-average biases and the bias for the entire period of interest, so bias results are calculated for the period of interest.
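The indicators described above can be sketched as follows. The exact formulas of Table 3 are not reproduced in this text, so the definitions here are assumptions: the correlation is taken as the Pearson coefficient between calculated and observed 10-yr moving averages, and the bias as the calculated mean minus the observed mean over the period of interest. All function names are hypothetical.

```python
import math


def moving_average(values, window=10):
    """Simple moving average; a 65-value series yields 56 ten-year windows."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_bias(calculated, observed):
    """Bias as calculated mean minus observed mean (positive = overestimate)."""
    return sum(calculated) / len(calculated) - sum(observed) / len(observed)
```

A basin's long-term agreement would then be summarized as `pearson_r(moving_average(calc), moving_average(obs))`, with `mean_bias(calc, obs)` giving the sign and size of the offset.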

Ensemble Evaluation
There is only one AGCM20 result for the current climate, which is not sufficient to account for uncertainty. Therefore, we evaluated the four AGCM60 results to identify differences between the ensemble members. The evaluation was based on the difference between the AGCM20 and AGCM60 results. Specifically, we used the difference between the annual mean values of precipitation, air temperature, and runoff over the period of interest and the ratios of the variance of annual means for each year. The AGCM20 results are denoted as spd, and the AGCM60 ensemble members are denoted as hpd, hpd_m01, hpd_m02, and hpd_m03; hpd has the same settings as spd, except for grid resolution.

Evaluation indicators. Columns: Notation of figure; Formal nomenclature; Formula. Row: Correlation; Correlation Coefficient.
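The two ensemble measures named in this section (the difference of period means and the ratio of variances of annual means, each relative to spd) can be sketched as below. Whether the authors used the sample or population variance is not stated; the sample variance is assumed here, which gives the same ratio whenever both series have the same length. Function names are hypothetical.

```python
from statistics import mean, variance


def mean_difference(member, reference):
    """Period-mean difference: ensemble member minus reference (spd)."""
    return mean(member) - mean(reference)


def variance_ratio(member, reference):
    """Ratio of the variance of annual means: ensemble member over reference."""
    return variance(member) / variance(reference)
```

A ratio above 1 indicates that the ensemble member varies more from year to year than spd; taking the minimum, maximum, and mean of these ratios over all basins gives the kind of summary reported per variable.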

Validation Results
Precipitation, air temperature, and runoff from the AGCM20 calculations were compared with the observations (Section 2.3.3), and global situations were evaluated.

Global Precipitation Reproducibility
The global reproducibility of precipitation is indicated in Figure 2, showing annual, January, April, July, and October averages. Months were selected to represent winter, spring, summer, and fall in the Northern Hemisphere. Annual mean precipitation was positively correlated between calculated and observed 10-yr moving averages in some areas and negatively correlated in others (Figure 2a). Mean January precipitation was negatively correlated between calculated and observed 10-yr moving averages near the equator (Figure 2b), whereas in April it was negatively correlated at mid-high latitudes of the Northern Hemisphere (Figure 2c). Correspondingly, a positive bias was prominent in the Northern Hemisphere, and a negative bias in the Southern Hemisphere. Mean July precipitation was negatively correlated at mid-high latitudes in the Northern Hemisphere (Figure 2d), similar to April. Negative correlations tended to be stronger than in April at low latitudes of the Southern Hemisphere. Biases were large at mid-high latitudes in the Northern Hemisphere, even in basins with positive correlations, whereas there were negative biases in basins with negative correlations at low latitudes in the Southern Hemisphere. For the average October precipitation, correlation coefficients tended to be similar to those in January (Figure 2e), with particularly large biases at mid-high latitudes in the Northern Hemisphere. Scatterplots for global reproducibility of basin-averaged precipitation (annual, January, and October means) are shown in Figure 3 (results for April and July were omitted because they did not differ significantly from the annual mean). Each dot in the figure represents a single basin, averaged over the period of interest; dot colors represent the Köppen climate classification, with tropical assigned to A and polar to E, in the order shown in Table 2. Annual mean precipitation was underestimated in some areas of the tropics (dark blue; Figure 3) but was overestimated overall. In January, precipitation tended to increase in arid zones with increasing variability, whereas more areas tended to be over-predicted relative to the annual average in October.
In summary, the correlation coefficient between calculated and observed 10-yr moving averages, which is an indicator of long-term change, displayed no clear trend for either annual or monthly averages, suggesting a weak long-term change trend within seasons. To confirm this, Mann-Kendall tests were undertaken for annual and monthly averages. Theil-Sen slopes of the basins, significant at the 5% significance level, were determined for 65-yr basin-averaged precipitation; the results for the annual mean are shown in Figure 4, and no significant long-term trend is observed in either the calculated or observed values. With respect to the annual average of bias, calculated values were overestimated relative to observed values, as observed for other AGCMs (Zhang et al., 2016). The seasonal variation in bias was characterized as follows. At mid-high latitudes of the Northern Hemisphere, the bias tended to be smallest in January and largest in July. Precipitation is lower in January in this zone, and AGCM20 reproduced the situation fairly well. However, AGCM20 overestimated the higher July precipitation. April and October precipitation is intermediate between that of January and July, and it was overestimated by AGCM20. The higher reproducibility in January at mid-high latitudes of the Northern Hemisphere suggests that AGCM20 reproduces the frontal precipitation in this region well. At low latitudes in the Northern and Southern hemispheres, the bias increased from January to April, with a negative basin bias in July and a small positive bias in October (barring a few basins). The Intertropical Convergence Zone (ITCZ) is located in the Southern Hemisphere in January and in the Northern Hemisphere in July, with positive biases in January and both positive and negative biases in July, at larger magnitudes than in other months. The poor reproducibility of the ITCZ at low northern and southern latitudes was noted in IPCC AR4 (Lin, 2007), and other models have shown similar trends. At southern mid latitudes, a positive bias was observed in January (summer), a slight negative bias in April (fall), a weaker negative bias in July (winter), and a positive bias in October (spring). This suggests that, in the southern mid-latitude zone, the model reproduces convective precipitation excessively in January (summer), with a positive bias, and slightly under-reproduces frontal precipitation in July (winter), with a weak negative bias.
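The trend analysis mentioned in this section pairs the Mann-Kendall test with the Theil-Sen slope. A minimal pure-Python sketch is given below; it uses the standard normal approximation without a correction for tied values, which is an assumption and may differ from the implementation used in the study. Function names are hypothetical.

```python
import math
from statistics import median


def mann_kendall(values):
    """Return (S, Z, two-sided p) for the Mann-Kendall trend test.

    Uses the normal approximation with continuity correction and
    no tie correction (simplifying assumption).
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)   # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return s, z, p


def theil_sen_slope(values):
    """Median of all pairwise slopes, with time steps taken as 0, 1, 2, ..."""
    n = len(values)
    slopes = [(values[j] - values[i]) / (j - i)
              for i in range(n - 1) for j in range(i + 1, n)]
    return median(slopes)
```

For a basin's 65 annual means, a slope would be mapped only where `p < 0.05`, matching the 5% significance level used for the trend figure.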

Global Temperature Reproducibility
As it was likely that a long-term trend would be detected for air temperature, a Mann-Kendall test was undertaken, and the results for the annual mean are shown in Figure 4. Calculated and observed results for the annual mean and for the January, April, July, and October means are shown in Figure 5.
Both the calculated and observed Theil-Sen slopes of the Mann-Kendall test were positive in all basins where they were significant (Figure 4), indicating an upward trend in air temperature. There was an overall negative bias for the mean annual air temperature (Figure 5), with positive biases in China and Russia in the Northern Hemisphere and at lower latitudes of the Southern Hemisphere. The January average air temperature had a positive bias in North America, which was not observed in the annual average. For the April average air temperature, the trend was similar to the annual average, but the negative bias was larger in basins at high latitudes of the Northern Hemisphere relative to the annual average. For July, the trend was similar to that of [...] (Figure 6). In the subarctic region, calculations underestimated temperature in basins with relatively high air temperatures, and temperature was slightly overestimated in basins with relatively low air temperatures. Calculated and observed monthly mean values for January, April, July, and October in the tropical region are compared in Figure 7, which shows that January was generally under-reproduced; July was under-reproduced in areas with high observed values and over-reproduced in areas with low observed values; and October displayed trends opposite those of July, indicating that reproducibility varies from month to month in tropical regions.
To summarize, correlation coefficients between calculated and observed 10-yr moving averages were globally positive for both annual and monthly mean values, capturing the warming trend since around the start of the 21st century.
Regarding bias, calculated values under-represented observed values for annual means, as is the case for CMIP6 models (Fan, Duan, et al., 2020). Seasonal variations were characterized as follows. At mid-high latitudes of the Northern Hemisphere, a high-temperature bias occurred in January, trending toward a low-temperature bias in April; in July, a trend of low-temperature bias occurred at high latitudes and a high-temperature bias occurred at mid latitudes. In October, there was a high-temperature bias trend at high latitudes and a low-temperature trend at mid latitudes, with an overall trend toward a high-temperature bias through January. Seasonal variations in the tropics were distinctive: in the tropical region and surrounding mid-latitude zones, the Northern Hemisphere tended to have a low-temperature bias in all seasons, whereas the Southern Hemisphere had its highest temperature bias in October, changing to a low-temperature bias in January. In July, there were large positive and negative biases in precipitation (Figure 2d) in the Northern Hemisphere but a generally negative bias in air temperature, whereas there was a positive bias in precipitation (Figure 2b) and a negative bias in air temperature in the Southern Hemisphere in January. If latent heat fluxes over land were over-reproduced by an oversupply of precipitation, air temperatures should shift toward a generally negative bias. Such a trend was observed in the Southern Hemisphere but not in the Northern Hemisphere. This point is discussed further in Section 5.

Global Runoff Reproducibility
In terms of reproducibility of global runoff, annual and monthly average values for January, April, July, and October are shown in Figure 8. For both annual and monthly averages, the correlation coefficients show no consistent trend among river basins. The annual average tends to show a negative bias in the Northern Hemisphere and a positive bias in the Southern Hemisphere (Figure 8a). For January, there is a negative bias in the Northern Hemisphere and a positive bias in the low to mid latitudes of the Southern Hemisphere, similar to the annual average (Figure 8b), although the bias is larger in the case of the January data. In April, which corresponds to spring in the Northern Hemisphere and autumn in the Southern Hemisphere, there is a global positive bias except for some low-latitude zones in both hemispheres (Figure 8c). In July, which corresponds to summer in the Northern Hemisphere and winter in the Southern Hemisphere, there is a negative bias globally except near the equator, and this trend is also seen in October, although the bias is smaller (Figure 8d).
Here, the runoff bias is compared with the precipitation bias presented in Section 3.1 (Figure 2a). The annual mean precipitation shows a positive bias globally, whereas for runoff the bias is negative in the Northern Hemisphere and positive in the Southern Hemisphere. For monthly average values, there is commonly a positive bias for precipitation, with a negative bias for runoff except in April. Generally, one would expect that as precipitation increases, runoff would also increase, but the AGCM20 results of this study do not show such a trend. In other words, except in April, the excess precipitation may be taken up by high soil moisture or high evapotranspiration in the model. To summarize, annual and monthly average values show no clear trend in terms of correlation coefficients, which are an indicator of long-term change. The Mann-Kendall test was not used to confirm the results for runoff because the observed data contain missing values. However, no significant trend is expected, as in the case of precipitation (Figure 4). The annual mean bias is negative in the Northern Hemisphere and positive in the Southern Hemisphere. The bias in January is similar to the annual bias, while the bias in April is positive globally, and the bias in July and October is negative globally. Only April shows a positive bias, which is particularly pronounced in the high latitudes of the Northern Hemisphere. Some river basins in the Southern Hemisphere also show a positive bias, but these basins also show a positive bias in January. Therefore, there may be a problem in the representation of snowmelt runoff in the land surface model. The northern high latitudes show a positive bias in May (data not shown), and some northern high-latitude regions also show a positive bias in June, so it is unlikely that the positive bias reflects a problem in the representation of snowmelt runoff. Furthermore, the trend of the bias in runoff is different from that of the bias in precipitation and air temperature, suggesting that runoff is strongly influenced only by the land surface model. For the area of the ITCZ, which is characterized in terms of precipitation and air temperature, a negative bias is observed in the low latitudes of the Northern Hemisphere and a positive bias in the low latitudes of the Southern Hemisphere in January. In July, the low latitudes of the Northern Hemisphere show areas of positive and negative bias, and the low latitudes of the Southern Hemisphere show a negative bias. This point is further discussed in Section 5.

Ensemble Predictions
AGCM60 currently has four ensemble experiments in the current climate. AGCM20 results and AGCM60 ensemble results were compared in terms of precipitation, air temperature, and runoff, and their differences were determined.
Precipitation results (Figure 9) represent annual averages for the period of interest, with spd results subtracted from each ensemble member. The differences from spd were nearly the same for all ensemble members, indicating that the difference between ensemble members was negligible. Similarly, for air temperature (Figure 10), differences between ensemble members were not significant. The differences in runoff are similar to those in precipitation, and there are no differences between the ensemble members, so the data are not shown. There are small differences in precipitation, air temperature, and runoff between the ensemble members, whereas differences due to AGCM resolution are more pronounced. To confirm this, the variance of annual mean values of precipitation, air temperature, and runoff for each calculation was determined; minimum, maximum, and mean values for each basin, with ensemble results being divided by spd, are shown in Table 4. The ratio of the variance of each ensemble member to spd indicates that, on average, there was no significant difference between AGCM20 and AGCM60 for either precipitation or air temperature (Table 4), with AGCM60 results being more variable than AGCM20 results. However, runoff has smaller minima, larger maxima, and on average higher values than precipitation, which is expressed using the same unit. This indicates that runoff has a larger uncertainty due to resolution than does precipitation. Based on a comparison of differences by resolution, we consider that an AGCM20 ensemble would not differ significantly from the single AGCM20 result because, on average, there is little difference between ensemble members. Results for one AGCM20 calculation are thus sufficient for consideration of climatic values of the current climate.

Results and Discussion
Our results indicate that calculated values tend to over-predict observed values for annual mean precipitation.The model reproduced observed values well overall, although errors in reproducibility were observed at mid-high latitudes in the Northern Hemisphere.Regarding seasonal variability of precipitation, the high reproducibility in January at mid-high latitudes in the Northern Hemisphere suggests that AGCM20 well reproduces frontal precipitation in this region.Positive and negative biases were observed at low latitudes in July and January in the Northern and Southern hemispheres, respectively, suggesting that reproducibility associated with migration of the ITCZ may be poor in this region.At mid latitudes of the Southern Hemisphere, a positive bias was observed in January and a negative bias in July, suggesting that convective precipitation in January (summer) is overestimated,  whereas frontal precipitation in July (winter), is underestimated.Annual average air temperatures had a positive correlation coefficient between calculated and observed values globally, capturing the rising temperatures since around the start of the 21st century.Seasonal variations are characteristic of the tropics.In the tropics and surrounding mid latitudes, the Northern Hemisphere tended to have a low temperature bias in all seasons, whereas the Southern Hemisphere had the highest temperature bias in October and a shift to a low temperature bias in January.The annual mean runoff shows a negative bias in the Northern Hemisphere and a positive bias in the Southern Hemisphere.With respect to seasonal variations, runoff shows a positive bias in April and a negative bias globally for all other monthly average values.The results for runoff bias differ from those for precipita tion  and air temperature bias, suggesting that runoff is strongly influenced by the land surface model only.The runoff bias that originates in the land surface models is discussed below.For precipitation, large 
positive and negative biases were found in July (Northern Hemisphere summer), with generally negative biases for air temperature; positive and negative biases were found for precipitation and air temperature, respectively, in January (Southern Hemisphere summer). Considering this from a hydrological perspective, if the latent heat flux over land is overestimated owing to an oversupply of precipitation, air temperature should generally show a negative bias. Such a tendency was observed in the Southern Hemisphere (but not in the Northern Hemisphere), suggesting a bias in the reproduction of the ITCZ and in the development of the summer monsoon. The reproducibility of the ITCZ is limited by the double-ITCZ and equatorial Pacific cold tongue biases, which remain unsolved. Zhang et al. (2019) reviewed three ITCZ biases: a stratocumulus and associated sea surface temperature bias in the southeastern Pacific, a shortwave absorption bias in the extratropics of the Southern Hemisphere, and a bias due to convective parameterization. Because the present study uses observation-based SST as a boundary condition, the third category of Zhang et al. (2019), convective parameterization, is the one related to the ITCZ bias here. On the other hand, precipitation re-evaporation and boundary layer-convection interactions play an important role in the formation of a double ITCZ in AGCM experiments (Bacmeister et al., 2006). Zhou and Xie (2017) also noted that the low-temperature bias over land contributes to the double-ITCZ bias. However, the results of Atmospheric Model Intercomparison Project (AMIP) experiments, in which observed SSTs are prescribed, indicate that the origin of the double-ITCZ bias is unclear (Zhou & Xie, 2017). Thus, from a climatological perspective, it is possible that the ITCZ bias is also influenced by the land surface temperature bias.
Considering land surface temperatures from a hydrological perspective, differences in evapotranspiration and in soil moisture conditions may influence the air temperature bias. Although there are no long-term observational data for evapotranspiration and soil moisture content, long-term river discharge data are available from GRDC (with some missing values). In terms of the runoff bias results, focusing on the low-latitude regions in January and July associated with the ITCZ, runoff shows a negative bias in January in the northern low latitudes and a positive bias in the southern low latitudes. In July, positive and negative biases are apparent in the northern low latitudes, and negative biases in the southern low latitudes. Comparing the runoff bias with the precipitation bias, the Northern Hemisphere shows a positive bias for precipitation and a negative bias for runoff in January, and both positive and negative biases for precipitation and runoff in July. The Southern Hemisphere shows a positive bias for both precipitation and runoff in January, and a negative bias for both in July. Except for January in the Northern Hemisphere, the bias patterns are similar for precipitation and runoff. However, no such pattern is observed for months other than January and July, or for regions outside the low latitudes. The results for the precipitation, air temperature, and runoff biases are provided in Table 5.
It is difficult to determine which of the precipitation and runoff biases better explains the air temperature bias, because a comparison based only on the signs of the biases is qualitative. Therefore, the observed river discharge divided by basin area was subtracted from precipitation to obtain the apparent evapotranspiration. Soil moisture content is a state quantity and, assuming no long-term trend, is not expected to affect long-term evapotranspiration. By comparing calculated precipitation minus runoff with observed precipitation minus river discharge divided by basin area, the hydrological contribution to the air temperature bias can be clarified. To this end, we used scatterplots to compare the bias of precipitation or apparent evapotranspiration with the bias of air temperature and to determine which better explains the air temperature bias. Annual mean results for the entire region and for the temperate and tropical regions, where negative correlations are observed, are shown in Figure 11. For the entire region, the apparent evapotranspiration bias has a higher correlation coefficient with the air temperature bias, and a steeper regression-line slope, than the precipitation bias. Comparing the tropical and temperate regions, apparent evapotranspiration explains the air temperature bias better in tropical regions, whereas precipitation explains it better in temperate regions. Figure 11 shows no dependence of the biases on river basin area. Especially in tropical regions, the apparent evapotranspiration thus explains the negative bias in air temperature better than precipitation does, because if evapotranspiration is overestimated, the latent heat flux will be overestimated and the sensible heat flux will be underestimated.
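The apparent evapotranspiration comparison described above can be illustrated with a minimal sketch. It assumes annual values per basin, precipitation and modeled runoff in mm/yr, observed discharge in m³/s, and basin area in km²; the function names, the units, and the 365-day year are illustrative assumptions and are not taken from the study.

```python
import numpy as np

SECONDS_PER_YEAR = 365 * 86400  # assumption: non-leap year, for illustration only


def runoff_depth_mm(discharge_m3s, basin_area_km2, seconds=SECONDS_PER_YEAR):
    """Convert mean gauge discharge (m^3/s) into a basin-mean runoff depth (mm)."""
    volume_m3 = np.asarray(discharge_m3s, dtype=float) * seconds
    area_m2 = np.asarray(basin_area_km2, dtype=float) * 1.0e6
    return volume_m3 / area_m2 * 1.0e3  # m -> mm


def apparent_et_bias(p_model, r_model, p_obs, q_obs_m3s, basin_area_km2):
    """Bias (model minus observation) of apparent evapotranspiration, P - R, in mm/yr."""
    et_model = np.asarray(p_model, dtype=float) - np.asarray(r_model, dtype=float)
    et_obs = np.asarray(p_obs, dtype=float) - runoff_depth_mm(q_obs_m3s, basin_area_km2)
    return et_model - et_obs


def correlation_and_slope(x_bias, t_bias):
    """Pearson correlation and least-squares slope of temperature bias against x_bias."""
    r = np.corrcoef(x_bias, t_bias)[0, 1]
    slope, _intercept = np.polyfit(x_bias, t_bias, 1)
    return r, slope
```

Applying `correlation_and_slope` once with the precipitation bias and once with the apparent evapotranspiration bias as `x_bias` gives the two coefficients and slopes that a scatterplot comparison of this kind would contrast.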
From a climatological perspective, the underestimation of air temperature may be caused by an underestimation of downward solar radiation that results from cloud formation due to water vapor advection and its parameterization in climate models. This may result in increased precipitation, but the higher near-surface humidity might also reduce the exchange of water vapor between the surface and the atmosphere, resulting in an underestimation of the latent heat flux and an overestimation of the sensible heat flux. Another possible factor is that the transport of drier air increases the water vapor flux between the surface and the atmosphere, so that evapotranspiration increases latent heat transport and leads to an underestimation of air temperature. However, since the annual mean precipitation is overestimated over land, it is unlikely that dry air is being transported. In this context, the annual mean runoff bias may contribute to the air temperature bias. The January and July results associated with the ITCZ for all climate zones are shown in Figure 12.
There is little difference between tropical and temperate regions. Although the July results indicate a lower correlation coefficient, precipitation better explains the air temperature bias. From a climatological perspective, this can be explained by the advection of water vapor, accompanied by cloud formation, increased precipitation, and cooler temperatures. However, it is difficult to consider the effect of runoff bias from a climatological perspective. Therefore, the relationship between the apparent evapotranspiration bias and the air temperature bias is considered below from a hydrological perspective. The apparent evapotranspiration bias is subject to the runoff bias in addition to the precipitation bias, and the two may offset each other. Therefore, for July, the precipitation bias may better explain the air temperature bias; the runoff bias might arise because the calculation used for the precipitation-to-runoff conversion is inaccurate. In particular, the observed river discharge includes horizontal groundwater flow that discharges slowly, and the land surface model does not explicitly represent this horizontal groundwater flow. Therefore, the model may not be appropriate for defining the relationship between apparent evapotranspiration and air temperature on a monthly basis. Recently developed land surface models are able to consider not only the vertical behavior of groundwater but also horizontal flow (e.g., Zeng et al., 2018), as it has become clear that the vertical behavior of groundwater alone is not sufficient to predict soil moisture, evapotranspiration, and precipitation (e.g., Barlage et al., 2021; Maxwell & Condon, 2016).
Many previous studies have reported that runoff bias is strongly influenced by precipitation bias (e.g., Guo et al., 2022). However, the present results show that runoff bias may influence air temperature bias. To our knowledge, no previous study has shown that runoff bias itself, rather than a factor related to runoff bias, influences climate bias. To assess the impact of runoff bias on climate bias, this study used basin-averaged values. This enabled observed precipitation and air temperature, which are expressed as areal data, to be combined with observed river discharge, which is expressed as point data, into a single value per basin. Similarly, calculated precipitation, air temperature, and runoff, which are all areal data, can each be reduced to a single value per basin, and the bias can be determined as the difference between observations and calculations, thereby enabling a direct comparison of runoff and climate variables. In this study, precipitation and air temperature were used as climate variables, but the method can be used for other climate variables and to jointly analyze areal and point data. The reproducibility of runoff needs to be enhanced to further improve the accuracy of climate models.
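As a rough illustration of the basin-averaging idea, the sketch below computes an area-weighted mean of a gridded field over the cells upstream of a gauge and takes the model-minus-observation difference. It assumes a regular latitude-longitude grid, cell areas proportional to the cosine of latitude, and a mask giving the fraction of each cell inside the basin; these choices and all names are hypothetical and may differ from the procedure used in the study.

```python
import numpy as np


def basin_average(field, basin_mask, lat_deg):
    """Area-weighted mean of a (lat, lon) field over a basin.

    field      : 2-D array of the gridded variable (e.g., precipitation)
    basin_mask : 2-D array, fraction of each cell inside the basin (0 to 1)
    lat_deg    : 1-D array of cell-centre latitudes in degrees
    """
    field = np.asarray(field, dtype=float)
    mask = np.asarray(basin_mask, dtype=float)
    # Assumption: on a regular lat-lon grid, cell area scales with cos(latitude).
    area = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))[:, None]
    weights = area * mask
    return float((field * weights).sum() / weights.sum())


def basin_bias(model_field, obs_field, basin_mask, lat_deg):
    """Bias of the basin-average value (model minus observation)."""
    return (basin_average(model_field, basin_mask, lat_deg)
            - basin_average(obs_field, basin_mask, lat_deg))
```

Because every variable is reduced to one number per basin, a gridded quantity averaged this way can be compared directly with gauge discharge divided by basin area.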
The AGCM20 model had only one calculation result in the 65-yr continuous experiment, so we attempted to supplement the AGCM20 results with the AGCM60 results, using an ensemble calculation in the present climate. We found little difference between the ensembles in the present climate for precipitation and air temperature. On the other hand, runoff tended to be more strongly affected by resolution than precipitation or air temperature. However, the differences between ensembles are minor (Table 4). A single AGCM20 result is therefore sufficient when considering current climate conditions. Furthermore, although it is not a HighResMIP result as used in this study, the latest CMIP6 model with the same model structure (i.e., MRI-ESM2-0) does not produce significant outliers for precipitation (e.g., Tang et al., 2021; Zhu & Yang, 2021), air temperature (e.g., Fan, Duan, et al., 2020; Papalexiou et al., 2020), or runoff (e.g., Guo et al., 2022; Hou et al., 2023) compared with other models. Based on the above, we consider that the model used in this study is not significantly different from other HighResMIP models and that the evaluation method for climate values implemented in this study can be applied to other models.

Conclusion
With the aim of analyzing AGCM biases and their contributing factors from a hydrological perspective, this current-climate study employed a 65-yr continuous experiment and compared modeled results with observation-based results. Basin-average values were used in the comparisons to describe the deviations of calculated precipitation and air temperature in annual and monthly mean climate values. The conclusions are as follows.
As an indicator of air temperature bias, apparent evapotranspiration (precipitation minus runoff) is particularly useful in a study of annual average climate values. The lack of adequate representation of monthly averages suggests that runoff derived from observed river discharge is affected by slow groundwater flow, which is not adequately represented in the land surface model; this points to a direction for future improvement of the model. The runoff output from land surface models has received little attention to date, possibly because verifying runoff requires feeding the runoff into a river model and comparing the result with observed values. River routing models are developed mainly by hydrologists rather than by AGCM developers and climatologists. However, by employing basin-average values, it is possible to simultaneously evaluate the reproducibility or bias of AGCMs for runoff and climate variables, and we suggest this as a new method.

Figure 1. Conceptual diagram of the calculation method for a basin-average value. The circle represents a river discharge observation point; shaded areas represent the area upstream of the observation point; and gray lines represent the grid.

Figure 2. Global map of evaluation indicators for precipitation: (a) annual mean, (b) January, (c) April, (d) July, and (e) October (left, correlation coefficient; right, bias).

Figure 3. Scatterplots of calculated and observed average precipitation (top, annual mean; bottom left, January average; bottom right, October average). Colors indicate the Köppen climate classification: A, tropical; B, arid; C, temperate; D, subarctic; E, polar. Numbers in the upper left are indices for observed and calculated values (Table 3); red equations in the lower-right corners are for the regression lines.

Figure 4. Global map of the Theil-Sen slope of the Mann-Kendall trend test (top left, calculated annual mean precipitation; top right, observed annual mean precipitation; bottom left, calculated annual mean temperature; bottom right, observed annual mean temperature). The Theil-Sen slope for precipitation (top row) is almost zero, indicating that there is no long-term trend in precipitation.

Figure 5. Global map of the bias in average temperature (top, annual; middle left, January; middle right, April; bottom left, July; bottom right, October).

Figure 6. Scatterplots of calculated and observed annual mean temperatures (left, temperate; right, subarctic). Colors indicate basin area. Numbers in the upper-left corners are indices for observed and calculated values (Table 3). Red equations in the lower-right corners are for the regression lines.

Figure 7. Scatterplots of calculated and observed monthly mean temperatures in the tropical region (top left, January; top right, April; bottom left, July; bottom right, October).See Figure 6 for other details.

Figure 8. Global map of evaluation indicators for runoff: (a) annual mean, (b) January, (c) April, (d) July, and (e) October (left, correlation coefficient; right, bias).

Figure 11. Scatterplots of temperature bias and precipitation or apparent evapotranspiration bias for the annual mean: (a) the whole area, (b) tropical regions, and (c) temperate regions (left, precipitation bias; right, apparent evapotranspiration bias). See Figure 3 for details.

Figure 12. Scatterplots of temperature bias and precipitation or apparent evapotranspiration bias for January and July (top left, precipitation bias in January; top right, apparent evapotranspiration bias in January; bottom left, precipitation bias in July; bottom right, apparent evapotranspiration bias in July). See Figure 3 for details.

Table 3
Evaluation Indices