Systematic daytime increases in atmospheric biases linked to dry soils in irrigated areas in Indian operational forecasts

The representation of land–atmosphere coupling in forecast models can significantly impact weather prediction. A previous case study in Northern India incorporating both model and observational data identified atmospheric biases in a high‐resolution forecast linked to soil moisture that impacted the representation of the monsoon trough, an important driver of regional rainfall. The aim of the current work is to understand whether this behavior is present in operational forecasts run by the India National Centre for Medium Range Weather Forecasting (NCMRWF). We utilize satellite observations and reanalysis to evaluate model fields in June, July, August, and September forecasts from 2020. Our analysis reveals systematic rapid growth in warm boundary layer biases during the daytime over North West India, which weaken overnight, consistent with excessive daytime surface sensible heat flux. The cumulative effect of these biases produces temperatures more than 4K warmer in 60‐h forecasts. These effects are enhanced by dry surface conditions. The biases impact circulation in the forecasts, which have implications for regional rainfall. The spatial distribution of warm biases in the Indo‐Gangetic Plain is remarkably consistent with the location of areas equipped for irrigation, a process that is not explicitly represented in the model. Our results provide compelling evidence that the development of an irrigation scheme within the model is needed to address the substantial forecast biases that we document.


| INTRODUCTION
Soil moisture (SM) is an important element of weather forecasting in semiarid regions where SM exerts a strong control on the transfer of water and heat between the surface and atmosphere through moderating the partitioning of surface fluxes (e.g., Small & Kurc, 2003). This influences the thermodynamic properties of the planetary boundary layer and can impact the formation of clouds and precipitation through positive and negative feedbacks (e.g., Ek & Holtslag, 2004;Eltahir, 1998;Findell & Eltahir, 2003;Taylor et al., 2012).
The importance of SM in weather forecasting has drawn attention to the impact of irrigation. By adding water to the soil, irrigation can alter the near-surface climate and drive significant changes in rainfall through SM-atmosphere coupling (e.g., Douglas et al., 2009;Fletcher et al., 2022;Kang & Eltahir, 2019;Sacks et al., 2009;Yang et al., 2019). The feedbacks are also sensitive to the type of irrigation (Lawston et al., 2015). Differences in water application and management will alter the spatial and temporal variation of surface properties, which will modulate the spatial patterns and diurnal cycle of surface fluxes and, in turn, the thermodynamic properties of the lower atmosphere and terrestrial moisture budget. How this is represented in a forecast model can result in substantially different rainfall predictions (Devanand et al., 2019). Despite this, irrigation schemes are often absent from weather and climate models.
Northern India is one of the most irrigated regions in the world (FAO, 2014) and exhibits strong landatmosphere coupling (Koster et al., 2004). Recent model experiments for the Ganges river basin (Fletcher et al., 2022) and South Asia (Devanand et al., 2019) have shown that the inclusion and representation of irrigation can impact rainfall prediction. For example, irrigation can enhance daytime orographic precipitation by 10%-30% (Fletcher et al., 2022). However, operational forecasts for the Indian region are generated from a model that does not currently incorporate an irrigation scheme.
The current work considers the India National Centre for Medium Range Weather Forecasting (NCMRWF) 4 km regional model (NCUM-R), which provides operational forecasts for the India region. Our previous work (Barton et al., 2020, hereafter B20) examined a forecast from the 1.5 km NCUM for a single day during monsoon onset. The analysis revealed significant biases (compared to reanalyses data) in the monsoon trough linked to misrepresentation of wet (including irrigated) surfaces in North-West (NW) India. This raises concern that similar biases exist in the operational forecasts. The objective of the current work is to analyze an entire season of operational forecasts to determine (1) if the B20 case was a one-off or an example of systematic behavior and (2) more comprehensively assess the link to the surface. This paper is organized as follows. Section 2 provides model details and lists the datasets employed in the analysis. Section 3 presents an evaluation of model fields over South Asia, followed by a more detailed analysis of biases in NW India. Our results are then discussed in Section 4 with our main conclusions summarized in Section 5.

| DATA
This study analyses 3-day operational forecasts from the 2020 Indian monsoon season (June-September, JJAS) generated by the 4 km NCUM-R (see Supporting Information S1). NCUM-R is a regional version of the Met Office Unified model optimized for India. The forecasts are initialized at 00UTC daily inheriting initial and lateral boundary conditions from the $12 km global model (Kumar et al., 2020). The model time-step is 60 s with diagnostic variables output hourly. The global SM analysis assimilates observations of screen-level variables (2 m temperature and humidity) and ASCAT satellite SM (Dharssi et al., 2011;George et al., 2016). The analysis is prepared every 6 h by comparing the model near-surface atmospheric state to screen-level observations. Differences are reduced by uniformly adding (removing) SM, constrained by satellite observations, from the nearsurface soil layers, to increase (decrease) evapotranspiration and hence moisten (dry) and cool (warm) the model. The resulting SM field is downscaled to 4 km to provide an initial state for the regional forecast. Vegetation status and surface elevation are derived from the 30 m Indian Space Research Organization land-use land-cover map and 90 m NASA Shuttle Radar Topographic Mission digital elevation map, respectively. Irrigated land cover type and irrigation processes are not explicitly represented; however, increased SM in irrigated areas may be partially captured by the SM analysis described above (e.g., Tuinenburg & de Vries, 2017). The model uses 80 vertical levels with a top at 38.5 km and 14 levels below 1 km. Convection is resolvable and subgrid scale deep-convection is not parametrized. More details can be found in Jayakumar et al. (2017aJayakumar et al. ( , 2017b. To evaluate initialized SM in the model, we employ 9 km SMAP L2 observations (Chan et al., 2018;O'Neill et al., 2021). We use measurements from the descending overpass at 00UTC (0530LT). The satellite covers a different swath of the country each day with a 4-day return time. SMAP has outperformed other satellite SM products (Chen et al., 2018). We also evaluate land surface temperature (LST) from the first day of each forecast using two independent satellite products, microwave and infrared. First, 10 km L2 microwave observations from the ascending overpass (1330LT, UTC + 5.5) of AMSR2 (de Jeu & Owe, 2014;Owe et al., 2008) with a 2-day return time. It is necessary to filter this dataset to minimize the impact of rainfall on the surface signal, as the 36.5 GHz channel used for LST retrieval is sensitive to water droplets in the atmosphere (Gao et al., 2008;Holmes et al., 2009). This is done by excluding all pixels with rainfall rates greater than 1 mmÁh À1 between 1300 and 1400LT in the microwave only GPM IMERG L3 HQ precipitation product (Huffman et al., 2019). Microwave LSTs provide measurements for all-sky conditions, though may contain relatively large errors ($4-5K, Holmes et al., 2009;Zhong et al., 2021). Second, 0.05 L3 infrared MODIS observations from the daytime overpass (1030LT, UTC + 5.5) of the Terra satellite (Hulley & Hook, 2021) with a 2-day return time. Infrared LSTs can have very high accuracy (<1K, Wan, 2008), but cannot observe the surface under cloudy conditions, a particularly strong constraint over India during the monsoon season.
To evaluate the atmospheric variables and surface fluxes in the model, we employ the most recent ECMWF reanalysis product ERA5 (Hersbach et al., 2018a(Hersbach et al., , 2018b. This provides hourly data at 0.25 resolution.

| South Asia surface evaluation
We find that top-level SM is a dominant driver of surface flux partitioning (see Supporting Information S1). We would therefore like to evaluate model SM with comparison to observations. For this, we employ initialized (first time-step 0530LT) top-level SM from each forecast and coincident satellite observations from SMAP. It is not possible to do a direct quantitative comparison between modeled and satellite SM as the two quantities have different physical meanings (e.g., depth of soil). Instead, we convert SM from the model and observations to a wetness index (WI) by expressing each value as a percentage of the wettest pixel in the domain. This allows us to perform a qualitative comparison of spatial features in each dataset and supplement this with a quantitative comparison of LST (discussed below). Figure 1a-d presents monthly JJAS mean differences (NCUMÀSMAP) between these fields. This highlights areas where the model is wetter/ drier than SMAP estimates relative to the rest of the domain. This comparison must be interpreted with caution, not only due to the qualitative nature of the calculation, but also due to uncertainties in the SM observations. SM is a particularly challenging variable to observe remotely, and SM products tend to have higher uncertainties than other variables (e.g., LST). However, Figure 1a-d reveals that the model appears to be drier than the observations (relative to other areas) in NW India and Pakistan. The drier surfaces in NW India (relative to other areas) are more extensive in June and July compared to August and September.
Due to the limitations of the SM evaluation described above, we also evaluate model LST, which can be directly compared to satellite observations, to look for consistent spatial structures. LST is sensitive to soil (and vegetation) water content and can therefore provide information on surface wetness (e.g., Anderson & Kustasd, 2008;Fisher et al., 2020). For this, we employ early afternoon (1330LT) and late morning (1030LT) LST from the first day of each forecast and coincident satellite observations from AMSR2 and MODIS, respectively. Figure 1e-l presents monthly JJAS mean differences (NCUMÀAMSR2/MODIS). The AMSR2 bias-maps show a >10K warm bias over NW India and Pakistan in June which extends across the country and into Bangladesh in July. The bias weakens over NW India in August and September but persists over Pakistan and Bangladesh, although the latter becomes less extensive in September. The quantity of MODIS observations are limited due to cloud cover; however, where we do have sufficient cloud-free data (in the NW of the domain), we see a consistent warm bias in NW India and Pakistan. The negative LST biases in the MODIS bias-maps may not be represented in the AMSR2 bias-maps as AMSR2 is known to be colder than other sensors (Boori et al., 2015).
Comparing the SM and LST evaluations, it is evident that the strongest dry soil biases, found in NW India and Pakistan, coincide with warm LST biases. This supports our initial interpretation that the surfaces in NCUM-R are drier than observations in this region. Our previous work, B20 identified atmospheric biases linked to drier surfaces in this area. Away from the arid North-Western part of our domain, we do not observe a strong negative correlation between WI and LST biases. This is linked to reduced sensitivity of surface fluxes (and LST) to SM in moister regions ( Figure S1).

| South Asia atmosphere evaluation
We now investigate whether atmospheric biases are present in regions where the surface state in the model is biased relative to independent satellite observations. We employ the same fields as B20, 925 hPa temperature, and mean sea level pressure (mslp). We compare 60 h changes in these variables in the forecasts and ERA5 reanalysis data (1730LT Day 3 À 0530LT Day 1). Figure 2 shows JJAS monthly mean biases (NCUMÀERA5). There is a strong warm bias over NW India and Pakistan in the region of the heat-low/monsoon trough throughout the season. This indicates that NCUM-R is systematically simulating a more intense heat-low/monsoon trough (higher low-level temperature, lower mslp) than the reanalysis. It could be interpreted that the B20 case study was an example of typical model behavior rather than a one-off occurrence. The trough is an important dynamical feature of the Indian monsoon impacting circulation and regional rainfall. Monthly mean JJAS biases in nighttime 925 hPa wind vectors near the end of the forecast (Day 3 2330LT) are overplotted on the pressure field in Figure 2e,f. This illustrates the impact of the temperature biases on the heat-low circulation, which we expect to be largest overnight (Racz & Smith, 1999). The bias vectors indicate increased flow toward the region of low-pressure biases with magnitudes up to 11 ms À1 (compared to monthly mean NCUM-R wind speeds up to 16 ms À1 ) in the NW box (box in Figure 2a). Such circulation biases are expected to have significant impacts on the atmospheric moisture budget and predicted rainfall. It is therefore vital to understand factors contributing to atmospheric biases in NCUM-R over NW India.

| Diurnal cycle of biases
We now focus on NW India, where the low-level temperatures in NCUM-R are particularly biased compared to ERA5. Taking a 10 Â10 box (70 E-80 E, 24 N-34 N, shown in Figure 2a), we first examine the hourly evolution of selected area-average atmospheric and surface variables in the forecasts compared to ERA5 (Figure 3).
While the magnitudes of biases vary by month, the hourly evolution follows a similar diurnal cycle throughout the season. Focusing on the temperature field (Figure 3a), it is clear that the warm bias increases during the daytime, when there is active surface heating. The warm bias rises rapidly after sunrise reaching maximum daily values around 1200LT on the first day and 1630LT on the second and third days. Daytime sensible H (latent LE) heat flux is higher (lower) in NCUM-R compared to ERA5 (Figure 3c,d). However, it is important to note that there is far larger uncertainty in fluxes within ERA5 than in nearsurface air temperature, a variable that is well constrained by observations. Bearing that in mind, we note that NCUM-R substantially under-predicts EF compared to ERA5, consistent with the former developing a strong daytime warm bias. Also of note in Figure 3a is the positive trend in daytime warm biases from Days 1 to 3, consistent with decreasing EF in NCUM-R. Indeed NCUM-R top-level SM dries out on average by 2.5 kgÁm À2 over the 3-day forecasts (not shown), driving stronger atmospheric warming. Overall, the characterization of excessive daytime warming, relaxing back to lower temperatures overnight, provides clear evidence that the model warm bias originates in response to the diurnal cycle of solar forcing. Furthermore, EF biases point to errors in the partition of insolation as the cause of the warm bias, with excessive sensible heat produced at the expense of latent heat, as in our B20 case study. Over the course of 3 days, model SM dries out, driving EF even further from the ERA5 depiction, and progressively increasing the warm bias.

| Spatial distribution of daytime bias changes
On average, the temperature biases over NW India increase during the daytime, consistent with surface flux biases. We now examine the spatial pattern of the warm biases, quantified by two metrics. First, we compute the daytime change in 925 hPa temperature bias (bT) averaged over the 3-day forecast, We also quantify the change in bT between Days 1 and 3, sampled at the time of their peak (1230 and 1630LT, respectively), Both of these metrics indicate (Figures 4 and S2) that the strongest warm biases occur during the monsoon onset month of June, with the effects maximized over the Indus and Ganges basins. The Indus signal in Pakistan mostly persists throughout the season, while over the NW India portion of the Ganges, there is a progressive weakening as the season progresses. The warm atmospheric biases in both regions tend to coincide with warm surface biases. In NW India, the surface bias weakens markedly over the monsoon season, consistent with the improving air temperature forecasts. The similarity in the surface and atmospheric structures over these two regions provides further evidence that it is surface flux biases, driven by excessively dry soils (Figure 1) which F I G U R E 4 Northern India monthly average atmospheric bias changes NCUM minus ERA5 (left and central column) and land surface temperature bias NCUM minus AMSR2 (right column) for June (a), (b), and (c), July (d), (e), and (f), August (g), (h), and (i), and September (j), (k), and (l). Left column: daytime bias change (1730 À 5300 LT, UTC + 5.5) averaged over 3-day forecast, central column: change in peak bias (Day 3 À Day 1). The box in (c) indicates the area used to sample rainfall in our definition of dry and wet forecasts.
are driving the atmospheric biases. Although excessive advection of warm, dry air from the Middle East may play an additional role, that atmospheric mechanism cannot explain the finer scale structure depicted in Figures 4  and S2, nor the diurnal cycle of biases in Figure 3.
To further explore the role of surface hydrology on the atmospheric biases, we created two subsets containing the 30 driest and wettest forecasts of the season, based on 72 h accumulated rainfall averaged over 27 N-33 N, 70 E-80 E (box in Figure 4c). This area contains the maximum surface and atmospheric biases. Figure 5 shows the mean ΔbT Daytime and ΔbT Peak for dry (a, b) and wet (c, d) subsets. Black stippling indicates where the mean biases in the dry forecasts are significantly higher than the wet forecasts according to a single-tailed t test (p < 0.05). The dry forecast means show distinct spatial patterns of warm bias increases, which are significantly weaker in the wet forecast means. In particular, there are strong warm bias increases over the Indo-Gangetic Plains (IGP), and predominately zero bias changes over the Thar Desert (TD) bordering Pakistan. The results in Figure 5 provide additional evidence of the surface hydrological origins of the warm biases. Little or no rainfall during the forecast will exacerbate the dry soil problem and therefore enhance excessive daytime heating. During wetter forecasts on the other hand, SM is replenished, allowing the surface and boundary layer to remain relatively cool during the daytime.
The IGP are characterized by extensive areas equipped for irrigation (Figure 5e). During the monsoon season, Jha et al. (2022) suggest that irrigated fields account for more than 80% of the area. There is no explicit representation of irrigation in the model, and we therefore expect it to substantially underestimate (overestimate) latent (sensible) heat flux in irrigated areas. This expectation is realized by the consistency in location of the largest warm biases with irrigated areas in the otherwise arid NW portion of the domain. Alternative possible mechanisms, notably rainfall biases contributing to a dry initial soil state, cannot explain the fine-scale structure of the LST bias (Figure 4), which in Pakistan in particular, very closely resembles the irrigated land along the Indus River.

| DISCUSSION
Our analysis has revealed systematic boundary layer warm biases (compared to ERA5) in the region of the monsoon trough. The warm biases develop during the day in locations where paddy irrigation is widespread during the monsoon season. The spatial and temporal signature of warm bias increases points to an underestimate of evapotranspiration and excessive surface heating, linked to dry soil biases which could be driven by a lack of explicit representation of irrigation in NCUM-R. The warm bias increases are particularly clear during drier periods of the season, when the lack of rain contributes to enhanced surface heating. Systematic circulation biases in the global model may also contribute through excessive advection of warm air from the Middle East, though this does not explain either the fine-scale structure or diurnal cycle. F I G U R E 5 Northern India average atmospheric bias changes NCUM minus ERA5 for the 30 driest (a) and (c), and 30 wettest (b) and (d) forecasts. (a) and (b) Daytime bias change (1730 À 0530 LT, UTC + 5.5) averaged over 3-day forecast, (c) and (d) Change in peak bias (Day 3 À Day 1). (e) Area equipped for irrigation around the year 2005 (percentage of total area in grid cell; Siebert et al., 2013). Black stipples in (a)-(d) indicate significance at p < 0.05 compared to alternative subset mean. To highlight significant features rather than grid-scale significance, the stipples are plotted at 0.5 resolution.
The current analysis evaluates atmospheric fields in comparison to ERA5, with our key results based on temperature biases. Although we would expect ERA5 nearsurface temperatures to be a good representation of reality, as they are well constrained by observations, ERA5 is still a model with lower resolution than NCUM-R (0.25 vs. 4 km). Additional analysis by colleagues at NCMRWF using the 4 km NCUM-R analysis (Hashmi Fatima, pers. comm.) indicates similar patterns of warm bias developing over the daytime and the 3-day forecast. This gives us confidence that, while the magnitudes may be representative of errors in both NCUM-R and ERA5, the evolution of biases (daytime and 3 days increases) is representative of a physical issue within NCUM-R.
Our work highlights the importance of accurate surface representation in operational forecasts. Efforts at NCMRWF are already being made to improve SM initialization in NCUM-R by assimilating observational data directly into the regional model (rather than downscaling from the global model) with positive results from case studies (Lodh et al., , 2022. Assimilation of observations (including SM) can capture wet features (including irrigation) in the first time-step (e.g., Tuinenburg & de Vries, 2017) but does not ensure correct model behavior for the duration of the forecast. Our finding that areas equipped for irrigation coincide with the largest warm bias increases suggest that issues in the land model structure also need to be addressed to take full advantage of improvements in SM initialization. In particular, the JULES land surface scheme employed within the NCUM takes no account of irrigation. This is crucial because the current global assimilation system performs a bias correction between the observed SM (from ASCAT) and the simulated top-level (0-10 cm) SM (G omez et al., 2020). Although ASCAT detects the presence of wet soil in the irrigated zone, the bias correction process takes a fraction of the observed SM anomaly (from a monthly climatology) and adds it to the JULES climatology. As there is no irrigated water in JULES, the model climatology tends to be quite dry in the IGP. Thus with this particular implementation of the ASCAT assimilation, the substantial bias between the observed and simulated SM remains largely uncorrected (Kumar et al., 2015). There are ongoing efforts at the UK Met Office to operationalize a basic irrigation scheme within JULES (Heather Rumbold, Pers. Comm.). Numerical experiments utilizing this scheme would be beneficial to support our findings.

| CONCLUSION
Our results reveal systematic warm atmospheric biases which develop during the daytime and accumulate over the 3-day forecast in the region of the monsoon trough (NW India). This impacts regional circulation, in particular the convergence zone associated with the trough, which has implications for regional rainfall and largerscale monsoon circulation. The spatial distribution of biases coincides with extensive areas of dry soil bias in regions equipped for irrigation. Therefore our results strongly suggest that an irrigation scheme should be implemented in NCUM-R, which could improve the operational forecasts.