Remote Sensing of Global Daily Evapotranspiration based on a Surface Energy Balance Method and Reanalysis Data

Currently available evapotranspiration (ET) products have not provided us with an accurate estimation for global irrigated land area. Thermal energy balance (EB) model has the potential to solve this problem. The accurate estimation of aerodynamic resistances is a major complication in most remote sensing ET models. An EB model using a column canopy‐air turbulent heat diffusion method was developed to more realistically depict dynamic changes in aerodynamic resistance. In order to estimate global ET and land surface fluxes for all weather conditions, Moderate Resolution Imaging Spectroradiometer (MODIS) Aqua and Terra land surface temperature fields were combined and a nearest‐evaporative‐fraction gap‐filling method was merged into the EB model. A global ET product covering the period 2003–2017 was produced using the EB model. By combining thermal and optional information from MODIS satellites and surface meteorological forcing data from ERA‐Interim reanalysis data, the EB model provides a 5 × 5 km resolution estimate of daily ET without spatio‐temporal gaps at the global scale. Assessment of the daily EB ET at 238 flux sites showed a mean bias of 0.05 mm/day and an RMSE of 1.56 mm/day. Analysis of global precipitation minus ET demonstrated that EB ET has a relatively higher potential for agriculture water resource management than currently available global ET products, such as Landflux, GLEAM, MOD16, GLDAS, and ERA‐Interim ET. In addition, the EB model developed in this study can be applied to both polar and geostationary satellite thermal sensors.

The estimation of surface resistance is a major issue in most remote sensing ET models, especially the estimation of canopy and aerodynamic resistances. The LSA-SAF (Land Surface Analysis Satellite Applications Facility) model uses a numerical method to solve the energy balance equations over a simplified land classification of the Earth's land surface (e.g., high canopy, low canopy, bare soil, or snow surface) and uses a land-cover-based fixed roughness length or aerodynamic resistance value to describe heat and water exchange between the surface and atmosphere, for example, Martins et al. (2019). MOD16 (Mu et al., 2011), CSIRO (Zhang et al., 2010), and PT-JPL (Fisher et al., 2008) global ET models have developed empirical solutions for canopy resistance. It has been reported that ET value differences among models are mainly due to differences in the estimation of aerodynamic resistance (Bhattarai et al., 2018;Su et al., 2020). In addition, most estimation approaches do not address the dynamic changes in aerodynamic resistance with roughness and stability.
The EB method applies a parameterized aerodynamic resistance method to estimate the turbulent sensible heat flux (H) (Chen, Massman, & Su, 2019) and ET. Remotely sensed land surface physical properties are used to constrain the aerodynamic resistance, which allows the evolution of the EB model's aerodynamic resistance in space and time. The aerodynamic resistance method in the EB model uses column canopy information to describe canopy-air turbulent heat diffusion and corrects the typical problem of the underestimation of sensible heat within the surface energy balance system (SEBS) (Chen, Massman, & Su, 2019;Michel et al., 2016). The EB model uses the difference between the thermal land surface temperature (LST) and the overlying air temperature (Ta) to infer the evaporative fraction (EF). Daily ET is calculated as EF multiplied by daily available energy. As the surface-to-air temperature difference is used to infer stability, the EB model indirectly includes the stability effect in its calculation of ET. The performance of SEBS and the EB model when forced by satellite data, has not been compared before this study. In this study, we also assess their relative performance at global scales using satellite-derived input data.
The success of the EB model relies on the availability of existing satellite data to provide high-quality global LST estimates. The LST can be retrieved from either microwave or thermal satellite sensors (Holmes et al., 2018;Sobrino et al., 2020). Microwave remote sensing is not limited by cloud cover, unlike thermal remote sensing, which makes it possible to use microwave-based LST to estimate ET in all-sky conditions; however, this comes at the cost of coarse resolution (Holmes et al., 2018). Because of cloud cover, thermal infrared estimates of LST, which are available at high spatial resolution, suffer large spatio-temporal gaps. To overcome this problem, a nearest-day EF gap-filling method was used to calculate ET during cloudy days when satellite thermal data are unavailable. The EB method developed in this study can estimate ET for all weather conditions to provide gap-free ET and land surface flux and ensure complete temporal and spatial coverage.
The aim of this study was to produce a seamless global daily ET product using the EB model. The polar-orbiting Terra and Aqua satellites carrying the MODIS (Moderate Resolution Imaging Spectroradiometer) sensors pass over each point on Earth at least once per day. Chen et al. (2014) present an initial regional evaluation of the EB model over mainland China using observations from MODIS. This study extends the CHEN ET AL. 10.1029/2020JD032873 2 of 22 use of the EB model to the global scale, at a daily time scale, using MODIS data from both the Terra and Aqua platforms. In the EB model, self-preservation of EF during the daytime (Gentine et al., 2007(Gentine et al., , 2011 was used as a foundation to calculate the daily ET, which was computed from instantaneous H and LE estimates at the overpass time of the satellite. All the remote sensing input data in this work were based on MODIS. While MODIS was used in this study, the methods described here could be easily extended to polar thermal sensors, for example, VIIRS (Visible Infrared Imaging Radiometer Suite), in the future. The structure of the remainder of this study is as follows. Further details of the EB model and input data are described in Section 2. The enhancements made to the EB model are also described in Section 2. In Section 3, evaluation results and climatological analysis of the global-scale evaporative fluxes are presented. Section 3 also discusses the performance of the gap-filling technique using measurements from 238 flux towers over the world. Conclusions and Discussion are presented in Sections 4 and 5.

Energy Balance ET Model
The foundation of our EB model is the SEBS (Su, 2002), with modifications to the sensible heat computation described in Chen, Massman, and Su (2019). Tables 1 and 2 list the input and output variables, respectively, used in EB and SEBS, highlighting differences in the model formulations. For example, the canopy-air turbulent transfer parameterization in SEBS has been significantly modified in EB (Chen, Massman, & Su, 2019;Chen, Su, Ma, Yang, Wen, & Zhang, 2013). SEBS could be taken as a 'big leaf' model, whereas the EB model uses a column canopy-air turbulent diffusion method to simulate the turbulent flux. Both model performance on global ET estimation were also compared in Section 3.1.    Massman (1999) and Blümel (1999). In the EB configuration used here, roughness length was tied to landcover type, and a different method of gap filling was employed.  (Gentine et al., 2007(Gentine et al., , 2011. Daily net radiation ( d Rn ) was retrieved from reanalysis data.
Instantaneous net radiation flux (Rn) at the MODIS overpass time was computed as follows: Residual of surface energy balance Same as SEBS, with different H and Abbreviations: EB, energy balance; SEBS, surface energy balance system.
The sensible heat flux (H) [W/m 2 ] was computed using the Monin-Obukhov similarity theory (MOST) correction for the aerodynamic resistance (Chen, Massman, & Su, 2019): z is the heat roughness length (see below) [m]; and  h is the sensible heat stability correction function (Brutsaert, 1999). Here,  a was extrapolated to height r Z from the ERA-Interim 2 m air temperature using MOST assuming neutral stability conditions. The integral term on the right side of Equation 4 is the roughness sublayer corrections for sensible heat (Mölder et al., 1999), which is necessary when r Z [m] is in the roughness sublayer above the canopy top; and L is the Obukhov length [m]. The height of r Z is usually within the roughness sublayer over high forests. It is therefore necessary to include the roughness sublayer corrections for forest land cover (Cellier & Brunet, 1992;De Ridder, 2010;Garratt, 1980;Mölder et al., 1999). The roughness sublayer correction method from Physick and Garratt (1995) was used here according to Chen, Massman, and Su (2019). 0 d is derived from canopy height h and the wind speed extinction coefficient (Massman et al., 2017). The within-canopy wind momentum transfer model was used to estimate 0 d and 0m z : Readers are referred to Massman et al. (2017) for the calculation of   The roughness length for heat ( 0h z ) includes a correction from the momentum roughness length (Verhoef et al., 1997): kB at each pixel was derived using the fractional canopy coverage: Because vegetation height (h, [m]) is used to estimate roughness length, it is critical for accurate sensible heat estimates (Chen, Su, Ma, Yang, Wen, & Zhang, 2013). To estimate h in this study, we applied a recent global forest height map, which was produced by lidar from remote-sensing platforms (Simard et al., 2011). Because a short canopy height (e.g., maize, rice, and wheat) usually exhibits strong seasonal and interannual variations, we applied an Normalized Difference Vegetation Index (NDVI)-based estimate of short canopy height from Chen, Su, Ma, Yang, Wen, and Zhang (2013): are the minimum and maximum canopy height for a specific land cover type (LCT), respectively; min _ LCT h is set to 0.002 m (Chen, Su, Ma, Yang, Wen, & Zhang, 2013); and max _ LCT h for savannas (including woody savannas), cropland, grassland, shrubland, barren, and sparsely vegetated pixels were each given a different value (Chen et al., 2014). MODIS IGBP (International Geosphere-Biosphere Programme) land-cover type (MCD12C1) for 2010 was used to classify the pixels. value to calculate the short-canopy height. This ND-VI-based short-canopy-height was used to replace the laser forest canopy height estimate for vegetation shorter than 10 m. By merging the high-canopy heights and NDVI-based short-canopy data, we constructed daily dynamic maps of canopy height for the period 2000-2017. These were then used to estimate roughness length.
The ground heat flux 0 G [W/m 2 ] was assumed to be proportional to Rn: where  c  c = 0.05 (Monteith, 1973) and  s = 0.315 (Kustas & Daughtry, 1990) were empirical coefficients for full vegetation cover and bare soil, respectively. An interpolation was performed between the two limiting cases using vegetation coverage c f .
using the method from Massman (1999). For snow, ice, and glacier area identified by MODIS IGBP land cover, we used the equation 0 G = 0.05 Rn (Chen, Su, Ma, Yang, & Wang, 2013). The EF was calculated as follows: The daily ET was computed by assuming a constant evaporative fraction, after deriving G 0 , H, and LE and taking into account the energy and water limits (Su, 2002). Experimental studies have shown that EF is nearly constant during clear-sky daytime conditions (known as daytime self-preservation) (Crago & Brutsaert, 1996;Gentine et al., 2007Gentine et al., , 2011Nichols & Cuenca, 1993;Shuttleworth et al., 1989). EF was estimated at the satellite overpass time. To define the daily integrated ET, we simply assumed that EF is preserved during the day, which means that EF d = EF (Delogu et al., 2012). Peng et al. (2013) showed that instantaneous EF evaluated between 11:00 and 14:00 LT (local time) can generally represent the daytime average EF. The daily constant EF method has been shown to work well in both clear and cloudy situations, especially compared with other methods (Jiang et al., 2018). Assuming daytime and nighttime 0 G to cancel each other out, daily ET is then: where Nt i and Ns i are the net thermal and net shortwave radiation [W/m 2 ] for each three hours (0, 3, 6…21), respectively, obtained from the ERA-Interim data set.
The instantaneous EF field is constrained by the single daytime LST observation field (demonstrated by Equations 4 and 11). When there is no valid EF because of missing LST or poor quality of input data, EF from a previous day is taken as a prior-knowledge for the current day's EF. This makes EF a temporally continuous variable. Gap filled EF values were assigned a special flag, which could be used to check the CHEN ET AL.
10.1029/2020JD032873 6 of 22 influence of EF gap-filling method on the quality of the ET data. Section 3 will give analysis on the accuracy of gap-filled and non-gap-filled ET data using the flag.

Description of Forcing Datasets
The EB model inputs for the global ET calculation were taken from MODIS remote sensing data as much as possible, whereas the other data were obtained from reanalysis data (here ERA-Interim was used). Because of the influence of clouds, there are frequent spatio-temporal gaps in satellite visible and thermal band data. A variety of gap-filling algorithms have been used to produce continuous MODIS satellite-sensed variables Moody et al., 2005). MODIS sensors have the unique advantage of providing consistent measurements for a wide range of land surface variables and also provide observations for more than 18 years. Both MODIS Terra and Aqua sensors provide LST information at daily scale with global coverage. Gap-free albedo and LAI products from the Global LAnd Surface Satellite (GLASS) data set (Liang & Liu, 2012) were used to calculate the impact of canopy phenology on the momentum and heat transfer resistance. Thus, the momentum and heat transfer resistance have a diurnal and seasonal dynamic variation, which will further influence the turbulent heat and water transfer. The EB model has also been improved based on the IGBP land cover classification (Chen, Massman, & Su, 2019  available sensors used in the model, the daily ET product at 0.05° × 0.05° (latitude × longitude, ∼5 km at equatorial region) resolution was set as the final goal for this study, using data listed in Table 3. The daytime overpass times of Terra and Aqua are 3-4 h apart. EFs from both Terra and Aqua could represent the daytime mean EF, since EF is nearly constant throughout the day in the absence of stress-induced stomatal closure. Gaps in the LST products will create gaps in the EF retrieval. To minimize gaps in the daytime global LST map used in EB, daytime LST from both Terra and Aqua were jointly used. There are two ways to use the MODIS daytime LST datasets for ET or EF calculation. The first method is to combine Terra MOD11C1 and Aqua MYD11C1 daytime LSTs and use the combined LST for daily ET calculation with the EB model, represented by EB (Terra &Aqua). Another method is that ET could be retrieved by using the same EB ET model (including the EF gap-filling method) with Terra MOD11C1 or Aqua MYD11C1 daytime LSTs separately, represented by EB (Terra) and EB (Aqua), respectively. The EB (Terra) and EB (Aqua) estimates can then be averaged to represent daily mean ET. In this study, both methods were used to produce global daily ET. The differences between these methods are analyzed in Section 3.1.

Preprocessing of Input Data
The daily Terra (MOD11C1) and Aqua (MYD11C1) LST products available at 0.05° × 0.05° resolution for climate modeling (Wan & Li, 2008) were used in this study. Ideally, MOD11C1 and MYD11C provided us with two daytime global LST observations per day. Our results (later shown in Figure 4)   time). Correspondingly, Terra daytime LST (MOD11C1.005) was used as a background (Figure 1) to combine Aqua daytime LST (MYD11C1.005) in this study. The gaps in Terra MOD11C1 were filled by valid Aqua MYD11C1 data to build the daytime LST field. Figure 1 shows daytime LST using Terra MOD11C1 and Aqua MYD11C1 data for one particular day. Along with these composite LST maps, composite satellite acquisition time maps were constructed from the information in the MOD11C1.005 and MYD11C1.005 products. These time maps were used to extract meteorological information at each pixel at the time of the LST data acquisition. Although the LST combination procedure provides fields with considerably fewer gaps than those that would be obtained if only one satellite (Terra MOD11C1 or Aqua MYD1C1) were used, some areas continue to contain missing data due to persistent cloud cover (e.g., parts of the Amazon, Southeast Asia, Middle and Eastern Africa in Figure 9). In future research, we will introduce EF temporal filling method to fill these missing pixels in the final ET product.
European Center for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalysis data (Dee et al., 2011) were used for the meteorological forcing. To match the pixel size of satellite data, all the 0.125° ERA-Interim data (downloaded from http://apps.ecmwf.int/datasets/data/interim-full-daily/levtype=sfc/) were linearly interpolated to 0.05° spatial resolution. When transferring ERA-Interim 2 m air temperature from 0.125° to 0.05°, we accounted for adiabatic variations due to topography as follows:  ) was used to resample ERA-Interim 2 m air temperature from 0.125° to 0.05°. A lapse rate of 6.5 K/km was adopted for the orography adjustment of air temperature. We found that the elevation difference between 0.125° and 0.05° grids caused a >5 K difference in the 2 m air temperature for hilly areas. Because a 5 K bias of air temperature is not tolerable for sensible heat flux calculation, we used the orographic correction to adjust the air temperature. Topographic calibration of LWD was also conducted with the following equation:  , unit: K) was adjusted to the temperature at 10 m above the canopy top using a moist adiabatic lapse rate (6.5 K/km): The surface potential temperature ( a ) was derived with: where 10-m wind speed ( 10 U , unit: m/s) from the reanalysis was interpolated to be 10-m wind speed above the canopy (U ), using surface logarithmic profile assuming neutral stability: The spatially interpolated-humidity was directly used to calculate air density. The ERA-Interim data set provides surface air temperature (T2m), surface air specific humidity (Q), surface air pressure (p), and surface wind speed ( 10 U ) every 2 h, and surface downward short-wave (SWD) and downward long-wave (LWD) radiation every 3 h. The spline temporal interpolation method was used to obtain values of these variables CHEN ET AL.
10.1029/2020JD032873 9 of 22 at the overpass time of Terra and Aqua satellites. These pre-processing steps were used to ensure that the variables measured by the satellites and the ERA-Interim meteorological input data had been corrected to the same time and the same spatial resolution prior to their use in the EB model.

Evaluation Datasets and Methods
To evaluate the ET results, we used eddy-covariance data from the FLUXNET network, which provides in-situ ET estimates covering several continents including Australia, China, Europe, and the US, as well as various biomes and climates. The FLUXNET data were quality checked for temporal gaps and energy balance closure. ET data were used only from: (a) days with less than 10% gaps in its 48 half-hourly data record; and (b) stations with less than 40% lack closure in their energy balance. This quality check generated a total of 238 reliable sites from all the flux sites of FLUXNET2015 Data set, Ozflux (http://www.ozflux.org.au/), European Fluxes Database Cluster (http://www.europe-fluxdata.eu/), and the ChinaFlux network. Level 2 data from FLUXNET2015 were used in the validation exercise. It should be noted that the footprint of many of these sites may not be representative of MODIS pixels at 0.05° scale because of local surface heterogeneity. However, our aim at this point was to use flux data as much as possible based on the assumption that despite the small footprint of the flux observations in comparison with a 0.05° pixel, the temporal stability of both datasets enables a meaningful comparison. Given the inclusion of 238 global observation sites, this comparison is similar to the inverse of the widely used space-for-time approach in ecosystem modeling.
The Global Land-surface Evaporation: the Amsterdam Methodology (GLEAM) has previously been taken as a reference for comparing other ET products (Wang et al., 2017). It is used to assess the latitudinal variation of our EB ET. An accurate estimation of ET for irrigated area is important for water use efficiency. In order to compare ET performance for the irrigated area on Earth, the following available global ET datasets were also included in this study: MOD16 (Mu et al., 2011), Landflux (Mueller et al., 2013), GLDAS (Global Land Data Assimilation System) (Rodell, Houser, et al., 2004), GLEAM , and ERA-Interim (Dee et al., 2011). MOD16 uses a visible remote-sensing-based Penman-Monteith method and produces 1 km 2 ET data set at an 8-day interval. Landflux is a synthesis coarse product (with 1° × 1° grid size) which merges land surface models, and diagnostic and reanalysis ET. GLDAS is a land surface model simulation data set assimilating observations of land surface states, with a spatial resolution of 0.25°. GLEAM calculates ET for 0.25° grid via the Priestley-Taylor method using precipitation, surface soil moisture, and vegetation water content from remote sensing. ERA-Interim is a land-atmosphere coupled numerical simulation data set. It assimilates atmospheric state variable information from satellites and meteorological observations, and provides outputs at 0.125° resolution. These ET models represent different methods for ET calculation. MODIS LST is not included or assimilated in any of the ET models, except EB ET. This provides us with a potential chance to differentiate the role of LST plays on the ET. Chirps precipitation (Funk et al., 2015) was also used to do the water balance analysis, which does not include runoff in our study. Meanwhile, this method has a potential to identify irrigated areas.

Global Comparison of SEBS and EB
The results in Figure S1 demonstrate that the EB model had decreased the sensible heat underestimation by SEBS at flux tower grids. Figure S1 shows that the linear fitting slope between simulations and observations increased from 0.75 (SEBS) to 0.97 (EB), root mean square error (RMSE) decreased from 53.2 (SEBS) to 47.6 (EB), and R 2 increased from 0.19 (SEBS) to 0.23 (EB). This comparison of sensible heat flux derived from satellite data between SEBS and EB demonstrates a similar conclusion to that derived from the station forcing data by Chen, Massman, and Su, 2019.  Figure 2 shows that SEBS ET without EF gap filling resulted in ET underestimation at the global scale. The gap-filled SEBS ET data, however, contain higher values than the EB and GLEAM results in most areas worldwide. The reason for higher ET values from SEBS has been explained in previous publications, which shown that the roughness parameterization method in SEBS tends to produce underestimates of sensible heat flux and overestimates of ET or latent heat flux (Chen, Massman, & Su, 2019;Michel et al., 2016). The lower ET in the global EB estimation (Figure 2c) is due to the improvement described in Section 2.1, and is consistent with the in-situ evaluations (Chen, Massman, & Su, 2019). Figure 3 shows global maps of 10-year average annual ET calculated from the daily ET result of EB (Ter-ra&Aqua), EB (Terra), and EB (Aqua). The mean value of annual ET of EB (Terra) and EB (Aqua) is also given in Figure 3b. These global ET maps generally show a consistent spatial distribution (e.g., low ET values in arid regions, intermediate values across mid-latitude forests and agricultural regions, and the highest values in the humid tropics). They also show that the Amazon, Malaysian, and Indonesian tropical rainforest had the highest ET values, whereas the Sahel desert, and the Middle East had the lowest ET because of soil moisture limitations. Figure 3e shows that the four ET products had a similar latitudinal variation to that of GLEAM. GLEAM ET was linearly interpolated to 0.05° to assess the latitudinal variation of EB ET. EB ET do not have values for inland lake areas. The same lake mask was applied to the 0.05° EB and GLEAM ET datasets, allowing the comparison of their latitudinal mean values. The discrepancies were small among the four ET products: EB (Terra&Aqua), EB (Terra), EB (Aqua), and mean of EB (Terra) and EB (Aqua). EB CHEN ET AL.  (Terra&Aqua) was closer to GLEAM in the latitude band of 5°N and 20°S than are EB (Terra), EB (Aqua), and their mean. Figure 4 presents a comparison of EB ET products against eddy-covariance ET. Although EB (Terra) performance was slightly better than the performance of EB (Terra&Aqua) and EB (Aqua), we used the EB (Terra&Aqua) to minimize LST gaps. Correspondingly, we focus on analyzing EB (Terra&Aqua) in the following section. At the daily timescale, MB (mean bias) and RMSE (root mean square error) derived from EB (Terra&Aqua) were 0.38 and 1.8 mm/day, 0.05 and 1.5 mm/day, −0.03 and 1.4 mm/day, and −0.24 and 1.5 mm/day for flux towers in Australia, America, China, and Europe, respectively ( Figure S2). Some gaps in EF were due to missed LST information, whereas others were related to abnormal input data, which produce unphysical EF values. Abnormal EF values (higher than 1 and lower than 0) were also filtered and gap-filled. The EF gap-filling method was not only used for cases of cloudy days but also for cases of poor-quality input data. Our evaluation of EF gap-filled ET and clear-sky ET in Figure S3 shows that both conditions had nearly equal accuracy for shrub, deciduous broadleaf forest, evergreen needle-leaf forest, grass, savanna, and wetland. Minor differences between EF gap-filled ET and clear-sky ET were observed for barren, crop, evergreen broadleaf forest, and mixed forest classes. Collectively, these results demonstrate that the EF gap-filling method can be a useful strategy for seamless global daily ET retrieval.

Assessment of the ET Gap-Filling Method
The EB model was evaluated for a range of different biomes and climates ( Figure 5). The flux tower comparisons show that EB ET estimates in the equatorial winter dry region had the lowest RMSE and absolute mean bias. Performance in the monsoonal equatorial region was worse than that in the equatorial dry region. This might be associated with higher cloud frequency in the monsoonal equatorial region, and therefore a higher percentage of missing LST values. Warm temperature regions with warm and dry summers had the highest RMSE and absolute mean bias. Arid regions generally had larger fluctuations in their RMSE and mean bias. Figure 6 shows time series of daily ET simulation and in-situ observations at eight US flux sites that have been often used to evaluate other global ET products (Holmes et al., 2018;Mu et al., 2011). The remote sensing EB data could generally capture the dynamical variation in ET at most sites, except for the irrigated CHEN ET AL.  and rainfed agricultural sites in Mead, NE. The large overestimates in ET from EB at these sites were related to overestimation in net radiation from the ERA-Interim data set, which is discussed further in Section 3.4.

Evaluation of LST Forcings for ET Estimation Over Irrigated Land
The enhanced evaporative flux from irrigated croplands serves to cool the land surface and thus our EB models that apply LST forcing should have a higher potential to capture the flux variation relating to irrigation than models without LST forcing. To test this hypothesis, we compared the performance of available global ET products via water-balance analysis for irrigated area using CHIRPS (the Climate Hazards Group InfraRed Precipitation with Station data) precipitation (Funk et al., 2015), which has the same spatial resolution (5 km) as the EB ET. The ET in irrigated arid cropland should be higher than precipitation. Figure 7 shows that negative precipitation-minus-ET values for Nile Delta arid crop land were clearly demonstrated by the EB ET data, whereas ET from Landflux, GLDAS, GLEAM, and ERA-Interim could not capture this enhancement. Figure S4 shows that the EB model improved the representation of the impact of irrigation on the water balance in the Hetao irrigation plain along the Yellow River in China. The irrigated land is partly captured by MOD16, but not by GLEAM and other ET datasets. A global precipitation-minus-ET map from EB ET and CHIRPS also shows negative values for the agricultural areas of central USA, the Horn of Africa, Eastern Mexico, and northwest India (groundwater withdrawal, Rodell et al., 2018), suggesting that ET due to irrigation was captured by the EB model. MOD16, Landflux, GLEAM, and EB showed high P-ET values in the east part, and low values in the west of the middle reaches of the Yellow River (104-112°E, 36-41°N), which was not captured by GLDAS and ERA-Interim ( Figure S4). Landflux, which has the coarsest resolution of the studied datasets, can reflect the east-west contrast in the middle reaches of the Yellow River ( Figure S4b). It may be concluded that resolution is CHEN ET AL.  not the dominant factor for ET poor performance in this region. Figure S5 shows that SEBS has a negative P-ET value for the Nile basin and Hetao basin, which is similar to that of EB in Figure 7. This indicates that both SEBS and EB, using LST to constraint EF, can reflect the water balance conditions for irrigated regions. This may reflect the importance of using LST for the ET estimation for the irrigated region and explain why EB ET is better able to capture the details of the water-use patterns, and has a high potential for agricultural water resource management.

Sources of Error in EB ET Estimates
One source of uncertainty in ET from the EB model is the net radiation data, Rn d , used to scale instantaneous remotely sensed evaporation to daily ET under the assumption of constant daytime EF. For example, the ET retrievals at the US Midwest agricultural sites: Mead-Irrigated-US, Mead-Irrigated-Rotation-US and Mead-Rainfed-US sites in Figure 6 showed a strong positive bias. The main source of this bias is the overestimation of net radiation by the ERA-Interim data, as demonstrated in comparison with FLUXNET observations from the Mead rainfed site in Figure 8. Although the seasonal variation of EF d is captured by the MODIS data used in EB (Figure 8b), Rn d from ERA-Interim was two times higher than the observations in the summer (Figure 8c), resulting in the bias in ET d apparent in Figure 6.
Global-scale ET models are highly reliant on reliable net radiation (Fisher et al., 2008). Evaluation shows that the Rn d estimates correlate well with the flux tower observations (R 2 = 0.54), with a mean bias of −0.32 MJ/m 2 ·/day and an RMSE of 3.8 MJ/m 2 /day. There are many factors that may induce errors in the EC-MWF Rn d , such as deficiencies in the representation of clouds or land surface processes (Chen et al., 2017; CHEN ET AL.   Trigo et al., 2015). These studies demonstrated that ERA-Interim net radiation is more accurate in Europe ( Figure S6), explaining why EB ET has lower errors in Europe compared with other regions. EB(Terra) data included a few zero EF d values (Figure 8b), which might be caused by poor quality of other input datasets. Frequent cloud cover leads to substantial gaps in the LST data ( Figure 9). A 14-day LST gap due to cloud cover can occur frequently in the Amazon region in the wet season. In such cases, the EF gap filling method will continue the last clear estimate of EF for 14 days, potentially causing major uncertainties in the ET estimates during this period. India and South Asia also have long LST gaps in May-August during the monsoon seasons, limiting the capacity for continuous ET monitoring in these regions. In Southern France and Morocco, ET values in periods with LST gaps longer than 6 days are not suggested for use, as reported by Delogu et al. (2012).
The results of the current study indicate that differences between the model results and in-situ validation data may also arise from bias in the forcing data. Inaccuracies in the forcing data may have considerable impacts on the simulation of land-surface energy partitioning (Qian et al., 2006). The atmospheric forcing data are modeled variables, conditioned by the same environmental and canopy conditions within a pixel. Another issue is the spatial resolution of satellite and meteorological forcing datasets used in the EB method. The spatial resolution of the ERA-Interim forcing datasets is not as high as that of the MODIS remote sensing data. When ET is produced at a daily time scale, some model inputs and parameters from remote sensing, such as albedo and LAI, are not routinely available at the same temporal resolution. This might also influence the accuracy of daily ET.
The MOD11C1 and MYD11C1 LST products were produced using input from Terra and Aqua satellites, respectively, which overpass at different times of the day. When the two LST datasets are combined to calculate ET, this can sometimes result in inconsistent temporal and spatial variability in the EB (Terra&Aqua) CHEN ET AL. data set. At the global scale, significant inconsistency was only identified in Australia for June, July, and August ( Figure S7), and these strips of heterogeneity appeared nearly every year in dry months, but did not arise in other continents. Figure 10 demonstrates the multi-year annual average ET from EB (Terra&Aqua), EB (Terra), and EB (Aqua) for Australia in 2006-2017. The annual ET geospatial distribution shows some strips for the EB models without EF gap-filling (Figures 10a-10c). EB (Terra&Aqua) had more clear strips than EB (Terra) and EB (Aqua), indicating that the joint usage of Terra and Aqua LST introduced some inconsistency for Australia because of the different sensing time. The EF gap-filling served to smooth these strips in the annual ET maps (Figures 10d-10f), further emphasizing the importance of EF gap-filling for the EB global ET retrieval.

Discussion
We used the simple EF gap filling method, which could be directly merged with the EB model calculation. The nearest gap filling method could cause a discontinuous EF (shown in Figure 8b). To remove EF discontinuous changes, other spatial or temporal gap-filling methods, for example, that described by Bhattarai et al. (2019), could be employed in the EF posterior re-processing. Other options could have been applied to address issue of gaps in remotely sensed data. Instead of gap-filling EF, we could have targeted LST to be CHEN ET AL.  gap filled (Gascoin et al., 2015). Recent efforts on LST gap-filling (Holmes et al., 2018;Martens et al., 2018;Zhou et al., 2015) could help tackle the ET cloud issues in future. However, the high temporal and spatial variability of LST would raise other problems, whereas inaccurate or nonsteady observation times would also need to be dealt with. Gap-filling techniques in LST fields and/or rectification of LST observation times are likely to lead to considerable uncertainties and their propagation toward ET estimates needs to be carefully assessed. In contrast, EF presents smoother transitions, which supports the use of the gap-filling approach on EF. Due to the similar characteristics to MODIS sensors, in future versions, VIIRS, Sentinel 3A&3B, and FengYun 3B&3C data could replace MODIS thermal sensors. Geostationary satellite data can provide time-consistent information at hourly timescales, for example, Multifunctional Transport Satellite (MTSAT) can provide us with an hourly latent heat flux (Zhao et al., 2019). All these satellite data could also help to tackle the gap filling issues.
The performance of sensible heat estimation from the SEBS and the EB model was compared at local scales around flux tower sites by Chen, Massman, and Su (2019). The models in Chen, Massman, and Su (2019) were forced by meteorological data from the flux tower. The performance of SEBS and the EB model when forced by satellite data, has not been compared before this study. In this study, we assess their relative performance at global scales using satellite-derived input data. Despite the effort to develop surface and aerodynamic resistance in the EB ET model presented in this work, some grids remain prone to large ET biases. The EB model does not consider sub-pixel heterogeneity, while single towers have a smaller foot-CHEN ET AL.  print than the satellite pixels and thus do not represent the heterogeneity within 5 km pixels. This is also reflected by the difference between in-situ LCT and MOD12 LCT. The interpolation of the original 0.125° p, Q, 10 U , and SWD field to a finer resolution of about 0.05°, also introduces uncertainty in the ET estimates, since the area-averaged values of an ET pixel do not necessarily match the value derived from the satellite radiance averaged over the pixel footprint (Vinukollu, 2011). Some of the ET biases arise from the forcing data, especially daily net radiation (shown in Figure 8d). Other daily net radiation data is recommended to improve the ET accuracy. Since we assume that there is no change in ground heat content during the peri-CHEN ET AL.   od 2003-2017, impacts of variation in the ground heat content on ET during this period is ignored, which may also influence the accuracy of our ET data. This assumption on the same phase between soil heat flux and net radiation may also introduce errors in the ET calculation. In future, the phase difference between ground heat flux and net radiation could be used to better represent the diurnal cycle of ground heat flux (Gentine et al., 2007;Sugita & Brutsaert, 1991).
Dry-wet limits are an important feature for SEBS. The same dry-wet scaling was tested in the EB model as a part of the current study. Figure S8 compares EB ET with and without dry-wet limits, showing that these play a limited role for the EB global ET. The turbulent roughness length was estimated using a range of canopy information. 0m z variations in green-up and leaf-drop periods are included by using variation of LAI information. However the method does not take into account the geometric characteristics of the canopy, such as crown aspect, crown frontal (horizontal) area (Schaudt & Dickinson, 2000), and canopy density (Jasinski & Crago, 1999). Currently, these characteristics are generally not measured, which precludes application of a more physically based model. An improved description of the land surface using satellite spectral information may further lead to improved estimates of aerodynamic resistance and roughness lengths, which will further improve the simulation of land surface fluxes. Finally, EB-ET does not provide ET estimation for inland lakes and therefore users should take care when comparing global annual ET values with other ET datasets.

Conclusion
This study presented a thermal energy balance ET retrieval methodology that uses optical remote sensing data to describe the dynamic variation of the land surface canopy, such as canopy height and canopy vertical leaf area density. Reanalysis data (which were usually taken as observational data for general circulation models) were used to describe the variation of air temperature, wind speed, and solar radiation in the atmospheric surface layer. The thermal status of the land surface was described by using MODIS LST, whereas MODIS reflectance data helped to define vegetation characteristics. The turbulence process in surface layer and roughness sublayer was taken into account by using MOST theory. Two important processes that determine ET, radiation forcing, and turbulent exchange were included in the EB method. This ET model builds a mechanistic bridge between land-surface canopy characteristics and heat transport predictions. The ET model in this study also explicitly accounts for near-surface turbulent interactions affecting soil and canopy drying. The ET product can be used for applications in water resource management and research into climate impacts on the land surface energy and water balance. In this research, we demonstrated the feasibility of this method using MODIS LST products.
A two-step modeling approach was used in the global ET model. First, the model used remotely sensed LST daytime fields to calculate the daytime sensible heat flux field. Latent heat flux daytime field was obtained as the residual of the surface energy budget. Second, the daytime sensible and latent heat flux fields were used to calculate EF daytime field, and the EF daytime field was up-scaled to daily mean EF, which was ultimately used to obtain daily ET. A strategy was also developed for computing spatially and temporally continuous daily ET at the global scale using remote sensing thermal data from MODIS at 0.05° spatial resolution. Unlike most thermal ET models that only provide ET under clear-sky conditions, a combined approach to take into account Terra and Aqua daytime LST observations for EF gap filling was proposed to derive a continuous ET product. The LST daytime field is used to derive daily EF, while remaining gaps in daily EF maps are filled by the EF values in a day before.
The precipitation minus ET results for the heavily irrigated regions showed the deficiency of the currently available ET products. In particular, GLEAM is primarily a water balance model that uses precipitation as a boundary condition; MOD16 may not capture evaporation from extremely wet surfaces very well; and GLDAS and ERA-Interim do not incorporate irrigation in their simulations. It was shown, however, that the EB model can capture the energy balance and ET changes due to irrigation. The EB ET can be reprocessed with other EF gap-filling and downscaled to Plot-scale, which is preferred for the agricultural sector. In addition, the EB ET will be continuously updated to the latest time.
CHEN ET AL.