Data assimilation of Island climate observations with large-scale re-analysis data to high-resolution grids


  • Shu-Hua Lin,

    Corresponding author
    1. Global Change Research Center, National Taiwan University, Taipei, Taiwan, Republic of China
    2. APEC Research Center for Typhoon and Society, Taipei, Taiwan, Republic of China
    • APEC Research Center for Typhoon and Society, Taipei, Taiwan.
  • Chung-Ming Liu

    1. The Chinese Association of Low Carbon Environment, Taipei, Taiwan, Republic of China


Data assimilation is important for the spatial analysis of small regions with complex terrain and diverse climates and for interpolation among observations. A data assimilation method incorporating observations, coarse-grid re-analysis data and physiographical features is demonstrated to generate high-resolution temperature data for small islands such as Taiwan. The method is also able to weigh physiographic and anthropogenic factors. Among the spatial factors, the orographic effect is the dominating factor and the lapse rate varies seasonally. Population density is significantly related to temperature, which may correspond to the urban heat-island (UHI) effect. It is also shown that an anthropogenic factor could be used with this interpolation method to explain the details of the temperature variation. The data assimilation model provides an opportunity to assess the extent to which simple statistical regression equations, calibrated from natural variability, can reproduce climate changes driven by land effects without considering a complex climate model. Copyright © 2012 Royal Meteorological Society

1. Introduction

Taiwan, a small island ∼395 km in length and 144 km in width and with complex terrain, has diverse climates. This diversity requires high-resolution spatial analysis for a variety of environmental studies, such as water-resource management (Hay and Clark, 2003; Lin et al., 2010) and for studies of species habitat change (Hsu et al., 2011) and the epidemiology of infectious diseases. However, gridded climate data are restricted by the limited number of monitoring stations in Taiwan, so data are absent over large areas. Data assimilation is a technique using both statistical analysis and interpolation to integrate irregularly distributed observations into regular model grids (Wang et al., 2000).

A number of interpolation methods have been used, such as Kriging (for an overview, see Hartkamp et al., 1999; Sluiter, 2008), thin-plate splines (Hutchinson, 1995), inverse distance weighting (Shepard, 1968), Gaussian weighting filters (Thornton et al., 1997) and artificial neural networks (Snell et al., 2000). Without topographic adjustment, these methods are not well suited to the complex terrain of Taiwan. Therefore, it is necessary to develop a more suitable interpolation method.

An optimum interpolation method for better regional/local climate simulation should consider physiographical features, land-sea distributions and land use types (von Storch, 1999). Hijmans et al. (2005) obtained a worldwide, high-resolution surface climate data set with a 30-arc-s resolution (∼1 km²) by adopting geospatial variables such as altitude, latitude and longitude to integrate scattered observations. In addition, the UK climatic data at 1-km resolution (Perry and Hollis, 2005) used not only the geospatial variables listed above but also variables such as land use and terrain shape.

Typically, existing high-resolution gridded surface climate data sets were generated by using observational data to weigh geographical variables and thus estimate the climate of each grid cell. In this paper, a data assimilation method incorporating coarse-grid re-analysis data and physiographic characteristics is shown to generate high-resolution temperature data for small islands such as Taiwan. The method also serves as an example of how to weigh physiographic and anthropogenic factors.

2. Data and methodology

The NCEP/NCAR re-analysis climate data sets, a joint product of the National Centers for Environmental Prediction (NCEP) and the National Center for Atmospheric Research (NCAR; Kalnay et al., 1996), are well recognized as being observation-like, but they are of coarse resolution, they do not recognize land characteristics and they represent only the sea surface temperature fluctuations around Taiwan. Therefore, we assume that the major deviations between the observational data and the NCEP/NCAR re-analysis data around Taiwan are due to local factors such as topography, land use and degree of urbanisation.

The topographical effect and degree of urbanisation were represented by altitude and population density, respectively. Multi-variate regression analysis was used to determine the weights of these factors for each month. The constant term of the regression could represent the land-ocean circulation around Taiwan Island. The residual values that remain after these factors are filtered out could hint at the local circulation effect.

2.1. Study area and data sources

Taiwan, an island with an area of 36 000 km², is crossed by the Tropic of Cancer, which separates subtropical and tropical regions, and is surrounded by an ocean that lacks climate-monitoring stations. The island has a complex topography with steep slopes (Figure 1). The spatial pattern of the climate is varied, as is the ecology. In past studies using coarse-resolution climate data, the land effect and local climate variations on small islands have been ignored, and small islands have been treated as part of the ocean.

Figure 1.

Map of Taiwan and the locations of its 25 long-term climate observation stations and 102 auto-rain gauges with terrain contours.

The NCEP/NCAR re-analysis data have a resolution of ∼2.5° longitude/latitude with daily updates. After simple interpolation of the re-analysis data to the locations of the local climate-monitoring stations, significant differences between the interpolated data and the observed climate data can be noted. Here, a data assimilation method (DAM) was developed to link the deviation with spatial parameters.

There are approximately 25 long-term climate-monitoring stations in Taiwan, including a few in mountainous regions (Figure 1). These stations have been in operation since the 1950s. The climate-monitoring system was enhanced in 1990 by the addition of 102 auto-rain gauges around the island. The stations are operated by the Central Weather Bureau (CWB) of the Republic of China. A population density map was derived from census data of the year 2000 (Figure 2). The monitoring points represent the range well in both altitude and population density (Figure 3).

Figure 2.

Taiwan population density (persons per square kilometre) map indicating the 25 long-term monitoring stations and the population densities at their locations.

Figure 3.

Monitoring station distribution by terrain height and population density. Approximately 3/4 of the stations are located at lowlands (below 500 m a.s.l.) and 1/4 at midlands and highlands (between 500 and 1500 m and above 1500 m a.s.l., respectively).

2.2. Application of the DAM

After a thorough investigation of suitable parameters, the differences between the interpolated NCEP/NCAR re-analysis data and the local climate data were determined (Figure 4). Figure 4 shows a spatial pattern similar to the terrain shown in Figure 1, with deviations that are large in central regions and small in coastal regions. The altitude and the local population density proved to be sensitive enough to minimize the deviation through a multi-variate linear regression relationship (Equation (1)). The change of temperature with respect to altitude is defined as the lapse rate. The change of temperature with respect to population density is likely related to the intensity of urban heating on the island. In general, the so-called land-cover index used in atmospheric circulation models is used to determine local characteristics, such as ice, water, forest and grass, and whether the area is suburban or urban. However, such an approach is not suitable for Taiwan, where local development is quite extensive everywhere. The deviation (δ) between the NCEP/NCAR interpolated re-analysis data and the observations is as follows:

$$\delta_i^m = \alpha^m H_i + \beta^m D_i + \varepsilon^m \qquad (1)$$

where H is the terrain elevation and D is the population density (thousand persons per square kilometre). The lapse rate (α), UHI index (β) and land–ocean interaction (ε) are determined by linear regression. Superscript m denotes the month and subscript i is the station identifier.
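The regression described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fit_deviation_model`, the use of ordinary least squares through NumPy, elevation expressed in kilometres and the synthetic station values are all assumptions introduced here.

```python
import numpy as np

def fit_deviation_model(elev_km, pop_density, deviation):
    """Least-squares fit of deviation = alpha*H + beta*D + eps for one month.

    elev_km      station elevation H (km, assumed unit)
    pop_density  population density D (thousand persons per km^2)
    deviation    observed minus re-analysis temperature at each station (deg C)
    """
    H = np.asarray(elev_km, dtype=float)
    D = np.asarray(pop_density, dtype=float)
    y = np.asarray(deviation, dtype=float)
    # Design matrix: one column per factor plus a column of ones for the intercept
    X = np.column_stack([H, D, np.ones_like(H)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    alpha, beta, eps = coef
    residual = y - X @ coef  # what the three terms leave unexplained
    return alpha, beta, eps, residual

# Synthetic stations for illustration only (not CWB observations)
rng = np.random.default_rng(0)
H = rng.uniform(0.0, 3.5, size=25)
D = rng.uniform(0.0, 10.0, size=25)
delta = -5.0 * H + 0.1 * D + 0.5 + rng.normal(0.0, 0.2, size=25)

alpha, beta, eps, resid = fit_deviation_model(H, D, delta)
print(f"alpha={alpha:.2f}, beta={beta:.2f}, eps={eps:.2f}")
```

In this sketch the fit would be repeated for each calendar month, which gives twelve sets of coefficients. The residual returned for each station plays the role of the local effect that the regression leaves unexplained.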

Figure 4.

The deviation of the annual mean temperature between the NCEP re-analysis and the observed climatology mean is assumed to be influenced by land factors.

Furthermore, the deviation between the NCEP/NCAR re-analysis data and observations shows significant diurnal variations, whereas the NCEP/NCAR re-analysis data generally show less diurnal variation. Here, we assume that the monthly NCEP/NCAR re-analysis output can be adjusted on the basis of the average difference pre-determined during a training period, which is 1990–1999 in this case. Therefore, the temperature estimated during a verification period, that is 1980–1989, can be calculated by the following equation:

$$T_{est} = t_{ncep} + \mu \qquad (2)$$

where T_est is the estimated temperature, t_ncep is the NCEP/NCAR re-analysis temperature and µ is the average difference pre-determined for every month. Here, µ is estimated during the training period as

$$\mu = \overline{\delta_t} \qquad (3)$$

$$\delta_t = t_{obs,t} - t_{ncep,t} \qquad (4)$$

where the overbar denotes the average over the training period and t_obs,t stands for the observed temperature at time step t. Clearly, the average difference depends on the training period chosen and on the NCEP/NCAR re-analysis data set used.
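The adjustment step can be illustrated with a short Python sketch. It assumes that the deviation is the observation minus the re-analysis value, which is the convention implied by the signs of the coefficients discussed with Table I. The function names and the numerical values are hypothetical and serve only as an example.

```python
import numpy as np

def monthly_mean_difference(t_obs, t_ncep, months):
    """Average observed-minus-re-analysis difference for each calendar month."""
    t_obs = np.asarray(t_obs, dtype=float)
    t_ncep = np.asarray(t_ncep, dtype=float)
    months = np.asarray(months, dtype=int)
    delta = t_obs - t_ncep  # assumed sign convention: observation minus re-analysis
    return {int(m): float(delta[months == m].mean()) for m in np.unique(months)}

def estimate_temperature(t_ncep, months, mu):
    """Add the pre-determined monthly difference to the re-analysis temperature."""
    t_ncep = np.asarray(t_ncep, dtype=float)
    offsets = np.array([mu[int(m)] for m in months])
    return t_ncep + offsets

# Hypothetical training data for one station (two Januaries, two Julys)
mu = monthly_mean_difference(
    t_obs=[15.0, 16.0, 29.0, 30.0],
    t_ncep=[17.0, 18.0, 28.0, 28.5],
    months=[1, 1, 7, 7],
)

# Hypothetical verification-period re-analysis values
t_est = estimate_temperature([16.5, 28.0], [1, 7], mu)
print(mu, t_est)
```

The training and verification data are kept separate on purpose: the monthly differences are computed once from the training period and then applied unchanged to the verification period.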

In the following section, two important indexes are used to assess the newly developed DAM. The coefficient of determination (R2, the square of the Pearson correlation coefficient) indicates the explanation rate of the scheme, and the root mean square error (RMSE) specifies the average error between model estimates and observations. These two indexes can provide a performance evaluation of a newly developed scheme for reproducing the observed temperature time series. Furthermore, the persistence of the scheme can be evaluated by checking the variation of these two indexes during the verification period.
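The two indexes can be computed as in the following Python sketch, which takes R2 to be the squared Pearson correlation coefficient. The function names and sample values are illustrative assumptions. The example also shows why both indexes are needed: an estimate with a constant bias has a perfect R2 but a large RMSE.

```python
import numpy as np

def r_squared(obs, est):
    """Explanation rate: square of the Pearson correlation coefficient."""
    r = np.corrcoef(np.asarray(obs, dtype=float), np.asarray(est, dtype=float))[0, 1]
    return float(r ** 2)

def rmse(obs, est):
    """Root mean square error between estimates and observations."""
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    return float(np.sqrt(np.mean((est - obs) ** 2)))

# Hypothetical series: the estimate follows the observations with a constant offset
obs = np.array([15.0, 20.0, 25.0, 28.0])
est = obs + 4.5

print(r_squared(obs, est), rmse(obs, est))
```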

2.3. Temperature-mapping procedure

The climatological temperature map is constructed from the altitude and population density. For mapping temperature contours, data from the digital elevation model (DEM) map of Taiwan, having an ∼40 m grid resolution, were collected and resampled to a 1-km grid resolution (Figure 1). The population density map is produced by combining the population data with the gridded administration map (Figure 2). The land–ocean interaction is a constant for each month, and the local circulation parameters for each grid were generated by a simple interpolation of 127 monitoring stations. The sum of the four factors gave the estimate of the climatological mean temperature for each month. Figure 5 shows the procedure of adding four spatial layers to correct the bias for high-resolution temperature data.
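The layer-by-layer construction can be sketched in Python as follows. The function `dam_temperature`, the 2 × 2 grid and the uniform re-analysis value of 17.0 °C are hypothetical. The coefficients are the January values listed in Table I, and the local layer is set to zero for simplicity.

```python
import numpy as np

def dam_temperature(t_ncep, elev_km, pop_density, alpha, beta, eps, local):
    """Sum the re-analysis field and the four correction layers on a common grid."""
    topography = alpha * elev_km   # orographic layer (lapse rate times elevation)
    uhi = beta * pop_density       # urban heating layer
    return t_ncep + topography + uhi + eps + local

# Hypothetical 2 x 2 grid
t_ncep = np.full((2, 2), 17.0)                 # interpolated re-analysis (deg C)
elev_km = np.array([[0.0, 1.0], [2.0, 3.0]])   # elevation (km)
pop = np.array([[10.0, 1.0], [0.0, 0.0]])      # thousand persons per km^2
local = np.zeros((2, 2))                       # local spatial parameter (deg C)

# January coefficients from Table I
t_map = dam_temperature(t_ncep, elev_km, pop, alpha=-4.87, beta=0.06, eps=-0.13, local=local)
print(t_map)
```

Because every layer lives on the same grid, each intermediate sum can be plotted on its own, which is how the progressive panels of Figure 5 are described.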

Figure 5.

DAM-derived high-resolution gridded monthly mean temperatures in January and July. (a) Observed temperature map derived by the linear interpolation method (Obst). (b) NCEP re-analysis temperature (NCEP rea.). (c–e) High-resolution gridded temperature changes with the progressive addition of the physiographic factors, that is c = b + Topography, d = b + Topography + Urban Heat Island (UHI), e = b + Topography + UHI + Land–Ocean Interaction (LOI). (f) Complete map with the addition of the local spatial parameter (LSP). White area in upper panel indicates temperatures below 5 °C.

3. Results and discussion

The DAM is a scheme that builds on the NCEP/NCAR re-analysis output to minimize its temporal and spatial deficiencies. The improvement is best demonstrated by the monthly mean temperature estimates (Figure 5). Another advantage of the method is the higher resolution of the climatological temperature data, which makes it possible to incorporate local climate effects (e.g. land–ocean interaction, orographic effects, heat-island effects and local circulation effects).

3.1. DAM assessment

The DAM is a general adjustment to reduce the errors in the temporal and spatial characteristics originating in the NCEP/NCAR re-analysis data set. A decadal temperature series (1990–1999) was used to calibrate the spatial parameters, and another decadal series (1980–1989) was used for validation. The R2 and RMSE indexes both showed that the DAM can enhance the ability of the simulation to reproduce monthly mean temperatures from NCEP/NCAR re-analysis outputs.

The DAM estimates in the calibration and validation periods are compared with the observed temperature and the NCEP/NCAR re-analysis data in Table I. The DAM effectively improved the temperatures estimated from the NCEP/NCAR re-analysis data (Figure 5). Although the re-analysis data could not resolve the spatial variability in the Taiwan region, they could still effectively reproduce the seasonal temperature cycle, with an explanation rate as high as 99.5%. However, the NCEP/NCAR re-analysis could not reproduce the monthly mean temperature values themselves (the spatial pattern of the deviation is shown in Figure 4). Therefore, we focus on assessing the ability of the DAM to improve the representation of the spatial variation and some of the temporal variation. The explanation rates and RMSEs among the 25 long-term monitoring stations were used to assess the DAM performance in each month of the year. After the spatial factors were added, the DAM raised the explanation rate obtained from the NCEP/NCAR re-analysis data to 99% and decreased the RMSE to 0.2–0.5 °C (Table I). In addition, the explanation rate and the RMSE of the DAM were consistent between the calibration and verification periods, although the RMSE of the monthly mean temperature was slightly higher in the verification period than in the training period. Across stations, the NCEP/NCAR re-analysis data have an explanation rate of ∼10% in winter, which reflects the variability caused by differences in latitude (i.e. the south is warmer than the north; Figure 5, upper panel), but they fail in the summer, when the north/south difference is smaller (Figure 5, lower panel). Some city areas in the north are warmer than areas in the south (Figure 5), which is the reason for adding the anthropogenic factor of the UHI effect to the method.

Table I. RMSE and R2 values of NCEP/NCAR re-analysis and DAM models estimating monthly mean temperature in training and verification periods station by station over Taiwan

                                                  Training period                      Verification period
                                                  NCEP/NCAR        DAM                 NCEP/NCAR        DAM
Month       LOI (a)   Lapse rate (b)   UHI (c)    R2     RMSE      R2      RMSE        R2     RMSE      R2      RMSE
January     −0.13     −4.87            0.06       0.11   4.71      1.000   0.00        0.10   4.58      0.997   0.43
February    −0.02     −4.77            0.05       0.12   4.57      1.000   0.00        0.14   4.44      0.997   0.37
March        0.72     −4.59            0.04       0.12   4.42      1.000   0.00        0.13   4.42      0.998   0.24
April        1.00     −4.96            0.10       0.06   4.51      1.000   0.00        0.06   4.57      0.999   0.17
June         0.78     −5.39            0.09       0.00   4.82      1.000   0.00        0.00   4.84      0.998   0.21
July         0.74     −5.59            0.11       0.00   4.95      1.000   0.
August       0.59     −5.56            0.10       0.00   4.94      1.000   0.
September    0.22     −5.34            0.10       0.00   4.81      1.000   0.00        0.00   4.91      0.998   0.29
November     0.05     −4.94            0.08       0.05   4.49      1.000   0.00        0.05   4.50      0.999   0.19
December    −0.29     −4.93            0.07       0.08   4.61      1.000   0.00        0.09   4.57      0.996   0.31

  (a) Ocean–land interaction (°C).
  (b) Lapse rate (°C/km).
  (c) UHI index (°C per thousand persons per square kilometre).
  RMSE values are in °C.

Although the spatial parameters were determined from the climatological monthly mean temperature of all monitoring stations in the calibration period, these parameters are still useful for individual monthly temperature data. We take the monthly temperature in Taipei as a demonstration (Figure 6). The DAM estimates are closer to the diagonal lines than the NCEP/NCAR re-analysis data, and this holds in both the calibration and validation periods. However, the DAM estimates still have minor errors among the individual monthly data sets, especially in summer, because the same spatial parameters were applied to both decades. If the DAM were applied to each monthly observed data set individually, the errors could be reduced.

Figure 6.

Comparison of NCEP re-analysis data (cross legend) to DAM estimates (circle legend) of monthly mean temperature for the four seasons at Taipei. Data from the 1990s and 1980s represent calibration and validation periods, respectively. The x-axis is the observed mean monthly temperature, and the y-axis is the model simulation.

3.2. Physiographical features in Taiwan

The local climate is influenced by several factors within a small area. These factors include land–ocean interactions, the effects of seasons, the orographic effects of mountains, the heat-island effects of cities and the unique circulation patterns of different locales. Multi-variate linear regression analysis of the spatial factors is commonly used to examine smaller spatial scales (Hewitson and Crane, 1992). Two spatial factors, altitude and population density, are used in the present study. To confirm the efficiency of the selected factors, we used stepwise multi-variate linear regression analysis. Both spatial factors were significant at the 95% confidence level. The coefficients of the regression formula represent the spatial parameters, namely the lapse rate, urban heating and land–ocean interaction (α, β, ε). The temperature residuals between observations and regression estimates represent the local circulation effects (µ).

3.2.1. Land–ocean interaction

The NCEP/NCAR re-analysis data supply observations such as sea surface temperature around Taiwan. Because the NCEP/NCAR re-analysis data set is of low resolution, it is not capable of identifying the land characteristics of small islands. In practice, the land–ocean interaction can be recognized from the intercept of the linear regression (ε), which represents the common bias of the NCEP/NCAR re-analysis data. According to the results (Table I), the coefficient ε is negative only during the winter, that is December to February; it is positive for the rest of the year. In other words, the NCEP/NCAR re-analysis overestimates the temperatures in winter and underestimates them in other seasons. The results correspond well with the fact that land surfaces gain and lose heat more quickly than the ocean, which results in warmer temperatures in the spring, summer and autumn and cooler temperatures in the winter. The parameter ε is a useful coefficient for stating the weight of the land–ocean interaction factor for small islands.

3.2.2. Orographic effects

As shown in Figure 4, the large deviations of the NCEP/NCAR re-analysis data are the result of the terrain effect. Mountains can influence climate on local, regional and even global scales. On islands with complex terrain, mountains are the dominant factor influencing local temperature. It has been proposed that the lapse rate is spatially variable on a regional scale (Douglass et al., 2004a, 2004b), but this result cannot be confirmed in the present research. Instead, the temperature lapse rate analysed in the present study varies seasonally (Table I; Figure 7). The lapse rates follow the mean temperature: they are steepest in July/August (about −5.6 °C/km) and shallowest in February/March (about −4.6 to −4.8 °C/km).

Figure 7.

Long-term monthly mean temperature and spatial parameters, that is lapse rate, land–ocean interaction and UHI index of each month within a year. The lapse rates vary with the climatological monthly mean temperatures (a), as does the UHI index (b).

3.2.3. UHI effects

The population density is also found to be highly correlated with temperature deviation. The results in Table I show that population density influences temperature in every month of the year. These results indicate that urban heating is proportional to the population density, at 0.04–0.11 °C per thousand persons per square kilometre (Table I), and that the heat-island effect may be present in Taiwan.

A UHI occurs when city temperatures run higher than those in suburban and rural areas, primarily because buildings have supplanted vegetation and trees. Moreover, human activity itself generates heat. The population in Taiwan is distributed mainly in mid- to low-elevation regions (Figure 2), among which the population density can be as low as 1–14 persons km−2 in mountainous areas and as high as 27 000 persons km−2 in metropolitan areas. Depending on the urban heating rate, the average temperature in urban areas may be as much as 3.2 °C higher than in rural regions.

3.2.4. Local spatial effects

The local spatial parameter was defined as the residual value of the deviation between the NCEP/NCAR re-analysis data and the observations after adjustment for the lapse rate, urban heating and land–ocean interaction. This parameter captures the non-parameterized spatial factors or effects on climate. The local circulation parameters at most weather stations are within ± 1 °C, which suggests that the present spatial parameter set is adequate to explain the results for a majority of regions.
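The text does not name the scheme used to spread the station residuals over the grid. As one possible illustration, the Python sketch below uses inverse distance weighting, which is a substitute chosen here for simplicity and not necessarily the interpolation the authors applied. The function name `idw`, the coordinates and the residual values are hypothetical.

```python
import numpy as np

def idw(station_xy, station_values, grid_xy, power=2.0):
    """Inverse-distance-weighted interpolation of station values to grid points."""
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    grid_xy = np.asarray(grid_xy, dtype=float)
    # Distance from every grid point to every station, shape (n_grid, n_station)
    dist = np.linalg.norm(grid_xy[:, None, :] - station_xy[None, :, :], axis=2)
    dist = np.maximum(dist, 1e-12)  # avoid division by zero at station locations
    weights = 1.0 / dist ** power
    return (weights * station_values).sum(axis=1) / weights.sum(axis=1)

# Hypothetical station residuals (deg C) at coordinates in km
stations = [(0.0, 0.0), (10.0, 0.0)]
residuals = [1.0, -1.0]
grid = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]

local_layer = idw(stations, residuals, grid)
print(local_layer)
```

With this choice the interpolated field reproduces the residual at each station and blends smoothly between stations, so the interpolated values never exceed the range of the station residuals.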

The monthly distribution of spatial parameters obtained in this study (Figure 8) indicates generally higher values along the central mountain range and lower values in the basin or lower-terrain regions. However, these values do show certain monthly variation patterns that are most likely linked with the seasonal changes of the general circulation. For instance, during the winter months, when the northeasterly monsoon prevailed, the low values centred at the northeastern plain were the highest in absolute value, in contrast to the lowest absolute values in the summer months when southeasterly winds prevailed. The sensitivity of the spatial parameter to terrain and seasonal prevailing winds indicates the reliability of Equation (2) in representing the response of complex terrain to the general circulation.

Figure 8.

The spatial parameter (the so-called local circulation) variation within a year.

3.3. Application

The high-resolution climatology maps of the monthly mean temperatures over the course of a year generated by the DAM could be used as base maps for empirical statistical downscaling (SD) (Figure 9). In most SD methods, a transfer function between a large-grid general circulation model (GCM) and local observation stations (Wilby et al., 1997; Zorita et al., 1999) focuses on simulating climatic anomaly fluctuations, and the climatological mean is removed before the transformation (e.g. singular value decomposition, empirical orthogonal function or SDSM). Therefore, the base maps should be added back after the SD process, for which the high-resolution gridded climate data could be useful.
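The role of the base maps can be illustrated with a short Python sketch in which downscaled anomalies are added back to the monthly climatology. The function `add_base_map`, the array shapes and the values are assumptions made for the example. The SD step itself is not shown.

```python
import numpy as np

def add_base_map(anomaly, climatology, months):
    """Restore absolute temperatures from downscaled anomalies.

    anomaly      array of shape (time, ny, nx), output of an SD method
    climatology  array of shape (12, ny, nx), one base map per calendar month
    months       calendar month (1..12) of each time step
    """
    idx = np.asarray(months, dtype=int) - 1
    return np.asarray(anomaly, dtype=float) + np.asarray(climatology, dtype=float)[idx]

# Hypothetical 2 x 2 grid: base map value equals the month index, anomaly is +0.5
climatology = np.arange(12, dtype=float).reshape(12, 1, 1) * np.ones((1, 2, 2))
anomaly = np.full((3, 2, 2), 0.5)

temperature = add_base_map(anomaly, climatology, months=[1, 7, 12])
print(temperature[:, 0, 0])
```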

Figure 9.

DAM-generated high-resolution climatology maps of the monthly mean temperature within a year.

4. Conclusion

A DAM that refines the grid resolution to 1 km was developed to estimate the surface air temperature on small islands such as Taiwan. The basic concept behind the DAM is to reduce the errors in the temporal and spatial characteristics of coarse-resolution climate data.

A set of spatial parameters, including the orographic effect, the UHI effect and the local circulation effect, was established for the DAM to simulate regional climate patterns. Among the spatial parameters, the orographic effect is the dominating factor, and the lapse rate varied seasonally: it was high in the summer and low during the rest of the year. The population density is significantly related to temperature during a given time period, being associated with numerous human activities. This result may correspond to the UHI effect. It is also shown that an anthropogenic factor could be used with this interpolation method to explain details of the temperature variation. The DAM provides an opportunity to assess the extent to which simple statistical regression equations, calibrated from natural variability, can reproduce climate changes driven by land effects without considering a complex climate model. In addition, the DAM was only applied to determine the climatological mean temperature in this paper, but the method could be adapted to generate high-resolution monthly temperature data.


The authors are grateful for the data provided by the Taiwan Central Weather Bureau. The NCEP Reanalysis Derived data were provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site. The research was supported by the National Science Council of the Republic of China under grant NSC 99-2621-M-002-019.