Keywords:

  • multilinear regression;
  • forward selection;
  • cross-validation;
  • coefficient of determination;
  • RMSE

Abstract

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. Methodology
  5. 3. The data
  6. 4. Results for minimum temperatures
  7. 5. Results for maximum temperatures
  8. 6. Conclusions
  9. Acknowledgements
  10. Appendix A
  11. Appendix B
  12. References

Interpolation of monthly averages of maximum and minimum temperatures for 1 year of data on a 250 m grid by multilinear regression is presented here. The principal aim is to find a suitable parameter for interpolation of minimum temperatures in cases of nocturnal inversion. For this purpose a geostatistical parameter, related to the height relative to the nearest valley, has been calculated and tested. The second aim is to find how the regression equation depends on the sea distance, whether in a linear or nonlinear form and, if nonlinear, what the best exponent is. The procedure has been tested on a 1 year data set from 60 meteorological stations on the island of Sardinia. The selection of parameters has been made by the forward selection method. The interpolation errors (RMSE) on independent stations (i.e. not used to calculate the regression coefficients) have been calculated by the cross-validation method using a developmental data set of size n − 1. The parameter that accounts for most of the variance is the height; the second one, for minimum temperatures, is the relative height. For maximum temperatures the second parameter is the sea distance, but only in summer months. The RMSE on the independent data ranges from 1.0 to 1.5 °C for minimum temperatures and from 0.5 °C (winter months) to 1.4 °C (summer months) for maximum temperatures. The effect of relative elevation in the regression is a 15% increase of the coefficient of determination. At the same time it lowers the RMSE significantly. Copyright © 2011 Royal Meteorological Society


1. Introduction

The literature presents a great variety of studies on temperature interpolation. They differ in three main respects: spatial resolution, temporal reference of the data (daily data, monthly or annual averages) and interpolation method. Spatial resolution varies from a few hundred metres (Ninyerola et al., 2000) to a few kilometres (Gyalistras, 2003). Optimum interpolation has been used by Cacciamani et al. (1989) in the Po Valley, by Uboldi et al. (2008) in Lombardia (northern Italy) and by Chessa and Delitala (1997) for daily minimum and maximum temperatures at 20 km resolution over Sardinia. Multilinear regression is used by many authors, very often in combination with kriging, inverse distance weighting (IDW) and splines (Daley, 1991). Regardless of the method, a very important problem is how to find the variables (orographic, topographic, land cover) that influence the temperature field. According to Jarvis and Stuart (2001a,b), the choice of variables appears to matter more than the choice of method for interpolation performance. Jarvis and Stuart used multilinear regression to select 10 guiding variables from among 35 topographic and land cover ones. In a subsequent phase the guiding variables were used to interpolate daily maximum and minimum temperatures in England at 1 km resolution. The residuals of the multilinear regression were interpolated by different methods: trend surface, kriging, IDW and thin plate splines. Once variables are included to guide the interpolation, differences in performance between methods are not significant, except for trend surface, which gives poorer results.

A knowledge-based method to interpolate monthly maximum and minimum temperatures is presented in Daly et al. (2002, 2003). The method is applied to Puerto Rico, Vieques and Culebra at about 450 m resolution. The regression equation contains the elevation as an independent variable. The other topographic variables enter the equation indirectly, as follows. For each grid cell, each station enters the regression equation with a weight that depends on the topographic variables of the station, on the grid cell and on the relative distance between the station and the grid cell.

The Dipartimento idrometeoclimatico of ARPA Sardegna provides meteorological, climatological and agrometeorological products for different users, in particular for tourism and agriculture. As the mean distance between the stations of the ARPA network is about 20 km, spatial interpolation to a much finer resolution is necessary, especially for temperature and precipitation. A few stations register very low minimum temperatures due to thermal inversion processes. In empirical interpolation of minimum temperature the principal problem is to find a suitable geostatistical parameter to describe this process. Another task is to find how the sea distance enters the interpolation procedure for both minimum and maximum temperatures, whether in a linear or nonlinear form.

2. Methodology

2.1. The geostatistical parameters

The method has been applied to monthly means of maximum and minimum daily temperatures. The interpolation is made on a 250 m grid by a multilinear regression method (see Appendix A). The geostatistical parameters are: elevation, sea distance, latitude, longitude and relative elevation. The elevation is obtained from a digital elevation model at 250 m resolution. The relative elevation is calculated as follows: for each grid point the minimum elevation within a 3 km radius is found, and the relative elevation is the difference between the elevation of the grid point and the elevation of that minimum. So, for a grid point in a valley the relative elevation is very low. Grid points with very low relative elevation should be associated with thermal inversion phenomena. Some experiments have been made with radii of 2 km and 5 km in calculating the relative elevation field, but both gave higher RMSEs on independent stations than the 3 km radius. A possible explanation is that the 3 km radius coincides with some spatial scale of the island's orography. For instance, with too large a radius it is possible to find a valley, but not the nearest valley for that pixel.
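
The relative elevation calculation described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `relative_elevation`, its parameters and the handling of grid edges (cells outside the grid are simply ignored) are hypothetical, and the digital elevation model is assumed to be a regular 2-D array of heights in metres.

```python
import numpy as np

def relative_elevation(dem, cell_size_m=250.0, radius_m=3000.0):
    """Height of each cell above the lowest cell found within radius_m."""
    dem = np.asarray(dem, dtype=float)
    r = int(radius_m // cell_size_m)  # search radius expressed in cells
    rows, cols = dem.shape
    # Pad with +inf so that cells beyond the grid edge never become the minimum
    # (an assumption; the source does not say how edges are treated).
    padded = np.pad(dem, r, mode="constant", constant_values=np.inf)
    local_min = np.full(dem.shape, np.inf)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Skip offsets that fall outside the circular neighbourhood.
            if (dy * dy + dx * dx) * cell_size_m ** 2 > radius_m ** 2:
                continue
            window = padded[r + dy : r + dy + rows, r + dx : r + dx + cols]
            local_min = np.minimum(local_min, window)
    return dem - local_min

# Invented toy DEM (metres) rising from west to east; 250 m cells, 250 m radius.
dem = np.array([[100.0, 200.0, 300.0]] * 3)
rel = relative_elevation(dem, cell_size_m=250.0, radius_m=250.0)
print(rel)
```

With the defaults (250 m cells, 3 km radius) the neighbourhood spans 12 cells in each direction, so the double loop visits a few hundred offsets regardless of the grid size.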

2.2. Hierarchy of parameters

In order to find the relative importance of each parameter the procedure is as follows (forward selection) (Figure 1). A regression with only one parameter at a time is made and the coefficient of determination CD (see Appendix B) is calculated (Wilks, 1995). The parameter with the highest CD is the first one. To find the second parameter, a two-parameter regression is made with the first parameter and each of the remaining parameters in turn. The highest CD gives the second parameter. To find the third parameter, a regression with three parameters is made: the two chosen previously and each of the remaining parameters in turn. The highest CD gives the third parameter, and so on until the last regression with five parameters. An example of this method will be presented in Sections 4.1 and 5.1. The method of maximum CD is also used to find the best exponent for sea distance and relative elevation in the regression equation.
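
The forward selection procedure can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the dictionary of predictors and the synthetic station data are all assumptions introduced for the example.

```python
import numpy as np

def coefficient_of_determination(X, y):
    """CD = 1 - SSE/SST for a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    sse = float(residuals @ residuals)
    sst = float(((y - y.mean()) ** 2).sum())
    return 1.0 - sse / sst

def forward_selection(predictors, y):
    """Rank predictors by adding, at each step, the one giving the highest CD."""
    chosen, history = [], []
    remaining = list(predictors)
    while remaining:
        scores = {}
        for name in remaining:
            cols = [predictors[k] for k in chosen + [name]]
            scores[name] = coefficient_of_determination(np.column_stack(cols), y)
        best = max(scores, key=scores.get)
        chosen.append(best)
        remaining.remove(best)
        history.append((best, scores[best]))
    return history

# Invented data for 60 hypothetical stations (not the ARPA Sardegna data).
rng = np.random.default_rng(0)
n = 60
h = rng.uniform(0.0, 1200.0, n)        # elevation, m
rheight = rng.uniform(0.0, 300.0, n)   # relative elevation, m
lat = rng.uniform(39.0, 41.0, n)       # latitude, degrees
t_min = 15.0 - 0.006 * h + 0.1 * np.sqrt(rheight) + rng.normal(0.0, 0.1, n)

ranking = forward_selection({"H": h, "Rheight": np.sqrt(rheight), "Lat": lat}, t_min)
for name, cd in ranking:
    print(f"{name}: CD = {100 * cd:.1f}%")
```

Each entry of `ranking` holds the parameter added at that step and the CD of the regression that includes it together with all previously chosen parameters, which is the quantity reported row by row in a forward selection table.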

Figure 1. Flow chart to find the hierarchy of parameters

2.3. Verification on independent stations

In order to calculate the interpolation errors on independent stations (i.e. stations not used in the regression) the procedure (Figure 2) is as follows (cross-validation with a developmental set of size n − 1). At step one, the first station is used for independent verification and the other n − 1 stations are used to find the regression coefficients. At step two, the second station is used for verification and the other ones for regression. At step three the third station is used for verification and the remaining ones for regression, and so on until the last station has been used for verification. This method has the advantage of maximizing the number of independent stations used for verification while at the same time maximizing the number of stations used to calculate the regression coefficients.
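
The verification loop amounts to leave-one-out cross-validation, which can be sketched as below. The function name `loo_rmse` and the three-station toy data are assumptions made for illustration only.

```python
import numpy as np

def loo_rmse(X, y):
    """Leave-one-out RMSE of a multilinear regression with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    errors = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # every station except station i
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        errors[i] = y[i] - A[i] @ coef  # error at the withheld station
    return float(np.sqrt(np.mean(errors ** 2)))

# Invented toy case: three stations, one predictor.
rmse = loo_rmse([[0.0], [1.0], [2.0]], [0.0, 1.0, 3.0])
print(rmse)
```

In this toy case the three withheld errors are 1, −0.5 and 1, so the RMSE is the square root of 0.75.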

Figure 2. Flow chart of the verification procedure

3. The data

The method has been applied to the 60 ARPA Sardegna meteorological stations (Figure 3). Monthly means of daily maximum and minimum temperature for 2007 are used in the interpolation. At least 90% of daily data are required for each station. Sardinia has an area of about 24 000 km² and lies in the middle of the western Mediterranean Sea. Its complex orography includes plains, deep valleys, hills and mountains (the peak of Mount Gennargentu is 1834 m high). The station locations provide a large range in height above sea level (from a few metres to 1209 m) and in distance from the sea (from a few hundred metres to 45 km). The mean distance between the stations is about 20 km.

Figure 3. Height above sea level of the stations. The maximum station height is 1209 m (Orgosolo Montes)

4. Results for minimum temperatures

4.1. Hierarchy of parameters

For each month of 2007 the hierarchy of parameters has been determined as explained in Section 2.2. Here the results for one winter month (January, Table I) and one summer month (July, Table II) are summarized. Tables I and II contain the CD of the regression. The first line in Table I refers to linear regression with one parameter. The most important one is the height (H), which describes 39% of the variance. The second line refers to the CD with two parameters, one of which is the height; the highest CD value corresponds to the relative height (Rheight). H and Rheight together account for 60% of the variance. The third line refers to the CD of the three-parameter regression (two of the parameters being H and Rheight). The CD rises to 65% when the sea distance is added. The last two lines in the table correspond to latitude and longitude, which make only a small contribution.

Table II for July presents similar results. The height alone describes 47% of the variance. The second parameter is Rheight, which together with H describes 68% of the variance. The third parameter is sea distance, and with it the CD rises to 70%. Like Rheight, sea distance enters the regression equation raised to a power, in this case 0.2; this point is discussed in Section 5.2. Finally, latitude and longitude make a small contribution to the CD. For the other months the values of CD as a function of the parameters are similar, which means that the relative importance of the parameters is the same for all months. Similar results are obtained for January and July of 2008 and 2009.

Figure 4 presents the RMSE on independent stations as a function of the parameters for January. The RMSE with only H is 1.85 °C; it drops to 1.55 °C on adding Rheight and to 1.48 °C on adding sea distance as a third parameter in the regression. Latitude and longitude seem not to have any effect on the RMSE. An important point is that Rheight enters the regression equation to the power of 0.5. In other words, the parameter is not Rheight but Rheight**0.5. Some experiments have been done with different exponents, in particular 1 and 0.5, and the latter produces better results in terms of CD and RMSE on independent stations.
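
The choice of exponent by maximum CD can be illustrated with a short sketch. Everything named here is an assumption for the sake of the example: the helper names `cd` and `best_exponent`, the candidate exponents and the noise-free synthetic data, which were deliberately built with an exponent of 0.5. The transformed variable is assumed to be non-negative.

```python
import numpy as np

def cd(X, y):
    """Coefficient of determination, 1 - SSE/SST, of a least-squares fit."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = ((y - A @ coef) ** 2).sum()
    sst = ((y - y.mean()) ** 2).sum()
    return 1.0 - sse / sst

def best_exponent(base, var, y, candidates=(0.2, 0.5, 1.0)):
    """Return the candidate p with the highest CD when var**p is added to base."""
    scores = {p: cd(np.column_stack([base, var ** p]), y) for p in candidates}
    best = max(scores, key=scores.get)
    return best, scores

# Invented, noise-free data constructed with exponent 0.5 (not station data).
h = np.linspace(0.0, 1200.0, 30)
rheight = (np.arange(30) * 37 % 300).astype(float)
t_min = 15.0 - 0.006 * h + 0.2 * rheight ** 0.5

best, scores = best_exponent(h, rheight, t_min)
print(best, scores)
```

On real data the CD values for neighbouring exponents can be close, so it is sensible to confirm the choice with the RMSE on independent stations, as the text does.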

Figure 4. RMSE on independent stations as a function of the parameters for minimum temperatures in January

Table I. Coefficient of determination for minimum temperatures in January
Forward selection. H is the height, Rheight is the relative height, Long is the longitude, Lat is the latitude, Sea_dist is the sea distance.

       H  Rheight  Long  Lat  Sea_dist
CD%39911035
       Long  Lat  Sea_dist  Rheight
CD%    46    39   54        60
       Long  Lat  Sea_dist
CD%    62    60   65
       Long  Lat
CD%    60    67
       Long
CD%    67

Table II. Coefficient of determination for minimum temperatures in July
Forward selection.

       H  Rheight  Long  Lat  Sea_dist
CD%4747230
       Long  Lat  Sea_dist  Rheight
CD%    48    49   50        68
       Long  Lat  Sea_dist
CD%    68    68   70
       Long  Lat
CD%    69    70
       Long
CD%    71

4.2. Effect of relative height (Rheight) on RMSE and CD

Figure 5 presents the CD for all the months of 2007, with and without Rheight in the multilinear regression. The effect of Rheight in the regression is to raise the CD by about 15%. CD ranges between 69% for April and 82% for November. There is no evident seasonal variation in CD. Figure 6 presents the RMSE on independent stations for all months of 2007, with and without Rheight. The introduction of Rheight in the regression lowers the RMSE on independent stations by about 0.3 °C. RMSE ranges between 0.97 °C (February) and 1.49 °C (August). In this case too there is no evident seasonal variation. The effect of Rheight is greater on the RMSE of particular stations where nocturnal inversions are very important. In most of these cases the effect is to lower the RMSE by about 1 °C, but in the case of Villanova Strisaili (in July) by as much as 5 °C.

Figure 5. Coefficient of determination of minimum temperatures for all the months of 2007, with (continuous line) and without (broken line) relative height (Rheight) in the multilinear regression

Figure 6. RMSE of minimum temperatures on independent stations for all months of 2007, with (continuous line) and without (broken line) Rheight

5. Results for maximum temperatures

5.1. Hierarchy of parameters

The first difference compared with minimum temperatures is the seasonal dependence. Table III shows the CD for different parameters for January. The first of them is the height, which by itself determines 86% of the variance. The second parameter is less evident and does not make a large contribution to the CD. In this case it is the longitude, which raises the CD to 89%. There is no evidence that either sea distance or relative height dominates the other. With the contribution of all the parameters the CD rises to 93%. Table IV shows the corresponding results for July. The first parameter is again the height, but it describes only 18% of the variance. The second parameter is clearly the sea distance, which raises the CD to 66%. The CD rises to 75% once all the other parameters are considered too.

Table III. Coefficient of determination for maximum temperatures in January
Forward selection.

       H  Rheight  Long  Lat  Sea_dist
CD%86711330
       Long  Lat  Sea_dist  Rheight
CD%    89    86   86        86
       Sea_dist  Lat  Rheight
CD%    91        92   92
       Lat  Sea_dist
CD%    93   92
       Sea_dist
CD%    93

Table IV. Coefficient of determination for maximum temperatures in July
Forward selection.

       H   Rheight  Long  Lat  Sea_dist
CD%    18  15       8     5    13
       Long  Lat  Sea_dist  Rheight
CD%    35    22   66        19
       Long  Lat  Rheight
CD%    74    66   67
       Lat  Rheight
CD%    71   75
       Lat
CD%    75

5.2. RMSE, CD and sea distance

Figure 7 presents the CD for each month (continuous line). Seasonal variation is evident: the CD varies between 76% in July and 96% in December. In summer a larger share of the variance than in winter is not described by these parameters. Figure 8 (continuous line) presents the RMSE on independent stations for each month; here too a seasonal variation is evident, with values ranging between 0.51 °C in March and 1.19 °C in July. The sea distance enters the regression equation to the power of 0.2. The effect of this exponent is evident in Figures 7 and 8 from a comparison of the continuous line (power 0.2) with the broken line (power 1). In summer months the increase of CD is evident: 9% in June, 11% in July, 6% in August (Figure 7). The effect on RMSE (Figure 8) is a reduction of nearly 0.2 °C in June, July and August.

Figure 7. Coefficient of determination of maximum temperatures for each month of 2007 (sea distance with power 0.2, continuous line; with power 1, broken line)

Figure 8. RMSE of maximum temperatures on independent stations for each month of 2007 (sea distance with power 0.2, continuous line; with power 1, broken line)

6. Conclusions

The introduction of relative elevation in the regression equation for monthly means of minimum temperatures produces a 15% increase in the coefficient of determination (CD). At the same time the RMSE on independent stations is reduced by about 0.3 °C. At particular stations with intense nocturnal inversions the RMSE is reduced by 1 °C or more. Regarding maximum temperatures, the use of the power of 0.2 instead of 1 for sea distance produces a CD increase close to 10% in summer and a corresponding RMSE decrease of 0.2 °C. These results refer to the whole of 2007 and have been confirmed for January and July 2008 and January and July 2009. There are a few points that could be investigated in the future. First, comparing Figure 5 with Figure 7 it is evident that in winter months the CD for maximum temperatures is about 95%, while the CD for minimum temperatures is below 80%. So, for minimum temperatures there is a larger portion of variance still to be accounted for. The second point is evident again in Figure 7, where the maximum temperature CD ranges between 76% in July and 96% in December, so in summer there is a larger portion of variance still to be described.

Acknowledgements

Paolo Boi wishes to thank Francesco Battaglia for his kind help. The authors are also grateful to two anonymous referees whose comments helped to improve the manuscript. Thanks to Dr Peter Burt for his constructive suggestions.

Appendix A

Multilinear regression is the general case of simple linear regression. In the case of five parameters (or predictors) the regression equation is:

  • $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5$ (A.1)

x_1 to x_5 are the predictors, which in our situation are elevation, relative elevation, sea distance, latitude and longitude. The coefficients b_0 to b_5 are determined by the condition of minimizing the sum of the squared differences between the observed values Y_i and the predictions y_i:

  • $\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - y_i)^2 \rightarrow \min$ (A.2)

The e_i are often called errors or residuals.
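
As a hedged illustration of the least-squares condition, the sketch below fits the six coefficients with NumPy and evaluates the regression at one hypothetical grid cell. The function names, the synthetic predictor values and the "true" coefficients are invented for the example and do not come from the study.

```python
import numpy as np

def fit_multilinear(X, Y):
    """Least-squares coefficients b0..bk for Y = b0 + b1*x1 + ... + bk*xk."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    A = np.column_stack([np.ones(len(Y)), X])  # column of ones gives b0
    b, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return b

def predict(b, X):
    """Evaluate the regression equation at one or more points."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return b[0] + X @ b[1:]

# Invented data: 60 hypothetical stations, 5 predictors scaled to [0, 1].
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(60, 5))
b_true = np.array([12.0, -6.0, 1.5, 0.8, -0.3, 0.2])  # arbitrary values
Y = b_true[0] + X @ b_true[1:]                        # noise-free "observations"

b = fit_multilinear(X, Y)
cell = [0.5, 0.1, 0.3, 0.2, 0.7]   # predictors at one hypothetical grid cell
y_cell = predict(b, cell)
print(b, y_cell)
```

Applying `predict` to the predictor values of every grid cell is what turns the fitted equation into an interpolated field.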

Appendix B

The coefficient of determination (CD) is defined as follows:

  • $CD = \frac{SSR}{SST}$ (B.1)

SST is the total sum of squares, i.e. the sum of squared deviations of the measurements around their mean.

SSR is the regression sum of squares, i.e. the sum of squared differences between the regression predictions and the mean of the measurements.

SST and SSR are related by

  • $SST = SSR + SSE$ (B.2)

SSE is the sum of squared errors, i.e. the sum of the squared residuals (the mean of the residuals is zero). CD can then also be written as:

  • $CD = 1 - \frac{SSE}{SST}$ (B.3)

When the regression predictions coincide with the measurements, the residuals and SSE are zero, SSR = SST and CD is one (100% as a percentage). CD is the portion of the variation of the measurements that is described by the regression. In the case of simple linear regression (one predictor) CD coincides with the square of the correlation coefficient.
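
The relations between SST, SSR, SSE and CD can be checked numerically. The sketch below is illustrative only: the function name `sums_of_squares` and the five data points are assumptions, and the fit includes an intercept, which the decomposition of SST requires.

```python
import numpy as np

def sums_of_squares(X, Y):
    """Return (SST, SSR, SSE) for a least-squares fit with an intercept."""
    Y = np.asarray(Y, dtype=float)
    A = np.column_stack([np.ones(len(Y)), np.asarray(X, dtype=float)])
    b, *_ = np.linalg.lstsq(A, Y, rcond=None)
    y = A @ b                                   # regression predictions
    sst = float(((Y - Y.mean()) ** 2).sum())    # total sum of squares
    ssr = float(((y - Y.mean()) ** 2).sum())    # regression sum of squares
    sse = float(((Y - y) ** 2).sum())           # sum of squared residuals
    return sst, ssr, sse

# Invented measurements with a single predictor.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = np.array([1.0, 2.9, 5.2, 6.8, 9.1])

sst, ssr, sse = sums_of_squares(x, Y)
cd_from_ssr = ssr / sst          # CD as the ratio SSR/SST
cd_from_sse = 1.0 - sse / sst    # CD as one minus SSE/SST
r = np.corrcoef(x, Y)[0, 1]      # correlation coefficient
print(cd_from_ssr, cd_from_sse, r ** 2)
```

The three printed values agree to rounding error, which shows both the equivalence of the two expressions for CD and its equality with the squared correlation coefficient when there is a single predictor.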

References

  • Cacciamani C, Paccagnella T, Nanni S. 1989. Objective mesoscale analysis of daily extreme temperatures in the Po Valley of Northern Italy. Tellus 41A: 308–318.
  • Chessa PA, Delitala MS. 1997. Objective mesoscale analysis of daily extreme temperatures of Sardinia (Italy) using distance from sea as independent variable. International Journal of Climatology 17: 1467–1485.
  • Daley R. 1991. Atmospheric Data Analysis. Cambridge University Press: New York, NY; 456.
  • Daly C, Gibson WP, Taylor GH, Johnson G, Pasteris P. 2002. A knowledge-based approach to the statistical mapping of climate. Climate Research 22: 99–113.
  • Daly C, Helmer EH, Quinones M. 2003. Mapping the climate of Puerto Rico, Vieques and Culebra. International Journal of Climatology 23: 1359–1381.
  • Gyalistras D. 2003. Development and validation of a high-resolution monthly gridded temperature and precipitation data set for Switzerland (1951–2000). Climate Research 25: 55–83.
  • Jarvis CH, Stuart N. 2001a. A comparison among strategies for interpolating maximum and minimum daily air temperatures. Part I: the selection of “guiding” topographic and land cover variables. Journal of Applied Meteorology 40: 1060–1074.
  • Jarvis CH, Stuart N. 2001b. A comparison among strategies for interpolating maximum and minimum daily air temperatures. Part II: the interaction between number of guiding variables and the type of interpolation method. Journal of Applied Meteorology 40: 1075–1084.
  • Ninyerola M, Pons X, Roure JM. 2000. A methodological approach of modelling of air temperature and precipitation through GIS techniques. International Journal of Climatology 20: 1823–1841.
  • Uboldi F, Lussana C, Salvati M. 2008. Three-dimensional spatial interpolation of surface meteorological observations from high-resolution local networks. Meteorological Applications 15: 331–345.
  • Wilks DS. 1995. Statistical Methods in the Atmospheric Sciences. Academic Press, Inc.: San Diego, CA; 466.