The selection of predictors in a regression-based method for gap filling in daily temperature datasets


  • Gianmarco Tardivo,

    Corresponding author
    1. Dipartimento di Agronomia Animali Alimenti Risorse Naturali e Ambiente, Università degli Studi di Padova, Legnaro, Italy
    • Correspondence to: G. Tardivo, Dipartimento di Agronomia Animali Alimenti Risorse Naturali e Ambiente, Università degli Studi di Padova, Viale dell'Università, 16-35020 Legnaro, Padova, Italy. E-mail:

  • Antonio Berti


Gaps in meteorological time series are a very common problem in long-term studies, for example when computer-based processing is needed to carry out general climatological analyses. The problem can be addressed with a method for reconstructing missing data, which must be adapted to the density of suitable stations and to the climate zone to which they belong. Regression-based methods are among the most important approaches to such reconstruction, and a suitable search strategy for identifying the best reconstructing stations is a basic requirement for implementing them properly. This article presents a detailed analysis of how the number of predictors in a regression-based approach, and the strategy used to search for them, affect the results. The multiple correlation between stations, in relation to their distance from the target station, was studied by checking performance with a recently published regression model. The study used daily minimum, mean and maximum temperature data from a dense network (111 stations within an area of ∼76.5 km radius, on average). For a network of this density, comparing different values of distance from the target station showed that performance was better when the maximum radius within which to start searching for predictors was equal to or greater than 40 km. It follows that the stations used to reconstruct gaps do not strictly need to be close to the target station. Setting the maximum number of predictors to four and the maximum radius to exactly 40 km significantly reduces the number of cases in which the reconstructed values reverse the natural order: minimum < mean < maximum temperature.
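As a rough illustration of the kind of procedure the abstract describes, the following Python sketch picks up to four predictor stations within a 40 km radius and fills gaps in a target series with an ordinary least-squares regression. It is not the regression model used in the article: the ranking by absolute Pearson correlation, the function names (`select_predictors`, `fill_gaps`, `count_order_violations`) and the array layout (days in rows, candidate stations in columns, `NaN` for missing values) are all assumptions made for this example.

```python
import numpy as np


def select_predictors(target, candidates, distances_km,
                      max_radius_km=40.0, max_predictors=4):
    """Return column indices of up to `max_predictors` candidate stations.

    Assumption for this sketch: stations inside the radius are ranked by the
    absolute Pearson correlation with the target on jointly observed days.
    """
    scores = []
    for j in range(candidates.shape[1]):
        if distances_km[j] > max_radius_km:
            continue  # station lies outside the search radius
        both = ~np.isnan(target) & ~np.isnan(candidates[:, j])
        if both.sum() < 3:
            continue  # too few common observations to estimate a correlation
        r = np.corrcoef(target[both], candidates[both, j])[0, 1]
        if np.isfinite(r):
            scores.append((abs(r), j))
    scores.sort(reverse=True)  # strongest correlation first
    return [j for _, j in scores[:max_predictors]]


def fill_gaps(target, candidates, predictor_idx):
    """Fill NaNs in `target` using a linear regression on the chosen stations."""
    X = candidates[:, predictor_idx]
    x_ok = ~np.isnan(X).any(axis=1)      # days where every predictor is observed
    train = x_ok & ~np.isnan(target)     # days usable for fitting
    A = np.column_stack([np.ones(train.sum()), X[train]])
    coef, *_ = np.linalg.lstsq(A, target[train], rcond=None)
    filled = target.copy()
    gaps = x_ok & np.isnan(target)       # days that can be reconstructed
    filled[gaps] = np.column_stack([np.ones(gaps.sum()), X[gaps]]) @ coef
    return filled


def count_order_violations(tmin, tmean, tmax):
    """Count days on which the order minimum <= mean <= maximum is reversed."""
    return int(np.sum((tmin > tmean) | (tmean > tmax)))
```

Running `select_predictors` and `fill_gaps` separately for the minimum, mean and maximum series and then passing the three filled series to `count_order_violations` gives a simple way to compare different choices of radius and number of predictors against the ordering criterion mentioned above.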