Climate hazards cause human and economic impacts on a global scale. Floods and droughts in particular have major human and economic effects (Bruce, 1994; Obasi, 1994), and both are closely related to precipitation intensity, frequency and duration. An increase in precipitation-related natural hazards and consequent economic loss have been reported by several authors (Karl and Easterling, 1999; Meehl et al., 2000; Peterson et al., 2002; Aguilar et al., 2005; Hundecha and Bardossy, 2005; Moberg and Jones, 2005), and can be attributed to an increase in the vulnerability of society to these events (Kunkel et al., 1999). Climate change models indicate that the frequency and magnitude of extreme precipitation events could increase markedly in a number of regions during this century (e.g. Boroneant et al., 2006; Frei et al., 2006; Pall et al., 2007). This is likely to be accompanied by changes in the frequency distribution of precipitation (Katz and Brown, 1992; Easterling et al., 2000), affecting both the intensity and duration of precipitation-related extreme events, such as drought and floods.
Accurate estimates of the risk of extreme precipitation events and droughts are needed for agricultural and water management, land planning, public works programs and in other sectors. Extreme value analysis techniques are normally used to obtain frequency estimates of dangerous phenomena, and to enable the probability of occurrence and the return period of extreme events of a given magnitude and/or duration to be assessed (e.g. Smith, 1989, 2003).
The analysis of precipitation-related processes requires spatially dense databases with a high frequency of precipitation records (daily or sub-daily). The vast majority of available information is at a daily time scale, and sub-daily information is not widely available. Long time series datasets are the most valuable, as the reliability of frequency estimations is closely related to the sample size used during the analysis process (Porth et al., 2001). Long time series are also necessary in analysis of the temporal variability and trends of extreme events, and to estimate the risk and probability of these events.
Long-term, dense and reliable daily precipitation databases are uncommon for several reasons. Changes in the location of observatories within the same locality are frequent, resulting in fragmented or inconsistent data series. Human error can occur during the process of observation, and in the transcription and digitization of data (Reek et al., 1992). In addition, measurements at a meteorological station can vary as a consequence of instrument deterioration or replacement, variations in the time of observations, and changes in the surrounding environment. These factors increase noise in the data, and can lead to inhomogeneities that make the data unusable (Peterson et al., 1998; Beaulieu et al., 2007).
To overcome these problems, and to construct a reliable database for extreme value analysis, a process of reconstruction, quality control and homogenization of precipitation data is needed. This approach is common for monthly precipitation series, and several complete and homogeneous precipitation databases have been created (e.g. González-Rouco et al., 2001; González-Hidalgo et al., 2004; Brunetti et al., 2006). Several homogeneous databases of temperature series have also been generated at a daily time resolution (e.g. Manton et al., 1998; Brunetti et al., 2006), and some approaches have been developed for homogenizing this variable (Allen and DeGaetano, 2000; Vincent et al., 2002; Brandsma and Könen, 2006).
However, despite the importance of such data, no complete protocols are available for processing daily precipitation datasets. Various procedures have been developed for filling data gaps in daily precipitation series (Karl et al., 1995; Eischeid et al., 2000), and for automatic (Feng et al., 2004) and manual (Griffiths et al., 2003; Aguilar et al., 2005) quality control of the datasets. These procedures are widely applied before analysis of daily precipitation data. Nevertheless, testing for inhomogeneities in daily precipitation datasets is not common, and although some reports exist (e.g. Schmidli and Frei, 2005; Tolika et al., 2007) they have usually addressed precipitation volume and not precipitation frequency, which is also crucial for guaranteeing the homogeneity of daily precipitation datasets (Wijngaard et al., 2003; Viney and Bates, 2004). Although there have been some recent advances in the creation of quality-controlled and homogenized daily precipitation datasets (e.g. Peterson et al., 2008), complete procedures for optimizing all the available information are lacking, as typically only the longest and most complete series are used. This implies inefficient use of the available data and forgoes the possibility of obtaining spatially dense information, particularly in countries where highly fragmented series that do not fulfil the standard criteria of length and completeness exist in neighbouring areas.
Some studies in the Iberian Peninsula have focused on the creation of daily precipitation databases for spatial and temporal studies of climate change and climatic hazards. Romero et al. (1998) created 410 complete daily precipitation series for the Spanish Mediterranean provinces, using information derived from 3366 individual series. This analysis was mainly focused on filling gaps, and the homogeneity of the resultant series was not checked. Moreover, the final dataset is temporally limited, covering only the period from 1964 to 1993. Lana et al. (2004) developed a daily precipitation database for 1950–2000 from 75 rain gauges in Catalonia (northeast Spain). Homogeneity was checked using monthly totals, but there were gaps in the dataset. Although some global daily precipitation databases exist (e.g. Peterson et al., 1997; Gleason, 2002), the spatial density of data is very poor in most regions. For example, the Global Daily Climatology Network (GDCN; http://www.ncdc.noaa.gov/oa/climate/research/gdcn/gdcn.html#precip) includes only 19 precipitation observatories for the Iberian Peninsula, which is inadequate for capturing the large spatial variability that characterizes precipitation in this region. Klein-Tank et al. (2002) also developed a database for daily precipitation in Europe, but the spatial density is too low (10 observatories in Spain) to enable adequate spatial analysis.
This paper presents a process for reconstruction of a spatially dense database of daily precipitation records for the northeast part of Spain, using data since 1900 from the archives of the Spanish National Institute of Meteorology. The process included selection of suitable observatories for the reconstruction, gap filling, identification of anomalous and questionable records, and homogeneity testing. The objective was to construct a spatially dense, continuous, long, and reliable database for climate studies, reducing noise in the data as much as possible and eliminating all likely inconsistencies.
The original database comprised 3106 daily precipitation observatories in the study area, whose activities spanned the period of existence of the Spanish National Institute of Meteorology (1900–2002). The boundaries of the database correspond to administrative limits, and include 18 provinces in the northeast of Spain with a total area of 159 423.7 km2 (Figure 1). The spatial density is very high (one observatory per 51.3 km2), although variation in the density among regions is also high.
Standardization of instruments is very important to ensure the temporal homogeneity of the data. Pluviometers used in the Spanish observation network are approved and normalized by the National Institute of Meteorology, which provides the instruments and guarantees their uniformity. The Hellmann pluviometer was officially adopted in 1911 for the Spanish precipitation network. This device is characterized by a hollow cylinder with a funnel formed at one end, and is placed on a pole at a height of 1.5 m. The opening of the pluviometer has a surface area of 200 cm2, and it has a brass ring with a beveled edge to ensure the surface area remains constant and to avoid splash. Measurements are taken twice daily, at 9:00 h and 14:00 h. The measurement protocol has remained unchanged since 1911. Only two of the data series used in this study contained precipitation data prior to 1911. Therefore, the database was not affected by changes in instrumentation or measurement protocol.
The original database series were highly variable in terms of record lengths and the quantity and duration of data gaps. No information was available about data acquisition and the history of each observatory (metadata), including changes in location, measurement conditions, observers, and observation times. Metadata are very useful for assessing the quality of a series and to identify possible errors, but are frequently missing in raw climate databases.
The data collection periods for the majority of observatories were very short (374 had fewer than 60 months of complete records, and 1723 had 240 months or fewer). Only 286 stations had more than 600 months of records (50 years of data). The database was also very fragmented, with several observatories located in the same locality, but covering different periods.
The method comprised three main steps. The first involved reconstruction of the precipitation series with the objective of deriving continuous and long-term series by combining short-duration series from nearby observatories, and the filling of gaps by using auxiliary information obtained from nearby observatories. The second step was a quality control assessment of the reconstructed series to identify and substitute anomalous and questionable records in the database (negative precipitation, extreme precipitation events, some zero values, and records that differed markedly from values recorded in neighbouring observatories). The third step tested the homogeneity of the reconstructed series using four parameters of the series. This enabled identification of complete series and removal of periods for which data were not homogeneous; the latter was carried out to avoid the presence of spurious information in the final dataset. The three steps, the decisions taken, and the products obtained are described below in detail.
3.1. Reconstruction and gap filling
The reconstruction of a single long time series from a number of shorter series from neighbouring observatories enabled optimization of the highly fragmented daily precipitation data typical of many datasets. Reconstruction relied on the assumption that the cessation of data recording at one observatory, and the establishment of one or more new observatories close to the existing one, results in two or more data series which are usually not useful for climate analyses as a consequence of their short duration. Nevertheless, if the observatories are sufficiently close the differences in precipitation records are usually very small, so data from the shorter series can be combined into a single series, which is ascribed to the last observatory that collected the data. It is important to note that this approach is only valid where observatories are separated by short distances, and it is assumed that the combined series can exhibit inhomogeneities due to the reconstruction process. These inhomogeneities need to be identified and removed from further analyses.
A review of the literature showed that no general criterion exists for the selection of observatories suitable for reconstruction. Lana et al. (2004) considered that a minimum of 31 years of data in a 50-year period was necessary for a series to be included in a daily precipitation database for Catalonia. In a study involving the Spanish Mediterranean coast, Romero et al. (1998) used only those series with less than 10% of data missing in a 30-year period. Similarly, Eischeid et al. (2000) set a maximum of 48 months of missing data in a 40-year period as the criterion for rejecting a data series in western USA. From a set of 181 series, Haylock and Nicholls (2000) used only those that had fewer than 10 days missing per year for at least 80 years in the 88-year period they considered. Moberg and Jones (2005) were more restrictive in a trend analysis for the whole of Europe, selecting only those series with fewer than 3 years with data gaps in a total of 89 years.
We followed different criteria to select suitable data series for reconstructions as a function of temporal duration, the period covered, and the data gaps. Data series that covered a period of less than 15 years were considered too short to be suitable for reconstruction. This group of observatories (1106; labelled as Z in the dataset) were reserved for use in reconstructions of data from other observatories. The exception was where Z observatories included data from 2000–2002; these series were considered of high value as they could complement long-term data from nearby observatories that had ceased data collection in previous years. A total of 37 series matched this criterion. The remaining series (1963) were divided into two groups. Series A included those series (1094) covering a period of more than 25 years with less than 10 years of data gaps. Series B included those series (869) covering a period of 15 to 25 years, or a period of more than 25 years but with more than 10 years of data gaps. Series labelled as A and series labelled as B were considered suitable for reconstruction.
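The selection rules above can be sketched as a small classifier. This is only an illustration, not the procedure actually used: the `SeriesInfo` fields and the `"Z-recent"` label (for Z series reaching 2000–2002) are hypothetical names, and the treatment of boundary cases (exactly 25 years covered, or exactly 10 years of gaps) is an assumption, since the text does not specify them.

```python
from dataclasses import dataclass


@dataclass
class SeriesInfo:
    span_years: float  # length of the period covered by the series
    gap_years: float   # total duration of data gaps within that period
    last_year: int     # last year with observations


def classify_series(info: SeriesInfo) -> str:
    """Return 'A', 'B', 'Z' or 'Z-recent' following the criteria in the text."""
    if info.span_years < 15:
        # Short series are auxiliary (Z); those reaching 2000-2002 are kept
        # as candidates because they can update older nearby series.
        return "Z-recent" if info.last_year >= 2000 else "Z"
    if info.span_years > 25 and info.gap_years < 10:
        return "A"
    # 15 to 25 years, or more than 25 years with many gaps
    # (boundary cases are assigned to B by assumption).
    return "B"
```

For example, `classify_series(SeriesInfo(40, 3, 2002))` falls in group A, whereas the same series with 12 years of gaps falls in group B.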
The next step was to fill data gaps in the time period covered by each series. Approaches to gap filling in daily climate series (Eischeid et al., 2000) involve consideration of only the data in the series, or involve use of the data from nearby observatories. Karl et al. (1995) and Brunetti et al. (2001) filled missing values by generating random rainfall amounts, based on the probability distributions of the variables studied. The goal of this procedure was not to give a realistic estimate of the unknown daily values, but to obtain a data series of equal length without changing the probability distributions of rainfall amounts. Nevertheless, for other applications it is more reliable to use methods based on the values recorded at nearby observatories (Paulhus and Kohler, 1952; Eischeid et al., 2000). We focused on methods based on the information from neighbouring observatories, and tested three different procedures: the nearest neighbour, inverse distance weighted interpolation, and linear regression methods.
To compare the methods, artificial data gaps were created by randomly removing 1% of the available observations from the A and B observatories. A total of 1963 series were tested, which involved the creation of 181 861 artificial data gaps; each gap was then estimated using the nearest-neighbour, inverse distance weighted interpolation, and linear regression methods. The root mean square error (RMSE) of the reconstructed values was then used to compare the three methods.
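A minimal sketch of this validation design is given below, assuming that gaps are drawn uniformly at random within each series. The function names and the fixed random seed are illustrative choices, not part of the original procedure.

```python
import math
import random


def make_artificial_gaps(n_obs: int, fraction: float = 0.01, seed: int = 0) -> list:
    """Return sorted indices of observations to withhold (about `fraction` of n_obs)."""
    rng = random.Random(seed)
    n_gaps = max(1, round(n_obs * fraction))
    return sorted(rng.sample(range(n_obs), n_gaps))


def rmse(observed, estimated) -> float:
    """Root mean square error between withheld observations and their estimates."""
    if len(observed) != len(estimated) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))
```

The withheld values at the returned indices are compared with the values estimated by each gap-filling method, giving one RMSE per method and per series.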
1. In the nearest-neighbour method data gaps were filled directly with data from the closest observatory that had information. To apply this method, two criteria were established: the nearest neighbour had to be within a radius of 15 km of the target observatory, and the correlation (Pearson's r) between the daily precipitation series from both observatories had to be higher than 0.5, with a minimum of 3 years of common data. These criteria were based on the average distance among observatories for the complete dataset, and the average correlations among observatories at different distances. Descriptive statistics showed that the average number of neighbouring observatories within a radius of 5 km was very small (3), but increased to 9 with a 10-km radius (Figure 2). However, the availability of observatories was highly variable across regions: 25% of the observatories had fewer than 6 neighbours within a radius of 10 km. To overcome this problem, we selected a threshold radius of 15 km, for which only 5% of the observatories had fewer than 10 neighbours. This threshold was not large enough to lead to important differences in the precipitation conditions among observatories.
As a distance of 15 km may not have ensured similarity in precipitation conditions between two sites (e.g. in the case of strong elevation differences), we established an additional criterion in fixing the distance threshold, based on the correlation between the two series. The average correlation in daily precipitation between pairs of observatories with a minimum of 3 years of common data decreased rapidly as a function of distance from 1 to 50 km (average r from 0.78 to 0.45, Figure 3). At greater distances the decrease was slower but sustained. At a distance of 15 km the average correlation was r = 0.62. At greater distances (e.g. 25 km) a larger number of neighbours was obtained, but at the cost of lower correlations (average r = 0.57). In contrast, at shorter distances (e.g. 10 km, r = 0.67) the number of neighbours decreased markedly, as indicated above. Thus, a threshold distance of 15 km appeared to be a good compromise.
2. For interpolation from neighbouring data series we selected a local method based on inverse distance weighting (IDW):
$$ z(x_j) = \frac{\sum_{i=1}^{n} z(x_i)\, d_{ij}^{-r}}{\sum_{i=1}^{n} d_{ij}^{-r}} $$
where z(xj) is the predicted value, obtained as the weighted average of the data at points z(x1), z(x2), …, z(xn). The distance (dij) between xi and xj determines the weighting factor, and we used an exponent r of 2. We fixed a maximum distance of 15 km for the interpolation.
3. In the linear regression method, missing data were estimated from the single most highly correlated neighbouring series, which was used as the independent variable. To avoid negative values and to retain the zeros, the regression line was forced to pass through the origin, providing a model with only a slope coefficient. This approach has been used to reconstruct daily temperature series (e.g. Allen and DeGaetano, 2001), as this variable is not affected by abrupt spatial changes, and varies gradually in space. For such variables, linear regression is well suited to obtaining reliable dependence models between a candidate observatory and the auxiliary observatories used in the reconstruction. Additional problems arise with daily precipitation series, as these usually show lower correlations even among close observatories (Auer et al., 2005).
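The three candidate estimators can be sketched for a single missing day as follows. This is a simplified illustration under stated assumptions: neighbour distances and correlations are taken as precomputed inputs, missing neighbour values are represented by `None`, and all function and constant names are hypothetical. The IDW exponent is named `power` here to avoid confusion with Pearson's r.

```python
from typing import Optional, Sequence

MAX_DISTANCE_KM = 15.0   # search radius used in the text
MIN_CORRELATION = 0.5    # minimum Pearson's r for the nearest-neighbour method


def nearest_neighbour(values: Sequence[Optional[float]],
                      distances_km: Sequence[float],
                      correlations: Sequence[float]) -> Optional[float]:
    """Value at the closest neighbour that has data and meets both criteria."""
    candidates = [(d, v) for v, d, r in zip(values, distances_km, correlations)
                  if v is not None and d <= MAX_DISTANCE_KM and r > MIN_CORRELATION]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])[1]


def idw(values: Sequence[Optional[float]],
        distances_km: Sequence[float],
        power: float = 2.0) -> Optional[float]:
    """Inverse distance weighted mean of all neighbours within the radius."""
    num = den = 0.0
    for v, d in zip(values, distances_km):
        if v is None or d > MAX_DISTANCE_KM:
            continue
        if d == 0:
            return v  # co-located neighbour: use its value directly
        w = d ** -power
        num += w * v
        den += w
    return num / den if den > 0 else None


def origin_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x for a line forced through the origin."""
    sxx = sum(a * a for a in x)
    if sxx == 0:
        raise ValueError("predictor series has no non-zero values")
    return sum(a * b for a, b in zip(x, y)) / sxx
```

With the regression model, the estimate for a missing day is the slope multiplied by the value at the most correlated neighbour, so a zero at the neighbour always yields a zero estimate.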
Of the three methods, the nearest-neighbour method provided the best results, with an average RMSE of 1.05 mm (with a range of 0.23–5.7 mm between stations). The IDW approach had an average RMSE of 1.23 mm (range 0.31–7.2), and the linear regression method had an average RMSE of 1.31 mm (range 0.30–6.9).
The performance of the three methods could not be evaluated solely on the basis of the RMSE, because a low RMSE value can mask important changes in the frequency of rainy days and extreme values. The series from the Fabra observatory in Barcelona illustrates this problem (Figure 4). This series was of good quality, had complete records (see details about the observatory in Rodríguez et al., 1999), and had a large number of neighbouring observatories. The entire series was reconstructed between 1913 and 2002. The conclusions derived from the Fabra observatory can be generalized to the other observatories. Among the three methods, the IDW and regression methods reduced the frequency of extreme values, and increased the frequency of events of less than 50 mm. The nearest-neighbour method more accurately reconstructed the frequency of the most extreme events in the series. In comparison with the original series, the IDW reconstruction noticeably decreased the total number of dry days and increased the number of rainy days, affecting the number and duration of dry and wet spells and the average precipitation per day. This was a direct consequence of neighbour averaging, as a single neighbouring series with a daily record above zero resulted in the reconstructed series changing from a dry to a wet day. The IDW method included contributions from poorly correlated series, as all the observatories within 15 km were included. An alternative strategy may have been to weight the neighbouring data according to the correlation coefficient rather than the distance. However, this would not have avoided the decrease in the number of dry days, as this is intrinsic to the averaging nature of the method, and independent of the weighting criterion chosen. In contrast, the regression method overestimated the number of dry days and underestimated the number of wet days.
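The effect of neighbour averaging on dry-day counts can be illustrated with a toy example. The two neighbour series and their distances below are invented purely for illustration and are not taken from the Fabra data; the helper names are hypothetical.

```python
def idw_estimate(values, distances_km, power=2.0):
    """Inverse distance weighted mean of the neighbour values for one day."""
    weights = [d ** -power for d in distances_km]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def count_dry_days(series):
    """Number of days with exactly zero precipitation."""
    return sum(1 for v in series if v == 0.0)


# Invented daily precipitation (mm) at two neighbours, 3 km and 9 km away.
near = [0.0, 0.0, 5.0, 0.0, 0.0, 30.0]
far = [0.0, 2.0, 4.0, 0.0, 1.0, 10.0]

nn_series = list(near)  # nearest-neighbour reconstruction copies the closest series
idw_series = [idw_estimate([a, b], [3.0, 9.0]) for a, b in zip(near, far)]
```

In this example the IDW estimate is zero only on days when both neighbours are dry, so `idw_series` contains fewer dry days than `nn_series`, and its largest value is smaller than the largest value in the nearest series.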
The nearest-neighbour method provided statistics closest to the original record, and also maintained the distribution characteristics of the original series better than the other methods. This evidence favoured the nearest-neighbour method over more sophisticated procedures involving several neighbours, and we therefore chose this method for gap filling in the dataset. For this purpose, we used the Z series within a 15-km radius of the target A series, and with a correlation coefficient higher than 0.5. If some gaps remained in the A series after this process, we also used the B series (lacking information until 2000–2002) within the 15-km radius. The B series used to fill gaps in some A series were discarded in subsequent reconstructions. If no data were available within a 15-km radius, we rejected all the data prior to the gap so as to avoid potential inaccuracies introduced by using data far from the target observatory. The only exception to this was the 1936–1939 period, during which the instability caused by the Spanish Civil War markedly reduced the number of observatories. In the few observatories for which gaps remained during these years, the earlier information was not deleted. A total of 862 observatories from the 1094 original A series were filled following this criterion. The remaining observatories (232) had the data prior to the gaps removed.
The remaining B series were also completed using the Z series and neighbouring B series, always located within a radius of 15 km. The Z series were used first, and the remaining gaps were completed using the B series. To avoid redundant information, we gave preference to the B series with data until 2000–2002. If both series did not reach this date, the shorter series was used to complete the longer series. The series used for gap filling were subsequently discarded. The data gaps in the remaining 37 Z series with data until 2000–2002 were filled following the same procedure.
After the gap-filling procedure, there were 1663 complete series comprising 1094 (A), 532 (B) and 37 (Z, until 2000–2002). Many of the completed series covered different periods. As the objective was to obtain complete and reliable series up to the present, we performed a reconstruction procedure to create new, updated series from near-complete series that covered different periods. As this procedure was a key issue for the creation of the database, the reconstruction process was done manually with the aid of a geographical information system in which the location, the data period, and the topography were available. This was the only manual step in the creation of the database. Although an automatic procedure would be desirable for performing this step, we were unable to find an optimal approach that allowed merging of the series. The topographical diversity (including elevation, topographic barriers, different atmospheric influences among neighbouring valleys, and other factors), and the need to avoid redundant information, necessitated use of a manual process for this reconstruction.
Long series from observatories without data up to 2000–2002 were assigned to observatories with data updated to 2000–2002, which were located in the same or nearby municipalities (always within a radius of 15 km) and had similar topographic conditions (less than 100 m difference in elevation), using the nearest-neighbour method. Those series that finished before 2000 and lacked neighbours meeting these criteria were eliminated from the final dataset. When daily information was coincident among two or more observatories for the same day, data of the observatories containing data up to 2000–2002 were preferred. The spatial location of the reconstructed series was assigned to the location of the observatory with data updated to 2000–2002.
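The numerical part of these merging rules can be sketched as follows. The actual reconstruction was performed manually with a GIS, so this is only an illustration of the distance, elevation and overlap criteria; the function names and the use of date strings as dictionary keys are assumptions.

```python
def can_merge(distance_km: float, elevation_diff_m: float) -> bool:
    """Distance and elevation criteria for assigning an older series to an updated one."""
    return distance_km <= 15.0 and abs(elevation_diff_m) < 100.0


def merge_series(updated: dict, older: dict) -> dict:
    """Combine two {date: value} series, preferring the observatory with data to 2000-2002.

    Missing values are represented by None and never overwrite available data.
    """
    merged = {day: value for day, value in older.items() if value is not None}
    for day, value in updated.items():
        if value is not None:
            merged[day] = value  # updated observatory wins on coincident days
    return dict(sorted(merged.items()))
```

The merged series would then be assigned the coordinates of the observatory whose records extend to 2000–2002.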
The result of this manual reconstruction process was 934 observatories with complete records until 2000–2002. Therefore, of the 3106 original observatories, 2172 were used in the reconstruction process, and thereafter discarded according to the criteria described above. Of the 934 observatories with complete records, 383 (41%) were reconstructed or combined with other observatories to provide data covering more than 20 years; 229 (24.5%) were reconstructed with 5–20 years of data; and 322 (34.5%) had data gaps less than 5 years. The spatial distribution of the reconstructed series (Figure 5) showed a homogeneous distribution in the study area, although a higher density of data series was present in some areas, such as in the south of the Lérida, Barcelona, and Castellón provinces, and the centre of Huesca and Álava provinces. The spatial density was lower than in the original series (170.7 km2 per observatory, compared to 51.3 km2 for the original dataset). All the provinces are well covered, and the presence of bias in the spatial distribution of the observatories used in reconstructions was discounted.
The number of data series available varied with respect to the starting date. Only two series were available with complete data from 1901. For 1920 onwards there were 68 series, and after 1940 there was a progressive increase from 206 series to a maximum of 934 in 1988. After 2000 there was a decrease in the number of available series until 2002 (December 2002 = 834 series).
3.2. Quality control
The objective of quality control was to identify erroneous or questionable records in the climate datasets. Errors in climate series are a common problem arising from sources such as the condition of instruments, and from data processing, including collection, transcription and digitizing (Reek et al., 1992). As a consequence of the simplicity of rain gauges, very few instrument errors were expected in the databases used for this work. Nevertheless, errors arising from data processing are comparable to those affecting other climatic variables such as temperature.
Several criteria have been proposed for identifying erroneous data in climate series. As most errors are due to outliers (exceptionally high or low spurious values), some authors identify questionable values from a fixed threshold derived from the average and standard deviation of the series. The data of the previous and following day can also be used to identify anomalous spikes in variables such as temperature (Gleason, 2002). However, this approach cannot be applied to precipitation series owing to their high temporal and spatial variability. A better approach to identifying outliers is comparison of daily values among neighbouring observatories. Several procedures can be followed for this spatial comparison. Feng et al. (2004) used linear regression of daily precipitation data between each observatory and the five most highly correlated observatories. The regression residuals were used to identify questionable values. Griffiths et al. (2003) identified major outliers in the rainfall records, and manually assessed whether the values were related to real meteorological events such as flooding or heavy rainfall events. They also checked for internal consistency against records from nearby localities. A similar manual approach was followed by Manton et al. (2001) in the South Pacific, Brunetti et al. (2001) in Italy, and Aguilar et al. (2005) in Central America.
Due to the high density of data available, we were able to follow an approach based on comparison of the rank of each data record with the average rank of the data recorded in adjacent observatories. The original daily precipitation series were converted to percentiles after eliminating the zero values, which accounted for more than 60% of the data. Each precipitation value was replaced by its corresponding percentile, according to the complete series. After transformation the zero values were assigned a zero percentile. For each data series we selected the observatories located within a radius of 20 km, and set a criterion of a minimum of 4 observatories as a condition for performing the test. Where this criterion could not be met, the daily value of the target observatory was not compared.
Only records above the 99th percentile were checked in the first stage. The maximum allowed difference between a candidate observation and the average percentile in the neighbouring observatories was set at 60 percentile units. If the difference was higher than this, the candidate observation was considered questionable. These values were flagged and substituted with data from the closest series. In the second stage, the records below the 99th percentile were compared to the average of the neighbouring series. In this case a difference of 70 percentile units was set as the threshold for identifying questionable data, and values exceeding this were flagged and substituted with data from the closest observatory.
Another common source of error is the inclusion of false zero values (Viney and Bates, 2004). Hence, zero values coinciding with substantial precipitation in nearby observatories were flagged following a similar approach, and if the average percentile in the neighbouring observatories was higher than 50, the zero value of the target observatory was substituted with data from the closest observatory.
The thresholds described above were set after analysing a number of series and showing that these values optimized the identification of questionable data. Different thresholds may be required in places where the climate characteristics are different.
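The percentile comparison can be sketched as below. Several details are assumptions made for illustration: percentiles are computed as empirical ranks among the non-zero values of each series, the difference is taken in absolute value, zero records are tested only with the zero-value rule, and the function names are hypothetical.

```python
from bisect import bisect_right


def to_percentiles(series):
    """Replace each wet-day value by its empirical percentile; zeros map to 0."""
    wet = sorted(v for v in series if v > 0)
    n = len(wet)
    return [0.0 if v <= 0 else 100.0 * bisect_right(wet, v) / n for v in series]


def is_questionable(target_pct, neighbour_pcts, min_neighbours=4):
    """Apply the three thresholds from the text to one daily record."""
    if len(neighbour_pcts) < min_neighbours:
        return False  # too few neighbours within 20 km: the record is not compared
    mean_pct = sum(neighbour_pcts) / len(neighbour_pcts)
    if target_pct == 0.0:
        return mean_pct > 50.0  # possible false zero
    limit = 60.0 if target_pct > 99.0 else 70.0
    return abs(target_pct - mean_pct) > limit
```

For instance, a record at percentile 99.8 whose neighbours average below percentile 10 would be flagged, whereas the same record with neighbours averaging percentile 55 would be retained.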
Following Reek et al. (1992), we also checked the series for the occurrence of identical values (excluding zeros) on at least 7 consecutive days. These data were also flagged and substituted.
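A check of this kind can be sketched as a simple scan for runs of repeated values. The function name is hypothetical, and representing missing days as `None` is an assumption.

```python
def find_identical_runs(series, min_length=7):
    """Return (start, end) index pairs of runs of identical non-zero values."""
    runs = []
    start = 0
    n = len(series)
    while start < n:
        end = start
        # Extend the run while the next value equals the first value of the run.
        while end + 1 < n and series[end + 1] == series[start]:
            end += 1
        value = series[start]
        if value is not None and value != 0 and end - start + 1 >= min_length:
            runs.append((start, end))
        start = end + 1
    return runs
```

Runs of zeros are ignored because long dry spells are a normal feature of daily precipitation series.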
On average, the proportion of data substituted using the above criteria was 0.1% in each observatory (range 0–1.04%; Figure 6). Only 47 series from a total of 934 had more than 0.4% of the data replaced. Most of the replacements (63.8%) corresponded to zero values. These values are similar to those reported in other studies. For example, Feng et al. (2004) found an average of 0.03% of the data questionable, while Reek et al. (1992) reported a figure of 0.04%.
As the methodology described can affect the probability distribution of the most extreme records in a series, a test was performed using standard methods for extreme value analysis. For this purpose we calculated the L-coefficients of skewness and kurtosis of the data series before and after the quality control process. Partial duration series (PDS) or series of peaks over a threshold were extracted in order to isolate only the extreme values (Beguería, 2005). Given a precipitation series X = (x1, x2, …, xn), where xi is the observation on day i, the PDS Y = (y1, y2, …, ym) consists of the exceedances of the original series over a predetermined threshold, x0:
$$ y_j = x_i - x_0 \quad \forall\; x_i > x_0 $$
Therefore, the size of the series obtained depends on the value of the threshold, x0. For each series, the values corresponding to the 90th and 95th percentiles before and after the quality control process were used as thresholds for constructing the PDS.
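The extraction of a PDS from a percentile threshold can be sketched as follows. Two points are assumptions: the exceedances are stored as amounts above the threshold, and the percentile is computed by linear interpolation over whatever sample is passed in (the text does not state whether zero days were included when computing the 90th and 95th percentiles).

```python
def percentile(values, q):
    """Percentile q (0-100) by linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("empty sample")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def partial_duration_series(series, threshold):
    """Exceedances of the daily series over the threshold x0."""
    return [x - threshold for x in series if x > threshold]
```

A higher threshold gives a shorter PDS, which is why the comparison was repeated for two thresholds.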
The L-coefficients of skewness (τ3) and kurtosis (τ4) were calculated as follows:
where λ2, λ3 and λ4 are the L-moments of the PDS series. These were obtained from the probability-weighted moments (PWMs) of the series, using the formulae:
The PWMs of order s were calculated as:
where Fi is an empirical frequency estimator corresponding to the data xi. Fi was calculated following Hosking (1990):
where i is the rank of xi in the PDS arranged in ascending order, and N is the number of data records.
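For reference, one standard formulation consistent with the definitions in this subsection is sketched below. It should be read as an assumption rather than as a transcription: the exceedance form of the PDS, the α-type convention for the PWMs and the plotting-position constant 0.35 are the usual choices associated with Hosking (1990), and the exact expressions used may differ in detail. In the PWM expression, xi denotes the PDS values sorted in ascending order.

```latex
\begin{align}
y_j &= x_i - x_0 \qquad \forall\, x_i > x_0 \\
\tau_3 &= \frac{\lambda_3}{\lambda_2}, \qquad \tau_4 = \frac{\lambda_4}{\lambda_2} \\
\lambda_1 &= \alpha_0 \\
\lambda_2 &= \alpha_0 - 2\alpha_1 \\
\lambda_3 &= \alpha_0 - 6\alpha_1 + 6\alpha_2 \\
\lambda_4 &= \alpha_0 - 12\alpha_1 + 30\alpha_2 - 20\alpha_3 \\
\alpha_s &= \frac{1}{N}\sum_{i=1}^{N} \left(1 - F_i\right)^{s} x_i \\
F_i &= \frac{i - 0.35}{N}
\end{align}
```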
We found that the relationship between the values of τ3 and τ4 before and after the quality control process was approximately linear, and noticeable changes were observed in only a few series (Figure 7). This provides evidence that the quality control process did not significantly affect the statistical characteristics of the extremes, with the exception of a few observatories that had greater differences from the surrounding series.
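The before-and-after comparison of τ3 and τ4 could be sketched as follows. The sketch assumes α-type PWMs and the plotting position Fi = (i − 0.35)/N; these conventions and the function name are assumptions for illustration, and the two samples are invented.

```python
from typing import List, Tuple


def l_moment_ratios(sample: List[float]) -> Tuple[float, float]:
    """Return (tau3, tau4), the L-coefficients of skewness and kurtosis,
    estimated from probability-weighted moments (assumed alpha-type)."""
    x = sorted(sample)
    n = len(x)
    if n < 4:
        raise ValueError("at least four values are needed")
    # Empirical frequencies for ranks i = 1..N (assumed plotting position).
    f = [(i - 0.35) / n for i in range(1, n + 1)]
    # PWMs of order s = 0..3.
    alpha = [sum((1.0 - fi) ** s * xi for fi, xi in zip(f, x)) / n for s in range(4)]
    lam2 = alpha[0] - 2 * alpha[1]
    lam3 = alpha[0] - 6 * alpha[1] + 6 * alpha[2]
    lam4 = alpha[0] - 12 * alpha[1] + 30 * alpha[2] - 20 * alpha[3]
    if lam2 == 0:
        raise ValueError("second L-moment is zero")
    return lam3 / lam2, lam4 / lam2


# Hypothetical PDS before quality control (with one suspicious extreme)
# and after quality control (extreme replaced).
before = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
after = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
tau3_before, tau4_before = l_moment_ratios(before)
tau3_after, tau4_after = l_moment_ratios(after)
```

Plotting the pairs (before, after) for every series gives the kind of scatter described above: points close to the 1:1 line indicate series whose extremes were essentially unaffected.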
3.3. Homogeneity testing
A common problem in climate data series is the presence of inhomogeneities. The majority of these appear as abrupt changes in the average values, but also appear as changes in the trend of the series (Alexandersson and Moberg, 1997). Inhomogeneities in climate series can result in substantial misinterpretation of the behaviour and evolution of climate. Inhomogeneities can arise from human causes such as changes in the location of the observation station, alteration of the surrounding environment, observer changes and instrument replacement (Karl and Williams, 1987). Accumulation of daily precipitation over several days is another important problem that can introduce inhomogeneity into daily precipitation series (Viney and Bates, 2004), and the reconstruction of time series through the union of two or more series (the approach followed in this study) is a common source of inhomogeneities (Lanzante, 1996; Peterson et al., 1998). If a series is identified as non-homogeneous, use of the data for trend and variability analysis becomes questionable, and it is usually discarded.
A variety of methods have been developed to identify inhomogeneities in climate data series (see the reviews in Peterson et al., 1998 and Beaulieu et al., 2007). There are two general types of homogenization procedure: 1) absolute, which considers only the information in the time series being tested, and 2) relative, in which data from other observatories are also used. The latter procedure is more reliable as it involves comparison of the temporal evolution of a candidate series with that of a reference series created from correlated series nearby.
The majority of methods are focused on monthly, seasonal and annual data (Peterson et al., 1998). There is no standard approach for daily precipitation series because of the high spatial and temporal variability of this variable, and the difficulties in correcting the series if inhomogeneities are found. For this reason, homogeneity tests applied to daily precipitation series can only identify temporal inhomogeneities, enabling elimination of the periods and/or series that are not homogeneous.
Given the lack of methods for directly testing the homogeneity of daily precipitation series, the most common approach is to apply the techniques used for monthly precipitation series, after transformation of the daily series to a monthly equivalent (e.g. Brunetti et al., 2000; Feng et al., 2004; Lana et al., 2004; Schmidli and Frei, 2005; Tolika et al., 2007).
This approach is valid only if the volume of precipitation is analysed. Inhomogeneities in daily precipitation can be much more complex, as they can also affect other parameters. For example, attributing multi-day rainfall accumulations to a single day is a common problem in daily data series (Viney and Bates, 2004). This practice reduces the number of rainy days and increases the average precipitation per rainy day, and may cause significant changes in the recorded frequency distribution of daily precipitation series, and the length of dry and wet spells. For series with many multi-day accumulations, Suppiah and Hennessy (1996) found an effect on temporal trends in percentiles when accumulations were either distributed or ignored. Changes in the observation protocol (e.g. through a change of observer) can produce inhomogeneities in the frequency of rainy days without causing an inhomogeneity in the monthly precipitation record. Therefore, there is a need to test the precipitation volume, and also the precipitation frequency and intensity. Wijngaard et al. (2003) tested the homogeneity of daily precipitation records by means of wet day count series rather than precipitation amounts. They argued that wet day counts have lower variability than series comprising annual amounts, and hence the former facilitate easier detection of inhomogeneities.
In this study we tested the homogeneity of the reconstructed and quality-controlled daily precipitation series using four monthly parameters: 1) monthly precipitation amount, 2) monthly average number of rainy days (days with precipitation above 1 mm), 3) monthly maximum precipitation, and 4) number of days above the 99.5th percentile. Following Wijngaard et al. (2003), we adopted a 1-mm threshold for the second criterion because using a lower threshold (e.g. any precipitation) usually leads to a high rate of false inhomogeneities, caused solely by errors in measuring very low amounts. Calculation of the fourth criterion at a monthly time scale would have yielded a sequence of mostly zeros, but as we describe below, the homogeneity testing was performed seasonally and annually. As homogeneity testing was performed using averages of long time periods, this approach helped to identify changes in the frequency of the most extreme precipitation events, which could have been due to inhomogeneities in the dataset.
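The derivation of the four parameters from a daily series can be sketched as follows. The data layout (a list of date and amount pairs), the function name and the dictionary keys are assumptions for illustration. The extreme threshold is passed in as a number, because how the 99.5th percentile of each series is obtained is not part of this sketch.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple


def monthly_parameters(
    daily: List[Tuple[date, float]],
    extreme_threshold: float,
    wet_threshold: float = 1.0,
) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Aggregate a daily precipitation series (mm) into the four monthly
    parameters, keyed by (year, month)."""
    groups: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for day, amount in daily:
        groups[(day.year, day.month)].append(amount)

    result = {}
    for key in sorted(groups):
        amounts = groups[key]
        result[key] = {
            "total": sum(amounts),                                   # 1) precipitation amount
            "wet_days": sum(1 for a in amounts if a > wet_threshold),  # 2) days above 1 mm
            "maximum": max(amounts),                                 # 3) monthly maximum
            "extreme_days": sum(1 for a in amounts if a > extreme_threshold),  # 4) extremes
        }
    return result


# Usage with a few hypothetical days and a hypothetical extreme threshold of 50 mm.
sample = [
    (date(2000, 1, 1), 0.0),
    (date(2000, 1, 2), 0.5),
    (date(2000, 1, 3), 12.0),
    (date(2000, 2, 1), 80.0),
]
params = monthly_parameters(sample, extreme_threshold=50.0)
```

The monthly values would then be aggregated to seasonal and annual series before testing, since the count of extreme days is mostly zero at the monthly scale.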
Of the several methods for detecting inhomogeneities in climate series, we used the standard normal homogeneity test (SNHT) developed by Alexandersson (1986) for single breaks. This is the most widely used test for detecting inhomogeneities in climate series (e.g. Keiser and Griffiths, 1997; Moberg and Bergstrom, 1997; González-Hidalgo et al., 2002). Various comparative studies of homogenization methods have shown that this method performs better than other approaches, and facilitates detection of small breaks and multiple breaks in a series (Easterling and Peterson, 1992; Ducré-Robitaille et al., 2003).
As the reliability of inhomogeneity detection increases through the use of relative homogeneity methods based on information from neighbouring stations, we calculated reference series for each observatory. Although a single neighbouring series of good quality can be used as a reference (Keiser and Griffiths, 1997), it is very difficult to ensure that a series to be used as a reference for other series is completely homogeneous. In this study we used the approach of Peterson and Easterling (1994), as modified by González-Hidalgo et al. (2004), which uses several neighbouring stations to create a reference series for each of the four parameters analysed. The probability of inhomogeneities is therefore minimized, since all the series are considered as a whole.
To create the reference series we considered all the observatories within a radius of 50 km from the candidate observatory, according to:
where PR,i is the observation for the reference series in month i, Px,i is the observation at observatory x in month i, and wx is a weighting factor. Peterson and Easterling (1994) used the coefficient of correlation between the candidate series and each surrounding series as the weighting factor. However, they considered that the presence of discontinuities in the series could alter the coefficients of correlation, so they calculated the correlation from the series of differences according to:
where D is the difference between the two series for month i, and P is the observation corresponding to month i.
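The construction of a reference series can be sketched as follows. Three points are assumptions made for illustration: the reference value is taken as a weighted mean of the neighbouring observations (sum of wx·Px,i divided by the sum of wx), the difference series is taken as first differences in time (Di = Pi+1 − Pi), and each observatory is represented as a mapping from calendar month (1 to 12) to its series of values for that month. All function names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Optional


def first_differences(series: List[float]) -> List[float]:
    """Differences between consecutive values (assumed form of the D series)."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def neighbour_weight(
    candidate: Dict[int, List[float]],
    neighbour: Dict[int, List[float]],
    min_r: float = 0.6,
) -> Optional[float]:
    """Mean of the 12 monthly correlations between difference series,
    or None if any month falls below `min_r` (neighbour discarded)."""
    correlations = []
    for month in range(1, 13):
        r = pearson(first_differences(candidate[month]), first_differences(neighbour[month]))
        if r < min_r:
            return None
        correlations.append(r)
    return sum(correlations) / len(correlations)


def reference_series(neighbours: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted mean of the neighbouring series at each time step."""
    total = sum(weights)
    length = len(neighbours[0])
    return [
        sum(w * s[i] for w, s in zip(weights, neighbours)) / total
        for i in range(length)
    ]
```

Neighbours would first be restricted to those within 50 km of the candidate; that distance filter is omitted here for brevity.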
Correlations were calculated using monthly precipitation series; hence, 12 coefficients of correlation were obtained for each observatory. We discarded those observatories with any month having a correlation coefficient lower than 0.6. Finally, the weighting factor used for each observatory was the average of the correlation coefficients obtained for the 12 monthly series. The ProClimDB software (Štepánek, 2007a) was used to automate calculation of the 3736 reference series (4 for each observatory).
The AnClim software (Štepánek, 2007b) was used in the application of the SNHT to each observatory and parameter. The test was applied to seasonal and annual series of the four parameters, since this approach yields better results than using only monthly series. For each seasonal and annual series a T series was obtained using the SNHT. If the value of T at any point in the series exceeded a critical threshold, the series was flagged as inhomogeneous. The critical T value depends on the chosen significance level (α), and in this study a value of α = 0.05 was used (see the critical values in Alexandersson and Moberg, 1997). As a consequence of the substantial length of some climate series, some short inhomogeneous periods could be hidden after testing. To avoid this problem a sequential splitting procedure was applied after each 30 years of data, to detect short inhomogeneous periods (Štepánek, 2004).
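A minimal sketch of the single-break SNHT statistic is given below. It is not the AnClim implementation. It assumes the ratio form of the test commonly used for precipitation (candidate divided by reference, with strictly positive reference values) and the sample standard deviation for standardization. The critical value is supplied by the user from published tables rather than hard-coded; the value used in the usage lines is purely hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def snht(candidate: List[float], reference: List[float]) -> Tuple[float, int, List[float]]:
    """Return (T0, k, T) where T[k-1] = k*mean(z[:k])**2 + (n-k)*mean(z[k:])**2,
    T0 is the maximum of T and k is the number of values before the
    most probable break."""
    ratios = [c / r for c, r in zip(candidate, reference)]
    q_mean, q_sd = mean(ratios), stdev(ratios)
    z = [(q - q_mean) / q_sd for q in ratios]
    n = len(z)
    t_series = []
    for k in range(1, n):
        z1 = mean(z[:k])   # mean of standardized ratios before the candidate break
        z2 = mean(z[k:])   # mean of standardized ratios after it
        t_series.append(k * z1 ** 2 + (n - k) * z2 ** 2)
    t0 = max(t_series)
    k_break = t_series.index(t0) + 1
    return t0, k_break, t_series


def is_inhomogeneous(t0: float, critical_value: float) -> bool:
    """Flag the series when the maximum T exceeds the tabulated critical value."""
    return t0 > critical_value


# Usage: a hypothetical annual series with a 50% step after the tenth year.
candidate = [100.0] * 10 + [150.0] * 10
reference = [100.0] * 20
t0, k_break, t_series = snht(candidate, reference)
flagged = is_inhomogeneous(t0, critical_value=9.0)  # hypothetical critical value
```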
As a consequence of the large quantity of information, we established an automatic criterion to accept or reject inhomogeneities. Flagged inhomogeneities were accepted only when they appeared in the annual series and a minimum of two seasonal series. Since the temporal location of the inhomogeneities can vary within a range of some years, a maximum difference of eight years was allowed between inhomogeneities found in the annual and seasonal series. Those data series with two or more inhomogeneities were removed from the dataset. In series in which one unique inhomogeneity was found, the period prior to the inhomogeneity was also removed, as correcting inhomogeneous periods in daily precipitation records is exceedingly difficult. We preferred to lose some information but retain the remaining high-quality data for subsequent climatic studies.
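The automatic acceptance rule can be sketched as follows. The function names and the data layout (a mapping from season name to detected break year, or None when no break was flagged) are assumptions for illustration, as is the choice to keep the break year itself in the retained period.

```python
from typing import Dict, List, Optional


def accept_break(
    annual_break: Optional[int],
    seasonal_breaks: Dict[str, Optional[int]],
    max_gap_years: int = 8,
    min_seasons: int = 2,
) -> bool:
    """Accept a break only if it appears in the annual series and in at least
    `min_seasons` seasonal series within `max_gap_years` of the annual break."""
    if annual_break is None:
        return False
    matching = [
        season
        for season, year in seasonal_breaks.items()
        if year is not None and abs(year - annual_break) <= max_gap_years
    ]
    return len(matching) >= min_seasons


def retained_start_year(accepted_breaks: List[int], first_year: int) -> Optional[int]:
    """None means the whole series is discarded (two or more breaks);
    otherwise the first year of the period that is kept."""
    if len(accepted_breaks) >= 2:
        return None
    if len(accepted_breaks) == 1:
        return accepted_breaks[0]  # assumption: period from the break year onwards is kept
    return first_year


# Usage with hypothetical detections for one observatory.
seasons = {"winter": 1970, "spring": None, "summer": 1985, "autumn": 1961}
accepted = accept_break(1968, seasons)
start = retained_start_year([1968] if accepted else [], first_year=1940)
```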
As explained above, we tested for series homogeneity using four variables. Firstly, we tested the homogeneity in the series of precipitation amounts. After removing the inhomogeneous series and periods, the remaining series were tested for inhomogeneities in the number of rainy days, and subsequently for inhomogeneities in the maximum values and the number of events above the 99.5th percentile. A total of 260 inhomogeneous series were found using monthly precipitation amounts, and 74 of these were discarded because they contained two or more inhomogeneities (Table I). A total of 157 inhomogeneous series were found at the second step, and 32 of these were discarded. Finally, 25 inhomogeneities were found in the extreme series, but no series were discarded. At the completion of the entire process, 407 series were found to be inhomogeneous, corresponding to 43.6% of the total series. For 301 series, the period prior to the inhomogeneity was deleted, and 106 series were completely discarded because they had two or more temporal inhomogeneities.
We also analysed the impact of data reconstruction on the homogeneity of the series, to determine if inhomogeneities were introduced in the process of reconstruction and data filling using data from nearest neighbours. Table I shows the number of inhomogeneous series detected in series with more than 20 years of reconstructed data, stations with a reconstructed period of 5–20 years, and series having less than 5 years with data gaps. Of the 383 stations with reconstructions exceeding 20 years, 192 were inhomogeneous. Of the 229 stations with reconstructions of 5–20 years, 113 were inhomogeneous, and of the 322 non-reconstructed series filled only for periods less than 5 years, 137 series were inhomogeneous. As expected, a larger percentage of inhomogeneous series was found for long reconstructions (50.1% of the total) than for short reconstructions. However, there were no large differences between the series with reconstructions shorter than 20 years and the non-reconstructed series (49.3 and 42.5% of the total, respectively). This result indicates that the data gap-filling process can introduce inhomogeneities to the series. Nevertheless, with careful selection of neighbours using restrictive distance and correlation criteria, the number of artificial inhomogeneities added during the process could be minimized. In any case, these inhomogeneities were identified and eliminated during the four-step homogeneity testing.
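Table I. Results of homogeneity testing. The number and percentage of inhomogeneous series for reconstructed and non-reconstructed series are also shown
Parameter tested                                                                      Number of inhomogeneities   Series discarded
Monthly precipitation amount                                                          260                         74
Number of days with precipitation > 1 mm                                              157                         32
Monthly maxima and number of days with precipitation above the 99.5th percentile      25                          0
Reconstructed period     Number of series   Number of inhomogeneities   Percentage
> 20 years               383                192                         50.1
> 5 years < 20 years     229                113                         49.3
< 5 years                322                137                         42.5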
The time evolution of the number of inhomogeneities showed an irregular distribution (Figure 8). A higher number of inhomogeneities was found in the decades of the 1920s and 1930s, and also in the decade of the 1960s. Nevertheless, with the exception of the year 1968, the number of inhomogeneities per year from 1960 to 1990 was very regular, affecting 1–2% of the available series.
In general, we found that for some significant inhomogeneities detected in monthly totals in some years it was very difficult to decide whether the temporal inhomogeneity in the series was real, and to which year it should be attributed. However, if seasonal and annual data were incorporated, inhomogeneities were more apparent. Therefore, we used the daily precipitation series aggregated in seasons and years to identify temporal inhomogeneities, which could be recognized in the variation of the T-values (shown in the example in Figure 9). Figure 10 shows the results of applying the SNHT to the l'Ametlla de Mar series using the seasonal and annual precipitation amount series, and also the series of number of rainy days. This example shows the situation found in some observatories, whereby no inhomogeneities were found in the series of precipitation amount, but significant inhomogeneities were found in the series of the number of rainy days (the middle of the 1970s in this example). Examination of the time evolution of the series and the associated T statistic did not provide any extra information. Comparison with the reference series clearly showed an accumulation of the same precipitation amounts in fewer days during the period 1950–1974. After this period the number of rainy days was again very similar to the reference series. This error is explained by the attribution of several days of rainfall to a single day, which could affect the frequency distribution of precipitation events, the average precipitation per event, and the duration of dry and wet spells, as discussed earlier. Therefore, the inhomogeneous period prior to the detected break was removed from the final dataset. This inhomogeneity would not have been identified if only monthly amounts had been used.
A final example shows a case in which an inhomogeneity was found in the series of monthly maxima for 1961 (Figure 11). However, it was not detected in the series of precipitation amount or number of rainy days, and was attributed to errors in the rain gauge recording of the most extreme events.
In this section, we present some results of comparisons of the characteristics of the database at various stages of the process. The usefulness of the database for several types of climate analysis is discussed. Some basic analyses were also performed on the spatial distribution of precipitation in the study area to illustrate the improved performance of the final database relative to earlier stages.
As a trade-off for establishing strict quality control criteria, the spatial density of data was reduced (Figure 12). Of the original 3106 series, 828 were retained in the final database, comprehensively covering the study area. Some areas had lower data density, including the Pyrenean Range in the Lérida Province, the north of the Castellón Province, and the central areas of the Burgos Province.
The availability of data in the database varied greatly as a function of the length of the data series (Figure 13). For example, after the reconstruction process, a total of 207 series starting in 1940 were available. After homogeneity testing, this was reduced to 117 series (i.e. 56%). More importantly, there was a decrease from 471 to 291 series in the 1960s. Although the amount of data discarded in the homogenization process was large, the remaining series were adequate in terms of the quality and temporal homogeneity of the database. The spatial coverage of the database was also affected by variation in the length of the series (Figure 14). Although the number of series with records since 1920 was very satisfactory (34), the majority of these were located in the Cataluña (24) and Aragón (8) regions. For series with data since 1935 (92), there was an increase in the number of provinces represented, but the data density remained higher in the Cataluña region. This situation reduces the usefulness of the database for reliable spatial analysis, although long-term studies focused in the regions with good data density can still be undertaken. The number of series available from 1950 increased noticeably to 190, providing good coverage of the study area at adequate density. This temporal coverage is sufficient for trend analysis, considering the spatial variation in this factor. From 1965, a total of 331 series was available, providing very good data density with few spatial gaps. This high spatial density enables detailed spatial analysis of precipitation fields, and the recording period is long enough for extreme events analysis.
To assess how the quality control and homogeneity-testing procedures affected the final database, we undertook several analyses using the series available between 1960 and 2000. The variables analysed were: 1) average precipitation per rainy day, 2) average duration of wet spells, 3) average duration of dry spells, 4) number of days with precipitation > 0 mm, and 5) number of days with precipitation greater than 75 mm (extreme events). Preliminary statistical analysis showed that, in general, the range and variability of the precipitation parameters analysed were reduced after the quality control process and homogeneity test, but the average values remained similar. This does not necessarily mean that the spatial average over large areas is unbiased. However, among close observatories the precipitation differences (over a few days or long periods) caused by erroneous measurements or inhomogeneities were removed during the process, making the data more consistent. The procedure avoids spurious results in temporal analysis of precipitation records, but can also reveal spurious differences among neighbouring observatories in future climate studies.
We performed a spatial analysis of the dataset at various steps in the process to assess their influence on spatial coherence of the precipitation parameters. For this purpose, we used semi-variance plots from daily precipitation series, which represent the spatial autocorrelation of a variable (Figure 15). The semi-variance is half the variance of the differences between all pairs of points separated by a given distance lag. It is normally assumed that the semi-variance increases with increasing distance lag, in accordance with the basic geostatistical principle that closer objects are more similar. We additionally considered that high-quality series will have a lower semi-variance than poor-quality series, especially over shorter distance lags. Both hypotheses were confirmed in our test.
For all precipitation parameters the semi-variance increased as a function of the distance lag, and the homogeneous database consistently had a lower semi-variance, especially with distance lags less than 25 km. However, the difference among databases was less evident for some precipitation parameters, including the dry spell duration. The results of this analysis support the view that the spatial coherence of the homogeneous dataset is better than in the original and intermediate stages.
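The empirical semi-variance used in this comparison can be sketched as follows. The sketch uses the classical estimator (half the mean squared difference between pairs falling within each distance bin) and assumes planar station coordinates in kilometres. The function name, the binning scheme and the station values are assumptions for illustration.

```python
from itertools import combinations
from math import hypot
from typing import List, Optional, Tuple

Station = Tuple[float, float, float]  # (x_km, y_km, value of the precipitation parameter)


def empirical_semivariance(
    stations: List[Station], lag_width_km: float, n_lags: int
) -> List[Tuple[float, Optional[float], int]]:
    """Return one (lag_centre_km, semivariance, n_pairs) tuple per distance bin.
    The semivariance is None for bins that contain no station pairs."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for (x1, y1, v1), (x2, y2, v2) in combinations(stations, 2):
        distance = hypot(x2 - x1, y2 - y1)
        lag_bin = int(distance // lag_width_km)
        if lag_bin < n_lags:
            sums[lag_bin] += (v1 - v2) ** 2
            counts[lag_bin] += 1
    return [
        (
            (b + 0.5) * lag_width_km,
            sums[b] / (2 * counts[b]) if counts[b] else None,
            counts[b],
        )
        for b in range(n_lags)
    ]


# Usage with three hypothetical stations and 25-km lag bins.
stations = [(0.0, 0.0, 1.0), (3.0, 4.0, 3.0), (30.0, 0.0, 7.0)]
variogram = empirical_semivariance(stations, lag_width_km=25.0, n_lags=2)
```

Computing this curve for the original, reconstructed and homogeneous versions of the database and overlaying the results gives the kind of comparison described above.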
This was also evident when correlation coefficients were calculated between pairs of daily precipitation observatories separated by various distances (spatial correlogram; Figure 16). The relationship among neighbouring series was noticeably improved after the homogeneity-testing process, as shown by higher r-coefficients. This was a consequence of removal of those data series that differed markedly from adjacent series. Therefore, we believe that the quality control and homogeneity-testing processes described in this paper improved the quality and spatial coherence of the dataset, especially in relation to future spatial and temporal regional analyses.
We have described a process for creating a spatially dense database of daily precipitation in northeast Spain. The main contribution of the research relates to effective use of available data through a combined process of reconstruction, quality control and homogeneity testing. This is significant as very few examples exist of the construction of quality-controlled databases with daily time resolution and regional coverage. The usual approach has been to select long-term and reliable series, and to discard fragmented or short-term data. This results in a significant loss of information, and detrimentally affects the spatial density of the data, reducing the usefulness of the dataset for spatially explicit analyses. Our approach involved reconstruction of spatially close data series to generate a new and unique series. This substantially reduced the loss of data involved in the alternative procedure, but introduced the risk of creating spurious and inconsistent data series. To ensure the quality of the final database, the reconstructed series were subjected to quality control and homogeneity-testing processes consisting of several stages. Analyses of the final database included single site and spatial analyses and confirmed its coherence.
The methodology described in this paper can be readily adapted for use worldwide with other databases having similar characteristics (high spatial density, daily temporal resolution), particularly in areas where long-term precipitation series are rare and fragmented series covering different periods in the same locality are common, as in Spain. Although the methodology involves several steps, the data filling and homogeneity-testing processes are completely automated, and can be adapted to other regions without modification. For quality control purposes, some decisions must be made in advance, such as the optimum threshold values. It is likely that the values used in this work would yield good results with other datasets, but this should be assessed by users through a heuristic process. The only manual procedure in creating the database was during selection of the neighbouring series to be merged in the reconstruction process. Although an automated process could have been used, with the aid of a geographical information system, we chose to oversee this crucial step directly. The experience of the researcher and a good knowledge of the regional climate could be very useful in developing an automated process, based on the distance between observatories and using other auxiliary information layers, such as elevation. In this paper, we have detailed the decisions taken to construct a quality dataset in relation to problems which are typically encountered. It should not be assumed that all the decisions made have universal validity, as it is possible that some of the threshold values or the procedures adopted will need to be modified for application to other datasets, or according to the specific purposes of the research. We encourage other researchers to make similar assessments prior to undertaking any climate study, and we urge discussion of the specific problems associated with the use of high-frequency climate databases.
The database described in this paper is available for use by scientists for research purposes. Anyone interested in using the data is encouraged to contact the authors at the following e-mail addresses: firstname.lastname@example.org and email@example.com.
We would like to thank the Spanish Meteorological State Agency (AEMET) for providing the precipitation database used in this study. This work has been supported by the following projects: CGL2005-04508/BOS (financed by the Spanish Commission of Science and Technology and FEDER), PIP176/2005 (financed by the Aragón Government), and ‘Programa de grupos de investigación consolidados’ (BOA 48 of 20-04-2005), also financed by the Aragón Government. The authors would like to thank José C. González-Hidalgo and José M. García-Ruiz for their helpful comments.