This study was supported by the European Union EURO4M project (FP7-EC Cooperation Theme 9, SPACE, grant no. 242093).
Historical climatic data from station observations taken in North African and Middle East Mediterranean countries since the second half of the 19th century have been digitized and quality-controlled in the framework of the EU-funded European Reanalysis and Observations for Monitoring (EURO4M) project. Daily maximum and minimum temperatures and precipitation totals, along with sub-daily data for surface air pressure, have been recovered from historical data sources, mainly book and logbook collections archived in national and international data centres. The new dataset comprises climatic time series for 79 stations that have operated in southern and eastern Mediterranean countries. Although the time series still contain data gaps, every effort has been made to fill them in order to improve assessments of long-term changes in climate variability in the region.
Creator: Centre for Climate Change (C3), Department of Geography, University Rovira i Virgili
Title: C3-EURO4M-MEDARE Mediterranean historical climate data
Authors: Centre for Climate Change/URV
Publication year: 2013
Resource type: Book
A better understanding of the physical mechanisms of the Mediterranean's climate variability is crucial for developing advanced projections of future climate. To achieve this, basin-wide knowledge of historical climate variations, at high temporal resolution and over long time scales, is required to assess climate model simulations. Such knowledge requires long-term and high-quality climate time series, whose current availability is uneven in the Mediterranean: the northern-basin countries (belonging to Europe) enjoy good data coverage, whereas the southern part (North Africa and Middle East) is a data-sparse region (Brunet et al., 2013). In addition, station-based data are needed as input to more accurate reanalysis and gridded datasets. The southern data paucity is not the result of a lack of measurements, since meteorological observations have been taken since the mid-19th century through former colonial endeavours. Although deserts dominate the area, individual stations or networks were deployed in the populated parts of the southern and eastern Mediterranean, and the meteorological records taken were often published in periodical publications.
In this context, the EURO4M project (http://www.euro4m.eu/index.html), in connection with the World Meteorological Organization (WMO) MEditerranean DAta REscue initiative (MEDARE: http://www.omm.urv.cat/MEDARE/), has set, among other objectives, the recovery of historical climate data from North African and Middle Eastern Mediterranean countries, namely Morocco and the Spanish enclaves, Algeria, Tunisia, Libya, Egypt, Cyprus, Lebanon and Syria. This data rescue (DARE) effort has been carried out in coordination with other relevant DARE initiatives and projects to avoid duplication and maximize resources. These include the French historical climate and weather observations rescue project entitled Access to climate Archives despite Asbestos (AAA; Jourdain & Dandin, 2011), the international Atmospheric Circulation Reconstructions over the Earth initiative (ACRE; http://www.met-acre.org/Home; Allan et al., 2011) and the European ReAnalysis of Global CLIMate Observations project (ERA-CLIM; http://www.era-clim.eu/).
The chosen climatic variables are daily minimum (TN) and maximum (TX) air temperature, daily precipitation total (RR) and sub-daily air pressure (PP), all observed at meteorological stations that have operated in these countries. For air pressure, especially in the historical records, the values often represent pressure adjusted to sea level (SLP). Although the period of interest was the pre-1950 period, the spatial and temporal span of the data recovered was ultimately dictated by the data sources located and accessed, which are described in section 'Data sources used and rationale for meteorological station selection'. Details on the quality controls (QC) to which the digitized data were subjected are given in section 'Data digitization and quality control', while section 'Dataset structure and future prospects' concludes by outlining the structure of the new dataset and providing some notes on future prospects.
1 Data sources used and rationale for meteorological station selection
The data sources used were sought in worldwide online repositories and in national archives holding collections of historical climate data documents, such as meteorological logbooks, yearbooks or weather charts. In most cases, these data sources are series of scanned volumes containing data at different time scales, either for the station network of a country (often including data from adjacent countries) or for a specific observatory or station. Before the Second World War, they were published mainly by French, British and Italian colonial authorities, whereas since independence national authorities have administered the meteorological services and organized the respective data publications. These publications are secondary data sources, since they are transcriptions of original meteorological logbooks gathered from various stations. They have the advantage of having passed a data quality screening (as indicated by comments found next to the values and by monthly summaries of data corrections), but they may also include transcription errors introduced during the transfer from the original to the secondary source.
Most of the data sources used were located in the online repository of the Central Library of the US National Oceanic and Atmospheric Administration (NOAA) which comprises digital (scanned) versions of many meteorological data collections from all over the world (developed in the framework of NOAA/NCDC Climate Database Modernization Program, 2000–2011; http://docs.lib.noaa.gov/rescue/data_rescue_home.html). Climatological departments of other national meteorological agencies also provided digital and scanned data documents from their archives. Météo-France provided Tunisian daily data series digitized in the framework of the CIRCE project (http://www.circeproject.eu), and also scanned copies of French data publications. The UK Met Office made available scanned copies of British colonial-era data collections (ACRE initiative) through the British Atmospheric Data Centre (BADC; http://badc.nerc.ac.uk/browse/badc/corral/images/metobs). The Libyan National Meteorological Center (LNMC) made available data catalogues for various stations from the country. The Spanish meteorological agency (AEMet) provided scanned copies of bulletins including data for stations in North Africa. Finally, at the library of the Ebro Observatory (Tortosa, Spain), supplementary data books were located which filled in data gaps within the overall climatic data series being recovered. Table 1 provides a list of the various climatic data books/collections used from all these data centres for recovering Mediterranean historical climate data. NOAA's Central Library was the main data source for the dataset development (71% of the total data), whereas the imaged data acquired from Météo-France (11%), LNMC (11%), UK Met Office (3%), AEMet (2%) and Ebro Observatory (2%) also played an important role, especially for specific countries and stations.
Table 1. Collections of climatic data sources used
Annales du Bureau Central Météorologique de France
Algeria, Egypt, Lebanon, Tunisia
NOAA, Météo–France, Ebro observatory
Annales de l'Observatoire de Ksara
Annuaire de la Société Météorologique de France
American Univ. (Syrian Protestant College) – Lee Observatory. Beirut
Bulletin Climatologique Mensuel du Liban
Bulletin Météorologique de l'Algérie
Algeria, Morocco, Tunisia
Bolletino Meteorologico della Cirenaica
Bulletin de Météorologique du Maroc
Algeria, Morocco, Spain
Bollettino Meteorologico dell'Africa Italiana
Boletín Meteorológico Diario de España
Cairo. Meteorological Reports
CIRCE-project digital data files
Egypt. Daily Weather Reports
Helwan Observatory Meteorological Reports
Libyan National Meteorological Center Archives
Monthly Climatological Data. Syria
Service Météorologique de Tunis
UK Climatological Returns
UK Met Office
UK Daily Weather Reports
UK Met Office
For each EURO4M/MEDARE-targeted country, the meteorological stations selected for data digitization followed this rationale:
Stations that have the longest and most complete historical records, either on their own or in combination with other records from different sources.
Stations for which there is a potential of merging their data with digitized series existing in climatic national and international databanks (spanning recent and current decades) and, therefore, may lead to the development of long-term climate time series.
Stations that form a network covering the Mediterranean part of each country, i.e. within a zone extending no more than ~200 km from the coastline (only a few exceptions were made, the most prominent being the remote El Golea station in the Algerian Sahara), and having a roughly even spatial distribution.
The 79 stations selected are listed in Table 2, while the location of their sites is shown in Figure 1.
Table 2. List of stations, climatic variables and data periods recovered
TX, daily maximum temperature; TN, daily minimum temperature; RR, daily precipitation amount; PP, sub-daily air pressure observations.
2 Data digitization and quality control
Using the data sources mentioned above, the data for the selected stations were key-entered with special care. The varying quality of the handwritten or typed data pages and of their scanned copies posed many difficulties during digitization: scanned pages were sometimes too dark or too faded, which affected the readability not only of the meteorological data but also of their corresponding dates. Date identification was therefore crucial and time-consuming, since the data books used contain missing data pages, double or triple copies of the same page, and deviations from ascending chronological page order. All these cases were potential sources of error in the digitized data files; without visual cross-checking, such mistakes could not have been avoided and could have introduced non-systematic biases, compromising the reliability of the data for future applications.
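The page-order problems described above can also be screened for once the dates have been key-entered. The following Python sketch is only an illustration and was not part of the procedure reported here; the function name `check_date_sequence` and the structure of its output are assumptions made for the example. It walks through the dates in key-entry order and reports repeated dates (possible duplicate pages), dates earlier than one already entered (pages out of order) and skipped days (possible missing pages).

```python
from datetime import date, timedelta


def check_date_sequence(dates):
    """Report duplicated, out-of-order and skipped dates, in key-entry order."""
    issues = {"duplicates": [], "out_of_order": [], "gaps": []}
    seen = set()
    latest = None  # latest date encountered so far
    for current in dates:
        if current in seen:
            # Same date entered twice: possibly a page scanned more than once.
            issues["duplicates"].append(current)
        elif latest is not None and current < latest:
            # Earlier than a date already entered: pages not in ascending order.
            issues["out_of_order"].append(current)
        elif latest is not None and current - latest > timedelta(days=1):
            # One or more days skipped: record first and last skipped day.
            issues["gaps"].append(
                (latest + timedelta(days=1), current - timedelta(days=1))
            )
        seen.add(current)
        if latest is None or current > latest:
            latest = current
    return issues
```

A reported gap is only a prompt to return to the scanned pages: the skipped days may be genuinely missing from the source or may appear later out of order.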
Data QC was the next step and comprised three stages:
Visual cross-comparison between the data source and the digitized data to verify the fidelity of digitization (transcription accuracy): sample data were examined across the overall data period to check if the correct station was indeed used (especially in the case of multi-station data pages), if the dates were correctly assigned and if the targeted climatic variables were correctly transcribed.
Automatic QC to identify non-systematic errors in the time series: the RClimDex software package (Zhang & Yang, 2004), reinforced with the ‘extraQC’ software (Aguilar & Prohom, 2011), was employed to identify potential temperature and precipitation data errors. The latter tool is an improved version of the standard ‘RClimDex’ software and performs a series of additional tests to further ensure internal consistency (e.g. consecutive identical values and rounded values) and temporal coherency (large inter-daily differences), in addition to the usual gross-error and tolerance tests. Suspicious values were labelled and examined against the data sources to validate or reject them and, accordingly, to either retain them or set them to missing. For air pressure data QC, various statistical tests were developed to identify cases of extremely low or high air pressure records, as well as cases of zero variance (‘consecutive identical values’) or high variance (‘jumps’ or ‘outliers’) in consecutive-day observations (and in consecutive intra-day observations, if available).
Cross-station data checks by plotting, in parallel, data from two or more nearby stations to examine the inter-station consistency and ensure spatial coherency. Digitization and potential data source errors were identified as in the previous stage.
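To illustrate the kind of automatic checks listed in the second stage, the following Python sketch flags gross errors, large inter-daily differences and runs of consecutive identical values in a single daily series. It is not the RClimDex or extraQC code: the function name `flag_series` and all thresholds are assumptions chosen for the example, and only the missing-value code (−99.9) follows the convention of the dataset.

```python
MISSING = -99.9  # missing-value code used in the dataset


def flag_series(values, low=-30.0, high=55.0, max_jump=15.0, max_run=4):
    """Return {index: [reasons]} for suspicious values in a daily series.

    low/high : plausible bounds for the gross-error test (assumed values)
    max_jump : largest accepted difference between consecutive days (assumed)
    max_run  : number of identical consecutive values that triggers a flag
    """
    flags = {}
    previous = None   # last non-missing value, None after a gap
    run_length = 0    # length of the current run of identical values
    for i, value in enumerate(values):
        if value == MISSING:
            previous, run_length = None, 0
            continue
        if not low <= value <= high:
            flags.setdefault(i, []).append("gross_error")
        if previous is not None and abs(value - previous) > max_jump:
            flags.setdefault(i, []).append("jump")
        run_length = run_length + 1 if value == previous else 1
        if run_length >= max_run:
            flags.setdefault(i, []).append("identical_run")
        previous = value
    return flags
```

As in the procedure described above, a flag of this kind only marks a value for inspection against the data source; it does not by itself justify changing the value.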
To deal with potential data source errors, ancillary information was sought in the data books: data from nearby stations, the general weather setting (e.g. cloudiness, rainfall, wind direction and strength, weather charts) and reports of extreme meteorological events. If the information gathered supported the credibility of an unusual or suspicious value, the value was left unchanged. Otherwise, the value was set to missing (−99.9), unless the correct original value could be deduced from the ancillary information using expert judgement. Such corrections were made in certain cases, such as the swapping of TX and TN data, the adjustment of temperature values by multiples of 10°C, and the derivation of the correct pressure value from the isobars drawn on the accompanying weather charts (if available). It should be noted that the data source error correction scheme was a conservative one: changes were made only when the data values appeared to be clearly unrealistic, and replacement values were inserted only when there was strong certainty about them (based on the consultation of ancillary information). Overall, 0.5% of the digitized data were eventually corrected through the multi-stage QC procedure, with ~10% of these corrections corresponding to data source errors (half of which involved substitution with missing values, while for the rest a new, corrected value was introduced, as explained above). A summary of the automatic QC results is provided in Table 3, and their traceability is ensured by the documentation accompanying the dataset, provided as supporting information to this article (C3-EURO4M-MEDARE_documentingQC.txt).
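The conservative decision rule described above can be summarized in a few lines of Python. The sketch below is an illustration only: the function names `resolve_suspect` and `tens_shift_candidates`, the 5°C tolerance and the use of the mean of neighbouring days as a reference are assumptions made for the example, whereas the actual corrections relied on expert judgement and the ancillary information in the data books.

```python
MISSING = -99.9  # missing-value code used in the dataset


def resolve_suspect(value, is_supported, deduced_value=None):
    """Apply the conservative rule to one suspicious value.

    is_supported  : True if ancillary information supports the value
    deduced_value : corrected value deduced from ancillary information, if any
    """
    if is_supported:
        return value, "kept"
    if deduced_value is not None:
        return deduced_value, "corrected"
    return MISSING, "set_missing"


def tens_shift_candidates(value, neighbours, tolerance=5.0, max_tens=3):
    """List value +/- 10k (k = 1..max_tens) lying close to neighbouring days.

    Hypothetical helper: it only proposes candidates for a temperature that
    may be wrong by a multiple of 10 degrees; it does not decide anything.
    """
    valid = [n for n in neighbours if n != MISSING]
    if not valid:
        return []
    reference = sum(valid) / len(valid)
    candidates = []
    for k in range(-max_tens, max_tens + 1):
        shifted = round(value + 10 * k, 1)
        if k != 0 and abs(shifted - reference) <= tolerance:
            candidates.append(shifted)
    return candidates
```

For example, a candidate returned by `tens_shift_candidates` could be passed to `resolve_suspect` as `deduced_value` once it has been confirmed against the source or the ancillary information; without such confirmation the value would be set to missing.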
Table 3. Summary of the results of the quality controls (QC) applied to the daily minimum temperature (TN), daily maximum temperature (TX), daily precipitation (RR) and hourly surface air pressure (PP) series
Data source errors
TN, TX, RR
Figure 2 shows the data volumes recovered per climatic variable and year. For Oran, the station with the earliest recovered data, there are years with data from the 1850s onwards (Figure 3). Several other stations have data series starting in the late 1870s and 1880s, and the yearly amount of data then increases until the mid-1930s. The data recovery is limited over the Second World War period, increases again in the 1950s and is only modest in the 1960s–1970s. Only for one station in Libya, Tripoli (Sidi El Mesri), did the data recovery extend into the 1980s and 2000s (Figure 4). Missing volumes in the data source collections used led to distinct minima in the amount of data for some years within the recovery period (as shown in Figure 2). The right-most column in Table 2 provides the temporal range of the data recovered for every station. Despite the use of multiple data sources to achieve as complete a recovery of station data as possible, the station series still have missing daily data, and certain multi-year gaps exist within their temporal span (see Figures 3 and 4). Much of the missing data is from the last 3–4 decades, and these data have likely already been digitized and are held in National Meteorological Service (NMS) archives. Unfortunately, at present, these daily data series are not freely available.
3 Dataset structure and future prospects
The quality-controlled dataset comprises daily data for the 79 stations selected. It consists of four data files in ASCII format, each including all station time series for one of the climatic variables targeted (TN, TX, RR and PP). Data values in these time series run continuously from the starting year (1852) to the final year (2008) of the data recovery period, even if no data were recovered in some intervals (days, months or years): the missing data value (−99.9) is used for those data gaps. While the minimum/maximum temperature and precipitation data are accompanied by the respective date (year, month, day of month), for air pressure data the observation hour is additionally provided.
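The continuous layout described above, with −99.9 filling every day for which nothing was recovered, can be illustrated with a short Python sketch. The in-memory representation used here (a dictionary mapping dates to values) and the function names `to_continuous` and `yearly_counts` are assumptions made for the example; the actual column layout of the data files is documented in the file accompanying the dataset and is not reproduced here.

```python
from datetime import date, timedelta

MISSING = -99.9  # missing-value code used in the dataset


def to_continuous(records, first_year=1852, last_year=2008):
    """Expand {date: value} into a gap-free list of (date, value) pairs.

    Days without a recovered value receive the missing-value code.
    """
    day = date(first_year, 1, 1)
    end = date(last_year, 12, 31)
    series = []
    while day <= end:
        series.append((day, records.get(day, MISSING)))
        day += timedelta(days=1)
    return series


def yearly_counts(series):
    """Count non-missing daily values per year (years with no data are omitted)."""
    counts = {}
    for day, value in series:
        if value != MISSING:
            counts[day.year] = counts.get(day.year, 0) + 1
    return counts
```

Counting the non-missing values per year in this way gives, for a single station and variable, the kind of yearly data volume that is summarized across stations in Figure 2.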
The dataset is accompanied by a ‘readme’ file with information on the data file format and the station meta-data: station names, approximate WMO codes, geographical coordinates, climatic variables recovered, data period ranges, data sources used, time coordinates of observational times (local or UTC) and the periods with original (unadjusted to sea level) air pressure data.
This new dataset aims to cover a major data gap that has limited our knowledge of long-term climate variability in the southern and eastern Mediterranean regions. Although one of the station records recovered goes back to the mid-19th century, most of the time series start in the late 19th or early 20th century and are far from continuous, with many data gaps remaining to be filled. It is expected that, after merging the time series included in the C3-EURO4M-MEDARE dataset, combining them with additional digital data from other databanks (principally NMS databases, to cover the data gaps in recent decades) and subjecting the temperature and precipitation series to homogenization, the time series will provide an advanced insight into the history of the Mediterranean climate.
The dataset development was funded by the European Union, Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 242093 (European Reanalysis and Observations for Monitoring – EURO4M project). CIRCE was a project funded by the European Union (FP6/2002-2006, no. 036961). Olivier Mestre and Sylvie Jourdain provided the CIRCE Tunisian data. Khalid Ibrahim El Fadli provided the LNMC data for Libya. David Mallol, Clara Lopez, Gisela Ponce, Nilo Nagera, Alberto Fernández, Victor Vidal, Juan Jose Ferreras, Mireia Sánchez, Nolia Tomás, Roger Dobon and Sara Barceló, all of them students at URV, contributed to the dataset development by digitizing data books and performing the initial data quality control.