This paper presents an intercomparison and evaluation of gridded temperature and precipitation data sets, based on observations in the Mediterranean and the Middle East region. Using available global and regional data, we investigate the spatial and seasonal distributions of these two parameters, including uncertainties and trends, for eight subregions that represent distinct climate regimes. All data sets represent the overall spatial features well, though some show biases. Using the seasonal means, standard deviations and cumulative distribution functions for the eight subregions, we identify outliers among the data sets. The correlations between data sets are high except for some regional data products. Desert areas such as Saudi Arabia and Libya-Egypt appear problematic due to their sparse station networks. Similar upward trends in temperature and downward trends in precipitation are found for most of the region in all data sets, while differences appear in their magnitude and level of significance.
The subtropical region encompassing the Mediterranean and the Middle East is considered to be a climate change hot spot, as it is expected that surface temperature will rise relatively rapidly and precipitation will decrease significantly [e.g., Giorgi, 2006; Intergovernmental Panel on Climate Change, 2007; Lelieveld et al., 2012]. In the past decades research on climate change in the Mediterranean has intensified considerably, with much emphasis on southern Europe [e.g., Lionello, 2012]. The Middle Eastern and North African (MENA) part of the region has many climate-related issues in common, such as steep spatial gradients and pronounced orography, water scarcity and hot summers, while climate change impacts must be considered within a diverse societal context. Unlike the northern Mediterranean, most of the MENA countries do not support a dense network of meteorological observations, and access to high-quality and long-term measurement data is limited. In recent years, there have been several studies addressing the MENA [e.g., Aesawy and Hassanean, 1997; Hassanean, 2001, 2004; Zhang et al., 2005; Hassanean and Abdelbasset, 2006; Rehman, 2010; El Kenawy et al., 2009; AlSarmi and Washington, 2011], though most studies are limited to a few measurement stations.
The evaluation of climate models as well as climate change impact studies require consistent and quality-controlled data, and our project aims to help provide these. The climate data repository at the Cyprus Institute facilitates the collection, testing and use of climate-related data sets available for the Mediterranean and MENA, with an emphasis on currently data-scarce countries (http://www.cyi-data.org). Several sources of gridded precipitation and temperature data are available for the region of interest. These data sets are either global or regional and available at different resolutions. The observations used for the various data products differ in numerous ways, including number, quality, spatial and temporal resolution, and the length and consistency of the measurement time series. The number of observations typically varies over the years and between subregions. Other differences may include quality checking and error correction procedures and the use of interpolation techniques. These distinctions may be important, leading to notable differences in the quality of the various products.
As none of these observation-based data sets can claim to represent actual precipitation and temperature, problems may arise when investigating the impacts of changing temperature and precipitation, e.g., on agriculture, ecosystems and water resources. Either individual data sets can be selected or ensemble data may be used, provided the quality has been tested to some degree. The present study investigates the utility of alternate data products for trend analysis by conducting an intercomparison of the available observation-based gridded precipitation and temperature data sets. In the absence of absolute quality criteria, our goal can be neither a quantitative evaluation of the data products for the region nor an assessment of their uncertainties. Instead, we intercompare the climatology, variability and trends inferred from the available global and regional data sets for the period 1961–2000, as a complementary source of information on climate change in this region, and check them for consistency and their level of significance.
 The data sets included in this study and their spatial/temporal characteristics are shown in Table 1. The subsections below describe the main characteristics of these data sets including their data sources, interpolation methods and an assessment of errors.
Table 1. Main Characteristics of the Data Sets Used in This Study
University of East Anglia Climatic Research Unit.
Global Precipitation Climatology Center - Variability Analysis of Surface Climate Observations.
Global Precipitation Climatology Center Version 4 - Full Data Reanalysis Product.
University of Delaware.
Climate Prediction Center Precipitation.
Climate Prediction Center Precipitation - Global Historical Climatology Network - Climate Anomaly Monitoring System.
Asian Precipitation - Highly Resolved Observational Data Integration Toward Evaluation of Water Resources.
The Climatic Research Unit has created a 0.5° by 0.5° resolution data set of monthly ground-based climate variables [New et al., 1999; Mitchell and Jones, 2005]. Monthly gridded data of temperature and precipitation over land have been generated from station observations in the CRU TS2.1 data set. The total number of stations for temperature and precipitation varies with time (27075 for precipitation in 2000, and 12783 for temperature in 2000). Data sources include data from Jones and Moberg, the Global Historical Climatology Network GHCN-v2 [Peterson and Vose, 1997], and monthly climate bulletins (CLIMAT). The station records have been checked for inhomogeneities using an automated method that takes into account incomplete and partially overlapping records and uses neighboring stations to construct a reference series against which a candidate series may be compared [Mitchell and Jones, 2005].
In constructing the monthly grids, an “anomaly” approach has been used: the anomaly time series for each station relative to the 1961–90 mean is calculated, and the gridded anomalies at 0.5° resolution are then added to the well-observed mean climatology from 1961–90 to obtain high-quality estimates of the temporal variations in absolute values [New et al., 1999]. This procedure avoids biases that could result from variations due to differences in station elevations or in the calculation of the monthly values by the different countries. An angular distance-weighted (ADW) interpolation was used for averaging temperature and precipitation anomalies from the eight nearest stations to each grid point [Shepard, 1984; Willmott et al., 1985], whereas the 1961–1990 climatological reference data were interpolated as a function of latitude, longitude and elevation, using thin-plate spline interpolation [New et al., 1999]. Our region of interest shows an uneven distribution of station density: up to 20 stations per grid cell in the northern part of the Mediterranean, fewer than 10 stations over the coastal areas of North Africa and the Middle East, and fewer than 2 for the desert areas in the Sahara and in Saudi Arabia.
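The anomaly approach can be illustrated with a minimal Python sketch. It is not the CRU implementation: plain inverse-distance weighting in degree space stands in for the angular distance-weighted scheme, no nearest-eight station selection is made, and the function names, station records and climatology value are hypothetical.

```python
import math

def idw(points, values, target, power=2.0):
    """Inverse-distance-weighted mean of `values` observed at `points`.

    Assumption: simple Cartesian distance and a fixed power; the ADW
    scheme described in the text additionally weights by station angle.
    """
    num = den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # target coincides with a station
        w = d ** -power
        num += w * v
        den += w
    return num / den

def anomaly_grid_value(stations, target, climatology_at_target):
    """Interpolate station anomalies, then add the gridded climatology.

    Each station is a dict with 'xy' (position), 'normal' (its own
    1961-90 mean) and 'obs' (the monthly value to be gridded).
    """
    points = [s["xy"] for s in stations]
    anomalies = [s["obs"] - s["normal"] for s in stations]
    return climatology_at_target + idw(points, anomalies, target)

# Two hypothetical stations at very different elevations (normals of 5 and 15):
stations = [
    {"xy": (0.0, 0.0), "normal": 5.0, "obs": 6.0},    # anomaly +1
    {"xy": (2.0, 0.0), "normal": 15.0, "obs": 18.0},  # anomaly +3
]
print(anomaly_grid_value(stations, (1.0, 0.0), 10.0))  # 12.0
```

Because only the anomalies (+1 and +3) are interpolated, the large difference between the two station normals does not leak into the gridded value, which is the bias the procedure is designed to avoid.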
New et al. [1999, 2000] provide estimates of the uncertainty associated with the CRU climatology using both an internal cross-validation procedure and a comparison with other available observational climatologies. They found errors of up to 40% for North Africa and 46% for Asia in the square root of the generalized cross-validation (RTGCV) for precipitation and 1.8 K/1.2 K for temperature. The errors are found to be largest over regions characterized by poor station coverage and high spatial variability, such as in dry, cold and mountainous areas. These areas are also prone to large interpolation errors. An additional source of uncertainty is the underestimation of precipitation collected in rain gauges, due to the undercatch of solid precipitation in cold areas. This results in additional inhomogeneities in the records, and the correction requires detailed station metadata, which are not always available [New et al., 2000]. Additional inhomogeneities may be present in the time series of individual grid cells due to a sparse station network or the complete absence of data. These inhomogeneities are related to the extrapolation in data-sparse regions, where the variable is relaxed toward the 1961–90 climatological mean when there are no stations within the correlation decay distance.
The CRU TS3.0 data set for the period 1901–2006 has been released through the British Atmospheric Data Center (http://badc.nerc.ac.uk). The major difference between CRU TS2.1 and CRU TS3.0 is that no new homogenization is explicitly performed in the latter. Instead, the homogenization of the underlying data sets and that performed by national meteorological agencies prior to releasing their station data are incorporated. It is not clear whether new observational data sets have been incorporated for the time period 1960–2000. A paper describing the production of CRU TS3.1, which extends the data set to 2009, is in preparation.
2.2.1. CPC PREC/L
The Climate Prediction Center CPC PREC/L is a station observation-based global, land, monthly mean precipitation data set, developed at the Climate Prediction Center of the National Centers for Environmental Prediction (NCEP). The monthly gridded (0.5° × 0.5°) Precipitation Reconstruction over Land (PREC/L) data have been derived from gauge observations collected in the GHCN-v2 and the Climate Anomaly Monitoring System (CAMS) data sets for the period from 1948 to the present [Chen et al., 2002]. The CAMS relies heavily on CLIMAT station reports, which are considered to be of high quality. The original data come from rain gauge observations from over 17,000 stations. The anomaly approach is used to produce the monthly grids. The interpolation of the anomalies is performed using the Gandin optimal interpolation (OI) technique [Gandin, 1965], while the long-term mean (1951–1990) is interpolated using the Shepard algorithm.
Comparisons between the data set and independent station observations over the United States have shown very high correlations of about 0.8 and a bias of nearly zero. No additional homogenization is performed to correct the gauge undercatch or to remove the discontinuities introduced by changes in instruments. Another potential source of error is that orography is not accounted for in the OI interpolation method. This might cause serious problems for mountainous regions [Chen et al., 2002].
2.2.2. CPC Temperature
The CPC Temperature is also a product developed at the Climate Prediction Center for the period 1948–present and uses a combination of station observations from GHCN-v2 and CAMS. A gridded 30-year (1961–1990) monthly mean climatology is constructed by using the Cressman objective analysis [Cressman, 1959]; an anomaly interpolation is then applied to yield full monthly values. Finally, a topographic adjustment using a spatially and temporally varying temperature lapse rate is applied to account for the impact of orography.
Comparisons with other global observational temperature data sets show that the quality of CPC temperature is reasonably good and that the product captures the most common temporal-spatial features in the observed climatology and anomaly fields over both regional and global domains [Fan and van den Dool, 2008].
The Global Precipitation Climatology Center (GPCC), hosted at the German Weather Service DWD (Deutscher Wetterdienst, Germany), is the official precipitation data center of the World Meteorological Organization (WMO). GPCC operationally collects and archives global precipitation data and develops derived data products [Rudolf et al., 1994]. Data sources include synoptic weather observation data (SYNOP) received at DWD via the WMO Global Telecommunication System (GTS) and climatological (mainly 1951–2000) monthly mean precipitation totals at the same stations extracted from GPCC's collection of global normals (with data from more than 70,000 different stations worldwide). Four data products are available: the monitoring product, the first-guess product, the full data reanalysis product and the VASClimO 50-Year Data Set. Only the latter two are used in this study.
The full data reanalysis product version 4 (GPCC-V4) is based on all stations present in the GPCC database supplying data for single months. The data set covers the period from 1901 to 2007 and its coverage per month varies from fewer than 10,000 to more than 45,000 stations. The monthly values are calculated according to the anomaly approach based on anomalies from climatological normals at the stations, or from the GPCC high resolution gridded climatology whenever no station normal is available. The anomalies are spatially interpolated according to a modified spherical adaptation [Willmott et al., 1985] of Shepard's empirical weighting scheme [Shepard, 1968]. Extensive postprocessing is applied to all stations, including quality control and harmonization of the metadata, quality assessment of the precipitation data, and selection and intercomparison of the data from different sources [Schneider et al., 2010].
 The VASClimO product (GPCC-VASClimO) is based on data selected with a mostly complete temporal coverage and homogeneity of the time series. The data set consists of time series of 9,343 stations covering at least 90% of the period 1951–2000. A Kriging interpolation is applied to the monthly precipitation totals to provide long-term analyses of area-averaged precipitation time series.
Two major error sources have been identified for these products: the gauge undercatch and the stochastic sampling error due to a sparse network density. The relative sampling error of gridded monthly precipitation varies between ±7% and ±40% of the true area mean [Schneider et al., 2010].
2.4. UDel Air Temperature and Precipitation
The University of Delaware (DEL) data set is a monthly, globally gridded, high resolution station (land) data set for air temperature and precipitation for 1950–2008 [Willmott and Matsuura, 1995]. Station data are compiled from several updated sources including GHCN-v2, the Global Synoptic Climatology Network via the National Climatic Data Center (NCDC), the Global Summary of the Day (GSOD) from NCDC and the archive of Legates and Willmott.
The gridded fields are estimated from monthly station data averages using a combination of spatial interpolation methods. A climatology (1961–1990) for each month is produced first by combining average monthly station values from two available climatologies. Employing the climatologically aided interpolation (CAI) [Willmott and Robeson, 1995], a monthly value at each time series station is differenced from a climatologically averaged value for that month, which is available at the station or is interpolated to the time series station location. Traditional interpolation, which employs an enhanced distance-weighting method [Shepard, 1968; Willmott et al., 1985], is then performed on the station differences to obtain a gridded difference field. A digital-elevation-model (DEM) assisted interpolation incorporating topographical effects [Willmott and Matsuura, 1995], through an average air-temperature lapse rate, is then used to adjust the climatology, though only for temperature. Finally, the gridded difference field is added to the interpolated (DEM-assisted) estimates of the climatology at the same set of grid points. The precipitation data have not been corrected for rain gauge undercatch.
The E-OBS data set is a European land daily high-resolution gridded data set for precipitation and minimum, maximum, and mean surface temperature for the period 1950–2009 [Haylock et al., 2008]. The E-OBS gridded data set is derived through interpolation of the European Climate Assessment and Data (ECA&D) station data described by Klein Tank. The station data in version 4 of the data set comprise a network of 2316 stations, with the highest station density in Ireland, the Netherlands and Switzerland, and the lowest density in Spain, Northern Africa, the Balkans and Northern Scandinavia. The data set has been designed to provide the best estimate of grid box averages to enable direct comparison with Regional Climate Models (RCMs).
 A three-step process has been employed for interpolating the daily data: first interpolating precipitation totals and monthly mean temperature using thin-plate splines, then interpolating the daily anomalies using the Kriging method, then applying the interpolated anomaly to the interpolated monthly mean to create the final product.
The comparison with existing high-resolution regional gridded data based on much denser station networks over Europe has shown significant mean absolute errors, of the order of 0.5°C for temperature and greater than 100% for precipitation, with the poorest agreement in mountainous regions. Many inhomogeneities were found in the gridded data set and were attributed to inhomogeneities in the underlying station data [Hofstra et al., 2010].
The APHRODITE data set provides daily gridded precipitation data, and is the only long-term (from 1951 onward) continental-scale daily product that encompasses a dense network of daily rain gauge data for the Middle East [Yatagai et al., 2009]. Data sources include monthly precipitation obtained from Turkey (225 stations), Israel (19), Iran (154) and the precipitation climatology of Chen et al. for the remaining countries of the domain, and daily precipitation from a total of 1394 stations in Turkey (338 stations). The GTS data were used for areas where data are not available. The Xie et al. algorithm has been used as a basis for the analysis of daily precipitation, while the monthly precipitation climatology over 30 years was defined and interpolated according to the Shepard algorithm. Subsequently, a ratio of the monthly climatologies to the world climatology data WORLDCLIM [Hijmans et al., 2005] has been calculated and interpolated using the weighted mean method based on Willmott et al. The daily precipitation climatology was defined by Fourier interpolation; the ratio of the daily observation to the daily climatology has then been calculated and interpolated, again using the Willmott et al. algorithm.
The European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA40) project is a global atmospheric analysis of many conventional observations and satellite data streams for the period September 1957–August 2002 [Uppala et al., 2005]. ERA40 assimilates diverse sources of observational data including station data, satellite data, and radiosondes. The ERA40 surface air temperature analysis was derived by analyzing surface synoptic observations transmitted via World Meteorological Organization (WMO) SYNOP messages [Simmons et al., 2004] and suffers from significant gaps in the coverage of data available for assimilation prior to 1967. The changes in the Global Observing System, especially the availability of satellite data in the mid-1970s, have affected its performance [Bengtsson et al., 2004]. Unlike the National Centers for Environmental Prediction/National Center for Atmospheric Research reanalysis (NCEP/NCAR), ERA40 uses screen-level observations [Simmons et al., 2004] for its surface air temperature and is a commonly used reanalysis data set for the evaluation of General Circulation Models (GCMs). Only ERA40 temperature will be used in this study.
In Appendix A, a map of the station distribution of the data sets is presented. This information is not available for all data sets. APHRODITE benefits from a dense network of rain gauges over Iran and eastern Turkey, while CRU TS3.0 and CPC have fewer stations per grid cell, though they cover larger areas, such as the Levant and North Africa. In the comparison, all data are used at 0.5° × 0.5° resolution, and ERA40 has been rescaled to match this resolution. Monthly surface air temperature and precipitation from all data sets are averaged over each defined region from January 1961 to December 1999. From the monthly values, seasonal averages are then calculated. The selected regions are characterized by different climate regimes and seasonal cycles of precipitation. Only land points are considered in the analysis, and the seasons are defined as December–January–February (DJF), March–April–May (MAM), June–July–August (JJA) and September–October–November (SON).
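A minimal Python sketch of the step from monthly values to seasonal averages is given here for illustration. The function name and the input layout are hypothetical, and two choices are assumptions not stated in the text: December is assigned to the DJF season of the following year, and seasons with fewer than three available months are dropped.

```python
from collections import defaultdict

SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}

def seasonal_means(monthly):
    """Average monthly values into seasonal values.

    `monthly` maps (year, month) to a regional monthly value.
    Returns a dict mapping (season_year, season) to the seasonal mean.
    Assumption: December counts toward the next year's DJF, and only
    complete three-month seasons are returned.
    """
    buckets = defaultdict(list)
    for (year, month), value in monthly.items():
        season_year = year + 1 if month == 12 else year
        buckets[(season_year, SEASON_OF_MONTH[month])].append(value)
    return {key: sum(vals) / 3 for key, vals in buckets.items() if len(vals) == 3}

# Hypothetical regional monthly precipitation (mm):
monthly = {
    (1961, 1): 50.0, (1961, 2): 40.0,            # incomplete DJF 1961, dropped
    (1961, 12): 30.0, (1962, 1): 60.0, (1962, 2): 90.0,
    (1962, 3): 20.0,                             # incomplete MAM 1962, dropped
}
print(seasonal_means(monthly))  # {(1962, 'DJF'): 60.0}
```

With this convention the first winter of a record is incomplete, since the preceding December is missing, so one fewer DJF value is available than for the other seasons.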
Our domain is shown in Figure 1. It covers the Mediterranean Basin and the Middle East, excluding the southern part of the Arabian Peninsula, to be able to include the APHRODITE data set. Figure 1 also shows the eight subregions selected for a more detailed analysis. These subregions were chosen according to their climate classification categories based on Kottek et al. Only regions with clearly dominant climate categories (Table 2) were considered, with the exception of R6, for which it is difficult to distinguish a dominant category. The subregions have also been selected for their high population densities, i.e., mainly along the Mediterranean coast, with an attempt to include the largest cities, which also have the largest network of observations.
Empirical cumulative distribution functions (cdf) F(x) are plotted for each subregion, for all seasons and for all data sets. The empirical cdf F(x) is defined as the proportion of X values less than or equal to x. To provide a measure of the phase association between data sets, the correlation coefficient is calculated for the monthly time series and for the deseasonalized time series, obtained by removing the monthly means from the original monthly time series. Temperature and precipitation variability is analyzed using a locally weighted scatterplot smoothing (LOESS) regression of the anomalies. The variability is measured by the temporal standard deviation (SD) for temperature. For precipitation, the coefficient of variation (CV) is used, which is the standard deviation normalized by the 40-year average.
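The empirical cdf and the deseasonalized correlation can be sketched in a few lines of Python. This is an illustration only, not the R code used for the analysis: the function names are hypothetical, the series are assumed to be complete monthly records starting in the same calendar month, and the Pearson coefficient is assumed as the correlation measure.

```python
from bisect import bisect_right
from statistics import mean

def ecdf(sample):
    """Return F, where F(x) is the proportion of sample values <= x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n

def deseasonalize(series):
    """Subtract the long-term mean of each calendar month.

    Assumption: `series` is a gap-free monthly record, so every 12th
    value belongs to the same calendar month.
    """
    monthly_means = [mean(series[m::12]) for m in range(12)]
    return [v - monthly_means[i % 12] for i, v in enumerate(series)]

def pearson(a, b):
    """Pearson correlation coefficient of two equally long series."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

F = ecdf([1, 2, 2, 4])
print(F(2))  # 0.75

# Two hypothetical data sets sharing a seasonal cycle and a step change:
a = [m for m in range(12)] + [m + 2 for m in range(12)]
b = [2 * v + 5 for v in a]
print(pearson(deseasonalize(a), deseasonalize(b)))
```

Removing the monthly means first prevents the shared seasonal cycle from dominating the correlation, so the coefficient reflects agreement in interannual variability instead.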
 Climatic trends are calculated for the monthly and the seasonal values for the period 1961–2000 using the non-parametric Theil-Sen's slope [Sen, 1968]. The trends are tested using the non-parametric Mann-Kendall test [Mann, 1945; Hipel and McLeod, 2005]. The statistical significance is determined at the 95% confidence level.
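For illustration, the Theil-Sen slope and the Mann-Kendall test can be written compactly in Python. This sketch is not the R code used for the analysis: the function names are hypothetical, the test uses the normal approximation without corrections for ties or serial correlation, and these simplifications are assumptions.

```python
import math
from statistics import median

def theil_sen_slope(t, y):
    """Median of the slopes between all pairs of points."""
    slopes = [
        (y[j] - y[i]) / (t[j] - t[i])
        for i in range(len(y))
        for j in range(i + 1, len(y))
        if t[j] != t[i]
    ]
    return median(slopes)

def mann_kendall(y):
    """Return (S, Z, p) for the two-sided Mann-Kendall trend test.

    Assumptions: normal approximation with continuity correction,
    no tie correction, no adjustment for autocorrelation.
    """
    n = len(y)
    s = sum(
        (y[j] > y[i]) - (y[j] < y[i])
        for i in range(n)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return s, z, p

# A hypothetical series with one extreme value: the slope stays at 1.
print(theil_sen_slope([0, 1, 2, 3, 4], [0, 1, 2, 3, 100]))  # 1.0

s, z, p = mann_kendall(list(range(1, 11)))
print(s, p < 0.05)  # 45 True
```

A trend would be reported as significant at the 95% confidence level when p is below 0.05. The outlier example shows why a median-based slope is attractive for short and noisy climate series.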
 R software was used to derive and plot the LOESS functions, the correlations and the Theil-Sen's slopes. Ferret software has been used to plot the geographical patterns.
4.1. Data Sets Intercomparison of Climatology
4.1.1. Spatial Characteristics
The long-term annual mean precipitation and temperature for the period 1961–2000 are depicted in Figures 2 and 5. Due to the large number of data sets and for the sake of clarity we only compare the annual means. A more quantitative comparison is presented for the seasonal climatologies using the subregional spatial precipitation and temperature averages. To estimate the overall biases among data sets and their spatial distribution, we consider the spread among the precipitation and temperature data sets and the coefficient of variation among the precipitation data sets only. The spread S_E(i, t) is represented by the ensemble standard deviation:
$$S_E(i,t) = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left[x_n(i,t) - E(i,t)\right]^2}$$
where E(i, t) is the ensemble data set at position i and time t:
$$E(i,t) = \frac{1}{N}\sum_{n=1}^{N} x_n(i,t)$$
x_n(i, t) is the nth ensemble member (or a specific data set) at position i and time t, and N = 8 is the number of ensemble members for precipitation and N = 6 for temperature.
The coefficient of variation CV corresponding to the spread is given by
$$CV(i,t) = \frac{S_E(i,t)}{E(i,t)}$$
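As a minimal illustration of these ensemble statistics at a single grid cell and time step, the following Python sketch computes the ensemble mean, the spread and the coefficient of variation. The function name and the sample values are hypothetical, and dividing by N rather than N − 1 in the standard deviation is an assumption.

```python
import math

def ensemble_stats(members):
    """Ensemble mean E, spread S_E and coefficient of variation CV.

    `members` holds the N values x_n(i, t) taken from the N data sets
    at one position i and time t.
    """
    n = len(members)
    e = sum(members) / n
    # Population form (divide by N): an assumption, see the lead-in.
    s = math.sqrt(sum((x - e) ** 2 for x in members) / n)
    # CV is undefined where the ensemble mean is zero (e.g., no rain).
    cv = s / e if e != 0 else float("nan")
    return e, s, cv

# Eight hypothetical precipitation values (mm/month), one per data set:
print(ensemble_stats([2, 4, 4, 4, 5, 5, 7, 9]))  # (5.0, 2.0, 0.4)
```

The guard for a zero mean reflects why CV becomes very large and unstable over deserts, where the ensemble mean precipitation in the denominator is close to zero.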
Figure 2 shows a number of topographically induced regional features, such as maxima over the Alpine region and the Balkans (e.g., the Dinaric Alps), the western coastal ranges of the Iberian Peninsula and the eastern margins of the Black Sea coast. Other less pronounced features in the rainfall distribution are present with different magnitudes in all data sets, such as the eastern Mediterranean coast, the Maghreb region (North Africa from Mauritania to Libya) and the Zagros Mountains.
Figure 3 indicates the bias for each data set calculated against the ensemble reference data set. For the bias calculations it should be noted that the two CRU data sets differ only slightly from each other and thus influence the ensemble mean relatively strongly. This also applies to the GPCC data sets, though the differences between them are larger. Generally, CPC, DEL, GPCC-VASClimO and GPCC-V4 indicate wetter conditions than the other data sets for all regions, whereas E-OBS and APHRODITE are the driest. CPC shows wetter conditions over Spain, Turkey, the eastern Mediterranean and the eastern Maghreb. DEL appears to be unrealistically wet south of the Caspian Sea and over the eastern part of the Fertile Crescent. The two GPCC data sets show large differences compared to the ensemble data set. GPCC-VASClimO is wetter over the northern part of the Arabian Peninsula, the southeastern Black Sea and the western Caspian Sea and drier over Portugal and Morocco, whereas GPCC-V4 is wetter over the Zagros Mountains, the southern Caspian Sea coast, the eastern Black Sea coast, the Iberian Peninsula and Morocco. The differences are attributed to the number of stations used in each data set and to the homogenization procedure, which is used in GPCC-VASClimO but not in GPCC-V4. The CRU data sets have similar biases in all regions, with dry biases over the southeastern Black Sea and southern Caspian Sea and wet biases over the Balkan mountains, the southern Alps and the western Atlas mountains. This indicates that the different treatment of homogenization in CRU3 has little effect on the resulting data set for our region of interest. E-OBS shows very dry conditions in eastern Turkey in comparison to the other data sets, which might be due to a sampling error associated with the poor gauge coverage in the underlying ECA&D station data set.
APHRODITE is generally drier in the Balkan region and over Saudi Arabia and the eastern Mediterranean and wetter over the Zagros Mountains and south of the Caspian Sea. This data set includes measurements from a dense rain gauge network in Iran, which may have improved the representation of orographically influenced precipitation in this area.
Figure 4 presents the time-averaged ensemble mean, spread and coefficient of variation for precipitation. The ensemble spread is found to be large over the eastern part of the Balkans, the southeastern Black Sea coast and the eastern part of the Fertile Crescent. The corresponding coefficient of variation exceeds 60% over northern Saudi Arabia and eastern Iran, where the mean precipitation is relatively low. It is also very high over the Sahara, where precipitation is near-absent.
Figure 5 shows that the main temperature patterns are captured quite similarly by all data sets (e.g., the asymmetry between the northern and the southern part of the domain). From Figure 6 it is clear that ERA40 has an overall warm bias in comparison to the ensemble mean. We expect this to be related to the relatively large influence of the analysis model, considering the small number of stations included in the reanalysis. In addition, the smoothing of the topography in the analysis model may also play a role. The comparison against the ensemble temperature data set in Figure 6 also shows a general warm bias over Turkey, Iraq and northern Iran for CPC, DEL and the two CRU data sets. CPC, CRU2 and CRU3 are colder over southern Iran, while E-OBS has a cold bias over the Maghreb and Turkey. The most problematic region for all data sets appears to be the southern part of Iran, due to the absence of station data in this region. Figure 4 (right) indicates large ensemble spreads over eastern Turkey, Iraq and Iran in the Middle East and over the western Maghreb and the western Balkans. These regions pose the most problems for observational data sets: very sparse station coverage, especially in desert areas, possible orographic effects in mountainous areas, and issues with representativeness related to the prevalence of low elevation and valley stations compared to high elevation ones [New et al., 1999, 2000].
4.1.2. Means and Standard Deviations
 Seasonal means of precipitation and temperature are compared for all data sets for the period 1961–2000. For temperature the standard deviation for each data set is calculated as the average of the standard deviation of each grid-cell. For precipitation the standard deviation is normalized by the total average, yielding the coefficient of variation.
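One possible reading of these two definitions is sketched below in Python. The function names and sample values are hypothetical, and two points are assumptions: the population standard deviation is used, and "total average" is taken to mean the average over all grid cells and all years of the subregion.

```python
from statistics import mean, pstdev

def regional_sd(cells):
    """Average of the temporal standard deviations of the grid cells.

    `cells` is a list of seasonal time series, one per grid cell.
    """
    return mean(pstdev(series) for series in cells)

def regional_cv(cells):
    """Regional SD normalized by the average over all cells and years.

    Assumption: this is how "total average" is interpreted here.
    """
    total_mean = mean(v for series in cells for v in series)
    return regional_sd(cells) / total_mean

# Two hypothetical grid cells with two seasonal values each:
cells = [[1, 3], [2, 6]]
print(regional_sd(cells))  # 1.5
print(regional_cv(cells))  # 0.5
```

Averaging the grid-cell standard deviations retains local variability that would be partly cancelled if the regional mean series were formed first and its standard deviation taken afterwards.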
 The seasonal mean precipitation and its coefficient of variation for each data set and for the different subdomains are presented in Figures 7 and 8, followed by the seasonal mean temperature and its standard deviation in Figures 9 and 10. In general the values are very close to each other for all subregions and seasons. The temperature means are very similar, with the exception of ERA40, which has a warm bias, particularly apparent during summer in subregions R1, R2, R5 and R8. The precipitation means show relatively large differences between data sets and subregions. Further, the winter values show more variability among the data sets than the other seasons. In particular the CPC data set appears to overestimate the winter precipitation in the subregions R2 and R3 and APHRODITE clearly underestimates the precipitation in all seasons and all regions for which this data set is available except in the Gulf region. Over the Balkans, E-OBS shows lower precipitation values in all seasons than is indicated by all other data sets.
Although each data set is based on its own selected station network, with a probable substantial overlap, E-OBS and APHRODITE include a higher density of stations than the other data sets, notably in the Balkans in E-OBS and in the region covering Turkey, the Levant and Iran in APHRODITE. This is expected to affect the quality of the data product. Moreover, the comparison of E-OBS with higher-density network data sets over different parts of Europe has shown that E-OBS is biased toward lower values [Hofstra et al., 2010]. Over Saudi Arabia, very low precipitation values are indicated by APHRODITE, leading to a very high coefficient of variation. The latter is larger for very dry regions in the summer season because of the very small mean precipitation values in the denominator. The very low APHRODITE values are probably due to the treatment of regions where no station data are available, such as Saudi Arabia. In these regions no interpolation was made and a zero value was assigned. The CV over most of the subregions is similar for all seasons, except for R7, R8 and R3 in June, July and August (JJA), which are the ones with the least precipitation. The temperature standard deviation tends to be high in winter over R5 and R6. This might be related to the multiple climate zones present in these regions.
4.1.3. Empirical Distribution Functions and Correlations
 The empirical distribution functions of temperature and precipitation are presented by season and region in Figures 11 and 12, respectively. The data sets span much the same values with some exceptions.
Relatively large and systematic differences in the temperature distributions are found between the E-OBS data set and the other distributions for R1 and R2. For the subregion R5, the distribution shows similarities only with DEL. The poor station coverage of the underlying ECA&D station data set is the likely cause of this result, which makes the potential use of this data set to study these subregions questionable. Similarly, ERA40 shows large differences in comparison with other data sets in the Gulf region. Overall, the largest differences among data sets are found for the R5 subregion. These results are consistent with the findings of section 4.1.1.
 The temporal correlation coefficient has been calculated for all pairs of the monthly values of all data sets in Figure 13. Very high correlations, all above 0.9, are found for the temperature data sets. This reveals a very good phase association between data sets in the region. Additionally, correlation coefficients are calculated for all pairs of the deseasonalized temperature time series in Figure 14. The subregions R4, R5 and R6 show a near-perfect phase in long-term variability between data sets with very high correlation coefficients (between 0.9 and 1). This indicates a low spatial variability of temperature and the absence of strong gradients. Lower correlations (0.75) are found for ERA40 in R8.
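 The difference between the two types of correlation can be sketched in Python as follows. It is assumed here that deseasonalizing means subtracting the long-term mean of each calendar month and that each series starts in January; the exact procedure is not specified above. The two synthetic series share a seasonal cycle but have uncorrelated year-to-year anomalies.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def deseasonalize(monthly):
    """Subtract each calendar month's long-term mean (series starts in January)."""
    climatology = []
    for m in range(12):
        values = monthly[m::12]
        climatology.append(sum(values) / len(values))
    return [v - climatology[i % 12] for i, v in enumerate(monthly)]

# Synthetic data: a common seasonal cycle plus hypothetical yearly anomalies.
seasonal = [10 + 8 * math.sin(2 * math.pi * (m - 3) / 12) for m in range(12)]
anomalies_a = [1.0, 0.0, -1.0]   # one value per year, data set A
anomalies_b = [1.0, -2.0, 1.0]   # one value per year, data set B
series_a = [seasonal[m] + a for a in anomalies_a for m in range(12)]
series_b = [seasonal[m] + b for b in anomalies_b for m in range(12)]

r_raw = pearson(series_a, series_b)
r_anom = pearson(deseasonalize(series_a), deseasonalize(series_b))
print("monthly values:", round(r_raw, 2))
print("deseasonalized:", round(r_anom, 2))
```

 In this constructed case the correlation of the monthly values is high only because of the shared annual cycle, while the deseasonalized correlation isolates the agreement in interannual variability.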
 Unlike the E-OBS temperature distributions, the E-OBS precipitation distributions in Figure 12 appear consistent with the other distributions in the regions covered by the data set. As seen above, CPC overestimates precipitation over the desert subregions R3 and R7. Large differences between data sets are also apparent over Saudi Arabia, where the APHRODITE data set indicates almost no precipitation in all seasons. As mentioned in the previous sections, the two GPCC products show different distributions over this subregion, and GPCC-VASClimO is associated with overestimations in winter and fall. As discussed in the data set description section, GPCCV4 uses all the stations available in the GPCC database and an anomaly approach (the interpolation of anomalies) to compute the monthly values, whereas GPCC-VASClimO uses stations selected on the basis of their quality, and the monthly total precipitation values are interpolated directly. The anomaly approach has the advantage of reproducing the climatological value when station data are missing. On the other hand, GPCC-VASClimO offers complete data coverage and homogeneous time series, which makes it more suitable for climate variability assessments in specific locations.
 The precipitation data sets show lower correlation coefficients in Figure 15 than the temperature data sets, which is an indication of the high spatial variability of precipitation, characterized by strong gradients. The correlation is lowest for R3 and R7, which are the regions with large desert areas and thus the fewest and lowest-quality observations. E-OBS exhibits low correlation coefficients with the other precipitation data sets for R1 and R2, whereas its temperature series correlate highly with the other temperature data sets. The correlation coefficients of the deseasonalized precipitation time series presented in Figure 16 are generally lower than those in Figure 15, especially over R3 and R4. These results underscore the high level of uncertainty in precipitation observations related to space and time dependencies, which makes it difficult to quantify the error and to identify the error sources. The similarities and differences between data sets result from a combination of factors, notably the underlying station data and the precipitation estimation methodology. Therefore, it is not possible to draw firm conclusions and assign superiority or inferiority qualifications to data sets.
 The mean annual temperature and precipitation anomalies over the period 1961–2000 have been calculated and smoothed by a local regression method (LOESS). All temperature and precipitation curves are presented for each region in Figures 17 and 18, respectively. Trends have been calculated from the monthly and the seasonal time series using the Theil-Sen slope estimator and tested for significance with the Mann-Kendall test.
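 For readers unfamiliar with these two methods, a minimal Python sketch is given below. It is not the code used for this study (the calculations were done in R). It assumes equally spaced time steps, uses the normal approximation for the Mann-Kendall test, and omits tie and autocorrelation corrections. The input series is hypothetical.

```python
import math
from itertools import combinations
from statistics import median

def theil_sen_slope(y):
    """Median of all pairwise slopes, in units of y per time step."""
    slopes = [(y[j] - y[i]) / (j - i)
              for i, j in combinations(range(len(y)), 2)]
    return median(slopes)

def mann_kendall(y):
    """Return (S, Z, two-sided p) without tie correction."""
    n = len(y)
    s = 0
    for i, j in combinations(range(n), 2):
        diff = y[j] - y[i]
        s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return s, z, p

# Hypothetical annual temperature anomalies (degrees C), one per year.
anomalies = [0.0, 0.1, 0.05, 0.2, 0.15, 0.3, 0.25, 0.4, 0.35, 0.5]
slope = theil_sen_slope(anomalies)
s, z, p = mann_kendall(anomalies)
print("slope per decade:", round(slope * 10, 2))
print("significant at the 95% level:", p < 0.05)
```

 Both statistics depend only on the ordering and pairwise differences of the values, which makes them robust to outliers and independent of any assumption of normally distributed residuals.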
 A warming trend is derived for all data sets in all subregions, as shown in Figure 17. The trend appears to have reversed from a slight cooling to warming in the 1970–1980 period. To explain this negative temperature trend in the Mediterranean, Lelieveld et al. have hypothesized that anthropogenic aerosol cooling, mainly due to sulfate, was important before 1980 but has reversed sign since then owing to decreasing European SO2 emissions. In the Maghreb region the upward trend started in the mid-1970s, whereas in the Balkans and the rest of the Middle East it started in the early 1980s. Again, E-OBS is associated with relatively large anomalies compared to the other data sets over R1 and R2, consistent with the results above. Similarly, ERA40 shows irregular anomalies in R7 and R8.
 The trend, calculated and statistically tested at the 95% confidence level for the monthly temperature time series for the period 1961–2000, is given in Figure 19. Almost all data sets show a significant trend, with the exception of R5, for which we do not find a significant trend in any of the data sets. The data sets present similar trend magnitudes for all regions, with few exceptions: the E-OBS trends (0.39°C/decade and 0.52°C/decade) over R1 and R2 are much larger than those of the other data sets and seem unrealistic. Consequently, a trend analysis for these regions using E-OBS should be interpreted with caution. ERA40 is the only data set that shows a negative trend over the Gulf region and anomalous regressions for R7 and R8 in Figure 17. The smallest trends are found over the Balkan region, ranging from 0.04°C/decade for CRU3 to 0.17°C/decade for CPC. The seasonal analysis in Figure 20 indicates the following features: all regions show a significant and strong warming trend in the JJA season, and R4 has a similar trend magnitude in winter (0.65°C/decade in DJF and 0.72°C/decade in JJA). Strong trends are also found in the SON season over Saudi Arabia and the Gulf region, while over the Maghreb the trends are significant for the MAM, JJA and SON seasons. The summer warming trend is consistent with the findings of Giorgi using the CRU data and Xoplaki et al., who focused on the Mediterranean region as a whole, as well as region-based studies such as Tayanç et al. for Turkey, Kafle and Bruins for Israel and AlSarmi and Washington for the Arabian Peninsula. Table 3 summarizes the trend analysis by region.
Table 3. Summary of the Temperature Trend Analysis Presented by Subregion
increasing annual trend starting from mid 70s, trend magnitude around 0.2°C/decade, mainly significant in MAM and JJA (for all data sets).
increasing annual trend starting from mid 70s, trend magnitude between 0.2° and 0.4°C/decade, significant in MAM, JJA and SON.
increasing annual trend starting from about 1980, trend magnitude between 0.2° and 0.3°C/decade, significant in all seasons.
decreasing annual trend until 1980 and then increasing trend, overall positive trend with a magnitude between 0.1° and 0.2°C/decade, significant in DJF and JJA.
decreasing annual trend until 1980 and increasing trend starting from about 1990, overall non-significant.
increasing annual trend starting from 1970, overall positive trend with a magnitude between 0.1° and 0.2°C/decade, significant in DJF.
increasing annual trend starting about 1990, positive trend around 0.2°C/decade, significant in MAM and JJA.
overall positive trend with a magnitude between 0.2° and 0.4°C/decade significant in JJA and SON.
 It is evident from Figure 18 that a decreasing trend in precipitation has taken place during the period 1961–2000 over the Mediterranean, whereas an opposite trend is observed in the Gulf region. For R3 and R7, both characterized by very low precipitation values, no clear trend can be identified. The E-OBS curves for R1 and R2 again show large anomalies compared to the other data sets.
 The regional trend signals shown in Figure 21 are less significant than for temperature, with only about 38% of the trends being statistically significant at the 95% confidence level. Significant trends for almost all data sets are found for Morocco, the Balkans (negative) and the Gulf (positive). Although small, the latter is consistent with the future projections for the 21st century presented by Evans, using 18 global climate models participating in the Intergovernmental Panel on Climate Change Fourth Assessment Report, and Lelieveld et al., who used a regional climate model for the period 1951–2100. The seasonal analysis in Figure 22 corroborates the significant trends found for the monthly values, showing that the negative trend in the Balkans occurred in winter, while it took place in spring and summer in Morocco. Over Turkey and the Levant, all data sets show a negative trend only in winter. Similar winter trends, with significant drying tendencies in Turkey and Israel, have been deduced by Yatagai using 34 years of monthly precipitation data. A summary of the precipitation trends by region is presented in Table 4.
Table 4. Summary of the Precipitation Trend Analysis Presented by Subregion
decreasing trend, negative trend between −2 and −4% per decade, significant in MAM and JJA.
decreasing trend starting in mid 1970s, overall non-significant.
decreasing annual trend until 1990, slightly reversed afterward, overall negative trend between −2 and −5% per decade, significant in DJF.
decreasing trend, significant in JJA at about −10% per decade.
decreasing trend since the late 1980s, overall non-significant.
contradicting trends among data sets, positive trend in JJA between 15 and 20% per decade.
increasing trend, positive trend significant up to 2% per decade.
 Global gridded data sets of temperature and precipitation have been compared over the Mediterranean region and the Middle East for the period 1961–2000. The intercomparison of the main statistical features has revealed differences among the global (CRU, CPC, DEL, GPCC, ERA40) and regional data sets (APHRODITE and E-OBS). In the former group, the ERA40 spatial and empirical distribution functions indicate large discrepancies relative to the other data sets, especially in the Arabian Gulf region. This data set also indicates anomalous trends for Saudi Arabia and around the Gulf. Therefore its use in studies of trends, climate change impacts and model evaluation is questionable for these subregions. The other global temperature data sets agree well for all subregions. E-OBS is the only regional product available for North Africa, but appears to be flawed for the Maghreb. Its large variability and overestimation of temperature in all seasons compared to the other data sets also affect the computed temperature trend. An evaluation of this product against long-term historical station data in this region will be necessary before its use can be recommended in impact studies. For precipitation, two subregions seem to be problematic: Libya-Egypt and Saudi Arabia, both characterized by very low precipitation rates and most probably affected by sampling issues related to the sparse rain gauge coverage. Although it is a unique source of precipitation data for the Middle East, with a relatively high density of station data, APHRODITE shows little correlation or similarity with the other data sets over Saudi Arabia. A new version of APHRODITE (V1101) [Yatagai et al., 2012] has been released recently, with a domain extended to the south to include additional stations. This will likely improve its usefulness for climatological analyses.
 The upward trend in temperature and the downward trend in precipitation indicated across the data sets agree in direction, but often not in magnitude, with studies focusing on the Mediterranean and the few climate change studies in the Middle East. In general, the MENA region is affected by both the lack of high-quality data and the still relatively modest interest of the climate research community. It would be highly desirable to increase the research efforts and the number of measurement stations, make all existing meteorological data sets available and invest in quality control procedures.
 Our analysis emphasizes that substantial differences exist between data sets, and only limited conclusions about their quality and the robustness of trends can be drawn. All data sets are associated with uncertainties related to the limited number of stations included, the accuracy and representativeness of the measurements and the influence of the schemes used to interpolate the sparse data. Unfortunately, it is not yet possible to formulate quantitative recommendations, although some data sets are associated with outliers and anomalies and should therefore be used with caution. Uncertainties in observations should be considered when gridded data are used for numerical model evaluation. The choice of observational data sets might greatly affect conclusions about model skill. Furthermore, it may well be that for specific applications in certain regions, certain data sets are more appropriate than others. The preferred approach would be to use multiple data sets, taking into account the spread among them, and to pursue qualitative assessments of the uncertainties associated with climate observations.
Appendix A: Station Distribution for Selected Data Sets
 In this section, a map (Figure A1) of the station distribution of the temperature and precipitation data sets is presented. Only data sets for which this information is available are shown.
 The authors thank all the data providers for the use of their data sets. The authors wish to acknowledge the use of the Ferret program for the spatial distributions figures in this paper. Ferret is a product of NOAA's Pacific Marine Environmental Laboratory (http://ferret.pmel.noaa.gov/Ferret/). The rest of the calculations and plots in this paper were done using the R software (http://www.r-project.org/) (part of the GNU project). The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007–2013)/ERC grant agreement 226144.