This study is an extensive revision of the Climatic Research Unit (CRU) land station temperature database that has been used to produce a grid-box data set of 5° latitude × 5° longitude temperature anomalies. The new database (CRUTEM4) comprises 5583 station records of which 4842 have enough data for the 1961–1990 period to calculate or estimate the average temperatures for this period. Many station records have had their data replaced by newly homogenized series that have been produced by a number of studies, particularly from National Meteorological Services (NMSs). Hemispheric temperature averages for land areas developed with the new CRUTEM4 data set differ slightly from their CRUTEM3 equivalent. The inclusion of much additional data from the Arctic (particularly the Russian Arctic) has led to estimates for the Northern Hemisphere (NH) being warmer by about 0.1°C for the years since 2001. Over the period 1901–2010, the NH warms by 1.12°C and the Southern Hemisphere (SH) by 0.84°C. The robustness of the hemispheric averages is assessed by producing five different analyses, each including a different subset of 20% of the station time series and by omitting some large countries. CRUTEM4 is also compared with hemispheric averages produced by reanalyses undertaken by the European Centre for Medium-Range Weather Forecasts (ECMWF): ERA-40 (1958–2001) and ERA-Interim (1979–2010) data sets. For the NH, agreement is good back to 1958 and excellent from 1979 at monthly, annual, and decadal time scales. For the SH, agreement is poorer, but if the area is restricted to the SH north of 60°S, the agreement is dramatically improved from the mid-1970s.
 The purpose of this paper is to revise, improve, and update the gridded land-based Climatic Research Unit (CRU) temperature database (CRUTEM4), last documented by Brohan et al. [2006] (CRUTEM3). There are two principal reasons for such an analysis at the present time. First, some years have passed since it was last undertaken, and significant changes and improvements have been made to the availability of monthly average temperature data in real time. Second, several national and other initiatives (coordinated by National Meteorological Services, NMSs) have dramatically improved the quantity and quality of monthly mean temperature data available. Some countries have extensively homogenized significant parts of their entire national holdings, releasing the results for all to use. Both these developments should improve the coverage of available data.
 Despite these improvements to the quantity and quality of data available, it is not expected that major changes will occur in the hemispheric-average series, as at these scales the existing averages are highly robust. The principal reason for expecting only small changes is that time series of the many thousands of station records are not statistically independent of each other. The number of statistically independent locations (at time scales above annual) over the Earth's surface has been estimated by several authors to be about 100 or less (see discussion by Jones et al. ). The improvements to data quality and quantity in the present study, though, should impact individual grid-box series and analyses of spatial patterns.
 The paper is organized in the following way. Section 2 extensively discusses the sources of additional data used in CRUTEM4 and the challenges of merging, replacing, and updating the existing station-based records. Section 3 discusses the gridding technique used to develop the improved grid-box data sets. Section 4 presents extensive comparisons of the new analyses with those already available, illustrating the improvements in coverage. Section 5 concludes.
 The station data sources incorporated into previous versions of the CRUTEM database have been extensively discussed by Jones et al. [1985, 1986], Jones , Jones and Moberg , and Brohan et al. . The station data used in the CRUTEM3 data set had assigned codes to each station, giving the principal source for each series (see above references). These have been augmented here, and a full list of source codes is given in Table 1. Although the ultimate sources of all the station data are the NMSs, much of these data have made their way to users via a number of World Meteorological Organization (WMO) and Global Climatological Observation System (GCOS) initiatives, as well as NMS Web sites and scientific publications. We have replaced station data in CRUTEM3 with improved data from NMSs for stations with the same locations as these were deemed to be of better quality. In some cases, the improvement could simply have been a more complete series with fewer missing monthly values.
Table 1. Source Codes and Number of Stations From Each
 The next sections introduce much of this additional material, but only the major source codes in Table 1 are discussed. Apart from NMS source material there are three additional sources that incorporate station data across the world's land areas: CLIMAT (WMO coordinated transmission of many meteorological parameters including monthly average temperatures), Monthly Climatic Data for the World (MCDW), and the decadal World Weather Records (WWR) volumes (from the 1950s onward up to the 1990s). CLIMAT and MCDW are sources that are available in real time and near-real time, respectively, and contain data for approximately 2000–2500 stations, though the number of stations available varies from month to month, particularly so for some developing countries. MCDW is available slightly later (3–4 months) than CLIMAT and tends to contain the same stations (though with fewer missing values), but considerably more for the contiguous United States. We do not use all the station data that report in CLIMAT and MCDW, but restrict ourselves to stations that have enough data to calculate 1961–1990 averages (see section 3.1).
 The WWR volumes are released every 10 years after the completion of each decade. WWR is an important source of data for South America, Africa, Asia, and many island groups. The availability of WWR data only every decade is part of the reason why the coverage of data in near-real time appears to reduce since the last decade of WWR was released for the 1990s. Part of this reduction is due to incomplete availability rather than to the nonexistence of data and should not be interpreted as evidence that the network of stations across the world's land area is reducing. WWR sources can additionally be important in other parts of the world for infilling missing monthly values that occasionally occur in CLIMAT and MCDW sources.
 The numbers of stations from each source are included in Table 1. Although each station is allocated a source code, most station series do not come from a single source [see also Jones and Moberg, 2003; Brohan et al., 2006]. Real-time monthly updating has to be based on CLIMAT and MCDW data, and most NMSs do not fully assess the quality of these data in real time. The CLIMAT and MCDW data are quality controlled by Meteorological Office staff. Within a few years we would expect to replace the recent data for some series with data from direct NMS sources or from the 2001–2010 WWR volume when it becomes available. Further details about updating are given in section 2.10. A station series is, therefore, often based on a combination of multiple sources: the source code given in Table 1 for each station indicates only the dominant source code. The ordering of the updating affects (to some extent) the exact number of sites added from each source.
 Another potential, extensive source of additional data is daily and hourly Synoptic Reports (SYNOP). SYNOP data also include many other weather variables and are one of the principal sources of input data for operational weather forecasts. We have never used data from the SYNOP source in our earlier versions of CRUTEM and continue to exclude it from the new database. There are a number of reasons for this. First, SYNOP data are operational in nature, so are not always extensively quality controlled by NMSs. Second, their coverage tends to be denser in regions where we already have many series. Finally, monthly averages derived from SYNOP data are often found to be biased compared with CLIMAT and MCDW data for several reasons (see discussion by E. Van den Besselaar et al. (Synoptic messages to extend climate data record, submitted to Journal of Geophysical Research, 2012)). These reasons include the facts that there are incomplete numbers of days in each month and the daily maximum and minimum temperatures are not necessarily the true values in midlatitude and high-latitude regions of the world. Additionally, many countries do not calculate monthly averages from monthly mean maximum and minimum temperature averages, so potential biases will be introduced into series updated with SYNOP data.
2.1. United States
 Previous versions of CRUTEM incorporated more stations for this region than any other land area. Our earlier work used almost all the station series available from CLIMAT and MCDW. The only series we excluded were those that we had deemed to have noncorrectable inhomogeneities, which was documented by Jones et al. [1985, 1986]. For CRUTEM2 [Jones and Moberg, 2003], this was supplemented by an additional 1023 series for the contiguous United States, but these all ended in 1996. We never sought to update these data for CRUTEM3, as the number reporting from CLIMAT and WWR for this region was already denser than any other region of the world (see discussion by Jones and Moberg ). With CRUTEM4 we have replaced the 1023 series with 892 series from the current U.S. Historical Climatic Network (USHCN, which contains 1218 stations for the contiguous United States; see code 44 in Table 1) described by Menne et al. . The version we have used includes adjustments for time of observation bias and site relocations (see details provided by Menne et al. ). As many of the additional USHCN series (i.e., the 1218 minus the 892) report through CLIMAT or MCDW, we have replaced our original series for these locations with USHCN data. With both additions we had to ensure that no data series appeared twice. Additionally, the earliest year in all the USHCN series is 1895, so in order not to lose any useful 19th-century data from the series we replaced, we compared USHCN series with those from the replaced set during the 1895–1900 period and kept any pre-1895 data where there was no step jump in 1895. Of the 892 USHCN stations incorporated into CRUTEM4, 525 stations had additional years added before 1895. The USHCN data we use will be periodically updated from the above source. Later we will show that the contiguous United States has only a negligible impact on average Northern Hemisphere (NH) temperatures by removing all station data from this region.
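The splice test described above (compare the old and USHCN series over 1895–1900 and keep pre-1895 data only where there is no step jump) can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the text does not say how a step jump was judged, so the function name `splice_pre_1895`, the use of annual means, and the `max_offset` tolerance are all hypothetical.

```python
from statistics import mean


def splice_pre_1895(old, ushcn, overlap=range(1895, 1901), max_offset=0.5):
    """Merge an older series with its USHCN replacement.

    old, ushcn: dicts mapping year -> annual mean temperature (deg C).
    max_offset: hypothetical tolerance (deg C) for the mean difference
                over the overlap; the source does not give a value.
    """
    merged = dict(ushcn)
    common = [y for y in overlap if y in old and y in ushcn]
    if not common:
        return merged  # nothing to compare, keep USHCN data only
    offset = mean(old[y] - ushcn[y] for y in common)
    if abs(offset) <= max_offset:
        # No apparent step at 1895: retain the earlier years.
        merged.update({y: t for y, t in old.items() if y < 1895})
    return merged


old = {1890: 10.0, 1891: 10.2, 1895: 10.1, 1896: 10.3}
ushcn = {1895: 10.0, 1896: 10.2, 1897: 10.4}
print(sorted(splice_pre_1895(old, ushcn)))
```

A series whose overlap differs by more than the tolerance would simply start in 1895, as the USHCN series does.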
2.2. Russian Federation
 Monthly temperature time series for 475 stations were obtained from the All Russian Research Institute of Hydrometeorological Information (RIHMI)-World Data Center (WDC) (see code 43 in Table 1). We compared these data series with those we already held and identified three groups of stations: those in common to both data sets (131), those only in the CRUTEM database, and those only in the RIHMI-WDC data set (344). The latter group were incorporated into CRUTEM4, and those stations unique to CRU were retained. For the 131 stations in common, comparison revealed differences for some of the series. The differences were of two kinds: (1) systematic offsets between the data series (consistently differing for different months of the year) very suggestive of homogeneity adjustments having been applied to RIHMI-WDC data, and (2) apparently random differences. We are confident that the systematic offsets were applied to the data obtained from RIHMI-WDC rather than to our CRUTEM data, since the latter come from earlier World Weather Records (WWR) sources, and we applied few adjustments to former Soviet Union (fUSSR) data in the 1980s (see details provided by Jones et al. [1985, 1986]).
 The apparently random differences were also assessed, and while the Russian source mostly seemed to be a more reliable value (compared with neighboring stations), this was not always the case. We contacted the Russian NMS and sought to find any documentation about the systematic and random differences. We were not successful in finding any information for the systematic differences, but received considerable help with the random ones. At the end of the exercise, the number of sites in CRUTEM4 was increased by 344 (i.e., the number in the third category above). For some other sites, the majority of the series came from this source, so these are also classified as source 43 (see Table 1).
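The three-way station comparison and the distinction between systematic and apparently random differences can be illustrated with a short sketch. It assumes stations are matched by an identifier and that monthly values are keyed by (year, month); both function names are hypothetical. A nonzero mean difference with a small spread for a given calendar month would suggest a systematic offset, while a large spread would suggest random differences.

```python
from statistics import mean, pstdev


def partition_stations(cru_ids, rihmi_ids):
    """Split station identifiers into the three groups used in the text."""
    cru, rihmi = set(cru_ids), set(rihmi_ids)
    return {
        "common": cru & rihmi,
        "cru_only": cru - rihmi,
        "rihmi_only": rihmi - cru,
    }


def monthly_offsets(series_a, series_b):
    """Per-calendar-month (mean difference, spread) between two series.

    series_a, series_b: dicts mapping (year, month) -> temperature.
    """
    diffs = {}
    for key in series_a.keys() & series_b.keys():
        diffs.setdefault(key[1], []).append(series_a[key] - series_b[key])
    return {m: (mean(d), pstdev(d)) for m, d in diffs.items()}


groups = partition_stations(["A", "B"], ["B", "C"])
print(groups["common"], groups["cru_only"], groups["rihmi_only"])
```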
2.3. Former Soviet Union
 For countries entirely within the former Soviet Union (fUSSR), we updated existing series using daily data for 223 locations, also downloaded from RIHMI-WDC (code 51 in Table 1). For series already in the CRUTEM database, we downloaded data from 1990 onward wherever they offered useful updates, recalculating monthly averages from the daily data in the archive. Most of these series are within Russia, but some are for other fUSSR countries.
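Recalculating monthly averages from daily data might look like the sketch below. The text does not state how incomplete months were handled, so the `max_missing_days` threshold and the function name `monthly_means` are assumptions made purely for illustration.

```python
from calendar import monthrange
from collections import defaultdict
from datetime import date


def monthly_means(daily, max_missing_days=5):
    """Average daily mean temperatures into monthly means.

    daily: dict mapping datetime.date -> daily mean temperature, or None
           for a missing day.
    max_missing_days: hypothetical completeness rule; months with more
           missing days than this are returned as None.
    """
    buckets = defaultdict(list)
    for day, temp in daily.items():
        if temp is not None:
            buckets[(day.year, day.month)].append(temp)
    result = {}
    for (year, month), temps in buckets.items():
        missing = monthrange(year, month)[1] - len(temps)
        if missing <= max_missing_days:
            result[(year, month)] = sum(temps) / len(temps)
        else:
            result[(year, month)] = None
    return result


january = {date(1995, 1, d): 2.0 for d in range(1, 32)}
print(monthly_means(january))
```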
 Additionally, for central Asian countries within the fUSSR, we added in additional data from the National Snow and Ice Data Center (NSIDC) in Boulder, Colorado, choosing only stations for which we already had some temperature data [Williams and Konovalov, 2008] (see code 50 in Table 1). The records for 61 series within Kazakhstan, Kyrgyzstan, Tajikistan, Turkmenistan, and Uzbekistan were extended and/or improved (fewer missing values).
2.4. Canada, Australia, and New Zealand
 In both of our previous versions, CRUTEM2 [Jones and Moberg, 2003] and CRUTEM3 [Brohan et al., 2006], we incorporated Canadian station temperature series that had been tested for homogeneity and adjusted for discontinuities due to site relocations and changes in observing procedures [Vincent and Gullett, 1999; Vincent et al., 2002]. The convention followed by CRU in the 1980s [e.g., Jones et al., 1985] was that all necessary homogeneity adjustments were applied to the earlier part of a station time series so that ongoing updates could be appended to the modern end of the series without the need for them to be adjusted. The adjustments applied by Vincent and Gullett [1999] and Vincent et al. [2002] have not followed this convention. Some minor further adjustments have been applied to the data since its last update in 2008 (see code 42 in Table 1) to address the change in observing time at airport stations in the eastern regions of the country (discussed briefly by Brohan et al. [2006]). We therefore apply these adjustments to real-time CLIMAT updates for sites in this region before appending them to the modern end of a series, so that they are homogeneous with the past data.
 The station data we are using for Australia and New Zealand were discussed briefly by Brohan et al. . Source details (Web sites or literature references) for these and other groups in this section are given in Table 1 (codes 40 and 41). For CRUTEM4, we downloaded these homogenized data again and checked against what we had, incorporating all the changes made in Australia and New Zealand.
2.5. Arctic
Bekryaev et al.  analyzed recent trends in Arctic temperatures. In order to improve coverage across the Eurasian and North American parts of the Arctic, they have gained access to more series (with respect to the series already in CRUTEM3) from the region. This data set was compared with the CRUTEM database (after the inclusion of the additional Russian and Canadian data discussed earlier). From this source, 125 stations were new to CRUTEM (coming mainly from Alaska, Canada, and Russia; Greenland is considered separately later). Additionally, many of the other records extended some series and/or made some series more complete, so were added where there was good overlap agreement. It may seem somewhat surprising that there are more data than analyzed or available from the Russian and Canadian NMSs. This just illustrates that personal contact has the potential to elicit additional sources beyond those that an NMS makes available over WMO systems such as CLIMAT or via its Web site. Also, many NMSs often have sites classified as being first- or second-order stations or being climatological or agriclimatological stations, so some series may not be available in near-real time to the NMS. Such sites may be considered as not available to be transmitted over CLIMAT, so are not made available for other sorts of international exchanges.
2.6. Greater Alpine Region
 This is a network of 141 stations for the Greater Alpine Region (GAR) developed during a number of projects led by the Austrian Meteorological Service. Many of the data series extend back to the 18th century and cover Austria, Switzerland, Slovenia, Croatia, and parts of France, Italy, Germany, the Czech Republic, Slovakia, and Hungary. These data have been extensively assessed for long-term homogeneity [see Auer et al., 2001] and have additionally been adjusted for biases in the period before the introduction of “screened” thermometer housing [Böhm et al., 2010]. The issue of the introduction of thermometer screens is discussed further in section 4. Temperature data for 107 stations were added. The additional 34 are either precipitation-only measuring stations or their data were of insufficient length for inclusion. The HISTALP source (code 49 in Table 1) does not include the Swiss stations. These were added from a different source (code 52 in Table 1).
2.7. Greenland, Faroes, and Denmark
 Long mean temperature station series for Denmark (five), Faroes (one), and Greenland (seven) were updated or added using data from recently completed Danish Meteorological Institute (DMI) reports [Cappeln, 2010, 2011; Vinther et al., 2006]. The two DMI reports are given separate source codes (47 and 48 in Table 1).
2.8. WWR Decadal Volumes
 When the 1991–2000 WWR decade was received (∼2006), we were able to infill significant numbers of missing values in our CRUTEM3 monthly series. The 1961–1990 volumes had been assessed for additional series in the course of the development of CRUTEM2 and CRUTEM3 [Jones and Moberg, 2003; Brohan et al., 2006], but the additional data from the 1991–2000 volumes were not always included. During the development of the CRUTEM4 database, it was realized that some of the series we had that came from Global Historical Climatology Network version 2 (GHCNv2) [see Jones and Moberg, 2003] did not always include data from earlier WWR decades (for 1961–1970, 1971–1980, and 1981–1990).
 GHCNv2 kept all sources of data separately for each station by the use of version numbers. The problem of deciding which might be the best source for a given year has been partly resolved within GHCNv3 (http://www.ncdc.noaa.gov/ghcnm/v3.php) as a single series has been developed for each location. We say partly, as there has to be some automation in any decision, and without manually checking each it is unlikely to be the best source in every case. As we noticed that some of our station series did not include the WWR data (which we almost always deemed to be of better quality than that received over CLIMAT and/or MCDW), we checked the data series we had for the three decades (1961–1990) against the WWR data. For a few stations we added in the WWR source (code 37 in Table 1) mainly for series from South America, Africa, southern and eastern Asia, and parts of Europe, and for many island groups around the world.
2.9. How Many CRU Homogenized Series Remain in the CRUTEM4 Database?
 Inhomogeneities may be introduced into a station series by a variety of effects, such as changes in instrument location, local environment, exposure, or recording practices (issues discussed by Trewin ). An early major effort by CRU in the 1980s identified, and attempted to correct where possible, significant inhomogeneities by inspection of data series and, particularly, by comparison with multiple neighbors. The results were fully documented by Jones et al. [1985, 1986]. One conclusion from this exercise was that the large-scale hemispheric and global series were little affected by the application of the adjustments to remove inhomogeneities, partly because positive and negative adjustments tend to cancel each other out [Brohan et al., 2006, Figure 4]. The adjustments did make improvements to the temperature series for individual grid boxes, but no further inhomogeneity adjustments were applied by CRU following those reported in the 1980s. Instead, we recommended [e.g., Jones and Moberg, 2003] that homogeneity assessments and the development of adjusted series be undertaken by NMSs, because they would in most cases have access to additional metadata and additional measurement series that would allow more accurate results to be achieved. WMO has a number of documents detailing the need for homogeneity adjustments [Aguilar et al., 2003; World Meteorological Organization (WMO), 2011].
 Following on from our recommendation, we have replaced some of our data series (including some that we had adjusted in the mid-1980s) with the results of a number of international or national NMS-led homogeneity projects (Table 1). The number of CRU-adjusted series in the mid-1980s was 312. With all the additions for this analysis, there are now 219 in CRUTEM4. This reduction has come about for the following reasons: 68 series have been replaced with newer series, 15 did not have 1961–1990 normals and so are not used, and 10 have been removed. This does not mean that there are fewer adjusted series within the database, just that the adjustments have been undertaken by NMSs. The incorporation of the USHCN data set and the replacement of the colocated series means we already had reduced the number of contiguous U.S. station data in CRUTEM4 that were adjusted by CRU in the 1980s.
 For the 219 series that were identified by CRU as in need of adjustment (and that have not subsequently been replaced by alternative data series), we revisited the neighbor comparisons reported in the mid-1980s [Jones et al., 1985, 1986]. These comparisons showed that the adjustments for stations in the Southern Hemisphere (SH) outside of Africa reported by Jones et al.  had not actually been applied to the station data used in CRUTEM3. These adjustments have now been applied to the station data used in CRUTEM4. For Southern Hemisphere stations in Africa, the comparisons showed that the adjustments had been correctly applied, though the period of adjustment had been reported incorrectly by Jones et al. . The adjustment was correctly applied to the data prior to the inhomogeneity, though it was reported that the data after the inhomogeneity were adjusted. The availability of the station series is discussed in a later section and allows further inspection of these neighbor comparisons and a comparison between the CRUTEM3 and CRUTEM4 station data.
2.10. Updating the Series
 In the course of all the above work, it became apparent that a number of the WMO station identifiers had been changed by the NMSs. Using the latest list of these identifiers (http://www.wmo.int/pages/prog/www/ois/volume-a/vola-home.htm), some of the CRUTEM3 identifiers were changed to the updated numbers. Changes to identifiers seem to be made by some NMSs to indicate that the station is no longer a manned location but has been replaced by an automatic weather station (AWS), but this is not always the case. The WMO list of station identifiers (referred to as Volume A) is updated at the beginning of every year. It is possible to monitor this and to also flag up any “new” WMO identifiers that appear in the monthly CLIMAT updates at the end of each year. Updating is much easier using current WMO station identifiers.
 CRUTEM4 will continue to be updated in near-real time from CLIMAT and MCDW sources. These sources provide data for far fewer than the total number of series now in CRUTEM4, which is over 5500 (including the additional 892 from USHCN). Updating will, therefore, lead to a significant drop in stations beyond 2010 (between them the two sources have a maximum of about 3000 stations) if only these sources are used. All the series discussed above should be updated on Web sites, but with different schedules. We intend to periodically check the Web sites and update the series every 2 years for those that are not updated in a more routine fashion. For Australia and Greenland it is likely these can be incorporated at the same stage as MCDW (i.e., 3–4 months behind the real-time update using CLIMAT). USHCN data will be updated at the end of each year. For the other regions or countries discussed, updating should be possible annually or every 2–3 years. There are GCOS initiatives to request more countries to release many more of their national series over the CLIMAT system. Several European countries (e.g., Germany and Spain) have already begun to do this. In terms of the global average, it would make the most difference if Russia and Canada also did this, as their areas are large.
2.11. Availability of the Station Series
 Given the importance of the CRUTEM land temperature analysis for monitoring climate change [e.g., Trenberth et al., 2007], our preference is that the underlying station data, and software to produce the gridded data, be made openly available. This will enhance transparency and also allow more rapid identification of possible errors or improvements that might be necessary (see, e.g., the earlier discussion of homogeneity adjustments in the SH).
 Nevertheless, we are reliant on obtaining some data from NMSs and must be careful not to jeopardize our continued access to these data. Apart from data obtained from public sources, some data in our database were obtained without a clear indication of our freedom to make them openly available or perhaps with informal agreements not to do so. In November 2009, the UK Met Office wrote on our behalf to all NMSs to determine if we could release the versions of their monthly temperature series that we held. Of the approximately 180 letters sent, 62 received positive replies, 5 received negative replies, and the remainder went unanswered. Some of the positive replies imposed conditions of two kinds: (1) a request to point users to the NMS Web site, where they might gain access to more or improved station series, or (2) permission to release some but not all of the series.
 Not content to withhold data for those countries for which we had either no reply or a negative reply from their NMS, we have compared station locations and data with those available in GHCNv3. Where the locations and most of the data agreed, we deemed that we could release these data because they were already available through GHCNv3. WMO Resolution 40 (http://www.wmo.int/pages/about/Resolution40_en.html) requires that all monthly mean temperature data “necessary to provide a good representation of climate” should be freely available, though the extent to which this is enforced in cases in which NMSs do not make this data available is unclear. Furthermore, this is an agreement signed by the NMSs and WMO and not with other third parties. Data from the WMO's Regional Baseline Climatological Network (RBCN) should be freely available, however they have been obtained. Additionally, data from CLIMAT, MCDW, and WWR are freely available, just in different formats.
 As a result of these efforts, we are able to make the station data for all the series in the CRUTEM4 network (except for 18 series from Poland) freely available, together with software to produce the gridded data (http://www.cru.uea.ac.uk/cru/data/temperature/ and http://www.metoffice.gov.uk/hadobs/). Note that in many cases these station data have been adjusted for homogeneity by NMSs; in order to gain access to the original raw data (i.e., as measured data or daily and subdaily measurements), it will be necessary to contact each NMS.
3. Transformation of the Station Data to a Regular Grid
 All analyses of large-scale temperatures require strategies to reduce the biases in, e.g., hemispheric averages and principal component patterns that would arise from uneven station density (i.e., biased to regions where station density is high) or from temporal variations in data coverage (e.g., a reduction in data from regions with cooler average temperatures). These strategies typically include the representation of temperature anomalies on a regular grid [Peterson et al., 1998]: The most widely used method is termed the climate anomaly method (CAM) [e.g., Jones, 1994], with the other two being the reference station method (RSM) [Hansen et al., 2010] and the first-difference method (FDM) [Peterson et al., 1998].
 Direct comparisons of the three approaches with the same basic data were discussed by Peterson et al.  and Vose et al. . Possible differences between the techniques and advantages and disadvantages of each are also discussed by Jones et al. . In this study we use the CAM approach, which requires reducing all the station temperature data to anomalies, from a common period such as 1961–1990 on a monthly basis. Grid-box anomaly values were then produced by simple unweighted averaging of the individual station anomaly values within each grid box.
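The two CAM steps just described, reduction to anomalies and simple unweighted averaging within a grid box, can be illustrated with a minimal sketch. The data layout (values keyed by (year, month), normals keyed by calendar month) and the function names are assumptions for illustration only.

```python
def to_anomalies(series, normals):
    """Subtract each calendar month's 1961-1990 normal from the data.

    series:  dict mapping (year, month) -> temperature (deg C).
    normals: dict mapping month -> 1961-1990 mean for that month.
    Months without a normal are dropped.
    """
    return {
        (year, month): temp - normals[month]
        for (year, month), temp in series.items()
        if month in normals
    }


def grid_box_anomaly(station_anomalies):
    """Unweighted mean of the available station anomalies in one box."""
    values = [a for a in station_anomalies if a is not None]
    return sum(values) / len(values) if values else None


anoms = to_anomalies({(2000, 1): 3.0, (2000, 7): 18.5}, {1: 2.0, 7: 18.0})
print(anoms)                                # {(2000, 1): 1.0, (2000, 7): 0.5}
print(grid_box_anomaly([1.0, None, 2.0]))   # 1.5
```

Working with anomalies rather than absolute temperatures is what allows stations at different elevations within a box to be averaged without weighting.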
 The main disadvantage of the CAM is that stations must have enough years with data within the 1961–1990 period in order to be used. For some stations with incomplete data for 1961–1990 it will be possible to use published 1961–1990 normals [WMO, 1996], although care is required when doing this.
3.1. Development of 1961–1990 Normals and Outlier Checks
 Monthly averages for 1961–1990 (the latest WMO normal period) were calculated from the enhanced station data set, accepting an average if at least 14 years of data were available. For stations where this was not possible, WMO [1996] normals were used, if available, for all months. For a further set of stations, 1961–1990 normals were estimated using the 1951–1970 period and adjusted by the difference between the grid-box averages for 1961–1990 and 1951–1970 from the earlier gridded data (see discussion by Jones and Moberg [2003]). Altogether, 1961–1990 normals were developed for 4842 stations, of which 4625 were calculated directly, 151 from WMO [1996], and 66 using 1951–1970 averages. Temperature data for the remaining 741 stations without 1961–1990 normals were not used in the subsequent gridding. In terms of station years (for which a year with at least nine valid months counts as a year) over the 1850–2010 period, the amount of data not used totals only 4% of the overall station-year total.
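The first and third routes to a normal can be sketched as below. The 14-year minimum comes from the text; the sign of the grid-box adjustment (station 1951–1970 mean plus the grid-box change from 1951–1970 to 1961–1990) is our reading of the description, and the function names are hypothetical.

```python
def normal_1961_1990(series, month, min_years=14):
    """Direct normal for one calendar month, or None if too few years.

    series: dict mapping (year, month) -> temperature (deg C).
    """
    values = [t for (y, m), t in series.items()
              if m == month and 1961 <= y <= 1990]
    return sum(values) / len(values) if len(values) >= min_years else None


def estimate_normal_from_1951_1970(station_mean_5170,
                                   grid_mean_6190, grid_mean_5170):
    """Estimate a 1961-1990 normal from a station's 1951-1970 mean.

    The station mean is shifted by the difference between the grid-box
    averages for the two periods (assumed sign convention).
    """
    return station_mean_5170 + (grid_mean_6190 - grid_mean_5170)


januaries = {(year, 1): 1.0 for year in range(1961, 1975)}  # 14 years
print(normal_1961_1990(januaries, 1))                       # 1.0
print(estimate_normal_from_1951_1970(10.0, 0.5, 0.25))      # 10.25
```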
 The choice of 1961–1990 rather than a later 30 year period (e.g., 1971–2000 or 1981–2010) ensures that as much data as possible are used. There would be a much greater amount of unused data if a more recent 30 year period were used. The period 1961–1990 also ensures consistency with earlier analyses. Differences in base periods can also confuse users, especially the media [see Arguez and Vose, 2011].
Section 2 has extensively discussed the sources of the additional temperature data. Although many of the sources have undergone detailed homogeneity testing, there is still the possibility of outliers, which might induce a longer-lived influence if they occur during the 1961–1990 period. To assess outliers, we have also calculated monthly standard deviations for all stations with at least 15 years of data during the 1941–1990 period. If a station does not have standard deviation values, then the station is not used in the subsequent gridding. This removes an additional 59 series, but all of these are relatively short in duration. All outliers in excess of 5 standard deviations from the 1961–1990 mean were compared with neighbors and corrected or set to the missing code. After this step, the 1961–1990 normals and the 1941–1990 standard deviations were recalculated. In the subsequent gridding (section 3.2), outliers in excess of 5 standard deviations are omitted. As there are no outliers for 1961–1990 values, this step applies only to years before 1961 and after 1990.
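The screening step can be illustrated as follows. The 15-year minimum, the 1941–1990 window, and the 5 standard deviation limit are taken from the text; the use of the sample standard deviation and the function names are assumptions. In the study, flagged values were first compared with neighbors and either corrected or set to missing; the sketch only flags them.

```python
from statistics import stdev


def monthly_sd(series, month, min_years=15):
    """Standard deviation for one calendar month over 1941-1990.

    Returns None when fewer than min_years values are available, in
    which case the station would not be used in the gridding.
    """
    values = [t for (y, m), t in series.items()
              if m == month and 1941 <= y <= 1990]
    return stdev(values) if len(values) >= min_years else None


def screen_outliers(series, normals, sds, n_sd=5.0):
    """Separate values within n_sd standard deviations of the normal.

    Returns (kept, flagged): kept is a dict of retained values and
    flagged is a list of (year, month) keys exceeding the limit.
    """
    kept, flagged = {}, []
    for (year, month), temp in series.items():
        if abs(temp - normals[month]) > n_sd * sds[month]:
            flagged.append((year, month))
        else:
            kept[(year, month)] = temp
    return kept, flagged


kept, flagged = screen_outliers({(1930, 1): 0.0, (1931, 1): 12.0},
                                normals={1: 1.0}, sds={1: 2.0})
print(kept, flagged)
```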
3.2. Gridding and Number of Stations Used Through Time
 Each of the 4842 stations with normals was first associated with its 5° × 5° latitude-longitude grid box, and grid-box anomaly values were calculated by simple averaging of all available station anomaly values within each grid box for all months 1850–2010. All station outliers in excess of 5 standard deviations were omitted from the analysis. Apart from retaining the grid-box temperature values, we also retain the number of stations per grid box. This latter value is needed to calculate a “variance-adjusted” version of the gridded data set following the approach outlined by Jones et al. , Jones and Moberg , and Brohan et al. . The approach adjusts the variance of each grid-box time series to be compatible with the infinitely sampled grid box [see Jones and Moberg, 2003; Brohan et al., 2006]. This version of the data set is referred to as CRUTEM4v, with the unadjusted version referred to as CRUTEM4. CRUTEM4v reduces the impact on each grid-box time series of changing station availability through time. CRUTEM4v is recommended for use for small regions and individual grid-box time series, especially if users wish to consider changes in variance and/or extremes (at the monthly time scale [see, e.g., Jones et al., 1999, Figure 5]). At the hemispheric scales, which are discussed in the next section, there is very little difference between averages calculated with CRUTEM4 and CRUTEM4v. Brohan et al.  additionally discuss the appropriate usage of these two versions of the data set. In the analyses in the next section, we use CRUTEM4 to calculate all the hemispheric and any regional series used.
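The simple-averaging step for a single month can be sketched as below. The box indexing convention (rows counted from 90°S, columns from 180°W) and the function names are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def grid_box_index(lat, lon, size=5.0):
    """Map a station position to a (row, col) grid-box index.

    Row 0 starts at 90S and column 0 at 180W (an assumed convention).
    A station exactly at 90N is placed in the northernmost row.
    """
    n_rows = int(180 / size)
    n_cols = int(360 / size)
    row = min(int(math.floor((lat + 90.0) / size)), n_rows - 1)
    col = int(math.floor((lon + 180.0) / size)) % n_cols
    return row, col


def grid_anomalies(stations):
    """Average station anomalies within each grid box for one month.

    stations: iterable of (lat, lon, anomaly); anomaly is None when missing.
    Returns {box: (mean_anomaly, n_stations)} so that the station count is
    retained alongside the grid-box value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, anom in stations:
        if anom is None:
            continue
        box = grid_box_index(lat, lon)
        sums[box] += anom
        counts[box] += 1
    return {box: (sums[box] / counts[box], counts[box]) for box in counts}
```

Keeping the per-box station count in the output mirrors the point made above that this number is needed later for the variance adjustment.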
 Before moving to the next section, we first explain changes in the number of stations available through time. Figure 1 illustrates both the number of stations used each year and the percentage of area coverage this produces for each hemisphere. The results are compared with the earlier analysis by Brohan et al.  using CRUTEM3. The improvement in the station numbers is more dramatic for the NH compared with that of the SH. The big increase in 1895 represents the starting date for many of the stations in the contiguous United States, while there is a similar increase in station numbers in 1951 when the first of the 10 year WWR volumes became available. Numbers of stations reduce from a peak in the 1960s, occurring in a series of steps at the end of each decade, which is indicative of the cause being changes in station availability in the WWR volumes. For the SH, there are few improvements in coverage, the main ones being due to improved use of the 1971–1980 WWR volumes and the inclusion of more data after 2000.
 In terms of percentage of area coverage, the improvements have had a smaller effect than in terms of station numbers, with the increase being greater in the NH compared with that in the SH. The small step changes at the end of each decade (1980, 1990, and 2000) are due to the WWR volumes. There are generally enough contiguous U.S. data for 2010 from the CLIMAT and MCDW sources, so the unavailability of USHCN data in 2010 does not affect the area coverage for the NH, and the drop in the final year for the NH is principally due to missing Russian data. There is little change in area coverage for the SH in CRUTEM4 compared with that of CRUTEM3.
4. Analysis of the Enhanced Gridded Land Data
4.1. Hemispheric-Scale Averages and Comparisons With CRUTEM3
 Hemispheric-average time series were produced using cosine weighting of grid-box values in each hemisphere [Jones, 1994]. Averages were calculated for each month from January 1850, and then seasonal and annual averages were calculated using the hemispheric-average monthly values. Standard 3 month climatological seasons were used, with December of the previous year counting toward the winter value for the current year. As December 1849 is not available, all the seasonal and annual series for the Northern Hemisphere (NH) begin in 1851. For the Southern Hemisphere (SH), the first year is taken as 1856. Before this date, there are fewer than 5 stations with data. Beginning with 1856, the number of available stations in the SH increases to 5 series, reaching 10 by 1860 (see Figure 1). In later figures (Figures 5b and 9b) we will highlight that uncertainty ranges of SH averages, calculated from so few stations, are substantial.
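A minimal sketch of the area weighting and of the winter-season convention follows. Evaluating the cosine at the grid-box center latitude is our assumption; the details of the scheme are given by Jones [1994]. Function names and data layouts are hypothetical.

```python
import math


def hemispheric_mean(box_values):
    """Cosine-weighted mean of grid-box anomalies for one month.

    box_values: iterable of (center_latitude_deg, anomaly); boxes without
    data are simply omitted. Returns None if no box has data.
    """
    num = 0.0
    den = 0.0
    for lat, anom in box_values:
        weight = math.cos(math.radians(lat))
        num += weight * anom
        den += weight
    return num / den if den > 0 else None


def winter_mean(monthly, year):
    """DJF mean for `year`, using December of the previous year.

    monthly: dict mapping (year, month) -> hemispheric monthly anomaly.
    Returns None if any of the three months is unavailable.
    """
    keys = [(year - 1, 12), (year, 1), (year, 2)]
    if any(k not in monthly for k in keys):
        return None
    return sum(monthly[k] for k in keys) / 3.0
```

With monthly values starting in January 1850, `winter_mean(monthly, 1850)` returns `None` because December 1849 is absent, which is why the seasonal series start in 1851.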
Figure 2 shows the seasonal and annual values for the NH and SH and an annual series for the global land together with 10 year Gaussian smoothed series. For comparison, the smoothed CRUTEM3 series are also shown to indicate the impact of the additional or replaced series in the station database. The global land series is computed by weighting the two hemispheres approximately in proportion to the areas of their landmasses (i.e., global = [(2/3)NH + (1/3)SH]). This weighting is different from the equal hemispheric weighting applied by Brohan et al.  for CRUTEM3. The new weighting has also been applied here to CRUTEM3.
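The effect of the change in weighting can be seen in a two-line sketch. The function names and the input anomalies are illustrative only and are not values from the data set.

```python
def global_land_mean(nh, sh):
    """Global land anomaly with hemispheres weighted 2/3 (NH) and 1/3 (SH)."""
    return (2.0 / 3.0) * nh + (1.0 / 3.0) * sh


def global_land_mean_equal(nh, sh):
    """Equal hemispheric weighting, shown only for comparison."""
    return 0.5 * (nh + sh)


# Illustrative inputs: NH anomaly 0.9 deg C, SH anomaly 0.3 deg C.
weighted = global_land_mean(0.9, 0.3)
equal = global_land_mean_equal(0.9, 0.3)
```

Whenever the NH anomaly exceeds the SH anomaly, the 2/3:1/3 weighting gives a higher global land value than equal weighting.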
 The differences between the two sets of smoothed lines indicate excellent agreement from 1880 up to 2000 for the NH. At this decadal time scale, all the additions have made no discernible differences between the analyses, an initial indicator that for hemispheric-scale averages the analysis is very robust. CRUTEM4 is very slightly warmer since 2000 for the NH for the year and all seasons except summer. The likely reason for this is the additional data in the Arctic (particularly Russia), and this is further investigated in the next section. Prior to 1880, CRUTEM4 is slightly cooler than CRUTEM3, more so in winter (December, January, and February (DJF)) and spring (March, April, and May (MAM)) than in the other seasons. Again, later analyses will be suggestive that this results from the additional Russian series. For the SH, differences between CRUTEM4 and CRUTEM3 are slightly greater earlier in the series and extend up to the early 20th century, particularly in the austral winter (June, July, and August (JJA)). CRUTEM4 is cooler than CRUTEM3 during 1861–1910, the exact period depending on the season. CRUTEM4 has been very slightly warmer than CRUTEM3 since about 2005. Possible reasons for the differences in the 19th century in the SH are investigated in the next section. Uncertainty ranges, calculated using the same approach as that used by Brohan et al.  are shown in later figures (on the decadal scale in Figure 5 and on the interannual time scale in Figure 9).
 For the NH, year-to-year variability is greatest during winter and least in summer. The slightly greater variability prior to 1880 in all seasons (except summer) is more likely to be due to sparser coverage than a real feature. This greater variability is marginally reduced by adjusting the individual grid-box time series for changing station data contributions (introduced by Jones et al.  and the data set produced here called CRUTEM4v), but the variance of regional averages has not been similarly adjusted for reduced grid-box availability. For the SH, year-to-year variability is more similar between the seasons.
 All seasons and the annual series for both hemispheres show comparable century-scale warming from the beginning of the 20th century, but there are differences in timing between them. Warming is significant in all seasons and annually for 1861–2010, 1901–2010, and 1979–2010 (except for May and December for the SH for 1979–2010). Table 2 provides the warming explained by a least squares linear fit to the monthly series for these three periods. Warming in all three periods tends to be greater in the NH compared with that in the SH, and the NH warming has a much more marked seasonal character than that for the SH. Table 2 also includes calendar year average values for CRUTEM3. CRUTEM4 warms more than CRUTEM3 for all three periods because of the cooler values before about 1880 (particularly in the SH) and slightly warmer values in the NH since about 2000.
Table 2. Total Temperature Change (°C) for CRUTEM4 Described by Linear Least Squares Regression Lines Fitted Over Three Periods: 1861–2010, 1901–2010, and 1979–2010. Comparative annual values for CRUTEM3 are shown at the bottom.
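The total change described by a least squares line, as tabulated in Table 2, can be computed as sketched below. Taking the total change as the fitted slope multiplied by the span between the first and last year is our assumption about the convention; the function name is hypothetical.

```python
def total_change(years, values):
    """Total change described by an ordinary least squares line.

    years:  list of years, in increasing order.
    values: list of temperature anomalies (deg C), same length.
    Returns slope * (last year - first year), an assumed convention.
    """
    n = len(years)
    x_mean = sum(years) / n
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in years)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope * (years[-1] - years[0])
```

For a synthetic series rising by exactly 0.01°C per year over 1901–2010, the function returns 1.09°C under this convention.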
 The marked seasonality of the warming for 1861 to ∼1900 (estimated by comparing the NH trend differences in Table 2 for 1861–2010 and 1901–2010) may be artificial because of the possible impacts of direct sunlight on the instruments prior to the development of Stevenson-type screens in higher northern latitudes during summer (see earlier discussion in relation to the HISTALP data set [Böhm et al., 2010]). The addition of the newly adjusted series in the GAR may be the reason for the slight difference between CRUTEM3 and CRUTEM4 before 1860, when coverage was sparse outside Europe. Böhm et al.  and Brunet et al.  suggested that this issue is much wider in scale across the midlatitudes and high latitudes of the NH. Alternatively, if this seasonal contrast is real, then it implies a marked change in continentality (greater winter-summer temperature differences) over part of the NH prior to 1880. Further work is required, but the studies reported above are clearly suggestive of screen exposures being the more likely cause.
 In this section we compare spatial patterns between CRUTEM4 and the earlier CRUTEM3 data set. In Figure 3, we plot the annual temperature anomaly for the decade 2001–2010, with respect to our base period of 1961–1990, for both analyses and their difference. This difference clearly illustrates the improvement (i.e., outlined in black in Figure 3c) in coverage in CRUTEM4 compared with CRUTEM3, particularly across the higher latitudes of Eurasia and North America. As this expansion of spatial coverage in the Northern Hemisphere has contributed to warmer temperatures in CRUTEM4, the 2001–2010 decade is warmer than in CRUTEM3 for the NH (0.80°C compared with 0.73°C). There is much less coverage change across the Southern Hemisphere, and the two corresponding averages are 0.43°C for CRUTEM4 and 0.40°C for CRUTEM3. Figure 3c is mostly green, but differences do occur, particularly over the contiguous United States and Australia, where we have made many changes to the station data used (see discussion in section 2).
 In Figure 4, we show linear trend maps for annual temperature averages for 1951–2010 for both analyses and the difference. Figure 4b for CRUTEM4 shows the improvements in coverage, which can also be seen in Figure 4c by the grid boxes outlined in black. Of the grid boxes in common between the two analyses, 499 boxes differ within ±0.2°C in their total trends over the 60 year period, with 86 boxes indicating that the CRUTEM4 trend was >0.2°C more than CRUTEM3 and 41 with CRUTEM3 having >0.2°C more warming than CRUTEM4.
4.2. Assessment of the Robustness of Hemispheric Averages Omitting Large Numbers of Stations
 In the previous section, we illustrated the robustness of the large-scale averages by comparing this new version of the data set (CRUTEM4) with the previous (CRUTEM3). Differences are relatively minor and well within the error ranges estimated by the earlier study by Brohan et al.  and recalculated here. In this section we expand on this by using considerably less station data while still producing essentially the same hemispheric series at the decadal time scale. We do this, first, by using mutually exclusive subsets of the overall station data and, second, by omitting all the station data from some large countries.
4.2.1. Using Only a Subset of the Station Data
 For this exercise we took the 5583 stations and separated them into five subsets, each containing a unique 20% of the data. The ordering of the stations in the station file uses the World Meteorological Organization (WMO) numbering system, with the exception of the 892 USHCN stations, which have been placed at the end. The first subset contained stations ordered 1, 6, 11, 16,…, etc., in the list. The second contained stations ordered 2, 7, 12, 17,…, etc., with the fifth set containing the stations ordered 5, 10, 15, 20.… In this separation into five subsets, no account was taken of whether the station had sufficient data for the 1961–1990 reference period. Therefore, after removal of those station records with insufficient data during the 1961–1990 reference period, the size of each subset may differ slightly. It will also differ back in time, since record length is also not considered when forming the subsets. For each subset the 20% of the data were gridded using the same method as described in section 3.2 and hemispheric seasonal and annual averages calculated as stated in section 4.1. Figure 5 shows the hemispheric averages from the five networks, by season and year, together with those of the complete CRUTEM4 network (i.e., 100%). Differences among the five networks are barely noticeable after the 19th century for the NH. For the SH there are larger differences, but for both hemispheres they are well within the error ranges calculated by the approach of Brohan et al. . For the 19th century, differences are marked only in the Southern Hemisphere, where coverage is poorer than in other parts of the world.
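The interleaved separation into five subsets corresponds to taking every fifth entry of the ordered station list, as in the following sketch (the function name is ours, and integers stand in for station records).

```python
def interleaved_subsets(stations, k=5):
    """Split an ordered station list into k mutually exclusive subsets.

    Subset 1 takes positions 1, 6, 11, ...; subset 2 takes 2, 7, 12, ...;
    and so on, matching the ordering described in the text.
    """
    return [stations[i::k] for i in range(k)]


# Positions 1..20 stand in for the ordered station file.
subsets = interleaved_subsets(list(range(1, 21)))
```

Because the split ignores data availability, the number of usable stations in each subset after the normals check can differ slightly, as noted above.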
 The results shown in Figure 5 are not unexpected. A similar assessment of this kind was undertaken by Parker et al.  using two networks of offset and nonadjacent 5° × 5° latitude-longitude grid boxes. The differences in the 19th century in the SH for Parker et al.  were larger, but these were due to an even smaller set of stations (and hence grid boxes) being used. The simple reason that a small network of well-located sites can closely reproduce the series derived from a much greater station network is due to there being a limited number of independent spatial degrees of freedom (see Jones et al. , where this concept was explored in considerable detail). That paper concluded that hemispheric and global average temperatures (at annual time scales and above) could be reliably estimated (i.e., within the error ranges shown in Figure 5) by as few as 50–100 sites. Reliably here means within the error range estimated by Brohan et al. . The greater differences during the 19th century, especially for the SH, arise because the station network was so limited then that separating it into five subsets resulted in each subset having insufficient stations to obtain a reliable SH temperature estimate. This point is discussed more in section 4.3.
 There are a number of obvious asides that can be made once the concept is realized. For example, if resources became available for digitization of early temperature data, then these would be best targeted at the data-sparse regions, particularly in the Southern Hemisphere and the tropics. These issues are discussed further by Jones and Wigley .
4.2.2. Omitting Large Countries
 Another possible concern is that the CRUTEM4 station database might be unduly dominated by data from particular countries or regions. Gridding the data overcomes this to a large extent, but the robustness of the CRUTEM4 data to this issue can additionally be assessed by considering the effect of removing series from different countries of the world. In the first part of this exercise, we took the 5583 stations and separately removed all stations in the contiguous United States and Australia. Figure 6a shows the NH seasonal and annual averages based on all stations compared with averages omitting sites from the contiguous United States. The effect here is only noticeable in the 19th century and then mostly only in winter (DJF) and spring (MAM). In these seasons, and to some extent in the annual mean, omitting the contiguous U.S. data lowers the earliest temperature estimates, implying that the mean U.S. temperature anomalies are slightly warmer than the mean for the rest of the NH. Figure 6b shows similar plots omitting all Australian stations. This is a much more severe test than in Figure 6a, as Australia is a much larger component of the SH landmass than the contiguous United States is of the NH. Removing Australian stations has a larger effect, particularly prior to 1900, but as with Figure 5, if error ranges were plotted these would easily encompass the differences seen. The sign of the difference arising from the removal of Australian temperatures varies between seasons and with time, indicating no systematic difference with the mean of the rest of the SH. In the annual mean, removing Australian data warms the SH mean around 1860 and in the 1940s, but cools it during the 1880s.
 Although both Australia and the contiguous United States are very large areas, we now go a stage further and omit two larger regions: first, Russia, and, second, the former Soviet Union (fUSSR). The results are shown in Figures 7 and 8. As expected, the effects of removing the fUSSR are slightly more apparent than when removing just Russia, though the periods of the differences tend to be similar (as Russia was a large component in terms of area of the fUSSR). Removal of either tends to make the NH slightly warmer in the 19th century, particularly in the winter (DJF) and spring (MAM) seasons. As we have added large numbers of extra stations in both Russia and the Arctic (particularly the Russian Arctic), this is probably the principal reason for the slightly cooler NH temperatures during the 19th century and to a lesser extent the slightly warmer temperatures in the last 10 years in CRUTEM4 compared with those of CRUTEM3. The similarity of the seasonal differences between Figure 2 and Figures 7 and 8 is very suggestive of this being the most likely cause. Additional data in other parts of the world (principally Europe in the 19th century) are also probably factors.
 The negligible effects of omitting large regions (and consequently large numbers of stations) are a direct result of the remaining stations still being adequate for monitoring hemispheric averages by sampling the most important spatial degrees of freedom, across the world's land areas.
 There are also issues with the exposure of early instrumental data prior to about 1910 over parts of Australia [Nicholls et al., 1996]. It is important that resources be found to objectively estimate the necessary adjustments, so that pre-1910 data can be used with more confidence. Biases that are due to different exposures of early thermometers are also important in Europe, particularly for the period before 1870 [Böhm et al., 2010]. Issues with the different exposure properties (from pre-louvered-screen locations) are only beginning to be incorporated into global temperature databases. Traditional approaches to station homogenization are unable to detect the problem, as all sites within a region are likely to be similarly affected (see discussion by Jones and Wigley ). In this study, we have included 107 series from the GAR that have been adjusted to attempt to compensate for changes in exposure, but it is apparent that stations in other midlatitude and high-latitude regions probably need adjustment during the summer months (typically to cool the earliest temperature estimates relative to the modern data). For the NH, the effect principally occurs for the period before about 1880, so the regions of the world where additional assessment is needed are Europe, Russia, and Iceland-Greenland. Canada and Alaska are also likely to be affected, but there are few stations beginning before 1880. Assessment will be difficult as all series are likely to be similarly affected. Rebuilding the screens from the 19th century [e.g., Brunet et al., 2011] and taking parallel measurements is a possible avenue to follow.
4.3. Comparison of Annual Hemispheric Series With the Results of Analyses by Other Groups
 In this section the two hemispheric land-only averages are compared with two other analyses: the series developed by the National Climatic Data Center (NCDC) [Smith et al., 2008] and the Goddard Institute for Space Studies (GISS) [Hansen et al., 2010]. Our present study uses a base period of 1961–1990, while NCDC currently uses 1901–2000 and GISS uses 1951–1980 for their published series. For direct comparison we have adjusted both series to our 1961–1990 base period on a monthly basis. Figure 9 shows hemispheric seasonal and annual series from CRUTEM4, additionally plotting decadally smoothed series for the two U.S. analyses. For both the NH and SH, CRUTEM4 tends to more closely follow NCDC than it does GISS, even though all three show similar amounts of long-term warming since 1880. The reason why CRUTEM4 more closely follows NCDC has been discussed before [Vose et al., 2005] and relates to these two analyses using the same 5° × 5° latitude-longitude grid boxes compared with the 40 equal-area boxes used per hemisphere by GISS. Correlations between CRUTEM4 and NCDC/GISS are 0.984/0.980 for the NH and 0.950/0.927 for the SH (for the 1880 to 2010 period) and support the findings of Vose et al. . Differences among the three analyses are greater in the SH compared with those of the NH, particularly before about 1920. Differences are not sustained right back to the start of records, however, as the lines move closer together again in the 1880s. The uncertainty ranges for the SH are larger than those of the NH because of more missing boxes (particularly over the Antarctic) and fewer stations per grid box over Africa and South America than the northern continents.
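Adjusting a series to a common base period on a monthly basis amounts to subtracting, for each calendar month, that month's mean over the new base period. The following sketch shows one way to do this; the function name and data layout are hypothetical.

```python
def rebase(series, base_start=1961, base_end=1990):
    """Re-express monthly anomalies relative to a new base period.

    series: dict mapping (year, month) -> anomaly relative to any base.
    For each calendar month, the mean over base_start..base_end is
    subtracted from every value of that month. Months with no data in
    the base period are left out of the result.
    """
    rebased = {}
    for month in range(1, 13):
        base = [v for (y, m), v in series.items()
                if m == month and base_start <= y <= base_end]
        if not base:
            continue
        offset = sum(base) / len(base)
        for (y, m), v in series.items():
            if m == month:
                rebased[(y, m)] = v - offset
    return rebased
```

Applying the offset month by month, rather than as a single annual constant, removes any differences in the seasonal cycle between the original base periods.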
4.4. Comparisons With ERA-Interim and ERA-40 Reanalyses
 In this section we compare CRUTEM4 at the hemispheric resolution with similar land averages calculated from two versions of the European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analyses (ERA-40 and ERA-Interim). ERA-40 covers the period 1958–2001, and ERA-Interim (which uses four-dimensional (4D) variational assimilation compared with the three-dimensional (3D) schemes in ERA-40) covers the period from 1979 to 2010. For a discussion of the ECMWF Re-Analyses see the works by Simmons et al. [2004, 2010] and Uppala et al. . A common period for both reanalyses is 1981–2000, so we reduce their absolute land temperature values to anomalies from this base period. Figure 10a shows seasonal and annual comparisons between the two reanalyses and CRUTEM4. As with the earlier plots, we show seasonal and annual values of CRUTEM4 (from the 1961–1990 base period) with the two ECMWF Re-Analyses as smoothed series using a 10 year Gaussian smoother. For the NH, both ERA-40 and ERA-Interim track one another very well over their period of overlap (1979–2001) and are offset from CRUTEM4 by an amount that relates to the difference between the 1961–1990 and 1981–2000 periods. In Figure 11 we compare ERA-Interim with CRUTEM4 for the Northern Hemisphere on the monthly time scale from 1979. For this plot, the base period of 1979–2010 is used for both series. The agreement between the two series is excellent. ERA-Interim warms slightly more than CRUTEM4 over this period, which is probably due to greater warming in the Arctic land grid boxes in ERA-Interim that are missing in CRUTEM4.
 For the SH in Figure 10b, there are marked differences between the two reanalyses during their overlap period. ERA-Interim is closer to CRUTEM4, but the similarity of the smooth curves is markedly less good, particularly in the austral autumn (MAM) and winter (JJA). ERA-40 is further offset from CRUTEM4 before about 1980 in all seasons except austral summer, and this is due to a cold bias in the climate model used by both reanalyses over the Antarctic [Uppala et al., 2005]. To illustrate this further, we have calculated averages for both the SH 0°–60°S and for the Antarctic (60°S–90°S) for all three series (Figure 12). For ERA-Interim, the time series agreement (for the SH 0°–60°S) is almost as good as the NH land, but for ERA-40, there is a significant divergence before the early 1970s, with warmer ERA-40 temperatures in all seasons. This difference was commented on by Simmons et al.  and was shown to be due to ERA-40 being given little input data for Australia prior to the early 1970s. With little input data to correct model biases, the reanalyses tend to the model simulation, which for Australia is a model that is biased warm (see further discussions by Simmons et al.  and Uppala et al. ). For the Antarctic, the cold bias in the climate model used by ERA-40 is clearly evident in all seasons, although it is smaller in the austral summer (DJF). Figure 13 repeats Figure 11 but for the SH 0°–60°S, showing good agreement between CRUTEM4 and ERA-Interim, but this is less good than the NH for the 12 month Gaussian smoothed lines.
 In this paper we have detailed the developments to the CRUTEM4 data set available from the Climatic Research Unit. The improvements to the quality of the grid-box data set have been made possible by better availability of the basic station data. The homogeneity of the station data has been improved by investments of effort by a number of research groups and particularly by a number of NMSs around the world. We undertook much homogeneity work in the 1980s, but recommended at that time that this work be best undertaken by NMSs. This is beginning to come to fruition, and we hope that more can find the resources to complete this task. In the 1980s, we adjusted 312 station series (then about 10% of the overall total of stations). Replacement of many of these series by improvements from NMSs means that there are only 219 stations (4.6% of the new total of stations with normals) that we adjusted almost 30 years ago. The major bias issue that still affects the data set relates to the exposure of the thermometers before louvered screens were introduced between 1870 and 1880. Three studies [Böhm et al., 2010; Brunet et al., 2011; Nicholls et al., 1996] have considered the problem (summer temperatures are probably biased warm by up to 0.5°C) and provided adjusted data in the case of the Greater Alpine Region, which we have used. We urge more studies of these kinds to be undertaken using the parallel measurement approach developed by Brunet et al. .
 Differences in the hemispheric averages produced by the new version (CRUTEM4) compared with those of the earlier version (CRUTEM3) are relatively small and well within the error ranges developed using the techniques described by Brohan et al. . This result is not unexpected and confirms a number of other studies by the groups producing these data sets. To illustrate this robustness further, we carried out two sets of analyses, focusing on the hemispheric-scale averages that result. First, we separated the station data into five independent samples, each comprising 20% of the basic station series. Second, we separately omitted all the station series from large countries (contiguous United States, Australia, Russia, and the former Soviet Union). For both sets there were differences between the analyses, but they were barely visible on time series plots after 1900 for the Northern Hemisphere (NH) and after about 1920 for the Southern Hemisphere (SH), so effects are only for periods for which coverage becomes markedly sparse. Even then, differences were well within the range of the error estimates we have developed in an earlier study [Brohan et al., 2006].
 Finally, we compared the hemispheric averages with estimates derived from reanalysis products (ERA-40 and ERA-Interim) developed by the European Centre for Medium-Range Weather Forecasts. ERA-40 covers the period 1958–2001, and ERA-Interim covers 1979–2010. For the NH, the agreement between the two reanalyses and CRUTEM4 was excellent. For the SH, agreement was considerably poorer, but if the SH was restricted to 0°–60°S then it was markedly improved. Problems with reanalyses over the Antarctic are well known, though ERA-Interim is a considerable improvement over ERA-40 for the Antarctic region.
 The authors thank all the scientists and NMSs who have collected and made available the monthly mean temperature data referred to in Table 1. The authors thank David Parker and Adrian Simmons for comments on an earlier draft. This work has been supported by the U.S. DOE (grant DE-SC0005689), JISC (ACRID project), and by the University of East Anglia. CM was supported by the Joint DECC/Defra Met Office Hadley Centre Climate Programme (GA01101).