1. Atmospheric conditions strongly influence ecological systems, and tools that simplify access to and processing of atmospheric data can greatly facilitate ecological research.
2. We have developed RNCEP, a package of functions in the open-source R language, to access, organise and visualise freely available atmospheric data from two long-term high-quality data sets with global coverage.
3. These functions retrieve data, via the Internet, for either a desired spatiotemporal extent or interpolated to a point in space and time. The package also contains functions to temporally aggregate data, producing user-defined variables, and to visualise these data on a map.
4. By making access to atmospheric data and integration with biological data easier and more flexible, we hope to facilitate and encourage the exploration of relationships between biological systems and atmospheric conditions.
Clearly, this intimate relationship between short- and long-term atmospheric conditions and biological systems demonstrates the need to incorporate atmospheric data in ecological studies. A large amount of atmospheric data, stored in diverse formats, can be accessed through various Internet portals (Shamoun-Baranes, Bouten, & van Loon 2010); however, not all data are freely available. The National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) Reanalysis data set (Kalnay et al. 1996), hereafter called R-1, and the NCEP/Department of Energy (DOE) Reanalysis II data set (Kanamitsu et al. 2002), hereafter called R-2, are high-quality, well-documented, freely available data sets with global coverage of numerous atmospheric variables. In recent years, these NCEP data sets have been increasingly used in ecological research, including studies of phenology (e.g. Chmielewski & Rötzer 2002; Cook, Smith, & Mann 2005), land cover and atmospheric interaction (e.g. Lawton et al. 2001; Marengo et al. 2008), and bird migration (e.g. Kemp et al. 2010; Shamoun-Baranes, Bouten, & van Loon 2010 and references therein). NCEP data can be accessed via ftp, where users can download a full year of data; via a web service, where users can select a temporally and spatially continuous subset of data; or directly over the Internet using the Open-source Project for a Network Data Access Protocol (OPeNDAP; Sgouros 2004). In all cases, NCEP data are stored in the netCDF data format (a well-documented binary format for storing array-oriented scientific data; Rew et al. 2010), and the user needs specially developed software tools to extract the data.
To facilitate the extraction, organisation, aggregation and visualisation of NCEP R-1 and R-2 data, we have developed the RNCEP package of functions in the R language for statistical computing and graphics (R Development Core Team 2010). R is a freely available open-source computing environment that is highly extendable and can be run on multiple platforms. R is extensively used in ecological research with tools tailored to the ecological community (e.g. Calenge 2006; Kneib & Petzoldt 2007). This article describes the functionality of the RNCEP package.
The NCEP data sets
The NCEP/NCAR R-1 and NCEP/DOE R-2 are freely available state-of-the-art gridded reanalysis data sets with global coverage of many relevant atmospheric variables spanning 1948 to present and 1979 to present, respectively. Data for many variables are available at 17 pressure levels ranging from 1000 to 10 mb. Other variables describe conditions either at or near the surface. These data have a spatial resolution of 2.5 × 2.5° and a temporal resolution of 6 h (00, 06, 12, 18 h UTC). Still other variables in these data sets are given on a global T62 Gaussian grid with 192 equally spaced longitudinal and 94 variably spaced latitudinal grid points, also in 6-h intervals.
We have designed several functions that can be used to facilitate common tasks when working with atmospheric data. These tasks are described in more detail in the following text and presented in schematic workflows (Figs. 1 and 2).
Utilising R’s extensive facilities, one can perform GIS and statistical operations on any retrieved atmospheric data from within R, or export the data in a variety of formats, ranging from R’s internal format to formats that can be imported into third-party GIS software (see the RNCEP help files).
R functions to retrieve and organise reanalysis data from a specified spatiotemporal extent
Task 1 – Gather NCEP data
The NCEP.gather() function retrieves data from the R-1 or R-2 data set, utilising the OPeNDAP method of data access, for a given spatiotemporal range. Users must specify (i) the variable of interest, (ii) the level (i.e. pressure level, surface or T62 Gaussian grid) from which the variable is to be obtained, (iii) the desired spatial range, (iv) the desired month and year ranges and (v) whether data should come from the R-1 or R-2 data set. With these parameters specified, NCEP.gather() returns to the R environment a three-dimensional array containing the specified variable over the desired spatiotemporal range. For example, the user can extract air temperature at the 850-mb pressure level from the R-1 data set for October, 1997–2000, 50°N–60°N, 5°W–10°E (Fig. 1).
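To give a feel for the size of such a request, the following Python sketch enumerates the 2.5° grid points and 6-hourly time steps covered by the spatial range in the example. It is only an illustration and not RNCEP code: the helper names `grid_axis` and `six_hourly` are hypothetical, the latitude × longitude × time ordering is an assumption, and the time axis is reduced to a single October to keep the sketch short.

```python
from datetime import datetime, timedelta

STEP_DEG = 2.5  # grid spacing of the pressure-level and surface data


def grid_axis(lower, upper, step=STEP_DEG):
    """Grid coordinates from lower to upper (both assumed to lie on the grid)."""
    n = int(round((upper - lower) / step)) + 1
    return [lower + i * step for i in range(n)]


def six_hourly(year, month):
    """All 00, 06, 12 and 18 UTC time steps in one calendar month."""
    t = datetime(year, month, 1)
    steps = []
    while t.month == month:
        steps.append(t)
        t += timedelta(hours=6)
    return steps


lats = grid_axis(50.0, 60.0)   # 50N to 60N
lons = grid_axis(-5.0, 10.0)   # 5W to 10E, with west taken as negative
times = six_hourly(1997, 10)   # a single October, chosen for illustration

# Assumed ordering of the array: latitude x longitude x time
shape = (len(lats), len(lons), len(times))
print(shape)
```

Each additional year of October data adds the same number of time steps, so the array grows linearly with the requested year range.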
Task 2 – Restrict NCEP data
The structure of the NCEP data sets makes it difficult to download an interrupted time series of data per year. For example, one cannot easily obtain data for only 1200 UTC every day for a specific month. The function NCEP.restrict() can be used to remove unwanted temporal intervals from the data imported into R in Task 1 using NCEP.gather(). Using NCEP.restrict(), one can remove data for a specified year, month, day, hour or any combination of the four. For the example in Fig. 1, this function is used to restrict the temperature data to 1800 UTC and 1–15 October.
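The logic of this restriction can be sketched in a few lines of Python. This is an illustrative stand-in rather than the package's implementation: the function `restrict` and its arguments are hypothetical, and it selects the time steps to keep, whereas NCEP.restrict() is described above in terms of the intervals to remove.

```python
from datetime import datetime, timedelta


def restrict(times, keep_hours=None, keep_days=None):
    """Return the time steps whose hour and day of month are in the kept sets.

    A filter left as None places no constraint on that component.
    """
    kept = []
    for t in times:
        if keep_hours is not None and t.hour not in keep_hours:
            continue
        if keep_days is not None and t.day not in keep_days:
            continue
        kept.append(t)
    return kept


# Six-hourly time steps for one October (31 days x 4 steps per day)
start = datetime(1997, 10, 1)
october = [start + timedelta(hours=6 * i) for i in range(31 * 4)]

# Keep only 1800 UTC on 1-15 October, as in the Fig. 1 example
subset = restrict(october, keep_hours={18}, keep_days=set(range(1, 16)))
```

In a gridded data set, the same selection would be applied along the time dimension of the array, leaving the spatial dimensions untouched.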
Task 3 – Aggregate NCEP data
Once an array of atmospheric data has been obtained, and the time series restricted, derived variables may be calculated by temporally aggregating or summarising the array, for example by calculating a mean, percentage of occurrence or an accumulation. For instance, one could calculate mean maximum temperature to explain malaria epidemics (Githeko & Ndegwa 2001), the frequency of tailwind assistance for migrating birds to explain the timing of spring arrival (Sinelschikova et al. 2007), or seasonal temperature accumulation (i.e. degree days) to explain variability in plant phenology (Wang 1960). The NCEP.aggregate() function is applied to perform these temporal aggregations. It summarises data at each grid point and returns a new array with the same spatial dimensions as the input array. The user specifies the function to apply (either a built-in R function such as ‘mean’, ‘max’ or ‘sum’ or a function created by the user) and whether or not to aggregate each temporal component: year, month, day and hour. While data returned by NCEP.gather() and NCEP.restrict() are technically weather data as they describe atmospheric conditions at relatively short temporal intervals, NCEP.aggregate() can be used to derive climate data by averaging the atmospheric data over a sufficiently long period. In the example in Fig. 1, mean temperature at 1800 UTC from 1 to 15 October 1997–2000 is calculated.
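The idea of aggregating per grid point over selected temporal components can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the package's code: the function `aggregate`, its argument names and the toy temperature values are all hypothetical. In this sketch, a component set to True is retained as a grouping key and a component set to False is collapsed.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate(times, grids, fxn=mean, years=True, months=True,
              days=False, hours=False):
    """Apply fxn at each grid cell to all time steps sharing the kept components.

    times: list of datetime objects.
    grids: list of 2-D lists (latitude x longitude), one per time step.
    Returns a dict mapping (year, month, day, hour) keys to aggregated grids;
    collapsed components appear as None in the key.
    """
    groups = defaultdict(list)
    for t, grid in zip(times, grids):
        key = (t.year if years else None, t.month if months else None,
               t.day if days else None, t.hour if hours else None)
        groups[key].append(grid)
    result = {}
    for key, members in groups.items():
        n_lat, n_lon = len(members[0]), len(members[0][0])
        # The output grid has the same spatial dimensions as the input grids.
        result[key] = [[fxn([g[i][j] for g in members]) for j in range(n_lon)]
                       for i in range(n_lat)]
    return result


# Toy data: a 1 x 2 grid at 1800 UTC on two days in each of two Octobers
times = [datetime(1997, 10, 1, 18), datetime(1997, 10, 2, 18),
         datetime(1998, 10, 1, 18), datetime(1998, 10, 2, 18)]
grids = [[[280.0, 281.0]], [[282.0, 283.0]],
         [[284.0, 285.0]], [[286.0, 287.0]]]

per_year = aggregate(times, grids)                     # one mean grid per October
overall = aggregate(times, grids, years=False)         # one mean grid over all years
accumulated = aggregate(times, grids, fxn=sum, years=False)
```

Passing a different function, such as `max` or a user-written function, changes the derived variable without changing the grouping logic.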
R functions to interpolate reanalysis data to specified points in space and time
The function NCEP.interp() interpolates variables from the R-1 or R-2 data set to specified locations in space and time (see Fig. 2 for a workflow of this procedure). The user must specify the atmospheric variable, level (again, pressure level, surface or T62 Gaussian grid) from which the variable should be obtained, and spatial and temporal location to which the variable should be interpolated. Further optional arguments include parameters to control interpolation. NCEP.interp() will accept vectors as arguments and can, therefore, easily be applied to all of the points in an ecological data set with a single command. Thus, users could calculate the temperature and wind conditions at each location along the entire migratory route of an individually tracked animal (e.g. Shamoun-Baranes et al. 2003; Shamoun-Baranes, Bouten, & van Loon 2010 and references therein).
NCEP.interp() queries the NCEP database, utilising OPeNDAP via the Internet, to obtain data from the eight grid points surrounding the desired location in space and time (four in space at each of the two bracketing time steps). If the method of interpolation is given as ‘linear’, the function performs trilinear interpolation in latitude, longitude and time. Alternatively, if the method of interpolation is ‘IDW’, the function performs inverse distance weighting (Shepard 1968) in space followed by linear interpolation in time. The user can turn off interpolation in space or time or both, in which case NCEP.interp() performs ‘nearest-neighbour’ interpolation returning the value of the closest grid point in space or time or both, respectively. Spatial interpolation is always performed assuming a spherical grid rather than a planar surface. To indicate the precision of an interpolated result, NCEP.interp() calculates the standard deviation of the values used to perform the interpolation. Thus, precision is described in the same units as the interpolated output, with smaller values indicating less variability among the predictor points. Some variables in these data sets (e.g. cloud cover) should not be temporally interpolated as they describe conditions over an interval of time rather than at a specific point. For these variables, NCEP.interp() will not perform interpolation in time and instead automatically retrieves data describing the interval within which the specified date-time falls.
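The two interpolation schemes can be illustrated with a short Python sketch. It is a simplified illustration, not the package's implementation. All function names are hypothetical. The trilinear weights are taken as simple fractional positions between grid points. The great-circle distance uses the haversine formula with an assumed Earth radius. The spread is computed as a sample standard deviation, which is an assumption about the exact definition.

```python
import math
from statistics import stdev


def lerp(a, b, w):
    """Linear interpolation between a and b at fractional position w (0..1)."""
    return a + (b - a) * w


def trilinear(values, wx, wy, wt):
    """Interpolate within a 2 x 2 x 2 block indexed values[time][lat][lon].

    wx, wy and wt are the fractional positions in longitude, latitude and time.
    """
    planes = []
    for t in (0, 1):
        south = lerp(values[t][0][0], values[t][0][1], wx)
        north = lerp(values[t][1][0], values[t][1][1], wx)
        planes.append(lerp(south, north, wy))
    return lerp(planes[0], planes[1], wt)


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance on a sphere (radius is an assumed mean value)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def idw(points, lat, lon, power=1.0):
    """Inverse distance weighting over (lat, lon, value) tuples."""
    num = den = 0.0
    for plat, plon, value in points:
        d = great_circle_km(lat, lon, plat, plon)
        if d == 0.0:
            return value  # target coincides with a grid point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def idw_then_time(points_t0, points_t1, lat, lon, wt, power=1.0):
    """IDW in space at both bracketing time steps, then linear in time."""
    return lerp(idw(points_t0, lat, lon, power), idw(points_t1, lat, lon, power), wt)


# Toy block of eight values: two time steps, each with a 2 x 2 spatial square
cube = [[[10.0, 12.0], [14.0, 16.0]],
        [[20.0, 22.0], [24.0, 26.0]]]
centre = trilinear(cube, 0.5, 0.5, 0.5)

# Spread of the eight predictor values, in the units of the variable
spread = stdev(v for plane in cube for row in plane for v in row)
```

Because the weights in `idw` depend on great-circle distances, the four surrounding grid points do not contribute equally even when the target lies midway between them in degrees; the points nearer the pole are closer together and therefore closer to the target.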
R functions to visualise reanalysis data
Regardless of whether data are obtained over a spatiotemporal extent or interpolated to a point in space and time, it is often desirable to visualise these data on a map.
NCEP.vis.area() produces a contour plot of a single date-time (or single aggregated date-time) from a data-array (Fig. 1). Specifying only the input data-array and date-time to visualise, the user can quickly obtain a representative map. The map’s spatial dimensions, for instance, are automatically set according to the spatial range of the input data-array. The user can also manually configure each aspect of the map.
NCEP.vis.points() produces a map indicating the value of a variable interpolated to point locations as obtained using NCEP.interp() (Fig. 2). The colour of each point indicates the interpolated value. The user only needs to specify the location of each point and the value of the variable at that point to produce a representative map, yet all aspects of the map are configurable.
RNCEP is intended to streamline access to and organisation of atmospheric data from two freely available high-quality gridded data sets with global coverage. Although other tools exist to extract and visualise these data (e.g. GrADS, Doty, Holt, & Fiorino 1995; IDV, Murray et al. 2003), the RNCEP package integrates those functions that are particularly useful for an ecologist: data download, format conversion, subsetting, aggregation, interpolation and visualisation. The package enables rapid adjustment of the spatial and temporal ranges obtained, translates logical user input into the necessary OPeNDAP commands and returns data in an easily interpretable, unpacked format. Further, RNCEP does not require that any data be stored locally, nor must any particular database connection be maintained. With only an Internet connection, data may be obtained on demand, explored, manipulated, integrated into R statistical analyses and either saved in various formats or discarded. In the future, other gridded data sets (e.g. the NCEP Global Ocean Data Assimilation System) that are available online in a stable format may be accessed using the approach of the RNCEP package. In this way, many functions in the RNCEP package may be applied directly to these new data sets.
Ecology is inherently a multidisciplinary field at the interface between various Earth systems. As such, tools to integrate data from the different systems, enabling efficient and versatile research, are an important aspect of an e-science environment (Hey & Trefethen 2005). We hope RNCEP will facilitate and encourage the incorporation of atmospheric data into ecological research and promote the exploration of relationships between biological systems and atmospheric conditions.
To cite RNCEP or acknowledge its use, cite this article as follows, substituting the version of the package that you used for ‘R package version 1.0.1’:
Kemp, M. U., van Loon, E. E., Shamoun-Baranes, J., & Bouten, W. (2011). RNCEP: global weather and climate data at your fingertips. Methods in Ecology and Evolution, 2. doi: 10.1111/j.2041-210X.2011.00138.x (R package version 1.0.1).
NCEP R-1 and NCEP R-2 data were provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their website at http://www.esrl.noaa.gov/psd. R-2 data were produced with the support of the U.S. National Weather Service and U.S. Department of Energy. Please acknowledge the use of NCEP data in any documents or publications. Our studies are facilitated by the BiG Grid infrastructure for e-Science (http://www.biggrid.nl), Flysafe2 and the Dutch National Authority for Data concerning Nature (GaN; http://www.gegevensautoriteitnatuur.nl). The authors thank the associate editor of MEE and two anonymous reviewers for constructive comments on an earlier draft of the manuscript.