Journal of Geophysical Research: Atmospheres

Data collection and management for Global Energy and Water Cycle Experiment (GEWEX) Continental-Scale International Project (GCIP)



[1] The Global Energy and Water Cycle Experiment (GEWEX) Continental-Scale International Project (GCIP) was focused on the Mississippi River basin to take advantage of the existing meteorological and hydrological networks that were upgraded with new Doppler radars, wind profilers, and automatic weather stations together with an upgraded version of the Geostationary Operational Environmental Satellites (GOES) operated by the United States. The Mississippi River basin encompasses a wide range of climate, soil moisture conditions, vegetation types, and surface topography. The GCIP Science Plan identified the major phase of GCIP as the 5-year Enhanced Observing Period (EOP), targeted for 1995 to 2000, during which original data sets would be assembled for the GCIP database. GCIP organized these data into in situ, model output and satellite remote sensing data. Data from three mesoscale models were included with the start dates ranging from May 1995 to April 1997. A number of GCIP Initial Data Sets (GIDS) were prepared, starting in 1993, to provide the data services support during the buildup period before the 5-year EOP. The data sets were compiled for on-line access by GCIP investigators and were also published on CD-ROM. A number of special data sets, in addition to the EOP data sets, were compiled during the course of GCIP. Some of the most significant special data sets are summarized in this paper.

1. Background

[2] The recognition that water and energy budgets were not well understood on regional and global scales was a significant factor that led to the establishment of the Global Energy and Water Cycle Experiment (GEWEX) in 1987. It was further recognized that the global data were inadequate to help develop and apply new schemes for representing the hydrological cycle in climate models. For this reason the GEWEX leaders decided to concentrate on areas of the globe where the volume and quality of data should be sufficient for them to serve as test beds.

1.1. GEWEX Continental-Scale International Project (GCIP) Selection

[3] The International GEWEX Scientific Steering Group at its January 1990 session proposed a continental-scale project as a major new effort to be undertaken during the first phase of GEWEX. The primary purpose of such a continental-scale project was to develop a test for both atmospheric and hydrologic components of future climate models over a selected river basin for a minimum of five years. The river basin on which such a project would focus needed to meet the following criteria: (1) encompass a sizable number of horizontal grid squares within current generation global climate models; (2) be of a practical size to enable assembling adequately comprehensive databases using current and incipient observing technologies; (3) encompass a wide range of climate, soil moisture conditions, vegetation types, and surface topographies; and (4) have adequate ground-based observing and data systems, as well as historical records.

[4] It was decided that the Mississippi River basin could satisfy the requirements listed above. The watershed of the Mississippi River basin (3.2 × 10⁶ km²) is the third largest of the 16 rivers in the world with individual flow rates of more than 10⁴ m³ sec⁻¹ and the largest of any Northern Hemisphere river providing flow to the oceans [Baumgartner and Reichel, 1975]. Figure 1 shows the tributaries of the Mississippi River basin on the North American continent from the Appalachian Mountains in the east to the Rocky Mountains in the west; this is a region of widely varying land use, vegetation, soil conditions, groundwater characteristics, and topography. The varying landscapes provide a diverse region in which to study clouds, precipitation, runoff, and land-vegetation-atmosphere interactions.

Figure 1.

Mississippi River basin, the focus of GCIP activities.

[5] The extensive meteorological and hydrological networks covering the continental United States along with the implementation of the upgrade of the U.S. observing network with new Doppler radars, wind profilers, and automatic weather stations could provide the best opportunity for collecting the required extensive data sets.

1.2. GCIP Implementation Strategy

[6] The completion of the GCIP Science Plan [World Meteorological Organization (WMO), 1992] was the impetus to begin serious planning for the implementation of this long-term climate research project, with an emphasis on the compilation of the data sets required to carry out the research needed to achieve the GCIP science objectives. These objectives (formulated in early 1990 and slightly revised in 1995; see National Research Council (NRC) [1998]) are as follows: (1) determine the time and space variability of the hydrological and energy budgets over a continental scale; (2) develop and validate macroscale hydrological models, related high-resolution atmospheric models and coupled atmospheric/hydrological models; (3) develop and validate information retrieval schemes incorporating existing and future satellite observations combined with enhanced ground observations; and (4) provide a capability to translate the effects of a future climate change into impacts on water resources on a regional basis.

[7] The GCIP Science Plan identified the major phase of GCIP as the 5-year Enhanced Observing Period (EOP), targeted for 1995 to 2000, during which original data sets would be assembled for the GCIP database. It also stated that “GCIP will aim to provide a data management system flexible enough to accommodate all types of data needed for the study of energy flux and hydrologic processes and for the evaluation of the water and energy balance of the Earth/atmosphere system. The system will also provide the user a catalogue and browse capability, to include listing of all relevant data even if not within the on-line archive, availability on a continual basis; a capability of handling many simultaneous inquiries; and the ability to access data in other centres.”

[8] The two pivotal components of GCIP [WMO, 1992] were (1) the development of a comprehensive observational data base for the Mississippi River basin that would be readily available for GCIP analyses, and (2) the establishment of an evolving program of model development that would permit the observations to be extended spatially within GCIP or applied globally with new observations. A series of planned and ad hoc research and technical activities addressing observing systems, algorithm development, quality assurance issues, and water and energy budget studies linked these pivotal components, as shown in Figure 2.

Figure 2.

Research and technical activities linking the pivotal components of GCIP.

[9] The overall strategy for implementing GCIP [International GEWEX Project Office (IGPO), 1993] is shown in Figure 3. The database development portion is described in the sections that follow. It was an early implementation decision that model development would follow the two paths shown in Figure 3. On the “operational path” were tasks that needed to be completed prior to the start of the EOP in 1995 to enhance the output data sets to be archived for GCIP during the EOP. Other tasks included work on algorithms for deriving remotely sensed data from operational radars and satellites and also included the reanalysis of GCIP data sets using further improved operational models. On the “research path” were the longer-term modeling and analysis tasks that would contribute to achieving the GCIP science objectives; these are described in the research volume of the GCIP Implementation Plan [IGPO, 1994a].

Figure 3.

GCIP implementation framework.

2. GCIP Data Management

[10] A GCIP Data Workshop was held in Saskatoon, Canada, in May 1992 to assess the requirements for data and data services as they were either expressed or implied within the GCIP Science Plan. It was agreed that the efforts needed to compile and manage the composite data sets for the 5-year EOP could not be estimated to any degree of accuracy with the information currently available. It was recommended that a separate activity be started by GCIP as soon as possible to begin the task of designing, developing and implementing a GCIP Data Management and Service System (DMSS).

2.1. Data Collection and Management

[11] The GCIP Data Collection and Management (DACOM) Committee, formed in the summer of 1992, had the responsibility to (1) coordinate the design, implementation and operation of an efficient and effective DMSS for the GCIP; (2) advise on the GCIP Data Management Plan; (3) advise on the availability of data from existing data centers to meet the needs of GCIP; (4) inform GCIP on the status of observation systems and networks that constitute the sources of GCIP data; (5) coordinate interagency data collection and management tasks in GCIP; and (6) recommend and, where possible, facilitate improvements to observation and data management facilities for GCIP.

[12] The period from September 1992 through March 1993 was one of intensive activity for the DACOM to further define the database development as part of the implementation framework previously shown in Figure 3. This intensive activity culminated in the preparation and publication of Volume I of the GCIP Implementation Plan entitled “Data Collection and Operational Model Upgrade” [IGPO, 1993]. This volume contained information that (1) identified the sources of observations from existing and planned networks; (2) further enhanced those networks where necessary; and (3) assisted in developing data sets accumulated from existing observational systems and derived from operational model outputs.

[13] The issues of data management for GCIP were divided into strategic and tactical planning efforts. The strategic portion of the data management planning established the implementation strategies needed to achieve the following overall data management objectives: (1) During the course of the Project, the GCIP data management system would compile information on data collected in the data centers to produce composite data sets for GCIP users. (2) At the completion of the Project, the GCIP data management system would turn over the composite data sets and documentation (metadata) to a permanent archiving agency for continued use in climate-related studies.

[14] The tactical data management planning was carried out for each definable data set compiled within the Project. It was recognized that each data set could have unique features but it would be necessary to define a consistent set of information to assure accurate communications between the users of the data and the data suppliers. The tactical data management plan served as the reference document during the collection and compilation phases for each data set. This plan was updated and revised as necessary to become the documentation and final report for the specific data set.

[15] The strategic planning for data management culminated in the publication of Volume III of the GCIP Implementation Plan in March 1994, entitled “Strategic Plan for Data Management” [IGPO, 1994b]. Within this Plan the DACOM committee recommended that the GCIP DMSS be distributed among the four Data Source Modules shown in Figure 4. The organizations supporting these data source modules were asked to develop working relations with the existing data centers that could provide their specific type of data and to provide user services in coordination with those provided by the existing data centers. The DACOM committee recommended an evolutionary implementation approach for the data source modules. This recommendation was carried out during the next several years by (1) UCAR Joint Office for Science Support (JOSS) for the In Situ Data Source Module, (2) UCAR/NCAR for the Model Output Data Source Module, (3) NASA/Marshall Space Flight Center (MSFC) for the Satellite Remote Sensing Data Source Module, and (4) a Distributed Module for special data sets, with the producers of these special data sets maintaining and providing user services for these data.

Figure 4.

GCIP-DMSS user services configuration.

2.2. Buildup for the Enhanced Observing Period

[16] By the spring of 1993, the implementation planning for GCIP Research [IGPO, 1994a] had evolved to a multistage framework employing four tiers of developmental studies: (1) Continental-Scale Area (CSA), >10⁶ km²; (2) Large-Scale Area (LSA), 10⁵ to 10⁶ km²; (3) Intermediate-Scale Area (ISA), 10³ to 10⁴ km²; and (4) Small-Scale Area (SSA), <10 to 10² km².
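The four tiers can be read as a simple lookup on study-area size. The following is a minimal sketch, assuming the area bounds listed above are applied literally; the function name `classify_area` and the handling of areas that fall between tiers (returned as `None`) are illustrative assumptions, not part of the GCIP plan.

```python
from typing import Optional

# Bounded tiers in km^2, taken from the text. The SSA tier is treated here as
# "anything up to 10^2 km^2" (an assumption). CSA has no upper bound and is
# handled separately. The listed tiers are not contiguous.
BOUNDED_TIERS = [
    ("SSA", 0.0, 1e2),
    ("ISA", 1e3, 1e4),
    ("LSA", 1e5, 1e6),
]


def classify_area(area_km2: float) -> Optional[str]:
    """Return the study tier for an area, or None if it falls between tiers."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    if area_km2 > 1e6:
        return "CSA"
    for name, lower, upper in BOUNDED_TIERS:
        if lower <= area_km2 <= upper:
            return name
    return None  # e.g. 5e4 km^2 lies between the ISA and LSA ranges


if __name__ == "__main__":
    # The full Mississippi River basin (3.2e6 km^2) is a continental-scale area.
    print(classify_area(3.2e6))
```

Under these assumptions the whole basin classifies as a CSA, while a subregion of a few hundred thousand square kilometers classifies as an LSA.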

[17] The Mississippi River basin encompasses a wide range of climate, soil moisture conditions, vegetation types, and surface topography. The watershed area is large enough (3.2 × 10⁶ km²) to form several large-scale subregions on the order of 10⁶ km², where focused studies can be done on a variety of scales. Large area studies can be specialized to take advantage of the unique characteristics of each subregion while providing a continental-scale context for a number of atmospheric and hydrological focused studies.

[18] The Mississippi River basin was subdivided into the four LSAs shown in Figure 5, with specific focused studies in each (see Table 1). This progress enabled GCIP to begin a two-year buildup period, as was identified in the GCIP Science Plan [WMO, 1992]. The research and development activities during this period included initial diagnostic and evaluation studies, model validation and intercomparisons, and data systems tests prior to the EOP. A GCIP Static Data Systems Test consisted of a 3-month period from 1 February 1992 through 30 April 1992. It included data from the U.S. Weather Research Program's Fronts Experiment Systems Test (STORM-FEST), conducted from 1 February to 15 March 1992, and was augmented by hydrological, geographical, and vegetation data for the Mississippi River basin. An additional six weeks of atmospheric, hydrological and land surface data were added from existing data centers.

Figure 5.

Boundaries for the GCIP large-scale areas.

Table 1. Significant Hydroclimate Features in Each of the Four LSAs
LSA | Major River Basin | Focused Study
Southwest | Arkansas-Red | Warm season energy and water cycle processes, budgets and models
North-Central | Upper Mississippi | Cold season energy and water cycle processes, budgets and models
East | Ohio-Tennessee | Heavy precipitation and excessive runoff
Northwest | Missouri | Land cover effects over the annual cycle

2.3. GCIP Integrated Systems Test

[19] In compiling the information about the existing data sets and the modified or new types of observations to be available as a result of the upgrading of the observational networks for Volume I of the GCIP Implementation Plan [IGPO, 1993], the DACOM recognized the need for a dynamic systems test in addition to the Static Data Systems Test described above. Therefore it endorsed the idea of a GCIP Integrated Systems Test (GIST) as a GCIP pilot study to test the data collection capabilities and provide initial analyses in the pre-EOP stage prior to 1995. It further proposed that the LSA-SW, shown in Figure 6, was the best LSA in which to carry out GIST because the implementation of the modernizing project by NOAA was scheduled to be largely completed by the summer of 1994.

Figure 6.

Latitude-longitude boundaries for the GCIP Integrated Systems Test area.

[20] A Detailed Design Workshop for the LSA-SW was held in October 1993 in conjunction with the annual meeting of the GCIP Science Panel. The principal results from this Workshop included the elucidation of a research plan for the LSA-SW as part of the EOP starting in 1995 [IGPO, 1994a]. It also summarized the data needed for such research as part of the GIST and agreed on a schedule for GIST from 1 April to 31 August 1994.

[21] The purpose of compiling the GIST data set was to create a prototype data set over one of the GCIP LSAs that was similar to the future data sets collected during the GCIP EOP that began in October 1995. In particular the objective was to formalize the formats and procedures for compiling, archiving, and disseminating the GCIP EOP data sets [IGPO, 1996].

2.4. Data Management During the Enhanced Observing Period

[22] At the completion of the GCIP Research Plan [IGPO, 1994a], a workshop was held to complete the design of a GCIP DMSS to be implemented by the beginning of the EOP. The principal recommendations from this Workshop were as follows: (1) use of the World Wide Web framework as the DMSS infrastructure; (2) use of the Global Change Master Directory for the GCIP data set directory; and (3) use of two types of data set guides: (i) guides for individual data sets, with the Data Source Modules responsible for their preparation and upkeep, and (ii) tactical data management plans, which would serve as the data set guides for composites.

[23] The issue of evolving data requirements over the course of the five-year EOP was resolved by making it part of the GCIP Major Activities Plan. The initial version of this Plan was published in December 1994 [IGPO, 1994c]. It provided the first efforts to integrate all proposed activities in the early phases of GCIP into a single action plan covering the 1995–1996 period, with an outlook of the plans for 1997. The Major Activities Plan served to focus the broad sweep of research, data collection and management, and other support activities into a series of action plans that were updated and issued on an annual basis throughout the life of the project with the final plan [IGPO, 1999] focused on the Missouri River basin.

2.5. Compilation of GCIP Data Sets

[24] The intent of the GCIP research community to rely as much as possible on existing data centers as the archive location of GCIP data means that data sets are geographically distributed among these data centers. There was an intensive effort to compile a centralized set of information on the data sets regardless of location. In some cases this set consists of a directory and inventory of the data set, and in other cases it consists of only directory information with the inventory information available from the data center where the data set is stored. A tactical data collection and management plan was prepared for each definable data set compiled by the Project. This plan was converted to a data summary report when the compiled data set was completed.

[25] A number of GCIP Initial Data Sets (GIDS) were prepared to provide the data services support during the buildup period before the 5-year EOP. Preparation of the GIDS started in 1993, and the data sets were compiled for on-line access by GCIP investigators to the extent that was technically and economically feasible. They were also published on a CD-ROM for wide distribution, especially to international persons interested in performing initial diagnostic, evaluation, and modeling studies on GCIP-related topics. The specific data sets compiled during the Enhanced Observing Periods of GCIP are identified in Figure 7. A number of special data sets, in addition to the EOP data sets, were compiled during the course of GCIP. Some of the most significant special data sets are summarized in section 6.

Figure 7.

Compiled standard data sets for GCIP research.

3. In Situ Data

[26] The extensive meteorological and hydrological networks covering the continental United States were a principal factor in selecting the Mississippi River basin as the location for the GCIP study area [WMO, 1992]. The upgrading of these networks during the early and mid 1990s by the National Weather Service with new Doppler radars, wind profilers, and automatic weather stations provided the best opportunity for collecting the essential data sets needed for the estimation of water and energy fluxes and related climate parameters, and for the validation of these estimates based on remote sensing. An assessment of the available atmospheric, hydrologic and land surface observations over the Mississippi River basin [IGPO, 1993] revealed a variation among the different parameters from very plentiful to sparse; in a few instances, such as soil moisture and atmospheric flux measurements, observations were essentially nonexistent. In the latter cases GCIP supported the enhancement of these measurements in selected locations of the Mississippi River basin. These and other special data sets are described in section 6.

3.1. In Situ Data Source Module

[27] The In situ Data Source Module was responsible for providing data management and information resources for surface, upper air, radar, and land surface characteristics data of interest to GCIP. It was located at UCAR in Boulder, Colorado, within the Joint Office for Science Support (JOSS). The relevant GCIP data sets compiled by this Module can be accessed on the World Wide Web. The In situ Data Source Module maintained data/metadata access and information, GCIP data management plans and reports, and links to data information sources, other DMSS Modules, collaborating projects, and other related data activities.

3.2. In Situ Data Sets

[28] In situ data collected for GCIP consisted of data sources from national, regional, state, and local networks. A complete survey of all in situ data sources in the Mississippi River basin was performed and details are provided in the various GCIP Tactical Data Management Plans. These data were collected, processed, archived, and disseminated to the GCIP community by JOSS. In addition, a series of “composite” data sets was produced. These included a Near Surface Observation (NESOB) special data set in the Arkansas-Red River basin area, which is described in section 6.

3.2.1. Upper Air and Surface Data

[29] Upper air rawinsonde and vertical profiler data were collected from a variety of operational (e.g., NWS) and research (e.g., DoE/ARM) networks in the Mississippi River basin at the highest vertical resolution available. These data sets were processed, quality controlled, and archived by JOSS. High-vertical-resolution winds from NWS rawinsonde soundings were computed by using balloon position information [Williams et al., 1993, 1998]. Quality control was also performed on all soundings by JOSS using a combination of automated internal consistency checks (i.e., gross limits; rate of change) and visual plotting. Further information on the JOSS upper air processing procedures is given by Loehrer et al. [1996, 1998].
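The automated internal consistency checks named above (gross limits and rate of change) can be illustrated with a short sketch. It is not the JOSS procedure: the function name, the flag labels, and the numerical thresholds are all hypothetical and chosen only to show the shape of such checks on a temperature profile.

```python
def check_temperature_profile(pressures_mb, temps_c,
                              gross_limits=(-100.0, 60.0),
                              max_rate_c_per_mb=1.0):
    """Flag each level of a sounding as 'good', 'questionable', or 'bad'.

    gross_limits and max_rate_c_per_mb are illustrative assumptions,
    not values taken from the GCIP documentation.
    """
    if len(pressures_mb) != len(temps_c):
        raise ValueError("pressure and temperature lists must match in length")

    flags = ["good"] * len(temps_c)
    low, high = gross_limits

    # Gross limit check: values outside a physically plausible range.
    for i, temp in enumerate(temps_c):
        if not low <= temp <= high:
            flags[i] = "bad"

    # Rate-of-change check between consecutive levels that passed the
    # gross limit check.
    for i in range(1, len(temps_c)):
        if flags[i] == "bad" or flags[i - 1] == "bad":
            continue
        dp = abs(pressures_mb[i] - pressures_mb[i - 1])
        if dp == 0:
            continue
        rate = abs(temps_c[i] - temps_c[i - 1]) / dp
        if rate > max_rate_c_per_mb:
            flags[i] = "questionable"

    return flags


if __name__ == "__main__":
    print(check_temperature_profile([1000, 990, 980, 970],
                                    [20.0, 19.5, 45.0, 999.0]))
```

In practice such automated flags would be followed by the visual inspection of plotted soundings that the text describes.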

[30] Surface meteorology and hydrology data were collected from a variety of operational and research networks in the Mississippi River basin at the highest resolution available. Meteorological data included standard observations of temperature, humidity, pressure, wind, precipitation, and radiation, as well as derived flux measurements. Most of these data were compiled into “composite” data sets and are described in the next section. Certain stations also recorded a variety of soil measurements such as profiles of soil temperature and moisture. Hydrological data collected included streamflow and reservoir level.

[31] Radar data were collected primarily from NWS Doppler Weather Service Radars (WSR-88D) in a variety of forms. Highest resolution included the WSR-88D individual radar Level II data, but Level III (derived products) and various merged Stage Product data sets (e.g., raingauge and derived precipitation) were also archived along with composite imagery (such as the national radar composite) and the River Forecast Centers gridded radar products.

[32] Land characterization data included in the GCIP archive were also integral to understanding the hydrology and water cycle. These included multisoil characteristics data (e.g., soil type and depth), vegetation index, and a variety of elevation and digital elevation model data sets. Further details on these land characterization data sets are given in section 6.

3.2.2. Composite Data Sets

[33] A series of surface and upper air “composite” data sets were compiled for GCIP. The major advantage of compiling these surface and upper air “composite” data sets is effort/cost efficiency by eliminating the need for each investigator to individually re-process separate network data sets.

[34] A surface “composite” data set involved the collection of all operational and research surface network data from all available sources; extraction of common standard meteorological parameters; conversion of all data to a common format; provision of uniform quality control; and generation of final “composite” data sets at various time resolutions (e.g., 1-min and hourly). For surface “composites,” JOSS performed horizontal quality control on station observations of pressure, temperature, dew point, wind speed and wind direction by comparing them with “expected values” computed using an objective analysis method adapted from that developed by Cressman [1959] and Barnes [1964]. This method allowed for short-term (30-day) variations by using 30-day standard deviations computed for each parameter when determining the acceptable limits for “good,” “questionable,” or “unlikely” flags. “Expected values” were computed from inverse distance weighted station observations within a 300 km radius of influence (ROI) centered on the station being quality controlled (that station itself being excluded). In addition, separate “composites” of in situ precipitation (15-min, hourly, and daily) were generated. Gross limit checks were used to flag the precipitation values.
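The horizontal quality control step can be sketched as follows. This is a simplified illustration under stated assumptions, not the JOSS implementation: it uses plain inverse-distance weighting rather than the adapted Cressman/Barnes analysis, and the flag thresholds (multiples of the 30-day standard deviation, `questionable_k` and `unlikely_k`) are hypothetical because the text does not give them. Only the 300 km radius of influence and the exclusion of the target station come from the description above.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def expected_value(target_latlon, neighbors, roi_km=300.0):
    """Inverse-distance weighted mean of neighbor observations within the ROI.

    neighbors is a list of (lat, lon, value) tuples that must NOT include
    the station being quality controlled. Returns None if no neighbor is
    inside the radius of influence.
    """
    weight_sum = 0.0
    value_sum = 0.0
    for lat, lon, value in neighbors:
        dist = great_circle_km(target_latlon[0], target_latlon[1], lat, lon)
        if dist > roi_km:
            continue
        weight = 1.0 / max(dist, 1.0)  # floor avoids division by zero
        weight_sum += weight
        value_sum += weight * value
    return value_sum / weight_sum if weight_sum > 0 else None


def flag_observation(observed, expected, sigma_30day,
                     questionable_k=2.0, unlikely_k=4.0):
    """Assign a QC flag; the k multipliers are illustrative assumptions."""
    if expected is None:
        return "unchecked"
    deviation = abs(observed - expected)
    if deviation > unlikely_k * sigma_30day:
        return "unlikely"
    if deviation > questionable_k * sigma_30day:
        return "questionable"
    return "good"
```

A station observation is thus judged against its neighbors rather than against a fixed climatological range, and the 30-day standard deviation lets the acceptable departure widen or narrow with recent variability.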

[35] Similar to the process outlined above for generating surface “composites” data, two upper air “composites” were generated from operational and research rawinsonde data. These included all soundings: (1) in highest vertical resolution and (2) interpolated to 5-mb levels.
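Interpolating a sounding to 5-mb levels can be sketched as below. The text does not state the interpolation scheme, so linear interpolation in pressure is assumed here purely for illustration (interpolation in the logarithm of pressure is a common alternative); the function name is hypothetical.

```python
import math


def interpolate_to_5mb(pressures_mb, values):
    """Interpolate a profile to every 5-mb level spanned by the sounding.

    Assumes at least two levels with distinct pressures. Interpolation is
    linear in pressure, which is an assumption of this sketch.
    Returns a list of (pressure_mb, value) pairs from the surface upward.
    """
    if len(pressures_mb) != len(values) or len(pressures_mb) < 2:
        raise ValueError("need two or more matching pressure/value pairs")

    # Sort so the highest pressure (nearest the surface) comes first.
    pairs = sorted(zip(pressures_mb, values), reverse=True)
    p_surface = pairs[0][0]
    p_top = pairs[-1][0]

    level = math.floor(p_surface / 5) * 5  # first 5-mb level at or above surface
    result = []
    j = 0
    while level >= p_top:
        # Advance to the pair of observed levels that brackets this level.
        while pairs[j + 1][0] > level:
            j += 1
        p0, v0 = pairs[j]
        p1, v1 = pairs[j + 1]
        frac = (p0 - level) / (p0 - p1)
        result.append((level, v0 + frac * (v1 - v0)))
        level -= 5
    return result


if __name__ == "__main__":
    for p, t in interpolate_to_5mb([1003.0, 993.0, 983.0], [20.0, 19.0, 18.0]):
        print(p, round(t, 2))
```

No values are extrapolated: levels below the lowest observation or above the highest one are simply omitted.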

3.2.3. CD-ROMs

[36] A series of CD-ROMs for specific data collection periods were compiled for GCIP: (1) The first GCIP Initial Data Set (GIDS-1) from 1 February to 30 April 1992; (2) the GCIP Integrated Systems Test (GIST) from 1 April to 31 August 1994; and (3) the Enhanced Season Observing Period (ESOP-95) from 1 April to 30 September 1995.

[37] The CDs contain in situ data, documentation, imagery, and companion software tools to browse and display the data (i.e., areal plots, time series plots, altitude plots, image displays) on DOS, Macintosh, and UNIX based systems. In addition, a companion CD-ROM to GCIP was produced by the U.S. Geological Survey containing geographic information for the entire GCIP domain (described in section 6). A fifth CD-ROM provides a water and energy budget synthesis for the GCIP period. The synthesis includes a brief description of the Mississippi River basin climate, physiographic characteristics, available observations, representative types of models used for GCIP investigations, as well as a comparison of water and energy variables and budgets from models and observations.

[38] All the CDs produced for GCIP are incorporated into the GCIP data set to be archived at the NCDC.

3.3. In Situ Data Access

[39] To facilitate easier access to the GCIP in situ data, an on-line data matrix table has been developed and can be obtained from the In situ Data Source Module. This table organizes data sets by category (e.g., precipitation, fluxes, upper air) and by GCIP data collection periods and LSAs. Links are provided to directly access data sets residing either at JOSS or distributed at remote servers.

[40] GCIP in situ data are archived and disseminated through the UCAR/JOSS Distributed Data Management System, also known as CODIAC. CODIAC is an on-line, interactive data management system that consists of a data catalog, data inventories, station descriptions, and an order entry system. CODIAC is a distributed system that allows the user to link to an in-house database or other remote centers with on-line data systems (e.g., NCDC) for further information on data sets and data delivery. CODIAC provides information about each data set by title, abstract, time, location, and frequency of observations as well as the appropriate metadata. Detailed information on stations and observing platforms includes station name and location as well as observed parameters.

[41] The user may browse selected data sets. This includes time series plots for surface parameters, skew-T/log-p diagrams for soundings, as well as GIF or PNG images for radar composites, model analyses, and satellite imagery. CODIAC also allows users to directly retrieve data. On-line data sets may be downloaded via the Internet or can be sent via magnetic media (i.e., 9-track, Exabyte, or Digital Audio tape). Off-line data are available only via magnetic media. The user can use World Wide Web “forms” to order the data on-line. Data may be selected by time and/or location and are available in several formats depending on the data set in question. Any documentation concerning the data itself, processing steps, or quality control procedures used is automatically included.

4. Model Output Data

[42] The priority objective for GCIP was to further the understanding of the terrestrial hydrological and energy cycles and to enhance the ability to observe and model these cycles on scales appropriate for climate studies. The major components of the terrestrial hydrological cycle are soil moisture, surface evaporation, water vapor, clouds, rainfall and runoff. Since a number of these variables are not observed routinely, e.g., surface evaporation and soil moisture, it was necessary for GCIP to place a heavy emphasis on the products of a 4-Dimensional Data Assimilation (4DDA) system.

4.1. Model Output Data Sets

[43] Several operational numerical weather prediction models and associated 4DDA systems provided the data needed for both the observation and modeling of the hydrological and energy cycles. These included the National Centers for Environmental Prediction (NCEP), the Canadian Meteorological Centre (CMC) and the NOAA Forecast Systems Laboratory (FSL). The fundamental features for the GCIP model output data from operational systems were [IGPO, 1993]: (1) Acquisition of model output from several operational centers from a range of operational models of varying resolution, physics, and data assimilation systems. (2) Enhancement of the traditional model output to include additional fields needed by researchers to perform meaningful studies of the water and energy cycles. (3) Identification of key near term model or assimilation upgrades deemed essential for the GCIP EOP. (4) Documentation of operational model changes during the GCIP timeframe.

[44] The model output fields compiled as part of the GCIP data sets consisted of four types: (1) Gridded 2-D fields, mostly comprised of surface, subsurface, and top of the atmosphere fields. (2) Gridded 3-D atmospheric fields with subsurface fields added as they were developed and implemented during the course of the EOP. (3) Vertical profile time series at selected points labeled as Model Output Location Time Series (MOLTS) providing values at a temporal frequency of hourly or better. (4) Fixed fields which referred to those gridded fields that remain fixed from day to day such as terrain and soil characteristics.

4.2. Collection and Management of Model Output Data

[45] The National Center for Atmospheric Research (NCAR) has archived model output from three mesoscale models with the Eta model output from NCEP starting in 1995, the Global Environmental Model (GEM) model output from the CMC and the Mesoscale Analysis and Prediction System (MAPS) model output from the FSL starting somewhat later. All three models were providing model output data on a routine basis by 1997. The limitations in resources available forced GCIP to collect the gridded 3-D fields for the Eta model only.

[46] Since mesoscale work often requires access to analyses from global models, these data are briefly discussed. In addition, some researchers need access to the global sets of observations and those are available. Documents are currently being compiled for scanning and will be placed on-line for easy access by users.

4.2.1. Mesoscale Model Data at NCAR

[47] The mesoscale model data for North America include output from the NCEP Eta model starting May 1995, from the NOAA MAPS (Mesoscale Analysis and Prediction System) model starting August 1996, and from the Canadian GEM (Global Environmental Model) starting April 1997. By the end of the GCIP Enhanced Observing Period in April 2002, there were almost seven years of data from Eta, 5.7 years from MAPS, and 4.7 years from GEM. By January 2000, NCAR had 510 GB of data in the archives, and this grew to 1049 GB by April 2002. By January 2000, a cumulative total of 450 GB of these data had been delivered to users, which increased to 2860 GB by April 2002. The Eta model data are the most heavily used.

4.2.2. Horizontal Resolution of Mesoscale Models

[48] NCAR has one archive of NCEP mesoscale model data for North America that started in October 1971. It has a resolution of about 190 km. By 1994, the typical resolution of an operational model covering North America was 60 to 80 km. From about 1999 to 2001 the operational model resolution was usually about 30 km. The model output data archived for GCIP were provided on a Lambert Conformal map base with a nominal resolution of 40 km in the Mississippi River basin. This map base is used as a standard output by the National Weather Service and is labeled AWIPS 212; it provides coverage over North America from about 18°N to 60°N latitude.

4.3. Mesoscale Reanalysis

[49] NCEP plans to do a mesoscale reanalysis for the years 1979–2003, with output every three hours. It will use the Eta model with a resolution of about 30 km and a domain that includes the conterminous 48 states, Mexico, Canada, Alaska, most of Greenland, the North Pole, and Hawaii. NCAR has plans to archive the output data.

4.4. NCAR Archives of Observations for Reanalysis

[50] The global archives of observations compiled by NCAR are needed for reanalysis work, including that using regional mesoscale models. The latter require results from global models for boundary conditions, and those global models in turn used the global observations. Also, mesoscale models will be used for reanalysis in many parts of the world. Many of these models will ingest observations for their region or continent. Their main source of observations will be the global archives.

[51] The data work at NCAR started in 1991 to prepare for the NCEP/NCAR reanalysis for the period from 1948 to 2001. Numerous diagnostics were run on the observations and many problems were detected and resolved. For example, hydrostatic consistency checks were performed on the rawinsonde data. These diagnostics also checked for agreement between the station elevation and the near-surface levels of the sounding, which detected stations identified incorrectly in the archives. Statistics on the failure rate of the hydrostatic check were calculated for each vertical level of each radiosonde station and each month. These identified stations and months where all of the data in the stratosphere had been assigned to the wrong pressure levels. For many sets of ship upper air observations, checks of the ship tracks identified erroneous ship locations, which were then fixed. There were often overlapping data between different sets of rawinsonde data. Numerous comparisons of reported data between the data inputs showed that a few stations were wrong in some data inputs; they also showed that the wind units (knots or m/s) were wrong for some stations in some sets. Over 100 station-years of rawinsonde data were found for which the recorded date and time were wrong by 12 hours or more. The types of observations in these archives are rawinsondes, pibals, aircraft, satellite cloud winds, satellite temperature soundings, surface 3-hour synoptic, and ocean surface observations. NCAR drew on 25 years of previous data-gathering experience and kept gathering more observations, with help from USAF, NCDC, Argentina, Brazil, Australia, UK, France, and many other countries. One key input was all of the observations used for operational analyses by NCEP starting in 1962. Documents are available that describe most of this work: Kalnay et al. [1996] describe the NCEP/NCAR reanalysis, in a paper that has two pages about the preparation of the observations. Jenne [1999] describes the observations in more detail. Further information about the data archives at NCAR is given in section 4.6.
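The hydrostatic consistency check mentioned above can be illustrated with a minimal sketch. It is not the NCAR diagnostic code: it assumes the hypsometric equation with a simple layer-mean temperature (no moisture correction), and the function names and the 50 m tolerance are hypothetical choices made only for illustration.

```python
import math

R_D = 287.05   # gas constant for dry air, J kg^-1 K^-1
G0 = 9.80665   # standard gravity, m s^-2


def hypsometric_thickness(p_lower_hpa, p_upper_hpa, t_lower_k, t_upper_k):
    """Thickness (m) between two pressure levels from the layer-mean temperature."""
    t_mean = 0.5 * (t_lower_k + t_upper_k)
    return (R_D * t_mean / G0) * math.log(p_lower_hpa / p_upper_hpa)


def hydrostatic_failures(levels, tolerance_m=50.0):
    """Flag layers whose reported thickness disagrees with the hypsometric value.

    levels: list of (pressure_hpa, height_m, temperature_k), ordered from the
            surface upward.
    tolerance_m: hypothetical threshold; a real check would tune it per layer.
    Returns a list of (p_lower, p_upper, residual_m) for the failing layers.
    """
    failures = []
    for (p1, z1, t1), (p2, z2, t2) in zip(levels, levels[1:]):
        expected = hypsometric_thickness(p1, p2, t1, t2)
        residual = (z2 - z1) - expected
        if abs(residual) > tolerance_m:
            failures.append((p1, p2, residual))
    return failures
```

Counting the flagged layers per station, level, and month would give failure-rate statistics of the kind described in the text; a station whose stratospheric layers fail in every sounding of a month would stand out immediately.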

4.5. Other Model Output Data at NCAR

[52] Users may need analyses for regions that are outside of the domain of a particular mesoscale analysis. The output from the global models can serve that need. Also some researchers may need boundary conditions from a global analysis, in order to run their own mesoscale model for their region. For example, a research group in Chile obtained data from a global model archive at NCAR in order to run a high-resolution mesoscale model for their mountainous region.

[53] The horizontal resolution of the global models has been getting finer, even when they are run in reanalysis mode. The NCEP/NCAR reanalysis (for 1948–2001, 54 years) has a resolution of T62 (208 km). The ECMWF ERA-40 reanalysis started production about May 2000 for 1987 onward, and September 2001 for 1957–1987. It has a resolution of T159 (0.75 degree, or 83 km). The easiest data to use from these models are the 2.5-degree sets in pressure coordinates (about 250 km resolution), but the data in model coordinates are also archived.
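The kilometre values quoted above follow from converting an angular grid spacing to a distance. The sketch below shows that conversion, assuming a spherical Earth with a mean radius of 6371 km; the function name is hypothetical, and the 1.875-degree spacing used for T62 is an assumption (a 192-point longitude grid), since the text quotes only the kilometre value.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)


def grid_spacing_km(spacing_deg, latitude_deg=0.0):
    """East-west distance (km) spanned by spacing_deg of longitude at a latitude."""
    arc = math.radians(spacing_deg) * EARTH_RADIUS_KM
    return arc * math.cos(math.radians(latitude_deg))


if __name__ == "__main__":
    # 1.875 degrees is assumed for T62; 0.75 and 2.5 degrees are quoted in the text.
    for label, deg in [("T62", 1.875), ("T159", 0.75), ("pressure-level set", 2.5)]:
        print(f"{label}: {grid_spacing_km(deg):.0f} km at the equator")
```

At the equator, 0.75 degree corresponds to roughly 83 km and 1.875 degrees to roughly 208 km. A 2.5-degree spacing is nearer 278 km at the equator and shrinks toward 250 km at about 25° latitude, so the "about 250 km" figure is best read as a representative value.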

4.6. NCAR Data Documentation

[54] The NCAR Data Support Section (DSS) started a project in April 1999 to gather documents into bundles, prepare more documents, and digitize them so that they are available online. Production scanning started in March 2000, and by October 2000 the papers through document bundle RJ0062 (4474 pages) were completed. By September 2002, work was completed through document RJ0229 (14,725 pages). These documents cover many subjects. For example, there are 16 documents about satellite data (about 915 pages) and 17 bundles (1240 pages) about rawinsonde data. Two documents that give more information about the mesoscale model archives at NCAR and the sets of global observations are (1) RJ0200, Model Data Center for GCIP (28 pp), and (2) RJ0212, Observations for Reanalysis 1946–2000 (55 pp). A list of these documents is available online. To find other documents that may be of interest, an initial version of a guide to the documents has been prepared (RJ0281, 66 pp).

5. Satellite Remote Sensing Data

[55] The GCIP Science Plan [WMO, 1992] established a guiding principle that GCIP would, to the extent possible, rely upon existing or planned operational observing programs, including space-based observations. The GCIP implementation planning [IGPO, 1993] recognized a strong GCIP need for satellite data to provide retrievals of atmospheric, hydrologic and land surface variables.

[56] The purpose of the Satellite Remote Sensing Module, operated by the NASA/MSFC Global Hydrology and Climate Center in Huntsville, AL and shown in Figure 4, was to assist GCIP users in obtaining information about the available satellite data and products for the GCIP area. Because of the vast quantity of data and products from the multitude of satellites and instruments, the approach was to exploit the existing archives and data sets at the national operational or archive centers. Additionally, data were available from commercial companies. The archive centers provided the necessary infrastructure in data processing, verification and validation, product generation, and data distribution. Many centers had developed sophisticated information systems that simplified the process of locating specific data and products and ordering the data. Because the standard product generation from the archive centers might not satisfy all the needs of the GCIP community, custom data sets could be generated to facilitate group or individual research efforts.

[57] The operational polar-orbiting and geostationary meteorological satellites were identified as the primary source of satellite remotely sensed data for GCIP [IGPO, 1993]. For a significant portion of the satellite data used by GCIP investigators, this enabled the GCIP data collection and management effort to rely upon existing NOAA resources: the satellite data processing at NESDIS for near-real-time operational data products, and the NESDIS satellite data archives at the National Climatic Data Center (NCDC) for retrospective satellite data.

[58] A particular interest for GCIP is the need to link radiative and hydrological processes. Satellites provide the most realistic approach for obtaining radiative fluxes on scales of interest in climate studies. In the framework of a NOAA/NESDIS activity entitled “Geostationary Satellite Products for GEWEX Continental-Scale Project (GCIP)” directed by J. D. Tarpley and supported by the GCIP Project, an insolation algorithm developed at the University of Maryland [Pinker and Laszlo, 1992] was implemented by NOAA/NESDIS starting with GOES 8 in 1994. The short-wave surface radiation budget products were produced operationally starting in January 1996. The satellite short-wave radiation products derived from the GOES data include (1) surface downward flux, (2) surface downward photosynthetically active radiation (PAR), (3) top of atmosphere downward flux, and (4) top of atmosphere upward flux.

[59] The estimates are being made on an hourly basis for 0.5-degree targets over an area bounded by 70°–125°W longitude and 25°–50°N latitude. Further details about this special satellite data set for GCIP are available as of this date on the World Wide Web at the URL address:

6. Special Data Sets

[60] A number of special data sets were compiled during the course of GCIP in addition to the satellite short-wave radiation budget data set described in the preceding section. In most cases, the person or organization that compiled a special data set also managed and distributed it. With the completion of GCIP, these special data sets are being compiled to assure continuity as part of the overall GCIP data set. Some of the most significant special data sets are summarized in the remainder of this section.

6.1. Near-Surface Observation (NESOB) Data Set

[61] The NESOB data set was designed to satisfy a composite set of data requirements suitable for (1) land surface process studies, (2) validation and verification of land surface processing schemes, (3) detailed validation and verification of model output from regional land-atmosphere coupled models, and (4) derivation of surface energy and water budgets.

[62] The data for this composite data set were collected from a region encompassing the ARM/CART site in Oklahoma and Kansas operated by the U.S. Department of Energy (DoE) and the Little Washita Watershed operated by the Agricultural Research Service of the U.S. Department of Agriculture. The vertical dimension extends from 3000 meters above the surface to two meters below the surface in three components: Boundary Layer (Z < 3000 meters), Surface Layer (0 < Z < 10 meters), and Subsurface Layer (−2 < Z < 0 meters). Land surface studies and models can use the data at point locations to force land surface models, or can use the observations to complete an area analysis for areas of different sizes within the NESOB domain. Because of the difficulty in achieving a consensus on area analysis techniques, the data were compiled as close as possible to the observational measurements. This will enable investigators to use whatever analysis techniques they deem appropriate for their specific research.
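As an illustration of the three vertical components, the sketch below assigns a measurement height to a NESOB component. The function name is hypothetical, and two choices are assumptions not stated in the text: the Boundary Layer is taken to begin where the Surface Layer ends (10 meters), and a height falling exactly on a boundary is assigned to the component above it.

```python
def nesob_component(z_m):
    """Return the NESOB vertical component for a height z_m (meters, positive up).

    Returns None for heights outside the -2 m to 3000 m range of the data set.
    Boundary handling (half-open intervals) is an illustrative assumption.
    """
    if -2.0 <= z_m < 0.0:
        return "Subsurface Layer"
    if 0.0 <= z_m < 10.0:
        return "Surface Layer"
    if 10.0 <= z_m < 3000.0:
        return "Boundary Layer"
    return None


if __name__ == "__main__":
    # A soil moisture probe at 20 cm depth, a 2 m air temperature sensor,
    # and a sounding level at 1500 m.
    for height in (-0.2, 2.0, 1500.0):
        print(height, nesob_component(height))
```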

[63] The coordination of this data compilation effort with the DoE/ARM project was carried out for GCIP by the DACOM with the compilation carried out by the JOSS. Two NESOB data sets were compiled during GCIP: (1) A 6-month warm season from 1 April to 30 September 1996 (NESOB-96) and (2) an annual cycle from 1 April 1997 to 31 March 1998. Further information on NESOB is located at:

6.2. GCIP Reference Data Set (GREDS)

[64] The U.S. Geological Survey supported the preparation of a CD-ROM containing a number of different data sets expected to have wide use among GCIP investigators. Data contained on this CD-ROM include Geographical Information System (GIS) files of hydrometeorological stations, land use, geology, physical and hydrologic boundaries, reservoirs, rivers, topographic map and Landsat scene indices, and a Digital Elevation Model (DEM). One of the major criteria for including a specific type of data on the CD-ROM was that the data were expected to change little if any during the course of the GCIP Enhanced Observing Period. A CD-ROM containing the GREDS data was published in 1995 and included as one of the GCIP archived data sets. Software to extract and view data is included.

6.3. Flux Tower Data Set

[65] Several long-term flux monitoring sites provided data during the GCIP EOP. A network of three stations supported by GCIP and operated by T. Meyers at NOAA produced a special GCIP data set. It consists of half-hourly observations of wind speed and direction, air temperature, relative humidity, pressure, incoming global radiation, incoming and outgoing visible radiation, net radiation, ground heat flux, precipitation, wetness, skin temperature, soil temperature (at 2, 4, 8, 16, 32 and 64 cm), average wind vector speed, kinematic shear stress, streamwise velocity variance, crosswind velocity variance, vertical velocity variance, sensible heat flux, latent energy flux, CO2 flux, and soil moisture at 20 cm.

[66] The first long-term flux monitoring site was established within the Little Washita Watershed, near Chickasha, Oklahoma. In 1996 a tower (34°58′N, 97°57′W) was placed within a grazed pasture field. The final observations for this site were collected on 21 April 1999. During the summer of 1996, a second flux/meteorological system was installed just south of Champaign, Illinois, on a farm in a field rotated between corn and soybeans each growing season. The site (40°00′N, 88 22 23′W) characteristics are typical of those found throughout the midwestern United States, with most of the land in agricultural production. A third site started operation in Fort Peck, Montana, in November 1999, adjacent to the NOAA Surface Radiation (SURFRAD) facility. The tower (48°18′N, 105°6′W) is surrounded by natural grassland in all sectors, providing a minimum fetch of 200 meters over relatively flat terrain.

[67] Further details on this special data set along with other flux site data are available on the World Wide Web at the URL address:

6.4. Soils Data Set

[68] Soil information is now widely required by many climate and hydrology models and soil-vegetation-atmosphere transfer schemes. Miller and White [1998] describe the development of a multilayer soil characteristics data set for the conterminous United States (CONUS-SOIL) that specifically addresses the need for soil physical and hydraulic property information over large areas. The State Soil Geographic Database (STATSGO) developed by the U.S. Department of Agriculture-Natural Resources Conservation Service served as the starting point for CONUS-SOIL. Geographic information system and Perl computer programming language tools were used to create map coverages of soil properties including soil texture and rock fragment classes, depth-to-bedrock, bulk density, porosity, rock fragment volume, particle-size (sand, silt, and clay) fractions, available water capacity, and hydrologic soil group. Interpolation procedures for the continuous and categorical variables describing these soil properties were developed and applied to the original STATSGO data. In addition to any interpolation errors, the CONUS-SOIL data set reflects the limitations of the procedures used to generalize detailed county-level soil survey data to the STATSGO map units.

6.5. Five-Year WSR-88D Precipitation Data Set

[69] This data set contains hourly 4 km resolution radar-based precipitation for the Mississippi River basin for the period January 1996 to December 2000. This data set was developed by the University of Iowa's Hydrosciences and Engineering Department in conjunction with the Princeton University Department of Civil and Environmental Engineering. The data are derived from input composite reflectivity maps from the Global Hydrology Resource Center. Included with the data are tools for database manipulation and documentation.

[70] The precipitation data set consists of hourly accumulations with a spatial resolution of 4 by 4 km. Input data included the national 15-minute reflectivity composite from the NEXRAD network of WSR-88D radars, daily rain gauge data, and hourly rain gauge data. The final data set includes derived hourly rainfall maps for the entire Mississippi River basin as well as the input reflectivity data.
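To make the step from 15-minute reflectivity composites to hourly accumulations concrete, the sketch below processes a single grid cell. It is not the procedure used to build this data set: the Marshall-Palmer relation Z = 200 R^1.6 is assumed purely for illustration, the rain gauge adjustment is omitted, and all function and parameter names are hypothetical.

```python
def rain_rate_mm_per_h(dbz, a=200.0, b=1.6):
    """Rain rate (mm/h) from reflectivity (dBZ) using an assumed Z = a * R**b relation."""
    z_linear = 10.0 ** (dbz / 10.0)      # dBZ -> Z in mm^6 m^-3
    return (z_linear / a) ** (1.0 / b)


def hourly_accumulation_mm(dbz_composites, interval_min=15.0):
    """Hourly rainfall (mm) for one grid cell from its reflectivity composites.

    dbz_composites: reflectivity values (dBZ) for the composites within the hour.
    Each composite is assumed to represent the rain rate over one interval.
    """
    hours_per_composite = interval_min / 60.0
    return sum(rain_rate_mm_per_h(dbz) * hours_per_composite
               for dbz in dbz_composites)


if __name__ == "__main__":
    # Four 15-minute composites for one 4 km cell during a passing shower.
    print(round(hourly_accumulation_mm([20.0, 35.0, 40.0, 25.0]), 2), "mm")
```

Applying the same calculation to every cell of each hour's composites would yield an unadjusted radar-only rainfall map; a gauge-based correction of the kind implied by the inputs listed above would be a separate step.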

6.6. Soil Moisture Data Set

[71] In situ measurements of soil moisture have been made by a number of countries around the globe during the past 70 years. Robock et al. [2000] describe a global soil moisture data bank dedicated to the collection, dissemination, and analysis of soil moisture data from around the globe. Soil moisture measurements from the Mississippi River basin are included as part of this data bank. The Illinois Climate Network includes soil moisture measurements at 19 stations from 1981 to the present. Hollinger and Isard [1994] describe the data measurement, calibration procedures, and preliminary results for the period from 1981 to 1991. GCIP supported the installation of a network of 23 stations within the ARM/CART site that began operations in 1996 and is continuing to the present.

7. Concluding Remarks

[72] One could identify a number of lessons learned as a result of the long and extensive GCIP data management effort. A number of these can be attributed to the unique conditions of this international project carried out largely within one country. For example, an early recommendation to the data management group was to structure the efforts along the traditional lines for an international project (i.e., surface, upper air, and remote sensing data) so that participants could contribute to setting up and operating a data center for GCIP. The DACOM considered that it would be more cost effective to stress a configuration that was most compatible with the existing data centers in the United States. For this reason, a structure of in situ, model output, and satellite remote sensing data source modules was adopted.

[73] Another lesson learned was the value of maintaining a close working relationship and collaboration with existing data centers. This made it possible to transfer some of the quality control techniques that GCIP developed to achieve research-quality data sets to the quality assurance groups at the existing data centers.

[74] The problems of checking the numerous observational data sets for reanalysis and obtaining more data were even larger than initially estimated. The diagnostic checks usually found a large number of problems. It was then necessary to develop techniques and computer code to help understand the nature of the various problems and fix them. Also, there were delays in obtaining added data inputs. Other data could not be obtained because of national policies. Some important sets of observations would have been lost if this task had not been done in the 1990s. In various data sets it was found that a number of surface and upper air stations had locations that were wrong by over 100 km, and a few had errors of over 1000 km. In one case, all of the Southern Hemisphere stations were reflected to the Northern Hemisphere. In another case, west longitude became east longitude. One set of aircraft reconnaissance data had all locations wrong by 10 degrees of longitude. Some observations from ships were in tracks that moved across the Sahara desert! Future research will now have more of the older observations to work with, and they will be much cleaner and easier to use.
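The kinds of location error listed above can be screened for with a simple distance test. The sketch below is not the code developed for the reanalysis work; it assumes that a trusted reference position is available for each station, uses the haversine formula on a spherical Earth, and borrows the 100 km figure from the text as a hypothetical threshold. All names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance (km) between two points given in degrees, east positive."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def diagnose_location(reported, reference, threshold_km=100.0):
    """Classify a reported (lat, lon) against a trusted reference (lat, lon)."""
    lat, lon = reported
    ref_lat, ref_lon = reference
    if great_circle_km(lat, lon, ref_lat, ref_lon) <= threshold_km:
        return "ok"
    # Hemisphere reflection: flipping the latitude sign restores agreement.
    if great_circle_km(-lat, lon, ref_lat, ref_lon) <= threshold_km:
        return "latitude sign flipped"
    # West recorded as east (or vice versa): flipping the longitude sign helps.
    if great_circle_km(lat, -lon, ref_lat, ref_lon) <= threshold_km:
        return "longitude sign flipped"
    return "unexplained"
```

For example, with a reference position of (34.97, -97.95), a reported position of (34.97, 97.95) is classified as a longitude sign error, while a position displaced by 10 degrees of longitude falls into the "unexplained" class and would need a separate diagnostic.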

[75] The contribution of the DACOM Committee during the three years prior to the start of the five-year EOP in 1995 resulted in the design and implementation of a solid infrastructure that made a major contribution to the overall success of GCIP. The results of this contribution are shown by the series of GCIP composite data sets archived in fulfillment of the second objective for the DMSS: At the completion of the Project, the GCIP data management system will turn over composite data sets and documentation (metadata) to a permanent archiving agency for continued use in climate-related studies.


[76] The authors acknowledge the contributions of the DACOM during the course of GCIP, in particular the contributions made by Arthur Booth, Wayne Faas, Wanda Ferrell, Rex Fleming, William Kirby, Christopher Miller, and Chester Ropelewski.