This paper demonstrates the potential of combining observed river discharge information with climate-driven water balance model (WBM) outputs to develop composite runoff fields. Such combined runoff fields simultaneously reflect the numerical accuracy of the discharge measurements and preserve the spatial and temporal distribution of simulated runoff. Selected gauging stations from the World Meteorological Organization Global Runoff Data Centre (GRDC) data archive were geographically coregistered to a gridded simulated topological network at 30′ (longitude × latitude) spatial resolution (STN–30p). Interstation regions between gauging stations along the STN–30p network were identified, and annual interstation runoff was calculated. The annual interstation runoff was compared with outputs from WBM calculations, which were performed using long-term mean monthly climate forcings (air temperature and precipitation). The simulated runoff for each cell was multiplied by the ratio of observed to simulated runoff of the corresponding interstation region from the GRDC data set to create spatially distributed runoff fields at 30′ resolution. The resulting composite runoff fields (UNH/GRDC Composite Runoff Fields V1.0) are released to the scientific community along with intermediate data sets, such as station attributes and long-term monthly regimes of the selected gauging stations, the simulated topological network (STN–30p), STN–30p derived attributes for the selected stations, and gridded fields of the interstation regions along STN–30p. These data sets represent high-resolution fields that are of value to a broad range of water-related research, including terrestrial modeling, climate-atmosphere interactions, and global water resource assessments.
 Spatially distributed runoff estimates can be derived from land surface hydrology models that rely on either climate data or atmospheric model outputs such as precipitation, air temperature, radiation, vapor pressure, and wind speed [Vörösmarty et al., 1989] and from atmospheric vapor budget calculations [Browning and Gurney, 1999]. When observed climate forcings are used, potentially large errors in their geographic specificity can arise. This problem is widely recognized in the climate research community [Willmott and Rowe, 1985], and such errors can then propagate through the water budget calculations [Vörösmarty et al., 1998a] and thereby considerably compromise the accuracy of the computed water budgets.
 One important way to validate components of hydrological models is to compare predicted and observed runoff, the latter computed as river discharge at a gauging station divided by the upstream contributing catchment area. Discharges can be measured more accurately than other components of the land-based energy and water cycles, with, perhaps, the exception of temperature [Krahe and Grabs, 1996]. Discharge measurements have an accuracy on the order of 10–20% [Dingman, 1994; Rantz, 1982], which is much higher than what typically can be achieved for precipitation [Hagemann and Dümenil, 1998]. The routine availability of such information could contribute to the validation and improvement of climate, terrestrial ecosystem, and water resource models, which often show marked discrepancies between observed and modeled runoff. Atmospheric scientists [Gutowski et al., 1997; Rudolf, 1998], ecosystem modelers [Dirmeyer et al., 1999; Costa and Foley, 1997], and water resource analysts [Vörösmarty et al., 2000c] are beginning to adopt river discharge data for calibrating and validating their models.
 Even though discharge is an accurate measure of integrated terrestrial runoff, it typically offers little information on the spatial distribution of runoff within a watershed unless the river basins are highly instrumented. Disaggregation of the river discharge signal is necessary when spatially distributed runoff information is needed. Early works of Baumgartner and Reichel [1975] and Korzoun et al. [1977] estimated global runoff using manual techniques to develop such runoff fields on an annual basis.
 A collaboration between the University of New Hampshire (UNH) and the World Meteorological Organization's Global Runoff Data Centre (GRDC) seeks to develop automated procedures for routinely producing high spatial resolution runoff fields that are based on atmospheric drivers and observational discharge networks. The primary product of this initial joint effort is a set of monthly mean composite runoff fields (UNH/GRDC Composite Runoff Fields V1.0) on a 30′ global grid. The intermediate data sets, such as the simulated river network and the coregistered discharge gauging stations data, are also being released to the global research community. The remainder of this paper describes our methodology and gives some global and continental scale results.
 Our overall strategy is to distribute observed discharge data from monitoring stations as runoff over a simulated river network, determined simultaneously by a water balance model and by the observed discharge itself. This combination of modeled and observed information produces a spatially distributed pattern of terrestrial runoff reflecting variations in meteorological forcings, while at the same time constraining the magnitude of runoff by the instrumental record for discharge.
2.1. GRDC Discharge Gauging Station Data Set
 GRDC collects and maintains an archive of river discharge data with global coverage. Access to its data holdings is regulated by GRDC's Policy Guidelines for the Dissemination of Data and Costing of Services [Grabs, 1997]. The GRDC's discharge data set has the advantage over publicly available global data sets (such as UNESCO/UNH RivDIS [Vörösmarty et al., 1996a, 1998b]) of having continuous updates for many of its stations. The aim of the present study was to develop climatologically averaged monthly runoff fields. Mean monthly discharge time series associated with the discharge gauging stations therefore were averaged over the time period of observation. Unfortunately, the time period of observation varies station by station [Grabs et al., 1998]. This makes the resulting monthly discharge regimes not fully consistent. Substantial delays in data access and large declines in monitoring capacity prevent an accurate analysis of the recent past [Vörösmarty et al., 1999, 2000c].
2.2. Digital River Networks
 The Global Simulated Topological Network (STN–30p), a gridded river network at 30-min spatial resolution [Vörösmarty et al., 2000a, 2000b], and its allied river-based geographical information system (Global Hydrological Archive and Analysis System, UNH-GHAAS) were used to organize the landmass of the Earth. STN–30p stands out from similar gridded network products [Renssen and Knoop, 2000; Oki et al., 1999; Graham et al., 1999] because of its extensive and careful validation [Vörösmarty et al., 2000a]. It provides not only the connectivity of neighboring grid cells defining flow directions but also an intensive set of attributes for each grid cell (catchment area upstream, mainstem length upstream, distance to basin outlet, stream order using Strahler ordering scheme [Strahler, 1964], etc.) and drainage basin (basin area, mainstem length, receiving water body, etc.). STN–30p represents 59,132 continental land cells, which are linked as river systems within 6152 river basins ranging in size from a few hundred square kilometers to 5.8 × 10^6 km2. STN–30p has been validated against several independent atlases and station-based attribute sources, such as the UNH edition of the UNESCO global selected discharge series [Vörösmarty et al., 1996a], and R-ArcticNet, a pan-Arctic discharge data set including station data from Russian, Canadian, and U.S. monitoring archives representing the Arctic region [Lammers et al., 2001].
2.3. Geographic Coregistration of Discharge Gauging Stations to STN–30p
 GRDC stations were geo-registered to locations on STN–30p using a semi-automated procedure detailed by Fekete et al. The coregistration of this information permitted a coherent assessment to be made of suitable candidate sites. Out of the ∼3700 stations in the GRDC archive, we eliminated stations based on any or all of the following criteria.
Sites with a short period of record (<12 years) were eliminated. A 12-year threshold was found to be a reasonable compromise between preserving enough of the discharge gauging station entries at GRDC to capture the spatial distribution of runoff and using only those stations that have long enough observation records to calculate long-term mean discharge.
Sites with an area of <10,000 km2 were eliminated. This requirement was due to georeferencing the selected gauging stations to a 30′ gridded network. As our experiences with STN–30p and other networks show [Vörösmarty et al., 2000a; Fekete et al., 2001], a 30′ network cannot simulate the contributing area well on basins smaller than about 10^4 km2.
Small (<10,000 km2) and anomalous interstation areas were evaluated. We applied similar minimum area criteria to interstation regions as applied to catchment areas, recognizing the same limitations due to resolution limits of STN–30p. When stations were too “close,” the station with the longer observation record was kept. Because of inconsistencies between GRDC station records and the station topology derived from STN–30p, this interstation area was sometimes found to be negative or suspiciously high or low compared to interstation regions derived from STN–30p. When we could not resolve this inconsistency, the station with shorter observation record was removed.
Multiple stations within a single 30′ grid cell were evaluated. Multiple stations falling into the same grid cell would result in ambiguity in station topology. We attempted to resolve this ambiguity by moving stations to neighboring grid cells whenever it was possible. Otherwise, the station with a shorter observation record was removed.
Anomalous interstation discharge was eliminated. The interstation discharge at several stations had suspicious values (extremely high or low or negative). Those suspicious stations were removed from our selection whenever we had convincing evidence from the neighboring stations or from the literature that the reported discharge values were erroneous.
 The final selection yielded 663 sites and their associated interstation areas (Figures 1 and 2). This set yields a relatively consistent time series. More than 60% of these discharge gauging stations operated during the period 1970–1980.
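The record-length and catchment-area screens listed above can be sketched as a simple filter. This is a minimal illustration, not the semi-automated procedure used in the study: the `Station` record, its field names, and the sample values are hypothetical, and the interstation, same-cell, and anomalous-discharge checks, which require the network topology, are omitted.

```python
from dataclasses import dataclass

MIN_RECORD_YEARS = 12        # record-length threshold stated in the text
MIN_AREA_KM2 = 10_000.0      # catchment-area threshold stated in the text


@dataclass
class Station:
    # Hypothetical record layout, for illustration only.
    name: str
    record_years: int
    catchment_area_km2: float


def passes_basic_screens(station: Station) -> bool:
    """Return True if the station meets both the record and area thresholds."""
    return (station.record_years >= MIN_RECORD_YEARS
            and station.catchment_area_km2 >= MIN_AREA_KM2)


# Made-up sample stations, not GRDC data.
stations = [
    Station("A", 30, 250_000.0),   # long record, large basin: kept
    Station("B", 8, 500_000.0),    # record shorter than 12 years: dropped
    Station("C", 25, 4_000.0),     # basin smaller than 10,000 km2: dropped
]
selected = [s.name for s in stations if passes_basic_screens(s)]
print(selected)
```

Only the two numeric thresholds come from the text; the topology-dependent criteria would need the STN–30p network and neighboring-station information.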
Figures 3a and 3b demonstrate the highly consistent correspondence between GRDC-reported and simulated catchment and interstation areas. The catchment and interstation area comparisons show 7.5 and 11% mean absolute error with 2% and 3% bias, respectively. The positive bias is due to the STN–30p tendency, mentioned earlier, to overestimate catchment areas. These errors are similar to those obtained through earlier comparisons by Vörösmarty et al. [2000b]. Figure 4 shows the frequency distribution of the symmetric error in comparing reported and STN–30p estimated catchment and interstation areas. Here the symmetric normalized errors were computed as
where εsym is symmetric normalized error (percent), Xsim is simulated value, and Xobs is observed value.
 The selected gauging stations represent a significant proportion of the continental landmass. The total upstream area represented by the 298 most downstream stations (Figure 2) on the STN–30p network is 67 × 10^6 km2, i.e., >50% of the 133 × 10^6 km2 continental landmass in STN–30p (which does not include Antarctica, the glacierized portion of Greenland, and the Canadian Arctic Archipelago). If we consider only the actively discharging 93 × 10^6 km2 portion of STN–30p simulated networks (based on a 3 mm yr−1 runoff threshold), this data set represents 72% coverage. Table 1 gives individual continent totals.
Table 1 notes: Percentages are given relative to potential/contributing catchment area. The actively flowing portion of the continental landmass is defined as areas exceeding 3 mm yr−1 runoff.
 Monitored catchment area is an imperfect measure of how well the spatial distribution of runoff for a particular continent is represented. A good example is South America, which is an otherwise poorly documented continent, but, since its large river systems (such as the Amazon, Orinoco, and Parana) are monitored close to the river mouth, the percentage of monitored area appears high.
 GRDC mean monthly discharge data were originally intended to be used to calculate mean runoff. Interstation discharge, which is the difference between discharge upstream (entering into) and discharge downstream (leaving) of an interstation region, when divided by interstation area, yields runoff. This interstation runoff is partly affected by the time delay of the water traveling from the upstream stations to the outlet of the region and partly by the surplus runoff generated within the interstation region. In order to limit the impact of travel time delays, we had to ensure that the residence time of water traveling between adjacent stations would be short relative to the time step of our analysis. Travel time delays along river networks can be estimated by assuming uniform river flow velocity. If we assume an average river flow velocity of 1 m s−1, then a parcel of water can travel (3600 × 24 × 30) s × 1 m s−1 ≃ 2500 km in a month. Although most of the discharge gauging stations used in our analysis were much closer to each other than 2500 km, some of them (mainly on the Amazon) had distances on the order of several hundred kilometers; therefore the travel time delays were not negligible at the monthly time step, and we calculated the observed interstation runoff on an annual basis only for all sites.
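The travel-time reasoning above can be checked with a few lines of arithmetic. The sketch below assumes the same uniform 1 m s−1 velocity and 30-day month as the text; the function names and the 500 km example distance are illustrative choices, not values from the study.

```python
SECONDS_PER_DAY = 3600 * 24
DAYS_PER_MONTH = 30            # month length used in the text
FLOW_VELOCITY_M_PER_S = 1.0    # uniform velocity assumed in the text


def monthly_travel_distance_km(velocity_m_per_s: float = FLOW_VELOCITY_M_PER_S) -> float:
    """Distance a parcel of water covers in one 30-day month."""
    seconds_per_month = SECONDS_PER_DAY * DAYS_PER_MONTH
    return velocity_m_per_s * seconds_per_month / 1000.0


def travel_time_days(distance_km: float,
                     velocity_m_per_s: float = FLOW_VELOCITY_M_PER_S) -> float:
    """Travel time between two stations a given river distance apart."""
    return distance_km * 1000.0 / velocity_m_per_s / SECONDS_PER_DAY


distance_km = monthly_travel_distance_km()
delay_days = travel_time_days(500.0)   # hypothetical 500 km station spacing
print(distance_km, delay_days)
```

The exact product is about 2600 km per month, which the text rounds to roughly 2500 km; real channel velocities vary, so these numbers are order-of-magnitude guides only.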
2.4. Application of the Water Balance Model (WBM)
 A recent version of the UNH water balance model [Vörösmarty et al., 1998a] was used to generate a spatial distribution of monthly mean runoff at 30′ (longitude × latitude) spatial resolution for the global landmass. For the present study, the WBM used Hamon's temperature-based potential evaporation function [Hamon, 1963], which was “corrected” using a soil drying function having a quasi-daily time step. The use of a temperature-based potential evaporation function allowed us to make water balance calculations based on air temperature, precipitation, land cover, and soil information, all widely available at the target spatial resolution of 30′ and with a well-established record in Earth System analysis. Contemporary land cover classification was from “potential” vegetation [Melillo et al., 1993] overlaid with cultivated areas from Olson's land-use classification [Olson, 1992]. This composite vegetation was remapped to eight major cover types (conifer forest, broadleaf forest, savanna/shrubland, grassland, tundra/nonforested wetland, cultivation, desert, and open water), which were found to have characteristic evapotranspiration properties [Federer et al., 1996]. Dominant soil type and texture were from Food and Agriculture Organization/UNESCO. Land cover classification and dominant soil types were combined to estimate rooting depth and water holding capacity as given by Vörösmarty et al. [1996b]. Topographic data were from the ETOPO5 [Edwards, 1992] global elevation data set. Climatologically averaged monthly air temperature and precipitation fields were from C. J. Willmott and K. Matsuura (Willmott and collaborators' global climate resource pages, 1999 (available at http://climate.geog.udel.edu/climate/index.shtml)). These data were interpolated from station-based records and represent an update of earlier Legates and Willmott [1990a, 1990b] climate fields.
Figure 5 shows observed and simulated mean annual runoff averaged over each subbasin associated with a selected GRDC station. The numerical dispersion appears discouraging in terms of WBM performance. However, when we compare mean annual observed and simulated runoff to annual “observed” precipitation (Figures 6a and 6b), we see that WBM runoff is “well behaved” in relation to the precipitation data used, whereas the observed station runoff shows a wide disparity and a less regular relation to the precipitation. Observed runoff often exceeds the precipitation, which demonstrates a very obvious inconsistency between the two data sets.
 The distribution of simulated errors in Figure 7 shows a mean bias of 7.9 mm yr−1 and a standard deviation of 173 mm yr−1 over the full spectrum of basin sizes. This is virtually identical to that found on 679 instrumental catchments in the United States using the Hamon potential evaporation function [Vörösmarty et al., 1998a]. Positive bias means that WBM overestimates runoff. Bias is high for basins of 10^4 km2 or smaller (mean = 106.1 mm yr−1) in contrast to larger basins (mean = 6.1 mm yr−1 and 8.1 mm yr−1 for 10^5 and 10^6 km2, respectively). This supports the area threshold discussed by Vörösmarty et al. [2000b] below which a 30-min resolution will compromise water budget analysis.
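The bias and spread quoted above are summary statistics of the per-basin differences between simulated and observed runoff. A minimal sketch of that calculation follows, assuming the error is defined as simulated minus observed (so that a positive bias means overestimation, as the text states) and using the population standard deviation; the text does not say which estimator was used, and the sample values are made up.

```python
from statistics import mean, pstdev


def runoff_error_stats(simulated_mm_yr, observed_mm_yr):
    """Return (bias, standard deviation) of simulated-minus-observed runoff."""
    errors = [s - o for s, o in zip(simulated_mm_yr, observed_mm_yr)]
    # Population standard deviation is an assumption of this sketch.
    return mean(errors), pstdev(errors)


# Made-up per-basin mean annual runoff values (mm/yr), not data from the study.
simulated = [320.0, 150.0, 610.0, 45.0]
observed = [300.0, 180.0, 560.0, 40.0]

bias, spread = runoff_error_stats(simulated, observed)
print(bias, spread)
```

Grouping basins by size class before calling the function would give the per-class biases discussed in the text.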
2.5. Composite Runoff Fields
 Since observed river discharge at a station represents a spatial aggregation of distributed terrestrial runoff, the disaggregation of observed discharge to reconstitute a spatially distributed runoff field requires additional knowledge about the spatial patterns of runoff controls (i.e., atmospheric drivers, land cover, and soil properties) and potential time delays along flow pathways. Lacking this information, the only possibility is to assume a uniform spatial distribution and to distribute the observed interstation runoff uniformly over all interstation areas. We applied biophysical drivers to the WBM in order to produce a spatially varying runoff pattern within each interstation region. As we have discussed, WBM-type models can be inherently biased due to inaccuracies in climate forcings (most notably, precipitation) [Vörösmarty et al., 1998a; Federer et al., 1996]. By combining observed and simulated information, we can redistribute runoff across each interstation area while at the same time assuring that the aggregate runoff is consistent with observation and with mass conservation.
 Our method links WBM runoff and discharge gauging station data across individual interstation regions defined by the STN–30p and calculates a corrected mean for modeled runoff for each region. The simulated mean runoff can then be related to observed runoff over the same domain through a set of correction coefficients for each distinct interstation area.
 The procedure can be formalized as follows. The mean observed interstation runoff for interstation region i can be expressed as
$$\bar{R}_{o_i} = \frac{\bar{Q}_{o_i}}{A_{s_i}}$$
where $\bar{R}_{o_i}$ is mean annual observed interstation runoff [L/T], $\bar{Q}_{o_i}$ is mean annual interstation discharge [L3/T], and $A_{s_i}$ is interstation area defined by STN–30p [L2].
 The mean water balance runoff in the interstation region i becomes
$$\bar{R}_{w_i} = \frac{1}{A_{s_i}} \int_{A_{s_i}} R_{wbm} \, dA$$
where $\bar{R}_{w_i}$ is mean annual WBM runoff for $A_{s_i}$ [L/T] and $R_{wbm}$ is local (i.e., grid cell) annual WBM runoff [L/T]. The water balance runoff correction coefficient, $\xi_{s_i}$, for interstation area $A_{s_i}$ is calculated as
$$\xi_{s_i} = \frac{\bar{R}_{o_i}}{\bar{R}_{w_i}}$$
The corrected runoff for each grid cell, $R_c$ [L/T], then becomes
$$R_c = \xi_{s_i} R_{wbm}$$
 Assuming there is no substantial year-to-year water storage, the (ξsi) terms can be calculated on an annual basis. As we discussed in section 2.3, these coefficients can be calculated at shorter time steps only when the travel time delays are negligible and the runoff regime dominates the difference between upstream and the region's outlet discharge. Rc was calculated for only those interstation regions where the observed runoff was positive. Negative observed runoff within the interstation region means decreasing discharge along the river going downstream. It can occur naturally, when river waters infiltrate into the groundwater (e.g., Nile and Niger), or due to human water uptake for irrigation and interbasin transfers (e.g., Colorado). The resulting composite field is shown in Figure 8.
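The compositing step for a single interstation region can be sketched as follows. This is an illustrative reading of the procedure rather than the study's implementation: the function names and sample numbers are hypothetical, the regional WBM mean is taken as an area-weighted average of the grid cells, and returning `None` for regions with nonpositive observed or simulated runoff is an assumption of the sketch (the text only states that corrected runoff was computed where observed runoff was positive).

```python
def interstation_runoff_mm_yr(q_out_km3_yr, q_in_km3_yr, area_km2):
    """Observed interstation runoff: (outflow - sum of inflows) / area, in mm/yr."""
    net_discharge_km3_yr = q_out_km3_yr - sum(q_in_km3_yr)
    # km3/yr divided by km2 gives km/yr; 1 km = 1e6 mm.
    return net_discharge_km3_yr * 1.0e6 / area_km2


def area_weighted_mean(values, areas_km2):
    """Area-weighted mean of per-cell values over one interstation region."""
    return sum(v * a for v, a in zip(values, areas_km2)) / sum(areas_km2)


def composite_runoff(wbm_runoff_mm_yr, cell_areas_km2, observed_runoff_mm_yr):
    """Scale per-cell WBM runoff by the ratio of observed to simulated runoff.

    Returns (xi, corrected_cells), or None when the observed or simulated
    regional runoff is not positive.
    """
    simulated_mean = area_weighted_mean(wbm_runoff_mm_yr, cell_areas_km2)
    if observed_runoff_mm_yr <= 0.0 or simulated_mean <= 0.0:
        return None
    xi = observed_runoff_mm_yr / simulated_mean
    return xi, [xi * r for r in wbm_runoff_mm_yr]


# Made-up region with three grid cells; values are not from the study.
cell_areas = [2500.0, 2500.0, 5000.0]      # km2
wbm_runoff = [100.0, 200.0, 300.0]         # mm/yr, simulated per cell
observed = interstation_runoff_mm_yr(4.8, [3.0], sum(cell_areas))

xi, corrected = composite_runoff(wbm_runoff, cell_areas, observed)
print(xi, corrected, area_weighted_mean(corrected, cell_areas))
```

After scaling, the area-weighted mean of the corrected cells equals the observed interstation runoff, while the relative pattern among cells is the one simulated by the water balance model.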
3. Distribution of Contemporary Global Runoff
 One important application of the composite runoff fields (Figure 8) is a digital geography of spatially distributed terrestrial runoff. Various statistics and summaries by regions such as continents and receiving water bodies can be calculated (Tables 2 and 3).
Table 2. Distribution of Terrestrial Runoff (mm yr−1) by Continents and Receiving Water Bodies
Table 3. Distribution of Discharge (km3 yr−1) by Continents and Receiving Water Bodies
 We compared the UNH/GRDC composite fields to estimates made by Baumgartner and Reichel [1975], Korzoun et al. [1977], and GRDC [Grabs et al., 1996]. There is good general agreement over individual continents, but there also can be sizable disparities (Table 4). Runoff given by Korzoun et al. [1977] and the UNH/GRDC composite show best agreement in relatively wet continents and show less agreement in dry areas. For Australasia, there is a very large disparity. We think this is partly due to inconsistencies in the delineation of Australasia in the different studies. Unfortunately, the early studies do not provide enough information to reconstruct exactly their definition of Australasia. The agreement of UNH/GRDC with GRDC estimates [Grabs et al., 1996] is also best in wetter regions and is poorest in dry regions. Since the GRDC estimates assume similar runoff in the monitored and unmonitored portions of the continental landmass, the GRDC estimate has a tendency to overestimate dry continents like Africa.
Figure 9 shows the latitudinal runoff means for the landmass from UNH/GRDC and from Baumgartner and Reichel [1975]. The degree of agreement is generally quite good at the global scale, and major features of runoff generation are apparent, for example, the similar placement of the Intertropical Convergence Zone, the desert belt, and the Polar front. Significant differences occur only below 30°S. We have to note that Baumgartner and Reichel [1975] provide runoff over land below 55°S despite the absence of any meaningful landmass, except Antarctica.
 Calculating mean runoff by successively including river basins ranked by area (Figure 10) shows the progression toward global mean. Mean runoff calculated from the top 25 river basins (representing 40% of the continental landmass and 56% of the actively flowing portion of the landmass) is already within 5% agreement of the global mean runoff of 299 mm yr−1 (Table 4).
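The progression toward the global mean described above amounts to a running area-weighted mean over basins ranked by size. A minimal sketch, assuming each basin is given as a hypothetical `(area_km2, runoff_mm_yr)` pair with made-up values:

```python
from itertools import accumulate


def cumulative_mean_runoff(basins):
    """Running area-weighted mean runoff as basins are added, largest first.

    basins: iterable of (area_km2, runoff_mm_yr) pairs.
    """
    ranked = sorted(basins, key=lambda b: b[0], reverse=True)
    cumulative_area = list(accumulate(area for area, _ in ranked))
    cumulative_volume = list(accumulate(area * runoff for area, runoff in ranked))
    return [v / a for v, a in zip(cumulative_volume, cumulative_area)]


# Made-up basins, not values from the study.
basins = [(1.0e6, 200.0), (4.0e6, 500.0), (2.0e6, 100.0)]
running_mean = cumulative_mean_runoff(basins)
print(running_mean)
```

The last element of the returned list is the area-weighted mean over all basins supplied, which corresponds to the global mean when every basin is included.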
 When discharge to oceans (Table 5) is compared among the estimates of Korzoun et al. [1977], Baumgartner and Reichel [1975], and the summaries derived from the composite runoff fields, the latter tend to be lower than the first two estimates. Some differences might be due to a different delineation of ocean catchments. Furthermore, Korzoun et al.'s [1977] estimate includes groundwater flow to ocean, which could be significant in some regions. In general, both Korzoun et al.'s [1977] and Baumgartner and Reichel's [1975] estimates of the continental total discharge flux to ocean are higher than that of the composite runoff fields derived in this study.
 The global river discharge estimates published in the scientific literature vary considerably (38,800 [L'Vovich and White, 1990], 39,700 [Baumgartner and Reichel, 1975], 40,700 [Postel et al., 1996], 42,700 [Grabs et al., 1996], and 46,900 km3 yr−1 [Korzoun et al., 1977]). These differences are partly due to the differences in the set of discharge gauging stations used for the analysis (e.g., GRDC used 198 stations with a total of 52.3 × 10^6 km2 catchment area measuring 18,000 km3 yr−1 discharge, while the 298 most downstream stations out of the 663 considered in this study represent 67 × 10^6 km2 catchment area monitoring 20,700 km3 yr−1 discharge).
 Besides differences in the set of discharge gauging stations represented in the various continental discharge estimates, further differences in the final results arise from differences in how the measured runoff was extrapolated to unmonitored regions. The simplest approach is to assume similar runoff on the monitored and unmonitored portion of the continental landmass. Considering the 133 × 10^6 km2 of total area of the nonglacierized landmass, this assumption would result in a total of 41,000 km3 yr−1 (20,700 km3 yr−1 × 133 × 10^6 km2 ÷ 67 × 10^6 km2) annual discharge. Although this approach could be reasonable for some parts of the globe, it fails to recognize the fact that large portions of the unmonitored regions are actually dry (and there is no river water to monitor). If we proportionally reduce this estimate to represent the actively discharging area of the landmass (i.e., assume identical runoff on the unmonitored but actively discharging landmass), we get a total of 28,700 km3 yr−1 (41,000 km3 yr−1 × 93 × 10^6 km2 ÷ 133 × 10^6 km2) annual discharge. This estimate is much lower than any other estimate published, suggesting that the unmonitored but actively flowing portion of the continental landmass is probably wetter than the monitored average.
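The two proportional extrapolations above can be reproduced directly from the figures quoted in the text. The sketch below only restates that arithmetic; the variable names are illustrative.

```python
MONITORED_DISCHARGE_KM3_YR = 20_700.0   # 298 most downstream stations
MONITORED_AREA_KM2 = 67e6
LANDMASS_AREA_KM2 = 133e6               # nonglacierized landmass in STN-30p
ACTIVE_AREA_KM2 = 93e6                  # area exceeding the 3 mm/yr threshold

# Mean runoff over the monitored area (km3/yr per km2, converted to mm/yr).
monitored_runoff_mm_yr = MONITORED_DISCHARGE_KM3_YR / MONITORED_AREA_KM2 * 1.0e6

# Same runoff assumed over the whole landmass.
whole_landmass_km3_yr = MONITORED_DISCHARGE_KM3_YR * LANDMASS_AREA_KM2 / MONITORED_AREA_KM2

# Same runoff assumed over the actively discharging landmass only.
active_landmass_km3_yr = MONITORED_DISCHARGE_KM3_YR * ACTIVE_AREA_KM2 / MONITORED_AREA_KM2

print(monitored_runoff_mm_yr, whole_landmass_km3_yr, active_landmass_km3_yr)
```

Carrying unrounded intermediate values gives totals that round to the 41,000 and 28,700 km3 yr−1 quoted in the text.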
 The composite runoff fields developed within the present study capture a higher wetness for the unmonitored landmass (722 mm yr−1). The global total discharge estimate of 39,319 km3 yr−1 agrees best with earlier estimates made by Baumgartner and Reichel [1975] and L'Vovich and White [1990].
4. Conclusions and Summary
 River discharge data likely represent the most accurate quantitative information about the global terrestrial water cycle, but this information has not been uniformly adopted in Earth Systems studies, such as GCMs or terrestrial productivity models. Spatially distributed runoff fields that are derived from these flow observations have numerous applications, ranging from calibrating and validating the soil vegetation atmosphere transfer component of atmospheric models to providing sustainable water supply estimates in water resource assessments.
 With the advent of geographic information systems (GIS) technology and emerging global GIS data sets such as digital elevation models and corresponding simulated river routings, the linkage between observed river discharge at individual stations and spatially distributed fields of runoff can be established. This capability also offers new opportunities to improve the accuracy of estimates of climate variables such as precipitation and evaporation by closing the water budget over well-defined areas encompassed by sequential discharge gauging stations.
 This paper has applied a new method linking river discharge observations to spatially distributed runoff. A high-quality global river discharge data set was coupled with a well-validated simulated river network (STN–30p). By linking GRDC station data to STN–30p, numerous inconsistencies were found between the two data sets. These inconsistencies were due to either the insufficient resolution of the 30-min routing or errors in the linked data sets. Linking the two data sets highlighted the problems encountered when linking independently developed biophysical data sets at the global scale. Fortunately, this very linkage helped to identify potential problems and often provided information on how to correct the errors discovered. The composite runoff fields reported here are constrained by discharge observations but preserve the spatial and temporal distribution of runoff according to a water balance simulation, and we believe these to be the most accurate spatially distributed runoff fields available today over the global domain.
 The method is not limited to WBM runoff. Any distributed runoff estimate like the 1° × 1° runoff fields developed within the framework of the Global Soil Wetness Project [Dirmeyer et al., 1999] could serve as a basis for such compositing of spatially distributed and observed runoff.
 Besides development of the composite runoff fields, the present work allowed an evaluation to be made of the spatial coverage of present-day discharge observational networks. Roughly 50% of the continental landmass, representing 52% of the discharge, is now monitored. This highlights an important limitation on our ability to monitor the impact of global change on the terrestrial water cycle. Either modeling, expansion of the current land-based observational network [World Meteorological Organization, 1999; Vörösmarty et al., 2001], or innovative remote sensing techniques will be necessary [Vörösmarty et al., 1999; Birkett, 1998; Smith, 1997]. The authors of the present work are convinced that in the near term, reconstitution of a core discharge monitoring station network with potential real-time reporting at the global scale would allow significant improvement in our current capacity to quantitatively describe the global water cycle [Grabs et al., 1996]. A synoptic discharge gauging station network as a complement to the World Meteorological Organization (WMO) World Weather Watch program and the regional implementation of WMO's World Hydrological Cycle Observing System is advised.
 This work was supported by several sponsors, including NASA-TRMM (grant NAJ5-4785), NASA-EOS (grant NAJ5-6137), NASA Cooperative Agreement (grant NCC5-304), DOE (grant IR61473), NSF (grant ATM-9707953), and the U.S. Committee on Science for Hydrology. The UNH/GRDC Composite Runoff Fields V1.0 is available on CD-ROM from both the Global Runoff Data Centre (Federal Institute of Hydrology, Kaiserin Augusta-Anlagen 15-17, Koblenz, 56068, Germany, phone: +49 261 1306 5269, fax: +49 261 1306 5280) and the Water Systems Analysis Group, Complex System Research Center of University of New Hampshire (Morse Hall, 39 College Road, Durham, NH 03824, USA, phone: 603 862 1792, fax: 603 862 0188) and via the web (http://www.bafg.de/grdc.htm, http://www.watsys.sr.unh.edu/).