2.2.1. River Data
 River discharge and DIP concentration data were compiled for 111 medium-to-large river basins worldwide from several sources (Appendix A). Rivers used for model calibration and validation included basins from a broad range of latitudes, climate types, and sizes (Figure 1; Appendix A), and included both heavily developed basins and basins relatively free of human activity. Runoff in study basins ranged from 0.01 to 2.75 m yr−1, with a median of 0.28 m yr−1. Median annual DIP concentration ([DIP]) ranged from 0.002 to 0.810 mg L−1, with a median of 0.030 mg L−1 across all systems. When possible, we used flow-weighted mean [DIP] in our analysis (17 cases); otherwise, we used median [DIP] (90 cases) or mean [DIP] (four cases). The combined discharge of the study rivers accounts for over 50% of world exorheic river discharge (37,400 km3 yr−1 [Meybeck, 1982]), with 37% in the calibration data set and 14% in the validation data set.
Figure 1. Spatial distribution of basins used to calibrate and validate the NEWS-DIP model. Bold lines represent approximate borders used to delineate continents for continental scale aggregations. See Appendix A for data, model output, and basin names.
 We applied several filters to the available data to ensure quality and appropriate application. We included only data for which we could verify that both flows and concentrations had been measured (i.e., were not model-derived). We limited our analysis to concentration and discharge data collected after 1970. We also limited our calibration and validation analyses to basins encompassing more than ten 0.5° × 0.5° grid cells (see section 2.2.3). We selected the most seaward freshwater sampling point on each river included in our study, and the great majority of stations were located within 50 km of the coast. Finally, we limited our analysis to exorheic river basins, i.e., basins exporting water to one of the major oceans, the Mediterranean Sea, or the Black Sea. Rivers discharging to the Baltic Sea were treated as discharging to the Atlantic Ocean.
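The screening steps above can be sketched as a single filtering pass. This is a minimal illustration, assuming a hypothetical `Station` record (all field names are ours, not from the compiled data set) and assuming that the "major oceans" are the Atlantic, Pacific, Indian, and Arctic; it applies each filter and then keeps the most seaward qualifying station on each river.

```python
from dataclasses import dataclass


# Hypothetical record layout; field names are illustrative only.
@dataclass
class Station:
    river: str
    year: int              # sampling year
    measured_flow: bool    # discharge measured (not model-derived)
    measured_conc: bool    # [DIP] measured (not model-derived)
    freshwater: bool       # station lies in fresh water
    n_cells: int           # 0.5 x 0.5 degree cells in the delineated basin
    receiving_sea: str     # water body the river discharges to
    km_from_coast: float   # distance of the station from the coast


# ASSUMPTION: which water bodies count as "major oceans".
EXORHEIC_RECEIVERS = {"Atlantic", "Pacific", "Indian", "Arctic", "Mediterranean", "Black Sea"}


def select_stations(stations):
    """Apply the quality filters and keep the most seaward station per river."""
    best = {}
    for s in stations:
        # Rivers draining to the Baltic are treated as Atlantic rivers.
        sea = "Atlantic" if s.receiving_sea == "Baltic" else s.receiving_sea
        if not (s.measured_flow and s.measured_conc):
            continue  # model-derived flow or concentration
        if s.year <= 1970 or s.n_cells <= 10 or not s.freshwater:
            continue  # too old, basin too small, or not a freshwater station
        if sea not in EXORHEIC_RECEIVERS:
            continue  # endorheic or otherwise excluded receiving water
        current = best.get(s.river)
        if current is None or s.km_from_coast < current.km_from_coast:
            best[s.river] = s
    return best
```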
 In addition to the 111 medium-to-large basins used for model calibration and validation, we used data from 393 smaller basins in the continental United States (each containing fewer than ten 0.5° × 0.5° grid cells; area range 70–38,616 km2) to test NEWS-DIP's ability to predict DIP load for smaller watersheds. These data were taken from the same sources and subjected to the same quality-control filters as the medium-to-large basins.
2.2.2. Model Calibration, Validation, and Sensitivity Analysis
 For the NEWS-DIP model, approximately half of the basins (56 rivers) in our global river data set were randomly assigned as calibration rivers (Figure 1). Calibration was achieved by optimizing the model to attain the highest model efficiency (R2) while keeping coefficients within the range of literature values. Model efficiency (capital R2, not the coefficient of determination, r2) measures the degree of fit between measured and modeled values [Nash and Sutcliffe, 1970] and has a maximum value of 1. When R2 = 1, all points fall on the 1:1 line. When R2 = 0, model error equals the variability in the data, so the model predicts no better than the mean of the measurements; negative values indicate a fit worse than that mean. Coefficients a, b, Lmax, and Wmax were the only calibrated coefficients in the NEWS-DIP model; no coefficients relating to point source inputs or reservoir retention were calibrated.
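The difference between model efficiency and r2 is easiest to see numerically. The sketch below (function names are ours) computes R2 as 1 minus the ratio of squared model error to the squared deviations of the measurements from their mean, alongside the squared Pearson correlation. In this toy example, predictions that are perfectly correlated with the measurements but twice as large score r2 = 1 yet R2 = −5.

```python
def model_efficiency(measured, modeled):
    """Nash-Sutcliffe model efficiency: 1 - SSE / SST."""
    mean_m = sum(measured) / len(measured)
    sse = sum((m - p) ** 2 for m, p in zip(measured, modeled))
    sst = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - sse / sst


def r_squared(measured, modeled):
    """Coefficient of determination r2 (squared Pearson correlation)."""
    n = len(measured)
    mx, my = sum(measured) / n, sum(modeled) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, modeled))
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in modeled)
    return sxy * sxy / (sxx * syy)


measured = [10.0, 20.0, 30.0, 40.0]
print(model_efficiency(measured, measured))    # 1.0 (all points on the 1:1 line)
print(model_efficiency(measured, [25.0] * 4))  # 0.0 (no better than the mean)
doubled = [2 * m for m in measured]
print(r_squared(measured, doubled), model_efficiency(measured, doubled))  # 1.0 -5.0
```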
 We determined the potential ranges for coefficients a, b, Wmax, and Lmax through analysis of published data. Coefficients a and b determine, respectively, the inflection point and the steepness of the curve relating non-point source P (including weathered P) to runoff. Wmax defines the upper limit for weathering-derived DIP (kg P km−2 yr−1). Lmax defines the upper limit for the fraction of applied manure and P fertilizer that is carried downstream. To determine the ranges of a, b, and Wmax (as well as the shape of the relationship between runoff and weathering-derived P), we examined published data for P export from basins receiving low levels (<100 kg P km−2 yr−1) of anthropogenic P. For these basins, there appeared to be a sigmoid relationship between runoff and DIP yield, with an inflection point somewhere between 0 and 1 m yr−1 runoff. We therefore allowed the inflection point of modeled DIP yield (defined by a) to vary between 0 and 1 m yr−1, with a step size of 0.05 m yr−1. We allowed b (defining the steepness of the rising arm of the sigmoid curve relating runoff to weathered P export) to vary between 1 and 20, with a step size of 1. In these relatively uninhabited basins, P yield never exceeded 40 kg P km−2 yr−1. Therefore, in our calibration procedure, we allowed Wmax to vary between 10 and 40 kg P km−2 yr−1 with a step size of 2 kg P km−2 yr−1.
 We assumed that non-point P would respond to runoff in the same way as weathered P (i.e., as the same function of runoff), and therefore applied the same a and b to non-point P mobilization as to weathering-derived P. The maximum non-point P mobilized (Lmax) is treated as a fraction of the non-point P (fertilizer and manure) applied, and we determined its potential range from literature values. Plot-level and regional studies have generally found that about 0.1–3.3% of the P applied as fertilizer or manure is lost as DIP via surface water transport [Burwell et al., 1997; McColl et al., 1977; Nicholaichuk and Read, 1978; Sharpley and Syers, 1979; McDowell and McGregor, 1984; Sharpley et al., 1995; Bennett et al., 1999; Baker and Richards, 2002]. However, these studies have generally not been conducted in areas subject to high runoff rates, and the fraction of applied P fertilizer and manure lost to surface waters as DIP is likely to be higher in such systems. For manure in particular, this percentage can vary substantially depending on livestock diet, soil sorption capacity, and runoff. Reported values for export of DIP from manure under ideal conditions for DIP loss (P-rich manure, no conservation tillage, and intense, artificial rain events) range up to 40% [Ebeling et al., 2002]. For this study, we allowed Lmax to vary between 1% and 10% with a step size of 1%. We then used a script to test every combination of these coefficient values (an exhaustive grid search) and chose the set yielding the highest R2.
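The exhaustive search described above can be sketched as follows. This is a minimal illustration, not the NEWS-DIP code: it assumes a hypothetical yield model in which weathering-derived P (Wmax) and mobilized fertilizer and manure P (Lmax times the applied P) are both scaled by the same runoff response, and it assumes a logistic curve with its inflection at runoff = a for that response, because the model's actual functional form is given elsewhere. Point sources, consumptive use, and reservoir retention are omitted since they were not calibrated.

```python
import math
from itertools import product

# Coefficient grids as described in the text.
A_GRID = [round(0.05 * i, 2) for i in range(21)]   # a: 0 to 1 m/yr, step 0.05
B_GRID = list(range(1, 21))                        # b: 1 to 20, step 1
WMAX_GRID = list(range(10, 41, 2))                 # Wmax: 10 to 40 kg P/km2/yr, step 2
LMAX_GRID = [i / 100 for i in range(1, 11)]        # Lmax: 1% to 10%, step 1%


def runoff_response(runoff, a, b):
    # ASSUMPTION: logistic curve rising from 0 to 1 with its inflection at runoff = a.
    return 1.0 / (1.0 + math.exp(-b * (runoff - a)))


def model_efficiency(measured, modeled):
    mean_m = sum(measured) / len(measured)
    sse = sum((m - p) ** 2 for m, p in zip(measured, modeled))
    sst = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - sse / sst


def calibrate(basins):
    """basins: (runoff m/yr, applied fertilizer + manure P kg/km2/yr, measured DIP yield)."""
    measured = [y for _, _, y in basins]
    best_score, best_params = -math.inf, None
    for a, b in product(A_GRID, B_GRID):
        # The runoff response depends only on (a, b), so compute it once per pair.
        f = [runoff_response(r, a, b) for r, _, _ in basins]
        for wmax, lmax in product(WMAX_GRID, LMAX_GRID):
            modeled = [fi * (wmax + lmax * p) for fi, (_, p, _) in zip(f, basins)]
            score = model_efficiency(measured, modeled)
            if score > best_score:
                best_score, best_params = score, (a, b, wmax, lmax)
    return best_params, best_score
```

The four grids span 21 × 20 × 16 × 10 = 67,200 combinations, which is small enough that a brute-force search finishes in about a second even in plain Python.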
 The 55 basins that had not been used for model calibration were assigned to a validation data set. We used this data set to evaluate NEWS-DIP bias and precision according to Alexander et al. . Prediction error (K) is expressed, as by Alexander et al. , as
where L is the model prediction, and M is the measured stream DIP export.
 We determined the change in model efficiency (R2) [Nash and Sutcliffe, 1970] upon removal of individual model components (e.g., point sources, non-point sources, weathering sources, consumptive use, and reservoir DIP retention), and used this change to evaluate the relative importance of each component in explaining DIP export. We also subjected the NEWS-DIP model to a sensitivity analysis in which we varied each model input and coefficient, and each combination of inputs and coefficients, by ±5% and quantified the model's response to these variations.
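One way to implement a ±5% perturbation scheme over all combinations is to scale every parameter by 0.95, 1.0, or 1.05 in a full factorial design and record the relative change in output. The sketch below assumes a generic `model` callable that maps a dict of inputs and coefficients to a single output (a hypothetical interface, not the NEWS-DIP code). The number of runs grows as 3^n − 1 for n parameters, so the full design is only practical for a modest number of parameters.

```python
from itertools import product


def sensitivity(model, base, delta=0.05):
    """Relative output change for every combination of +/-delta perturbations.

    Each parameter is scaled by (1 - delta), 1, or (1 + delta); the unperturbed
    case is skipped. Result keys are tuples of scale factors in the order of
    base's keys.
    """
    names = list(base)
    y0 = model(base)
    results = {}
    for factors in product((1 - delta, 1.0, 1 + delta), repeat=len(names)):
        if all(f == 1.0 for f in factors):
            continue
        perturbed = {n: base[n] * f for n, f in zip(names, factors)}
        results[factors] = (model(perturbed) - y0) / y0
    return results


# Toy model: output is the product of two inputs.
res = sensitivity(lambda p: p["x"] * p["y"], {"x": 2.0, "y": 3.0})
worst = max(res, key=lambda k: abs(res[k]))
print(len(res), worst, round(res[worst], 4))
```

For this toy product model, raising both inputs by 5% moves the output by about 10.25%, more than either single perturbation, which is why combined perturbations are worth testing alongside one-at-a-time changes.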
 In addition to our work with NEWS-DIP, we evaluated r2, R2, model bias, and model precision for two other DIP export models. One of these was a model recently published by Smith et al. , developed as a product of the International Geosphere Biosphere Program-Land Ocean Interactions in the Coastal Zone (IGBP-LOICZ) project and hereinafter referred to as LOICZ-DIP. LOICZ-DIP predictions were compared with measurement data not used in the formulation of the original LOICZ-DIP model. We carried out a similar analysis for a quasi-empirical model developed for large rivers [Caraco, 1995] (hereinafter referred to as CARACO-DIP); this model used literature values to constrain its coefficients and is consistent with the physical drivers of DIP export, but had not been validated against non-calibration data. For both models, we used 47 river basins with more than ten cells each, all of which were also used to validate NEWS-DIP. The remaining eight basins in the NEWS-DIP validation data set were excluded from the LOICZ-DIP and CARACO-DIP validation data sets because they had been used in the original calibration of those models. We used SPSS 11.5.1 for basic statistical procedures and Matlab 6.0 for model calibration and R2 calculations.
2.2.3. Hydrological Inputs and Reservoir Retention
 An updated version of the STN30-p global river network (STN30-p version 6.0 [Vörösmarty et al., 2000a, 2000b]) was used to define basin boundaries for model runs at 0.5° × 0.5° resolution. Because of this coarse resolution, and because basins were delineated using a digital elevation model, STN30-p basin shapes and sizes deviated somewhat from actual basin shapes and sizes, and the deviation worsened as basins decreased in size. For basins encompassing more than ten 0.5° × 0.5° grid cells, the average ratio of modeled to measured basin size was 1.18 ± 0.48 (1 S.D.). For basins with fewer than ten grid cells, however, modeled basin size was generally too large and quite variable (mean ratio 7.14 ± 28.98 (1 S.D.)). We therefore limited our calibration and validation analyses to basins that contained more than ten 0.5° × 0.5° grid cells. We used modeled runoff estimates from the water balance model (WBM) described by Vörösmarty et al. [2000a, 2000b] to supply runoff values for model runs.
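Because 0.5° × 0.5° cells shrink toward the poles, the ten-cell threshold corresponds to different basin areas at different latitudes. A short sketch, assuming a spherical Earth with a mean radius of 6,371 km (an approximation, not the STN30-p cell-area table), gives roughly 3,100 km2 per cell near the equator and about half that at 60°, so ten cells span roughly 31,000 km2 at the equator but only about 15,000 km2 at 60°.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation


def cell_area_km2(lat_center_deg, size_deg=0.5):
    """Area of a size_deg x size_deg grid cell centered at lat_center_deg."""
    lat_south = math.radians(lat_center_deg - size_deg / 2)
    lat_north = math.radians(lat_center_deg + size_deg / 2)
    dlon = math.radians(size_deg)
    # Area between two parallels over a longitude span: R^2 * dlon * (sin N - sin S)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(lat_north) - math.sin(lat_south))


for lat in (0.25, 30.25, 45.25, 60.25):
    one = cell_area_km2(lat)
    print(f"lat {lat:5.2f}: one cell {one:7.0f} km2, ten cells {10 * one:8.0f} km2")
```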
 To estimate the impact of consumptive water (and thus DIP) use on DIP export, we multiplied predicted DIP yield by the ratio of measured post-dam water discharge to measured pre-dam water discharge (Qact/Qnat). Values for Qact/Qnat were taken from Meybeck and Ragu , and were assumed to equal 1 when unavailable. This approach assumes that DIP removed from rivers for irrigation or other consumptive purposes does not return to surface drainage waters.
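The consumptive-use correction is a single multiplicative factor. A minimal sketch, with hypothetical function and argument names, that falls back to a ratio of 1 when either discharge value is missing:

```python
from typing import Optional


def apply_consumptive_use(dip_yield: float, q_act: Optional[float] = None,
                          q_nat: Optional[float] = None) -> float:
    """Scale DIP yield (kg P/km2/yr) by Qact/Qnat; use a ratio of 1 if data are missing."""
    if q_act is None or q_nat is None:
        return dip_yield
    if q_nat <= 0:
        raise ValueError("pre-dam discharge must be positive")
    return dip_yield * (q_act / q_nat)


print(apply_consumptive_use(30.0, q_act=60.0, q_nat=80.0))  # 22.5
print(apply_consumptive_use(30.0))                           # 30.0
```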
 For our estimate of DIP retention by reservoirs, we used a spatially explicit dam database [Vörösmarty et al., 1997]. This database includes locations and reservoir volumes for 714 large (>15 m tall) dams worldwide. We calculated P retention in reservoirs (D) according to Wilhelmus et al.  as
where Rt is the change in retention time (days) due to the creation of reservoirs, calculated by dividing reservoir capacity by reservoir discharge as done by Vörösmarty et al. . However, rather than clustering reservoirs by sub-basin as done by Vörösmarty et al. , we evaluated Rt for every reservoir for which we had data. This approach may somewhat overestimate the impact of reservoirs when reservoirs occur in sequence and relatively close together. It may also underestimate the impact of reservoirs by failing to include smaller, but potentially important, impoundments. However, this approach represents a marked improvement over ignoring reservoirs altogether as previous models have done.
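Computing Rt reservoir by reservoir, rather than per sub-basin cluster, is straightforward. The sketch below assumes capacity in km3 and mean discharge in km3 yr−1 (our choice of units) and uses a hypothetical `Reservoir` record; the retention D itself follows the Wilhelmus et al. relationship cited above and is not reproduced here.

```python
from dataclasses import dataclass

DAYS_PER_YEAR = 365.0


@dataclass
class Reservoir:  # hypothetical record
    name: str
    capacity_km3: float
    discharge_km3_per_yr: float


def retention_time_days(res: Reservoir) -> float:
    """Rt = capacity / discharge, converted from years to days."""
    return res.capacity_km3 / res.discharge_km3_per_yr * DAYS_PER_YEAR


reservoirs = [Reservoir("upstream", 10.0, 40.0), Reservoir("downstream", 2.0, 100.0)]
for r in reservoirs:
    print(r.name, round(retention_time_days(r), 2))
```

Because each dam is evaluated on its own discharge, two closely spaced reservoirs on the same reach each contribute a full Rt, which is the source of the possible overestimate for reservoirs in sequence.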
2.2.4. Point Source Inputs
 Point sources are critically important in defining DIP export by many rivers. Previous global DIP models have included point sources as major drivers of DIP export, but have relied solely on population density [Smith et al., 2003] or estimated urban population density [Caraco, 1995] as predictors of DIP point sources, ignoring potentially important factors such as variability in P excretion rates, sewerage, and wastewater treatment. P excretion rates, sewerage, and wastewater treatment all vary substantially at the global scale [World Health Organization (WHO)/UNICEF, 2001; Bouwman et al., 2005b], so including them in an estimate of point source inputs is likely to enhance model predictive capacity. We calculated net DIP point source emission to surface water similarly to Bouwman et al. [2005a] as
where Ecap is net phosphorus emission to surface water (kg person−1 yr−1), T is the rate of P removal via wastewater treatment (i.e., P retention as a fraction of the P influent to treatment plants; 0–1), I is the fraction of the population connected to sewerage systems (0–1), and Pem is the gross human P emission (g P person−1 d−1). In addition to sewage, P-based detergents may constitute a significant source of surface water DIP in many countries. However, they are not explicitly accounted for by NEWS-DIP owing to a lack of input data.
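The variables defined above combine naturally into a per-capita calculation. The sketch below assumes that Ecap is the gross emission Pem, converted from g P d−1 to kg P yr−1, multiplied by the sewered fraction I and by the fraction not removed in treatment (1 − T). This form follows from the stated definitions and units but is our reconstruction, and the function name is hypothetical.

```python
def net_emission_per_capita(p_em_g_per_day: float, sewered: float, removal: float) -> float:
    """Net P emission to surface water (kg P per person per year).

    ASSUMED form: Ecap = (1 - T) * I * Pem * 365 / 1000.
    """
    for label, value in (("I", sewered), ("T", removal)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must lie in [0, 1], got {value}")
    gross_kg_per_year = p_em_g_per_day * 365.0 / 1000.0  # g/d -> kg/yr
    return (1.0 - removal) * sewered * gross_kg_per_year


# Hypothetical country: 6 g P/person/d, 90% sewered, 50% P removal in treatment.
print(round(net_emission_per_capita(6.0, 0.9, 0.5), 4))  # 0.9855
```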
 We used a conceptual relationship of per capita human P emission and per capita income similar to that used by Van Drecht et al.  for nitrogen,
where Pem is per capita daily human P emission (g P person−1 d−1) and GDP is per capita gross domestic product (1995 US$ per capita per year). GDP for each country is divided by 43,639, the world's highest per capita GDP in 1995 (Switzerland) [World Bank, 2000]. Low-income countries have human per capita P emissions of about 1.3 kg P yr−1 (about 3.6 g P d−1), and industrialized countries between 2.3 and 2.6 kg P yr−1 (about 6.3–7.1 g P d−1). Data for I (equation (8)) were extracted from Bouwman et al. [2005b].
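Because Pem is expressed per day while the emission levels quoted above are per year, a quick conversion helps compare them. The helpers below (hypothetical names) convert kg P yr−1 to g P d−1 and apply the GDP normalization by 43,639; the Pem–GDP relationship itself is not reproduced here.

```python
MAX_GDP_1995 = 43_639.0  # highest per capita GDP in 1995 (Switzerland), 1995 US$


def normalized_gdp(gdp_per_capita: float) -> float:
    """Per capita GDP scaled by the 1995 world maximum."""
    return gdp_per_capita / MAX_GDP_1995


def kg_per_year_to_g_per_day(kg_per_year: float) -> float:
    return kg_per_year * 1000.0 / 365.0


for label, kg in (("low income", 1.3), ("industrialized, low", 2.3), ("industrialized, high", 2.6)):
    print(f"{label}: {kg} kg P/yr = {kg_per_year_to_g_per_day(kg):.2f} g P/d")
print(normalized_gdp(4_363.9))  # about 0.1
```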
 For countries where sewage treatment data were available (16% of countries globally), T was calculated as
$$T = 0.1\,F_{mech} + 0.35\,F_{biol} + 0.8\,F_{adv}$$
where Fmech, Fbiol, and Fadv are the fractions of each country's sewage that has mechanical, biological, and advanced treatment, respectively. Coefficients for each treatment type (0.1, 0.35, and 0.8) were assigned as suggested by Black and Veach Consulting Engineers  and Slam . For countries where no data on sewage treatment were available we used regional estimates of sewage treatment [WHO/UNICEF, 2001; Bouwman et al., 2005b].
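Assuming T is the treatment-weighted sum of the removal coefficients 0.1, 0.35, and 0.8, with untreated sewage contributing no removal, it can be computed as below. The function name and the check on the fractions are our additions.

```python
REMOVAL_COEFF = {"mechanical": 0.1, "biological": 0.35, "advanced": 0.8}


def treatment_removal(f_mech: float, f_biol: float, f_adv: float) -> float:
    """Fraction of influent P removed by wastewater treatment (T, 0-1).

    ASSUMED form: T = 0.1*Fmech + 0.35*Fbiol + 0.8*Fadv.
    """
    fractions = (f_mech, f_biol, f_adv)
    if any(f < 0 for f in fractions) or sum(fractions) > 1.0 + 1e-9:
        raise ValueError("treatment fractions must be non-negative and sum to at most 1")
    return (REMOVAL_COEFF["mechanical"] * f_mech
            + REMOVAL_COEFF["biological"] * f_biol
            + REMOVAL_COEFF["advanced"] * f_adv)


# Hypothetical country: 20% mechanical, 50% biological, 20% advanced, 10% untreated.
print(round(treatment_removal(0.2, 0.5, 0.2), 3))  # 0.355
```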
2.2.5. Diffuse P Sources
 There is some indication that non-point P inputs such as fertilizer and manure are important sources of DIP to surface waters during runoff events [Ebeling et al., 2002; Baker and Richards, 2002] at local to regional scales. However, manure and fertilizer P inputs have not explicitly been included in past efforts to model DIP export at the global scale. In NEWS-DIP, P inputs from inorganic P fertilizer (Pfe) and animal manure (Pam) were calculated as done by Bouwman et al. [2005a]. For fertilizer P inputs, national P-use data were used and distributed across agricultural areas, maintaining different application rates for different crop types as done by Bouwman et al. [2005a]. For manure P inputs, we used published N:P ratios of manure for various livestock species, including pigs, cows, chickens, sheep, goats, and horses [Bouwman et al., 2005a].
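The two input calculations can be sketched as below under a hypothetical interface: national fertilizer P is spread over crop types so that application rates keep fixed relative proportions while the national total is conserved, and manure P is obtained by dividing manure N by a species-specific N:P mass ratio. The crop areas, relative rates, and N:P ratios in the example are placeholders, not the values used by Bouwman et al. [2005a].

```python
def allocate_fertilizer_p(national_p_kg, crop_area_km2, relative_rate):
    """Per-crop application rates (kg P/km2) proportional to relative_rate.

    Rates are scaled so that sum(rate * area) equals the national P use.
    """
    weighted_area = sum(crop_area_km2[c] * relative_rate[c] for c in crop_area_km2)
    scale = national_p_kg / weighted_area
    return {c: scale * relative_rate[c] for c in crop_area_km2}


def manure_p_kg(manure_n_kg, n_to_p_ratio):
    """Manure P per livestock species from manure N and an N:P mass ratio."""
    return {sp: manure_n_kg[sp] / n_to_p_ratio[sp] for sp in manure_n_kg}


# Placeholder inputs for illustration only.
rates = allocate_fertilizer_p(400_000.0, {"wheat": 2000.0, "maize": 1000.0},
                              {"wheat": 1.0, "maize": 2.0})
print(rates)  # {'wheat': 100.0, 'maize': 200.0}
print(manure_p_kg({"pigs": 600.0, "cows": 900.0}, {"pigs": 3.0, "cows": 6.0}))
```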