We present an overview of national water databases managed by the U.S. Geological Survey, including surface-water, groundwater, water-quality, and water-use data. These are readily accessible to users through web interfaces and data services. Multiple perspectives of data are provided, including search and retrieval of real-time data and historical data, on-demand current conditions and alert services, data compilations, spatial representations, analytical products, and availability of data across multiple agencies.
Hydrologic data are central to the mission of water programs at the U.S. Geological Survey (USGS) as reflected in the recently released Water Science Strategy – Observing, Understanding, Predicting, and Delivering Water Science to the Nation (Evenson et al. 2013). We collect a wide range of hydrologic data, assure the quality of those data, and make them freely available in national databases. The most institutionally significant of those databases is the National Water Information System (NWIS), which began as a core USGS capability in the 1980s and has been maintained and improved to meet ongoing requirements. NWIS included a blending of several previous database systems, and was ultimately the successor to the USGS Water Data Storage and Retrieval System (WATSTORE). Data from NWIS are available to the public through a web interface (http://waterdata.usgs.gov) and through data services (http://waterservices.usgs.gov). Instructions and tutorials are provided through an online help system (http://help.waterdata.usgs.gov/). In this paper, we describe the objectives of these and other USGS water data systems, the size and variety of holdings, the level of usage, and the variety of data delivery mechanisms that these USGS systems employ.
The USGS water data systems follow a set of key principles:
The data stored and delivered meet USGS quality-assurance standards. These are mostly data collected by USGS, but some are collected by others and quality-assured through very specific agreements and protocols.
Access to the data is the same regardless of the geographic location of a data collection site or the data user. The key principle is that users should not need to learn multiple data delivery mechanisms as they shift their geographical focus. The system needs to serve users operating at all scales from local to regional to national.
As much as possible, the system delivers all types of water resources data using the same methods and definition of terms. The data cover open channel flow, groundwater, water levels, water chemistry, water use, and other water-related data. There are major differences between different types of hydrologic monitoring environments (for example: rivers, lakes, and wells), however, so the system needs to be organized in a way that recognizes any intrinsic differences. There is just one data dictionary and set of definitions of sites and variables.
The system integrates the most current information (in some cases minutes old) with historical information (dating back a century or even more). This enables users to put current information into a historical context.
The system delivers data in many different product forms. These include graphical, tabular, and text presentations that are equally useful on web browsers and for export to applications. The system supports automatic data delivery and provides data on request. Through this wide range of products, we aim to make the information available to individuals who range from casual users curious about the environment or about recreational opportunities, to highly specialized users who will ingest the data into scientific analyses or water-management models.
The database contains only the data and not interpretive products or analyses based on the data. Methods of analysis are always evolving as our understanding of hydrology and the available analytical tools improve, and users should be able to control the analysis they conduct. The data themselves are fundamental and do not change over time.
In addition to a well-designed graphical user interface (including help and tutorial information accessible from all pages), the system has clearly defined and documented Application Programming Interfaces, or data services, that conform to international standards where available and on which models and other value-added products can be based.
Application of these principles over 125 years has provided some basic “lessons learned” that may be valuable to other organizations considering similar water-data acquisition and management programs:
Apply quality assurance practices consistently and without compromise.
Use standard, well-documented techniques and methods for data collection, processing, and management.
Make sure that the long-term integrity and availability of data are accepted as mission critical.
Do not shortcut metadata; these are essential to categorize, qualify, and search data.
Develop technical and joint-funding partnerships with other organizations that share an interest in high-quality and robust water-resources data to assure the sustainability of data collection programs and long-term data availability.
The amount of data stored in the system is significant and always growing. Table 1 lists some of the major data types in the system and the number of sites where collected. Table 2 lists the number of data values or records they represent. An important data type in hydrology is daily values. These data represent either one representative measurement per day or the average of many observations per day. These datasets typically span many years or decades and are of great value in determining long-term statistics and trends in hydrologic conditions. Table 3 lists the major types of daily values and the number of observations of each type.
Table 1. Number of sites by data type provided in the USGS NWIS public database as of January 27, 2014.
Number of Sites
*Some sites have multiple data types
All monitoring sites
Water quality data
Tide level data
Daily values data
Streamflow measurements data
Groundwater level measurements data
Water quality samples data
Peak discharges (floods) data
Table 2. Number of data values provided in the USGS NWIS public database as of January 27, 2014.
Number of Data Values
Groundwater level measurements
Water quality samples
Water quality analyses
Peak discharges (floods)
Table 3. Number of daily values in the USGS NWIS public database by major data type as of January 27, 2014.
Number of Daily Values
Stage (water level in surface water)
Water level in wells
pH of water
Specific conductance of water
Other daily values
The multi-purpose nature of the data delivery system is a crucial characteristic. By design, it provides products that benefit many different types of users who have diverse needs and have diverse levels of knowledge. The data are very well suited to national and regional scales of analysis, and in some cases, local scales as well.
Example uses include:
Active water management (hourly to monthly time scales)
Water planning (short and long term)
Establishing initial conditions for water forecasting models
Risk assessment (flood and low-flow frequency)
Water quality assessment and regulation
Scientific research on water quantity and quality
We can characterize current levels of usage of the system in several ways. During 2013, there were over 300 million pages accessed through users’ web browsers and the equivalent of an additional 190 million page views through data services. Data services provide for direct computer-to-computer delivery of the data product. For users who will use the data as input to some application (for analysis or operations), the advent of these data services has been a vast improvement over downloading through a web browser and then having to manipulate the output to make it suitable for input to user applications. The growth in the use of USGS hydrologic data is clearly in the area of data services although browser use continues at an average of 25 million data views per month.
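The computer-to-computer delivery described above can be illustrated with a minimal Python sketch that only assembles a request URL and does not contact the service. The host name comes from this paper; the path /nwis/iv/, the query parameter names, the parameter code 00060 for discharge, and the station number are assumptions used for illustration and should be checked against the current service documentation.

```python
from urllib.parse import urlencode

# Host name as given in the text. The "/nwis/iv/" path and the query
# parameter names below are assumptions, not confirmed by this paper.
BASE_URL = "http://waterservices.usgs.gov/nwis/iv/"


def build_iv_url(site, parameter_code, period="P1D", fmt="json"):
    """Return a request URL for recent instantaneous values at one site."""
    query = {
        "format": fmt,
        "sites": site,
        "parameterCd": parameter_code,
        "period": period,  # ISO 8601 duration; P1D means the most recent day
    }
    return BASE_URL + "?" + urlencode(query)


# Illustrative call: an example station number and an assumed
# parameter code for discharge.
example_url = build_iv_url("01646500", "00060")
```

An application would pass a URL such as example_url to its HTTP client and read the response directly, which avoids the manual download and reformatting step that browser-based retrieval requires.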
Figure 1 summarizes the history of system use. Major events such as widespread flooding or drought explain some of the temporal variability. Growth over time depends on general user awareness, but growth has also coincided with the release of new features.
The Fundamental Time Step
For more than a century, from 1888 through the 1990s, the primary product of much of USGS water data collection was the daily value of a hydrologic variable of interest. The most common of these was daily mean discharge, but the products also included daily mean values of parameters such as temperature, specific conductance, or water level in wells. This approach was driven by practical limitations of records processing and print publishing. Before the 1960s, manual and graphical records provided limited time resolution and processing was particularly labor intensive. Daily computations were at about the limit of commonly available technology. Later, digital recording and computing technologies provided a capability to collect and process data at a finer time step. A single printed report page, however, can only accommodate a table of one year of daily values of discharge for a given streamgage. The printed page was still the primary means by which we delivered data. In its earliest years, USGS released data in Water-Supply Papers and other publications. Water-year compilations began in 1911, and until 1976, USGS released data by hydrologic regions in periodic Water-Supply Papers. From 1961 to 2005, USGS produced state-based annual Water Data Reports, and in 2006 converted those fully to an electronic-only national Water Data Report, still with a focus on daily data for a water year. In 2014, the national Water Data Report will be replaced with a similar on-demand, site-by-site, web-based report.
With the advent of digital data recorders and computers, changes were coming fast in the availability of hydrologic data. It was possible to collect, process, store, and distribute data faster than ever before. Hydrologic data processing now provided the same daily data products as printed reports, but in the form of computer files of tabular data. The sensor data used to create the daily data product were actually collected at a much shorter time step (most commonly 15-minute intervals) as “instantaneous values,” but these were only used within the individual USGS office in the process of creating the daily values product. Because data storage was still very expensive, few of the early instantaneous values were stored in databases. Hydrographers saw them only as “raw material” for the daily computations and not as a standalone data product. Furthermore, there was no process to track the details of final data editing and adjustment, so even if a hydrographer reprocessed the stored instantaneous values, the computed daily averages might not agree with the final published daily value.
With ongoing reductions in the cost of data storage and data transmission, USGS re-examined its data delivery paradigm in the late 1990s and began a transition towards making the finer time-step data (instantaneous values) a primary product. Especially important in lower order streams, instantaneous values data are crucial to understanding the dynamics of changing water levels, flows, concentrations, and fluxes (or loads). Better understanding will ultimately lead to better models and tools that will enhance the interpretation of the large body of historical data.
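The relationship between instantaneous values and the daily values product can be sketched in a few lines of Python. This is an illustration only: it takes a plain arithmetic mean of equally spaced 15-minute values, whereas operational records processing also involves editing and adjustment that the sketch does not attempt. The record below is synthetic.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_means(instantaneous):
    """Average (timestamp, value) pairs by calendar day.

    Returns a dict mapping each date to the arithmetic mean of that
    day's values. Assumes the values are equally spaced in time.
    """
    by_day = defaultdict(list)
    for timestamp, value in instantaneous:
        by_day[timestamp.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(by_day.items())}


# Synthetic two-day record at 15-minute intervals (96 values per day).
# Day 1 is steady at 10.0; the second half of day 2 jumps to 50.0.
start = datetime(2014, 1, 27)
step = timedelta(minutes=15)
record = [
    (start + i * step, 50.0 if i >= 144 else 10.0)
    for i in range(192)
]

means = daily_means(record)
```

In this synthetic case the second day's mean is 30.0 even though the flow was never 30.0 at any moment, which is the kind of detail that is lost when only the daily value is retained.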
In the case of streamflow data, these finer time-step data have a great deal of utility. They are crucial to the process of streamflow modeling and streamflow prediction. Determining how the watershed responds to rainfall input can be severely limited if the data can only be presented as a step function of daily values rather than as a nearly continuous time series. Shorter time-step data are also important for determining the transport of sediment and chemicals downstream. Current practice is to estimate transport using a relationship between instantaneous sediment or chemistry measurements and daily mean discharge. For example, methods described in Cohn and others (1992) and Hirsch and others (2010) use the full record of daily mean discharge to estimate a record of daily mean fluxes. We know that when there is a large variation in discharge during the course of a day, these relationships are seriously limited in their ability to predict flux. Sub-daily information is also critical for flood hazard mitigation because the time history of inflows to a river or reservoir is required to determine the effectiveness of a design. Flood-control systems must work to limit the size of the peak water levels or flows, which cannot be determined based on daily averages.
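The limitation of daily mean discharge for flux estimation can be shown with a small numerical sketch. It assumes a hypothetical power-law relation between concentration and discharge (coefficients a and b are illustrative, not taken from Cohn and others or Hirsch and others) and compares the mean flux computed from sub-daily values with the flux implied by the daily mean discharge alone. Units are arbitrary.

```python
def concentration(q, a=1.0, b=1.0):
    """Hypothetical power-law relation: concentration rises with discharge."""
    return a * q ** b


def mean_flux_subdaily(discharges):
    """Mean of concentration * discharge over equally spaced sub-daily values."""
    fluxes = [concentration(q) * q for q in discharges]
    return sum(fluxes) / len(fluxes)


def mean_flux_from_daily_mean(discharges):
    """Flux obtained by applying the same relation to the daily mean discharge."""
    q_mean = sum(discharges) / len(discharges)
    return concentration(q_mean) * q_mean


# Two synthetic days with the same daily mean discharge (30.0):
flashy_day = [10.0] * 48 + [50.0] * 48   # large within-day variation
steady_day = [30.0] * 96                 # no within-day variation

flashy_true = mean_flux_subdaily(flashy_day)          # 1300.0
flashy_daily = mean_flux_from_daily_mean(flashy_day)  # 900.0
steady_true = mean_flux_subdaily(steady_day)          # 900.0
```

Under these assumptions the two days are indistinguishable in the daily record, yet the flashy day carries about 44 percent more flux than the daily-mean calculation suggests. The size of the discrepancy depends entirely on the assumed relation and on how much discharge varies within the day.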
Because of the need for data at a finer time step, the USGS provides fully qualified instantaneous values online from October 1, 2007 onward, and in some cases for earlier periods. On January 27, 2014, there were more than 3.7 billion values available for 14,517 sites and 160 parameters. We believe that these shorter time interval data will be of great value to the hydrologic science community.
The best-known USGS hydrologic data product, the streamflow record, was only a historical product for about 80 years. That is to say, the numbers became available to the science and water management community a few months to a year or more after the fact. They were quite useful for scientific studies and provided important data for water planning and for hazard and resource assessments, but they were of no help to timely water management, water operations, or hazard warnings. Gradually, various systems became available at USGS streamgages that provided direct transmission of the data in near-real time to forecasting and operations activities of the National Weather Service, the Bureau of Reclamation, the Army Corps of Engineers, water and power utilities, and other agencies. Early systems used landline telephone (starting in 1921) and radio technologies (starting in 1931). They were diverse and complex systems, typically uniquely designed for use by a particular partner agency, and not available to the public or scientific community.
In 1976, USGS began to experiment with the use of Geostationary Operational Environmental Satellite (GOES) to transmit data from field stations. The use of this technology has grown rapidly and continues to grow to the present day. For streamflow, the number of real-time sites has grown from 120 in 1978, to 1000 in 1982, 5100 in 1999, and 9600 in 2014. The system transmits not only streamflow data, but also groundwater levels, many sensor-derived chemical and physical variables related to river conditions, and atmospheric conditions at USGS water-monitoring locations. Table 4 shows a partial list of the major data types that USGS delivered in real time on one particular day. Data sites are distributed across the nation and are in every state, although the density of sites in any given state will vary due to the hydrologic regime and requirements of partner agencies.
Table 4. Real-time parameters transmitted on January 27, 2014. The list includes the top 11 parameters and 5 other selected variables for which interest is growing. Some sites transmit more than one similar parameter, so a count may exceed the number of sites.
Variable Being Transmitted
Gage height (water level in stream)
Lake or reservoir elevation
Nitrate plus nitrite
Sodium adsorption ratio
Blue-green algae
The investment in real-time data delivery has had a number of outcomes that the USGS did not anticipate. One of those outcomes is the expansion of the community of users of USGS hydrologic data. Availability of these new data has led many ordinary citizens, business owners, farmers, and local government officials to become users of the data, and in many cases frequent users. Making a decision about moving people and property out of flood-prone areas is an example of a decision informed by these real-time data. For recreation, people are keenly interested in knowing if the river is near its ideal flow for the activity that they plan. Flows that are too high or too low can make for an unrewarding or unsafe water sports experience. Water recreation enthusiasts have become major consumers of USGS streamflow data. Real-time data are also very helpful to the scientific research community because they help to monitor current field conditions to help determine optimal times to travel to the site to collect critical scientific data.
Frequently, USGS program managers are asked if the addition of real-time capability makes data collection less expensive because of reduced labor costs. The simple answer is no. The counterpoint, however, is that it does significantly improve the usefulness, accuracy, and completeness of the dataset. We described some of the added utility of the data above. Accuracy is improved because the hydrographers who make flow measurements to keep an accurate calibration for the streamgage (the rating curve) can target particular ranges of water level that need additional measurements and the real-time data can help them determine when those conditions exist. This information triggers their travel to the site to make an additional measurement. Also, without real-time data, if any kind of system failure happened at a streamgage (e.g., loss of power, damage to sensors, damage to data storage systems, or even total destruction of the streamgage) the problems would be unknown to the USGS staff until the time of the next field visit, which might be as much as 6 weeks away. Now, when serious problems occur at a streamgage, the real-time data help to identify problems quickly and can trigger a visit to the streamgage to rectify the situation within a few days or less. Estimates of missing data values are more reliable if the gaps are short than if the gaps are long.
Perhaps most surprising is the increase in the availability of real-time groundwater level data (about 1600 sites nationwide). In general, groundwater levels in wells fluctuate much more slowly than do water levels in rivers, and having just one observation per day may be sufficient for most scientific or water management purposes. Nevertheless, providing real-time groundwater data actually has a number of benefits. For example, during drought periods, water managers and public officials need to know water levels in aquifers throughout their area in order to assess drought severity. Without real-time data, it becomes a very slow and costly process to visit every well in a regional groundwater-monitoring network. Groups such as a governor's drought task force may require weekly updates from a network of groundwater wells. We can now provide a summary in a matter of minutes using data that are no more than a few hours old. This contrasts with summaries that used to take several days and multiple people to compile using data that were already a few days old. Also, real-time data are just as important for operating groundwater sites as any other sites, if not more so. Because groundwater conditions in general change slowly and instrumentation is relatively stable, routine visits are much less frequent than for streamgages. The ability to retrieve and review data through telemetry can lead to fewer routine visits, but more importantly it can reveal problems that otherwise might not be detected for several months. There is also a public information benefit with the provision of real-time groundwater data. Groundwater is largely unobservable directly and is poorly understood by the public. 
The availability of real-time groundwater data on the web enables the public and public officials to view the heretofore “invisible” water resource beneath their feet and to understand how it changes in response to precipitation events and to pumping, particularly during dry periods when groundwater is especially important. We believe that access to real-time groundwater data has the potential to improve the public's understanding of this important, but poorly understood, resource.
With the real-time capability in place and a new challenge of how to make so much data tractable to USGS stakeholders, systems such as WaterWatch were conceived as a means to summarize current water conditions in a nationally consistent and useful way. The WaterWatch system (http://waterwatch.usgs.gov) provides national or state map depictions of the status of streamflow (Figure 2). The color-coding of each site's status depends on the percentile level of the current streamflow from the historical distribution of streamflow for that day of the year. For example, a dark brown dot on the map represents a streamgage for which the most recent estimated streamflow is lower than the 10th percentile on the historic flow distribution for that day (but not the flow of record, depicted by a red dot). This map makes it possible, at a glance, to determine where flows are very high, or normal, or in extreme low flow condition. Animations of previous maps, available at the WaterWatch web site, can be highly informative about the spatial and temporal coherence of hydrologic anomalies and the way that they spread, shrink, and persist for long periods on the landscape. Users can select an individual state to see it in more detail and then explore each individual streamgage represented, learning their name and current condition, or “clicking through” to gain access to both the historic and recent data about the site.
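The percentile-based status described above can be sketched as follows. Only two classes are stated in the text (a record low, and a flow below the 10th percentile that is not a record); the remaining class breaks at the 25th, 75th, and 90th percentiles and all class labels are assumptions made for illustration, and the historical sample is synthetic. This is not the WaterWatch implementation.

```python
from bisect import bisect_left


def percentile_of(current, historical):
    """Percent of historical values that are strictly below the current value."""
    ordered = sorted(historical)
    return 100.0 * bisect_left(ordered, current) / len(ordered)


def classify(current, historical):
    """Assign a status class for one site and one day of the year.

    `historical` holds past flows for the same day of the year.
    """
    if current < min(historical):
        return "record low"            # red dot in the text
    if current > max(historical):
        return "record high"           # assumed class
    pct = percentile_of(current, historical)
    if pct < 10:
        return "much below normal"     # dark brown dot in the text
    if pct < 25:
        return "below normal"          # assumed class break
    if pct <= 75:
        return "normal"                # assumed class break
    if pct <= 90:
        return "above normal"          # assumed class break
    return "much above normal"


# Synthetic history: flows of 1.0 through 100.0 for this day of the year.
history = [float(v) for v in range(1, 101)]
status_today = classify(5.0, history)
```

Repeating the classification for every streamgage and coloring each site by its class yields a map of the kind shown in Figure 2.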
USGS also provides GroundwaterWatch (http://groundwaterwatch.usgs.gov), which tracks water levels in wells, and WaterQualityWatch (http://waterwatch.usgs.gov/wqwatch), which tracks water quality as measured by stream water-quality sensors nationwide. WaterQualityWatch includes the most common parameters of temperature, specific conductance, pH, dissolved oxygen, turbidity, and nitrate. GroundwaterWatch and WaterQualityWatch provide fewer sites than WaterWatch (see Tables 1 and 4 to compare relative data population), but each product provides a useful status review of current conditions across the nation.
Data Discovery Using NWIS Mapping Applications
For scientists engaged in hydrologic research within a watershed, state, or region, a common challenge is to find sites that offer a rich data record that may be useful for the research they are conducting. The NWIS web interface allows searching of the public database for sites that meet criteria, and visualization of the results using its mapping functions. For example, a researcher may be interested in finding all stream sites in the five water resources regions that make up much of the eastern United States (New England, Middle Atlantic, Great Lakes, South Atlantic, and Ohio) that have at least 1200 water quality samples.
Figure 3 shows the results returned by a search. Information about each site on the map is available by selecting it with the mouse cursor on the map or from the list on the left. In either case, “clicking through” will give the user direct access to the available data. For input to other software or for future data retrieval, an option is available to export the results to a file. Access to the user interface starts at http://waterdata.usgs.gov. Data discovery capabilities such as these are of great value to the research community because they can quickly identify the datasets that have significant potential for use in research on a given topic. Sorting through the 1.57 million sites in the NWIS public database to find the sites with the type and amount of data needed for a given study is a daunting task, but these tools help to make it much easier.
A spatial database that may be useful to some researchers is GAGES-II (Geospatial Attributes of Gages for Evaluating Streamflow; Falcone 2011). This was an update to the original GAGES database (Falcone 2010). It includes the attributes of the watersheds upstream of each of 9,322 streamgages that the USGS has monitored with at least 20 complete years of discharge record since 1950, and/or were active in water year 2009. The dataset documents important variables such as drainage area, mean basin altitude, geologic and soil characteristics, population density, population change, types of agricultural activities, rates of fertilizer application, extent of man-made water storage, climatic variables, and many other relevant hydrologic variables. One use of the database is for locating sites with the desired characteristics for a planned study. It quantifies a variety of possible explanatory variables in regional or national hydrologic studies, and is available at http://water.usgs.gov/GIS/metadata/usgswrd/XML/gagesII_Sept2011.xml.
Access to USGS Water Data in Conjunction with Other Sources of Water Data
All of the discussion above has related simply to USGS datasets and access to them. Of course, there are many other sources of water data available. At a national scale, there is the Environmental Protection Agency's (EPA) STORET system for water quality data, which aggregates regional, state, and local data using the Environmental Exchange Network (http://www.exchangenetwork.net/). Other state agencies, federal water agencies, river basin commissions, regional agencies, local water utilities, volunteer groups, and university researchers maintain additional regional and local datasets. About a decade ago, the USGS began forming a variety of partnerships that have led to some very successful approaches to discover and access data from multiple sources. The two most significant of these are the Hydrologic Information System (HIS) that has been created and is maintained by the Consortium of Universities for the Advancement of Hydrologic Sciences Incorporated (CUAHSI), and the Water Quality Portal, which is a cooperative service sponsored by the USGS, EPA, and the National Water Quality Monitoring Council. Both of these efforts are the result of work across the water community in developing common standards for time series and discrete water-quality data, respectively.
HIS is a system for finding and retrieving relevant data from a wide variety of data sources, particularly the USGS, several other federal agencies, and many universities (see http://wdc.cuahsi.org/). It uses a data service format known as WaterML. The USGS has been an active participant in the development of WaterML, making sure that the HIS would have full and accurate capabilities to find and deliver a wide range of USGS water data to HIS users. Associated with the HIS is HydroDesktop (http://his.cuahsi.org/hydrodesktop.html), which not only finds and downloads data via the HIS, but also structures the data so that they can be viewed in map and graphical form and entered easily into a variety of data analysis and modeling systems. WaterML Version 2 is being developed within a joint working group of the World Meteorological Organization and Open Geospatial Consortium, and will provide a consistent international standard (http://www.opengeospatial.org/projects/groups/waterml2.0swg).
The Water Quality Portal (http://www.waterqualitydata.us) is a system for discovery and retrieval of water quality data from NWIS and the EPA STORET system. Users can search by location, site, or sampling parameters, and the Portal returns data files in several forms: comma-separated values, tab-separated values, workspaces in the statistical language “R”, spreadsheets, or KML files (for site information only). Using the Portal spares users from learning two different search and retrieval protocols, and it provides a cross-reference for the differences in categories and codes between NWIS and STORET.
An additional source of interagency data is the Reservoir Sedimentation Database (http://ida.water.usgs.gov/ressed/), which includes nationwide data from the U.S. Army Corps of Engineers, the U.S. Bureau of Reclamation, the USDA Natural Resources Conservation Service, and the USGS.
Linking Data Services Directly to Analysis Software
An innovative approach to data access is the incorporation of retrieval from data services directly into the body of analysis or modeling code. One example is an R statistical package for retrieval, analysis, and anomaly calculation of daily streamflow data (Ryberg and Vecchia, 2012, http://pubs.usgs.gov/of/2012/1168). Another is Exploration and Graphics for River Trends (EGRET), which is available at https://github.com/USGS-R/EGRET/wiki. The EGRET package and the associated dataRetrieval package facilitate a variety of statistical methods of exploring trends in surface-water quality and trends in streamflow. These software packages contain the necessary code to download data from USGS data services, check for certain types of errors, and organize the information into data structures that are appropriate to the analyses being made by the particular package. Data retrieval and data analysis should be a single integrated process. In the future, we expect that integrated data retrieval and application systems will be an important part of the USGS water data delivery system.
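The general pattern of these packages (retrieve, check, organize, analyze) can be sketched in Python. This is not the code of the R packages named above. The sketch assumes a tab-delimited response with "#" comment lines, a header row, and a column-format row ahead of the data; the column names, the all-zero station number, and the values are hypothetical, and the download step is replaced by an embedded sample so the example runs without network access.

```python
from datetime import date

# Hypothetical tab-delimited response; layout and names are assumptions.
SAMPLE_RESPONSE = "\n".join([
    "# Hypothetical daily discharge sample",
    "\t".join(["site_no", "datetime", "discharge"]),
    "\t".join(["15s", "20d", "14n"]),            # column-format row
    "\t".join(["00000000", "2014-01-24", "120"]),
    "\t".join(["00000000", "2014-01-25", ""]),    # missing value
    "\t".join(["00000000", "2014-01-26", "150"]),
    "\t".join(["00000000", "2014-01-27", "90"]),
])


def parse_table(text):
    """Split a tab-delimited response into a list of row dictionaries."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    header = lines[0].split("\t")
    # lines[1] is the column-format row, so data start at lines[2].
    return [dict(zip(header, ln.split("\t"))) for ln in lines[2:]]


def to_series(rows, column):
    """Check values and organize them as (date, float) pairs."""
    series = []
    for row in rows:
        raw = row.get(column, "")
        if raw == "":
            continue  # skip missing values rather than guess at them
        value = float(raw)
        if value < 0:
            raise ValueError(f"negative {column} on {row['datetime']}")
        series.append((date.fromisoformat(row["datetime"]), value))
    return series


series = to_series(parse_table(SAMPLE_RESPONSE), "discharge")
mean_discharge = sum(v for _, v in series) / len(series)
```

Because retrieval, checking, and analysis live in one script, rerunning the analysis with updated data requires no manual download or reformatting.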
New Systems Designed for Mobile Devices and for Push Notification of Hydrologic Data
USGS provides several data delivery products that make current hydrologic conditions data more readily available for timely decisions. For field researchers, that could make the difference between getting and missing a critical sample or measurement for target conditions.
In 2010, USGS released WaterAlert (http://water.usgs.gov/wateralert) to provide frequent users of current data with an email or text-message notification when conditions cross a threshold that they set, such as when river stage or discharge goes above a particular level. Its flexible design allows for any up-or-down threshold of interest and supports multiple data parameters. It provides a map to locate sites of interest; a subscription form to customize user preferences; and, where available, information to help select thresholds, such as a link to National Weather Service flood stages. Figure 4 shows the subscription form with the user-selectable settings. As of January 21, 2014, there were 55,060 WaterAlert site-parameter-threshold subscriptions from 40,330 users.
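The idea of an up-or-down threshold notification can be sketched as a simple crossing test on a time series. This is an illustrative sketch, not the WaterAlert implementation: the function name, the rule that an alert fires only when consecutive values move across the threshold, and the stage values are all assumptions.

```python
def threshold_crossings(values, threshold, direction="above"):
    """Return the indices at which the series crosses the threshold.

    direction="above": previous value at or below, current value above.
    direction="below": previous value at or above, current value below.
    """
    if direction not in ("above", "below"):
        raise ValueError("direction must be 'above' or 'below'")
    alerts = []
    for i in range(1, len(values)):
        prev, curr = values[i - 1], values[i]
        if direction == "above" and prev <= threshold < curr:
            alerts.append(i)
        elif direction == "below" and prev >= threshold > curr:
            alerts.append(i)
    return alerts


# Hypothetical sequence of river stage readings and a user threshold of 4.0.
stage = [3.1, 3.8, 4.2, 4.6, 3.9, 4.1]
rising_alerts = threshold_crossings(stage, 4.0, "above")   # [2, 5]
falling_alerts = threshold_crossings(stage, 4.0, "below")  # [4]
```

Testing for a crossing rather than for exceedance alone means a subscriber would be notified once when the river rises past the threshold, not at every reading while it stays high.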
In early 2013, USGS released WaterNow (http://water.usgs.gov/waternow) to provide users with a readily available and fast tool to obtain up-to-date data. The service works on any device that supports email or text messaging, either online or mobile. For a streamflow site, send an email or text message to WaterNow@usgs.gov with a USGS station number in the subject or body and within a few minutes, you will receive a response with the most recent values of stage and streamflow. For other site types, by including just a USGS station number, WaterNow will return a list of available parameters for that site. The service supports any real-time surface-water, groundwater, water-quality, or atmospheric parameters available from USGS. Figure 5 shows several examples of text-message queries.
Later in 2013, USGS released a mobile-friendly version of the NWIS web interface (http://m.waterdata.usgs.gov) that delivers frequently accessed current conditions data. It includes an easy-to-use map browser and automatic-location functions that are particularly suited to smart phones (Figure 6).
USGS endeavors to make its water data readily accessible to users. Technological developments over the last few decades have enabled a transition from paper reports to online reports and from daily data to instantaneous data. Targeted data compilations and analytical tools add value to the data and enhance information overall. For up-to-the-minute use, data are available on demand through the NWIS web interface, NWIS data services, WaterAlert, WaterNow, and USGS Mobile Water Data. The USGS Water Data Discovery page at http://water.usgs.gov/data includes updated information for these and other USGS water data products. The USGS motto is “Science for a Changing World.” In the realm of water data access, we continue to focus both on change that happens hour to hour and day to day and on change that happens over time frames of decades to centuries. USGS water data delivery mechanisms continue to evolve and improve in step with new information technologies. An increasing emphasis on national and international data standards and web services allows USGS to integrate our data with others more readily and to share software components across the research and water management communities. Distributing water data with applications on new mobile platforms brings value to new and nontraditional consumers of hydrologic information.
Author Bio and Contact Information
Robert M. Hirsch is a Research Hydrologist with the USGS in Reston, VA. His research focuses on the description and understanding of long-term variability and change in surface-water quality and streamflow. Bob served as USGS Chief Hydrologist from 1994–2008 before returning to research. He can be contacted at email@example.com.
Gary T. Fisher is a retired Hydrologist and resides in Baltimore, MD. He managed Special Projects from 2007–2013 for the USGS Office of Water Information, including an NWIS User Group and the development of new data delivery applications. As a USGS Scientist Emeritus, he continues to develop and promote applications of USGS water data. He can be contacted at firstname.lastname@example.org.