An open web-based module developed to advance data-driven hydrologic process learning

The era of 'big data' promises to provide new hydrologic insights, and open web-based platforms are being developed and adopted by the hydrologic science community to harness these datasets and data services. This shift accompanies advances in hydrology education and the growth of web-based hydrology learning modules, but the capacity of these modules to use emerging open platforms and data services to enhance student learning through data-driven activities remains largely untapped. Given that generic equations may not easily translate into local or regional solutions, teaching students to explore how well models or equations work in particular settings, or to answer specific problems using real data, is essential. This article introduces an open web-based module developed to advance data-driven hydrologic process learning, targeting upper-level undergraduate and early graduate students in hydrology and engineering. The module was developed and deployed on the HydroLearn open educational platform, which provides a formal pedagogical structure for developing effective problem-based learning activities. We found that data-driven learning activities using collaborative open web platforms like CUAHSI HydroShare and JupyterHub to store and run computational notebooks allowed students to access and work with datasets for systems of personal interest and promoted critical evaluation of results and assumptions. Initial student feedback was generally positive, but it also highlighted challenges, including troubleshooting and future-proofing difficulties and some resistance to programming and new software. Opportunities to further enhance hydrology learning include better articulating the benefits of coding and open web platforms upfront, incorporating additional user-support tools, and focusing methods and questions on implementing and adapting notebooks to explore fundamental processes rather than tools and syntax. The profound shift in the field of hydrology toward big data, open data services and reproducible research practices requires hydrology instructors to rethink traditional content delivery and focus instruction on harnessing these datasets and practices in the preparation of future hydrologists and engineers.

| INTRODUCTION
Hydrologists investigate the distribution and variation of water across a range of spatial and temporal scales. In the face of mounting water resources challenges, driven by a growing population, climate and land use change, and shifting societal values, hydrology has evolved from a mainly applied engineering discipline to a fundamental underpinning of the geosciences and environmental sciences (Eagleson, 1991; National Research Council, 1991; Vogel et al., 2015; Wagener et al., 2007). As an applied and interdisciplinary science, hydrology benefits from firsthand knowledge gained by working with many different datasets.
Generic equations are not easily translated into local or regional solutions, and experience with specific systems and datasets is critical for hydrologic practice and research. Data-driven analysis of this kind is often needed to conceptualize complex processes, to explore how well models or equations work in particular settings, or to answer a specific problem.
As demands on hydrologists have grown, so have calls to enhance hydrology education at the upper division and graduate levels to adequately prepare students for both research and industry (Merwade & Ruddell, 2012; Ruddell & Wagener, 2015; Wagener et al., 2021).
Enhancing students' ability to conceptualize, analyze and interpret complex hydrologic processes is an active area of research (Bourget, 2006; Habib et al., 2018; Marshall et al., 2013; Merwade & Ruddell, 2012; Ngambeki et al., 2012; Ruddell & Wagener, 2015; Wagener et al., 2007, 2012). Educators have recognized a need to augment traditional teacher-centred lectures focused on fundamental physical laws with student-centred, data-driven learning activities that enable students to explore the hydrological system using authentic datasets and modelling tools (Merwade & Ruddell, 2012). Problem-based learning activities that use authentic, real-world problems and datasets have been shown to enhance engineering and hydrology learning outcomes and career preparation (Habib et al., 2012, 2019; Litzinger et al., 2011; Lyon & Teutschbein, 2011; Merck et al., 2021; Sanchez et al., 2016). As a result, several web-based educational platforms have been developed to incorporate real-world data and modelling resources in hydrology learning activities (e.g., SERC, CSDMS, COMET, HydroViz, RWater).
At the same time, the volume of, and access to, hydrologic data have grown rapidly through breakthroughs in remote sensing, in situ data collection, and data services. "Big data" promises to provide new hydrologic insights to address mounting water resources challenges, and collaborative open web-based platforms are being increasingly developed and adopted by the global hydrologic science community to harness these datasets (Goodall et al., 2017; Slater et al., 2019). "Open" in this case implies that data and computational resources can be openly shared, discovered, and accessed among the community (Chen et al., 2020), while the underlying software may, in some cases, be commercial (i.e., not open-source). Open-source software, by contrast, is free to use, distribute, and modify. It provides unique opportunities in education for accessibility (Rajib et al., 2016) and in research for transparency and reproducibility, because it reduces the financial and time costs for others to reproduce results (Rosenberg et al., 2020); however, it may not always come with extensive technical support.
As in many fields, hydrology is trending toward a standardized open web-based structuring of data services, formats, and metadata to facilitate data management, analysis, and sharing needs. For example, the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) has developed an array of web-based data services and information systems specifically for the hydrologic science community (Goodall et al., 2017; Horsburgh et al., 2008; Horsburgh, Aufdenkampe, et al., 2016). Other open web-based platforms not specific to hydrology are also being increasingly adopted.
For instance, Google Colab, Google Earth Engine, and Jupyter Notebooks all allow users to create and share documents that contain live code, equations, visualizations and narrative text, and that link to web-based data services. Collaborative platforms like these provide convenient, standard workspaces and tools for the hydrology community, but they also require hydrologists and hydrology instructors to keep pace with rapid advancements.
With the promises of "big data" in hydrology come new challenges related to data management and reproducibility. Reproducibility is a critical requisite to advancing hydrologic discovery and innovation, and to subsequent integration and reuse of findings by the community (Choi et al., 2021; Essawy et al., 2020; Hutton et al., 2016; Stagge et al., 2019; Wilkinson et al., 2016). The complexity and diversity of hydrologic systems reflected in emerging data require that scientists be able to reproduce methods developed in specific settings more broadly across a range of scales and locations to robustly evaluate hypotheses and assumptions (Ceola et al., 2015; Clark et al., 2016; Hutton et al., 2016). Particularly as datasets and models become more complex, analysis procedures and code need to be transparent and well documented to allow for reproduction (Rosenberg et al., 2020; Stagge et al., 2019). The increasing use of open and open-source software by the hydrologic science community underpins these dual aims of accessibility and reproducibility.
The shift in data availability and analysis capabilities offered by open web-based platforms and the call for reproducible research have fundamentally transformed the role of hydrology instructors from disseminators of knowledge to guides in learning, critical thinking, and good research practices. However, these changes have not yet fully translated into the education of future hydrologists. While educational platforms are emerging to support authentic, problem-based learning, as described above, they are mostly static and lack mechanisms for harnessing the emerging open data services and practices being adopted by the professional community. They also generally lack a formalized pedagogical structure to help instructors develop their own learning activities with these aims in mind. One exception is HydroLearn, a web-based collaborative hydrology education platform that provides a formalized and validated pedagogical structure, including tools that support instructors in creating learning objectives and formative assessment questions, to develop authentic, problem-based learning activities. Student learning of concepts and technical skills has been found to increase after using HydroLearn modules (Merck et al., 2021). However, HydroLearn's capacity to harness emerging open platforms and data services to enhance conceptual understanding in hydrology through data-driven learning remains largely untapped.
Advancing understanding of hydrological processes requires a workforce trained in working with, and learning from, data, and learning platforms and modules designed to facilitate data-driven learning have the potential to change the way hydrologists conduct research on these processes. This article describes a HydroLearn physical hydrology learning module targeting advanced undergraduate and early graduate students in hydrology and engineering. The aims of the learning module are (1) to harness emerging open web-based platforms in order to (2) develop data-driven learning skills, whereby students actively explore key concepts using real data that are relevant and meaningful to them, thereby (3) enhancing student learning of fundamental hydrology concepts while (4) providing experience applying good data management and reproducibility practices.
The article briefly describes several open web-based platforms for hydrology and their potential educational utility, introduces the learning module including integration of these platforms, and offers initial student perceptions and instructor reflections on the module.

| OPEN WEB-BASED PLATFORMS AND PROGRAMMING PACKAGES FOR HYDROLOGIC ANALYSIS
Collaborative open web-based platforms and tools are being increasingly adopted by the hydrologic science community. A comprehensive review of available resources is beyond the scope of this article.
Instead, here we briefly summarize the platforms and programming packages used in the HydroLearn physical hydrology learning module and their potential educational utility, including CUAHSI HydroShare, CUAHSI JupyterHub, and ESRI Story Maps.

| HydroShare
HydroShare is a web-based collaborative platform for hydrology data storage, retrieval, sharing, and processing (Essawy et al., 2020; Horsburgh, Morsy, et al., 2016; Tarboton et al., 2014). Hydrology instructors and students increasingly use HydroShare to access free cloud-based versions of several software programs and hydrologic models and use them for various research and learning applications, or to access previously uploaded static teaching resources (Ward et al., 2020).

| JupyterHub
CUAHSI JupyterHub is an open cloud-based environment for computational notebooks that allows users to create and share documents that contain live code, equations, visualizations and narrative text (Choi et al., 2021). Jupyter notebooks (https://jupyter.org/) are used to write, build, and run code as well as run pre-installed software (e.g., TauDEM; Tarboton, 2018; Tesfa et al., 2011), but they can also be used as teaching tools to build programming and data management skills.

| ESRI Story Maps
Finally, ESRI Story Maps combine narrative text with immersive content that fills the screen with maps, images, or videos for an engaging learning experience. While the code for ESRI Story Maps is not open-source, these cloud-accessed resources harness ArcGIS's analysis tools and GIS platform and can be hosted and made publicly available directly through ArcGIS Online. Story Maps allow students to interact directly with data through a personalized hands-on experience (e.g., Kerski, 2019). Alternatively, students can be assigned to create their own Story Maps to dynamically communicate project results (e.g., Battersby & Remington, 2013).
Programming packages also support hydrologic analysis on these platforms. For example, the R waterData package allows a user to import daily hydrological data from United States Geological Survey (USGS) web services and plot time-series data (R Core Team, 2020; Ryberg & Vecchia, 2012). A detailed description of R packages relevant for hydrologic analysis is provided in Slater et al. (2019).
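The kind of data retrieval that waterData performs in R can be illustrated with a short Python sketch against the USGS NWIS daily-values web service. This is not the waterData API or the module's notebook code; the helper names (`build_dv_url`, `parse_rdb`, `daily_mean_discharge`) are hypothetical, and the sketch assumes the service's tab-delimited RDB output, in which `#` lines hold metadata, the first remaining line is the header, the second is a column-format row, and daily-mean discharge sits in a column ending in `_00060_00003` (parameter 00060, statistic 00003). No network call is made, so the parser can be exercised on a small sample.

```python
from urllib.parse import urlencode

# Base URL of the USGS NWIS daily-values (dv) web service.
NWIS_DV_URL = "https://waterservices.usgs.gov/nwis/dv/"


def build_dv_url(site, start, end, parameter_cd="00060"):
    """Build a daily-values request URL in tab-delimited RDB format.

    Parameter code 00060 is discharge in cubic feet per second.
    Dates are ISO strings (YYYY-MM-DD).
    """
    query = {
        "format": "rdb",
        "sites": site,
        "startDT": start,
        "endDT": end,
        "parameterCd": parameter_cd,
    }
    return NWIS_DV_URL + "?" + urlencode(query)


def parse_rdb(text):
    """Parse RDB text into a list of dicts keyed by column name."""
    # Drop blank lines and '#' comment lines (site and parameter metadata).
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    header = lines[0].split("\t")
    # lines[1] is the column-format row (e.g. "5s 15s 20d"), so data start at lines[2].
    return [dict(zip(header, ln.split("\t"))) for ln in lines[2:]]


def daily_mean_discharge(records):
    """Return (date, cfs) pairs, skipping missing or non-numeric values."""
    if not records:
        return []
    # Daily-mean discharge columns end in <parameter>_<statistic> = 00060_00003.
    col = next(k for k in records[0] if k.endswith("_00060_00003"))
    pairs = []
    for rec in records:
        try:
            pairs.append((rec["datetime"], float(rec.get(col, ""))))
        except ValueError:
            continue  # e.g. empty cells or qualifiers such as "Ice"
    return pairs
```

To fetch real data, the RDB text could be read with `urllib.request.urlopen(build_dv_url(site, start, end)).read().decode()`; the site number for the Logan River gage (assumed here to be 10109000) should be checked against NWIS before use.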
The integration of open web-based platforms and programming packages allows engineering and hydrology students to use authentic data to make sense of the concepts they are learning in their courses, while learning about the data and tools that are openly available.

| HYDROLEARN PHYSICAL HYDROLOGY LEARNING MODULE
HydroLearn is itself an open web-based platform that aims to help hydrology instructors develop, share, and adapt learning modules. It combines research-based active learning methods with authentic online learning modules. The modular nature of HydroLearn and the dynamic computational notebooks allow instructors to use, combine, or adapt content, datasets and scripts from existing learning modules to their specific instructional needs and geographic settings. Active learning is supported through the ability to embed video- and image-based content, questions, other websites, and learning activity templates (Figure 1). Common elements of HydroLearn modules include Check-Your-Understanding (CYU) questions, quantitative problems, and authentic learning activities.
CYU question formats include multiple-choice, checkbox, and drag-and-drop questions, as well as open-response questions addressing higher-level process interpretation. By contrast, authentic tasks are high-cognitive-demand tasks built to reflect how knowledge is used in real life and to simulate the types of problems that a professional might tackle. Each learning activity has a grading rubric, an assessment tool intended to set clear expectations for students and make grading more objective. The platform provides wizards and templates to help instructors develop strong learning objectives and align the teaching activities, learning outcomes, and assessments, a process referred to in the learning literature as constructive alignment (Biggs & Tang, 2011; Kandlbinder, 2014). Sections M4 and M5 of the module are not covered here because they are similar to other sections in format and in the learning elements used. The entire module, including these sections, is available online (Lane & Garousi Nejad, 2018). Table 1 lists key learning objectives, learning activities, open web-based platforms, and data sources for each section.

| Section M1: Data analysis and statistics in hydrology
The first section of the learning module addresses all four aims outlined in the introduction. Students are introduced to fundamental concepts in hydrology while learning basic data analysis and management skills through a set of problems and an authentic learning activity. Key terms and concepts are introduced using an ESRI Story Map. The problems and authentic learning activity are performed in a Jupyter notebook accessed through HydroLearn. Following the established HydroLearn structure, the section starts by delineating the learning objectives and provides key background information, a detailed …

Figure 1. Key components of HydroLearn learning modules include: clear learning objectives and requirements (top-left, then clockwise), content that combines multiple media, learning activity grading rubrics, and check-your-understanding (CYU) questions

Table 1. Detailed chart of key sections in the HydroLearn physical hydrology learning module, including learning objectives, learning activities, and open web-based platforms and datasets

The packages and code needed to perform the calculations are provided and well annotated to familiarize students with basic programming notation and key functions. This prepares them for the next section, in which they are asked to modify the code slightly to use a different dataset that they select. The notebook provides a gentle, context-based introduction to R programming, recognizing that learners without programming knowledge are more likely to be interested in programming and see its value when it is applied in the context of an authentic problem (Kalelioğlu, 2015).
In one problem, students first estimate long-term average evapotranspiration rates for several watersheds using a simple water balance model, and then compute the 95% relative and absolute uncertainties in these estimates.
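The evapotranspiration estimate and its uncertainty can be sketched in a few lines. The text does not state how the module propagates uncertainty, so this Python sketch assumes the common approach for a long-term water balance: storage change is negligible, so ET = P − Q, and the 95% errors in P and Q are independent, so their absolute uncertainties add in quadrature. The function name `et_water_balance` and the example values are hypothetical.

```python
import math


def et_water_balance(p_mm, q_mm, rel_u_p, rel_u_q):
    """Long-term ET from P - Q, with 95% uncertainty by root-sum-square.

    p_mm, q_mm: long-term mean annual precipitation and streamflow depth (mm/yr).
    rel_u_p, rel_u_q: 95% relative uncertainties of P and Q (0.10 means 10%).
    Assumes negligible long-term storage change and independent P and Q errors.
    Returns (et_mm, absolute_uncertainty_mm, relative_uncertainty).
    """
    et = p_mm - q_mm
    if et <= 0:
        raise ValueError("P - Q must be positive for a relative uncertainty")
    u_p = rel_u_p * p_mm  # absolute uncertainty of P
    u_q = rel_u_q * q_mm  # absolute uncertainty of Q
    u_et = math.sqrt(u_p ** 2 + u_q ** 2)  # errors add in quadrature for a difference
    return et, u_et, u_et / et
```

For P = 800 mm and Q = 300 mm with 10% and 5% uncertainties, ET is 500 mm with an absolute uncertainty of about 81 mm, or about 16%. Because ET is the difference of two larger terms, its relative uncertainty can exceed that of either input, which is one reason results differ between watersheds.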

| Water balance
In the final learning activity of this section, the student delineates the watershed that was analysed in the previous steps and then uses the delineated watershed to calculate key water balance components.
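In practice, delineation is done with GIS or terrain-analysis software (the notebooks can run pre-installed tools such as TauDEM), but the underlying idea can be sketched in plain Python. This sketch assumes the common D8 method: each cell drains to its steepest downslope neighbour (slope = elevation drop divided by distance, with diagonal distance √2), and the watershed is every cell whose flow path reaches the outlet, found by walking upstream from it. It is an illustration on a toy grid, not the module's code; pit filling and flat handling, which real DEMs need, are omitted, and the function names are hypothetical.

```python
import math
from collections import deque

# Eight neighbour offsets (row, col).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_receivers(dem):
    """Map each (row, col) to its steepest downslope neighbour, or None for pits/outlets."""
    nrows, ncols = len(dem), len(dem[0])
    receiver = {}
    for r in range(nrows):
        for c in range(ncols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    slope = (dem[r][c] - dem[rr][cc]) / math.hypot(dr, dc)
                    # Only strictly steeper descents win, so ties keep the first neighbour.
                    if slope > best_slope:
                        best, best_slope = (rr, cc), slope
            receiver[(r, c)] = best
    return receiver


def delineate(dem, outlet):
    """Return the set of cells that drain through `outlet` under D8 routing."""
    donors = {}
    for cell, rec in d8_receivers(dem).items():
        if rec is not None:
            donors.setdefault(rec, []).append(cell)
    basin, queue = {outlet}, deque([outlet])
    while queue:  # breadth-first walk upstream from the outlet
        for up in donors.get(queue.popleft(), []):
            if up not in basin:
                basin.add(up)
                queue.append(up)
    return basin


# Toy DEM with a divide between columns 1 and 2.
dem = [
    [5, 6, 6, 5],
    [2, 5, 5, 1],
]
print(sorted(delineate(dem, (1, 0))))  # cells draining to the left outlet
```

Counting the cells in the returned set and multiplying by the cell area gives the watershed area reported in the activity.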
Streamflow data are downloaded from the USGS NWIS website for the Logan River stream gage. A web client is used to retrieve the 800-m resolution, 30-year normal (1981-2010) annual precipitation from PRISM (Daly et al., 2000). This dataset is visualized and then clipped to the watershed extent in the notebook. Finally, the student calculates mean annual precipitation over the watershed and reports this value along with watershed area, mean annual streamflow, and the runoff ratio.
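The final calculations come down to averaging the clipped precipitation grid and converting streamflow to a depth so that it can be compared with precipitation. In this Python sketch, which is an illustration rather than the module's notebook, the watershed mask, the treatment of grid cells as equal-area (an approximation for a geographic grid such as PRISM's), the use of `None` for no-data cells, and the function names are all assumptions. Streamflow is taken in cubic feet per second, the NWIS unit for parameter 00060, and precipitation in millimetres.

```python
# Exact conversion: 1 ft = 0.3048 m, so 1 ft^3 = 0.3048**3 m^3.
CFS_TO_M3S = 0.3048 ** 3
SECONDS_PER_YEAR = 365.25 * 86400.0


def watershed_mean(grid, mask):
    """Mean of grid cells inside the watershed mask.

    Treats all cells as equal-area (an approximation for a geographic grid)
    and skips None (no-data) cells.
    """
    inside = [
        v
        for row_v, row_m in zip(grid, mask)
        for v, m in zip(row_v, row_m)
        if m and v is not None
    ]
    return sum(inside) / len(inside)


def runoff_ratio(mean_q_cfs, area_km2, mean_p_mm):
    """Convert mean streamflow (cfs) to runoff depth (mm/yr) and divide by P."""
    q_m3_per_yr = mean_q_cfs * CFS_TO_M3S * SECONDS_PER_YEAR
    q_mm = q_m3_per_yr / (area_km2 * 1.0e6) * 1000.0  # depth in m over the area, then mm
    return q_mm, q_mm / mean_p_mm
```

Expressing streamflow as a depth over the watershed area is what makes the runoff ratio dimensionless; a ratio near 1 or above usually signals a units, area, or data-period mismatch worth discussing with students.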
Figure 4. A Jupyter notebook guides students to use code to retrieve USGS streamflow data, plot time series, and answer summary questions based on their results

| Section M3: Runoff generation
Section M3 is distinct from the others in that it focuses on building examples from the field. The HydroLearn section solidifies key concepts discussed in the Story Map related to rainfall-runoff processes and promotes active learning through targeted CYU questions.

| Section M6: Simulating runoff using TOPMODEL
The culminating section of the module builds on data-driven learning skills developed in M2 and concepts covered throughout previous sections to simulate semi-distributed variable source area runoff generation in a tributary to the Logan River using TOPMODEL.
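TOPMODEL's variable source area concept rests on a topographic index, ln(a / tan β), where a is the upslope contributing area per unit contour length and tan β is the local slope. In the standard formulation, the local storage deficit is S_i = S̄ + m(λ̄ − λ_i), where S̄ is the catchment-mean deficit, λ̄ the mean index, and m a decay parameter, and cells with S_i ≤ 0 are saturated. The Python sketch below shows only that relationship. It is not the module's TOPMODEL implementation, and the function names and example values are hypothetical.

```python
import math


def topographic_index(a, tan_beta):
    """TOPMODEL topographic index ln(a / tan(beta)).

    a: upslope contributing area per unit contour length (m)
    tan_beta: local slope (m/m); must be positive
    """
    return math.log(a / tan_beta)


def saturated_fraction(indices, mean_deficit, m):
    """Fraction of cells whose local deficit S_i = S_bar + m*(lam_bar - lam_i) <= 0."""
    lam_bar = sum(indices) / len(indices)
    deficits = [mean_deficit + m * (lam_bar - lam) for lam in indices]
    return sum(1 for s in deficits if s <= 0) / len(indices)


# Hypothetical index values for four cells; deficits and m are in metres.
indices = [4.0, 6.0, 8.0, 10.0]
print(saturated_fraction(indices, 0.02, 0.01))  # drier catchment
print(saturated_fraction(indices, 0.0, 0.01))   # wetter catchment
```

As the mean deficit falls, the saturated fraction grows from the highest-index cells (convergent, gently sloping areas) outward, which is the expanding source area that the module's simulations explore.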
TOPMODEL is a conceptual hydrologic model that uses basic topographic and soils information to estimate runoff from the saturated and unsaturated zones (Beven, 1989). The location of the interface between the two zones, quantified by the water …

Perceptions of the utility of HydroShare and Jupyter notebooks were variable. In M1, students with even a small amount of programming experience were initially far more receptive to these tools than students with no prior experience, but this discrepancy diminished over the semester as students became more familiar with the platforms and programming syntax. A subset of students with no programming experience indicated early on that they thought they would appreciate these tools more once they developed basic programming skills and were now interested in doing so. Others said that they appreciated knowing that Jupyter notebooks exist, even if they were "still unable to replicate or augment the code so far." In terms of the structure of the notebooks themselves, most students were grateful for the amount of code that was already provided for them, but some indicated that doing more of the coding themselves would improve learning outcomes. Some expressed frustration about technological challenges, such as losing the server connection and needing to log out of JupyterHub and start over.

… as well as its integration with CUAHSI HydroShare, JupyterHub and ESRI Story Maps. One student said HydroLearn was "easy and straightforward to use, and provided all relevant links making it convenient to access everything… the layout was such that I easily followed instructions for the learning activities and found the questions I needed to answer." Other students noted that the "variety of ways in which the material was presented allowed for better understanding" and "allowed for more focus on principles rather than just coding." The CYU questions in particular appeared to help students solidify key ideas and support higher learning levels.
For instance, in M1, students computed water balance uncertainty in several watersheds through a series of calculations and were then asked to check their understanding by comparing, in their own words, the results for the different watersheds in the context of their physical catchment and climate settings. The computational notebooks allowed students to easily switch between calculations and text-based responses in the same document. One student indicated that the CYU questions throughout the module "helped me focus on what was particularly important within smaller blocks of information" and "were really helpful for developing my understanding of rainfall-runoff processes." Another student articulated that "one of the things I enjoyed most about the module was that it really tested your understanding. The CYU questions in particular had relatively simple answers, but they did a good job of testing actual understanding of the concepts, especially the CYU question hints and explanations as to why the answer was correct."

| Instructor reflections
The effectiveness of integrating multiple open web-based platforms to enhance teaching hinged on the formalized pedagogical structure provided by HydroLearn. The emphasis on constructive alignment between teaching tools, learning activities and objectives facilitated development of activities that integrate data and tools from multiple sources while explicitly targeting mindfully crafted learning objectives across multiple levels (e.g., understand, apply, analyse). Mindful framing of questions encouraged the students to think critically about the underlying processes while learning the basics of the data analysis tools rather than getting lost in the mechanics of the calculations.
Each section was followed by an in-class discussion of the settings in which the equations and models worked well, the settings in which they gave strange results, and why that might be. These discussions provided an opportunity to guide students to critically evaluate model assumptions and requirements based on their varied personal experiences working with different datasets. Most students chose to work with watersheds that they were personally familiar with, often where they had grown up. The discussions that followed were much more in-depth and engaged than those the instructor has had following learning activities that rely on pre-canned data from a well-behaved system.
In early applications of this learning module, the instructor's intense focus on familiarizing students with the tools may have distracted from clearly conveying the value of these tools. Several students questioned the need to learn how to use programming and computational notebooks to complete learning activities given other …

… modules that use open web-based platforms and programming, we encourage the use of code that is easy to understand and troubleshoot and that requires limited prior programming or operating system knowledge of students or instructors, particularly if the students have a range of backgrounds and programming experience. While we considered using only one programming language, neither language offered all of the functionality that we wanted to use. Although there is an acknowledged burden associated with using multiple languages, the notebooks were designed for students with little to no prior programming experience, and we feel that guidance on the differences between, and exposure to, both languages is an important part of the learning experience.
Support mechanisms to guide learners through the data-driven procedures and provide just-in-time assistance are critical to the success of online learning activities (Habib et al., 2018;Kolodner et al., 2004). This is particularly true when multiple new tools are being presented at once, and it may be difficult to foresee where students might make mistakes or need assistance. For these reasons, the material should be presented with appropriate curricular expectations and include embedded interactive tools to support students' progression through the lessons and activities (Habib et al., 2018). The issues described above could likely be addressed in large part by incorporating additional technical support within the HydroLearn module. These user-support tools might include narrated video tutorials, additional CYU questions or check-in points, and formative feedback quizzes.
There are inevitable costs to the emphasis on new tools and software, and student feedback indicated some difficulty focusing on the key concepts and higher-level learning objectives with so much emphasis on using tools and performing calculations. Furthermore, with any technology, and particularly with open and open-source technology, there is the ongoing challenge of future-proofing learning activities to limit the need to rewrite or adjust scripts. Already, in the year since the module was first developed, several scripts had to be revised to accommodate a transition in CUAHSI JupyterHub's platform structure. Even so, the open nature of HydroLearn allows the resources and content to be updated, whereas static, closed material (e.g., textbooks, slides, PDFs) is more difficult to update. There are also numerous and growing options for platforms (e.g., Google Colab, GitHub) that may work as well as or better than those applied in this module and that have long-term support and cyberinfrastructure at a much larger scale.

| CONCLUSION
As an applied and interdisciplinary science, hydrology relies on direct experience with many different datasets and on analysing many systems.
Teaching students to explore different datasets, and how well models or equations capture hydrological processes in particular settings or solve a particular problem, is essential. The learning module described in this article is a case study that demonstrates how state-of-the-science open web-based technology, increasingly used by the professional hydrology community, can be harnessed to enhance physical hydrology education and prepare students to apply open and reproducible tools and practices. The data-driven learning activities allowed students to work with datasets for systems that they were particularly interested in and enabled critical evaluation of results and assumptions. Generally, based on student perceptions and the instructor's reflections, we found that: (a) harnessing web-based platforms facilitates data-driven learning, (b) the utility of computational notebooks should be more clearly communicated, and (c) opportunities remain to enhance student learning. Challenges included some resistance to programming and unfamiliar software, as well as time-consuming technical and technological difficulties. Opportunities to further enhance data-driven learning include better articulating the benefits of using open web-based platforms upfront, incorporating additional user-support tools, and focusing methods and study questions on implementing and adapting code to explore fundamental processes rather than tools and syntax. The profound shift in the field of hydrology toward open data and data analysis platforms requires hydrology instructors to rethink traditional content delivery and focus instruction on using these data and tools in the preparation of future hydrologists and engineers.