An Alternative Approach for Improving Prediction of Integrated Hydrologic‐Hydraulic Models by Assessing the Impact of Intrinsic Spatial Scales

The effect of spatial scale and resolution has been quantified individually for different hydrologic and hydraulic processes. However, the model structure and intrinsic resolution are seldom modified to accurately capture scale‐dependent physical processes. Although automated calibration methods exist for computationally expensive integrated models, an alternate approach reliant on improving the model structure is proposed here. This study advocates for a better representation of the intrinsic spatial scales of physical processes and their submodels by quantifying the impact of different types of spatial scaling on the overall watershed response. First, the effect of spatial extent scaling is quantified by evaluating the change in the basin response (e.g., streamflow and inundation extent) across a small and large subwatershed for the same region. Second, the effect of modifying the relative intrinsic spatial scales of surface‐groundwater (SW‐GW) submodels is quantified. Finally, the results are used to implement a better model structure for improving prediction across two watersheds with distinct physical characteristics. The findings suggest that the relative intrinsic scales of SW‐GW submodels may be different for different hydrogeological systems depending on the ratio of the characteristic length scales of hydrologic‐hydraulic processes. Conducting a scaling analysis can help identify how different physical processes can be best represented in integrated models for a range of climatological and physiographic conditions which can potentially serve as an alternative to extensive calibration in distributed models. Therefore, it is recommended that this analysis should be included as a prerequisite to extensive parameter calibration for large‐scale‐integrated models.

Since several physical processes are formulated in a single system in integrated models, several degrees of complexities arise, including but not limited to, coupling mechanisms (first order, sequential iterative), spatiotemporal scaling (spatial extent or parametric; hourly or daily), optimal numerical formulations, and computational efficiency (Atkinson et al., 2003;Clark et al., 2015;Ko et al., 2019;Paniconi & Putti, 2015;Patil & Stieglitz, 2015;Smith et al., 2004;Yu et al., 2014). Due to these complexities, there is a need to understand how spatial extent scaling can affect hydrologic and hydraulic processes, model parameterization, and intrinsic scaling of submodels (Peters-Lidard et al., 2017). Currently, these models are extensively calibrated using automated or manual approaches to produce a range of variables that can best capture the watershed complexities (Khakbaz et al., 2012;Pokhrel et al., 2012). However, automated calibration methods (e.g., sequential uncertainty fitting) often produce several combinations of parameters, even containing unrealistic ones, that may capture the basin response for specific events, but not across a broad range of climatological and antecedent conditions (Beven, 1993;Beven & Freer, 2001;Pappenberger et al., 2006;Reed et al., 2004).
One of the major issues in spatial extent scaling of integrated models is that the physical processes do not scale linearly from smaller to larger spatial scales as all physical processes exhibit their own characteristic spatiotemporal scales (Bloschl & Sivapalan, 1995). Spatial extent scaling refers to increasing the spatial extents of a model from a small subbasin with a single reach to a large basin with several drainage networks. Hydrologic processes such as rainfall, GW recharge, lateral seepage, and infiltration and hydraulic processes such as channel flow routing, travel time, and movement of hydraulic fluxes are all influenced by spatial extent scaling (Goodrich et al., 1997;Muleta et al., 2007;Seyfried & Wilcox, 1995;E. Vivoni et al., 2007;Wood et al., 1988;Woods et al., 1997).
Additionally, integrated models contain several submodels for hydraulic routing, rainfall-runoff partitioning, and GW flow simulations. Each of these submodels has an intrinsic spatial scale which refers to the average resolution or length (e.g., edge length for a triangular mesh or grid size for a square mesh) at which the hydraulic or hydrologic processes are calculated and aggregated across the entire basin. Due to the variability in how different processes work in an integrated model, the intrinsic spatial scales (or resolution) of submodels needed to accurately simulate these processes may be different for different physiographic and climatologic conditions.
Over the last few decades, several studies have evaluated the intrinsic spatial scales of single processes or individual submodels (Bloschl & Sivapalan, 1995). For example, Becker and Braun (1999) suggested that a simple spatial scaling of Instantaneous Unit Hydrograph for modeling lateral flows in basins can be obtained using drainage area and mean surface elevation. Yu et al. (2014) investigated the effect of spatiotemporal grid scales of input data (precipitation and soil moisture) on the simulated discharge response. The study concluded that the intrinsic scale (or resolution) of the rainfall-runoff model is affected by the choice and resolution of its parameters. In a similar study evaluating the effect of simplifying the watershed geometry on basin runoff response, Lane and Woolhiser (1977) found that a simple kinematic cascade geometry can preserve the runoff response of the watershed accurately. In terms of gridded models, this means that a maximum equivalent resolution exists for characterizing the overland flow region beyond which the simulated hydrograph becomes unreasonable. Additionally, Skøien et al. (2003) evaluated the characteristic space-time scales of precipitation, runoff, and GW flow using a variogram analysis.
One of the important studies in spatial scaling of integrated models by E. R. Vivoni et al. (2005) evaluated the spatial sensitivity of basin-averaged and distributed hydrologic response (water balance, runoff mechanisms, surface saturation, and GW flow) with respect to changes in topographic resolution. However, this study only analyzed the effect of modifying the spatial resolution of triangulated networks of the surface region (or intrinsic scale of the surface water [SW] submodel) and found that the near-stream resolution acts as a dominant control in affecting model performance. Considering that integrated models have both SW and GW submodels, there is a need to quantify the optimal intrinsic scales (or resolution) of both these submodels relative to each other in a single system for improving the model performance across a range of spatial scales.
Similarly, studies have evaluated the effect of spatial scaling on watershed properties and quantified the resultant spatial aggregation of these properties in models. For example, Rodriguez-Iturbe et al. (1995) analyzed the distribution of soil moisture from local point measurements to watershed scales (thousands of kilometers). Another study by Wu et al. (1982) analyzed the effect of roughness distribution in the direction of flow in affecting the simulated discharge and concluded that a single (uniform) roughness can be used for a given reach length provided that the hydrographs obtained from both single and spatially variable roughness are identical.
The primary hypothesis of this study is that the spatial extent scaling of integrated models can be improved through a better representation of the intrinsic scales of the physical processes and submodels, and a more realistic reproduction of overland flow characteristics can be achieved not only at the outlet but also within the watershed as an alternative to extensive parameter calibration. This hypothesis is tested through the following steps: (a) analyzing the effect of spatial scaling of model extents on the hydrologic and hydraulic processes (e.g., rainfall, recharge, and runoff) by quantifying the changes in basin response across a small and large subwatershed for the same region; (b) modifying the relative intrinsic structure (or resolution) of the SW-GW submodels and quantifying the change in model prediction; and (c) using the results from (a) and (b) to better capture the physical processes and improve integrated modeling at a larger spatial scale and across another watershed with distinct physical characteristics. Detailed description of the study hypotheses, objectives, and analysis approach is provided in Section 2.1.

Hypotheses and Analysis Approach
First, the change in basin response due to spatial extent scaling is quantified across a small and large subwatershed for the same region by comparing the streamflow prediction at the outlet and across four additional locations (for the large-scale model) where USGS gages are available for comparison. To identify whether the uncalibrated model parameters chosen a priori are physically feasible and justifiable, the integrated model with smaller spatial extents is validated across two USGS gages for both streamflow and depth for multiple short-and long-term hydrologic simulations. The inundation extents are also validated using an existing validated HEC-RAS model. To optimally delineate the relative contributions from different physical processes, two model structures, one containing only the surface hydraulics and other containing an integrated hydrologic and hydraulic structure, are developed. Further, the relative contributions from different physical processes, for example, rainfall and runoff generation, infiltration, lateral GW seepage, and GW recharge are quantified at a watershed scale after normalization. Finally, the study also quantifies the change in water table (WT) depths in the subsurface when the basin scale is changed from small to large while ensuring the SW-GW mesh structure remains constant.
The second objective aims to find an optimal model structure that can most accurately represent the watershed dynamics across a range of spatial scales for an appropriate set of model parameters (surface and subsurface) obtained from open access data sets. This is accomplished by modifying the intrinsic spatial scale of the GW submodel with respect to the SW submodel for a range of ratios to identify a ratio representative of the characteristic length scales of the hydrologic-hydraulic processes in the region which also results in best performance. As an alternative to extensive calibration, this study tests whether the performance of integrated models can be improved by adopting an optimal model structure. The effect of changing intrinsic spatial structure of the SW submodel is not evaluated here since several studies have quantified its effect and overall effect on watershed processes. This study focuses on modifying the relative intrinsic spatial scales of SW-GW submodels to change how they interact with one another to improve the spatial extent scalability and overall performance for a range of climatological and physiographic conditions.

Study Area and Data
A portion of the Upper Wabash River (UWR) basin in Indiana, USA (shown in Figure 1), is selected as the test site for this study. The UWR basin (4,385 km 2 ) experiences significant SW-GW interactions due to the existence of glacial till deposits, and shallow aquifers with a deep confining layer of shale. In addition, another watershed with distinct land cover but similar climatic conditions, is used as a validation site. Thus, a portion of the White River (WHR) basin with an area of 344 km 2 , and encompassing the city of Indianapolis, IN, USA, is also chosen for this study. As shown in Figure 1 and Table 1, the WHR basin is characterized by a significantly higher (86%) proportion of developed (urban) region, when compared to the UWR basin (10%), which has a predominantly agricultural land use. The average depth to WT for the UWR basin is 150 cm, while the WHR basin has a relatively shallow WT with an average vadose zone depth of 65 cm. A 9-m resolution topographic data set of high accuracy based on a LiDAR survey containing river channel bathymetry is used for both study areas (Dey et al., 2019).
The type of data, the spatial or temporal resolution of data, and the data sources are provided in Table 1. Several spatial data sets such as the land use, soil type, initial WT depth, impervious cover, shallow aquifer thickness, and roughness are used in this study. In addition to the land use distribution, Figure 1 also shows the location of the two study areas and the main hydrologic features such as the river networks, subbasins, and outlet. The soil classification in Table 1 suggests a dominance of soil group "C" for UWR basin and an increase in percentage of soil group "D" for the WHR basin. The average vadose zone depth from Table 1 suggests that the available storage in the unsaturated zone is higher for the UWR basin compared to the WHR basin. The NLDAS data are preferred over the National Oceanic and Atmospheric Administration's Climate Data Online, often referred to as rainfall gage data, since NLDAS has higher spatial resolution compared to gage data, which is sparsely distributed. The outlet of the UWR basin (shown in Figure 1) is located at the USGS gage 03335500 Wabash River at Lafayette, IN (latitude 40.422°N and longitude 86.897°W) and the outlet for the WHR basin is located at the USGS gage 03353000, White River at Indianapolis, IN (latitude 39.737°N and longitude 86.169°W).

Model Description
The ICPR model, a physically based, two-dimensional (2D) coupled SW-GW integrated model, is used for this study (Streamline Technologies, 2018). The model description is divided into three sections (Sections 3.1-3.3) to describe the model characteristics, development, and parameterization of the overland flow region, vadose zone, and 2D GW region. A representation of the physical systems, processes, and relevant model structure used for this study is presented in Figure 2a. In ICPR, momentum, energy, or diffusive wave equations are lumped along the edges of a triangular mesh ( Figure 2c) for flow routing from one vertex to another and the mass balance equations are lumped at irregular-shaped polygons (referred to as "honeycombs" in Figure 2c) which are formed around the triangle vertices and are used to establish the local control volumes. These honeycombs act as "subbasins" where all the hydrologic calculations are carried out. Even within an individual honeycomb, the geospatial data sets used for parameterizing land use and soil characteristics are intersected to form subpolygons containing unique intersections of these data sets (as shown in Figure 2c). Finally, the triangular mesh is used to develop a diamond mesh for roughness classification (referred to as "midnodes" in Figure 2c). For simulating overland flow, ICPR uses a multimesh structure and a finite volume solution for evaluating runoff as shown in Figure 2c. The conservation of mass equations for the finite volume solution in ICPR are written as follows: where Q = flow rate (L 3 T −1 );  E x = length of channel (L); Z 1 , Z 2 = WSE at upstream end of link, WSE at downstream end of link, respectively (L); and C f = conveyance factor.
ICPR uses the concept of nodes, links, and basins to solve the 2D overland flow which results in a single system of equations that is fully integrated. The water moves along the triangle edges and an idealized rectangular shape is assumed for flow computations. The average length and width of the rectangle are calculated from Equations 6 and 7, respectively. The average 2D slope is calculated from Equation 8 and includes a longitudinal slope component and a cross slope component. An exponential decay function, relating Manning's n to surface depth, is used in ICPR (Equations 9 and 10). To evaluate the discharge along two cross sections in the 1D flow region, the average friction slope is used (Equation 11). These methods are automatically used in ICPR based on the flow regimes, and the choice of the method is based on the information on friction slope provided in French (1985). These equations are also presented in Saksena et al. (2019Saksena et al. ( , 2020.

Vadose Zone Characterization
The soil moisture accounting, and subsequent recharge in this study, is computed using the "vertical layer method." In this method, the vadose zone is discretized into three vertical layers which are depicted in three shades in Figure 2b and are bound by the ground surface and the WT. This discretization provides a nonhomogeneous soil characterization in the vadose zone in which each of the three layers has distinct soil properties. Based on the gSSURGO database, each vertical layer in the vadose zone has an initial thickness of 50 cm; however, this thickness can change due to WT fluctuations during the simulation. This method is based on a noniterative kinematic approach which distributes each of the three individual layers into multiple cells ( Figure 2b) and tracks the fluxes through individual cells in both upward and downward directions. In this study, a total of 30 cells (10 for each soil layer) are used to track the flow of water though the vadose zone. While all the cells in a single soil layer start with the same initial parameters, each unsaturated cell is wetted or dried individually during the simulation depending on the direction and magnitude of the hydraulic flux, available soil storage, and unsaturated hydraulic conductivity.
Within the vadose zone, the water enters the soil column at the topmost cell and the hydraulic fluxes are calculated at the bottom of each cell based on the current moisture content and the unsaturated vertical conductivity as estimated by the Brooks-Corey method (Rawls & Brakensiek, 1982), shown in Equation 12.
where θ = current moisture content,  r E = residual moisture content, φ = saturated moisture content, Further, a mass balance is performed for each cell depending on the input flux minus the output flux and the moisture contents are updated based on these fluxes. Once the calculation reaches the bottommost cell of the vadose zone, the downward flux from this cell is removed from the vadose zone and delivered to appropriate GW node. After this step, the cells are rebalanced from the bottom to the surface while ensuring that the moisture content in each cell does not exceed saturation. In case any cell contains more moisture than the saturated moisture content, the excess is moved to the adjacent (upward) cell and delivered to the surface cell if the excess reaches the topmost cell. Based on the dynamic movement of the WT, the cells are "fully wetted" as the WT moves upward or "dried" to field capacity if the WT drops below the saturated cells. For more information, please refer to Saksena et al. (2019Saksena et al. ( , 2020.

Two-Dimensional Groundwater Characterization
Within the surficial aquifer, the GW flow in ICPR is calculated using the 2D unsteady phreatic solution of the continuity equation using a finite element formulation. The continuity equation for unsteady phreatic 2D GW flow is shown in Equation 13. Additionally, the velocity vectors for an isotropic porous media are expressed using Equation 14. The water transport, storage variation, and external flows into the vadose zone and overland flow region are calculated by creating a six-point triangular element using a quadratic interpolation function dependent on a cyclic permutation (Figure 2d), and the system of equations is solved using the Cholesky method (Kuiper, 1981;Martínez, 1989). Within ICPR, Equation 15 is expressed and solved for six nodes, which are formed using three triangle vertices and three midside points, and the solution is approximated using finite element formulations for each individual triangle over the entire mesh network. Finally, the seepage rates along a sloping ground surface, riverbank, or seepage faces on a hill are calculated using Equation 16. Detailed equations and their approximations used for GW modeling in ICPR are provided in Section SI-1.
where n is the fillable porosity (or specific yield); h is the GW elevation (piezometric head, L); u, v are the velocity vector components (L T −1 ); t is time (T); x, y are the Cartesian coordinates (L); K is the permeability (conductivity) of the porous media; A-F = coefficients of the six-point quadratic function; seepage The changes in WT location due to GW flow, and subsequent recharge from the vadose zone, are dynamically tracked at each time step. The surficial aquifer is bounded by a spatially distributed bedrock layer with variable thickness, which is assumed to be impermeable. Excess rainfall that does not enter the soil column is assigned to its respective surface node and is routed from one surface node to another. Water that reaches the bottom of a given soil cylinder by crossing the interface between the vadose zone and the saturated zone is aggregated and delivered to its respective GW node as recharge. If the soil column in the vadose zone becomes saturated and there is inundated water on the surface, the vadose zone is bypassed, and a known head condition is applied to the corresponding GW node based on the surface node stage. In this scenario, the seepage into or out of the GW system at that location is calculated by the GW model and removed from or delivered to the surface node based on Equation 16.
As mentioned earlier, the WT distribution is built using contour maps developed by the Indiana Department of Natural Resources (IDNR) using an extensive observational database. These contour maps are used to generate an initial WT depth raster using inverse distance weighting interpolation method which is subtracted from the ground LiDAR to obtain an initial WT surface. This raster is only used as an initial condition and the final surface is obtained after simulating the model for a substantially long low flow period (>1 month) to ensure that the WT distribution reaches equilibrium. The equilibrium is obtained when the baseflow simulated by the model consistently matches the observed low flow and the overall mass balance error (MBE) in the system is substantially reduced (−2% < MBE < +2%).

Mesh Development for SW and GW Zones
The 2D mesh for both SW and GW is generated using two features called "breaklines" and "breakpoints" as shown in Figure 3b. Both the SW and GW regions are discretized at the same mesh resolution. "Breaklines" are used to digitize the stream network as each point on the "breaklines" is assigned a triangular vertex node in ICPR. The "breaklines" are digitized across the entire length of the river networks such that the minimum distance between any two vertices is 9 m and the maximum edge length along straight channels is 27 m with an average edge length of 15 m in the river channels. Similarly, "breakpoint" features are used for mesh digitization and refinement outside the river channel, and to better define storage in the modeling domain. Equilateral triangular "breakpoint patterns" with an edge length of 60 and 300 m are assigned inside and outside the river floodplains, respectively. This ensures the formation of a triangular mesh of average edge length 15 m in the river channels, 60 m in the floodplain, and 300 m outside the floodplain as illustrated in Figure 3b and presented in Table 3. In addition, a 1D level-pool nodal representation of the ponds located throughout the watershed is integrated with the overland flow model. The 9 m LiDAR is used to extract stage-area rating curves, at a vertical resolution of 0.5 m, thereby ensuring that the wetted areas are calculated from the lowest elevation in the pond up to the pond control volume at every 0.5 m. A pond control volume is obtained by digitizing pond features in ArcGIS using a combination of terrain data and aerial imagery as shown in Figure 3a. Since the WHR basin is located in a highly developed region, the road networks in the basin can play a significant role in routing the flow during extreme events. Therefore, the TIGER/Line® shapefile of road networks published by the U.S. Census Bureau is imported into ICPR ( Figure 3a) as "breaklines" after simplifications to ensure that triangle edges are formed along the roadways thereby capturing important topographical features.

Model Parameterization
The shallow and deep roughness values corresponding to the NLCD land use classes (Table 2) are directly obtained from the maximum and minimum roughness values from Chow (1959). Table 2 presents the initial subsurface parameters used for the vadose zone and the GW region. The gSSURGO database contains spatially distributed hydrologic soil group classification based on runoff potential which includes variable parameters based on depth of soil. The four soil zones used in this study are based on this classification. From this data set, the median values of the saturated vertical conductivity, moisture content at field capacity, and moisture content at wilting point (presented in Table 2) are extracted for the specific soil zones.
The other parameters presented in Table 2 are extracted using the soil texture percentages (e.g., percentage of sand and clay), which are also available from the gSSURGO database. These soil texture percentages are incorporated in the equations provided in Rawls (2004) for estimating the other subsurface parameters such as pore size index, soil matric potential, and residual moisture content (Streamline Technologies, 2018). For the GW zones, the map of the spatial distribution of different unconsolidated aquifer systems for Indiana is available from IGIC's Indiana Map Server (http://maps.indiana.edu/index.html). The aquifer properties for Wabash Valley are obtained from a study on the investigation of GW resources in Indiana (Pohlmann, 1987). Once these parameters are extracted, they are imported into the model. Since the modeling is dynamic, these parameters are used as initial or boundary conditions for the model and the actual values of moisture content, WT depth, and hydraulic conductivity evolve with time based on the amount of water percolating into the subsurface.

Delineating the Spatial Extents of Models
The 4,385 km 2 model of the unregulated UWR basin (shown in Figure 4a) is characterized as the model with a larger spatial extent which is then reduced to a smaller model ( Figure 4a) with a spatial extent of 1,757 km 2 . The spatial extents for the models are determined by the location of upstream USGS gages that act as boundary conditions for flow input (Figure 4a). While the model with the smaller spatial extent (referred to as "S") contains four upstream gages including the Wabash River, the model with the larger spatial extent (referred to as "L") contains eight upstream gages (Figure 4a). The USGS gage 03329000, Wabash River at Logansport, is used as the upstream boundary in the Wabash River for the "S" model, and the USGS gage 03323500, Wabash River at Huntington, is used for the "L" model. However, it is ensured that the outlet for both models is at the USGS gage 03335500, Wabash River at Lafayette (Figure 4a). Table 3 provides a description of the model configurations used in this study.

Two Model Structures for Analyzing the Effect of Spatial Extent Scaling
To identify which hydrologic outputs are affected by increasing the spatial extents of the model, the differences in surface and subsurface contributions across "S" and "L" models need to be quantified individually. Therefore, the effect of spatial extent scaling is analyzed across two degrees of process-based complexities using two types of model structures, with and without subsurface connectivity. First, the 2D hydrodynamic modeling structure is created using the methodology described in Section 3.1. This model structure only incorporates the hydraulic processes and, thus, is based solely on streamflow routing governed by roughness characterization. The subsurface is assumed to be impervious and there is no infiltration in this structure. The models created using this structure for small and large spatial extents are referred to as "S-1" and "L-1," respectively.  Second, a fully integrated modeling structure is developed following the procedure described in Sections 3.1-3.3, which contains the surface, vadose, and GW zones. The models created using the integrated modeling structure across small and large spatial extents are referred to as "S-2" and "L-2," respectively (Table 3). Details on how the integrated model structure is developed and how its performance compares to the 2D hydrodynamic modeling structure can also be found in Saksena et al. (2019).
To analyze the effect of spatial extent scaling on model performance, the streamflow prediction at both extents scale is compared with observed streamflow at the outlet and across four additional locations (for the large-scale model) where USGS gages are available. These locations represent six reaches representing different stream length scales in the watershed. This is done to ensure that the results are valid and applicable across a range of streams types and watersheds. Table 6 presents the geomorphologic characteristics for the study reaches used for comparison of streamflow with observed data and Figure 4b shows the locations of these USGS gages. The relative contributions from different physical processes, for example, rainfall and runoff generation, infiltration, lateral GW seepage, and GW recharge are quantified for both small-scale and large-scale (S-2 and L-2) models after normalization. The change in WT depths in the subsurface when the basin scale is changed from small to large is also quantified while ensuring the SW-GW mesh structure remains constant. . Illustration of (a) two spatial extents used in this study with upstream USGS gages, (b) location of USGS gages used for testing the effect of spatial stream length scaling, (c) fine resolution triangular mesh, and (d) fine resolution honeycomb mesh.
A limitation of the approach adopted here is that the change in soil moisture spatial distribution is not considered. Instead, the spatial and depth variability in the vadose zone is captured through the WT distribution. Although the ICPR model tracks soil moisture spatial and depth variability during the simulation, soil moisture maps at any given time cannot be extracted in the current version. The future version of the ICPR model, which is currently under development, will have the capability of extracting soil moisture maps, and therefore, the analysis for future studies would include comparison with soil moisture observations.

Validation of S-2 Model
This step is carried out to identify if the set of uncalibrated model parameters chosen a priori are physically feasible and justifiable, since the issues with modeling structure at larger spatial extents can only be analyzed if the parameters produce realistic and consistent predictions at smaller spatial extents (Beven, 2002;Wood et al., 1988). The validity of the model parameters is evaluated by simulating S-2 across three independent events, a 4-month simulation from April 1 to July 31, 2013, a 3-month simulation from February 15 to May 15, 2018, and a relatively smaller storm event with a 10-year peak flow from May 5 to May 27, 2009. The streamflow and depth at the outlet (Wabash River at Lafayette) for the 4-month simulation is compared with observed data and shown in Figures 5a and 5b. In addition to the outlet, the streamflow prediction at an additional gage inside the model domain (Tippecanoe River at Delphi) is also presented in Figure 5c.
Since observed extents of inundation for these events are not available, the maximum inundation extents in 2013 are compared with a reference data set generated using a calibrated HEC-RAS model. The F-statistic (Equation 17), which is the ratio of area of intersection to the area of union, is used to quantify the difference in spatial variability of the maximum inundation extents between any two configurations. In addition to comparing S-2 and the HEC-RAS model, the F-statistic is used to evaluate the difference in maximum inundation extents between the configurations described in Section 4.4.
where 12 E A = area of intersection between two configurations, 1 E A = inundation area of first configuration, and 2 E A = inundation area of second configuration.

Effect of Changing the Relative Intrinsic Spatial Scale of the SW-GW Submodels
This section focuses on testing the effect of modifying the relative structure of the SW-GW submodels. Initially, both the SW and GW regions are discretized at the same spatial resolution (Section 4.1), and hence the intrinsic scales of the SW and GW submodels are assumed to be the same. First, the GW resolution of S-2 is modified to a coarser resolution and SW resolution is left unchanged. The S-2 model is converted into two The values corresponding to the SW and GW resolution in the river (R), floodplain (F), and the overland region (O) outside the floodplain represent an average resolution or vertex spacing in these regions determined from the SW and GW meshes.

Table 3
Description of Model Configurations coarser resolution (3 and 5 times) models (referred here as S-3 and S-4) by changing the average triangular vertex spacing of the GW submodel, as shown in Table 3. The percentage of SW to GW triangles reduces significantly from a 100% in S-2 to 11.4% in S-3 and 7.1% in S-4 as shown in Table 4. Additionally, a model with a coarser GW resolution is generated for the larger spatial extent, and revised model is referred as L-4.
The difference in performance across S-3 and S-4 model configurations is evaluated by quantifying the improvement/reduction in streamflow prediction and overall quality and shape of the hydrograph. Further, the difference in model performance across four gages and six statistical parameters for L-1, L-2, and L-4 is also quantified. In addition, the baseflow and direct runoff variability across models with different ratio of SW to GW resolution (L-2 vs. L-4) is also evaluated. Baseflow is a useful parameter in optimally delineating the contribution from GW to SW flow and direct runoff is used for quantifying the contributions from precipitation forcing. The baseflow is evaluated using the linearized Dupuit-Boussinesq approximation (Equation 18) resulting in an exponential equation (Tallaksen, 1995) provided below: where t E Q is the flow at time t, 0 E Q is the flow when t = 0, and k is the exponential decay constant. This equation is used to evaluate the baseflow for all the events for L-2, L-4, and observed data. The baseflow is then used to quantify the contributions from direct runoff. The estimated baseflow is also compared to the lateral seepage inflow and outflow rates obtained for S-2 to confirm the applicability of baseflow separation method used here. Finally, the subsurface variability in the prediction of S-2 and L-2 is also quantified by evaluating the difference (both positive and negative) in the median WT depth simulated during the 120day (4-month) period. Similarly, this difference is evaluated between S-4 and L-4.
It should be noted that the aim of the study is not to validate the GW submodel against observational data, instead, the study analyzes how modifying the relative intrinsic scales of SW-GW submodels can help in identifying the "optimal" structure which improves the surface system predictions measured using simulated hydrographs and resultant flood inundation. Unless WT observations are available publicly, comparison of this data with model simulations is not feasible especially when developing large-scale-integrated models. An ideal scenario would be to compare spatially distributed WT observations across model configurations, but due to lack of these observations, this study quantifies the change in WT distribution when the ratio of SW-GW mesh structure is altered.
Section 4.2 validates the streamflow hydrographs, depth, and extent of inundation for the S-2 model, thereby, showing that the surface system is capturing the overall volumes reasonably. Since the net change in storage at any time is equal to inflow minus outflow, and the overall mass balance of the system is predicted reasonably (−2% < MBE < +2%), the study assumes that the net inflow for the subsurface (outflow for the surface system) is captured reasonably, hence the change in WT distribution can be considered as a representative variable reflecting the change in GW contributions across different model configurations. Therefore, the change in the subsurface structure is analyzed by quantifying the change in WT distribution across different configurations.

Mesh Properties, Spatial Aggregation, and Scaling of Model Configurations and Parameters
The SW mesh resolution is the same for all four models (S-1, S-2, L-1, and L-2) and is created using the methodology described in Section 3.2. There is no GW mesh for S-1 and L-1 but the GW mesh resolution for S-2 and L-2 is the same as the SW mesh resolution. For illustration, the triangular and honeycomb mesh structures generated near a confluence for the models are shown in Figures 4c and 4d.
After creating these configurations, SW mesh is overlain on the LiDAR DEM and the elevation of LiDAR cell (9 m resolution) directly intersecting with a specific triangular vertex is extracted and assigned to the vertex. Similarly, the GW mesh is overlain on the WT Depth raster (30 m resolution) and the initial elevations are

Table 4 Spatial Aggregation and Geometric Characteristics for the Integrated Model Configurations With Different SW-GW Mesh Resolutions
extracted. The spatial aggregation parameters for these meshes are calculated and presented in Table 4. Since the same SW mesh is used for S-1 and S-2, and L-1 and L-2, the number of triangles and the mean triangular area are the same, so only the parameters for S-2 and L-2 are reported. To compare the spatial resolution of triangulated irregular networks (TINs) with gridded DEMs, E. R. Vivoni et al. (2005) used two spatial aggregation parameters, the horizontal point density ( E d ) and the equivalent cell size ( e E r ), which can be evaluated using Equations 19 and 20.
where E d = horizontal point density used to capture the degree of aggregation in a triangular mesh compared to a fixed grid model (E. R. Vivoni et al., 2005), t E n = number of triangular nodes, g E n = number of DEM nodes, e E r = equivalent cell size (or length scale), and E A = basin area. The number of DEM nodes   g E n is evaluated using the 9 m LiDAR for the SW region and the 30 m WT depth raster for the GW region.
The " e E r " values for the SW and GW meshes for S-2 (114.6 m) and L-2 (122.6 m) represent the equivalent spatial resolution of the TINs. Similarly, the " E d " values provide a ratio of the number of computational elements used in the SW (S-2 = 0.006 and L-2 = 0.006) and GW (S-2 = 0.057 and L-2 = 0.050) meshes in comparison to the fixed-resolution-gridded raster data sets. Table 4 also presents the average geometric characteristics of the model configurations. The total number of subpolygons or soil cylinders generated for the models is also shown in Table 4 which represent unique intersections of the land use, soil, and rainfall data sets with the SW mesh (Section 3).
Generally, varying relative SW-GW mesh resolution or spatial extent scale for integrated models can impact the parameterization of the input layers and associated attribute table of parameter values. The way the model structure is designed in this study significantly reduces the impact of parameter spatial aggregation, especially for the land use and unsaturated zone properties. First, the study incorporates a consistent resolution of the SW model by not varying the SW mesh structure across different configurations (as shown in Table 3) that are built for evaluating the effect of spatial extent scaling and intrinsic spatial scaling of the GW submodel (or resolution). In the context of this study, this relates to how the watershed is discretized into smaller irregular-shaped "honeycombs" that act as "subbasins" following a distributed structure as shown in Figure 2c. Since the total number of surface honeycombs remains the same, the land use parameters and the parameters assigned to the soil cylinder remain constant when the GW mesh resolution is varied. The study uses 30 m raster data sets for assigning the land use and soil properties to corresponding grid cells.
Since the surface mesh is constant, these raster data sets intersect with the same mesh across different configurations, therefore, the input precipitation remains the same, and the land use and soil parameters remain the same.
Second, the ICPR model allows for subgrid variability of geospatial parameters inside an individual mesh element. As mentioned earlier, "honeycombs" act as "subbasins" where all the hydrologic calculations are carried out. Even within an individual honeycomb, the geospatial data sets used for parameterizing land use and soil characteristics are intersected to form subpolygons containing unique intersections of these data sets (as shown in Figure 2c). Therefore, the spatial scale and resolution of geospatial parameters are not aggregated or modified even within the smallest mesh element, and hydrologic calculations (rainfall-runoff generation) are individually carried out for each of these subpolygons. This can also be confirmed in Table 4 as the total number of subpolygons forming unique soil cylinders (2,077,171) remains constant for L-2 and L-4 configurations even though the GW mesh structure is varied. Therefore, the amount of water propagating through the hydrologic system does not vary when the GW mesh structure is changed, instead, the flow of water is impacted by differing WT gradients underneath the SW model, thereby, changing the available subsurface storage. Thus, a distinct advantage of approach used here is that the effect of modifying the GW resolution (or subsurface topographic variability) can be independently evaluated.

Numerical Simulation and Statistical Analysis of Watershed Responses
The configurations (described in Table 3) for the UWR basin are used to simulate the 2013 summer floods of Indiana (April 1 to July 31, 2013). During this period, multiple rainfall events of varying magnitude were observed in Indiana that also resulted in a peak streamflow corresponding to a 50-year return period in the Wabash River at Lafayette. A warmup period of 100 hr is chosen for all simulations for all model reaches to be fully connected. The statistical analysis involves evaluating the goodness of fit between all the models and observed data and is carried out using the coefficient of determination (R 2 ), slope parameter (for comparison about the line Y = X), and the Nash-Sutcliffe efficiency (NSE). The Percent Bias (PBIAS) is evaluated to compare the outflow volumes from the watershed with the observed data. This parameter also provides an estimate of the overprediction and underprediction of water storage in the system, as the inflow volumes for all models at a given scale are the same. Therefore, it is used as a representative parameter for comparing the overall water balance across models. Additionally, the scale-normalized-root-mean-squared error (RSR) is evaluated to estimate the RMSE in prediction, scaled by the standard deviation of observed data. Finally, the model performance is also determined by the percentage of error in peak flow prediction (%Error peak flow), as the peak flow governs the maximum inundation extents due to flooding.

Results
The results are presented in four parts: (a) validation of uncalibrated model parameters using S-2, (b) analyzing the effect of scaling of spatial extents of the model, (c) evaluating the effect of changing the intrinsic scale of the GW model, and (d) validation of improved spatial scale representation at another watershed.

Validation of Uncalibrated Model Parameters for Multiple Events Using S-2
The statistical parameters (R 2 = 0.96, NSE = 0.95, and PBIAS = −6.4%) for the 4-month simulation (Figure 5a) and results shown in Figures 5b-5f suggest that the streamflow, depths, and inundation extents are predicted accurately, and hence, the model parameters used in this study are reasonable for spatial scale analysis. It should be noted that the NSE values reported here are higher than what are typically seen in hydrologic models. This is because traditionally, NSE values are evaluated over a long-term (>1 year) simulation at a daily scale, while in this study the NSE values are evaluated at an hourly scale for subyearly (<6 months) simulations which improves the absolute value of NSE. An F-statistic value equal to 88.1% suggests a high correlation between the extents of inundation obtained from the two models. Finally, the streamflow comparison for 2018 and 2009 is presented in Figures 5e and 5f. Further, the results also suggest that the model parameters used in this study are reliable and independent of the events chosen.

Comparison of Streamflow and Inundation Extents
In this section, the 300-hr (or 12.5-day) period containing the storm event with the 50-year peak flow is compared across the four models (S-1, S-2, L-1, and L-2). A smaller period is chosen to effectively delineate the effect of spatial scaling of model extents on streamflow and inundation extents. The results shown in Table 5 suggest that S-1 overpredicts the peak flow by 11.6% compared to S-2, which overpredicts the peak flow by only 1.2%. While there is some difference in peak flow, the overall hydrograph shape, PBIAS, and NSE for both models are similar (Figure 6a). While the performance of S-1 and S-2 is similar, the results from Figure 6b highlight the diminished performance of the two model structures for L-1 and L-2. Additionally, there are notable differences in the hydrographs for L-1 and L-2, which suggests that spatial scaling of model extents increases the influence of the contributions from subsurface processes.
The PBIAS for L-1 (−7.45%) is significantly higher compared to L-2 (−1.60%), and error in peak flow is also higher for L-1 (−16.4%) compared to L-2 (−10.6%), indicating that the contributions from subsurface processes, which in-turn determine the water balance, change from S-2 to L-2. A lower PBIAS for L-2 also indicates that there is minimal MBE, however, this also means that more storage is trapped in the overland flow system, indicating that the inaccurate representation of the intrinsic scales (or resolution) of SW-GW submodels may be the cause for peak flow underprediction. Although Figures 7a-7d only show a portion of the maximum inundation extent at the instance peak flow is observed at the outlet, they clearly highlight the impact of different model structures between S-1, S-2, L-1, and L-2. On comparing the maximum inundation extents (depth >15 cm) across the entire simulation duration between S-1 (88.9 km 2 ) and S-2 (93.3 km 2 ), the F-statistic is reported as 88.6% suggesting an acceptable correlation between the two configurations. A low value of F-statistic (67.5%) between L-1 (138.7 km 2 ) and L-2 (198.9 km 2 ) shows increased water contribution from headwater streams with higher hydraulic conductivity and steeper valleys (Figure 7d), thereby, increasing the overall inundation.
The flow prediction statistics shown in Table 6 suggest that L-2 performs with high accuracy for reaches with a smaller length and drainage area including Eel River (12 km) and Tippecanoe River (23 km), which is also highlighted by near perfect flow hydrographs in Figures 8a and 8b. However, the flow hydrographs in Figures 8c-8e and statistical comparison in Table 6 highlight the relative inaccuracy of L-2 for reaches with larger stream lengths and drainage area. The performance is affected by the inability of L-2 in capturing the baseflow for the Wabash River at Wabash and Wabash River at Peru. The results also show that prediction of L-2 improves from upstream (Wabash River at Wabash) to downstream (Wabash River at Lafayette), therefore, the inaccurate contributions at upstream locations in the main river and headwater streams are nullified as the streamflow moves downstream since smaller reaches with accurate prediction contribute water to the main river (Wabash River at Lafayette).

Comparing Volumetric Contributions for "S-2" and "L-2"
Figures 6g and 6h present the volumetric contributions (in mm) from subsurface processes during the simulation for S-2 and L-2, which are evaluated by dividing the total volume by the total watershed area. The precipitation values (Figure 6g) indicate that there is a similar amount of rainfall occurring for S-2 and L-2, which highlights the magnitude and extent of the 50-year flood event inside the simulation period.
While the rainfall aggregation across the two spatial extents is similar, the subsurface contributions change from S-2 to L-2, as shown in Figures 6g and 6h, respectively. More specifically, the amount of vadose zone storage increases for L-2 when compared to S-2. Similarly, the amount of rainfall excess (or surface runoff from rainfall) is higher for L-2 when compared to S-2. This difference is expected as there is an increase in the impervious land cover in the floodplain of about 0.7% from S-2 to L-2, which increases rainfall excess. As excess water gradually reenters the soil column in ponded areas over time, the rainfall excess reduces across both extents. The recharge (Figure 6h) on the other hand, is similar for both S-2 and L-2, which supports the argument that the recharge operates over larger characteristic length scales when compared to overland region and vadose zone, and therefore, aggregates linearly and does not change significantly even after increasing the modeling extents (Bloschl & Sivapalan, 1995).
Since rainfall characterization is influenced by the intrinsic scale of the input data set, the consistency in rainfall volumes across two spatial extents in this study is also driven by characterizing rainfall variability using a relatively fine NLDAS rainfall forcing (12 km resolution). Considering that the characteristic length scale for hourly rainfall lies within 10-20 km (Skøien et al., 2003), the spatial aggregation of rainfall is more feasible using this data set. It is expected that if a coarser intrinsic scale (or resolution) of rainfall data set is used, for example, a sparsely distributed gage data, the difference in scaled rainfall volumes may change across the small and large models which can influence the streamflow hydrograph and hydrologic volumes. While the intrinsic scale of rainfall forcing is not considered as a factor in this study, it can play an important role for other watersheds and/or rainfall events.

Evaluating the Effect of Changing the Intrinsic Scale of the GW Submodel
This section presents the results of increasing the intrinsic scale of the GW model or coarsening the GW resolution with respect to the SW resolution.

Effect on Streamflow Prediction and Hydrograph Shape
As shown in Table 6 and Figures 9a and 11b, the use of a coarser resolution GW mesh for both S-3 and S-4 improves the prediction of rising and falling limb of the hydrograph, lag-time, and hydrograph duration, when compared to S-2, which has a shorter hydrograph duration, and errors in the rising limb and lag-time ( Figure 6). This suggests that the subsurface contributions to streamflow are better represented using S-3 and S-4. A thin-shaped "flashy" hydrograph with a short duration for S-2 suggests relatively less contribution from subsurface processes. The statistics provided in Table 6 suggest that the overall performance of S-3 is comparable to S-4; however, the PBIAS comparison suggests that S-4 (−0.1%) has a slightly improved water balance compared to S-3 (−3.59%). By incorporating optimal GW resolution in S-3 and S-4, the streamflow estimation improves significantly, highlighted by smaller error variance (RSE), higher NSE, and R 2 values. Therefore, using a GW resolution which is approximately 3-5 times the SW resolution results in improved model prediction for this watershed.

Scaling of Model Extents by Modifying the Intrinsic Scales of Submodels
Section 5.3.1 suggests that modifying the intrinsic scale of the GW submodel with respect to the SW submodel improves the streamflow prediction for the S-4 model. The results shown in Table 6 and Figure 9 highlight the improvement from L-2 to L-4, and therefore, suggest that using optimal intrinsic scales of SW-GW submodels results in a better streamflow prediction even for larger spatial extents. The streamflow prediction across four USGS gages in the Wabash River improves significantly, for example, the NSE for Wabash River at Wabash increases from 0.32 for L-2 to 0.76 for L-4. Additionally, the performance of S-4 and L-4 becomes comparable. For example, the PBIAS (S-4 = −0.1% and L-4 = −0.4%) and NSE (S-4 = 0.96 and L-4 = 0.97) for Wabash River at Lafayette are very close to each other suggesting that the model performance is not affected by increasing the spatial scale of model extents. On the other hand, there is a significant difference in the performance of S-2 and L-2 shown by the higher variability in PBIAS (S-2 = 7.4% and L-2 = 16%) and R 2 (S-2 = 0.97 and L-2 = 0.84). Therefore, modifying the intrinsic scales of the submodels improves the spatial scalability of integrated models.

Table 6 Geomorphic Characteristics of Study Reaches and Performance of Different Model Configurations in Predicting Streamflow at USGS Gage Locations for the UWR (S-3, S-4, L-2, and L-4) and WHR Basins (WHR 1 and WHR 2)
The difference in the model performance across four gages and six statistical parameters for L-1, L-2, and L-4 is presented in Figure 10. The results from Figure 10 show that incorporating an integrated modeling structure in L-2 instead of a purely hydrodynamic structure (impervious surface) in L-1 improves the overall streamflow prediction in the watershed. Further, modifying the intrinsic scales of the submodels in L-4 results in a greater improvement in streamflow prediction than simply using an integrated model (L-2). The extent of improvement also varies from upstream to downstream with the upstream-most gage having the largest improvement (Wabash River at Wabash) and the downstream-most gage (Wabash River at Lafayette) having the smallest improvement. Hence, the degree of improvement also changes depending on the location of the reach in the watershed.

Estimating Baseflow and Runoff Variability Across Models
Two hydrographs corresponding to the 120-day simulation of the L-2 and L-4 configurations at the Wabash River at Lafayette are selected to evaluate the baseflow using Equation 18 (presented in Section 4.4). The baseflow is then used to quantify the contributions from direct runoff. As shown in Figure 11d, the baseflow for L-2 is substantially higher compared to observed baseflow with a PBIAS = 88.9%. On the other hand, the baseflow for L-4 matches the observed baseflow reasonably with a PBIAS = −4.2%. Although there is a significant different in estimated baseflow, there is a less severe difference in direct runoff generated from L-2 (PBIAS = −12.0%) and L-4 (PBIAS = +0.4%) when compared to observed data, and this difference can be attributed to the inaccurate peak streamflow estimation in L-2. The hydrographs obtained for other locations (Figure 9) confirm the results shown here, hence, the results from the baseflow analysis across one location are also applicable to other locations even though, the difference in baseflow estimates may be different. Moreover, the recession limbs of the hydrographs (Figures 11a and 11b) suggest an inaccurate prediction for L-2 and a good fit for L-4. Since the recession limbs of the direct runoff curves (Figure 11c) fit well with observed data for both L-2 and L-4, the results suggest that the error in L-2 is cause primarily caused due to inaccurate baseflow prediction, highlighting the need for modifying the relative intrinsic spatial scales of the SW-GW submodels to improve baseflow estimation. The estimated baseflow is then compared to the lateral seepage inflow and outflow rates obtained for S-2 ( Figure 11e) to confirm the applicability of baseflow separation method used here. These volumetric rates for S-2 are aggregated across all the river channels and represent the total inflow and outflow in and out of the GW region. The shape of the baseflow curves in Figure 11d matches well with the shape of the lateral seepage inflow for S-2 (Figure 11e), thereby, confirming that the baseflow estimation method is suitable.

Effect on Distributed Flood Inundation Depths and Extents
Section 5.2.1 highlights the impact of model structure as seen by the maximum inundation area (km 2 ) and F-statistic for S-1  E S-2 and L-1  E L-2. When compared to S-2 (93.3 km 2 ), S-4 has a greater maximum inundation extent (99.5 km 2 ) and the F-statistic representing S-2  E S-4 is equal to 87.7%. Figures 12a-12d present the distributed flood depth (in meters) maps representing the maximum inundation extents for S-2, L-2, S-4, Figure 10. Variation of statistical performance in streamflow prediction across four USGS gages using L-1, L-2, and L-4 model configurations.
and L-4. The improvement in streamflow prediction (Figures 8c,8d,9c,and 9d) at Wabash River at Wabash and Wabash River at Peru (Figure 4b) is translated into increased extent of inundation between these two gages which can be seen in Figures 12b and 12d. The maximum inundation extent for L-4 (211.3 km 2 ) increases when compared to L-2 (198.9 km 2 ) with an F-statistic representing L-2  E L-4 equal to 91.0%. To evaluate how changing the model structure impacts the spatial extent scaling with respect to distributed flood inundation extents, the F-statistic representing S-2  E L-2 and S-4  E L-4 is quantified. The increase in F-statistic from S-2  E L-2 (79.2%) to S-4  E L-4 (86.3%) shows that the difference in maximum inundation extents between "small" and "large" models reduces. These results highlight the impact of modifying the intrinsic spatial scales of submodels on distributed flood depths and extents which also leads to more consistent prediction at both spatial extents.

Analyzing the Change in Water Table Distribution Across Spatial Extent Scales and Different Submodel Intrinsic Scales
The results presented earlier highlight the improvement in prediction across different hydrograph components for the two spatial extents. Figure 12e shows that the difference in the median WT depths predicted by S-4 and L-4 reduces when compared to the difference between S-2 and L-2. It should be noted that the WT elevations predicted by the S-4 model are also compared with observed WT elevations across two fields for a total of 17 unique observations (spatial and temporal distribution combined) located in the floodplain for two storm events in 2018 and 2019 (not presented here), and the overall performance suggests a good fit in predicting WT across the floodplain. The region in "red" denotes the locations where there is a difference between the median WT depth between S-2 and L-2 but not in case of S-4 and L-4 ). The region in "green" denotes the intersection between these differences ( | ) identifying the locations where the median WT depth does not improve. The results show that the variability in WT depth reduces suggesting an optimal scaling of spatial extents when the intrinsic scales of the submodels are incorporated accurately.

Validation of Improved Spatial Scale Representation at Another Watershed
To evaluate broader applicability of the results from this study across watersheds with different physical characteristics, the findings from the previous sections are tested at the WHR basin. This basin has a highly developed urban land use and a shallow WT (Table 1), in comparison to the UWR basin, which is an agricultural watershed with a deeper unsaturated zone, although both basins are in a similar hydrogeologic setting. Although a similar SW resolution is used to build the SW mesh, the incorporation of urban features (road networks, ponds, and bridges) results in an overall finer SW mesh (Figure 3a) compared to the UWR basin. For the first model (referred to as WHR 1), the SW and GW resolution (intrinsic scales) are assumed to be the same for the largest reach (White River at Indianapolis, reach length = 50 km). For the second model (referred to as WHR 2), a fine resolution SW region and a coarse resolution GW region, is used.
The models (WHR 1 and WHR 2) for the WHR basin are simulated for a 300-hr long simulation, for an event with a 50-year peak flow return period, after a warm-up period of 50 hr. These models require a smaller warm-up period due to a relatively smaller study area. It should be noted that these models are created using the same representative parameters for soil type, roughness, and aquifer properties as the L-4 model. However, due to an increased impervious cover, land use variability may impact watershed hydraulics as the floodplain roughness reduces, as well as hydrologic processes such as rainfall excess and subsurface storage. Comparison of the two models ( Figure 13) clearly indicates that WHR 2 has a superior performance (NSE = 0.94, PBIAS = −8.7%, and peak flow error = −4.8%) compared to WHR 1 (NSE = 0.71, PBI-AS = +24.4%, and peak flow error = +15.5%). An improved performance for WHR 2 is obtained when the GW mesh size is approximately 2-3 times the SW mesh, indicating that a relatively finer GW mesh is needed for this basin in comparison to the UWR basin. Overall, WHR 1 overpredicts the streamflow for both high and low flows (Figure 12a) in addition to erroneous estimation of the recession limb of the hydrograph.
The results indicate that the incorporation of optimal intrinsic scales in integrated models results in an improved performance even for a watershed with variable land use characteristics. The hydrologic volumes for the WHR 2 (shown in Figures 13e and 13f) suggest a significant increase in rainfall excess, and subsequent decrease in vadose zone storage, and recharge, when compared to the UWR basin. The influence of urban land use on hydrologic process interactions has been evaluated by several studies (Lerner, 1990;Price, 2011;Schilling et al., 2008), and the results here validate the findings from these studies. Therefore, it can be concluded that the hydrologic processes are accurately represented using the modified model structure for the WHR basin.

Effect of Spatial Extent Scaling
The overall similarity in the hydrographs and maximum flood extents (F-statistic = 88.6%) for S-1 and S-2 suggests that for models with a small spatial extent and fine spatial resolution, there is insignificant difference in the overall prediction with and without the inclusion of subsurface processes. In such cases, the role of rainfall and subsurface processes in influencing model prediction is diminished. On the other hand, the difference in the performance between L-1 and L-2 (F-statistic = 67.6%) can be attributed to the increased significance of rainfall excess and subsurface flow at large spatial scales, which leads to increased contributions from lower-order headwater streams. The inundation extents (Figures 7a-7d) suggest that the integration of rainfall and subsurface hydrology with hydrodynamics becomes essential for watersheds with higher drainage density in the stream network as the volumetric contributions from lower-order streams increase at larger scales . Therefore, the difference between the S-1 and S-2 can widen for other watersheds if the model domain has a higher drainage density with more low-order streams.
These results suggest that for certain spatial extents, physiographic characteristics, and climatological conditions, a simplistic model structure might yield a similar performance compared to a complex integrated model. This process might still require an extensive calibration procedure, but in these cases, it may be feasible to achieve a comparable performance without the application of a complex model structure. However, the exact conditions where such an approach would be applicable can only be identified by comparing the two model structures. For example, the flows generated from a rainfall-runoff model with empirical routing can be applied as boundary conditions for ungaged streams when a simplistic modeling approach is incorporated, but the overall performance can still be affected by the uncertainty in the flow estimation and change in antecedent conditions, which can reduce the performance of simplistic models . Regardless, such an analysis may prove beneficial for longer simulations or in situations where integrated modeling is not feasible due to data and computational limitations.
Additionally, there may also be conditions where several serially combined simplified models can be coupled together, thereby, reducing the need to include more complex physical processes. The application of a serially linked approach has been incorporated for hydraulic modeling using models such as HEC-RAS for producing flood risk maps. This approach assumes that an impervious surface system can be adopted if the spatial extent of the model is small, and the hydrologic event is significantly large. These models are individually developed and calibrated for smaller areas, and the results are linked together. The exclusion of other physical processes is compensated by assuming that the streamflow, stage, and extent of inundation are solely functions of roughness distribution and topography. Studies have shown that this approach does not always work well for events outside the calibration range and for events with significantly different antecedent conditions which causes erroneous prediction (Kidmose et al., 2015;Saksena et al., 2019). Further, including a piece-wise linear combination of smaller-scale models would require calibrating each individual model with observational data (e.g., streamflow). This information may be available for certain streams with a finer spatial distribution of observational gages, but the approach may not be applicable for simulating large ungaged streams.
Also, serially linking simplified models ignores the variability in GW movement (e.g., WT distribution) which may affect the surface system beyond the spatial extents of smaller-scale simplified models. Finally, the application of physically complex large-scale models is increasingly becoming essential since water-shed-scale water management decisions often rely on a thorough understanding of how one hydrologic system affects the other. Regardless, in situations where sufficient information is available for optimally calibrating smaller-scale models, linking several simpler models may be feasible and desirable over developing a complex integrated model at a large scale.

Effect of Changing the Relative Intrinsic Scale of SW-GW Submodels
The results suggest that a relatively coarser intrinsic scale (or resolution) of the GW submodel compared to the SW submodel works better in this hydrogeological and climatological setting. It should be noted that an equivalent cell size for GW model for S-4 (∼430 m) and L-4 (∼398 m) maybe misleading for the GW system (Table 4) being simulated here, even though it is coarser relative to the SW submodel. This is because this study incorporates a six-point cyclic finite element approximation for simulating GW flow where the distance between two GW nodes is equal to half the triangular edge length (centroid to triangle vertex). Therefore, the intrinsic scale (Figure 2d), or the distance between two GW nodes with distinct hydraulic heads, is half, which approximates to 215 and 199 m for S-4 and L-4, respectively. Further, even with an overall larger intrinsic spatial scale, the GW submodel for S-4 and L-4 is derived from a flexible mesh which has an average resolution of 75 m in the river channels and 300 m in the floodplain. Therefore, the intrinsic spatial scale between two GW nodes (from the six-point cyclic finite element method) in the river and floodplain is approximately 37.5 and 150 m, respectively.
Considering that the surficial aquifer in this system has a significant percentage of glacial till that has a relatively higher porosity and a high volume of stored water, the mesh structure can still be considered "fine" for the river and its floodplain even though this resolution is coarser relative to the SW system. The SW submodels for all configurations have an average resolution of 15 and 60 m in the river and its floodplain, respectively, and the intrinsic length scale for the St. Venant formulations is equal to this value. Therefore, the equivalent cell size for SW (∼115 m) and GW (∼398 m) for L-4 may seem to be large, but the calculations are carried out at 15 m (SW) versus 37.5 m (GW) for the river and 60 m (SW) versus 150 m (GW) for the floodplain. While the GW mesh is still coarser with respect to the SW mesh, the relative magnitude of their intrinsic spatial scales is not substantially large.
Second, having a very fine resolution GW mesh in the river and floodplain for the ICPR model results in more triangular nodes in the GW system (in the case of L-2) which further results in more seepage faces into the surface system, and a larger volume of water is delivered from the GW system to the SW system in the river-floodplain region since the available water storage in the GW system is much larger compared to the SW system for this aquifer system. This results in increased baseflow during low flow conditions for L-2 which can be seen in Figure 11. For a significantly saturated system, this increased baseflow (Figure 11d) causes the hydrographs to overpredict the streamflow for L-2 (Figure 11a). With a relatively coarser GW mesh for L-4, the number of seepage fronts from the GW into the river-floodplain is reduced, the resultant baseflow prediction is also reasonable compared to observed baseflow (Figure 11d), and the overall hydrograph compares well with observed streamflow (Figure 11b). This is the main reason why a relatively coarser GW mesh compared to the SW mesh results in more accurate performance. This result may only be applicable for similar glacial aquifer systems (e.g., the mid-western USA) with a longer and consistent wet season, milder slopes, and a higher WT distribution. For regions with an arid climate, steeper slopes (e.g., mountainous regions), and a moderate to lower WT distribution, a GW mesh resolution like the SW mesh resolution may be more appropriate. Therefore, this study concludes that it is important to incorporate the knowledge on the characteristic spatiotemporal scales for different hydrologic processes when modeling regions with varying climatological and physiographic characteristics. In the absence of such information, a spatial scale analysis like this study can help in delineating the optimal model structure for producing consistently accurate hydrologic simulations at large spatial scales.

Summary and Conclusions
The physical processes governing the watershed response to storm forcing operate at different spatial and temporal scales. Some factors such as watershed topography and channel bed-form can influence hydrologic responses locally, as well as at large scales. The overall influence of other processes changes with wa-tershed scale, for example, rainfall, recharge, and seepage fluxes, may not influence watershed responses at small scales, but play an important role at large scales. Therefore, the assumptions related to characterizing these processes in models are only applicable over certain thresholds. The overall results from this study suggest that there may not be a one size fits all solution for spatial extent scaling of integrated models. However, a better representation of intrinsic scales of the physical processes and submodels can provide a more realistic reproduction of the basin response as an alternative to extensive calibration. The following conclusions are drawn from this study: 1. At small scales, the role of rainfall and subsurface processes in influencing riverine hydrodynamics is diminished as boundary inflows dominate. For events with very low exceedance probabilities, flooding becomes streamflow dominated at small scales, and the influence of hydrologic processes is reduced. 2. The influence of hydrologic processes increases at large scales as excess rainfall and lateral seepage also influence riverine flow, implying an increased significance of headwater and lower-order streams. The findings from this study support the argument that GW recharge operates at larger spatial scales, when compared to overland region and vadose zone. 3. Considering that the typical characteristic length scales for shallow GW flows may be different (coarser or finer) than overland or channel flow, the intrinsic scale (or resolution) of the GW submodel should be modified with respect to the SW submodel. Application of a coarser GW submodel with respect to the SW submodel improves the prediction of rising and falling limb of the hydrograph, lag-time, and hydrograph duration for the UWR and WHR watersheds. 4. By changing the intrinsic scale of the GW submodel with respect to the SW submodel, the streamflow prediction improves not only at the outlet but also across several locations inside the basin. The degree of improvement in streamflow prediction changes depending on the location of the reach in the watershed. 5. In addition to better streamflow prediction, the baseflow and subsequent direct runoff prediction are also improved considerably once an optimal model structure is incorporated. 6. The distributed maximum flood depth maps suggest a significant change in overall inundation extents (and depths) on changing the intrinsic scale of GW submodels. These maps also show evidence for a more consistent prediction across small and large scales after finding the optimal model structure. 7. The difference in the median WT depth between the two models with different spatial extents reduces when the intrinsic scales of the submodels are incorporated accurately. 8. By evaluating the effect of spatial scale variability on physical process characterization, an optimal modeling structure can be determined for any watershed. For example, even with an urban land use, accurate water balance and streamflow are achieved for the WHR basin by incorporating optimal intrinsic scales of the submodels. Therefore, it is important to incorporate the knowledge on the characteristic spatiotemporal scales for different hydrologic processes when modeling regions with varying climatological and physiographic characteristics.
The analysis approach used in this study can be used toward improving the performance of integrated models. If the model structure across different watersheds is varied such that the conservation of mass holds and the characteristic length scales of different physical processes are captured, the prediction can be improved for any system. Other important findings from this study include disseminating the specific contributions from different processes and parameters. For example, roughness distribution affects the lag-time and surface storage, while GW flow controls the thickness of hydrograph limb, peak streamflow, and water balance of the watershed. Similarly, future studies can include the effect of evapotranspiration in addition to subsurface processes and simulate the water balance for an entire watershed.
Future studies should also incorporate the temporal-scale variability of rainfall for capturing the influence of rainfall at large watershed scales. In conclusion, the hydrologic and geomorphic characteristics may or may not change across different regions and spatial scales, but the methods and schemes needed to characterize these processes need to be addressed in integrated models. Such an analysis is especially vital with the increased focus on developing stream-network-based large riverine models at regional or continental scales.

Data Availability Statement
Data used in this paper are open and freely accessible through the links provided in Table 1 and are also available as a HydroShare resource (https://www.hydroshare.org/resource/9204810f4148421f9c7f19151a36e4d8/).