Biogeographic classification of streams using fish community– and trait–environment relationships

Present a hybrid biogeographic and environmentally constrained clustering approach to classify ~853,000 stream reaches in the eastern United States. Examine the frequency of stream typologies in the landscape relative to anthropogenic stressors to identify potential stream conservation needs.


| INTRODUC TI ON
Streams and rivers display extraordinary complexity in their physicochemical characteristics. This complexity varies from reach to reach and arises from diverse geological, topographic, and land cover characteristics that emerge locally within the reach or remotely from the upstream catchment (Benda et al., 2004;Ward, Tockner, Arscott, & Claret, 2002). Such physicochemical diversity translates to equally diverse lotic biotas occupying stream ecosystems. Despite freshwater ecosystems being among the most biodiverse ecosystems in the world, they are also the most imperiled, consequently from societal water demands and other adverse anthropogenic activities (Dudgeon et al., 2006). As such, water resource managers and biodiversity conservationists are challenged with the need to adequately define and map spatial variation in the physicochemical and biological features of stream ecosystems. Such efforts are specifically necessary for identifying stream types that will respond to anthropogenic stressors, stratifying sample locations for monitoring programmes and prioritizing conservation or habitat preservation projects (Hawkins et al., 2000;Hayes et al., 2003;Pressey et al., 2000;Roux et al., 2008).
Stream classification schemes offer a means to simplify and generalize the ecological complexity of streams for the purpose of water resources management and biodiversity conservation.
Approaches to stream classification are diverse and include schemes seeking to partition streams into homogenous groups (i.e., classes) based on geographic proximity (Omernik, 1995) or similarity in abiotic (Hawkins et al., 1993) or biotic characteristics (Barbour et al., 1996;Herlihy, Hughes, & Sifneos, 2006). For example, ecoregion approaches aggregate adjacent drainages based on faunal or floral differences stemming from region-specific habitat differences or biogeographic histories (Abell et al., 2008). Alternatively, abiotic habitat approaches seek to characterize spatial variation in physicochemical conditions, particularly channel geomorphology (Rosgen, 1994) and hydrology (Poff & Ward, 1989). This habitat template approach implicitly incorporates species-environment relationships by using abiotic variables known (or hypothesized) to affect the distribution of organisms or the rates of ecological processes (Townsend & Hildrew, 1994).
If the goal of stream classification is to capture abiotic factors that are causally linked to biota and ecological processes, schemes should explicitly incorporate species-environment relationships.
Several recently developed statistical techniques have facilitated the application of such species-environment classification schemes (De'Ath & Fabricius, 2000;De'Ath 2002, Ferrier, Drielsma, Manion, & Watson, 2002. For example, Brenden, Wang, and Seelbach (2008) used multivariate regression trees to quantify the relationship between fish community composition and reach-scale environmental conditions, specifically summer temperature, in streams of Michigan, USA. Similarly, Leathwick et al., (2011) used generalized dissimilarity modelling to classify streams based on the response of fish and macroinvertebrate communities to reach-scale physicochemical conditions in New Zealand streams. Both of these classifications were developed at modest spatial extents (i.e., 250,000 and 268,000 km 2 , respectively), which tends to constrain habitat and taxonomic diversity. For broader-scale classification (i.e., >1,000,000 km 2 ) encompassing numerous drainage basins, it becomes necessary to consider not only reach-scale physicochemical factors, but also regional variation in climate and historical biogeographic boundaries (Abell et al., 2008;Olden & Kennard, 2010;Tonn, Magnuson, Rask, & Toivonen, 1990). Another advantage of these species-environment approaches to stream classification is that models can be developed using discrete sampling sites and, assuming environmental predictors are GIS-derived (e.g., National Hydrography Data set; USEPA, 2005), extrapolated to all confluence-to-confluence stream reaches in the study area (Brenden et al., 2008;Leathwick et al., 2011). This provides spatially contiguous maps which can be used to estimate the prevalence of stream classes and identify anthropogenic land use threats.
Using multivariate species-environmental relationships provides a novel approach to simultaneously understand the importance of environmental variables in explaining biogeographic patterns while also directly harnessing the variation of these relationships in clustering. The application of trait-based approaches can further improve the mechanistic underpinnings of species-environment relationships by offering a means to elucidate causal links between abiotic conditions and organismal vital rates (McGill et al., 2007;Frimpong & Angermeier, 2010). Moreover, comparing trait-environment relationships with species-environment relationships can shed light on the relative roles of environmental filtering and historical biogeography structuring biological communities (Hoeinghaus, Winemiller, & Birnbaum, 2007). Although these advantages of traitbased approaches have seen increased application in basic ecology and freshwater ecology in recent years, the practice of incorporating trait-based ecology in stream classifications has seen limited attention (but see Tolonen, Leinonen, Marttila, Erkinaro, & Heino, 2017).
The overall aim of this study was to explore stream classification schemes for the eastern United States. Such tools that facilitate uniquely inform management decisions regarding anthropogenic threats to the biodiverse freshwaters of the eastern United States.

K E Y W O R D S
conservation prioritization, fish communities, GIS mapping, multivariate regression trees, stream classification, trait-based ecology the understanding and management of freshwater biodiversity are greatly needed for this region as it harbours the richest temperate freshwater fauna in the world, particularly fishes, mussels and crayfishes (Abell et al., 2008;Haag, 2012;Taylor et al., 1996). This conservation concern is further exacerbated by the rapid increase in the human population in the eastern United States (Radeloff et al., 2012) and its concomitant stress on stream habitat and water resources (Richter, Mathews, Harrison, & Wigington, 2003

| Fish community data
We used previously compiled fish community data from federally administered stream fish surveys (Troia & McManamay, 2017). Similar sources of fish distributional data have been collated previously and used as a comprehensive community data set in national-scale studies of stream fish biogeography, ecology and conservation (Giam & Olden, 2016;Herlihy et al., 2006;Mitchell & Knouft, 2009). For our study, we compiled records within National Hydrology Dataset (NHD) hydrologic regions 01 through 06 (Figure 1a). We used ArcGIS (version 10.3, ESRI Inc.) to join the Cartesian coordinates of each site to NHD version 1 flowlines (i.e., confluence-to-confluence stream reaches; hereafter 'NHD reaches') and their associated unique identifier (i.e., COMID) to link GIS-derived environmental data layers to fish community data.
Because we were interested in describing classes based on native fish communities responding to natural environmental conditions, we retained only survey sites that were minimally impacted by anthropogenic activities. To accomplish this, we removed sites with 'high' or 'very high' cumulative disturbance indices according to the National Fish Habitat Action Plan (NFHAP; Esselman et al., 2011). We also removed species from each site that were not historically native to the NHD sub-basin (i.e., HUC8) in which that site was located. Lists of historically native species were acquired from the NatureServe Explorer data set (NatureServe, 2010). This procedure removed species that are not native to the study region (e.g., species of Eurasian origin) as well as native invaders (sensu Scott & Helfman, 2001) that have increased in prevalence due to habitat alteration and sportfish stocking.

| Trait data
We acquired existing or developed new trait data describing reproductive life-history, body morphology, trophic guild membership and temperature guild membership of native species represented in the community surveys (Table 1). These traits exemplify important axes of functional variation among North American freshwater fish species and correlate with environmental gradients associated with broad-scale climate and physiography as well as reach-scale environmental conditions associated with stream network position and catchment characteristics (Frimpong & Angermeier, 2010).
Life-history and trophic guild data were acquired from the FishTraits database (Frimpong & Angermeier, 2009 TA B L E 1 Data sources for 13 traits used in the function-based stream classification dummy (i.e., binary) variable for each of the four trophic guilds. For life-history traits, we used the trilateral life-history framework of Winemiller and Rose (1992). This framework describes trade-offs in age at maturity, fecundity and parental care resulting in three strategic endpoints: opportunistic, periodic and equilibrium. We used a soft classification scheme whereby species were plotted in trivariate life-history space and Euclidean distances between the species and each of the three strategic endpoints were calculated (Mims & Olden, 2012). This method produced three continuous variables representing a species' affinity for each of the three life-history strategies. Temperature guild data were acquired from Lyons et al., (2009). We classified species as warm-water, cool-water or cold-water using a separate dummy (i.e.,binary) variable for each of the three temperature guilds. Morphological traits were developed from lateral-view photographs of fish specimens that we digitized from field guides (Table S1). We used geometric morphometric analysis to characterize variation in body shape and fin origins (Franssen, Goodchild, & Shepard, 2015). We placed sixteen homologous landmarks ( Figure 2a) on each specimen using tpsDig software (Rohlf, 2004a). Next, we performed a generalized Procrustes analysis to resize, translate and rotate landmarks followed by a relative warp analysis using tpsRelw software (Rohlf, 2004b). This method produced relative warp (i.e., ordination axis) scores for each species, the first three of which were retained as Analyses of trait-environmental relationships can violate the statistical assumption of independence because trait values among species are not independent of one another owing to shared evolutionary history (Felsenstein, 1985). Controlling for phylogeny ensures spatial variation in trait composition is attributable to environmental variation and not confounded by common lineage of species pools. To control for phylogeny, we used generalized linear models to predict each trait from family number based on Gaussian distributions (i.e., for continuous traits) or Poisson distributions (i.e., for binary traits) following McManamay and Frimpong (2015). For each trait model, deviance residuals were extracted for each species and used in subsequent trait analyses.

| Biogeographic and environmental data
We acquired existing or developed novel GIS data layers describing biogeographic realms, stream discharge, channel gradient, summer temperature and hydrology of the ~853,000 NHD reaches located in the eastern United States (Table 2). We selected these environmental factors because they influence taxonomic and functional composition of stream fishes (Lamouroux et al., 2002;Lyons et al., 2009;Olden & Kennard, 2010;Xenopoulos & Lodge, 2006). Moreover, each variable is available at the segment resolution for the eastern United States either directly from the NHD or can be accurately modelled using natural landscape characteristics available from the NHD. Although many anthropogenic factors (e.g., catchment land use and hydrologic alteration) influence stream fish communities, we selected only natural environmental factors to correspond with the native community data with the goal of characterizing natural community-environment relationships and developing natural classes.
Biogeographic realms for obligate freshwater organisms such as fish are defined by historical drainage and glaciation patterns that limited dispersal across drainage divides. We used the freshwater ecoregional map produced by Abell et al. (2008) as a categorical predictor variable with 12 discrete biogeographic realms spanning the eastern US study area (Table 2, Figure 1c). Stream discharge is a measure of habitat size (Xenopoulos & Lodge, 2006) and correlates with the origin and diversity of food resources available to fishes (Vannote, Minshall, Cummins, Sedell, & Cushing, 1980). We used mean annual discharge as a measure of stream network position, which was modelled previously using the Unit Runoff Method and is available for all NHD reaches (NHDPlus V1, 2010) ( Figure 1d).
Channel gradient correlates with channel hydraulics, bedform and substrate size (Montgomery & Buffington, 1997) and is known to affect the community composition of fishes (Lamouroux et al., 2002).
We acquired channel gradient for all NHD reaches, which was an existing attribute of the NHD (Figure 1e). Mean summer temperature is a key physicochemical variable directly linked to fish community composition along gradients of stream size, elevation and latitude (Lyons et al., 2009;Rahel & Hubert, 1991). Mean summer (July-August) stream temperature from >800 monitoring sites were previ-  (Table S2).
We used Random Forests to model principal components using 166  (Table   S3). This algorithm is useful for characterizing non-linear relationships between predictor variables and the response variable as well as modelling high order and complex interactions among predictor variables (Cutler et al., 2007). Because predictor variables were GIS-derived and available for each of the ~853,000 NHD reaches in the eastern US, the Random Forests model was used to extrapolate component scores to all NHD reaches (Figure 1g-i).

| Stream classification analyses
We used multivariate regression trees (MRT) to develop discrete classes based on community-environment relationships (De'Ath, 2002). MRTs use a recursive partitioning procedure to sequentially split observations (e.g., fish communities) into dichotomous groups with the goal of minimizing within-group variation while simultaneously identifying environmental variables that distinguish each group from one another. Separate MRTs were developed based on taxonomic composition and functional composition of stream fishes.
For the taxonomic MRT, we retained only those species occurring in ≥5% of the sites to maximize detection of predominant communityenvironment relationships (Poos & Jackson, 2012). As a secondary analysis, we included all species to ensure models accounted for habitats supporting highly rare or endemic species. Site-by-species matrices were Hellinger-transformed prior to implementation of the MRT analysis (Borcard, Gillet, & Legendre, 2011). For the functional MRT, we calculated abundance-weighted means for each of the 13 traits. This calculation produced weighted means for continuous traits (reproductive life history and morphology) and the proportion of individuals within a community belonging to each binary trait (trophic and temperature guild). This site-by-trait matrix was standardized from 0 to 1 prior to implementation of MRT analysis so traits with different value ranges were equally weighted. Because temperature guild membership is based on field observations of thermal habitat use (Lyons et al., 2009) and therefore could inflate model performance and explanatory capability of temperature, we also fit function-based MRTs without temperature guild membership.
We developed MRTs with two objectives in mind. First, we developed unpruned trees using all seven environmental predictors and each predictor variable on its own. Comparing explanatory capability (e.g., R 2 ) among these eight models based on taxonomic and functional community matrices allowed us to evaluate the relative community-structuring roles played by each of the environmental predictors. Second, we grew taxonomy-based and function-based trees using all seven environmental predictors and then pruned them back using the minimum cross-validation relative error rule (Borcard et al., 2011). This set of trees provided an optimal number of terminal nodes (i.e., stream classes) that maximized explanatory capability while maintaining the simplicity and interpretability of the classification scheme.

| Environmental layers and fish data
The A total of 371 native fish species were documented at 111 sites. Of these species, 63 occurred at <5% (55) of the sites and were considered rare species. Taxonomic MRTs were constructed that excluded and included these rare species.

| Unpruned MRTs and explanatory capability of environmental variables
The variation explained by each MRT increased with the size of the  Figure S1). In the functional MRT, however, removing temperature guild membership did not change the relative explanatory capability of environmental or ecoregional predictors ( Figure S2).

| Pruned trees and stream class mapping
The optimal taxonomic MRT for non-rare species explained 26.4% of the variation in community composition and retained 13 terminal nodes (i.e., leaves), which we interpret as ecologically informative classes hereafter referred to as T1 through T13 (Figure 5a). When considering all species, variance explained by the optimal taxonomic F I G U R E 3 Pairwise correlations among seven predictor variables used for constrained clustering (i.e., multivariate regression trees). Grey symbols represent a random 10% sample of all NHD stream segments (n = 85,000) and black symbols represent the subset of NHD segments containing a fish community sample (n = 956) MRT reduced to 22.9%, although tree structure (including important variables) was similar to that for non-rare species ( Figure S3). From here on, we describe only the results of the non-rare taxonomic MRT. Of the twelve internal nodes representing environmental/ biogeographic divisions, three involved ecoregional divides, three involved summer temperature thresholds, three involved discharge thresholds, two involved gradient thresholds and one involved a hydrology threshold. The first division involved an ecoregional divide separating Atlantic coast drainages (T1 through T6) from the Mississippi River and Great Lakes drainages (T7 through T13).
Within the Atlantic Coast classes, summer temperature thresholds distinguished cold-and cool-water streams (T1 through T3) from warm-water streams (T4 through T6). Cold-and cool-water streams were further divided into high-gradient cold-water streams (T1), high-gradient cool-water streams (T2) or low-gradient cold-water or cool-water streams (T3). Warm-water streams on the Atlantic  The optimal functional MRT explained 25.2% of variation in community composition and retained six terminal nodes hereafter referred to as F1 through F6 (Figure 5b). The optimal functional MRT without temperature guild membership included in the trait matrix explained 26.1% of the variation and also retained six terminal nodes ( Figure S4). Of the five internal nodes representing environmental/biogeographic divisions, three involved ecoregional divides and two involved summer temperature thresholds.

| Prevalence and anthropogenic degradation of stream classes
The 13 taxonomy-based classes varied substantially in prevalence, which we characterized as the number of stream kilometres per class distributed among the 852,543 NHD reaches (Figure 7l). The T12 class was most prevalent (269,326 stream kilometres), followed by the T4 (158,427 km) and T5 (152,614 km) classes. Classes representing larger streams (i.e., mean annual discharges greater than 37 cubic feet per second) were the least prevalent and included the T8 (5,743 km), T10 (29,613 km) and T13 (40,966 km) classes. It is important to consider that although these classes are represented by relatively few stream kilometres, their large discharge provides substantially more habitat area per stream kilometre than do the smaller streams represented by more prevalent classes (e.g., T11 and T12). Stream kilometres was more evenly distributed among function-based classes compared to the taxonomy-based classes ( Figure 8f). The F6 (282,775 km) and F1 (269,498 km) classes were most prevalent, whereas the F2 (187,488 km) and F3 (187,247 km) classes were least prevalent.
The degree to which stream classes have been anthropogenically degraded varied substantially among classes (Table 4). In general, those classes in regions with high agricultural land use (i.e., Ohio, Indiana, Mississippi delta) or high human population densities (i.e., northeast) were the most degraded. Among taxonomy-based classes, the T12 class exhibited the greatest degradation; 45% of stream kilometres were classified by NFHAP as high or very high disturbance. At least 30% of the T2, T5 and T13 classes were classified as high or very high disturbance. Among the least disturbed taxonomy-based classes were the T1 and T8 classes. Function-based classes exhibited similar and moderate anthropogenic degradation, with all six classes ranging from 21% to 36% of stream kilometres classified by NFHAP as high or very high disturbance. It is important to note that reaches of long rivers inundated by reservoirs (e.g., the entire Tennessee River) are not scored by NFHAP. Given that these reaches are impounded, it is most appropriate to view them as degraded. As such, the degradation reported for the T3, T4 and T12 classes in the 'High' and 'Very high' columns of Table 4 likely underestimate the degree of true degradation owing to large percentages in the 'Unscored' column.

| D ISCUSS I ON
Our overall aim was to use species-and trait-environment relationships of fish communities to classify streams across the physiographically and biologically diverse eastern United States.
Developing classifications for large regions can be challenging, given that community composition is a product of historical biogeography and numerous contemporary environmental conditions that vary at multiple spatial scales (Diamond, 1975;Tonn et al., 1990;Weeks, Claramunt, & Cracraft, 2016). By comparing environmental and biogeographic associations of taxonomic versus functional composition, we were able to estimate the relative importance of these environmental factors in structuring different aspects of fish communities. Moreover, classification schemes mapped in a spatially contiguous manner (e.g., all confluence-to-confluence stream reaches) can inform water resources management and biodiversity conservation using GAP analyses (Turak & Linke, 2011). By integrating fish communities at discrete sites with the spatially contiguous TA B L E 3 Top three statistically significant (p < .05) indicator species for 13 taxonomy-based stream classes environmental layers originating from the frequently used National Hydrography Dataset, we were able to quantify the spatial distribution and prevalence of fish-based stream classes as well as estimate anthropogenic degradation of these classes.

| Natural drivers of assemblage composition
Species-and trait-environment relationships revealed by our environmentally constrained classification schemes were generally congruent with the findings of previous studies on stream fishes in North America and on other continents. Our findings are congruent with temperature thresholds identified at narrower spatial extents within the eastern United States (Brenden et al., 2008;Lyons et al., 2009  ing not only trait means but also trait variances (Cornwell, Schwilk, & Ackerly, 2006 & Ward, 1989).
Studies that compare community-environment relationships based on taxonomic and functional composition can shed light on the relative roles of contemporary environmental filtering versus historical dispersal and speciation (Hoeinghaus et al., 2007; but see Heino, 2008). Our taxonomy-and trait-based classification of fish communities in the eastern United States supported this assertion. That environmental factors were more effective predictors of functional composition than taxonomic composition suggests that aspects of taxonomic variation among communities is not a consequence of contemporary environmental filters and therefore is related to non-environmental factors (e.g., historical dispersal and speciation). The importance of ecoregional boundaries in explaining taxonomic composition of fishes is not surprising, given the complex biogeographic history of the fish fauna in eastern North America (Mayden, 1992). Including rare species in the taxonomy-based classification scheme decreased the explanatory power of MRTs. On one hand, the inclusion of rare species could be hypothesized to increase the explanatory power of ecoregional boundaries because most rare species in our data set are narrow-ranging endemics that are functionally similar to broad-ranging congeners (Etnier & Starnes, 1993).
However, including rare species also induces zero inflation in models, potentially impacting model stability, and induces model bias, as MRTs might select for environmental variables serving as mere artefacts of systems where rare species are found, such as remnant patches (e.g., small streams) of former distributions, as opposed to true environmental gradients associated with species occurrences.
To summarize, these findings demonstrate the need for stream classification systems based on species-environment relationships to include contemporary environmental and historical biogeographic factors, particularly for those schemes developed at broad spatial scales and/or in regions with complex biogeographic histories.

| Conservation and management implications
Stream classifications are essential for decision-making in water resources management and biodiversity conservation. Some have suggested that conserving habitat typologies, that is surrogates for ecological process supporting biodiversity, are a superior approach to climate change adaption, as creating reserves representing the diversity of these types will simultaneously preserve processes that maintain species diversity (Beier & Brost, 2010 (Comte, Buisson, Daufresne, & Grenouillet, 2013) and eastern cold-water species specifically (Meisner, 1990). The immediacy of this threat depends on the velocity of climate change which has been evaluated in the western United States (Isaak et al., 2016), but not yet in the eastern United States. Another conservation concern is large river habitats, which are least prevalent in terms of cumulative length and are also highly degraded owing to flow regulation for flood control, commercial navigation and hydroelectric generation (Vörösmarty et al., 2010). Great rivers of North America (e.g., Mississippi, Missouri, and Ohio Rivers) and their large tributaries are important reservoirs of fish diversity and require conservation action (Pracheil, McIntyre, & Lyons, 2013).
Management and conservation decisions should not only depend on fishes, but on all stream-dwelling taxa including macroinvertebrates, crayfishes and mussels. This is particularly important in the eastern United States where these invertebrate taxa exhibit high regional richness and local endemism (Stein, Kutner, & Adams, 2000).
It is unclear whether our fish-based classifications would accurately represent spatial variation in other stream-dwelling fauna and flora.
Some previous studies suggest that fishes and invertebrate taxa respond similarly to natural environmental conditions and/or anthropogenic disturbances (Johnson & Hering, 2010), whereas others indicate differing responses (Pilière et al., 2014;Troia, Williams, Williams, & Ford, 2015). Presently, compilation of occurrence data for non-fish taxa lags behind that of fishes in the eastern United States (Troia & McManamay, 2016); however, the growth of openaccess biodiversity data will soon allow the completion of classification schemes for the eastern United States that incorporate these other important taxa. A final caveat is that many freshwater taxa occupy lentic habitats whereas our classification schemes only correspond to lotic habitats. Although this is a common limitation of many freshwater classification schemes (Melles, Jones, & Schmidt, 2012), it is important to acknowledge that numerous species and functional components of the eastern North American freshwater fish fauna may not be adequately represented. This is particularly important in the southeastern United States where many unique fishes occupy wetland habitats (e.g., species of the family Elassomatidae) and the upper Midwest where glacial lakes support unique species (e.g., lake trout [Salvelinus namaycush] and species of the family Coregonidae).

| Broader implications and conclusions
Our study highlights at least three important considerations when developing habitat classifications for streams or, more generally, for other aquatic or terrestrial habitats. First, we have provided a straightforward, yet powerful, technique to directly (as opposed to indirectly) incorporate the relative importance of environmental variables in organizing ecological communities into stream classification. Secondly, our work emphasizes that spatial variation in biodiversity is a product of environmental conditions and historical factors that vary at a range of spatial scales (Diamond, 1975;Weeks et al., 2016). For example, our classification revealed the importance of reach-scale channel gradient but also region-wide biogeographic boundaries in distinguishing biological communities. Anthropogenic alterations to ecosystems increasingly present continental-and even global-scale challenges (Hulme, 2005;Vörösmarty et al., 2010).
As such, classification schemes should continue to include appropriately scaled environmental predictors as we have shown with eastern US stream fish assemblages. Finally, trait-based approaches can improve confidence in the causative linkages between biotas and environmental conditions, be those natural or anthropogenic TA B L E 4 Percent of stream kilometres per taxonomy-based class (T1 to T13) and function-based class (F1 to F6) distributed among five levels of anthropogenic habitat degradation