Environmental drivers of benthic fish assemblages and fish‐habitat associations in offshore areas of a very large river

Fish-habitat associations are poorly known in offshore areas of very large rivers. We examined physical habitat structure and its effect on habitat use and assemblage formation of benthic fishes in the main channel of the Danube River, Hungary. Principal component analysis (PCA) of physical variables showed that sample unit-level (i.e., 500 m long reaches) and cross-channel transect-level data of corresponding reaches were highly correlated. We found clear gradients in physical variables from areas with high velocity and a higher proportion of hard substratum (pebble and cobble) to areas with low velocity, high mean depth and finer substratum (mainly sand). Variation in velocity was coupled with variation in both mean depth and substratum composition (i.e., Shannon diversity of sediment composition) and with a higher proportion of silt. Differences in physical habitat structure (flow, substrate) were also evident among river segments. Classification and regression tree (CART) analyses and fish abundance–occupancy patterns in the PCA template revealed that many species showed clear responses to environmental heterogeneity (barbel, Barbus barbus; schraetser, Gymnocephalus schraetser; Danube streber, Zingel streber; whitefinned gudgeon, Romanogobio vladykovi; round goby, Neogobius melanostomus), while others (white bream, Blicca bjoerkna) showed very elusive habitat use patterns. Multivariate regression tree analysis confirmed the results of CART and indicated that transect-level substratum composition was the most important determinant in the formation of benthic assemblages. These results on habitat use can contribute to more effective conservation management of offshore fish assemblages, which is important given increasing inland navigation on the Danube River.

(Allan & Flecker, 1993). However, sampling even basic habitat and assemblage data can be problematic in remote, hard-to-access habitats and/or where extreme habitat conditions preclude effective sampling, such as deep-water oceanic habitats or high-mountain canyons. In river ecology, the offshore areas of very large (or great) rivers have long been considered difficult to sample effectively, due mostly to depth and flow conditions (e.g., Dettmers, Gutreuter, Wahl, & Soluk, 2001; Loisl, Singer, & Keckeis, 2014).
Not surprisingly, most habitat assessment studies on fish assemblages are restricted to shoreline analyses or to smaller streams and rivers (see e.g., Boys & Thoms, 2006; Erős, Tóth, Sevcsik, & Schmera, 2008; Keckeis et al., 2013). Offshore areas have proved to be important feeding, breeding and wintering habitats for many large-river fishes (Galat & Zweimüller, 2001; Wolter & Bischoff, 2001). Even so, knowledge of specific fish-habitat associations and of the assembly of species remains limited in a variety of biogeographic regions (e.g., Cao, Parker, Edison, & Epifanio, 2019; Dettmers et al., 2001; Ridenour, Starostka, Doyle, & Hill, 2009), and most studies address only coarse spatial scales.
Fish-habitat associations of offshore areas may also be difficult to model effectively. For example, environmental gradients in substrate composition and water velocity are relatively short in the potamon. This is especially so compared with the littoral zone or with wadeable streams of the rhithron, where contrasting changes in physical habitat quality and fish assemblages are more common (Allan & Castillo, 2007; Erős, 2017; Erős et al., 2017). This may result in very elusive fish-habitat relationships offshore. Fish abundance may also vary greatly offshore (Wolter & Freyhof, 2004; Szalóky et al., 2014). This can be due either to the response of fish to subtle changes in physical habitat quality or to stochasticity in abundance. Advanced statistical and machine learning techniques can handle nonlinear and complex interaction effects. In these cases, they may be better suited to exploring fish-habitat relationships than traditional methods such as linear or multiple regression (Knudby, Brenning, & LeDrew, 2010; Olden, Lawler, & Poff, 2008). In this regard, regression trees have proved especially useful for modelling fish-habitat relationships (Knudby et al., 2010; Vezza, Parasiewicz, Calles, Spairani, & Comoglio, 2014).
Although these tools are still largely underutilized in large-river fish ecology, especially for offshore areas, they have already been successfully applied to modelling the fish assemblage structure of shoreline habitats (Wilkes, Maddock, Link, & Habit, 2016).
The distribution and abundance of fish may be influenced not only by the physical attributes of the sampling area but also by the surrounding habitat (Erős & Grossman, 2005; Schlosser, 1991). However, how the hierarchical structure of the habitat influences the organization of fish assemblages or fish-habitat relationships offshore is largely unknown. Therefore, in this study, we examine the benthic fish assemblages of offshore areas of the Danube River, Hungary. We characterize the physical habitat structure of offshore areas. We use multivariate regression trees to identify the scale-dependent environmental determinants of benthic fish assemblages, and classification and regression trees to examine species-specific habitat relationships.
Previous sampling using a specially designed benthic framed trawl showed that offshore areas serve as important habitats for many rare and endangered benthic species of high conservation concern in the Danube (Szalóky et al., 2014). The habitat use of these species is still poorly known. We predicted that jointly analysing focal-scale (i.e., sample unit level) physical attributes with physical attributes of the surrounding environment (i.e., transect or higher level environmental heterogeneity) would better characterize the physical heterogeneity of the relatively homogeneous offshore environment. We therefore expected this joint analysis to yield stronger fish-habitat relationships than sample-level environmental variables alone. Consequently, we were especially interested in exploring the relationships between sample unit and transect-level environmental heterogeneity and their influences on fish assemblages and on the habitat use of the assemblage-constituting species. In addition, we hypothesized that most benthic species would respond to offshore gradients in habitat structure, even though these environmental gradients are short. In this regard, we predicted that changes in substrate composition would be the most influential mechanism governing the habitat use of individual species and the benthic community.

Figure 1. Map of the study area and location of sampling sites in the Hungarian portion of the Danube River (a). Three large, separate segments are differentiated using empty (I), grey (II) and black (III) dots (b). The distribution of two exemplar transects and their corresponding sample units (500 m long) is also shown on a bathymetric map (c).
Note: (a) sample unit level, where variables were measured within the 500 m long unit; (b) transect level, where the values of the variables were calculated as the mean of the sample unit level values; (c) transect neighbourhood level, which embraced 2,000 m long segments. At this level, a systematic grid of equidistant measurements (15 × 15 m) from a bathymetric map was used to calculate mean depth and its coefficient of variation for each segment.
section lying within Budapest, the capital of Hungary), interrupted by embanked riprap shorelines in sections ~100–1,000 m long.

| Data collection
The offshore distribution of fishes was examined using a benthic framed trawl. The trawl consisted of a stainless steel frame (2 m wide × 1 m high) to which a drift net was attached (mesh sizes of 5 and 8 mm for the inner and outer mesh bag, respectively) (for details see Szalóky […]). The collected fish were identified, measured to the nearest mm standard length (70% of fish) and then released back into the river. Preliminary exploratory analyses showed that young and adult fish of the same species had basically the same occupancy patterns in the habitat template, which was characterized by a principal component analysis (see below). Therefore, to save space, we do not present size- or age-group-specific analyses in this article. For some species, however, young and adult individuals could be clearly separated based on length-frequency histograms (see Appendix I).
Physical habitat variables were quantified at three spatial scales (Table 1): the sample unit, the transect, and the transect neighbourhood, which embraced a 2,000 m long segment. For the transect neighbourhood, we used a systematic grid of equidistant measurements (15 × 15 m) from a bathymetric map of each segment.
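The three-scale scheme can be sketched in a few lines. The sketch below is illustrative only and makes several assumptions:

- Each 500 m sample unit is stored as a dict of variable values, grouped under a hypothetical transect id.
- Transect-level values are the mean of the sample-unit values.
- Neighbourhood mean depth and its coefficient of variation come from a list of bathymetric grid depths; using the sample standard deviation for the coefficient of variation is an assumption.
- The Shannon diversity of substratum composition (named in the abstract) is computed with the natural logarithm over substratum class proportions; this is also an assumption.

All function names are hypothetical.

```python
import math
from statistics import mean, stdev

def shannon_diversity(proportions):
    """Shannon diversity H' = -sum(p * ln p) over substratum classes.

    `proportions` maps a class name (e.g. "sand", "pebble") to its share of
    the bed. Shares are renormalised to sum to 1 and empty classes are
    skipped (0 * ln 0 is treated as 0). The natural log is an assumption.
    """
    total = sum(proportions.values())
    h = 0.0
    for share in proportions.values():
        if share > 0:
            p = share / total
            h -= p * math.log(p)
    return h

def transect_means(units):
    """Transect-level value of each variable = mean over its sample units.

    `units` maps a hypothetical transect id to a list of dicts, one dict
    (variable name -> value) per 500 m sample unit.
    """
    return {
        transect: {var: mean(row[var] for row in rows) for var in rows[0]}
        for transect, rows in units.items()
    }

def neighbourhood_depth_stats(grid_depths):
    """Mean depth and coefficient of variation (sample SD / mean) of the
    bathymetric grid depths falling in one 2,000 m neighbourhood."""
    m = mean(grid_depths)
    return m, stdev(grid_depths) / m

# Toy values (invented), for illustration only
units = {"T1": [{"depth": 4.0, "velocity": 1.0}, {"depth": 6.0, "velocity": 0.5}]}
print(transect_means(units))                       # {'T1': {'depth': 5.0, 'velocity': 0.75}}
print(neighbourhood_depth_stats([4.0, 5.0, 6.0]))  # (5.0, 0.2)
print(round(shannon_diversity({"sand": 50, "pebble": 50, "silt": 0}), 3))  # 0.693
```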

| Statistical analysis
We used classification and regression tree analyses (CART) to directly select the key physical variables that most strongly determine the habitat use of the species. CART is a flexible and robust classification and prediction method. It is ideally suited to modelling the non-linear interactions that often appear in ecological data (Breiman, Friedman, Stone, & Olshen, 1984; De'ath & Fabricius, 2000). Trees explain variation in a single response variable (here the CPUE of each species) by repeatedly splitting the data into more homogeneous groups using combinations of the predictor variables (here physical habitat data). Finally, we also used multivariate regression trees (MVRT), a multivariate extension of CART (De'ath, 2002), to model the response of the assemblage to the physical habitat data. We used a cross-validation procedure to find the optimum tree size and to avoid overfitting the data. Indicator species analysis was then used to find the most characteristic species of each assemblage group (Dufrêne & Legendre, 1997).
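Indicator species analysis (Dufrêne & Legendre, 1997) combines two measures for each species and group:

- Specificity: how concentrated the species' abundance is in that group.
- Fidelity: how consistently the species occurs across that group's samples.

Below is a minimal sketch of the indicator value for a single species. It assumes that abundance is the CPUE per trawl sample and that the groups are the assemblage groups defined by the MVRT. The permutation test normally used to judge significance is omitted, and the function name and toy data are hypothetical.

```python
from statistics import mean

def indicator_values(cpue, groups):
    """IndVal (0-100) of one species for each assemblage group.

    specificity A = mean CPUE in the group / sum of the group means
    fidelity    B = fraction of the group's samples where the species occurs
    IndVal      = 100 * A * B
    """
    labels = sorted(set(groups))
    members = {g: [c for c, lab in zip(cpue, groups) if lab == g] for g in labels}
    group_means = {g: mean(vals) for g, vals in members.items()}
    total = sum(group_means.values())
    result = {}
    for g, vals in members.items():
        specificity = group_means[g] / total if total > 0 else 0.0
        fidelity = sum(1 for v in vals if v > 0) / len(vals)
        result[g] = 100 * specificity * fidelity
    return result

# Toy data (invented): CPUE of one species in six trawl samples
cpue = [4, 2, 0, 0, 1, 0]
groups = ["fine", "fine", "fine", "coarse", "coarse", "coarse"]
print(indicator_values(cpue, groups))  # fine ≈ 57.1, coarse ≈ 4.8
```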

The segment-level distribution of samples in the ordination plane showed that most environmental heterogeneity occurred at the macroscale (i.e., between large river segments), especially between the lower and the two upper segments. Within-segment, mesoscale-level physical heterogeneity was relatively low (Figure 2).
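The macroscale versus mesoscale contrast is read here from the ordination plot (Figure 2). A numeric summary can complement that reading; note that this is a swapped-in technique, not the procedure reported in this study. The idea is to partition the sum of squares of ordination scores (e.g., PC1) into a between-segment part and a within-segment part. A share close to 1 means that most heterogeneity lies between segments. The function name and toy scores are hypothetical.

```python
from statistics import mean

def between_segment_share(scores, segments):
    """Share of the total sum of squares of ordination scores explained by
    differences among segment means (eta squared)."""
    grand = mean(scores)
    ss_total = sum((s - grand) ** 2 for s in scores)
    if ss_total == 0:
        raise ValueError("scores have no variation")
    ss_between = 0.0
    for seg in set(segments):
        seg_scores = [s for s, g in zip(scores, segments) if g == seg]
        ss_between += len(seg_scores) * (mean(seg_scores) - grand) ** 2
    return ss_between / ss_total

# Toy PC1 scores (invented) for samples from an upper (I) and the lower (III) segment
pc1 = [2.0, 3.0, 1.0, -2.0, -3.0, -1.0]
segment = ["I", "I", "I", "III", "III", "III"]
print(between_segment_share(pc1, segment))  # about 0.857 (= 24 / 28)
```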

| Fish-habitat associations
We collected 33 species and 9,274 specimens during the 199 trawling paths. The most dominant benthic species (Table 2) […] (Figure 3). For example, the barbel and the Danube streber preferred areas with higher velocity and dominance of pebbles and cobbles, while the schraetser and zingel were found in areas with finer substratum, lower velocity and higher mean depth. The invasive Ponto-Caspian gobies, such as the round goby, the bighead goby and the racer goby, showed relatively similar habitat affinities, which differed to some degree from those of the native species. Specifically, their habitat use could be best characterized by considering both the PC1 and PC2 axes. Along these axes, velocity and variation in velocity were the main physical determinants, followed by the proportion of rarer substratum materials, such as silt.
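A species' position on the abundance–occupancy plot can be summarized numerically by its abundance-weighted centroid in the PC1–PC2 plane. This is an illustrative summary, not the analysis reported here. The function name and toy values are hypothetical.

```python
def weighted_centroid(pc_scores, cpue):
    """Abundance-weighted mean (PC1, PC2) position of one species.

    pc_scores: list of (pc1, pc2) tuples, one per trawl sample
    cpue:      the species' CPUE in the same samples (same order)
    Returns None when the species was never caught.
    """
    total = sum(cpue)
    if total == 0:
        return None
    pc1 = sum(w * s[0] for w, s in zip(cpue, pc_scores)) / total
    pc2 = sum(w * s[1] for w, s in zip(cpue, pc_scores)) / total
    return pc1, pc2

# Toy ordination scores and catches (invented)
scores = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0)]
print(weighted_centroid(scores, [3, 1, 0]))  # (1.5, 0.5)
```

Species with nearly identical centroids may still differ in spread, so the centroid is best read alongside the plotted abundances.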
The CART analysis generally supported the results obtained by visualizing the abundance–occupancy patterns in the PCA. However, it also selected specific cut-off values both for the predictor physical variables and for the fish abundance values, which help to quantify the habitat use of the species more precisely (Figure 4). In this regard, CART also revealed that no valid models could be obtained for some species (racer goby, white bream, bighead goby, zingel), because these species lacked a clear response to the examined physical variable gradients.
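The cut-off values reported by CART come from an exhaustive search over candidate thresholds. The sketch below shows the core of that search for a single regression split:

- For every predictor, it tries each midpoint between adjacent observed values as a threshold.
- For each threshold, it pools the within-group sum of squared deviations of CPUE.
- It keeps the split with the lowest pooled value.

A full CART grows the tree recursively from such splits and then prunes it by cross-validation. The function names, the minimum group size and the toy data below are assumptions.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(predictors, response, min_size=2):
    """Single best binary split, regression-tree style.

    predictors: dict name -> list of values (one per sample)
    response:   list of CPUE values in the same sample order
    Returns (variable, threshold, pooled_sse), or None if no split is allowed.
    """
    best = None
    for name, xs in predictors.items():
        cuts = sorted(set(xs))
        for lo, hi in zip(cuts, cuts[1:]):
            threshold = (lo + hi) / 2  # midpoint between adjacent observed values
            left = [y for x, y in zip(xs, response) if x < threshold]
            right = [y for x, y in zip(xs, response) if x >= threshold]
            if len(left) < min_size or len(right) < min_size:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[2]:
                best = (name, threshold, cost)
    return best

# Toy data (invented): velocity (m/s), pebble (%) and CPUE of one species
habitat = {
    "velocity": [0.3, 0.4, 0.5, 1.0, 1.1, 1.2],
    "pebble": [10, 50, 20, 60, 30, 40],
}
cpue = [0, 1, 0, 5, 6, 5]
print(best_split(habitat, cpue))  # velocity split at 0.75, pooled SSE about 1.33
```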

| Formation of benthic fish assemblages
The MVRT analysis indicated that transect-level substratum composition was the most important determinant in the formation of benthic fish assemblages (Figure 5). Areas with a lower proportion of pebble (<37.5%), and consequently a higher proportion of finer bed material, had a benthic assemblage mainly characterized by the whitefinned gudgeon, the schraetser and the white bream. The dominance of pebble (>37.5%) was best associated with the Danube streber, the barbel and the round goby. The proportion of gravel (>4%) also contributed to the separation of gobiid species, especially the round goby, since the relative abundance of the bighead and the racer goby was very low in this assemblage. Overall, the assemblage-level analyses with MVRT corresponded well with the species-specific results of the CART analyses.
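MVRT replaces the single response of CART with a vector of species abundances. In its usual sum-of-squares form (De'ath, 2002), it chooses the split that minimizes the summed squared Euclidean distances of samples to their group centroids. Below is a minimal single-predictor sketch. The relative abundances are invented toy values, and the function names are hypothetical. Real analyses usually also transform or standardize the species data first.

```python
def group_ss(rows):
    """Sum of squared Euclidean distances of samples to their group centroid.

    rows: list of species-abundance vectors (lists of equal length).
    """
    if not rows:
        return 0.0
    centroid = [sum(col) / len(rows) for col in zip(*rows)]
    return sum(sum((v - c) ** 2 for v, c in zip(row, centroid)) for row in rows)

def best_multivariate_split(x, rows, min_size=2):
    """Best threshold on one predictor for a multivariate regression tree split.

    x:    predictor values (e.g. transect-level pebble %), one per sample
    rows: species-abundance vectors in the same sample order
    Returns (threshold, pooled_ss), or None if no split is allowed.
    """
    best = None
    cuts = sorted(set(x))
    for lo, hi in zip(cuts, cuts[1:]):
        threshold = (lo + hi) / 2
        left = [r for v, r in zip(x, rows) if v < threshold]
        right = [r for v, r in zip(x, rows) if v >= threshold]
        if len(left) < min_size or len(right) < min_size:
            continue
        cost = group_ss(left) + group_ss(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best

# Toy data (invented): pebble % and relative abundance of
# (whitefinned gudgeon, schraetser, barbel) in six samples
pebble = [10, 20, 30, 50, 60, 70]
assemblages = [
    [0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.6, 0.4, 0.0],
    [0.1, 0.1, 0.8], [0.0, 0.2, 0.8], [0.1, 0.1, 0.8],
]
print(best_multivariate_split(pebble, assemblages))  # threshold 40.0
```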

| DISCUSSION
To our knowledge, this is the first study that directly models the relationships between physical habitat variables and the abundance of several offshore benthic fish species in the very large Danube River.
Although former studies addressed patterns in the occurrence and abundance of species offshore to some degree, these studies focused more on the comparison of inshore versus offshore assemblages and the effect of sampling methods on assemblage structure (see […]).

Of the invasive Ponto-Caspian gobies, the round goby was the most dominant species offshore. It occurred across a variety of habitat conditions, although its abundance was lower in the fastest-flowing and deepest habitats. The CART analysis further highlighted that its abundance was highest in relatively slow-flowing areas with a higher proportion of coarse substratum (gravel and pebble). This result is consistent with our former, more local-scale study, which directly quantified offshore habitat preference curves for the species (Baranya et al., 2018). The habitat use of the racer goby differed to some degree from that of the round goby. Specifically, the racer goby was more abundant in slow-flowing areas with silt substratum and higher variation in depth. Finally, the bighead goby was very rare offshore. It occurred under average habitat conditions and, not surprisingly, the CART analysis did not select physical variables that would significantly influence its abundance pattern. Overall, our study shows that several goby species occur offshore, although only the round goby has abundant offshore populations in the Middle Danube (see also Szalóky et al., 2015). Several studies have shown that riprap-covered shorelines are preferred habitats for most goby species inshore, especially for the round and bighead […] (Wolter & Bischoff, 2001).

Of the examined species, the white bream proved to be the most ubiquitous. The species occurred along the entire gradients of physical variability, although it generally reached its highest abundance under average habitat conditions, confirming former findings (Wolter & Bischoff, 2001; Wolter & Freyhof, 2004).
Not surprisingly, the CART analysis did not select specific variables that could be most strongly related to the abundance of the species. However, the results of the CART analysis should be interpreted with caution in light of the PCA. The PCA showed that many physical variables correlate with each other, which makes it hard to single out the one variable most responsible for the habitat use of these benthic species.

Figure 5. Result of the multivariate regression tree analysis (MVRT) for the benthic fish assemblage. For each tree, bars show the relative abundance of fishes and the significant indicator species. The key explanatory variable that defines each split in the tree and its mean value are also shown.
Mesoscale-level formation of fish assemblages was elusive offshore, as indicated by the relatively high cross-validation error of the MVRT. Nevertheless, the assemblage-constituting species differed to some degree in relative abundance among the differentiated assemblage types. Substratum composition was the most influential variable separating assemblages, which is not surprising for this benthic assemblage (Greenberg, 1991).
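The cross-validated relative error referred to above can be illustrated for a one-split multivariate tree:

1. A tree is fitted to all samples except those in one fold.
2. Each held-out sample is predicted by the centroid of the group it falls into.
3. The summed squared prediction error across folds is divided by the total sum of squares about the overall centroid.

Values near 0 indicate good prediction. Values near or above 1 indicate a tree no better than the overall mean. This sketch assigns folds deterministically by index and fits only one split. The number of folds, repeats and tree-size selection rule used in the study are not specified here, and all names are hypothetical.

```python
def centroid(rows):
    """Mean species vector of a list of samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two species vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_stump(x, rows):
    """Fit a one-split multivariate tree on predictor x.

    Returns (threshold, left_centroid, right_centroid), or None if x has
    fewer than two distinct values.
    """
    best = None
    cuts = sorted(set(x))
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2  # both sides are non-empty for any such midpoint
        left = [r for v, r in zip(x, rows) if v < t]
        right = [r for v, r in zip(x, rows) if v >= t]
        cl, cr = centroid(left), centroid(right)
        cost = sum(sq_dist(r, cl) for r in left) + sum(sq_dist(r, cr) for r in right)
        if best is None or cost < best[0]:
            best = (cost, t, cl, cr)
    return None if best is None else best[1:]

def cv_relative_error(x, rows, k=5):
    """k-fold cross-validated relative error of a one-split multivariate tree."""
    if k < 2:
        raise ValueError("need at least two folds")
    n = len(rows)
    overall = centroid(rows)
    total_ss = sum(sq_dist(r, overall) for r in rows)
    if total_ss == 0:
        raise ValueError("assemblage data have no variation")
    press = 0.0  # summed squared prediction error on held-out samples
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        if not test or not train:
            continue
        model = fit_stump([x[i] for i in train], [rows[i] for i in train])
        for i in test:
            if model is None:  # no split possible: fall back to the training mean
                pred = centroid([rows[j] for j in train])
            else:
                t, cl, cr = model
                pred = cl if x[i] < t else cr
            press += sq_dist(rows[i], pred)
    return press / total_ss

# Toy data (invented): two clearly separated assemblage types along x
x = [1, 2, 3, 4, 11, 12, 13, 14]
rows = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
print(cv_relative_error(x, rows, k=4))  # 0.0, perfectly predictable
```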
Although we found that the relative proportion of pebble was one of the key variables in the formation of this offshore assemblage, it must again be emphasized that substratum composition in general was correlated with other physical variables.
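Correlated predictors make it hard to single out one variable. A quick screen of pairwise Pearson correlations among the habitat variables can show which of them are largely interchangeable in a tree split. Below is a minimal sketch. The |r| > 0.7 cut-off is a common rule of thumb assumed here, not a value from this study, and the toy data are invented.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(variables, threshold=0.7):
    """Pairs of habitat variables with |r| above `threshold` (assumed cut-off)."""
    flagged = []
    for a, b in combinations(sorted(variables), 2):
        r = pearson(variables[a], variables[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Toy data (invented), one value per sample unit
habitat = {
    "velocity": [0.2, 0.4, 0.6, 0.8, 1.0],
    "pebble": [10, 20, 30, 40, 50],
    "silt": [2, 1, 0, 1, 2],
}
print(collinear_pairs(habitat))  # [('pebble', 'velocity', 1.0)]
```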
In conclusion, our findings demonstrate that many benthic species do respond to offshore physical habitat heterogeneity in the Danube River. Although most environmental heterogeneity was related to the macroscale (10⁴–10⁵ m), we found that mesoscale-level (10¹–10³ m) differences in physical habitat quality clearly influence the formation of fish assemblages in this very large river. Consequently, we encourage researchers and managers to pay attention not only to inshore but also to offshore physical and biological heterogeneity of large rivers. This applies even in seemingly homogeneous habitats, where geomorphological breaks in flow, depth and substratum patterns cannot be easily recognized. This may contribute to more effective conservation and management of offshore fish assemblages, which is critically important in an era of increasing inland navigation.