Modeling riparian species occurrence from historical surveys to guide restoration planning in northwestern USA

Successful restoration of riparian habitats and functions depends in part on selection of plant species that are suited to local geomorphic and climatic conditions, which often relies on contemporary reference sites to characterize target riparian vegetation communities. In heavily modified landscapes, a lack of undisturbed sites hinders the description of reference conditions to help guide planning efforts. In lieu of contemporary reference sites, we used historical Public Land Survey data from the late 1800s and early 1900s to document historical streamside vegetation at 1685 sites distributed throughout the Columbia River basin. We used those data to construct a random forest classification model using climatic and geomorphic variables to predict the probability of occurrence of riparian vegetation groups (conifer, deciduous, shrub, and willow) and individual taxa (fir, pine, cedar, cottonwood, alder, sagebrush) for all stream reaches with bankfull width > 6 m in the interior Columbia River basin. The most common predictor variables included in the best models for vegetation groups or individual taxa were mean annual precipitation, minimum temperature, elevation, and bankfull width. For some taxa, temperature range and floodplain width were also important predictors. The probability maps indicate that riparian zones were likely dominated by willow species in semi-desert regions and by conifer species in the humid mountain regions. Deciduous species dominated riparian areas in transition zones between conifer forests and semi-deserts. Species distributions suggest that streams in the semi-deserts were likely characterized by little shade and low wood abundance, whereas streams in the humid mountains would have been more heavily shaded with high wood abundance. The transitional deciduous areas were likely shaded with moderate wood abundance.
Historical trends in air temperature and precipitation suggest relatively small changes in climate since the time of the surveys, indicating that current species ranges are likely similar to historical species ranges. Hence, these maps can be used to help identify suitable taxonomic groups and expected riparian functions for riparian restoration in the Columbia River basin, with appropriate adjustments made to site-specific restoration designs to account for model uncertainty, future climate change, or land use constraints.


INTRODUCTION
Restoration of riparian habitats requires identification of species compositions that are suited to local geomorphic and climatic conditions, which often relies on the identification of reference conditions for riparian vegetation communities (Bolliger et al. 2004). Reference conditions are typically based on undisturbed or minimally disturbed sites (either historical or contemporary), and they provide a benchmark against which current conditions can be compared (Harris 1999, Pollock et al. 2012). Reference conditions have two main purposes in restoration planning. First, they can be used to evaluate current riparian status and function relative to a normative state, thereby aiding in the identification of sites in need of restoration (Hughes 1997, Hyatt et al. 2004, Egan and Howell 2005). Second, reference conditions help define restoration targets and appropriate species compositions in the design of restoration actions (Harris 1999, Bolliger et al. 2004). However, in heavily modified landscapes a lack of undisturbed reference sites often hinders the description of reference conditions to help guide planning efforts (Hughes et al. 2005). In such cases, use of historical information can guide the identification of reference conditions (Montgomery 2001, Egan and Howell 2005).
Recent emphasis on restoration of riverine ecosystems for Pacific salmon (Oncorhynchus spp.) in the western USA has prompted riparian restoration actions to provide shade and wood to improve stream habitat conditions (Beechie et al. 2000, Fullerton et al. 2009, Justice et al. 2017), reduce sediment and nutrient delivery to streams (Katz et al. 2007), and control erosion (Rood et al. 2015). In the Pacific Northwest, restoration of conifer forests has received the bulk of attention (Welty et al. 2002, Meleason et al. 2003, Pollock et al. 2012), and there has been little guidance on riparian restoration where conifer species are not naturally present (Wissmar 2004). One issue of particular management importance is how to select plant species that are suitable for riparian restoration in drier regions, especially where natural riparian vegetation communities were dominated by willow and shrub species (e.g., the semi-arid interior Columbia River basin).
In this paper, we reconstruct patterns of historical riparian vegetation in the Columbia River basin based on the Public Land Surveys (PLS) conducted in the late 1800s and early 1900s, prior to significant modifications of most riparian areas. Public Land Survey data have been used extensively to reconstruct historical upland forest conditions in the midwestern USA (Whitney 1986, Radeloff et al. 1999, Friedman et al. 2001, Whitney and DeCant 2005). Less effort has been made in the Pacific Northwest, and only a few of these studies have attempted to reconstruct floodplain vegetation patterns (Sedell and Froggatt 1984, Collins and Montgomery 2001, Collins et al. 2003). In this study, we use a sample of PLS data across the Columbia River basin to develop random forest models of riparian taxa occurrences based on climatic and geomorphic variables, and map those distributions across more than 250,000 stream reaches. We present our model results in two forms: probabilities of occurrence for riparian vegetation groups and individual taxa, and a map of dominant riparian vegetation classes. We discuss differences in statistical importance of geomorphological and climatic controls on riparian vegetation taxa, as well as potential sources of error that limit prediction accuracy. Finally, we discuss the potential value of our results in riparian restoration planning, as well as the implications of using a historical reference in the context of current vegetation patterns and potential effects of climate change.

Study area
The Columbia River drains 674,478 km² from British Columbia, Canada, and seven U.S. states (Quigley and Arbelbide 1997), and anadromous salmon can currently access 187,000 km² of that area in the USA (Fullerton et al. 2006; Fig. 1). Elevations range from 50 m in the Columbia gorge to over 3700 m in the Salmon River basin, and mean annual precipitation ranges from <20 cm/yr in the central deserts to 355 cm/yr in the Cascade Mountains (PRISM Climate Group 2019; Fig. 2). The area includes a wide range of physical and ecological conditions ranging from relatively wet forests in the Cascade Mountains to semi-arid and desert regions in the central plateaus (US EPA 2015). Common land uses include agriculture at lower elevations and timber harvest at higher elevations; large urban areas are uncommon in the interior Columbia River basin (Fullerton et al. 2006).
www.esajournals.org, May 2021, Volume 12(5), Article e03525
Natural upland vegetation in the Columbia River basin varies in relation to temperature and precipitation patterns. The sagebrush steppe and grasslands are at lower elevations and drier than higher elevation forested regions. Upland forests dominated by ponderosa pine (Pinus ponderosa) generally form a belt in the transition zone between steppe and higher elevation forests, which consist of Douglas-fir (Pseudotsuga menziesii), grand fir (Abies grandis), western larch (Larix occidentalis), true fir species (Abies spp.), and lodgepole pine (Pinus contorta; Daubenmire and Daubenmire 1968, Franklin and Dyrness 1973, Hall 1973). Riparian vegetation reflects these patterns to some degree, but different topography, disturbance patterns, and moisture levels can result in riparian vegetation communities that differ from surrounding uplands (Wissmar 2004).
For example, riparian areas in low-elevation grass and shrublands are often dominated by willows (Salix spp.), other shrub species, or deciduous tree species such as cottonwood (Populus spp.) and alder (Alnus spp.; Daubenmire 1970, Evans 1989). Ponderosa pine communities may also extend into the low-elevation steppe and grasslands within shaded valleys and riparian areas. At middle elevations in the transition zone between shrub-steppe and forest vegetation upland types, conifer trees such as ponderosa pine (P. ponderosa) and Douglas-fir (P. menziesii) are mixed with deciduous tree species (Daubenmire 1970, Crowe et al. 2004). At higher elevations, riparian conifers are more prevalent, although broadleaf deciduous species are still present (Daubenmire 1970, Crowe et al. 2004).
Extensive change of river and riparian systems in the interior Pacific Northwest began in the early 1800s when fur trappers dramatically reduced beaver populations (Gibson and Olden 2014). Prior to settlement by ranchers and homesteaders, many floodplains were lined with woody vegetation such as willow, cottonwood, pine, and herbaceous species like sedges and rushes, perhaps in part because of beaver activity (McAllister 2008). Floodplains in the semi-arid interior Pacific Northwest were attractive for settlement, and significant portions have since been converted from these natural vegetation covers to agricultural fields, urban landscapes, and livestock grazing areas (Fullerton et al. 2006). These land uses, combined with stream channelization, loss of beaver, and draining of wetlands, have greatly altered the function and appearance of the basin's riparian vegetation (Fullerton et al. 2006, Pollock et al. 2007).

Methods overview
We hypothesized that predictors of riparian plant species along small streams would mainly be climate variables such as mean annual precipitation or temperature (Wissmar 2004, McKenney et al. 2007, Martin and Canham 2020), especially in forested areas where dominant riparian species are similar to upland species (Pollock et al. 2012). By contrast, we expected that species along large floodplain rivers would also be influenced by variables such as depth to the water table, soil type, erosion frequency, or flooding frequency (Naiman et al. 2005), and therefore may differ from upland vegetation (Stromberg et al. 1991, Braatne et al. 1996, Cordes et al. 1997, Latterell et al. 2006). Moreover, we expected differences between upland and riparian vegetation to be more distinct in xeric regions than in mesic regions because soil moisture gradients are more pronounced (Sabo et al. 2005). However, site-specific data on variables such as depth to water tables are not available across the study area, so we examined whether riparian vegetation along both small and large rivers could be predicted by coarser resolution climate and geomorphic variables.
We collected a broadly distributed sample of riparian vegetation points from PLS notes, as well as climatic and geomorphic predictor variables at each sample point (details of the survey methodology are in subsection The Public Land Survey).
We used historical PLS vegetation data to reconstruct likely natural riparian vegetation rather than using contemporary reference sites because the latter were exceedingly rare in the heavily modified interior Columbia River basin, especially in the shrub-steppe and grassland zones where riparian reference sites were virtually nonexistent. We only used PLS data points where the survey transect crossed a stream or river channel, or where the survey notes indicated that vegetation was located along the stream or river bank. Because the PLS notes did not include information on abundance of taxa, we simply recorded presence or absence of taxa at each point. We then grouped taxa present at each point into one of four general riparian vegetation groups: conifer, broadleaf deciduous, willow, or shrub (Table 1), corresponding to differences in vegetation height and riparian functions (e.g., wood supply, shading, leaf litter supply). Any point could have more than one group present.
For each PLS point, we assigned vegetation data to the adjacent stream reach and calculated climatic and geomorphic predictor variables for those reaches. We then constructed multivariate statistical models to predict riparian vegetation group and taxon distributions based on environmental variables across the interior Columbia River basin and developed maps of predicted likelihood of occurrence of each vegetation group or individual taxon. We also explored the importance of each independent variable as a predictor of riparian vegetation group and created maps of the most likely taxonomic group or taxon for each reach.

The Public Land Survey (PLS)
Beginning with the Land Ordinance of 1785 and the Northwest Ordinance of 1787, the public lands of the U.S. were divided into 36 square mile (93.2 km²) townships, which were further subdivided into one square mile sections (2.59 km²; White 1991). Survey teams traversed the boundaries between sections, measuring distances with 66 foot (20.1 m) chains composed of one hundred links, recording distances in units of chains and links. Each section corner or quarter section corner (midpoint between section corners, approximately 800 m spacing) was marked with a wooden post or a mound of dirt, and vegetation types and/or species were described. Surveyors were also instructed to record abrupt boundaries between vegetation types, water bodies (such as wetlands), and trees that directly intersected the survey line. Surveyors used the point center quarter sampling method for trees at some corner points (White 1991), but not at treeless points or at most riparian points. In those cases, surveyors simply recorded vegetation present. Meander surveys were also conducted in some areas. During meander surveys, surveyors followed the bank of a river or other water body recording direction and distance for each bank segment and recorded vegetation when they reached the boundary between sections. The extensive history and details of the PLS are well summarized in other sources (White 1991).
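The survey units described above can be checked with a few lines of arithmetic. The following Python sketch is an illustration only (the analysis itself was run in R); the function name `chains_links_to_meters` is hypothetical, and the conversion constants are the standard definitions of the foot and the mile rather than values taken from the survey notes.

```python
# Survey-unit arithmetic for distances recorded in chains and links.
# The 66-foot chain and 100 links per chain come from the text; the
# foot-to-meter factor and feet per mile are standard definitions.
FEET_PER_CHAIN = 66
LINKS_PER_CHAIN = 100
METERS_PER_FOOT = 0.3048
FEET_PER_MILE = 5280


def chains_links_to_meters(chains, links=0):
    """Convert a distance recorded as chains and links to meters."""
    total_chains = chains + links / LINKS_PER_CHAIN
    return total_chains * FEET_PER_CHAIN * METERS_PER_FOOT


# One side of a one square mile section, in chains.
chains_per_mile = FEET_PER_MILE / FEET_PER_CHAIN

# Quarter section corners sit at the midpoint of a section side.
quarter_corner_spacing_m = chains_links_to_meters(chains_per_mile / 2)

print(round(chains_links_to_meters(1), 1))   # length of one chain in meters
print(round(quarter_corner_spacing_m, 1))    # spacing between corners in meters
```

A section side is 80 chains, so quarter section corners are 40 chains apart, which agrees with the approximately 800 m spacing given above.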
While surveyors were instructed on how to conduct surveys, some did not apply methods consistently or record information at the same level of detail as more careful surveyors. Surveyor bias for or against certain tree sizes, species, or locations has been documented (Bourdo 1956, Manies et al. 2001), and we also noted that surveyors rarely recorded riparian vegetation types along small streams (less than about 6 m bankfull width), limiting our ability to document historical riparian vegetation along smaller streams in the channel network. Nonetheless, recording of tree taxa present along larger streams was sufficiently consistent and widespread for this analysis.
A key challenge in using PLS notes for mapping riparian species is the potential inaccuracy in species identification. We recognize that surveyors were not trained in plant identification, and they may have misidentified species at times. Moreover, surveyors often used only general common names, and assigning a species name is not possible. For example, the similar appearance of western birch (Betula occidentalis) and alder (Alnus sp.) could result in misidentification of those species. Similarly, surveyors often recorded only pine, which could represent either ponderosa pine or lodgepole pine, and fir, which could represent either Douglas-fir or grand fir. However, these species may occur in very different topographical and environmental conditions, and some environmental variables such as elevation can help determine the correct species for restoration. Although today many people tend to refer to all conifer species as pine, we found that surveyors often noted both fir and pine at the same survey point throughout the region, suggesting that this was generally not the case among surveyors. Additional potential sources of error from the PLS data are that absence of a species in the vegetation description may not indicate absence of species at a particular site and that sometimes it is unclear if a survey point is near a stream. For consistency in interpreting the notes, we assumed that a species was absent if it was not mentioned, and we included only points that were clearly adjacent to a stream to avoid inadvertent inclusion of non-riparian data points.

Historical vegetation mapping
We acquired PLS maps and survey notes from the Bureau of Land Management for Oregon and Washington (https://www.blm.gov/or/landrecords/survey/ySrvy1.php), and Idaho and Montana (https://glorecords.blm.gov/search/default.aspx). We then created a geospatial data layer of sampled PLS points, limited to the interior Columbia River basin within the USA as no Canadian equivalent surveys were available (Fig. 1). In selecting sample sites, we attempted to capture the range of elevation and precipitation zones in the basin, anticipating that this would adequately represent geographic areas with similar elevation and precipitation even if not included in the sample. However, other conditions that may influence riparian vegetation such as lithology or soil type were not considered in sample site selection and are not included as covariates in the models.
To collect the vegetation data, we followed the surveyors' paths along section lines and entered a point corresponding to the location of each surveyor's record of vegetation encountered. We recorded presence or absence of 65 taxa at that point and noted whether the point was within a floodplain (based on visual examination of the digital terrain model) or adjacent to a nearby stream (based on surveyor notes or proximity to a stream in the digital hydrography layer). Vegetation notes were sometimes vague and impossible to relate to specific species (e.g., briars, brush, or no timber), whereas others were identifiable to a likely genus or family, but not species (e.g., pine, birch, willow). Many species appeared to be identified relatively consistently among surveyors (e.g., larch) despite multiple names for some species (e.g., balm and cottonwood were likely the same species). Because of these inconsistencies, we grouped all taxa into four general groups for analysis: conifer, deciduous, willow, and non-willow shrubs (Table 1). We separated willow from other shrubs because willow was consistently recorded across the region, whereas other shrub species were more rarely recorded and there were insufficient points to create distinct groups. We assumed that generalization of detailed information into broad categories should reduce effects of surveyor error (Schulte and Mladenoff 2001). In addition to the vegetation groups, we analyzed distributions of six taxa for which there were enough sample points to run the analysis: fir, cedar, pine, cottonwood, alder, and sagebrush (Table 2).

Assigning predictor variables to vegetation points
We used the National Hydrography Dataset (NHD) High-Resolution data at 1:24,000 scale to represent the stream network in the Columbia River basin (Moore et al. 2019), which we segmented into 200-m reaches. We chose four climatic and five geomorphic predictor variables (i.e., independent variables) that potentially influence riparian vegetation and assigned them to each reach. We found that mean annual minimum and maximum temperatures were so highly correlated that we eliminated maximum temperature and conducted the analysis with eight final variables (Table 3). Precipitation, temperature, and elevation largely constitute the environmental envelope of plant species, whereas the geomorphic variables influence channel pattern and physical disturbance regime (e.g., flooding, floodplain erosion, and creation; Beechie and Imaki 2014), which also influence distributions of riparian plant species (Naiman et al. 2005, 2010). We assigned mean annual precipitation and mean annual maximum and minimum temperature values to each reach (PRISM Climate Group 2019) and calculated mean annual temperature range. Using the 10-m National Elevation Dataset (NED), we estimated the slope of each 200-m reach by calculating the elevation difference between upstream and downstream ends of a reach and dividing by the reach length (Gesch 2007). Because the NHD streamline was not always aligned with the lowest point of the DEM, we searched for the lowest elevation point within a 30-m radius of the upstream and downstream end points of each reach. We calculated bankfull channel width based on drainage area and precipitation (Beechie and Imaki 2014), and floodplain width from the NED (Bond et al. 2019).
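The reach slope calculation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the GIS workflow used in the study: the DEM is represented as a plain list of rows, coordinates are in meters from the grid origin, and the names `lowest_elevation_near` and `reach_slope` are hypothetical.

```python
import math


def lowest_elevation_near(dem, cell_size_m, x_m, y_m, radius_m=30.0):
    """Minimum elevation among DEM cells whose centers lie within radius_m.

    Assumption: dem is a list of rows, and cell (row, col) is centered at
    ((col + 0.5) * cell_size_m, (row + 0.5) * cell_size_m).
    """
    lowest = None
    for row, values in enumerate(dem):
        for col, z in enumerate(values):
            cx = (col + 0.5) * cell_size_m
            cy = (row + 0.5) * cell_size_m
            if math.hypot(cx - x_m, cy - y_m) <= radius_m:
                if lowest is None or z < lowest:
                    lowest = z
    if lowest is None:
        raise ValueError("no DEM cell within the search radius")
    return lowest


def reach_slope(dem, cell_size_m, upstream_xy, downstream_xy,
                reach_length_m=200.0):
    """Elevation drop between reach end points divided by reach length."""
    z_up = lowest_elevation_near(dem, cell_size_m, *upstream_xy)
    z_down = lowest_elevation_near(dem, cell_size_m, *downstream_xy)
    return (z_up - z_down) / reach_length_m


# Toy 10-m DEM: a flat surface at 10 m with two channel cells that the
# (hypothetical) streamline end points miss by about 14 m.
toy_dem = [[10.0] * 30 for _ in range(3)]
toy_dem[1][2] = 8.0    # channel cell near the upstream end
toy_dem[1][22] = 6.0   # channel cell near the downstream end

toy_slope = reach_slope(toy_dem, 10.0, (35.0, 5.0), (235.0, 5.0))
print(toy_slope)
```

Searching a 30-m neighborhood recovers the channel elevations even though the end points fall on the flat surface; using the end-point cells directly would give a slope of zero in this toy case.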
To calculate the percentage of the drainage area upstream of each reach that can supply fine sediment (hereafter referred to as "fine sediment supply"), we used a classification of bedrock geology into nine categories of grain size and erosion resistance and calculated the proportion of the drainage area containing fine-grained lithologies with low or moderate erosion resistance (Beechie and Imaki 2014).  (125) and we excluded those sites from the analysis.
Our final data set included 1685 reaches, with each of the four riparian vegetation groups and six individual taxa well represented (Table 2).

Predicting riparian vegetation types in the Columbia River basin
The first step in the development of predictive models for riparian vegetation was to confirm that environmental characteristics differed among vegetation groups using a non-parametric permutation test, PERMANOVA (Anderson 2001). The test was performed using the vegan package in R Version 3.6.1 (Oksanen et al. 2013, R Core Team 2019). We also conducted a multiple comparison test using PERMANOVA to detect significant differences in environmental characteristics between all paired vegetation groups.
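For readers unfamiliar with the test, the logic of a one-factor PERMANOVA can be sketched in a few lines. The Python code below is a simplified illustration of the pseudo-F permutation test of Anderson (2001), not the vegan routine used in the study. It assumes Euclidean distances on already standardized variables (the distance measure used in the analysis is not stated here), and the function names and toy data are hypothetical.

```python
import random


def _squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_f(points, labels):
    """Pseudo-F ratio from pairwise squared distances (one factor)."""
    labels = list(labels)
    n = len(points)
    sizes = {g: labels.count(g) for g in set(labels)}
    n_groups = len(sizes)
    ss_total = 0.0
    ss_within = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d2 = _squared_distance(points[i], points[j])
            ss_total += d2 / n
            if labels[i] == labels[j]:
                ss_within += d2 / sizes[labels[i]]
    ss_among = ss_total - ss_within
    return (ss_among / (n_groups - 1)) / (ss_within / (n - n_groups))


def permanova(points, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation P value)."""
    rng = random.Random(seed)
    f_obs = pseudo_f(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group labels at random
        if pseudo_f(points, shuffled) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)


# Toy data: two well-separated groups in a two-variable environment space.
toy_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
              (5.0, 5.1), (5.2, 5.0), (5.1, 5.3), (5.3, 5.2)]
toy_labels = ["conifer"] * 4 + ["willow"] * 4

f_obs, p_value = permanova(toy_points, toy_labels)
print(f_obs, p_value)
```

Pairwise comparisons between groups can be approximated by calling the same function on the subset of points belonging to each pair of groups, with a correction for multiple testing chosen by the analyst.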
We used a random forest algorithm (Breiman 2001) to predict the likelihood of presence for each riparian vegetation group in each reach, as well as for each individual taxon in each reach. We used the random forest model because it has high classification accuracy compared to regression methods, is insensitive to correlations among variables, and is ideal for predicting relationships that may include non-linearities (Cutler et al. 2007). Similar to classification and regression trees, random forest models repeatedly split the response data into groups by choosing the single predictor among a randomly chosen set at each split point that provides the most parsimonious discrimination. To prevent overfitting problems, random forest algorithms generalize across many trees built with a subset of the data. We constructed random forest models with the ranger and caret packages for R (Kuhn 2008, Wright and Ziegler 2017). We randomly selected 80% of reaches to train the model, while the remaining 20% were used to test model accuracy. We used overall model accuracies (the percentage of test points correctly classified for presence or absence) and the kappa statistic, as well as variable importance based on Gini impurity, to identify the best model for each vegetation group and species.
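The evaluation step described above (an 80/20 split, overall accuracy, and the kappa statistic) can be illustrated with a short Python sketch. It does not reproduce the ranger and caret workflow used in the study; the function names, the random seed, and the toy presence/absence vectors are hypothetical, and kappa is computed with the standard Cohen formula for two classes.

```python
import random


def split_train_test(reach_ids, train_fraction=0.8, seed=42):
    """Randomly assign reaches to training and test sets."""
    rng = random.Random(seed)
    ids = list(reach_ids)
    rng.shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def accuracy_and_kappa(observed, predicted):
    """Overall accuracy and Cohen's kappa for 0/1 absence/presence data."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    p_observed = sum(1 for o, p in zip(observed, predicted) if o == p) / n
    obs_presence = sum(observed) / n
    pred_presence = sum(predicted) / n
    # Agreement expected by chance, from the marginal frequencies.
    p_chance = (obs_presence * pred_presence
                + (1 - obs_presence) * (1 - pred_presence))
    if p_chance == 1:
        return p_observed, float("nan")  # kappa is undefined in this case
    return p_observed, (p_observed - p_chance) / (1 - p_chance)


# Split sizes for a data set of 1685 reaches.
train_ids, test_ids = split_train_test(range(1685))

# Toy test-set result: 8 of 10 points classified correctly.
toy_observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
toy_accuracy, toy_kappa = accuracy_and_kappa(toy_observed, toy_predicted)
print(len(train_ids), len(test_ids), toy_accuracy, round(toy_kappa, 3))
```

Kappa discounts the agreement expected by chance, so it is lower than overall accuracy whenever one class (often absence) dominates the test set, which is why both measures are useful when comparing models for rare taxa.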
Using the most accurate models, we developed probability maps for each vegetation group to illustrate where each plant group had a high probability of occurrence (>75%), intermediate probability of occurrence (25-75%), or low probability of occurrence (<25%). Each final model included 1000 trees and two variables at each node, and probabilities were calculated as the proportion of trees that vote for the presence of a specific vegetation group in each reach. We also produced a single map of dominant vegetation classes to more simply elucidate the distribution of species groups and riparian functions. For this map, each reach was classified based on species groups with probabilities of occurrence greater than or equal to 75%, and reaches with high probability of both conifer and deciduous groups were classified as mixed (Table 4).
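The mapping rules above can be written out as a small Python sketch. It is an illustration under stated assumptions rather than the procedure used to build the maps: the function names are hypothetical, the 75% cutoff is applied as "greater than or equal to" for the dominant class (following the wording of this paragraph), and combinations of high-probability groups other than conifer plus deciduous are returned as "unspecified" because the text does not say how they were handled.

```python
def vote_probability(tree_votes):
    """Proportion of trees voting for presence (votes are 0 or 1)."""
    return sum(tree_votes) / len(tree_votes)


def probability_class(p):
    """Map a probability of occurrence to the three map categories."""
    if p > 0.75:
        return "high"
    if p < 0.25:
        return "low"
    return "intermediate"


def dominant_class(group_probs, threshold=0.75):
    """Assign a dominant vegetation class from group probabilities.

    group_probs maps the four group names (conifer, deciduous, willow,
    shrub) to probabilities of occurrence.
    """
    high = {g for g, p in group_probs.items() if p >= threshold}
    if {"conifer", "deciduous"} <= high:
        return "mixed"
    if len(high) == 1:
        return next(iter(high))
    if not high:
        return "uncertain"
    # Assumption: other combinations are not described in the text.
    return "unspecified"


# Toy example: 1000 trees, 820 of which vote for conifer presence.
conifer_p = vote_probability([1] * 820 + [0] * 180)
example = dominant_class(
    {"conifer": conifer_p, "deciduous": 0.40, "willow": 0.55, "shrub": 0.05}
)
print(conifer_p, probability_class(conifer_p), example)
```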

RESULTS
The non-parametric permutation test revealed that riparian vegetation groups occupied significantly different environments (P < 0.001). Conifer species were concentrated in high precipitation areas, but were distributed across the range of elevations and temperatures in the data set, whereas willow occupied the widest range of environments, overlapping each of the other species groups in some portion of its environmental envelope (Fig. 3). Despite apparent overlap in environmental characteristics among groups, pairwise comparisons showed that environmental characteristics differed significantly among each pair of riparian vegetation groups (Table 5).
Each riparian vegetation group was best predicted by a random forest model with slightly different predictor variables, and models for the conifer and deciduous groups were most accurate with overall accuracy near or above 90% (Table 6). Overall accuracies of the shrub and willow models were also high (85% in both cases). Individual taxon predictions were over 90% accurate for species with very distinct precipitation requirements (fir, pine, and sagebrush). River-dependent species (cottonwood and alder) were more difficult to predict based on climate and simple geomorphic variables, yet accuracy was above 80% for each (Table 6).
Maps of probabilities of occurrence for each vegetation group show that conifer species were concentrated in the mountain regions with higher precipitation, whereas willow species were moderately likely or very likely to occur almost everywhere (Fig. 4). Deciduous species were most likely to occur in a transition zone between the wetter mountains and semi-desert regions, but also had a moderate probability of occurrence almost everywhere in the basin. Shrub species were rare in the PLS notes and were rarely predicted with high probability in the Columbia River basin.
A map of the dominant vegetation types across the basin illustrates a general pattern of species transition from conifer, to mixed, to deciduous, to willow as one moves from the mountains downstream to lower precipitation areas (Fig. 5, compare to Fig. 2). However, many reaches in semi-arid areas of the basin were classified as uncertain, indicating that the model could not identify a dominant taxonomic group in some environments. For individual taxa, pine and fir were the only species with significant areas of high probability of presence, and cedar and sagebrush were low probability nearly everywhere (Fig. 6). Cottonwood and alder had a moderate probability of occurrence over much of the basin.
Table 4 (partial). Vegetation class | Criterion | Riparian functions
Willow | Only the willow group has >75% probability of occurrence | Short trees and shrubs (generally <10 m height); no large wood supply; provides shade and leaf litter
Shrub | Only the shrub group has >75% probability of occurrence | Short trees and shrubs (generally <10 m height); no large wood supply; provides shade and leaf litter
Uncertain | No species group has >75% probability of occurrence, but most likely a mix of deciduous and willow species | Functions most likely include shade, root strength, and leaf litter. May include supply of short-lived wood
Fig. 3. Non-metric multidimensional scaling (NMDS) ordination based on dissimilarity of geomorphic and climatic variables between conifer, deciduous, shrub, and willow riparian vegetation groups (stress = 0.12). Note that there is broad overlap among species in temperature and elevation, but conifer species are confined to areas with higher precipitation and narrower floodplains.
Four predictor variables were in all of the best vegetation group models: precipitation, minimum temperature, elevation, and bankfull width (Table 6). These same variables were also in all of the individual taxa models, except that minimum temperature was not in the best sagebrush model.
Temperature range and fine sediment supply were also in the best model for conifer species, and slope was only included in the shrub model. In the conifer model, precipitation was the most important predictor, whereas precipitation, elevation, and bankfull width were roughly equal in importance for the deciduous species (Fig. 7). Elevation was the dominant predictor for shrubs, and bankfull width was the dominant predictor for willow. Importance of variables for individual taxa mirrored those of the riparian vegetation groups, with mean annual precipitation the most important predictor for the conifer species (fir, pine, cedar), whereas predictor variables for cottonwood, alder, and sagebrush were relatively equal in importance.

DISCUSSION
The finding that pine and fir were the most common species in the conifer zone (42% and 34% of points, respectively) is consistent with other studies (e.g., Crowe et al. 2004), but we note that it is not clear from the surveyor notes when pine refers to ponderosa pine or lodgepole pine, or when fir refers to Douglas-fir or a species of true fir. However, these species generally occupy different bioclimatic envelopes. For example, lodgepole pine generally occurs above about 1300 m in the study area, a higher elevation than ponderosa pine (Hall 1973, Kovalchik 1987, Crowe et al. 2004). Similarly, Douglas-fir and grand fir occur at lower elevations in the study area (above about 400 m; Crowe et al. 2004, Kovalchik and Clausnitzer 2004), but other species of true fir such as Pacific silver fir (Abies amabilis) or subalpine fir (Abies lasiocarpa) occur at much higher elevations (1100-2300 m; Hall 1973, Crowe et al. 2004, Kovalchik and Clausnitzer 2004).
The predicted ranges of deciduous-dominated and mixed conifer-deciduous riparian forests are also consistent with descriptions of current patterns (Daubenmire and Daubenmire 1968, Daubenmire 1970). Cottonwood and willow communities, often joined by birch and alder, are commonly found in riparian areas of the grassland portions of the Columbia River basin (Daubenmire and Daubenmire 1968, Daubenmire 1970), which lie between the drier sagebrush steppe and wetter conifer forests. Upstream of the deciduous forests is typically a zone of mixed conifer-deciduous riparian forests, where black cottonwood, birch, and/or alder are intermixed with conifer species (Franklin and Dyrness 1973, Moseley 1998, Crowe et al. 2004, Kovalchik and Clausnitzer 2004).
Historical willow-dominated communities were generally located in drier regions, where early explorers and railroad surveyors frequently noted a scarcity of trees and a preponderance of shrubby willows and graminoids along the lower reaches of the area's major rivers (Stevens 1860, Meinig 2016). At present, willow-dominated communities are still commonly found in drier regions of the interior Columbia basin (Evans 1989). However, willow species are also present in the deciduous and conifer zones, but generally as an understory species. This minor presence is also apparent in our modeled historical willow distribution, where willow appears with lower probability in conifer and deciduous forests than in the willow-dominated areas.
Notes: Predictor variables are described in Table 2. BF, bankfull; Elev., elevation; F. sed., fine sediment; FP, floodplain; Min., minimum; Precip., precipitation; Temp., temperature.
Some have speculated that cottonwood and water birch (B. occidentalis) may have been much more common historically than at present (Daubenmire 1970, Evans 1989), but we found no evidence that the range of either species has been reduced. Nonetheless, our data support the idea that cottonwood was both common and widespread in the interior Columbia River basin historically and that where it was present cottonwood species are a reasonable choice for inclusion in the species mix for riparian restoration.

Implications for restoring riparian functions
While our analysis cannot identify individual species for riparian restoration, the maps of probable occurrence of taxonomic groups can provide general guidance for riparian restoration in the Columbia River basin. Most importantly, our results can help identify taxonomic groups that are likely to provide riparian functions that are similar to those to which aquatic species have adapted. For example, in stream reaches of the semi-desert central Columbia Plateau, willows were common, deciduous vegetation moderately common, and conifers were absent. This suggests that stream reaches of the central Columbia Plateau likely contained little large wood and received little shade, especially along larger river channels. Hence, riparian restoration in the central Columbia Plateau could focus on willows and other smaller tree and shrub species that provide similar functions (Crawford 2003). By contrast, in conifer areas large wood was likely much more abundant and taller trees would have provided significant shade, suggesting that restoration in those areas should focus on conifer species providing large wood and shade, possibly mixed with other deciduous or willow species. In the transitional deciduous-dominated areas, reaches were likely shaded with moderate wood abundance, and restoration could focus on reestablishing tall deciduous species that provided significant shade. In each case, identifying individual species that should be planted must rely on more detailed site information and species' environmental tolerances.
Physical channel changes or land use constraints can limit opportunities to restore native riparian species and functions. For example, channel incision in some areas has created inset floodplains that are much narrower than the historical floodplain (Pollock et al. 2007, Beechie et al. 2008). As a result, the water table is lowered, and riparian vegetation exists only on the inset floodplain, while the historical floodplain becomes a terrace dominated by drier upland vegetation (Pollock et al. 2007). Hence, riparian restoration with native species is limited to narrow inset floodplains where plants can access the water table. However, physical restoration of incised channels can raise the water table so that a greater area of floodplain and terraces can be colonized or replanted with appropriate riparian species (Pollock et al. 2014, Powers et al. 2019). Similarly, isolation of floodplains by levees reduces width of the riparian area and constrains channel dynamics that created diverse species and age structure on floodplains (Fullerton et al. 2006, Hall et al. 2007). Reconnection of such floodplains can restore river-floodplain dynamics, allowing restoration of native species and age diversity in dynamic riparian areas (Naiman et al. 2010).
Fig. 6. Maps of modeled likelihood of presence for riparian taxa across the interior Columbia River basin, showing that some taxa are predicted with relatively high certainty of presence or absence (fir, pine, cedar, and sagebrush). Other taxa have less distinct environmental envelopes (e.g., cottonwood, alder), and therefore, the models predict intermediate probability of occurrence over a large area.
We acknowledge that the lack of small streams (<6 m modeled bankfull width) and less common riparian environments in our sample will exclude some unique riparian vegetation types such as alpine meadows, wetlands, or sparse vegetation along streams in the channeled scablands of the Columbia Plateau (Crawford 2003). Moreover, variables that were not included in the analysis (e.g., lithology or soil type) may also influence riparian species occurrence, meaning that extrapolation of modeled probabilities of occurrence to unsampled areas may result in prediction errors due to missing covariates in the model. Indeed, our map of the most likely vegetation types (Fig. 5) includes significant portions of the semi-arid Columbia Plateau and Snake River Plain classified as uncertain, meaning that no taxonomic group was predicted with high probability of occurrence in those reaches. Other studies have identified a range of shrub, grass, forb, and wetland riparian communities in those areas, many of which do not match any species group in our analysis (Moseley 1998, Crawford 2003). These results suggest that stream types not represented in our analysis may have had different species compositions and riparian functions than those that can be predicted by the model, and also that in some areas the model is simply too uncertain to provide guidance on suitable taxonomic groups for restoration. As noted earlier, selection of individual species for riparian restoration should consider local site characteristics and species tolerances to assure that species are suited to locations where restoration is planned.
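The logic of labeling a reach as uncertain when no taxonomic group is predicted with high probability can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used to produce Fig. 5: the function name `most_likely_group` is hypothetical, and the 0.5 cutoff is an assumed placeholder because the threshold for "high probability" is not given in this section.

```python
def most_likely_group(probabilities, threshold=0.5):
    """Return the group with the highest modeled probability of occurrence.

    `probabilities` maps group name -> probability for one stream reach.
    `threshold` is an ASSUMED cutoff for "high probability"; the value
    used for the published map is not stated in this section.
    Returns "uncertain" if no group reaches the threshold.
    """
    if not probabilities:
        return "uncertain"
    group, p = max(probabilities.items(), key=lambda kv: kv[1])
    return group if p >= threshold else "uncertain"


if __name__ == "__main__":
    # Made-up probabilities for two hypothetical reaches.
    humid_reach = {"conifer": 0.8, "deciduous": 0.3, "shrub": 0.1, "willow": 0.1}
    plateau_reach = {"conifer": 0.05, "deciduous": 0.35, "shrub": 0.3, "willow": 0.4}
    print(most_likely_group(humid_reach))
    print(most_likely_group(plateau_reach))
```

A stricter threshold would enlarge the area mapped as uncertain, whereas a looser one would assign a group to more reaches at the cost of lower confidence in each assignment.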

The historical reference and climate change
Selecting a historical species mix as a restoration target presumes that current environmental conditions are similar to those that determined historical species distributions. One argument against using historical references as a target for restoration is that environmental conditions may be so different today or in the future that the historical condition cannot be achieved (Harris et al. 2006, Seavy et al. 2009). For vegetation types in the study area, our observations across the region suggest that current species distributions generally match those in the historical notes and our model results, and we have no observations of range expansion or reduction of any riparian species. However, some studies suggest that the climate envelopes of many species may shift location in the future (Holsinger et al. 2019), although established species may persist while new species may not disperse rapidly (Davis and Shaw 2001, Sittaro et al. 2017). This may create a lag between climate change and shifts in riparian species ranges.
From the early part of the 20th century to near-present (1985-2016), there have been only modest increases in annual average, average minimum, and average maximum temperatures (+0.85°C; Vose et al. 2017), and no clear trend in annual precipitation (ranging from −5% to +5% across the region; Easterling et al. 2017). Hence, there has been little change in climatic conditions affecting modeled riparian species distributions through the 20th century. However, a 2.5°C increase in the coldest temperature of the year over that time period may contribute to recent pine beetle infestations, which have killed extensive areas of lodgepole pine forest in the Rocky Mountains (Mitton and Ferrenberg 2012). Fire frequency has also increased in the west recently (Westerling et al. 2006), and both factors may decrease the extent of mature conifer forests in the region.
Fig. 7. The relative importance of each input variable included in the final random forest models predicting the likelihood of occurrence of conifer, deciduous, shrub, and willow species. The conifer distribution is mainly determined by precipitation, whereas the other groups are more equally influenced by a number of variables.
If future climatic conditions are likely to be different from those that supported historical and current riparian species (Harris et al. 2006, Seavy et al. 2009, Holsinger et al. 2019), it is logical to consider those changes in riparian restoration planning (Harris et al. 2006, Beechie et al. 2013, Perry et al. 2015). The CMIP5 climate change projections for 2070-2099 indicate drier summers (0% to −20%) and wetter winters (0% to +20%), from the baseline near-present period of 1986-2006. Projected increases in 20-year extreme precipitation are +9% to +10% for mid-century (2036-2065) and late century (2071-2100), respectively, under RCP4.5 (moderate emissions), and +11% in mid-century to +19% in late century under RCP8.5 (high emissions; Easterling et al. 2017). These changes suggest increasing peak flows and decreasing low flows in the interior Columbia River basin, although predicted flow changes vary across the region (Beechie et al. 2013, Tohver et al. 2014). Higher peak flows are likely to increase the frequency of riparian erosion, which may lead to larger patches of early seral species in the near-stream zone (Naiman et al. 2010). By contrast, decreasing low flows may lead to larger areas of gravel bar with little vegetation (Perry et al. 2015). In general, these changes are not expected to shift species ranges within the study area, but may cause shifts in patch sizes and the proportion of species coverage within a given reach. Guidance for modifying restoration plans to account for such projected changes is intended to help ensure that selected plant species persist in a range of climate futures (Perry et al. 2015).

CONCLUSIONS
The PLS notes provide a useful, systematic record of historical riparian vegetation patterns in the Columbia River basin, which we used to model and map probabilities of occurrence of natural riparian vegetation types across the region. These maps of dominant riparian taxonomic groups can be used as a guide for identifying species groups that provide riparian functions similar to those to which aquatic species have adapted. However, there are uncertainties in the model predictions that preclude identifying individual species for use in restoration, and understanding local environmental conditions and habitat needs of potential riparian taxa will be important for identifying plant species that are suitable for inclusion at individual sites.
Historical conditions do not necessarily represent current or future conditions, but past climate trends and future climate projections in the study area suggest that changes in precipitation and air temperature are not likely to lead to appreciable shifts in distributions of riparian taxa. However, increasing peak flows and decreasing low flows may affect the abundance of individual species more than their presence or absence. That is, we may see wider disturbed areas and larger patches of early seral species as a function of increased peak flows and floodplain erosion, as well as increased abundance of xeric species resulting from decreasing low flows and drying of low-elevation floodplains and bars. We suggest following existing guidance for adapting riparian target conditions or restoration plans to accommodate future climate change or other geophysical and land use constraints.

ACKNOWLEDGMENTS
This project was funded by the Western Regional Office of the National Marine Fisheries Service. Timothy Beechie, Michael Pollock, and Oleksandr Stefankiv wrote the paper, and Oleksandr Stefankiv and Morgan Bond conducted all GIS and statistical analyses. The authors have no conflicts of interest to declare. Aimee Fullerton, Peter Kiffney, and two anonymous reviewers provided helpful reviews of the manuscript.