Generalizing Tree–Level Sap Flow Across the European Continent

Sap flow offers key insights about transpiration dynamics and forest-climate interactions. Accurately simulating sap flow remains challenging due to measurement uncertainties and interactions between global and local environmental controls. Addressing these complexities, this study leveraged Long Short-Term Memory networks (LSTMs) with SAPFLUXNET to predict hourly tree-level sap flow across Europe. We built models with diverse training sets to assess performance under previously unseen conditions. The average Kling-Gupta Efficiency was 0.77 for models trained on 50% of time series across all forest stands, and 0.52 for models trained on 50% of the forest stands. Continental models matched or surpassed the performance of specialized models and baselines for all genera and forest types, showcasing the capacity of LSTMs to generalize across tree genera, climates, and forest ecosystems given minimal inputs. This study underscores the potential of LSTMs in generalizing state-dependent ecohydrological processes and bridging tree-level measurements to continental scales.


Introduction
Accurate quantification of plant transpiration is critical in hydrological research, as transpiration accounts for approximately 65% of global terrestrial evapotranspiration (e.g., Good et al., 2015). Plants play a pivotal role in controlling the exchange of water between the atmosphere and the land surface. Yet capturing complex plant water use responses to environmental conditions and estimating transpiration across spatio-temporal scales remains challenging. Among the limited number of available measurements of plant transpiration, in-situ sap flow sensors are the most widespread technique due to their relatively low cost and ease of use (Dugas et al., 1993). Sap flow has long been recognized, especially in ecology and plant physiology, as a fundamental measurement for deciphering vegetation functionality and transpiration dynamics in both forested (Granier & Loustau, 1994) and agricultural ecosystems (Dugas et al., 1994).
While the analysis of sap flow to gain process understanding or to improve predictions has been less prevalent in catchment hydrology than in ecology, recent studies highlight how sap flow measurements can enhance understanding of the intricate relationships among vegetation characteristics, hydrometeorological factors, and catchment properties. For instance, Hassler et al. (2018) conducted an extensive study to determine the relative influence of tree-, stand-, and site-specific characteristics on sap velocity patterns, using data from 61 beech and oak trees across 24 sites in Luxembourg. Their findings suggest that transpiration estimates at the catchment scale could be significantly improved by taking into account not just hydrometeorological drivers but also the spatial patterns of forest composition (e.g., total basal area). Renner et al. (2016) showed that variability in sap flow driven by topography and aspect could be balanced out by the forest stand composition, resulting in equivalent transpiration rates across south- and north-facing hillslopes. This exemplifies how vegetation dynamics adapt to environmental conditions to effectively use available resources. Hoek van Dijke et al. (2019) used sap flow measurements to explore the link between the normalized difference vegetation index (NDVI) and transpiration. They showed that NDVI is not always reliable for modeling transpiration, especially during drought periods, as the correlation between NDVI and sap flow can be positive or negative depending on seasonal changes, moisture availability, and hydrogeological factors. Integrating sap flow into catchment studies and understanding its spatio-temporal variability is thus key for improving transpiration estimates at landscape scales. There is therefore a critical need to generalize sap flow regionally beyond tree-level measurements.
Sap flow measurement campaigns have historically been designed to explore plant-soil interactions at plot or plant scales, as exemplified by Jackisch et al. (2020) and Seeger and Weiler (2021). The SAPFLUXNET initiative was established to harmonize these numerous individual small-scale field studies into a comprehensive, global, open-source sap flow database (Poyatos et al., 2021). The SAPFLUXNET database thereby presents a unique opportunity to learn generalizable relationships across different plant genera and climates. This is essential for estimating sap flow in "ungauged" regions, forest stands, or specific trees where direct measurements of sap flow have not been conducted and plant water use is unmonitored.
Deep learning offers powerful avenues to find generalizations in large data sets (e.g., LeCun et al., 2015). For example, Koppa et al. (2022) employed a feed-forward neural network to analyze daily, global data sets from eddy covariance stations, satellites, and sap flow measurements from SAPFLUXNET. Their aim was to create an accurate global vegetation stress model that improves simulations of the reduction of evaporation from its theoretical maximum, for instance during periods of water limitation. By integrating the feed-forward neural network into an existing process-based model, they improved the ability of that model to estimate global evaporation rates. Further, Loritz et al. (2022) showed that recurrent neural networks could simulate vegetation dynamics in the form of catchment-averaged sap velocities with low residuals, even in areas where the model had not been trained. Similar to Koppa et al. (2022), Loritz et al. (2022) showed that deep-learning-based sap velocity simulations can be coupled with a process-based hydrological model, in this case with the objective of replacing the semi-empirical Stewart-Jarvis equation, resulting in more accurate transpiration estimates and ultimately better soil moisture simulations, particularly during a drought year. In addition, Li et al. (2022) highlighted the ability of recurrent neural networks to estimate the vegetation dynamics of a specific tree species in New Zealand from standard meteorological variables. Both Loritz et al. (2022) and Li et al. (2022) compared different deep learning architectures and found that recurrent neural networks such as Long Short-Term Memory (LSTM; Hochreiter & Schmidhuber, 1997) models are suitable architectures to simulate and predict sap flow. However, both studies trained models on relatively small and local data sets, representing only the dynamics of a forest stand or a small catchment without distinguishing the behavior of different tree species or forest types. The extent to which LSTMs can detect consistent relationships in tree-level sap flow across different genera, measurement methods, climates, and forests remains an open question.
Regionalized sap flow predictions are crucial for understanding vegetation water use at larger scales, offering key insights into transpiration dynamics and plant-climate interactions. Such information is necessary to manage forest ecosystems, particularly under the evolving challenges of global change. Despite its importance, obtaining accurate sap flow data beyond the tree level remains challenging due to measurement uncertainties and the complex interplay of global and local environmental drivers and controls. This study investigates the potential of LSTMs for modeling tree-level sap flow at an hourly time scale across the European continent. We developed continental tree-level sap flow models leveraging the SAPFLUXNET database to extract generalized relations between sap flow of different tree genera, dynamic atmospheric drivers, and forest stand characteristics. We designed different experimental training setups to evaluate the performance of these models in time and space. The presented deep learning approach offers avenues to overcome limitations of transpiration models when tree-level parameterizations for stomatal conductance are locally unavailable, could be used to assess different forest structures and their implications for regional transpiration rates, and could ultimately provide robust sap-flow-based transpiration estimates across scales.

Data: A European Subset of the SAPFLUXNET Database
The SAPFLUXNET database (Poyatos et al., 2021) is a comprehensive global repository of tree-level sap flow measurements and their ancillary data, including tree and forest characteristics and meteorological observations. We used version 0.1.5 of the database and selected a European subset comprising 64 of the 202 forest stands in the global data set, encompassing 738 individual trees. We focused on the European subset of SAPFLUXNET because of the high density of measurements and the strong overlap of tree genera within this region. In total, we included six tree genera with more than 20 measured trees each: Pinus (282 trees), Picea (159), Quercus (144), Fagus (94), Larix (30), and Pseudotsuga (29). The selected data sets represent six forest types according to the International Geosphere-Biosphere Programme (IGBP) classification: 34 evergreen needle-leaf forests (ENF), 11 mixed forests (MF), 8 deciduous broadleaf forests (DBF), 5 deciduous needle-leaf forests (DNF), 4 evergreen broadleaf forests (EBF), and 2 savannas (SAV). We treated each individual tree's seasonal sap flow time series separately, resulting in a total of 2,279 years of sap flow (cm³ h⁻¹) observations, each corresponding to about one growing season (April to September, 3-6 months). The division into individual seasonal time series is key to this study, as it allows us to include all sensors, even if they cover only a few months at any point in time. The winter period (October to March) was excluded because transpiration is low or zero at most stands.
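The seasonal segmentation described above can be sketched in a few lines. This is a minimal illustration only: the record layout `(tree_id, timestamp, sap_flow)` and the function name are assumptions made for the example, not the SAPFLUXNET file format or the preprocessing code used in this study.

```python
from collections import defaultdict
from datetime import datetime, timedelta

GROWING_MONTHS = range(4, 10)  # April (4) through September (9)


def split_into_growing_seasons(records):
    """Group (tree_id, timestamp, sap_flow) records into one series per tree and year.

    Records from the winter period (October to March) are dropped.
    The returned dict is keyed by (tree_id, year).
    """
    seasons = defaultdict(list)
    for tree_id, timestamp, sap_flow in records:
        if timestamp.month in GROWING_MONTHS:
            seasons[(tree_id, timestamp.year)].append((timestamp, sap_flow))
    # Sort each series chronologically so it can be used as a sequence.
    return {key: sorted(values) for key, values in seasons.items()}


# Hypothetical hourly records that straddle the March/April boundary.
start = datetime(2011, 3, 31, 22)
records = [("tree_A", start + timedelta(hours=h), 100.0) for h in range(4)]
records.append(("tree_A", datetime(2012, 10, 1, 0), 50.0))  # winter, dropped
seasons = split_into_growing_seasons(records)
```

Because each `(tree_id, year)` series is handled independently, a sensor that recorded for only a few months still contributes one usable time series.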

Selected Model Features
We considered six dynamic features comprising meteorological variables at an hourly resolution available in the SAPFLUXNET database: air temperature (°C), relative humidity (%), vapor pressure deficit (kPa), shortwave incoming radiation (W m⁻²), precipitation (mm), and wind speed (m s⁻¹). We also considered six static (time-invariant) features. Four are forest stand characteristics: mean elevation (m), long-term mean annual temperature (°C), long-term mean annual precipitation (mm), and forest type (DBF, DNF, EBF, ENF, MF, SAV). Two are tree-level characteristics: diameter at breast height (DBH; cm) and tree genus (Fagus, Larix, Picea, Pinus, Pseudotsuga, and Quercus). We used one-hot encoding for the 6 genera and the 6 forest types in the data set. The feature selection was determined through several trial-and-error runs to balance model performance and practical applicability. It reflects our strategic choice to minimize the number of input variables, to use the maximum number of sites in Europe, and to enhance the model's generalizability to unmeasured forest stands, where detailed information might not be readily available. Omitted variables that can be important sap flow predictors (e.g., soil water limitation, total basal area) may increase model performance if such data become more widely available across sites in the future.
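As an illustration of how the features listed above could be assembled, the sketch below one-hot encodes genus and forest type and concatenates them with the numeric inputs. The function names, the ordering of the inputs, and the example values are assumptions; the study does not specify how the input vector was laid out.

```python
GENERA = ["Fagus", "Larix", "Picea", "Pinus", "Pseudotsuga", "Quercus"]
FOREST_TYPES = ["DBF", "DNF", "EBF", "ENF", "MF", "SAV"]


def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == category else 0.0 for category in categories]


def static_features(elevation_m, mat_degc, map_mm, forest_type, dbh_cm, genus):
    """Four numeric static inputs followed by two one-hot blocks (6 + 6)."""
    return (
        [elevation_m, mat_degc, map_mm, dbh_cm]
        + one_hot(forest_type, FOREST_TYPES)
        + one_hot(genus, GENERA)
    )


def model_input(dynamic, static):
    """Concatenate the six hourly meteorological values with the static inputs."""
    if len(dynamic) != 6:
        raise ValueError("expected six dynamic features")
    return list(dynamic) + list(static)


# Hypothetical beech tree in a deciduous broadleaf stand (values are made up).
static = static_features(450.0, 8.5, 900.0, "DBF", 35.0, "Fagus")
# Order assumed: temperature, humidity, VPD, radiation, precipitation, wind speed.
x_t = model_input([18.2, 65.0, 0.7, 420.0, 0.0, 2.1], static)
```

Under this assumed layout, each hourly time step has 22 inputs: 6 dynamic values, 4 numeric static values, and 12 one-hot entries.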

Deep Learning Model: Long Short-Term Memory
LSTMs are recurrent neural networks that are specifically designed to circumvent the vanishing and exploding gradient problem encountered in regular recurrent neural networks (Hochreiter, 1998; Hochreiter & Schmidhuber, 1997). This is achieved through the introduction of a cell state, which enables the network to learn long-term dependencies that are typically important in sequential environmental data. The memory cell works in conjunction with so-called "gates," mechanisms that update the memory and output over time while allowing the error to propagate consistently through the network, thereby facilitating the learning process. LSTMs have shown their efficiency and aptitude in hydrological modeling and have emerged as one of the top-performing models for simulating various state-dependent ecohydrological phenomena, such as streamflow, soil moisture, and ecosystem water and carbon fluxes (e.g., Besnard et al., 2019; De Bartolomeis et al., 2023; Kratzert et al., 2019). We chose LSTMs in this study following the results of Loritz et al. (2022), who showed that LSTMs and gated recurrent units (GRUs; Cho et al., 2014) are suitable model architectures for simulating hourly sap velocity.
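For reference, the cell state and gates mentioned above are commonly written as follows. This is the widely used textbook formulation with a forget gate, given here as an assumption about the general architecture rather than as the exact implementation used in this study. Here $x_t$ is the input at time step $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic function, and $\odot$ element-wise multiplication; $W$, $U$, and $b$ are learned weights and biases.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
g_t &= \tanh\left(W_g x_t + U_g h_{t-1} + b_g\right) && \text{(candidate cell update)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

The additive update of $c_t$ is what lets gradients pass across many time steps without vanishing as quickly as in a plain recurrent network.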

LSTM Hyperparameters
We explored ranges of hyperparameter settings based on previous hydrological studies (Mai et al., 2022) and found that LSTM performance was relatively stable across a wide range of hyperparameters. We adopted the settings of Mai et al. (2022) with two modifications: (a) a reduced sequence length (the number of time steps the network looks back to process and learn from sequential inputs) and (b) a reduced number of epochs. Both changes only minimally influenced model performance while greatly decreasing training times. The hyperparameter settings used for all model variants are: number of hidden layers = 1; hidden layer neurons = 256; learning rate = 0.0005; dropout rate = 0.4; batch size = 64; sequence length = 24 hr; epochs = 20; iterative optimization algorithm = Adam.
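The settings above can be collected in a small configuration object, and the sequence length determines how an hourly series is cut into training samples. The sketch below is illustrative: the class and function names are hypothetical, and pairing each 24-step input window with the target at its last hour is an assumption about the training setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LSTMConfig:
    """Hyperparameters as listed in the text; field names are illustrative."""
    hidden_layers: int = 1
    hidden_size: int = 256
    learning_rate: float = 0.0005
    dropout: float = 0.4
    batch_size: int = 64
    sequence_length: int = 24  # hours the network looks back
    epochs: int = 20
    optimizer: str = "adam"


def make_windows(inputs, targets, sequence_length):
    """Pair each target with the `sequence_length` input steps ending at that hour.

    Assumes a sequence-to-one setup (one prediction per window).
    """
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same length")
    samples = []
    for end in range(sequence_length, len(inputs) + 1):
        samples.append((inputs[end - sequence_length:end], targets[end - 1]))
    return samples


config = LSTMConfig()
hourly_inputs = list(range(30))                  # stand-in for 30 hourly input vectors
hourly_targets = [10 * i for i in range(30)]     # stand-in for 30 sap flow values
samples = make_windows(hourly_inputs, hourly_targets, config.sequence_length)
```

With this windowing, a series of n hourly steps yields n - 24 + 1 samples, so a series shorter than 24 hr yields none.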

Model Setups and Evaluation
We developed several experimental model setups (see Table 1) using different amounts of data for model training to test the performance of the LSTMs in predicting hourly sap flow in time and space. Our objective is to compare these model setups (a) to quantify model performance in predicting sap flow for unseen periods and for unseen locations and (b) to assess the value of training LSTMs on larger data sets like SAPFLUXNET instead of building models individually for each forest stand or genus. In the following we explain each model setup in detail. We developed gauged-continental models, trained across all 64 forest stands, to assess the ability of the LSTMs to generalize to unseen time periods in seen forest stands (temporal performance). We developed ungauged-continental models, trained on random subsets of 33 forest stands and tested on 31 stands, to examine the LSTMs' ability to generalize and predict sap flow at unseen forest stands (spatial performance). We developed two types of specialized models (single forest stand models and single tree genus models) on smaller, more constrained training sets to infer whether LSTM performance improves when trained on a large and diverse database like SAPFLUXNET. Single forest stand models are trained only on a single forest stand (location) across all genera present at that stand. Single tree genus models are trained on a single genus (e.g., Fagus, Pinus) but across all forest stands where the genus is present.
Further, we developed gauged and ungauged baselines using a simple statistical approach to benchmark the LSTM performance. Gauged baselines represent the monthly averaged hourly diurnal cycle of sap flow for each stand and for each genus present at a stand across the European continent. Ungauged baselines represent the monthly averaged hourly diurnal cycle of sap flow for each genus across the European continent. These baselines are built 10 times on the same randomly selected training data as the gauged- and ungauged-continental models and are derived exclusively from the data without using a machine learning model.
For all model setups, data were divided such that a single growing season of tree-level sap flow data is treated as an individual time series and is entirely part of either the training, validation, or test data (in total 2,279 individual time series). For the gauged-continental models (Table 1) we split the 2,279 individual time series 10 times into 1,140 years (50%) for training and 912 years (40%) for testing, and set aside the remaining 227 years (10%) for validation. The ungauged-continental models were trained on 10 randomly selected subsets, each containing 33 of the 64 available forest stands, and were tested on the remaining 31 forest stands. To ensure representativeness, the division into training and testing stands for the ungauged-continental models was stratified, maintaining proportional representation of each forest type as classified by the IGBP. This approach guaranteed that all IGBP forest types were included in each training subset.
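The two splitting schemes described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the use of integer seeds, the rounding rule, and the stand identifiers are hypothetical, and the exact sampling procedure of the study may differ.

```python
import math
import random
from collections import defaultdict


def split_time_series(series_ids, seed, train_frac=0.5, test_frac=0.4):
    """Randomly assign whole seasonal time series to train, test, and validation."""
    ids = list(series_ids)
    random.Random(seed).shuffle(ids)
    n_train = math.ceil(train_frac * len(ids))
    n_test = math.ceil(test_frac * len(ids))
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    validation = ids[n_train + n_test:]  # whatever remains (about 10%)
    return train, test, validation


def stratified_stand_split(stand_types, seed):
    """Split stands so that half of each IGBP type (rounded up) is used for training.

    `stand_types` maps a stand identifier to its IGBP forest type.
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for stand, forest_type in sorted(stand_types.items()):
        by_type[forest_type].append(stand)
    train, test = [], []
    for forest_type in sorted(by_type):
        stands = by_type[forest_type]
        rng.shuffle(stands)
        n_train = math.ceil(len(stands) / 2)
        train += stands[:n_train]
        test += stands[n_train:]
    return train, test


# Ten Monte Carlo repetitions of the temporal split, one per (assumed) seed.
splits = [split_time_series(range(2279), seed) for seed in range(10)]

# Hypothetical stand identifiers built from the IGBP counts given in the data section.
igbp_counts = {"ENF": 34, "MF": 11, "DBF": 8, "DNF": 5, "EBF": 4, "SAV": 2}
stand_types = {f"{t}_{i}": t for t, n in igbp_counts.items() for i in range(n)}
train_stands, test_stands = stratified_stand_split(stand_types, seed=0)
```

With these rounding choices, the 2,279 series divide into 1,140, 912, and 227, and the stand counts per forest type give 33 training and 31 testing stands, matching the numbers reported in the text. Whether the study used the same rounding is an assumption.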
By training and testing all models in this study on 10 randomly drawn subsets without repetitions, using a so-called Monte Carlo cross-validation scheme, we can assess the robustness of our models with respect to the information content of the training data (Maier et al., 2023). Further, it allows us to train network ensembles to assess the uncertainties of our sap flow simulations. All numeric input features were standardized by subtracting the mean and dividing by the standard deviation of the training data. We assessed each model's performance using the Kling-Gupta efficiency (KGE; Gupta et al., 2009) and its three components, Pearson correlation (r), variability ratio (α), and bias ratio (β), as well as the Nash-Sutcliffe efficiency (NSE) and the mean absolute error (MAE).
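For readers who want to reproduce the evaluation metrics, a plain-Python sketch is given below. It follows the convention of Gupta et al. (2009), in which α is the ratio of simulated to observed standard deviations and β is the ratio of simulated to observed means, so that KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²). The function names are illustrative, and the sketch assumes non-constant series with a non-zero observed mean.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Population standard deviation."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson(obs, sim):
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    return cov / (std(obs) * std(sim))


def kge(obs, sim):
    """Return (KGE, r, alpha, beta) following Gupta et al. (2009)."""
    r = pearson(obs, sim)
    alpha = std(sim) / std(obs)    # variability ratio
    beta = mean(sim) / mean(obs)   # bias ratio
    value = 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return value, r, alpha, beta


def nse(obs, sim):
    """Nash-Sutcliffe efficiency."""
    mo = mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1 - residual / sum((o - mo) ** 2 for o in obs)


def mae(obs, sim):
    """Mean absolute error, in the units of the observations."""
    return mean([abs(o - s) for o, s in zip(obs, sim)])


# Hypothetical example: a simulation with perfect timing but doubled magnitude.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 4.0, 6.0, 8.0]
score, r, alpha, beta = kge(observed, simulated)
```

In this example the correlation is perfect while both α and β are far from 1, which illustrates how the KGE penalizes errors in magnitude even when the diurnal pattern is matched.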

Performance of Gauged-Continental Models
The gauged-continental models, representing the upper bound in continental performance, were capable of simulating tree-level hourly sap flow with an average KGE of 0.77 ± 0.04 (Figure 1a), compared to 0.64 ± 0.05 for the gauged baselines (long-term monthly averaged hourly diurnal sap flow cycle for each stand and each genus, Table 1). Looking closely at model performance across tree genera, there is a consistent pattern of high KGE values for Quercus, Fagus, and Pseudotsuga trees. Notably, even Pseudotsuga, the genus with one of the smallest data sets (80 years), achieves a KGE of 0.76 ± 0.07, in contrast to a KGE of 0.34 ± 0.07 for the gauged baselines. Sap flow simulations of the Picea trees showed weaker performance, with a KGE of 0.55 ± 0.06 (gauged baselines = 0.28 ± 0.03), despite Picea being frequent in the data. The amount of data for a tree genus therefore does not directly correlate with model performance.
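The baselines used for comparison above are simple lookup tables of mean sap flow per month and hour of day. The sketch below illustrates one way to build them; the record layout, the function names, and the example values are assumptions made for illustration. Using `(stand, genus)` as the group corresponds to the gauged baseline, and using the genus alone corresponds to the ungauged baseline.

```python
from collections import defaultdict
from datetime import datetime


def fit_diurnal_baseline(train_records):
    """Average sap flow per (group, month, hour) over the training records.

    `train_records` is an iterable of (group, timestamp, sap_flow).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, timestamp, sap_flow in train_records:
        key = (group, timestamp.month, timestamp.hour)
        sums[key] += sap_flow
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def predict_baseline(baseline, group, timestamp):
    """Look up the mean diurnal value for the month and hour of `timestamp`."""
    return baseline[(group, timestamp.month, timestamp.hour)]


# Hypothetical training records for one stand and genus (values are made up).
group = ("stand_1", "Fagus")
records = [
    (group, datetime(2010, 6, 1, 12), 100.0),
    (group, datetime(2011, 6, 15, 12), 200.0),
    (group, datetime(2010, 6, 1, 0), 0.0),
]
baseline = fit_diurnal_baseline(records)
noon_prediction = predict_baseline(baseline, group, datetime(2012, 6, 20, 12))
```

Such a baseline reproduces the average seasonal and diurnal pattern but cannot respond to the weather of a particular day, which is the information the LSTMs add.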
Figures 1c, 1d, and 1e illustrate three sequences of hourly sap flow observations and simulations based on the gauged-continental models for a Quercus, a Pseudotsuga, and a Picea tree over five consecutive days. While there is some uncertainty in the simulations (blue band), most models of the ensemble agree on the diurnal sap flow pattern, as shown by Pearson correlations between the members of the model ensemble that are all higher than 0.8. In agreement with the overall findings, the gauged-continental models capture the sap flow dynamics of the shown Quercus tree well (Pearson correlation = 0.9) and also match the absolute values (β = 1.03). The model performance for the shown Picea tree was lower: the models match the dynamics to a certain extent (Pearson correlation = 0.77) but miss the absolute values. For the Pseudotsuga tree, the patterns and absolute values are well matched, and even the midday drop in sap flow on 2 days is reproduced reasonably well by the gauged-continental models. Model uncertainty across random subsampling is low for both the Quercus and the Pseudotsuga tree compared to the Picea tree.
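The ensemble band and the agreement between ensemble members discussed above can be summarized as sketched below. How the band in Figure 1 was computed is not stated, so taking the minimum and maximum across members is an assumption, as are the function names and example values.

```python
import math


def ensemble_summary(member_sims):
    """Per-time-step (mean, min, max) across equally long member simulations."""
    return [
        (sum(values) / len(values), min(values), max(values))
        for values in zip(*member_sims)
    ]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def min_pairwise_correlation(member_sims):
    """Smallest Pearson correlation between any two ensemble members."""
    n = len(member_sims)
    return min(
        pearson(member_sims[i], member_sims[j])
        for i in range(n)
        for j in range(i + 1, n)
    )


# Three hypothetical ensemble members simulating the same three hours.
members = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 2.0, 4.0]]
summary = ensemble_summary(members)
lowest_r = min_pairwise_correlation(members)
```

A high minimum pairwise correlation together with a wide band indicates that members agree on the timing of sap flow but differ in magnitude.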

Performance of Specialized Models
Single tree genus models, trained on a single genus across all sites (Table 1), did not outperform the gauged-continental models. At best they achieved an equivalent performance. Further, single tree genus models were more sensitive to the amount of data, particularly if trained on only a few sites, and can exhibit large performance differences, even leading to negative KGEs at some locations. In contrast, the gauged-continental models remain relatively stable (±0.04 KGE) and are less affected if the data representing a genus are reduced or if the number of stands is varied. Here, no simulation, not even one in which a tree genus was removed completely, resulted in a negative KGE. This indicates that the randomly selected training subsets from the Monte Carlo subsampling of the gauged-continental models each hold a similar amount of information about the relationship between input features and sap flow. It also opens avenues to extend the training data set with new tree genera, even if they have only been measured at a few locations for a short period.
We also compared the outcomes of the single forest stand models and the gauged-continental models, focusing on the specific tree genera measured at each site, as not all tree genera are present in each forest stand. At certain forest stands, for example at some locations in central Europe, the performance of the single forest stand models closely mirrors that of the gauged-continental models for the dominant genus at these sites (Fagus). However, at some locations in southern France, performance was lower than that of the gauged-continental models for the dominant genus (Quercus), with KGEs around 0.5 or 0.6 versus 0.79. Overall, despite being trained on more consistent data, since each site was managed by the same team and primarily the same types of sensors were used for sap flow measurement, no single forest stand model exceeded the performance of the gauged-continental models in any of the forest stands.

Performance of Ungauged-Continental Models
As expected, the performance of the ungauged-continental models is on average lower than that of the gauged-continental models, yet it proved reasonable, with an average KGE of 0.52 ± 0.16 compared to a KGE of 0.11 ± 0.15 for the ungauged baselines. We found that the performance of the models was particularly affected by the frequency of forest types, despite stratifying the random subsampling of the training data by IGBP class. In other words, forest types with lower frequencies in the training data had a larger effect on the overall test scores than the number of samples of a tree genus. The observed performance variance of the ungauged-continental models (standard deviation of 0.16) highlights the variability of the information content in the training data due to the random subsampling. This variability decreases if the model is trained on more observations and on a wider range of forests. For instance, the performance of the ungauged-continental models increased rather quickly, while the standard deviation of the ensemble dropped, if 70% instead of 50% of the data was used for training (KGE ≈ 0.65 ± 0.08).

Gauged-Continental Models: Robust Sap Flow Estimates Across Europe, Overcoming Measurement Uncertainties and Generalizing Across Climates and Tree Genera
In this study, gauged-continental models always matched or outperformed specialized single genus or single stand models. SAPFLUXNET combines various sensors and measurement techniques from across the globe (see the discussion of the SAPFLUXNET publication during the review process in Earth System Science Data; Poyatos et al., 2021). It is therefore striking that the gauged-continental models trained in this study can generalize tree-level sap flow across diverse climate zones and genera with a performance akin to that of the specialized models, which were trained and tested on more consistent data sets (e.g., in terms of sensor type, installation method, tree type, and climate zone), and to the results reported by Li et al. (2022) and Loritz et al. (2022).
Sap flow measurements can vary strongly even within the same tree, making it challenging to achieve consistent and accurate readings (Steppe et al., 2010). For instance, sap flow measurements taken just a few meters apart in trees of similar genus, size, and height can show significant differences (Vandegehuchte & Steppe, 2013). It is particularly the absolute values that vary, while the overall dynamics, akin to what is frequently found for in-situ soil moisture observations, are typically well matched if similar sensors are installed in different trees located in the same forest stand (e.g., Hassler et al., 2018; Loritz et al., 2017; Zehe et al., 2010). Differences in absolute values might arise from various small-scale structural characteristics, such as properties of the sapwood or heterogeneous flow paths inside the stem. These factors are typically unknown and not provided to our models as input. The models therefore have no way to learn small-scale differences that might explain why different absolute sap flows have been measured at two similar trees in close proximity. This might be one reason why all models in this study generally learn to represent the dynamics of sap flow well but can exhibit a bias for certain trees.
Our findings reinforce previous research suggesting that deep learning models trained on large and diverse data sets often outperform those trained on more specialized, homogeneous data (e.g., Kratzert et al., 2019; Wi & Steinschneider, 2022). This depends, however, on the information content of the database, which tends to, but does not have to, increase with data set size (Singh & Bárdossy, 2012). The results of our study, as shown in Figures 1c, 1d, and 1e, demonstrate that the Monte Carlo subsampling method developed for our goal of simulating sap flow, along with the application of network ensembles, significantly enhances our ability to address uncertainties in our model simulations. This approach aligns with recent advancements in data analysis, such as the use of Gaussian mixture models as a head layer for LSTMs, a technique effectively employed by Klotz et al. (2022) for analyzing rainfall-runoff data. We plan to explore this technique in future research to assess uncertainties within the SAPFLUXNET data set.

Ungauged-Continental LSTMs Provide Reasonable Sap Flow Estimates at Ungauged Forest Stands, and New Data Have the Potential to Further Enhance This Capability
We quantified LSTM performance for predicting hourly sap flow at forest stands that were unseen during training. The results show that the performance of the ungauged-continental models, although lower than that of the gauged-continental models, was still reasonable, with an average KGE of around 0.52. We found that the best-performing forest stands were often in the most frequent forest types (e.g., ENF). While the models were also capable of predicting sap flow in boreal and Mediterranean forests, where measurements are scarcer, these predictions became less reliable the further they deviated from the training data, showing clear limitations of the ungauged-continental models and the chosen training data set. The random subsampling of the training data and the experiments with increased training data highlight that each new set of sap flow data can make the LSTM more robust, particularly in vegetation types that are less frequently monitored. Many large, tax-funded sap flow data sets in Europe (and likely in other parts of the world) are not yet openly available and are not included in SAPFLUXNET. We argue that our study hints at the currently unused potential that could be realized if these data sets were shared in a consistent manner or if new measurement campaigns were designed specifically to close the spatial and ecological gaps in the SAPFLUXNET data set. The methodology we employed to segment and aggregate sap flow data into distinct time series based on growing seasons ensures that all future sap flow measurements can be incorporated into our model, provided they exceed the minimum sequence length (>24 hr) required by the LSTM.
The ungauged-continental models can make reasonable hourly predictions of sap flow in unseen forest stands for a majority of the European continent. This implies that, with prescribed forest type and weather data (e.g., from climate models), it is possible to create hourly sap flow maps for Europe with one continental deep learning model. These dynamic maps have clear limitations, but there are few other options to gather hourly information about plant water use at the tree level at ungauged sites. Furthermore, as we simulate tree-level sap flow based on different tree and forest stand characteristics (DBH, genus), this model could be used to assess different forest structures and their implications for regional transpiration rates, and how these potentially change under different forest management strategies. Such dynamic sap flow maps could also be used to evaluate or replace the transpiration models that are frequently found in land surface or hydrological models, as shown by Loritz et al. (2022). Our study hence demonstrates an avenue toward developing ensembles of continental sap flow models to predict sap flow and evaluate vegetation dynamics around the globe. Future research could explore additional deep learning architectures for sap flow prediction and refine feature selection in SAPFLUXNET, potentially integrating satellite data or global reanalysis products to estimate sap flow in forests without ground measurements.
Geophysical Research Letters 10.1029/2023GL107350

Impact of Local and Global Environmental Factors on Model Simulations
The two continental models showed lower variance during random subsampling of the training data than the specialized models. Furthermore, they could simulate sap flow for tree genera not included in the training set, achieving a KGE between 0.2 and 0.4. The latter suggests that forest type information (IGBP) is more important for the models than specific tree genus data. This could be because the variability across tree genera within a given forest type is smaller than the variability across forest types.
The SAPFLUXNET database includes sap flow measurements from various sensor types, which introduces different levels of uncertainty and affects comparability (e.g., Köstner et al., 1998; Lundblad et al., 2001; Tournebize & Boistard, 1998). However, incorporating sensor type information into our models did not reduce simulation errors, nor did the performance differ systematically between sensor types. This finding suggests that differences in sap flow measurements across sensor types are not systematic but rather somewhat random. It supports the finding of Tournebize and Boistard (1998) that sap flow measured with different techniques in forests is comparable, apart from the inherent uncertainty of every sap flow measurement.
Our analysis emphasizes the significance of tree and forest site characteristics in explaining variations in sap flow, in agreement with the findings of Hassler et al. (2018). Specifically, during our feature selection, we observed that forest type (IGBP) notably improved model performance, while the addition of total stand density or total basal area only slightly increased simulation accuracy (KGE increased from 0.79 to 0.81). Given the marginal improvement, we opted to exclude these features from the final model to ensure its wider applicability. Our study encompasses a broad range of climate zones and forest types and shows, unlike catchment-level studies (e.g., Hassler et al., 2018), that climatic factors, particularly long-term average precipitation and temperature, play a crucial role in our models. They are key to explaining variability in plant water uptake and water use strategies across different biomes (Bassiouni et al., 2023).
It is important to note that our findings are contingent on the current state of the SAPFLUXNET data set. As this data set expands, incorporating measurements from similar climate and forest types, factors like the hydropedological setting and more detailed forest stand descriptions may gain importance. Further, exploring the contribution of individual variables through advanced interpretability methods could shed light on their relative importance, serving also as a validation for the model. While this approach is compelling and merits further investigation, it requires a specialized training process for precise evaluation, which is beyond the scope of this study.

Conclusions
This study evaluated the potential and limits of deep learning to generalize sap flow dynamics across the European continent using the SAPFLUXNET database and to bridge gaps between scales. We demonstrate that LSTMs achieve reasonable performance in predicting hourly tree-level sap flow for different tree genera and in diverse climate zones and forest types. Key technical elements for developing robust LSTMs include the random subsampling strategy, which accounts for uncertainties in sap flow measurements, and the splitting of the sap flow data into individual time series, which enables model training across sites and measurement periods. Training LSTMs on large and diverse data sets and on several tree types proved beneficial compared to specialized models and supported the objective of using LSTMs to generalize vegetation dynamics beyond the individual tree-level sap flow measurement. By leveraging the capabilities of LSTMs to simulate state-dependent ecohydrological processes, we achieved generalization with minimal and broadly available forest characteristics and dynamic meteorological inputs. As more data sets become publicly accessible, we anticipate improvements in the precision and scope of such models, particularly for forest types underrepresented in SAPFLUXNET. Continental-scale LSTMs could significantly enhance our understanding of vegetation water use dynamics reflected in sap flow, potentially serving as critical benchmarks for partitioning evapotranspiration in land surface and hydrological models.

Figure 1. Performance and standard deviation (±) of (a) the 10 gauged-continental models and (b) the 10 ungauged-continental models, measured by the KGE for all testing data (overall; orange) and for each of the six tree genera (genus name; green). Observed and simulated hourly sap flow using the gauged-continental model for five consecutive days in the testing data for (c) a Quercus tree in the year 2011 (France), (d) a Pseudotsuga tree in the year 2012 (Germany), and (e) a Pinus tree in the year 2001 (Sweden). Blue bands visualize the uncertainty of the gauged-continental model given the Monte Carlo subsampling scheme and the individually trained LSTMs.

Table 1. Description of Model Training Setups: From Specialized to Regionalized Approaches
Note. All models are trained, validated, and tested at an hourly resolution.