Probabilistic Intraday Wastewater Treatment Plant Inflow Forecast Utilizing Rain Forecast Data and Sewer Network Sensor Data

Forecasting of wastewater treatment plant inflow dynamics constitutes an enabler technology for wastewater treatment process optimization using model predictive control. However, accurate inflow prediction is still challenging, especially for strong rainfall events, where complex system dynamics and missing information on future rainfall represent limiting factors. We propose a seasonal probabilistic time series model for modeling the short‐term wastewater inflow accurately while providing quantification of forecast uncertainty. To ensure suitability for practical implementation, the unconstrained parameters of the predictive distribution are modeled as linear functions of the input variables in the framework of Generalized Additive Models for Location Scale and Shape. Non‐linear effects are approximated by Rectified Linear Units, accounting for buffering within the sewer network and flow‐dependent catchment response time. In addition to water level measurements from within the sewer network and rain rate measurements, rain forecasts are incorporated as exogenous regressors, where historical rain forecasts are used for model calibration. The model performance is evaluated on historical data from a German wastewater treatment plant using deterministic and probabilistic scoring rules. We benchmark against an autoregressive time series model and a long short‐term memory artificial neural network. Our results show that the proposed model unites the benefits of high prediction accuracy of the neural network and enhanced intelligibility of the autoregressive model, but accurate real‐time rain forecasts are mandatory for successful real‐world implementation.


10.1029/2022WR033826
2 of 21 model predictive control (MPC) schemes and decision-making processes.Thus, these predictions contribute to enhancing the efficiency of WTP operations and also support the enhancement of climate resilience with respect to the increased risk of heavy precipitation events associated with climate change (Z.Li et al., 2019;Nissen & Ulbrich, 2017;Stentoft et al., 2019;D. Zhang et al., 2018;Q. Zhang et al., 2019).Furthermore, active sewer network control utilizing forecasted flow properties can be employed to optimize the usage of a network's buffering capacity to prevent overflows and WTP overloads and hence enhance wastewater system resilience (Garofalo et al., 2017;Seggelke et al., 2013;Svensen et al., 2021;D. Zhang et al., 2018).
However, it has been shown that predicting inflow dynamics is particularly challenging for strong rainfall events (Pedersen et al., 2016).Whereas inflow dynamics are dominated by diurnal seasonality in case of dry weather (Leitão et al., 2006;Q. Zhang et al., 2019), exogenous information on future rain is required for successful inflow prediction for strong rain events with a sufficiently long forecast horizon (Pedersen et al., 2016;Q. Zhang et al., 2019).Yet, utilizing exogenous information for inflow prediction is challenging due to the complex hydrodynamical properties of wastewater sewer networks (Q.Zhang et al., 2019).
Regarding suitability for implementation into existing WTP control structures, inflow forecasting methodology is subject to further constraints in addition to the challenges posed by complex system dynamics: Low computational demand is necessary in order to generate predictions quickly enough for real-time control.In the context of MPC, the computational demands for the inflow prediction model are less stringent than those for internal MPC models, as it runs only once per timestep to set optimization boundary conditions.However, it must still adhere to time constraints due to the runtime of subsequent MPC computations.In addition, the ability to quantify forecast uncertainty represents an important aspect for sophisticated decision-making (Stentoft et al., 2019;Svensen et al., 2021), rendering probabilistic forecast models favorable over deterministic methods.Furthermore, it has been suggested in personal communication with practitioners and in the literature (Vaughan & Wallach, 2020) that black-box-models like artificial neural networks (ANN) face a generally rejective attitude from practitioners due to security concerns for critical infrastructure.Thus, interpretability and intelligibility represent important aspects to foster trust in new methodology and hence facilitate practical utilization.Finally, applicability to different wastewater sewer networks without manual calibration and re-modeling to account for the specific network structure is advantageous due to low cost and time consumption.
Hydrodynamic, machine learning and hybrid approaches for inflow rate and composition prediction exist in the literature.Machine learning models utilize algorithms to automatically learn patterns and relationships in data, allowing for efficient and scalable modeling of complex dynamics without relying on explicit expert knowledge and detailed network topology data (Bishop, 2006).As a result, they are a focus of current research in intraday inflow forecasting.Used input variables vary throughout the literature with a focus on autoregressive properties of inflow and rain measurements: Zhou et al. (2019) implemented a random forest model for generating probabilistic forecasts of daily inflow using climate data as exogenous regressors, where historical measurements were used as oracles substituting climate forecasts.D. Zhang et al. (2018) used a Long Short-Term Memory (LSTM) to predict future inflow for load-balancing between multiple WTPs and improved buffering capacity usage within the sewer network.Pedersen et al. (2016) demonstrated the utilization of a simple rainfall-runoff model (lumped reservoir model) for predicting the WTP inflow rate's response on rainfall events, proposing a Bayesian update of estimated model parameters, but did not incorporate rain forecasts.Heinonen et al. (2013) utilized rain radar forecasts as input for a hydraulic model to predict intraday WTP inflow.Wang et al. (2019) combined Convolutional ANN and LSTM ANN in order to forecast WTP inflow chemical oxygen demand (COD) from temperature, pH, NH 3 -N, inflow rate and COD data.El Ghazouli et al. (2021) predicted the inflow using an autoregressive ANN with real-time and predicted water consumption as well as infiltration flows as exogenous variables.Langeveld et al. (2017) developed an empirical model for inflow quality prediction, modeling different water quality processes for individual inflow dynamics regimes.F. Li and Vanrolleghem (2022) utilized a multi-objective genetic algorithm to train an LSTM network with respect to influent average behavior and variability.In the reviewed literature, various data-driven methods are explored to forecast the WTP inflow rates and pollution levels across different forecasting horizons, employing a diverse array of input variables.However, none of these methods leverage rain radar forecasts and water level data from the sewer network to generate probabilistic intraday forecasts of WTP inflow, with a particular emphasis on accurate predictions for the periods following rain events.With this context, we outline the major contributions of this research as follows: 10.1029/2022WR033826 3 of 21 2. Due to the probabilistic approach, the generated forecasts yield detailed information on forecast uncertainty from the full predictive distribution.3. We demonstrate that, by modeling the unconstrained distribution parameters as linear functions of the input variables, the non-linear relationships in the data resulting from the system dynamics after rain events can be approximated so well that the inflow forecast uncertainty is dominated by the rain forecast uncertainty.4. The proposed model exhibits good adaptability to different wastewater networks without extensive expert knowledge because the central hyperparameters are tunable based on data statistics and automated parameter search.Furthermore, the utilization of LASSO regularization improves robustness against model overspecification.
In this article, we propose an inflow forecasting model designed with the research goal to develop a machine learning method for probabilistic intra-day inflow forecasting with a forecast horizon of 2.5 hr and a focus on rain events.Here, measurement data from within the sewer network alongside rain forecast data shall be utilized while aiming to provide a balance of intelligibility and prediction accuracy suitable for practical implementation.Furthermore, the method should be designed not to require detailed knowledge of the particular sewer system to which it is applied or reliance on hydrodynamical simulations.Instead, it should generalize to allow coupling with physically-based methods as model inputs, such as a hydraulic surface-runoff model.This flexibility is intended to enhance the model's applicability across various scenarios while maintaining its core data-driven characteristics.
The model was trained with and evaluated on data from a combined sewer network located in the district Weiden of Cologne, Germany, covering a relatively small catchment area of 1,010 ha with approximately 52,300 residents and a population equivalent (PE) of approximately 80,000 PE.The network schematization is depicted in Figure 1.The remainder of the paper is organized as follows: Important system properties considered for the model design and the developed forecasting model are described in Section 2. The used data set is described in Section 3. Experimental results are given in Section 4 and a concluding discussion is provided in Section 5.

Materials and Methods
We propose a machine learning model for generating distributional inflow forecasts from time series data.The model, which is described in Section 2.2, was designed to approximate non-linear relationships arising from the system's physical properties that are described in Section 2.1.Our procedure for generating forecasts using our model specification (Equation 2) consisted of the following steps: 1. Model inputs were measured inflow, measured water levels, measured rain rates and future rain rates (rain forecasts or, in part of our evaluation, historical rain rates as oracles).2. The model hyperparameters were tuned as described in Section 2.3.2 by iterative re-calibration on the training data in order to adjust the model specification to the data of the investigated sewer network.Optionally, knowledge of system properties can be induced in this step by explicitly defining nodes of linear splines for individual inputs where behavior changes are expected.We performed the tuning on a point-forecast model for computational performance and used the best-fit model specification and coefficients as initial values for the distributional forecast model training the next step.3. The model coefficients were estimated using the training data as described in Section 2.3.1.4. Predictions on the test data set were generated using the calibrated distributional forecast model.
The sewer network's properties we considered relevant for the model design are described in Section 2.1 and the model is specified in Section 2.2.

Properties of Considered Sewer Network
The proposed model was designed to address two dominant characteristics of sewer networks that simple autoregressive models struggle to capture because of the emergence of non-linear effects.These characteristics include the wastewater buffering capacity within the sewer network and the relationship between flow velocity and flow rate.
Buffering by storage elements represents a common design principle of sewer networks in order to alleviate the sewer network's limited hydraulic capacity in case of strong rainfall events (Butler et al., 2018).Hence, a strong increase in network inflow does not result in an equivalent increase in future WTP inflow.In the case of the studied sewer network, the inflow rate is limited to approximately 700 L/s after a rain event and remains at this level until the buffering elements are depleted.A storage tank sewer (tank sewer) (Butler et al., 2018) located upstream from the WTP was identified as an important element in the studied sewer network because its state of depletion was expected to provide valuable information on the expected time of inflow decrease after a rain event.In the remainder of this paper, the three phases of rising inflow, near-constant inflow near the hydraulic capacity limit and the following decline of inflow are referred as rise, high and decline.
The dependency of the flow velocity V on the flow rate follows from the Manning equation (Hager, 2010) that delivers a first-order approximation for the open surface flow regime.In particular, the hydraulic radius contributes to V and depends on the water level and the pipe cross-section (Hager, 2010).Furthermore, combined sewer overflows alter the sewer network's hydrodynamic behavior during strong rainfall events, where the open channel flow assumptions do not hold and the measured flow variables yield reduced information on the system state due to hysteresis.Accordingly, the catchment response time (Giani et al., 2021) and the temporal dependency structure between measurements in the network and future inflow are not constant but depend on the system's hydrodynamic state.

Model
In this section, the time series data from one input data source (e.g., one water level sensor location or rain station) is referred as measurement variable whereas derived quantities from the input data are referred as features.The matrix   corresponds to the measured data, where the column vectors are the individual measurement variables' time series.Hence, a measured value of measurement variable v at time t is referred as We propose a Generalized Additive Model for Location, Scale and Shape (GAMLSS) (Stasinopoulos & Rigby, 2008) for probabilistic inflow prediction.Within the GAMLSS framework, the response variable's probability density function (PDF)  f( |) is conditioned on a vector   ∈ ℝ   of N p realized explanatory features.With a PDF with N P parameters indexed by parameter index p ∈ {1, …, N P }, each PDF parameter Θ p is a sum of J p non-parametric functions S p,j and a linear component with parameter vector β p , mapped to the PDF's domain using a monotonic link function g p : The proposed model only incorporates the linear component in favor of model interpretability and computational efficiency.Hence S p,j simply represent the identity and are therefore omitted.Our formulation of x p incorporates features corresponding to the autoregressive structure of the data generating process and seasonal effects.We designed the autoregressive features to approximate the non-linear effects described in Section 2.1 and to model interactions between measurement variables.The resulting expression for  (Θ) is: Here, the first term corresponds to the autoregressive component, and the second term represents the seasonal component.The coefficients     and     were estimated from the data in the model training process, as described in Section 2.3.V, L p,v and T p,v are sets of measurement variables, lags and thresholds.S a , S w and S d are seasonal indices indicating the phase of annual, weekly and diurnal seasonality with corresponding indicator functions 1 w (t) and 1 d (t) to express seasonal dummies (Hylleberg et al., 1993). () denotes B-Spline (Ziel et al., 2016) basis functions for modeling a smooth change of diurnal and weekly patterns throughout the year.The individual components of Equation 2 are motivated and explained further in the following.

Autoregressive Component
As described in Section 2.1, measurements of rain and water levels provide information on the future inflow.The according temporal dependency structure was modeled as an autoregressive process (Hamilton, 2020): (Θ) at time t was formulated as linear combination of past realizations of the input variables with coefficients β p,v,l : (3) Here, L p,v denotes sets of lags determining which past time steps of a each variable v are used.

Non-Linear Effects
Non-linear effects of individual lagged variables were modeled by piecewise linear approximation using Rectified Linear Units (ReLU) (Nair & Hinton, 2010;Ziel et al., 2016).A set of thresholds T p,v was defined for each PDF parameter and variable, dividing the variable's domain into intervals contributing with individual slopes.The according features were constructed as clipped values,  max( [  − ] ) , for each threshold τ ∈ T p,v .In order to include completely linear behavior, the threshold τ = −∞ was included.The choice of appropriate thresholds was part of the hyperparameter tuning procedure described in Section 2.3.The resulting auto-regressive term from Equation 3 with linear approximation for each considered lag of each measurement variable is: The approximation method is illustrated in Figure 2 for the rain event data from Figure 4.By modeling the non-linear effects of each lag, changes of the autoregressive dependency structure throughout different regimes of the network state, in particular the buffering and the flow-dependent dwell-time (see Section 2.1), can be approximated.As a simplified example, consider a level measurement, where a higher level will cause a more immediate effect on inflow than a lower level due to the shorter catchment response time.In this case, the coefficient β p,v,l,τ would be estimated significantly differently from zero for accordingly large τ at short lags, whereas a value near zero would be found for the same τ at long lags.However, it is important to note that the actual relations between coefficients in the trained model are more complex in general, for example, encoding derivatives through finite differences.

Seasonal Effects
As described in Section 3.1, the inflow exhibited diurnal patterns varying throughout the week and the year.The diurnal and weekly patterns were modeled using seasonal dummies as artificial features with value 1 for each time t corresponding to the represented daytime and weekday and 0 else.As components of the feature vector, these seasonal dummies represent a time-dependent intercept by multiplication with their corresponding model coefficients.With the data frequency of 15 min yielding 96 phases a day and three modeled diurnal patterns (working day, Saturday, Sunday), 96 * 3 = 288 seasonal dummies were incorporated for one annual season.
As described in Section 3.1, the diurnal patterns varied throughout the year.The according annual seasonality was modeled using separate diurnal and weekly seasonal dummies for a set S a of four annual seasons.By weighting the resulting 96 • 3 • 4 seasonal dummies with quadratic B-Spline (Ziel et al., 2016) basis functions  () for each annual season a ∈ S a , a smooth periodic transition between the annual seasons' patterns is realized.The resulting seasonal component of transformed PDF parameter  (Θ) is: Here, the seasonal dummies are expressed through a product of indicator functions 1 w (t)1 d (t), which is one if t corresponds to the diurnal phase d and the weekly phase w.

Rain Cumulation and Interaction
Due to the system's hydrodynamic properties, we expected the short-term accumulated rain rate values to influence inflow dynamics.Hence, for the measured and forecasted rain rate, the sum of the N previous values was incorporated as artificial measurement variable, where N = 6 was found to provide the best fit with respect to the Bayesian information criterion (BIC), which is described in Section 2.3.The effect of rain was expected to depend on the current inflow situation due to the system's limited hydraulic capacity.In particular, the effect of rain was expected to decrease for high inflow rates.This was modeled as a two-way interaction term of inflow at t − 1 and rain rate by using the product of the two variables as artificial measurement variable.In experiments where rain forecasts were used instead of rain oracles, addition of two-way interaction of all rain forecast steps further improved the forecast.The piecewise linear approximation described in Section 2.2.2 enabled handling non-linear effects of the interaction.

Choice of PDF and Link Functions
As PDF, Johnson's SU (JSU) (Johnson, 1949) distribution was chosen due to its versatility for modeling non-normally distributed data, while still providing an intuitive parameterization: The JSU represents a transformation of a normal distribution with standard deviation λ > 0 and mean ξ.It provides parameters γ and δ > 0 to control the skewness and kurtosis.
The following link functions were found to provide favorable results in our experiments: Here, Equations 7 and 9 represent shifted and scaled sigmoid functions (Han & Moraga, 1995) and Equation 10corresponds to the Softplus as implemented in the PyTorch framework (Paszke et al., 2019).

Calibration and Hyperparameter Tuning
Training the model given in Equation 2 involved the estimation of the coefficients         from training data and the choice of hyperparameters.

Estimation of Model Coefficients
The coefficient estimation process involved two consecutive steps to facilitate regularization while limiting computational demands during the iterative hyperparameter tuning process.First, a point forecast model was fitted using LASSO regularization (Tibshirani, 1996) to prevent overfitting and perform feature selection.The estimated coefficients were then used as starting values for the coefficients of the location PDF parameter ξ in the probabilistic model's optimization process minimizing a penalized-likelihood (Wasserman, 2004) loss function.
With LASSO, the estimate  β of a parameter vector β is obtained by minimizing the mean squared error penalized by the parameter vector's l1-norm.With the realized response variable y and the design matrix X for N samples, the LASSO estimator is: An important property of the LASSO estimator is the ability to perform feature selection by setting individual parameters to zero.By construction, LASSO selects randomly among sets of highly correlated features (Xiao & Sun, 2019).Thus, regarding the model interpretation, an excluded parameter does not necessarily correspond to low explanatory value of the corresponding feature without considering the data's correlation structure. 10.1029/2022WR033826 8 of 21 The selected features were used for training the probabilistic model, under the assumption that the features selected for the point forecast model were also a viable selection for the probabilistic model.The coefficients for the PDF parameter ξ were initialized with the LASSO estimate.The coefficients for γ, λ were initialized as zero, as this resulted in higher consistency of parameter estimation than random initialization in our experiments and ξ corresponds to the JSU distribution's mean at those starting conditions.The PDF parameter δ was set as a trainable constant, and hence not conditioned on the model inputs, because instabilities were observed in learning and forecasting otherwise.As δ has strong influence on the JSU distribution's higher moments that give high weight to extreme values, its estimation is sensitive to outliers (Barnett & Lewis, 1994).
Parameter estimation for the probabilistic model was performed using ADAM gradient descent (Kingma & Ba, 2014), minimizing a penalized-likelihood (Wasserman, 2004) loss function In Equation 12, the likelihood   is penalized with the mean square error MSE, weighted with a constant α = 10, as our experiments showed that the penalty term improved prediction accuracy with respect to the MSE without reducing probabilistic accuracy, as evaluated using the energy score (see Section 2.4).The parameter vectors for the individual PDF parameters were optimized independently in a cyclic scheme with independent instantiations of the ADAM optimizer, which improved the convergence rate compared to simultaneous optimization.

Hyperparameter Tuning
Hyperparameters for our model include the choice of thresholds T p,v for the linear splines, lags L p,v and the regularization parameter λ lasso .λ lasso was tuned by an exponential grid search with respect to the BIC (Stoica & Selen, 2004) for each model configuration.The BIC provides a measure for the balance between model complexity and in-sample prediction accuracy and can be used for optimizing the out-of-sample prediction performance based on tuning with in-sample data.It penalizes the negative in-sample likelihood   with the parameter count k and the number of observations n: The utilization of feature selection with LASSO facilitated the definition of the thresholds and lags as densely populated sequences without incorporation of extensive expert knowledge for exact definition of each threshold and lag for each variable.The thresholds for each variable were defined as equidistant sequences with variable-specific number of steps, and with minimum and maximum values determined as follows: For rain measurement variables, the smallest measured rain rate from any rain event was chosen as minimum and the 0.95-quantile was defined as maximum.For other measurement variables, the smallest measured value and the 0.99-quantile were chosen as minimum and maximum values respectively.Results for different numbers of thresholds are provided in Section 4. The lags were chosen as the past six timesteps, L p,v = {1, 2, 3, 4, 5, 6}, because using more lags did not yield an improvement with regard to the BIC.

Forecast Study Design
The proposed model's forecasting performance was evaluated on in-sample and out-of-sample data.In-sample evaluation indicates how well the model learns from the training data, while out-of-sample evaluation gauges its generalization to new data.The data set covering 3 years was split into an in-sample data set comprising two consecutive years and an out-of-sample data set for the subsequent year.This division ensured the maximum utilization of training data while still encompassing all annual seasons in the out-of-sample data.
In order to analyze the effects of water level, realized and future rain rate as well as of the individual model components described in Section 2.2, different model configurations were implemented and calibrated with the procedure described in Section 2.3.In addition to the model's overall prediction performance over the complete data set, an analysis for different inflow situations was of interest with regard to the practical use cases described in Section 1.Hence, individual analysis was performed for subsets of the data representing dry weather periods and the rise, high and decline phases after rain events.

Scoring Rules
An appropriate choice of scoring rules with respect to the studied forecasting problem is important to quantify the prediction performance with respect to the realized (truth) values (Ziel & Berk, 2019).The probabilistic forecasts 10.1029/2022WR033826 9 of 21 generated by the proposed model can be reduced to point forecasts by calculating the mean  ŷ ∈ ℝ  for all steps of the forecast horizon H, with t denoting the time of the first forecast horizon step, so that the commonly used (Bjerregård et al., 2021) scoring rules root-mean-square error (RMSE) and mean absolute error (MAE) can be utilized with respect to the truth    ∈ ℝ  .Here, N is the number of evaluated datapoints.
However, while those point forecast scores provide good interpretability, the information on the predictive distribution is discarded.Here, probabilistic scoring rules are available in order to quantify the predicted distribution's correspondence to the unknown distribution of the underlying data generating process.We used the energy score (Gneiting & Raftery, 2007) as a probabilistic scoring rule.The energy score is a strictly proper scoring rule that can be estimated from M independently sampled trajectories  ỹ of the inflow over the forecast horizon with: where β = 1 is chosen in accordance to common literature (Ziel & Berk, 2019).

Statistical Significance Test
We employed the Diebold-Mariano (DM) test (Mariano, 2002) to assess the statistical significance of a forecast A exhibiting better accuracy than forecast B. The DM test is applicable to the present multivariate setting with possible dependency structure between the individual forecast steps (Ziel & Berk, 2019).The DM test resembles an asymptotic test against the null-hypothesis that the mean score difference  Δ is zero with test statistic with the estimated standard deviation    .The results from the forecast study were tested with a significance level of 5%.

Sampling
The probabilistic predictions were generated by sampling M = 80 trajectories for each time t representing the last measurement time before the respective forecast horizon.Each trajectory was constructed using Monte Carlo (Wasserman, 2004) simulation by sampling each step h of the forecast horizon consecutively.For each forecast step, the sampled values of the trajectory's previous steps were used as inflow measurement variables for lags l < h referring to future and hence unobserved values.As a result, the constructed samples represented the inflow's autoregressive dependency structure throughout the forecast horizon.

Benchmarks
In order to benchmark the proposed model's performance, two benchmark models were implemented: First, an autoregressive process with exogenous regressors and seasonal effects (Hamilton, 2020) (SARX) was implemented as a linear point-forecast model that is widely used in time-series analysis.Second, a distributional ANN (Marcjasz et al., 2023) was used as black-box benchmark model trading computational efficiency and interpretability for the ability to learn highly non-linear relationships from the data (Yu et al., 2019).For all benchmarks, the lags L = {1, …, 6} were chosen in order to condition the prediction on a history window equivalent to the proposed model.Also, the future rain oracles as well as the water level measurements were used as exogenous regressors, so that the same information was utilized as with the proposed model.As a linear model, the SARX model is easy to interpret and shows low computational demand.The model was implemented according to the formulation used by the statsmodels (Seabold & Perktold, 2010) python library: In Equation 18,   [w,  − ] refers the inflow (represented by feature index w) at lag l,   [w,  − 1] is the last measurement of feature v from the set V of exogenous regressor indices and d s is a seasonal dummy for diurnal seasonality.Probabilistic samples were generated for each step in the forecast horizon by sampling M = 80 samples from a normal distribution with the point estimate used as mean and the in-sample residuals' standard deviation.
In the distributional ANN model, an LSTM was used to predict the unconstrained γ, ξ, λ parameters of the predictive JSU distribution.The architecture of an LSTM enables selective memorization and forgetting of information throughout lagged data, making it suitable for modeling multivariate time series data that exhibits non-linear relationships (Hochreiter & Schmidhuber, 1997).The PDF parameter δ was set as a trainable constant like for our model as described in Section 2.3.1.For modeling of daily, weekly and annual seasonal patterns, time information was encoded as features using both, sine and cosine functions with position in day, week and year as phase.Hyperparameter tuning for the LSTM model's dropout rate and unit count was performed using the Optuna (Akiba et al., 2019) Python package.

Data
The data set used covered the period from 1 January 2017 to 31 December 2019 and consisted of the following subsets.All data was available with a frequency of 15 min except radar raster data with 5 min.

WTP inflow measurement (inflow)
2. Local rain rate measurement from one station located near the WTP within the sewage water catchment area of Weiden (rain rate) 3. Water level measurements from four locations spatially distributed throughout the sewer system (water level), including the tank sewer located near the WTP 4. Rainfall radar raster images covering Germany, provided by the DWD opendata service (Wetterdienst, 2022)

Sewer System and Rain Measurement Data
Measured inflow minimum, maximum, average and standard deviations were 28.2, 825.7, 177.8, and 151.1 L/s.During dry weather periods, water level and inflow rate exhibited pronounced diurnal and weekly seasonal patterns.In addition, annual seasonality was observed.Diurnal, weekly and annual seasonalities are shown in Figure 3.
Figure 4a illustrates the system dynamics after strong rain events: due to the short catchment response time, the surface runoff dynamics and the broad spatial distribution of the rain event, inflow starts to rise steeply with a short delay of 15-30 min after rainfall onset.Simultaneously, buffering within the network occurs.The water level in a buffering element at a remote location within the network starts to rise immediately, whereas the water level in the tank sewer near the WTP starts to rise later.A proportion of the water from the quickly depleting remote buffering elements arrives at the tank sewer, causing a longer phase of water level increase until depletion starts.Hence, after the hydraulic capacity limit at the WTP has been reached, the inflow remains approximately constant until the buffering elements are depleted and the inflow then decreases quickly toward dry weather levels.
Figures 4b and 4c show inflow measurements plotted against past water level data from a remote location within the network with lags of 15 and 75 min for each, dry weather periods and rise.Two important system properties explained in Section 2.1 are emergent: first, a non-linear relationship between water level and inflow is observed especially after strong rain events due to limited hydraulic capacity, buffering and complex flow dynamics.The non-linear effects are less pronounced in the dry weather regime, indicating that this regime can be modeled well with a linear regression model.Second, the inflow measurement correlates well with the water level from 75 min before during dry weather periods.This indicates that the water level measurements might provide information on future inflow due to the long catchment response time for relatively large forecast horizons.After rain events, in contrast, a pronounced relation is observed for the 15-min lag, but much less for 75 min, indicating the effects of buffering and regime change in flow dynamics with short catchment response time. .Example strong rain event followed by phases of rising (blue background), constant (green background) and declining (red background) inflow with corresponding response of measurement variables (a).Inflow versus a level measurement from within the sewer network from 15 to 75 min in the past for dry weather (b) and after strong rain events (c).Some non-linearity can be observed particularly after rain events.The shown measurements in (b) and (c) cover a period from 1 January 2017 to 31 December 2019.

Rain Forecast Data
Historical rain radar forecast data with sufficiently high frequency of at least 15 min, as required for model training and calibration, was not available.Hence, radar forecasts have been constructed in analogy to the DWD RADVOR product (Winterrath & Rosenow, 2014): The dense optical flow between subsequent raster images has been estimated using the Robust Local Optical Flow (RLOF) (Senst et al., 2012) algorithm.The optical flow was then used to interpolate the 5-min frequency data to 1-min resolution in order to prevent fencing artifacts.The resulting raster images were temporally averaged to 15-min frequency.For the resulting 15-min averaged raster images, RLOF was calculated between subsequent images.The resulting optical flow was then used to extrapolate the 15-min raster images to forecasts with 15-min frequency with a forecast horizon of 2.5 hr.From the constructed raster image forecasts, a time series was generated for each forecasting step by spatial aggregation over the network's catchment area for each image of the corresponding forecasting step.The absolute physical dimension of the rain rate was not preserved throughout the process, which is, however, not relevant for the machine learning inflow forecasting model.

Preprocessing
The water level data exhibited frequent sharp local minima, indicating invalid sensor data.For each measurement station, an individual threshold was defined by visual examination of the data.Datapoints below the given thresholds were replaced by the last valid measurement.In addition, prolonged periods of constant values were present in water level data.According to personal communication, those datapoints represented missing sensor data.The corresponding datapoints were marked as invalid and were excluded from this research.Rain rate data was clipped by a threshold of 0.085 mm/min because our experiments indicated that very high rain rate values did not provide additional information due to the system's limited hydraulic capacity, while causing a reduction of model stability by constituting a sparsely populated extreme data regime.
Due to daylight saving time, a 1-hr shift in diurnal inflow patterns was observed during summer time.One hour of interpolated measurements was inserted at the beginning of summer time and 1 hr of data was removed at the end of summer time in order to achieve consistency with respect to the UTC time for straight-forward modeling of the seasonal patterns.

Results
In Section 2, we proposed a machine learning model for probabilistic short-term WTP inflow rate prediction utilizing rain forecast data and water level measurements from the sewer network.In this section, our results of the forecast study described in Section 2.4 are provided and discussed.In Section 4.1, the proposed model's ability to learn the data dependency structure arising from the system properties is examined without the further inaccuracy added by the uncertainty of rain forecasts by using historical rain values as perfect (oracle) rain predictions.The influence of rain forecast inaccuracy in a real-world scenario is then investigated in Section 4.2 using historical rain forecasts constructed according to Section 3.2.Our analysis primarily emphasizes out-of-sample scores in order to analyze the performance in a real-world scenario.The corresponding in-sample results can be found in Tables S2 and S4 in Supporting Information S1.As we focus our analysis on the situations of dry weather, rise and decline, the scores for high phase periods following rain events are provided in Tables S1 and S3 in Supporting Information S1.Hence, score values mentioned in this section refer to out-of-sample values.Relative improvements refer to the energy score.

Forecast Accuracy With Rain Oracle
Scores over all data and the subsets dry weather, rise and decline are provided in Table 1.The results indicate that the inflow during dry weather conditions can be forecasted accurately by a simple model configuration because the process is dominated by seasonal effects.In case of rain events, however, the approximation of non-linear effects by splines and the utilization of future rain as well as network water level data yielded a significant enhancement of prediction quality.In the following, the effects of the different model design aspects described in Section 2.2 are discussed for dry weather periods and the rise, high and decline phases after rain events.
The energy score for dry weather periods without approximation of non-linear effects, using only the lagged inflow and seasonal terms as regressors, is 37.46 L/s.However, the prediction accuracy for rise after rain events is low with an energy score of 527.11L/s due to the lack of information on the effects of future and past rain.
Utilization of water level data resulted in only a small improvement of 8.28% for rising inflow after a rain event, but a larger improvement of 19.16% for all data, indicating that the catchment response time after strong rainfall events became so short that the level measurements did not provide information on imminent inflow rise events.A considerable improvement of 21.88% is found for the decline phase.For the decline phase, we have identified the water level measurement from the tank sewer located before the WTP as most important level measurement.Its water level provided direct information on the buffering state, as anticipated in Section 2.1.A large improvement by 26.67% is observed for dry weather periods, indicating that exogenous level measurements did indeed provide useable information on future inflow within the catchment response time.In Figure 5, we present an example illustrating how level measurements facilitated the prediction of a transient increase in inflow.The observed spike was not preceded by a measured rain event.Without utilization of the water level data, the spike was not predicted.With the water level data, the model successfully leveraged the available information to predict this spike.We assume a low-intensity rain event that was not measured by the rain measurement station due to it's small size as a possible explanation for this spike.For the successful prediction of the rise phase after stronger rain events, our results show that information on future rain is mandatory.For this phase, incorporation of measured rain data yielded an improvement over the model with inflow and water level measurements of 17.62%.A further improvement by 29.43% was achieved using future rain information.Still, high forecast inaccuracy is observed during the transition from the steep inflow increase after a rain event to the high inflow phase when the hydraulic capacity limit is reached.In this regime, the inflow was often under-or overestimated, depending on the temporal structure of the associated rain event.A visual example is depicted in Figure 5.The incorporation of linear splines for the approximation of non-linear behavior resulted in an improvement over all data of 16.49% for 3 thresholds per measurement variable and a further improvement of 8.29% for 15 thresholds.For the rise phase after rain events, the improvements are 21.82% and 14.76%.The aforementioned overestimation and fluctuations in predicted inflow after inflow rise are reduced, indicating that the non-linear effects were approximated well by the proposed model, as shown in Figure 5.
Figure 6 shows the residuals as a function of a water level measurement with and without linear splines.For high water levels, overestimation of future inflow is present without linear splines, indicating that the non-linear relationship shown in Figure 6 was not modeled well.In contrast, this regime is represented well when linear splines were used.Furthermore, Figure 6 illustrates heteroscedasticity in the data and hence the expedience of conditioning the full predictive distribution on the features.There is also an improvement of 12.51% and 10.26% for dry weather periods.Here, it is important to note that this does not necessarily indicate that pronounced non-linear effects were present in this regime.Without linear splines, if a feature exhibits linear and non-linear relationships with the inflow for different regimes, accuracy in linear regimes is sacrificed for performance over the whole data range.
Both, two-way interaction terms of rain with inflow and cumulation of rain over the last 1.5 hr did not yield an improvement of prediction accuracy if studied separately.However, utilization of both, where the cumulated rain was used for interaction with inflow, resulted in a considerable improvement for rise after rain events by 14.83% and also for inflow decline after the following phase near the hydraulic capacity limit by 7.40%.The behavior of different model configurations described above with respect to energy score is also found for the RMSE and MAE.This indicates that the proposed model was capable to quantify the forecast uncertainty reasonably well.
The proposed model without linear splines outperformed the SARX benchmark model that used the same inputs by 32.01%.It is assumed that the proposed model's advantage originates from the modeling of weekly and annual seasonality and in particular from conditioning the full predictive distribution on the data, as the RMSE difference of 14.10% is significantly smaller.The best-fit configuration of the proposed model outperformed the SARX by 51.88% for all data and 53.49% for rise after rain events due to the effects described above.
The LSTM benchmark model performed similar to the proposed model regarding both, point forecast and probabilistic scores.Over all data, the proposed model achieved a small, but statistically significant improvement of 2.94% over the LSTM.The LSTM results show superior performance for rise after rain events by 6.19%, indicating that the LSTM is suited well for modeling the strongly non-linear dependency structures as well as possibly complex interactions.

Forecast Accuracy With Rain Forecasts
The forecast study results for the models with forecasts used for training and prediction are discussed in the following.Out-of-sample scores over all data and the subsets dry weather, rise and decline are provided in Table 2.The patterns found in the results with oracles, as described in Section 4.1, are also evident in the results with rain-forecasts.However, the forecast accuracy of our best-fit model with rain forecasts is reduced significantly by 26.66% in comparison to our best-fit model with rain oracles.The inferior performance is primarily found for the phases after a rain event, indicating that the rain forecast accuracy can be identified as the limiting factor here.However, a minor increase in RMSE by 6.39% is also found for dry weather, which may also be attributed to the rain forecast accuracy, as false rain events are predicted in some situations.Figure 7 shows a comparison of rain forecasts using rain oracles and rain forecasts for an exemplary rain event, illustrating the impact of rain forecast inaccuracy for a real-world use case.Furthermore, the impact of the rain forecast error on the energy score and on the prediction of the inflow rise starting time after rain events is visualized as a function of forecast time step.Initially, the energy score disadvantage and the rain forecast correlation display short-term fluctuations.These are likely influenced by finite-sample effects and model-specific properties, such as the limited ability to predict convective cloud behavior over extended periods and the limited spatial resolution of the rain forecasts.Furthermore, our model leverages short-term information from level measurements, which may contribute to a slower increase in the energy score disadvantage for short forecast horizons where rain information is less critical.Indications of asymptotic behavior in both the energy score disadvantage and the rain forecast correlation are present for forecast horizons exceeding 2 hr.This observation aligns with our expectation that the convergence of what then becomes substantial rain forecast uncertainty will propagate to the energy score disadvantage.The RMSE of inflow rise start times is significantly lower than that of rain onset times, except for long forecast times.This discrepancy may suggest that the model has learned to offset some of the errors in the rain forecasts.Also, information from level measurements within the network can be leveraged, particularly for short forecast times, thereby enhancing the predictive accuracy.
Considering the benchmarks, the LSTM results show a more pronounced advantage over the GAMLSS for later steps in the forecast horizon than with rain oracles, whereas the GAMLSS still performed better for short forecast   horizons.As the rain forecasts were constructed as time series from radar raster forecast, they exhibited uncertainty in the temporal domain as well as in magnitude of rain rate.Thus, we suggest that the LSTM was able to extract more predictive information from the relations in the temporal domain and the magnitude of the rain forecasts, as forecasts for the full forecast horizon were presented as features to the inflow forecast models.Furthermore, whereas the GAMLSS and LSTM showed far superior performance to the SARX model for rain oracles, the relative performance difference between the GAMLSS and LSTM in the one hand and the SARX in the other hand is smaller.This supports the finding that the rain forecast accuracy represented the main limiting factor for the inflow forecast accuracy.

Discussion
We implemented and evaluated a machine learning model for probabilistic intraday inflow prediction utilizing water level measurements from within the network, rain measurements and future rain as exogenous variables.
The model was specified with respect to dominant properties of sewer networks, with a focus on non-linear effects arising from situation-dependent changes of flow dynamics like flow-dependent catchment response time and buffering of water within the network.In this particular example, prediction using rain forecasts is relatively accurate, as the onset time of inflow rise is predicted well, but prediction is still significantly worse than with rain oracles.(c) Depicts the average energy score disadvantage when using rain forecasts versus rain oracles for all rain events (red), alongside the Pearson correlation coefficient between forecasted and measured rain rates (blue), as functions of forecast time.Larger negative values indicate a larger disadvantage.(d) Shows the root-mean-square error (RMSE) of predicted wastewater treatment plants (WTP) inflow rise start times after rain events, using rain forecasts (red), and the RMSE of rain onset times in the rain forecasts, compared to actual measurements (blue), both as functions of forecast time.
Our results show that the incorporation of exogenous variables provides a significant benefit for the prediction of inflow.In particular, reliable information on future rain is mandatory to achieve a prediction of the dynamics found for strong rain events with sufficient accuracy for practical use.Also, data from sensors spatially distributed throughout the network can provide valuable information to understand and predict the system behavior.However, our results indicate that non-linear effects prevent accurate inflow prediction using a basic linear time series model like the evaluated SARX benchmark model.Using approximation via linear splines, those relationships were represented successfully while still resembling a linear model for the unconstrained PDF parameters, providing interpretability and intelligibility.
Our model generates probabilistic forecasts, providing full information on the forecasted probability distribution of future inflow.Because the predictive distribution's individual parameters are conditioned on the input data, situation-dependent changes in forecast uncertainty are captured in the generated predictions.This is especially beneficial regarding the different flow dynamics of dry weather and rain events.Resembling a machine learning method, the proposed model exhibits good adaptability properties as it does not rely on extensive specification of expert knowledge on the target system.Although knowledge of some of the studied system's key properties like the watershed lag time may guide the a priori choice of included measurement variables and model hyperparameters like the lags and thresholds, feature selection and regularization using LASSO relaxes the demand for exact specification.Furthermore while the model has been designed considering physical network properties, the resulting inclusion of linear splines as a modeling technique offers versatility in capturing other non-linear relationships.Therefore, incorporating additional input variables into the model is a straightforward process through retraining.
Our results using rain oracles suggest that the proposed model as well as the benchmark probabilistic LSTM effectively handle system dynamics, making them viable for use in practical applications like WTP MPC.However, high accuracy of rain forecast data is crucial to exploit the potential of the studied machine learning methods.
Here, the probabilistic LSTM showed superior performance for long forecast horizons if our inaccurate baseline rain forecasts were used instead of the perfect rain oracles.This comes at the price of reduced interpretability and intelligibility, though, and the proposed GAMLSS showed marginally better accuracy for short forecast horizons.We identify the following further limitations of our methodology: 1.As the model is trained on historical data and learns diurnal, weekly and annual seasonality, our methodology assumes that the system operates under climatic stationarity.While our results, in line with the literature, suggest that a data-driven model can be trained on relatively short historic time horizons compared to the gradual changes caused by climate change, the incorporation of online learning and transfer learning methodology may be beneficial to adapt for short-and mid-term changes of climate conditions.2. Being trained on historical data, a central assumption of our methodology is that the structural and hydraulic properties of the sewer network remain constant throughout the training data horizon and during application.This poses a strong assumption and is, aside from the aforementioned benefit from online learning and transfer learning, a good example why an integration of our data-driven method with hydrodynamical models is an interesting perspective.For instance, fast surrogate hydrodynamic models may be combined with the data-driven method using forecast combination techniques.Moreover, hydrodynamic models of suitable network areas may be employed as input to the data-driven model.3.Although both, the LSTM and our GAMLSS exhibited superior performance to the SARX even when rain forecasts were used as inputs, we consider the handling of the complex information from inaccurate rain forecasts for long forecast horizons a limitation of our model and propose further tuning with respect to the used rain forecasting method's properties and the model's response as a subject of further research.4. We used data from a single rain gauge and the rain radar was aggregated to a single input variable.Hence, the dynamics arising from the spatial distribution of rain events were not leveraged.Whereas the analyzed network was small, spatially resolved measured rain rates and rain forecast information may represent an interesting aspect of future research, especially for wastewater sewer networks with large spatial extent.Here, local features of rain events as well as longer catchment response time may provide additional information that can be employed as further regressors.
In addition, the following aspects constitute interesting topics of future research: 1. Probabilistic rain forecast information (Sønderby et al., 2020) hold strong potential as input quantity of probabilistic inflow models because information on the rain forecast predictive distribution can be propagated.
2. While this research focused on the prediction of inflow rates, the adaption of our model for forecasting also pollutant load, which represents a further relevant input for WTP optimization methodology, is of interest.3. Incorporating physical models, such as rainfall-runoff models, as model inputs may be beneficial for explicitly modeling system dynamics where possible and desirable.In conjunction with the aforementioned spatially resolved rain data, this can reduce the complexity of the relationships that our model must approximate.Furthermore, this may be advantageous with regard to the previous point on climate change, as the dynamics within the sewer network do not change significantly with increasing rainfall strength once the network's hydraulic capacity limit is reached.Therefore, changes in rainfall-runoff dynamics might dominate the deviations from the learned dynamics.4. Within the scope of our study, data and resources for building a well-calibrated hydrodynamic benchmark model were not accessible.We consider situations where the adaptation of hydrodynamic models for a particular sewer network is not feasible or desired as well as the combination of hydrodynamic and data-driven models as primary use cases for data-driven models.Nevertheless, benchmarking data-driven models, such as ours, against a hydrodynamic model can be of interest.As our results demonstrate, the prediction uncertainty of our data-driven inflow forecast model and LSTM is primarily driven by the inaccuracy of rain forecasts.
Here, it is an interesting research question whether data-driven methods show an advantage in adapting to and compensating for some uncertainties or suboptimal calibration of rain forecasts.
As a conclusion, our results show that exogenous information on future rain and the network's hydrodynamic state is crucial for inflow prediction with a focus on rain events and the information can be utilized within an intelligible probabilistic machine learning forecast model suitable for practical implementation.However, accurate probabilistic rain forecasts with suitably short runtimes are required for successful application.and Sven Frank from Stadtentwässerungsbetriebe Köln for their technic support, data set provision, and sharing of domain knowledge.Additionally, we thank Mark Oelmann of Hochschule Ruhr West and MOcons GmbH & Co. KG for his valuable assistance in coordinating our research project.Open Access funding enabled and organized by Projekt DEAL.

Figure 1 .
Figure 1.Schematization of the analyzed sewer network, highlighting water level sensors (orange) and the rain gauge (blue) employed as exogenous regressors in our model.

Figure 2 .
Figure2.Illustration of linear regression without (a) and with (b) approximation of non-linear relationship by linear splines using five clipped versions of the explanatory variable with evenly spaced thresholds as regressors.The shown data is the measured wastewater treatment plants (WTP) inflow versus a water level measurement from within the sewer network from 15 min in the past that is also depicted in Figure4.

Figure 3 .
Figure 3.Diurnal median inflow profile for weekly (a) and annual (b) pattern.

Figure 4
Figure 4. Example strong rain event followed by phases of rising (blue background), constant (green background) and declining (red background) inflow with corresponding response of measurement variables (a).Inflow versus a level measurement from within the sewer network from 15 to 75 min in the past for dry weather (b) and after strong rain events (c).Some non-linearity can be observed particularly after rain events.The shown measurements in (b) and (c) cover a period from 1 January 2017 to 31 December 2019.

Figure 6 .
Figure 6.Inflow forecast error versus level measurement from within the network.The best-fit-configuration (a) does not exhibit the overestimation of inflow for high water levels found in the predictions of a model configuration without approximation of non-linear effects by linear splines (b).

Figure 7 .
Figure 7. Probabilistic forecast of inflow after a rain event by the proposed model with (a) rain oracles and (b) rain forecasts.In this particular example, prediction using rain forecasts is relatively accurate, as the onset time of inflow rise is predicted well, but prediction is still significantly worse than with rain oracles.(c) Depicts the average energy score disadvantage when using rain forecasts versus rain oracles for all rain events (red), alongside the Pearson correlation coefficient between forecasted and measured rain rates (blue), as functions of forecast time.Larger negative values indicate a larger disadvantage.(d) Shows the root-mean-square error (RMSE) of predicted wastewater treatment plants (WTP) inflow rise start times after rain events, using rain forecasts (red), and the RMSE of rain onset times in the rain forecasts, compared to actual measurements (blue), both as functions of forecast time.

Table 1
Out-Of-Sample Scores for Different Configurations of the Proposed Model With Rain Oracles

Table 1 Continued
Note.N τ,W , N τ,n , N τ,r , N τ,i : Number of linear splines for inflow, network, rain and interactions.R f , R c : Whether rain forecasts and rain cumulation are used.Bold and italic: Best scores of all models and proposed model configurations if statistical significance is given.Figure 5. Probabilistic inflow forecast by the proposed model with (a) and without (b) utilization of level measurements from within the network.No rain event was measured before the observed inflow spike.(c) and (d) Show a probabilistic inflow forecast by the proposed model for a strong rain event, with and without the approximation of non-linear effects by linear splines, respectively.

Table 2
Out-Of-Sample Scores for Different Configurations of the Proposed Model With Rain Forecasts