Single-Site Forecasts for 130 Photovoltaic Systems at Distribution System Operator Level Using a Hybrid-Physical Approach to Improve Grid Integration and Enable Future Smart-Grid Operation

With rising shares of photovoltaic (PV) electricity in the power grids, PV power forecasting is rapidly gaining importance for energy suppliers and grid operators. Distribution system operators (DSOs) in particular require detailed forecasts at high spatial resolution to manage grid congestion and avoid voltage band violations, which are local phenomena. In view of future smart grid operation, the authors propose a forecasting scheme that uses numerical weather prediction (NWP) data to feed a PV performance model and compute single-site power forecasts. The intraday outputs of this rather conventional forecasting scheme are combined with a supervised-learning model based on recurrent neural networks (type: long short-term memory), which is fed with the 15 min measurement data of the two previous days for each PV system. This approach accounts for local effects not covered by the NWP-based physical model, such as partial shading or snow cover. Furthermore, it mitigates the impact of "imperfect" data on the side of the DSOs, used to parametrize the PV performance model. This hybrid-physical approach has been applied to about 130 PV systems, validated over a 2 year period, and yielded a normalized root-mean-square error of 8.3% in 2020 and 8.7% in 2021.

official database in place (in the EU, US, Australia, and Asia), only two contain information on the orientation (Germany and Austria), and only Germany includes information on the inclination of the mounted surface.
Although simplified models can still yield decent forecast performance on aggregated time frames, [7] the limited technical information on the systems, together with the need for accurate PV power forecasts at high spatial (single-site) and temporal resolution (15 min is common for grid operators and utilities), creates a challenging setting.
Apart from these challenges, physics-based forecasting models do not account for malfunctions of the system or local effects on the irradiance, such as partial shading or snow cover, which can be an argument for data-driven methods.
In the following section, our forecasting approach is positioned relative to the state of the art, and its novelty is outlined. Section 3 presents the forecasting methodology, the metrics for the performance evaluation, and the use case to which the method is applied. The results are described in Section 4, first explaining the overall outcomes of the initial method and then focusing on the improvements gained by adding a postprocessing method and its specific advantages regarding the shortcomings of the previous approach. We finish this article by drawing conclusions from the presented results (Section 5) and giving a brief outlook on further developments in Section 6.

State of the Art and Novelty of the Presented Approach
Although solar irradiance and solar power forecasting has been a well-recognized topic for more than a decade, [8,9] the number of publications contributing to this research field has grown exponentially over the past few years. [10] To position our approach appropriately within this versatile research field, we limit ourselves to our targeted field of application: considering the specific challenges and needs of grid operators, and especially DSOs, our forecast method is dedicated to single-site forecasts on the ID and DA forecast horizons, and builds upon the limited data availability of this stakeholder group, to which we refer in the following section.
As stated in Blaga et al., [10] solar power forecasting consists of two main tasks: 1) forecasting the solar irradiance; 2) modeling the conversion of irradiance to electric power output along the whole conversion chain of a PV system. Considering the already high level of maturity of PV performance models (provided the data on the PV systems are sufficiently available), the irradiance prediction appears to be the decisive factor for accurate solar power forecasting.
The approaches to solar irradiance forecasting described in the literature can mainly be categorized into: 1) statistical models (including persistence forecasts); 2) machine learning (ML) models; 3) NWP models; 4) cloud-motion-based approaches (using either all-sky imagers or satellite images); 5) combinations of the aforementioned, referred to as hybrid models.
The performance of the respective model type and its benefits compared with the other approaches depend strongly on the forecast horizon. For the application described in Section 1.1, the most relevant horizons are ID and DA, which we define in our study as follows (deviating slightly from the common definitions): 1) ID forecasts: power forecasts for the same day, computed once per day in the early morning (as soon as NWP data are issued), covering 24 h (00:15-24:00); 2) DA forecasts: power forecasts for the following day, covering 24 h (00:15-24:00 of the following day).
In the literature, ID forecasts are commonly defined as 1-6 h ahead, [8] often computed including data (e.g., measurements) from the same day. This is not the case in our approach, where the NWP forecasts are computed based on previous-day observations, and only measurement data of the previous day(s) are used by the data-driven part of our model. Hence, when comparing our approach on the ID and DA level to state-of-the-art studies, we refer more to DA forecast models.
Analyzing irradiance forecast methods for the DA forecast horizon, NWP models are the most common [8] and often outperform the other methods, especially when combined in a hybrid setting with ML or statistical postprocessing methods, as recommended by Blaga et al. [10] On the 6 h to DA forecast horizon as well, NWP-based methods are still predominant, [8] although the lack of comparability and the rich variety of approaches and study conditions make it difficult to give clear recommendations.
The previously described benefits of NWP-driven forecasting schemes, the requirements of future DSOs, and their data availability led to the choice of our forecast concept (see Section 3), which can be characterized as follows. As described in the study by Yang, [11] authors should position their work with regard to the state of the art by classifying 1) the methods used and 2) the characteristics of the forecasts. Regarding the methods (1), we use NWP data to drive a PV performance model and calculate single-site PV power forecasts in 15 min resolution, from 0 to 72 h ahead. The output is enhanced by an ML model, which works with measurement data of the previous days from the individual PV systems. Thus, the full approach represents a hybrid model, using an NWP/PV performance model with ML-based postprocessing. Concerning the characteristics (2) of the forecast scheme, the single-site forecasts of the NWP-driven PV performance model are deterministic on this level. Once the model output is fed into the ML-based model, in addition to other inputs such as measurement data, the supervised learning algorithm yields a probabilistic single-site power forecast.
The combination of methods joining a statistical or ML method with a physical model, such as in our case, is called the hybrid-physical approach. [8] However, the examples in the literature differ substantially in which part of the forecasting is assigned to the statistical or ML method and which to the physical model: some use statistical or ML methods in time domains where NWP models are weak(er), such as intra-hour or a few hours ahead. Others try to compensate for local effects, which would otherwise be lost due to the coarse spatial resolution of the NWP models. Larson et al. [12] use NWP data from two models and model output statistics (MOS) to correct for shortcomings in the outputs of one of the NWP models and compute global horizontal irradiance. Others compute PV power directly via statistical, data-driven models, but the representation of the physical model seems to be limited to very specific aspects. [13] In Leva et al., [14] NWP data and measurements are used to train an artificial neural network to perform DA hourly power output predictions of a single PV plant, without any physical models involved.
Our approach differs from those: it focuses on the PV performance model, fed by NWP forecasts, and uses a data-driven approach to correct for local effects and imperfect data. Such imprecise data can cause shortcomings in the parametrization of the performance model, due to a lacking degree of detail on the individual systems. Local effects influencing the in-plane irradiance and causing deviations from the NWP irradiance predictions can be shading and snow cover.
Regarding the choice of the data-driven model, which should be able to correct for those shortcomings in data quality, or to identify and compensate for the effects on the local, in-plane irradiance, we were looking for a time-series-specific method. Recurrent neural network architectures such as the long short-term memory (LSTM) [15] and the gated recurrent unit (GRU) [16] have become the standard approach to model sequence data, such as environmental [17] and retail [18] time series. For instance, Abdel-Nasser and Mahmoud directly model PV time series using an LSTM model and perform one hour ahead forecasts. [17] Distinct models have to be learned for each site to which they apply the model. Our approach bears similarity to the study by Abdel-Nasser and Mahmoud [17] in that it involves direct modeling of PV power time series, but we are able to build upon NWP forecasts by entering them as covariates in an encoder-decoder LSTM model architecture. Also, our approach outputs probabilistic ID forecasts: in this article, median results are reported, but any set of quantiles can be output instead, possibly accounting for the confidence of the forecasts in a smart grid system. In the study by Wang et al., [19] the seasonal effects typically witnessed in PV power data are modeled using time correlation features and partial daily pattern prediction, whereas the LSTM model is applied separately to scale-free data; they focus on forecasts aggregated at a daily scale. Solar irradiance forecasting has also been addressed in seminal works, [20] but using diagonal recurrent neural networks, superseded by the LSTM and the GRU since then.

Methodology and Use Case
The forecast scheme described here uses an NWP model (from the European Centre for Medium-Range Weather Forecasts [ECMWF]) feeding a PV performance model [7] to compute single-site forecasts of roughly 130 PV systems in the distribution grid of a small DSO in Luxembourg. The NWP forecasts (horizontal irradiance and ambient temperature), originally in hourly time resolution, are interpolated to yield 15 min values. This is necessary to be in line with the resolution of the measurement values and represents the common temporal resolution on the ID and DA level in the utility sector and grid operation. The forecast is computed and issued once per day in the early morning, as soon as the NWP data are available, and covers 72 h ahead, starting from midnight of the same day. The 72 h single-site forecasts are split into three forecast horizons: ID (0-24 h ahead), DA (24-48 h), and 2 days ahead (48-72 h). The overall scheme of the forecasting model, representing the part presented in this publication and further described in the following, is depicted in Figure 1.
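The hourly-to-15 min interpolation step can be sketched as follows; the irradiance values and the 6 h window are illustrative, not taken from the ECMWF data used in this work:

```python
import numpy as np

# Illustrative hourly NWP forecast values (global horizontal irradiance, W/m^2)
hours = np.arange(6.0)                                  # lead time in hours
ghi_hourly = np.array([0.0, 20.0, 150.0, 400.0, 620.0, 700.0])

# Target grid in 15 min steps, matching the smart-meter resolution
t_15min = np.arange(0.0, 5.0 + 1e-9, 0.25)

# Linear interpolation between the hourly NWP values
ghi_15min = np.interp(t_15min, hours, ghi_hourly)
```

The same interpolation is applied to the ambient temperature forecasts, so that both inputs of the PV performance model share the 15 min grid of the measurement data.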
The irradiance forecasts of NWP models, specifically the model of the ECMWF, deliver the best available performance to date. [21] But in addition to the uncertainties that are inherent to forecasts in general, there are local effects that might reduce the irradiance that reaches the module's surface in practice. Two simple examples are partial/temporal shading effects and snow cover.
On a technical level, the accuracy of a PV performance model is limited by the availability and accuracy of the data necessary to parametrize the model (nominal power, location, orientation, and inclination of the module surface, etc.). As stated before (Section 1.1), grid operators often lack exact information on some of those parameters. Furthermore, there might be technical malfunctions of PV systems, reducing their power, which cannot be represented in the PV performance model. But since these factors influence the performance of the PV system, they are inherently represented in the measurement time series of the PV systems. Hence, a suitable approach is to combine a physical PV performance model, fed by predictions from an NWP model, with an ML approach that corrects for the influences of the effects described earlier. The forecasting approach presented in this article uses a supervised learning algorithm based on the LSTM model.
The algorithm, known as DeepAR in the literature, was specifically developed to forecast time series of a similar nature by training a single model. [18] Originally designed for retail data, it copes with data spanning multiple orders of magnitude using a scale-free approach; indeed, the demand for various items covers multiple orders of magnitude. In the context of this article, this relates to PV systems with various nominal powers and production profiles. DeepAR allows them to be handled with a single model, with little to no per-site overhead in model size. Thus, contrary to the alternative approaches presented in Section 2, we can address a priori any number of systems with a single model. In its current form, the DeepAR model has been used to forecast ID time series for a full day for each PV system, computed in the early morning. The model works in an autoregressive fashion by encoding the measurement time series of the two previous days and recursively forecasting ID steps by reinjecting forecasts as input. In addition, it allows additional covariates to be input to increase the forecasting capacity of the model. For example, time-related features (e.g., hour of the day, day of the month, season) can be used, as well as custom features. In the context of this article, we use these time features as covariates, but also the NWP/PV performance model-based power forecasts as custom covariates, thus implementing the projected hybrid-physical approach. DeepAR models the series of expected PV power for a given system and uses the covariates to obtain improved forecasts. Regarding the application of this hybrid-physical approach by a DSO for active distribution grid operation, the approach has the advantage that, apart from the smart meters of the PV systems, no additional sensors or other hardware are required. Only smart meter measurements, NWP data, and the technical information on the PV systems are necessary.
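To make the autoregressive sampling mechanism concrete, the following toy sketch mimics the loop: a stand-in function (not the trained LSTM; its coefficients are invented purely for illustration) predicts the distribution of the next 15 min step from the previous value and the physical-model forecast used as a covariate, and each sample is reinjected recursively:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the trained network: predicts the distribution of the next
# step from the previous power value and a known covariate (the physical-model
# forecast). All coefficients are illustrative, not learned.
def predict_next(prev_power, nwp_forecast):
    mean = 0.3 * prev_power + 0.7 * nwp_forecast    # hypothetical mapping
    sigma = 0.05 * max(nwp_forecast, 1e-3)          # heteroscedastic noise
    return mean, sigma

nwp_covariates = np.array([0.0, 0.1, 0.4, 0.8, 1.0, 0.9])  # normalized power
n_paths, horizon = 100, len(nwp_covariates)
paths = np.zeros((n_paths, horizon))

for p in range(n_paths):
    prev = 0.0                                      # last observed value
    for t in range(horizon):
        mean, sigma = predict_next(prev, nwp_covariates[t])
        sample = max(rng.normal(mean, sigma), 0.0)  # PV power is non-negative
        paths[p, t] = sample
        prev = sample                               # reinject (autoregressive)

median_forecast = np.median(paths, axis=0)          # point forecast (median)
```

Any other quantile of `paths` can be reported instead of the median, which is how the probabilistic output described above is obtained.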
An additional part of the forecasting approach of our project is a sky-imager-based forecast, complementing the previously described scheme with an approach that can cover the short-term (intra-hour) time domain. This approach seems promising, since the area studied in this project is relatively small, and a set of all-sky imagers can oversee the region. However, this part of our hybrid approach is not discussed in this publication.

NWP-Data and PV Performance-Model-Based Forecast
The majority of forecasting models on the DA level and the "longer" ID level, such as 6 h ahead and more, use NWP data to some extent, either stand-alone or in some hybrid setting, which leads to the best results for those forecast horizons. [8][9][10] We used forecast data from the NWP model of the ECMWF, retrieving global horizontal irradiation and ambient temperature forecasts on a daily basis in hourly resolution. To obtain the same temporal resolution as our measurement data and to meet the preferences of the stakeholders in the energy business, the values have been interpolated to sub-hourly values, yielding 15 min resolved forecasts.
In its current setting, our model is applied to a set of about 130 PV systems in a rather close area (most of the systems being located within a 4 × 4 km area). Basic technical data of those systems, such as nominal power, location, orientation, and inclination, are available to model the output power of each system individually by means of our PV performance model.
As described by Koster et al., [7] we use a rather detailed PV performance model, which considers manifold aspects and losses of the PV system's production chain: transmission and reflection losses of the irradiance, temperature- and irradiance-dependent module efficiency, degradation and mismatch losses, ohmic losses in AC and DC cabling, inverter efficiency, etc. Most aspects of the model are documented in our previous publication under "Modelling of PV-reference systems." [7] Such a high degree of technical detail concerning inverter types, cable lengths, and cross-sections is not known by the DSO, TSO, or utility company, which forces us either to use very simplified models or to assume standard values for such parameters. In our previous article, we were able to demonstrate that the use of standardized parameters did not affect the accuracy substantially; hence, we use our detailed PV performance model, including standard values, also in our current work.
A large dataset covering 2 years (2020 and 2021) of 15 min resolved feed-in measurements for 119 of the 130 modeled PV systems is available. The datasets for 2020 showed some data gaps for a few systems, due to changes in the metering system, while the datasets for 2021 are almost complete. On the one hand, those datasets were used for the ML-enhanced forecast (training, validation, and testing) presented later, and on the other hand, for the first calibration step of the model, to reduce bias.
There are different options to set calibration factors or to apply statistical enhancement or even MOS to the outputs of the PV performance model. However, a) in view of the later applied ML-based forecast enhancement and b) to yield a good alignment of forecast and production curves, we decided on the following approach: optimizing the calibration factor for each system by minimizing the mean bias error (MBE) led to a minimal MBE over the year, but also to clear misalignments on clear-sky days for some systems. This can be caused by periods of "underperformance" of the real PV systems, e.g., by shading or temporary malfunctions, which lead to higher calibration factors. Hence, we chose a set of clear-sky days with smooth, bell-shaped forecast and measurement curves, and set the calibration factors by minimizing the difference between the maxima of the curves. This does not guarantee the minimum MBE over the course of a full year, but yields a good alignment of those curves. The attenuation of other temporary underperformance effects is then left to the ML-based enhancement in the following step.
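A minimal sketch of this clear-sky calibration, with invented curves for a single system and a single clear-sky day (in practice, several clear-sky days are evaluated):

```python
import numpy as np

# Hypothetical clear-sky day curves for one PV system (kW, 15 min steps)
forecast = np.array([0.0, 1.2, 3.5, 5.0, 4.8, 2.9, 0.5])
measured = np.array([0.0, 1.0, 3.1, 4.4, 4.2, 2.6, 0.4])

# Match the maxima of the bell-shaped curves instead of minimizing the MBE
# over the full year, which can be biased by periods of underperformance
# (shading, temporary malfunctions, etc.).
calibration_factor = measured.max() / forecast.max()
calibrated_forecast = calibration_factor * forecast
```

The resulting factor scales the physical-model output so that the clear-sky peaks align; remaining temporary underperformance is left to the ML step.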
The forecast routine presented in this article retrieves the NWP data every morning, once published by the ECMWF. At the same time, it downloads the measurement values for each system from the day before and computes the new solar power forecast for 72 h ahead, starting from 00:00 of the current day. Based on those forecast data, we distinguish the dataset for the current day, called ID in the following, the DA time frame, and the 2 days-ahead forecast horizon, reaching the end of the 72 h period.

ML-Based Approach, Using Recurrent Neural Networks-LSTM
The DeepAR model is an encoder-decoder model: [16] it uses a fixed-size context to encode a state vector and forecasts a prediction interval (i.e., the ID time slots) conditioned on this state vector and the covariates. In the present work, training and testing samples have the 48 previous hours of data as context and the 24 h to come as prediction interval, for a given PV system. We use 60% of the data for training, 20% for validation, and 20% as the test set. As, for a given day, the correlation between power profiles is large, splitting our experimental dataset randomly with respect to both PV system and forecasted day would lead to learning power profiles by heart, thereby heavily harming generalization capabilities in production. Instead, we divide our experimental dataset into training, validation, and test parts according to the day at hand, which prevents prediction intervals from being spilled over training, validation, and test sets. Also, we define a static temporal pattern for cutting the data, allowing all seasons to be represented evenly in the dataset cut. Furthermore, DeepAR forecasts the parameters of probability distributions (e.g., normal, binomial, chosen depending on the nature and distribution of the data). Empirical forecasting quantiles (if only a point forecast is needed, the median) can then be computed by taking sample paths: the LSTM is used to obtain independent sequences of rolling forecasts (100 in our experiments) by sampling from the forecasted distribution and recursively reinjecting the sample as input in an autoregressive fashion. In our experiments, we test the normal and the Student's t distribution. We also test a normal distribution truncated at 0 (as PV power values may not be negative), which, to our knowledge, has never been tested in the literature.
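The day-based split with a static seasonal pattern can be sketched as follows; the 5 day block pattern is one possible realization of a 60/20/20 split, assumed here for illustration:

```python
import numpy as np

n_days = 365
day_index = np.arange(n_days)

# Static repeating pattern over blocks of 5 days: 3 train, 1 validation,
# 1 test (60/20/20). Splitting by whole days keeps each 24 h prediction
# interval in exactly one subset, and the fixed pattern spreads all seasons
# evenly across the three subsets.
phase = day_index % 5
train_days = day_index[phase <= 2]
val_days = day_index[phase == 3]
test_days = day_index[phase == 4]
```

A purely random split over (system, day) pairs would place highly correlated profiles from the same day in both training and test sets, which is exactly what this scheme avoids.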
To implement the hybrid-physical approach, we used PV performance model-based power forecasts as covariates, as they are known beforehand for the time steps to be forecasted.
Furthermore, we involved features characterizing PV systems as additional covariates: the system-identification number as a categorical feature, and system specifications (i.e., orientation, inclination, nominal power, and calibration factor) as continuous features. We performed a large ablation study, which allowed us to evaluate the respective contribution of the technical elements described earlier.
In the end, the best results, presented in the Experimental Section of this article, are obtained using the proposed truncated normal distribution. Using the system-identification number or characterizing features also contributes significantly and favorably to forecasting performance, with the best results being obtained using the system identification as a categorical feature.
In this article, only the ID forecast has been enhanced by the ML approach; the models to adapt the forecasts on the DA and 2 days-ahead horizons as well have been trained but not yet evaluated.

Evaluation of Forecast Performance by the Introduced Performance Metrics
The evaluation of forecast performance is an essential part of solar forecasting, and versatile metrics are established in the literature. [9] In recent years, well-recognized authors have recommended good practices for the verification of the forecast qualities of models [22] and addressed the related challenges of comparing published results. [11] To evaluate a) the performance of the proposed forecasting approach and b) its improvement by combining different methods, and c) to be comparable to evaluations of forecasting approaches found in other articles, the following evaluation criteria have been chosen in analogy to previous studies. [9][10][11]22,23] The forecast performance and the improvements from adding a data-driven approach to the method are mainly assessed using the mean bias error (MBE), to identify systematic bias, and the root mean square error (RMSE), to evaluate the average deviation of the forecast from the measured value. The mean absolute error (MAE) is not given here, since it provides a similar evaluation to the RMSE, but the RMSE was considered more appropriate in view of our application to the power sector.
The mean bias error (MBE) is defined as

MBE = (1/N) Σ_{t=1}^{N} (f_t - x_t)    (1)

where N represents the number of data pairs, t the time stamp, f_t the forecast value at time t, and x_t the measured value at the same time. The MBE indicates whether the forecast has a systematic deviation toward over- or underestimation of the PV power generation. Since positive and negative deviations partially compensate for each other due to the aggregation over the time range, the MBE cannot be used to evaluate forecast performance alone, as it does not evaluate variance or correlation.
The RMSE is a common term in the evaluation of forecasting algorithms for solar irradiance [9] as well as for power forecasts from wind and solar systems. This metric evaluates the average difference between forecasted and measured values, which is also often assessed using the MAE, but the RMSE gives higher weight to larger deviations. Hence, the RMSE is considered suitable for power predictions in utility companies, since large errors are disproportionately problematic in those applications, as stated by Lorenz et al. [24] The root mean square error is defined as

RMSE = sqrt( (1/N) Σ_{t=1}^{N} (f_t - x_t)^2 )    (2)

where the definitions of N, f, and x correspond to those of the aforementioned MBE.
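Both metrics follow directly from their definitions; in this sketch the data pairs and the nominal power are invented, and dividing by the nominal power yields the normalized variants nMBE and nRMSE used later:

```python
import numpy as np

def mbe(forecast, measured):
    # Positive values indicate systematic overestimation of PV power
    return float(np.mean(forecast - measured))

def rmse(forecast, measured):
    # Gives higher weight to large deviations than the MAE would
    return float(np.sqrt(np.mean((forecast - measured) ** 2)))

# Invented sample pairs (kW); opposite deviations cancel out in the MBE
f = np.array([1.0, 2.0, 3.0])
x = np.array([1.5, 2.0, 2.5])

nominal_power = 5.0                     # kW, invented, for normalized metrics
nmbe = mbe(f, x) / nominal_power
nrmse = rmse(f, x) / nominal_power
```

The example illustrates why the MBE alone is insufficient: the two opposite deviations cancel, while the RMSE still reports the spread.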
A point raised in many previous research articles and reviews [10] is the effect of using day-time as well as night-time values in the evaluation. Since solar irradiance and solar power production are zero during night-time, the forecast is trivial and might bias the forecast evaluation toward lower average forecast deviations, although it might also have the opposite effect if normalization is applied. [25] Hence, in this publication, and in coherence with recommendations in the literature, we use day-time values only. In practice, all data pairs (x and f) where both values were zero have been sorted out and were not considered in the performance evaluation. For the probabilistic forecast values, a threshold needed to be defined, since those values never reach exactly zero.
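The night-time filtering can be sketched as follows; the series and the threshold value are illustrative:

```python
import numpy as np

f = np.array([0.0, 0.0, 0.4, 2.1, 0.01, 0.0])   # forecast (kW), invented
x = np.array([0.0, 0.0, 0.5, 2.0, 0.0, 0.0])    # measurement (kW), invented

# Keep a data pair if either value exceeds the threshold. For deterministic
# forecasts a threshold of 0 suffices; probabilistic samples never reach
# exactly zero, so a small positive threshold is needed.
threshold = 0.02
daylight = (f > threshold) | (x > threshold)
f_day, x_day = f[daylight], x[daylight]
```

Only the retained day-time pairs enter the MBE and RMSE computations.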
To ease the comparability of the metrics between the different PV systems, but also with other studies, normalization has been applied. Unfortunately, some resources do not clearly state to which reference the metrics have been normalized. [22] In solar forecasting, normalization often refers to an average value (sometimes to a maximum) of a certain period, which is appropriate for solar irradiance forecasting. Nevertheless, in solar power forecasting, and in view of the application in the utility and grid operator domain, normalization to the nominal power is considered more suitable.
The previously described metrics are used to assess the general forecast performance and the improvements from the enhancement through a data-driven model, and to estimate variations in forecast accuracy under different circumstances: weather conditions, season, or time of day. To enable the comparison of solar power forecast methods, the assessment of the skillfulness of the forecasting scheme is of interest. [22] The comparability of forecasting approaches across studies has repeatedly been addressed in the literature, which underlines the problems arising from different locations and climates, temporal resolutions, forecast horizons, and data sources, apart from the diverse characteristics of the forecasting models. To reduce those influences, the (RMSE-based) skill score [11,22] is often referred to as the most suitable metric for such comparisons.
The skill score s is defined as

s = 1 - RMSE(f, x) / RMSE(r, x)    (3)

where RMSE(f,x) is as defined in (2), while RMSE(r,x) is the RMSE of a reference method. The common reference method in solar forecasting is a persistence or "smart persistence" forecast, [11] which assumes that the conditions (e.g., the clear-sky index) of the current period remain the same as in the previous time step or interval. Since we use previous-day data to forecast the ID horizon in the early morning hours for the full day, a clear-sky-adjusted smart persistence forecast equals the previous day's production curve (referred to by Yang et al. [22] as 24 h persistence), which is used as the reference in our evaluation.
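With yesterday's production curve as the 24 h persistence reference, the skill score evaluates as in the following sketch (all series invented):

```python
import numpy as np

def rmse(f, x):
    return float(np.sqrt(np.mean((f - x) ** 2)))

# Invented 15 min power values (kW): today's forecast and measurement, and
# yesterday's measured curve as the 24 h persistence reference.
forecast = np.array([0.5, 2.0, 3.8, 3.5, 1.2])
measured = np.array([0.4, 2.2, 4.0, 3.3, 1.0])
yesterday = np.array([0.8, 1.5, 3.0, 3.9, 1.6])

# s > 0: the forecast outperforms the persistence reference; s = 1 would be
# a perfect forecast, s < 0 worse than persistence.
s = 1.0 - rmse(forecast, measured) / rmse(yesterday, measured)
```

In the evaluation below, this score is computed per PV system and per year, always against the previous day's curve.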

The Use Case
The PV power forecasting model presented in this article originates from the project "CombiCast" and was developed in collaboration with a local DSO. Electris (a brand of Hoffmann Frères Energie et Bois s.à r.l.) is an energy supplier and grid operator (with 5000 customers) in the region of Mersch, Luxembourg. Currently, about 130 PV systems are connected to the distribution grid of Electris, ranging from 1.4 kWp to almost 250 kWp, with an average nominal power of 17 kWp and summing up to a total of 2.2 MWp. For 119 of these systems, enough 15 min resolved measurement values are available to evaluate the forecast performance and to train a data-driven model. The DSO has information on the location, nominal power, inclination, and orientation of all the PV systems, and some additional data for a few systems, but does not consider the possibility of multiple orientations. Smart meter data (in 15 min resolution) are available for the training and testing of the data-driven model (historical data), but are not accessible in real time. Hence, since those data are available with 1 day of delay, the forecasting scheme is designed to make use of the previous day's data on the ID level.

Results and Discussion
The performance of the NWP-based forecast and the ML-enhanced forecast has been evaluated by common performance metrics, such as the RMSE, the MBE, and the skill score s (based on the RMSE and in reference to a persistence forecast). Only daylight values were used, and the results are normalized to the nominal power (for more details, see Section 3.3). In the following, first, the forecast performance of the pure NWP-based, physical, deterministic power forecast is presented. This is followed by an evaluation of the overall improvements related to the ML-enhanced approach. Finally, some detailed views on specific PV systems, showing characteristics that demonstrate the advantages of the proposed hybrid-physical approach, are given.

Overall Forecasts Performance of the NWP-Based, Physical, Deterministic Power Forecast
As described in the use case description (Section 3.4), the DSO provided 15 min resolved measurements of 119 PV systems for more than 2 years (2020 and 2021). Figure 2 depicts the evolution over the years of the RMSE, normalized to the nominal power and averaged over all PV systems. Normalization allows averaging and comparing these data between the systems, but results in sensitivity to the average production of the time period under investigation (here, monthly). Hence, the nRMSE drops during months of lower production (winter period, November-February).
On average over all PV systems, the purely NWP-driven power forecast resulted in a normalized RMSE of 9.9% in 2020 and 10.3% in 2021 for the ID forecast, as shown in Table 1. The nMBE indicated a tendency to overestimate, but in a very moderate manner, with nMBE values of 0.6% in 2020 and 1.2% in 2021. The bias could be further reduced by statistical postprocessing, as mentioned in Section 3.1, but in view of our application targeting DSOs and active distribution system operation, a precise single-site forecast should focus on optimizing the reproduction of the diurnal profile.
The RMSE-based skill score of the pure NWP-driven forecast also reaches a good level compared to other single-site forecasts on the ID level (computed in the early morning of the same day, covering the full day, and using only previous-day data). A skill score of 29.6% (in 2020) and 33.8% (in 2021) is reached for ID forecasts covering the full day, and it further increases on the DA level (32.6% in 2020 and 35.5% in 2021).
Considering the aforementioned comparability issues, [11,22] a direct comparison of performance indicators across studies needs to be interpreted cautiously and should not be overrated. Nevertheless, some articles using NWP models mentioned by Antonanzas et al. [8] for the "6 h to DA" time frame for single sites can serve as a reference for our ID forecast results (0-24 h ahead). To give some orientation, nRMSE values between 9% and 13% can be found as the best results, and skill scores of 30-40% in the ID time frame have also been mentioned. As shown in Figure 3, the nRMSE is lowest for the nearest forecast horizon, as expected, but the difference between the three horizons is rather small. Evaluated over the whole of 2021, the nRMSE reaches 10.3% for the ID time frame and 10.9% and 11.1% for the DA and 2 days-ahead forecast horizons, respectively, as presented in Table 1. The nMBE also stays relatively stable, independent of the forecast horizon.
Investigating the forecast performance of each individual system, it can be observed that for the large majority of PV systems (87 out of 119) the nRMSE lies between 8% and 12%, and only 5 systems show an nRMSE above 14% (see Figure 4).
To identify the source of the higher nRMSE of some systems, we performed a visual check of all systems: we analyzed the forecast and production curves and their alignment, checked the system data (nominal power, location, and orientation), and inspected aerial photos of each system. For some of the less well-aligned curves, we were able to identify potential reasons: 1) Systems with multiple orientations: Due to the simplified information on the individual PV systems, as mentioned to be common for DSOs, TSOs, and utility companies, [6] our dataset only contained a single orientation and inclination per system, an important parameter in the PV performance model. Visual checks identified 9 systems consisting of subsystems with different orientations, which degrades forecast performance; 2) Partial, temporary shading: Based on the production curves and the aerial photos, 5 PV systems could be identified as partially shaded, e.g., by other buildings or trees, which affected the forecast performance during specific seasons or times of day; 3) Temperature/limited inverter capacity: One PV system does not show a "bell-shaped" production curve during clear-sky days; the curve seems to be cut off. This could indicate temperature problems or the nominal capacity limit of the inverter.
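The "cut" clear-sky curve of case 3 is consistent with simple inverter clipping, which can be illustrated as follows (an illustrative sketch with hypothetical values, not derived from the actual data of that system):

```python
def ac_output(p_dc_kw, p_inv_limit_kw):
    """AC feed-in is capped at the inverter's nominal AC capacity,
    flattening the top of the clear-sky production curve."""
    return min(p_dc_kw, p_inv_limit_kw)

# A hypothetical clear-sky DC profile (kW) exceeding an 8 kW inverter near noon:
dc_profile = [0.0, 3.0, 7.5, 9.6, 10.2, 9.4, 6.8, 2.5, 0.0]
ac_profile = [ac_output(p, 8.0) for p in dc_profile]
# The midday values are clipped to 8.0 kW, producing the flat-topped curve.
```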
Apart from those effects harming the forecast performance of specific systems during certain periods, some short periods of snow cover could be found, which degraded the overall forecast performance of the whole system portfolio.
In Chapter 4.2, we will investigate the improvements of the forecast performance by the ML-enhanced forecast, specifically for those local or system-specific phenomena.
The individual skill score of the NWP-based ID forecast for each PV system (in 2021) is shown in Figure 5 (blue dots) and is positive for almost all systems. This indicates that it outperforms the persistence forecast described under 3.3. The average skill score (33.8%) is already satisfactory, with 75% of the systems above a skill score of 31.9%. However, 26 PV systems remain with a skill score below 30%; for most of these, we expect the DeepAR model to improve the performance substantially.

www.advancedsciencenews.com www.solar-rrl.com

Forecast Improvements Based on ML-Enhanced, Probabilistic Forecasts
The improvements of the NWP-based ID forecast by the data-driven postprocessing using the DeepAR algorithm are analyzed in the following on an overall level, across the whole portfolio of PV systems. In the subchapters, the improvements of the forecast for specific systems with known forecast deficits will be addressed. As described under 3.2, our forecasting scheme does not use the ML-based model to forecast solar power production purely based on historic values; rather, our ML-based forecast is an enhancement of the previously calculated NWP-based power forecast.
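The input assembled for the ML model at each forecast run can be sketched as follows (an illustrative indexing sketch assuming contiguous 15 min arrays; the function name and signature are ours and do not reproduce the actual DeepAR pipeline):

```python
import numpy as np

STEPS_PER_DAY = 96  # 15 min resolution: 96 timesteps per day

def build_sample(measurements, nwp_forecast, target_day):
    """Assemble one sample for the sequence model: the 192 measured values
    of the two previous days as input history, and the NWP-based power
    forecast of the target day as a covariate. Both arrays are indexed
    in 15 min steps from the start of the dataset."""
    start = (target_day - 2) * STEPS_PER_DAY
    end = target_day * STEPS_PER_DAY
    history = measurements[start:end]                   # 2 days = 192 values
    covariate = nwp_forecast[end:end + STEPS_PER_DAY]   # target day = 96 values
    return history, covariate
```

The NWP-based forecast thus enters the model as a known future covariate, while the recent measurements let the model pick up local, system-specific deviations.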
The overall nRMSE over all PV systems for the ML-enhanced ID forecast is 8.3% in 2020 and 8.7% in 2021, which represents a reduction of the nRMSE by 1.6 percentage points in both years (see Table 2). The already low nMBE changed to −0.6% in 2020 and 0.2% in 2021, representing an improvement at least in 2021. The overall skill score increased substantially, up to 44% and 47.5%, as depicted in Table 2.
Comparing the improved ML-enhanced forecast metrics to the values of other hybrid forecast schemes in the literature, specifically the skill score reaches the top of the range of other methods (up to 40-50% for 6 h-ahead to DA [8] ). Furthermore, although the best nRMSE values in the literature reach down to 7%, the nRMSE of our method lies clearly below the average reported values.
Beyond the improvements averaged over all PV systems, the project team was targeting the improvements for those systems that could not be matched well by the NWP-based forecast. From the visual checks of the alignment of the production and forecast curves (NWP-based), it could be observed that in most cases a misalignment of the curves resulted in a higher nRMSE and a lower skill score (some of the reasons are mentioned in the previous chapter). The ML-based model was able to improve specifically this temporal alignment of the curves, which results in the improvements shown in Figure 5: the forecast for all PV systems improved on the yearly level, and specifically the forecasts for the systems with the weakest skill scores improved the most.

Probabilistic Forecasts-Provision of Confidence Intervals
Another advantage of the presented approach is that it turns a deterministic forecast output into a probabilistic one, providing confidence intervals. Such assessments of the probability of the provided forecasts are sought-after by many stakeholders in energy provision and grid operation, [8] since they empower their risk management.
As indicated in Figure 6 by the light green and darker green bands around the ML-enhanced forecast, confidence intervals of 50% and 90% are given, while the dark green line represents the median of the resulting distribution. The graphs show the aggregated view of all individual forecasts, as plotted by our visualization tool.
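Such bands can be derived from an ensemble of sampled forecast trajectories, as produced by probabilistic models like DeepAR, by taking empirical quantiles (a minimal sketch; the array layout and function name are ours):

```python
import numpy as np

def confidence_bands(sample_paths):
    """Derive the median and the central 50%/90% intervals from an
    ensemble of sampled forecast trajectories
    (shape: number of samples x number of timesteps)."""
    qs = np.percentile(sample_paths, [5, 25, 50, 75, 95], axis=0)
    return {
        "median": qs[2],
        "band50": (qs[1], qs[3]),  # 25th to 75th percentile
        "band90": (qs[0], qs[4]),  # 5th to 95th percentile
    }
```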
In this article, the topic of probabilistic forecasts is not addressed in detail, and no evaluations on their quality are presented, but this is the subject of ongoing evaluations and will be presented in future publications.

Improvements for Incomplete Information on PV Systems with Multiple Orientations
To accurately calculate the PV power production throughout the diurnal cycle, the orientation and inclination need to be correctly defined for the PV performance model. As stated before, working with typical data available at DSOs, TSOs, or utility companies, this type of data is often too simplified, giving only one orientation per PV system. [6] An example is illustrated in Figure 7 for system ID no. 44, an installation on a double-pitched roof. The database of the DSO reports an inclination of 37° and an azimuth of 277° (toward West) for the installation of 11 kWp, but ignores that half of the capacity is East-oriented. This results in the misalignment of the NWP-based forecast (Figure 7, right graph). For this specific system, the simplification of the real situation to a unidirectional system caused high systemic deviations of the forecast during the morning hours: the averaged MBE in March 2020 for the morning hours between 8:00 and 9:00 lies around 3 kW. Figure 8 depicts the hourly MBE for each day of the month (thin lines) and the respective monthly averaged curve (thick, orange line) for the NWP-based forecast. The averaged MBE of the ML-enhanced forecast (thick, blue line) illustrates that this systemic deviation has been drastically reduced by the algorithm.

Figure 5. Skill score of numerical weather prediction (NWP)-driven forecast 2021 (blue), compared to the skill score of the machine learning (ML)-enhanced forecast (orange), both on the left axis, and the improvement of the skill score (gray lines, referring to the right axis).
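To illustrate why the missing second orientation matters, the beam irradiance on each roof half can be sketched with a standard incidence-angle model (a simplified sketch, not the PV performance model used in this work; azimuth convention North = 0°, clockwise, and the sun position values are hypothetical):

```python
import math

def beam_poa(dni, sun_elev_deg, sun_az_deg, tilt_deg, az_deg):
    """Beam irradiance on a tilted plane (W/m^2), from the cosine of the
    incidence angle between the sun vector and the plane normal."""
    h, b = math.radians(sun_elev_deg), math.radians(tilt_deg)
    cos_inc = (math.sin(h) * math.cos(b)
               + math.cos(h) * math.sin(b)
               * math.cos(math.radians(sun_az_deg - az_deg)))
    return dni * max(cos_inc, 0.0)  # back-lit planes receive no beam

# Double-pitched roof at 37° tilt: a hypothetical morning sun low in the
# East (elevation 20°, azimuth 97°) illuminates only the East-facing half,
# while the West-facing half (azimuth 277°) receives no beam irradiance.
morning_east = beam_poa(600, 20, 97, 37, 97)    # East-facing half
morning_west = beam_poa(600, 20, 97, 37, 277)   # West-facing half
```

A model that assumes the whole 11 kWp faces West therefore predicts almost no production in the morning, producing exactly the kind of systematic morning bias seen in Figure 8.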
In case of system no. 44, this resulted in an overall improvement of the skill score from s = 5.1% (NWP-based) to s = 36.7% (ML-enhanced), a rise by 31.6 percentage points. Out of the 9 systems categorized as installations with "multiple orientations," this is the largest improvement. The NWP-based skill score of the systems in that category was estimated at 27.9% on average.

Improvements for Partially Shaded Systems

Figure 9 indicates the systemic deviations (by the hourly MBE) during the morning hours (7:00-9:00) in September, and visualizes the (slight) improvement by the thick, blue line for the MBE of the ML-enhanced forecast. The skill score for September, October, and November 2021 improved over the NWP-based values of 28%, 33%, and 38%, respectively. The NWP-based skill score of all 5 PV systems categorized as potentially shaded is relatively low, with 22.2% in 2021, compared to the average skill score of 33.8% over all systems. On average, those 5 systems improved their skill score by 20.6 percentage points, while the average improvement over all systems is estimated at 13.7 percentage points.

Improvements during Periods of Snow Cover
Snow cover on PV modules obviously reduces their performance drastically or even stops production completely. Forecast schemes based on irradiance predictions and PV performance models would still predict PV power feed-in during those periods, if no countermeasures are implemented.
In our datasets, covering the years 2020 and 2021 in the mild central-European climate of Luxembourg, snow cover over periods of several days was rare. But the example shown in the graphs of Figure 10 indicates a period from the 8th until the 13th of February during which the production of this system was reduced due to snow cover. An overcast period (5th-9th) with low predicted irradiance and low measured production can be observed and results in low production forecasts (NWP, in red). The NWP-based forecast already deviates from the even lower production (blue line) on the 8th and 9th. But nRMSE and nMBE started rising drastically (lower graph) from the 10th until the 13th of February, when the predicted irradiance was high while snow covered the modules.
The green line (in the upper graph) represents the median of the probabilistic ML-enhanced forecast and shows a much better alignment with the production curve after the deviations of the NWP-based forecast have been observed (during the 8th and 9th). This shows that the algorithm needs a certain adaptation period to adjust its forecast accordingly.
Since the period of snow cover in the dataset was very short and most PV systems were even less impacted than the example shown earlier, due to steeper inclination angles, the overall impact of the snow cover effect on the skill score cannot be evaluated. We observe high improvement rates for February 2021 (see Figure 11), but, similarly to January and August, the impact of the attenuation of the snow cover effect on the forecast cannot be isolated.
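While in our scheme the LSTM learns to correct for snow cover implicitly from the previous days' measurements, the characteristic pattern, a high irradiance-based forecast with near-zero measurements, could also be flagged explicitly (a hypothetical diagnostic sketch, not part of the presented forecasting scheme; thresholds are illustrative):

```python
def snow_suspect(measured_kw, nwp_forecast_kw, p_nominal_kw,
                 ratio_threshold=0.2, forecast_threshold=0.3):
    """Flag timesteps where the NWP-based forecast predicts substantial
    production (clear sky) but measurements stay near zero, a pattern
    typical for snow-covered modules."""
    flags = []
    for meas, fc in zip(measured_kw, nwp_forecast_kw):
        high_fc = fc > forecast_threshold * p_nominal_kw
        low_meas = meas < ratio_threshold * fc if fc > 0 else False
        flags.append(high_fc and low_meas)
    return flags
```

Sustained flags over consecutive clear-sky timesteps would separate snow cover from ordinary overcast periods, where forecast and measurement drop together.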

Conclusions
With the forecast approach presented in this article, the team aimed to provide and prove a method meeting specific objectives: 1) to meet the future requirements of grid operators, especially DSOs in view of an active distribution grid management, in terms of spatial and temporal resolution; and 2) to limit the data requirements of the forecast scheme to data accessible, or soon to be accessible, to those stakeholders. Thus, aiming at single-site forecasts on ID and DA levels, we proposed an NWP-driven PV performance model as the basis of the forecasting scheme. In a second step, we proved the ability of a data-driven approach, using an RNN model of the LSTM type, to mitigate temporarily appearing systemic deviations due to either local effects, such as shading, or lacking data quality, especially regarding the orientation of the systems.
As a first step, the performance of the forecasts based on the NWP-driven PV performance model has been evaluated by forecasting the power outputs of 119 PV systems and comparing them to measurements for two full years (2020/2021). The model showed a small bias toward slight overestimation, with an nMBE of 0.6% and 1.2% in 2020 and 2021. The nRMSE of around 10% can be considered a good result compared to other studies, as can the RMSE-based skill scores of 29.6% and 33.8% for the two years, averaged over all systems.
The assessment of the forecast performance for individual systems provided insights on some underperforming single-site forecasts. A visual analysis based on aerial photos could link several underperforming systems to specific effects, such as partial shading or multiple orientations, not represented by the PV performance model.
In a second step, the forecast data has been introduced as covariates to an LSTM model, which forecasts the PV power output considering the production time series of the two previous days. By this, the algorithm has proven able to attenuate the impact of the previously mentioned temporary effects and generally improved the forecast performance: 1) Overall, the ML approach succeeded in a further reduction of the nRMSE, to 8.3% and 8.7% (in 2020 and 2021); 2) The average skill score over all PV systems (in 2021) improved from 33.8% to 47.5%; and 3) The impact on the already rather low bias was neutral, since the nMBE changed from 0.6% to −0.6% for the 2020 dataset and from 1.2% to −0.2% in 2021.
Moreover, especially the forecasts for systems showing the worst performance metrics were improved the most, which can be seen in Figure 5.
The forecast performance for PV systems categorized as impacted by lacking data on multiple orientations or by partial shading has also been improved: 1) Multiple orientations: the systemic forecast deviation during certain times of day has been reduced, and the skill score in that category rose from 27.9% to 43.7%; the forecast and production curves were much better aligned; 2) Partial/temporary shading: in this category, the skill score also improved, but the curves were less well aligned. Still, the impact on the skill score was high, improving from s = 22.2% to 42.8% on average over all systems in that category. Temporally limited effects, such as snow cover, were also corrected for, but the dataset did not provide much evidence for this, since snow cover over several days was scarce in the data. Since the ML model uses measurements of the two previous days as covariates, it takes up to 2 days until such effects are fully incorporated and reflected in the forecast.

Figure 11. Skill scores per month over 2021 and respective improvements.
In conclusion, the benefits of the presented approach lie in its simplicity: no additional devices or sensors are necessary, and it works exclusively with data available to the future DSOs. The improvement of forecasts suffering from local effects such as shading and more complex orientations of the PV systems is specifically beneficial for small-scale rooftop PV systems in urban areas, where such effects are very common. Furthermore, the effect of local shading on the forecast quality is more severe in regions with high shares of direct irradiance, where this forecast model could be even more beneficial. This also holds true for regions with longer or more frequent periods of snow cover.
Furthermore, the presented approach, using the LSTM-type model DeepAR, provides a probabilistic forecast, which could be a clear benefit, but needs further investigation on the quality of the given confidence intervals.

Outlook/Perspective
As just mentioned, the quality of the given confidence intervals needs to be assessed in order to evaluate the full potential of the presented hybrid-physical model, in view of its application by grid operators, but even more importantly when targeting the use by utility companies or other stakeholders active in energy markets.
Furthermore, the team will publish a detailed analysis of the experiments performed and options tested during the development of the ML-based approach, investigating the effects of different parameters used as covariates, of different distributions, and of the splits of the dataset for learning, validation, and testing. Those aspects were only briefly mentioned in the methods chapter. It may be worthwhile to test the inclusion of a parameter indicating expected weather changes (regarding irradiance) compared to the two previous days fed into the ML model. This might help to overcome some performance losses observed under such changing conditions, where the LSTM algorithm tends to carry over previous data trends.
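A simple form of such a weather-change indicator could relate the predicted daily irradiance to the two days of history seen by the model (a hypothetical sketch of the idea mentioned above, not an implemented covariate of the presented scheme):

```python
def weather_change_index(ghi_forecast_today, ghi_measured_prev_days):
    """Hypothetical covariate quantifying how strongly the predicted daily
    irradiance sum deviates from the two previous days the LSTM sees as
    history. Values near 1 indicate stable conditions; large deviations
    signal a weather change, where the NWP input should dominate."""
    prev_mean = sum(ghi_measured_prev_days) / len(ghi_measured_prev_days)
    return ghi_forecast_today / prev_mean if prev_mean > 0 else float("inf")
```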
Furthermore, the project team is investigating the use of an All Sky Imager to complement the hybrid-physical setup with a cloud-motion-based forecast approach, with the aim of gaining accuracy on short-term ID and intra-hour forecast horizons.