Measurement data‐driven investigation of the actual power grid resilience with increasing renewable energy feed‐in

The transformation of the energy system towards a sustainable reduction of CO2 emissions and the consequently rising contribution of fluctuating energy sources to the German energy system lead to new challenges. In particular, the sufficiency of the present powerline system is of interest. To evaluate the sufficiency of the German network, a data‐based model is used to determine the number of intervals with insufficient transport capabilities as well as the yearly sum of infeasible power transports. To do so, the actual residual loads are calculated from the power data in the database “GEOWISOL,” and the data on the German power grid are obtained from the project “SciGrid.” The temporal resolution of the power data is 15 min, and the spatial resolution refers to the 2‐digit ZIP code regions. The present transmission network is shown to be insufficient in a linearly increasing number of intervals for scenarios with increasing renewable power production. The uncertainties of the results presented are also investigated by means of a Monte Carlo simulation with the known uncertainties of the power data. The simulation results in a relative uncertainty of roughly 1% for the number of intervals with insufficient transport capabilities of the network. Thus, the model can be used in the future either to investigate network expansion plans or to validate network expansion studies with simulated data or synthetic networks.

capabilities of the transmission network, because wind parks are mostly placed in rural regions, while most of the demand occurs in densely populated areas.
To investigate the transport capabilities of the German power transmission network in present and future scenarios, a power flow model is required. The model data for the power generation and the power output (demand) must satisfy the following four requirements:
• (R1) Representing reality: The data must be obtained with minimal assumptions and have a temporal resolution of 15 min and a reasonable spatial resolution.
• (R2) Completeness: The data must be complete for the investigated area (here: Germany) and time (here: one year).
• (R3) Currentness: The data must be up-to-date.
• (R4) Quality: The uncertainty of the data must be known in order to assess the uncertainty of the simulation result.
The first requirement is set to ensure that the modelling is performed as realistically as possible, while at the same time creating an implementation that is not overly resource-intensive in both obtaining the data and modelling the power grid in the resolution of the obtained data. The second requirement is set to reduce the number of artefacts caused by gaps in the database. The third requirement is set to improve the contribution of the investigation to the discussion on the expansion of renewable power production. The fourth requirement is set to assess the trustworthiness of the proposed method and its results. The power generation and output data can then be used in combination with a network model and a power flow algorithm to form a power flow model. Such a power flow model finally enables realistic predictions for the sufficiency of the power transmission network.
The challenge of transporting the produced power has been investigated, in conflict with R1, with simulations without any measurement data 3 or with simulated power production data. 4 Other studies, such as 5, work with data derived from weather reports or with measurement data of a limited spatiotemporal resolution of 1 h in the time domain and 40 km × 40 km in the spatial domain. They therefore represent reality inadequately and do not satisfy requirement R1. The project METIS 6 investigates the effects of sector coupling in the network and includes a hydrogen grid, but uses power production datasets of the year 2012 and therefore does not satisfy requirement R3. Another work 7 studied the capability of wind and solar power to cover the demand in the US, but used weather data from 2006 to 2008, which also does not meet requirement R3. Smith et al 8 also investigated the power grid in the United States of America and even included investigations of the European one, but focused on the interplay of the expansion of power production and grid extension rather than evaluating the present grid. Lastly, Beiter et al 9 investigated the power grid in a data-based fashion, but used demand data from the year 2012 as well as wind data from the years 2007 to 2013, which does not meet requirement R3. Finally, no publication included the quality of the database regarding its uncertainty, so that requirement R4 is not satisfied. As a result, a data-driven investigation that satisfies all four requirements is missing.
Therefore, the aim is first to build up a database of measured power generation and demand that satisfies the requirements R1 to R4, and in a second step to combine it with the network model and to run a power flow algorithm. A data-based power flow simulation is thus achieved by using the database, the network model and the power flow algorithm. The aim includes the evaluation of the sufficiency of the present power grid in different scenarios. Furthermore, the uncertainty of the results is assessed using a Monte Carlo simulation.
In Section 2, the overall approach is introduced. It consists of the combination of the power database, the network model and the power flow algorithm on the network. In Section 3, the quality of the investigation regarding the sufficiency of the network is evaluated by estimating the uncertainties of the resulting network sufficiency. In Section 4, one exemplary application of the combined power and network datasets and the algorithm is shown: The algorithm computes the network sufficiency of the present German transmission network in different future scenarios with an increasing amount of renewable energy generation. The paper closes with a conclusion and an outlook.

POWER FLOW MODEL
In order to run a data-based power flow simulation, the following three steps are conducted:
• The residual loads are calculated as the difference between power demand and renewable power generation for each region $i$ and each interval $t_k$. Power shortage with respect to the demand is filled up with conventional sources.
• The network data are obtained and processed to suit the resolution of the power database.
• The power flow algorithm introduced here is used in each interval $t_k$ to determine whether the network is sufficient to conduct all power transport demands.
With the database of power generation and demand introduced in Section 2.1, the obtainment of the network data described in Section 2.2 and the power flow algorithm introduced in Section 2.3 as prerequisites, the investigation of network sufficiency is finally performed. In order to investigate the power transmission network, the power gap $P_{\mathrm{gap}}$ is introduced first as the difference between all renewable power generation and all outputs of the network for an interval $t_k$:
Note that the term $P^{\mathrm{RE}}_i(t_k)$ used here is the sum of $P^{\mathrm{wind}}_i(t_k)$, $P^{\mathrm{solar}}_i(t_k)$, $P^{\mathrm{water}}_i(t_k)$ and $P^{\mathrm{biomass}}_i(t_k)$. One requirement here is the equality of power generation and demand, and therefore, the power gap is closed either by conventional power sources or by curtailment of renewable power generation. The residual load is now defined as follows:
Here, the terms $P^{\mathrm{C}}_{i,\mathrm{rel}}$ and $P^{\mathrm{RE}}_{i,\mathrm{rel}}$ represent the relative incidence of conventional and renewable power generation:
Note that the term for conventional power $P^{\mathrm{C}}_i(t_k)$ used here is the sum of $P^{\mathrm{oil}}_i(t_k)$, $P^{\mathrm{nuclear}}_i(t_k)$, $P^{\mathrm{brown\,coal}}_i(t_k)$ and $P^{\mathrm{hard\,coal}}_i(t_k)$. The normalized terms $P^{\mathrm{C}}_{i,\mathrm{rel}}$ and $P^{\mathrm{RE}}_{i,\mathrm{rel}}$ are used here to fill the gap between power demand and renewable power production in the first case and to curtail the renewable power production to match the power demand in the second case.
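The gap-closing rule described above can be sketched for a single interval as follows. This is a minimal illustration, not the authors' implementation: the function name `residual_loads`, the list-based inputs and the sign convention (positive values mean a regional surplus, negative values a deficit) are assumptions made here, and the conventional fill is distributed by relative incidence without checking plant capacities.

```python
def residual_loads(p_re, p_conv, p_demand):
    """Per-region net injection (generation minus demand) for one interval.

    p_re, p_conv, p_demand: lists with one value per region, same unit.
    Assumed sign convention: positive = surplus, negative = deficit.
    """
    # Power gap: all renewable generation minus all network outputs.
    gap = sum(p_re) - sum(p_demand)
    if gap < 0:
        # Shortage: fill the gap with conventional power, shared among the
        # regions according to their relative conventional incidence.
        total_conv = sum(p_conv)
        if total_conv <= 0:
            raise ValueError("no conventional generation available to fill the gap")
        injected = [re + (-gap) * c / total_conv for re, c in zip(p_re, p_conv)]
    else:
        # Surplus: curtail renewables according to their relative incidence.
        total_re = sum(p_re)
        share = [r / total_re if total_re > 0 else 0.0 for r in p_re]
        injected = [re - gap * s for re, s in zip(p_re, share)]
    return [g - d for g, d in zip(injected, p_demand)]
```

In both branches the returned values sum to zero, which reflects the stated requirement that generation and demand are equal in every interval.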
The investigation includes the calculation of the intervals $t_{\mathrm{INC}}$ with insufficient network capabilities (INC) as well as their number $N_{\mathrm{INC}}$. Note that $k$ represents the time index and ranges from 1 to 35,040 over one year. Furthermore, the amount of power $P_{\mathrm{NPG}}$ that the network cannot transmit (the network power gap) is calculated for each respective interval. As an equation, the network power gap reads:
where $P^{\mathrm{TD}}_{i \to j}$ denotes a transport demand from region $i$ to region $j$ and $P^{\mathrm{TD,S}}_{i \to j}$ denotes the successfully transmitted power. Both $N_{\mathrm{INC}}$ and $P_{\mathrm{NPG}}$ are calculated as absolute and relative quantities, the latter denoted by a tilde. The relative quantity $\tilde{N}_{\mathrm{INC}}$ is normalized to 35,040, the total number of intervals for one year, whereas $\tilde{P}_{\mathrm{NPG}}$ is normalized to the yearly sum of all transport demands.
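One way to write these quantities that is consistent with the verbal definitions above is given below. The summation over ordered region pairs and the yearly sum in the numerator of the relative network power gap are assumptions made for this sketch and are not taken from the original equations.

```latex
\begin{align}
P_{\mathrm{NPG}}(t_k) &= \sum_{i \neq j} \left[ P^{\mathrm{TD}}_{i \to j}(t_k) - P^{\mathrm{TD,S}}_{i \to j}(t_k) \right] \\
\tilde{N}_{\mathrm{INC}} &= \frac{N_{\mathrm{INC}}}{35\,040} \\
\tilde{P}_{\mathrm{NPG}} &= \frac{\sum_{k} P_{\mathrm{NPG}}(t_k)}{\sum_{k} \sum_{i \neq j} P^{\mathrm{TD}}_{i \to j}(t_k)}
\end{align}
```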

The database of power generation and output data
Before the power transport can be modelled, the residual loads are calculated with Equation (2). They are the difference between power demand and renewable power generation, and are adjusted here by adding conventional power input or curtailing renewable power generation to satisfy the equation:
The individual summands in the equation are taken from a database, which is an enhancement of the GEOWISOL database introduced previously. 10 The database also includes the four conventional power sources oil, brown coal, hard coal and nuclear power, which are used to satisfy the full power demand. The temporal resolution of all entries in the database is 15 min, while the spatial resolution refers to the 2-digit ZIP code regions of Germany. There are 95 regions, and the number of residents is roughly similar for all regions at about 850,000. 11 The power demand data were gathered in two steps. First, the time series of the 50 largest distribution network operators were obtained. These power demands correspond to roughly 90% of the total demand. Second, the remaining demand was added according to the relative distribution of industry in the respective region, as it is most likely caused by the power demand of heavy industries that are not monitored by the distribution network operators. This distribution was obtained by our partners in the industry. As an example, the yearly sums of the power demand are plotted in Figure 1.
Regarding the wind power, the collected data represent 3.5% of the total installed wind power in Germany. In particular, the wind data only cover 52 of the 95 ZIP code regions, because either the operating companies do not share the data or anonymization requirements have to be complied with. Therefore, the completion of the wind power data, including the upscaling to the real wind power, was performed in the following way, exemplified with ZIP code region $i$: (2)
3. In order to estimate the real wind power $P^{\mathrm{wind}}_i$ in the respective region, the normalized wind power of the region at each interval $t_k$ is upscaled by the total installed power $P^{\mathrm{wind}}_{\mathrm{installed},i}$ in that region:
Regarding the solar power, the collected measurement data represent ca. 10% of the installed power in Germany and cover all 95 ZIP code regions. In order to estimate the real solar power, the measurement data for each ZIP code region are scaled up by the known total installed power of all installed solar units in the respective region. As an example, the yearly sums of the generated wind power and solar power are plotted in Figure 2A,B, respectively. The generated wind power is unevenly distributed, with a generally higher production in the northern German regions. The solar power is more evenly distributed over Germany.
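The upscaling step for wind power described above can be written compactly as follows. The lower-case symbol $p^{\mathrm{wind}}_i(t_k)$ for the normalized wind power is notation assumed here, and the normalized series is assumed to be dimensionless (generation per unit of installed capacity).

```latex
P^{\mathrm{wind}}_i(t_k) = p^{\mathrm{wind}}_i(t_k) \cdot P^{\mathrm{wind}}_{\mathrm{installed},i}
```

Under the same assumptions, the solar data would be scaled analogously with the installed solar power of each region.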
The biomass power generation, the water power generation as well as the conventional power generation from oil, nuclear, brown coal and hard coal have been obtained from the ENTSO-E transparency platform. 12 As an example, the yearly sums of the power generation by biomass and water power are plotted in Figure 3A,B, respectively. The biomass power generation is distributed over Germany, but the water power generation is generally higher in southern German regions.
As another example, the yearly sums of the four conventional sources are plotted in Figure 4. With the collected power generation and power demand data, the residual load in Equation (2) can be calculated. The yearly sum of the residual load (RL) without the filling by conventional power is finally shown in Figure 5. Note that the mean residual load over one year is calculated with the respective algebraic signs of the power values and not by using absolute values. The yearly sum of the residual load in each ZIP code region can be interpreted as the amount of energy that the region would provide or require if it had an ideal storage system. According to Figure 5, there are regions with an average power surplus (green) and regions with an average undersupply (red). In conclusion, the oversupply of some regions and the undersupply of others result from the different spatial distributions of power generation and power demand.
As an intermediate result, the calculated residual load meets the requirement of minimal assumptions and adequate resolution (R1). The residual load is also complete in the time and space domain and therefore fulfils the second requirement (R2). The residual load is also adequately up-to-date (R3). The power database is an important intermediate result, as it opens the opportunity to investigate scenarios with different expansion aspects on a database of measured power data. The calculated residual load is now the first prerequisite to perform a power transport simulation.

Obtainment of network data and the network model
The network data have been gathered by the SciGrid project and are taken from the SciGrid website. 15 The data consist of 511 vertices and 836 powerlines between the vertices. The vertex data include an ID for reference, the geographical coordinates, the network operator and the operating voltage. The powerline data include the IDs of the startpoint and endpoint vertices, a powerline ID, the network operator, the operating voltage and frequency, the powerline length and the resistance. Note that the operating voltage serves to calculate the maximum capacity of the powerline. The calculation follows the German industry standard DIN EN 50182, 16 published by the German institute for standardization, where the thermal expansion of the wires limits the transported power. The temperature limit defined is 80°C.
In order to use the vertices and links obtained from the SciGrid project, the data have to fit the resolution of the power generation and demand data. First, the 511 vertices are mapped to the German ZIP code regions according to their geographical coordinates. There are 39 vertices with coordinates outside of Germany, which are omitted directly. The remaining 472 vertices are then mapped to their ZIP code regions. Second, the powerlines are assigned to their startpoint and endpoint vertices with the respective IDs. Third, to simplify the network, all powerlines with startpoint and endpoint vertex in the same ZIP code region are omitted. Lastly, all vertices in one ZIP code region are merged into one ZIP code vertex, and the powerlines between two ZIP code regions are also merged into one powerline, see Figure 6. For powerlines without data on resistance $R$ or reactance $X$, the values $R = 0.04\ \Omega/\mathrm{km}$ and $X = 0.32\ \Omega/\mathrm{km}$ were used, as these were the most frequently occurring values in the dataset. The transport capacity $C_{1 \to 2,n+m}$ of the resulting powerline equals the sum of its components $C_{1 \to 2,n}$ and $C_{1 \to 2,m}$.
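The last two reduction steps (dropping intra-region powerlines and merging parallel powerlines between two regions) can be sketched as follows. The dictionary keys and the function name are hypothetical and do not correspond to the SciGrid column names. The capacity is summed as stated above, while combining the impedances as a parallel connection is one plausible reading of the rules of parallel electrical connections, not a confirmed detail of the authors' code.

```python
from collections import defaultdict


def merge_parallel_lines(lines):
    """Merge all powerlines between the same pair of ZIP code regions.

    lines: iterable of dicts with the hypothetical keys
           'region_from', 'region_to', 'capacity_mw', 'r_ohm', 'x_ohm'.
    Returns a dict keyed by the (sorted) region pair.
    """
    groups = defaultdict(list)
    for line in lines:
        a, b = line["region_from"], line["region_to"]
        if a == b:
            continue  # powerlines inside one region are omitted
        groups[tuple(sorted((a, b)))].append(line)

    merged = {}
    for pair, group in groups.items():
        # Transport capacities of parallel lines add up.
        capacity = sum(l["capacity_mw"] for l in group)
        # Parallel connection: the admittances add up.
        admittance = sum(1 / complex(l["r_ohm"], l["x_ohm"]) for l in group)
        impedance = 1 / admittance
        merged[pair] = {
            "capacity_mw": capacity,
            "r_ohm": impedance.real,
            "x_ohm": impedance.imag,
        }
    return merged
```

Treating the region pair as unordered is a simplification chosen here, since the direction of the power flow is only determined later by the power flow calculation.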
The resulting simplified power transmission network is shown in Figure 7B. It contains 95 vertices and 280 powerlines and enables the combination of the residual loads and also the application of the power transport algorithm in order to satisfy the residual loads of each of the 95 ZIP code regions from Equation (2). In comparison with other network models such as, 17,18 the model here is simpler as it has roughly 75% fewer network vertices, but it is more complex than models that investigate the power flows in only one region 19 as they focus on the distribution power network. The presented network has the minimal necessary complexity to investigate the power flows and the respective network sufficiency of the German transmission network.

Power flow algorithm
In order to model the transport of power through the German transmission network, the PYPOWER implementation of the Newton-Raphson algorithm is accessed by using the Python package Pandapower. 20 It models an AC power flow and is used here without any reactive power injections. First, the residual loads are translated into either a generator or a load at the network's nodes, which are called buses. Second, after the placement of the residual loads on the respective buses, the powerlines between the 95 ZIP code regions are placed and their electrical properties are derived by means of the rules of parallel electrical connections. Third, the Newton-Raphson algorithm is used to evaluate the direction and amount of power transmitted on each powerline as well as the respective phase angles of each network node. The resulting phase angle differences typically have small single-digit values, and therefore, they are not further examined in the investigation. Each powerline can either be sufficient, with a line load below or equal to 100% of the transmission capacity, or overloaded, with a line load above 100% of the transmission capacity.
FIGURE 5 Yearly sum of residual loads for 2019 without renewable expansion. Values in TWh. Coastal regions profit from the offshore wind parks. Coal regions appear green due to the filling of the power gap
FIGURE 6 Example of the reduction of the SciGrid dataset to fit the power data. In (A), 2 powerlines from ZIP code region 1 to ZIP code region 2 exist, but they are merged. The capacity $C_{1 \to 2,n+m}$ of the resulting powerline is the sum of the components $C_{1 \to 2,n}$ and $C_{1 \to 2,m}$
FIGURE 7 Reduction of the SciGrid network data to fit the resolution of the power data. In (A), all 472 vertices as well as the 836 powerlines between them given in the SciGrid dataset are shown. In (B), the simplified network with fewer vertices and powerlines is shown. It consists of 95 vertices and 280 powerlines
Note that the occurring power losses are neglected here.
The procedure is repeated for all of the 35,040 intervals, and each interval with at least one overloaded powerline increases the number of intervals with insufficient transport capabilities $N_{\mathrm{INC}}$ by one. The power that cannot be transmitted within the limits of the standard DIN EN 50182 increases the network power gap $P_{\mathrm{NPG}}$.
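The counting rule for $N_{\mathrm{INC}}$ can be illustrated with a short sketch. It assumes that the line loadings in percent are already available for every interval, for example as read from the `loading_percent` column of Pandapower's `net.res_line` table after a call to `runpp`. The function name and the nested-list input are hypothetical choices made for this illustration.

```python
def count_insufficient_intervals(line_loadings_percent, limit=100.0):
    """Count intervals in which at least one powerline is overloaded.

    line_loadings_percent: one sequence of line loadings (in %) per interval.
    limit: loading above which a line counts as overloaded (100 % here).
    """
    return sum(
        1
        for loadings in line_loadings_percent
        if any(loading > limit for loading in loadings)
    )
```

A loading of exactly 100% still counts as sufficient, in line with the definition given in the text.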
This model is adequate to evaluate the network sufficiency, since it uses realistic assumptions for powerline transport capacity. As a result, the constituents of the power flow simulation are complete and the quality of the method can be examined.

QUALITY EVALUATION OF THE DEVELOPED POWER FLOW MODEL
In order to calculate the network sufficiency and the network power gap with assessable quality, the uncertainties $u(N_{\mathrm{INC}})$ and $u(P_{\mathrm{NPG}})$ have to be estimated or calculated.
Since an analytical calculation is not possible due to the use of an iterative optimization technique in the implementation of the Newton-Raphson algorithm, the uncertainties are estimated with a Monte Carlo approach according to the international Guide to the Expression of Uncertainty in Measurement (GUM). 21 For this purpose, the relative uncertainty of the measured power values in the database is estimated, with the expert knowledge of our project partners in the respective industry, to be $u(P^{\mathrm{wind}}_i) = u(P^{\mathrm{solar}}_i) = \dots = 5\%$. Derived from the uncertainties of the power values, the residual load as the sum of the power values also has a relative uncertainty of $u(RL_i) = 5\%$. For the Monte Carlo simulation, the residual load data of 100 randomly chosen intervals were perturbed with 100 different perturbations whose limits are determined by the uncertainty of the residual loads (see Figure 8A). Since the only attributes known about the uncertainty of the power values are the limits, the perturbations are drawn from a uniform distribution, following the principle of maximum entropy provided by the GUM. Note that these uncertainties only account for the measurement devices and data completion, but do not include uncertainty from the assumptions made. The number of perturbed intervals with insufficient network capabilities is calculated for each of the 400 randomly chosen intervals, and the distribution of the resulting number of intervals with network insufficiency is shown in Figure 8B. The distribution roughly follows a Gaussian distribution, so that $u(\tilde{N}_{\mathrm{INC}})$ can be estimated.
For the investigation of the uncertainty of the network power gap, an operation point with an intermediate network power gap is chosen first. The residual load is then perturbed with a uniformly distributed perturbation as before. Next, the power flow algorithm is used to calculate the network power gap as defined in Section 2. The procedure is repeated 200 times with a different perturbation each time, and the resulting values for the network power gap are plotted in a histogram (see Figure 9B). The resulting histogram is then used to estimate the standard deviation of the distribution, which is taken as the uncertainty $u(P_{\mathrm{NPG}})$. As a result, the uncertainty of the relative network insufficiency can be quantified as $u(\tilde{N}_{\mathrm{INC}}) \approx 1.2\%$, and the uncertainty of the network power gap as $u(P_{\mathrm{NPG}}) \approx 0.5\%$. In conclusion, the requirement R4 is satisfied, as the uncertainty of the results, and therefore their quality, is known.
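The perturbation scheme of this section can be sketched with the Python standard library as follows. The function names, the callable `evaluate` (a stand-in for a full power flow run that returns one scalar result) and the fixed seed are assumptions made for this illustration. How the perturbed residual loads are rebalanced before the power flow is not described in the text and is therefore left to `evaluate`.

```python
import random
import statistics


def perturb(residual_loads, rel_uncertainty=0.05, rng=random):
    """Perturb each residual load uniformly within +/- rel_uncertainty."""
    return [
        rl * (1.0 + rng.uniform(-rel_uncertainty, rel_uncertainty))
        for rl in residual_loads
    ]


def monte_carlo(residual_loads, evaluate, n_runs=200, rel_uncertainty=0.05, seed=0):
    """Repeat the evaluation on perturbed inputs; return mean and standard deviation.

    evaluate: hypothetical callable mapping perturbed residual loads to one
              scalar result (e.g. the network power gap of the interval).
    """
    rng = random.Random(seed)
    results = [
        evaluate(perturb(residual_loads, rel_uncertainty, rng))
        for _ in range(n_runs)
    ]
    return statistics.mean(results), statistics.stdev(results)
```

The sample standard deviation returned by `monte_carlo` plays the role of the uncertainty that is read from the histograms in Figures 8B and 9B.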
Note that the resulting uncertainty includes only the contribution obtained from the Monte Carlo approach. It is incomplete without the uncertainty contributions of the network segmentation and of the leading assumptions, for example, that no power is transmitted to Germany's adjacent neighbours. The considered uncertainties are of a justifiable size for evaluating the sufficiency of the network regarding the handling of all transport demands in different future scenarios.

HIGHLY RENEWABLE SCENARIOS
The sustainability goals of the German government are to reach 65% renewable power in the electrical sector in the year 2035 and 80% renewable power in the year 2050. 1 The percentage of renewably covered electrical power is called the scenario factor SF, which is defined by the relation:
Note that renewable power $P^{\mathrm{RE}}(t_k)$ includes onshore and offshore wind power, solar power, biomass and water power. The present scenario is the 49% scenario; the two planned scenarios as well as the 100% scenario are investigated in the following. In order to conduct the investigation, an expansion factor EF is calculated for each scenario. It is then used to uniformly increase the renewable power generation. For each scenario with the given scenario factor SF, the equation for the upscaling of the renewable power generation reads:
Before the residual load RL is calculated for each interval, the respective power gap $P_{\mathrm{gap}}(t_k)$ with respect to the expanded renewable generation must be calculated:
Now, for each interval in 2019, the residual load RL in region $i$ is calculated according to the extended version of Equation (2), which takes the expansion factor into account:
The corresponding transport demands are then applied to the network model with the power transport algorithm described in Section 2.3. This procedure is repeated for each 15-min interval in the investigated year 2019.
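The uniform upscaling can be illustrated with a minimal sketch. It assumes that the scenario factor is the ratio of yearly renewable generation to yearly demand before curtailment and that the demand stays unchanged, so that the expansion factor is simply the ratio of the target to the present scenario factor. This relation and both function names are assumptions of the sketch, since the original defining equations are not reproduced here.

```python
def expansion_factor(sf_target, sf_present=0.49):
    """Expansion factor EF that scales renewables from sf_present to sf_target.

    Assumes SF = yearly renewable generation / yearly demand with constant
    demand, so that a uniform scaling by EF scales SF by the same factor.
    """
    if sf_present <= 0:
        raise ValueError("present scenario factor must be positive")
    return sf_target / sf_present


def expand_renewables(p_re, ef):
    """Uniformly scale the renewable generation of all regions by EF."""
    return [ef * p for p in p_re]
```

For the 65% scenario, this sketch would give an expansion factor of about 1.33 relative to the present 49% scenario.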
The number of intervals with insufficient network capabilities and the missing power transport capabilities defined in Equation (4) are evaluated and shown in Figure 10. The results in Figure 10A show an increase in the number of intervals with insufficient network capabilities for an increasing scenario factor SF. The increase appears to be linear, with a coefficient of determination of 0.996. The offset most likely originates from the assumptions made here, specifically the assumption that power transports to adjacent neighbours of Germany do not exist. The results in Figure 10B also show a linear increase in the transport demands that the network cannot satisfy.
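The linear fit and its coefficient of determination can be computed with an ordinary least-squares fit, sketched below with the Python standard library. The function name is hypothetical, and the values in the usage example are synthetic placeholders rather than results of the study.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares line y = slope * x + intercept and its R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares define the coefficient of determination.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic placeholder data (NOT the paper's results): scenario factors
# and a perfectly linear response, used only to show the call.
scenario_factors = [0.49, 0.65, 0.80, 1.00]
synthetic_counts = [2.0 * sf + 1.0 for sf in scenario_factors]
slope, intercept, r2 = linear_fit_r2(scenario_factors, synthetic_counts)
```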
The linear fit also has a coefficient of determination of roughly 1. The order of magnitude of the network power gap is in the range of hundreds of gigawatts. For the sake of completeness, the results have also been plotted relative to the total number of intervals in Figure 11A and relative to the sum of all transport demands in all intervals in Figure 11B.
FIGURE 9 (A) Rectangular probability distribution of one exemplified residual load RL of 1000 kW, (B) resulting distribution of the relative network power gap $\tilde{P}_{\mathrm{NPG}}$. The results roughly follow a Gaussian distribution with $u(P_{\mathrm{NPG}}) \approx 0.5\%$
The investigations of both $P_{\mathrm{NPG}}$ and $N_{\mathrm{INC}}$ show that a network expansion is necessary to satisfy the transport demands in the planned scenarios. The example also shows the potential of the suggested data-based model to produce results with reasonable uncertainties and thereby evaluate the transmission capability of the German power grid in different scenarios.

CONCLUSION AND OUTLOOK
In conclusion, the proposed method is a tool to model the network sufficiency of the present transmission power network based on real measurement data. The method consists of a database of measured power generation and demand data, network data of the SciGrid project and the application of the Newton-Raphson algorithm. After an introductory section, the three parts of the overall approach were introduced in the second section: the power database, the obtainment of network data and the power flow algorithm. In the third section, the quality of the investigation was estimated with a Monte Carlo approach that led to an assessable uncertainty of the results. The uncertainty of the results is $u(N_{\mathrm{INC}}) \approx 1.2\%$ for the number of intervals with insufficient network capabilities and $u(P_{\mathrm{NPG}}) \approx 0.5\%$ for the network power gap. The considered uncertainties have their origin in the power sensors on each observed wind turbine or photovoltaic inverter. Note that the uncertainties from the assumptions made are not included here. Finally, the expansion scenarios planned for the next decades were examined and used as examples for the power flow simulation. The number of 15-min intervals $N_{\mathrm{INC}}$ with insufficient network capabilities and the respective network power gap $P_{\mathrm{NPG}}$ are shown to grow linearly with the scenario factor SF.
The investigation evaluates the network sufficiency in highly renewable scenarios and shows that the present network is increasingly insufficient for higher expansion rates of renewable power. The results emphasize the importance of network development and network planning, but also indicate the significance of decentralized power generation that does not stress the power transmission network. With the proposed method, the network development can be done virtually and on the basis of measured data to help optimize the transition to a highly renewable energy system.
As an outlook, storage systems will be introduced to the power transport model to produce an even more realistic data-based model of the German energy system. The interaction of storage systems and transport networks is of special interest, since the combination can lead to economically or ecologically optimized control strategies. Another application of the proposed method is its use for virtual network extension, thereby increasing transmission capacity in a cost-effective way. Finally, another important application for the proposed method is the validation of studies of the energy system that used simulated datasets instead of measured ones.
FIGURE 10 (A) Number of intervals with insufficient network capabilities (INC) plotted over the scenario factor. A linear relationship is visible. In (B), the network power gap $P_{\mathrm{NPG}}$ is plotted over the scenario factor. Here, again, a linear relationship can be seen
FIGURE 11 (A) Relative number of intervals with insufficient network capabilities $\tilde{N}_{\mathrm{INC}}$ plotted over the scenario factor. (B) Relative network power gap $\tilde{P}_{\mathrm{NPG}}$ plotted over the scenario factor