Transboundary Rainfall Estimation Using Commercial Microwave Links

Unlike actual rainfall, the spatial extent of rainfall maps is often determined by administrative and political boundaries. Similarly, data from commercial microwave links (CMLs) are usually acquired on a national basis, and exchange among countries is limited. Up to now, this has prevented the generation of transboundary CML‐based rainfall maps despite the wide coverage of CML networks across the world. We present CML‐based transboundary rainfall maps for the first time, using independent CML data sets from Germany and the Czech Republic. We show that straightforward algorithms used for quality control strongly reduce anomalies in the results. We find that, after quality control, CML‐based rainfall maps can be generated via joint and consistent processing, and that these maps allow seamless visualization of rainfall events traversing the German‐Czech border. This demonstrates that quality control represents a crucial step for large‐scale (e.g., continental) CML‐based rainfall estimation.

The data sets vary significantly with respect to spatial distribution, frequencies, and lengths. The Czech data set has a higher CML density in populated regions (e.g., the city of Prague), whereas in Germany the CMLs are more evenly distributed. In Germany, CML frequencies essentially vary between 12 and 39 GHz, while in the Czech Republic the data set comprises approximately 30% E-band CMLs with frequencies above 70 GHz. The CML length in Germany is above 1 km in 99% of the cases. In the Czech Republic, 26% of CMLs have a length below 1 km and 1% even below 0.1 km.
These differences have a strong effect on the sensitivity of the path attenuation to rainfall, that is, on the CML's detection limit, a parameter used throughout this study. The detection limit quantifies the minimum rain rate that is required to induce an integrated path attenuation of 0.33 dB, which is a standard signal quantization of the CMLs. Also, it roughly determines the precision of the retrieved rain rate. The detection limit depends on frequency and length, which have a wider range of values in the Czech data. Hence, the range of detection limits is also larger in the Czech data. The Czech CMLs have detection limits from 0.04 to 132 mm/hr, whereas the detection limit of the German data lies mostly between 0.2 and 1 mm/hr, and only two German CMLs exceed a detection limit of 2 mm/hr. Note that a crucial difference in this regard is the presence (for the Czech Republic) and absence (for Germany) of E-band CMLs, which are generally rather sensitive to rainfall. The E-band CMLs that are longer than a few kilometers have an exceptionally low detection limit below 0.1 mm/hr. This property is beneficial for sensing light rainfall, but these highly sensitive CMLs are also more prone to experiencing very high attenuation. Strong rainfall can therefore even lead to a loss of connectivity along the CML as the receiver fails to record below a certain level (see Polz et al., 2023 and the description of blackout gaps in Section 3.1).

Reference Data RADOLAN-RW
We use RADOLAN-RW as a reference. It is a product of the German Weather Service (DWD) based on 17 C-band weather radar stations supplying gridded rainfall information on a 1-by-1 km spatial and hourly temporal resolution. The radar information is adjusted to over 1,000 rain gauges by additive and multiplicative correction schemes (Bartels et al., 2004).
RADOLAN-RW was chosen as it is the official real-time product for quantitative precipitation estimation of DWD and has been used in other studies, for example, Graf et al. (2020). Despite the fact that two countries are considered in this study, RADOLAN-RW serves as the only reference. This is done to avoid additional potential error sources stemming from combining independent reference data sets. Unfortunately, there is no gridded data set of comparable resolution and quality available that covers both Germany and the Czech Republic completely. Nevertheless, the border region considered in this study is covered by RADOLAN-RW to a large extent (see Figure 1), and hence RADOLAN-RW can be considered a suitable reference.
We compare the estimated rainfall maps with RADOLAN-RW directly (Section 4.3), but we also evaluate the rainfall retrieval based on the CML paths (Section 4.2). For this, we calculate the weighted sum of pixel values of RADOLAN-RW, taking into account the length of the pixels' intersections with the CML paths.
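The path-based reference value can be illustrated with a minimal sketch. It assumes that the intersection lengths between a CML path and the radar pixels have already been computed (e.g., with a GIS library) and that the weights are normalized by the total intersected length; the function name and this normalization are assumptions for illustration, not taken from the study.

```python
import numpy as np

def path_averaged_rainfall(pixel_values, intersection_lengths):
    """Length-weighted average of radar pixel values along one CML path.

    pixel_values: rainfall of each intersected radar pixel (mm)
    intersection_lengths: length of the CML path inside each pixel (km)
    """
    values = np.asarray(pixel_values, dtype=float)
    lengths = np.asarray(intersection_lengths, dtype=float)
    # Ignore pixels without radar data or without an actual intersection.
    valid = ~np.isnan(values) & (lengths > 0)
    if not valid.any():
        return np.nan
    # Assumption: weights are normalized to sum to one.
    weights = lengths[valid] / lengths[valid].sum()
    return float(np.sum(weights * values[valid]))
```

For example, a path with 1 km in a pixel of 2 mm and 3 km in a pixel of 4 mm yields a path-averaged value of 3.5 mm.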

Methods
The CML processing algorithms are primarily based on those applied in Graf et al. (2020), which originally were adjusted to a purely German CML data set and a different period than considered in this study. The processing can be subdivided into two aspects: (a) dealing with erroneous data, i.e., quality control, which is particularly relevant for opportunistic data with their potentially high number of error sources associated with engineering details rather than atmospheric aspects; and (b) rain rate retrieval and mapping, which involve steps that are related to well understood challenges but nonetheless are associated with considerable uncertainties. The difference between the German and Czech CML data sets required some adaptations and extensions to the quality control part of the established algorithms.

Quality Control
Investigating the data sets reveals the necessity for quality control algorithms. Before describing the individual algorithms applied in this regard (Steps 1-6 below), we summarize the main observations that justify them. The justification is largely based on the physical limits of rainfall, its statistics, and the knowledge of how rainfall can and cannot be reflected in CML observations. For most described patterns, an example is given in Figure 2. We observe anomalous data points and periods in several CMLs. For instance, a limited number of unreasonably low or high values leads to spikes in the time series. Moreover, the time series of some Czech CMLs show periods in which the baseline of received signal levels (RSL) drops to values far below the median and then stays at approximately (but not exactly) this level for several minutes, hours, or even days before it leaps up again. We refer to these patterns as plateaus. Furthermore, we encounter gaps in the time series of the RSL at presumably rainy periods, when the signal before and/or after the gap is significantly lower than the median of the whole time series. Those gaps are considered blackouts, that is, they are gaps caused by a failure of the receiver to process RSL values below a certain threshold in the case of heavy rainfall (Polz et al., 2023). Moreover, we observe short gaps in the time series due to outages in the acquisition or other technical aspects. These are not considered blackouts if the RSL is not particularly low before or after the gaps.
In addition to the period-based observations, there are issues that affect CMLs as a whole. Most prominently, several CMLs show high fluctuations throughout the raw signal time series. These fluctuations may occur in daily or random patterns and are often clearly stronger and affect more time steps than the fluctuations induced by rainfall. Moreover, the Czech data include CMLs with very high detection limits. These CMLs are, by definition, not capable of measuring weak rainfall. Furthermore, they are less precise even if the rainfall exceeds the detection limit. A reason for this is the quantization of the recorded signal, which allows only a coarse estimation.
The above-mentioned observations led to the definition of the following steps that are applied to improve data quality. Of those, the first four affect only single data points of the time series, while the latter two affect CMLs as a whole.

Step 1: Removing Specific Fill Values
Missing values are often given as numerical fill values, for which there is no strict convention. We set signal levels to missing values if they have any of the following values: −99.9, −99, 255, or approximately 1e37. Of course, other fill values might occur in other data sets. Nevertheless, this step is defined in a specific way as each fill value might have a different unknown reason and meaning, which makes it useful to identify them and to address them directly.
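A minimal sketch of Step 1 is given here. The list of fill values follows the text; the tolerances used to match them (an absolute tolerance of 0.001 and a relative tolerance of 10% for the value near 1e37) are assumptions, as is the function name.

```python
import numpy as np

FILL_VALUES = (-99.9, -99.0, 255.0)   # fill values named in Step 1
LARGE_FILL = 1e37                     # "approximately 1e37"

def remove_fill_values(signal):
    """Return a copy of the signal with known fill values set to NaN."""
    s = np.asarray(signal, dtype=float).copy()
    is_fill = np.zeros(s.shape, dtype=bool)
    for fill in FILL_VALUES:
        # Assumption: small absolute tolerance to absorb float rounding.
        is_fill |= np.isclose(s, fill, rtol=0.0, atol=1e-3)
    # Assumption: "approximately" is read as within 10% of 1e37.
    is_fill |= np.isclose(s, LARGE_FILL, rtol=0.1, atol=0.0)
    s[is_fill] = np.nan
    return s
```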

Step 2: Filtering Plateaus
The plateau filter applies to data points that fulfill both of the following conditions: (a) the centered rolling maximum RSL of three data points is below −85 dB, and (b) the centered rolling standard deviation of RSL is smaller than 0.5 dB. Additionally, data points that are adjacent (next and second to next) to such plateaus are filtered. The threshold of −85 dB was chosen as the distribution of RSL exhibits a peak for lower values, which is not explicable by rainfall induced attenuation.
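The two conditions and the removal of adjacent points can be sketched with pandas rolling windows. This is an illustration under stated assumptions: the window of the rolling standard deviation is not given in the text and is set to three data points here, and adjacency is interpreted as two data points on either side of a detected plateau. Function and parameter names are hypothetical.

```python
import numpy as np
import pandas as pd

def plateau_mask(rsl, max_thresh=-85.0, std_thresh=0.5, window=3, pad=2):
    """Boolean mask of data points belonging to (or adjacent to) a plateau."""
    rsl = pd.Series(rsl, dtype=float)
    roll = rsl.rolling(window, center=True, min_periods=window)
    # Condition (a): rolling maximum below threshold; (b): low rolling std.
    core = (roll.max() < max_thresh) & (roll.std() < std_thresh)
    # Assumption: also flag the next and second-to-next points on both sides.
    widened = core.astype(float).rolling(2 * pad + 1, center=True,
                                         min_periods=1).max()
    return widened > 0

def filter_plateaus(rsl, **kwargs):
    """Set plateau data points to NaN."""
    rsl = pd.Series(rsl, dtype=float)
    return rsl.mask(plateau_mask(rsl, **kwargs))
```

Note that the first and last point of a plateau do not satisfy condition (a) themselves, because their rolling window still contains a regular RSL value; they are only removed through the adjacency rule.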

Step 3: Filling Blackout Gaps
Following the approach of Polz et al. (2023), we consider a period of missing values to be a blackout gap if the last RSL value before the gap or the first value after the gap is below −65 dB. In this step, these gaps are filled by the lowest RSL recorded by the CML over the whole month. Note that the maximum period that can be filled does not exceed 1 hr, i.e., at maximum half an hour after a gap starts and half an hour before the end of a gap. The gap is not filled at all if its length exceeds 1 hr.
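A minimal sketch of the gap filling follows, assuming a regularly sampled 1-min RSL series that covers one month. It follows the rule stated last in the step, that is, gaps longer than 1 hr are left untouched. Function and parameter names are hypothetical.

```python
import numpy as np
import pandas as pd

def fill_blackout_gaps(rsl, rsl_thresh=-65.0, max_gap=60):
    """Fill blackout gaps with the lowest RSL of the series.

    Returns the filled series and a boolean mask of filled time steps.
    Assumes 1-min sampling, so max_gap=60 corresponds to 1 hr.
    """
    rsl = pd.Series(rsl, dtype=float)
    values = rsl.to_numpy(dtype=float)
    filled = values.copy()
    mask = np.zeros(len(values), dtype=bool)
    fill_value = np.nanmin(values)  # lowest RSL recorded in the period
    is_nan = np.isnan(values)
    n, i = len(values), 0
    while i < n:
        if not is_nan[i]:
            i += 1
            continue
        start = i
        while i < n and is_nan[i]:
            i += 1
        end = i  # exclusive end of the gap
        before = values[start - 1] if start > 0 else np.nan
        after = values[end] if end < n else np.nan
        # Comparisons with NaN are False, so series edges never trigger.
        is_blackout = bool(before < rsl_thresh) or bool(after < rsl_thresh)
        if is_blackout and (end - start) <= max_gap:
            filled[start:end] = fill_value
            mask[start:end] = True
    return pd.Series(filled, index=rsl.index), pd.Series(mask, index=rsl.index)
```

The returned mask can be reused later, for example to treat the filled time steps as wet periods.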

Step 4: Filling 5-Min Gaps in the Time Series
The steps above depend on RSL. Step 4, in contrast, is based on the total loss, i.e., the difference between transmitted and received signal levels. If there are gaps in the total loss time series and if they do not exceed 5 min, we interpolate them linearly. If they exceed 5 min, they remain unaffected by this step.
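The gap-length condition matters in practice: limiting an interpolation to five consecutive values would still fill the first five minutes of a longer gap. A minimal sketch that interpolates only gaps of at most 5 min, assuming regular 1-min sampling (the function name is hypothetical):

```python
import numpy as np
import pandas as pd

def interpolate_short_gaps(total_loss, max_gap=5):
    """Linearly interpolate gaps of at most max_gap time steps."""
    s = pd.Series(total_loss, dtype=float)
    is_nan = s.isna()
    # Each valid value starts a new group; the NaNs following it share its id.
    gap_id = (~is_nan).cumsum()
    gap_len = is_nan.groupby(gap_id).transform("sum")
    short = is_nan & (gap_len <= max_gap)
    # Interpolate everything between valid values, then keep short gaps only.
    interpolated = s.interpolate(method="linear", limit_area="inside")
    return s.where(~short, interpolated)
```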

Step 5: Filter Due To Fluctuations in the Time Series
This filter comprises two tests: (a) the 5-hr rolling standard deviation of the total loss exceeds 2 dB at least 10% of the time; (b) the 1-hr rolling standard deviation of the total loss exceeds 0.8 dB at least 33% of the time. All CMLs that fulfill at least one of these conditions are removed in this step.
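The two tests can be sketched as follows. The thresholds are those given in the text; the 1-min sampling, the centered windows, and the minimum number of valid samples per window are assumptions, and the function names are hypothetical.

```python
import numpy as np
import pandas as pd

def fluctuation_flags(total_loss, samples_per_hour=60):
    """Evaluate tests (a) and (b) for one CML total loss time series."""
    tl = pd.Series(total_loss, dtype=float)

    def fraction_above(hours, threshold):
        window = int(hours * samples_per_hour)
        # Assumption: centered window, at least half of it must be valid.
        std = tl.rolling(window, center=True, min_periods=window // 2).std()
        valid = std.dropna()
        return float((valid > threshold).mean()) if len(valid) else 0.0

    test_a = fraction_above(5, 2.0) >= 0.10   # 5-hr std > 2 dB, >= 10% of time
    test_b = fraction_above(1, 0.8) >= 0.33   # 1-hr std > 0.8 dB, >= 33% of time
    return test_a, test_b

def remove_due_to_fluctuation(total_loss, samples_per_hour=60):
    """True if the CML fulfills at least one of the two conditions."""
    return bool(any(fluctuation_flags(total_loss, samples_per_hour)))
```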

Step 6: Filter by Detection Limit
The detection limit is defined as the minimum rainfall required to induce an observable change in the signal of a CML. It is calculated via the frequency and length of the CML, a fixed quantization of 0.33 dB, and the k-R relation with parameters defined by ITU-R (2005). Step 6 removes CMLs with a detection limit of at least 2 mm/hr. This threshold was chosen heuristically but based on the fact that a large proportion of the rainfall amount in the Central European climate can be attributed to rain rates below this value. Hence, CMLs that cannot sense such low intensity rainfall are neglected.
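As an illustration, the detection limit can be obtained by inverting the power-law k-R relation k = a R^b (k in dB/km, R in mm/hr) for a path attenuation equal to the quantization. The coefficients a and b depend on frequency and polarization and have to be taken from the ITU-R recommendation; they are passed in as arguments here, and the numbers in the usage note are placeholders, not ITU-R values. Function names are hypothetical.

```python
def detection_limit(length_km, a, b, quantization_db=0.33):
    """Minimum rain rate (mm/hr) causing one quantization step of attenuation.

    Solves a * R**b * length_km = quantization_db for R.
    """
    return (quantization_db / (a * length_km)) ** (1.0 / b)

def passes_detection_limit_filter(length_km, a, b, max_limit=2.0):
    """Step 6 keeps a CML only if its detection limit is below 2 mm/hr."""
    return detection_limit(length_km, a, b) < max_limit
```

With the placeholder coefficients a = 0.1 and b = 1.0, a 3.3 km CML has a detection limit of 1 mm/hr and is kept, whereas a 0.1 km CML has a detection limit of 33 mm/hr and is removed. This reflects why very short CMLs are the ones with high detection limits.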

Rain Rate Retrieval and Spatial Interpolation
We calculate rain rates via a procedure of four steps based on the total loss time series of individual CMLs: (a) a classification of wet and dry periods based on the rolling standard deviation over 1 hr, (b) a subtraction of the baseline that is assumed to be constant during a rain event, (c) correction for signal loss induced by wet antennas based on the method proposed by Leijnse et al. (2008), (d) calculation of rainfall based on the k-R relation with parameters defined by ITU-R (2005). The minutely rain rates are aggregated to hourly rainfall amounts, which are then used to estimate rainfall maps via inverse distance weighting (IDW). For further details we refer to Graf et al. (2020), who explain the rain rate retrieval and interpolation in more detail. We follow their approach with the following two exceptions. First, we always classify the blackout gaps that have been filled according to Step 3 (see Section 3.1) as wet periods; second, we only consider neighbors within a maximum distance of 30 km in the IDW algorithm.
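The distance-limited interpolation can be sketched as follows. Only the 30 km search radius is taken from the text; representing each CML by a single point (e.g., its path midpoint), the weighting exponent of 2, and projected coordinates in kilometers are assumptions made for this illustration. The function name is hypothetical.

```python
import numpy as np

def idw_interpolate(xy_obs, values, xy_grid, max_dist_km=30.0, power=2.0):
    """Inverse distance weighting using only neighbors within max_dist_km.

    xy_obs:  (n, 2) observation coordinates in km
    values:  (n,) hourly rainfall amounts without missing values
    xy_grid: (m, 2) target coordinates in km
    Returns (m,) estimates; NaN where no observation is within range.
    """
    xy_obs = np.asarray(xy_obs, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_grid = np.asarray(xy_grid, dtype=float)
    # Pairwise distances between grid points and observations.
    dist = np.linalg.norm(xy_grid[:, None, :] - xy_obs[None, :, :], axis=2)
    in_range = dist <= max_dist_km
    # Clip tiny distances to avoid division by zero at observation points.
    weights = np.where(in_range, 1.0 / np.maximum(dist, 1e-6) ** power, 0.0)
    weight_sum = weights.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        estimate = (weights @ values) / weight_sum
    estimate[weight_sum == 0] = np.nan
    return estimate
```

Grid points without any CML within 30 km remain empty (NaN) rather than being extrapolated from distant sensors.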

Analysis Setup
In the first part of the analyses (Section 4.1), we show the effect of all the steps considered for quality control. In Section 4.2, we quantify the quality of path-averaged rain rates in a comparison to RADOLAN-RW along those paths using the performance indices mean absolute error (MAE), bias (BIAS), and Pearson correlation coefficient (PCC) defined as follows:
MAE = \mu(|R_{CML} - R_{RAD}|), BIAS = \mu(R_{CML} - R_{RAD}) / \mu(R_{RAD}), PCC = cov(R_{CML}, R_{RAD}) / (\sigma(R_{CML}) \sigma(R_{RAD})) (2)
where \mu(.), cov(.), \sigma(.) are the mean, the covariance, and the standard deviation over time, respectively. R_{CML} and R_{RAD} are the CML and the RADOLAN-RW hourly rainfall amounts, respectively. Finally, in Section 4.3, the rainfall maps are evaluated qualitatively.
For the latter part of the analyses, we distinguish three processing lines which differ in the selection of steps used for quality control (see Section 3.1). We refer to these processing lines by the terms No Filter, Graf 2020, and Full. In the No Filter case, only the basic Step 1 is performed. In the Graf 2020 case, Steps 4 and 5 are performed in addition. These steps have been adopted from Graf et al. (2020) and hence, this processing line represents a current standard approach of dealing with data quality adjusted to a purely German data set. In the processing line Full, all steps defined in Section 3.1 are performed. The steps are always conducted in the order used above. We analyze the effect of the different processing lines in Sections 4.2 and 4.3. In Section 4.1, we analyze all the mentioned steps conducted for quality control and do not distinguish between the processing lines.
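A minimal sketch of how the three indices can be computed for one CML from paired hourly amounts. Treating BIAS as a relative quantity (mean difference divided by the mean reference amount) is an assumption made here because the reported BIAS values carry no unit; the function name is hypothetical.

```python
import numpy as np

def performance_metrics(r_cml, r_rad):
    """MAE, BIAS, and PCC over time for one CML and its path-based reference."""
    r_cml = np.asarray(r_cml, dtype=float)
    r_rad = np.asarray(r_rad, dtype=float)
    # Use only hours in which both values are available.
    valid = ~np.isnan(r_cml) & ~np.isnan(r_rad)
    c, r = r_cml[valid], r_rad[valid]
    diff = c - r
    mae = np.mean(np.abs(diff))
    # Assumption: relative bias, normalized by the mean reference amount.
    bias = np.mean(diff) / np.mean(r)
    pcc = np.cov(c, r)[0, 1] / (np.std(c, ddof=1) * np.std(r, ddof=1))
    return {"MAE": float(mae), "BIAS": float(bias), "PCC": float(pcc)}
```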

Effect of Quality Control Algorithms
Figure 2 showcases the effect of the steps introduced in Section 3.1 and provides statistics on the amount of data that is affected. The first example shows that there would be extremely high rain rates toward the end of the shown period if the plateau filter were not active. The second example shows how blackout gap filling can help to capture a rain event that otherwise would have been missed. In the third example, a strongly fluctuating CML yields rain events far too often and without correlation to the reference. The fourth example presents a CML with a high detection limit; although it captures most of the rain events, the amount is generally far too high and even minor changes in RSL suggest strong rain. In the latter two examples, the respective CMLs are removed completely from the analysis when considering the Full processing line.
The lower part of Figure 2 shows that only a small amount of data is affected by the plateau filter and the blackout gap filling. Affected hours are defined as hours in which at least 10 min are labeled either as a plateau or as a blackout gap. Only for the class of data points that are associated with high reference rain rates and either very low or very high detection limits does the plateau filter affect a larger share. For the blackout gap filling, there is a clear positive correlation between the number of affected hours and reference rain rates. Moreover, CMLs with lower detection limits are affected more often. Similarly, mostly the CMLs with low detection limits are affected by high fluctuation. By definition, the filter based on the detection limit affects only the class with the highest detection limit.

Path-Based Quantitative Analysis
We analyzed path-averaged rain rates of the CMLs in the border region for 1 month by comparison to RADOLAN-RW along the CML paths and analyzing the performance metrics. Figure 3 shows CML quantities and the performance metrics dependent on detection limit, the kind of quality control algorithms, and the country.
The boxplots indicate the spread over the CML dimension.
The different range of detection limits of the two data sets can be seen in the upper row of Figure 3. A reduction of the number of CMLs can be observed in the figure and is explained by the filtering involved in the quality control steps. While this reduction affects the data sets of both countries, it is clearly more pronounced for the Czech data set. Starting from No Filter, the additional filtering of Graf 2020 affects CMLs of all classes of detection limits.
The additional filters (Full) almost only affect the Czech CMLs and primarily the ones of high detection limits.
The performance metrics depend on the detection limit. All three metrics deteriorate toward the classes of high detection limits. This can be seen by worse median values and, in a more pronounced manner, by the worse mean values and the marked skewness of the distributions, especially for the detection limit classes >0.5 mm/hr. The BIAS additionally shows a general increase with detection limit: While CMLs with very low detection limits (e.g., the ones of E-band frequency) tend to underestimate the rainfall amounts, the ones with high detection limits tend to overestimate. The effect of the detection limit can mainly be seen in the Czech data, where each class contains a considerable number of CMLs. For Germany, the effect is less clear due to the small number of CMLs with detection limits above 1 mm/hr in the German data set. Nonetheless, outliers in MAE and BIAS are more prevalent for the detection limit class of 0.5-1 mm/hr compared to the class 0.1-0.5 mm/hr, also in the German data. Note that for readability not all outliers of MAE and BIAS are shown in Figure 3.
The performance metrics also depend on the three processing lines and the two countries. Considering the effect independent of the detection limit, that is, focusing on the shaded parts of Figure 3, the following observations can be made. A reduction of outliers with extended processing can be observed throughout. While the median of the MAE varies only little for the different processing lines, the number of outliers is clearly reduced by the enhanced processing. This can particularly be seen for the Czech data by the improved mean values: for example, the MAE of the Czech CMLs has the values 0.19, 0.19, and 0.11 mm for the No Filter, the Graf 2020, and the Full processing lines, respectively. A similar observation can be made for the BIAS, where the medians are very close to zero for all processing lines, but where the mean values of No Filter (0.46) and Graf 2020 (0.46) are clearly higher than those of the processing line Full (0.02). Independent of the country, the PCC improves with increasingly effective processing both in terms of the median (from 0.89 to 0.91 to 0.93 for the German data, and from 0.84 to 0.84 to 0.86 for the Czech data) and in terms of a reduced number of outliers and increasing mean values.
The effect of the processing lines can also be seen within individual classes of detection limit. Especially for the category of CMLs with detection limits in the range 0.5-1 mm/hr, the extended processing affects the mean of MAE and BIAS strongly, while the effects on the quantiles depicted in the boxplots are small. This shows that the extended processing mainly reduces the number of outliers and thereby their influence on the metrics.

Rainfall Maps
Figure 4 shows rainfall maps for an event (21 June, 21:50 to 22 June, 4:50) that traverses the German-Czech border. The CML-derived maps capture the event well. That is, via all of the processing lines (first three rows in Figure 4) it is possible to generate maps that reproduce the overall pattern of the event.
Nevertheless, particularly for the No Filter processing line, several shortcomings can be observed. For example, there are spots of overestimation. These appear most prominently in the cities and towns in the Czech Republic where the CML networks are dense (e.g., in Prague, located within the red square in the upper left map of Figure 4, and Strakonice, encapsulated by the orange square in the panel of the last time step). Moreover, there are white spots of underestimation within the rainfall field, particularly at the time stamp 01:50 (highlighted by a purple square). Furthermore, for the first two time steps in which the rainfall is mostly located over Germany (region highlighted by magenta square), the high spatial variability as well as the high amounts observable in the reference are only weakly represented in the CML-derived maps.
Positive effects of extended quality control algorithms can be observed by comparing the different processing lines. The spots of overestimation in the Prague region are present in all time steps for the No Filter case, and also in the Graf 2020 case, but no longer when applying the Full processing. The local false rainfall in Strakonice is already removed via the Graf 2020 processing. The extended processing also helps to reduce some of the white spots that appear while the rain event is located over the westernmost part of the Czech Republic (e.g., time step 01:50), though several of these spots persist. The underestimation in the time steps 21:50 and 22:50 is reduced from the No Filter to the Full processing lines, although the representation of the spatial variability remains limited.

Discussion and Conclusion
We found that two individual CML data sets can be processed consistently with acceptable results even when applying algorithms that had been adjusted to only one of them and for a different period. However, while this holds for many CMLs and over most periods, it produces unrealistic rainfall amounts in some situations, which, despite their rarity, can have a strong influence on the maps.
Thereby, this study confirmed that it is crucial to deal with quality control when using CML data for rainfall estimation. Not only the frequency and the length distributions that determine the detection limit of the CMLs, but also unreliable periods or gaps in the time series of individual CMLs need to be considered. Some issues such as blackout gaps and CMLs with high fluctuations in the signal exist in both data sets. Others, like the periods we refer to as plateaus, are only observable in the Czech data set. Global processing algorithms are required that address the individual characteristics but still allow a consistent treatment of all available data. The need to extend the set of algorithms developed for one data set when applied to a different independent data set shows precisely the degree to which established routines are transferable, and where they are insufficient.
We applied and analyzed quality control algorithms which we partly adopted from Graf et al. (2020) and partly developed in this study. These algorithms involve filtering, that is, a reduction of the amount of data, which is generally not desirable. However, filtering is less problematic for generating rainfall maps if the sensor density is high in relation to the resolution of the map. In this study, the majority of filtered CMLs are in the Czech Republic and often in the cities where the network is dense enough so that the loss of several devices with questionable observations is justifiable.

Figure 1. Data overview. Left: Sensor locations with the analyzed border region defined by the black box; the shaded background shows the coverage of RADOLAN-RW. Right: Distribution of frequency versus length of commercial microwave links (CMLs) within the analyzed region (German and Czech CMLs in upper and lower panel, respectively); dashed lines show levels of detection limit.

Figure 2. The effects of quality control algorithms. Four exemplary time series that have been treated by one of the steps of Section 3.1 are shown in the upper part. Statistics on the abundance of similar occurrences are presented in the lower part. The left column treats period-based steps, and the right column steps that affect commercial microwave links (CMLs) as a whole. In the left column, affected means that at least 10 min per hour are either filtered (plateau filter) or filled (blackout gap filling). In the right column, the percentage of CMLs affected by either fluctuation or high detection limit is shown. Note that this analysis relates to the processing line Full in which all the steps of quality control are conducted.

Figure 3. A path-based quantitative analysis for the whole month (June 2021). The commercial microwave links (CMLs) are categorized into classes of detection limit and three different processing lines are shown by different color intensities of bars and boxplots. By definition, CMLs are not available in the Full processing for the highest class of detection limits. The first row shows the number of CMLs in each class. The latter three rows show the mean absolute error (MAE), the BIAS, and the Pearson correlation coefficient, respectively. The shaded part of the figures considers all CMLs independent of their detection limits. The left and right columns consider the German and Czech CMLs, respectively. For Germany, two classes of detection limit contain very few CMLs and for those the metrics are shown as individual points for each CML instead of boxplots. Note that not all data points lie within the presented range of values for the MAE and the BIAS.

Figure 4. Maps of a rainfall event (21 June, 21:50 to 22 June, 4:50) (time progressing from left to right). The first three rows are interpolations based on commercial microwave links for the different processing lines. The bottom row is the reference RADOLAN-RW. A comparison of the Full processing and RADOLAN-RW in a movie sequence can be found in Blettner (2023).