Homogenization of German daily and monthly mean temperature time series

Long time series can potentially be influenced by breakpoints. This study presents an automatic procedure to detect breakpoints in German daily and monthly mean temperature time series and to homogenize the affected series. Metadata information is used to verify the breakpoints. The homogenization tool is evaluated with a synthetic data set; the homogenized data show a smaller mean bias and RMSE than the unhomogenized data. In most cases, fewer and smaller breakpoints are detected in the homogenized German temperature data than in the raw data. The mean difference between the trends before and after homogenization is small (0.05 K/length of each time series). The results of the homogenization are presented in three case studies. After homogenization, the trends in these cases are closer to the trend for Germany calculated from a gridded data set.


| INTRODUCTION
To study climate change, long homogeneous time series are needed. These time series can be influenced by effects not related to the climate, which can lead to an incorrect interpretation of climate change. For that reason, long time series have to be tested for inhomogeneities and homogenized if breaks are detected.
Changes in the geographical position of the station (station relocation), the measurement environment (e.g., growing trees), measurement practice (e.g., observation times), measurement systems (e.g., automatization) and the formulas and time ranges used (e.g., to calculate daily mean or daily extreme values) can result in breakpoints in long time series (WMO-No.1245, 2020). One example of an inhomogeneous temperature time series is Hohenpeißenberg, analysed and homogenized by Winkler (2009). Breaks in the Hohenpeißenberg time series are caused by altered glass of the Palatina thermometer, recalibration of the thermometer, drift of the zero point caused by the glass composition and changes of the observing times (from 07:00, 14:00 and 21:00 to 08:00, 14:00 and 20:00) that influence the daily mean temperature values. Trewin (2010) summarized effects leading to breakpoints in long temperature time series. For example, a station relocation (even by a few meters) can change the temperature measurements drastically, as can changes in the type of screen used to protect the thermometers against radiation (Böhm et al., 2010; Brandsma & Van der Meulen, 2008; Brunet et al., 2011). Different homogenization methods exist to meet different homogenization requirements. Most methods use reference time series calculated from nearby stations ('relative methods'). In most cases, annual or monthly temperature data are homogenized (Auer et al., 2007; Begert et al., 2005; Hannart et al., 2014). Several homogenization software packages (e.g., MASH, PRODIGE and ACMANT) were evaluated in the European project COST ES0601 (HOME). After the project, the software HOMER was developed (Mestre et al., 2013), which has been used in many homogenization studies (e.g., Delvaux et al., 2019; Skrynyk et al., 2019; Yosef et al., 2018).
Different statistical methods have been developed to homogenize daily data (compared with the homogenization of monthly or annual data). Daily data have a higher variability and are influenced by autocorrelation. Breaks in daily time series not only change the mean value but can also influence the higher-order moments. For that reason, most methods adjust the whole distribution instead of only the mean value. One example is the method 'HOM' developed by Della-Marta and Wanner (2006). With the help of a highly correlated reference series, the time series are homogenized using a nonlinear model (fitted in a homogeneous subperiod). Correction factors are defined for every decile of the cumulative distribution function, so that the complete distribution can be homogenized. 'SPLIDHOM' is another homogenization method that adjusts the higher-order moments with an indirect nonlinear regression method (Mestre et al., 2011). Further examples of daily data homogenization are Toreti et al. (2010), Kuglitsch et al. (2009), Hewaarachchi et al. (2017), Lund et al. (2007) and Squintu et al. (2018).
In Germany, the impact of automatization on the homogeneity of long time series of selected parameters has been analysed using parallel measurements at 13 German climate reference stations (as recommended by WMO-No.100 (2018)). The results for daily sunshine duration are summarized in Hannak et al. (2019). Kaspar et al. (2016) analysed the impact of automatization on long air temperature series using the data of the German climate reference stations. The differences between manual and automatic temperature measurements at the traditional observing times are small. These investigations conclude that in Germany, automatization poses no risk of inhomogeneities in long time series of daily mean temperature. To study the effect of breakpoints and homogenization during the period of parallel measurements, Hannak et al. (2020) homogenized the parallel temperature measurements and compared the results with those of Kaspar et al. (2016). The results are comparable: the breaks during the parallel measurement period (about 10 years of data) compensate each other, and the mean differences between automatic and manual measurements are close to zero. The procedure used there to homogenize the time series is similar to the one used in this study, with some adaptations.
In this study, German daily and monthly mean temperature time series are tested for breakpoints and homogenized using reference series (calculated from neighbouring stations). The procedure to test and homogenize the data is fully automatic. The first part describes the homogenization of German daily mean temperature series. The procedure for daily data is evaluated using a synthetic data set produced by Killick et al. (2019). Afterwards, the homogenization of German monthly mean temperature series is introduced. To obtain a consistent data set of daily and monthly data, the homogenized daily data are combined with the homogenized monthly time series (which in some cases go further back in time). Case studies demonstrate the results of the homogenization. The final part discusses the key findings of the study.

| DATA
In this study, German daily and monthly mean temperature series are tested for breakpoints and homogenized if needed. Further information on the stations used can be found online in the Climate Data Center (CDC) of the German weather service (DWD) (see https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/monthly/kl/historical/).
Observation instructions and measurement systems were not uniform in Germany in the past and changed over time (Kaspar et al., 2013). For example, daily mean temperature values were calculated at most stations in Germany using the traditional equation after Kämtz (Kämtz (1831), see also Kaspar et al. (2016)). This equation is based on three observations per day, with double weight on the evening value. In the former German Democratic Republic, daily mean values were calculated from different observing times than in the western part of Germany (6:00, 12:00, 18:00 and 24:00 instead of 7:00, 14:00 and 21:00). Since the automatization of temperature measurements, the arithmetic mean of 24 hourly values per day has been used operationally. Kaspar et al. (2016) show that the mean difference between these two equations is small. Additionally, the number of stations and the available temporal resolution of the data change over time (see Figure 1).
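The two daily-mean definitions contrasted above can be sketched as follows. This is a minimal illustration, not DWD code: the function names are hypothetical, and the Kämtz equation is written in its usual form (T07 + T14 + 2·T21)/4, which matches the description of three readings with a double-weighted evening value. The GDR four-term scheme is not shown because its exact formula is not given here.

```python
def kaemtz_daily_mean(t07: float, t14: float, t21: float) -> float:
    """Traditional daily mean after Kämtz: three readings, evening value weighted twice."""
    return (t07 + t14 + 2.0 * t21) / 4.0


def hourly_daily_mean(hourly: list[float]) -> float:
    """Arithmetic mean of 24 hourly values, as used since the automatization."""
    if len(hourly) != 24:
        raise ValueError("expected exactly 24 hourly values")
    return sum(hourly) / 24.0


# Illustrative readings (made-up numbers in °C)
print(kaemtz_daily_mean(10.0, 16.0, 12.0))  # 12.5
```

Applying both functions to the same day of hourly data gives a quick impression of the size of the difference that Kaspar et al. (2016) quantified systematically.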
The German temperature data sets are quality controlled; for example, the operational data are checked for completeness and consistency (Kaspar et al., 2013). The measured values are compared with monthly threshold values and checked manually when a threshold value is reached. The thermometers are calibrated, and since the automatization, two instruments have been used in parallel. If the difference between the two thermometers becomes large (1 K), the measured value is checked manually.
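The two manual-check triggers described above can be sketched as a small rule function. This is a hedged illustration: the monthly limits passed in as `thresholds` are hypothetical placeholders (the actual DWD thresholds are not given here), and treating a difference of exactly 1 K between the parallel thermometers as "large" is an assumption.

```python
def qc_flags(value, month, thresholds, parallel_value=None, max_diff=1.0):
    """Return the reasons why a temperature value should be checked manually.

    thresholds: hypothetical mapping month -> (lower, upper) limit in °C.
    parallel_value: reading of the second (parallel) instrument, if available.
    """
    flags = []
    lower, upper = thresholds[month]
    # A value that reaches a monthly limit is passed on for manual checking
    if value <= lower or value >= upper:
        flags.append("threshold")
    # Large disagreement between the two parallel thermometers (assumed >= 1 K)
    if parallel_value is not None and abs(value - parallel_value) >= max_diff:
        flags.append("parallel_difference")
    return flags
```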

| PROCEDURE OF BREAKPOINT DETECTION AND HOMOGENIZATION OF DAILY MEAN VALUES
The process of breakpoint detection and homogenization can be subdivided into several steps (see Figure 2). All steps run fully automatically; no manual intervention is needed.

| Step 1: Calculation of reference series
The first step is to calculate a reference series from several nearby stations. The correlation coefficients between the station under study (candidate) and the surrounding stations (here within 150 km) are determined. Day-to-day changes of the mean temperature are used to prevent breaks from influencing the correlation coefficients too strongly. Only stations with a correlation coefficient greater than or equal to a critical value (here 0.9) are used to estimate the reference series after Alexandersson and Moberg (1997):

$$ \mathrm{ref} = \frac{\sum_j cor_j^2 \, (x_j - \bar{x}_j + \bar{y})}{\sum_j cor_j^2} $$

in which $x_j$ stands for the time series of the nearby stations, $\bar{x}_j$ for their mean values, $\bar{y}$ is the mean value of the candidate time series and $cor_j$ are the correlation coefficients of each station. Only stations with a minimum observation period (here at least 1000 days) are used. If fewer than a defined number of stations (here six) meet the correlation criterion, the six stations with the largest correlation coefficients are used to calculate the reference series after Alexandersson and Moberg (1997).
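The station selection and reference calculation described above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the inputs are aligned daily arrays with NaN for missing values, the 150 km pre-selection is assumed to have been done already, and the weighting by squared correlation coefficients follows the usual Alexandersson and Moberg (1997) formulation (an assumption about the exact weighting used in this study). The means are taken over each full series rather than a common period, which is a simplification; function names are hypothetical.

```python
import numpy as np


def select_neighbours(cand, neighbours, r_min=0.9, n_min=6, min_days=1000):
    """Pick reference stations by correlation of day-to-day changes.

    cand: 1-D array of daily means (NaN = missing).
    neighbours: dict name -> array aligned with cand (assumed already within 150 km).
    Returns dict name -> correlation coefficient of the selected stations.
    """
    d_cand = np.diff(cand)
    scores = {}
    for name, x in neighbours.items():
        if np.count_nonzero(~np.isnan(x)) < min_days:
            continue  # observation period too short
        d_x = np.diff(x)
        ok = ~np.isnan(d_cand) & ~np.isnan(d_x)
        if ok.sum() < 3:
            continue
        scores[name] = np.corrcoef(d_cand[ok], d_x[ok])[0, 1]
    ranked = sorted(scores, key=scores.get, reverse=True)
    chosen = [n for n in ranked if scores[n] >= r_min]
    if len(chosen) < n_min:
        chosen = ranked[:n_min]  # fall back to the n_min best-correlated stations
    return {n: scores[n] for n in chosen}


def reference_series(cand, neighbours, weights):
    """Correlation-weighted reference series in the style of Alexandersson and Moberg (1997)."""
    y_mean = np.nanmean(cand)
    num = np.zeros(len(cand))
    den = np.zeros(len(cand))
    for name, r in weights.items():
        x = neighbours[name]
        ok = ~np.isnan(x)
        num[ok] += r**2 * (x[ok] - np.nanmean(x) + y_mean)
        den[ok] += r**2
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den  # NaN on days where no selected neighbour has data
```

Using first differences for the correlation keeps a single step change in one series from dominating the coefficient, which is the motivation given in the text.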

| Step 2: Breakpoint detection
Differences between the reference series and the candidate series (difference series) are used to detect breaks. In this study, the R function 'uniseg' from the R package 'cghseg' is used (Picard et al., 2011, 2016). Instead of daily data, monthly mean differences (candidate minus reference) are used for the breakpoint detection. The detected breaks subdivide the time series into segments. The segments are tested with a t-test for significant changes in the mean, and only significant breaks are kept. The breaks detected in the monthly differences define the time ranges in which breaks can be expected.
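The monthly aggregation and the significance filter described above can be sketched as follows. This does not reimplement 'uniseg': the candidate break positions are assumed to come from any segmentation routine. The text does not specify the t-test variant, so a Welch statistic compared with a fixed critical value (a large-sample normal approximation at the 5 % level) is used here as an assumption; all names are hypothetical.

```python
import numpy as np
import pandas as pd


def monthly_differences(cand: pd.Series, ref: pd.Series) -> pd.Series:
    """Monthly means of the daily difference series (candidate minus reference)."""
    return (cand - ref).resample("MS").mean()


def welch_t(a: np.ndarray, b: np.ndarray) -> float:
    """Absolute Welch two-sample t statistic for a difference in means."""
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    if se == 0:
        return 0.0 if a.mean() == b.mean() else float("inf")
    return abs(a.mean() - b.mean()) / se


def significant_breaks(diff, breaks, t_crit=1.96):
    """Keep candidate breaks whose neighbouring segments differ significantly in mean.

    breaks: indices at which a new segment starts (e.g. from a segmentation routine).
    t_crit: assumed critical value (normal approximation, 5 % level).
    """
    values = np.asarray(diff, dtype=float)
    cuts = sorted(breaks)
    bounds = [0] + cuts + [len(values)]
    kept = []
    for i, b in enumerate(cuts, start=1):
        left = values[bounds[i - 1]:b]
        right = values[b:bounds[i + 1]]
        left, right = left[~np.isnan(left)], right[~np.isnan(right)]
        if len(left) >= 2 and len(right) >= 2 and welch_t(left, right) > t_crit:
            kept.append(b)
    return kept
```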
To determine the break date detected in monthly data more precisely, a window of ±2 months around the break is searched for the breakpoint in daily data. If 'uniseg' cannot detect a break in this window using daily data, the break date from the monthly data is used in the further steps.
An example is illustrated in Figure 3 (see Hannak et al., 2020).
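The refinement of a monthly break date in the daily data can be sketched as follows. As a stand-in for 'uniseg', this hypothetical helper searches the ±2-month window for the single split that maximizes a two-sample t statistic; the minimum segment length and the detection threshold `t_min` are assumptions, not values from the study. If no split exceeds the threshold, the monthly date is returned, mirroring the fallback described in the text.

```python
import numpy as np
import pandas as pd


def refine_break(diff_daily: pd.Series, monthly_break: pd.Timestamp,
                 months: int = 2, min_len: int = 10, t_min: float = 3.0) -> pd.Timestamp:
    """Refine a monthly break date using the daily difference series.

    Searches +/- `months` around `monthly_break` for the single split that
    maximizes the two-sample t statistic (a simple stand-in for 'uniseg').
    Falls back to the monthly date if no split exceeds `t_min`.
    """
    window = diff_daily.loc[monthly_break - pd.DateOffset(months=months):
                            monthly_break + pd.DateOffset(months=months)].dropna()
    x = window.to_numpy(dtype=float)
    best_k, best_t = None, t_min
    for k in range(min_len, len(x) - min_len + 1):
        a, b = x[:k], x[k:]
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        if se == 0:
            continue
        t = abs(a.mean() - b.mean()) / se
        if t > best_t:
            best_k, best_t = k, t
    # The returned date is the first day of the new segment
    return monthly_break if best_k is None else window.index[best_k]
```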

| Step 3: Identification of breaks in nearby stations and recalculation of reference series and breakpoint detection
A breakpoint in the difference series can be caused by an inhomogeneous reference series or by an inhomogeneous candidate series. To identify the inhomogeneous time series, the breakpoints are tested by pairwise comparison. Within ±3 years around each detected break (see Section 3.2), the difference series between each nearby station and the candidate series is tested for breakpoints. If the break is detected in at least half of these pairwise difference series, this may indicate an inhomogeneous candidate series. Breaks detected at the beginning or at the end of the time period are not considered. Furthermore, breaks with a signal-to-noise ratio (SNR) of less than 0.5 are not counted. The SNR is calculated as

$$ \mathrm{SNR} = \frac{D}{\sigma} $$

in which $D$ is the difference between the mean values of the daily data before and after the break (Lindau & Venema, 2016). In the case of multiple breaks in the time series, $D$ is calculated from the mean values of two subsequent segments, and the standard deviation $\sigma$ of the first segment is used.
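The pairwise confirmation rule and the SNR filter described above can be sketched as follows. This is a hedged illustration: the SNR is written as |D|/σ, with σ the sample standard deviation of the first segment (taking the absolute value and using ddof=1 are assumptions), and each pairwise test result is represented by a hypothetical tuple (detected, at window edge, SNR).

```python
import numpy as np


def snr(before, after) -> float:
    """SNR = |D| / sigma: D is the difference of the segment means,
    sigma the standard deviation of the first segment (assumed form)."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    return abs(after.mean() - before.mean()) / before.std(ddof=1)


def count_confirmations(pairwise, snr_min=0.5):
    """pairwise: one (detected, at_window_edge, snr) tuple per nearby station."""
    return sum(1 for detected, at_edge, s in pairwise
               if detected and not at_edge and s >= snr_min)


def criterion1(hits: int, n_neighbours: int) -> bool:
    """At least half of the pairwise tests must confirm the break;
    with only two nearby stations, both must confirm it."""
    if n_neighbours == 2:
        return hits == 2
    return hits >= n_neighbours / 2
```

A count of exactly one confirmation points to the single nearby station involved, which is the case that the next step treats as an inhomogeneous neighbour rather than an inhomogeneous candidate.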
If the break is detected only in the pairwise comparison with one nearby station, this nearby station is considered inhomogeneous. The break date is used to split this nearby station into two time series, which are then treated as two separate stations. Afterwards, the reference series is calculated a second time, and the breakpoint detection is repeated with the difference series between the second reference series and the candidate series (as described in Section 3.2). The next step is to identify which breaks in the difference series indicate an inhomogeneous candidate series.

| Step 4: Identification of breaks in the candidate series
To identify the breaks in the candidate series, different criteria are used. If at least one of these criteria (criterion 1, 2 or 3) is true, the break is most likely caused by the candidate time series, and the next step is to use these breakpoints to split the candidate time series into segments and to homogenize the data. When breakpoints are explained with metadata information, the time range searched for information depends on the probability of detecting a break and on the SNR (Lindau & Venema, 2016). Therefore, only metadata information that is sufficiently close to the break date is used. Not all breaks can be identified as breakpoints of the candidate time series with these three criteria. Some breaks may be caused by an inhomogeneous reference series; in such cases, no homogenization is done.
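The splitting of an inhomogeneous nearby station at its break date (Step 3) can be sketched with pandas as follows. The helper and the '_before'/'_after' suffixes are hypothetical; values outside each part are masked with NaN so that both pseudo-stations stay aligned with the candidate and can enter a recalculated reference series as two separate stations. Assigning the break day itself to the later part is an assumption.

```python
import pandas as pd


def split_station(neighbours: dict, name: str, break_date) -> dict:
    """Replace an inhomogeneous neighbour by two pseudo-stations split at break_date.

    neighbours: dict name -> pd.Series with a DatetimeIndex.
    """
    series = neighbours[name]
    out = {k: v for k, v in neighbours.items() if k != name}
    # Mask the other part with NaN so the index stays aligned with the candidate
    out[f"{name}_before"] = series.where(series.index < break_date)
    out[f"{name}_after"] = series.where(series.index >= break_date)
    return out
```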

| Step 5: Homogenization
At the beginning of the homogenization, the time series is divided into segments. The most recent segment is used as the training period, and the other segments are adjusted to it, starting with the most recent ones.
If metadata information is available, its date is used as the break date instead of the date from the statistical breakpoint detection. The data are homogenized with the method 'HOM' (Della-Marta & Wanner, 2006). As described in the introduction, this method adjusts the complete distribution. The homogenization is done separately for each season; for example, to homogenize the winter values of a time series, the winter values of the break segment and of the training period are used.
If the segments are shorter than 1 year or the correction factors cannot be estimated by season, all data are used without separation into seasons.
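The idea of a decile-based, season-by-season adjustment can be sketched as follows. This is a simplified sketch loosely following HOM, not the HOM implementation of Della-Marta and Wanner (2006): a linear fit of candidate on reference in the training period stands in for HOM's nonlinear model, the decile correction factors are not smoothed, and the thresholds `min_len` and `min_per_season` are hypothetical. Months are grouped into the meteorological seasons DJF, MAM, JJA and SON (an assumption).

```python
import numpy as np

# Month -> season index: 0 = DJF, 1 = MAM, 2 = JJA, 3 = SON
SEASON = {12: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 3, 10: 3, 11: 3}


def decile_adjust(cand_train, ref_train, cand_break, ref_break):
    """Simplified decile-based adjustment of a break segment (HOM-like sketch)."""
    cand_train, ref_train = np.asarray(cand_train, float), np.asarray(ref_train, float)
    cand_break, ref_break = np.asarray(cand_break, float), np.asarray(ref_break, float)
    # Stand-in for HOM's nonlinear model: linear fit of candidate on reference (training period)
    slope, intercept = np.polyfit(ref_train, cand_train, 1)
    predicted = slope * ref_break + intercept
    # Bin the observed break-segment values by the deciles of their own distribution
    edges = np.quantile(cand_break, np.linspace(0.0, 1.0, 11))
    bins = np.clip(np.searchsorted(edges, cand_break, side="right") - 1, 0, 9)
    adjusted = cand_break.copy()
    for k in range(10):
        in_bin = bins == k
        if in_bin.any():
            # One correction factor per decile: mean of predicted minus observed
            adjusted[in_bin] += np.mean(predicted[in_bin] - cand_break[in_bin])
    return adjusted


def homogenize_segment(cand_train, ref_train, mon_train, cand_break, ref_break, mon_break,
                       min_len=365, min_per_season=30):
    """Adjust a break segment season by season; use all data if segments are too short."""
    ct, rt, mt, cb, rb, mb = (np.asarray(a) for a in (cand_train, ref_train, mon_train,
                                                       cand_break, ref_break, mon_break))
    if len(ct) < min_len or len(cb) < min_len:
        return decile_adjust(ct, rt, cb, rb)
    st = np.array([SEASON[m] for m in mt])
    sb = np.array([SEASON[m] for m in mb])
    out = np.empty(len(cb))
    for s in range(4):
        t_sel, b_sel = st == s, sb == s
        if t_sel.sum() < min_per_season or b_sel.sum() < min_per_season:
            return decile_adjust(ct, rt, cb, rb)  # seasonal factors cannot be estimated
        out[b_sel] = decile_adjust(ct[t_sel], rt[t_sel], cb[b_sel], rb[b_sel])
    return out
```

Binning by deciles is what lets the correction differ between cold and warm days, so that changes in the higher-order moments, not only in the mean, can be removed.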

| VALIDATION OF HOMOGENIZATION PROCEDURE
The evaluation of a homogenization procedure is limited by the lack of knowledge of the truth. For that reason, synthetic data are used to analyse the added value of the homogenization procedure. Killick et al. (2019) designed a synthetic data set of daily mean temperature for four regions in the USA (Wyoming, South East, North East and South West). Four different sets of realizations, called Worlds, were created.
World 1 is the best guess for the real-world scenario, World 2 is similar to World 1 but has an increased station density, World 3 contains only step changes as inhomogeneities, and World 4 has increased autocorrelation. Killick et al. (2019) recommend using the Wyoming region and all worlds to test a homogenization procedure.
In most cases, the mean differences between the homogenized and the clean time series are smaller than the mean differences between the broken and the clean time series (Figure 4). In Worlds 1 to 3, the homogenization captures most of the inhomogeneities, resulting in more homogeneous data (see Figure 5 for an example). In World 4, the standard deviation of some difference series of homogenized minus clean time series is larger than that of broken minus clean time series. The homogenization procedure probably has problems with the increased autocorrelation of this world (see Figure 6 for an example).
In summary, the mean bias relative to the clean data is smaller for the homogenized data than for the broken data in all worlds (Figure 7, left), and the RMSE decreases with homogenization (Figure 7, right).
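The two validation scores can be written down directly. This minimal sketch computes the mean bias and the RMSE of a series relative to the clean (true) data, skipping missing values; the function name and the tiny example arrays are made up for illustration only.

```python
import numpy as np


def bias_and_rmse(series, clean):
    """Mean bias and RMSE of a series relative to the clean (true) data, ignoring gaps."""
    d = np.asarray(series, dtype=float) - np.asarray(clean, dtype=float)
    d = d[~np.isnan(d)]
    return d.mean(), np.sqrt(np.mean(d ** 2))


# Made-up example: a broken series with a step change and a homogenized version
clean = np.array([0.0, 1.0, 2.0, 3.0])
broken = clean + np.array([0.0, 0.0, 0.8, 0.8])        # step change of 0.8 K
homogenized = clean + np.array([0.0, 0.1, -0.1, 0.0])  # small residual errors
print(bias_and_rmse(broken, clean))
print(bias_and_rmse(homogenized, clean))
```

Both scores are computed per station against the clean series, so the broken and homogenized versions can be compared one-to-one.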

| RESULTS OF HOMOGENIZATION USING DAILY MEAN TEMPERATURE SERIES OF GERMANY
The homogenization procedure is run twice. In total, 732 daily time series are homogenized. During the first round of homogenization, 2970 breaks are detected; 2032 of them are identified as breaks in the candidate series, and the corresponding time series are homogenized. Eight hundred eighty-one breaks are identified by the first criterion (pairwise comparison with neighbouring stations), 1424 by the second criterion (metadata information) and 346 by the third criterion (breaks before 1950). More than one criterion can be true for a single break.
Afterwards, the homogenization procedure is repeated: breaks are detected again, and the time series are homogenized a second time. During the second round of homogenization, 1711 breaks are detected and 967 are identified and used for the homogenization. Three hundred fifty-five breaks are identified by the first criterion (pairwise comparison with neighbouring stations), 717 by using metadata information (second criterion) and 162 because they occur before 1950 (third criterion).
Figure 8 shows the trends of each time series before homogenization (raw data), after the first homogenization phase and after the final homogenization phase, as well as the differences between the trends after the final homogenization and before homogenization. Note that the time series have different lengths and cover different time periods. For this reason, only changes in the trends are analysed (the same time series are used for the histogram of trends of the raw data and for the histogram of trends after the final homogenization). No conclusion about the trend in Germany can be drawn from these histograms, because the time series cover different time ranges and the stations used are not evenly distributed over Germany.
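The trend comparison above, expressed in K per length of each time series, can be sketched as follows. This is a hedged illustration: a least-squares linear trend is multiplied by the covered time span (last minus first year), which is an assumed reading of "K/length of each time series"; the function names are hypothetical.

```python
import numpy as np


def trend_over_length(years, values) -> float:
    """Linear trend expressed as total change over the series length (K/length)."""
    years = np.asarray(years, dtype=float)
    slope, _ = np.polyfit(years, np.asarray(values, dtype=float), 1)
    return slope * (years[-1] - years[0])


def trend_change(years, raw, homogenized) -> float:
    """Trend after minus trend before homogenization, for the same series and period."""
    return trend_over_length(years, homogenized) - trend_over_length(years, raw)
```

Because both trends are computed for the same station and period, the difference is comparable across stations even though the series themselves cover different time ranges.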
FIGURE 5 Example for World 1. Top: daily differences between station 68 and the reference series (calculated from nearby stations) and detected breaks (vertical lines); second row: monthly differences between station 68 and the reference series and detected breaks (vertical lines); third row: difference series between the homogenized time series of station 68 and the reference series (green line), difference series as in the top panel (black) and detected breaks (vertical lines); bottom row: homogenized minus clean data (green line) and broken minus clean data (red line).

Before homogenization, 3014 breaks are detected in 845 time series. After the final homogenization of daily data, 1247 breaks are still detected in 522 time series. The histogram of the break size is shown in Figure 9. The distribution of the break size has hardly changed, but the number of breaks is strongly reduced. The remaining breakpoints cannot be explained with the defined criteria; missing metadata may be one explanation.

| HOMOGENIZATION OF MONTHLY MEAN TEMPERATURE TIME SERIES
In some cases, monthly mean temperature series reach further into the past than the daily data. The monthly time series are homogenized with a different procedure (described in the following). Afterwards, the homogenized daily data (as described in this study) are used to calculate monthly mean temperature values, which are combined with the homogenized monthly data to obtain a consistent data set. As with the daily data, the monthly data are homogenized fully automatically.

FIGURE 6 Example for World 4. Top: daily differences between station 25 and the reference series (calculated from nearby stations) and detected breaks (vertical lines); second row: monthly differences between station 25 and the reference series and detected breaks (vertical lines); third row: difference series between the homogenized time series of station 25 and the reference series (green line), difference series as in the top panel (black) and detected breaks (vertical lines); bottom row: homogenized minus clean data (green line) and broken minus clean data (red line).

| Step 1: Reference series and breakpoint detection
As described in Section 3.1, the reference series is calculated by the method of Alexandersson and Moberg (1997).Only stations with a minimal time range of

| Step 2: Identification of breaks
Similar to Section 3.4, breaks are attributed to the candidate series using different criteria. The first criterion is a pairwise analysis of the difference series with each neighbouring station: if the break is detected in at least half of these difference series, the break is very likely in the candidate series. The second criterion uses metadata information within a given time range (dependent on the SNR) around the break date. If the time lag between the break and the metadata information of the candidate station, minus a tolerance range (here 18 months), is smaller than the time lag to the nearest metadata information of a neighbouring station, the break is rated as a break in the candidate series. In the third criterion, all breaks before 1950 are classified as breaks in the candidate series. If at least one of the three criteria applies, the break is attributed to the candidate series.

| Step 3: Homogenization
The time series are homogenized with a method called 'Linear Scaling', which adjusts the mean of the time series (similar to Vincent et al. (2002)).
As described in Hannak et al. (2020), the correction factor is the difference between the mean deviation of the candidate series from the reference series in the training period (most recent segment) and the corresponding mean deviation in the break segment. The correction factor is determined individually for each month of the year. If monthly correction factors cannot be calculated (because too few data are available), the complete data of the break segment and the training segment are used, and a single correction factor is determined. As for the daily data described in the first part, the monthly homogenization is done twice.
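The monthly 'Linear Scaling' correction described above can be sketched with pandas as follows. This is a minimal sketch: the correction for calendar month m is the mean candidate-minus-reference difference in the training segment minus that in the break segment, added to the break segment. The minimum sample size `min_per_month` that triggers the single-factor fallback is a hypothetical choice, and the function name is illustrative.

```python
import numpy as np
import pandas as pd


def linear_scaling(cand: pd.Series, ref: pd.Series, train_mask, break_mask, min_per_month=3):
    """Mean adjustment of a break segment with one correction factor per calendar month.

    cand, ref: monthly series on the same DatetimeIndex.
    train_mask, break_mask: boolean arrays marking the training and break segments.
    min_per_month: hypothetical minimum number of values per month and segment.
    """
    train_mask = np.asarray(train_mask, dtype=bool)
    break_mask = np.asarray(break_mask, dtype=bool)
    diff = cand - ref
    months = np.asarray(cand.index.month)
    out = cand.copy()
    factors = {}
    for m in range(1, 13):
        t = diff.loc[train_mask & (months == m)].dropna()
        b = diff.loc[break_mask & (months == m)].dropna()
        if len(t) < min_per_month or len(b) < min_per_month:
            factors = None  # too few data: fall back to a single correction factor
            break
        factors[m] = t.mean() - b.mean()
    if factors is None:
        c = diff.loc[train_mask].mean() - diff.loc[break_mask].mean()
        out.loc[break_mask] = cand.loc[break_mask] + c
    else:
        for m, c in factors.items():
            sel = break_mask & (months == m)
            out.loc[sel] = cand.loc[sel] + c
    return out
```

After the adjustment, the break segment has the same mean deviation from the reference as the training segment, month by month.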

| Step 4: Combination
Finally, the homogenized daily data are averaged to monthly mean values. These monthly time series are combined with the homogenized monthly time series from Section 6 (which reach further into the past than the daily data) such that the recent part comes from the homogenized daily data. The combined time series is tested for breakpoints. If the combination point between the monthly data derived from the homogenized daily data and the homogenized monthly data (described in Section 6) causes a break in the time series, the time series is homogenized with the method 'Linear Scaling'. The result is a consistent data set: the homogenized daily data are consistent with the final homogenized (and combined) monthly data.
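The combination step can be sketched with pandas as follows. This is a hedged sketch with a hypothetical function name: monthly means derived from the homogenized daily data take precedence wherever they exist, and older months are filled from the homogenized monthly series. The subsequent break test at the junction and a possible 'Linear Scaling' correction are not included, and incomplete months are not treated specially here.

```python
import pandas as pd


def combine_daily_and_monthly(daily_homog: pd.Series, monthly_homog: pd.Series) -> pd.Series:
    """Monthly series whose recent part comes from the homogenized daily data.

    daily_homog: homogenized daily means (daily DatetimeIndex).
    monthly_homog: homogenized monthly means indexed by month start.
    """
    from_daily = daily_homog.resample("MS").mean()
    # Values derived from daily data take precedence; older months come from the monthly series
    return from_daily.combine_first(monthly_homog)
```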

| Results of homogenization using combined monthly temperature series of Germany
The mean break size of the final homogenized (and combined) monthly time series is zero (as for the raw data, Figure 10), and the number of detected breaks is strongly reduced after homogenization. After homogenization, most of the remaining breaks lie within a smaller interquartile range, between −0.33 and 0.35 K (compared with −0.34 and 0.36 K for the raw data).

FIGURE 9 Histogram of break size of raw daily data (left) and after final homogenization of daily data (right) in K.

| CASE STUDIES
To illustrate the effect of homogenization on single time series, case studies are presented here. The trends of these time series are compared with the trend for Germany, which is calculated from an interpolated data set of monthly data. The first example is the station Freiburg. In 2006, the station was moved from the inner city to the airport, which caused a break in the time series of daily mean temperature.
During the first round of homogenization of daily data, five breaks are detected in the difference series, four of which are identified as breaks in the candidate time series (Figure 11). The most dominant break (23 November 2006) is caused by the station relocation.
The breaks on 17 November 1967 and 29 April 1989 are caused by modifications of the Stevenson screen (not specified in detail). After the first homogenization of daily data, the trend of the time series is larger than that of the raw data (Figure 12). The trend after homogenization is comparable to the trend in Germany (0.29 K/decade during 1950-2020). The difference in trend between the first and the second homogenization of daily data is small.
Figure 13 shows the annual anomalies (calculated from monthly mean temperature values) of the raw data and after homogenization (combined product of homogenized daily and monthly data). The trend of the time series is larger after homogenization (+2.1 K compared with +1.9 K before homogenization). In recent years, the positive anomalies of the homogenized data are larger than those of the raw data, which are influenced by the relocation of the station to a colder site.

The monthly mean temperature data of Bremen go further back in time, so the monthly series is homogenized and combined with the daily data. During the first round of monthly data homogenization, 15 breaks are detected, 11 of them before 1950. Three further breaks were identified via metadata information (station relocation and modification of the radiation screen). In the combined product of averaged homogenized daily data and homogenized monthly data, the trend is more similar to the trend of Germany in that time range (Figure 17).

| CONCLUSIONS
In this study, daily mean temperature series of Germany are tested for breakpoints and homogenized with the method developed by Della-Marta and Wanner (2006). In some cases, monthly mean data go further back in time. In these cases, the monthly data are homogenized with the method 'Linear Scaling' and combined with the homogenized daily data (which are mostly available for the recent past) such that the daily data cover the recent years. The final monthly product is consistent with the homogenized daily data. The homogenization of temperature data is done automatically, and no manual intervention is needed.
The key findings of this study can be summarized as follows:
• The validation of the homogenization procedure for daily data with a synthetic data set shows good results. The homogenized data are more similar to the clean data than the unhomogenized data (smaller mean bias and RMSE).
• After homogenization of German daily mean temperature time series, fewer breaks are detected than before homogenization.
• Most of the breaks are identified by the second criterion (using metadata information), followed by the first criterion (pairwise comparison with nearby stations).
• The mean difference between the trends before and after homogenization is small (0.05 K/length of each time series), with an interquartile range from −0.1 to 0.2 K/length of each time series; i.e., on average the trend is slightly higher after homogenization.
• In the final homogenized monthly mean product, the total number of breaks is strongly reduced compared with the raw data.
• The sizes of the remaining breaks in the combined monthly product lie within a smaller interquartile range (between −0.33 and 0.35 K) than before homogenization (between −0.34 and 0.35 K).
• In the three case studies, the trend of the homogenized data is more similar to the German trend than before homogenization.
To determine the temperature trend for Germany, the station data are interpolated. The method for the gridded data set uses monthly means derived from daily station data. The data are interpolated using height regression and inverse distance weighting (IDW) (Müller-Westermeier, 1995). The German trend used for comparison in this study is based on this gridded data set: each month, the gridded values are spatially averaged over the area of Germany, and the monthly values are then averaged to annual values, with all months weighted equally. The trend for Germany is derived from this time series. More information about the German trend can be found in Kaspar et al. (2023).
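The interpolation and averaging steps described above can be sketched generically. This is not the operational scheme of Müller-Westermeier (1995), whose details are not given here: the sketch assumes a single linear regression of temperature on elevation, inverse distance weighting of the regression residuals with an assumed exponent of 2, and annual values as equally weighted means of 12 monthly area means. All names are hypothetical.

```python
import numpy as np


def height_regression_idw(st_xy, st_z, st_t, grid_xy, grid_z, power=2.0):
    """Interpolate station values to grid points: height regression plus IDW of residuals.

    st_xy, grid_xy: (n, 2) coordinates (e.g. km); st_z, grid_z: elevations (m).
    power: assumed IDW exponent.
    """
    st_xy, grid_xy = np.asarray(st_xy, float), np.asarray(grid_xy, float)
    st_z, st_t, grid_z = (np.asarray(a, float) for a in (st_z, st_t, grid_z))
    # 1) Linear regression of temperature on elevation
    slope, intercept = np.polyfit(st_z, st_t, 1)
    resid = st_t - (slope * st_z + intercept)
    # 2) Inverse distance weighting of the residuals (distances floored to avoid 1/0)
    dist = np.linalg.norm(grid_xy[:, None, :] - st_xy[None, :, :], axis=2)
    w = 1.0 / np.maximum(dist, 1e-9) ** power
    resid_grid = (w * resid).sum(axis=1) / w.sum(axis=1)
    # 3) Add the height-dependent part back at each grid elevation
    return slope * grid_z + intercept + resid_grid


def annual_means(monthly_area_means):
    """Annual values as equally weighted means of 12 consecutive monthly values."""
    x = np.asarray(monthly_area_means, dtype=float)
    return x.reshape(-1, 12).mean(axis=1)
```

Separating the elevation effect before the distance weighting keeps mountain and lowland stations from being mixed by IDW alone.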
The data homogenized here were also used as input data for the interpolation. The difference between the trends of the non-homogenized and the homogenized data is very small. The interpolation algorithm used so far (Müller-Westermeier, 1995) therefore seems to be sufficient to capture changes in the measurement network.

FIGURE 1 Map of stations that measured in 1881 (left) and 2021 (right).

FIGURE 2 Flowchart illustrating the different steps of homogenization.

FIGURE 3 Example of the breakpoint detection with differences (automatic minus manual observations) of monthly data (black line) and daily data (grey line). The vertical lines represent the results of the breakpoint detection method (here the R function 'uniseg') for different temporal resolutions (daily: cyan line, monthly: blue line). The figure is taken from Hannak et al. (2020).

FIGURE 4 Histogram of mean differences in K between broken and clean data (left) and between homogenized and clean data (right) of all stations in World 1 and World 4.
FIGURE 10 Histogram of break size of raw monthly data (left) and after final homogenization (combined data, right) in K.

FIGURE 11 Left: daily difference series between the candidate series (station Freiburg) and the reference series (top) and monthly difference series (bottom). Red vertical lines: detected breaks; cyan vertical line: detected and identified break. Right: daily difference series between the candidate series (station Freiburg) and the reference series (black) and between the homogenized candidate series and the reference series (green). Red vertical line: breakpoint.

The time series of Ulm is also affected by breaks. During the first homogenization of daily data, four breaks are detected (Figure 14, left). For two of them (8 October 1964 and 10 December 1991), metadata information is available within the given time range around the breaks; in both cases, the station was moved to another location. The other two breaks (26 March 1981 and 25 July 1997) are identified by the first criterion (pairwise comparison with each neighbouring station). For that reason, all detected breaks are used in the homogenization step (Figure 14, right). After the first homogenization of daily data, no further breaks are detected. During the first homogenization of monthly data (which reach further into the past than the daily data), two breaks are detected (October 1935 and September 1964), but only the first break (October 1935) is used in the homogenization step, because it lies before 1950 and no metadata information is available for either break. During the second homogenization of monthly data, two breakpoints (March 1913 and December 1924) are detected, both before 1950; they are therefore identified as breaks in the time series of Ulm. After the homogenization of the monthly data, the homogenized daily data are combined with the homogenized monthly data. In the final monthly (combined) product, no further breaks are detected. The trend after homogenization is larger than in the raw data (Figure 15) and more similar to the trend of Germany.

FIGURE 12 Temperature anomalies of Freiburg before homogenization (top), after the first homogenization of daily data (middle row) and after the second homogenization of daily data (bottom). Below the figures, the multi-annual mean (1961-1990), the linear trend for the time range shown, the linear trend per decade and the linear trend per decade in Germany (calculated with a gridded data set) are shown.

FIGURE 13 Temperature anomalies of Freiburg before homogenization (top) and after homogenization (combined product, bottom). Below the figures, the multi-annual mean (1961-1990), the linear trend for the time range shown, the linear trend per decade and the linear trend per decade in Germany (calculated with a gridded data set) are shown.

FIGURE 14 Left: daily difference series between the candidate series (station Ulm) and the reference series (top) and monthly difference series (bottom). Red vertical lines: detected breaks; cyan vertical line: detected and identified break. Right: daily difference series between the candidate series (station Ulm) and the reference series (black) and between the homogenized candidate series and the reference series (green). Red vertical lines: breakpoints.

A station in the north of Germany is Bremen. At this station, seven breaks are detected during the first round of daily data homogenization. Four breaks are identified as breaks in the candidate time series (Figure 16): the first break (February 1947) is before 1950 (third criterion), and the other breaks are identified with metadata information (second criterion). At these points in time, the station was moved to another location. The most recent break (June 1978) is identified both by the first criterion (pairwise comparison with nearby stations) and via metadata information (second criterion). During the second phase of daily homogenization, six breakpoints are detected and four of them are identified as breaks in the time series of Bremen: three via metadata information (second criterion; January 1936: station relocation, June 1978: modification of the thermometer and radiation screen, September 1990: modification of the radiation screen, not specified in detail) and the most recent break (August 2000) via pairwise comparison with neighbouring stations (first criterion).

FIGURE 15 Temperature anomalies of Ulm before homogenization (top) and after homogenization (combined product, bottom). Below the figures, the multi-annual mean (1961-1990), the linear trend for the time range shown, the linear trend per decade and the linear trend per decade in Germany (calculated with a gridded data set) are shown.

FIGURE 16 Left: daily difference series between the candidate series (station Bremen) and the reference series (top) and monthly difference series (bottom). Red vertical lines: detected breaks; cyan vertical line: detected and identified break. Right: daily difference series between the candidate series (station Bremen) and the reference series (black) and between the homogenized candidate series and the reference series (green). Red vertical line: breakpoint.

F
I G U R E 1 7 Temperature anomalies of Bremen before homogenization (top) and after homogenization (combined product, bottom).Below the figures, the multi-annual mean (1961-1990), the linear trend for the time range shown, the linear trend per decade and the linear trend per decade in Germany (calculated with a gridded data set) are shown.[Colour figure can be viewed at wileyonlinelibrary.com]
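The quantities listed under each of these figures (multi-annual mean 1961-1990, linear trend over the shown range, linear trend per decade) can be computed from an annual mean series roughly as sketched below. This is a hedged sketch, not the authors' code: it assumes annual values sorted by year and an ordinary least-squares trend, the function names are hypothetical, and expressing the trend for the shown range as slope times (last year minus first year) is an assumption because the text does not define it.

```python
from statistics import mean
from typing import Sequence


def anomalies(years: Sequence[int], temps: Sequence[float], base=(1961, 1990)):
    """Subtract the multi-annual mean of the base period (default 1961-1990)."""
    ref = [t for y, t in zip(years, temps) if base[0] <= y <= base[1]]
    if not ref:
        raise ValueError("series does not cover the base period")
    clim = mean(ref)
    return clim, [t - clim for t in temps]


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x (here: K per year)."""
    xm, ym = mean(x), mean(y)
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    return sxy / sxx


def trend_summary(years: Sequence[int], temps: Sequence[float]) -> dict:
    """Hypothetical helper returning the three figure statistics."""
    clim, anom = anomalies(years, temps)
    slope = ols_slope(years, anom)
    return {
        "mean_1961_1990": clim,
        # ASSUMPTION: trend over the shown range = slope * (last year - first year)
        "trend_range": slope * (years[-1] - years[0]),
        "trend_per_decade": slope * 10,
    }
```

For a synthetic series warming by 0.02 K per year, `trend_summary` reports a trend of about 0.2 K per decade.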

• The first criterion is based on a pairwise analysis of the difference series between the candidate series and each nearby station, within ±3 years around each break. If the break is detected in at least half of the pairwise difference series, the candidate series may be inhomogeneous. Breaks detected at the beginning or the end of a difference series are not counted. If there are only two nearby stations, the break has to be detected in both difference series.
• For the second criterion, metadata information of the candidate station and of each reference station is needed. If metadata information exists within a given time range around the break, the time lag between the metadata entry and the break date is calculated. If the time lag of the candidate sensor minus a tolerance range (here 540 days) is smaller than the smallest time lag of the reference sensors, the candidate time series might be inhomogeneous.
• For the third criterion: in the past, less metadata information and fewer station data were available, and especially before 1950 the number of stations is small. For that reason, all detected breaks before 1950 are treated as breaks in the candidate series.
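The three criteria above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical; the search window for metadata (`WINDOW_DAYS`, here three years) is an assumption because the text does not give the "given time range"; metadata entries of all reference stations are pooled; and a break is treated as identified when any single criterion holds, which is consistent with the case studies but not stated as a rule. Only the 540-day tolerance, the two-neighbour rule and the 1950 cutoff come from the text.

```python
from datetime import date
from typing import Sequence

TOLERANCE_DAYS = 540   # tolerance range quoted in the text
WINDOW_DAYS = 3 * 365  # ASSUMPTION: metadata search window, not given in the text


def pairwise_criterion(detected: Sequence[bool]) -> bool:
    """Criterion 1: the break appears in enough pairwise difference series.

    One flag per nearby station: True if the break was detected in the
    difference series (candidate minus neighbour) within +/-3 years of it.
    Detections at the beginning or end are assumed to be filtered out already.
    """
    n, hits = len(detected), sum(detected)
    if n == 0:
        return False
    if n == 2:
        return hits == 2  # only two neighbours: both must show the break
    return hits >= n / 2  # otherwise: at least half of the neighbours


def _smallest_lag(events: Sequence[date], brk: date, window_days: int) -> float:
    """Smallest |event - break| in days among events inside the window."""
    lags = [abs((e - brk).days) for e in events]
    lags = [lag for lag in lags if lag <= window_days]
    return min(lags) if lags else float("inf")


def metadata_criterion(
    brk: date,
    candidate_events: Sequence[date],
    reference_events: Sequence[date],  # pooled over all reference stations
    window_days: int = WINDOW_DAYS,
    tolerance_days: int = TOLERANCE_DAYS,
) -> bool:
    """Criterion 2: compare metadata time lags of candidate and references."""
    cand_lag = _smallest_lag(candidate_events, brk, window_days)
    if cand_lag == float("inf"):
        return False  # no candidate metadata near the break
    ref_lag = _smallest_lag(reference_events, brk, window_days)
    return cand_lag - tolerance_days < ref_lag


def early_criterion(brk: date, cutoff_year: int = 1950) -> bool:
    """Criterion 3: every detected break before 1950 is accepted."""
    return brk.year < cutoff_year


def is_identified_break(brk, detected, candidate_events, reference_events) -> bool:
    # ASSUMPTION: a break counts as identified if any one criterion holds.
    return (
        pairwise_criterion(detected)
        or metadata_criterion(brk, candidate_events, reference_events)
        or early_criterion(brk)
    )
```

With no supporting neighbours and no metadata, a break dated February 1947 is still accepted through the third criterion, while the same situation in August 2000 is rejected.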
Mean bias and mean RMSE of all stations in each world. Blue: broken minus clean data; green: homogenized minus clean data.
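The scores in this caption can be sketched as follows, assuming that the bias is the mean of the difference series (broken or homogenized minus clean), that the RMSE is the root of its mean square, and that both are computed per station and then averaged over all stations of one synthetic world. The function names and the input layout are hypothetical and only illustrate the comparison shown in the figure.

```python
from math import sqrt
from statistics import mean
from typing import Sequence, Tuple


def bias_rmse(series: Sequence[float], clean: Sequence[float]) -> Tuple[float, float]:
    """Bias and RMSE of one station series against its clean counterpart."""
    if len(series) != len(clean):
        raise ValueError("series must have the same length")
    diffs = [s - c for s, c in zip(series, clean)]
    return mean(diffs), sqrt(mean([d * d for d in diffs]))


def world_scores(stations: Sequence[Tuple[Sequence[float], Sequence[float], Sequence[float]]]) -> dict:
    """Mean bias and mean RMSE over all stations of one world.

    Each entry is a (broken, homogenized, clean) triple for one station.
    """
    broken = [bias_rmse(b, c) for b, _, c in stations]
    homog = [bias_rmse(h, c) for _, h, c in stations]
    return {
        "broken": (mean([b for b, _ in broken]), mean([r for _, r in broken])),
        "homogenized": (mean([b for b, _ in homog]), mean([r for _, r in homog])),
    }
```

A successful homogenization should move both scores of the "homogenized" entry closer to zero than those of the "broken" entry.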