Piecewise linear fitting and trend changing points of climate parameters



[1] Finding an overall linear trend is a common method in scientific studies. It is almost a requirement when one intends to study variability. Nevertheless, when dealing with long climate time series, fitting a single straight line is seldom meaningful. This paper proposes and describes a new methodology for finding overall trends and, simultaneously, for computing a new set of climate parameters: the breakpoints between periods with significantly different trends. The proposed methodology uses a least-squares approach to compute the best continuous set of straight lines that fit a given time series, subject to a number of constraints on the minimum distance between breakpoints and on the minimum trend change at each breakpoint. The method is tested with three climate time series.

1. Introduction

[2] During the last decade a large number of papers have discussed long-term linear tendencies of climate parameters, such as precipitation, temperature and the NAO index, to mention just a few [e.g., Groisman and Easterling, 1994; Hurrell, 1995; Easterling et al., 2000; Thompson et al., 2000; Tuomenvirta et al., 2000; Tank et al., 2002; Ostermeier et al., 2003]. Many of these studies were motivated by the quest for anthropogenic climate changes in the last century. Analyzing the global temperature time series for the period 1880 to 1997, Karl et al. [2000] pointed out that a linear trend is not adequate to describe its low frequency behavior. Even a visual inspection revealed that the mean warming obtained by fitting a straight line did not occur in a persistent way, but in two sustained periods, one beginning around 1910 and the other starting in the mid 1970s. In order to clearly separate the two periods of warming, they devised two approaches: one based on Haar wavelets, which was able to identify three discontinuities in the time series, and a second (the preferred one) consisting of the minimization of the residual sum of squares over all possible combinations of four line segments, each representing a time interval of 15 years or more and constrained to intersect its neighbors at the change-point years. Using this approach they were able to evaluate the partial trends and a better overall trend and, most of all, they identified three breakpoint years: 1910, 1941 and 1975.

[3] The methodology proposed here is a development of that second approach. Instead of arbitrarily fixing the number of line segments, which in Karl et al. [2000] resulted from a visual inspection of the time series, the number and location of the breakpoints are optimized simultaneously. The method computes the combination of continuous line segments that minimizes the residual sum of squares, subject to two conditions: (a) the interval between breakpoints must equal or exceed a given value, and (b) two consecutive trends must obey one or more imposed conditions.

[4] Applying this methodology to the time series used by Karl et al. [2000], representing the mean world temperature, with the conditions of a minimum 15 year interval between breakpoints and of a change of sign between two consecutive trends, leads to the results they obtained. The results are still the same if, instead of a minimum 15 year period, one uses 10, 20 or even 30 years.

2. The Algorithm

[5] To describe the method of implementation let us assume one wants to fit, to a given data series, a continuous curve made of four straight-line segments. This kind of problem is not standard in scientific and statistical program packages, and in some way was what Karl et al. [2000] needed to solve in their work. Essentially, having the time series

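\[
(t_1, y_1), \ldots, (t_{bp(2)}, y_{bp(2)}), \ldots, (t_{bp(3)}, y_{bp(3)}), \ldots, (t_{bp(4)}, y_{bp(4)}), \ldots, (t_m, y_m)
\]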

where the bp(2), bp(3) and bp(4) are the breakpoint positions in the series, the problem is then to fit, in the least squares sense, the function

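\[
y(t) =
\begin{cases}
a_1 t + c_1, & t \le t_{bp(2)} \\
a_2 t + c_2, & t_{bp(2)} < t \le t_{bp(3)} \\
a_3 t + c_3, & t_{bp(3)} < t \le t_{bp(4)} \\
a_4 t + c_4, & t > t_{bp(4)}
\end{cases}
\]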

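The continuity condition at the three breakpoints ties the intercepts c2, c3 and c4 of the second, third and fourth segments to c1:

\[
c_2 = c_1 + (a_1 - a_2)\,t_{bp(2)}, \qquad
c_3 = c_2 + (a_2 - a_3)\,t_{bp(3)}, \qquad
c_4 = c_3 + (a_3 - a_4)\,t_{bp(4)}
\]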

If the breakpoint positions, bp(2), bp(3) and bp(4), are known, or are imposed, it is quite easy to obtain a linear system of five equations with five unknowns, a1, a2, a3, a4 and c1, by equating to zero the partial derivatives of the sum of the squared differences between the fit function and the observations. But the equation system obtained that way cannot be readily generalized to an arbitrary number of line segments, making it almost impossible to devise a convenient and computationally efficient process to deal with an unknown number of breakpoints and their positions. Instead, one can transform the problem into the general case of an over-determined system of linear equations of the type

\[
A\,s = y \qquad (1)
\]

where s is the solution vector, s = {a1, a2, a3, a4, c1}, y is the time series, y = {y1, …, ym}, and A is an (m × 5) rectangular matrix whose first bp(2) rows are equal to {t_i, 0, 0, 0, 1}. Rows bp(2) + 1 to bp(3) are equal to {t_bp(2), t_i − t_bp(2), 0, 0, 1}, rows bp(3) + 1 to bp(4) are equal to {t_bp(2), t_bp(3) − t_bp(2), t_i − t_bp(3), 0, 1}, and rows bp(4) + 1 to m are equal to {t_bp(2), t_bp(3) − t_bp(2), t_bp(4) − t_bp(3), t_i − t_bp(4), 1}. This formulation allows for the use of a standard solution for this kind of over-determined problem.

[6] It is also easy to extend the method to an arbitrary number of line segments, because the creation of the corresponding A matrix can be easily programmed. Let ℓ be the number of breakpoints, which should be greater than or equal to 1, so that ℓ + 1 is the number of continuous line segments to fit to the y series of m elements. A is then a rectangular [m × (ℓ + 2)] matrix, with one column for each of the ℓ + 1 slopes and a last column for the intercept c1, in agreement with the (m × 5) matrix of the four-segment case. Let bp and b be two vectors of ℓ + 1 elements. The first element of bp is set to 1 and the remaining ℓ elements are the positions of the breakpoints in the y series. The first element of b is set to zero and the remaining elements are the time values of the breakpoints, b(i) = t(bp(i)). The A matrix can be computed with the following algorithm (in pseudocode):

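    for i = 1 to m
        k = index of the segment containing row i
            (k = 1 if i ≤ bp(2); otherwise the largest k with i > bp(k))
        for j = 1 to ℓ + 1
            if j < k: A(i, j) = b(j + 1) − b(j)
            if j = k: A(i, j) = t(i) − b(k)
            if j > k: A(i, j) = 0
        A(i, last column) = 1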
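As a minimal sketch of the construction described in this section, the following Python/NumPy code builds A for an arbitrary list of breakpoints and solves the over-determined system by least squares. The function names `design_matrix` and `fit_segments`, the 0-based breakpoint indices and the use of `numpy.linalg.lstsq` are assumptions made for illustration; they are not the FORTRAN 90/LAPACK implementation used for the results in this paper.

```python
import numpy as np

def design_matrix(t, bp):
    """Build A for a continuous piecewise-linear fit.

    t  : 1-D sequence of m increasing time values.
    bp : increasing 0-based indices of the breakpoints in t (assumed
         convention; the text uses 1-based positions bp(2), bp(3), ...).
    Returns an (m, len(bp) + 2) array: one column per segment slope
    and a final column of ones for the intercept c1.
    """
    t = np.asarray(t, dtype=float)
    # b(1) = 0, followed by the time values of the breakpoints
    b = np.concatenate(([0.0], t[np.asarray(bp, dtype=int)]))
    n_seg = b.size
    A = np.ones((t.size, n_seg + 1))          # last column stays equal to 1
    for j in range(n_seg):
        upper = b[j + 1] if j + 1 < n_seg else np.inf
        col = np.minimum(t, upper) - b[j]     # part of t lying in segment j
        A[:, j] = col if j == 0 else np.maximum(col, 0.0)
    return A

def fit_segments(t, y, bp):
    """Solve A s = y in the least-squares sense for fixed breakpoints."""
    y = np.asarray(y, dtype=float)
    A = design_matrix(t, bp)
    s, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((A @ s - y) ** 2))     # residual sum of squares
    return s[:-1], s[-1], rss                 # slopes a_j, intercept c1, RSS

# Synthetic check: two segments joined at t = 1920 (index 19).
t = np.arange(1901, 1961)
y = 2.0 + 0.5 * np.minimum(t, 1920) - 0.3 * np.maximum(t - 1920, 0)
slopes, c1, rss = fit_segments(t, y, bp=[19])
```

On this noise-free synthetic series the fit should recover slopes close to 0.5 and −0.3 with an intercept close to 2.0, since the data were generated from exactly the model encoded in A.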

With the creation of the A matrix completely automated, the system in equation (1) can be solved in turn for every combination of breakpoint positions that obeys the minimum imposed distance between two consecutive breakpoints. One starts with the case of one breakpoint and proceeds up to a maximum of twelve breakpoints. For each of these cases one gets the best fit that satisfies the desired conditions between two consecutive linear trends. At the end, the solution that minimizes the residual sum of squares is chosen.

[7] This approach, solving a linear system of equations many times while changing the position and the number of breakpoints, was found to be preferable to solving the non-linear system of equations that arises when the breakpoint positions are treated as unknowns, because the iterations needed for the non-linear scheme to converge require a larger number of computer operations. The chosen approach also made it possible, and easy, to impose any kind of condition between two consecutive trends.

[8] Furthermore, the adopted procedure obtains all the partial solutions with fewer breakpoints, and allows one to build a series of the residual sum of squares of the best fit with one, two or three breakpoints, up to a maximum of twelve. This series could probably be used to study the significance of solutions with several breakpoints, by analyzing the relative decrease of the residual sum of squares. This problem is presently being addressed. Nevertheless, it must be pointed out that the number of breakpoints is essentially determined by the conditions imposed between two consecutive trends. It is strongly recommended that, whenever the breakpoint positions are separated by the minimum period allowed, the best-fit result be inspected visually.
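One possible reading of the "relative decrease" of the residual sum of squares is the fractional drop obtained each time one more breakpoint is allowed. The short sketch below only illustrates that quantity; the function name `relative_rss_decrease` and the input values are hypothetical toy numbers, and no significance criterion is implied.

```python
def relative_rss_decrease(rss_by_k):
    """rss_by_k[i] is the best residual sum of squares found with
    i + 1 breakpoints. Returns the fractional decrease obtained when
    going from i + 1 to i + 2 breakpoints."""
    return [(prev - cur) / prev for prev, cur in zip(rss_by_k, rss_by_k[1:])]

# Toy values, not results from any of the series analyzed here.
rss_by_k = [10.0, 5.0, 4.0]
drops = relative_rss_decrease(rss_by_k)   # approximately [0.5, 0.2]
```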

[9] To illustrate the method, it will now be applied to three time series of climate parameters.

3. Azores December Maximum Temperature

[10] The mean maximum December temperature at Angra do Heroismo (Azores) is an interesting test case, because it illustrates the biased conclusions one can reach by simply evaluating the linear trend. Results obtained with the proposed methodology, subject to a minimum of 15 years between breakpoints and different signs between consecutive trends, are presented in Figure 1. In the same figure, the best linear fit (dashed line) has also been plotted. The method identifies two breakpoints, in 1935 and 1960, separating a period of warming (1901–1935) at a rate of 0.14°C/decade, followed by a period of stronger cooling (1935–1960, with a cooling rate of −0.51°C/decade) and by a later period of warming (1960–2002, at 0.22°C/decade). These three linear trends lead to a positive overall mean trend of 0.01°C per decade for the period 1901–2002. For the same data, the linear trend given by the dashed line is negative, and equal to −0.07°C/decade.
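The overall mean trend quoted above can be checked with a short calculation, assuming that the partial trends are averaged with weights equal to the duration of each segment and that the third segment starts at the 1960 breakpoint. For a continuous fit this average is simply the net change of the fitted curve divided by the total period. The function name `mean_trend` is hypothetical.

```python
def mean_trend(segments):
    """Duration-weighted average of partial trends.
    segments: list of (start_year, end_year, trend_per_decade)."""
    total = sum(end - start for start, end, _ in segments)
    weighted = sum((end - start) * trend for start, end, trend in segments)
    return weighted / total

# Partial trends (°C/decade) quoted in the text for Angra do Heroismo.
azores = [(1901, 1935, 0.14), (1935, 1960, -0.51), (1960, 2002, 0.22)]
overall = mean_trend(azores)
print(round(overall, 2))   # 0.01
```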

Figure 1.

Maximum December temperature at Angra do Heroismo (Azores), breakpoints (1935 and 1960), partial tendencies, in °C/decade, and the linear trend (dashed line).

[11] Looking only at the linear trend one could erroneously conclude that the maximum December temperature at Angra do Heroismo has been gradually decreasing in the 20th century, when what really happened was a strong cooling in the 25 year period starting in 1935 and ending in 1960. Even if the warming periods did not fully compensate for that cooling period, the simple statement of a linear decrease in temperature would be misleading.

4. NAO Index

[12] The North Atlantic Oscillation (NAO) is one of the major features of the Northern Hemisphere climate system. It was first acknowledged by Walker [1924] but only in the last quarter of the 20th century was it extensively studied. It has many times been pointed out that in recent decades the NAO index has exhibited a positive trend.

[13] However, due to the high irregularity of its time series, many authors have reported periods of increase and decrease of the NAO index. Invariably, the location of the start and ending moments of those periods resulted from a visual inspection of the series plot. The results of applying the proposed methodology to the NAO index, imposing different values (between 10 and 35 years) for the minimum time distance between breakpoints, and the condition of sign change between two consecutive trends, are presented in Figure 2.

Figure 2.

Dependency of the location of breakpoints in the NAO index, between 1880 and 2003, on the minimum duration of the time intervals between breakpoints. NAO index computed by J. Hurrell, as the difference of normalized sea level pressure between Lisbon, Portugal, and Stykkisholmur/Reykjavik, Iceland, updating the series of Hurrell [1995]. The different trend lines have been vertically displaced to avoid superposition, keeping the same scale.

[14] Figure 2 illustrates an objective way of testing the robustness of piecewise linear trends, in highly variable time series, where a visual inspection may be misleading. When the computed trend lines have durations equal to the imposed minimum, as is the case of most lines in the two lowest solutions shown in Figure 2, the corresponding trends are an artifact of the constraints and do not represent the low frequency variability of the series. This is particularly clear in the lowest curve. On the other hand, the increasing trend of the NAO index computed for the period 1965–1990 seems a very robust result, the first year being a breakpoint present in all solutions. Ostermeier and Wallace [2003] devised a comparable, but slightly more intricate way of analyzing the tendency of the 20th century NAO index, which led to similar conclusions. The persistence of a positive trend in the NAO index for almost 3 decades, even when one considers a 10 year minimum period, is a remarkable feature, unprecedented in the series history. One may conclude that the reported high values of the NAO index in the last decades are essentially due to the existence of that persistent positive trend, which may have ended in the last few years.
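The robustness test described above, looking for breakpoints that persist across solutions obtained with different minimum durations, can be sketched as follows. The function name `robust_breakpoints`, the tolerance parameter and the input values are all hypothetical; the inputs are toy sample positions, not the breakpoints of the NAO series.

```python
def robust_breakpoints(solutions, tol=0):
    """solutions maps each minimum duration to the list of breakpoint
    positions found with it. Returns the breakpoints of the first
    solution that reappear, within tol, in every other solution."""
    first, *rest = solutions.values()
    return [b for b in first
            if all(any(abs(b - c) <= tol for c in other) for other in rest)]

# Toy breakpoint positions for three minimum durations.
toy = {10: [12, 30, 45, 58], 15: [16, 30, 45], 20: [31, 45]}
exact = robust_breakpoints(toy)            # [45]
near = robust_breakpoints(toy, tol=1)      # [30, 45]
```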

[15] An analysis of the set of solutions presented in Figure 2 shows that the first and last breakpoints in each solution are highly constrained by the boundary condition, i.e., the need to satisfy the minimum length imposed between the breakpoints and the series limit. The 20 and 25 year cases lead to a good visual fit of the series, as shown in Figure 3, where the original data is also included.

Figure 3.

Piecewise linear fitting of the NAO index for a minimum period between two breakpoints of 20 years (dashed line) and 25 years (full line), for the condition of sign change between consecutive trends.

[16] From Figure 3 we may conclude that the low frequency behavior of the NAO index, for the 25 year case, presents a positive trend until 1909, followed by a negative trend until 1964, and then again a positive trend. The net trend is positive, 0.20/decade, and much stronger than the one obtained by fitting a single line (0.04/decade). The boundary conditions do not allow the 25 year solution to pick up the downward trend of the NAO index in the last decade. As a consequence, the upward trend after 1965 is somewhat reduced in that solution. The best fit to the 1965–1990 trend is the one given by the 10 year solution, since it is the least constrained, and it suggests a much faster evolution of the index, at a rate of 1.6/decade. Prior to 1960 the behavior of the two solutions presented in Figure 3 is qualitatively similar. The 20 year solution still keeps the last downward trend, but starting in 1986 due to the boundary condition. For the 10 year solution the corresponding breakpoint is in 1992 (Figure 2).

[17] The results presented in Figures 2 and 3 explain the distribution of partial trends (partial trend values plotted in the domain of beginning year versus end year) reported by Ostermeier and Wallace [2003]. Negative trends prevail prior to the 1960s, while strong positive partial trends dominate for periods beginning in the 1960s and 1970s and ending prior to the early 1990s.

5. Lisbon Winter Precipitation

[18] The Lisbon precipitation time series has often been pointed out as a climate parameter strongly dependent on the NAO index values, and the two series are known to be negatively correlated [Hurrell, 1995; Trigo et al., 2004]. The remaining question is whether they present the same low frequency behavior.

[19] Figure 4 presents the breakpoint locations obtained for different imposed minimum periods between them. The original series, with the best fit for the 20 year and 25 year cases, is also presented in the upper panel of the figure. Two breakpoints, in the early 1930s and early 1960s, are present in all solutions.

Figure 4.

Dependency of the location of breakpoints in the winter (December through March) Lisbon precipitation, between 1900 and 2002, on the minimum duration of time intervals between breakpoints. The precipitation time series is plotted together with the 20 year (dashed line) and 25 year (full line) cases in the upper panel.

[20] When comparing Figures 2 and 4, one sees that, although the positions of the breakpoints do not exactly match, the winter NAO index and the Lisbon winter precipitation show the same low frequency behavior. For the 10 year solutions the positions of the breakpoints almost coincide and the partial trends present opposite signs, as expected. As happens with the NAO index, the winter precipitation of Lisbon, when allowed to change trend sign every 10 years, does not do so, and between the early 1960s and the early 1990s it presents a persistent negative trend, which is a remarkable fact in the series history.

[21] In the NAO index and in the Lisbon precipitation the breakpoint in the early 1960s is very robust, making it a very important turning year in these climate series.

6. Concluding Remarks

[22] Although frequently used, the linear trend can be deceptive. It is possible to obtain an improved “trend” through the weighted average of partial trends, which better describes the low frequency behavior of the time series. At the same time one gets a set of new climate parameters (breakpoints) that can provide new insight into global and local climate studies. After the work of Karl et al. [2000] some studies used the years 1945 and 1975 as turning points of the climate series of the 20th century [e.g., Tank et al., 2002], not considering the fact that they were obtained for the global surface temperature and have no special meaning for other climate series. These years may be completely inadequate for many climate time series, namely for precipitation and for the NAO index, as shown in this paper, and probably for many others.

[23] We believe that a systematic study of the distribution of climate breakpoints in space and time may provide new insights into the spatial patterns of climate change and its relation with the global circulation. The proposed method allows for an objective and rather simple way of computing those breakpoints from large numbers of climate time series, without important a priori decisions.

[24] The algorithm was implemented in a FORTRAN 90 program, using LAPACK routines for solving over-determined linear equation systems. The source code can be obtained freely on request, by mail, from the first author.


[25] This work was possible due to a sabbatical year of the first author from the Universidade da Beira Interior, with the financial support from Centro de Geofísica da Universidade de Lisboa and from FCT, through a grant co-funded by the European Union under program FEDER. The NAO data was kindly provided by James Hurrell.