Seasonality is an important source of variation in many processes and needs to be incorporated into rainfall models. The stochastic seasonal rainfall models for high temporal resolution data [Sansom and Thomson, 2010] and daily data (T. Carey-Smith et al., A hidden seasonal switching model for multisite daily rainfall, manuscript in preparation 2012) both depend on the specification of one day of the year for each season being in the particular season. So, if the seasonality is to be represented by four seasons then it is necessary to provide four dates on each of which it can be said that, every year, the season is of the first, second, third or fourth type respectively on that day of the year. The model fitting can only proceed once these dates, the mid-seasons, have been provided.
 In a region of large seasonal rainfall variation the determination of these mid-seasons would not be difficult and the model fitting not sensitive to their choice. However, although it is clearly evident in the annual pattern of monthly accumulations, New Zealand's rainfall seasonality is not strong and careful assessment of the mid-season dates is necessary. They need to be estimated with a precision of days and the 55-year long rainfall records of daily data from 141 stations spread across New Zealand were analysed on a regional basis. The analysis found regionally coherent dates when the mean daily rain rate changed significantly and that over the years these dates could be modelled as a four component von Mises distribution with characteristics consistent with stochastic seasonality.
 Seasonality is an important source of variation in many processes, especially those in the natural world such as rainfall, and has usually been incorporated into rainfall models in a deterministic way. Either the year is simply divided into a number of fixed parts (4 for seasonal resolution, 12 for monthly resolution), with each part being modelled individually so that the parameters are independent from part to part, or the parameters are allowed to vary continuously over the year as Fourier components. Both of these can be termed deterministic as the model parameters on any particular day of the year are the same for all years. Alternatively, the common experience of what can be described as, for example, “an early winter” or “a long summer” implies that seasons can start and/or finish at times of the year that vary from year-to-year. Such seasonality was proposed by [Sansom and Thomson, 2007] and can be termed stochastic.
 Although season onsets vary from year-to-year, in any particular year the onset of a season is likely to be contemporary across neighbouring rainfall stations within a region. A recommendation of the single-site high temporal resolution modelling in [Sansom and Thomson, 2010] was that stochastic season models should be extended to encompass regional multi-site networks over longer time periods using the more conventional, and abundant, daily rainfall accumulations. Carey-Smith et al. (manuscript in preparation, 2012) have adopted this strategy for including seasonality in a new stochastic model of daily rainfall which allows seasons to be earlier or later, and longer or shorter, than usual. This leads to increased rainfall variability over and above that explained by the standard fixed (deterministic) seasons.
 New Zealand rainfall shows relatively weak seasonality with significant rainfall occurring throughout the year and no markedly dry or wet seasons. This is illustrated in Figure 1 where monthly rainfall accumulations for Invercargill in the far south of New Zealand at 46°25'S, 168°20'E are presented. So transitions between homogeneous rainfall seasons will generally be more subtle, diffuse and difficult to determine than the marked changes experienced in, for example, monsoons. The weak rainfall seasonality reinforces the need to consider multi-site daily rainfall data over longer time periods as recommended by [Sansom and Thomson, 2010]. Furthermore despite its weakness in New Zealand, rainfall seasonality does exist and needs to be allowed for in models if they are to truly represent the behaviour of rainfall and provide valid simulations as required, for example, in risk analysis.
 The modelling and prediction of the onset of more definite seasons such as monsoons has been considered by a number of other authors. [Stern, 1982] using a simple Markov chain model for rainfall occurrence, considered wet season onset where this was defined as the first day when the daily rainfall, or two day total, exceeded a given threshold. More recently, [Lima and Lall, 2009] modelled multi-site rainfall occurrence using a Bayesian seasonal model with conventional Fourier components and season onset defined as the time when the posterior probability of rainfall occurrence exceeds 0.5. [Slocum et al., 2010] used cumulative rainfall anomalies to estimate the onsets of dry and wet seasons.
 Both the [Sansom and Thomson, 2010] and (Carey-Smith et al., manuscript in preparation, 2012) models are such that within each season, rainfall is modelled as a homogeneous hidden Markov model (HMM) with each season having its own dynamics, but common precipitation mechanisms. Thus, for example, a heavy rain episode in summer is stochastically equivalent to a heavy rain episode in winter (non-seasonal mechanisms), but the frequency and clustering of rainfall and dry episodes varies from season to season (seasonal dynamics). Furthermore, both depend on the specification of one day of the year for each season being in the particular season. So, if the seasonality is to be represented by four seasons then it is necessary to provide four dates on each of which it can be said that, every year, the season is of the first, second, third or fourth type respectively on that day of the year. The model fitting can only proceed once these dates, the mid-seasons, have been provided to a precision of days rather than weeks or months.
 The study presented in this paper is the estimation of these mid-seasons from the 55-year long rainfall records of daily data from 141 stations spread across New Zealand. Section 2 presents the data and general approach of analysing running accumulations on a regional basis. Section 3 is a more detailed explanation of the two stage method that was used to produce the results presented in Section 4. Any method applied to data will produce an answer, so Section 5 explores the physical significance of the results and Section 6 is a summary with some conclusions.
2 Data and General Approach
 Daily rainfall measurements have been made in New Zealand since 1852 but a national network of gauges dense enough to adequately determine the spatial variation of rainfall was not established until the mid-1940s. At times this network consisted of up to 600 stations but the largest number of stations covering the longest period with contemporary data that could be extracted was for 141 stations from 1st January 1945 until 31st December 1999 (see [Thompson, 2006]). Even in this reduced dataset some stations had missing days, in which case a value was estimated from nearby stations. The rainfall was estimated from those stations within 20 km of the site with the missing data by taking a distance weighted average rainfall from all the neighbouring sites. Thus, stations close to the site with missing data had a larger influence on the estimation than sites further away. Figure 2 shows the locations of the 141 stations. Of these, 94 had less than 10 per cent missing data and the other 47 locations had composite records made up from the records of two or more adjacent sites which together spanned the 55-year period, with any missing data estimated by the distance weighted averaging technique. While it is recognised that the composite datasets may contain inhomogeneities, removing or reducing such inhomogeneity in daily data is difficult and unnecessary for the current study.
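The infilling of missing days described above might be sketched as follows. This is only an illustration: the text says "distance weighted" without giving the weights, so inverse-distance weights are assumed here, and the function names `haversine_km` and `infill` are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def infill(target, neighbours, max_km=20.0):
    """Estimate a missing daily fall at `target` = (lat, lon).

    `neighbours` is a list of (lat, lon, rainfall_mm) for the same day.
    Weights of 1/distance are an ASSUMPTION; the paper only states that
    nearer stations have more influence. Returns None when no neighbour
    lies within `max_km`.
    """
    weighted_sum = weight_total = 0.0
    for lat, lon, rain in neighbours:
        d = haversine_km(target[0], target[1], lat, lon)
        if d > max_km:
            continue  # too far away to contribute
        if d == 0.0:
            return rain  # co-located gauge: use its value directly
        weighted_sum += rain / d
        weight_total += 1.0 / d
    return weighted_sum / weight_total if weight_total > 0 else None

# Two neighbours within 20 km (one twice as far as the other) and one beyond.
example_estimate = infill(
    (-46.4, 168.3),
    [(-46.35, 168.3, 10.0), (-46.5, 168.3, 4.0), (-45.4, 168.3, 100.0)],
)
```

In the example the nearer gauge carries twice the weight of the farther one, and the gauge about 111 km away is ignored.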
 For a particular station, the traditional way of showing its seasonality is to accumulate the daily falls over every calendar month of the record and examine the annual trend of the long term monthly means. The time series of monthly accumulations and the annual trend with its interannual variability is shown in Figure 1 for Invercargill. A similar figure was shown in [Sansom and Thomson, 2007] and the bottom panels of their Figure 8 show that, although a long-term tendency exists for January and May to be the wettest months and February and August to be the driest, these peaks and troughs in accumulation were not always just those months. If in every year January etc were the wettest/driest months then seasonality could be said to be deterministic but, as they are not, seasonality can be described as stochastic.
 The approach of accumulating rainfall over months is reasonable as it will remove some of the high frequency variability and facilitate the appreciation of the general annual trend and inter-annual variability. However, stochastic seasonality implies that, for a particular season, a step change from one type of behaviour of the rainfall process to another type does not take place on the first day of a given month each year. Thus, to better describe both the general trend and variability, a daily rather than monthly temporal resolution is required and, hence, an approach that uses the raw daily values rather than monthly accumulations. Monthly accumulations are equivalent to the mean daily fall for the month multiplied by the number of days in the month, and the mean daily fall over an arbitrary number of days is just the gradient of the time-accumulation plot of a daily rainfall record as shown in Figure 3. In this figure the total fall from the first day to the Nth day has been plotted at day N and, compared to a simple plot of the daily falls, a relatively smooth plot results such that times with a fairly steady mean daily fall can be discerned. The peaks and troughs of the annual trend in monthly accumulations are equivalent to changes of slope on the time-accumulation plot: a wetter month has a higher mean daily fall and the time-accumulation plot is steeper.
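The equivalence between the mean daily fall and the gradient of the time-accumulation plot can be illustrated with a few lines of code. The data and the helper name `mean_daily_fall` are made up for the illustration.

```python
from itertools import accumulate

def mean_daily_fall(acc, start, end):
    """Slope of the accumulation line between index `start` and index `end`.

    This equals the mean of the daily falls on days start+1 .. end.
    """
    return (acc[end] - acc[start]) / (end - start)

daily = [0.0, 2.0, 0.0, 4.0, 6.0, 0.0]   # illustrative daily falls in mm
acc = list(accumulate(daily))             # total fall up to and including each day

slope = mean_daily_fall(acc, 1, 4)        # gradient between day 1 and day 4
direct = sum(daily[2:5]) / 3              # mean of the daily falls on days 2..4
```

Both quantities agree, which is why a change in the mean daily fall shows up as a change of slope on the accumulation plot.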
 In the [Sansom and Thomson, 2007] stochastic seasonality scheme the times of change from one season to the next are random and take place instantaneously. Thus before a change the precipitation process might be described by a model with a certain set of parameters while after the change the model is the same, but the parameters have changed to a new set which continues to be in force until the next change of season. The scheme does not include a transition period which has characteristics of both the season before the change and the one after. Within the context of daily data, “instantaneously” means from one day to the next so the day when a season change takes place will appear as a turning point on the time-accumulation plot. This is the point where the plot changes from one more or less steady slope during a certain season to another more or less steady slope during the next season.
 The approach, then, was a move away from analysing monthly rainfall accumulations to one of finding significant turning points on the rainfall time-accumulation plot. Furthermore, under the reasonable assumption that a change of season will be, at least, a regional event if not a national one, the time-accumulation plots from neighbouring stations were expected to display turning points at about the same time. A final assumption made was that four seasons exist: the general form of any cyclic process which moves from one extreme to another through transitory states has four stages. A meteorological example is the annual variation of mean temperature from the heat of summer, to the cooling of autumn, to the coldness of winter and, finally, the warming of spring. Seasonality of rainfall in New Zealand is not so well defined but [Sansom and Thomson, 2007] (see their Figure 8) showed that four seasons are clearly discernible in the south and can be seen elsewhere although the winter minimum is not marked.
 To allow for the expected spatial coherence of season changes, the stations were organised into regions as shown in Figure 2. Twelve stations were included in each region and the regions were formed such that in general all 12 could be considered to represent an area with a uniform rainfall climatology [Sansom, 1985; 1986]. A few stations were included in more than one region. Some could not sensibly be grouped and are shown in the figure as black dots. Two areas in particular were affected: the west of the South Island, which is a high rainfall area with insufficient stations to form a region; and the southwest of the North Island, where a dearth of stations centred at about 39°30'S, 175°0'E precluded the formation of a region containing the stations centred about 40°0'S, 175°30'E which have a climatology distinct from the regions shown by + or × or inverted triangles.
 The method developed was applied to regions but the preliminary step of detecting turning points in the increase of accumulation with time was applied to the time series from individual stations. Two techniques were tried: firstly, those embodied in [Zeileis et al., 2002] where generalised fluctuation tests are implemented for application to time series. However, these tests are designed specifically to detect isolated major changes and not for the regular succession of changes associated with seasonality which, at least in the case of New Zealand rainfall, are less distinct. The adopted technique was that described by [Sansom, 1997] and [Sansom and Gray, 2002] in which a monotonic accumulation line can be approximated by selecting its salient points. A first, and extremely rough, approximation to a monotonic accumulation line is just the first and last points giving a straight line between them rather than the details that actually exist. The difference between the approximation and the actual accumulation line will be a maximum at some point and that point can be considered to be the next most important point in an approximation. The extraction of this first turning point is illustrated in Figure 3 where the vertical red line crosses the accumulation trace at that point of maximum difference. The green line, in the figure, connects this point to the origin and shows how the upper part of the plot deviates from the accumulation line of the lower part. The inset is an enlargement at this first turning point and shows that the prior three months had a lesser rainrate than the three months afterwards, especially immediately afterwards. After finding the first turning point the approximation is two straight line segments and it, as a whole, will have a maximum difference from the actual accumulation line and the point where that occurs should be retained in the approximation and so on. 
Indeed, all the points making up the monotonic accumulation line could be ranked and then a decision made about how many need to be retained for the approximation to be adequate for the purpose to hand.
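The salient-point extraction described above can be sketched as a simple greedy loop. This is a minimal illustration rather than the implementation of [Sansom, 1997]. The function name `salient_points` is hypothetical, and the error is taken as actual minus approximation, an assumed convention under which a negative value marks an increase in the mean daily fall after the point.

```python
from bisect import insort
from itertools import accumulate

def salient_points(acc, n_points):
    """Return up to `n_points` (index, error) pairs in order of extraction.

    The approximation starts as the straight line from the first to the last
    point of the accumulation line `acc`. At each step the point of maximum
    absolute difference between the piecewise-linear approximation and `acc`
    is added as a new breakpoint. The signed error is actual minus
    approximation (an ASSUMED sign convention).
    """
    if len(acc) < 3:
        return []
    breaks = [0, len(acc) - 1]
    found = []
    for _ in range(n_points):
        best_index, best_error = None, 0.0
        for left, right in zip(breaks, breaks[1:]):
            slope = (acc[right] - acc[left]) / (right - left)
            for i in range(left + 1, right):
                error = acc[i] - (acc[left] + slope * (i - left))
                if abs(error) > abs(best_error):
                    best_index, best_error = i, error
        if best_index is None:
            break  # approximation already reproduces the line exactly
        found.append((best_index, best_error))
        insort(breaks, best_index)
    return found

# Ten days at 1 mm/day followed by ten days at 5 mm/day: a single turning point.
example_acc = list(accumulate([1.0] * 10 + [5.0] * 10))
example_points = salient_points(example_acc, 3)
```

For the synthetic record the only point extracted is the last day of the drier spell, and its error is negative because the accumulation line lies below the chord there. The loop rescans every segment at each step, which is adequate for a sketch but not efficient for long records.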
 For the case under consideration, an approximation was not required; instead four, more or less, significant turning points per year of record were required. Perhaps more significant changes in the mean accumulation with time take place than those associated with seasons, in which case seasonality would be of lesser importance, but in this study the most highly ranked turning points were considered to be candidates for times of season change. All records were 55 years long and, ideally, 220 turning points per station needed to be identified. However, to diminish the risk of spurious seasons in the form of short periods (i.e., sub-season scale) of drought or enhanced rainfall, or the risk of missing a season through secondary effects out-ranking some of the weaker season changes, more than 220 were identified. If taken too far, spurious seasons would almost certainly be introduced; thus the number was limited to 330. This was in the expectation that enough of these would not be supported by other stations in the region and the overall number contemporary with other members of the region would be closer to 220.
 As each of the 330 turning points was identified, the error between the current (i.e., that due to the turning points already identified) approximation and the actual accumulation line was noted for use as a measure of the point's importance. The sign of this metric also identified the sense of the rainrate change, with negative metrics indicating an increase in the mean daily fall in the days after the turning point relative to that of the days before. As the regions were chosen to be sufficiently small to have a uniform rainfall climatology, at a season change within a region the sign of the metric should be the same throughout. (Between regions the sign may be different even though the time of change is the same. For example, a season change that occurs due to a change in the prevailing airflow could well cause enhanced rain in one region but diminished rain in a neighbouring region.)
 The method developed to analyse the 330 turning points from each of the 12 stations within a region consisted of two stages: 1) the actual dates of the season changes within a region in each of the 55 years of record were estimated; and, 2) the parameters specifying the mean and variability of the dates (i.e., day of year) of each of the four season changes were estimated. Stage 1 involved preliminary estimation of the season change parameters based simply on the sign of the metric (see below for a brief description and the Appendix for a fuller account) and was primarily intended to provide estimates of the mean mid-seasons (i.e., days of the year midway, on average, between adjacent season change dates). These can then be used to divide up the year and the turning points within each such division for each year are the candidates for the time of change of that particular season for that particular year. Provided the variability of the season change times is not too great the exact location of the mid-seasons is not required as few changes would be expected at those times. Stage 2 used these classified candidate dates (see below for a brief description and the Appendix for a fuller account) in further estimation of the season change parameters.
 Thus, considering all the stations in a region, the time span of the record was divided into periods in which the metric from all stations had the same sign. For each such period the median date of the turning points, the number of stations in the period and the date range of the period were determined. Occasionally in a period a single station had more than one candidate date but this was reduced to one by retaining the occasion when the magnitude of the station's metric was largest amongst those offered by that station. Whether a particular period truly covers a season change time was judged from the number of stations in the period and its date range. For those periods retained with sufficient members having turning points in a short enough interval, the median date was taken to be the time of season change. These dates were fitted to a four component mixture of von Mises distributions [Evans et al., 2000].
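The Stage 1 grouping into same-sign periods might be sketched as below. The function name `stage1_periods` and the tuple layout are hypothetical. The default thresholds (more than one station, date range of at most 2 days) are the ones quoted in the results, and computing the date range after reducing each station to one candidate is an assumption.

```python
from itertools import groupby
from statistics import median

def stage1_periods(points, min_stations=2, max_range_days=2):
    """Group turning points into same-sign periods and keep the coherent ones.

    `points` is an iterable of (day, station, metric), with `day` an integer
    day count from the start of the record. Returns a list of
    (median_day, rate_increases, n_stations, range_days) for retained periods.
    A period covering a single day has a range of 1 day.
    """
    ordered = sorted(points, key=lambda p: p[0])
    retained = []
    # A period is a maximal run of consecutive turning points with the same sign.
    for rate_increases, run in groupby(ordered, key=lambda p: p[2] < 0):
        best = {}
        for day, station, metric in run:
            # Keep only the candidate with the largest |metric| for each station.
            if station not in best or abs(metric) > abs(best[station][1]):
                best[station] = (day, metric)
        days = sorted(day for day, _ in best.values())
        range_days = days[-1] - days[0] + 1
        if len(days) >= min_stations and range_days <= max_range_days:
            retained.append((median(days), rate_increases, len(days), range_days))
    return retained

example_periods = stage1_periods([
    (100, "A", -3.0), (101, "B", -2.0), (101, "A", -1.0),  # coherent increase
    (150, "A", 4.0),                                       # single station
    (190, "A", -2.0), (200, "B", -1.0),                    # too spread out
])
```

Only the first period survives in the example: the second contains a single station and the third spans 11 days.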
 The von Mises distribution is a circular distribution so, unlike an unbounded distribution along the real line, the domain is restricted to the range of angles that cover a single revolution around a point. It is similar to the normal wrapped around a circle with one parameter specifying the mean location as an angle and the other parameter the concentration at the mean (analogous to the reciprocal of the variance). High concentrations of 10 or more apply to cases with relatively small variability about the mean whereas a zero concentration is equivalent to a uniform distribution around the circle. A circular distribution was required as by its nature seasonality is cyclic. The von Mises is a standard choice of circular distribution and parameter estimation was made by numerical maximisation of the likelihood. Initial values were required for the fitting procedure. Firstly, setting the component fraction to be 0.25 for each of the four components was appropriate as four seasons were expected every year. Secondly, in the expectation that the variability was concentrated about the mean time of change, the initial value for the concentration parameter was taken to be 10 for all components. Finally, since little was known about the mean dates of season change, random dates were generated as initial values and 50 trial fits were made. Some of the trials were distinctive and unlike any others but, generally, they formed a few large groups. Kmeans clustering [Hartigan and Wong, 1979] was applied to the trials, with the degree of clustering set so that the largest group contained about half of the trials. (Kmeans clustering is a numerical procedure that aims to partition n observations into k sets (k ≤ n) so as to minimize the within-cluster sum of squares.)
For each of the components, the mean of the pdfs of the fits in this cluster was then numerically fitted to a von Mises pdf and the parameters of those von Mises fits were used to initialise a final numerical maximum likelihood fit to the data.
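The quantity being maximised can be written down compactly. The sketch below evaluates the von Mises mixture density and its log-likelihood using only the standard library. The function names are hypothetical, the optimiser itself is left out, and the Bessel series is only suitable for moderate concentrations.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I0(x) from its power series (moderate x only)."""
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

def day_to_angle(day_of_year, days_in_year=365.25):
    """Map a day of the year onto the circle, in radians."""
    return 2.0 * math.pi * day_of_year / days_in_year

def von_mises_pdf(theta, mu, kappa):
    """Density of a von Mises distribution with mean angle mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))

def mixture_pdf(theta, weights, mus, kappas):
    """Density of a von Mises mixture; `weights` are the component fractions."""
    return sum(w * von_mises_pdf(theta, m, k) for w, m, k in zip(weights, mus, kappas))

def log_likelihood(angles, weights, mus, kappas):
    """Log-likelihood of observed angles under the mixture."""
    return sum(math.log(mixture_pdf(t, weights, mus, kappas)) for t in angles)

# Initial values as described in the text: equal fractions and concentration 10.
# The mean angles here are arbitrary illustrative values.
weights, kappas = [0.25] * 4, [10.0] * 4
mus = [0.5, 2.0, 3.5, 5.0]
ll_at_means = log_likelihood(mus, weights, mus, kappas)
ll_shifted = log_likelihood(mus, weights, [m + 0.7 for m in mus], kappas)
```

Any general-purpose numerical maximiser could then be applied to `log_likelihood`, started from each of the 50 sets of random mean dates.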
 For Stage 2, discrimination limits were required such that the day before the limit could be identified as most probably belonging to one season while the day after would most probably be of the next season. These limits, termed the mid-seasons as described above, were taken to be where the tails of the pdfs of adjacent components crossed. Thus, considering all the stations in a region the time span of the record was divided into periods stretching from one mid-season to the next and the turning points within each period were the candidates for the times of season change. Some statistics were compiled for each period: 1) the number of candidates; 2) by dropping those whose metric is of opposite sign to that of the majority and by only retaining one candidate from each station (the one nearest to the median date of all candidates), the number of different stations that agree with respect to their metric's sign i.e., N1; 3) by dropping the candidates most distant from the median until either the range of candidate dates is less than 15 days or less than 4 stations remained or all were at the median date, the final number of candidates i.e., N2; 4) the median date of the final candidates; and, 5) the date range of the candidates i.e., R days. A score, S, was constructed as:
which can range from 0 to 36 as each of the three additive components ranges from 0 to 12. This score measures how well the median date was selected in terms of agreement amongst the stations. The median dates for each region were then fitted to four component von Mises mixture distributions by the same method as used in Stage 1.
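The per-period statistics that feed the score can be sketched as follows. The score itself is not computed here. Several details are assumptions made for the illustration: the median used to pick one candidate per station is taken over all candidates in the period, the median is recomputed after each drop, ties in the majority sign go to negative metrics, and the date range counts both end days. The function name `summarise_period` is hypothetical.

```python
from statistics import median

def summarise_period(candidates, max_range_days=15, min_stations=4):
    """Summarise the candidates falling between two adjacent mid-seasons.

    `candidates` is a list of (day, station, metric).
    Returns (n_candidates, n1, n2, median_day, range_days).
    """
    n_candidates = len(candidates)
    if n_candidates == 0:
        return (0, 0, 0, None, 0)
    overall_median = median(day for day, _, _ in candidates)
    # Drop candidates whose sign disagrees with the majority (ties: negative).
    n_negative = sum(1 for _, _, metric in candidates if metric < 0)
    keep_negative = 2 * n_negative >= n_candidates
    agreeing = [c for c in candidates if (c[2] < 0) == keep_negative]
    # Retain one candidate per station: the one nearest the overall median date.
    nearest = {}
    for day, station, _ in agreeing:
        if station not in nearest or abs(day - overall_median) < abs(nearest[station] - overall_median):
            nearest[station] = day
    days = sorted(nearest.values())
    n1 = len(days)
    # Drop the most distant candidate until the range is under 15 days or fewer
    # than 4 stations remain (all at the median implies a range of 1 day).
    while days[-1] - days[0] + 1 >= max_range_days and len(days) >= min_stations:
        centre = median(days)
        days.remove(max(days, key=lambda d: abs(d - centre)))
    return (n_candidates, n1, len(days), median(days), days[-1] - days[0] + 1)

example_summary = summarise_period([
    (100, "A", -1.0), (102, "B", -2.0), (104, "C", -1.0), (105, "D", -3.0),
    (140, "E", -1.0), (103, "F", 2.0), (101, "A", -0.5),
])
```

In the example station F is dropped for disagreeing in sign, station A contributes its candidate nearer the median, and the outlying candidate on day 140 is trimmed, leaving four stations within a 5-day range.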
 Each region is designated by a station number as indicated on Figure 2 and contains 12 stations. Thus with the 330 turning points found for each station there was a maximum in each region of 3960 periods in the unlikely circumstance that stations never agreed on the day and alternated in sign. However, Table 1 shows the number of periods found was generally somewhat less. For example, for region 1001, the 3960 turning points grouped into 797 periods in each of which the metric had the same sign for a number of stations. The distributions of the number of stations in a period and of the number of days from the earliest to the latest in each period are shown for 1001 in Figure 4. As another example, those for 5744 are also shown in Figure 4 as are the overall distributions. These were all similar, with the station number distributions having fat tails such that periods with the maximum of 12 stations were nearly as well represented as those with only 4 or 5 stations. The date range distributions were dominated by the first class but much of that was due to periods containing a single station and consequently a range of 1 day. As explained in more detail in the Appendix, to avoid such periods, and the other extreme when a period with many members had a large date range, periods were only retained for further analysis when they contained more than one station and the date range of the period was at most 2 days. The numbers of surviving periods are shown in Table 1 where the mean over the regions is 229, which was close to the expected number of 220.
Table 1. Defining a Period as the Interval During Which the Signs of the Metric for the Turning Points Stay the Same, Then for Each Region, the Total Number of Periods and the Number Which Are Just a Few Days Long and in Which Several Stations Agree on the Sign. Also the Number Selected for the Stage 2 Fitting
Total Number of Periods
Number with >1 station and Period < 3 days long
Number for Stage 2
 The Stage 1 fitting method was applied to each region and Figure 5 shows the “best” and “worst” fits. This judgement has been made according to what might be expected within the hypothesis of stochastic seasonality with regard to the variability of season change dates. In particular: each of the four seasons should occur once a year so each component should be equally represented with a fractional representation of 0.25; each component should have a period of the year when it is dominant and of little significance at other times; and the average length of the seasons should not be too short or too long.
 In Figure 5, the dates which are fitted to the four component von Mises mixture distribution are shown as the background histograms. The yellow dashed lines show the individual fits from the 50 trials that formed the largest cluster when sufficient kmeans clustering was applied to produce one cluster with about 25 members. The black lines are the components of the mixture estimated from the yellow fits as described in the last section. The red line shows the sum of the components and, finally, the green and red vertical lines are, respectively, at the mean of the components and at the mid-seasons defined as the places where adjacent components’ pdfs cross. For 5744: the components are of similar size ranging from 20.1% to 29.4%; each component is the dominant one when the black and red lines are close or touch; and, the average season length ranges from 71 days to 115 days. In contrast, for 4771: the components range from 10.7% to 42.1% of the total; no component is ever entirely dominant to the exclusion of the others (the ones located at days 78 and 181 most closely approach dominance but the period around day 250 is confused); and the seasons range from a mean length of 59 days to 104 days. Also an alternation through the year of mean change dates and mid-seasons should occur but for 4771 there is an exception to this near day 250.
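Locating a mid-season as the crossing point of two adjacent weighted component pdfs can be sketched with a bisection search on the difference of the log densities. This is an illustration under the assumption that exactly one crossing lies between the two component means. The function names and the (fraction, mean, concentration) tuple layout are hypothetical.

```python
import math

def log_i0(kappa):
    """Log of the modified Bessel function I0, via its power series (moderate kappa)."""
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= (kappa / (2.0 * k)) ** 2
        total += term
    return math.log(total)

def log_component(theta, fraction, mu, kappa):
    """Log of a weighted von Mises component density at angle theta."""
    return (math.log(fraction) + kappa * math.cos(theta - mu)
            - math.log(2.0 * math.pi) - log_i0(kappa))

def mid_season(first, second, tol=1e-10):
    """Angle between the means of two adjacent components where their pdfs cross.

    `first` and `second` are (fraction, mu, kappa) with `second` following
    `first` around the circle. Raises ValueError if neither component
    dominates at its own mean, i.e. the expected alternation fails.
    """
    low = first[1]
    high = second[1] if second[1] > first[1] else second[1] + 2.0 * math.pi

    def difference(theta):
        return log_component(theta, *first) - log_component(theta, *second)

    if difference(low) <= 0 or difference(high) >= 0:
        raise ValueError("components do not alternate in dominance")
    while high - low > tol:
        middle = 0.5 * (low + high)
        if difference(middle) > 0:
            low = middle
        else:
            high = middle
    return (0.5 * (low + high)) % (2.0 * math.pi)

# Equal fractions and concentrations: the crossing is midway between the means.
example_crossing = mid_season((0.25, 1.0, 10.0), (0.25, 2.0, 10.0))
# A pair of components that straddles the start of the year (angle 0).
example_wrapped = mid_season((0.25, 6.0, 10.0), (0.25, 0.5, 10.0))
```

Working with log densities avoids underflow in the tails, which is where the crossings lie. The `ValueError` branch corresponds to the kind of exception noted for region 4771, where a mid-season had to be set by hand.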
 This exception was dealt with by re-setting the mid-season for 4771 at day 250 to be at day 240. A similar exception was found for 2758 and was dealt with in a similar way. Using the mid-seasons as discriminants the turning points were divided into individual seasons and one was chosen from each season as detailed at the end of the last section. In the bottom row of Figure 4, histograms of the scores, S, for two of the regions and for all regions combined show distributions slightly skewed to the right with means at about 20. The score had been devised to aid in the elimination of ill-defined season changes but there seemed no basis to suggest that the low scoring seasons should be dropped so all seasons were then fitted to a four component von Mises mixture. The numbers of season change dates for each region are shown in the rightmost column of Table 1; the average number is 204 so, on average, in 16 of the 55 years one of the seasons could not be identified.
 In Figure 6 counts of the turning points for each week of the year are shown in the top row with the mean change dates and mid-seasons estimated in Stage 1 shown respectively by vertical green and red lines. In the middle row the turning points are plotted as small points by year and day-of-year and coloured according to between which pair of mid-seasons they fall (i.e., by season). The one chosen by the scheme outlined above is indicated by an “x” with a size depending on its score, so the larger ones were for those season changes when many stations agreed upon a short period. The dates of these chosen ones were used to produce the histogram in the bottom row and the results of the fitting procedure are shown as in Figure 5 with, again, the “best” and “worst” being shown. The “best” was 5744 as before with even tighter fits to the overall envelope while the “worst” was 1564 but this might well be considered to be “better” than the “best” Stage 1 fit at 5744.
 Figure 7 shows New Zealand with the 11 regional fits superimposed at about their correct geographical positions i.e., the four components shown by black lines in the bottom row of Figure 6 are repeated near the top of Figure 7 with label 1564 and at the bottom with label 5744. In each of the 11 fits the four components are of a similar size with each one dominant at and near to its peak. The mean times are shown in the histogram at the bottom right of Figure 7 coloured by season and the median day of year of change for each season is shown by the dashed vertical green line; similarly the median mid-seasons are shown by the dashed vertical red lines. Table 2 contains these mean times and also the mean mid-seasons for each region and the mean season lengths as both the number of days between the median season change dates and the median mid-season dates.
Table 2. For Each Region the Mean Day of the Year When Season Changes Take Place or That Is, on Average, the Mid-Season i.e., the Day Between Season Changes That Is Most Likely to Be of That Season. Also the National Median Day of Year for the Changes and Mid-Seasons (These Are Shown by the Green and Red Lines Respectively in Figure 7) and the Median Number of Days Between Either Season Changes or Mid-Seasons
Mean season change day of year
Mean mid-season day of year
Duration to next
 The fitting process as used in both Stage 1 and 2 can be applied to any set of dates and some preliminary fits to random samples of 220 dates over a period of 55 years indicated these often had the same character as the fits to the data. The question then was: do the turning points extracted from the daily rainfall record have any physical significance? Or are they just another set of random dates? To answer this, 100 sets of 220 randomly selected dates over a 55 year period were processed in the same way as were the turning points from the rainfall records. The rainfall records had been collected into 11 regions and histograms of the parameters of their von Mises mixtures are shown in Figure 8, for Stage 1 in the top row and Stage 2 in the middle row. This is strictly the case for the mixing fraction and the concentration parameter but the location parameter has been converted into season lengths (i.e., the gaps between adjacent locations). The parameters estimated from the 100 random sets are shown in the bottom row.
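Generating the random comparison sets is straightforward. The sketch below draws 100 sets of 220 distinct dates from the 55-year span of the record (1 January 1945 to 31 December 1999). Sampling without replacement, the fixed seed and the function name `random_date_sets` are assumptions made for reproducibility of the illustration.

```python
import random
from datetime import date, timedelta

def random_date_sets(n_sets=100, n_dates=220,
                     start=date(1945, 1, 1), end=date(1999, 12, 31), seed=0):
    """Return `n_sets` sorted lists of `n_dates` distinct random dates in [start, end]."""
    rng = random.Random(seed)          # seeded so the example is repeatable
    n_days = (end - start).days + 1    # number of days in the record
    return [
        [start + timedelta(days=offset)
         for offset in sorted(rng.sample(range(n_days), n_dates))]
        for _ in range(n_sets)
    ]

example_sets = random_date_sets()
```

Each set can then be passed through the same mixture fitting as the rainfall turning points, with each date converted to a day of the year.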
 As there were four components the overall mean mixing fraction must be 0.25 for both the data and the random dates. Similarly the overall mean season length must be π/2 = 1.57 radians but the only constraint on the concentration parameter is that it is positive. Thus t-tests between the Stage 1 or 2 and the simulated mixing fractions and season lengths gave p-values of 1 but for the F-tests on the variances the p-values are highly significant especially for Stage 2. For both mixing fractions and season lengths the variability about the known mean values is smaller for the data than for the simulations. This agrees with what might be expected with regard to the variability of season change dates within stochastic seasonality. Regarding the concentrations in the rightmost column of Figure 8, as there were several values over 50 with a maximum of 3312, the tail for the simulations has been pulled into the last class shown in the figure. This will decrease both the mean and variance and the p-value of the Stage 2 F-test against the simulations with their full tail was under 10⁻⁶ rather than the 0.4 shown on the figure. On the other hand the p-values for the t-test against the full tail were 0.21 and 0.44 which might suggest that the mean concentrations for both the data and simulations were not significantly different but the mode of the Stage 2 results is clearly higher than the mode of the simulations. Again this agrees with what might be expected with regard to the variability of season change dates within stochastic seasonality i.e., the times of season change, although variable, are not excessively so.
6 Summary and Discussion
 The daily rainfall records from 141 stations within New Zealand were grouped into regions with 12 stations in each region and each of the 11 regions so formed was analysed separately by the same methodology. The daily records were recast as cumulative sums to facilitate selection of those days when a change in the mean daily rain rate, or a turning point, occurred. Assuming that four rainfall seasons occur per year, the first 4N turning points, where N is the record length in years, can be taken as candidates for times of season change. To allow for mis-identification, somewhat more than 4N were taken as candidates but, by looking for consistency within the region, some candidates were rejected and, on average over the regions, approximately 4N were retained.
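 Why a cumulative sum makes a change in mean rain rate easy to locate can be shown with a short sketch. The metric used to rank turning points is not specified in this section, so the code below substitutes the classical CUSUM change-point estimate (the index of the largest absolute cumulative deviation from the overall mean) and finds a single change only. The function names and the synthetic series are assumptions for illustration.

```python
import numpy as np


def cusum_of_deviations(daily_rain):
    """Cumulative sum of daily rainfall minus its overall mean."""
    rain = np.asarray(daily_rain, dtype=float)
    return np.cumsum(rain - rain.mean())


def strongest_turning_point(daily_rain):
    """Index of the largest absolute CUSUM value, a simple single change-point estimate."""
    c = cusum_of_deviations(daily_rain)
    return int(np.argmax(np.abs(c)))


# Synthetic record: 100 days at 1 mm/day followed by 100 days at 3 mm/day.
rain = np.concatenate([np.full(100, 1.0), np.full(100, 3.0)])
change_index = strongest_turning_point(rain)  # last day of the drier regime
```

 While the rain rate is below the overall mean the cumulative sum falls steadily, and it rises once the rate is above the mean, so a change in mean rate appears as a turning point in the curve rather than as a small shift buried in day-to-day noise.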
 The conceptual model for stochastic seasonality is that, given four seasons exist, there are four days of the year which represent the mean times of season change and there is some variability about these days from year-to-year. The statistical model chosen to represent this was a four component von Mises mixture distribution. The parameters of the mixture distribution were estimated by a two stage process with each stage using the full set of candidate dates and having its own rejection process. The first stage used the sense of change of the turning points to classify them into seasons and the estimation procedure resulted in a first estimate of the mid-seasons. These were used in the second stage as discriminant limits to re-classify the turning points for another application of the estimation procedure. The rationale is that as few season changes would be expected near the mid-seasons, an initial, even if coarse, estimate of their positions should provide better discrimination than the sense of change of the turning points. In both stages, the estimation procedure was applied many times to randomly chosen initial values for the location of the mean dates of season change. Especially in the second stage, many of the fits from these random starts were similar in terms of the shapes of their pdfs and a final fit was initiated from those parameters that engendered the “mean” pdf.
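 A single application of the estimation procedure to a four component von Mises mixture can be sketched with an expectation-maximisation (EM) loop. This is not necessarily the estimation procedure used in the paper, which is not specified in this section: the EM updates, the Best and Fisher piecewise approximation for the concentration, the cap on the concentration for numerical stability, and all function names are assumptions made here. The sketch uses one set of starting locations, whereas the procedure described above repeats the fit from many random starts.

```python
import numpy as np


def vm_pdf(theta, mu, kappa):
    """von Mises density; np.i0 is the modified Bessel function of order zero."""
    return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * np.i0(kappa))


def kappa_from_r(r):
    """Approximate concentration from mean resultant length (Best and Fisher form)."""
    r = np.clip(np.asarray(r, dtype=float), 1e-12, 1.0 - 1e-9)
    low = 2.0 * r + r**3 + 5.0 * r**5 / 6.0
    mid = -0.4 + 1.39 * r + 0.43 / (1.0 - r)
    high = 1.0 / (r**3 - 4.0 * r**2 + 3.0 * r)
    return np.where(r < 0.53, low, np.where(r < 0.85, mid, high))


def fit_vm_mixture(theta, mu0, n_iter=200, kappa0=2.0, kappa_max=100.0):
    """EM fit of a von Mises mixture started from the locations in mu0."""
    theta = np.asarray(theta, dtype=float)
    mu = np.array(mu0, dtype=float)
    k = mu.size
    kappa = np.full(k, kappa0)
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each date.
        dens = w * vm_pdf(theta[:, None], mu, kappa)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted circular mean, resultant length and mixing fraction.
        nk = resp.sum(axis=0)
        c = resp.T @ np.cos(theta)
        s = resp.T @ np.sin(theta)
        mu = np.mod(np.arctan2(s, c), 2.0 * np.pi)
        kappa = np.minimum(kappa_from_r(np.hypot(c, s) / nk), kappa_max)
        w = nk / theta.size
    return w, mu, kappa


# Usage on synthetic dates (angles in radians) with hypothetical true locations.
rng = np.random.default_rng(0)
sample = np.mod(
    np.concatenate([rng.vonmises(m, 8.0, 300) for m in (0.8, 2.3, 3.9, 5.5)]),
    2.0 * np.pi,
)
weights, locations, concentrations = fit_vm_mixture(sample, mu0=[1.0, 2.5, 4.0, 5.5])
```

 The returned mixing fractions, locations and concentrations correspond to the three kinds of parameter discussed in the text; repeating the call with randomly drawn `mu0` values would mimic the many random starts.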
 The results from the 11 regions were similar with the fractional representation of the components in the mixture distributions being around 0.25 and so supporting the assumption of four seasons a year. Also the mean season change dates were relatively evenly spaced so none of the seasons were, on average, particularly short or long. Also, the variability about these mean dates was moderate so, again, no seasons were particularly short or long. Over New Zealand for each season, generally, half of the regions agreed to within two weeks on the mean date of change; this was particularly so for those changes at about day 50 and day 130 but for the other two seasons there were almost two modes centred two weeks either side of the overall mean. However, these variations did not follow any geographical pattern.
 The application of a defined method gave results that aligned with expectations regarding stochastic seasonality and, to increase confidence that the statistical techniques had indeed revealed physical behaviour, the method was applied many times to randomly generated sets of dates. The results from the observed dataset were highly significant and different from those of the random datasets in ways that underlined the physical nature of the result. These ways are: firstly, only occasionally were there three or five, rather than four, seasons in a year, whereas this occurred frequently with the random dates; secondly, mean season lengths stayed close to 90 days, but some much shorter or longer ones occurred in the random data; finally, although some extremely well-defined seasons occurred in the random dates, the inter-annual variation in the observed data was generally less than in the random data.
Appendix A
 The estimation of dates for the actual times of season change was made in two stages. The first stage made an initial guess from candidate dates as detailed below and by fitting these to the four component von Mises model found estimates of the mean dates of the mid-seasons i.e., the dates approximately half way through the seasons. In the second stage these mid-seasons were used as fixed limits to the seasons so that a more refined guess, as detailed below, for the actual dates from the same set of candidate dates could be made and followed by another fit to the von Mises mixture. This appendix uses an example from one group of stations for a period of 400 days to illustrate how, from the same set of candidate dates, the two stages select the actual dates of season change.
 In Figure A1 the candidate dates are plotted so, for example, a possible season change was detected for Station 8 of the 5744 group on about day 390; or, as another example, Stations 1 to 7 and also 10 all agree that the season changed on day 415. In the upper panel the start and end of periods during which the metrics of the stations in the group are of the same sign are shown by the dashed vertical green and red lines. For example, for the period from about day 390 to day 415 the metric for all 12 stations had the same sign; then on about day 450 the metric of Station 1 changed its sign, but not until about day 570 did the metrics of Stations 7, 8, 10 and 11 change, followed shortly afterwards by Stations 1 (again), 2, 3, 4 and finally 9, with Stations 5, 6 and 12 not showing a change at all. In this example, the first change for Station 1 was discarded because the metric on that occasion was of smaller amplitude than at its other change. The dashed vertical blue line near day 580 then shows the median date of all the change dates in that period, i.e., the estimated actual season change time. The other dashed vertical blue lines likewise indicate estimated season change times, but only those times where more than one station was involved and the spread of the individual dates was no more than 2 days were retained; these are indicated by the solid vertical blue lines (for purposes of illustration the limit was relaxed to 6 days in the example, as otherwise there were no survivors). The retained dates were fitted to the von Mises mixture and the mid-seasons were estimated.
 The thick vertical black lines in the bottom panel of Figure A1 are drawn at these estimated mid-seasons. In Stage 2 a date for a season change was estimated from all the candidate season change dates between adjacent mid-seasons. In the example, there are no candidates between about day 470 and day 550 or after day 750, so for those periods no season change date can be estimated. For the other seasons the selection process is as follows: (i) drop candidates whose metric does not have the sign of the majority of the candidates, i.e., those in the figure with an X; (ii) where more than one candidate exists at a particular station, retain only the one closest to the overall median of the remaining candidate dates, i.e., those in the figure with the big dots; (iii) drop the candidates furthest from the median one by one until either the date range of the remainder is less than 15 days, or fewer than 4 stations remain, or all stations agree on the date, which leaves those large dots in the figure with an internal white dot; (iv) take the median date for these remaining stations as the estimate of the actual date of season change, i.e., the vertical blue lines in the figure. Again these estimated dates (i.e., the X's in the middle panels of Figure 6) were fitted to the von Mises mixture to give the fits shown in the bottom panels of Figure 6.
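 The Stage 2 selection between one pair of adjacent mid-seasons can be sketched as below. This is one reading of the description, not the authors' code: it is assumed here that the metric sign is coded as +1 or -1, that a tie in the majority vote goes to the positive sign, that the median is recomputed after each candidate is dropped, and that trimming stops as soon as fewer than 4 stations remain. The function and argument names are hypothetical.

```python
import numpy as np


def stage2_season_change(stations, days, signs, max_range=15, min_stations=4):
    """Estimate one season change date from the candidates between two mid-seasons."""
    stations = np.asarray(stations)
    days = np.asarray(days, dtype=float)
    signs = np.sign(np.asarray(signs))
    if days.size == 0:
        return None  # no candidates, so no date can be estimated

    # Drop candidates whose metric sign differs from the majority sign.
    majority = 1 if np.sum(signs > 0) >= np.sum(signs < 0) else -1
    keep = signs == majority
    stations, days = stations[keep], days[keep]

    # Keep, for each station, the candidate closest to the overall median.
    median = np.median(days)
    best = {}
    for station, day in zip(stations, days):
        if station not in best or abs(day - median) < abs(best[station] - median):
            best[station] = day
    days = np.array(sorted(best.values()))

    # Drop the candidate furthest from the median until the spread is small
    # enough or too few stations remain (identical dates give a zero range).
    while days.size >= min_stations and days.max() - days.min() >= max_range:
        furthest = np.argmax(np.abs(days - np.median(days)))
        days = np.delete(days, furthest)

    return float(np.median(days))


# Hypothetical candidates: Station 1 appears twice and Station 6 has the minority sign.
estimate = stage2_season_change(
    stations=[1, 1, 2, 3, 4, 5, 6],
    days=[450, 578, 570, 572, 575, 580, 640],
    signs=[1, 1, 1, 1, 1, 1, -1],
)
```

 In this example the minority-sign candidate at day 640 and the early Station 1 candidate at day 450 are removed, the remaining five dates span 10 days, and their median is returned without any further trimming.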
 This work was funded through core funding from the New Zealand Ministry of Science and Innovation. Walter Zucchini lent support to the view that the approach taken was viable, and Pierre Ailliot provided a valuable introduction to fitting von Mises distributions.