Extreme value distributions describe interannual variability in the seasonal North Atlantic phytoplankton bloom

The North Atlantic phytoplankton bloom depends on a confluence of environmental factors that drive transient periods of exponential phytoplankton growth and interannual variability in bloom magnitude. I analyze interannual bloom variability in the North Atlantic via extreme value theory where the generalized extreme value distribution (GEVD) is fitted spatially to annual maxima of satellite‐measured surface chlorophyll. I find excellent agreement between the observed distribution of interannual bloom maxima and those predicted from the GEVD. The spatial distribution of fitted GEVD parameters closely follows basin bathymetry where the largest extremes and heaviest distribution tails are found on the continental shelves and slopes. Trend analyses suggest weak evidence for changes in GEVD parameters, despite regional trends in mean chlorophyll levels and sea surface temperature. These results provide a framework to quantify interannual bloom variability and call for further work examining how extreme blooms propagate through food webs and contribute to carbon export.

bloom (Visser et al. 2011). Blooms also contribute to carbon export and are often associated with large export pulses to depth (Briggs et al. 2011).
Several biophysical mechanisms are thought to control bloom initiation and development (Behrenfeld andBoss 2014, 2018). For example, the critical depth hypothesis predicts that bloom initiation onlys begins when mixed layer light levels exceed the threshold of respiration rates, allowing phytoplankton biomass to increase if division rates exceed other losses (Behrenfeld andBoss 2014, 2018). The critical turbulence hypothesis assumes the same threshold as the critical depth hypothesis but focuses on division rates in the actively mixing turbulent layer (Huisman et al. 1999;Taylor and Ferrari 2011). The dilution-recovery hypothesis suggests a more general balance between phytoplankton division and loss rates (often driven by grazing) allowing blooms to develop when division rates are declining so long as loss rates decline below division rates, for example, with dilution via mixed layer deepening (Evans et al. 1985;Behrenfeld 2010). Autumn blooms are also known to occur in some regions, often due to entrainment-driven nutrient fluxes (Findlay et al. 2006). Across proposed mechanisms, the population-level phenomena of blooms arise due to transient-positive imbalances between division and loss rates (driven by variations in division, loss, or both) such that the net exponential growth rate r t ð Þ is positive, that is, with μ t ð ÞÀl t ð Þ ¼ r t ð Þ > 0 prior to the bloom, where μ t ð Þ is the division rate and l t ð Þ is the total loss rate. Post-bloom, losses increase to match and exceed division, most often because of increased grazer abundance supported by the elevated phytoplankton biomass (Behrenfeld andBoss 2014, 2018). The magnitude and duration of the transient exponential growth period determines the magnitude of the bloom biomass maximum. We hereafter refer to a "bloom" as the period of positive net exponential growth and "bloom magnitude" as the maximum concentration achieved over this period (see Behrenfeld and Boss 2018 for a discussion of bloom definitions).
The North Atlantic basin exhibits one of the largest and well-studied seasonal blooms across the global ocean. Deep winter mixing and rapid re-stratification creates ideal conditions for exponential phytoplankton growth, with bloom initiation following a northward progression as re-stratification occurs earliest at low latitude (Dutkiewicz et al. 2001;Siegel et al. 2002). Meteorological variability then modulates the precise timing and magnitude of the bloom according to the impacts on local mixing dynamics (Dutkiewicz et al. 2001;Follows and Dutkiewicz 2002). Importantly, variability in bloom timing and magnitude has been linked to the magnitude of carbon export (Briggs et al. 2011) and to fisheries productivity via the "match-mismatch hypothesis" (Platt et al. 2003). Climate change is expected to alter North Atlantic bloom dynamics via a range of factors, including changes in seasonal mixing depths, nutrient fluxes, and the metabolic impacts of warmer temperatures (Sommer and Lengfellner 2008). Quantifying the dynamics of the North Atlantic spring bloom is thus of central importance for understanding the relevant oceanographic and ecological processes and will aid in tracking the associated impacts of climate change.
Viewing interannual bloom magnitudes as extreme values in phytoplankton time series brings to bear the statistical theory of extreme values. Under the Fisher-Tippett-Gnedenko theorem, the maximum of a sequence of random variables converges in distribution to the generalized extreme value distribution (GEVD; Coles 2001). The theorem yields a threeparameter probability density function describing the limiting distribution of maximum values generated from samples of a stochastic process, analogous to the central limit theorem for the mean of a distribution (Coles 2001;details below). Like the lognormal or chi-square distribution, the GEVD can exhibit a strong right skew; however, the GEVD has the advantage of being derived specifically for stochastic maxima. While the GEVD is increasingly applied to geophysical and climate studies (Easterling et al. 2000;Katz 2010;Aghakouchak et al. 2020), there have been fewer applications to biological time series (Batt et al. 2017; but see interesting exceptions, e.g., Gaines and Denny 1993;Benedetti-Cecchi et al. 2015;Butitta et al. 2017). Consistent with the exponential nature of biological growth, Batt et al. (2017) found that biological time series exhibit consistently heavier tails in their extreme value distributions relative to chemical and geophysical time series. These findings suggest North Atlantic bloom magnitudes as an ideal target of extreme value analysis, due to its annually repeating cycle of transient and variable exponential growth.
Here, I analyzed the North Atlantic satellite chlorophyll record to quantify seasonal phytoplankton bloom variability via extreme value analysis. I estimated GEVD parameters at a 1 /4 latitude-longitude scale. I mapped the fitted parameters spatially and evaluated the GEVD goodness of fit to the chlorophyll time series. I correlated the fitted parameters to bathymetric properties of the North Atlantic basin. I further evaluated evidence for nonstationarity (i.e., time-variability) in GEVD parameters in the context of satellite-observed chlorophyll and temperature trends across the basin. Results of this study will provide a statistical framework to describe interannual bloom variability and allow us to test an important hypothesis with respect to basin-scale environmental change.

Observations
I analyzed two sets of basin-scale satellite chlorophyll observations. First, I used chlorophyll estimates from the Moderate-resolution Imaging Spectroradiometer-Aqua (MODIS-Aqua) sensor spanning years 2002-2021 from the Oregon State University Ocean Productivity database (https:// sites.science.oregonstate.edu/ocean.productivity/). Temporal resolution of the satellite images was 8 d per image. Eighteen complete time series years yielded 18 annual maxima observations for fitting GEVDs. Associated inherent optical properties were estimated using the Garver-Siegel-Maritorena algorithm (Maritorena and Siegel 2005). MODIS-Aqua-based chlorophyll estimates were gap-filled for missing observations due to clouds according to the algorithm described at http://orca. science.oregonstate.edu/gap_fill.php. While gap-filling can alter the underlying chlorophyll distribution, it is not expected to affect the annually measured maximum value as the method calculates an average over observed time points. Secondly, I used the Ocean Colour Climate Change Initiative (OC-CCI) chlorophyll product (Sathyendranath et al. 2019; accessed from www.oceancolour.org) covering the same time period as the MODIS-Aqua dataset to be consistent between the two analyses. OC-CCI chlorophyll is a synthetic product generated by combining information from multiple sensors (Sathyendranath et al. 2019). OC-CCI was not gap-filled and contained missing values due to clouds. I gridded both sets of chlorophyll observations to 1 /4 latitude-longitude resolution. Wintertime satellite chlorophyll observations were not available for higher latitudes and were replaced with a value of zero concentration which did not affect the extreme value analysis. Sea surface temperature data from the MODIS-Aqua sensor were obtained from the Oregon State University Ocean Productivity Database site cited above. Bathymetric depth data were obtained from https://www.gebco.net/data_and_products/gridded_ bathymetry_data/. I analyzed chlorophyll observations as a proxy for phytoplankton carbon biomass, noting that chlorophyll and carbon can decouple due to photoacclimation processes, particularly at low light (Behrenfeld et al. 2005;Sathyendranath et al. 2020).

Extreme value analysis
Given a time series of chlorophyll observations at an individual location, I estimated the parameters of the GEVD via the block maxima approach (Gilleland and Katz 2016), taking time series blocks as individual years. I define x ¼ max y 1 ,y 2 , …, y n À Á as the maximum chlorophyll measurement in a single year of n measurements. Over m years I have m yearly maxima, thus defining the observed annual maxima time series x 1 ,x 2 ,…,x m (the map showing the month when the maxima most frequently occurred is displayed in Supplementary Fig. S1). For an annual maximum x, the GEVD has a probability density function given by where μ, σ, and ξ are the location, scale, and shape parameters, respectively. The location parameter shifts the GEVD along the x axis, the scale parameter controls the spread, and the shape parameter controls the peaked-ness of the mode and heaviness of the distribution tail. Examples of how μ, σ, and ξ modulate the GEVD are given in Supplementary Fig. S2. Formulas for the expected value, variance, and mode of the GEVD are given in Supporting Information S2. In this study, m = 18 for years 2002-2021 and n = 46 due to 8-d spacing of the satellite observations; that is, I fit a GEVD to 18 yearly annual maximum chlorophyll concentrations at each 1 /4 latitude-longitude grid cell. Goodness of fit was evaluated by quantile-quantile plots where the empirical quantiles of the observations are correlated against the theoretical quantiles predicted from the fitted GEVD distributions.
In addition to the three-parameter GEVD described above, I also apply a nonstationary extension where the parameters are described as simple linear functions of time (Gilleland and Katz 2016) where θ is one of the GEVD parameters, θ 0 is the intercept of the linear relationship, and θ 1 is the slope, that is, the rate of change with respect to time.
The log-likelihood for the GEVD parameters (Gilleland and Katz 2016) is given by I maximized the log-likelihood function with respect to the parameters to obtain empirical estimates. I maximized with respect to μ, σ, and ξ in the case of stationary GEVDs, and with respect to the intercept and slope in the case of nonstationary GEVDs. I restricted the analysis to estimating one nonstationary parameter at a time due to data restrictions and weak identifiability when multiple parameters are allowed to vary. I only considered linear functions of time and suggest nonlinear functions for future work. I used numerical optimization routines implemented in the extRemes library within the R programming language (Gilleland and Katz 2016). I compared stationary and nonstationary fits according to the Bayesian Information Criterion (BIC), given by BIC ¼ is the maximized likelihood at empirical estimates b μ, b σ, and b ξ. k is the number of parameters in the GEVD (three for stationary GEVDs, four for nonstationary GEVDs), and n is the number of yearly maxima used in the fit. A GEVD was fit to chlorophyll time series in each 1 /4 pixel. The parameters were mapped spatially and correlated with basin bathymetry. Parameter uncertainty was derived by taking the square-root of the inverse Hessian matrix of the negative log-likelihood function evaluated at the maximum likelihood estimates.

Results
The distributions of annual chlorophyll maxima showed excellent agreement with those predicted from the GEVD. Across the Atlantic basin, observed distribution quantiles correlated with those from the fitted GEVDs at r = 0.97 on the arithmetic scale ( Supplementary Fig. S3a) and r = 0.98 on the log scale ( Supplementary Fig. S3b). The spatial distribution of the quantile-quantile correlation was also consistent across the basin, with no apparent relationship with latitude or distance from the coast (Supplementary Fig. S3c). The region of largest disagreement occurred off the southwest coast of Europe, yet correlations were still above r = 0.7 and remained so across the basin.
When fitted GEVD parameters were mapped spatially, I found that parameter magnitude closely followed basin bathymetry (Figs. 1, 2). Location, scale, and shape parameters were consistently elevated on the shelf and adjacent waters (< 700 m depth; Figs. 1, 2a-c). Location parameters were elevated by over fivefold, scale parameters elevated approximately twofold, and shape parameters elevated over threefold in waters shallower than 700 m. The distinction in magnitude between shelf and open ocean water was strongest for location and shape parameters, and weaker for scale parameters, demonstrating a correlation between the mean interannual bloom magnitude and the "heaviness" of the underlying distribution tail. This pattern appeared on the eastern and western sides of the basin. Deeper slope waters off the coast of Greenland also showed elevated location and shape parameters. The largest parameter magnitudes were found in the shallow Baltic Sea (Figs. 1c,d, 2a,c). The scale parameter showed a different pattern with bathymetry where parameter magnitude was modestly elevated in shelf waters but also showed a step-change decrease in the deepest waters (> 4000 m; Figs. 1b, 2b). However, the aerial distribution of shelf vs. deep water is markedly different, with deep waters limited to the southern half of the basin around the Mid-Atlantic ridge while shelf seas are widely distributed. Parameter uncertainty weakly correlated with parameter magnitude for location and shape parameters, particularly around the Greenland slope, but had no consistent relationship across the basin ( Supplementary Fig. S4). Scale parameter uncertainty was elevated in deeper water ( Supplementary Fig. S4). The areaweighted average of the parameter coefficients of variation (the standard deviation of the parameter uncertainty divided by the fitted mean parameter) was 0.11, 0.38, 0.24 for location, scale, and shape parameters, meaning the 1σ uncertainty was 11%, 23.5%, and 24% of the mean, respectively ( Supplementary Fig. S4). I repeated the uncertainty analysis by artificially halving and doubling the number of observations to diagnose the impact of sample size on estimation uncertainty. Area-weighted coefficients of variation increased to 14%, 26.3%, and 34.9% when sample size was halved, and decreased to 8%, 16.6%, and 17.0% when sample size was doubled, respectively ( Supplementary Fig. S5).
I quantified the correlation between fitted GEVDs parameters using linear relationships (Fig. 2d-f). Bivariate relationships between parameters were well described by a linear intercept and positive slope, indicating strong positive scaling relationships. The strongest relationship was found between fitted location and shape parameters, reflecting that GEVD distributions increase in magnitude and heavy tailedness with decreasing bathymetric depth. I visually characterized how the GEVD changes from deep to shelf waters using the fitted linear relationships (Fig. 2g,h), noting that this characterization represents the basin-averaged relationships so may not necessarily be representative of individual regions. The empirical pattern underlying positive scaling among location, scale, and shape parameters is also reflected in the underlying extreme value time series where the mean and median of the annual extremes are positively related to the extreme variance ( Supplementary Figs. S6, S7). Using a nonstationary GEVD analysis, I found weak evidence for temporal trends in GEVD parameters, despite significant trends in chlorophyll levels and sea surface temperature across the North Atlantic (Fig. 3). Nonstationary parameters were favored in 34.3%, 29.9%, and 36.3% of basin area for location, scale, and shape parameters, respectively (Fig. 3a-c). Where nonstationary parameters were favored, trends in the location parameter positively correlated with background chlorophyll trends (r = 0.70; Fig. 3; Supplementary Fig. S8). However, trends in the scale and shape parameters did not correlate with chlorophyll nor sea surface temperature trends (Supplementary Fig. S8; r < 0.1 in all cases). The weakest evidence for nonstationary parameters was found in the area with the strongest sea surface temperature trends, specifically the warming-cooling dipole pattern on the western side of the basin caused in-part by a slowdown of the Atlantic overturning circulation and associated northward heat transport (i.e., the North Atlantic "warming hole"; Keil et al. 2020). I also examined parameter variability at the level of Longhurst biogeochemical provinces but found no consistent pattern, with high inter-and intra-province variability ( Supplementary  Fig. S9).
I repeated the GEVD parameter estimation using the OC-CCI chlorophyll product and found consistent results to those based on MODIS observations presented above (Supplementary Figs. S10, S11). I also examined the annually observed extremes for autocorrelation and found weak evidence across the basin ( Supplementary Fig. S12).

Discussion
Our analysis demonstrates that annual chlorophyll maxima are well-described by the GEVD based on the statistical theory of extreme values. I achieved a high goodness-of-fit and interpretable spatial pattern across the North Atlantic basin. A clear pattern emerged in the correlation between GEVD parameters and bathymetric depth, with the magnitude and tailedness of chlorophyll extremes increasing on shelves and slopes, implying more variable extremes in these environments. The mechanism for this pattern is unclear and will require a further understanding of how nutrients, shelf stratification dynamics, grazing, and other factors interact to generate extreme chlorophyll concentrations. The heavy distribution tail on the shelf may be related to variability in bloom timing which has been shown to impact interannual variability in shelf bloom magnitude (Friedland et al. 2015). However, the observed relationships may also be a more general food web phenomenon where the tailedness of the chlorophyll distribution is a function of background concentrations which are only coincidentally elevated in shelf environments. I suggest further work examining the potential mechanisms that could explain different extreme value distributions under contrasting oceanographic conditions. Classical models of phytoplankton blooms (Behrenfeld and Boss 2014) may be extended to include stochastic forcing and generate statistical distributions of interannual bloom magnitudes.
Results of the nonstationary analysis suggested weak evidence for changes in GEVD parameters over time. The lack of correlation with observed trends in background chlorophyll and sea surface temperature suggests that climate-related drivers are playing a limited role in modulating interannual bloom magnitude, despite the North Atlantic showing significant climate change (Keil et al. 2020). I note, however, that the current nonstationary analysis is limited by time series length, here applied to 18 yr of observed annual maxima. Uncertainty in the chlorophyll satellite record is also introduced via missing values due to clouds which partly obscures the distribution of maxima. Here, I found that gap-filling the missing observations had no appreciable effect on the estimated extreme value parameters when comparing the gap-filled and non-gap-filled chlorophyll datasets. Continued studies will be required to monitor changes in bloom magnitude over time. An extended satellite record will provide greater statistical power to detect climate-driven trends. An extended record will also constrain more complicated functions describing the variation of parameters with time and their statistical association with environmental factors. Biogeochemical general circulation models may also be analyzed to gain insight into the distribution and generating mechanisms of annual chlorophyll maxima.
Ecologically, interesting questions arise about how extreme blooms propagate through food webs, contribute to carbon export, and impact ecological processes more broadly. For example, annual fisheries recruitment is often characterized by heavy-tailed distributions where individual years exhibit extremely large cohorts, often fueling the fishery for years (Saetre et al. 2002). Extreme bloom years may increase the probability of strong cohorts via the match-mismatch mechanism (Platt et al. 2003), perhaps with temporal lags between blooms and recruitment modulated by trophic transfer. With respect to carbon export, I expect extreme blooms to contribute disproportionately to interannual carbon fluxes due to the commonly observed positive effect of phytoplankton productivity on carbon export efficiency (Britten and Primeau 2016) and observed carbon fluxes associated with blooms in the North Atlantic (Briggs et al. 2011). The observed positive relationship between location and shape parameters may be of particular interest where the frequency and intensity of extreme blooms scales with the background chlorophyll levels. Databases of carbon flux observations may be used to test these relationships at the basin and global scale (Mouw et al. 2016). The GEVD is one extreme value analysis tool that is theoretically motivated when considering block maxima (which was particularly appropriate here to describe the maxima of a repeating annual cycle); however, other statistical descriptions of ecological heavy tailedness can also be used in this context, for example, the lognormal distribution (Anderson et al. 2017).
In summary, the GEVD provided a useful statistical description of bloom variability and a general framework to quantify the spatiotemporal statistics of interannual bloom maxima. I hope this study leads to further analysis of marine ecosystem variability using extreme value theory to better understand how environmental conditions give rise to ecological extreme events, how extreme blooms contribute to fisheries productivity and carbon export, and how these processes may change with climate.