Intraseasonal predictability of natural phytoplankton population dynamics

Abstract It is difficult to make skillful predictions about the future dynamics of marine phytoplankton populations. Here, we use a 22‐year time series of monthly average abundances for 198 phytoplankton taxa from Station L4 in the Western English Channel (1992–2014) to test whether and how aggregating phytoplankton into multi‐species assemblages can improve predictability of their temporal dynamics. Using a non‐parametric framework to assess predictability, we demonstrate that the prediction skill is significantly affected by how species data are grouped into assemblages, the presence of noise, and stochastic behavior within species. Overall, we find that predictability one month into the future increases when species are aggregated together into assemblages with more species, compared with the predictability of individual taxa. However, predictability within dinoflagellates and larger phytoplankton (>12 μm cell radius) is low overall and does not increase by aggregating similar species together. High variability in the data, due to observational error (noise) or stochasticity in population growth rates, reduces the predictability of individual species more than the predictability of assemblages. These findings show that there is greater potential for univariate prediction of species assemblages or whole‐community metrics, such as total chlorophyll or biomass, than for the individual dynamics of phytoplankton species.


| INTRODUC TI ON
Primary production by marine phytoplankton fuels marine ecosystems and marine fisheries (Ryther, 1969). Marine phytoplankton contribute nearly half of global net primary production (Field et al., 1998) and modulate climate by exporting carbon to ocean depths (Falkowski & Oliver, 2007). Because of the importance of phytoplankton dynamics for marine food webs, fisheries, and biogeochemical cycles, there is growing interest in making skillful future predictions of phytoplankton population and community dynamics (Anderson et al., 2016).
Phytoplankton population and community dynamics exhibit variability on rapid (daily to interseasonal) as well as longer-term (interannual to decadal) timescales (Barton et al., 2016;Chavez et al., 2003;Chiba et al., 2012;Falkowski & Oliver, 2007). These dynamics are affected by changing environmental conditions (Falkowski & Oliver, 2007;Margalef, 1978), but also by physiological (i.e., a change in organism traits without a change in genome) and evolutionary (i.e., a change in genome) responses to these changes (Collins et al., 2014;Hunter-Cevera et al., 2014;Irwin et al., 2015;Lohbeck et al., 2012) and interactions within food webs (Di Lorenzo & Ohman, 2013;Ripa & Ives, 2003;Vasseur, 2007;Xu & Li, 2002). While environmental variations on certain spatial and temporal scales may be predictable (e.g., seasonal variations in light), the temporal and spatial trajectory of geophysical turbulence and climate dynamics are typically less predictable (Lorenz, 1963). Consequently, phytoplankton population and community dynamics that are extrinsically forced by environmental variations tend to be unpredictable. The complexity of physiological and evolutionary responses to these changes (Collins et al., 2014;Hunter-Cevera et al., 2014;Irwin et al., 2015;Lohbeck et al., 2012), as well as non-linear and potentially chaotic dynamics with plankton communities (Ascioti et al., 1993;Benincá et al., 2008;Giron-Nava et al., 2017;Huisman & Weissing, 1999), adds to the difficulty of predicting plankton community dynamics. Because of the demonstrated empirical links between variations in plankton community structure, detrital flux to the benthos, and trophic efficiency and fisheries production (Stock et al., 2017), the management of living marine resources could be improved with skillful predictions of phytoplankton community structure (Hobday et al., 2016;Marshall et al., 2019;Tommasi et al., 2017).
Anomalies in phytoplankton populations (here defined as a deviation from mean conditions) rapidly decorrelate in time and space (Doney et al., 1998). For example, phytoplankton population anomalies in most of the global ocean persist for only a few weeks, on average (Kuhn et al., 2019). In contrast, midlatitude sea surface temperature and nutrient anomalies last, on average, a few months to a year or longer (Deser et al., 2003;Kuhn et al., 2019), and may have interannual persistence when subsurface anomalies are reexposed by deep water column mixing in subsequent years (Deser et al., 2003). Longer duration, persistent anomalies in temperature are possible due to dominant modes of climate variability and marine heatwaves, for example, in the tropical Pacific due to El Niño Southern Oscillation (Deser et al., 2010) and in the North Pacific due to the North Pacific "blob" in 2013-2015 . Surface temperature can, in some regions, be predicted skillfully months or even years in advance (Song et al., 2008;Stock et al., 2015;Taboada et al., 2019). Ocean surface chlorophyll and primary productivity may also be predictable, up to years in advance, though the degree of predictability varies strongly in space (Park et al., 2019;Taboada et al., 2019). Prediction skill of the physical and chemical environment, and aggregate measures of primary producers (e.g., chlorophyll and primary production) on intraseasonal and longer timescales is developing quickly (Park et al., 2019), and predictive models exist for select, influential taxa such as harmful algal bloom-forming taxa (e.g., Anderson et al., 2016). However, ecological predictions for a broad range of plankton populations have not been conducted, and less is known about whether predictability for integrated species assemblages may differ from that of individual taxa.
In this paper, we quantified the predictability of single species and multi-species assemblages of phytoplankton using monthly averaged phytoplankton time series data for 198 phytoplankton taxa sampled at the long-term coastal monitoring Station L4 in the Western English Channel (50° 15'N, 4° 13'W)

collected between
October 1992 and December 2014. In this context, we defined an assemblage as a group of more than one species, where the abundance of the assemblage is the sum of the abundance of the individual taxa within the assemblage. Using empirical dynamic modeling (Sugihara & May, 1990), we first quantified the univariate predictability of each taxon, defined as the correlation coefficient between predicted and observed taxa abundance one month into the future.
A higher correlation coefficient implies higher predictability. We then aggregated species based on several levels of taxonomic hierarchy and/or individual size, with assemblage sizes ranging from 2 to a maximum of 198 species, and assessed the assemblage predictability in comparison to the predictability of individual taxa. Our decision to directly forecast the abundance of phytoplankton assemblages (instead of aggregating the forecasted abundance of individual species) was driven by the fact that many ecological measurements, such as total chlorophyll or biomass (Barton et al., 2015), taxonomic assignations to a level of organization higher than species (e.g., diatoms), and size-fractionated observations (Hirata et al., 2011;Marañón et al., 2012), are in practice measurements of phytoplankton assemblages.
Specifically, we ask the following: (1) "Are the dynamics of single species more or less predictable than the dynamics of assemblages of species aggregated together?", and (2) "How does assemblage composition affect predictability?". Our underlying hypothesis is that assemblages of species are likely to be more predictable than individual species. Such aggregated community metrics often have lower temporal variance than individual species, provided that the constituent time series are not perfectly correlated (Doak et al., 1998;Schindler et al., 2015;Vasseur & Gaedke, 2007). This process is called the "portfolio effect" and was first described in financial investment theory to improve stability by increasing portfolio size, and assuming individual investments are not perfectly correlated (Markowitz, 1952). The "portfolio effect" has been demonstrated to be effective in guiding large-scale fisheries restoration efforts by quantifying multi-stock dynamics (DuFour et al., 2015), as well as understanding changes in diversity-stability relationships (Lhomme & Winkel, 2002). These approaches typically use the portfolio effect as a measure of aggregate stability, such as for tracking the restoration of Chinook salmon population diversity (Yamane et al., 2018) or for monitoring soil microbial populations following perturbation (Wagg et al., 2018). The potential for using "portfolios" of phytoplankton species to improve predictability of critical, aggregate ecosystem components has not yet been evaluated. In addition, we ask the following: (3) "What factors affect the predictability of phytoplankton assemblages?". To address this question, we developed a simple model resolving species interactions and stochasticity in a community of phytoplankton to understand the mechanisms that explain changes in predictability between individual and assemblages of taxa.

| MATERIAL S AND ME THODS
The Materials and Methods section first describes our analysis of phytoplankton time series data from the English Channel, where we sought to quantify how predictability of aggregated time series changes with assemblage size, and thereby address questions 1 and 2 posed in the Introduction. We then describe two idealized plankton community models that help understand the factors that affect the predictability of phytoplankton assemblages (question 3).

| Ecological data from the English Channel
Station L4 is a coastal marine station located approximately 10 nautical miles off Plymouth, UK, and is characterized by summer nutrient depletion with seasonal vertical and horizontal influx of nitrate into the system (Smyth et al., 2010). Total chlorophyll and functional groups of phytoplankton (e.g., diatoms) exhibit pronounced seasonal cycles at this station, but relatively small long-term trends (Smyth et al., 2010;Widdicombe et al., 2010b).
Water samples were collected on weekly basis (weather permitting) at a depth of 10 m using a 10-L Niskin bottle, and species were identified by light microscopy using the Utermöhl technique (Widdicombe et al., 2010b). Although measurements at station L4 are conducted on a roughly weekly basis, we calculated monthly averages from the available weekly data in order to minimize gaps in the time series. Monthly averages were calculated by an arithmetic mean of all observed points for each month. The dataset includes 198 phytoplankton taxa, including well-defined species (e.g., Guinardia delicatula), but also more broadly defined groups such as "Phytoflagellates 2 μm," indicating all flagellates with a mean diameter of 2 μm (Widdicombe et al., 2010a(Widdicombe et al., , 2010b. In this dataset, there were a wide range of phytoplankton species and associated dynamics, from numerically abundant, generic groups such as "Phytoflagellates 2 μm" (Figure 1a) to rarely occurring Prorocentrum dentatum ( Figure 1b) and seasonally occurring Paralia sulcata ( Figure 1c). Each time series spans from October 1992 to December 2014, and the length is consistent for all taxa. We show the time series for these three taxa to give a sense of the range of dynamics present in the phytoplankton record.

Some of the most numerically dominant diatom species were
Leptocylindrus minimus, Chaetoceros socialis, and Pseudo-nitzschia delicatissima, whereas the most numerically dominant dinoflagellate species were Karenia mikimotoi and Prorocentrum cordatum. Further details on the phytoplankton time series, such as taxonomic classification (e.g., diatom, dinoflagellate, coccolithophore, and phytoflagellate), cell radius, mean abundance and standard deviation of abundance, can be found in Table A1 in Appendix 1.

| Assemblage formation
We estimated the predictability of both individual time series and grouped time series created by aggregating multiple time series together (i.e., creating an assemblage of taxa). We grouped taxa together: (a) randomly, (b) within the diatom, dinoflagellate, coccolithophore and phytoflagellate functional groups, and (c) by cell size.
In the case of grouping species randomly, we selected a random subset of all taxa for each assemblage size (2, 3, 4,…, 198). In the case of taxonomic groups, we followed the same process but limited the selection to only diatoms (n = 130), dinoflagellates (n = 37), coccolithophores (n = 16), or phytoflagellates (n = 14). We excluded the time series of Phaeocystis for this part of the analysis as it was the only time series in its taxonomic category. In the case of grouping by cell size, we created broader categories based on approximate cell radius (<5 μm, 5-12 μm, >12 μm) and randomly sampled taxa within those categories. Each assemblage time series was an arithmetic sum of individual species cell density (cells L −1 ) at each month over time. Since some assemblages might have higher or lower prediction skill by chance, we ran 1000 trials to generate a distribution of forecast estimates for each assemblage size.
Prediction estimates for each assemblage size were provided as a mean correlation value ( ) with 95% confidence intervals (defined as ±1.96 × SE). As there are only 198 taxa that are available, it is likely that some of the 1000 trials for each assemblage size were not unique combinations of taxa (in particular, the largest assemblage sizes). To minimize selection bias, we did not make predictions on assemblages beforehand nor look at pairwise correlations within an assemblage. Each species had the same probability of being included in an assemblage.

| Empirical dynamic modeling
We used empirical dynamic modeling (EDM) to estimate the univariate predictability of both individual and grouped time series.
Briefly, EDM is a non-parametric framework for creating predictive models of dynamic systems (Sugihara & May, 1990). EDM has been successfully applied to many problems, such as predicting fisheries recruitment (Munch et al., 2018) and demonstrating the nonlinearity of large-scale oceanographic processes (Hsieh et al., 2005). In this study, we employed simplex projection with leave-one-out crossvalidation to predict phytoplankton assemblage and taxa dynamics (method described below; Sugihara & May, 1990).
Natural phytoplankton communities can be complex with large numbers of interacting species. The population dynamics of any one species can be subject to a large number of unknown variables. Takens's theorem (Takens, 1981) suggests that it is possible to reconstruct the dynamical attractor of a system with time lags of a variable of interest. This concept has been historically applied in ecology to understand the population dynamics of species in complex biological systems (Godfray & Blythe, 1990;Schaffer, 1984), including the prediction of coastal phytoplankton dynamics (McGowan et al., 2017).
Based upon state-space reconstruction using Takens's theorem (Deyle & Sugihara, 2011), we focused on the univariate prediction skill of individual or aggregated time series-to test how well an individual time series can predict itself. This is an equation-free approach that allows us to recover mechanistic relationships in the ecosystem (DeAngelis & Yurek, 2015;Ye et al., 2015). Univariate attractors are constructed from lagged embeddings of the variable of interest. The number of lags is determined by E, the embedding dimension. A detailed discussion on embedding dimensions can be found in Godfray and Blythe (1990). Univariate attractors can then be represented as where x t is the value of a variable of interest, x at time t, and x t−1 is its value at time t−1, etc. We selected for the optimal embedding dimension (i.e., the E with the highest prediction skill ρ) from 1 to 10 for each phytoplankton time series, whether for individual species abundance or for aggregate assemblages. Following this reconstruction in E-dimensions, we then look at the nearest neighbors of every point and track their movement in time. Future predictions are the weighted average of the trajectory of the nearest neighbors using leave-one-out cross-validation. Further details on the method outlined here can be found in Sugihara and May (1990).
Empirical dynamic modeling is one of many different approaches to make short-term predictions using time series embeddings (Casdagli, 1989;Farmer & Sidorowich, 1987).
Prediction estimates are given by ρ (0 ≤ ρ ≤ 1), which is the Pearson correlation coefficient between observed data and the predicted values based on time series reconstruction. A higher ρ means that predictions a month in advance are closer to observations, whereas lower ρ values mean the predictions are less accurate. We elected to use ρ as a metric of prediction skill, as opposed to other possible metrics (Kim & Kim, 2016), because of its simplicity and widespread use in empirical dynamical modeling studies (e.g., Chang et al., 2017). We next assess whether predictability exceeds what would be expected from seasonal ecological changes.
Phytoplankton in natural ecosystems often respond to seasonal environmental and biological changes. Therefore, we describe a method for testing whether the predictability we estimate within each time series exceeds what is expected from seasonality alone.
Here, we implemented a surrogate test for each time series prediction. Surrogate time series were created in a series of steps: (i) calculate the climatological seasonal cycle for each taxon or assemblage (i.e., average all the data from all Januarys in the time series to calculate the average January phytoplankton abundance), (ii) calculate the residuals, or anomalies, by subtracting the seasonal cycle from each time series, (iii) shuffle the time series of residuals, and (iv) add the shuffled residuals back to the repeating climatological seasonal cycle. The resulting surrogate time series removes the ecological dynamics within the time series but retains the seasonal cycle. We then calculated a ρ value using the simplex methodology described above for this surrogate time series to assess the predictability of the seasonality. For each time series (1000 trials each of assemblage size 1, 2, 3, … maximum assemblage size), we created 100 surrogate time series. A direct comparison of ρ between a time series and its associated surrogates then indicates the strength of our model in predicting actual ecosystem dynamics over seasonal forcing. All analyses were conducted with the "rEDM" package (v1.2.3; https:// github.com/Sugih araLa b/rEDM) in R (R Core Team, 2020). All the plots were created using the R package "ggplot2" (Wickham, 2016).

| Noise modeling
We created a simple model to test how noise added to a time series, for example as measurement error, influences predictability. In the model, the abundance of each species through time (x i,t ) is a sine function with a species-specific phase shift and time-varying noise added: (1) , is the frequency of oscillation, is the phase shift (rad), obs is the scaling factor of noise (unitless), obs i,t is the observational error cells L −1 , and obs is an offset cells L −1 to ensure that The amount of observational error in the time series ( obs i,t ) was selected from normally distributed noise and modulated by obs .
Observational error refers to error occurring from the sampling process. In this example, the model resolution is monthly, and we add noise to each monthly time step. We tested the predictability of 100 species in assemblages (of 1, 2, 3…100 species) with three levels of noise ( = 0, 0.5 and 1). The phase shift ( ) allowed for some variability in dynamics across the model species. The values for each of the parameters can be found in Table 1. Sample time series from the model can be found in Figure S1.

| Phytoplankton community modeling
We next describe an idealized model of an interacting phytoplankton community that we used sequentially to test how inter-species interactions and stochasticity in vital rates influence predictability for individual and multiple model taxa. The model is based on the Lotka-Volterra competition equations (Lotka, 1920). The abundance of species i , x i cells L −1 , is controlled by the realized population growth rate r i,t day −1 , the carrying capacity K i,t cells L −1 , and the sum of interactions ( i,j ;unitless) with other species (x j ): The realized population growth rate and carrying capacity for each species vary through time and are explained below. We allowed each species to interact with only one-fourth the total number of species in the ecosystem, and interaction strength was regulated by . The interaction strength between any two species i,j was randomly selected from a uniform distribution with limits ± and varied on a monthly basis. We created two treatments for interaction strength: (i) = 0 for no interactions between species and (ii) = 0.25 for strong interactions between species. We limited the number of interacting species to one-fourth the total in order to simulate a community where many but not all species interact directly. Since interactions could be both positive and negative, every species had a minimum abundance of 1 × 10 −10 cells L −1 through time to prevent extinction.
The carrying capacity of each species (K i,t ) was controlled by the seasonal cycle without the addition of noise (Equation 3): The scaling parameter K was constant and equal for every species. We added a small value to ensure that K i,t > 0. Each species had a basal physiological growth rate day −1 selected from 0.8 to 1.2. To introduce process noise into the system (i.e., noise added to the key organism trait in the model), we added randomly generated values of growth rate p i day −1 to the seasonal cycle of each species. The values were selected from a normal distribu-tion∼ mean = 0, 2 = 1 . We chose to add noise on monthly timescales, assuming that environmental processes that influence growth rates (such as sea surface temperature anomalies) persist for weeks to months (Kuhn et al., 2019). The realized population growth rate for each species (r i ) at time t was a function of the physiological growth rate ( ), the seasonal cycle, and total amount of process noise Frequency of oscillation rad month −1 2 12 obs Scaling factor for noise -0, 0.5 (low) and 1 (high) Observational error for species i

TA B L E 1 List of model parameters, units, and values used in the noise model
The parameter p was determined at the start of each experiment. We had two treatments for adding process noise: (i) p = 0 for no process noise and (ii) p = 1 for high process noise. p day −1 is an offset to ensure that r i ≥ 0 after the addition of process noise.
We simulated 20 years of data from our model with a 6-h time step. We aggregated the 6-hourly data to monthly averages to maintain our timescales of prediction and keep our model comparable to L4 monthly averaged data. Our goal was to check the effects of interaction strength ( ) and level of process noise ( p ) on prediction skill (ρ) across different assemblage sizes in the model ecosystem. Values for each of the parameters in the model can be found in Table 2. Sample time series from the model can be found in Figure S2.

| RE SULTS AND D ISCUSS I ON
3.1 | Are single species more or less predictable than assemblages of species?
3.1.1 | How does assemblage size affect predictability?
Predictability increased with the number of aggregated taxa, regard- One implication of this result using univariate methods is that while individual species may not be highly predictable on average, many species aggregated together can be much more predictable.
This result appears consistent with empirical studies from aquatic and marine settings (Schindler et al., 2015) as well as theory (Koellner & Schmitz, 2006). For example, experimental acidification of lakes has found that species composition changes markedly while total biomass may not (Frost et al., 1995;Schindler, 1990) and that experimental nutrient enrichment across multiple lakes produced contrasting responses in community structure but relatively consistent increases in total biomass (Cottingham & Carpenter, 1998). Mutshinda et al. (2016) found that the total biomass dynamics of diatoms and dinoflagellates at the L4 station in the English Channel were distinct from one another and tied to environmental variations, but that the biomass of individual species within each assemblage was typically less tied to environmental conditions.
Most ecosystems have a mix of abundant and rare species, and some individual species have higher individual predictability than others. As assemblage size increases, the likelihood that the few abundant and/or more predictable species will be included in the assemblage increases. We tested whether the increase in assemblage predictability with assemblage size (Figure 2) could be attributed to the inclusion of more abundant and/or predictable taxa by repeating the analysis in Figure 2 but excluding: (a) the top 25% of most numerically abundant taxa, defined as the time series with highest mean abundance through time ( Figure S4) and (b) the top 25% of most predictable taxa, defined as the time series with the highest univariate predictability through time ( Figure S5). Even when excluding the most abundant or predictable species, we found that predictability increased with assemblage size.

| The effect of noise on model time series predictions
Using the simple model where species differed only in the timing of their seasonal blooms and noise added to the time series (Equation 1, Section 2.4), we found that increasing the level of noise decreases the predictability of model populations (Figure 3).
In the case where no noise is added to the repeating seasonal cycles of abundance, the system had perfect predictability ( = 1) and increasing the assemblage size did not change predictability.
As the noise increases, the maximum prediction skill decreases, and this reduction is particularly evident for the individual time series (assemblage size = 1) (Figure 3). In the case of low and high noise ( obs of 0.5 and 1, respectively), prediction skill increased with assemblage size, as for the L4 times series data (Figure 2). The aggregation of multiple time series amplified the seasonal cycles by diluting the effect of noise. This is also why the surrogate time series were as predictable as the model time series. These results suggest that noise from a range of sources, such as differences in sampling method and high variance among replicate samples, tends to decrease predictability. The larger the amount of noise added, the larger the assemblage size must be to achieve high predictability. Given that there will always be a range of sources of measurement error in phytoplankton time series such as at L4 (see Barton et al., 2020), this model result suggests that aggregating across multiple species may be a desirable strategy for making predictions.

| Simulated species interactions and stochasticity
Using a simple ecological model that resolves species interactions and stochastic variations in growth rates (Equations 2-4, Section 2.5, Figure S2), we examined how interaction strength and stochastic behavior in growth rates influence predictability. In the case where species did not interact ( = 0) and there was no of 5, 50, and 100 randomly grouped species from October 1992 to December 2014, and (b) mean prediction skill ( ) for assemblages of species ranging in size from 1 to 198 species in the assemblage (red) and their seasonal surrogates (blue). Each point is the mean of 1000 trials, and the black lines represent local regression fits for both sets of data (actual time series and surrogates). The shaded regions are 95% confidence intervals (defined as ± 1.96 × SE ). The confidence intervals narrow with increasing assemblage size because the number of distinct assemblages of species decreases stochasticity ( p = 0) in growth rates, the predictability of all assemblages was perfect ( = 1) and did not increase with assemblage size (Figure 4a). In the case where there was no stochasticity ( p = 0) in growth rates, but species interacted ( = 0.25, Figure 4b), the predictability of individual species was lower than the predictability of aggregated assemblages. Prediction skill increased with assemblage size and exceeded what would be expected from seasonality alone. In the case with stochasticity in growth rates ( p = 1) but no interactions between species ( = 0, Figure 4c), the predictability of individual species was lower than the predictability of assemblages and increased with assemblage size. This predictability was also greater than what would be expected from seasonality.
We found a similar result in the case with both stochasticity and species interactions ( p = 1, = 0.25; Figure 4d). Thus, when stochastic variations in growth rate are uncorrelated across species, the dynamics of individual species may be more difficult to predict than assemblages containing many species. The dynamics of individual species are also more difficult to predict in the presence of inter-species interactions. Because there are many possible factors that influence the growth of phytoplankton in real systems that may be difficult to resolve, and because the interactions between species are in many cases difficult to assess, our results suggest under these conditions, predicting single species will be difficult while prediction of assemblages of species may be more skillful.

| How does assemblage composition affect predictability?
We also explored whether predictability varied across functional groups of species (e.g., diatoms, dinoflagellates, coccolithophores, and phytoflagellates) and across cell size (Figures 5 and 6 2005) collectively rather than on a species level, even though species within functional groups in many cases have different traits (Edwards et al., 2012;Marañón et al., 2013) and ecological dynamics Mutshinda et al., 2016). We also separated taxa measured at L4 into three size classes: small (<5 μm), medium (5-12 μm), and large (>12 μm). The size cutoffs are arbitrary but designed so that each assemblage has a roughly equal number of taxa.
Like functional groups, in many cases phytoplankton of similar size are aggregated together in field measurements or satellite algorithms (Hirata et al., 2011). Cell size constrains many important organism traits, such as growth rate and nutrient affinity (Edwards et al., 2012;Marañón et al., 2013), as well as predator-prey interactions (Hansen et al., 1994(Hansen et al., , 1997 and therefore may provide an additional way of grouping phytoplankton, and drawing out any differences in predictability that could arise as a result of assemblage composition. For each taxonomy or size-based assemblage analyzed, we found that the coefficient of variation decreased, on average, over the multiple possible groupings with increasing assemblage size ( Figure S3).

| Taxonomy
Mean prediction skill ( ) increased with assemblage size for diatoms, coccolithophores, and phytoflagellates (Figure 5a-c). Unlike other functional groups, the predictability of dinoflagellates did not increase with assemblage size (Figure 5d). For all the functional groups, there was a clear difference between the predictive skill of time series and predictions based on seasonality alone. Dinoflagellates also had lower maximum predictability at high assemblage size than other groups (lower in Figure 5d compared to others). Why do dinoflagellates apparently differ from other groups in this regard?
Dinoflagellates are a morphologically and physiologically diverse functional group of phytoplankton (Brandenburg et al., 2018;Hackett et al., 2004;Smayda & Reynolds, 2003). They exhibit not just a range of morphology and size-constrained traits, but also large variations in trophic mode, motility, production of allelopathic chemicals, and other traits (Smayda, 1997;Stoecker et al., 2017). Since predictability did not increase with assemblage size, we believe that dinoflagellate species have dynamics that arise from different processes from each other (i.e., aggregated data are not indicative of a common process on a larger ecological scale). Unlike dinoflagellates, the predictability of both diatoms and coccolithophores increased with group size (Figure 5a and b). The differing performance of aggregated coccolithophores, diatoms, dinoflagellates, and phytoflagellates suggests that increased predictability for phytoplankton assemblages is dependent on the choice of taxa in each assemblage. An implication of the lower performance of certain assemblages is that field measurement programs should focus efforts on identifying to species level those phytoplankton, such as the dinoflagellates, for which predictability does not increase with group size. In contrast, for functional groups such as diatoms, a coarser level of identification (i.e., to genera) would not limit the utility of the data for making predictions.

| Cell size
Next, we describe how predictability ( ) changed across three size bins (<5 μm, 5-12 μm, and >12 μm). For cells in the <5 μm size  (Figures 6a and 7a). This suggests that numerical models used for ecological prediction and forecasting, which typically resolve a few to as many as a hundred phytoplankton, may need to resolve even more biodiversity among the larger size classes (Dutkiewicz et al., 2015;Follows et al., 2007). Similarly, for field monitoring programs designed to detect (Widdicombe et al., 2010b) and forecast changes in marine ecosystems, an ideal sample design might focus taxonomic identification time on the larger taxa but allow some of the smaller, harder to identify taxa to be aggregated together. This could be done using automated or supervised quantification of machine classifiers (Orenstein et al., 2020) that are preferentially trained on larger taxa. The use of flow cytometry or similar flow-through counting of smaller cells might also make monitoring programs more efficient without reducing the utility of data for forecasting changes in marine ecosystems.

F I G U R E 6
Mean prediction skill ( ) of time series (red) and their seasonal surrogates (blue) for cells with radius (a) <5 μm (n = 68), (b) 5-12 μm (n = 63), and (c) >12 μm (n = 67). Each point is the mean of 1000 trials, and the black lines represent local regression fits for both sets of data (actual time series and surrogates). The shaded regions are 95% confidence intervals (defined as ± 1.96 × SE). Pie graphs at right provide the relative diversity of functional groups within each size-based category (Diat, diatoms; Dino, dinoflagellates; Cocco, coccolithophores; Phyto, phytoflagellates; Other, Phaeocystis)

| Study limitations and future directions
Here, we discuss several methodological choices that influence our results. First, we chose to forecast using aggregated data across multiple species, as this is a common feature of many oceanographic and ecological observations. However, in certain cases and systems, aggregating the forecasts of individual species may be more appropriate, for example, if a particular species has disproportionate impacts, such as for harmful algal bloom taxa (Trainer et al., 2012). Second, we used all raw time series from the L4 database which varied in taxonomic resolution (e.g., size-fractionated class like "Phytoflagellates 2 μm" and genera like Chaetoceros spp.). It is likely that the actual predictability of some assemblages might either increase or decrease due to the inclusion of time series with a lower taxonomic resolution. Future studies may consider quantifying this effect by differentiating predictability across different levels of aggregation.
Finally, we chose to use monthly data for our analyses, though considerable phytoplankton dynamics exist at higher frequency. This choice reflected practical considerations regarding data availability, and we were unable to detect any high-frequency dynamics as a result; however, future studies should address how data collection frequency influences predictability.
Anomalies in phytoplankton populations rapidly decorrelate in time and space (Kuhn et al., 2019). Our study was limited in spatial resolution because Station L4 is a fixed sampling location. Because Station L4 is a coastal site, an increase in the spatial resolution of phytoplankton time series might capture more variable environmental conditions and associated dynamics for both phytoplankton species and assemblages of species. Similarly, the temporal resolution of our study was limited by data availability. Our forecast distance of a month reflects a compromise between data resolution and the likely window of skillful future predictions using our methodology.
Predictions at timescales shorter than a month were not possible given the data availability (we used monthly resolved time series), and predictions at timescales longer than a month are difficult because phytoplankton population dynamics rapidly decorrelate in time. However, future studies could consider systematically testing the effect of temporal resolution on forecast skill for phytoplankton species vs assemblages, as well as forecasting beyond one month.

Understanding the effects of temporal resolution on forecast skill
could answer questions about the optimal frequency of field sampling for phytoplankton populations and the appropriate lags to consider while predicting their dynamics.
There are various additional approaches that can be used to maximize the predictability of any time series, but here we used a univariate simplex approach, instead of bringing in other ecological or environmental variables to improve prediction. One such approach would be multiview embeddings (Ye & Sugihara, 2016). Multiview embeddings are a multivariate approach that leverage information in causally related variables to increase the predictability of any individual time series. This typically relies on identifying important variables for every taxon or iteratively testing multiple variable combinations for each time series. In our study, the availability of 22 years of data and 198 individual time series allowed us to utilize a simpler, univariate approach to answer our questions. Future studies could consider multivariate approaches to improve the predictability of any target species or assemblage.

| CON CLUS IONS
The dynamics of individual phytoplankton taxa are noisy and typically difficult to predict. We used empirical dynamic modeling to assess the univariate predictability of taxa and assemblage (i.e., groups of more than one taxa) dynamics over monthly timescales and found F I G U R E 7 Mean prediction skill ( ) of time series (red) and their seasonal surrogates (blue) for only diatoms of the following radius: (a) <5 μm (n = 44), (b) 5-12 μm (n = 40), and (c) >12 μm (n = 46). Each point is the mean of 1000 trials, and the black lines represent local regression fits for both sets of data (actual time series and surrogates). The shaded regions are 95% confidence intervals (defined as ± 1.96 × SE) that increasing the number of taxa in an assemblage increased predictability. Predictability was also significantly affected by how species were grouped together. Dinoflagellates and large phytoplankton (>12 μm cell radius) had lower overall predictability and did not increase in predictability with assemblage size. In contrast, aggregating species as coccolithophores, diatoms, and phytoflagellates led to improved predictability of the composite assemblage abundance time series over individual taxa. Similarly, small phytoplankton (<5 μm cell radius) were more predictable in assemblages than as individual taxa and this predictability exceeded that which we expect from seasonality alone. The presence of noise in our simulations, such as observational error and stochastic environmental influence, reduced the overall predictability of phytoplankton taxa.
While our study considers only 22 years of data from one temperate coastal ocean location, our findings have several implications.
Firstly, field monitoring programs should continue to focus efforts on species-level identification of dinoflagellates and large phytoplankton. In contrast, high predictability of smaller phytoplankton and coccolithophores, diatoms, and phytoflagellates could be achieved by aggregating them together, for example, by size fractionating measurements or identifying species only to genus or higher level.
Overall, our results suggest that the dynamics of species assemblages will often be more predictable than that of individual species, but that the increase with assemblage size depends upon how species are grouped together. In certain cases, forecasting "portfolios" or assemblages of species together rather than as individuals may improve population forecasts and the management of living marine resources.
One such example of how our results may influence management of living marine resources is the food available for forage fish such as sardines and anchovies in coastal upwelling biomes (Checkley et al., 2017). While individual species fed upon by these economically important fish might not be individually predictable, aggregating across prey species within the preferred prey size range of each fish species might increase predictability of total available food and facilitate better management of the fisheries, especially by accounting for variability in predator responses and predator-prey interactions (Hunsicker et al., 2011). Another example is forecasting of phytoplankton genera (e.g., Anderson et al., 2016). The diatom genus Pseudo-nitzschia, for example, includes many species and a range of strains within species (Hubbard et al., 2008;Trainer et al., 2012).
Certain strains and species of Pseudo-nitzschia spp. produce domoic acid, a potent neurotoxin with negative impacts on ecosystems. Our results suggest it may be more feasible to predict the total abundance of Pseudo-nitzschia spp. than constituent species and strains.
Future studies should assess whether similar increases in predictability with increasing assemblage size occur in other marine, aquatic, and terrestrial systems, and assess whether the implications of this study apply more broadly.

ACK N OWLED G EM ENTS
We would like to thank the Plymouth Marine Laboratory for the

CO N FLI C T O F I NTE R E S T
The authors declare no conflicts of interest.