Potential for Early Forecast of Moroccan Wheat Yields Based on Climatic Drivers

Wheat production plays an important role in Morocco. Current wheat forecast systems use weather and vegetation data during the crop growing phase, thus limiting the earliest possible release date to early spring. However, Morocco's wheat production is mostly rainfed and thus strongly tied to fluctuations in rainfall, which in turn depend on slowly evolving climate dynamics. This offers a source of predictability at longer time scales. Using physically guided causal discovery algorithms, we extract climate precursors for wheat yield variability from gridded fields of geopotential height and sea surface temperatures which show potential for accurate yield forecasts already in December, with around 50% explained variance in an out‐of‐sample cross validation. The detected interactions are physically meaningful and consistent with documented ocean‐atmosphere feedbacks. Reliable yield forecasts at such long lead times could provide farmers and policy makers with necessary information for early action and strategic adaptation measurements to support food security.


Introduction
Agriculture is of particular strategic importance in Morocco. Most of the arable land is devoted to cereals with wheat accounting for the majority of total cereal production and thus playing a key factor for national food security. However, most of the arable land is located in arid or semiarid regions which are characterized by long dry periods and high year-to-year rainfall variations (Born et al., 2008). Since very little of the arable land is irrigated, this leaves Morocco's wheat production heavily dependent on large fluctuations in rainfall intensities (Berdai et al., 2011). Reliable seasonal forecasts could help in reducing the vulnerability of the Moroccan agriculture to weather risks by enabling timely in-season adaptation. Since the Moroccan climate is projected to become drier and hotter with ongoing global warming, such forecasts will likely become even more important in the future (Born et al., 2008;Filahi et al., 2017).
Operational yield forecasting systems provide estimates at lead times of a few days up to 3 months before harvest in May-June. Provisional forecasts are released every year by the Crop Growth Monitoring System-Morocco (CGMS-MAROC) in April and then constantly revised over the course of the season. CGMS-MAROC uses a physical crop growth model combined with statistical models (Bernardi, 2016;Bregaglio et al., 2014). Based on empirical regression models using weather and vegetation data, Balaghi et al. (2008) accurately forecast grain yields as early as of March. Yet, both approaches use the Normalized ©2020. The Authors. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

10.1029/2020GL087516
Key Points: • Moroccan wheat yield anomalies are hindcasted 4 months before harvest based on climate precursors • Precursors are extracted from gridded fields of climate variables using physically guided causal discovery algorithms • The detected causal interactions are physically meaningful and consistent with documented teleconnections in the climate system Difference Vegetation Index (NDVI) during mid-season of the growing phase, which limits the earliest possible release date to early spring.
Longer lead times may be achieved through utilization of remote climatic drivers which influence rainfall variability over Morocco and thus wheat production. Total annual wheat yields are significantly correlated to accumulated rainfall during the rainy season lasting from September to May (De Wit et al., 2013). Intraseasonal rainfall variability in turn is influenced by large-scale climate dynamics including atmospheric circulation patterns and sea surface temperatures (SSTs) over the Pacific and Atlantic Ocean, which may persist over months, allowing for skillful forecasts of rainfall and NDVI at extended lead times (Jarlan, Driouech et al., 2014;Knippertz et al., 2003;Rodríguez-Fonseca et al., 2006). The most prominent mode of large-scale variability in the Atlantic, the North Atlantic Oscillation (NAO), has been shown to directly influence the early stage of Moroccan wheat growth in December by shaping the storm tracks which bring moist air from the Atlantic Ocean to the land . Moreover, indirect influences on Moroccan rainfall may occur via atmospheric teleconnections; wave trains, for instance, can emerge from SST forcing and may lead to temperature and rainfall changes in far-away regions downstream of the wave (Schlueter et al., 2019;Shaman & Tziperman, 2011).
Tapping into this potential source of forecasting rainfall and thus Moroccan wheat yields, we here apply a physically motivated approach based on causal discovery algorithms  to find causal climate precursors for interannual wheat yield variability at least 4 months before harvest. Previous studies have successfully applied the methodology of causal precursors to forecast extreme stratospheric polar vortex states relevant for midlatitude winter weather (Kretschmer et al., 2017) and the Indian summer monsoon intensity (Di Capua et al., 2019).

Moroccan Wheat Yield Data
Nationally aggregated annual wheat yield data for the time period 1979-2017 is taken from the website of the Food and Agriculture Organization (FAO) (FAOSTAT, 2017) with wheat yields given in hectograms per hectare (hg/ha). Annual anomalies are calculated based on the difference to the yield in the previous year (first differences), thereby removing possible linear trends. See the supporting information for the original time series of absolute values ( Figure S1). We chose to forecast yields instead of total production because yields are more directly correlated with the climate, whereas the crop area needed to calculate total production also depends on socioeconomic influences. If production is required, a wheat area mask can be derived from static crop masks like SPAM (Fritz et al., 2015) or MIRCA2000 (Portmann et al., 2010) or from harvested area of last season.

Climate Data
Precursors are derived from two climate variables: SST and geopotential height at 500 hPa (Z500), with the latter being a commonly used level to describe high-and low-pressure systems in the mid troposphere. We selected these climate variables because they were shown to be linked to Moroccan winter climate and/or wheat yields (e.g., Knippertz et al., 2003;Tuel & Eltahir, 2018). Both climate variables are taken from the ERA5 reanalysis product provided on a 1°× 1°longitude-latitude grid covering the time period 1979-2017 at monthly time resolution (Hersbach et al., 2019). Similarly, as for the Moroccan Wheat Yield (MWY) time series, monthly climate anomalies are calculated at each grid cell by calculating the difference to the same month of the previous year. Due to the first differences approach for anomaly calculation and the wheat growing season lasting from November to June, the analysis is limited to the years 1981-2017.

Building the Statistical Forecast Model-A Three Step Approach
Building the forecast model consists of three steps: (1) defining potential precursors from gridded climate variables by spatial clustering of correlation maps, (2) selecting causal precursors from potential precursors using causal discovery algorithms, and (3) applying multiple linear regressions on observed yield anomalies using causal precursor time series.

Step 1: Define Potential Precursors
Potential precursors are defined as confined regions of a climate variable whose changes precede changes in the target variable, that is, nationally aggregated MWY anomalies. In a first step, pairwise correlation analyses are conducted between MWY anomalies and lagged time series of monthly Z500 and SST anomalies at each grid cell of the gridded globe between 90°N and 20°S to include possible teleconnections from the northern hemisphere and the tropics. Thereby, statistical significance at the grid cell level is defined at the 2% threshold (two-tailed p value < 0.02). Using two climate variables (Z500 and SST) and four time lags (September to December) thus leads to eight correlation maps from which potential precursors are extracted. Potential precursors are defined by grouping significantly correlated grid cells of the same correlation sign using density-based spatial clustering of applications with noise (DBSCAN, Ester et al., 1996;Schubert et al., 2017). In DBSCAN, a radius of 300 km is chosen to define neighboring grid cells, which is found to produce regions of reasonable sizes and spatial separation.

Step 2: Select Causal Precursors From Potential Precursors
So far, potential precursor regions have been identified which are correlated with the target variable MWY. These lagged correlations, however, do not necessarily imply causation. Noncausal, spurious correlations can emerge from indirect links, common drivers, or autocorrelation effects. To remove such spuriously correlated precursors, we apply a multivariate causal discovery algorithm . The algorithm uses partial correlations to iteratively check whether the link between a given potential precursor and the target variable can be explained by any combination of the remaining potential precursors. If this is the case, that is, if the given potential precursor is conditionally independent from the target variable, then this potential precursor is removed. Otherwise, it is considered as a causal precursor. A detailed step-by-step description of this causal selection step can be found in Kretschmer et al. (2016). Despite the thorough selection process, the definition of causality given here, like any causal interpretation, rests on several underlying assumptions (Runge, 2018). In this sense, causal precursors as defined in this study should be understood as climatic indices which exhibit a significant, time-lagged linear dependence with MWY anomalies that cannot be explained by any other identified potential precursor or combination of those.
The combination of Steps 1 and 2 of the method part was first introduced by Kretschmer et al. (2017) as the response-guided causal precursor detection. Here, we apply the same method albeit with the modification of clustering significantly correlated grid cells in Step 1 in contrast to merging only directly neighboring grid cells. This has shown to improve the robustness of detecting potential precursor regions.

Step 3: Build the forecast model based on causal precursors
In the last step, we perform a multiple linear regression between the anomaly time series of the selected causal precursors and MWY anomalies to build the forecast model in the form MWY forecast ¼α þ ∑ n i β i · CP i þ ε i , where α is the intercept, β i is the parameter of the ith causal precursor (CP i ) with error term ε i , and n is the total number of causal precursors.

Extracting Causal Precursors From Climate Data
In total, 61 potential precursors are extracted (Step 1) from the pairwise correlation analysis between the gridded climate variables and MWY anomalies, indicating both positive as well as negative correlations (respective red and blue regions with contours in Figure 1). Potential precursors are found in each correlation map with spatial patterns of Z500 precursors showing larger differences between time lags compared to SST as expected from higher variability in the atmosphere. Correlation maps are robust with similar regions found for different significance thresholds and subsamples of the studied time period (see details in Figure S2).
Among all 61 potential precursors, only five are found to be causally linked to MWY anomalies following Step 2 of the model building approach (Figure 2). These causal precursors include a region of negatively correlated Z500 anomalies over Central to Southwestern Europe in November and December, suggesting that Z500 anomalies in these months provide relevant, independent information for MWY. Otherwise, the applied causal discovery algorithm should have eliminated one of the two precursors during the conditional independence test. Consistently, the correlation between both Z500 regions is only weak (Pearson correlation coefficient of r = 0.36, Figure S3). A second causal precursor is found in December which refers to positively correlated SST anomalies in the Coral Sea northwest of Australia. Two causal precursors emerge in October and relate to positively correlated SST anomaly fields-one in the North Atlantic off the East Coast of the United States and the other in the tropical Atlantic along the western African coastline. In September, no causal precursor for MWY is identified. We test the robustness of the causal selection step by altering significance thresholds and applying them to subsamples of the data and overall find consistent results (see detailed discussion in Figure S4). Particularly, Causal Precursors 1-4 only show little sensitivity to the chosen settings.

Validation of the Moroccan Wheat Yield Hindcasts
Hindcasted yield anomalies strongly correlate with observed anomalies, explaining 88% of the observed yield variance over the full time period with a root mean square error (rmse) of 2,530 hg/ha (Figure 3a). Thereby, each causal precursor contributes a similar individual share of 15-25% to the total explained variance Figure 1. Potential precursors derived from 500 hPa geopotential height anomaly fields (Z500, left) and sea surface temperature anomaly fields (SST, right). Pairwise correlations are calculated between wheat yield anomalies and the respective climate variable at each grid cell and time lag ranging from Lag 4 (December) to Lag 7 (September). Significantly correlated grid cells are then aggregated to homogeneous regions using cluster analysis (black contours).
( Figure S5) as computed from variance decomposition of the multiple linear regression model (Grömping, 2007). Oscillating MWY variability over the last decade seems to be driven by similar variability of the causal Z500 precursor regions in December and November ( Figure S6), which is in line with increased correlation strength over time between both precursors and MWY ( Figure S7). In contrast, correlation strength between MWY and the causal SST precursor in the equatorial Atlantic starts at a high level of around r = 0.8 and then decreases to around r = 0.4 in 2010. The transition phase when the correlation of MWY with the Z500 precursors becomes stronger than with the SST precursor corresponds to the time period where hindcasts diverge most from observations (1999)(2000)(2001)(2002)(2003) and may thus play a role for this discrepancy. Analyses of the hindcast residuals confirm that the assumptions of a multiple linear regression model are fulfilled, that is, that residuals are characterized by a mean value of zero, constant variance (homoscedasticity), and no significant autocorrelation and follow a normal distribution ( Figure S8).
The regression model is robust with respect to its regression parameters of the identified causal precursors. To show this, we divide the time series into two parts; regression parameters are derived from the training period (19 years, 1981-1999) and then used to hindcast MWY anomalies over the test period (18 years, 2000-2017). The explained variance over the training period (91%, rmse = 2,100 hg/ha) is high and similar to the explained variance over the test period (85%, rmse = 3,170 hg/ha), indicating that the regression model does not suffer from overfitting given the hypothetical case that all five causal precursors were known (Figure 3b).
We next implement an out-of-sample cross validation to further validate the predictive skill of our hindcast model in the case that causal precursors are not know a priori. For this, we iteratively remove two consecutive years from the time series with the remaining years serving as the training period and the left-out years as the test period. We choose to remove two consecutive years instead of just one to account for the strong year-to-year autocorrelation of the causal precursor time series ( Figure S9). The full hindcast model (Steps 1-3) is then calculated using data from the training period only to ensure that data against which the model skill is validated do not enter any part of the model building process.
Hindcasted yield anomalies from this cross validation still explain 49% (rmse = 5,330 hg/ha) of the observed variance over the full time period with observations mostly staying within the 95% prediction interval (Figure 3c). The drop in explained variance is due to the fact that not all five causal precursors are detected in each training period, which is primarily due to small changes in the identified potential precursor sets. Repeating the cross validation using prescribed potential precursors from the full time period increases the explained variance to 76% (rmse = 3,640 hg/ha, Figure S10).

Forecasting Wheat Yields and Comparison to Other Statistical Methods
We next assess the potential of our approach to forecast interannual MWY changes and find it to produce accurate forecasts when operated in a one-step-ahead mode (Figure 4). For this, we use climate and yield data from the 25-year period prior to the to-be-forecasted year to build the full forecast model; that is, to define potential precursors, select causal precursors and derive the regression parameters. Regression parameters are then applied to causal precursor anomalies from the 26th year to produce the forecast. Afterwards, the 25-year period is shifted by 1 year to rebuild the complete model used to forecast the next year and so on. This way, possible long-term changes in teleconnections affecting MWY can in principle be captured.  observed decline in yield is significantly lower than forecasted and in 2009 where the observed yield anomaly is significantly higher.
To assess the added value of our forecast model, we compare results to two simple forecast models: one which assumes that the forecasted yield is equal to the average of historical yields plus a linear trend and a second one which sets all forecasts to be the anomaly of the previous year but inversed in sign (see details in Figures S11 and S12). The latter model has no physical meaning but is motivated by the characteristic time series of strongly alternating yield anomalies. The average + trend model and the previous-year model show some skill in forecasting next year's yield during 2006-2017 (r 2 = 0.71 and 0.58, rmse = 5,840 and 6,000 hg/ha, respectively). However, predictive skill drastically decreases in the out-of-sample cross validation with 2 years omitted in the training phase (r 2 = 0.29 and 0.24, rmse = 6,290 and 7,400 hg/ha, respectively), indicating that most of the skill in the forecast mode comes from the strong year-to-year autocorrelation of MWY and causal precursor anomalies. Our causal precursor-based model outperforms both simple models by a factor of around 2 with respect to explained variance.

Discussion and Conclusions
We have shown that Moroccan wheat yield anomalies which are strongly linked to winter rainfall changes can be robustly predicted using five causal precursors extracted from geopotential height anomalies at 500 hPa and SSTs. The physical interpretation of the discovered links is discussed in the following.
A clear direct effect can be derived from the November and December geopotential height anomalies over Europe indicated as Causal Precursors 1 and 3 (see Figure 2). A high pressure system over this region deflects extratropical storms to the north which bring moist air from the Atlantic Ocean to the land (Hurrell, 1995). In turn, negative geopotential height anomalies would favor more zonal storm tracks, leading to more rainfall over Morocco (and thus higher yields) consistent with the negative link we find between the precursors and wheat yields. The center and spatial pattern of the two precursors resemble the southern region of pressure anomalies characteristic for the NAO. Indeed, also the NAO counterpart of positively correlated Z500 anomalies over Greenland/Iceland was identified in the correlation maps ( Figure 1) but not found to add additional information for MWY. A strong link between NAO and Moroccan precipitation has already been reported and used for predictions (El Hamly & Sebbar, 1998;Knippertz et al., 2003). Here, this region is selected from our data-driven method directly, confirming earlier findings. The positive correlation between October SST anomalies at the East Coast of the United States (Precursor 4, Figure 2) and changes in wheat yields may arise via extratropical storm track activity. The causal precursor region largely overlaps with a region of strong cyclogenesis of extratropical storms. Cyclogenesis is largely determined by the surface layer and hence by SSTs (Hoskins & Valdes, 1990). High temperature gradients in this region provide favorable conditions for the creation of extratropical storms and thus increased storm track activity associated with anomalously wet conditions over Europe and North Africa (Lehmann & Coumou, 2015).
A more indirect effect can be assumed from both tropical SST precursors on Moroccan winter rainfall and thus wheat yields. There is an extensive body of literature on tropical-extratropical interactions which explains how tropical thermal forcing impacts on extratropical weather conditions through induced atmospheric responses (see, e.g., Robertson & Vitart, 2019 and references therein). The most important tropical-extratropical teleconnection at the subseasonal to seasonal time scale emerges from the Madden-Julian Oscillation (MJO) (Stan et al., 2017;Vitart, 2017). It has been shown that Phases 6 and 7 of the MJO can enhance poleward and vertical Rossby wave propagation leading to negative NAO-like conditions via a stratospheric pathway (Lee et al., 2019) and thus positive precipitation anomalies over western North Africa (Cassou, 2008;Lin et al., 2009). This link is in agreement with December Precursor 2 in the West Pacific, suggesting that it provides predictability for Moroccan wheat yields via its remote influence on winter rainfall. The reported SST Precursor 5 in October is consistent with a documented tropical driver of Moroccan wheat yields. Warming of this region along the western African coastline has been shown to enhance latitudinal moisture transport via changes in trade winds, which is important for autumn rainfall in Morocco and thus for the early phase of wheat development (Knippertz et al., 2003).
The reported set of causal precursors is robust over the studied time period. However, for some shorter time intervals, only a subsample of the set is found to be significant. Assessing the origin of these differences using data from climate models could give valuable insights into whether this is a statistical artifact or due to actual changes in physical teleconnections. Moreover, albeit all five causal precursors were found to be similarly important to forecast Moroccan wheat yields, each of them may be relevant for different phases of rainfall during the rainy season or rainfall at different locations. For example, it has been suggested that pressure anomalies consistent with Precursor 1 are important for early wheat growth , whereas tropical Pacific SSTs corresponding to Precursor 2 are relevant for late-season precipitation (El Hamly & Sebbar, 1998). This should be assessed in subsequent research by linking climate drivers to spatially resolved rainfall over Morocco using the causal discovery algorithm presented in this study. Finally, further insights can be gained by analyzing how teleconnections operating on longer time scales might affect the precursors identified in this study. For example, Lee et al. (2019) showed that the El Niño Southern Oscillation (ENSO) influences the above-mentioned MJO-NAO link through modulation of the seasonal mean background state.
Recent research showed the great potential of teleconnections as a source of predictability on subseasonal to seasonal time scales, relevant for a multitude of applications (Dobrynin et al., 2018;Merryfield et al., 2020;White et al., 2017). Here we showed that climatic information can be used to forecast Moroccan wheat yields 4 months before harvest through its direct link to prevailing rainfall conditions. This would offer release dates 3-5 months earlier compared to current operational forecast systems which use vegetation data and provide first yield estimates in March (Joint Research Centre-Monitoring Agricultural Resources, JRC MARS), April (CGMS-MAROC), or May (U.S. Department of Agriculture, USDA). Comparison of forecast skill between the different models is nearly impossible due to marked differences in input data, metrics, validation techniques, and even the forecasted output (e.g., yield vs. production). Yet, it is reasonable to assume that forecast accuracy of monthly updated operational forecasts generally improves over the course of the season. Ideally, our model would thus be used to increase the volume of existing forecast information and extend the lead time for initial estimates. Such long lead times could significantly improve strategic adaptation measures from the state to farm level, including early wheat import planning and the application of plant protection materials and fertilizers, and provide humanitarian actors with timely information for early action. The presented method can easily be transferred to other indicators and regions. Yet, we emphasize that expert knowledge, for example, about appropriate climate precursors, and a careful interpretation of the results are crucial to extract meaningful results.