Historical changes in global yields: major cereal and legume crops from 1982 to 2006

Authors


  • Editor: Thomas Gillespie

Abstract

Aim

Recent changes in crop yields have implications for future global food security, which is likely to be affected by climate change. We developed a spatially explicit global dataset of historical yields for maize, soybean, rice and wheat to explore the historical changes in mean, year-to-year variation and annual rate of change in yields for the period 1982–2006.

Location

This study was conducted at the global scale.

Methods

We modelled historical and spatial patterns of yields at a grid size of 1.125° by combining global agricultural datasets related to the crop calendar and harvested area in 2000, country yield statistics and satellite-derived net primary production. Modelled yields were compared with other global datasets of yields in 2000 (M3-Crops and MapSPAM) and subnational yield statistics for 23 major crop-producing countries. Historical changes in modelled yields were then examined.

Results

Modelled yields explained 45–81% of the spatial variation of yields in 2000 from M3-Crops and MapSPAM, with root-mean-square errors of 0.5–1.8 t ha−1. Most correlation coefficients between modelled yield time series and subnational yield statistics for the period 1982–2006 in major crop-producing regions were greater than 0.8. Our analysis corroborated the incidence of reported yield stagnations and collapses and showed that low and mid latitudes in the Southern Hemisphere (0–40°S) experienced significantly increased year-to-year variation in maize, rice and wheat yields in 1994–2006 compared with that in 1982–93.

Main conclusions

Our analyses revealed increased instability of yields across a broad region of the Southern Hemisphere, where many developing countries are located. Such changes are likely to be related to recent yield stagnation and collapses. Although our understanding of the impacts of recent climate change, particularly the incidence of climate extremes, on crop yields remains limited, our dataset offers opportunities to close parts of this knowledge gap.

Introduction

Various global agricultural datasets have been developed to investigate the effects of agriculture on climate, carbon and nitrogen cycles, water resources, land-use changes and biodiversity (Leff et al., 2004; Monfreda et al., 2008; Ramankutty et al., 2008; Portmann et al., 2010; Potter et al., 2010; Sacks et al., 2010; Ray et al., 2012). Such datasets are also useful for testing the reliability of crop simulations performed by global gridded crop models (Rosenzweig et al., in press).

Crop production per unit area (yield) is a fundamental parameter in agricultural and environmental research. At present, global yield datasets are limited to the country yield statistics from the Food and Agriculture Organization of the United Nations (FAO) and the global 5′ (c. 10 km) grid yields in 2000 from M3-Crops (Monfreda et al., 2008) and MapSPAM (You et al., 2006). Although FAO data have been used in numerous studies that analysed the trends and temporal variation in yields (Lobell & Field, 2007; Lobell et al., 2011), these data cannot be applied to depict the spatial variation of yield within a country. In contrast, the M3-Crops and MapSPAM datasets are powerful at capturing the spatial variation in yield or yield gaps (Licker et al., 2010) but they are inappropriate for time-series analysis. These shortcomings hinder investigations of the impacts of climate variability and change on historical yields, thus requiring the development of a global dataset of historical yields (e.g. Ray et al., 2012).

Another shortcoming of global datasets, including the spatially explicit global dataset of historical yields compiled by Ray et al. (2012), is the lack of quantitative information on errors inherent in the data. However, it is generally not possible to quantify the errors in yield statistics that arise from measurement, farm sampling, spatial aggregation and rounding. An alternative is to quantify errors of modelled yields relative to yield statistics. This approach can be useful for analysing yield datasets as well as historical land-use datasets (e.g. Klein Goldewijk et al., 2011).

With that aim, we proposed a model for estimating grid yields using satellite-derived net primary production (NPP), FAO country yield statistics, the crop calendar in 2000 from Sacks et al. (2010), harvested area in 2000 from the M3-Crops data (Monfreda et al., 2008) and share of crop production by cropping systems in the 1990s from the US Department of Agriculture (USDA, 1994, 2013). Using this model, we developed a global dataset of historical yields for maize, soybean, rice and wheat at a grid size of 1.125° (c. 120 km) for the period 1982–2006.

The modelled yields were compared with the M3-Crops, MapSPAM and subnational yield statistics reported for 23 major crop-producing countries. Although the yield data used for the model and comparison are not necessarily independent of each other (e.g. FAO data and subnational statistics), the comparison gave us insights into the reliability of modelled yields, as the model spatially disaggregated FAO data into grid yields without any information on yields from subnational statistics, M3-Crops and MapSPAM. Errors of modelled yields relative to subnational data were characterized and included in the dataset.

Methods and Data

Yield estimation model

The model for estimating grid yields consisted of six steps (Fig. 1): (1) modelling of the timing and length of crop growth periods, (2) accumulating crop-specific grid NPP during the growth period, (3) adjusting FAO country yields for a secondary cropping system, (4) combining FAO and NPP data for the major cropping system, (5) combining FAO and NPP data for a secondary cropping system, and (6) averaging yields across cropping systems. Details of each step are given below.

Figure 1.

Schematic diagram illustrating the steps of the yield estimation model.

Model inputs

The model estimated the yearly time series of yield of the crops in a 1.125° grid cell. Major inputs for the model were crop-specific grid NPP derived from the NOAA (US National Oceanic and Atmospheric Administration)/AVHRR (Advanced Very High Resolution Radiometer) (Text S1 in Supporting Information); FAO country yield; the crop calendar in 2000 (Sacks et al., 2010); harvested area in 2000 (Monfreda et al., 2008); and share of production by cropping system in the 1990s from the USDA report of historical averages of crop area, yield and production over the world's major crop-producing regions (USDA, 1994, 2013). In the model, major and secondary cropping systems were considered for maize, rice and wheat (winter and spring), whereas only a single cropping system was considered for soybean.

Accumulating crop-specific grid NPP for growth periods

Because the management and technology adopted by farmers differ even within a grid cell (Sacks et al., 2010), we considered a certain interval for each planting date and harvesting date of a crop within a grid cell. This, in turn, required us to account for the possibility of various timings and lengths of the crop growth period in the model. We fitted a normal distribution to each time evolution pattern of planting date and harvesting date for a given crop, cropping system and grid cell (Step 1 in Fig. 1). The typical planting date and 25% of the range between the lower and upper limits of the planting date obtained from the crop calendar in 2000 (Sacks et al., 2010) were arranged to a grid size of 1.125° from the original 5′ grid size. These planting date values were set as the mean and standard deviation of the normal distribution, respectively. Another normal distribution was fitted to the harvesting date in a similar manner.

We specified 500 different growth periods on the basis of the planting dates and harvesting dates sampled from the normal distributions. Planting date (or harvesting date) was sampled again if it fell outside the limits of the crop calendar (Sacks et al., 2010). Growth periods differed according to the grid cell. Daily crop-specific grid NPP values were summed for each of the growth periods, year by year, resulting in 500 different yearly NPP time series for each crop, cropping system and grid cell (Step 2 in Fig. 1).
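Steps 1 and 2 can be illustrated with a minimal sketch. It is not the authors' code: the function names, the day-of-year representation and the assumption that harvest falls in the same calendar year as planting are all illustrative choices made here.

```python
import random

def sample_date(mean, sd, lower, upper, rng):
    """Draw a day of year from N(mean, sd); resample until it lies within the calendar limits."""
    while True:
        d = rng.gauss(mean, sd)
        if lower <= d <= upper:
            return int(round(d))

def sample_growth_periods(plant, harvest, n=500, seed=0):
    """plant and harvest are (typical_day, lower_limit, upper_limit) tuples (hypothetical layout)."""
    rng = random.Random(seed)
    periods = []
    for _ in range(n):
        dates = []
        for typical, lower, upper in (plant, harvest):
            sd = 0.25 * (upper - lower)  # 25% of the calendar range, as stated in the text
            dates.append(sample_date(typical, sd, lower, upper, rng))
        periods.append(tuple(dates))
    return periods

def accumulate_npp(daily_npp, period):
    """Sum daily NPP from planting day to harvest day, inclusive (days are 1-based)."""
    start, end = period
    return sum(daily_npp[start - 1:end])

# Usage: 500 growth periods for one crop, cropping system and grid cell
periods = sample_growth_periods((120, 100, 150), (270, 250, 300))
npp_members = [accumulate_npp([1.0] * 365, p) for p in periods]
```

Each element of `npp_members` would correspond to one ensemble member for a single year; repeating the accumulation year by year gives the 500 yearly NPP time series.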

Adjusting FAO country yields for secondary cropping systems

Generally, differences in yield exist across the cropping systems considered here. This could be important for rice and wheat because double or triple cropping of rice was practised in the tropics and spring and winter cropping of wheat was used in many regions, whereas a multiple cropping system of maize was used in a limited region (e.g. Brazil) and soybean was generally produced once a year (Sacks et al., 2010). For instance, in Thailand, the mean rice yield for the period 1988/89 to 1992/93 from the secondary cropping system (3.81 t ha−1) was almost double that from the major cropping system (1.94 t ha−1) (USDA, 1994). The yield averaged over cropping systems is generally closer to the yield of the major cropping system than to that of the secondary cropping system (e.g. the mean rice yield in Thailand for that period was 2.11 t ha−1) because the mean yield is weighted by the share of production for each cropping system. Therefore, the assumption of an equal yield across cropping systems is not valid.

In the model, we assumed that the FAO data represent yields of the major cropping system of maize, rice and wheat (winter) rather than those of the secondary (spring) cropping system of the crops. In addition, we assumed that the NPP data from Step 2 contained information on the relative differences in yields between major and secondary cropping systems of maize, rice and wheat. Based on these assumptions, for each cropping system and year, we averaged the NPP values over the cropland grid cells in each country. The grid cells where each cropping system was used were specified using the crop calendar (Sacks et al., 2010). The ratio between the cropland-mean NPP value for the major cropping system of maize, rice and wheat and that for the secondary cropping system of the crops was computed year by year and country by country. The ratio was then multiplied by the FAO data to obtain FAO country yields adjusted for secondary cropping systems of maize, rice and wheat (Step 3 in Fig. 1).
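The adjustment in Step 3 can be sketched as follows for one country and one year. The text does not state the direction of the NPP ratio explicitly; this sketch assumes it is the secondary-system mean divided by the major-system mean, so that multiplying by the FAO yield gives a secondary-system yield. The function names and the unweighted cropland mean are also assumptions.

```python
def cropland_mean(npp_by_cell):
    """Unweighted mean NPP over the cropland grid cells of one country (assumed weighting)."""
    return sum(npp_by_cell.values()) / len(npp_by_cell)

def adjust_fao_for_secondary(fao_yield, npp_major, npp_secondary):
    """Scale the FAO country yield by the secondary-to-major NPP ratio (assumed direction)."""
    ratio = cropland_mean(npp_secondary) / cropland_mean(npp_major)
    return fao_yield * ratio

# Usage with made-up NPP values for grid cells keyed by hypothetical identifiers
adjusted = adjust_fao_for_secondary(2.0, {"a": 4.0, "b": 6.0}, {"a": 9.0, "c": 11.0})
```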

Combining crop-specific grid NPP with FAO country yields

It is reasonable to assume that the yearly NPP time series calculated in Step 2 contain information on the spatial variation of aboveground biomass of the crops. The harvest index, the fraction of actual yield in total aboveground biomass, is needed to convert the aboveground biomass into crop yield (e.g. Lobell et al., 2003), although it is difficult to obtain such data on a global scale. However, many factors related to yield are likely to follow political boundaries, as suggested by Licker et al. (2010) for wheat and maize yields in western and eastern Europe. We assumed that, like yield, the harvest index follows political boundaries due to different management styles and technology.

For the major cropping system of maize, soybean, rice and wheat, we calculated the ratio of the NPP value for each cropland grid cell to the mean NPP value averaged over cropland grid cells located in a country, year by year and country by country. For each year, the calculated NPP ratio for each cropland grid cell was multiplied by the FAO country yield to obtain the spatial pattern of yields of the crops in a country (Step 4 in Fig. 1). Thus, the spatial variation in modelled yields in a country followed that in the NPP, whereas the temporal variation in modelled yields basically followed that in the FAO data. However, the temporal variation in the NPP also affected that in modelled yields if yield stagnation or collapse occurred in a grid cell, because the NPP ratio for a cropland grid cell varied year by year. The grid yields of secondary cropping systems of maize, rice and wheat, if applicable, were calculated in a similar manner, but using the FAO data adjusted for a secondary cropping system (Step 5 in Fig. 1).
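A minimal sketch of the disaggregation in Steps 4 and 5, for one country, year and ensemble member, is given here. The function name and the use of an unweighted country mean are assumptions made for illustration.

```python
def disaggregate_country_yield(fao_yield, npp_by_cell):
    """Distribute a country yield over grid cells in proportion to each cell's NPP ratio."""
    mean_npp = sum(npp_by_cell.values()) / len(npp_by_cell)  # country cropland mean (unweighted, assumed)
    return {cell: fao_yield * npp / mean_npp for cell, npp in npp_by_cell.items()}

# Usage with made-up values: a country yield of 3.0 t/ha and three cropland cells
grid_yields = disaggregate_country_yield(3.0, {"a": 2.0, "b": 4.0, "c": 6.0})
```

With this construction the unweighted mean of the grid yields equals the country yield, which is why the temporal variation of the modelled yields largely tracks the FAO series while the spatial pattern tracks the NPP.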

Averaging modelled yields over cropping systems

As already noted, the cropping-system-mean yield is weighted by the share of production for each cropping system. The USDA report of historical averages of crop area, yield and production over the world's major crop-producing regions (USDA, 1994) and its updated versions (USDA, 2013) are the best-synthesized sources of information on the share of production by cropping system. However, most data from the reports are limited to those countries that were the major crop producers in the 1980s and 1990s.

To be as consistent as possible with other data used in the model (i.e. the crop calendar and harvested area in 2000), we used the production data in the 1990s from the USDA (1994, 2013) for the following countries for which the data were available from the report: for maize, Brazil, Mexico and the Philippines; for rice, China, India, Thailand and Vietnam; and for wheat, Canada, China, Kazakhstan, Russia, Ukraine and the United States. Identical values were used for all grid cells in a country. If no data were available, even shares of production by cropping system were used. For the crops except soybean, the mean yield over the cropping systems was calculated for each year and grid cell, if multiple cropping systems were used in a grid cell (Step 6 in Fig. 1).

The calculations for Steps 3 to 6 were conducted for each of the 500 yearly NPP time series to form many ensemble members of modelled yields. These ensemble members represent the uncertainty of modelled yields associated with different timings and lengths of crop growth periods. For each grid cell and year, we calculated the median value of modelled yields and used this when evaluating the reliability of modelled yields.
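Step 6 and the ensemble summary can be sketched as below. The linear weighting by production share is a simplifying assumption of this sketch, and the function names are hypothetical.

```python
from statistics import median

def system_mean_yield(yields, shares=None):
    """Average yields over cropping systems, weighted by production share.

    If no share data are available, even shares are used, as described in the text.
    """
    if shares is None:
        shares = [1.0 / len(yields)] * len(yields)
    return sum(y * s for y, s in zip(yields, shares)) / sum(shares)

def ensemble_median(members):
    """Median of the ensemble members for one grid cell and year."""
    return median(members)

# Usage with made-up values: major and secondary system yields with shares of 0.75 and 0.25
cell_yield = system_mean_yield([2.0, 4.0], [0.75, 0.25])
```

In the full procedure, `system_mean_yield` would be evaluated once per ensemble member (500 times per grid cell and year) before taking the median.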

Subnational yield statistics

For 23 major crop-producing countries, yearly subnational yield statistics for 1982–2006 were collected (Table S1). The data collection was conducted independently of the Agro-MAPS project (You et al., 2006; FAO, 2013), although there are common sources of data between Agro-MAPS and this study (e.g. the USDA). County, district or municipality data were collected for seven of these countries. For the other countries, we used province, state or country data because finer-resolution data were not readily available to us.

If yield values were not directly available from the data, we computed these values by dividing production by harvested area year by year (Table S1). The subnational data were arranged to calculate the area-mean yield of a 1.125° grid cell according to the method of Hansen & Jones (2000):

\[
Y_i = \frac{\sum_{j=1}^{J} HA_j \, Y_j}{\sum_{j=1}^{J} HA_j} \tag{1}
\]

where Yi is the area-mean yield of the ith grid cell (t ha−1), HAj is the harvested area of the jth political unit that falls within the ith grid cell (ha), calculated from the M3-Crops data, Yj is the yield of the jth political unit (t ha−1) and J is the total number of political units. The use of the M3-Crops data indicates that the geographic distribution of harvested area was considered to be constant across the study period. The calculated yields do not necessarily represent exact area-mean yields, for example if a grid cell spanned two political units and data for either political unit were not available.
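The area-weighted averaging of political-unit yields within a grid cell can be sketched as follows. Skipping units without yield data is an assumption made here to mirror the caveat about missing political units; the function name and input layout are hypothetical.

```python
def area_mean_yield(units):
    """Harvested-area-weighted mean yield of one grid cell.

    units: list of (harvested_area_ha, yield_t_per_ha) pairs, one per political unit
    falling within the cell; a yield of None marks a unit with no data.
    """
    usable = [(ha, y) for ha, y in units if y is not None and ha > 0]
    total_area = sum(ha for ha, _ in usable)
    if total_area == 0:
        return None  # no usable political unit in this grid cell
    return sum(ha * y for ha, y in usable) / total_area

# Usage with made-up values: two political units overlapping one grid cell
cell_yield = area_mean_yield([(100.0, 2.0), (300.0, 4.0)])
```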

The arranged subnational data represent yields averaged over multiple ecosystems (irrigated, rain-fed, lowland and upland) and cropping systems (single, double or triple). For rice, the data in some countries were the average of double (Cambodia, China, India, Laos and Philippines) or triple cropping (Vietnam). For wheat, the data used were aggregated values of various wheat crops (winter, spring and durum wheat).

Statistical analyses

Pearson's correlation coefficients and the root-mean-square errors (RMSEs; relative to the mean subnational yield statistics in 1982–2006) were used to assess the similarity of temporal variations between two samples (the modelled yields and subnational statistics) and that of the absolute value, respectively, if more than 5 years of subnational statistics were available for a grid cell. The annual rate of change in yield was also calculated by separately fitting a simple linear regression model to each of two samples, if more than 5 years of subnational statistics were available for both the first (1982–93) and second halves (1994–2006) of the study period; the slopes of the regression lines were divided by the mean yield value in 1982–2006 to obtain the annual rate of change in yield for each period.
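The three statistics described here can be written out in a short sketch. The function names are illustrative; the reference mean passed to `annual_rate_of_change` stands for the 1982–2006 mean yield mentioned in the text.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def relative_rmse(modelled, observed):
    """RMSE expressed as a percentage of the mean observed (subnational) yield."""
    n = len(observed)
    rmse = math.sqrt(sum((m - o) ** 2 for m, o in zip(modelled, observed)) / n)
    return 100.0 * rmse / (sum(observed) / n)

def annual_rate_of_change(years, yields, reference_mean):
    """Least-squares slope of yield on year, as a percentage of a reference mean yield."""
    n = len(years)
    mt, mv = sum(years) / n, sum(yields) / n
    slope = sum((t - mt) * (v - mv) for t, v in zip(years, yields)) / sum((t - mt) ** 2 for t in years)
    return 100.0 * slope / reference_mean

# Usage with made-up values: a yield rising by 0.1 t/ha per year against a mean of 2.0 t/ha
rate = annual_rate_of_change([2000, 2001, 2002, 2003], [2.0, 2.1, 2.2, 2.3], 2.0)
```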

The modelled yields were compared with yield data around the year 2000 from M3-Crops (Monfreda et al., 2008) and MapSPAM (version 3.0.6; You et al., 2006). The M3-Crops data used the mean yield in 1997–2003 (or 1990–96), whereas the MapSPAM data used the yield in 2000 (or the mean yield in 1999–2001). For both datasets, users cannot ascertain what time period of data is used for a grid cell. We averaged the modelled yields over the periods 1997–2003 and 1990–96 for comparisons with the M3-Crops data to account for the uncertainty about the period of data used in the dataset. Likewise, the modelled yields in 2000 and those averaged over the period 1999–2001 were used when comparing modelled yields with the MapSPAM.

After evaluating the reliability of modelled yields, we calculated the mean, year-to-year variation (represented by the coefficient of variation, CV) and annual rate of change in yields for the periods 1982–93 and 1994–2006 for the modelled yields from one grid cell to another. The differences in values of the statistics between the two periods were also computed. For each of the statistics, global-means, latitudinal-means for every 10° bin (both weighted by harvested area) and spatial variation of the statistics (represented by a range of values of the statistics covering 90% of values across grid cells centred on the median value) were computed to summarize the characteristics of historical changes in yields. Statistical significances of the historical changes in yields were tested using the ensemble members of the modelled yields and the errors in the modelled yields against the subnational statistics (Text S2).
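The summary statistics can be sketched as below. The text does not say whether the CV uses the population or the sample standard deviation, nor how the 90% range is interpolated; the population form and the default method of `statistics.quantiles` are assumptions of this sketch.

```python
from statistics import mean, pstdev, quantiles

def coefficient_of_variation(yields):
    """Year-to-year variation of yield in per cent (population standard deviation assumed)."""
    return 100.0 * pstdev(yields) / mean(yields)

def area_weighted_mean(values, areas):
    """Mean of a statistic across grid cells, weighted by harvested area."""
    return sum(v * a for v, a in zip(values, areas)) / sum(areas)

def central_90_range(values):
    """5th and 95th percentiles of a statistic across grid cells (range centred on the median)."""
    cuts = quantiles(values, n=20)  # 19 cut points at 5% steps
    return cuts[0], cuts[-1]

# Usage with made-up values: compare the CV of one grid cell between two periods
delta_cv = coefficient_of_variation([2.0, 3.0, 4.0]) - coefficient_of_variation([2.5, 3.0, 3.5])
```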

Results

Performance of the model

The correlation and RMSE values between the modelled yields and subnational statistics in 1982–2006 are mapped in Fig. 2. The subnational statistics covered 50.0–78.4% of the global harvested area in 2000 (Table 1, Fig. S1). More importantly, those proportions correspond to 50.9–85.2% of the global production in that year. Therefore, the assessments of the model performance were possible for a substantial proportion of global croplands. Across all crops, the model achieved good correspondence (represented by higher correlation and lower RMSE) in major crop-producing regions: the American Midwest, the North China Plain, the Brazilian Cerrado and the Argentinean Pampas. In these regions, most correlation values were greater than 0.8 (significant at the 1% level) and the RMSE values were less than 30%. The RMSE values in minor crop-producing regions tended to be larger (> 50%). For all crops, the correlation and RMSE values gradually worsened with a decrease in the extent of harvested area in a grid cell (the harvested area fraction, fHA) (Fig. S2).

Figure 2.

Correlation coefficients (left) and root-mean-square errors (RMSEs; relative to the mean subnational yield statistics in 1982–2006) (right) calculated between the modelled yields and subnational yield statistics for the period 1982–2006 for maize, soybean, rice and wheat. Light gray indicates that no subnational data were available. Dark gray indicates non-cropland area.

Table 1. Quantitative and areal coverage of collected subnational yield statistics for maize, soybean, rice and wheat. Global values were derived from the M3-Crops data (Monfreda et al., 2008)

Crop      Production (million tonnes)                   Harvested area (million hectares)
          Subnational (A)   Global (B)   (A)/(B) (%)    Subnational (C)   Global (D)   (C)/(D) (%)
Maize     423               591          71.6           73                136          53.7
Soybean   138               162          85.2           58                74           78.4
Rice      291               572          50.9           75                150          50.0
Wheat     324               563          57.5           116               209          55.5

The comparisons between two samples showed good overall performance of the model in capturing the annual rates of change in yields in 1982–2006, even at the grid cell level (Fig. S3). However, as with the correlations and RMSEs, the correspondence in the annual rates deteriorated with the decrease in fHA.

The modelled yields were compared with yields around the year 2000 from the M3-Crops and MapSPAM data. The modelled yields explained 45–71% of the spatial yield variation of the M3-Crops, with RMSE values of 0.5–1.8 t ha−1 (corresponding to 30–47% of the global mean yields from the M3-Crops; Fig. 3). The corresponding values for the MapSPAM were 49–81% and 0.5–1.6 t ha−1 (25–46%). The agreement between the mean modelled yields around 2000 and MapSPAM was better than that between the modelled yields and M3-Crops for maize, rice and wheat, but not for soybean, if an R2 value was used as the measure; even if an RMSE value was used, the agreement between the modelled yields and MapSPAM for maize and rice was still better than that for the M3-Crops. A portion of the M3-Crops soybean yields of about 1 t ha−1 corresponded to the data for Russia, whereas a portion of the M3-Crops rice yields of about 6 t ha−1 corresponded to the data for Japan. Although we found large discrepancies between the mean modelled wheat yields in 1997–2003 and M3-Crops (Fig. 3), those discrepancies were reconciled if the mean modelled yields in 1990–96 were used (Fig. S4, these data are distributed in northern France and southern England). In contrast, the discrepancies between the modelled yields and M3-Crops for rice and soybean (Figs 3 & S4) were not reconciled by using data for the different time period. At least for rice yields, we found that values of the M3-Crops in south-western Japan were too high if compared with the subnational yield statistics in that region, suggesting some sort of flaw in the quality check of yield data in M3-Crops. Reasons for the discrepancy in soybean yields between the modelled yields and M3-Crops were not readily available to the authors. However, we found no such discrepancies in yields when modelled yields of soybean and rice were compared with MapSPAM.

Figure 3.

Comparisons of modelled yields at the grid cell level (YModel) and yields around the year 2000 from the M3-Crops (YM3-Crops) and MapSPAM (YMapSPAM) data for maize, soybean, rice and wheat. Mean modelled yields in 1997–2003 were used for comparisons with M3-Crops data, and modelled yields in 2000 were used for comparisons with MapSPAM data. The coefficient of determination (R2) and root-mean-square error (RMSE, in per cent, relative to the global mean yield of M3-Crops or MapSPAM) calculated between two samples (red, YModel vs. YM3-Crops; blue, YModel vs. YMapSPAM) are presented.

Historical changes in crop yields

The means, CVs and annual rates of change in maize yields in the periods 1982–93 and 1994–2006 and the differences in values of the statistics between the two periods are presented in Fig. 4. Results for the other crops are shown in Figs S5–S7.

Figure 4.

Means (upper row), coefficients of variation (CVs, middle row) and annual rates of change (bottom row) in modelled maize yields in 1982–93 (left column) and 1994–2006 (middle column), and differences in values of the statistics between the two periods (right column). Light gray indicates that no modelled yields were available due to the lack of crop calendar data. Dark gray indicates non-cropland grid cells.

A comparison of the global-mean maize yields between the two periods showed that the mean yield in 1994–2006 (5.07 t ha−1) significantly increased (by 16.3%) from that in 1982–93 (4.36 t ha−1; Table 2) (see Text S2 for the significance). The global-mean CV of maize yields in 1994–2006 (17.6%) significantly decreased (by 0.8 percentage points) compared with that in 1982–93 (18.4%). The slopes of the linear regression models fitted to the modelled maize yields in 1982–93 and 1994–2006 corresponded to 0.06 and 0.09 t ha−1 year−1, respectively, indicating that, at the global level, maize achieved a higher annual rate of increase in yield in 1994–2006 (2.2% year−1) than in 1982–93 (1.4% year−1).

Table 2. Calculated means, coefficients of variation (CVs) and annual rates of change in yields and the spatial variation of the statistics (represented by the ranges of values of the statistics covering 90% of values across grid cells centred on the median value) for the periods 1982–93 and 1994–2006 for maize, soybean, rice and wheat (weighted by harvested area). The difference between the two periods is shown in bold if the change was significant at the 5% level

Crop      (A) 1982–93          (B) 1994–2006        (B) – (A)

Mean yield (t ha−1)
Maize     4.36 (0.42, 7.71)    5.07 (0.65, 8.87)    0.71 (−0.14, 1.98)
Soybean   1.91 (0.39, 2.57)    2.34 (0.60, 2.97)    0.43 (−0.16, 0.66)
Rice      3.59 (0.79, 6.81)    4.20 (1.13, 7.43)    0.61 (−0.19, 1.39)
Wheat     2.47 (0.71, 5.36)    2.77 (0.85, 5.91)    0.30 (−0.13, 1.56)

CV of yield (%)
Maize     18.4 (11.0, 37.1)    17.6 (9.6, 45.7)     −0.8 (−16.0, 28.4)
Soybean   14.9 (8.5, 29.2)     13.3 (8.7, 23.6)     −1.6 (−11.9, 8.0)
Rice      14.1 (8.1, 28.7)     14.6 (9.1, 29.8)     0.5 (−11.2, 13.0)
Wheat     16.8 (10.6, 29.5)    18.7 (9.2, 47.6)     1.9 (−8.4, 32.0)

Annual rate of change in yield (% year−1)
Maize     1.4 (−3.0, 6.2)      2.2 (−1.0, 7.6)      0.8 (−5.0, 8.2)
Soybean   1.4 (−1.0, 4.8)      0.6 (−1.6, 3.7)      −0.8 (−4.2, 3.3)
Rice      1.7 (−1.9, 5.1)      1.9 (−1.7, 5.8)      0.2 (−3.7, 5.6)
Wheat     0.1 (−3.3, 3.8)      1.0 (−1.3, 6.4)      0.9 (−3.8, 8.2)

At the global level, increased mean yields in the recent period (1994–2006) were also observed for soybean, rice and wheat (Table 2). Like maize, the CV of soybean yields in the recent period decreased by 1.6 percentage points from that in the previous period (1982–93), suggesting that global maize and soybean yields became more stable with time. However, the CVs of rice and wheat yields in 1994–2006 increased by 0.5 and 1.9 percentage points, respectively, compared with those in 1982–93, suggesting that variation in global rice and wheat yields increased in the recent period. The annual rates of change in rice and wheat yields in 1994–2006 were still higher than those in 1982–93, as was the case for maize. In contrast, the annual rate of change in soybean yields in 1994–2006 (0.6% year−1) decreased by 0.8 percentage points compared with that in 1982–93 (1.4% year−1), suggesting that soybean yields stagnated at the global level in the recent period.

At the latitudinal-mean level, significant increases in the mean yields in the recent period were observed for all crops across most latitudinal zones (Fig. 5). However, the increases in the mean yields at low latitudes (equator-ward of ± 20°) were smaller than those in the mid and high latitudes of the Northern and Southern Hemispheres (20–60 °N and 20–60 °S) in 1994–2006. Significant increases in the CVs of maize, rice and wheat yields in 1994–2006, compared with those in 1982–93, occurred in the low and mid latitudes of the Southern Hemisphere, especially 0–40 °S, whereas the CVs of yields significantly decreased in the recent period in the remaining areas and crops (Fig. 5). The annual rates of change in maize yields in 1994–2006 significantly increased in many latitudinal zones compared with those in the previous period, with exceptions around 20 °N and 60 °S, whereas the annual rates of change in soybean yields significantly decreased in 1994–2006, especially in the Northern Hemisphere, such as 10–50 °N (Fig. 5). The annual rates of change in rice and wheat yields in the recent period fell between those of maize and soybean.

Figure 5.

Latitudinal-mean differences in means (Δmean), coefficients of variation (ΔCV) and annual rates of change in yields (Δrate) for every 10° bin between the periods 1994–2006 and 1982–93 (the recent period minus the previous period) for maize, soybean, rice and wheat. Negative and positive latitudes indicate the Southern and Northern Hemisphere, respectively. An asterisk indicates that an increase (or decrease) is significant at the 5% level.

Discussion

Characteristics of the model and dataset

The modelled yields match reasonably well with the data from M3-Crops (Monfreda et al., 2008) and MapSPAM (You et al., 2006). The results suggest some uncertainties in the yield values of the existing datasets. For wheat, the discrepancies between the modelled yields and the datasets are probably due to the different time periods of yield data used across grid cells in the datasets (Figs 3 & S4); for soybean and rice, however, the time period did not explain the discrepancies.

Based on comparisons with the subnational statistics, the modelled yields for major crop-producing areas (represented by higher fHA values, Fig. S9) are sufficiently reliable (Figs 2, S2 & S3). At the global level, the production calculated from the grid cells where the modelled yields are reliable (the correlation values between the modelled yields and subnational statistics are significant at the 5% risk level; Fig. 2) accounts for a substantial percentage (69–88%) of the global production in 2000 calculated from the grid cells where the evaluation is possible on the basis of the subnational statistics (Fig. S8). The quantitative coverage of the reliable modelled yields is large enough to calculate global production accurately if the obtained percentages of the reliable production are extrapolated.

The reliability of modelled yields for grid cells with lower fHA values was lower than that for grid cells with higher fHA values. Some discrepancies in the geographic patterns of yields between the modelled yields and subnational statistics were found. For instance, the mean maize yields in 1994–2006 in the north-western United States are higher than those in the American Midwest (Fig. 4). This kind of discrepancy could appear in minor crop-producing areas. Caution is necessary when applying the modelled yields in minor crop-producing areas from our dataset to analyses.

The lower reliability of modelled yields in minor crop-producing areas is mostly due to the use of a time-constant harvested area map (Monfreda et al., 2008). Cropland distributions have changed over time (Ramankutty & Foley, 1999). In areas surrounding cities and marginal croplands that have suboptimal growing conditions, the extent of croplands has decreased due to urbanization or abandonment (Yoshida et al., 2012). In some countries (e.g. Brazil), however, croplands have been newly developed in response to increasing demand for crops (Schnepf et al., 2001). In both cases, satellite-derived crop-specific grid NPP data may fail to represent the status of crops accurately if a time-constant harvested area map is used to specify croplands. This is true in both major and minor crop-producing areas, but the effect may be more pronounced in minor crop-producing areas.

The lack of information related to agricultural systems also may lower the reliability of modelled yields. The crop calendar (Sacks et al., 2010) did not cover the third season of the triple rice cropping system, which probably explains the large discrepancy between the modelled yields and subnational statistics of rice in Vietnam, where triple cropping is practised (Fig. 2). Planting and harvesting dates (or crop growth periods) in some regions have changed over time (e.g. maize in the United States, Kucharik, 2006; rice in Japan, Uno et al., 2012). Such temporal changes can also be found in the share of production by cropping system (e.g. maize in Brazil; USDA, 2007). Our model used time-constant data for the crop calendar (Sacks et al., 2010), share of production by cropping system (USDA, 1994, 2013) and harvested area (Monfreda et al., 2008), and the lack of historical information helps to explain errors in the modelled yields.

Despite the lack of information, using this model we successfully developed a global dataset of historical yields. Our analyses showed that the dataset derived from this model is novel in its spatial and temporal resolution and can offer opportunities to investigate historical changes in global yields. The data are freely available at http://geointa.inta.gov.ar/publico/iizumi/gdhy.zip. The development of this model and dataset is a marked advance in the compilation of global agricultural datasets because the model is capable of estimating grid yields over global croplands using publicly available data. Our model is independent of the method used by Ray et al. (2012), who provided a global dataset of historical yields using reported crop statistics alone, and is expected to be more effective than their method for rapidly updating yield data. This suggests that our model and dataset can be used for the monitoring and short-term prediction of the status of food availability at a global scale, as demonstrated by Iizumi et al. (2013). Furthermore, the model can be applied to satellite products of higher resolution (e.g. data from the Moderate Resolution Imaging Spectroradiometer, MODIS) than those used here to derive yield data at finer resolution and extend the period of yield data to more recent years than described here.

Stagnation, collapses and increased year-to-year variation of yields

Our dataset depicts spatially explicit historical changes in yields across the world at a 1.125° grid cell level. The changes revealed were substantially more diverse than those depicted in previous studies (e.g. Hafner, 2003) based on FAO data (an exception is Ray et al., 2012). The changes range from areas that achieved higher and more stable yields with enhanced annual rates of increase in yields (e.g. North America) to areas suffering from decreased annual rates of increase in yields (yield stagnation) and/or increased year-to-year yield variation. In some regions, even the mean yields in the recent period (1994–2006) were lower than those in the previous period (1982–93) (yield collapses), in addition to the aforementioned adverse changes in yields.
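The distinction between stagnation and collapse can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not the procedure used in this study: the annual rate of change is assumed to be the least-squares slope of yield against year within each period, no significance testing is applied, and the function names `period_stats` and `classify_yield_change` are hypothetical.

```python
import numpy as np

def period_stats(years, yields, start, end):
    """Mean yield and least-squares slope (t/ha per year) for start..end inclusive."""
    years = np.asarray(years, dtype=float)
    yields = np.asarray(yields, dtype=float)
    sel = (years >= start) & (years <= end)
    slope = np.polyfit(years[sel], yields[sel], 1)[0]
    return yields[sel].mean(), slope

def classify_yield_change(years, yields):
    """Label one grid cell's yield series by comparing 1982-93 with 1994-2006."""
    mean_prev, rate_prev = period_stats(years, yields, 1982, 1993)
    mean_rec, rate_rec = period_stats(years, yields, 1994, 2006)
    if mean_rec < mean_prev:
        return "collapse"      # recent mean yield below the previous mean
    if rate_rec < rate_prev:
        return "stagnation"    # mean still higher, but slower annual increase
    return "growth"            # higher mean and an annual rate at least as high

# Synthetic example: yields rise by 0.1 t/ha per year until 1993, then level off.
years = np.arange(1982, 2007)
yields = np.where(years <= 1993, 2.0 + 0.1 * (years - 1982), 3.2)
label = classify_yield_change(years, yields)
```

For this synthetic series the recent mean exceeds the previous mean while the recent slope is close to zero, so the cell is labelled as stagnating rather than collapsing.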

In most of the world, the mean maize yields in the recent period were higher than those in the previous period, with some exceptions such as in Botswana, Kenya, Madagascar, Zambia and Zimbabwe, where yield collapses occurred (Fig. 4). The annual rates of increase in maize yields in western Europe, east Asia, Australia and sub-Saharan Africa in the recent period were lower than in the previous period, indicating the occurrence of yield stagnation in these regions. Our analyses also revealed the occurrence of soybean yield stagnation in China, India and Paraguay (Fig. S5); rice yield collapses in Nigeria; rice yield stagnation in China and India (Fig. S6); and wheat yield stagnation in some developed countries including the United States (especially the western part), the United Kingdom, France and Germany (Fig. S7). Our results for stagnation and collapses in maize and wheat yields were consistent with previous reports (Hafner, 2003; Brisson et al., 2010; Lin & Huybers, 2012; Hawkins et al., 2013), which analysed specific countries using yield data at various levels. For all crops, similar yield stagnation and collapses were reported by Ray et al. (2012), who analysed a statistics-based global dataset of historical yields in 1961–2008. Despite the different datasets, both Ray et al. (2012) and this study showed consistent results, thus providing further evidence of the recent yield stagnation and collapses of major crops.

Our results revealed increased year-to-year variation of yields in the recent period. Comparatively large CVs of maize yields in the previous period appeared in southern and sub-Saharan Africa and the Middle East (Fig. 4). In the recent period, the CVs of maize yields increased in Central and South America and Southeast Asia in addition to the aforementioned regions, which already had large CVs. The CVs of yields of all crops tended to increase in South America, southern and West Africa, the Near East and southern Europe, although there was some variation across countries (Figs 4 & S5–S7). Importantly, those areas that have been facing increased year-to-year variation in yield tended to show smaller increases in mean yields in the recent period than those in other areas where no such increases in the CV of yields were observed (mid and high latitudes in both hemispheres). Interestingly, the annual rates of increase in yields in those areas where increased year-to-year yield variation was found remained higher than the rates of increase in yields in the other areas where no such increases in the CV of yields were observed (Fig. 5). This finding suggests the possibility that the increased year-to-year yield variation may be derived from the recent decrease in mean yields. Unfortunately, few relevant studies are available for comparison with our results regarding the changes in the year-to-year yield variation. Although Kucharik & Ramankutty (2005) reported a decreasing trend in the amplitude of year-to-year variation of maize yields in the American Midwest in 1910–2001 (with large decadal variation), their analysis did not cover the period 2002–06. However, the greater stability of maize yield in the United States suggested by Kucharik & Ramankutty (2005) is qualitatively consistent with our finding.
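The comparison of year-to-year variation between the two periods can be sketched as follows. This is a hedged illustration only: the coefficient of variation (CV) is computed here as the sample standard deviation divided by the mean of the raw yields in each period, without any detrending, which may differ from the exact procedure used for the figures. The function names `cv` and `cv_change` are hypothetical.

```python
import numpy as np

def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

def cv_change(years, yields, split_year=1994):
    """CV in the recent period (>= split_year) minus CV in the previous period."""
    years = np.asarray(years)
    yields = np.asarray(yields, dtype=float)
    previous = yields[years < split_year]
    recent = yields[years >= split_year]
    return cv(recent) - cv(previous)

# Synthetic example: the mean stays near 2 t/ha, but swings widen after 1993.
years = np.arange(1982, 2007)
swing = np.where(years < 1994, 0.1, 0.5)
yields = 2.0 + swing * np.where(years % 2 == 0, 1.0, -1.0)
delta = cv_change(years, yields)  # positive: variation increased in 1994-2006
```

Because the CV is a ratio, it rises either when the spread of yields widens or when the mean yield falls, which is why changes in CV and changes in mean yield need to be read together.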

Although it is beyond the scope of this study to present the underlying reasons for these historical changes in yields, higher mean yields and annual rates of increase in yields and decreased year-to-year variation of yields across the crops may have resulted from improvements in management practices, mechanization, irrigation facilities, dissemination of weather information, other advanced technologies and agricultural institutions. Factors related to planting dates, hybrid seeds, soil degradation, expansion of cropland to low-productivity lands, climate trends, climate extremes and policy may explain the stagnation, collapses and increased year-to-year variation in yields (Lal, 1995; Kucharik, 2006; Lobell & Field, 2007; Nellemann et al., 2009; Lobell et al., 2011; Obalum et al., 2012; Hawkins et al., 2013). For instance, Lobell & Field (2007) reported that increased temperature during the growing season had negative impacts on mean yields of maize and wheat in 1981–2002 at the global scale. Hawkins et al. (2013) suggested that the recent stagnation of maize yields in France since about 2000 is in part associated with the greater frequency of high temperatures during the growing season.

Conclusions

We developed a global dataset of historical yields in 1982–2006 using a model that aligns country yield statistics, grid yield proxy from satellite products, and various global agricultural datasets. For each year, this dataset contains about 144–2180 subnational grid statistics and 1312–4171 modelled grid yields, along with the error information of modelled yields against the subnational statistics. The development of this model and dataset is a marked advance in the compilation of global agricultural datasets, and it is expected to help improve monitoring and short-term prediction of the status of food availability at a global scale. However, this advance was achieved at the expense of a coarser grid size (c. 120 km) and smaller number of crops (four) in this dataset compared with those in previous studies (e.g. Monfreda et al. (2008) dealt with 175 crops at a grid size of 5′). Despite the limitations, we believe that this dataset offers a tremendous opportunity for investigating the impacts of climate variability and change on yields and the influences of agriculture on climate, carbon and nitrogen cycles, water resources, land-use changes and biodiversity, as well as testing the reliability of crop simulations at the global scale.

Analyses of our dataset showed that crop yields became higher and more stable at the global level and the global yields of the three cereal crops (maize, rice and wheat) never stagnated between 1994 and 2006. However, a different picture emerged at the latitudinal-mean level. Our results corroborate the yield stagnation and collapses in some regions that were previously reported and revealed that regions in the low and mid latitudes of the Southern Hemisphere, where many developing countries are located, faced increased year-to-year yield variation in the recent period. Because these regions achieved limited increases in mean yields in the recent period, the increases in year-to-year yield variation were likely to be responsible, at least in part, for the yield stagnation and collapses. Further research is needed to elucidate the mechanisms underlying the linkages between the increased year-to-year yield variation and yield stagnation or collapse.

Acknowledgements

We thank the editors and referees for their comments. The authors thank A. Tsutsumida-Takesaki, A. S. Rao, S. Naresh Kumar, A. J. Soja, J. L. McCarty, A. Petkov, B. Qian, Y. Masaki and R. Beukes for their support in collecting subnational statistics; H. Den at Academic Express Inc. for his support in data processing; and M. E. Brown for her suggestions about estimating NPP. This study was supported by the Environment Research and Technology Development Fund (S-10-2) provided by the Ministry of the Environment, Japan. T.I. was partially supported by the Grant-in-Aid for Scientific Research (Start-up: 23880030) provided by the Ministry of Education, Culture, Sports, Science and Technology, Japan. This work was conducted under the auspices of the Agricultural Model Intercomparison and Improvement Project (AgMIP).

Toshichika Iizumi is an agro-meteorologist involved in the research project ‘Global Risk Assessment toward Stable Production of Food’ (called GRASP) at the National Institute for Agro-Environmental Sciences. The focus of this project is to develop tools for evaluating spatiotemporal variation in food production at the global scale and to depict possible paths to global food security under future changes in climate, water resources and land use.
