Cropland for sub-Saharan Africa: A synergistic approach using five land cover data sets

Abstract

This paper presents a methodology for creating a cropland map for Africa by combining five existing land cover products: GLC-2000, MODIS Land Cover, GlobCover, MODIS Crop Likelihood and AfriCover. The products are ranked by experts and combined into a synergy map, which reflects the likelihood or probability that a given pixel is cropland. The synergy map is then calibrated with national and sub-national crop statistics using a novel approach to produce the cropland map. A preliminary validation of the map was undertaken and the results are presented. The resulting cropland map has an accuracy of 83%, which is higher than the accuracy of any of the individual maps. The cropland map is freely available at agriculture.geo-wiki.org.

1. Introduction

Global cropland cover provides vital baseline information on agricultural production for many spatially explicit models and for applications such as the United Nations' Millennium Ecosystem Assessment (2005) and the Global Environmental Outlook (GEO, 2007). Furthermore, land cover, and cropland in particular, influences climate: cropland management itself has been shown to influence the global climate [Lobell et al., 2006]. Changes in land cover may also lead to biophysical effects on the climate through changes in evapotranspiration, albedo and surface roughness [Brovkin et al., 2006].

Accurate spatial information on cropland is particularly important for crop monitoring in support of food security and early warning, and satellite-derived land cover datasets have been widely used for this purpose. A major challenge in addressing future global food security is to determine the spatial distribution of cropland and to identify areas where additional food or non-food crops can be grown. Such information is also essential for global economic land-use models that evaluate the impact of a given policy, e.g., the impact of a strict EU biofuel target on deforestation and food security (see Havlik et al. [2010]). It has been shown both globally [Fritz and See, 2008] and for Africa in particular [Fritz et al., 2010] that large discrepancies exist between current continental and global land cover maps, both in overall area and in spatial distribution. Ramankutty et al. [2008] estimated that the global cropland area lies between 1.22 and 1.71 billion hectares, which translates into a 40% difference between the upper and lower estimates. Several medium to coarse resolution global land cover datasets identify cropland, namely GLC-2000, MODIS Land Cover, GlobCover and MODIS Crop Likelihood, but each was derived using a different classification algorithm and none has particularly high accuracy for cropland estimation [Friedl et al., 2002; Bicheron et al., 2008].

Recent approaches to the development of cropland maps have focused on using national [Pittman et al., 2010] and sub-national statistics [Ramankutty et al., 2008], but there have been few attempts at integrating the current products, especially for agriculture. An exception is the work by Jung et al. [2006], who used a fuzzy agreement scoring approach to explore the synergies between global land cover products for carbon cycle modeling. This paper extends the concept of the synergy map [Jung et al., 2006] by including expert judgment in ranking the land cover products and by including regional and national products derived from higher resolution satellite data. A novel approach to calibration based on national and sub-national information is also introduced, and the validation of the resulting cropland map is presented.

2. Data and Methods

2.1. Data Sources

Five land cover products are used in this paper: GLC-2000, MODIS Land Cover, GlobCover, MODIS Crop Likelihood and AfriCover. Table 1 provides the details of these datasets. Although there are other cropland products available [e.g., Ramankutty et al., 2008; Ramankutty and Foley, 1998], they are not included in this analysis because they are either rather old or use the products listed above in their development and are at a coarser resolution. Instead we chose to use the original land cover products in this analysis.

Table 1. Comparison of Land Cover Products

| Characteristics | GLC-2000 | MODIS Land Cover | GlobCover | MODIS Crop Likelihood | AfriCover |
|---|---|---|---|---|---|
| Reference year | 2000 | 2005 | 2005 | 2000–2008 | 1982–2000 |
| Producer | JRC | University of Boston | ESA/JRC | University of Maryland | FAO |
| Resolution | 1 km | 500 m | 300 m | 250 m and 1 km | 30 m |
| Reference | Bartholomé and Belward [2005] | Friedl et al. [2002] | Bicheron et al. [2008] | Pittman et al. [2010] | http://www.fao.org/waicent/faoinfo/sustdev/EIdirect/EIre0053.htm |

Cropland data at the country level are reported by the Food and Agriculture Organization of the United Nations (FAO). Sub-national level data have been assembled from various national statistics publications and agricultural census surveys by IFPRI (International Food Policy Research Institute). IFPRI's sub-national crop database covers all major crops in the world and includes mostly second-level sub-national units (e.g. a district in Kenya) but also some third-level administrative units where available [You et al., 2009]. In some locations, favorable biophysical conditions allow multiple cropping for certain crops. Cropping intensity is defined as the ratio between the harvested crop area (where double or triple cropping areas are counted twice or three times, respectively) and the physical area where the crop is planted. The cropping intensity data by country are compiled from various sources, including national statistics, agricultural censuses, household surveys, and expert opinions [You et al., 2009]. We calculated physical areas by dividing the harvested area by the corresponding cropping intensity. We then summed these physical areas over all crops to obtain the total cropland for a given administrative unit.
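The conversion from harvested area to physical area described above can be sketched as follows. This is a minimal illustration only: the function names, the crop names and all numbers are hypothetical and do not come from the IFPRI database.

```python
def physical_area(harvested_area, cropping_intensity):
    """Physical area = harvested area / cropping intensity."""
    if cropping_intensity <= 0:
        raise ValueError("cropping intensity must be positive")
    return harvested_area / cropping_intensity


def total_cropland(crops):
    """Sum the physical areas of all crops in one administrative unit.

    `crops` maps a crop name to a (harvested_area, cropping_intensity) pair.
    """
    return sum(physical_area(h, ci) for h, ci in crops.values())


# Hypothetical unit: areas in thousand hectares, intensities are made up.
example_unit = {
    "maize": (150.0, 1.5),  # 150 harvested at intensity 1.5 -> 100 physical
    "rice": (80.0, 2.0),    # double cropping: 80 harvested -> 40 physical
}
example_total = total_cropland(example_unit)  # 100 + 40 = 140
```

A cropping intensity of 1 leaves the harvested area unchanged, so single-cropped areas pass through the calculation as they are.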

The final cropland area by sub-national unit serves as a lower bound for the validation of cropland extent from satellite data. There are two reasons for adopting this approach. Firstly, the crop list may miss minor crops produced in a particular region, and secondly, mixed cropping (i.e. planting two or more crops on the same plot within a year) has not been accounted for because of limited data availability. Pasture and areas of low intensity cropping are not included in the cropland extent map.

2.2. Methodology

The methodology consists of a series of steps, which are shown in Figure 1 and are described in each of the sections that follow. Before starting, all the land cover products were standardized to a resolution of 1 km.

Figure 1.

Overview of the methodology for the creation of a cropland map.

2.2.1. Creation of a Synergy Map

The basic idea behind the synergy approach is to give each pixel a score based upon the agreement between the different land cover products used in the analysis [Jung et al., 2006]. The synergy approach is extended in this paper in two ways. The first modification allows the producer of the final map to rank the land cover products before they are combined (based on expert judgement, timeliness or accuracy assessment) rather than giving all products the same weight. The more agreement between the maps, the higher the likelihood or probability that cropland exists at that pixel. Figure 2a illustrates how this ranking works using three different land cover products. Land cover product #1 is given the highest ranking by experts, followed by land cover products #2 and #3. The black pixels are areas of cropland in each land cover product. A pixel is assigned a value of 1 if all three maps agree that cropland exists. A value of 2 is assigned to areas of cropland where land cover products #1 and #2 agree but land cover product #3 does not. With three land cover products, there are 7 possible combinations, as shown in the simple illustration in Table 2. The methodology can, however, be applied to any number of land cover products; five maps were used to develop the final cropland map described in this paper.

Figure 2.

Creating a synergy map based on (a) ranking the land cover products and (b) giving priority to the top rank which means that the layer is always used first even if there is no agreement with another map at a specific pixel. The black pixels are areas of cropland in each product.

Table 2. Synergy Map Ranking Combinations When Combining Three Land Cover Products

| Rank Value | Meaning for Ranking | Meaning for Priority Ranking |
|---|---|---|
| 1 | All land cover products agree | All land cover products agree |
| 2 | LC #1 and LC #2 agree | LC #1 and LC #2 agree |
| 3 | LC #1 and LC #3 agree | LC #1 and LC #3 agree |
| 4 | LC #2 and LC #3 agree | Only LC #1 indicates cropland |
| 5 | Only LC #1 indicates cropland | LC #2 and LC #3 agree |
| 6 | Only LC #2 indicates cropland | Only LC #2 indicates cropland |
| 7 | Only LC #3 indicates cropland | Only LC #3 indicates cropland |

The second modification to the synergy methodology of Jung et al. [2006] is referred to here as priority ranking, which allows regional products to be incorporated in places where confidence in them is highest. Maps with very high confidence, i.e. those based on high resolution data combined with information from the ground, such as a TM-based visual interpretation, are always ranked higher than the other land cover products, even where they show little or no agreement with those products. This concept is illustrated in Figure 2b, where the effect of the priority ranking can be seen. Two pixels are labeled with values of 4 and 5. Under the plain ranking of Figure 2a, a value of 4 means that land cover products #2 and #3 agree on cropland but land cover product #1 does not, while a value of 5 means that only land cover product #1 indicates cropland at that pixel. Under priority ranking the order of these two values is switched, as shown in the right-hand column of Table 2: because land cover product #1 has priority, the pixel where it alone indicates cropland receives a value of 4 instead of 5. This algorithm would most likely be used when a regional product is available that has been derived from higher resolution images together with ground data, so that confidence in it is higher than in the other products.
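The two ranking schemes for three land cover products can be written as simple lookup tables that mirror Table 2. The sketch below is illustrative only: the names `PLAIN_RANKS`, `PRIORITY_RANKS` and `synergy_rank` are hypothetical, and returning `None` for a pixel that no product maps as cropland is an assumption, since Table 2 does not list that case.

```python
# Keys are (LC #1, LC #2, LC #3) cropland flags; values are rank values
# taken from Table 2 (1 = highest likelihood of cropland).
PLAIN_RANKS = {
    (1, 1, 1): 1,  # all products agree
    (1, 1, 0): 2,  # LC #1 and LC #2 agree
    (1, 0, 1): 3,  # LC #1 and LC #3 agree
    (0, 1, 1): 4,  # LC #2 and LC #3 agree
    (1, 0, 0): 5,  # only LC #1
    (0, 1, 0): 6,  # only LC #2
    (0, 0, 1): 7,  # only LC #3
}

# Priority ranking swaps ranks 4 and 5, so that LC #1 alone outranks
# agreement between the two lower-ranked products.
PRIORITY_RANKS = dict(PLAIN_RANKS)
PRIORITY_RANKS[(1, 0, 0)] = 4
PRIORITY_RANKS[(0, 1, 1)] = 5


def synergy_rank(lc1, lc2, lc3, priority=False):
    """Return the Table 2 rank value for one pixel, or None if no
    product indicates cropland (assumption: such pixels are unranked)."""
    key = (int(bool(lc1)), int(bool(lc2)), int(bool(lc3)))
    table = PRIORITY_RANKS if priority else PLAIN_RANKS
    return table.get(key)
```

For example, `synergy_rank(1, 0, 0)` gives 5 under plain ranking and `synergy_rank(1, 0, 0, priority=True)` gives 4, which is the switch between Figures 2a and 2b.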

For the cropland map created for Africa, five land cover maps were used, listed here in order of rank or confidence: AfriCover, MODIS Crop Likelihood, MODIS Land Cover, GlobCover and GLC-2000. When these five products are combined, there are 30 different possible combinations, ranked from 1 to 30 (see Table 3). AfriCover is given priority ranking over the other land cover products. The remaining land cover maps were then ranked according to how accurate they are perceived to be in cropland estimation. In this case we assumed that the more recent a product is, the more accurate it is (from most to least recent: MODIS Crop Likelihood, MODIS Land Cover, GlobCover and GLC-2000). A synergy map was then created by applying this ranking.

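Table 3. Synergy Map Ranking Combinations When Combining Five Land Cover Products (a)

| Class | GLC-2000 Pixel | MODIS Land Cover Pixel | GlobCover Pixel | MODIS Crop Likelihood Pixel | AfriCover Pixel |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 0 | 1 | 1 | 1 | 1 |
| 3 | 1 | 1 | 0 | 1 | 1 |
| 4 | 1 | 0 | 1 | 1 | 1 |
| 5 | 1 | 1 | 1 | 0 | 1 |
| 6 | 0 | 1 | 0 | 1 | 1 |
| 7 | 0 | 0 | 1 | 1 | 1 |
| 8 | 0 | 1 | 1 | 0 | 1 |
| 9 | 1 | 1 | 0 | 0 | 1 |
| 10 | 1 | 0 | 1 | 0 | 1 |
| 11 | 0 | 0 | 0 | 1 | 1 |
| 12 | 0 | 0 | 1 | 0 | 1 |
| 13 | 0 | 0 | 0 | 0 | 1 |
| 14 | 1 | 1 | 1 | 1 | 0 |
| 15 | 0 | 1 | 1 | 1 | 0 |
| 16 | 1 | 1 | 0 | 1 | 0 |
| 17 | 1 | 0 | 1 | 1 | 0 |
| 18 | 1 | 1 | 1 | 0 | 0 |
| 19 | 1 | 1 | 0 | 1 | 0 |
| 20 | 1 | 0 | 1 | 1 | 0 |
| 21 | 0 | 1 | 0 | 1 | 0 |
| 22 | 0 | 0 | 1 | 1 | 0 |
| 23 | 1 | 0 | 0 | 1 | 0 |
| 24 | 0 | 1 | 1 | 0 | 0 |
| 25 | 1 | 1 | 0 | 0 | 0 |
| 26 | 1 | 0 | 1 | 0 | 0 |
| 27 | 0 | 0 | 0 | 1 | 0 |
| 28 | 0 | 1 | 0 | 0 | 0 |
| 29 | 0 | 0 | 1 | 0 | 0 |
| 30 | 1 | 0 | 0 | 0 | 0 |

(a) A value of 1 indicates the presence of cropland and 0 indicates no cropland present.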

2.2.2. Creating a Cropland Map Through Calibration With National Crop Statistics

To create a map of cropland, the synergy map must be calibrated using crop statistics. Where only national crop statistics are available, the process is as follows for one country at a time:

1. Cells with the highest ranking are selected and the sum of the cropland area is calculated taking the minimum value of cropland as specified in the legend definitions of each land cover product. Where the synergy map shows agreement of 2 or more land cover products, the average cropland area would be used. For example, in Africa the GLC-2000 uses 50% for the minimum cropland while MODIS uses 60%. If both maps agreed that a given pixel contained cropland, a minimum value of 55% would be used.

2. The cells with the next highest ranking are added to the previously selected cells and the total area of the selection is calculated.

3. Step 2 is repeated until a total area of the selected cells (summed up on a country level) match the national statistics as closely as possible.

This process creates a cropland map based on the minimum value of cropland specified in the definitions of cropland in the different land cover products. It is also possible to create these maps using the maximum or the average value from those definitions. The algorithm is illustrated with a simple example in Figure 3. The synergy map pictured has pixels with ranks from 1 to 4. The total areas of cropland belonging to these classes are 59, 75, 36 and 77, respectively. The national statistics for that country indicate an area of 180, so the classes are summed in rank order until the total comes as close as possible to this value. This happens for classes 1 to 3, which give a total of 170. Hence the new cropland statistic for that country would be 170, and the resulting map is a combination of the three highest ranked classes, i.e. the areas with the highest likelihood of containing cropland.

Figure 3.

Creating a cropland map through calibration of the synergy map with national crop statistics.
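The calibration steps can be sketched in a few lines, using the numbers from the Figure 3 example and the 50%/60% legend example from step 1. The function names are hypothetical. Keeping the smaller selection when two cumulative totals are equally close to the statistic is an assumption, as the text does not say how ties are resolved.

```python
def pixel_cropland_area(pixel_area, min_fractions):
    """Cropland area of one pixel, using the average of the minimum
    cropland percentages of the products that agree on that pixel."""
    mean_fraction = sum(min_fractions) / len(min_fractions)
    return pixel_area * mean_fraction / 100.0


def calibrate(class_areas, national_area):
    """Add rank classes in order (rank 1 first) and return
    (number_of_classes_kept, total_area) for the cumulative total
    closest to the national statistic."""
    best_n, best_total = 0, 0
    running = 0
    for n, area in enumerate(class_areas, start=1):
        running += area
        # Strict '<' keeps the smaller selection on a tie (assumption).
        if abs(running - national_area) < abs(best_total - national_area):
            best_n, best_total = n, running
    return best_n, best_total


# Figure 3 example: class areas for ranks 1 to 4, national statistic 180.
kept, total = calibrate([59, 75, 36, 77], 180)  # classes 1-3, total 170

# Step 1 example: GLC-2000 (50%) and MODIS (60%) agree on a 1 km2 pixel.
area = pixel_cropland_area(1.0, [50, 60])  # 55% of the pixel
```

The cumulative totals here are 59, 134, 170 and 247. The third total is 10 away from 180 and the fourth is 67 away, so classes 1 to 3 are kept.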

2.2.3. Calibrating the Synergy Map With Sub-National Crop Statistics

The detailed sub-national crop database compiled by IFPRI [You et al., 2009] contains statistics for nearly all crops, but in some sub-national units there may only be a few or no statistics available. However, it is still possible to make use of these data by applying some simple modifications to the cropland map created from the national calibration. The process is illustrated in Figure S1 of the auxiliary material for a country with 3 sub-national units represented by dark grey, light grey and white. Numbers in each pixel show the area of the cropland. Sub-national cropland areas are calculated from the nationally calibrated map and compared to the sub-national statistics using these simple rules:

1. When no data are available at sub-national level, the cropland areas are taken from the national level map, which is the white zone in Figure S1.

2. When the sub-national statistics give a higher cropland area than that derived from the national analysis (pixel areas summed at the sub-national level from the previous analysis, see section 2.2.2), the sub-national cropland values are used for this zone, shown as light grey in Figure S1.

3. When the national level analysis yields a higher area of cropland than the sub-national level, the algorithm outlined in section 2.2.2 is rerun for such zones. However, the sub-national statistics are used in place of the pixels summed from the nationally calibrated map, with the requirement that the national crop totals must be preserved.
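The decision logic of the three rules can be sketched as below. This only selects which rule applies to a sub-national unit. It does not implement the rerun of the section 2.2.2 algorithm or the preservation of national totals. The function name and the returned labels are hypothetical, and treating exactly equal areas as "keep the national map" is an assumption, as the rules do not cover that case.

```python
def subnational_rule(map_area, stat_area):
    """Choose the rule for one sub-national unit.

    map_area:  cropland area summed from the nationally calibrated map
    stat_area: sub-national statistic, or None when no data are available
    """
    if stat_area is None:
        return "keep_national_map"            # rule 1
    if stat_area > map_area:
        return "use_subnational_statistic"    # rule 2
    if stat_area < map_area:
        return "rerun_calibration"            # rule 3
    return "keep_national_map"                # equal areas (assumption)


# Hypothetical units: (area from national map, sub-national statistic)
units = {"A": (120.0, None), "B": (80.0, 95.0), "C": (150.0, 110.0)}
decisions = {name: subnational_rule(m, s) for name, (m, s) in units.items()}
```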

2.2.4. Validation of the Cropland Map

The cropland map was validated against a sample of 2553 points systematically distributed over Africa at each latitude/longitude intersection. These vector points were placed on top of Google Earth as KML files within agriculture.geo-wiki.org, and validation consisted of estimating the percentage of cropland visible within a 1 km pixel. The outline of the pixel appears directly on top of Google Earth when the validation tool embedded in agriculture.geo-wiki.org is used. Google Earth has been used to validate other land cover products [Pekkarinen et al., 2009; Bicheron et al., 2008] and is particularly appealing in remote areas of Africa where no other validation datasets are currently available. The majority of the very high resolution images were from the years 2003 to 2007. Although this is not ideal, we still consider the validation dataset sufficient to validate our cropland map. In the future, Google Earth/Google Earth Engine will make an increasing amount of very high resolution time series available, which will allow the validation datasets to be improved. The validation was undertaken by experts trained in the recognition of cropland areas. Three confidence levels were defined: 1 = very confident (very high resolution Google Earth images, 60 cm to 1 m); 2 = medium confidence (high resolution images, 2 to 10 m, or Landsat in non-complex landscapes); and 3 = least confident (30 m Landsat data). The experts were a group of young scientists who were specifically trained in the validation of land cover during the summer of 2010. The validation points were distributed between the scientists, and in some cases multiple validations were undertaken at the same point.

3. Results

The validated cropland map is provided in Figure S2. A confusion matrix was produced differentiating between cropland and non-cropland. The overall accuracy for all 2553 pixels was 82.7%; the omission errors for cropland and non-cropland were 61.8% and 8.0%, respectively, and the commission errors were 49.7% for cropland and 12.4% for non-cropland. When confidence level 3 was excluded, the number of validated pixels fell to 2125, but the results show a robust pattern, i.e. an overall accuracy of 83.1%, omission errors of 62.5% (cropland) and 7.6% (non-cropland), and commission errors of 49.8% (cropland) and 12.1% (non-cropland). Finally, when only those pixels validated with very high confidence are included, the overall accuracy increases to 85.7%, although the sample size decreases quite substantially to 1889 pixels.
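For readers less familiar with these measures, the sketch below shows how overall accuracy, omission error and commission error follow from a two-class confusion matrix, using the standard definitions. The counts are hypothetical and chosen only to make the arithmetic easy to follow. They are not the confusion matrix of this study, and the function name is likewise illustrative.

```python
def accuracy_metrics(tp, fn, fp, tn):
    """Metrics for a cropland / non-cropland confusion matrix.

    tp: reference cropland mapped as cropland
    fn: reference cropland mapped as non-cropland
    fp: reference non-cropland mapped as cropland
    tn: reference non-cropland mapped as non-cropland
    """
    total = tp + fn + fp + tn
    return {
        "overall_accuracy": (tp + tn) / total,
        # Omission: share of reference pixels of a class that the map missed.
        "omission_cropland": fn / (tp + fn),
        "omission_non_cropland": fp / (fp + tn),
        # Commission: share of mapped pixels of a class that are wrong.
        "commission_cropland": fp / (tp + fp),
        "commission_non_cropland": fn / (fn + tn),
    }


# Hypothetical counts for 1000 validation pixels (not the study's data).
metrics = accuracy_metrics(tp=40, fn=60, fp=40, tn=860)
```

With these made-up counts the overall accuracy is 90%, while the cropland omission error is 60% and the cropland commission error is 50%. When non-cropland dominates the sample, a high overall accuracy can therefore coexist with large errors for the cropland class.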

High accuracies are achieved in eastern Africa (probably due to the inclusion of AfriCover), whereas the accuracy is lower in western Africa (Burkina Faso, Ivory Coast, and Ghana). A case study in South Africa shows the 4 different cropland input maps, the probability layer, and the resulting calibrated synergy map (Figure S3).

4. Summary and Discussion

We synergistically combined existing cropland data sets to improve estimates of cropland in sub-Saharan Africa, reducing local discrepancies with national and sub-national crop statistics. The resulting cropland map has an accuracy of 83–86%. The methodology is flexible in that it can easily incorporate new products as they become available, in particular new Landsat-based products that will be developed over Africa (e.g., AfriCover Senegal). An accuracy of 83% represents an improvement over both the original input products and an independent dataset (the M3-Cropland layer produced by Ramankutty et al. [2008]). If the same error matrix is applied to the M3-Cropland layer, the resulting accuracy is 69% (although the pixel size is larger and the two are hence not directly comparable). However, the accuracy assessment also shows that there is still room for improvement. The method described here depends highly upon the accuracy of the input layers as well as the reliability of the sub-national statistics. As more high quality data become available in the near future, this product is expected to improve further.

We expect that the improved cropland extent map will also be incorporated into models in order to decrease the current uncertainty which is associated with cropland extent, and that this will lead to an overall improvement in predictions applied in the field of global vegetation modeling, global land use modeling, climate change modeling and earth systems modeling in general.

Acknowledgments

We would like to thank Markus Tum, Luke Burns and Adriana Gomez for carrying out the validation. This research was supported by the European Community's Framework Programme (FP7) via EuroGEOSS (226487).
