Mapping forest canopy height globally with spaceborne lidar



[1] Data from spaceborne light detection and ranging (lidar) open the possibility of mapping forest vertical structure globally. We present a wall-to-wall global map of canopy height at 1-km spatial resolution, using 2005 data from the Geoscience Laser Altimeter System (GLAS) aboard ICESat (Ice, Cloud, and land Elevation Satellite). A challenge in using GLAS data for global vegetation studies is the sparse coverage of lidar shots (mean = 121 data points/degree² for the L3C campaign). However, GLAS-derived canopy height (RH100) values were highly correlated with other, more spatially dense ancillary variables available globally, which allowed us to model global RH100 from forest type, tree cover, elevation, and climatology maps. The difference between model-predicted RH100 and footprint-level lidar-derived RH100 values showed that error increased in closed broadleaved forests such as the Amazon, underscoring the challenge of mapping tall (>40 m) canopies. The resulting map was validated with field measurements from 66 FLUXNET sites. The error of the modeled RH100 relative to in situ canopy height (RMSE = 6.1 m, R² = 0.5; or RMSE = 4.4 m, R² = 0.7 without 7 outliers) is conservative, as it also includes measurement uncertainty and sub-pixel variability within the 1-km pixels. Our results were compared against a recently published canopy height map. We found our values to be generally taller and more strongly correlated with FLUXNET data. Our map reveals a global latitudinal gradient in canopy height, increasing towards the equator, as well as coarse forest disturbance patterns.

1. Introduction

[2] Forest vertical structure remains poorly characterized despite being a predictor of aboveground live biomass [Lefsky et al., 2002; Drake et al., 2002; Anderson et al., 2006], primary productivity [Thomas et al., 2008], and biodiversity [Goetz et al., 2007]. Here, we model forest vertical structure using data from the Geoscience Laser Altimeter System (GLAS) aboard ICESat (Ice, Cloud, and land Elevation Satellite). The altimeter transmitted a 1064 nm light pulse and recorded the reflected signal (waveform) [Zwally et al., 2002]. We use these data to construct a global wall-to-wall map of forest canopy height.

[3] ICESat/GLAS acquired data globally between 2003 and 2009. However, lidar shots provide only incomplete coverage of the Earth. GLAS footprints were approximately 65 m in diameter, spaced 170 m apart along track and several tens of kilometers apart across track, a distance that increased toward the tropics. Compounding this data dearth, clouds often obstructed coverage. Thus, producing a wall-to-wall map requires exploiting the relationship between footprint-level lidar-derived canopy height estimates and spatially continuous ancillary variables, such as data from the Moderate Resolution Imaging Spectroradiometer (MODIS).

[4] There are numerous approaches to associating the sparse lidar footprints with spatially continuous ancillary variables. One option is to segment or classify the study site into patches that share a meaningful ecological parameter (e.g., age, species composition), such that lidar measurements can be scaled up to the patch level [e.g., Lefsky et al., 2005a; Boudreau et al., 2008]. A recent global canopy height map [Lefsky, 2010] was produced by segmenting MODIS reflectance data to delineate forest patches. We argue that those results can be difficult to interpret at coarse resolution because (i) MODIS images were segmented based on spectral/textural heterogeneity thresholds that do not readily translate into forest stand properties, and (ii) model error can be attributed to both sub-pixel (500-m MODIS) and sub-patch (1–900 MODIS pixels) variability. As an alternative, we associate selected GLAS shots directly with 1-km pixels: in the absence of high-resolution forest disturbance/age maps to serve as segments, we believe this pixel-based approach deserves evaluation.

[5] We produced a global wall-to-wall canopy height map by combining GLAS RH100 estimates with global ancillary variables. This paper also extends previous work [Lefsky, 2010] by validating results against field measurements and by considering the impact of sub-pixel variability on model accuracy. In mapping canopy height globally, we are interested in characterizing fine-scale variability (attributable to disturbance) against a backdrop of more coarsely varying edaphic and climatic gradients. With that goal in mind, we selected globally available climate, elevation, and vegetation cover layers. These include the MOD44B percent tree cover product [Hansen et al., 2003] from MODIS, elevation from the Shuttle Radar Topography Mission (SRTM) [Farr et al., 2007], and climatology maps from the Tropical Rainfall Measuring Mission (TRMM) [Kummerow et al., 1998] and the Worldclim database [Hijmans et al., 2005]. All ancillary variables were resampled to 1 km, which sets the resolution of the output wall-to-wall map.

[6] In building a systematic algorithm, our first step was to develop an objective procedure to select waveforms, correct slope-induced distortions in RH100 [Lefsky et al., 2005b], and calibrate the RH100 canopy height estimates with field measurements. Second, we used a regression tree approach to model the lidar points from the available ancillary variables. Finally, model predictions were independently validated against field estimates from 66 FLUXNET sites distributed globally and covering a broad range of forest types.

[7] Our error analysis accounted for the disparity in scale between lidar footprints and the resolution of the ancillary variables. In our model, lidar shots effectively represented samples of the underlying forest structure and did not always intersect the tallest tree within a 1-km pixel. Conversely, ancillary variables might not reflect the environmental conditions at the lidar footprint itself, but rather an average over 1 km². Due to the scarcity of fine-resolution global vegetation maps, we examined the impact of sub-pixel variability on model error indirectly, using two surrogates of forest heterogeneity: degree of disturbance and forest type. First, we hypothesize that our estimates at protected sites (as defined by the UN World Database on Protected Areas) have less error because they have less structural variability associated with anthropogenic disturbance. Second, we hypothesize that, across forest types, model error increases with the variance in canopy height.

[8] The resulting wall-to-wall map can be downloaded from the web and reveals regional canopy height gradients as well as coarse disturbance patterns. Our results were compared against another canopy height product [Lefsky, 2010], and differences were examined in light of the choice of calibration algorithm, ancillary variables, and modeling procedures.

2. Methods

2.1. Canopy Height Estimation

[9] Our analyses were based on the GLA14 land product, version 31. Each GLA14 waveform is a fit of the original GLAS waveform, modeled by up to 6 Gaussian distributions [Brenner et al., 2003]. The GLAS-derived estimate of canopy height is the waveform metric RH100, defined as the distance between the signal beginning and the location of the lidar ground peak [Harding and Carabajal, 2005; Sun et al., 2008; Boudreau et al., 2008]. In GLA14 version 31, the signal beginning is defined as the location at which the signal exceeds 3.5 times the noise standard deviation. The Gaussian distributions are constrained to lie between the signal beginning and end. Generally, the ground can be identified as the last Gaussian peak, which works best in flat areas and open canopies. Within closed canopies, locating the ground is sometimes difficult (e.g., when the last peak has low amplitude relative to a neighboring peak) [Boudreau et al., 2008]. It has been shown that a regression on waveform extent and a terrain index derived from an ancillary DEM can alleviate ground detection issues and improve canopy height estimates [Lefsky et al., 2005b; Rosette et al., 2008]. However, such regressions may be site-specific and may introduce significant biases. On the other hand, Rosette et al. [2008] obtained reasonable results using the location of the last Gaussian peak as the ground level and, importantly, found that it had the lowest mean error (0.39 m). Since the regression tree methodology essentially represents overall trends, it is important to minimize potential bias in the GLAS estimate of canopy height.
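The RH100 definition above (signal beginning to last Gaussian peak) can be sketched in Python as follows. This is a minimal illustration, not the GLA14 processing code: the `Gaussian` record, the 0.15 m bin size (1 ns sampling), and the threshold form `noise_mean + 3.5 * noise_sd` are assumptions made for the example. Bin indices increase with return time, so the Gaussian with the largest center index is taken as the ground.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gaussian:
    # Hypothetical record for one fitted Gaussian; not the GLA14 field layout.
    center_bin: float  # larger index = later return = lower elevation
    amplitude: float
    sigma_bins: float


def signal_begin(waveform: List[float], noise_mean: float, noise_sd: float,
                 k: float = 3.5) -> Optional[int]:
    """Index of the first bin exceeding noise_mean + k * noise_sd (assumed threshold form)."""
    threshold = noise_mean + k * noise_sd
    for i, value in enumerate(waveform):
        if value > threshold:
            return i
    return None


def rh100(waveform: List[float], gaussians: List[Gaussian], noise_mean: float,
          noise_sd: float, bin_size_m: float = 0.15, k: float = 3.5) -> Optional[float]:
    """Distance (m) from the signal beginning to the last Gaussian peak (taken as ground)."""
    start = signal_begin(waveform, noise_mean, noise_sd, k)
    if start is None or not gaussians:
        return None
    ground_bin = max(g.center_bin for g in gaussians)  # last peak in time
    return (ground_bin - start) * bin_size_m
```

For a waveform whose signal starts at bin 50 and whose last fitted peak sits at bin 150, this returns about 15 m under the assumed bin size.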

[10] After the systematic selection process described in section 2.2, the RH100 values were rounded to the nearest meter and used as input in the regression tree to produce a wall-to-wall canopy height map (see section 2.3). If more than one GLAS shot intersected a 1-km pixel, all points were used. Instead of locally combining multiple GLAS shots, model averaging is performed as the last step of the regression tree approach (see section 2.3).

2.2. Waveform Selection

[11] We selected the data acquired with laser L3C between 2005-05-20 and 2005-06-23. This campaign was chosen due to its temporal overlap with the 2005 MODIS Percent Tree Cover product (MOD44B).

[12] The overall goal of the waveform selection procedure was to isolate data points from forested sites while reducing the impact of slope and cloud contamination on canopy height estimates. We selected GLAS shots that fell within a forest class as defined by the Globcover map [Hagolle et al., 2005]. Because the 65 m GLAS footprint samples only a fraction of a 1 km² land cover pixel, not all waveforms are from forest canopies. Such cases were problematic given our objective of modeling the tallest canopies rather than gaps or forest edges. To ensure that shots were reflected from a forest canopy, we selected waveforms with more than one Gaussian peak, assuming that waveforms with a single peak result from ground reflection only.

[13] GLA14 waveforms were also filtered using engineering and signal parameters to account for cloud cover and terrain slope. The GLA14 product contains a cloud detection flag that we found to be too stringent. Instead, we computed the Signal-to-Noise Ratio (SNR) to detect waveforms hindered by clouds, as well as waveforms affected by other unidentified measurement and system issues. To further remove waveforms dominated by cloud returns, we retained only waveforms located close to the ground elevation given by SRTM (i.e., within 80 m, to account for forest height and SRTM elevation errors).
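The selection rules in the two paragraphs above can be expressed as a single predicate. This is a minimal sketch: the `Shot` fields are hypothetical names rather than GLA14 variables, and the default SNR cutoff of 50 is an assumption inferred from the Figure 1 discussion in the Results, not a value stated as the chosen threshold.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Shot:
    # Hypothetical per-shot record; field names are illustrative only.
    in_forest_class: bool  # footprint falls in a Globcover forest class
    n_peaks: int           # number of fitted Gaussian peaks
    snr: float             # signal-to-noise ratio computed from the waveform
    glas_ground_m: float   # GLAS ground elevation (m)
    srtm_elev_m: float     # SRTM elevation at the footprint (m)


def keep_shot(s: Shot, snr_min: float = 50.0, max_dz_m: float = 80.0) -> bool:
    """Apply the forest-class, multi-peak, SNR, and GLAS-SRTM proximity tests."""
    return (s.in_forest_class
            and s.n_peaks > 1          # a single peak is treated as ground only
            and s.snr >= snr_min       # low SNR: clouds or system issues
            and abs(s.glas_ground_m - s.srtm_elev_m) <= max_dz_m)  # else likely a cloud return


def select_shots(shots: Iterable[Shot], **thresholds: float) -> List[Shot]:
    """Keep only the shots that pass every test."""
    return [s for s in shots if keep_shot(s, **thresholds)]
```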

[14] Terrain slope is the main factor contributing to canopy height estimation error [Lefsky et al., 2005b; Duncanson et al., 2010]. Slope broadens the lidar waveform, thereby introducing a bias in canopy height estimates. For a large-footprint lidar such as GLAS, pulse broadening can be significant. Assuming homogeneous canopy height within the footprint, the maximum broadening of the waveform extent can be calculated as the distance between the lowest ground point and the top of the highest tree within the footprint. This implies a potential bias of Df × tan(slope) [Chen, 2010], where Df is the footprint diameter. For example, laser L3, with a ∼65 m footprint, may exhibit an 11.5 m canopy height estimation bias over a 10° slope.
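Worked through for the example above, with θ the terrain slope and D_f the footprint diameter (a check of the quoted figure under the same homogeneous-canopy assumption):

$$
\text{bias} = D_f \tan\theta = 65\,\text{m} \times \tan 10^\circ \approx 65\,\text{m} \times 0.1763 \approx 11.5\,\text{m}
$$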

[15] To minimize canopy height estimation bias due to terrain slope, we derived a slope map between 60°S and 60°N from the 90 m SRTM elevation data and used it to estimate the potential bias. Building on this simplified model, we retained only waveforms located on slopes below 5° for which the necessary bias correction was less than 25% of the measured RH100. For instance, a waveform located on a 5° hill was selected if the original RH100 was greater than 23 m. We found that this step significantly improved predictions in comparison with an earlier product version [Simard et al., 2008]. Although slope compensation should always reduce the measured canopy height (RH100), random DEM errors cause under- and overestimation of slopes (∼±5°). This generates a random compensation error (∼6 m) that is reduced through averaging within the regression tree model. In addition, a heterogeneous canopy within the GLAS footprint may be overcompensated by our simplistic correction model, which assumes homogeneous stands at the scale of a GLAS footprint.
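The slope screening and correction described above can be sketched as follows. This is a minimal illustration assuming the Df × tan(slope) bias model, a 65 m footprint, and subtraction of the full bias from RH100; the 5° limit is treated as inclusive here (an assumption) so that the 5° example in the text is reproducible.

```python
import math
from typing import Optional

FOOTPRINT_DIAMETER_M = 65.0  # approximate laser L3 footprint diameter


def slope_bias(slope_deg: float, df_m: float = FOOTPRINT_DIAMETER_M) -> float:
    """Maximum waveform broadening Df * tan(slope), assuming a homogeneous canopy."""
    return df_m * math.tan(math.radians(slope_deg))


def slope_corrected_rh100(rh100_m: float, slope_deg: float,
                          max_slope_deg: float = 5.0,
                          max_fraction: float = 0.25) -> Optional[float]:
    """Return slope-corrected RH100, or None if the shot should be discarded."""
    if slope_deg > max_slope_deg:
        return None  # terrain too steep
    bias = slope_bias(slope_deg)
    if bias >= max_fraction * rh100_m:
        return None  # correction would reach 25% of the measured height
    return rh100_m - bias  # compensation always lowers RH100
```

With these defaults, a shot on a 5° slope (bias ≈ 5.69 m) survives only if RH100 exceeds about 22.7 m, consistent with the 23 m figure quoted above.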

2.3. Producing a Global Wall-to-Wall Forest Canopy Height Map

[16] We employed the regression tree method Random Forest (RF) [Breiman, 2001] to model RH100 values from global ancillary variables and to estimate canopy height in areas not covered by GLAS waveforms. RF is a non-parametric machine learning method that has been used successfully to map vegetation flood state [Whitcomb et al., 2009], biomass [Baccini et al., 2008; Powell et al., 2010], and species distribution [Prasad et al., 2006]. One main advantage is that multiple predictor variables can be incorporated without assumptions about their statistical distribution or covariance structure. Each tree recursively splits the response variable (RH100) into two groups; at each split, the model chooses the predictor variable that minimizes the within-group variance in RH100. The list of splitting rules is stored in a “tree” object. RF has been shown to outperform other machine learning algorithms owing to its “bagging” strategy [Prasad et al., 2006]. Essentially, before growing each tree, RF selects a user-defined number of input points and subsamples the predictor variables. This procedure is expected to decrease the correlation among individual “trees,” which improves model accuracy [Breiman, 2001]. After the model is run, a single prediction is obtained by averaging the predictions from all “trees.”
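The ingredients described above (bootstrap sampling, random predictor subsets, variance-minimizing binary splits, and averaging across trees) can be illustrated with a small pure-Python sketch. It is not the C++/R package used in this study: `TinyForest` and its parameters are hypothetical, and this sketch redraws the predictor subset at every split, as in Breiman's [2001] formulation, whereas the text above frames the subsampling per tree.

```python
import random
from statistics import fmean
from typing import List, Optional, Sequence, Tuple, Union

# A node is either a leaf value (float) or (feature, threshold, left, right).
Node = Union[float, Tuple[int, float, "Node", "Node"]]


def _sse(ys: List[float]) -> float:
    """Within-group sum of squared deviations (n times the variance)."""
    if not ys:
        return 0.0
    m = fmean(ys)
    return sum((v - m) ** 2 for v in ys)


def _best_split(X, y, idx, n_feat, min_leaf, rng) -> Optional[Tuple[float, int, float]]:
    """Search a random subset of predictors for the split minimizing within-group SSE."""
    best = None
    for f in rng.sample(range(len(X[0])), n_feat):
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in idx if X[i][f] <= t]
            right = [y[i] for i in idx if X[i][f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            s = _sse(left) + _sse(right)
            if best is None or s < best[0]:
                best = (s, f, t)
    return best


def _grow(X, y, idx, depth, max_depth, min_leaf, n_feat, rng) -> Node:
    ys = [y[i] for i in idx]
    if depth >= max_depth or _sse(ys) == 0.0:
        return fmean(ys)  # leaf: mean RH100 of the node
    split = _best_split(X, y, idx, n_feat, min_leaf, rng)
    if split is None:
        return fmean(ys)
    _, f, t = split
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    return (f, t,
            _grow(X, y, left, depth + 1, max_depth, min_leaf, n_feat, rng),
            _grow(X, y, right, depth + 1, max_depth, min_leaf, n_feat, rng))


def _predict_tree(node: Node, x: Sequence[float]) -> float:
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class TinyForest:
    """Bagged regression trees with random predictor subsets (illustrative only)."""

    def __init__(self, n_trees=50, n_feat=1, max_depth=6, min_leaf=2, seed=0):
        self.n_trees, self.n_feat = n_trees, n_feat
        self.max_depth, self.min_leaf = max_depth, min_leaf
        self.rng = random.Random(seed)
        self.trees: List[Node] = []

    def fit(self, X, y):
        n = len(X)
        self.trees = []
        for _ in range(self.n_trees):
            boot = [self.rng.randrange(n) for _ in range(n)]  # bootstrap, with replacement
            self.trees.append(_grow(X, y, boot, 0, self.max_depth,
                                    self.min_leaf, self.n_feat, self.rng))
        return self

    def predict(self, x):
        # Single prediction: the average over all trees.
        return fmean(_predict_tree(t, x) for t in self.trees)
```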

[17] We included 7 ancillary variables in our regression tree model, primarily corresponding to climate and vegetation characteristics (Table 1). Some datasets were combined to achieve global coverage, as follows. The 3B43_V6 accumulated precipitation product from the Tropical Rainfall Measuring Mission (TRMM) covers latitudes between 50°S and 50°N; at higher latitudes, precipitation estimates were obtained from the Worldclim product (Table 1). Similarly, the SRTM elevation product is restricted to latitudes between 60°S and 60°N, and coverage was extended using the GTOPO product. We found pixels above 70°N that were identified as forest by the Globcover product but were not covered by the MOD44B map. For these pixels, we set percent tree cover to the average percent tree cover of the corresponding Globcover class, calculated over latitudes 68–70°N. All maps were resampled to 1 km using the majority rule (for categorical variables) or bilinear interpolation (for continuous variables).
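The gap-filling rules above can be sketched per pixel as follows. This is a minimal illustration: the function names are hypothetical, missing values are represented by `None`, and the integer class code in the usage check is only an example label.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def precipitation(lat: float, trmm: Optional[float], worldclim: float) -> float:
    """Use TRMM between 50S and 50N where available, Worldclim elsewhere."""
    if abs(lat) <= 50.0 and trmm is not None:
        return trmm
    return worldclim


def elevation(lat: float, srtm_m: Optional[float], gtopo_m: float) -> float:
    """Use SRTM between 60S and 60N where available, GTOPO elsewhere."""
    if abs(lat) <= 60.0 and srtm_m is not None:
        return srtm_m
    return gtopo_m


def class_mean_cover(samples: Iterable[Tuple[float, int, float]]) -> Dict[int, float]:
    """Mean MOD44B tree cover per Globcover class over 68-70N.

    samples: (latitude, globcover_class, percent_tree_cover) tuples.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for lat, cls, cover in samples:
        if 68.0 <= lat <= 70.0:
            sums[cls] += cover
            counts[cls] += 1
    return {cls: sums[cls] / counts[cls] for cls in sums}


def tree_cover(lat: float, mod44b: Optional[float], globcover_class: int,
               means: Dict[int, float]) -> Optional[float]:
    """MOD44B value if present; above 70N, fall back to the class mean."""
    if mod44b is not None:
        return mod44b
    if lat > 70.0:
        return means.get(globcover_class)
    return None
```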

Table 1. Ancillary Variables Used in the Regression Tree Approach to Model Canopy Height (RH100) Estimates From GLAS
| Variable | Source | Date | Resolution | Reference |
| --- | --- | --- | --- | --- |
| Annual Mean Precipitation (mm) | Worldclim | 1950–2000 | 0.00833° | Hijmans et al. [2005] |
|  | TRMM | 2001–2008 | 0.25° | Kummerow et al. [1998] |
| Precipitation Seasonality (100 × SD (mm) / mean (mm)) | Worldclim | 1950–2000 | 0.00833° | Hijmans et al. [2005] |
|  | TRMM | 2001–2008 | 0.25° | Kummerow et al. [1998] |
| Annual Mean Temperature (°C × 10) | Worldclim | 1950–2000 | 1 km | Hijmans et al. [2005] |
| Temperature Seasonality (100 × SD (°C)) | Worldclim | 1950–2000 | 1 km | Hijmans et al. [2005] |
| Elevation (m) | SRTM + GTOPO | 2000 | 1 km | U.S. Geological Survey [2006] |
| Tree Cover (%) | MOD44B | 2005 | 500 m | Hansen et al. [2003] |
| Protection Status (7 classes) | UN World Database on Protected Areas | 2010 | Vector |  |

[18] The RF package used here was implemented in C++ with a wrapper in the R language [R Development Core Team, 2011] and includes both regression and classification functions. Here, we employ regression trees to handle a continuous, ordinal response variable (RH100). We generated one “RF forest” per Globcover class [Hagolle et al., 2005]. We considered 12 classes, including forest/cropland mosaics but excluding shrublands. Each “RF forest” contained 500 “RF trees,” and each “RF tree” was grown from a subset of 25,000 randomly selected points and 4 predictor variables. Sampling was performed with replacement; that is, the input data of different “RF trees” partly overlap. The per-class results were mosaicked to produce the wall-to-wall map.
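The per-class stratification and mosaicking step can be sketched as below. The `train` callable stands in for fitting one 500-tree RF per class; `mean_trainer` is a hypothetical placeholder that simply predicts the class mean, so the example stays self-contained. Pixels whose class has no model (for instance, an excluded shrubland class) remain unmapped (`None`).

```python
from collections import defaultdict
from statistics import fmean
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple

Features = Sequence[float]
Predictor = Callable[[Features], float]
Trainer = Callable[[List[Features], List[float]], Predictor]


def fit_per_class(points: Iterable[Tuple[int, Features, float]],
                  train: Trainer) -> Dict[int, Predictor]:
    """Train one model per land cover class from (class, features, RH100) tuples."""
    X_by: Dict[int, List[Features]] = defaultdict(list)
    y_by: Dict[int, List[float]] = defaultdict(list)
    for cls, x, rh in points:
        X_by[cls].append(x)
        y_by[cls].append(rh)
    return {cls: train(X_by[cls], y_by[cls]) for cls in X_by}


def mosaic(pixels: Iterable[Tuple[int, Features]],
           models: Dict[int, Predictor]) -> List[Optional[float]]:
    """Predict each pixel with its class's model; unmodeled classes stay None."""
    return [models[cls](x) if cls in models else None for cls, x in pixels]


def mean_trainer(X: List[Features], y: List[float]) -> Predictor:
    """Placeholder trainer (class mean); a real run would fit an RF here."""
    m = fmean(y)
    return lambda x: m
```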

[19] An initial modeling attempt included slope maps derived from SRTM, but slope introduced artifacts in the results and was removed from the list of ancillary variables. Including Normalized Difference Vegetation Index (NDVI) maps from MODIS did not improve model accuracy. Furthermore, the model averaging employed by Random Forest creates a trade-off between overall model accuracy and the ability to predict extreme values. We tested two modifications to address this issue: (1) taking the maximum prediction across all 500 trees instead of the average prediction; and (2) a stratified sampling strategy ensuring that all height classes were equally represented in the training data. Both strategies led to a significant decrease in model accuracy. We therefore retained the model in its original form, which also facilitates reproducibility.

2.4. Field Calibration/Validation

[20] We present two forms of field validation: one for the individual GLAS estimates of canopy height within the 65 m footprints, and the other for the modeled 1-km pixels of the wall-to-wall map. For clarity, we refer to the former as the calibration procedure, since its purpose is to ensure that the footprint-level RH100 measurements correspond to the top canopy height. The comparison of field data with the resulting canopy height map is referred to as the validation procedure.

[21] Calibration of the RH100 derived from GLAS waveforms was based on a set of 98 co-located field measurements collected in tropical ecosystems in Uganda, Africa, during September 2008. Each field sample covered an area of 1600 m² centered on a GLAS shot. The heights of the three tallest trees within the plot were measured using a clinometer. To minimize the effect of slope [Lefsky et al., 2005b; Yang et al., 2011] in the field data, we considered only GLAS shots on terrain with a slope smaller than 10% (also measured with a clinometer). To minimize the effect of roughness, we limited the area in which tree height was collected, measuring only the 3 tallest trees within a radius of 20 m. By doing so, the measurements are derived from the footprint region with the highest gain [Lee et al., 2011]. The height of the tallest tree and the average height of the three tallest trees were compared to the co-located footprint-level estimates of RH100.

[22] The Uganda dataset (above), as well as canopy height data from the FLUXNET La Thuile database [Baldocchi, 2008; Baldocchi et al., 2001], were used to validate the 1-km wall-to-wall map and assess model error/uncertainty. The FLUXNET database was selected for three primary reasons: (i) most sites are relatively homogeneous and flat over 1 km², which minimizes error due to sub-pixel variability; (ii) the sites have global coverage, include most major biome types and climates, and represent one of the largest available datasets of ecosystem measurements; and (iii) data quality is generally high due to the presence of a physical above-canopy tower at each site, as well as repeated measurements at these well-instrumented sites, which require accurate canopy height for their core flux measurements. The FLUXNET database contains 475 sites, of which 120 had tree height data relevant to the ICESat time period, and 86 had in situ data and were covered by our map. Finally, 66 of the FLUXNET sites covered by our map had vegetation known to be taller than 5 m. These represent 9 vegetation classes, primarily closed (>40%) broadleaved deciduous forest (16 sites) and closed (>40%) needleleaved evergreen forest (15 sites). Some error is introduced because the longitude/latitude of each site was given for the tower location, whereas the canopy (i.e., footprint) that the tower instruments measure is often adjacent to the tower rather than symmetrically surrounding it. The accuracy and method of the in situ canopy height measurements were generally unreported and non-uniform; from our experience working at some of these sites and taking such measurements, in situ uncertainty typically increases linearly with height, from negligible to up to 10% for the tallest (>30 m) trees. Methods of measurement ranged from ground-based or airborne lidar, to laser rangefinder/clinometer, to climbing the tower or tree and measuring the distance to the ground with a tape measure (or directly holding up a tape measure to shorter vegetation). At seven sites, land cover heterogeneity surrounding the site was significant and led to severe pixel contamination. We present results both with and without those seven “outliers.”

2.5. Error Analysis

[23] The main objective of the error analysis was to assess the influence of terrain and sub-pixel variability on model accuracy: how consistently accurate are predictions in disturbed vs. pristine sites, in tall vs. short forest types, and in flat vs. steep terrain? For this task, we compared model predictions against the RH100 data points used to train the regression tree. The root mean square error (RMSE) was calculated for each forest type as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where $y_i$ is the canopy height observed from GLAS (RH100), $\hat{y}_i$ is the modeled canopy height in the wall-to-wall map, and $n$ is the number of GLAS points in the forest type.

[24] In addition, we calculated the percent error for each GLAS point as:

$$E_i = 100 \times \frac{\left|y_i - \hat{y}_i\right|}{y_i}$$

A t-test was used to examine the impact of protection status (protected or not protected) on percent error. We hypothesize that protected areas have lower error due to lower variability in canopy height within 1 km pixels.
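The error metrics above can be sketched as follows. This is a minimal illustration under two assumptions: the percent error is taken as the absolute difference relative to the GLAS value, and the t-test is taken to be Welch's unequal-variance form (the text does not state which variant was used). Only the t statistic is computed; with millions of points the p-value is effectively zero for large |t|.

```python
import math
from statistics import fmean, variance
from typing import List, Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error between GLAS RH100 (obs) and mapped height (pred)."""
    return math.sqrt(fmean((o - p) ** 2 for o, p in zip(obs, pred)))


def percent_error(obs: Sequence[float], pred: Sequence[float]) -> List[float]:
    """Absolute error as a percentage of the observed RH100, per point."""
    return [100.0 * abs(o - p) / o for o, p in zip(obs, pred)]


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's two-sample t statistic (unequal variances assumed)."""
    se2 = variance(a) / len(a) + variance(b) / len(b)
    return (fmean(a) - fmean(b)) / math.sqrt(se2)
```

In this setting, `a` and `b` would be the per-point percent errors for unprotected and protected pixels, respectively.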

2.6. Comparison With the Existing Global Canopy Height Product

[25] We compare our wall-to-wall canopy height map with the recent product of Lefsky [2010], which is also based on GLAS data. Although both maps aim to characterize canopy height globally, they differ substantially in canopy height metric and modeling strategy. We calculate and discuss the differences between the two maps in light of their different calibration procedures, ancillary variables, and modeling approaches. We also compare individual pixels from Lefsky [2010] to the FLUXNET validation dataset.

3. Results

3.1. Waveform Filtering and Slope Correction

[26] We first selected waveforms with multiple peaks that belonged to a Globcover forest class. We found that the GLA14 cloud detection flag removed lidar waveforms that could still be used for canopy height estimation. In contrast, a minimum acceptable SNR value was easily identified. Figure 1 shows that SNR is a good indicator of measurement quality: the large number of very tall canopy heights (>50 m) at SNR below 50 is not realistic. Our choice of a minimum acceptable SNR value is further supported by the difference in terrain elevation estimates between GLAS and SRTM data (Figure 2). For differences greater than a few tens of meters, the GLAS returns likely originated from clouds, which explains the low SNR values encountered. Finally, only waveforms on terrain slopes less than 5° and requiring a slope correction of less than 25% of the uncorrected RH100 were selected, to minimize the impact of terrain slope on the regression.

Figure 1.

Signal-to-Noise Ratio (SNR) versus RH100 estimates from GLAS points.

Figure 2.

Difference between GLAS ground estimates and SRTM elevation (m). Very large values (>1000 m) generally had low Signal-to-Noise Ratio. These artifacts are likely due to cloud contamination.

[27] The waveform selection procedure yielded a clean set of 2.5 million data points from the initial 39 million land shots of campaign L3C (6.4%). The resulting clean GLAS dataset is globally distributed, with denser sampling in flat, forested areas such as the temperate and boreal forests, the Amazon, and the Congo Basin (Figure 3).

Figure 3.

Density of GLAS shots (per degree²) used as input in the regression tree procedure.

3.2. Global Canopy Height Gradients

[28] Canopy height estimates from the filtered GLAS shots revealed two main patterns. First, RH100 values increase with MODIS tree cover estimates (Figure 4). Second, canopy height generally decreases with latitude and elevation, except for a peak around 40°S (Figure 5). Tall forest stands (in Victoria, Tasmania, and New Zealand) contributed disproportionately to this latitudinal band, whereas in the Northern Hemisphere canopy height values were averaged across a broader range of forest types.

Figure 4.

Relationship between percent tree cover estimated from MODIS Vegetation Continuous Fields and GLAS-derived canopy height (RH100). The dashed vertical lines indicate the range in GLAS RH100, the horizontal line within the boxes corresponds to the median canopy height and the boxes contain 50% of the points.

Figure 5.

Mean GLAS-derived canopy height (RH100) as a function of latitude.

3.3. Wall-to-Wall Canopy Height Map

[29] The wall-to-wall canopy height map (Figure 6) is available for download. In the Amazon basin, the map shows differences in canopy height as a function of distance from rivers, as well as edge effects associated with road construction and along the Amazonian arc of deforestation (Figure 6, inset). The distribution of canopy height values was generally narrower in the wall-to-wall map than in the original GLAS points (Table 2), but was likewise right-skewed (Figure 7).

Figure 6.

Wall-to-wall map produced by modeling GLAS points with a regression tree approach. The inset shows a disturbance gradient in the Amazon (color scale was adjusted to increase contrast).

Figure 7.

Comparison of the distribution of canopy height values from GLAS points and the wall-to-wall map.

Table 2. RH100 Mean and Standard Deviation (m) for Input GLAS Points and the Wall-to-Wall Canopy Height Map
|  | Mean (m) | Standard Deviation (m) |
| --- | --- | --- |
| Map values | 16.9 | 8.0 |
| GLAS values | 18.1 | 11.5 |

3.4. Ground Validation and Error Analysis

[30] Because the 98 points collected in Africa were coincident with GLAS data, we used them both to calibrate the estimates of canopy height from GLAS waveforms (i.e., local estimates of RH100) and to validate the model estimates (i.e., the final spatially continuous map). The results are shown in Figure 8: (a) the local RH100 calibration RMSE was 4.1 m (R² = 0.84), and (b) the wall-to-wall model RMSE was 6.6 m (R² = 0.64). In the latter case, the wall-to-wall map tended to underestimate the canopy height of tall forest stands (Figure 8b).

Figure 8.

Validation of GLAS and regression tree predicted top canopy height using the African dataset. (a) Field-measured top canopy height within the GLAS footprint versus the GLAS estimate of canopy height. The estimates of canopy top height from GLAS shots correspond closely to the field estimates within the footprint (RMSE = 4.1 m and R² = 0.84). (b) Regression tree model predicted height versus average field-measured canopy height. The model tends to underestimate taller canopies (RMSE = 6.6 m and R² = 0.64).

[31] For the FLUXNET validation, the RMSE, coefficient of determination (R²), and slope of the linear regression were calculated to assess the deviation between the in situ measurements and the wall-to-wall map. Figure 9 shows that our model has an RMSE of 6.1 m. Removing the 7 outliers with errors larger than 10 m (triangles) reduces the RMSE to 4.4 m, with a significant R² of 0.69.
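The three validation statistics and the outlier rule above can be sketched as follows. This is a minimal illustration assuming an ordinary least-squares regression of mapped height on field height (the regression direction is not stated in the text) and R² computed as the squared Pearson correlation, which equals the coefficient of determination for a simple linear fit.

```python
import math
from statistics import fmean
from typing import List, Sequence, Tuple


def validation_stats(field: Sequence[float],
                     model: Sequence[float]) -> Tuple[float, float, float]:
    """RMSE, R^2, and regression slope of model height on field height."""
    mx, my = fmean(field), fmean(model)
    sxx = sum((x - mx) ** 2 for x in field)
    syy = sum((y - my) ** 2 for y in model)
    sxy = sum((x - mx) * (y - my) for x, y in zip(field, model))
    rmse = math.sqrt(fmean((y - x) ** 2 for x, y in zip(field, model)))
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    slope = sxy / sxx            # ordinary least squares, model ~ field
    return rmse, r2, slope


def drop_outliers(field: Sequence[float], model: Sequence[float],
                  max_abs_err_m: float = 10.0) -> Tuple[List[float], List[float]]:
    """Remove sites whose absolute model error exceeds max_abs_err_m."""
    kept = [(x, y) for x, y in zip(field, model) if abs(y - x) <= max_abs_err_m]
    return [x for x, _ in kept], [y for _, y in kept]
```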

Figure 9.

Comparison of model-derived heights and field-measured heights from the FLUXNET dataset.

[32] Model accuracy was also examined by looking at the difference between the wall-to-wall map and the footprint-level GLAS estimates of canopy height. Figure 10 shows the geographical distribution of model RMSE with respect to the footprint-level GLAS RH100 estimates within 1° cells. Mean percent model error was lower in protected sites (26%, compared with 32% in non-protected sites), and this difference was statistically significant at the 95% confidence level (t = 48.98, P < 0.001). Furthermore, RMSE differed among forest cover classes: model accuracy is lower in tall, closed broadleaved forests (a class that includes the Amazon) and in mosaic land cover classes (Table 3).
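The per-cell aggregation behind Figure 10 can be sketched as a binning of squared errors into 1° cells. This is a minimal illustration; the tuple layout and the floor-based cell indexing are assumptions for the example.

```python
import math
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List, Tuple


def rmse_by_cell(points: Iterable[Tuple[float, float, float, float]],
                 cell_deg: float = 1.0) -> Dict[Tuple[int, int], float]:
    """Per-cell RMSE from (lat, lon, glas_rh100, map_rh100) tuples.

    Cells are indexed by floor(lat / cell_deg), floor(lon / cell_deg).
    """
    squared: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for lat, lon, obs, pred in points:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        squared[key].append((obs - pred) ** 2)
    return {key: math.sqrt(fmean(v)) for key, v in squared.items()}
```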

Figure 10.

Model RMSE with respect to RH100 estimated from GLAS shots within 1° cells.

Table 3. Mean and Standard Deviation by Vegetation Type With Model RMSE Computed From Individual GLAS Footprint
| Class Name | Mean RH100 (m) | SD RH100 (m) | RMSE (m) |
| --- | --- | --- | --- |
| Mosaic Cropland/Vegetation | 14.0 | 9.9 | 6.0 |
| Mosaic Vegetation/Cropland | 13.1 | 9.6 | 6.0 |
| Closed to Open Broadleaved Evergreen or Semi-Deciduous Forest | 29.9 | 11.7 | 7.8 |
| Closed Broadleaved Deciduous Forest | 19.0 | 8.9 | 6.4 |
| Open Broadleaved Deciduous Forest/Woodland | 13.0 | 7.0 | 4.4 |
| Closed Needleleaved Evergreen Forest | 20.3 | 9.8 | 5.7 |
| Open Needleleaved Deciduous or Evergreen Forest | 17.2 | 8.4 | 5.8 |
| Closed to Open Mixed Forest | 19.8 | 8.8 | 5.4 |
| Mosaic Forest or Shrubland/Grassland | 12.8 | 8.7 | 5.2 |
| Mosaic Grassland/Forest or Shrubland | 12.4 | 8.3 | 4.5 |
| Closed to Open Broadleaved Forest Regularly Flooded – Fresh or Brackish Water | 26.1 | 10.8 | 5.2 |
| Closed Broadleaved Forest or Shrubland Permanently Flooded – Saline or Brackish Water | 16.0 | 9.1 | 3.8 |

3.5. Comparison With Previous Global Map of Canopy Height

[33] We compared our canopy height map (Figure 6) with the Lefsky [2010] map, shown with the same color coding and geographical projection (Geographical, WGS84) in Figure 11. There are large differences in vegetation coverage that can be attributed to our inclusion of land cover classes with low forest cover densities, such as mosaic crops, open forest, and saline flooded forests. In addition, open needleleaf deciduous or evergreen forest and closed to open mixed broadleaf and needleleaf forest are only partially covered in the Lefsky [2010] map but fully covered in ours.

Figure 11.

Global forest height by Lefsky [2010] using matching color table and coordinate system (i.e. geographical, WGS84).

[34] We calculated the difference between the two products and found our canopy height estimates to be generally, though not everywhere, taller (Figure 12). This is partly explained by our modeling of top canopy height instead of Lorey's height, which is a tree-size weighted mean height. For closed to open broadleaf evergreen or semi-deciduous forest, which dominates the tropical belt, our estimates are on average 12 m (±8.9 m) taller. In the boreal zone, which is dominated by open needleleaf deciduous or evergreen forest, our canopy heights are on average 7 m (±8.3 m) taller. However, this land cover class is not fully covered in the Lefsky [2010] map, which greatly contributes to this mean difference.

Figure 12.

Canopy height difference between forest height maps from this study and Lefsky [2010].

[35] In areas where the Lefsky [2010] map shows taller canopies (Figure 12), we noticed a general spatial pattern that corresponds to regions of severe topography. However, we found no correlation between slope steepness and the magnitude of the difference between the map estimates. To examine this observation further, we instead plotted the cumulative distribution of the area where the Lefsky [2010] map is taller than ours as a function of terrain slope (Figure 13), with terrain slope computed from the SRTM elevation dataset. The plot shows that around 40% of these areas occur on terrain with slopes steeper than 5°. We can only conclude that the Lefsky [2010] map generally has taller forests in the presence of topography.
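The cumulative-distribution check above reduces to an empirical CDF of terrain slope over the pixels where the Lefsky [2010] map is taller. This is a minimal sketch; the slope values in the usage check are made-up inputs, not data from this study.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def empirical_cdf(values: Sequence[float]) -> Callable[[float], float]:
    """Return F(v), the fraction of values less than or equal to v."""
    xs = sorted(values)
    n = len(xs)
    return lambda v: bisect_right(xs, v) / n


def fraction_steeper(slopes_deg: Sequence[float], threshold_deg: float = 5.0) -> float:
    """Fraction of pixels whose terrain slope exceeds threshold_deg (equals 1 - F(threshold))."""
    return sum(1 for s in slopes_deg if s > threshold_deg) / len(slopes_deg)
```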

Figure 13.

Cumulative distribution of points where estimates from the Lefsky [2010] map are larger than the ones reported in this study as a function of terrain slope. Areas where estimates of canopy height from this study are taller than Lefsky's are not shown here.

[36] The difference between the Lefsky [2010] estimates and FLUXNET measurements should not be affected by topography, since the sites are generally flat. However, because the two products model different height metrics, we expect a systematic bias, which will affect RMSE but not r2 in the FLUXNET validation. We found poor correspondence between the Lefsky [2010] map and the FLUXNET validation data (r2 = 0.01, RMSE = 9.6 m; with 17 “outliers” removed: r2 = 0.26, RMSE = 6.1 m) (Figure 14).
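The argument in [36] that a metric-induced bias leaves r2 unchanged can be checked numerically. The sketch assumes r2 is the squared Pearson correlation between map and field values (as from a linear regression fit), not a coefficient of determination measured against the 1:1 line; under that assumption a constant offset shifts RMSE but not r2. The function names and values are hypothetical.

```python
import math


def rmse(pred, obs):
    """Root-mean-square error between predicted and observed heights."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def r_squared(pred, obs):
    """Squared Pearson correlation (assumed definition of r2)."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    vp = sum((p - mp) ** 2 for p in pred)
    vo = sum((o - mo) ** 2 for o in obs)
    return cov * cov / (vp * vo)


# Made-up field heights and map estimates (m); `biased` mimics a map of a
# systematically lower metric, e.g. a weighted mean height.
field = [10.0, 20.0, 30.0, 40.0]
mapped = [12.0, 19.0, 33.0, 41.0]
biased = [m - 5.0 for m in mapped]

print(rmse(mapped, field), rmse(biased, field))
print(r_squared(mapped, field), r_squared(biased, field))
```

Subtracting a constant moves every deviation from the mean by zero, so the correlation is untouched while each residual grows; this is why r2 is the fairer of the two statistics for comparing products that map different height metrics.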

Figure 14.

Comparison of Lefsky [2010] canopy height map versus FLUXNET dataset (same field measurements used in Figure 9).

4. Discussion

[37] The global distribution of RH100 estimates from processed GLAS waveforms revealed a latitudinal gradient in which canopy height peaks at low latitudes and in southern temperate rainforests (Figure 5). The peak around 40°S coincides with forests in SE Australia and Tasmania that harbor Eucalyptus regnans, one of the world's tallest flowering plants [Keith et al., 2009].
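A latitudinal profile such as the one described in [37] amounts to binning footprint-level RH100 values by latitude and averaging within each band. The following is a minimal sketch under assumed conditions: the 5° band width, the function name `latitude_profile`, and the sample values are all hypothetical, and real GLAS data would be read from the processed waveform products rather than Python lists.

```python
import math
from collections import defaultdict


def latitude_profile(lats, heights, band_deg=5.0):
    """Mean height per latitude band, keyed by the band's lower edge (degrees).

    Hypothetical helper; southern latitudes are negative.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, h in zip(lats, heights):
        edge = math.floor(lat / band_deg) * band_deg  # lower edge of the band
        sums[edge] += h
        counts[edge] += 1
    return {edge: sums[edge] / counts[edge] for edge in sorted(sums)}


# Made-up RH100 values (m) at a few latitudes
print(latitude_profile([-41.2, -42.0, 2.0, 3.0, 61.0], [70, 60, 40, 30, 15]))
```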

[38] Several issues complicate the comparison between our map and the existing canopy height map [Lefsky, 2010]. Our product (Figure 6) maps the top of the canopy as defined by the waveform percentile metric RH100. Lefsky [2010] mapped Lorey's height (Figure 11), which is empirically derived from waveform shape parameters and is expected to be lower than RH100. At the lidar footprint level, Lorey's height shows a robust relationship with aboveground biomass (Mg/ha) [Lefsky, 2010]. RH100, on the other hand, is a more direct measurement; it can also be related to biomass and has been used in ecosystem dynamics modeling [Thomas et al., 2008]. Beyond differences in canopy height metrics, the utility of each product will also depend on the sampling and prediction errors incurred while generating the wall-to-wall maps.
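Why Lorey's height is expected to fall below a top-of-canopy metric can be shown with field-style tree data. Lorey's height is the basal-area-weighted mean tree height, where basal area is computed from diameter at breast height (DBH); maximum tree height is used below only as a rough field analogue of RH100, which is an assumption, since RH100 is a waveform metric rather than a tree measurement. The function names and tree values are hypothetical.

```python
import math


def lorey_height(dbh_cm, height_m):
    """Basal-area-weighted mean height (Lorey's height) of a set of trees."""
    # Basal area in m^2: pi * r^2, with radius = DBH / 2 converted from cm to m
    basal = [math.pi * (d / 200.0) ** 2 for d in dbh_cm]
    return sum(g * h for g, h in zip(basal, height_m)) / sum(basal)


def top_height(height_m):
    """Tallest tree, used here as a crude field analogue of RH100."""
    return max(height_m)


# Made-up plot: three trees with DBH in cm and height in m
dbh = [10, 20, 40]
heights = [8, 15, 30]
print(lorey_height(dbh, heights), top_height(heights))
```

Because a weighted mean can never exceed the largest value being averaged, Lorey's height is bounded above by the tallest tree, which is consistent with the generally taller estimates of an RH100-based map.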

[39] In the Lefsky [2010] map, canopy height values were aggregated when GLAS shots fell in the same forest patch. In our case, predicted canopy heights were averaged during the tree regression procedure. Thus both products represent aggregated canopy height measures, but they differ in the fundamental unit and in the stage at which aggregation is performed. In addition, the land cover product used to delineate forest types improved our results: we used more inclusive forest categories from Hagolle et al. [2005] that include mosaics, so our map has broader coverage. On the other hand, the use of an ecoregion map [Olson et al., 2001] in the Lefsky [2010] product likely allowed the identification of extreme canopy height values associated with particular forest associations (for example, redwoods in the northwestern USA). The differences observed in regions with severe topography (Figure 13) indicate that the impact of topography on large-footprint lidar estimates of height remains an open issue. In our case, errors introduced by terrain slope were mostly removed by excluding GLAS shots that required significant slope correction and through the model's averaging process. However, because of this shot selection, our model assumes that, for a given forest type, canopy height is the same on steep and flat terrain if climatic conditions, elevation, and percent tree cover are the same. Any residual error due to terrain slope is included in the validation with the FLUXNET data.
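To see why sloped terrain matters for large-footprint lidar, consider that a planar slope of angle θ spans a vertical relief of roughly D·tan θ across a footprint of diameter D, and this relief broadens the waveform much as taller vegetation would. The sketch below illustrates a slope-based shot filter of that kind. It is not the selection criterion used in this study: the 65 m footprint diameter, the 5 m relief threshold, and the function names are hypothetical placeholders.

```python
import math


def slope_relief_m(slope_deg, footprint_diameter_m=65.0):
    """Vertical relief of a planar slope across a circular footprint: D * tan(slope).

    The default footprint diameter is an assumed, illustrative value.
    """
    return footprint_diameter_m * math.tan(math.radians(slope_deg))


def keep_shot(slope_deg, max_relief_m=5.0, footprint_diameter_m=65.0):
    """Keep a shot only if slope-induced relief stays below a hypothetical threshold."""
    return slope_relief_m(slope_deg, footprint_diameter_m) <= max_relief_m


# With these assumed parameters, relief is about 1.1 m at 1 degree
# and about 11.5 m at 10 degrees.
for slope in (1.0, 5.0, 10.0):
    print(slope, round(slope_relief_m(slope), 1), keep_shot(slope))
```

Filtering on relief rather than on slope alone makes the tolerance explicit in metres, directly comparable to the height errors discussed in the text.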

[40] Our product shows better correspondence with unweighted canopy height from FLUXNET sites than the Lefsky [2010] map does. This is expected, given that the Lefsky [2010] map models Lorey's height, and it underscores that the two products have different ecological meanings and potential applications (Figure 9), although more field data are needed for a more thorough comparison of the two products. We argue that our use of a constant resolution (1 km) facilitates efforts to quantify the impact of sub-pixel variability on model error. Indeed, a recent paper mapping forest carbon stocks (Mg/ha) over the tropics [Saatchi et al., 2011] uses the footprint-level Lorey's height estimates from Lefsky [2010] but models biomass over a 1-km grid. Lefsky [2010] argues for the use of patches, not pixels, as the fundamental unit for modeling lidar points. We agree that there is no single optimum resolution for sampling forests, given variation in canopy geometry, elevation, and successional stage. However, there is also a wide range of clustering algorithms that can be used to delineate forest patches. Ultimately, the question is whether these can identify meaningful ecological units such as even-aged stands [e.g., Lefsky et al., 2005a]. Alternatively, if units are small enough that their area is comparable to the lidar footprint [e.g., Kimes et al., 2006], searching for homogeneous areas may not be necessary given dense lidar coverage.
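The point in [40] that a constant 1-km grid makes sub-pixel variability measurable can be illustrated by gridding lidar shots and computing the spread of heights within each cell. This sketch assumes 30-arc-second cells (1/120°, about 0.93 km at the equator and narrower in longitude toward the poles) indexed from the northwest corner; the grid convention, the function names, and the shot values are hypothetical.

```python
import math
from collections import defaultdict

CELLS_PER_DEGREE = 120  # 30 arc-second cells, roughly 1 km at the equator (assumption)


def cell_index(lat, lon):
    """(row, col) of the grid cell containing a point, counted from 90N, 180W."""
    row = math.floor((90.0 - lat) * CELLS_PER_DEGREE)
    col = math.floor((lon + 180.0) * CELLS_PER_DEGREE)
    return row, col


def subpixel_spread(shots):
    """Group (lat, lon, height) shots by cell; return {cell: (n, mean, std)}."""
    groups = defaultdict(list)
    for lat, lon, h in shots:
        groups[cell_index(lat, lon)].append(h)
    out = {}
    for cell, hs in groups.items():
        mean = sum(hs) / len(hs)
        # Population standard deviation as a simple within-cell variability measure
        std = math.sqrt(sum((h - mean) ** 2 for h in hs) / len(hs))
        out[cell] = (len(hs), mean, std)
    return out


# Made-up shots: two fall in the same cell, one in another
shots = [(0.001, 10.001, 20.0), (0.002, 10.002, 30.0), (1.5, 10.001, 40.0)]
print(subpixel_spread(shots))
```

At the L3C mean density of 121 shots per degree², a 30-arc-second cell (1/14400 degree²) receives about 121/14400 ≈ 0.008 shots on average, so cells with several shots are rare; a statistic like this is therefore most informative when aggregated across many cells of the same forest type.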

[41] Model error against relatively homogeneous forest stands near FLUXNET towers and against field plots in Africa was ∼6.1 m (or 4.4 m without outliers) and 6.6 m, respectively. While these errors limit predictions at the local scale, the map produced here is one of the best descriptions of forest vertical structure currently available at regional and global scales. For example, regional patterns were observed in the Amazonian arc of deforestation, where shorter trees were predicted along forest edges. Our ability to capture tall (>40 m) canopies at 1-km resolution remains limited (Figures 7 and 8b). Model accuracy was mainly influenced by (i) saturation of the relationship between canopy height and the ancillary variables and (ii) sub-pixel variability. The relative contribution of these factors likely accounts for the differences in model accuracy across forest types (Table 3). In mosaic classes, lidar points are more likely to miss tall trees, and the mismatch between lidar-derived estimates and the 1-km ancillary variables is more pronounced. Tall mature tropical forests, on the other hand, present a challenge because tree cover saturates as canopy height increases (Figure 4) and because of their broader range of tree heights (Table 3 and Figure 8b), which means lidar samples might not always intersect emergent trees. The geographical distribution of model RMSE relative to GLAS-derived RH100 (Figure 10), which is larger in forested areas, also indicates that the spatial resolution of the input layers may not be sufficient to capture canopy heterogeneity. As expected, model accuracy was higher in protected forests, underscoring the challenges of modeling forest structure in the presence of anthropogenic disturbance. Future initiatives to model vertical canopy structure globally will have to address these sampling design issues. One question is whether fine-resolution continental mosaics (Landsat, ALOS/PALSAR) can be used to model fine-scale disturbances and improve these results.
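The sampling problem noted in [41], that sparse lidar shots may not intersect emergent trees, can be quantified with a simple model. The sketch assumes a 1-km pixel divided into footprint-sized patches, with each shot sampling one distinct patch at random; if k of N patches contain emergent trees, n shots miss all of them with probability C(N−k, n)/C(N, n). The patch counts, heights, and function names are hypothetical, and a Monte Carlo estimate is included as a cross-check of the closed form.

```python
import math
import random


def exact_miss_probability(n_patches, n_emergent, n_shots):
    """P(all shots miss every emergent patch), sampling patches without replacement."""
    return math.comb(n_patches - n_emergent, n_shots) / math.comb(n_patches, n_shots)


def miss_rate(patch_tops, n_shots, n_trials=10_000, seed=0):
    """Monte Carlo fraction of trials where sampled patches miss the tallest canopy."""
    rng = random.Random(seed)
    true_top = max(patch_tops)
    misses = 0
    for _ in range(n_trials):
        sample = rng.sample(patch_tops, n_shots)  # distinct patches
        if max(sample) < true_top:
            misses += 1
    return misses / n_trials


# Hypothetical pixel: 200 footprint-sized patches, one holding a 55 m emergent
patch_tops = [30.0] * 199 + [55.0]
print(exact_miss_probability(200, 1, 2))
print(miss_rate(patch_tops, 2))
```

Even with two shots in the pixel, which is generous given GLAS shot densities, the emergent patch is missed about 99% of the time in this toy setup, so the sampled maximum underestimates the true top height.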

5. Conclusions

[42] We presented a systematic approach to select, process, and model lidar-derived canopy height estimates from GLAS to produce global wall-to-wall canopy height maps. The prediction map shows reasonable correspondence with 66 FLUXNET sites (RMSE = 6.1 m, R2 = 0.49; RMSE = 4.4 m and R2 = 0.69 without outliers). A comparison between the model and the input lidar points shows that error differs across forest types and increases with canopy height. These results have implications for the design of active-passive sensor fusion algorithms for mapping forest vertical structure. We expect that maps with errors below 6 m can be obtained when lidar-derived canopy height estimates are integrated with fine-resolution maps and/or forest age maps. During our work, canopy height data from FLUXNET sites emerged as a critical dataset for validating forest structure maps, enabling comparison across a wide range of forest biomes. We recommend that FLUXNET canopy height measurements (e.g., top, mean, and Lorey's canopy height) be standardized and extended to all forest sites. In addition, this dataset should be centralized and distributed publicly over the internet to facilitate the validation and comparison of future maps by international investigators. An 890 MB GeoTIFF file of the global 1-km canopy height map for public download can be found on the website. Users are encouraged to contact the first author and provide feedback about the product.


[43] The research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. It was funded through the MEaSUREs program (project WBS 547714). N. Pinto was funded by the NASA Postdoctoral Program (NPP). The authors thank Claudia Carabajal, Mike Kobrick, and Ralph Dubayah for fruitful discussions, as well as the anonymous reviewers for their constructive comments. M. Simard is funded by the NASA MEaSUREs program. Canopy height data were provided to J. Fisher by the principal investigators (PIs) participating in FLUXNET.