Switchgrass (Panicum virgatum L.) is a perennial grass native to the United States that has been studied as a sustainable source of biomass fuel. Although many field-scale studies have examined the potential of this grass as a bioenergy crop, these studies have not been integrated. In this study, we present an empirical model for switchgrass yield and use this model to predict yield for the conterminous United States. We added environmental covariates to assembled yield data from field trials based on geographic location. We developed empirical models based on these data. The resulting empirical models, which account for spatial autocorrelation in the field data, provide the ability to estimate yield from factors associated with climate, soils, and management for both lowland and upland varieties of switchgrass. Yields of both ecotypes showed quadratic responses to temperature, increased with precipitation and minimum winter temperature, and decreased with stand age. Only the upland ecotype showed a positive response to our index of soil wetness and only the lowland ecotype showed a positive response to fertilizer. We view this empirical modeling effort, not as an alternative to mechanistic plant-growth modeling, but rather as a first step in the process of functional validation that will compare patterns produced by the models with those found in data. For the upland variety, the correlation between measured yields and yields predicted by empirical models was 0.62 for the training subset and 0.58 for the test subset. For the lowland variety, the correlation was 0.46 for the training subset and 0.19 for the test subset. Because considerable variation in yield remains unexplained, it will be important in the future to characterize spatial and local sources of uncertainty associated with empirical yield estimates.
Dedicated bioenergy crops are being promoted in the United States and abroad as renewable alternative feedstocks to conventional petroleum energy supplies (Lewandowski et al., 2003; Ragauskas et al., 2006). Transportation fuels, like ethanol, derived from cellulosic plant biomass could benefit economic growth, enhance energy security, reduce greenhouse gas emissions and mitigate the potential impacts of global climate change (Kheshgi et al., 2000; Smith et al., 2000).
Perennial bioenergy feedstocks, such as native grasses and trees, are considered one of the most sustainable sources of renewable transportation fuel because they produce large amounts of biomass, require limited input of water and nutrients, and minimize ecological damage to soils and rivers (Sanderson et al., 1996; McLaughlin & Walsh, 1998; Heaton et al., 2008). Switchgrass (Panicum virgatum L.), a native warm-season grass found in grasslands of the eastern United States (McLaughlin & Kszos, 2005), is one perennial plant under intensive study as a possible bioenergy feedstock. It is a widespread component of the native North American tall grass prairie with a range of adaptation from Nova Scotia, Ontario, and Maine to North Dakota and Wyoming, south to Florida, Nevada, and Arizona, and into Mexico and Central America (Hitchcock, 1971). Across this range, switchgrass populations exist either as upland or lowland ecotypes that differ in habitat preference, morphology, and productivity.
There is great interest in predicting biological, environmental, and geographic variation in yields for perennial bioenergy crops (Heaton et al., 2004). Two types of models can be used to predict yields: mechanistic plant-growth models and empirical models based on field data. For switchgrass, only plant-growth models have historically been used. Various general-purpose plant-growth models, such as EPIC (Brown et al., 2000; Thomson et al., 2009), ALMANAC (Kiniry et al., 1996, 2005), and SWAT (Nelson et al., 2006; Baskaran et al., 2009), have been used to predict switchgrass yield. Grassini et al. (2009) published a model specifically developed for switchgrass. Predictions from these models have been validated against field data collected from a limited geographic range under uniform management conditions. Plant-growth models are extremely valuable, particularly for applications that require extrapolating beyond climate conditions currently experienced by switchgrass.
Empirical models also play an important role. One extreme view advocates the exclusive use of empirical models based directly on field measurements (Peters, 1980). In our view, empirical models, based on data collected over a wide geographic area under diverse management conditions, are needed to understand what responses to environmental gradients mechanistic models should be expected to reproduce. In the functional validation approach developed by Jager et al. (2000), discrepancies between empirical and mechanistic model responses are used to suggest future improvements in mechanistic models. Empirical models are the starting point for a functional validation approach.
The purpose of this study was to develop empirical models to describe relationships between switchgrass yield and environmental covariates. A second purpose was to use the empirical models to predict switchgrass yield for the conterminous United States. Our empirical modeling efforts built on the wealth of field trials reported in the open literature from site-specific variety trials conducted across the United States over the past two decades (Davis, 2007; Gunderson et al., 2008). In this study, we described empirical responses of yield to environmental covariates and management practices and differences in responses of lowland and upland varieties of switchgrass. In addition, we characterized the residual unexplained variation in switchgrass yield. These empirical models can now be used for functional validation of mechanistic plant-growth models and as input to other models that require yield predictions. Our results are presented spatially for the eastern United States and can be used to assess the implications of our findings for regional and national biomass supply.
Materials and methods
Published field studies of switchgrass yield were compiled from numerous literature sources (Davis, 2007; Gunderson et al., 2008). Following Gunderson et al. (2008), we excluded field studies growing a mixture of ecotypes in order to estimate yields specific to switchgrass ecotypes. Studies of harvest frequency have produced contradictory results (Sanderson et al., 1996; Thomason et al., 2004; Fike et al., 2006), but they concur that yields are lower when harvest frequency exceeds three times per year. We excluded first-year harvests because these are typically lower than those in subsequent years (Fike et al., 2006; Gunderson et al., 2008) and include cases of failure during establishment. Similarly, we excluded trials that experienced catastrophic failures, as indicated by yields <1 Mg ha−1 dry weight (Gunderson et al., 2008). We included studies both with and without irrigation during establishment, because yield was measured during later years.
For the lowland ecotype, field trials were available at 28 locations ranging in latitude from Texas to New Jersey (Table 1). For the upland ecotype, data from more field trials were available in northern locations (Montreal, Canada; North Dakota; and South Dakota), and fewer trials were available at southern locations (Louisiana, Texas, and Oklahoma) (Table 1). Our approach to obtaining covariates was to rely on geospatial databases. This was necessary because climate and soils information was not consistently reported across studies. Climate variables used as predictors were obtained from the nearest orographically corrected PRISM climate gridpoint (Daly et al., 1994; Table 1). Soils data (depth to bedrock and % sand) were obtained from the State Soil Geographic Database (STATSGO, USDA Soil Conservation Service, 1992). For each field observation of switchgrass yield, we determined location-specific minimum winter temperature (°C) (Tmin), average temperature (°C) for April–September of the year of harvest (Tavg), total April–September precipitation (cm) during the year of harvest (Ptot), total nitrogen fertilizer (kg ha−1) applied (Ntot), an indicator variable (IsFert) set to one if fertilizer was applied and zero otherwise, depth to bedrock (Drock) in m, number of harvests per year (HarvFreq), stand age (Age) in years, and an index of soil wetness (WetSoil) calculated as (100 − % sand) × Ptot. Our soil wetness index represents an interaction between temporally variable precipitation and the percentage of sand (constant for each location). Soils with a lower percentage of sand have a higher water holding capacity, which has implications for yield even at the same level of precipitation (Evers & Parsons, 2003; Parrish & Fike, 2005).
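The soil wetness index is simple enough to illustrate directly. The following is a minimal sketch of the calculation as described in the text; the function name, argument names, and the two example sites are hypothetical and are not taken from the field-trial data.

```python
def wet_soil_index(percent_sand: float, ptot_cm: float) -> float:
    """Soil wetness index: (100 - % sand) x April-September precipitation (cm)."""
    if not 0.0 <= percent_sand <= 100.0:
        raise ValueError("percent_sand must be between 0 and 100")
    if ptot_cm < 0.0:
        raise ValueError("ptot_cm must be non-negative")
    return (100.0 - percent_sand) * ptot_cm


# Two hypothetical sites receiving the same growing-season precipitation (60 cm)
sandy = wet_soil_index(percent_sand=80.0, ptot_cm=60.0)
loamy = wet_soil_index(percent_sand=30.0, ptot_cm=60.0)
```

With equal precipitation, the site with less sand receives the larger index, which is the interaction the index is meant to capture.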
Table 1. Description of field trial locations
Lowland field trials
Upland field trials
Latitude and longitude are in decimal degrees. PRISM climate stations identifiers are given for each location. We list the number of average yield values reported for each of the two ecotypes, with the number of observations in the test subset in parentheses.
We estimated average yield for each ecotype using generalized logistic regression. We applied a logit transform to average yield, LYield=log(Yield/Yieldmax)/[1−log(Yield/Yieldmax)], to ensure that mapped values would not exceed those represented in the data. The maximum yield (Yieldmax) was 28 Mg ha−1 dry weight for the upland ecotype and 40 Mg ha−1 dry weight for the lowland ecotype. The full model included both climatic and nonclimatic covariates [Eqn (1)]. LYield is expressed as a linear function of the covariates defined above, with coefficients v1 to v11 and intercept v0. The residual error, ε, is assumed to be normally distributed with variance–covariance matrix C.
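To show why a logit-type transform keeps back-transformed predictions below Yieldmax, here is a minimal sketch. It assumes the conventional logit, log[p/(1 − p)] with p = Yield/Yieldmax, which may not match the expression printed above term for term; the function names are hypothetical, and only the two Yieldmax values come from the text.

```python
import math

# Maximum yields (Mg/ha dry weight) as stated in the text
YIELD_MAX = {"upland": 28.0, "lowland": 40.0}


def logit_yield(yield_mg_ha: float, ecotype: str) -> float:
    """Conventional logit of yield scaled by the ecotype maximum (assumed form)."""
    p = yield_mg_ha / YIELD_MAX[ecotype]
    if not 0.0 < p < 1.0:
        raise ValueError("yield must lie strictly between 0 and the ecotype maximum")
    return math.log(p / (1.0 - p))


def back_transform(lyield: float, ecotype: str) -> float:
    """Invert the logit; the result can never exceed the ecotype maximum."""
    return YIELD_MAX[ecotype] / (1.0 + math.exp(-lyield))


round_trip = back_transform(logit_yield(12.0, "upland"), "upland")
capped = back_transform(50.0, "lowland")  # a very large linear predictor
```

However large the linear predictor becomes, the back-transformed value approaches but does not exceed the maximum, so mapped predictions stay within the range of the data.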
Because several of the field trials provided multiple estimates of switchgrass yield at a given location, it was important to account for within-location correlation. Our model assumes independence between locations but nonzero correlations within yield measurements taken at the same location, i. This error structure is described by a compound symmetric variance–covariance model, which has block-diagonal variance–covariance matrix C of the errors, ε [Eqns (2) and (3)]. Within-location correlation, ρ, is estimated for nonzero blocks. Data limitations prevented us from estimating location-specific fixed effects.
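The compound symmetric structure can be made concrete with a small example. This sketch builds the variance–covariance matrix for a handful of observations under the stated assumptions: a common residual variance (called sigma2 here), a common within-location correlation (called rho here), and zero covariance between locations. The function name, symbols, and example values are hypothetical.

```python
def compound_symmetric_cov(location_ids, sigma2, rho):
    """Build C: sigma2 on the diagonal, rho * sigma2 for pairs of observations
    from the same location, and 0 for pairs from different locations."""
    n = len(location_ids)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                cov[i][j] = sigma2
            elif location_ids[i] == location_ids[j]:
                cov[i][j] = rho * sigma2
    return cov


# Two observations at hypothetical location "A", one at location "B"
C = compound_symmetric_cov(["A", "A", "B"], sigma2=2.0, rho=0.5)
```

When observations are sorted by location, the nonzero entries form blocks along the diagonal, one block per location.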
We first fitted a full model including all predictors using the ‘nlme’ package in R (Pinheiro & Bates, 2000). We then fitted reduced models (models with fewer predictor variables) for upland and lowland varieties to be used in mapping. For each ecotype, we selected a reduced model to include only those predictors that were significant at α = 0.1.
We divided our data into two parts: a subset for parameter estimation and a test subset for evaluating goodness-of-fit. Because we wished to consider correlations among measurements from the same location, we stratified the sample by ecotype and location. We selected two measurements for the test subset at random from each stratum, except in cases where only one was available. We used the estimation subset of data to estimate parameters. We then assessed goodness-of-fit of each model by fitting each to the test dataset, which represents approximately 10% of the total data available. Predicted yields were obtained for the test subset by back-transforming logit-transformed estimates based on reduced models in Table 2. The training (test) subset of data in the reduced models included 600 (48) lowland and 459 (55) upland observations.
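The stratified selection of the test subset can be sketched as follows. This is an illustration rather than the procedure actually run: it assumes that a stratum with fewer than two observations contributes whatever it has to the test subset, and the record layout, function name, and example data are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_stratum(records, n_test=2, seed=0):
    """Draw up to n_test records at random from each (ecotype, location)
    stratum for the test subset; all other records form the training subset."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for idx, rec in enumerate(records):
        strata[(rec["ecotype"], rec["location"])].append(idx)

    test_idx = set()
    for members in strata.values():
        k = min(n_test, len(members))  # assumption: take what is available
        test_idx.update(rng.sample(members, k))

    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test


# Hypothetical data: five lowland yields at one site, one upland yield at another
records = [
    {"ecotype": "lowland", "location": "site_a", "yield": y}
    for y in (10.0, 12.0, 9.5, 11.0, 13.0)
] + [{"ecotype": "upland", "location": "site_b", "yield": 7.0}]

train, test = split_by_stratum(records)
```

Stratifying by location ensures that every location with data is represented in the test subset, which matters when errors are correlated within locations.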
Table 2. Parameter estimates including coefficients and two parameters describing compound symmetry in residual error for the full models on the left
Full generalized least square models
Reduced parameter models
Estimates for the reduced models used in predicting potential switchgrass yields are shown on the right. Predictors are location-specific average temperature (Tavg) for April–September in the year of harvest, minimum (Tmin) winter temperature (°C), total April–September precipitation (cm) during the year of harvest (Ptot), an index of soil wetness (WetSoil), total nitrogen fertilizer (kg/ha) applied (Ntot), an indicator variable for fertilizer application (IsFert), depth (m) to bedrock (Drock), number of harvests per year (HarvFreq), and stand age (Age) in years.
We evaluated alternative models using both goodness-of-fit and information-theoretic criteria. We reported two goodness-of-fit criteria: residual standard error and Pearson's correlations between predicted and observed yields for both the training and test data subsets. We also reported Akaike's information criterion (AIC). A model with a lower AIC should be preferred over alternative models with higher AIC, even if its goodness-of-fit is poorer. This is because AIC penalizes overfitting to a particular dataset through the inclusion of an excessive number of predictors, and favors models more likely to perform well with new datasets (Burnham & Anderson, 2002).
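A short example shows how AIC can favor a smaller model even when its likelihood is slightly lower. The sketch uses the standard definition AIC = 2k − 2 ln L; the log-likelihoods and parameter counts below are invented for illustration and are not the values in Table 2.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's information criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Hypothetical full and reduced models (values are illustrative only)
aic_full = aic(log_likelihood=-520.0, n_params=14)
aic_reduced = aic(log_likelihood=-521.5, n_params=9)
preferred = "reduced" if aic_reduced < aic_full else "full"
```

Here the reduced model fits slightly worse but drops five parameters, so the penalty term outweighs the loss in likelihood and the reduced model has the lower AIC.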
For each model, we examined the distribution of residuals to determine whether the mean was significantly different from zero. We also regressed the predicted values against observed and compared these visually for each switchgrass ecotype.
The purpose of our mapping analysis was to use the empirical model developed (i.e., reduced models) to estimate potential switchgrass yields in geographic locations where no data were collected, without extrapolating beyond the range of climate values represented. This is not meant to imply that the current crop- or land-cover would in actuality be supplanted by switchgrass, but rather indicates expected yields, according to the empirical models, based on climate and soils. We will refer to these models as ‘mapping versions’. Our data included field trials conducted at winter temperatures between −17 and 8 °C, mean growing season temperatures between 13.8 and 27 °C, and >310 mm total growing season precipitation. In the mapping analysis, we masked out regions of the United States with more extreme values. For management variables, which are not intrinsically spatial, we assumed fixed values. For the lowland ecotype model, which included Ntot, we assumed switchgrass would be fertilized with 80 kg N ha−1. For both ecotypes, we used a stand age of 4 years. Fike et al. (2006) found that upland varieties produce higher yields with two harvests than with one. We therefore set harvest frequency in a way that is optimal for each ecotype: one harvest per year for lowland and two harvests per year for upland varieties.
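The masking rule amounts to a range check on each grid cell. The sketch below uses the climate limits stated above; the function and constant names are hypothetical, and treating the temperature bounds as inclusive is an assumption.

```python
# Climate ranges represented in the field trials, as stated in the text
TMIN_RANGE_C = (-17.0, 8.0)   # minimum winter temperature
TAVG_RANGE_C = (13.8, 27.0)   # mean growing-season temperature
PTOT_MIN_MM = 310.0           # total growing-season precipitation


def within_sampled_climate(tmin_c: float, tavg_c: float, ptot_mm: float) -> bool:
    """Return True if a grid cell falls inside the sampled climate range.
    Cells returning False would be masked out of the yield map."""
    return (
        TMIN_RANGE_C[0] <= tmin_c <= TMIN_RANGE_C[1]
        and TAVG_RANGE_C[0] <= tavg_c <= TAVG_RANGE_C[1]
        and ptot_mm > PTOT_MIN_MM
    )


# Hypothetical grid cells
mapped = within_sampled_climate(tmin_c=-5.0, tavg_c=20.0, ptot_mm=600.0)
too_cold = within_sampled_climate(tmin_c=-25.0, tavg_c=18.0, ptot_mm=500.0)
too_dry = within_sampled_climate(tmin_c=0.0, tavg_c=22.0, ptot_mm=200.0)
```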
The full and reduced models explained a significant amount of the variability in switchgrass yield for both the upland and lowland varieties. Yield showed the expected unimodal response to average growing season temperature, with a significant positive coefficient for Tavg and a negative coefficient for the quadratic temperature term, which lowers yields at high temperatures (positive v1 and negative v2 in Table 2). Both ecotypes showed a positive response to minimum winter temperature. Both ecotypes showed a positive response to precipitation and both had a significant negative interaction between precipitation and temperature (v4 and v5 in Table 2). Lowland varieties showed stronger responses to average temperature than upland varieties.
We considered two soil-related variables (v6 and v10 in Table 2). Yield showed a significant positive response to our soil moisture index (WetSoil) for the upland, but not the lowland, variety. Depth to bedrock (Drock) was not a significant predictor of yield for either ecotype, and was excluded from the reduced models.
The full models included four management-related variables: stand age, number of harvests per year, an indicator variable for fertilization, and total nitrogen. Of these, only the lowland ecotype showed a positive response to total nitrogen. The remaining predictors were not significant and were excluded from the reduced models (Table 2).
Correlations between yields from field trials in the same location, c, were significantly greater than zero in the final, reduced models (Table 2). Note that the number of observations increased slightly in the reduced models (total degrees of freedom +1 in Table 2) because observations with missing values for predictors that had been removed could now be used in the analysis.
All Pearson's correlations between predicted and observed values (back-transformed to Mg ha−1) were highly significant. For the upland variety, the correlation was 0.6190 (95% CI=[0.5591, 0.6725], df=456, P<0.0001) for the training subset and 0.5795 (95% CI=[0.3690, 0.7335], df=52, P<0.0001) for the test subset. For the lowland variety, the correlation between predicted and observed yield was 0.4596 (95% CI=[0.3932, 0.5213], df=583, P<0.0001) for the training subset and 0.1851 (95% CI=[−0.1111, 0.4511], df=44, P=0.22) for the test subset. Correlations are usually lower for the test subset than for the data used to develop the model.
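The reported confidence intervals can be checked from the correlations and degrees of freedom alone. The sketch below assumes the intervals were obtained with the Fisher z-transformation, which the text does not state; the function name is hypothetical.

```python
import math


def pearson_ci(r: float, df: int, z_crit: float = 1.959964):
    """Approximate 95% confidence interval for a Pearson correlation using
    the Fisher z-transformation, with sample size n = df + 2."""
    n = df + 2
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Upland training subset: r = 0.6190, df = 456 (values from the text)
upland_train_ci = pearson_ci(0.6190, 456)

# Lowland test subset: r = 0.1851, df = 44 (values from the text)
lowland_test_ci = pearson_ci(0.1851, 44)
```

Under this assumption the computed intervals agree closely with the reported ones, including the lowland test interval that spans zero.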
The median difference between measured and predicted switchgrass yield was 0.081 Mg ha−1 (range −2.9758 to 3.734 Mg ha−1) for lowland and 0.0718 Mg ha−1 (range −2.941 to 3.678 Mg ha−1) for upland varieties. For the upland variety, the reduced model produced a mean residual standard error of 0.6975, with standardized residuals between −2.98 and 3.73 SD and an interquartile range of (−0.56 to 0.71). For the lowland variety, the reduced model had a mean residual standard error of 0.7971, with standardized residuals between −2.74 and 5.48 SD and an interquartile range of (−0.59 to 0.55). Lowland values with magnitudes greater than three were evaluated as potential outliers.
A simple least-squares regression showed significant positive relationships between measured and predicted switchgrass yields (Fig. 1), although a great deal of scatter remained. The largest deviations were predictions of the highest lowland yields, which were underpredicted by the reduced model (Fig. 1a). We had no other reason to remove these observations as putative outliers.
The mapping versions of the reduced models showed the expected gradient of higher yields in the eastern United States and lower yields in the western United States (Fig. 2a). Grey areas in Fig. 2 were excluded from prediction because the predictors fell outside the observed range. The highest predicted lowland yields were centered on the three-state junction of Tennessee, North Carolina, and Georgia, with lower predictions moving outward from this junction (Fig. 2a). High yields were also predicted throughout the states of Illinois, Kentucky, and Virginia. Low yields were predicted in the far west, the Gulf coast, and at higher latitudes of New York and Michigan (Fig. 2a). Interestingly, moderately high yields were predicted in some isolated pockets of the Sierra Nevada Mountains, areas outside the natural range for switchgrass (Fig. 2a). Maps of estimates represent potential yield on lands available for planting switchgrass and do not suggest that switchgrass will replace existing land cover.
Predicted upland yields were generally lower than lowland yields. However, upland yields were higher than lowland yields in many areas of the western United States and at high latitudes, including northern Michigan, Wisconsin, and Maine (Fig. 2b). The highest upland yields were centered near the three-state junction of West Virginia, Kentucky, and Ohio (Fig. 2b).
The generalized logistic model presented here provides a means of estimating switchgrass yields in different locations based on local climate, soil conditions and management choices. In this study, we found that yields in field trials of the lowland ecotype were generally higher than yields of the upland ecotype. Both ecotypes showed a quadratic response to average temperature. The lowland ecotype showed a more significant positive response to minimum winter temperature. This is expected since this ecotype does not do as well at high latitudes (Casler et al., 2004). Precipitation was strongly correlated with yields of both ecotypes. Lee & Boe (2005) noted a strong precipitation response for upland varieties in North Dakota. Only the upland ecotype showed a significant response to our soil moisture index (Table 2). It has been suggested that substituting SSURGO-derived or locally measured soil water holding capacity for % sand in our soil wetness index might improve the skill of this predictor.
Despite removing data for field trials during the first year of establishment, we found a negative response to stand age that was significant for both ecotypes, suggesting a decline in yield with age after several years of harvest, as noted by Lee & Boe (2005) for upland varieties. Fertilizer application had a positive effect on lowland, but not upland, yield. Other studies have also shown a positive effect of nitrogen for upland (Madakadze et al., 1999) and lowland varieties (Sanderson & Reed, 2000), but with diminishing returns. Sanderson & Reed (2000) reported that fertilizer was not beneficial during the establishment year. Bedrock depth was not identified as an important predictor of yield, perhaps because few field trials were conducted in shallow soils.
Spatial patterns predicted by the mapping versions of the empirical models seem to deviate most from expectations on the western and northern margins of the natural range for switchgrass. This highlights how important it is to collect field data from sites with marginal conditions, which provide more information than data from sites with ideal conditions for use in both empirical and process yield models. In drier western areas, predictions for lowland yield based on the empirical model appear higher than expected. For example, our results indicate lowland yields of 10–15 Mg ha−1 in the Big Bend region along the United States–Mexico border in western Texas. According to Sanderson et al. (1999b), the high-yielding lowland ‘Alamo’ variety would likely not perform well in western Texas where annual rainfall is <50 cm. Likewise, predicted yields of 5–10 Mg ha−1 in the semi-arid rangelands of South Dakota, Wyoming, and Colorado are higher than expected. Baskaran et al. (2009) found the largest deviations between SWAT-model predictions for Alamo switchgrass and those of the lowland mapping model in the southwest and between the latitudes of 41° and 43° and east of the Dakotas. Additional trials in these western areas are needed to better define productivity in more arid environments. Although this study made a special effort to identify and include sites as far north as Montreal, Canada in order to better represent yields at high latitudes, trials in more northern locations are needed to better define yields for lowland and for upland varieties at higher latitudes (Casler et al., 2004). In these areas, where it is necessary to extrapolate to new conditions, estimating yields using process-based models is probably a better alternative.
Previous studies have used mechanistic plant-growth models to predict switchgrass yield. Kiniry et al. (1996) were able to explain 76% of variation in yield at five sites in one state (Texas) using the ALMANAC model. However, in a later comparison, Kiniry et al. (2005) were able to explain only 47% of variation among five locations in the south. ALMANAC performed well in explaining variation among locations, but not as well in explaining year-to-year variation in yield. We also found that temporal variation within locations was the most difficult to predict. This suggests to us that attributes shared by trials at the same location, such as soil properties, are unlikely to improve predictions. Grassini et al. (2009) also developed a plant-growth model for switchgrass and compared predictions for 10 years at six sites both in the far northern and southern range of the Midwestern United States. Aboveground biomass predictions were within 15% of reported values. The EPIC model was used by Thomson et al. (2009) to simulate switchgrass yields over a larger region (the conterminous United States). Spatially, their predictions showed some similarities with results presented here, with both predicting low values in the west. But the two studies also showed some differences in geographic patterns. EPIC predicted high yields in Florida, along the Gulf coast of Texas and Louisiana and the coast of North Carolina. Our empirical model predicted lower yields in these areas. Calibration was conducted for seven locations in the southeast and overall validation statistics were not reported. We caution that comparing R2 values obtained by comparing observed and predicted values from different plant-growth models or from empirical models is a questionable practice, due to differences in the numbers of parameters involved. Generally speaking, it would be best to report such statistics for new ‘test’ data (locations, years) not used in calibration.
In our view, the most important contribution of the empirical relationships identified here is to serve as a basis for evaluating and improving mechanistic plant-growth models for switchgrass. Understanding where relationships between mechanistic models and their drivers fail to reproduce those observed in nature is a more constructive approach to validation than simply comparing the values themselves (Jager et al., 2000). Baskaran et al. (2009) compared SWAT-predicted yields for Alamo switchgrass, a lowland variety, with those predicted by the mapping version of the lowland empirical model. A regression between SWAT-predicted and empirical model yields gave an R2 of 0.51. However, on average, lowland yields predicted by the empirical model (Fig. 2a) tended to be higher than those of the SWAT model. As discussed earlier, the empirical model for the lowland ecotype predicts much higher yields on the southwestern and northern margins.
We have several suggestions for future data collection to facilitate regional assessments. First, seasonal timing of harvest has a well-known effect on yield. It would be useful if future studies could report local measurements of temperature and rainfall. Reporting yields by year, instead of reporting averages across multiple years, would also increase the usefulness of data reported in the literature by allowing matching to the relevant local conditions. Reporting relevant local soil attributes, such as water holding capacity, depth to bedrock, slope, and elevation would be useful. Reporting precise field locations is important as it can improve associations of yields with available geospatial data. It would be helpful to include future trials from a much wider range of locations and conditions. For example, yield data are needed for sites farther west, at higher elevations and slopes, shallower soils, and under less-than-ideal conditions for growth.
The empirical estimates provided by this study can be used to facilitate functional validation of plant-growth models. Results from the best-available yield models, whether empirical or mechanistic, are needed as input to other regional models used in bioenergy assessments. For example, economic models that estimate changes in land use require estimates of the relative profitability of growing switchgrass instead of other crops. Best-available regional yield estimates are also needed by models to identify optimal locations for siting biorefineries (e.g., Graham et al., 2000).
This research was funded, in part, by the Department of Energy Office of Biomass Programs. ORNL is managed by UT-Battelle, LLC for the USDOE under contract DE-AC05-00OR22725. We thank Nadia Ally for spending a summer perusing the literature for switchgrass data from higher latitudes. Bob Perlack deserves a great deal of credit for supporting and shepherding this research and sharing his expertise on switchgrass yields. We appreciate Robin Graham for her support and review of this manuscript. Finally, Tris West is responsible for significant improvements in this manuscript and we thank him for organizing this special symposium.