Quantifying the effects of varietal types × management on the spatial variability of sorghum biomass across US environments

Regional‐scale estimations of sorghum biomass production allow identification of optimum genotype×environment×management (G×E×M) combinations for bioenergy generation. The objective of this study was to determine the degree of contributions of G, E, and M toward variability in sorghum biomass in the United States. Using the Agricultural Production Systems sIMulator in a grid computing platform, biomass was simulated for irrigated and rainfed conditions for 30 years across the United States for four sorghum varietal types (grain—GS, sudangrass—SS, photosensitive—PS, and photo‐insensitive—PI). Simulated biomass was assessed by environments clustered using the sum of intercepted solar radiation (ir), mean of temperature stress factor (tp) and water stress factor (sw). Simulated biomass ranged from 5.8 t ha−1 (GS‐rainfed) to 27.5 t ha−1 (PI‐irrigated). Under high‐temperature environments (mean annual temperature = 25°C), rainfed biomass between 40 and 80 days after planting (DAP) was strongly correlated with sw (r = 0.64–0.86) and irrigated biomass with ir (r = 0.68–0.81). Under low‐temperature environments (mean annual temperature = 18°C) after 40 DAP, tp and ir had greater effects than sw (r = 0.55–0.82). Biomass variance was mainly explained by varietal type (50%–76%) in all environments×irrigation combinations, except in the high‐ and mid‐temperature environments under rainfed conditions where rainfall had the major effect (25%–45%). However, when mean temperature during the growing season decreased from 25°C (high environments) to 18°C (low environments), the contribution of mean temperature to biomass variance increased from 7% to 34% (rainfed) and from 4% to 36% (irrigated). Varietal type had the larger interactions with other factors independently of the environment and irrigation. 
We demonstrated a need to quantify (i) the main G×E×M drivers of biomass variability based on environmental stress factors and (ii) the variance contribution of these drivers on sorghum biomass. Our regional‐scale estimations are key inputs for future robust biomass projections of energy sorghum genotypes integrating G×E×M under climate change scenarios.

1 | INTRODUCTION
Sorghum [Sorghum bicolor (L.) Moench] is the fifth most important cereal worldwide, with grain production of 59.3 million tons in 2018 (FAO, 2020). In the United States, the demand for sorghum grain for ethanol production has increased considerably in recent years (Duff et al., 2019). Sorghum has cultivated varieties distributed among five races and over 25 species of wild relatives, offering enormous genetic diversity for crop improvement (Bhattacharya et al., 2011). Forage sorghum varieties maximize biomass at nitrogen (N) fertilizer rates less than those needed to maximize biomass for corn (Zea mays L.; Thivierge et al., 2015). Forage sorghum varieties have been recognized to produce more biomass per unit of irrigation (Rooney et al., 2007).
Several sorghum varieties also have photoperiod sensitivity that prevents the crop from flowering under the long summer photoperiods of higher latitudes, which confers an advantage for biomass production as the crop remains vegetative until killed by frost (Olson et al., 2012). Sorghum is also known for its drought tolerance, allowing it to produce high biomass in water-limited environments in part due to its lower transpiration compared to corn, which allows the crop to conserve soil moisture and maintain growth for a longer period (Borrell et al., 2000; Carcedo & Gambin, 2019; Chapman et al., 2000; Mutava et al., 2011).
Crop biomass is influenced by genetics (G), environment (E), crop management (M), and their interactions (Hammer et al., 2020; Rotili et al., 2020; Seyoum et al., 2018). Crop modeling has been extensively used to quantify the effect of G×E×M on sorghum grain yield using field-scale simulations (i.e., from a climate station, single soil profile data from sampling, and crop management based on field measurements; Hammer et al., 2010; Liang et al., 2020; MacCarthy et al., 2009; Raymundo et al., 2021). However, to better explore the main drivers of sorghum biomass for different genotypes, regional modeling approaches that consider the spatial heterogeneity of E and M are helpful (Antle et al., 2017). Such information will assist the agricultural industry in assessing crop management options, harvest methods, and future infrastructure for sorghum energy biomass. Equally important is the need for this spatial information to identify the best G×E combinations that increase sorghum biomass and support the potential expansion of bioenergy production to 2040 in US regions where sorghum is not currently grown (Langholtz et al., 2016).
Several studies reported observed sorghum biomass for a variety of genotypes and regions from field experiments (Buxton et al., 1999; Gill et al., 2014; Hoffmann & Rooney, 2014) and simulated biomass from crop model exercises at field scale (Kent et al., 2020; Lopez et al., 2017; Yang et al., 2021). There were also efforts to develop maps of potential sorghum biomass under rainfed (Lee et al., 2018) and irrigated conditions (Huntington et al., 2020) based on statistical regression models and machine learning across the United States. However, these models have two limitations: (i) they do not consider G (e.g., crop phenology, morphophysiological features) and M (e.g., irrigation strategy, planting date) as drivers of the spatial variability of sorghum biomass and (ii) they had only limited validation (e.g., seven biomass observations in Daly et al., 2018).
Here we applied a biophysical crop model (the Agricultural Production Systems sIMulator [APSIM]) to generate G×E×M combinations and assessed the effects of the G×E×M interactions on rainfed and irrigated sorghum biomass and its variability at regional scale in the United States. The quantification of this variability and its partitioning (hereafter referred to as variance partitioning) is key to distinguishing the main drivers of spatial biomass variability (Ramirez-Villegas & Challinor, 2012; Teixeira et al., 2017). Identifying the relative importance of those drivers on sorghum biomass helps inform how we might use insights from regional model applications.
The objective of this study was to distinguish which of G, E, and M are dominant factors generating variability in sorghum biomass in the United States. We specifically investigated the relative variance contribution on simulated biomass from four sorghum varietal types, grain (GS), sudangrass (SS), photosensitive (PS), and photo-insensitive (PI), under two irrigation scenarios (rainfed and irrigated) across the potential production areas for energy sorghum in the United States. We chose varietal type and irrigation strategy as factors of analysis in this study due to their potential contribution to increasing sorghum biomass for bioenergy. Although irrigation is not currently a common practice in sorghum production systems of the United States, this practice is expected to increase in order to reduce the gap between actual and potential biomass production to the extent that this is economically viable (Ciampitti et al., 2020; Huntington et al., 2020). First, we tested APSIM using aggregated field data from a diverse set of previous biomass experiments. Second, we set up a national-scale modeling experiment to estimate the variability in biomass across the sorghum-growing regions and its spatial drivers.

KEYWORDS
APSIM, bioenergy, feedstock, genotype, irrigation, model upscaling, pSIMS, regional modeling, spatial variability, variance partitioning

| Study region
This study was undertaken across the extent of the potential production areas for energy sorghum in the United States at a 30 arc-minute resolution (~35 km grid cell resolution in the North to ~50 km grid cell resolution in the South). This included 1860 grid cells that were selected based on the potential economic availability of biomass resources from agricultural lands described at the farmgate in the Department of Energy 2016 Billion-Ton Report (Langholtz et al., 2016). The geographical area has considerable heterogeneity in topography, resulting in several environments with a variety of climate regimes (Figure S1). The potential areas for biomass sorghum described by Langholtz et al. (2016) cover 100% of the state areas without discrimination between croplands. In order to better represent potential agricultural areas for biomass sorghum, we masked our data analysis to cropland areas (1325 grid cells at 30 arc-minute resolution) reported in the parallel System for Integrating Impact Models and Sectors (pSIMS; Elliott et al., 2014). Figure 1 shows the data sources, the source of biomass variance, the models and spatiotemporal scales, and the methods of analysis and tools used in this study. Initially, we conducted a field-scale model validation to test the ability of a crop model to predict in-season and final sorghum biomass (Sections 3.1 and 4.1) for locations where field data were available. Then, using climate and soil gridded data at 30 arc-minute resolution, the sorghum biomass was simulated across the potential production areas for energy sorghum in the United States (Sections 3.2 and 4.2).

| APSIM-Sorghum description
The APSIM-Sorghum model (hereafter referred to as APSIM or the model; Hammer et al., 2010, 2019) within APSIM Classic 7.9 r4132 (Holzworth et al., 2014) was initially validated and then used to simulate sorghum biomass across the potential production areas for energy sorghum in the United States (Section 2.4). The sorghum model in APSIM has been previously described (https://www.apsim.info/documentation/model-documentation/crop-module-documentation/sorghum) and tested across a large range of environments (e.g., Akinseye et al., 2020; Carcedo & Gambin, 2019; Clarke et al., 2019; Singh et al., 2017). A complete description of the model structure and parameters was provided by Hammer et al. (2010, 2019). The model contains algorithms that simulate crop phenology, growth, and soil-plant N dynamics. Crop phenology is simulated using thermal time thresholds for each phenological stage. When water and N conditions are optimum, crop growth rate on a daily step is only limited by solar radiation and is calculated as the product of intercepted solar radiation and radiation use efficiency (RUE). When the crop is under water stress, biomass accumulation is calculated as the product of potential crop transpiration (limited by soil moisture, root extent, and water uptake capacity) and transpiration efficiency, adjusted for atmospheric vapor pressure deficit (i.e., relative humidity; Hammer et al., 2010).

FIGURE 1 Outline of the approach used in this study to assess the effects of genotype (G)×environment (E)×crop management (M) on simulated sorghum biomass including an overview of data sources, sources of biomass variance (E, G, and M), models and spatiotemporal scales, and analysis tools used in this study. Numbers in superscripts indicate the modeling step where the data source and source of variance were applied (1 for field-scale modeling and 2 for regional modeling).
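The radiation-limited and water-limited growth calculations described in this section can be illustrated with a minimal Python sketch. This is not APSIM code: the function names, the example parameter values, and the assumption that transpiration efficiency equals a coefficient divided by vapor pressure deficit are ours, chosen only to make the two products concrete.

```python
def radiation_limited_growth(radn, fraction_intercepted, rue):
    """Daily growth (g m-2) when water and N are non-limiting.

    radn: daily solar radiation (MJ m-2)
    fraction_intercepted: fraction of radiation intercepted by the canopy (0-1)
    rue: radiation use efficiency (g MJ-1), hypothetical value supplied by the caller
    """
    return radn * fraction_intercepted * rue


def water_limited_growth(transpiration_mm, te_coefficient, vpd_kpa):
    """Daily growth (g m-2) when the crop is under water stress.

    Assumption: transpiration efficiency = te_coefficient / vpd_kpa,
    so a drier atmosphere (higher VPD) gives less biomass per mm transpired.
    """
    return transpiration_mm * te_coefficient / vpd_kpa


def daily_growth(radn, fraction_intercepted, rue,
                 transpiration_mm, te_coefficient, vpd_kpa, water_stressed):
    """Select the growth pathway according to the water status of the crop."""
    if water_stressed:
        return water_limited_growth(transpiration_mm, te_coefficient, vpd_kpa)
    return radiation_limited_growth(radn, fraction_intercepted, rue)
```

For example, with 20 MJ m-2 of radiation, 80% interception, and a hypothetical RUE of 1.25 g MJ-1, the radiation-limited pathway gives 20 g m-2 for that day.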

| Observed field data
Observed data from 439 site×year×management/genotype combinations reported in published (Buxton et al., 1999; Gill et al., 2014; Heitman et al., 2017; Maw et al., 2016; Roby et al., 2017; Wight et al., 2012) and other datasets (J.J. Volenec and S.M. Brouder, unpublished datasets) were assembled to test the APSIM performance to predict aboveground biomass (hereafter referred to as biomass; Table 1). These experiments were conducted from 1988 to 2015 and comprise an unbalanced set of 21 sorghum genotypes across eight US states (17 locations; Figures S2 and S3). The experimental data were generated by three research projects. (1) The Regional Feedstock Partnership between the US Department of Energy and the Sun Grant Initiative (Gill et al., 2014; Lee et al., 2018), which aims to assess the yield potential and stability of sorghums grown across diverse production environments in the United States. This dataset included a combination of 6 sorghum genotypes (one cultivar and five hybrids)×7 locations (six states)×5 years (2008-2012).
(2) The second dataset was generated by Purdue University under the CenUSA Bioenergy Project (https://nifa.usda.gov/cenusa-bioenergy), which included 3 genotypes×3 Indiana locations×4 years. That project compared sorghums to perennial grasses for producing biofuels and bioproducts on marginal lands. A previous detailed APSIM validation (third dataset; Yang et al., 2021) was used to compare model biomass estimations for final (i.e., at the end of the season) versus in-season harvests. Yang et al. (2021) (2015 and 2017). A description of the TERRA Project dataset was not included in this paper; for details see Yang et al. (2021). All experiments were carried out to study biomass response to various treatments, including N fertilizer rates, time of planting, population, and the adaptability of genotypes across locations and years (Table 1). The crop data collected in these experiments vary with the environment and treatment applied; however, crop biomass at harvest was the only common variable between them. Therefore, we only used this crop response variable to validate APSIM estimations.

| Model inputs for field simulations
Historical daily climate data (daily rainfall, maximum and minimum temperature, and global solar radiation) from the AgMERRA climate forcing dataset (Ruane et al., 2015) were used as inputs to the model. This dataset extends from 1980 to 2010. For years after 2010, climate data were retrieved using the bestiapop Python package (https://github.com/JJguri/bestiapop; Ojeda et al., 2021a) from the NASA Prediction of Worldwide Energy Resource (POWER)-Climatology Resource for Agroclimatology database (https://power.larc.nasa.gov). These climate datasets have been widely tested with weather station data across several environmental regions (Bender & Sentelhas, 2018; Merlos et al., 2015; Ojeda et al., 2017).
We used the FAO Harmonized World Soil Database (Nachtergaele et al., 2010) to set up the soil profile properties for each location. The soil data included bulk density, organic carbon, pH, air dry (corresponding to the moisture limit for dry evaporation of the soil), drained lower limit (LL15), drained upper limit (DUL), and saturated (SAT) volumetric water contents, KL (potential rate of water extraction), and XF (root exploration factor). Maximum plant available water capacity (PAWC) is calculated as the difference between DUL and LL15. A complete description of soil parameters used for the validation of APSIM is provided in Table S1 for each environment.
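As a small worked illustration of the PAWC definition, the sketch below sums the difference between DUL and LL15 over soil layers, weighting each layer by its thickness. The layer-wise summation, the function name, and the three-layer profile are assumptions for illustration and are not taken from Table S1.

```python
def pawc_mm(layers):
    """Plant available water capacity (mm) of a soil profile.

    layers: iterable of (thickness_mm, dul, ll15) tuples, where dul and ll15
    are volumetric water contents (mm water per mm soil).
    """
    return sum(thickness * (dul - ll15) for thickness, dul, ll15 in layers)


# Hypothetical three-layer profile (not from the study's soil database).
example_profile = [
    (150.0, 0.30, 0.15),
    (150.0, 0.32, 0.17),
    (300.0, 0.33, 0.18),
]
example_pawc = pawc_mm(example_profile)  # about 90 mm for this profile
```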
Initial soil moisture data were not available for all field experiments. Therefore, initial moisture in the entire soil profile was assumed to be full (100% of PAWC) at planting in all field experiments except for the TX locations. This is typical for most corn/soybean [Glycine max (L.) Merr.]-growing areas in the Midwest and Eastern United States during the planting window (i.e., late April to early June; Ojeda et al., 2017). At the TX locations (Bushland, College Station, Corpus Christi, and Weslaco), the entire soil profile was assumed to be at 50% of PAWC at planting. Crop management was configured to reproduce that applied in the experimental plots based on recorded information (Table 1). Four varietal types (GS, SS, PS, and PI) were created based on observed data from 18 hybrids in west-central IN, United States (Yang et al., 2021). The genotypes used by Yang et al. (2021) were clustered in four groups based on thermal time between the end of juvenile stage (100°Cd after emergence) and floral initiation stage (which affects maturity, final leaf number, and potential canopy area), extinction coefficient, RUE, and response to photoperiod (which also affects final leaf number) (Table S2). We compared the varietal characteristics of the 21 genotypes from the validation dataset with these four groups and assigned each genotype one varietal type to be used in the model simulations. All varietal types were used in the model validation except for grain sorghum (GS; Table 1).

| Evaluation of model performance
The model fitness was evaluated by assessing the concordance correlation coefficient (CCC) (Lawrence & Lin, 1989) and the root mean squared error (RMSE) between observed and simulated (Piñeiro et al., 2008) values for biomass. The CCC combines precision through Pearson's correlation coefficient, which represents the proportion of the total variance in the observed data that can be explained by the model, and accuracy by bias which indicates how far the regression line deviates from the concordance line (y = x). This coefficient has been previously used in several crop modeling studies to evaluate model performance (Ojeda et al., 2017;Pembleton et al., 2013;Tedeschi, 2006).
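The two fit statistics can be sketched in a few lines of standard-library Python. The CCC below follows Lin's usual definition, 2·cov / (var_obs + var_sim + (mean_obs − mean_sim)²), using population (divide-by-n) moments; the function names and the toy values in the usage note are illustrative assumptions, not data from this study.

```python
import math


def rmse(observed, simulated):
    """Root mean squared error between paired observed and simulated values."""
    n = len(observed)
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(observed, simulated)) / n)


def concordance_correlation(observed, simulated):
    """Lin's concordance correlation coefficient (precision x accuracy)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_s = sum((s - mean_s) ** 2 for s in simulated) / n
    cov = sum((o - mean_o) * (s - mean_s)
              for o, s in zip(observed, simulated)) / n
    # The squared mean difference penalizes systematic bias away from y = x.
    return 2.0 * cov / (var_o + var_s + (mean_o - mean_s) ** 2)
```

A simulation that is perfectly correlated with the observations but offset by a constant (for example, observed 1, 2, 3 versus simulated 3, 4, 5) has a Pearson correlation of 1 but a CCC of only 0.25, which is why the CCC is preferred over correlation alone when bias matters.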

| pAPSIM simulations
The pSIMS (Elliott et al., 2014) synthesizes crop management, weather, and soil data per grid cell and allows crop model simulations to be run in parallel. pSIMS is a Python-based framework designed to support integration and high-resolution application of any site-based climate impact model that can be compiled in a Unix environment (Elliott et al., 2014). This framework has been previously used to assess climate change impacts on corn grain yield using the Decision Support System for Agrotechnology Transfer (DSSAT; Glotter et al., 2016) and APSIM (Baum et al., 2020). We used a parallelized version of APSIM (pAPSIM) implemented in pSIMS to simulate sorghum biomass at 30 arc-minute resolution over the potential production areas for energy sorghum in the United States. pAPSIM allows for regional-scale simulations by using the Swift parallel scripting language (Wilde et al., 2011) to process APSIM concurrently on several clusters and computer cores. Inputs included historical daily weather data from the AgMERRA climate forcing dataset (Ruane et al., 2015) and soil data from the FAO Harmonized World Soil Database (Nachtergaele et al., 2010) using the parametrization described in Section 2.3.3. Spatial variability of PAWC (from 150 to 400 mm) across the study region is shown in Figure S4.

| Experimental design and sources of variance
A regional simulation experiment was designed as a factorial combination of three sources of variance: (i) 1325 climate×soil grids (30 arc-minute resolution), (ii) four varietal types (GS, SS, PS, and PI), and (iii) two irrigation strategies (rainfed and irrigated). The combination of the three sources of variance resulted in 10,600 combinations of factors. These combinations were simulated over 30 years, which resulted in a total of 318,000 biomass simulations (Figure 1). We fixed all crop management factors over time and space (except planting date, see Section 2.4.3) to separately evaluate the varietal type and irrigation strategy effects on biomass variability from other factors. The varietal types were parametrized using the same clustering explained in Section 2.3.3 for model validation. The automatic irrigation strategy during the crop season was configured to maintain nonlimiting water conditions for the crop under irrigation. A soil water availability factor was calculated every day by dividing actual soil water supply [SW (actual soil water) − LL15] by the potential soil water supply (DUL − LL15). Irrigation was applied whenever this ratio was less than 1 (i.e., when actual supply fell below potential supply). Automatic fertilizer application was set in the model to supply N in response to crop demand so that N stress on growth rates was reduced to 0. Soil water and N were reset to their default values at planting every year. The crop was harvested when it achieved physiological maturity (1204, 1230, 1266, and 1363°Cd from emergence to physiological maturity for GS, PI, SS, and PS, respectively), when it was killed by frost, or at 120 DAP. We used the fixed harvest date (120 DAP) to end the growing season for seasons where the photosensitive varietal type did not achieve physiological maturity or was not killed by frost.
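The size of the factorial design and the irrigation trigger can be checked with a short sketch. The constants come from the text; the function names and the example water contents are hypothetical, and the trigger is a simplified reading of the rule rather than the APSIM manager script itself.

```python
N_GRID_CELLS = 1325
VARIETAL_TYPES = ("GS", "SS", "PS", "PI")
IRRIGATION_STRATEGIES = ("rainfed", "irrigated")
N_YEARS = 30

# 1325 grids x 4 varietal types x 2 irrigation strategies = 10,600 combinations
N_COMBINATIONS = N_GRID_CELLS * len(VARIETAL_TYPES) * len(IRRIGATION_STRATEGIES)
# 10,600 combinations x 30 years = 318,000 simulations
N_SIMULATIONS = N_COMBINATIONS * N_YEARS


def soil_water_availability(sw, ll15, dul):
    """Ratio of actual (SW - LL15) to potential (DUL - LL15) soil water supply."""
    return (sw - ll15) / (dul - ll15)


def irrigation_needed(sw, ll15, dul):
    """Trigger irrigation whenever the availability ratio drops below 1."""
    return soil_water_availability(sw, ll15, dul) < 1.0
```

With hypothetical volumetric water contents of SW = 0.25, LL15 = 0.15, and DUL = 0.35, the ratio is about 0.5 and irrigation would be triggered; at SW = DUL the ratio is 1 and no water is applied.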

| Planting date estimation
Sorghum planting dates in the United States are primarily governed by crop rotation and are adjusted according to thermal conditions (minimum threshold for soil temperature) (Ciampitti et al., 2019). The recommended mean air temperature for optimal germination ranges from 15 to 23°C. Minimum temperatures below 10°C can delay sorghum germination (Anda & Pinter, 1994) and, after germination, the crop does not grow and develop with mean temperatures below 11°C (Hammer et al., 1993). Farmers in the Northern Great Plains/Late Production region (OK, KS, AR, and MO) sow sorghum after the corn (Z. mays L.) planting ends, corresponding to mean temperatures >11°C (Figure S5a). On the other hand, planting dates in the Southern Great Plains/Early Production region (TX, LA, and MS) are also driven by soil moisture and rainfall events around the planting window. For this study, we set up in pAPSIM a variable planting rule at 30 arc-minute resolution based on the median corn planting date, mean air temperature, and rainfall because there is a lack of spatial data for sorghum planting dates at high resolution in the United States. The USDA Corn Crop Progress Maps are gridded geospatial weekly datasets (9 km resolution) which are fully synthetic representations of confidential, county corn planting dates (Rosales, 2021; Figure S5b). We used the 2020 USDA Corn Crop Progress Map to estimate the planting date windows for sorghum at 30 arc-minute resolution in the study region. A sorghum planting window from 15 to 45 days after the median corn planting date was applied for both irrigation strategies (rainfed, Figure S6a; irrigated, Figure S6b). Planting was executed when the mean air temperature was at least 11°C for 7 consecutive days. Under rainfed conditions, there was an additional requirement of at least 30 mm of cumulative rainfall during 14 consecutive days to sow the crop.
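The planting rule can be sketched as a day-by-day search within the window. This is an illustrative reading, not the pAPSIM manager logic: the function names are hypothetical, the 7-day and 14-day criteria are assumed to be evaluated over the days ending on the candidate planting day, and returning None when no day qualifies is our assumption (the text does not say what happens in that case).

```python
def _last_n(series, idx, n):
    """Values for the n days ending at index idx (empty if history is too short)."""
    start = idx - n + 1
    return series[start: idx + 1] if start >= 0 else []


def first_planting_day(median_corn_doy, mean_temp, rainfall, rainfed,
                       temp_threshold=11.0, warm_days=7,
                       rain_threshold=30.0, rain_days=14):
    """Return the first day of year that satisfies the planting rule, or None.

    mean_temp, rainfall: daily series for one year, index 0 = day of year 1.
    The window runs from 15 to 45 days after the median corn planting date.
    """
    for doy in range(median_corn_doy + 15, median_corn_doy + 45 + 1):
        idx = doy - 1
        if idx >= len(mean_temp):
            break
        temps = _last_n(mean_temp, idx, warm_days)
        warm = len(temps) == warm_days and min(temps) >= temp_threshold
        wet = True  # irrigated crops have no rainfall requirement
        if rainfed:
            rain = _last_n(rainfall, idx, rain_days)
            wet = len(rain) == rain_days and sum(rain) >= rain_threshold
        if warm and wet:
            return doy
    return None
```

Under this reading, an irrigated crop is sown as soon as seven warm days have accumulated inside the window, while a rainfed crop in the same grid cell may be sown later, or not at all, depending on rainfall.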

| Environmental clustering
In order to study the G×E×M interactions, environmental clusters were created with the K-means algorithm (Kumar et al., 2011; Vassilvitskii & Arthur, 2006) using scikit-learn in Python 3.8.5 (https://www.python.org). K-means groups similar data points together to discover underlying patterns. The clustering was applied using the sum of intercepted solar radiation (ir; Equation 1; Figure 2a), the mean of the temperature stress factor (tp; Equation 2), and the mean of the water stress factor (sw; Equation 3):

$$ir = \sum_{t=t_0}^{t_1} radn_t \times (I/I_o)_t \qquad (1)$$

where radn is the daily solar radiation (MJ m−2) and I/Io is the fraction of solar radiation that the crop intercepts each day,

$$tp = \frac{1}{n} \sum_{t=t_0}^{t_1} ft_t \qquad (2)$$

where ft is the temperature response of photosynthesis (i.e., RUE in the model), which varies from 1 between 20 and 35°C to 0 when the mean air temperature is <8 or >50°C, and

$$sw = \frac{1}{n} \sum_{t=t_0}^{t_1} fw_t \qquad (3)$$

The soil water availability factor (Section 2.4.2) was used to derive the stress factor for leaf expansion (fw). When fw = 0, the crop was under complete water stress and when fw = 1, there was no crop water stress. In all cases, t0 and t1 indicate the start and end of each period considered in the calculation (planting, 40 DAP, and 80 DAP for t0 and 40 DAP, 80 DAP, and harvest for t1), and n is the number of days in the period. The cluster number (k = 3; Figure 2j) was determined based on the Elbow method for optimal clustering (Bholowalia & Kumar, 2014; Ding & He, 2004; Figure S7) based on the patterns of ir, tp, and sw during the crop growing season across the study region (Table 2). The clusters were named according to the mean temperature during the three periods assessed (P-40, 40-80, and 80-H) as high (high-temperature), mid (mid-temperature), and low (low-temperature; Figure 2k).
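The three clustering variables (ir, tp, and sw) can be computed per period from daily model outputs, as in the sketch below. The function names are hypothetical, and the linear ramps of ft between 8 and 20°C and between 35 and 50°C are an assumption: the text only gives the temperatures at which ft equals 1 and 0.

```python
def temperature_factor(tmean):
    """Temperature response ft: 1 between 20 and 35 C, 0 below 8 or above 50 C.

    Assumption: ft changes linearly between these cardinal temperatures.
    """
    if tmean <= 8.0 or tmean >= 50.0:
        return 0.0
    if tmean < 20.0:
        return (tmean - 8.0) / 12.0
    if tmean <= 35.0:
        return 1.0
    return (50.0 - tmean) / 15.0


def period_indices(radn, intercepted_fraction, tmean, fw, t0, t1):
    """Return (ir, tp, sw) for days t0 (inclusive) to t1 (exclusive).

    All inputs are daily lists indexed by days after planting (DAP).
    ir is a sum of intercepted radiation; tp and sw are period means.
    """
    days = range(t0, t1)
    n = len(days)
    ir = sum(radn[d] * intercepted_fraction[d] for d in days)
    tp = sum(temperature_factor(tmean[d]) for d in days) / n
    sw = sum(fw[d] for d in days) / n
    return ir, tp, sw
```

Calling `period_indices` three times per simulated season, with (t0, t1) set to (0, 40), (40, 80), and (80, harvest), gives the nine values per grid cell that would then be passed to a clustering routine such as K-means.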

| Spatial visualization, summary statistics, and correlation analysis
The simulated biomass output by pAPSIM was extracted from NetCDF files. Only data above the 2.5th percentile (5.8 t ha−1) of the entire dataset were considered for the analysis. This biomass threshold covers the minimum operational costs of sorghum production in the United States (315 USD ha−1; USDA NASS, 2021a) at a reference sorghum biomass market price of 65 USD t−1 (Sun et al., 2020). The selected cutoff point is also aligned with previous crop failure definitions for grain sorghum of 3 t ha−1 (Ciampitti et al., 2020; Grains Research & Development Corporation, 2017; Whish et al., 2005). Simulated biomasses below this threshold were considered crop failures and quantified by the failure risk, calculated as the number of years with crop failure/30 (total number of years that the crop was sown) × 100. We mapped the spatial patterns of simulated biomass (mean and SD) for each varietal type and irrigation strategy at a 30 arc-minute resolution across the study region using the xarray, Cartopy, GeoPandas, GDAL, Fiona, and Shapely Python packages. Summary statistics and plotting by environments (high, mid, and low) were also developed using scipy.stats.linregress, pandas, NumPy, and Matplotlib. We calculated Pearson's correlation coefficient (across years for a given grid cell) between the simulated biomass and climate variables (global solar radiation sum, rainfall sum, and mean daily temperature from planting to harvest) and stress factors (sum of ir, mean tp, and mean sw for P-40, 40-80, and 80-H) for each varietal type×irrigation combination.
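The failure-risk calculation and the economic reasoning behind the 5.8 t ha−1 cutoff can be written out as a short sketch. The threshold, cost, and price are the values quoted in the text; the function names and the example series in the usage note are illustrative assumptions.

```python
FAILURE_THRESHOLD_T_HA = 5.8    # 2.5th percentile of simulated biomass (t ha-1)
OPERATIONAL_COST_USD_HA = 315.0
BIOMASS_PRICE_USD_T = 65.0


def failure_risk(biomass_by_year, threshold=FAILURE_THRESHOLD_T_HA):
    """Percentage of simulated years with biomass below the failure threshold."""
    failures = sum(1 for biomass in biomass_by_year if biomass < threshold)
    return failures * 100.0 / len(biomass_by_year)


def gross_revenue(biomass_t_ha, price=BIOMASS_PRICE_USD_T):
    """Gross revenue (USD ha-1) from a given biomass at the reference price."""
    return biomass_t_ha * price
```

At the threshold, gross revenue is 5.8 × 65 = 377 USD ha−1, which exceeds the 315 USD ha−1 operational cost. A grid cell with 24 of 30 seasons below the threshold would have a failure risk of 80%.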

| Variance partitioning
To determine the contribution of each source of variance to the total biomass variability, two variance-based sensitivity indices were computed for each grid cell and growing season (Monod et al., 2006). According to this method, the variance of the simulated crop biomass is partitioned into fractions which can be attributed to each source. Specifically, variance sources were global solar radiation, mean temperature, rainfall, varietal type, irrigation strategy, planting date, and PAWC. Two indices, namely Main Effect (Equation 4) and Total Effect (Equation 5), were used to separate the variance caused by one source from the variance caused by the interaction:

$$\text{Main Effect}_i = \frac{\operatorname{Var}\left(E[\text{Biomass} \mid X_i]\right)}{\operatorname{Var}(\text{Biomass})} \qquad (4)$$

and

$$\text{Total Effect}_i = 1 - \frac{\operatorname{Var}\left(E[\text{Biomass} \mid X_{-i}]\right)}{\operatorname{Var}(\text{Biomass})} \qquad (5)$$

where $E[\text{Biomass} \mid X_i]$ denotes the expected value of sorghum biomass for a given level of source $X_i$ (i.e., averaged across all other sources) and $E[\text{Biomass} \mid X_{-i}]$ is the expected value of sorghum biomass for given levels of all sources except $X_i$. In other words, Main Effect explains the share of the components to crop biomass variability without interactions, that is, if Main Effect = 1, the assessed factors explain the entire proportion of sorghum biomass variability, but if Main Effect < 1, residuals exist, which means additional factors are needed to explain this variability. Total Effect represents the interaction of a given factor with other factors, that is, high Total Effect values for a given factor denote high interactions of that factor with other factors; therefore, Total Effect does not include residuals. We used percentile thresholds to classify environmental factors (global solar radiation, mean temperature, rainfall, and PAWC) into three categories of low, medium, and high, that is, the low class is defined as below the 33rd percentile, medium as between the 33rd and 66th percentiles, and high as above the 66th percentile of the distribution of values.
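The two variance-based indices can be illustrated on a small balanced factorial design using only the Python standard library. The sketch assumes the standard first-order and total-order definitions, Var(E[Y | X_i]) / Var(Y) and 1 − Var(E[Y | X_−i]) / Var(Y); the function name, the data layout, and the toy example are hypothetical and are not the study's analysis code.

```python
from collections import defaultdict
from statistics import mean, pvariance


def sensitivity_indices(factor_names, runs):
    """Main and total effects for a balanced full-factorial design.

    runs: list of (levels, biomass), where levels is a tuple holding one
    level per factor in the order given by factor_names.
    Returns {factor_name: (main_effect, total_effect)}.
    """
    total_var = pvariance([y for _, y in runs])
    indices = {}
    for i, name in enumerate(factor_names):
        by_factor = defaultdict(list)   # runs sharing the level of X_i
        by_others = defaultdict(list)   # runs sharing every level except X_i
        for levels, y in runs:
            by_factor[levels[i]].append(y)
            by_others[levels[:i] + levels[i + 1:]].append(y)
        main = pvariance([mean(g) for g in by_factor.values()]) / total_var
        total = 1.0 - pvariance([mean(g) for g in by_others.values()]) / total_var
        indices[name] = (main, total)
    return indices
```

When the factors act additively, the main and total effects of each factor coincide and the main effects sum to 1; when the response depends on a product of factors, the total effect exceeds the main effect, and the gap measures the interaction.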

| Field-scale modeling
The observed biomass varied between environments and genotypes from 0.5 t ha−1 in Columbia City, IN (genotype Dale) to 34.7 t ha−1 in Roper, NC (ES5200; Figure 3a,b). The analysis of the field experimental data showed that observed biomass strongly varied for a given genotype (SD from 1.7 t ha−1 for SPSordan79 to 6.0 t ha−1 for M81E) or environment (SD from 2.  Table S4). It is important to note that this summary performance of the model may be affected by the imbalance in the presence of each genotype in each environment (Figure S3). The SD of simulated biomass (5.4 t ha−1) was close to the SD in the observed field data for final harvests (5.5 t ha−1). The accuracy of APSIM was higher when simulating in-season sorghum biomass of 18 genotypes at West Lafayette, IN (i.e., harvests within the growing season; RMSE = 3.4 t ha−1, CCC = 0.86; Figure 3c; Table S5; Yang et al., 2021) than final biomass in other environments (Figure 3b). The genotypes had a bigger effect on the deviation between simulated and observed values than environments (Figure 3d,e).

| Spatial variability of sorghum biomass
The mean, SD, and coefficient of variation (CV) of simulated biomass were markedly influenced by environment, varietal type, and irrigation strategy (Table 3). Density distributions consistently showed differences in sorghum biomass between varietal types (negative skew for GS and PS) across environments and illustrated the reduction of CV of biomass under the irrigated strategy (Figure 4). Although PI showed the highest simulated sorghum biomass under the irrigated high environments (27.5 t ha−1 in Northwest TX), this G×E combination showed the highest CV of biomass under rainfed conditions (26.1%; Figure 5). PS exhibited the lowest simulated sorghum biomass under the rainfed low environments (5.8 t ha−1, center WI; Figure 5). Under rainfed conditions, the mean simulated biomass ranged from 11.1 t ha−1 for GS under the low environments to 15.7 t ha−1 for PI under the mid environments (Figure 5; Table 3). On average, SD of simulated biomass was less for GS (2.0 t ha−1) in comparison with SS (2.3 t ha−1), PS (2.2 t ha−1), and PI (3.0 t ha−1; Figure 5). Under irrigated conditions, the mean simulated biomass ranged from 11.7 t ha−1 for GS under the low environments to 21.6 t ha−1 for PI under the high environments (Figure 5; Table 3). The range of SD of biomass was reduced under irrigated conditions and it varied from 1.5 t ha−1 (GS) to 1.9 t ha−1 (PS). Under the mid environments, rainfed biomass was highest in the states of NE, IL, and IN (Figure 5).

TABLE 2 Mean and standard deviation (SD) of intercepted solar radiation sum (ir), temperature stress factor (tp), and water stress factor (sw) from planting to 40 days after planting (DAP; P-40), from 40 to 80 DAP (40-80), and from 80 DAP to harvest (80-H) in the high, mid, and low environments. Only data from the rainfed treatment were used for this environmental analysis.
Spatial variance of flowering date and harvesting date changed mainly due to varietal type and environment (Figure S10). On average, we found a negative correlation for biomass versus flowering date and a nonsignificant correlation for biomass versus harvesting date (Figure S11). Although PS showed the maximum flowering date and cycle length, on average this varietal type did not show the maximum sorghum biomass (Table 3). Using the 2.5th percentile across all environments as a "failure" cutoff (seasonal crop biomass < 5.8 t ha−1), it was found that most "failures" occurred in the low environments (82.4% of the total failed seasons including rainfed and irrigated scenarios; Figure 6). For low environments, the spatial distribution of crop failures reflected the incidence of mean temperature less than ~20°C and the effect was independent of irrigation status. However, under rainfed conditions, it was related to soil moisture condition around planting in the mid- (9.1% of the total failures) and high environments (8.5% of the total failures), mainly in the west areas of TX, OK, and KS (Figure 6). Maximum failure risks (>80%, i.e., the crop failed in 24 years out of 30) were found for PS in the low environments under high latitude areas such as west NY (Figure 6).

FIGURE 3 Top: (a) Distribution of observed (Obs) and simulated (Sim) biomass (t ha−1) and Obs versus Sim biomass (t ha−1) for the APSIM validation datasets for (b) final harvests (this study) and (c) in-season harvests reported by Yang et al. (2021). Final harvest indicates crop harvest carried out at physiological maturity or end of season. In-season harvest indicates harvests carried out during the growing season from 31 to 94 days after planting. In panels (b) and (c), the solid gray line represents the 1:1 line, that is, y = x, and the solid black line the regression line fitted to the complete dataset. A complete description of the validation dataset is provided in Table 1. Bottom: Panels (d) and (e) show the deviation between simulated and observed biomass (t ha−1) at final harvest for every genotype (across environments) and every environment (across genotypes), respectively. The black line in panels (d) ...
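The failure analysis described above rests on two steps: a cutoff taken as the 2.5th percentile of all simulated seasons, and a failure risk per grid cell computed as the share of seasons below that cutoff. The following sketch shows one way to express this. The helper names, the linear-interpolation percentile rule, and the input values are assumptions for illustration, not the procedure or data used in the study.

```python
def percentile(values, q):
    """Percentile (q in 0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def failure_risk(biomass_by_cell, cutoff):
    """Share of seasons (%) in each grid cell with biomass below the cutoff."""
    return {
        cell: 100.0 * sum(b < cutoff for b in seasons) / len(seasons)
        for cell, seasons in biomass_by_cell.items()
    }


# Hypothetical seasonal biomass (t ha-1) for two grid cells
biomass_by_cell = {
    "cell_a": [4.0, 5.0, 12.0, 14.0],
    "cell_b": [15.0, 16.0, 18.0, 20.0],
}
all_seasons = [b for seasons in biomass_by_cell.values() for b in seasons]
cutoff = percentile(all_seasons, 2.5)   # pooled 2.5th percentile as failure cutoff
risk = failure_risk(biomass_by_cell, cutoff)
print(cutoff, risk)
```

With 30 simulated years per cell, a risk of 80% corresponds to 24 failed seasons, which matches the interpretation given in the text.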

| G×E×M drivers of simulated sorghum biomass
Overall, simulated biomass showed different responses to spatial variability in climate factors (global solar radiation, mean temperature, and rainfall) for different varietal types and irrigation strategies across the study region (Figure S12). Main spatial differences were found between the low environments of the northern regions and the other regions and their differential response to irrigation strategies (Figure S12). In the high environments (i.e., high mean temperature environments, most of TX, OK, KS, AR, MO, KY, TN, FL, and South IL and IN and West of SC, NC, and VA), there was a positive correlation between irrigated biomass and global solar radiation (r = 0.57-0.76), maximum temperature (r = 0.41-0.66), and irrigation (r = 0.54-0.75), while the biomass was negatively correlated with planting date (r from no correlation to −0.43; Figure 7a). Under rainfed conditions, the correlation between biomass and all factors was negative (r up to −0.7 for maximum temperature), except for rainfall (r = 0.66-0.75) and PAWC (r = 0.26-0.33). Under the mid environments (i.e., mid mean temperature environments, most of NE, IA, AL, GA, West SC, NC, VA, and OH, and Central and North IN and IL), the pattern of correlation between biomass and environmental and crop management factors was similar to that of the high environments, except for the effect of minimum temperature, which was positive (r = 0.33-0.74), and PAWC, which was negligible (r from no correlation to 0.22). In contrast, the correlations of simulated biomass with global solar radiation and with maximum and minimum temperatures were positive for the low environments (i.e., low mean temperature environments, North MN and WI and most of MI, OH, PA, and NY) under rainfed and irrigated scenarios (r up to 0.77). For all environments and varietal types, the correlation between biomass and irrigation was positive (r = 0.51-0.75). The drivers of biomass variability differed between varietal types mainly in the mid environments under rainfed conditions. For example, the correlation between biomass and minimum temperature was positive for SS and PS, and negligible for the other varietal types under these environments (Figure 7a).

Simulated biomass showed different responses to variations in stress factors (ir, tp, and sw) for different varietal types, irrigation strategies, periods of the growing season, and environments (Figure 7b). Under the high environments, rainfed biomass was positively correlated with sw and irrigated biomass with ir during the complete crop growing season although, between 40 and 80 DAP, the correlations were the highest for all varietal types (r = 0.64-0.86; Figure 7b). Similar correlations between biomass and stress factors were found in the mid environments, except for tp, which showed a positive correlation for all varietal types, particularly for PS (r = 0.71). Under the low environments, ir and tp outweighed the effect of sw for all varietal types and irrigation strategies (r = 0.31-0.82; Figure 7b). A complete list of linear correlation coefficients (including p-values) between biomass and environmental and crop management variables can be found in Table S6.

TABLE 3 Mean planting date (PD), flowering date (FD), harvesting date (HD) and mean, standard deviation (SD), and coefficient of variation (CV) of simulated biomass in the high, mid, and low environments (Envs) for each varietal type (VT) [grain sorghum (GS), sorghum sudangrass (SS), photosensitive sorghum (PS), and photo-insensitive sorghum (PI)] and irrigation strategy (Irr) [rainfed (R) and irrigated (I)]

FIGURE 6 Failure risk of sorghum biomass (%) under the rainfed (top row) and irrigated strategy (bottom row) for grain sorghum (GS), sorghum sudangrass (SS), photosensitive sorghum (PS), and photo-insensitive sorghum (PI) at 30 arc-minute resolution across the potential areas for energy sorghum in the United States. Failures include years when the crop biomass was <5.8 t ha−1. Values above each map indicate the percentage of failures with respect to all simulated grid×varietal type×irrigation strategy×year combinations (n = 318,000). Gray areas indicate no crop failure occurred. Scatterplots (right side) show the relationship between biomass (t ha−1) and mean temperature (°C; Mean Temp) during the growing season (rainfed top, irrigated bottom) for the failure years

FIGURE 7 (a) Mean correlations (from −1 to 1) between simulated biomass (t ha−1) and global solar radiation sum (MJ m−2; Rad), mean daily maximum temperature (°C; Tmax), mean daily minimum temperature (°C; Tmin), rainfall sum (mm; Rain), maximum plant available water capacity (mm; PAWC), planting date (Julian Day; PDate), and irrigation applied (mm; Irr); and (b) mean correlations between simulated biomass (t ha−1) and sum of intercepted solar radiation (MJ m−2; ir), mean temperature stress factor (0-1; tp), and mean water stress factor (0-1; sw) from planting to 40 days after planting (P-40), from 40 to 80 DAP (40-80), and from 80 DAP to harvest (80-H)
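The period-wise correlations between biomass and stress factors reported here are linear (Pearson) correlation coefficients. A minimal sketch of how such coefficients could be obtained for each growth period is given below. The function `pearson_r`, the dictionary layout, and all values are hypothetical and serve only to illustrate the calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical seasonal biomass (t ha-1) and mean water stress factor (sw, 0-1)
# for the three periods used in the text: P-40, 40-80, and 80-H
biomass = [10.0, 12.0, 15.0, 19.0]
sw_by_period = {
    "P-40": [0.9, 0.8, 0.9, 0.8],
    "40-80": [0.4, 0.5, 0.7, 0.9],
    "80-H": [0.6, 0.7, 0.6, 0.8],
}
r_by_period = {period: pearson_r(sw, biomass) for period, sw in sw_by_period.items()}
for period, r in r_by_period.items():
    print(f"{period}: r = {r:.2f}")
```

Repeating the calculation for ir and tp, and for each varietal type and irrigation strategy, would give a matrix of coefficients comparable in layout to Figure 7b.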

| Partitioning of the sorghum biomass variance
To identify which components were associated with crop biomass variance, we partitioned the variance of simulated biomass among different G×E×M factors (Figure 8). When we compared the results by environment and irrigation strategy, varietal type explained the greatest share of variance in biomass, but its contribution differed between environments, mainly due to differences in their climatic patterns (Figure S1). Biomass variance was mainly explained by varietal type (50%-76%) in all environment × irrigation strategy combinations, except in the high- and mid environments under rainfed conditions, where rainfall had the major effect (25% and 45% for mid- and high environments; Figure 8a). However, when mean temperature during the growing season decreased from 25°C (high environments) to 18°C (low environments), the contribution of mean temperature to biomass variance increased from 7% to 34% (rainfed) and from 4% to 36% (irrigated). The main effect of planting date never exceeded 4% (Figure 8a), although a stronger negative correlation was found for the high environments in comparison with the other environments (Figure S13). Particularly for rainfed environments, we found higher residual components (5%-14%) than for irrigated environments (4%), indicating that additional factors explained some of the biomass variance. Varietal type had the largest interactions with other factors independently of the environment and irrigation strategy (Figure 8b).
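The idea of attributing shares of biomass variance to individual factors can be illustrated with an ANOVA-style decomposition, in which the between-level sum of squares of each factor is expressed as a percentage of the total sum of squares. This is a sketch under stated assumptions: this section does not describe the exact partitioning procedure, so the function `main_effect_shares`, the balanced 2 × 2 design, and the biomass values are hypothetical, and interaction terms are not computed.

```python
from collections import defaultdict


def main_effect_shares(records, factors, response="biomass"):
    """Percent of the total sum of squares explained by each factor's main effect."""
    values = [r[response] for r in records]
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    shares = {}
    for factor in factors:
        groups = defaultdict(list)
        for r in records:
            groups[r[factor]].append(r[response])
        # Between-level sum of squares for this factor
        ss_factor = sum(
            len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values()
        )
        shares[factor] = 100.0 * ss_factor / ss_total
    return shares


# Hypothetical balanced factorial: varietal type (vt) x irrigation strategy (irr)
records = [
    {"vt": "GS", "irr": "R", "biomass": 10.0},
    {"vt": "GS", "irr": "I", "biomass": 12.0},
    {"vt": "PI", "irr": "R", "biomass": 18.0},
    {"vt": "PI", "irr": "I", "biomass": 20.0},
]
shares = main_effect_shares(records, ["vt", "irr"])
print(shares)
```

In this additive toy example the two main effects sum to 100%. In a real factorial simulation, the remainder after the main effects would be split between interactions and the residual component.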

| DISCUSSION
Understanding the sources of sorghum biomass variability in the United States assists in developing confidence in bioenergy projections to achieve the 58 million tons of biomass sorghum in 2040 targeted by the Department of Energy 2016 Billion-Ton Report (Langholtz et al., 2016). This study quantified the spatial variability of sorghum biomass, the associated G, E, and M drivers and main sources of variance under contrasting varietal types and irrigated conditions in the potential areas for energy sorghum in the United States. With a better understanding of the relative contribution of each biomass driver, specific crop practices can be implemented to ensure the biomass variability is adequately reduced and potential production can be achieved. Further, the environmental classification and simulation analysis provide a baseline to understand the impacts of climate change on sorghum biomass production.

| Field-scale modeling
Previous evaluations of APSIM reported RMSE for sorghum biomass estimations in the range of 1.5-1.6 t ha−1 (Akinseye et al., 2017; Carcedo & Gambin, 2019). The results reported by Yang et al. (2021) [...] environments (RMSE = 4.4 t ha−1; CCC = 0.68; Figure 3). These differences could be related to the fact that the soil fertility and initial soil conditions at West Lafayette, considered excellent for corn production, were also optimal for sorghum growth, and that the measurements and crop management records were of high quality. By comparison, the other Indiana sites would be considered unsuitable for corn production (Bu and CC) or moderately good for corn (La). For example, Kent et al. (2020) reported RMSE = 4.36 t ha−1 and CCC = 0.41 comparing rainfed biomass from DAYCENT with experimental data, while Gautam et al. (2020) found RMSE = 5.5 t ha−1 comparing simulated rainfed biomass with county observed datasets from the US National Agricultural Statistics Service with the same model. APSIM responded satisfactorily to multiple genotypes and environments (climate-soil-management combinations) across a wide range of biomass observations (0.5-34.7 t ha−1). The model's fitness-for-purpose for this paper was further supported by the spatial patterns of sorghum biomass in the potential production areas for energy sorghum in the United States, which align with field experimental data across the country (Figure 5).
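The two agreement statistics quoted in this comparison can be computed from paired observed and simulated values. The sketch below assumes that CCC denotes Lin's concordance correlation coefficient, computed with population (1/n) variances. The function names and the example values are hypothetical and are not data from the validation datasets.

```python
from math import sqrt


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def ccc(obs, sim):
    """Lin's concordance correlation coefficient (assumed meaning of CCC)."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    var_o = sum((o - mo) ** 2 for o in obs) / n
    var_s = sum((s - ms) ** 2 for s in sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / n
    # Penalizes both scatter and systematic departure from the 1:1 line
    return 2.0 * cov / (var_o + var_s + (mo - ms) ** 2)


# Hypothetical observed and simulated biomass (t ha-1)
obs = [10.0, 15.0, 20.0]
sim = [12.0, 15.0, 18.0]
print(f"RMSE = {rmse(obs, sim):.2f} t ha-1, CCC = {ccc(obs, sim):.2f}")
```

Unlike the Pearson correlation, CCC falls below 1 when simulated values are biased or scaled relative to observations, which is why it complements RMSE in model evaluation.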
Lack of specificity in model inputs is a crucial source of uncertainty in crop yield estimations (Fleisher et al., 2017; Ojeda et al., 2021a, 2021b, 2021c). Part of the prediction error found in this study (Figure 3) is explained by the fact that field experiments used for model validation were not specifically designed for modeling purposes, and therefore, there were gaps in model inputs such as a lack of measured soil data to initialize the model or of crop management records. This highlights the need for minimum dataset guidelines to be developed for future studies that ensure measurement of key soil attributes (e.g., soil moisture, NO3, and NH4 before planting) and thereby enhance the reuse of biophysical data in calibration and validation of crop models.

| Spatial variability of sorghum biomass
Sorghum biomass was simulated over a period of 30 years at 1325 locations (climate×soil combinations) across the potential sorghum cropping areas in the United States using present planting dates, nonlimiting N fertilization, and recommended crop management practices (Ciampitti et al., 2019, 2020). The outcomes of this research quantify the sensitivity of sorghum biomass to varietal type and irrigation strategy in different environments. For potential sorghum cropping areas in the United States (Langholtz et al., 2016), on average, varietal type (first), rainfall, and mean temperature (second) were the most important drivers of sorghum biomass variability (Figure 8). This was also reflected in the probability density distribution (Figure 4) and in the spatial pattern of simulated biomass across the high-, mid-, and low environments evaluated (Figure 5). These findings emphasize that potential sorghum biomass can be achieved (up to 27.5 t ha−1) when (i) the crop is fully irrigated (i.e., preventing water stress) and (ii) varietal types with high RUE (RUE > 1.3 MJ m−2), high extinction coefficient (ec > 0.56), and relatively short timing to flowering (<80 DAP; e.g., PI and SS) are used in water-limited environments, for example, West TX, OK, or KS.
Aside from Huntington et al. (2020), most regional biomass estimations in the United States have been generated under rainfed conditions (Gautam et al., 2020; Langholtz et al., 2016). As an opportunity crop, sorghum (predominantly for grain) is mainly grown under nonirrigated conditions (<38% of the area under irrigation; USDA NASS, 2021b), mostly in Texas (Ciampitti et al., 2019). Field experiments have demonstrated that irrigation can increase energy biomass production in several water-limited regions of the United States such as the Southern Great Plains (Yimam et al., 2015), the Midwest (Roby et al., 2017), and the Southeast (Rocateli et al., 2012). Huntington et al. (2020) recently projected irrigated biomass production to 2099 using machine learning algorithms based on climate variables. They found a range of 25-27.5 t ha−1 of sorghum biomass for West TX, OK, and KS, where irrigation was the most important driver of biomass variability. Water is becoming more limiting in many of the regions where sorghum is now grown (TX, OK, KS, and NE) as the Ogallala aquifer becomes depleted. This will limit irrigation options in these regions or make irrigation uneconomical, including for biomass production. In this scenario, our research can help to identify suitable agroecozones where sorghum will be adapted for rainfed production. Gautam et al. (2020) reported higher rainfed biomass estimations (up to 19 t ha−1) than our analysis for similar US environments (mainly in LA, AR, and MO) using the DAYCENT model.
The higher biomass estimations of that study compared to our results could be related to the fact that they used (i) a biogeochemical model instead of a crop model, with the associated differences in model structure, (ii) different climate (Global Historical Climatology Network, GHCN) and soil (US National Cooperative Soil Survey, SSURGO) data sources as input to the model, which may generate differences in data resolution and parameter estimations, and (iii) uniform genotype parameters across all environments. This contrast reinforces the value of exploring a wide range of G×E×M combinations when current and future simulations of sorghum biomass are generated at regional scale, especially for irrigated conditions.

Sorghum crop failures can be driven by different factors associated with the growing environment (Hammer et al., 2014). For the low environments, crop failures were associated with low minimum temperatures that killed the crop (minimum temperatures less than 0°C) or delayed crop development due to slow thermal time accumulation. For example, when the crop was harvested before the flag leaf stage, biomass was significantly reduced and the crop failed. This effect was greater for photoperiod-sensitive types (Figure 6) due to their greater thermal time requirement from the end of the juvenile phase to floral initiation with respect to other varietal types (Table S2). On the other hand, in the mid- and high environments, all crop failures occurred under rainfed conditions (Figure 6), mainly in soils with low PAWC in the first soil layers (data not shown). The main drivers of these crop failures were low temperatures and low soil moisture around planting and small and less frequent rainfall events at the beginning of the growing season. APSIM has a time threshold after planting which limits the crop emergence (150°Cd −1). Therefore, when these environmental conditions occurred and emergence was delayed after planting, the crop did not emerge.

| G×E×M drivers of simulated sorghum biomass
Spatial gradients in global solar radiation, mean temperature, and rainfall, and their corresponding stress factors (ir, tp, and sw) influenced sorghum biomass under both irrigated and rainfed conditions (Figure 7; Figure S12). Similar multiscale environmental crop response characterizations have been previously reported for corn in the United States (Jin et al., 2017) and New Zealand (Teixeira et al., 2017), for sunflower (Helianthus annuus L.) in France (Casadebaig et al., 2020), and for wheat (Triticum aestivum L.; Chenu et al., 2013) and potatoes (Solanum tuberosum L.; Ojeda et al., 2021b; Ojeda et al., 2020) in Australia. However, in this manuscript we combined environmental clustering based on three crop stressors (ir, tp, and sw) with correlation and variance partitioning analysis to assess sorghum biomass variability at a large scale in potential growing environments of the United States. For the high- and mid environments (Figure 2), our findings suggest that irrigation is needed to achieve potential biomass production and, under rainfed conditions, the right selection of varietal type and planting date is key to starting the season with high moisture levels and reducing the interannual variability of biomass (Figure 5; Table 3) and failure risk (Figure 6). However, under the low- and mid environments (e.g., irrigated areas in NE), the effect of temperature on RUE and radiation interception after 40 DAP play the most important role in achieving high crop biomass (Figure 7b). Therefore, early planting dates (though avoiding the last spring frost) and the use of varietal types with shorter growth cycles (such as PI; Figure S10) would benefit the crop growing conditions under these environments.
The variance partitioning method is a sensitivity analysis that allows quantification of the variance contribution of selected input variables to model outputs such as crop grain yield or biomass (Monod et al., 2006; Ruget et al., 2002). This method has been applied mainly to quantify the contribution of various climatic factors to corn and wheat grain yield under European environments (Webber et al., 2018). Based on our results, we identified that for environments where the crop is not constrained by temperature (high environments) and water (i.e., when it is irrigated), varietal type [individually (Figure 8a) and combined with other factors (Figure 8b)] outweighed the contribution of other crop management factors (planting date) and environmental factors (solar radiation, mean temperature, rainfall, and PAWC) to crop biomass variability. However, where mean temperature had a major influence on crop growing conditions (low- and mid environments), it also contributed substantially to biomass variance (Figure 8a). These results highlight the importance of including the genotype as a factor when a factorial simulation analysis is configured to assess biomass variability at large scales, where results may vary across environments.

| Limitations of this study and future research
The study region used in this paper (Figure 2) included environments where sorghum is not currently grown in the United States (Ciampitti et al., 2019(Ciampitti et al., , 2020. The regional modeling analysis was implemented over the potential areas of energy sorghum production across the United States (Langholtz et al., 2016) where low mean temperature environments are included which are not ideal for sorghum biomass production (Hammer et al., 1993). Surprisingly, irrigated biomass productions up to 22.2 t ha −1 were predicted with PI genotypes under these environments (e.g., central-south MN and south MI; Figure 5). However, no validation data were found for model testing under these environments ( Figure S2). Further field datasets for crop model validation are required to improve the confidence of APSIM estimations from these environments.
Planting date is a major factor in determining the potential biomass production of any crop across environments (Baum et al., 2020) and usually information about this factor is scarce at high-resolution spatial levels (Minoli et al., 2019). The objective of this study was not to assess the effect of planting dates on biomass variability. However, this factor is an essential input for the model which defines the environment when the crop will grow.
In addition to climate, soil, genotype, and irrigation strategy, other crop management factors such as N fertilization (form, timing, method of application), initial soil moisture, other plant nutrients (P, K, and S) and pH, and previous crop may affect biomass differently, and the influence of these factors was not investigated in this study. For example, a reset rule for soil water and N was applied every year to avoid carry-over effects from previous years. However, sorghum farmers use different crop rotations (single or double crops) based on their cropping system (grain, silage, dual-purpose, and energy; Olson et al., 2012; Rooney, 2004), which vary according to the region, farm history, or land tenure. For example, over 70% of the farmland in IN is not farmed by the landowner; instead, it is rented, often on annual contracts, which makes it difficult to record information about previous crops (NASS, 2014).
This study was conducted using a single crop model (APSIM Classic 7.9) under the pSIMS framework (pAPSIM), and findings may differ with other crop models and gridded platforms, which vary in structure and complexity (Chapagain et al., 2020; Thorburn, 2017). Our results were presented at only one spatial resolution level (30 arc-minutes) due to the availability of long-term climate and soil data. However, pAPSIM simulations could increase the spatial detail in future studies where climate, soil, and crop management information are available at higher resolution levels than that used in this study.

| CONCLUSIONS
This work was the first attempt to validate and apply the APSIM sorghum module across a wide range of G×E×M for predicting sorghum biomass at regional scale in the United States. This study demonstrates: (1) Maximum sorghum biomass can be achieved by combining varietal types with high RUE and ec and potential irrigation management (non-water stress) in the high-temperature environments of the United States. (2) To expand the energy sorghum-growing areas to low-temperature environments, special attention needs to be paid to crop management practices that improve the environmental conditions during the growing season (optimum temperatures, availability of solar radiation, and soil water under rainfed conditions), such as the use of varietal types with a short growth period or the selection of planting dates according to the regional frost calendar. (3) Varietal type parameterization was critical for explaining differences in biomass for both irrigation strategies under contrasting environments. (4) The environmental clustering based on stress factors and the variance contribution analysis are needed to discriminate the main G×E×M drivers of biomass variability and to prioritize farm decisions.

ACKNOWLEDGMENTS
This work was supported by the US Department of Energy ARPA-E Program Award Number DE-AR0001135 and by the Agriculture and Food Research Initiative Competitive grant no. 2011-68005-30411 from the National Institute of Food and Agriculture. The authors gratefully acknowledge the provision of field experimental data for model validation from the University of Idaho (Dr. Kent Jeffrey). They also thank Chrisbin James (School of Agriculture and Food Sciences, The University of Queensland) and Diego Perez (Bupa) for technical support on Python and Linux coding to set up pAPSIM. APSIM is provided free for research and development use (see www.apsim.info for details). They thank Maria Salas-Fernandez (Iowa State University) for useful discussion on initial design of simulations for this paper.

CONFLICT OF INTEREST
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding authors upon reasonable request.