Farmers' heterogeneous perceptions of marginal land for biofuel crops in US Midwestern states considering biophysical and socioeconomic factors

Planting bioenergy crops on marginal land is critical for avoiding competition with food crop production. While many studies have estimated marginal land availability using various methods, only a few studies have considered the role of socioeconomic factors in affecting perceptions about the availability of marginal land. This study analyzes land‐use survey data to examine the determinants of farmers' perceptions of marginal land availability on their farms. We find that farmers' perceptions are affected by a combination of unfavorable biophysical (e.g., soil water capacity, temperature variability, and slope) and socioeconomic factors, of which farm size appears to be significant. Interestingly, we identify different determinants of perceptions among farmers that claim to have marginal land and those that do not; the former are determined mainly by unfavorable biophysical factors, while the latter are mainly explained by small farm size. We further apply a prediction model that is trained by a machine learning algorithm to Midwestern states, and derive maps of marginal land likelihood and associated dominant influencing factors. The results suggest that marginal land is primarily under pastureland and grassland cover and in the Dakotas and Nebraska; there is also some marginal land under crop production in the Corn Belt. Our findings contribute to improving understanding of the complex determinants of heterogeneous perceptions of marginal land and can inform the design of more targeted policies for bioenergy crop adoption.

diversion of agricultural lands to avoid competition with food and feed crops (Stoof et al., 2015); they should also not be grown on land under natural forests or grasslands, which are carbon sinks and sources of ecosystem services (Kim et al., 2009). A promising solution to this problem is to grow bioenergy crops on marginal lands (Cai et al., 2011; Gelfand et al., 2013), that is, the agricultural land that farmers might be willing to convert from crop production to bioenergy crops because of low productivity, low profitability, high environmental sensitivity, difficulty in land management, etc. (Jiang et al., 2019; Khanna et al., 2021).
Numerous studies have attempted to identify marginal land for bioenergy crops from different approaches. Some identify "biophysical marginal land" based on land productivity (Cai et al., 2011; Yang et al., 2020), climate and land characteristics (e.g., slope, soil, etc.; Kang et al., 2013), and/or environmental sensitivity (Gelfand et al., 2013). Others track the history of land-use change and identify marginal land as abandoned land (Campbell et al., 2008) or land undergoing intensive land-use changes (e.g., a switch between forest and agricultural land; Cai et al., 2011). Researchers also attempt to identify "economic marginal land" based on the profitability of food and bioenergy crops (Swinton et al., 2011).
These studies use secondary information to identify marginal land for bioenergy crop production. They assume that farmers and landowners (the ultimate decision-makers for marginal land availability) will make land-use choices fully based on those biophysical factors and/or economic theories. However, as demonstrated by many studies (Lee et al., 2018; Sullivan-Wiley & Teller, 2020; Walters, 2017), farmers' decisions/perceptions about land use depend to a large extent on their preferences, experiences, and socioeconomic backgrounds (e.g., age, education, and farm size). To the best of the authors' knowledge, only one study (Skevas et al., 2016) has attempted to understand the behavioral aspect of marginal land availability, which identified socioeconomic factors such as land rent and environmental concerns to be significant. More research is needed to improve the understanding of the impacts of various socioeconomic factors, along with land biophysical factors, on farmers' perceptions of marginal land availability.
Another limitation in previous marginal land studies is the lack of understanding about the drivers of farmers' heterogeneous perceptions of marginal land availability. Numerous studies have demonstrated the existence of heterogeneous behavior rules in farmers' land-use decisions (Garrod et al., 2012; Zhou et al., 2020). However, little is known about the heterogeneity of farmers' perceptions of marginal land availability, especially how the dominant causes of land marginality vary among farmers. Answers to this question are of critical importance for the design of more targeted bioenergy crop promotion policies.
In this study, we address the above limitations by applying an explainable neural network (NN) model to examine the data obtained from a survey of Midwestern US farmers' perceptions of marginal land. Farmers' socioeconomic backgrounds, as well as the biophysical characteristics of their lands, are used to explain their responses of marginal land availability based on the NN model. The heterogeneity in farmers' perceptions about classifying their land as marginal is addressed through a deep Taylor decomposition (DTD; Montavon et al., 2017) of the NN model. Through the DTD, we identify the distinctive responses of different farmers to changes in biophysical and socioeconomic factors, and the dominant reason why, at an individual level, part of their land is marginal. We further apply the model to Midwestern states to identify patterns of marginal land availability and associated dominant influencing factors. The analysis is expected to advance understandings about the heterogeneous drivers of marginal land availability and provide implications for more targeted outreach and promotion strategies for bioenergy crops.

| METHODS
We hypothesize that farmers' perceptions of marginal land availability are affected by both biophysical and socioeconomic factors, as demonstrated in other land-use decision studies (Eaton et al., 2019; Skevas et al., 2016). Figure 1 provides a general workflow of the steps to develop the machine learning model and to derive farmers' perceptions of marginal land. The socioeconomic factors, as well as farmers' responses of marginal land availability, are derived from a survey described in Section 2.1. Biophysical factors are derived from a set of remote sensing data and farm boundary data from Farm Market ID (2020), as described in Section 2.2. The procedures for data preprocessing are also described in Section 2.2. The NN model is trained and validated in Section 2.3 using the survey samples. The trained NN model is interpreted in Section 2.3 through the DTD. Finally, in Section 2.4, the trained NN model is applied to land pixels in the US Midwestern states to derive a map of marginal land likelihood (MLL) and associated dominating factors.

FIGURE 1 Workflow to identify farmers' perceptions of marginal land availability using a machine learning model

| Farmers' survey
A survey for growers in the US Midwestern states was conducted in late 2019 and early 2020. The survey starts by collecting farmers' perceived availability of low-quality land, that is, whether a farmer considers at least part of their farm as low-quality land. We further collect the top three reasons from the farmers who claim low-quality land and the current land uses on their low-quality lands. Other information regarding the farmers' background characteristics (e.g., demographic conditions), economic conditions, land management objectives, and environmental attitudes is also collected. Refer to section I of the supporting information (SI) for the survey questions.
In the survey, the phrase "low-quality land" is used instead of "marginal land" to provide better clarity for farmers. Low-quality land is defined in the survey as "land which has relatively low crop yield due to high slope, sandy soil, waterlogging, prone to flooding, or other undesirable characteristics." Other factors (e.g., low profitability and difficulty to access machinery) are also listed as possible reasons for farmers to justify marginal land from various perspectives (Table 2). Since no quantifiable measure of land quality is provided in the survey, farmers' responses represent their own justification of relatively low-quality land.
The survey was delivered to 2500 farmers in the Midwest via both Internet (online) and mail. The 2500 farmer samples were selected from the Farm Market ID dataset based on farm size to ensure coverage of a wide range of farm sizes. We obtained 242 responses from the 2500 farmers, an overall response rate of 9.7%. Among the respondents, 204 answered the question on the availability of low-quality land on their farms.

| Data preprocessing
A large set of potential influencing factors (including both biophysical and socioeconomic factors, Table 1) are constructed as candidate input features for the machine learning model to explain low-quality land (marginal land) availability. Major data sources in Table 1 include the survey, remote sensing products, and the county-level or state-level census from the United States Department of Agriculture (USDA). Refer to section I in the SI for a full explanation of variables in Table 1. The biophysical factors for each respondent are extracted based on the farm boundary data derived from Farm Market ID (an example of the farm boundary is shown in Figure S1).
Features in Table 1 with missing data are filled using the multivariate iterative imputation method (Royston, 2005). For a particular multivariate dataset, the iterative imputation method estimates the missing value of one feature by deriving a regression function between this feature and the other features (i.e., imputation). Each feature is imputed sequentially, and the imputation is repeated 10 times for better convergence (Royston, 2005). Afterward, each feature in Table 1 is min-max normalized into the range of [0, 1].
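The min-max step can be illustrated with a minimal sketch. It assumes the feature has already been imputed; the function name and the handling of a constant feature are illustrative choices, not details taken from the study.

```python
def min_max_normalize(values):
    """Rescale a list of numbers into the range [0, 1].

    Assumption: a constant feature carries no information, so it is mapped
    to all zeros instead of dividing by zero.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical farm sizes in acres (not survey data).
farm_sizes = [100, 600, 1100]
scaled = min_max_normalize(farm_sizes)
```

When the trained model is later applied to new land pixels, the minimum and maximum from the training samples would normally be reused so that both datasets share one scale.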
We then use a backward feature selection (BFS) approach (Abe, 2005) to identify the machine learning model's input features. Implemented in the Python package "Scikit-learn" (Pedregosa et al., 2011), the BFS identifies the optimal combination of a prespecified number of features for machine learning models through an iterative approach. A key step in BFS is to estimate each input variable's feature relevance measure, calculated from a random forest classifier (Breiman, 2001). It should be noted that NN is not used for the BFS step because it does not provide a feature relevance measure. A trial-and-error analysis is performed to select the number of features associated with the largest binary classification accuracy.
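The backward elimination loop can be sketched as follows. This is an illustration only: the study computes feature relevance from a random forest classifier in Scikit-learn, whereas this sketch substitutes the absolute Pearson correlation with the label as a stand-in relevance measure so that it runs with the standard library alone. All function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson_abs(xs, ys):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / sqrt(sxx * syy)


def backward_select(columns, labels, n_keep, relevance=pearson_abs):
    """Drop the least relevant feature until n_keep features remain.

    columns: dict mapping feature name -> list of values.
    relevance: stand-in for the random forest relevance used in the study.
    Scores are recomputed each round, which matters for model-based measures.
    """
    kept = dict(columns)
    while len(kept) > n_keep:
        scores = {name: relevance(vals, labels) for name, vals in kept.items()}
        weakest = min(scores, key=scores.get)
        del kept[weakest]
    return list(kept)


# Toy data (not survey data): two informative features and one noise feature.
columns = {
    "Total_area": [0.1, 0.4, 0.6, 0.9],
    "Mean_slope": [0.2, 0.1, 0.8, 0.7],
    "Noise": [0.5, 0.5, 0.4, 0.6],
}
labels = [0, 0, 1, 1]
selected = backward_select(columns, labels, n_keep=2)
```

The trial-and-error choice of the number of features would wrap this loop, evaluating classification accuracy for each candidate value of `n_keep`.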

| Neural network modeling
We use the widely applied feedforward artificial NN to extract the relationships between selected input features and farmers' binary perception of marginal land availability (PML). An advantage of the NN model is its strong flexibility to approximate a relationship with precision. Another benefit of NN is its ability to interpret farmers' heterogeneous sensitivities to the various influencing factors, which is introduced in section III of the SI.
The flexibility of NN (and other machine learning models) requires relatively large training samples to constrain the model from learning the random errors in the training dataset (Raudys & Jain, 1991). Based on the results of empirical experiments, a binary classification machine learning model with a small number of input features (smaller than 10) requires at least 100-200 training samples (Vabalas et al., 2019; Zhang & Ling, 2018). This study has a sample size of 204 and the number of selected input features for the NN model is 6 (see section II in the SI), which makes the sample size appropriate for the machine learning model.
The selection of the NN hyperparameters (including NN architecture, activation function, etc.), as well as the training of the NN model, is introduced in section II in the SI. We use fivefold cross-validation, as suggested by Yadav and Shukla (2016), to validate the NN model. The validation performance is measured by the area under the receiver operating characteristic curve statistic (AUC; Fielding & Bell, 1997). AUC ranges between 0 and 1: AUC > 0.5 indicates a classifier that performs better than a random guess, and AUC = 1 indicates a perfect classifier.
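The two ingredients of this validation step can be sketched with the standard library. The sketch computes AUC as the share of positive-negative pairs that the model ranks correctly and splits sample indices into five folds. The function names are illustrative, and the contiguous (unshuffled) split is a simplifying assumption, not the study's procedure.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def kfold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Assumption: samples are already in random order; otherwise shuffle first.
    """
    folds = []
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Toy predictions (not survey data): three of four pairs are ranked correctly.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
fold_sizes = [len(f) for f in kfold_indices(204, k=5)]
```

Each fold would serve once as the validation set while the model is trained on the other four, and the AUC is then computed on the held-out predictions.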
To interpret the model, we develop a partial dependence plot (PDP; Molnar, 2019) for the trained NN model. The PDP illustrates the marginal impact of different input features on the MLL by accounting for the average effects of the other features in the model. The PDP is implemented using the Python package PDPbox (SauceCat, 2018). A positive value in the PDP represents a positive impact of an input feature, and vice versa.
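The averaging behind a partial dependence curve can be sketched as follows. The sketch does not use PDPbox; the `predict` function is a hypothetical stand-in for the trained NN, and the curve is not centered, which some plotting tools do by default.

```python
def partial_dependence(predict, samples, feature, grid):
    """Average model output over all samples with one feature forced to a grid value.

    predict: callable taking a dict of feature values (stand-in for the trained NN).
    samples: list of dicts, one per survey sample.
    """
    curve = []
    for value in grid:
        preds = []
        for row in samples:
            modified = dict(row)       # copy so the original sample is untouched
            modified[feature] = value  # overwrite only the feature of interest
            preds.append(predict(modified))
        curve.append(sum(preds) / len(preds))
    return curve


def toy_predict(row):
    """Hypothetical linear model used only to make the sketch runnable."""
    return 0.5 * row["Mean_slope"] + 0.2 * row["Total_area"]


toy_samples = [
    {"Total_area": 0.0, "Mean_slope": 0.2},
    {"Total_area": 1.0, "Mean_slope": 0.6},
]
slope_curve = partial_dependence(toy_predict, toy_samples, "Mean_slope", [0.0, 1.0])
```

Because every sample keeps its own values for the other features, the curve reflects the effect of one feature averaged over the observed distribution of the rest.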
To further interpret the NN model, we apply the DTD (Montavon et al., 2017) of the NN model for each modeling sample. For each sample (farmer), DTD calculates a unique combination of feature relevancies R_p, which represents the importance of each influencing factor p in explaining the farmer's perception about their land being marginal. The values of R_p are non-negative, and their sum equals the NN prediction, that is, the MLL. In order to make the feature relevancies comparable for different farmers, the R_p values of each prediction are normalized:

$$\mathrm{NFR}_p = \frac{R_p}{\sum_{q} R_q}$$

where NFR_p is the normalized feature relevance for the pth input feature. Full details about the DTD and feature relevance measures are introduced in section III of the SI.
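The normalization can be illustrated with a minimal sketch, assuming that each relevance is divided by the sum of the relevancies of the same prediction so that the normalized values sum to one. The function name and the relevance values are hypothetical.

```python
def normalize_relevance(relevance):
    """Divide each non-negative relevance by the total for one prediction.

    relevance: dict mapping feature name -> R_p for a single sample.
    Assumption: the total (the model prediction) is strictly positive.
    """
    total = sum(relevance.values())
    if total <= 0:
        raise ValueError("relevancies must sum to a positive value")
    return {name: r / total for name, r in relevance.items()}


# Hypothetical relevancies for one farmer; their sum (0.6) plays the role of the MLL.
raw = {"Total_area": 0.3, "Rootaws": 0.1, "Mean_slope": 0.2}
nfr = normalize_relevance(raw)
```

After this step, a farmer with a low MLL and a farmer with a high MLL can be compared on the share of the prediction attributed to each factor.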

| Model prediction for Midwestern states
After the NN model for marginal land availability is trained and validated, we apply the model to land pixels in Midwestern states with an existing land use of cropland, pastureland, or grassland (Wickham et al., 2014). We also calculate the NFR values for each input feature and denote the dominating factor as the feature with the largest NFR value for each prediction.
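Selecting the dominating factor and summarizing it over many pixels can be sketched as follows. The function names and the NFR values are hypothetical and serve only to show the selection rule.

```python
from collections import Counter


def dominating_factor(nfr):
    """Return the feature name with the largest NFR value for one prediction."""
    return max(nfr, key=nfr.get)


def dominating_frequencies(pixel_nfrs):
    """Share of pixels for which each feature is the dominating factor."""
    counts = Counter(dominating_factor(nfr) for nfr in pixel_nfrs)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical NFR values for four land pixels (not model output).
pixels = [
    {"Total_area": 0.5, "Rootaws": 0.3, "Trange_grow": 0.2},
    {"Total_area": 0.2, "Rootaws": 0.6, "Trange_grow": 0.2},
    {"Total_area": 0.7, "Rootaws": 0.1, "Trange_grow": 0.2},
    {"Total_area": 0.4, "Rootaws": 0.3, "Trange_grow": 0.3},
]
shares = dominating_frequencies(pixels)
```

Grouping the pixels by land use before calling `dominating_frequencies` would give one frequency table per land-use type.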
The values of biophysical factors at each land pixel are collected from the sources shown in Table 1 (USDA, 2017). The county average farm size is calculated from county-specific farm data for 12 classes of farm size (Table S2). The remote sensing data for model prediction are projected to World Geodetic System (WGS) 84 coordinates and resampled to 250 m resolution using the "Bilinear" resampling method in ArcGIS (ESRI, 2012). All other county average socioeconomic data are uniformly converted into raster data throughout the county with the above resolution and projection.

| Survey results and model assessment
Among the 204 effective responses, 133 report at least part of their land as low-quality land (marginal land); 109 indicate the reasons why their land is marginal. The result (Table 2) suggests that less optimal biophysical conditions ("low soil quality," "high slope," and "prone to flooding") account for most of the farmers' choice of low-quality land availability, but there is also a considerable number of farmers who relate their decisions to economic factors (e.g., "low profitability") and infrastructure conditions (e.g., "difficult to access machinery"; Table 2). It should be noted that the factors in Table 2 are only available for farmers with marginal land. Further explanation of farmers' perceptions of marginal land availability (especially for the farmers with no marginal land) therefore requires use of the factors in Table 1.
The feature selection procedure also identifies both biophysical and socioeconomic factors as inputs for the NN model. Based on the recursive feature selection result shown in Figure S2, the following six factors are selected: farm size (Total_area), growing season precipitation (Prcp_grow), root zone soil water capacity (Rootaws), average slope (Mean_slope), growing season mean temperature (Tmean_grow), and growing season diurnal range of temperature (Trange_grow). Spatial distributions of the selected features are shown in Figure S3.
The NN model achieves an AUC value of 0.72 in the cross-validation dataset (Figure 2), comparable to the model performances of similar studies (Rizzo et al., 2014; Wimberly et al., 2017). The PDP in Figure 3 shows that the MLL generally increases with Total_area, Mean_slope, and Tmean_grow; and decreases with Prcp_grow, Rootaws, and Trange_grow. Some nonlinear relationships between the input features and MLL can be identified in Figure 3. For example, the MLL increases quickly with farm size when Total_area is smaller than 2000 acres but slowly when Total_area is larger than 2000 acres. For the land slope, the PDP curve increases rapidly when Mean_slope is less than 4% but is relatively flat when Mean_slope is greater than 4%. Note that the partial dependence patterns in Figure 3 only show the marginal impact of an input feature under the average condition of the other features in the NN (Molnar, 2019). The relationship between the MLL and an input feature might vary under different survey samples, as shown in Section 3.2.
To derive the uncertainties shown in Figure 3, a bootstrapping method (Freedman, 1981) is applied to generate 200 realizations of the NN model by resampling the training data with replacement. The shaded areas in Figure 3 show the standard deviation across the bootstrap realizations. The result suggests a notable but acceptable level of uncertainty caused by the relatively small sample size used in this study; the general relationships identified above are not affected by the uncertainties.
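The resampling logic can be sketched as follows. In the study each realization is a retrained NN; here a simple statistic (the share of positive responses) stands in for the model so that the sketch runs quickly. The function name, the seed, and the stand-in statistic are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_std(data, statistic, n_realizations=200, seed=0):
    """Resample data with replacement and summarize the spread of a statistic.

    Returns the mean and standard deviation of the statistic across realizations.
    statistic: callable on a resampled list (stand-in for retraining the model).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_realizations):
        resample = [rng.choice(data) for _ in data]  # same size as the original
        estimates.append(statistic(resample))
    return statistics.mean(estimates), statistics.stdev(estimates)


def share_positive(xs):
    """Fraction of samples equal to 1."""
    return sum(xs) / len(xs)


# 133 positive and 71 negative responses, matching the counts reported in the survey.
responses = [1] * 133 + [0] * 71
boot_mean, boot_std = bootstrap_std(responses, share_positive)
```

Replacing `share_positive` with a routine that retrains the model and evaluates its partial dependence at one grid point would give the width of the shaded band at that point.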

| Model interpretations
The scatterplots in Figure 4 show near-linear relationships between the six influencing factors and their NFR values. The signs of the correlation coefficients are opposite between the positive (the farmer has marginal land) and negative (the farmer has no marginal land) prediction samples. More specifically, we find that for the farmers with marginal land, the

TABLE 2 Farmers' response for the reason of low-quality land

Averaging over the surveyed farmers, among the six influencing factors, Trange_grow has the largest NFR value for positive predictions and Total_area for negative predictions (Table 3). The average NFR values for other influencing factors also vary between the positive and negative predictions but with less significant magnitudes (Table 3).
It should be noted that the collection of additional survey responses could allow for the construction of a more complex NN model with more input features and deeper hidden layers. The NFR-feature value relationships of a more complex model might be different from that identified from Figure 4. Nevertheless, the results presented here provide an important preliminary understanding of the socioeconomic and biophysical factors' impacts on farmers' heterogeneous perceptions of land being marginal. The results further show how the relative importance of these factors varies among farmers (e.g., between the farmers claiming or not claiming that their land is marginal).

FIGURE 3 Partial dependence plot for the six neural network model inputs. Total_area: farm size; Prcp_grow: growing season precipitation; Rootaws: root zone soil water capacity; Mean_slope: average slope; Tmean_grow: growing season mean temperature; Trange_grow: growing season diurnal range of temperature

FIGURE 4 Scatterplot of the values and normalized relevancies of the six influencing factors for the (a) positive and (b) negative prediction samples. Total_area: farm size; Prcp_grow: growing season precipitation; Rootaws: root zone soil water capacity; Mean_slope: average slope; Tmean_grow: growing season mean temperature; Trange_grow: growing season diurnal range of temperature

| Marginal land availability and its dominant influencing factors in Midwestern states
We evaluate the spatial likelihood of marginal land availability in Midwestern states by applying our trained NN model to the cropland, pastureland, and grassland in this region. The summary statistics of our survey samples and the overall statistics of Midwestern farmers from the USDA NASS dataset are similar (Table S3), suggesting good representativeness of our survey.
The results suggest that farmers in the western parts of North Dakota, South Dakota, and Nebraska are generally more likely to claim marginal land than farmers elsewhere in Midwestern states (Figure 5a). In these western locations, the relatively low root zone soil water capacity (Rootaws, Figure S3) explains the marginal land existence for most land pixels (Figure 5b). An exception is a few counties where farms are large (Figure S3), and Total_area is identified as the dominating factor (Figure 5b). Farmers in the traditional Corn Belt region (Iowa, Illinois, Indiana, southern Michigan, western Ohio, eastern Nebraska, eastern Kansas, southern Minnesota, and parts of Missouri) claim relatively low MLLs (Figure 5a), but there are also local variations as a result of farmer heterogeneity (e.g., the mixture of yellow and green colors in Iowa in Figure 5a). The impact of farmer heterogeneity can be better identified in the map of dominant influencing factors (Figure 5b) where local transitions of dominant influencing factors can be spotted throughout the figure. In addition to the spatial patterns identified above in the western parts of North Dakota, South Dakota, and Nebraska, and the traditional Corn Belt region, Mean_slope is found to be the most important factor in the high slope regions of Wisconsin and southern Nebraska (Figure 5b; Figure S3). In general, Total_area is identified as the major factor for most of the land pixels. Trange_grow, Rootaws, and Mean_slope explain the marginal land availabilities in most other land pixels.

TABLE 3 Average normalized feature relevance values of the six influencing factors for the positive and negative prediction samples. Total_area: farm size; Prcp_grow: growing season precipitation; Rootaws: root zone soil water capacity; Mean_slope: average slope; Tmean_grow: growing season mean temperature; Trange_grow: growing season diurnal range of temperature

FIGURE 5 Pixel-level prediction of (a) marginal land likelihood and (b) the dominating factors. Total_area: farm size; Prcp_grow: growing season precipitation; Rootaws: root zone soil water capacity; Mean_slope: average slope; Tmean_grow: growing season mean temperature; Trange_grow: growing season diurnal range of temperature

We further classify the results from Figure 5 by the existing land use of each pixel: cropland, pastureland, and grassland (Figure 6). Figure 6a suggests that cropland has the lowest MLL among the three land-use types, followed by pastureland and grassland. Our results confirm findings from previous studies that marginal land is primarily located on non-croplands (Cai et al., 2011; Campbell et al., 2008; Yang et al., 2020). Nevertheless, there is still a significant number of cropland pixels with an MLL over 0.5, which suggests a considerable amount of marginal land is located in existing croplands. We also identify a change in the dominant factor between the different types of land uses (Figure 6b). The most important factor in Midwestern states is Total_area for cropland, Trange_grow for pastureland, and Total_area for grassland.

| Farmers' perceptions of marginal land and its influencing factors
Our analysis of the survey data demonstrates that farmers' perceptions of marginal land availability are affected by both biophysical and socioeconomic factors. Total_area appears to be a controlling factor, especially for farms growing food crops (Figure 6b). Farmers with large farm sizes are generally more likely to claim marginal land. The controlling role of farm size can be further identified in Figure 7, which shows an increase in MLL with farm size as well as the increasingly dominant role of Total_area in explaining MLL with an increase in farm size. This observation could be understood through several mechanisms. From a physical perspective, the trend might occur based on our binary definition of marginal land availability and due to land heterogeneity (Dalgaard et al., 2011). From a human perspective, farmers having different farm sizes might have different land management objectives and constraints. There is evidence that farmers with larger farm sizes tend to have larger emphasis on environmental consequences (Willock et al., 2008); meanwhile, larger farms are less likely to have enough labor to manage each piece of the land and therefore treat a greater portion of the land as marginal. Further separation of the impacts of farm sizes could be achieved by incorporating additional survey samples and input features in the NN model.
The other influencing factors used in the NN model are biophysical factors related to land productivity (potential crop yield). Rootaws controls soil efficiency of rainfall retention for crop evapotranspiration (He & Wang, 2019) and is found to positively affect crop yields; Mean_slope negatively affects crop yields through its impact on soil water and nutrient losses (Jiang & Thelen, 2004). The effects of three climate factors (Tmean_grow, Prcp_grow, and Trange_grow) on land productivity are complex. Each crop usually has an optimal temperature and rainfall, above which the crop production can be negatively affected by increased evapotranspiration loss (for temperature) and flooding loss (for rainfall). Below-optimal temperature and rainfall impair crop production because of low-temperature plant failure and lack of water to support crop growth (Leng & Hall, 2019; Porter & Gawith, 1999; Rosenzweig et al., 2002). Trange_grow represents the changes in the daytime and nighttime temperature. This factor might positively affect crop yield through reduced respiration loss during the night or negatively affect crop yield through increased evapotranspiration loss during the daytime (Lobell, 2007). In the US Midwestern states, Prcp_grow and Trange_grow are found to be positively correlated with crop yields (Goldblum, 2009; Leng & Huang, 2017) and Tmean_grow negatively correlated with crop yields (Leng, 2017). As expected, we find the impacts of the investigated biophysical factors on farmers' perceptions of marginal land (Figures 3 and 4) confirm their relationships with crop yields.

FIGURE 6 (a) Probability densities for three land-use pixels' marginal land likelihoods, and (b) frequencies of the dominating factors for the three land-use pixels; each row in (b) represents the input feature and each column the land use. Total_area: farm size; Prcp_grow: growing season precipitation; Rootaws: root zone soil water capacity; Mean_slope: average slope; Tmean_grow: growing season mean temperature; Trange_grow: growing season diurnal range of temperature
Interestingly, in Figure 4 and Table 3, we identify two distinct behavior rules between farmers with and without marginal land. The perceptions of farmers without marginal land are mainly explained by their (usually) small farm size, and the perceptions of farmers with marginal land are mostly explained by other (usually unfavorable) biophysical factors. There are also opposite signs of the NFR-feature relationships between farmers with and without marginal land. These distinct differences imply a potential divergence of land management objectives between groups of farmers (Willock et al., 2008), which shows the presence of heterogeneity among farmers.

| Limitations
Because of the limited sample size, our NN model includes only six influencing factors for predicting MLL: five relate to biophysical factors and one (Total_area) represents compounding impacts of both biophysical and socioeconomic factors (see discussion in Section 4.1). In reality, farmers' socioeconomic characteristics, for example, land management objectives and education level, could affect their perceptions of marginal land availability (Skevas et al., 2014, 2016). Incorporation of these factors can potentially clarify the compounding impacts represented by Total_area in this study (see Section 4.1 for details).
Another limitation due to a relatively small sample size (204) is the relatively high cross-validation variance, as shown in Figure 2. However, as shown in Figure 3 and Table 3, the relatively high cross-validation variance does not affect the general relationships identified in this study. A relatively low sample size (100-300) is also considered acceptable in many machine learning analyses of survey results (Khondoker Oskar et al., 2020). In general, Vabalas et al. (2019) and Zhang and Ling (2018) recommended that the minimum sample size for a binary classification machine learning model should be 100-200. Nevertheless, a larger number of survey samples could help improve the validity and predictive power of the NN model.

FIGURE 7 (a) Pixel-level predictions of marginal land probability and (b) the dominating features assuming different area classes. Area classes 1-12 denote ranges of farm size from small to large (refer to Table S2 for their detailed ranges). Total_area: farm size; Prcp_grow: growing season precipitation; Rootaws: root zone soil water capacity; Mean_slope: average slope; Tmean_grow: growing season mean temperature; Trange_grow: growing season diurnal range of temperature
Our results of marginal land availability might contain uncertainties and biases if the respondents and nonrespondents have systematic differences, leading to nonresponse bias (Peytchev, 2013). In this study, one source of nonresponse bias is the potential for respondents and nonrespondents to differ in the explanatory variables (e.g., farm size). Such differences can lead to biases in the NN model parameters (Krawczyk, 2016) and hence the interpretation about the dominating factors of farmers' marginal land perception in Figure 5b. They might also lead to biases in the estimate of the overall marginal land availability in the Midwest. For example, if farmers with larger farms are more likely to respond to the survey, the result will overestimate the marginal land availability in the Midwest due to the positive impact from the farm size, as identified in this study. Another source of nonresponse bias relates to the potential for respondents to have systematically higher or lower marginal land availability than nonrespondents because of some factors not investigated in this study. Such differences might lead to biases in the estimate of overall marginal land availability in the Midwest (Figure 5a); however, these differences should not affect the interpretations of the model results (Figure 5b) if the underlying mechanism of nonresponse does not correlate to the explanatory variables used in this study. Nevertheless, nonresponse bias is a common problem in survey studies, and a low response rate (our response rate is 9.7%) might not necessarily be its cause (Groves & Peytcheva, 2008). In this study, we do not anticipate the first type of nonresponse bias (related to explanatory variables) because the respondents have similar distributions to the Midwestern farmers in the six influencing factors (Table S3). To avoid the second type of nonresponse bias (caused by undetected reasons), further study with improved survey design is needed, for example, adjusting the survey design based on the results from preliminary survey responses (Groves & Heeringa, 2006).

| Implications and conclusions
In the survey, 133 farmers report their current land use on marginal lands. Over half of them allocate at least part of their marginal land for food crop production, about one third grow hay or pasture, and only five farmers utilize their marginal land for bioenergy crops (Table 4). Such a result confirms our findings in Figure 6a that a considerable part of marginal land might come from existing croplands. Thus, our result implies that at least part of bioenergy cropland will replace existing croplands in the future. In the debate over the "food versus fuel dilemma," researchers have suggested growing bioenergy crops on non-cropland to avoid competition between fuel and food production (Campbell et al., 2008; Graham-Rowe, 2011). Our findings agree with others (Cai et al., 2011; Emery et al., 2017; Skevas et al., 2016; Yang et al., 2020) that some potential marginal land for planting bioenergy crops is already used for food crops. However, converting marginal land with food crops to energy crops would not cause major competition between food and fuel because the land is already low quality. In addition, researchers suggest that converting part of existing cropland into perennial bioenergy crops could help reduce nutrient loss, diversify the near monoculture landscape in the US Midwestern states, and improve farmland biodiversity. The above mechanisms can subsequently improve crop production for the unconverted lands (Landis et al., 2018; Robertson et al., 2017; Werling et al., 2014).
The goal of this study is to identify the marginal land (Figure 5a) that could potentially be used for bioenergy crop production (especially when some policy incentives are available, e.g., cost-sharing, additional financial incentives, and government conservation programs; Bergtold et al., 2014). However, given that farmers' land management decisions are still mostly driven by economic profitability (Augustenborg et al., 2012), their adoption of bioenergy crops will not be limited to the low-quality land we have identified. Higher quality lands can also be used for bioenergy crops when favorable market conditions are available (Brown et al., 2016). For example, approximately 40% of US corn croplands were used for ethanol production in 2019 (USDA, 2020).
That said, the map of influencing factors in Figure 5b could help policymakers design more targeted policies to promote bioenergy crop production on marginal land. One such policy we suggest, given the controlling role of farm size on farmers' perceptions of marginal land availability, is to target large farm owners and operators to encourage bioenergy crop adoption. Such a recommendation is consistent with previous theoretical and empirical analysis (Khanna et al., 2017; Lynes et al., 2016). Larger farms are also expected to be more willing to adopt bioenergy crops because of the economy of scale (larger farm size compensates the usually high fixed cost of bioenergy crop production; Khanna et al., 2017). Overall, our study highlights the importance of socioeconomic factors in explaining farmers' heterogeneous perceptions about their land being marginal. Though a full explanation of the impacts of various factors would require additional survey data and probably a more complex machine learning model structure, the results of this study provide valuable information for researchers to improve their understanding and modeling of farmers' land-use choices, particularly for bioenergy crop adoption.