Applications of artificial intelligence-based modeling for bioenergy systems: A review

Bioenergy is widely considered a sustainable alternative to fossil fuels. However, large-scale applications of biomass-based energy products are limited by challenges related to feedstock variability, conversion economics, and supply chain reliability. Artificial intelligence (AI), an emerging concept, has been applied to bioenergy systems in recent decades to address these challenges. This paper reviews 164 articles published between 2005 and 2019 that applied different AI techniques to bioenergy systems. The review focuses on identifying the unique capabilities of various AI techniques in addressing bioenergy-related research challenges and improving the performance of bioenergy systems. Specifically, we characterized AI studies by their input variables, output variables, AI techniques, dataset size, and performance. We examined AI applications throughout the life cycle of bioenergy systems. We identified four areas in which AI


LIAO And YAO

1 | INTRODUCTION
Different renewable alternatives have been explored to reduce fossil fuel consumption and greenhouse gas (GHG) emissions (Abdmouleh et al., 2015). Among these alternatives, bioenergy is currently the largest renewable energy source (50% of the global renewable energy sector) and shows great potential in addressing global energy and climate change issues (International Energy Agency, 2018). Modern industry has developed a variety of technologies to convert biomass into different forms of energy products, including solid (e.g., firewood, fuel pellets, and biochar), liquid (e.g., biodiesel, bioethanol, and bio-oil), and gaseous products (e.g., syngas, biogas, and bio-hydrogen; Guo et al., 2015). The benefits of bioenergy have been widely discussed in the literature, such as enhanced energy security, economic development of rural areas (Demirbas & Demirbas, 2007), regional agricultural growth (Maltsoglou et al., 2013), and improved natural resource utilization efficiency (Khishtandar et al., 2017). However, large-scale adoption of bioenergy systems is still limited by challenges related to biomass feedstocks (e.g., large variations in biomass quality), biorefineries (e.g., difficulties in process control and operations), and supply chains (e.g., high complexity and risks; Asadullah, 2014; Nguyen et al., 2015). To address these barriers, tremendous efforts have been made in bioenergy research and development. Traditional modeling approaches, such as process and supply chain optimization, have also been intensively explored in the literature (Ghaderi et al., 2016).
Artificial intelligence (AI) has received increasing interest recently. AI refers to the ability of machines to perform activities that mimic human intelligence (Russell & Norvig, 2010). AI can be implemented through different techniques in computer science, such as machine learning, heuristic algorithms, and fuzzy logic (FL; Mohd Ali et al., 2015). Many applications have been demonstrated in different domains, such as chemical engineering, intelligent manufacturing, and building energy conservation (Dounis, 2010; Li et al., 2017; Mohd Ali et al., 2015; Rahmanifard & Plaksina, 2018). Compared with areas that have thousands of publications on AI applications (e.g., solar and wind energy; Marugán et al., 2018; Wang et al., 2020), AI applications to bioenergy systems are limited. However, previous studies have indicated the tremendous potential of AI in addressing barriers to bioenergy development. For example, Castillo-Villar reviewed 51 case studies using metaheuristic algorithms to address bioenergy supply chain (BSC) challenges (Castillo-Villar, 2014). Ardabili et al. reviewed the applications of machine learning and deep learning techniques in various biofuel research domains (Ardabili et al., 2020). Previous reviews are highly scattered, and most have focused on either one part of bioenergy systems (e.g., the performance of engines using biofuel; Yusri et al., 2018) or a single AI approach (e.g., Artificial Neural Network [ANN]; Sewsynker-Sukai et al., 2017). Given the large variety of AI approaches, bioenergy products, conversion technologies, biomass types, and supply chain designs, a holistic review of existing and potential AI applications throughout the entire BSC (i.e., from biomass cultivation to final end use) is needed. This review aims to address this need.
The following sections introduce the main parts of the bioenergy systems covered in this review (Section 2), discuss the main AI techniques (Section 3), and highlight insights from previous AI applications characterized by bioenergy technologies and AI algorithms (Section 4).

… Park, Kelley, et al., 2020). Therefore, previous research has focused on biomass characterization and screening for energy applications (Carpenter et al., 2014). As a result, a large amount of characterization data has been generated (e.g., Phyllis2 (ECN.TNO, 2012) and the Bioenergy Feedstock Library (U.S. Department of Energy & Idaho National Laboratory, 2015)). However, most of these data are currently underutilized, and biomass research still relies heavily on traditional trial-and-error. Powered by AI, these data could become valuable resources for faster and more efficient biomass screening.
There are two types of biomass conversion technologies: thermochemical conversion and biochemical conversion (Goyal et al., 2008). Thermochemical conversion produces solid, liquid, and gaseous products by thermally processing biomass. Common thermochemical technologies include gasification, pyrolysis, torrefaction, and hydrothermal carbonization. Biochemical conversion converts biomass feedstocks to liquid (e.g., biodiesel and bioethanol) and gaseous (e.g., biogas) products using small molecules, bacteria, microorganisms, or enzymes (Brethauer & Studer, 2015). Matching the conversion technology with the biomass feedstock and optimizing the conversion process from different perspectives (e.g., economic feasibility, environmental sustainability, and product quality) have been among the main research areas in bioenergy. Different modeling approaches have been investigated (Ahmad et al., 2016; Cambero & Sowlati, 2014). Traditional simulation and optimization models often rely on a quantitative understanding of the relationships among biomass characteristics, process operating conditions, and product yields and properties, which is challenging because many of these relationships are not yet fully understood. In contrast, many AI techniques do not require prior knowledge of input-output relationships, potentially filling this knowledge gap and enhancing traditional modeling approaches.
Depending on the conversion technology, different products can be produced and used to supply energy in the form of electricity, heat, and work (e.g., biofuel used in vehicle engines; Thrän et al., 2015). Solid biofuels (e.g., biochar) derived from torrefaction and pyrolysis can be combusted to generate heat (Guo et al., 2015) or co-fired with coal in conventional or combined heat and power (CHP) plants (Nunes et al., 2014). Recently, the co-production of biochar and syngas in gasification has gained increasing attention, given its potential to improve conversion economics. Liquid biofuels can be produced by different technologies (e.g., bioethanol from fermentation, biodiesel from transesterification, bio-oil from pyrolysis, and liquid fuels from syngas through Fischer−Tropsch synthesis). Liquid biofuels are alternatives to fossil-based gasoline, diesel, and jet fuels used in engines and electric generators. Gaseous biofuels usually refer to either methane-rich gas (biogas) from anaerobic digestion or syngas from gasification (Guo et al., 2015). Biogas and syngas can either be burned directly for on-site power generation or be upgraded to common fuel types (e.g., natural gas upgraded from biogas and liquid fuels upgraded from syngas; IEA Bioenergy, 2009).
BSCs involve many activities, as shown in Figure 1. Given the considerable uncertainty associated with biomass availability and quality, careful design and precise operation of BSCs are often needed to ensure overall economic feasibility and supply chain robustness. Different approaches have been explored, but few previous studies have fully addressed these challenges. Previous reviews identified the main research gaps, including modeling issues (e.g., the large number of variables required and high computational cost; De Meyer et al., 2014) and methodological challenges (e.g., difficulties in incorporating stakeholder perspectives and potential conflicts, socioeconomic factors, and environmental constraints; Ba et al., 2016; Mafakheri & Nasiri, 2014).

| ARTIFICIAL INTELLIGENCE
AI is a promising tool to address the research gaps discussed previously. Four major types of AI approaches were reviewed in this study, as shown in Figure 2; the categorization is based on the major branches proposed by Kalogirou (Kalogirou, 2003), with some modifications from two other studies (Dounis, 2010; Smolensky, 1987).
Symbolic AI uses symbols to represent cognition and implements logical deduction to reflect the process of human cognition (Haugeland, 1985; Smolensky, 1987). Techniques with top-down paradigms (e.g., FL, expert systems (ES), and case-based reasoning) have been developed based on symbolism (Venkatasubramanian, 2019). Symbolic AI has broad applications in process systems engineering (Uraikul et al., 2007), and some of them relate to bioenergy systems, such as FL-based control for biomass-based power plants (Jurado et al., 2002) and boilers (Romeo & Gareta, 2009).
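To make the rule-based, top-down character of FL concrete, the following is a minimal sketch of a Mamdani-style fuzzy controller in plain Python. It is a hypothetical illustration, not the controller used by Jurado et al. (2002) or Romeo and Gareta (2009): the input (deviation of a boiler temperature from its setpoint, in °C), the output (a change in fuel feed rate, in %), the triangular membership functions, and the three rules are all assumptions chosen for readability.

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical output fuzzy sets for the fuel-feed change (%), as (a, b, c) triangles.
OUTPUT_SETS = {
    "decrease": (-20.0, -10.0, 0.0),
    "hold": (-5.0, 0.0, 5.0),
    "increase": (0.0, 10.0, 20.0),
}


def fuzzy_fuel_adjustment(error_c: float, n_points: int = 401) -> float:
    """Map a temperature error (setpoint - measured, degC) to a fuel-feed change (%).

    Hypothetical rule base:
      IF error is negative (too hot)  THEN decrease fuel
      IF error is zero                THEN hold fuel
      IF error is positive (too cold) THEN increase fuel
    """
    e = max(-20.0, min(20.0, error_c))  # clip to the input universe [-20, 20]
    # Fuzzification: firing strength of each rule's antecedent
    firing = {
        "decrease": tri(e, -40.0, -20.0, 0.0),
        "hold": tri(e, -10.0, 0.0, 10.0),
        "increase": tri(e, 0.0, 20.0, 40.0),
    }
    # Mamdani inference: clip each output set (min), aggregate (max), then
    # defuzzify with a discretized centroid over the output universe [-10, 10].
    num = den = 0.0
    for i in range(n_points):
        u = -10.0 + 20.0 * i / (n_points - 1)
        mu = max(min(firing[k], tri(u, *OUTPUT_SETS[k])) for k in firing)
        num += u * mu
        den += mu
    return num / den if den > 0.0 else 0.0


if __name__ == "__main__":
    for err in (-15.0, -5.0, 0.0, 5.0, 15.0):
        print(f"error {err:+.0f} degC -> fuel change {fuzzy_fuel_adjustment(err):+.2f} %")
```

The knowledge here lives in three human-readable IF-THEN rules rather than in learned weights, which is what distinguishes symbolic approaches from the connectionist techniques discussed below.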
Heuristics are stochastic search methods guided by learning-based techniques and experience (Zheng et al., 2013). Two main types of heuristics are evolutionary algorithms and swarm intelligence. Evolutionary algorithms include genetic algorithms (GA), differential evolution, and evolution strategies (Vikhar, 2017). Swarm intelligence includes algorithms such as particle swarm optimization (PSO) and ant colony optimization (ACO; Chakraborty & Kar, 2017). Previous studies have demonstrated the capability of heuristics to find near-optimal solutions to complex problems such as BSC planning and scheduling (Castillo-Villar, 2014; Devika et al., 2014; Keller, 2018).
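The search behavior of swarm intelligence can be illustrated with a minimal global-best PSO sketch in plain Python. The application is a deliberately simplified, hypothetical siting problem: choose a biorefinery location (x, y, in km) that minimizes tonnage-weighted squared distance to three hypothetical biomass supply points. Squared distance is an assumption made so that the optimum has a closed form (the tonnage-weighted centroid) against which the heuristic can be checked; the inertia weight `w` and acceleration coefficients `c1` and `c2` are common textbook-style choices, not values from the cited studies.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical supply points: (x km, y km, biomass tonnes)
SUPPLY_POINTS = [(10.0, 20.0, 500.0), (40.0, 10.0, 300.0), (30.0, 50.0, 200.0)]


def transport_cost(site: Sequence[float]) -> float:
    """Tonnage-weighted squared distance from a candidate site (simplified cost)."""
    x, y = site
    return sum(t * ((x - sx) ** 2 + (y - sy) ** 2) for sx, sy, t in SUPPLY_POINTS)


def pso_minimize(
    cost: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 30,
    n_iters: int = 200,
    w: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO: each particle is pulled toward its own and the swarm's best."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # pull toward own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # pull toward swarm best
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # stay in bounds
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


if __name__ == "__main__":
    site, _ = pso_minimize(transport_cost, [(0.0, 60.0), (0.0, 60.0)])
    # Closed-form optimum for this squared-distance cost: weighted centroid (23.0, 23.0)
    print(f"PSO site: ({site[0]:.3f}, {site[1]:.3f}) km")
```

Real BSC models add integer facility decisions, capacities, and cost structures with no closed-form optimum, which is where heuristics such as PSO, GA, and ACO are useful; the closed-form case here only serves to show that the swarm converges to the correct answer.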
Machine learning is another subset of AI in which systems learn and improve from experience on specific tasks without being explicitly programmed (Mitchell, 1997). Two types of machine learning are most commonly used: connectionism and statistical learning (Russell & Norvig, 2010). Connectionism includes techniques such as the feedforward neural network (FNN), radial basis function network (RBF), recurrent neural network (RNN), and the state-of-the-art convolutional neural network (Himmelblau, 2008; Jha et al., 2017). These AI techniques are also called "Artificial Neural Network (ANN)." Statistical learning is based on statistical methods; representative techniques include support vector machine (SVM), random forest (RF), and Bayesian network (BN; Dounis, 2010). These machine learning techniques differ in their fundamental principles and structure, and thus their performance varies. For example, compared with a regular FNN, an RBF network is less sensitive to noise in the training data and generally trains faster, but it requires a large number of hidden nodes, which limits its application in some cases (Mohd Ali et al., 2015). Table 1 shows a more detailed comparison of different machine learning techniques.
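One reason often given for the faster training of RBF networks is structural: once the hidden-node centers and width are fixed, the output layer is linear, so its weights can be obtained in a single least-squares solve rather than by iterative backpropagation. The NumPy sketch below shows this on a smooth toy target (a sine curve standing in for any input-output relationship). Placing centers on an even grid and using one shared width are simplifying assumptions; clustering (e.g., k-means) is another common way to choose centers.

```python
import numpy as np


def rbf_features(x: np.ndarray, centers: np.ndarray, width: float) -> np.ndarray:
    """Gaussian hidden-layer activations plus a bias column; x is (n, d), centers is (m, d)."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, m)
    phi = np.exp(-sq_dist / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((x.shape[0], 1))])


def fit_rbf(x: np.ndarray, y: np.ndarray, centers: np.ndarray, width: float) -> np.ndarray:
    """Output-layer weights from one linear least-squares solve (no backpropagation)."""
    weights, *_ = np.linalg.lstsq(rbf_features(x, centers, width), y, rcond=None)
    return weights


def predict_rbf(x: np.ndarray, centers: np.ndarray, width: float, weights: np.ndarray) -> np.ndarray:
    return rbf_features(x, centers, width) @ weights


if __name__ == "__main__":
    x_train = np.linspace(0.0, 2.0 * np.pi, 50)[:, None]
    y_train = np.sin(x_train[:, 0])
    centers = np.linspace(0.0, 2.0 * np.pi, 10)[:, None]  # assumed: evenly spaced centers
    w = fit_rbf(x_train, y_train, centers, width=0.7)
    rmse = np.sqrt(np.mean((predict_rbf(x_train, centers, 0.7, w) - y_train) ** 2))
    print(f"training RMSE: {rmse:.4f}")
```

The same structure hints at the drawback noted above: coverage of the input space comes from the centers themselves, so with grid-like placement the number of hidden nodes tends to grow quickly as the number of inputs increases.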
This review also includes a few other techniques, such as agent-based modeling (ABM), that have been applied to BSCs and biorefineries (Singh et al., 2014). Some studies explored hybrid methods such as neuro-fuzzy, neuro-genetic, and fuzzy-genetic methods (Cordón et al., 2001). One representative hybrid approach is the adaptive neuro-fuzzy inference system (ANFIS), which integrates neural networks and FL (Youssef et al., 2017) and has been applied to renewable energy systems (Jha et al., 2017).
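The neuro-fuzzy idea behind ANFIS can be seen in its forward pass, which evaluates a first-order Takagi-Sugeno fuzzy model as a layered network: membership grades, rule firing strengths, normalization, and weighted linear rule outputs. The sketch below covers only this forward pass for two inputs with two Gaussian membership functions each (four rules). All parameter values are hypothetical, and the hybrid training that ANFIS uses to fit them (typically gradient descent for premise parameters and least squares for consequent parameters) is not shown.

```python
import math
from itertools import product
from typing import Sequence, Tuple


def gauss_mf(x: float, center: float, sigma: float) -> float:
    """Gaussian membership function."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(
    x1: float,
    x2: float,
    mf1: Sequence[Tuple[float, float]],
    mf2: Sequence[Tuple[float, float]],
    consequents: Sequence[Tuple[float, float, float]],
) -> float:
    """First-order Sugeno output; one rule per (mf1, mf2) pair, in itertools.product order."""
    # Layer 1: membership grades of each input in each fuzzy set
    mu1 = [gauss_mf(x1, c, s) for c, s in mf1]
    mu2 = [gauss_mf(x2, c, s) for c, s in mf2]
    # Layer 2: rule firing strengths (product t-norm)
    w = [a * b for a, b in product(mu1, mu2)]
    # Layer 3: normalized firing strengths
    total = sum(w)
    w_norm = [wi / total for wi in w]
    # Layers 4-5: weighted sum of linear rule outputs f_i = p*x1 + q*x2 + r
    return sum(wn * (p * x1 + q * x2 + r) for wn, (p, q, r) in zip(w_norm, consequents))


if __name__ == "__main__":
    low_high = [(0.0, 0.3), (1.0, 0.3)]  # hypothetical "low"/"high" sets on [0, 1]
    rules = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.1), (0.2, 1.0, 0.0), (0.0, 0.0, 1.0)]  # hypothetical
    print(anfis_forward(0.4, 0.7, low_high, low_high, rules))
```

Because the normalized firing strengths sum to one, the output is a smooth blend of the rules' linear models; training adjusts both where the fuzzy sets sit (the neural part) and what each rule predicts.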

| ARTIFICIAL INTELLIGENCE TO BIOENERGY SYSTEMS
Articles were collected by searching different combinations of AI and bioenergy keywords in Google Scholar (Google LLC, 2021) and Web of Science (Clarivate, 2021). Bioenergy keywords included bioenergy products (e.g., biofuel, biogas, bioethanol) and different bioenergy system components (e.g., pyrolysis, hydrolysis, biofuel engine, and bioenergy supply chain). Literature was collected and screened based on three criteria: (1) articles published in peer-reviewed journals and conference proceedings; (2) articles published in the past 15 years (2005-2019); and (3) articles that applied one or more AI techniques to processes relevant to the bioenergy systems in Figure 2. Articles not identified through this search but highlighted by recent review papers were also included (Ardabili et al., 2020; Castillo-Villar, 2014; Jha et al., 2017; Levstek & Lakota, 2010; Obafemi et al., 2019; Sewsynker-Sukai et al., 2017; Suganthi et al., 2015). In total, 229 papers were initially collected. Screening was conducted to exclude articles with very small datasets, because using AI, a relatively more complex technique than traditional regression approaches, is hard to justify for such datasets. The screening was based on a rule of thumb, "the number of samples should be at least 10 times the number of inputs" (Kavzoglu & Mather, 2003), an empirical rule that has been used by various studies (Baum & Haussler, 1989; Garson, 1998; Haykin, 2009). A limitation of this rule is that it does not consider other factors such as data availability and model complexity. The rule was implemented by first identifying the number of input variables and the total number of data samples used by the AI models in each article. Articles whose dataset size was smaller than 10 times the number of input variables were then excluded. In addition, articles without transparent documentation of dataset size and data sources were excluded. After screening, 164 of the 229 papers were reviewed in this study.
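The screening rule described above reduces to a simple per-article check. The sketch below encodes it with two assumptions made explicit: each article is represented by a hypothetical `Study` record, and "at least 10 times" is read as an inclusive threshold, so a study with exactly ten samples per input passes. Studies lacking a documented dataset size or data source are excluded, as in the review.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Study:
    """Hypothetical record of the information extracted from each article."""
    name: str
    n_inputs: Optional[int]   # number of input variables of the AI model
    n_samples: Optional[int]  # total number of data samples used
    data_source_documented: bool


def passes_screening(study: Study, samples_per_input: int = 10) -> bool:
    """Rule of thumb: keep a study only if n_samples >= 10 * n_inputs."""
    if study.n_inputs is None or study.n_samples is None or not study.data_source_documented:
        return False  # no transparent documentation -> excluded
    return study.n_samples >= samples_per_input * study.n_inputs


if __name__ == "__main__":
    studies: List[Study] = [
        Study("A", n_inputs=3, n_samples=300, data_source_documented=True),
        Study("B", n_inputs=5, n_samples=40, data_source_documented=True),
        Study("C", n_inputs=4, n_samples=None, data_source_documented=False),
    ]
    print([s.name for s in studies if passes_screening(s)])  # ['A']
```

As noted above, this threshold ignores model complexity: a network with many hidden nodes has far more trainable parameters than inputs, so passing the check does not by itself guarantee sufficient data.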
Summaries of individual papers are provided in Tables S1-S5. Studies were characterized into four groups based on the purpose of the AI application (n is the total number of publications):
• Category 1: prediction of biomass feedstock properties for rapid screening and selection of biomass species (n = 20).
• Category 2: prediction of process-based performance indicators of biomass conversion for process optimization and design (n = 60).
• Category 3: prediction of biofuel properties and device/facility performance for the optimal utilization of bioenergy (n = 45).
• Category 4: optimization of supply chain design and planning from both technical and sustainability perspectives (n = 39).

Figure 3 depicts the number of publications in each category by year. There are three observations. First, starting in 2016, annual publications on AI applications to bioenergy increased dramatically. This may be explained by increased global interest in bioenergy as an energy solution for climate change mitigation, especially after the Paris Agreement was adopted in 2015 (United Nations Climate Change, 2015; Welfle et al., 2020). Second, the overall increase was primarily attributable to the growth of AI applications for biomass conversion (Category 2), which appears to align with the increased interest in process modeling of biomass conversion after 2014 (Welfle et al., 2020). Third, most studies focused on Category 2 (n = 60), followed by Category 3 (n = 45) and Category 4 (n = 39). The following sections provide an in-depth discussion of each category.

| AI applications for the prediction of biomass feedstock properties
Biomass properties directly affect the operability of biomass conversion and the quality of bio-based products. Thus, this study focuses on AI applications for predicting biomass feedstock properties that could be linked to either AI-based or traditional process-based simulations of biomass conversion. Studies applying AI to agricultural systems for biomass production are not included; they have already been reviewed in previous literature (Chlingaryan et al., 2018; Liakos et al., 2018).
AI offers a promising way to reduce the cost and time of biomass characterization, which traditionally relies on methods such as an oxygen bomb calorimeter following ASTM D5865-13 for higher heating value (HHV; García et al., 2013), thermogravimetric analysis (TGA) following ASTM D7582-15 for ash and moisture content (ASTM D7582-15, 2015) as part of proximate analysis, and ultimate analysis for the carbon and hydrogen content of biomass (Kirsten, 1983). These traditional analytical approaches are time-consuming and expensive. Many studies have used AI to predict biomass characteristics from other properties that are easier to measure (see Table S1 for detailed summaries of each paper).
Most of the AI studies reviewed focus on predicting HHV, with proximate and ultimate analyses providing the essential biomass characterization data used as inputs to the AI models. Proximate analysis data were used as AI model inputs more frequently than ultimate analysis data in the studies reviewed, possibly because ultimate analysis is generally more expensive and time-consuming than proximate analysis (Cordero et al., 2001). Specifically, 14 papers predicted the HHV of biomass using composition data from proximate analysis, covering a variety of biomass types (e.g., woody, herbaceous, agricultural, and animal biomass) (Akkaya, 2016; Ceylan et al., 2017; Dashti et al., 2019; Estiati et al., 2016; Ghugare, Tiwary, Elangovan, et al., 2014; Hosseinpour et al., 2017, 2018; Keybondorian et al., 2017a, 2017b; Ozveren, 2017; Samadi et al., 2019; Suleymani & Bemani, 2018; Uzun et al., 2017; Xing, Luo, Wang, Gao, et al., 2019). Dataset sizes vary widely, from 50 to 830 samples; 40% of the reviewed studies in Category 1 used datasets with 300-400 samples, and the rest used either fewer than 300 or more than 400. The output (e.g., HHV) and input variables (e.g., fixed carbon, volatile matter, and ash content) are similar across these studies, and most achieved high accuracy (e.g., R² > 0.9, root mean squared error < 1.5). Only five studies have predicted the HHV of biomass using ultimate analysis data, which provide elemental composition such as the content of carbon, hydrogen, oxygen, nitrogen, and sulfur (Boumanchar et al., 2019; Darvishan et al., 2018; Duan et al., 2018; Ghugare, Tiwary, Elangovan, et al., 2014; González-García, 2018). One study indicated that predicting HHV from ultimate analysis data is likely to be more accurate than predicting it from proximate analysis data (Vargas-Moreno et al., 2012).
Across these studies, nine compared the performance of AI-based models with traditional empirical correlations and reported higher R² for the AI models than for the traditional approaches (Akkaya, 2016; Ghugare, Tiwary, Elangovan, et al., 2014; Ghugare, Tiwary, Tambe, 2014; Huang et al., 2016; Xing, Luo, Wang, Gao, et al., 2019). One interesting AI application is predicting ultimate analysis data from proximate analysis data; the trained model demonstrated superior performance compared with traditional linear regression (Ghugare, Tiwary, Tambe, 2014). Given the high cost of ultimate analysis, this is a promising direction worth further exploration. In addition, one study used ultimate analysis data (144 samples) to predict the lignocellulosic composition of biomass (i.e., the contents of cellulose, hemicellulose, and lignin) with an RF model, and the trained model showed high accuracy. Another study used ultimate analysis data to predict biomass chemical exergy (relative errors within ±1.5%; Huang et al., 2016).
Although most studies achieved R² > 0.9, many reported different model accuracies when using different combinations of AI techniques, training algorithms, and other statistical analysis methods (e.g., principal component analysis (PCA) and partial least squares (PLS)). For example, Estiati et al. observed slight decreases in R² for ANN models as the training dataset size increased, which they attributed to deviations introduced by the additional data samples (Estiati et al., 2016). Hosseinpour et al. used the same datasets to train different machine learning models to predict the HHV of biomass and concluded that ANN integrated with PCA and PLS performed best, given its lowest mean absolute percentage error (MAPE) and mean squared error (MSE; Hosseinpour et al., 2018). Another study compared ANFIS combined with different heuristic approaches (PSO and GA) to predict HHV using the same dataset, and the ANFIS model optimized by PSO achieved the highest R² (0.9759). These observations are useful references for selecting AI techniques in future applications.
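Because the reviewed studies report accuracy with different metrics (R², RMSE, MSE, MAPE), it helps to see exactly how each is computed. The sketch below uses the usual definitions, with R² taken as the coefficient of determination, 1 − SS_res/SS_tot. Some studies instead report the squared Pearson correlation between measured and predicted values, which can differ when predictions are biased, so reported values are not always directly comparable. The four HHV values (MJ/kg) in the example are hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MSE, RMSE, R^2 (coefficient of determination), and MAPE (%)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r2": 1.0 - ss_res / ss_tot,
        # MAPE is undefined if any measured value is zero
        "mape": 100.0 * sum(abs(r / t) for r, t in zip(residuals, y_true)) / n,
    }


if __name__ == "__main__":
    measured = [18.0, 19.5, 17.2, 20.1]   # hypothetical HHV, MJ/kg
    predicted = [18.3, 19.1, 17.5, 20.0]  # hypothetical model output, MJ/kg
    print(regression_metrics(measured, predicted))
```

RMSE carries the units of the target (here MJ/kg), whereas R² and MAPE are unitless, which is one reason studies with different HHV ranges can rank differently depending on the metric chosen.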

| Summary
AI has been used to predict the properties of biomass feedstocks, and there is a trend toward using easier-to-measure data to predict harder-to-measure properties. Most AI studies focus on predicting HHV using data from either proximate or ultimate analysis. More studies used proximate analysis data, which are generally less expensive and faster to obtain than ultimate analysis data. Given this difference, a promising direction is to predict ultimate analysis data from proximate analysis data. Many studies compared AI with traditional empirical correlations and concluded that AI performs better, although its performance depends on the combination of training datasets, AI techniques, training algorithms, and other statistical analysis methods.

| Thermochemical conversion technologies
Pyrolysis and gasification are two of the most mature thermochemical conversion technologies (Aslani et al., 2018), which may explain why most previous AI studies focused on them (Table S2). Both thermally degrade biomass, but pyrolysis takes place in an inert atmosphere, whereas gasification uses a limited amount of an oxidizing agent (e.g., air or steam); the two also operate at different temperatures (400-700°C for pyrolysis and >700°C for gasification) and residence times (1 s-30 min for pyrolysis and 10-20 s for gasification; Lehmann & Joseph, 2009; Sharma et al., 2015). Table 2 summarizes the AI studies reviewed (see Table S2 for details of each study).
AI shows a unique capability to support traditional modeling approaches for pyrolysis. At the process level, process-based simulations (e.g., using Aspen Plus) rely on input data such as product yields and properties, which are commonly obtained from experiments. Table 2 shows several studies that used AI techniques to predict the yields and properties (e.g., HHV and carbon content) of products based on operating conditions (e.g., temperature, residence time, heating rate) and/or feedstock characterization data. Such predictions allow process simulation models to be developed rapidly without expensive and time-consuming experiments (Cheng et al., 2020). At the reaction level, kinetic models built upon fundamental reaction mechanisms and reaction kinetics can also predict the yields and composition of pyrolysis products. However, many kinetic parameters must be determined from experimental data (Hameed et al., 2019). Table 2 lists three studies that used AI approaches to estimate kinetic parameters. One study trained FNN models on compositional analysis data (the contents of cellulose, hemicellulose, and lignin) to predict the logarithmic forms of the three parameters required by kinetic models (pre-exponential factor, reaction order, and activation energy; Sunphorka et al., 2017). Aghbashlo et al. and Xing et al. extended this model by introducing the heating rate as an additional input variable (Aghbashlo et al., 2019). The extended kinetic model could predict the detailed composition of pyrolysis products, allowing for further process simulations using software such as Aspen Plus (Peters et al., 2017). Another interesting application uses AI to reduce the solution time of reaction network models for pyrolysis, enabling faster process optimization (Hough et al., 2016, 2017).
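To show how AI-predicted kinetic parameters could be consumed downstream, the sketch below plugs a pre-exponential factor A (supplied as ln A), an activation energy E_a, and a reaction order n into a single-step nth-order Arrhenius model under a constant heating rate β, dα/dT = (A/β)·exp(−E_a/(RT))·(1 − α)^n, where α is conversion, and integrates it with explicit Euler steps in temperature. The single-step model form and all parameter values are illustrative assumptions, not the models or results of the cited studies, and the neural network that would supply the parameters is not shown. The example mainly illustrates why heating rate matters as an input: at a given temperature, faster heating leaves less time for reaction and therefore lower conversion.

```python
import math
from typing import List, Tuple

R = 8.314  # gas constant, J/(mol K)


def conversion_profile(
    ln_a: float,            # ln of pre-exponential factor A (A in 1/s), e.g., from an AI model
    ea_j_per_mol: float,    # activation energy, J/mol
    order: float,           # reaction order n
    beta_k_per_min: float,  # constant heating rate, K/min
    t_start_k: float = 300.0,
    t_end_k: float = 1100.0,
    dt_k: float = 0.5,
) -> Tuple[List[float], List[float]]:
    """Integrate d(alpha)/dT = (A/beta) exp(-Ea/(R T)) (1 - alpha)^n with explicit Euler."""
    a = math.exp(ln_a)
    beta = beta_k_per_min / 60.0  # K/s, to match A in 1/s
    t, alpha = t_start_k, 0.0
    temps, alphas = [t], [alpha]
    while t < t_end_k:
        rate = (a / beta) * math.exp(-ea_j_per_mol / (R * t)) * (1.0 - alpha) ** order
        alpha = min(1.0, alpha + rate * dt_k)  # clip: conversion cannot exceed 1
        t += dt_k
        temps.append(t)
        alphas.append(alpha)
    return temps, alphas


def alpha_at(temps: List[float], alphas: List[float], target_k: float) -> float:
    """Conversion at the first grid temperature >= target_k."""
    for t, a in zip(temps, alphas):
        if t >= target_k:
            return a
    return alphas[-1]


if __name__ == "__main__":
    # Hypothetical parameters standing in for AI-predicted values
    for beta in (10.0, 50.0):
        temps, alphas = conversion_profile(ln_a=20.0, ea_j_per_mol=150e3, order=1.0,
                                           beta_k_per_min=beta)
        print(f"beta {beta:>4.0f} K/min: conversion at 750 K = {alpha_at(temps, alphas, 750.0):.2f}")
```

Explicit Euler with clipping is chosen for transparency; a stiff ODE solver would be the more robust choice for multi-component or multi-step kinetic schemes.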
These studies suggest that AI techniques could facilitate the rapid development of process simulations and kinetic models for pyrolysis. Another benefit demonstrated by previous studies is the capability to investigate different types of biomass and to include whole-biomass components. Traditional pyrolysis models, such as kinetic, network, and mechanistic models, are mostly developed for cellulose rather than for whole biomass, which includes cellulose, hemicellulose, lignin, and other minor components (Hameed et al., 2019). Many AI studies reviewed in Table 2 include biomass composition data as input variables, and they cover different types of biomass. For example, Karaci et al. used different types of feedstocks and catalysts, coupled with the quantity of catalyst and the pyrolysis temperature, to predict the ratio of hydrogen-rich gas in the product (Karaci et al., 2016). The feedstock and catalyst types were represented numerically (e.g., 1, 2, and 3 for cotton shell, tea waste, and olive husk, respectively), and these numbers were used as inputs to their ANN model. Most studies have used biomass characterization data from different analytical tools, such as proximate analysis, ultimate analysis, and lignocellulosic composition analysis, as shown in Table 2. Including different biomass components allowed previous AI studies to use contribution or sensitivity analysis to quantify the impacts of biomass composition, which are challenging to explore with traditional pyrolysis models and are often investigated experimentally (Liao et al., 2019; Sunphorka et al., 2017; Zhu et al., 2019).
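The integer coding described for Karaci et al. can be sketched as follows, alongside one-hot encoding, a common alternative that the cited study is not stated to have used. Integer codes impose an artificial ordering and distance (olive husk, coded 3, is numerically twice as far from cotton shell as tea waste is), whereas one-hot encoding treats categories as unordered at the cost of one extra input per category. The input layout (feedstock, catalyst quantity in g, temperature in °C) is a hypothetical simplification, and catalyst type is omitted for brevity.

```python
from typing import List

FEEDSTOCKS = ["cotton shell", "tea waste", "olive husk"]  # list order defines the integer codes


def integer_code(feedstock: str) -> int:
    """Integer label encoding: 1, 2, 3 for the feedstocks above."""
    return FEEDSTOCKS.index(feedstock) + 1


def one_hot(feedstock: str) -> List[float]:
    """One-hot encoding: one 0/1 input per feedstock, no implied ordering."""
    return [1.0 if f == feedstock else 0.0 for f in FEEDSTOCKS]


def input_vector(feedstock: str, catalyst_g: float, temperature_c: float,
                 encoding: str = "integer") -> List[float]:
    """Assemble a hypothetical ANN input vector."""
    if encoding == "integer":
        code = [float(integer_code(feedstock))]
    elif encoding == "one_hot":
        code = one_hot(feedstock)
    else:
        raise ValueError(f"unknown encoding: {encoding}")
    return code + [catalyst_g, temperature_c]


if __name__ == "__main__":
    print(input_vector("olive husk", 2.0, 500.0))             # [3.0, 2.0, 500.0]
    print(input_vector("olive husk", 2.0, 500.0, "one_hot"))  # [0.0, 0.0, 1.0, 2.0, 500.0]
```

Under the 10-samples-per-input rule of thumb used in this review's screening, one-hot encoding three feedstocks adds two inputs and therefore raises the minimum dataset size, which may be one practical reason integer codes are sometimes preferred.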
A few studies trained AI models using TGA data, which could reduce the time and expense of performing TGA. TGA is an analytical technique that measures the mass of a sample as it is heated under a given temperature program, and it has been used to study the thermal behavior of biomass in pyrolysis. The studies using TGA data have the largest datasets (>1000 samples) among the studies reviewed in this section. Most focused on a single type of biomass, such as walnut shell, olive oil residue, forest residue, or sewage sludge; the exception is Naqvi et al., who used blended rice husk and sewage sludge as the feedstock and set the feedstock blending ratio as one of the input variables of the FNN (Naqvi et al., 2019).
TABLE 2 A summary of AI applications for pyrolysis and gasification

Most AI studies of gasification focus on predicting syngas composition, and H2 content is the most frequently used output variable (see Table S2 for a detailed variable list). This can be explained by the importance of syngas composition, especially the H2/CO ratio, to downstream processes such as direct combustion of syngas or Fischer−Tropsch synthesis to convert syngas to biofuel (Sahoo et al., 2012). Other output variables include the calorific values (HHV and lower heating value (LHV)) of syngas (Mutlu & Yucel, 2018; Ozonoh et al., 2020; Pandey et al., 2015, 2016; Yucel et al., 2019), cold gas efficiency (CGE), and carbon conversion efficiency (CCE). In addition to reaction temperature and time, gasification studies include a few technology-specific variables, including the equivalence ratio (defined as the ratio of the air supplied for gasification to the air required for complete combustion of the biomass feedstock; Asadullah, 2014), the steam-to-biomass ratio (for steam gasification), and the calcium oxide (CaO)-to-biomass ratio (for CaO-catalyzed gasification; Rezk et al., 2019). One powerful application uses AI for dynamic analysis, control, and optimization of gasification, which is very difficult to achieve with traditional gasification modeling approaches such as thermodynamic equilibrium models and kinetic models. The thermodynamic equilibrium approach is limited to a given gasifier at given operating conditions, as it relies on assumptions such as constant temperature and perfect chemical mixing (Safarian et al., 2019). Kinetic models can be very accurate, but their accuracy requires a reliable understanding of complex phenomena such as gas-solid particulate flows and the microscopic evolution of particle distributions (Safarian et al., 2019). AI can use changing or even real-time data without the need to understand these complex phenomena. Table 2 lists several studies that used real-time monitored variables, such as the temperature and flow rate of air and fuel, to predict syngas properties. The datasets in these studies are generally large (>2350 data samples).
For example, one study used a dynamic neural network (DNN) to predict the real-time syngas temperature of a co-current fixed-bed gasifier under varying operating conditions (Mikulandrić et al., 2016), and the authors concluded that the DNN is more accurate than multiple linear regression models. Other studies have used AI for gasification optimization. Kumar et al. used an FNN to determine optimal operating conditions based on the ultimate analysis data of the biomass feedstock (Kumar et al., 2018). Rameshkumar and Mayilsamy trained an ANFIS model to predict the tar content after gasification to improve syngas quality (Rameshkumar & Mayilsamy, 2014).
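Several of the gasification variables discussed in this section are simple derived quantities, and the sketch below computes them from hypothetical data. It assumes standard combustion stoichiometry for the air requirement, (8/3)C + 8H + S − O kg of O2 per kg of fuel with C, H, S, and O as mass fractions, and an O2 mass fraction in air of about 0.232; cold gas efficiency as the chemical energy (LHV) of the cold syngas divided by that of the feedstock; and carbon conversion efficiency as the carbon leaving in CO, CO2, and CH4 divided by feedstock carbon, with gas volumes at normal conditions (22.414 Nm³/kmol). Ignoring carbon in tar and higher hydrocarbons is a simplifying assumption, and all numerical inputs are hypothetical.

```python
MOLAR_VOLUME_NM3_PER_KMOL = 22.414  # ideal gas at 0 degC, 1 atm
CARBON_KG_PER_KMOL = 12.011
O2_MASS_FRACTION_IN_AIR = 0.232


def stoichiometric_air(c: float, h: float, o: float, s: float = 0.0) -> float:
    """kg air per kg dry biomass for complete combustion; inputs are mass fractions."""
    o2_needed = (8.0 / 3.0) * c + 8.0 * h + s - o  # kg O2 per kg biomass
    return o2_needed / O2_MASS_FRACTION_IN_AIR


def equivalence_ratio(air_supplied_kg_per_kg: float, c: float, h: float, o: float,
                      s: float = 0.0) -> float:
    """Air actually supplied to the gasifier / air required for complete combustion."""
    return air_supplied_kg_per_kg / stoichiometric_air(c, h, o, s)


def cold_gas_efficiency(gas_yield_nm3_per_kg: float, lhv_gas_mj_per_nm3: float,
                        lhv_biomass_mj_per_kg: float) -> float:
    """Chemical energy in cold syngas / chemical energy in the feedstock (LHV basis)."""
    return gas_yield_nm3_per_kg * lhv_gas_mj_per_nm3 / lhv_biomass_mj_per_kg


def carbon_conversion_efficiency(gas_yield_nm3_per_kg: float, y_co: float, y_co2: float,
                                 y_ch4: float, c: float) -> float:
    """Carbon in CO, CO2, CH4 (volume fractions of syngas) / carbon in feedstock."""
    kmol_c = gas_yield_nm3_per_kg * (y_co + y_co2 + y_ch4) / MOLAR_VOLUME_NM3_PER_KMOL
    return kmol_c * CARBON_KG_PER_KMOL / c


def h2_co_ratio(y_h2: float, y_co: float) -> float:
    """Molar (volume) H2/CO ratio of the syngas."""
    return y_h2 / y_co


if __name__ == "__main__":
    # Hypothetical wood-like feedstock (dry mass fractions) and hypothetical syngas data
    print(f"ER    = {equivalence_ratio(1.8, c=0.50, h=0.06, o=0.43):.3f}")
    print(f"CGE   = {cold_gas_efficiency(2.0, 5.0, 17.0):.3f}")
    print(f"CCE   = {carbon_conversion_efficiency(2.0, 0.20, 0.12, 0.03, c=0.50):.3f}")
    print(f"H2/CO = {h2_co_ratio(0.15, 0.20):.2f}")
```

Derived quantities like these are often better computed explicitly from measured composition than learned, leaving the AI model to predict the harder-to-model composition itself.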

Another potential advantage of AI over traditional gasification models is the ability to include reactor information. Thermodynamic equilibrium models assume that the reactor is zero-dimensional, and kinetic models are time-consuming to develop (although they can predict product compositions at different reactor positions), as discussed previously (Safarian et al., 2019). Ruiz et al. pointed out that investigating the impacts of gasifier type could be a future research direction for improving gasification performance (Ruiz et al., 2013). Two AI studies include reactor information. One used the distance from the bottom of the reactor as an input (Sreejith et al., 2013). The other quantified the impacts of biomass composition in different reactors by establishing FNN models for circulating fluidized bed (CFB) and bubbling fluidized bed (BFB) gasifiers (Puig-Arnavat et al., 2013). This study was initially excluded during screening because of its small dataset but is included here because it uniquely considers different gasifiers.

| Biochemical conversion technologies
Biochemical conversion technologies include transesterification, hydrolysis, fermentation, and anaerobic digestion, which differ broadly in feedstock, reaction mechanism, and products. Table 3 summarizes AI studies by technology (see Table S3 for detailed documentation).

Transesterification
Transesterification synthesizes biodiesel from oils/fats (e.g., vegetable oil, animal fats, and algal lipids) that consist of triglycerides (Meher et al., 2006). The major products of transesterification are glycerin and fatty acid methyl esters (FAME), i.e., biodiesel, which has a high energy content (38-45 MJ/kg; Hoekman et al., 2012). Table 3 shows that biodiesel yield is the most common output variable, although different indicators were used (e.g., FAME yield and free fatty acid (FFA) conversion). In addition to input variables related to the transesterification process itself, most studies included inputs related to assisting processes, such as ultrasonic power for ultrasound-assisted transesterification (Naderloo et al., 2017; Sajjadi et al., 2017), pressure for supercritical transesterification (Baghban, 2019; Farobie et al., 2015; Guo & Baghban, 2017), and mixing intensity for transesterification aided by mechanical stirring (Sajjadi et al., 2017). These studies concluded that assisting process parameters affect biodiesel yield, although these effects are less significant than those of transesterification parameters such as the methanol-to-oil ratio (Karimi, 2017; Kusumo et al., 2017). Such information is valuable for the future scale-up and optimization of advanced transesterification processes. In addition, two studies combined AI with process simulation. Nicola et al. combined GA with Aspen Plus simulation to optimize the transesterification process (Nicola et al., 2010). Another study used data generated from Aspen Plus simulations to develop soft sensors by boosting (an ensemble machine learning method) to predict the output material flows of the transesterification simulation (Ahmad et al., 2019). Note that only 8 of the 28 transesterification studies initially collected passed the screening process.
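Nicola et al.'s coupling of GA with Aspen Plus treats the simulator as a black box that returns a yield for each candidate set of operating conditions. The sketch below reproduces that loop with a real-coded GA (tournament selection, arithmetic crossover, Gaussian mutation, and elitism), but the simulator is replaced by a hypothetical analytic surrogate whose optimum (methanol-to-oil molar ratio of 6 at 60 °C) is invented for illustration; the variable bounds and GA settings are likewise assumptions, not those of the cited study. Coupling to an actual simulator would replace the surrogate with a call into the simulation software.

```python
import random
from typing import Callable, List, Sequence, Tuple


def toy_biodiesel_yield(ratio: float, temp_c: float) -> float:
    """HYPOTHETICAL surrogate for a process simulator (yield, %); not real data.

    Invented optimum: methanol-to-oil molar ratio 6, temperature 60 degC.
    """
    return 95.0 - 0.8 * (ratio - 6.0) ** 2 - 0.02 * (temp_c - 60.0) ** 2


def ga_maximize(
    fitness: Callable[..., float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 40,
    generations: int = 60,
    mutation_scale: float = 0.05,
    seed: int = 1,
) -> Tuple[List[float], float]:
    """Real-coded GA: tournament selection, arithmetic crossover, Gaussian mutation, elitism."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda ind: fitness(*ind), reverse=True)
        next_pop = [ranked[0][:], ranked[1][:]]  # elitism: carry over the two best
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals, twice
            p1 = max(rng.sample(ranked, 3), key=lambda ind: fitness(*ind))
            p2 = max(rng.sample(ranked, 3), key=lambda ind: fitness(*ind))
            child = []
            for d, (lo, hi) in enumerate(bounds):
                u = rng.random()
                gene = u * p1[d] + (1.0 - u) * p2[d]                # arithmetic crossover
                gene += rng.gauss(0.0, mutation_scale * (hi - lo))  # Gaussian mutation
                child.append(min(hi, max(lo, gene)))                # keep within bounds
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=lambda ind: fitness(*ind))
    return best, fitness(*best)


if __name__ == "__main__":
    best, best_yield = ga_maximize(toy_biodiesel_yield, [(3.0, 15.0), (40.0, 70.0)])
    print(f"ratio {best[0]:.2f}, T {best[1]:.1f} degC, surrogate yield {best_yield:.2f} %")
```

Each fitness call stands in for a full simulation run, so in practice the population size and number of generations are usually limited by simulation time rather than by the GA itself.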
Twenty studies were excluded because of their small dataset sizes, indicating a strong need for larger, high-quality datasets.

Hydrolysis and fermentation
Table 3 shows that all reviewed AI studies of hydrolysis focus on enzymatic hydrolysis, one of the most prominent technologies for converting biomass into sugars that are subsequently fermented to produce bioethanol (Van Dyk & Pletschke, 2012). All AI studies for enzymatic hydrolysis predict sugar yields, but they used different indicators, such as the yield of glucose, xylose, or total reducing sugars. For input variables, all studies used feedstock characterization data but different enzyme data (e.g., xylanase, cellulase, α-amylase, β-glucosidase), depending on the enzymes used (Astray et al., 2016; Das et al., 2015; Rivera et al., 2010; Sebayang et al., 2017a). For operating conditions, all hydrolysis studies in Table 3 included reaction time. By contrast, traditional modeling approaches, such as the mechanistic kinetic models developed for enzymatic hydrolysis, generally fail to model extended reaction times (Jeoh et al., 2017). Previous AI studies of the fermentation process have concentrated on estimating the final bioethanol yield or the final sugar concentration using input variables such as initial sugar concentration, yeast concentration, and operating conditions (e.g., reaction time, temperature, and pH; Grahovac et al., 2016; Jahanbakhshi & Salehi, 2019; Talebnia et al., 2015). Only one study predicted the kinetic parameters of the batch fermentation process using FNN based on operating conditions (Saraceno et al., 2010), and another study used an inverse neural network to control the temperature of the fermenter (Imtiaz et al., 2013).
Although enzymatic hydrolysis and fermentation are usually operated sequentially for bioethanol production, most previous AI studies investigated the two processes separately. Among all studies reviewed, only one included both processes and predicted the composition of the final product (Talebnia et al., 2015). Because most of the reviewed studies used the same AI technique (FNN), it is technically feasible to integrate trained hydrolysis and fermentation models. However, more data and case studies are needed to support holistic optimization of the entire process, including pretreatment, hydrolysis, and fermentation.

Anaerobic digestion and dark fermentation
Anaerobic digestion uses microorganisms to degrade biomass into biogas (Chynoweth et al., 2001), a promising bioenergy product that contains CH4 and CO2. Table 3 lists the studies reviewed in this section. For input variables, all studies included feedstock characterization; the carbon-to-nitrogen ratio and substrate concentration are the most frequently used characterization variables. Most studies included operating conditions such as hydraulic retention time (HRT), pH, and stirring intensity, of which HRT is the most frequently used. This is not surprising, as HRT strongly affects the productivity and economic feasibility of biogas production (Mao et al., 2015; Shi et al., 2017). Note that AI studies for anaerobic digestion have the largest datasets among the biochemical conversion processes, and some used data from industry-scale biogas facilities (De Clercq et al., 2019).
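HRT is conventionally defined as the digester's working volume V divided by the influent volumetric flow rate Q, which shows why it couples productivity and cost. The numbers below are hypothetical and only illustrate the arithmetic:

$$\mathrm{HRT} = \frac{V}{Q} = \frac{1000\ \mathrm{m^3}}{50\ \mathrm{m^3\,d^{-1}}} = 20\ \mathrm{d}$$

For a fixed volume, a longer HRT means a lower daily feedstock throughput; for a fixed throughput, a longer HRT requires a proportionally larger, and therefore more costly, digester.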
A few studies compared AI-based methods with traditional methods. Anaerobic Digestion Model 1 (ADM1), developed by Batstone et al. (2002), has been widely used for anaerobic digestion. One study developed an ANFIS model to predict the properties of biogas from the anaerobic digestion of palm oil mill effluent and concluded that ANFIS achieved higher accuracy than ADM1 (Tan et al., 2018). Another study reached a similar conclusion, highlighting ANN coupled with GA and ACO as a promising alternative to the computationally expensive ADM1 (Beltramo et al., 2019). Sathish and Vivekanandan compared the performance of ANN and Response Surface Methodology (RSM), which integrates mathematical and statistical techniques, in predicting biogas yield (Sathish & Vivekanandan, 2016). They concluded that both models are highly accurate but that ANN performs slightly better than RSM, and they recommended ANN for future studies because of RSM's structured nature, i.e., its reliance on a predefined model form.
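For reference, RSM usually fits a second-order polynomial in the k coded input factors x_i by least squares. The reviewed study's exact model form is not stated here, so the following is an assumed, typical form:

$$y = \beta_0 + \sum_{i=1}^{k}\beta_i x_i + \sum_{i=1}^{k}\beta_{ii} x_i^2 + \sum_{i<j}\beta_{ij} x_i x_j + \varepsilon$$

This model has (k+1)(k+2)/2 coefficients, e.g., 10 for k = 3, and its shape is fixed before fitting. That fixed shape is the structured nature referred to above; an ANN instead learns the functional form from the data, at the cost of more parameters to fit.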
There are a few emerging areas for AI applications. Dark fermentation is one of the most promising technologies for producing bio-based H2, and significant efforts are needed to improve H2 production efficiency (Łukajtis et al., 2018). Four studies used FNN to model dark fermentation (El-Shafie, 2014; Nasr et al., 2013; Rosales-Colunga et al., 2010; Sewsynker & Gueguim Kana, 2016), and their results could support further process improvement and scale-up. Another area is the use of industrial effluents as feedstock for anaerobic digestion, a process that has been used for wastewater treatment in recent decades (Rajeshwari et al., 2000). One AI study used palm oil mill effluent with seed sludge as feedstock, with input variables that included water quality indicators such as chemical oxygen demand and total suspended solids (Tan et al., 2018). A further promising direction is integrating AI with genetic engineering, which uses tools such as CRISPR to overcome bottlenecks in biochemical conversion and microbial robustness (Ng et al., 2017). Genetic engineering to improve the metabolic processes of microalgae for biofuel production (Radakovits et al., 2010) and machine learning applications in genetics (Libbrecht & Noble, 2015) are promising, although these applications are in their infancy. This area is evolving rapidly, and more research is needed to determine how and when AI can provide useful insights.

| Summary
This section highlights a few advantages of AI applications compared to traditional models for thermochemical and biochemical pathways.

For pyrolysis, AI shows unique capabilities in supporting the rapid development of traditional process simulation and pyrolysis kinetic models and in investigating biomass types and components that are difficult to explore with traditional pyrolysis models. AI applications using TGA data could potentially reduce the cost and time of TGA for biomass thermal analysis. For gasification, unlike thermodynamic equilibrium and kinetic models, AI does not rely on reaction assumptions or prior knowledge of complex phenomena. Previous studies show that AI enables dynamic control and optimization of gasification using changing or even real-time data and can incorporate reactor data to improve gasification performance. Most AI studies have focused on pyrolysis and gasification, two of the most mature biomass conversion technologies; more data and AI case studies will be needed for emerging technologies such as torrefaction and hydrothermal carbonization.
For biochemical conversion, AI studies focused on enzymatic hydrolysis, fermentation, and anaerobic digestion. Most AI studies modeled enzymatic hydrolysis and fermentation separately, although these two processes are usually operated sequentially for bioethanol production. It is technically feasible to integrate AI models of hydrolysis and fermentation, but more data will be needed, and pretreatment should also be considered to support holistic analysis and optimization of the entire process. In the few comparative studies, AI models for anaerobic digestion outperformed existing, more computationally expensive models. Emerging areas for AI applications in biochemical conversion include dark fermentation, the use of waste feedstocks, and potential integration with biotechnological tools.
AI studies for biochemical and thermochemical conversion differ in a few ways. First, AI studies for biochemical conversion used much smaller datasets and mostly focused on a single feedstock. Because dataset size usually limits the number of variables (May et al., 2011), most AI studies for biochemical conversion either excluded biomass characterization data or included only a limited number of such variables. Data availability cannot be addressed by the AI and modeling communities alone; it requires collaborative efforts from the broader science and engineering communities on data collection, documentation, and sharing. Second, fewer AI studies for biochemical conversion considered product quality, whereas AI studies for thermochemical conversion generally included quality variables such as HHV and product composition. Given the importance of product quality in bioenergy applications and scale-up, future AI studies in this area should include more product quality indicators.
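The link between dataset size and the number of variables can be made concrete by counting the trainable parameters (weights plus biases) of a one-hidden-layer FNN with n_in inputs, h hidden neurons, and n_out outputs:

$$P = h\,(n_{\mathrm{in}} + 1) + n_{\mathrm{out}}\,(h + 1)$$

With hypothetical values n_in = 6, h = 10, and n_out = 1, this gives P = 10 × 7 + 1 × 11 = 81. Each additional input variable adds h more weights, so a dataset of a few dozen samples can easily contain fewer observations than the model has parameters, which is one reason small datasets constrain the number of inputs.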

| AI applications to bioenergy end-uses
Bioenergy has three types of end-uses: electricity, heat, and work (e.g., a vehicle engine converts the chemical energy in biofuel into kinetic energy that moves the vehicle). Table 4 (see Table S4 for detailed documentation) summarizes AI applications related to bioenergy end-uses.

| Biofuel used as transportation fuel
Biofuel has enormous potential to mitigate GHG and other air emissions from the transportation sector (Liew et al., 2014). Previous AI studies have focused on predicting biofuel properties and vehicle engine performance. Many AI studies predicted the cetane number, an essential indicator of a fuel's ignition characteristics (Faizollahzadeh Ardabili et al., 2019; Kessler et al., 2017; Miraboutalebi et al., 2016; Mostafaei, 2018; Rocabruno-valdés et al., 2015). Other studies estimated physicochemical properties such as viscosity, density, and iodine value using the FAME content and chemical structure information of biodiesel (e.g., the numbers of double bonds and of carbon and hydrogen atoms) and the fuel blending ratio (when biodiesel is blended with conventional diesel; Aminian & ZareNezhad, 2018; Özgür & Tosun, 2017; Razavi et al., 2019; Rocabruno-valdés et al., 2015). Because FAME content and yield are output variables of AI models for biomass conversion to biodiesel (Table 3), the trained AI models for cetane number could be integrated with these production models to improve or optimize biomass conversion for high-quality biofuel. In addition, AI can assist the simulation of biofuel phase equilibrium by predicting the binary interaction parameters of local composition models for a given temperature and biofuel/solvent composition (Reynel-Ávila et al., 2019).
Two studies compared AI approaches with traditional analytical methods. One compared ANN with multiple linear regression (MLR) in predicting cetane numbers and concluded that ANN and MLR showed similar predictive capabilities, but ANN may have advantages in modeling complex parameters (Piloto-Rodríguez et al., 2013). Another compared ANN with the traditional Knothe-Steidley and Ramirez-Verduzco methods in estimating the kinematic viscosity of biodiesel (Meng et al., 2014). The results showed that ANN performed better, particularly in accounting for the contributions of minor components to viscosity, whereas the other two methods underpredicted kinematic viscosity. Note that neither study is included in Table 4 because of their small dataset sizes; these conclusions therefore still need to be validated by comparative studies with larger datasets.
For engine performance, most studies used AI to predict indicators related to either energy or emissions (as shown in Table 4), and a few studies also included parameters of vehicle comfort. Most studies used FNN, while several used the extreme learning machine (ELM), a learning scheme for single-hidden-layer FNNs (Huang et al., 2004), and concluded that ELM-based methods predict biofuel engine performance and emissions better than other methods such as SVM and ANN (Aghbashlo et al., 2016; Sebayang et al., 2017b). Most AI studies investigated biodiesel blended with fossil-based diesel as engine fuel, but several investigated biodiesel-diesel blends combined with other fuels or additives, including ethanol (Ghaderi et al., 2019; Oǧuz et al., 2010; Silitonga et al., 2018), H2 (Javed et al., 2015), zinc oxide nanoparticles (Javed et al., 2018), and natural gas (Çelebi et al., 2017). These studies used input-output variables similar to those of the AI models for biodiesel-diesel blends, and such models could help support future applications of different types of biofuels.
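The core ELM idea (Huang et al., 2004) can be sketched briefly: the hidden-layer weights are drawn at random and never trained, and only the output weights are obtained from a single regularized least-squares solve, which is why ELM training is fast. This plain-Python sketch assumes a tanh activation, a small ridge term for numerical stability, and synthetic data; the function names, the two inputs (which could stand for, e.g., blend ratio and engine load scaled to [0, 1]), and the target function are hypothetical and not taken from the reviewed studies.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def train_elm(X, y, n_hidden=10, ridge=1e-3, seed=0):
    """Hypothetical ELM regressor: random hidden layer, least-squares output layer."""
    rng = random.Random(seed)
    n_in = len(X[0])
    # Random, untrained input weights and biases (the defining ELM step).
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]

    def hidden(x):
        # tanh hidden activations plus a constant column acting as an output bias.
        acts = [math.tanh(sum(w * xi for w, xi in zip(Wj, x)) + bj) for Wj, bj in zip(W, b)]
        return acts + [1.0]

    H = [hidden(x) for x in X]
    k = n_hidden + 1
    # Regularized normal equations: (H^T H + ridge * I) beta = H^T y.
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0) for j in range(k)]
         for i in range(k)]
    rhs = [sum(h[i] * yi for h, yi in zip(H, y)) for i in range(k)]
    beta = solve(A, rhs)
    return lambda x: sum(bi * hi for bi, hi in zip(beta, hidden(x)))


# Synthetic, hypothetical data: two inputs scaled to [0, 1] and a smooth target.
X = [(i / 10, j / 4) for i in range(11) for j in range(5)]
y = [0.3 + 0.5 * x0 - 0.2 * x0 ** 2 + 0.1 * x1 for x0, x1 in X]

model = train_elm(X, y)
y_bar = sum(y) / len(y)
mse_model = sum((model(x) - yi) ** 2 for x, yi in zip(X, y)) / len(y)
mse_base = sum((y_bar - yi) ** 2 for yi in y) / len(y)
```

In practice, inputs would be normalized, and the number of hidden neurons and the ridge term would be chosen on validation data rather than fixed as here.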

| Bioenergy used for heat and/or electricity generation
Compared with the literature on biofuel, AI studies of bioenergy used for heat and/or electricity generation are relatively limited. For heat generation, several studies applied AI to boiler systems that fire biomass or co-fire biomass with coal, to optimize boiler performance or address technical challenges. Romeo and Gareta trained an ANFIS model using historical and online data from a boiler to predict steam output and fouling indexes (e.g., for the furnace, superheater, and evaporator), and the turbine power output improved by 3.5% (Romeo & Gareta, 2009). Pital and Mižák used 1200 data samples from flue gas sensors to train FNNs that take the flue gas O2 concentration as input and predict the flue gas CO concentration, a key parameter for combustion efficiency, pollution control, and operational safety (Pital & Mižák, 2013). Tóth et al. combined flame images captured by a digital camera in the boiler with operational parameters to train a DNN that predicts water temperature in real time, in order to identify potential operating problems and facilitate robust control (Tóth et al., 2017). For electricity generation, AI has been applied to different technologies. Using data from a biomass CHP plant in Sweden, two studies trained FNNs for individual facilities in the plant (e.g., gas turbine, boiler, and steam turbine), but an integrated model for the entire plant has not been developed (De et al., 2007; Fast & Palmé, 2010). Djatkov et al. used two symbolic AI techniques, FL and ES, to improve the performance of a biogas production plant (anaerobic digester) combined with a biogas power plant (Djatkov et al., 2014). One study trained an FNN model for a micro gas turbine that predicted the power output and other outlet conditions (e.g., CO2 emission, compressor outlet temperature, and pressure; Nikpey et al., 2014).
Another study trained an FNN for small-scale electricity generation by a solid oxide fuel cell, using the flow rates of biogas components (H2, CO2, CH4) and operating conditions to predict the output voltage (Baldinelli et al., 2018).

| Summary
Previous AI studies have demonstrated AI's potential in predicting biofuel properties and the engine performance of pure and blended biofuels, which could support engine testing and design for biofuel use. Furthermore, these trained AI models could be integrated with the biomass conversion models discussed previously to enable system-level planning and optimization of biofuel production for high engine performance. Although most AI studies focused on environmental and energy (related to the economic aspect) indicators, a few incorporated indicators relevant to vehicle comfort. How to leverage AI to consider different aspects of sustainability and technical performance is a possible future research direction. For heat generation, a few studies applied AI to improve boiler performance or identify potential process issues. For electricity generation, AI has been applied to technologies ranging from CHP to solid oxide fuel cells. Given the large variety of technologies, these studies share few similarities in variables, dataset sizes, or AI techniques.

| AI applications to bioenergy supply chain
Supply chain design and optimization are essential for the large-scale production and utilization of bioenergy. Conventional BSC modeling approaches include optimization (e.g., multiobjective mixed-integer linear programming) and simulation (e.g., Monte Carlo simulation), most of which are computationally intensive, depending on the complexity, variability, and uncertainty of the BSC (Awudu & Zhang, 2012; H. Ghaderi et al., 2016; Shabani et al., 2013). Table 5 summarizes AI studies by AI technique, biomass feedstock, bioenergy product, decision level, and objective. Table S5 documents each study reviewed in this section, including the geographic location of its case study, since many supply-chain parameters (e.g., biomass availability) are location-specific. Strategic, tactical, and operational decisions refer to long-term, medium-term, and short-term decisions, respectively (Ba et al., 2016). Table 5 shows that most studies published between 2008 and 2013 focused on bioenergy for heat and electricity generation, while studies after 2014 focused more on biofuels, including biodiesel, bioethanol, biogas, bio-oil, biobutanol, and fuel pellets. In addition, most AI applications are region-specific case studies with single or multiple biorefineries, except for several studies at larger scales, such as county-level analyses (Arabi et al., 2019; Asadi et al., 2018; Durand et al., 2012). Most studies in Table 5 concentrated on strategic (e.g., location selection) and/or tactical decisions (e.g., inventory planning); one exception focused on weekly operational BSC planning for heat and electricity generation using PSO and GA (Pinho et al., 2018).
Another exception is Mayerle and Neiva de Figueiredo, who implemented optimization based on the first-fit-decreasing (FFD) heuristic algorithm that combined strategic, tactical, and operational decisions (Mayerle & Neiva de Figueiredo, 2016). Another trend is that the number of studies with mixed decision levels (strategic-tactical and strategic-tactical-operational) has increased in recent years. Because mixed decision levels usually indicate higher complexity (De Meyer et al., 2014), which may lead to computational barriers, heuristic algorithms can help solve multi-decision-level BSC optimization problems in reasonable runtimes (Baños et al., 2011; Pezzini et al., 2011). Previous studies have demonstrated AI as a promising option for overcoming computational barriers and providing near-optimal solutions to BSC problems. Many studies combined traditional optimization methods with heuristic algorithms. Several highlighted the reduction in computational cost from using heuristics (Asadi et al., 2018; Kumar et al., 2015; Lopez et al., 2008; López et al., 2008), especially when solving BSC models with large numbers of variables and constraints (Asadi et al., 2018). Kumar et al. reported that adaptive large neighborhood search (ALNS), a heuristic algorithm, took 140 s, whereas CPLEX took about 3 h, to find the solution to the same BSC problem. A few studies found that heuristic-based optimization (without a commercial solver) required longer computational time than commercial solvers such as LINGO (Sarker et al., 2018, 2019; Wu et al., 2015). However, compared with the open-source mixed-integer nonlinear programming (MINLP) solvers BONMIN and NOMAD, heuristics achieved better objective function values (Sarker et al., 2018, 2019; Wu et al., 2015). Note, though, that all three studies were conducted by the same authors, and this advantage of heuristics was reported for similar biogas systems.
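The classic first-fit-decreasing heuristic itself is simple: sort items from largest to smallest and place each in the first bin with enough remaining capacity, opening a new bin only when none fits. The following sketch applies it to a hypothetical problem of assigning biomass loads (in tonnes) to trucks of equal capacity; it shows only the generic bin-packing heuristic, not the integrated strategic-tactical-operational model of Mayerle and Neiva de Figueiredo (2016), and the function name and data are illustrative assumptions.

```python
def first_fit_decreasing(loads, capacity):
    """Pack loads into bins with the FFD heuristic (not guaranteed optimal).

    Returns a list of bins, each a list of the loads assigned to it.
    """
    if any(load > capacity for load in loads):
        raise ValueError("a single load exceeds the bin capacity")
    bins, remaining = [], []
    for load in sorted(loads, reverse=True):
        # Place the load in the first bin that still has room for it.
        for i, free in enumerate(remaining):
            if load <= free:
                bins[i].append(load)
                remaining[i] -= load
                break
        else:
            # No existing bin fits: open a new one.
            bins.append([load])
            remaining.append(capacity - load)
    return bins


# Hypothetical biomass loads (t) and truck capacity (t).
trucks = first_fit_decreasing([5, 3, 8, 1, 6, 2, 4, 7], capacity=10)
# -> [[8, 2], [7, 3], [6, 4], [5, 1]]
```

FFD is known to use at most roughly 11/9 of the optimal number of bins plus a small additive constant, which illustrates why such heuristics are attractive when exact solvers become slow on large instances.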
More studies of different biomass systems will be needed to draw a general conclusion about the advantage of heuristics. Several studies compared the computational time of different heuristics and concluded that PSO surpassed other heuristics in reaching near-optimal solutions (Gómez-González et al., 2013; López et al., 2008; Vera et al., 2010), with one exception in which GA performed better than PSO in optimizing the net present value of the given BSC scenario (Rafiel & Zadeh, 2013).
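To make the PSO comparisons above concrete, the following sketch applies a basic global-best PSO to a toy continuous siting problem: choosing one biorefinery location that minimizes supply-weighted Euclidean distance to four hypothetical biomass supply points at the corners of a 10 km square. The supply data, cost function, and parameter values (inertia 0.729 and acceleration coefficients 1.49445, commonly used defaults) are assumptions for illustration, not settings from the reviewed studies. Because the points are symmetric with equal weights, the optimum is known to be the square's center, (5, 5), which makes the result easy to check.

```python
import math
import random

# Hypothetical supply points ((x, y) in km) with equal biomass supply weights.
SUPPLY = [((0.0, 0.0), 1.0), ((10.0, 0.0), 1.0), ((0.0, 10.0), 1.0), ((10.0, 10.0), 1.0)]


def transport_cost(site):
    """Supply-weighted Euclidean distance from a candidate site to all supply points."""
    return sum(w * math.dist(site, p) for p, w in SUPPLY)


def pso(cost, bounds, n_particles=30, n_iter=200, inertia=0.729, c1=1.49445, c2=1.49445, seed=42):
    """Basic global-best particle swarm optimization within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


site, cost = pso(transport_cost, bounds=[(0.0, 10.0), (0.0, 10.0)])
```

Real BSC models add discrete facility decisions, capacities, and multiple periods, which is where heuristics such as PSO are typically combined with, or compared against, the mathematical programming solvers discussed above.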
Furthermore, some studies leveraged data-driven AI techniques such as FNN and SVM to develop predictive models when knowledge of input-output relationships is too limited for traditional optimization/simulation approaches. Mirkouei et al. used SVM to predict biomass quality and accessibility indicators, which were then used in an optimization model for the biomass-to-bio-oil supply chain in Oregon, U.S., solved by GA (Mirkouei et al., 2016, 2017). Another example is the application of ANN to predict soil erosion and the soil conditioning index of the corresponding region (K. Sahoo et al., 2016); such information has previously been very challenging to incorporate into BSC optimization/simulation because of high computational load and difficulty of measurement (Mirkouei et al., 2016, 2017; Sahoo et al., 2016).
Several reviewed studies used FL-based methods to address BSC uncertainties related to biomass availability (Arabi et al., 2019; Khishtandar, 2019; Tong et al., 2014), biofuel demand (Tong et al., 2014), land use (Balaman & Selim, 2014, 2015), and biomass prices (Balaman et al., 2018; Khishtandar, 2019). In addition, two studies used ABM to simulate individual behavior across the BSC and then implemented simulation-based optimization (Kim et al., 2018; Shu et al., 2017). More case studies are needed to generate standardized, widely accepted, and easily implemented AI selection procedures and applications. Given the many available AI algorithms, selecting suitable ones is critical and depends highly on the scope, data, and structure of the BSC. Table 5 shows large variations in AI techniques, and those studies can serve as helpful references for future research and applications of AI. More effort is also needed to explore AI applications for the environmental and social aspects of bioenergy systems. Table 5 shows that most studies have focused on economic objectives; only a few included environmental objectives, such as minimizing GHG emissions. A recent study proposed a framework that applied ABM simulation to assess the regional sustainability of bioenergy from economic, environmental, and social aspects (Rouleau & Zupko, 2019). This study suggested using standardized methods such as Life Cycle Assessment (LCA) to evaluate the environmental impacts of a product's life cycle, which covers most activities involved in a BSC. Many LCAs have been conducted for biomass-based products (Liao et al., 2020; Peters et al., 2015; Vienescu et al., 2018), and the integration of LCA into BSC optimization has been explored (Yue et al., 2014).
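A common way for FL-based BSC models to represent an uncertain quantity, such as the county-level biomass availability mentioned above, is a triangular fuzzy number (lowest plausible, most plausible, highest plausible value), which is later converted to a crisp value by defuzzification. The following is a minimal, hypothetical illustration of that representation with fuzzy addition and centroid defuzzification; the class name and supply figures are assumptions, and the reviewed studies may use other membership functions or defuzzification methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriangularFuzzy:
    """Triangular fuzzy number (low, mode, high), e.g., uncertain biomass supply."""
    low: float
    mode: float
    high: float

    def __post_init__(self):
        if not self.low <= self.mode <= self.high:
            raise ValueError("require low <= mode <= high")

    def membership(self, x):
        """Degree (0 to 1) to which x belongs to this fuzzy number."""
        if x < self.low or x > self.high:
            return 0.0
        if x == self.mode:
            return 1.0
        if x < self.mode:
            return (x - self.low) / (self.mode - self.low)
        return (self.high - x) / (self.high - self.mode)

    def __add__(self, other):
        # Fuzzy addition of triangular numbers adds the three defining points.
        return TriangularFuzzy(self.low + other.low, self.mode + other.mode,
                               self.high + other.high)

    def centroid(self):
        """Centroid defuzzification of a triangle: (low + mode + high) / 3."""
        return (self.low + self.mode + self.high) / 3


# Hypothetical annual biomass availability (kt/yr) in two supply counties.
county_a = TriangularFuzzy(80, 100, 130)
county_b = TriangularFuzzy(40, 50, 55)
total = county_a + county_b    # TriangularFuzzy(low=120, mode=150, high=185)
crisp = total.centroid()       # approx. 151.67
degree = total.membership(135)  # 0.5
```

The crisp value could then enter a deterministic optimization model, while the membership function retains how plausible other supply levels are.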
One major challenge of conducting LCA for bioenergy systems, especially for new technologies or feedstocks, is the lack of Life Cycle Inventory (LCI) data (e.g., energy consumption and air emissions; Yao & Masanet, 2018), and AI could be a promising tool for addressing this challenge. Several studies used AI techniques such as ANN to generate missing LCI data (Hischier et al., 2005; Liao et al., 2019, 2020; Song et al., 2017). A few studies used machine learning approaches to generate spatially and temporally explicit LCI data for biomass production (Romeiko et al., 2019, 2020). However, more case studies are needed to understand the effectiveness of different AI techniques in addressing the data challenges of LCA. Regarding the social aspect, more research is needed to understand which social indicators are critical to BSC design and planning and how AI could help estimate, analyze, or generate them.

| Summary
Most AI studies in this section concentrated on optimizing BSCs at different decision-making levels, especially the strategic and/or tactical levels. In recent years, heuristics-based BSC optimization has increasingly addressed mixed decision levels. A variety of heuristics have been used, and a few studies reported lower computational costs for heuristics than for traditional optimization methods. AI has been used to address modeling challenges of traditional BSC optimization, such as unknown input-output relationships, uncertainty, and the inclusion of individual behavior. Most studies optimized economic performance; future research should explore AI applications for understanding and optimizing the environmental and social impacts of bioenergy systems. Integrating AI with LCA is a promising research direction, and a few studies already show the benefits of AI in addressing the data challenges of LCA for different biomass systems.

| CONCLUSIONS
This review identified four aspects that previous AI applications focused on: predicting biomass feedstock properties, predicting process performance of biomass conversion to bioenergy/biofuels, predicting biofuel properties and the performance of bioenergy end-use systems, and design, planning, and optimization for BSC.
The review demonstrates the powerful capability of AI in addressing research, measurement, and modeling challenges in bioenergy. This study identified AI applications for generating data that are time-consuming, expensive, or difficult to measure directly, including biomass properties, biofuel properties, kinetic parameters, engine performance, and LCI data. In many studies across different biomass conversion pathways, AI outperformed traditional models. Previous studies have also used AI to address limitations of traditional optimization and simulation approaches, such as uncertainties, reliance on knowledge-based models, and high computational costs.
For future research, efforts are needed in three areas:
• Developing standardized and practical procedures for selecting algorithms and determining dataset sizes. Developing such procedures will require a comprehensive understanding of how different algorithms and training datasets affect the solution of various bioenergy problems, as well as more case studies of bioenergy systems with different biomass feedstocks, conversion technologies, and products.
• Enhancing data collection, documentation, and sharing across bioenergy-related areas. Effective application of AI in bioenergy relies on the availability and quality of data, which are mostly generated by independent experimental studies, and on improved data sharing across the broader science and engineering communities. Measures to enhance data availability and quality could include involving AI experts earlier in experimentation (especially high-throughput experimentation). More accessible, well-documented, and high-quality data are also needed to broaden AI applications that support decision-making in bioenergy system design, operation, and optimization.
• Exploring the potential of AI to support the sustainable development of bioenergy systems from a holistic perspective. Previous studies have focused on either a single aspect of sustainability (e.g., economic) or an individual part of bioenergy systems (e.g., biomass conversion). Analytic tools such as LCA provide a foundation for holistic assessment, and AI-integrated models could help address challenges such as lack of data. More exploration is needed to identify suitable and effective ways to combine AI techniques with existing tools to enable holistic assessment and optimization of bioenergy systems for sustainability.