Artificial neural network based modeling for the prediction of yield and surface area of activated carbon from biomass

Activated carbon (AC) is an adsorbent material with broad industrial applications. Understanding and predicting the yield and quality of AC produced from different feedstock is critical for biomass screening and process design. In this study, multilayer feedforward artificial neural network (ANN) models were developed to predict the total yield and surface area of AC produced from various biomass feedstock using pyrolysis and steam activation. In total, 168 data samples identified from experiments in the literature were used to train, validate, and test the ANN models. The trained ANN models showed high accuracy ($R^2$ > 0.9) and demonstrated good alignment with the independent experimental data. The impacts of using datasets based on different biomass characterization methods (i.e., ultimate analysis and proximate analysis) were evaluated and compared. Finally, a contribution analysis was conducted to understand the impact of different process factors on AC yield and surface area. © 2019 Society of Chemical Industry and John Wiley & Sons, Ltd


Introduction
Activated carbon (AC) is an adsorbent material with high porosity, large adsorption capacity, and superior surface reactivity. 1 It has been widely used as an adsorbent and a catalyst in the manufacturing, pharmaceutical, water treatment, and agricultural sectors. 2,3 Activated carbon can be produced from a variety of feedstocks such as coal, petroleum residues, woods, agricultural residues, and other carbonaceous resources. 4 Using renewable feedstock such as biomass and agricultural / industrial byproducts (e.g., biochar) has gained increasing attention for its potential to reduce life cycle environmental footprints and enhance the efficiency of natural resource utilization. 4 There are currently two general methods to produce AC from biomass: chemical activation and physical activation. Steam activation, a typical type of physical activation, has been commercialized to produce AC for environmental applications. 5 The quality and physicochemical properties of AC produced by steam activation are driven by many factors such as feedstock quality and composition, steam-to-carbon ratio, and operational conditions (e.g., time, temperature, and oxidative atmosphere) of the carbonization and activation processes. 6,7 To produce high-quality AC products, it is critical to understand the effects of various biomass feedstock and process parameters. Furthermore, given the large number of biomass candidates, there is a need for decision-support tools that can be used by scientists, engineers, and industry to screen different kinds of biomass and to tailor the initial process design strategies for specific feedstock candidates.
Previous studies indicated that a high ash content in the feedstock leads to low adsorption capacity for AC. 8,9 Aworn et al. 10 tried to optimize the Brunauer-Emmett-Teller (BET) surface area of AC from agricultural waste at different activation temperatures. They found that the optimal activation temperature depends on the type of biomass feedstock and its composition, in particular ash content. 10 The effects of operational conditions on AC production have also been studied by separately examining the initial carbonization and subsequent activation steps. In the carbonization step, previous studies indicated that temperature and residence time are key parameters affecting the yield and quality of the AC product. For example, Li et al. 11 investigated AC produced from coconut shell and found that higher carbonization temperatures resulted in a lower yield and a higher BET surface area of AC. Regarding the activation step, most previous studies focused on the impact of activation temperature and time, and steam-to-biochar ratio (the total mass of steam in the activation step divided by the mass of biochar). Chen et al. 12 found that the yield of AC from pine nut shells decreased as the activation temperature increased from 750 to 950 °C. Similar trends have been reported in other studies using Sicyos angulatus L. and bamboo wastes as feedstock. 13,14 Another study indicated that increasing the steam-to-biochar ratio decreases AC yield. 15
A few studies have tried to develop predictive models by statistically analyzing the relationships between process conditions and the quality of final AC products. Azargohar et al. used a central composite design (CCD) to understand the impacts of activation temperature, activation time, and steam-to-biochar ratio on the yield and BET surface area of AC produced by steam activation of spruce wood and lignite coal. 16,17 Other studies used CCD to optimize ACs for methane adsorption, 18 chemical oxygen demand removal, 19 and iodine number. 20
Most previous studies have focused on analyzing operating parameters of the steam activation process, while fewer have conducted a systematic evaluation of the biomass feedstock type and composition. Jiang et al. 21 used different machine learning approaches, such as linear regression, support vector regression, and random forest regression, to predict the methylene blue number and iodine number of straw-based AC products. Their study included the composition data of three different types of straw as well as both carbonization and activation processes. 21 However, none of this prior work tried to evaluate or predict the interactions between a wide variety of biomass feedstocks and a wide variety of carbonization and activation conditions. This work fills gaps in the prior work, and explicitly addresses the impacts of the interaction between the biomass source and the activation process on the final AC properties using artificial neural networks (ANN). ANN is a data-driven approach that can reveal complex relationships between multiple inputs and outputs without the need for a mathematical description of the phenomena. 22 Unlike a programmed model, an ANN model is a trained system that has adaptability and improved accuracy with updated data. 23 It has been applied to predict the process parameters of biomass conversion such as oil extraction yields, 24 kinetic parameters of pyrolysis, 22 and selectivity of pyrolysis products. 25 However, to the best of our knowledge, ANN has not been applied to the steam-activation process to predict AC yield and surface area. With the ability to unveil complex relationships between input and output variables, ANN has great potential for predicting the quantity and quality of AC products derived from different biomass feedstock.
In this study, ANNs are developed to predict the total yield and BET surface area of AC production. The ANN model presented in this study can be used for early-stage screening of biomass feedstock before intensive investment in research and development (R&D). It can also be used by a broad range of stakeholders such as scientists, engineers, and project managers to enhance experimental and process design. In addition, the predicted data (e.g., AC yield) from ANN models can be used in process simulation (e.g., Aspen Plus simulation) of AC production from different biomass feedstock, where AC yield is a key input. 26

Material and methods
In this study, two multilayer feedforward ANNs were developed to predict the total yield ($Y_T$) and BET surface area ($S_{BET}$) of AC production. $Y_T$ and $S_{BET}$ are commonly used to indicate the performance of AC production due to their dominant impact on the production costs and performance of ACs. 3,27 In feedforward ANNs, the outputs of one layer of neurons are fed into the next layers of neurons without any backward connection. 28 This type of ANN has been applied to many chemical engineering problems. 29 Using the common definition from the literature, $Y_T$ is defined as the dry weight of AC product divided by the dry weight of the feedstock, as shown in Eqn (1). $S_{BET}$ is determined by the BET method, which is a common method for measuring the surface characteristics of porous solids. 30

$$Y_T = \frac{\text{mass of AC product}}{\text{mass of feedstock}} \times 100\% \quad (1)$$

The ANN models in this work were trained using 11 input parameters, including six related to feedstock composition (i.e., carbon content, hydrogen content, and oxygen content from ultimate analysis; fixed carbon, volatile matter, and ash content from proximate analysis), two related to carbonization conditions (i.e., carbonization temperature and time), and three related to activation conditions (i.e., activation temperature, time, and steam-to-biochar ratio). The input parameters were collected from 20 literature references that studied a wide variety of biomass feedstock (see Table S1 in the supplementary material for data samples).
To test the capability of the ANN models for predicting $Y_T$ and $S_{BET}$ across a broad range of biomass feedstock, an extra validation was performed using a relatively comprehensive set of experimental data from Chen et al. 12 The work by Chen et al. 12 used pine nut shell as a feedstock, and none of the other data sets used in training the ANN models included pine nut shell as a feedstock. A comparison between the two alternative data sets used for model validation will be discussed in detail below. A contribution analysis was performed to understand the impacts of feedstock composition and process parameters on $Y_T$ and $S_{BET}$ across different biomass feedstock.

Analytical scenarios
The input data were all collected from the literature. Based on a preliminary evaluation of these data, it was found that many studies did not report both ultimate and proximate analysis. This has been attributed to the cost and complexity of the special instruments used to measure ultimate analysis, while proximate analysis is relatively easy to measure. 31 This limitation was addressed by creating three scenarios to investigate the impacts of alternative attributes to measure biomass characteristics on the accuracy of ANN models:
• Scenario 1: proximate analysis + carbonization conditions + activation conditions (eight input variables).
• Scenario 2: ultimate analysis + carbonization conditions + activation conditions (eight input variables).
• Scenario 3: proximate analysis + ultimate analysis + carbonization conditions + activation conditions (11 input variables).
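The three scenarios can be sketched as input-variable sets. This is a minimal illustration only: the variable names below are hypothetical labels chosen for readability and are not identifiers from the original study.

```python
# Hypothetical variable names (assumptions for illustration only).
PROXIMATE = ["fixed_carbon", "volatile_matter", "ash"]
ULTIMATE = ["carbon", "hydrogen", "oxygen"]
CARBONIZATION = ["carbonization_temperature", "carbonization_time"]
ACTIVATION = ["activation_temperature", "activation_time", "steam_to_biochar_ratio"]

# Each scenario combines one or both biomass characterizations
# with the same five process variables.
SCENARIOS = {
    1: PROXIMATE + CARBONIZATION + ACTIVATION,
    2: ULTIMATE + CARBONIZATION + ACTIVATION,
    3: PROXIMATE + ULTIMATE + CARBONIZATION + ACTIVATION,
}


def select_inputs(sample, scenario):
    """Return the ordered input vector of one data sample (a dict) for a scenario."""
    return [sample[name] for name in SCENARIOS[scenario]]


for number, names in SCENARIOS.items():
    print(f"Scenario {number}: {len(names)} input variables")
```

Keeping the process variables identical across scenarios means that any difference in model behavior between scenarios can be attributed to the biomass characterization data alone.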

Experimental data collection and preprocessing
To provide some level of consistency, the references were screened using the following criteria:
• Biomass feedstock was identified with transparent data for elemental analysis and / or proximate analysis.
• Data on $Y_T$ and/or $S_{BET}$ included a clear description of the analytical method and calculation procedure.
• Specific time and temperature information was given for the carbonization and activation processing conditions.
• Carbonization dwell times were greater than 10 min, and there was an oxygen-free atmosphere for both the carbonization and activation steps.
• The source was a peer-reviewed journal article published in the past 20 years.
The data can be preprocessed by normalization to enhance the learning speed of ANN, as the output of ANN becomes more sensitive to variation in input values. 32,33 In this study, the data are normalized by a linear normalization function as shown in Eqn (2): 33

$$S_{i,j} = \frac{V_{i,j} - \mathrm{Min}_j}{\mathrm{Max}_j - \mathrm{Min}_j} \quad (2)$$

where $S_{i,j}$ is the normalized value for each data point, $V_{i,j}$, in data sample i (i = 1, ..., 168) for parameter j (8, 8, and 11 input parameters in the three scenarios, and 2 output parameters). $\mathrm{Min}_j$ and $\mathrm{Max}_j$ are the minimum and maximum for parameter j.
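The linear normalization described above can be sketched as follows. This is a minimal illustration that assumes each parameter is scaled to the range [0, 1] using its minimum and maximum; the function names and the sample values are hypothetical, not taken from the study's dataset.

```python
def fit_min_max(samples):
    """Return per-parameter minima and maxima for a list of samples (rows)."""
    n_params = len(samples[0])
    mins = [min(row[j] for row in samples) for j in range(n_params)]
    maxs = [max(row[j] for row in samples) for j in range(n_params)]
    return mins, maxs


def normalize(samples, mins, maxs):
    """Scale every value to [0, 1]; a constant parameter maps to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in samples
    ]


def denormalize(value, lo, hi):
    """Map a normalized prediction back to its physical units."""
    return value * (hi - lo) + lo


# Illustrative values only: [activation temperature (deg C), activation time (min)].
samples = [[750.0, 40.0], [850.0, 80.0], [950.0, 120.0]]
mins, maxs = fit_min_max(samples)
scaled = normalize(samples, mins, maxs)
print(scaled)
```

The minima and maxima fitted on the training data would also be applied to any new sample, and `denormalize` converts a normalized network output back to a yield or surface area.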

Artificial neural network
In this work, multilayer perceptron (MLP) feedforward neural networks were developed to predict $Y_T$ and $S_{BET}$. A typical MLP structure consists of three types of layers 34 : an input layer representing input variables, an output layer including output variables, and one or more connected hidden layers simulating the correlations between inputs and outputs. The structure of the ANNs developed in this study is shown in Fig. 1. Neurons in hidden layers connect normalized input to output data by the activation function ($f_{act}$) as shown in Eqn (3):

$$y_j = f_{act}\left(\sum_{i} w_{ij}\, x_i + \theta_j\right) \quad (3)$$

where $x_i$ is the ith component of the input vector; $y_j$ is the output of the jth neuron; $w_{ij}$ is the weight from the ith component of the input layer to the jth neuron in the hidden layer; and $\theta_j$ is the bias of the jth neuron in the hidden layer, which is summed with the other input values of the jth neuron to provide an additional adjustable parameter. Equation (3) is also used for the output layer to process the data fed by hidden layers. Typically, different activation functions are selected for hidden and output layers. In this study, the tan-sigmoidal function (tansig, as shown in Eqn (4)) was used as the activation function of the hidden layer, and the linear transfer function (purelin, as shown in Eqn (5)) was selected for the output layer. This combination was selected based on its effectiveness in previous studies. 22,24,35-37

$$\mathrm{tansig}(x) = \frac{2}{1 + e^{-2x}} - 1 \quad (4)$$

$$\mathrm{purelin}(x) = x \quad (5)$$

The number of hidden layers and the number of neurons in each hidden layer are important parameters in constructing the ANNs. 38 In this study, the number of hidden layers was selected to be similar to previous studies with similar data sample size. 22 The number of neurons in the hidden layer must be determined carefully. Setting the number of neurons too high may lead to overfitting, where the ANN model learns unnecessary details of the training data 39 and then has decreased value for predicting the behavior of new samples.
Setting the number of neurons too low may result in a failure to reach the optimal conditions and lead to underfitting in the training process, with a subsequent decline in the quality of any future ANN-predicted data. 40 There are no standard rules for selecting a suitable number of neurons in hidden layers, 41 so the number of neurons is commonly determined by trial and error. 42 In this study, two approaches were used to avoid underfitting and overfitting. First, the number of neurons (from 10 to 20) was varied during the training, and those with reasonable combinations of computation time and high $R^2$ were kept (e.g., convergence within 1000 iterations with overall $R^2$ > 0.9).
Then the simulation results were tested against the independent experimental data with changing input parameters (e.g., temperature changing from 750 to 950 °C) to check for overfitting. This allowed for determination of the final number of neurons for ANN models for each combination of inputs and outputs.
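The neuron computation described for Eqn (3), with a tansig hidden layer and a purelin output layer, can be sketched as a forward pass. This is an illustrative sketch only: a single hidden layer is assumed, the weights and biases are placeholders rather than the trained values in the paper's tables, and `tansig` is implemented with `math.tanh`, which is mathematically identical to 2 / (1 + exp(-2x)) - 1.

```python
import math


def tansig(x):
    """Tan-sigmoid transfer function; equal to 2 / (1 + exp(-2x)) - 1."""
    return math.tanh(x)


def purelin(x):
    """Linear transfer function used for the output layer."""
    return x


def layer(inputs, weights, biases, f_act):
    """Apply y_j = f_act(sum_i w_ij * x_i + theta_j) to every neuron j.

    weights[j] holds the weights feeding neuron j (one row per neuron).
    """
    return [
        f_act(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def predict(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: normalized inputs -> tansig hidden layer -> purelin output."""
    hidden = layer(x, hidden_w, hidden_b, tansig)
    return layer(hidden, out_w, out_b, purelin)[0]


# Placeholder network (assumption): 2 normalized inputs, 2 hidden neurons, 1 output.
hidden_w = [[0.5, -0.3], [0.1, 0.8]]
hidden_b = [0.0, -0.2]
out_w = [[1.2, -0.7]]
out_b = [0.1]
print(predict([0.25, 0.75], hidden_w, hidden_b, out_w, out_b))
```

Because the inputs are normalized, the output of `predict` is also on the normalized scale and would need to be mapped back to physical units.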
Training an ANN model is the process of modifying the connecting weights between neurons ($w_{ij}$) and the biases of neurons ($\theta_j$) of the ANN (Eqn (3)) according to the training data. 42,43 Different algorithms are available to perform this training process, and the back-propagation (BP) algorithm is one of the most widely used algorithms in MLP training. 41 Thus, it was selected to train the ANN models developed in this work. To ensure the generalization capacity of ANN models, early stopping and regularization are common strategies used along with the training algorithm. 44 In this study, the Levenberg-Marquardt method (LM), one common early-stopping strategy, was used along with the BP algorithm. Using the combined LM-BP algorithm, 155 of the 168 data samples were randomly partitioned into 'training', 'validation', and 'test' sets. 45 The remaining 13 data samples, from Chen et al., 12 were used in a second validation process discussed below.

Measure of model accuracy
In this work, the training set was used to generate ANNs with varying weights and biases. The validation set was used to estimate the error of the ANN models and check the stopping criteria. For example, training stops when the number of iterations (epochs) reaches a maximum, when the value of the performance indicator on the training set reaches the target, or when the number of consecutive iterations in which the error on the validation set increases reaches the predefined validation check time. The test set was used to evaluate the performance of the ANNs. In this study, the error of an ANN on a dataset was calculated by the mean square error (MSE, given by Eqn (6)), a commonly used cost function for the LM-BP algorithm. The performance of the ANNs was also measured with other indicators, namely the coefficient of determination ($R^2$, given by Eqn (7)) and the mean absolute percentage error (MAPE, given by Eqn (8)):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i,exp} - y_{i,sim}\right)^2 \quad (6)$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_{i,exp} - y_{i,sim}\right)^2}{\sum_{i=1}^{n}\left(y_{i,exp} - y_{avg}\right)^2} \quad (7)$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_{i,exp} - y_{i,sim}}{y_{i,exp}}\right| \quad (8)$$

where $y_{i,exp}$ is the value of the ith experimental data point, $y_{i,sim}$ is the value of the ith ANN-predicted data point, $y_{avg}$ is the average value of the experimental data, and n is the size of the dataset.
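The three performance indicators can be computed as in the sketch below. It assumes the standard definitions of MSE, $R^2$ (as one minus the ratio of residual to total sum of squares), and MAPE in terms of the experimental values, predicted values, and experimental mean defined above; the example numbers are illustrative and are not results from the study.

```python
def mse(y_exp, y_sim):
    """Mean square error between experimental and predicted values."""
    return sum((e - s) ** 2 for e, s in zip(y_exp, y_sim)) / len(y_exp)


def r_squared(y_exp, y_sim):
    """1 - (residual sum of squares) / (total sum of squares about the mean)."""
    y_avg = sum(y_exp) / len(y_exp)
    ss_res = sum((e - s) ** 2 for e, s in zip(y_exp, y_sim))
    ss_tot = sum((e - y_avg) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot


def mape(y_exp, y_sim):
    """Mean absolute percentage error, in percent (requires nonzero y_exp)."""
    return 100.0 / len(y_exp) * sum(abs((e - s) / e) for e, s in zip(y_exp, y_sim))


# Illustrative yields in percent (not data from the study).
y_exp = [10.0, 20.0, 30.0]
y_sim = [11.0, 19.0, 33.0]
print(mse(y_exp, y_sim), r_squared(y_exp, y_sim), mape(y_exp, y_sim))
```

MSE depends on the scale of the output, so it is best compared on normalized data, whereas $R^2$ and MAPE are dimensionless and can be compared between the yield and surface area models.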
In this work, 70%, 15%, and 15% of the 155 samples were randomly allocated to the training, validation, and test sets, respectively. Thirteen independent data samples that were not included in the previous three sets were used to test the model in a second validation and test for overfitting. 12,22 The ANN training was conducted using MATLAB R2017a Neural Network Toolbox. The stopping criteria were 1000 epochs as the maximum number of iterations, $10^{-4}$ as the target MSE, and 20 times as the predefined validation check time.
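The random 70/15/15 partition and the three stopping criteria can be sketched as follows. This is not the MATLAB toolbox implementation: the rounding of subset sizes, the seed, and the helper names are assumptions made for illustration.

```python
import random

MAX_EPOCHS = 1000          # maximum number of iterations
TARGET_MSE = 1e-4          # target performance on the training set
MAX_VALIDATION_FAILS = 20  # predefined validation check time


def partition(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly split sample indices into training, validation, and test sets.

    Sizes are truncated to integers and the remainder goes to the test set
    (an assumption; the toolbox may round differently).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(fractions[0] * n_samples)
    n_val = int(fractions[1] * n_samples)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


def update_validation_fails(val_mse, best_val_mse, fails):
    """Reset the counter when validation error improves, else increment it."""
    if val_mse < best_val_mse:
        return val_mse, 0
    return best_val_mse, fails + 1


def should_stop(epoch, train_mse, fails):
    """Training stops as soon as any one of the three criteria is met."""
    return (epoch >= MAX_EPOCHS
            or train_mse <= TARGET_MSE
            or fails >= MAX_VALIDATION_FAILS)


train_idx, val_idx, test_idx = partition(155)
print(len(train_idx), len(val_idx), len(test_idx))
```

The 13 samples reserved for the second validation would be held out before calling `partition`, so they never influence the weights or the stopping decision.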

Contribution analysis
The Garson equation (Eqn (9)) was used in this work to understand the relative importance of each input parameter: 46

$$I_i = \frac{\sum_{j=1}^{n}\left(\dfrac{|w_{ij}|}{\sum_{p=1}^{m}|w_{pj}|}\,|w_{jk}|\right)}{\sum_{l=1}^{m}\sum_{j=1}^{n}\left(\dfrac{|w_{lj}|}{\sum_{p=1}^{m}|w_{pj}|}\,|w_{jk}|\right)} \quad (9)$$

where $I_i$ represents the relative importance of the ith input parameter of the ANN model, n represents the number of neurons in the hidden layer, m represents the number of input variables, $w_{ij}$ represents the weight between the ith neuron in the input layer and the jth neuron in the hidden layer, and $w_{jk}$ represents the weight between the jth neuron in the hidden layer and the kth neuron in the output layer. The indices l and p run over the input variables.
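A contribution analysis of this kind can be sketched with the commonly used form of Garson's algorithm for a network with one hidden layer and one output. The sketch assumes that form; the weight layout and the example weights are placeholders, not the trained values from the study.

```python
def garson_importance(w_input_hidden, w_hidden_output):
    """Relative importance of each input by Garson's algorithm.

    w_input_hidden[i][j]: weight from input i to hidden neuron j.
    w_hidden_output[j]:   weight from hidden neuron j to the single output.
    Assumes every hidden neuron has at least one nonzero input weight.
    """
    m = len(w_input_hidden)       # number of input variables
    n = len(w_hidden_output)      # number of hidden neurons
    contributions = [0.0] * m
    for j in range(n):
        # Share of hidden neuron j's incoming weight magnitude held by each input.
        column_sum = sum(abs(w_input_hidden[i][j]) for i in range(m))
        for i in range(m):
            share = abs(w_input_hidden[i][j]) / column_sum
            contributions[i] += share * abs(w_hidden_output[j])
    total = sum(contributions)
    return [c / total for c in contributions]


# Placeholder weights (assumption): 2 inputs, 2 hidden neurons.
w_ih = [[1.0, 2.0],
        [3.0, 2.0]]
w_ho = [1.0, 1.0]
importance = garson_importance(w_ih, w_ho)
print(importance)
```

The returned values sum to 1, so they can be read directly as fractions or grouped (for example, feedstock properties versus activation conditions) by adding the entries of the corresponding inputs. Because only absolute weights are used, the result indicates the magnitude of an input's influence but not its direction.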

Results and discussion
ANNs and validation
The low MAPE value suggests the high robustness of the ANN models.
As discussed in the previous section, three scenarios were developed to investigate the impacts of using different data as inputs for the biomass composition. Scenario 1 is based only on proximate analysis data, scenario 2 is based only on ultimate analysis data, and scenario 3 uses both sets of data. Based on the performance indicators shown in Table 1, there were no significant differences in using different biomass characterization data for ANN training to achieve high accuracy. However, in the extra validation step using only the data from Chen et al., 12 the ANN models trained in the three scenarios demonstrate different capabilities in predicting $Y_T$ and $S_{BET}$ for AC derived from a biomass feedstock that was not in the original training datasets. This capability is critical if this ANN approach is to serve as an effective tool to enhance biomass R&D decision making, such as screening biomass feedstock for high $Y_T$ and $S_{BET}$, or preselecting operational conditions for experiments.
In the work by Chen et al., the carbonization temperature and carbonization time were fixed at 600 °C and 10 min. The activation conditions were varied between 750 and 950 °C for activation temperature, 40-120 min for activation time, and 0.5-2.5 kg kg⁻¹ for steam-to-biochar ratio. Activation temperature, activation time, and steam-to-biochar ratio were varied based on three experimental scenarios in their study, including varied activation temperature with fixed activation time (60 min) and steam-to-biochar ratio (2 kg kg⁻¹); varied activation time with fixed activation temperature (1123 K) and steam-to-biochar ratio (2 kg kg⁻¹); and varied steam-to-biochar ratio with fixed activation temperature (1123 K) and activation time (80 min). Figure 2 shows validation results for the Chen et al. data, predicted from the ANN models developed for scenario 1, i.e., proximate analysis data only. Figure 2 shows good alignment (error < 30%) between the ANN-predicted response and the experimental data for $Y_T$, and a more modest correlation between the predicted and measured $S_{BET}$. Overall, these results demonstrate the potential for the ANN models to predict the $Y_T$ and $S_{BET}$ of AC over a wide variety of processing conditions.
Figure 3 shows the results for using the ANN models developed from ultimate analysis data (scenario 2) to predict the $Y_T$ and $S_{BET}$. Similar to Fig. 2, there was a good alignment between simulated and experimental results. However, the nonlinear relationship between steam-to-biochar ratio and $S_{BET}$ was not captured in this simulation, where $S_{BET}$ decreased then reached a plateau as the steam-to-biochar ratio increased. One possible explanation for the difference could be that the impacts of steam on $S_{BET}$ depend on the diffusion rate and porosity of materials, 47 which are significantly affected by physical properties of the biomass feedstock such as fixed carbon and volatile matter. 48 Such properties are commonly measured by proximate analysis and are not included in the training datasets in scenario 2.
Figure 4 shows the extra validation results for ANN models developed using both ultimate and proximate analysis data as inputs for the ANN model in scenario 3. With this combined set of input data, there is a strong correlation between predicted and experimental $Y_T$ across a [...] 49 demonstrated that the dataset size of an ANN model with ten or more input variables should be 20 times larger than one with eight input variables. For scenario 3 in this study, the dataset needs to be significantly expanded to improve the ANN model, which is not currently feasible because of limited data availability but could be achieved in the future when more data are available.
Another strategy is to reduce the complexity of the ANN, in other words, to reduce the number of input variables, 50 which is reflected through the three scenarios in this study. Scenarios 1 and 2 have fewer input variables (eight) and are therefore less complex than scenario 3 (11 input variables). In all three scenarios (Figs 2-4) the ANN models predict $Y_T$ better than $S_{BET}$. One explanation is the differences in the anatomical and macromolecular morphology of the different biomass samples included in the original training set. There are significant anatomical differences between dense biomass materials such as nut shells, and relatively porous wood biomass with its open structure of fiber lumens and vessels. Even within a single class of material such as wood there are large differences in density, the chemical structure of lignin and hemicelluloses, the diameter of fiber lumens, and interconnections of ray cells. There will also be significant differences between biomass sources related to the detailed macromolecular-level organization of the three dominant biomass structural components: cellulose, hemicellulose, and lignin. All these factors will lead to different chemical reactions, and differences in the formation of microcracks, which can allow gases and vapors to escape the residual solid as the AC forms. Finally, while the ash content is included as an input for the proximate analysis, there are well-known catalytic effects of alkaline earth elements on the decomposition of cellulose and hemicelluloses. 51,52 The total ash content is thus not the best indicator for the effects of minerals on biomass thermal decomposition reactions.
The weights (w) and biases (θ) generated by the best-performing ANN models are summarized in Tables 2 and 3. These are the scenario 2 model for $Y_T$ prediction ($R^2$ = 0.962 for the testing set) and the scenario 1 model for $S_{BET}$ prediction ($R^2$ = 0.940 for the testing set), selected based on the highest $R^2$ and alignment with the independent experimental data in the extra validation step. The ANN models presented in this work can be fully reproduced by using the data shown in Tables 2 and 3, and the predefined ANN structure as described in the artificial neural network section above.
These models can be used for biomass screening, and process simulation and optimization for AC production. For example, researchers developing process simulation models of AC production in Aspen Plus, where $Y_T$ is one of the critical inputs, 26 can use these models to predict $Y_T$. This prediction can provide significant savings of time and money by reducing the number of experimental runs needed with multiple sources of biomass.

Relative importance of input parameters
The relative importance of each input parameter for $Y_T$ and $S_{BET}$ is quantified with the Garson equation (Eqn (9)) and shown in Fig. 5. Because all three scenarios show good alignment and reasonable accuracy in predicting $Y_T$, the results of all three scenarios for $Y_T$ are included. As the ANN models in scenario 2 and scenario 3 cannot reflect the nonlinear relationship between activation operations and $S_{BET}$, Fig. 5 only includes the result for the $S_{BET}$ prediction in scenario 1.
Regarding the prediction of $Y_T$, the results in all scenarios reveal the relative importance of input parameters in descending order: feedstock properties, activation conditions, and carbonization conditions. For $S_{BET}$, the activation conditions have the largest impact, followed by feedstock properties (proximate analysis) and carbonization conditions.
The relative importance of variables in this study can be used to improve the design and operation of the AC production process. More parameters (e.g., particle size distribution, anatomical features or detailed ash composition of biomass, and the heating rate of carbonization or activation) could be included in future work to predict other performance indicators (e.g., energy consumption or environmental footprints). Data availability and quality will be the main challenges. For example, during data collection and cleaning for this work, the authors found a number of papers that did not report steam use and final $Y_T$. The authors also observed different terminology and calculation methods for the same parameter in different papers. To expand the datasets of this work and advance ANN models for AC property prediction or other applications, developing and encouraging standardization of terminology, data reporting, and documentation across academic communities and related research areas will be key.

Conclusions
The ANN models show high accuracy and alignment with the experimental data for $Y_T$. For $S_{BET}$, the ANN models using proximate analysis data for biomass composition show the best alignment with the experimental data. The contribution analysis shows the large impacts of activation conditions and feedstock properties, indicating the importance of optimizing raw materials and activation conditions. The ANN models can be a powerful tool for feedstock selection and process design, as well as to generate $Y_T$ and $S_{BET}$ data that are needed for process simulation and assessment.