Development of Artificial Neural Network System to Recommend Process Conditions of Injection Molding for Various Geometries

This study combines an artificial neural network (ANN) and a random search to develop a system that recommends process conditions for injection molding. Both simulation and experimental results are collected using a mixed sampling method that combines Taguchi and random sampling. The dataset consists of 3600 simulations and 476 experiments from 36 different molds. Each datum has five process features and 15 geometry features as input and one weight feature as output. Hyper-parameter tuning is conducted to find the optimal ANN model. Then, transfer learning is introduced, which allows experimental and simulation data to be used together to reduce the error. The final prediction model has a root mean-square error of 0.846. To develop a recommender system, a random search is conducted using the trained ANN forward model. As a result, the weight-prediction model based on simulated data has a relative error (RE) of 0.73%, and the weight prediction using the transfer model has an RE of 0.662%. A user interface system is also developed, which can be used directly with the injection-molding machine. This method enables the setting of process conditions that yield parts having weights close to the target, by considering only the geometry and target weight.

Direct discrete numerical optimization does not require explicit objective functions. However, it requires numerous simulations and cannot be used to develop an integrated process optimization system that covers various geometries and materials. Metamodel-based optimization, which is more popular than direct optimization, uses approximate objective functions that give results with acceptable accuracy. The simplest metamodel-based optimization predicts the quality of products using a simple regression model. [7] Moreover, response-surface methodologies, artificial neural networks (ANNs), radial basis functions, and Kriging methods are common metamodels used for injection-molding optimization. [8,9] Among the various metamodels, ANN-based optimization has been widely applied, because ANNs can adopt major technical advancements, such as new activation functions, improved initializers, dropout regularization, and emerging optimization methods. [10] Mok et al. applied a single-hidden-layer ANN to set the initial process conditions under fixed geometry and material conditions. They demonstrated that ANN-based optimization could surpass the trial-and-error approach in both time and accuracy. Various additional studies have been conducted to improve the performance of ANN-based optimization. Early studies developed ANN models based on a limited number of simulation data. For example, the Taguchi method was applied to obtain 27 data samples for ANN training. [11-13] Because of the limited number of data, these studies selected the test samples from the same 27 cases, which cannot ensure accuracy on arbitrary data. Shen et al. collected 252 samples, which dramatically reduced errors. [14] Data sampling methods and sample sizes have been studied using various design-of-experiment (DoE) methods to find the optimal sampling method with limited data.
Full factorial sampling, the Taguchi method, and their combinations have been applied to control process conditions using trained ANN models. [11,15] Input and output features were chosen according to each study's objective. Mold and melt temperature, injection time, packing pressure, and packing time are usually applied as process conditions. Dynamic input data, such as pressure and velocity profile data (instead of a single point), are normally preprocessed using self-organizing maps (SOM). [16] For quality characteristics, most studies measure warpage and weight. The maximum von Mises stress has also been used as an output feature to evaluate the product's impact resistance based on simulation data. [17] However, when experimental data are used, quality characteristic features are restricted to easily measurable features, such as temperature and weight (instead of warpage and maximum stress).
Metamodel-based optimization requires both an ANN prediction model (from process condition to quality feature) and an optimization method for the reverse model (from quality feature to process condition). Genetic algorithms and particle-swarm optimization have been used [18,19] to reduce computational cost and to find reverse solutions (output feature to input features). However, these studies used either simulation or experimental data alone. Others used simulation data to develop a prediction model and tested it using experimental data. [20] Verification results for injected lens curvature showed 4.28-16.2% relative error (RE) between prediction and experimental results, which is too high for real applications. Furthermore, previous models could not consider the characteristics of geometrically distinct components, because they dealt with only a single mold. This limitation could be overcome by using data from a variety of molds for training. Hopmann et al. developed a prediction model that could be applied to two different molds. However, each ANN model was used for only one mold; an integrated ANN model was not achieved. [21] The major contributions of this article are the introduction of geometrically varied molds and the application of transfer learning. An ANN model based on various molds is introduced here for the first time as a general prediction model. Most previous studies changed only the process condition under one fixed-geometry condition. Therefore, to apply a process-control system that uses an ANN, an engineer must conduct simulations or experiments to gather the training data for a surrogate model. In contrast, we gather both simulation and experimental data from 36 different molds (3600 simulations and 476 experiments). After that, the geometry information is quantified and used as an input feature. We also introduce transfer learning, [22] which can combine both simulation and experimental data, to improve accuracy.
Transfer learning can minimize the gap between the simulation and real data, and thus it can improve performance with limited experimental data. Furthermore, we take a systematic approach to determine the data sampling and DoE methods, the hyper-parameters, and the optimization to maximize the ANN's performance. As a final process, the system is integrated with a real injection-molding machine through a user interface system. This research develops a system that recommends appropriate process conditions by considering only geometry and the final product information. With this system, non-expert engineers can set up the machine in a short time. [23] The remainder of this article is organized as follows. Section 2 describes the method of data acquisition for simulation, experiment, and geometry. Section 3 specifies the ANN algorithm for injection molding. Section 4 develops the process-condition recommender system. Section 5 presents discussion and conclusions. The overall procedure for developing the recommender system is shown in Figure 1.

Data Acquisition
A data-driven ANN model requires a training dataset of input and output features. Thus, we must first determine the input features, the output features, and the total sample number. Following data acquisition and training, the ANN model can predict its output using arbitrary input feature combinations. The output of the ANN prediction model is called the predicted value, and the result of a simulation or experiment (not the metamodel) is called the actual or true value. An example of a complete dataset is shown in Table 1.

Sampling Method
Figure 1. Overall procedure of the process recommender system development. It comprises three main parts: data acquisition, ANN, and process recommender system. Several sampling methods were considered, but mixed sampling gave the best result. With this sampling method, data from simulation (3600 cases from 36 molds) and experiments (476 cases from 11 molds) were collected. Each datum contained 15 geometric features, which were quantified by CAD viewer software. Then, the data were preprocessed for ANN learning. To increase the predictive ability of the model, feature extraction and hyper-parameter tuning were applied. Furthermore, transfer learning was introduced to combine data from simulation and experiment to overcome the limitations of experimental data (i.e., high noise, small amount). The well-trained ANN model was used to develop a recommender system by applying a random search method. Finally, simulations and experiments were repeated to verify the recommender system.

Prior to gathering data, the process and quality features were determined. The input and output features had to satisfy several requirements: input features need to have a high influence on the product quality; they should be easy to quantify for ANN training; and they should be easily measured in both experiments and simulations. Based on these requirements and previous works, we selected melt temperature, mold temperature, filling time, packing time, and packing pressure as variables. In previous works, cooling time was used for high throughput. We assumed that the cooling time is set to be long enough to not require optimization, because we focus on product quality. As an output feature, only the product's weight is used. Many previous studies used warpage. However, in experiments, it has not been possible to measure displacement quickly and precisely. Thus, only weight was used.
To support this choice, we show that weight can represent the overall product quality by plotting weight against warpage, which gives a linear coefficient of determination of R² = 0.7023 (Figure S2, Supporting Information). Thus, weight can indicate the product quality at an acceptable level. Data were collected for use in developing the ANN prediction algorithm. In general, the amount of required data increases if the number of input and output parameters is large and the relationship between input and output is complex and non-linear. [14] However, in most real problems, the relationship between input and output cannot be expressed as a simple equation. Thus, the complexity of the given data must be determined empirically. [24] To find an optimal sampling method and the right number of experiments, we compared various DoEs. Geometric and material parameters were fixed, and only the process parameters were changed (Figure 2a). All data were generated using simulation, not by experiment. We increased the number of data from 48 cases to 108 and 243. For each set, we compared full factorial sampling, [25] random sampling, Taguchi's method, [6,12] and a mixed sampling that combined Taguchi with random sampling (Table S1, Supporting Information).
The same test set is required to compare the sampling methods fairly. To obtain an effective test result, we randomly selected each variable from its specified range at its step size (Figure 2a) to generate 50 randomly sampled test sets. For example, for filling time, the range was 0.5-1.5 s, and the step size was 0.1 s. Therefore, we selected one of 0.5, 0.6, …, 1.4, 1.5 s.
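The stepped random sampling described above can be sketched as follows. This is a minimal illustration, not the authors' code: only the filling-time range and step come from the text, the melt-temperature range is taken from the preprocessing section, and its step size, the dictionary keys, and the function names are assumptions made for the example.

```python
import random


def stepped_levels(lo, hi, step):
    """Return the candidate levels lo, lo + step, ..., hi (rounded to avoid float drift)."""
    n = int(round((hi - lo) / step))
    return [round(lo + i * step, 6) for i in range(n + 1)]


# Filling time (0.5-1.5 s, step 0.1 s) is stated in the text.
# The melt-temperature step of 5 is an assumption for illustration only.
RANGES = {
    "filling_time_s": (0.5, 1.5, 0.1),
    "melt_temperature_C": (180.0, 200.0, 5.0),
}


def sample_conditions(ranges, n_samples, seed=0):
    """Draw n_samples conditions, picking each variable uniformly from its levels."""
    rng = random.Random(seed)
    levels = {name: stepped_levels(*bounds) for name, bounds in ranges.items()}
    return [
        {name: rng.choice(values) for name, values in levels.items()}
        for _ in range(n_samples)
    ]


test_set = sample_conditions(RANGES, 50)
```

Fixing the seed keeps the test set identical across the sampling methods being compared.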
For the case of 48 conditions, the full factorial sampling used 3 × 2 × 2 × 2 × 2 (filling time × melt temperature × mold temperature × packing pressure × packing time, respectively) levels. Random sampling was performed the same way as for the test set. For Taguchi's method, 27 of 48 cases were collected among the combinations of three levels of the five features. For mixed sampling, 27 out of 48 cases were extracted using Taguchi's method, and the remaining 21 cases were extracted via random sampling.
The 108- and 243-sample cases were treated similarly to the 48-sample case. The 108-sample case with full factorial sampling used 3 × 3 × 3 × 2 × 2 levels (filling time × melt temperature × mold temperature × packing pressure × packing time).

The methods yielded different results (Figure 2c). The error of each model was quantified using the root mean-square error (RMSE),

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - t_i\right)^2} \quad (1)$$

where $N$ is the number of samples, $y_i$ is the predicted value of the $i$th data sample, and $t_i$ is its true value. Full-factorial sampling always showed poor results. Mixed sampling obtained the best results when the number of samples was 48 or 243, and gave results similar to those of random sampling when the number of samples was 108. Therefore, in this article, we used the mixed sampling method to maximize the accuracy of machine-learning algorithms using limited data. A training data example is given in Table 1.
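As a small worked illustration of the RMSE used to compare the sampling methods, the sketch below computes it for plain Python lists. The function name and the input checks are choices made for this example.

```python
import math


def rmse(predicted, actual):
    """Root mean-square error between predicted values y_i and true values t_i."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((y - t) ** 2 for y, t in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))
```

For instance, predictions of 0 and 0 against true values of 3 and 4 give the square root of (9 + 16) / 2 = 12.5.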

Simulation
Machine-learning algorithms require diverse datasets that represent different patterns. Various CAE approaches (e.g., the finite volume method [FVM], boundary element method [BEM], and finite-element method [FEM]) can be used for this purpose. FVM is suitable only for simple geometries, owing to its hexahedral mesh type. BEM has limited capability in handling inhomogeneous and nonlinear problems. Polymer behavior from solid to liquid is highly non-linear, making it difficult to solve using BEM. Therefore, FEM is the most appropriate method for describing the injection-molding process with high accuracy, despite being time-consuming. [5,26] Most commercial numerical injection-molding simulations are built using FEM, including Moldflow, Moldex, and Autodesk.
In this research, we simulated 36 molds of different shapes. Each injection-molding simulation required 2000-3000 s, depending on the size and complexity of the mold. We conducted 100 cases for each mold using the mixed sampling method. The total number of cases generated from the 36 molds was 3600. Table S1, Supporting Information, shows an example of one of the molds. The simulations were conducted using Maps 3D (VMtech, Korea) CAE software. [27] The material was polypropylene, Hopelen J-150 (Lotte Chemical, Korea) for all cases. We also verified the simulation model by comparing simulation results for an arbitrary mold with experimental data, as shown in Figure S1, Supporting Information.

Experiment
Simulation requires long setup and execution times and has limited accuracy. Although simulation and experimental results have similar patterns, experimental values are generally more accurate and useful than simulated ones. [28] Therefore, a process recommender system built from CAE data alone is not appropriate for application in real industry. Experimental data must instead be obtained to guide the control of an injection-molding machine. Unfortunately, experimental data are much more difficult to collect than CAE data. When process conditions are changed, data are accurate only after the system reaches a stable state. For example, when the mold temperature is reduced from 50 to 45 °C, stabilization requires many cycles. Furthermore, to reduce experimental error, the average weight over ten repeated experiments per condition is used. These constraints limit the amount of data that can be collected, compared with the simulation case. We experimented with 50 conditions for each mold and used mixed sampling: 27 cases obtained using Taguchi's method, and 23 cases obtained by random sampling (Table S2, Supporting Information). Using 11 different molds yielded 650 conditions. However, we used only 476, because the others caused faults (e.g., burning, short shots, and sink marks). The same material that was used in the simulation was used in the experiments. The measured weight was the total of the runner and the product.

Geometry
Geometry information was quantified and included among the ANN input variables. If the 3D or mesh information were used directly as geometry information, the dimension of the data would become too large, and learning could not be performed normally. Therefore, only variables that can be automatically quantified from a stereolithography (STL) file were used to build the injection-molding ANN models. [29] The 15 shape features (Figure 3a) were extracted using jointly developed computer-aided design (CAD) viewer software (VMtech, Korea). Table S3, Supporting Information, shows the list of 36 molds and the number of data for each.
Using the CAD viewer, a user can extract quantified geometry information from the product STL file without simulation. Most geometry features (e.g., volume, surface area, and projection area) can be easily obtained from the STL file without special methods. Some features, however, require analysis using appropriate algorithms. For example, we used the shrink-sphere method to obtain thickness information for the geometry of the injection-molding product. [28] In this method, to calculate the thickness at a point, several spheres are created surrounding that point. Then, each sphere's volume is increased until all spheres touch. The thickness at the point is then given by the smallest diameter that fits within the surrounding spheres. [30] Another feature is flow length, which is an index that evaluates the distance from the gate to the end of filling, using only the shape information without simulation. To compute it, the shortest path between elements is found using the Dijkstra algorithm. [31] The final output from the CAD viewer is JSON data divided at the first level into entire information, cavity information, and gate information. Detailed values are included at the next level. To utilize this information, we developed a Python module that parses the JSON geometry data from the CAD viewer.
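A minimal sketch of a Dijkstra-based flow-length estimate is given below. It is not the CAD viewer's implementation: the graph representation (mesh elements as nodes, centre-to-centre distances as edge lengths), the function names, and the choice to take the largest shortest-path distance from the gate as the flow length are all assumptions made for illustration.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's algorithm: shortest path length from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, length in graph.get(node, []):
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def flow_length(graph, gate):
    """Assumed definition: distance from the gate to the farthest reachable element."""
    return max(shortest_distances(graph, gate).values())


# Hypothetical four-element mesh; edge values are distances between neighbors.
mesh = {
    "gate": [("a", 2.0), ("b", 5.0)],
    "a": [("gate", 2.0), ("b", 1.0), ("c", 4.0)],
    "b": [("gate", 5.0), ("a", 1.0), ("c", 1.5)],
    "c": [("a", 4.0), ("b", 1.5)],
}
```

In this toy mesh the farthest element is `c`, reached through `a` and `b` at a distance of 2.0 + 1.0 + 1.5 = 4.5.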

Artificial Neural Network
An ANN algorithm was used to predict the final product properties. The smallest unit of an ANN is a neuron. Each neuron multiplies its inputs by weights and uses a transfer function to generate an output $net_j$ as

$$net_j = \sum_{i=1}^{n} w_{ij} x_i + b \quad (2)$$

where $w_{ij}$ is the weight value between input $i$ and neuron $j$, $x_i$ is the $i$th input value, $n$ is the number of inputs, and $b$ is the bias value. Then, $net_j$ passes to a non-linear activation function. Various activation functions can be used, depending on the model structure and data. The neuron is one node of the overall ANN structure, which consists of three parts: the input, output, and hidden layers.
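The computation performed by a single neuron can be sketched in a few lines. The example below uses the exponential linear unit (ELU) as the activation, which is only one possible choice; the function names and the sample numbers are illustrative assumptions.

```python
import math


def elu(x, alpha=1.0):
    """Exponential linear unit: identity for positive x, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def neuron_output(inputs, weights, bias, activation=elu):
    """Weighted sum of the inputs plus bias (net_j), passed through the activation."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)
```

With inputs (1.0, 2.0), weights (0.5, -0.25), and a bias of 0.1, the weighted sum is 0.5 - 0.5 + 0.1 = 0.1, which ELU passes through unchanged because it is positive.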
The ANN training procedure consists of forward and backward propagation. Forward propagation calculates the final value using the current weight values from the input layer to the output layer. Thus, appropriate updating of weight and bias values is the key point of ANN training. Backward propagation updates those weight and bias values to minimize the error between the forward propagation result and the data output. [32] The gap between the target value and the forward result is evaluated by calculating the RMSE. [33] Our weight-prediction model was generated in three steps. First, the data were preprocessed for machine learning. This process includes scaling and data separation. Next, the appropriate ANN structure and parameters for the data were selected using hyper-parameter tuning. We compared both grid and random searches for hyper-parameter optimization. Finally, transfer learning was used to combine simulation and experimental data.

Data Preprocessing
Data preprocessing consists of two main parts: unification of the distribution of the input data, and separation of the data into training, validation, and test sets. If the scales of the input values are different, the convergence speed slows, and the accuracy decreases. [34] The process conditions also have very different scales. For example, melt temperature varies from 180 to 200 °C, whereas filling time varies from 0.5 to 1.5 s. Feature scaling is performed to eliminate this scale difference, so that the learning algorithm can use the information about patterns in the data. [35] In this study, min-max normalization was used:

$$P = \frac{D - D_{\min}}{D_{\max} - D_{\min}}\left(P_{\max} - P_{\min}\right) + P_{\min} \quad (3)$$

where $P$ is a normalized datum, $D$ is the input datum, $D_{\min}$ and $D_{\max}$ are, respectively, the minimum and maximum values of the original data, and $P_{\max}$ and $P_{\min}$ are, respectively, the upper and lower boundaries after normalization. After normalization, all variables range from 0.1 to 0.9. After re-scaling, the data were divided into a training set, a validation set, and a test set. The training set was used to train the ANN model, and the validation set was used for hyper-parameter tuning. The test set was used to evaluate the accuracy of the tuned model. In general, the proportion of the validation and test sets is 10-15%, depending on the number of data. The validation and test set ratios decrease with the total amount of data. When the number of data is very small, "leave one out" cross-validation is used. [33] In this study, data separation was performed differently for the simulation and experimental cases, because the numbers of simulation and experimental data were different. The simulation considered 36 molds (3600 samples). Of these, 28 molds (2800 samples) were used as the training set, six molds (600 samples) as the validation set, and two molds (200 samples) as the test set. The experiment evaluated 50 samples from each of 11 molds. Some samples were not available. Thus, we used data that had been collected previously. We used ten molds (327 samples) as the training set, two molds (99 samples) as the validation set, and one mold (50 samples) as the test set.
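The min-max normalization to the 0.1-0.9 range can be sketched as below. The function name and the guard against constant features are choices made for this example; the default bounds follow the range stated in the text.

```python
def min_max_normalize(values, p_min=0.1, p_max=0.9):
    """Rescale values linearly so that min(values) -> p_min and max(values) -> p_max."""
    d_min, d_max = min(values), max(values)
    if d_max == d_min:
        raise ValueError("a constant feature cannot be min-max normalized")
    scale = (p_max - p_min) / (d_max - d_min)
    return [(d - d_min) * scale + p_min for d in values]
```

For melt temperatures of 180, 190, and 200 °C, this yields approximately 0.1, 0.5, and 0.9. In practice, the minimum and maximum would usually be taken from the training set and reused for the validation and test sets, which is a common convention rather than something stated in the text.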

Sensitivity Analysis
A sensitivity analysis was conducted using Equation (4) [36] to determine the main features for the prediction model.

$$\text{Sensitivity} \equiv \frac{\text{Output change (\%)}}{\text{Input change (\%)}} \quad (4)$$

The relationship between the input and output of injection molding is not simply linear. Thus, we performed the sensitivity analysis in terms of relative (percentage) changes. We built an ANN model with simulation data. The volume- and surface-area-related features have a significant effect on the final product weight.
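Equation (4) can be evaluated numerically by perturbing one input at a time, as in the sketch below. The toy model, the feature values, and the 1% perturbation size are assumptions made for illustration; in the article the model would be the trained ANN.

```python
def sensitivity(model, inputs, index, change_pct=1.0):
    """Percentage change of the output divided by the percentage change of one input."""
    base = model(inputs)
    perturbed = list(inputs)
    perturbed[index] *= 1.0 + change_pct / 100.0
    output_change_pct = (model(perturbed) - base) / base * 100.0
    return output_change_pct / change_pct


def toy_weight_model(features):
    """Hypothetical stand-in: weight proportional to volume, independent of thickness."""
    volume, thickness = features
    return 0.9 * volume


volume_sensitivity = sensitivity(toy_weight_model, [10.0, 2.0], index=0)
thickness_sensitivity = sensitivity(toy_weight_model, [10.0, 2.0], index=1)
```

In this toy case the volume sensitivity is about 1 (a 1% increase in volume raises the weight by 1%), and the thickness sensitivity is 0.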
To select the final geometry features for the ANN model, we built several models with different sets of geometry features (i.e., different input dimensions) (Figure 3b). For each number of geometry features, 300 models with different hyper-parameters were generated, and the appropriate number of geometry features was evaluated using the average RMSE of the top 10 models and of all 300 models. The averages of the top-10 RMSEs were 0.373, 0.530, and 1.698 for 4, 9, and 15 geometry features, respectively. The averages of all 300 RMSEs were 2.749, 4.542, and 6.215 for 4, 9, and 15 geometry features, respectively. The tendency was similar to the top-10 result (Figure 3c). Therefore, we used four geometry features for the final ANN model. In the general case, more features help increase accuracy. However, in this article, the case with the fewest geometry features had the best RMSE, because the only output feature was the weight of the product. Volume and surface area information have significant effects on the final product's weight compared with other features, such as flow length, projection area, and thickness. Moreover, owing to the lack of geometrical diversity (only 36 different molds), using more geometry features was not effective.

Hyper-Parameter Tuning
To build an ANN model, many hyper-parameters should be considered (e.g., activation function, optimizer, initializer, learning rate, layer, node, epoch, dropout, and batch size). These hyper-parameters have significant effects on prediction accuracy. In this section, we compare and analyze various types of hyper-parameters. For the activation function, optimizer, and initializer, we compared widely used methods. [37] Then, we conducted a random search to determine the remaining hyper-parameters (i.e., layer, node, epoch, dropout, learning rate, and batch size).
Determining the activation function, optimizer, and initializer is a very complex problem. The appropriate choice depends on the data. Some papers have shown that the appropriate initializer depends on the type of activation function. [38] Therefore, in this article, we determined these three hyper-parameters (activation function, optimizer, and initializer) empirically by testing their combinations. We then used RMSEs to compare the error. For comparison, we applied the 3600 cases of simulation data and fixed the other hyper-parameters. The comparison test used one hidden layer, 30 nodes, a learning rate of 0.01, 3000 epochs, dropout turned off, and a batch size of 1500.
The choices of activation function, optimizer, and initializer affected the accuracy of the result. The average RMSEs, according to the activation function, were 7.691, 6.317, 5.655, and 4.893 for sigmoid, [39] tanh, the rectified linear unit (ReLU), [40] and the exponential linear unit (ELU), [41] respectively. This means that ELU was the most appropriate activation function for injection-molding data. The initializer had a minor effect on the RMSE of the prediction model. For the optimizer, the accuracy was highest using Adam, [42] followed by RMSProp, [43] gradient descent, [44] and Adagrad [45] in descending order. The ELU activation function, the Xavier initializer, [46] and the Adam optimizer gave the best accuracy. Therefore, we used them for each model (Table 2). Random search is an optimization method that can explore a wider region of the search space for a given number of trials. In this research, we tried 324 random searches. The minimum, maximum, and step size were specified for each hyper-parameter (see Table 3). For example, the number of layers ranged from 2 to 5 with a step size of 1. Thus, we selected 2, 3, 4, or 5 as the number of layers. The RMSEs of the models obtained using grid search were 0.4, 0.416, and 3.112 for the top model, the top-10 average, and the average of all 324 models, respectively. The RMSEs of the models obtained via random search were 0.314, 0.367, and 2.749 in the same order. These comparisons indicate that random search was advantageous in finding the global best for the same number of trials. The best model had two layers, 28 nodes, 5500 epochs, a 0.05 learning rate, a 0.9 dropout rate, and a 1152 batch size (Table 3).
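A random search over stepped hyper-parameter ranges can be sketched as follows. Only the layer range (2 to 5, step 1) and the number of trials (324) come from the text; the other ranges, the function names, and the dummy scoring function are placeholders. A real run would train an ANN for each configuration and return its validation RMSE.

```python
import random

# (minimum, maximum, step) per hyper-parameter. Only "layers" is taken from the text;
# the remaining ranges are assumed placeholders.
SEARCH_SPACE = {
    "layers": (2, 5, 1),
    "nodes": (4, 40, 4),
    "epochs": (500, 6000, 500),
    "batch_size": (128, 1536, 128),
}


def levels(lo, hi, step):
    """Integer candidate values lo, lo + step, ..., up to and including hi."""
    return list(range(lo, hi + 1, step))


def random_search(space, evaluate, n_trials, seed=0):
    """Sample n_trials random configurations and keep the one with the lowest score."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = {name: rng.choice(levels(*bounds)) for name, bounds in space.items()}
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def dummy_validation_rmse(config):
    """Hypothetical stand-in for training a model and measuring its validation RMSE."""
    return abs(config["layers"] - 2) + abs(config["nodes"] - 28) / 10.0


best_config, best_score = random_search(SEARCH_SPACE, dummy_validation_rmse, n_trials=324)
```

Continuous hyper-parameters such as learning rate and dropout rate can be handled the same way with floating-point levels.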

Transfer Learning
Thus far, all processes were performed using only simulation data. However, as noted, the results of the simulations and the experiments differ (Figure 4a). [22,41,47] Furthermore, practical applications require a model that is derived from experimental data. The amount of experimental data is limited. Thus, the ANN model did not give good results (RMSE 7.178, average RE 13.604%) when it was trained on experimental data with the same hyper-parameters as for the simulation data. These inferior results occurred because the experimental data had more noise than the simulation data, so prediction was inaccurate. To solve this problem, we implemented transfer learning. In deep learning, a model can be trained well when sufficient labeled data are available. However, training results in overfitting or underfitting when the number of data is insufficient. Injection-molding data have similar problems. Simulation data are cheaper and easier to gather than experimental data. Transfer learning can overcome the large error that occurs when only experimental data are used.
Transfer learning consists of two steps. First, a model is trained using a large number of cheap data. Then, only a few layers of the model are trained with the small amount of expensive target data (Figure 4b). In this article, we use the simulation data as source data and experimental data as target data (Figure 4c). This approach achieved better accuracy (RMSE 0.846, average RE 0.715%) than cases that used experimental data only (Figure 4d).
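The two-step idea can be illustrated with a very small network written in plain Python. This is a sketch under stated assumptions, not the article's model: the data are synthetic stand-ins (a linear "simulation" trend and a shifted "experiment" trend), the network has one input and three hidden tanh units, and the output layer is fine-tuned starting from the pretrained weights rather than being re-initialized. All names and settings are hypothetical.

```python
import math
import random


def hidden(x, w1, b1):
    """Hidden-layer activations (tanh) for a scalar input x."""
    return [math.tanh(w * x + b) for w, b in zip(w1, b1)]


def predict(x, params):
    w1, b1, w2, b2 = params
    return sum(w * h for w, h in zip(w2, hidden(x, w1, b1))) + b2


def mse(data, params):
    return sum((predict(x, params) - t) ** 2 for x, t in data) / len(data)


def train(data, params, lr, epochs, freeze_hidden=False):
    """Full-batch gradient descent on the MSE; optionally keep the hidden layer fixed."""
    w1, b1, w2, b2 = list(params[0]), list(params[1]), list(params[2]), params[3]
    n = len(data)
    for _ in range(epochs):
        g_w1, g_b1 = [0.0] * len(w1), [0.0] * len(b1)
        g_w2, g_b2 = [0.0] * len(w2), 0.0
        for x, t in data:
            h = hidden(x, w1, b1)
            out = sum(w * hj for w, hj in zip(w2, h)) + b2
            err = 2.0 * (out - t) / n
            g_b2 += err
            for j, hj in enumerate(h):
                g_w2[j] += err * hj
                back = err * w2[j] * (1.0 - hj * hj)  # gradient through tanh
                g_w1[j] += back * x
                g_b1[j] += back
        w2 = [w - lr * g for w, g in zip(w2, g_w2)]
        b2 -= lr * g_b2
        if not freeze_hidden:
            w1 = [w - lr * g for w, g in zip(w1, g_w1)]
            b1 = [b - lr * g for b, g in zip(b1, g_b1)]
    return w1, b1, w2, b2


rng = random.Random(0)
n_hidden = 3
init = (
    [rng.uniform(-1, 1) for _ in range(n_hidden)],
    [rng.uniform(-1, 1) for _ in range(n_hidden)],
    [rng.uniform(-1, 1) for _ in range(n_hidden)],
    0.0,
)

# Synthetic stand-ins: many cheap "simulation" points, few shifted "experiment" points.
source = [(i / 50.0, 2.0 * (i / 50.0) + 1.0) for i in range(51)]
target = [(x, 2.0 * x + 1.3) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]

# Step 1: train the whole network on the source data.
pretrained = train(source, init, lr=0.05, epochs=2000)
error_before = mse(target, pretrained)

# Step 2: freeze the hidden layer and fine-tune only the output layer on the target data.
transferred = train(target, pretrained, lr=0.1, epochs=500, freeze_hidden=True)
error_after = mse(target, transferred)
```

Because the hidden layer is frozen in the second step, the fine-tuning problem is a linear least-squares fit of the output layer, so the target error cannot increase for a sufficiently small learning rate.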

Process Condition Recommender System
We used the weight-prediction model to develop a system that recommends initial process conditions. The weight-prediction system uses 20 input features to predict the weight of the final product by combining the shape information and the process conditions. However, the system used to derive initial conditions must invert the model to recommend process conditions that satisfy the target weight. This task requires an ANN forward model that predicts the weight accurately in real time from the process conditions and geometry. We also need an optimization algorithm to find the solution.

Random-Search Optimization
Table 3. Hyper-parameter tuning for layers, nodes, epochs, learning rate, dropout rate, and batch size. Each parameter has a minimum, maximum, and step value. To find the optimal hyper-parameters, a random search is conducted by combining those parameters (324 different models). As a result, two layers, 28 nodes, 5500 epochs, a learning rate of 0.05, a dropout rate of 0.9, and a batch size of 1152 had the minimum RMSE for the validation data set.

Figure 4. a) Example of weight difference between experimental and simulation data. They have similar patterns but different values because of simulation errors. b) Concept of transfer learning. c) Application of transfer learning to injection-molding data. The "large amount of data" consists of 3600 cheap simulation data. The "small amount of data" consists of 476 expensive experimental data. The model first learns using the simulation data. Then, it copies the weights except for the last layer. Then, the final layer's weights are updated from scratch using the experimental data. d) Distribution of actual and predicted weights d1) before transfer learning (only experimental data) and d2) after transfer learning (both experimental and simulation data). Both graphs are arranged in ascending order for visualization.

Random search for optimization is very similar to the random search used to find the hyper-parameters. The optimization search can be completed in real time, because only the forward process of the trained ANN is used. The optimization based on random search proceeds as follows: 1) extract the minimum and maximum values of the input process conditions; 2) set a step size for each process condition, and make a candidate list (for example, for fill time, 0.5 s, 0.6 s, 0.7 s, …, and 1.5 s); 3) combine the process features from the lists in Step 2 at random 10 000 times; 4) feed the list of 10 000 random process conditions to the trained ANN model to predict their weights; and 5) from the 10 000 weight-prediction results, extract the process conditions that satisfy the target weight range (Figure 5a).
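The five random-search steps can be sketched as follows. The forward model here is a hypothetical linear stand-in for the trained ANN with the geometry held fixed; all ranges except the fill-time range, as well as the units, names, target weight, and tolerance, are assumptions made for illustration.

```python
import random


def stepped_levels(lo, hi, step):
    """Candidate values lo, lo + step, ..., hi (rounded to avoid float drift)."""
    n = int(round((hi - lo) / step))
    return [round(lo + i * step, 6) for i in range(n + 1)]


# Steps 1-2: (minimum, maximum, step) per process condition.
# Only the fill-time entry follows the text; the others are assumed.
PROCESS_RANGES = {
    "fill_time_s": (0.5, 1.5, 0.1),
    "melt_temp_C": (180.0, 200.0, 5.0),
    "mold_temp_C": (40.0, 60.0, 5.0),
    "packing_pressure_MPa": (20.0, 60.0, 5.0),
    "packing_time_s": (1.0, 5.0, 0.5),
}


def surrogate_weight(cond):
    """Hypothetical stand-in for the trained ANN forward model (geometry fixed)."""
    return (
        20.0
        + 0.02 * cond["packing_pressure_MPa"]
        + 0.05 * cond["packing_time_s"]
        - 0.01 * (cond["melt_temp_C"] - 180.0)
        - 0.1 * cond["fill_time_s"]
    )


def recommend(ranges, forward_model, target, tolerance, n_candidates=10_000, seed=0):
    """Return (condition, predicted weight) pairs whose weight is within the tolerance."""
    rng = random.Random(seed)
    levels = {name: stepped_levels(*bounds) for name, bounds in ranges.items()}
    # Step 3: random combinations of the candidate levels.
    candidates = [
        {name: rng.choice(values) for name, values in levels.items()}
        for _ in range(n_candidates)
    ]
    matches = []
    for cond in candidates:
        weight = forward_model(cond)  # Step 4: forward prediction only
        if abs(weight - target) <= tolerance:  # Step 5: keep conditions near the target
            matches.append((cond, weight))
    return matches


matches = recommend(PROCESS_RANGES, surrogate_weight, target=21.0, tolerance=0.05)
```

Because each candidate needs only one forward pass, evaluating 10 000 candidates is cheap compared with running a simulation or an experiment.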

Ordering System
The system that uses random search to optimize initial conditions yields 70-80 process conditions, depending on the model. However, the user wants only a few conditions that are close to the current state of the injection-molding machine. Therefore, we developed an ordering system that is easy to apply to injection-molding machines. The system comprises four major steps: 1) normalizing all recommended process conditions and the current state of the injection-molding machine; 2) calculating the absolute error between each recommended process condition and the current state; 3) obtaining the weighted sum (Equation (5)); and 4) selecting the top ten process conditions and re-ordering them by mold temperature.
The first step of normalization unifies each feature's effect to calculate the weighted sum at step 3. In our case, the normalized process conditions were subtracted from normalized values of the current machine state. Then, Equation (5) was used to calculate the weighted sum. We applied high coefficients for melt and mold temperatures, because temperature-related features are difficult to control. We calculated the weighted sum only for the top ten results. As a final step, we sorted these ten process conditions again by mold temperature. Using this ordering system, the user can apply the process condition from low-to-high mold temperatures (Figure 5a).
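The four ordering steps can be illustrated with the short Python sketch below. Several details are assumptions rather than facts from the study: min-max normalization is assumed, the weighted sum is assumed to be a coefficient-weighted sum of absolute normalized differences (the exact form of Equation (5) is not reproduced here), and the coefficient values, feature ranges, and column layout are hypothetical.

```python
import numpy as np

def order_recommendations(conditions, current, minimum, maximum,
                          coefficients, mold_temp_col, top_k=10):
    """Rank recommended conditions by closeness to the current machine state."""
    span = maximum - minimum  # assumes maximum > minimum for every feature
    # Step 1: min-max normalize the recommendations and the current state.
    norm_conditions = (conditions - minimum) / span
    norm_current = (current - minimum) / span
    # Step 2: absolute error of each recommendation from the current state.
    abs_error = np.abs(norm_conditions - norm_current)
    # Step 3: weighted sum per recommendation (assumed form of Equation (5)).
    score = abs_error @ coefficients
    # Step 4: keep the top_k nearest, then sort them by mold temperature.
    nearest = np.argsort(score, kind="stable")[:top_k]
    order = np.argsort(conditions[nearest, mold_temp_col], kind="stable")
    return conditions[nearest[order]]

# Hypothetical columns: melt temperature, mold temperature, fill time.
rng = np.random.default_rng(0)
minimum = np.array([200.0, 30.0, 0.5])
maximum = np.array([240.0, 70.0, 1.5])
recommended = rng.uniform(minimum, maximum, size=(75, 3))
current = np.array([220.0, 50.0, 1.0])
coefficients = np.array([3.0, 3.0, 1.0])  # higher weight on temperatures

top_ten = order_recommendations(
    recommended, current, minimum, maximum, coefficients, mold_temp_col=1
)
print(top_ten)
```

Weighting the temperature columns more heavily penalizes recommendations that would require a large temperature change, which matches the stated motivation that temperature-related features are difficult to control.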

Verification
We built an ANN model that estimated product weight by considering geometry information and process conditions. As a result, we obtained two systems: one that was trained using simulation data, and the other that was created by transfer learning using combined simulation and experimental data. To verify that these systems are usable, we used CAE software for the simulation-driven model and conducted experiments for the transfer-learned model. In each case, ten recommended conditions were simulated or tested experimentally again. In simulation, the mean REs were 0.656% for a cup mold and 0.804% for a cosmetic container. Similarly, the system using transfer learning had an average RE of 0.662% in experiments (Figure 5b). These results indicate that the process recommender system can suggest reasonable process conditions that yield weights near the target weight. The ANN-based system searches for suitable process conditions using various methods. Previous approaches to recommending process conditions mainly used temperature, pressure, or velocity, and they provided only one optimal solution. Our recommender system, on the other hand, explores multiple optimal solutions. However, in actual use, users still must manually adjust the machine to find the optimal process condition because of the remaining error. This error has several sources: simulation error, experimental measurement error, and ANN model error. Both simulation and experimental errors are hard to correct. Therefore, in this article, we focused on reducing the ANN-model error by gathering large datasets, tuning hyper-parameters, and applying transfer learning.
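For reference, a mean RE of the kind reported above can be computed as in the following sketch. It assumes that RE is the absolute deviation of each verified weight from the target weight, divided by the target weight and expressed in percent; the example weights are hypothetical and are not data from the study.

```python
import numpy as np

def mean_relative_error(weights, target):
    """Mean relative error (in percent) of part weights against a target."""
    weights = np.asarray(weights, dtype=float)
    return float(np.mean(np.abs(weights - target) / target) * 100.0)

# Hypothetical weights (g) from ten verification shots with a 50 g target.
verified = [49.7, 50.2, 50.1, 49.9, 50.4, 49.6, 50.0, 50.3, 49.8, 50.2]
print(f"mean RE = {mean_relative_error(verified, 50.0):.3f}%")
```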

User Interface
The algorithm was trained on a workstation with a Threadripper 2950X central processing unit (Advanced Micro Devices, Inc.) and a Titan Volta 12 GB graphical processing unit (GPU), using Python with the GPU version of TensorFlow. However, the use of such expensive computers may not be viable in an actual industrial environment. Therefore, we stored only the forward model of the trained ANN-based prediction model and built it into an executable file, together with the random search and the ordering system. Such files can be run in real time on a cheap panel computer. A serial communications protocol called Modbus is used to transfer the results from the panel computer to the injection-molding machine so that the process conditions can be applied automatically. A user interface system was also developed to utilize and control these systems on the injection-molding machine screen (Figure 5c).
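To illustrate why a forward-only model is cheap enough for a panel computer, the sketch below stores the layer weights in a single file and evaluates the network with plain NumPy, without any training framework. It is an illustration under assumptions, not the executable described above: the ReLU hidden activations and linear output are assumed, the random weights stand in for trained ones, and the layer sizes are illustrative (two hidden layers of 28 nodes follow the Table 3 caption).

```python
import io
import numpy as np

def forward(weights, x):
    """Forward pass only: ReLU hidden layers (assumed) and a linear output."""
    h = x
    for w, b in weights[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = weights[-1]
    return (h @ w + b).ravel()

def save_weights(weights, file):
    """Store every layer's weight matrix and bias vector in one .npz file."""
    arrays = {}
    for i, (w, b) in enumerate(weights):
        arrays[f"w{i}"] = w
        arrays[f"b{i}"] = b
    np.savez(file, **arrays)

def load_weights(file):
    """Rebuild the list of (weight, bias) pairs saved by save_weights."""
    data = np.load(file)
    n_layers = len(data.files) // 2
    return [(data[f"w{i}"], data[f"b{i}"]) for i in range(n_layers)]

# Random weights stand in for the trained model; sizes are illustrative.
rng = np.random.default_rng(0)
sizes = [20, 28, 28, 1]
weights = [(rng.normal(size=(m, n)), rng.normal(size=n))
           for m, n in zip(sizes[:-1], sizes[1:])]

buffer = io.BytesIO()  # an in-memory file; a path on disk works the same way
save_weights(weights, buffer)
buffer.seek(0)
restored = load_weights(buffer)

batch = rng.normal(size=(10_000, 20))  # 10 000 candidate inputs at once
predictions = forward(restored, batch)
print(predictions.shape)
```

The forward pass reduces to a handful of matrix multiplications, so evaluating 10 000 candidates does not require a GPU.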

Discussion and Conclusion
We developed a system that recommends injection-molding process conditions in real time under fixed material conditions. To construct this system, various methods (e.g., data sampling, feature extraction, hyper-parameter tuning, transfer learning, and optimization) were applied. Several widely used methods were compared at each step to find an optimized method for the injection-molding process. Data were sampled using a mixture of Taguchi and random sampling. The 3600 cases of simulation data (from 36 molds) and 476 cases of experimental data (from 11 molds) were collected for training, validating, and testing the ANN model. We also conducted geometry-feature extraction guided by the results of sensitivity analysis. As a result, four geometry features related to surface area and volume were applied for the final ANN model. After gathering data, a random search was conducted to find the optimal hyper-parameters. Then, we used transfer learning to enable the model to use experimental data. The completed weight-prediction ANN model was used to develop a system to recommend process conditions via random search. Then, the final recommendation condition was extracted using an ordering system and verified by simulation and experiment. The final model achieved an average RE of 0.73% from simulation and 0.63% from experiments. We then developed a user interface.

Figure 5. a) Schematic of random search optimization. First, geometric features are extracted from the target molds. Then, 10 000 random combinations are generated between the minimum and maximum of each process condition. Then, 10 000 product weights are predicted, and the conditions yielding the target weight are selected. As a last step, the nearest-10 process conditions are sorted and selected. b) Verification result of the process recommender system. Verification of simulation results of b1) cup and b2) cosmetic case and b3) experimental verification of the cup mold. In each graph, the 10 leftmost points are verification results, and the 50 or 100 points to the right are the source training data. Comparison confirms that the recommender system can narrow the distribution of the product weight toward the target weight. c) Left: concept and equipment setup of the connection between the injection machine and the panel computer; right: user interface for the process recommender system. Material and geometry information are needed to calculate and recommend the optimal process conditions.

This study differs from existing studies in two major ways. First, it considers the geometry conditions. Second, we used data from both simulation and experiment. In general, a manufacturing system can consider three types of inputs: process conditions, geometries, and materials. Most prior studies that combined ANNs and injection molding only considered changes in process conditions for a fixed mold shape and molding material. In contrast, in this article, the process was optimized by obtaining a large amount of data from a range of geometric conditions (36 molds). In addition, extant studies used only simulation or experimental data, whereas ours introduced transfer learning to use both. This combined use of a small number of expensive experimental data and a large number of cheap simulation data achieved performance similar to that of methods using large experimental datasets.
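The transfer-learning idea described above (keep the layers learned from simulation data and retrain only the last layer on experimental data) can be sketched as follows. This is a simplified illustration with several loudly stated substitutions: the hidden layer uses random weights as a stand-in for simulation-pretrained weights, the final linear layer is refit in closed form by least squares rather than by gradient descent, and both datasets are synthetic. Only the dataset sizes (3600 and 476) come from the study.

```python
import numpy as np

def hidden_features(hidden_layers, x):
    """Output of the frozen hidden layers (ReLU activations assumed)."""
    h = x
    for w, b in hidden_layers:
        h = np.maximum(h @ w + b, 0.0)
    return h

def fit_last_layer(hidden_layers, x, y):
    """Fit only the final linear layer on top of the frozen hidden layers."""
    h = hidden_features(hidden_layers, x)
    design = np.hstack([h, np.ones((len(h), 1))])  # extra column for the bias
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:-1], coef[-1]

def predict(hidden_layers, last_layer, x):
    w, b = last_layer
    return hidden_features(hidden_layers, x) @ w + b

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)
# Stand-in for hidden layers pretrained on simulation data (random here).
hidden = [(rng.normal(size=(3, 16)), rng.normal(size=16))]

# Synthetic data: the "experiment" follows the "simulation" with a bias.
true_w = np.array([1.0, 2.0, 3.0])
x_sim = rng.uniform(size=(3600, 3))
y_sim = x_sim @ true_w
x_exp = rng.uniform(size=(476, 3))
y_exp = 0.9 * (x_exp @ true_w) + 0.5

sim_last = fit_last_layer(hidden, x_sim, y_sim)  # last layer from simulation
exp_last = fit_last_layer(hidden, x_exp, y_exp)  # last layer refit on experiment

rmse_before = rmse(predict(hidden, sim_last, x_exp), y_exp)
rmse_after = rmse(predict(hidden, exp_last, x_exp), y_exp)
print(rmse_before, rmse_after)
```

Because only the small final layer is refit, a few hundred experimental samples are enough to determine it, while the larger hidden layers rely on the plentiful simulation data.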
In the final verification, the mean RE of the weight was 0.63%, which is acceptable for an actual industrial machine. There are various reasons for this level of error. First, error exists in the training data itself. Simulation data have inevitable errors, as shown in Figure S1, Supporting Information. Similarly, as shown in Table S2, Supporting Information, experimental data fluctuate slightly even under the same process conditions. Moreover, the lack of experimental and geometry data increased the error. The number of molds and the amount of experimental data were insufficient compared with the complexity of injection molding. However, these data required a great deal of time and cost to acquire. Therefore, it is important to consider alternative algorithms that perform well with a small amount of data. In addition, this study varied the process and geometry conditions, but it did not consider changes in material. It also used only the weight of the product as the index to evaluate product quality. This decision was made because the output feature must be easily quantifiable and measurable with both experiments and simulations for various geometries. Therefore, further research should quantify the properties of the material, and data should be collected via simulation and experiments using various materials.
In conclusion, this research developed an artificial intelligence (AI) system that even a non-expert in injection molding can use to set up process conditions. Assuming that the material is polypropylene, the user only needs a 3D file with an STL extension and the target weight of the product. Then, the recommender system suggests ten process conditions having less than 1% RE. As a final step, the engineer selects one condition among the ten recommended conditions and performs fine-tuning. Previous studies have only considered laboratory-level development that combined the ANN and injection molding. This article, on the other hand, shows the possibility of optimizing AI-based injection molding at actual industrial sites. Future work should improve the quality and quantity of data. Qualitative development should consider factors such as material conditions, environment conditions, process features, and geometric features. Quantitative improvement should consider various molds and materials. As a result, the system will be able to respond quickly to new products and could be used to control the process conditions of a smart factory.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.