Optimisation of laser welding of deep drawing steel for automotive applications by Machine Learning: A comparison of different techniques

Laser welding is particularly relevant in the industry thanks to its simplicity, flexibility and final quality. The Industry 4.0 and sustainable manufacturing framework gives massive attention to in situ and non-destructive inspection methods to predict the final quality of laser welds. The literature often resorts to supervised Machine Learning approaches. However, selecting the most suitable method is non-trivial, and decision making often relies on diverse and unclearly defined criteria. This work addresses this task by proposing a statistical comparison method based on nonparametric tests. The method is applied to the most relevant supervised Machine Learning approaches exploited in the literature to predict laser weld quality, specifically considering the optimisation of a new production line, hence focussing on supervised Machine Learning methods that do not require massive data sets, that is, the Generalized Linear Model (GLM), Gaussian Process Regression, Support Vector Machines, Classification and Regression Trees, and Genetic Algorithms. The statistical comparison is carried out to select the best-performing model, which is then exploited to optimise the production process. Additionally, an automatic process based on Bayesian approaches is resorted to for optimising Machine Learning models and process parameters, to reduce the operator effect. This work provides quality and process engineers with a simple framework to compare the performance of Machine Learning approaches and select the most suitable process modelling technique.


INTRODUCTION
Manufacturing in Industry 4.0 makes extended use of artificial intelligence (AI) to analyse the big data collected during the manufacturing process to qualify, model, optimise and control the quality of the products.1 Big data analytics supported by machine learning (ML) techniques is a pillar of the digital transformation of the manufacturing sector, which is finding realisation at different levels of manufacturing, spanning from pre-production and research and development to actual production lines.2 In turn, this is essential for the creation of digital twins of manufacturing processes, which are a key enabling technology to successfully deploy Zero-Defect Manufacturing for sustainable production.3,4

Welding is one of the most widely exploited joining technologies, finding applications in several sectors.5 Amongst the several available welding techniques, laser welding (LW) is particularly interesting.6,7 It allows high flexibility, effectiveness and productivity, making it highly attractive for industrial applications, for example, in aerospace, automotive, military, shipbuilding and electronics.8 However, the inherently chaotic nature of the laser system is liable to introduce several defects, for example, spatter and weld break-ins, and ultimately requires tight, adaptive and responsive quality controls. Currently, in situ controls achieve high informativity in real time by means of non-destructive procedures and typically exploit high-speed optical and thermal cameras, X-ray computed tomography (X-CT), and acoustic and optical sensors.8 The massive amount of data calls for AI processing by machine vision and ML techniques to extract relevant features for real-time quality control9-11 and to establish analytical and empirical models to predict the final quality of the component.8,9,12,13

Indeed, post-process inspections are also essential. These aim at inspecting the weld geometry and detecting visible and internal defects. They are based on eddy current, X-CT and ultrasonic techniques, and sometimes on destructive inspections that require cross-sectioning and inspection by optical microscopes. Although highly expensive, the latter are mandatory when innovative materials are being developed or introduced in the manufacturing line. In particular, the penetration depth of the weld bead is one of the most critical parameters to ensure the final quality of the process and adequate mechanical properties.8 Thus, establishing empirical models to predict and control it from the process parameters is essential in pre-production and in R&D. The literature applies several ML techniques to model these kinds of relationships, ranging from supervised to unsupervised techniques.7,8,14 Design of Experiments (DOE) and the relevant analysis methodologies, for example, response surface methodology (RSM) and the Generalized Linear Model (GLM), have been extensively adopted. Abioye et al.15 applied DOE and GLM to maximise the penetration depth of disk laser welding of aluminium alloys. Sathish et al.16 exploited Taguchi design to optimise laser welding of butt joints of aluminium alloy in terms of mechanical strength. Similarly, Torabi and Kolahan17 optimised the weld bead via RSM to maximise the ultimate tensile strength for thin stainless steel. Ozkat et al.18 deployed RSM to achieve a physics-driven model of the weld bead geometry and coupled it with FEM simulative models. Indeed, similar approaches have been deployed for other welding processes, for example, spot welding by Satpathy et al.19 and friction stir welding by Gagliardi et al.20 Gaussian Process Regression (GPR) has also been adopted in the literature to model the effect of process parameters on the weld bead geometry of stainless steel21 and of aluminium alloys.22

Kernel-based regression models have also been adopted to study the effect of process parameters on penetration depth and mechanical properties.7,8 For example, Petković23 exploited support vector machine regression (SVM or SVR) to model the geometry and the resistance of the weld based on laser welding process parameters, including clamping conditions. Later on, Zhang and Zhou24 exploited SVR to optimise the weld bead geometric and mechanical properties of stainless steel.
Classification and Regression Trees (CART) have also been adopted to model the relationship between process parameters and weld quality,25 for example, by the XGBoost algorithm for Al-Li alloys by Zhang et al.26 Moreover, in addition to explainable artificial intelligence,27,28 genetic programming (GP) and neural networks (NN) have been adopted for several welding applications.29,30 GP has been exploited by Wilson et al.31 to model the LW of deep drawing coated materials, and by Nikolić et al.32 for low carbon and stainless steel. As far as NN usage is concerned, for example, Nikolić et al.32 trained an artificial NN to predict the geometry of the LW bead for low carbon and stainless steels, and Schmoeller et al.33 trained a variational autoencoder to predict the penetration depth of the LW of aluminium alloys. Indeed, within this quite complex framework offering several alternatives, authors often investigate multiple ML approaches to model and optimise LW.7,8,14 However, in several cases, the comparison relies only on the root mean square error (RMSE) of predictions, sometimes neglecting the Bayesian approaches upon which ML methods rely and often disregarding hyperparameter optimisation.23,24,29,32 The literature presents methods to compare the performance of different supervised ML models based on holistic and structured frameworks. However, they tackle problem-specific modelling related to data-rich and knowledge-poor scenarios.34

Conversely, in the case of resource-intensive manufacturing process setup, for example, LW, data tend to be scarce, due to the high monetary and environmental costs, but knowledge tends to be high, that is, the main process parameters and their effects are already known, so that variable and dimension reduction is not necessary. This work proposes a simple methodology to cater for the inherently stochastic nature of ML modelling approaches when comparing the performance of ML methods deployed in technological applications. Accordingly, this work compares the performance of some of the ML methods most applied in the literature within a statistical framework, to provide practitioners with guidelines for adopting ML techniques in modelling LW quality. The comparison is limited to the models most largely applied in laser welding process optimisation, according to the literature as detailed above. The comparison is performed on an industrially relevant case study, that is, the process setup of deep-drawing steel for automotive applications. In particular, the paper considers as input parameters the raw process parameters that can be directly set on the machine. The rest of the paper is structured as follows: Section 2 presents the considered case study, the applied ML techniques and the comparison methodology; Section 3 outlines and discusses the results; and Section 4 finally draws the conclusions.

Materials and experimental setup
This work focuses on modelling the setup of a LW process for deep drawing steel for automotive applications, to achieve an understanding of the process for optimisation. The case study is offered by AGLA Power Transmission, an industrial company operating in the automotive sector. The LW process has been carried out by state-of-the-art mass production equipment featuring a single-mode Ytterbium fibre laser source with an adjustable power up to 10 kW. The LW targets the manufacturing of a support for the clutch discs of a CVT gearbox. The part consists of two components, a hub and a tonewheel, both made of standard deep drawing steel.35 This work considers the penetration depth of the weld bead, S_n, as the quality control variable. Such a choice is not the only possible one: it is motivated by the customer requirement for the considered industrial case study and supported, as far as its practical relevance is concerned, by the literature review briefly outlined in Section 1, which relates it to the mechanical strength of the joint.8

The investigation of the process parameters is performed according to the literature and considers the effect of the welding speed v, the laser power P and the focal position (also referred to as focus offset) F_O. Conversely to other studies,36 such an approach provides the operator with a direct indication of how to act on the machine, dispensing with the requirement of complex information on material properties, for example, absorptivity, and on the process, for example, the relationship between the focus and the beam area. For confidentiality, this work reports analyses based on the power normalised with respect to the average laser spot area, that is, the power density P_d. The parameters are reported to have a well-defined effect on the weld geometry. In fact, a deeper penetration can be obtained by increasing the power or decreasing the speed. Similar results can be obtained by decreasing the focal spot area (related to the focal position), for a given pair of power and speed.15,37

The effect of the process parameters is investigated by realising 88 specimens according to an unbalanced design resulting from the implementation of four investigative DOEs and the addition of some further sparse conditions to enrich the investigated space. Indeed, the implemented experimental design is inevitably linked to the involved companies' available resources. The choice of the parameters, shown in Table 1, and the resulting investigated conditions, reported in Table A1, are according to the industrial company's former experience in processing the materials with a different solid-state laser source.
Once the LW has been performed, the component has been cross-sectioned under lubricated conditions. The cross-section is then polished with grit paper (240, 320, 800 and 1200 grit) and then with diamond solutions of decreasing grain size (6, 3 and 1 μm). Optical inspection of the weld bead cross-section is performed after Nital etching by means of a Leitz Laborlux 12 ME metallographic optical microscope at 50× magnification to allow the measurement of the weld depth S_n.

Machine Learning modelling approaches
According to the literature, briefly reviewed in Section 1, the most commonly adopted ML methods are considered in this work. Because this work tackles the optimisation of the process setup, which is typically associated with the availability of few data, neural networks are not considered. Supervised ML techniques are considered, both belonging to explainable AI, that is, GLM, GPR, SVR and CART, and not, that is, GP. In the following, a brief overview of the considered approaches is provided, along with the optimisation approach for the relevant parameters. In general, any of the supervised machine learning methods that will be discussed can be represented as a function f that achieves an estimation of the output y, ŷ = f(x, θ), based on a set of predictors, x, and a set of hyperparameters, θ. In the following, the main features of the methods are discussed functionally to the introduction of the hyperparameters that will be optimised. More in-depth discussions of the methods can be found in the reference literature, for example, in refs. 38,39 for GPR and ref. 40 for GP.
Once the models are validated, a method based on statistical inference is applied to compare their performance. The best model is then optimised to seek the process parameter set that maximises the weld bead penetration depth.

Generalized Linear Model
The Generalized Linear Model (GLM) is a ML technique that infers a statistical model between a set of predictors and one or more outputs, whose probability distribution is most typically assumed to be normal. The model is a linear combination of the predictors, which may be passed through a nonlinear link function,38 and the coefficients of the linear combination are estimated by the GLM via the least squares method, that is, by maximising the log-likelihood.38,41 Here, a third-order model is considered. ANOVA is usually exploited to identify statistically significant parameters while catering for the degrees of freedom of the estimation and of the random errors. Including nonsignificant parameters in the model is liable to worsen the prediction and increase the RMSE. Consequently, variable reduction is essential and non-trivial.44
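The fitting step can be sketched with scikit-learn (a minimal illustration, not the authors' MATLAB implementation; the data, coefficients and noise level below are hypothetical stand-ins for the welding data set):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in data: 88 specimens, three raw process parameters
# (power density, welding speed, focal offset) and the penetration depth.
rng = np.random.default_rng(0)
X = rng.uniform([1.0, 1.0, -2.0], [5.0, 6.0, 2.0], size=(88, 3))
y = 0.4 * X[:, 0] - 0.2 * X[:, 1] + 0.05 * X[:, 0] * X[:, 1] + rng.normal(0.0, 0.05, 88)

# Third-order polynomial model with full interactions, fitted by least squares;
# in the paper, nonsignificant terms would then be pruned stepwise via ANOVA.
glm = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
glm.fit(X, y)
```

A stepwise procedure would then drop the polynomial terms whose coefficients are not statistically significant, to preserve the degrees of freedom of the residual error.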

Gaussian Process Regression
Gaussian Process Regression (GPR) is a stochastic regression method for interpolating and inferring models in sparse data sets, that is, investigating a large portion of the domain in a non-necessarily structured nor densely filled way.39,45,46 It relies on the assumption that the model response, that is, the prediction, depends on the correlation between the model response at two different evaluation points, and that the correlation is a function of the distance h between these evaluation points. In particular, under this assumption, the GP prediction can be written as:

$$\hat{y}(x_0) = f(x_0)^\top \beta + r_0^\top R^{-1} (y - F\beta)$$

where $f(x_0)^\top \beta$ is a regressive term and $r_0^\top R^{-1}(y - F\beta)$ a correction term. The solution, which minimises the mean squared prediction error, assumes the variable $Z(x) = f(x)^\top \beta + \Psi(x)$ consists of a regression, in particular a linear combination of m functions $f(x)$ with linear combination parameters $\beta$, and the spatially correlated regression error $\Psi(x) \sim \mathcal{N}(0, \sigma^2 R(h; \theta))$, with $R(h; \theta)$ the correlation matrix dependent on the distance h and a set of parameters $\theta$. The correction term depends on the residuals, with F the matrix with entries $F_{ij} = f_j(x_i)$, $i \in \{1, 2, \ldots, n\}$, $j \in \{1, 2, \ldots, m\}$, and $x_i$ belonging to the training set with n data, weighted by the correlation, having $r_0 = (R(x_0 - x_1), \ldots, R(x_0 - x_n))^\top$.39,45,46 When training a GP model, it is crucial to determine the regression model F and the correlation parameters. Thanks to the presence of the corrective term in the prediction, a possible solution is ordinary kriging, which includes only a constant term rather than a linear combination of trend functions. The estimation of the spatial correlation can be performed by means of the variogram $\gamma(h)$ and its empirical estimate $\hat{\gamma}(h)$, for example, according to Matheron, that is:

$$\hat{\gamma}(h) = \frac{1}{2\,\#N(h)} \sum_{(x_i, x_j) \in N(h)} \left( y_i - y_j \right)^2$$

with $N(h) = \{(x_i, x_j) : x_i - x_j = h;\ i, j \in \{1, 2, \ldots, n\}\}$ and the operator # denoting the cardinality.47 The prediction of the spatial correlation at different points and distances requires fitting the empirical variogram $\hat{\gamma}$ according to some kernel function, and several alternatives are available in the literature, for example, Matérn, squared exponential, Gaussian.48,49

Support Vector Machine Regression
The Support Vector Machine is a machine learning classification algorithm that charts the input data in a hyperspace and defines a hyperplane that can achieve binary classification.50 The solution of a nonlinear classifier is enabled by a space transformation through a kernel function, such that hypersurfaces that are nonlinear in the original hyperspace become hyperplanes in the transformed space.38 The main parameters of an SVM are the parameters describing the hyperplanes and the tolerance ϵ, which defines a tolerance band around the hyperplane for the classification. When deployed for regression, the SVM aims at minimising the distance between the data points and the hyperplane, according to the tolerance. In particular, if the training data point $y_i$ falls within the tolerance ϵ of the related estimate $\hat{y}_i$, the error $L_\epsilon(y_i, \hat{y}_i)$ is considered null,38,51 that is:

$$L_\epsilon(y_i, \hat{y}_i) = \begin{cases} 0 & \text{if } |y_i - \hat{y}_i| \le \epsilon \\ |y_i - \hat{y}_i| - \epsilon & \text{otherwise} \end{cases}$$
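An ϵ-insensitive SVR fit can be sketched with scikit-learn (hypothetical data; the kernel, ϵ and regularisation values are illustrative, not those selected in this study):

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical data: a noisy linear response in one regressor.
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 5.0, size=(60, 1))
y = 0.5 * X.ravel() + rng.normal(0.0, 0.05, 60)

# epsilon defines the tolerance tube: residuals inside it carry zero loss.
# gamma scales the Gaussian (RBF) kernel used for the space transformation.
svr = SVR(kernel="rbf", epsilon=0.03, gamma=0.5, C=10.0)
svr.fit(X, y)
```

Only the training points falling outside the ϵ tube become support vectors and contribute to the fitted regression surface.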

Regression Trees
Classification and Regression Trees (CART) graphically describe a regression model by a tree, where terminal leaves are regressors or constants and branch nodes indicate mathematical operations to be performed on the branches merging in that node. The trees' main parameters are the depth, that is, the number of nodes on the same branch, and the width, that is, the number of branches. CART can be constructed by boosting. Boosting applies a weak learner, that is, a certain CART, to a weighted data set iteratively. At each iteration, the weights are updated so that data points with greater prediction errors are associated with larger weights. The procedure aims to maximise the accuracy.52 The accuracy can be expressed in several ways, depending on the specific boosting algorithm that is applied. Amongst others, L2Boosting minimises the squared error, gradient boosting the absolute error, AdaBoost the exponential loss and LogitBoost the log-loss. A further criticality in identifying an adequate CART is the excessive growth of the trees, for it worsens readability and makes them liable to overfit the data.38 The creation of an ensemble of weak learners is particularly effective in relieving this issue,53,54 for the ensemble achieves greater simplicity, robustness and accuracy by aggregating several simpler weak learners. The decision rule across the weak learners is typically a simple majority. Consequently, in addition to the parameters typical of the weak learner and the boosting method, the base learner numerosity and the method to create their several instances define the ensemble. Bootstrap aggregating, that is, bagging, allows creating an ensemble of trees by taking bootstrap samples ℒ_B of m data from the learning set ℒ = {x, y}, drawn randomly with replacement, and exploiting each of them to create a weak learner f_B, such that ŷ_B = f_B(x).
If the input data and the data dimensions are subsampled with the same methodology, a random forest is obtained. This solution has the advantage of constructing uncorrelated predictors for the different samples ℒ_B.38,54

Genetic Programming

Genetic Programming (GP) is an alternative route for constructing CART. This methodology identifies the most suitable realisation of a CART by means of a stochastic investigation of several alternatives, modelling the population. The alternatives are stochastically generated, relying on the genetic principles of crossover, mutation and survival.55,56 GP requires the creation of an initial population of randomly generated CART of a certain size. The fitness of each CART is evaluated according to a criterion; in this work, the RMSE was selected. Then, the population is updated iteratively; at each iteration, a new generation of CART is available for evaluation, so that the fittest CART can be selected. Typically, the algorithm is stopped after a certain number of generations. In this work, a population of 500 individuals and 50 generations are considered. Each generation consists of the same number of individuals as the initial population. Genetic programming intervenes in how the new individuals are generated.
The generation exploits either crossover, that is, a new individual results from the random combination of branches of CART in the most recent population, or mutation, that is, a new individual is generated by randomly modifying a branch of an existing CART. The mix of these two genetic operators is relevant, as well as the possibility that a new individual is replicated from the population. Last, survival can limit the portion of newly generated individuals that actually go through to the next generation. In particular, elitism principles can be applied. Keep-the-best elitism keeps the fittest individual between both parents and children, while the others are replaced, that is, selected by fitness giving priority to children. Alternatively, total elitism and half elitism can be considered. The former takes into the new generation the absolutely fittest individuals between parents and children, with no prioritisation. The latter selects half of the next generation's population as the fittest individuals between parents and children, while the other half is replaced. In this work, total elitism is not considered, for it reduces the investigation capabilities of GP. In addition to the GP-specific hyperparameters, the usual dimensions of CART, that is, width and depth, play a major role in complexity, readability and computational effort.40 Last, a further source of variability lies in the random generation of the initial population. Therefore, the literature suggests testing the same hyperparameter set on multiple independent random generations of the initial population. In this work, 60 runs are performed, and the model providing the minimum RMSE is considered the best GP model.40,55,56 Typically, the selection of the hyperparameters is performed by trial and error and largely relies upon operator expertise.40,55,56 In this work, an automated optimisation based on a Bayesian algorithm is applied; the algorithm is described in Section 2.3. To the authors' best knowledge, the application of Bayesian optimisation to optimise GP hyperparameters is unreported.40
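The survival step can be illustrated in plain Python (a toy sketch of keep-the-best elitism on a placeholder population, unrelated to the GPLAB implementation; individuals and fitness are stand-ins for CART and their RMSE):

```python
import random

def keep_the_best(parents, children, fitness):
    """Keep-the-best elitism: carry the single fittest individual, parent or
    child, into the next generation and fill the remaining slots with the
    fittest children (children are prioritised over parents)."""
    best = min(parents + children, key=fitness)
    ranked_children = sorted(children, key=fitness)
    next_gen = [best] + [c for c in ranked_children if c is not best]
    return next_gen[: len(parents)]

# Toy population: individuals are numbers, fitness is an RMSE-like distance.
random.seed(0)
parents = [random.uniform(0.0, 10.0) for _ in range(6)]
children = [random.uniform(0.0, 10.0) for _ in range(6)]
fitness = lambda x: abs(x - 3.0)
new_pop = keep_the_best(parents, children, fitness)
```

Total elitism would instead rank parents and children jointly and retain the overall fittest, which explains its weaker exploration of the search space.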

Model optimisation and validation
The presented supervised Machine Learning approaches, that is, GPR, SVR, CART and GP, are highly interesting tools to draw models describing the relationship between output variables and input influence factors. In general, they can be represented as a function f that achieves an estimation of the output y, ŷ = f(x, θ), based on a set of predictors, x, and a set of hyperparameters, θ. The hyperparameters are highly specific to the considered ML method, as is the algorithm to evaluate such a function f. However, the selection of the best θ is a non-trivial task.38 In fact, due to the numerous parameters and their nonlinear effect on the goodness of fit, optimisation in closed form is often computationally expensive. A heuristic alternative methodology exploits black-box models between the hyperparameters θ and a cost function modelling the prediction accuracy, Acc = Acc(θ). In the case of regression, the RMSE can be chosen as the (lack of) accuracy estimate. Indeed, this is not the unique alternative, for other metrics describing the goodness of fit of a regression could have been selected, for example, the MSE or the MLE.57,58 However, the RMSE is a conventional choice and is suitable to have a straightforward estimation of the effect on the model's prediction uncertainty. Under the considered assumption, the methodology exploits a Bayesian optimisation algorithm58 to maximise the accuracy, that is, minimise the RMSE, by finding $\theta_{best} = \mathrm{argmin}_\theta (\mathrm{RMSE}(\theta))$. Table 2 summarises the θ for the considered supervised ML methods, in accordance with the description in Section 2.2. The parameters' ranges were selected out of the literature and best practices tackling similar problems, as summarised in the introduction. Some parameters are held fixed, that is, in the case of GP, in accordance with the best practices present in the literature and a sensitivity analysis. Specifically, the selected values are the minimum from empirical practices to provide robust results independent of their values. Bayesian optimisation is computationally less demanding and achieves a global optimisation.59 Bayesian optimisation assumes that the black-box accuracy cost function, Acc(θ), is the realisation of a real Gaussian Process, for this provides suitable flexibility and regularity to the function. A Gaussian Process prior is hypothesised and maintained through an updated posterior distribution as new observations of the function become available. The particular choice of the prior allows a tractable posterior and introduces a covariance term, dependent on the distance of the probed points, that allows improved exploitation and exploration of the domain.58,60,61 Thus, choosing the new evaluation point, $\theta_{next}$, of the cost function is essential in the algorithm because it determines the posterior. Studying a certain acquisition function, $a = a(\theta^*; \{(\theta, \mathrm{Acc})\})$, such that $\theta_{next} = \mathrm{argmax}_{\theta^*} a(\theta^*)$, allows identifying $\theta_{next}$.58,60 One of the criticalities is defining a suitable trade-off between the exploitation and the exploration of the hyperparameter space. In particular, the acquisition function has to guarantee that regions that provide minimisation of the cost function are thoroughly investigated, that is, exploited, and that those with higher uncertainty, that is, little explored, are appropriately investigated. Amongst the others, the constrained over-exploitation expected improvement per second is a suitable choice for achieving a global optimisation considering the computational effort.58,60 In particular, the method assumes the acquisition function as:

$$a(\theta^*) = \frac{\mathrm{EI}(\theta^*)}{\mu_S(\theta^*)} = \frac{\mathbb{E}\left[\max\left(0,\ \mu_{\min}(\theta^*) - \mathrm{Acc}(\theta^*)\right)\right]}{\mu_S(\theta^*)}$$

where $\mu_{\min}(\theta^*)$ is the minimum posterior mean and $\mu_S(\theta^*)$ the posterior mean of the Gaussian Process model describing the evaluation time. Under the assumption that the accuracy is distributed as a Gaussian Process model with predictive mean $\mu(\theta^*; \{(\theta, \mathrm{Acc})\})$ and predictive standard deviation $\sigma(\theta^*; \{(\theta, \mathrm{Acc})\})$, the expected improvement admits the closed form:

$$\mathrm{EI}(\theta^*) = \left(\mu_{\min}(\theta^*) - \mu(\theta^*)\right) \Phi(z(\theta^*)) + \sigma(\theta^*)\, \phi(z(\theta^*)), \qquad z(\theta^*) = \frac{\mu_{\min}(\theta^*) - \mu(\theta^*)}{\sigma(\theta^*)}$$

where $\Phi$ is the standard normal cumulative distribution function and $\phi$ the standard normal density function.58,59 Additionally, the posterior standard deviation must not be smaller than a certain fraction of the prior standard deviation. This constraint avoids over-exploitation, that is, finding local minima. In fact, if it is not satisfied, the new hyperparameter set $\theta_{next}$ belongs to a region with small uncertainty, that is, it lies between already tested points. In that case, a multiplication factor, proportional by a multiple of 10 to the number of performed iterations of the Bayesian algorithm, is applied to $\theta_{next}$ to correct the next evaluation point.59 According to best practices, 30 iterations are performed.58
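The expected-improvement-per-second acquisition just described can be sketched in plain Python (an illustrative implementation under the stated Gaussian assumption; the numerical values are hypothetical):

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ei_per_second(mu, sigma, mu_min, mu_time):
    """Expected improvement of the (to-be-minimised) accuracy cost at a
    candidate point, divided by the predicted evaluation time mu_time."""
    if sigma <= 0.0:
        return 0.0
    z = (mu_min - mu) / sigma
    ei = (mu_min - mu) * norm_cdf(z) + sigma * norm_pdf(z)
    return ei / mu_time

# A candidate predicted to beat the incumbent minimum (mu < mu_min) scores
# higher than one predicted to be worse, all else being equal.
good = ei_per_second(mu=0.20, sigma=0.05, mu_min=0.30, mu_time=1.0)
bad = ei_per_second(mu=0.40, sigma=0.05, mu_min=0.30, mu_time=1.0)
```

Note that even the "worse" candidate receives a small positive score through its predictive uncertainty, which is what keeps the search exploring.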
The Bayesian optimisation algorithm is applied to each ML modelling approach independently. It is not applied to the GLM because it would result in unnecessary complexity. Model validation to test for generalisation and robustness is performed, and the accuracy is evaluated in terms of the RMSE on a validation set obtained by a constrained bootstrap sampling of the training set. In particular, k-fold cross-validation is adopted in this work. K-fold cross-validation splits the data set into k folds, each of which in turn is used as a test set. Accordingly, each point is predicted once and used to build the model k − 1 times. In this work, a conventional 5-fold cross-validation is considered,38 and the accuracy is computed as the average accuracy over all the folds.38 For each fold, the model is trained, the parameters are optimised by the Bayesian optimisation algorithm, and the model is then validated.
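The validation scheme can be sketched with scikit-learn (hypothetical data mimicking the 88-specimen sample size; the linear regressor is a placeholder for any of the models under study):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Hypothetical data set of 88 observations, three predictors.
rng = np.random.default_rng(4)
X = rng.uniform(0.0, 5.0, size=(88, 3))
y = X @ np.array([0.4, -0.2, 0.1]) + rng.normal(0.0, 0.05, 88)

# 5-fold cross-validation: each fold serves once as the test set, and the
# accuracy is the average RMSE across the folds.
rmses = []
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train], y[train])
    residuals = y[test] - model.predict(X[test])
    rmses.append(float(np.sqrt(np.mean(residuals ** 2))))
mean_rmse = sum(rmses) / len(rmses)
```

The per-fold RMSE values, rather than only their mean, are what feed the nonparametric comparison described next.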
Both supervised ML methods and related optimisation and validation are implemented in MATLAB 2019b.GP base algorithm is deployed by relying on the GPLAB Toolbox v3.0. 62

Performances comparison
The comparison between the sets of RMSE related to the different considered models is performed by means of the Wilcoxon rank-sum test.63 This is a nonparametric hypothesis test comparing the medians of two samples, which can have different sizes, under the null hypothesis that the medians are equal. Adopting a nonparametric test is useful in the case at hand for twofold reasons: it does not require a hypothesis on the distribution of the RMSE, and it enables the comparison of samples of different sizes.
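The test can be sketched in plain Python (a stdlib-only implementation using the large-sample normal approximation, without tie correction; the two RMSE samples are hypothetical):

```python
import math

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test via the large-sample normal
    approximation (no tie correction); returns the z statistic and p-value."""
    n1, n2 = len(a), len(b)
    pooled = sorted(list(a) + list(b))
    ranks = {}
    i = 0
    while i < len(pooled):            # average the ranks of tied values
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2.0
        i = j
    w = sum(ranks[v] for v in a)      # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mu) / sigma
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p

# Hypothetical cross-validation RMSE samples for two competing models:
z, p = rank_sum_test([0.20, 0.22, 0.21, 0.19, 0.23],
                     [0.30, 0.32, 0.31, 0.29, 0.33])
```

For the small samples produced by k-fold cross-validation, an exact-distribution implementation (as offered by standard statistical packages) would be preferable to the normal approximation sketched here.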

RESULTS AND DISCUSSION
The optical metallographic inspection allowed the identification of the weld depth S_n. For the sake of readability, the results are reported in the annex in Table A1. These are exploited to draw prediction models according to the methodology discussed in Section 2. While the formal RMSE comparison will be discussed as per Section 2.4, exploiting the k-fold validation, synthetic indications of the average RMSE and R² evaluated from the k-fold cross-validation will be reported when the model parameter optimisation results are discussed.

Generalized linear model
Figure 1 shows the main effect plot of the output variable S_n with respect to the considered factors. Nonlinear effects and qualitative significance can be appreciated. Accordingly, the tentative choice of a third-order polynomial model with complete interactions seems reasonable for the GLM.
The GLM is applied with stepwise variable selection, resulting in the model of Equation (6).

Gaussian Process Regression
Bayesian optimisation selected a universal kriging model with a Matérn 5/2 kernel. This kernel expresses an exponential-like covariance function as:

$$R(h) = \left(1 + \frac{\sqrt{5}\,h}{\theta_l} + \frac{5 h^2}{3 \theta_l^2}\right) \exp\left(-\frac{\sqrt{5}\,h}{\theta_l}\right) \tag{7}$$

where the parameter $\theta_l$ is the correlation length. The Bayesian optimisation estimates $\theta_l$ at 7.6534 mm, while σ = 4.0297 mm. Cross-validation resulted in an average RMSE of 0.226 mm with an R² of 82%. As could have been expected, the Bayesian optimisation selected a universal kriging model, for it has greater flexibility.

Support vector machine regression
Bayesian optimisation selected a Gaussian kernel and a tolerance ϵ of 0.028. The Gaussian kernel achieves a space transformation of two regressors according to:

$$K(x_i, x_j) = \exp\left(-\gamma \left\lVert x_i - x_j \right\rVert^2\right)$$

where γ is related to the scaling and is selected by the Bayesian optimisation as 0.003. This optimisation, when cross-validated, results in an RMSE of 0.327 mm and an R² of 73%.

Regression Trees
The Bayesian optimisation was performed to select amongst the parameters reported in Table 2. The CART was selected as an ensemble of 210 weak learners built by L2Boosting. The Bayesian optimisation constrained the maximum depth to be smaller than 10, leaving the width free. The CART resulted in an average RMSE of 0.269 mm with an R² of 93% from cross-validation.

Genetic Programming
The Bayesian optimisation was performed to select the best hyperparameters. Because, to the best knowledge of the authors, this has not been applied before to achieve an automatic selection of GP model parameters, more insights are offered in this case. Figure 2 shows the box plots of the RMSE resulting from the 60 independent random generations of the initial population for each of the 30 iteration steps of the Bayesian optimisation procedure. According to Sections 2.2.5 and 2.4, the model is selected as the one associated with the minimum RMSE, that is, the best set is the one identified in the 10th iteration of the Bayesian optimisation. This choice is validated by testing whether there is a significant difference between the minimum RMSE and the median of the sample having the minimum median RMSE, that is, the one obtained in the 23rd iteration of the Bayesian optimisation. Systematic differences could be highlighted with a risk of error (p-value) largely smaller than 0.1%. Thus, although the set of hyperparameters associated with the 10th iteration is more sensitive to the initial random generation of the CARTs, its performance yields the actual minimum RMSE. The selected hyperparameters to generate the optimal GP model exploit a keep-the-best elitism operator that acts on a new generation created according to a mix of the genetic operators of crossover, mutation and replication of 27%, 3% and 70%, which explains the greater sensitivity to the initial condition. The model resulted in a CART with a width of 16 leaves and a depth of 6 nodes, achieving an RMSE of 0.4 mm and an R² of 63% after cross-validation.

Performance comparison
Figures 3 and 4A show, for the considered machine learning approaches, the predicted values and the residuals as a function of the experimental values, respectively. Only in the case of GP can a significant trend be appreciated in the residuals, suggesting the poor performance of the model, already indicated by the RMSE and the R². This lack of fit makes testing the normality of the GP residuals of little meaning. The poor GP performance can be explained considering that GP provides significant advantages when a large sample space has to be investigated, in a high-dimensionality problem domain. 55,56,64,65 As far as the other models are concerned, no systematic deviation from normality can be appreciated for the GLM and GPR residuals, neither graphically, by means of the normal probability plot of the residuals in Figure 4B, nor by the Anderson-Darling test at a risk of error of 5%. In particular, the GPR residuals present a slightly hyponormal NPP, which is not significant when tested by the quantitative normality test. Conversely, the SVM and CART residual distributions are significantly different from a normal distribution, with relevant skewness and high kurtosis. The computational load of the Bayesian optimisation engages a high-end laptop (16 GB RAM, CPU Intel Core i7-8750H @ 2.2 GHz, GPU NVIDIA GeForce GTX 1060) for 2 to 5 min, increasing to 15 min in the sole case of GP, due to its inherent training structure, which increases the complexity of the operations.
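The residual-normality check can be sketched with SciPy's Anderson-Darling test. The residuals below are a synthetic stand-in (exact normal quantiles with an assumed spread), not the paper's data:

```python
import numpy as np
from scipy.stats import anderson, norm

# Hypothetical stand-in for model residuals: exact normal quantiles,
# scaled to a plausible spread (the paper's residuals are not reproduced).
residuals = norm.ppf(np.linspace(0.005, 0.995, 88)) * 0.3

# Anderson-Darling test: normality is rejected at a given significance
# level when the statistic exceeds the tabulated critical value.
result = anderson(residuals, dist="norm")
idx = list(result.significance_level).index(5.0)
rejected_at_5pct = result.statistic > result.critical_values[idx]
print("normality rejected at 5% risk of error:", rejected_at_5pct)
```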
Figure 5 shows the box plots of the RMSE from the cross-validation of the considered models. These data are exploited to perform the Wilcoxon nonparametric test, as presented in Section 2.4. Pairwise comparison of the cross-validation RMSE, performed at a risk of error of 5% by the Wilcoxon rank-sum test, shows that the GPR performs better than the other trained models. In particular, Table 3 summarises the alternative hypothesis that cannot be rejected when the null hypothesis that the two sample medians are equal (H0: x̃1 = x̃2) is rejected with a risk of error of 5%.
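The pairwise median comparison can be sketched with SciPy's rank-sum test; the two RMSE samples below are illustrative stand-ins for the cross-validation results, with the GPR-like sample deliberately centred on a lower median:

```python
import numpy as np
from scipy.stats import ranksums

# Hypothetical cross-validation RMSE samples for two models
# (illustrative values only, not the paper's results).
rng = np.random.default_rng(2)
rmse_gpr = rng.normal(0.20, 0.02, size=30)
rmse_svm = rng.normal(0.33, 0.03, size=30)

# One-sided Wilcoxon rank-sum test of H0: equal medians against
# H1: median(RMSE_GPR) < median(RMSE_SVM), at a 5% risk of error.
stat, p_value = ranksums(rmse_gpr, rmse_svm, alternative="less")
print(f"p = {p_value:.2e}; GPR significantly better: {p_value < 0.05}")
```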
TA B L E 3
Results of the Wilcoxon rank-sum test. An inequality indicates the unrejected alternative hypothesis with a p-value < 0.05; empty cells mean H0 could not be rejected.

F I G U R E 6
Surface plots of the GPR model with respect to (A) the focus offset-power density plane, (B) the speed-power density plane, (C) the speed-focus offset plane. Surfaces are drawn holding the third variable constant at the process optimum.

Process optimisation
According to the former section, the best trained and cross-validated model is the Gaussian Process Regression. Figure 6 shows three representative surface plots. The GPR model is exploited to achieve process optimisation, that is, to maximise the weld depth. When addressing process optimisation, productivity and sustainability are essential. Amongst the considered process parameters, speed is associated with productivity; thus, it is better to constrain it at a high value. In particular, considering the main effect plot in Figure 1, it was held at 2.4 rad/s. Another critical aspect is process sustainability, which can be associated with energy consumption and, hence, with laser power density. However, within the considered process window, no dramatic changes can be induced 66 ; therefore, it is left unbound during the optimisation. The optimisation is performed again through Bayesian optimisation to avoid other computationally heavy methods. In this case, the cost function is −S_n = −f(PD, FO, v = 2.4 rad/s), where the negative sign is required because the optimisation algorithm seeks to minimise the cost function, whilst the process optimisation targets S_n maximisation. Figure 7 shows the response surface of the cost function, which allowed the identification of the optimal process conditions reported in Table 4. Consistently with physics-based models, 36,37 deeper penetration is obtained by higher power, lower speed and best focus; however, the productivity-bounded optimisation derogates from these absolute conditions to increase the speed as much as possible, while respecting the tolerance specifications. It is worth noting that such considerations are made possible by the selection of explainable ML, as in the case of GPR and GLM. They are in general much more difficult to perform when other ML modelling approaches are considered, for example, CART, GP and SVM. This current limitation of some black-box ML modelling is being tackled by physics-based AI.
67,68 For the sake of comparability, the second-best trained model, that is, the GLM, is also exploited for process optimisation. This comparison is also of interest because the GLM is a conventional regression method. 41 Results are compared by exploiting the prediction intervals in Figure 8. 42,45 The comparison shows that the average predictions are compatible. Moreover, it shows that, although the prediction of the GPR model is locally more uncertain, for it includes a covariance estimation, the overall prediction interval is still narrower than that of the GLM approach. The covariance-estimation weight in the GPR prediction results in better accuracy and overall precision, which is consistent with the GPR properties introduced in Section 2.2.2. Thus, GPR is more robust and general, see Figures 3 and 4, where the prediction and residuals of GPR show only one outlier, related to a poor weld. Conversely, GLM shows two outliers: one at very large penetration depth, that is, in correspondence with a correct weld, and one related to a poor weld. Poor welds are those that resulted in a defective joint, mostly for S_n < 4.5 mm. GPR shows smaller errors (i.e., better accuracy) for these points, which is consistent with the predictive behaviour of GPR, at the cost of locally greater prediction intervals, 45 which do not impact the overall behaviour, as shown in Figure 8, resulting in a relative prediction uncertainty of 8% for GPR (against 10% for GLM). Consequently, the covariance-weighted error in the prediction allows a 20% reduction of the expanded uncertainty, which can be essential in defect prediction and quality planning. 69

F I G U R E 8
Prediction intervals at 95% confidence level of the optimised weld depth for the best (GPR) and second-best (GLM) models.
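As a sketch of the bounded optimisation step, the snippet below trains a GPR surrogate on synthetic data and minimises the negative predicted depth at a fixed speed of 2.4 rad/s. Note that the paper performs this step via Bayesian optimisation, whereas this illustration swaps in SciPy's bounded L-BFGS-B minimiser; the data, bounds and response function are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Synthetic training data: weld depth S_n as a function of power density
# (PD), focus offset (FO) and speed v; deepest at high PD and FO = 0.
rng = np.random.default_rng(4)
X = rng.uniform([0.5, -1.0, 1.0], [1.5, 1.0, 3.0], size=(88, 3))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] ** 2 - 0.4 * X[:, 2]

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                               alpha=1e-2, normalize_y=True).fit(X, y)

# Cost function: -S_n = -f(PD, FO, v = 2.4 rad/s). The sign flip turns the
# depth maximisation into the minimisation expected by the optimiser.
def cost(theta):
    pd_, fo = theta
    return -gpr.predict(np.array([[pd_, fo, 2.4]]))[0]

res = minimize(cost, x0=[1.0, 0.0], method="L-BFGS-B",
               bounds=[(0.5, 1.5), (-1.0, 1.0)])
pd_opt, fo_opt = res.x
print(f"optimum: PD = {pd_opt:.2f}, FO = {fo_opt:.2f}, "
      f"predicted depth = {-res.fun:.2f} mm")
```

Bounding the search box is how productivity and tolerance constraints enter this formulation: the fixed speed plays the role of the productivity constraint described in the text.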

CONCLUSIONS
Industry 4.0 and sustainable manufacturing require the adoption of non-destructive predictive models for process quality. The main quality indicator of laser welding is the weld depth, which may be correlated to process parameters by means of supervised Machine Learning algorithms. This work has proposed a statistical method to compare the performances of different modelling approaches. The method provides process and quality designers with a straightforward tool to select the most suitable model for the scenario at hand. The approach first optimises model hyperparameters via Bayesian optimisation and then compares performances by means of nonparametric median-based hypothesis tests. The necessity for a Bayesian approach to optimise Genetic Programming, while catering for its generation method, is demonstrated. Gaussian Process Regression is shown to be more robust than the other models, for it can predict defects even when very few defect-related data are present in the training dataset. Additionally, a physics-informed explanation of the obtained results is performed on the basis of some of the selected ML models, for example, Gaussian Process Regression and Generalized Linear Model. Other modelling approaches that do not fall in the category of explainable AI, for example, SVM and GP, currently hinder such explanation. The selected model is then exploited to optimise the quality of the laser welding of a deep drawing steel for automotive applications. Process optimisation is achieved by a bounded Bayesian optimisation of the selected best model, which in the specific case study is Gaussian Process Regression, to achieve the tolerance specification, while catering for productivity through a computationally effective approach. The drawn model allows quality prediction, thus enabling a significant reduction, possibly to zero, of destructive quality inspections for the considered variable. In particular, the probability of generating defects can be estimated, both in optimised and variable process states, through the verification of compliance of the prediction interval with the tolerance specification.

F I G U R E 1
Main effect plot of penetration depth with respect to speed, power density and focus offset. Each data point represents the average of the 88 collected data grouped per factor level.

F I G U R E 2
Genetic Programming Bayesian Optimisation result.

F I G U R E 3
Predicted versus experimental value of weld depth S_n.

F I G U R E 4
(A) Residuals versus experimental value of weld depth S_n; notice the trend in the GP prediction. (B) Normal probability plot of the residuals; GP is excluded from the analysis due to the systematic trend in its residuals. No deviation from normality can be appreciated for the GLM residuals. Slight hyponormality is suggested for the GPR residuals, but not identified by the quantitative test.

F I G U R E 5
Box plot of the RMSE of the different trained models from the k-fold cross-validation.

F I G U R E 7
Cost function surface for weld depth S_n maximisation exploiting the Gaussian Process Regression model.

TA B L E 4
Results of process optimisation for the two best prediction models.
The authors would like to thank S. Bonù from AGLA Power Transmission, L. Bonù from FORZA SMART INDUSTRY and Dr. R. Cagliero from LBN Ricerca for having provided the case study and the laboratory facilities, and Miss G. Di Paola and Mr. L. Bonamassa for the support in the laboratory activity and data preparation. No funding was received to support this research.

Giacomo Maculotti https://orcid.org/0000-0002-2467-0426
Maurizio Galetto https://orcid.org/0000-0003-0424-3416

R E F E R E N C E S

TA B L E 1
Considered parameter values in the implemented experimental design. Values of welding speed are in angular units as the welded component is axially symmetric. Power values are normalised to the laser spot area to avoid the disclosure of sensitive information; the power density is thus reported.
TA B L E 2
Considered hyperparameters to be optimised by the Bayesian algorithm.