Functional clustering methods for resistance spot welding process data in the automotive industry

Quality assessment of resistance spot welding (RSW) joints of metal sheets in the automotive industry is typically based on costly and lengthy off-line tests that are unfeasible on the full production, especially on large scale. However, the massive industrial digitalization triggered by the industry 4.0 framework makes available, for every produced joint, on-line RSW process parameters, such as, in particular, the so-called dynamic resistance curve (DRC), which is recognized as the full technological signature of the spot welds. Motivated by this context, the present paper means to show the potentiality and the practical applicability to clustering methods of the functional data approach that avoids the need for arbitrary and often controversial feature extraction to find out homogeneous groups of DRCs, which likely pertain to spot welds sharing common mechanical and metallurgical properties. We intend is to provide an essential hands-on overview of the most promising functional clustering methods, and to apply the latter to the DRCs collected from the RSW process at hand, even if they could go far beyond the specific application hereby investigated. The methods analyzed are demonstrated to possibly support practitioners along the identification of the mapping relationship between process parameters and the final quality of RSW joints as well as, more specifically, along the priority assignment for off-line testing of welded spots and the welding tool wear analysis. The analysis code, that has been developed through the software environment R, and the DRC data set are made openly available online at https://github.com/unina-sfere/funclustRSW/


Introduction
Resistance Spot Welding (RSW) is the most common technique employed in joining metal sheets during body-in-white assembly of automobiles, 1, 2 mainly because of its adaptability for mass production. 3 Typical car body contains about 5000 spot welds joining metal sheets of different materials and thicknesses. 4 The quality of many critical spots 5 is routinely controlled in order to guarantee the structural integrity and solidity of welded assemblies per vehicle. 3 Quality assessment is typically based on tests performed at the end of the RSW process (off-line) on finished sub-assemblies through direct or indirect evaluation of weld-joint key characteristics. 6 Off-line testing is, however, costly and lengthy and thus unfeasible on the full production, especially on large scale.
In the modern automotive industry 4.0 framework, automatic acquisition systems allow to routinely control welders during running operations (online) through the continuous record of a large volume of process parameters. In particular, the so-called dynamic resistance curve (DRC) is the most important process parameter acquired on-line 7 and is popularly recognized as the full technological signature of the metallurgical development of a spot weld. 8 In this scenario, a paramount issue constantly faced by practitioners is the identification of homogeneous groups (clusters) of spot welds based on DRC observations, in terms of mechanical and metallurgical properties. The identification of clusters with a convenient interpretation is useful for exploring the mapping relationship between process parameters and the final quality of the RSW joints produced, and, in general, for supporting the experiencebased learning of any technological process. In this regard, the most common practice in industry is to analyze one or few scalar features extracted from the acquired DRC, even though feature extraction is known to be often difficult, arbitrary and risky of collapsing useful information.
On the contrary, in this paper, each DRC observation is suitably modelled as a function defined on the time domain, i.e., as functional datum. Functional data analysis (FDA) [9][10][11][12] is the set of methods that consider functional data as its founding elements. Clustering functional data is usually a difficult task, because of the intrinsic infinite dimensionality of the problem. A thorough overview of functional clustering methods can be found in Ramsay and Silverman 9 and Ferraty and Vieu. 11 Then, it is worth mentioning Cuesta-Albertos and Fraiman 13 who proposed a pure functional version of the k-means algorithm, which is very popular in the multivariate setting, 14 as an alternative to the method of Abraham et al., 15 who instead applied the k-means algorithm to the coefficients obtained by projecting the original profiles onto a lower-dimensional subspace spanned by B-spline basis functions. Another version of k-means algorithm is that of Chiou and Li, 16 which relies on a particular distance between truncations at a given order of the functional principal components expansion. 9,17 This version can be broadly regarded as an instance of the method proposed by Bouveyron and Jacques, 18 who modelled the functional principal components through Gaussian mixture. Some parsimony constraints on the variance parameters are also considered to define a family of parsimonious sub-models. A similar methods, which is based on a functional principal components expansion of the functional observations, was proposed by Jacques and Preda. 19 The work of James and Sugar 20 is recognized as the first example of model-based procedure for functional data clustering, as well as the method proposed by Giocofci et al., 21 which, in particular, relies on the wavelet decomposition of the functional observations, and is particularly appropriate for peak-like data, as opposed to methods based on splines. More recently, Delaigle et al. 22 proposed a functional k-means algorithm able to cluster observations asymptotically perfectly. A sparse functional clustering procedure, that is clustering functional data while jointly selecting the most relevant features, was developed by Floriello and Vitelli 23 and, in particular, by Vitelli 24 who accounted for possible curve misalignments. For the sake of completeness, Bayesian approaches have appeared as well [25][26][27] in the literature, even if they are beyond the scope of this paper.
After providing Section 2 with the technological background and the description of the functional DRC data set collected from the RSW process that has motivated this research, we give in Section 3 a deeper hands-on illustration of the most promising functional clustering methods to be applied to the DRC data set at hand. In Section 4, we discuss and interpret from technological viewpoint the main results obtained, even if the proposed approach could go far beyond the specific application hereby investigated. We conclude by Section 5 with a discussion of issues highlighted by this data set and a broader perspective of the potentiality of the proposed methods. Technical details for each of the clustering methods implemented in this paper are presented in the Appendix.
The DRC data set and the R 28 code are made openly available online 29 to allow the reader to possibly investigate other approaches with this data set and to encourage the fruitful spread of functional data clustering methods among practitioners in industry.

Technological Background and Data Structure
The considered RSW process 30 refers to an autogenous welding process in which two overlapping steel galvanized sheets are joint together, without the use of any filler material, at discrete spots. Joints are performed by applying pressure to the weld area from two opposite sides by means of two copper electrodes. Voltage applied to the electrodes generates a current flowing between them through the material. The electrical current flow because the resistance offered by metals causes a large heat generation (Joule effect) that increases the metal temperature at the faying surfaces of the work pieces up to the melting point. Finally, due to the mechanical pressure of the electrodes, the molten metal of the metal sheets jointed cools and solidifies forming the so-called weld nugget. 6,31 The typical shape of a DRC acquired during this process is displayed in Figure 1 for illustrative purposes. In the light of Dickinson et al., 8 it mainly depends on physical changes induced in the material by the ongoing welding process and can be roughly outlined into five stages, as well depicted by Adams et al. 32 For the sake of conciseness, these stages can be summarized as influenced by two main concurrent effects due to (a) the metal electrical resistivity and (b) the contact area among the metal sheets to joint. These effects develop during the RSW process by means of the heat produced by the current flow and the clamping pressure generated by copper electrodes. In particular, DRC values are proportional to (a), which increases with material temperature.
On the contrary, DRC values are decreasing with b. That, in turn, is increasing with two main factors: (b.1 ) the deformation, due to the clamping force, of the surface asperities, that are softened by the high temperatures; and (b.2 ) the melting of the metal, that guarantees the sheet continuity by occupying the interstices between the work pieces to weld. So stated, the typical DRC behaviour ( Figure 1) can be interpreted by the turnover of the effects due to (a) and (b). Specifically, DRC decreases at first because of the effect due to (b.1 ), which dominates effect due to (a) until the local minimum; then, conversely, DRC increases because the effect due to (a) dominates effect due to (b) until the local maximum, which represents the beginning of the nugget formation. Finally, the DRC decreases slowly to the end of the RSW process, because the effect due to factor (b.2 ) dominates that due to (a) to a lesser extent. In a nutshell, DRC behaviour can be roughly outlined by one local minimum point, one local maximum point, and the resistance value at the end of the welding process.
The data set for the problem at hand consists of 538 DRCs that are plotted in Figure 2 and pertains to spot welds of the same type collected during RSW lab tests at Centro Ricerche Fiat (CRF). The latter have been carried out on coupons of two sheets having thickness equal to 0.7 mm and 1.3 mm and made of FE220BH and FE600DP galvanised steels, respectively. The energy was supplied in a single pulse of current. The weld time period is 237 ms. Strictly speaking, the values of electrical resistance used to obtain each DRC observation are not direct measurements, but obtained, according to the first Ohm's law, 33 as the ratio between the voltage at electrode tips and the current intensity measurements. For each DRC observation, these have been collected at a regular grid of 238 points equally spaced by 1 ms. In particular, the electrode tip voltage has been measured using dressed copper wires attached to the electrodes. Whereas, the current intensity has been measured by means of an air-core toroid in the primary of the welder transformer. Copper wires are checked up to ensure their integrity at the beginning of every welding cycle. Electrical resistance of the metal sheets is assumed much larger than that of the copper electrodes That is, the copper electrode resistance does not practically affect the measurement of metal sheet resistance.
Note that the raw data plots for the 538 DRCs at hand yield shape and features coherent with those discussed with reference to Figure 1, but show non-negligible variability that motivates the goal of the present paper in supporting practitioners to build homogeneous groups of DRCs. The intent is to identify through the latter spot welds having similar mechanical and metallurgical properties, and groups themselves that stand apart from one another. In particular, clustering methods, and even more their functional version, will be of great value in this regard with the ultimate goal of guiding practitioners along the priority assignment for off-line testing of welded spots and the welding tool wear analysis.

Functional Data Clustering Approaches for Dynamic Resistance Curves
Usually, functional data consist of independent realizations X 1 , . . . , X n of a functional random variable X with values in an infinite dimensional space, which is typically taken to be L 2 (T ), the separable Hilbert space of square integrable functions defined on the compact domain T . In most applications, T ⊂ R and represents time, however, multidimensional domains could be considered as well. Typically, X 1 , . . . , X n are not entirely available but are observed through a finite set of observation points. This means, only dis- being m i the number of discrete points available for the i-th observation. The aim of the clustering analysis is to define M partitions, i.e., clusters, of the data X 1 , . . . , X n such that observations in different clusters are as dissimilar as possible and that observations within the same cluster are as similar as possible. In the rest of this section, we describe the most promising approaches for functional clustering, which can be grouped in raw-data clustering, filtering methods, adaptive methods, and distance-based methods.

Raw-data clustering
The raw-data clustering approach consists in the clustering of discretized version {X ij } of the functional observations {X i } by means of classical multivariate methods. This simple approach does not need the reconstruction of the functional data and relies on well-established multivariate algorithms. One of the most popular clustering algorithm is k-means. In the functional setting, k-means aims to partition the observations into M clusters C * 1 , . . . , C * M such that the within-cluster sum of squares is minimized, that is where C 1 , . . . , C M are all the possible observation partitions in M groups, X i = (X i1 , . . . , X im i ) T , and µ m is the mean vector of the observations in C m . Hierarchical clustering 14 produces a representation in which clusters at each level of the hierarchy is formed by all and only the clusters of the lower levels. Strategies for hierarchical clustering are mainly divided into two approaches: agglomerative (bottom-up) and divisive (top-down). The former starts at the bottom (i.e., each observation in one cluster) and at each level recursively merges a selected pair of clusters into a single cluster. The latter starts at the top (i.e., all observations in one cluster) and at each level recursively splits one of the existing clusters at that level into two new clusters. Different versions of agglomerative methods arise from the choice of the intergroup dissimilarity metric, e.g., single linkage, complete linkage, average linkage. Ward (1963) 34 considered hierarchical clustering procedures based on minimizing the loss of information from joining two groups. Finally, model-based clustering assumes that the data in each cluster is generated from a given probabilistic distribution and the combined data stems from a convex combination of these distributions. In all the aforementioned methods, the number of clusters M has to be determined. For k-means and hierarchical clustering, this can be done based on many indices, 35 e.g., the silhouette width, 36 the gap statistic, 37 the Dunn index; 38 whereas, for model-based clustering, information criteria, such as the Akaike information criterion (AIC), Bayesian information criterion (BIC) as well as integrated completed likelihood (ICL) could be used.
Analysis of raw data through classical multivariate techniques has several problems, because of the high number of evaluation points and the strong correlation. This may especially affect the model-based approach, that assumes non-singular covariance matrix. Moreover, raw-data clustering approach has the drawback of not taking into account the functional nature of the data and is not suited for curves observed at different evaluation points. While this suggests to use approaches specifically designed for functional data, for comparison purposes we still propose the raw data approach to the clustering. Details on multivariate clustering methods can be found in Everitt

Filtering methods
Filtering methods rely on the reconstruction of the functional observations {X i } from the discrete points {X ij }. The most common approach 9 is to assume that the functional observations are embedded in a finite dimensional functional space spanned by a finite set of basis functions. In particular, each X i can be written as The basis functions {φ j } can be either pre-specified, e.g. B-spline, 41 Fourier, 9 and wavelet, 42 or data-adaptive, e.g. obtained using functional principal component analysis (FPCA). 17 In case of pre-specified basis functions, if the {X ij } are observed with measurement error, then the coefficient vector c i is usually estimated asĉ i via penalized least-squares, even though standard least-squares could be used as well, 9 that iŝ where D 2 is the second order differential operator and λ > 0 is a smoothing parameter. It measures the trade-off between fit to the data, as determined by the residual sum of squares in the first term, and smoothness of X i , as quantified by the second term. Then, the reconstructed functional observation isX The choice of the smoothing parameter λ is based on the well-known trade-off between variance and bias. In particular, it is usually performed by picking the λ corresponding to the minimum value assumed by the generalized crossvalidation criterion. This criterion takes into account the degrees of freedom of the estimated curve that vary according to λ. 9 Moreover, the choice of K in Equation (2) is not crucial, 43 until it is sufficiently large to capture local behaviours of functional data.
The FPCA provides a data-adaptive basis to obtain the functional data as in Equation (2). In particular, the functional observations are reconstructed, for i = 1, . . . , n, aŝ where ξ i = (ξ i1 , . . . , ξ iL ) T is the vector of principal component scores or simply scores defined as ξ il = T ψ l (t) X i (t) dt, and ψ = (ψ 1 , . . . , ψ L ) T is the vector whose elements are weight functions referred to as principal components. Principal components are defined by an iterative algorithm which at each step finds the weight function that maximizes the mean square of the scores, or their sample variance, that is under the constraints: The choice of the number L in Equation (5) of retained components depends on several necessities. Generally, the retained principal components are chosen such that they explain at least a given percentage of the total variability. However, more sophisticated methods could be used as well. 44 In practice, reconstruction of functional observations allows one to reduce the dimensionality of the data by summarizing each curve through a finite set of parameters, that is {ĉ i } or {ξ i } depending on whether basis functions used are pre-specified or data-adaptive. Then, the finite set of parameters are clusterized by means of standard multivariate clustering techniques, such as k-means, hierarchical clustering or model-based clustering. As for the raw-data clustering methods, several indices could be used 35 to choose the number M of clusters.

Adaptive methods
The present set of methods relies on a finite dimensional representation of the functional data through basis functions (similarly to the filtering approaches) where the basis expansion coefficients are treated as random variables with cluster-specific probability distributions. This differs from the filtering methods, where the basis expansion coefficients are considered as parameters. One of the first example of adaptive method was in James and Sugar, 20 referred to as fclust. Similarly to the filtering approaches, if the functional observation X i belongs to the m-th cluster among the M clusters, it is modeled through basis functions as where φ = (φ 1 , . . . , φ K ) T are natural cubic splines, and η ik is a vector of spline normal random coefficients defined as with µ m the coefficient vector of the m-th cluster mean, and γ i ∼ N (0, Γ) the subject-specific random effects for the i-th curve. Then, the vector of discretized values X i = (X i1 , . . . , X im i ) T is modelled as  (10), where the cluster membership vector is modeled as a multinomial random variable with parameters (π 1 , . . . , π M ), with π m the probability of an observation to belong to the m-th cluster. Thus, the mixture likelihood is defined as where f m (X i ) is the conditional density function of X i belonging to the m-th cluster, that is The maximization is often carried out by means of the expected maximization (EM) algorithm. Once the unknown parameters have been estimated, each curve X i is assigned to the cluster whose estimated posterior probability of cluster membership π m|i =f m (X i )π m / M j=1f j (X i )π j is maximum. Moreover, the cluster mean coefficients µ m could be further optimally parameterized to produce useful low-dimensional representations of the curves. 20 Information criteria, such as AIC and BIC, are used to select the number M of clusters and the basis dimension K. 20 The use of spline basis has two main drawbacks: (i) they are inappropriate when dealing with functions that show peaks and irregularities, (ii) they require heavy computational efforts and so are not suitable to represent high dimensional data. For these reasons, Giocofci et al. 21 proposed an adaptive method based on the wavelet decomposition of the curves, referred to as waveclust. Similarly to James and Sugar, 20 the functional observation X i belonging to the m-th cluster is modeled as where µ m is the principal functional fixed effect that characterizes the mth cluster mean and U i is a subject-specific random deviation from µ m . By applying discrete wavelet transform to model in Equation (11), contaminated with an additional measurement error function E i (t), t ∈ T , the model reduces to a linear mixed-effect one. That is, are the vectors of scaling and wavelet coefficients of µ m , U i , E i and X i + E i , respectively; α m and β m are non-random parameters, whereas are normal random vectors with zero mean and covariance matrices G and σ ε I m i , respectively. Once projected in the wavelet domain, the clustering model (12) resumes to a standard one with additional random effects whose variance is of particular form. Thus, parameters are estimated by maximizing the likelihood function typically using the EM algorithm. Final assignment of each curve to a cluster is performed by maximizing the posterior probability of clustering membership. The number of clusters are chosen through BIC or ICL. 21 The last presented adaptive method was proposed by Bouveyron and Jacques, 18 and referred to as funHDDC. They consider, as James and Sugar, 20 that if X i belongs to a given cluster m, it admits the following basis expansion where Ψ = (Ψ 1 , . . . , Ψ K ) T is a given vector of basis functions, and γ im is a k-dimensional random vector. All the functions X i in a given cluster m are assumed to be adequately described in a low-dimensional functional latent subspace with dimension d m < K spanned by a group-specific basis function {ϕ mj }. Then, for a given X i in the cluster m, the random latent expansion coefficients λ i = (λ i1 , . . . , λ idm ) T in the group-specific basis function {ϕ mj } are linked to γ im as where U m is the K × d m matrix composed by the first d m columns of the orthogonal K × K matrix Q m , whose entries are the coefficients that the linearly link {Ψ k } and {ϕ mj }, and ε i ∈ R K is an independent random noise term. By assuming that λ i ∼ N (µ m , S m ), with S m = diag (a m1 , . . . , a mdm ), and that ε i ∼ N (0, Ξ m ), then (a m1 , . . . , a mdm , b m , . . . , b m ). Let us assume the cluster membership vector is modeled as a multinomial random variable with parameters (π 1 , . . . , π M ), with π m the probability of an observation to belong to the m-th cluster. Then, the mixture likelihood is defined as where f m (γ i ) is the conditional density function of X i to belong to the mth cluster, that is γ i |m ∼ N U m µ m , Q m ∆ m Q T m . The maximization is conveniently carried out by means of the EM algorithm. Moreover, it is possible to obtain parsimonious submodels of Equation (15) by constraining model parameters within or between groups. The latent subspace dimension d m and the number of clusters M are chosen through a scree-test and BIC, respectively. 18

Distance-based methods
These methods are the functional extension of classical geometric clustering algorithm to functional data, such as k-means 13 and hierarchical 11 clustering, that basically rely on the definition of proximity or dissimilarity among observations. Therefore, the extension to functional data consists in the introduction of an appropriate functional measure of proximity or dissimilarity. In this respect, several authors 11,13,45 agree upon the use of the following measure of proximity between the curves X i and X j where X (l) denotes the l-th derivative of X. In this case the number of clusters could be suitably chosen through the silhouette index. 36

Results and Discussion
In this section, we discuss on the results obtained by implementing the functional clustering methods presented in Section 3 to the DRC data set illustrated in Section 2. For the sake of readability, implementation details are deferred to the Appendix. The optimal number of clusters selected by each approach mentioned in Section 3 is reported in Table 1. Note that most of the methods provide similar results and identify two or three clusters. The only exceptions are some model-based methods, viz. adaptive fclust, filtering B-spline, filtering FPCA, and raw data, which select from four to eight clusters. In general, the larger the number of clusters, the harder the technological interpretation, i.e., the less straightforward the discrimination of spot welds belonging to different groups. Inflation in the number of clusters is usually due to overfitting problems especially for model-based approaches applied to high-dimensional correlated data and complex variance structures. In this case, the number of parameters to be estimated can be very large and may lead to instability, no matter if the BIC criterion, that penalizes the model complexity, is used to select the optimal number of clusters. This issue may be exacerbated by the use of model-based methods on raw data (see third row of Table 1), which do not rely on an optimal basis representation and typically contain additional noise. In Table 1 it is also reported the computation time required for each approach using a machine with an Intel Xeon 2.10 GHz processor. The adaptive approaches result as the most computationally intensive, while all the others require less than 3 minutes to complete the analysis. Even if strictly dependent on the data set at hand, this information may be crucial when dealing with complex data structures in order to pick the most appropriate method to be used when computational resources are limited. Figure 3 shows the DRCs coloured according to the cluster assignment provided by each method. Whereas, Figure 4 depicts the centroids (i.e., mean functions) for each cluster. Note that, in both figures, first, second and third rows of panels refer to clustering methods that select two, three and more than three clusters, respectively.
With reference to those figures and coherently with the features highlighted in Section 2, we note that DRC centroids have local minimum points with approximately the same abscissa, but different resistance values; local maximum points with approximately the same value, but different abscissa; different resistance values at the end of the functional domain. It will be in fact convenient to facilitate the following technological interpretation and insights into the industrial problem at hand to focus attention on (I ) the amplitude difference between minimum and maximum resistance values, (II ) the phase difference between minimum and maximum abscissas, and (III ) the final resistance value.
The first row of panels of Figure 4 displays centroids associated to cluster 1 having amplitude, phase difference and final resistance value smaller than those respectively associated to cluster 2. The separation is clear as cluster 1 centroids show a local minimum value, in the first part of the functional domain, that is distinctly larger than that corresponding to cluster 2 centroids, and decrease more rapidly to lower values in the last part of the domain.
Whereas, in the second row of panels, the amplitude, phase difference and final value of centroids increase together from cluster 1 to cluster 3, except for panels filtering FPCA k-means and raw k-means that have centroids of clusters 2 and 3 with approximately the same final value. That is, cluster 1 centroids show the larger local minimum value and the more rapid decrease Figure 3. Plot of the functional data. Each panel correspond to one of the proposed clustering methods, curves are coloured accordingly to the corresponding cluster assignment. Plots are arranged such that first, second and third rows of panels refer to clustering methods that divide DRCs into two, three, and more than three clusters, respectively.  Figure 4. Plot of the cluster centroids. Each panel corresponds to one of the proposed clustering methods, curves are centroids of each cluster obtained with the corresponding method. Plots are arranged such that first, second and third rows of panels refer to clustering methods that divide DRCs into two, three, and more than three clusters, respectively. at the end of the functional domain, whereas the other centroids tend to be more similar. Finally, with reference to each panel of the third row of Figure 4, centroids are shown to have cluster number sorted in ascending order by amplitude and phase difference, only. That is, the final resistance values do not preserve the order set by the phase difference, as in the first two rows of panels. In fact, with reference to the third row of panels of Figure 4, final resistance values of centroids obtained by adaptive fclust and filtering B-spline model-based, respectively depicted in the first and second panel, are sorted in ascending order with the cluster number as 1,2,4,3; whereas, those of the third panel, referring to filtering FPCA model-based method, are ordered as 1,2,3,5,4; and those of the fourth panel (raw model-based method) as 1, 3,2,5,7,8,4,6. As it typically happens, 14 also for this data set no one clustering method can be judged to be best in all circumstances. However, dealing with a small number of clusters, say two or three in this case, shall provide with a clearer interpretation of groups of functions that are well distinct and more likely to lead to informative classifications. Therefore, in what follows we assume selecting three clusters as the better compromise to trade off straightforward interpretation of DRCs belonging to the same clusters and distinct characterization of each cluster.
Consistently with the technological literature, 8,32 being the minimum point abscissa practically constant and the maximum point the landmark for the start of nugget formation (see also Section 2), we can state that the smaller the phase difference, and thus the larger the time interval between the local maximum and the end of the welding process, the more the heat energy supplied for nugget growth. Note that the inflection point typically located between the local minimum and maximum of the DRC (see, e.g. Figure 1) ideally represents the welding melting point. Centroids having larger phase difference shall thus characterize welding spot clusters with smaller nugget size. Unfortunately, in the real case at hand, this ideal statement may not hold because of the natural wear process of the electrodes, which induces, as conjectured by RSW process experts, a non-negligible increase of the weld area, and thus different welding conditions for each spot weld.
In order to explain the cluster in terms of electrode wear, in Figure 5 we report seven DRCs for which it has been possible to retrieve qualitative information on the wear status of the electrodes. For this purpose, without loss of generality, we refer to the filtering B-spline hierarchical method among those selecting three clusters, and compare the corresponding panel in the second row of Figure 4 with Figure 5. With reference to the latter figure, we may want to analyze the two extreme wear cases (thin solid and dotted lines) and conjecture that centroid of cluster 1 of Figure 4 shall correspond to the smaller electrode area, i.e., electrode just renewed, whereas centroid of cluster 3 to the larger one, i.e., electrode with severe wear. Even though the technological cause is different, experts' opinion is that DRCs associated to cluster 1 and those associated to cluster 3 correspond to spot welds with improper nugget diameter. In particular, for spot welds that belong to DRCs in cluster 1, the root cause is attributed to the excessive clamping pressure in the welding zone. The large clamping pressure is also confirmed by the small amplitude difference in the DRC centroid of cluster 1. 8 Conversely, spot welds pertaining to cluster 3 correspond to larger electrode area, so that the clamping force generates the lower pressure in the welding zone and cannot guarantee the proper value of current density. Indeed, despite the larger amplitude difference in the DRC centroid, which proves the smaller pressure, the nugget diameter may result undersized because of the delay in the nugget formation. Therefore, it turns out that the better spot welds in terms of nugget formation achieved by DRCs pertaining to intermediate cluster(s).
This conjecture becomes more clear in the light of Figure 6 in which observations with qualitative electrode wear status, already displayed in Figure  5, are colored by cluster number (assigned through filtering B-spline hierarchical method) and are superimposed to the corresponding centroids, already displayed in the first panel of the second row of Figure 4. By Figure 6, the wear status of the electrode clearly appears as a determinant factor in the clustering of DRCs. This result represents an important industrial finding that confirms an expert conjecture supported by functional data made available under the Industry 4.0 paradigm. The next steps to exploit this result is to explicitly identify conditions on the wear status to signal, when the corresponding DRC is associated to a cluster that does not guarantee proper mechanical and metallurgical properties, the need e.g., for electrode breakin, renewal or substitution. To put into action this strategy, or even more complicated maintenance programs, further technological investigation and off-line quality testing should be carried out for every DRC cluster. The ultimate goal is to avoid the random sampling of the sub-assemblies to be tested off-line and to support more specific priority assignment. One could in fact imagine to assign higher priority to future spot welds having DRC observation with larger (resp., smaller) distance from cluster centroids that have revealed to refer to adequate (resp., inadequate) quality.

Conclusions
In this article, we tackled the issue of finding homogeneous groups of dynamic resistance curves (DRCs) coming from a resistance spot welding (RSW) process, which, in the modern automotive industry 4.0, is of crucial relevance to better understand the effects of the process parameters on the final weld quality. To avoid loss of information caused by arbitrary scalar feature extraction, DRCs have been modelled as functional data defined on the time domain, and, accordingly, clustering methods specifically designed for functional data have been presented in a practical hands-on overview with the aim of facilitating their practical implementation. To the best of the authors' knowledge, this is the first study where functional clustering methods are applied to the whole DRC functional observations to gain technological insights on RSW processes, even if the framework used could go far beyond the specific application hereby investigated. The effectiveness of the presented functional clustering methods is demonstrated by applying them to 538 DRCs acquired during RSW lab tests at Centro Ricerche Fiat (CRF). It turned out that the identified clusters of DRCs are strictly linked with the wear status of the electrodes, that, in turn, affects the electrode contact area, clamping pressure in the welding zone and current density, and impacts on the final quality of spot welds in terms of mechanical and metallurgical properties. Indeed, in accordance with the experts, we agree the better spot welds shall correspond to DRCs belonging intermediate clusters having proper amplitude difference and small phase difference.
A broader perspective of the results is given in supporting practitioners in the priority assignment for off-line testing of welded spots and in the electrode wear analysis. Functional clustering analysis could be in fact imagined to be embedded in a wider on-line statistical quality control framework for RSW processes, which is able to properly exploit the properties of the clusters identified. Finally, the relationship between the electrode wear and the final quality of spot welds, which has been discovered by the proposed functional clustering analysis, could be now further investigated through the specific definition of opportune quantitative variables in the direction of routinely tracing wear status.
neers Alessandro Zanella, Gianmarco Genchi, and Mariano Quagliano for their technological insights in the interpretation of the results. component analysis on the obtained functional data set and retained only the components that explain the 99% of the total variability in the data. This allowed in practice to reconstruct original functions with a strong dimension reduction. In fact, since profiles in this data set are smooth, only 6 principal components were required.
For both B-spline and FPCA basis, when using the model-based approach, the selection of the optimal number of clusters was based on optimization of the BIC criterion. In the other cases, viz. k-means and hierarchical clustering methods, we relied on the R package NbClust, 35 which allows the calculation of several indices. The optimal number of clusters was chosen also in this case according to the majority rule.
Adaptive The adaptive fclust method was performed by means of the R package fclust, which requires the choice of the number of basis functions K, as mentioned in Section 3.3, that was set equal to 5, 10. Parameters and the number of clusters were set based on the BIC criterion.
For waveclust, we relied on the R package curvclust, 46 dedicated to model-based curve clustering. In particular, the considered models include Functional Clustering Mixed Models (FCMM, i.e., functional clustering with the presence of functional random effects), but also traditional functional clustering model (FCM, without functional random effects). Among FCMMs, several structures of the variance of the random effect can be chosen. In particular, the following alternatives are available, as mentioned in Giacofci et al.: 21 constant, group, scale-position, and group-scale-position dependent. It is also possible to decide whether to retain all coefficients or to perform individual denoising to keep coefficients which contain individual-specific information, by applying nonlinear wavelet hard thresholding before clustering. We considered all parameter combinations of the variance structures and chose the model and number of clusters that optimize the BIC criterion. Model fitting was performed by maximum likelihood method using the EM algorithm and the stochastic EM as initialization method.
When using funHDDC, several parameters need to be chosen. We considered 0.2, 0.5, and 0.9 as possible values for the threshold of the Cattell' screetest used for selecting the group-specific intrinsic dimensions d m . Moreover, the following alternative models are available, as described in Bouveyron and Jacques: 18 a kj b k Q k d k , a kj bQ k d k , a k b k Q k d k , ab k Q k d k , a k bQ k d k , abQ k d k . In this paper, we considered all parameter combinations and chose the model and numbers of clusters that optimize the BIC criterion. Moreover, to avoid local minima, for each parameter combination we repeated the model fitting 20 times and kept the model with the largest log-likelihood.

Distance-based
In the distance-based approach, we applied the clustering algorithm for each number of clusters by considering the distance in Equation (17) with l = 0, that is the usual L 2 distance, then, we kept the model with the best value of the silhouette index.