A label machine for mechanical systems: Discovering operating states with unsupervised learning from load time series

Labeling time series data according to operating states is often a time‐consuming task that requires expert domain knowledge of the underlying mechanical system. In this paper, we propose a data‐driven algorithm that identifies and detects operating states from time series data by grouping time ranges of similar signal behavior together using an unsupervised machine learning approach. The scattering transform and principal component analysis are utilized to extract signal characteristics from time series data which are subsequently clustered by a Gaussian mixture model to generate operating states. To evaluate our approach, we compare the automatically generated operating states with a manual definition of operating states created through expert knowledge. Based on a publicly available eBike dataset, the results demonstrate that the data‐driven definition of operating states can yield similar results to a rule set based on expert knowledge.


INTRODUCTION
A major challenge in designing mechanical systems is to obtain representative load assumptions that accurately describe their fatigue behavior.Such representative load assumptions are commonly created as a combination of individual operating states that are extrapolated based on an estimated usage profile throughout the useful lifetime of the system [1].
In practice, operating states can be modeled manually through a factor model approach where operating states are defined based on expert knowledge.Different factors (e.g., road type or payload for a vehicle) that influence the behavior of the system are divided into appropriate value ranges, and each permutation of factor values is then declared an operating state [2][3][4][5].The acquisition of these states' distribution profiles (usage profiles) is accomplished via data collection during measurement campaigns with test vehicles, or through the evaluation of questionnaires.
Burger et al. [6] extend this concept by taking into account additional data sources and analysis methods, thereby allowing a semi-automated collection of usage profiles.This approach involves identifying the current operating states using a mix of sensor readings.These readings can be taken from real customer vehicles and sent using telemetry systems.Burger et al. further describe how supervised machine learning models can be incorporated for establishing a relationship between operating states and the available sensor data which is trained based on 200 h of manually labeled video footage.
A similar approach is used by Stellmach et al. [7] where different-potentially coexisting-maneuvers of an agricultural machine are described by sensor data that is available on the machine.To detect whether a maneuver is present, multiple sensor signals are linked logically through a rule set that is created based on expert knowledge.They further differentiate each maneuver based on ranges of vehicle speed.Those maneuvers are then used to characterize the fatigue damage associated with the corresponding operating states.
The aforementioned approaches share the manual definition of operating states based on domain knowledge of the systems' overall operating conditions on a coarse timescale.An alternative data-driven approach is given by Heindel et al. [8,9].They present an algorithm based on the scattering transform and principal component analysis (PCA) that extracts a low dimensional representation of time series signals which can be utilized for maneuver identification and fatigue monitoring applications.This toolchain was previously utilized in a geophysics application as a preprocessor for clustering seismic time series data with a Gaussian mixture model with the goal of predicting seismic activity [10].Heindel et al. were able to successfully transfer this approach into the domain of fatigue analysis, suggesting a broad domain-unspecific applicability.
In this paper, we propose an algorithm that automates the definition and identification of operating states for a specific point of measurement based on load or stress time series.It combines the scattering transform and PCA to extract signal characteristics from a time series signal [8] and employs a Gaussian mixture model (GMM) clustering [10] to derive previously unknown operating states solely by grouping similar signal characteristics.The algorithm has a limited required number of hyperparameters necessary to define data-driven operating states.It, therefore, has the potential to reduce the expert knowledge and effort that is usually required to manually label time series data by operating states.The proposed algorithm is intended as the first step in a toolchain that subsequently performs a hyperparameter optimization, which is why we select the hyperparameters manually in the scope of this paper.A publicly available eBike dataset [9] is used to compare the operating states generated by the clustering algorithm with a set of operating states manually defined based on the factor model approach.
First, the individual steps of the algorithm are introduced.It follows a description of the dataset and the construction of a factor model for manually derived operating states.Subsequently, the algorithm and the choices for setting the hyperparameters in the numeric studies are explained in detail.The resulting operating state definitions are evaluated, followed by a short discussion and an outlook towards future extensions.

METHODS
The scattering transform and PCA are used to extract signal characteristics from time series data.Its output is clustered through a GMM.This paper provides a general overview of the underlying concepts and focus on the relevant parameters that influence the results of the algorithm.

Scattering transform
The scattering transform converts a time series signal into a time-shift invariant representation that is stable with respect to time-warping deformations [11,12].It is commonly used as a feature extraction method for various regression and classification tasks in a broad range of domains such as audio [11] and image processing [13], geophysics [10], and fatigue monitoring as well as maneuver identification [8].
The scattering transform captures time, frequency and transient information from a time series signal, by computing modulation spectrum coefficients (scattering coefficients) of multiple orders by cascading wavelet convolutions [14] and modulus operations [11].For the first-order scattering coefficients, the original signal is convolved with a range of different wavelets defined by a filter bank   1 ().The filter bank is constructed by dilating a mother wavelet  * () where  1 is the number of wavelets per octave in the first order and therefore dictates the frequency resolution.
A modulus operation is applied to the results of the first wavelet transformation, which is then averaged over time with a low-pass filter   to compute the first-order scattering coefficients.The low-pass filters' averaging scale is given by 2  with  ∈ ℕ [11].In an iterative process, the higher-order coefficients are computed through applying another wavelet convolution and modulus operation on the coefficients of the previous order.The filter banks can differ in frequency resolution by selecting  +1 ≠   .In practice, two orders of scattering coefficients are usually adequate as higher-orders contain minimal information.
In this paper, the scattering transform is computed using the python package kymatio [15].This implementation uses the complex Morlet wavelet (Gaussian envelope) as the mother wavelet.

Principal component analysis
A PCA [16,17] is commonly used to perform dimensionality reduction.It does so by computing new variables called principal components (PCs) which are linear combinations of the original variables.
The first PC is computed to contain the largest possible variance.All subsequent PCs are constructed iteratively to maximize the remaining variance with the constraint that they are orthogonal to the previous PC.This results in a linear, orthogonal transformation of the original dataset where the resulting axes are sorted by variance.Mathematically, the PCs are the eigenvectors of the dataset covariance matrix.These can be determined by performing singular value decomposition of the covariance matrix.The PCs are sorted by their eigenvalues, which represent the variance along this axis.
Dimensionality reduction is achieved by taking only a limited number of PCs into account.This can lead to a large reduction of necessary axes if most of the variance can be explained by only a few PCs.There is no general method for selecting the correct number of PCs, as the best number depends on the data and the application it is used for [16].However, there are several recommendations that range from selecting only PCs with eigenvalues ≥ 1, utilizing graphical evaluations (scree-or elbow-method) or by selecting as many PCs as are required to retain a predefined amount of variance.
PCA is highly sensitive to the absolute value range of the datasets' features.Mitigating measures include feature-wise preprocessing steps, such as centering (ensure zero mean) and scaling the value ranges to similar orders of magnitude (e.g., unit variance or unit value range).
In this paper, the python package skicit-learn [18] is used to perform PCA.Scikit-learn always centers all features to zero mean.

Gaussian mixture model
A GMM approximates the multivariate distribution described by a dataset through a linear superposition of Gaussian distributions [17].The parameters of the Gaussian distributions can be calculated by randomly assuming initial parameters for each Gaussian which are then adapted to the data through the expectation maximization (EM) algorithm [19].A GMM performs soft clustering, which means that every data point belongs to every cluster with a certain weight.However, it can also be used to directly associate a data point with one specific mixture component (cluster) by assigning it to the cluster with the highest probability density.The latter approach is used in this paper.By relying on Gaussian distributions to describe a cluster shape, the GMM can identify elliptical and overlapping clusters with different densities.In this paper, the GMM is realized with the python package scikit-learn [18].

EXPERIMENTS
We propose an algorithmic toolchain to define and identify operating states for a mechanical system solely based on time series data.The toolchain consists of the scattering transform followed by a PCA, which extracts a condensed representation of the time series characteristics.This representation is computed for fixed size time windows, where it is subsequently clustered into operating states using a GMM clustering.The effectiveness of the approach is demonstrated on the publicly available ePredict dataset [9] where the operating states obtained by our approach are compared to a definition of operating states based on a factor model approach.

Dataset
The ePredict dataset [9] contains strain and acceleration data measured for an eBike.The measurements were conducted on public grounds.They contain trips of general usage (subset: "Unlabeled data") and trips that were generated under predefined driving conditions (subset: "Labeled data").In this paper, the "Labeled data" is used to manually derive operating states from the driving conditions, because the knowledge of these predefined driving conditions allows creating a factor model.Throughout this paper, we focus on one specific point of measurement-strain gauge 1 (sampled at 1.2 kHz)-whose position on the eBike is indicated by the red circle in Figure 1.
The "Labeled data" subset contains individual trips with an average duration of about three minutes each where the driving conditions were held constant.It is further divided into a training and a test split, while only the training splitthat contains 16 individual trips-is utilized in the scope of this paper.The driving conditions span a 3D parameter space based on speed, road type, and rider while the influence of the rider is assumed to be his mass.The measurements were conducted on an asphalt and a cobblestone road.The speeds per measurement were kept constant with target speeds {5, 10, 15, 20, 25} km∕h on asphalt and {5, 10, 15} km∕h on the cobblestone site.There are two different riders with masses  1 = 66 kg,  2 = 83 kg who rode the bikes in identical fashion (same speed, same road type).
The individual trips are synthesized into one single time series for visualization purposes.As the analysis is performed based on time windows, only complete time windows are taken into account.This requires removing the last time window from each trip if it is incomplete.The resulting time series and the arrangement of the individual trips are described in Figure 2.

Reference operating states
To establish a reference definition of operating states, we manually construct a factor model [6] to classify the individual trips of our time series into operating states.As the three parameters used to create the individual trips are known, the simplest factor model would consider every parameter a factor, hence generating 16 different parameter sets or operating states-one for each trip.However, when taking the signal behavior into account (see Figure 2) there are no significant differences in the signal characteristics (especially amplitudes) between both riders, except for the states [15 km∕h, cobblestone] where the heavier rider produces significantly higher amplitudes.The operating states are defined as the combinations of the factors speed and road type.In addition, we further distinguish between both riders for 15 km∕h on cobblestone, which leads to a total of nine operating states.

Algorithm and parametrization
The proposed algorithm consists of four general steps.At first, the time series is partitioned into nonoverlapping equally sized time windows.Second, the scattering transform is applied to every time window.The third step performs PCA on the scattering coefficients, while the resulting PCs are then clustered through a GMM in the last step.
In this paper, we include optional transformations-logarithmization and / or normalization-between the scattering transform and the PCA, which alter the scattering coefficients and therefore influence the PCA.These transformations are computed independently for every feature vector of the scattering coefficients .The logarithm is computed as where  = 10 −6 is a small offset for numerical stability of In case both transformations are present, the normalization is performed after the logarithm.The hyperparameters for this algorithm are the combination of the hyperparameters for each step.Whether taking the logarithm and / or applying the normalization are carried out is also considered a hyperparameter.The steps result in the hyperparameters window size, the parameters of the scattering transform (largest low-pass filter scale , wavelets per octave of first and second-order filter banks  1 and  2 [15]), the number of PCs to consider and the number of clusters used by the GMM.
For the experiments, we select the window size to be 4096 ≈ 3.4 s in agreement with [8].The scattering transform is constructed with  = 12,  1 = 16,  2 = 1, which leads to a high-frequency resolution with narrow frequency and wide time range support.The number of PCs cannot be reasonably selected in advance as it needs to be determined based on the actual data and the usage of the optional transformations.It can be obtained either by utilizing the guidelines mentioned in Section 2.2 or through systematic experimentation and iterative refinement.For the GMM, we set the number of clusters to nine, which equals the number of operating states of the reference definition.

RESULTS
In this section, a brief overview of the numerical studies is given.The focus lies on the effects of the hyperparameters on the clustering results.It follows a comparison of the manually and automatically defined operating states.

Effect of transforming scattering coefficients
By analyzing the data presentation before applying the PCA, the effects of the optional transformations on the scattering coefficients can be evaluated.The main goal is to increase the discriminability between time windows from different driving conditions.As can be observed in Figure 3, the raw (Figure 3A) and normalized scattering coefficients (Figure 3B) show significant differences between the asphalt road (low coefficient values) and cobblestone road (high coefficient values) in the frequency band between 5 and 100 Hz.The different speed ranges per road type can hardly be distinguished.When applying the logarithm (Figure 3C), it becomes apparent that the coefficients differ by several orders of magnitude between the speed ranges at certain frequencies.Especially in the logarithmic and normalized representation (Figure 3D), it is possible to distinguish the different speeds and road types mostly by considering the coefficients around 10 and 100Hz center frequency.

Comparison to manually defined labels
In Figure 4, the results of the PCA-after logarithmization and normalization-can be observed, colored according to manually defined operating states (A, B) and based on the results of a subsequent GMM (C, D).The PCA leads to four visually separable groups, where-based on the manually labeled operating states-the first group contains all asphalt trips (I-V) and the remaining three groups represent a different speed on the cobblestone road each (VI-IX).The distinction between both riders (VIII, IX) is not obvious in the first two PCs but in the fourth PC that is not displayed here.The speeds on the asphalt road correlate with PC 1, while the distances between these cluster centers decrease with increasing speeds.The clusters on asphalt ground are more similar to each other than on cobblestone.
To automatically identify the operating states, the GMM is performed based on the first four PCs.The guidelines for choosing the number of PCs according to Section 2.2 suggest using the first two PCs (eigenvalues ≥ 1 and elbowmethod), which are not able to describe the overlapping clusters (I-V and VIII-IX).Therefore, four PCs where selected manually through iterative refinement.The resulting operating states show striking similarities to the manually defined ones, especially for the cobblestone trips that show high discriminability between the speed ranges.On the asphalt road, the automatically defined operating states for the speeds 5 and 10 km∕h match their respective counterparts (state 1, 2), while the states of the higher speeds differ in detail.As can be seen in Figure 4D, states 3-5 appear to correlate with the strain amplitudes when labeled based on the GMM, opposed to the speed-based correlation of the manually defined operating states (III-V).

CONCLUSION
In this paper, we introduce an algorithm to automatically define operating states from time series data based on the scattering transform, PCA and Gaussian mixture model clustering.For an eBike dataset, it is demonstrated that meaningful operating states can be derived based on a single channel strain signal.These states show many similarities to manually defined operating states that are based on the overall vehicle state.The main differences occur between trips on asphalt roads in a speed range of 15-25 km/h, where the manually defined operating states correlate with the speed, and the automatically defined operating states correlate with the strain amplitudes.The temporal resolution of our algorithm is the size of a time window (here ≈ 3.4 s), which indicates that the signal-based distinction of operating states can be performed on a small timescale.For the given dataset, it was observed that applying a logarithm and a normalization to the scattering coefficients increases the discriminability between different driving states.It has to be noted that-in order to archive the best resultsthe number of PCs to consider for the clustering has to be determined manually as common guidelines suggest excluding some essential PCs.The remaining hyperparameters are set based on minimal expert knowledge that requires a basic understanding of relevant frequency bands (window size, scattering filter banks: ,  1 ,  2 ) and to preselect the number of operating states that are to be defined.
In order to further reduce the amount of expert knowledge required to parametrize this algorithm, an objective metric for describing the quality of the operating state definition is essential.In the general case, there will be no manual reference as this would make this algorithm obsolete.Commonly used internal clustering metrics (e.g., the silhouette coefficient) [20] are based on distances in the input space of the clustering algorithm, which in this case is the PC space.It is highly questionable whether such metrics are suitable as objective measures of quality, since these distances are not directly physically interpretable and are subject to hyperparameter change.Therefore, in order to automatically determine the hyperparameters, it is imperative to derive a physics-based quality metric to evaluate the performance of an operating state definition.

A C K N O W L E D G M E N T S
Open access funding enabled and organized by Projekt DEAL.

F I G U R E 1 F I G U R E 2
ePredict eBike.The red circle indicates strain gauge 1, which is used in this paper to showcase the performance of our proposed algorithm.It reacts mostly to loads caused by an excitation through the underground into the vertical direction[9].Dataset as concatenation of 16 individual trips in time series representation.All trips are described by the three parameters rider mass, speed, and road type.The measurements for both riders were collected in identical fashion.
zero-or close-to zero-valued scattering coefficients and |(… )| denotes the element-wise modulus operator.The normalization projects the value range of each coefficient onto the interval [0, 1] by computing  normalized = ( − min )∕(max  − min ).

F I G U R E 3
Scalogram of first order scattering coefficients (wavelet scales converted to corresponding center frequencies) after different transformations (A) the untransformed coefficients, (B) the normalized coefficients, (C) the logarithmized coefficients, (D) the normalized representation of the logarithmized coefficients.The normalized versions (B, D) show white noise characteristic in the outer frequency bands, where the orignal scattering coefficients are close or constant to zero.The asphalt and cobblestone trips can be distinguished in all versions (at around 5 Hz), while the different speed ranges-especially for the asphalt trips-can best be separated by transformation (D).F I G U R E 4 Comparison of manually declared operating state definition with operating states found by the clustering algorithm.(A) and (C) show the time windows in the PC space colored by the respective operating state.(B) and (D) show the same data in the time domain, also colored by the respective operating state.The depicted PC representation is based on logarithmized and normalized scattering coefficients and the GMM clustering algorithm takes four PCs into account and is parametrized to use nine Gaussian distributions.