A Non‐linear Manifold Strategy for SHM Approaches

In the data‐based approach to structural health monitoring (SHM) when novelty detection is utilised as a means of diagnosis, benign operational and environmental variations of the structure can lead to false alarms and mask the presence of damage. The key element of this paper is to demonstrate a series of pattern recognition approaches which investigate complex correlations between the variables and thus potentially shed light on the variations within the data that are of interest for SHM.


Introduction
When structural health monitoring (SHM) technology is adopted as a tool for monitoring a structure, then the system often has to run continuously and online. The effects of any environmental variations must be considered and identified before choosing and using a reliable feature for revealing any structural condition.
A general argument is that no sensor exists that can directly measure any type of novelty. For this reason, feature extraction is used to derive useful metrics from the raw data, which can then be post-processed with advanced signal processing tools. The methods for feature extraction serve two purposes: reducing dimensionality by mapping the data from a high-dimensional space to a lower-dimensional one, and revealing hidden aspects of the data by learning the structure between the variables of interest.
One of the most challenging tasks in SHM methodology is to understand and eliminate the influence of temperature on structural response. Especially for bridges, which are the immediate concern of this study, temperature is generally understood to be an important environmental factor which affects the dynamic response of the structure, due to its influence on the stiffness of structural components and on the boundary conditions of the structure [1][2][3][4][5].
Various methods and algorithms have been proposed in order to counteract and remove the influence of external variations, such as principal component analysis, auto-associative neural networks (AANN) [6] or, more recently, cointegration [1][7][8][9][10][11][12][13]. Although these methods exhibit a series of advantages and disadvantages in terms of removing the influence of operational and environmental conditions, little effort has been devoted to constructing a robust chain of methods that characterises the manifold formed between the variables and distinguishes which of the outliers indicated in the data represent environmental/operational variations and which represent damage or structural performance degradation. For more details regarding the different natures of outliers in multivariate statistics, the reader can consult [14][15][16][17].
A recent paper by Dervilis et al. [17] explores an approach of robust regression and robust multivariate statistics as a means of characterising and distinguishing the influence of environmental and operational conditions on the structural response. Specifically, the outliers may arise in the data as the result of both benign and malign causes, and it is important to understand their sources [17].
The layout of this paper is as follows. The discussion begins with a description of the Z24 bridge. In the Strategy section, the strategy that will be followed in this paper is presented. The Non-linear Manifold Learning via Locally Linear Embedding section gives some background on the non-linear manifold learning approach. The study concludes with the presentation of some key results.

A Quick Overview of the Z24 Bridge
The Z24 bridge (see Figure 1) was a concrete highway structure in Switzerland connecting Koppigen and Utzenstorf, and in the late 1990s, before its demolition, it was used for SHM purposes under the 'SIMCES' project [1,18]. During a whole year of monitoring of the bridge, a series of sensor systems captured modal parameter measurements, as well as a family of environmental measurements such as air temperature, soil temperature, humidity, wind speed etc. The critical point in this benchmark project was the introduction of different types of real progressive damage scenarios towards the end of the monitoring year (Table 1).
For the purposes of this study, the four natural frequencies that were extracted over a period of a year, including the period of structural failure of the bridge, are used. Figure 2 shows the four natural frequencies, with values between 0 and 12 Hz (the vertical axis shows the natural frequencies in Hz).
The beginning of the introduced failure occurs at observation 3476. The time instances in Figures 2 and 3 are the same. It has to be mentioned that some failed measurements have been removed.
The Z24 bridge was recently extensively analysed using robust methods such as least-trimmed squares and minimum covariance determinant (MCD) techniques as a means of exploring environmental variations for SHM purposes in previous and ongoing work [16,17]. It was found that environmental variations due to sub-zero temperatures manifest themselves differently in feature space compared with the damaged condition. This was vital information, as it showed that outliers due to operational/environmental variations and outliers due to damage may have markedly different characteristics.
Furthermore, it was found that the Z24 bridge has a highly non-linear behaviour.
It can be noted that there are some visible fluctuations between observations 1200-1500 (below −5°C). As one can see, there are no comparably visible fluctuations after the introduction of damage (observation 3476), and it is clear that the temperature fluctuations mask the dynamic signature of damage. This is the reason that advanced machine learning techniques are utilised as a means of revealing the hidden characteristics of the structural modal data.
The critical fluctuations are highly related to periods of very cold temperatures well below 0°C, and there is a direct connection with increased stiffness due to the freezing of the asphalt layer of the bridge deck. In turn, these large temperature fluctuations are suitable candidates to introduce non-linear characteristics.
The motivation of this paper is to reveal the non-linear manifold between the natural frequencies and then to try to remove these temperature variations and detect clearly the damage. The next section describes the strategy that will be followed.

Strategy
The chain of methods applied in this paper aims to investigate the appearance of benign fluctuations in data from the Z24 bridge. First, the whole data set of the four natural frequencies is reduced to two dimensions using a non-linear manifold technique, in this case locally linear embedding (LLE) (non-linear principal component analysis via auto-associative neural networks [6] is another strong option). For the current purposes, LLE is used as it is a much simpler tool than AANN.
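As an illustrative sketch (not the authors' original implementation), this reduction step could be carried out with scikit-learn's LLE; the random array here is only a placeholder for the Z24 natural-frequency matrix, and the neighbour count of 12 is an arbitrary choice:

```python
# Sketch: reducing a four-column natural-frequency matrix to a 2-D
# non-linear manifold with locally linear embedding (LLE).
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
frequencies = rng.normal(size=(500, 4))  # placeholder data, not the Z24 set

# K nearest neighbours and 2 output dimensions, as in the paper's setting
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
embedding = lle.fit_transform(frequencies)
print(embedding.shape)  # (500, 2)
```

fit_transform returns one two-dimensional co-ordinate pair per observation; this embedded space is where the subsequent outlier and clustering analysis operates.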
Next, the minimum covariance determinant (MCD) estimator index is used in order to reveal inclusive outliers without a priori knowledge of whether benign variations are present in the normal test data [14][15][16][19][20][21][22]. The robust computation of location and covariance estimates for multivariate data is of significant interest in the investigation of inclusive outliers (for more details, see also Appendix A).
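A minimal sketch of this step, assuming scikit-learn's MinCovDet in place of the implementation used in the paper, with synthetic contaminated data; the 99% empirical quantile cut-off here is illustrative only, not the extreme-value threshold described later:

```python
# Sketch: robust location/scatter via the minimum covariance determinant
# (MCD) estimator, with robust Mahalanobis distances as an outlier index.
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
X[:10] += 6.0  # a few gross (inclusive) outliers

mcd = MinCovDet(random_state=0).fit(X)
d2 = mcd.mahalanobis(X)               # squared robust distances
outliers = d2 > np.quantile(d2, 0.99) # illustrative cut-off
print(int(outliers.sum()))
```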
In order to make visible the influence of temperature on the measured natural frequencies, a powerful automatic clustering technique such as the affinity propagation (AP) algorithm can be applied [23,24], as here. AP identifies exemplars among data points and forms clusters of data points around these exemplars. The algorithm operates by simultaneously considering all data points as potential exemplars and exchanging messages between data points until a good set of exemplars and clusters emerges. More detailed information about exactly how the AP algorithm passes messages between data points can be found in [23,24].
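A brief sketch of AP clustering, again using scikit-learn on synthetic blob data rather than the Z24 embedding; the preference value, which controls how many exemplars emerge, is an arbitrary choice here:

```python
# Sketch: affinity propagation picks exemplars automatically and assigns
# every point to its nearest exemplar's cluster.
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

Y, _ = make_blobs(n_samples=200, centers=5, random_state=2)
ap = AffinityPropagation(preference=-50, random_state=0).fit(Y)
n_clusters = len(ap.cluster_centers_indices_)  # number of exemplars found
print(n_clusters)
```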
The strategy finishes by trying to predict the components of the manifold in order to remove any doubt about which data are influenced by environmental fluctuations and which belong to the damaged case. The use of Gaussian processes (GPs) is a current research area of increasing interest, not only for regression but also for classification purposes (for more details, readers are referred to Appendix B and [25]). GPs are a stochastic non-parametric Bayesian approach to regression and classification problems. They are computationally efficient, and non-linear learning is relatively straightforward. GP regression takes into account all possible functions that fit the training data and gives a predictive distribution around a single prediction for a given input vector. A mean prediction and confidence intervals on this prediction can be calculated from the predictive distribution.
The initial and basic step in applying GP regression is to choose a mean and a covariance function. These functions are specified separately, and each consists of a functional form and a set of parameters called hyperparameters. Here, a zero-mean function and a squared-exponential covariance function are applied (see Appendix B or [25]). Once the mean and covariance functions are defined, the inference method specifies the calculation of the exact model and, in simple terms, describes how to compute the hyperparameters by minimising the negative log marginal likelihood. The software used for the implementation of GP regression was provided by [25].
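The zero-mean, squared-exponential setting can be sketched as follows; note that this uses scikit-learn's RBF kernel as the squared-exponential covariance (plus a white-noise term), whereas the paper used the software of [25], and the sine-wave data are purely illustrative:

```python
# Sketch: GP regression with a zero mean and squared-exponential (RBF)
# covariance; hyperparameters are fitted by maximising the log marginal
# likelihood (equivalently, minimising its negative).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 100)[:, None]
y = np.sin(x).ravel() + 0.1 * rng.normal(size=100)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel).fit(x, y)

mean, std = gp.predict(x, return_std=True)  # predictive mean and std
residual = y - mean                         # residual error: the novelty index
print(residual.shape)
```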
Setting an appropriate threshold in the absence of any damage-state data, as is the case in this study, is a non-trivial task. A Monte Carlo simulation based on extreme value statistics was used. The procedure for calculating the threshold is as follows:
• A matrix is constructed with each individual element a randomly generated number from a normal distribution with zero mean and unit standard deviation.
• The distance value is evaluated for all matrix values, where the robust mean and covariance matrix are inclusive. The largest (i.e. extreme) value recorded for each trial matrix is stored.
• The process is repeated for a large number of trials in order to generate an array of 'extreme' distance values, which are then ordered in terms of magnitude. The critical value (alpha value, α) can take different values, such as 5% or 1%, for a test of discordancy. In this paper, α is set equal to 1%, giving a 99% confidence limit.
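The steps above can be sketched as follows; for simplicity this sketch uses the classical (rather than robust) mean and covariance in the Mahalanobis distance, and the matrix size and trial count are illustrative choices:

```python
# Sketch of the Monte Carlo threshold: per trial, draw a standard-normal
# matrix, compute Mahalanobis distances, store the largest; the threshold
# is the (1 - alpha) quantile of these extremes, with alpha = 1%.
import numpy as np

rng = np.random.default_rng(4)
n_trials, n_obs, n_dim = 1000, 100, 2
extremes = np.empty(n_trials)
for t in range(n_trials):
    Z = rng.normal(size=(n_obs, n_dim))
    mu = Z.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(Z, rowvar=False))
    diff = Z - mu
    d = np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))
    extremes[t] = d.max()          # extreme distance for this trial

threshold = np.quantile(extremes, 0.99)  # alpha = 1% -> 99% confidence limit
print(threshold)
```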

Non-linear Manifold Learning via Locally Linear Embedding
As mentioned in previous sections, the combination of strong non-linearity and the influence of environmental fluctuations makes the damage detection performance very weak. This is the reason that a quick and effective method of non-linear manifold learning such as LLE is introduced in this study [26,27].
An extensive overview of the algorithm can be found in [26,27]. Briefly and for the purposes of this paper, a short description is given here.
The LLE method is based on simple geometric intuition. If the observations consist of n real-valued vectors {x_i} of dimension D, and they are sampled from a smooth underlying non-linear manifold, then each data point and its neighbours are expected to lie on or close to a locally linear patch of the manifold. The local geometries can be characterised by finding linear coefficients that reconstruct each data point from its set of neighbours.
If one establishes K nearest neighbours per data point, then the reconstruction error is given by the cost function:

E(W) = Σ_i |{x_i} − Σ_j W_ij {x_j}|²  (1)

where W_ij is the weight contribution of the j-th data point to the reconstruction of the i-th. In order to compute these weights, the cost function has to be minimised under two constraints: each data point is reconstructed only from its neighbours (W_ij = 0 if {x_j} does not belong to the neighbour set of {x_i}), and the rows of the weight matrix sum to one (Σ_j W_ij = 1). With these constraints, the reconstruction weights are invariant to rotations, rescalings and translations of the data. In turn, in order that the LLE algorithm preserves this invariance, as a final step of the method each measurement {x_i} is mapped to a lower-dimensional vector {Y_i} that minimises the embedding cost function:

Φ(Y) = Σ_i |{Y_i} − Σ_j W_ij {Y_j}|²  (2)

The main difference from the previous cost function is that here the weights W_ij are fixed, while the co-ordinates {Y_i} are optimised.
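For a single data point, the weight minimisation in the first cost function reduces to a small linear system on the local Gram matrix, subject to the sum-to-one constraint. A sketch (the regularisation term is a standard numerical safeguard, not part of the paper's description):

```python
# Sketch: LLE reconstruction weights for one point from its K neighbours.
import numpy as np

def lle_weights(x_i, neighbours, reg=1e-3):
    # neighbours: (K, D) array of the K nearest neighbours of x_i
    Z = neighbours - x_i                         # shift to the local patch
    G = Z @ Z.T                                  # local Gram matrix (K x K)
    G += reg * np.trace(G) * np.eye(len(G))      # regularise a singular G
    w = np.linalg.solve(G, np.ones(len(G)))
    return w / w.sum()                           # enforce sum-to-one constraint

rng = np.random.default_rng(5)
x = rng.normal(size=3)
nb = x + 0.1 * rng.normal(size=(4, 3))           # 4 synthetic neighbours
w = lle_weights(x, nb)
print(w.sum())  # 1.0 by construction
```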

Revealing the Non-linear Manifold between the Natural Frequencies
As can be seen from Figure 4, the Z24 natural frequency data are projected into two-dimensional space using the LLE algorithm, and in Table 2, a description of the different data sets in relation to temperature is given.
The condition 'undamaged' is given as a label to the data when the monitoring campaign started. The labels characterise the 'condition' only within the monitoring year (as, of course, the bridge had been in operational service for some period before the monitoring campaign started).
The revealed manifold shows two distinct directions in the data: one corresponding to the cold and very cold temperatures (green) and one corresponding to the damage observations (black). Furthermore, it is worth noting that the hot temperatures (red) lie in the same region of the space as some early damage data. If one tries to identify outliers using the MCD method without setting a training set beforehand (see Figure 5), one can see that both the cold temperatures and the damage appear as outliers.
Of course, this connection between the data and temperature is only known because measurements of the temperature were obtained. If temperature measurements were not available beforehand, then an automatic clustering method could be applied; in this case, the AP algorithm. This automatic clustering method is presented as a novel tool in structural data analysis, as it offers the advanced characteristic of defining different categories within the manifold of the variables. In this case study, making visible the dramatic influence of temperature on the measured natural frequencies demonstrates the high potential of applying the AP tool to modal data.
It can be seen in Figure 6 (with the number of clusters unconstrained) and especially Figure 7 (constrained) that the AP algorithm finds five classes which agree very well with the separation of the data presented in Table 2. The AP algorithm finds the points influenced by cold, normal and hot temperatures (the latter including some damage data), as well as three stages of damage observations which, in comparison with the MCD index, show a progressive trend.
As a last step, after the MCD and AP tools are applied, one can use GP regression to predict the 1st component from the 2nd and vice versa. Only the first 500 points were used as a training set for the GPs, with the rest used as a testing set. The GP regression model error is used as an index of abnormal response. Furthermore, as will be seen later, this regression error (the residual error, i.e. the difference between the algorithm predictions and the actual data) provides a strong visualisation indicating when faults occur. The calculation of the threshold is explained in [6,16]. As can be seen from Figures 8 and 9, the GPs predict/classify correctly most of the temperature-influenced data (especially in Figure 9, and mainly the very cold temperatures that were flagged as outliers in Figure 5), and the residual error exceeds the threshold when damage is present. This is a strongly encouraging result: the strategy manages, to a great extent, to minimise the novelty due to temperature variations by learning their non-linear characteristics, while applying a strong non-linear regression tool like GPs to detect novelty that is directly connected with the damaged state of the bridge.
It has to be pointed out that the training involved only the first 500 points, which means that no freezing temperatures below −5°C (as between points 1201-1500) are included in the training data set. This is a vital point in validating that the chain of strategic steps followed here can offer a useful tool for the robust investigation of benign variations during a monitoring campaign.

Conclusion
The purpose of this paper is to highlight the key utility of some specific machine-learning methods, not only for novelty detection analysis but also as a means of investigating the space in which data clusters lie. It also gives a chain of tools for revealing the influence of benign variations, such as temperature, when modal parameters are extracted. The main benefit of the approach taken here is that complicated algebraic analysis is not necessary. Furthermore, in this paper, robust outlier statistics and unsupervised learning techniques are used, focussed mainly on a high-level estimation of the 'masking effect' of inclusive outliers, not only for determining the presence or absence of novelty (something that is of fundamental interest) but also to examine the normal condition set under the suspicion that it may already include multiple abnormalities.

APPENDIX A: MINIMUM COVARIANCE DETERMINANT (MCD) ESTIMATOR
A multivariate data matrix [X] = ({x_1}, …, {x_m})^T of m points in n-dimensional space (m × n) is assumed, where {x_i} = (x_i1, …, x_in)^T is an observation. Robust estimates of the centre μ and the scatter matrix of [X] can be calculated by the MCD estimator. The MCD tool looks for the h (> m/2) observations out of m whose classical covariance matrix has the lowest possible determinant. The raw MCD estimate of location is then computed as the arithmetic mean of these h points, and the raw MCD estimate of scatter is their covariance matrix multiplied by a consistency factor.
The calculation of the lowest determinant is critical, as one moves from one approximation of the MCD to another with a lower determinant. This result and its proof are not obvious and can be found in the appendix of [19].
Based on the raw MCD estimates, a reweighting step can be added in order to increase the finite-sample efficiency. The advantage is that the MCD estimates can resist up to (m − h) outliers, and in turn the number h (or, equivalently, α = h/m) controls the robustness of the estimator. The highest resistance to contamination is achieved by taking h = ⌊(m + n + 1)/2⌋. It is proposed that when a large proportion of contamination is suspected, h should be chosen with α = 0.5. Detecting outliers can be challenging when m/n is small, because some data points can become coplanar; this is a general problem in the machine-learning community related to the 'curse of dimensionality'. It is recommended [20] that when m/n > 5, α should be 0.5. Generally, the MCD estimates of location and scatter are affine equivariant, which means that they behave consistently under affine transformations. This is crucial, as the underlying model is then immune to different variable scales and data rotations. Rousseeuw and Van Driessen [19] developed the FAST-MCD algorithm based on a concentration step (C-step): each C-step selects the h observations with the smallest distances, yielding a scatter matrix with a determinant no larger than before; the main details are given in [19].
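A sketch of one C-step, written from the description above rather than taken from FAST-MCD itself; the subset size and contamination pattern are illustrative choices:

```python
# Sketch of a FAST-MCD concentration step (C-step): given a current
# h-subset, recompute mean/covariance, rank all points by Mahalanobis
# distance, and keep the h closest. Iterating cannot increase the
# determinant of the subset covariance.
import numpy as np

def c_step(X, subset_idx):
    h = len(subset_idx)
    S = X[subset_idx]
    mu = S.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(S, rowvar=False))
    diff = X - mu
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
    return np.argsort(d2)[:h]            # h observations with smallest distances

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 2))
X[:20] += 8.0                            # contamination
idx = rng.choice(200, size=110, replace=False)
for _ in range(10):                      # iterate C-steps towards convergence
    idx = c_step(X, idx)
print(idx.shape)
```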

APPENDIX B: GAUSSIAN PROCESS REGRESSION ALGORITHM
Rasmussen and Williams [25] define a Gaussian process (GP) as 'a collection of random variables, any finite number of which have a joint Gaussian distribution'. In recent years, GPs have gained a lot of attention as a machine-learning approach in the area of regression (and classification) analysis, as they offer fast and simple computations. GP regression is a robust tool which takes into account all possible functions that fit the training data set and gives a predictive distribution for a single prediction at a given input vector. As a result, a mean prediction and confidence intervals on this prediction can be calculated from this predictive distribution. The basic details of the algorithm are presented following the steps in [25]. The algorithm that was used in the previous sections also comes from Rasmussen and Williams [25].