Data-driven virtual sensing and dynamic strain estimation for fatigue analysis of offshore wind turbine using principal component analysis

Virtual sensing enables estimation of stress in unmeasured locations of a system using a system model, physical sensors and a process model. The system model holds the relationship between the physical sensors and the desired stress response. A process model processes both the physical sensors and the system model to synthesise virtual sensors that 'measure' the desired stress response. Thus, virtual sensing enables mapping between the physical sensors (input) and the desired stress response (output). The system model is a mathematical model of the system based on knowledge or data of the system. Here, the data-driven system model is constructed directly from data analyses for the specific system. In this paper, supervised learning and data-driven system models are applied to strain estimation of an offshore wind turbine in the dynamic range through a novel use of principal component analysis (PCA). A total of 40 min of training data is used to establish the data-driven system model that can estimate the dynamic strain response with high precision for 2 months, while the estimated fatigue damage averaged out to −1.76% of the measured strain response.


1 | INTRODUCTION
Virtual sensing is a technique that expands measured data to unmeasured locations and/or transforms it into other quantities by synthesising virtual sensors [1]. Stress/strain estimation is a subcategory within virtual sensing that estimates the full-field stress/strain response of a system [2]. Stress or strain estimation is known by many names and terms: stress/strain prediction, reconstruction of unmeasured stress/strain, fatigue prediction/estimation, hybrid modal analysis, full-strain fields, full-field stress/strain estimation, full-field stress/strain distribution, virtual sensing, soft sensing, full-state estimation and so forth. Unfortunately, there is a lack of consensus and common terminology in the field of virtual sensing [2]. In this paper, we will use the terminology proposed by Marius Tarpø [2]. In this terminology, we need three components to synthesise virtual sensors: system model, physical sensors and process model, as outlined in Figure 1. The system model could have any form and format, but the model must contain all relevant information on the system and the sensors. In the terms of control theory, the model holds the different states of the system. The physical sensors measure physical quantities, and the term includes processing of the measurements. The process model is a state estimator of the full state of the system based on both the measurements from the physical sensors and the system model. Here, the full state is a set of state variables or principal components, which describe the entire system. In this process, the system model is reduced to a set of degrees of freedom that corresponds to the number and position of the physical sensors. This reduced system model determines the accuracy and stability in the inference of the full state based on data from the physical sensors. Through the estimated full state, virtual sensors are available for any desired quantity in the full system. We should note that the term 'virtual sensing techniques' is often interchangeable with the term 'process model' in the literature.
In turn, the estimated stress history paves the way for fatigue assessments of the system, so an estimated fatigue life is available. Thus, virtual sensing holds the potential for lifetime extensions for systems prone to fatigue damage, like wind turbines. Therefore, plenty of research has been conducted on stress/strain estimation for offshore structures [2]. In the literature on virtual sensing, there are several successful applications of stress/strain estimation, but most were conducted in laboratories under controlled conditions. Applications under actual operating conditions, on the other hand, are scarce [2].
Stress/strain estimation began in the 1950s with analytical relationships between response and strain for beams or plates [3-5], which enable virtual sensing limited to specific systems under the conditions and assumptions applied in the expressions. These analytical expressions dominated the field until the 1990s, when transformation matrices were introduced that are not restricted to any specific linear system. Here, Okubo and Yamaguchi [6] introduced a displacement-to-strain transformation matrix, which is based on measurements of strain and displacement. Moreover, Seo et al [7] focused on the displacement-to-strain transformation matrix of mode shapes under free boundary conditions as basis vectors. Graugaard-Jensen et al [8] and Hjelm et al [9,10] applied strain estimation to a laboratory structure and a lattice tower under operational conditions using the modal expansion technique. Papadimitriou et al [11] introduced the Kalman filter (an adaptive filter) to estimate the full-field strain in a numerical simulation in 2011. In 2016, Maes et al [12] compared the modal expansion, the Kalman filter and the joint input-state estimation algorithm in strain estimation on an offshore monopile wind turbine and found that they performed identically. To account for the quasi-static response of wave-excited structures, Iliopoulos et al [13] applied both accelerometers and strain gauges to the modal expansion technique, while Skafte et al [14] introduced Ritz vectors (pseudo-modes) to the modal expansion technique for strain estimation to account for modal truncation errors caused by the quasi-static vibrations. Risaliti et al [15] presented the augmented extended Kalman filter for full-field strain estimation on a nonlinear mechanical system using the implicit equation of motion. In 2020, Tarpø et al [16] studied the application of expanded experimental mode shapes for stress estimation, concluding that the expansion is equivalent to updating and calibrating the system model. Nabuco et al [17] applied dynamic strain estimation to an offshore tripod structure during operation with high accuracy. There, the uncertainty of the estimated strain response was lower than is typically assumed in a design process; thus, the strain estimation could reduce the number of planned inspections. Tarpø et al [18] showed that the modal expansion technique applies to a 'linear' subsystem within a nonlinear and time-varying system with general viscous damping. The concept was proven both numerically and in a laboratory test with two friction-coupled scaled platforms.
At the end of the 2010s, machine learning was introduced to the field: Lu et al [19] used pattern recognition to estimate strain, and Deng et al [20] applied learning to modal expansion. These techniques are data driven, so the system model is created directly from data. Data-driven virtual sensing is still at an early stage, and the potential of these techniques is scarcely studied in the literature.
In this paper, we will study supervised learning and data-driven strain estimation on an operating offshore wind turbine for 2 months. Data-driven system models consist of relationships between the states (input and output) of a system, and these develop purely from data without any explicit knowledge of the system [21]. Thus, data-driven strain estimation enables a transformation into strain if the system model is based on both other sensors and strain gauges. The reader should note that the data-driven system model is limited by the data upon which it is based. Therefore, virtual sensors are restricted to the instrumented locations from the training dataset. The reasoning behind data-driven strain estimation is that strain gauges may lose their reliable performance over long-term monitoring, especially in an offshore environment. Thus, we can utilise temporary strain gauges along with geophones or accelerometers to establish a data-driven system model. This enables us to estimate the strain history in the same locations as the temporary strain gauges using only the geophones or accelerometers. In this paper, we utilise a mere 40 min of training data to establish a data-driven system model that can estimate the strain history over 2 months with high precision.
The scope of this paper is to propose a new data-driven virtual sensing technique for stress estimation that contributes to a more sustainable and stable operation of wind turbines. The proposed technique is applied to the data collected in a long-term structural health monitoring (SHM) campaign to improve the robustness of the strain measurements over time. It is well known that vibration-based SHM is a reliable and efficient structural assessment tool for detecting early-stage structural damage, extending the lifetime of the monitored structures and reducing structural maintenance costs by avoiding unnecessary on-site inspections. In this paper, the proposed technique is verified for stress estimation by an implementation in the dynamic and broad frequency range on an offshore wind turbine. We organise the remainder of the paper as follows: We introduce the theory of data-driven strain estimation using principal component analysis (PCA) in Section 2. Section 3 presents the study of data-driven strain estimation on an operating offshore wind turbine during a 2-month monitoring campaign.

2 | STRAIN ESTIMATION
In this paper, we follow the flow chart layout from Tarpø [2] (see Figure 1). Following this terminology, we require three components to synthesise a virtual sensor: the system model, the physical sensors and the process model. Generally, the performance of the virtual sensors depends on the combination of physical sensors, system model and process model, which complicates any evaluation of a virtual sensor. Any potential error, including measurement errors and signal-to-noise ratio of the physical sensors, modelling errors of the system model, the sensitivity of the reduced system model, processing errors in the process model, and violations of the assumptions for the system and process model, can propagate and transfer into the virtual sensor. Hence, this complex network of potential errors determines the performance and quality of the virtual sensor.
In the mathematical modelling of the system model, we can model the system directly on the available data from our physical sensors, giving either fully data-driven (empirical) system models or partially data-assisted system models [19,20,22]. This process requires temporary reference sensors at the desired locations to train the virtual sensors. After the creation of the system model, we can remove the temporary reference sensors and replace them with the virtual sensors. Data-driven system models build relationships within the data, while for the data-assisted system model the data calibrate and assist the mathematical modelling. The hypothesis is that these system models contain fewer modelling errors since they are based on data from the system. The reader should note that the fully data-driven system model is limited by the data upon which it is based. Thus, we are unable to extend the system model beyond the data without adding information to the system model.

2.1 | Data-driven system model
There are many ways to create a data-driven or partially data-assisted system model, such as machine learning, PCA and transformation matrices created directly from data. Generally, these applications require training datasets to set up the system model, and the quality of the system model depends on how well the training dataset represents the actual conditions. Examples of data-driven system models can be found in Lu et al and Deng et al [19,20]. In this paper, we will apply PCA to create a transformation matrix that transforms our displacement directly into the strain response in the same locations as we had strain gauges in the training dataset.
The idea behind PCA is essentially a dimensionality reduction of data to a set of uncorrelated principal components that are ordered by their contribution to the original data [23]. Furthermore, we can regard PCA as an unsupervised version of linear regression [21]. In this paper, however, we use PCA in a supervised learning approach to uncover a mapping between the displacement of a system (the input) and the strain response of that system (the output) based on training data.
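First, we stack the displacement vector, $y(t) \in \mathbb{R}^{n_y}$, and the strain response, $\varepsilon(t) \in \mathbb{R}^{n_\varepsilon}$:
$$y_c(t) = \begin{bmatrix} y(t) \\ \varepsilon(t) \end{bmatrix}$$
where $y_c(t) \in \mathbb{R}^{N}$ is the stacked response vector and $N = n_y + n_\varepsilon$.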
This stacked vector, $y_c(t)$, lies in a subspace, $\mathcal{V}$, which is spanned by the column vectors, $v_i$. Furthermore, we assume that these vectors are unknown.
These column vectors form a basis for the subspace $\mathcal{V}$, and we collect them as the columns of a transformation matrix, $V$.
We can express the stacked response vector, $y_c(t)$, as a linear combination of these column vectors since they span its subspace.
We want to estimate these column vectors by applying PCA. First, we calculate the covariance matrix of the stacked response:
$$C_{y_c} = E\left[ y_c(t)\, y_c(t)^T \right]$$
where $E[\cdot]$ denotes the expectation operator and $(\cdot)^T$ denotes the transpose of a vector or matrix.
Then we apply singular value decomposition to the covariance matrix, $C_{y_c}$, which is symmetric:
$$C_{y_c} = U S U^T$$
where $S \in \mathbb{R}^{N \times N}$ is a diagonal matrix holding the singular values in descending order and $U \in \mathbb{R}^{N \times N}$ holds the corresponding singular vectors as column vectors.
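The covariance and decomposition steps can be sketched numerically. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the helper name `pca_of_stacked_response`, the sample-mean estimate of the expectation, the mean removal and the synthetic two-component data are all chosen here for illustration.

```python
import numpy as np


def pca_of_stacked_response(y, eps):
    """Singular vectors U and singular values s of the covariance matrix
    of the stacked response (hypothetical helper).

    y   : (n_y, T) displacement samples
    eps : (n_eps, T) strain samples
    """
    y_c = np.vstack([y, eps])                     # stacked response, shape (N, T)
    y_c = y_c - y_c.mean(axis=1, keepdims=True)   # assumption: remove the mean
    cov = y_c @ y_c.T / y_c.shape[1]              # sample estimate of E[y_c y_c^T]
    U, s, _ = np.linalg.svd(cov)                  # s is sorted in descending order
    return U, s


# Synthetic data: 3 displacement and 2 strain channels driven by 2 components.
rng = np.random.default_rng(0)
basis = rng.standard_normal((5, 2))               # hypothetical basis vectors
response = basis @ rng.standard_normal((2, 5000))
U, s = pca_of_stacked_response(response[:3], response[3:])
```

In this synthetic case only the first two singular values are significant, which mirrors how the singular values reveal the dimension of the subspace.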
The singular vectors are approximately equivalent to the transformation matrix, $V \approx U$, and they are estimates of the column vectors, $v_i$. Furthermore, the singular vectors have components related to displacement and strain:
$$U = \begin{bmatrix} U_y \\ U_\varepsilon \end{bmatrix}$$
where $U_y \in \mathbb{R}^{n_y \times N}$ is the part of the singular vectors related to displacement and $U_\varepsilon \in \mathbb{R}^{n_\varepsilon \times N}$ is the part related to the strain response.
We can use the singular values, $S$, to detect the number of principal components needed to represent the subspace of the stacked response vector since the singular values show the contribution of each column vector, $u_n$ [23]. Singular values below a certain threshold indicate that the corresponding vectors contribute insignificantly to the response and can be removed. An additional indicator is the condition number of the singular vectors related to the displacement, $U_y$, which indicates the stability of the model [24]. There is a trade-off between precision and stability, expressed through the number of principal components (indicated by the singular values) and the condition number. We can truncate the singular vectors to the first $n$ components.
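One possible way to automate this choice is sketched below. The text gives no numerical thresholds, so `rel_tol` and `max_cond` are hypothetical values, and the rule of shrinking the number of components until the condition number is acceptable is an assumption about how the trade-off could be handled, not the procedure used in the paper.

```python
import numpy as np


def select_components(U, s, n_y, rel_tol=1e-3, max_cond=100.0):
    """Pick the number of singular vectors to keep (hypothetical rule).

    U        : (N, N) singular vectors, displacement rows first
    s        : (N,) singular values in descending order
    n_y      : number of displacement channels
    rel_tol  : assumed threshold on s / s[0]
    max_cond : assumed upper limit on the condition number of U_y
    """
    above = int(np.sum(s / s[0] > rel_tol))      # significant components
    n_max = max(1, min(above, n_y - 1))          # keep fewer vectors than sensors
    for n in range(n_max, 0, -1):
        # Condition number of the displacement part of the truncated vectors.
        if np.linalg.cond(U[:n_y, :n]) <= max_cond:
            return n
    return 1


U_demo = np.eye(4)
s_demo = np.array([1.0, 0.5, 1e-6, 1e-8])
n_keep = select_components(U_demo, s_demo, n_y=3)
```

Lowering `max_cond` favours stability, while lowering `rel_tol` admits more components and favours precision.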
In this paper, we propose a displacement-to-strain matrix based on the truncated singular vectors and linear regression:
$$H = U_\varepsilon U_y^{+}$$
where $H \in \mathbb{R}^{n_\varepsilon \times n_y}$ is the displacement-to-strain matrix and $(\cdot)^{+}$ denotes the Moore-Penrose inverse.
This transformation matrix, $H$, holds the relationships between the displacement and the strain response of the system; therefore, it enables a mapping between these quantities. Thus, we can estimate the strain response in any other dataset or continuously in real time as
$$\hat{\varepsilon}(t) = H\, y(t)$$
Here, the singular vectors form the system model, while the pseudo-inverse is the process model that transforms displacement into strain.
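To make the mapping concrete, the sketch below trains a displacement-to-strain matrix on synthetic data and applies it to unseen data. It assumes the matrix is formed from the strain part of the truncated singular vectors multiplied by the pseudo-inverse of the displacement part; the function name, channel counts and random data are illustrative assumptions.

```python
import numpy as np


def displacement_to_strain_matrix(U, n_y, n_keep):
    """H = U_eps @ pinv(U_y) from the first n_keep singular vectors."""
    U_y = U[:n_y, :n_keep]        # rows related to displacement
    U_eps = U[n_y:, :n_keep]      # rows related to strain
    return U_eps @ np.linalg.pinv(U_y)


# Training data: 4 displacement and 2 strain channels, 2 components.
rng = np.random.default_rng(1)
basis = rng.standard_normal((6, 2))                  # hypothetical system
train = basis @ rng.standard_normal((2, 4000))
cov = train @ train.T / train.shape[1]
U, s, _ = np.linalg.svd(cov)
H = displacement_to_strain_matrix(U, n_y=4, n_keep=2)

# Validation data from the same system: use displacement only.
valid = basis @ rng.standard_normal((2, 1000))
eps_hat = H @ valid[:4]                              # estimated strain
eps_ref = valid[4:]                                  # reference strain
```

Because the synthetic data are noise free and the sensor set is redundant (four displacement channels for two components), the estimate reproduces the reference strain; with measured data the agreement depends on noise and on how representative the training data are.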
Since the proposed technique is based on regression, the reader should note that it is similar to the modal expansion technique [8-10,12-14,16-18], which utilises mode shapes instead of singular vectors. The transformation matrix works as a subspace reduction that removes any noise or response perpendicular to the new subspace of the singular vectors. Moreover, we must avoid an ill-posed inverse problem; hence, the transformation matrix requires a redundant sensor network with more physical sensors than included singular vectors. Fortunately, we can partly bypass this limitation by separating the data into frequency bands using complementary filters and applying a new subset of the singular vectors to each band. In that case, the data are divided into frequency bands whenever the condition number of all relevant singular vectors related to displacement is too large. Furthermore, frequency ranges with low levels of response should be separated into individual frequency bands since the principal components from higher response levels will otherwise dominate.
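The band separation can be illustrated with a simple frequency-domain split. The text does not specify the filter design, so the ideal FFT masks below are an assumption standing in for the complementary filters; they share the defining property that the bands add up to the band-limited signal. The function name and test signal are hypothetical.

```python
import numpy as np


def split_into_bands(x, fs, edges):
    """Split x into bands [edges[k], edges[k+1]) with ideal FFT masks.

    x     : (..., T) time signals
    fs    : sampling frequency in Hz
    edges : increasing band edges in Hz; use np.inf for an open upper band
    """
    n = x.shape[-1]
    spectrum = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)       # non-overlapping masks
        bands.append(np.fft.irfft(spectrum * mask, n=n, axis=-1))
    return bands


fs = 20.0
t = np.arange(2000) / fs
tones = [np.sin(2 * np.pi * f0 * t) for f0 in (0.5, 2.0, 6.0)]
signal = np.sum(tones, axis=0)
bands = split_into_bands(signal, fs, edges=[0.2, 0.7, 4.5, np.inf])
```

A separate transformation matrix would then be trained for each band, and the estimated strain is the sum of the per-band estimates.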

2.2 | Quality measures
There exist no rules or guidelines for evaluating virtual sensors. We can evaluate a virtual sensor like a physical sensor in terms of range, repeatability, sensitivity and accuracy. Furthermore, we should also evaluate the robustness and reliability of the virtual sensors. The most common approach is, however, to compare the output of virtual sensors with a set of reference sensors through different quality measures, each with its own strengths and weaknesses.

2.2.1 | Time domain quality measures
The coefficient of determination ($R^2$) [25,26] is a popular quality measure in statistics and modal validation to check the correlation between a reference and an estimated quantity. The metric equals one minus the mean square error (MSE) of the two quantities normalised by the variance of the reference quantity. Whereas the value of the MSE is relative to a specific dataset, the coefficient of determination is comparable between datasets. The coefficient of determination accounts for both amplitude differences and the general correlation between the quantities. A value of 1 indicates perfect correlation with the same amplitudes. This metric should not be confused with the Pearson correlation coefficient.
$$R_i^2 = 1 - \frac{E\left[ \left( \varepsilon_i(t) - \hat{\varepsilon}_i(t) \right)^2 \right]}{\mathrm{Var}\left[ \varepsilon_i(t) \right]}$$
where $\varepsilon_i(t)$ is the measured strain response for the $i$th strain gauge, $\hat{\varepsilon}_i(t)$ is the estimated strain response at the same location and $\mathrm{Var}[\cdot]$ denotes the variance operator.
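A short sketch shows how this measure behaves, assuming it is computed as one minus the MSE divided by the variance of the reference. The function name and the sinusoidal test signal are illustrative. The example also contrasts the measure with the Pearson correlation coefficient, which ignores amplitude errors.

```python
import numpy as np


def coefficient_of_determination(eps, eps_hat):
    """R^2 of an estimate eps_hat against the reference eps."""
    mse = np.mean((eps - eps_hat) ** 2)
    return 1.0 - mse / np.var(eps)


t = np.linspace(0.0, 1.0, 1000, endpoint=False)
eps = np.sin(2 * np.pi * 5 * t)                       # reference strain
r2_perfect = coefficient_of_determination(eps, eps)
r2_half = coefficient_of_determination(eps, 0.5 * eps)  # amplitude halved
pearson_half = np.corrcoef(eps, 0.5 * eps)[0, 1]      # insensitive to amplitude
```

Halving the amplitude lowers the coefficient of determination while the Pearson coefficient stays at one, which is why the two should not be confused.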

2.2.2 | Fatigue damage quality measures
It is equally important to evaluate the stress estimation in terms of fatigue damage when the estimation is intended for a fatigue analysis. Using the SN curve, the fatigue life is expressed as the following function:
$$N_i = C\, \sigma_i^{-m}$$
where $N_i$ is the fatigue life for the given stress amplitude, $\sigma_i$, $m$ is the 'slope' of the SN curve and $C$ is the fatigue capacity (or intercept on the $N$ axis at a stress amplitude of 1).
When the stress amplitude varies, we must apply cycle counting [27], such as the rainflow counting technique [28], to count all stress cycles. According to the Palmgren-Miner rule, the accumulated fatigue damage is a summation of all partial damage caused by each stress cycle [27]:
$$D = \sum_{i=1}^{n_{cycles}} \frac{1}{N_i} = \frac{1}{C} \sum_{i=1}^{n_{cycles}} \sigma_i^{m}$$
where $n_{cycles}$ is the number of stress cycles.
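The damage summation can be sketched as follows, assuming the cycles have already been counted (for example by a rainflow routine, which is not implemented here). The function names and the values of `C` and `m` are hypothetical and serve only to show the arithmetic of a single-slope SN curve combined with the Palmgren-Miner rule.

```python
import numpy as np


def fatigue_life(sigma, C, m):
    """Cycles to failure for stress amplitude sigma on a single-slope SN curve."""
    return C * np.asarray(sigma, dtype=float) ** (-m)


def miner_damage(sigma, counts, C, m):
    """Accumulated damage: sum of counted cycles divided by fatigue life."""
    counts = np.asarray(counts, dtype=float)
    return float(np.sum(counts / fatigue_life(sigma, C, m)))


# Hypothetical counted cycles: 1000 full cycles and one half cycle.
damage = miner_damage(sigma=[100.0, 50.0], counts=[1000.0, 0.5], C=1.0e12, m=3.0)
```

Half cycles from the counting step enter with a count of 0.5, so the same function handles full and half cycles.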
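For this purpose, we apply the normalised error of fatigue damage (NEFD) [16], which is based on the SN curve (excluding the effect of a bilinear SN curve) and the Palmgren-Miner rule:
$$\eta_n = \frac{\hat{D}_n - D_n}{D_n} = \frac{\sum_{j=1}^{\hat{n}_{cycles}} \Delta\hat{\sigma}_j^{m} - \sum_{j=1}^{n_{cycles}} \Delta\sigma_j^{m}}{\sum_{j=1}^{n_{cycles}} \Delta\sigma_j^{m}} = \frac{\sum_{j=1}^{\hat{n}_{cycles}} \Delta\hat{\varepsilon}_j^{m} - \sum_{j=1}^{n_{cycles}} \Delta\varepsilon_j^{m}}{\sum_{j=1}^{n_{cycles}} \Delta\varepsilon_j^{m}}$$
where $D_n$ and $\hat{D}_n$ are the accumulated fatigue damage for the measured and estimated signal, respectively, in the $n$th fatigue location; $n_{cycles}$ and $\hat{n}_{cycles}$ are the total number of counted cycles for the measured and estimated signal, respectively; $\Delta\sigma_j$ and $\Delta\hat{\sigma}_j$ denote the stress range from cycle counting; and $\Delta\varepsilon_j$ and $\Delta\hat{\varepsilon}_j$ are the strain range from cycle counting. Here, $\eta_n = 0$ indicates a correct strain estimation in terms of fatigue damage, while a negative value suggests an underestimation and a positive value suggests an overestimation of fatigue damage. In this paper, we will apply $m = 3$, corresponding to welded steel structures without corrosion protection.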
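As a hedged illustration, the sketch below evaluates the NEFD from already counted strain ranges, assuming a single-slope SN curve so that the fatigue capacity and the elastic modulus cancel in the ratio. The function name and the example ranges are hypothetical.

```python
import numpy as np


def nefd(ranges_measured, ranges_estimated, m=3.0):
    """Normalised error of fatigue damage from counted strain ranges."""
    d_meas = np.sum(np.asarray(ranges_measured, dtype=float) ** m)
    d_est = np.sum(np.asarray(ranges_estimated, dtype=float) ** m)
    return (d_est - d_meas) / d_meas


measured = np.array([10.0, 20.0, 30.0])      # hypothetical counted ranges
eta_exact = nefd(measured, measured)
eta_under = nefd(measured, 0.9 * measured)   # ranges underestimated by 10%
```

With $m = 3$, a uniform 10% underestimation of the ranges gives an NEFD of $0.9^3 - 1 = -0.271$, which shows how strongly amplitude errors are amplified in fatigue damage.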

2.2.3 | Normalised error in equivalent stress range
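It is often preferable to assess fatigue damage for variable amplitude loading in the form of a damage equivalent stress range. This single constant-amplitude stress range leads to the same fatigue damage as the full set of counted stress ranges [27]. The normalised error of the equivalent stress range is the difference between the estimated and the measured equivalent stress range, normalised by the measured one.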
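The defining equation did not survive in this copy, so the following sketch rests on an assumption: the equivalent range is taken as the $m$th root of the summed $m$th powers of the counted ranges divided by a reference number of cycles, `n_eq`, and the error is normalised by the measured value. The function names and the value of `n_eq` are hypothetical.

```python
import numpy as np


def equivalent_range(ranges, m=3.0, n_eq=1.0e7):
    """Damage equivalent range for a reference number of cycles n_eq."""
    ranges = np.asarray(ranges, dtype=float)
    return (np.sum(ranges ** m) / n_eq) ** (1.0 / m)


def equivalent_range_error(ranges_measured, ranges_estimated, m=3.0, n_eq=1.0e7):
    """Normalised error of the estimated equivalent range (assumed definition)."""
    ref = equivalent_range(ranges_measured, m, n_eq)
    est = equivalent_range(ranges_estimated, m, n_eq)
    return (est - ref) / ref


measured = np.array([10.0, 20.0, 30.0])      # hypothetical counted ranges
err = equivalent_range_error(measured, 0.9 * measured)
```

Under this assumed definition the reference number of cycles cancels in the ratio, and the error is linear in the amplitude error, unlike the fatigue damage error, which scales with the power $m$.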

3.2 | Wind turbine
The studied structure is an offshore wind turbine (Vestas V90, 3 MW) positioned in the Great Belt near Sprogø in Denmark with a gravity foundation (see Figure 2). The wind turbine is equipped with four Sigicom V12 triaxial geophones attached approximately 17 m above the foundation, while four strain gauges are placed 0.75 m above the foundation (see Figure 3A for the elevation of the sensors). The strain gauges measure along the longitudinal direction of the tower, and they are placed in pairs on opposite sides of the cross section (see Figure 3B). The reader should note that the position of the sensors limits the measured response to the tower modes. Furthermore, the geophones have a frequency range (bandwidth) with a low cut-off frequency of approximately 0.2 Hz, and measurement noise will dominate at lower frequencies. Therefore, the physical sensors restrict this case study to dynamic strain estimation above 0.2 Hz.
The monitoring campaign ran over a 2-month period (see Figure 4) with a sampling frequency of 20 Hz. In this study, we apply a high-pass filter with a cut-off frequency of 0.2 Hz to all data to exclude the quasi-static response of the wind turbine. The geophones are calibrated using digital correlation [30], and we use the vertical geophones to reduce the tilt effects on the geophones [31]. We use the integration theorem for the Fourier transformation [32] to integrate the signals from the geophones into displacement. Afterwards, we divide the data into three frequency bands, 0.2-0.7, 0.7-4.5 and 4.5-10 Hz, using complementary filters. We apply these filters to avoid an ill-posed problem for the data-driven system model, and the use of multiple frequency bands increases stability in the system model.
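The integration step can be sketched in the frequency domain, where integration corresponds to division by $i\omega$. The sketch assumes that the geophone signal is a velocity and folds the 0.2 Hz high-pass into the same operation as an ideal cut-off; the filter type, the function name and the test signal are assumptions, not details taken from the campaign.

```python
import numpy as np


def integrate_to_displacement(velocity, fs, f_cut=0.2):
    """Integrate velocity to displacement by dividing the spectrum by i*omega.

    Frequency content below f_cut (Hz) is discarded, which also avoids the
    division by zero at 0 Hz.
    """
    n = velocity.shape[-1]
    spectrum = np.fft.rfft(velocity, axis=-1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    keep = freqs >= f_cut
    out = np.zeros_like(spectrum)
    out[..., keep] = spectrum[..., keep] / (1j * 2.0 * np.pi * freqs[keep])
    return np.fft.irfft(out, n=n, axis=-1)


fs = 20.0
t = np.arange(2000) / fs
velocity = np.cos(2 * np.pi * 1.0 * t)            # 1 Hz test velocity
displacement = integrate_to_displacement(velocity, fs)
expected = np.sin(2 * np.pi * 1.0 * t) / (2 * np.pi * 1.0)
```

For measured, non-periodic records the ideal cut-off and the implicit periodicity of the FFT introduce end effects, so windowing or a tapered filter would normally be considered.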
In this monitoring campaign, we found intermittent spikes in the measured strain response when the wind turbine produced an average power near zero or negative (see Figure 5A). Strain gauges 1 and 3, and likewise 2 and 4, should measure the same quantity with opposite sign. The intermittent spikes, however, have the same sign in both gauges of a pair. We therefore conclude that the spikes are noise, which could be caused by electrical interference on the strain gauges that only occurs at or around negative power production. We reduce this noise by subtracting the signals of each pair of strain gauges that measure the same quantity with opposite sign and dividing by two, which yields the calibrated strain gauges (see Figure 5B).
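The effect of the pairwise subtraction can be shown on synthetic signals. In this sketch the gauge signals, the spike positions and the function name are invented for illustration; the only element taken from the text is the operation itself, half the difference of two gauges on opposite sides of the cross section.

```python
import numpy as np


def differential_strain(eps_a, eps_b):
    """Half the difference of two opposite gauges: keeps the antisymmetric
    (bending) part and cancels anything common to both gauges."""
    return 0.5 * (np.asarray(eps_a) - np.asarray(eps_b))


t = np.linspace(0.0, 10.0, 201)
bending = np.sin(2 * np.pi * 0.3 * t)     # true strain, opposite sign per side
spikes = np.zeros_like(t)
spikes[[50, 120]] = 5.0                   # common-mode noise spikes
gauge_1 = bending + spikes
gauge_3 = -bending + spikes
cleaned = differential_strain(gauge_1, gauge_3)
```

Any disturbance that is not common to both gauges would survive this operation, so the approach relies on the spikes having the same sign and size in each pair.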

3.3 | Virtual sensing and strain estimation
We use 40 min of training data to create the data-driven system model. Figure 4 marks the training data, which have a wind direction of approximately 180° and a low power production. We apply PCA as explained in Section 2.1 to generate the system model in the form of a transformation matrix in each frequency band. We utilise six principal vectors in the first frequency band, six in the second band and two in the third band to generate the three transformation matrices. We base the number of singular vectors on the singular values of each frequency band, corresponding to the expected number of dominating modes. Figure 6 shows the results of the trained estimation of strain along with the measured strain for the training dataset.
For validation of the data-driven system model, we use a total of 6100 datasets with a time length of 10 min equally distributed over the entire monitoring campaign of 2 months, corresponding to 100 datasets each day. We present the results of the strain estimation in box plots for three quality measures: Figure 7A shows the coefficient of determination ($R^2$), Figure 7B shows the error of equivalent strain range and Figure 7C shows the NEFD.
We look at the dataset with the worst strain estimation to analyse the result (see Figure 8). For this specific dataset, the wind turbine does not produce power (parked conditions) and the standard deviation of the measured strain is very low; therefore, the signal-to-noise ratio decreases. Figure 8A shows that the measured strain response is dominated by the noise floor at −90 dB and the structural response only exceeds the noise floor at 0.35 Hz, whereas the estimated strain response is mostly located below this noise floor. Therefore, the virtual sensors result in an underestimation of fatigue damage in comparison to the strain gauges. In conclusion, the virtual sensor has a higher signal-to-noise ratio than the strain gauges. When the wind turbine does not produce power, the strain gauges act as poor reference sensors since measurement noise dominates their signal.

FIGURE 5 Measured strain response when the wind turbine generates negative power: (A) the four strain gauges experience intermittent noise spikes and (B) the calibrated measured strain response with reduction of the intermittent noise spikes

FIGURE 7 Tufte box plot with the 2nd, 25th, 50th, 75th and 98th percentile for the estimated and measured strain for the two strain gauges

FIGURE 8 The dataset corresponding to the worst estimation of strain with measured strain (black line) and estimated strain (red line): (A) frequency domain with the first three singular values of the spectral density matrix calculated using Welch averaging method with segments of 2048 data points and 50% overlap and (B) time domain with a zoom on the strain response

FIGURE 9 Without datasets with negative power production, Tufte box plot with the 2nd, 25th, 50th, 75th and 98th percentile for the estimated and measured strain for the two strain gauges

FIGURE 2 Wind turbine at Sprogø near the Great Belt Bridge in Denmark

FIGURE 3 Position of sensors on the wind turbine. (A) Illustration of the elevation of the sensors on the wind turbine where the red line illustrates the elevation of the four triaxial geophones while the blue line shows the elevation of strain gauges that measure strain along the longitudinal direction of the tower. (B) Cross section of wind turbine with heading, position and numbering of both geophones and strain gauges in relationship to the cardinal directions

FIGURE 4 Monitoring campaign over a 2-month period with the measured wind speed, wind direction and the production of power. The training data are marked with a red line

FIGURE 6 Training dataset corresponding to the marking in Figure 4 with measured strain (black line) and estimated strain (red line): (A) frequency domain with the first three singular values of the spectral density matrix calculated using Welch averaging method with segments of 2048 data points and 50% overlap and (B) time domain with a zoom on the strain response