A convolutional neural network deep learning method for model class selection

The response-only model class selection capability of a novel deep convolutional neural network method is examined herein in a simple, yet effective, manner. Specifically, the responses from a unique degree of freedom, along with their class information, train and validate a one-dimensional convolutional neural network. In doing so, the network selects the model class of new and unlabeled signals without the need for system input information or full system identification. An optional physics-based algorithm enhancement is also examined, in which the Kalman filter fuses the system response signals through the kinematic constraints of the acceleration and displacement data. Importantly, the method is shown to select the model class under slight signal variations attributed to damping or hysteresis behavior, on both linear and nonlinear dynamic systems, as well as on a 3D building finite element model, providing a powerful tool for structural health monitoring applications.

Beck 4 introduced an algorithm to investigate model identifiability in structural model updating using a network of trajectories which finds all other output-equivalent optimal models. Importantly, Ching and Chen 5 developed a simulation-based approach for simultaneous Bayesian model updating, model class selection, and model averaging. Muto and Beck 6 later implemented the transitional Markov chain Monte Carlo method for nonlinear structures under seismic loading. Additionally, Cheung and Beck 7 proposed a general method for calculating the model evidence based on the posterior samples of the Markov chain Monte Carlo approach, while Beck 8 investigated Laplace's method of asymptotic approximation and the Markov chain Monte Carlo methods for a structural health monitoring benchmark problem.
Furthermore, Raftery et al. 9 developed the method of dynamic model averaging for online model class selection. Chatzi et al. 10 proposed and experimentally validated a twofold criterion based on the smoothness of the parameter prediction and the accuracy of the estimation. Yuen and Mu 11 developed a novel model class selection component within the extended Kalman filter algorithm, to simultaneously provide the model class selection and the parametric identification in a real-time manner. Importantly, Kontoroupi and Smyth 12 explored how Bayesian model selection and the unscented Kalman filter scheme for joint state and parameter estimation can be integrated into a single method using each model's probability-plausibility computation. Further related developments can be found in the literature. 16-25 However, the current model class selection methodologies, apart from the class selection, also incorporate the system identification for each model. The main challenge here derives from the effort of performing this task for partially unobservable systems, such as large systems under very limited information, or systems with unknown inputs. Similarly, this task is not trivial in empirical systems with nonlinear behavior where no acceptable closed-form equation representation exists.
A way to address those challenges is examined here using a generalized response-only and (after the training) real-time procedure based on deep learning capabilities, which automatically selects the system model class without having to identify its parameters, measure and estimate all dynamic states, or know the system input. The convolutional neural network approach is therefore employed from the deep learning library of methods. Importantly, convolutional neural networks have already shown an impressive performance in selecting the class of visual imagery data 26 via their ability to recognize patterns, and a wide range of related applications can be found in the literature. 28-44 The ability to provide the model class selection using a unique degree of freedom (DOF) response measurement, without system identification, and by using a neural network classification approach makes this approach distinctive from the current methodologies.
The methodology, specifically, results in a fast and accurate nonparametric vibration-based tool for model class selection, which directly classifies the model based solely on response signals. An algorithm enhancement is also investigated, in which the dynamic state estimates of a Kalman filter, as developed by Smyth and Wu 45 and implemented as a physics-enhanced kinematic constraint, 46 train a network to recognize their patterns and classify the new and unlabeled signals. In this way, the advantages of Kalman filtering 47-55 are explored to improve the performance of the convolutional neural network. Due to the convolutional neural network's ability to learn and extract the optimal features with proper training, the proposed approach achieves an impressive model class selection accuracy despite the response-only nature of the signals.
The work is organized as follows: the Bayesian model class selection and its limitations are overviewed in Section 2. In Section 3, the standard convolutional neural network architecture is provided, as well as a comparison of the one-dimensional and the multi-dimensional convolutional neural network versions with a focus on model class selection. In Section 4, the Kalman filter fusion is formulated for response-only, unknown-input, and unknown-model-class systems. Section 5 provides the summary and the detailed algorithmic tables. Importantly, Sections 6, 7, and 8 investigate numerical applications on both linear and nonlinear dynamic systems, as well as on a 3D building finite element model. Subsequently, Section 9 presents a discussion, future research suggestions, and a sensitivity analysis for the training process. Finally, the conclusions are provided in Section 10.

BAYESIAN MODEL CLASS SELECTION
To select the model class $\mathcal{M}_i$ in a Bayesian framework, one needs to use the prior probability distribution of the model classes, and then assess their posterior probability plausibility. Let $\mathbb{M}$ be the space of the models $\mathbb{M} : \{\mathcal{M}_1, \ldots, \mathcal{M}_m\}$. The posterior probability $P(\mathcal{M}_i \mid D, \mathbb{M})$ of the model class $\mathcal{M}_i$ is defined using the Bayes theorem as:

$$P(\mathcal{M}_i \mid D, \mathbb{M}) = \frac{p(D \mid \mathcal{M}_i)\,P(\mathcal{M}_i \mid \mathbb{M})}{p(D \mid \mathbb{M})} \tag{1}$$

where $P(\mathcal{M}_i \mid \mathbb{M})$ is the prior probability of $\mathcal{M}_i$, $D$ is the measurement vector, and $p(D \mid \mathcal{M}_i)$ is the evidence given the model $\mathcal{M}_i$. The denominator is replaced by the summation of the prior probability and the likelihood over every model class, written as:

$$p(D \mid \mathbb{M}) = \sum_{j=1}^{m} p(D \mid \mathcal{M}_j)\,P(\mathcal{M}_j \mid \mathbb{M}) \tag{2}$$

Let $\theta_i \in \Theta_i$ be the parameter vector of the model $\mathcal{M}_i$. The posterior probability distribution $p(\theta_i \mid D, \mathcal{M}_i)$ of $\theta_i$ is written as:

$$p(\theta_i \mid D, \mathcal{M}_i) = \frac{p(D \mid \theta_i, \mathcal{M}_i)\,p(\theta_i \mid \mathcal{M}_i)}{p(D \mid \mathcal{M}_i)} \tag{3}$$

where $p(D \mid \theta_i, \mathcal{M}_i)$ is the likelihood given the parameters $\theta_i$ and the model $\mathcal{M}_i$, and $p(\theta_i \mid \mathcal{M}_i)$ is the prior probability density function of $\theta_i$ given the model $\mathcal{M}_i$. Here, computing the evidence $p(D \mid \mathcal{M}_i)$ for each model $\mathcal{M}_i$ is not trivial. Specifically, the high-dimensional integral is usually analytically intractable, for instance when non-conjugate prior probabilities and/or latent variables exist.
To this end, stochastic simulation methods are used. Particularly, the Markov chain Monte Carlo methods generate samples from the posterior distribution, and then compute the evidence using the following identity of a rearranged Bayes theorem for every $\theta_i$:

$$\ln p(D \mid \mathcal{M}_i) = \ln p(D \mid \theta_i, \mathcal{M}_i) + \ln p(\theta_i \mid \mathcal{M}_i) - \ln p(\theta_i \mid D, \mathcal{M}_i) \tag{4}$$

where the natural logarithm $\ln(\cdot)$ is applied to avoid numerical overflows. Equation (4) is also written as 6 :

$$\ln p(D \mid \mathcal{M}_i) = E\big[\ln p(D \mid \theta_i, \mathcal{M}_i)\big] - E\left[\ln \frac{p(\theta_i \mid D, \mathcal{M}_i)}{p(\theta_i \mid \mathcal{M}_i)}\right] \tag{5}$$

where the expectations are taken with respect to the posterior, the first term measures the posterior average data fit of the parameter set $\theta_i$, and the penalty-type second term represents the Kullback-Leibler divergence 56 between the parameter posterior and prior probability distributions. Finally, the identification with the highest evidence $p(D \mid \mathcal{M}_i)$ 12 or the least Kullback-Leibler divergence 13 is selected as the one with the most plausible model class.
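As an illustration of Equation (5), the log-evidence of a model class can be approximated from posterior samples by a simple Monte Carlo average. The sketch below is a minimal example, assuming hypothetical `log_likelihood`, `log_prior`, and `log_posterior` functions for a given model class; it is not the implementation used in the cited works.

```python
import numpy as np

def log_evidence_estimate(posterior_samples, log_likelihood, log_prior, log_posterior):
    """Monte Carlo estimate of Eq. (5): posterior-average data fit minus the
    Kullback-Leibler divergence between the parameter posterior and prior."""
    data_fit = np.mean([log_likelihood(theta) for theta in posterior_samples])
    kl_divergence = np.mean([log_posterior(theta) - log_prior(theta)
                             for theta in posterior_samples])
    return data_fit - kl_divergence
```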
However, this approach requires a parametric, model-based implementation of the model class selection, which inevitably requires parameter estimation and knowledge of the system input for input-output identification. Contrastingly, in the convolutional neural network approach, a response-only, nonparametric, signal-based approach is implemented by using machine learning means to directly select the model class by recognizing signal patterns.

CONVOLUTIONAL NEURAL NETWORK ARCHITECTURE
The convolutional neural networks are a type of deep learning artificial neural network methods with an ability to recognize patterns in visual data. They are composed of multiple building blocks which automatically and adaptively learn spatial hierarchies of features. The one-dimensional convolutional neural networks (1D CNN) have been proven to be highly effective in a variety of signal processing tasks. The fundamental building block of a 1D CNN is the convolutional layer. The convolutional layer applies a set of filters to the input signal, producing a set of feature maps. The filters have a fixed size and slide over the input signal, computing a dot product at each location. In doing so, the resulting feature maps capture different aspects of the input signal, such as local trends and patterns. In practice, a 1D CNN may have multiple convolutional layers with different filter sizes and numbers of filters. Each layer can apply a different set of filters to the input signal, allowing the network to capture different aspects of the signal at different scales.
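To make the convolution operation concrete, a minimal sketch of a single 1D filter sliding over a signal is given below; the function name and the use of plain NumPy are illustrative assumptions, not part of the network implementation described in this work.

```python
import numpy as np

def conv1d_feature_map(signal, kernel, bias=0.0):
    """'Valid' 1D convolution as used in CNNs (cross-correlation): the kernel
    slides over the signal and a dot product is computed at each location."""
    n = len(signal) - len(kernel) + 1
    return np.array([signal[i:i + len(kernel)] @ kernel + bias for i in range(n)])

# Example: a two-sample differencing kernel highlights local trends in the signal.
feature_map = conv1d_feature_map(np.sin(np.linspace(0, 10, 100)), np.array([1.0, -1.0]))
```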
The examined one-dimensional convolutional neural network compares to the multi-dimensional counterparts as follows. A one-dimensional configuration fuses the feature extraction and the learning phases of the dynamic states. One-dimensional arrays are used instead of two-dimensional matrices for both the kernels and the feature maps.
Additionally, in the network architecture, the hidden neurons of the convolutional layers perform both the convolution and the sub-sampling operations. The fully-connected layers are identical to the hidden layers of the multi-layer perceptrons, where the classification task is mainly realized. Accordingly, the multi-dimensional matrix manipulations, namely the convolution and the lateral rotation, are replaced by their one-dimensional counterparts, namely the convolution and the reverse operations. Finally, the parameters for the kernel size and the sub-sampling are scalars. Importantly, this simplified structure of the convolutional neural network requires only one-dimensional convolutions and, therefore, allows a mobile and low-cost hardware implementation for near real-time applications. The algorithmic details of the 1D CNN are provided in Section 5.
A short description of the additional layers in the convolutional neural network architecture is provided. Input Layer: the layer where the input is specified. Convolutional Layer: the layer where the filters are applied to the input, usually between a subarray of the input array and the filter, and where the neurons connect to the input subarray; in this layer, the number of feature maps is also determined. Batch Normalization Layer: the layer where the normalization of the activations and gradients occurs, leading to a simpler optimization training problem; it is usually followed by a nonlinear activation function. Pooling Layer: the layer where the down-sampling operation is applied to reduce the spatial size of the feature map and to remove redundant spatial information; this allows an increase of the number of filters in deeper convolutional layers without increasing the required amount of computation per layer. Fully Connected Layer: the layer where the neurons connect to the neurons in the preceding layer to combine all the features learned by the previous layers and identify the larger patterns; importantly, the size of the last fully connected layer, which combines the features to classify the data, is equal to the number of classes in the input data. Softmax Layer: the layer where the activation function normalizes the output of the fully connected layer; the output of this layer consists of positive numbers that sum to one, which are then used as classification probabilities by the classification layer. Classification Layer: the final layer where the probabilities returned by the activation function are used to assign each input to the mutually exclusive classes and to compute the loss. Importantly, the training of the network is usually implemented by a stochastic gradient descent with a specified number of epochs, where an epoch is a full training cycle on the entire training data set.
In this work, the one-dimensional convolutional approach is applied to select the model class. The examined approach fuses both the feature extraction and the classification blocks into a single and compact learning body. The advantage is the ability to extract optimal model class-sensitive features automatically from the response-only signals.

RESPONSE-ONLY AND UNKNOWN MODEL CLASS DYNAMIC STATE ESTIMATION USING THE KALMAN FILTER
For a further improvement of the 1D CNN performance with response-only signals, when additional signals are available, the Kalman filter data fusion technique of Smyth and Wu 45,46 may be used. The Kalman filter algorithm, given a series of noisy measurements observed over time, optimally estimates the system dynamic states using a joint probability distribution over the states for each time step. The algorithm works in two steps: the first step is the prediction of the dynamic states using the dynamic process model, which also propagates the uncertainty of the dynamic states. The second, update, step incorporates the measurements to calibrate the dynamic state estimation using a weighted average strategy, where more weight is given to the estimates with higher certainty. The algorithm is recursive and it is used online and, potentially, with real-time data.
Even for simple systems, though, knowledge of the system parameters and input is needed to predict future steps. This makes the filtering of the signals unavailable when response-only and unknown model class scenarios are examined. To this end, the dynamic states are filtered using acceleration and displacement measurements 45,46 as:

$$a_m(t) = \ddot{x}(t) + \nu_a(t) \tag{6}$$

$$d_m(t) = x(t) + \nu_d(t) \tag{7}$$

where $a_m$ and $d_m$ are the acceleration and displacement measurements, respectively, and $\nu_a$ and $\nu_d$ are their associated noise. It is assumed that $\nu_a$ and $\nu_d$ are white noise Gaussian processes. By introducing the state variables $x_1 = x$ and $x_2 = \dot{x}$, Equations (6) and (7), without the noise terms, are written in matrix form as:

$$\begin{Bmatrix} \dot{x}_1(t) \\ \dot{x}_2(t) \end{Bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{Bmatrix} x_1(t) \\ x_2(t) \end{Bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} a(t) \tag{9}$$

$$d(t) = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{Bmatrix} x_1(t) \\ x_2(t) \end{Bmatrix} \tag{10}$$

If acceleration measurements are available at intervals of $\Delta t$, the process Equation (9) and the observation Equation (10) are discretized, namely:

$$\begin{Bmatrix} x_1(k+1) \\ x_2(k+1) \end{Bmatrix} = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix} \begin{Bmatrix} x_1(k) \\ x_2(k) \end{Bmatrix} + \begin{bmatrix} \Delta t^2/2 \\ \Delta t \end{bmatrix} a_m(k) \tag{13}$$

where the step $k$ stands for the $k \cdot \Delta t$ time instance.
In the approach of Equation (13), a physics-enhanced fusion of the displacement and the acceleration signals is implemented. Specifically, the object kinematics equation is employed as the system pseudo-model to provide the physical relationship between the displacement and acceleration data, without incorporating any knowledge of the actual system model and its class. Equation (13) is written more simply using the well-known body-motion equations as:

$$x_1(k+1) = x_1(k) + \Delta t\, x_2(k) + \tfrac{1}{2}\Delta t^2\, a_m(k)$$

$$x_2(k+1) = x_2(k) + \Delta t\, a_m(k)$$

where the acceleration $a_m$ is assumed to be constant between sequential steps; an assumption which does not lead to divergences due to the small value of $\Delta t$.
Overall, this fusion algorithm uses the Kalman filter which, given acceleration and displacement measurements, optimally provides the displacement and velocity dynamic states. Importantly, the displacement measurement is provided by double integration of the acceleration signal for linear systems.
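A minimal sketch of this kinematic fusion is given below, assuming the constant-acceleration discrete model of Equation (13) with the acceleration measurement as process input and the displacement measurement as observation; the function name, the default covariance values (taken from the numerical applications), and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def kinematic_kf_fusion(acc, disp, dt, q=1e-9, r=1e-3):
    """Fuse acceleration (process input) and displacement (measurement) to
    estimate the displacement and velocity states of a single DOF."""
    A = np.array([[1.0, dt], [0.0, 1.0]])       # discrete kinematic transition
    B = np.array([[0.5 * dt**2], [dt]])         # acceleration input matrix
    H = np.array([[1.0, 0.0]])                  # displacement observation matrix
    Q, R = q * np.eye(2), np.array([[r]])
    x, P = np.zeros((2, 1)), np.eye(2)
    states = []
    for a_k, d_k in zip(acc, disp):
        # prediction with the body-motion (constant acceleration) pseudo-model
        x = A @ x + B * a_k
        P = A @ P @ A.T + Q
        # update with the displacement measurement
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + K @ (np.array([[d_k]]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        states.append(x.ravel().copy())
    return np.array(states)   # columns: displacement, velocity
```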
The obtained estimates can be used instead of the raw signals to train, validate, and test the convolutional neural network for model class selection. Notably, those response-only signals are used as input data to the network, and they should not be confused with the output of the network.
Finally, the one-dimensional convolutional neural network procedure for model class selection is implemented as follows. The dynamic states of a unique set of system responses are loaded to train and validate the network. Importantly, these signals are already labeled with their model class. The one-dimensional convolutional neural network architecture is then defined, where the input size of the training data is specified along with the number of classes. Subsequently, the network training optimization algorithm is specified, which includes a mini-batch approach with an adequate number of epochs. For online purposes with a unique response training signal, the mini-batch size is set equal to 1; otherwise, larger values are also used. Once the network is trained, it is used to evaluate the new and unlabeled signals and select their model class. Importantly, no additional data, such as the system input or the system parameters, are needed.
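The overall train-and-classify flow described above can be sketched as follows, assuming a hypothetical `build_cnn` constructor that returns an (uncompiled) Keras model; the Keras API calls, the optimizer choice, and the variable names are assumptions for illustration only, not the original implementation.

```python
import numpy as np
import tensorflow as tf

# Assumed shapes: signals of fixed length from a single DOF, one channel, with
# integer labels 0..n_classes-1 indicating the model class (A, B, C, ...).
def train_and_classify(build_cnn, train_signals, train_labels, new_signals,
                       epochs=15, batch_size=1):
    """Train on labeled (possibly Kalman-filtered) response signals and select
    the model class of new, unlabeled signals."""
    model = build_cnn(input_length=train_signals.shape[1],
                      n_classes=int(np.max(train_labels)) + 1)
    model.compile(optimizer=tf.keras.optimizers.SGD(momentum=0.9),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    model.fit(train_signals[..., None], train_labels,
              epochs=epochs, batch_size=batch_size)
    probs = model.predict(new_signals[..., None])   # softmax class probabilities
    return probs.argmax(axis=1)                     # selected model class index
```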

PROCEDURE SUMMARY
The overall procedure is illustrated here, where each step is detailed in Table 1: 1. Initialize the measurement filtering (optional, for improved performance incorporating more data). Set the initial probability distributions for the dynamic states of each model class response signal.
TABLE 1 Kalman filter convolutional neural network (Kalman filter C-Net).
Step 1 (optional): Initialize the dynamic state estimation: set the time step $k = 0$ and the initial state estimate $\hat{x}_0 = E[x_0]$ ($E$ stands for expectation).
Step 2 (optional): Predict and estimate the dynamic states: $\hat{x}^-_{k+1} = A\,\hat{x}_k + B\,a_m(k)$ (prediction), followed by the measurement update.
Step 3: Initialize randomly all weights of the neural network. Forward-propagate the input data: $y_k^h = f(x_k^h)$ and $s_k^h$ = down-sampled $y_k^h$. Compute the delta error at the output layer and back-propagate it.
Step 4: Post-process to compute the weight and bias sensitivities. Update the weights and biases with the accumulation of sensitivities.
Step 5: Move to each next layer until the network is fully trained. Classify the unlabeled signals from Step 3 using the trained network.
2. Filter the dynamic states online (optional, for improved performance incorporating more data). Predict the dynamic states using the acceleration measurements and the discrete state-space model. Estimate the dynamic states using the displacement measurements. The displacement measurements may have a different rate than the acceleration measurements. 45,46 Importantly, for linear systems, double-integrate the acceleration measurements. Repeat the Kalman filter procedure for all time steps, for the full signal duration, to provide the full filtered input.

In Table 1, $x_k^h$ is the input of neuron $k$ at layer $h$, $b_k^h$ is a scalar bias, and $s_i^{h-1}$ is the output of neuron $i$ at layer $h-1$. Also, $w_{ik}^{h-1}$ is the kernel weight from neuron $i$ at layer $h-1$ to neuron $k$ at layer $h$, and $y_k^h$ is the intermediate output. Related to the back-propagation of the error starting from the output fully connected layer, $N_L$ is the number of classes in the input data, and $t$ and $y$ correspond to the target and output vectors. Finally, the delta of neuron $k$ at layer $h$, $\Delta_k^h$, is used to update the bias of that neuron, as well as all the weights of the neurons in the previous layer connected to that neuron.

APPLICATION TO LINEAR DYNAMIC SYSTEMS
For the linear numerical application, consider the case of the damping model classes in structural dynamics. The standard equation of motion of an $N$-DOF structural-mechanical system, in the case of proportional damping, is written as:

$$M\ddot{x}(t) + C\dot{x}(t) + Kx(t) = f(t) \tag{16}$$

where $M$ and $K$ are the mass and stiffness matrices, respectively, and $C$ is the damping matrix, proportional to $M$ and/or $K$, that satisfies the orthogonality property. This means that if $\Phi$ is the matrix that contains the eigenvectors of the system, then $\Phi^T C \Phi$ is a diagonal matrix and thus a decoupling procedure can be implemented. Here, $x(t)$ and $f(t)$ are the response of the system and the force applied to the system, respectively. With regard to damping, the form of Equation (16) is restrictive, and for a general consideration of structural-mechanical systems, alternative damping model classes are considered. This is implemented by one or more convolution integrals over a kernel function $g(t)$. In doing so, the damping depends on the past history of the motion. The equation of motion is then written as an integro-differential equation:

$$M\ddot{x}(t) + C\int_{0}^{t} g(t-\tau)\,\dot{x}(\tau)\,d\tau + Kx(t) = f(t) \tag{17}$$

where this formulation is a generalization of the standard damping modeling since, by using the Dirac delta function $\delta(t)$ as the kernel function $g(t)$, Equation (17) reduces to Equation (16).
For the choice of the damping kernel functions, many candidate functions may be considered. Observations, though, from real systems 57 suggest that the exponential function can often adequately model the damping, and is a natural choice.
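For illustration, the history-dependent damping term of Equation (17) can be evaluated numerically by discretizing the convolution integral. The sketch below assumes an exponential kernel of the form $g(t) = \mu e^{-\mu t}$, which is one common candidate of the type listed in Table 2; the exact kernel expressions and constraints of Table 2 are not reproduced here, and the parameter names are assumptions.

```python
import numpy as np

def exponential_kernel(t, mu):
    """Assumed exponential damping kernel g(t) = mu * exp(-mu * t)."""
    return mu * np.exp(-mu * t)

def damping_force(xdot_history, t_now, dt, mu, c=1.0):
    """Trapezoidal approximation of c * integral_0^t g(t - tau) xdot(tau) dtau,
    given the velocity history sampled at intervals dt up to time t_now."""
    tau = np.arange(len(xdot_history)) * dt
    return c * np.trapz(exponential_kernel(t_now - tau, mu) * xdot_history, dx=dt)
```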
Table 2 shows several candidate kernel functions 13,58 which have been shown to adequately model the damping behavior of structural-mechanical systems. Here, the damping model parameter of each kernel is properly calibrated by system identification procedures. 13 The system of Equation (17) is then examined with various model classes. Specifically, the system matrices for the synthetic measurement generation are specified for a two-DOF system, with the initial conditions $x(0) = [1\ 1]^T$ and $\dot{x}(0) = [0\ 0.5]^T$. White noise is chosen for the force $f(t) = [f_1(t)\ f_2(t)]^T$ with mean value 0 and variance 9. Importantly, the initial conditions and/or the force should be chosen to excite the system sufficiently. Three model classes (A, B, and C) are considered with three different kernel functions. To create the synthetic measurements, the integration method of Katsikadelis 13,59,60 is implemented. Here, the time discretization frequency is set equal to 100 Hz, and therefore $\Delta t$ is 0.01 s. The same holds for the sampling frequency of the synthetic measurements. Finally, to consider the effect of measurement noise, each response signal is contaminated by a Gaussian white noise sequence with a 10% root-mean-square noise-to-signal ratio. Different initial conditions are applied to the system to generate multiple responses for training and validation. The duration of the acceleration and displacement signal measurement for each model class is 40 s.
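The measurement-noise contamination described above can be reproduced with a few lines; the function below is a minimal sketch, where the 10% ratio and the zero-mean Gaussian assumption follow the text and the random seed is arbitrary.

```python
import numpy as np

def add_measurement_noise(signal, ratio=0.10, seed=0):
    """Add zero-mean Gaussian white noise whose RMS equals `ratio` times the
    RMS of the clean response signal (root-mean-square noise-to-signal ratio)."""
    rng = np.random.default_rng(seed)
    rms = np.sqrt(np.mean(np.asarray(signal) ** 2))
    return signal + rng.normal(0.0, ratio * rms, size=np.shape(signal))
```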
To Kalman filter all previous signals, the process covariance $Q$ and the measurement covariance $R$ matrices are chosen to be constant during the identification process and equal to $10^{-9} \cdot I$ and $10^{-3} \cdot I$, respectively. For larger values, the algorithm needs more data and time to converge, or it may even diverge.
The convolutional neural network architecture is defined as follows in Figure 1: an input layer with the three signals for each one of the three model classes A, B, and C, associated with their model class label; a convolutional layer with filter size equal to 2048 and number of neurons that connect to the same region of the input equal to 128, with causal padding; a rectifier (ReLU) layer, as well as a batch normalization layer with mini-batch size equal to 1 for online purposes; an additional convolutional layer with filter size equal to 2048 and number of neurons that connect to the same region of the input equal to 256, with causal padding; an additional rectifier layer along with an additional batch normalization layer with mini-batch size equal to 1, and a global average pooling layer. Finally, a fully connected layer is set with a number of classes equal to 3, a softmax layer, and a classification layer. Importantly, an investigation of the filter size and the number of neurons within the convolutional layers is shown in Section 9. Last but not least, the maximum number of epochs in the optimization process is set equal to 15. Importantly, to design the architecture, although this is still an active research problem, 61 a simple CNN architecture is examined first that has one hidden layer with one max pooling layer before the classification one. Based on the results, and by controlling the trade-off between accuracy and training speed, the number of kernels and layers is increased until a satisfactory performance is reached. This work uses a similar architecture-building philosophy to the damage detection applications. 28,32 Importantly, in this application, it may seem that the model is quite simple and, perhaps, does not need such a complex network for the prediction, meaning that the convolutional neural network is not efficiently designed. In reality, though, removing layers from the network results in a poorer performance where the predictions are wrong.
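A sketch of this layer stack in Keras is given below for concreteness; the use of TensorFlow/Keras, the `build_cnn` name, and its signature are assumptions for illustration (the original implementation framework is not specified here), while the filter sizes, numbers of filters, causal padding, and the three-class softmax output follow the description above.

```python
import tensorflow as tf

def build_cnn(input_length, n_classes=3):
    """1D CNN matching the described stack: two causal convolutional blocks,
    global average pooling, and a softmax classifier over the model classes."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(input_length, 1)),
        tf.keras.layers.Conv1D(128, kernel_size=2048, padding="causal"),
        tf.keras.layers.ReLU(),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv1D(256, kernel_size=2048, padding="causal"),
        tf.keras.layers.ReLU(),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
```

In this sketch, the mini-batch size of 1 mentioned above is a training option rather than a layer property, so it would be supplied to the training call (for example, `model.fit(..., batch_size=1)`) rather than to the batch normalization layers.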
Additionally, it may seem that the model classes are too idealized, since the model class can be well depicted by the mathematical formulas in Table 2. In reality, though, those models have been experimentally demonstrated to represent the behavior of real dynamic systems; see, for example, chap. 8 of Adhikari. 57

APPLICATION TO NONLINEAR DYNAMIC SYSTEMS
For the nonlinear numerical application, consider initially the problem of a mass in free fall 62 landing on a generalized damped base material. The stiffness and damping elements of the base material are active only when the body is in contact with it. The equation of motion is therefore nonlinear and, when, for instance, the effectiveness of a twofold (contact/no-contact) model is examined, it is written as:

$$m\ddot{x}(t) + \mathbb{H}(x(t))\left[c\int_{0}^{t} g(t-\tau)\,\dot{x}(\tau)\,d\tau + k\,x(t)\right] = f(t) \tag{21}$$

where $\mathbb{H}(x(t))$ is the Heaviside step function. Assume here $m = 1$ kg, $c = 3$ N s/m, $k = 1000$ N/m, and $f(t) = -mg$ with $g = 9.81$ m/s$^2$ the gravitational acceleration. The initial conditions are $x(0) = 0.1$ and $\dot{x}(0) = 0$. White noise is chosen for the force $f(t)$ with mean value 0 and variance 9. Importantly, the initial conditions and/or the force should be chosen to excite the system sufficiently. Two model classes are considered for the $g(t)$ in Equation (21) with two different kernel functions. To create the synthetic measurements, the integration method of Katsikadelis 13,59,60 is implemented as in Section 6 where, for nonlinear systems, the state transition matrix and the input matrix are modified (the corresponding identity blocks are replaced by the zero matrix), and the new input is provided by a system of equations that yields the numerical solution of the nonlinear system. Here, the time discretization frequency is set equal to 100 Hz, and therefore $\Delta t$ is 0.01 s. The same holds for the sampling frequency of the synthetic measurements. Finally, to consider the effect of measurement noise, each response signal is contaminated by a Gaussian white noise sequence with a 10% root-mean-square noise-to-signal ratio. The duration of the acceleration and displacement signal measurement for each model class is 100 s.
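As a rough illustration of the contact-switched dynamics, the sketch below integrates a simplified version of the free-fall problem with viscous base damping (the convolution kernel is replaced by the constant c for brevity). The sign conventions, the contact condition `x < 0`, and the use of SciPy are assumptions made only for this illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Values taken from the text: m = 1 kg, c = 3 N s/m, k = 1000 N/m, g = 9.81 m/s^2.
m, c, k, g = 1.0, 3.0, 1000.0, 9.81

def free_fall_contact(t, y):
    x, v = y
    contact = 1.0 if x < 0.0 else 0.0   # base elements active only while in contact (assumed x < 0)
    return [v, (-m * g - contact * (c * v + k * x)) / m]

# Drop from x(0) = 0.1 m with zero initial velocity, sampled at 100 Hz.
sol = solve_ivp(free_fall_contact, (0.0, 5.0), [0.1, 0.0], max_step=0.01,
                t_eval=np.arange(0.0, 5.0, 0.01))
displacement, velocity = sol.y
```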
To Kalman filter the signals, the process covariance $Q$ and the measurement covariance $R$ matrices are chosen to be constant during the identification process and equal to $10^{-9} \cdot I$ and $10^{-3} \cdot I$, respectively. For larger values, the algorithm needs more data and time to converge, or it may even diverge.
Subsequently, the network architecture is defined similarly to Section 6. Two signal inputs are examined in Figures 4-5 with the same layout description as in Section 6. In total, 10 new velocity and displacement signals are classified, where ideally the first 5 signals belong to Model A, and the second 5 signals belong to Model B.
In Figure 4, the performance of the networks using only the DOF 1 displacement signals in the training and validation process is shown. The C-Net correctly selects the model class for each signal apart from one, which is misclassified as Model A despite belonging to Model B. The Kalman filter C-Net provides the same selection accuracy, but with a shorter training period and faster loss minimization.
In Figure 5, the performance of the networks using only the DOF 1 velocity signals in the training and validation process is shown. The C-Net selects correctly the class of seven signals, but misselects three of them. Contrastingly, the Kalman filter C-Net misselects only 1 signal out of 10. In this examination, the Kalman filter C-Net shows a superior performance compared to the C-Net in the selection accuracy, and not merely a faster convergence.
To train the network, three earthquake inputs are considered, namely the Tabas record of September 16, 1978 at Tabas (1.080 g), the Northridge record of January 17, 1994 at the Sylmar Converter Station (0.827 g), and the Kobe record of January 17, 1995 at JMA (0.818 g), available from the PEER strong motion database. 81 Only those three are used for training the convolutional neural network, while three more are used for the validation step.
To Kalman filter the signals, the process covariance $Q$ and the measurement covariance $R$ matrices are chosen to be constant during the identification process and equal to $10^{-9} \cdot I$ and $10^{-3} \cdot I$, respectively. For larger values, the algorithm needs more data and time to converge, or it may even diverge.
Subsequently, the network architecture is defined similarly to Section 6. Two signal inputs are examined in Figures 6-7.

APPLICATION TO A 3D BUILDING FINITE ELEMENT MODEL
This problem 84-86 examines the capability of the approach when, due to the large number of DOFs, the network may not capture all the dynamic system changes and become inaccurate. The model has six DOFs at each node of the studied 3D model, which has 2 storeys and 2 bays in each direction. Each column has a length of 14 feet (4.3 m) with section W27x114, each beam has a length of 24 feet (7.3 m) with section W24x94, and each girder has a length of 24 feet with section W24x94. The ground boundary nodes are assumed fixed, and the material properties are 29,000 ksi (200 GPa) for the elastic modulus, 0.3 for the Poisson ratio, and 60 ksi (413.6 MPa) for the yield stress. A hardening material law is chosen. 87 The weight of all components is taken into account, and reinforced-concrete floor slabs are simulated with 150 pcf (2403 kg/m³) concrete density and a scale factor of 2 for dead loads. Importantly, the forceBeamColumn element is used for all components. 88 Two model classes are considered for the Rayleigh damping, 89 proportional to the mass matrix (Model A, where $C = a_1 M$), or proportional to both the mass and the stiffness matrices (Model B, where $C = a_1 M + a_2 K$), with the Rayleigh damping parameters $a_1$ and $a_2$.
FIGURE 8 3D building finite element model system of Section 8 with material nonlinearity excited by earthquake inputs for the nonlinear history response calculation using OpenSees. DOF, degree of freedom.
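The two Rayleigh damping model classes differ only in how the damping matrix is assembled from the mass and stiffness matrices, as the minimal sketch below illustrates; the function name and the example matrices are assumptions, and in an OpenSees model the same choice is made through the rayleigh command.

```python
import numpy as np

def rayleigh_damping(M, K, a1, a2=0.0):
    """Model A: C = a1*M (mass proportional, a2 = 0).
    Model B: C = a1*M + a2*K (mass and stiffness proportional)."""
    return a1 * np.asarray(M) + a2 * np.asarray(K)

# Example with arbitrary 2x2 matrices (illustration only).
M = np.diag([2.0, 1.0])
K = np.array([[300.0, -100.0], [-100.0, 100.0]])
C_model_A = rayleigh_damping(M, K, a1=0.05)
C_model_B = rayleigh_damping(M, K, a1=0.05, a2=0.002)
```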
To create the synthetic measurements, the Newmark integration method is used to simulate the response, with either the Newton method with initial tangent or the Newton method with line search handling the material nonlinearity, depending on convergence.
Here, the time discretization frequency is set equal to 50 Hz, and therefore $\Delta t$ is 0.02 s. The same holds for the sampling frequency of the response measurements. Finally, to consider the effect of measurement noise, each response signal is contaminated by a Gaussian white noise sequence with a 10% root-mean-square noise-to-signal ratio.
To train the network, three earthquake inputs are considered, namely the Imperial Valley record of May 18, 1940 at El Centro (0.341 g), the Northridge record of January 17, 1994 at the Sylmar Converter Station (0.827 g), and the Kobe record of January 17, 1995 at JMA (0.818 g), available from the PEER strong motion database. 81
Only those three responses are used for training the convolutional neural network, while three more are used for the validation step. In this application, only the C-Net performance is shown, in order to compare training with acceleration signals, which are not available in filtered form from this Kalman filter approach. Importantly, to better illustrate the feasibility of the research in real buildings, the seismic response of the model is usually compared against a deformation index, such as the story drift ratio, which represents the deformation state of the structure; in the examined application this ratio is in the range 0%-2%. However, it is not reported in detail in this work, in order to follow the unique-DOF measurement approach for model class selection examined earlier. The network architecture is defined similarly to Section 6.
Three signal inputs are examined in Figures 9-11.

DISCUSSION
The presented work provided a simple, yet effective, way to select the model class in structural dynamics. It did not aim to present a machine learning algorithm advancement; rather, it applied the vast capabilities of such tools 90-95 to the model class selection problem, for the first time to the best of the author's knowledge. To this end, the efficiency and robustness of the method were tested both on low-DOF systems and on a complex system, namely a 3D building finite element model. Further examinations and comparisons are also provided in this section to shed light on the method. Specifically, the comparison between the C-Net and the Kalman filter C-Net may seem unfair: in the Kalman filter C-Net, the availability of the dynamic states provides more information compared to the pure C-Net, and this deeper information leads to a better accuracy. In reality, the purpose of this work is not to improve the C-Net, but to provide a way to exploit more data when they are available. Importantly, regarding the explanation of the results, the Kalman filter approach provides improved training performance since it exploits the estimated dynamic states, which have less noise; however, this impact becomes irrelevant when a poor filter size and number of neurons are used for the network. Relating to the visualization of the results, the horizontal axes of the model class selection plots may confuse at first glance. They provide, though, the prediction of the network relating the model that generated the signal to the model that the signal was classified as. In this view, the count of correct and wrong predictions can be seen.
Along these lines, the topic of "model class selection" should be clarified further, as it touches many engineering fields. In reality, this work did not make any distinction between fields of application, and the potential is open for fields other than structural identification. For the structural health monitoring field, specifically, the method provides the model that will be further used to identify the structure, without having to perform the identification for each candidate model first.
Specifically for structural health monitoring applications, the number of candidate models is usually low, and the method manages to provide a reliable prediction. However, in other fields, for example if one wanted to predict a model class for a nonlinear oscillator with some combination of polynomial stiffness terms, one would require $2^n - 1$ candidate model classes to comprehensively consider up to $n$-th order polynomial terms. With regard to this point, future research is recommended for applications in those fields, investigating the number of models at which the method fails, and how the number of candidate models affects the accuracy of the model class predictions. The reason lies in the fact that the number of candidate models would be prone to proliferation in a way that could potentially be detrimental to prediction performance.
Another concern is related to the model class selection capability without the need to identify the parameters. For Table 2, however, it is stated that parameter calibration is performed using system identification techniques, and this seems to be somewhat of a contradiction. In reality, those parameters were used only to generate the signals that train the network; they were not used or identified during the CNN model class selection process.
Regarding the network algorithm parameters, the examinations so far point to a recommendation of values as high as possible for the filter size and the number of neurons in the convolutional layers. The first defines the kernel by which the data are multiplied, while the second determines the number of feature maps.
However, this recommendation may sound restrictive or suboptimal, since it leads to more weights to back-propagate and, ultimately, to a higher computational cost.
Despite this, the computational cost of the approach is bearable. This is attributed to three main reasons: the one-dimensional nature of the data, the unique-signal training approach which may be implemented, and the Kalman filtering of the signals, which removes part of the noise.
The recommendation of higher values for the filter size and the number of neurons is not mandatory, though. The user may achieve the same accuracy with much lower values, and with a reduced computational cost. However, for very low values, a reduced accuracy is observed, even though the training process misleadingly appears to reach 100% accuracy.
To demonstrate this, consider the examined linear and nonlinear systems. Compared to the previous numerical applications of Sections 6 and 7, only the filter size is changed to 3 and the number of neurons to 8.
Two signal inputs are examined in Figures 12-13 with the same layout description as in Section 6. In total, 9 new velocity and displacement signals are classified for the linear system, and 10 new velocity and displacement signals for the nonlinear system. Ideally, for the linear dynamic system, the first 3 signals belong to Model A, the second 3 signals to Model B, and the last 3 signals to Model C; for the free fall nonlinear system, the first 5 signals belong to Model A and the second 5 signals to Model B. In Figure 12, the performance of the networks in the linear dynamic system using only the DOF 1 displacement signal parts in the training and validation process is shown. Both networks misselect five out of nine signals. Interestingly, both training processes reach a 100% accuracy even though the loss is high. The loss can then be used as an indication that a larger filter size and number of neurons are needed. Importantly, both networks achieve nine out of nine correct selections for higher filter size and neuron number values, as shown in Section 6.
In Figure 13, the performance of the networks in the nonlinear system using only the DOF 1 displacement signal parts in the training and validation process is shown. Both networks misselect 3 or 4 out of 10 signals. Interestingly, both training processes reach a 100% accuracy even though the loss is high. The loss can then be used, also in nonlinear systems, as an indication that a larger filter size and number of neurons are needed. Importantly, both networks achieve a higher number of correct class selections for higher filter size and neuron number values, as shown in Section 7.
Here, the sensitivity investigation is performed for a low number of model classes, which potentially means that for a larger number of classes, larger deviations are expected when the filter size and the number of neurons are low. Importantly, training the network with a larger number of signals overcomes the inaccuracies derived from a low filter size and number of neurons, but increases the computational cost.
Last but not least, the training results and accuracy show the normal variability of convolutional neural network results. In this unique-response training approach, this limitation is amplified, and additional research is recommended. Importantly, all the applications presented in this work are based on a very limited amount of training data. In a scenario where a large amount of data is available (many signals training the network after many earthquake events for the same structure), higher accuracy is expected. However, this is not always available in real-life applications, which led to the low-data, or unique-signal, training investigation within this work.
Another concern is related to the extrapolation capabilities of the approach, since only the outputs of the system are measured. The examinations so far showed the potential of the method when the structural model remains the same. However, this assumption may not hold if a change happens to the system, for instance some damage or any other modification of the structure. To explore this, consider the examined 3D building. Compared to the previous numerical application of Section 8, only some of the ground boundary conditions are changed to allow rotation instead of being fixed (termed the "outside training set" response in Figures 14-16). This simulates, for instance, a damage scenario at the foundation of the structure.
Three signal inputs are examined in Figures 14-16. As a result, the approach is not capable of some form of extrapolation to predict model classes for systems with forcings outside of the training dataset; the training data must therefore cover the expected conditions to ensure good performance. When the approach is employed on a real engineering system where the system may change, one must have some prior belief about the expected forcing patterns in order to generate comprehensive training datasets and retrain the network for good future predictions. It follows, as a future recommendation, that one requires some prior belief regarding anticipated forcings in order to use the approach, and that the method can be combined with Bayesian model selection approaches and Bayesian latent force estimation. 96,97 This is a pertinent test for model class selection approaches in engineering applications, as there could be high-cost or safety-critical ramifications if a model class is confidently predicted incorrectly.
A final concern is related to the uncertainty quantification which the model class selection methodology should provide. Namely, a desirable property for model class prediction approaches is to accurately represent the uncertainty around their predictions. In the framework of convolutional neural networks, this may be achieved by retraining the model multiple times and taking the average, and the other statistical properties, of the network predictions.
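A minimal sketch of this retraining-based uncertainty estimate is given below; the `build_and_train` callable is a hypothetical placeholder for a routine that constructs and fits the network from a fresh random initialization, and the number of repetitions is arbitrary.

```python
import numpy as np

def ensemble_class_probabilities(build_and_train, signals, n_runs=10):
    """Retrain the network n_runs times with different random initializations
    and average the softmax class probabilities; the spread across runs gives a
    simple indication of the prediction uncertainty."""
    probs = np.stack([build_and_train().predict(signals) for _ in range(n_runs)])
    return probs.mean(axis=0), probs.std(axis=0)
```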
Last but not least, an investigation was made regarding the use of other types of neural networks, such as the long short-term memory ones. 98 The long short-term memory neural networks are widely recognized as a powerful machine learning tool for both classification and regression problems. They belong to the wider library of recurrent neural networks, which use feedback loops with recurrent connections between the nodes of the network to make them capable of modeling sequences of signals, such as the structural vibration raw signal.
The intuition behind them is to create an additional module in a neural network that learns when to remember and when to forget some characteristic of the provided vibration signal. In other words, the network effectively learns which patterns might be needed in the signal and when that information is no longer needed. This poses an advantage for structural model selection among a group of models when an unexpected excitation excites the structure; not being attributable to the model response to the ambient environment, such an excitation does not play an important role in the final model selection and can be neglected. Importantly, this unexpected excitation is potentially of unknown magnitude, and the network does not need this information to perform the model selection.
The discussed long short-term memory gates, though, make the training more difficult and increase the training time of the network. To reduce the training time and improve the network performance, a simplified but improved gated recurrent unit architecture 99 may also be introduced for structural model selection. The gated recurrent unit uses a new type of hidden unit that merges the forget gate and the input gate into a single update gate, and also mixes the cell state and the hidden state into one state. The number of gates is thus decreased compared to the long short-term memory unit, and they are termed the update and reset gates. The final model is simpler than the standard long short-term memory, resulting in a faster convergence for structural health monitoring applications. The author replaced both convolutional layers with long short-term memory and gated recurrent unit layers, keeping the same architecture, and both of them always underperformed the convolutional architecture. Additional research is therefore recommended on how those layers and architectures can compete with the convolutional one in model class selection problems.
Finally, future directions are also suggested in the area of using clustering techniques to judge which model class a signal belongs to, whether clustering provides an easier way to solve this problem, and what its limitations are compared to this work. Importantly, though, the clustering approach does not incorporate a labeling philosophy to associate the signals with specific models.

3. Feed the network. Provide the one-dimensional convolutional neural network with the raw signals or filtered signals from Step 2, associated with their model class. At this point, generate the weights of the network randomly.
4. Initialize the network training. Start the network training, where the signal data are propagated between the layers. Implement the back-propagation algorithm in the network training. Post-process the signal data for the estimation of the weight and bias sensitivities. Update the weights and biases with the accumulation of sensitivities. Finally, move to each next layer.
5. Select the model class. Use the trained network to classify the unlabeled signals. Specifically, provide the new, unused, and unlabeled raw or filtered signals from Step 2 as an input to the network to output the model class.

TABLE 2 Damping kernel functions g(t) for the generalized damping model classes (columns: kernel function g(t); constraints).

FIGURE 1 Examined C-Net architecture for all numerical applications.
The architecture is used without any special adjustments that would potentially favor the model class selection problem. Two signal inputs are examined in Figures 2-3. In these figures, the first and second rows refer to the displacement and acceleration raw signals used in the Kalman filter for all models. The third row refers to the network model class selection trained with unfiltered signals (C-Net), where the data generated by Model A (denoted by 1), Model B (denoted by 2), and Model C (denoted by 3) are attributed to each model A, B, or C. Similarly, the fourth row refers to the network model class selection trained with the Kalman-filtered signals (Kalman filter C-Net). Additionally, the fifth row refers to the accuracy in the training process for both networks with respect to the number of optimization iterations, while the sixth row refers to the loss in the training process for both networks with respect to the number of optimization iterations. In total, 9 new velocity and displacement signals are classified, where ideally the first 3 signals belong to Model A, the second 3 signals belong to Model B, and the last 3 signals belong to Model C. In Figure 2, the performance of the networks using only the DOF 2 displacement signals in the training and validation process is shown. The C-Net correctly selects the model class for each signal. The Kalman filter C-Net also correctly selects the model class for each signal, but with a shorter training period and faster loss minimization than the C-Net. In Figure 3, the performance of the networks using only the DOF 2 velocity signals in the training and validation process is shown. Both networks select correctly the model class for each signal except one. Importantly, the Kalman filter C-Net converges faster.

FIGURE 2 System of Section 6: Results for the linear dynamic system when training and validating with the DOF 2 displacement signals. First and second row: the displacement and acceleration raw signals in m and m/s², respectively. Third row: C-Net model class prediction where ideally A->1, B->2, and C->3. Fourth row: Kalman filter C-Net model class prediction. Fifth and sixth row: accuracy and loss in the training process for both networks.
FIGURE 3 System of Section 6: Results for the linear dynamic system when training and validating with the DOF 2 velocity signals. First and second row: the displacement and acceleration raw signals in m and m/s², respectively. Third row: C-Net model class prediction where ideally A->1, B->2, and C->3. Fourth row: Kalman filter C-Net model class prediction. Fifth and sixth row: accuracy and loss in the training process for both networks. DOF, degree of freedom.
FIGURE 4 System of Section 7: Results for the free fall nonlinear system when training and validating with the DOF 1 displacement signals. First and second row: the displacement and acceleration raw signals in m and m/s², respectively. Third row: C-Net model class prediction where ideally A->1 and B->2. Fourth row: Kalman filter C-Net model class prediction. Fifth and sixth row: accuracy and loss in the training process for both networks. DOF, degree of freedom.
FIGURE 5 System of Section 7: Results for the free fall nonlinear system when training and validating with the DOF 1 velocity signals. First and second row: the displacement and acceleration raw signals in m and m/s², respectively. Third row: C-Net model class prediction where ideally A->1 and B->2. Fourth row: Kalman filter C-Net model class prediction. Fifth and sixth row: accuracy and loss in the training process for both networks. DOF, degree of freedom.

Figures 6-7 use the same layout description as in Section 6. In total, 9 new velocity and displacement signals are classified, where ideally the first 3 signals belong to Model A, the second 3 signals belong to Model B, and the final 3 signals belong to Model C. In Figure 6, the performance of the networks using only the DOF 1 displacement signals in the training and validation process is shown. The C-Net correctly selects the model class for each signal apart from one, which is misclassified as Model A despite belonging to Model B. The Kalman filter C-Net provides the same selection accuracy, but with a shorter training period and faster loss minimization. In Figure 7, the performance of the networks using only the DOF 1 velocity signals in the training and validation process is shown. The C-Net selects correctly the class of eight signals, but misselects one of them. The Kalman filter C-Net provides the same selection accuracy, but with a shorter training period and faster loss minimization.

FIGURE 6 System of Section 7: Results for the hysteretic nonlinear system when training and validating with the DOF 1 displacement signals. First and second row: the displacement and acceleration raw signals in m and m/s², respectively. Third row: C-Net model class prediction where ideally A->1, B->2, and C->3. Fourth row: Kalman filter C-Net model class prediction. Fifth and sixth row: accuracy and loss in the training process for both networks. DOF, degree of freedom.
FIGURE 7 System of Section 7: Results for the hysteretic nonlinear system when training and validating with the DOF 1 velocity signals. First and second row: the displacement and acceleration raw signals in m and m/s², respectively. Third row: C-Net model class prediction where ideally A->1, B->2, and C->3. Fourth row: Kalman filter C-Net model class prediction. Fifth and sixth row: accuracy and loss in the training process for both networks. DOF, degree of freedom.

Figures 9-11 use a similar layout description to Section 6. In total, 10 new displacement, velocity, and acceleration signals are classified, where ideally the first 5 signals belong to Model A, and the second 5 signals belong to Model B. In Figure 9, the performance of the network is shown using only the top corner building DOF displacement signals. The C-Net correctly selects the model class for each signal apart from one, which is misclassified as Model A despite belonging to Model B, and one which is misselected as Model B although belonging to Model A. In Figure 10, the performance of the network is shown using only the top corner building DOF velocity signals. The C-Net correctly selects the model class for each signal apart from one, which is misclassified as Model A despite belonging to Model B.

FIGURE 9 System of Section 8: Results for the 3D building finite element model when training and validating with the top corner DOF displacement signals (Kobe plot). First row: the displacement raw signals in m. Second row: C-Net model class prediction where ideally A->1 and B->2. Third and fourth row: accuracy and loss in the training process. DOF, degree of freedom.
Finally, in Figure 11, the performance of the network is shown using only the top corner building DOF acceleration signals. The C-Net correctly selects the model class for each signal apart from two, which are misclassified.

FIGURE 10 System of Section 8: Results for the 3D building finite element model when training and validating with the top corner DOF velocity signals (Kobe plot). First row: the velocity raw signals in m/s. Second row: C-Net model class prediction where ideally A->1 and B->2. Third and fourth row: accuracy and loss in the training process. DOF, degree of freedom.

FIGURE 11 System of Section 8: Results for the 3D building finite element model when training and validating with the top corner DOF acceleration signals (Kobe plot). First row: the acceleration raw signals in m/s². Second row: C-Net model class prediction where ideally A->1 and B->2. Third and fourth row: accuracy and loss in the training process. DOF, degree of freedom.

FIGURE 12 System of Section 6 in discussion Section 9: Results for the linear dynamic system when training and validating with the DOF 1 displacement signals, but with a poor filter size and number of neurons. First and second row: the displacement and acceleration raw signals in m and m/s², respectively. Third row: C-Net class prediction where ideally A->1, B->2, and C->3. Fourth row: Kalman filter C-Net prediction. Fifth and sixth row: accuracy and loss in the training process for both networks. DOF, degree of freedom.
FIGURE 13 System of Section 7 in discussion Section 9: Results for the nonlinear system when training and validating with the DOF 1 displacement signals, but with a poor filter size and number of neurons. First and second row: the displacement and acceleration raw signals in m and m/s², respectively. Third row: C-Net class prediction where ideally A->1 and B->2. Fourth row: Kalman filter C-Net prediction. Fifth and sixth row: accuracy and loss in the training process for both networks. DOF, degree of freedom.
FIGURE 14 System of Section 8 in discussion Section 9: Results for the 3D building finite element model when training and validating with the top corner DOF displacement signals, but selecting the model class of signals outside the training set (a change in boundary conditions is examined). First row: the displacement raw signals in m. Second row: C-Net model class prediction where ideally A->1 and B->2. Third and fourth row: accuracy and loss in the training process. DOF, degree of freedom.
Figures 14-16 use the same layout description as in Section 8. In total, 10 new displacement, velocity, and acceleration signals are classified, where ideally the first 5 signals belong to Model A, and the second 5 signals belong to Model B. In Figure 14, the performance of the network is shown using only the top corner building DOF displacement signals. The C-Net misselects 7 out of 10 signals. Compared to Figures 12-13, both training processes reach a 100% accuracy and the loss is low. The loss, then, cannot be used as an indication that the prediction is wrong. The same conclusion is derived in Figures 15 and 16 for the performance of the network using only the top corner building DOF velocity or acceleration signals, respectively.

FIGURE 15 System of Section 8 in discussion Section 9: Results for the 3D building finite element model when training and validating with the top corner DOF velocity signals, but selecting the model class of signals outside the training set (a change in boundary conditions is examined). First row: the velocity raw signals in m/s. Second row: C-Net model class prediction where ideally A->1 and B->2. Third and fourth row: accuracy and loss in the training process. DOF, degree of freedom.

FIGURE 16 System of Section 8 in discussion Section 9: Results for the 3D building finite element model when training and validating with the top corner DOF acceleration signals, but selecting the model class of signals outside the training set (a change in boundary conditions is examined). First row: the acceleration raw signals in m/s². Second row: C-Net model class prediction where ideally A->1 and B->2. Third and fourth row: accuracy and loss in the training process. DOF, degree of freedom.