Continuous self‐adversarial training of recurrent neural network–based constitutive description

Data‐driven methods yield advantages in computational homogenization approaches due to their ability to capture complex material behaviour without the need to assume specific constitutive models. Neural network–based constitutive descriptions are among the most widely used data‐driven approaches in the context of computational mechanics. The accuracy of this method strongly depends on the available data. Additionally, when considering inelastic materials, whose constitutive responses depend on the loading history, the accuracy and robustness of the approximation are influenced by the training algorithm. The applied recurrent neural networks exhibit reduced robustness in the presence of errors in the input. When the history dependency is captured using previously predicted material responses, prediction errors accumulate over several time steps. One approach for achieving enhanced robustness of the predictions is to extend the initial training dataset by iteratively generating adversarial examples, perturbed according to the current prediction errors. In this contribution, a continuous self‐adversarial training approach yielding robust recurrent neural network constitutive descriptions for inelastic materials is presented. Compared to the iterative method it is based on, it exhibits significantly improved training efficiency. In order to demonstrate the capabilities of the proposed method, numerical examples with datasets obtained by numerical material tests on representative volume elements are carried out. The results are validated using both test load cases from the numerical dataset and the application as a constitutive model in the finite element method.

INTRODUCTION
A thorough understanding of the constitutive interactions is required for the optimal use of heterogeneous materials. Numerical homogenization introduces the possibility of comprehending the effect of the heterogeneity on the effective continuum behaviour; however, traditional approaches like FE², even though accurate, are computationally expensive. Neural network (NN)–based homogenization, as presented in [1], overcomes these drawbacks by utilizing NNs to capture the effective material behaviour. Recurrent neural networks (RNNs) offer compelling motivation for their use in numerical homogenization due to their ability to handle sequential data, enabling them to capture temporal dependencies. Furthermore, RNNs have also been used in analyses under the consideration of uncertainty [2]. This ability renders RNNs highly suitable for modelling intricate nonlinear mappings of complex material behaviour. However, within the FEM framework, in the absence of sequential data, the predictions of the RNN deviate from the actual values. Thus, a novel self-adversarial training algorithm for RNNs is presented in [3], which ensures the robustness of the RNN-based constitutive description within the FEM framework. Within the scope of this work, a new training framework for RNNs is presented, which drastically reduces the number of epochs required for training. Within the proposed algorithm, the training dataset is continuously augmented during training, simulating the effect of the introduction of prediction errors into the input of the RNN. Subsequently, the obtained RNN, representing an effective homogenized continuum, is compared to a reference solution within an FEM framework.

FUNDAMENTALS OF RECURRENT NEURAL NETWORK-BASED CONSTITUTIVE DESCRIPTION
To effectively model the inelastic behaviour of materials, a special class of NNs termed RNNs is employed. They are specifically designed to handle sequential data by incorporating feedback connections, which allows them to capture temporal dependencies efficiently. Furthermore, RNNs possess a memory component that encapsulates the information from previous steps and incorporates its effect when predicting the current time step. It is updated at each time step based on the current input. Owing to the sequential input and the memory component, RNNs have the ability to capture intricate nonlinear relationships in terms of weights and biases. These parameters are shared across all time steps and require algorithms like backpropagation through time to optimize them with respect to the loss function. In this framework, a gated recurrent unit (GRU) [3] is chosen as the recurrent unit. It combines the memory cell and the hidden state, thus simplifying the architecture and making it computationally more efficient.
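For concreteness, the GRU recurrence can be stated explicitly. The following are the standard GRU update equations in one common gate-naming convention (the notation is ours, not taken from the paper):

$$\begin{aligned}
\boldsymbol{z}_t &= \sigma(\boldsymbol{W}_z \boldsymbol{x}_t + \boldsymbol{U}_z \boldsymbol{h}_{t-1} + \boldsymbol{b}_z) && \text{(update gate)}\\
\boldsymbol{r}_t &= \sigma(\boldsymbol{W}_r \boldsymbol{x}_t + \boldsymbol{U}_r \boldsymbol{h}_{t-1} + \boldsymbol{b}_r) && \text{(reset gate)}\\
\tilde{\boldsymbol{h}}_t &= \tanh(\boldsymbol{W}_h \boldsymbol{x}_t + \boldsymbol{U}_h (\boldsymbol{r}_t \odot \boldsymbol{h}_{t-1}) + \boldsymbol{b}_h) && \text{(candidate state)}\\
\boldsymbol{h}_t &= (1 - \boldsymbol{z}_t) \odot \boldsymbol{h}_{t-1} + \boldsymbol{z}_t \odot \tilde{\boldsymbol{h}}_t && \text{(hidden state)}
\end{aligned}$$

Here, $\boldsymbol{x}_t$ is the input at time step $t$, $\boldsymbol{h}_t$ the hidden state, $\sigma$ the logistic function and $\odot$ the element-wise product; the weight matrices $\boldsymbol{W}$, $\boldsymbol{U}$ and biases $\boldsymbol{b}$ are the parameters shared across all time steps.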
When incorporating the trained RNNs into the finite element framework, the absence of measured data necessitates generating the input sequence from the predictions of previous time steps in order to predict at the current time step. Despite the negligible prediction error at each individual time step, the cumulative effect of small errors from previous predictions gives rise to error accumulation at the current time step. The result is a substantial deviation of the predictions from the actual behaviour, despite a considerably good training accuracy. To mitigate this problem and to increase the robustness against errors in the input sequence stemming from the network's own predictions, adversarial examples are introduced into the training dataset and the RNN is re-trained on the adversarial dataset. This has yielded substantially improved results, as presented in [4].
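This feedback mechanism can be sketched as follows. The sketch assumes a trained model `model` that maps a window of `n_seq` input rows (prescribed strain components concatenated with previously predicted outputs) to the `n_out` outputs of the current step; all names are hypothetical:

```python
import numpy as np

def feedback_predict(model, strains, n_seq, n_out):
    """Autoregressive (feedback) prediction: the outputs predicted at one
    time step are fed back as inputs for the next one.

    model   -- trained RNN mapping an (n_seq, n_dim) window to n_out outputs
    strains -- (n_steps, n_eps) prescribed strain sequence
    """
    n_dim = strains.shape[1] + n_out
    window = np.zeros((n_seq, n_dim))   # zero-initialized history
    y_prev = np.zeros(n_out)
    outputs = []
    for eps in strains:
        # current input row: prescribed strain + previously *predicted* outputs
        row = np.concatenate([eps, y_prev])
        window = np.vstack([window[1:], row])
        y_prev = model.predict(window[np.newaxis], verbose=0)[0]
        outputs.append(y_prev)
    return np.array(outputs)
```

Because each predicted output re-enters the window, any prediction error is carried forward, which is exactly the error accumulation described above.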
To mitigate the co-adaptation of the feature detectors, a dropout rate of 20% is applied to each layer of the RNN. Additionally, to prevent overfitting and to facilitate the termination of training, an early stopping callback is used with a patience of 20 epochs. The mean squared error (MSE) of the absolute error is used to quantify the loss. Owing to the symmetry of the stress and strain tensors, they are represented in Voigt notation.
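A minimal Keras sketch consistent with these settings is given below; the framework choice, the layer widths and the optimizer are assumptions for illustration, and only the dropout rate, the patience and the MSE loss are taken from the text:

```python
import tensorflow as tf

def build_rnn(n_seq, n_dim, n_out):
    """GRU-based constitutive model; layer sizes are placeholders."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_seq, n_dim)),
        tf.keras.layers.GRU(64, return_sequences=True),
        tf.keras.layers.Dropout(0.2),               # 20 % dropout per layer
        tf.keras.layers.GRU(64),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(n_out),               # stress (Voigt) + history variables
    ])
    model.compile(optimizer="adam", loss="mse")     # MSE loss
    return model

# early stopping with a patience of 20 epochs terminates the training
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=20, restore_best_weights=True)
```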

CONTINUOUS SELF-ADVERSARIAL TRAINING ROUTINE
The conventional training scheme consists of optimizing the RNN's parameters using a dataset of input sequences and target outputs. The accuracy achieved during this training can be very high; however, it does not necessarily ensure robustness when the network is integrated within the finite element framework. Adversarial training of RNNs, in contrast, is an extension of the conventional training scheme that aims to improve the RNN's robustness against adversaries in the input. Adversarial training involves training the NN on an adversarial dataset, which is generally generated by applying modifications to the original dataset. As presented in [3], it follows an iterative approach to training. The RNN is initially trained on the original dataset for a predefined number of epochs. The performance of the RNN is then evaluated on the test dataset. The absolute errors obtained from the test give the error distribution using kernel density estimation (KDE). The perturbations in the training dataset are introduced according to the sampled error distribution and the RNN is re-trained on the perturbed training dataset for a predefined number of epochs. The iterative re-training of the RNN serves to enhance the robustness and generalization of the network; however, it requires a large number of epochs to train.

The continuous self-adversarial training proposed here effectively eliminates the re-training of the network. Instead, the training dataset is augmented while the RNN is being trained, using a custom training algorithm. This approach not only enhances the robustness of the predictions but also reduces the number of training epochs compared to the iterative approach. Figure 2 schematically depicts the continuous self-adversarial training scheme. Initially, the RNN is trained on the original dataset and the parameters are optimized according to the prescribed loss function. Subsequently, the performance of the RNN is evaluated using the validation dataset. This dataset serves two purposes: firstly, it aids in monitoring overfitting within the network, and secondly, the errors obtained from it are used to sample an error distribution according to KDE. Since the perturbations are uncorrelated between the time steps of a sequence and also between different sequences, a new perturbed validation dataset is obtained from the sampled errors as

$\tilde{x}_{ij} = x_{ij} + \epsilon^{*}_{ij},$

where $\epsilon^{*}_{ij}$ are the sampled errors for each sample, $x_{ij}$ are the original values, $i \in [1, n_\text{seq}]$ indexes the position in the input sequence and $j \in [1, n_\text{dim}]$ the component. Even after the perturbations, every perturbed sequence retains the same labels as the original. The performance of the RNN is evaluated on the perturbed validation dataset and the errors are obtained according to Equation 1. The errors from this dataset are not used to monitor overfitting; instead, their distribution closely resembles the imperfect input distribution that the RNN may be subjected to in FEM simulations. Thus, the error distribution from this dataset is used to create copies of the perturbed training dataset, which are appended to the original training dataset and used for training the RNN in the subsequent epoch, as depicted in Figure 2.
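A minimal sketch of the KDE-based perturbation step, assuming SciPy's `gaussian_kde` and a pooled one-dimensional error distribution (per-component KDEs would work analogously); the function name is hypothetical:

```python
import numpy as np
from scipy.stats import gaussian_kde

def perturb_dataset(X, errors):
    """Fit a KDE to the observed prediction errors and add independently
    sampled perturbations to the input sequences X.

    X      -- (n_samples, n_seq, n_dim) input sequences
    errors -- 1D array of observed prediction errors
    """
    kde = gaussian_kde(errors)
    # one independent sample per entry: perturbations are uncorrelated
    # across time steps and across sequences, as stated in the text
    eps_star = kde.resample(X.size).reshape(X.shape)
    # in the paper only the stress and history-variable components are
    # perturbed; here all components are perturbed for brevity
    return X + eps_star  # the labels remain unchanged
```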
The perturbations are introduced into the history variable tensor and the stress tensor of the training dataset to simulate the effect of prediction errors in the input sequence, as illustrated by the perturbed datasets in Figure 2. Within the context of the selected RNN architecture, overfitting is observed. Thus, the number of epochs is determined by the onset of overfitting, while the predefined number of epochs is set to a very high value. The training stops automatically when overfitting is detected. The perturbations are introduced into the training dataset without interrupting the training, so overfitting can be monitored even while the dataset is being augmented. The continuous monitoring of overfitting and the continuous introduction of perturbations lead to a drastically reduced number of epochs.
The decision to utilize the validation dataset for obtaining the error distribution in the first and second perturbation steps within an epoch is based on an investigation performed with both the validation and the test dataset. The pre-processing of the datasets prior to training is performed as given in [3]. The test dataset is thus smaller than the validation dataset, rendering it more susceptible to variability and fluctuations and leading to an unreliable error distribution. A wait duration parameter is introduced into the algorithm, which controls the frequency of the dataset perturbation: with a higher wait duration, the RNN has more epochs to optimize its parameters on the same dataset. Conversely, with the wait duration set to 1, the training dataset is perturbed continuously, exposing the network to a new dataset in every epoch.
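The overall continuous scheme could then look as follows. This is a sketch under stated assumptions: Keras-style `fit`/`evaluate`/`predict` calls, the `perturb_dataset` routine sketched above, signed residuals as a simple stand-in for the paper's error measure of Equation 1, and hypothetical parameter names such as `wait_duration` and `n_copies`:

```python
import numpy as np

def continuous_adversarial_training(model, X_tr, y_tr, X_val, y_val,
                                    wait_duration=1, n_copies=1,
                                    max_epochs=10_000, patience=20):
    """Augment the training set with perturbed copies while training and
    monitor overfitting on the clean validation set."""
    best_val, wait = np.inf, 0
    X_aug, y_aug = X_tr, y_tr
    for epoch in range(max_epochs):
        model.fit(X_aug, y_aug, epochs=1, verbose=0)
        # overfitting is monitored on the *unperturbed* validation set only
        val_loss = model.evaluate(X_val, y_val, verbose=0)
        if val_loss < best_val:
            best_val, wait = val_loss, 0
        else:
            wait += 1
            if wait >= patience:   # onset of overfitting ends the training
                break
        if epoch % wait_duration == 0:
            # first perturbation step: errors on the clean validation set
            err = (model.predict(X_val, verbose=0) - y_val).ravel()
            X_val_pert = perturb_dataset(X_val, err)
            # second perturbation step: errors on the perturbed validation
            # set mimic the imperfect inputs the RNN sees in FEM simulations
            err_pert = (model.predict(X_val_pert, verbose=0) - y_val).ravel()
            X_aug = np.concatenate(
                [X_tr] + [perturb_dataset(X_tr, err_pert) for _ in range(n_copies)])
            y_aug = np.concatenate([y_tr] * (n_copies + 1))
    return model
```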
Feedback testing, as illustrated in Figure 1, is performed to assess the network's performance in the presence of adversaries within the input data and its robustness against them. It provides a metric for gauging the network's resistance to adversarial influence.

NUMERICAL EXAMPLE
The dataset for the training is generated by numerical material tests (NMT) on an RVE consisting of a cylindrical fiber with linear elastic behaviour, oriented parallel to the z-axis with a radius of 0.4 mm, embedded in an elasto-plastic matrix.
The constitutive law and the material parameters are given in [3]. The side length of the RVE is 1 mm. A cyclic macroscopic strain $\bar{\boldsymbol{\varepsilon}}$ is applied over the RVE as

$\bar{\boldsymbol{\varepsilon}}(t) = \boldsymbol{\varepsilon}_\text{amp}\, \Lambda(t),$

where $\boldsymbol{\varepsilon}_\text{amp}$ gives the amplitudes of the strain components and the cyclic function $\Lambda(t) \in [-1, 1]$ gives the direction of loading. Here, $t$ is a unitless quantity describing the time step, not necessarily physical time, and $t_\text{max}$ is the maximum number of time steps. The values of $\boldsymbol{\varepsilon}_\text{amp}$ are randomly distributed in $[\varepsilon_\text{min}, \varepsilon_\text{max}]$ to account for various loading setups. Additional methodologies, as outlined in [5], could be explored to further enhance the efficacy of the dataset generation.
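One possible realization of this load case generation is sketched below; the triangular form of $\Lambda(t)$ and the uniform sampling of the amplitudes are assumptions for illustration, since the explicit definition of $\Lambda$ is not reproduced here:

```python
import numpy as np

def cyclic_strain(t_max, eps_min, eps_max, n_comp=6, cycles=1,
                  rng=np.random.default_rng()):
    """Generate one cyclic macroscopic strain history
    eps(t) = eps_amp * Lambda(t) with Lambda(t) in [-1, 1]."""
    eps_amp = rng.uniform(eps_min, eps_max, size=n_comp)  # component amplitudes
    t = np.arange(t_max)
    # triangular wave starting at 0, assumed here as the cyclic function
    lam = (2.0 / np.pi) * np.arcsin(np.sin(2.0 * np.pi * cycles * t / t_max))
    return np.outer(lam, eps_amp)  # (t_max, n_comp) strain history in Voigt notation
```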

Dataset generation
The dataset generated by the NMT on the RVE comprises the imposed strains and the corresponding acquired stresses. The stress magnitude varies considerably across different components due to structural or loading-induced anisotropy. Data normalization is employed to enhance the generalization capability of the network and to effectively handle varying input data scales. The normalized dataset is partitioned for training and testing: the training dataset contains 90% of the data, while the remaining 10% is allocated to the testing dataset, which is used for feedback testing. To ensure the model's ability to generalize with respect to varying sizes of load steps, a random dilution technique is applied. It involves randomly excluding certain load steps within a load case while still retaining the points that signify a change in the loading direction and preserving the temporal order of the loading steps [6]. Multiple instances of dilution are carried out to encompass all possible combinations.
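A minimal sketch of such a dilution step, assuming that a turning point is detected as a sign change of the strain increment in any component (function and argument names are hypothetical):

```python
import numpy as np

def random_dilution(seq, keep_prob=0.5, rng=np.random.default_rng()):
    """Randomly drop load steps from a sequence while always keeping the
    turning points and preserving the temporal order [6].
    seq -- (n_steps, n_comp) load case."""
    d = np.diff(seq, axis=0)
    # a step is a turning point if any component changes loading direction
    turning = np.zeros(len(seq), dtype=bool)
    turning[1:-1] = np.any(np.sign(d[1:]) != np.sign(d[:-1]), axis=1)
    turning[[0, -1]] = True                      # always keep the end points
    keep = turning | (rng.random(len(seq)) < keep_prob)
    return seq[keep]                             # masking preserves the order
```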
In this case, the history variable is a scalar and, together with the stress, conveys the information regarding the load history. By incorporating these components in the input, the RNN is capable of capturing the phenomena associated with history dependency. The input sequence for the training data is assembled from the randomly diluted data as an array of dimension $n_\text{seq} \times n_\text{dim}$, where $n_\text{dim}$ is the number of input components and $n_\text{seq}$ is the sequence length. While generating the training dataset, the stress and history variable values are taken from the NMT data.

Model evaluation
Two RNN models with different architectures are evaluated, as described in Table 1.
A dropout rate of 0.2 is applied to each layer. The RNN models are trained on two different datasets, with $n_\text{seq}$ of 4 and 30, obtained from the same NMT. The performance of the RNN is parameterized by the median error (ME) and the standard deviation (SD) of the MSE of the errors. The best-performing RNN for each dataset, with $n_\text{seq}$ of 4 and 30, is selected after conducting a detailed investigation, which cannot be presented here. From Table 1, it is observed that RNN 2 has a significantly lower SD. The difference in the SD could be attributed to various factors, such as the longer sequence length, the wider architecture and the lower wait duration. The training progression of the RNNs is illustrated in Figure 3, which indicates that RNN 2 trained for more epochs than RNN 1, signifying a later onset of overfitting. Furthermore, RNN 2 required 1487 min to complete its training, whereas RNN 1 took 668 min.
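For clarity, the two reported measures can be computed as follows, interpreting them as the median and the standard deviation of the per-load-case MSE (our interpretation of the text):

```python
import numpy as np

def performance_metrics(y_pred, y_true):
    """Median error (ME) and standard deviation (SD) of the per-sample MSE."""
    axes = tuple(range(1, y_true.ndim))   # average over all but the sample axis
    mse = np.mean((y_pred - y_true) ** 2, axis=axes)
    return np.median(mse), np.std(mse)
```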
A load case predicted by RNN 2 via feedback testing on the test dataset is compared to the target labels obtained by NMT in Figure 4. An example in which all components are loaded is selected randomly; it exhibits a close fit of the RNN output to the target values.
The trained RNN models are integrated into an in-house code to facilitate structural analysis. For this purpose, a beam example is considered, where one end is fixed and the other end is subjected to loading along all three axes. A comparison between a reference solution obtained by the FE² method and the RNN-based solution is shown in Figure 5. It is clearly observed that RNN 2, trained on a dataset with a sequence length of 30, outperforms RNN 1, trained with a sequence length of 4. The results of RNN 2 closely resemble the reference solution while being obtained considerably more efficiently. Additionally, the re-usability of the RNN needs to be taken into consideration, as the same RNN can be used as a material model for numerous simulations, as long as the loading remains within the boundaries set by the training data.
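Conceptually, the coupling reduces to a small stateful wrapper per integration point. The following is a hypothetical interface, not the in-house implementation, assuming the first six outputs are the stress in Voigt notation:

```python
import numpy as np

class RNNMaterial:
    """Wraps the trained RNN as a constitutive model: the FE code only
    prescribes strains, all other inputs are fed back from predictions."""
    def __init__(self, model, n_seq, n_eps, n_out):
        self.model = model
        self.window = np.zeros((n_seq, n_eps + n_out))  # rolling input window
        self.y_prev = np.zeros(n_out)

    def stress(self, eps):
        row = np.concatenate([eps, self.y_prev])        # strain + fed-back outputs
        self.window = np.vstack([self.window[1:], row])
        self.y_prev = self.model.predict(self.window[np.newaxis], verbose=0)[0]
        return self.y_prev[:6]                          # assumed: stress first
```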

CONCLUSION AND OUTLOOK
Within this contribution, an efficient training algorithm for RNNs is proposed. This algorithm not only ensures the robustness of the RNN during application but also makes the training process efficient by reducing the number of epochs. An RNN-based constitutive description is also extremely efficient compared to traditional methods of numerical homogenization like FE². Furthermore, the presented results demonstrate the RNN's excellent capability to accurately capture the intrinsic nonlinear relationship between the input and output quantities of a constitutive description. The comparison of the RNN model to the reference solution in the FEM framework displays the RNN's robustness against its own prediction errors, and a good agreement between the reference FE² solution and the RNN-based solution is observed.
The framework discussed in this contribution, however, is contingent on the amount of data required to train the RNN. To address this challenge, physics-augmented loss functions could be explored as a potential solution. Moreover, the prediction of the shear components shows a relatively low accuracy compared to the direct components. This discrepancy could be attributed to the intricate complexities inherent to the shear behaviour. An orthogonal decomposition of the stress components could be used to reveal the underlying correlations, which could reduce the complexity of the input data during training and improve the performance of the RNN for the shear components.

FIGURE 1 Illustration of feedback testing.

FIGURE 2 Flowchart for continuous self-adversarial training.

FIGURE 3 Training progression of the RNNs.

FIGURE 4 Performance of the neural network with a sequence length of 30.
FIGURE 5 Force-displacement relations for the beam loaded in x-, y- and z-direction at the free end.
TABLE 1