Towards a framework for enforcing resilient operation of cyber-physical systems with unknown dynamics

Ensuring that safety-critical cyber-physical systems (CPSs) continue to satisfy correctness and safety specifications even under faults or adversarial attacks is very challenging, especially in the presence of legacy components for which accurate models are unknown to the designer. Current techniques for secure-by-design systems engineering do not provide an end-to-end methodology for a designer to provide real-time assurance for safety-critical CPSs by identifying system dynamics and updating control strategies in response to newly discovered faults, attacks or other changes such as system upgrades. We propose a new methodology, along with an integrated framework implemented in MATLAB to guarantee the resilient operation of safety-critical CPSs with unknown dynamics. The proposed framework consists of three main components. The runtime monitor evaluates the system behaviour on-the-fly against its correctness specifications expressed as signal temporal logic formulas. The model synthesiser incorporates a sparse identification approach that is used to continually update the plant model and control policies to adapt to any changes in the system or the environment. The decision and control module designs a controller to ensure that the correctness specifications are satisfied at runtime. For evaluation, we apply our proposed framework to ensure the resilient operations of two CPS case studies.

requirements, but at a later stage, this correctness guarantee is invalidated, possibly due to adversarial attacks on sensors, or violation of environment assumptions. Ensuring that a safety-critical CPS continues to satisfy correctness and safety specifications under faults or adversarial attacks is still very challenging, especially in the presence of legacy components for which accurate models are unknown to a designer. Current techniques for secure-by-design systems engineering do not provide an end-to-end methodology for a designer to provide real-time assurance for safety-critical CPSs by identifying system dynamics and updating control strategies in response to newly discovered faults, attacks or other changes such as system upgrades. Most of the CPS security literature [1, 3-5, 14-20] also focuses on identifying whether an attack is occurring, rather than identifying whether system correctness and safety requirements are being violated due to the attack and updating the controller at runtime only if needed to ensure resiliency.
Here, we propose a methodology, along with an associated framework, to ensure that a safety-critical CPS will continue to satisfy its correctness requirements under unexpected failures or adversarial attacks during runtime operation. The key challenges to be addressed are: (1) we consider safety-critical CPSs that have (a) legacy software and physical components in which the physical plants and control policies are unknown to the designer, and (b) model properties that may change over time due to system upgrades, adversarial attacks, and changes in environment assumptions; and (2) as new vulnerabilities are discovered, requirements aimed at ensuring resiliency also evolve, which requires system identification to be repeated and the control to be updated continually in real time. Figure 1 shows an overview of our proposed framework, which builds around three blocks: (a) a runtime monitor, (b) a model synthesiser, and (c) a decision and control module.
The first module is a runtime monitor block, which continuously monitors at runtime whether the system continues to operate safely while achieving the mission objectives. In other words, its function is to capture the erroneous behaviour of an actual system against its correctness properties in the presence of faults and attacks, over some finite time horizon ahead. To support a rich set of correctness properties, we specify the objectives and constraints of a system using signal temporal logic (STL) [21], a prominent specification formalism for real-time systems. Our runtime monitor block utilises the algorithm proposed in [22] to compute and maintain the robust satisfaction intervals of an STL formula over partial traces on-the-fly. It also incorporates a bug analyser that examines counterexamples produced by the monitoring process to provide feedback to a designer. We note that although runtime monitoring cannot prove system correctness, it is a best-effort approach to efficiently finding faults in safety-critical CPSs whose dynamics and control policies are unknown to the designer, and for which formal verification may be infeasible due to the uncertainty, complexity and heterogeneity of the system.
Many cyber or physical components in the system may not have accurate models available. Further, if the runtime monitor detects that the system no longer satisfies the specifications, the system or the environment may have changed due to faults, cyber or physical attacks, or simply because the system behaves differently in a new region of the state space (e.g., a car slipping on ice). The second module, a model synthesiser, constantly performs system identification for the system and its interactions with the environment to construct and adapt the system model. In our approach, we adopt a lightweight sparse regression technique proposed in [23] to identify nonlinear system dynamics on-the-fly. Because it uses sparse regression, our model synthesiser is computationally efficient, requires a small amount of training data, and can produce an interpretable model, which makes it viable for online training and execution in response to rapid system changes.
The last module is a decision and control module. Given the system model and the correctness requirements (i.e. safety requirements and mission goals), this module aims to automatically redesign a learning-based controller so that the system continues to satisfy the correctness requirements under unexpected failures or attacks. Our decision and control module is developed based on the Simplex Architecture [24], which involves the use of two controllers: a complex (learning-based) controller that provides high-performance control to achieve mission-critical goals, and a safety controller designed with simplicity to enforce safety-critical requirements. If the monitor detects that the complex controller does not satisfy the specifications, the system is switched to the safety controller in order to enforce safety. Simultaneously, the module analyses the counterexamples (i.e. negative robustness of STL formulas computed by the monitor block) and feedback from the bug analyser (involving a designer in the loop) to understand the erroneous behaviours. Based on that, the module automatically synthesises a new control strategy that makes the system (with the updated plant model from the synthesiser) continue to operate safely while achieving the mission objectives. Here, we assume that the input space of a system can be parametrised by a set of parameters; controller synthesis is then accomplished by conducting a parameter synthesis [25], which automatically searches over the parameter space to determine the best values of the control inputs. It is important to point out that the notion of resilience considered here is specific to a safety-critical system (i.e. the system can switch between its controllers to avoid unsafe regions) and does not extend to recovering the system from an unsafe region to a safe region in response to faults, attacks or other changes such as system upgrades.
Our integrated framework, consisting of the model synthesiser, the runtime monitor, and the decision and control module, is built in MATLAB and requires users to provide the following inputs: (a) input-output time-series data used to learn and update the plant model, (b) the environmental disturbances and control inputs, which are periodically updated following each update of the plant model, and (c) the performance and safety requirements of the system specified in terms of STL formulas. For evaluation, we apply the proposed framework to ensure the resilient operation of two case studies in the robotics and automotive control domains. The first case study is a simplified model of a robotic arm incorporating a series elastic actuator that needs to be resiliently controlled under a physical attack. The second case study is a simplified model of an adaptive cruise control (ACC) system under a sensor spoofing attack. For both case studies, we will present how to learn and monitor the original models against their correctness specifications encoded as STL formulas at runtime. Then, we describe the scenarios of faults and attacks on the two systems. Finally, we demonstrate how our framework can be used to update the plant dynamics and synthesise new control strategies to maintain the resilient operation of those systems during faults and attacks.

| Contributions
In summary, the main contributions of the paper are as follows.
• A new methodology to facilitate resilient operation of safety-critical CPSs against unanticipated attacks and failures, ensuring that both the mission objectives and safety requirements are met while minimising the model and controller updates.
• Consideration of systems that may include legacy components, such that the physical plants and control policies must be continually synthesised to adapt to changes in system properties and environments.
• The end-to-end design and implementation of a MATLAB toolkit, which integrates the model synthesiser, runtime monitor and decision and control module to automatically update the plant and control strategies to adapt to changes in system properties and environments at runtime.
• The application of our proposed framework to two proof-of-concept case studies, where the CPS models can be updated online to ensure their resilient operation in the presence of unanticipated attacks and failures.

| Paper organization
The remainder of this paper is organised as follows. Section 2 presents an overview of our proposed methodology through a simplified example of a series elastic actuator (SEA). Section 3 describes the runtime monitor of our framework with respect to specifications encoded as STL formulas. Sections 4 and 5 present the model synthesiser and the decision and control module of our framework, respectively. Section 6 presents two case studies that illustrate the capability of our framework in continuously maintaining the resilient operation of the CPS models of a robotic arm incorporating a SEA element under a physical attack, and a simplified ACC system under a sensor spoofing attack. Section 7 situates our proposed methodology within the existing literature and discusses some limitations of our approach. Section 8 presents conclusions and future work.

| ILLUSTRATIVE EXAMPLE
In this section, we will explain our methodology through a simplified example of a SEA model. We assume that the system dynamics and control strategies have been learnt from a given data set, and that the learnt model satisfies a safety requirement pertaining to the position of the end-effector, which varies with an input force. However, due to a physical attack on the system, the previously learnt model no longer satisfies the safety requirement. In the following, we will describe how our framework can automatically detect a violation under the attack and synthesise a new SEA model that continues to satisfy its safety requirement.

F I G U R E 1 An overview of the proposed framework that can facilitate resilient operation of safety-critical CPSs in the presence of faults and attacks. The framework involves the integration of three main blocks: (1) a model synthesiser that constantly performs online system identification to update the system model, (2) a runtime monitor that continuously monitors at runtime whether the corresponding implementation can achieve its mission objectives with safety guarantees encoded as STL formulas, and (3) a decision and control module that analyses the counterexamples generated from the monitoring process and designs a new controller at runtime to enhance system resilience. CPS, cyber-physical system; STL, signal temporal logic

For simplicity, we consider the SEA model shown in Figure 2, which consists of two masses coupled with a spring and a dashpot, with the dynamical equations

m_A ẍ_A = F - K(x_A - x_o) - d(ẋ_A - ẋ_o),
m_o ẍ_o = K(x_A - x_o) + d(ẋ_A - ẋ_o),    (1)

where the subscripts A and o respectively denote the actuator and end-effector (output) variables of position, velocity and acceleration. F is the force input by the actuator, K is the spring constant, and d is the adjustable (controlled) damping coefficient. Here, we assume d = d_e + d_c, where d_e denotes the damping of the environment and d_c is the controllable damping. m_A and m_o are the masses of the actuator and end-effector, respectively. In correspondence to a real motion system, the first mass is connected to the actuator of the system, whereas the second mass represents the end-effector, the functional part of the system.
Assume that the position sensor is located on the end-effector side, that is, x_A is not observable; only x_o is measured. Thus, Equation (1) can be rewritten as a fourth-order differential equation in F and x_o, which is complex. We consider an example safety specification of the system such that the velocity of the end-effector should always be less than a threshold a and the position of the end-effector cannot exceed a threshold b. This safety requirement can be encoded as the STL formula

φ_safe = □((ẋ_o < a) ∧ (x_o < b)).

The system can also have other performance specifications with respect to limitations on overshoot, settling time and steady-state error. The corresponding STL formulas of these performance requirements will be introduced later in Section 6.1.
Suppose that F = F_0 sin(2πt), F_0 = 50, m_A = 5, m_o = 1, K ∈ [50, 100] and d ∈ [10, 20]. We first generate a data set comprising 100 simulations of Equation (1) with random values of d and K within their given ranges. Given the simulated data, we use the SINDYc (sparse identification of nonlinear dynamics with control) algorithm [23] to identify the SEA model. The model learnt using SINDYc behaves similarly to the true system (the simulations of Equation (1)) and satisfies the safety specification. However, assume that there is a physical attack on the system that increases the damping of the environment; hence the learnt model is no longer accurate. The runtime monitor of our framework predicts that the safety specification is no longer satisfied. Thus, we need to relearn the plant model and update the input force F and the value of d_c to maintain the resilient operation of the system. To do that, we call the model synthesiser incorporating the SINDYc algorithm to relearn the correct model. The decision and control module then updates the controller to ensure that the correctness specifications are met for the new learnt model. Figure 3 illustrates that the error between the true system and the predicted model output increases significantly when the parameter value changes at t = 10 s. The model is then relearnt and updated at t = 12 s, which brings the error down once again. We note that parameters such as the spring constant, damping coefficient and output mass were arbitrarily chosen for illustration purposes. One can select a different set of parameter values to represent the system dynamics, control strategy and attack scenario for a SEA model. We will discuss the example in more detail in Section 6.1.
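To make this relearning loop concrete, the following Python sketch (our own illustration, separate from the MATLAB framework) simulates the two-mass SEA plant of Equation (1), lets the environmental damping jump at t = 10 s, and re-estimates the damping coefficient at t = 12 s by a least-squares fit on recent data. The Euler discretisation, parameter values and estimator are all simplifying assumptions made for illustration.

```python
import numpy as np

def step(state, F, m_A, m_o, K, d, dt):
    """One Euler step of the two-mass SEA dynamics of Equation (1)."""
    xA, vA, xo, vo = state
    aA = (F - K * (xA - xo) - d * (vA - vo)) / m_A
    ao = (K * (xA - xo) + d * (vA - vo)) / m_o
    return np.array([xA + dt * vA, vA + dt * aA, xo + dt * vo, vo + dt * ao])

dt, m_A, m_o, K = 1e-3, 5.0, 1.0, 75.0
d_true, d_model = 20.0, 20.0          # the learnt model starts out accurate
plant, model = np.zeros(4), np.zeros(4)
hist = []                              # (relative velocity, damping force) samples
err_before = err_after = 0.0
for k in range(int(14.0 / dt)):
    t = k * dt
    if t >= 10.0:
        d_true = 100.0                 # physical attack: environment damping jumps
    F = 50.0 * np.sin(2 * np.pi * t)
    new_plant = step(plant, F, m_A, m_o, K, d_true, dt)
    model = step(model, F, m_A, m_o, K, d_model, dt)
    ao = (new_plant[3] - plant[3]) / dt            # measured end-effector accel.
    hist.append((plant[1] - plant[3],              # relative velocity
                 m_o * ao - K * (plant[0] - plant[2])))  # residual = d * rel. vel.
    plant = new_plant
    if 11.9 <= t < 12.0:
        err_before = abs(plant[2] - model[2])      # mismatch just before relearning
    if abs(t - 12.0) < dt / 2:                     # relearn from the last 2 s of data
        V = np.array([h[0] for h in hist[-2000:]])
        R = np.array([h[1] for h in hist[-2000:]])
        d_model = float(V @ R / (V @ V))           # least-squares damping estimate
        model = plant.copy()                       # re-initialise the model state
err_after = abs(plant[2] - model[2])
```

After the refit, the damping estimate recovers the new plant value and the prediction error collapses, mirroring the qualitative behaviour shown in Figure 3.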

| RUNTIME MONITORING OF STL
F I G U R E 2 A simplified example of SEA model. SEA, series elastic actuator

F I G U R E 3 The error between the true system and the predicted model output with respect to the actuator position. Here, the total damping coefficient d increases from 20 to 100 at t = 10 s due to a physical fault or attack, so the learnt model is no longer accurate. Our method relearns the model at t = 12 s, and the relearnt model can then be used to update the controller

Monitoring is an automated approach that aims to capture violations of the correctness specifications that we want the system to satisfy. In monitoring, we first execute the system with respect to a given input, then check whether the output satisfies or violates the correctness specifications expressed using formalisms such as temporal logic. Monitoring can be performed both at design time (offline monitoring) and during runtime operation (online monitoring). In our approach, we aim to conduct online monitoring to capture the erroneous behaviour of an actual system against correctness properties in the presence of faults and attacks at runtime, over a finite time horizon T_H. To express the correctness requirements of a system over its continuous real-valued signals, we use STL [21].
In what follows, we will describe the concepts of signal, system, the syntax and semantics of STL, and the corresponding online monitor.

| Signals and systems
We define a signal w as a function w : T → D, where T ⊆ R≥0 is the time domain, which is a finite or infinite set of time instants. If D = B, w is a Boolean signal whose value is either true or false, and if D = R, then we say that the signal is real-valued. In this paper, we consider piecewise linear signals, that is, sequences of time-value pairs. A trace, w : T → D_1 × ⋯ × D_n, is a collection of n signals, where ∀t ∈ T, w(t) = (w_1(t), w_2(t), …, w_n(t)). An input-output system Σ is a function mapping a given input trace u : T → D^m to an output trace x : T → D^n. Intuitively, we can consider w = (u, x) as one input-output execution trace of the system Σ with n variables that describes an evolution of the system. Without loss of generality, we consider that the states and outputs of the system Σ are the same. We reserve the use of bold letters like w, x, u for traces (i.e. tuples of signals), while we use lowercase italicised letters such as w_i to represent signals.

| Signal temporal logic
STL can be defined in terms of its syntax and semantics. Syntax describes the structure of syntactically-correct formulas for the logic, while semantics describes the meaning of the formulas and the rules to evaluate them.
A signal predicate ϕ is a formula of the form y(w(t)) ≥ 0, where y is an arbitrary real-valued function. An interval I is of the form [a, b], where a and b are real numbers and 0 ≤ a < b. If I is not specified, we assume that I = [0, ∞). We also allow the Boolean operators ∨ and ⇒ with their standard meaning. Temporal operators used in STL formulas include always (□), eventually (♢) and until (U), where ♢_I φ = true U_I φ and □_I φ = ¬♢_I ¬φ. For example, a trace w ≜ {w_1, w_2} satisfies the formula ♢_[1,2)(w_1 > w_2) if there exists a time instant t, 1 ≤ t < 2, such that w_1(t) is greater than w_2(t). Next, we will define the quantitative semantics of STL, which captures the robust satisfaction of STL formulas [21,26].
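To make the standard quantitative semantics concrete, here is a minimal discrete-time robustness evaluator for a small STL fragment (a Python sketch of the usual min/max semantics [21,26]; the tuple-based formula encoding is our own illustrative choice and is not part of our toolkit).

```python
def robustness(phi, w, t):
    """Quantitative semantics of a small STL fragment over sampled traces.

    w: dict mapping signal name -> list of samples; phi: nested tuples.
    """
    op = phi[0]
    if op == 'pred':                      # ('pred', f): f maps (w, index) -> real
        return phi[1](w, t)
    if op == 'neg':
        return -robustness(phi[1], w, t)
    if op == 'and':
        return min(robustness(p, w, t) for p in phi[1:])
    if op in ('always', 'eventually'):    # ('always', (a, b), sub) over t + [a, b]
        a, b = phi[1]
        end = min(t + b, len(next(iter(w.values()))) - 1)
        vals = [robustness(phi[2], w, k) for k in range(t + a, end + 1)]
        return min(vals) if op == 'always' else max(vals)
    raise ValueError('unknown operator: %s' % op)

# always_[0,4] (x < 30): robustness is the worst-case margin 30 - x over the window
trace = {'x': [1, 5, 12, 20, 28, 31]}
phi = ('always', (0, 4), ('pred', lambda w, k: 30 - w['x'][k]))
print(robustness(phi, trace, 0))   # min(29, 25, 18, 10, 2) = 2
print(robustness(phi, trace, 1))   # min(25, 18, 10, 2, -1) = -1
```

A positive value indicates satisfaction with some margin; a negative value indicates a violation, which is exactly the signal our monitor reacts to.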

| Robust interval semantics
In our approach, we aim to conduct online monitoring to capture the erroneous behaviour of an actual system against STL properties in the presence of faults and attacks at runtime, over a finite time horizon T_H [22]. For online monitoring, only a partial trace of the system is available for estimating the satisfaction value. However, the quantitative semantics of STL recursively defined by Equation (3) only works with a complete trace, so it is limited to offline monitoring. For online monitoring, the quantitative semantics of STL in Equation (3) must be extended to work with a partial trace over a finite time interval. Next, we define the notions of partial trace and robust satisfaction interval, and then describe the interval-based semantics of STL. Intuitively, a partial trace w_[0,i] is the prefix of a trace w observed up to a time instant t_i, and the robust satisfaction interval of an STL formula over w_[0,i] is the smallest interval containing the robustness values of all complete traces that have w_[0,i] as a prefix. We note that the robust satisfaction interval captures all possible robust satisfaction values corresponding to the suffixes of a partial trace. The robust satisfaction interval defined in Definition 3.4 can also be obtained by recursively computing a function [γ] defined as follows.
The function [γ](φ, w_[0,i], t), which captures the robust satisfaction interval of an STL formula φ over a partial trace w_[0,i] and a time t ∈ T, can be defined recursively as follows:

[γ](y(w) ≥ 0, w_[0,i], t) = [y(w(t)), y(w(t))] if t ≤ t_i, and [y_inf, y_sup] otherwise;
[γ](¬φ, w_[0,i], t) = -[γ](φ, w_[0,i], t);
[γ](φ_1 ∧ φ_2, w_[0,i], t) = min([γ](φ_1, w_[0,i], t), [γ](φ_2, w_[0,i], t));
[γ](φ_1 U_I φ_2, w_[0,i], t) = sup_{t′ ∈ t ⊕ I} min([γ](φ_2, w_[0,i], t′), inf_{t″ ∈ [t, t′)} [γ](φ_1, w_[0,i], t″)),

where min, sup and inf over intervals are applied componentwise to the interval end points, negation swaps and negates the end points, and y_inf and y_sup denote the infimal and supremal values of the function y(w(t)) over the signal domain D, respectively.

| Online monitoring of STL
The runtime monitor block of our framework utilises the algorithm proposed in [22] to monitor an STL formula online, owing to two advantages. First, the algorithm can efficiently compute and maintain the robust satisfaction intervals of an STL formula over partial traces with incomplete data. Secondly, the algorithm requires a small, trace-length-independent amount of memory, so it is fast enough to run in a real-time manner for a complex system. Given an STL formula, the algorithm first breaks the formula down into a syntax tree comprising multiple sub-formulas, where each node represents a temporal operator and each leaf represents a signal predicate. For example, Figure 4 shows the syntax tree of an STL formula where each node is annotated with its corresponding time horizon. Based on the syntax tree, the algorithm computes on-the-fly the robust satisfaction interval for a partial trace of the system in a bottom-up fashion. That is, the algorithm proceeds upward on the syntax tree, updating the satisfaction interval of a node only if there is an update on the satisfaction interval of its children; for example, the robust satisfaction interval of ♢_[τ2,τ3](x_2 > 10) is updated based on that of (x_2 > 10). As a result, the robust satisfaction interval of an STL formula is recursively combined from the robust satisfaction intervals of its sub-formulas (represented by nodes and leaves). We will not present the algorithm further here, but refer the reader to [22] for a detailed explanation. It is important to note that our runtime monitor's error detection mechanism is based on evaluating the robust satisfaction intervals of STL formulas; thus it is generic and does not depend on the types of attacks and faults.
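As an illustration of the interval idea, the following Python sketch (a heavy simplification of the algorithm in [22], restricted to a single always formula over a uniformly sampled partial trace; the encoding is our own) shows how unseen samples widen the interval, and how a violation can nonetheless become conclusive before the trace is complete.

```python
def rosi_always(samples, horizon, b, y_inf, y_sup):
    """Robust satisfaction interval of always_[0,horizon] (x < b) on a partial trace.

    Observed samples contribute their exact margin b - x; each still-unseen
    sample contributes the predicate's full range [y_inf, y_sup].
    """
    margins = [b - x for x in samples[:horizon + 1]]
    missing = (horizon + 1) - len(margins)
    lo = min(margins + [y_inf] * (1 if missing > 0 else 0))
    hi = min(margins + [y_sup] * (1 if missing > 0 else 0))
    return lo, hi

# Signal domain assumed to be [0, 40], so the margin 30 - x ranges over [-10, 30].
print(rosi_always([1, 5, 12], 5, 30, -10, 30))             # (-10, 18): inconclusive
print(rosi_always([1, 5, 12, 20, 28, 31], 7, 30, -10, 30))  # (-10, -1): violation
```

In the second call the upper bound is already negative, so the monitor can report a conclusive violation even though two samples of the horizon are still missing.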

| MODEL SYNTHESIS
The model synthesiser block is designed to continually synthesise the plant model and control policies to adapt to changes in system properties and environments at runtime. While learning-based methods such as Gaussian processes [27-29] and deep neural networks [30-32] have been used to perform offline identification of dynamical systems, applying them to online system identification is challenging. The main drawbacks of these methods are: (1) they often require large volumes of training data, (2) the learnt model may not be interpretable and (3) the learning process is time-consuming, which limits their use for online identification. In our approach, we use the sparse identification of nonlinear dynamics with control (SINDYc) approach proposed in [23] to identify CPS models online. Generally speaking, SINDYc is suitable for online identification as it can overcome the disadvantages of machine-learning approaches like Gaussian processes and neural networks. The underlying principle of SINDYc is sparse regression, which is computationally efficient and robust to noise. Sparse regression also requires a small amount of training data and can produce an interpretable model, which makes SINDYc viable for online training and execution in response to rapid system changes.
F I G U R E 4 An example of the syntax tree of an STL formula. STL, signal temporal logic

Suppose that the plant model of a system Σ has the form ẋ = f(x, u), where x ∈ R^n is the state, u ∈ R^m is the control input and f : R^n × R^m → R^n is a smooth mapping function. Let Θ(x, u) be a library of candidate nonlinear terms of x, u and the cross terms including both x and u; a sparse regression will then be used to identify the few active terms in Θ that approximate the function f. Assume that we can collect n measurements of the state x corresponding to the input u and arrange them into the data matrices X = [x_1 x_2 … x_n] and U = [u_1 u_2 … u_n]; the library of candidate nonlinear functions Θ can then be evaluated as

Θ(X, U) = [1  X  U  X ⊗ U  X^2  U^2  …],

where the operator ⊗ defines the vector of all product combinations of the components in x and u. The dynamics of a learnt model can then be derived from the equation Ẋ = ϒΘ^T(X, U), where ϒ is a sparse coefficient matrix. A sparse regression will then be used to identify ϒ with the fewest nonlinear terms that give good model performance, by solving the Lagrangian minimisation problem

ϒ_i = argmin_{ϒ′_i} ‖Ẋ_i - ϒ′_i Θ^T(X, U)‖_2 + α‖ϒ′_i‖_1,

where the subscript i denotes the i-th row of each matrix, and α defines a sparsity penalty for the order of a learnt model, chosen to balance between model complexity and accuracy. The SINDYc algorithm takes three inputs, namely the library of candidate functions Θ^T(X, U), the derivative Ẋ and the parameter α, and outputs a sparse coefficient matrix ϒ. The algorithm allows a user to smartly select a suitable library of candidate terms, depending on the application and the trade-off between model complexity and data fit. A good strategy is to learn polynomials first, and then increase the complexity of the library by including other terms such as trigonometric functions.
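The sparse-regression step can be sketched with the sequential thresholded least-squares iteration commonly used to implement this kind of identification (a Python illustration on a made-up scalar system ẋ = -2x + u; the library, threshold and data are our own assumptions, not the toolkit's actual code).

```python
import numpy as np

dt = 0.01
t = np.arange(0.0, 10.0, dt)
u = np.sin(t)
x = np.zeros_like(t)
for k in range(len(t) - 1):            # simulate the true system dx/dt = -2x + u
    x[k + 1] = x[k] + dt * (-2.0 * x[k] + u[k])
dx = np.gradient(x, dt)                # numerically estimated derivative

# Candidate library Theta(X, U) = [1, x, u, x^2, x*u, u^2]
Theta = np.column_stack([np.ones_like(x), x, u, x ** 2, x * u, u ** 2])

def stlsq(Theta, dx, threshold=0.1, iters=10):
    """Sequential thresholded least squares: the sparse-regression core of SINDYc."""
    xi = np.linalg.lstsq(Theta, dx, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        if (~small).any():
            xi[~small] = np.linalg.lstsq(Theta[:, ~small], dx, rcond=None)[0]
    return xi

xi = stlsq(Theta, dx)
# Active terms recovered: the x column (coefficient near -2) and u (near 1)
```

Only two of the six candidate terms survive the thresholding, giving the kind of interpretable model discussed above.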
For example, the simplified SEA model presented in Section 2 only has polynomial terms such that we do not need to learn sinusoidal components to avoid unnecessary computation. Our model synthesiser incorporates the SINDYc algorithm to constantly perform system identification for the system and the interactions with the environment to construct and adapt the system model. The model synthesis procedure consists of the following steps.
1. Given an empty model Σ, at design time, we call the SINDYc algorithm to learn the first operation mode of Σ from given input-output data. Our priority is to learn the least complicated model while ensuring that the error between the true system and the learnt model output is bounded within a given threshold specified by a designer. The learnt model is then verified against the system requirements.
2. At runtime, the model synthesiser constantly calls SINDYc to update the system dynamics in response to newly discovered faults, attacks or other changes. Each new learnt model is incorporated as one operation mode into the system Σ. Thus, Σ can be considered as a hybrid system in which each operation mode defines a different region of the state space.

The switching conditions between operation modes of Σ are synthesised based on the counterexamples returned from the runtime monitor. A switching condition can be time-dependent, state-dependent or both. For example, the simplified SEA model will have two operation modes with respect to different values of the environmental damping d_e, and the switching condition from the old operation mode to the new one can be d_e > λ. We perform a parameter synthesis that automatically searches over a parameter space to determine the best value of λ, ensuring that Σ continues to satisfy its correctness specification.
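This λ-synthesis step can be sketched as a black-box, robustness-guided search (a Python illustration: plain random search stands in for the CMA-ES-based method of [25], and the robustness function rho, peaking at an assumed safe value of 0.3, is a made-up stand-in for an actual monitor evaluation).

```python
import random

def synthesise(rho, lo, hi, budget=2000, seed=1):
    """Search the parameter range [lo, hi] for the value maximising robustness."""
    rng = random.Random(seed)
    best_p, best_r = None, float('-inf')
    for _ in range(budget):
        p = rng.uniform(lo, hi)
        r = rho(p)
        if r > best_r:
            best_p, best_r = p, r
    return best_p, best_r   # best_r < 0 would mean no safe parameter was found

# Illustrative robustness landscape with its maximum at p = 0.3
rho = lambda p: 1.0 - abs(p - 0.3) * 4.0
p_star, r_star = synthesise(rho, 0.0, 1.0)
```

Because only robustness values are queried, the same search skeleton works regardless of whether the parameter is a switching threshold or a controller gain.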

| DECISION AND CONTROL MODULE
The final module is a decision and control module developed based on the Simplex Architecture [24], which involves the use of two controllers: a complex (learning-based) controller that provides high-performance control to achieve mission-critical goals, and a safety controller designed with simplicity to enforce safety-critical requirements. Overall, the decision and control module performs the following tasks: (a) design a complex controller that can be continually updated at runtime, (b) analyse counterexamples returned from the verifier and feedback from the bug analyser, and (c) perform appropriate switches between the complex controller and the safety controller in order to make the system continue to satisfy the correctness requirements under unexpected failures or attacks. In our approach, the complex controller is continually updated such that it utilises the current model of the plant and ensures that the performance specifications are met. We consider the following heuristic for the usage of the complex controller: given a finite time horizon T_H, at every control iteration, the complex controller Σ.CC of a system Σ is used only if there exists δ < T_H such that running Σ.CC up to time δ and then switching to the safety controller keeps the safety properties satisfied over the horizon, where Σ.P(Init, Σ.CC, t) denotes the state of Σ at time t evolved from Init when using the complex controller, an STL formula φ represents the safety properties of Σ, and Σ.P and Σ.SC denote the learnt plant and safety controller of Σ, respectively. Intuitively, this heuristic states that the complex controller is used only if there is always enough time to appropriately switch to the safety controller to enforce system safety, were a fault to arise. We note that the decision module is minimally conservative, in that the usage of the complex learning-based controller is maximised to achieve mission-critical objectives.
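The switching heuristic can be sketched as a forward-simulation check (a Python illustration over a toy one-dimensional, discrete-time system; the step functions, horizon and safety bound are our own assumptions).

```python
def safe_to_use_complex(state, step_complex, step_safety, is_safe, horizon):
    """Allow the complex controller only if some switch time delta < horizon
    lets the safety controller keep every visited state safe."""
    for delta in range(1, horizon):
        s, ok = state, True
        for k in range(horizon):
            s = step_complex(s) if k < delta else step_safety(s)
            if not is_safe(s):
                ok = False
                break
        if ok:
            return True
    return False

# Toy system: complex controller advances fast (+2), safety controller brakes (-1),
# and the state must stay below the safety bound 10.
print(safe_to_use_complex(0, lambda s: s + 2, lambda s: s - 1, lambda s: s < 10, 5))
print(safe_to_use_complex(9, lambda s: s + 2, lambda s: s - 1, lambda s: s < 10, 5))
```

From state 0 there is time to switch later, so the complex controller is allowed; from state 9 even a single complex step violates the bound, so the module must fall back to the safety controller immediately.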
The working principle of our decision and control module is shown in Algorithm 1, where [γ](φ, w_[0,i], t) is the robust satisfaction interval at the time t when the corresponding statement of the algorithm is executed. The module decides to switch to the safety controller when the runtime monitor returns a negative robust satisfaction interval against the safety specification φ. If switching to the safety controller still does not ensure that the system continues to satisfy φ, we must stop the system immediately. Otherwise, the system keeps using the safety controller until the plant dynamics and the controller are updated. The controller-synthesis function takes as inputs the new plant model and the STL specifications, and outputs a new control strategy that makes the system continue to operate safely while achieving the mission objectives. In effect, this function addresses the following control synthesis problem.

Problem 1 Given a dynamical system Σ, a safety specification φ and a mission goal ϕ over a finite time horizon T_H, find a control strategy u such that

• Σ(u) ⊨ φ ∧ ϕ, and
• the robust satisfaction interval with respect to φ, that is, [γ](φ, Σ(u)_[0,i], t), is maximised, where t ≤ t_i ≤ T_H,

where Σ(u) denotes the output of system Σ with respect to an input u. For a complex system, solving this control synthesis problem is intractable even if we know the input constraints. In our approach, we assume that the input space U of the system Σ can be parametrised by a set of parameters P_u. Thus, the control synthesis problem can be reduced to a parameter synthesis problem: find an optimal input parameter p* ∈ P_u such that the two conditions in Problem 1 are satisfied. For example, one possible use case is tuning the control parameters of a PID controller to maximise the robustness satisfaction, that is, making the system as robust as possible. In our framework, we incorporate the method proposed in [25] to conduct a parameter synthesis over the parameter space of control inputs. If the synthesiser can find the best value of the control inputs over the given range, the module updates the controller so that the system continues to satisfy its safety requirements and achieve its mission objectives. Otherwise, the tool suggests that the designer search over different parameter ranges while the safety controller is still being used. We note that the robustness-guided search implemented in our framework is generic, as it can be applied to sophisticated learning controllers based on a neural network or a Gaussian process.
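The parameter-synthesis view of controller design can be sketched as follows (a Python illustration that tunes a single proportional gain for a toy first-order plant, not the SEA model; the grid search stands in for the CMA-ES-based method of [25], and all values are assumptions).

```python
import numpy as np

def robustness(kp, ref=5.0, b=6.0, goal=4.5, dt=0.01, T=5.0):
    """Robustness of always(x < b) AND eventually(x > goal) for a toy closed loop."""
    x = 0.0
    rob_safe, rob_reach = float('inf'), float('-inf')
    for _ in range(int(T / dt)):
        x += dt * (-x + kp * (ref - x))       # plant dx/dt = -x + u, u = kp*(ref - x)
        rob_safe = min(rob_safe, b - x)       # quantitative semantics of always
        rob_reach = max(rob_reach, x - goal)  # quantitative semantics of eventually
    return min(rob_safe, rob_reach)           # conjunction takes the minimum

gains = np.linspace(0.1, 20.0, 200)           # parametrised input space P_u
kp_star = max(gains, key=robustness)          # robustness-guided search
```

A low gain never reaches the goal (negative robustness), while the search settles on a gain that satisfies both conjuncts with a positive margin.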

| CASE STUDIES
In this section, we demonstrate the applicability of our proposed framework to ensuring the resilient operation of two case studies. For each case study, we will present how to learn and monitor the original models against their specifications encoded as STL formulas at runtime, present the scenarios of faults and attacks on the two systems, and then demonstrate how our framework can be used to update the plant dynamics and synthesise new control strategies to maintain the resilient operation of those systems. Our framework was tested using MATLAB 2018a and MATLAB 2018b, executed on an x86-64 laptop with a 2.8 GHz Intel(R) Core(TM) i7-7700HQ processor and 32 GB RAM. All performance metrics reported were recorded on this system using MATLAB 2018a. To perform parameter synthesis, we choose the CMA-ES solver [33], with a maximum optimisation time of 30 s.

| Robotic arm with SEA model
The first case study is a robotic arm with a SEA model. We previously introduced the simplified example of the SEA model in Section 2 to illustrate our approach. The SEA element has been applied widely in compliant robotic grasping, as it makes a robot less stiff and enables more accurate, less noisy, and more stable force control. In this section, we present the simplified example of a robotic arm that incorporates a SEA element with a PID controller. The controlled input force F in Equation (1) has the form
F = K_p e(t) + K_i ∫_0^t e(τ) dτ + K_d de(t)/dt,
where e(t) = x_d − x_o(t), x_d is the desired position, and K_p, K_i, and K_d are the proportional, integral, and derivative gains of the PID controller, respectively. We note that the robotic arm with a SEA model presented herein is not representative of the complexity of a true robotic arm system, but a simplified example in which the dynamics, parameters, and control equations are chosen for simplicity of presentation. Suppose that we want the robot to move towards the grasping point x_o = 5 no later than 10 s, and to eventually move to the grasping point x_o = 20 within 20 s. This objective requirement can be encoded as an STL formula. We also want to ensure that the system satisfies its performance and safety requirements. Here, the performance requirements constrain the overshoot and transient (i.e. settling time) behaviours of the system, and the safety property specifies that the robot cannot move beyond the position of 30. Those requirements can be formalised as STL specifications (φ_overshoot(SEA) and φ_transient(SEA) for performance, plus a safety formula).
We first use SINDYc to identify the model from data acquired from 100 simulations of Equation (1) with K ∈ [10, 20], d ∈ [5, 15], and m_A = m_o = 1. Initially, x_o = 1 and all other variables are zero, and there is no external force applied to the system. Originally, the PID controller with K_p = 0.001, K_i = 0.05, and K_d = 1.85 can control the system to achieve the objective requirement while ensuring the satisfaction of the other performance and safety properties.
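To make the identification step concrete, the following is a minimal sketch of the sequential thresholded least-squares regression at the core of the SINDy family of methods, applied to synthetic data from a spring-damper plant in the spirit of Equation (1). The parameter values (K = 10, d = 5, unit mass), the candidate library, and the use of exact derivative measurements are assumptions made for illustration; SINDYc itself additionally includes the control input in the candidate library.

```python
import numpy as np

def stlsq(Theta, dXdt, threshold=0.1, n_iter=10):
    """Sequential thresholded least squares (the core of SINDy):
    repeatedly solve least squares, then zero out small coefficients."""
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for j in range(dXdt.shape[1]):
            big = ~small[:, j]
            if big.any():
                Xi[big, j] = np.linalg.lstsq(Theta[:, big], dXdt[:, j], rcond=None)[0]
    return Xi

# Generate trajectory data from an assumed 'true' plant x'' = -K x - d x' (K=10, d=5)
K, d, dt = 10.0, 5.0, 0.001
x, v = 1.0, 0.0
X, dX = [], []
for _ in range(5000):
    a = -K * x - d * v
    X.append([x, v]); dX.append([v, a])
    v += a * dt
    x += v * dt
X, dX = np.array(X), np.array(dX)

# Candidate library of terms: [x, v, x^2, v^2, x*v]
Theta = np.column_stack([X[:, 0], X[:, 1], X[:, 0] ** 2, X[:, 1] ** 2, X[:, 0] * X[:, 1]])
Xi = stlsq(Theta, dX)   # Xi[:, 0] models x-dot, Xi[:, 1] models v-dot
```

On this data, the regression recovers ẋ = v and v̇ = −10x − 5v, with the nonlinear library terms correctly thresholded to zero; it is this sparsity that keeps re-identification fast enough to run whenever the monitor flags a model change.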
Assume that there is a physical attack on the system at t = 10 s with a constant external force F_ext = 5, and the value of d changes to 100. Thus, the original model learnt using SINDYc is no longer correct. The runtime monitor predicts that the system will violate the safety property in the future, as illustrated by the red dashed line in Figure 5. To maintain resilient operation, the system is forced to use a safety controller, which simply applies the maximum braking a_max until the robot stops. At the same time, the model synthesiser calls SINDYc to learn a new model that takes into account the external force and the new value of d. With the new plant model, the decision and control module conducts a parameter synthesis [25,34] to identify new values of K_p, K_i, and K_d. Since the violating trace goes beyond the safe distance of x_o = 30, the PID controller needs larger parameter values in order to reach the desired position of x_o = 20. The decision and control module then conducts a parameter synthesis over the ranges K_p, K_i ∈ [0, 1] and K_d ∈ [0, 20]. The model synthesiser took 0.354 s to learn the new plant dynamics, while the parameter synthesis took 1.247 s to learn the new values of K_p, K_i, and K_d. Figure 5 shows that the new plant model and control strategy with K_p = 0.2, K_i = 0.9, and K_d = 10.5, updated at t = 12 s, make the system continue to satisfy the safety requirement and achieve its objectives (i.e. φ_overshoot(SEA) and φ_transient(SEA)).

| ACC system
Next, we consider a simplified example of an ACC system, comprising two autonomous cars: the host car and the lead car. The original ACC system operates in two modes: speed control and spacing control. In speed control, the host car travels at a driver-set speed. In spacing control, the host car aims to maintain a safe distance from the lead car. The vehicle has two state variables: d, the distance to the lead car, and v, the speed of the host vehicle. These states evolve according to the following equations:
ḋ = v_l − v,
v̇ = u − μv²,
where v_l is the lead vehicle speed, u is the control input, and μ is the friction control. In this example, we assume that the lead vehicle speed is known exactly (e.g. it is communicated between vehicles). Conceptually, the dynamics for d represent that the derivative of the relative distance is the difference between the lead vehicle speed and the host vehicle speed. The dynamics for the host vehicle speed indicate that as the vehicle speed increases, more acceleration (i.e. force) from the controller (i.e. engine) is required to maintain speed. The ACC system has two sensors that measure its velocity v: noisy wheel encoders, v_enc = v + n_enc, and a noisy GPS sensor, v_gps = v + n_gps, where n_enc and n_gps denote the encoder and GPS noises, respectively.
Additionally, the ACC system has a radar sensor that measures the distance to the lead vehicle, d_rad = d + n_rad, where n_rad captures the corresponding noise. It is worth noting that the ACC model presented herein is not representative of the complexity of a true ACC system, but a simplified example in which the dynamics and control equations are chosen for simplicity of presentation, and the sensor measurements do not involve complex mechanisms such as sensor fusion or Kalman filtering.
To design a control law, we need to estimate the state of the vehicle (i.e. we need estimates of d and v, which we denote d̂ and v̂). To estimate the distance and velocity, we employ state estimators. In speed control mode, v_l is assumed to be the desired velocity set by the cruise control input. Observe that while it would be possible to use only the wheel encoders all the time, a better velocity estimate can be obtained by averaging the velocity measurements from both the GPS and the wheel encoders when the GPS sensor is performing within nominal specifications. To implement a controller, control laws based on the state estimates are given for the two modes, where u_s is the control law in speed mode, u_d is the control law in spacing mode, and d_ref = 10 + 2v̂ is the reference distance between the two cars. These control laws incorporate a reference velocity v_l, which can be thought of as a constant gain that depends on the lead (or desired) vehicle velocity. In contrast, the other terms depend on the deviation of the lead vehicle and host vehicle states. Switching between modes is handled by monitoring the state estimates, so that the guards can be written as:
Speed control to spacing control: d̂ < 10 + 2v̂
Spacing control to speed control: d̂ ≥ 10 + 2v̂,
where (in this example) the invariants activated when the guards are enabled force a control mode switch. The safety specification of the system requires that d always be greater than d_safe, where d_safe = v + 5. This safety requirement can be encoded as the STL formula φ_safe(ACC) = □_[0,∞)(d > 5 + v). We consider a scenario in which the initial conditions of the system are v_l = 20, d(0) = 50, v(0) = 30, d̂(0) = d(0), v̂(0) = v(0), and μ = 0.0001. The original model learnt using SINDYc with noise levels |n_rad| ≤ 0.05, |n_enc| ≤ 0.05, and |n_gps| ≤ 0.05 satisfies the safety requirement φ_safe(ACC).
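As a sanity check of this closed loop, the sketch below simulates the nominal system assuming the dynamics ḋ = v_l − v and v̇ = u − μv² described above, with perfect state estimates and simple proportional control laws standing in for u_s and u_d; the gains and switching thresholds are illustrative assumptions, not the paper's exact controller.

```python
def simulate_acc(v_l=20.0, d0=50.0, v0=30.0, mu=1e-4, dt=0.01, T=40.0):
    """Toy ACC closed loop with the two control modes and guard-based switching.
    State estimates are assumed perfect (d-hat = d, v-hat = v)."""
    d, v = d0, v0
    mode = "speed"
    trace = []
    t = 0.0
    while t < T:
        d_ref = 10.0 + 2.0 * v                       # reference distance
        # Guard-based mode switching on the (here, exact) state estimates
        if mode == "speed" and d < d_ref:
            mode = "spacing"
        elif mode == "spacing" and d >= d_ref:
            mode = "speed"
        if mode == "speed":
            u = 1.0 * (v_l - v)                      # track the set speed v_l
        else:
            u = 1.0 * (v_l - v) + 0.1 * (d - d_ref)  # also regulate the gap
        d += (v_l - v) * dt                          # d' = v_l - v
        v += (u - mu * v * v) * dt                   # v' = u - mu * v^2
        trace.append((t, d, v, mode))
        t += dt
    return trace
```

Starting from the scenario's initial conditions (d(0) = 50, v(0) = 30, v_l = 20), the host car begins inside the spacing guard, decelerates towards the lead speed, and the trace satisfies d > 5 + v at every sample, matching φ_safe(ACC) in the attack-free case.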
However, at t = 10 s one of the GPS or wheel encoder sensors is spoofed with the value of −20. Consequently, the estimated velocity of the host car becomes significantly less than the actual value. Figure 6 shows that the model will no longer satisfy φ_safe(ACC) at t = 27 s (red dashed line). Thus, the ACC system switches to a safety controller, which increases the friction control to the maximum value of μ = 0.05 to quickly slow down the host car. While the safety controller is in use, the bug analyser of our framework can analyse a counterexample to identify which sensor is spoofed. Figure 6 illustrates the estimated error with respect to the safe distance when using only one of the GPS or wheel encoder sensors (black dashed lines). Based on this, we can determine that the GPS sensor is spoofed and the wheel encoder is safe to use. To provide resilience against the GPS spoofing attack, the new control strategy ignores the GPS value and uses only the wheel encoder to estimate velocity. Figure 6 shows that the system continues to satisfy the safety specification after the new controller, using only the wheel encoder, is deployed at t = 15 s.

F I G U R E 5 Under the external force F_ext = 5 occurring at t = 10 s, the runtime monitor with a prediction horizon T_H = 10 s detects that the robot will violate its safety property in the near future. The system then switches to a safety controller that completely stops the robot at t = 11 s. Our method relearns the plant model and updates a new control strategy to make the robot move to the desired position x_o = 20 while ensuring its performance and safety specifications
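One way such an analysis could localise the spoofed sensor is to cross-validate each velocity sensor against a velocity derived from the radar distance: since ḋ = v_l − v, the radar gives an independent estimate v ≈ v_l − ḋ. The sketch below is illustrative only; the function name, residual test, and threshold are our own assumptions, not the framework's actual bug analyser.

```python
def identify_spoofed_sensor(v_enc, v_gps, d_rad, v_l, dt, tol=1.0):
    """Cross-check each velocity sensor against the radar-derived velocity
    v ~ v_l - d'(t); the sensor with the clearly larger residual is flagged."""
    # Finite-difference the radar distance to get an independent velocity estimate
    v_radar = [v_l - (d_rad[i + 1] - d_rad[i]) / dt for i in range(len(d_rad) - 1)]
    n = len(v_radar)
    err_enc = sum(abs(v_enc[i] - v_radar[i]) for i in range(n)) / n
    err_gps = sum(abs(v_gps[i] - v_radar[i]) for i in range(n)) / n
    if err_gps > err_enc + tol:
        return "gps"
    if err_enc > err_gps + tol:
        return "encoder"
    return None  # no sensor clearly deviates
```

On a synthetic trace where the host travels at 30 m/s while the GPS reports −20, the GPS residual dominates and the function flags "gps", after which the estimator can fall back to the wheel encoder alone, as in the scenario above.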

| RELATED WORK AND LIMITATIONS
In this section, we situate our work within the existing literature and discuss several limitations of our proposed framework.

| Related work
The methodology proposed here aims to provide a comprehensive framework to support model-based design, optimal control, and safety assurance of learning-enabled CPSs. The most closely related work is presented in [35], which provides formal safety guarantees for reinforcement learning in the presence of multiple possible environmental models. The approach synthesises a set of candidate models together with provably correct control policies, then performs a model identification process to select between the available models at runtime in a way that preserves the safety guarantees of all candidate models. However, this theorem-proving approach only works with hybrid systems whose dynamics are known. To apply the approach, both the system and its correctness properties must be modelled using differential dynamic logic, which is less expressive than STL. Moreover, the approach does not relearn the model on-the-fly. Beyond that work, we situate our proposed methodology in the existing literature under the three following categories.

| Model-based design of resilient CPS and CPS security
Examples of model-based approaches to ensure resiliency include those based on co-simulation of discrete-event models [36], attacker-defender games [37], mode-based repair of hybrid systems [38], and resilience proofing [39]. Testbeds such as [40] have also been introduced. CPS security is also a popular topic, including the works [17,41-43]. However, there is a general lack of approaches that can handle systems with a black-box simulator or legacy components. More troublingly, the existing approaches do not offer a model or controller update mechanism for ensuring safe system operation when vulnerabilities are discovered during runtime operation, even when the model has plainly changed due to faults, attacks, or upgrades. We provide an end-to-end methodology and implementation to learn and update the model on-the-fly, with guarantees that correctness requirements are always satisfied.

| Real-time assurance of CPS
Existing techniques for real-time assurance [44-49] are largely limited to systems where the plant dynamics are known and simple, and the environment assumptions are static. However, we consider systems that include legacy components with complex and unknown dynamics, so existing real-time verification approaches for CPSs are computationally expensive and cannot be applied to those systems in an online manner. Instead, we use the runtime monitor as a best-effort approach to efficiently find faults in autonomous CPSs whose dynamics and control policies are unknown to the designer. To monitor a system at runtime, all we need are the input-output traces and the specification formulas representing the correctness requirements of the system. In our approach, we utilise STL to specify the objectives and safety requirements of a CPS, and the algorithm proposed in [22] to perform online monitoring against the STL specifications.
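The monitoring algorithm of [22] computes robustness intervals for full STL over partial traces; as a minimal illustration of the underlying idea, the sketch below incrementally tracks the quantitative robustness of a simple invariant □(x < bound) as samples arrive, reporting a negative value as soon as the observed prefix violates the property. The class name and interface are our own, not those of the tool in [22].

```python
class OnlineAlwaysMonitor:
    """Incremental quantitative monitor for the STL fragment G (x < bound).

    After each sample, update() returns the robustness of the trace prefix
    observed so far: the smallest margin (bound - x) seen. A negative value
    means the invariant is already violated; a small positive value warns
    that the system is close to violation."""

    def __init__(self, bound):
        self.bound = bound
        self.rob = float("inf")

    def update(self, x):
        # Robustness of G(x < bound) over a prefix is the running minimum
        # of the pointwise margins, so one comparison per sample suffices.
        self.rob = min(self.rob, self.bound - x)
        return self.rob
```

Feeding the robot position trace of the SEA case study into such a monitor yields 30 minus the largest position seen so far; the prediction capability of our runtime monitor additionally evaluates the formula on simulated future suffixes up to the horizon T_H.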

| Robust controller design for learning-enabled CPS
Two systems that are close to each other in terms of input-output response in open loop may yield very different performance when put in feedback with the same controller. This realisation led to the development of the area of identification for control, where the goal is to identify models such that controllers designed based on these models provide specific performance guarantees on the true system (see [50] for a comprehensive survey). With the emergence of learning-based controller design using methods such as Gaussian processes [27-29] and deep neural networks [30-32], this field has seen a resurgence. An important open challenge is ensuring analytical guarantees on the stability and performance of the closed-loop system with controllers that are designed based on models learnt from data. In our proposed framework, we use a fast system identification algorithm (e.g. SINDYc [23]), perform runtime monitoring, and then use the monitoring results to update the controller on-the-fly to simultaneously optimise the system performance for mission-critical objectives and guarantee the safety requirements.

F I G U R E 6 Under the sensor spoofing attack occurring at t = 10 s, the runtime monitor with a prediction horizon T_H = 20 s detects that the ACC system will violate its safety property in the near future. The ACC system then switches to a safety controller that applies an emergency brake on the host car. At t = 15 s, a new control strategy that uses only the wheel encoder sensor is deployed, making the host car travel at the desired speed while the ACC system continues to satisfy its safety specifications. ACC, adaptive cruise control

| Limitations of our approach
There are several shortcomings of our proposed approach. Although the STL-based monitoring procedure used in our runtime monitor is scalable, generic, and can effectively detect violations of STL specifications, it cannot give a formal proof of system correctness. Depending on the type and complexity of the safety-critical CPS, other automatic verification tools can be considered to perform a reachability analysis or to formally prove whether the system satisfies a given safety property. Examples of such verification tools are d/dt [51] and SpaceEx [52] (for linear/affine hybrid systems), Flow* [53] and dReach [54] (for nonlinear systems), and C2E2 [55] (for Stateflow models). Moreover, despite the expressiveness of STL, there are requirements of safety-critical CPSs, including stability, security, and safety properties defined over multiple execution traces of the system [56-58], that cannot be expressed or monitored using STL.
Our model synthesis using SINDYc may be limited by the measurement data available to update the model to achieve resilience, and may not scale to the complex CPSs that are of practical interest. One solution is to redesign our model synthesiser to run on parallel platforms to reduce computation time and enhance scalability. In addition, the functionality of our decision and control module is not fully automated yet, as (a) it still requires feedback from the designer to update the controller for specific systems, and (b) the control synthesis is currently based on conducting a parameter synthesis where the parameter space of the control inputs needs to be provided by users.

| CONCLUSION AND FUTURE WORK
Here, we have presented a new methodology that can efficiently maintain the resilient operation of safety-critical CPSs with unknown dynamics in the presence of faults or adversarial attacks. The proposed methodology, along with the associated framework, combines techniques from system identification, adaptive control, and runtime monitoring. In particular, the runtime monitor predicts violations of correctness requirements expressed as STL formulas. The model synthesiser incorporates a sparse system identification approach that can continually update the plant model and control policies to adapt to changes in system properties and environments. The decision and control module redesigns the controller such that the performance specifications will be met under the given system conditions and performance metrics at runtime. We demonstrated the applicability of our methodology by using the proposed framework to efficiently ensure that the CPS models of two case studies continue to meet their safety and performance requirements under realistic attacks.

| Future work
We intend to extend our framework in several directions. First, we plan to evaluate our current framework on other safety-critical CPSs with a higher level of complexity beyond the two case studies in Section 6. Second, we want to incorporate other learning approaches, such as Gaussian processes and deep neural networks, into the model synthesiser to perform offline identification of dynamical systems. Side-information such as conservation laws, symmetries, or dissipativity properties [59,60] that the system satisfies can be added to the learnt model to improve its fidelity. Third, the learning-based controller will be constructed using concepts from classical control, such as dissipativity, as well as reinforcement learning [61-63]. This combination will increase the fidelity of the learning-based controller with a guarantee of dissipativity properties while providing the ability to quickly update a controller learnt via reinforcement learning. Besides using a runtime monitor, we plan to develop a comprehensive verifier that allows the use of formal verification with respect to correctness requirements at multiple stages. The verifier will include an off-the-shelf verifier such as Flow* [53], dReach [54], or NNV [64,65] to verify the satisfaction of STL formulas based on performance requirements for the controller being designed, as well as a runtime verification tool to monitor in real time any violations that occur as the true system evolves with the designed controller. Finally, we also want to extend the framework to deal with distributed learning-enabled CPSs that have multiple, heterogeneous agents whose motion dynamics are unknown.