Fault tolerance in non-linear systems: A model-based approach with a robust soft sensor design

A novel multiple Kalman ﬁltre (KF)-based scheme is proposed, as a generalisation of conventional gain-scheduling techniques, for fault diagnosis and tolerance in a large class of multiple-input and multiple-output non-linear systems. The outputs are corrupted by unknown stochastic disturbance and measurement noise. A reliable and computationally efﬁcient, two-stage identiﬁcation of a piecewise linear parameter-varying Box–Jenkins dynamic model that better approximates the non-linear system, at each operating point, and the design of the associated KFs are proposed. Novel emulators, whose induced parameter changes mimic likely and predictive operating scenarios, are used to provide an accurate model identiﬁcation and robustness to noise, disturbance, non-linearity errors and model perturbations. These crucial emulators generate missing representative data, aid predictive analytics, and improve the reliability and accuracy of the identiﬁed KF model. A novel formulation of the KF is used for fault isolation, and the Bayes strategy is used to isolate difﬁcult-to-detect incipient faults in noisy environments. The proposed scheme leads to the design of a novel robust soft sensor aimed at replacing the maintenance-prone hardware sensor in practical applications including product quality assessment, performance monitoring, condition-based maintenance, fault diagnosis and fault-tolerant control. The proposed soft sensor was successfully evaluated on simulated and laboratory-scale physical systems.


INTRODUCTION
With non-linearity and complexity pervading the real physical world, it becomes unquestionably important to develop powerful tools to design such systems with a stable high performance in the face of many unavoidable and performancelimiting constraints such as components wear and tear and unknown Gaussian disturbances, model uncertainties and measurement noise. To mitigate these nefarious effects, the design of effective condition-based monitoring and preventive maintenance programs has gained great importance as it would allow for a timely detection of potential faults and take quick corrective measures to prevent any loss in performance and/or occurrence of any potentially harmful disasters, including loss of production and harm to the personnel. These preventive measures are at the core of system fault tolerance, which ensures continued system production and performance despite the occurrence It is anticipated that the wave of soft sensing will sweep through the measurement world through its increasing use of the Internet of Things (IoT) devices which provide the internet with its connectivity with physical devices that are embedded in the electronics and sensors. These devices can interact with each other and communicate with other devices, over the internet, and can be remotely monitored and controlled. The IoT may be described as a fusion of multiple technologies, real-time predictive analytics, machine learning, sensors, and embedded systems. Its industrial counterpart, called the Industrial Internet of Things (IIoT) relies on the use of smart sensors and actuators found in the industrial sector, for applications such as the enhancement of manufacturing and industrial processes. In the important application of fault diagnosis (FD), focused upon in this study, their use still poses a challenge in view of the stringent and conflicting requirements for a high probability of correct fault detection and isolation and a low false-alarm probability. There has been a surge of interest in both academia and industry in the area of soft sensing due to their attractive features. Our own interest here lies in studying the use of soft sensing for FD in a large highly complex non-linear system, which depicts a typical industrial process so as to make our study as realistic as possible. We shall now discuss our proposed multiple-input and multiple-output (MIMO) scheme for FD in such large industrial processes, starting with a review of the two main conventional approaches to FD, and discussing in detail, the new tools used that make up this new scheme, including linear parameter-varying (LPV) models, Box-Jenkins (BJ) models, emulators, robust two-stage Kalman filtre (KF)-based identification technique, and Bayes' discriminant function.

Motivation
The prime motivation for focusing on the model-based approach here is that it will serve as the vehicle or platform that will facilitate the introduction, analysis and application to FD of the new LPV model used here, and other new tools such as emulators and a computationally efficient identification technique.

LPV
For a large class of physical systems, a linearised model covering the entire operating region, may not however accurately capture the behaviour of the system over a wide range of operating regimes resulting from variations in the input to, and parameters of, the subsystems. In recent years, many complex, high-order and non-linear physical systems including aircrafts, robot manipulators, ships, mechatronic systems, multi-machine power systems, power electronics devices, and automotive systems have been successfully modelled using the LPV framework for developing gain-scheduling controllers, fault diagnosis schemes, and real-time simulation systems to predict the integrity of the system under different operating scenarios [12][13][14][15]. A piecewise linear BJ-MIMO system is obtained as a better approximation to the non-linear model in our proposed scheme. It is an augmented version of the signal and disturbance model and is termed herewith the BJ model [5]. The auto regressive (AR), auto regressive and moving average (ARMA), moving average (MA) are all special cases of the BJ model. The transfer matrices of the signal and the disturbance may be very different to each other, unlike those of the ARMA model, where the signal and the disturbance share the same identical denominator polynomial. Unlike the ARMA model, the BJ model of the system is observable but not controllable, while that of the signal is both controllable and observable. In other words, the transfer matrix of the system is non-minimal whereas that of the signal is minimal.

Emulators
As the parameters of the subsystems are not generally accessible to modify so as to generate rich datasets, emulators, which are hardware or software devices, are instead used and connected, in cascade, to the accessible inputs and outputs of the subsystems during the offline system identification phase [2][3][4][5]16]. An emulator may take the form of a pure gain or a linear filter of some finite order to induce gain and/or phase variations. The parameters of the emulator mimic variations in the subsystem it is connected to, which may arise because of the variations in the phase and/or magnitude of the transfer function of that subsystem. A vector, termed influence vector, plays a crucial role in the fault isolation process. The influence vector is made of elements that are partial derivatives of the feature vector (the feature vector is a vector made of the coefficients of the numerator and denominator of the transfer function) with respect to each emulator parameter [2,[13][14][15][16]].

Two-stage identification
A computationally efficient two-stage identification scheme is proposed here as a better alternative to identifying directly the complex high-order MIMO system. In the first stage, the multiple-input and single-output (MISO) system, as well as the associated KFs forming the MIMO system, are identified and then the single-input, singleoutput (SISO) sub-systems and their associated KFs are derived from each of the identified MISO systems.
In the second stage, reduced-order models are identified from the high-order models and their associated KFs obtained in the first stage.
In both stages, the residual of the KF is ensured to be a zero-mean white noise process, and the appropriate orders are then determined using the Akaike information criterion (AIC). Though not related to our method, another two-stage identification method applied to Hammerstein systems can be found in [19].

Fault diagnosis
Physical systems are invariably subjected to perturbations in their practical use. Since the system is formed of the signal and disturbance models, either or both models may be perturbed to reflect the practical nature of the system they are part of. The fault signal is the output of the perturbed subsystem formed of the signal and disturbance models. A fault in the signal model, the disturbance model or both is isolated analysing residuals of the KFs of both the system and the signal model [1][2][3][4][5].

Main contributions
The existing FD approaches for SISO linear systems system [1] is extended to non-linear MIMO discrete-time systems. As such, our proposed approach subsumes all existing linear systems-based approaches, and generalises the conventional linear-modelling technique, thus leading to a new technique with a better modelling accuracy. The major and important tasks of modelling, identification of the robust KF, capability of detecting and isolating incipient faults, supported by the use of the novel concepts of emulators, feature and influence vectors, extension of the KF use to fault isolation, all constitute some of our major contributions in the proposed design of the novel soft sensor. With the use of a soft sensor, a proactive action such as the one taken in condition-based preventive maintenance must be taken prior to the occurrence of a potential fault. The FD and isolation scheme is extended to a wide class of non-linear systems whose outputs are corrupted by unknown disturbance and measurement noise. The complex high-order MIMO system is split into several MISO systems, which are, in turn, split into simpler low-order SISO systems so that a computationally efficient scheme for system identification and FD can be developed [5].
The proposed scheme has been evaluated on several simulated systems as well as on laboratory-scale physical systems including a non-linear process control system. The maintenance-prone hardware sensors of a robot or of any other industrial process, especially those where space is critical, can now be reliably and cost-effectively replaced by soft sensors designed as shown in this study.
The study is organised as follows. Section 2 develops the piecewise linear BJ-MIMO model that approximates the nonlinear model using an LPV framework. The BJ-MIMO model is split into MISO and SISO models for computational efficiency in both identification and FD. In Section 3, the model perturbation based on the emulator used to mimic the likely perturbation scenarios is developed. In Section 4, the key properties of the KF, including the model-matching and model-mismatch ones, are derived. The two-stage identification scheme using the residual model of the KF is explained for identifying the robust models of both the system and the signal, and their associated KFs. The FD scheme based on the residuals of the KFs of the system and the signal for FD is developed. Applications of the soft sensor are also discussed. In Section 5, the proposed scheme for identification and FD on a laboratory-scale process control system is successfully evaluated. Finally, in Section 6, conclusions are given.

LPV MODEL
The operating conditions change not only due to disturbances and variations in the system states but also due to changes in the system parameters, environmental variables, production quantities, quality of raw material and so forth. The scheduling variables, related to the operating points used, include the steady-state reference input, state variables, velocity and power, and external variables such as the altitude, temperature, pressure, and air speed. A finite number of scheduling variables is selected to cover the entire operating regime, including the linear and non-linear ones. The scheduling variables are denoted = [ 1 2 3 . L ] T ∈ ℝ L . Based on this, a piecewise linear dynamic model approximating the non-linear system is then obtained. Figure 1 shows the trajectory of the operating region (in solid line) as a function of the scheduling variable. The operating points along the trajectory are shown as square dots. The input segments and their corresponding output ones are indicated by dashed lines. The operating points of the perturbed system (mimicked by the emulators) are indicated by round dots about each operating point.
The LPV approach is based on identifying the system model at all the operating points, instead of only at one of them, as is conventionally done, and designing the KFs for each of the identified models using the two-stage identification scheme. At each operating point, a linear model, termed optimal model, is identified as a better fit (in the least-squares sense) to the inputoutput data from the set of emulated perturbations. The optimal model thus obtained characterises the behaviour of the system over wider operating regions (in the neighbourhoods of the operating points) whereas the conventional model characterises the behaviour merely at the nominal operating point, that is, the conventional approach assumes that the model of the system remains unperturbed at every operating point.
In order to ensure an acceptable error of the approximation of the non-linear system by the LPV model at each operating point, the input, and hence the output, of the optimal model for a specific operating point, are both restricted to lie within the segment, or interval, pertaining to this operating point, as is shown in Figure 1. The set of robust optimal models thus obtained approximates a wider class of non-linear systems. The approximation accuracy increases with the increase in the number L of segments. Hence, the set of the L KFs identified at each of the L operating points approximates the desired single KF of the overall non-linear model in Equation (1). Note here that the determination of L is dictated by the case-specific accuracy requirements of the user who then uses his/her a-priori knowledge of the severity, in different regions, of the system non-linearity being dealt with. The more severe the non-linearity is in a particular region, the more warranted the use of a higher number of operating points in this region will be. An analogous situation is the area of non-linear sampling where different sampling rates are to be selected by the user depending on his/her a-priori knowledge of the fast-and slow-changing regions of the signal.

2.1
The non-linear model The MIMO non-linear state-space model of a system is where the non-linear functions f(.) ∈ ℝ n , g(.) ∈ ℝ q and h(.) ∈ ℝ m of Equation (1) are smooth functions; x(k) ∈ ℝ n is the system state vector; u(k) ∈ ℝ p is the user-defined input that generates the signal s(k) ∈ ℝ q to be estimated; u w (k) ∈ ℝ p is an inaccessible input that generates the disturbance w(k) ∈ ℝ q ; v(k) ∈ ℝ q is the measurement noise; y(k) ∈ ℝ q is the measurement output; y (k) ∈ ℝ m , termed latent variable, is the plant output that needs to be estimated as it is either inaccessible or not measurable. The signal, disturbance and measurement noise are all assumed to be uncorrelated with each other. It is important to note that the frequency responses of the signal and disturbance models are not restricted to have non-overlapping spectra, which adds to the practicality of our proposed scheme.
An example of the above model includes the Hammerstein and Wiener models, which are a cascade combination of a linear system and a static non-linear function, in different orders.

Illustrative example
Consider a Hammerstein model formed of a static non-linear block cascaded with a linear dynamic model that follows it. The Hammerstein model is a special kind of a non-linear system, which has applications in many engineering problems and its identification is still a topic of active research and development [17]. Consider a first-order non-linear system with a saturationtype non-linearity given by: where a = 0.9, b = 1; (u(k)) = tanh(u(k)); u(k) = sin( 0 k), Using the LPV approach, the input u(k) is approximated by u lpv , which is obtained by segmenting u(k) into L adjacent segments u lpv = [ Δu 1 Δu 2 Δu 3 … Δu L ] where each segment covers the region on either side of each of the L operating points, i . For example, Δu i is the input segment at the operating point i which covers the time interval The LPV model approximation is given by: is the slope of the non-linearity (u) = tanh(u) evaluated at the scheduling variable (or operating point) i . The non-linear system is identified at each operating point using a persistently exciting input formed of a sum of sinusoids [18]. The piecewise linear model at each operating point, i , i = 1, 2, … , L, is identified using the LPV approach. The number of segments, L, for approximation is determined using an iterative scheme. At each iteration, a number of segments was first selected and the performance of the identified model was evaluated by computing the mean squares error between the true and the identified model outputs. The higher the number of segments selected, the smaller the mean-squared error will be used but at the cost of a higher-computational complexity. Using simulation, the number of segments that provided the best trade-off between the identification error and the computational complexity was found to be L = 45. Figure 2 compares the non-linear, and the piecewise linear model using the sinusoid as a test input. In Figure 2, the top subfigure shows the sinusoidal input to the non-linear system, whereas the bottom one compares the true output of the non-linear system with those obtained using the LPV-based piecewise linear model and the approximate linear model. The proposed scheme gives a significantly better performance of the LPV piecewise The simulation results clearly show that the proposed piecewise linear model using the LPV framework is a better fit to the non-linear dynamic system. Depending upon the requirements of accuracy, the number of segments is then accordingly selected. The larger the number of segments used, the better the model accuracy. However, the computational complexity increases with an increasing number of selected segments.

State-space BJ model
The LPV approach is based on identifying the system model and its associated KF at all of the L operating points, i , instead of only at one such point, and then applying the two-stage identification to each of the identified models.
The resulting linear approximation model is an augmented model of the signal and the disturbance, namely, the BJ model. The system output y(k) ∈ ℝ q is an additive sum of the signal s(k) ∈ ℝ q , disturbance, w(k) ∈ ℝ q and the measurement noise v(k) ∈ ℝ q : ℝ is the real scalar field. For notational convenience, the operating point is implied and not explicitly shown. Let y(k), s(k), w(k), x(k), u(k), u w (k) and v(k) be the respective steady-state values of the output, signal, disturbance, state, input, disturbance input and measurement noise variable or notational convenience, the dependence on the operating point i is supressed, The LPV-based output of the system can be expressed as At each operating point, the system is linearised. The linearised MIMO state-space model (A, B, C) at an operating point i is where A ∈ ℝ nxn , B ∈ ℝ nxp , C ∈ ℝ qxn and E w ∈ ℝ nxp ; (A, B, C) is a linearised model at an operating point; x(k) ∈ ℝ n is the state; u(k) ∈ ℝ p is a user-defined input that is accessible; u w (k) ∈ ℝ q is a zero-mean white noise process that is inaccessible; v(k) ∈ ℝ q is an inaccessible measurement noise that is a zero-mean white noise process. Also, is termed the signal model as it represents a noise-free and disturbance-free models and (A w , B w , C w ) is the disturbance model. Assumption: u(k), u w (k) and v(k) are mutually uncorrelated. The linear model in Equation (5) captures the local behaviour in the neighbourhood of the steady-state values, defined by where x , r , w , and v are the respective allowable deviations for the prescribed accuracy which depends upon the application at hand. Transfer matrix: The map relating y(z ) and u(z ) of the linearised model in Equation (5), denoted by G(z ), is a qxp transfer matrix: where N(z ) is the qxp numerator matrix polynomial; D(z ) = |(zI − A)| is a scalar denominator polynomial; I ∈ ℝ nxn is an identity matrix: .
The output of the system is: where ∈ ℝ q is the output and 1 × q vector error, =[ 1 2 . q ] T , given by Computationally efficient scheme: The transfer matrix of the complex and high-order MIMO system in Equation (8) is formed of q MISO transfer matrices G j (z ); j = 1, 2, … , q, and the corresponding MISO systems are, in turn, decomposed into pq SISO transfer functions G ji (z ) j = 1, 2, … , q, i = 1, 2, … , p so as to develop a computationally efficient scheme for identification and fault diagnosis purposes.

Selection of the operating points
The problem of determining the number L of operating points is a trade-off between the accuracy of the LPV-based piecewise linear model in Equation (5) compared to the true non-linear model in Equation (1), on the one hand, and the computational complexity of the LPV approximation using L operating points to capture the non-linear behaviour of the system, on the other. The LPV approximation, y lpv (k), of the non-linear system subjected to the input u(k), that is obtained from using L operating points is given by where is the slope of the non-linearity shown in Figure 1 at the operating point i . The measure of the approximation accuracy, denoted by L , is the 2-norm of the sequence of errors between the outputs of the non-linear system y 0, i (k) and its LPV approximation y lpv, i (k) across all of the input intervals Δu i (k) fori = 1, 2, … , L, that is: Assuming that the computational time to compute y(k, , i ) is T for all i, the computational complexity, denoted by C T and measured by the time to compute all the outputs y(k, , i ), i = 1, 2, … , L is: It can be deduced from Equations (12) and (13) that the accuracy and the computational complexity increase with an increase in the number of segments, L. An optimal choice of L may be determined from analysing the following issues: 1. Ensuring that the maximum approximation error in Equation (12) is less than the maximum allowable model deviation set by the user according to the application at hand. Note here that this will also aid in maximising the probability of correct fault detection while minimising the probability of false alarm.
2. Ensuring that the total computation time in Equation (13) is less than that the maximum allowable time to detect a fault set by the user as per the application at hand.
The optimal choice of L is clearly a trade-off between these two issues, and is application-dependent. Note here that although, theoretically, such trade-off would impose a lower bound on the smallest possible fault detection time, in practice, this would not be a severely limiting factor, as the currently available computational speeds would be amply sufficient to cater for the needs of practical industrial processes which are only moderately fast.

MISO models
MISO models are derived from Equation (8), and the map relating the single output y j (z ) and the multiple inputs u(z ) becomes where The MISO state-space model (A j , B j , C j ) that relates all the inputs u(k) to a single output y j (k) derived from the state-space model is given by where u w (k) is the input that generates the disturbance w(k) and (A j , B j , C j ) is the state-space model of G j (z ).

SISO models
Similarly, the SISO state-space model relating the input r i (z ), and its associated output, denoted by y ji (z ), which is the same as the output y j (z ) when the input is u i (z ), and the rest of the inputs are set to u j (z ) = 0 for j ≠ i, is The transfer function model becomes where

FIGURE 3 Pairing of the inputs and outputs
The transfer function G ji (z ) may in general be a cascade combination of subsystems{G ji (z )}: The subsystems G jil (z ) may, for example, be a transfer function of a controller, an actuator, a plant or a sensor associated with a position control system, process control system, magnetic levitation system, or other systems [5].

Interconnected system: Pairing of subsystems
The system is an interconnection of subsystems such as the plant, actuator, sensors, and controllers shown in Figure 3, where the top subfigure shows the j th output of the system, i.e. y j = ∑ p i=1 y ji given by Equation (17). The bottom subfigure in Figure 3 shows that the transfer function G ji (z ) in the path from the input r i to the output y i j is formed of subsystems {G i jl (z )}. The subsystem G i jl (z ) is driven by the input u jil (z ) and its output is corrupted by the disturbance w jil (z ). The input and output of G ji (z ) are r i and y ji , respectively, v ji is the measurement noise, ji , given in Equation (17), is the combined effect of the disturbances {w jik } and {v ji } on the output y ji (z ).

MODEL PERTURBATION
We shall discuss here how the overall system model perturbation is represented by multiplicative frequency-dependent perturbation terms in its blocks and how the behaviour of the perturbed blocks is then mimicked by varying the parameters of the newly introduced powerful emulators used in these blocks, during the offline phase of the model identification process.

Model perturbations
The overall interconnected system, G(z ), is formed of pq subsystems G ji (z ), each of which is, in turn, a cascade combination of subsystems {G jil (z )}. A subsystem G jil (z ) is subject to perturbation, which then results first in the perturbation of G ji (z ), and finally in that of the overall system G(z ). Let G ji (z ) be the perturbed subsystem. It can be expressed as a multiplicative combination of the nominal model G 0 ji (z ), and a perturbation term, denoted by g ji (z ) [2], as follows: where g ji (z ) is a stable frequency-dependent perturbation of G 0 ji (z ), and g ji ( j ) represents a frequency-dependent gain, phase, and delay-type perturbations. Emulators: The emulator used for the SISO and the MISO systems to mimic the behaviour of the perturbed subsystem is developed here. An emulator E ji (z ) is connected to the accessible input u i (k) in cascade with G ji (z ). The emulator E ji (z ) may be a static gain , a pure delay z −d or a Blaschke product of first-order all-pass filtres [2], expressed by ∏ i ji2 ( ji1 +z −1 1+ ji1 z −1 ), or merely a single first-order all-pass filtre, given by ji2 ( ji1 +z −1 1+ ji1 z −1 ), where the emulator parameters ji2 and ji1 induce gain and phase changes, respectively. The perturbation term g ji (z ) is either unknown or partially known. During the offline identification phase, likely perturbations are introduced in the nominal model by varying the emulator parameters to induce variations in the phase and magnitude of the cascade combination of the subsystems, thereby implementing the required perturbation in the overall system. In other words, an emulator, denoted by E ji (z ), is selected to induce gain and phase variations in the nominal system. The emulator plays the role of mimicking the perturbation g ji (z ):

Emulator-generated data
During the offline identification phase, an emulator E ji (z ) is connected to the input u i (z ) and is in cascade with the nominal unperturbed model G 0 ji (z ): where . E jp (z ) } and G 0 ji (z )E ji (z ) is the emulator-perturbed model that mimics the actual perturbed model G ji (z ). Similarly G 0 j (z )E j (z ) mimics the perturbed G j (z ) .
The signals, s ji (z ) and s j (z ) which are, respectively, the disturbance and the noise-free output, become where G 0 ji (z )E ji (z ) and G 0 j (z )E j (z ) are the transfer matrices of the signals s ji (z ) and s j (z ), respectively.

Role of emulators
The important role of emulators in (a) obtaining robust and accurate, identified models, and their associated KFs, for the MIMO, MISO and SISO systems, and (b) in aiding the fault isolation task, is now discussed. Optimal robust model identification: An optimal MIMO, MISO and SISO model and the associated robust KFs are identified using the emulator-generated data. The KF associated with the optimal model is not only robust to the disturbance and measurement noise, but also minimises the degradation in the performance of the KF used to estimate the outputs in the face of the ever-present perturbations that occur in physical systems.
Isolation of faulty subsystem: The estimates of the influence vectors and the emulator parameters are employed in isolating the faulty subsystem from the overall system [13].

SOFT SENSOR: A KF-BASED APPROACH
The KFs for the MIMO model in Equation (5), MISO model in Equation (15) and SISO model in Equation (16) are identified directly from the emulator-generated input-output data, and form the core of the designed soft sensor, as shown next.
The KF for the MIMO system is denoted by , with its relevant latent variables and error expressions given by: The KF for the MISO and SISO systems are, respectively, denoted by The relevant latent variables and errors are given bŷ where (A 0 , B 0 , C 0 ), (A 0 j , B 0 j , C 0 j ) and (A 0 ji , B 0 ji , C 0 ji ) are the robust optimal fault-free models; K 0 , K j 0 , and K ji0 are the robust optimal Kalman gains; ‚ y (k),ŷ j (k), andŷ ji (k) are the latent variables, of the corresponding MIMO, MISO and SISO KFs.

Key properties of the KF
Residual models of the KFs given by Equations (23), (24), and (25) associated with the optimal fault-free MIMO, MISO, and SISO are derived next. MIMO residual model: The residual model of the KF that relates the residual e(z ) to the system, the input u(z ) and output y(z ) [2][3] is where relates y(z ) to e(z ); and u(z ) to e(z ), respectively; andF 0 (z ) = |zI − A 0 | is termed here as the Kalman polynomial. Similarly, the residual models for MISO and SISO are obtained as shown below. MISO residuals model: SISO residual model: Signals estimates: The estimatesŝ ji (z ) andŝ j (z ) of the signals (22) are:ŝ

Fault diagnosis
When there is a model mismatch between the actual model and the one embodied in the KF, the residual will no longer be a zero-mean white-noise process as it will also comprise of an additive term called fault indicator [1][2][3][13][14][15][16]. The fault indicator is affine in the deviation involved in the feature vectors associated with the actual and KF-embodied models. The residuals e(k), e j (k) and e ji (k) of the KFs (23), (24) , and (25), respectively, are given below.

Computationally efficient fault diagnosis
The residuals of KFs associated with the q MISO systems in Equation (15) are analysed to monitor the status of the overall MIMO system. The residuals, which are not zero-mean whitenoise process, indicate that the overall system is perturbed, and possibly faulty. The perturbed SISO subsystems are determined by analysing the residuals of the SISO systems forming the perturbed MISO systems.

Identification scheme
To ensure that the identified model is reliable and accurate, the KF residual (the error between the output and its estimate) must be a zero-mean white-noise process, which implies that the identified model has captured completely the static and dynamic behaviours of the model to be identified.

Bayesian hypothesis testing
Fault detection is posed here as a binary composite hypothesistesting problem [2]. The criterion to choose between the two hypotheses, namely, the presence or absence of a fault is based on minimising the Bayes risk, which quantifies the costs associated with correct and incorrect decisions. The decision between the two hypotheses is asserted by comparing the likelihood ratio, which is the ratio of the conditional probabilities under the two hypotheses, to a threshold value, as shown below: where t s (e) is a test statistic that is computed using past residual values e(k) = [ e(k) e(k − 1) . e(k − N ) ] T . The test statistic t s (e) depends upon the class of reference inputs and th is some threshold value that is chosen to meet the stringent and conflicting requirements of a high probability of correct fault detection and isolation and a low false alarm probability.

EVALUATION OF THE PROPOSED SCHEME
The testbed used here is a physical process control which is non-linear with a dead-band non-linearity, and the flow of the fluid is turbulent instead of being the usual laminar flow [13,20,21]. The laminar flow is the flow of a fluid when each particle of the fluid follows a smooth path, which results in the velocity of the fluid being constant, whereas a turbulent flow is an irregular flow that exhibits tiny whirlpool regions, resulting in a non-constant velocity. The non-linear model was approximated using a piecewise linear single-input and multiple-output (SIMO) BJ model. The output of the piecewise linear SIMO-BJ model is modelled as an additive combination of (a) the signal, which is an ideal noise-free height, flow rate, and control input, (b) the disturbance that includes effects of turbulence, and (c) the measurement noise. The BJ model is an augmented state-space model comprised of the model of the signal, and the disturbance. The process control system is a closedloop SIMO system relating the reference input r (k) to the outputs of interest, namely, the control input u(k), flow rate f (k) and height h(k). The desired output such as the height of the fluid in the tank is regulated by using a controller driven by the error between the reference input and the output to be regulated. Since multiple outputs are measured, multiple KFs are employed to detect and isolate the height sensor, flow sensor, actuator, and leakage faults. The multiple KFs included here are associated with the overall closed-loop system relating the input r (k) to all the outputs,u(k), f (k), and h(k); and openloop subsystems such as the actuator and sensor, relating u(k) to f (k) and f (k) to h(k). The system formed of subsystems whose faults are to be isolated, is shown in Figure 4. There are four such subsystems: Subsystem 1 is the flow rate sensor 1 , subsystem 2 the height sensor 2 , subsystem 3 the actuator G 1 = G 0 1 3 where G 0 1 is the fault-free transfer function, and subsystem 4 the leakage fault, 4 . The fault-free cases correspond to γ i =1:i=1,2,3,4. The various subsystems and sensor blocks are all shown in Figure 4. The first two blocks G 0 and G 1 = 3 G 0 1 , represent the controller and the actuator subsystems, respectively. The leakage is modelled by the gain 4 , which is used to quantify the amount of flow lost from the first tank. Thus, the net outflow from tank 1 is quantified by the gain (1 − 4 ). Since the two blocks G 0 2 and (1 − 4 ) cannot be FIGURE 4 Block diagram of process control system dissociated from each other, they are fused into a single block labelled G 2 = G 0 2 (1 − 4 ). The system is controlled using LAB-VIEW which acquires the flow rate, and the height sensor outputs. The controller is implemented in LABVIEW and the controller output drives the actuator, namely, the DC motor and pump combination. A fault in the sensor is introduced by including the emulator blocks, i : i = 0, 1, 2 in the control input, flow rate, the height sensors. Similarly, an actuator fault is introduced by including an emulator 3 between the controller output and the input to the DC motor. The leakage fault is simulated by opening the drainage valve of the first tank. The emulator 4 models the amount by which the valve is opened.
During the offline phase, the emulators mimic various types of likely faults resulting from the leakage, and perturbations in the actuator and the sensor. Using the emulator-generated data, the influence vectors and model of the system are identified. The emulator-generated data is shown in Figure 5.
The height, flow rate and control input profiles under various types of faults are shown in Figure 5. Subfigures A, B and C on the top show height profiles, and subfigures D, E and F in the middle show the flow rate profiles, and G, H and I at the bottom show the control input profiles under leakage, actuator and sensor faults, respectively. The faults are induced by varying the appropriate emulator parameters to 0.25, 0.5 and 0.75 times the nominal values to represent 'small', 'medium' and 'large' faults, respectively. However, by its very control-design objective, the closed-loop Proportional Integral (PI) controller will hide any fault that may occur in the system and hence will make it difficult to detect. In addition, the physical system exhibits a highly non-linear behaviour. The flow rate saturates at 4.5 ml/s. The dead-band effect in the actuator exhibits itself as a delay in the output response, in that when a step reference input is applied, the height output responds after some delay, as a minimum force is required to overcome the deadband effect and drive the actuator. These non-linearities affect the steady-state value of the height, in that, even though there is an integral action in the closed-loop control system, the steadystate error is non-zero for a constant reference input. The values of height, flow-rate and control inputs are all expressed in centimetres.

Fault diagnosis
If there are any perturbations in the MIMO, MISO or SISO system, the residual of the associated KF will not be a zero-mean  white-noise process. There will be an additive fault-indicating term. As a result, the autocorrelation of the residual will not be a Dirac delta function. Further, the variance of the residual, which is the maximum value of the autocorrelation at the origin, will be larger compared to the case of no perturbation. By analysing the autocorrelation of the residual, the presence of a perturbation or the absence of a fault is asserted using the Bayes decision strategy. Figure 6 shows the KF residual and its autocorrelation for the following cases: (a) Nominal (or fault-free), (b) leakage fault, (c) actuator fault and (d) sensor fault, and the four class labels defined by the step changes in the emulator parameters. The test statistic value is the lowest and the autocorrelation is that of a zero-mean white noise for the nominal (fault-free) case. Subfigures A, B, C, and D show the residuals and their test statistics shown as straight lines; subfigures E, F, G and H show the corresponding autocorrelations; subfigures I, J, K, and L show the corresponding class labels. The Bayes decision strategy was employed to assert the fault type, that is, to decide whether it is either a leakage or an actuator or a sensor fault, respectively, using the fault isolation scheme proposed in [2]. The values of the residuals are expressed in centimetres. The faults are accurately detected and isolated as shown by the correct assertions of the class labels. The fault size is also estimated, with the estimation error being high for actuator and sensor faults but not so for leakage faults. The fault size errors are mainly due to the saturation-type non-linearity effects of the fluid system. The non-linearity effects at play here affect mostly the actuator and sensor and are thus the main contributing fac-tors to the large-size estimation errors in the faults of these two devices.

False alarm probability
The Bayes decision rule is implemented using the test statistic computed from the KF residual which is an indicator of the fault-induced model perturbation in Equation (33). Thanks to the emulator, which mimics likely operating scenarios, the data it generates includes a wide spectrum of likely model perturbations and give rise to a large number of KF residuals associated with the resulting perturbed models. Model perturbations may not necessarily imply the system is faulty unless their estimated fault sizes are unacceptable. The mean value of the residual associated with an incipient fault, which gives the deterministic component of the model perturbation, is selected as the test statistic. The problem of fault detection and diagnosis, involving probability distributions of false alarm, detection, and missed detection in the presence of measurement noise, disturbances, and modelling errors is still a challenging problem and continues to enjoy a great deal of attention from both industrial and university researchers alike. Previous methods rely on historical data, trend analysis and so forth, which all require time to carry out and gather data for.
Our approach avoids that by 'mimicking' all of the past data through the use of emulators, hence is less time-consuming than other methods. This is a major difference that sets our study apart from the rest. In our proposed approach, as the incipient fault detection and its size are estimated using a KF, the threshold values need to be estimated to develop the Bayes' strategy for determining the probabilities of false alarm, detection and missed detection. The smallest failure (fault) that can be detected depends on the effect of model uncertainty and measurement noise on the system operation. The essence of the failure detection problem is to generate a residual and choose a threshold value such that a wide failure detection zone is covered. The threshold value needs to be chosen large enough to eliminate false alarms, yet small enough to ensure that sufficiently small failures are successfully detected. Table 1 The threshold value is easily derived using the estimated size of the incipient fault as shown in Table 1.
The KF residuals are Gaussian stochastic processes whose mean are the estimated fault indicator terms. The test statistic is chosen as the fault indicator term, which is the mean value of the residual. As both the test statistic and the threshold values are known, the probability distributions of the false alarm, detection and missed detection can be determined at selected threshold values, as shown in Figure 7, for leakage, actuator and sensor faults. These probability curves may be employed by the user as a design tool when designing the Bayes' strategy for fault detection. Figure 7 shows the role of the threshold values for the detection of leakage, actuator and sensor faults. Subfigures A, B and C show the false alarm probability versus the selection of the threshold values for leakage, actuator and sensor faults, respectively, whereas subfigures D, E and F show, respectively, the miss detection probability of leakage, actuator and sensor. Similarly, subfigures G, H and K show, respectively, the KF residuals and test statistics for leakage, actuator and sensor faults.

CONCLUSION
This study proposed a novel fault diagnostic scheme for a class of MIMO non-linear system when the output is corrupted by the disturbance and measurement noise. The nonlinear model is approximated by a piecewise linear BJ model using an LPV framework. A computationally efficient identification, fault detection and fault isolation schemes were employed by splitting the multivariable system first to (a) a MISO system and then to (b) a number of SISO systems. To ensure reliability and robustness, a novel scheme based on using emulatorgenerated data was employed to identify a robust optimal system and its associated KF. Thanks to this novel emulator-based identification scheme, the multiple KFs, one for each operating point, were shown to be accurate, reliable, and robust to modelling uncertainties, including non-linearities and neglected fast dynamics, while retaining their sensitivities to incipient faults. The proposed scheme was shown to outperform the conventional linearised model-free scheme in both detection and fault isolation.