Plausible Petri nets as self-adaptive expert systems: A tool for infrastructure asset monitoring

This article provides a computational framework to model self-adaptive expert systems using the Petri net (PN) formalism. Self-adaptive expert systems are understood here as expert systems with the ability to autonomously learn from external inputs, such as monitoring data. To this end, Bayesian learning principles are investigated and combined with the Plausible PNs (PPNs) methodology. PPNs are a variant within the PN paradigm that efficiently consider the dynamics of discrete events, such as maintenance actions, jointly with multiple sources of uncertain information about a state variable. The manuscript establishes the mathematical conditions and computational procedure under which Bayesian updating becomes a particular case of a more general basic operation within the PPN execution semantics, which enables uncertain knowledge to be updated from monitoring data. The approach is general, but here it is demonstrated in a novel computational model acting as an expert system for railway track inspection management, taken as a case study using published data from a laboratory simulation of train loading on ballast. The results reveal self-adaptability and uncertainty management as key enabling aspects for optimizing inspection actions in railway track: inspections are triggered adaptively and autonomously, based on the actual learnt state of the track and other contextual issues such as resource availability, as opposed to scheduled periodic maintenance activities.


INTRODUCTION
Self-adaptability is an important intrinsic property displayed by many natural systems to deal with the challenges presented by changing environments. In engineering, the need to incorporate self-adaptation has been acknowledged as important in allowing engineered systems to modify their behavior in response to changing conditions with little or no human input, hence increasing efficiency, safety, and availability while minimizing the possibility of human errors (Krupitzer, Roth, VanSyckel, Schiele, & Becker, 2015). Intelligent systems like expert systems have the ability to emulate the human capacity to make decisions within a [...] especially when system-level operational nonlinearities (e.g., resource availability, concurrency and synchronization of components) need to be considered in the analysis (M. Chiachío, J. Chiachío, Sankararaman, & Andrews, 2017). PNs provide a graphical and mathematical language with well-established execution semantics, which can be combined with other computational techniques such as object-oriented programming, fuzzy sets, and neural networks, all of which greatly increase their suitability for modeling knowledge-based engineering problems (Li & Lara-Rosano, 2000). The basic concepts of PN theory are summarized in Murata (1989). One of the main challenges in incorporating self-adaptation in expert systems is the handling of uncertain information during runtime, and the ability to update such information when new evidence becomes available (Bencomo & Belaggoun, 2013). However, the existing PN formalisms do not provide direct means to efficiently consider uncertain information (M. Chiachío, J. Chiachío, Prescott, & Andrews, 2016; M. Chiachío, J. Chiachío, Prescott, & Andrews, 2018). In the literature, a number of PN variants have been introduced to enhance the original PN approach with improved rules of inference and knowledge learning.
Of the existing variants, fuzzy PNs (FPNs) (Looney, 1988) have received much attention due to their efficiency for reasoning in expert systems using fuzzy production rules based on imprecise and vague information (Chiang, Liu, & Lee, 2000; Flintsch & Chen, 2004; Lee, Liu, & Chiang, 1999; Zhou & Zain, 2016). In the past, some authors dealt with self-adaptation of FPNs by training an FPN model using a reference one taken as a benchmark (Li & Lara-Rosano, 2000; Zhang, Wang, & Yuan, 2009). More recently, other PN variants have been introduced to deal with some form of self-adaptivity (see, for example, Hsieh & Lin, 2014; Vidal, Lama, & Bugarín, 2012; Vidal, Lama, Díaz-Hermida, & Bugarín, 2013), although most of them are domain- or purpose-specific (Serral, De Smedt, Snoeck, & Vanthienen, 2015). However, to the best of the authors' knowledge, none of them is well suited to (a) embedding plausible information within its formulation, and (b) automatically reacting to that information by adaptive learning, while dealing with the hybrid nature of real-world dynamical systems, which consist of a combination of discrete and continuous processes whose evolution may be uncertain.
In this article, a new methodology is proposed to enable self-adaptation in expert systems using Plausible Petri nets (PPNs). PPNs are a new class of models within the PN paradigm, originally developed by the authors in M. Chiachío et al. (2016, 2018), whereby discrete events (e.g., go/no-go decisions) can be jointly modeled together with continuous processes whose evolution may be uncertain (e.g., a deterioration process). In PPNs, the uncertainty is modeled using states of information (Rus, Chiachío, & Chiachío, 2016), which provide a mapping between the possible numerical values of a state variable and their relative plausibility, hence giving greater versatility for representing uncertain knowledge in a more principled way. As a key contribution, this article reveals how self-adaptation can be achieved naturally as a by-product of the evaluation of a PPN, because an inherent learning mechanism is implemented within the PPN execution semantics. More specifically, an instance of Bayesian model updating is shown to appear as a particular case of the conjunction of states of information (Tarantola & Valette, 1982), which is a basic operation within the PPN execution rules (M. Chiachío et al., 2016, 2018). Consequently, the resulting approach has the advantages of (a) being able to deal with uncertain information in expert systems combining discrete events and continuous state variables, and (b) enabling self-adaptation by Bayesian learning from external input data.
The proposed methodology is general and, as such, can be applied to different problems dealing with self-adaptation and uncertain information in infrastructure asset monitoring. In this article, however, it is illustrated using an engineering case study of condition-based maintenance for a railway track. The interest of this engineering application lies in the need for artificial intelligence (AI) methodologies that allow automated and adaptive decisions about maintenance activities and inspection actions in railway networks based on monitoring data (Lajnef, Rhimi, Chatti, Mhamdi, & Faridazar, 2011; Wang, Liu, & Ni, 2018; Weston, Roberts, Yeo, & Stewart, 2015). To illustrate the efficiency of the proposed methodology in this application, a PPN-based computational model is developed to act as an expert system for railway track monitoring and inspection, incorporating information about a state variable along with a number of operational rules that provide the basis for triggering control operations and inspection activities. The overall system is shown to be adaptable by sequentially updating the state variable as (noisy) monitoring data become available.
The remainder of the article is organized as follows. Section 2 briefly reviews basic concepts of Bayesian model updating and PNs before introducing the PPN methodology in Section 2.3. In Section 2.3.3, an algorithmic description of PPNs is provided. The mathematical basis and computational aspects of Bayesian learning of PPNs are provided in Section 3. Section 4 illustrates and discusses our approach in application to a case study: a self-adaptive expert system for railway track inspection. Finally, Section 5 gives concluding remarks.

Bayesian model updating
Let us consider a probability model described by the state vector $x$ taking values in a space denoted by $\mathcal{X} \subset \mathbb{R}^{n_x}$, and let $p(x)$ be a prior probability density function (PDF) of $x$ over $\mathcal{X}$, a Lebesgue integrable function that can be normalized such that $\int_{\mathcal{X}} p(x)\,dx = 1$. The focus of Bayesian model updating is to update the prior information about $x \in \mathcal{X}$ based on the information given by the data $y \in \mathcal{Y} \subset \mathbb{R}^{n_y}$, where $\mathcal{Y}$ is the observation space. Following the Bayesian formulation, the solution is not a single value of $x$; instead, Bayes' theorem takes the initial quantification of the plausibility of $x$, expressed by the prior PDF $p(x)$, and updates this plausibility using the information in the data $y$ to obtain the posterior PDF of the state variable $x$, as:

$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{\int_{\mathcal{X}} p(y \mid x)\, p(x)\, dx} \qquad (1)$$

where $p(y \mid x)$ is the likelihood function, which provides a measure of how well the model specified by $x$ predicts the actual data (Beck, 2010). The interested reader is referred to Beck (2010) and Rus et al. (2016) for further information about Bayesian model updating. In this work, we adopt a subjective interpretation of probability as a multivalued logic (Cox, 1946; Jaynes, 1983), whereby a PDF over the uncertain variable (e.g., $p(x \mid y)$) represents a measure of the relative plausibility of the values of $x \in \mathcal{X}$ conditional on the available information (e.g., $y \in \mathcal{Y}$). This interpretation of probability is not widely known in engineering, where there is a widespread belief that probability only applies to aleatory uncertainty (inherent randomness in nature), and not to epistemic uncertainty (lack of knowledge).
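Equation (1) can be checked numerically on a one-dimensional grid. The following minimal Python sketch (NumPy assumed; the Gaussian prior, the datum `y`, and the noise level `sigma_y` are hypothetical values chosen for illustration) approximates the evidence integral in the denominator with a Riemann sum and compares the grid posterior mean with the closed-form conjugate Gaussian result.

```python
import numpy as np


def gaussian(x, mean, std):
    """Normal density N(mean, std) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def bayes_update_grid(x, prior, likelihood):
    """Posterior p(x | y) on a uniform grid, following Equation (1).

    The denominator (evidence) is approximated by a Riemann sum.
    """
    dx = x[1] - x[0]
    unnormalized = likelihood * prior
    return unnormalized / (unnormalized.sum() * dx)


x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
prior = gaussian(x, 0.0, 2.0)         # hypothetical prior p(x) = N(0, 2)
y, sigma_y = 1.5, 1.0                 # hypothetical datum and noise level
likelihood = gaussian(y, x, sigma_y)  # p(y | x) viewed as a function of x
posterior = bayes_update_grid(x, prior, likelihood)

grid_mean = float((x * posterior).sum() * dx)
# Closed-form conjugate Gaussian result, for comparison
var_post = 1.0 / (1.0 / 2.0**2 + 1.0 / sigma_y**2)
mean_post = var_post * (y / sigma_y**2)
print(f"grid mean = {grid_mean:.4f}, exact mean = {mean_post:.4f}")
```

Because only the product of prior and likelihood is normalized, any constant factor in either of them cancels, which is the same property exploited later by the conjunction of states of information.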

Petri nets
A PN is a mathematical and graphical modeling tool first introduced by Carl Petri in 1962 (Petri, 1962) for analyzing the dynamic behavior of sequential asynchronous automata. Since then, PNs have spread to many areas of science and engineering for the modeling of complex distributed dynamical systems. The reader is referred to Murata (1989) for a comprehensive review and tutorial on PNs; for the sake of clarity and readability, the main concepts are reproduced here under a unified notation. A PN is a bipartite directed graph (digraph) consisting of two types of nodes, places (e.g., states, represented by circles) and transitions (e.g., actions, represented by bars or boxes), connected by arcs either from places to transitions or vice versa. See Figure 1 for an illustration of a simple PN consisting of three places ($p_1$, $p_2$, $p_3$) and one transition ($t_1$). The places contain tokens that travel through the net depending on the firing of the transitions. The presence of tokens in the places of the PN is interpreted as holding the truth of the condition or information about the states associated with those places, and defines the marking of the net. A transition can fire only if all places leading to that transition have at least one token. Those places define the preset of transition $t$, denoted by ${}^{\bullet}t$. After the transition fires, one token is removed from each of its input places and one token is added to each of its output places, which define the postset of the transition, referred to as $t^{\bullet}$.

[Figure 1: Example of a Petri net composed of three places and one transition. Three tokens are represented in $p_2$.]

4. $W: F \to \mathbb{N}_{>0}$ is a weight function, which assigns a value (1 by default) to each arc within $F$.
5. $M_0: P \to \mathbb{N}$ is a vector containing the initial distribution of tokens over the set of places (initial marking).
The state of the overall net is represented by the marking $M_k \in \mathbb{N}^{n_p}$, which, at a certain state $k \in \mathbb{N}$, evolves dynamically according to the following state equation (Murata, 1989):

$$M_{k+1} = M_k + C\,u_k \qquad (2)$$

where $C$ is an $n_p \times n_t$ matrix typically referred to as the incidence matrix, with $n_p$ and $n_t$ the numbers of places and transitions, respectively. It is obtained by subtracting the backward incidence matrix ($C^-$) from the forward incidence matrix ($C^+$), i.e., $C = C^+ - C^-$, where $C^+ = [c^+_{ij}]$, $C^- = [c^-_{ij}]$, $i = 1, \dots, n_p$, $j = 1, \dots, n_t$. The element $c^+_{ij}$ is the weight of the arc from transition $t_j$ to output place $p_i$, whereas $c^-_{ij}$ is the weight of the arc to transition $t_j$ from input place $p_i$.

The term $u_k = (u_{1,k}, u_{2,k}, \dots, u_{n_t,k})^{\top}$ denotes the firing vector, a vector of binary values whose $j$th component takes 1 if transition $t_j$ is fired, and 0 otherwise. In PNs, a transition must be enabled before it can fire, which occurs when each input place $p_i$ of $t_j$ is marked with at least $c^-_{ij}$ tokens. Mathematically:

$$M_k(p_i) \geq c^-_{ij}, \quad \forall p_i \in {}^{\bullet}t_j \qquad (3)$$

where $M_k(p_i) \in \mathbb{N}$ is the marking of place $p_i$. Note that by means of PNs and their marking, the behavior of engineering systems can be described in terms of discrete system states and their changes over time. It is also worth mentioning that in practical applications of PNs, transitions are typically assigned time delays, which are useful for performance evaluation and scheduling problems of dynamical systems (Murata, 1989).
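The state equation (2) and the enabling rule (3) can be sketched in a few lines of Python with NumPy. The net topology below ($p_1$ and $p_2$ feeding $t_1$, which outputs to $p_3$) and the initial marking are hypothetical choices for illustration, not read from Figure 1; rows of the matrices index places and columns index transitions, as in Equation (2).

```python
import numpy as np

# Hypothetical net: p1, p2 -> t1 -> p3 (three places, one transition).
C_minus = np.array([[1], [1], [0]])  # c^-_{ij}: weight of arc p_i -> t_j
C_plus = np.array([[0], [0], [1]])   # c^+_{ij}: weight of arc t_j -> p_i
C = C_plus - C_minus                 # incidence matrix C = C^+ - C^-


def enabled(marking, j):
    """Enabling rule (3): every input place holds at least c^-_{ij} tokens."""
    return bool(np.all(marking >= C_minus[:, j]))


def fire(marking, u):
    """State equation (2): M_{k+1} = M_k + C u_k.

    Each fired transition is checked individually; conflicts between
    concurrently fired transitions are not checked in this sketch.
    """
    for j in np.flatnonzero(u):
        if not enabled(marking, j):
            raise ValueError(f"transition t{j + 1} is not enabled")
    return marking + C @ u


M0 = np.array([1, 3, 0])  # hypothetical initial marking
u0 = np.array([1])        # fire t1
M1 = fire(M0, u0)
print(M1)  # [0 2 1]
```

After this firing, $p_1$ is empty, so $t_1$ is no longer enabled and a second call to `fire(M1, u0)` raises an error.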

Plausible Petri nets
PPNs (M. Chiachío et al., 2016, 2018) are a hybrid variant of PNs where the sets of nodes $\{P, T\}$ are partitioned into two disjoint subsets, namely, numerical and symbolic, which are denoted using superscripts $(n)$ and $(s)$, respectively. The symbolic subnet accounts for the discrete behavior of the system using regular tokens, as in classical PNs. In the numerical subnet, tokens are states of information about a state variable $x \in \mathcal{X}$ (Rus et al., 2016; Tarantola & Valette, 1982), which account for the numerical behavior of the system. In practical terms, these states of information can be understood as PDFs about $x$ (except for a normalizing constant; Mosegaard & Tarantola, 2002), which are referred to as $\pi(x)$ and $\tau(x)$ for numerical places and transitions, respectively. Hence, the marking $M = (M^{(n)}, M^{(s)})$ consists of both types of information, given by $M^{(n)}$ for the numerical places and $M^{(s)}$ for the symbolic places, where $M^{(n)}$ and $M^{(s)}$ are column vectors of normalized PDFs and integer values, respectively. From a mathematical perspective, a PPN can be described as a 9-tuple $\mathcal{N} = \langle P, T, F, W, D, \mathcal{X}, \mathcal{G}, \mathcal{H}, M_0 \rangle$, where:
1. The set $P$ is partitioned into $P^{(n)}$, containing $n_p$ numerical places, and $P^{(s)}$, containing $n'_p$ symbolic places, such that $P^{(n)} \cup P^{(s)} = P$ and $P^{(n)} \cap P^{(s)} = \emptyset$.
3. The set of arcs $F$ indicates the connections between transitions and places from the numerical and symbolic subnets.
4. $W$ is a set of nonnegative weights applied to each arc within $F$ (1 by default). The set $W$ is partitioned into two subsets, $W^{(n)}$ and $W^{(s)}$, corresponding to the numerical and symbolic places, respectively, such that $W^{(n)} \cup W^{(s)} = W$ and $W^{(n)} \cap W^{(s)} = \emptyset$. Values from $W^{(n)}$ are real numbers.
5. $D$ is a set of switching delays for the symbolic transitions (0 by default).
6. $\mathcal{X} \subset \mathbb{R}^{n_x}$ is the state space of a stochastic variable $\{x_k\}_{k \in \mathbb{N}}$, which is representative of the numerical state of the net.
7. $\mathcal{G}$ is a set of density functions associated with the numerical places and transitions.
8. $\mathcal{H}$ is a set of equations representing the dynamics of the state variable $x \in \mathcal{X}$.
9. $M_0$ is the initial marking of the net, which is given by the pair of vectors $M^{(n)}_0$ and $M^{(s)}_0$ for numerical and symbolic places, respectively.

[Figure 2: Illustration of a sample PPN with two numerical places, [...] and one transition ($t_1$), indicating the preset and postset of $t_1$. Numerical nodes are drawn using double lines.]
In PPNs, the arc weights for the numerical subnet are denoted by $w^+_{ij}, w^-_{ij} \in W^{(n)} \subset \mathbb{R}^+$, and the incidence matrix for the numerical subnet is defined from these weights. Note that the arc weights from the symbolic subnet, e.g., $(w'^-_{11}, w'^+_{12})$, are differentiated from the numerical ones by a prime ($'$). A PPN model is shown in Figure 2 for illustration purposes.

Execution semantics
As stated in the last section, the dynamics of PPNs can be described by the joint evolution of the numerical and symbolic subnets through the marking $M$. Equation (2) is used to model the evolution of the symbolic part of the net, as in ordinary PNs. However, the evolution of $M^{(n)}$ relies on an ad hoc information flow based on two basic operations (M. Chiachío et al., 2016, 2018): the conjunction and disjunction of states of information (Rus et al., 2016; Tarantola & Valette, 1982). In these operations, the first principles of Boolean logic, in particular the logic operators AND ($\wedge$) and OR ($\vee$), are invoked to allow the information from the numerical subnet to be exchanged within a PPN. More specifically, they enable the combination and aggregation of states of information across the numerical subnet of a PPN. To convey the conceptual framework without repeating the literature, the conjunction and disjunction of states of information are briefly explained and illustrated in Figure 3. In Figure 3, the term $\mu(x)$ is the homogeneous density function (Mosegaard & Tarantola, 2002; Tarantola & Valette, 1982) that represents the state of complete ignorance about $x \in \mathcal{X}$, hence providing a reference probability model for $x$ in the absence of any other information.
1. An input arc from place $p^{(n)}_i$ to transition $t_j$ conveys a state of information given by $w^-_{ij}(\pi_i \wedge \tau_j)(x)$, which remains in $p^{(n)}_i$ after transition $t_j$ has fired.
2. An output arc from $t_j$ to $p^{(n)}_i$ conveys a state of information given by $w^+_{ij}(\pi_{{}^{\bullet}t_j} \wedge \tau_j)(x)$, where $\pi_{{}^{\bullet}t_j}(x)$ is the resulting density from the disjunction of the states of information of the preset of $t_j$.

3. After firing numerical transition $t_j$, the state of information resulting in the output place $p^{(n)}_i$ is the disjunction of the state of information $\pi_i(x)$ (the previous state of information) and $w^+_{ij}(\pi_{{}^{\bullet}t_j} \wedge \tau_j)(x)$ (the information produced after firing transition $t_j$).
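Rules 1 to 3 can be illustrated on a discretized one-dimensional state space. The Python sketch below assumes the standard definitions of Tarantola and Valette (1982), namely a conjunction proportional to $\pi(x)\tau(x)/\mu(x)$ and a disjunction proportional to the (weighted) sum of the densities. The grid, the densities, and the arc weights are hypothetical, and a single-place preset is used so that $\pi_{{}^{\bullet}t_j}$ reduces to the marking of that place.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 2001)
dx = x[1] - x[0]
mu = np.ones_like(x)  # homogeneous density: constant on a linear space


def normalize(f):
    """Rescale a nonnegative grid function so that it integrates to 1."""
    return f / (f.sum() * dx)


def conjunction(f, g):
    """(f AND g)(x) = c f(x) g(x) / mu(x); None if the result is empty."""
    prod = f * g / mu
    return normalize(prod) if prod.sum() > 0.0 else None


def disjunction(f, g):
    """(f OR g)(x), proportional to f(x) + g(x); arc weights scale f and g."""
    return normalize(f + g)


def gauss(m, s):
    return normalize(np.exp(-0.5 * ((x - m) / s) ** 2))


pi_in = gauss(5.0, 1.0)   # hypothetical marking of the single preset place
pi_out = gauss(2.0, 0.5)  # hypothetical previous marking of the output place
tau = normalize(((x >= 4.0) & (x <= 6.0)).astype(float))  # hypothetical tau_j
w_minus, w_plus = 1.0, 1.0  # hypothetical arc weights

# Rule 1: information remaining in the input place after firing
pi_in_next = normalize(w_minus * conjunction(pi_in, tau))
# Rule 2: information conveyed by the output arc
produced = w_plus * conjunction(pi_in, tau)
# Rule 3: disjunction of previous and produced information in the output place
pi_out_next = disjunction(pi_out, produced)
```

In this sketch, with unit weights the output place ends up holding half of its mass from its previous marking and half from the information produced by the transition, the latter confined to the support of $\tau_j$.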
Example 1. Figure 4 provides an example of the execution rules presented above, using a PPN model of one transition, three numerical places, and two symbolic places, as depicted in Figure 4a. In Figure 4b, a conceptual scheme is provided to exemplify rules 1-3, showing how to obtain the marking at times $k$ and $k+1$. The numerical markings are depicted in rectangular gray boxes.
The red dashed line indicates the separation between the markings at $k$ and $k+1$. Note that at $k+1$, the places $p^{(n)}_1$ and $p^{(n)}_2$ are updated with the information coming from transition $t_1$ through a conjunction of states of information weighted according to $w^-_{11} = 1$ and $w^-_{21} = 2$, respectively. Also, observe that the state of information resulting in place $p^{(n)}_3$ after firing transition $t_1$ combines the information that existed in $p^{(n)}_3$ at $k$ with that produced by transition $t_1$ through the intersection with its preset, i.e., $\pi^3_{k+1}(x) = \left(\pi^3_k \vee w^+_{31}(\pi_{{}^{\bullet}t_1} \wedge \tau_1)\right)(x)$, up to a normalizing constant.

The rules given above are enough to describe the dynamics of the numerical subnet of PPNs. However, they can be synthesized through an algebraic expression describing a dynamical state equation, as follows (M. Chiachío et al., 2018):

where $u_k = (u_{1,k}, u_{2,k}, \dots, u_{n_t,k})$ is the firing vector for the numerical subnet (numerical and mixed transitions) at state $k$; $W^-$ is the backward incidence matrix; and $\mathbf{w}^+_j$ is a column vector corresponding to the $j$th column of the forward incidence matrix $W^+$. The term $\Lambda_k$ is an $(n_p \times n_t)$ matrix whose $(i,j)$th element is given by the conjunction of states of information between $\pi_i$ and $\tau_j$ (expressed by $\pi_i \wedge \tau_j$). Next, $\boldsymbol{\beta}_{j,k}$ is an $n_t$-dimensional row vector defined by means of $\boldsymbol{\delta}_j$, a vector whose elements are the Kronecker delta $\delta_{jl}$, which makes all elements zero except for $l = j$, $l = 1, \dots, n_t$. The term $\mathbf{1}$ is a column vector of binary constants. Finally, $\mathbf{c}_k$ is a vector of normalizing constants required for $M^{(n)}_{k+1}$ to be a vector of bona fide densities. In Equation (4), the symbols $\langle \otimes, \circ, \cdot \rangle$ denote the matrix outer product, Hadamard product, and inner product (Ando, 1995; Beezer, 2007), respectively. Also, the $(i,j)$th element of $(W^- \circ \Lambda_k)$ equals $w^-_{ij}(\pi_i \wedge \tau_j)(x)$ and corresponds to the state of information that remains in place $p^{(n)}_i$ at $k+1$ after firing $t_j$, given that $p^{(n)}_i \in {}^{\bullet}t_j$ is isolated from output arcs.
Observe also that the summation of outer products $\sum_{j=1}^{n_t} \mathbf{w}^+_j \otimes \boldsymbol{\beta}_{j,k}$ in Equation (4) renders an $(n_p \times n_t)$ matrix whose $(i,j)$th element represents a weighted density function corresponding to the state of information added to postset place $p^{(n)}_i$ after transition $t_j$ has been fired. The interested reader should refer to M. Chiachío et al. (2018) for further details.

Rule of transition firing
In PPNs, any transition $t_j \in T$ is fired at time $k$ if its delay time has passed and:
1. every symbolic place from the preset of $t_j$ has enough tokens according to its input arc weight, as in classical PNs; and
2. each conjunction of states of information between $\pi_i$ and $\tau_j$ is possible, where $p^{(n)}_i$ belongs to the preset of $t_j$.
Note from condition 2 that a conjunction, e.g., $(\pi_i \wedge \tau_j)(x)$, is possible provided it is not empty, i.e., $(\pi_i \wedge \tau_j)(x) \neq \emptyset$ (Tarantola & Valette, 1982). Note also that when any of the states of information involved in a conjunction is the homogeneous density (also referred to as the "noninformative density") $\mu(x)$ of the state space under consideration $\mathcal{X}$, the conjunction is always possible (Tarantola & Valette, 1982); thus, condition 2 is automatically fulfilled. This argument is important for using PPNs in practical applications, as will be demonstrated in the next section.
Further details about this aspect can be found in M. Chiachío et al. (2018).
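The firing rule can be sketched as a single predicate. In the Python sketch below, the function `can_fire` and all its argument names are hypothetical, the densities are sampled on a common grid, and "the delay time has passed" is modeled as an elapsed-time counter compared against the switching delay.

```python
import numpy as np


def can_fire(elapsed, delay, tokens, weights, num_preset, tau, mu):
    """Hypothetical check of the PPN firing rule for one transition t_j.

    elapsed    : time elapsed since t_j was enabled
    delay      : switching delay of t_j
    tokens     : token counts of the symbolic preset places
    weights    : input-arc weights of those symbolic places
    num_preset : densities pi_i(x) of the numerical preset places (same grid)
    tau        : density tau_j(x) of the transition on that grid
    mu         : homogeneous density mu(x) on that grid
    """
    if elapsed < delay:
        return False
    # Condition 1: enough tokens in every symbolic preset place
    if np.any(np.asarray(tokens) < np.asarray(weights)):
        return False
    # Condition 2: every conjunction (pi_i AND tau_j) must be non-empty
    return all(np.any(pi * tau / mu > 0.0) for pi in num_preset)


x = np.linspace(0.0, 10.0, 1001)
mu = np.ones_like(x)
pi_1 = ((x >= 0.0) & (x <= 3.0)).astype(float)     # hypothetical marking
tau_far = ((x >= 5.0) & (x <= 8.0)).astype(float)  # disjoint support

print(can_fire(2, 1, [1], [1], [pi_1], tau_far, mu))  # False: empty conjunction
print(can_fire(2, 1, [1], [1], [pi_1], mu, mu))       # True: tau is homogeneous
```

The second call reproduces the remark above: when the transition carries the homogeneous density, condition 2 holds whatever the marking of the preset.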

PPN algorithm
The recursive scheme for PPNs explained above is, in general, difficult to evaluate analytically because the normalizing constants involved in the conjunction of states of information in Equation (4) are difficult to evaluate in practical cases. Furthermore, there are situations where the density functions are not completely known. To alleviate this drawback and give the PPN methodology the required versatility, particle methods (Arulampalam, Maskell, Gordon, & Clapp, 2002; Doucet, De Freitas, & Gordon, 2001) can be used to circumvent the evaluation of the normalizing constants at a feasible computational cost. In this section, a pseudocode implementation of PPNs is provided, which combines the PPN algebra with the particle approximation of the conjunction and disjunction of states of information. The pseudocode comprises three main blocks, namely, transition firing, information exchange, and marking evolution, which have been highlighted for clarity. For better readability, the pseudocode is provided as Algorithm 1 in Appendix A. In addition, some mathematical insights into the particle approximation of the conjunction and disjunction of states of information are provided in Appendix B. Observe from Algorithm 1 that the normalizing constants from Equation (4) have been omitted because the particle approximation bypasses them through resampling.
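Algorithm 1 itself is given in Appendix A and is not reproduced here. As a generic illustration of why resampling removes the need for normalizing constants, the Python sketch below approximates a conjunction by reweighting samples of $\pi(x)$ with $\tau(x)$ (valid when $\mu(x)$ is constant) and resampling, and a disjunction by drawing from a weighted mixture of two particle sets. It is a sketch under these assumptions, not the authors' Algorithm 1; the function names, densities, and sample sizes are hypothetical.

```python
import numpy as np


def conjunction_particles(particles, tau, rng):
    """Approximate (pi AND tau) from samples of pi, assuming a constant mu.

    Particles are reweighted by tau(x_i) and resampled, so only the ratio
    of weights matters and no normalizing constant is evaluated.
    """
    w = tau(particles)
    if w.sum() == 0.0:
        return None  # empty conjunction
    idx = rng.choice(len(particles), size=len(particles), p=w / w.sum())
    return particles[idx]


def disjunction_particles(parts_a, parts_b, weight_a, weight_b, n, rng):
    """Draw n samples from the mixture proportional to
    weight_a * pi_a + weight_b * pi_b (1D particle arrays)."""
    from_a = rng.random(n) < weight_a / (weight_a + weight_b)
    draw_a = parts_a[rng.integers(0, len(parts_a), size=n)]
    draw_b = parts_b[rng.integers(0, len(parts_b), size=n)]
    return np.where(from_a, draw_a, draw_b)


def tau(x):
    """Hypothetical unnormalized likelihood centered at 1.5 with unit spread."""
    return np.exp(-0.5 * (1.5 - x) ** 2)


rng = np.random.default_rng(0)
prior = rng.normal(0.0, 2.0, size=5000)  # samples of a hypothetical pi(x) = N(0, 2)
post = conjunction_particles(prior, tau, rng)
mixed = disjunction_particles(prior, post, 1.0, 1.0, 5000, rng)
print(post.mean())
```

For these values the exact conjunction is the Gaussian $\mathcal{N}(1.2, \sqrt{0.8})$, so `post.mean()` should lie near 1.2 up to Monte Carlo error.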
Example 2. The suitability of the PPN algorithm to reproduce the execution semantics rules given in Section 2.3.1 is illustrated here using a numerical example. To this end, let us consider again the PPN given in Figure 4a; however, the system states are now considered two-dimensional, i.e., $x \in \mathcal{X} \subset \mathbb{R}^2$. The initial marking $M_0 = (\pi^1_0, \pi^2_0, \pi^3_0)$ is described using 2D Gaussians as follows: [...] $\pi^2_0(x) = \mathcal{N}([30, 30], \Sigma_2)$, with $\Sigma_2 = \mathrm{diag}(4^2, 2^2)$; and finally $\pi^3_0(x) = \emptyset$. In this example, transition $t_1$ is defined using a Dirac delta density function, i.e., $\tau_1 \sim \delta_{\mathcal{A}}(x)$; thus, its firing is prescribed for the state variable on fulfilling the proposition $x \in \mathcal{A}$, where $\mathcal{A} \subseteq \mathcal{X}$ is a region of the $x$-space defined as $\mathcal{A} = \{x \in \mathcal{X} : (x_1 - 20)^2 + (x_2 - 20)^2 \leq 10\}$. Algorithm 1 has been applied using $N = 1{,}000$ particles to evaluate the system state evolution, described through the marking $M_k$, $k > 0$. The results for the numerical places $p^{(n)}_1$ (left) to $p^{(n)}_3$ (right) are depicted in Figure 5 for $k = 0$ (upper panels) to $k = 1$ (lower panels). Each subplot represents a state of information about $x$ using samples (gray circles) in the state space $\mathcal{X}$. The blue circle represents the region $\mathcal{A}$. Observe that initially, at $k = 0$, transition $t_1$ is enabled because $p^{(s)}_1$ is assigned one token and $(\tau_1 \wedge \pi^i_0)(x) \neq \emptyset$ for $i = 1, 2$. [...] after firing $t_1$, precisely because the arc weight from $p^{(n)}_2$ to $t_1$ ($w^-_{21}$) equals two, while $w^-_{11} = 1$. Note also that the PDFs $\pi^1_1$ and $\pi^2_1$ are concentrated within the region $\mathcal{A}$ (depicted as a blue circle in Figure 5) because of the Dirac delta PDF in $t_1$, which acts as a filter canceling any information outside the region $\mathcal{A}$ once $t_1$ is fired. Finally, this example numerically shows how the numerical and symbolic tokens interact in a synchronized manner in PPNs.
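The filtering effect of the Dirac delta transition in Example 2 can be mimicked with particles: the conjunction keeps only the samples lying in $\mathcal{A}$ and resamples them back to $N$. In the Python sketch below, the region $\mathcal{A}$ and $N = 1{,}000$ follow the text and the covariance $\mathrm{diag}(4^2, 2^2)$ is taken from $\Sigma_2$, while the mean $(20, 20)$ is a hypothetical choice made so that many samples fall inside $\mathcal{A}$; the helper name `in_region_A` is also hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000  # number of particles, as in Example 2


def in_region_A(x):
    """Indicator of A = {x : (x1 - 20)^2 + (x2 - 20)^2 <= 10}."""
    return (x[:, 0] - 20.0) ** 2 + (x[:, 1] - 20.0) ** 2 <= 10.0


# Hypothetical marking: mean (20, 20) assumed, covariance diag(4^2, 2^2)
pi_0 = rng.multivariate_normal([20.0, 20.0], np.diag([4.0**2, 2.0**2]), size=N)

# Conjunction with the Dirac delta density of t1: keep particles inside A ...
inside = pi_0[in_region_A(pi_0)]
enabled = len(inside) > 0  # non-empty conjunction, so condition 2 holds
# ... and resample them (uniform weights) back to N particles
pi_1 = inside[rng.integers(0, len(inside), size=N)] if enabled else None
```

Every particle of `pi_1` lies in $\mathcal{A}$, which is the particle counterpart of the concentration of $\pi^1_1$ and $\pi^2_1$ within the blue circle of Figure 5.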

SELF-ADAPTATION MODELING BY BAYESIAN LEARNING OF PPNs
Lemma 1. In PPNs, any input arc from place $p^{(n)}_i$ to transition $t_j \in T^{(n)}$ conveys a state of information given by the posterior density function of state variable $x$, by adopting the following assumptions:
1. at time $k$, there are available data, denoted by $y_k \in \mathcal{Y}$;
2. $\tau_j(x)$ acts as likelihood function for $y_k$, hence $t_j$ is named a data-transition; and
3. $\mathcal{X}$ is a linear space.
Proof. Let us consider that the state of information $\pi_i(x)$ represents a prior PDF $p(x)$ for $x$ within the state space $\mathcal{X}$. Let us now rewrite the likelihood function $\tau_j(x)$ as $p(y_k \mid x)$. From Equation (4), the $(i,j)$th element of matrix $\Lambda_k$ is given by $(\pi_i \wedge \tau_j)(x)$, which, by definition of the conjunction of states of information, is given by (Tarantola & Valette, 1982):

$$(\pi_i \wedge \tau_j)(x) = c\,\frac{\pi_i(x)\,\tau_j(x)}{\mu(x)} = c\,\frac{p(x)\,p(y_k \mid x)}{\mu(x)}$$

where $c$ is a normalizing constant, and $\mu(x)$ is the homogeneous density function, which is also a constant because $\mathcal{X}$ is a linear space (Tarantola, 2005). By Bayes' theorem (recall Equation (1)), the resulting PDF can be interpreted as the posterior PDF of the state vector $x$ given data $y_k \in \mathcal{Y}$, except for a normalizing constant. □

Corollary 1. Under the assumptions given above, and by considering that:
1. $p^{(n)}_i \in {}^{\bullet}t_j$ is isolated from output arcs, and
2. firing of transitions $t \in p^{(n)\bullet}_i$ is nonconcomitant, with $p^{(n)\bullet}_i$ being the set of transitions whose input arcs come from place $p^{(n)}_i$;

then the marking $\pi^i_{k+1}(x)$ can be obtained by recurrence as a posterior Bayesian estimate from $\pi^i_k(x)$, except for a normalizing constant, as follows:

$$\pi^i_{k+1}(x) \propto p(y_k \mid x)\,\pi^i_k(x)$$

where $\tau_j(x) = p(y_k \mid x)$. In such a case, $p^{(n)}_i$ is denoted as a learning-place.
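Corollary 1 states that repeated firing of a data-transition turns the learning place into a recursive Bayesian estimator. The Python sketch below checks this on a grid: under the assumptions of Lemma 1 ($\mu$ constant), updating one datum at a time gives the same density as a single batch update with all data. The Gaussian prior, the measurements, and the noise level are hypothetical.

```python
import numpy as np

x = np.linspace(-5.0, 15.0, 4001)
dx = x[1] - x[0]


def normalize(f):
    """Rescale a nonnegative grid function so that it integrates to 1."""
    return f / (f.sum() * dx)


def likelihood(y, sigma):
    """Hypothetical data-transition density tau(x) = p(y | x), Gaussian noise."""
    return np.exp(-0.5 * ((y - x) / sigma) ** 2)


prior = normalize(np.exp(-0.5 * ((x - 2.0) / 3.0) ** 2))  # hypothetical pi_0
data = [4.1, 5.3, 4.8]  # hypothetical measurements y_1, y_2, y_3
sigma = 1.0

# Learning place: pi_{k+1}(x) proportional to p(y_k | x) pi_k(x), one datum per firing
pi_k = prior
for y in data:
    pi_k = normalize(likelihood(y, sigma) * pi_k)

# Batch posterior using all data at once, for comparison
batch = normalize(prior * np.prod([likelihood(y, sigma) for y in data], axis=0))
```

The agreement relies on the data being conditionally independent given $x$, which is implicit in the recurrence of Corollary 1.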
Remark 1. Let us denote by $t^{(n)}_{\hat{j}}$ a specific data-transition within a PPN, and consider a time instant $k$ when data are available. A self-adaptive expert system can be modeled via PPNs by setting a number of learning places. Any learning place $p^{(n)}_i$ can be set and updated within a PPN by adopting the following procedure:
1. Set $w^+_{i\hat{j}} = 0$, $\forall i = 1, \dots, n_p$.

Adopt
Remark 2. The methodology presented above reveals that PPNs are useful for modeling adaptive expert systems because they can deal with uncertain information, such as that coming from noisy condition monitoring data and expert opinion (M. Chiachío et al., 2018), and automatically update it for decision making. The resulting computational framework allows building computational models acting as self-adaptive expert systems, as shown in the next section within the context of a case study.

CASE STUDY: APPLICATION OF METHODOLOGY TO RAILWAY TRACK INSPECTION
In this section, the computational framework presented above is demonstrated in a case study for railway track inspection management. One of the main aims of this exercise is to provide evidence of the self-adaptation of PPNs using monitoring data. To this end, a PPN-based expert system is provided to generate autonomous and adaptive management decisions based on monitoring data about the state of geometry deterioration of a railway track, as depicted in Figure 7. In this case study, the deterioration of the track is assumed to occur due to traffic loading expressed in cycles, which represent an integer number of train axles that have passed through a particular track section during an operation period. It is also assumed that track measurement trains provide measurements of the track state (in particular, the track settlement) at a set of regularly or nonregularly scheduled cycles. Observe from Figure 7 that every time the track measurement train provides new data about track settlement, the PPN-expert system uses these data to generate information supporting decisions related to the scheduling of inspections by track engineers, as depicted by the dashed blue rectangle. The track engineers' inspections do not provide new data points; they are undertaken to confirm the requirement for a particular type of maintenance, such as tamping or stoneblowing (Esveld, 1989; Selig & Waters, 1994), or for speed restrictions or line closures to be imposed. The PPN-based expert system uses the available measurement data to ensure that an alarm is raised about the track condition if necessary, or that track engineers' inspections are scheduled according to the latest knowledge of the track condition.
This condition-based inspection (CBI) therefore acts to ensure that any disruption to operations caused by these inspections is minimized and that the required resources are used efficiently.

[Figure 7: Schematic diagram of a self-adaptive expert system using PPNs applied to a railway track maintenance problem.]

Problem description
Railway track geometry deterioration due to traffic loading is a critical railway operation and maintenance problem with important implications for safety and cost. When measured track irregularities exceed allowable limits, either traffic speed restrictions have to be prescribed, or corrective maintenance interventions like tamping have to be performed to restore the track to an acceptable geometry. The modeling of track geometry degradation is a core element of the railway maintenance problem (Andrews, 2012; Andrews, Prescott, & De Rozières, 2014). In general terms, the temporal evolution of track degradation can be assumed to be given by a state-space model, as

$$x_k = h(x_{k-1}) + v_k, \qquad y_k = x_k + e_k \qquad (10)$$

where $x_k$ denotes the latent state of degradation at time or cycle $k \in \mathbb{N}$, $y_k$ is a measurement of the degradation at time $k$, and $v_k$ and $e_k$ are random variables that represent the process noise and the measurement error, respectively. Supported by the principle of maximum information entropy (Jaynes, 1983), $v_k$ and $e_k$ can be conservatively modeled as zero-mean Gaussian distributions; thus, the state-space model in Equation (10) can be rewritten probabilistically as:

$$p(x_k \mid x_{k-1}) = \mathcal{N}\left(h(x_{k-1}), \sigma_v\right) \qquad (11a)$$
$$p(y_k \mid x_k) = \mathcal{N}\left(x_k, \sigma_e\right) \qquad (11b)$$

where $\sigma_v$ and $\sigma_e$ are the standard deviations of the model error and the measurement error, respectively. For this case study, a cycle-to-cycle incremental model for railway track degradation is adopted; thus, the state transition function $h$ in Equation (11a) can be expressed as $h(x_{k-1}) = x_{k-1} + \Delta x_k$, where $\Delta x_k$ is the increment of track degradation in load cycle $k$. Here, the (latent) state of track degradation at loading cycle $k$ is assumed to be given by the permanent settlement of the track in a particular track section. To calculate the cyclic increment of track settlement $\Delta x_k$, the semiempirical cyclic densification model from Indraratna, Thakur, Vinod, and Salim (2012) is adopted. This model is based on the theory of plasticity of soils (Yu, 2007) and the postulates of critical state soil mechanics (Schofield & Wroth, 1968); for the sake of simplicity, it is not reproduced here.
The interested reader is referred to J. Chiachío, M. Chiachío, Prescott, and Andrews (2017) and Indraratna et al. (2012) for further modeling details, and to Dahlberg (2001), Soleimanmeigouni and Ahmadi (2016), and Higgins and Liu (2018) for a comprehensive overview of track degradation models.
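Equations (10) and (11) can be sampled directly once an increment model is chosen. Because the densification model of Indraratna et al. (2012) is not reproduced in the text, the Python sketch below uses a hypothetical placeholder increment $\Delta x_k = a/k$, which only mimics the decelerating growth of settlement; the parameter $a$, the noise levels, and the function names are also hypothetical.

```python
import numpy as np


def delta_settlement(k, a=1e-3):
    """Hypothetical increment Delta x_k = a / k (NOT the model of
    Indraratna et al., 2012); it only mimics decelerating settlement."""
    return a / k


def simulate(n_cycles, sigma_v, sigma_e, rng, x0=0.0):
    """Sample Equation (10): x_k = x_{k-1} + Delta x_k + v_k, y_k = x_k + e_k,
    with v_k ~ N(0, sigma_v) and e_k ~ N(0, sigma_e)."""
    x = np.empty(n_cycles)
    y = np.empty(n_cycles)
    x_k = x0
    for k in range(1, n_cycles + 1):
        x_k = x_k + delta_settlement(k) + rng.normal(0.0, sigma_v)
        x[k - 1] = x_k
        y[k - 1] = x_k + rng.normal(0.0, sigma_e)
    return x, y


rng = np.random.default_rng(2)
x_true, y_obs = simulate(500, sigma_v=1e-5, sigma_e=2e-4, rng=rng)
```

Setting both standard deviations to zero recovers the deterministic trajectory $x_k = \sum_{i \leq k} \Delta x_i$, a convenient sanity check before plugging in a calibrated increment model.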
The data in this case study consist of a set of nonregularly scheduled (noisy) measurements $y = (y_1, y_2, \dots, y_T)$ of track settlement taken from Aursudkij et al. (2009), which are sequentially introduced into the system at a set of discrete loading cycles. This data set corresponds to a laboratory simulation of traffic loading for a 20-tonne axle-load train over a ballasted track section composed of a 0.9 m deep subgrade layer and a 0.3 m deep ballast layer, carried out in the Railway Test Facility (RTF) of the University of Nottingham (UK). The data set is reproduced in Table 1. The reader is referred to Aursudkij et al. (2009) for further information about the experimental setup in which the data were collected, and to Brown, Brodrick, Thom, and McDowell (2007) for a detailed description of the Nottingham RTF.

[Table 1: Experimental sequence of permanent unitary settlement (strain) data used for calculations, taken from Aursudkij et al. (2009).]

Plausible Petri net model
The PPN-based expert system for railway track maintenance considered in this case study is depicted in Figure 8. The system represents a number of rules that autonomously raise an alarm (e.g., "line closure") or trigger inspection activities for a particular railway track section subjected to traffic loading degradation. As shown in Figure 8, the PPN is composed of two numerical places ($p^{(n)}_1$, $p^{(n)}_2$), nine symbolic places ($p^{(s)}_1$ to $p^{(s)}_9$), four mixed transitions ($t_1$ to $t_4$), and five symbolic transitions ($t_5$ to $t_9$). The stochastic model for track degradation represented in Equation (11a) is embedded within the numerical place $p^{(n)}_1$. Thus, the state of information in $p^{(n)}_1$ is given by:

$$\pi^1_k(x) = p(x_k \mid x_{k-1}) \qquad (12)$$

Numerical place $p^{(n)}_2$ represents a buffer of information (initially empty), which collects the posterior values of track settlement leading to the "line closure" state, as will be explained further below. Discrete-event states such as "inspection needed" and "activated inspection", as specified by the colored text labels in Figure 8, are modeled by the symbolic places $p^{(s)}_1$ to $p^{(s)}_9$. Places $p^{(s)}_5$, $p^{(s)}_6$, and $p^{(s)}_9$ have red labels and represent irreversible discrete states, i.e., nonnumerical discrete states of the track degradation that permanently remain in the same condition once their corresponding places (e.g., $p^{(s)}_9$) have been marked. The gray text labels provide explanatory information about places. Observe also from Figure 8 that a number of inhibitor arcs (Murata, 1989; Schneeweiss, 2001), drawn as arcs ending with a small circle, are used to produce the opposite effect of the rule described in Equation (3), i.e., they prevent a transition from being enabled once its preset places are marked.
The system is fed during runtime with measurements coming from monitoring data. A cold transition is used to represent data arrival, which is assumed to occur at a set of nonregularly scheduled time instants, as shown in Table 1. For this case study, the measurements $y_1, y_2, \dots, y_T$ are assumed to come with a 5% white-noise-type error; they are therefore considered as random realizations of a Gaussian PDF centered at the latent state $x$ with known standard deviation $\sigma_j = 0.05\|y_j\|$, i.e., $y_j \sim \mathcal{N}(x, \sigma_j)$ (recall Equation (11b)). This PDF represents the state of information $g_1$ within transition $t_1$, given by:

$$g_1(x) = \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{(y_j - x)^2}{2\sigma_j^2}\right) \qquad (13)$$

TABLE 2 Description of the transitions shown in Figure 8. In the third column (rules), the delays are expressed in cycles. The last column provides a description of the action taken by the PPN-expert system when the rules are met.

Each time a new measurement arrives, transition $t_1$ is enabled, which, by the PPN execution rules explained in Section 2.3.1, leads to the conjunction of the states of information of $p^{(N)}_1$ and $t_1$ (Equations 12 and 13, respectively). By Lemma 1, this conjunction yields the posterior PDF of the degradation variable in place $p^{(N)}_1$, up to a normalizing constant. It follows that self-adaptation in this example is enabled by the subnet $\{p^{(S)}_1, p^{(N)}_1, t_1\}$. Evaluating the proposed PPN-based system yields the changes in the numerical and discrete track states, together with the automated actions activated by firing transitions $t_1$ to $t_9$. An overview of the complete set of transitions is provided in Table 2. Observe from this table that the mixed transitions $t_2$, $t_3$, and $t_4$ are condition based: each $t_i$ is activated when the state variable fulfills the condition $x \in \mathcal{X}_i$, where the subspaces $\mathcal{X}_i$, $i = 2, 3, 4$, are specified in the third column of Table 2.
These transitions are driven by states of information expressed as Dirac delta density functions (Chiachío et al., 2018), i.e., the state of information $g_i$ of each transition $t_i$, $i = 2, 3, 4$, is a Dirac delta. In Table 2, the function $H$ denotes the differential entropy (DE) of the degradation variable $x$, used as a measure of the degree of belief about the values taken by $x$, and given by $H = \frac{1}{2}\ln\left[2\pi e\,\mathrm{var}(x)\right]$ (the DE of a Gaussian PDF with the same variance). Also, $\mathbb{E}_{f_1}[x]$ denotes the expectation of $x$ with respect to $f_1$, the state of information in $p^{(N)}_1$. From a computational point of view, the conditions in the mixed transitions $t_2$ to $t_4$ specify numerical rules used by the expert system to raise an alarm or trigger an inspection action.
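The update performed by the subnet $\{p^{(S)}_1, p^{(N)}_1, t_1\}$ and the subsequent checks of the numerical rules can be illustrated with particles. The following is a minimal sketch, not the authors' Algorithm 1: it assumes that $f_1$ is represented by equally weighted particles, that $g_1$ is the Gaussian likelihood with $\sigma_j = 0.05\|y_j\|$, and that the rules of $t_2$ and $t_3$ reduce to $H \geq -4.8$ and $\mathbb{E}_{f_1}[x] \geq 0.014$, the thresholds quoted in this section (the exact rules are those of Table 2). The function names and the prior numbers are hypothetical.

```python
import numpy as np

DE_THRESHOLD = -4.8   # DE level that triggers an inspection via t_2 (value quoted in the text)
CLOSURE_MEAN = 0.014  # mean settlement that triggers line closure via t_3 (value quoted in the text)


def gaussian_pdf(x, mean, sd):
    """Gaussian density, used here as the likelihood g_1 of one measurement."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)


def conjunction_update(particles, y, rng, rel_error=0.05):
    """Posterior particles of x after the conjunction of f_1 with g_1.

    The particles are assumed to be equally weighted samples of f_1, so the
    normalized product f_1 * g_1 is represented by weighting with g_1 alone
    (importance sampling) and then resampling.
    """
    sd = rel_error * abs(y)                  # assumption: sigma_j = 0.05 |y_j|
    w = gaussian_pdf(y, particles, sd)       # likelihood of y for each particle
    w = w / w.sum()                          # normalizing constant of the posterior
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]                    # equally weighted posterior sample


def differential_entropy(particles):
    """H = 0.5 ln(2 pi e var(x)), with var(x) estimated from the particles."""
    return 0.5 * np.log(2.0 * np.pi * np.e * np.var(particles))


def mixed_rules(particles):
    """Hypothetical encoding of the t_2 (inspection) and t_3 (closure) conditions."""
    return {
        "t2_inspection": bool(differential_entropy(particles) >= DE_THRESHOLD),
        "t3_line_closure": bool(np.mean(particles) >= CLOSURE_MEAN),
    }


rng = np.random.default_rng(0)
prior = rng.normal(0.010, 0.003, size=5000)  # hypothetical predicted particles of x
posterior = conjunction_update(prior, y=0.012, rng=rng)
print(mixed_rules(prior), mixed_rules(posterior))
```

Weighting by the likelihood alone is valid here only because the particles already come from $f_1$; multinomial resampling is one common way to return equally weighted posterior particles. In this toy setting the wide prior violates the DE threshold, whereas the posterior after one measurement no longer does.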

ID Type Rule State of information Action
The dynamics of the PPN-based expert system are briefly described next. Initially, at $n = 0$, the system starts in the joint discrete state "data arrived" and "available engineers," represented by one token in $p^{(S)}_1$ and $p^{(S)}_7$, respectively; thus, $\mathbf{m}^{(S)}_0 = (1, 0, 0, 0, 0, 0, 1, 0, 0)$. In practice, this symbolizes an initial stage of the track where a first measurement is taken, almost no track degradation is observed, and engineers are available in case inspections are required. The initial marking of the numerical place $p^{(N)}_1$ is a prior state of information with parameters $(0, 1)$, whereas $p^{(N)}_2$, which represents a buffer of information, is initially empty. Subsequently, the numerical state variable in $p^{(N)}_1$ evolves over time following Equation (10a) and is updated online by the subnet $\{p^{(S)}_1, p^{(N)}_1, t_1\}$ each time a new data point arrives. In parallel, a sequence of periodic inspection (PI) activities takes place through the subnet $\{p^{(S)}_2, p^{(S)}_3, p^{(S)}_7, p^{(S)}_8, t_5, t_7, t_8, t_9\}$, which is enabled because $p^{(S)}_7$ initially holds one token, i.e., $\mathbf{m}^{(S)}_0(7) = 1$. The system keeps running in PI mode until place $p^{(S)}_6$ receives a token, whereupon it switches to condition-based inspection (CBI) mode. The CBI mode allows the expert system to trigger an inspection only when the condition $x \in \mathcal{X}_2$ is met, i.e., when the uncertainty in the predicted track settlement exceeds a given threshold.
Note also from Figure 8 that concurrency takes place through the subnet $\{p^{(N)}_1, t_2, t_3\}$: once either of the numerical rules represented by $t_2$ and $t_3$ is met, the corresponding transition is enabled (not necessarily both, nor simultaneously), whereupon two possible sequences of actions can be activated, namely: (a) a set of inspection activities represented by transitions $t_5$, $t_8$, and $t_9$ (when $t_2$ is fired), and (b) the closure of the line by firing $t_3$ (when the mean predicted track settlement exceeds 0.014 m), which subsequently disables $t_1$ and stops the self-adaptive expert system. In the latter case, the buffer in place $p^{(N)}_2$ collects and stores the state of information in $p^{(N)}_1$ at the time $t_3$ is fired. Finally, note that $p^{(S)}_4$ collects a number of tokens equal to the number of collected measurements and allows $t_6$ to be fired, provided that at least two measurements are available, whereupon $p^{(S)}_6$ is marked and the system automatically switches from PI to CBI mode by disabling transition $t_7$. The two tokens required in $p^{(S)}_4$ to fire $t_6$ are a prerequisite imposed on the system to avoid triggering CBIs based on predictions of the track settlement with little or no learning from the data. This ensures that at least two data points are used to train the model before any CBI can be triggered.
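The PI-to-CBI switch can be illustrated with a toy token game that uses the standard enabling rule together with inhibitor arcs. The sketch below is hypothetical and simplified: it only exercises $p^{(S)}_4$, $p^{(S)}_6$, and $p^{(S)}_7$, and it assumes that $t_6$ has an input arc of weight 2 from $p^{(S)}_4$ whose tokens are returned, that $p^{(S)}_6$ inhibits both $t_6$ and $t_7$, and that $t_7$ needs a token in $p^{(S)}_7$. The actual arcs are those of Figure 8, and the helper names are illustrative.

```python
from typing import Dict, Iterable

Marking = Dict[str, int]


def is_enabled(m: Marking, pre: Dict[str, int], inhibitors: Iterable[str] = ()) -> bool:
    """Enabled if every input place holds enough tokens and every inhibitor place is empty."""
    has_tokens = all(m.get(p, 0) >= w for p, w in pre.items())
    not_inhibited = all(m.get(p, 0) == 0 for p in inhibitors)
    return has_tokens and not_inhibited


def fire(m: Marking, pre: Dict[str, int], post: Dict[str, int]) -> Marking:
    """Return the marking after removing input tokens and adding output tokens."""
    new = dict(m)
    for p, w in pre.items():
        new[p] = new.get(p, 0) - w
    for p, w in post.items():
        new[p] = new.get(p, 0) + w
    return new


# Hypothetical, simplified arcs:
# t_6 takes two tokens from p_4 (and returns them) and marks the irreversible p_6;
# p_6 inhibits t_6 itself and the PI trigger t_7, which also needs a token in p_7.
T6 = {"pre": {"p4": 2}, "post": {"p4": 2, "p6": 1}, "inhibitors": ["p6"]}
T7 = {"pre": {"p7": 1}, "inhibitors": ["p6"]}

m: Marking = {f"p{i}": 0 for i in range(1, 10)}
m["p1"], m["p7"] = 1, 1            # initial symbolic marking (1,0,0,0,0,0,1,0,0)

modes = []
for _ in range(3):                 # three measurements arrive
    m["p4"] += 1                   # p_4 accumulates one token per measurement
    if is_enabled(m, T6["pre"], T6["inhibitors"]):
        m = fire(m, T6["pre"], T6["post"])
    modes.append("PI" if is_enabled(m, T7["pre"], T7["inhibitors"]) else "CBI")

print(modes)                       # ['PI', 'CBI', 'CBI']
```

Under these assumptions, the first measurement leaves the net in PI mode, while the second marks $p^{(S)}_6$ and permanently disables $t_7$, mirroring the irreversible switch described above.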

Results
The simulation of the PPN-based expert system shown in Figure 8 yields predicted information about the state variable $x$, along with the sequence of discrete events, such as activation of inspection, data arrival, etc. Algorithm 1 is applied to obtain the overall system evolution described through the marking $\mathbf{m}_n$, $n > 0$, using $N = 5{,}000$ samples. The results for the estimated degradation variable $x$ in place $p^{(N)}_1$, along with its 5% to 95% probability bands, are depicted in Figure 9 for $n = 0 \to 75 \times 10^3$ cycles (see the left-hand panels). Figure 9c illustrates the temporal evolution of the uncertainty in the estimation of $x$ within place $p^{(N)}_1$, with an indication of the reference level above which inspections are needed. This uncertainty is quantified through the DE. The drops in the sequence of DE values in Figure 9c correspond to the uncertainty reduction due to Bayesian learning when new measurements become available.
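Probability bands such as the 5% to 95% bands of Figure 9 can be read off a weighted particle set through the inverse of its empirical CDF. The helper below is a minimal sketch with a hypothetical name, not part of Algorithm 1; one band per evaluated cycle would be computed from the particles held in $p^{(N)}_1$ at that cycle, and the particles used here are synthetic.

```python
import numpy as np


def weighted_quantiles(samples, weights, qs):
    """Quantiles of weighted particles, taken from the inverse empirical CDF."""
    order = np.argsort(samples)
    s = np.asarray(samples, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(w) / w.sum()                 # normalized empirical CDF
    idx = np.searchsorted(cdf, qs, side="left")  # first particle whose CDF reaches q
    return s[np.minimum(idx, len(s) - 1)]


rng = np.random.default_rng(3)
x = rng.normal(0.010, 0.002, size=5000)          # hypothetical particles of x at one cycle
w = np.full(x.size, 1.0 / x.size)                # equal weights after resampling
lo, hi = weighted_quantiles(x, w, [0.05, 0.95])
print(lo, hi)
```

Because the function accepts arbitrary weights, the same code serves both resampled (equally weighted) particles and weighted particles that have not been resampled.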
Observe from these results that the PPN model requires a period to learn from the data, corresponding to loading cycles in the interval $(0, 5 \times 10^3]$. After this learning period, not only does the accuracy of the prediction of $x$ clearly improve with time (predicted values of $x$ closer to the data $\mathcal{D}$), but the uncertainty of the prediction also gradually diminishes, which is numerical evidence of the Bayesian learning taking place in $p^{(N)}_1$. Figure 10a plots the history of the tokens visiting place $p^{(S)}_2$ during the overall period of evaluation $n = 0 \to 75 \times 10^3$ and indicates the sequence of activated inspections within that period. Note that at the beginning of the process (specifically, the first 2,500 cycles), inspections are activated even when the uncertainty (DE) about $x$ in $p^{(N)}_1$ is below the threshold value, as can be observed in Figure 9c. According to the PPN graph in Figure 8, these correspond to PIs that must be carried out until at least two measurements are available, whereupon $p^{(S)}_6$ is marked and the system switches from PI to CBI mode, as explained above. Note also from Figure 10a the tokens visiting $p^{(S)}_2$ from cycle $n = 2.5 \times 10^3$ to about $n = 2 \times 10^4$, corresponding to inspection activities triggered because the uncertainty of the degradation variable in this initial period exceeds the threshold several times (i.e., $\mathrm{DE} \geq -4.8$), activating $t_2$. After this initial period, the system identifies that no more inspections are needed. These results reveal that the PPN autonomously responds to the arrival of data through adaptation, so that the sequence of discrete states is altered in response to the most up-to-date information from the data $\mathcal{D}$. Further insight into the response of the PPN in this case study is provided by Figure 11, a schematic diagram of the sequence of irreversible discrete states activated during a period of evaluation $n = 0 \to 100 \times 10^4$. The results are obtained from 200 independent runs of the PPN algorithm.
The circled points in Figure 11 represent the cycles at which measurement data become available. The error bars indicate the 25th to 75th percentile bands of the variation in the activation of certain discrete states across the independent runs of the PPN algorithm. The bars give the intervals of cycles during which each discrete state remains active within the period of evaluation.

Discussion
In Section 4.3, a case study was provided to illustrate the dynamics of PPNs, the different types of information that can be managed (uncertain information in confluence with information from discrete events), and how PPNs can sequentially learn from monitoring data. This section discusses the effectiveness of the proposed computational methodology in light of the case study results. To this end, the PPN from the case study is evaluated comparatively using different subsets of the data set in Table 1.
As a first exercise, the PPN is evaluated using Algorithm 1 with a data set $\mathcal{D}^{(1)}$ containing the first four data points shown in Table 1, i.e., the PPN only receives data up to $5 \times 10^3$ cycles. The results for the estimated degradation variable $x \sim f_1$ in $p^{(N)}_1$ and its DE are provided in Figure 9 (see right-hand panels). Observe that, in this case, the uncertainty increases from $n = 5 \times 10^3$ loading cycles, when new measurements are no longer available, in comparison with the case shown in Figures 9a and 9c, where the overall data set is available. As a consequence, the PPN responds by continuously triggering CBIs from $n = 8 \times 10^3$, which is the load cycle at which the DE exceeds the threshold of −4.8 (see Figures 10b and 10c). These results corroborate that the proposed model adapts to the available monitoring information: the greater the uncertainty in the degradation estimate (e.g., due to low quality of the monitoring system), the more inspections are required and, provided that each inspection has an associated cost, the higher the maintenance costs.
In addition, the accumulated calls for inspection engineers over time are compared for the reduced data set $\mathcal{D}^{(1)}$ and the complete data set $\mathcal{D}$. The accumulated number of calls for engineers is of interest because it indicates the availability of engineers in this particular example: the fewer the calls for inspection, the higher the availability of engineers, and hence the lower the inspection costs. To count the calls for engineers in each case, place $p^{(S)}_7$ in Figure 8 has been monitored over the execution time. The results, shown in Figure 12b, are given in terms of the total number of times that $p^{(S)}_7$ is empty. Note that during the first cycles (say, $n \leq 1 \times 10^4$), the response of the PPN is similar, if not identical, under both data sets, owing to the PIs that are activated during the first cycles (recall that PIs are triggered until two data points have arrived). After this initial period, the total number of triggered CBIs is clearly lower when the PPN uses the data set $\mathcal{D}$ (recall Figure 10c), which implies fewer calls for engineers and hence higher availability, revealed by a larger number of tokens visiting $p^{(S)}_7$ during runtime.
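Counting the calls for engineers as the number of evaluation steps at which $p^{(S)}_7$ is empty reduces to a cumulative sum over the token history of that place. A minimal sketch follows, with hypothetical function names and a made-up history; averaging the curves over independent runs of equal length would give the kind of comparison discussed above.

```python
import numpy as np


def accumulated_calls(p7_tokens):
    """Cumulative number of evaluation steps at which p_7 (available engineers) is empty."""
    return np.cumsum(np.asarray(p7_tokens) == 0)


def mean_accumulated_calls(histories):
    """Average accumulated-calls curve over independent runs of equal length."""
    return np.mean([accumulated_calls(h) for h in histories], axis=0)


history = [1, 0, 0, 1, 1, 0, 1]                # hypothetical token count of p_7 per step
print(accumulated_calls(history).tolist())     # [0, 1, 2, 2, 2, 3, 3]
```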
Finally, from this standpoint, a natural research question arises about the optimum amount of data the self-adaptive expert system would require to minimize inspection costs. To answer this question, the PPN shown in Figure 8 has been evaluated using six data sets, $\{\mathcal{D}^{(i)}\}_{i=1}^{6}$, such that $\mathcal{D}^{(1)} \subset \mathcal{D}^{(2)} \subset \dots \subset \mathcal{D}^{(6)} = \mathcal{D}$, where $\mathcal{D}^{(1)}, \mathcal{D}^{(2)}, \dots, \mathcal{D}^{(6)}$ are obtained by successively considering more data points from Table 1: the first four points for $\mathcal{D}^{(1)}$, the first five for $\mathcal{D}^{(2)}$, and so on, up to all data points for $\mathcal{D}^{(6)}$. Figure 12b shows the results obtained from 200 independent runs of the PPN algorithm for each data set. Observe that, in general, the more measurement data become available, the fewer the calls for inspection, and thus the higher the availability of engineers. This is a consequence of the Bayesian learning of the PPN from the data, which serves to control the uncertainty of the track settlement predictions so as to avoid triggering unnecessary inspections. Note also that there is no significant difference in terms of maximum availability of engineers when using data sets $\mathcal{D}^{(4)}$ to $\mathcal{D}^{(6)}$. This means that, in this particular case, $\mathcal{D}^{(4)}$ is optimal because the PPN response is virtually the same in terms of triggered inspection actions, while unnecessary future track settlement measurements are avoided.

CONCLUSIONS
This article presented a novel methodology to model self-adaptive expert systems by Bayesian learning of PPNs. The mathematical aspects of PPNs, along with those of their learning procedure, have been presented using illustrative examples. As an application scenario, an engineering case study has been presented, which uses experimental data on railway track degradation to demonstrate how monitoring data and model-based knowledge about track degradation can be integrated within a self-adaptive expert system modeled using a PPN. The concluding remarks are as follows:
• Expert systems modeled using PPNs can manage uncertainty and also autonomously respond to the arrival of noisy data through adaptation.
• Bayesian model updating is shown to appear naturally as a particular case of the conjunction of states of information, which is one of the intrinsic operations of PPN execution semantics.
• A PPN-based railway expert system can operate autonomously or as a decision support tool, allowing the relevant managers and railway engineers to make better decisions. Future research steps in the context of this specific application include the consideration of other subsystems within the expert system model, such as signaling, electrification, switches and crossings, etc.
• Building on this work, one desirable future research direction to enhance PPNs as self-adaptive expert systems for complex civil infrastructures is to explore PPN architectures that allow the modeling of foreseen intervention scenarios. An additional challenge is how to incorporate massive and heterogeneous data (Thaduri, Galar, & Kumar, 2015; Thöns, 2018) from infrastructure monitoring within the PPN methodology.

APPENDIX A: PPN ALGORITHM
A pseudocode implementation of the PPNs is provided below as Algorithm 1.

APPENDIX B: PARTICLE APPROXIMATION OF CONJUNCTION AND DISJUNCTION OF STATES OF INFORMATION
In particle methods, a set of samples $[x^{(i)}]_{i=1}^{N}$ with associated weights $[w^{(i)}]_{i=1}^{N}$ is used to obtain an approximation of the required density function (e.g., $(f \wedge g)(x)$), as follows:

$$(f \wedge g)(x) \approx \sum_{i=1}^{N} w^{(i)}\,\delta\!\left(x - x^{(i)}\right) \qquad \text{(B1)}$$

where $\delta$ is the Dirac delta and $x^{(i)} \sim (f \wedge g)(x)$. The particle weight $w^{(i)}$ represents the likelihood value of $x^{(i)}$ and is representative of the plausibility of $x^{(i)}$ when it is distributed according to $(f \wedge g)(x)$. For the case of $\mathcal{X}$ being a linear space, it can be evaluated as follows (Chiachío et al., 2018):

$$w^{(i)} = \frac{f\!\left(x^{(i)}\right) g\!\left(x^{(i)}\right)}{\sum_{j=1}^{N} f\!\left(x^{(j)}\right) g\!\left(x^{(j)}\right)} \qquad \text{(B2)}$$

A pseudocode implementation to obtain particles from the conjunction $(f \wedge g)(x)$ is provided as Algorithm 2.
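Equation (B2) can be checked numerically on a case with a closed-form answer. The sketch below is not Algorithm 2: it assumes that candidate particles are drawn from a uniform proposal over a bounded interval (so the normalized weights are proportional to $f\,g$, as in Equation (B2)), and it uses two Gaussian states of information $f$ and $g$ chosen only because their normalized product is Gaussian with a known mean and variance. All names are hypothetical.

```python
import numpy as np


def gauss(x, mu, sd):
    """Gaussian density, used for both states of information in this example."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)


def conjunction_particles(f, g, low, high, n, rng):
    """Weighted particles for the conjunction of f and g on a linear space.

    Candidates come from a uniform proposal on [low, high] (assumption of this
    sketch), and the weights follow Eq. (B2):
    w_i = f(x_i) g(x_i) / sum_j f(x_j) g(x_j).
    """
    x = rng.uniform(low, high, size=n)
    fg = f(x) * g(x)
    return x, fg / fg.sum()


def f(x):
    return gauss(x, 0.0, 1.0)


def g(x):
    return gauss(x, 1.0, 0.5)


rng = np.random.default_rng(1)
x, w = conjunction_particles(f, g, -5.0, 5.0, 20_000, rng)
mean_est = np.sum(w * x)
var_est = np.sum(w * (x - mean_est) ** 2)
# For two Gaussians, the normalized product is Gaussian with precision-weighted mean.
mean_exact = (0.0 / 1.0**2 + 1.0 / 0.5**2) / (1.0 / 1.0**2 + 1.0 / 0.5**2)  # 0.8
var_exact = 1.0 / (1.0 / 1.0**2 + 1.0 / 0.5**2)                             # 0.2
print(mean_est, mean_exact, var_est, var_exact)
```

A uniform proposal keeps the weights exactly proportional to $f\,g$ but wastes particles where the product is negligible; sampling from one of the two factors and weighting by the other is a common, more efficient alternative.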

Algorithm 2: Particle approximation of conjunction of states of information
The particle approximation of the disjunction of states of information can be evaluated by simply pooling the particles from the component density functions and rescaling their particle weights with an appropriate normalizing constant so as to obtain a bona fide density, as described in Algorithm 3. Note that the disjunction operation can be easily extended to the case of multiple states of information (e.g., $f_1(x), f_2(x), \dots, f_K(x)$), as follows (Mosegaard & Tarantola, 2002):

$$(f_1 \vee f_2 \vee \dots \vee f_K)(x) = \nu \sum_{k=1}^{K} f_k(x)$$

where $\nu$ is a normalizing constant. The last expression can be approximated using particles through Algorithm 3 by aggregating samples from