A Big Data Analysis Approach for Rail Failure Risk Assessment

Railway infrastructure monitoring is a vital task to ensure rail transportation safety. A rail failure can not only cause considerable train delays and maintenance costs, but also endanger the safety of passengers. In this article, the aim is to assess the risk of a rail failure by analyzing a type of rail surface defect called squats, which are detected automatically among the huge number of records from video cameras. We propose an image processing approach for automatic detection of squats, especially severe types that are prone to rail breaks. We measure the visual length of the squats and use it to model the failure risk. For the assessment of the rail failure risk, we estimate the probability of rail failure based on the growth of squats. Moreover, we perform severity and crack growth analyses to consider the impact of rail traffic loads on defects in three different growth scenarios. The failure risk estimations are provided for several samples of squats with different crack growth lengths on a busy rail track of the Dutch railway network. The results illustrate the practicality and efficiency of the proposed approach.


INTRODUCTION
Among all transportation infrastructure, the railway network is one of the most successful transport systems for reducing transportation cost, traffic congestion, and air pollution emission levels. On the one hand, the increase in usage of the railway network requires a systematic monitoring plan to keep the trains running in a safe way as well as with the least possible disruptions. (1) On the other hand, a large amount of data are collected by frequent measurements from the monitoring systems of the infrastructure and the assets involved in the railway operations. These data should be controlled, stored, and processed, such that they can be employed to take all necessary actions to guarantee the rail asset quality level desired by the infrastructure manager. (2) The large amount of data should be processed into actionable knowledge within a certain time period. (3) Risk is intuitively connected to decision making under uncertainty. (4) Recent developments in big data analytics for uncertainty management and risk assessment of industrial systems have been studied by Wu and Birge (5) and Choi et al. (6) Risk assessment of large-scale systems is of current interest across many application domains such as healthcare, (7) environmental safety, (8,9) transportation, (10)(11)(12)(13) business, (14) and product development. (15) In particular for railway applications, risk assessment is critical for the prediction of infrastructure health condition within a given time period. Continuous monitoring of railway systems can guarantee the availability of data that can be used to assess the risk of infrastructure failures. Also, the database constructed from continuous monitoring of data will become larger and larger over time. Thus, applying a big data analysis approach is necessary in order to adequately monitor the infrastructure condition. (16)
Among all the railway infrastructure systems involved in the train operation, the rail track plays an important role in the railway networks. In an intensively used network, a considerable amount of the maintenance has to be allocated for the track, e.g., in the Dutch railway network, this amounts to almost half of the annual maintenance budget. (17) As a high percentage of failures occurring in the railway infrastructure is directly related to the rail, it is important to assess the failure risk of rails. The rail risk assessment involves detecting the rail defects that can potentially result in rail break and, in extreme cases, derailment. (18)(19)(20) Rail surface defects are caused by different factors such as fatigue due to the large number of trains passing over rail components, especially at welds, joints, and switches. (21) Early detection of surface defects is important to mitigate disastrous consequences of rail breaks. There are different methods to diagnose the condition of rail defects, including ultrasonic measurements, (22) eddy current testing, (23) and guided-wave-based monitoring. (24) In general, these methods are not able to detect defects in an early stage of growth, i.e., not until the defects are severe. In particular, detection of defects at the late stage of growth imposes extra operation and maintenance costs because the only remaining solution is to replace the rail.
To address the limitations of the current measurement methods, the use of video cameras installed on trains has become popular. (25)(26)(27) The use of video cameras avoids the error-prone, costly, and time-consuming process of manual rail monitoring. Moreover, the videos taken from side cameras enable the infrastructure manager to capture the real condition of other track components such as fasteners, switches, and sleepers. Using video cameras, one can simply monitor whether the visible defects are at the early or late stage of growth. This means that the infrastructure manager has the opportunity to observe how the defect evolves over time in order to take actions at the right moment and to focus on the most urgent places for maintenance operations. This can lead to a significant reduction in the operation costs induced by the defects and it can prevent potential risks of rail breaks, reducing the risk of derailment. Due to the large number and the high resolution of the videos taken over the rail, an automatic detection algorithm is required to process the huge number of images from those videos.
The main contribution of the article is to assess rail failure risk based on an integrated framework that merges the information of two defect-related variables: visual length and crack growth. There is no similar approach in the literature for risk assessment of rail failure that considers both variables. This is because a big data analysis problem has to be faced in this case; as a result, railway maintenance managers usually look at only one type of data and ignore the other influencing factors. We propose a risk function (Equation (1)) as a composition of three functions: the probability function, the crack growth function, and the partially inverted severity function. To evaluate these functions, we apply several techniques, including a deep convolutional neural network (DCNN) for image processing and defect detection, an N-step ahead prediction model for defect severity and crack growth analysis, and a Bayesian inference model for failure probability estimation.
To implement our proposed framework, a particular type of surface defect in railway networks called squat is considered in the case study. Furthermore, we propose a classification of the squats in terms of the visual length, so that squats are classified according to different severities. These classes can be used later for condition-based maintenance where we have different maintenance operations for different stages of the growth (rail grinding for light squats and replacement for severe squats). However, our approach can be generalized and applied to similar cases when there is a need to analyze a huge amount of image data for assessment of failure probability and risk function. For example, in a recent work by Skakun, (28) satellite images have been employed to assess flood hazard risk. Moreover, in the field of health science, abnormality detection using image processing has become very popular. (29) There are many cases in the literature where image data are used to deal with risk assessment problems. (28)(29)(30)(31)(32) In all these cases, as long as the focus is to detect abnormalities and failures among a big database of images, the risk assessment approach proposed in this article is applicable for merging the information obtained from images.
This article is organized as follows. In Section 2, the proposed failure risk assessment model is presented, including the model framework. Section 3 addresses a real-life case study of the Dutch railway network. Section 4 presents the results and discussions. Finally, in Section 5, conclusions are presented.

[Fig. 1: proposed framework, with labeled steps (Step 1, Step 2) and the failure risk]

The Proposed Framework
In this section, we propose a failure risk framework for analyzing the rail surface defects. The proposed framework is depicted in Fig. 1. Video images, ultrasonic detection, (22) and eddy current testing (23) can all be used to detect the defects that can lead to rail break. In this article, we rely on both the ultrasonic detection method and video images. On the one hand, with ultrasonic measurement, we derive a general characteristic of crack growth. On the other hand, with video images, we analyze the growth of the visual length of defects that are detected among a huge number of rail images. Then, a sample of the visual length of the detected defects is chosen for the assessment of the failure risk model. The approach can be employed for any type of rail defects.
In this framework, a large amount of image data is automatically processed by a DCNN to detect squats in Step 1 (see details in Section 2.4). The visual lengths of the defects detected in the video images are measured and then used for defect severity analysis in Step 2 (see details in Section 2.2).
In Step 3, a crack growth analysis is performed to estimate the crack growth as a function of million gross tons (MGT) by using the data from ultrasonic measurements (see details in Section 2.3). In addition, the probability of rail failure as a function of crack growth is estimated using the crack growth data.
Finally, we propose to assess the risk of rail failure with the composition of the probability function, the crack growth function, and the partially inverted severity function:

$$\text{Risk} = F_{Prob}\left(F_C\left(F_{S,inv.}(V_1, V_2)\right)\right), \qquad (1)$$

where $V_1$ and $V_2$ are two consecutive measurements of visual length for a defect, detected by analysis of image data, and $F_{S,inv.}$ relates $V_1$ and $V_2$ to MGT. Function $F_C$ relates the estimate of MGT to crack growth, and function $F_{Prob}$ estimates the probability of failure based on the estimate of crack growth. Thus, the risk is approximated by the failure probability obtained from Equation (1); that is, the failure probability represents the risk of failure within a given MGT.
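The composition in Equation (1) can be illustrated with a minimal Python sketch. The three stage functions below (`toy_f_s_inv`, `toy_f_c`, `toy_f_prob`) and all their coefficients are hypothetical placeholders chosen only to make the example runnable; they are not the fitted models of this article.

```python
import math
from typing import Callable


def make_risk_function(
    f_s_inv: Callable[[float, float], float],
    f_c: Callable[[float], float],
    f_prob: Callable[[float], float],
) -> Callable[[float, float], float]:
    """Compose the three stages: (V1, V2) -> MGT -> crack growth -> probability."""

    def risk(v1: float, v2: float) -> float:
        mgt = f_s_inv(v1, v2)        # partially inverted severity function
        crack_growth = f_c(mgt)      # crack growth function
        return f_prob(crack_growth)  # failure probability function

    return risk


# Hypothetical stage functions, for illustration only.
def toy_f_s_inv(v1: float, v2: float) -> float:
    return 0.5 * max(v2 - v1, 0.0)   # assumed: 0.5 MGT per mm of visual growth


def toy_f_c(mgt: float) -> float:
    return 0.4 * mgt                 # assumed: 0.4 mm of crack growth per MGT


def toy_f_prob(crack_growth: float) -> float:
    return 1.0 / (1.0 + math.exp(-(crack_growth - 3.0)))  # assumed logistic curve


risk = make_risk_function(toy_f_s_inv, toy_f_c, toy_f_prob)
print(risk(42.0, 57.0))
```

Keeping the stages as separate callables means any one of them could be swapped for a fitted model without touching the others.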

Severity Analysis
This section aims to model the visual length of defects based on the MGT. MGT is a unit that expresses the total weight of freight and passenger trains that pass over a given track in a given time horizon. Thus, the MGT directly influences the growth of defects: an increase in the MGT accelerates the defect evolution process, and tracks with a lower train occupation are expected to have a lower degradation rate than busy tracks.
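As a small illustration of the MGT definition, the sketch below accumulates gross tonnage over a time horizon. The function name `accumulated_mgt` and the train weights and counts are hypothetical and are not taken from the case study.

```python
from typing import Iterable, Tuple


def accumulated_mgt(passes: Iterable[Tuple[float, int]]) -> float:
    """Total load in million gross tons.

    Each entry is (gross tons per train, number of trains of that kind
    passing over the track within the time horizon).
    """
    total_tons = sum(tons * count for tons, count in passes)
    return total_tons / 1e6


# Hypothetical traffic mix: 100 freight trains of 2,000 t and
# 1,000 passenger trains of 400 t.
mgt = accumulated_mgt([(2000.0, 100), (400.0, 1000)])
print(mgt)
```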
The defects are automatically detected using the image processing method described in Section 2.4. We measure the visual lengths of the detected defects to use in severity analysis. We consider visual length as an indicator of a defect severity. Analysis of rail image data shows that the visual length of defects can grow with different rates as the MGT increases.
To capture the dynamics of the growth, we keep track of the growth for each individual squat to determine the increase of the visual length in each MGT step. A generic function is used to model the growth. The function can be applied relying on different methods where two consecutive data measurements are available. We presented the benefits of using an N-step ahead prediction model for the prediction of squat growth in our recent studies; for details, see Jamshidi et al. (33,34) Thus, considering index $m$ as an MGT increment counter, we use an N-step ahead prediction model to describe the growth of visual length at different MGT steps:

$$\hat{V}_i^h(m+1) = F_S^h\left(\hat{V}_i^h(m), M^h(m)\right), \qquad \hat{V}_i^h(0) = V_i^h(0), \qquad (2)$$

where $\hat{V}_i^h(m)$ is the estimate of the visual length for each individual squat $i$ at step $m$ assuming scenario $h$, $M^h(m)$ is the total amount of MGT in step $m$, $F_S^h(\cdot)$ is the one-step ahead prediction function, and $V_i^h(0)$ is the visual length measurement at the current step.
By partial inversion of $F_S^h(\cdot)$, we get $F_{S,inv.}^h$ as a function of the visual length in two consecutive MGT steps. In case of scarce data for the total amount of MGT in each step, an approximation can be made for the prediction model (2):

$$\hat{V}_i^h(m+1) = F_{S,approx.}^h\left(\hat{V}_i^h(m)\right). \qquad (3)$$

A fixed increment of the MGT is selected to keep track of the growth of the visual length. Then, we apply function $F_{S,approx.}^h$ in an N-step ahead fashion to reconstruct $F_S^h$. This yields the relation between visual length and MGT at step $m$. Once $F_S^h$ is formulated, we can partially invert it to get $F_{S,inv.}^h$ as follows:

$$M^h(m) = F_{S,inv.}^h\left(V_i^h(m), V_i^h(m+1)\right). \qquad (4)$$
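The N-step ahead idea and the partial inversion can be sketched as follows. This is a simplified illustration under stated assumptions: the one-step function `toy_growth` (2 mm of growth per step) is hypothetical, the inversion is done numerically by stepping forward and interpolating linearly inside the last step, and the increment of 3.01 MGT is the fixed step used later in the case study.

```python
from typing import Callable, List

DELTA_MGT = 3.01  # fixed MGT increment per step (value used in the case study)


def n_step_ahead(f_approx: Callable[[float], float], v0: float, n_steps: int) -> List[float]:
    """Apply the one-step function repeatedly, starting from the measurement v0."""
    values = [v0]
    for _ in range(n_steps):
        values.append(f_approx(values[-1]))
    return values


def mgt_between(f_approx: Callable[[float], float], v1: float, v2: float,
                delta_mgt: float = DELTA_MGT, max_steps: int = 1000) -> float:
    """Numerical partial inversion: MGT needed for the length to grow from v1 to v2."""
    if v2 <= v1:
        return 0.0
    current = v1
    for step in range(max_steps):
        nxt = f_approx(current)
        if nxt <= current:
            raise ValueError("growth function must increase along the trajectory")
        if nxt >= v2:
            # Interpolate linearly within the step that crosses v2.
            fraction = (v2 - current) / (nxt - current)
            return (step + fraction) * delta_mgt
        current = nxt
    raise ValueError("v2 not reached within max_steps")


def toy_growth(v: float) -> float:
    """Hypothetical one-step growth: 2 mm per MGT step."""
    return v + 2.0


trajectory = n_step_ahead(toy_growth, 10.0, 3)
needed_mgt = mgt_between(toy_growth, 10.0, 15.0)
print(trajectory, needed_mgt)
```

A closed-form inverse could replace `mgt_between` when the fitted growth function allows it; stepping forward is only the most generic option.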

Crack Growth with MGT
The crack growth of defects is an important factor in rail breaks. Independent of the defect severity, the growth of the crack length depends on the traffic load (MGT). The idea in this article is to analyze the data measured by the ultrasonic detection technique and to present a function for estimation of the crack growth over the MGT: (33,34)

$$\hat{L}_i^h(m) = F_C^h\left(M^h(m)\right), \qquad (5)$$

where $\hat{L}_i^h(m)$ is the estimate of the crack growth length for defect $i$ at MGT step $m$ assuming scenario $h$ and $F_C^h(\cdot)$ is the crack growth function. We will use a similar approach as described in Section 2.2 to assess the crack growth function.

Failure Probability
Regarding the crack growth data, assume the crack growth length is $L$, containing a total of $I$ measurements $(L_1, L_2, \ldots, L_I)$. Then the failure event can be defined as:

$$L_i \geq d_i, \qquad i = 1, \ldots, I, \qquad (6)$$

where $d_i$ is the critical level for the $i$th measurement. This formula implies that a failure occurs if the crack growth length reaches or exceeds the critical level. A logistic function is appropriate for these data since the response is binary: the system fails if the measurement value satisfies Equation (6), and does not fail otherwise. (35) Therefore, a logistic function is considered for the likelihood of rail failure probability $f(L \mid (a, b))$ with parameters $a$ (intercept) and $b$ (slope).
Recently, the Bayesian inference model has been employed extensively to assess model uncertainty and robustness for stochastic data behaviors. (36)(37)(38) Using a Bayesian inference model, variations of the model parameters can be considered as a step-wise degradation process. According to Bayes' theorem, if prior knowledge about the parameter $\theta = (a, b)$ is represented by its probability density distribution $\pi_0(\theta)$, and if the statistical observations of crack growth length have likelihood $f(L \mid \theta)$, then the rail failure probability can be expressed as the posterior distribution $\pi$:

$$\pi(\theta \mid L) = \frac{f(L \mid \theta)\,\pi_0(\theta)}{\int f(L \mid \theta)\,\pi_0(\theta)\,d\theta}. \qquad (7)$$

Typically, Monte Carlo methods are used in Bayesian data analysis to derive the posterior distribution. (39,40) The aim of using a Monte Carlo method is to generate random samples from the posterior distribution in order to use them when it is impossible to compute the posterior distribution analytically. Among all the Monte Carlo methods, slice sampling is easier to implement as only the posterior needs to be specified. (41,42) The slice sampling algorithm selects samples uniformly from the region under the density function. Therefore, in this article, a slice sampling algorithm is selected to capture the failure probability function.
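To make the sampling step concrete, the following is a minimal sketch of coordinate-wise slice sampling (stepping out followed by shrinkage) for a logistic failure model with normal priors on $(a, b)$. It is an illustration under assumptions, not the implementation used in this article: the prior standard deviation, the slice width, the absence of burn-in, and the toy data `LENGTHS` and `FAILED` are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_posterior(theta, lengths, failed, prior_sd=10.0):
    """Unnormalised log posterior: normal priors on (a, b) plus Bernoulli log-likelihood."""
    a, b = theta
    lp = -(a * a + b * b) / (2.0 * prior_sd ** 2)
    for x, y in zip(lengths, failed):
        z = a + b * x
        # y*z - log(1 + exp(z)), written in an overflow-safe form
        lp += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return lp


def slice_update(logp, theta, k, width, rng):
    """Draw a new value for coordinate k with a univariate slice sampler."""
    def at(value):
        trial = list(theta)
        trial[k] = value
        return logp(trial)

    x0 = theta[k]
    log_y = logp(theta) + math.log(1.0 - rng.random())  # slice height under the density
    left = x0 - width * rng.random()                     # randomly placed initial interval
    right = left + width
    while at(left) > log_y:                              # stepping out
        left -= width
    while at(right) > log_y:
        right += width
    while True:                                          # shrinkage
        x1 = left + (right - left) * rng.random()
        if at(x1) >= log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1


def sample_posterior(lengths, failed, n_samples=1000, seed=0, width=1.0):
    rng = random.Random(seed)
    logp = lambda t: log_posterior(t, lengths, failed)
    theta = [0.0, 0.0]
    draws = []
    for _ in range(n_samples):
        for k in range(2):
            theta[k] = slice_update(logp, theta, k, width, rng)
        draws.append((theta[0], theta[1]))
    return draws


def failure_probability(length, draws):
    """Posterior-mean failure probability at a given crack growth length."""
    return sum(sigmoid(a + b * length) for a, b in draws) / len(draws)


# Hypothetical crack growth lengths (mm) and failure indicators.
LENGTHS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
FAILED = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

draws = sample_posterior(LENGTHS, FAILED, n_samples=1000)
print(failure_probability(2.0, draws), failure_probability(12.0, draws))
```

Only the unnormalised log posterior is required, which matches the remark that slice sampling needs nothing beyond the posterior itself; the integral in the denominator of Equation (7) never has to be evaluated.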

Analysis of Rail Image Data
We consider a railway health monitoring situation where a huge amount of video data are regularly collected. Subsequently, the video data need to be analyzed in order to detect defects with a potential risk of rail break. The data are collected by a set of high-frame-rate cameras that are mounted on a measurement train. The video recordings cover the entire length of the measured distance on the rail track. The mounted cameras capture the rails from several angles to look at different components. The top view camera is aimed at the rail surface defects, with each frame covering a length of 15 cm of the track along the longitudinal direction. The recordings are preprocessed into video compilations where consecutive frames have a few millimeters of overlap and the effects of variations in the train speed are removed. Recordings from (bi)monthly measurements of roughly 6,500 km of rail amount to thousands of gigabytes: every 4 gigabytes of data cover 16 km of rail track. As a result, recording videos of the whole Dutch rail network requires almost 10 terabytes of data per year.
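The stated data volume can be checked with a short calculation. The sketch assumes that every measurement run covers the full 6,500 km and that 1 terabyte equals 1,000 gigabytes; both are assumptions made for the arithmetic, not statements from the measurement campaign.

```python
GB_PER_SEGMENT = 4.0     # gigabytes of video per recorded segment
KM_PER_SEGMENT = 16.0    # kilometers of track covered by one segment
NETWORK_KM = 6500.0      # approximate length of rail measured


def yearly_volume_tb(runs_per_year: int, gb_per_tb: float = 1000.0) -> float:
    """Data volume per year, assuming each run covers the whole network."""
    gb_per_run = NETWORK_KM / KM_PER_SEGMENT * GB_PER_SEGMENT
    return runs_per_year * gb_per_run / gb_per_tb


print(yearly_volume_tb(6))   # bimonthly measurements -> 9.75
print(yearly_volume_tb(12))  # monthly measurements   -> 19.5
```

Under these assumptions, the figure of almost 10 terabytes per year corresponds to bimonthly measurement of the whole network.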
To be able to automatically extract defect information from the data, we train and apply a DCNN (43) to detect and classify the defects. Recently, the application of DCNNs has become very popular in the domain of big data due to the increases in the size of available training sets and algorithmic advances such as the use of piece-wise linear units and dropout training. (44)(45)(46) The images are fed to the DCNN and passed through a number of convolutional layers to train a set of shared neuron weights, referred to as filters. Convolution filters detect distinguishing features and form what is called a feature map. We use rectified linear unit (ReLU) (47) activation functions after the convolution steps, and max-pooling layers to efficiently down-sample the outcome of each layer. Moreover, to prevent overfitting to the training data, we use dropout layers before each convolutional layer. Overfitting occurs when a classifier is fitted so closely to the sample data set that it is unable to accurately describe the entire population, resulting in a high error over the test data. The dropout layer is known to prevent this by randomly disabling some activations from the previous layer. (48) The convolutional and pooling layers are finally attached to a sequence of three fully connected layers to get class predictions.
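The building blocks named above (convolution filter, ReLU, max-pooling, dropout) can be illustrated on a tiny image with plain Python lists. This is a didactic sketch only: the 5×5 image and the 2×2 edge filter are hypothetical, the filter is fixed rather than learned, and the layer sizes are unrelated to the network used in this article.

```python
import random


def conv2d_valid(image, kernel):
    """Slide the filter over the image without padding (cross-correlation,
    which is what deep learning libraries usually call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Rectified linear unit: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Down-sample by taking the maximum over non-overlapping size x size windows."""
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


def dropout(feature_map, rate, rng):
    """Training-time (inverted) dropout: zero activations with probability `rate`
    and rescale the kept ones so the expected value is unchanged."""
    keep = 1.0 - rate
    return [[v / keep if rng.random() < keep else 0.0 for v in row] for row in feature_map]


# Hypothetical 5x5 gray-scale patch with a vertical edge, and a 2x2 edge filter.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
edge_filter = [[-1.0, 1.0], [-1.0, 1.0]]

feature_map = relu(conv2d_valid(image, edge_filter))
pooled = max_pool(feature_map, size=2)
print(pooled)
```

In a trained network the filter weights are learned from labeled examples instead of being fixed by hand as in this sketch.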
The DCNN is trained by iteratively feeding the training examples forward through the network and calculating the error with respect to the desired outcome. The error and its gradient are then evaluated at the last layer of the network and backpropagated through all the layers to adjust all the weights. Repeating this process until the error decreases to a certain limit is called the gradient descent algorithm. (47) We use a widely applied variation of the algorithm where, on each iteration, the error and gradients are calculated using a randomly selected set of training examples, usually called a mini-batch. (47)
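The mini-batch variation can be sketched on a much smaller model than a DCNN. The example below trains a single logistic unit, so the gradient is available directly and no backpropagation through layers is needed; the learning rate, batch size, number of epochs, and the toy data are hypothetical choices.

```python
import math
import random


def predict_proba(x, w, b):
    """Logistic output of a single unit."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def sgd_logistic(xs, ys, lr=0.1, batch_size=4, epochs=200, seed=0):
    """Mini-batch gradient descent on the cross-entropy error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                      # random mini-batches each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                error = predict_proba(xs[i], w, b) - ys[i]
                grad_w += error * xs[i]
                grad_b += error
            w -= lr * grad_w / len(batch)         # step against the averaged gradient
            b -= lr * grad_b / len(batch)
    return w, b


# Hypothetical one-dimensional training examples with binary labels.
XS = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
YS = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = sgd_logistic(XS, YS)
print(w, b)
```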

CASE STUDY
In this section, a track from the Dutch railway network is considered to illustrate the capabilities of the proposed methodology. Track availability can be affected by rail surface defects. Among all types of rail surface defects, like rail corrugation, head checks, shatter cracking, vertical splits, head horizontal splits, and wheel burns, squats play an important role because they have a significant impact on the health condition of the track. Therefore, our main focus is on detecting the squats in this case study.
We select a sample from these data that contains recordings over a track in the north of the Netherlands from Zwolle to Groningen corresponding to approximately 300,000 captured frames. Two successive measurements of the same location along the track are matched together using the available time and geographic data. In total, 4,220 samples are labeled and used for training and testing of the neural network model. Of the total set of samples, 3,170 are normal rail samples and roughly 1,000 are squats.
The proposed DCNN architecture for analyzing this number of image frames is presented in Fig. 2. Initially the input images are down-scaled to 375×275 pixels and converted into gray scale. The sequence of three fully connected layers translates the extracted high-level features from the previous layers into three classes representing the normal rail, trivial defects (seed squats), and squats.
Trivial defects appear in the form of spots or small damages to the rail head, while squats are usually defects that are fully grown indentations and deformations of the rail surface. The normal class includes all other components such as plain rails, switches, welds, and possible nondefect contaminations.
To train the network, a set of manually labeled examples is collected from several locations along the measured track and is compiled into a training set for each one of the three classes. The network is trained once and then is used for multiple time predictions. The training time is 40 hours per 1,500 examples. Once the network is trained, it is used to find squats in the large pool of previously unseen samples (prediction). These samples are collected from other monitoring sessions. Unlike the training time, the prediction time is insignificant (30 seconds per 15,000 examples). The prediction result then has an average binary accuracy of 96.9% (squat vs. normal) when training on 80% of the labeled data set and testing on the remaining 20%. By putting a high acceptance threshold on the network output response, we opt to detect the correct cases of squats, trivial defects, and the normal cases. Hence, after training and testing, we use the model to predict the severity of squats from the large amounts of available unlabeled data, from which we choose 109 detected squats for manual measurement of visual lengths in the track Zwolle-Groningen. Then, the samples are used in the next step where the growth of visual lengths is considered as described in Section 2.2. Here, squats with a visual length below 15 mm are considered as light squats, in which cracks have not appeared yet (surface initiation is assumed, and we cannot see beneath the surface from the image). Squats with visual length ranging from 15 to 30 mm are considered to be at the medium stage of growth. The medium squats evolve to severe squats when the network of cracks spreads further. Fig. 3 shows reference photos of squats ranging from light to severe together with crack evolution.

[Fig. 3: reference photos with panels Light Squat, Medium Squat, and Severe Squat]
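The severity classes described above can be written as a simple lookup on visual length. The function name `classify_squat` is hypothetical, and treating every squat longer than 30 mm as severe is an assumption inferred from the 15 to 30 mm range given for medium squats.

```python
LIGHT_LIMIT_MM = 15.0   # below this length a squat is considered light
MEDIUM_LIMIT_MM = 30.0  # upper end of the medium range


def classify_squat(visual_length_mm: float) -> str:
    """Map a measured visual length (mm) to a severity class."""
    if visual_length_mm < 0.0:
        raise ValueError("visual length cannot be negative")
    if visual_length_mm < LIGHT_LIMIT_MM:
        return "light"
    if visual_length_mm <= MEDIUM_LIMIT_MM:
        return "medium"
    return "severe"  # assumption: anything above 30 mm counts as severe


for length in (10.0, 16.0, 42.0):
    print(length, classify_squat(length))
```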
Light squats will evolve into medium or severe squats after repeated train passes. Once the squat is severe, the squat will evolve into a defect with surface-initiated cracks growing along the depth beneath the rail surface. (49) Following the detection of squats by image processing, we apply the approach described in Section 2.2 to this particular case to construct a severity function. From real data of visual length, we estimate $F_{S,approx.}^h$ from Equation (3). Fig. 4 shows the relation between two consecutive measurements of visual length for a fixed value of MGT step ($m = 1$). Relying on the physical understanding of how a squat grows, we fit a polynomial regression model of degree 3, using the least-absolute residual method, (50) to represent the stochasticity of the growth. The residual plot together with the R-square value of 0.9778 indicates how well the polynomial model fits the data. We consider the fitted model as the average growth scenario, and the three-sigma control limits as the slow and fast scenarios.
We use the estimated function of Fig. 4 for eight-step ahead prediction, and consider a fixed MGT increment of 3.01 in each step. As a result, a model-based prediction function for the visual lengths versus MGT is depicted in Fig. 5, considering the three scenarios of average (a), fast (b), and slow (c).
The dotted line shows the upper bound of the estimation for visual length, i.e., it is very rare to observe a squat with a length over the upper bound in reality.
Assuming $V_i^h(m) = 0$, the visual length at MGT step $m + 1$ in the fast scenario reaches the upper bound at a lower MGT ($\mathrm{MGT}^{h_1} = 15.06$) than in the average scenario ($\mathrm{MGT}^{h_3} = 21.83$) and in the slow scenario ($\mathrm{MGT}^{h_2} = 51.32$). This means that the degradation process in the fast scenario is more accelerated than in the average and slow scenarios as the traffic load on the rail increases.
As described in Section 2.3.1, we estimate the crack growth function, $F_C^h(\cdot)$, by relying on ultrasonic measurement data. The model-based relation between the crack growth length and MGT is shown in Fig. 6. In addition, three different scenarios are considered to capture the crack growth dynamics: the average scenario, the slow scenario, and the fast scenario. As seen in the figure, crack propagation of a squat at a given MGT is significantly faster in the fast scenario than in the average and slow scenarios. For example, at MGT = 10.36, it is estimated that the crack length of a squat grows 1 mm in the slow scenario, 2 mm in the average scenario, and 8 mm in the fast scenario. We can assess the risk of rail failure considering any of the different scenarios of crack growth length.
In the failure probability model, we consider that a rail is prone to fail when a squat reaches a crack length of 9 mm. The crack length of each squat is measured to see how it has grown over MGT, and how many cracks have reached a length of 9 mm or even more.
We use normal priors for the regression parameters $(a, b)$. Relying on the data for the crack growth length, the parameters are estimated by a slice sampling algorithm considering 1,000 samples. Fig. 7 and Fig. 8 show, respectively, how the means of the parameters $a$ and $b$ vary over the samples and converge to a constant value. As seen in the figures, the posterior means of the parameters converge to a stationary state after the first 50 samples.
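The convergence of a posterior mean over the samples can be monitored with a running mean. The sketch below is generic; the draws in `example_draws` are hypothetical stand-ins for sampled values of a parameter such as $a$ or $b$.

```python
from typing import List, Sequence


def running_mean(values: Sequence[float]) -> List[float]:
    """Mean of the first n values, for n = 1, ..., len(values)."""
    means = []
    total = 0.0
    for n, value in enumerate(values, start=1):
        total += value
        means.append(total / n)
    return means


# Hypothetical parameter draws; real draws would come from the sampler.
example_draws = [1.0, 2.0, 3.0, 2.0, 2.0]
print(running_mean(example_draws))
```

Plotting the running mean against the sample index gives the kind of curve that flattens once the sampler has reached a stationary regime.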

RESULTS AND DISCUSSION
For a detected squat with measured visual lengths in one MGT step, we estimate the risk of rail failure as follows. From the model in Fig. 5, we estimate the MGT that corresponds to the two consecutive measurements. Then, from the model in Fig. 6, we find the crack growth length for the estimated MGT. Finally, we estimate the failure probability from the crack growth length in Fig. 9.
The failure probability plot shows how likely a squat is to cause a failure in the next MGT step when the crack growth length is given. As an example, if the crack length of a squat increases by 6 mm for MGT = 7.04, the probability that the squat could lead to a rail break is roughly 0.82.
In Fig. 10, a sample of five squats is visualized, and the estimates of failure probability from the given visual lengths are presented.
For instance, the squat with $V_1$ = 42 mm and $V_2$ = 57 mm will cause a rail break with a probability of 28.9% in the next MGT step if no maintenance action is taken. However, no serious failure threatens the squat at the early stage and the failure probability is then almost 10% (see the squat with 16 mm in visual length). In Table I, more samples of squats are presented.
The table includes 64 samples of squats with their measurements of visual length for two MGT steps. As expected, a squat at the severe stage will be prone to a rail break if no operation is carried out on the rail within a given MGT step. For example, there is a 53% chance of failure for the 64th squat, in which the crack growth length is 4.10 mm within the given MGT step. The estimated risk values for the squats at the late stage indicate the need for immediate rail replacements. For the squats at an early stage, a grinding operation is suggested to postpone rail failure by treating the squats.

CONCLUSIONS
In this article, we present a methodology for the risk assessment of rail failure for a type of rail surface defect called squats. A big data analysis approach is used to automatically detect squats from rail images. The visual lengths of squats are measured in order to use them in the severity analysis model, which captures the growth of visual length over MGT increments. In addition, due to the influence of crack growth on estimation of the failure risk, a crack growth analysis based on MGT has been performed. Finally, a Bayesian model is employed to estimate the failure probability. By relying on the estimated failure risk, the infrastructure manager is able to take actions at the right time and the right place in order to prevent unexpected consequences induced by rail breaks. While this article is focused on the analysis of squats, the approach can also be applied to the analysis of other types of rail defects.

ACKNOWLEDGMENTS
This research is part of the NWO/ProRail project (multiparty risk management and key performance indicator design at the whole system level, PYRAMIDS), project code 438-12-300, and the STW/ProRail project (advanced monitoring of intelligent rail infrastructure, ADMIRE), project 12235, which are partly funded by the Ministry of Economic Affairs. The authors also would like to thank INSPECTATION for providing us with image data and technical support.