Simulation‐based inference for reliability model selection and parameter calibration: An application to fatigue

Selecting a "useful" model among a set of competing candidate models is a common problem in many disciplines, and particularly in engineering applications. For a modeller, the best model is the one that best fulfils a given purpose (e.g., the ability to accurately predict new data, good interpolation, precise estimates of lower/upper percentiles), which is typically assessed by comparing model simulations to the observed data. In this work, approximate Bayesian computation (ABC), a statistical learning technique, is used for reliability model selection and parameter calibration from small to moderate fatigue life data sets. Small samples are the rule in material fatigue because of the high cost and low efficiency of fatigue tests, which are a bottleneck when designing components and structures against fatigue. ABC is a likelihood-free method, meaning that it needs only a generative model to simulate the data. The proposed method provides a formal ranking of the competing reliability models by gradually eliminating the least likely models in a parsimonious manner. It is flexible, since it offers the possibility to use different metrics to measure the discrepancy between simulated and observed data; the choice of the appropriate distance function depends essentially on the purpose of model selection. Through various examples, it is demonstrated that the ABC method has a number of very attractive properties, which make it especially useful in probabilistic risk assessment and reliability analysis.

It is well known that fatigue reliability analyses require an accurate determination of the probability model of a group of fatigue lives. Since the data are generally seldom sufficient to define a statistical distribution, we normally must rely on assumed distributions, which arise from convenience or, preferably, from relevant physical arguments. Among the assumed distributions widely used in fatigue studies are the lognormal (LN), Birnbaum-Saunders and Weibull distributions, [1,2] to mention a few. These distributions are very similar, but their predictions at the lower/upper tails, which are usually the regions of interest for inference, can differ significantly; for this reason, a robust discrimination technique driven by a stated goal can be a useful alternative for practitioners.
Many different approaches have been proposed in the statistical literature to help select the "best" model among a group of competing hypotheses. [7-10] Bayesian inference aims at computing a posterior probability distribution over a set of hypotheses or models, in terms of their relative support from the data. Some of these approaches are reviewed below. The Bayes factor (BF), proposed by Jeffreys et al., [11] is referred to as the standard Bayesian solution to the hypothesis testing and model selection problems [12] and is the primary tool used in Bayesian inference for hypothesis testing and model selection. [13] A comprehensive review of BFs is presented by Kass and Raftery. [14] The idea is to select the model that presents the best point estimate of the posterior density ratios. Nevertheless, the main drawback of BFs is their sensitivity to the choice of priors, as is well documented in the literature. The method is also problematic when there are many possible models. To overcome these issues, other variants, including the partial BF, [15] the intrinsic BF [16] and the fractional BF, [17] have been proposed in the literature. These variants basically split the data into a training sample and a testing sample; the training sample is used to update an uninformative prior into an informative prior. Unfortunately, they suffer from more or less arbitrary choices of training samples, of weights for averaging training samples, and of fractions, respectively.

Model selection criteria based on information loss, the best known being the Akaike information criterion (AIC), [18] the corrected Akaike information criterion (AICc), [19] the Bayesian information criterion (BIC) [20] and the Hannan-Quinn information criterion (HQIC), [21] have been widely used as well. They are based on a penalisation of the likelihood function as the model becomes more complex, that is, as the number of parameters increases. Following Spiegelhalter et al., [22] the model with the smaller information criterion value is the better one. This is coherent with the principle of Occam's razor, which states that unnecessarily complex models should not be preferred to simpler ones: among competing theories that lead to the same prediction, the one that relies on the fewest assumptions is the best. In particular, when choosing among a set of models, the simplest valid model is the best choice. Exactly how to quantify simplicity has been a subject of debate for centuries; in this study, the simplicity of a model is assumed to be determined by its number of parameters. An inherent problem with these criteria is that they do not allow prior input for model choice. In a somewhat similar spirit, Spiegelhalter et al. [22] devised a selection criterion, the deviance information criterion (DIC), based on Bayesian measures of model complexity and of how well the model fits the data; a lower DIC value indicates a better model fit. The major limitation of these approaches is that they treat the competing models separately.
The reversible jump Markov chain Monte Carlo (RJMCMC) method, originally proposed by Green, [23] is another strategy that samples over the joint model and parameter space. The RJMCMC method offers an across-model simulation approach for Bayesian estimation and model comparison, by exploring a sampling space that consists of several models of possibly varying dimensions. A naive implementation of RJMCMC for models such as Gibbs random fields suffers from computational difficulties; to overcome the inefficiency of the standard RJMCMC algorithm, a noisy RJMCMC sampler [24] has been developed. The transitional Markov chain Monte Carlo (TMCMC) algorithm proposed by Ching et al. [25] has been proven to outperform other approaches thanks to its capability of estimating the model evidence and its smooth convergence. However, it has potential problems in tackling higher-dimensional parameter spaces, [26] since the number of intermediate stages increases with the need for more samples, while the accuracy of the estimators may decrease as the parameter dimension grows. [25]

It should be noted that most of the proposed selection procedures converge to a single model. In some circumstances, however, more than one model could explain the data; there is then no need to choose a single model, and it is possible to average the predictions from several models. Bayesian model averaging (BMA) is a technique designed to incorporate the uncertainty inherent in the model selection process, which is often ignored by traditional statistical analysis. As a statistical procedure to infer consensus predictions, BMA weighs the individual predictions by their posterior model probabilities, the better-performing predictions receiving higher weights than the worse-performing ones. BMA can thus generate an averaged model, especially in cases where more than one model has a non-negligible posterior probability. [27] BMA is ideally suited to guide this search because it implicitly honours the principle of parsimony. [4,28] The BMA ranking reflects an optimal trade-off between goodness-of-fit and model complexity, with model complexity being encoded in the prior probability distributions of the model parameters. Another popular selection procedure is nested sampling (NS), a highly efficient and easily implemented sampling algorithm that has been successfully incorporated into Bayesian inference for model updating and model selection. The key step of this algorithm is to propose, at each iteration, a new sample with a higher likelihood to replace the sample with the lowest likelihood evaluated in the previous iteration. This constrained sampling step has a significant impact on the algorithm's efficiency. NS has been successfully applied in cosmology, [29] finite element model updating and model selection, [30] and surface flow problems. [31,32] The main problem of the standard NS algorithm lies in this constrained step, in which a new sample with a higher likelihood must be proposed to replace the sample with the lowest likelihood. This issue becomes even more challenging after several iterations, as the likelihood has reached a high value and the parameter space shrinks to a very sharp region.
In this paper, a simulation-based framework that simultaneously addresses parameter calibration and model selection using the approximate Bayesian computation (ABC) method is proposed. It offers the possibility to compare several reliability models and to estimate the marginal distributions of the model parameters in a straightforward way. The ABC method belongs to the class of sequential particle filter methods, which do not require the specification of a likelihood function. Its basic idea is to compare simulated data to the observed data using an appropriate distance/metric; if the difference is small, the parameter values that generated the simulated data are retained as a sample from the posterior distribution. Successful applications of ABC algorithms can be found in various areas, including structural dynamics, [33,34] biology, [35] epidemiology [36] and cosmology, [37] to mention a few. The uniqueness of this study comes from the fact that, thus far, no attempt has been made to compare several competing models simultaneously using a variety of metrics/distances. It should be noted that ABC algorithms take the complexity of the model (i.e., its number of parameters) into account automatically, without the need to define a penalty term. In other words, ABC algorithms systematically promote the selection of the simpler model and switch to more complex models only when a higher accuracy is required. This is similar to the classical penalised information criteria based on the log-likelihood, which penalise complex models, especially to discourage overfitting. In addition, we will demonstrate through this study that the proposed strategy is robust against model mis-specification (or at least can mitigate its effects) compared with classical likelihood-based methods.
The outline of the paper is as follows. Section 2 introduces the Bayesian paradigm for parameter estimation and model class selection and sets out the necessary background on ABC algorithms; the distance functions/metrics that can be used for Bayesian inference are given in the same section. In Section 3, we present a motivating problem with a focus on the merits of the ABC method. In Section 4, we present three applications in which several competing statistical distributions are proposed to model lifetime data, and we illustrate the procedure for selecting the most likely reliability model. Section 5 summarises the conclusions of this paper and discusses future extensions of this research.

Background and details
ABC is employed in this work for simultaneous model selection and parameter estimation. [35,38,39] Let us consider first the problem of parameter estimation, [35,40,41] where θ is the vector of model parameters. Given the prior distribution for the parameters, π(θ), the objective is to approximate the posterior distribution

$$\pi(\theta \mid D) = \frac{p(D \mid \theta)\,\pi(\theta)}{p(D)} \propto p(D \mid \theta)\,\pi(\theta),$$

where D denotes the observed data and p(D | θ) is the likelihood function (the conditional probability of the observed data D given the parameters θ).
A simple ABC algorithm for parameter estimation, which avoids evaluating the likelihood function p(D | θ) directly in order to obtain the posterior distributions of the unknown parameters, can be summarised in general terms as follows. Starting with j = 0, repeat the following steps until a sufficiently large number of samples θ^(j) are accepted:

• Step 1: sample a parameter vector θ* from the prior distribution π(θ);
• Step 2: generate simulated data D* from a generative model (in this study, a parametric probability model);
• Step 3: compare the simulated data D* with the observations D using a user-selected distance function ρ(D, D*) and a user-selected tolerance ε (the tolerance ε > 0 is the level of desired agreement between D and D*, measured by ρ). If ρ(D, D*) ≤ ε, accept the candidate, that is, set j = j + 1 and θ^(j) = θ*; otherwise, return to Step 1.

Each output θ^(j) of the above algorithm is a sample from the distribution π(θ | ρ(D, D*) ≤ ε). If ε is sufficiently small and ρ(D, D*) is appropriately selected, this distribution is expected to be a good approximation to the true posterior distribution π(θ | D). Difficulties in the successful implementation of the above algorithm therefore include the specification of the distance function ρ(D, D*), as well as of the tolerance ε.
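To make the rejection scheme concrete, a minimal sketch is given below. It is purely illustrative and not the implementation used in this paper: it assumes an exponential lifetime model as the generative model, a uniform prior on its scale parameter, and a root-mean-square difference between sorted samples as ρ(D, D*); all names and numerical values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative "observed" failure times (in practice, the fatigue life data).
data = rng.exponential(scale=2.0, size=30)

def distance(d_obs, d_sim):
    # A simple user-selected discrepancy: RMS difference between sorted samples.
    return np.linalg.norm(np.sort(d_obs) - np.sort(d_sim)) / np.sqrt(len(d_obs))

def abc_rejection(data, n_samples=200, eps=1.0):
    accepted = []
    while len(accepted) < n_samples:
        theta = rng.uniform(0.1, 10.0)                        # Step 1: sample from the prior
        d_sim = rng.exponential(scale=theta, size=len(data))  # Step 2: simulate data
        if distance(data, d_sim) <= eps:                      # Step 3: accept if close enough
            accepted.append(theta)
    return np.array(accepted)

posterior_samples = abc_rejection(data)
print(posterior_samples.mean(), posterior_samples.std())
```

In practice the exponential model, the prior bounds and the tolerance would be replaced by the candidate reliability model, the priors of Section 2 and one of the distances of Section 2.2.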
The concepts outlined in the above algorithm can be extended to incorporate model selection in addition to parameter estimation. [35,42,43] The marginal posterior distribution of a model M_i given the observations is

$$P(M_i \mid D) = \frac{p(D \mid M_i)\,P(M_i)}{p(D)}, \qquad i = 1, 2, \ldots, K,$$

where p(D | M_i) is the marginal likelihood and P(M_i) is the prior probability of model M_i among the K competing models. The posterior distribution in the combined space of models and parameters, which is sought in this work, is p(θ_i, M_i | D), where θ_i is the vector of parameters of model M_i. This posterior distribution is approximately represented here by a set of samples drawn from p(θ_i, M_i | ρ(D, D*) ≤ ε), using a more efficient sampling technique. It should be noted that the acceptance rate of ABC in its basic form is computationally prohibitive for small ε. Several variants have been proposed in the literature to overcome this issue and to gain efficiency. In this work, an ABC algorithm based on an ellipsoidal sampling technique and a reweighting scheme is used. [34] It is summarised in Algorithm 1.
The algorithm starts by selecting a candidate model from the competing models, which are initially assumed to be equally probable. A particle is then sampled from the prior over the model parameters, the data are simulated using the model, and the particle is accepted if ρ(D, D*) ≤ ε1. The process is repeated until N particles distributed over the competing models are obtained. The particles are then weighted using a kernel and normalised within each model. For the next population, a tolerance threshold ε2 is defined from the discrepancy values ranked in descending order. For each model, the particles with ρ(D, D*) > ε2 are dropped. From the retained particles, a proportion is randomly selected and propagated to the next population. The remaining particles are then enclosed in an ellipsoid whose centre of mass and covariance matrix are computed from the propagated particles. The ellipsoid is then enlarged by a factor f0 to ensure that particles lying on the boundary remain inside. Finally, the population is replenished by resampling particles inside the ellipsoid associated with each model. In the subsequent iterations, the threshold is updated adaptively in the same way, so that sample selection is subjected to increasingly stringent thresholds. The priors on the models are also updated whenever one of the competing models is eliminated. Over the populations, and as ε → 0, a larger number of particles are allocated to the most likely model(s), and the parameter samples better reflect the true posterior distributions as the most interesting region of the parameter space is progressively identified. Several stopping criteria could be used to terminate the algorithm. In this work, the algorithm is stopped when the difference between two successive tolerance thresholds falls below a prespecified value (the stopping-criterion threshold listed below). A detailed discussion of the effects of these settings can be found in Ref. [34].
At the last population, the algorithm produces a Markov chain on (M, θ) for which the marginal distribution is π(M | D). The posterior probability of model M_i can then be estimated by

$$\hat{P}(M_i \mid D) \approx \frac{N_i}{N},$$

where N_i is the number of particles associated with model M_i among the N particles of the final population (equivalently, the sum of their normalised weights). As shown in Algorithm 1, the ABC-NS algorithm requires a number of hyper-parameters to be chosen. In this study, the algorithm is implemented with the following settings:

• the initial threshold ε1 is set to 1000.
• the number of particles per population is set to 1000.
• the proportions of dropped and surviving particles are set to 30% and 60%, respectively.
• the ellipsoid enlargement factor f0 is set to 1.1.
• the stopping-criterion threshold value is set to 10⁻⁶.
For further details about ABC-NS in general, as well as the concrete implementation, the interested reader can refer to the following examples released on GitHub (the same notation has been used here): ABC-NS for model calibration and ABC-NS for model selection.
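For orientation, a schematic sketch of a single ABC-NS population update for one model is shown below. It is only a reading of the steps described above (tighten the tolerance, drop the worst particles, propagate a fraction of the survivors, and replenish from an enlarged ellipsoid); the kernel-based reweighting and the model-selection bookkeeping are omitted, and the GitHub implementation referenced above should be taken as authoritative. All helper names and settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_in_ellipsoid(mu, cov, rng):
    """Draw a point uniformly inside the ellipsoid defined by (mu, cov)."""
    d = len(mu)
    L = np.linalg.cholesky(cov)
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)              # random direction on the unit sphere
    r = rng.uniform() ** (1.0 / d)      # radius giving a uniform density in the ball
    return mu + L @ (r * u)

def abc_ns_population(particles, dists, simulate, distance, data,
                      drop_frac=0.3, keep_frac=0.6, f0=1.1):
    """One schematic ABC-NS population update for a single model.

    particles: (N, d) parameter vectors from the previous population
    dists:     (N,)   discrepancies rho(D, D*) of those particles
    simulate:  callable theta -> simulated data D*
    distance:  callable (D, D*) -> scalar discrepancy
    """
    N, d = particles.shape
    order = np.argsort(dists)
    n_surv = int((1.0 - drop_frac) * N)
    eps_new = dists[order[n_surv - 1]]            # tighter tolerance for this population
    surv_p, surv_d = particles[order[:n_surv]], dists[order[:n_surv]]

    # Propagate a random fraction of the survivors unchanged.
    idx = rng.choice(n_surv, size=int(keep_frac * N), replace=False)
    new_p, new_d = list(surv_p[idx]), list(surv_d[idx])

    # Enclose the propagated particles in an enlarged ellipsoid.
    mu = surv_p[idx].mean(axis=0)
    cov = f0 * np.atleast_2d(np.cov(surv_p[idx], rowvar=False)) + 1e-12 * np.eye(d)

    # Replenish the population from the ellipsoid, keeping only candidates
    # whose discrepancy satisfies the new tolerance.
    while len(new_p) < N:
        theta = sample_in_ellipsoid(mu, cov, rng)
        rho = distance(data, simulate(theta))
        if rho <= eps_new:
            new_p.append(theta)
            new_d.append(rho)

    return np.array(new_p), np.array(new_d), eps_new
```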

Minimum distance estimators
A key ingredient of the ABC procedure is the choice of a discrepancy function that describes how different the simulated and observed data are. In this section, we give the computational forms of some of the main distance functions that can be used for reliability model selection and parameter calibration.

• The Cramer-von-Mises distance
The Cramér-von Mises (CM) distance measures the discrepancy between the cumulative distribution function (CDF) of the candidate distribution and the empirical CDF of the data set. It is defined by

$$\rho_{CM} = n \int_{-\infty}^{+\infty} \left[F_n(x) - F(x \mid \theta)\right]^2 \Psi(x)\, \mathrm{d}F(x \mid \theta),$$

where F_n is the empirical CDF and Ψ(x) is a weight function such that Ψ(x) = 1. For a given ordered sample x_1 ≤ x_2 ≤ … ≤ x_n from a statistical distribution, the distance can be written as

$$\rho_{CM} = \frac{1}{12n} + \sum_{i=1}^{n} \left( z_i - \frac{2i-1}{2n} \right)^2,$$

where z_i = F(x_i | θ) is the predicted CDF value for the i-th observation. In the following, the same symbols keep the same meaning.
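As an illustration, the finite-sample form above translates directly into code. The sketch below is illustrative only (lognormal candidate, synthetic data): it evaluates ρ_CM for a sample against a candidate CDF.

```python
import numpy as np
from scipy import stats

def cramer_von_mises(data, cdf):
    """Cramer-von Mises distance between a sample and a candidate CDF."""
    x = np.sort(np.asarray(data))
    n = len(x)
    z = cdf(x)                                   # z_i = F(x_i | theta)
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n) + np.sum((z - (2 * i - 1) / (2 * n)) ** 2)

# Example: discrepancy between synthetic fatigue lives and a lognormal candidate.
lives = np.random.default_rng(2).lognormal(mean=11.0, sigma=0.3, size=20)
rho = cramer_von_mises(lives, lambda x: stats.lognorm.cdf(x, s=0.3, scale=np.exp(11.0)))
print(rho)
```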

• The Anderson-Darling distance
The Anderson-Darling (AD) distance, first introduced by Anderson and Darling, [44] places more weight on the tails of the distribution. [45] It is defined by

$$\rho_{AD} = n \int_{-\infty}^{+\infty} \frac{\left[F_n(x) - F(x \mid \theta)\right]^2}{F(x \mid \theta)\left[1 - F(x \mid \theta)\right]}\, \mathrm{d}F(x \mid \theta),$$

where {F(x | θ)[1 − F(x | θ)]}⁻¹ is a non-negative weight function. For a finite data sample, this can be written as

$$\rho_{AD} = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln z_i + \ln\left(1 - z_{n+1-i}\right)\right].$$

Two modified versions of the AD distance, called ADL and AD2L, which give more weight to the left tail, have been tested. They are computed by the following formulas:

$$\rho_{ADL} = -\frac{3n}{2} + 2\sum_{i=1}^{n} z_i - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\ln z_i,$$

$$\rho_{AD2L} = 2\sum_{i=1}^{n}\ln z_i + \frac{1}{n}\sum_{i=1}^{n}\frac{2i-1}{z_i}.$$

Compared with the ρ_AD distance, the ρ_ADL distance gives more weight to the left-tail region. Similarly, ρ_AD2L assigns a larger weight to the left-tail region of the distribution than ρ_ADL.
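The three Anderson-Darling type distances can be evaluated in the same vectorised way. The helper below is illustrative and follows the finite-sample expressions given above (the left-tail forms are the first- and second-order variants commonly used in the goodness-of-fit literature); the clipping of the CDF values away from 0 and 1 is a numerical safeguard, not part of the definitions.

```python
import numpy as np

def ad_distances(data, cdf, eps=1e-12):
    """Anderson-Darling distance and its left-tail variants ADL and AD2L.

    `cdf` is the candidate model's CDF, evaluated at the sorted observations.
    """
    x = np.sort(np.asarray(data))
    n = len(x)
    z = np.clip(cdf(x), eps, 1.0 - eps)          # guard against log(0) and division by 0
    i = np.arange(1, n + 1)
    ad = -n - np.mean((2 * i - 1) * (np.log(z) + np.log(1.0 - z[::-1])))
    adl = -1.5 * n + 2.0 * np.sum(z) - np.mean((2 * i - 1) * np.log(z))
    ad2l = 2.0 * np.sum(np.log(z)) + np.mean((2 * i - 1) / z)
    return ad, adl, ad2l
```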

MOTIVATING PROBLEM
In this section, we illustrate the potential offered by the ABC-NS algorithm for model selection and parameter calibration by means of real data. The advantages of the simulation-based inference method over the popular maximum likelihood (ML) based methods will be highlighted.

Example 1. "endosulfan" data set
In this example, a data set named "endosulfan" [46] is used to illustrate some merits of the ABC method that are very important in risk assessment and reliability analysis, particularly when designing against fatigue. Through this example, we will show that the ABC method is flexible, is robust against model mis-specification (or at least can mitigate its effects through an appropriate choice of a distance function/metric) and can be tailored to the user's interest.
To demonstrate the usefulness of ABC, we estimate lower percentiles in a mis-specified setting and compare the results with a likelihood-based method. It is worth noting that percentiles of the fatigue lifetime distribution are often used as indicators of reliability and structural safety; consequently, precise estimates of some important percentiles are of paramount importance for the designer. The data set used in this example is taken from Ref. [46] and contains acute toxicity values for the organochlorine pesticide endosulfan, tested on Australian and non-Australian laboratory species. A LN or a log-logistic distribution is often fitted in order to characterise the species sensitivity distribution for a pollutant. A low percentile of the fitted distribution, generally the 5% percentile, is then calculated and called the hazardous concentration 5% (HC5). It is interpreted as the value of the pollutant concentration protecting 95% of the species. [47] In addition, the two-parameter Pareto distribution and the three-parameter Burr distribution have been fitted. The specifications of the competing reliability models are given in Appendix A. The four competing models are the same as those considered in Ref. [48].

FIGURE 1 Empirical data and theoretical CDFs comparing the fit of the four distributions with ML estimates (the empirical and estimated 5th quantiles are indicated). CDF, cumulative distribution function; ML, maximum likelihood.

Model selection is first conducted using a likelihood-based approach, by computing two information criteria for comparison purposes. Our analysis incorporates both the Akaike and the Bayesian information criteria. The AIC balances the complexity of the model against its statistical goodness-of-fit by imposing a penalty for increasing the number of parameters in the reliability model. It is defined as

$$\mathrm{AIC} = -2\,\ell(\hat{\theta}) + 2k,$$

where ℓ(θ̂) is the maximised log-likelihood and k is the number of parameters in the model. The preferred model is the one with the lowest value. The BIC is a refinement of the AIC in the sense that it factors the size of the sample data into the penalty imposed for an increased number of parameters; it has been argued to penalise overfitting more effectively than the AIC. It is defined as

$$\mathrm{BIC} = -2\,\ell(\hat{\theta}) + k \ln(n),$$

where ℓ(θ̂) and k are as defined above and n is the sample size. As with the AIC, the preferred model is the one with the lowest value. As can be seen from Table 1, the AIC and BIC criteria favour the Burr distribution or the Pareto distribution; the choice between the two is thus less obvious. Looking at classical penalised criteria based on the log-likelihood thus also seems interesting, especially to discourage over-fitting. Figure 1 shows the data and the fitted cumulative distribution functions (CDFs); the left tail appears to be better fitted by the Burr distribution, whose use could then be considered for estimating the HC5 value as the 5% quantile of the distribution. The LN and log-logistic reliability models appear to be the least likely models for this data set.
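For reference, both criteria follow directly from the maximised log-likelihood. The snippet below is a generic illustration (the candidate distributions and data are placeholders, not the endosulfan set): it fits two scipy distributions by ML and reports their AIC and BIC.

```python
import numpy as np
from scipy import stats

def aic_bic(loglik, k, n):
    # AIC = -2 ln L + 2k ; BIC = -2 ln L + k ln n
    return -2.0 * loglik + 2.0 * k, -2.0 * loglik + k * np.log(n)

data = np.random.default_rng(3).lognormal(mean=1.0, sigma=0.8, size=50)  # placeholder data

for name, dist, k in [("lognormal", stats.lognorm, 2), ("Pareto", stats.pareto, 2)]:
    params = dist.fit(data, floc=0)                   # ML estimates (location fixed at 0)
    loglik = np.sum(dist.logpdf(data, *params))
    aic, bic = aic_bic(loglik, k, len(data))
    print(f"{name}: AIC={aic:.1f}, BIC={bic:.1f}")
```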
The ABC-NS algorithm is now employed, first for parameter calibration. We compare the performance of the Burr distribution (using the ML estimates) with that of the LN distribution calibrated using the distances given in Section 2.2, in a simple mis-specified setting. Because many practitioners prefer simpler models, the LN distribution is selected to fit the data and to predict the HC5 percentile. We now investigate the efficiency of the ABC method in mitigating model mis-specification. We will show that, by coupling a "wrong" model with an appropriate distance function measuring the similarity between the real and simulated data, one may still obtain good predictions; the choice of the distance depends on the stated goal. Figure 2 displays the empirical data and the fitted CDFs using the best model according to the information criteria and the LN model coupled with the different distances. One can clearly see from Figure 2 that the ML estimates cannot provide a good fit to the empirical data. In contrast, the mis-specified model coupled with an appropriate distance function provides a better fit. This example demonstrates that the likelihood-free approach, used with a suitable distance, is robust against model mis-specification.
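A point-estimate counterpart of this idea can be written compactly: fit the LN model by directly minimising the left-tail AD distance of Section 2.2 and read off the 5% quantile as the HC5 estimate. The sketch below uses synthetic placeholder data and a simple Nelder-Mead search; the paper's ABC-NS approach instead samples the full posterior rather than returning a single minimiser.

```python
import numpy as np
from scipy import stats, optimize

def adl(data, cdf, eps=1e-12):
    # Left-tail weighted Anderson-Darling distance (first order), as in Section 2.2.
    x = np.sort(np.asarray(data)); n = len(x)
    z = np.clip(cdf(x), eps, 1.0 - eps)
    i = np.arange(1, n + 1)
    return -1.5 * n + 2.0 * np.sum(z) - np.mean((2 * i - 1) * np.log(z))

def objective(params, data):
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    return adl(data, lambda x: stats.lognorm.cdf(x, s=sigma, scale=np.exp(mu)))

data = np.random.default_rng(4).lognormal(mean=0.0, sigma=1.5, size=40)   # placeholder data
res = optimize.minimize(objective, x0=[np.log(np.median(data)), 1.0],
                        args=(data,), method="Nelder-Mead")
mu_hat, sigma_hat = res.x
hc5 = stats.lognorm.ppf(0.05, s=sigma_hat, scale=np.exp(mu_hat))   # estimated 5% quantile (HC5)
print(f"mu={mu_hat:.3f}, sigma={sigma_hat:.3f}, HC5={hc5:.3g}")
```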
Comparing the 5% percentiles (HC5) calculated from these fits with the one calculated from the ML fit of the Burr distribution, we observe that fitting the LN distribution by minimising the left-tail AD distances of first or second order comes close to the value obtained by fitting the Burr distribution using ML estimates. Table 2 reports the empirical and computed quantiles given by the best model according to the information criteria and by the mis-specified model with the different distances. It shows that the mis-specified model coupled with an appropriate distance (i.e., ρ_ADL or ρ_AD2L) yields predictions that are even better than those of the model initially identified as the most likely (the Burr distribution). For this data set, the ρ_CM and ρ_AD distances give poor predictions. In conclusion, the proposed likelihood-free method is attractive when precise predictions of certain percentiles are required, which is a main interest in fatigue.

Example 1. 16Mn steel welded specimens
In this first example, the test data of two kinds of 16Mn steel welded specimens, tested at the Zhengzhou Mechanical Institute, [49-51] are used. The dimensions of the two kinds of specimens, machined under CO2 gas-shielded welding procedures, are given in Figure 3. The tests were carried out under four-point bending with sinusoidal loading and a stress ratio of 0.1. The test arrangements, including the specimen joint pattern, the group number, the number of specimens and the corresponding stress amplitude level Δσ, are given in Table 3.
Cycles to failure is defined as the number of cycles at which the main crack grows to half the specimen thickness. At this point, the deformation of the specimens is too large to continue normal testing, and for the safety of the structure this definition is necessary to avoid catastrophic accidents. The test results for the considered specimens are given in Table 4 (for illustrative purposes, only the welded specimens shown in Figure 3A are considered). For the data collected at each stress range, one aims to identify a suitable reliability model from a number of competing models. Five competing models widely used in fatigue have been considered: the LN, the two-parameter Weibull (W2P), the three-parameter Weibull (W3P), the extreme maximum value (EMV) and the Birnbaum-Saunders (BS) distributions. The probability density function (PDF) and the CDF of the candidate models are given in Appendix B.

The ABC-NS algorithm using the different distance functions is now employed to discriminate between the candidate reliability models. Equal prior probabilities are assigned to the competing models and the same stopping criterion is used as in the previous example. Figure 4 shows the distribution of the particles over the populations for the data obtained under Δσ = 320 MPa. One can clearly see that the algorithm selects the W3P model as the most likely model for the available fatigue lives. The results are shown using the ρ_ADL distance for illustrative purposes. The algorithm rules out the least likely models over the populations and retains the model(s) that perform better. Figure 5 shows, for the same stress level, the evolution of the model posterior probabilities over the populations using the different distances. The W3P model is selected as the most plausible model when minimising the ρ_CM, ρ_AD and ρ_ADL distances, while with the ρ_AD2L distance the algorithm converges to the EMV and W3P models with posterior probabilities of 0.645 and 0.355, respectively. Next, the ABC-NS algorithm is used to infer the competing models separately in order to compare the fitting quality. Table 5 shows the optimal estimates for each model using the different distance functions. Figure 6 displays the failure times and the fitted CDFs; one can see that the W3P model outperforms the other models for this data set. The W3P model provides the best fitting quality and has the ability to accurately describe the lower left tail.

FIGURE 6 A comparison between the fitted CDFs using the ρ_ADL distance. CDF, cumulative distribution function.

Finally, it should be noted that the ABC-NS algorithm converges for all the models despite the small sample size. For the selected model, the shape parameter is less than 1 for all the distances, which means that the hazard function decreases as the number of cycles increases; this is not coherent with the physics of fatigue. It has been observed that, for all the considered sample sizes and all the distance functions, the ABC-NS algorithm works very well and never displays any convergence problems.
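To see why a shape parameter below one is at odds with fatigue physics, recall the hazard (failure rate) function of a three-parameter Weibull model; with generic notation (shape k, scale λ, location γ, not necessarily the symbols of Appendix B), it reads

$$h(N) = \frac{f(N)}{1 - F(N)} = \frac{k}{\lambda}\left(\frac{N - \gamma}{\lambda}\right)^{k-1}, \qquad N > \gamma,$$

so for k < 1 the hazard decreases monotonically with the number of cycles N, whereas progressive damage accumulation would normally call for a non-decreasing failure rate.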
To make a comparison, we employ the ABC-NS algorithm with the other distances and estimate the posterior model probabilities for all the tested specimens. Table 6 shows the model posterior probabilities at the last population using the different distances. One can clearly see that the W3P and EMV models perform equally well, whereas the W2P model performs poorly: it is ranked last for all the metrics. Table 6 also shows the model ranking for the other stress ranges; none of the models is always supported by the data. The selected model varies with the distance function and with the collected data. For instance, for specimen 3 the best model is the W2P, while for specimen 4 the best model is the EMV.

Fit effects in the left-tail region
After assessing how well each distribution fits the overall data sets, the focus now is on the left-tail region of the distribution, since this is the region of importance for engineering design and planning applications. In order to describe the fit effects in the tail region, the error parameters e_F1 and e_F2 between the observed and theoretical values of two life data points (the so-called fit-differences in the left-tail region) [52-54] are defined with respect to N_1 and N_2, which are, respectively, the minimum life and sub-minimum life of a group of n data. The smaller the |e_F| value, the better the fit in the left-tail region. When e_F < 0, the evaluation is conservative; in contrast, when e_F > 0, the evaluation is non-conservative. In addition, by comparing e_F1 with e_F2, the trend of the evaluation can be determined: when e_F1 < e_F2, the evaluation for fatigue lives below N_1 may be conservative, whereas when e_F1 > e_F2 it may be non-conservative. The calculated values of e_F1 and e_F2 for the crack lengths given by the competing models are listed in Table 7.
• The W3P model provides the best fit in the left-tail region, followed by the EMV model. The BS and LN models are quite similar, while the W2P model provides a relatively poor fit.
• The predictions in the left-tail region of the W3P model show the same trend, e_F1 < e_F2, except for Δσ = 260 MPa.
• The predictions of the LN and EMV models show the same trend, e_F1 < e_F2, for all the stress ranges. This means that the evaluation, for fatigue lives below N_1, may be conservative for all the crack lengths.
• The results show that the EMV, LN and BS models give conservative evaluations in the left-tail region.
• Overall, a better fitting quality in the left-tail region is observed using the ρ_AD, ρ_ADL and ρ_AD2L distances compared with the ρ_CM distance, which is expected.
Analysing all the data, one can see from the total fit effects that the W3P model is the best. However, its shape parameter is less than 1 for all the crack lengths, which results in failure rates that decrease with fatigue cycling; this is not coherent with fatigue physics. Moreover, the W3P model may give non-conservative evaluations in the left-tail region for some stress ranges, as can be seen in Table 7. The EMV model may be an appropriate model to describe fatigue life from both the safety and the fatigue-physics points of view; in addition, we have seen that it gives conservative evaluations in the left-tail region. The BS and LN distributions may also be reasonable choices owing to their good total fit effects, although their hazard functions decrease with fatigue cycling at large numbers of cycles.

Example 2. Fatigue tests and reliability analysis
In the second illustration, we consider the fatigue crack growth data available in the literature. [55] The experimental data set consists of 68 crack propagation trajectories obtained from a series of fatigue tests performed on identical centre-cracked 2024-T3 aluminium plates (the specimen geometry is illustrated in Figure 7). The tested specimens have the following dimensions: length L = 558.8 mm, width w = 152.4 mm and thickness B = 2.54 mm, with a crack of initial size 2a = 18 mm located at the centre of the specimen. The tests were performed under load control using a sinusoidal input at 20 Hz, producing a constant stress range of Δσ = 48.28 MPa with a stress ratio equal to 0.2 (see Figure 7). The accumulated number of cycles was recorded at specified crack lengths between 9 and 49.8 mm. The 68 experimental trajectories are shown in Figure 8.

FIGURE 7 (Left) Test piece geometry with a central crack (taken from Ref. [56]); (right) harmonic loading.

FIGURE 8 Experimental fatigue crack trajectories.
In this study, one aims to select the most suitable reliability model to describe the number of cycles needed to reach a pre-defined critical crack length, in order to estimate the probability of failure at a target number of cycles T_s. Let us consider a structure subject to a main degradation process (Y_t, t ∈ ℝ+) which increases randomly on ℝ*+ = (0, ∞) until it reaches a critical value a_cr > 0, meaning the failure of the structure. [57] It is assumed that Y_t is an observable process whose sample paths are obtained from experimental feedback indicating the service time for the crack to grow from a_0 to a_cr. The reliability of such a system may be expressed by

$$R(T_s) = P\left(Y_t < a_{cr},\ \forall\, t \in [0, T_s]\right),$$

that is, the probability that the process Y_t does not reach the failure boundary over the whole observation interval [0, T_s]. The reliability analysis thus consists in estimating the probability of failure P_f(T_s) = 1 − R(T_s). Annis [58] has shown that first- and second-order reliability methods (FORM/SORM) are inappropriate for estimating the probability of failure in such a problem. Bourinet [59] fixed this issue by introducing an extra random variable representing a model error. In this study, we present a simple and straightforward method to compute the probability of failure, based on an appropriate selection of the form of the probability distribution describing the cycles-to-failure of the tested specimens. To better illustrate the purpose of this study, Figure 9 shows the probability of failure to be estimated. The target service time T_s is selected as the average of the 7th and 8th lowest numbers of cycles of the Virkler tests, that is, T_s = 237,543 cycles (see Table 8). Based on the empirical cumulative distribution function, the probability of failure should lie between 0.0980 and 0.1126.

The ABC-NS algorithm is now employed, using the ρ_AD distance for illustrative purposes, to discriminate between the candidate models with the same control parameters as in Example 1. Our goal here is to select simultaneously the reliability model and the suitable distance so as to precisely estimate the probability of failure after a specified service time T_s. The posterior model probabilities over some intermediate populations are shown in Figure 10; the algorithm converges to the EMV model. To gain more insight, we infer the competing models so as to see the impact of the selected distance on model selection. Figure 11 shows the posterior probabilities for the four distances using the same stopping criterion. One can clearly see a substantial advantage for the EMV model, except for the ρ_AD2L distance, where the LN model is supported by the data. Table 8 shows the estimated probabilities for all the distances. One can clearly see that the EMV and LN models perform well in terms of probability-of-failure prediction. It is worth mentioning that, except for the ρ_CM distance, all the models selected with the ρ_AD, ρ_ADL and ρ_AD2L distances provide an estimate of the probability of failure inside the empirical bounds. This means that, by selecting a plausible model and an appropriate distance function, it is possible to better capture the pattern of small extreme values and, more importantly, the data in the lower tail.
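To make the failure-probability computation explicit, the sketch below evaluates P_f(T_s) = F(T_s | θ̂) for a Gumbel-type fit (used here as a stand-in for the EMV model; see Appendix B for the exact form adopted in the paper) and compares it with simple empirical bounds obtained from a plotting position. Both the data and the median-rank plotting position (i − 0.3)/(n + 0.4) are illustrative assumptions, not the Virkler values used in the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Placeholder cycles-to-failure sample at the critical crack length (not the Virkler data).
cycles = rng.gumbel(loc=2.6e5, scale=2.0e4, size=68)

# Target service time: average of the 7th and 8th lowest failure times.
Ts = np.sort(cycles)[6:8].mean()

# Empirical bounds on Pf(Ts) from the bracketing order statistics
# (median-rank plotting position, an illustrative choice).
n = len(cycles)
lower, upper = (7 - 0.3) / (n + 0.4), (8 - 0.3) / (n + 0.4)

# Model-based estimate: Pf(Ts) = F(Ts | theta_hat) for a fitted Gumbel (EMV-like) model.
loc, scale = stats.gumbel_r.fit(cycles)
pf = stats.gumbel_r.cdf(Ts, loc=loc, scale=scale)
print(f"empirical bounds [{lower:.4f}, {upper:.4f}], model estimate Pf = {pf:.4f}")
```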
Figure 12 shows the empirical data and the fitted CDFs using the selected model for each distance function; overall, the fitting quality is good. Let us now investigate the influence of the number of available times to failure on the accuracy of the probability of failure for a target service time. The main goal is to assess the robustness of the different distances in estimating the probability of failure. To do so, we drop between 5 and 30 of the available data points, in steps of 5, and estimate the probability of failure; to obtain representative results, the simulations are replicated 500 times. From Figure 13, one can see that, overall, the ρ_AD and ρ_ADL distances give relatively accurate predictions. Based on the obtained results, it is clear that the ρ_AD2L distance is more conservative, as it overestimates the probability of failure, while the ρ_CM distance provides non-conservative estimates, as it tends towards the lower bound. Although the estimated probability lies outside the empirical bounds, the ρ_AD2L distance may be of interest if conservative estimates are preferred in the practical context of reliability evaluation.

FIGURE 13 Boxplots of the probabilities of failure estimated using the selected models with different distances and 500 replications.

CONCLUDING REMARKS
In this paper, a simulation-based inference algorithm called ABC-NS was adopted to discriminate between a number of competing models widely used in fatigue. The ABC-NS algorithm ranks the competing models according to their plausibility in the light of a calibration data set. We have illustrated the proposed methodology on various data sets with small to moderate sample sizes. Overall, the method works well without any numerical problems despite the limited data. Several distances have been considered, each with specific properties; by identifying the purpose of model selection, modellers and experimenters can decide which distance should be preferred. Through different simulation studies we have shown the good performance of the method, which requires minimal tuning. The obtained results show that no model is always supported by the data; a straightforward method to discriminate simultaneously between a number of candidate models is therefore very useful. Additionally, the optimal model may vary with the distance function used to measure the discrepancy between the observed and simulated data. In summary, the simulation-based method presents some important advantages compared with classical methods and could be useful for practitioners. Some of the merits of the simulation-based inference method are given below:

• The competing models are compared simultaneously and the least likely models are eliminated during the simulation.
• The proposed method is flexible in the sense that the modellers can use different distances.
• The principle of parsimony is well embedded in the ABC-NS algorithm (it tends to select simpler models first).
• The proposed methodology is easy to implement and intuitive, and can address parameter calibration and model selection simultaneously.
• The algorithm can easily be tailored to the user's interest, which is embedded in the distance function.
• The method is very useful when the main concern is the prediction of low percentiles of fatigue life.
• It has been shown that the ABC algorithm coupled with a suitable distance is less affected by model mis-specification.

DATA AVAILABILITY STATEMENT
The data and the Matlab codes that support the findings of this study are available from the corresponding author.

FIGURE 3 Schematic diagrams of the two kinds of 16Mn steel welded specimens.
TABLE 3 Details of the test arrangements.
FIGURE 4 Particle repartition over the populations using the ρ_ADL distance.
FIGURE 9 Schematic diagram of the probability of failure to be estimated and the random failure-time distribution for a target service time T_s.
FIGURE 10 Model posterior probabilities over some intermediate populations using the ρ_AD distance.
TABLE 8 Estimation of the probability of failure for the different distances, for a_cr = 49.8 mm and T_s = 237,543 cycles.
FIGURE 11 Evolution of the model posterior probabilities over the populations for the different distances.
FIGURE 12 Failure times to reach the critical crack size and the fitted CDFs using the different distance functions. CDF, cumulative distribution function.
TABLE 1 AIC and BIC scores for the different competing models. Abbreviations: AIC, Akaike information criterion; BIC, Bayesian information criterion.
FIGURE 2 Empirical data and theoretical CDFs using different models and different distance functions. CDF, cumulative distribution function.
TABLE 2 Empirical and predicted percentiles using a mis-specified reliability model coupled with different distances.
TABLE 4 Fatigue life data of the 16Mn steel welded plate specimens.
TABLE 6 Posterior probabilities and the corresponding model ranking (in parentheses) for specimens subjected to different stress levels.
TABLE 7 The fitting parameters e_F1 and e_F2 for the considered cracks using the ABC-NS with the different distance functions.