Prediction intervals for load‐sharing systems in accelerated life testing

Based on accelerated lifetime experiments, we consider the problem of constructing prediction intervals for the time point at which a given number of components of a load‐sharing system fails. Our research is motivated by lab experiments with prestressed concrete beams where the tension wires fail successively. Due to an audible noise when breaking, the time points of failure could be determined exactly by acoustic measurements. Under the assumption of equal load sharing between the tension wires, we present a model for the failure times based on a birth process. We provide a model check based on a Q‐Q plot including a simulated simultaneous confidence band and four simulation‐free prediction methods. Three of the prediction methods are given by confidence sets where two of them are based on classical tests and the third is based on a new outlier‐robust test using sign depth. The fourth method uses the implicit function theorem and the δ‐method to get prediction intervals without confidence sets for the unknown parameter. We compare these methods by a leave‐one‐out analysis of the data on prestressed concrete beams. Moreover, a simulation study is performed to discuss advantages and drawbacks of the individual methods.


INTRODUCTION
We consider J load-sharing systems that are exposed to different initial stress conditions s_1, … , s_J. Each system consists of I components, and the external load of the system is equally distributed over all working components. In particular, the stress on a working component increases when other components have failed. This is also known as load redistribution with an equal load-share rule. A system fails if a critical number I_c ≤ I of its components fails. We assume that the time points of component failures are observed up to the I_j th failure, with I_j ≤ I, for j = 1, … , J. Our aim is to provide prediction intervals for the time point at which I_c components have failed in a new system exposed to a specific initial stress s_0. The initial stress of the new system, for which the prediction should be done, can be much lower than the initial stress of the observed systems, that is, s_0 < min{s_1, … , s_J}. Hence, we consider the problem of accelerated lifetime experiments.
Our research is motivated by experiments with prestressed concrete beams with I = 35 tension wires that are exposed to cyclic loading. In this case, the stress on the beam is periodically increased/decreased between two fixed values. The difference between these two values is called the stress range and takes the role of the load/stress s_j in our model. Other examples 1 of equal load-sharing systems include systems of electrical components or batteries in a parallel arrangement.
Our experimental setting differs from the situation of systems of I independent components often considered in reliability analysis and usually called k-out-of-n systems, with k = I_c and n = I in our notation. In such situations, the lifetimes of the components can be assumed to be independent and identically distributed. For several lifetime distributions, prediction intervals for the time of the I_c th failure are constructed when I_0 failures of the I components have been observed, with I_0 < I_c ≤ I. This is usually done by predicting the I_c th order statistic based on the I_0 th order statistic. [2][3][4] Nordman and Meeker 5 also construct a prediction interval for the additional number of failures among the I components at time t_w based on the number of failures at time t < t_w.
To our knowledge, prediction intervals for load-sharing systems have not been derived so far. Even load-sharing systems exposed to different initial load conditions have not been treated, although several results for identically distributed load-sharing systems have been published recently. Most of these results are based on sequential order statistics. [6][7][8][9][10][11] Several approaches assume an exponential or Weibull distribution [12][13][14][15][16][17] or other parametric distributions 18 for the lifetimes of the components or the times between successive failures. Semiparametric approaches are given by Kvam and Peña 1 and Deshpande et al. 19 Kong and Ye use a cumulative exposure model 20 as employed in step-stress accelerated life testing and provide a method 21 for testing the exponential distribution of the times between successive failures. Confidence sets for model parameters have been derived for some special parametric models. 12,15,22 Xu et al 16 propose a test for the mean time to failure in a load-sharing model with Weibull distribution.
In most of these approaches, I or I_c parameters must be estimated. This is only possible if the number I of components and the number I_c of critical failures are not too large and several independent and identical systems under the same conditions can be observed. However, here we consider systems with more than 30 components, and the systems are observed under different stress conditions. To obtain reliable prediction intervals for the time at which a critical number I_c of components has failed, it is important to use a model with few unknown parameters to avoid overfitting.
There are some simple models with few parameters for identically distributed load-sharing systems based on Weibull distribution 13 and exponential distribution. 15 To reduce complexity, we assume an exponential distribution for the times between failures, but with the difference that we incorporate the different initial stress conditions by a link function. This leads to nonlinear birth processes.
We present different methods for simulation-free prediction intervals for the time until the I c th failure (event) in a new system under initial stress s 0 when other systems exposed to different initial stress levels have been observed before. We also include the possibility that some failures of the new system have already been detected.
Our approach is based on the fact that exponentially distributed interarrival times lead to failure times with a hypoexponential distribution, whose distribution function is known explicitly. However, a standard implementation of this distribution function can become numerically unstable, which can cause the implemented function to be nonmonotone and to take values above 1. Fortunately, there is an efficient and more stable algorithm for calculating the hypoexponential distribution function. 23 Hence, the quantiles of the failure time distribution can be computed efficiently, and only the uncertainty of the unknown model parameters needs to be incorporated.
A simple possibility to include this uncertainty is to use classical confidence sets for the parameters of generalized linear models based on, for example, the Wald or the likelihood-ratio test. 24, p. 409 Here, we propose two alternatives. One alternative is based on the sign depth for obtaining outlier-robust confidence sets. 25,26 Since all methods based on confidence sets for the parameter vector have the shortcoming that they require a grid search or another form of candidate generation to evaluate the respective test, we also use the δ-method to construct confidence intervals for the quantiles of the predictive distribution as a second alternative.
The paper is organized as follows. Section 2 introduces the model including the prediction problem and provides a model check based on a Q-Q plot with a simulated simultaneous confidence band. Section 3 deals with the derivation of confidence sets. Section 3.1 summarizes classical confidence sets for the whole vector of model parameters and for one-dimensional aspects of the model parameters such as the quantiles of the predictive distribution. In particular, the δ-method for asymptotic distributions is used to obtain confidence intervals for one-dimensional aspects of the model parameters. Completely new outlier-robust confidence sets for the whole model parameter are derived in Section 3.2. Section 4 contains the derivation of prediction intervals starting with the predictive distribution in Section 4.1. Numerical problems when calculating the quantiles of this so-called hypoexponential distribution are also discussed there. Prediction intervals based on confidence sets for the whole model parameter are given in Section 4.3, whereas predictions based on confidence intervals for quantiles of the predictive distribution can be found in Section 4.4. In Section 5, we apply the model check and the different prediction methods to the data of experiments with prestressed concrete beams and compare the performance of the prediction methods by cross-validation. A comparison via simulation is given in Section 6. We summarize and discuss our results in Section 7. Appendix A.1 provides the mathematical foundations for obtaining the classical confidence sets, Appendix A.2 contains all proofs, and Appendix A.3 gives details about the R implementation.

THE LOAD-SHARING MODEL AND THE PREDICTION PROBLEM
In this section, we provide the load-sharing model (Section 2.1), the prediction problem in the load-sharing model (Section 2.2), and a model check of the load-sharing model (Section 2.3).

The load-sharing model
We assume that each system j = 0, 1, … , J has I components and that the failure times of the components of system j are given by a point process 0 = T_{0,j} < T_{1,j} < T_{2,j} < … < T_{I_j,j} with I_j ≤ I, where I_j is the number of observed failures in the jth system. It can be assumed that all these time points are different although, in reality, several failures may be observed at the same time since the time is measured on a discrete scale. Let W_{i,j} = T_{i,j} − T_{i−1,j}, i ∈ N, be the waiting times (interarrival times) between the (i − 1)th and ith failures (events) and let N_t = Σ_{i=1}^{I_j} 1_{[0,t]}(T_{i,j}), t ≥ 0, be the corresponding counting process of the failures, where 1_{[0,t]}(x) denotes the indicator function which equals one if x ∈ [0, t] and zero otherwise. Here, we assume that only the number of past failures (events) N_{t−} = lim_{τ↑t} N_τ influences the intensity of appearance of a failure at time t. Hence, (N_t)_{t≥0} is a state-dependent point process, that is, a birth process. 27, pp. 95-96; 28, p. 21 In particular, we regard nonlinear birth processes given by the intensity function λ(t) = h(I/(I − N_{t−})). This implies that the waiting times W_{i+1,j}, i = 0, 1, … , I_j − 1, are independent and have an exponential distribution with parameter λ_i := h(I/(I − i)). Since we aim to model several independent systems under different initial stress levels, we incorporate the initial stress level s_j by λ(i, s_j) := h_θ(s_j · I/(I − i)), where h_θ is a given link function between stress and failure time and θ ∈ Θ ⊂ R^p is an unknown parameter. Hence, the waiting times are independent and exponentially distributed with W_{i,j} ∼ Exp(λ(i − 1, s_j)). An example for the function h_θ is h_θ(x) = exp(−θ_1 + θ_2 · ln(x)), so that the logarithm of the expected waiting time, ln E(W_{i,j}) = θ_1 − θ_2 · ln(s_j · I/(I − i + 1)), is strictly decreasing in s_j and i. It is the often used linear link function proposed by Basquin. 29 However, other linear and nonlinear link functions are possible.
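Under these assumptions, the waiting times of a single system are straightforward to simulate. The following sketch uses the Basquin link with hypothetical parameter values θ_1 = 20, θ_2 = 3 and stress s = 100 (all chosen for illustration only, not fitted to any data):

```python
import numpy as np

rng = np.random.default_rng(1)

def h(x, theta1, theta2):
    # Basquin-type link: h_theta(x) = exp(-theta1 + theta2 * ln(x))
    return np.exp(-theta1 + theta2 * np.log(x))

def simulate_system(I, n_failures, s, theta1, theta2, rng):
    """Waiting times W_1, ..., W_n of one load-sharing system under stress s:
    W_i ~ Exp(lambda(i-1, s)) with lambda(i, s) = h_theta(s * I / (I - i))."""
    rates = h(s * I / (I - np.arange(n_failures)), theta1, theta2)
    return rng.exponential(1.0 / rates)

# first 10 waiting times of a system with I = 35 components
w = simulate_system(I=35, n_failures=10, s=100.0, theta1=20.0, theta2=3.0, rng=rng)
```

Note how the rates increase with each failure: the remaining components carry a larger share of the load, so later waiting times are stochastically shorter.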
Our model is different from the linear birth process with intensity λ_i = i · λ for fatigue accumulation mentioned by Sobczyk and Spencer. 30, p. 112 However, for load redistribution, our model is very reasonable since the failure of a fraction q of the I components increases the individual load on each of the remaining (1 − q) · I components by a factor of 1/(1 − q). Hence, the load increases nonlinearly. Note that this load-sharing rule has been used before 12,31 but without the link function h_θ. The Basquin link was also used by Kong and Ye 15 for load-sharing models based on exponentially distributed lifetimes of the components. We are aware that our model does not include damage accumulation and aging. These can only be captured by more general self-exciting point processes whose intensity functions are not constant between failures, see Section 7. However, this would make the derivation of simulation-free prediction intervals difficult. In particular, only exponential waiting times yield a Markov process. 32, Theorem 8.2.9 Moreover, we have fairly small datasets in mind (see Section 5), which always bears the risk of overfitting if models with too many parameters are used. Hence, we keep the model as simple as possible. However, we provide a model check in Section 2.3. The results for the real data set in Section 5 indicate that this model is realistic enough to provide good prediction intervals.

The prediction problem
The aim of this article is to provide prediction intervals for the time at which a critical number I c of the I components has failed. Depending on the application, I c could be I but a smaller value is possible as well, for example, if the stability of the system cannot be guaranteed after a certain number of component failures. Moreover, the number I j of observed component failures in experiment j is not restricted to be exactly I c . A smaller value I j < I c usually appears if the corresponding experiment lasts too long and thus is stopped due to time/cost limitations. Larger values I j > I c are possible if I c is only the minimum number of failures for which some safety condition of the system is not ensured.
The predictions are based on observations from previous experiments as well as potentially a certain number of component failures in the current system for which a prediction is made. More precisely, we observe realizations w_{i,j} of the random variables W_{i,j}, i = 1, … , I_j, j = 1, … , J, from J different systems and possibly additional realizations w_{i,0} of the random variables W_{i,0}, i = 1, … , I_0, from a new system. For the new system, I_0 is smaller than I_c and is set to 0 if no failure of the new process was observed. In particular, w_{1,0}, … , w_{I_0,0} are already realizations of the random variables W_{1,0}, … , W_{I_0,0}, while the random variables W_{I_0+1,0}, … , W_{I_c,0} will be realized in the future. The aim is to predict T_{I_c,0} = W_{1,0} + … + W_{I_c,0}, (2) that is, the time up to the failure of the I_c th component of the new system. Note that it is sufficient to predict W_{I_0+1,0} + … + W_{I_c,0} since this immediately yields a prediction interval for (2). To quantify the uncertainty of a point prediction, we derive prediction intervals in Section 4 based on confidence sets.

A model check
Although there are many tests for checking the exponential distribution for i.i.d. variables, 33 the problem of checking the exponential distribution for variables that are not identically distributed is challenging. Kong and Ye 21 provide a goodness-of-fit test for their load-sharing model with exponentially distributed lifetimes. Since any goodness-of-fit test has the disadvantage of a 0-1 decision of rejecting or not rejecting the null hypothesis, we present here a model check via a Q-Q plot including a simulated simultaneous 90%-confidence band.
Since the waiting times are not identically distributed, this requires some kind of standardization first: The sample quantiles in the Q-Q plot are taken from the rescaled waiting times Ŵ_{i,j} := λ̂(i − 1, s_j) · W_{i,j}, (3) where λ̂(i − 1, s_j) is obtained by plugging the maximum likelihood estimator θ̂ for θ into the rates of the Basquin model, that is, λ(i, s) = exp(−θ_1 + θ_2 · ln(s · I/(I − i))). (4) Note that if W_{i,j} is distributed according to our model and the true model parameter is θ, then λ(i − 1, s_j) · W_{i,j} ∼ Exp(1). Hence, Ŵ_{i,j} should have a distribution close to Exp(1) if θ̂ is a consistent estimator for θ.
We will discuss our approach for deriving a simultaneous confidence band in the following more general framework. Consider independent random variables W_1(θ), … , W_N(θ) from the parametric model W_n(θ) ∼ Exp(λ_n(θ)), n = 1, … , N. Suppose that the rates satisfy condition (5) for all θ, θ̃ ∈ Θ and some functions h_n : R^p → R. Note that condition (5) is satisfied for the Basquin model given by (4). We first observe that rescaling with the maximum likelihood estimate (MLE) of θ leads to a distribution that does not depend on the model parameter anymore, hence allowing us to fix θ to any value in the subsequent analysis. Note that this was already observed for i.i.d. exponentially distributed random variables, 34 in which case the MLE is given by the inverse of the sample mean.
Lemma 1. If λ_n(θ) satisfies (5), then the distribution of (Ŵ_1(θ), … , Ŵ_N(θ)) does not depend on θ.
Computing the confidence band. The confidence band is based on an adaptation of the Kolmogorov-Smirnov test to the rescaled random variables in (3). To this end, fix an arbitrary parameter θ (the chosen parameter is irrelevant according to Lemma 1). Let M be the number of simulations used to determine the confidence band, that is, consider random variables X_{n,m} ∼ Exp(λ_n(θ)), n = 1, … , N, m = 1, … , M, rescale each sample as in (3), and let F_m denote the empirical distribution function of the mth rescaled sample. Let F(x) = 1 − e^{−x}, x ≥ 0, denote the distribution function of Exp(1) and q_n the n/(N + 1)-quantile of this distribution. The difference x ↦ F_m(x) − F(x) is a monotone decreasing function between two consecutive jumps of F_m (note that this difference might change from positive to negative and thus the absolute difference x ↦ |F_m(x) − F(x)| is not a monotone function). Hence, the absolute value of this difference attains its global maximum at the beginning or end of a jump, that is, if a < b are two consecutive jump points of F_m, then sup_{x ∈ [a,b)} |F_m(x) − F(x)| = max{|F_m(a) − F(a)|, lim_{x↑b} |F_m(x) − F(x)|}, where lim_{x↑b} denotes the limit to b from below. Since jumps of F_m occur only at the N rescaled observations, ||F_m − F||_∞ can be computed from finitely many evaluations. Finally, let c_{0.90} be the empirical 90%-quantile of ||F_1 − F||_∞, … , ||F_M − F||_∞. Then, the simulated simultaneous 90%-confidence band for the Q-Q plot is the smallest band containing all the points (q_n, x) with |F(x) − n/(N + 1)| ≤ c_{0.90}. More formally, the upper bound of the confidence band is a linear interpolation of the points (q_n, F^{−1}(min{n/(N + 1) + c_{0.90}, 1})), n = 1, … , N, with the convention F^{−1}(1) := ∞. The lower bound of the band is obtained analogously by interpolating the points (q_n, F^{−1}(max{n/(N + 1) − c_{0.90}, 0})).
Remark 1. Note that X_{1,m}, … , X_{N,m} are not identically distributed (after rescaling, their distributions only converge to Exp(1)). We therefore determine the quantile c_{0.90} by simulation rather than taking the critical value of the Kolmogorov-Smirnov test for the Exp(1) distribution.
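The Monte Carlo computation of c_{0.90} can be sketched as follows. This is a simplified, self-contained illustration with placeholder rates λ_n: it rescales by the true rates rather than refitting the MLE for every simulated sample as the paper's procedure requires, which makes the rescaled observations exactly Exp(1) here and ignores the estimation effect discussed in Remark 1:

```python
import numpy as np

rng = np.random.default_rng(2)

N, M = 50, 1000
rates = np.linspace(0.5, 5.0, N)          # placeholder rates lambda_n(theta)

F = lambda x: 1.0 - np.exp(-x)            # Exp(1) cdf
u = np.arange(1, N + 1) / (N + 1)         # levels n/(N+1) for the band

sup_dist = np.empty(M)
for m in range(M):
    # simulate Exp(rates) and rescale; the rescaled sample is Exp(1) here
    x = np.sort(rates * rng.exponential(1.0 / rates, size=N))
    # |F_m - F| is maximal at the jump points of the empirical cdf F_m
    ecdf_hi = np.arange(1, N + 1) / N     # F_m at each jump
    ecdf_lo = np.arange(0, N) / N         # left limit of F_m at each jump
    Fx = F(x)
    sup_dist[m] = max(np.abs(ecdf_hi - Fx).max(), np.abs(ecdf_lo - Fx).max())

c90 = np.quantile(sup_dist, 0.90)
# band limits at the quantile levels, F^{-1}(v) = -ln(1 - v), clipped to [0, inf)
upper = -np.log(np.clip(1.0 - (u + c90), 1e-12, 1.0))
lower = -np.log(np.clip(1.0 - (u - c90), 1e-12, 1.0))
```

The arrays `upper` and `lower` give the band heights plotted against the theoretical quantiles q_n = −ln(1 − n/(N + 1)).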

CONFIDENCE SETS
Most of the prediction intervals in Section 4 are based on confidence sets for θ. We will therefore discuss two classical approaches as well as a new outlier-robust approach to derive such confidence sets in this section. Another prediction approach in Section 4.4 is based on confidence intervals for the quantiles of the predictive distribution. Hence, we will also provide a confidence interval for a one-dimensional aspect g(θ) of the parameter. As an overview, the following four confidence sets/intervals are provided: (a) A classical confidence set for θ based on the Wald test (6) in Section 3.1.
(b) A classical confidence set for θ based on the likelihood-ratio test (8) in Section 3.1.
(c) A classical confidence interval for a one-dimensional aspect g(θ) of the parameter based on the δ-method (9) in Section 3.1.
(d) A new outlier-robust confidence set for θ based on sign depth (10) in Section 3.2.
Note that the sets in (a) to (c) are affected quite a lot by outliers in the data, that is, by observations that are atypically small or large. Method (d) provides an alternative that is less sensitive to such outliers but has the disadvantage of yielding a larger confidence set, in particular for small sample sizes. Hence, an appropriate confidence set from (a) to (d) should be chosen based on whether outliers are likely to occur in the application.

Classical confidence sets for the whole parameter and for functions of the parameter
Since we assume that the waiting times (interarrival times) W_{i,j} are independent for i = 1, … , I_j and j = 0, 1, … , J and have an exponential distribution with parameter λ(i − 1, s_j), we can easily derive a maximum likelihood estimator θ̂ for θ by θ̂ = arg max_{θ ∈ Θ} Σ_{j=0}^{J} Σ_{i=1}^{I_j} ln f_{λ(i−1,s_j)}(w_{i,j}), where f_λ(w) denotes the density of the exponential distribution with parameter λ and w_{i,j} is the realization of W_{i,j} for all i, j. A classical (1 − α)-confidence set for the whole parameter vector θ based on this maximum likelihood estimator θ̂ is given by
{θ ∈ Θ; (θ̂ − θ)^⊤ I(θ̂) (θ̂ − θ) ≤ χ²_{p,1−α}}, (6)
where I(θ̂) is the information matrix, χ²_{p,1−α} denotes the (1 − α)-quantile of the χ²-distribution with p degrees of freedom, and p is the dimension of the model parameter, that is, θ ∈ R^p. The related test is widely known as the Wald test. 37, p. 349 Another classical (1 − α)-confidence set for θ based on the likelihood-ratio test 35, pp. 459-461 is given by
{θ ∈ Θ; 2 [l(θ̂) − l(θ)] ≤ χ²_{p,1−α}}, (8)
where l denotes the log-likelihood function. This is sometimes easier to calculate than the information-based confidence set given by (6) since no information matrix is needed. However, since the confidence set (6) is an ellipsoid, the computation of the set (8) usually takes more time than the computation of the set (6).
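The MLE and the Wald set (6) can be sketched for the Basquin model as follows. The data layout, parameter values, and starting point below are hypothetical; for the log-linear rate λ = exp(−θ_1 + θ_2 ln x), the expected information has the closed form I(θ) = Σ g g^⊤ with g = ∂ ln λ/∂θ = (−1, ln x)^⊤, which we use instead of a numerical Hessian:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

# Hypothetical data: one entry per waiting time, with the adjusted stress
# x = s_j * I / (I - (i-1)) and simulated waiting time w ~ Exp(lambda(x)).
rng = np.random.default_rng(3)
x = np.repeat(np.array([100.0, 150.0, 200.0]), 30) * 35 / (35 - np.tile(np.arange(30), 3))
theta_true = np.array([20.0, 3.0])
w = rng.exponential(1.0 / np.exp(-theta_true[0] + theta_true[1] * np.log(x)))

def neg_loglik(theta):
    log_lam = -theta[0] + theta[1] * np.log(x)
    return -np.sum(log_lam - np.exp(log_lam) * w)

def neg_loglik_grad(theta):
    log_lam = -theta[0] + theta[1] * np.log(x)
    resid = 1.0 - np.exp(log_lam) * w          # E[resid] = 0 at the true theta
    return np.array([np.sum(resid), -np.sum(np.log(x) * resid)])

# MLE from a rough initial guess
theta_hat = minimize(neg_loglik, x0=np.array([15.0, 2.5]),
                     jac=neg_loglik_grad, method="BFGS").x

# expected Fisher information I(theta) = sum over observations of g g^T
G = np.column_stack([-np.ones_like(x), np.log(x)])
info = G.T @ G

def in_wald_set(theta, alpha=0.05):
    """Membership test for the Wald confidence set (6)."""
    d = theta_hat - theta
    return d @ info @ d <= chi2.ppf(1 - alpha, df=2)
```

A grid search over candidate parameters can then call `in_wald_set` to trace out the ellipsoid (6).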
Remark 2. Note that the confidence sets (6) and (8) shrink to the true θ as the total number of observations N := N(J) := Σ_{j=0}^{J} I_j tends to infinity. This happens as well if α is replaced by a level α(N) depending on N as long as χ²_{p,1−α(N)} · (1/N) converges to zero. In particular, α(N) itself can converge to zero.
A (1 − α)-confidence interval for a continuously differentiable one-dimensional aspect g(θ) of the parameter is given by
[g(θ̂) − q^N_{1−α/2} (ġ(θ̂)^⊤ I(θ̂)^{−1} ġ(θ̂))^{1/2}, g(θ̂) + q^N_{1−α/2} (ġ(θ̂)^⊤ I(θ̂)^{−1} ġ(θ̂))^{1/2}] (9)
using the δ-method for asymptotic distributions, where ġ(θ) := (∂/∂θ) g(θ) and q^N_β is the β-quantile of the standard normal distribution. Similarly, one-sided confidence intervals for g(θ) can be constructed by using the (1 − α)-quantile q^N_{1−α} instead of q^N_{1−α/2} and by replacing one of the interval boundaries with infinity. These confidence sets and intervals are only asymptotic, that is, they require that the usual regularity conditions ensuring asymptotic normality of θ̂ with growing sample size N hold. Note that, ideally in applications, the stress levels s_0, … , s_J and observed numbers of failures I_0, … , I_J should be chosen in such a way that the information matrix I(θ̂) leads to the smallest possible confidence set in (6). In particular, the information matrix should not be singular in order to avoid an unbounded set. For the Basquin model (4) considered here, this is ensured if there are observations (failures) at at least two different initial stress levels s_i ≠ s_j, and preferably the distance between these two stress levels should be large. However, the maximal and minimal feasible choices for the stress levels usually depend on the underlying experiment since very small choices for s_j result in extremely long and therefore cost-inefficient failure times, whereas very large choices for s_j may result in immediate failures or at least failure times that are too short to be recorded. Hence, in applications, typically an adaptive choice of the initial stress levels starting with some larger stress levels is used. In case of the application considered in this paper, the chosen initial stress levels are listed in Figure 1 in the chronological order they were used in the experiments. Note that these choices were made adaptively based on the time span of the previous experiments. In particular, we did not attempt to optimize them in terms of the resulting information matrix.

New confidence sets for the parameter based on sign depth
Data depth is a possibility to generalize the outlier-robust median of univariate data to more complex situations. In particular, simplicial regression depth can be used for obtaining outlier-robust tests for hypotheses in general regression models. 38 Simplicial regression depth leads to (p + 1)-sign depth for many regression models if the unknown regression parameter is p-dimensional. 26 The (p + 1)-sign depth is the relative number of subsets of p + 1 observations with alternating residual signs. To determine alternating signs of residuals, an ordering of the explanatory variables is needed. Therefore, let s̃ := (s̃_1, … , s̃_N)^⊤ be the vector obtained by sorting s^* in ascending order and let w̃ := (w̃_1, … , w̃_N)^⊤ be the corresponding vector obtained from w^* when applying the same permutation as used for s̃. If some initial stress levels are equal, then some of the components of s̃ are equal so that it is unclear how to order the corresponding observations in w̃. Each time this happens, competing observations are ordered randomly, independently from the other cases where this happens. Since the W_{i,j} are exponentially distributed with parameter λ(i − 1, s_j) for j = 0, … , J, i = 1, … , I_j, the median of W_{i,j} is ln(2)/λ(i − 1, s_j). Then, the residuals r_n(θ) := w̃_n − ln(2)/h_θ(s̃_n), n = 1, … , N, are realizations of independent random variables R_n(θ), which have median zero if θ is the true parameter. Setting z^* = (z_1, … , z_N)^⊤ with z_n = (s̃_n, w̃_n)^⊤, the (p + 1)-sign depth is defined as
D_{p+1}(θ, z^*) := (N choose p+1)^{−1} Σ_{1 ≤ n_1 < … < n_{p+1} ≤ N} [1{(−1)^k r_{n_k}(θ) > 0 for all k = 1, … , p + 1} + 1{(−1)^k r_{n_k}(θ) < 0 for all k = 1, … , p + 1}].
A parameter is appropriate for a data set z * if its sign depth is not smaller than the -quantile of the distribution of the sign depth. The distribution can be determined exactly for small sample sizes since only 2 N different realizations of the signs are possible almost surely for a continuous distribution of the residuals. However, for larger samples sizes, an asymptotic distribution is needed. These asymptotic distributions are known 25,39 for p = 1 and for p = 2. They are given in these papers for autoregressive models, but they also hold for other regression models since only the signs of the residuals are used. There is ongoing research to derive the asymptotic distribution for p > 2. Alternatively, simplified versions of the sign depth as proposed by Kustosz et al 26 can be used for p > 2. However, we do not recommend using the simplified version since it usually leads to much larger confidence sets.
Given an α-quantile q_α of the distribution of the sign depth, a (1 − α)-confidence set based on sign depth is given by
{θ ∈ Θ; D_{p+1}(θ, z^*) ≥ q_α}. (10)
Note that tests based on (p + 1)-sign depth are generalizations of the classical sign test since they are equivalent to the classical sign test for p = 1. Hence, they are outlier-robust. However, previous results 25,26 indicate that tests based on (p + 1)-sign depth with p ≥ 2 are much more powerful than the classical sign test.
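The definition above can be illustrated with a direct enumeration of all (p + 1)-tuples. This brute-force sketch (O(N^{p+1}), fine for small N; faster block-based implementations exist in the literature) computes the sign depth of a given residual vector:

```python
from itertools import combinations
import numpy as np

def sign_depth(res, K=3):
    """Relative number of K-tuples of residuals (kept in their given order)
    whose signs alternate -- the (p+1)-sign depth with K = p + 1."""
    s = np.sign(res)
    count, total = 0, 0
    for idx in combinations(range(len(res)), K):
        total += 1
        t = s[list(idx)]
        if np.all(t[:-1] * t[1:] < 0):   # every consecutive pair alternates
            count += 1
    return count / total

rng = np.random.default_rng(4)
res = rng.standard_normal(20)            # placeholder residuals r_n(theta)
d = sign_depth(res, K=3)                 # 3-sign depth for p = 2
```

For residuals with independent random signs, the expected 3-sign depth is 2 · (1/2)³ = 1/4; parameters whose depth falls far below this are rejected.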

PREDICTION INTERVALS
This section contains several different approaches to construct asymptotic (1 − α)-prediction intervals Î_pred(W^*) for the time W := W_{I_0+1,0} + … + W_{I_c,0} between the I_0 th and the I_c th failure based on the waiting times W^* := (W_{1,0}, … , W_{I_0,0}, … , W_{1,J}, … , W_{I_J,J}) observed so far. More formally, Î_pred(W^*) is an interval computed from W^* such that P(W ∈ Î_pred(W^*)) ≥ 1 − α holds asymptotically. Such prediction intervals usually require knowledge about the predictive distribution, that is, the distribution of the future random variable W. Hence, the cumulative distribution function and the quantiles of W are given in Section 4.1. Afterwards, the following prediction intervals are presented:
(a) Naive/plug-in prediction intervals (14) based on a maximum likelihood estimator for θ (Section 4.2).
(b) Prediction intervals (15) based on a confidence set for θ computed in one of the following ways (Section 4.3):
• Using the Wald test (6).
• Using the likelihood-ratio test (8).
• Using the sign depth test (10).
(c) Prediction intervals (19) based on confidence intervals (9) for the quantiles of the predictive distribution (Section 4.4).
Each prediction interval has its own advantages and disadvantages, which are highlighted in Sections 5 and 6. Most notably, approach (a) provides the smallest prediction intervals but struggles to keep the nominal (1 − α)-coverage rate unless the sample size is very large and no outliers occur in the data. Approach (b) based on the sign depth is the only outlier-robust prediction method presented in this paper and thus outperforms all other approaches in terms of coverage rate whenever outliers appear frequently in the data. However, this approach often produces the largest prediction intervals and can lead to very conservative predictions, in particular in applications where outlier-robustness is not needed. Finally, approach (c) is computationally much more efficient than (b) and also yields smaller prediction intervals while still keeping the nominal (1 − α)-coverage rate in applications without outliers.

Predictive distribution of the failure times
If all of W_{I_0+1,0}, … , W_{I_c,0} had the same exponential distribution, then W would have a Gamma distribution and prediction intervals could be constructed via known results for Gamma distributions. However, here the rates λ(I_0, s_0), … , λ(I_c − 1, s_0) are pairwise different, and therefore W has a hypoexponential distribution with density 46
f_W(t) = Σ_{i=I_0}^{I_c−1} λ(i, s_0) e^{−λ(i,s_0) t} Π_{k≠i} λ(k, s_0)/(λ(k, s_0) − λ(i, s_0)), t ≥ 0. (11)
Hence, the cumulative distribution function is
F_W(t) = 1 − Σ_{i=I_0}^{I_c−1} e^{−λ(i,s_0) t} Π_{k≠i} λ(k, s_0)/(λ(k, s_0) − λ(i, s_0)). (12)
Expression (12) can be computed efficiently, see the R package sdprisk. 47 However, this implementation suffers from numerical instability when λ(I_0, s_0), … , λ(I_c − 1, s_0) do not differ much. 48 Since this is often the case in our data, we need a numerically more stable algorithm. This can be achieved by rewriting the cumulative distribution function using a matrix exponential as follows:
F_W(t) = 1 − e_1^⊤ exp(t Q) 1_{I_c−I_0}, (13)
where e_1 = (1, 0, … , 0)^⊤ is the first (I_c − I_0) × 1 unit vector, 1_{I_c−I_0} = (1, 1, … , 1)^⊤ is an (I_c − I_0) × 1 vector of ones, and Q is the (I_c − I_0) × (I_c − I_0) bidiagonal matrix with diagonal entries −λ(I_0, s_0), … , −λ(I_c − 1, s_0) and superdiagonal entries λ(I_0, s_0), … , λ(I_c − 2, s_0). An implementation 23 of (13) is considerably more stable than the direct computation of expression (12). The matrix exponential, defined by an infinite series, can be evaluated 49 using Ward's diagonal Padé approximation.
An α-quantile b_α(θ) of this distribution can only be given implicitly, namely as the solution of F_W(b_α(θ)) = α.
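A minimal Python sketch of the matrix-exponential form (13) and the implicit quantile follows; the function names are our own, and the quantile is found by a simple root search over the monotone cumulative distribution function:

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import brentq

def hypoexp_cdf(t, rates):
    """F_W(t) = 1 - e_1^T exp(t*Q) 1 for W = sum of independent Exp(rates[k]).
    Q is bidiagonal with -rates on the diagonal and rates[:-1] on the
    superdiagonal; this form stays stable even for nearly equal rates,
    where the partial-fraction formula (12) suffers from cancellation."""
    rates = np.asarray(rates, float)
    Q = np.diag(-rates) + np.diag(rates[:-1], k=1)
    return 1.0 - expm(t * Q)[0].sum()

def hypoexp_quantile(alpha, rates, upper=None):
    """alpha-quantile b_alpha, found by root search since F_W is only implicit."""
    if upper is None:
        upper = 100.0 * sum(1.0 / r for r in rates)   # generous bracket
    return brentq(lambda t: hypoexp_cdf(t, rates) - alpha, 0.0, upper)
```

For a single rate this reduces to the exponential distribution, for example `hypoexp_cdf(t, [lam])` equals 1 − e^{−lam·t}, which provides a quick sanity check.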

Background and naive approach
Although there are many methods for getting prediction intervals for exponential and other univariate lifetime distributions, 2,40-43 not much has been published for accelerated lifetime experiments. Patel 2 mentions only a prediction interval for accelerated lifetime experiments with normal distribution, while Xiong and Milliken 44 provide simulation-based prediction intervals for exponentially distributed lifetimes in accelerated life testing. Hong and Meeker 45 provide a prediction method based on dynamic covariate information, but no prediction intervals are given. Our (simulation-free) prediction intervals all require knowledge about the predictive distribution analyzed in Section 4.1, that is, the distribution of the random variable W in (11). If the α-quantiles of this predictive distribution are denoted by b_α(θ), then the so-called naive or plug-in prediction interval for W is given by
[b_{α/2}(θ̂), b_{1−α/2}(θ̂)], (14)
where θ̂ = θ̂(W^*) is any consistent estimator for θ such as the maximum likelihood estimator. This prediction interval is popular among engineers due to its simplicity. While it yields good results for large sample sizes, small sample sizes can lead to prediction intervals with a poor coverage rate. 50 Therefore, we propose four more conservative approaches for prediction intervals in Sections 4.3 and 4.4 below. These approaches can also be used in other situations where the predicted variable W is independent of the observed random vector W^*.

Prediction intervals based on confidence sets for the model parameter
The first approach is based on confidence sets for θ like those given by (6), (8), and (10). Throughout the section, α(1) and α(2) denote given error rates in (0, 1), representing the choice of α in the confidence set and the choice of α in the quantiles of the predictive distribution, respectively. Note that we do not use the more common subscript notation α_1 and α_2 to avoid double or triple subscripts in the formulas. Given a (1 − α(1))-confidence set Ĉ_{α(1)}(W^*) for θ, the prediction interval is
Î_pred(W^*) := ⋃_{θ ∈ Ĉ_{α(1)}(W^*)} [b_{α(2)/2}(θ), b_{1−α(2)/2}(θ)]. (15)
For the choice of α(1), providing the confidence set, and α(2), providing the interval for the future observation, see Remark 4. Note that it is sometimes easier to work with the following larger prediction interval:
[min_{θ ∈ Ĉ_{α(1)}(W^*)} b_{α(2)/2}(θ), max_{θ ∈ Ĉ_{α(1)}(W^*)} b_{1−α(2)/2}(θ)]. (16)
All prediction intervals (based on confidence sets) in this article are computed via (16) rather than (15) to reduce the computational effort. Note that both prediction intervals often coincide, for example, when the confidence set Ĉ_{α(1)}(W^*) is connected and the mapping θ ↦ b_α(θ) is continuous for every α.
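The grid-search evaluation of the hull interval (16) can be sketched generically. Here `in_confidence_set` and `pred_quantile` are placeholders: in our setting they would be the Wald, likelihood-ratio, or sign-depth membership test and the hypoexponential quantile b_α(θ) from Section 4.1:

```python
import numpy as np

def prediction_interval(grid, in_confidence_set, pred_quantile, alpha2):
    """Interval (16): min/max of the predictive quantiles over all grid
    candidates that lie inside the confidence set."""
    lo, hi = np.inf, -np.inf
    for theta in grid:
        if not in_confidence_set(theta):
            continue
        lo = min(lo, pred_quantile(alpha2 / 2, theta))
        hi = max(hi, pred_quantile(1 - alpha2 / 2, theta))
    return lo, hi
```

If no grid point lies in the confidence set, the sketch returns an empty interval (inf, −inf); in practice the grid should be refined around the point estimate until the set is hit.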

Prediction intervals based on confidence intervals for quantiles
The construction of a confidence set Ĉ_{α(1)}(W^*) for θ is sometimes not that easy, in particular for high-dimensional parameters θ. Therefore, it might be easier to base the prediction interval on confidence intervals for the quantiles b_α(θ) of the predictive distribution. Since these confidence intervals are obtained with the δ-method for asymptotic distributions, we call such prediction intervals prediction intervals by the δ-method. To this end, let ḃ_α(θ) := (∂/∂θ) b_α(θ), which can be calculated explicitly via the implicit function theorem. Setting g(θ) = b_{α(2)/2}(θ) and g(θ) = b_{1−α(2)/2}(θ), respectively, Section 3.1 ensures that the resulting one-sided intervals are one-sided confidence intervals for b_{α(2)/2}(θ) and b_{1−α(2)/2}(θ), respectively.
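The implicit-function-theorem step can be made concrete: since F(b_α(θ), θ) = α defines b_α(θ) implicitly, its gradient is ∂b_α/∂θ = −(∂F/∂θ)/(∂F/∂t) evaluated at t = b_α(θ). The following sketch approximates both partial derivatives by central differences; `cdf` and `quantile` are placeholders for the hypoexponential distribution and quantile functions of Section 4.1:

```python
import numpy as np

def quantile_gradient(theta, alpha, cdf, quantile, eps=1e-6):
    """d b_alpha / d theta via the implicit function theorem:
    -(dF/dtheta) / (dF/dt) at t = b_alpha(theta), by central differences."""
    theta = np.asarray(theta, float)
    b = quantile(alpha, theta)
    dF_dt = (cdf(b + eps, theta) - cdf(b - eps, theta)) / (2 * eps)
    grad = np.empty_like(theta)
    for k in range(len(theta)):
        e = np.zeros_like(theta)
        e[k] = eps
        grad[k] = -(cdf(b, theta + e) - cdf(b, theta - e)) / (2 * eps) / dF_dt
    return grad
```

This gradient, combined with the inverse information matrix as in (9), yields the δ-method variance of the estimated quantile.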
Note that b_α(θ) can be given explicitly if I_c = I_0 + 1, that is, if we want to predict only the next component failure. Then, we have b_α(θ) = −ln(1 − α)/λ(I_0, s_0). Remark 4. Since 1 − α = (1 − α(1))(1 − α(2)) implies α(2) = 1 − (1 − α)/(1 − α(1)), a convergence of α(1) to zero means that α(2) converges to α. In such situations, the proposed prediction intervals behave asymptotically like the so-called naive or plug-in prediction interval given by (14).
Remark 5. All confidence-set-based approaches typically lead to conservative prediction intervals, that is, the true coverage rate is usually above the (1 − α) level the predictions are designed for. This is not surprising since these prediction intervals will have at least a (1 − α(2))-coverage rate † whenever the true parameter lies within the confidence set. Even if this is not the case, the coverage rate will often be fairly large simply due to the large intervals provided. The same is also true for predictions based on the δ-method, although these prediction intervals are usually a bit smaller.
The naive approach (14) is recommended if the sample size is sufficiently large ‡ and/or nonconservative predictions are not an issue. However, aside from having small sample sizes, we suggest using the confidence-set-based predictions or the δ-method in the following situations:
• The prediction model is an oversimplification of the true dynamics in the experiment. This is often the case and can cause the naive approach to yield poor coverage rates. Our application in the next section is indeed an example of a poor coverage rate.
• The data can be contaminated with outliers. If some experiments might yield unexpectedly short/long component lifetimes due to, for example, fabrication errors in the components, we recommend using a confidence set based on an outlier-robust method, for example, the approach in Section 3.2. A simulation study with outlier-contaminated data can be found in Section 6.
Note that the approach in Section 3.2 can lead to an extremely large confidence set (and thus also prediction intervals) if the sample size is small. We therefore recommend this approach only if a sufficiently large number of waiting times (> 100) is observed.

APPLICATIONS
In this section, we analyze data from J = 11 experiments with concrete beams. Each beam contains a total of I = 35 tension wires and is exposed to cyclic loading. During this process, the tension wires successively break. Since wire breaks produce a loud noise, the failure times can be determined exactly using acoustic measurements. Throughout the analysis, we consider the Basquin model given by the link function hθ(x) = exp(−θ1 + θ2 · ln(x)) since initial investigations indicated that this is the best choice among the link functions we considered. The initial stress levels s1, …, s11 used in the experiments and a plot of the observed waiting times wij between the breaks against the actual loads sjI/(I − i) are presented in Figure 1. Before applying our prediction methods, we start with the model check given in Section 2.3.
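As a rough illustration of the Basquin link and the equal load-share rule, the following sketch evaluates hθ and a per-state break rate. The rate construction below (surviving wires times the link evaluated at the redistributed load) is only our assumed reading of the model; the paper's model section defines the exact rates:

```python
import math

def basquin_link(x, theta1, theta2):
    # Basquin link: h_theta(x) = exp(-theta1 + theta2 * ln(x)) = x**theta2 * exp(-theta1)
    return math.exp(-theta1 + theta2 * math.log(x))

def break_rate(i, I, s, theta1, theta2):
    # assumed total hazard while i-1 of I wires have failed: each of the
    # I-i+1 surviving wires carries load s*I/(I-i+1) under equal load sharing
    survivors = I - i + 1
    load = s * I / survivors
    return survivors * basquin_link(load, theta1, theta2)
```

With θ2 > 1 the load redistribution dominates the shrinking number of wires, so the total break rate increases with every failure, which is the accelerating behavior the model is meant to capture.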

Model check
The confidence band of the Q-Q plot in Figure 2 is computed as the simulated simultaneous 90%-confidence band presented in Section 2.3 using M = 1,000,000, that is, one million samples from our model with parameter equal to the MLE θ̂ of the data example. Although it seems like all quantiles lie within the confidence band, the first few sample quantiles are actually slightly below it. This is due to some lower outliers; see Figure 1 for a plot of all waiting times on a logarithmic scale. Ideally, one should therefore choose a robust parameter estimation/confidence set, for example, the method presented in Section 3.2.
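A simultaneous band of this kind can be sketched as follows: simulate many standardized samples from the fitted model, then widen per-order-statistic bands until the prescribed fraction of simulated Q-Q curves lies entirely inside. The sketch below uses Exp(1) as a stand-in for the standardized model and a much smaller M than the one million used in the paper:

```python
import math
import random

def simulated_qq_band(n, M=2000, level=0.9, seed=1):
    # simulate M sorted samples of size n, then widen pointwise
    # order-statistic bands until a fraction >= level of the simulated
    # Q-Q curves lies entirely inside -> simultaneous band
    rng = random.Random(seed)
    sims = [sorted(rng.expovariate(1.0) for _ in range(n)) for _ in range(M)]
    cols = [sorted(c) for c in zip(*sims)]  # sorted order-statistic columns

    def band(eps):
        lo_i = int(math.floor(eps * M))
        hi_i = min(M - 1, int(math.ceil((1 - eps) * M)))
        return [c[lo_i] for c in cols], [c[hi_i] for c in cols]

    def coverage(lo, hi):
        inside = sum(all(l <= x <= h for x, l, h in zip(s, lo, hi)) for s in sims)
        return inside / M

    eps = (1 - level) / 2  # start from the pointwise band and widen
    lo, hi = band(eps)
    while coverage(lo, hi) < level and eps > 1e-6:
        eps /= 2
        lo, hi = band(eps)
    return lo, hi
```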

Comparison of the prediction intervals via cross validation
We now apply the different prediction methods from Section 4 to the data from the concrete beam experiments. Recall that we consider the Basquin model, that is, the link function hθ(x) = exp(−θ1 + θ2 · ln(x)). Also note that since the unknown parameter vector θ is two-dimensional, the sign depth from Section 3.2 is based on subsets of three residuals and is therefore called 3-depth. The following prediction intervals are compared: the naive interval, the two classical confidence-set methods (Wald and LR), the 3-depth method, and the δ-method. For all prediction intervals, a level of 90% was chosen, which is a common choice for applications in construction engineering. That means α(1) = α(2) ≈ 0.0513 for the methods Wald, LR, 3-depth, and δ-method to ensure the level (1 − α(1))(1 − α(2)) = 90%. Figure 1 shows the logarithmic waiting times between the breaks of all experiments together with the estimated function log10(hθ̂(·)−1) using the maximum likelihood estimator θ̂ = (27.99, 2.89) and the pointwise prediction intervals for the logarithmic waiting time of the next break based on the two new prediction methods, 3-depth and δ-method. The majority of the data is well approximated by our estimates. The δ-method mostly provides smaller prediction intervals than the 3-depth method; the only exception is the upper bound for small stress ranges, for which both methods yield nearly the same bound. However, the intervals are very similar and, in particular, not much influenced by the outliers with very small waiting times.
† Usually even more than that since the interval is larger than [bα(2)/2(θ), b1−α(2)/2(θ)].
‡ The simulation study in Section 6 suggests that naive predictions based on around 150 observed waiting times already provide intervals with a coverage rate close to 1 − α.
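The stated equal split of the overall 10% error can be verified directly: choosing α(1) = α(2) = 1 − √0.9 makes the product of the two coverage levels exactly 90%:

```python
import math

# splitting the overall 10% prediction error evenly between the
# confidence set and the predictive quantiles
alpha = 0.10
alpha1 = alpha2 = 1 - math.sqrt(1 - alpha)
# the combined level is exactly 1 - alpha = 90%
assert abs((1 - alpha1) * (1 - alpha2) - (1 - alpha)) < 1e-12
print(round(alpha1, 4))  # -> 0.0513
```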
A comparison of all prediction methods is done by a leave-one-out cross-validation. The following scenarios are considered:
Next: For all j = 1, …, J = 11 and L = 0, …, Ij − 1, the waiting time WL+1,j for the next event is predicted using w1,j, …, wL,j and wi,l for i = 1, …, Il, l ≠ j.
Next 5: For all j = 1, …, J = 11 and L = 0, …, Ij − 5, the waiting time w1,j + … + wL,j + WL+1,j + … + WL+5,j until the next five events is predicted using w1,j, …, wL,j and wi,l for i = 1, …, Il, l ≠ j.
The chosen j = 1, … , J = 11 takes the role of the series that is to be predicted, and L denotes the number of wire breaks already observed in this series, that is, L = I 0 in the setup described in (2). Again, L = 0 means that no observations of the new series are used.
To measure the performance of the prediction intervals, the interval score51 is used. It is given by
S([l, u], w0) = (u − l) + (2∕α)(l − w0)·1{w0 < l} + (2∕α)(w0 − u)·1{w0 > u},
where [l, u] is the prediction interval, w0 is the realized future observation, and 1{·} denotes the indicator function. It combines the coverage and the length of the prediction interval in such a way that a small score corresponds to a good prediction interval. Table 1 shows the average value of this score in millions and the coverage rates in the two scenarios. Note that the naive method struggles to reach the 90%-coverage rate since it provides too small prediction intervals. This leads to the best interval score in the scenario Next and the worst interval score in the scenario Next 5. The other methods provide coverage rates close to or above the nominal level of 90%. The δ-method leads to the best interval score among these methods. Since the 3-depth method provides larger prediction intervals, its interval score is worse. However, it still beats the Wald and LR methods in the scenario Next. In the scenario Next 5, its prediction intervals are so large that the coverage is much above 90%, causing its interval score to be the worst among the non-naive methods.
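The interval score fits in a few lines; the penalty weight 2∕α follows the standard definition cited as reference 51, and small scores are better:

```python
def interval_score(l, u, w0, alpha=0.1):
    # interval score: interval length plus a penalty of (2/alpha) times
    # the amount by which the realized value w0 misses the interval [l, u]
    score = u - l
    if w0 < l:
        score += (2 / alpha) * (l - w0)
    elif w0 > u:
        score += (2 / alpha) * (w0 - u)
    return score
```

A covered observation contributes only the interval length, so a method cannot win by quoting huge intervals; a missed observation is charged proportionally to the miss distance.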

The prediction intervals in the accelerated set-up
Finally, we discuss predictions for the last experiment, which has a lower stress range and only one wire break in total. Figure 3A shows the 90%-prediction intervals for this experiment, called SB06, with a stress range s0 = 50 MPa using the 3-depth method and the δ-method based on the J = 10 experiments with stress ranges from 80 to 455 MPa. Again, both methods provide very similar prediction intervals, where the lower and upper bounds of the prediction intervals of the 3-depth method are a little bit smaller than those of the δ-method. Both prediction intervals cover the first wire break at about 28 million load cycles, indicated by the red line. The experiment was stopped after about 108 million load cycles without observing a second wire break. This results in a censored observation indicated by the arrow in Figure 3A. Both methods predict another wire break within the next 25 million load cycles, hence indicating that the experiment should have been run a little longer to produce more data.
Remark 6. In order to check for preexisting defects in the tension wires caused by the 108 million load cycles at 50 MPa, the experiment was restarted at a higher stress range of 120 MPa as SB06a. The course of this experiment is depicted in Figure 3B along with the 90%-prediction intervals given by the two methods, where no preexisting damage is assumed; that is, the predictions in Figure 3B are done without using the results for SB06 at 50 MPa (note that this extra failure time would have barely any effect on the overall prediction). The prediction intervals are again similar but do not include the true failure times of the experiment SB06a. The wires broke earlier than our methods predicted. This makes preexisting defects in the wires plausible.

SIMULATION STUDY
We consider a setup where several wire breaks in systems with high stress levels are observed (representing the accelerated lifetime experiments), which are then used to make predictions for a system with a fairly low stress level. More precisely, we simulate the first six wire breaks (i.e., Ij = 6 for all j) in experiments with a total number of I = 10 wires § based on the Basquin link function with θ = (28, 3) (which corresponds to the rounded MLE in the real data). The initial stress ranges for the observed experiments are a K-fold repetition of the triple s = 200, 100, 80 for several integers K. The prediction interval concerns the sixth break in an experiment with initial stress range s0 = 60, where only the first three wire breaks are taken as part of the observed data. The overall prediction error is set to α = 0.1. As suggested in the remark at the end of Section 4, this error is split into two errors (α1(K), α2(K)) in the cases of the δ-method and the methods based on confidence sets, where (α1(K), α2(K)) represent the errors (α(1), α(2)) occurring in Sections 4.3 and 4.4. Note that the choice of α1(K) is somewhat arbitrary among choices that ensure α2(K) → α as K → ∞ and that also lead to confidence sets converging to the one-point set {θ} as K → ∞. The simulation results for K = 1, …, 50 (averaged over 10,000 simulations) are given in Figure 4. Although the naive approach struggles to keep the 90%-coverage rate if the dataset is fairly small, it still yields better results than the other approaches in terms of the interval score (note that a small score corresponds to a good prediction interval). This is not surprising given that the other prediction methods are more conservative and lead to larger prediction intervals with an unnecessarily high coverage rate.
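Under our reading of the model (an assumption, since the paper defines the exact rates), each simulated waiting time is exponential with a state-dependent rate built from the Basquin link. A compact sketch of the data-generating step, with K = 2 for illustration and hypothetical function names:

```python
import math
import random

def simulate_experiment(I, n_breaks, s, theta=(28.0, 3.0), rng=random):
    # sketch of the assumed birth process: while i-1 wires have failed,
    # the waiting time to the next break is taken to be exponential with
    # rate (I-i+1) * h_theta(s*I/(I-i+1)) under equal load sharing
    t1, t2 = theta
    waits = []
    for i in range(1, n_breaks + 1):
        survivors = I - i + 1
        load = s * I / survivors
        rate = survivors * math.exp(-t1 + t2 * math.log(load))
        waits.append(rng.expovariate(rate))
    return waits

# observed experiments: a K-fold repetition of the stress triple 200, 100, 80,
# with six observed breaks each among I = 10 wires
K = 2
random.seed(0)
data = [simulate_experiment(10, 6, s) for _ in range(K) for s in (200, 100, 80)]
```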
Contaminated data. To study the behavior of the methods in the presence of outliers, we contaminated the data as follows: Each of the 18K simulated waiting times for s = 200, 100, 80 has a 20% chance of becoming an outlier (the choices are made independently of each other). Each chosen waiting time is replaced by an outlier that, with probability 50%, is tenfold the original value and, with probability 50%, is set to 1. Outliers with small values often appear in applications, as can be seen from the data set in Figure 1. However, it is known that methods for the exponential distribution are fairly robust against small outliers since these outliers are bounded below by zero. This is also visible in the results of Section 5.2. But there is also the phenomenon of fatigue-tested specimens without rupture at stress levels where this is not expected. This means that unusually long waiting times sometimes appear. These are the outliers with large values. Figure 5 shows the results for the contaminated data. Now the naive method always has the problem that its coverage rate is much smaller than the nominal level. This also happens for the other likelihood-based methods Wald, LR, and δ-method, although the deviation from the nominal coverage level is not as large as for the naive method. Only the 3-depth method keeps the level, so that its score becomes the best for larger sample sizes.
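The contamination mechanism can be sketched directly (helper name and seed are hypothetical; the outlier types follow the Figure 5 caption):

```python
import random

def contaminate(waits, p_out=0.2, rng=None):
    # each waiting time independently becomes an outlier with probability
    # p_out; an outlier is either tenfold the original value (large outlier)
    # or set to 1 (small outlier), each with probability 1/2
    rng = rng or random.Random(42)
    out = []
    for w in waits:
        if rng.random() < p_out:
            out.append(10 * w if rng.random() < 0.5 else 1.0)
        else:
            out.append(w)
    return out
```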
Similar results are obtained with other scenarios of contamination; see Supporting Information.
§ We reduced the total number of wires in order to obtain a relative number Ij∕I = 0.6 of broken wires, which is similar to the fraction of wire breaks in, for example, experiment SB04 in the real data.
FIGURE 5 A, Coverage rates and B, scores of the different methods applied to contaminated data. The contamination chance is 20%. Contaminated data has a 50% chance to increase the waiting time by a factor of 10 and a 50% chance to decrease the waiting time to 1.

CONCLUSION
We have proposed two new prediction intervals for the failure time of the Ic-th component of a load-sharing system with I components when the failure processes of other systems with I components under different stress levels have been observed before. One of the new prediction intervals is outlier-robust because it is based only on the signs of the residuals of the observed failures. Outlier-robustness is important since sometimes the waiting times between the failures of components are very short. Also, unusually long waiting times appear in so-called fatigue tests without rupture. Although outlier-robust methods are often less efficient than classical methods, the application to a real data set coming from experiments with prestressed concrete beams showed that they behave similarly to or even better than the classical methods based on the Wald test and the likelihood ratio test. In the simulation study, they become superior for data with contamination.
The outlier-robust methods and the classical methods have the disadvantage that confidence sets for the unknown parameter must be calculated. In particular, this can be very time consuming for higher-dimensional parameters and usually has to be approximated by, for example, a grid search. Therefore, we proposed a more accurate method via the δ-method and the implicit function theorem where no confidence sets for the parameters have to be calculated. The application to the real data set and the simulation study show that this method yields fairly good results, although the dimension of the parameter used here was not high. It can be expected that the advantage of this method over the confidence-set-based approaches grows as the dimension of the parameter increases.
We assumed that the number I of components is the same for all systems since this was the case in our application with experiments with prestressed concrete beams. An extension to systems with different numbers of components can be done easily. We ignored censoring since most of our experiments were stopped after the last reported failure. Censoring can be easily incorporated in the likelihood based methods, but for the method based on sign depth, it is still unclear how this can be done. Similarly, an extension to a model incorporating damage accumulation and aging of the components will be much more complicated. Then, simple state-dependent processes are not adequate anymore and processes from the more general class of self-exciting processes must be used.
Self-exciting point processes include many point processes often used in reliability analysis, such as the homogeneous or inhomogeneous Poisson process or renewal processes.27,30,52 Although a combination of a nonhomogeneous Poisson process with a birth process would be very adequate to model damage accumulation and load redistribution, we are not aware of a statistical analysis for such models. The only related paper53 considers a combination of (nonhomogeneous) Poisson processes and renewal processes. In that paper, Lawless and Thiagarajah derive maximum likelihood estimators and apply a Wald test for a nonhomogeneous Poisson process of airplane air-conditioning failure data. Dachian and Kutoyants54 also developed a test for testing a stationary Poisson process against a stationary self-exciting point process. Other extensions of the nonhomogeneous Poisson process and renewal processes can be found in the literature.55,56 Statistical inference for the more general class of self-exciting processes is treated mainly for other fields of application, such as market analysis,57 analysis of financial data,58-62 neuron firing data,63 cancer data,64 or earthquake data.65 These papers treat estimation and testing of unknown parameters of the model but not prediction intervals. Only one58 of these papers provides prediction intervals, namely for the terminal prices in online auctions. However, those prediction intervals are obtained by simulations.
We expect that simulation-free prediction intervals will be difficult to derive for self-exciting point processes that include load sharing and damage accumulation. This is mainly due to the fact that our simulation-free prediction intervals depend heavily on knowing the distribution of the sum of the waiting times explicitly, that is, the hypoexponential distribution in our approach. Without assuming independence between the waiting times, it is usually very hard to compute this distribution analytically, and simulation-based predictions are required.

APPENDIX A

A.1 Asymptotic confidence sets
The information matrix appearing in (A1) can be estimated by 1∕N(J) · I(θ̂) with I(θ) given by (7). The continuous mapping theorem and Slutsky's lemma provide that (θ̂ − θ)⊤ I(θ̂)(θ̂ − θ) converges weakly to a χ2-distribution, so that the set given by (6) is an asymptotic (1 − α)-confidence set for θ. Using an additional approximation with the Taylor expansion for the i.i.d. case leads to the confidence sets based on the likelihood-ratio test given by (8). Moreover, the δ-method applied to (A1) provides the asymptotic normality of √N(J)(g(θ̂) − g(θ)). Hence, an asymptotic (1 − α)-confidence interval for g(θ) is given by (9).

A.2 Proofs
Proof of Lemma 1. It is a well-known fact that W∕λ ∼ Exp(λ) for W ∼ Exp(1) and λ > 0. Hence, we may assume the representation (A2) for all θ ∈ Θ. Note that the distribution of the difference θ̂(θ*) − θ* does not depend on the true model parameter θ*. Consequently, once again using (A2), Wn(θ*) = hn(θ̂(θ*)) · Wn.
The assertion follows since the right-hand side does not depend on θ*.
Proof of Theorem 2. The one-sided confidence intervals given by (17) and (18) provide the corresponding coverage bounds. Since W and θ̂ = θ̂(W*) are independent, we obtain the assertion.

A.3 Implementation details
For details of the implementation in R, see the Supporting Information. Here it should only be mentioned that some effort was necessary not only for the calculation of the quantiles of the hypoexponential distribution but also for the 3-depth method. In order to speed up the computation of the 3-depth, we use an (asymptotically) equivalent form25 (Equation (3.2) therein) that can be computed in linear time. The quantiles of the asymptotic distribution of the 3-depth were taken from the R package rexpar.66 For small sample sizes, we used simulated quantiles of the distribution of the 3-depth calculated by Melanie Horn (TU Dortmund).

SUPPORTING INFORMATION
R-package loadshare: R-package that provides the methods and the data set. This package is available on GitHub and can be installed via devtools::install_github("KLeckey/loadshare"). More details and results: R implementation details and more results of the simulation study can be found in the separate file Loadshare_Supplementary_Material.pdf available at https://www.statistik.tu-dortmund.de/leckey0.html.