Robust estimation of stationary continuous-time ARMA models via indirect inference

In this paper we present a robust estimator for the parameters of a continuous-time ARMA(p,q) (CARMA(p,q)) process, sampled equidistantly, which is not necessarily Gaussian. To this end, an indirect estimation procedure is used. The estimation is indirect in the following sense: we first estimate the parameters of the auxiliary AR(r) representation ($r\geq 2p-1$) of the sampled CARMA process using a generalized M- (GM-)estimator. Since the map which maps the parameters of the auxiliary AR(r) representation to the parameters of the CARMA process is not given explicitly, a separate simulation step is necessary, in which the parameters of the AR(r) representation are estimated from simulated CARMA processes. The parameter which minimizes the distance between the AR parameters estimated from the data and those estimated from the simulations then yields an estimator for the CARMA parameters. First, we show that under standard assumptions the GM-estimator for the AR(r) parameters is consistent and asymptotically normally distributed. Next, we prove that the indirect estimator, using the asymptotically normally distributed LS-estimator in the simulation step, is consistent and asymptotically normally distributed as well. The indirect estimator satisfies several important robustness properties such as weak resistance and $\pi_{d_n}$-robustness, and it has a bounded influence functional. The practical applicability of our method is demonstrated in a simulation study with replacement outliers and compared to the non-robust quasi-maximum-likelihood estimation method.


Introduction
The paper presents a robust estimator for the parameters of a discretely observed stationary continuous-time ARMA (CARMA) process. A weak ARMA(p, q) process in discrete time is a weakly stationary solution of the stochastic difference equation $\phi(\mathrm{B})X_m = \theta(\mathrm{B})Z_m$, $m \in \mathbb{Z}$ (1.1), where $\mathrm{B}$ denotes the backward shift operator (i.e. $\mathrm{B}X_m = X_{m-1}$), $\phi(z) = 1 - \phi_1 z - \ldots - \phi_p z^p$ and $\theta(z) = 1 + \theta_1 z + \ldots + \theta_q z^q$ are the autoregressive and the moving average polynomials, respectively, with $\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q \in \mathbb{R}$, $\phi_p, \theta_q \neq 0$, and $(Z_m)_{m\in\mathbb{Z}}$ a weak white noise, i.e., an uncorrelated sequence with constant mean and constant variance. If $(Z_m)_{m\in\mathbb{Z}}$ is even an independent and identically distributed (i.i.d.) sequence, then we call $(X_m)_{m\in\mathbb{Z}}$ a strong ARMA process. A natural continuous-time analog of this difference equation with i.i.d. noise is the formal p-th order stochastic differential equation $a(\mathrm{D})Y_t = c(\mathrm{D})\mathrm{D}L_t$, $t \in \mathbb{R}$ (1.2), where $\mathrm{D}$ denotes the differential operator with respect to t, $a(z) = z^p + a_1 z^{p-1} + \ldots + a_p$ and $c(z) = c_0 z^q + c_1 z^{q-1} + \ldots + c_q$ are the autoregressive and the moving average polynomials, respectively, with $p > q$, $a_1, \ldots, a_p, c_0, \ldots, c_q \in \mathbb{R}$ and $a_p, c_0 \neq 0$. The process $(L_t)_{t\in\mathbb{R}}$ is a Lévy process, i.e., a stochastic process with $L_0 = 0$ almost surely, independent and stationary increments and almost surely càdlàg sample paths. However, (1.2) is not the formal definition of a CARMA(p, q) process because a Lévy process is not differentiable. The idea is rather that the differential operator on the autoregressive side acts like an integration operator on the moving average side. The precise definition of a CARMA process is given later. A rigorous foundation for CARMA(p, 0) processes is provided in Bergstrom (1983, 1984) and for CARMA(p, q) processes in Brockwell (2001). A Lévy driven CARMA process can be defined via a controller canonical state space representation.
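The controller canonical state space form can be made concrete with a small simulation sketch. The following is not part of the paper's procedure; it is a minimal illustration under simplifying assumptions: the driving Lévy process is a Brownian motion, an Euler scheme replaces the exact solution, and the function name and the ordering of the entries of the output vector `c` are our own choices.

```python
import numpy as np

def simulate_carma_state_space(a, c, T, dt, seed=0):
    """Euler sketch of the controller canonical form
    dX_t = A X_t dt + e_p dL_t,  Y_t = c^T X_t,
    here with a Brownian motion as driving Levy process (an assumption)."""
    rng = np.random.default_rng(seed)
    p = len(a)
    # companion matrix of the autoregressive polynomial a(z)
    A = np.zeros((p, p))
    A[:-1, 1:] = np.eye(p - 1)
    A[-1, :] = -np.asarray(a, float)[::-1]   # last row: (-a_p, ..., -a_1)
    e_p = np.eye(p)[-1]                      # p-th unit vector carries the noise
    n = int(T / dt)
    X = np.zeros(p)
    Y = np.empty(n)
    for m in range(n):
        dL = np.sqrt(dt) * rng.standard_normal()  # Brownian increment
        X = X + A @ X * dt + e_p * dL
        Y[m] = np.asarray(c, float) @ X
    return Y
```

For instance, $a(z) = z^2 + 1.5z + 0.5$ has roots $-0.5$ and $-1$, so the eigenvalues of the companion matrix have strictly negative real parts and the simulated path settles into a stationary regime.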
Necessary and sufficient conditions for the existence of strictly stationary CARMA processes are given in Brockwell and Lindner (2009). From Brockwell and Lindner (2009) (see also Thornton and Chambers (2017)) it is well known that a discretely sampled stationary CARMA process $(Y_{mh})_{m\in\mathbb{Z}}$ ($h > 0$ fixed) admits a weak ARMA representation; unfortunately, for Lévy driven models this is in general not a strong ARMA representation. For an overview and a comprehensive list of references on CARMA processes we refer to Brockwell (2014) and Chambers and Thornton (2018). In many situations it is more appropriate to specify a model in continuous time rather than in discrete time. In recent years the interest in these models has increased with the availability of high-frequency data in finance and turbulence, but also due to irregularly spaced data, missing observations, or situations where estimation and inference at various frequencies are to be carried out. It is therefore not surprising that stationary CARMA processes are applied in many areas, e.g., signal processing and control (cf. Garnier and Wang (2008); Larsson et al. (2006)), high-frequency financial econometrics (cf. Todorov (2009)) and financial mathematics (cf. Benth et al. (2014a,b)). The first attempts at maximum-likelihood estimation of Gaussian stationary and non-stationary MCAR(p) models go back to Stock (1985a,b, 1989) and were further explored in the well-known paper of Zadrozny (1988), who investigates continuous-time Brownian motion driven ARMAX models and allows for stocks and flows at different frequencies as well as higher order integration. A few papers deal with the asymptotic properties of parameter estimators of discretely sampled stationary CARMA models, e.g., Schlemm and Stelzer (2012); Brockwell et al. (2011), and of non-stationary CARMA models, Fasen-Hartmann and Scholz (2017). These papers have in common that they use a quasi maximum likelihood estimator (QMLE).
However, it is well known that the QMLE is sensitive to outliers and irregularities in the data. Hence, we are looking for an alternative, robust approach.
In statistics the most fundamental question when considering robustness of an estimator is how the estimator behaves when the data do not satisfy the model assumptions (cf. Huber and Ronchetti (2009); Maronna et al. (2006); Olive (2017)). In the case of small deviations from the model assumptions a robust estimator should give estimates not too far away from those under the original model. The most common and best understood robustness property is distributional robustness, where the shape of the true underlying distribution deviates slightly from the assumed model. The number of robustness measures is large, e.g., qualitative robustness, quantitative robustness, optimal robustness, efficiency robustness and the breakdown point, to mention only a few. In contrast to the case of i.i.d. random variables, for time series there exist several types of possible contamination of the data, which makes it more difficult to characterize robustness. In particular, for AR processes it is well known that the GM-estimator (cf. Martin (1980)) and the RA-estimator (cf. Ben et al. (1999)) satisfy different robustness properties, in contrast to M- or LS-estimators which are sensitive to the presence of additive outliers (cf. Denby and Martin (1979)). However, for general ARMA models the GM-estimator and the RA-estimator are again sensitive to outliers and hence non-robust (cf. Bustos and Yohai (1986)). Muler et al. (2009) develop a robust estimation procedure for ARMA models by calculating the residuals of the ARMA models with the help of BIP-ARMA models. For their result it is essential to have a strong ARMA model. Unfortunately, the results cannot easily be extended to the weak ARMA models which arise in our context.
In this paper we use the indirect inference method originally proposed by Smith (1993) for nonlinear dynamic economic models. That work was extended by Gallant and Tauchen (1996) and Gouriéroux et al. (1993) (see also the overview in Gouriéroux and Monfort (1997)) to models with intractable likelihood functions and moments, for which maximum likelihood estimation and the generalized method of moments are infeasible. The authors applied the indirect inference method to macroeconomics, microeconomics, finance and auction models; see also Monfort (1996); Phillips and Yu (2009) for applications to continuous-time models, Gouriéroux et al. (2000); Kyriacou et al. (2017) for applications to time series models and Monfardini (1998) for applications to stochastic volatility models. In addition, indirect inference is used for bias reduction in finite samples as, e.g., in Gouriéroux et al. (2000, 2010); Yu (2011); Kyriacou et al. (2017); do Rêgo Sousa et al. (2019). Our motivation for the indirect inference method is robust estimation (cf. de Luna and Genton (2001, 2000); Kyriacou et al. (2017)). An alternative approach for bias correction is given in Wang, Phillips and Yu (2011) for univariate and multivariate diffusion models: for estimators of the mean reversion parameter based on the Euler approximation and the trapezoidal approximation for discretization, the authors calculate the bias and relate it to the estimation bias and the discretization bias.
The core idea of the indirect estimation method is to avoid estimating the parameters of interest directly: instead, one fits an auxiliary model to the data, estimates the parameters of this auxiliary model and then uses these estimates together with simulated data to construct an estimator for the original parameter of interest (see de Luna and Genton (2001) for a schematic overview of the indirect estimation method). de Luna and Genton (2001, 2000) recognized that it is possible to construct robust estimators via this approach, even for model classes where direct robust estimation is difficult. The reason is that it suffices to estimate the parameters of the auxiliary model by a robust estimation method. Based on this, de Luna and Genton (2001) present an indirect estimation procedure for strong ARMA processes (without detailed assumptions and rigorous proofs): they fit an AR(r) process to the ARMA model and estimate the parameters of the AR(r) process with a GM-estimator. We present a similar approach in our paper for the estimation of the CARMA parameters. Since the discretely sampled stationary CARMA process admits a weak ARMA representation rather than a strong one, several proofs have to be added and identifiability issues have to be taken into account.
The paper is structured as follows. In Section 2, we first present our parametric family of stationary CARMA processes and our model assumptions. Furthermore, we show that for any r ≥ 2p − 1 any stationary CARMA process has an auxiliary AR(r) representation. Then, in Section 3, we introduce the indirect estimation procedure and give sufficient criteria for indirect estimators to be consistent and asymptotically normally distributed, independent of the model; we have to assume at least consistent and asymptotically normally distributed estimators in the estimation part and in the simulation part of the indirect estimation method. Since the auxiliary AR(r) parameters of the sampled CARMA process are estimated by a GM-estimator, we give an introduction to GM-estimators in Section 4 and derive consistency and asymptotic normality of this estimator in our setup. Moreover, we see that the GM-estimator is still asymptotically normally distributed for CARMA processes with outliers such as additive outliers and replacement outliers. Our conclusions extend the results of Bustos (1982). Finally, in Section 5, we show that the indirect estimator for the parameters of the discretely observed stationary CARMA process is consistent and asymptotically normally distributed, using a GM-estimator in the estimation part and an LS-estimator in the simulation part. Several robustness properties of this estimator are derived as well, such as qualitative robustness and a bounded influence functional. The simulation study in Section 6 demonstrates the practical applicability of our indirect estimator and its robustness properties; we compare our estimator with the non-robust QMLE. Conclusions are given in Section 7. The paper ends with the proofs of the results in Section 8.

Notation
We use the Euclidean norm $\|\cdot\|$ on $\mathbb{R}^d$ and the corresponding operator norm $\|\cdot\|$ on $\mathbb{R}^{m\times d}$, which is submultiplicative. For a matrix $A \in \mathbb{R}^{m\times d}$ we denote by $A^T$ its transpose. For a matrix function $f(\vartheta)$ in $\mathbb{R}^{m\times d}$ with $\vartheta \in \mathbb{R}^s$, the gradient with respect to the parameter vector $\vartheta$ is $\nabla_\vartheta f(\vartheta) = \partial\,\mathrm{vec}(f(\vartheta))/\partial\vartheta^T \in \mathbb{R}^{md\times s}$. Finally, we write $\xrightarrow{\mathcal{D}}$ for weak convergence and $\xrightarrow{P}$ for convergence in probability. In general, C denotes a constant which may change from line to line.

The CARMA model
In this paper we consider a parametric family of stationary CARMA processes. Let $\Theta \subseteq \mathbb{R}^{N(\Theta)}$ ($N(\Theta) \in \mathbb{N}$) be a parameter space, $p \in \mathbb{N}$ be fixed, and for any $\vartheta \in \Theta$ let $a_1(\vartheta), \ldots, a_p(\vartheta), c_0(\vartheta), \ldots, c_{p-1}(\vartheta) \in \mathbb{R}$ with $a_p(\vartheta) \neq 0$ and $c_j(\vartheta) \neq 0$ for some $j \in \{0, \ldots, p-1\}$. Furthermore, define the companion matrix $A_\vartheta \in \mathbb{R}^{p\times p}$ of the autoregressive polynomial and the vector $c_\vartheta$ of moving average coefficients. The CARMA process $(Y_t(\vartheta))_{t\in\mathbb{R}}$ is then defined via the controller canonical state space representation: let $(X_t(\vartheta))_{t\in\mathbb{R}}$ be a strictly stationary solution of the stochastic differential equation (2.1a) $dX_t(\vartheta) = A_\vartheta X_t(\vartheta)\,dt + e_p\,dL_t$, where $e_p$ denotes the p-th unit vector in $\mathbb{R}^p$. Then the process $Y_t(\vartheta) = c_\vartheta^T X_t(\vartheta)$ is said to be a (stationary) CARMA process of order $(p, q(\vartheta))$. Rewriting (2.1) line by line, $(Y_t(\vartheta))_{t\in\mathbb{R}}$ can be interpreted as a solution of the differential equation (1.2); see Brockwell (2001). This means that in our parametric family of CARMA processes the order of the autoregressive polynomial is fixed to p, but the order $q(\vartheta)$ of the moving average polynomial may change. In addition, we investigate only stationary CARMA processes. Furthermore, we have the discrete-time observations $Y_h, \ldots, Y_{nh}$ of the CARMA process $(Y_t)_{t\in\mathbb{R}} = (Y_t(\vartheta_0))_{t\in\mathbb{R}}$ with fixed grid distance $h > 0$; hence, the true model parameter is $\vartheta_0$. The aim of this paper is to obtain from the observations $Y_h, \ldots, Y_{nh}$ an estimator for $\vartheta_0$. Throughout the paper we assume that the following Assumption A holds.
Assumption A.
(A.1) The parameter space Θ is a compact subset of R N(Θ ) .
(A.2) The true parameter ϑ 0 is an element of the interior of Θ .
(A.4) The eigenvalues of A ϑ have strictly negative real parts.
(A.7) For all ϑ ∈ Θ the spectrum of A ϑ is a subset of {z ∈ C : − π h < Im(z) < π h } where Im(z) denotes the imaginary part of z.
(A.8) The maps $\vartheta \mapsto A_\vartheta$ and $\vartheta \mapsto c_\vartheta$ are three times continuously differentiable. (ii) (A.4) guarantees that there exists a stationary solution of the state process (2.1a) and hence a stationary CARMA process $(Y_t(\vartheta))_{t\in\mathbb{R}}$. For this reason we can and will assume throughout the paper that $(Y_t(\vartheta))_{t\in\mathbb{R}}$ is stationary. The assumption of a stationary CARMA process $(Y_t(\vartheta))_{t\in\mathbb{R}}$ is essential for the indirect estimation approach of this paper.
(v) A consequence of (A.5) and (A.6) is that the family of stationary CARMA processes (Y t (ϑ )) t∈R is identifiable from their spectral densities and in combination with (A.7) that the same is true for the discrete-time process (Y mh (ϑ )) m∈Z (cf. Schlemm and Stelzer (2012, Theorem 3.13)).
(vi) The CARMA process has to be sampled sufficiently finely to ensure that (A.7) holds so that the parameters can be identified from the discrete data.
In the following we denote the autocovariance function of the stationary CARMA process $(Y_t(\vartheta))_{t\in\mathbb{R}}$ by $(\gamma_\vartheta(t))_{t\in\mathbb{R}}$, which by Schlemm and Stelzer (2012, Proposition 3) has the form $\gamma_\vartheta(t) = \sigma_L^2\, c_\vartheta^T e^{A_\vartheta t} \int_0^\infty e^{A_\vartheta u} e_p e_p^T e^{A_\vartheta^T u}\, du\; c_\vartheta$ for $t \geq 0$. Due to Assumption A the autocovariance function is three times continuously differentiable as well.

The AR(r) representation of a stationary CARMA process
First, we define the auxiliary AR(r) representation of the sampled CARMA process (Y mh (ϑ )) m∈Z .
Definition 2.4. Let Π ⊆ R r+1 be the parameter space containing all possible parameter vectors of stationary AR(r) processes. The map π : Θ → Π with ϑ → π(ϑ ) and π(ϑ ) as given in Theorem 2.2 is called the link function or binding function.
Finally, due to Lemma 2.5 we suppose throughout the paper:

Indirect estimation
For fixed r, denote by $\hat\pi_n$ an estimator of $\pi(\vartheta_0)$ calculated from the observations $\mathbf{Y}^n = (Y_h, \ldots, Y_{nh})$. If we were able to invert the link function π analytically and compute $\pi^{-1}(\hat\pi_n)$, then $\pi^{-1}(\hat\pi_n)$ would be an estimator for $\vartheta_0 = \pi^{-1}(\pi(\vartheta_0))$. However, this is in general not possible since no analytic representation of $\pi^{-1}$ exists. To overcome this problem, we perform a second, simulation-based estimation, which constitutes the other building block of indirect estimation. We fix a number $s \in \mathbb{N}$ and simulate a sample path of a Lévy process $(L^S_t)_{t\in\mathbb{R}}$ with $\mathbb{E}L^S_1 = 0$ and $\mathbb{E}(L^S_1)^2 = \sigma_L^2$. Then, for a fixed parameter $\vartheta \in \Theta$, we generate a sample path of the associated CARMA process $(Y^S_t(\vartheta))_{t\in\mathbb{R}}$ using the simulated path $(L^S_t)_{t\in\mathbb{R}}$. This gives us a vector of "pseudo-observations" $\mathbf{Y}^{sn}_S(\vartheta) = (Y^S_h(\vartheta), \ldots, Y^S_{snh}(\vartheta))$ of length sn. From these pseudo-observations we estimate $\pi(\vartheta)$ again, by an estimator $\hat\pi^S_{sn}(\vartheta)$. The idea is then to choose as estimator for $\vartheta_0$ that value of ϑ which minimizes a suitable distance between $\hat\pi_n$ and $\hat\pi^S_{sn}(\vartheta)$. The formal definition is as follows.
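The two-step procedure just described can be sketched in a few lines of code. This is a deliberately simplified toy version, not the paper's estimator: a discrete-time AR(1) plays the role of the CARMA process, the auxiliary AR(r) fit uses least squares instead of a GM-estimator, the minimization is a grid search, and all function names are ours. Fixing one simulated noise path across all candidate ϑ mirrors fixing the simulated Lévy path $(L^S_t)$.

```python
import numpy as np

def fit_ar(y, r):
    # least-squares fit of the auxiliary AR(r) model: regress y_m on (y_{m-1}, ..., y_{m-r})
    y = np.asarray(y, float)
    X = np.column_stack([y[r - k: len(y) - k] for k in range(1, r + 1)])
    coef, *_ = np.linalg.lstsq(X, y[r:], rcond=None)
    return coef

def simulate_ar1(theta, noise):
    # toy "structural" model standing in for the CARMA process
    y = np.empty(len(noise))
    prev = 0.0
    for m, eps in enumerate(noise):
        prev = theta * prev + eps
        y[m] = prev
    return y

def indirect_estimate(y, simulate, grid, r=3, s=10, omega=None, seed=0):
    # estimation part: auxiliary AR(r) parameters from the observed data
    pi_hat = fit_ar(y, r)
    omega = np.eye(r) if omega is None else omega
    # simulation part: one fixed noise path of length s*n, reused for every theta
    noise = np.random.default_rng(seed).standard_normal(s * len(y))
    best, best_val = None, np.inf
    for theta in grid:
        pi_sim = fit_ar(simulate(theta, noise), r)
        d = pi_hat - pi_sim
        val = d @ omega @ d          # quadratic distance with weighting matrix omega
        if val < best_val:
            best, best_val = theta, val
    return best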
Definition 3.1. Let $\hat\pi_n$ be an estimator for $\pi(\vartheta_0)$ calculated from the data $\mathbf{Y}^n$, let $\hat\pi^S_{sn}(\vartheta)$ be an estimator for $\pi(\vartheta)$ calculated from the pseudo-observations $\mathbf{Y}^{sn}_S(\vartheta) = (Y^S_h(\vartheta), \ldots, Y^S_{snh}(\vartheta))$ and let $\Omega \in \mathbb{R}^{N(\Theta)\times N(\Theta)}$ be a symmetric positive definite weighting matrix. The indirect estimator minimizes over Θ the function $L^{\mathrm{Ind}}(\vartheta, \mathbf{Y}^n) = (\hat\pi_n - \hat\pi^S_{sn}(\vartheta))^T\, \Omega\, (\hat\pi_n - \hat\pi^S_{sn}(\vartheta))$. We are able to present general conditions under which this indirect estimator is consistent and asymptotically normally distributed.
(a) Suppose that the assumptions (C.1)-(C.5) below are satisfied. (b) If we replace in (C.1) and (C.2) convergence in probability by almost sure convergence, then we can replace convergence in probability by almost sure convergence in the statement as well.
Gouriéroux et al. (1993) also establish consistency and asymptotic normality of the indirect estimator for dynamic models, but under different assumptions, mainly based on $L^{\mathrm{Ind}}(\vartheta, \mathbf{Y}^n)$ (see also Smith (1993)). These results are summarized in Gouriéroux and Monfort (1997). In the context of indirect estimation of ARMA models, de Luna and Genton (2001, p. 22) mention the asymptotic normality of their indirect estimator, but without stating any regularity conditions and only referring to Gouriéroux and Monfort (1997, Proposition 4.2).

Remark 3.3.
(a) The asymptotic covariance matrix can be written in a form analogous to de Luna and Genton (2001, Eq. (4.4)).
(b) Note that the asymptotic results hold for any r ≥ 2p − 1, but increasing the auxiliary AR order does not necessarily yield better results. On the other hand, increasing s increases the efficiency: for $s \to \infty$ we obtain $\Xi^{\mathrm{Ind}}(\vartheta_0) \to H(\vartheta_0)\Xi_D(\vartheta_0)H(\vartheta_0)^T$. The best efficiency is obtained for an optimal choice of the weighting matrix Ω.

Remark 3.4. A fundamental assumption for Proposition 2.2 is (A.4), which yields the existence of stationary CARMA processes. In particular, in the case of integrated CARMA processes $(Y_t(\vartheta))_{t\in\mathbb{R}}$, where $A_\vartheta$ has eigenvalue 0, the result of Proposition 2.2 does not hold in general. For this reason the indirect estimation approach of this paper cannot be extended to integrated CARMA processes, which are non-stationary. Moreover, for integrated CARMA processes it is well known that estimators for the parameter determining the integration have an $n$-rate of convergence instead of a $\sqrt{n}$-rate (cf. Chambers and McCrorie (2007); Fasen-Hartmann and Scholz (2017); Chambers and Thornton (2018)).
Remark 3.5. The discretely observed stationary CARMA(p, q(ϑ)) process $(Y_{mh}(\vartheta))_{m\in\mathbb{Z}}$ admits a representation as a stationary ARMA(p, p − 1) process with weak white noise, $\phi(\mathrm{B})Y_{mh}(\vartheta) = \theta(\mathrm{B})\varepsilon_m(\vartheta)$, $m \in \mathbb{Z}$ (3.2), where $\phi(z) = \prod_{i=1}^{p}(1 - e^{h\lambda_i} z)$ (the $\lambda_i$ being the eigenvalues of $A_\vartheta$), $\theta(z)$ is a monic, Schur-stable polynomial and $(\varepsilon_m(\vartheta))_{m\in\mathbb{Z}}$ is a weak white noise (see Brockwell and Lindner (2009, Lemma 2.1)), i.e., $(Y_{mh}(\vartheta))_{m\in\mathbb{Z}}$ is a weak ARMA(p, p − 1) process. Such an exact discrete-time ARMA representation was generalized in Thornton and Chambers (2017, Theorem 1) to possibly non-stationary multivariate CARMA processes. Thus, it is also possible to carry out an indirect estimation procedure by estimating the parameters of the discrete-time ARMA(p, p − 1) representation, e.g., by maximum likelihood, instead of estimating the parameters of the auxiliary AR(r) model. Then the map π is replaced by the map $\pi_1$ which maps the parameters of the CARMA process to the coefficients of the weak ARMA(p, p − 1) representation (3.2) of its sampled version. Using $\pi_1(\vartheta)$ instead of $\pi(\vartheta)$, Theorem 3.2 can be adapted under the same assumptions, giving asymptotic normality of the indirect estimator based on the discrete-time ARMA representation of the CARMA process. In particular, it is then also possible to derive an estimation procedure for non-stationary CARMA processes. However, to date there exist no robust estimators for the parameters of weak ARMA processes, so this approach does not give robust estimators for the parameters of the stationary CARMA process, which is the topic of this paper.
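The autoregressive part of this exact discrete-time representation depends on ϑ only through the eigenvalues $\lambda_i$ of $A_\vartheta$. The coefficients of $\phi(z) = \prod_{i=1}^{p}(1 - e^{h\lambda_i} z)$ can be checked numerically with a short sketch (the function name is ours; distinct eigenvalues are assumed):

```python
import numpy as np

def sampled_ar_coefficients(eigs, h):
    # Ascending-power coefficients of Phi(z) = prod_i (1 - e^{h lambda_i} z).
    # np.poly(r) returns the coefficients of prod_i (z - r_i) in descending
    # powers of z, which coincide with those of prod_i (1 - r_i z) read in
    # ascending powers of z.
    r = np.exp(h * np.asarray(eigs, dtype=complex))
    return np.real_if_close(np.poly(r))
```

Under Assumption (A.4) the eigenvalues have strictly negative real parts, so the roots $e^{h\lambda_i}$ lie inside the unit circle and the sampled AR part is stationary; complex-conjugate eigenvalue pairs still produce real coefficients.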

Estimating the auxiliary AR(r) parameters of a CARMA process with outliers
In order to apply the indirect estimator to a discretely sampled stationary CARMA process we need strongly consistent and asymptotically normally distributed estimators for the parameters of the auxiliary AR(r) representation. In this section we study generalized M- (GM-)estimators. The GM-estimator will be applied to a stationary CARMA process afflicted by outliers, because we want to study some robustness properties of our estimator as well. Outliers can be thought of as atypical observations that do not arise from the model structure but from some external influence, e.g., measurement errors. A sample of observations which contains outliers therefore no longer comes from the true model, but it is still close to it as long as the total number of outliers is not overwhelmingly large. Let $\gamma \in [0, 1]$, let $(V_m)_{m\in\mathbb{Z}}$ be a $\{0,1\}$-valued stationary process and let $(Z_m)_{m\in\mathbb{Z}}$ be a real-valued stochastic process. The disturbed process $(Y^\gamma_{mh}(\vartheta))_{m\in\mathbb{Z}}$ is defined as $Y^\gamma_{mh}(\vartheta) = (1 - V_m)Y_{mh}(\vartheta) + V_m Z_m$. The disturbed process $(Y^\gamma_{mh}(\vartheta))_{m\in\mathbb{Z}}$ is in general not a sampled CARMA process anymore.
(a) The interpretation of this model is that at each time point m ∈ Z an outlier is observed with probability g(γ), while the true value $Y_{mh}(\vartheta)$ is observed with probability 1 − g(γ). The model has the advantage that one can obtain both additive and replacement outliers by choosing the processes $(Z_m)_{m\in\mathbb{Z}}$ and $(V_m)_{m\in\mathbb{Z}}$ adequately. Specifically, to model replacement outliers, one assumes that $(Z_m)_{m\in\mathbb{Z}}$, $(V_m)_{m\in\mathbb{Z}}$ and $(Y_{mh}(\vartheta))_{m\in\mathbb{Z}}$ are jointly independent. Then, if the realization of $V_m$ is equal to 1, the value $Y_{mh}(\vartheta)$ is replaced by the realization of $Z_m$, justifying the name replacement outliers. On the other hand, additive outliers can be modeled by taking $Z_m = Y_{mh}(\vartheta) + W_m$ for some process $(W_m)_{m\in\mathbb{Z}}$ and assuming that $(Y_{mh}(\vartheta))_{m\in\mathbb{Z}}$ is independent of $(V_m)_{m\in\mathbb{Z}}$. Then we have $Y^\gamma_{mh}(\vartheta) = Y_{mh}(\vartheta) + V_m W_m$, so that the realization of $W_m$ is added to the realization of $Y_{mh}(\vartheta)$ whenever $V_m$ is 1.
(b) Another advantage of this general outlier model is that one can easily model the temporal structure of outliers. On the one hand, if $(V_m)_{m\in\mathbb{Z}}$ is chosen as an i.i.d. sequence with $\mathbb{P}(V_m = 1) = \gamma$, then outliers typically appear isolated, i.e., between two outliers there is usually a period of time where no outliers are present. On the other hand, one can also model patchy outliers by letting $(B_m)_{m\in\mathbb{Z}}$ be an i.i.d. process of Bernoulli random variables with success probability ε and setting $V_m = \max(B_{m-l}, \ldots, B_m)$ for a fixed $l \in \mathbb{N}$. Then, as ε → 0, $\mathbb{P}(V_m = 1) \approx l\varepsilon$, which results in γ = lε. For ε sufficiently small, outliers then appear in blocks of size l.
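The contamination mechanisms of (a) and (b) can be sketched directly. The following is a minimal illustration of the general outlier model (the function names are ours):

```python
import numpy as np

def replacement_contaminate(y, z, v):
    # general outlier model: Y^gamma_m = (1 - V_m) * Y_m + V_m * Z_m;
    # additive outliers are the special case z = y + w
    return np.where(np.asarray(v, bool), z, y)

def isolated_indicator(n, gamma, rng):
    # i.i.d. V_m with P(V_m = 1) = gamma: outliers typically appear isolated
    return rng.random(n) < gamma

def patchy_indicator(n, eps, l, rng):
    # V_m = max(B_{m-l}, ..., B_m) for i.i.d. Bernoulli(eps) B_m:
    # one success of B contaminates a patch of consecutive observations
    b = rng.random(n + l) < eps
    return np.array([b[m:m + l + 1].any() for m in range(n)])
```

Passing `z = y + w` to `replacement_contaminate` reproduces the additive-outlier case $Y^\gamma_m = Y_m + V_m W_m$.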
Recall the following notion. Definition 4.3. A stationary stochastic process $Y = (Y_t)_{t\in I}$ with $I = \mathbb{R}$ or $I = \mathbb{Z}$ is called strongly (or α-) mixing if $\alpha_l := \sup_{s\in I}\,\sup\{|\mathbb{P}(A \cap B) - \mathbb{P}(A)\mathbb{P}(B)| : A \in \sigma(Y_t : t \leq s),\, B \in \sigma(Y_t : t \geq s + l)\} \to 0$ as $l \to \infty$. If $\alpha_l \leq C\rho^l$ for some constants $C > 0$ and $0 < \rho < 1$, we call $Y = (Y_t)_{t\in I}$ exponentially strongly mixing.
(D.2) Either we have the replacement model where the processes (Y mh (ϑ )) m∈Z , (V m ) m∈Z and (Z m ) m∈Z are jointly independent, and (V m ) m∈Z and (Z m ) m∈Z are exponentially strongly mixing, i.e., α V (m) ≤ Cρ m and α Z (m) ≤ Cρ m for some C > 0, ρ ∈ (0, 1) and every m ∈ N. Or we have the additive model with Z m = Y mh (ϑ ) + W m where the processes (Y mh (ϑ )) m∈Z , (V m ) m∈Z and (W m ) m∈Z are jointly independent, and (V m ) m∈Z and (W m ) m∈Z are exponentially strongly mixing.
(D.3) For all $a \in \mathbb{R}$, $\pi \in \mathbb{R}^r$ with $|a| + \|\pi\| > 0$: We largely follow the ideas of Bustos (1982) for the GM-estimation of AR(r) parameters; however, our model and our assumptions are slightly different. Assumption D corresponds to Bustos (1982, Assumptions (M2), (M4), (M5)). The main difference is that the sampled stationary process $(Y_{mh})_{m\in\mathbb{Z}}$ is in Bustos (1982) an infinite-order moving average process whose noise is Φ-mixing, which is in general not satisfied for a sampled stationary CARMA process. However, we already know from Marquardt and Stelzer (2007, Proposition 3.34) that a stationary CARMA process is exponentially strongly mixing, which is weaker than Φ-mixing. Therefore, we assume that $(V_m)_{m\in\mathbb{Z}}$, $(Z_m)_{m\in\mathbb{Z}}$ and $(W_m)_{m\in\mathbb{Z}}$ are exponentially strongly mixing instead of Φ-mixing as in Bustos (1982).
In the following we define GM-estimators. Let two functions $\phi: \mathbb{R}^r \times \mathbb{R} \to \mathbb{R}$ and $\chi: \mathbb{R} \to \mathbb{R}$ be given. Then $\pi^{GM}(\vartheta^\gamma)$ is defined as the solution of the equations (4.2) for $(\pi_1, \ldots, \pi_r, \sigma) \in \mathbb{R}^r \times (0, \infty)$. The idea is again that these are the parameters of the auxiliary AR representation of $(Y^\gamma_{mh}(\vartheta))_{m\in\mathbb{Z}}$. Note that $\pi^{GM}(\vartheta^\gamma)$ depends on the processes $(V_m)_{m\in\mathbb{Z}}$ and $(Z_m)_{m\in\mathbb{Z}}$ as well; we choose not to indicate this in the notation to keep the exposition readable. For the uncontaminated process $(Y_{mh}(\vartheta))_{m\in\mathbb{Z}}$ we also write $\pi^{GM}(\vartheta)$ instead of $\pi^{GM}(\vartheta^0)$. Now, the GM-estimator $\hat\pi^{GM}_n(\vartheta^\gamma) = (\hat\pi^{GM}_{n,1}(\vartheta^\gamma), \ldots, \hat\pi^{GM}_{n,r}(\vartheta^\gamma), \hat\sigma^{GM}_n(\vartheta^\gamma))$ based on φ and χ is defined to satisfy (4.3). Throughout the paper we assume that a solution of (4.3) exists, although this is not always the case in practice.
(a) There are two main classes of GM-estimators, the so-called Mallows estimators and the Hampel-Krasker-Welsch estimators; more information on them can be found in Bustos (1982); Denby and Martin (1979); Martin (1980). In the literature, these estimators sometimes appear under the name BIF (bounded influence) estimators. The Mallows estimators are defined by $\phi(y, u) = w(y)\psi(u)$, where w is a strictly positive weight function and ψ is a suitably chosen robustifying function. The Hampel-Krasker-Welsch estimators are of a similar form, where w is a weight function and ψ is again a suitably chosen bounded function.
(b) Typical choices for ψ are the Huber $\psi_k$-functions (cf. Maronna et al. (2006, Eq. (2.28))), defined as $\psi_k(u) = \mathrm{sign}(u)\min\{|u|, k\}$ for a constant $k > 0$. A possibility for w is, e.g., $w(y) = \psi_k(|y|)/|y|$ for a Huber function $\psi_k$. Another choice for ψ is the so-called Tukey bisquare (or biweight) function, which is given by $\psi_k(u) = u\,(1 - (u/k)^2)^2\,\mathbf{1}_{\{|u| \leq k\}}$, where k is a tuning constant.
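These robustifying functions are straightforward to implement. A minimal sketch (function names are ours; the bisquare formula is the standard one, as its display is not reproduced above):

```python
import numpy as np

def psi_huber(u, k):
    # Huber function: psi_k(u) = sign(u) * min(|u|, k)
    u = np.asarray(u, float)
    return np.sign(u) * np.minimum(np.abs(u), k)

def weight_huber(y, k):
    # Mallows-type weight w(y) = psi_k(|y|) / |y|: equal to 1 for |y| <= k, else k / |y|
    y = np.abs(np.asarray(y, float))
    return np.where(y <= k, 1.0, k / np.maximum(y, np.finfo(float).tiny))

def psi_bisquare(u, k):
    # Tukey bisquare (biweight): u * (1 - (u/k)^2)^2 for |u| <= k, and 0 otherwise
    u = np.asarray(u, float)
    return np.where(np.abs(u) <= k, u * (1.0 - (u / k) ** 2) ** 2, 0.0)
```

Note the qualitative difference: the Huber function is bounded but does not vanish for large residuals, whereas the bisquare redescends to zero, completely rejecting gross outliers.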
(c) For the function χ, a possibility is to construct it from the same ψ function as in the definition of φ, where Z is a suitably distributed random variable. In order to develop an asymptotic theory and to obtain a robust estimator it is necessary to impose assumptions on φ and χ, which we do next, analogous to Bustos (1982, (E1)-(E6)): Assumption E. Suppose $\phi: \mathbb{R}^r \times \mathbb{R} \to \mathbb{R}$ and $\chi: \mathbb{R} \to \mathbb{R}$ satisfy the following assumptions: (E.1) For each $y \in \mathbb{R}^r$, the map $u \mapsto \phi(y, u)$ is odd, uniformly continuous and $\phi(y, u) \geq 0$ for $u \geq 0$.
(E.2) $(y, u) \mapsto \phi(y, u)y$ is bounded, there exists a $c > 0$ such that $u \mapsto \phi(y, u)/u$ is non-increasing for $y \in \mathbb{R}^r$, and there exists a $u_0 \in \mathbb{R}$ such that $\phi(y, u_0) > 0$. (E.6) χ is bounded and increasing on $\{x : -a \leq \chi(x) < b\}$, where $b = \sup_{x\in\mathbb{R}} \chi(x)$ and $a = -\chi(0)$. Furthermore, χ is differentiable and $x \mapsto x\chi'(x^2)$ is continuous and bounded. Lastly, $\chi(u_0^2) > 0$. In the remainder of this section we always assume that Assumptions D and E are satisfied.
In general it is not easy to verify that $\pi^{GM}(\vartheta^\gamma)$ is unique. Additionally, one would like to have that $\pi^{GM}(\vartheta^0) = \pi^{GM}(\vartheta) = \pi(\vartheta)$ are the parameters of the auxiliary AR(r) model when the GM-estimator is applied to realizations of an uncontaminated sampled stationary CARMA process $(Y_{mh}(\vartheta))_{m\in\mathbb{Z}}$. The following proposition gives a sufficient condition. Assume that (4.4) holds, that the function $u \mapsto \phi(y, u)$ is non-decreasing and strictly increasing for $|u| \leq u_0$, where $u_0$ satisfies Assumptions (E.3) and (E.6), and that the function χ is chosen appropriately. Then the auxiliary parameter $\pi(\vartheta)$ as defined in Theorem 2.2 is the unique solution of (4.2), i.e., $\pi^{GM}(\vartheta^0) = \pi(\vartheta)$.
The monotonicity assumption on φ is valid, e.g., for both the Mallows and Hampel-Krasker-Welsch estimators when the function ψ is chosen as a Huber ψ k -function with u 0 = k.
(c) The assumption on χ is fulfilled, e.g., if χ is chosen as in Example 4.4(c) with $Z \stackrel{\mathcal{D}}{=} U_1(\vartheta)/\sqrt{\mathrm{Var}(U_1(\vartheta))}$. In the case that the driving Lévy process is a Brownian motion this means that $Z \sim \mathcal{N}(0, 1)$.
The proof goes along the same lines as the proof of Bustos (1982, Theorem 2.1) and is therefore omitted. Next, we deduce the asymptotic normality of the GM-estimator. Let the set K be given as in Remark 4.5 and for $\pi = (\pi_1, \ldots, \pi_r, \sigma) \in K$ define $Q^{GM}(\pi, \vartheta^\gamma)$ as in (4.6). For the proof of the asymptotic normality of the GM-estimator we use a Taylor expansion of $Q^{GM}(\pi, \vartheta^\gamma)$ at $\pi^{GM}(\vartheta^\gamma)$. With the knowledge of the asymptotic behavior of $Q^{GM}(\hat\pi^{GM}_n(\vartheta^\gamma), \vartheta^\gamma)$ and $\nabla_\pi Q^{GM}(\hat\pi^{GM}_n(\vartheta^\gamma), \vartheta^\gamma)$ it is then straightforward to derive the asymptotic behavior of the GM-estimator $\hat\pi^{GM}_n(\vartheta^\gamma)$.
We need the following auxiliary result which is the analog of Bustos (1982, Lemma 3.1) under our different model assumptions.
First, we derive the asymptotic behavior of the gradient $\nabla_\pi Q^{GM}(\hat\pi_n, \vartheta^\gamma)$.
Lemma 4.11. Let $Q^{GM}(\pi, \vartheta^\gamma)$ be defined as in (4.6) and suppose that $\nabla_\pi Q^{GM}(\pi, \vartheta^\gamma)$ is non-singular. Furthermore, let $I^{GM}(\vartheta^\gamma)$ be given as in (4.7) and suppose that $\hat\pi^{GM}_n(\vartheta^\gamma) \xrightarrow{P} \pi^{GM}(\vartheta^\gamma)$. The following analog of Bustos (1982, Theorem 2.2) holds in our setting, giving the asymptotic normality of the GM-estimator.

Asymptotic normality
In Section 3 we already introduced the indirect estimator and presented in Theorem 3.2 sufficient criteria for it to be consistent and asymptotically normally distributed. In the following we show that these assumptions are satisfied in the setting of discretely sampled CARMA processes when we use the least-squares (LS-)estimator $\hat\pi^{LS}_n(\vartheta)$ as the estimator $\hat\pi^S_n(\vartheta)$ in the simulation part and the GM-estimator $\hat\pi^{GM}_n(\vartheta_0)$ as $\hat\pi_n$.
Remark 5.2. The quasi ML-estimator for the auxiliary AR(r) parameters of the discretely sampled CARMA process is defined as $\hat\pi^{QMLE}_{sn}(\vartheta) = \arg\min_{\pi\in\Pi'} L^{QMLE}(\pi, \mathbf{Y}^{sn}_S(\vartheta))$ for the quasi ML-function $L^{QMLE}$. It is well known that for the estimation of AR(r) parameters the ML-estimator and the LS-estimator are equivalent; this can be seen by straightforward calculations, since the derivatives of the ML-function $L^{QMLE}$ are proportional to the derivatives of $L^{LS}$.
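The equivalence in Remark 5.2 can be checked numerically: with the innovation variance profiled out, the conditional Gaussian quasi-likelihood for the AR(r) coefficients is a monotone function of the residual sum of squares, so its minimizer coincides with the LS solution. A minimal sketch under simplifying assumptions (function names are ours, and conditioning on the first r observations replaces the exact likelihood):

```python
import numpy as np

def ar_design(y, r):
    # regressor rows (y_{m-1}, ..., y_{m-r}) paired with targets y_m
    X = np.column_stack([y[r - k: len(y) - k] for k in range(1, r + 1)])
    return X, y[r:]

def ar_ls(y, r):
    # least-squares estimator of the AR(r) coefficients
    X, t = ar_design(y, r)
    coef, *_ = np.linalg.lstsq(X, t, rcond=None)
    return coef

def profiled_gaussian_nll(pi, y, r):
    # conditional Gaussian quasi-likelihood with sigma^2 profiled out:
    # up to additive constants this equals (n - r)/2 * log(RSS(pi))
    X, t = ar_design(y, r)
    rss = np.sum((t - X @ pi) ** 2)
    return 0.5 * len(t) * np.log(rss)
```

Since the residuals at the LS solution are orthogonal to the regressors, the LS coefficients are a stationary point of the profiled quasi-likelihood; finite differences confirm this.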
We have already proven that (C.1) and (C.4) of Theorem 3.2 are satisfied. To show the remaining conditions on the LS-estimator $\hat\pi^{LS}_n(\vartheta)$ we require several auxiliary results; the remainder of this section is devoted to them.
Sufficient conditions for (C.2) and (C.5) are the weak uniform convergence of the LS-estimator and its derivatives. Since the LS-estimator is defined via the sample autocovariance function we first derive the uniform weak convergence of the sample autocovariance function and its derivatives.
Then the proof of (C.2) follows from Proposition 5.5.
A direct consequence from this is the next corollary.
Corollary 5.6. Let ϑ n be a sequence in Θ with ϑ n P → ϑ 0 . Then the following statements hold: This corollary already gives (C.5).
Finally, (C.3) is a consequence of Proposition 5.7, which gives the asymptotic normality of the LS-estimator. In principle this follows from Theorem 4.12 by interpreting the least squares estimator as a particular GM-estimator with φ (y, u) = u and χ(x) = x − 1.

Robustness properties
Roughly speaking, an estimator is robust when small deviations from the nominal model do not have much effect on the estimator. This property is known as qualitative robustness or resistance of the estimator and was originally introduced in Hampel (1971) for i.i.d. sequences. The same article also gives a slight extension to the case of data generated by permutation-invariant distributions, introducing the term π-robustness (Hampel, 1971, p. 1893). Of course, time series do not in general satisfy the assumption of permutation invariance. Therefore, there have been various attempts to generalize the concept of qualitative robustness to the time series setting. Boente et al. (1987, Theorem 3.1) prove that their π d n -robustness for time series is equivalent to Hampel's π-robustness for i.i.d. random variables and therefore extends Hampel's π-robustness. They also define the term resistance. The concept of resistance has the intuitive appeal of making a statement about changes in the values of the estimator when comparing two deterministic samples. In contrast, π d n -robustness is only a statement about the distribution of the estimator, which is in general not easily tractable. The indirect estimator is weakly resistant and π d n -robust. The explicit definitions and the derivation of these properties for our indirect estimator are given in Section 9.1 of the Supporting Information.
Intuitively speaking, the influence functional measures the change in the asymptotic bias of an estimator caused by an infinitesimal amount of contamination in the data. This measure of robustness was originally introduced as the influence curve by Hampel (1974) for i.i.d. processes. It was later generalized to the time series context by Künsch (1984), who explicitly studies the estimation of autoregressive processes. However, in the paper of Künsch only estimators which depend on a finite-dimensional marginal distribution of the data-generating process and a very specific form of contamination are considered. To remedy this, a further generalization was made by Martin and Yohai (1986), who consider the influence functional and explicitly allow the estimators to depend on the measure of the process, which makes more sense in the time series setup (cf. Martin and Yohai (1986, Section 4)). In the sense of Martin and Yohai (1986, Section 4) the indirect estimator has a bounded influence functional; see Section 9.2 in the Supporting Information.
The breakdown point is (for a sample of data with fixed length n) the maximum percentage of outliers which can be contained in the data without "ruining" the estimator. In this sense, it measures how much the observed data can deviate from the nominal model before catastrophic effects in the estimation procedure happen. However, the formal definition depends on the model and the estimator. Maronna and Yohai (1991) and Maronna et al. (1979) deal explicitly with the breakdown point of GM-estimators in regression models, and Martin and Yohai (1985) and Martin (1980) study it in the time series context. A very general definition of the breakdown point is given in Genton and Lucas (2003, Definition 1 and Definition 2). Heuristically speaking, the fundamental idea of that definition is that the breakdown point is the smallest amount of outlier contamination with the property that the performance of the estimator does not get worse anymore if the contamination is increased further. As already mentioned in Martin (1980, p. 239) (the proof is given in the unpublished paper of Martin and Jong (1977)), and later in de Luna and Genton (2001, p. 377) and Genton and Lucas (2003, p. 89), the breakdown point of the GM-estimator applied to estimate the parameters of an AR(r) process is 1/(r + 1). Hence, the breakdown point of our indirect estimator is also 1/(r + 1), since the other building block of the indirect estimator, the estimator π S n (ϑ ), is applied to a simulated outlier-free sample.

Simulation study
We simulate CARMA processes on the interval [0, 1000] and choose a sampling distance of h = 1, resulting in n = 1000 observations of the discrete-time process. The simulated processes are driven either by a standard Brownian motion or by a univariate NIG (normal inverse Gaussian) Lévy process. The increments L(t) − L(t − 1) of a NIG Lévy process have the density
\[
f(x) = \frac{\alpha\delta}{\pi}\,\frac{K_1\big(\alpha\sqrt{\delta^2+(x-\mu)^2}\big)}{\sqrt{\delta^2+(x-\mu)^2}}\;
e^{\delta\sqrt{\alpha^2-\beta^2}+\beta(x-\mu)}, \qquad x \in \mathbb{R},
\]
where µ ∈ R is a location parameter, δ > 0 is a scale parameter, α ≥ 0 is a shape parameter, β ∈ R is a symmetry parameter and K 1 is the modified Bessel function of the third kind with index 1. The variance of the process is then Var(L(1)) = δα 2 (α 2 − β 2 ) −3/2 . For the NIG Lévy process we use the parameters α = 3, β = 1, δ = 2.5145 and µ = −0.8890. These parameters result in a zero-mean Lévy process with variance approximately 1, which allows a comparison of the results with the standard Brownian motion case. For the outlier model we choose additive outliers, Y γ mh = Y mh + V m Z m , where the process (V m ) m∈Z is a sequence of i.i.d. Bernoulli random variables with P(V 1 = 1) = γ. The process (Z m ) m∈Z is deterministic with Z m = ξ for m ∈ Z, where ξ and γ take different values in different simulations.
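To make the setup concrete, the following sketch simulates an exactly sampled Brownian-motion-driven CARMA(1,0) (i.e., Ornstein-Uhlenbeck) process and contaminates it with additive outliers. The function names are our own, and the AR(1) transition used below is the exact discretization of the OU process, a simplification of the general state space simulation.

```python
import numpy as np

def sample_carma10(theta0, h, n, rng):
    """Exactly sampled Brownian-motion-driven CARMA(1,0) process on the
    grid h, 2h, ..., nh.  The sampled process is AR(1):
    Y_{(m+1)h} = e^{theta0*h} Y_{mh} + eps_m, theta0 < 0."""
    phi = np.exp(theta0 * h)
    sig2 = (np.exp(2 * theta0 * h) - 1) / (2 * theta0)   # innovation variance
    y = np.empty(n)
    y[0] = rng.normal(scale=np.sqrt(-1 / (2 * theta0)))  # stationary start
    eps = rng.normal(scale=np.sqrt(sig2), size=n)
    for m in range(1, n):
        y[m] = phi * y[m - 1] + eps[m]
    return y

def add_outliers(y, xi, gamma, rng):
    """Additive outliers: Y^gamma_m = Y_m + V_m * xi, V_m ~ Bernoulli(gamma)."""
    v = rng.random(len(y)) < gamma
    return y + xi * v

rng = np.random.default_rng(1)
clean = sample_carma10(-2.0, 1.0, 1000, rng)
dirty = add_outliers(clean, xi=5.0, gamma=0.1, rng=rng)
```

With ϑ 0 = −2 the stationary variance of the clean sampled process is −1/(2ϑ 0 ) = 0.25, and on average a fraction γ of the observations is shifted by ξ.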
The indirect estimator is defined as in Section 5. We take π n as the GM-estimator π GM n (ϑ 0 ), using the R software which provides the pre-built function arGM in the package robKalman for applying GM-estimators to AR processes. This function uses a Mallows estimator as in Example 4.4(a). The weight function w(y) is the Tukey bisquare function from Example 4.4(b) applied to y; for the function ψ(u) the user can choose between the Huber ψ k -function and the bisquare function. The function is implemented as an iterative least squares procedure as described by Martin (1980, p. 231ff.). We do 6 iterations using the Huber function and then at most 50 iterations with the bisquare function; the algorithm stops earlier if convergence is achieved. In our experiments we use k = 4 for the tuning constant of the ψ k -function. In general, we set s = 75 to obtain the simulation-based observations Y sn S (ϑ ) = (Y S h (ϑ ), . . . ,Y S snh (ϑ )) in the simulation part of the indirect procedure. The Lévy process used for the simulation part is of the same type as the Lévy process driving the CARMA process. For the estimator π S n (ϑ ) we apply the least squares estimator, and as weighting matrix Ω we take the identity matrix for convenience. In some experiments we first estimated the asymptotic covariance matrix of the GM-estimator by the empirical covariance matrix of a suitable number of independent realizations of π n . Setting Ω as the inverse of that estimate did not affect the procedure significantly, positively or negatively, so that the use of the convenient identity matrix seems justified. In each experiment, we calculate the indirect estimator and, for comparison purposes, the QMLE as defined in Schlemm and Stelzer (2012). For the indirect estimator as well as for the QMLE we use 50 independent samples and report the average estimated value, the bias and the empirical variance of the parameter estimates.
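The full procedure is difficult to reproduce compactly, but its skeleton for a CARMA(1,0) process with auxiliary AR(1) can be sketched as follows. Here a crude Mallows-type iteratively reweighted LS step (Huber ψ on the residuals, then bisquare, with bisquare weights on the lagged regressor) stands in for arGM, a grid search stands in for a proper numerical optimizer, and all function names and tuning choices are our own assumptions.

```python
import numpy as np

def ar1_ls(y):
    """Plain least squares estimate of the AR(1) coefficient."""
    return float(np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1]))

def ar1_gm(y, k=4.0, c=4.685, n_huber=6, n_bisq=20):
    """Crude Mallows-type GM step for AR(1): Huber, then bisquare weights
    on the standardized residuals, and Tukey bisquare weights on the
    lagged regressor to control leverage (a stand-in for arGM)."""
    sy = np.median(np.abs(y)) / 0.6745 + 1e-12            # robust scale of y
    z = y[:-1] / (c * sy)
    wx = np.where(np.abs(z) < 1.0, (1.0 - z ** 2) ** 2, 0.0)
    phi = ar1_ls(y)
    for it in range(n_huber + n_bisq):
        res = y[1:] - phi * y[:-1]
        s = np.median(np.abs(res)) / 0.6745 + 1e-12       # robust residual scale
        u = np.abs(res) / s
        if it < n_huber:
            wr = np.where(u <= k, 1.0, k / u)             # Huber psi weights
        else:
            wr = np.where(u < c, (1.0 - (u / c) ** 2) ** 2, 0.0)  # bisquare
        w = wx * wr
        phi = np.dot(w * y[:-1], y[1:]) / np.dot(w * y[:-1], y[:-1])
    return float(phi)

def simulate_ou(theta, n, rng, h=1.0):
    """Exactly sampled Brownian-driven CARMA(1,0): AR(1) with phi = e^{theta h}."""
    phi = np.exp(theta * h)
    sig = np.sqrt((np.exp(2.0 * theta * h) - 1.0) / (2.0 * theta))
    y = np.empty(n)
    y[0] = rng.normal(scale=np.sqrt(-1.0 / (2.0 * theta)))
    for m in range(1, n):
        y[m] = phi * y[m - 1] + sig * rng.standard_normal()
    return y

def indirect_estimate(y_obs, theta_grid, s=20, rng=None):
    """Indirect inference: robust AR(1) fit on the observed data, LS AR(1)
    fits on long simulated outlier-free paths, minimum distance over the grid."""
    if rng is None:
        rng = np.random.default_rng(0)
    pi_hat = ar1_gm(y_obs)                                # estimation part (robust)
    best, best_dist = None, np.inf
    for theta in theta_grid:
        pi_sim = ar1_ls(simulate_ou(theta, s * len(y_obs), rng))  # simulation part
        dist = (pi_hat - pi_sim) ** 2                     # Omega = identity
        if dist < best_dist:
            best, best_dist = theta, dist
    return best

rng = np.random.default_rng(2)
y = simulate_ou(-2.0, 1000, rng)
y[rng.random(1000) < 0.1] += 5.0                          # additive outliers
theta_hat = indirect_estimate(y, np.linspace(-3.0, -0.5, 26), rng=rng)
```

Despite 10% contamination, the robust auxiliary fit keeps the matched estimate in the vicinity of the true value ϑ 0 = −2, whereas a plain LS fit in the estimation part would be pulled toward zero dependence.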
Table 6.1: Indirect estimation of a CARMA(1, 0) process with parameter ϑ 0 = −2 driven by a Brownian motion with n = 1000.
First, CARMA(1,0) processes with parameter ϑ 0 ∈ (−∞, 0) are studied, where A ϑ 0 = ϑ 0 and c ϑ 0 = 1. These processes are of particular interest because their discretely sampled versions admit an AR(1) representation. For this reason, one would expect the indirect procedure to work very well, since the auxiliary representation is actually exact. Initially, in Table 6.1, we estimate contaminated and uncontaminated CARMA(1,0) processes with ϑ 0 = −2 driven by a Brownian motion, using in the indirect estimation method an auxiliary AR(r) process with r = 1, 2, 3. For uncontaminated CARMA(1,0) processes the parameter r = 1 gives the lowest absolute bias but the highest variance; however, the biases and the variances are very similar. By contrast, for contaminated CARMA(1,0) processes the variances increase if we increase r. That is not surprising because r = 1 reflects the true model, and including more parameters than necessary results in more estimation errors. The bias is also quite low for r = 1.
Next, in Table 6.2, we compare the indirect estimator with r = 1 and the QMLE for a Brownian motion driven CARMA(1, 0) process with either ϑ 0 = −2 or ϑ 0 = −0.2. For ϑ 0 = −0.2 the CARMA(1, 0) process is not far away from a non-stationary process. In both cases we see that the QMLE and the indirect estimator work quite well for uncontaminated CARMA(1,0) processes (top of Table 6.2). The QMLE has a lower variance in both cases, whereas the absolute bias of the indirect estimator is lower for ϑ 0 = −2 and the bias of the QMLE is lower for ϑ 0 = −0.2. But still, for both estimation procedures the values are comparable. If we additionally allow outliers in the CARMA(1,0) model,
Table 6.2: Estimation results for CARMA(1, 0) processes with parameter ϑ 0 driven by a Brownian motion with n = 1000 and r = 1.
already in the case ξ = 5 and γ = 0.1, the indirect estimator performs vastly better than the QMLE, giving a much less biased estimate and a lower variance. For ξ = 10 and γ = 0.1 the QMLE is far away from the true values, whereas the indirect estimator still gives good results. Increasing γ to 0.15 while keeping ξ = 5 shows that both estimators perform worse than in the situation with γ = 0.1, which is to be expected. But once again, the indirect estimator gives excellent results. However, the QMLE runs much faster than the indirect estimator.
In a further study we investigate CARMA(3,1) processes. This especially means that the sampled process is no longer a weak AR process. The true parameter is ϑ 0 = (ϑ 1 , ϑ 2 , ϑ 3 , ϑ 4 , ϑ 5 ). For the CARMA(3,1) model we choose r = 5, which is also the minimum order of the auxiliary AR representation satisfying Assumption B. We also tried different values of r, but they did not give better results for uncontaminated processes (see Table 11.1 in the Supporting Information). In contrast, for contaminated CARMA(3,1) processes it seems that the absolute bias and variance are lowest for r = 5.
Table 6.3: Estimation results for an uncontaminated CARMA(3, 1) process with parameter ϑ 0 = (ϑ 1 , ϑ 2 , ϑ 3 , ϑ 4 , ϑ 5 ) driven by a Brownian motion with r = 5.
In the first instance, we compare the QMLE and the indirect estimator for uncontaminated CARMA(3, 1) processes in Table 6.3 and Table 6.4. In Table 6.3 the driving Lévy process is a Brownian motion, whereas in Table 6.4 it is a NIG Lévy process. The results are very similar. The QMLE has in general a lower variance than the indirect estimator. For n = 200 and n = 1000 it seems as well that the QMLE has a lower absolute bias, but for n = 5000 this changes and the indirect estimator has a lower absolute bias. However, both estimators perform excellently. For the Brownian motion driven model the QML optimization failed for n = 200, 1000 and 5000 in 5, 4, and 5 cases, respectively, whereas for the NIG driven model it failed in 6, 5, and 2 cases, respectively. The indirect estimator never failed. The error occurs when the estimated value of ϑ 0 is no longer an element of Θ . The
Table 6.4: Estimation results for an uncontaminated CARMA(3, 1) process with parameter ϑ 0 = (ϑ 1 , ϑ 2 , ϑ 3 , ϑ 4 , ϑ 5 ) driven by a NIG Lévy process with r = 5.
results in the tables are averaged over the experiments in which the algorithm delivered a result; the failed attempts were discarded. We obtained similar results as in Table 6.3 and Table 6.4 for different parameter values. Further, for a Brownian motion driven CARMA(3,1) process we estimate ϑ 0 for each of the following contamination configurations in Table 6.5 (see also Table 11.2 in the Supporting Information for different values of n): ξ = 5 and γ = 0.1, ξ = 10 and γ = 0.1, ξ = 5 and γ = 1/6, and ξ = 5 and γ = 0.25. In this situation, the breakdown point has an upper bound of 1/6 since we have r = 5. Hence, γ = 0.25 lies above the breakdown point and we expect to encounter problems in the estimation procedure, while for γ ≤ 1/6 these problems should not occur. This is indeed the case.
For the first two experiments, where γ = 0.1, we immediately recognize that the maximum likelihood estimate is severely biased and far from the true parameter value. Especially the inclusion of a zero component in the true parameter seems to pose a major problem, since this component is affected by the largest bias. On the other hand, the indirect estimator is still very close to the true parameter value in all components, including the zero component. Increasing ξ to 10 while keeping γ = 0.1 results in a very similar performance of the indirect estimator. The increase of γ from 0.1 to 1/6 also affects the performance of the indirect estimator: for all components of ϑ 0 the absolute bias and the variance of the indirect estimator increase. However, the loss in quality of the indirect estimator is manageable and the calculated estimates still resemble the true parameter. This means that even at the breakdown point of 1/6 the performance of the indirect estimator is satisfactory, although of course not as good as for lower contamination probabilities.
Table 6.5: Estimation results for a CARMA(3, 1) process with parameter ϑ 0 = (ϑ 1 , ϑ 2 , ϑ 3 , ϑ 4 , ϑ 5 ) driven by a Brownian motion with n = 1000 and r = 5.
The situation is vastly different in the experiment with γ = 0.25, where γ is above the breakdown point. Here we see, not surprisingly, that the indirect estimator, too, gives estimates which are severely biased and quite far away from the true parameters. We also observe that the numerical procedure used to obtain the parameter estimates quite often fails to deliver a result. The ratio of successful to unsuccessful experiments is roughly 1:2, i.e., the algorithm failed about twice as often as it succeeded. In this sense, we can say that the estimator has broken down: for a given outlier-contaminated sample, it either does not return an admissible estimate at all, or, if it does, the estimate is far away from the true parameter. The latter statement is also evident from the fact that the variances of the indirect estimates are far smaller in this case than in other experiments, which intuitively means that the algorithm typically returns very similar bad estimates if it returns a result at all.

Conclusion
In this paper we presented an indirect estimation procedure for the parameters of a discretely observed CARMA process by estimating the parameters of its auxiliary AR(r) representation using a GM-estimator. Since there does not exist an explicit form of the map between the AR parameters and the CARMA parameters, an additional simulation step to get back from the AR parameters to the CARMA parameters was necessary. Sufficient conditions were given such that the indirect estimator is consistent and asymptotically normally distributed, on the one hand in a general context, and on the other hand for the special case where π n = π GM n (ϑ 0 ) and π S n (ϑ ) = π LS n (ϑ ). Moreover, the indirect estimator satisfies several robustness properties: it is weakly resistant, π d n -robust and has a bounded influence functional.
Summarizing the simulation studies, the indirect estimator performs convincingly for various orders p and q of the CARMA process, for different driving Lévy processes and for a variety of outlier configurations. The QMLE failed in some simulations, whereas the indirect estimator could always be used. In contrast to the QMLE, which is severely biased by outliers, the indirect estimator is robust against them. For uncontaminated CARMA processes the indirect estimator is less biased for large n, whereas for small n the opposite holds; but in this situation both estimators work quite well. Obviously, the bias in the indirect estimation procedure can be decreased by using the same type of estimator in the estimation and in the simulation part, because then the biases from the estimation part and the simulation part cancel out (cf. Gouriéroux et al. (2000, 2010)). But proving the asymptotic normality of the indirect estimator when the GM-estimator is used in the simulation part as well is involved and a topic of future research.

Proofs of Section 3
Proof of Theorem 3.2. (a) We first start by proving the consistency. With the definition of Q Ind we obtain
The four summands on the right-hand side converge in probability to zero as n → ∞. For the first one, this is a consequence of (C.1). For the remaining three terms the arguments are similar, so we treat only the second one as an example. We have
Here we used the fact that sup ϑ ∈Θ ∥π(ϑ )∥ is finite due to the continuity of the map π and the compactness of Θ , as well as both (C.1) and (C.2). Therefore, the function L Ind (ϑ , Y n ) converges uniformly in ϑ in probability to the limiting function Q Ind (ϑ ). By construction, ϑ Ind n minimizes L Ind (ϑ , Y n ) and Q Ind (ϑ ) has a unique minimum at ϑ = ϑ 0 . Therefore, weak consistency of ϑ Ind n follows by arguing as in the proof of Schlemm and Stelzer (2012, Theorem 2.4); although in their proof convergence in probability is replaced by almost sure convergence, this does not matter because we can use the subsequence criterion, which says that a sequence converges in probability if and only if every subsequence has a further subsequence that converges almost surely.
The proof of strong consistency proceeds analogously, replacing convergence in probability by almost sure convergence.
Proof of Lemma 4.9. By the Cramér-Wold device, the statement of the lemma is equivalent to the assertion that 1 √ n−r x T ∑ n−r k=1 Ψ k (ϑ γ ) converges to a univariate normal distribution with mean 0 and variance x T I GM (ϑ γ )x for every x ∈ R r+1 . According to Ibragimov (1962, Theorem 1.7), this holds if we can show that
and that (x T Ψ k (ϑ γ )) k∈N is strongly mixing with mixing coefficients α x T Ψ (ϑ γ ) (m) satisfying
The same theorem then also states that x T I GM (ϑ γ )x < ∞, from which we deduce that for i, j ∈ {1, . . . , r + 1} the entry [I GM (ϑ γ )] i j is finite and therefore I GM (ϑ γ ) is well-defined. We start by showing the existence of the (2 + δ )-th moment of x T Ψ k (ϑ γ ) in (8.7). To this end, note that
where the last inequality holds since Ψ k,i (ϑ γ ) is bounded by (E.2) and (E.6). Finally, the process (Y γ mh (ϑ )) m∈Z is strongly mixing and the mixing coefficients satisfy the above condition (8.8) for the following reason. Either we have in the case of replacement outliers that Y γ mh (ϑ ) = G(V m , Z m ,Y mh (ϑ )) for some measurable function G and the three processes (V m ), (Z m ) and (Y mh (ϑ )) are independent, or in the case of additive outliers we have Y γ mh (ϑ ) = G(V m ,W m ,Y mh (ϑ )) for some measurable function G and the three processes (V m ), (W m ) and (Y mh (ϑ )) are independent. Hence, by Bradley (2007, Theorem 6.6(II)), Assumption (D.2) and Marquardt and Stelzer (2007, Proposition 3.34) we obtain α Y γ (ϑ ) (m) ≤ Cρ m for some C > 0 and ρ ∈ (0, 1). Furthermore, Ψ k (ϑ γ ) depends only on the finitely many values Y γ kh (ϑ ), . . . ,Y γ (k+r)h (ϑ ), and by Bradley (2007, Remark 1.8(b)) this ensures that α Ψ (ϑ γ ) (m) ≤ α Y γ (ϑ ) (m + r) ≤ Cρ m . Thus, the strong mixing coefficients α x T Ψ (ϑ γ ) (m) of x T Ψ (ϑ γ ) satisfy the summability condition (8.8) and the lemma is proven.
Proof of Lemma 4.10. Note first that for i, j = 1, . . . , r,
due to Assumption (E.4) and the boundedness of 1/σ on the compact set K. By Assumption (D.1) and (A.3) the expectation on the right-hand side is finite. Similarly,
The expectation on the right-hand side is finite due to Assumption (E.5). Similar arguments, using Assumption (E.6), also show that the remaining derivative terms are uniformly dominated by integrable random variables.
Therefore, by Billingsley (1999, Theorem 16.8(ii)) (an application of dominated convergence), ∇ π Q GM (π, ϑ γ ) exists on K and the order of differentiation and expectation can be interchanged.
Proof of Lemma 4.11. We use the decomposition
The first term is of order o P (1) due to Bustos (1982, Lemma 3.5) (cf. Kimmig (2016, Lemma A.5) in our setting). The second term converges to N (0, I GM (ϑ γ )) due to Lemma 4.9. Hence, we obtain the statement.

Proofs of Section 5
For ease of notation we write in the following (L t ) t∈R for the Lévy process (L S t ) t∈R and hence assume that E|L 1 | 2N * < ∞ for some 2N * > max(N(Θ ), 4 + δ ); similarly, (Y S t ) t∈R is written as (Y t ) t∈R .
Lemma 8.1. Define for any ϑ ∈ Θ the function f ϑ (u) = c T ϑ e A ϑ u e p 1 [0,∞) (u) and Then P-a.s. we have which is strongly mixing and ergodic.
The proof is moved to Section 10 in the Supporting Information.
Proof of Proposition 5.4. (a) First, we prove the pointwise convergence of the sample autocovariance function, and second, that γ ϑ ,n (l, j) is locally Hölder-continuous, which yields a stochastic equicontinuity condition. Then we can apply Pollard (1990, Theorem 10.2), which gives the uniform convergence.
(b) Applying Proposition 5.4(b) we obtain that
Then the same arguments as in (a) and (8.12)-(8.14) lead to statement (b).
(c) The proof proceeds along the same lines as in (a) and (b).
Proof of Corollary 5.6. (a) We use the upper bound
The first term converges in probability to 0 due to Proposition 5.5(a), and the second term because π(ϑ ) is continuous (see Lemma 2.5) and ϑ n P → ϑ 0 . The proofs of (b) and (c) proceed in the same way.
Proof of Proposition 5.7. Due to Proposition 5.5(a) we already know that the LS-estimator π LS n (ϑ ) is consistent. The asymptotic normality of π LS n (ϑ ) follows in principle from Theorem 4.12 by interpreting the least squares estimator as a particular GM-estimator with φ (y, u) = u and χ(x) = x − 1. An assumption of Theorem 4.12 is that the Jacobian J GM (ϑ ) = ∇ π Q GM (π(ϑ ), ϑ ) is non-singular. For the LS-estimator this can be verified by direct calculations because
Hence, J LS (ϑ ) is non-singular if and only if the upper left r × r block is non-singular. However, the upper left block is, up to a positive factor, the covariance matrix of the random vector (Y h (ϑ ), . . . ,Y rh (ϑ )), which is non-singular (cf. proof of Proposition 2.2). Still, we need to be careful because the functions φ and χ do not satisfy Assumptions (E.2), (E.4) and (E.6) with respect to boundedness. However, a close inspection of the proof of Theorem 4.12 reveals that the boundedness is only used at two points. First, in Lemma 4.9, where we deduce the finiteness of the expectation in (8.9). However, for the LS-estimator
Therefore, inequality (8.9) follows since the Lévy process (L t ) t∈R has finite (4 + δ )-th moment, which transfers to (Y t (ϑ )) t∈R by Marquardt and Stelzer (2007, Proposition 3.30) and subsequently to the (2 + δ /2)-th moment of Ψ k,i (ϑ ). Second, the boundedness assumptions are used in the proof of Lemma 4.10 to deduce the existence of ∇ π Q LS (π, ϑ ) and its continuity. But by the above calculations ∇ π Q LS (π, ϑ ) obviously exists and is continuous.

Supporting Information
9 Robustness properties of the indirect estimator
In this section we study the robustness properties of the indirect estimator for the CARMA parameters of (Y mh ) m∈Z , where we assume that π n = π GM n (ϑ 0 ) is the GM-estimator satisfying Assumptions A, B, E, (D.3) for (Y mh ) m∈Z and that π(ϑ 0 ) is the unique solution of (4.2) for (Y mh ) m∈Z (a sufficient criterion for this is given in Proposition 4.6). Moreover, we require similarly to (E.2):
(E2') (y, u) → φ (y, u) is bounded and there exists c > 0 such that
Under some mild assumptions on ψ(u) and w(y) both the Mallows estimator and the Hampel-Krasker-Welsch estimator satisfy these conditions (cf. Boente et al. (1987, p. 1305)). For the simulation part we take some estimator π S n (ϑ ), e.g., the LS-estimator, such that, as n → ∞,
holds.

Resistance and qualitative robustness
To this end, let y be an (infinite-length) realization of the discretely sampled CARMA process (Y mh ) m∈Z . Formally, we write y = (y mh ) m∈N ∈ R ∞ , where R ∞ denotes the infinite Cartesian product of R with itself. On this space, equipped with the Borel σ -field B ∞ , we denote the set of all probability measures by P(R ∞ ). In the following, we denote for y ∈ R ∞ as above by y n = (y h , y 2h , . . . , y nh ) the vector of the first n coordinates. Finally, P Y (h) denotes the probability measure of the discretely sampled CARMA process (Y mh ) m∈Z .
Definition 9.1. Let y ∈ R ∞ and let ( ϑ n ) n∈N be a sequence of estimators. Denote by ϑ n (z n ) the value of ϑ n when it is calculated using the deterministic realization z n ∈ R n .
(a) ( ϑ n ) n∈N is called resistant at y if for every ε > 0 there exists a δ > 0 such that ∥ ϑ n (z n ) − ϑ n (w n )∥ < ε for all n ∈ N and all z n , w n ∈ B δ (y n ), where B δ (x) denotes an open ball with center x and radius δ with respect to the metric
d n (z n , w n ) = inf{ε > 0 : #{i ∈ {1, . . . , n} : |z n i − w n i | ≥ ε}/n ≤ ε}.
(b) ( ϑ n ) n∈N is called asymptotically resistant at y if for every ε > 0 there exist a δ > 0 and N(ε) ∈ N such that the condition in (a) holds for all n ≥ N(ε).
(c) For Q ∈ P(R ∞ ) we say that ( ϑ n ) n∈N is strongly resistant at Q if Q y ∈ R ∞ : ( ϑ n ) n∈N is resistant at y = 1.
(d) ( ϑ n ) n∈N is called asymptotically strongly resistant at Q if Q y ∈ R ∞ : ( ϑ n ) n∈N is asymptotically resistant at y = 1.
(e) ( ϑ n ) n∈N is called weakly resistant at Q if for any ε > 0 there exists a δ > 0 such that
(f) ( ϑ n ) n∈N is called asymptotically weakly resistant at Q if for any ε > 0 there exist a δ > 0 and N(ε) ∈ N such that
With this definition at hand, we want to study the question whether our indirect estimator for the parameters of a CARMA process is resistant. We will make use of the fact that the indirect estimator consists of two independent parts: the GM-estimator for the parameters of the auxiliary AR representation, which deals with possible outliers in the observations, and the outlier-free estimator of the AR parameters based on simulated data.
Proof. First of all, ( π GM n (ϑ 0 )) n∈N is asymptotically strongly resistant at P Y (h) . This follows from Boente et al. (1987, Theorem 5.1). The theorem requires that φ and χ fulfill Assumption E and (E2') and that the limiting equation has a unique solution, which we have assumed. Moreover, (Y mh ) m∈Z is stationary and ergodic due to Marquardt and Stelzer (2007, Proposition 3.34) and fulfills (D.3).
Next, by Cox (1981, Lemma 5), π GM n (ϑ 0 ) is a continuous function of Y n for every n ∈ N. This, in combination with the asymptotic strong resistance of ( π GM n (ϑ 0 )) n∈N at P Y (h) , implies the strong resistance due to Boente et al. (1987, Proposition 4.2).
Theorem 9.3. The indirect estimator ( ϑ Ind n ) n∈N is weakly resistant and asymptotically weakly resistant at P Y (h) .
(9.8) Finally, (9.5) and (9.8) result in
To summarize, for n ≥ N(ε) we have P Y (h) (Ω 0 ( ε, δ ) ∩ Ω n ) ≥ 1 − ε, and for y ∈ Ω 0 ( ε, δ ) ∩ Ω n and z n , w n ∈ B δ (y n ) we have
This gives the asymptotic weak resistance of ( ϑ Ind n ) n∈N at P Y (h) . By definition, ϑ Ind n depends on Y n through a continuous function applied to π GM n (ϑ 0 ) and therefore ϑ Ind n is a continuous function of Y n . This and the asymptotic weak resistance of ( ϑ Ind n ) n∈N at P Y (h) imply the weak resistance at P Y (h) by Boente et al. (1987, Proposition 4.2).
Remark 9.4. If the stronger version sup ϑ ∈Θ ∥ π S n (ϑ ) − π(ϑ )∥ → 0 P-a.s. holds, then it is possible to show in a similar way that the indirect estimator ( ϑ Ind n ) n∈N is even strongly resistant.
As already mentioned, one could also define qualitative robustness of a sequence of estimators by demanding that the distribution of the estimator does not change too much when the data is changed slightly. To make this notion explicit, we first define a pseudometric for measures on metric spaces.
This pseudometric is a key component of the definition of qualitative robustness.
Definition 9.6. Let d Θ be a metric on Θ and let ρ n be a pseudometric on P(R n ), n ∈ N. For P ∈ P(R ∞ ) denote by P n the n-th order marginal of P. Finally, P ϑ n ∈ P(Θ ) is the distribution of the estimator ϑ n under P n . Then the sequence of estimators ( ϑ n ) n∈N is called ρ n -robust at P if for every ε > 0 there exists a δ > 0 such that for every Q n ∈ P(R n ) with ρ n (P n , Q n ) < δ :
As shown in Boente et al. (1987, Theorem 3.1), this is a direct generalization of the definition of π-robustness given by Hampel (1971) for i.i.d. processes.
Theorem 9.7. On R n we define the metric d n (x n , y n ) = inf{ε > 0 : #{i : |x i − y i | ≥ ε}/n ≤ ε} and use the Prokhorov distance with respect to d n ,
π d n (P n , Q n ) = inf{ε > 0 : P n (A) ≤ Q n ({x n ∈ R n : d n (x n , A) < ε}) + ε for all A ∈ B(R n )},
on B(R n ). Then the sequence of estimators ( ϑ Ind n ) n∈N is π d n -robust at P Y (h) .
With respect to the metric d n , two points are close if all coordinates except a small fraction of them are close.
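The metric d n can be computed exactly from the sorted coordinate differences; the following sketch (function name ours) illustrates this.

```python
import numpy as np

def d_n(x, y):
    """Ky-Fan-type metric d_n(x, y) = inf{eps : #{i : |x_i - y_i| >= eps}/n <= eps}.
    Allowing the j largest coordinate differences to exceed eps forces
    eps >= j/n, and eps must also exceed the (j+1)-largest difference,
    so the infimum is min_j max(a_{j+1}, j/n) with a sorted decreasingly."""
    a = np.sort(np.abs(np.asarray(x, float) - np.asarray(y, float)))[::-1]
    n = len(a)
    a = np.append(a, 0.0)               # a_{n+1} = 0
    return min(max(a[j], j / n) for j in range(n + 1))

x = np.zeros(4)
y = np.array([1.0, 0.0, 0.0, 0.0])     # one of four coordinates perturbed by 1
print(d_n(x, y))                        # -> 0.25
```

In the example, one out of four coordinates differs by 1, so the smallest feasible ε is 1/4: at ε = 0.25 exactly a fraction 1/4 of the coordinates differs by at least ε.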
Proof. Due to Theorem 9.3 the estimator ( ϑ Ind n ) n∈N is weakly resistant at P Y (h) . A conclusion of Boente et al. (1987, Theorem 4.2(i)) is then the π d n -robustness of ( ϑ Ind n ) n∈N at P Y (h) .
In summary, our indirect estimator is weakly resistant at P Y (h) as well as π d n -robust. This is in contrast to, e.g., M-estimators, which are not qualitatively robust even in the case of linear regression (cf. Maronna and Yohai (1981, p. 8)).

The influence functional
In addition to the assumptions given at the beginning of the Supporting Information we assume throughout this section that there exists a unique solution π GM (ϑ γ 0 ) of (4.2) for (Y γ mh ) m∈Z = (Y γ mh (ϑ 0 )) m∈Z for any 0 ≤ γ ≤ 1, and that J GM (ϑ 0 ) is non-singular. We denote the probability measure associated with the distribution of (Y γ mh ) m∈Z by P γ Y (h) for 0 ≤ γ ≤ 1. Note that γ = 0 corresponds to the case where there are no outliers, i.e., we can observe the nominal process without error, and we then write P 0 Y (h) = P Y (h) . Then, the definition of the influence functional for the GM-estimator is
whenever this limit is well-defined. Note that the influence functional depends on the whole "arc" of contaminated measures {P γ Y (h) , 0 ≤ γ ≤ 1}. This is the most important difference to the definition used by Künsch (1984), because in that paper the approximation P γ Y (h) = (1 − γ)P Y (h) + γν for some fixed ν ∈ P(R ∞ ) is used (Künsch (1984, Eq. (1.11))). The influence functional measures the effect of an infinitesimal contamination of the true process by the process (Z m ) on the asymptotic estimate defined via the functional T GM .
In a similar vein, we can define the influence functional for the estimation of the parameter ϑ 0 of our CARMA process. Analogously to T GM , we first define a suitable statistical functional
This is the analog of ϑ 0 = arg min ϑ ∈Θ [π(ϑ ) − π(ϑ 0 )] T Ω [π(ϑ ) − π(ϑ 0 )] in the uncontaminated case (cf. (3.1)). With this and ϑ Ind 0 (0) = ϑ 0 due to π GM (ϑ 0 ) = π(ϑ 0 ), the definition of the influence functional of the indirect estimator is
We are interested in properties of this functional, in particular whether it is bounded. Boundedness of the influence functional implies that the estimate arising from the contaminated process cannot move too far away from the one in the uncontaminated case if the rate of contamination is infinitesimal. This property is well known for the influence functional of GM-estimators of AR processes. Since these estimators are an integral building block of the indirect estimator, one can hope that it carries over to our scenario, and indeed it does, since the two functionals are proportional.
Proposition 9.8. Assume that ∇ ϑ π(ϑ 0 ) has full column rank N(Θ ). If IF GM exists, then IF Ind exists as well and is proportional to IF GM .
Proof. This follows from de Luna and Genton (2000, Theorem 1) as a special case.
From this proposition we see that the question of the boundedness of the influence functional for the indirect estimator of a discretely sampled CARMA process reduces to the question of the boundedness of the influence functional for the GM-estimator of the auxiliary AR representation of the sampled CARMA process.
Theorem 9.9. Let the additive outlier model be given and let Assumption D hold. Then there exists a constant K > 0 such that
Proof. The plan is to apply Martin and Yohai (1986, Theorem 4.3). Therefore, we have to check that Martin and Yohai (1986, Eq. (4.6)) holds. Sufficient conditions for this equation are given in Martin and Yohai (1986, Theorem 4.2), which are obviously satisfied in our case due to (E.2), (E.6) and the fact that π GM (ϑ γ 0 ) depends only on the distribution of the finite random vector (Y γ h , . . . ,Y γ (r+1)h ). For the same reasons, and with our assumption that J GM (ϑ 0 ) is non-singular, the other conditions in Martin and Yohai (1986, Theorem 4.3) are satisfied as well.
Due to Martin and Yohai (1986, Theorem 4.3) it is also possible to give an explicit (but not very handy) representation of IF GM (P W , {P γ Y (h) }) and hence, through Proposition 9.8, of IF Ind (P W , {P γ Y (h) }).

Proof of Lemma 8.1
Before we state the proof we require some auxiliary results.

Finally, with
we obtain the result.