Generalized binary vector autoregressive processes

Vector-valued extensions of univariate generalized binary auto-regressive (gbAR) processes are proposed that enable the joint modeling of serial and cross-sectional dependence of multi-variate binary data. The resulting class of generalized binary vector auto-regressive (gbVAR) models is parsimonious, nicely interpretable and also allows modeling negative dependence. We provide stationarity conditions and derive moving-average-type representations that allow us to prove geometric mixing properties. Furthermore, we derive general stochastic properties of gbVAR processes, including formulae for transition probabilities. In particular, classical Yule-Walker equations hold that facilitate parameter estimation in gbVAR models. In simulations, we investigate the estimation performance, and for illustration, we apply gbVAR models to particulate matter (PM10, 'fine dust') alarm data observed at six monitoring stations in Stuttgart, Germany.


INTRODUCTION
Categorical data are collected in many fields of application. When such data are observed over time, serial dependence is often present and has to be taken into account, for example, for modeling purposes or for statistical inference. Hence, statistical research focusing on such data structures has evolved considerably over recent years. With the huge amounts of data collected nowadays, this leads in particular to a growing interest in statistical methods for the analysis of multi-variate categorical time series. As an important special case, multi-variate binary time series, which correspond to categorical time series data with just two categories, occur in many different contexts such as agriculture, biology, economics, engineering, environmetrics, genetics, geography, geology, medical science, natural language processing or sports; see, for example, Jentsch and Reichmann (2019) for some univariate examples. For instance, recent related literature addresses the detection of dependent Bernoulli sequences in Ritzwoller and Romano (2020) or the efficient generation of high-dimensional binary data with specified correlation structures in Jiang et al. (2020). Often, binary time series are obtained from binarization of observed real-valued data, when, for example, the interest lies in whether some event occurs (or not) or a certain threshold is crossed (or not) rather than in the actual value. Although simplified, the transformed data will generally retain a great amount of the information and the dynamics of the original data.
Multi-variate binary time series obtained from a suitable thresholding procedure are, for instance, of much interest in economics, where periods of recession and of economic growth (no recession) are considered; see, for example, Bellégo and Ferrara (2009) and Startz (2006) on forecasting recessions in the Euro area and in the United States, respectively, and Candelon et al. (2012), who use multi-variate dynamic probit models to study predictive relationships in binary processes with applications to financial crises. Considering such recession time series of several countries jointly, a multi-variate analysis allows us to study not only the serial, but also the cross-sectional dependence in the data. In turn, this makes it possible to investigate spillover effects between several countries, that is, how a recession in one country will affect the economy of other countries in the future. In Figure 1, we show quarterly time series indicating periods of recessions and of economic growth for the G7 countries Canada, France, Germany, Italy, Japan, United Kingdom, and the United States from Q2/1960 to Q1/2017.
In signal processing, large numbers of nodes, that is, inexpensive sensors, are employed to make binary decisions on whether a signal is above a certain threshold or not; see, for example, Cheng et al. (2013). Hence, multiple two-state time series with states 'detection' and 'no detection' are observed. In such applications, binarization of the original signal is particularly beneficial as binary data are inexpensive to store.
In recent years, there has been increasing interest in air pollution in European cities and metropolitan areas. The EU established the European emission standards, which include limits for particulates in the air. Whenever the amount of PM10 (coarse particles with a diameter between 2.5 and 10 μm, 'fine dust') exceeds the threshold of 50 μg/m³ at a monitoring station, this causes a 'fine dust alarm'. Hence, for each monitoring station, this results in a binary sequence with states 'exceedance' and 'no exceedance'. In fact, the current public discourse centers to a large extent on whether the threshold is exceeded or not, and less on the actual amount of fine dust measured. Typically, several monitoring stations in one city allow for a joint analysis of the fine dust pollution. In Figure 2, we show the recorded fine dust alarms at six monitoring stations in the urban area of Stuttgart, Germany, from 1 March 2016 to 31 July 2018. The occurrences of alarms tend to cluster, but a closer inspection reveals that the station Neckartor often shows an alarm before the others. Hence, a multi-variate analysis of this pattern might be helpful to allow for an improved prediction of future exceedances. In Section 4, we discuss this data set in more detail.
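To fix ideas, the following minimal sketch (in Python, using simulated stand-in values rather than the actual Stuttgart measurements) illustrates how such a binary alarm series is obtained from daily PM10 means by thresholding.

```python
import numpy as np

# Hypothetical daily PM10 means (in µg/m³) at one monitoring station;
# the gamma draw merely serves as a stand-in for real measurements.
rng = np.random.default_rng(0)
pm10 = rng.gamma(shape=2.0, scale=15.0, size=365)

# A 'fine dust alarm' is recorded whenever the daily mean exceeds 50 µg/m³,
# yielding a binary series with states 1 ('exceedance') and 0 ('no exceedance').
alarm = (pm10 > 50).astype(int)
```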
Typically, Markovian models are used to describe the dependence structure of categorical time series; see, for example, Kedem (1980). Such models are very flexible and capture a broad range of serial dependence, but the number of parameters grows exponentially with the order of the Markov model. As indicated by McKenzie (2003), already in the univariate case, this likely leads to over-parametrization. For a K-dimensional multi-variate binary time series, this problem is even more severe, as fitting an unrestricted Markov model of order p requires the estimation of a number of parameters of order 2^{Kp}. Hence, Markov models are not feasible when the time series dimension or the model order becomes large. Alternatively, regression-based approaches such as logit and probit models are useful to study binary data; see, for example, Fokianos and Kedem (2003). Multi-variate regression analysis of panel data with binary outcomes has been studied in Czado (2000). Using nonlinear vector autoregressive models for latent variables associated with correlated binary time series data, multi-variate dynamic probit models have been considered in Candelon et al. (2012). Estimation, inference and identification issues in multi-variate dynamic panel data logit models are considered by Honoré and Kyriazidou (2019a) and Honoré and Kyriazidou (2019b). Such models allow for parsimonious and flexible modeling of serial dependence in binary time series. However, in contrast to classical autoregressive models, estimation in multi-variate dynamic panel models is no longer explicit and may cause identification issues in small samples as discussed, for example, in Honoré and Kyriazidou (2019a, 2019b).
In the following, we develop a flexible and nicely interpretable autoregressive model framework for binary time series data which satisfies classical Yule-Walker equations such that an explicit estimation of the model parameters via Yule-Walker estimation becomes possible.

The Univariate Case: NDARMA versus gbARMA
In the univariate case, to avoid the estimation of a large number of parameters, Jacobs and Lewis (1983) proposed the class of (New) Discrete Autoregressive Moving-Average (NDARMA) models for categorical time series. To make sure that the process (X_t, t ∈ ℤ) takes only values contained in a discrete state space 𝒮, their idea is to choose X_t randomly either from the past values of the time series X_{t−1}, …, X_{t−p} or from one of the innovations e_t, e_{t−1}, …, e_{t−q} with certain probabilities, respectively. This random selection mechanism is described by i.i.d. random vectors

P_t = (a_t^{(1)}, …, a_t^{(p)}, b_t^{(0)}, b_t^{(1)}, …, b_t^{(q)})′ ∼ Mult(1; 𝜙),   (1.1)

where Mult(1; 𝜙) denotes the multinomial distribution with parameter 1 and probability vector 𝜙 := (α^{(1)}, …, α^{(p)}, β^{(0)}, β^{(1)}, …, β^{(q)})′ with non-negative entries satisfying Σ_{i=1}^p α^{(i)} + Σ_{j=0}^q β^{(j)} = 1. Then, the NDARMA(p, q) model equation is given by

X_t = a_t^{(1)} X_{t−1} + ⋯ + a_t^{(p)} X_{t−p} + b_t^{(0)} e_t + b_t^{(1)} e_{t−1} + ⋯ + b_t^{(q)} e_{t−q},

where (e_t, t ∈ ℤ) is an i.i.d. innovation process taking values in the state space 𝒮. NDARMA models are contained as special cases in the broad class of Markov models, but are considerably more parsimonious and still nicely interpretable due to their ARMA-type structure. In this spirit, Weiß and Göb (2008) showed that Yule-Walker-type equations hold and Weiß (2009a) discussed the connection of NDARMA models to general Markov chains.
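For concreteness, a minimal simulation sketch of the purely autoregressive special case NDAR(1) (Python, with hypothetical parameter values) illustrates this selection mechanism: with probability α the process copies its predecessor, otherwise it takes a fresh innovation.

```python
import numpy as np

def simulate_ndar1(n, alpha, pi1=0.5, seed=0):
    """Sketch of a binary NDAR(1) path: X_t = a_t * X_{t-1} + (1 - a_t) * e_t,
    where a_t ~ Bernoulli(alpha) selects between predecessor and innovation."""
    rng = np.random.default_rng(seed)
    e = rng.binomial(1, pi1, size=n)      # i.i.d. binary innovations
    a = rng.binomial(1, alpha, size=n)    # selection indicators
    x = np.empty(n, dtype=int)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = a[t] * x[t - 1] + (1 - a[t]) * e[t]
    return x

x = simulate_ndar1(1000, alpha=0.7)       # produces long runs of identical values
```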

It is important to note that the probability vector 𝜙 of the multinomial distribution in (1.1) contains the NDARMA model parameters, which are naturally restricted to satisfy two conditions: all entries of 𝜙 have to lie in the unit interval and they have to sum up to one. Hence, in contrast to general Markov chains, NDARMA models are restricted to capture exclusively non-negative serial dependence. To address this lacking flexibility of the NDARMA model class, Jentsch and Reichmann (2019) proposed a simple and straightforward extension of the original idea of Jacobs and Lewis (1983) that also allows capturing negative serial dependence in univariate binary time series. In the resulting generalized binary ARMA (gbARMA) model class, in contrast to NDARMA models, the parameters α^{(i)} and β^{(j)} are allowed to be either positive or negative. Precisely, gbARMA models allow for α^{(1)}, …, α^{(p)} ∈ (−1, 1) and β^{(1)}, …, β^{(q)} ∈ (−1, 1), with β^{(0)} ∈ (0, 1], such that Σ_{i=1}^p |α^{(i)}| + β^{(0)} + Σ_{j=1}^q |β^{(j)}| = 1, and 𝜙 has to be modified to contain valid probabilities in [0, 1] for the selection mechanism. We define 𝜙_{|·|} := (|α^{(1)}|, …, |α^{(p)}|, β^{(0)}, |β^{(1)}|, …, |β^{(q)}|)′. As in NDARMA models, the random selection mechanism for gbARMA models is again described by i.i.d. random vectors P_t ∼ Mult(1; 𝜙_{|·|}). Then, the gbARMA(p, q) process (X_t, t ∈ ℤ) follows the model equation

X_t = Σ_{i=1}^p a_t^{(i)} V_{t−i}^{(i)} + b_t^{(0)} e_t + Σ_{j=1}^q b_t^{(j)} W_{t−j}^{(j)},

where (e_t, t ∈ ℤ) is an i.i.d. innovation process which takes values in {0, 1}. In the above, we set V_{t−i}^{(i)} := X_{t−i} if α^{(i)} ≥ 0 and V_{t−i}^{(i)} := 1 − X_{t−i} if α^{(i)} < 0, and, analogously, W_{t−j}^{(j)} := e_{t−j} if β^{(j)} ≥ 0 and W_{t−j}^{(j)} := 1 − e_{t−j} if β^{(j)} < 0. That is, whenever a selection variable corresponding to a negative parameter is active, the opposite value of the respective lagged observation or innovation is taken. A detailed description of the gbARMA model class and its properties can be found in Jentsch and Reichmann (2019).

An Example: NDAR(1) versus gbAR(1)
In a nutshell, the benefit of a gbARMA model in comparison to an NDARMA model for binary data is that it allows picking systematically the opposite value of a predecessor if the corresponding model parameter is negative. To illustrate this, let us consider the simplest case of a gbAR(1) model with 𝜙 = [α, β], where α ∈ (−1, 1), β ∈ (0, 1] and |α| + β = 1, following

X_t = a_t (𝟙(α ≥ 0) X_{t−1} + 𝟙(α < 0)(1 − X_{t−1})) + b_t e_t,  (a_t, b_t)′ ∼ Mult(1; |α|, β).   (1.5)

Equation (1.5) can be rewritten to get

X_t = a_t^{+} X_{t−1} + a_t^{−} (1 − X_{t−1}) + b_t e_t,  (a_t^{+}, a_t^{−}, b_t)′ ∼ Mult(1; α^{+}, α^{−}, β),   (1.6)

with α^{+} := max(α, 0) and α^{−} := max(−α, 0); see also Weiß (2009a, Remark 12.2.1.2).
In particular, for μ_X = 1∕2, the lower bound of the admissible range becomes −1.
Similarly, in view of the operations a_t^{+} X_{t−1} and a_t^{−}(1 − X_{t−1}) in (1.6), which take the value of the time series from the previous time point or its opposite value, respectively, it seems plausible to combine them and allow both at the same time; note that in (1.6) at most one of the probabilities α^{+} and α^{−} can be strictly positive. That is, we could think of a model equation of the form

X_t = a_t X_{t−1} + ã_t (1 − X_{t−1}) + b_t e_t,   (1.7)

where (a_t, ã_t, b_t)′ ∼ Mult(1; α, α̃, β) with α, α̃ ∈ [0, 1], β ∈ (0, 1] and α + α̃ + β = 1, as well as E(e_t) = π_e ∈ (0, 1), such that both α and α̃ may be strictly positive. However, the model (1.7) is not identified as it is indistinguishable from the gbAR(1) model (1.6) with suitably transformed parameters; the explicit parameter mapping is stated in (1.8) and (1.9). We refer to the Supporting Information for a proof of (1.8) and (1.9).
In Figure 3, we show realizations and corresponding autocorrelation functions of a gbAR(1) process with positive parameter α ∈ [0, 1) (i.e. an NDAR(1) process) and a gbAR(1) process with negative parameter α ∈ (−1, 0). Whereas the NDAR(1) process with positive α shows long runs of the same value and a non-negative autocorrelation function, the gbAR(1) process with negative α shows an oscillating pattern and an autocorrelation function with alternating signs. In both cases, we used P(e_t = 1) = 0.5.
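The contrast between the two cases can be reproduced with the following sketch (Python, hypothetical parameter values); it extends the NDAR(1) sketch above by the multinomial selection over predecessor, opposite predecessor and innovation.

```python
import numpy as np

def simulate_gbar1(n, alpha, pi1=0.5, seed=0):
    """Sketch of a binary gbAR(1) path with alpha in (-1, 1): with probability
    max(alpha, 0) copy X_{t-1}, with probability max(-alpha, 0) take its opposite
    1 - X_{t-1}, and with probability beta = 1 - |alpha| take a fresh innovation."""
    rng = np.random.default_rng(seed)
    probs = [max(alpha, 0.0), max(-alpha, 0.0), 1.0 - abs(alpha)]
    e = rng.binomial(1, pi1, size=n)
    x = np.empty(n, dtype=int)
    x[0] = e[0]
    for t in range(1, n):
        pick_pos, pick_neg, pick_e = rng.multinomial(1, probs)
        x[t] = pick_pos * x[t - 1] + pick_neg * (1 - x[t - 1]) + pick_e * e[t]
    return x

x_pos = simulate_gbar1(1000, alpha=0.7)    # long runs, non-negative autocorrelations
x_neg = simulate_gbar1(1000, alpha=-0.7)   # oscillating pattern, alternating autocorrelations
```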

Toward a Multi-variate Analysis: A Bivariate gbVAR(1)
In the case when more than just one binary time series is observed, as, for example, for the G7 recession data in Figure 1 or the fine dust alarm data in Figure 2, a multi-variate (i.e. joint) analysis is desirable. For categorical time series data, Möller and Weiß (2020) proposed a multi-variate extension of the NDARMA class with (non-negative) scalar model parameters that control the multinomial selection mechanism. This approach restricts the flexibility of the resulting Generalized DARMA (GDARMA) class to model joint dependence to a large extent. To compensate, GDARMA models make use of a variation function applied to lagged observations and innovations to increase the entry-wise variation over time. However, the proposed variation function does not modify the past time series values in a systematic way, nor is the resulting process able to capture any negative dependence structure.
To achieve more model flexibility, let us first consider the case of two independent gbAR(1) processes (X_t, t ∈ ℤ) and (Y_t, t ∈ ℤ). By stacking them, we get a bivariate process

(X_t, Y_t)′ = A_t^{(+)} (X_{t−1}, Y_{t−1})′ + A_t^{(−)} (1 − X_{t−1}, 1 − Y_{t−1})′ + B_t (e_{t,1}, e_{t,2})′,   (1.10)

where the (random) coefficient matrices A_t^{(+)}, A_t^{(−)} and B_t are diagonal. However, due to this diagonal structure, the model in (1.10) is not yet sufficient to study the cross-sectional dependence between two binary time series. Naturally, this can be achieved by allowing the off-diagonal elements of the coefficient matrices in (1.10) to be non-zero. This leads to the bivariate gbVAR(1) model (1.11), which can be compactly written as

X_t = A_t^{(+)} X_{t−1} + A_t^{(−)} (𝟙_2 − X_{t−1}) + B_t e_t,   (1.12)

where (e_t, t ∈ ℤ) is an i.i.d. innovation process taking values in {0, 1}². Note that Cov(e_t) = Σ_e is allowed to be non-diagonal, whereas B_t is imposed to be diagonal for identification reasons; compare also Remark 2.13. For comprehensive discussions of multi-variate Bernoulli distributions allowing for dependence leading to non-diagonal Σ_e, we refer to the classical article by Bahadur (1961) and the more recent work by Dai et al. (2013). The natural approach is to adopt row-wise the random (multinomial) selection mechanism, leading to mutually independent i.i.d. vector-valued processes (P_{t,1•}, t ∈ ℤ) and (P_{t,2•}, t ∈ ℤ), where

P_{t,k•} ∼ Mult(1; 𝒜_{|·|,k•}),  k = 1, 2,   (1.14)

in the above notation, obtained by including the off-diagonal zeros of B_t in P_{t,k•}; this allows us to use the whole rows 𝒜_{|·|,k•} of the modified parameter matrix as arguments of the multinomial distributions.

Outline
In the spirit of the gbVAR(1) model (1.11) as a natural extension of the univariate gbAR(1) model in (1.5), we provide a full description and investigation of the corresponding generalized binary vector AR (gbVAR) model class in this article. In Section 2, we define generalized binary VAR processes of order p ∈ ℕ and derive general stochastic properties including formulae for the mean vector, stationarity conditions, moving-average representations, geometric mixing properties, Yule-Walker equations and transition probabilities. Furthermore, we discuss possible extensions and identification issues and address parameter estimation in gbVAR models based on Yule-Walker estimation. In Section 3, we examine the finite sample performance of these parameter estimators by means of different criteria in simulations. For illustration, we use gbVAR models to analyze a six-dimensional binary time series that indicates fine dust alarms in the urban area of Stuttgart, Germany, in Section 4. We conclude in Section 5. All proofs, additional simulation results and an extension to gbVARMA processes by adding a moving-average part are deferred to the Supporting Information.

THE GBVAR MODEL CLASS
In Section 2.1, we define gbVAR processes as multi-variate extensions of (univariate) gbAR models as introduced by Jentsch and Reichmann (2019). The definition naturally extends the bivariate gbVAR(1) model (1.11) to arbitrary order p ∈ ℕ and dimension K ∈ ℕ. Stochastic properties are derived in Section 2.2. Identification issues and parameter estimation are discussed in Sections 2.3 and 2.4 respectively.

gbVAR Models
For a K-dimensional binary time series (X_t, t ∈ ℤ), let the matrix 𝒜 := [A^{(1)}, …, A^{(p)}, B] contain the autoregressive coefficient matrices A^{(i)} = (α^{(i)}_{kl})_{k,l=1,…,K}, i = 1, …, p, and B = diag(β_{11}, …, β_{KK}) of a gbVAR(p) model. As the gbVAR model allows for α^{(i)}_{kl} ∈ (−1, 1), the entries of 𝒜 have to satisfy

Σ_{i=1}^p Σ_{l=1}^K |α^{(i)}_{kl}| + β_{kk} = 1,  k = 1, …, K,

and 𝒜 has to be modified to serve as a parameter matrix containing (row-wise) valid probabilities of multinomial distributions. This is achieved by taking entry-wise absolute values in 𝒜, and we define 𝒜_{|·|} := [|A^{(1)}|, …, |A^{(p)}|, B] with |A^{(i)}| := (|α^{(i)}_{kl}|)_{k,l=1,…,K}. These prerequisites enable us to give the definition of the generalized binary vector AR model of order p ∈ ℕ.
Definition 2.1 (gbVAR(p) processes). Let (e_t, t ∈ ℤ) be an i.i.d. innovation process taking values in {0, 1}^K, and let (P_{t,k•}, t ∈ ℤ), k = 1, …, K, be mutually independent i.i.d. selection vectors with P_{t,k•} ∼ Mult(1; 𝒜_{|·|,k•}), which are independent of (e_t, t ∈ ℤ) and (X_s, s < t). Here, we set the random coefficient matrices A_t^{(+,i)} and A_t^{(−,i)}, i = 1, …, p, to collect (row-wise) those selection indicators from P_{t,k•} that correspond to positive and negative entries of A^{(i)}, respectively, and B_t = diag(P_{t,1,pK+1}, …, P_{t,K,pK+K}) to collect the indicators for the innovations. Then the process (X_t, t ∈ ℤ) is said to be a generalized binary vector AR process of order p (gbVAR(p)), if it is stationary and follows the recursion

X_t = Σ_{i=1}^p ( A_t^{(+,i)} X_{t−i} + A_t^{(−,i)} (𝟙_K − X_{t−i}) ) + B_t e_t.   (2.3)

It is worth noting that Definition 2.1 does not restrict the innovation distribution and explicitly allows for degenerate distributions of e_t with singular Σ_e caused, for example, by Var(e_{t,k}) = 0 for some k ∈ {1, …, K} or by deterministic relationships between the e_{t,k}, k = 1, …, K. In the same way, it does not exclude degenerate (joint) distributions of (X_t, t ∈ ℤ). It merely imposes stationarity of the process (X_t, t ∈ ℤ). Sufficient conditions ensuring stationarity will be discussed below in Section 2.2. Nevertheless, by not excluding degenerate cases that correspond to parameters lying on the boundary of the parameter space, we encounter identification issues that will be discussed in detail in Section 2.3.
It is possible to rewrite the gbVAR(p) model according to the alternative presentation of the univariate gbAR(1) model in (1.6). However, for the multi-variate case, this becomes cumbersome and the main benefit of the presentation of the gbVAR model in (2.3) is the closed-form expression using an autoregressive-type model equation.
Note that X_t is equal to the sum of lagged time series observations X_{t−i} multiplied in a familiar fashion with (random) matrices A_t^{(+,i)}, plus the opposite values 𝟙_K − X_{t−i} multiplied with (random) matrices A_t^{(−,i)} (related to negative coefficients), plus an innovation-type term B_t e_t. In the following, we pick up the introductory example from Section 1.2 and consider a bivariate gbVAR(1) model in more detail to illustrate the class of gbVAR models. Its parameter matrix 𝒜 = [A^{(1)}, B] is such that 𝒜_{|·|} 𝟙_4 = 𝟙_2 holds, that is, the absolute entries of each row sum up to one. Hence, for the (mutually independent) multinomial selection mechanisms, we have P_{t,k•} ∼ Mult(1; 𝒜_{|·|,k•}), k = 1, 2. Taking the negative signs of the entries in 𝒜 into account, the gbVAR(1) process follows the model equation (2.3) with the entries of A_t^{(+,1)} and A_t^{(−,1)} attached to X_{t−1} and 𝟙_2 − X_{t−1}, respectively.
Hence, in the second dimension, the opposite values of the predecessors X_{t−1,1} or X_{t−1,2} are selected whenever a_{t,21} or a_{t,22} become 1, respectively. Suppose the innovation process (e_t, t ∈ ℤ) consists of two independent Bernoulli processes (e_{t,1}, t ∈ ℤ) and (e_{t,2}, t ∈ ℤ) with π_{e,1} = P(e_{t,1} = 1) = 0.4 and π_{e,2} = P(e_{t,2} = 1) = 0.8, leading to Σ_e = diag(0.24, 0.16). In Figure 4, we show a realization of the bivariate gbVAR(1) process with the above specification together with the corresponding serial and cross-sectional autocorrelation structure. By allowing for positive as well as negative entries in the non-diagonal parameter matrix 𝒜, the gbVAR(1) model becomes rather flexible and makes it possible to describe diverse serial and cross-sectional dependence structures.
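A simulation sketch for a general gbVAR(1) process of this kind is given below (Python); the parameter matrix used here is hypothetical and not the one from the example above, while the innovation probabilities 0.4 and 0.8 match the specification just described.

```python
import numpy as np

def simulate_gbvar1(n, A, pi_e, seed=0):
    """Sketch of a K-dimensional gbVAR(1) path. A has entries in (-1, 1) with
    row sums of absolute values at most one; the remaining probability
    beta_kk = 1 - sum_l |a_kl| selects the innovation e_{t,k}. A negative a_kl
    means the opposite value 1 - X_{t-1,l} is taken when lag component l is selected."""
    rng = np.random.default_rng(seed)
    K = A.shape[0]
    beta = 1.0 - np.sum(np.abs(A), axis=1)
    X = np.zeros((n, K), dtype=int)
    for t in range(1, n):
        e = rng.binomial(1, pi_e)                      # independent Bernoulli innovations
        for k in range(K):
            probs = np.append(np.abs(A[k]), beta[k])   # row-wise multinomial selection
            pick = rng.multinomial(1, probs)
            if pick[-1] == 1:                          # innovation selected
                X[t, k] = e[k]
            else:                                      # lag component l selected
                l = int(np.argmax(pick[:-1]))
                X[t, k] = X[t - 1, l] if A[k, l] > 0 else 1 - X[t - 1, l]
    return X

A = np.array([[0.5, 0.2],        # hypothetical coefficients, negative entries in the second row
              [-0.3, -0.4]])
X = simulate_gbvar1(1000, A, pi_e=np.array([0.4, 0.8]))
```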

Stochastic Properties of gbVAR Models
First, we consider the expectations of the random matrices A_t^{(+,i)}, A_t^{(−,i)} and B_t, which are given by E(A_t^{(+,i)}) = A^{(+,i)} := max(A^{(i)}, 0) and E(A_t^{(−,i)}) = A^{(−,i)} := max(−A^{(i)}, 0), taken entry-wise, as well as E(B_t) = B. This enables us to compute the stationary mean vector μ_X := E(X_t) of a gbVAR(p) process.

Lemma 2.3 (Stationary mean of gbVAR processes). Let (X_t, t ∈ ℤ) be a stationary K-dimensional gbVAR(p) process. Then, we have

μ_X = (I_K − Σ_{i=1}^p A^{(i)})^{−1} (Σ_{i=1}^p A^{(−,i)} 𝟙_K + B μ_e).   (2.7)

The latter result reflects the relationship between the mean vector μ_X of the time series, the mean vector μ_e of the innovation process and the autoregressive parameters in 𝒜. In comparison to the univariate NDAR(p) process (see e.g. Weiß (2009a)), additional matrices A^{(−,i)} appear in the formula for the mean that correspond to potentially negative model parameters. Further note that, in contrast to univariate gbAR(p) processes discussed in Jentsch and Reichmann (2019), we do not get μ_X = μ_e in the special case when the parameter matrices in 𝒜 contain exclusively non-negative entries, such that all A^{(−,i)} vanish. This is due to the diagonal structure of B.
In view of Definition 2.1, which supposes the gbVAR(p) process (X_t, t ∈ ℤ) to be stationary and to fulfill the gbVAR recursion (2.3), its stationary solution can be derived in form of a moving-average-type gbVMA(∞) process. As for classical AR processes, the case p = 1 allows for a direct approach to construct the moving-average representation: for all d ∈ ℕ_0, recursively plugging-in (2.3) expresses X_t in terms of X_{t−d−1}, the innovations e_t, …, e_{t−d} and products of the random coefficient matrices. (2.8) For gbVAR(p) processes of general order p ∈ ℕ, we follow the common approach described, for example, in Lütkepohl (2005, Chap. 11.3.2) to rewrite the K-dimensional gbVAR(p) process as a Kp-dimensional gbVAR(1) process (X̃_t, t ∈ ℤ). Precisely, we define the Kp-dimensional vectors

X̃_t := (X_t′, X_{t−1}′, …, X_{t−p+1}′)′ and ẽ_t := (e_t′, 0_K′, …, 0_K′)′,   (2.9)

and the (Kp × Kp)-dimensional random companion matrices Ã_t^{(+)}, Ã_t^{(−)} and B̃_t, where Ã_t^{(+)} carries [A_t^{(+,1)}, …, A_t^{(+,p)}] in its first K rows and identity blocks I_K on its first block sub-diagonal, Ã_t^{(−)} carries [A_t^{(−,1)}, …, A_t^{(−,p)}] in its first K rows, B̃_t carries B_t in its upper-left (K × K) block, and all remaining entries are zero. Here, I_K denotes the K-dimensional unity matrix and 0_{r×s} the (r × s)-dimensional zero matrix. With these definitions, we get an autoregressive representation for the process (X̃_t, t ∈ ℤ). That is, the K-dimensional gbVAR(p) process (X_t, t ∈ ℤ) can be represented as a Kp-dimensional gbVAR(1) process (X̃_t, t ∈ ℤ) as follows

X̃_t = Ã_t^{(+)} X̃_{t−1} + Ã_t^{(−)} (𝟙_{Kp} − X̃_{t−1}) + B̃_t ẽ_t,   (2.10)

where 𝟙_{Kp} is the one vector of length Kp. Note that the first K entries of (X̃_t, t ∈ ℤ) equal the gbVAR(p) process (X_t, t ∈ ℤ). By exploiting the above representation (2.10) of X̃_t as a gbVAR(1) process, analogous to (2.8), we get an expansion of X̃_t in terms of lagged innovations and products of random companion matrices. (2.11) To obtain a proper moving-average representation, these products of random matrices on the right-hand sides of (2.8) and (2.11), respectively, have to vanish asymptotically. For p = 1, a sufficient condition familiar from (causal) vector-valued autoregressive processes is that all eigenvalues of |A^{(1)}| have modulus smaller than one (2.12) and, for general p ∈ ℕ, that all eigenvalues of the companion matrix Ã_{|·|} built from |A^{(1)}|, …, |A^{(p)}| have modulus smaller than one. (2.13) Note that (2.13) is equivalent to the condition that all roots of the characteristic matrix polynomial 𝒜_{|·|}(z) := I_K − |A^{(1)}| z − ⋯ − |A^{(p)}| z^p lie outside the unit circle, that is,

det(𝒜_{|·|}(z)) ≠ 0 for all z ∈ ℂ with |z| ≤ 1.   (2.14)

The following result extends Theorem 1 in Jentsch and Reichmann (2019) to the multi-variate case.
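The sufficient condition (2.13) is straightforward to check numerically; a small sketch (Python, with hypothetical input matrices) builds the companion matrix of the entry-wise absolute coefficient matrices and inspects its eigenvalues.

```python
import numpy as np

def is_gbvar_stationary(A_list):
    """Check the sufficient stationarity condition (2.13): all eigenvalues of the
    Kp x Kp companion matrix built from |A^(1)|, ..., |A^(p)| have modulus < 1."""
    K = A_list[0].shape[0]
    p = len(A_list)
    top = np.hstack([np.abs(Ai) for Ai in A_list])               # first K rows
    if p == 1:
        companion = top
    else:
        lower = np.hstack([np.eye(K * (p - 1)), np.zeros((K * (p - 1), K))])
        companion = np.vstack([top, lower])                      # identity blocks below
    return float(np.max(np.abs(np.linalg.eigvals(companion)))) < 1.0

# Example with hypothetical coefficient matrices of a gbVAR(2) model:
A1 = np.array([[0.3, -0.2], [0.1, 0.2]])
A2 = np.array([[-0.1, 0.1], [0.2, -0.3]])
print(is_gbvar_stationary([A1, A2]))   # True, since each row of |A1| + |A2| sums to less than one
```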
Theorem 2.4 (gbVMA(∞) representation of gbVAR processes). (i) If p = 1 and condition (2.12) holds, the gbVAR(1) model has a gbVMA(∞)-type representation converging in L¹. (ii) If p ∈ ℕ and condition (2.14) holds, the gbVAR(p) model has a gbVMA(∞)-type representation converging in L¹ as well.
In comparison to classical vector autoregressive processes, the moving-average representation of gbVAR processes contains an additional term involving the matrices Ã_t^{(−)} that correspond to potentially negative model parameters. If all entries in the parameter matrix 𝒜 = [A^{(1)}, …, A^{(p)}, B] are non-negative, this additional term vanishes. It is also important to note that the conditions in (2.12)-(2.14) are sufficient, but not necessary, for the gbVAR process to be stationary, as they rely on the modified coefficient matrices 𝒜_{|·|}. The stationarity condition (2.12) for gbVAR(1) processes is illustrated in the following example.
The latter example illustrates that β_{kk} > 0 for all k = 1, …, K is a sufficient, but not a necessary, condition for (2.12) (and also for (2.14)) to hold.
For the derivation of asymptotic theory, (strong) mixing concepts are very helpful to quantify the serial dependence structure of time series processes; see, for example, Doukhan (1994), Dedecker et al. (2007), or Bradley (2005) for overviews. As given in Billingsley (1968), when defined on a suitable probability space (Ω, 𝒜, P), a process (Z_t, t ∈ ℤ) is called α-mixing, if for all sets F_1 ∈ σ(Z_t, Z_{t−1}, …) and F_2 ∈ σ(Z_{t+h}, Z_{t+h+1}, …) of the σ-algebras generated by the past and the future of the process, we have

|P(F_1 ∩ F_2) − P(F_1)P(F_2)| ≤ α_h with α_h → 0 as h → ∞.   (2.17)

If the right-hand side of the inequality (2.17) is replaced by 𝜑_h P(F_1), we get the definition of the 𝜑-mixing property. If α_h ≤ c·ρ^h (𝜑_h ≤ c·ρ^h) for some constants c > 0 and ρ ∈ (0, 1) and all h ∈ ℕ, the process (Z_t, t ∈ ℤ) is called geometrically α-mixing or geometrically 𝜑-mixing, respectively.
In contrast to Jacobs and Lewis (1983) or Weiß (2009a), who employ a suitable Markov chain representation and take a detour by first showing primitivity to deduce (strong) mixing properties, we can make direct use of the moving-average-type representation derived in Theorem 2.4 for gbVAR(p) processes to prove (geometric) α- and 𝜑-mixing.
Note that gbVAR processes are of autoregressive type, but they are nonlinear due to the random coefficient matrices. Nevertheless, we can show that the autocovariance structure of gbVAR processes coincides with that of classical VAR processes in the sense that the same Yule-Walker equations hold. For this purpose, we denote by Γ_X(h) = Cov(X_{t+h}, X_t), h ∈ ℤ, the autocovariance matrices of the gbVAR(p) process (X_t, t ∈ ℤ). Theorem 2.7 (Yule-Walker equations for gbVAR(p) models, h > 0). Let (X_t, t ∈ ℤ) be a K-dimensional gbVAR(p) process that fulfills (2.14). Then, for all h ∈ ℕ, h ≥ 1, we have

Γ_X(h) = Σ_{i=1}^p A^{(i)} Γ_X(h − i),   (2.18)

leading to the system of Yule-Walker equations

[Γ_X(1), …, Γ_X(p)] = [A^{(1)}, …, A^{(p)}] Γ̃_{X,p},   (2.19)

where Γ̃_{X,p} := Cov(X̃_{t−1}) denotes the (Kp × Kp)-dimensional Yule-Walker matrix of the stacked vector from (2.9). By replacing the autocovariances Γ_X(h) by sample versions Γ̂_X(h), the Yule-Walker equations can be used for parameter estimation using the well-known Yule-Walker estimators. Before these will be addressed in Section 2.4, we discuss possible identification issues in Section 2.3.
The derivation of a Yule-Walker-type equation for h = 0 is much more intricate. For this purpose, we will use Hadamard products denoted by '•', where A • B ∶= (a ij b ij ) i,j for two matrices A and B of the same dimensions.
Theorem 2.8 (Yule-Walker equation for gbVAR(p) models, h = 0). Let (X_t, t ∈ ℤ) be a K-dimensional gbVAR(p) process that fulfills (2.14). Then, for h = 0, Γ_X(0) satisfies the equation stated in (2.20). Note that the formula derived in Theorem 2.8 contains an expression that is similar to the classical Yule-Walker equation for h = 0 of VAR(p) processes (see e.g. Lütkepohl, 2005, Section 11.4, with the innovation covariance term suitably adjusted), plus some additional terms that contain "I_K • ⋯". Note that these additional terms only adjust the diagonal entries. They show up due to the random coefficients that control the selection mechanism in the gbVAR model. Alternatively, the last line in (2.20) can be rearranged to get (2.21). Note that the Hadamard multiplication in the last term on the right-hand side of (2.21) sets the diagonal of Σ_e (which is already determined by μ_e due to σ_{e,ii} = μ_{e,i}(1 − μ_{e,i})) to zero, and that the first term on the right-hand side of (2.21) contains only 𝒜 and μ_e, but not Σ_e. This is particularly useful to identify the off-diagonal elements of Σ_e, which can be used for estimation purposes as discussed in Section 2.4.
Example 2.9 (Special cases of Theorem 2.8). For p = 1, the formula derived in Theorem 2.8 simplifies considerably. If additionally all entries in A^{(1)} are non-negative, all terms containing A^{(−,1)} = 0_{K×K} vanish and we can replace A^{(1)}_{|·|} by A^{(1)}, which simplifies the expression further. In the following result, we derive expressions for one-step-ahead transition probabilities of gbVAR processes to reach a certain state in {0, 1}^K given the past values of the time series.
Lemma 2.10 (Transition probabilities of gbVAR(p) processes). Denote by δ_{ij} = 𝟙(i = j) the Kronecker delta and set p_{r_0} := P(e_t = r_0) for r_0 ∈ {0, 1}^K. Then, the transition probability P(X_t = s_0 | X_{t−1} = s_1, …, X_{t−p} = s_p) given the past p values of the time series can be expressed explicitly in terms of the row-wise selection probabilities in 𝒜_{|·|}, the signs of the entries of 𝒜 and the innovation probabilities p_{r_0}.
Remark 2.11 (Dependent multinomial selection). In view of the mutually independent multinomial selection mechanisms P_{t,k•}, k = 1, …, K, used to define gbVAR processes in Definition 2.1, a possible extension would be to allow for dependence between the multinomial distributions. Such an extension would not affect the Yule-Walker equations for h > 0 in Theorem 2.7, but it would affect those in Theorem 2.8 for h = 0. However, we do not follow this path as it would complicate things considerably and the benefit of such an extension of the gbVAR model class would be comparatively small. Furthermore, multinomial distributions allowing for dependence seem to be less developed; see, for example, Johnson et al. (1997, chap. 36) for a rather restrictive attempt in this direction. Note that the scalar selection mechanism used for the GDARMA approach proposed in Möller and Weiß (2020) coincides with a setup of mutually dependent multinomial selection mechanisms with perfectly correlated diagonal entries of the coefficient matrices.
For given p ∈ ℕ, using Theorem 2.7, the gbVAR(p) parameter matrices A^{(1)}, …, A^{(p)} are identified from the autocovariance function (ACF) if and only if the Yule-Walker matrix Γ̃_{X,p} = Cov(X̃_{t−1}) is non-singular, where X̃_t is defined in (2.9), such that we can re-arrange (2.19) to get

[A^{(1)}, …, A^{(p)}] = [Γ_X(1), …, Γ_X(p)] Γ̃_{X,p}^{−1}.   (2.25)

Now, let us discuss the case of a singular Yule-Walker matrix Γ̃_{X,p}. It is singular if and only if there exists a vector g ∈ ℝ^{Kp}, g ≠ 0_{Kp}, such that g′X̃_t is constant. As X̃_t takes values in {0, 1}^{Kp}, such a vector g can always be chosen from {−1, 0, 1}^{Kp}∖{0_{Kp}}, leading to g′X̃_t being either equal to 0 or equal to 1 for all t. For example, if one component (X_{t,k}, t ∈ ℤ) is constant over time, we have that e_k′X_t is constant in {0, 1}, where e_k is the kth K-dimensional unit vector. Hence, for general p ∈ ℕ and as X̃_{t−1} contains X_{t−1,k}, …, X_{t−p,k}, we can find p many linearly independent vectors g that have only one entry equal to one and all others equal to zero, leading to g′X̃_t constant in {0, 1}. If, for some h = 0, 1, …, p − 1, we have X_{t,i} = X_{t−h,j} or X_{t,i} = 1 − X_{t−h,j} for some i, j = 1, …, K, we can find a vector g having two non-zero entries such that g′X̃_t is constant equal to 0 or 1 for all t, respectively. Suppose we can find L ≤ Kp linearly independent vectors g_1, …, g_L with the property that g_l′X̃_t is constant. If L = Kp, we are in the purely deterministic case. If L < Kp, the underlying gbVAR(p) model is partially identified. After having identified L such linearly independent vectors g_1, …, g_L, we can find additional vectors g_{L+1}, …, g_{Kp} such that G := [g_1, …, g_{Kp}]′ is of full rank Kp. Moreover, the g_{L+1}, …, g_{Kp} can be chosen such that they have only one non-zero entry equal to 1, respectively. Together, we have

G X̃_t = (v_L′, Ỹ_t′)′,

where v_L is a constant vector with 0 and 1 entries and Ỹ_t is (Kp − L)-dimensional with Cov(Ỹ_t) having full rank.

Plugging-in X̃_t = G^{−1}(v_L′, Ỹ_t′)′ in the Yule-Walker matrix in (2.19) and multiplying from the right with a suitable selection matrix H that picks out the non-degenerate part Ỹ_t leads to a dimension-reduced system of Yule-Walker equations.
By defining Γ_{Ỹ,p} := Cov(Ỹ_t) and A_{Ỹ,p} := [A^{(1)}, …, A^{(p)}] G^{−1} H′, the dimension-reduced K × (Kp − L) parameter matrix A_{Ỹ,p} is identified by

A_{Ỹ,p} = Cov(X_t, Ỹ_{t−1}) Γ_{Ỹ,p}^{−1}.

Finally, we can obtain one candidate of [A^{(1)}, …, A^{(p)}] by right-multiplying A_{Ỹ,p} with (G^{−1}H′)′ to get a K × Kp parameter matrix. In the case where the matrix B is not invertible, which corresponds to at least one diagonal entry β_{kk} = 0 for some k = 1, …, K, the innovation distribution of those e_{t,k} is not identifiable. However, we can rearrange (2.20) using (2.21) to isolate {(𝟙_{K×K} − I_K)•Σ_e} on one side of the equation. Now, let M ≤ K be the number of zero diagonal elements of B. If M = K, we are in the special case (see e.g. Example 2.5(iv)) where no innovations enter the model at all and the distribution of e_t is clearly not identifiable.
If M < K, we define a (K − M) × K matrix F that picks those components k ∈ {1, …, K} where the diagonal elements β_{kk} of B are non-zero, such that FF′ has full rank K − M. Then, isolating B μ_e on one side of the equation in (2.7), left-multiplying with F and inserting F′F leads to an equation of the form (2.28) with an obvious notation for B_ẽ and μ_ẽ. Since B_ẽ has full rank, we can left-multiply the whole equation with B_ẽ^{−1} to identify μ_ẽ, which is the mean vector of ẽ_t, defined to consist of those e_{t,k}'s where the corresponding β_{kk}'s are non-zero. This also allows us to identify the diagonal elements of the covariance matrix Σ_ẽ of ẽ_t. Furthermore, after re-arranging (2.20) using (2.21) to isolate {(𝟙_{K×K} − I_K)•Σ_e} on one side of the equation, left-multiplication with F, right-multiplication with F′ and inclusion of F′F twice lead to an equation of the form (2.29). Again, since B_ẽ has full rank, we can left- and right-multiply the whole resulting equation with B_ẽ^{−1} to identify also the non-diagonal elements of Σ_ẽ of those e_{t,k}. Note that this identification is indeed sufficient, as the innovations e_{t,k} that are not identified do not enter the gbVAR(p) model at all.
The identification issues discussed above are illustrated in the following example.

Parameter Estimation in gbVAR Models
The joint distribution of a gbVAR process is fully determined by the marginal distribution of the i.i.d. innovations (e_t, t ∈ ℤ) and by the model parameters in 𝒜. In view of the second-order dependence structure of gbVAR processes, the Yule-Walker equations derived in Theorems 2.7 and 2.8 constitute an important link between the mean vector μ_X, the autocovariance function Γ_X and the gbVAR coefficients in 𝒜, which are all population quantities of the process (X_t, t ∈ ℤ), and the mean vector μ_e and covariance matrix Σ_e, which are population quantities of the innovation process. Here, μ_X and Γ_X(h) are easily estimable from a gbVAR data sample X_1, …, X_n by their sample versions

μ̂_X = (1/n) Σ_{t=1}^n X_t and Γ̂_X(h) = (1/n) Σ_{t=1}^{n−h} (X_{t+h} − μ̂_X)(X_t − μ̂_X)′,  h ≥ 0.   (2.31)

Provided that the stacked vector X̃_t from (2.9) has a non-singular covariance matrix, we can use the Yule-Walker equation system (2.19) to construct the well-known Yule-Walker estimator

[Â^{(1)}, …, Â^{(p)}] = [Γ̂_X(1), …, Γ̂_X(p)] Γ̂_{X̃,p}^{−1}   (2.33)

for [A^{(1)}, …, A^{(p)}] by replacing the ACF Γ_X by the sample ACF Γ̂_X, where Γ̂_{X̃,p} denotes the sample analog of the Yule-Walker matrix Γ̃_{X,p}. Otherwise, we have to apply the Yule-Walker estimation to the corresponding lower-dimensional process (Ỹ_t, t ∈ ℤ) obtained as in Section 2.3.
The corresponding estimator B̂ = diag(β̂_{11}, …, β̂_{KK}) is obtained from the estimated coefficient matrices via β̂_{kk} = 1 − Σ_{i=1}^p Σ_{l=1}^K |α̂^{(i)}_{kl}|, k = 1, …, K. (2.34) From Lemma 2.3, we get an estimator μ̂_e for the innovation mean μ_e by rearranging (2.7) and plugging-in the sample versions of A^{(i)}, A^{(−,i)}, B and μ_X to get

μ̂_e = B̂^{−1} ( (I_K − Σ_{i=1}^p Â^{(i)}) μ̂_X − Σ_{i=1}^p Â^{(−,i)} 𝟙_K ),   (2.35)

where Â^{(−,i)}, i = 1, …, p, are constructed from Â^{(i)} analogous to (2.6). In scenarios where (some) diagonal elements β̂_{kk} equal zero such that B̂ is no longer invertible, the corresponding μ_{e,k}'s are not identified, but the remaining mean parameters are still identified via (2.28); see also Remark 2.14. The diagonal of Σ_e, that is, I_K•Σ_e, can be estimated by

I_K•Σ̂_e = diag(μ̂_{e,1}(1 − μ̂_{e,1}), …, μ̂_{e,K}(1 − μ̂_{e,K})).   (2.36)

Moreover, by using the Yule-Walker equation (2.20) from Theorem 2.8, it is also possible to construct an estimator for the non-diagonal elements of Σ_e, that is, for (𝟙_{K×K} − I_K)•Σ_e. Such an estimator is obtained by replacing the last line of (2.20) by (2.21), separating (𝟙_{K×K} − I_K)•Σ_e on one side of the equation (achieved by left- and right-multiplication with B^{−1}) and replacing all population quantities on the other side of the equation by their sample versions proposed above. In cases where B̂ is not invertible due to (some) zero diagonal entries, the non-diagonal elements of Σ_ẽ are still identified; see the discussion around (2.29). Finally, the transition probabilities derived in Lemma 2.10 can be estimated in a similar fashion by replacing population quantities by the corresponding estimators to get estimated transition probabilities P̂(X_t = s_0 | X_{t−1} = s_1, …, X_{t−p} = s_p) (2.37) in the case where the random vectors e_t consist of mutually independent Bernoulli random variables. In the dependent case, the estimator for the off-diagonal elements of Σ_e, that is, (𝟙_{K×K} − I_K)•Σ_e, derived above can be used to allow also for dependent Bernoulli random variables and to incorporate their linear dependence.
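To make the estimation steps concrete, the following sketch (Python, assuming a data matrix X of shape n × K) computes the sample autocovariances from (2.31), the Yule-Walker estimator from (2.33) and the matrix B̂ from (2.34); it is a minimal illustration, not the authors' implementation.

```python
import numpy as np

def sample_acov(X, h):
    """Sample autocovariance matrix Gamma_hat(h) = Cov(X_{t+h}, X_t) from an (n x K) sample."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    return (Xc[h:].T @ Xc[:n - h]) / n

def yule_walker_gbvar(X, p):
    """Yule-Walker estimator [A_hat^(1), ..., A_hat^(p)] obtained by plugging the
    sample autocovariances into the system (2.19), exactly as for a classical VAR(p)."""
    # Block Toeplitz Yule-Walker matrix with blocks Gamma_hat(j - i), i, j = 1, ..., p
    Gamma_p = np.block([[sample_acov(X, j - i) if j >= i else sample_acov(X, i - j).T
                         for j in range(p)] for i in range(p)])
    rhs = np.hstack([sample_acov(X, h) for h in range(1, p + 1)])  # [Gamma_hat(1), ..., Gamma_hat(p)]
    return rhs @ np.linalg.inv(Gamma_p)

def beta_hat(A_hat):
    """Diagonal matrix B_hat with beta_kk = 1 - absolute row sums of A_hat, as in (2.34)."""
    return np.diag(1.0 - np.sum(np.abs(A_hat), axis=1))
```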

Remark 2.14 (Estimation outside of the parameter space). Estimation of the gbVAR parameters via the Yule-Walker estimator (2.33) may lead to an estimate that lies outside of the valid parameter space, such that (at least) for one row k_0 ∈ {1, …, K}, we have Σ_{i=1}^p Σ_{l=1}^K |α̂^{(i)}_{k_0 l}| > 1. (2.40) In this case, a constrained estimation in the spirit of restricted least-squares estimation for VAR models can be used (see e.g. Lütkepohl, 2005, Sect. 5.2). For this purpose, we construct a K* × (K²p + K) matrix C of rank K* by first defining auxiliary matrices c̃_{k_0}. These matrices c̃_{k_0} have only entries of −1 and 1 in row k = k_0 and zeros otherwise. Then, the matrix C combines the vectorized auxiliary matrices for every k_0 ∈ 𝒦 := {k_{0,1}, …, k_{0,K*}} by stacking them row-wise, and α̂ is the vectorized estimated parameter vector obtained from the Yule-Walker estimator (2.33). As we restrict the rows to have row sums (of the absolutes) equal to one, we set c = 𝟙_{K*}, which leads to the constrained estimator defined by

α̂_valid = α̂ + Δ̂,

where the correction term Δ̂ is chosen such that C α̂_valid = c holds. Re-vectorizing α̂_valid leads to the constrained parameter estimator Â_valid, and the corresponding valid matrix B̂_valid is obtained as described in (2.34). Since B̂_valid now contains K* zero diagonal entries, the innovation process (e_t, t ∈ ℤ) is not identified for those rows. The remaining entries of μ_e can be estimated by a reduced system of dimension K − K* consisting of all rows with β̂_valid,kk ≠ 0; see also the discussion after (2.35).
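The sketch below (Python) is not the constrained estimator of Remark 2.14, but a much cruder fallback for illustration only: rows whose absolute entries sum to more than one are simply rescaled so that the sum equals one, which likewise forces the corresponding innovation probability β_kk to zero.

```python
import numpy as np

def rescale_invalid_rows(A_hat):
    """Crude fallback, not the constrained estimator of Remark 2.14: rescale every
    row of A_hat = [A_hat^(1), ..., A_hat^(p)] whose absolute entries sum to more
    than one, so that its absolute row sum equals one and beta_kk becomes zero."""
    A_valid = A_hat.astype(float).copy()
    row_sums = np.sum(np.abs(A_valid), axis=1)
    for k in np.where(row_sums > 1.0)[0]:
        A_valid[k] /= row_sums[k]
    B_valid = np.diag(1.0 - np.sum(np.abs(A_valid), axis=1))
    return A_valid, B_valid
```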

SIMULATION STUDY
We investigate the performance of Yule-Walker-based estimators in gbVAR models as described in Section 2.4 by Monte Carlo simulations. To illustrate the estimation performance in several gbVAR model setups, we consider (a) the (average) mean squared error (MSE) of different parameter estimators and (b) the (average) mean absolute deviation error (MADE) of transition probability estimators. For this purpose, we consider three different gbVAR(p) setups with orders p = 1, 2 and dimensions K = 3, 4 for sample sizes n = 100, 500, 1000 to be able to judge the performance of parameter estimation in several gbVAR model specifications. Precisely, we consider three data generating processes (DGP1), (DGP2) and (DGP3) of gbVAR(1) and gbVAR(2) type covering these orders and dimensions. For all DGPs, the corresponding innovation process (e_t, t ∈ ℤ) consists of K independent Bernoulli processes (e_{t,k}, t ∈ ℤ), k = 1, …, K, with π_{e,k} = P(e_{t,k} = 1), leading to diagonal Σ_e matrices with diagonal entries π_{e,k}(1 − π_{e,k}), k = 1, …, K. Note that we make use of positive as well as negative entries in A^{(1)} and A^{(2)}. Hence, these coefficient matrices are related to the diagonal matrix B through the requirement that the absolute entries of each row of 𝒜 = [A^{(1)}, …, A^{(p)}, B] sum up to one. In Section 3.2, we address also the estimation of the off-diagonal elements of a non-diagonal variance-covariance matrix Σ_e.
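A minimal Monte Carlo loop of this kind is sketched below (Python); it reuses the simulate_gbvar1 and yule_walker_gbvar sketches from earlier in this article and uses a hypothetical gbVAR(1) parameter matrix rather than one of (DGP1)-(DGP3).

```python
import numpy as np

# Hypothetical gbVAR(1) coefficients (K = 3), not one of the DGPs from the study.
A_true = np.array([[ 0.4, -0.2,  0.1],
                   [ 0.1,  0.3, -0.2],
                   [-0.1,  0.2,  0.3]])
pi_e = np.array([0.3, 0.5, 0.7])        # innovation probabilities P(e_{t,k} = 1)

mses = []
for rep in range(1000):                 # Monte Carlo replications
    X = simulate_gbvar1(500, A_true, pi_e, seed=rep)
    A_hat = yule_walker_gbvar(X, p=1)
    mses.append(np.mean((A_hat - A_true) ** 2))   # entry-wise MSE, averaged over entries

print(np.mean(mses))                    # average MSE over all replications
```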

Average MSE Estimation Performance
To measure the estimation performance, we calculate averages of the entry-wise mean squared errors (MSE) of the estimators Â^{(1)} and Â^{(2)}, μ̂_X, μ̂_e and B̂, respectively, based on 1000 Monte Carlo replications for each DGP and each sample size. The simulation results are presented in Table I. It can be seen that the estimation performance improves with increasing sample size for all estimators and all DGPs. In comparison, the estimation of the innovation mean vector μ_e is least precise, with an average mean squared error around 10 percent. This phenomenon can be explained by formula (2.35), which requires the inversion of the diagonal matrix B̂. Due to rather small diagonal entries of B, already small deviations in B̂ might lead to a less stable estimation of μ_e and to a larger MSE.

Average MSE Estimation of Non-diagonal Σ e
As discussed in Remark 2.13, the imposed diagonality of B does generally allow us to identify also the non-diagonal entries of Σ_e. These can be estimated by using the Yule-Walker equation for h = 0 from Theorem 2.8 in conjunction with (2.21), where the corresponding estimator is obtained by replacing all population quantities by their sample analogs as described in Section 2.4.
In Table II, we report the MSE for the diagonal and non-diagonal elements of Σ̂_e for different sample sizes. It can be seen that the MSE decays with increasing sample size. However, for a small sample size of n = 100, the MSE of the off-diagonal element σ̂_{e,12} is huge at 16.7383. This value is caused by gross mis-estimation in a few Monte Carlo replications, in which the estimated parameters already show a large MSE. Due to a matrix inversion, this leads to unstable and unreliable estimates. Nevertheless, this issue disappears for larger sample sizes, such that the joint distribution of the innovation process in form of the non-diagonal entries of Σ_e can be consistently estimated. However, as B is imposed to be diagonal for identification reasons, the non-diagonal entries of Σ_e do not have large effects on the stochastic properties of the model. Hence, in practice, it seems recommendable to avoid the estimation of a non-diagonal Σ_e due to potentially unstable estimation results.

Average MADE Estimation Performance
An alternative concept to measure the estimation performance in gbVAR models is based on the (average) mean absolute deviation error (MADE) of transition probability estimators. Note that a direct comparison of prediction probabilities in [0, 1] and outcomes in {0, 1} is not straightforward and might be misleading when judging the parameter estimation performance. Hence, we compare the (one-step-ahead) population transition probabilities with the corresponding estimated transition probabilities and consider

|p_{s_0|s_1,…,s_p} − p̂_{s_0|s_1,…,s_p}|,   (3.1)

with p_{s_0|s_1,…,s_p} = P(X_t = s_0 | X_{t−1} = s_1, …, X_{t−p} = s_p) as obtained in Lemma 2.10 and p̂_{s_0|s_1,…,s_p} = P̂(X_t = s_0 | X_{t−1} = s_1, …, X_{t−p} = s_p) as constructed in (2.37) for the special case where e_{t,1}, …, e_{t,K} are independent such that p_{r_0} = P(e_t = r_0) = ∏_{k=1}^K P(e_{t,k} = r_{0,k}). Mainly, there are two possibilities to use (3.1) to judge the average estimation accuracy in gbVAR models. The first one considers the absolute deviations of the transition probabilities according to their actual appearances in the Monte Carlo sample under consideration, that is, averaged over the state sequences as they occur in the simulated data. The second option is to calculate the absolute deviations of the transition probabilities averaged over all possible states s_0, …, s_p in the state space {0, 1}^K.

REAL DATA APPLICATION: PM10 FINE DUST ALARMS IN STUTTGART

The EU air quality regulations restrict the concentration of particulate matter in the air as well as the number of days per year on which limit values may be exceeded. For particulate matter PM10 (coarse particles with a diameter between 2.5 and 10 μm), the daily limit value is 50 μg/m³. Hence, whenever the amount of PM10 exceeds the threshold of 50 μg/m³ at a certain monitoring station, this causes a 'fine dust alarm'. Thus, for each monitoring station, this results in a binary sequence with states 'exceedance' and 'no exceedance'. In fact, the current public discourse centers to a large extent on whether the threshold is exceeded or not, and less on the actual amount of fine dust measured.
In view of these EU regulations, Stuttgart, Germany, is a particularly prominent city where air pollution generally is a major problem. The reasons for these problems are essentially twofold. On the one hand, they can be explained by the geographic location in a valley, leading to poor air exchange in the city area. On the other hand, the main industry, such as automobile companies and their suppliers as well as the financial industry, is located close to the city center. Due to the restricted space for expansion in the valley, many people live in suburbs of Stuttgart and have to commute to their workplaces. The commuting traffic concentrates on a few main traffic routes, which are highly frequented during rush hours. Hence, large portions of particulate matter in the air in and around Stuttgart are caused by individual mobility.
A first inspection of the data in Figure 2 shows that fine dust alarms tend to occur in clusters, indicating serial and cross-sectional dependence. However, none of the sequences shows long runs of fine dust alarms, but rather long runs without any alarm. Moreover, fine dust alarms are more likely to occur in winter. This is due to the fact that stationary temperature inversions, favored by the topography, hinder the vertical air exchange. Hence, after the analysis of the whole data set, we will also run the analysis on the subsample of the data collected in the fine dust alarm period (October 15-March 15).
First, when considering the whole data set, one station (NRT, k = 5) shows considerably more exceedances in comparison to the other stations, with fine dust alarms occurring in about 13% of the days in the considered time period. The means of all other stations lie around 3%, indicating that only few fine dust alarms are detected there. Overall, this is not surprising, as NRT is located at one of the most frequented roads of Stuttgart, where high buildings on one side of the road hinder the air exchange and favor air pollution. In contrast, BC, with the smallest value of 0.0227, is located at an access road to the city outside of the city center. Now, to study the serial dependence in the data, we aim to fit a gbVAR(p) model. As gbVAR processes satisfy standard Yule-Walker equations and can be estimated by Yule-Walker estimators as described in (2.33), we can make use of classical order selection criteria such as Hannan-Quinn (HQ) or BIC to determine an appropriate order p of the fitted gbVAR process. Whereas HQ selects a more parsimonious model with p = 1, BIC leads to p = 2. To decide which model fits best in terms of prediction performance, we use the receiver operating characteristic (ROC) curve and the corresponding area under the curve (AUC), where an AUC near one indicates good prediction performance. For this purpose, similar to Section 3.3, we make use of transition probability estimators P̂(X_t = s_0 | X_{t−1} = s_1, …, X_{t−p} = s_p) as constructed in (2.37) to estimate the transition probabilities for each station, which allows us to compute the ROC curves and AUC values. In Table IV, we show the resulting component-by-component AUC values and their overall means for model orders p ∈ {1, 2}. Both models show a good prediction performance with AUC values near one. However, the additional benefit of fitting a gbVAR(2) model in comparison to a more parsimonious gbVAR(1) model is minor. Hence, we use a gbVAR(1) model in the following to further analyze the PM10 data set. In this case, Yule-Walker estimation leads to an estimated parameter matrix having Σ_{l=1}^K |α̂_{5l}| = 1.1451 > 1 for k = 5, corresponding to NRT. Hence, in view of Remark 2.14, we have to use constrained estimation, leading to the estimated parameter matrix Â_valid shown in the left panel of Figure 6, which displays heatmaps of the estimated coefficient matrices for fitted gbVAR(1) processes on the whole PM10 data sample (Â_valid, left panel) and on the fine dust alarm period (Â_FDAP, right panel). The absolute eigenvalues of Â_valid compute to {0.7092, 0.2933, 0.2933, 0.1143, 0.1143, 0.0019}, such that the fitted gbVAR(1) model is stationary.
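The AUC comparison can be reproduced along the following lines (Python, using scikit-learn's roc_auc_score); the arrays below are random stand-ins for the observed alarms and the one-step-ahead probabilities estimated via (2.37), since the actual data and fitted values are not reproduced here.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.1, size=(800, 6))    # stand-in for the six observed alarm series
# Stand-in one-step-ahead alarm probabilities; in the application these come from (2.37).
p_hat = np.clip(0.08 + 0.5 * x + rng.normal(0.0, 0.05, size=x.shape), 0.0, 1.0)

# Component-by-component AUC values and their overall mean (the quantities shown in Table IV).
auc = [roc_auc_score(x[:, k], p_hat[:, k]) for k in range(x.shape[1])]
print(np.round(auc, 3), round(float(np.mean(auc)), 3))
```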
In each row of Â_valid, we can see which past states at time t − 1 of the six monitoring stations (fine dust alarm or not) affect the state at time t. For example, with 51.33% probability, NRT takes the same value as the day before. In contrast, the largest entry in row k = 4 (HS) is 0.2509 in the second column, corresponding to BC, such that HS takes the value of BC from the day before with a probability of about 25%. Note also, as can be seen in the left panel of Figure 6, that the fitting of a gbVAR model leads to some negative coefficients in Â_valid. From a modeling perspective, this naturally leads to more flexibility in comparison to models that allow only for non-negative coefficients.
For identification purposes, we impose B to be diagonal; see also Remark 2.13. Since we constrained the estimation to achieve Σ_{l=1}^K |α̂_{5l}| = 1 and set β̂_{55} = 0 for NRT, the effective innovation process is of reduced dimension K̃ = 6 − 1 = 5. Hence, as described in Section 2.4, we estimate the innovation parameters from the corresponding reduced system. The diagonal entries of B̂ indicate how often the corresponding innovation terms are selected. For example, for SG, the innovation term enters the gbVAR model in about 40% of the days, whereas this happens only in about 17% of the days for AKP.
By using formulae (2.35) and (2.36), we can estimate the mean vector and the variances of the innovations for the identified components.
Now, we conduct the same analysis as above, but only for the data collected in the fine dust alarm period (FDAP, October 15-March 15). First, we obtain the sample mean vector μ̂_{X,FDAP} = (0.08981, 0.04854, 0.08252, 0.08010, 0.25243, 0.05340)′.
As there are only very few alarms outside of the FDAP, the sample mean vector μ̂_{X,FDAP} has entries that are about twice as large as those of μ̂_X. Now, in contrast to the analysis of the whole sample, both order selection criteria HQ and BIC select the order p = 1, and a corresponding AUC analysis (not reported) again shows only minor improvements when using order p = 2 in comparison to the more parsimonious gbVAR(1) model. For the FDAP, again a constrained estimation has to be carried out, leading to the estimated parameter matrix Â_FDAP as shown in the right panel of Figure 6. The absolute eigenvalues of Â_FDAP compute to {0.6845, 0.3070, 0.3070, 0.1147, 0.1147, …}. In comparison to the analysis of the whole data sample, a pronounced change occurs only in the estimated mean vectors when restricting the analysis to the FDAP. In particular, μ̂_{X,FDAP} is estimated about twice as large as μ̂_X.
In contrast, the estimated coefficient matrices Â_valid and Â_FDAP turn out to be very similar.

CONCLUSION
We consider vector-valued extensions of gbAR processes introduced by Jentsch and Reichmann (2019) to model multi-variate binary time series data with potentially negative model parameters. We define generalized binary vector AR (gbVAR) models of order p ∈ ℕ and provide a full description of the gbVAR model class. We derive stochastic properties of gbVAR processes including formulae for the mean vector, stationarity conditions, moving-average representations, geometric mixing properties, Yule-Walker equations and transition probabilities. Possible identification issues and parameter estimation in gbVAR models based on Yule-Walker estimators are discussed. In a simulation study, the estimation performance of Yule-Walker estimators and related estimators is analyzed in several regards indicating good finite sample properties. In a real data application, we fit gbVAR processes to binarized PM 10 data from Stuttgart, Germany. The estimated gbVAR(1) model contains positive as well as negative coefficients to capture the serial dependence in the data and proves to yield accurate predictions.