Stationary count time series models

During the last 20–30 years, there has been remarkable growth in interest in approaches for stationary count time series. We consider popular classes of models for such time series, including thinning-based models, conditional regression models, and Hidden-Markov models. We review and compare important members of these model families with regard to stochastic properties such as the dispersion and autocorrelation structure. Our survey covers univariate and multivariate count data, as well as unbounded and bounded counts. We also discuss an illustrative data example. Besides this critical presentation of the current state-of-the-art, some existing challenges and opportunities for future research are identified.


| INTRODUCTION
The area of time series analysis has attracted a lot of interest in research and practice during the last 50 years. Many textbooks are meanwhile available on this topic, with the pioneering work being the famous book by Box and Jenkins (1970). However, most of these textbooks exclusively concentrate on real-valued time series, despite the fact that in applications, we are often faced with different types of data. Especially count time series are often monitored in practice, that is, time series consisting of discrete and quantitative observations from the set of non-negative integers, N0 = {0, 1, …}. Accordingly, models and methods for count time series have also been covered by a large number of research articles during the last few decades, but they have not yet entered the general textbook literature. As a consequence, they are still not well communicated to a broader audience. As a step towards greater visibility of this topic, this article presents a compact overview of the current state-of-the-art of models for stationary count time series. Certainly, there has also been research on nonstationary count time series or on other types of discrete-valued time series (see the book by Weiß (2018) for a more comprehensive survey), but the above focus offers an excellent starting point for discovering this fascinating and important area.
In what follows, we assume x_1, …, x_T to be a univariate time series originating from a stationary count process (X_t)_{t∈Z}, where Z = {…, −1, 0, 1, …}. In the case of multivariate counts, bold letters are used instead, that is, x_t or X_t, respectively. For identifying appropriate types of models for these data, we have to check the characteristics of the considered count time series. An important distinction is, for example, the one between bounded and unbounded counts: if there exists an upper limit n ∈ N that can never be exceeded, then only models designed for the bounded range {0, …, n} are relevant. Otherwise, we are concerned with unbounded counts, having the full set N0 as their range. Next, we have to check marginal characteristics like the dispersion and zero probability (see Box 1 for a brief survey) as well as serial characteristics like the autocorrelation structure. Based on these findings, we define a set of candidate models for the subsequent statistical model fitting and diagnostics.
As an example, let us have a look at a count time series (length T = 242) about the weekly number of sales of a soap product in a supermarket. The data are taken from MacDonald and Zucchini (2016), and a summary is provided by Figure 1. Since there is no specified upper bound for the number of sales, we treat the counts as being unbounded. They have the sample mean ≈ 5.44, whereas the variance is about 2.8 times larger, taking the value ≈ 15.40. So the counts exhibit a lot of overdispersion, which is also visible from the plot of the sample probability mass function (PMF) in Figure 1. The sample autocorrelation function (ACF) shows weak autocorrelation (the lag-1 value equals ρ(1) ≈ 0.392), which quickly decreases with increasing time lag h. Actually, the partial ACF (PACF) in Figure 1 has a significant value only at lag 1, indicating that the true data-generating process (DGP) might have a first-order autoregressive (AR) structure. So relevant candidate models for the sales count time series might be found within the class of conditionally linear models (Grunwald, Hyndman, Tedesco, & Tweedie, 2000), and the observed (P)ACF characteristics shall help us to identify them.
Possible models to be used as candidates for the model fitting of a stationary count time series might be found in the large family of thinning-based models, which is surveyed in Section 2. Another popular class are conditional regression models for count time series, see Section 3 for an overview. Furthermore, also the Hidden-Markov models (HMMs) discussed in Section 4 are commonly used for count time series. For each of these model families, important members and their stochastic properties are presented. In addition, common features across and differences between these families are discussed. In Sections 2-4, we also continue the data analysis of the above sales count time series, and briefly address some further data applications. Finally, Section 5 concludes and outlines current research directions.

BOX 1 POSSIBLE PROPERTIES OF COUNT DISTRIBUTIONS
The "normal" distribution for a count random variable X having the unbounded range N0 is the Poisson (Poi) distribution. It is characterized, among others, by the so-called equidispersion property: its variance is equal to the mean, V[X] = E[X]. Furthermore, its zero probability is related to the mean by the equation P(X = 0) = exp(−E[X]). If, in contrast, X has a variance being larger (smaller) than the mean, then X is said to exhibit overdispersion (underdispersion). Analogously, if P(X = 0) is larger (smaller) than exp(−E[X]), we refer to X as being zero-inflated (zero-deflated). A similar classification is done for bounded counts defined on {0, …, n}, but then with respect to the binomial (Bin) distribution instead of the Poisson distribution. In particular, since the binomial variance satisfies V[X] = E[X]·(1 − E[X]/n) < E[X], dispersion for bounded counts is judged relative to this binomial benchmark; a variance exceeding it is referred to as extra-binomial variation. Several count distributions have been proposed in the past to capture the above features. For example, a common choice for unbounded counts with overdispersion is the negative-binomial (NB) distribution, whereas the beta-binomial (BB) distribution is often used for bounded counts with extra-binomial variation, see Johnson, Kemp, and Kotz (2005) for a comprehensive overview. If one wants to capture underdispersion as well, then Conway-Maxwell extensions of the Poisson or binomial distribution, respectively, constitute a possible solution (Sellers, Swift, & Weems, 2017).
Such dispersion features are also observed for multivariate count data, and several multivariate extensions of parametric models for univariate counts have been developed, see the survey by Johnson, Kotz, and Balakrishnan (1997). Additional complexity might be caused by a sophisticated cross-correlation structure or by heterogeneous marginal properties. A possible solution is to use copulas to combine the different types of marginal distributions (Genest & Nešlehová, 2007; Joe, 1997).

| THINNING-BASED MODELS FOR COUNT TIME SERIES
In data applications with stationary count time series, one often observes an autocorrelation structure similar to that of the ordinary autoregressive moving-average (ARMA) models (Box & Jenkins, 1970),

Y_t = α_1 Y_{t−1} + … + α_p Y_{t−p} + ε_t + β_1 ε_{t−1} + … + β_q ε_{t−q}, (1)

where the Y_t and ε_t are real-valued random variables. Model (1), however, cannot be applied to count processes, because the multiplications in (1) do not preserve the integer range ("multiplication problem"). Thus, there has been a lot of research effort to modify the basic ARMA model (1) in such a way that the resulting model generates only count data, but still with an ARMA-like ACF. For the models to be discussed in this section, the idea is to directly substitute the multiplications in (1) by an integer-valued counterpart, a so-called "thinning operation." The resulting models are then referred to as integer-valued ARMA (INARMA) models. There exists a large variety of possible thinning operations, see the recent survey by Scotto, Weiß, and Gouveia (2015), but in what follows, we shall mainly concentrate on the binomial thinning operation "∘" due to Steutel and van Harn (1979). For α ∈ [0; 1], and applied to the count random variable X, the newly generated count α∘X arises from X by binomial thinning if the conditional distribution of α∘X given X is the binomial distribution Bin(X, α). So in contrast to the ordinary multiplication "α·", the thinning "α∘" constitutes a random operation that generates a count value from {0, …, X}. But since both operations have the same mean, E[α∘X] = α E[X], we can use binomial thinning as an integer-valued substitute of the multiplication. The name "thinning" expresses the shrinking effect of "α∘". As discussed in Section 2.1.1 of Weiß (2018), we can interpret α∘X as the random number of "survivors" from a "population" of size X, where each individual survives with probability α. This type of interpretation carries over to the INARMA models to be discussed now.
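To make the operation concrete, the following Python sketch draws α∘X as a Bin(X, α) variate by simulating the X individual survival trials; the function name and interface are ours, purely for illustration.

```python
import random

def binomial_thinning(x, alpha, rng=random):
    """Draw alpha ∘ x: each of the x individuals survives independently
    with probability alpha, so the result follows Bin(x, alpha)
    conditionally on x."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

random.seed(1)
survivors = binomial_thinning(20, 0.4)  # some count in {0, ..., 20}
```

Averaging many such draws recovers the mean-preservation property E[α∘X] = αE[X] mentioned above.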

| INARMA models for unbounded counts
The first INARMA model was proposed by McKenzie (1985) and Al-Osh and Alzaid (1987): the INAR(1) model defined by the recursion X_t = α∘X_{t−1} + ϵ_t with α ∈ [0; 1), which constitutes an integer-valued counterpart to the ordinary first-order AR model (i.e., model (1) with p = 1 and q = 0). Here, the innovations (ϵ_t)_{t∈Z} are independent and identically distributed (i.i.d.) count random variables, and ϵ_t is generated independently of (X_s)_{s<t}. Since "α∘" is a random operation, further model assumptions are necessary, namely that all thinning operations are performed independently of each other and of (ϵ_t), and that the thinning operation at time t is independent of (X_s)_{s<t}. The term "α∘X_{t−1}" can be interpreted as the number of survivors of generation t − 1, and "ϵ_t" expresses the immigration to generation t. Similarly, Al-Osh and Alzaid (1988) and McKenzie (1988) defined an integer-valued counterpart to the MA(1) model (so (1) with p = 0 and q = 1), the INMA(1) model, by X_t = ϵ_t + β∘ϵ_{t−1}. Both types of first-order models have properties analogous to those of their (Gaussian) ARMA complement: Poi(λ)-distributed innovations (λ > 0) lead to Poi(μ)-distributed observations (μ = λ/(1 − ρ(1))), where the ACF ρ(h) is given by ρ(h) = α^h for the INAR(1) model, and where ρ(h) = 0 for h ≥ 2 for the INMA(1) model. Both the INAR(1) and the INMA(1) model can be embedded into higher-order AR-type or MA-type models, respectively. The INMA(q) model is defined by

X_t = β_0 ∘_t ϵ_t + β_1 ∘_t ϵ_{t−1} + … + β_q ∘_t ϵ_{t−q}, (2)

where the q + 1 thinnings at time t (the time index t has been added below the operator "∘" in (2) for clarification) are performed independently of each other. For the above INMA(1) model, we set β_0 = 1.
FIGURE 1 Sales count time series: plot against time t in top row; plots of sample probability mass function against x as well as (partial) autocorrelation function against lag h in bottom row
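Under the Poisson assumption, an INAR(1) path can be simulated directly from the recursion X_t = α∘X_{t−1} + ϵ_t. The following sketch (helper names are ours; the Poisson sampler uses Knuth's product-of-uniforms method) starts the chain near its stationary mean λ/(1 − α):

```python
import math, random

def thin(x, alpha, rng):
    # binomial thinning: number of survivors among x trials
    return sum(1 for _ in range(x) if rng.random() < alpha)

def rpois(mu, rng):
    # Poisson sampler via Knuth's product-of-uniforms method
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate_poi_inar1(T, alpha, lam, seed=0):
    """Simulate X_t = alpha ∘ X_{t-1} + eps_t with i.i.d. Poi(lam)
    innovations; the chain is started near its stationary mean."""
    rng = random.Random(seed)
    x = rpois(lam / (1 - alpha), rng)
    path = []
    for _ in range(T):
        x = thin(x, alpha, rng) + rpois(lam, rng)
        path.append(x)
    return path

path = simulate_poi_inar1(500, alpha=0.5, lam=2.0)
```

With α = 0.5 and λ = 2, the sample mean of such a path should be close to λ/(1 − α) = 4, and the sample ACF close to ρ(h) = 0.5^h.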
But if q > 1, further model assumptions are necessary to uniquely define the INMA(q) model: the conditional distribution of (β_0 ∘_t ϵ_t, β_1 ∘_{t+1} ϵ_t, …, β_q ∘_{t+q} ϵ_t) given ϵ_t has to be specified. This can be done in quite different ways, leading to different types of INMA(q) models, see Brännäs and Hall (2001) for details. For example, one may require conditional independence between the thinnings experienced by ϵ_t, as it was suggested by McKenzie (1988). But in any case, Poi-distributed innovations still go along with Poi-distributed observations, and the ACF is of MA(q)-type, that is, ρ(h) = 0 for h > q. For the INAR(p) model, defined by the recursion

X_t = α_1 ∘_t X_{t−1} + … + α_p ∘_t X_{t−p} + ϵ_t,

the situation is more complex. If assuming the conditional distribution of (α_1 ∘_{t+1} X_t, …, α_p ∘_{t+p} X_t) given X_t to be multinomial, as suggested by Alzaid and Al-Osh (1990), Poi-distributed innovations cause Poi-distributed observations, but the ACF differs from the ACF of an ordinary AR(p) model. Therefore, the proposal of Du and Li (1991) is usually preferred in practice, because their assumption of conditional independence for the above thinnings implies that the INAR(p)'s ACF can be computed by solving the well-known Yule-Walker equations

ρ(h) = α_1 ρ(h − 1) + … + α_p ρ(h − p) for h ≥ 1. (4)

For this "DL-INAR(p) model," however, the above Poi-preservation is lost. Generally, the observations' variance σ² has to be computed from the innovations' variance σ²_ϵ (see Du & Li, 1991, for the exact relation), whereas we have the well-known relation μ = μ_ϵ/(1 − α_1 − … − α_p) between the observations' and innovations' mean. The transition probabilities of the DL-INAR(p) process are computed as a convolution of p binomial distributions (caused by the thinnings) with the innovations' distribution. For example, if p = 1 and α_1 = α, we have

P(X_t = x | X_{t−1} = y) = Σ_{j=0}^{min{x, y}} C(y, j) · α^j (1 − α)^{y−j} · P(ϵ_t = x − j),

where C(y, j) denotes the binomial coefficient. The required PMF for ϵ_t follows from the actual model assumption: for the Poi-INAR(1) model, we take the PMF of the Poi(λ) distribution, whereas the NB-INAR(1) model assumes the innovations to follow NB(n, π) (also see Box 1).
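The transition probability for p = 1 is just the convolution described above and can be evaluated numerically; the sketch below (our own helper, with an arbitrary innovation PMF passed in as a function) also makes it easy to check that each row of the transition matrix sums to one:

```python
import math

def inar1_transition_pmf(x_new, x_old, alpha, innov_pmf):
    """P(X_t = x_new | X_{t-1} = x_old) for an INAR(1) chain:
    convolution of the Bin(x_old, alpha) survivor count with the
    innovation distribution (innov_pmf maps k to P(eps = k))."""
    total = 0.0
    for j in range(min(x_new, x_old) + 1):
        surv = (math.comb(x_old, j) * alpha ** j
                * (1 - alpha) ** (x_old - j))
        total += surv * innov_pmf(x_new - j)
    return total

lam = 2.0
poi_pmf = lambda k: math.exp(-lam) * lam ** k / math.factorial(k)
p = inar1_transition_pmf(3, 5, 0.4, poi_pmf)  # Poi-INAR(1) example
```

Summing the probabilities over x_new for fixed x_old approximates 1, since the tail beyond any large cutoff is negligible for light-tailed innovations.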
The latter approach allows one to model an overdispersed, AR(1)-like count time series, which could be relevant for the sales count data introduced in Section 1; we shall continue this data example later in Section 2.4. Several further possibilities for choosing the innovations' distribution have been discussed in the literature; especially members of the compound-Poisson family proved to be suitable for the INAR(1) model (Schweer & Weiß, 2014). Many modifications and generalizations of the basic INARMA models, especially of the widely used INAR(1) model, have been proposed in the literature. A possible approach is to replace the binomial thinning operation by another type of thinning. Latour (1998) considered a family of generalized thinning operations, which includes the popular NB-thinning of Ristić, Bakouch, and Nastić (2009) as a special case. Another modification is to assume the thinning parameters to be random themselves (random coefficient thinning), see Zheng, Basawa, and Datta (2006, 2007) and Gomes and Canto e Castro (2009) for details. For example, if the binomial thinning's parameter is beta-distributed, then we obtain the BB-thinning operation, which allows one to generate additional dispersion for the considered INARMA process. For the convolution-closed models proposed by Joe (1996), the thinning operations are defined such that the observations and innovations stem from the same distribution family. A broad subclass of these models was recently developed by Sellers, Peng, and Arab (2019); it relies on Conway-Maxwell extensions of the Poisson distribution (for the observations and innovations) and of the binomial distribution (for the thinning operation). See Scotto et al. (2015) and Weiß (2018) for a comprehensive discussion of alternative thinning operations, and Joe (2019) for more background on convolution-closed models.

| Thinning-based models for bounded counts
The INARMA models discussed in the previous Section 2.1 can only be applied to time series consisting of unbounded counts. If a thinning-based model for bounded counts is required, with a given upper bound n ∈ N, then modifications are required to prevent X_t > n from happening. As an example, McKenzie (1985) proposed to modify the INAR(1) model X_t = α∘X_{t−1} + ϵ_t by replacing the innovation term ϵ_t by an additional thinning, β∘(n − X_{t−1}). So the resulting binomial AR(1) model satisfies

X_t = α∘X_{t−1} + β∘(n − X_{t−1}), (7)

where the thinning parameters α, β ∈ (0; 1) can be expressed as β = π(1 − ρ) and α = β + ρ with π ∈ (0; 1) and ρ chosen such that α, β ∈ (0; 1). Model (7) constitutes a finite Markov chain whose transition probabilities arise as a convolution of the two binomial distributions Bin(x_{t−1}, α) and Bin(n − x_{t−1}, β). The parametrization by (π, ρ) is useful regarding the basic stochastic properties of model (7): its stationary marginal distribution is Bin(n, π), and the ACF equals ρ(h) = ρ^h. The parametrization by (α, β), in contrast, is easier to interpret, with "α∘X_{t−1}" again being the survival term, and "β∘(n − X_{t−1})" being the revival term. Weiß (2009b) extended the binomial AR(1) model (7) to a pth-order autoregression by using a random mixture similar to that of the discrete ARMA model discussed in Section 4.2. The resulting binomial AR(p) model satisfies the pth-order Yule-Walker equations (4). Model (7) was also modified to capture extra-binomial variation. Weiß and Kim (2014) replaced the binomial thinning operations in (7) by beta-binomial ones, that is, the thinning parameters α, β are randomly drawn from a beta distribution, which causes additional (conditional) variance. Möller, Weiß, Kim, and Sirchenko (2018) developed several extensions for zero-inflated bounded counts, with different types of zero patterns. Many further modifications have been proposed in the literature; Section 5 briefly discusses some of these recent developments, also regarding the other model families discussed in Sections 2-4.
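A binomial AR(1) path stays in {0, …, n} by construction, since survivors and revivals cannot exceed X_{t−1} and n − X_{t−1}, respectively. A minimal simulation sketch (function names are ours), initialized near the stationary Bin(n, π) distribution with π = β/(1 − α + β):

```python
import random

def simulate_binomial_ar1(T, n, alpha, beta, seed=0):
    """Simulate X_t = alpha ∘ X_{t-1} + beta ∘ (n - X_{t-1}):
    'survival' among the current successes plus 'revival' among the
    remaining n - X_{t-1} units, so X_t always stays in {0, ..., n}."""
    rng = random.Random(seed)
    def thin(x, p):
        return sum(1 for _ in range(x) if rng.random() < p)
    pi = beta / (1 - alpha + beta)   # stationary success probability
    x = thin(n, pi)                  # start near Bin(n, pi)
    out = []
    for _ in range(T):
        x = thin(x, alpha) + thin(n - x, beta)
        out.append(x)
    return out

series = simulate_binomial_ar1(300, n=17, alpha=0.72, beta=0.10)
```

The parameter values mirror the price-stability example below; the resulting ACF should decay roughly like ρ(h) = (α − β)^h.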
Let us conclude this section by briefly referring to a data example. Weiß and Kim (2014) considered a group of n = 17 countries in the Euro area and determined the number of countries with stable prices among them (per month for the period 2000–2006). A plot of the resulting bounded-counts time series is shown in Figure 2. From the corresponding sample ACF, an AR-like autocorrelation structure becomes clear. Furthermore, Weiß and Kim (2014) showed that the data exhibit significant extra-binomial variation (about 1.52 times more variance than for a binomial model). Therefore, they decided to use the beta-binomial AR(1) model with parameter estimates α ≈ 0.72 and β ≈ 0.10.
Thus, a country with stable prices in month t will also have stable prices in month t + 1 with probability 72% ("survival"), whereas the probability for newly getting stable prices is only about 10% ("revival").

| Multivariate INARMA models
While there has been a lot of research concerning univariate count time series during the last decades, there is still relatively little work on multivariate count time series. Within the thinning-based approaches, the first proposal is due to Franke and Rao (1993), who defined an integer-valued counterpart to the vector ARMA models by using matrix-binomial thinning as a substitute of matrix multiplication. More precisely, with A ∈ [0; 1)^{d×d}, this type of multivariate thinning operation is defined as

(A∘X)_i = a_{i1}∘X_1 + … + a_{id}∘X_d for i = 1, …, d, (9)

where the d² univariate binomial thinning operations on the right-hand side of (9) are performed independently of each other. A more general family of matrix thinnings is considered by Latour (1997). Using such an operation (we still focus on the binomial type of matrix thinning for simplicity), a multivariate extension of the INAR(1) model is defined by

X_t = A∘X_{t−1} + ϵ_t, (10)

where (ϵ_t)_{t∈Z} is an i.i.d. d-dimensional count process with finite mean μ_ϵ and covariance matrix Σ_ϵ (Franke & Rao, 1993). A pth-order extension, in analogy to the INAR(p) model by Du and Li (1991), was proposed by Latour (1997). Model (10) has a unique stationary solution if det(I − Az) ≠ 0 for |z| ≤ 1 (Latour, 1997), where I denotes the d × d identity matrix. In this case, the marginal mean equals μ = (I − A)^{−1} μ_ϵ, and the autocovariance function Γ(h) satisfies the Yule-Walker-type equations

Γ(h) = A Γ(h − 1) for h ≥ 1, Γ(0) = A Γ(0) Aᵀ + diag(Bμ) + Σ_ϵ,

where B has the entries b_ij = a_ij(1 − a_ij). If the component processes (X_{t,i})_{t∈Z} should constitute univariate INAR(1) processes on their own, then A has to be chosen as a diagonal matrix, A = diag(a_1, …, a_d), such that the cross-correlation between X_{t,i} and X_{t,j} is solely caused by the innovations (Pedeli & Karlis, 2011). Such a diagonal-matrix bivariate INAR(1) model was used by Pedeli and Karlis (2011) for the bivariate count time series x_1, …, x_365 ∈ N0² plotted in Figure 3.
It provides the daily number of road accidents in the Schiphol area (Netherlands) in 2001, distinguishing between daytime (x_{t,1}) and nighttime (x_{t,2}) accidents. Both component series are weakly autocorrelated, and they also exhibit a weak positive cross-correlation. For the innovations ϵ_t, Pedeli and Karlis (2011) tried out bivariate extensions of both the Poi- and the NB-distribution, and finally, they decided on the latter model in view of the apparent overdispersion in the data. A further bivariate model couples the components through dependent INAR(1) recursions; an analogous model with NB thinning instead of binomial thinning is due to Ristić, Nastić, Jayakumar, and Bakouch (2012). In both cases, the model can be rewritten similarly to (10), but using a special type of random coefficients for the thinning matrix A.
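A diagonal-matrix multivariate INAR(1) path, as used by Pedeli and Karlis (2011), can be sketched as follows. For simplicity we draw the innovation components as independent Poissons, whereas the cited work uses genuinely cross-correlated bivariate innovations; all helper names are ours.

```python
import math, random

def rpois(mu, rng):
    # Poisson sampler via Knuth's product-of-uniforms method
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def matrix_thin(A, x, rng):
    """Matrix-binomial thinning A ∘ x: entry i is the sum of the
    independent binomial thinnings a_ij ∘ x_j over j."""
    d = len(x)
    return [sum(sum(1 for _ in range(x[j]) if rng.random() < A[i][j])
                for j in range(d)) for i in range(d)]

def simulate_minar1(T, A, lam, seed=0):
    """Simulate X_t = A ∘ X_{t-1} + eps_t; for simplicity the
    innovation components are independent Poi(lam_i) variates."""
    rng = random.Random(seed)
    x = [rpois(m, rng) for m in lam]
    path = []
    for _ in range(T):
        tx = matrix_thin(A, x, rng)
        x = [tx[i] + rpois(lam[i], rng) for i in range(len(lam))]
        path.append(list(x))
    return path

A = [[0.4, 0.0], [0.0, 0.3]]  # diagonal A: INAR(1) component processes
path = simulate_minar1(200, A, lam=[2.0, 1.0])
```

With this diagonal A and independent innovations, the components are (approximately) uncorrelated; cross-correlation would enter through a joint innovation distribution.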
For the case of bivariate bounded counts (with possibly different bounds n_1, n_2 in the different components), Scotto, Weiß, Silva, and Pereira (2014) developed a counterpart to the binomial AR(1) model (7), which uses a thinning operation based on the so-called bivariate binomial distribution of type II (Johnson et al., 1997). If both components have the same upper bound n_1 = n_2 = n, the bivariate binomial AR(1) model by Ristić and Popović (2019) can be used as an alternative.

| Application to sales count data
Let us have a look back at the sales count data, see Figure 1 and the discussion in Section 1. Since the time series consists of univariate and unbounded counts, only the models presented in Section 2.1 are relevant for this example. The autocorrelation structure of the data appears AR(1)-like, so the INAR(1) model seems to be a reasonable candidate. Furthermore, because of the strong overdispersion, an INAR(1) model with overdispersion should be used. Thus, we take the NB-INAR(1) model as a candidate, but the Poi-INAR(1) model is also included for illustration. The detailed estimation results as well as several checks of model adequacy can be found in Supplement S, Supporting Information; here, only a brief summary of the main findings is given. Background information on the considered diagnostic tools can be found in Weiß (2018).
If using Akaike's information criterion (AIC) or the Bayesian information criterion (BIC) for model selection, the NB-INAR(1) model is clearly preferred. This preference is confirmed by looking at the fitted models' variance (or at the variance of the respective Pearson residuals), because only the NB-INAR(1) model is able to capture the observed overdispersion reasonably well. The thinning parameter ("survival probability") is estimated as α ≈ 0.254, so the fitted model implies that about 25% of the sold soap products in week t − 1 initiate a further purchase in week t. However, comparing the ACF of the fitted model to the sample ACF, it becomes clear that the NB-INAR(1) model is not able to fully explain the actual serial dependence structure (also the Pearson residuals exhibit rather large ACF values at lags 1–4). Therefore, we shall continue our search for an adequate model once we have discussed the next big class of models for stationary count time series.
FIGURE 3 Road accidents count time series: plot of daytime (x_{t,1}) and nighttime (x_{t,2}) accidents against time t

| REGRESSION MODELS FOR COUNT TIME SERIES
Another popular approach for the modeling of stationary count time series is given by conditional regression models. Imposing appropriate parameter constraints, it is again possible to obtain an ARMA-like ACF structure. The resulting models are often referred to as integer-valued generalized autoregressive conditional heteroskedasticity (INGARCH) models, see Section 3.1. Nonlinear forms of dependence are also possible if the conditional regression model is defined within the framework of generalized linear models (GLMs), see Section 3.2. Multivariate extensions of the INGARCH model are briefly surveyed in Section 3.3. For more comprehensive discussions of regression models for count time series, the reader is referred to Chapter 4 in the book by Kedem and Fokianos (2002), and to Chapters 4 and 5 in Weiß (2018).

| INGARCH models
The basic idea behind INGARCH models is to transfer the ARMA recursion (1) to the level of conditional means, M_t ≔ E[X_t | X_{t−1}, …]. Since the mean of a count random variable is not integer-valued but just a positive real number, it is possible to find parameter constraints for the linear recursive scheme

M_t = β_0 + α_1 X_{t−1} + … + α_p X_{t−p} + β_1 M_{t−1} + … + β_q M_{t−q} (12)

such that negative outcomes are avoided. At this point, one may certainly ask why an integer-valued ARMA-like model is referred to as an INGARCH model. To explain this apparent contradiction, let us recall that, for example, the GARCH(1, 1) model is defined by requiring the conditional variances σ²_t to follow σ²_t = β_0 + α_1 Y²_{t−1} + β_1 σ²_{t−1}. The INGARCH(1, 1) model relies on the same recursive scheme but applied to the conditional means M_t, that is, M_t = β_0 + α_1 X_{t−1} + β_1 M_{t−1}. So the data-generating mechanisms are analogous in this sense. Nevertheless, the name "INGARCH" is still controversially discussed, and a couple of alternative proposals have been made in the literature (Weiß, 2018, Remark 4.1.2). We continue referring to these models as "INGARCH" in the sequel, because this name seems to be used more often than any of its competitors.
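For the conditionally Poisson case, the INGARCH(1, 1) recursion translates directly into a simulation scheme: update M_t, then draw X_t from Poi(M_t). A sketch (our own helper; α_1 + β_1 < 1 is assumed for stationarity):

```python
import math, random

def simulate_poi_ingarch11(T, beta0, alpha1, beta1, seed=0):
    """Simulate a Poisson INGARCH(1,1) path: the conditional mean
    follows M_t = beta0 + alpha1*X_{t-1} + beta1*M_{t-1}, and X_t
    given the past is Poi(M_t); alpha1 + beta1 < 1 is assumed."""
    rng = random.Random(seed)
    def rpois(mu):
        L, k, p = math.exp(-mu), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1
    m = beta0 / (1 - alpha1 - beta1)  # stationary mean as start value
    x = rpois(m)
    out = []
    for _ in range(T):
        m = beta0 + alpha1 * x + beta1 * m
        x = rpois(m)
        out.append(x)
    return out

series = simulate_poi_ingarch11(400, beta0=1.0, alpha1=0.3, beta1=0.2)
```

The stationary mean equals β_0/(1 − α_1 − β_1) = 2 for these parameter values, and the marginal distribution is overdispersed relative to the Poisson.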
Let us conclude this section with a critical comparison of the INARMA and the INGARCH family. Both constitute an ARMA-like approach for stationary count time series, and their pure AR-members even have an identical autocorrelation structure. Nevertheless, there are a couple of differences between the two families. First, while the INARMA family also includes pure MA-type models, this is not the case for the INGARCH family. Quite the contrary, the "MA-part" in (12) actually causes some kind of "long memory." This is easily seen for the INGARCH(1, 1) model, where

M_t = β_0/(1 − β_1) + α_1 Σ_{k=0}^{∞} β_1^k X_{t−1−k},

that is, the current observation is affected by all previous observations (Fokianos, 2012). Because of this memory, the INGARCH(1, 1) model was successfully applied to transactions count data, see, for example, Christou and Fokianos (2014).

| Nonlinear conditional regression models
Within the framework of GLMs, it is easily possible to also obtain a nonlinear autocorrelation structure. This is achieved by introducing an additional link function on the left-hand side of (12), preferably one having the full set of reals as its range, because this may allow one to weaken the necessary parameter constraints. For time series consisting of unbounded counts, the log-link is often used, whereas one commonly applies the logit-link to the normalized conditional mean M_t/n of bounded counts. As an example, Fokianos and Tjøstheim (2011) consider the log-linear Poisson autoregressive model defined by

ln M_t = β_0 + α_1 ln(X_{t−1} + 1) + β_1 ln M_{t−1}, (14)

which can be understood as a multiplicative counterpart of the INGARCH(1, 1) recursion. Thanks to the log-link, the model parameters β_0, α_1, β_1 might also become negative while preserving a positive conditional mean M_t. But for the existence of a stationary solution, parameter restrictions are required (Fokianos & Tjøstheim, 2011): |α_1 + β_1| < 1 if α_1, β_1 have the same sign, and α_1² + β_1² < 1 otherwise. Another ARMA-like but nonlinear conditional regression model is the generalized ARMA (GARMA) model considered by Benjamin, Rigby, and Stasinopoulos (2003), whose recursion expresses the link-transformed conditional mean through functions A and ℳ representing the AR- and MA-terms, respectively. Also the generalized linear ARMA (GLARMA) model proposed by Davis, Dunsmuir, and Streett (2003) has to be mentioned in this context, which first defines the residual-like quantities e_t ≔ (X_t − M_t)/M_t^λ with λ ∈ (0; 1]; the model recursion then expresses ln M_t in terms of past values of these residuals. While all the previous models are observation-driven, a parameter-driven Poisson regression model is the one by Zeger (1988). It is based on a latent process (ϵ_t)_{t∈Z}, which is positive real-valued and weakly stationary with mean 1, variance σ²_ϵ, and ACF ρ_ϵ(h).
The count X_t at time t is then generated from the state-dependent distribution Poi(μ ϵ_t) with parameter μ > 0, and conditional on (ϵ_t)_{t∈Z}, the process (X_t)_{t∈Z} is assumed to consist of independent counts. Thus, the observations' mean equals μ, the variance equals μ + μ²σ²_ϵ (overdispersion), and the ACF is given by ρ(h) = ρ_ϵ(h) · (1 + (μσ²_ϵ)^{−1})^{−1} (being the ACF of the latent process with an additional damping factor).
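Zeger's parameter-driven mechanism can be mimicked with any positive, mean-one, weakly stationary latent process. The sketch below uses an exponentiated Gaussian AR(1), a log-normal choice that is our own assumption, made only so that E[ϵ_t] = 1 and V[ϵ_t] = σ²_ϵ are easy to enforce (helper names are ours):

```python
import math, random

def simulate_zeger(T, mu, phi, sigma_eps, seed=0):
    """Parameter-driven Poisson regression in the spirit of Zeger
    (1988): X_t ~ Poi(mu * eps_t) given a positive latent process
    (eps_t) with mean 1 and variance sigma_eps**2. We realize eps_t
    as an exponentiated Gaussian AR(1) (our log-normal assumption)."""
    rng = random.Random(seed)
    def rpois(m):
        L, k, p = math.exp(-m), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1
    s2 = math.log(1.0 + sigma_eps ** 2)  # Var[Z] so that V[eps] fits
    sd_innov = math.sqrt(s2 * (1 - phi ** 2))
    z = rng.gauss(0.0, math.sqrt(s2))    # stationary start for the AR(1)
    out = []
    for _ in range(T):
        z = phi * z + rng.gauss(0.0, sd_innov)
        eps = math.exp(z - s2 / 2.0)     # mean-one latent factor
        out.append(rpois(mu * eps))
    return out

series = simulate_zeger(300, mu=5.0, phi=0.6, sigma_eps=0.5)
```

The resulting counts have mean μ and variance μ + μ²σ²_ϵ, matching the moment formulas above.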
As a final remark, although we focus on stationary count processes here, it should be mentioned that the regression models discussed in the present section are easily extended to nonstationary processes by additionally including covariate information. Details are provided in the above references as well as in the books by Kedem and Fokianos (2002) and Weiß (2018). As an example, both Benjamin et al. (2003) and Zeger (1988) analyzed a time series of monthly counts of poliomyelitis cases in the United States. They applied an NB-GARMA(0, 2) model and a parameter-driven Poi-regression model with a latent AR(1) process, respectively. But in both cases, a linear trend and harmonic oscillations (with annual and semiannual cycles) also had to be included to obtain a satisfactory fit.

| Multivariate INGARCH models
In analogy to the INARMA case (recall the discussion in Section 2.3), there is rather little work on multivariate extensions of the INGARCH model. A multivariate INGARCH(1, 1) model for unbounded counts assumes the conditional mean M_t = E[X_t | X_{t−1}, …] to satisfy the linear recursion

M_t = b_0 + A X_{t−1} + B M_{t−1}, (17)

where b_0 ∈ (0; ∞)^d, and where A, B are non-negative d × d-matrices. This linear scheme can also be extended to higher model orders, see Heinen and Rengifo (2003). The count X_t at time t is then generated according to the selected conditional distribution. Cui and Zhu (2018), for example, consider a special type of bivariate Poisson distribution, which also allows for negative cross-correlation. Heinen and Rengifo (2003) propose to use (among others) copulas for defining a multivariate extension of the conditional Poisson distribution (also recall the analogous approach by Karlis and Pedeli (2013) concerning multivariate INARMA models). Such a copula approach is also considered by Fokianos, Støve, Tjøstheim, and Doukhan (2020) for model (17) and, in addition, for a log-linear type of multivariate INGARCH(1, 1) model, defined in analogy to (14):

ln M_t = b_0 + A ln(X_{t−1} + 1) + B ln M_{t−1}. (18)

Here, the logarithm is applied component-wise, and 1 denotes the vector of ones. Conditions for the existence of a stationary solution to (17) and (18) are derived by Cui and Zhu (2018) and Fokianos et al. (2020). Both papers also present applications to bivariate count time series regarding the number of transactions of certain stocks traded at the New York Stock Exchange. For model (17), the stationary mean of X_t is given by (I − A − B)^{−1} b_0, and formulae for the variance and autocovariance are provided by Heinen and Rengifo (2003). A bivariate INARCH(1) model for bounded counts, using a conditional bivariate binomial distribution of type II, was proposed by Scotto et al. (2014).
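The linear multivariate recursion can be simulated once a conditional distribution is fixed; the sketch below uses conditionally independent Poisson components for simplicity, whereas the cited papers employ richer joint distributions such as bivariate Poisson or copula constructions (helper names are ours):

```python
import math, random

def simulate_mingarch11(T, b0, A, B, seed=0):
    """Simulate the linear multivariate INGARCH(1,1) recursion
    M_t = b0 + A X_{t-1} + B M_{t-1}; for simplicity the components
    of X_t are drawn as conditionally independent Poissons."""
    rng = random.Random(seed)
    def rpois(m):
        L, k, p = math.exp(-m), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1
    d = len(b0)
    m = list(b0)
    x = [rpois(v) for v in m]
    path = []
    for _ in range(T):
        m = [b0[i]
             + sum(A[i][j] * x[j] for j in range(d))
             + sum(B[i][j] * m[j] for j in range(d)) for i in range(d)]
        x = [rpois(v) for v in m]
        path.append(list(x))
    return path

path = simulate_mingarch11(200, b0=[1.0, 0.5],
                           A=[[0.3, 0.1], [0.1, 0.2]],
                           B=[[0.2, 0.0], [0.0, 0.3]])
```

For these parameter values, the spectral radius of A + B is below one, and the stationary mean is (I − A − B)^{−1} b_0 ≈ (2.29, 1.46).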

| Application to sales count data
Let us return to the sales count time series with its AR(1)-like autocorrelation structure (Section 1). In Section 2.4, it turned out that the INAR(1) model does not constitute an adequate approach for describing the data. Therefore, in this section, we shall consider its counterpart within the INGARCH family instead, that is, the INARCH(1) model. Note that, in contrast to the Poi-INAR(1) model, the Poi-INARCH(1) model exhibits some degree of overdispersion in its marginal distribution, so it might turn out to be a reasonable choice for the data. But especially the NB-INARCH(1) model (in the sense of Xu et al. (2012)) appears to be a plausible candidate. Detailed results regarding model estimation and adequacy checks can again be found in Supplement S.
For both models, ML estimation leads to significant estimates, and both information criteria, AIC and BIC, result in a clear preference for the NB-INARCH(1) model. Although the fitted Poi-INARCH(1) model exhibits some overdispersion, it can explain neither the sample variance nor the conditional dispersion structure, as evaluated by the Pearson residuals (variance much larger than one) and the PIT histogram (clearly U-shaped). The NB-INARCH(1) model, in contrast, is even favored over the NB-INAR(1) model by the information criteria. This preference appears plausible in view of the fact that the NB-INARCH(1) model is better able to capture the actual autocorrelation structure. But while the Pearson residuals no longer show significant ACF values, they still have relatively large ACF values at lags 2–4. A possible solution could be to increase the AR order p; in Section 4.3, we shall consider another type of candidate model instead, namely a Poisson HMM.

| FURTHER MODELS FOR COUNT TIME SERIES
Although INARMA and INGARCH models are probably the most widely used approaches for stationary count time series, a couple of further models have been proposed in the literature. Two of them are briefly presented in the sequel: the HMM in Section 4.1, and the discrete ARMA model in Section 4.2.

| HMM for counts
HMM is a parameter-driven time series model, which can be applied to count processes, but also to processes with any other kind of range, even to categorical processes. HMMs date back to Baum and Petrie (1966), who referred to these models as "probabilistic functions of Markov chains." Nowadays, the standard reference on HMMs is the comprehensive book by Zucchini, MacDonald, and Langrock (2016), while compact surveys are provided by MacDonald and Zucchini (2016) and Weiß (2018). A HMM is a bivariate process $(X_t, Q_t)_{\mathbb{N}_0}$, where the $X_t$ are the actual (count) observations, and the $Q_t$ are the hidden states. The latter are assumed to be categorical with some numerically coded range $\mathcal{Q} = \{0, \ldots, d_Q\}$, where $d_Q \in \mathbb{N}$. $(X_t, Q_t)_{\mathbb{N}_0}$ is said to follow a HMM if the following three properties hold for all $q, r \in \mathcal{Q}$ and all $t \in \mathbb{N}_0$:

1. observation equation $P(X_t \mid X_{t-1}, \ldots, Q_t, \ldots) = P(X_t \mid Q_t)$, with time-homogeneous state-dependent distributions $P(X_t = x \mid Q_t = q) = p(x \mid q)$;
2. state equation $P(Q_t \mid X_{t-1}, \ldots, Q_{t-1}, \ldots) = P(Q_t \mid Q_{t-1}, \ldots)$;
3. Markov assumption $P(Q_t \mid Q_{t-1}, \ldots) = P(Q_t \mid Q_{t-1})$, with time-homogeneous state transition probabilities $P(Q_t = q \mid Q_{t-1} = r) = a_{q \mid r}$.
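The three defining properties translate directly into a simulation scheme: step the hidden Markov chain, then draw the observation from the current state-dependent distribution. The sketch below does this for a Poi-HMM (state-dependent Poisson distributions); function names and parameter values are illustrative.

```python
import numpy as np

def simulate_poi_hmm(A, lam, T, rng=None):
    """Simulate a Poi-HMM: a hidden Markov chain Q_t with transition
    matrix A (A[q, r] = P(Q_t = q | Q_{t-1} = r), columns sum to one)
    drives state-dependent observations X_t | Q_t = q ~ Poi(lam[q])."""
    rng = np.random.default_rng(rng)
    d = len(lam)
    # stationary distribution pi of the hidden chain: A pi = pi
    w, v = np.linalg.eig(A)
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi = pi / pi.sum()
    q = rng.choice(d, p=pi)                  # start in stationarity
    Q = np.empty(T, dtype=int)
    X = np.empty(T, dtype=int)
    for t in range(T):
        q = rng.choice(d, p=A[:, q])         # state equation (Markov step)
        Q[t] = q
        X[t] = rng.poisson(lam[q])           # observation equation
    return X, Q

A = np.array([[0.9, 0.2],
              [0.1, 0.8]])                   # stationary pi = (2/3, 1/3)
lam = np.array([1.0, 5.0])
X, Q = simulate_poi_hmm(A, lam, T=10000, rng=0)
print(X.mean(), X.var())   # overdispersed Poisson mixture
```

Initializing the chain from its stationary distribution, as done here, is exactly the additional stationarity assumption discussed below.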
Thus, HMMs are special types of parameter-driven (generalized) state-space models (Brockwell & Davis, 2016). For a stationary HMM, we have to refine the third requirement by an additional stationarity assumption.
A stationary HMM has $d_Q(d_Q + 1)$ parameters related to the transition probabilities of the hidden states, which are summarized through the transition matrix $A = (a_{q \mid r})_{q,r}$. Further parameters are implied by the $d_Q + 1$ state-dependent distributions. These are typically taken to be parametric count models, for example, state-dependent Poisson models for the case of unbounded counts (thus referred to as a Poi-HMM), or state-dependent binomial models for time series of bounded counts. In both cases, we end up with altogether $(d_Q + 1)^2$ model parameters, a number that increases quadratically in the number of hidden states.
The stationary marginal distribution is a probabilistic mixture of the state-dependent distributions, for example, a Poisson mixture for a Poi-HMM. Here, the mixture weights are given by the stationary marginal probabilities $\pi_q = P(Q_t = q)$ of the hidden states. For a Poi-HMM, with $X_t \mid Q_t = q \sim \text{Poi}(\lambda_q)$, we thus have the observations' mean $\mu = \sum_{q \in \mathcal{Q}} \pi_q \lambda_q$ and variance $\mu + \sum_{q \in \mathcal{Q}} \pi_q \lambda_q^2 - \mu^2$ (overdispersion). The autocovariance function for lags $h \geq 1$ is given by

$$\gamma(h) = \boldsymbol{\lambda}^\top A^h \operatorname{diag}(\boldsymbol{\pi})\, \boldsymbol{\lambda} - \mu^2,$$

where $\boldsymbol{\lambda} = (\lambda_0, \ldots, \lambda_{d_Q})^\top$ and $\boldsymbol{\pi} = (\pi_0, \ldots, \pi_{d_Q})^\top$. Joint probabilities and the likelihood function are computed efficiently by using the forward algorithm, and a likelihood-based identification of the sequence of hidden states $q_1, \ldots, q_T$, corresponding to the given time series data $x_1, \ldots, x_T$, is done by using the Viterbi algorithm; detailed descriptions are provided by Zucchini et al. (2016) and Weiß (2018).
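The forward algorithm mentioned above evaluates the likelihood in $O(T\, d_Q^2)$ time by recursing over scaled forward variables instead of summing over all $(d_Q+1)^T$ state sequences. A minimal sketch for a Poi-HMM, with rescaling to avoid numerical underflow (following the general treatment in Zucchini et al. (2016); all names and values are our own illustrative choices):

```python
import numpy as np
from scipy.stats import poisson

def poi_hmm_loglik(x, A, lam, pi):
    """Log-likelihood of a Poi-HMM via the scaled forward algorithm.
    A[q, r] = P(Q_t = q | Q_{t-1} = r), lam holds the state-dependent
    Poisson means, pi is the initial (stationary) state distribution."""
    alpha = pi * poisson.pmf(x[0], lam)      # forward variables at t = 1
    ll = np.log(alpha.sum())
    alpha = alpha / alpha.sum()              # rescale to prevent underflow
    for xt in x[1:]:
        # alpha_t(q) = sum_r A[q, r] alpha_{t-1}(r) * p(x_t | q)
        alpha = (A @ alpha) * poisson.pmf(xt, lam)
        ll += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return ll

A = np.array([[0.9, 0.2],
              [0.1, 0.8]])                   # columns sum to one
lam = np.array([1.0, 5.0])
pi = np.array([2 / 3, 1 / 3])                # stationary: A @ pi = pi
x = np.array([0, 1, 5, 6, 0])
print(poi_hmm_loglik(x, A, lam, pi))
```

For such a short series, the result can be verified against the brute-force sum over all $2^5$ hidden state sequences; for realistic series lengths, only the recursive form is feasible.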

| Discrete ARMA models for counts
Among the first approaches for generating discrete time series data with an ARMA-like dependence structure are the discrete ARMA models proposed by Jacobs and Lewis (1983). They define two types of discrete ARMA models relying on a random mixture of past observations and innovations; in what follows, we concentrate on the one referred to as the "NDARMA model" by Jacobs and Lewis (1983). Let the observations $(X_t)_{\mathbb{Z}}$ and the innovations $(\epsilon_t)_{\mathbb{Z}}$ be count processes, where $(\epsilon_t)_{\mathbb{Z}}$ is i.i.d. with $P(\epsilon_t = i) = \pi_i$, and where $\epsilon_t$ is independent of $(X_s)_{s<t}$. Furthermore, we have to assume a process of i.i.d. multinomial vectors

$$(\alpha_{t,1}, \ldots, \alpha_{t,p}, \beta_{t,0}, \ldots, \beta_{t,q}) \sim \text{Mult}(1;\, \phi_1, \ldots, \phi_p, \varphi_0, \ldots, \varphi_q)$$

with $p + q \geq 1$, which are independent of $(\epsilon_t)_{\mathbb{Z}}$ and of $(X_s)_{s<t}$. These multinomial vectors become one in exactly one of their components (and zero otherwise), and they enable us to define a random mixture as follows: $(X_t)_{\mathbb{Z}}$ is said to follow a discrete ARMA model if

$$X_t = \alpha_{t,1} X_{t-1} + \cdots + \alpha_{t,p} X_{t-p} + \beta_{t,0} \epsilon_t + \cdots + \beta_{t,q} \epsilon_{t-q} \qquad (20)$$

holds. As a consequence, the upcoming observation $X_t$ is generated by simply selecting one of either the past observations $X_{t-1}, \ldots, X_{t-p}$ (with probabilities $\phi_1, \ldots, \phi_p$) or the available innovations $\epsilon_t, \ldots, \epsilon_{t-q}$ (with probabilities $\varphi_0, \ldots, \varphi_q$). Therefore, the stationary marginal distribution of the observations $X_t$ is identical to that of the innovations $\epsilon_t$. Furthermore, $(X_t)_{\mathbb{Z}}$ has an ARMA-like autocorrelation structure: the autocovariance function satisfies

$$\gamma(k) = \sum_{j=1}^{p} \phi_j\, \gamma(|k - j|) + \sigma_\epsilon^2 \sum_{i=k}^{q} \varphi_i\, r(i - k) \quad \text{for } k \geq 1, \qquad (21)$$

where $r(i) = 0$ for $i < 0$, and $r(i) = \sum_{j=\max\{0,\, i-p\}}^{i-1} \phi_{i-j}\, r(j) + \varphi_i\, \mathbb{1}(0 \leq i \leq q)$ otherwise. Here, $\mathbb{1}(\cdot)$ denotes the indicator function. Note that $\sigma^2 = \sigma_\epsilon^2$ holds for discrete ARMA models, so (21) applies to the ACF after having removed the factor $\sigma_\epsilon^2$. In view of the simple data-generating mechanism and the Yule–Walker equations (21), the discrete ARMA models appear as an attractive approach at first glance.
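The random selection mechanism (20) is simple to simulate: one multinomial draw per time step decides which past observation or innovation is copied. The sketch below simulates an NDAR(1) path (so $p = 1$, $q = 0$), where the Yule–Walker equations reduce to $\rho(k) = \phi_1^k$; function names and parameter values are illustrative.

```python
import numpy as np

def simulate_ndarma(pi_innov, phi, varphi, T, rng=None):
    """Simulate an NDARMA(p, q) path: at each t, X_t copies one of
    X_{t-1}, ..., X_{t-p} (with probabilities phi) or one of
    eps_t, ..., eps_{t-q} (with probabilities varphi), chosen by a
    single multinomial draw, as in the Jacobs-Lewis construction."""
    rng = np.random.default_rng(rng)
    p, q = len(phi), len(varphi) - 1
    probs = np.concatenate([phi, varphi])
    # pre-sample i.i.d. innovations from the given marginal pmf
    eps = rng.choice(len(pi_innov), size=T + q, p=pi_innov)
    X = list(eps[:p])                        # simple start-up values
    for t in range(p, T):
        k = rng.choice(p + q + 1, p=probs)   # the multinomial selector
        X.append(X[t - 1 - k] if k < p else eps[t - (k - p)])
    return np.array(X)

# NDAR(1): marginal equals the innovations' pmf, and rho(1) = phi_1
x = simulate_ndarma([0.5, 0.3, 0.2], phi=[0.6], varphi=[0.4], T=50000, rng=3)
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(r1, np.mean(x == 0))   # close to 0.6 and 0.5, respectively
```

Running this, the sample lag-1 autocorrelation sits near $\phi_1 = 0.6$ and the sample marginal frequencies near $\pi_i$, illustrating the two key properties stated above.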
However, their sample paths are characterized by long runs of a single count value, interrupted by sudden jumps, a pattern that is hardly found in applications. As a possible solution to make these models more practice-oriented, Gouveia, Möller, Weiß, and Scotto (2018) suggest to generate more variation within successive counts by adding a variation operator to the model recursion (20). More precisely, Gouveia et al. (2018) consider the case of bounded counts and define the binomial variation operator by $bv_n(X) \mid X \sim \text{Bin}(n, X/n)$. Then they modify (20) as

$$X_t = \alpha_{t,1}\, bv_n(X_{t-1}) + \cdots + \alpha_{t,p}\, bv_n(X_{t-p}) + \beta_{t,0} \epsilon_t + \cdots + \beta_{t,q} \epsilon_{t-q},$$

and they refer to the resulting model as the bvARMA model. Despite this modification, the Yule–Walker equations (21) still hold true for the bvARMA model, and observations and innovations still have the same mean. But other distributional properties differ between $X_t$ and $\epsilon_t$; in particular, we generally have $\sigma^2 \neq \sigma_\epsilon^2$ for the bvARMA model, see Gouveia et al. (2018) for details. They applied the bvARMA model to a large database of count time series regarding the number of rainy days per week (thus $n = 7$) at meteorological stations throughout Europe and Russia.
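Since $E[bv_n(X) \mid X] = X$, the binomial variation operator preserves the mean while perturbing individual values, which is exactly why observations and innovations share their mean but generally not their variance. A bvAR(1) sketch for bounded counts in $\{0, \ldots, n\}$ (illustrative parameters and a uniform innovation distribution of our own choosing; Gouveia et al. (2018) derive the exact moment relations):

```python
import numpy as np

def bv(x, n, rng):
    """Binomial variation operator: bv_n(X) | X ~ Bin(n, X/n).
    Note that E[bv_n(X) | X] = X, so the mean is preserved."""
    return rng.binomial(n, x / n)

def simulate_bvar1(n, phi1, pi_innov, T, rng=None):
    """bvAR(1) sketch: with probability phi1 take bv_n(X_{t-1}),
    otherwise take the innovation eps_t (probability 1 - phi1)."""
    rng = np.random.default_rng(rng)
    eps = rng.choice(n + 1, size=T, p=pi_innov)
    x = [eps[0]]
    for t in range(1, T):
        x.append(bv(x[-1], n, rng) if rng.random() < phi1 else eps[t])
    return np.array(x)

n = 7                                    # e.g. rainy days per week
pi_innov = np.full(n + 1, 1 / (n + 1))   # uniform innovations: mean 3.5, var 5.25
x = simulate_bvar1(n, phi1=0.5, pi_innov=pi_innov, T=40000, rng=7)
print(x.mean(), x.var())   # mean matches the innovations'; variance differs
```

In contrast to a plain NDAR(1) path, successive values now vary within runs instead of repeating exactly, while the ARMA-like autocorrelation structure is retained.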