Analysis of alert message propagation on the highway in VANET assuming Markovian vehicle arrival process

Vehicular ad hoc networks (VANETs) provide the infrastructure for intelligent transportation systems (ITS). The number of vehicles that can communicate with each other through VANETs is increasing. In this paper, we provide several analytical results on message propagation on the highway in VANETs. In our scenario, there is a stationary message source (i.e., alert messages are generated at an accident). Vehicles receive the message and become informed when they reach the radio transmission range of the message source or of another informed vehicle. Most papers published on the analysis of message propagation assume that vehicles arrive according to a Poisson process (i.e., the interarrival times are independent and exponentially distributed); very few results are available for more general traffic models. In this paper, we show that the Poisson process is not always suitable for modeling vehicle traffic. Instead, we propose to use the more general Markovian arrival process (MAP) to model the vehicle headway times, and we derive the probability that the message propagates beyond a certain distance from the accident under this traffic assumption. Through several numerical examples, we demonstrate how strongly the statistics of the vehicle arrival process affect the message propagation distance.

Hence, analyzing the behavior of message propagation is necessary to measure the quality of service in these use cases. Two factors make the analysis difficult: the lack of a suitable model for vehicular traffic, and the complexity of the system due to the underlying protocol stack, radio specific behavior, and the impact of the structure of the road network.
There are two main techniques to obtain performance measures in a VANET system. The typical approach is based on discrete-event simulation, with SUMO/Omnet++ being the most popular tool for this purpose. Analytical models have several advantages over simulations: they are typically faster to evaluate and provide better, more direct insight into the behavior of the system; on the other hand, they are much more difficult to develop. Nevertheless, some results based on analytical models are also available. In both cases, an accurate traffic model is necessary to characterize the arrival process of the vehicles. In some papers, the motion of the vehicles is modeled in detail, taking into account not only the arrival time and the speed (velocity) but also the acceleration. 5,6 However, in the vast majority of cases, researchers rely on the Poisson model and assume constant velocity for analytical simplicity. [7][8][9] The failure of the Poisson model for vehicle arrivals has already been recognized in previous studies. 10,11 More general traffic models are needed to describe the arrival process of the vehicles more accurately. As natural extensions of the Poisson process, Markovian arrival processes (MAPs) have been used successfully in several areas to model complex traffic patterns. MAPs are more general than the Poisson process; in fact, they include the Poisson process as a special case. Consequently, whenever MAPs fail to model realistic traffic, the Poisson process fails, too, whereas if the Poisson process fails, MAPs can still approximate the real behavior reasonably well. MAPs have been used for modeling vehicular traffic previously, starting with Alfa et al 12 and including Farkas et al. 13 In this paper, we provide several analytical results on message propagation on the highway when the vehicle arrival process is given by a MAP.
In our scenario, there is a message source with a fixed location (i.e., alert messages are generated at an accident). Vehicles receive the message and become informed when they reach the radio transmission range of the message source or of another informed vehicle. The transmission range is assumed to be fixed, and messages are assumed to be received immediately and without being dropped. This idealized behavior is called deterministic message passing 14 in the literature, and novel broadcasting schemes 15 are getting closer and closer to it. In our model, the arrival process of the vehicles is time stationary (we do not consider seasonality), the velocity of the vehicles is constant, and the effect of counter-flow traffic is ignored. While the constant velocity assumption does not hold in reality, we need this restriction to keep the analytical model tractable (relaxing it is among our future plans). This scenario seems simple and idealized, yet it is challenging to study with analytical tools, even assuming Poisson traffic. Our aim is to study the message propagation distance, that is, how far the alert message propagates (measured from the accident) after a certain amount of time.
A similar problem has been solved in Babu and Muhammed Ajeer 16 with a Poisson process and in the presence of channel randomness. In Zhuang et al 17 and in Miorandi and Altman, 9 the mean message propagation distance has been derived for the same scenario with Poisson vehicle traffic. In Mahmood and Horvath, 18 a detailed transient analysis of the message propagation is provided, still under the Poisson assumption. Not surprisingly, the studies 9,17 and Mahmood and Horvath 18 arrived at the same results, but with three different analysis approaches: they express the mean message propagation distance explicitly as a function of the radius of the radio coverage, the vehicle speed, and the vehicle arrival intensity. In a more recent paper, 19 the message delivery delay is investigated for nonconstant, uniformly distributed vehicle velocity, but a Poisson vehicle arrival process is still assumed.
In this paper, we show that the Poisson process is not always suitable for modeling vehicle traffic. Instead of the Poisson process, we propose to use the more general Markovian arrival process to model the vehicle headway times, which still allows for tractable analysis. We provide analytical results on both the stationary and the transient properties of the message propagation distance assuming a Markovian vehicle arrival process, and we show that more accurate modeling of the vehicle arrival process yields a better approximation of the message propagation distance.
The rest of this paper is organized as follows. Section 2 introduces MAPs and shows how to obtain them from empirical measurement data. The main results on the clusters of informed vehicles are presented in Section 3. The stationary and the transient properties of the message propagation distance are described in Section 4. Section 5 provides some numerical examples; finally, Section 6 concludes the paper.

| STOCHASTIC MODELS FOR THE VEHICLE ARRIVAL PROCESS
The Poisson process has been used in the literature for a long time to model the interarrival times of the vehicles. The reason for this modeling choice is not its validated correctness, but its analytical simplicity; many useful performance measures can be expressed analytically when Poisson traffic is assumed.

| The failure of Poisson process in modeling vehicular traffic
The Poisson process is, however, not a good model for vehicular traffic in the vast majority of cases. After examining a number of recent traces comprising LIDAR measurements, 20 it is easy to show the apparent differences between the statistics of the empirical measurements and those of the Poisson process.
The left plot in Figure 1 depicts the empirical probability density function (pdf) of the measured vehicle interarrival times (also referred to as headway times in the literature) and the pdf of the Poisson model having the same traffic intensity. The shapes of the two pdfs are clearly different. A popular measure to quantify the "burstiness" of the traffic is the squared coefficient of variation (SCV), defined as the ratio of the variance to the square of the mean. In the Poisson case (when the interarrival times are exponentially distributed), SCV = 1 holds. SCV values lower than 1 correspond to more "regular" (closer to deterministic) traffic, while SCV > 1 means that the traffic is more variable (burstier) than Poisson traffic. For this measurement trace, we obtained SCV = 46.913, which means that the vehicle interarrival times are extremely variable.
Another limitation of the Poisson process is that its interarrival times are independent, so it cannot capture correlations between the vehicle interarrival times. In practice, however, the interarrival times are correlated, as shown by the right plot in Figure 1, which depicts the correlation between an interarrival time and the kth subsequent interarrival time (the lag-k correlation).
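Both statistics can be estimated directly from a sequence of measured headways. The following minimal sketch uses the standard sample estimators (population variance and Pearson correlation from numpy); the function names are hypothetical, the data are synthetic, and the paper does not state which estimator was used for Figure 1.

```python
import numpy as np

def headway_scv(headways):
    """Squared coefficient of variation: sample variance divided by the squared mean."""
    h = np.asarray(headways, dtype=float)
    return h.var() / h.mean() ** 2

def headway_lag_corr(headways, k):
    """Sample (Pearson) correlation between T_n and T_{n+k}."""
    h = np.asarray(headways, dtype=float)
    return np.corrcoef(h[:-k], h[k:])[0, 1]

# Synthetic data only: exponential headways (Poisson traffic) with rate 0.65 1/s
rng = np.random.default_rng(1)
poisson_like = rng.exponential(scale=1 / 0.65, size=100_000)
print(headway_scv(poisson_like), headway_lag_corr(poisson_like, 1))  # close to 1 and close to 0
```

For a real trace, values far from SCV = 1 or clearly nonzero lag-k correlations indicate that a Poisson model is inadequate.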
In the subsequent sections, we show that accurate modeling of vehicular traffic is essential to obtain accurate analytical results for the message propagation distance (and, we believe, for many other performance measures, too).

| Markov arrival process
Having shown that the Poisson model is not always suitable for vehicular traffic, we propose an alternative in this section. The Markovian arrival process (MAP 21 ) is a useful modeling tool that is capable of characterizing non-Poisson traffic. It is widely applied in many areas, including telecommunication and logistics systems. MAPs, while being rather general and flexible traffic models, are still relatively simple to work with in analytical derivations.
In a Poisson process, the interarrival times are exponentially distributed. In the case of MAPs, the interarrival times are not necessarily exponentially distributed; they are composed of several exponentially distributed phases. More precisely, a MAP has a background process, which is an irreducible continuous-time Markov chain (CTMC) with two kinds of state transitions: those that generate an arrival and those that do not. If the $N \times N$ generator matrix of the background process is denoted by $D$, we have $D = D_0 + D_1$, where matrix $D_1$ contains the rates of the state transitions that are accompanied by a vehicle arrival, and matrix $D_0$ contains the rates of the transitions that do not generate arrival events (internal transitions only), together with the negative diagonal entries. Since $(D_0 + D_1)\mathbf{1} = \mathbf{0}$ ($\mathbf{1}$ is the column vector of ones and $\mathbf{0}$ is the column vector of zeros of appropriate size), we have $D_0\mathbf{1} = -D_1\mathbf{1}$. Figure 2 presents an example with N = 5, where the dashed lines are the internal transitions and the solid ones are accompanied by vehicle arrivals. The states of the background process are often referred to as phases, too.
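If the phase process of the MAP is denoted by $J(t)$, $t \ge 0$, and the kth interarrival time by $T_k$, then the complementary cumulative distribution function (ccdf) and the probability density function (pdf) of the interarrival times, jointly with the phases, are given by 22

$$P\big(T_k > t,\ J(t_{k-1} + t) = j \mid J(t_{k-1}) = i\big) = \big[e^{D_0 t}\big]_{ij}, \tag{1}$$

$$\frac{d}{dt} P\big(T_k \le t,\ J(t_k) = j \mid J(t_{k-1}) = i\big) = \big[e^{D_0 t} D_1\big]_{ij}, \tag{2}$$

where $t_k = T_1 + \dots + T_k$ is the instant of the kth arrival. The stochastic matrix $P$ containing the state transition probabilities between two consecutive arrival events is obtained by

$$P = \int_0^\infty e^{D_0 t} D_1\, dt = (-D_0)^{-1} D_1,$$

and will be used frequently in the sequel. The stationary row vector of matrix $P$, denoted by $\pi$, is the solution of $\pi P = \pi$, $\pi\mathbf{1} = 1$. With this vector, we can express the scalar (phase-independent) pdf of the interarrival times from (2) as 22

$$f(t) = \pi e^{D_0 t} D_1 \mathbf{1}.$$

Based on this result, it is easy to derive the following simple quantities characterizing the interarrival times 22 :

$$E(T_k^n) = n!\,\pi(-D_0)^{-n}\mathbf{1}, \qquad \mathrm{SCV} = \frac{E(T_k^2)}{E(T_k)^2} - 1.$$

The interarrival times generated by a MAP can be correlated, too. The lag-k correlation, that is, the correlation between $T_1$ and $T_{k+1}$, can be calculated by 22

$$\rho_k = \frac{\lambda\,\alpha P^k (-D_0)^{-1}\mathbf{1} - 1}{2\lambda\,\alpha(-D_0)^{-1}\mathbf{1} - 1},$$

where $\lambda = 1/E(T_k)$ is the mean arrival rate and $\alpha$ is the stationary probability vector of the CTMC given by matrix $D$, satisfying $\alpha D = \mathbf{0}$, $\alpha\mathbf{1} = 1$.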
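These quantities translate directly into a few lines of linear algebra. The sketch below computes $P = (-D_0)^{-1}D_1$, $\pi$, $\alpha$, the mean rate, the SCV (from $E(T^n) = n!\,\pi(-D_0)^{-n}\mathbf 1$) and the lag-k correlation $\rho_k$ for a small MAP. The two-phase matrices are hypothetical and are not fitted to any measurement; stationary vectors are obtained by a least-squares solve of the singular system augmented with the normalization condition.

```python
import numpy as np

def left_null_normalized(M):
    """Row vector x with x @ M = 0 and x.sum() = 1 (least-squares solve)."""
    n = M.shape[0]
    A = np.vstack([M.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def map_statistics(D0, D1, k=1):
    D0 = np.asarray(D0, dtype=float)
    D1 = np.asarray(D1, dtype=float)
    n = D0.shape[0]
    one = np.ones(n)
    M = np.linalg.inv(-D0)                       # (-D0)^{-1}
    P = M @ D1                                   # phase transitions between arrivals
    pi = left_null_normalized(P - np.eye(n))     # pi P = pi, pi 1 = 1
    alpha = left_null_normalized(D0 + D1)        # alpha D = 0, alpha 1 = 1
    m1 = pi @ M @ one                            # E(T)
    m2 = 2.0 * pi @ M @ M @ one                  # E(T^2)
    lam = 1.0 / m1
    scv = m2 / m1**2 - 1.0
    Pk = np.linalg.matrix_power(P, k)
    # rho_k = (lam alpha P^k (-D0)^{-1} 1 - 1) / (2 lam alpha (-D0)^{-1} 1 - 1)
    rho_k = (lam * alpha @ Pk @ M @ one - 1.0) / (2.0 * lam * alpha @ M @ one - 1.0)
    return {"P": P, "pi": pi, "alpha": alpha, "rate": lam, "scv": scv, "rho": rho_k}

# Hypothetical two-phase MAP: a "busy" phase 0 and a "calm" phase 1
D0 = np.array([[-2.0, 0.5], [0.2, -0.5]])
D1 = np.array([[1.2, 0.3], [0.1, 0.2]])
stats = map_statistics(D0, D1, k=1)
print(stats["rate"], stats["scv"], stats["rho"])
```

With a single phase ($D_0 = [-\lambda]$, $D_1 = [\lambda]$) the function returns SCV = 1 and $\rho_k = 0$, the Poisson case.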

| Obtaining the MAP from empirical measurement data
MAPs form a dense class in the set of point processes, 23 which means that every point process can be approximated arbitrarily well by a MAP having sufficiently many phases. The critical question is how to obtain the matrix parameters of the MAP, $D_0$ and $D_1$, so that they approximate the real traffic behavior accurately. One possibility is to build the background Markov chain of the MAP based on an intuitive understanding of the traffic. For instance, in Figure 3, the arrival process is Poisson, but the arrival rate is time-dependent, where the periods determining the arrival rate are exponentially distributed and follow each other in a circular way. The solid lines represent transitions of $D_0$, and the dashed lines the ones of $D_1$ (here, vehicle arrivals are not accompanied by phase transitions).

FIGURE 2 Markovian arrival process transition diagram
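Such an intuitive construction is easy to write down in matrix form. The sketch below assumes, as in the Figure 3 example, K ≥ 2 periods visited in a circle with exponentially distributed durations and a constant Poisson arrival rate within each period; the period rates and switching rates are hypothetical. Because arrivals do not change the period, $D_1$ is diagonal (a Markov-modulated Poisson process).

```python
import numpy as np

def circular_mmpp(rates, switch_rates):
    """(D0, D1) of a Poisson process whose rate is modulated by K >= 2 periods
    that follow each other in a circle: period k -> period k+1 (mod K)."""
    rates = np.asarray(rates, dtype=float)
    switch = np.asarray(switch_rates, dtype=float)
    K = len(rates)
    Q = np.zeros((K, K))                 # generator of the period (phase) process
    for k in range(K):
        Q[k, (k + 1) % K] = switch[k]
        Q[k, k] = -switch[k]
    D1 = np.diag(rates)                  # an arrival leaves the phase unchanged
    D0 = Q - D1                          # internal transitions minus arrival rates
    return D0, D1

def mean_rate(D0, D1):
    """Mean arrival rate alpha D1 1, with alpha the stationary vector of D0 + D1."""
    n = D0.shape[0]
    A = np.vstack([(D0 + D1).T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    alpha = np.linalg.lstsq(A, b, rcond=None)[0]
    return alpha @ D1 @ np.ones(n)

# Hypothetical example: calm, moderate and busy periods, each lasting 10 s on average
D0, D1 = circular_mmpp(rates=[0.2, 1.0, 2.0], switch_rates=[0.1, 0.1, 0.1])
print(D0, D1, mean_rate(D0, D1), sep="\n")
```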
When empirical measurement data is available, an automatic approach can also be used. The process of obtaining the $D_0$, $D_1$ matrix parameters of a MAP from empirical measurements is called MAP fitting. There are several MAP fitting algorithms available. Some of them compute statistical quantities (marginal moments, autocorrelations, joint moments, etc.) and create the $D_0$, $D_1$ matrices such that the resulting MAP exhibits the same statistics. [24][25][26] Other algorithms search for the MAP that is most likely to generate the measurement data using the expectation-maximization (EM) algorithm. 27,28 Nevertheless, MAP fitting is an inherently difficult problem; the corresponding algorithms are continuously improving, but no single perfect tool is available yet.
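To make moment-based fitting concrete, the sketch below uses a deliberately simple technique that is not one of the cited algorithms: it matches only the mean and the SCV (≥ 1) of the headways with a two-phase hyperexponential renewal process using the "balanced means" rule, and writes the result as a MAP. The function names are hypothetical. A renewal MAP has zero lag-k correlations, so this fit cannot reproduce the correlations seen in Figure 1.

```python
import numpy as np

def fit_h2_renewal_map(mean, scv):
    """Balanced-means H2 fit of (mean, scv), scv >= 1, returned as a renewal MAP."""
    if scv < 1.0:
        raise ValueError("a hyperexponential distribution requires SCV >= 1")
    p1 = 0.5 * (1.0 + np.sqrt((scv - 1.0) / (scv + 1.0)))
    p = np.array([p1, 1.0 - p1])        # branch (initial phase) probabilities
    mu = 2.0 * p / mean                 # each branch contributes mean/2: p_i/mu_i = mean/2
    D0 = -np.diag(mu)
    D1 = np.outer(mu, p)                # exit rate times restart vector: no memory of the phase
    return D0, D1

def mean_and_scv(D0, D1):
    n = D0.shape[0]
    M = np.linalg.inv(-D0)
    P = M @ D1
    A = np.vstack([(P - np.eye(n)).T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    m1 = pi @ M @ np.ones(n)
    m2 = 2.0 * pi @ M @ M @ np.ones(n)
    return m1, m2 / m1**2 - 1.0

D0, D1 = fit_h2_renewal_map(mean=1 / 0.65, scv=5.0)
print(mean_and_scv(D0, D1))   # reproduces the requested mean and SCV
```

Matching correlations as well requires more phases and a non-renewal structure, which is what the algorithms cited above provide.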
For our VANET traffic trace, we applied the EM algorithm of Horváth and Okamura 27 and the KPC-toolbox. 25 Of the two, the former provided better results, and it is used for the numerical investigations in the rest of the paper. As depicted in Figure 1, the pdf of the vehicle interarrival times is accurately captured by the MAP model, as opposed to the Poisson model. The matching of the correlations is somewhat worse, which may be a consequence of the small amount of data available for fitting.

| CLUSTERS OF INFORMED VEHICLES
In this paper, we consider a scenario with a single road and a single message source. The message source is stationary (it does not move) and emits messages continuously. The message source, as well as all the vehicles, communicates through radio transmission, with the radio coverage assumed to be fixed with radius R (Table 1). The transmission is considered to be perfect; the messages are delivered without any delay and error (also referred to as the Friis model 29 ). When a vehicle receives the message, it becomes informed and, while moving, emits the message itself, too.
All vehicles are assumed to be identical and to have the same constant speed v. Since all vehicles have the same speed, they cannot overtake each other. The road is straight and has infinite length.
A group of vehicles in which the distance between subsequent vehicles is less than the radio transmission range R is called a cluster. We assume that the vehicles in the same cluster can exchange information immediately: whenever the first vehicle sends a message, all the vehicles within the cluster receive it immediately. The cluster length, represented by the random variable G, is measured from the position of the first vehicle to the last position where the information is available, that is, the position of the last vehicle plus R (see Figure 4).
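The cluster definition can be applied directly to a recorded sequence of headway times. The sketch below is a hypothetical helper, not part of the paper: it assumes that all vehicles share the constant speed v, so that a time headway h corresponds to a spacing of v·h, and it treats a spacing of exactly R as still connected.

```python
def cluster_lengths(headways, R, v):
    """Split vehicles into clusters and return each cluster length G.

    headways[k] is the time between vehicle k and vehicle k+1. A new cluster
    starts whenever the spacing v * h exceeds the radio range R; a cluster's
    length is the distance from its first to its last vehicle plus R.
    """
    tau = R / v                     # the range R expressed as a time headway
    lengths = []
    span = 0.0                      # first-to-last vehicle distance in the current cluster
    for h in headways:
        if h <= tau:                # next vehicle is within range: same cluster
            span += v * h
        else:                       # gap too large: close the cluster
            lengths.append(span + R)
            span = 0.0
    lengths.append(span + R)        # the last cluster of the sequence
    return lengths

print(cluster_lengths([1.0, 2.0, 10.0, 1.0], R=150.0, v=36.0))  # [258.0, 186.0]
```

Here the 10 s headway (360 m) breaks the sequence into a three-vehicle and a two-vehicle cluster.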
In the rest of the section, we study the stochastic behavior of G, as it is an essential ingredient in the analysis of the message propagation distance. Since the vehicle arrival process is a MAP, to describe the system completely, we have to consider the joint behavior of G and the phase of the MAP. Namely, we have to keep track of the phase of the MAP at two specific time points: at the time when the first vehicle of the cluster was generated, and G/v later (at the end of the cluster).
If the phase of the MAP at time t is denoted by J(t), and the first vehicle of the cluster was generated at time t = 0 (without loss of generality), the complementary cumulative distribution function (ccdf) of G and the phase at the end of the cluster is defined by

$$G_{ij}(x) = P\big(G > x,\ J(G/v) = j \mid J(0) = i\big),$$

and the corresponding matrix is

$$\mathbf{G}(x) = \big[G_{ij}(x)\big]. \tag{8}$$

Before expressing $\mathbf{G}(x)$, let us introduce matrix $Z = [Z_{ij}]$ with the transition probabilities of the MAP phases between the beginning and the end of the cluster as

$$Z_{ij} = P\big(J(G/v) = j \mid J(0) = i\big). \tag{9}$$

Thus, $Z_{ij}$ is the probability that the phase of the MAP is j at the end of the cluster, given that it was i at its beginning. For matrix Z, an explicit solution is provided by the next theorem.

Theorem 1 Matrix Z can be expressed by

$$Z = \big(I - P + e^{D_0 R/v} P\big)^{-1} e^{D_0 R/v}. \tag{10}$$

Proof By conditioning on the time between the first two vehicles, there are two possibilities: either the headway time is greater than R/v, or it is less than or equal to R/v (at speed v, the transmission range R, a distance, translates to R/v in time). In the former case, the two vehicles do not belong to the same cluster, the cluster ends R/v after its first vehicle, and the phase transitions until then are given by (1). In the latter case, they are in the same cluster, and the phase transition probabilities can be calculated recursively, based on the cluster initiated by the second vehicle. Using (1) and (2), we get

$$Z_{ij} = \big[e^{D_0 R/v}\big]_{ij} + \int_0^{R/v} \sum_k \big[e^{D_0 y} D_1\big]_{ik} Z_{kj}\, dy,$$

which, switching to matrix notation, translates to

$$Z = e^{D_0 R/v} + \int_0^{R/v} e^{D_0 y} D_1\, dy\; Z.$$

Moreover, the integral in the second term can be simplified as

$$\int_0^{R/v} e^{D_0 y} D_1\, dy = \big(I - e^{D_0 R/v}\big)(-D_0)^{-1} D_1 = \big(I - e^{D_0 R/v}\big) P,$$

which, after rearranging to $\big(I - P + e^{D_0 R/v} P\big) Z = e^{D_0 R/v}$, establishes the theorem.

TABLE 1 Notation

| Symbol | Description |
|---|---|
| $G$, $\mathbf{E}(G)$, $E(G)$ | The random variable of the cluster length and its phase-dependent and phase-independent mean values |
| $\mathbf{G}(x)$, $G(x)$ | The phase-dependent and phase-independent ccdf of the cluster length |
| $\mathbf{g}(x)$ | The phase-dependent pdf of the cluster length |
| $Z$ | The transition probabilities of the MAP phases between the beginning and the end of a cluster |
| $\gamma$ | The stationary phase distribution of the MAP at the beginning of a cluster |
| $D(t)$, $\mathbf{F}(t,x)$ | The stochastic process of the information distance and its phase-dependent transient ccdf |
| $D$, $E(D)$ | The stationary information distance and its mean value |
| $\mathbf{F}(x)$, $F(x)$ | The phase-dependent and the phase-independent ccdf of the stationary information distance |
| $H$, $E(H)$ | The random variable representing the distance between clusters, and its mean value |
| $\beta(t)$, $\beta$ | The transient and the stationary phase-dependent probabilities that no vehicle holds the information |
| $C$ | The cycle time, that is, the interarrival time between two cluster heads |

FIGURE 4 A cluster of informed vehicles
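Theorem 1 requires only a matrix exponential and one linear solve. The sketch below evaluates $Z = (I - P + e^{D_0R/v}P)^{-1}e^{D_0R/v}$ with `scipy.linalg.expm`; the two-phase MAP is hypothetical, and R = 150 m, v = 36 m/s are the values used later in the numerical examples. Solving a linear system instead of forming the inverse explicitly is a standard numerical choice.

```python
import numpy as np
from scipy.linalg import expm

def cluster_phase_matrix(D0, D1, R, v):
    """Z = (I - P + e^{D0 R/v} P)^{-1} e^{D0 R/v} (Theorem 1)."""
    D0 = np.asarray(D0, dtype=float)
    D1 = np.asarray(D1, dtype=float)
    n = D0.shape[0]
    P = np.linalg.solve(-D0, D1)          # (-D0)^{-1} D1
    E = expm(D0 * R / v)                  # phase transitions with no arrival during R/v
    return np.linalg.solve(np.eye(n) - P + E @ P, E)

D0 = np.array([[-2.0, 0.5], [0.2, -0.5]])   # hypothetical MAP
D1 = np.array([[1.2, 0.3], [0.1, 0.2]])
Z = cluster_phase_matrix(D0, D1, R=150.0, v=36.0)
print(Z, Z.sum(axis=1))                      # Z is a stochastic matrix
```

For a Poisson process (a single phase), Z = 1, since there is only one phase to end in.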
Similar to the results available for the Poisson case, 18 by the stochastic interpretation of the system, $\mathbf{G}(x)$ can be expressed recursively as

$$\mathbf{G}(x) = \begin{cases} Z, & x < R,\\[4pt] \displaystyle\int_0^{R/v} e^{D_0 y} D_1\, \mathbf{G}(x - vy)\, dy, & x \ge R. \end{cases} \tag{12}$$

The first case of (12) corresponds to the situation when x falls into the transmission range of the first vehicle of the cluster. In this case $G_{ij}(x) = Z_{ij}$: since G ≥ R holds, we only have to take care of the phase transitions (see (8) and (9)).
In the second case of (12), x is out of range for the first vehicle; thus, to cover x, the cluster has to consist of more vehicles (Figure 4). Conditioning on the interarrival time y ≤ R/v between the first and the second vehicle of the cluster (having pdf $e^{D_0 y} D_1$) leads to (12).
In order to compute the mean cluster length E(G), we define the phase-dependent mean cluster length, which contains information on the MAP phase when the first car was generated, J(0), and on the MAP phase at the end of the cluster, J(G/v): the (i,j)th element of the phase-dependent mean cluster length matrix is defined by $[\mathbf{E}(G)]_{i,j} = E\big(G \cdot \mathcal{I}\{J(G/v) = j\} \mid J(0) = i\big)$, where $\mathcal{I}\{\cdot\}$ is the indicator variable.

Theorem 2
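The phase-dependent mean value of G can be calculated by

$$\mathbf{E}(G) = R\,Z(I - PZ) + v\big(I - P + e^{D_0 R/v} P\big)^{-1}(-D_0)^{-1}\big(Z - e^{D_0 R/v}\big).$$

Proof To obtain the mean value, the integral of the ccdf (12) is calculated as follows:

$$\mathbf{E}(G) = \int_0^\infty \mathbf{G}(x)\,dx = RZ + \int_0^{R/v} e^{D_0 y} D_1 \int_{R - vy}^\infty \mathbf{G}(s)\,ds\,dy = RZ + \big(I - e^{D_0 R/v}\big)P\,\mathbf{E}(G) - \int_0^{R/v} (R - vy)\, e^{D_0 y} D_1\, dy\; Z,$$

where we used $\int_0^{R - vy}\mathbf{G}(s)\,ds = (R - vy)Z$. The last integral evaluates to $RP - v\big(I - e^{D_0 R/v}\big)(-D_0)^{-1}P$. Here we exploited that $\big(I - e^{D_0 R/v}\big)PZ = Z - e^{D_0 R/v}$, that $(I - P)Z = e^{D_0 R/v}(I - PZ)$, and that $\big(I - P + e^{D_0 R/v} P\big)^{-1} e^{D_0 R/v} = Z$, all of which follow from (10). Collecting the $\mathbf{E}(G)$ terms together provides the theorem.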
In the next step, we compute the phase-independent mean cluster length E(G) from matrix $\mathbf{E}(G)$. To do so, we have to obtain the stationary phase distribution of the MAP at the beginning of the clusters, denoted by row vector γ.

Lemma 1
The stationary phase distribution vector of the MAP at the beginning of the cluster, vector γ, satisfies the linear set of equations γZP = γ, γ1 = 1.
Proof Let us define a discrete-time Markov chain characterizing the evolution of the phases at the beginning of the clusters. The phase transitions between the beginning and the end of the clusters are governed by stochastic matrix Z. The phase transitions between the beginning and the end of the intercluster periods (from the end of a cluster to the arrival of the next vehicle) are governed by stochastic matrix $P = (-D_0)^{-1} D_1$. Hence, the phase transitions between the beginnings of two consecutive clusters are given by matrix ZP (which is also stochastic), whose stationary distribution provides γ.
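Lemma 1 and Theorem 2 can be evaluated together: γ is the normalized left fixed vector of ZP, and the scalar mean is $E(G) = \gamma\,\mathbf E(G)\,\mathbf 1$ (the phase-independent mean, stated formally in Corollary 1 below). The sketch below uses the same hypothetical two-phase MAP as before; in the single-phase (Poisson) case, it reduces to the closed form $E(G) = v\,(e^{\lambda R/v} - 1)/\lambda$.

```python
import numpy as np
from scipy.linalg import expm

def mean_cluster_length(D0, D1, R, v):
    """gamma (Lemma 1), the matrix E(G) (Theorem 2) and the scalar gamma E(G) 1."""
    D0 = np.asarray(D0, dtype=float)
    D1 = np.asarray(D1, dtype=float)
    n = D0.shape[0]
    I, one = np.eye(n), np.ones(n)
    M = np.linalg.inv(-D0)                      # (-D0)^{-1}
    P = M @ D1
    E = expm(D0 * R / v)
    A = I - P + E @ P
    Z = np.linalg.solve(A, E)                   # Theorem 1
    # Lemma 1: gamma Z P = gamma, gamma 1 = 1
    lhs = np.vstack([(Z @ P - I).T, one])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    gamma = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
    # Theorem 2: E(G) = R Z (I - P Z) + v A^{-1} (-D0)^{-1} (Z - e^{D0 R/v})
    EG_mat = R * Z @ (I - P @ Z) + v * np.linalg.solve(A, M @ (Z - E))
    return gamma, EG_mat, gamma @ EG_mat @ one

D0 = np.array([[-2.0, 0.5], [0.2, -0.5]])     # hypothetical MAP
D1 = np.array([[1.2, 0.3], [0.1, 0.2]])
gamma, EG_mat, EG = mean_cluster_length(D0, D1, R=150.0, v=36.0)
print(gamma, EG)
```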

Corollary 1
The phase-independent mean cluster length, E(G), is

$$E(G) = \gamma\,\mathbf{E}(G)\,\mathbf{1}. \tag{14}$$

After the description of the first moment by (14), the next theorem and the corresponding corollary express the second moment. The second moment makes it possible to study the variance of the cluster length, and it is also an important ingredient for computing the mean information distance in the next section.

Theorem 3
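The phase-dependent second moment of the cluster length, with elements $[\mathbf{E}(G^2)]_{i,j} = E\big(G^2 \cdot \mathcal{I}\{J(G/v) = j\} \mid J(0) = i\big)$, can be expressed by

$$\mathbf{E}(G^2) = R^2 Z(I - PZ) - 2R\,Z\big(v(-D_0)^{-1}PZ + P\,\mathbf{E}(G)\big) + 2v\big(I - P + e^{D_0R/v}P\big)^{-1}(-D_0)^{-1}\Big(v(-D_0)^{-1}\big(Z - e^{D_0R/v}\big) + \big(I - e^{D_0R/v}\big)P\,\mathbf{E}(G)\Big).$$

Proof To obtain $\mathbf{E}(G^2)$, the integral of the ccdf (12) is calculated by

$$\mathbf{E}(G^2) = \int_0^\infty 2x\,\mathbf{G}(x)\,dx = R^2 Z + \int_0^{R/v} e^{D_0 y} D_1 \underbrace{\int_R^\infty 2x\,\mathbf{G}(x - vy)\,dx}_{(*)}\,dy, \tag{16}$$

where the term marked with (*) can be simplified, by substituting s = x − vy and using $\mathbf{G}(s) = Z$ for s < R, as

$$\int_{R - vy}^\infty 2(s + vy)\,\mathbf{G}(s)\,ds = \mathbf{E}(G^2) + 2vy\,\mathbf{E}(G) - (R^2 - v^2y^2)Z.$$

Putting this result back into (16), for all terms of $\mathbf{E}(G^2)$, we get the following:

$$\mathbf{E}(G^2) = R^2 Z + \big(I - e^{D_0R/v}\big)P\,\mathbf{E}(G^2) + 2v\int_0^{R/v} y\,e^{D_0 y}\,dy\,D_1\,\mathbf{E}(G) - \int_0^{R/v} (R^2 - v^2y^2)\,e^{D_0 y}\,dy\,D_1 Z. \tag{17}$$

To simplify the third term of (17), we apply integration by parts, leading to

$$\int_0^{R/v} y\,e^{D_0 y}\,dy = -\frac{R}{v}\,e^{D_0R/v}(-D_0)^{-1} + \big(I - e^{D_0R/v}\big)(-D_0)^{-2}.$$

In the same way, we can also simplify the last term of (17) as

$$\int_0^{R/v} (R^2 - v^2y^2)\,e^{D_0 y}\,dy = R^2(-D_0)^{-1} + 2vR\,e^{D_0R/v}(-D_0)^{-2} - 2v^2\big(I - e^{D_0R/v}\big)(-D_0)^{-3}.$$

Finally, putting all terms together gives

$$\big(I - P + e^{D_0R/v}P\big)\mathbf{E}(G^2) = R^2 e^{D_0R/v}(I - PZ) - 2R\,e^{D_0R/v}\big(v(-D_0)^{-1}PZ + P\,\mathbf{E}(G)\big) + 2v(-D_0)^{-1}\Big(v(-D_0)^{-1}\big(Z - e^{D_0R/v}\big) + \big(I - e^{D_0R/v}\big)P\,\mathbf{E}(G)\Big).$$

Exploiting $\big(I - P + e^{D_0R/v}P\big)^{-1} e^{D_0R/v} = Z$ from (10) and applying simple transformations establishes the theorem.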
As with the mean cluster length, we can obtain the phase-independent (scalar) second moment of the cluster length by multiplying $\mathbf{E}(G^2)$ by γ from the left and by $\mathbf{1}$ from the right.
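The first two moments yield the SCV of the cluster length, which is the quantity plotted on the right side of Figure 6. The sketch below evaluates Theorems 2 and 3 and multiplies by γ and $\mathbf 1$; the MAP is hypothetical. For a single-phase (Poisson) MAP, with $u = v/\lambda$ and $a = \lambda R/v$, the results reduce to $E(G) = u(e^a - 1)$ and $E(G^2) = 2u^2 e^a (e^a - 1 - a)$, which serves as a check.

```python
import numpy as np
from scipy.linalg import expm

def cluster_length_moments(D0, D1, R, v):
    """Mean, second moment and SCV of the cluster length (Theorems 2-3, Corollaries 1-2)."""
    D0 = np.asarray(D0, dtype=float)
    D1 = np.asarray(D1, dtype=float)
    n = D0.shape[0]
    I, one = np.eye(n), np.ones(n)
    M = np.linalg.inv(-D0)
    P = M @ D1
    E = expm(D0 * R / v)
    A = I - P + E @ P
    Z = np.linalg.solve(A, E)
    lhs = np.vstack([(Z @ P - I).T, one])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    gamma = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
    EG1 = R * Z @ (I - P @ Z) + v * np.linalg.solve(A, M @ (Z - E))
    EG2 = (R**2 * Z @ (I - P @ Z)
           - 2 * R * Z @ (v * M @ P @ Z + P @ EG1)
           + 2 * v * np.linalg.solve(A, M @ (v * M @ (Z - E) + (I - E) @ P @ EG1)))
    m1 = gamma @ EG1 @ one
    m2 = gamma @ EG2 @ one
    return m1, m2, m2 / m1**2 - 1.0

D0 = np.array([[-2.0, 0.5], [0.2, -0.5]])   # hypothetical MAP
D1 = np.array([[1.2, 0.3], [0.1, 0.2]])
print(cluster_length_moments(D0, D1, R=150.0, v=36.0))
```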

Corollary 2
The phase-independent second moment of the cluster length is $E(G^2) = \gamma\,\mathbf{E}(G^2)\,\mathbf{1}$. Finally, we provide a differential equation for the ccdf $\mathbf{G}(x)$ itself. The motivation is that it is usually easier to solve a differential equation than the integral equation (12).

Theorem 4
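The phase-dependent ccdf of G is the solution of the delay differential equation (DDE)

$$\frac{d}{dx}\mathbf{G}(x + R) = \frac{1}{v}\Big((D_0 + D_1)\,\mathbf{G}(x + R) - e^{D_0R/v} D_1\,\mathbf{G}(x)\Big), \qquad x > 0,$$

with initial function $\mathbf{G}(x) = Z$ for $0 \le x < R$ and initial value $\mathbf{G}(R) = Z - e^{D_0R/v}$.
Proof Let us express and manipulate $\mathbf{G}(x + R)$. For x > 0, we need to use the second case of (12) only. Changing the variable in the integral to s = x + R − vy gives

$$\mathbf{G}(x + R) = \frac{1}{v}\int_x^{x + R} e^{D_0 (x + R - s)/v} D_1\,\mathbf{G}(s)\,ds.$$

Differentiating both sides with respect to x, the integrand contributes $\frac{1}{v} D_0\,\mathbf{G}(x + R)$, while the upper and lower limits contribute $\frac{1}{v} D_1\,\mathbf{G}(x + R)$ and $-\frac{1}{v} e^{D_0R/v} D_1\,\mathbf{G}(x)$, respectively, which establishes the theorem. The initial value follows from the second case of (12) at x = R, where only the single-vehicle clusters (with phase transitions $e^{D_0R/v}$) are excluded from Z.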
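Theorem 4 lends itself to a step-by-step numerical solution. The sketch below uses the simplest option, an explicit Euler scheme on a uniform grid whose step divides R (method of steps), with $\mathbf G(x) = Z$ for $0 \le x < R$ and $\mathbf G(R) = Z - e^{D_0R/v}$ (the latter from the second case of (12) at x = R). The grid resolution `steps_per_R` is a hypothetical accuracy parameter, and a higher-order DDE solver would be preferable for production use.

```python
import numpy as np
from scipy.linalg import expm

def cluster_ccdf_dde(D0, D1, R, v, x_max, steps_per_R=400):
    """Explicit Euler solution of dG(x)/dx = ((D0+D1) G(x) - e^{D0 R/v} D1 G(x-R)) / v
    for x > R on the grid x_m = m*h. Returns the grid and G(x_m) as an (m, n, n) array."""
    D0 = np.asarray(D0, dtype=float)
    D1 = np.asarray(D1, dtype=float)
    n = D0.shape[0]
    P = np.linalg.solve(-D0, D1)
    E = expm(D0 * R / v)
    Z = np.linalg.solve(np.eye(n) - P + E @ P, E)
    N = steps_per_R                        # grid points per delay R
    h = R / N
    total = int(np.ceil(x_max / h)) + 1
    G = np.empty((total, n, n))
    G[:N] = Z                              # initial function on [0, R)
    G[N] = Z - E                           # initial value at x = R
    A = (D0 + D1) / v
    B = E @ D1 / v
    for m in range(N, total - 1):          # one Euler step per grid point
        G[m + 1] = G[m] + h * (A @ G[m] - B @ G[m - N])
    return np.arange(total) * h, G

D0 = np.array([[-2.0, 0.5], [0.2, -0.5]])   # hypothetical MAP
D1 = np.array([[1.2, 0.3], [0.1, 0.2]])
x, G = cluster_ccdf_dde(D0, D1, R=150.0, v=36.0, x_max=20 * 150.0)
print(G[-1].sum(axis=1))                     # P(G > x_max | J(0) = i)
```

Integrating $\mathbf G(x)$ over x reproduces $\mathbf E(G)$ up to the discretization and truncation error, which gives a practical consistency check against Theorem 2.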

| ANALYSIS OF THE MESSAGE PROPAGATION
Assume that an event occurs on the highway at position A. In the rest of the paper, this position is assumed to be fixed, and messages advertising the event are generated continuously for a long time. Here we study the right-continuous stochastic process {D(t), t > 0}, where D(t) is the information propagation distance, that is, the position of the last car (measured from A) that has received the message by time t, plus R (the radius of its radio coverage). In Mahmood and Horvath, 18 we studied the same scenario with the Poisson arrival process and showed that, due to the PASTA property, E(D) and E(G) are equal. In the case of MAP traffic, E(D) and E(G) are not the same, and we investigate their relation in this section. The time evolution of D(t) is shown in Figure 5. The trajectory of D(t) consists of alternating intervals. There are intervals where no vehicle holds the information; the length of these intervals (in distance) is denoted by H. Then, a vehicle enters the range of the accident and gets informed, informing its cluster of length G as well. This informed cluster leaves the accident in time G/v, followed by another uninformed interval, and so forth.
The mean length of the uninformed intervals is easy to derive. The phase of the MAP at the beginning of a cluster is given by vector γ; on the other hand, the phase at the end of the cluster is given by γZ. With this initial phase, the mean time until the MAP generates an arrival is $\gamma Z(-D_0)^{-1}\mathbf{1}$; thus, we have

$$E(H) = v\,\gamma Z(-D_0)^{-1}\mathbf{1}.$$

From the properties of H and G derived above, we can characterize the properties of D(t).
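The mean uninformed length needs only Z and γ. A minimal sketch, reusing the hypothetical two-phase MAP and the parameters of the numerical section; for Poisson traffic it reduces to $E(H) = v/\lambda$, the mean spacing between vehicles, as memorylessness suggests.

```python
import numpy as np
from scipy.linalg import expm

def mean_uninformed_length(D0, D1, R, v):
    """E(H) = v * gamma Z (-D0)^{-1} 1: distance travelled until the next vehicle arrives."""
    D0 = np.asarray(D0, dtype=float)
    D1 = np.asarray(D1, dtype=float)
    n = D0.shape[0]
    I, one = np.eye(n), np.ones(n)
    M = np.linalg.inv(-D0)
    P = M @ D1
    E = expm(D0 * R / v)
    Z = np.linalg.solve(I - P + E @ P, E)          # Theorem 1
    lhs = np.vstack([(Z @ P - I).T, one])          # Lemma 1: gamma Z P = gamma
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    gamma = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
    return v * gamma @ Z @ M @ one

D0 = np.array([[-2.0, 0.5], [0.2, -0.5]])   # hypothetical MAP
D1 = np.array([[1.2, 0.3], [0.1, 0.2]])
print(mean_uninformed_length(D0, D1, R=150.0, v=36.0))
```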

| Mean information distance
Let us first derive the mean value of $D = \lim_{t\to\infty} D(t)$, denoted by E(D).

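FIGURE 5 The evolution of the information distance D(t)

Theorem 5 The mean information distance is expressed by

$$E(D) = R + \frac{E(G^2)}{2\big(E(G) + E(H)\big)}. \tag{21}$$

Proof The process {D(t), J(t)} forms a Markov renewal process at the moments when the current informed cluster leaves the accident. This means that at these time instants, the phase of the MAP characterizes the future of the process completely.
The theorem can be proven by the renewal reward theorem 30 as

$$E(D) = \frac{E\Big(\int_{\text{cycle}} D(t)\,dt\Big)}{E(C)}, \tag{22}$$

where C denotes the time duration of a cycle, which is the sum of an informed interval and the subsequent uninformed interval. Hence, (22) is the integral of the information distance over a stationary cycle, divided by the mean cycle time.
If the joint density of G and the corresponding phase transition is defined by matrix $\mathbf{g}(x) = -\frac{d}{dx}\mathbf{G}(x)$, E(C) is calculated as

$$E(C) = \int_0^\infty\!\!\int_0^\infty \Big(\frac{x}{v} + y\Big)\,\gamma\,\mathbf{g}(x)\,e^{D_0 y} D_1 \mathbf{1}\,dy\,dx = \frac{E(G) + E(H)}{v}, \tag{23}$$

where y represents the duration of the uninformed interval, x is the cluster length in a specific cycle, and $\gamma\,\mathbf{g}(x)\,e^{D_0 y} D_1 \mathbf{1}$ is their joint pdf. The numerator of (22) can be calculated as

$$\int_0^\infty\!\!\int_0^\infty \Big(\frac{x^2}{2v} + R\Big(\frac{x}{v} + y\Big)\Big)\,\gamma\,\mathbf{g}(x)\,e^{D_0 y} D_1 \mathbf{1}\,dy\,dx = \frac{E(G^2)}{2v} + R\,\frac{E(G) + E(H)}{v}, \tag{24}$$

since D(t) never falls below R during a cycle, and during the informed interval it exceeds R by a "triangle" of height x and base x/v, whose area is $x^2/2v$ (see Figure 5). Substituting (23) and (24) into (22), we get

$$E(D) = \frac{E(G^2)/(2v) + R\,\big(E(G) + E(H)\big)/v}{\big(E(G) + E(H)\big)/v} = R + \frac{E(G^2)}{2\big(E(G) + E(H)\big)},$$

which equals (21).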
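Theorem 5 combines the cluster moments with E(H). The sketch below evaluates it for the hypothetical two-phase MAP and also for Poisson traffic, where it should reproduce the equality E(D) = E(G) mentioned at the beginning of this section; the function name is hypothetical.

```python
import numpy as np
from scipy.linalg import expm

def mean_information_distance(D0, D1, R, v):
    """E(D) = R + E(G^2) / (2 (E(G) + E(H))) (Theorem 5); also returns E(G)."""
    D0 = np.asarray(D0, dtype=float)
    D1 = np.asarray(D1, dtype=float)
    n = D0.shape[0]
    I, one = np.eye(n), np.ones(n)
    M = np.linalg.inv(-D0)
    P = M @ D1
    E = expm(D0 * R / v)
    A = I - P + E @ P
    Z = np.linalg.solve(A, E)                                   # Theorem 1
    lhs = np.vstack([(Z @ P - I).T, one])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    gamma = np.linalg.lstsq(lhs, rhs, rcond=None)[0]            # Lemma 1
    EG1 = R * Z @ (I - P @ Z) + v * np.linalg.solve(A, M @ (Z - E))
    EG2 = (R**2 * Z @ (I - P @ Z)
           - 2 * R * Z @ (v * M @ P @ Z + P @ EG1)
           + 2 * v * np.linalg.solve(A, M @ (v * M @ (Z - E) + (I - E) @ P @ EG1)))
    EG = gamma @ EG1 @ one
    EG2_scalar = gamma @ EG2 @ one
    EH = v * gamma @ Z @ M @ one
    return R + EG2_scalar / (2.0 * (EG + EH)), EG

D0 = np.array([[-2.0, 0.5], [0.2, -0.5]])   # hypothetical MAP
D1 = np.array([[1.2, 0.3], [0.1, 0.2]])
print(mean_information_distance(D0, D1, R=150.0, v=36.0))
print(mean_information_distance([[-0.65]], [[0.65]], R=150.0, v=36.0))  # equal entries (Poisson)
```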

| The transient analysis of DðtÞ
After the analysis of the mean information distance E(D), we study the transient behavior of the process D(t). This is one of the most interesting measures when the effect of an event/accident is analyzed, since it is important to know how far the alert message gets within time t after the accident.
Since we have a MAP vehicle arrival process, analyzing the joint behavior of {D(t), J(t)} is easier than analyzing D(t) alone. More precisely, we introduce the random variable $\hat J(t)$ as the phase of the MAP at the moment when the cluster present at time t will leave the accident, and define the joint ccdf $F_i(t,x) = P\big(D(t) > x,\ \hat J(t) = i\big)$ and the corresponding row vector $\mathbf{F}(t,x) = [F_i(t,x)]$. The joint probability of being in an uninformed interval and in a certain phase at time t is given by row vector $\beta(t) = [\beta_i(t)]$, with elements $\beta_i(t) = P\big(D(t) = R,\ J(t) = i\big)$. Note that for the latter quantity, we use J(t) instead of $\hat J(t)$. Hence, in the uninformed intervals, β(t) follows the evolution of the background Markov chain of the MAP, and when a vehicle enters the range of the accident, we let the MAP generate all the vehicles of the cluster immediately and freeze its phase at the end of the cluster. Hence, as long as the cluster has not left the accident yet, $\mathbf{F}(t,x)$ characterizes the information distance and the MAP phase at the end of the current cluster.

Theorem 6
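The transient ccdf $\mathbf{F}(t,x)$, x > R, and the probability of an uninformed interval β(t) satisfy the partial differential equations (PDEs)

$$\frac{\partial}{\partial t}\mathbf{F}(t,x) = v\,\frac{\partial}{\partial x}\mathbf{F}(t,x) + \beta(t) D_1\,\mathbf{G}(x - R), \tag{25}$$

$$\frac{d}{dt}\beta(t) = \beta(t)\big(D_0 + D_1 Z\big) - \frac{\partial}{\partial t}\mathbf{F}(t,R). \tag{26}$$

Proof To prove the theorem, we describe the evolution of D(t) in an infinitesimally small time period (t, t + Δ). Since we have a Markovian arrival process, the events to consider must include the phase transitions, too, both those that generate an arrival and those that do not. There are two possibilities leading to D(t + Δ) > x and $\hat J(t + \Delta) = i$, for x > R, as follows:
• At time t, there was an informed cluster on the highway already, which moved towards the accident by a distance of vΔ. Since $\hat J(t)$ is the phase at the end of the current informed cluster, we have $\hat J(t + \Delta) = \hat J(t)$ in this case (as the current cluster remained the same). The probability of this event is $F_i(t, x + v\Delta)$.
• At time t, the system was in an uninformed interval in some phase j, and in (t, t + Δ) a vehicle arrived to the range of the accident, moving the MAP to phase k with probability $[D_1]_{jk}\Delta + o(\Delta)$ and forming a new cluster whose length exceeds x − R and which ends in phase i, with probability $G_{ki}(x - R)$.
The probability of having multiple events in (t, t + Δ) is o(Δ), for which $\lim_{\Delta\to 0} o(\Delta)/\Delta = 0$ holds. Based on these two possibilities, we have

$$F_i(t + \Delta, x) = F_i(t, x + v\Delta) + \sum_j \beta_j(t) \sum_k [D_1]_{jk}\Delta\, G_{ki}(x - R) + o(\Delta).$$

With simple algebraic manipulations, we get

$$\frac{F_i(t + \Delta, x) - F_i(t, x)}{\Delta} = v\,\frac{F_i(t, x + v\Delta) - F_i(t, x)}{v\Delta} + \sum_j \beta_j(t) \sum_k [D_1]_{jk} G_{ki}(x - R) + \frac{o(\Delta)}{\Delta},$$

which, letting Δ tend to 0 and switching to matrix notation, is equal to (25). To derive (26), we have to investigate how it is possible to be in an uninformed interval and phase i at time t + Δ. There are three cases, as follows:
• The system was in an uninformed interval and phase i at time t already, and no phase transition occurred in the MAP. The probability of the latter event is $1 + [D_0]_{ii}\Delta + o(\Delta)$.
• The system was in an uninformed interval and phase j ≠ i at time t, and there was an internal phase transition in the MAP from phase j to i, meaning that no vehicle arrived to the range of the accident, with probability $[D_0]_{ji}\Delta + o(\Delta)$.
• At time t, there was a cluster of informed vehicles on the highway (ending with phase i), which left the accident in (t, t + Δ). Hence, a new uninformed interval begins at time t + Δ. The probability of this event is $F_i(t,R) - F_i(t,R + v\Delta)$.
Putting all parts together leads to

$$\beta_i(t + \Delta) = \beta_i(t)\big(1 + [D_0]_{ii}\Delta\big) + \sum_{j \ne i}\beta_j(t)[D_0]_{ji}\Delta + F_i(t,R) - F_i(t,R + v\Delta) + o(\Delta),$$

which can be transformed to

$$\frac{\beta_i(t + \Delta) - \beta_i(t)}{\Delta} = \sum_j \beta_j(t)[D_0]_{ji} - v\,\frac{F_i(t,R + v\Delta) - F_i(t,R)}{v\Delta} + \frac{o(\Delta)}{\Delta}.$$

Taking the limit Δ → 0 and using matrices, this yields

$$\frac{d}{dt}\beta(t) = \beta(t) D_0 - v\,\frac{\partial}{\partial x}\mathbf{F}(t,x)\Big|_{x = R}.$$

Observing that the derivative with respect to x on the right-hand side can be expressed using the time derivative based on (25), namely $v\,\frac{\partial}{\partial x}\mathbf{F}(t,R) = \frac{\partial}{\partial t}\mathbf{F}(t,R) - \beta(t) D_1 \mathbf{G}(0)$ with $\mathbf{G}(0) = Z$, establishes the theorem.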

| Stationary analysis
The stationary behavior can be obtained by taking the limit $D = \lim_{t\to\infty} D(t)$. The following theorem provides the stationary solution of the phase-dependent ccdf of the information distance, denoted by $\mathbf{F}(x) = \lim_{t\to\infty}\mathbf{F}(t,x)$, using the phase-dependent ccdf of the cluster length, $\mathbf{G}(x)$.
Theorem 7 For x > R, the stationary phase-dependent ccdf of the information distance is given by

$$\mathbf{F}(x) = \frac{1}{E(G) + E(H)}\,\gamma \int_{x - R}^\infty \mathbf{G}(s)\,ds, \tag{27}$$

and, for x = R, the stationary probability vector of an uninformed interval is the solution of the system of linear equations

$$\beta\big(D_0 + D_1 Z\big) = \mathbf{0}, \qquad \beta\mathbf{1} = \frac{E(H)}{E(G) + E(H)}. \tag{28}$$

Proof Taking the limit t → ∞ in (25) gives

$$\mathbf{0} = v\,\frac{d}{dx}\mathbf{F}(x) + \beta D_1\,\mathbf{G}(x - R), \tag{29}$$

and taking the limit in (26) leads to

$$\mathbf{0} = \beta\big(D_0 + D_1 Z\big).$$

In the latter equation, observe that matrix $D_0 + D_1 Z$ defines a continuous-time Markov chain that characterizes the phase process of the MAP in uninformed intervals. The MAP evolves according to matrix $D_0$, and whenever a vehicle arrives at the accident (matrix $D_1$), the phases are immediately adjusted by matrix Z to reflect the end-of-cluster phases. The stationary solution of this Markov chain provides the phase distribution in the uninformed intervals. From the definitions of E(H) and E(G), the stationary (phase-independent) probability of the uninformed intervals, $\beta\mathbf{1}$, is E(H)/(E(G) + E(H)), proving (28). On the other hand, we can observe that vector $\beta D_1$ is proportional to the phase distribution right at the end of the uninformed intervals, when a new cluster gets formed. The same phase distribution is given by vector γ, too; the only difficulty is to find the scaling factor between them. Since exactly one cluster is formed at the end of each uninformed period (which lasts E(H)/v in time on average), clusters are formed at rate v/E(H) during uninformed periods, that is, $\beta D_1 = \beta\mathbf{1}\,\frac{v}{E(H)}\,\gamma = \frac{v}{E(G) + E(H)}\,\gamma$. Thus, (29) becomes

$$\frac{d}{dx}\mathbf{F}(x) = -\frac{1}{E(G) + E(H)}\,\gamma\,\mathbf{G}(x - R),$$

which, integrated from x to ∞ with $\mathbf{F}(\infty) = \mathbf{0}$, proves (27).
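Equation (28) is a small linear system. The sketch below solves it for the hypothetical two-phase MAP and checks the scaling argument of the proof numerically: the normalized vector $\beta D_1$ should coincide with γ, and $\beta D_1 \mathbf 1$ should equal the cluster formation rate $v/(E(G) + E(H))$. The helper names are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

def left_null(Q):
    """Row vector x with x Q = 0 and x 1 = 1 (least-squares solve)."""
    n = Q.shape[0]
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    return np.linalg.lstsq(np.vstack([Q.T, np.ones(n)]), rhs, rcond=None)[0]

def stationary_uninformed(D0, D1, R, v):
    """beta of (28): beta (D0 + D1 Z) = 0, beta 1 = E(H) / (E(G) + E(H))."""
    D0 = np.asarray(D0, dtype=float)
    D1 = np.asarray(D1, dtype=float)
    n = D0.shape[0]
    I, one = np.eye(n), np.ones(n)
    M = np.linalg.inv(-D0)
    P = M @ D1
    E = expm(D0 * R / v)
    A = I - P + E @ P
    Z = np.linalg.solve(A, E)
    gamma = left_null(Z @ P - I)
    EG = gamma @ (R * Z @ (I - P @ Z) + v * np.linalg.solve(A, M @ (Z - E))) @ one
    EH = v * gamma @ Z @ M @ one
    b = left_null(D0 + D1 @ Z)          # phase distribution within uninformed intervals
    return b * EH / (EG + EH), gamma, EG, EH

D0 = np.array([[-2.0, 0.5], [0.2, -0.5]])   # hypothetical MAP
D1 = np.array([[1.2, 0.3], [0.1, 0.2]])
beta, gamma, EG, EH = stationary_uninformed(D0, D1, R=150.0, v=36.0)
print(beta, (beta @ D1) / (beta @ D1).sum(), gamma)
```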

Theorem 8
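The phase-independent ccdf of the information distance, $F(x) = \mathbf{F}(x)\mathbf{1}$, can be expressed by

$$F(x) = \frac{E(G) - \int_0^{x - R} \gamma\,\mathbf{G}(y)\,\mathbf{1}\,dy}{E(G) + E(H)} \tag{30}$$

for x > R.
Proof Multiplying the derivative form of (27), $\frac{d}{dx}\mathbf{F}(x) = -\gamma\,\mathbf{G}(x - R)/(E(G) + E(H))$, by $\mathbf{1}$ from the right and substituting x = R + y gives $\frac{d}{dy}F(R + y) = -\gamma\,\mathbf{G}(y)\,\mathbf{1}/(E(G) + E(H))$. Taking the integral of both sides from 0 to y gives $F(R + y) = c - \int_0^y \gamma\,\mathbf{G}(s)\,\mathbf{1}\,ds/(E(G) + E(H))$, where, from F(R) = P(D > R), for the constant we have c = E(G)/(E(G) + E(H)) (see Figure 5), yielding (30).
Theorem 5 provides a way to calculate the mean information distance E(D). The same quantity can be derived from Theorem 8 as well, if we take the integral of (30) as follows:

$$E(D) = R + \int_R^\infty F(x)\,dx = R + \frac{1}{E(G) + E(H)} \int_0^\infty \Big(\int_y^\infty \gamma\,\mathbf{G}(s)\,\mathbf{1}\,ds\Big) dy,$$

where the double integral simplifies to

$$\int_0^\infty \int_y^\infty \gamma\,\mathbf{G}(s)\,\mathbf{1}\,ds\,dy = \int_0^\infty s\,\gamma\,\mathbf{G}(s)\,\mathbf{1}\,ds = \frac{E(G^2)}{2},$$

which is in line with (21).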

| Speed of the information propagation
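Making use of the transient distribution, another interesting study can be carried out: the analysis of the speed of the information propagation. From the ccdf $\mathbf{F}(t,x)$, the mean information distance at time t can be obtained as

$$E(D(t)) = R + \int_R^\infty \mathbf{F}(t,x)\,\mathbf{1}\,dx.$$

Hence, integrating both sides of (25) over x ∈ (R, ∞) and multiplying by $\mathbf{1}$ from the right, we get

$$\frac{d}{dt}E(D(t)) = -v\,\mathbf{F}(t,R)\,\mathbf{1} + \beta(t) D_1\,\mathbf{E}(G)\,\mathbf{1}. \tag{33}$$

Equation (33) is easy to interpret and is completely reasonable. The term $-v\,\mathbf{F}(t,R)\mathbf{1}$ means that, while an informed cluster is present (with probability $\mathbf{F}(t,R)\mathbf{1}$), it moves towards the accident at speed v; hence, the information distance decreases at this speed. On the other hand, $\beta(t) D_1$ is the rate of new cluster formation (the rate of a vehicle arrival in an uninformed period), at which the information distance jumps up by the length of the new cluster, whose phase-dependent mean is $\mathbf{E}(G)\mathbf{1}$.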
The negativity of $\frac{d}{dt}E(D(t))$ is not obvious from the formula. Still, it follows from the behavior of the system, namely that in the case of constant speed, only a single informed cluster can exist, and a new one can be formed only after the previous one has left the accident.
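Equation (33) can be evaluated wherever β(t) and $\mathbf F(t,R)$ are known. Solving the PDEs of Theorem 6 is beyond a short sketch, so the example below evaluates the right-hand side only at two points where these quantities are available in closed form. At t = 0 it assumes that the accident has just happened, so that no vehicle is informed and the MAP is in its stationary phase distribution ($\beta(0) = \alpha$, $\mathbf F(0,R) = \mathbf 0$); this initial state is an assumption. In the stationary regime, it uses β from (28) and $\mathbf F(R) = \gamma\,\mathbf E(G)/(E(G)+E(H))$ from (27), where the slope should vanish. The MAP and the helper names are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

def left_null(Q):
    """Row vector x with x Q = 0 and x 1 = 1 (least-squares solve)."""
    n = Q.shape[0]
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    return np.linalg.lstsq(np.vstack([Q.T, np.ones(n)]), rhs, rcond=None)[0]

def slope_of_mean_distance(beta_t, F_R_t, D1, EG_mat, v):
    """Right-hand side of (33): -v F(t,R) 1 + beta(t) D1 E(G) 1."""
    one = np.ones(len(beta_t))
    return -v * F_R_t @ one + beta_t @ D1 @ EG_mat @ one

def cluster_quantities(D0, D1, R, v):
    n = D0.shape[0]
    I, one = np.eye(n), np.ones(n)
    M = np.linalg.inv(-D0)
    P = M @ D1
    E = expm(D0 * R / v)
    A = I - P + E @ P
    Z = np.linalg.solve(A, E)
    gamma = left_null(Z @ P - I)
    EG_mat = R * Z @ (I - P @ Z) + v * np.linalg.solve(A, M @ (Z - E))
    EG = gamma @ EG_mat @ one
    EH = v * gamma @ Z @ M @ one
    return Z, gamma, EG_mat, EG, EH

R, v = 150.0, 36.0
D0 = np.array([[-2.0, 0.5], [0.2, -0.5]])   # hypothetical MAP
D1 = np.array([[1.2, 0.3], [0.1, 0.2]])
Z, gamma, EG_mat, EG, EH = cluster_quantities(D0, D1, R, v)

# t = 0 (assumed): nobody informed yet, phases in the stationary distribution alpha
alpha = left_null(D0 + D1)
slope_0 = slope_of_mean_distance(alpha, np.zeros(2), D1, EG_mat, v)

# stationary regime: beta from (28), F(R) from (27) at x = R
beta = left_null(D0 + D1 @ Z) * EH / (EG + EH)
F_R = gamma @ EG_mat / (EG + EH)
slope_inf = slope_of_mean_distance(beta, F_R, D1, EG_mat, v)
print(slope_0, slope_inf)
```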

| NUMERICAL EXAMPLES
In this section, we investigate the impact of the statistics of the vehicle interarrival times on the cluster size G and the information distance DðtÞ. Our implementation is based on Matlab, and we use the BuTools library 31 for obtaining the MAPs. To validate the correctness of the presented analytical methods, we have developed a custom simulation tool that gives the exact same results in all of the studied cases.
In all of the numerical examples, the vehicle speed is assumed to be $v = 36$ m/s (the typical speed limit on highways in many countries), and the radio range of the communication is $R = 150$ m.
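For orientation, a short worked conversion of these two parameters. It assumes, as in the model, that all vehicles travel at the same constant speed $v$, so the spatial gap between two consecutive vehicles is $v$ times their headway time $h$; the two vehicles are then within radio range of each other exactly when $h \le R/v$:

$$
v = 36~\mathrm{m/s} = 36 \times 3.6~\mathrm{km/h} = 129.6~\mathrm{km/h} \approx 130~\mathrm{km/h},
\qquad
h \le \frac{R}{v} = \frac{150~\mathrm{m}}{36~\mathrm{m/s}} \approx 4.17~\mathrm{s}.
$$

So, under these assumptions, a headway longer than about 4.17 s breaks the chain of informed vehicles.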
In Sections 5.1 and 5.2, we demonstrate the importance of using appropriate traffic models by showing the impact of the squared coefficient of variation (SCV) and the lag-1 correlation coefficient of the headway times on the mean cluster length and the mean information distance. Section 5.3 studies the transient distribution of the message propagation distance, and Section 5.4 presents some results with real traffic data as well.
FIGURE 6 The mean and the SCV of cluster length G

| Analysis of the cluster length
We have used the procedure of Horvath 26 to obtain matrices $D_0$ and $D_1$ from the mean arrival rate, the SCV, and the lag-1 correlation, $\rho$, of the headway times. With several combinations of these statistical parameters, the mean and the SCV of the cluster length $G$ were computed based on Corollaries 1 and 2.
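The three statistics used as inputs are tied to $D_0$ and $D_1$ through standard MAP formulas: with $M = (-D_0)^{-1}$, $P = M D_1$ and $\pi$ the stationary solution of $\pi P = \pi$, $\pi\mathbb{1} = 1$, the headway moments are $E[X^k] = k!\,\pi M^k \mathbb{1}$ and the lag-1 joint moment is $E[X_0 X_1] = \pi M P M \mathbb{1}$. The sketch below evaluates these quantities for a given MAP, i.e., it goes in the opposite direction of the construction procedure of Horvath 26 and does not reproduce that procedure. It assumes NumPy, and the function name `map_statistics` is hypothetical.

```python
import numpy as np


def map_statistics(D0, D1):
    """Mean arrival rate, SCV and lag-1 autocorrelation of the inter-arrival
    (headway) times of a MAP given by its matrices (D0, D1)."""
    D0 = np.asarray(D0, dtype=float)
    D1 = np.asarray(D1, dtype=float)
    n = D0.shape[0]
    ones = np.ones(n)
    M = np.linalg.inv(-D0)            # (-D0)^{-1}
    P = M @ D1                        # phase transition matrix between arrivals
    # Stationary phase distribution at arrival instants: pi P = pi, pi 1 = 1
    A = np.vstack([P.T - np.eye(n), ones])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    m1 = pi @ M @ ones                # E[X]
    m2 = 2.0 * (pi @ M @ M @ ones)    # E[X^2]
    var = m2 - m1**2
    joint = pi @ M @ P @ M @ ones     # E[X_k X_{k+1}]
    return 1.0 / m1, var / m1**2, (joint - m1**2) / var


# Poisson process with rate 0.65 (SCV = 1, no correlation)
print(map_statistics([[-0.65]], [[0.65]]))
# Hypothetical 2-state MMPP switching slowly between a fast and a slow regime
print(map_statistics([[-2.1, 0.1], [0.1, -0.3]], [[2.0, 0.0], [0.0, 0.2]]))
```

A Poisson process is the one-state MAP $D_0 = [-\lambda]$, $D_1 = [\lambda]$, which is why it appears here as a special case with SCV 1 and zero correlation.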
Setting the mean arrival rate to a low value leads to very short clusters that mostly consist of isolated vehicles. On the other hand, setting it too high connects almost all vehicles into a single cluster. We found that λ = 0.65 vehicles/s represents a vehicle density between these two extremes, for which the cluster length is worth examining. The SCV of the headway times was varied from 0.5 (close to deterministic) to 5 (very bursty), where SCV = 1 corresponds to the Poisson vehicle arrival process. As for the lag-1 correlation parameter ρ, we studied the effect of no correlation (ρ = 0), negative correlation (strong, ρ = −0.3, and moderate, ρ = −0.19), and positive correlation (strong, ρ = 0.85, and moderate, ρ = 0.19). Of course, our procedure can handle the full range of these parameters, that is, SCV > 0 and −1 < ρ < 1, the only limitation being that ρ > −1/SCV must be respected. 26 The results are shown in Figure 6. As visible from the figure, the statistics of the vehicle arrival process have a significant impact on the cluster length statistics. In general, the lower the SCV, the longer the clusters, since the variability of the headway times is lower. The combination of high correlation and low SCV leads to the longest clusters; in this case, the headway times of many subsequent vehicles are almost identical. When the SCV is higher, the difference between the curves is smaller, but still significant (observe that the y axis is logarithmic). According to the plot on the right side of Figure 6, the SCV of the cluster length also depends on the SCV of the headway times.
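The direction of the SCV effect can be illustrated with a deliberately simplified Monte Carlo sketch. Unlike the analysis above, it assumes renewal (uncorrelated) gamma-distributed headways instead of a MAP, and it estimates the mean number of vehicles per cluster rather than the cluster length $G$. It also assumes equal constant speeds, so consecutive vehicles are connected exactly when $v h \le R$. With independent headways the breaks between clusters are independent, so the number of vehicles in a cluster is geometric with mean $1/P(vH > R)$. The function name and its default parameters are illustrative.

```python
import math
import random


def mean_vehicles_per_cluster(scv, lam=0.65, v=36.0, R=150.0, n=100_000, seed=1):
    """Monte Carlo sketch: mean number of vehicles per informed cluster for
    renewal (uncorrelated) gamma headways with mean 1/lam and the given SCV.

    Assumes all vehicles move at the same constant speed v, so consecutive
    vehicles are connected iff their spatial gap v*h is at most R.
    """
    rng = random.Random(seed)
    shape = 1.0 / scv              # a gamma distribution with shape k has SCV 1/k
    scale = 1.0 / (lam * shape)    # keeps the mean headway shape*scale = 1/lam
    critical_headway = R / v       # headway (s) beyond which the chain breaks
    breaks = sum(
        rng.gammavariate(shape, scale) > critical_headway for _ in range(n)
    )
    # With independent headways, the cluster size is geometric with
    # success probability p = P(H > R/v), so its mean is 1/p.
    return n / breaks


for scv in (0.5, 1.0, 5.0):
    print(f"SCV={scv}: {mean_vehicles_per_cluster(scv):.1f} vehicles/cluster")
print("Poisson, exact:", round(math.exp(0.65 * 150.0 / 36.0), 1))
```

For Poisson arrivals (SCV = 1) the exact value is $e^{\lambda R/v} = e^{0.65 \cdot 150/36} \approx 15.0$ vehicles per cluster, which the estimate should approach; lowering the SCV makes long headways rarer and the clusters longer, in line with Figure 6, while correlation effects are outside the scope of this renewal sketch.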
We note that the plots in Figure 6 are not perfectly smooth in some short intervals. This is because the procedure we used for creating the MAPs 26 changes the size (number of states) of the MAPs around these points.
The numerical solution of the DDE defined by Theorem 4 makes it possible to investigate the ccdf of the cluster length as a function of the SCV as well. According to Figure 7, the cluster length is always greater than or equal to R, due to the definition of the system. The burstiness of the vehicle flow clearly has a significant impact on the length of the clusters of informed vehicles: when the SCV is small (more regular headway times), the informed clusters are longer, while a higher SCV leads to shorter clusters.

| Analysis of the information distance
The main goal of this paper is the analysis of the information distance, that is, how far a message (e.g., an alert message) can propagate in the stationary state. On the left side of Figure 8, the mean information distance is depicted for different SCV and correlation values, based on Theorem 5. As known from earlier results, 18  According to the right side of Figure 8, a higher SCV of the headway times leads to a shorter message propagation distance, and high correlation decreases the message propagation distance even further. These plots were obtained by the numerical solution of the ODE defined in Theorem 8.

| Transient analysis
The results of Section 4.2 make it possible to investigate the time evolution of the message propagation distance, $D(t)$. As a demonstration, we study the case in which an accident occurs at time $t = 0$, while the MAP is in its stationary state. We assume that no vehicles are informed about the accident at that moment; hence, $\beta(0) = \alpha$, and the initial condition for $F(0,x)$ is set accordingly. From this starting point, the ccdf $F(t,x)$ and $\beta(t)$ can be obtained by the numerical solution of the ODE given in Theorem 6. (For the solution, the step size along the distance axis was 1 m, and along the time axis it was $1/v$ s.)
The results are visualized in Figure 9 as a heat map. In line with expectations, the information distance is always at least R, and the more time has elapsed since the accident, the higher the probability that vehicles farther away from the accident receive the message about it. After t = 300 s (that is, 5 min), the stationary state is almost reached, and the distribution of $D(t)$ does not change significantly.

| Experiments with real data
We carried out experiments with real data as well. Two factors make such a study difficult. The first difficulty is that very few high-quality headway data traces are publicly available. The majority of traffic data contains counts over a certain period of time (e.g., the number of vehicles detected in an hour), which is not suitable for MAP fitting. For our procedure, we need the exact arrival times of the vehicles (or, equivalently, all headway times). The only relevant data set we found was that of Coifman and Li, 20 which is based on LIDAR measurements (a cheaper method that does not need LIDARs has been published in Zhe et al 32). While this is a fairly large data set, it is still not long enough, since the treatment of seasonal traffic measurements such as these requires a lot of data. Hence, we decided to ignore the seasonal nature of the traffic and extracted a part of the data, consisting of approximately 629,000 samples, in which the traffic was nearly stationary.
The second difficulty is that fitting MAPs requires a lot of data. The density function of the marginal distribution can be captured accurately with less data, but fitting the correlations, especially for higher lags, requires much more data. This observation is reflected by Figure 1, too, where the density is fitted well, but the lag-correlations do not match as well. As more headway time data become publicly available, it will be possible to fit MAPs that better represent real traffic, making our analytical model more accurate.
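Before any MAP fitting, the statistics discussed above (arrival rate, SCV, and lag-$k$ correlations of the headways) can be estimated directly from a list of vehicle arrival times. The following is a minimal sketch assuming NumPy, an already extracted stationary segment, and the usual biased sample autocorrelation estimator; the function name `headway_statistics` is hypothetical and not part of BuTools.

```python
import numpy as np


def headway_statistics(arrival_times, max_lag=5):
    """Empirical arrival rate, SCV and lag-k autocorrelations of the headway
    times computed from a (stationary) sequence of vehicle arrival times."""
    t = np.sort(np.asarray(arrival_times, dtype=float))
    h = np.diff(t)                   # headway times between consecutive vehicles
    mean = h.mean()
    var = h.var()                    # population variance, E[(h - mean)^2]
    scv = var / mean**2
    hc = h - mean
    n = len(h)
    # Biased sample autocorrelation estimator for lags 1..max_lag
    lags = [float(np.dot(hc[:-k], hc[k:]) / (n * var)) for k in range(1, max_lag + 1)]
    return 1.0 / mean, scv, lags


# Synthetic check: Poisson arrivals with rate 0.65 vehicles/s
rng = np.random.default_rng(0)
times = np.cumsum(rng.exponential(1 / 0.65, size=200_000))
print(headway_statistics(times, max_lag=3))
```

The statistical error of each lag-$k$ estimate shrinks only like $1/\sqrt{n}$, which is one way to see why correlation fitting needs far more samples than fitting the marginal density.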
Based on the approximately 629,000 samples extracted from the data set, 20 we executed the EM algorithm published in Horváth and Okamura 27 to create a MAP with 800 states. Even with that many states, the formulas presented in this paper give results instantly. Table 2 compares these analytical results with the simulation results driven by the original measurement data. According to the results, the mean cluster length $E(G)$ is obtained very accurately: the error (the deviation from the simulation results) is below 2%. The Poisson assumption, commonly used in the literature, 9,18 gives almost 100% error. The same holds for the mean noninformed period $E(H)$ as well. However, for the second moment of the cluster length, $E(G^2)$, our method has a significant error, due to the imperfect MAP fitting caused by the overly small data set. Still, the MAP-based results are not far from the simulation results, as opposed to the Poisson-based results, which differ by a factor of 100. The inaccuracy in $E(G^2)$ implies inaccuracy in $E(D)$, too. Our method gives 4,013 m for the mean message propagation distance instead of the 5,375 m obtained by simulation, but this is still much better than the Poisson result of 248.6 m.
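To put the last comparison in relative terms, the deviation of each analytical mean message propagation distance from the simulated value of 5,375 m (all numbers taken from the comparison above) is

$$
\frac{|4013 - 5375|}{5375} = \frac{1362}{5375} \approx 0.253,
\qquad
\frac{|248.6 - 5375|}{5375} = \frac{5126.4}{5375} \approx 0.954,
$$

that is, roughly 25% for the MAP-based model versus roughly 95% for the Poisson-based model.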
We believe that, as more data become available and more mature MAP fitting methods are developed, the practical relevance of our procedure will improve.

| CONCLUSION
The behavior of the message propagation in VANET systems is affected by the traffic model of the vehicles. In this paper, we consider the Markovian arrival process as a traffic model and derive various results related to message propagation. We have derived the moments and the ccdf of the stationary cluster length and the stationary and transient distribution of the information distance. We validated our analytical results with simulation. We conclude that, when the Poisson assumption fails to model the traffic, the Markov arrival process can be a remedy to approximate the message propagation distance in practice.