Link quality estimation for arbitrary packet sizes over wireless links using packet reception events

The packet error rate of wireless links is known to increase with the length of packets. Yet, packet length is rarely taken into account in protocols and algorithms estimating the packet error rate. Still, it is an important factor that higher layer protocols need to be aware of. In this article, we systematically measure the relationship between packet length and packet error rate over a wide range of wireless links and technologies. On the basis of our measurements, we propose a simple empirical model that can capture this behavior. Using this model, multiple methods are proposed that can estimate the packet error rate for any packet length by sampling the link. We consider methods based on hello packets with controlled packet lengths as well as data packets, where the transmitted packet lengths cannot be controlled. We investigate the accuracy of the different estimation methods in various situations and show how they are able to predict the delivery ratio for different packet sizes.


INTRODUCTION
In most wireless networks, it is important to understand and measure the quality of the links. This information is needed in order to decide the best path through a network, the right level of forward error correction (FEC) coding, what transmission power to use, which transmission rate to use, which channel to use in a multichannel network, and much more. These are areas that are crucial to the performance of wireless systems, such as wireless sensor networks (WSN), ad hoc networks, mesh networks, and cellular networks. The process of identifying a link's quality is known as link quality estimation (LQE) and has been a frequent topic of study. [1][2][3][4][5][6][7][8] For most higher layer functions, including routing, the packet delivery ratio (PDR) is the single most important piece of information about a link's quality. It is known that the PDR depends on the packet length. 9,10 Until recently, the detailed empirical work on this topic was surprisingly thin. Most work concerning packet length effects is based on independent bit errors. The bit error rate (BER) is then translated into a packet error rate (PER) as follows:
$$\mathrm{PER} = 1 - (1 - \mathrm{BER})^{N}, \tag{1}$$
where $N$ is the packet length in bits. In a recent publication, 11 we have shown exactly how the packet length affects the PDR with a systematic and extensive empirical study. The conclusion from this study is that Equation (1) does not describe this effect in sufficient detail. Furthermore, it is very common that LQEs ignore the packet length, and this affects the performance of higher layer protocols.
The extensive work on radio channel modeling often only explains the behavior either at packet level or at bit level, but rarely how they relate, [12][13][14][15][16] with Gilbert 12 being a notable exception. Furthermore, they tend to be too complex to be usable in the design of practical LQE methods. Either their models are too computationally complex to be carried out in a frequent manner or they have many model parameters that cannot easily be parameterized from typical available statistics about real links. Even the simplest models, such as the two-state Markov model from Gilbert 12 cannot be parameterized without complex regression methods or maximum likelihood estimations (MLEs). Furthermore, they might not work well when the available information about a link is limited.
At the same time, many LQE methods are based on packet reception events, such as standard hello packet protocols. Hello packets are typically small packets periodically emitted by all nodes in a network, and their reception rate is used to estimate the PDR of each link. Instead of hello packets, some protocols use regular data packets or feedback from the automatic repeat request (ARQ) function to estimate the PDR. In either case, the packet length of interest might be different from the packet lengths used to sample the link. The length of data packets is determined by the applications and cannot be controlled. For hello packets, it is beneficial to keep them as short as possible. A shorter hello packet creates less overhead and also consumes less power to transmit, which is extra important for battery-powered nodes. To base an LQE on these types of packets, we need to understand the relationship between PDR and packet size. This article is based on packet reception events. However, if LQEs are based on other methods, such as observing chip errors, FEC corrections, signal to noise ratio, or soft decoding values, it is still necessary to account for the packet length effects, meaning that the work of this article still is highly relevant.
In this article, we extend our previous work 11 in several ways. The frequency of packet losses indicates the quality of a link, but it is necessary to accumulate several reception results over a period of time before being able to reliably draw conclusions about a link's true quality. Hence, we will study what effect different numbers of reception events have on the accuracy of different packet length-aware LQE methods. We will also develop efficient ways to implement the LQE methods that work well even with limited information about a link, and we will show how to use data packets, where the packet sizes vary due to the applications being used. Furthermore, we take a more rigorous mathematical approach in this article and also make sure that the developed methods can be implemented in practice, meaning that the per-packet computation is kept at a minimum and that we do not require the storage of large amounts of data. For the sake of completeness, we will first summarize the key results from our previous work. 11 The remainder of this article is organized as follows. Section 2 introduces our measurement setup, and Section 3 lists the main results from these measurements and introduces the LQE methods from our previous work. 11 In Section 4, we look at the impact of reducing the number of packet reception events used by the methods, and in Section 5, we develop and test estimation methods based on regular data packets. Section 6 contains related work, and Section 7 concludes this article.

MEASUREMENT SETUP
To study packet reception-based LQEs, we created experiments to measure the packet loss for several different packet sizes. We focused on a single wireless link where one device sends packets to another device. Packets were sent continuously and with different sizes in random order by the transmitting device (either a laptop or a wireless sensor device). The receiving device did not transmit any packets during the experiments. Packets were considered received if and only if the receiver could correctly receive the packet using its wireless chip. Three different wireless technologies were tested: IEEE 802.11 (WiFi) at 2.4 GHz, IEEE 802.15.4 (used by ZigBee) at 2.4 GHz, and DASH7 (ISO/IEC 18000-7) 17 at 433 MHz.
For WiFi, we used broadcast packets to avoid RTS/CTS and acknowledgements. We modified the driver to select broadcast data rates of 2 Mbps (default) or rates up to 54 Mbps. The transmission power was fixed to the maximum allowed value. We used standard off-the-shelf WiFi cards. For most experiments, we used a 3Com OfficeConnect 108Mb 11g PC Card together with Linux and the Madwifi driver 0.9.4 (now replaced by the ath5k driver 18 ). For our 54 Mbps IEEE 802.11g measurements, we used a WiFi card based on the Atheros AR2425 chip as transmitter and a Netgear WNA1100, which uses an Atheros AR9271 chip, as a receiver. Both transmitter and receiver ran Linux with the open source ath5k and ath9k_htc drivers, 18 respectively.
The IEEE 802.15.4 measurements were done with t-mote sky, which is based on the CC2420 wireless chip from Texas Instruments. 19 IEEE 802.15.4 is based on direct sequence spread spectrum (DSSS) and has a data rate of 250 kbps. The standard on-board antenna of the t-mote sky was used (inverted F). For the software, we used TinyOS 2.1, 20 with its built-in clear channel assessment (CCA) function. Also here, no retransmissions, acknowledgements, or other packets were sent.
The DASH7 measurements were done with Wizzimotes, which are based on the CC430F5137 System-on-chip solution from Texas Instruments. 21 DASH7 uses frequency shift keying (FSK) with a channel size of 216 kHz, a data rate of 55.56 kbps, and PN9 encoding. We used a standard quarter-wave whip antenna. The software came from Wizzilab. Also here, no acknowledgements, error correction, or network functionalities were used. However, no CCA function was used.
The measurements were done in different static scenarios with different conditions and different distances. Most scenarios were set up in a typical office environment with several other interferers (both WiFi and others) and a few in a typical rural home environment with minimal interference. With the exception of the outdoor scenario and the measurements in the anechoic chamber, all links were non-line-of-sight (NLOS).
In each experiment, we generated 200 packets of each length using a Poisson arrival process during about 3.5 minutes. That meant that we utilized the wireless channel less than 5% of the time. Only one sender was used at a time, leading to no internal interference. For each scenario, we conducted 10 experiments in short succession with which we could calculate the 95% confidence interval. We sent packets with 15 or 16 different packet sizes, depending on the technology used. In total, 30 000 or 32 000 packets were generated over a time of about 35 minutes per scenario. Due to different transmission rates and packet sizes, transmission times differed slightly.
We selected a Poisson arrival process for the experiment packets because we want random observations of the link in our experiments. This also means that we look at the long-term behavior of the link and not at changes from one packet to the next. Furthermore, all results of this article hold for any traffic pattern that a real user of the link may have, provided that the pattern is considered over a longer period and that there is no interplay between the link quality and the traffic pattern.
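The packet generation described above can be illustrated with a minimal sketch. It is not the measurement software used in the experiments: the function name, the concrete list of packet sizes, and the seed are hypothetical, and only the 200 packets per length, the duration of about 3.5 minutes, the random size order, and the Poisson arrivals are taken from the text.

```python
import random


def poisson_schedule(sizes, per_size, duration_s, seed=0):
    """Return (send_time_s, packet_size) pairs with Poisson arrivals.

    Inter-arrival times are exponentially distributed, so the experiment
    lasts `duration_s` seconds on average rather than exactly.
    """
    rng = random.Random(seed)
    # Every packet size occurs exactly `per_size` times, in random order.
    packets = [size for size in sizes for _ in range(per_size)]
    rng.shuffle(packets)
    rate = len(packets) / duration_s  # mean packets per second
    now = 0.0
    schedule = []
    for size in packets:
        now += rng.expovariate(rate)
        schedule.append((now, size))
    return schedule


# Hypothetical set of 15 packet sizes (bytes); the sizes actually used
# in the experiments are not listed here.
example_sizes = list(range(100, 1600, 100))
schedule = poisson_schedule(example_sizes, per_size=200, duration_s=210.0)
```

With 15 sizes, one experiment contains 3000 packets, and 10 such experiments give the 30 000 packets per scenario mentioned above.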
Both the DASH7 and the IEEE 802.15.4 links have a tendency to change over time. To counter this effect, we sometimes had to measure over a longer period, extract the most stable subperiod, and discard the rest. The extracted subperiods always contained the same number of packets as the other experiments (2000 packets of each packet length).
The reported packet sizes exclude the physical layer (PHY) and medium access control (MAC) headers but include IP and higher layer headers. Both the headers used and the maximum allowed packet length differ between technologies. The maximum allowed packet length was 1500 bytes for WiFi compared with 114 bytes for IEEE 802.15.4 and 240 bytes for DASH7. The actual maximum is higher for the latter two if minimal headers can be used. Figure 1 shows the measurement results with each line denoting one link. Figure 1A shows eight different static links with different conditions and different distances using WiFi with a data rate of 2 Mbps. Six links were set up in a typical office environment with several other interferers (both WiFi and others) and two in a typical rural home environment with minimal interference (links 2 and 6). The links were all non-line-of-sight (NLOS). The remaining subfigures show similar measurements in different environments and with different configurations and technologies. The dashed lines in Figure 1B show measurements done in an anechoic chamber.

Packet loss analysis
From the results in Figure 1, we can see that the PDR indeed decreases with increasing packet length, but that there is a large variation among the different scenarios. Sometimes, as demonstrated by link 7 in Figure 1, this difference can be quite dramatic, going from 46% PDR for 50-byte packets down to only 6.5% for 1500-byte packets. We can also see that it is not possible to accurately estimate the packet loss for all packet lengths if we only generate hello packets of one packet length, since we cannot know the slope of the curve. Links 5, 6, and 7 of Figure 1 demonstrate this case as they all have roughly the same delivery ratio for 50-byte packets but radically different PDRs for long packets. This is the reason to extend Equation (1) with a length-independent delivery component $p_0$ as follows:
$$p_L = p_0 \, p^{L}, \tag{2}$$
where $p_L$ denotes the packet delivery probability for a packet of length $L$ and $p^{L}$ is the length-dependent delivery component with $p$ as its parameter. For a list of these notations and other mathematical notations used throughout this article, see Table 1. More results and details about these results can be found in our previous work. 11 To motivate this model, we may reason as follows. There are both packet length-independent factors and packet length-dependent factors that affect the chance of packet loss. For instance, we may assume that a successful packet reception involves two events: (a) the receiver detects the packet transmission and is able to tune onto the transmitter (this event is independent of the packet length) and (b) the receiver can correctly receive all bytes in the packet (this event is length dependent). These two events are of course not independent of each other, but as we will show later, this empirical model is accurate enough for our purposes.
Under the assumption that byte errors are independent, $p$ may be thought of as the payload byte delivery probability. We prefer to write the equation based on PDR and bytes, instead of PER, BER, and bits, since one byte is the smallest transmittable unit in all the wireless protocols in this study and most other wireless protocols. Note that we can translate between the two forms, since $1 - p$ is linked to the BER.
TABLE 1
$h$: Header length (bytes)
$n$: The number of hello packet reception events used in an estimation
$S$, $B$: The data packet reception events used in an estimation, divided into two sets (small and big)
$d$: The packet length threshold used to divide packets into $S$ or $B$ (bytes)
$\mathrm{PDR}_X$: The measured packet delivery ratio for the set $X$ of packet reception events
Abbreviation: PDR, packet delivery ratio.
We also note that Equation (2) is in line with the channel model by Gilbert. 12 Gilbert states that for a link following the Gilbert channel model, the relationship between PDR and packet length is given by Equation (3), whose three model parameters include $p$ and $q$ (Gilbert uses A, J, and L). By choosing $q = 0$, we obtain Equation (2). Hence, the Gilbert model is a generalization of our model. However, we will show that Equation (2) is good enough in practice for LQEs in real systems and that wireless communication links follow the model of Equation (2) quite well regardless of whether the link follows Gilbert's two-state channel model or not.

Estimating the packet delivery probability
How can we find the PDR of a particular packet length when we know the PDR of another length? In this section, we will develop some methods for this problem. We base these on the assumption that PDRs for one or two packet sizes are known and that we need the PDR of a third. We will use $\lambda$ and $\Lambda$ for the packet sizes with the known PDRs and $L$ for the packet size of the PDR that we want to know. Without loss of generality, we assume $\lambda < \Lambda$. We use $p_x$ to denote the PDR of packets with length $x$ bytes. Hence, the problem is to find $p_L$ given $p_\lambda$ and/or $p_\Lambda$. In the first method, which is frequently used by higher layer protocols, such as routing protocols, no packet length effect is assumed, ie, $p_L = p_x$ for all valid sizes $x$. We call this method the flat method, and it is enough to know only one PDR in this method (eg, $p_\Lambda$).
In a second method, we still assume only one known PDR value but use the model of Equation (1). Assume that the header length $h$ is excluded from the packet lengths and that bit errors indeed are independent. Then, we can obtain $p = (1 - \mathrm{BER})^{8}$ as the byte delivery ratio. Converting from bits to bytes and from error rates to delivery rates in Equation (1), we obtain the following equation system for the two packet sizes of interest (namely, $\Lambda$ and $L$):
$$p_\Lambda = p^{\,h+\Lambda}, \qquad p_L = p^{\,h+L},$$
which solves to:
$$p_L = p_\Lambda^{\frac{h+L}{h+\Lambda}}. \tag{4}$$
We can now estimate $p_L$ if we know $p_\Lambda$. We call this the BER header method. However, as observed in our previous work, 11 the most accurate estimations are not obtained with $h$ set to the actual header length in Equation (4), but with a significantly larger $h$. This is because $h$ then no longer represents just the header but also the rest of the length-independent reception event. However, in this method, we only have the PDR for one packet size, which means that $h$ must be a fixed parameter for all links for a given technology. The best approach is to use the $h$ that minimizes the estimation errors for the average behavior of typical links. We call this the BER optimal method.
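The flat method and the BER methods can be sketched in a few lines. This is a minimal illustration under the stated model, $p_L = p_\Lambda^{(h+L)/(h+\Lambda)}$, and the function and parameter names are assumptions made for this sketch. The BER header and BER optimal methods differ only in the value passed as `header_len`.

```python
def flat_estimate(pdr_known: float) -> float:
    """Flat method: assume the PDR does not depend on packet length."""
    return pdr_known


def ber_estimate(pdr_known: float, known_len: float,
                 target_len: float, header_len: float) -> float:
    """BER method: p_L = p_Lambda ** ((h + L) / (h + Lambda)).

    pdr_known  -- measured PDR for packets of known_len bytes
    header_len -- h; the actual header length (BER header method) or a
                  tuned, larger value (BER optimal method)
    """
    if not 0.0 <= pdr_known <= 1.0:
        raise ValueError("PDR must lie in [0, 1]")
    exponent = (header_len + target_len) / (header_len + known_len)
    return pdr_known ** exponent


# Illustrative input: a PDR of 0.81 is a made-up value; h = 1294 is the
# optimal WiFi value reported later in this article.
estimate_1500 = ber_estimate(0.81, known_len=103, target_len=1500,
                             header_len=1294)
```

With these lengths the exponent is exactly 2, so the estimate is simply the square of the measured PDR.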
If a network designer chooses to collect the PDRs for two different packet sizes, such as by using two different hello packet sizes, a better estimation can be made. In this method, which we call the two-packet lengths method (two sizes), we can calculate both $p$ and $p_0$ in Equation (2) by the following equation system:
$$p_\lambda = p_0 \, p^{\lambda}, \qquad p_\Lambda = p_0 \, p^{\Lambda},$$
which solves to:
$$p = \left(\frac{p_\Lambda}{p_\lambda}\right)^{\frac{1}{\Lambda-\lambda}}, \qquad p_0 = p_\lambda \left(\frac{p_\lambda}{p_\Lambda}\right)^{\frac{\lambda}{\Lambda-\lambda}}, \tag{5}$$
where $p_\Lambda$ is the delivery probability for hello packets of length $\Lambda$, $p_\lambda$ is the delivery probability for length $\lambda$, and $\Lambda \neq \lambda$. The solution can be inserted into Equation (2) to calculate an accurate estimation for the PDR of packet length $L$ as follows:
$$p_L = p_\lambda \left(\frac{p_\Lambda}{p_\lambda}\right)^{\frac{L-\lambda}{\Lambda-\lambda}}. \tag{6}$$
We can of course also imagine networks where more than two packet sizes are used. Perhaps hello packets contain a varied amount of data whose length cannot be controlled (eg, a neighbor list). In such situations, we may use a method based on least squares or MLE. However, we should keep in mind that these methods are already complex for per-packet calculations and that more packet sizes make the computations even more complex. Another area where we may be forced to consider many different packet sizes is when using data packets. Section 5 presents solutions for that.
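The two-packet lengths method can be sketched as follows, assuming the model $p_L = p_0\,p^{L}$ of Equation (2). The function names are illustrative, and rejecting non-positive inputs is a simplification of this sketch; degenerate measurements are discussed in the section on hello packet sampling.

```python
def fit_two_sizes(pdr_small: float, pdr_big: float,
                  small_len: float, big_len: float):
    """Solve p_small = p0 * p**small_len and p_big = p0 * p**big_len."""
    if small_len == big_len:
        raise ValueError("the two packet sizes must differ")
    if pdr_small <= 0.0 or pdr_big <= 0.0:
        # Assumption of this sketch: zero PDRs are rejected here.
        raise ValueError("both PDRs must be positive")
    p = (pdr_big / pdr_small) ** (1.0 / (big_len - small_len))
    p0 = pdr_small / p ** small_len
    return p0, p


def estimate_pdr(p0: float, p: float, length: float) -> float:
    """Evaluate the model p_L = p0 * p**L for an arbitrary length."""
    return p0 * p ** length


# Synthetic link that follows the model exactly (made-up parameters).
true_p0, true_p = 0.9, 0.9995
p0, p = fit_two_sizes(true_p0 * true_p ** 50, true_p0 * true_p ** 500,
                      small_len=50, big_len=500)
pdr_1500 = estimate_pdr(p0, p, 1500)
```

For a link that follows the model exactly, the fit recovers both parameters, and the estimate for 1500 bytes matches the model value.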
Nevertheless, we define and test two methods using more than two packet sizes, but only as theoretical benchmarks. The first one, which we refer to as the least squares (L-S) method, is based on Equation (2) and ordinary least squares applied to the logarithm of all the $p_x$ values between 50 and 500 (six PDR values). This method has an additional problem when applied in real LQEs: we also need to deal with the situation when $p_x$ values are estimated to be 0, for which the logarithm is undefined.
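Taking the logarithm of Equation (2) gives $\log p_x = \log p_0 + x \log p$, which is linear in $x$, so the least squares method reduces to a simple linear regression. The sketch below illustrates this. The function name and the six example lengths are assumptions, since the exact sizes between 50 and 500 bytes are not listed here, and raising an error for zero PDRs is a choice of this sketch.

```python
import math


def fit_log_least_squares(lengths, pdrs):
    """Fit p_x = p0 * p**x by ordinary least squares on log(p_x)."""
    if len(lengths) != len(pdrs) or len(lengths) < 2:
        raise ValueError("need at least two (length, PDR) pairs")
    if any(value <= 0.0 for value in pdrs):
        # log(0) is undefined; this is the problem noted in the text.
        raise ValueError("all PDR values must be positive")
    logs = [math.log(value) for value in pdrs]
    count = len(lengths)
    mean_x = sum(lengths) / count
    mean_y = sum(logs) / count
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    if sxx == 0.0:
        raise ValueError("packet lengths must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, logs))
    slope = sxy / sxx                     # estimate of log(p)
    intercept = mean_y - slope * mean_x   # estimate of log(p0)
    return math.exp(intercept), math.exp(slope)


# Hypothetical packet lengths and a synthetic link following the model.
example_lengths = [50, 100, 200, 300, 400, 500]
example_pdrs = [0.8 * 0.999 ** x for x in example_lengths]
ls_p0, ls_p = fit_log_least_squares(example_lengths, example_pdrs)
```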
The second method is called Gilbert, because it is based on Gilbert's model, 12 which is found in Equation (3). We parameterize this model based on all the p x values between 50 and 500. To find the optimal parameters, we had to use two standard search algorithms, namely, the L-BFGS-B optimization algorithm 22 and nonlinear least squares based on the Trust Region Reflective algorithm. Both algorithms are only suitable for offline analysis and hence are not practical for real-time per-packet parameterization. For each link, we chose the search result with the lowest absolute error for the six PDR values after removing search results with nonvalid parameters.

Comparison of methods
Here, we numerically evaluate the different packet loss estimation methods by including all the 59 links that we measured, including the ones from Figure 1. Figure 1 does not include all our links, since some scenarios and links were omitted due to space limitations and similarity with other links. In particular, the indoor 2 scenario also includes five more links from a rural residential location in addition to the ones shown in Figure 1. Since the different technologies have different maximum packet sizes, we also used different values for $\lambda$ and $\Lambda$ for the different technologies. Table 2 lists all the parameters used for the different technologies.
We tested the six different estimation methods and calculated the absolute error compared with the actual PDR of all packet sizes. Then, the mean absolute errors (MAE) were calculated over all packet sizes and links per technology. That is, for each scenario $S$ and estimation method $\phi$, we calculated the MAE as follows:
$$\mathrm{MAE}(S, \phi) = \frac{1}{|S|\,|\mathcal{L}|} \sum_{s \in S} \sum_{x \in \mathcal{L}} \left| \phi_x(\cdot) - p^{s}_{x} \right|,$$
where $S$ is the scenario expressed as a set of links, $\mathcal{L}$ is the set of packet sizes, $p^{s}_{x}$ is the measured PDR of packet length $x$ on link $s$, and $\phi_L(\cdot)$ is the estimation method trying to estimate $p_L$, such as Equation (4) or (6).
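The MAE computation can be sketched as follows. The data layout (a dictionary of per-link PDR tables), the function names, and the example numbers are assumptions of this sketch and are not measurement results from this article.

```python
def mean_absolute_error(measured, estimator):
    """Average |estimate - measured PDR| over all links and packet sizes.

    measured  -- dict: link id -> dict: packet length -> measured PDR
    estimator -- callable(link_pdrs, length) -> estimated PDR
    """
    errors = [
        abs(estimator(link_pdrs, length) - pdr)
        for link_pdrs in measured.values()
        for length, pdr in link_pdrs.items()
    ]
    return sum(errors) / len(errors)


def flat_from_500(link_pdrs, length):
    """Flat method using the 500-byte PDR, whatever the target length."""
    return link_pdrs[500]


# Made-up PDR values for two hypothetical links.
example_links = {
    "link_a": {50: 0.9, 500: 0.8, 1500: 0.6},
    "link_b": {50: 0.5, 500: 0.4, 1500: 0.2},
}
flat_mae = mean_absolute_error(example_links, flat_from_500)
```

Any of the estimation methods can be plugged in as `estimator`, which makes it easy to compare them on the same set of links.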
The results are shown in Table 3 in percent. For the BER optimal method, we calculated and used the optimal $h$ for each individual technology but over all the measured links. The optimal $h$ was found by a scalar minimization algorithm of the MAE, and we found $h = 1294$ for WiFi, $h = 38$ for 802.15.4, and $h = 20$ for DASH7 to be the optimal values. Table 3 shows that the errors roughly decline when going from left to right, ie, when going from simple methods towards more complex and better methods. For methods using only a single $p_x$ value as input, we measured using either $p_\lambda$ or $p_\Lambda$ as input data. Going from the flat method and the BER header method with the actual header length to the BER optimal method with a better $h$ parameter, we can immediately see a significant error reduction. Also, using longer hello packets improves the accuracy. When using the two-packet lengths method (2 Sizes), the errors reduce even further. However, the benefit of using the least squares method or the Gilbert model with search algorithm parameterization is small, even though all the six PDR values between $\lambda$ and $\Lambda$ are being used.
In the remainder of this article, we will focus on only WiFi. However, it must be pointed out that the same approaches apply to the other technologies as well and that the results are similar.

Per-packet handling
It is not enough to only have simple models with good enough accuracy. For an LQE to be of practical use, it also needs to be easy to compute in an efficient way. In this section, we focus on keeping the required computation and storage down so that the methods can be used for every packet reception event, even on resource-constrained devices.
The first thing to consider is how to obtain and store the packet reception events and how to estimate $p_\lambda$ and $p_\Lambda$. Obviously, we need the computation per packet reception event to be small and at the same time reduce the memory allocation for the collected data, since LQE mechanisms often will be handled in low level operating system drivers, in the firmware of network interface cards, or on simple embedded microcontrollers. Furthermore, it is important that an online algorithm is used, meaning that when a new packet reception event occurs, the computation is incremental and based on previous calculations instead of being forced to recalculate almost everything. This can keep the per-packet computation very low. All the presented methods allow for this if $p_\lambda$ and/or $p_\Lambda$ also can be obtained without recalculating everything.
The estimates of $p_\lambda$ and/or $p_\Lambda$ can easily be obtained from the most recent packet reception events by a simple moving average. Such an average can be calculated in an online fashion without a full summation for each new packet reception event. In the remainder of this article, we will assume such a solution. However, this solution requires the last $n$ packet receptions to be stored at the receiver. For accuracy reasons, $n$ needs to be large enough, which we will further discuss in Section 4. There are, however, alternatives that remove the storage requirement, such as the use of an exponentially weighted moving average (EWMA), which can be considered in many LQE implementations.
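Both variants can be sketched as small online estimators that do constant work per packet reception event. The class names, the window size, and the smoothing factor below are illustrative assumptions and are not values recommended in this article.

```python
from collections import deque


class SlidingPdr:
    """Moving average over the last `window` packet reception events."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        self._window = window
        self._events = deque()
        self._received = 0  # running count of successes in the window

    def update(self, received: bool) -> float:
        if len(self._events) == self._window:
            # Drop the oldest event instead of re-summing the window.
            self._received -= self._events.popleft()
        bit = 1 if received else 0
        self._events.append(bit)
        self._received += bit
        return self._received / len(self._events)


class EwmaPdr:
    """Exponentially weighted moving average; stores a single value."""

    def __init__(self, alpha: float):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must lie in (0, 1]")
        self._alpha = alpha
        self._value = None

    def update(self, received: bool) -> float:
        sample = 1.0 if received else 0.0
        if self._value is None:
            self._value = sample  # first event initializes the average
        else:
            self._value += self._alpha * (sample - self._value)
        return self._value


sliding = SlidingPdr(window=3)
sliding_history = [sliding.update(ok) for ok in (True, True, False, False)]
ewma = EwmaPdr(alpha=0.5)
ewma_history = [ewma.update(ok) for ok in (True, False, False)]
```

The sliding window needs memory proportional to $n$, whereas the EWMA keeps only one number at the cost of weighting recent events more heavily.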
To simplify even further for resource-constrained systems, such as embedded systems based on microcontrollers without hardware support for floating-point arithmetic, we note that it is possible to find fixed-point arithmetic implementations of all the methods. For the BER optimal method and Equation (4), we want to make the exponent a small integer by choosing $\Lambda$ in a clever way. This would simplify the formula to a few simple multiplications. For instance, if we are considering $L = 1500$, we may choose a hello packet size of $\Lambda = 103$. With the optimal WiFi value $h = 1294$, this would yield an exponent of $(h+L)/(h+\Lambda) = 2794/1397 = 2$, which would radically simplify Equation (4) and only require a single multiplication.
In a similar way, we may select $\lambda = 48$ and $\Lambda = 411$ for the two-packet lengths method and Equation (6). In this case, the exponent becomes $(L-\lambda)/(\Lambda-\lambda) = 1452/363 = 4$, and we can estimate the PDR for 1500-byte packets with just one division and three multiplications.
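The two integer-exponent cases can be sketched with fixed-point arithmetic, where a PDR is stored as an integer scaled by $2^{16}$. The scaling, the function names, and the handling of degenerate inputs (returning 0 and clamping the ratio to 1) are assumptions of this sketch. Python integers do not overflow, so an implementation on a microcontroller would additionally need intermediate products that fit its word size.

```python
SCALE_BITS = 16
ONE = 1 << SCALE_BITS  # fixed-point representation of a PDR of 1.0


def to_fixed(pdr: float) -> int:
    """Convert a PDR in [0, 1] to fixed point (for testing only)."""
    return int(round(pdr * ONE))


def ber_optimal_exponent2(pdr_fixed: int) -> int:
    """Equation (4) with exponent 2: one multiplication and one shift."""
    return (pdr_fixed * pdr_fixed) >> SCALE_BITS


def two_sizes_exponent4(small_fixed: int, big_fixed: int) -> int:
    """Equation (6) with exponent 4: p_small * (p_big / p_small)**4."""
    if small_fixed == 0 or big_fixed == 0:
        return 0  # assumption of this sketch for degenerate inputs
    # One division; the ratio is clamped to 1.0 (assumption).
    ratio = min((big_fixed << SCALE_BITS) // small_fixed, ONE)
    ratio2 = (ratio * ratio) >> SCALE_BITS        # multiplication 1
    ratio4 = (ratio2 * ratio2) >> SCALE_BITS      # multiplication 2
    return (small_fixed * ratio4) >> SCALE_BITS   # multiplication 3


# Made-up PDR values used only to exercise the functions.
ber_fixed = ber_optimal_exponent2(to_fixed(0.81))
two_fixed = two_sizes_exponent4(to_fixed(0.8), to_fixed(0.6))
```

Dividing the results by `ONE` gives values close to the floating-point results, with a small rounding error from the truncating shifts.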

HELLO PACKET SAMPLING
With a hello protocol, we sample a link's quality by injecting packets on a periodic basis. The quality of the LQE is dependent on the availability of fresh hello packet reception events. For a new link, there are only a few reception events, and this will affect the accuracy of the estimate in the beginning. As more hello packets are injected and used, the accuracy becomes better if the link is stable. For unstable links, we do not want to depend on reception events that are too old, so we discard them after a while. This means that the number of packets that can be used is reduced due to freshness requirements. To still get a good estimate of dynamic links, we need to increase the number of hello packets per second.
The methods defined in Section 3.2 will work well when the in-data (ie, the estimates of $p_\lambda$ and $p_\Lambda$) is very good, and in all the evaluations so far, we have had a total of 2000 packets sent per packet length on stable links. However, what happens to the accuracy if we only use a limited number of packets? To answer this, we will model the variables $p_\lambda$, $p_\Lambda$, and the resulting estimate $\phi_L$ stochastically using errors-in-variables models and assume that the links follow our loss model in Equation (2).
In the following, we let $n$ denote the total number of available packet reception events. If two packet sizes are used, we assume an equal number of reception events per packet size (ie, $n/2$). We define $\hat{p}_\lambda$ and $\hat{p}_\Lambda$ as the measured values of $p_\lambda$ and $p_\Lambda$, respectively. Since packet receptions can be considered as i.i.d. Bernoulli events, we have that $\hat{p}_\lambda \sim B(n, p_\lambda)/n$ and $\hat{p}_\Lambda \sim B(n, p_\Lambda)/n$, where $B(n, p)$ is the Binomial distribution with the parameters $n$ and $p$.

Bias and consistency
The first questions are whether our estimation methods are unbiased and consistent. It can be shown that all our estimators (flat, BER hdr., BER opt., and two-packet lengths) are consistent. Hence, if we increase the number of packet reception events (ie, let $n \to \infty$), then the estimator will get better and converge (in probability) to the correct value of $p_L$. However, only the flat estimation method is unbiased, meaning that for small values of $n$, the other methods have a small bias. They all tend to overestimate more often than they underestimate. This follows from the fact that all the methods use the power function $x^{a}$, which is strictly convex for $a > 1$. Due to Jensen's inequality, we can conclude that the estimators are upwardly biased except for the trivial cases of $p_\Lambda = 1$ or $p_\Lambda = 0$. Nevertheless, it is possible to calculate the amount of bias. Below is the bias formula for the BER method with any value of $h$ and packet sizes such that $L > \Lambda$:
$$\mathrm{Bias} = \mathrm{E}\!\left[\hat{p}_\Lambda^{\,a}\right] - p_\Lambda^{\,a} = \sum_{k=0}^{n} \binom{n}{k}\, p_\Lambda^{\,k} (1 - p_\Lambda)^{n-k} \left(\frac{k}{n}\right)^{a} - p_\Lambda^{\,a}, \tag{7}$$
where $a = (h + L)/(h + \Lambda)$ and $a \geq 1$. This bias is strongly dependent on $n$, $p_\Lambda$, and the exponent $a$. Unfortunately, it is difficult to express in a simple closed form, but the bias reduces when $n \to \infty$ (the estimator is consistent) or $a \to 1$ (it becomes the flat method) as well as when $p_\Lambda \to 0$ or $p_\Lambda \to 1$.
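The bias of the BER method can be evaluated numerically. The sketch below assumes that the measured PDR is a Binomial count divided by $n$, as stated above, and computes $\mathrm{E}[\hat{p}_\Lambda^{\,a}] - p_\Lambda^{\,a}$ by summing over all possible counts. The function name is illustrative.

```python
import math


def ber_bias(n: int, p: float, a: float) -> float:
    """Bias of the estimator (k/n)**a when k ~ Binomial(n, p)."""
    if n <= 0:
        raise ValueError("n must be positive")
    expected = sum(
        math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) * (k / n) ** a
        for k in range(n + 1)
    )
    return expected - p ** a


# For a = 2, the bias equals the variance of the measured PDR,
# p * (1 - p) / n, which provides a simple sanity check.
bias_a2 = ber_bias(n=20, p=0.5, a=2.0)
bias_flat = ber_bias(n=50, p=0.7, a=1.0)
```

The values confirm the qualitative statements above: the bias is positive for $a > 1$, vanishes for $a = 1$, and shrinks as $n$ grows.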
Since we can quantify the bias, it may be tempting to compensate for it in the estimator to achieve an unbiased estimator, and this can be done using numerical calculations of Equation (7) or by finding good approximations. However, in general, subtracting the bias from an estimator does not necessarily lead to smaller estimation errors, and that also holds here. Our numerical calculations indicate that the MAE of the estimator does not decrease, and can even increase, if we introduce bias compensation. Hence, we discourage the use of bias compensation for these estimates.

Two-packet lengths method
In the two-packet lengths method, we need to divide the packet reception events between the two different packet sizes. This means more variance since we have half as many events per in-data estimate ($\hat{p}_\Lambda$ and $\hat{p}_\lambda$). Furthermore, we have two variates, both with Binomial distributions as before, namely, $\hat{p}_\lambda \sim B(n/2, p_\lambda)/(n/2)$ and $\hat{p}_\Lambda \sim B(n/2, p_\Lambda)/(n/2)$. Add to this the fact that the estimator is upwardly biased. Hence, we need to quantify the errors this estimate has when $n$ is small.
First, due to insufficient sampling, we may end up with situations where the standard two-packet lengths method (Equation 6) cannot be used or will obviously estimate wrongly due to too much variance in the input. There are in particular two cases that we need to take special care of, namely, if either $\hat{p}_\lambda = 0$ or $\hat{p}_\Lambda = 0$, and if $\hat{p}_\lambda < \hat{p}_\Lambda$. To deal with this, we redefine the two-packet sizes estimation function as a piecewise function that treats these cases separately and applies Equation (6) only in the remaining, last case. In the remainder of this article, we will use this estimation function for the two-packet lengths method. It can be shown that this modified estimation function is consistent, since the in-data errors will reduce when $n \to \infty$ and only the last case will be used.
Using a Monte Carlo simulation approach, we numerically calculated the absolute error of the two-packet lengths method as follows:
$$\left| \phi_L(\hat{p}_\lambda, \hat{p}_\Lambda) - \phi_L(p_\lambda, p_\Lambda) \right|,$$
where $\phi_L$ is the estimation function, and calculated the MAE by repeatedly using all the links from all 44 WiFi scenarios. Note that we use $\phi_L(p_\lambda, p_\Lambda)$ rather than the actual $p_L$ as benchmark. This is because we here only want to study the errors introduced by insufficient sampling. Later, we will look at the complete errors.
As before, we used $\lambda = 50$, $\Lambda = 500$, and $L = 1500$, and took $p_{50}$ and $p_{500}$ from all measured WiFi links. For each link, we simulated packet reception events for the hello packet sizes used, drawing the number of received packets from a binomial distribution and dividing it by the number of simulated packets. The resulting estimates $\hat{p}_{50}$ and/or $\hat{p}_{500}$ were then fed to the different estimation methods, and the MAE was calculated. Each link was repeated enough times for the MAE results to become statistically significant (99% confidence interval within ±0.01%). Figure 2A shows the results. The max curve shows the link yielding the worst MAE; ie, the MAE of all links falls within the gray area.
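The simulation procedure can be sketched for a single link as follows. This is a minimal illustration and not the simulation code behind Figure 2: the function names, the example PDRs, the number of runs, and the seed are assumptions. In particular, the fallback rules for degenerate inputs (returning 0 when a measured PDR is 0, and the flat estimate when the short packets appear worse than the long ones) are placeholders chosen for this sketch, not the redefined estimation function of this article.

```python
import random


def sample_pdr(rng: random.Random, n: int, p: float) -> float:
    """Measured PDR from n simulated Bernoulli reception events."""
    return sum(rng.random() < p for _ in range(n)) / n


def two_sizes(pdr_small, pdr_big, small_len, big_len, target_len):
    """Equation (6) with placeholder handling of degenerate inputs."""
    if pdr_small <= 0.0 or pdr_big <= 0.0:
        return 0.0      # placeholder rule (assumption)
    if pdr_small < pdr_big:
        return pdr_big  # placeholder rule: flat estimate (assumption)
    exponent = (target_len - small_len) / (big_len - small_len)
    return pdr_small * (pdr_big / pdr_small) ** exponent


def sampling_mae(p_small, p_big, n, runs, seed=1,
                 small_len=50, big_len=500, target_len=1500):
    """MAE caused by sampling alone, for one link and n events in total."""
    rng = random.Random(seed)
    reference = two_sizes(p_small, p_big, small_len, big_len, target_len)
    total = 0.0
    for _ in range(runs):
        # The n reception events are split evenly between the two sizes.
        hat_small = sample_pdr(rng, n // 2, p_small)
        hat_big = sample_pdr(rng, n // 2, p_big)
        estimate = two_sizes(hat_small, hat_big,
                             small_len, big_len, target_len)
        total += abs(estimate - reference)
    return total / runs


# Made-up link with p_50 = 0.9 and p_500 = 0.8.
mae_small_n = sampling_mae(0.9, 0.8, n=20, runs=200)
mae_large_n = sampling_mae(0.9, 0.8, n=2000, runs=200)
```

Running the same function over many links and values of $n$ gives curves of the kind described for Figure 2A.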
From Figure 2A, we can see that the MAE quickly decreases as we use more packets. At n = 200 packets (100 packets per length), we get an MAE of 6%. At n = 400 packets, this number drops to 4.4%. These MAE values should be compared with the errors introduced by the estimation methods, which are shown in Table 3. It is clear that the sampling errors are significantly bigger than the method errors when considering hello protocols with small n. These results are important as the amount of available packet reception events might be limited for many links. Maybe it is too expensive to often generate hello packets, or a link shows a dynamic behavior meaning that information from old packets must be discarded as they are outdated.
In Figure 2B, we further study the errors due to insufficient sampling for the two-packet lengths method by showing the CDF for n = 20, 200, and 2000. Because some links have more room for variance and thereby may give rise to bigger estimation errors, we have divided all our WiFi links into three equal-sized sets (Weak, Average, and Good). This was done by sorting the links according to their average PDR over all packet sizes and dividing them equally into the three sets. The figure shows the CDF curves for each link set and each of the three n values. We can see that the links with average link quality have larger errors than both the good and the weak ones. This is not surprising given that the average links have more room for variance of p L . For the good and weak links, p L is bounded by 100% and 0%, respectively. Hence, a large portion of the problems with estimation accuracy stems from links with in-between quality. It is here we will see the biggest errors, and this effect is aggravated when n is small.

FIGURE 2
Errors for the two-packet lengths estimation method (50&500) due to insufficient hello packet sampling

Comparison of methods when n is small
We know that insufficient sampling leads to bigger errors for the two-packet lengths method, but how does this compare with the other methods? Figure 3 combines the sampling errors and the method errors. It shows the errors compared with the actual $p_{1500}$ values of each link, ie, the absolute difference between the estimate for length $L$ computed from the sampled hello packet delivery ratios and the actual $p_L$. The figure shows the MAE from all links in all WiFi scenarios for the three different estimation methods. The results of Figure 3 reflect the actual errors to be expected in practice given n hello packet samples for each of the methods.
In Figure 3, we can see how the two-packet lengths method reduces its estimation error as the number of packets is increased, as we showed previously. It is also apparent that the methods based on a single packet length improve their estimates more quickly. Since we know that the errors due to the model are smaller for the two-packet lengths method, the only explanation is that this method is more sensitive to errors in the input estimates caused by insufficient sampling. Hence, if it is necessary to keep the number of hello packets down (eg, for dynamic links), a single packet length method is preferred. When the number of packets increases, the two-packet lengths method becomes more accurate, outperforming the BER optimal method using 50-byte packets already at 40 packet receptions. The BER optimal method using 500-byte packets is outperformed from around 700 packet receptions. However, we need to remember that the results for the BER optimal method are optimistic, as we are using the best possible h for the tested links, and real systems will never be able to do that without extensive measurements.
In Figure 3, we also included the result of using extensive search algorithms based on Gilbert's model (ie, Equation 3). While it is more accurate with good input values, we can see that this method, even if it were computationally practical, would still converge slowly as n increases; ie, using Gilbert's model causes problems when dealing with limited information. The least-squares method could not be implemented as it cannot deal well with $p_x = 0$ values, which are very common when n is small.
Since the errors in Figure 3 include both model errors and errors due to limited sampling, we can use it to predict what the estimation errors will be in real LQE implementations. As the figure shows, the better model of the two-packet lengths method does not always give the better estimate. With few samples, such as when dealing with a dynamic link, single packet length methods achieve lower errors.

Impact of the used hello packet sizes
Another aspect affecting the estimation errors is the choice of hello packet sizes. In this section, we try different hello packet sizes in combination with different numbers of hello packet samples to find out how the estimation errors are affected. Figure 4 shows the results for all estimation methods. In all curves, we look at the mean absolute estimation error over all packet sizes. Figure 4A shows the single packet size methods, such as the flat method, the BER method using the actual header length, and the BER method using the optimal h value. The solid curves show the MAE when n is large, while the dashed curves show the same, but with a hello packet sampling of only n = 100 (50 small packets and 50 large packets). Since we look at the errors of estimating all packet sizes, a hello packet length close to the middle (ie, around $M/2 = 750$) does indeed give us the best results. We can also see how the accuracy improves as we use larger hello packets, up to $\Lambda = 800$. From there onwards, there is no improvement or even a degradation. The degradation can be explained by the fact that it becomes harder to correctly estimate small packet sizes using large hello packets.
In Figure 4B, we show the results for the two-packet lengths method. As we now have two different hello packet sizes, we get many more curves. However, we omitted many of them to reduce clutter in the graph. Each curve represents a different size of the smaller hello packet (ie, $\Lambda$). The x-axis is the difference in bytes between the large and the small hello packet size. For a large $\Lambda$, the difference cannot be too large, because then the larger hello packet would exceed the maximum packet size. That is why the curve for $\Lambda = 900$ bytes stops at 600 bytes on the x-axis. Here, too, we show the results both for large n and for a limited hello packet sampling of n = 100.
From Figure 4B, we can see that the best choice of $\Lambda$ is 500 bytes, and this is the best choice also among the omitted curves. However, any $\Lambda$ in the range 300 to 700 bytes is also quite good and gives similar estimation errors. Furthermore, the two packet sizes used must not be too similar. The results suggest that the difference between them should be at least 200 bytes, and even larger differences are better. However, when n is large, we can see a slight decrease in estimation accuracy as the packet length difference is increased beyond around 900 bytes. The same cannot be seen when n is small; instead, a larger difference between the two hello packet sizes is almost always better.

USING DATA PACKETS
Regular data packets can also be used to estimate the link quality instead of dedicated hello packets. In order to do so, it must be known which packets are lost, which can be achieved by sequence numbers, for example. Unfortunately, the lengths of the data packets are determined by the applications, and they can have any length. While this can give more complete estimations, computations get more complex, in particular as the number of different packet sizes increases. In this section, we develop computationally efficient estimation methods that use regular data packets as input. As previously, an online algorithm is needed that can reduce the amount of computation per packet.
If the packet length mix generated by the applications over a link is dominated by either one or two particular packet lengths, this problem becomes trivial; we can use any of the previous methods and just discard packets of other lengths. However, if this is not the case, we need new methods. It is also possible that the packet length mix changes over time, making the problem even harder. To deal with this, we may try standard statistical methods, such as nonlinear least squares or MLE. However, both these methods are computationally intensive, and it is hard to find good online algorithms for them. In addition, they are not necessarily unbiased and consistent. For instance, taking the logarithm of the PDR values and using a linear least squares method minimizes the error in log-space only and does not generate the optimal parameters of the original error model. Hence, we develop dedicated methods for estimating using data packets instead. We first estimate the model parameters $p$ and $p_0$ and then look at the final accuracy of estimating $p_{1500}$.

Estimating p
A simple way to find an estimate is to divide the packets into two bins, one for small packets and one for big packets. Let us assume the division is made at $d = M/2$, where $M$ is the maximum packet size (ie, $M = 1500$ for WiFi). We define two disjoint sets of packets $S = \{i \mid L_i \le M/2\}$ and $B = \{i \mid L_i > M/2\}$, one for each bin. $L_i$ represents the packet length of the $i$th packet reception event, and if we have $n$ events, then $|S| + |B| = n$. In the remainder of the article, we make the assumption that $|S| \approx |B|$. Furthermore, let $L$ be a stochastic variable representing the packet length distribution. Then we can define $L_S$ to be a stochastic variable representing the packet length distribution in the bin of small packets, and likewise $L_B$ for the bin of large packets. For instance, if $L$ is uniformly distributed between 1 and $M$, then $L_S$ is uniform between 1 and $M/2$ and $L_B$ between $M/2 + 1$ and $M$. For any distribution $L$, we can calculate the average packet length and the total PDR per bin as follows:

$$\bar{S} = \frac{1}{|S|} \sum_{i \in S} L_i, \quad \mathrm{PDR}_S = \frac{1}{|S|} \sum_{i \in S} y_i, \quad \bar{B} = \frac{1}{|B|} \sum_{i \in B} L_i, \quad \mathrm{PDR}_B = \frac{1}{|B|} \sum_{i \in B} y_i \tag{8}$$

where $y_i$ is 1 if packet $i$ was correctly received or 0 if it was lost or corrupted. The two points $(\bar{S}, \mathrm{PDR}_S)$ and $(\bar{B}, \mathrm{PDR}_B)$ allow us to reduce the problem to the two-packet lengths method for calculating $p$ and $p_0$, ie, using Equation (5). However, even if we assume that packet lengths are uniformly distributed between 1 and $M$, that packet losses exactly follow our error model in Equation (2), and that there are large numbers of packets in both bins, we will still not predict the correct values; ie, $\mathrm{PDR}_S$ does not predict $p_{\bar{S}}$, and the same holds for $\mathrm{PDR}_B$ and $p_{\bar{B}}$.

Theorem 1. The estimators $\mathrm{PDR}_S$ and $\mathrm{PDR}_B$ of $p_{\bar{S}}$ and $p_{\bar{B}}$, respectively, are both asymptotically upwardly biased.
Proof. We prove this theorem using Jensen's inequality. Our model is exponential, which is a convex function. Hence, for any type of packet length distribution, it follows from Jensen's inequality that the expected PDR in the large bin is at least the model PDR at the mean packet length of that bin, ie, $\mathrm{E}[\mathrm{PDR}_B] \ge p_{\bar{B}}$. The result is the same for the small bin.
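Written out, the Jensen step for the large bin would read roughly as follows. This is a sketch under the assumption that Equation (2) has the exponential form $p_L = p_0 p^L$ (the form implied by the later sections); the expectation notation and the shorthand $\bar{B} = \mathrm{E}[L_B]$ are used here for illustration only.

```latex
\mathrm{E}[\mathrm{PDR}_B]
  = \mathrm{E}\!\left[ p_0\, p^{L_B} \right]
  \;\ge\; p_0\, p^{\mathrm{E}[L_B]}
  = p_0\, p^{\bar{B}}
  = p_{\bar{B}}
```

Under this reading, equality holds only when $p = 1$ or when all packets in the bin have the same length, which fits the later remark that a strongly bimodal distribution with one mode per bin is the exception.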

Corollary 1. $\mathrm{PDR}_S$ and $\mathrm{PDR}_B$ are not consistent estimators of $p_{\bar{S}}$ and $p_{\bar{B}}$ (they do not converge in probability to these values).
Proof. See Appendix A.
To further explain this, assume that packet lengths are uniformly distributed (between 1 and $M$). This means that the packet lengths $L_i$ in each of the sets $S$ and $B$ are also uniformly distributed. Hence, we get $\bar{S}$ and $\bar{B}$ exactly in the middle of their respective bins.
The consequence of Theorem 1 and Corollary 1 is that Equation (8) overestimates the PDR, and this overestimation is systematic and independent of the sample size used for the prediction; that is, the estimator is not consistent. The only exception would be if the packet size distribution is strongly bimodal, with one mode in each bin. Hence, $\mathrm{PDR}_S \gtrsim p_{\bar{S}}$ and $\mathrm{PDR}_B \gtrsim p_{\bar{B}}$. However, it is also important to quantify this inequality, since a small difference may be ignored. For uniformly distributed packet sizes and $p < 1$, we can calculate the expected values of each of the four quantities of Equation (8) (Equations 9 and 10) and, from them, the asymptotic bias (Equations 11 and 12). Since Equations (11) and (12) are difficult to grasp, we plot them in Figure 5. We can note that the error is much bigger for the estimator $\mathrm{PDR}_S$ than for $\mathrm{PDR}_B$, because the slope of the curve is steeper for the smaller packet sizes (see, for instance, Figure 1). These errors affect the accuracy of any estimator used this way, and increasing the sample size will not reduce them. We have exemplified with a uniform packet size distribution, but this holds true for most other distributions, except a bimodal distribution.

FIGURE 5
Asymptotic bias of the estimators for uniformly distributed packet sizes and large sample sizes (Equations 11 and 12), assuming $p_0 = 1$ and a cut-off between $S$ and $B$ at $M/2$

Despite all these issues, we can still define an estimator $\hat{p}$ for $p$ by using the two points from Equation (8) and the two-packet lengths method, ie, Equation (5), as follows:

$$\hat{p} = \left( \frac{\mathrm{PDR}_B}{\mathrm{PDR}_S} \right)^{\frac{1}{\bar{B} - \bar{S}}} \tag{13}$$

when $\mathrm{PDR}_S > 0$ and $\mathrm{PDR}_S \ge \mathrm{PDR}_B$.
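The size of the per-bin bias for uniformly distributed packet lengths can be checked numerically. The sketch below assumes the exponential model $p_L = p_0 p^L$ and integer packet lengths uniform on 1 to `max_len`; the function name `bin_bias` and its parameters are hypothetical, and the computation is exact (no sampling), so it only illustrates the asymptotic effect.

```python
def bin_bias(p, p0=1.0, max_len=1500):
    """Asymptotic bias of the bin PDRs for uniform integer packet lengths.

    For each bin, compare the expected PDR (the model averaged over all
    lengths in the bin) with the model PDR at the mean length of the bin.
    Assumes the model p_L = p0 * p**L, which is an assumption of this sketch.
    """
    half = max_len // 2
    bins = {"S": range(1, half + 1), "B": range(half + 1, max_len + 1)}
    bias = {}
    for name, lengths in bins.items():
        expected_pdr = p0 * sum(p ** length for length in lengths) / len(lengths)
        mean_length = sum(lengths) / len(lengths)
        pdr_at_mean = p0 * p ** mean_length
        bias[name] = expected_pdr - pdr_at_mean
    return bias
```

For any $p < 1$, both values are positive and the bias of the small bin is the larger of the two, which matches the observation made from Figure 5.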

Corollary 2. When packet sizes are uniformly distributed and $p > 0$, $\hat{p}$ as defined in Equation (13) is a consistent estimator of $p$ (converges in probability).
Proof. Combining Equations (9) and (10) with Equation (13) shows that $\mathrm{E}[\hat{p}] \to p$ as $n \to \infty$, since all values are bounded and all the functions are continuous within those bounds. This shows that $\hat{p}$ is asymptotically unbiased. Since $\mathrm{Var}[\hat{p}] \to 0$ as $n \to \infty$, we can conclude that $\hat{p}$ is also consistent. This follows from Chebyshev's inequality. See Appendix A for details.
Hence, despite the overestimation of $\mathrm{PDR}_S$ and $\mathrm{PDR}_B$, $\hat{p}$ is a consistent estimator for $p$ and will work well as long as the sample size is not too small. This is because the errors of $\mathrm{PDR}_S$ and $\mathrm{PDR}_B$ cancel out in the estimation of $p$. Note that this estimation works quite well even when packet lengths are not uniformly distributed, as long as both the $S$ and $B$ sets contain a sufficient number of packets. Therefore, depending on the packet length mix, we may choose a cut-off threshold between the $S$ and $B$ sets other than $M/2$ to make sure both bins contain enough packets.
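The cancellation can be made explicit with a short calculation. The sketch below assumes that Equation (2) has the form $p_L = p_0 p^L$, that packet lengths are uniform on $1, \dots, M$ with $M$ even and the split at $M/2$, and that the estimator of $p$ is the two-point ratio raised to the power $1/(\bar{B} - \bar{S})$. The notation is illustrative rather than taken from the article.

```latex
\begin{aligned}
\mathrm{E}[\mathrm{PDR}_S] &= \frac{2 p_0}{M} \sum_{l=1}^{M/2} p^{l}, \\
\mathrm{E}[\mathrm{PDR}_B] &= \frac{2 p_0}{M} \sum_{l=M/2+1}^{M} p^{l}
  = p^{M/2}\, \mathrm{E}[\mathrm{PDR}_S], \\
\mathrm{E}[\bar{B}] - \mathrm{E}[\bar{S}] &= \frac{M}{2}, \\
\left( \frac{\mathrm{E}[\mathrm{PDR}_B]}{\mathrm{E}[\mathrm{PDR}_S]} \right)^{\frac{1}{M/2}} &= p .
\end{aligned}
```

Both bins overestimate by the same multiplicative factor, so the factor drops out of the ratio and only $p^{M/2}$ remains.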

Estimating p 0 using the two-bins method
While the method of dividing packets into two bins works well for estimating $p$, it does not estimate $p_0$ accurately due to the overestimation discussed in the previous section. Even so, we may still try to use it and, as we will see later, it actually works quite well in many scenarios. To find $p_0$ using this method, which we call the two-bins method, we use Equations (9) and (10) with the bottom equation of Equation (5), which gives the two options of Equation (14). To get an even better estimate when the number of packet reception events is low, we take the arithmetic average of the two options of Equation (14).
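A compact sketch of the two-bins method is given below. It assumes the model $p_L = p_0 p^L$ and reads the two options for $p_0$ as the bin PDR divided by $\hat{p}$ raised to the mean packet length of that bin, one option per bin; this reading is an interpretation, not a formula quoted from the article. The function names, the `None` return for unusable input, and the optional externally supplied `p_hat` are illustrative choices.

```python
def bin_stats(lengths, received, split=750):
    """Mean length and PDR of the small (<= split) and large (> split) bin."""
    acc = {"S": [0, 0, 0], "B": [0, 0, 0]}  # [packets, sum of lengths, received]
    for length, ok in zip(lengths, received):
        entry = acc["S"] if length <= split else acc["B"]
        entry[0] += 1
        entry[1] += length
        entry[2] += ok
    if acc["S"][0] == 0 or acc["B"][0] == 0:
        return None  # one bin is empty, so no two-point estimate is possible
    (n_s, len_s, rx_s), (n_b, len_b, rx_b) = acc["S"], acc["B"]
    return len_s / n_s, rx_s / n_s, len_b / n_b, rx_b / n_b

def two_bins(lengths, received, split=750, p_hat=None):
    """Return (p_hat, p0_hat), or None when the input cannot be used."""
    stats = bin_stats(lengths, received, split)
    if stats is None:
        return None
    mean_s, pdr_s, mean_b, pdr_b = stats
    if p_hat is None:
        # Estimate p from the two bin points; only defined for a decaying curve.
        if pdr_s <= 0 or pdr_b <= 0 or pdr_b > pdr_s:
            return None
        p_hat = (pdr_b / pdr_s) ** (1.0 / (mean_b - mean_s))
    if p_hat <= 0:
        return None
    option_s = pdr_s / p_hat ** mean_s  # p0 seen from the small bin
    option_b = pdr_b / p_hat ** mean_b  # p0 seen from the large bin
    return p_hat, 0.5 * (option_s + option_b)
```

When `p_hat` is computed from the same two bin points, the two options coincide algebraically, so the averaging only matters when `p_hat` comes from elsewhere, for example from a longer history or from the true value in a simulation.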
However, there are other ways of estimating $p_0$. One might attempt to divide into more than two bins and apply a linear least squares method to the logarithm of the bin PDRs. However, increasing the number of bins beyond two will not improve the estimation. As long as we divide packets into bins, we will still overestimate the curve exactly as before, and such estimators will remain asymptotically upwardly biased and inconsistent. Hence, it is sufficient to use just two bins.
Another way is to compensate for the bias, which we tried without success in Section 4.1. However, as we will soon see, bias compensation works better in this case. From Equations (11) and (12), we know that we can quantify this bias if the packet length distribution is uniform. The bias simply depends on $p$ and $M$. To do this, we define a compensation term (Equation 15). Using this term, we get new estimators for $p_0$. With this method, which we call the two-bins+ method, we can find $p_0$ accurately if the packet length distribution is uniform or approximately uniform. However, it must be noted that if the packet length distribution is not uniform, the accuracy may disappear. For instance, if the distribution is bimodal with one mode in each bin, there is no error, and the compensation term should be 0. Instead, using this method, we would underestimate.
To simplify the two-bins+ method, we could approximate the calculation of the compensation term in Equation (15) with a simple polynomial function that approximates it well in the valid range of $p$, ie, $0.998 < p \le 1$. All our WiFi scenarios have $p$ values within those bounds. With polynomials of degree 3, we can efficiently approximate both the compensation term and its product with $p^{M/2}$ and thereby avoid calculating the exponentiations of Equation (15) if needed.

A consistent estimator of p 0 in the general case
To find an unbiased and consistent estimation method of $p_0$, we need to leave the bin approach. The following method, which we call the transform method, is based on a coordinate system transformation. We start by assuming that we know $p$, which we may have obtained from the two-bins method in Equation (13). Then, we apply a coordinate transformation in which $x_i$ is the packet length of the $i$th packet, $y_i$ is 1 if packet $i$ arrived and 0 otherwise, and $y'_i$ is the transformed version of $y_i$. The transformation turns the PDR curve, which normally is an exponential decay function, into a horizontal line at $y' = p_0$. This happens since we simply subtract the actual PDR curve (with the correct $p$ value) from the original $y_i$ or PDR values. As a consequence, the average of the transformed values $y'_i$ should be a consistent estimator of $p_0$, assuming that we have a total of $n = |S| + |B|$ packet reception events. Letting $n \to \infty$ and rewriting Equation (16) then yields the estimator of $p_0$ in Equation (17). To estimate both $p$ and $p_0$, we need to calculate the sums of Equation (8) for both bins. Then, we obtain $\hat{p}$ from Equation (13) and $\hat{p}_0$ from Equation (17). Compared with the two-bins method, the main additional cost is the exponentiation in the denominator of Equation (17). However, the biggest challenge arises when $\hat{p}$ changes, because then we need to recalculate the denominator of Equation (17). Essentially, we have to recalculate the entire sum, which means calculating a lot of exponentiations. Hence, this is not an online algorithm. However, in Appendix B, we develop an approximation based on Jensen's inequality ($p^{L_i}$ is a convex function) and Hölder's defect, which is a quantification of the gap in Jensen's inequality. When we use this approximation, we call it the online transform method.
The next question is whether these two estimators are unbiased and consistent. The estimator $\hat{p}$ of Equation (13) is only asymptotically unbiased; it is biased for any $n < \infty$. This means that $\hat{p}_0$ of Equation (17) must be biased if it is based on the estimator $\hat{p}$ of Equation (13). However, $\hat{p}_0$ of Equation (17) is consistent when combined with a consistent estimator $\hat{p}$, such as Equation (13), as shown by the following theorem.
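As an illustration, the transform estimator can be sketched as the number of received packets divided by the sum of $\hat{p}^{L_i}$ over all packets. This form is inferred from the description of the denominator in this section and in Appendix B, together with the assumed model $p_L = p_0 p^L$; it is not a formula quoted from the article, and the function name is hypothetical.

```python
def transform_p0(lengths, received, p_hat):
    """Estimate p0 as sum(y_i) / sum(p_hat ** L_i).

    lengths  : packet length of every reception event
    received : 1 if the packet arrived, 0 otherwise (same order as lengths)
    p_hat    : an estimate of p, for example from the two-bins method
    Assumes the model p_L = p0 * p**L (an assumption of this sketch).
    """
    denominator = sum(p_hat ** length for length in lengths)
    if denominator == 0:
        return None  # no packets, or p_hat so small that every term vanishes
    return sum(received) / denominator
```

The denominator depends on `p_hat`, so every change of `p_hat` forces a pass over all stored lengths. That is the cost the online variant tries to avoid.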

Theorem 2. The estimator $\hat{p}_0$ of Equation (17) is a consistent estimator of $p_0$ under the assumption that the estimator $\hat{p}$ is consistent.
Proof. We start by showing that $\hat{p}_0$ is asymptotically unbiased, ie, that $\mathrm{E}[\hat{p}_0] \to p_0$ when $|S|, |B| \to \infty$. The last step of this derivation holds since $\hat{p} \to p$ ($\hat{p}$ is consistent). Finally, we can conclude the proof in the same way as in Corollary 1 and Appendix A, since we can trivially see that $\mathrm{Var}[\hat{p}_0] \to 0$ when $|S|, |B| \to \infty$.
Under the assumption that $\hat{p}$ is a perfect estimator (ie, $\hat{p} = p$), numerical results suggest that $\hat{p}_0$ of Equation (17) is not only asymptotically unbiased but also unbiased for all $n$. However, since this does not hold true for the estimator $\hat{p}$ of Equation (13), this is mostly of theoretical interest. Nevertheless, Equation (17) does not add its own bias when $n$ is small.
Due to the approximation made when applying Hölder's defect, the online transform method is both biased and asymptotically biased and hence not consistent. How these estimators behave for different numbers of packet reception events $n$ is demonstrated numerically in the following section. We will see that this bias is within reasonable bounds.

Numerical results
The remaining question is how good the different methods are in practice. In this section, we first used the 44 WiFi links to find representative values for $p$ and $p_0$. Then, we used those values to simulate how the different methods perform. For each link, $p_0$ and $p$ were found by an optimization algorithm that minimized the absolute errors between Equation (2) and all $p_x$ for that link.
With all 44 values of $p$ and $p_0$, we again resorted to a Monte Carlo simulation, where we simulated n = 2000 data packet reception events per link. For each packet, the simulator generated a random packet length according to a certain packet length distribution and calculated the expected PDR for that packet length based on Equation (2) and the parameters of the simulated WiFi link. Finally, the successful or failed delivery of the packet was simulated by a Bernoulli trial using the calculated PDR. The split between the two bins was $d = M/2$, ie, 750 bytes for our WiFi tests.
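The per-packet simulation loop described above could look roughly as follows. This is a minimal sketch, assuming the model $p_L = p_0 p^L$; the function names, the sampler interface, and the two example distributions (uniform, and a bimodal mix of 375 and 1125 bytes) are illustrative choices, and the link parameters are supplied by the caller rather than taken from the measurements.

```python
import random

def uniform_lengths(rng, max_len=1500):
    """Every valid packet length is equally probable."""
    return rng.randint(1, max_len)

def bimodal_lengths(rng):
    """Only two packet lengths, M/4 and 3M/4, with equal probability."""
    return rng.choice((375, 1125))

def simulate_link(p0, p, n, length_sampler, rng):
    """Simulate n data packet reception events on one link.

    Returns two lists of equal size: the packet lengths and the 0/1
    reception outcomes. Each outcome is a Bernoulli trial whose success
    probability is the model PDR p0 * p**length (an assumed model form).
    """
    lengths, received = [], []
    for _ in range(n):
        length = length_sampler(rng)
        pdr = p0 * p ** length
        lengths.append(length)
        received.append(1 if rng.random() < pdr else 0)
    return lengths, received
```

A call such as `simulate_link(0.95, 0.9995, 2000, uniform_lengths, random.Random(7))` yields one synthetic sample that can be passed to any of the data packet estimators.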
In the first comparison, the data packet-based estimation methods had access to the actual $p$ value of the link, and the task was only to estimate $p_0$. Table 4 shows the results for all the estimation methods introduced in the previous sections: 2 Bins is the two-bins method, 2 Bins+ is the two-bins method with bias compensation, Tf is the transform method, oTf is the online transform method, and MLE is a maximum likelihood estimator based on the likelihood function of Equation (18), where R is the set of packets received and L is the set of packets lost in a given sample R ∪ L. We maximized the logarithm of Equation (18) using the L-BFGS-B optimization algorithm, 22 which is a bounded version of the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm. This method is included only for comparison due to its complexity. The table shows the MAE between $p_0$ and the estimated $\hat{p}_0$ for all 44 WiFi links repeated 5000 times. The confidence interval is negligible. As shown in the table, several different packet length distributions were used. First, we tested a uniform packet size distribution where all valid packet sizes are equally probable. The bimodal distribution selected only packet lengths of either 375 bytes or 1125 bytes, ie, $M/4$ and $3M/4$. The trimodal distribution used three different packet lengths (50, 775, and 1500 bytes). In both the bimodal and the trimodal distribution, all packet lengths used were equally probable. We also tried integer versions of the exponential and normal distributions that were truncated to between 1 and 1500 bytes. Finally, we tested the simple and complete IMIX distributions, which are used in standardized performance tests and are designed to mimic the real packet size distribution on the Internet. 23 IMIX Simple is a trimodal distribution with different packet sizes and unequal probability between the packet sizes. IMIX Complete is based on IMIX Simple with a uniform packet length distribution as base.
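Under the assumption that Equation (2) is $p_L = p_0 p^{L}$, the likelihood used for the MLE comparison and its logarithm would take roughly the following form. To avoid a clash with the set $L$ of lost packets, the packet length is written $\ell_i$ here, a notation introduced only for this sketch.

```latex
\begin{aligned}
\mathcal{L}(p_0, p) &= \prod_{i \in R} p_0\, p^{\ell_i}
  \prod_{i \in L} \left( 1 - p_0\, p^{\ell_i} \right), \\
\ln \mathcal{L}(p_0, p) &= \sum_{i \in R} \left( \ln p_0 + \ell_i \ln p \right)
  + \sum_{i \in L} \ln\!\left( 1 - p_0\, p^{\ell_i} \right).
\end{aligned}
```

The second sum is what keeps the problem from reducing to a linear fit in log-space, which is one reason a bounded numerical optimizer is a natural choice here.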
From the results in Table 4, we can see that there is very little difference between the methods. Hence, in practice, we can choose a simpler method, such as the two-bins method. Regarding the two-bins+ method, we can see an improvement for the uniformly distributed packet lengths, which is expected. However, for the other extreme, the bimodal distribution, the error actually increases slightly. This shows that this method may degrade the accuracy. The conclusion is that the two-bins method is the simplest method, and only the significantly more complex transform method gives better results. In the remainder of this section, we focus only on the two-bins method and the online transform method.
In Figure 6, we show the importance of $d$, ie, where the split between the two bins is placed, which is used by both the two-bins method and the (online) transform method. These results were obtained using the same Monte Carlo simulation approach as above and repeated enough times to make the confidence intervals very small. A uniform packet length distribution was used. The figure shows the MAE when using the complete methods, ie, predicting both $p$ and $p_0$ and using those to predict the actual $p_{1500}$. Both the two-bins method and the online transform method are shown. From the figure, we can see that the $d$ value used has little impact as long as it does not approach extreme values. We can also see that the optimal split is not $d = M/2$ but a slightly smaller value. As shown in the figure, the result is similar for both methods as well as for different $n$, the number of data packet reception events used. We repeated the experiments with other packet length distributions, but the results were similar.
Finally, in Figure 7, we compare the complete two-bins method with the online transform method when estimating $p_{1500}$ using different numbers of data packets (ie, different $n$). We have also included the flat method based on data packets and an MLE method based on Equation (18) as comparison. We also attempted an MLE variant of Gilbert's model, but that was too difficult to implement in a robust manner. From the figure, we can see that the two-bins method outperforms the flat method, that it performs well for both distributions, and that the errors are almost identical. Similar results are obtained with other packet length distributions. When comparing Figure 7 with Figure 3, we find that the two-bins method follows the two-packet lengths method, with the two-bins method having a lower error. This is due to the longer packets being used in the two-bins estimation. We can also see that both the two-bins method and the online transform method are not far behind the MLE method. This means that these methods should be expected to perform close to what is possible, while still offering a feasible solution in practice for most LQE use cases.

RELATED WORK
Previous research has shown that packet length has an impact on the PDR, with longer packets having a larger chance of getting corrupted. 1,9,12 Unfortunately, too many LQEs ignore this and simply assume the same delivery ratio for all packet lengths. Some authors, such as Fonseca et al, 2 have acknowledged this and proposed to use regular packets or protocol packets, such as acknowledgments and signaling packets.
The current empirical work on packet length effects is surprisingly thin. Most works concerning these effects are based solely on theoretical approaches. [24][25][26][27][28][29][30] Quite often, a Rayleigh or Rician small-scale fading model is used to find the BER. The BER is usually translated into a PER using Equation (1). This means that independent bit errors are assumed, and this is used without any empirical validation.
In other works, [12][13][14][15][16] it is common to see a model that describes either the BER or the PER but very rarely their relationship. However, Gilbert 12 does this for his two-state Markov chain model. Unfortunately, even Gilbert's simple model contains three parameters that need to be found by an LQE, making it an unsuitable choice for an LQE. This also means that more complex models based on Markov chains 13-16 are also unsuitable for LQE even though they may be better at describing the behavior of the links.
Most of the work on packet size effects on PDR is within packet length optimization or packet size adaptation, 24,29,31-33 ie, how to find the optimal packet length for a link. Most base their work on theoretical models or simulations and develop mechanisms to improve throughput, reduce energy consumption, and/or minimize delay. Only very few of them, such as Lettieri and Srivastava, 31 did measurements, but then not of the actual impact of packet length on PDR.
Also outside the area of packet length optimization, very few studies based on actual measurements have been done. One exception is Nguyen et al, 9 who showed that the packet loss increases exponentially with the packet length and that the packet loss doubles for each additional 300 bytes on a wireless link without FEC coding. Another one is Chakeres and Belding-Royer, 10 where the authors study the use of hello packets in the Ad hoc On-Demand Distance Vector (AODV) routing protocol and try different parameter combinations. One of their conclusions is that when using hello packets of the same length as the data packets, end-to-end performance in the network increases from 60.7% to 80.8% delivery ratio. Dong et al 32 included one graph of one link based on IEEE 802.15.4 where packet loss for various packet sizes is shown, and Srinivasan 34 looks at the PDR for both data packets and acknowledgments and finds a significant difference.
Perhaps the most complete measurement of the packet length effect on PDR is the work by De Couto et al. 1 There, the authors extracted measurements from their own testbed and plotted the packet length vs PDR for some of the links. They used the results to explain why they experienced a lower accuracy for their LQE. However, no further analysis or modeling of the packet length effect was done.

CONCLUSION
The PDR of wireless links depends, besides channel conditions, on the packet length. Derived from measurements, we introduced a simple loss model that captures this behavior. Using this model, we proposed several ways of estimating the packet delivery probability of longer data packets based on sampling the link using shorter packets.
We showed the estimation accuracy of these methods when estimating delivery ratios of different packet lengths. By sampling using two different packet lengths and then applying a few low-complexity calculations, the estimation of the packet delivery probability for arbitrary packet lengths was significantly improved. We also presented how to do this efficiently and what impact a small sample size has on the estimators.
Finally, the use of data packet reception events as input for the estimators was discussed. Several efficient estimation methods were developed that can use reception events for data packets whose lengths are dictated by the applications. We showed how an online algorithm for this could be constructed. Lastly, a comparison of the accuracy of all the proposed methods was done, showing improved accuracy for our proposed link quality estimators. Hence, we believe that all future link quality estimators must consider the effects of packet length.

33. Dong W, Liu Y, Wang C, Liu X, Chen C, Bu J. Link quality aware code dissemination in wireless sensor networks.

APPENDIX A
We choose an $n_0$ that is large enough for all $|S| > n_0$. If $|\mathrm{PDR}_S - a| \ge \varepsilon$, then the reverse triangle inequality gives an implication that, combined with Chebyshev's inequality, yields the required limit as $|S| \to \infty$. Equations (A2) and (A3) together mean that Equation (A1) holds. Hence, $\mathrm{PDR}_S \xrightarrow{P} a$, and $\mathrm{PDR}_S$ cannot be consistent for $p_{\bar{S}}$ at the same time.

APPENDIX B: HÖLDER'S DEFECT
In this appendix, we use Hölder's defect (see Steele 35, ch. 6) to find an online calculation method for the denominator of Equation (17). We assume WiFi with $0 \le L_i \le 1500$ and $0.998 \lesssim \hat{p} \le 1$, which holds for all the measured WiFi links in this article. The same approach can be used for other technologies as well. We know from Jensen's inequality that:

$$\frac{1}{n} \sum_{i=1}^{n} \hat{p}^{L_i} \ge \hat{p}^{\bar{L}} \tag{B4}$$

where $\bar{L} = \frac{1}{n} \sum_{i=1}^{n} L_i$. Hölder's defect quantifies the error in Jensen's inequality (Steele 35, ch. 6) and should be able to find tighter bounds than Equation (B4).
The convex function is $\varphi(x) = \hat{p}^x$, and its second derivative is $\varphi''(x) = (\ln \hat{p})^2 \hat{p}^x$. Hölder's defect applied to Equation (B4) expresses the gap in Jensen's inequality through a value $\mu$ that lies between the bounds $m$ and $M$ of $\varphi''(x)$. To find a good estimate, we could set $\mu$ to the average of $m$ and $M$, but this will be too low (again according to Jensen's inequality). Instead, we evaluate $\varphi''$ at the average packet length $\bar{L}$. Then, we get the approximation of Equation (B5) for the denominator of Equation (17).
This approximation looks more complex but is easier to compute in incremental steps. Note that there are good online algorithms for the calculation of the sample mean. 36 The variance can also be calculated in an online fashion by, for instance, keeping running sums for the following identity: $\mathrm{Var}[X] = \mathrm{E}[X^2] - (\mathrm{E}[X])^2$. This makes it possible to find a completely online algorithm for the approximation in Equation (B5).
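The incremental computation can be sketched as follows. The approximation used here, $\sum_i \hat{p}^{L_i} \approx n\, \hat{p}^{\bar{L}} \left(1 + \tfrac{1}{2} (\ln \hat{p})^2\, \mathrm{Var}[L]\right)$, is one reading of how the defect term with the second derivative evaluated at $\bar{L}$ would enter; it is an assumption of this sketch and may differ in detail from Equation (B5). The class and method names are hypothetical.

```python
import math

class OnlineDenominator:
    """Running sums that allow the denominator sum(p_hat ** L_i) to be
    approximated for any p_hat without revisiting the stored packets."""

    def __init__(self):
        self.n = 0          # number of packet reception events
        self.sum_len = 0.0  # running sum of L_i
        self.sum_sq = 0.0   # running sum of L_i ** 2

    def add(self, length):
        """Account for one packet of the given length (constant time)."""
        self.n += 1
        self.sum_len += length
        self.sum_sq += length * length

    def approx(self, p_hat):
        """Jensen lower bound plus a second-order correction term.
        Var[L] is obtained from the identity E[L^2] - (E[L])^2."""
        mean = self.sum_len / self.n
        variance = max(0.0, self.sum_sq / self.n - mean * mean)
        jensen = p_hat ** mean
        return self.n * jensen * (1.0 + 0.5 * math.log(p_hat) ** 2 * variance)
```

Because only three running sums are kept, a change of `p_hat` costs a single exponentiation instead of one per stored packet. The price is the approximation error, which grows as `p_hat` decreases and as the packet lengths spread out.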
That Equation (B5) is a good estimate can be shown numerically with a Monte Carlo simulation. It is clear that the estimate will be worse for some packet length distributions than others and for different values of $\hat{p}$. To put this to a test, we calculated the relative error for some different values of $\hat{p}$ and common packet length distributions that can be imagined or have been proposed. We used the same distributions as in Table 4. Table B1 shows the results from the simulation. It is clear that the maximum relative error is found for small $\hat{p}$. The maximum relative error for the tested distributions is 7.6%, which should be an approximate upper bound for the approximation in Equation (B5). Given all the other errors that arise in the estimation of PDRs, this approximation is still reasonable.