†Some preliminary results of this paper were published in the 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, Munich, Germany, 15–17 February 2012. This paper makes the following improvements: first, we design a complete system and describe its implementation in detail; second, in addition to the storage overhead and traceback process overhead, we analyze performance in terms of traceback accuracy and audit trail table access time; third, to back up the analytic results, we carry out extensive simulations on synthetic and real network topologies.
During the last decade, tremendous effort has been devoted to enhancing the security of communication networks. A tip-of-the-iceberg list of victims, including E*Trade and Amazon (February 2000), China Telecom's main DNS servers (May 2009) and WikiLeaks (December 2010), demonstrates the destructive power of denial-of-service (DoS) attacks and the defenselessness of the Internet against them. Broadly speaking, DoS attacks can be classified into flooding-based attacks (e.g., TCP SYN flooding) and software-exploit-based attacks (e.g., Teardrop). The most significant change in 2011 was the rise of attacks that blend both kinds of DoS attacks. Such attacks are much more complicated than before, which makes developing effective countermeasures all the more urgent.
Defending against DoS attacks requires not only taking preventive measures but also identifying the true origin of the attacker and blocking further occurrences of such incidents. This boils down to the problem of single-packet IP traceback: tracing the path of an individual IP packet back to its origin. The problem is challenging because attackers routinely spoof the source addresses of their packets. A handful of approaches for single-packet IP traceback have been proposed [3-7], among which packet logging is a common technique. Its basic idea is to record forwarded packets as audit trails, which has the following disadvantages. First, it requires all or some of the intermediate routers on network paths to log packets, so the storage overhead grows with the number of forwarded packets. Second, the traceback process queries not only the routers on the attack path but also their neighboring routers. Third, it uses Bloom filters to compress the logged packets, which introduces a large number of false positives. Therefore, a more precise and efficient single-packet IP traceback approach is desirable.
In this paper, we design a novel path-based approach for single-packet IP traceback (PSIT) that is independent of packet logging. We observe that although attackers can insert arbitrary source addresses into IP packets, they cannot control routing paths. Motivated by this observation, we exploit routing paths to establish logical paths, called traceback paths (TPs), as audit trails.
The key idea of our approach is to set up a TP from the victim to the attacker as IP packets traverse the routing path. Inspired by the way label switched paths are built in Multiprotocol Label Switching (MPLS), we propose a core concept called the traceback equivalence class (TEC), which describes a set of IP packets that traverse the same routing path. All packets of a TEC are processed in the same way at routers. For each arriving TEC, router Ri assigns a distinct label to the TEC and writes the label and its own identification information into the TEC's packets. After the TEC is forwarded to the downstream router Ri+1, Ri+1 records the label and Ri's identification information, which establishes a TP fragment for the TEC from Ri+1 to Ri. During path traceback, the TP is constructed by linking the labels of each router in series, which is equivalent to accumulating the path fragments. Compared with packet logging, this idea offers three benefits: (i) the storage overhead depends only on the number of routing paths, regardless of how many packets traverse them; (ii) the number of routers queried during path traceback depends only on the hop count of the attack path; and (iii) false positives are reduced to almost zero.
We carry out extensive analysis and simulation to compare our approach numerically with two state-of-the-art approaches. The results show that our approach significantly outperforms them in terms of both router overhead and traceback accuracy.
Our major contributions can be summarized as follows:
We present a novel path-based approach for single-packet IP traceback. To the best of our knowledge, this is the first work that sets up TPs based on routing paths to solve the single-packet traceback problem. Moreover, we extend our approach to handle IP fragments and transformed packets.
We use mathematical analysis to conduct a thorough performance comparison between our approach and two state-of-the-art approaches in terms of path establishment overhead, path traceback overhead and traceback accuracy.
We perform extensive simulations on synthetic and real network topologies to supplement our analytic results. The evaluation results demonstrate that our approach is more precise and efficient.
The rest of this paper is structured as follows. Section 2 discusses the single-packet traceback problem and the motivation for our work. Section 3 presents PSIT, our path-based single-packet IP traceback approach. Section 4 evaluates PSIT using mathematical analysis, and Section 5 supplements the analytic results with simulations. Section 6 discusses practical deployment issues. Section 7 surveys related research on single-packet IP traceback, and Section 8 concludes the paper.
2 BACKGROUND AND MOTIVATION
2.1 Single-packet traceback problem
In DoS attacks, an attacker can place an arbitrary IP address into the source address field of an IP packet, which is termed IP spoofing. In this paper, such a forged IP packet is termed an attack packet. The source and destination of an attack packet are an attacker (A) and a victim (V), respectively. Let R1, ..., Rn be the ordered list of intermediate routers traversed by the attack packet from A to V. We define this ordered list of routers as the attack path A. Obviously, the attack path A is also the routing path from A to V. The main objective of the single-packet traceback problem is to trace an individual attack packet back to its source, i.e., to identify the attack path A. In our view, this is equivalent to establishing a logical path from V to A, as shown in Fig. 1. This logical path is termed a Traceback Path (TP). The case of multiple attackers corresponds to an attack graph generated by overlapping all the TPs from the victim to the attackers.
The imminent threat posed by DoS attacks calls for precise and efficient single-packet IP traceback schemes with the following features:
allow for partial deployment on the Internet;
correctly trace back attacks consisting of packets that undergo transformation;
incur low processing and storage overhead at routers; and
achieve high traceback accuracy.
Previous schemes fail to satisfy all of these features collectively. For example, MORE and RHIT require all routers to be traceback-enabled, which is an unrealistic assumption. If a traceback approach provides no benefit under partial deployment, an ISP has no incentive to deploy it. In addition, RHIT cannot trace back packets that undergo transformation. Dropping this assumption, a handful of single-packet traceback approaches based on packet logging have been proposed [3-7]. However, such approaches require some or all intermediate routers to record packet digests as audit trails, which requires reserving a significant amount of resources at routers and introduces a large number of false positives (see Section 7 for details). Therefore, a more precise and efficient approach for single-packet IP traceback is needed.
We observe that the source addresses in attack packets may be forged, but the destination addresses must be genuine. Under the packet forwarding mechanism, routers parse only the destination address of each arriving packet. Therefore, the routing paths taken by DoS attack packets are genuine. In addition, routing paths are fairly stable over a given period. Unlike earlier work, we believe that using routing paths rather than packet logging to set up TPs improves traceback capability. Motivated by this observation, we develop a concrete path-based approach for single-packet IP traceback that helps reduce the storage overhead. Moreover, we focus on reducing the number of routers queried during path traceback, and we address the issue of false positives.
3 PATH-BASED SINGLE-PACKET IP TRACEBACK
In this paper, we propose a novel path-based approach for single-packet IP traceback, termed PSIT. PSIT shares its system model with the Source Path Isolation Engine (SPIE). In this model, deployed routers establish audit trails as IP packets traverse the network, and the traceback manager accumulates those audit trails to trace back attack paths. However, PSIT's “establishment” and “traceback” operations differ from those of SPIE and HIT.
3.1 Overview of the proposed approach
The key idea of our approach is to set up a TP from the victim to the attacker. Borrowing the idea of building label switched paths in Multiprotocol Label Switching, we propose a core concept termed TEC, which describes a set of packets that traverse the same routing path. All packets of a TEC are processed in the same way at routers. For each arriving TEC, the router assigns a label to the TEC and binds the two, where the label is a short, fixed-length identifier. The binding procedure is depicted in Figure 2. Assume that label L identifies the TEC F that goes from router Ru to Rd. When Ru sends a packet p belonging to F to Rd, Ru marks its identification number and L into header fields of p (hereinafter the “marking field”). After receiving p, Rd records Ru's identification number and L. Thus, the TP fragment for F from Rd to Ru is established. L is meaningful only between Ru and Rd: for Ru, L is its labelout; for Rd, L is its labelin. During path traceback, the TP is constructed by linking the labelin and labelout of each router in series.
We illustrate the main idea with the scenario shown in Figure 3. Suppose that both the identification information of a single router (hereinafter the “router ID”) and the label can be marked into the marking field of a forwarded packet. Let TEC x traverse the attack path (A, R1, R2, R3, R4, V).
When x arrives at R1, R1 assigns labelout 3. This labelout is termed the standard out label (SOL), and it is assigned only by the ingress router. R1 marks the SOL and its ID into x and forwards x to the downstream router. When x arrives at Ri (i > 1), Ri determines the proper TEC from its destination IP address and marking value, and then assigns a labelout to this TEC. A traceback path item (TPI) consists of the Ri−1 ID carried in x, the labelin and the labelout. Ri inserts the TPI into the traceback path block (TPB) in its traceback path table (TPT); the TPB is associated with the corresponding destination IP address. Finally, Ri marks its ID and labelout into x. When x arrives at V, TP A is completely set up. Once the TP has been established, each router only marks its router ID and the corresponding labelout into the TEC's packets. In addition, we must consider routing paths that converge at a router. Three routing paths cross R2 in Figure 3. For TECs with different destination addresses, R2 inserts their TPIs into different TPBs. For TECs with the same destination address, R2 assigns them distinct labelouts and inserts their TPIs into the same TPB.
The traceback manager identifies router R4 from its ID marked in the attack packets. R4 uses the label carried in the attack packets to find the corresponding TPI and returns it to the traceback manager. The manager then determines R3 from the router ID in this TPI, and R3 uses the labelin in the returned TPI to find another TPI. This process continues until the ingress router R1 is identified.
In the following three subsections, we describe the design of our approach in detail:
Path establishment. How can an individual TP be established rapidly through label distribution, and how should forwarded packets be processed once the TPs have been established?
Path traceback. How should the TP be constructed by linking the labels?
Extension. How should IP fragments and packets undergoing transformation be traced?
3.2 Path establishment
To establish a TP, each deployed router performs both logging and marking operations. The logging operation records the path fragment, called a TPI. The marking operation writes both the router ID and the labelout into the marking field.
When a packet p arrives at the ingress router R1, R1 performs only the marking operation. Normally, the marking field of p is unset and meaningless at the ingress router, so it is easy to identify whether R1 is an ingress router. However, a malicious host may write forged values into the marking field of p to mislead the current router. In PSIT, each router therefore maintains a neighbor list containing its neighboring router IDs. R1 matches the router ID in p against each entry of its neighbor list; if no entry matches, R1 acts as an ingress router.
When a packet p arrives at an intermediate router Ri that is not the ingress router, Ri checks whether the TP corresponding to the routing path traversed by p has been set up, and then decides on its operation. The check has two steps. First, Ri determines the proper TPB for p from its destination IP address. Second, Ri identifies the matching TPI by comparing the marking information carried in p with each TPI in this TPB. If a match is found, Ri marks its ID and the labelout of the matched TPI into p. Otherwise, Ri assigns a new labelout, forms a new TPI, inserts the TPI into the TPB, and marks its ID and the labelout into p. The router operation during path establishment is summarized in Figure 4.
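The per-packet procedure of Sections 3.2 and 3.2's ingress check can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: router IDs are small integers, each TPB is a Python list keyed by destination, the SOL is a hypothetical reserved value (262143) that ordinary label allocation never reaches, and label wrap-around is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

SOL = 262143  # hypothetical reserved value for the standard out label

@dataclass
class Packet:
    dst: str
    mark: Optional[Tuple[int, int]] = None  # (router ID, label) in the marking field

@dataclass
class TPI:
    upstream_id: int
    label_in: int
    label_out: int

@dataclass
class Router:
    rid: int
    neighbors: Set[int]
    tpt: Dict[str, List[TPI]] = field(default_factory=dict)  # destination -> TPB
    max_label: Dict[str, int] = field(default_factory=dict)  # destination -> labels assigned

    def is_ingress(self, p: Packet) -> bool:
        # Unset marking, or a router ID that is not a known neighbor (possibly forged)
        return p.mark is None or p.mark[0] not in self.neighbors

    def forward(self, p: Packet) -> None:
        if self.is_ingress(p):
            p.mark = (self.rid, SOL)          # ingress router: marking only, no logging
            return
        up_id, label_in = p.mark
        tpb = self.tpt.setdefault(p.dst, [])  # step 1: TPB chosen by destination address
        for tpi in tpb:                       # step 2: match marking against existing TPIs
            if (tpi.upstream_id, tpi.label_in) == (up_id, label_in):
                p.mark = (self.rid, tpi.label_out)  # TP already set up: mark only
                return
        label = self.max_label.get(p.dst, 0) + 1   # new TEC: distribute a label
        if label >= SOL:
            raise RuntimeError("TPB saturated; wrap-around is not modeled in this sketch")
        self.max_label[p.dst] = label
        tpb.append(TPI(up_id, label_in, label))    # logging operation
        p.mark = (self.rid, label)                 # marking operation

# Figure 3 style scenario: TEC x enters at R1, a second TEC enters at hypothetical R5;
# both travel to V through R2, R3 and R4.
r1, r5 = Router(1, {2}), Router(5, {2})
r2, r3, r4 = Router(2, {1, 3, 5}), Router(3, {2, 4}), Router(4, {3})
for _ in range(3):                  # three packets of each TEC
    for ingress in (r1, r5):
        p = Packet("V")
        for r in (ingress, r2, r3, r4):
            r.forward(p)
```

Each router logs one TPI per TEC no matter how many packets follow; later packets only rewrite the marking field, which is why storage tracks the number of routing paths rather than the number of packets.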
So far, we have introduced the framework of path establishment. In what follows, we discuss three detailed techniques for this framework.
3.2.1 Encoding scheme
PSIT requires that both the router ID and the label fit into the marking field. Following Muthuprasanna et al., we assign a 12-bit ID number to each deployed router, which suffices to distinguish all neighbors of a router. In addition, we allocate 18 bits to the label, whose value ranges from 0 to 262143.
The marking information thus consists of a 12-bit router ID and an 18-bit label. To fit this information into the marking field, we use the 16-bit identification field, the 1-bit reserved flag and the 13-bit fragment offset field of the IP header, 30 bits in total. The leftmost 12 bits store the router ID, and the remaining 18 bits store the label. Figure 5 depicts the encoding scheme. The backward compatibility issue of reusing these three fields has been discussed in the literature.
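A minimal sketch of this encoding, assuming the 30 marking bits are laid out contiguously in header order (Identification first, then the reserved flag bit, then Fragment Offset) and that the DF and MF flag bits are left untouched; the exact bit order in Figure 5 may differ. The hypothetical functions `encode_mark` and `decode_mark` operate on bytes 4 to 7 of an IPv4 header.

```python
import struct
from typing import Tuple

ID_BITS, LABEL_BITS = 12, 18  # 12-bit router ID + 18-bit label = 30 marking bits

def encode_mark(router_id: int, label: int, df: int = 0, mf: int = 0) -> bytes:
    """Pack (router ID, label) into IPv4 header bytes 4-7, preserving the DF and MF flags."""
    if not (0 <= router_id < 1 << ID_BITS and 0 <= label < 1 << LABEL_BITS):
        raise ValueError("router ID or label out of range")
    value = (router_id << LABEL_BITS) | label  # 30-bit marking value, router ID leftmost
    ident = value >> 14                         # top 16 bits  -> Identification field
    reserved = (value >> 13) & 1                # next bit     -> reserved flag bit
    offset = value & 0x1FFF                     # low 13 bits  -> Fragment Offset field
    flags_offset = (reserved << 15) | (df << 14) | (mf << 13) | offset
    return struct.pack("!HH", ident, flags_offset)

def decode_mark(word: bytes) -> Tuple[int, int]:
    """Recover (router ID, label) from IPv4 header bytes 4-7."""
    ident, flags_offset = struct.unpack("!HH", word)
    value = (ident << 14) | ((flags_offset >> 15) << 13) | (flags_offset & 0x1FFF)
    return value >> LABEL_BITS, value & ((1 << LABEL_BITS) - 1)

word = encode_mark(4, 1, df=1)  # e.g. the marking (R4 ID, label 1) with DF set
```

Keeping DF and MF out of the marking value is the design choice that lets `decode_mark` ignore bits 13 and 14 of the flags/offset word.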
3.2.2 Label distribution
At a deployed router, a newly distributed label must uniquely identify an arriving TEC, which requires that it differ from all existing labelouts in the corresponding TPB. PSIT uses the modulus operator to assign labels. Letting maxLabel denote the number of labels that have been assigned in a TPB, a newly distributed label for the TPB is

$$\text{label} = (\text{maxLabel} + 1) \bmod 262144 \qquad (1)$$

where 262144 is the maximum number of labels that can be distributed for this TPB. Each TPB is implemented as a circular queue. If the number of TPIs in a TPB exceeds 262144, the TPB is regarded as saturated; in that case, the router resets maxLabel to 0 and starts a new round of label distribution.
We illustrate the label distribution mechanism with the example in Figure 3. TECs x, y and z converge at router R2. Initially, maxLabel in each TPB is 0. When x arrives at R2, R2 finds the TPB associated with V (referred to as TPB V) from its destination address, assigns label 1 to x and updates TPB V's maxLabel to 1. Because y has the same destination address as x, R2 assigns label 2 to y and updates TPB V's maxLabel to 2. For z, R2 finds TPB H2, assigns label 1 to z and updates TPB H2's maxLabel to 1. Thus, R2 can distinguish the three TECs by their assigned labels.
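The example above can be reproduced with a small per-destination allocator. This sketch assumes the label is computed as (maxLabel + 1) mod 262144, which matches the labels 1, 2 and 1 in the example, and applies that rule literally on wrap-around; the upstream router names `Ry` and `Rz` are hypothetical, and SOL 3 is the value used in the Figure 3 walkthrough.

```python
from typing import Dict, Tuple

LABEL_SPACE = 262144  # 2**18 labels per TPB

class TPB:
    """Traceback path block for one destination, treated as a circular label space."""

    def __init__(self) -> None:
        self.max_label = 0
        self.tpis: Dict[int, Tuple[str, int]] = {}  # labelout -> (upstream router, labelin)

    def assign(self, upstream: str, label_in: int) -> int:
        label = (self.max_label + 1) % LABEL_SPACE  # modulus-based label distribution
        self.max_label = label
        # After wrap-around this overwrites an older TPI ("recycling of labels")
        self.tpis[label] = (upstream, label_in)
        return label

# Router R2 in Figure 3: TECs x and y go to V, TEC z goes to H2
tpt = {"V": TPB(), "H2": TPB()}
x_label = tpt["V"].assign("R1", 3)   # 3 is the SOL in the Figure 3 example
y_label = tpt["V"].assign("Ry", 3)
z_label = tpt["H2"].assign("Rz", 3)
```

Because each destination has its own counter, x and z can both receive label 1 without ambiguity: the destination address selects the TPB before the label is consulted.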
3.2.3 Traceback path table organization
At high-speed routers, the growing traffic load requires that TECs be processed at a rate commensurate with their arrival rate. PSIT therefore requires each router to maintain multiple TPTs so that multiple TECs can be processed at the same time and the TPT access time keeps pace with TEC arrivals. A TEC can be processed only after its corresponding TPT is found. In particular, a deployed router keeps a separate TPT for each TPB. If each TPT has its own read/write hardware, TECs with different destination addresses can be processed in different TPTs simultaneously, as illustrated in Figure 6(a). Thus, the TPT access time only needs to match the maximum arrival rate of TECs with the same destination address.
At a low-speed router, the traffic rate is lower and TPT access time is not an issue. Therefore, a single TPT can contain all TPBs, as shown in Figure 6(b).
In addition, because of the memory limitations of deployed routers, a TPT stores only a certain number of TPBs. Once memory is exhausted, the TPT must be paged out. Therefore, routers annotate each TPT with a timestamp and its destination addresses.
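The paper does not specify a page-out policy, so the following sketch assumes oldest-first (FIFO) paging and shows how the timestamp and destination annotations let a router later locate the table that was active at a given time of attack T. The class `TPTStore` and its methods are hypothetical names used only for illustration.

```python
from collections import OrderedDict
from typing import Dict, List, Optional, Tuple

class TPTStore:
    """Hypothetical per-destination TPT store with FIFO page-out."""

    def __init__(self, max_tables: int) -> None:
        self.max_tables = max_tables
        self.active = OrderedDict()  # dst -> (created_at, {labelout: TPI})
        self.paged_out: List[Tuple[str, float, Dict[int, tuple]]] = []

    def table_for(self, dst: str, now: float) -> Dict[int, tuple]:
        """Return the active TPT for dst, creating it (and paging out the oldest) if needed."""
        if dst not in self.active:
            if len(self.active) >= self.max_tables:
                old_dst, (ts, tpis) = self.active.popitem(last=False)
                self.paged_out.append((old_dst, ts, tpis))  # keeps dst and timestamp
            self.active[dst] = (now, {})
        return self.active[dst][1]

    def lookup(self, dst: str, t: float) -> Optional[Dict[int, tuple]]:
        """Find the most recent TPT for dst that was created no later than time t."""
        candidates = [(ts, tpis) for d, ts, tpis in self.paged_out if d == dst and ts <= t]
        if dst in self.active and self.active[dst][0] <= t:
            candidates.append(self.active[dst])
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[0])[1]

store = TPTStore(max_tables=2)
store.table_for("V", now=1.0)[1] = ("R3", 1)  # TPI for labelout 1
store.table_for("H2", now=2.0)
store.table_for("H3", now=3.0)                # pages out the TPT for V
```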
3.3 Path traceback
Before path traceback begins, an intrusion detection system (IDS) detects an exceptional event and provides the traceback manager with a packet p, the victim V and the time of attack T. During path traceback, the traceback manager, which holds the network topology information, sends traceback query messages to the routers on the attack path to accumulate the path fragments that have been established.
Given V and p, the traceback manager pinpoints the last router that marked p from the router ID carried in p and sends it a query message containing V, T and the label carried in p. When the router receives this message, it finds the corresponding TPB in its TPT using V and T, matches the label against the labelout of each TPI in this TPB, and returns the router ID and labelin of the matching TPI to the traceback manager. The manager checks whether this labelin is the SOL described in Section 3.1. If it is, the traceback process terminates. Otherwise, the manager replaces the label in the query message with the returned labelin and sends a new query message to the upstream router identified by the returned router ID. Figure 7 illustrates this process of iteratively querying routers on the attack path.
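The iterative query loop can be sketched as follows, using hand-built router tables that correspond to the Figure 3 walkthrough. This is an illustration under assumptions: the SOL is a hypothetical reserved value, time-based TPB selection with T is omitted, and `query` stands in for the query message exchange.

```python
from typing import Dict, List, Tuple

SOL = 262143  # hypothetical reserved value for the standard out label

# Hypothetical TPT contents after TEC x has traversed A, R1, R2, R3, R4, V (Figure 3):
# router -> {(victim, labelout): (upstream router ID, labelin)}
TPTS: Dict[str, Dict[Tuple[str, int], Tuple[str, int]]] = {
    "R2": {("V", 1): ("R1", SOL)},
    "R3": {("V", 1): ("R2", 1)},
    "R4": {("V", 1): ("R3", 1)},
}

def query(router: str, victim: str, label: int) -> Tuple[str, int]:
    """One query: the router matches label against labelout in TPB(victim)."""
    return TPTS[router][(victim, label)]  # KeyError means no matching TPI

def traceback(mark: Tuple[str, int], victim: str) -> Tuple[List[str], int]:
    """Follow labelin pointers upstream until the SOL reveals the ingress router."""
    router, label = mark  # marking field of the attack packet p
    path, queries = [router], 0
    while True:
        upstream, label_in = query(router, victim, label)
        queries += 1
        path.append(upstream)
        if label_in == SOL:  # the upstream router marked the SOL, so it is the ingress
            return path, queries
        router, label = upstream, label_in

path, queries = traceback(("R4", 1), "V")
```

For this four-router path the loop issues three queries, one per hop, and never contacts routers off the attack path.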
We illustrate path traceback with the example in Figure 3. The IDS provides the traceback manager with an attack packet p belonging to TEC x, the victim V and the time of attack T. The marking information in p is (R4 ID, 1). The manager determines R4 directly from p's marking values and sends R4 a query message containing V, label 1 and T. R4 identifies TPB V from V and T, finds the proper TPI by matching each labelout in TPB V against label 1, and returns the R3 ID and labelin 1 to the traceback manager. The manager updates the query message and launches a new query. The query process terminates at R2, whose response identifies the ingress router R1. R1 may be located at a source within the network or at the edge of the PSIT system.
When an attack packet traverses the network, it may undergo valid transformations (e.g., NAT and fragmentation). Such transformations are not invertible in IP networks. To trace these packets, we extend PSIT in two directions: (i) logging IP fragments and (ii) logging IP packets that undergo transformation.
Besides the ordinary TPT, each router maintains two special digest tables: the fragmentation digest table (FDT) and the transform lookup table (TLT). The FDT stores only the digests of IP fragments. The TLT records the original packet prefix and the digest of the transformed packet. The FDT and TLT are implemented in the same way as in prior work.
Each deployed router processes a forwarded packet p as follows:
If p is an IP fragment and is transformed at the current router, the router records the transformation information in the TLT and the digest of p in the FDT.
If p is an IP fragment and is not transformed at the current router, the digest of p is stored in the FDT.
If p is a nonfragmented packet and is transformed at the current router, the router first records the original form of p in the TLT. Second, the router marks p with its ID and labelout 0. Labelout 0 is termed the standard transformation out label (STOL), and it is assigned only by routers at which packets are transformed. Finally, the router computes the digest of p and stores it in the TLT.
Otherwise, path establishment proceeds as in Figure 4.
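The four cases above, together with the TLT consultation used later during traceback, can be sketched as follows. The sketch makes simplifying assumptions that differ from a real router: digests are SHA-256 hashes stored in exact Python sets and dicts rather than Bloom filters, a transformation is modeled as an arbitrary bytes-to-bytes function, and `ExtRouter`, `handle` and `recover` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple

STOL = 0  # standard transformation out label

def digest(data: bytes, mark: Optional[Tuple[int, int]] = None) -> bytes:
    """Hypothetical packet digest: SHA-256 over the packet bytes and, if given, the marking."""
    h = hashlib.sha256(data)
    if mark is not None:
        h.update(repr(mark).encode())
    return h.digest()

@dataclass
class Pkt:
    data: bytes
    is_fragment: bool
    mark: Optional[Tuple[int, int]] = None

@dataclass
class ExtRouter:
    rid: int
    fdt: Set[bytes] = field(default_factory=set)           # fragmentation digest table
    tlt: Dict[bytes, bytes] = field(default_factory=dict)  # digest after transform -> original

    def handle(self, p: Pkt, transform: Optional[Callable[[bytes], bytes]] = None) -> str:
        if p.is_fragment:
            if transform is not None:                 # case 1: fragment transformed here
                original, p.data = p.data, transform(p.data)
                self.tlt[digest(p.data)] = original
            self.fdt.add(digest(p.data))              # cases 1 and 2
            return "fdt"
        if transform is not None:                     # case 3: nonfragment transformed here
            original, p.data = p.data, transform(p.data)
            p.mark = (self.rid, STOL)
            self.tlt[digest(p.data, p.mark)] = original
            return "tlt"
        return "establish"                            # case 4: ordinary path establishment

    def recover(self, p: Pkt) -> Optional[bytes]:
        """Traceback side: re-embed (own ID, STOL), recompute the digest, consult the TLT."""
        return self.tlt.get(digest(p.data, (self.rid, STOL)))

r = ExtRouter(rid=7)
p = Pkt(b"src=10.0.0.5|payload", is_fragment=False)
action = r.handle(p, transform=lambda d: d.replace(b"10.0.0.5", b"203.0.113.9"))  # NAT-like
p.mark = (9, 4)           # a downstream router overwrites the marking field
recovered = r.recover(p)  # b"src=10.0.0.5|payload"
```

Re-embedding the router's own ID and the STOL before hashing is what makes the lookup independent of whatever downstream routers later wrote into the marking field.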
The traceback manager traces an attack packet p as follows:
If p is an IP fragment, path traceback is the same as in SPIE: the traceback manager dispatches traceback queries to the routers on the attack path and their neighbors. On receiving a query about p, the queried router examines its FDT and TLT for the relevant time period.
If p is a nonfragmented packet, the traceback manager dispatches traceback queries only to the routers on the attack path. On receiving a query about p, if the labelout in the query message is not the STOL, path traceback proceeds as described in Section 3.3. Otherwise, p underwent transformation at the queried router. Assuming this router is Rj, Rj examines its TLT for the relevant time period: it embeds its ID and the STOL into p, computes the digest of p and consults the TLT, thereby recovering the original packet p′. Based on the original packet p′, the traceback manager determines the upstream router Ri and then takes the appropriate action as follows:
4 PERFORMANCE EVALUATION USING MATHEMATICAL ANALYSIS
In this section, we use mathematical analysis to compare the performance of PSIT with two state-of-the-art single-packet IP traceback approaches, SPIE and HIT. The performance metrics are as follows:
Path Establishment Overhead
Number of logged audit trails. The memory required for logging audit trails.
Audit trail table access time. The number of audit trail table accesses per unit time.
Path Traceback Overhead
Number of queried routers. The number of routers queried to gather the path fragments.
Number of false-positive routers. The number of spurious routers that are mistaken for attack routers during a traceback process.
4.1 Path establishment overhead
Once routers implement PSIT, the deployed routers must keep recording audit trails and accessing the audit trail tables whether or not DoS attacks occur. Therefore, the number of logged audit trails and the audit trail table access time are significant metrics for evaluating IP traceback solutions.
4.1.1 The number of logged audit trails requirement
In PSIT, the audit trails logged at deployed routers include (i) TPIs and (ii) packet digests for both IP fragments and nonfragmented packets undergoing transformation.
Consider all packets forwarded by a router R. Assume that n IP packets arrive at R per unit time, that a fraction α of them are IP fragments, and that a fraction β of them are transformed at R. The percentages of the various kinds of IP packets among all packets forwarded by R are shown in Table 1.
Table 1. Percentages of different types of IP packets.
Packet type | Percentage
2. IP fragments transformed at the current router | αβ
3. IP fragments or packets transformed at the current router (including the two above) | α + β − αβ
4. Nonfragmented packets not transformed at the current router | 1 − (α + β − αβ)
First, consider the nonfragmented packets that are not transformed at R. From Table 1, the number of such packets per unit time is n × [1 − (α + β − αβ)]. Let S denote the number of TECs arriving at R. Then 1 ≤ S ≤ n × [1 − (α + β − αβ)]. According to McCreary et al. and Stoica et al., α ≤ 0.25% and β ≤ 3%. Thus, 1 ≤ S ≤ 0.967575 × n.
Second, consider IP fragments and nonfragmented packets transformed at R. From Table 1, the number of such packets per unit time is n × (α + β − αβ). Letting M denote this number, we have

$$M = n \times (\alpha + \beta - \alpha\beta) \qquad (2)$$

Taking the upper bounds α = 0.25% and β = 3%, M = 0.032425 × n.
Let $NLT_{PSIT}$, $NLT_{SPIE}$ and $NLT_{HIT}$ denote the number of audit trails logged at R per unit time in PSIT, SPIE and HIT, respectively. We have $NLT_{PSIT} = S + M$. Using (2) to substitute for M, we have

$$NLT_{PSIT} = S + 0.032425\,n$$

In SPIE, all arriving packets must be logged. Then,

$$NLT_{SPIE} = n$$

In HIT, $NLT_{HIT}$ is roughly half of that in SPIE. Thus,

$$NLT_{HIT} \approx \frac{n}{2}$$

When S < 0.467575 × n, we have $NLT_{PSIT} < NLT_{HIT} < NLT_{SPIE}$. In particular, if each TEC contains more than 3 packets on average (so that S < n/3), PSIT logs fewer trails than the other approaches. Paxson pointed out that most routing paths persist for over 1 h, and only a few persist for just several seconds. Therefore, the number of packets that traverse a routing path is far more than 3.
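The arithmetic behind Table 1 and the comparison of logged audit trails can be checked numerically. This sketch takes α and β at their stated upper bounds, uses the inclusion-exclusion term α + β − αβ from Table 1, and uses the stated approximation that HIT logs about half as many trails as SPIE; the traffic figures in the example (n and S) are made up for illustration.

```python
def packet_mix(alpha: float, beta: float) -> tuple:
    """Fractions from Table 1: (fragment AND transformed, fragment OR transformed, neither)."""
    both = alpha * beta
    either = alpha + beta - both  # inclusion-exclusion
    return both, either, 1 - either

def logged_trails(n: float, s: float, alpha: float = 0.0025, beta: float = 0.03) -> dict:
    """Audit trails logged per unit time at one router (NLT) for each approach."""
    _, either, _ = packet_mix(alpha, beta)
    return {"PSIT": s + either * n, "SPIE": n, "HIT": n / 2}

_, either, neither = packet_mix(0.0025, 0.03)  # fractions M/n and the upper bound on S/n
nlt = logged_trails(n=1_000_000, s=250_000)    # each TEC carries 4 packets on average
```

Setting S + 0.032425n equal to n/2 gives the break-even point S = 0.467575n, i.e., roughly 2.14 packets per TEC; the "more than 3 packets" condition in the text is a comfortable sufficient margin.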
4.1.2 Audit trail table access time
Let $TAT_{PSIT}$, $TAT_{SPIE}$ and $TAT_{HIT}$ denote the audit trail table access time requirement per unit time in PSIT, SPIE and HIT, respectively. In PSIT, router R can keep a separate TPT for each TPB, so packets with different destination addresses can be processed in their corresponding TPTs in parallel. Suppose the packets arriving at R carry m distinct destination addresses. In the best case, the traffic volumes toward the different destination addresses are equal. Then,

$$TAT_{PSIT} = \frac{n}{m}$$

In the worst case, all arriving traffic has the same destination address. Then,

$$TAT_{PSIT} = n$$

In SPIE, all packets forwarded by R are processed serially. Thus,

$$TAT_{SPIE} = n$$

In HIT, R maintains a separate digest table for each neighbor. Suppose R has k neighbors. The access time requirement in HIT is roughly half of that in SPIE. In the best case, the traffic arrival rates from all neighbors of R are equal. Then,

$$TAT_{HIT} \approx \frac{n}{2k}$$

In the worst case, all traffic arriving at R comes from a single neighbor. Then,

$$TAT_{HIT} \approx \frac{n}{2}$$

Obviously, $TAT_{PSIT} \le TAT_{SPIE}$. In the best case, when m ≥ 2k, we have $TAT_{PSIT} \le TAT_{HIT}$: each PSIT table is accessed less often than each HIT table when the number of destination addresses among the packets arriving at R is at least twice the number of R's neighboring routers.
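The best-case and worst-case access rates described above can be tabulated with a short helper. This is a sketch under the same assumptions as the analysis (equal traffic split across destinations or neighbors in the best case, and HIT at about half of SPIE); the function name and the example parameters are hypothetical.

```python
def table_access_rates(n: float, m: int, k: int) -> dict:
    """(best, worst) accesses per unit time on the busiest audit trail table.

    n: packets per unit time at router R, m: distinct destinations, k: neighbors of R.
    """
    return {
        "PSIT": (n / m, n),            # one TPT per destination, accessed in parallel
        "SPIE": (n, n),                # a single table, accessed serially
        "HIT": (n / (2 * k), n / 2),   # one table per neighbor, about half the packets
    }

rates = table_access_rates(n=1000, m=16, k=4)  # m >= 2k, so PSIT's best case wins
```

With m = 16 and k = 4, the busiest PSIT table sees 62.5 accesses per unit time against 125 for HIT; dropping m to 6 reverses the comparison, matching the m ≥ 2k threshold.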
4.2 Path traceback overhead
For a given attack path (R1, …, Rn), the traceback manager constructs the attack path backward from Rn to R1 by iteratively querying the relevant routers. Let $NR_{PSIT}$, $NR_{SPIE}$ and $NR_{HIT}$ denote the number of queried routers in PSIT, SPIE and HIT, respectively.
In PSIT, by querying router Ri, the traceback manager can exactly identify the upstream router Ri−1 on the attack path. The manager dispatches n − 1 rounds of query messages (the ingress router R1 is not queried), identifying one router in each round. Thus,

$$NR_{PSIT} = n - 1$$

In existing single-packet IP traceback approaches based on packet logging, the traceback manager must query the routers on the attack path and their neighbors. Suppose each router has m neighbors. In SPIE, the traceback manager sends n rounds of query messages and consults m − 1 routers in each round. Thus,

$$NR_{SPIE} = n(m - 1)$$

In HIT, the traceback manager sends about n/2 rounds of query messages and queries m − 1 routers in each round. Then,

$$NR_{HIT} \approx \frac{n(m - 1)}{2}$$

When m ≥ 3, we have $NR_{PSIT} \le NR_{HIT} < NR_{SPIE}$. That is, PSIT outperforms both SPIE and HIT provided that the average number of neighboring routers is at least 3. In addition, a measurement study has shown that the average number of neighboring routers in the Internet is about 6.34.
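The router-count comparison can be evaluated for concrete path lengths and neighbor counts. This sketch assumes HIT needs about n/2 query rounds, consistent with the statement that HIT logs roughly half as much as SPIE; that round count, the helper name and the example values are assumptions for illustration.

```python
def queried_routers(n: int, m: int) -> dict:
    """Routers queried to trace an n-hop attack path when each router has m neighbors."""
    return {
        "PSIT": n - 1,            # one exact query per hop, ingress router not queried
        "SPIE": n * (m - 1),      # n rounds, m - 1 routers per round
        "HIT": n * (m - 1) / 2,   # about n/2 rounds, m - 1 routers per round
    }

nr = queried_routers(n=10, m=6)  # m close to the measured Internet average of 6.34
```

PSIT's count grows only with path length, while both logging-based counts also scale with the neighbor count, which is why the gap widens as m grows.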
For 19438 Autonomous Systems (ASs) from CAIDA, we sum the node degrees in each AS and divide the sum by its number of nodes to obtain the average node degree. We then compute the complementary cumulative distribution function (CCDF) of the average node degree, shown in Fig. 8. ASs with an average node degree of more than 3 make up only 17% of the total, and we observe that most of these ASs are transit ASs. Since most routing paths need to go through transit ASs, as in Fig. 11(a), PSIT queries fewer routers than HIT.
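The per-AS average degree and its CCDF can be computed as sketched below. The edge lists are toy graphs for four hypothetical ASs, not CAIDA data, and the CAIDA input format is not shown here; nodes are inferred from edges, so isolated nodes would not be counted.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def average_degree(edges: Iterable[Tuple[str, str]]) -> float:
    """Sum of node degrees in one AS divided by its number of nodes."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(degree.values()) / len(degree)

def ccdf(values: List[float], x: float) -> float:
    """Fraction of values strictly greater than x."""
    return sum(v > x for v in values) / len(values)

# Toy router-level graphs for four hypothetical ASs (not CAIDA data)
ases = {
    "triangle": [("a", "b"), ("b", "c"), ("c", "a")],
    "star": [("hub", leaf) for leaf in "pqrs"],
    "k4": [(u, v) for i, u in enumerate("wxyz") for v in "wxyz"[i + 1:]],
    "k5": [(u, v) for i, u in enumerate("vwxyz") for v in "vwxyz"[i + 1:]],
}
avg = {name: average_degree(e) for name, e in ases.items()}
share_above_3 = ccdf(list(avg.values()), 3.0)
```

In this toy set only the five-node clique exceeds an average degree of 3, so the CCDF at 3 is 0.25.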
4.3 Traceback accuracy
When a router is mistaken for an attack router, we call it a “false positive.” In SPIE and HIT, deployed routers use Bloom filters to compress the logged packets, which introduces false positives. Because memory at routers is constrained, the number of false-positive routers increases with the number of logged digests.
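To make the dependence on the number of logged digests concrete, the standard Bloom filter approximation p ≈ (1 − e^(−kN/M))^k, with k hash functions, N inserted digests and M bits, can be evaluated. This is the textbook formula rather than a model taken from SPIE or HIT, and the filter parameters below are illustrative.

```python
import math

def bloom_fpr(n_items: int, m_bits: int, k_hashes: int) -> float:
    """Approximate false-positive rate of a Bloom filter: (1 - e^(-k n / m))^k."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

# Same filter size, increasing number of logged digests
fprs = [bloom_fpr(n, m_bits=1_000_000, k_hashes=7) for n in (50_000, 100_000, 200_000)]
```

At 10 bits per digest with 7 hash functions the rate is about 0.8%, and it rises quickly as more digests are packed into the same memory.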
Unlike SPIE and HIT, PSIT produces false positives only through “recycling of labels.” If a newly assigned label equals a label previously assigned in the same TPB, a matching error may occur during path traceback. In PSIT, the probability of such an error is almost zero, for two reasons. First, each TPB can distribute labels for 262144 routing paths, so PSIT can trace back up to 262144 attackers with zero false positives, and such a large-scale DoS attack rarely takes place. Second, if attack packets are identified in a timely fashion, path traceback can be initiated before the correct label is overwritten. In short, the number of false-positive routers is almost zero.
In addition, the PSIT extension uses Bloom filters to store both IP fragments and nonfragmented packets undergoing transformation as audit trails. As mentioned previously, the number of false-positive routers increases with the number of logged digests. However, the fraction of fragmented packets has decreased from 0.25% to 0.06%. Moreover, because over 60% of IP fragments are attack packets and IP fragmentation is detrimental to end-to-end performance, modern network stacks implement automatic path MTU discovery to prevent fragmentation regardless of the underlying media. Furthermore, less than 3% of IP packets undergo common transformations. Compared with the total number of ordinary packets, the number of IP fragments and transformed packets is trivial. Therefore, the number of false-positive routers caused by the digest tables is negligible in the PSIT extension.
In this section, we present the results of our comparative simulation study of our approach and the two state-of-the-art approaches (SPIE and HIT), which back up the analytic results in Section 4. We conduct the simulations in OMNeT++ 4.1.
5.1 Path establishment overhead
To study path establishment overhead, we design simulations that measure the logging probability and the ratio of the number of audit trails logged at each router to the average number of audit trails logged per router across the whole network.
As in previous work, we use a synthetic transit-stub topology generated by ReaseSUI and an AT&T topology collected by Rocketfuel (Rocketfuel 7018.r0) [22, 23]. The transit-stub topology contains 10 transit and 40 stub networks. In this topology, routers in transit networks are referred to as core routers, and routers in stub networks as edge routers. In the AT&T topology, each router is connected directly to hosts, and we refer to routers with more than two neighbors as core routers. The detailed settings of the two topologies are given in Table 2. Each end host sends request messages to other hosts with a certain probability, and these messages contain neither IP fragments nor transformed packets.
Table 2. Simulation settings for path establishment overhead.
The number of routers
The number of core routers
The number of edge routers
The number of links
First, we count the forwarded packets and the logged audit trails in successive time intervals and calculate the network-wide probability that a forwarded packet is logged as an audit trail. Figure 9 compares the resulting logging probabilities. In SPIE, the average logging probability remains steady at 100%. For HIT, we consider two implementations: in HIT-1, the ingress router on each routing path logs all forwarded packets, and in HIT-2, no packet is logged at the ingress router. HIT's average logging probability falls in the range of 45% to 55%. Compared with SPIE and HIT, PSIT significantly reduces the average logging probability, and this probability decreases as the simulation time increases. For example, in Figure 9(b), before time unit 4, the average logging probability in PSIT is higher than that of HIT because the TPs have not yet been established across the whole network. After that, it gradually falls below those of SPIE and HIT, eventually dropping to 5%. This is because once the TPs are established, the routers on the TPs only mark the forwarded packets, which confirms our analytic results.
Second, we count the TPIs logged by each router in PSIT, divide each count by the average number of TPIs per router across the whole topology, and compute the complementary cumulative distribution function (CCDF) of these logging ratios. Figure 10 depicts the results for the transit-stub and AT&T topologies. The logging ratios at edge routers are lower than those at core routers, because fewer routing paths traverse edge routers than core routers; moreover, in our approach an edge router acting as the ingress router on a routing path need not perform the logging operation (refer to Section 3.2.2). In the AT&T topology, we notice that the logging ratios at core routers vary remarkably. In Figure 10(a), the logging ratios of all core routers exceed 1, whereas only around 18% of core routers have logging ratios above 1 in Figure 10(b). This is because the logging ratio of a router depends significantly on its centrality within the network. Consider the core routers in Figure 11: all routing paths across R1 in Figure 11(a) originate from or are destined to H1, …, H2, whereas all routing paths across r1 in Figure 11(b) originate from or are destined to h1 only. R1 therefore has a higher centrality than r1, which leads to a higher logging ratio. In the transit-stub topology, most core routers resemble R1; in the AT&T topology, most core routers resemble r1.
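The post-processing described in this paragraph (per-router logging ratio, then its CCDF) can be sketched as follows. The router names and TPI counts in the example are hypothetical, and the CCDF is taken here as P[X ≥ x]; the paper does not state which convention Figure 10 uses.

```python
from typing import Dict, List, Tuple

def logging_ratios(tpi_counts: Dict[str, int]) -> Dict[str, float]:
    """Ratio of each router's logged-TPI count to the per-router average."""
    mean = sum(tpi_counts.values()) / len(tpi_counts)
    if mean == 0:
        return {r: 0.0 for r in tpi_counts}   # nothing logged anywhere
    return {r: c / mean for r, c in tpi_counts.items()}

def ccdf(values: List[float]) -> List[Tuple[float, float]]:
    """Return (x, P[X >= x]) for each distinct value x, in ascending order."""
    xs = sorted(values)
    n = len(xs)
    points: List[Tuple[float, float]] = []
    for i, x in enumerate(xs):
        if i == 0 or x != xs[i - 1]:
            points.append((x, (n - i) / n))   # n - i samples are >= x
    return points

# Hypothetical counts: two core routers and two edge routers.
counts = {"core1": 40, "core2": 20, "edge1": 10, "edge2": 10}
ratios = logging_ratios(counts)
print(ccdf(list(ratios.values())))
```

Because every ratio is normalized by the mean, the average ratio is 1 by construction, so a ratio above 1 marks a router that logs more than its share.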
5.2 Path traceback overhead
This section investigates the number of routers queried when tracing attack paths of different lengths.
The simulation is based on a router-level topology from CAIDA (ITDK9812 skitter data). The detailed settings of this topology are listed in Table 3. Each router is connected directly to a host. We designate one host as the victim and treat the other routers as zombies.
Table 3. Simulation settings for path traceback overhead.
The number of routers
The number of links
The largest node degree
The average node degree
For all attack paths of each length destined to the victim, we count the queried routers in SPIE, HIT and PSIT and calculate the average number of queried routers. The simulation results are depicted in Figure 12. For attack paths shorter than 4 hops, PSIT queries slightly more routers than HIT; for longer paths, PSIT performs better. This is because short attack paths are less likely to contain high-degree routers. When tracing an attack path, the number of routers PSIT queries equals the number of routers on the path, whereas the number queried by SPIE and HIT also depends on the degree of each router on the path. This confirms the analytic results in Section 4.2.
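The contrast between path-length-bound and degree-bound querying can be illustrated with a simplified counting model. This is an assumption for illustration only: the neighbor-querying count treats SPIE and HIT as querying every neighbor of each router on the path once, and ignores the details of their actual query procedures. The topology is hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]

def queried_on_path_only(path: List[str]) -> int:
    """PSIT-style count: only the routers on the attack path are queried."""
    return len(set(path))

def queried_with_neighbors(graph: Graph, path: List[str]) -> int:
    """Simplified neighbor-querying count: every router on the path plus
    all of its neighbors is queried once."""
    queried: Set[str] = set(path)
    for router in path:
        queried |= graph[router]
    return len(queried)

# Hypothetical topology: R2 is a high-degree router on the attack path.
graph: Graph = {
    "R1": {"R2", "X1"},
    "R2": {"R1", "R3", "X2", "X3", "X4"},
    "R3": {"R2"},
    "X1": {"R1"}, "X2": {"R2"}, "X3": {"R2"}, "X4": {"R2"},
}
path = ["R1", "R2", "R3"]
print(queried_on_path_only(path), queried_with_neighbors(graph, path))
```

In this model, a single high-degree router on the path inflates the neighbor-querying count, which is why longer paths (more likely to cross such routers) favor the path-only approach.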
5.3 Traceback accuracy
Beyond the false positives introduced by Bloom filters, SPIE and HIT can also return incorrect attack paths in certain network topologies. This is because neither SPIE nor HIT can unambiguously determine from which neighbor a packet arrived when two neighbors of a given node have both processed the packet. Figure 13 shows an example. In SPIE, R7 queries its neighbor R4 and mistakes link R4 − R7 for part of the attack path. In HIT-1, link R4 − R7 is likewise included in the attack path, for the same reason as in SPIE. In HIT-2, R6 queries R3, and link R6 − R3 may be added to the attack path. However, this link was not traversed by the attack packet.
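The ambiguity can be reproduced on a toy example. The sketch below models digest-style traceback in a simplified way (an assumption, not SPIE's or HIT's exact procedure): starting from the victim's router, a reached router adds a link to every not-yet-visited neighbor that reports having seen the packet, and Bloom-filter false positives are ignored. The topology is hypothetical and is not the one in Figure 13: the packet travels A → B → C → V, but B is also adjacent to V.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]

def naive_traceback(graph: Graph, seen: Set[str], start: str) -> Set[Tuple[str, str]]:
    """Accept every unvisited neighbor that saw the packet as an upstream hop.
    Returns (upstream, downstream) links of the reconstructed attack graph."""
    links: Set[Tuple[str, str]] = set()
    visited = {start}
    frontier: List[str] = [start]
    while frontier:
        current = frontier.pop()
        for nbr in sorted(graph[current]):
            if nbr in seen and nbr not in visited:
                links.add((nbr, current))   # ambiguous when >1 neighbor qualifies
                visited.add(nbr)
                frontier.append(nbr)
    return links

graph: Graph = {
    "A": {"B"},
    "B": {"A", "C", "V"},
    "C": {"B", "V"},
    "V": {"B", "C"},
}
seen = {"A", "B", "C", "V"}           # every router on the true path logged it
print(sorted(naive_traceback(graph, seen, "V")))
```

The result contains the untraversed link B − V and misses the traversed link B − C, because V cannot tell whether the packet came from B or from C.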
In PSIT, a queried router can exactly determine its upstream router on the attack path, regardless of whether its neighbors have processed the packet. We run PSIT on the network topology of Figure 13; the simulation result is depicted in Figure 14.
6.1 Partial deployment
Suppose that only some routers implement PSIT; the deployed routers then form an overlay network. In this overlay, each deployed router knows its neighbors and is uniquely identified by its router ID number. Given a router Ri that forwards an attack packet, during the path traceback Ri can directly determine its upstream router Ri − 1 on the attack path in the overlay network by its router ID number, regardless of whether Ri is directly connected to Ri − 1 in the real network topology. Therefore, PSIT works even when it is only partially deployed across routers in the Internet.
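The paper does not specify how a deployed router discovers its overlay neighbors. One plausible rule, assumed here purely for illustration, is that two deployed routers are overlay neighbors if some path between them passes only through non-deployed (legacy) routers. A breadth-first search that stops at deployed routers implements that rule; the chain topology below is hypothetical.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]

def overlay_neighbors(graph: Graph, deployed: Set[str]) -> Dict[str, Set[str]]:
    """For each deployed router, find the deployed routers reachable through
    paths whose intermediate hops are all non-deployed (legacy) routers."""
    overlay: Dict[str, Set[str]] = {}
    for src in deployed:
        found: Set[str] = set()
        visited = {src}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr in visited:
                    continue
                visited.add(nbr)
                if nbr in deployed:
                    found.add(nbr)      # stop: this is an overlay neighbor
                else:
                    queue.append(nbr)   # legacy router: keep searching past it
        overlay[src] = found
    return overlay

# Hypothetical chain R1 - L1 - R2 - L2 - L3 - R3 (L* are legacy routers).
graph: Graph = {
    "R1": {"L1"}, "L1": {"R1", "R2"}, "R2": {"L1", "L2"},
    "L2": {"R2", "L3"}, "L3": {"L2", "R3"}, "R3": {"L3"},
}
print(overlay_neighbors(graph, {"R1", "R2", "R3"}))
```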
6.2 False negative
In PSIT, the size of a TPT increases with the number of TPBs. Because routers have limited memory, PSIT routers have to refresh their TPTs, which can cause false negatives. To reduce the rate of refreshing TPTs, the memory of deployed routers must be configured properly. As mentioned in Section 4.1, the memory requirement of each core router depends on its centrality. Since most routing protocols in IP networks (e.g., RIP and OSPF) are based on shortest paths, we use betweenness centrality from complex network theory to estimate the centrality of a router. The betweenness centrality of a router v can be expressed as

$$C_B(v) = \sum_{s \neq v \neq t} \frac{g_{st}(v)}{g_{st}}$$

where $g_{st}$ is the total number of shortest paths from router s to router t, and $g_{st}(v)$ is the number of those paths that pass through v. The memory capacity at v can thus be configured on the basis of its betweenness centrality.
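Betweenness centrality, with g_st and g_st(v) as defined in this paragraph, can be computed for an unweighted router graph with Brandes' algorithm, sketched below. This is a standard algorithm, not necessarily the tool the authors used. It sums over ordered pairs (s, t); for an undirected graph, divide the result by 2 if unordered pairs are intended.

```python
from collections import deque
from typing import Dict, List

def betweenness(adj: Dict[str, List[str]]) -> Dict[str, float]:
    """Brandes' algorithm for unweighted graphs; sums g_st(v) / g_st over
    ordered pairs (s, t) with s != v != t."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        stack: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}     # sigma[v] = number of shortest s-v paths
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # BFS: count shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                    # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb

# Hypothetical star: core router "c" connecting three edge routers.
star = {"c": ["a", "b", "d"], "a": ["c"], "b": ["c"], "d": ["c"]}
print(betweenness(star))
```

How memory should then scale with C_B(v) (for example, proportionally) is a design choice the paper leaves open.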
Through a large number of trials, attackers may deduce the correct router ID number of a neighbor of the ingress router. We need two additional mechanisms to deal with this vulnerability. First, the traceback manager checks whether the source address of the attack packet matches the IP address of the sender host attached to the input port (or falls within the network prefix associated with the input port). Second, the traceback manager monitors the arriving traffic rates at both the ingress router and the forged neighboring router to identify where the malicious traffic comes from.
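The first check can be expressed with Python's standard `ipaddress` module. The function name and the per-port host address and prefix are hypothetical inputs; how a router actually stores this per-port information is not described in the paper. The addresses below are from the documentation ranges.

```python
import ipaddress

def source_matches_port(src: str, host_ip: str, port_prefix: str) -> bool:
    """True if src equals the address of the host attached to the input
    port, or lies within the network prefix associated with that port."""
    addr = ipaddress.ip_address(src)
    if addr == ipaddress.ip_address(host_ip):
        return True
    # Membership test; mixed IPv4/IPv6 comparisons simply return False.
    return addr in ipaddress.ip_network(port_prefix)

print(source_matches_port("192.0.2.77", "192.0.2.10", "192.0.2.0/24"))
print(source_matches_port("198.51.100.5", "192.0.2.10", "192.0.2.0/24"))
```

A mismatch suggests that the packet did not really enter at that port, so the claimed neighbor relation deserves further scrutiny through the traffic-rate comparison.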
In distributed reflector denial-of-service (DRDoS) attacks, tracing the reflected attack packet alone leads only to the “reflector,” not to the real attackers. However, we can view the reflection process as a form of packet transformation: the reflector sets up a mapping relationship between the source address and the destination address of the attack packet. Thus, by following the process in Section 3.4, PSIT can correctly trace DRDoS attacks back to the real attackers.
7 RELATED WORK
Hunting down the attackers (zombies) is essential to solving the DoS attack challenge. Most work on IP traceback focuses on flooding-based DoS attacks [13, 25-30] and cannot defend against software-exploit-based DoS attacks. Few studies address single-packet IP traceback, which can trace both flooding-based and software-exploit-based DoS attacks. Our work improves on the prior art in single-packet IP traceback.
The existing approaches for single-packet IP traceback are grouped into two classes: the single-packet IP traceback employing packet logging and the hybrid single-packet IP traceback employing packet logging and marking.
7.1 The single-packet IP traceback employing packet logging
Matsuda et al. were the first to propose a single-packet IP traceback approach employing packet logging. Its main idea is that deployed routers store a portion of each forwarded packet as an audit trail. During the traceback process, the traceback manager queries the deployed routers and examines their logged audit trails to identify the attack path. The logged portion is up to 60 bytes long, which results in high storage overhead at routers. Snoeren et al. proposed a hash-based single-packet IP traceback approach named SPIE. In SPIE, deployed routers compute a digest of a portion of each forwarded packet and store it in a Bloom filter, which reduces the storage overhead significantly. However, Bloom filters can generate false positives, which lowers traceback accuracy. By the basic property of Bloom filters, for a fixed false-positive rate, the memory required at deployed routers still grows linearly with the number of forwarded packets. Lee et al. proposed a scalable hash-based single-packet IP traceback approach to further reduce the storage overhead; unfortunately, this approach increases the false-positive rate. Hilgenstieler et al. proposed extensions to SPIE for precise and efficient log-based IP traceback, but this approach does not fundamentally improve on SPIE in terms of storage overhead and false-positive rate.
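The linear-growth property follows from the standard sizing rule for a Bloom filter with an optimal number of hash functions, m = −n ln p / (ln 2)². The sketch below applies it; the packet counts and target false-positive rate in the example are illustrative, not SPIE's actual parameters.

```python
import math

def bloom_bits(n_items: int, fp_rate: float) -> int:
    """Bits needed to hold n_items at target fp_rate, assuming the optimal
    hash count k = (m / n) * ln 2:  m = -n * ln(p) / (ln 2) ** 2."""
    return math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))

# Illustrative: a fixed 1% false-positive target for growing packet counts.
for packets in (1_000_000, 2_000_000, 4_000_000):
    bits = bloom_bits(packets, 0.01)
    print(f"{packets:>9} packets -> {bits / 8 / 2**20:.1f} MiB")
```

Doubling the number of forwarded packets doubles the required memory, since the bits-per-item factor depends only on p.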
7.2 The hybrid single-packet IP traceback employing packet logging and marking
Gong et al. proposed a single-packet IP traceback approach called HIT, the first to combine packet logging and marking to solve this problem. In HIT, as a packet traverses the network, some deployed routers mark path fragments into the packet while the others store its digest as an audit trail. The logged digests therefore also contain the upstream path fragments. Compared with SPIE, HIT halves the storage of packet digests. However, the storage overhead still grows with the number of forwarded packets, which prevents this approach from being applied to high-speed networks with heavy traffic. Malliga et al. designed a scheme, MORE, that improves HIT by using the interface numbers of routers, instead of partial IP addresses, to mark a packet's forwarding path. This approach still requires high storage and cannot remove false positives completely. Yang et al. proposed an approach called RHIT that improves MORE by marking routers' interface numbers and integrating packet logging with a hash table. RHIT reduces the storage overhead and false-positive rate significantly. However, MORE and RHIT rely on two strong (arguably less practical) assumptions. First, all routers must be traceback-enabled. Even if a traceback scheme is adopted in the Internet, not all ISPs will implement it, so a traceback mechanism should function even when only partially deployed across Internet routers. Second, each interface must connect to exactly one router, whereas in real networks an interface may connect to several routers through layer-2 devices.
Our work is fundamentally different from this earlier research. We set up logical paths as audit trails instead of logging packets, which improves traceback capability. Compared with previous work, our approach has the following advantages: first, it reduces the memory requirement at routers; second, it reduces the number of routers queried during the traceback process; third, it addresses the problem of false positives; finally, it allows partial deployment on the Internet.
During the past decade, DoS attacks have posed a serious security threat to the Internet. Single-packet IP traceback is a critical part of defending against DoS attacks. However, current single-packet IP traceback proposals based on packet logging suffer from high storage and processing overhead at routers, as well as low traceback accuracy.
In this paper, we have proposed a more precise and efficient path-based approach for single-packet IP Traceback (PSIT). The main idea is to set up logical paths as audit trails instead of logging packets. During the path traceback, we construct the attack path by accumulating the established path fragments. Compared with previous approaches, PSIT reduces the overhead at routers and improves traceback accuracy. Specifically, in our approach, the storage overhead depends only on the number of routing paths, regardless of how many packets traverse them; the number of routers queried during the traceback process depends only on the number of hops in an attack path; and the false-positive rate is almost zero.
The work was supported in part by the funding agencies of China: the Innovative Research Groups of the National Natural Science Foundation under grant 61121061.