A histogram-based method for efficient detection of rewriting attacks in simple object access protocol messages

Abstract

In order to secure the content of simple object access protocol (SOAP) messages in Web services, several Web service security standards, such as XML digital signature, are used. However, the content of a SOAP message protected with an XML digital signature can be altered without invalidating the signature. Existing methods for detecting XML rewriting attacks are inefficient because the cost of the detection operation is linear in the height of the SOAP message tree. Thus, each element of the SOAP message needs to be accessed and checked. In this paper, we propose an efficient method for detecting XML rewriting attacks on SOAP messages using a histogram. With our method, once the source of an attack is identified, we save it in the form of a histogram, which enables us to maintain statistical information about the location of the attack in the SOAP message. We can use this information to detect attacks in the future and thus avoid unnecessarily checking all elements in the SOAP message. Experiments show that our method outperforms existing methods by several times in many cases. Copyright © 2014 John Wiley & Sons, Ltd.

1 Introduction

Web services are used in a variety of applications and have become a key technology in developing business operations on the Web [1, 2]. Communication between Web services occurs via simple object access protocol (SOAP) messages. A SOAP message is an XML document with a mandatory Body element containing the request and response and with an optional Header element containing routing and security information.

In order to secure the content of SOAP messages in Web services, several Web service security (WS-Security) standards, such as XML digital signature, are used. However, McIntosh and Austel [3] illustrated that the content of a SOAP message protected with an XML digital signature can be altered without invalidating the signature. This is the so-called XML rewriting attack, and it can occur because XML digital signature does not protect the location of the signed element within the SOAP message tree. This enables an attacker to relocate the signed element of the SOAP message while keeping the signature valid.

Different solutions have been proposed to solve this problem. If used correctly, Web service policy (WS-Policy) can prevent XML rewriting attacks by enforcing the location of the signed element in policy files. Thus, policy advisor tools for Web services policies are proposed in [4, 5]. Policy advisor tools diagnose SOAP messages according to a policy file, explain the risks, and suggest appropriate recovery actions. We refer to this method as the policy-based method. In [6-8], it is proposed to consider information about the structure of the SOAP message by adding a new header element, called SOAP Account, to the outgoing message's header. We refer to this method as the inline method. In [9, 10], it is proposed to consider the absolute path from the root to the signed element using XPath expressions. We refer to this method as the string-based method. However, the abovementioned methods for detecting XML rewriting attacks are inefficient because the cost of performing attack detection is linear in the height of the SOAP message tree. Thus, each element of the SOAP message needs to be accessed and checked.

In order to solve this problem, we propose an efficient method for detecting XML rewriting attacks on SOAP messages using a histogram. In our method, we first construct a labeling scheme of the SOAP message structure using the Dewey labeling scheme (DLS) [11] and then attach it to the outgoing message's header. The DLS enables us to label a SOAP message tree with level-based numbers that express how the elements within the SOAP message relate to one another. If an attacker modifies the SOAP message, the modification is detected by observing the change in the numbering relationship of the signed elements. Once the source of the attack is identified, we save it in the form of a histogram, which enables us to maintain statistical information about the location of the attack in the SOAP message. We can use this information to detect attacks in the future and thus avoid unnecessarily checking all elements in the SOAP message.

More precisely, we make the following contributions in this paper:

  • We propose an efficient method for detecting XML rewriting attacks on SOAP messages using a histogram.
  • We propose to construct a labeling scheme of the SOAP message structure using the DLS, which is proved to be secure against XML rewriting attacks and has other significant advantages over existing methods.
  • Experiments show that our method outperforms existing methods by several times in many cases; we also provide an in-depth analysis of the major factors that impact the performance of previous methods.

The remainder of this paper is organized as follows. Section 2 introduces preliminaries for our research. Section 3 discusses related work. Section 4 describes our proposed method. Section 5 presents performance evaluation. Section 6 highlights conclusions and future work.

2 Preliminaries

In this section, we explain and formally define XML digital signature and the XML rewriting attack.

2.1 XML digital signature

Making Web services secure means making SOAP messages secure [12]. To secure exchanges of SOAP messages, several security standards of WS-Security, such as XML digital signature, are used. XML digital signature provides a method of digitally signing the SOAP messages to ensure their integrity. More precisely, we can formally define XML digital signature as Definition 1.

Definition 1. XML digital signature. A digital signature for a SOAP message is a 3-tuple (Generate, Sign, Verify) of algorithms satisfying the following conditions:

  • Generate is the key generation algorithm. On input 1^k, where k is a security parameter, it outputs a pair of public and secret keys (pk, sk).
  • Sign is the signature algorithm. On input message M and secret key sk, algorithm Sign outputs a signature δ = Sign(sk, M).
  • Verify is the verification algorithm. Given a public key pk, message M, and signature δ, algorithm Verify(pk, M, δ) outputs 1 if δ = Sign(sk, M). Otherwise, it outputs 0.

An example of a SOAP message secured with an XML digital signature is depicted in Figure 1.

Figure 1.

SOAP message secured with XML digital signature.

In Figure 1, the XML digital signature, which is displayed in a red-dashed region, is a 2-tuple signature. The signed soap:Body element, which is displayed as a red solid region, is referenced by its URI under the ds:Reference element.

2.2 XML rewriting attacks

Security is one of the significant challenges for Web services because of their deployment in open and unprotected environments [13]. Recall from Section 1 that the content of a SOAP message protected with an XML digital signature can be altered without invalidating the signature. This is the so-called XML rewriting attack, and it can occur because XML digital signature does not protect the location of the signed element within the XML document. This enables an attacker to relocate the signed element of the SOAP message while keeping the signature valid. More precisely, we can formally define XML rewriting attacks as Definition 2.

Definition 2. XML rewriting attack. Given a message M, which is signed using δ = Sign(sk, M) as defined in Definition 1, an XML rewriting attack introduces a new BOGUS element and moves a signed element under a signature δ' = (Sign(sk, M'))^{-1}, where δ' is a valid signature on message M under the public key pk, that is, Verify(pk, M, δ') = 1.

An example of an XML rewriting attack on a SOAP message is depicted in Figure 2. In Figure 2, after the attack, the Body element is moved under a new atk:BOGUS header element, making this soap:Body element meaningless for the receiver. The attacker creates his own soap:Body element with a new ID and content information. In Figure 2, the result of this modification is depicted as a black solid region. According to the SOAP specification, the new soap:Body element is not required to have an identifier. The signature is still valid because the referenced element “body” is still present and unchanged. Thus, the attacker bypasses the detection checks.

Figure 2.

XML rewriting attack on Body element of SOAP message.
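The effect shown in Figure 2 can be reproduced in a few lines. The following is a minimal sketch under simplifying assumptions, not a real XML signature implementation: element names are written without namespaces, the message content (`Transfer`, `alice`, `attacker`) is invented for illustration, and a SHA-256 hash over the serialized element stands in for the digest of the referenced element (a real signature would canonicalize the element first). The helper names `digest_by_id` and `rewrite` are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

# Illustrative message; tag names and content are assumptions, not from Figure 1.
MESSAGE = (
    '<Envelope><Header><Security/></Header>'
    '<Body Id="body"><Transfer>alice</Transfer></Body></Envelope>'
)

def digest_by_id(envelope, element_id):
    """Hash the element referenced by its Id, wherever it sits in the tree."""
    target = envelope.find(f".//*[@Id='{element_id}']")
    return hashlib.sha256(ET.tostring(target)).hexdigest()

def rewrite(envelope):
    """Move the signed Body under a BOGUS header and add a forged Body."""
    header = envelope.find("Header")
    signed_body = envelope.find("Body")
    envelope.remove(signed_body)
    ET.SubElement(header, "BOGUS").append(signed_body)
    forged = ET.SubElement(envelope, "Body")  # the forged Body carries no Id
    ET.SubElement(forged, "Transfer").text = "attacker"
    return envelope

envelope = ET.fromstring(MESSAGE)
before = digest_by_id(envelope, "body")
after = digest_by_id(rewrite(envelope), "body")
print(before == after)                      # the Id reference resolves to the same bytes
print(envelope.find("Body/Transfer").text)  # content of the Body the receiver processes
```

Because the reference is resolved by identifier only, the digest of the signed element does not change when the element is relocated, while the element found at the expected Body position is the forged one.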

There are three types of XML rewriting attacks, namely replay attacks, redirection attacks, and multiple security header attacks. In [10, 14], the application of the different types of XML rewriting attacks on SOAP messages is demonstrated, using the same method as in Figure 2. Replay attacks occur on the MessageID element, which is defined by WS-Addressing and used by the server to keep track of client requests. A replay attack may cause the same request to be processed several times, making the client pay several times for the same query and forcing the server to do redundant work. Redirection attacks occur on the ReplyTo element, which is used for embedding the URI of the ultimate recipient. After a redirection attack, the SOAP message may be redirected to a location other than the one it was sent to. Multiple security header attacks occur on the timestamp element, which is used by the server to manage its cache, where the server deletes a message when the timestamp expires.

The practical impact of the XML rewriting attack is presented in [15, 16], wherein the attacks on the Amazon EC2 SOAP and the Eucalyptus Cloud Web Services are demonstrated.

3 Related Work

There are numerous methods for the detection of XML rewriting attacks. We categorize these methods into three types: the policy-based method, the inline method, and the string-based method. In this section, we review these methods.

3.1 Policy-based method

If used correctly, WS-Policy can prevent XML rewriting attacks by enforcing the position of the signed element in policy files. Thus, policy advisor tools for Web services policies are proposed in [4, 5].

A policy-based advisor tool, presented in [4], diagnoses SOAP messages according to a policy file, explains the risks, and suggests appropriate recovery actions. It generates a security report by running queries that check for over 30 syntactic conditions. For instance, if the policy file indicates that signing the <ReplyTo> element is optional, the policy advisor will warn about the possibility of a redirection attack and will recommend making the <ReplyTo> element mandatory and signed for a request message. A policy-based method, presented in [5], is able to detect and fix faults in SOAP messages. As a part of the proposed method, a technique called Bit-Stream is developed. It works based on the importance of SOAP elements, detects vulnerabilities and risks, and gives recommendations for higher security. The system adopts a simulation-based method that allows self-optimization of its performance under different security conditions.

3.2 Inline method

Inline methods proposed in [8, 17, 18] consider information about the structure of the SOAP message by adding a new header element called SOAP Account. The SOAP Account header includes the number of child elements of the envelope; the number of header elements; the number of references for the signing element; and the predecessor, successor, and sibling relationships of the signed object. An extension to SOAP Account is proposed in [7, 19]. This extension considers new characteristics of the SOAP message, such as the depth information and parent elements of the signed node, as well as a way to uniquely identify the parent elements. Another extension is proposed in [6]. This extension uses a new header in the SOAP message containing the positions of the signed elements in the message. The header is added to the SOAP message after the positions of the signed elements have been located in the Document Object Model tree.

3.3 String-based method

In [9, 10], it is proposed to consider the absolute path from the root to the signed element using XPath expressions. Specifically, the proposed method uses a subset of XPath, called FastXPath, instead of ID attributes for signature referencing in Web services messages. FastXPath is able to protect against XML rewriting attack in most reasonable scenarios without causing the performance impact associated with the use of complex XPath expressions.
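To make the idea of an absolute path concrete, the following minimal sketch computes a positional path from the root to a given element. It only illustrates the general idea of referencing an element by its location; it is not FastXPath itself, the function name `absolute_path` is hypothetical, and element names are written without namespaces for brevity.

```python
import xml.etree.ElementTree as ET

def absolute_path(root, target):
    """Build a positional path such as /Envelope/Header[1]/Security[1]."""
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parents[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    steps.append(root.tag)
    return "/" + "/".join(reversed(steps))

# Illustrative message in which a Body has been moved under a BOGUS header.
root = ET.fromstring(
    "<Envelope><Header><BOGUS><Body/></BOGUS></Header><Body/></Envelope>"
)
print(absolute_path(root, root.find("Body")))
print(absolute_path(root, root.find("Header/BOGUS/Body")))
```

A relocated element yields a different path string, which is what makes the relocation visible; the path also grows with the depth of the element.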

3.4 Discussions

The policy-based advisor tools are not efficient. In order to detect element deletion attacks with the policy-based method, every element has to be declared as mandatory. This reduces the flexibility of the XML document and degrades performance in the validation phase. The inline method can overcome this pitfall of the policy-based method; however, it is not efficient either. This is due to the high complexity of the method, which needs a long calculation time to determine the structural information. The main pitfall of the string-based method is that as the depth of nodes grows, the size of the SOAP message becomes large, which introduces additional overhead in processing the messages. Moreover, with the aforementioned methods, the cost of performing attack detection is linear in the height of the XML tree (SOAP message). Thus, each element of the SOAP message needs to be accessed and checked.

4 Proposed Method

Taking the shortcomings of all the discussed solutions into consideration, we propose an efficient method for detecting XML rewriting attacks on SOAP messages using a histogram. In this section, we first describe how to construct a labeling scheme of the SOAP message structure and then explain how to detect XML rewriting attacks using a histogram.

4.1 Constructing a labeling scheme of simple object access protocol message structure

Careful observation of XML rewriting attacks shows that all of them are modifications of SOAP messages. An attack either deletes parts and adds them back afterwards or adds new elements to the SOAP message. When such an unexpected modification occurs in the form of manipulation of XML elements, the intended predecessor and successor relationship of the SOAP element is lost. Based on these observations, we conclude that, at the time of sending a SOAP message, it is possible to preserve the structure information of the SOAP elements by attaching it to the outgoing message's header.

We construct a labeling scheme of the SOAP message structure using the DLS and then attach it to the outgoing message's header. The DLS enables us to label the SOAP message tree with level-based numbers that express how the elements within the SOAP message relate to one another. If an attacker modifies the SOAP message, the modification is detected by observing the change in the numbering relationship of the signed elements.

Originally, the DLS was proposed for labeling XML trees. Because SOAP messages have the same format as XML documents, we can use the benefits of the DLS to formulate a solution for XML rewriting attacks. The DLS is a number-based inclusion in the SOAP message; thus, it avoids the pitfalls of existing methods by reducing latency and protocol overhead.

4.1.1 Constructing labels

The DLS label of a node represents the path from the document root to the node. The labels of all nodes are constructed from three components [20, 21].

  1. Level component. This is the level of the node in the XML document. The levels of the tree from root to leaf are designated such that the root level is null.
  2. Inherited label component. This is the label of the parent node with its level component removed. By inheriting the label of the parent node, the exact location of the node can be identified.
  3. Sibling order component. This represents the relative location among sibling nodes.

A DLS label is produced from these three components, concatenated with the delimiter “.”, and is defined in order to assist users in figuring out the relationship between nodes. The labeling of an XML document distinguishes between the root node and internal nodes. Algorithm 1 [20] describes the construction steps of the DLS.

In Algorithm 1, lines 1 to 5 define the variables: the level of the current node, the sibling order of the current node, the value inherited from the parent node label, the string value of the current level, and the label of the plaintext node. Starting from line 6, the algorithm recursively checks the following two conditions. First, if the level of the current node equals 0, the node is labeled as the root and given a null value. Second, for the child nodes of the root, the next level of the XML tree, which is “2”, is followed by the label of the parent node, which is “1”, and the delimiter “.”. Then, the digit “1” is added for the first child. Thus, the unique label of the first child node of the root is “2.1.1”. Next, for the child node of “2.1.1”, the next level of the XML tree, which is “3”, is followed by its inherited component, which is “1.1”, and the delimiter “.”. Then, the digit “1” is added for the first child. This action is executed through lines 10 to 17 until the full tree has been traversed. After the execution of this algorithm, each node in the tree has a label.

Figure 3 shows the labeled XML tree built by Algorithm 1. As stated in the algorithm, because the root element has no parent or sibling node, null is assigned to the root node. For the rest of the elements, as the algorithm goes one level down, it assigns “1” to the first child, increments the sibling order by 1 for each subsequent sibling, and so on.

Figure 3.

Labeling scheme of SOAP message structure.
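The labeling described above can be sketched as a short recursive traversal. This is an illustrative reading of the textual description, not a transcription of Algorithm 1: it assumes that the root carries a null label, that the root contributes the inherited value “1” to its children, and that children of the root are at level 2, so that the first child of the root receives “2.1.1” and its first child receives “3.1.1.1”. The function name `dewey_labels` and the sample message are hypothetical.

```python
import xml.etree.ElementTree as ET

def dewey_labels(root):
    """Return (label, tag) pairs in document order; the root label is None."""
    labels = [(None, root.tag)]  # assumption: null label for the root

    def visit(node, level, inherited):
        # Iterating over an Element yields its direct children in order.
        for order, child in enumerate(node, start=1):
            path = f"{inherited}.{order}"                  # inherited + sibling order
            labels.append((f"{level}.{path}", child.tag))  # level component comes first
            visit(child, level + 1, path)

    visit(root, 2, "1")  # assumption: root contributes "1", its children are level 2
    return labels

# Illustrative message without namespaces.
MESSAGE = "<Envelope><Header><Security/></Header><Body/></Envelope>"
for label, tag in dewey_labels(ET.fromstring(MESSAGE)):
    print(label, tag)
```

Under these assumptions, moving an element to another parent changes both its level component and its inherited component, which is the change the receiver looks for.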

4.1.2 Correctness of relationship

The DLS assigns each node a Dewey label, which is a concatenation of its parent's label and its local order. Given two Dewey labels A: a_1.a_2. … .a_m and B: b_1.b_2. … .b_n, we define the Dewey order as follows.

Definition 3. Dewey order [22]. In DLS, A follows B if and only if the following condition holds: a_1 = b_1, a_2 = b_2, …, a_m = b_m, where m < n. The relationship of A and B can be established based on the following properties:

  • Ancestor–descendant relationship. A is an ancestor of B if and only if m < n and a_1 = b_1, a_2 = b_2, …, a_m = b_m; that is, a_1.a_2. … .a_m is a prefix of b_1.b_2. … .b_n.
  • Parent–child relationship. A is a parent of B if and only if A is an ancestor of B and m = n − 1; that is, a_1.a_2. … .a_m matches the parent label of b_1.b_2. … .b_n.
  • Sibling relationship. A is a sibling of B if and only if A's parent label matches B's parent label.

Theorem 1. Correctness of relationship [22]. Given three DLS labels A: a_1.a_2. … .a_m, B: b_1.b_2. … .b_n, and C: c_1.c_2. … .c_l, such that A < B and B < C, then A < C.

Proof. From Definition 3 [22], because A < B, we can have m < n, such that math formula and am × b1 < bn × a1. Consequently, we can obtain math formula and bn × cl < cl × b1. Because, math formula, thus, we have A < C.
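The relationship tests of Definition 3 reduce to prefix comparisons. The following minimal sketch represents a Dewey label as a tuple of integers, for example `(1, 2, 3)` for the label 1.2.3; this representation and the function names are assumptions made for illustration. The sibling test additionally requires the two labels to be distinct and non-empty, which the definition leaves implicit.

```python
def is_ancestor(a, b):
    """A is an ancestor of B if A is a proper prefix of B."""
    return len(a) < len(b) and b[:len(a)] == a

def is_parent(a, b):
    """A is the parent of B if A is an ancestor exactly one level above B."""
    return is_ancestor(a, b) and len(a) == len(b) - 1

def is_sibling(a, b):
    """A and B are siblings if they differ but share the same parent label."""
    return a != b and len(a) == len(b) > 0 and a[:-1] == b[:-1]

A, B, C = (1,), (1, 2), (1, 2, 3)
print(is_ancestor(A, B), is_ancestor(B, C), is_ancestor(A, C))
print(is_parent(B, C), is_parent(A, C))
print(is_sibling((1, 2), (1, 3)))
```

In this representation the transitivity stated in Theorem 1 is the familiar fact that a prefix of a prefix is again a prefix.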

4.2 Rewriting attack detection using histogram

The idea of histogram-based attack detection is to solve new problems by adapting solutions that have been used to solve similar problems in the past. We can formally define a histogram as Definition 4.

Definition 4. Histogram [23]. Let x be a measurement that can have one of T values contained in the set X = {x_1, …, x_T}. Consider a set of n elements whose measurements of the value of x are A = {a_1, …, a_n}, where a_t ∈ X.

The histogram of the set A along measurement x is H(a, X), which is an ordered list consisting of the number of occurrences of the discrete values of x among the a_t. If H_i(a, X), 1 ≤ i ≤ T, denotes the number of elements of A that have value x_i, then H(a, X) = [H_1(a, X), …, H_T(a, X)], where

$$H_i(a, X) = \sum_{t=1}^{n} c_{i,t}, \qquad c_{i,t} = \begin{cases} 1 & \text{if } a_t = x_i, \\ 0 & \text{otherwise.} \end{cases}$$
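Definition 4 amounts to counting how often each admissible value occurs. As a minimal sketch, the function below takes the measurements a_1, …, a_n and the ordered value set X and returns the ordered list of counts; the function name and the sample values (element names where past attacks were observed) are assumptions chosen for illustration.

```python
def histogram(measurements, values):
    """Return [H_1, ..., H_T]: the number of measurements equal to each value."""
    counts = {x: 0 for x in values}
    for a in measurements:
        counts[a] += 1  # every measurement is assumed to lie in the value set
    return [counts[x] for x in values]

# Hypothetical value set and observations.
X = ["MessageID", "ReplyTo", "Body"]
A = ["Body", "ReplyTo", "Body"]
print(histogram(A, X))
```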

4.2.1 Histogram construction

Once the source of an attack is identified, we save it in the form of a histogram, which enables us to maintain statistical information about the location of attacks in the SOAP message tree. We can then use this information to detect attacks in the future and avoid unnecessary visits to all nodes of the SOAP message tree. These procedures are reflected in Algorithm 2. The algorithm consists of the following steps:

  • Step 1: Any legitimate receiver of the message complying with our method, as soon as the message arrives, builds the DLS of the SOAP element structure information of the received message and compares it with the attached structure information.

  • Step 2: The differences between these DLS schemas are reflected in the detectAttack step. If an attacker modifies the SOAP message, the modification is detected by observing the change in the numbering relationship of the signed elements.

Theorem 2. The proposed DLS can detect Replay and Redirection attacks on signed message δ = Sign(sk, S).

Proof. According to Definition 2, when some unexpected modification δ' = (Sign(sk, M'))^{-1} occurs in the form of manipulation of the underlying XML elements, the intended predecessor and successor relationship of the SOAP element is lost. This contradicts Definition 3 and proves the theorem. Figure 4 shows this process.

  • Step 3: Once the source of the attack is identified, we save it in the form of a histogram, which enables us to maintain statistical information about the location of the attack in the SOAP message. We can use this information to detect attacks in the future and thus avoid unnecessarily accessing and checking all elements of the SOAP message.

Figure 4.

Labeling scheme of SOAP message structure after XML rewriting attack.
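One possible reading of the three steps is sketched below. It is not a transcription of Algorithm 2: it assumes that the attached structure information and the recomputed one are both given as mappings from signed element names to DLS labels, that the histogram counts past attacks per element name, and that the receiver checks the most frequently attacked elements first and stops at the first mismatch. The function name `detect_attack` and the sample labels are hypothetical.

```python
from collections import Counter

def detect_attack(attached, recomputed, attack_histogram):
    """Return the first signed element whose label changed, or None."""
    # Check elements that were attacked most often in the past first.
    order = sorted(attached, key=lambda name: -attack_histogram[name])
    for name in order:
        if recomputed.get(name) != attached[name]:
            attack_histogram[name] += 1  # remember where the attack occurred
            return name
    return None

# Hypothetical labels: the sender's attached labels and a rewritten message
# in which Body has been moved under a header element.
attached = {"MessageID": "3.1.1.1", "ReplyTo": "3.1.1.2", "Body": "2.1.2"}
rewritten = {"MessageID": "3.1.1.1", "ReplyTo": "3.1.1.2", "Body": "4.1.1.3.1"}

attacks = Counter()
print(detect_attack(attached, rewritten, attacks))  # element whose label changed
print(detect_attack(attached, attached, attacks))   # unmodified message
```

With this ordering, a repeated attack on the same element is found at the first comparison instead of after a scan of all signed elements.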

5 Performance Evaluation

We conducted experiments to evaluate and compare the performance of three methods: the inline method (SOAP Account), the string-based method (FastXPath), and our histogram-based method. In this section, we first describe the experimental settings and then present the experimental results.

5.1 Experimental settings

Experiments were carried out on a 2.4 GHz Pentium processor with 512 MB of RAM running Windows XP Professional. We chose Axis, a SOAP processor engine, for the creation and processing of SOAP messages. For signature creation and verification, we used the Apache XML Security library version 1.6, which implements the security standards for XML. We used Tomcat 6.0, a servlet engine, for the deployment of the Axis servlet. The prototype was developed using the NetBeans 6.9 environment.

5.2 Experiment results

Figure 5 shows the result of the message size comparison. In Figure 5, the x axis represents the number of elements in the SOAP message, and the y axis represents the size of the SOAP message in bytes. The graph shows that with the string-based method, the size of the SOAP message increases significantly as the number of elements grows. This is because the string-based method attaches a significant amount of string data to the SOAP message. The inline method has the smallest size of all methods because of its compact scheme, which barely enlarges the SOAP message. With our method, at some stage, the size of the SOAP message becomes larger than that of the inline method. This is because the DLS always grows in size, whereas the inline method hardly grows at all. However, our method shows a significant improvement over the string-based method because of its number-based inclusion.

Figure 5.

Message size comparison.

Figure 6 shows the result of the processing time comparison. In Figure 6, the x axis represents the number of elements in the SOAP message, and the y axis represents the SOAP message processing time in milliseconds. The graph indicates that the processing time of the string-based method is long. This is due to its large message size: as shown in Figure 5, the string-based method adds string data to the message and thus causes high latency. As the number of elements grows, the processing time of the inline method also becomes high. This is due to the high complexity of the inline scheme, which results in a long calculation time. Our method shows a significant improvement over FastXPath and SOAP Account, owing to the simplicity and dynamicity of the DLS.

Figure 6.

Processing time comparison.

Figure 7 shows the result of the detection time comparison. In Figure 7, the x axis represents the number of elements in the SOAP message, and the y axis represents the SOAP message processing time in milliseconds. The inline method shows the highest detection time. This is because detection is performed against a variety of criteria, and when a SOAP message is large, the number of elements to be checked grows. The string-based method has a similar disadvantage.

Figure 7.

Detection time comparison.

When checking an incoming SOAP message, the inline and string-based methods traverse the SOAP message tree linearly, which results in a long detection time. The graph in Figure 7 shows that the histogram-based method detects XML rewriting attacks faster than the other methods. This is because, once the source of an attack is identified, we save it in the form of a histogram, which enables us to maintain statistical information about the location of the attack in the SOAP message.

6 Conclusions

In this paper, we have proposed an efficient method for detecting XML rewriting attacks on SOAP messages using a histogram. With our method, once the source of an attack is identified, we save it in the form of a histogram, which enables us to maintain statistical information about the location of the attack in the SOAP message. We can use this information to detect attacks in the future and thus avoid unnecessarily checking all elements in the SOAP message.

We conducted experiments to evaluate and compare the performance of three methods: the inline method (SOAP Account), the string-based method (FastXPath), and our histogram-based method. Experiments show that our method outperforms existing methods by several times in many cases.

Acknowledgement

This work was supported by the Information Technology Research and Development program of Ministry of Knowledge and Economy/Korea Evaluation Institute of Industrial Technology (10041854, Development of a smart home service platform with real-time danger prediction and prevention for safety residential environments). This research was also supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2012-003797).
