• Open Access

Efficient authentication and access control of scalable multimedia streams over packet-lossy networks



Securing scalable multimedia streams has become an important issue with the emergence of various scalable multimedia coding standards and their widespread applications. In this paper, we first propose two novel schemes for authenticating scalable multimedia streams over packet-lossy networks. The first scheme uses a digital signature to protect the integrity of a group of frames and uses erasure correction coding to combat packet loss. The second scheme employs a message authentication code to protect the integrity of individual frames; it is completely resilient to packet loss and greatly improves computational efficiency compared with the first scheme. Building on the second authentication scheme, we further present a scheme that provides both authentication and access control to scalable multimedia streams over packet-lossy networks. This third scheme uses symmetric encryption to enforce access control by allowing authorized users to decrypt the substreams corresponding to their privileges, and it uses attribute-based encryption to disseminate secret keys to users. For the first two schemes, we analyze their performance in terms of computation cost, communication overhead, buffer size, and probability of successful authentication, whereas for the third scheme, we demonstrate its application to streams encoded with H.264 scalable video coding. Copyright © 2013 John Wiley & Sons, Ltd.


The widespread deployment of multimedia applications such as video streaming, mobile television, and Internet Protocol Television (IPTV) has been greatly facilitated by the development of handheld devices and ubiquitous networking infrastructure. Today, more than 100 million users have downloaded multimedia files, and over 47 million of them do so on a regular basis [1]. Because user device capabilities and network bandwidth differ widely, scalable multimedia coding standards, such as JPEG-2000 [2] for images and MPEG-4 [3] and H.264 scalable video coding (SVC) [4] for videos, have emerged as appealing solutions to the pervasiveness of multimedia applications. These standards produce scalable multimedia streams made up of a base layer providing the image/video at the coarsest quality and one or more enhancement layers that gradually improve the image/video quality. Hence, instead of sending multiple versions of the same content to cater for different devices or network conditions, a multimedia source prepares a single stream per content over a source–proxy–user network, where one or more proxies perform transcoding (i.e., removing one or more enhancement layers) before delivering the transcoded substreams to end users.

However, as multimedia streams are distributed over open and public networks, they are vulnerable to attacks, including unauthorized data insertion, removal, or modification for commercial or political purposes. Without appropriate measures for content authentication, there is no assurance that a stream received by users was indeed sent by a legitimate source and contains legitimate content.

Another threat to multimedia streams is unauthorized content access. In many commercial multimedia applications, users pay for different privilege levels and are granted access rights to content of the corresponding quality, for example, high resolution versus low resolution. A dishonest user may attempt to access stream content beyond his or her access rights. To deal with this threat, an access control scheme must be in place to block illicit access.

It is nonetheless a challenging problem to realize authentication and access control for scalable multimedia streams. Besides satisfying common performance requirements, such as computation and communication efficiencies, the security scheme has to deal with the scalability demand posed by the heterogeneity of receiving devices and networks. For example, it is highly desirable that no additional operational overhead is imposed on proxies. That is, a security scheme should be proxy transparent such that a proxy does not need to be aware of the underlying security mechanism. Packet loss is another challenge. Modern scalable multimedia coding standards are designed to be resilient to packet loss. The security scheme for multimedia streams should provide the same level of resilience to packet loss, which is especially challenging to achieve because cryptographic algorithms are extremely sensitive to errors.

Most existing authentication schemes are designed for nonscalable streams (e.g., H.222|MPEG-2 [5] streams). A nonscalable stream contains a single layer, and the entire stream must be available for successful decoding. Hence, an authentication scheme designed for nonscalable streams is monolithic and cannot be directly applied to scalable streams, because a legitimate transcoding would be viewed as tampering and the stream would be rejected. Only a few authentication schemes have been proposed for scalable multimedia streams. They are based on hash chaining or a Merkle hash tree, along with a digital signature amortized over a group of packets. However, these techniques either cannot tolerate packet loss or incur high operational overhead at proxies.

Access control for scalable multimedia content typically relies on content encryption where different enhancement layers are encrypted with different keys. Appropriate decryption keys are then distributed to users on the basis of their access rights. However, as we will show in the paper, most of the existing schemes are vulnerable to a user collusion attack, where two or more users subscribing to different privilege levels collude and derive decryption keys for higher privilege levels. Furthermore, these schemes assume the presence of an online key distribution center (KDC) to distribute keys to users, which could be a single point of failure and pose a scalability problem as the number of users increases.

1.1 Our contributions

We first propose two schemes for authenticating scalable multimedia streams over packet-lossy source–proxy–user networks. The first scheme combines the advantageous features of [6] and [7], resulting in an authentication technique that is transparent to proxies and resilient to packet loss. The second scheme further reduces the computation cost incurred at end users by using a message authentication code (MAC), instead of a digital signature, to protect the authenticity of each individual packet. We present a comprehensive analysis of the two schemes in terms of computation cost, communication overhead, buffer size, and probability of successful authentication.

Building on the second authentication scheme, we further present a third scheme, which provides both authentication and collusion-free access control to scalable multimedia streams over packet-lossy networks without employing a KDC. This scheme uses attribute-based encryption (to disseminate keys) and symmetric encryption (to encrypt multimedia content), and it allows users to decrypt the substreams associated with their privileges. We present the application of our scheme to H.264 SVC streams.


2.1 Authentication for multimedia streams

Most of the existing solutions for multimedia authentication are designed for nonscalable multimedia streams. The work in [8] applies a hash chain to the packets of a multimedia stream, where the hash of a packet is attached to its previous/next packet and the last packet in the chain is digitally signed. This method has very low communication overhead and computation cost but is not resilient to packet loss. Subsequent works such as [9-16] focus on devising graph-based authentication methods with digital signatures that achieve robustness to loss with reasonable communication overhead. The schemes in [12] and [17] use hash-based MAC (HMAC) instead of digital signatures to lower the computation cost, but [12] requires time synchronization, whereas [17] imposes a high communication overhead.

Instead of embedding multiple hashes or HMACs within a packet as in the schemes above, the work in [18] proposes a scheme called Signature Amortization using Information Dispersal Algorithm (SAIDA). The scheme computes one hash for each of the n packets in a packet group, generates a digital signature on the hashes, and performs erasure correction coding on the concatenation of these hashes and the signature. The resulting code is then dispersed over the n packets. As long as the number of packets received reaches a threshold value, the hashes and the signature can be recovered for authentication of the received packets. An improvement to SAIDA, called communication overhead reduced-SAIDA (cSAIDA), is given in [19]; it further lowers the communication overhead by performing the erasure coding twice on the authentication data. We employ a technique similar to cSAIDA in our first authentication scheme. In addition, readers are referred to [6] for an excellent survey on authentication for nonscalable multimedia streams.

For authentication of scalable multimedia streams, the work in [20] employs a hash-chaining method, but it does not consider packet loss, which is vital for online transmission of streams. In [7], three schemes are designed for MPEG-4 video streams over lossy networks; two of them, the flat authentication scheme (FAS) and the progressive authentication scheme (PAS), use exclusive OR (i.e., exclusive disjunction, denoted as XOR from here onwards) and concatenation, respectively, to generate the packet hash. Although they are designed with MPEG-4 streams in mind, they can be applied to general scalable streams. However, these schemes are not proxy transparent. More recently, an authentication scheme for H.264 SVC streams was presented in [21]; it supports all scalabilities provided by H.264 SVC but comes with a high communication overhead.

In [6], a hash-chaining scheme for authenticating scalable multimedia streams is presented. It models a scalable stream as a sequence of frames, each with a base layer and multiple enhancement layers. The stream is divided into groups of n frames. Each enhancement layer of a frame is hashed, and its hash is attached to its predecessor layer in the same frame. Then, the hash of each frame (i.e., the hash of its base layer) is appended to the previous frame, and the first frame of the group is digitally signed. The collision resistance of the hash function ensures that the signature authenticates the entire group of frames. Note that a substream with a base layer and zero or more enhancement layers can be verified using only the authentication data carried in that substream. This property makes the scheme proxy transparent: to adapt a multimedia stream, a proxy simply discards certain enhancement layers from the top and delivers the remaining layers to users. However, the scheme does not tolerate frame loss; once a frame is lost, all subsequent frames in the group become unverifiable and must be rejected. Furthermore, the expensive digital signature verification adds computation cost at the users, which is especially burdensome for portable mobile devices.
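The hash-chaining idea of [6] as summarized above can be sketched in Python as follows. This is our reading, not the code of [6]: it assumes SHA-1 (20-byte digests, matching Table 1), that each enhancement layer j < m carries the hash of layer j + 1 as a 20-byte suffix, that the base layer of each frame carries the hash of the next frame's base layer as a suffix, and that every frame has at least one enhancement layer. The digital signature on the first frame is replaced by a trusted `signed_digest`, and `build_group` and `verify_group` are hypothetical names.

```python
import hashlib

D = 20  # SHA-1 digest size in bytes (assumed, to match Table 1)


def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def build_group(frames):
    """frames: list of [base, el_1, ..., el_m] byte strings (m >= 1).
    Returns (authenticated frames, digest of the first frame to be signed)."""
    out = [None] * len(frames)
    next_frame_hash = b""                      # the last frame has no successor
    for i in range(len(frames) - 1, -1, -1):
        layers = frames[i]
        new = [None] * len(layers)
        upper = b""                            # the top layer carries no hash
        for j in range(len(layers) - 1, 0, -1):
            new[j] = layers[j] + upper         # attach hash of layer j + 1
            upper = h(new[j])
        new[0] = layers[0] + upper + next_frame_hash
        next_frame_hash = h(new[0])            # chained into the previous frame
        out[i] = new
    return out, next_frame_hash


def verify_group(rx, m, signed_digest):
    """rx: the n frames of one group in order, with None for a lost frame.
    Returns how many leading frames verify; the chain stops at the first loss."""
    expected = signed_digest                   # stands in for Vfy on frame 1
    for i, layers in enumerate(rx):
        if layers is None or h(layers[0]) != expected:
            return i                           # later frames are unverifiable
        base = layers[0]
        if i < len(rx) - 1:
            expected, base = base[-D:], base[:-D]   # hash of frame i + 1
        upper = base[-D:]                      # hash of enhancement layer 1
        for j in range(1, len(layers)):
            if h(layers[j]) != upper:
                return i                       # tampered enhancement layer
            if j < m:
                upper = layers[j][-D:]         # hash of layer j + 1
    return len(rx)


frames = [[b"base%d" % i, b"el1-%d" % i, b"el2-%d" % i] for i in range(4)]
auth, root = build_group(frames)
print(verify_group([f[:2] for f in auth], 2, root))   # proxy dropped 1 layer
```

Truncating every frame leaves the group verifiable, while replacing any frame by `None` stops verification at that frame, which illustrates both the proxy transparency and the loss intolerance described above.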

2.2 Access control for multimedia streams

Access control involves authorizing legitimate users with appropriate privileges to access a certain resource while denying access to illegitimate users [22, 23]. Solutions for authorization fall into two categories: access control models and cryptographic techniques. An access control model mediates access to resources by checking access control rules established in conformance with a security policy. A cryptographic method for access control manages authorization by encrypting information items such that only authorized users have the right keys to decrypt the encrypted data. A number of access control solutions (e.g., [24-27]) based on cryptographic techniques have been proposed. All of them assume that information items as well as users are classified into some type of hierarchy and that there is a relationship between the encryption key assigned to a node and those assigned to its children. They differ mostly in the cryptographic techniques employed for key generation. Most of them employ complex and computationally expensive cryptographic operations, such as RSA or large-integer modular exponentiation. Employing these schemes for access control to multimedia streams is not feasible because user devices in multimedia applications (such as personal digital assistants and cell phones) can be very resource constrained.

A flexible access control scheme for scalable JPEG2000 image streams is presented in [28]. In this scheme, a rooted hash tree based on the inherent structure of the underlying JPEG2000 image stream is constructed. Node values in the tree are used as keys to encrypt packets in the image stream in such a manner that only users with the right privileges can decrypt the encrypted packets corresponding to the granted image. The access control scheme is efficient and is fully compatible with the scalability properties of JPEG2000. However, it assumes the existence of an online KDC and is fragile to packet loss.

In [29], an access control scheme for H.264 SVC-encoded streams is proposed. Similar to the work in [6] for authentication, [29] treats an H.264 SVC stream in two dimensions: the spatial-quality scalability dimension, which models the spatial and quality layers of each frame, and the temporal scalability dimension, which models different frames. We use the term unit to denote the portion of video data for a particular temporal and spatial-quality layer. Each unit is encrypted using an encryption key generated in a hierarchical manner. However, the scheme is vulnerable to the user collusion attack, and it assumes the presence of an online KDC for key distribution. We discuss this scheme in detail in Section 6.2.

Access control schemes for streams encoded with the MPEG-4 fine granularity scalability (FGS) standard are proposed in [30] and [31]. They assume that a single encrypted MPEG-4 FGS stream caters for two different applications: one requiring scalability in terms of Peak Signal-to-Noise Ratio (PSNR) and the other in terms of bit rate. To this end, the video data of an MPEG-4 FGS quality enhancement layer frame are partitioned into a group of segments. Each segment simultaneously belongs to a PSNR layer and a bit rate layer that are correlated with each other; for example, a low PSNR layer is likely to share data with a low (rather than a high) bit rate layer. The security requirement in [30] and [31] is to ensure that the right to access a layer of one scalability type does not make the layers of the other scalability type accessible, and vice versa. In [30], independent keys are generated for encrypting each segment. This scheme is inefficient primarily because of the large number of encryption keys to be managed by the KDC and users. In [31], the authors exploited a one-way hash function and a property of the Diffie–Hellman problem to reduce the number of encryption keys. However, the scheme is vulnerable to the collusion attack and requires an online KDC to distribute encryption keys to users.

We note that selective encryption of multimedia content is another widely studied technique. Selective encryption exploits the fact that a multimedia stream may contain more important portions that can be selectively encrypted to provide a sufficient level of control. By selectively encrypting the base layer while leaving the enhancement layers in the clear, the source can prevent unauthorized access to the complete high-quality multimedia stream. On the other hand, by selectively encrypting the enhancement layers while leaving the base layer in the clear, the source can allow a low-quality preview of the multimedia stream for marketing purposes. Thus, in addition to reducing computation cost compared with full encryption, selective encryption can also provide additional functionalities in different applications. However, selective encryption is insufficient to protect the confidentiality of the multimedia stream, as the unencrypted portion may leak significant context information, as shown in [32]. For a comprehensive survey on selective encryption schemes for video, readers are referred to [33].


3.1 Erasure correction code

Let k and n be two positive integers satisfying k < n. An (n, k) erasure correction code (ECC) consists of an encoder module and a decoder module. The encoder module accepts a k-tuple of information symbols $X_k = (x_1, x_2, \ldots, x_k)$ and outputs an n-tuple codeword $Y_n = (y_1, y_2, \ldots, y_n)$, where all $x_i$ and $y_j$ have the same bit length, $1 \le i \le k$ and $1 \le j \le n$. In an (n, k) systematic ECC, the codeword is $Y_n = (x_1, x_2, \ldots, x_k, y_{k+1}, y_{k+2}, \ldots, y_n)$, where $y_{k+1}, \ldots, y_n$ are called parity check symbols [34]. In this paper, we assume that the (n, k) ECC is maximum distance separable, that is, its minimum Hamming distance is n − k + 1. Thus, given any k or more symbols of $Y_n$, the decoder can output the original information $X_k$ [34].
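To make the maximum distance separable property concrete, the sketch below implements a systematic (n, k) erasure code in the Reed–Solomon style over the prime field GF(2^31 − 1), using Lagrange interpolation. This is an illustrative choice, not necessarily the ECC used in [34] or in our evaluation: each symbol is a single field element, information symbol i is the value of a polynomial of degree less than k at point i, and the parity symbols are its values at points k + 1, …, n. Because such a polynomial is determined by any k of its values, any k received symbols suffice to decode. `pow(den, -1, P)` requires Python 3.8 or later.

```python
P = 2**31 - 1  # prime modulus; symbols are integers in [0, P)


def interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the (xi, yi) pairs in points, over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def ecc_encode(info, n):
    """Systematic (n, k) encoding: the codeword starts with the k info symbols."""
    k = len(info)
    base = list(zip(range(1, k + 1), info))      # info symbol i sits at point i
    return list(info) + [interpolate(base, x) for x in range(k + 1, n + 1)]


def ecc_decode(received, k):
    """received maps 0-based codeword positions to symbols; any k suffice."""
    if len(received) < k:
        raise ValueError("fewer than k symbols received")
    pts = [(pos + 1, sym) for pos, sym in list(received.items())[:k]]
    return [interpolate(pts, x) for x in range(1, k + 1)]


info = [5, 17, 3, 42]                              # k = 4
codeword = ecc_encode(info, 7)                     # n = 7: three parity symbols
survivors = {i: codeword[i] for i in (1, 3, 5, 6)}   # positions 0, 2, 4 erased
assert ecc_decode(survivors, 4) == info
```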

Our first authentication scheme makes use of a double-ECC coding scheme (DECS), which is similar to that used in [19], to achieve a high probability of successful authentication with low communication overhead. A DECS scheme consists of an (n, k) systematic ECC scheme and a (2n − k, n) systematic ECC scheme, denoted by $\mathrm{ECC}_{n,k}$ and $\mathrm{ECC}_{2n-k,n}$, respectively. The encoding and decoding functions of the DECS scheme are described in the following.

Encoding function DECS-EN($X_n$). This function takes as input an n-tuple $X_n = (x_1, x_2, \ldots, x_n)$, where each $x_i$ is l bits long, and outputs $W_n = (x_1 \| y_1, x_2 \| y_2, \ldots, x_n \| y_n)$, where $\|$ denotes the concatenation of two symbols. The function proceeds in four steps:

  1. Compute a (2n − k)-tuple $(x_1, x_2, \ldots, x_n, c_1, c_2, \ldots, c_{n-k}) \leftarrow \mathrm{ECC}_{2n-k,n}(X_n)$.
  2. Divide $c_1 \| c_2 \| \cdots \| c_{n-k}$ into k symbols of equal length, denoted by $(d_1, d_2, \ldots, d_k)$.
  3. Compute an n-tuple $(y_1, y_2, \ldots, y_n) \leftarrow \mathrm{ECC}_{n,k}(d_1, d_2, \ldots, d_k)$.
  4. Output $W_n = (x_1 \| y_1, x_2 \| y_2, \ldots, x_n \| y_n)$.

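Decoding function DECS-DE($Y_q$). Suppose that $Y_q = (x_{i_1} \| y_{i_1}, \ldots, x_{i_q} \| y_{i_q})$ is a subset of $W_n$, where $k \le q \le n$ and $1 \le i_1 < \cdots < i_q \le n$. The decoding function takes $Y_q$ as input and outputs $X_n$ with the following steps:

  1. Use $\mathrm{ECC}_{n,k}$ to decode $(y_{i_1}, \ldots, y_{i_q})$ and obtain $(d_1, d_2, \ldots, d_k)$, which is possible because q ≥ k.
  2. Divide $d_1 \| d_2 \| \cdots \| d_k$ into n − k symbols of equal length, namely $(c_1, c_2, \ldots, c_{n-k})$.
  3. Use $\mathrm{ECC}_{2n-k,n}$ to decode $(x_{i_1}, \ldots, x_{i_q})$ together with $(c_1, c_2, \ldots, c_{n-k})$ and obtain $X_n = (x_1, x_2, \ldots, x_n)$, which is possible because q + n − k ≥ n.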

The key benefit of DECS is that, as long as at least k symbols of $W_n$ are received, DECS decoding recovers all of $(x_1, x_2, \ldots, x_n)$.
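A compact sketch of DECS-EN and DECS-DE follows. It reuses an illustrative Reed–Solomon-style code over GF(2^31 − 1), now applied coordinate-wise so that each symbol is a vector of field elements. Representing the l-bit symbols as vectors of field elements (rather than raw bit strings) and requiring (n − k)·l to be divisible by k are simplifying assumptions of this sketch, and the function names are hypothetical.

```python
P = 2**31 - 1  # prime modulus; each symbol is a list of elements of GF(P)


def _interp(points, x):
    """Lagrange evaluation at x of the polynomial through points over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def _ecc_encode(info, n):
    """Systematic (n, k) code applied coordinate-wise to vector symbols."""
    k, width = len(info), len(info[0])
    parity = [[_interp([(i + 1, info[i][c]) for i in range(k)], x)
               for c in range(width)] for x in range(k + 1, n + 1)]
    return [list(s) for s in info] + parity


def _ecc_decode(received, k):
    """received: {0-based position: vector symbol}; uses any k entries."""
    if len(received) < k:
        raise ValueError("fewer than k symbols received")
    items = list(received.items())[:k]
    width = len(items[0][1])
    return [[_interp([(pos + 1, s[c]) for pos, s in items], x)
             for c in range(width)] for x in range(1, k + 1)]


def _split(flat, parts):
    size = len(flat) // parts
    if size * parts != len(flat):
        raise ValueError("symbols cannot be divided evenly")
    return [flat[i * size:(i + 1) * size] for i in range(parts)]


def decs_encode(xs, k):
    n = len(xs)
    c = _ecc_encode(xs, 2 * n - k)[n:]                 # step 1: c_1..c_{n-k}
    d = _split([e for s in c for e in s], k)          # step 2: d_1..d_k
    ys = _ecc_encode(d, n)                            # step 3: y_1..y_n
    return [(list(x), y) for x, y in zip(xs, ys)]     # step 4: x_i || y_i


def decs_decode(received, n, k):
    """received: {0-based index i: (x_i, y_i)} for q >= k surviving packets."""
    d = _ecc_decode({i: y for i, (_, y) in received.items()}, k)   # step 1
    c = _split([e for s in d for e in s], n - k)                   # step 2
    pts = {i: x for i, (x, _) in received.items()}
    pts.update({n + j: cj for j, cj in enumerate(c)})              # parity slots
    return _ecc_decode(pts, n)                                     # step 3


xs = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]   # n = 6, l = 2
w = decs_encode(xs, 4)                                     # k = 4
rx = {i: w[i] for i in (0, 2, 3, 5)}                       # only q = k arrive
assert decs_decode(rx, 6, 4) == xs
```

The x part of each codeword symbol is sent unchanged, so the only per-packet overhead is the short y part; here each y is a single field element, while each x has two.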

3.2 Attribute-based encryption

Another building block used in this paper is the ciphertext policy attribute-based encryption scheme (CP-ABE) proposed in [35]. In CP-ABE, every user's secret key is associated with a set of attributes, whereas every ciphertext is associated with a ciphertext policy, that is, an access structure on attributes. A user successfully deciphers a ciphertext on the condition that his or her key attributes satisfy the access structure specified in the ciphertext.

Specifically, an access structure is represented by an access tree math formula with a root denoted by R and n leaf nodes corresponding to n attributes, respectively. Each inner node x with $c_x$ child nodes is assigned a threshold value $t_x$ satisfying $0 < t_x \le c_x$. For all leaf nodes x, $t_x = 1$. Given a set of attributes math formula, math formula is evaluated from the leaves upward to R. A leaf is evaluated as 1, that is, true, if and only if the corresponding attribute is contained in math formula. Each inner node x is then evaluated as 1 if and only if at least $t_x$ of its child nodes are evaluated as 1. If R is evaluated as 1, we say that math formula, namely, math formula matches access structure math formula. An example access tree is depicted in Figure 1.

Figure 1.

An access tree for the access structure a1 ∨ a2 ∨ (a3 ∧ a4), where a1, a2, a3, a4 are four attributes.
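The threshold-gate evaluation described above can be sketched in Python as follows. It covers only the policy-matching logic; the CP-ABE cryptography of [35] is not modelled, and the `Node` representation and the exact shape of the tree for Figure 1 (a 1-of-3 root whose third child is a 2-of-2 gate) are our assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    threshold: int = 1                 # t_x; leaves always use 1
    attribute: Optional[str] = None    # set only for leaf nodes
    children: List["Node"] = field(default_factory=list)


def leaf(attr: str) -> Node:
    return Node(attribute=attr)


def satisfies(node: Node, attrs: Set[str]) -> bool:
    """Evaluate the access tree bottom-up against an attribute set."""
    if node.attribute is not None:     # leaf: true iff its attribute is held
        return node.attribute in attrs
    true_children = sum(satisfies(c, attrs) for c in node.children)
    return true_children >= node.threshold   # inner node: t_x-of-c_x gate


# One tree for a1 OR a2 OR (a3 AND a4): OR is a 1-of-n gate, AND an n-of-n gate.
tree = Node(threshold=1, children=[
    leaf("a1"), leaf("a2"),
    Node(threshold=2, children=[leaf("a3"), leaf("a4")]),
])

print(satisfies(tree, {"a2"}))          # True
print(satisfies(tree, {"a3"}))          # False
print(satisfies(tree, {"a3", "a4"}))    # True
```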

The CP-ABE scheme in [35] consists of the following four algorithms:

  • AB-Setup is an initialization algorithm run by a trusted authority. It takes as input a security parameter and outputs a public key PK and a master secret key MK.
  • AB-KeyGen(MK, math formula) is run by the authority to issue a key for a given attribute set math formula. It takes as input MK and math formula and outputs a key math formula associated with math formula.
  • AB-Encrypt(PK, m, math formula) is run by a user to perform a CP-ABE encryption on message m with an access structure math formula. Taking as input PK, m, and math formula, it outputs a ciphertext math formula.
  • AB-Decrypt(math formula, math formula, math formula, math formula) is run by a user holding math formula to decrypt a ciphertext math formula with a ciphertext policy math formula. If math formula, it outputs the corresponding plaintext m correctly. Otherwise, it outputs ⊥.


We consider a multimedia content distribution setting that consists of a content source, a set of proxies, and a group of heterogeneous end users. The source produces protected multimedia streams and forwards them to proxies. The proxies are responsible for stream transcoding (i.e., dropping certain enhancement layers) according to users' preferences or network conditions and for delivering the transcoded streams, called substreams, to the end users. When a user receives a transcoded stream from a proxy, he or she decrypts the encrypted content and/or verifies the authenticity of the received content. The proxy–user network may be subject to packet loss because of transmission errors or traffic congestion. Our goal is to allow all heterogeneous users to successfully receive the protected streams despite proxy transcoding and packet loss.

As in [6, 7], we model a scalable multimedia stream as a sequence of video frames. Each frame has a base layer and m cumulative enhancement layers, where m ≥ 0. Without loss of generality, we assume that each network packet carries a complete frame. Hence, in the rest of the paper, we use the terms frame and packet interchangeably. To keep the description compact, we illustrate our schemes by considering one-dimensional scalability, but they can be easily extended to support multidimensional scalability.


Both schemes consist of three algorithms: the authentication algorithm used by a source to generate and insert authentication data; the transcoding algorithm used by a proxy to perform transcoding; and the verification algorithm used by an end user to verify received packets. In the sequel, we use math formula to denote a collision-resistant hash function.

5.1 Signature-based authentication scheme

A multimedia stream is divided into groups of n packets. Because packet groups are processed independently, we focus on the processing of one packet group G = [P1,P2, …,Pn], where Pi denotes the ith packet (or frame) in the group. For each packet Pi, we use Li,0 to denote its base layer and Li,j its jth enhancement layer, j = 1, 2, …, m. Therefore, Pi = Li,0 ∥ Li,1 ∥ ⋯ ∥ Li,j ∥ ⋯ ∥ Li,m.

During system initialization, the source chooses a digital signature scheme Σ = (Sig(), Vfy()) with a secret key sk and a public key pk. The source's signature on a message M is computed as σ = Sigsk(M). Given signature σ and message M, anyone can verify the authenticity of M by checking whether Vfypk(σ, M) returns 1. The source's public key pk is distributed to all entities in the system in an authenticated manner. The proposed scheme consists of the following three algorithms.

Authentication algorithm. Suppose that the source generates a multimedia stream with an identifier Sid. Taking as input a packet group G = [P1,P2, …,Pn] with an identifier Gid, the source performs the following steps to output an authenticated packet group G′.

  • Step A1. For each packet $P_i$, $1 \le i \le n$, compute math formula as the digest of the top enhancement layer, and compute math formula for layer $L_{i,j}$, $0 \le j \le m-1$.
  • Step A2. Generate a codeword $(h_{1,0} \| y_1, \ldots, h_{n,0} \| y_n) = \text{DECS-EN}(h_{1,0}, h_{2,0}, \ldots, h_{n,0})$.
  • Step A3. Compute the packet group hash math formula and $\sigma = \mathrm{Sig}_{sk}(H)$.
  • Step A4. Divide the binary string of σ into k equal-length segments $(\sigma_1, \sigma_2, \ldots, \sigma_k)$. Generate the codeword $(s_1, s_2, \ldots, s_n)$ by applying the $\mathrm{ECC}_{n,k}$ encoding function to $(\sigma_1, \sigma_2, \ldots, \sigma_k)$.
  • Step A5. Output $G' = [P'_1, P'_2, \ldots, P'_n]$ as the authenticated packet group, where $P'_1, \ldots, P'_n$ are given by
display math(1)
display math(2)
display math(3)
display math(4)

Figure 2 depicts the procedure for the source to generate packet hashes. In the end, the source transmits G′ to the proxies.

Figure 2.

Packet hash generation for the proposed signature-based authentication scheme for a scalable multimedia stream.

Transcoding algorithm. Upon receiving $G' = [P'_1, \ldots, P'_n]$ from the source, the proxy adapts it to the downstream network bandwidth or the capabilities of user devices by removing certain enhancement layers from the top. To remove the top t enhancement layers, $1 \le t \le m$, the proxy simply truncates every packet $P'_i$ in $G'$ into a new packet $P''_i = L_{i,0} \| \cdots \| L_{i,m-t}$. Then, the proxy forwards $G'' = [P''_1, \ldots, P''_n]$ to end users.

Verification algorithm. Because of packet loss, a user may receive only q of the n packets in $G''$ sent by the proxy, denoted as math formula, where $1 \le i_1 < i_2 < \cdots < i_q \le n$. If q < k, the user rejects math formula because it does not contain sufficient information for verification. Otherwise (k ≤ q ≤ n), he or she runs the following steps to verify its integrity.

  • Step V1. For each packet math formula, parse $P_j$ into m − t + 1 layers, that is, $P_j = L_{j,0} \| L_{j,1} \| \cdots \| L_{j,m-t}$. In addition, parse the base layer $L_{j,0}$ such that math formula, and parse each enhancement layer $L_{j,l}$ such that math formula for $l \in [1, m-t]$. Namely, the user recovers math formula for each packet $P_j$.
  • Step V2. For each packet math formula, compute the hash values from the (m − t)th enhancement layer down to the base layer. Namely, compute math formula, $0 \le l \le m-t-1$, and math formula.
  • Step V3. For all packets math formula, use the hashes of their base layers and the y values to recover the base-layer hashes of all n packets in the group, math formula. Namely, compute math formula, which is possible because q ≥ k.
  • Step V4. Compute the packet group hash math formula.
  • Step V5. Recover the signature by computing math formula, which is possible because q ≥ k. Set $\sigma = \sigma_1 \| \sigma_2 \| \cdots \| \sigma_k$.
  • Step V6. Verify the signature by checking whether $\mathrm{Vfy}_{pk}(\sigma, H)$ outputs 1. If so, accept math formula; otherwise, reject it.□
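The hash-related parts of Steps A1, A3, V2, and V4, together with the proxy's truncation, can be sketched as follows. Because the corresponding formulas are not reproduced above, the sketch assumes $h_{i,m} = h(L_{i,m})$, $h_{i,j} = h(h_{i,j+1} \| L_{i,j})$ for j < m, that each transmitted layer j < m is prefixed with $h_{i,j+1}$, and that H is the hash of $h_{1,0} \| \cdots \| h_{n,0} \| \mathrm{Gid} \| \mathrm{Sid}$ with 2-byte identifiers (so that |H| = 20n + 4, as in Table 1). SHA-1 is used only because it yields 20-byte digests. The DECS data $y_i$, the signature shares $s_i$, and the signature itself are omitted, and all function names are hypothetical.

```python
import hashlib

D = 20  # digest size in bytes, as in Table 1


def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def layer_hashes(layers):
    """Step A1 (assumed form): h_m = h(L_m), h_j = h(h_{j+1} || L_j)."""
    m = len(layers) - 1
    hs = [b""] * (m + 1)
    hs[m] = h(layers[m])
    for j in range(m - 1, -1, -1):
        hs[j] = h(hs[j + 1] + layers[j])
    return hs


def embed(layers):
    """Prefix every layer j < m with h_{j+1}; y_i and s_i are omitted here."""
    hs = layer_hashes(layers)
    return [hs[j + 1] + layers[j] for j in range(len(layers) - 1)] + [layers[-1]]


def transcode(packet, t):
    """Proxy: drop the top t enhancement layers and nothing else."""
    return packet[:len(packet) - t]


def recover_h0(packet):
    """Step V2: hash the top received layer as-is (it still carries the hash
    of the layer removed above it), then recompute downwards to h_0."""
    hj = h(packet[-1])
    for layer in reversed(packet[:-1]):
        hj = h(hj + layer[D:])          # strip the embedded hash, keep payload
    return hj


def group_hash(h0s, gid: bytes, sid: bytes) -> bytes:
    """Steps A3/V4 (assumed form): H = h(h_{1,0} || ... || h_{n,0} || Gid || Sid)."""
    return h(b"".join(h0s) + gid + sid)


packets = [[b"BL%d" % i] + [b"EL%d-%d" % (j, i) for j in range(1, 5)]
           for i in range(3)]           # n = 3 packets, m = 4
gid, sid = b"\x00\x07", b"\x00\x01"
H_source = group_hash([layer_hashes(p)[0] for p in packets], gid, sid)
sent = [embed(p) for p in packets]
received = [transcode(p, 2) for p in sent]   # t = 2, as in Section 5.4
assert group_hash([recover_h0(p) for p in received], gid, sid) == H_source
```

The user obtains the same H whatever t the proxy chooses, which is why a single signature on H can serve every substream.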

5.2 HMAC-based authentication scheme

In the signature-based scheme, it is demanding for a low-end user device to buffer a group of n packets and verify a digital signature without disrupting the rendering of the stream. We next propose a lightweight authentication scheme employing HMAC. Because computing an HMAC takes time comparable to computing a hash, the scheme significantly reduces the computation cost at end users. A prerequisite of this scheme is that the source shares a secret key kMAC with all users. We discuss this key management issue in the next section.

The HMAC-based scheme also consists of an authentication algorithm for the source, a transcoding algorithm for the proxies, and a verification algorithm for end users. The transcoding algorithm is identical to that in the signature-based scheme and hence is omitted in the following.

Authentication algorithm. Suppose that the source generates a multimedia stream with an identifier Sid. Authentication is performed at the packet level. Given a packet $P = L_0 \| \cdots \| L_m$, the source performs the following steps to output an authenticated packet $P' = L'_0 \| \cdots \| L'_m$.

  • Step A1. Compute math formula as the digest of the top enhancement layer, and compute math formula for all $0 < j \le m-1$.
  • Step A2. Compute $h_0 = \mathrm{HMAC}(h_1 \| L_0 \| \mathrm{Sid}, k_{\mathrm{MAC}})$.
  • Step A3. Set $L'_m = L_m$, $L'_i = h_{i+1} \| L_i$ for $1 \le i \le m-1$, and $L'_0 = h_1 \| L_0 \| h_0$.
  • Step A4. Output $P' = L'_0 \| L'_1 \| \cdots \| L'_m$.□

Verification algorithm. A user verifies every packet P′ in a received stream individually using the HMAC key $k_{\mathrm{MAC}}$. Suppose that, at transcoding, the proxy removes t of the m enhancement layers from every packet. Let math formula be a received packet.

  • Step V1. For all $0 < i \le m-t$, parse math formula into math formula, and parse math formula into math formula.
  • Step V2. Set math formula, and compute math formula for all $0 < i \le m-t$.
  • Step V3. Compute $h_0 = \mathrm{HMAC}(h_1 \| L_0 \| \mathrm{Sid}, k_{\mathrm{MAC}})$.
  • Step V4. If math formula, math formula is accepted and $P = L_0 \| \cdots \| L_{m-t}$ is output; otherwise, math formula is dropped.□
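A minimal Python sketch of the HMAC-based authentication and verification algorithms follows, using the standard `hmac` and `hashlib` modules. HMAC-SHA1 and SHA-1 are assumed only because they produce the 20-byte outputs listed in Table 1. The chain form $h_j = h(h_{j+1} \| L_j)$ for the formulas not reproduced in Steps A1 and V2, the requirement m ≥ 1, and the function names `authenticate` and `verify` are our assumptions.

```python
import hashlib
import hmac

D = 20  # digest/MAC size in bytes (Table 1)


def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha1).digest()


def authenticate(layers, sid, key):
    """Source, Steps A1-A4: layers = [L_0, ..., L_m] with m >= 1."""
    m = len(layers) - 1
    hs = [b""] * (m + 1)
    hs[m] = h(layers[m])                          # A1: top layer digest
    for j in range(m - 1, 0, -1):
        hs[j] = h(hs[j + 1] + layers[j])          # A1: h_j = h(h_{j+1} || L_j)
    h0 = mac(key, hs[1] + layers[0] + sid)        # A2
    out = [hs[1] + layers[0] + h0]                # A3: L'_0 = h_1 || L_0 || h_0
    out += [hs[j + 1] + layers[j] for j in range(1, m)]   # L'_j = h_{j+1} || L_j
    out.append(layers[m])                         # L'_m = L_m
    return out


def verify(packet, sid, key, m):
    """User, Steps V1-V4 on [L'_0, ..., L'_{m-t}]; returns payloads or None."""
    base = packet[0]
    h1_embedded, L0, h0_received = base[:D], base[D:-D], base[-D:]
    if len(packet) == 1:
        h1 = h1_embedded                          # every enhancement layer removed
    else:
        hj = h(packet[-1])                        # top layer still holds h_{m-t+1}
        for layer in reversed(packet[1:-1]):
            hj = h(hj + layer[D:])                # h_j = h(h_{j+1} || L_j)
        h1 = hj
    if not hmac.compare_digest(mac(key, h1 + L0 + sid), h0_received):
        return None                               # V4: drop the packet
    rest = [layer[D:] if j < m else layer
            for j, layer in enumerate(packet[1:], start=1)]
    return [L0] + rest


key, sid = b"k" * 16, b"\x00\x01"
layers = [b"base", b"el1", b"el2", b"el3", b"el4"]   # m = 4
pkt = authenticate(layers, sid, key)
print(verify(pkt[:3], sid, key, 4))   # proxy removed t = 2 layers
```

Each packet is verified on its own, so a lost packet never affects the verifiability of any other packet.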

5.3 Security and discussions

In the signature-based authentication scheme, the source's signature is computed over the concatenation of the hashes of all n packets, where each hash accumulates the hashes of all enhancement layers residing in the same packet. As long as the hash function and the signature scheme are secure, the resulting authentication scheme is secure in the sense that any malicious modification of the packets will be detected by the verification algorithm. The DECS coding scheme has no impact on security; it simply protects all the hashes so that the verifier can still recover them and therefore verify the signature in spite of packet loss. Similarly, the HMAC-based authentication scheme is secure as long as the underlying HMAC algorithm is secure. The security of both schemes can be proved formally by reduction [36], that is, by showing that breaking the authentication schemes implies breaking the hash function, the signature scheme, or the HMAC algorithm.

The signature-based authentication scheme requires that the public key of the source be distributed to end users in a public but authenticated manner, for example, by embedding the public key in the client software. The HMAC-based authentication scheme uses secret keys for HMAC computation, which normally requires the existence of a KDC. Another advantage of the signature-based scheme is that it provides sender nonrepudiation, but at the expense of more computational overhead due to the use of signatures, as shown in Section 5.4.

5.4 Performance and discussions

With reference to our source–proxy–user framework, we define the following metrics to evaluate the performance of an authentication scheme designed for scalable multimedia streams:

  • Computation time. The amount of time needed by the source as well as the proxy to generate authentication information for a group of n packets.
  • Verification time. The amount of time needed by a user to verify a group of q packets, k ≤ q ≤ n.
  • Per-packet authentication information. The amount of authentication information contained in a packet.
  • Buffer size. The buffer space needed at the source and user to verify and process the packets.
  • Authentication probability. The percentage of packets that are received and verifiable.
  • Proxy transparency. Whether proxies need to be aware of the authentication mechanism.

We compare our authentication schemes, namely the signature-based and HMAC-based schemes proposed in Sections 5.1 and 5.2, respectively, with the FAS and PAS in [7] and the hash-chaining scheme in [6] because they all apply to generic scalable multimedia streams. We briefly describe FAS and PAS in the following discussion, whereas the hash-chaining scheme is described in Section 2.1.

Both FAS and PAS perform authentication on a group G of n packets, each with a base layer and m enhancement layers. For each packet Pi, the hash of layer j, hi,j, is computed as math formula, where Li,j denotes the jth layer of Pi. In FAS, the packet hash math formula is the hash of the XOR of all layer hashes, whereas in PAS, the packet hash is generated as math formula. For both schemes, the packet group hash is computed as math formula, where Gid and Sid are the group and stream identifiers, respectively. H is signed, and the n hashes along with the signature are encoded using the technique in [19], as in our signature-based scheme.
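To see why FAS and PAS are not proxy transparent, consider the packet-hash step. The sketch below assumes SHA-1 layer hashes, computes the FAS packet hash as the hash of the XOR of the layer hashes, and assumes that the PAS packet hash is the hash of the layer hashes concatenated in layer order (its exact formula is not reproduced above). In either case, a user holding only the remaining layers cannot recompute the packet hash unless the hashes of the removed layers are delivered as well, which is extra work for the proxy.

```python
import hashlib
from functools import reduce


def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def fas_packet_hash(layer_hashes):
    """FAS: hash of the XOR of all layer hashes."""
    return h(reduce(xor, layer_hashes))


def pas_packet_hash(layer_hashes):
    """PAS (assumed form): hash of the concatenated layer hashes."""
    return h(b"".join(layer_hashes))


layers = [b"base", b"el1", b"el2", b"el3", b"el4"]   # m = 4
hashes = [h(layer) for layer in layers]
kept = hashes[:3]                                      # proxy dropped t = 2 layers
dropped = hashes[3:]                                   # must travel with the packet

for packet_hash in (fas_packet_hash, pas_packet_hash):
    assert packet_hash(kept) != packet_hash(hashes)          # kept layers alone fail
    assert packet_hash(kept + dropped) == packet_hash(hashes)
```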

Our comparison focuses on a group of n packets, and the parameters used in our evaluation are shown in Table 1. These parameters are chosen as follows. Taking a practical example of an encoded scalable video stream from [37], we consider the quality-scalable QCIF (Quarter Common Intermediate Format) sequence Mobile at 15 frames per second, with a base layer of 64 kbps and m = 4 enhancement layers. Consecutive enhancement layers increase the bit rate to 80, 96, 112, and 128 kbps, respectively. We let |BL|, |EL|, and |L| denote the size of the base layer, the size of an enhancement layer, and the average size of a scalable layer component in a frame, respectively (in this case, |BL| = 533, |EL| = 133, and |L| = 213 Bytes, averaged over 15 frames).
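
The layer sizes quoted above follow directly from the bit rates. The short sketch below redoes that arithmetic, assuming 1 kbps = 1,000 bit/s and that each enhancement layer contributes the 16-kbps increment between consecutive cumulative rates; the variable names are illustrative only.

```python
# Sketch: per-frame layer sizes implied by the quoted bit rates.
FPS = 15                               # frames per second of the Mobile sequence
BASE_KBPS = 64                         # base-layer bit rate
CUMULATIVE_KBPS = [80, 96, 112, 128]   # total bit rate after adding each enhancement layer

def bytes_per_frame(kbps: float, fps: int = FPS) -> float:
    """Average number of Bytes a layer of the given bit rate contributes to one frame."""
    return kbps * 1000 / 8 / fps

base = bytes_per_frame(BASE_KBPS)
# Each enhancement layer contributes only the increment over the previous total rate.
previous = [BASE_KBPS] + CUMULATIVE_KBPS[:-1]
enhancements = [bytes_per_frame(cur - prev) for prev, cur in zip(previous, CUMULATIVE_KBPS)]
average_layer = (base + sum(enhancements)) / (1 + len(enhancements))

print(f"|BL| ~ {base:.0f} B, |EL| ~ {enhancements[0]:.0f} B, |L| ~ {average_layer:.0f} B")
# prints "|BL| ~ 533 B, |EL| ~ 133 B, |L| ~ 213 B"
```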

Table 1. List of parameters.
  • |BL| = 533 Bytes: average size of the base layer in a packet
  • |EL| = 133 Bytes: average size of an enhancement layer in a packet
  • |L| = 213 Bytes: average size of a layer in a packet
  • m = 4: number of enhancement layers
  • t = 2: number of discarded enhancement layers
  • Shash = |MAC| = 20 Bytes: output size of a collision-resistant hash function and of an HMAC function
  • |σ| = 128 Bytes: size of a digital signature
  • math formula, math formula = 0.5498 µs: time taken by the source and the proxy to compute a hash on a 64-Byte input block
  • math formula, math formula
  • math formula, math formula = 77.016 µs: time taken by a user device to compute a hash on a 64-Byte input block
  • math formula = 1.48 ms: time taken by the source to generate a signature
  • math formula = 27.2 ms: time taken by the user device to verify a signature
  • |H| = 20n + 4 Bytes: size of the packet group hash H, assuming 4 Bytes for the group and stream identifiers Gid and Sid

In our setting, we assume that the proxy removes t = 2 enhancement layers from the original video stream. In addition, we assume that the time taken by the source and proxy to compute a hash on a 64-Byte input block is 0.5498 µs and that the time taken by the source to generate a signature is 1.48 ms [38]. On the other hand, an HP Hx 2790 with a 624-MHz processor* takes 7.7016 µs to compute a hash on a 64-Byte input block and 2.72 ms to verify a signature [39]. However, because 90%–95% of the CPU processing time is used for multimedia processing, we scale these user-side times by a factor of 10, to 77.016 µs and 27.2 ms, respectively (Table 1). In addition, we let math formula for i ∈ {s,p,u}.
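
To make the user-side figures concrete, the sketch below applies the factor-of-10 scaling described above and estimates how many 64-Byte hash blocks a user processes for one untruncated packet. It assumes a Merkle–Damgård hash such as SHA-1 (consistent with the 20-Byte digests in Table 1), whose padding adds at least 9 Bytes; the packet composition is a hypothetical example, not a result from our evaluation.

```python
import math

# Costs measured on the handheld device in [39].
MEASURED_HASH_US = 7.7016    # one hash over a 64-Byte input block
MEASURED_VERIFY_MS = 2.72    # one signature verification
SCALE = 10                   # scaling used in the text: 90%-95% of CPU time goes to multimedia

user_hash_us = MEASURED_HASH_US * SCALE      # value used in Table 1 (77.016 us)
user_verify_ms = MEASURED_VERIFY_MS * SCALE  # value used in Table 1 (27.2 ms)

def hash_blocks(length: int) -> int:
    """64-Byte blocks processed by an MD-style hash such as SHA-1 (assumed):
    the message plus a 0x80 byte and an 8-Byte length field, padded to 64 Bytes."""
    return math.ceil((length + 9) / 64)

# Hypothetical example: user-side cost of hashing each layer of one untruncated
# packet separately (|BL| = 533 Bytes plus four |EL| = 133-Byte layers).
layer_sizes = [533] + [133] * 4
blocks = sum(hash_blocks(size) for size in layer_sizes)
print(f"{blocks} blocks -> {blocks * user_hash_us / 1000:.2f} ms")  # prints "21 blocks -> 1.62 ms"
```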

Table 2 shows the analytical results on the performance of the FAS, PAS, hash-chaining, signature-based, and HMAC-based authentication schemes.

Table 2. Analytical results on performance of the FAS, PAS, hash-chaining, signature-based, and HMAC-based authentication schemes. Entries are listed in the order FAS | PAS | Hash chaining | Signature based | HMAC based.

Sender operations when preparing n packets, each with m + 1 layers
  • Computation time (ms): math formula | math formula | math formula | math formula | math formula
  • Buffer size (packets): n | n | n | n | 1
  • PPAI* (Bytes): math formula | math formula | math formula | math formula | math formula
Proxy operations for n packets when removing t enhancement layers
  • Computation time (ms): math formula | math formula | – | – | –
  • PPAI* (Bytes): math formula | math formula | math formula | math formula | math formula
  • Proxy transparency: No | No | Yes | Yes | Yes
Receiver operations, assuming q packets with (m − t + 1) layers are received, k ≤ q ≤ n
  • Verification time (ms): math formula | math formula | math formula | math formula | math formula
  • Buffer size (packets): at most n | at most n | n | at most n | at most 1
* PPAI: per-packet authentication information.

5.4.1 Computation time at source and proxy

Figure 3 shows the source computation time of the schemes for different values of n. We assume that the overhead of a single DECS encoding/decoding is negligible compared with that of a single hash computation, signature generation/verification, or CP-ABE encryption/decryption. PAS has the highest computation time because of its progressive hashing; it is followed by the signature-based, FAS, and hash-chaining schemes, which exhibit almost identical computation costs. The computation time of the HMAC-based scheme is markedly lower than those of all the other schemes because it replaces the costly signature generation with a fast HMAC computation.

Figure 3.

Source computation time of the FAS, PAS, hash-chaining, signature-based, and HMAC-based authentication schemes with respect to n.

The computation time incurred at a proxy depends on whether the authentication scheme is proxy transparent. The FAS and PAS schemes require the proxy to compute the authentication information of the removed layers on the fly; hence, their proxy computation time is linear in the number of removed layers. In contrast, the signature-based, HMAC-based, and hash-chaining schemes are proxy transparent: all necessary authentication information is already in the packets, so no cost is incurred at the proxy. Among them, the signature-based and HMAC-based schemes are preferable to the hash-chaining scheme because the latter does not tolerate packet loss.

5.4.2 Per-packet authentication information

The FAS, PAS, and signature-based schemes employ erasure correction coding (ECC) to combat packet loss. The authentication information of a group of n packets is partitioned into k symbols, which are ECC-coded into n symbols dispersed across the n packets, so that receiving any k of the n packets in the group suffices to recover the authentication information. Hence, the per-packet authentication information is inversely proportional to k. The value of k is determined by the packet loss probability p: if p is expected to be high, then k must be small to accommodate more parity symbols.
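
The k-of-n property relied on above can be illustrated with a toy systematic Reed–Solomon-style erasure code over the prime field GF(257). This is not the DECS technique of [19]; it is a minimal sketch, assuming the authentication information is split into k equal chunks, showing that each packet carries only about 1/k of that information and that any k packets suffice for recovery. Because share values may equal 256, a practical implementation would instead work over GF(2^8).

```python
from typing import Dict, List

P = 257  # prime field GF(257): every byte value 0..255 is a field element

def lagrange_eval(points: Dict[int, int], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through `points` (mod P)."""
    total = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total

def disperse(info: bytes, k: int, n: int) -> List[List[int]]:
    """Split `info` into k chunks and extend them to n shares, one per packet.
    Column by column, share x holds f(x + 1), where f(1..k) are the data chunks."""
    assert 0 < k <= n < P
    size = -(-len(info) // k)                   # ceil(len / k): per-packet share size
    data = info + bytes(size * k - len(info))   # zero-pad to k * size Bytes
    shares = [[0] * size for _ in range(n)]
    for c in range(size):
        column = {x + 1: data[x * size + c] for x in range(k)}
        for x in range(n):
            # The first k shares are plain copies (systematic); the rest are parity.
            shares[x][c] = column[x + 1] if x < k else lagrange_eval(column, x + 1)
    return shares

def recover(received: Dict[int, List[int]], k: int, length: int) -> bytes:
    """Rebuild the first `length` Bytes of `info` from any k shares (packet index -> share)."""
    if len(received) < k:
        raise ValueError("fewer than k packets received: authentication information lost")
    chosen = list(received.items())[:k]
    size = len(chosen[0][1])
    out = bytearray(size * k)
    for c in range(size):
        points = {x + 1: share[c] for x, share in chosen}
        for x in range(k):
            out[x * size + c] = lagrange_eval(points, x + 1)
    return bytes(out[:length])

info = bytes(range(50))                 # hypothetical 50-Byte authentication information
shares = disperse(info, k=3, n=8)       # each share is ceil(50 / 3) = 17 symbols
assert recover({1: shares[1], 4: shares[4], 6: shares[6]}, 3, len(info)) == info
```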

For a source preparing a multimedia stream for distribution, it first estimates the anticipated packet loss probability p. Then, it selects a value α, 0 ≤ α ≤ 1, such that with k = αn data packets, an acceptable (as determined by the source) authentication probability under loss probability p can be achieved. With the values of k and n, the per-packet authentication information can then be calculated as shown in Table 2.

As such, before transcoding, the per-packet authentication information of the FAS, PAS, and signature-based schemes increases with p (a higher p calls for a smaller k), whereas that of the hash-chaining and HMAC-based schemes remains constant regardless of p because neither uses ECC. The per-packet authentication information changes after transcoding. Figure 4 shows the per-packet authentication overhead as a ratio (i.e., per-packet authentication data over packet length) at p = 0.5 and k = 0.3n, which achieves an authentication probability of ≥ 0.97.

Figure 4.

Per-packet authentication information for the signature-based, HMAC-based, hash-chaining, FAS, and PAS schemes before and after transcoding at p = 0.5 and k = 0.3n.

As shown in Figure 4(a), before transcoding, the signature-based scheme has the highest ratio, followed by the hash-chaining, HMAC-based, FAS, and PAS schemes. FAS and PAS have the lowest ratios because the other three schemes incorporate the authentication information of all layers within each packet. After transcoding (Figure 4(b)), the signature-based scheme still has the highest ratio, but it is now followed by FAS and PAS and then by the hash-chaining and HMAC-based schemes. The ratio of FAS and PAS increases more markedly (6.3%) than those of the other schemes because, after transcoding, the authentication information of the removed layers is added to the packets in FAS and PAS.

5.4.3 Verification time at the user

Like the per-packet authentication overhead, the verification time at an end user is indirectly affected by the parameter k, through the number q of received packets, where k ≤ q ≤ n. Intuitively, the verification time is longer when p is low, because more packets are received and must be verified. Figure 5 compares the verification times of the schemes with respect to n for p = 0.5 and k = 0.3n. The hash-chaining scheme has the longest verification time because it does not tolerate packet loss; thus, all n packets have to be verified to authenticate a packet group. It is followed by the signature-based, FAS, and PAS schemes, which have almost identical verification times. The HMAC-based scheme has a constant verification time, equal to that of verifying a single packet, because it allows a user to check the integrity of individual packets rather than of an entire packet group.

Figure 5.

Verification times for the signature-based, HMAC-based, FAS, PAS, and hash-chaining schemes with respect to n (p = 0.5, k = 0.3n).

5.4.4 Authentication probability

In our signature-based scheme, a packet group can be verified successfully only if the number of lost packets in a group is (n − k) or less. Assuming independent packet loss, the probability of successful authentication is given by

$$P_{auth} = \sum_{i=0}^{n-k} \binom{n}{i}\, p^{i} (1-p)^{n-i}$$

where p is the loss probability. Figure 6 shows the authentication probability as a function of the per-packet authentication information. From this figure, we see that with approximately 115 Bytes of per-packet authentication information (a ratio of ≈ 0.09 to the packet size), the signature-based scheme can achieve a 99% authentication probability when p = 0.5.
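
The probability above is easy to evaluate numerically, and the same function can automate the source's choice of k described in Section 5.4.2: scan for the largest k (i.e., the smallest per-packet overhead) whose authentication probability meets a target. The sketch below is illustrative, and its function names are hypothetical.

```python
from math import comb

def auth_probability(n: int, k: int, p: float) -> float:
    """Probability that at most n - k of the n packets in a group are lost,
    assuming each packet is lost independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n - k + 1))

def largest_k(n: int, p: float, target: float) -> int:
    """Largest k (fewest parity symbols, smallest per-packet overhead) meeting `target`.
    auth_probability decreases as k grows, so scan downward from k = n."""
    for k in range(n, 0, -1):
        if auth_probability(n, k, p) >= target:
            return k
    raise ValueError("target cannot be met even with k = 1")

n, p = 128, 0.5
print(f"k = 0.3n: P = {auth_probability(n, int(0.3 * n), p):.4f}")
print(f"largest k with P >= 0.97: {largest_k(n, p, 0.97)}")
```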

Figure 6.

Authentication probability for the signature-based scheme under different packet loss probabilities (n = 128 packets).

From the aforementioned results, we can infer the following:

  • The hash-chaining scheme is not suitable for multimedia distribution over packet-lossy networks because it cannot tolerate packet loss, whereas the HMAC-based scheme is the most robust against network loss as each packet is individually verifiable.
  • For applications that are not sensitive to delays, the signature-based, FAS, or PAS scheme can be used. For applications requiring minimal delay, the HMAC-based scheme outperforms the others.
  • Although FAS and PAS have the lowest authentication ratio before transcoding, their performance after transcoding is similar to that of the signature-based and HMAC-based schemes (note that bandwidth is often most limited on the proxy–user link). Hence, the signature-based and HMAC-based schemes are better choices because they are proxy transparent, whereas FAS and PAS are not.
  • For applications requiring nonrepudiation of origin service, only schemes using digital signatures, such as the signature-based, FAS, and PAS schemes, can be used. Although the HMAC-based scheme outperforms all the other schemes, it only provides stream authentication, not nonrepudiation, service.

Authentication schemes proposed in [6] and [9] are based on the Merkle hash tree. In these schemes, a leaf node is the hash of a layer in a video frame, an interior node is the digest of the concatenation of its children, and the root represents the digest of the video frame. These frame roots then form the leaves of another hash tree, whose root represents the digest of a group of frames. The source needs at least math formula hash computations for tree construction, assuming n frames each having (m + 1) layers. Furthermore, the schemes are not proxy transparent, because the proxy needs to incorporate into the packets the hash of each removed layer and/or the hashes of subtrees covering the removed layers. To provide proxy transparency, each leaf node would need to carry a signature generated on the root together with the hashes of the sibling nodes on the path to the root (≈ log2(m + 1) ⋅ Shash), which incurs high computation and communication overhead.
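
For comparison, the two-level Merkle tree construction described above can be sketched as follows. The sketch uses SHA-1 only because Table 1 assumes 20-Byte digests (a new design would use SHA-256), and it promotes an unpaired node unchanged, which is one of several common conventions. Under this convention, the source computes n(m + 1) leaf hashes, n·m interior hashes within the frame trees, and n − 1 hashes in the top tree, that is, 2n(m + 1) − 1 hashes in total; [6] and [9] may differ in such details.

```python
import hashlib
from typing import List, Tuple

def h(data: bytes) -> bytes:
    """20-Byte digest, matching Shash = 20 Bytes in Table 1 (SHA-1 chosen for that reason)."""
    return hashlib.sha1(data).digest()

def merkle_root(leaves: List[bytes]) -> Tuple[bytes, int]:
    """Root of a binary Merkle tree and the number of interior-node hashes computed.
    An unpaired node is promoted to the next level unchanged."""
    level, hashes = list(leaves), 0
    while len(level) > 1:
        nxt = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        hashes += len(nxt)
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0], hashes

def group_root(frames: List[List[bytes]]) -> Tuple[bytes, int]:
    """Two-level construction: a tree over each frame's layer hashes, then a tree
    over the per-frame roots. Returns the group root and the total hash count."""
    roots, total = [], 0
    for layers in frames:
        root, interior = merkle_root([h(layer) for layer in layers])
        roots.append(root)
        total += len(layers) + interior     # leaf hashes + interior hashes
    root, interior = merkle_root(roots)
    return root, total + interior

# Hypothetical group: n = 8 frames, each with a base layer and m = 4 enhancement layers.
frames = [[f"frame{f}-layer{l}".encode() for l in range(5)] for f in range(8)]
root, count = group_root(frames)
print(len(root), count)   # prints "20 79", i.e., 2n(m + 1) - 1 = 79 hashes
```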


In this section, we first present a generic access control and authentication scheme for scalable multimedia streams and then show how to apply it for access control of H.264 SVC streams.

6.1 The scheme

As before, we consider a source–proxy–user content delivery system and model a scalable multimedia stream with a set of layers {L0,L1, …,Lm}, where L0 is the base layer, whereas L1, …, Lm correspond to m enhancement layers. When joining the system, a user subscribes to a privilege level on the basis of his or her preference. In general, the number of privilege levels is less than the number of layers. However, to simplify the presentation and without loss of generality, we assume that they are the same and that a user with privilege level j is entitled to access the set of layers {L0, L1, …, Lj}, j = 0, 1, …, m.

The proposed scheme is built on the HMAC-based authentication scheme in Section 5.2. To enforce access control on a stream, the source picks a random root key km and then computes the hash chain ki − 1 = h(ki) for i = m, m − 1, …, 1 and kMAC = h(k0), where h is a collision-resistant hash function. The source uses kMAC to authenticate the stream as in the HMAC-based authentication scheme and uses ki to encrypt layer Li, i = 0, 1, …, m.
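
The key chain can be sketched as follows, assuming it takes the form ki − 1 = h(ki) for i = m, …, 1 and kMAC = h(k0), which is consistent with Step K1 and the StreamReceive algorithm. SHA-256 is an assumed choice of h, and the fixed root key is for illustration only; a real km must be random. What prevents a user from moving up the chain is the preimage resistance of h.

```python
import hashlib
from typing import Dict

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()   # assumed instantiation of the hash function h

def chain_from(k_top: bytes, level: int) -> Dict[str, bytes]:
    """Walk the one-way chain down from k_level: k_{i-1} = h(k_i), then k_MAC = h(k_0).
    The source calls it with (k_m, m); a user at privilege level j calls it with (k_j, j)."""
    keys = {f"k{level}": k_top}
    current = k_top
    for i in range(level, 0, -1):
        current = h(current)
        keys[f"k{i - 1}"] = current
    keys["kMAC"] = h(current)
    return keys

m = 4
source_keys = chain_from(b"\x00" * 32, m)     # hypothetical fixed root key k_m, for illustration only
user_keys = chain_from(source_keys["k2"], 2)  # user holding k_2 after CP-ABE decryption
assert all(user_keys[name] == source_keys[name] for name in user_keys)
assert "k3" not in user_keys                  # moving up the chain would require inverting h
```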

We make use of the CP-ABE scheme in Section 3.2 to encrypt the encryption keys such that only privileged users can obtain the corresponding keys. For each kj, j = 0, 1, …, m, the source constructs a CP-ABE access tree Γj that has a single attribute node aj corresponding to privilege level j with a threshold value 1. That is, Γj({aj}) = 1. The source computes the CP-ABE encryption on kj with the single-node access structure Γj. Because of the simplicity of Γj, the resulting ciphertext has the minimum length. A user u subscribing to privilege level j holds the CP-ABE secret key math formula associated with the attribute aj and recovers kj from the corresponding ciphertext. Then, starting from kj, the user traverses the hash chain to obtain all the keys for decrypting the layers he or she is allowed to access.

We now proceed to formally describe our access control and authentication scheme, which consists of four algorithms—Initialization, which initializes the source and users' settings, KeyDistribution, which distributes the symmetric encryption keys to users, StreamGeneration, which encrypts and authenticates a multimedia stream, and StreamReceive, which decrypts and verifies a received stream.

Initialization. The source runs AB-Setup to generate the CP-ABE public key PK and master key MK. When a user u registers with the source and subscribes to privilege level j, the source returns to him or her a math formula tuple, where the attribute set math formula and math formula.

KeyDistribution. The source executes the following steps to generate and distribute symmetric keys to all users.

  • Step K1. Choose a random symmetric key km, and generate km − 1, km − 2, …, k0, kMAC as described earlier.
  • Step K2. For each j ∈ [0, m], construct the access structure Γj as specified earlier, and encrypt kj to obtain CTj = AB-Encrypt(PK, kj, Γj).
  • Step K3. Send the set of ciphertexts {CT0, CT1, …, CTm} to the users over either an in-band or an out-of-band channel.

StreamGeneration. Like the HMAC-based authentication scheme, this algorithm operates at the individual packet level. Given a packet P = L0 ∥ ⋯ ∥ Lm in a multimedia stream, the source performs the following steps.

  • Step A1. Generate an authenticated packet P′ = L′0 ∥ ⋯ ∥ L′m by running the authentication algorithm of the HMAC-based scheme specified in Section 5.2 with the key kMAC.
  • Step A2. Generate an encrypted and authenticated packet P″ = L″0 ∥ ⋯ ∥ L″m by computing L″i = Enc(L′i, ki) for all 0 ≤ i ≤ m, where Enc() is a symmetric encryption algorithm in a standard mode of operation, such as CBC (Cipher Block Chaining) mode or counter mode [36].
  • Step A3. Output P″ as the authenticated and encrypted version of P.

StreamReceive. A user subscribing to privilege level j receives his or her granted keys during the key distribution phase. Specifically, when a user u with math formula receives math formula, he or she computes math formula. Then, he or she traverses the hash chain to derive all the keys, kj − 1, …, k0, kMAC, granted to him or her.

Upon receiving an encrypted and authenticated packet P″ = L″0 ∥ ⋯ ∥ L″m − t from a proxy, the user proceeds as follows.

  • Step V1. Decrypt L″i to obtain L′i = Dec(L″i, ki), for i = j, j − 1, …, 1, 0 (only layers present in P″ can be decrypted, so in practice i ≤ min(j, m − t)).
  • Step V2. For each decrypted layer L′i, execute the verification algorithm in Section 5.2 using kMAC as the verification key.
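
Steps A1–A2 and V1–V2 can be combined into a minimal end-to-end sketch. Because the HMAC-based scheme of Section 5.2 is not reproduced here, the sketch assumes a simplified variant in which each layer carries its own HMAC-SHA1 tag computed over the layer and a per-packet, per-layer nonce. The keystream function is a toy SHA-256 counter-mode construction standing in for a standard mode such as AES in counter mode; a deployment should use a vetted cryptographic library instead. All names are hypothetical.

```python
import hashlib
import hmac
from typing import List

TAG_LEN = 20  # HMAC-SHA1 tag size, matching |MAC| = 20 Bytes in Table 1

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream, SHA-256(key || nonce || counter); stands in for AES-CTR."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def nonce_for(seq: int, layer: int) -> bytes:
    """Unique per (packet, layer), so no keystream is ever reused under one key."""
    return seq.to_bytes(8, "big") + layer.to_bytes(4, "big")

def protect(layers: List[bytes], layer_keys: List[bytes], k_mac: bytes, seq: int) -> List[bytes]:
    """Steps A1-A2 (sketch): append an HMAC tag under k_MAC to each layer, then encrypt with k_i."""
    out = []
    for i, (layer, k_i) in enumerate(zip(layers, layer_keys)):
        nonce = nonce_for(seq, i)
        tag = hmac.new(k_mac, nonce + layer, hashlib.sha1).digest()
        authed = layer + tag                                         # L'_i
        out.append(xor(authed, keystream(k_i, nonce, len(authed))))  # L''_i
    return out

def receive(enc_layers: List[bytes], my_keys: List[bytes], k_mac: bytes, seq: int) -> List[bytes]:
    """Steps V1-V2 (sketch): decrypt only the layers the user holds keys for, then verify each tag."""
    plain = []
    for i, (ct, k_i) in enumerate(zip(enc_layers, my_keys)):  # zip stops at the shorter list
        nonce = nonce_for(seq, i)
        authed = xor(ct, keystream(k_i, nonce, len(ct)))
        layer, tag = authed[:-TAG_LEN], authed[-TAG_LEN:]
        expected = hmac.new(k_mac, nonce + layer, hashlib.sha1).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError(f"layer {i} failed verification")
        plain.append(layer)
    return plain

layers = [b"base layer", b"enhancement 1", b"enhancement 2"]  # m = 2, hypothetical payloads
keys = [bytes([i]) * 16 for i in range(3)]                    # hypothetical k_0, k_1, k_2
packet = protect(layers, keys, b"hypothetical k_MAC", seq=7)
truncated = packet[:2]                                        # proxy drops t = 1 layer
assert receive(truncated, keys[:2], b"hypothetical k_MAC", seq=7) == layers[:2]
```

Because each layer is tagged and encrypted independently, the proxy can drop the top t layers without touching the remaining ones, which mirrors the proxy transparency discussed in Section 5.4.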

Note that incorporation of authentication in our encryption-based access control scheme is not optional. It is well known that standard operation modes of block ciphers do not provide message authentication [36] and that using encryption without adequate integrity protection is vulnerable to active attacks [40]. It is believed that integrity or authentication service must be offered in any security-aware transmission [41].

We also note that the standard authenticated encryption modes in [42] [43] cannot be applied to achieve our purpose. These modes provide content confidentiality by means of a symmetric-key block cipher but offer no mechanism for giving users keys that match their privilege levels. Thus, enforcing access control with these modes would require devising a key management scheme with an online KDC. In contrast, using attribute-based encryption to disseminate encryption keys, as in our scheme, completely removes the need for an online KDC.

6.2 Application to H.264 SVC streams

In this section, we show how the generic scheme presented earlier can be applied to protect H.264 SVC streams. To keep the paper compact, we focus on access control and omit stream authentication. In addition, we denote y consecutive applications of the hash function h as h^y, that is, h^y(x) = h(h(⋯h(x)⋯)), where h is applied y times.

In [29], an access control scheme is proposed for H.264 SVC-encoded video streams having temporal, spatial, and quality scalabilities. As mentioned in Section 2.2, a video stream is modeled in two dimensions: the vertical dimension for the spatial-quality layers of a frame and the horizontal dimension for different frames (i.e., temporal layers). In both dimensions, a higher layer depends on the lower layers for decoding. We use the term unit, denoted as Di,j, to refer to spatial-quality layer i of a frame belonging to temporal layer j. Thus, a video stream with S spatial-quality layers and T temporal layers can have up to S × T units. An access control scheme must ensure that a user with access privilege to the unit Ds,t also has access to the set of units {Di,j | i ∈ [0, s], j ∈ [0, t]} but not to any unit higher than Ds,t (Figure 7).

Figure 7.

H.264 SVC-encoded stream as modeled in [29] with S = 4 spatial-quality layers and T = 3 temporal layers. Direction of an arrow indicates a unit of the lower privilege, that is, a privilege to access unit D1,2 will also allow the user to access the units D1,1, D1,0, D0,2, D0,1,  and D0,0.

For the sake of consistency, we illustrate the scheme in [29] in terms of the algorithms in Section 6.1, namely KeyDistribution, StreamGeneration, and StreamReceive.

KeyDistribution. Given an H.264 stream as shown in Figure 7, the source chooses a random K and computes math formula and math formula, where kY and kX denote the scalability type keys for the spatial-quality scalability and temporal scalability, respectively. Then, the source computes, for all layers i = S − 1, S − 2, …, 0 in the spatial-quality scalability, the set of keys {kY,i} as math formula. Similarly, the source computes, for all layers j = T − 1, T − 2, …, 0 in the temporal scalability, the set of keys math formula. It then forwards the keys securely to a KDC.

StreamGeneration. For every unit in spatial-quality layer i and temporal layer j, that is, Di,j, the source uses k(i,j) = kY,i ∥ kX,j as the symmetric key to encrypt Di,j and then sends the encrypted stream to the users.

StreamReceive. A user with access privilege to the unit Ds,t first authenticates himself or herself to the KDC to obtain the key k(s,t) = kY,s ∥ kX,t. Using k(s,t), the user derives all k(p,q) for p ≤ s and q ≤ t and uses this set of keys to decrypt the units he or she is entitled to access.

The scheme in [29] allows both the KDC and a user to maintain only a single key, and a privileged user can derive all the keys needed to decrypt the granted units by using a one-way hash. However, the scheme is subject to a collusion attack in which two users separately subscribing to lower privilege levels cooperate to derive the encryption key of a higher privilege level. With reference to Figure 7, suppose user A obtains k(0,2) = kY,0 ∥ kX,2 from the KDC and user B obtains k(3,0) = kY,3 ∥ kX,0. When they collude, they obtain k(3,2) = kY,3 ∥ kX,2, which decrypts the full stream that neither of them has the privilege to access. This attack is clearly unacceptable for most applications.
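
The collusion can be reproduced in a few lines. The derivations of kY, kX, and the per-layer keys are not reproduced here, so the sketch assumes kY = h(K ∥ "Y"), kX = h(K ∥ "X"), kY,i = h^(S−1−i)(kY), and kX,j = h^(T−1−j)(kX), with SHA-256 as h. The attack does not depend on these choices: it only uses the fact that k(i,j) is the concatenation of two independently derivable halves.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def h_iter(x: bytes, y: int) -> bytes:
    """h^y(x): y consecutive applications of h."""
    for _ in range(y):
        x = h(x)
    return x

S, T = 4, 3
K = b"hypothetical root key"
k_Y, k_X = h(K + b"Y"), h(K + b"X")                # hypothetical scalability-type keys
k_Yi = [h_iter(k_Y, S - 1 - i) for i in range(S)]  # k_{Y,i}: lower layers are hashed more
k_Xj = [h_iter(k_X, T - 1 - j) for j in range(T)]  # k_{X,j}

def unit_key(i: int, j: int) -> bytes:
    """k(i,j) = k_{Y,i} || k_{X,j}, the concatenated key of [29]."""
    return k_Yi[i] + k_Xj[j]

def derive(k_st: bytes, s: int, t: int, i: int, j: int) -> bytes:
    """What a holder of k(s,t) computes for a lower unit (i, j), i <= s, j <= t."""
    return h_iter(k_st[:32], s - i) + h_iter(k_st[32:], t - j)

# Collusion from the text: A holds k(0,2), B holds k(3,0).
key_A, key_B = unit_key(0, 2), unit_key(3, 0)
forged = key_B[:32] + key_A[32:]          # k_{Y,3} || k_{X,2}
assert forged == unit_key(3, 2)           # full-stream key that neither user was granted
```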

In the following, we show two approaches that are secure against this type of user collusion attack. In both approaches, the source classifies the layers into privilege levels. Note that a privilege level may cover one or more layers. However, to simplify explanation, we assume that each unit Di,j corresponds to a privilege level (i,j). As a result, there are S × T privilege levels: (i,j), i ∈ [0, S − 1], j ∈ [0, T − 1]. The source generates S × T encryption keys ki,j to encrypt Di,j, i ∈ [0, S − 1],  j ∈ [0, T − 1], uses CP-ABE to encrypt ki,j, and then sends the ciphertexts to users over either in-band or out-of-band channels. The two approaches differ in how the keys ki,j are generated.

Key generation—Approach 1. To generate a set of S × T encryption keys, the source

  1. Generates a random key K.
  2. For the highest temporal layer T − 1 in every spatial-quality layer i, i ∈ [0, S − 1], computes math formula, where “S” is the American Standard Code for Information Interchange (ASCII) code of the letter S.
  3. In a given spatial-quality layer s, s ∈ [0, S − 1], for each of the remaining temporal layers j, j ∈ [0, T − 2], computes math formula, where “T” is the ASCII code of the letter T.

Figure 8 shows the keys generated for the stream in Figure 7. Depending on a user's access requirements, there are two scenarios to be considered at the user end (Figure 8).

  1. User subscribes to privilege level (s, T − 1) (i.e., to a stream with a certain spatial-quality layer s and the highest temporal layer T − 1). The user only needs to obtain ks,T − 1 from the corresponding CP-ABE ciphertext. Using ks,T − 1, he or she can derive the remaining keys {ki,j} for i ∈ [0, s] and j ∈ [0, T − 1].
  2. User subscribes to privilege level (s,t) for some spatial-quality layer s and some temporal layer t. The user needs to first obtain {ki,t}i ∈ [0,s] by decrypting s + 1 CP-ABE ciphertexts and then computes the other keys necessary for decrypting the units he or she is entitled to access.

Figure 8.

Keys generated using Approach 1. If a user subscribes to privilege level (2, 2), he or she needs to know the key k2,2 to access the lower privilege levels; if the user subscribes to privilege level (2, 1), he or she needs to know the keys k2,1, k1,1, k0,1 to access the lower privilege levels.

This approach allows the source to maintain a single key, but a user may have to maintain more than one key because of the sequential key generation. The approach is most efficient when a user subscribes to a stream with the highest temporal layer (regardless of the spatial-quality layer), because he or she needs to perform only one CP-ABE decryption. On the other hand, if the user subscribes to privilege level (s,t) with a temporal layer t other than the highest one, he or she needs to perform s + 1 CP-ABE decryptions, up to the number of spatial-quality layers S in the stream.
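
Approach 1 can be sketched as follows. The formulas in Steps 2 and 3 are not reproduced here, so the sketch assumes a simple form with the letters “S” and “T” as domain separators: kS−1,T−1 = h(K ∥ “S”), ki−1,T−1 = h(ki,T−1 ∥ “S”), and ks,j−1 = h(ks,j ∥ “T”), with SHA-256 as h. It covers both user scenarios and checks that the two colluding users of Section 6.2 cannot reach k3,2 with these derivation rules.

```python
import hashlib
from typing import Dict, Tuple

Keys = Dict[Tuple[int, int], bytes]

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def temporal_down(k: bytes, i: int, t: int) -> Keys:
    """From k_{i,t}, derive k_{i,t-1}, ..., k_{i,0} via k_{i,j-1} = h(k_{i,j} || "T")."""
    out = {(i, t): k}
    for j in range(t, 0, -1):
        k = h(k + b"T")
        out[(i, j - 1)] = k
    return out

def from_top(k_top: bytes, s: int, T: int) -> Keys:
    """Scenario 1: from k_{s,T-1}, step down spatially via k_{i-1,T-1} = h(k_{i,T-1} || "S"),
    then temporally within every spatial-quality layer i <= s."""
    out: Keys = {}
    k = k_top
    for i in range(s, -1, -1):
        if i < s:
            k = h(k + b"S")
        out.update(temporal_down(k, i, T - 1))
    return out

def from_column(held: Dict[int, bytes], t: int) -> Keys:
    """Scenario 2: from the s + 1 CP-ABE-decrypted keys {k_{i,t}}, derive k_{i,j} for j <= t."""
    out: Keys = {}
    for i, k in held.items():
        out.update(temporal_down(k, i, t))
    return out

def generate(K: bytes, S: int, T: int) -> Keys:
    """Source side: k_{S-1,T-1} = h(K || "S"), then the same derivation as scenario 1."""
    return from_top(h(K + b"S"), S - 1, T)

keys = generate(b"hypothetical root key K", S=4, T=3)
alice = from_column({i: keys[(i, 0)] for i in range(4)}, 0)   # privilege level (3, 0)
bob = from_top(keys[(0, 2)], 0, 3)                             # privilege level (0, 2)
assert (3, 2) not in {**alice, **bob}
```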

Key generation—Approach 2. In our second approach, we utilize a key generation method proposed in [44]. To generate a set of S × T encryption keys, for each privilege level (i,j), the source chooses a random secret encryption key ki,j, a unique public label li,j, and where applicable, public values math formula and math formula, where ⊕ is an exclusive OR operation.

Figure 9 shows the keys generated for the stream in Figure 7. We stress that the secret keys ki,j are CP-ABE encrypted. The CP-ABE ciphertexts and the public values (i.e., li,j, yi − 1,j, and xi,j − 1) are sent to users via either an in-band or out-of-band channel.

Figure 9.

Keys generated using Approach 2, where math formula, math formula, math formula, and math formula.

Note that a user subscribing to privilege level (s,t) can always derive the keys of the lower privilege levels from the public information and the single key ks,t, as follows. The user first computes math formula and math formula and then computes the remaining keys ki,j for i ∈ [0, s] and j ∈ [0, t].

This approach likewise eliminates the need for an online KDC and requires the source and users to maintain only a single key. As a result, each user needs to perform only one CP-ABE decryption, the minimum possible. It is secure against the collusion attack because of the one-way function math formula, and it allows users with higher privileges to efficiently derive the encryption keys of lower privileges but not vice versa. However, the public values li,j, yi − 1,j, and xi,j − 1 must be delivered to users, which results in higher communication overhead than Approach 1.

6.3 Remarks

The access control schemes in [30] and [31] are designed for the MPEG-4 FGS streams and involve an online KDC. As mentioned in Section 2.2, the scheme in [30] generates independent keys for different privilege levels. As a result, both the KDC and users have to maintain a large number of keys for every video stream. This number is then reduced to one per video stream in [31], but this scheme is vulnerable to the same user collusion attack as described in Section 6.2. Note that the MPEG-4 FGS and the H.264 SVC stream structures are similar. Hence, our proposed approaches for H.264 SVC streams can be readily applied for access control of MPEG-4 FGS streams as well without the need for an online KDC and without suffering from the user collusion attack.

Users may subscribe to and unsubscribe from a multimedia service. When a new user subscribes to a multimedia service at a specific privilege level, the source issues to him or her a CP-ABE secret key associated with that privilege level. This is a one-time effort and can be carried out either online or offline. Encrypted multimedia streams can be broadcast to users, and only those users with secret keys at the right privilege levels are able to decrypt the received streams. Whenever a user terminates the subscription, the source must revoke the user so that he or she can no longer access the multimedia service. As pointed out in [35], this can be achieved by incorporating numerical attributes in a user's CP-ABE secret key. For instance, when a user subscribes to a certain privilege level, the source provides the user a CP-ABE secret key associated with an attribute specifying an expiry date.§ Before the expiry date, the user is able to access multimedia streams at his or her access privilege. Once the expiry date has passed, the user needs to obtain a new CP-ABE secret key with a new expiry date. Note that, for better security, it is prudent to encrypt each video stream using a different secret key; otherwise, compromising one key compromises multiple video streams. Without CP-ABE, an online KDC is needed to distribute these secret keys to end users over authenticated and secure channels. The KDC could be operated by the source or by other parties. In either case, an online KDC leads to higher operating cost and, if not managed properly, could be a single point of failure.


We first proposed two authentication schemes for generic scalable multimedia streams. The first scheme uses a digital signature coupled with a DECS to protect the authenticity of a group of packets, such that the probability of successful authentication remains high in packet-lossy networks. The second scheme replaces the signature with an HMAC and offers packet-level authentication with low computation overhead. Both schemes have the salient feature of proxy transparency, with lower computation and verification costs (for the source and users, respectively) than existing schemes that are not proxy transparent. The HMAC-based authentication scheme has the lowest computation and verification costs and the strongest resilience to packet loss. Therefore, it is highly suitable for low-end user devices with limited computation power and noisy network access (such as wireless or mobile networks).

In addition, we proposed an access control scheme that allows flexible privilege classifications for generic scalable multimedia streams. The scheme uses CP-ABE to distribute secret keys to users, thereby eliminating the need for an online KDC. We demonstrated how to apply our scheme to access control of H.264 SVC streams. We pointed out a user collusion attack on existing multimedia access control schemes in the literature and presented two key generation techniques that are secure against this attack.


This work was supported by A*STAR SERC Grant No. 102 101 0027 in Singapore.

  • *

    We have chosen this setting because most of the latest generation smartphones use processors of similar performance.

  • Note that the source can always alter the key generation sequence depending on user request pattern to achieve the optimal efficiency.

  • §

    Such an application is feasible because most multimedia subscriptions are on a monthly or yearly basis.