• Open Access

An anonymous data aggregation scheme for smart grid systems

Authors

  • Xuefeng Liu,

    1. National Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, China
  • Yuqing Zhang,

    Corresponding author
    1. National Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, China
    2. National Computer Network Intrusion Protection Center, Graduate University of Chinese Academy of Sciences, Beijing, China
    • Correspondence: Yuqing Zhang, National Computer Network Intrusion Protection Center, Graduate University of Chinese Academy of Sciences, Beijing, China.

      E-mail: zhangyq@ucas.ac.cn

  • Boyang Wang,

    1. National Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, China
  • Huaqun Wang

    1. School of Information Engineering, Dalian Ocean University, Dalian, China

ABSTRACT

By integrating the traditional grid with advanced communication and information technologies, the smart grid can provide a reliable and efficient energy service for modern society. Data aggregation plays an important role in evaluating the current energy usage of consumer domains, based on which the operation center can accommodate distributed power sources to maximize utilization efficiency. However, it also poses a potential risk to consumer privacy. In this paper, we propose an anonymous multi-dimensional data aggregation scheme for smart grid systems. With the proposed scheme, the operation center can compute both additive and non-additive aggregation functions over the reports collected from consumers. The computation cost of each consumer is independent of the number of collected data types. In addition, by using the batch verification technique, the operation center's computation cost can be significantly reduced. The security analysis demonstrates that the proposed scheme achieves identity privacy preservation, data authentication, and confidentiality. Copyright © 2013 John Wiley & Sons, Ltd.

1 INTRODUCTION

The traditional power grid is a centralized interconnected network that delivers electric energy from a few suppliers to a large number of customers. With the growing demand for high-quality electricity, the current power grid has become a bottleneck for the ubiquitous electronic devices in our lives because it lacks load balancing, effective real-time diagnosis, and so forth [1]. To provide a reliable and efficient energy service for modern society, the concept of the smart grid has emerged, integrating the traditional grid with advanced communication and information technologies. As shown in Figure 1, with the two-way transmission of electricity and information, the smart grid has unique characteristics such as self-healing, consumer participation, and the accommodation of generation options, in addition to reliability and efficiency [2, 3].

Figure 1.

Architecture of smart grid.

Data aggregation, including both additive functions (e.g., sum, average) and non-additive ones (e.g., variance, standard deviation, max/min, median, histogram, percentile), plays a crucial role in smart grid systems. As a primary motivation, the operation center can determine the real-time power usage and predict the power consumption of the next phase by computing certain aggregation functions over the records collected from consumers. The operation center can then accommodate distributed power sources to maximize utilization efficiency and apply a variable electricity pricing mechanism [4] that enables consumers to manage energy use intelligently and significantly reduce bills, for example, by reducing their demand in peak load periods or scheduling energy-intensive electrical appliances to run in off-peak periods.
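As a minimal illustration of the difference between the two families of functions, the sketch below computes a few of them over plaintext readings, as the operation center could do once it holds the individual records. The readings and the helper name `aggregate` are hypothetical and not part of the proposed scheme.

```python
import statistics

# Hypothetical plaintext readings (in Wh) from five consumers.
readings = [1200, 800, 2500, 1900, 800]


def aggregate(values):
    """Return a few additive and non-additive statistics over the readings."""
    return {
        # Additive functions: a running total of the readings is enough.
        "sum": sum(values),
        "average": statistics.mean(values),
        # Non-additive functions: a plain total of the readings is not enough.
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
    }


stats = aggregate(readings)
print(stats)
```

A scheme that only exposes an encrypted total can serve the first two entries; the remaining ones need access to the individual (anonymous) records.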

Although data aggregation enables the operation center to easily monitor and effectively manage energy allocation, it also poses a potential risk to consumer privacy [5, 6]. This is because the data collected for aggregation normally include sensitive energy usage information, which could be maliciously used to reveal information about the consumer, such as his or her habits, behaviors, activities, preferences, and even beliefs [7]. To encourage consumer participation, an anonymous data aggregation scheme is highly desirable, and it should satisfy a strong security requirement: even the operation center cannot associate a collected record with a specific consumer.

Many privacy-preserving data aggregation schemes have been proposed for sensor networks in the past few years [8-11]. These schemes are designed for traditional single-dimensional data aggregation, for example, of a monitored temperature. However, in smart grid systems, the aggregated data are usually multi-dimensional, including the amount of energy usage, the purpose of the consumption, and other necessary status information [12], which renders the existing schemes [8-11] inefficient for the smart grid because they have to process every dimension separately. Recently, by integrating the super-increasing sequence and perturbation techniques, multi-dimensional privacy-preserving data aggregation schemes based on homomorphic encryption were presented in [12, 13]. However, these schemes can only support additive aggregation functions such as sum and average, which may limit their use in practice.

Our contribution. In this paper, we propose an anonymous multi-dimensional data aggregation scheme for smart grids based on bilinear pairing cryptography. With the proposed scheme, the operation center can compute both additive and non-additive aggregation functions. The main objective of our scheme is to guarantee that no entity in the system can compromise consumer privacy during the data aggregation. In addition, the proposed scheme achieves the authenticity and confidentiality of the collected consumer data. Performance analysis shows that the computation complexity of consumers is independent of the number of collected data types, and the computation overhead of the operation center can be significantly reduced by employing the batch verification technique.

The rest of the paper is organized as follows. In Section 2, related works are reviewed. Section 3 presents the overview of this work. The details of our proposed scheme are given in Section 4. In Section 5, we analyze the security properties, followed by a performance analysis in terms of computation cost and communication overhead in Section 6. Finally, conclusions are drawn in Section 7.

2 RELATED WORK

Data aggregation has attracted much attention in recent years. Privacy is a major concern for data aggregation applications because consumers may be unwilling to reveal their private data. In [8], He et al. proposed a privacy-preserving data aggregation scheme for wireless sensor networks based on slicing techniques. Specifically, each node slices its private data into a certain number (denoted as J) of pieces, keeps one piece itself, and distributes the others to J − 1 randomly selected sensor nodes. Finally, each node sums its own slice and all the received values and transmits the sum to the query server for privacy-preserving data aggregation.
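The slicing idea can be sketched as follows. This is a simplified illustration rather than the protocol of [8]: the modulus, the function names, and the use of fixed neighbours in place of randomly selected receivers are assumptions made for this example.

```python
import secrets

MODULUS = 2**32  # hypothetical modulus; all arithmetic is done modulo this value


def slice_value(value, pieces):
    """Split `value` into `pieces` random slices that sum to it modulo MODULUS."""
    slices = [secrets.randbelow(MODULUS) for _ in range(pieces - 1)]
    slices.append((value - sum(slices)) % MODULUS)
    return slices


def slice_and_mix(private_values, pieces):
    """Each node keeps one slice and hands the other J - 1 slices to other nodes."""
    n = len(private_values)
    partial = [0] * n  # what each node accumulates before reporting
    for sender, value in enumerate(private_values):
        slices = slice_value(value, pieces)
        partial[sender] = (partial[sender] + slices[0]) % MODULUS  # slice kept locally
        for offset, piece in enumerate(slices[1:], start=1):
            receiver = (sender + offset) % n  # fixed neighbours stand in for random peers
            partial[receiver] = (partial[receiver] + piece) % MODULUS
    return partial  # the partial sums reported to the query server


values = [17, 42, 8, 30]
reports = slice_and_mix(values, pieces=3)
total = sum(reports) % MODULUS
print(total)
```

No single report reveals a node's reading, yet the reports still add up to the true total, which is why the technique is limited to sum-like functions.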

Feng et al. proposed a privacy-preserving data aggregation scheme that employs a secret perturbation technique [9]. The main idea is that, instead of reporting the original data, each sensor node reports the sum of the real data and a secret shared with the sink. All the perturbed data are aggregated before being forwarded, so the sink can simply subtract the sum of the secrets from the aggregated data to obtain the aggregation of the original data. An efficient data aggregation scheme based on a homomorphic cryptosystem [14] was proposed in [10], which allows the intermediate nodes to carry out aggregation without encryption and decryption operations. Obviously, these schemes can only support sum aggregation functions.
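A minimal sketch of the perturbation idea is given below, assuming modular arithmetic with a hypothetical modulus and per-node secrets that the sink already shares with each node. It illustrates the principle only and does not reproduce the details of [9].

```python
import secrets

MODULUS = 2**32  # hypothetical modulus for the perturbed values

readings = [21, 19, 23]  # hypothetical sensor readings
shared_secrets = [secrets.randbelow(MODULUS) for _ in readings]  # one secret per node

# Each node reports its reading masked by the secret it shares with the sink.
perturbed = [(r + s) % MODULUS for r, s in zip(readings, shared_secrets)]

# Intermediate nodes add the perturbed values without learning any reading.
aggregated = sum(perturbed) % MODULUS

# The sink removes the sum of the secrets to obtain the true total.
recovered = (aggregated - sum(shared_secrets)) % MODULUS
print(recovered)
```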

In [11], Shi et al. extended the slicing technique to design a privacy-preserving data aggregation scheme that supports both sum and non-sum data aggregation functions in people-centric urban sensing systems. All the schemes mentioned earlier are designed for sensor networks, where the aggregated data are usually assumed to be one-dimensional. In contrast, the data collected in a smart grid are multi-dimensional, including the amount of power usage, the purpose of the consumption, and so on [12], which renders the existing schemes [8-11] inefficient for the smart grid because they have to process every dimension separately.

Recently, an efficient multi-dimensional data aggregation scheme for secure smart grid communications, named efficient and privacy-preserving aggregation (EPPA), was presented in [12]. EPPA is based on the homomorphic Paillier cryptosystem, so that the gateway can aggregate the consumer records without encryption and decryption operations. Upon receiving the aggregation value, the operation center can store each aggregated datum by employing the super-increasing sequence and perturbation techniques. However, the Paillier cryptosystem supports only additive homomorphism, which may limit the adoption of EPPA in practice. In this paper, we propose a multi-dimensional privacy-preserving data aggregation scheme that supports both additive and non-additive functions.
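To give an intuition for how a super-increasing sequence lets several dimensions share one additively aggregated value, the sketch below packs each record into a single integer and separates the per-dimension totals afterwards. The weight construction, the function names, and the use of plain integer addition in place of homomorphic ciphertext aggregation are assumptions for illustration; the actual parameters of EPPA are given in [12].

```python
def make_weights(dimensions, users, max_value):
    """Build weights so each one exceeds every total the lower dimensions can reach."""
    weights = [1]
    for _ in range(dimensions - 1):
        # Largest value the lower dimensions can contribute after summing all users.
        bound = sum(w * users * max_value for w in weights)
        weights.append(bound + 1)
    return weights


def pack(record, weights):
    """Encode one multi-dimensional record as a single integer."""
    return sum(w * d for w, d in zip(weights, record))


def unpack(total, weights):
    """Recover the per-dimension sums from an aggregated packed value."""
    sums = []
    for w in reversed(weights):
        sums.append(total // w)
        total %= w
    return list(reversed(sums))


records = [[3, 10, 7], [5, 2, 9], [1, 8, 4]]  # three users, three data types
weights = make_weights(dimensions=3, users=3, max_value=10)
aggregated = sum(pack(r, weights) for r in records)  # stands in for homomorphic addition
per_dimension_sums = unpack(aggregated, weights)
print(weights, per_dimension_sums)
```

Because only the per-dimension sums can be separated from the aggregate, this packing supports sum and average but not functions such as median or max.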

3 OVERVIEW

3.1 System model

Different from previous works, we do not assume that there exists a trusted third party in our system model, for example, the fully trusted local gateway in [12]. Instead, we consider a typical data aggregation system that involves two categories of entities, as illustrated in Figure 2: an operation center (OC) and n smart meters. Specifically, the OC, managed by the power supplier, is responsible for calculating certain aggregate statistics over the records collected from consumers. To cover a wide range of statistics, the aggregation functions should support both additive and non-additive operations such as sum, average, variance, standard deviation, max/min, median, histogram, and percentile. Each consumer is equipped with a smart meter, which can automatically collect the real-time consumption and other related status data for aggregation.

Figure 2.

System model.

3.2 Threat model

The OC is set to be honest-but-curious in our model. On the one hand, it will behave in an honest manner to follow the proposed scheme. On the other hand, it may be also curious to identify the owner's identity from a consumer report during data aggregation.

In addition, we assume that there exists an outside adversary in our system, who can eavesdrop on the traffic flow to breach the privacy of consumers' records. Besides, the adversary can inject arbitrary false data into an aggregation procedure to bias the statistical result.

3.3 Security requirements

To fulfill a secure data aggregate statistics for smart grid system under the aforementioned threat model, the proposed scheme should satisfy the security requirements defined as follows.

  • Data confidentiality: This requirement guarantees that only the OC can access the consumer record, and the final statistics result is preserved against an outside adversary.
  • Data authentication: To defend against an injection attack, the OC should be capable of verifying that an individual record to be aggregated is issued from an authorized consumer.
  • Anonymity: To preserve consumer privacy, the proposed system should ensure that the OC cannot infer the owner's identity from a specific consumer report.

3.4 Security model

There are two kinds of entities participating in the proposed data aggregation protocol P. Let U and O denote a consumer and the operation center, respectively. It is worth noting that both U and O may have multiple instances involved in different executions of protocol P. We denote the instance i of U and the instance j of O by Ui and Oj, respectively.

Essentially, the data aggregation protocol P is an interactive protocol between Ui and Oj, which guarantees the anonymity of U and provides a session key sk between the instances of U and O to protect the consumer report. During the execution of P, an adversary $\mathcal{A}$ can make queries to any instance, which models the adversary's capabilities in a real attack. All available oracles that the adversary can query are listed as follows.

  1. Send(Ui/Oj, m): Sends a message m to the instance Ui (or Oj) and returns a response following the protocol. This oracle is used to simulate an active attack, where the adversary has the ability to control all the communication such as inserting forged messages, modifying and canceling the existing messages, and so on. In addition, the adversary can also eavesdrop the communication.
  2. Reveal(Ui/Oj): Returns a session key computed by Ui (or Oj). This query is used to simulate the misuse of a session key.
  3. RevealID(Ui): Returns the real identity of participant instance Ui. This query is used to simulate the misuse of user identity.
  4. Test(Ui/Oj): This query is to define the semantic security of session key and can be performed only once. A coin b is flipped. If b = 1, the adversary is given the session key of Ui (or Oj); otherwise, the adversary would learn a random number with the same length of the session key.
  5. TestAnon(Ui, ID0, ID1): This query is to define the anonymity of user identity and can be performed only once. A coin b is flipped. If b = 1, the adversary is given the real identity ID1 of Ui; otherwise, it returns a false identity ID0 to the adversary.

3.5 Security notions

Definition 1. (AKE security) In this experiment, the adversary $\mathcal{A}$ is allowed to query the oracles Send(Ui/Oj, m), Reveal(Ui/Oj), and Test(Ui/Oj). The aim of $\mathcal{A}$ is to guess whether the challenge returned by the Test(Ui/Oj) oracle is the real session key or a random number by outputting a bit b′. Let Succ denote the event that $\mathcal{A}$ wins this game, that is, b = b′, where b is the bit chosen in the Test(Ui/Oj) oracle. The advantage of $\mathcal{A}$ in breaking the semantic security of protocol P is defined as follows:

display math(1)

The protocol P is said to be semantically secure if this advantage is negligible for any adversary $\mathcal{A}$ running in polynomial time t.

Definition 2. (User anonymity) Besides the Send(Ui/Oj, m) and Reveal(Ui/Oj) oracles, the adversary $\mathcal{A}$ can also query RevealID(Ui) and TestAnon(Ui, ID0, ID1). The aim of $\mathcal{A}$ is to guess whether the challenge identity returned by the TestAnon(Ui, ID0, ID1) oracle is real or not by outputting a bit b′. Let Succ denote the event that $\mathcal{A}$ wins this game, that is, b = b′, where b is the bit chosen in the TestAnon(Ui, ID0, ID1) oracle. The advantage of $\mathcal{A}$ in violating the anonymity of protocol P is defined as follows:

display math(2)

The protocol P is said to be anonymous to the user if this advantage is negligible for any adversary $\mathcal{A}$ running in polynomial time t.

4 OUR SCHEME

In this section, we elaborate on the details of the proposed anonymous data aggregation scheme, which consists of the following five operations: system setup, cryptographic token issuance, aggregation announcement, user data generation, and data aggregation and response. Before describing the proposed scheme, we first give a brief review on bilinear pairings [15] and several related underlying complexity assumptions.

4.1 Bilinear maps

Let G1 and G2 be an additive cyclic group and a multiplicative cyclic group of the same prime order q, respectively. Let e : G1 × G1 → G2 denote a bilinear map constructed by modified Weil or Tate pairing with the following properties:

  1. Bilinear: for all $a, b \in \mathbb{Z}_q^*$ and P, Q ∈ G1, $e(aP, bQ) = e(P, Q)^{ab}$.
  2. Non-degenerate: there exists a point P such that e(P, P) ≠ 1.
  3. Computable: there is an efficient algorithm to compute e(P, Q) for any P, Q ∈ G1.
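The three properties can be checked numerically on a toy model. The sketch below is an assumption-laden illustration, not a real pairing: elements of G1 are represented directly by their scalars modulo a small prime q, and G2 is the order-q subgroup of the integers modulo p = 2q + 1. Such a model exposes discrete logarithms and is therefore completely insecure; it only serves to show the algebra. All constants and names are hypothetical.

```python
Q = 1019        # toy prime group order q
P_MOD = 2039    # prime p = 2q + 1, so the squares modulo p form a group of order q
G = 4           # a square modulo p, hence a generator of that subgroup; plays e(P, P)


def pairing(u, v):
    """Toy e(uP, vP): points of G1 are represented by their scalars u and v."""
    return pow(G, (u * v) % Q, P_MOD)


a, b = 123, 456
point_p, point_q = 7, 11  # two points of G1 in scalar representation

# Bilinearity: e(aP, bQ) = e(P, Q)^(ab).
lhs = pairing(a * point_p % Q, b * point_q % Q)
rhs = pow(pairing(point_p, point_q), a * b, P_MOD)

# Non-degeneracy: e(P, P) is not the identity of G2.
non_degenerate = pairing(1, 1) != 1
print(lhs == rhs, non_degenerate)
```

In practice, the pairing would be computed over elliptic curve groups with a dedicated library such as the one used in Section 6.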

Complexity Assumptions

Definition 3. (Discrete Logarithm Problem (DLP)) For unknown $a \in \mathbb{Z}_q^*$, given P1 ∈ G1 and aP1, or Z ∈ G2 and $Z^a$, compute a.

The (t, ε)-DLP assumption holds in G1 and G2 if no t-time algorithm has advantage at least ε in solving the DLP problem in G1 and G2.

Definition 4. (Co-Computational Diffie–Hellman Problem (co-CDH)) For unknown $a \in \mathbb{Z}_q^*$, given (P ∈ G1, aP ∈ G1, v ∈ G2), compute $v^a$.

The (t, ε)-co-CDH assumption holds in (G1, G2) if no t-time algorithm has advantage at least ε in solving the co-CDH problem.

4.2 System setup

In our scheme, we do not assume that there exists a trusted third party in the data aggregation system. Instead, the whole domain is managed by the operation center. To bootstrap the system, the OC first generates a bilinear map group system (G1, G2, e(·,·), q, P) as defined in Section 4.1. Then, the OC chooses two random elements $x \in \mathbb{Z}_q^*$ and H ∈ G1 as the master key. With the pair (x, H), the OC calculates W = x · H, Ppub = x · P, and Δ = e(P, H) ∈ G2. Finally, the parameters (G1, G2, e(·,·), q, P, W, Δ, Ppub, h(·), (·)k) are published, where h(·) is a one-way hash function $h: \{0,1\}^* \to \mathbb{Z}_q^*$ and (·)k denotes a secure symmetric encryption algorithm with the key k.

Prior to joining the system, consumer i should submit his or her identity IDi to the OC for registration. Then, the OC uses the master key (x, H) to compute the two key components Ai and Bi.

Finally, the OC assigns the key aki = (Ai, Bi) to consumer i through a secure channel, for example, issuing a smart card.

4.3 Cryptographic token issuance

The cryptographic token issuance is performed between the OC and each consumer as follows. Note that there exist static shared secrets between the OC and consumer i, that is, Ai and Bi. Thus, we assume that the communication between them is secured under the protection of Ai or Bi.

  • Step 1. The OC computes Yi = ri · Pτ and sends (Pτ, Yi) to consumer i, where Pτ ∈ G1 is the aggregation identifier and ri is a random number in $\mathbb{Z}_q^*$.
  • Step 2. Consumer i selects two random numbers αi, βi from $\mathbb{Z}_q^*$ and computes Y′i = αi · Yi + αiβi · Pτ. Then, consumer i sends δi to the OC, where $\delta_i = \alpha_i^{-1} h(x_iP \,\|\, Y'_i) + \beta_i$.
  • Step 3. The OC uses its private key x to calculate Si = x(ri + δi) · Pτ and sends it back to consumer i.

Finally, consumer i computes S′i = αiSi to obtain a partial blind signature [16] PBSi = Y′i ║ S′i on the message xiP with the agreed information Pτ. Any verifier can verify the validity of the blind signature by checking whether the following equation holds.

$$e(S'_i, P) = e\big(Y'_i + h(x_iP \,\|\, Y'_i)\, P_\tau,\; P_{pub}\big)$$

Correctness:

$$\begin{aligned}
e(S'_i, P) &= e\big(\alpha_i x (r_i + \delta_i) P_\tau,\; P\big) \\
&= e\big((\alpha_i r_i + \alpha_i \beta_i + h(x_iP \,\|\, Y'_i))\, P_\tau,\; xP\big) \\
&= e\big(Y'_i + h(x_iP \,\|\, Y'_i)\, P_\tau,\; P_{pub}\big)
\end{aligned}$$
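The three issuance steps and the final check can be traced end to end on a toy model. The sketch below makes several assumptions that are not taken from the scheme itself: points of G1 are represented by their scalars modulo a small prime (which is insecure and only shows the algebra), h(·) is instantiated with SHA-256, the blinded challenge is taken to be δi = αi⁻¹ · h(xiP ║ Y′i) + βi, and the verification equation is e(S′i, P) = e(Y′i + h(xiP ║ Y′i) · Pτ, Ppub). All function and variable names are hypothetical.

```python
import hashlib
import secrets

Q, P_MOD, G = 1019, 2039, 4  # toy parameters: prime order q, prime p = 2q + 1, e(P, P)


def pairing(u, v):
    """Toy e(uP, vP) with G1 points represented by their scalars (insecure)."""
    return pow(G, (u * v) % Q, P_MOD)


def h(*parts):
    """Hash arbitrary values to a non-zero scalar modulo Q (stands in for h)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (Q - 1) + 1


def rand_scalar():
    return secrets.randbelow(Q - 1) + 1


P = 1                      # generator of G1 in scalar representation
x = rand_scalar()          # OC private key
P_pub = x * P % Q          # OC public key Ppub = x * P
P_tau = rand_scalar()      # aggregation identifier

# Step 1 (OC): commitment Y = r * P_tau.
r = rand_scalar()
Y = r * P_tau % Q

# Step 2 (consumer): blind the commitment and derive the challenge delta.
x_i = rand_scalar()
msg = x_i * P % Q          # the message x_i * P to be signed
alpha, beta = rand_scalar(), rand_scalar()
Y_blind = (alpha * Y + alpha * beta * P_tau) % Q
delta = (pow(alpha, -1, Q) * h(msg, Y_blind) + beta) % Q

# Step 3 (OC): S = x * (r + delta) * P_tau.
S = x * (r + delta) % Q * P_tau % Q

# Unblinding (consumer): S' = alpha * S.
S_blind = alpha * S % Q


def verify(message, y_blind, s_blind):
    """Check e(S', P) == e(Y' + h(m || Y') * P_tau, Ppub)."""
    rhs_point = (y_blind + h(message, y_blind) * P_tau) % Q
    return pairing(s_blind, P) == pairing(rhs_point, P_pub)


print(verify(msg, Y_blind, S_blind))
```

The OC only ever sees (Yi, δi, Si), whereas the report carries (Y′i, S′i); the blinding factors αi and βi are what separate the two views.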

4.4 Aggregation announcement

To execute a data aggregation, the OC performs the following steps to inform consumers for transferring their collected data.

  • Step 1. Composes a message Pτ ║ N, where N is a random number used to resist replay attacks and ║ denotes the concatenation operator that joins two strings.
  • Step 2. Calculates a signature σ on the message Pτ ║ N with its private key x under the elliptic curve digital signature algorithm [17, 18].
  • Step 3. Broadcasts the message Pτ ║ N ║ σ as an aggregation request to consumers.

4.5 User data generation

Consider that the smart meter of consumer i has collected l kinds of data (di,1, di,2, …, di,l). After receiving the aggregation request Pτ ║ N ║ σ, consumer i executes the following steps for user data generation.

  • Step 1. Checks whether Pτ coincides with the one received in the cryptographic token issuance.
  • Step 2. Verifies the validity of the signature σ by using the public key Ppub.
  • Step 3. Computes the session key xi · Ppub = xi · xP and uses it to encrypt the collected data, that is, $c_i = (d_{i,1}, d_{i,2}, \ldots, d_{i,l}, N+1)_{x_i P_{pub}}$.
  • Step 4. Composes a message ci ║ xiP ║ PBSi and transfers it to the OC.
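The key agreement behind Step 3 can be sketched as follows: the consumer derives the key from xi and Ppub, and the OC derives the same key from x and xiP. The sketch assumes a toy group in which points are represented by scalars modulo a small prime, and it uses a SHA-256 keystream with XOR as a stand-in for the symmetric cipher (·)k; neither choice is secure or prescribed by the scheme, and all names are hypothetical.

```python
import hashlib
import secrets

Q = 1019  # toy prime group order; points of G1 are represented by scalars modulo Q


def keystream(key_point, length):
    """Expand the shared group element into `length` pseudo-random bytes."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(f"{key_point}|{counter}".encode()).digest()
        counter += 1
    return out[:length]


def xor_cipher(key_point, data):
    """Toy symmetric cipher; encryption and decryption are the same operation."""
    return bytes(a ^ b for a, b in zip(data, keystream(key_point, len(data))))


P = 1
x = secrets.randbelow(Q - 1) + 1      # OC private key
P_pub = x * P % Q                     # published as Ppub = x * P
x_i = secrets.randbelow(Q - 1) + 1    # consumer's secret scalar
x_i_P = x_i * P % Q                   # sent along with the ciphertext

nonce = 1234                          # N from the aggregation request
data = [120, 35, 7]                   # hypothetical readings d_{i,1..3}
plaintext = ",".join(map(str, data + [nonce + 1])).encode()

# Consumer side: key = x_i * Ppub.
consumer_key = x_i * P_pub % Q
c_i = xor_cipher(consumer_key, plaintext)

# OC side: key = x * (x_i * P), which is the same group element.
oc_key = x * x_i_P % Q
recovered = xor_cipher(oc_key, c_i).decode()
print(recovered)
```

The trailing N + 1 lets the OC tie the report to the current aggregation request, which is how replayed reports are detected.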

4.6 Data aggregation and response

Upon the receipt of n records from consumers, the OC first verifies the validity of each record as follows.

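  • Step 1. Verifies the validity of the blind signature PBSi = Y′i ║ S′i on xiP with the aggregation identifier Pτ by checking whether the following equation holds.
$$e(S'_i, P) = e\big(Y'_i + h(x_iP \,\|\, Y'_i)\, P_\tau,\; P_{pub}\big)$$
  • Step 2. Computes x · xiP and decrypts ci to obtain (di,1, di,2, …, di,l, N + 1).

Note that the verification of partial blind signatures can be accelerated by using the batch verification technique. Specifically, the OC can check the single aggregated equation below instead of n individual ones. With the batch verification technique, the number of time-consuming pairing operations involved in verifying n partial blind signatures is reduced from 2n to 2.

$$e\Big(\sum_{i=1}^{n} S'_i,\; P\Big) = e\Big(\sum_{i=1}^{n} \big(Y'_i + h(x_iP \,\|\, Y'_i)\, P_\tau\big),\; P_{pub}\Big)$$

Correctness:

$$\begin{aligned}
e\Big(\sum_{i=1}^{n} S'_i,\; P\Big) &= e\Big(\sum_{i=1}^{n} x\big(Y'_i + h(x_iP \,\|\, Y'_i)\, P_\tau\big),\; P\Big) \\
&= e\Big(\sum_{i=1}^{n} \big(Y'_i + h(x_iP \,\|\, Y'_i)\, P_\tau\big),\; xP\Big) \\
&= e\Big(\sum_{i=1}^{n} \big(Y'_i + h(x_iP \,\|\, Y'_i)\, P_\tau\big),\; P_{pub}\Big)
\end{aligned}$$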
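Batch verification can be illustrated on the same kind of toy model. The sketch below assumes that points of G1 are represented by scalars modulo a small prime (insecure, for illustration only), that h(·) is SHA-256 reduced to a non-zero scalar, and that each token satisfies S′i = x · (Y′i + h(xiP ║ Y′i) · Pτ). The helper names `issue_token` and `batch_verify` are hypothetical.

```python
import hashlib
import secrets

Q, P_MOD, G = 1019, 2039, 4  # toy parameters: prime order q, prime p = 2q + 1, e(P, P)


def pairing(u, v):
    """Toy e(uP, vP) with G1 points represented by their scalars (insecure)."""
    return pow(G, (u * v) % Q, P_MOD)


def h(*parts):
    """Hash arbitrary values to a non-zero scalar modulo Q."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (Q - 1) + 1


def rand_scalar():
    return secrets.randbelow(Q - 1) + 1


def issue_token(x, p_tau, msg):
    """Run the three issuance steps and return the unblinded pair (Y', S')."""
    r, alpha, beta = rand_scalar(), rand_scalar(), rand_scalar()
    y_blind = alpha * (r + beta) % Q * p_tau % Q            # Y' = alpha*Y + alpha*beta*P_tau
    delta = (pow(alpha, -1, Q) * h(msg, y_blind) + beta) % Q
    s = x * (r + delta) % Q * p_tau % Q                     # S = x(r + delta) * P_tau
    return y_blind, alpha * s % Q                           # S' = alpha * S


def batch_verify(reports, p_tau, p_pub):
    """Check all signatures with two pairings and one point multiplication per report."""
    lhs_point = sum(s for _, _, s in reports) % Q
    rhs_point = sum(y + h(msg, y) * p_tau for msg, y, _ in reports) % Q
    return pairing(lhs_point, 1) == pairing(rhs_point, p_pub)


x = rand_scalar()        # OC private key; Ppub = x * P with P represented by 1
p_pub = x
p_tau = rand_scalar()    # aggregation identifier

reports = []
for _ in range(5):
    msg = rand_scalar()  # stands for the consumer's x_i * P
    y_blind, s_blind = issue_token(x, p_tau, msg)
    reports.append((msg, y_blind, s_blind))

all_valid = batch_verify(reports, p_tau, p_pub)

# Tampering with a single signature makes the aggregated check fail.
msg0, y0, s0 = reports[0]
tampered = [(msg0, y0, (s0 + 1) % Q)] + reports[1:]
tampered_valid = batch_verify(tampered, p_tau, p_pub)
print(all_valid, tampered_valid)
```

A failed batch check does not identify which report is invalid; a practical deployment would presumably fall back to individual verification in that case.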

After verifying and decrypting the consumer reports, the OC can compute arbitrary aggregate statistics over them, including both additive and non-additive functions. According to the aggregation result, the OC can accommodate distributed power sources to maximize utilization efficiency. Meanwhile, the OC also responds with a message m to inform consumers about the current electricity status of the consumer domain as follows.

  • Step 1. Selects a random number $k \in \mathbb{Z}_q^*$ and computes
$$C_1 = k \cdot W, \quad C_2 = k \cdot P, \quad K = \Delta^k$$
  • Step 2. Encrypts the message m with the key K, that is, $c = (m \,\|\, N+2)_K$.
  • Step 3. Broadcasts C1, C2, c, and σ1 in the consumer domain, where σ1 is a signature on (C1, C2, c) under the private key x.

Upon the receipt of C1, C2, c, and σ1, each consumer first verifies the signature σ1 and then computes Equation (7) to decrypt the ciphertext c.

display math(9)

With the current electricity status report m, consumers can adjust their behavior to reduce bills.

5 SECURITY ANALYSIS AND FORMAL PROOF

5.1 Security analysis

In this section, we analyze the security of our scheme in terms of data confidentiality, data authentication, and anonymity defined in Section 3.3.

  • Data authentication: In the proposed scheme, each consumer report is in the format ci ║ xiP ║ PBSi, where ci is the encryption of the collected data under the session key and PBSi = Y′i ║ S′i is a partial blind signature on the message xiP with the agreed information Pτ. The employed partial blind signature technique [16] is existentially unforgeable against adaptive chosen-message attacks under the assumption that the co-CDH problem is hard, which guarantees that no attacker can forge a valid signature PBSi on xiP to compose a valid report. Thus, the proposed scheme achieves data authentication.
  • Data confidentiality: The consumer record is encrypted with the session key x · xiP. Obviously, it is difficult for an attacker to compute x · xiP without knowing x or xi; otherwise, the co-CDH assumption would be violated. Therefore, only the OC can access the user record. On the other hand, in the response phase, the OC employs the broadcast encryption technique [19] to protect the current electricity status such that only the authorized users in the domain can decrypt it. Specifically, the OC computes C1 = k · W, C2 = k · P, and $K = \Delta^k$ and encrypts the response message with the key K, where k is a random number selected from $\mathbb{Z}_q^*$. An outside attacker is unable to compute $\Delta^k$ in polynomial time because of the hardness of the co-CDH problem. Therefore, the response message is also kept secret.
  • Anonymity: In the proposed scheme, each individual consumer's report is bound to a blind signature [16] that attests to its authenticity. Essentially, any view of the cryptographic token issuance (Yi, δi, Si) is unlinkable to any valid signature (Y′i, S′i, m = xiP, Pτ) because there always exist blinding factors (αi, βi) that map (Yi, δi, Si) to (Y′i, S′i, xiP, Pτ). Thus, anonymity is guaranteed. One can refer to Theorem 2 in [16] for a detailed demonstration.

5.2 Formal proof

This section gives a formal proof to demonstrate the security of the proposed scheme.

Theorem 1. Let P be the proposed protocol. Let $\mathcal{A}$ be an adversary that runs within a time bound t and executes at most qs sessions. Then, we have

display math(10)

Proof. We define a sequence of games to prove this theorem. For each game Gamei, let Succi denote the event that $\mathcal{A}$ correctly guesses the bit b involved in the Test query.


  • Game0: This is the real protocol in the random oracle model. From the definition, we have

    display math(11)
  • Game1: In this game, we simulate all the protocol instances as in Game0, except that we terminate the protocol if the adversary $\mathcal{A}$ breaks the existential unforgeability of the elliptic curve digital signature algorithm (ECDSA) under chosen-message attacks. The ECDSA signature scheme is provably secure [20] under chosen-message attacks. Thus, Game1 and Game0 are indistinguishable, and we have

    display math(12)
  • Game2: In this game, we simulate all the protocol instances as in Game1, except that we terminate the protocol if the adversary $\mathcal{A}$ breaks the existential unforgeability of the adopted blind signature scheme under chosen-message attacks. Because the employed blind signature is proved to be existentially unforgeable under chosen-message attacks [16], Game2 and Game1 are indistinguishable, and we have

    display math(13)
  • Game3: In this game, we embed a co-CDH tuple (aP, bP) into the protocol in place of xP of the OC and xiP of the user. Then, we choose a session from the qs sessions. If the adversary chooses this session as the test session and wins the game, that is, correctly guesses the bit involved in the Test oracle, then we can use the adversary as a subroutine to solve the co-CDH problem. From the simulation, we can see that Game3 and Game2 are indistinguishable if the co-CDH problem is hard. So, we have

    display math(14)

Now, we analyze the probability that the adversary wins game Game3. We can see that if the probability of breaking the co-CDH problem is excluded, then the session key can be chosen as a random value. So, we have

display math(15)

Theorem 2. Let P be the proposed protocol. Let $\mathcal{A}$ be an adversary that runs within a time bound t and executes at most qs sessions. Then, we have

display math(16)

The consumer-related information is involved only in the blind signature. The blindness of the signature ensures that even the OC cannot infer the real identity of the consumer. The detailed formal proof of the signature technique can be found in [16]. Thus, user anonymity is also guaranteed in our scheme.

6 PERFORMANCE

This section evaluates the performance of the proposed scheme in terms of computation cost and communication overhead.

6.1 Computation cost

The computation cost of our data aggregation scheme mainly consists of two parts, incurred at the consumer side and at the OC side, respectively. In the user data generation, a consumer first performs two point multiplication operations in G1 to verify the validity of the signature σ signed by the OC using ECDSA. Besides, the consumer generates the ciphertext ci, which involves one point multiplication operation in G1 and one symmetric encryption operation. After receiving the response from the OC, the consumer needs to verify the attached signature and compute the encryption key K, which requires two point multiplication operations in G1, two pairing operations, and one multiplication operation in G2. Compared with time-consuming cryptographic operations such as pairing and point multiplication, the computation cost of the symmetric encryption and the multiplication in G2 is considered negligible. Thus, the consumer's cost in our scheme is only five point multiplication operations and two pairing operations, which is independent of the number of aggregated data types.

The computation tasks of the OC involve point multiplication in G1, exponentiation in G2, and pairing operations. More specifically, the OC has to perform a point multiplication in G1 to generate the signature in the aggregation announcement. For data aggregation, the OC verifies the received n user records, which costs two pairing and n point multiplication operations with the batch verification technique, and performs n point multiplication operations to compute the corresponding keys. In addition, three point multiplication operations and one exponentiation operation are performed by the OC to compute a key and the corresponding signature for the response. In total, the computation cost of the OC includes two pairing operations, 2n + 3 point multiplication operations in G1, and one exponentiation operation in G2.

To the best of our knowledge, there are only two schemes, named EPPA [12] and multidimensional privacy-preserving aggregation (MDPA) [13], that support multi-dimensional data aggregation. It is worth noting that both EPPA and MDPA can only support additive aggregation functions such as sum and average. In contrast, our scheme enables the operation center to compute both additive and non-additive aggregation functions over the collected reports. Table 1 gives the measured time of the involved cryptographic operations using the pairing-based cryptography library [21]. The experiments are conducted on a computer with an AMD Athlon II X2 3.10-GHz CPU and 4-GB RAM. The computation cost comparison is shown in Table 2. Based on the running times of the cryptographic operations, Figure 3 illustrates how the computation cost of each consumer varies with the number l of collected data types. Similarly, Figure 4 compares the computation cost of the OC with respect to the number of users in one aggregation. The figures show that the computation cost of our scheme is acceptable.

Table 1. Cryptographic operations execution time.

Symbol | Denotation | Time (ms)
Tm | A multiplication in G1 | 2.3
Te | An exponentiation in G2 | 0.6
Tp | A pairing operation | 5.4
Tex | An exponentiation in $\mathbb{Z}_{n^2}^*$ | 10.3

Table 2. Computation cost comparison.

Scheme | User | OC
Our scheme | 5Tm + 2Tp | (2n + 3)Tm + 2Tp + Te
EPPA | Tm + 4Tp + (l + 1)Tex | (n + 3)Tp + Te + 4Tm + Tex
MDPA | Tp | nTp + Tm

OC, operation center; EPPA, efficient and privacy-preserving aggregation; MDPA, multidimensional privacy-preserving aggregation.

Figure 3.

Computation cost of a consumer. EPPA, efficient and privacy-preserving aggregation; MDPA, multidimensional privacy-preserving aggregation.

Figure 4.

Computation cost of the operation center. EPPA, efficient and privacy-preserving aggregation; MDPA, multidimensional privacy-preserving aggregation.
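The cost expressions of Table 2 can be evaluated directly with the timings of Table 1. The short sketch below does this for the proposed scheme and, for comparison, for the consumer side of EPPA; the function names are hypothetical, and the results are simple arithmetic on the tabulated values rather than new measurements.

```python
# Operation timings in milliseconds, taken from Table 1.
T_M = 2.3    # point multiplication in G1
T_E = 0.6    # exponentiation in G2
T_P = 5.4    # pairing
T_EX = 10.3  # exponentiation used by EPPA


def consumer_cost_ms():
    """Consumer cost of the proposed scheme: 5*Tm + 2*Tp."""
    return 5 * T_M + 2 * T_P


def oc_cost_ms(n):
    """OC cost of the proposed scheme for n users: (2n + 3)*Tm + 2*Tp + Te."""
    return (2 * n + 3) * T_M + 2 * T_P + T_E


def eppa_consumer_cost_ms(l):
    """EPPA consumer cost for l data types: Tm + 4*Tp + (l + 1)*Tex."""
    return T_M + 4 * T_P + (l + 1) * T_EX


print(round(consumer_cost_ms(), 1))
print(round(oc_cost_ms(1000), 1))
print(round(eppa_consumer_cost_ms(10), 1))
```

The consumer cost of the proposed scheme does not depend on l, whereas the EPPA expression grows linearly with it.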

6.2 Communication overhead

Similar to the computation cost analysis, the communication overhead can also be evaluated from two aspects: the consumer side and the OC side. First, we analyze the communication overhead of each consumer in our system. For user data generation, consumer i constructs and transfers a message ci ║ xiP ║ PBSi, where $c_i = (d_{i,1}, d_{i,2}, \ldots, d_{i,l}, N+1)_{x_i P_{pub}}$, xiP ∈ G1, and PBSi = (Y′i ∈ G1, S′i ∈ G1). Thus, the size of the message is l*|d| + 3*|G1| + |N| = 512 + 32*l bits, where we assume |G1| = 160 bits for secure usage, N is a 32-bit random number used to resist replay attacks, and the size of each individual collected datum is 32 bits.

The OC first broadcasts Pτ, N, and σ in the aggregation announcement, where σ is an elliptic curve digital signature. Besides, the OC also broadcasts a response C1, C2, $(m \,\|\, N+2)_K$, and σ1 in the consumer domain after the data aggregation. In total, the size of the messages sent from the OC is 1216 bits.

Without loss of generality, we assume that smart meters collect 10 types of data for aggregation. From the aforementioned analysis, the overall communication overhead of our scheme is (512 + 320) * n + 1216 = 832 * n + 1216 bits, where n denotes the number of users in the consumer domain. In contrast, the total communication overheads of EPPA and MDPA are about 2304 * n + 800 and 1248 * n + 2240 bits, respectively. Therefore, the proposed scheme incurs a lower communication overhead.
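The overhead figures above follow from simple size accounting, which the sketch below reproduces. The element sizes are those assumed in this section, the 1216-bit OC total is taken as stated in the text, and the function names are hypothetical.

```python
G1_BITS = 160     # size of one G1 element
NONCE_BITS = 32   # size of the nonce N
DATA_BITS = 32    # size of one collected datum
OC_BROADCAST_BITS = 1216  # total size of the OC's messages, as stated in the text


def report_bits(l):
    """One consumer report: l data fields and the nonce, plus x_i*P, Y'_i and S'_i."""
    return l * DATA_BITS + NONCE_BITS + 3 * G1_BITS


def total_bits(n, l=10):
    """Overall overhead of the proposed scheme for n users and l data types."""
    return report_bits(l) * n + OC_BROADCAST_BITS


def eppa_total_bits(n):
    """Overall overhead reported for EPPA."""
    return 2304 * n + 800


def mdpa_total_bits(n):
    """Overall overhead reported for MDPA."""
    return 1248 * n + 2240


n = 1000
print(report_bits(10), total_bits(n), eppa_total_bits(n), mdpa_total_bits(n))
```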

6.3 Discussion

The partial blind signature technique [16], which serves as an essential tool in the proposed scheme, enables consumers to join a data aggregation anonymously, that is, even the OC cannot infer the owner's identity from a consumer report. To maintain unlinkability among two or more data aggregations, the cryptographic token issuance must be performed once per aggregation; it may thus become a bottleneck for high-frequency data aggregation.

To investigate this problem, we evaluate the computation delay caused by one cryptographic token issuance. Specifically, the OC computes two point multiplication operations in G1 for each user, that is, Yi = riPτ in Step 1 and Si = x(ri + δi)Pτ in Step 3. For a practical application consisting of 1000 consumers, the total delay of the OC for the cryptographic token issuance is about 1000 * 2 * 0.0023 = 4.6 s. On the other hand, each user spends only three point multiplication operations to calculate αiYi, αiβiPτ, and S′i = αiSi, which takes about 3 * 0.0023 = 0.0069 s. Compared with the set time interval between two aggregations, for example, 15 min [12], the computation delay caused by one cryptographic token issuance is deemed acceptable.
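The delay estimate can be reproduced with a few lines of arithmetic. The sketch below uses the point multiplication time of Table 1 and the operation counts given above; the function names and the 15-min interval expressed in seconds are the only additions.

```python
T_M_SECONDS = 0.0023  # one point multiplication in G1 (2.3 ms, Table 1)


def oc_issuance_delay_s(consumers):
    """OC side: two point multiplications (Steps 1 and 3) per consumer."""
    return consumers * 2 * T_M_SECONDS


def consumer_issuance_delay_s():
    """Consumer side: three point multiplications for blinding and unblinding."""
    return 3 * T_M_SECONDS


AGGREGATION_INTERVAL_S = 15 * 60  # example interval between two aggregations

oc_delay = oc_issuance_delay_s(1000)
consumer_delay = consumer_issuance_delay_s()
print(round(oc_delay, 4), round(consumer_delay, 4))
print(round(oc_delay / AGGREGATION_INTERVAL_S, 4))  # fraction of the interval used
```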

7 CONCLUSION

This paper proposes a multi-dimensional data aggregation scheme for smart grid systems, which can be used for both additive and non-additive aggregation functions. With the blind signature technique, no entity in the system can associate a transmitted report with a specific consumer. The performance comparison demonstrates that the proposed scheme is efficient in terms of both computation and communication overhead.

ACKNOWLEDGEMENTS

The authors would like to thank the editors and anonymous reviewers for their valuable comments, which significantly improved the quality of this paper. This work is supported in part by the National Science Foundation of China under grant nos. 60970140, 61272481, and 61272522.

  • The gateway (resp. aggregation node) and operation center (resp. sink node) are integrated into a single entity in EPPA [12] (resp. MDPA [13]) because data aggregation is performed on ciphertexts at the gateway (resp. aggregation node) and the aggregation result is decrypted at the operation center (resp. sink node).
