On preserving user privacy in Smart Grid advanced metering infrastructure applications

ABSTRACT

Advanced metering infrastructure (AMI) enables real-time collection of power consumption data through the Smart Grid communication network. With the ongoing deployment of smart meters (SMs), customers have started to raise concerns about the privacy of their power consumption data. The exposure of these data can lead to several privacy problems that need to be addressed before customers can be convinced to use SMs. This paper makes two contributions. First, it identifies the threats regarding user and data privacy in AMI applications and comprehensively surveys the existing solutions that address these threats. We categorize the existing approaches to privacy and discuss their pros and cons with respect to several criteria. Second, we pick one of the existing privacy solutions, namely homomorphic encryption, and evaluate its feasibility and impact on performance when used in data aggregation for real-time AMI applications. Via extensive simulations, we investigate the performance of homomorphic encryption in terms of data size and end-to-end delay and compare it with that of hop-by-hop secure data aggregation and data concatenation within a network of SMs. We finally conclude the paper with some future privacy issues that are subject to further research. Copyright © 2013 John Wiley & Sons, Ltd.

1 INTRODUCTION

The implementation of advanced metering infrastructure (AMI) at the distribution side of the Smart Grid introduces many benefits. First of all, AMI replaces manual reading of power consumption data by human operators and hence provides more accurate, real-time data readings for billing purposes. AMI also enables fine-grained power consumption data reading. In this way, the availability and the amount of power consumption data increase significantly and enable many new applications that were very difficult to accomplish in the past. Dynamic pricing, demand response, demand forecasting, and fraud detection are some examples of new Smart Grid applications made possible by real-time, fine-grained power consumption data.

However, AMI has more targets for attacks than the conventional meter: (1) the smart meter, (2) the communication network, and (3) the utility company. The availability of consumption data at these different points provides opportunities for several new attacks. In addition to various attacks on integrity, availability, and accountability [1], the most serious concerns are about privacy. Privacy in Smart Grid is related to the confidentiality of user identity and consumption data. There are two types of attacks that can pose privacy threats: (1) attacks on the communication network to capture the consumption data in transit (i.e., from smart meter to utility and from utility to other parties) and (2) attacks on the stored data (i.e., data held by the smart meter, utility company, or other third parties). For instance, a curious neighbor can collect household power usage from a smart meter (SM), either by remotely accessing the meter or by capturing the packets coming to or going out from the SM. Once the consumption data are captured, several scenarios can pose privacy threats. This is because the captured consumption data can easily be disaggregated into individual load signatures by non-intrusive load monitoring [2]. Analyzing these data over a period of time can reveal the household activities, as shown in Figure 1. Typical privacy threats include, but are not limited to:

  • Determining personal behavior patterns (can be used by marketers, government)
  • Determining specific appliances used (can be used by insurance companies)
  • Performing real-time surveillance (can be used by law enforcement and press)
  • Targeting homes for invasion (can be used by criminals)

Figure 1. Power usage to personal activity mapping [3].

These examples call for robust privacy policies and solutions to ensure that customers will accept the use of SMs. In fact, in the Netherlands, the government had to delay its smart metering deployment plan because of public criticism on privacy issues [4]. Therefore, the data should be protected/hidden both when they are stored in the SM and when they are communicated within the Smart Grid communication network. This effort, however, may not be enough if the customers do not trust the utility companies. In this case, the data and user information should also be hidden from the utilities by using anonymization techniques. Nevertheless, the utility company should still be able to identify individual customers in order to bill them on a monthly basis.

As a result of the aforementioned threats and issues, several privacy preserving approaches have been proposed for AMI applications very recently. This paper focuses on these privacy approaches and their overhead/performance issues in AMI applications. Our contributions are twofold: in the first part of the paper, we first provide a thorough survey of the existing efforts to protect user data privacy. We also classify the existing approaches on the basis of the concept of anonymizing the customer identification (ID) and provide a brief description on each approach. In the second part of the paper, we investigate the impact of privacy solutions on the performance of AMI applications that utilize data aggregation.

Data aggregation is picked because it is exploited widely in many AMI applications to reduce traffic and data size in the underlying networks. The approach to provide data aggregation privacy is based on homomorphic encryption. We investigate the impact of the use of homomorphic encryption on data size and latency compared with other approaches. The choice of these metrics stems from two facts. First, Smart Grid is expected to deliver huge amounts of power, monitoring, and other types of data and thus we need to investigate how homomorphic encryption increases or reduces the amount of data to be transmitted. Second, given the real-time requirements of Smart Grid to prevent power failures or handle demand response, we need to investigate the end-to-end (ETE) delay performance of the data delivered via aggregation. We compare the results of homomorphic encryption with hop-by-hop (HBH) secure data aggregation where the data are decrypted for aggregation at the intermediate SMs. Another baseline we consider is a variation of this approach where packet concatenation is applied without decrypting the data. Finally, we conclude the paper with open research issues.

We believe that this paper can be a good starting point for the researchers who would like to conduct research on these issues as well as the customers who are interested in privacy preserving solutions when SMs are deployed in their houses.

This paper is organized as follows. Section 2 details our categorization of the existing privacy solutions. The solutions under each category are explained in the subsequent sections. Section 6 describes homomorphic encryption in more detail. In Section 7, we describe the model and protocol setup for data collection and aggregation. Section 8 is dedicated to the performance evaluation of homomorphic encryption. Finally, in Section 9, we conclude the paper by pointing out some future directions in the area.

2 PRIVACY APPROACHES FOR AMI APPLICATIONS

The approaches to provide privacy for AMI applications differ on the basis of the concept of anonymizing the customer identity (ID). Specifically, three types of approaches were pursued in the literature so far.

The first category of approaches strives to provide privacy of the data both when they are in transit and when they are stored in the SM or on the utility companies' servers. Basically, the goal is to disassociate the customer ID from the consumption data so that even the utility company cannot access per-customer consumption data to analyze it or share it with other parties for the same purpose. In this category, the utility may not be able to perform billing either because no individual customer data can be collected. This category is referred to as Anonymization. These approaches can be implemented with the involvement of a trusted third party or additional trusted infrastructure.

A second category is a hybrid category where the customer ID is exposed to utility company but the consumption data do not reflect the actual consumption. Hybrid approaches are implemented using two different techniques: (1) load signature moderation and (2) power consumption data masking.

The final category, referred to as Non-Anonymization, strives to provide privacy of the consumption data while they are in transit within the Smart Grid communication network. In other words, the utility companies are trusted and thus they are allowed to know and store the customer ID along with its detailed consumption data. However, the data are encrypted during the transit so that only the key holder can capture them. Non-anonymization approaches can be implemented by encryption, which can be homomorphic.

Next, we describe these approaches in detail. Table 1 shows a summary of these approaches under each category.

Table 1. Privacy preserving approaches for AMI.

| Approach category | Privacy solution | Billing generator | Utility can access: fine-grained data | Utility can access: billing data |
| --- | --- | --- | --- | --- |
| Anonymization | Escrow service [5] | Utility | Per customer | Per customer |
| | Trusted gateway [6] | Smart meter | Aggregate | Not allowed |
| | Trusted third party (TTP) [7] | TTP | Aggregate | Not allowed |
| Hybrid approach | Load signature moderation [8-10] | Utility | Per customer | Per customer |
| | Power consumption data masking [4, 7, 11] | Not applicable | Aggregate | Not applicable |
| Non-anonymization | Traditional encryption [12] | Utility | Per customer | Per customer |
| | Aggregation-based encryption: hop-by-hop aggregation [13] | Utility | Per customer | Per customer |
| | Aggregation-based encryption: end-to-end homomorphic encryption [14, 15] | Not applicable | Aggregate | Not applicable |

3 ANONYMIZATION APPROACHES

Anonymization of the customer (or user) ID stems from the fact that the customer ID is mainly required for billing purposes, and billing does not need truly fine-grained power consumption data. To hide the customer ID of the power consumption data used for non-billing purposes, these data need to be sent anonymously to the utility company, although they can still be authenticated and related to a particular group, such as the SMs under the same distribution sub-station. The anonymity of the customer is achieved in several ways, as detailed next.

3.1 Anonymity through escrow service

In this approach, anonymity is provided through two distinct customer identities [5]. One identity is for billing and administrative purposes, and the other identity is for non-billing purposes (i.e., to send consumption data). The first identity is known by the utility, whereas the second identity is known only to a trusted third party escrow service. Only the escrow service keeps the mapping between the two identities, which are hard-coded in the SM. The SM uses the appropriate identity when sending data to the utility company.

Even though these identities can protect a user's privacy, this approach has several problems: (1) single point of failure of the escrow service; (2) the trustworthiness of the escrow service; (3) the relation between both identities may be inferred by capturing both types of data in a period; and (4) SM readings can still be data mined for usage patterns because they still have an ID.

3.2 Anonymity through trusted gateways

Another approach to provide anonymity is through a trusted neighborhood gateway [6]. As shown in Figure 2, SMs send tuples to this gateway. A tuple consists of consumption data along with a pseudo-random tuple ID. The gateway then sends the neighborhood-level aggregate power consumption to the utility company. All communications between the SMs, the gateway, and the utility are assumed to take place over a secure channel that provides authenticity, confidentiality, and integrity. In this approach, the utility cannot do billing but can do bill verification. For every billing cycle, the SM calculates the billing report and sends it to the utility company directly. For interactive billing verification, the utility runs a computationally expensive zero-knowledge protocol with the SM through a series of challenge–response exchanges.

Figure 2. Anonymity through trusted gateways.

In this approach, demand response may not be possible because individual SMs are not accessible by the utility. In addition, relying on SMs for billing calculation poses a software update issue whenever the billing regulation changes. In such a case, millions of SMs may need to update their software, which may not be feasible. To overcome this problem, another approach has been pursued in which a trusted third party (TTP) replaces the gateway and performs the billing tasks [7].

4 HYBRID APPROACHES

Hybrid approaches strive to modify the customer consumption data before they leave the house. In these approaches, the ID of the customer is not anonymous, but because the consumption data are modified, an adversary/utility/service provider cannot draw correct conclusions from the data even if they are available in plaintext. On the basis of the type of data modification technique, the following approaches are possible.

4.1 Load signature moderation

Managing power usage within the household through “Load Signature Moderation” or “Load Signature Reshaping” is performed before the power consumption data are reported to SM. This approach, however, requires an internal power supply such as a rechargeable battery and a power router. This battery may be discharged or recharged on the basis of a certain policy. For instance, the best effort algorithm proposed in [8], which is a deterministic algorithm, can be used to keep the output load constant whenever possible. However, this is not the only solution. A stochastic algorithm is proposed in [9] for the same purpose. Similarly, a non-intrusive load leveling algorithm is presented in [10] that attempts to provide privacy by maintaining an adaptable target load profile by taking into account all battery states.

There are three load shaping strategies that can be used to protect the user privacy: hiding, smoothing, and obfuscation. Hiding strategy makes the rechargeable battery fully supply the appliance's requirement. Then, the battery is slowly recharged as shown in Figure 3(a). In smoothing strategy, a mix of supply from rechargeable battery and from the utility company is used to support an appliance. Hence, the reported power consumption of an appliance to SM is less than its load signature as shown in Figure 3(b). In obfuscation strategy, the appliance's load signature is obfuscated with a series of battery recharging events as shown in Figure 3(c).

Figure 3. Examples of load shaping strategy [8]. (a) Hiding, (b) smoothing, and (c) obfuscation.

This approach does not provide full privacy as there will still be data reported and exposed to the utility and adversaries. Such data can still give an idea on certain user behaviors such as being present at home.

4.2 Power usage data masking

Power usage data masking is performed when the SM reports the power consumption data to the utility company: a mask value is added to the data. There are several methods for generating the mask value, such as a secret sharing method [4], drawing a random value from a known distribution with known variance and expectation [7], and a distributed Laplacian perturbation algorithm [11]. For instance, in the secret sharing approach, the secret values are generated in such a way that when the secret values of all SMs are added together, the result is a certain known value such as 0. Before sending the meter reading report, the SM adds this secret value to its real power consumption data so that the adversary cannot learn the real power consumption data. By adding all the masked data, the utility company obtains the total real power consumption.
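
The secret sharing idea can be illustrated with a minimal sketch. It assumes masks that sum to 0 modulo a fixed value and, for simplicity, generates them in one place; how the masks are actually produced and distributed in [4] is not reproduced here. The names `MODULUS`, `generate_masks`, `mask_reading`, and `aggregate` are hypothetical.

```python
import secrets

MODULUS = 2**32  # assumed ring size, large enough to hold the neighborhood total


def generate_masks(num_meters, modulus=MODULUS):
    """Return one mask per meter such that all masks sum to 0 modulo `modulus`."""
    masks = [secrets.randbelow(modulus) for _ in range(num_meters - 1)]
    masks.append(-sum(masks) % modulus)  # the last mask cancels all the others
    return masks


def mask_reading(reading, mask, modulus=MODULUS):
    """What a single SM reports: its real reading plus its secret mask."""
    return (reading + mask) % modulus


def aggregate(masked_readings, modulus=MODULUS):
    """What the utility computes: the masks cancel, leaving the true total."""
    return sum(masked_readings) % modulus


readings = [512, 1300, 87, 940]  # hypothetical per-meter consumption values
masks = generate_masks(len(readings))
reports = [mask_reading(r, m) for r, m in zip(readings, masks)]
total = aggregate(reports)  # equals sum(readings) as long as the total is below MODULUS
```

Each individual report looks like a random number, yet the sum of all reports is the true neighborhood total.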

The overhead and scalability of the approach should be carefully considered given that the generation of secret values is based on certain computations.

5 NON-ANONYMIZATION APPROACHES

In this type of approach, the main goal is to provide confidentiality for the consumption data while they are traveling from the SM to the utility company. In addition, the intermediate nodes would not be able to access the consumption data because they are encrypted. In this way, if the data are captured by adversaries, they cannot access the data to do load signature analysis. Various encryption mechanisms can be used for this category. We categorize them on the basis of their ability to perform data aggregation, which is needed in many contexts for statistical purposes.

5.1 Traditional encryption

Typical encryption solutions rely on either symmetric key cryptography or asymmetric key cryptography. Bisoi [12] recommends the use of asymmetric key cryptography because symmetric key cryptography has some issues: (1) storing a shared secret key in the SM increases the risk of the key being stolen and (2) the shared secret key may be found through pattern analysis. Nonetheless, symmetric key cryptography is much faster, which matters if the application is delay-sensitive.

5.2 Aggregation-based encryption

Apart from the exposure problem to the utility company, traditional approaches may also suffer when data aggregation is to be employed within the network (e.g., at the intermediate nodes). Data aggregation can be performed through packet concatenation or on the basis of functions such as sum and average. Instead of sending individual plaintext data to destination, aggregating the plaintexts and sending aggregated data will reduce the total bandwidth used. In fact, in many cases, this type of aggregated data is needed for statistical purposes. However, data aggregation also poses privacy issues because it requires the performing node to have access to plaintext to do the aggregation operation. To overcome this issue, two approaches were proposed.

5.2.1 Hop-by-hop concatenation

The first approach just performs concatenation of the encrypted packets when doing aggregation at the aggregator nodes [13]. In this way, there is no need to decrypt the data. Two different symmetric key pairs are used. The first key pair is used by the SM and the utility company to provide ETE encryption for SM data. The second key pair is used by the aggregator and its one-hop parent node for HBH authentication.

Even though there is some additional overhead due to the implementation of HBH security, data aggregation via the packet concatenation mechanism still gives some bandwidth saving while providing privacy protection when the data are in transit. Nonetheless, there is no real saving on the size of the data sent because the data are concatenated. The only saving is on the header count. In addition, when the channel is lossy, the packet drop rate would be higher because larger packets are traveling.

5.2.2 End-to-end encryption via homomorphic approach

The second approach is based on homomorphic encryption schemes, where the aggregation operations are performed on the ciphertext. Basically, homomorphic encryption allows an arithmetic operation (i.e., multiplication) to be carried out directly on the ciphertext. Two examples of approaches that use homomorphic encryption to provide privacy protection are in-network data aggregation [14] and leakage/fraud detection [15].

In [14], an aggregation tree that covers all the SMs in the neighborhood is constructed. The root of the tree is the sink node. Every SM in the aggregation tree encrypts its power consumption data using the public key of the sink node, takes inputs from its children nodes, aggregates them by multiplication operation, and then forwards the result to its parent node. The root of the tree performs multiplication on all the incoming data and then decrypts the result to obtain the final result (i.e., total power consumption). Hence, the privacy of household power consumption is maintained because the aggregation operation is performed on the ciphertext.
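
The tree-based aggregation described above can be sketched as follows. This is an illustration only: it assumes a Paillier-style additive scheme with the common simplification g = N + 1 and uses tiny primes that offer no security. The class `Meter` and the functions `keygen`, `encrypt`, and `decrypt` are hypothetical names, not the implementation of [14].

```python
import math
import secrets


def keygen(p=1789, q=1861):
    """Toy Paillier key pair with g = n + 1 (tiny primes, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid shortcut when g = n + 1 (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt m under the sink's public key n with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c with the sink's private key."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


class Meter:
    """An SM in the aggregation tree; leaves simply have no children."""

    def __init__(self, reading, children=()):
        self.reading = reading
        self.children = list(children)

    def report(self, n):
        """Encrypt the own reading and multiply in the children's ciphertexts."""
        c = encrypt(n, self.reading)
        for child in self.children:
            c = (c * child.report(n)) % (n * n)
        return c


n, priv = keygen()
agg_1 = Meter(210, [Meter(120), Meter(340)])  # hypothetical readings
agg_2 = Meter(60, [Meter(95)])
product = 1
for top in (agg_1, agg_2):  # the sink multiplies what its children send
    product = (product * top.report(n)) % (n * n)
total = decrypt(n, priv, product)  # sum of all five readings
```

No intermediate meter ever sees a plaintext reading other than its own; only the sink, which holds the private key, can decrypt the aggregate.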

In [15], the goal of leakage detection is to find any leakage or fraud by examining aggregated power consumption in a neighborhood without revealing any information about the individual power consumption, even when the sink node is malicious. To achieve this goal, a combination of homomorphic encryption and additive secret sharing is used.

In a neighborhood of $N$ customers connected to a substation, the substation first distributes the public key certificates of all SMs. Then, every SM $M_i$, $i = 1, 2, 3, \ldots, N$, instead of sending its power consumption $m_i$ to the substation, picks $N$ random numbers $a_{i1}, a_{i2}, \ldots, a_{iN}$ such that $m_i = \sum_j a_{ij} \bmod n$ for a large $n$. $M_i$ encrypts each of these $a_{ij}$ with the public key of $M_j$ (i.e., $PUB_j$), giving $c_{ij} = E_{PUB_j}(a_{ij})$ for $j = 1, 2, \ldots, i-1, i+1, \ldots, N$. While $a_{ii}$ remains in $M_i$, the $N-1$ encrypted values $c_{ij}$ are sent to the substation. The substation multiplies the $N-1$ values that were encrypted with the public key of $M_i$ and then sends the result to $M_i$. $M_i$ then decrypts it with its private key (i.e., $PK_i$), adds $a_{ii}$ to it, and sends the final plaintext value back to the substation. For leakage detection, the substation compares the aggregated power consumption, which is equal to the sum of the final plaintext values from all $M_i$, with its own power consumption record. The leakage detection protocol operation is shown as follows:

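$$
\begin{aligned}
M_i \rightarrow \text{Substation}: &\quad c_{ij} = E_{PUB_j}(a_{ij}), \quad j \neq i \\
\text{Substation} \rightarrow M_i: &\quad \prod_{j \neq i} c_{ji} \\
M_i \rightarrow \text{Substation}: &\quad D_{PK_i}\Bigl(\prod_{j \neq i} c_{ji}\Bigr) + a_{ii} \\
\text{Substation}: &\quad \sum_{i=1}^{N} \Bigl( D_{PK_i}\Bigl(\prod_{j \neq i} c_{ji}\Bigr) + a_{ii} \Bigr) \equiv \sum_{i=1}^{N} m_i \pmod{n}
\end{aligned}
$$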
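
The arithmetic behind this protocol can be checked with a short simulation. This sketch covers only the additive secret sharing step: the homomorphic encryption layer is deliberately omitted, so the substation's ciphertext multiplication is replaced by a plain addition of the shares. The modulus value, the readings, and the names `split_reading` and `run_protocol` are assumptions made for illustration.

```python
import secrets

N_MOD = 2**32  # stands in for the "large n" of the protocol (assumed value)


def split_reading(m_i, num_meters, modulus=N_MOD):
    """Split reading m_i into num_meters shares that sum to m_i mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(num_meters - 1)]
    shares.append((m_i - sum(shares)) % modulus)
    return shares


def run_protocol(readings, modulus=N_MOD):
    """Return the per-meter final values and their sum as seen by the substation."""
    num = len(readings)
    # a[i][j] is the share meter i addresses to meter j; a[i][i] never leaves meter i.
    a = [split_reading(m, num, modulus) for m in readings]
    final_values = []
    for i in range(num):
        # In the real protocol the substation multiplies the ciphertexts addressed
        # to meter i and meter i decrypts the product; here we add plaintext shares.
        received = sum(a[j][i] for j in range(num) if j != i)
        final_values.append((received + a[i][i]) % modulus)
    return final_values, sum(final_values) % modulus


readings = [430, 1210, 75, 640]  # hypothetical consumption values
final_values, aggregated = run_protocol(readings)
substation_record = 2355  # hypothetical measurement taken at the substation
leak_detected = aggregated != substation_record
```

Each final value mixes shares from all meters, so it reveals nothing about a single household, while the sum of the final values still equals the neighborhood total.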

In the rest of the paper, we will assess the impact of using homomorphic encryption on the performance of a network of SMs performing data aggregation. Note that a very preliminary evaluation has been performed in [16]. This paper's evaluation portion extends and improves this evaluation in several ways. We evaluate a new approach on the basis of concatenation as will be detailed shortly and also look at other details such as multiplication algorithm used in homomorphic encryption and use of increased key sizes.

Before describing the evaluation setup and specific assumptions, we provide some background information on homomorphic cryptosystems and how to select the appropriate system for AMI applications.

6 HOMOMORPHIC CRYPTOSYSTEMS

6.1 Overview

Homomorphic cryptosystems use either symmetric or asymmetric key for encryption and decryption. Symmetric key cryptography is faster than asymmetric key cryptography. However, very few symmetric key homomorphic cryptosystems have been proposed. Moreover, most of them have been broken [17].

Let $m_1$ and $m_2$ be two plaintexts, and let $E$ and $D$ denote encryption and decryption. Homomorphic encryption in homomorphic cryptosystems can be classified as follows:

  1. Additive homomorphic encryption. The result of the addition operation on plaintext values can be obtained by decrypting the result of the multiplication operation on the corresponding encrypted values.
$$D(E(m_1) \cdot E(m_2)) = m_1 + m_2$$
  2. Multiplicative homomorphic encryption. The result of the multiplication operation on plaintext values can be obtained by decrypting the result of the multiplication operation on the corresponding encrypted values.
$$D(E(m_1) \cdot E(m_2)) = m_1 \cdot m_2$$
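
The multiplicative case can be seen in a few lines with unpadded textbook RSA. This is a toy sketch with very small, insecure parameters chosen only for illustration; the function names are hypothetical.

```python
# Textbook RSA with toy parameters (insecure, illustration only).
p, q = 61, 53
n = p * q                 # public modulus
phi = (p - 1) * (q - 1)
e = 17                    # public exponent
d = pow(e, -1, phi)       # private exponent (modular inverse, Python 3.8+)


def encrypt(m):
    return pow(m, e, n)


def decrypt(c):
    return pow(c, d, n)


m1, m2 = 7, 3
product_cipher = (encrypt(m1) * encrypt(m2)) % n  # operate on ciphertexts only
recovered = decrypt(product_cipher)               # equals (m1 * m2) mod n
```

Multiplying the two ciphertexts yields a valid encryption of the product of the plaintexts, which is exactly the multiplicative property stated above.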

Many homomorphic cryptosystems and their variants can be found in the literature. Because our focus in this paper is not to provide a survey of these approaches, we only summarize a selection of homomorphic cryptosystems in Table 2. A more comprehensive review can be found in [17] and in [18, 19] for elliptic curve versions. Among these cryptosystems, we discuss the selection of the most appropriate one for AMI applications in the next subsection.

Table 2. Homomorphic cryptosystems.

| Cryptosystem | Additive | Multiplicative | Type of key | Message expansion factor | Security remark |
| --- | --- | --- | --- | --- | --- |
| Paillier [20] | Yes | No | Asymmetric key | 2 | Semantically secure (IND-CPA) |
| Okamoto-Uchiyama [21] | Yes | No | Asymmetric key | 3 | Provably secure, equivalent to the difficulty of the factorization problem |
| Naccache-Stern [22] | Yes | No | Asymmetric key | ≥4 | Provably secure under the prime residuosity assumption |
| RSA [23] | No | Yes | Asymmetric key | 1 | Not semantically secure |
| El Gamal [24] | No | Yes | Asymmetric key | 2 | Semantically secure (IND-CPA) |
| Domingo-Ferrer [25] | Yes | Yes | Symmetric key | 2 | Vulnerable to known plaintext attack |
| Castelluccia, Mykletun, Tsudik [26] | Yes | No | Symmetric key | Adds a small number of bits | Provably secure |
| Elliptic curve El-Gamal [19] | Yes | No | Asymmetric key | 4 | Elliptic curve discrete logarithm problem (ECDLP) |

6.2 Homomorphic cryptosystem selection for data aggregation applications

The selection of homomorphic cryptosystem for AMI applications needs to consider three main criteria:

  1. Functionality: Because we are considering homomorphic encryption for data aggregation that requires a sum function, additive homomorphic cryptosystems are the possible candidates.
  2. Security: AMI applications typically have high security requirements. The ability of the cryptosystems to resist attacks is also very important. For instance, Paillier and El-Gamal are both indistinguishable under chosen plaintext attack (IND-CPA) and hence they are preferred to Domingo–Ferrer or RSA.
  3. Performance: AMI applications have different ETE delay, priority, reliability, and bandwidth requirements [27, 28]. Therefore, the performance of homomorphic cryptosystems for AMI applications needs to be considered in order to meet these requirements. We pick two metrics under this criterion:
    1. Message Expansion Factor: Message expansion factor of homomorphic encryption shows the size of ciphertext compared with plaintext as a result of homomorphic encryption operation. For instance, 10 bytes of plaintext message becomes around 20 bytes of ciphertext message when using the Paillier cryptosystem because it has the expansion factor of 2. Hence, message expansion factor will influence the available bandwidth. Larger ciphertext will consume more bandwidth while transmitted on the communications network than the smaller one.
    2. Computational Overhead: Encryption and decryption computational costs should be minimal to reduce the processing delay. Also note that larger ciphertext requires a more intensive arithmetic computation during the data aggregation operation. For example, it is faster to do 2 byte multiplication than 10 byte multiplication operation. Finally, different choices of algorithms for the cryptographic computation also affect the computational overhead [29].

On the basis of the aforementioned criteria, the Paillier cryptosystem can be considered the preferred choice for privacy-preserving ETE aggregation because of its lower message expansion factor, security, and functionality. Furthermore, its encryption cost is not too high and its decryption is efficient [17]. The Paillier cryptosystem is also non-deterministic because of the random number r, as shown in Algorithm 1. This random number causes the same message to be encrypted into different ciphertexts. In the algorithm, the basic notation is as follows: $Z_N$ is the set of integers modulo $N$, $Z_N^*$ is the set of integers co-prime to $N$, and $Z_{N^2}^*$ is the set of integers co-prime to $N^2$.

[Algorithm 1: Paillier cryptosystem]
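
For reference, the standard textbook formulation of the Paillier cryptosystem is summarized below. It is a sketch in the usual notation and may differ in detail from the Algorithm 1 listing; the symbols $\lambda$, $\mu$, $g$, and the function $L$ follow the common presentation and are assumptions with respect to this paper.

```latex
\begin{aligned}
&\textbf{Key generation:} && N = pq,\quad \lambda = \operatorname{lcm}(p-1,\,q-1),\quad g \in Z_{N^2}^{*},\\
& && \mu = \bigl(L(g^{\lambda} \bmod N^{2})\bigr)^{-1} \bmod N,\quad L(u) = \frac{u-1}{N} \\
&\textbf{Encryption:} && c = E(m) = g^{m} \cdot r^{N} \bmod N^{2},\qquad m \in Z_N,\ r \in Z_N^{*} \text{ chosen at random} \\
&\textbf{Decryption:} && m = D(c) = L(c^{\lambda} \bmod N^{2}) \cdot \mu \bmod N \\
&\textbf{Additive property:} && E(m_1) \cdot E(m_2) = g^{m_1+m_2} (r_1 r_2)^{N} \bmod N^{2}
  \;\Rightarrow\; D\bigl(E(m_1) \cdot E(m_2)\bigr) = m_1 + m_2 \bmod N
\end{aligned}
```

The public key is $(N, g)$ and the private key is $(\lambda, \mu)$. Because a ciphertext lives modulo $N^2$ while a plaintext lives modulo $N$, the ciphertext is twice as long as the plaintext block, which is where the expansion factor of 2 comes from.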

Even though the El-Gamal cryptosystem has a similar message expansion factor and security level to Paillier, it does not have the required functionality (i.e., the sum function). Its variant on elliptic curves (EC), which is called elliptic curve El-Gamal (EC-EG), however, has the additive homomorphic property. It performs an addition operation on the ciphertexts to produce the sum and has the same security level as the El-Gamal cryptosystem. It also has the benefit of EC cryptography in terms of small key size. An EC over a 163-bit field gives the same level of security as a 1024-bit RSA modulus or Diffie–Hellman prime [30].

Nonetheless, EC-EG has a higher message expansion factor than the El-Gamal cryptosystem, namely 4. As shown in Algorithm 2, each message needs to be mapped to an elliptic curve point (x,y), and this point is then encrypted into two ciphertexts, where each ciphertext has two components. Moreover, there are some issues that need to be addressed before implementing EC-EG for AMI applications. We need to select the EC parameters such as the underlying finite field and the coordinate system. We should also determine a mapping function from a message to an EC point and vice versa. This mapping function should be deterministic, such that the same plaintext always maps to the same EC point, and should have the following property: $\mathrm{map}(m_1 + m_2 + \cdots + m_n) = \mathrm{map}(m_1) + \mathrm{map}(m_2) + \cdots + \mathrm{map}(m_n)$ [31]. Unfortunately, there are not many existing works on mapping plaintext messages to an EC point. The available mapping functions are probabilistic algorithms [32] that are based on brute force approaches. When the search space is large, brute force approaches consume substantial resources to find an EC point for each message. Therefore, in the experiments, we will use the Paillier cryptosystem.

[Algorithm 2: Elliptic curve El-Gamal (EC-EG) cryptosystem]

7 PRIVACY PRESERVING DATA AGGREGATION PROTOCOL VIA HOMOMORPHIC ENCRYPTION

In this section, we describe the privacy preserving protocol based on ETE homomorphic encryption as well as how we address the targeted security goals.

7.1 Network architecture

We consider a mesh network of SMs that serves as the underlying network for AMI applications. Typically, in-network data aggregation involves three types of nodes: sink node, aggregator, and leaf node. A sink node initiates a query and acts as the end destination of the aggregation results. An aggregator node (e.g., an SM) is an intermediate node that receives and combines meter readings from its child nodes (e.g., SMs) and then forwards a single intermediate result to its parent node. A leaf node takes data readings and forwards them to its parent node. In this paper, we assume that there is one sink node, but the results can be easily extended to multiple sink nodes where each sink forms a different tree. An SM may act as a leaf node or an aggregator depending on its location on the aggregation tree. As an aggregator, an SM can perform both data reading and data aggregation.

We use a multilevel network tree (i.e., acyclic) that consists of many SMs and one gateway as the sink node as shown in Figure 4. Because we want to explore the performance of ETE homomorphic encryption under various tree depth levels, we assume that the aggregation tree is known in advance and remains static during the experiments.

Figure 4. Multilevel network tree.

In the next subsection, we describe the data collection protocol and how it addresses the relevant security threats.

7.2 Data collection protocol

The sink node initiates data aggregation by periodically sending a query to the network. SMs at the lowest layer (leaf nodes) send their encrypted power consumption data to their parent SM at the upper layer. These intermediate SMs, depending on the type of encryption operation, perform the aggregation operation on the ciphertext before sending the result to their parent SM or to the sink. These nodes are referred to as aggregators in the rest of the paper, whereas SM will be used to refer to leaf nodes in the communication tree. The sink node computes the average power consumption by dividing the total sum, which is in plaintext, by the number of readings. We make the following assumptions as part of the data collection protocol:

  • The communication channels are assumed to be perfect and lossless so that there is no packet loss.
  • Key generation and distribution are performed before the sink node initiates a query. Hence, each node already has its appropriate keys based on its role. An aggregator has both public and private keys, whereas a leaf node only has a public key.
  • Data aggregation is performed at an intermediate SM using the sum operator and the result is transmitted as soon as the aggregator SM receives data from all of its children.
  • Each aggregator already has the ID list of its direct child SMs.

The protocol can also handle the following attacks in addition to providing privacy:

  • Eavesdropping: An eavesdropping attack may take place in AMI applications when data are in transit on the communication network; the attacker overhears the transmission to obtain private information.
  • Data pollution: Data pollution may take place in AMI when an external attacker performs a false data injection attack or when previous meter readings from some internal nodes reach an aggregator (i.e., a data freshness attack). A data freshness attack may tamper with the current aggregation result.
  • Node failure: A node failure occurs when an SM or an aggregator fails to respond to queries or fails to forward its reading or the intermediate aggregation results. As a result, the sink node receives an incorrect aggregation result.

The three threats mentioned earlier are addressed as follows:

Each SM has a unique ID. Each packet sent from a downstream node has this ID in the packet. The aggregator node verifies this ID using a simple look-up mechanism on its ID list. If the incoming packet comes from an authorized node, the received packet will be included in the aggregation operation. This mechanism also avoids data pollution from external attacker (i.e., false data injection attack).

We use a timestamp to counter the data freshness attack. If the timestamp of a packet is less than the timestamp assigned in the node, the operation or the data received will be discarded. A timeout is used to avoid an aggregator waiting indefinitely in case some of its child nodes are unable to report. Initially, the sink node announces its timeout value to its first level aggregator nodes. Subsequently, depending on its position in the aggregation tree, an aggregator will adjust its timeout value according to the timeout value of its parent node: maximum sink timeout limit − (tree depth × minimum delay between depth levels).
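The ID look-up, the freshness test, and the timeout rule can be illustrated with a minimal Java sketch. The class and method names are hypothetical, the 1000 ms sink timeout is an assumed value, and the 50 ms inter-level delay is the value used in the experiments; this is not the code used in our simulation.

```java
import java.util.Set;

/** Illustrative aggregator-side checks; all names here are hypothetical. */
final class AggregatorChecks {

    /** Timeout at a tree depth: max sink timeout minus depth times the minimum inter-level delay. */
    static long timeoutMillis(long maxSinkTimeoutMillis, int depth, long minLevelDelayMillis) {
        long timeout = maxSinkTimeoutMillis - (long) depth * minLevelDelayMillis;
        if (timeout <= 0) {
            throw new IllegalArgumentException("no timeout budget left at depth " + depth);
        }
        return timeout;
    }

    /** A packet is aggregated only if its sender is a registered child and it is not stale. */
    static boolean accept(Set<Integer> childIds, int senderId, long packetTimestamp, long nodeTimestamp) {
        boolean knownChild = childIds.contains(senderId);   // ID look-up against the child list
        boolean fresh = packetTimestamp >= nodeTimestamp;   // older packets are discarded
        return knownChild && fresh;
    }

    public static void main(String[] args) {
        // Assumed values: 1000 ms sink timeout, 50 ms minimum delay between depth levels.
        System.out.println(timeoutMillis(1000, 2, 50));            // 900
        System.out.println(accept(Set.of(7, 8), 7, 105L, 100L));   // true
        System.out.println(accept(Set.of(7, 8), 9, 105L, 100L));   // false (unknown sender)
        System.out.println(accept(Set.of(7, 8), 8, 99L, 100L));    // false (stale reading)
    }
}
```

With these assumed values, an aggregator at depth 2 waits at most 900 ms for its children, which leaves its ancestors enough time to forward the partial result before the sink times out.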

To handle inaccurate aggregation results in case of node failures, we compare the number of readings reported to the sink node with the actual SM count. Note that each SM, whether a leaf node or an intermediate SM, sends a two-tuple of information: (1) an encrypted power consumption or an encrypted aggregate power consumption and (2) an encrypted number of power consumption data. For a leaf node, the number of data is 1, whereas the number of data at an intermediate SM depends on the number of its child nodes. The sink node verifies the received number of power consumption data before it calculates the average power consumption. When the number of data is less than the number of registered customers, this indicates a node failure.

As part of the protocol, four operation codes are defined as shown in Table 3. OPCODE is the type of operation used to specify the operation that needs to be performed by each SM. ID is the identity of the sender node. TIMESTAMP is used for data freshness. DATA is two-tuples of information that represents different information based on OPCODE as defined in Table 3.

Table 3. Operation code definition.

OPCODE  Type of operation                                                             DATA
K       Public key distribution to all nodes, initiated by the sink node              Public key: N||g
P       Private key distribution to the aggregator nodes, initiated by the sink node  Private key: λ||μ
S       Data request to all nodes, initiated by the sink node                         Parent node timeout (see assumption 5)
R       Data reporting from all nodes to the sink in response to the data request     At the leaf node: E(m)||E(1)
        operation, generated by smart meters                                          At the aggregator node: math formula, n is the number of nodes involved in the aggregation
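The two-tuple reporting of operation R can be sketched in Java as follows. The sketch assumes a Paillier-style additively homomorphic scheme, which matches the key formats N||g and λ||μ in Table 3, and picks g = N + 1 for simplicity. All class and method names are hypothetical, a single object holds both keys only for brevity, and the 64-bit key is for illustration rather than a secure setting.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.List;
import java.util.OptionalDouble;

/** Paillier-style sketch of E(m)||E(1) reporting; names and parameters are illustrative. */
final class PaillierAggregationSketch {
    final BigInteger n, g, n2;            // public key N || g, and N^2
    private final BigInteger lambda, mu;  // private key lambda || mu
    private final SecureRandom rnd = new SecureRandom();

    PaillierAggregationSketch(int keyBits) {
        BigInteger p, q;
        do {
            p = BigInteger.probablePrime(keyBits / 2, rnd);
            q = BigInteger.probablePrime(keyBits / 2, rnd);
        } while (p.equals(q));
        n = p.multiply(q);
        n2 = n.multiply(n);
        g = n.add(BigInteger.ONE);        // assumed generator choice
        BigInteger p1 = p.subtract(BigInteger.ONE), q1 = q.subtract(BigInteger.ONE);
        lambda = p1.multiply(q1).divide(p1.gcd(q1));   // lcm(p-1, q-1)
        mu = l(g.modPow(lambda, n2)).modInverse(n);
    }

    private BigInteger l(BigInteger x) { return x.subtract(BigInteger.ONE).divide(n); }

    BigInteger encrypt(long m) {
        BigInteger r;
        do {
            r = new BigInteger(n.bitLength(), rnd);
        } while (r.signum() == 0 || r.compareTo(n) >= 0 || !r.gcd(n).equals(BigInteger.ONE));
        return g.modPow(BigInteger.valueOf(m), n2).multiply(r.modPow(n, n2)).mod(n2);
    }

    long decrypt(BigInteger c) {
        return l(c.modPow(lambda, n2)).multiply(mu).mod(n).longValueExact();
    }

    /** Leaf node report: E(m) || E(1). */
    BigInteger[] leafReport(long reading) {
        return new BigInteger[] { encrypt(reading), encrypt(1) };
    }

    /** Aggregator: multiply ciphertexts component-wise (adds plaintexts) without decrypting. */
    BigInteger[] aggregate(List<BigInteger[]> children) {
        BigInteger sum = BigInteger.ONE, count = BigInteger.ONE;  // 1 is a trivial encryption of 0
        for (BigInteger[] child : children) {
            sum = sum.multiply(child[0]).mod(n2);
            count = count.multiply(child[1]).mod(n2);
        }
        return new BigInteger[] { sum, count };
    }

    /** Sink: decrypt both fields; an empty result signals fewer readings than registered meters. */
    OptionalDouble sinkAverage(BigInteger[] report, long registeredMeters) {
        long count = decrypt(report[1]);
        if (count < registeredMeters) return OptionalDouble.empty();
        return OptionalDouble.of((double) decrypt(report[0]) / count);
    }

    public static void main(String[] args) {
        PaillierAggregationSketch ph = new PaillierAggregationSketch(64);
        BigInteger[] agg1 = ph.aggregate(List.of(ph.leafReport(120), ph.leafReport(80)));
        BigInteger[] agg2 = ph.aggregate(List.of(ph.leafReport(100), ph.leafReport(60)));
        BigInteger[] root = ph.aggregate(List.of(agg1, agg2));
        System.out.println(ph.sinkAverage(root, 4)); // OptionalDouble[90.0]
        System.out.println(ph.sinkAverage(root, 5)); // OptionalDouble.empty
    }
}
```

In the usage example, four leaf readings are combined by two aggregators and a root; the sink recovers the average only when the decrypted count matches the number of registered meters.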
 

8 PERFORMANCE EVALUATION

8.1 Baselines and performance metrics

We used a Java-based simulation to implement the privacy-preserving data aggregation. Our goal is to assess the performance of privacy-preserving homomorphic data aggregation while being able to resist or detect the aforementioned attacks. The approach is represented as ETE-H in the graphs and tables. We compare the performance of ETE-H with two other protocols:

  • HBH aggregation (HBH-A): HBH-A basically decrypts the data, performs aggregation on the plaintext, and encrypts the aggregated data before sending them. Therefore, it exposes the data to the intermediate nodes. In order to provide user privacy, we assume that this approach can be complemented with a different mechanism where pseudonyms (instead of real IDs) are used. The pseudonyms are associated with the IDs of SMs but this association is known only by the sink node as performed in [5]. As a result, even if the data are exposed to intermediate nodes, they cannot be associated with a real ID.
  • HBH concatenation (HBH-C): One other alternative to homomorphic encryption is to perform concatenation of encrypted packets at the intermediate nodes and send the concatenated packet to the upper level as used in [13]. This is somewhat similar to ETE homomorphic encryption but there is no operation on the packets. They are just concatenated and a larger packet is created. The final packet is decrypted and the aggregation function is performed at the sink node.

For performance evaluation, we used the following metrics:

  • ETE latency: This is the elapsed time between the sink node sending a query and receiving the final average value. We assume that the time spent for source authentication is so small compared with the arithmetic operation and can be ignored in the ETE delay calculation. Moreover, this authentication process has the same effect whether it is in ETE-H, HBH-A, or HBH-C.
  • Encrypted data size: This metric measures the average size, in bits, of the messages generated by the leaf nodes and by the aggregator nodes. The sizes of messages affect the number of bits/packets required to transmit them.

We performed four experiments to observe the effect of the following parameters on ETE latency and data size for ETE-H, HBH-A, and HBH-C: (1) key size, (2) depth of the tree, (3) the total number of aggregators per level, and (4) the effect of multiplication algorithm on homomorphic encryption. The depth of the tree refers to the depth starting from the sink (as 0) to the leaf nodes (excluding them).

All the experiments used 64-bit keys, except for the key size experiment. We used a total of 36 SMs in all experiments. For experiment 1, we used a two-level network topology that has two aggregators at each level. Each aggregator has eight SMs and nine SMs at levels 1 and 2, respectively. For the tree-depth experiment (experiment 2), we used balanced network trees. We started from a flat network with a depth of 1, 2 aggregators, and 17 leaf nodes per aggregator. Then we increased the tree depth by 1, kept the number of aggregators per level at 2 (as listed in Table 4), and repeated the experiments up to a tree depth of 5. For experiment 3, the number of aggregators per level is changed whereas the depth is kept constant. Experiment 4 is performed at the sink node only, assuming 36 SMs sending messages to the sink node in any configuration. Table 4 summarizes the network topology configurations and the key size.

Table 4. Network topology configurations and key size for the experiments.

Experiment  Network topology configuration                   Key size
type        Tree depth  #Agg per level  #SMs per agg  ∑ SMs  (bits)
1           2           2               8/9           36     Varies
2           1–5         2               Varies        36     64
3           1           2–10            Varies        36     64

We set the following parameters constant during the experiments: minimum communication delay between nodes on different depth levels = 50 ms and power consumption data size = 16 bits. The sink generates 2000 queries to collect power consumption data. We calculated the average ETE latency as well as the average data size from these queries.

8.2 Experiment results and discussion

8.2.1 Experiment 1: effect of key size

The effects of using different key sizes in aggregation to latency and data size are shown in Figure 5 and Table 5, respectively. In Table 5, we provide the average data size for all leaf SMs and aggregators. These are shown as SM and AGG, respectively. We also show the percentage increase in average data size. Several observations can be made from these results.

Figure 5.

The effect of key size on end-to-end latency.

Table 5. Data size comparison for different key sizes. Values are the average encrypted message size (in bits).

Key size  ETE-H                     HBH-A                  HBH-C
(bits)    SM       AGG       %      SM       AGG      %    SM       AGG       %
64        125.2    1746.0    1295   125.4    125.4    0    125.3    1753.8    1300
128       253.2    3538.2    1297   253.4    253.4    0    253.5    3549.5    1300
256       509.8    7130.9    1299   509.0    509.0    0    509.3    7130.5    1300
512       1021.7   14296.7   1299   1021.2   1021.2   0    1021.2   14296.71  1300
1024      2045.6   28632.3   1300   2045.3   2045.3   0    2045.3   28634.3   1300

SM, smart meter; AGG, aggregator.

Considering the ETE latency, we can see that although a larger key size means better protection, it comes at the expense of an exponential increase in latency for all three approaches, as shown in Figure 5. HBH-C has a higher ETE latency than ETE-H and HBH-A, which have similar latency for a given key size. ETE-H and HBH-A provide a 16–40% reduction in ETE latency compared with HBH-C.

Each increase in the key size, which is by a factor of 2, produces a linear increase in the average size of the encrypted messages of the SMs and aggregators by the same factor, as seen in Table 5. We observe that both ETE-H and HBH-C have a similar percentage increase even though they perform different operations. This can be attributed to the fact that homomorphic encryption eventually generates a new packet whose size is the total number of bits in both packets. The same holds for concatenation, where two packets are combined. There may be only a few additional information bits, which is not a major increase. There is no increase in the size of data in HBH-A because the packets are decrypted and aggregation is performed at the intermediate SMs.
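As a worked check of the percentage column in Table 5, the short Java sketch below assumes that the column is the relative growth of the aggregator message over the leaf message, rounded to the nearest integer. This formula is inferred from the tabulated values rather than stated in the text, and the class name is hypothetical.

```java
/** Recomputes the "%" column of Table 5 under the assumed formula (AGG - SM) / SM * 100. */
final class SizeIncrease {

    static long percentIncrease(double smBits, double aggBits) {
        return Math.round((aggBits - smBits) / smBits * 100.0);
    }

    public static void main(String[] args) {
        System.out.println(percentIncrease(125.2, 1746.0)); // ETE-H, 64-bit key:  1295
        System.out.println(percentIncrease(253.2, 3538.2)); // ETE-H, 128-bit key: 1297
        System.out.println(percentIncrease(125.4, 125.4));  // HBH-A, 64-bit key:  0
        System.out.println(percentIncrease(125.3, 1753.8)); // HBH-C, 64-bit key:  1300
    }
}
```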

Given the fact that HBH-A approach does not increase the message size, it is quite interesting to see that ETE-H is providing similar ETE latency. And similarly, the latency of ETE-H approach is lower although its message size is very close to that of HBH-C. These can be explained as follows: HBH-A performs decrypt-aggregation-encrypt at every aggregator including the sink, which takes much time and this time should be much more than the time spent by ETE-H approach to perform homomorphic multiplication at each node and decryption at the sink. Otherwise, given that HBH-A has smaller messages to transmit and thus the transmission delay of messages is much less, HBH-A should have provided lower ETE delay. HBH-C also suffers from the overhead of decryption at the sink where it performs decryption of all the data messages (embedded in big packet) before it performs aggregation.

To confirm that this explains the superior performance of ETE-H, we performed a separate experiment for the sink. We evaluated the cost of multiplying n encrypted messages followed by a single decryption at the sink and compared it with the cost of decrypting n messages and summing them, as in HBH-C, for various data sizes. Figure 6 indicates that the Multiplication-Decryption approach is much faster than the Decryption-Sum approach at the sink. This is mainly due to the higher overhead of decryption compared with multiplication. In the former approach, there are n multiplications and one decryption, whereas in the latter, there are n decryptions and one summation. This outcome explains why HBH-C experiences more delay at the sink and why ETE-H is faster although it performs multiplication at the aggregator nodes.

Figure 6.

Delay overhead of multiplication-decryption versus decryption-summation operations at the sink, n = 36.

8.2.2 Experiment 2: effect of the depth of the tree

The impact of the depth of the tree on the ETE latency performance is shown in Figure 7. ETE latency increases with the increased depth in all approaches. This is because the data incur more transmission delay before they arrive at the sink. The pattern in ETE latency is similar to that of Figure 5 for the same reasons. HBH-C performs around 20% worse than the other approaches at all depth levels. Note that the same performance ratio is maintained because the total number of SMs in the network does not change. The total number of SMs affects the overhead at the sink, which may significantly change the ETE latency.

Figure 7.

Latency comparison for different depth levels.

However, in terms of message size, whereas the average data size generated from the aggregator nodes in HBH-A remains constant, both ETE-H and HBH-C show a significant increase in the data size with the increase of the tree depth level as shown in Table 6. Hence, ETE-H and HBH-C consume more bandwidth than HBH-A.

Table 6. Data size comparison with different network tree depth levels. Values are the average encrypted message size (in bits).

Tree depth  ETE-H                     HBH-A                  HBH-C
(level)     SM       AGG       %      SM       AGG      %    SM       AGG       %
1           125.2    2244.3    1693   125.2    125.2    0    125.4    2257.4    1700
2           125.2    1746.0    1295   125.4    125.4    0    125.3    1753.8    1300
3           125.5    1500.8    1095   125.3    125.3    0    124.7    1495.9    1100
4           125.6    1501.3    1096   125.1    125.1    0    125.0    1500.6    1100
5           125.4    1498.9    1096   125.2    125.2    0    125.1    1501.6    1100

Considering the tree depth and processing times, we can summarize the findings in this subsection as follows: the transmission time from the leaf nodes to an aggregator has a small contribution to ETE latency because the data size of the encrypted message from leaf nodes is typically small. The transmission time from an aggregator to its parent node will have a significant contribution if the data size is larger. In addition, there will be significant processing. The processing times at an aggregator, from the highest to the lowest, are in the following order: HBH-A, ETE-H, and HBH-C; whereas at the sink node, it is HBH-C, HBH-A, and ETE-H respectively.

8.2.3 Experiment 3: effect of the number of aggregators per tree level

In this experiment, we aimed to analyze the effect of spreading the load to more aggregators on performance. Looking at the ETE latency, ETE-H and HBH-A perform better than HBH-C when the number of aggregators is smaller (Figure 8). As the number of aggregators increases, there will be more operations performed at the sink because of increased number of children reporting. This increases the processing delay for both multiplication and decryption and thus the delay of ETE-H and HBH-A becomes similar to that of HBH-C at increased number of aggregators. For HBH-C, the total number of decryptions will not change but it takes the advantage of more parallelism among the increased number of aggregators performing aggregation, which helps to maintain a flat latency. Regarding data size, we can see that with the increased number of aggregators per tree level, the average message size generated by aggregators is decreased as shown in Table 7.

Figure 8.

Latency comparison for different number of aggregators per tree level.

Table 7. Data size comparison for different number of aggregators per level. Values are the average encrypted message size (in bits).

#Agg per level  ETE-H                     HBH-A                  HBH-C
                SM       AGG       %      SM       AGG      %    SM       AGG       %
2               125.2    2244.3    1693   125.2    125.2    0    124.8    2246.2    1700
4               125.4    1124.8    797    125.3    125.4    0    125.2    1126.8    800
6               125.1    748.4     498    125.3    125.3    0    125.2    751.4     500
8               125.3    561.9     349    124.8    124.8    0    125.5    564.9     350
10              125.4    450.3     259    125.0    125.0    0    125.7    452.4     260

8.2.4 Experiment 4: using a different multiplication algorithm in ETE-H

In the aforementioned experiments, we used the default Java multiplication operation in our implementation of ETE-H. Because different computation algorithms may produce different results, we also investigated the performance of the Karatsuba multiplication algorithm [33], shown in Algorithm 3, compared with Java's default multiplication. In our implementation of Karatsuba's algorithm, we used a cutoff value to limit the number of recursive operations. We define K-n as a cutoff value that is equal to the maximum bit length of the operands divided by 2^n. Hence, the cutoff values of Karatsuba-1 (K-1) to K-3 are one-half, one-fourth, and one-eighth of the maximum bit length of the operands, respectively.
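The idea of a cutoff-limited Karatsuba multiplication can be sketched in Java as follows. This is an illustrative sketch, not a reproduction of Algorithm 3: the class and method names are hypothetical, the operands are assumed to be non-negative (as ciphertexts are), and a small minimum cutoff of 8 bits is added here only to guarantee that the recursion terminates.

```java
import java.math.BigInteger;
import java.util.Random;

/** Karatsuba multiplication with a bit-length cutoff; names are illustrative. */
final class KaratsubaSketch {

    /** Cutoff K-n: the maximum operand bit length divided by 2^n. */
    static int cutoffBits(int maxOperandBits, int n) {
        return maxOperandBits >> n;
    }

    /** Multiplies non-negative x and y, falling back to BigInteger.multiply at or below the cutoff. */
    static BigInteger multiply(BigInteger x, BigInteger y, int cutoffBits) {
        if (x.signum() < 0 || y.signum() < 0) {
            throw new IllegalArgumentException("operands must be non-negative");
        }
        int bits = Math.max(x.bitLength(), y.bitLength());
        if (bits <= Math.max(cutoffBits, 8)) {
            return x.multiply(y);                       // base case: default multiplication
        }
        int half = bits / 2;
        BigInteger xHigh = x.shiftRight(half);
        BigInteger xLow = x.subtract(xHigh.shiftLeft(half));
        BigInteger yHigh = y.shiftRight(half);
        BigInteger yLow = y.subtract(yHigh.shiftLeft(half));

        // Three recursive products instead of four.
        BigInteger z2 = multiply(xHigh, yHigh, cutoffBits);
        BigInteger z0 = multiply(xLow, yLow, cutoffBits);
        BigInteger z1 = multiply(xHigh.add(xLow), yHigh.add(yLow), cutoffBits)
                .subtract(z2).subtract(z0);

        // x*y = z2 * 2^(2*half) + z1 * 2^half + z0
        return z2.shiftLeft(2 * half).add(z1.shiftLeft(half)).add(z0);
    }

    public static void main(String[] args) {
        BigInteger a = new BigInteger(4096, new Random(1));
        BigInteger b = new BigInteger(4096, new Random(2));
        for (int n = 1; n <= 3; n++) {
            BigInteger product = multiply(a, b, cutoffBits(4096, n));
            System.out.println("K-" + n + " matches default: " + product.equals(a.multiply(b)));
        }
    }
}
```

A larger n gives a smaller cutoff and therefore deeper recursion before the default multiplication takes over. Note that `BigInteger.multiply` in JDK 8 and later already switches to Karatsuba for sufficiently large operands, so timings on a recent JDK may differ from those reported here.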

Figure 9 shows that when the data (operand) size is larger (e.g., ≥ 1024 bytes), Karatsuba performs better than Java's default multiplication algorithm. Therefore, the algorithm can be employed at the sink or at the nodes near the sink, because the sink and the aggregators near it receive larger data than the leaf SMs or the aggregators near the leaf SMs. This will speed up the multiplication process at the upper level SMs.

Figure 9.

Computational time comparison of Karatsuba algorithm and Java's default multiplication for ETE-H.

9 CONCLUSION AND OPEN ISSUES

In this paper, we first motivated the need for privacy in Smart Grid and surveyed the existing works that address the privacy issues in AMI applications. We classified the privacy approaches for AMI into three categories on the basis of the concepts of anonymizing the customer ID: anonymization approaches, non-anonymization approaches, and hybrid approaches. For each category, we provided the ideas of each approach and discussed vulnerabilities if any.

After surveying the approaches, we evaluated the performance of privacy preserving data aggregation approaches that are widely recommended. We compared the ETE latency and message size performance of the ETE-H, HBH-A, and HBH-C approaches. Overall, the results indicated that ETE-H provides comparable ETE latency when compared with HBH-A, which does not provide privacy by itself. In addition, its performance is superior to HBH-C in terms of ETE latency because homomorphic multiplication is faster than decryption at the sink. However, both ETE-H and HBH-C significantly increase the size of the messages to be transmitted, and thus their bandwidth requirements will be higher. As a result, ETE-H can be a preferred solution if the underlying network traffic is not significant. Otherwise, HBH-A can be picked provided that it is complemented by a separate privacy mechanism.

Although we surveyed various privacy preserving approaches, there are still many interesting issues that need to be explored in the future. We list some of them as follows:

  • The dependency of anonymization approaches to trusted infrastructure such as a gateway or SM that might operate in an open and harsh environment raises issues about physical security and protections, device level security, authorization, and access control that need to be addressed.
  • The dependency on these trusted infrastructures needs to be minimized in order to make the solutions easily deployable for all the utilities. Otherwise, the solutions may bring extra cost and labor overhead for the utilities, which may slow their willingness to ensure full privacy.
  • Another issue that needs to be investigated is the effect of providing user privacy on the quality of service performance of the traveled data. Because there are several other applications that will be using the same communication infrastructure in the Smart Grid, their performance may suffer with the overhead that comes with the privacy solutions. Approaches providing privacy but not significantly compromising QoS need to be developed.
  • Current load moderation techniques use a rechargeable battery where frequent charging and discharging will affect the battery life. In addition, as distributed energy generation becomes more widely available, load moderation will have more options. Smarter moderation algorithms that take multiple sources into account, prolong battery life, and accommodate future events are required.
  • In the non-anonymization approaches, homomorphic encryption which is often used to provide statistical values such as sum, average, and variance tends to increase the ciphertext data size and is computationally expensive compared with traditional encryption. Hence, new homomorphic systems that are lightweight are needed to realize their deployment in the Smart Grid.
