《1. Introduction》

1. Introduction

Smart grids, as next-generation power networks, integrate advanced information processing and communication technologies to provide efficient and intelligent exchange of electricity and information, maximizing energy usage efficiency and meeting modern demands [1,2]. For example, the smart meter in a user’s home can sense the electricity usage of home appliances in real time, and the control center can collect and analyze these data to learn the user’s power usage behavior and provide dynamic pricing and flexible power dispatching policies [3–5]. However, the explosive growth in the number of smart meters imposes substantial communication and computation burdens on the smart grid [6,7]. Furthermore, exposing the power consumption data collected by smart meters raises privacy risks, because these data can be used to infer users’ living habits and even their economic status [8]. In addition, tampering and forgery attacks pose a serious threat to the stability of the smart grid [9,10]; for example, a cyberattack caused the widely reported 2015 blackout in Ukraine [11]. To address these performance, privacy, and security challenges, many schemes have been proposed; among them, secure and efficient data aggregation has attracted considerable attention for its significant advantages. Existing smart grid data aggregation schemes can be roughly divided into the following three categories.

The first category comprises data aggregation schemes built on the traditional network architecture. For example, Lu et al. [12] presented an efficient and privacy-preserving data aggregation mechanism that integrates a superincreasing sequence, homomorphic Paillier encryption, and batch verification, achieving efficient multidimensional data aggregation with security and privacy protection. Ni et al. [13] constructed a security-enhanced data aggregation scheme by jointly using homomorphic encryption, a trapdoor hash function, and homomorphic authenticators, thereby reducing computation and communication costs while guaranteeing confidentiality and integrity. From the perspective of dynamic pricing and service support, Gope and Sikdar [14] designed a privacy-friendly lightweight data aggregation mechanism that provides strong privacy protection under dynamic billing and is especially suitable for devices with limited computing resources. Without relying on a trusted third party, Liu et al. [15] proposed a practical data aggregation scheme with efficient privacy preservation, in which trusted users are linked to form a virtual aggregation area and the aggregated results are used for data analysis, protecting users’ personal privacy and improving system robustness. From the perspective of fine-grained aggregation, Li et al. [16] developed a multisubset data aggregation scheme with efficient privacy preservation: according to the different ranges of power consumption data, it achieves multisubset aggregation and provides fine-grained data services while preserving user privacy at a low computation cost.
Although the schemes above achieve efficient and secure data aggregation, their reliance on the traditional network architecture leaves room to further reduce data processing delay and communication overhead.

Fortunately, fog computing, a promising computing paradigm, has been developed to overcome the weaknesses of the traditional network architecture and has been shown to reduce delay and communication overhead significantly, especially when combined with cloud computing [17]. Consequently, the second category of solutions builds data aggregation mechanisms on the edge/fog computing architecture. For example, Lu et al. [18] constructed a fog-assisted privacy-preserving data aggregation scheme by integrating Paillier encryption, a one-way hash chain, and the Chinese remainder theorem; the scheme aggregates data from hybrid Internet of Things (IoT) devices into a single aggregate and can filter out false data. Based on the demands of applications with different data types, Huang et al. [19] studied a fog-enabled selective data aggregation scheme that also addresses reliability and privacy preservation. To further strengthen privacy, Lyu et al. [20] proposed a fog-based data aggregation scheme that achieves differential privacy for statistical data and keeps the data confidential from the aggregator. Considering the resource constraints of edge computing systems, Zhang et al. [21] presented an efficiency-enhanced privacy-preserving data aggregation scheme that moves time-consuming signature operations offline, effectively relieving the online computation burden. Focusing on anonymous authentication in the fog-enabled smart grid, Zhu et al. [22] designed an anonymous data aggregation scheme based on the Paillier cryptosystem and blind signatures, which provides strong privacy protection with low computation and communication costs.
Although the above solutions reduce system delay and communication overhead significantly and provide some degree of privacy and security protection, this category of schemes still faces security and centralization issues. For example, if a malicious attacker intercepts the channel over which a user’s private information is transmitted to a fog node and steals the secret key, the user’s privacy can hardly be guaranteed. Moreover, all users’ data are concentrated in the fog or cloud layer, which inevitably introduces the problem of centralization.

The emergence of blockchain technology [23] provides a new perspective for addressing these problems because of its decentralization and nontampering features, and several studies have applied the blockchain to the smart grid. For example, Liang et al. [24] investigated a blockchain-based data protection scheme for the smart grid and showed that the blockchain can effectively improve system security under cyberattacks. Therefore, the third category of solutions combines data aggregation with blockchain technology. Specifically, Fan and Zhang [25] proposed a secure data aggregation scheme for smart power regulation by integrating a consortium blockchain into the smart grid; it develops a multireceiver model for collecting multidimensional data and, based on smart contracts, establishes flexible power monitoring and management mechanisms that enhance the security of the smart grid. Guan et al. [26] studied a blockchain-assisted anonymous data aggregation scheme for the smart grid that enhances system security and outperforms other solutions. However, users’ power consumption data are transmitted in plaintext within groups, which exposes them to security risks. Although these blockchain-based privacy-preserving data aggregation schemes effectively enhance smart grid security and solve the problems of centralization and single points of failure, none of them considers the edge computing paradigm, so local resources are not used effectively and system efficiency leaves considerable room for improvement. Accordingly, the works [27] and [28] combined the blockchain and edge computing to improve system performance, but neither provides a concrete, executable solution.

The above schemes solve the corresponding problems of the smart grid to varying degrees, but many weaknesses remain. Different from existing solutions, we propose a double-blockchain assisted secure and anonymous data aggregation (DA-SADA) scheme for the fog-enabled smart grid that integrates the blockchain, the Paillier cryptosystem, batch verification, and an anonymous authentication mechanism. The main contributions of this scheme are summarized as follows:

(1) We design a data aggregation framework based on a three-tier architecture that integrates fog computing and the blockchain. The framework enhances security and exploits local resources effectively, providing strong support for efficient and secure data collection in the smart grid.

(2) We develop a secure and anonymous data aggregation mechanism with low computational overhead by jointly leveraging Paillier encryption, batch aggregation signatures, and anonymous authentication. It can effectively resist various security threats (such as eavesdropping, tampering, and replay attacks) and provides multiple forms of privacy preservation.

(3) Through the designed double blockchain and two-level data aggregation, the system achieves fine-grained data aggregation and provides effective support for power dispatching and price adjustment. This design also further strengthens system security and robustness.

The remainder of this paper is organized as follows. Section 2 describes some preliminaries. Section 3 introduces the network model and threats in detail. Our proposed scheme is presented in Section 4, followed by the security and performance evaluations in Section 5. Finally, Section 6 concludes the paper.

《2. Preliminaries》

2. Preliminaries

《2.1. Blockchain》

2.1. Blockchain

The blockchain can be considered a peer-to-peer (P2P) distributed database that creates blocks and links them in chronological order [29]; it is designed to provide decentralized and distributed solutions for a wide range of IoT and industrial Internet of Things (IIoT) applications. The main blockchain components include transactions, blocks, smart contracts, the consensus mechanism, cryptography, and the P2P network [30]. In a blockchain network, the participants act as distributed nodes that collaboratively protect and maintain a shared record of transactions, without any trusted party for supervision and management. All nodes are responsible for sharing, packaging, verifying, and storing new transactions generated in the network. Therefore, the blockchain can establish trust among participating entities that do not trust each other in a distributed scenario. It also has the following features: decentralization, nontampering, and security.

Decentralization: The distributed structure of the blockchain ensures decentralization. No third-party maintenance or management is required, and the nodes in the network are fully autonomous based on the incentive mechanism.

Nontampering: Nontampering means that once transaction data are recorded in the blockchain, the record cannot be successfully tampered with or deleted.

Security: Data written to the blockchain must be collectively verified; in proof-of-work blockchains, for example, successful tampering requires at least 51% of the computing power of the entire network, which is usually infeasible in practice.

《2.2. Paillier encryption》

2.2. Paillier encryption

The Paillier homomorphic encryption method is widely used for privacy protection because it allows computations to be performed directly on ciphertexts, which effectively protects data privacy. Specifically, Paillier encryption is an additively homomorphic scheme consisting of key generation, encryption, and decryption operations.

Key generation: Given a security parameter κ, the system randomly chooses two large primes $p$ and $q$ such that $|p| = |q| = \kappa$ (i.e., both are κ-bit primes) and $\gcd(pq, (p-1)(q-1)) = 1$, and then calculates the public modulus $N = pq$ and the private key $\lambda = \operatorname{lcm}(p-1, q-1)$. Next, it defines the function $L(u) = (u-1)/N$ and selects a generator $g \in \mathbb{Z}_{N^2}^*$ such that $\mu = \left(L(g^{\lambda} \bmod N^2)\right)^{-1} \bmod N$ exists. Finally, the public key $(N, g)$ and the private key $(\lambda, \mu)$ of Paillier encryption are obtained.

Encryption operation: For any plaintext $m \in \mathbb{Z}_N$, the system selects a random number $r \in \mathbb{Z}_N^*$ and calculates the ciphertext as $C = g^{m} \cdot r^{N} \bmod N^2$.

Decryption operation: Given a ciphertext $C$, the plaintext is recovered as $m = L(C^{\lambda} \bmod N^2) \cdot \mu \bmod N$.
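The three operations above can be illustrated with a minimal Python sketch (Python 3.9+ for `math.lcm`). It assumes toy primes supplied by the caller, far too small for real security, and uses $g = N + 1$ as a default generator; the function names are illustrative and not part of the scheme.

```python
import math
import secrets
from typing import Optional, Tuple

PublicKey = Tuple[int, int]    # (N, g)
PrivateKey = Tuple[int, int]   # (lambda, mu)

def L(u: int, n: int) -> int:
    """Paillier's L function, L(u) = (u - 1) / N (exact for u = 1 mod N)."""
    return (u - 1) // n

def random_unit(n: int) -> int:
    """Draw r uniformly from Z_N^* (1 <= r < N and gcd(r, N) = 1)."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r

def keygen(p: int, q: int, g: Optional[int] = None) -> Tuple[PublicKey, PrivateKey]:
    """Toy key generation from caller-supplied primes p and q."""
    n = p * q
    if math.gcd(n, (p - 1) * (q - 1)) != 1:
        raise ValueError("need gcd(pq, (p-1)(q-1)) = 1")
    lam = math.lcm(p - 1, q - 1)                 # private key lambda
    if g is None:
        g = n + 1                                # common generator choice
    # mu = L(g^lambda mod N^2)^{-1} mod N; pow raises ValueError if no inverse exists
    mu = pow(L(pow(g, lam, n * n), n), -1, n)
    return (n, g), (lam, mu)

def encrypt(pub: PublicKey, m: int, r: Optional[int] = None) -> int:
    """C = g^m * r^N mod N^2 (r must be coprime to N if supplied)."""
    n, g = pub
    if not 0 <= m < n:
        raise ValueError("plaintext must lie in Z_N")
    if r is None:
        r = random_unit(n)
    n2 = n * n
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub: PublicKey, priv: PrivateKey, c: int) -> int:
    """m = L(C^lambda mod N^2) * mu mod N."""
    n, _ = pub
    lam, mu = priv
    return (L(pow(c, lam, n * n), n) * mu) % n
```

For instance, `keygen(1009, 1013)` yields a toy key pair under which any plaintext below $N$ round-trips through `encrypt` and `decrypt`; a real deployment would use κ-bit primes from a secure prime generator.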

《2.3. Bloom filter》

2.3. Bloom filter

The Bloom filter consists of a long binary vector and a series of hash functions that map elements to positions in the vector; it has low computational complexity, high space utilization, and high query efficiency. It can quickly test whether an element belongs to a set, with a small probability of false positives but no false negatives.

We assume that there are k hash functions {h1, h2, ... , hk} and a set of elements {x1, x2, ... , xω}. These elements are mapped to positions of the Bloom filter by the k independent, uniformly distributed hash functions, and the bits at these positions are set to 1. The specific operation is shown in Fig. 1.

Element adding: As shown in Fig. 1, we hash element x1 with each of the k hash functions to obtain k hash values {h1(x1), h2(x1), ... , hk(x1)}, locate the corresponding positions in the Bloom filter, and set the bits at these k positions to 1.

《Fig. 1》

Fig. 1. Generation of the Bloom filter.

Element query: To query whether element x1 is in the Bloom filter, we first calculate its k hash values {h1(x1), h2(x1), ... , hk(x1)} and then check whether the bits at all the corresponding positions are 1. If any of them is 0, x1 is definitely not in the Bloom filter; otherwise, x1 is considered to be in the Bloom filter, with a small probability of a false positive.
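The adding and query operations can be sketched as a small Python class. As an assumption for illustration, the k independent hash functions are simulated by SHA-256 salted with the function index; the class and method names are hypothetical.

```python
import hashlib
from typing import Iterator

class BloomFilter:
    """A theta-bit Bloom filter with k hash functions.

    Assumption: h_1..h_k are simulated by SHA-256 salted with the index i."""

    def __init__(self, theta: int, k: int) -> None:
        self.theta = theta
        self.k = k
        self.bits = [0] * theta

    def _positions(self, item: str) -> Iterator[int]:
        # h_i(item) mod theta for i = 1..k
        for i in range(1, self.k + 1):
            digest = hashlib.sha256(f"{i}|{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.theta

    def add(self, item: str) -> None:
        """Element adding: set the bits at all k hashed positions to 1."""
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: str) -> bool:
        """Element query: False means definitely absent; True means probably present."""
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(theta=1024, k=3)
for x in ("x1", "x2", "x3"):
    bf.add(x)
```

Because a query only probes bits, a single 0 proves absence, whereas all 1s can also be produced by other elements' insertions; this is the false positive case.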

False positive rate: A false positive occurs when element x1 does not belong to the set stored in the Bloom filter, but the bits at all k of its corresponding positions are 1. After all ω elements have been inserted, the probability that a given bit is set to 1 is $p = 1 - (1 - 1/\theta)^{k\omega}$, where θ denotes the length (number of bits) of the Bloom filter. According to the result of [31], the false positive rate can then be estimated as $p^{k} = \left(1 - (1 - 1/\theta)^{k\omega}\right)^{k} \approx \left(1 - e^{-k\omega/\theta}\right)^{k}$.
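These expressions can be evaluated numerically with the sketch below; θ, k, and ω follow the notation of this section, and the exponential form uses the standard approximation $(1 - 1/\theta)^{k\omega} \approx e^{-k\omega/\theta}$. Function names are illustrative.

```python
import math

def bit_set_probability(theta: int, k: int, omega: int) -> float:
    """p = 1 - (1 - 1/theta)^(k*omega): probability that a given bit is 1."""
    return 1.0 - (1.0 - 1.0 / theta) ** (k * omega)

def false_positive_rate(theta: int, k: int, omega: int) -> float:
    """A false positive needs all k probed bits to be 1, giving p^k."""
    return bit_set_probability(theta, k, omega) ** k

def false_positive_rate_approx(theta: int, k: int, omega: int) -> float:
    """Exponential approximation (1 - e^{-k*omega/theta})^k."""
    return (1.0 - math.exp(-k * omega / theta)) ** k
```

Varying k for a fixed θ and ω shows the usual trade-off: too few hash functions leave too little evidence per query, and too many fill the vector; the approximate rate is minimized near $k = (\theta/\omega)\ln 2$.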

《3. Network model and threats》

3. Network model and threats

《3.1. Network model》

3.1. Network model

In our network model, a fog-enabled data aggregation smart grid consists of four entities, namely, smart meters, fog nodes, the cloud server, and the trusted authority (TA), as displayed in Fig. 2. Specifically, we assume that the coverage area of the smart grid is divided into m subareas, and each subarea deploys n smart meters that sense users’ power consumption information. All m·n smart meters form the user layer. Accordingly, each subarea deploys a fog node to collect and aggregate the data from its own area, and all m fog nodes form the fog computing layer, which is located at the network edge between the user and service supporting layers. At the service supporting layer, the cloud server processes the data uploaded from the fog layer and supports real-time decision-making. The TA is responsible for generating the parameters of the entire system. The functions of the entities in each layer are detailed below.

《Fig. 2》

Fig. 2. Network architecture of the developed DA-SADA. UA: user aggregation; FA: fog aggregation.

User layer: The user layer is mainly composed of a large number of smart meters. For example, in subarea j, the ith smart meter SMij observes a user’s real-time power consumption, encrypts and signs these consumption data, and sends the encrypted data to the aggregation node of the user layer. The aggregation node aggregates the verified ciphertexts to generate the first-level aggregated ciphertext and then encapsulates the related information into a block, which is added to the user aggregation (UA)-blockchain through the consensus mechanism. Throughout this process, the identity of SMij (i.e., the user) is always represented by a pseudonym. Finally, the generated UA-blockchain is sent to fogj for further processing.

Fog computing layer: The fog computing layer is the middle layer between the user and service supporting layers; it is deployed at the network edge and performs the second-level aggregation of the encrypted data to significantly reduce communication overhead. Specifically, when fogj receives the first-level aggregated ciphertext from the UA-blockchain sent by the aggregation node of the user layer, it signs the aggregated ciphertext and sends it to the aggregation node of the fog layer for second-level aggregation. The aggregation node then encapsulates the related information into a new block, which is added to the fog aggregation (FA)-blockchain through the consensus mechanism. Finally, the generated FA-blockchain is sent to the cloud server.

Service supporting layer: In this layer, the cloud server records, analyzes, stores, and manages users’ power usage information in real time. These operations are executed automatically by a smart contract, so the whole process needs no human intervention, which improves system efficiency and enhances the security of private data. Specifically, when the cloud server obtains the second-level aggregated ciphertext from the FA-blockchain sent by the aggregation node of the fog layer, it decrypts it to recover the plaintext of the second-level aggregation result and then applies Horner’s rule to obtain fine-grained aggregation results. Combining the coarse- and fine-grained aggregation results provides diverse data to support effective power dispatching management.

TA: The TA is primarily responsible for generating and managing all public parameters and secret keys of the entities in the system. It also creates a Bloom filter for the smart meters of each subarea from the users’ pseudonyms and sends this Bloom filter to the corresponding users. The same operation applies to the fog layer.

《3.2. Adversary model》

3.2. Adversary model

In the smart grid scenario, an eavesdropper may intercept the communication links between smart meters and fog nodes to pry into users’ private affairs. In addition, an active attacker may tamper with the transmitted information and launch replay attacks to threaten the security of the smart grid. In our adversary model, we divide the threats that may occur in the network into internal and external attacks.

Internal attack: The first category of internal attacks consists of malicious node attacks, which occur during blockchain generation in the user and fog computing layers. For example, during blockchain generation, a malicious node may pretend to be a legitimate node and launch active attacks (e.g., tampering, forgery, and replay) that impair the authenticity and integrity of users’ private data. Therefore, the system should be able to verify the legitimacy of node identities during the consensus process. The second category of internal attacks concerns honest-but-curious fog and cloud nodes. For example, a fog node may be infected by undetected malware that eavesdrops on the data from devices, so we must ensure that the fog node cannot observe users’ private data at any point in the process. Similarly, the system should guarantee that the cloud server cannot derive users’ personal private data.

External attack: An external attacker can eavesdrop on and tamper with the data transmitted over communication links and can also launch replay attacks. Therefore, the system must ensure that the attacker cannot obtain private information from the communication links and that the system is immune to active attacks.

《4. Double-blockchain assisted secure and anonymous data aggregation》

4. Double-blockchain assisted secure and anonymous data aggregation

In this section, we develop a DA-SADA scheme for the fog-enabled smart grid by integrating the blockchain, the Paillier cryptosystem, batch aggregation verification, and an anonymous authentication mechanism. It consists of four parts: system initialization, UA-blockchain generation, FA-blockchain generation, and service supporting.

《4.1. System initialization》

4.1. System initialization

In our network scenario, the trusted third party TA is responsible for system initialization, which consists of three procedures: the generation of system parameters, the distribution of system parameters, and the generation of the Bloom filter.

The generation of system parameters: In this stage, the TA selects the system security parameter κ and generates two large safe primes $p$ and $q$ with $|p| = |q| = \kappa$. It then calculates $N = pq$ as the public key of the homomorphic encryption algorithm and $\lambda = \operatorname{lcm}(p-1, q-1)$ as the corresponding private key. Meanwhile, the system randomly selects $r \in \mathbb{Z}_N^*$ and calculates $s = r^{N} \bmod N^2$. Let $g = N + 1$ and define the function $L$ as

$$L(u) = \frac{u - 1}{N}.$$

Furthermore, to provide identity anonymity, SMij chooses a random prime number $X_{ij}$ as its public key and calculates its secret key $Y_{ij} = X_{ij}^{-1} \bmod N^2$; the public key $X_{ij}$ is also used to calculate the smart meter’s pseudonym, that is, $Pseu_{ij} = X_{ij} \bmod N^2$. Similarly, fog node fogj chooses a random prime number $X_j$ as its public key, calculates $Y_j = X_j^{-1} \bmod N^2$ as its secret key, and uses $Pseu_j = X_j \bmod N^2$ as the fog device’s pseudonym. Finally, the TA chooses a secure cryptographic hash function $H$.
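The parameter generation above can be sketched in Python as follows. This is a hedged illustration only: the TA is assumed to receive two toy primes instead of generating κ-bit safe primes, `system_setup` and `entity_keys` are hypothetical helper names, and $\mu = \lambda^{-1} \bmod N$ is a standard consequence of choosing $g = N + 1$ (because $L(g^{\lambda} \bmod N^2) = \lambda \bmod N$), not a parameter named in the text. The key pair and pseudonym follow the fog-node formulas $Y = X^{-1} \bmod N^2$ and $Pseu = X \bmod N^2$.

```python
import math
import secrets

def system_setup(p: int, q: int) -> dict:
    """Toy version of the TA's parameter generation (caller supplies the primes)."""
    n = p * q
    n2 = n * n
    lam = math.lcm(p - 1, q - 1)          # private key lambda
    g = n + 1
    # With g = N + 1, L(g^lambda mod N^2) = lambda mod N, so mu = lambda^{-1} mod N
    mu = pow(lam, -1, n)
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:            # r must lie in Z_N^*
        r = secrets.randbelow(n - 1) + 1
    s = pow(r, n, n2)                     # s = r^N mod N^2
    return {"N": n, "g": g, "lam": lam, "mu": mu, "s": s}

def entity_keys(x: int, n: int) -> tuple:
    """Given a random prime X (public key), return (X, Y = X^{-1} mod N^2, Pseu = X mod N^2)."""
    n2 = n * n
    return x, pow(x, -1, n2), x % n2
```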

The distribution of system parameters: Once all system parameters have been generated, the public parameters $(N, H)$ are published online, and the remaining parameters are allocated to the corresponding entities. Specifically, the keys $(X_{ij}, Y_{ij}, s)$, $(X_j, Y_j)$, and $\lambda$ are assigned through a secure channel to SMij, fog node fogj, and the cloud server, respectively.

The generation of the Bloom filter: The TA collects the pseudonyms of the smart meters to create a Bloom filter for each subarea; similarly, it collects the fog devices’ pseudonyms to create a Bloom filter for the fog layer. Specifically, in the user layer, the TA initializes an array of θ bits and uses the hash function $H$ to hash every pseudonym in the same subarea, setting to 1 the array element whose index equals $H(Pseu_{ij}) \bmod \theta$. Finally, the TA sends the Bloom filter to the smart meters in the corresponding subarea. The Bloom filter for the fog layer is generated in the same way.
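The TA-side construction and the aggregation node's pseudonym check can be sketched as follows. Since the text describes a single hash $H$, this is the $k = 1$ case of the Bloom filter in Section 2.3; SHA-256 standing in for $H$, the byte encoding of pseudonyms, and the function names are assumptions.

```python
import hashlib
from typing import Iterable, List

def H(value: int) -> int:
    """Stand-in for the TA's hash function H (SHA-256 here, an assumption)."""
    data = value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def build_subarea_filter(pseudonyms: Iterable[int], theta: int) -> List[int]:
    """TA side: set bit H(Pseu) mod theta for every pseudonym in the subarea."""
    bits = [0] * theta
    for pseu in pseudonyms:
        bits[H(pseu) % theta] = 1
    return bits

def pseudonym_registered(bits: List[int], pseu: int) -> bool:
    """Aggregation-node side: a 0 bit proves the pseudonym was never registered."""
    return bits[H(pseu) % len(bits)] == 1
```

With only one hash function the false positive rate is $1 - (1 - 1/\theta)^{\omega}$, so θ must be large relative to the number of pseudonyms per subarea for the check to be selective.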

《4.2. Generation of UA-blockchain》

4.2. Generation of UA-blockchain

Because analyzing power consumption data can leak privacy and the data are exposed to tampering threats, the sensing device (i.e., the smart meter) must encrypt the user’s power consumption data and digitally sign the relevant information to ensure integrity. This process is called transaction generation. The aggregation node then aggregates the encrypted data and records the corresponding information in a block. Finally, the aggregation node generates the UA-blockchain through the consensus mechanism. The generation process of the UA-blockchain is shown in Fig. 3 and described below.

《Fig. 3》

Fig. 3. Generation of UA-blockchain. This process includes three steps: transaction generation, creation of new block, and blockchain generation.

4.2.1. Transaction generation

The generation of power consumption ciphertext: For a subarea with n smart meters, in a given time slot ts, we denote the data item of SMij as $d_{ij}$; each smart meter then calculates its ciphertext $C_{ij}$ as

$$C_{ij} = g^{d_{ij}} \cdot r^{N} \bmod N^2,$$

where $r \in \mathbb{Z}_N^*$ is random. Because $g = N + 1$ and $(1 + N)^{m} \equiv 1 + mN \pmod{N^2}$, Paillier encryption can be written in the equivalent form $c = (1 + mN)\, r^{N} \bmod N^2$, which avoids the cumbersome modular exponentiation of $g$ in the encryption and decryption operations and thereby reduces the computational overhead.
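The identity $(1+N)^m \equiv 1 + mN \pmod{N^2}$ follows from the binomial theorem, since every term containing $N^2$ vanishes modulo $N^2$. The sketch below checks numerically that the one-multiplication form gives the same ciphertext as explicit exponentiation; the toy modulus, the fixed value of $r$, and the function names are assumptions, and `s` plays the role of the precomputed $s = r^N \bmod N^2$ from Section 4.1.

```python
def encrypt_textbook(n: int, m: int, s: int) -> int:
    """C = g^m * s mod N^2 with g = N + 1, using modular exponentiation."""
    n2 = n * n
    return (pow(n + 1, m, n2) * s) % n2

def encrypt_fast(n: int, m: int, s: int) -> int:
    """Same ciphertext via (1 + N)^m = 1 + mN (mod N^2): one multiplication, no exponentiation."""
    n2 = n * n
    return ((1 + m * n) * s) % n2

N = 1009 * 1013           # toy modulus, far too small for real use
r = 123457                # toy random value, coprime to N
s = pow(r, N, N * N)      # s = r^N mod N^2, computed once
```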

The generation of the signature: SMij then computes its signature σij.

Next, the smart meter sends its report to the corresponding aggregation node in the user layer; the smart meter with the largest remaining computing resources is chosen as the aggregation node of the user layer.

Verification and ciphertext aggregation: After the aggregation node receives the reports from the smart meters, it first checks the validity of each user’s pseudonym with the Bloom filter. It then checks the timestamps to confirm that the reports are fresh. Finally, it verifies the signatures by batch verification, using the following expression:

where this equation is derived from the aggregation operation and the concrete values of the public and private keys. The detailed expression is written as

After the smart meters’ signatures have been successfully verified, the aggregation node performs an aggregation operation to obtain the aggregated ciphertext $C_j$ for subarea j:

$$C_j = \prod_{i=1}^{n} C_{ij} \bmod N^2.$$

Consequently, the aggregated ciphertext Cj is combined with other related information to generate the transaction Tx = (Cj, Pseuij, ts).
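Because Paillier encryption is additively homomorphic, multiplying the meters' ciphertexts modulo $N^2$ yields a ciphertext of the sum of their readings, which is what lets the aggregation node combine reports it cannot read. The following minimal sketch checks this with toy parameters and $g = N + 1$; the helper names and sample readings are hypothetical, and signature verification is omitted.

```python
import math
import secrets
from functools import reduce

P, Q = 1009, 1013                 # toy primes; real deployments use kappa-bit primes
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)              # valid shortcut because g = N + 1

def encrypt(m: int) -> int:
    """Meter side: C = (1 + mN) * r^N mod N^2 with a fresh random r in Z_N^*."""
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return ((1 + m * N) * pow(r, N, N2)) % N2

def aggregate(ciphertexts) -> int:
    """Aggregation node: the product of ciphertexts mod N^2 encrypts the sum of readings."""
    return reduce(lambda acc, c: (acc * c) % N2, ciphertexts, 1)

def decrypt(c: int) -> int:
    """Key holder (cloud server): m = L(C^lambda mod N^2) * mu mod N."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

readings = [12, 7, 30, 5]         # d_ij for one subarea in one time slot
C_j = aggregate(encrypt(d) for d in readings)
```

Decrypting `C_j` returns the subarea total, while no single reading is ever exposed to the aggregator; the total must stay below $N$ to avoid wrap-around.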

4.2.2. Creation of the new block

The aggregation node records the transaction Tx = (Cj, Pseuij, ts) in a block and broadcasts this block in subarea j for information authentication. The block also contains three other elements: the Merkle root, the previous hash, and the current hash. The Merkle root is obtained by hashing the ciphertext Cj and the pseudonym in the Merkle tree, as shown in Fig. 3. The hash value of the current block is calculated by the following equation:

Because the hash value of the previous block is involved in calculating the hash value of the current block, it is difficult to tamper with a block’s content once the block has been added to the chain.
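The tamper-evidence argument above can be illustrated with a small hash-chain sketch. The exact block-hash equation is not reproduced here, so the sketch assumes (hypothetically) that the current hash is SHA-256 over the previous hash, the Merkle root of the ciphertext and pseudonym, and the timestamp; the all-zero genesis hash and the helper names are also assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple

GENESIS_PREV = "0" * 64                         # assumed placeholder for the first block

def h(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()

def merkle_root(leaves: List[str]) -> str:
    """Hash leaves pairwise until one root remains (odd levels duplicate the last node)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

@dataclass
class Block:
    tx: Tuple[int, int, int]                    # (C_j, Pseu_ij, ts)
    prev_hash: str
    merkle: str
    cur_hash: str

def make_block(tx: Tuple[int, int, int], prev_hash: str) -> Block:
    root = merkle_root([str(tx[0]), str(tx[1])])    # leaves: ciphertext and pseudonym
    cur = h(prev_hash + root + str(tx[2]))          # assumed input: prev || root || ts
    return Block(tx, prev_hash, root, cur)

def chain_valid(chain: List[Block]) -> bool:
    """Recompute every hash; an edited block breaks its own hash or the next link."""
    prev = GENESIS_PREV
    for blk in chain:
        if blk.prev_hash != prev or blk.cur_hash != make_block(blk.tx, prev).cur_hash:
            return False
        prev = blk.cur_hash
    return True
```

Editing any stored transaction changes its recomputed hash; an attacker who also rewrites that hash then breaks the `prev_hash` link of the next block, which is why altering one block requires rewriting every later block.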

4.2.3. Blockchain generation

After the aggregation node creates a new block, the block is broadcast in this subarea. Each ordinary node in the subarea verifies the records in the new block, but only the data related to itself, to meet the real-time scheduling requirement of the smart grid. If the records are consistent with its original data, the node passes the verification and broadcasts the result to the other nodes in the user layer. After confirmation messages from 2n/3 + 1 or more other nodes have been collected, the new block is considered valid and added to the UA-blockchain. In our blockchain network, we assume that malicious nodes account for less than 1/3 of all network nodes; for this security reason, a new block is added to the blockchain only when it passes the verification of 2n/3 + 1 or more nodes. This threshold also implies that an attacker can successfully tamper with the information in a block only if it captures more than 2/3 of the network nodes. The consensus process is shown in Fig. 3.
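The acceptance rule can be expressed as a short check. Reading "2n/3 + 1" as $\lfloor 2n/3 \rfloor + 1$ is our assumption for node counts that are not multiples of 3; under that reading, the helper below confirms that the honest nodes alone reach the threshold exactly when malicious nodes are fewer than $n/3$. Function names are hypothetical.

```python
def confirmation_threshold(n: int) -> int:
    """Confirmations needed to accept a block: floor(2n/3) + 1 (assumed rounding)."""
    return (2 * n) // 3 + 1

def block_accepted(confirmations: int, n: int) -> bool:
    """A block joins the chain once enough nodes have confirmed it."""
    return confirmations >= confirmation_threshold(n)

def honest_nodes_suffice(n: int, malicious: int) -> bool:
    """True if the n - f honest nodes can accept a valid block without any malicious vote."""
    return n - malicious >= confirmation_threshold(n)
```

For a subarea with n = 30 meters, 21 confirmations are required; with up to 9 malicious meters the 21 honest ones still reach the threshold, while 10 malicious meters (exactly n/3) would already block progress.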

《4.3. Generation of FA-blockchain》

4.3. Generation of FA-blockchain

Similarly, in the fog computing layer, the generation of the FA-blockchain consists of transaction generation, new block creation, and blockchain generation.

4.3.1. Transaction generation

The transaction generation in the fog layer is similar to that in the user layer. First, when fog node fogj receives encrypted data from the UA-blockchain, it digitally signs these data to ensure integrity. Then, the selected aggregation node of the fog layer aggregates all Cj, j = 1, ..., m, to obtain the secondary aggregation result. As in the user layer, the fog node with the largest remaining computing resources is chosen as the aggregation node.

The generation of the signature: When the jth fog node fogj receives the aggregated power consumption ciphertext Cj of the corresponding subarea, it can calculate the signature σj:

Next, this fog node sends the report to the corresponding aggregation node in the fog layer.

Verification and ciphertext aggregation: After the aggregation node receives the reports from the fog computing devices, it first checks the validity of each fog device’s pseudonym with the Bloom filter. It then checks the timestamps to confirm that the reports are fresh. Finally, it verifies the signatures by batch verification, using the following expression:

where this equation is derived from the aggregation operation and the concrete values of the public and private keys. The detailed expression is written as

After the fog nodes’ signatures have been successfully verified, the aggregation node performs an aggregation operation to obtain the secondary aggregation ciphertext CAS for all subareas.

Consequently, the aggregated ciphertext is combined with other related information to generate a new transaction.

4.3.2. Creation of the new block

The aggregation node of the fog layer records this transaction in a block and broadcasts the block to the other fog nodes for information authentication. As in the user layer, the block includes the transaction, the Merkle root, and the previous and current hashes. The hash value of the current block is calculated by the following equation:

4.3.3. Blockchain generation

After the aggregation node of the fog computing layer creates a new block, the block is broadcast to the other fog nodes and added to the FA-blockchain through a consensus mechanism similar to that of the user layer. First, each ordinary node in the fog computing layer verifies the records in the new block, but only the data related to itself. If the records are consistent with its original data, the node passes the verification and broadcasts the result to the other nodes in the fog computing layer. After confirmation messages from 2m/3 + 1 or more other fog nodes have been collected, the block is considered valid and added to the FA-blockchain.

《4.4. Service supporting》

4.4. Service supporting

When the cloud server receives the FA-blockchain from the fog computing layer, it reads the secondary aggregation ciphertext and decrypts it using the Paillier decryption algorithm. To apply the Paillier decryption algorithm, we further expand the components of Eq. (13), which can be rewritten as

We define the symbols M and R as

Accordingly, the cloud server can arrange the ciphertext CAS in the following format, which matches the ciphertext form of Paillier encryption:

Consequently, the cloud server can use the Paillier decryption algorithm to decrypt the aggregated ciphertext directly and obtain the aggregated plaintext M.

Finally, Horner’s rule is employed to parse the aggregated plaintext efficiently and obtain fine-grained aggregation results; the detailed procedure is given in Algorithm 1. In this algorithm, each coefficient denotes the total power consumption of subarea j, which is defined as

By recovering these coefficients, the scheme achieves fine-grained aggregation: it obtains not only the total power consumption of the network but also the consumption data of each subarea.
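As a hedged sketch of the idea behind Algorithm 1, assume that the aggregated plaintext packs the subarea totals as $M = \sum_{j=1}^{m} UA_j \, B^{j-1}$ for some base $B$ larger than any possible subarea total; the base $B$, this packing, and the function names are our assumptions. Horner's rule evaluates this polynomial with one multiply-add per coefficient, and running each Horner step in reverse (one `divmod` by $B$) peels the coefficients back off.

```python
from typing import List

def horner_pack(totals: List[int], base: int) -> int:
    """Evaluate M = sum_j totals[j] * base**j with Horner's rule (highest coefficient first)."""
    if any(not 0 <= t < base for t in totals):
        raise ValueError("every subarea total must lie in [0, base)")
    m_value = 0
    for coeff in reversed(totals):
        m_value = m_value * base + coeff
    return m_value

def horner_unpack(m_value: int, base: int, m: int) -> List[int]:
    """Undo the Horner steps: each divmod by base yields the next lowest coefficient."""
    totals = []
    for _ in range(m):
        m_value, coeff = divmod(m_value, base)
        totals.append(coeff)
    if m_value:
        raise ValueError("plaintext has more than m base-B digits")
    return totals
```

For the packing to survive decryption, $B^m$ must stay below $N$, since the Paillier plaintext space is $\mathbb{Z}_N$; summing the unpacked coefficients gives the network-wide total.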

Once the cloud server gains the power consumption of each subarea through the above operations, these fine-grained data can be explored to predict the power usage trend of each subarea, and then provide decision support for power dispatching and price adjustment. Accordingly, the smart contract enables these decisions to be executed automatically and develops the time-of-use pricing feedback strategy to encourage users to adjust their electricity use habits for alleviating the burden of the power grid and improving the power utilization efficiency.

As data accumulate, the shared blockchain ledger grows ever larger, a problem known as blockchain bloat. For example, over the past nine years, the Bitcoin ledger has grown to 153.1 GB [32]. All historical Bitcoin transactions must be retained for a long time because they are needed to calculate account balances. In the proposed aggregation mechanism, by contrast, each newly generated smart meter data item does not depend on the previous one, so there is no need to store all data items on every node. We therefore recommend regularly removing obsolete data items to release storage space on the relevant nodes.
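A minimal sketch of the recommended cleanup, assuming a hypothetical time-based retention policy (the section does not specify when a data item becomes obsolete). Blocks are kept in timestamp order and removed from the oldest end; the hash of the last removed block is retained so that the first remaining block can still be linked to its predecessor. The class and field names are illustrative only.

```python
from collections import deque


class PrunedLedger:
    """Keep only blocks newer than `retention` time units (hypothetical policy)."""

    def __init__(self, retention):
        self.retention = retention
        self.blocks = deque()          # (timestamp, block_hash, payload), in time order
        self.last_pruned_hash = None   # anchor that preserves chain linkage

    def append(self, timestamp, block_hash, payload):
        self.blocks.append((timestamp, block_hash, payload))

    def prune(self, now):
        """Drop blocks older than the retention window; return how many were removed."""
        removed = 0
        while self.blocks and now - self.blocks[0][0] > self.retention:
            _, block_hash, _ = self.blocks.popleft()
            self.last_pruned_hash = block_hash
            removed += 1
        return removed


ledger = PrunedLedger(retention=10)
for t in range(0, 30, 5):
    ledger.append(t, f"h{t}", {"t": t})
print(ledger.prune(now=30))  # 4 (blocks at t = 0, 5, 10, 15)
```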

《5. Security and performance evaluations》

5. Security and performance evaluations

In this section, we discuss the security and anonymity properties of the proposed scheme and analyze its performance in terms of computation cost. In particular, we quantitatively analyze the success probability of tampering attacks under different scenarios, which demonstrates the high security of our proposed scheme. Furthermore, we present the computation costs of identity authentication and of the whole system in detail; they show that the proposed scheme is lightweight and well suited to systems with real-time requirements.

《5.1. Security analysis》

5.1. Security analysis

Data confidentiality: In our threat model, users’ power consumption data transmitted over channels are subject to eavesdropping attacks, and the fog and cloud nodes are both honest-but-curious. To guarantee the confidentiality of users’ private data, Paillier encryption is used to encrypt the consumption data as the ciphertext $C_{ij} = g^{d_{ij}} \cdot r^{N} \bmod N^{2}$. Even if eavesdroppers observe all of these ciphertexts and know the encryption algorithm, they cannot decrypt them to reveal users’ data without the private key, because Paillier encryption is semantically secure against chosen-plaintext attacks [33]. Similarly, the fog and cloud nodes perform aggregation only on encrypted data, so a fog node cannot obtain any user’s real data without the corresponding private key. Although the cloud server can recover the aggregated plaintext $UA_j$ of each subarea by exploiting the additive homomorphism of the Paillier algorithm, it still cannot deduce the original meter data of any individual user. Therefore, our scheme provides strong confidentiality for users’ power consumption data and effectively protects their private information.

Data integrity and validity: To resist active attacks (e.g., tampering, forgery, and replay) on smart meter data, in our scheme each user signs the ciphertext $C_{ij}$ and the timestamp $ts$ with the batch aggregation signature before sending them to the upper layer. The receiver confirms that the received information has not been tampered with only when the batch verification equation (computed modulo $N^{2}$) holds. Once the signed data are tampered with, this equation no longer holds. Hence, even if an attacker modifies the information or launches replay attacks, the receiver can detect these threats effectively. As a result, our scheme guarantees the integrity and validity of the private data, and the same protections apply in the fog layer.
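Replay detection relies on the signed timestamp $ts$: because the signature covers $ts$, an attacker cannot refresh an old report without invalidating the signature. The receiver still needs a freshness rule, which the section does not state, so the sketch below assumes a hypothetical acceptance window of `window` time units and remembers recently accepted (pseudonym, timestamp) pairs. It is meant to run only after batch signature verification has succeeded; `ReplayGuard` is an illustrative name.

```python
class ReplayGuard:
    """Hypothetical freshness check applied after signature verification."""

    def __init__(self, window):
        self.window = window
        self.seen = set()  # accepted (pseudonym, ts) pairs still inside the window

    def accept(self, pseudonym, ts, now):
        # Forget pairs whose timestamps are too old to pass the window check anyway.
        self.seen = {key for key in self.seen if now - key[1] <= self.window}
        if abs(now - ts) > self.window:
            return False  # stale or future-dated timestamp
        if (pseudonym, ts) in self.seen:
            return False  # exact replay of an already accepted report
        self.seen.add((pseudonym, ts))
        return True


guard = ReplayGuard(window=5)
print(guard.accept("P1", 100, now=101))  # True: fresh report
print(guard.accept("P1", 100, now=102))  # False: replayed report
```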

Identity anonymity and authenticity: The user identity is usually associated with private information, and its disclosure can cause a series of hazards. In the proposed scheme, the identities of smart meters and fog devices always appear in pseudonym form, that is, $Pseu_{ij}^{X_{ij}} \bmod N^{2}$ and $Pseu_{j}^{X_{j}} \bmod N^{2}$, respectively, where the public keys $X_{ij}$ and $X_j$ are randomly selected by the user and the fog device, respectively. The generated pseudonyms $Pseu_{ij}$ and $Pseu_j$ are random and are not associated with the true identities of the user and the fog device. Even if a malicious attacker successfully decrypts users’ meter data, the data are of little use because the attacker cannot obtain the real identity of the user. Thus, our scheme realizes user identity anonymity. At the same time, an illegal node may attempt to impersonate a legal user; our identity authentication mechanism can detect such identity fraud because the legal pseudonyms are collected in advance and mapped into the Bloom filter, so a single query quickly determines whether a node’s pseudonym is in the Bloom filter.
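The pseudonym check can be sketched with a plain Bloom filter. This is an illustration under assumptions: pseudonyms are treated as strings, the $k$ bit positions are derived by Kirsch-Mitzenmacher double hashing from one SHA-256 digest, and the sizes are arbitrary; the paper does not specify its hash functions. A Bloom filter never reports a registered pseudonym as absent, but it can occasionally report an unregistered one as present, which is the error probability discussed in Section 5.3.

```python
import hashlib


class BloomFilter:
    def __init__(self, m_bits, k):
        self.m = m_bits                          # number of bits in the array
        self.k = k                               # number of hash positions per item
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: position_i = h1 + i * h2 (mod m), from one SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Present only if every one of the k bits is set.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item))


registry = BloomFilter(m_bits=1 << 16, k=7)
for pseudonym in ["Pseu-17", "Pseu-42", "Pseu-99"]:  # hypothetical registered pseudonyms
    registry.add(pseudonym)
print("Pseu-42" in registry)  # True
```

Each query costs exactly $k$ hash-position lookups, independent of how many pseudonyms are registered.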

《5.2. Successful attacking probabilities》

5.2. Successful attacking probabilities

According to the threat model defined in Section 3, we choose two typical attacks, namely tampering attacks in nodes and over links, to evaluate their impact on the aggregation results. To demonstrate the advantages of our proposed solution, we compare the success probability of tampering attacks under different solutions.

5.2.1. Tampering attack in nodes

In our threat model, we assume that to launch a successful tampering attack, attackers need to manipulate $w$ smart meters in total, and the corresponding number of fog nodes is $f$. For ease of understanding, we suppose that smart meters are compromised independently, and denote the probability that smart meter $i$ is compromised by $\alpha_i$, where $i = 1, 2, \ldots, w, \ldots, nm$ and $0 \le \alpha_i \le 1$. Similarly, the compromise probabilities of the fog nodes and the cloud server are denoted by $\beta_j$ ($j = 1, 2, \ldots, f, \ldots, m$; $0 \le \beta_j \le 1$) and  , respectively. Meanwhile, we assume that the probability that a smart meter’s secret key is intercepted by a malicious node is independent and denoted by , where $i = 1, 2, \ldots, w, \ldots, nm$ and 0 ≤ ≤ 1.

Therefore, the success probability of a tampering attack under the traditional secure scheme can be given as

where the weight 1/3 indicates that the attacker targets the three categories of nodes (smart meters, fog nodes, and the cloud server) with equal probability. The traditional secure scheme transmits data among nodes without a blockchain but otherwise employs the same security mechanisms as our proposed scheme.

For the proposed scheme, the blockchain-based secure scheme can tolerate fewer than one-third compromised nodes owing to the consensus mechanism. Based on this property, we define the threshold  for each subarea, where ceil() returns the smallest integer greater than or equal to its argument. Likewise, a second threshold  is defined for the fog layer. Consequently, the success probability of a tampering attack under our proposed scheme can be written as
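The exact thresholds and probability expressions are not reproduced in this section. As a hedged illustration of why a consensus threshold lowers the attack success probability, the sketch below assumes a hypothetical threshold $t = \lceil n/3 \rceil$ for a subarea of $n$ smart meters and computes the probability that at least $t$ meters are compromised when meter $i$ is compromised independently with probability $\alpha_i$. This is a Poisson binomial tail computed by dynamic programming, not the paper's formula.

```python
import math


def at_least_k_prob(probs, k):
    """P(at least k of the independent events occur); event i has probability probs[i]."""
    dist = [1.0]  # dist[c] = P(exactly c events among those processed so far)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for c, q in enumerate(dist):
            nxt[c] += q * (1 - p)   # this node is not compromised
            nxt[c + 1] += q * p     # this node is compromised
        dist = nxt
    return sum(dist[k:])


def consensus_threshold(n):
    """Hypothetical threshold ceil(n / 3), using exact integer arithmetic."""
    return -(-n // 3)


alphas = [0.1] * 20             # hypothetical: 20 meters, each compromised w.p. 0.1
t = consensus_threshold(20)     # 7
print(at_least_k_prob(alphas, t))
print(at_least_k_prob(alphas, 1))  # for comparison: at least one meter compromised
```

With identical probabilities the result reduces to a binomial tail, which gives a simple cross-check; heterogeneous $\alpha_i$ are handled by the same recurrence.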

5.2.2. Tampering attack over links

In this part, we consider the attack that intercepts or forges data packets over the communication channels.

For the traditional secure scheme, the attacker may tamper with power data before the fog node or cloud server receives them. To modify the data successfully, this type of attack requires both intruding into the communication channel and obtaining the private key of the sending node. Therefore, we denote by $\eta_i$ ($i = 1, 2, \ldots, w, \ldots, nm$; $0 \le \eta_i \le 1$) the success probability of an intercepting attack over the communication link between a smart meter and a fog node, and by  ($j = 1, 2, \ldots, f, \ldots, m$; 0 ≤ ≤ 1) the success probability of an intercepting attack over the communication link between a fog node and the cloud server. Assuming these events are independent, the success probability of this kind of attack is

where the weight 1/2 indicates that the attacker targets the two kinds of communication links with equal probability.

For the proposed scheme, the private data are encapsulated into the blockchain on the links between the user layer and the fog layer and between the fog layer and the cloud server. Because data in the blockchain are generally tamper-resistant, we do not consider the success probability of tampering attacks on the communication links among the user layer, fog layer, and cloud server. However, a consensus process must be executed before a block is formed, during which the nodes of the user layer communicate with each other over an internal communication network that is exposed to tampering attacks. The same threat exists in the fog layer. Therefore, we analyze the success probability of tampering attacks over these two internal networks. First, we assume that there are  communication channels in the user layer, and the success probability of intruding into a communication channel at the user layer is denoted by  , where $x_{uc} = 1, 2, \ldots, l, \ldots$ and 0 ≤ ≤ 1. Similarly, we assume that there are  communication channels in the fog computing layer, and the success probability of intruding into a communication channel at the fog layer is denoted by , where   and 0 ≤ ≤ 1. Therefore, assuming independence, the success probability of tampering with data in the proposed hierarchical blockchain network is

Finally, we assume that tampering attacks in nodes and over links are independent and equally likely to be chosen by the attacker. Consequently, the total success probabilities of tampering attacks for the traditional scheme and our proposed scheme are given, respectively, as

where the weight 1/2 indicates that the two kinds of attacks are independent and that the attacker launches each of them with equal probability.

5.2.3. Successful probabilities

In the previous two parts, we analyzed the success probability of tampering attacks for the traditional and proposed schemes theoretically. To present the results more intuitively, we further analyze the success probability with Monte Carlo simulation. In the simulation scenario, we assume that there are 50 fog nodes, 20 smart meters in each subarea, and 1 cloud server in the service supporting layer. We then assume that the fraction of smart meters that attackers need to manipulate ranges from 10% to 100%; thus, $w$ varies from 100 to 1000 across the entire network of 1000 smart meters. Meanwhile, the variables  all vary from 0.9 to 1, and the range of  is set to [0, 0.1]. The values of the variables are randomly selected within their ranges, and we repeat the experiment 1000 times and report the average of the simulation results. The experiment runs on a notebook with an Intel Core i5-7200U CPU @ 2.50 GHz and 8.00 GB RAM.
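The averaging procedure described above can be sketched as a small Monte Carlo harness: draw every probability parameter uniformly from its range, evaluate a closed-form success probability, and average over 1000 trials. The probability function passed in below, `any_link_intruded`, is a hypothetical stand-in (the probability that at least one of several independent channels is intruded), because the paper's expressions are not reproduced here; the parameter count and range are likewise illustrative.

```python
import math
import random


def monte_carlo_average(prob_fn, count, low, high, trials=1000, seed=0):
    """Average prob_fn over `trials` draws of `count` parameters, each uniform in [low, high]."""
    rng = random.Random(seed)  # fixed seed for reproducible runs
    total = 0.0
    for _ in range(trials):
        params = [rng.uniform(low, high) for _ in range(count)]
        total += prob_fn(params)
    return total / trials


def any_link_intruded(probs):
    """Hypothetical stand-in: P(at least one of the independent channels is intruded)."""
    return 1.0 - math.prod(1.0 - p for p in probs)


estimate = monte_carlo_average(any_link_intruded, count=10, low=0.0, high=0.1)
print(estimate)
```

Any of the closed-form expressions from Sections 5.2.1 and 5.2.2 can be passed as `prob_fn`, with one call per parameter range and per value of $w$.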

Fig. 4 depicts the relationship between the success probability of attacks and the total number of smart meters that attackers need to manipulate. Notably, the success probability decreases continuously as the number of smart meters to be manipulated increases, and our proposed scheme shows a significant advantage in reducing security threats. In particular, the success probability of attacks in our scheme approaches 0 when the number of smart meters that attackers need to manipulate exceeds 500. The main reason is that our scheme employs two consensus mechanisms, in the generation of the UA-blockchain and the FA-blockchain, respectively, and each consensus mechanism requires group verification. Therefore, the double-blockchain significantly enhances the robustness of the system.

《Fig. 4》

Fig. 4. Successful attacking probabilities under different solutions.

《5.3. Computation cost》

5.3. Computation cost

In this subsection, we analyze the computation costs of identity authentication and of the entire system. In the simulation scenario, the number of fog nodes varies from 5 to 50. Meanwhile, we set the error probability of the Bloom filter to 0.01, and set the RSA modulus N and the parameter p to 1024 bits and 160 bits, respectively. Although a content-based Bloom filter inevitably produces collisions (false positives), their probability is very small. For example, with seven different hash functions and a 2 MB bit string, the overall error rate is below 0.01, provided the number of stored pseudonyms is suitably bounded. Therefore, it is reasonable to set the error probability of the Bloom filter to 0.01. For convenience, we denote by $T_{E1}$, $T_{E2}$, $T_M$, and $T_P$ an exponentiation operation in  , an exponentiation operation in G, a multiplication operation, and a bilinear pairing in G, respectively. We use the pairing-based cryptography (PBC) library to implement these operations. The simulation data set is from the Commission for Energy Regulation of Ireland [34]. Table 1 lists the operation notations and their time costs in the evaluation process.
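The quoted error rate depends on how many pseudonyms are stored, which the example does not state. The sketch below uses the standard approximation $p \approx (1 - e^{-kn/m})^{k}$ for a filter of $m$ bits, $n$ items, and $k$ hash functions, together with the usual optimal-sizing formulas, to check the example. It assumes that "2 MB" means $2 \times 2^{20}$ bytes and that hash outputs behave as independent and uniform.

```python
import math


def false_positive_rate(m_bits, n_items, k):
    """Standard approximation (1 - e^(-k n / m))^k."""
    return (1.0 - math.exp(-k * n_items / m_bits)) ** k


def bits_per_item(p):
    """Optimal m / n for a target false-positive rate p: -ln(p) / (ln 2)^2."""
    return -math.log(p) / math.log(2) ** 2


def optimal_k(p):
    """Optimal number of hash functions, -log2(p), rounded to an integer."""
    return round(-math.log2(p))


m = 2 * 1024 * 1024 * 8  # assumed: a 2 MB bit array, in bits
print(optimal_k(0.01), bits_per_item(0.01))
for n in (500_000, 1_000_000, 2_000_000):
    print(n, false_positive_rate(m, n, k=7))
```

With $k = 7$, a 2 MB array stays below a 0.01 error rate for about one million pseudonyms but exceeds it for two million, and a target of 0.01 calls for roughly 9.6 bits per stored pseudonym, which is also why seven hash functions are a natural choice.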

《Table 1》

Table 1 Operation notations and time costs.

Fig. 5 shows the time cost of identity authentication with and without the Bloom filter. As the figure shows, the time cost of the traditional scheme without the Bloom filter grows sharply as the number of smart meters increases, whereas the time cost of our proposed scheme grows only slightly and remains much lower. This is because the Bloom filter represents the set of legal pseudonyms compactly using multiple hash functions, and each membership query requires only a fixed number of hash computations regardless of how many pseudonyms are stored, which greatly improves the query efficiency of the authentication process.

《Fig. 5》

Fig. 5. Time cost of identity authentication.

Subsequently, to present the computation cost comprehensively, we analyze the computation cost of the entire system under our scheme and compare it with two benchmark schemes: the security-enhanced data aggregation scheme (SEDA) [13] and the lightweight privacy-preserving data aggregation scheme for edge computing (LPDA-EC) [21]. Because the cost of a hash operation is negligible compared with exponentiation and multiplication operations, we omit hash operations from the evaluation.

Specifically, in the user layer, generating the ciphertext   and the signature requires two multiplication operations $T_M$ and one exponentiation operation $T_{E1}$ in  , respectively. After the aggregation node in the user layer receives the reports  from n smart meters, it first authenticates the validity and integrity of the received data by batch verification, which requires n multiplication operations $T_M$ and one exponentiation operation $T_{E1}$ in . Next, aggregating the users’ data requires n multiplication operations $T_M$. Finally, the aggregation node sends the report to the fog layer. In the fog layer, generating the signature $\sigma_j$ requires one exponentiation operation $T_{E1}$ in  ; the fog nodes then send the reports  to the aggregation node of the fog layer, which first authenticates the received data by batch verification, requiring m multiplication operations $T_M$ and one exponentiation operation $T_{E1}$ in . After successful authentication, the aggregation node aggregates the first-level aggregation reports $C_j$, $j = 1, 2, \ldots, m$, which requires m multiplication operations $T_M$. The fog aggregation node then sends the aggregation report to the upper layer. Upon receiving it, the cloud server decrypts the aggregation report; this Paillier decryption requires one exponentiation operation $T_{E1}$ and one multiplication operation $T_M$. From the above analysis, the entire calculation process of our scheme comprises $(4mn + 2m + 1)T_M + (mn + 2m + 2)T_{E1}$ operations. The computation costs of the other schemes can be obtained similarly; the specific operation statistics of these schemes are shown in Table 2.
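The operation count above can be checked mechanically. The sketch below tallies $T_M$ and $T_{E1}$ step by step for $m$ fog nodes (subareas) with $n$ smart meters each, following the order of the analysis, and `total_cost` multiplies the counts by per-operation times that the caller supplies (for example, the measured values in Table 1, which are not reproduced here). The function names are illustrative.

```python
def operation_counts(m, n):
    """Return (#T_M, #T_E1) for m fog nodes/subareas with n smart meters each."""
    t_m = t_e1 = 0
    # User layer, every meter: ciphertext and signature -> 2 T_M + 1 T_E1.
    t_m += 2 * m * n
    t_e1 += m * n
    # User-layer aggregation node of each subarea:
    # batch verification (n T_M + 1 T_E1) and aggregation (n T_M).
    t_m += m * (n + n)
    t_e1 += m
    # Fog layer: one signature per fog node (T_E1 each), then the fog aggregation
    # node's batch verification (m T_M + 1 T_E1) and aggregation (m T_M).
    t_e1 += m + 1
    t_m += 2 * m
    # Cloud server: Paillier decryption -> 1 T_E1 + 1 T_M.
    t_e1 += 1
    t_m += 1
    return t_m, t_e1


def total_cost(m, n, cost_tm, cost_te1):
    """Total time, given caller-supplied per-operation costs."""
    t_m, t_e1 = operation_counts(m, n)
    return t_m * cost_tm + t_e1 * cost_te1


print(operation_counts(50, 20))  # (4101, 1102)
```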

《Table 2 》

Table 2 Time costs.

In Fig. 6, as with the computation cost of identity authentication, the total computation cost of the system grows approximately linearly with the number of smart meters. Moreover, our proposed scheme achieves a significant reduction in total computation cost compared with SEDA and LPDA-EC. For example, when there are 500 smart meters, the total computation cost of our scheme is 103 ms, which is 80% and 60% lower than those of SEDA and LPDA-EC, respectively. Furthermore, the reduction becomes more pronounced as the number of smart meters increases. This is mainly because a bilinear pairing takes much longer than the other operations, and both SEDA and LPDA-EC rely on expensive bilinear pairing operations during verification. In contrast, our scheme avoids pairing operations altogether, which significantly reduces its computation cost.

《Fig. 6》

Fig. 6. Total computation cost of the system.

From the above security and performance analysis results, we conclude that the proposed secure and anonymous data aggregation scheme significantly reduces the system computation cost while providing strong security and anonymity protection. Moreover, it is better suited to smart grid systems that require real-time, high-frequency data collection and aggregation.

《6. Conclusions》

6. Conclusions

The smart grid can provide reliable and stable services by collecting and analyzing users’ electricity consumption data, but users’ security and privacy are usually threatened during these operations. Therefore, we propose a DA-SADA scheme. Specifically, we construct a security-enhanced three-tier architecture by combining fog computing and the blockchain, which exploits local resources effectively. Subsequently, we develop a lightweight secure aggregation mechanism to ensure the confidentiality, integrity, and authenticity of private data. In particular, to realize flexible power regulation, we design a double-blockchain to achieve fine-grained aggregation of users’ power consumption data, and the double consensus in the formation of the double-blockchain further enhances the security of the system. Finally, the security analysis confirms the high security of our scheme, and the comparison of computation costs for the entire system further validates its performance advantage, making it well suited to systems with real-time requirements. Although our scheme provides an efficient and secure data collection mechanism for the smart grid, it still lacks an efficient and intelligent method for selecting aggregation nodes. In future work, we therefore plan to develop a dynamic and intelligent aggregation node selection mechanism by integrating machine learning methods, to improve the applicability of the scheme in real network scenarios.

《Acknowledgments》

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (61971235, 61871412, and 61771258), the Six Talented Eminence Foundation of Jiangsu Province (XYDXXJS-044), the China Postdoctoral Science Foundation (2018M630590), the 333 High-level Talents Training Project of Jiangsu Province, the 1311 Talents Plan of Nanjing University of Posts and Telecommunications (NUPT), the Open Research Fund of Jiangsu Engineering Research Center of Communication and Network Technology, NUPT (JSGCZX17011), the Scientific Research Foundation of NUPT (NY218058), and the Open Research Fund of Anhui Provincial Key Laboratory of Network and Information Security (AHNIS2020001).

《Compliance with ethics guidelines》

Compliance with ethics guidelines

Siguang Chen, Li Yang, Chuanxin Zhao, Vijayakumar Varadarajan, and Kun Wang declare that they have no conflict of interest or financial conflicts to disclose.