Privacy-aware cloud ecosystems: Architecture and performance

Summary With an increasing number of cloud providers offering services used by both individual users and other providers, there is a realization that service provision now involves an “ecosystem” of providers. Some providers may be directly visible to a user, while others may contribute to composite services and not be directly known to the user, as only the provider offering the composite service is visible. Such services may include domain-specific services (eg, simulation), advertising services, or profiling/analytics services. Understanding the impact of such a composite service on a user's data privacy remains a challenge, and providing transparency (and obtaining user consent for data use) remains a key requirement of the European General Data Protection Regulation (GDPR). An architecture that makes use of blockchains and smart contracts is proposed to address this requirement. An implementation of the architecture is used to demonstrate how access control can be managed and audited. The scalability and cost of undertaking access control, as the number of actors (both service providers and “voters”) increases, are also described. The proposed approach can be used to support service aggregation across both private and public clouds.

of a controller or joint controller. 3 By defining these elements, GDPR assigns responsibility for any violation in data processing to the controller or joint controller, but also gives shared responsibility to the processor when the user has no direct control over the data (or the analysis carried out on the data). The integration of GDPR into a cloud ecosystem helps make more explicit the responsibility and accountability requirements of both the processor(s) and controller(s). Under such requirements, any operation of a cloud provider on personal data must be in accordance with user consent. 1 Given these GDPR requirements, several solutions have been proposed to support the accountability and provenance tracking of user data when it is delivered to a controller or submitted to a processor. 4 Some solutions utilize blockchain-based technologies to improve transparency and trust between users and actors. 4,5 In addition to these solutions, the blockchain as a shared ledger has been integrated into applications that cover privacy, authentication, provenance, and data integrity. 5 Blockchain technology has brought a provably secure, fully distributed, and consensus-driven solution to these applications. A review of recent blockchain-based techniques to enhance privacy and trust in cloud environments 6,7 shows that the impact of such techniques on cloud-based service deployment has not yet been studied.
This article proposes a blockchain-based approach to improve provenance tracking of cloud user data under GDPR requirements. The main contributions of this work can be summarized as follows: (i) a service-oriented architecture that makes use of a blockchain network to enable an audit trail of service providers to be generated. The architecture supports trustable containers that securely log all provider operations on personal data; (ii) a blockchain-based logging mechanism to identify providers who violate GDPR rules; (iii) a case study to show how GDPR rules can be deployed as smart contracts in the blockchain, supporting access, transfer, and profiling operations on user data; (iv) a performance evaluation showing the effect of increasing the number of parties who verify provider operations, and the amount of gas (a metric used to measure the computational complexity of carrying out the analysis) used for deploying smart contracts over a blockchain test network.
The rest of this article is structured as follows. Section 2 provides background material about blockchain and smart contracts. Section 3 describes the proposed architecture for supporting user privacy in a cloud computing environment, and Section 4 focuses on interactions between the software components in the architecture. Section 5 includes a case study to illustrate how verification can be carried out using a blockchain, with reference to GDPR rules. Section 6 provides experimental results describing the computational complexity associated with carrying out the verification process. Related research work is reviewed in Section 7 and conclusions are provided in Section 8.

BLOCKCHAIN BACKGROUND
A blockchain is a public ledger comprising a distributed, shared database (storing records in blocks) and a set of connected nodes. The blocks are structured as a chain, with each block containing a hash of its previous block. Each block also contains a time stamp and a nonce. The former shows the creation time of the block and the latter is an arbitrary number used just once in a cryptographic communication. 8 The nodes in a blockchain have a peer-to-peer relationship and can build a new block of valid transactions via a mining process. The nodes creating the blocks are called miners.
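The chaining of blocks through hashes can be illustrated with a minimal sketch. It is not the data layout of any particular blockchain platform: the field names, the use of SHA-256 over a JSON encoding, and the helper functions `block_hash`, `make_block`, and `chain_is_valid` are all assumptions made for illustration.

```python
import hashlib
import json
import time


def block_hash(block: dict) -> str:
    """Hash a block's contents deterministically (SHA-256 over sorted JSON)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_block(records: list, previous_hash: str, nonce: int = 0) -> dict:
    """Build a block holding records, a time stamp, a nonce, and the previous hash."""
    return {
        "records": records,
        "timestamp": time.time(),
        "nonce": nonce,
        "previous_hash": previous_hash,
    }


def chain_is_valid(chain: list) -> bool:
    """Check that every block stores the hash of the block before it."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


# Hypothetical two-block chain: a genesis block followed by one logged record.
genesis = make_block(["genesis"], previous_hash="0" * 64)
second = make_block(["access logged"], previous_hash=block_hash(genesis))
chain = [genesis, second]
```

Changing any record in an earlier block changes its hash, so the link stored in the following block no longer matches and `chain_is_valid` returns `False`.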
Mining is a core concept of the blockchain through which a block is created and attached to a blockchain network. To this end, several techniques are currently available, namely, Proof of Work (PoW), Proof of Stake (PoS), Proof of Importance (PoI), Proof of Space (PoSpace), and Practical Byzantine Fault Tolerance (PBFT). 3 A blockchain network can be public, federated, or private. 9 In a public blockchain, everyone can participate and access blocks without any permission (eg, Ethereum 10 ). In the federated blockchain, the network is operated under the leadership of several organizations or groups, which limit the type of user who can take part in the verification of transactions (eg, Corda 11 and R3 12 ). Finally, in the private blockchain, only one organization has permission to create/verify blocks (eg, Monax 13 and Multichain 14 ).
A smart contract is executable code that runs on the blockchain and translates a conventional contract between two or more individuals into a program. A smart contract mediates between two parties and obliges them to follow the contract. Each contract can involve a set of transactions that may alter the state of the blockchain, for example, Ethereum. 10 Ethereum uses payments (referred to as "gas") for deploying a smart contract or executing transactions. Gas is a unit measuring the computational effort required to carry out operations in a smart contract (associated with the operations, or opcodes, in the contract). 8,15 Gas is paid for in Ether, the cryptocurrency of the Ethereum ecosystem, which allows smart contracts to be executed.
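As a worked example of how gas translates into a payment, the sketch below multiplies the gas used by a gas price and converts the result to Ether. The unit conversions (1 Ether = 10^18 wei, 1 gwei = 10^9 wei) are standard for Ethereum; the gas amount and the gas price of 20 gwei are hypothetical values chosen only for illustration, and `fee_in_ether` is a helper name introduced here.

```python
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18
WEI_PER_GWEI = Decimal(10) ** 9


def fee_in_ether(gas_used: int, gas_price_gwei: Decimal) -> Decimal:
    """Fee = gas used x gas price, converted from wei to Ether."""
    fee_in_wei = Decimal(gas_used) * gas_price_gwei * WEI_PER_GWEI
    return fee_in_wei / WEI_PER_ETHER


# Hypothetical transaction: 100 000 gas at an assumed price of 20 gwei.
example_fee = fee_in_ether(gas_used=100_000, gas_price_gwei=Decimal(20))
```

Under these assumed values the fee is 0.002 Ether; the actual fee for any contract depends on the gas price at the time of execution.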

PRIVACY-AWARE CLOUD ECOSYSTEMS ARCHITECTURE
Figure 1 shows the architecture, which has two general workflows: (i) service delivery and (ii) improving user privacy. The former offers a set of services or composite services realizing user requirements. The latter proposes a blockchain-based technique for user data provenance tracking.
The user interface enables submission of personal data to cloud-hosted services. Using the interface, a user can also submit preferences for verifying GDPR rules on the operations executed by providers.

Privacy and access management
This layer includes four components: service broker, agreement builder, container management, and blockchain network. Together they enable an audit trail of provider operations to be stored in a blockchain network. Operations can be access, store, profiling, or transfer of user data. Any GDPR violations are also flagged to the user through this layer.
Service broker: identifies services that match user requirements, providing the name, location, and address of the service provider to the agreement builder component.
Agreement builder: acts as a broker to create a shared agreement between a user and provider(s). Given the operations to be executed on user data, this component builds a smart contract to record information required for verifying operations under GDPR rules. The smart contract address is sent to providers to be deployed on their containers.
Container management: launches and manages a container on the provider to get data from the provider and submit this to a blockchain network. It deploys the smart contract supplied by the agreement builder for recording such data. The data may involve user and provider addresses, the operations processed on user data (eg, access, transfer, profiling), and information for verifying operations under GDPR rules (eg, user age). Our assumption is that containers are trusted and that they record every operation executed on the user data.

Service management
This layer is responsible for discovering, building, and publishing cloud services (some of which may be an aggregation of multiple services). The QoS management component maintains details about cost, availability, and uptime associated with each service.

REALIZATION OF THE ARCHITECTURE
Interaction between components is realized using four phases. Table 1 describes the symbols used.
Phase 1: service discovery and composition. This phase identifies requested services or develops composite services. A service broker identifies providers involved in the offered services.
Phase 2: building a shared agreement. A protocol for the creation of a shared agreement (based on GDPR requirements) is illustrated in the sequence diagram of Figure 2. This phase is activated by the agreement builder using service details provided by the service broker component. The agreement builder confirms the identity of services requiring user data and sends a request to data controllers/processors (actors) about the operations they will execute on user data. The agreement builder then waits for the consent of the data subject. Given the operations and associated GDPR rules, the agreement builder then builds a smart contract, referred to as container_submission. The smart contract consists of a template for storing data in the blockchain. This component also determines a set of voters for verifying operations. 1 The voters are third parties connected to the blockchain network and can give votes when executed operations do not comply with GDPR rules.
Phase 3: logging data processing. Operations on user data, recorded using the trusted container, are stored in a blockchain (based on the container_submission smart contract). As shown in Figure 3, the agreement builder requests personal data from the data subject, which is then forwarded to the actors for processing. A container_submission smart contract is subsequently used by a container to log the data, which facilitates the verification process. On termination of data processing, a message indicating the finalization of the process is submitted to the agreement builder.
During agreement development between data subject and actors, the former should be notified in advance about operations that will be executed by actors on their personal data. Not all data subjects may be concerned about GDPR compliance, as verifying an operation is a costly process and a part of such cost should be paid by the data subject. We therefore introduce a degree of compliance to enable a data subject to associate a value in [0, 1] with each operation that will be executed on their personal data:
Definition 1. Let $\mathcal{O}$ be a set of operations executed by actor(s) on personal data. A function $\mathrm{Deg}: \mathcal{O} \to [0, 1]$ maps the degree of compliance for verifying each operation to a real number between 0 and 1. For an operation $o_i \in \mathcal{O}$ executed by actor $i$, the outputs $\mathrm{Deg}(o_i) = 1$, $0 < \mathrm{Deg}(o_i) < 1$, and $\mathrm{Deg}(o_i) = 0$ show full-compliance, partial-compliance, and noncompliance, respectively.
Each voter can also define a threshold for the verification of each operation; that is, if the data subject's degree of compliance for an operation is greater than or equal to a voter's threshold, the voter verifies the operation. The choice of a threshold is subjective and shows the interest of a voter in verifying GDPR compliance. Defining such a threshold is independent of the degree of compliance determined by data subjects. Setting it too high may limit the number of voters who engage. Setting it too low may not lead to a useful outcome. The degree of compliance can be considered as an input of the container_submission smart contract. The voting results can be reported according to the degree expressed by the data subject and the thresholds determined by voters. An actor is classified as a violator based on the following definition.
Definition 2. Let $\mathcal{V} = \{V_1, \ldots, V_l\}$ be a set of voters and $v_j$ be a vote by $V_j$ after verifying operation $o_i$ by actor $i$ such that $v_j = 1$ if $\mathrm{Deg}(o_i) \geq \theta_j$ and $o_i$ violates a rule in $G_{o_i}$, and $v_j = 0$ otherwise, where $G_{o_i}$ is a set of GDPR rules related to $o_i$, and $\theta_j$ is a threshold defined by $V_j$ for verifying $o_i$. Moreover, let $m \leq l$ be the minimum number of acceptable votes for reporting a violation. The actor $i$ is classified as a violator if $\sum_{j=1}^{l} v_j \geq m$.
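The two definitions can be sketched in a few lines of Python. This is an illustrative reading rather than the authors' implementation: it assumes that a voter casts a vote of 1 only when it verifies the operation (the degree of compliance is at least the voter's threshold) and finds the operation in breach of a GDPR rule, and the names `vote` and `is_violator` are introduced here for illustration.

```python
def vote(degree: float, threshold: float, complies: bool) -> int:
    """Return 1 if the voter verifies the operation and finds a breach, else 0.

    Assumption: a voter only verifies when degree >= threshold.
    """
    verified = degree >= threshold
    return 1 if verified and not complies else 0


def is_violator(degree: float, thresholds: list, complies: bool, min_votes: int) -> bool:
    """Classify the actor as a violator if at least min_votes voters report a breach."""
    votes = [vote(degree, threshold, complies) for threshold in thresholds]
    return sum(votes) >= min_votes


# Hypothetical example: three voters, a non-compliant operation, degree 0.8.
reported = is_violator(degree=0.8, thresholds=[0.5, 0.7, 0.9], complies=False, min_votes=2)
```

In this hypothetical example the voters with thresholds 0.5 and 0.7 verify the operation and report the breach, while the voter with threshold 0.9 does not engage, so two votes are cast and the actor is reported when m = 2 but not when m = 3.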

CASE STUDY
Consider a composite service combining Order creation, Payment, and Shipping 16 services. A customer orders a product, ships it to a destination address, and organizes the payment process using an online portal (see Figure 5). A provider should access a customer's personal data to carry out the following data processing (operations): (i) Order creation service provider: should obtain the customer's name, identification number, biometric information, age, and contact details; (ii) Payment service provider: needs to access the customer's name, identification number, and bank account details to handle the payment; (iii) Shipping service provider: requires the name and contact details of a customer. The provider remotely interacts with a subcontractor (a Mail service provider) to manage the product delivery. Given the roles defined in GDPR, both Order creation and Payment service providers are processors and directly access customer data. The Shipping service provider, however, can have both processor and controller roles. It acts as a processor when managing a part of data delivery and as a controller when transferring the data to the subcontractor. Finally, the Mail service provider is assumed to be a data processor.
It is assumed that a shared GDPR-based agreement has been reached between the customer and actors (controllers/processors). The agreement is based on three GDPR rules: access, transfer, and profiling of customer data, where each actor must guarantee that (i) if customer data are

Container_submission smart contract
This smart contract is deployed at containers and has three functions: Access, Transfer, and Profiling. Each function gets the information needed to verify its related GDPR rule in the verification phase. The smart contract also enables the customer to specify a degree of compliance for verifying operations under GDPR rules. Access uses the boolean auth_access variable to identify whether the service supplied by the actor supports encryption of personal data or not; Transfer gets the country name of the provider receiving customer data; Profiling requests the age of the customer whose personal data are under an automated profiling operation (eg, obtaining some statistical results on customer data).
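The shape of the records logged by such a contract can be sketched as follows. The prototype described in this article is written in Solidity; the Python class below only mirrors the three functions for illustration, and every name other than auth_access (the class, its fields, and the record keys) is a hypothetical choice rather than the contract's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerSubmission:
    """Illustrative stand-in for the container_submission contract's log."""

    user: str   # hypothetical: blockchain address of the customer
    actor: str  # hypothetical: blockchain address of the controller/processor
    log: list = field(default_factory=list)

    def access(self, auth_access: bool, degree: float) -> None:
        """Record whether the accessing service supports encryption of personal data."""
        self.log.append({"operation": "access", "auth_access": auth_access, "degree": degree})

    def transfer(self, country: str, degree: float) -> None:
        """Record the country of the provider receiving customer data."""
        self.log.append({"operation": "transfer", "country": country, "degree": degree})

    def profiling(self, age: int, degree: float) -> None:
        """Record the age of the customer whose data are profiled."""
        self.log.append({"operation": "profiling", "age": age, "degree": degree})


# Hypothetical usage with placeholder addresses.
submission = ContainerSubmission(user="0xUSER", actor="0xACTOR")
submission.access(auth_access=False, degree=0.8)
submission.transfer(country="France", degree=0.5)
submission.profiling(age=17, degree=1.0)
```

Each record carries the customer's degree of compliance alongside the operation-specific field, so that a verifier has everything it needs for the corresponding GDPR rule in one place.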

Verification smart contract
This smart contract is deployed by predefined voters to verify GDPR rules on actors. Each voter detects GDPR violations and sends out a message.
Four functions, Initial_Verification, Verify_Access, Verify_Transfer, and Verify_Profiling, are used to implement this. Each function is implemented for the verification of an operation executed by actors on customer data. 2
Initial_Verification: for each operation executed by an actor, the function compares the customer's degree of compliance for verifying the operation with the thresholds determined by voters. If the former is greater than or equal to the latter, then it (locally) calls the function for verifying the operation in the smart contract.
Verify_Access: the verification smart contract has a list of all sensitive data identified by the GDPR standard. For verification, personal data logged by a container is compared with the sensitive data list. The log recorded by a container is checked to identify the authentication status. If the authentication variable (auth_access) is false, the actor accessing the customer data is identified as a violator.
Verify_Transfer: checks the location of the data receiver and, if it is outside Europe, then checks the list of BCR-certified countries. If neither check matches, a GDPR violation is flagged.
Verify_Profiling: checks the customer age recorded through the deployed container_submission smart contract. If the age is less than 18 years, then a violation is flagged.
The approach proposed in Reference 3 can be used for translating the aforementioned GDPR rules into opcodes for use in smart contracts.
The approach focuses on the most frequently used operations, for example, access, data transfer, and so on. These operations are used to support data processing (on personal user data) by service providers and their execution can be directly monitored and verified.
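The checks performed by the four verification functions can be sketched as below. This is a simplified illustration in Python rather than the Solidity contract: the record layout, the country lists (deliberately incomplete placeholders), and the function names in snake case are assumptions, and the comparison against a sensitive-data list in Verify_Access is omitted.

```python
from typing import Optional

# Hypothetical, deliberately incomplete lists used only for illustration.
EUROPEAN_COUNTRIES = {"France", "Germany", "Ireland"}
BCR_CERTIFIED_COUNTRIES = {"Examplestan"}


def verify_access(record: dict) -> bool:
    """Flag a violation when the logged auth_access value is false."""
    return not record["auth_access"]


def verify_transfer(record: dict) -> bool:
    """Flag a violation when the receiver is neither in Europe nor BCR certified."""
    country = record["country"]
    return country not in EUROPEAN_COUNTRIES and country not in BCR_CERTIFIED_COUNTRIES


def verify_profiling(record: dict) -> bool:
    """Flag a violation when the profiled customer is under 18."""
    return record["age"] < 18


VERIFIERS = {
    "access": verify_access,
    "transfer": verify_transfer,
    "profiling": verify_profiling,
}


def initial_verification(record: dict, threshold: float) -> Optional[bool]:
    """Verify only if the degree of compliance reaches the voter's threshold.

    Returns None when the voter does not verify, else True for a violation.
    """
    if record["degree"] < threshold:
        return None
    return VERIFIERS[record["operation"]](record)
```

For example, `initial_verification({"operation": "profiling", "age": 16, "degree": 0.9}, threshold=0.5)` flags a violation, while the same record with a threshold of 0.95 is not verified at all.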

Voting smart contract
This smart contract is deployed by the agreement builder and collects the votes returned by voters in order to check whether a violation has been committed by actors. The function of this contract (called Conclude here) gets the addresses of voters participating in the verification process. Given Definition 2, if at least m voters report a violation, the actor is reported.

EXPERIMENTAL RESULTS
An initial prototype was built using Ganache 18 and Ropsten, 19 and the smart contracts in Figure 6 were programmed in Solidity. 20 The Ganache local test network supplied default gas and Ether to alter blockchain states through function calls. Ropsten is a public test network, supporting miners, with a gas limit of 4 712 388 for deploying a contract. Remix Ethereum was used as the framework to compile and run deployed contracts. The smart contracts container_submission and verification were deployed on Ganache and Ropsten, respectively. The amount of gas used was 773 721 for container_submission and 1 814 952 for verification. 3 The gas consumption for the deployment of the voting smart contract depends on the number of voters involved in the contract. The variation in the amount of gas consumed when changing the number of voters, and the variation in gas used when changing the number of operations of actors, are taken into account. Moreover, the impact of deploying the voting contract with different numbers of voters on the (average) time taken for the mining process is evaluated. For the GDPR rules identified in the case study, experiments are carried out to show the rate of violation detection under different threshold levels determined by voters. Furthermore, the relationship between the violation detection rate and the number of acceptable votes for reporting a violation is investigated for different diversities of voters' thresholds.

Number of voters vs gas consumption:
Figure 7 shows the relationship between the number of voters and the gas used for the deployment of the voting contract. In this experiment, deployed on the Ganache test network, the number of voters varies from one to eight. As seen from the figure, when the number of voters increases, the amount of consumed gas increases steadily from 365 597 (one voter) to 723 608 (eight voters).

Number of operations executed by an actor vs gas consumption:
We consider one actor and one voter, with the number of operations (ie, Access, Transfer, Profiling) executed by the actor varying from 1 to 10. The transaction cost of the Initial_Verification function (with different parameters) was measured five times to calculate the average gas used. In each activation, the operations, degrees of compliance, and voters' thresholds were selected randomly. Given these assumptions, the relationship between the number of operations and the transaction costs used for verifying them is shown in Figure 8. It can be seen from the figure that as the number of operations increases, the used gas varies from 42 503 (one operation) to 298 384 (10 operations).
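The endpoint figures reported for the two gas experiments allow a rough estimate of the marginal cost of each additional voter or operation. The sketch below assumes that gas grows approximately linearly between the reported endpoints, which is only an approximation: the intermediate values are given in Figures 7 and 8 and are not used here. The helper name `average_increment` is introduced for illustration.

```python
def average_increment(first: int, last: int, steps: int) -> float:
    """Average change per step between two reported endpoint values."""
    return (last - first) / steps


# Voting contract deployment: one voter (365 597 gas) to eight voters (723 608 gas).
gas_per_extra_voter = average_increment(365_597, 723_608, steps=7)

# Initial_Verification calls: one operation (42 503 gas) to 10 operations (298 384 gas).
gas_per_extra_operation = average_increment(42_503, 298_384, steps=9)
```

Under this linear assumption, each additional voter adds roughly 51 000 gas to the deployment of the voting contract, and each additional operation adds roughly 28 000 gas to verification.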

Number of voters vs mining time:
This experiment was performed on the Ropsten test network to measure the time taken from the deployment of a contract to its mining, repeated five times to calculate the average. The number of voters in the contract was varied from one to eight. Figure 9 shows the time taken (in seconds) for the voting contract to be successfully mined after its deployment. Similar to Reference 21, our results indicate that the time depends on the interest of miners in the voting contract and does not depend on the number of voters or contract parameters. Consequently, miners can take an arbitrary time for the mining process.
Violation detection rates: the effect of changing voter thresholds on the rate of violation detection in the GDPR rules (from Section 5) is measured, where a violation in a rule is based on the conditions proposed in Definition 2. We consider nine actors (controllers/processors) and one voter (with one operation per actor). Moreover, the data subject assigns a degree of compliance for verifying operations executed by actors. Voter thresholds are set to 5, 7, and 9. 4 In Figure 10, the x-axis indicates the number of actors who committed a violation, and the y-axis the rate of violation detection based on the voter's thresholds. For each threshold, the verification smart contract was activated 10 times to calculate the average rate of violation detection. In each activation, a random number between 0 and 10 was generated to indicate the degree of compliance with GDPR rules on the operation executed by each actor. Given such assumptions, the average rate of violation detection is calculated as: where n is the number of violations (varies from 1 to 8) and r_i is the average number of successful detections for violation i. When a voter increases its threshold for verifying operations, the average rate of violation detection decreases. The fluctuations are due to the random generation of compliance degrees. We observe that changing the number of violations does not affect the detection rate, as GDPR compliance of operations is automatically verified and any violation is flagged when the voter's threshold is less than or equal to the degree of compliance required by the data subject.
Violation detection rate vs number of votes: this experiment evaluates the impact of changing the minimum number of acceptable votes for reporting a violation on the average rate of violation detection. The evaluation is done under different thresholds. Considering eight actors, one is randomly selected as a violator breaching a GDPR rule. Moreover, there are eight voters (with thresholds between 0 and 10), and the data subject's compliance degree for verifying the rules is randomly chosen between 0 and 10. In the experiment, the diversity of thresholds for voters is assumed to be 4 and 8, that is, voters have 4 and 8 different thresholds for verifying operations, respectively. For a diversity of four, every two of the eight voters share the same threshold. For a diversity of eight, each voter has a threshold different from the others.
In Figure 11, the x-axis shows the minimum number of acceptable votes to issue a violation report. The y-axis indicates the average rate of violation detection, calculated by the formula provided in the previous experiment with n = 1. The smart contracts of the case study were  11, by increasing the number of acceptable votes for reporting a breach, the average rate of violation detection declines gradually. It is also observed that a higher detection rate (or a more precise violation detection) can be achieved when the number of thresholds used increases and the number of acceptable votes decreases.
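The trend described for this experiment can be explored with a small simulation. The sketch below is not the experimental setup used in this article: it assumes a single violator, eight voters who each vote when the randomly drawn compliance degree reaches their threshold, and two hypothetical threshold sets chosen to have four and eight distinct values. It makes no claim to reproduce the values in Figure 11.

```python
import random


def detection_rate(thresholds: list, min_votes: int, trials: int = 1000, seed: int = 0) -> float:
    """Fraction of trials in which at least min_votes voters report the violator."""
    rng = random.Random(seed)  # fixed seed so every min_votes sees the same degrees
    detected = 0
    for _ in range(trials):
        degree = rng.randint(0, 10)  # compliance degree drawn between 0 and 10
        votes = sum(1 for threshold in thresholds if degree >= threshold)
        if votes >= min_votes:
            detected += 1
    return detected / trials


# Hypothetical threshold sets: four distinct values (in pairs) and eight distinct values.
diversity_four = [2, 2, 4, 4, 6, 6, 8, 8]
diversity_eight = [1, 2, 3, 4, 5, 6, 7, 8]

rates_four = [detection_rate(diversity_four, m) for m in range(1, 9)]
rates_eight = [detection_rate(diversity_eight, m) for m in range(1, 9)]
```

Because the number of votes in each trial is fixed once the degree is drawn, requiring more votes can only lower the detection rate in this model, which matches the direction of the decline reported above.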

RELATED WORK
Blockchain and smart contracts have recently motivated cloud security researchers to improve user trust and privacy for sharing data in cloud ecosystems. The potential of using blockchain-based techniques to protect healthcare data located in the cloud was described in Reference 22, highlighting practical challenges for recording medical data in a blockchain network. Moreover, the authors in Reference 23 presented a patient-centric healthcare data management system with the aid of blockchain. The system ensured that private healthcare data in the cloud is only monitored by the patient. A blockchain-based approach was proposed for storing cloud attestation in Reference 6. The authors implemented a smart contract for recording the migration of user data between cloud providers. The deployment of the smart contract enabled cloud users to identify the location of their data through the submission of a query to the contract. This was extended in Reference 24 to give cloud users more control over the migration of their data to providers in user-defined white lists. Consumer-based data movement policies realized through a blockchain-based technique were reported in Reference 25. The authors in Reference 21 proposed an automatic way of tracking and enforcing data sharing agreements between a user and cloud providers with the aid of smart contracts and blockchain technology. In this approach, the providers who violated the shared agreements were detected through a set of voters or arbiters listed in a voting contract. In Reference 26, a secure smart home architecture based on cloud and blockchain technology was proposed. The authors used an encryption and hashing algorithm to obtain confidentiality and trust. The integration of blockchain-based approaches into several security services, including authentication, privacy, data provenance, and integrity, is reviewed in Reference 5. A conceptual model, called ProvChain, was designed to collect cloud data provenance and provide assurance of data operations in a cloud storage application through logging provenance data in a blockchain network. 27 The integration of blockchain and attribute-based signcryption was proposed in Reference 28 to support secure data sharing in a cloud ecosystem. Although the aforementioned approaches take advantage of blockchain and smart contracts to enhance cloud user privacy and trust, none of them applied GDPR rules in their methods to give clear standard regulations to the actors processing user data.
Work that combines blockchain and GDPR in a cloud environment includes a blockchain-based approach for supporting data accountability and provenance tracking that meets GDPR requirements, proposed in Reference 4. The approach presented two different models for deploying a smart contract: (i) data subject consent rules recorded in a blockchain, which each actor (controller/processor) should follow, and (ii) actor policies supported as a smart contract that allows users, as subscribers, to join or leave the contract. Verification to check for compliance with consent rules was undertaken manually in both cases. A personal health data sharing system has been proposed in Reference 29, which enables users to securely share their health data and data consumers to get necessary data in a transparent manner and in compliance with GDPR.
The system used blockchain technology supplemented by cloud storage to share the health data. A data quality inspection module relying on machine learning approaches was introduced in the system to monitor the quality of personal health data. Although the system benefits from GDPR and

FIGURE 1 A new architecture for privacy-aware cloud ecosystems

FIGURE 3 Data recording in blockchain

FIGURE 7 Number of voters vs gas consumption

FIGURE 9 Number of voters vs mining time