LibreSocial: A Peer-to-Peer Framework for Online Social Networks

Distributed online social networks (DOSNs) were first proposed to solve the problems of privacy, security and scalability. A significant amount of research was undertaken to offer viable DOSN solutions that were capable of competing with the existing centralized OSN applications such as Facebook, LinkedIn and Instagram. This research led to the emergence of peer-to-peer (P2P) networks as a possible solution, upon which several OSNs such as LifeSocial.KOM, Safebook and PeerSoN, among others, were based. In this paper, we define the basic requirements for a P2P OSN. We then revisit one of the first P2P-based OSNs, LifeSocial.KOM, now called LibreSocial, which has evolved over the past years to address the challenges of running a completely decentralized social network. Over the course of time, several essential new technologies have been incorporated into LibreSocial for better functionality. We describe the architecture and each individual component of LibreSocial and point out how LibreSocial meets the basic requirements for a fully functional distributed OSN.


INTRODUCTION
The analysis of online social network (OSN) trends over a period of more than a decade has shown significant growth in their popularity among users, and consequently the number of OSNs has risen significantly 1,2 . This growth in the users and the OSNs is directly attributable to advances in computing technologies (both hardware and software), and increased computer know-how among the users. Current popular OSNs rely heavily on the centralized computing model, in which the OSN service provider is in charge of handling and presenting the data of the users, and actually owns all the data. Besides the more obvious technical risks in the centralized computing model for OSNs which the providers have endeavored to address, such as unbalanced load distribution, performance bottlenecks, single point of failure, single point of attack and channel bottleneck 3 , we see two other concerns that centralized OSNs have not addressed well: accumulated costs which manifest as scalability concerns 4,5 and security and privacy concerns 4,6 .
The first concern, accumulated costs due to scalability, is introduced by a large number of highly connected users, the need for more infrastructure due to a large network, high network traffic, the need for mechanisms for management and dissemination of the user-generated content, and challenges associated with database scalability 5 . To cover the rising costs, service providers tend to monetize the users' content and private data by selling it to third parties. The second concern, security and privacy, is divided into three categories of threats: user-related, service provider-related, and third-party application-related 4,6 .
User-related threats are a consequence of the disclosure of private data to other users, either intentionally, such as by hacking, or unintentionally, due to missing or poorly configured privacy settings. Service provider-related threats are mostly due to the fact that the service provider has control of the users' data. While the user must trust the provider to treat their data properly, the provider can leak the user's personal data outside the context of its initial definition 7,8 and can further allow information linkages by unauthorized third parties who aggregate data from different social data centers to obtain more information about the different users 9 , such as in the Facebook-Cambridge Analytica data scandal 1 .
Third-party application-related threats arise from applications that users install to obtain extra functionalities that are not in the OSN, and that are in most cases untrusted. To use them, users must allow the application to access their private data, which then exposes the data, and in many cases there is no component that screens how the application manipulates the user's data 10 .
Based on these concerns, researchers have proposed the use of decentralized or distributed computing models, hence the emergence of distributed online social networks. A distributed online social network (DOSN) is described as "an online social network implemented on a distributed information management platform, such as a network of trusted servers or a peer-to-peer (P2P) system or opportunistic network" 11,12 . It is distributed in the sense that all computing, storage and communication resources are provided by the users rather than an economically driven provider. This allows shifting the implementation of the infrastructure, and the privacy and security control, to the users, while allowing users to undertake innovative development of the system, effectively lowering the operational costs 13 . DOSNs can be realized via two modes of implementation, web-based and peer-to-peer (P2P) DOSNs 14 . Web-based DOSNs are also sometimes collectively called hybrid OSNs. They are heavily reliant on distributed, federated web servers, usually referred to as pods, that are often operated by private individuals. Operating a pod requires in-depth technical skills, which eventually limits the features and concentrates the data at a few pods run by capable users. Inexperienced users can join existing pods.
This renders users vulnerable to privacy concerns, as the pod operators can still access private data on their pods. As with centralized solutions, the majority of users have to trust someone to maintain the privacy of their data, a trust that is often misused. P2P DOSNs (or P2P OSNs), on the other hand, operate solely through the interconnection of P2P software run on user devices, similar to earlier P2P file sharing networks. They require no trust in any operator, but they do require more complex networking and system solutions. With the right combination of P2P mechanisms, such as presented in this paper, a fully distributed, scalable, reliable and secure OSN platform can be created that runs purely on the "free" computing, storage and networking resources of the user devices.
As a result, the P2P DOSN does not require monetary investment and is thus free of the need to monetize its service.
Peer-to-peer networks are a class of distributed networks in which the peers act simultaneously as servers and clients, that is providing and consuming resources to/from each other, in a self-organizing manner without centralized control. The features of P2P networks include high degree of decentralization, self-organization, multiple administrative domains, low barrier to deployment, organic growth, resilience to faults and attacks, and abundance and diversity of resources 15,16 . However, in spite of these promising features, existing P2P research offers only fragments for a working DOSN, as each individual component, such as the overlay network, data management and routing, comes with its own challenges. Aiello and Ruffo 8 argue that (traditional) P2P systems are in and of themselves not a complete solution for a couple of reasons.
Firstly, P2P systems such as the structured overlay networks have, in their unmodified state, many security challenges 17 that make the network unstable. Also, within the P2P open environment, data access is typically open to everyone which compromises the user's data privacy.
Lastly, most P2P overlays offer only very restricted and low-level APIs, while social applications need a suite of higher-level services to reduce the overhead during application development. Thus, the individual P2P mechanisms discussed in the literature are not ready to use for a P2P DOSN, but require careful extension, adaptation and integration to address the needs of a P2P DOSN.
In this paper, we present LibreSocial, which realizes a satisfactory level of reliable functionality in the form of a P2P framework, offers adequate social interaction applications on top, and meets the security, privacy and essential quality of service (QoS) requirements for a secure decentralized OSN. This work on LibreSocial is motivated by three reasons.
Firstly, the previous works 18,19,20 discussed and evaluated the security and privacy aspects of the framework, and 21 focused on evaluating the monitoring functionality of the system. Therefore, although the system was introduced in parts, the finer details of the framework were not discussed. Secondly, the framework has experienced many changes over the last five years, maturing and leading to key insights on the interdependencies of the components. Finally, we present the implementation and an evaluation of LibreSocial, as currently most of the P2P-based OSNs proposed in the literature have no tangible implementation that can be tested in a live environment. The rest of the paper is organized as follows.
In Section 2, we look at several proposed P2P OSNs, and discuss their achievements as well as shortcomings. In Section 3, we introduce the concept of the P2P framework by first discussing the technical requirements needed for the framework and thereafter introduce LibreSocial (previously called LifeSocial.KOM 18,19,20,22 ). In Section 4, we introduce the core functions of the P2P overlay, which builds on FreePastry, an implementation of the Pastry overlay 23 , highlighting the modifications made to suit our application's needs. Sections 5 to 7 discuss the essential framework component layers of LibreSocial's system architecture, namely the overlay, the P2P framework, the OSN plugins and the GUI, showing how the defined requirements are realized. We describe the evolution of LibreSocial from its former version LifeSocial.KOM in Section 8 and present in Section 9 the evaluation of LibreSocial. A conclusion and outlook for future work is given in Section 10.

RELATED WORK
An analysis of the majority of the proposed DOSN solutions distinguishes two main research directions 8

Peer-to-peer OSNs
There are several proposed OSNs since the advent of LifeSocial.KOM 18,19,20,22 in 2008, which aim at a P2P DOSN platform run purely on the user devices, mainly to address the privacy concerns identified in centralized OSNs. In this section, we introduce the most prominent of these proposals and also briefly mention shortfalls that we have observed in them.
PeerSoN 2 24 is designed with the aim of addressing privacy concerns and ensuring availability. In the system, the privacy concern was addressed by integrating encryption and access control mechanisms to give a unified user login procedure. Availability was addressed through the implementation of file sharing procedures.
The architecture of the system is two-tiered in nature and is designed to decouple the user contents from the control mechanisms. The second level is the DHT Pastry 23 , which provides lookup services and ensures the system remains robust to churn. A data availability service was implemented on the DHT. The system does not implement any security features, but suggests the use of attribute-based encryption or ciphertext attribute-based encryption for security, and asymmetric keys to support access control. The shortcoming of this proposed system is that it does not seek to implement a functional system, but rather tests trust relationships and data availability via simulation.
General shortcomings in the literature on P2P DOSNs are thus mainly the lack of implementation, and hence of a full picture of the overall system, the limitation of functionality, and the heavy reliance on trust in friends. Based on our decade-long experience in building P2P DOSNs, we find these three points essential to address for a suitable P2P DOSN.
There must be an implementation attempt to identify the shortcomings and interdependencies of the considered mechanisms and to advance the architecture. A full set of functions, namely identity management, access-controlled storage and secure communication, must be provided in order to meet the demands of an OSN and to address the technical challenges of their combination. And finally, the solution must not assume trust in friends, as OSN use cases require storing highly personal information, often referring to these "friends", without them having access to it. In LibreSocial, we address these three challenges and more.

A P2P FRAMEWORK FOR ONLINE SOCIAL NETWORKS
Designing P2P-based OSNs is a non-trivial task, as P2P solutions usually involve some design complexities. P2P mechanisms for realizing functionalities similar to those of centralized OSNs can be implemented in a variety of ways. However, the necessary P2P components can be clearly organized into a suitable P2P framework, while giving system designers the independence to choose the components based on their needs. In this section, we begin by underlining the necessary technical requirements in the design of the P2P-based framework to support an OSN application. Thereafter, we introduce LibreSocial, an OSN designed based on the defined technical requirements.

Technical Requirements for a P2P-based OSN
In a centralized OSN, the server stores the data, replies to queries and enforces the access control as well as other security considerations.
In P2P solutions, the same must be achieved in a decentralized fashion, while ensuring quality solely based on the cooperation of unreliable and potentially malicious user devices (nodes). First, there is the need for a reliable (overlay) network that interconnects all nodes, integrates identity management and supports routing messages to nodes/users.
Mechanisms to store data reliably and securely with a fine-grained access control mechanism are needed. The provision of security and access control is especially challenging, as no one is to be trusted in the network, including friends. Trust in friends must not be assumed, since friends in OSNs are typically known but may not be fully trusted. Thus, access rights can be clearly assigned to users and groups.
Both users and groups can be affected by identity theft, thus it is desired that the overlay provides mitigation against such to ensure messages are sent by known and authenticated users.
(g) Security management: The users should be authenticated before using the network. The system should support the implementation of a suitable access control mechanism in combination with a suitable replication mechanism, so that users can individually pick who can access which of their data to enforce data privacy requirements. The use of digital signatures to sign the content ensures that users can verify that the data is untampered, correct and authenticated. Finally, the communication channels should be secured through the use of appropriate cryptographic methods for end-to-end encryption.
(h) Quality monitoring and evaluation: Running a large-scale, distributed system always bears the risk of unpredictable behavior, which must be identified and addressed autonomously by the system itself. Monitoring of the overall performance helps to expose the inherent failures and hidden strengths of the system. It provides a way of reviewing the quality of service (QoS) of the system. By reviewing data gathered from the monitoring process against the expected performance, strategies can be implemented that address the shortfalls in quality in specific situations. Continual system testing allows the developers and

LibreSocial: A P2P-based OSN
We propose LibreSocial, a P2P-based OSN that is designed based on the defined technical requirements given in Section 3.1. In Fig. 1, the technical requirements are put together into a proposed framework for a P2P-based OSN. The architecture of LibreSocial has been designed based on the Open Services Gateway Initiative (OSGi) service platform with the goal of making it easy to add/remove services. The architecture consists of four layers, namely: 1. the P2P overlay, 2. the P2P framework, 3. plugins and applications, and 4. the graphical user interface (GUI).
The most important aspect of this architecture is that it can also support the deployment of any other P2P application (in the form of a set of alternative plugins) by simply separating the OSN plugins from the overlay and the framework, which can then be used for the new application. This is possible because of the strong P2P framework in the middle, which summarizes all essential P2P functions in an abstracted manner. The OSGi service platform, implementing a local service bundle orchestration, further permits the adaptation of the system to any other application as desired, with possible code reuse. Parts of this system have been previously published under the name LifeSocial.KOM 18,19,20,22 , which was renamed due to naming conflicts. In the following sections, we discuss in detail how each of the layers is realized.

OVERLAY: A HEAVILY MODIFIED FREEPASTRY
Pastry 23 is a generic P2P routing and object location scheme whose nodes form a structured P2P overlay that is completely decentralized, fault-resilient, scalable and reliable. FreePastry 3 is a readily available open source implementation of Pastry that was developed at Rice University and is extensively used within the research community.
Pastry, and in this case FreePastry, was chosen as the overlay in LibreSocial mainly because it comes bundled with many other simple but useful P2P-based tools which are needed in LibreSocial. These tools include PAST 46,47 (a replication scheme for simple key-value pairs), Scribe 48 (a simple multicast event notification infrastructure) and SplitStream 49 (a multicast streaming system that uses Scribe). These simple and highly limited tools can be directly accessed without the need for further installation or configuration. In addition, FreePastry offers key-based routing functionality 43 , hence it strives to achieve reliable ID-based routing. In LibreSocial, FreePastry has been heavily extended to provide secure identity management and secure, parallel routing management, among other modifications, to adapt it to the needs of the system. We discuss these further below.

Initial identity management
FreePastry relies on the use of DHTs for routing data in the network.
Therefore, ID management is based on the DHT. We look at how the identity space is created and how the identifiers are constructed.
Identity space: The DHTs utilize a predefined ID space of size $2^{160}$ for all nodes, which can be viewed as a circular structure in which the successor of the highest ID is the lowest ID, that is 0, hence a ring network. Peers are responsible for the IDs closest to them. Pastry defines the closest node as the one having a nodeID with the longest possible matching prefix to the desired ID. Each peer maintains a routing table with entries pointing to other peers at exponentially growing distances. In addition, a leaf set with the numerically closest nodes in the ring is maintained.
The construction of the routing table ensures that it is always possible to find a node that is closer to any ID. If no peer is identified as being close to a given ID, the current peer becomes responsible for this ID.
Identifier construction: Every peer in the initial ring has a unique numeric identifier called the nodeID that is generated randomly for each node. Each nodeID is a 160-bit value, with the values of the nodeIDs being uniformly distributed over the numeric space from which the identifiers are picked. This random assignment of nodeIDs ensures, with high probability, that nodes with adjacent nodeIDs are diverse in geography, ownership, jurisdiction, network attachment and so on. The overlay also offers an efficient routing functionality. Given a numeric value in the 160-bit numeric space and a message, the overlay is capable of efficiently routing the message to the network node whose identifier is numerically closest to the given numeric value.
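The notion of "numerically closest on a ring" can be illustrated with a minimal Python sketch. It assumes that nodeIDs are handled as plain integers and that closeness is the shorter way around the ring; the function names are hypothetical and are not part of FreePastry.

```python
import secrets

ID_BITS = 160
RING_SIZE = 1 << ID_BITS  # size of the circular ID space, 2^160


def random_node_id() -> int:
    """Draw a nodeID uniformly at random from the 160-bit ID space."""
    return secrets.randbits(ID_BITS)


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two IDs on the ring (wraps around at 2^160)."""
    d = abs(a - b) % RING_SIZE
    return min(d, RING_SIZE - d)


def responsible_node(key: int, node_ids: list[int]) -> int:
    """Return the nodeID numerically closest to key; ties go to the smaller ID."""
    return min(node_ids, key=lambda n: (ring_distance(n, key), n))


nodes = [10, 1000, RING_SIZE - 5]
print(responsible_node(600, nodes))                 # 1000
print(responsible_node(2, nodes) == RING_SIZE - 5)  # True: the ring wraps around
```

The second call shows the wrap-around: the key 2 is closer to the highest ID in the list than to the node with ID 10.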

Initial Message routing
The message routing process is made possible by the routing algorithm.
We discuss the routing algorithm and its constructs, that is the routing table and the leaf set.
Routing algorithm: Given a network that consists of $N$ nodes, the overlay's routing algorithm guarantees that the message will be delivered to the recipient node in $O(\log_2 N)$ steps. At each routing step, the message is forwarded to a node whose nodeID shares with the key a prefix that is at least one digit longer than the prefix that the key shares with the present node's ID. If no such node is known, the message is forwarded to a node whose nodeID shares a prefix with the key that is as long as that of the current node, but that is numerically closer to the key than the present node's ID. The routing algorithm in the initial FreePastry takes advantage of two data structures, the leaf set and a routing table, which are different for each node and help the node keep track of its immediate neighbors.
Routing table: This is organized into $O(\log_{2^b} N)$ rows ($b$ being a configuration parameter with a typical value of 4) with a total of $2^b - 1$ entries in each row. The $2^b - 1$ entries at a given row $n$ each refer to a node whose nodeID shares the current node's nodeID in the first $n$ digits but whose $(n+1)$th digit has one of the $2^b - 1$ possible values other than the $(n+1)$th digit in the present node's ID.
Leaf set: The leaf set L refers to the node set with the |L|/2 numerically closest larger nodeIDs, and the |L|/2 nodes with numerically closest smaller nodeIDs, in reference to the current node's nodeID. The leaf set is especially important during the process of message routing.
Pastry further considers a neighborhood set which holds the nodeIDs and IP addresses of the $|M|$ nodes closest (based on the proximity metric) to the current local node. The neighborhood set is used in routing messages as well as in maintaining locality properties. In FreePastry, the neighborhood set is not implemented.
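The two forwarding rules of the routing step can be sketched in a few lines of Python. This is a simplified illustration, not FreePastry code: it assumes IDs are written as hexadecimal strings (one digit per 4 bits, matching b = 4), shortens them to four digits for readability, ignores ring wrap-around, and merges the routing table and leaf set into one hypothetical list `known`.

```python
from typing import List, Optional


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def numeric_distance(a: str, b: str) -> int:
    """Absolute numeric difference of two hexadecimal IDs (no wrap-around)."""
    return abs(int(a, 16) - int(b, 16))


def next_hop(current: str, key: str, known: List[str]) -> Optional[str]:
    """Pick the next hop for key, or None if the current node is the best known."""
    cur_len = shared_prefix_len(current, key)
    # Rule 1: prefer a node sharing a strictly longer prefix with the key.
    longer = [n for n in known if shared_prefix_len(n, key) > cur_len]
    if longer:
        return max(longer, key=lambda n: (shared_prefix_len(n, key),
                                          -numeric_distance(n, key)))
    # Rule 2: otherwise accept an equally long prefix if numerically closer.
    cur_dist = numeric_distance(current, key)
    closer = [n for n in known
              if shared_prefix_len(n, key) == cur_len
              and numeric_distance(n, key) < cur_dist]
    if closer:
        return min(closer, key=lambda n: numeric_distance(n, key))
    return None


print(next_hop("a000", "a1f3", ["a1b0", "b000", "a100"]))  # a1b0 (rule 1)
print(next_hop("a1e0", "a1f3", ["a1d0", "a1ef", "0000"]))  # a1ef (rule 2)
```

Each application of rule 1 fixes at least one more digit of the key, which is where the logarithmic hop count comes from.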

Overlay modifications
In order to create a foundation for security in the overlay as well as to provide support for heterogeneous nodes within the context of the OSN, it is necessary to make substantial changes to the FreePastry design and implementation. These changes are summarized herein. A further in-depth analysis of the construction of P2P overlays with desired properties is presented in the dissertation 55 .

Secure nodeID
In LifeSocial, asymmetric cryptography was provided using a 1024-bit RSA algorithm 50 , which necessitated modifications to how FreePastry works so as to accommodate a 1024-bit as opposed to a 160-bit nodeID. This has been changed in LibreSocial to elliptic curve cryptography (ECC) 51,52 with 160-bit keys, which now matches the initial requirements of FreePastry while providing strong encryption with minimal overhead. In addition, symmetric cryptography is provided using the advanced encryption standard (AES) algorithm 53,54 with a 128-bit key size. We now consider the process of registration, and of profile and userID creation; the userID is immutable and associated with a user, in contrast to the nodeID, which is mutable and associated with a node.
FIGURE 2 UserID to NodeID mapping
Registration procedure: The registration process of a new user requires that there is a network, i.e. another member present to act as the bootstrap node, else the new user will become the first node in the network. Based on the user name and passphrase the user generates an asymmetric key pair. The 160-bit public key is used as the nodeID during the creation of the Pastry Node associated to this user. The user can change the nodeID at will by simply regenerating a new set of keys. With the nodeID being a public key, all communication to this node can be encrypted and any signatures from a node can be verified directly.
Profile and UserID management: A profile item is created once a new user is able to join the network, which can then be fully or partially encrypted using a symmetric key for confidentiality and stored with the public key as the nodeID inside the network. Because the nodeID can change, it cannot be used as a unique identifier for the user, to help users identify each other. Therefore an immutable userID is also generated once when the user's profile is created. To map the nodeID to the userID, a mapping-item is created, which is stored at the userID in the overlay and lists what the corresponding nodeID to the userID is. In order to prevent attackers from illegally overwriting the mapping-item, it is required that the overwriting node verifies that it is in possession of the claimed asymmetric key pair through a challenge-response approach. Additionally, several other mapping-items are also stored in the network to map the userID to the current nodeID so that other nodes can find out the current nodeID of a given known userID. This is shown in Figure 2. The routing algorithm is used to retrieve this mapping item. By this approach, users can log in to their account from any machine, as the keys and credentials are purely created from their username and passphrase, i.e. their knowledge. By storing the nodeID (= public key) as a signed data item under the userID, nodes can identify the location of their contact in the network. Any communication to this node can now be encrypted and signatures from this node verified.

Parallel and iterative routing
The routing table has been extended to contain not just one peer per routing entry but a bucket of multiple peer addresses, as in the DHT Kademlia 58 . Instead of forwarding a message to only one single node from the routing table, in LibreSocial a requesting node delivers messages to k different peers in parallel. The requesting node waits to receive the responses and then sends the message to the new k most promising peers. This process is iterated until the target peer is identified and the message is delivered. This significantly reduces the impact of interference by malicious nodes during the message routing process and has been proven successful in Kademlia 58 as well. However, fully realizing the benefits of this technique requires two optimization problems to be solved: termination of redundant messages after a successful lookup, and rejection of further suggestions once message forwarding is done.
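The iteration can be illustrated with a small in-memory simulation in Python. It is a sketch under stated assumptions, not LibreSocial's implementation: the network is a hypothetical dictionary mapping each node to the peers it knows, "parallel" requests are issued in a loop, closeness is plain numeric distance, and a misbehaving node is modeled as one that answers with no suggestions (the `silent` set).

```python
from typing import Dict, FrozenSet, List


def iterative_lookup(network: Dict[int, List[int]], start: int, target: int,
                     k: int = 3, silent: FrozenSet[int] = frozenset(),
                     max_rounds: int = 32) -> int:
    """Return the closest node to target found by querying k peers per round."""

    def rank(n: int):
        return (abs(n - target), n)

    def ask(peer: int) -> List[int]:
        # A silent (e.g. malicious) peer returns no suggestions at all.
        return [] if peer in silent else network.get(peer, [])

    queried = set()
    candidates = sorted(network[start], key=rank)[:k]
    for _ in range(max_rounds):
        to_query = [n for n in candidates if n not in queried]
        if not to_query:
            break  # every promising peer has answered: lookup has converged
        suggestions = set(candidates)
        for peer in to_query:  # conceptually sent in parallel
            queried.add(peer)
            suggestions.update(ask(peer))
        candidates = sorted(suggestions, key=rank)[:k]
    return candidates[0] if candidates else start


network = {
    1: [20, 40, 60], 20: [1, 40, 70], 40: [20, 60, 90], 60: [40, 90, 95],
    70: [90, 95], 90: [95, 97], 95: [97, 90], 97: [95, 90],
}
print(iterative_lookup(network, start=1, target=97, silent=frozenset({40})))  # 97
```

Because several peers are asked in every round, the lookup still reaches node 97 although node 40 withholds its answer.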

Weak nodes
In FreePastry, all participating nodes are treated as equal. In reality, node capacities differ and some nodes are incapable of contributing to the network, either due to short participation times, missing storage capacity or limited bandwidth. These nodes should be able to use the services in the overlay but should not be visible otherwise, that is, they should not be in charge of routing and storing data. To achieve this, the weak nodes are labeled with specific markers, implemented as specific port numbers, during the joining process. As the IP address and port number are available throughout the code, weak nodes can be treated differently at any relevant position in the code. Specifically, information about weak nodes is not spread in routing; weak nodes appear only in leaf sets and are hence used only for final message delivery. In the replication process, these nodes are ignored. Thus, we ensure that these nodes do not store data and do not participate in routing, both of which are taken care of by strong nodes.
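A minimal Python sketch of this filtering is given below. The port value 9002 and all names are hypothetical and chosen only for illustration; the text states merely that weak nodes are marked by specific port numbers.

```python
from typing import List, NamedTuple

WEAK_NODE_PORT = 9002  # hypothetical marker port, not taken from LibreSocial


class Peer(NamedTuple):
    ip: str
    port: int


def is_weak(peer: Peer) -> bool:
    """A peer is treated as weak if it joined on the marker port."""
    return peer.port == WEAK_NODE_PORT


def routing_table_candidates(peers: List[Peer]) -> List[Peer]:
    """Weak nodes are never advertised in routing tables."""
    return [p for p in peers if not is_weak(p)]


def replica_candidates(peers: List[Peer]) -> List[Peer]:
    """Weak nodes are skipped when choosing where to store replicas."""
    return [p for p in peers if not is_weak(p)]


def leaf_set_candidates(peers: List[Peer]) -> List[Peer]:
    """Weak nodes stay in leaf sets so messages can still be delivered to them."""
    return list(peers)


peers = [Peer("10.0.0.1", 9001), Peer("10.0.0.2", WEAK_NODE_PORT)]
print(routing_table_candidates(peers))  # only the strong node
print(leaf_set_candidates(peers))       # both nodes
```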

THE P2P FRAMEWORK
The P2P framework is a toolbox of essential services and mechanisms

Storage and Replication
The importance of data availability within any OSN cannot be overstated. While the modified FreePastry provides simple routing and a simple data storage, data can get lost if the storing node goes offline.
LibreSocial builds on PAST 46,47 to handle storage. PAST was a natural choice for handling storage because it is already integrated with FreePastry and also includes replication management. In the following, we discuss how PAST handles file management and replication, as well as the extensions to PAST to support features that were not present.

File Management
When a user joins the network, the user avails a small percentage of their storage space for network usage. This ensures that the application can perform functions such as replication. Key file management functions include addressing and storage, and file operations which we discuss further.
File addressing and storage: Files stored using PAST, like nodes in the network, are addressed using a 160-bit identifier, referred to as the fileID. This fileID is calculated either by hashing the file itself, in which case it is unique and unchangeable for the given file, or by hashing a reproducible string, such as the user name and the function of the data object. For example, the data item listing the albums of Alice could be found under the hash of "Alice__Albums". An inserted file is stored at the node whose nodeID most closely matches the prefix of the fileID, and the node that stores this file is found using FreePastry's routing algorithm.
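Both ways of deriving a fileID can be sketched in Python as follows. SHA-1 is used here only because it yields a 160-bit digest; the actual hash function and the exact string format used in LibreSocial are assumptions, and the function names are hypothetical.

```python
import hashlib


def file_id_from_content(data: bytes) -> int:
    """Content-based fileID: fixed for a given file, changes if the file changes."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def file_id_from_name(user: str, function: str) -> int:
    """Name-based fileID: reproducible by anyone who knows user and function."""
    return file_id_from_content(f"{user}__{function}".encode("utf-8"))


albums_id = file_id_from_name("Alice", "Albums")
print(albums_id.bit_length() <= 160)                          # True
print(albums_id == file_id_from_content(b"Alice__Albums"))    # True
```

The name-based variant is what allows a friend to locate Alice's album list without any prior lookup: the key can be recomputed from publicly known strings.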

File operations: The fileID is used to perform INSERT, REQUEST, UPDATE and DELETE operations. PAST was modified to provide the UPDATE and DELETE functionalities for files, which were previously not supported.

Replication
The other important function that PAST provides is data redundancy support via replication, which helps guarantee data availability in case the data owner is offline. In addition, LibreSocial includes a data caching extension to improve data access. There is also functional support for load and traffic balancing to ensure that replicated data storage and access are evenly distributed among the nodes. These are discussed herein.
Replication process: PAST provides replication management, so that a file stored at a node x is replicated to k − 1 additional nodes, where these k − 1 nodes are the next closest nodes based on the fileID and may also be found in x's leaf set. To ensure that the k replicas of a file were actually created, every successful replicating node transmits an acknowledgment, called a store receipt, back to the node that performed the insertion. We adapted PAST's initial replication mechanism to not only check whether a given fileID is available at the replica nodes but to actually check the hash of this file, otherwise file updates would not have been propagated. To support efficient replication, as well as reduce system and traffic overload, caching mechanisms as well as load balancing for storage and traffic are incorporated.
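The difference between checking only for the presence of a fileID and checking the hash of the stored file can be sketched in Python. The sketch assumes plain numeric distance without ring wrap-around, SHA-1 as the hash, and a hypothetical `stores` dictionary standing in for the replica nodes' local storage.

```python
import hashlib
from typing import Dict, List


def replica_nodes(file_id: int, node_ids: List[int], k: int) -> List[int]:
    """The k nodes whose IDs are closest to the fileID hold the replicas."""
    return sorted(node_ids, key=lambda n: (abs(n - file_id), n))[:k]


def stale_replicas(file_id: int, data: bytes, node_ids: List[int], k: int,
                   stores: Dict[int, Dict[int, bytes]]) -> List[int]:
    """Replica nodes whose copy is missing or whose hash differs from data."""
    expected = hashlib.sha1(data).hexdigest()
    stale = []
    for node in replica_nodes(file_id, node_ids, k):
        stored = stores.get(node, {}).get(file_id)
        if stored is None or hashlib.sha1(stored).hexdigest() != expected:
            stale.append(node)  # needs a (re-)replication of the current version
    return stale


node_ids = [10, 20, 30, 40, 50]
stores = {20: {22: b"v2"}, 30: {22: b"v1"}, 10: {}}
print(replica_nodes(22, node_ids, k=3))                    # [20, 30, 10]
print(stale_replicas(22, b"v2", node_ids, 3, stores))      # [30, 10]
```

Node 30 already holds an item under the fileID, so a presence-only check would skip it; the hash comparison reveals that it still stores the outdated version.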

Local caching mechanism: The response time of the application is improved through the use of an internal caching mechanism introduced on top of PAST. It holds recently retrieved data items, thus reducing the need to resolve subsequent requests for the same content within a short period of time. The time for which the data is served from the cache is chosen carefully to limit the traffic in the network, while maintaining the freshness of the data and accounting for potential updates.
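A time-limited cache of this kind can be sketched in Python as follows. The class name, the lifetime value and the injectable clock are assumptions made for illustration; the text does not specify how LibreSocial chooses the cache lifetime.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Keeps recently retrieved items and serves them for ttl_seconds at most."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[int, Tuple[Any, float]] = {}

    def put(self, file_id: int, value: Any) -> None:
        self._items[file_id] = (value, self._clock())

    def get(self, file_id: int) -> Optional[Any]:
        entry = self._items.get(file_id)
        if entry is None:
            return None  # never cached: caller must fetch from the network
        value, stored_at = entry
        if self._clock() - stored_at >= self._ttl:
            del self._items[file_id]  # expired: force a fresh network lookup
            return None
        return value


now = [0.0]                      # fake clock so the example is deterministic
cache = TTLCache(ttl_seconds=5.0, clock=lambda: now[0])
cache.put(22, b"profile")
print(cache.get(22))             # b'profile'
now[0] = 6.0
print(cache.get(22))             # None
```

A short lifetime keeps data fresh at the cost of more lookups, while a long one saves traffic but may serve outdated items after an update.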
Storage load balancing: A particular node may not have sufficient storage space or may limit it for various reasons. In such a case, the node may reject a request to store a file or replica. The system handles such a shortfall by providing replica diversion. Replica diversion is used as the first option when it comes to load balancing and is aimed at balancing the load within a leaf set L. A node x that is experiencing a storage space shortage and receives a request will delegate the request to another numerically close node y within its leaf set that has more storage available but was not previously selected for replicating the respective file. If y accepts to store the data, x then stores a pointer to y. To guarantee availability, a pointer to y is also stored in the $(k+1)$th closest node z, making z responsible for the replica in case x fails, thus fulfilling the need for k replicated files. In case all the nodes in the leaf set of x have reached their storage limit, file diversion is initiated and the file is distributed to another part of the nodeID space in PAST by selecting a different salt so as to generate a different fileID. File diversion ensures a balance of the remaining free storage space in different portions of the nodeID space in PAST. The delegation of the data storage task does not violate the anonymity of the data owner, as the data is encrypted and the node storing the data cannot read it unless it has been granted permission by the data owner.
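The decision between local storage, replica diversion and file diversion can be sketched in Python. The sketch simplifies the mechanism: it omits the backup pointer at the (k+1)th closest node, uses SHA-1 over a hypothetical "name|salt" string for the salted fileID, and represents free storage as a plain dictionary.

```python
import hashlib
from typing import Dict, List, Optional, Set, Tuple


def salted_file_id(name: str, salt: int) -> int:
    """A different salt maps the same file to a different region of the ID space."""
    digest = hashlib.sha1(f"{name}|{salt}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def place_replica(x: int, leaf_set: List[int], free_space: Dict[int, int],
                  size: int, replica_holders: Set[int]
                  ) -> Optional[Tuple[int, Optional[int]]]:
    """Return (storing node, pointer holder), or None if file diversion is needed."""
    if free_space.get(x, 0) >= size:
        return (x, None)  # x stores the replica itself
    for y in leaf_set:
        # Replica diversion: y must have room and must not already hold a replica.
        if y != x and y not in replica_holders and free_space.get(y, 0) >= size:
            return (y, x)  # y stores the data, x keeps a pointer to y
    return None  # whole leaf set is full: retry with a new salt (file diversion)


free_space = {1: 0, 2: 0, 3: 10, 4: 10}
print(place_replica(1, [2, 3, 4], free_space, 5, {4}))     # (3, 1)
print(place_replica(1, [2, 3, 4], free_space, 5, {3, 4}))  # None
print(salted_file_id("photo.jpg", 0) != salted_file_id("photo.jpg", 1))  # True
```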
Traffic load balancing: In most cases, when an overload occurs, the limiting factor is not the storage capacity but the bandwidth of the node responsible for a popular item. If an item is requested very frequently, the node might use all its bandwidth to send out that item and still not be able to process all the requests. We extended PAST so that, in such cases, the large group of file receivers is used to spread the data. To this end, the responsible node maintains a list of receivers and forwards the file request to individual nodes from this list 45 . The previous receiver to forward to can be selected based on various factors.
However, for popular files, many receivers are available to serve the file even further. As this step is optional, it only improves the performance and does not induce consistency or replication issues. In case of a file update, the list of previous file receivers is emptied and reset.
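A minimal sketch of this forwarding idea follows. The class name PopularItemServer, the request budget max_direct_serves (a stand-in for the node's bandwidth limit) and the random choice of a previous receiver are all assumptions made for illustration; the text leaves the selection criterion open.

```python
import random


class PopularItemServer:
    """Responsible node that hands requests over to previous receivers when busy."""

    def __init__(self, max_direct_serves: int, seed: int = 0):
        self.max_direct_serves = max_direct_serves   # hypothetical bandwidth budget
        self.direct_serves = 0
        self.receivers = []                          # nodes that already hold the file
        self._rng = random.Random(seed)

    def handle_request(self, requester: str) -> str:
        """Return the name of the node that will serve the file to the requester."""
        candidates = [r for r in self.receivers if r != requester]
        if self.direct_serves < self.max_direct_serves or not candidates:
            self.direct_serves += 1
            server = "self"
        else:
            server = self._rng.choice(candidates)    # selection policy is an assumption
        if requester not in self.receivers:
            self.receivers.append(requester)
        return server

    def update_file(self) -> None:
        """A file update invalidates all copies held by previous receivers."""
        self.receivers.clear()
        self.direct_serves = 0
```

Because forwarding is optional, a request is always answered by the responsible node itself when no previous receiver is known.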

Access control
Access control mechanisms are needed to ensure that only authorized users read from and write to data items in the network. Authorized users are generally also selected friends and these are stored in a friend list. LibreSocial achieves access control through the use of cryptographic keys. For write access, we differentiate between a first write operation and the following update operations.

First write operation:
For an unused dataID, anyone is free to write the first instance of the data, as no access rights are violated. The owner of the data object generates a symmetric cryptographic key which is used to encrypt the storage item. The symmetric key is then individually encrypted with the public keys of each user or group who shall have read rights. The encrypted data item is signed and, together with the public key of the owner and the list of encrypted keys, forms the secure storage item, which is then stored in the network under the previously unused dataID. The public key of the owner or group is stored together with the signed item when it is first inserted.
File update: It is possible to update, that is, overwrite, the data with a new secure storage item if it is signed with the private key corresponding to the public key stored with the item. The node where the data object is stored verifies the signature before accepting the update (see Fig. 3).
Read access: Reading the data item requires retrieving the data and holding the right private key to decrypt the individually encrypted symmetric key, which is then used to decrypt the data.
In a hierarchical group, the symmetric key is retrieved using a depth-first traversal through the group hierarchy. Thus, anyone may hold the Secure Storage Item, e.g. as a replicating node, but only the users selected by the owner can decrypt and read it. To support random data retrievals within the network's graph structure, the first design option is preferred, that is, the item is stored as an entry point. For example, the album set of user Alice is stored under the hash of the string "Alice-Albums". Because of the many abstraction layers and the high level of redirection, the pointer/hash initiated traversals in the P2P overlay may incur high latency. In any case, every level of redirection, including multiple hops, eventually leads to a 1-to-1 communication over the P2P overlay. The advantage of this scenario is that seemingly large objects, such as photo albums, can be split into smaller parts, with a single object holding a list of hashes that are pointers to the further parts. Persistent data can then be handled in a better manner, and the storage and retrieval of graphs connected through these hash pointers gets a cleaner abstraction over the P2P overlay, at the price of possible latency due to redirections and additional layers.
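The entry-point idea can be sketched as follows, under stated assumptions: a plain dictionary stands in for the DHT, SHA-1 is an assumed hash function, and storing each part under the hash of its content is an illustrative choice rather than something the text specifies. Encryption and signing are left out.

```python
import hashlib
import json


def key_for(name: str) -> str:
    """Well-known entry point: the hash of a human-readable string."""
    return hashlib.sha1(name.encode()).hexdigest()


def store_parts(dht: dict, entry_name: str, parts: list) -> str:
    """Store each part separately and one index object listing the part hashes."""
    part_keys = []
    for part in parts:
        part_key = hashlib.sha1(part).hexdigest()   # assumed: content-addressed parts
        dht[part_key] = part
        part_keys.append(part_key)
    entry_key = key_for(entry_name)
    dht[entry_key] = json.dumps(part_keys).encode()
    return entry_key


def load_parts(dht: dict, entry_name: str) -> list:
    """One lookup for the index object, then one lookup (redirection) per part."""
    part_keys = json.loads(dht[key_for(entry_name)])
    return [dht[k] for k in part_keys]
```

Each dictionary access in load_parts corresponds to one lookup in the overlay, which makes the latency cost of every additional level of redirection visible.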

Distributed Data Structures
LibreSocial supports storage of large data sets by integrating three distributed data structures (DDSs) into the framework, namely, distributed linked list, distributed set and prefix hash tree. The DDSs are built on top of the DHT that the modified PAST provides, with different APIs and entry points for each DDS, hence no interaction takes place between them. A discussion on the three structures follows.

Distributed sets & linked list
A set is a data structure that is an unordered collection of unique members/elements, and each member of the set may be either a set or a primitive element referred to as an atom. A list is a sequence of zero or more elements of a given type, in which the elements are linearly ordered. Storage in a distributed manner on various nodes is done by splitting the list of data items into buckets of a defined size, each containing a given range of the data items of the whole data structure, which are then distributed to the nodes in the network. The bucket size, s, depends on the size of the items to be stored by the user, and is a data structure parameter.
The structure itself is in essence an array and all items from the interval [s · (m − 1), s · m − 1] for m ∈ N are stored in a bucket named structurename_m, where m is the interval ID and N is a set of all network nodes. This is represented in Figure 4. If the index of an item is known, the remote node responsible for the bucket is also known and can be contacted directly, so the additional latency amounts to one redirection. The distributed sets/lists can be used to store structured data in the form of a data graph stored in the network, as shown in Fig. 5.
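The index-to-bucket mapping implied by the interval [s · (m − 1), s · m − 1] can be written down directly. The helper names below are hypothetical and indices are assumed to start at 0.

```python
def bucket_id(index: int, s: int) -> int:
    """Interval ID m of the bucket holding the item at the given index."""
    if index < 0 or s <= 0:
        raise ValueError("index must be >= 0 and bucket size s must be > 0")
    return index // s + 1


def bucket_range(m: int, s: int) -> tuple:
    """Inclusive index interval [s*(m-1), s*m-1] covered by bucket m."""
    return (s * (m - 1), s * m - 1)


def bucket_name(structure_name: str, index: int, s: int) -> str:
    """Name under which the bucket is stored, following the structurename_m pattern."""
    return f"{structure_name}_{bucket_id(index, s)}"
```

For a bucket size of s = 10, for instance, the item at index 25 falls into the interval [20, 29] and is therefore found in the bucket with interval ID 3.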

Prefix Hash Trees
The prefix hash tree 59 (PHT) is a trie-based distributed data structure designed to support more sophisticated queries over a DHT; specifically, it allows running range queries, heap queries, proximity queries and multidimensional analogues of these queries. It relies on the lookup interface of the DHT to construct a trie-based structure and does not need to know what the DHT topology looks like or how the DHT performs routing. In LibreSocial, the use of PHTs makes it possible to perform range searches. This allows searching for content that falls within a predefined range, such as users by age.
Assume an alphabet that consists only of binary digits, so that keys are represented as binary numbers, and that every leaf node stores at most M keys. The PHT structure begins with a single node stored at a given leaf node. Insertion of a key, K, first results in a lookup operation to locate the leaf node that stores K. The insertion process may then cause the leaf node to split into two children, followed by redistribution of its keys among them.
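The lookup, insert and split steps can be sketched in a few lines. This is a simplified, in-memory illustration rather than the PHT used in LibreSocial: a dictionary keyed by prefix label stands in for the DHT, the key length D and leaf capacity M are arbitrary example values, and the leaf lookup scans prefix lengths linearly.

```python
D = 8   # key length in bits (example value)
M = 2   # maximum number of keys per leaf (example value)


def to_bits(key: int) -> str:
    if not 0 <= key < 2 ** D:
        raise ValueError("key out of range")
    return format(key, f"0{D}b")


class PrefixHashTree:
    def __init__(self):
        # label (binary prefix) -> node; the dict plays the role of the DHT
        self.dht = {"": {"leaf": True, "keys": []}}

    def find_leaf(self, bits: str) -> str:
        """Probe prefixes of increasing length until a leaf node is found."""
        for length in range(D + 1):
            node = self.dht.get(bits[:length])
            if node is not None and node["leaf"]:
                return bits[:length]
        raise KeyError(bits)

    def insert(self, key: int) -> None:
        bits = to_bits(key)
        label = self.find_leaf(bits)
        if bits not in self.dht[label]["keys"]:
            self.dht[label]["keys"].append(bits)
        self._split_if_needed(label)

    def _split_if_needed(self, label: str) -> None:
        keys = self.dht[label]["keys"]
        if len(keys) <= M or len(label) == D:
            return
        self.dht[label] = {"leaf": False, "keys": []}
        for bit in "01":                       # redistribute keys to both children
            child = label + bit
            self.dht[child] = {"leaf": True,
                               "keys": [k for k in keys if k.startswith(child)]}
            self._split_if_needed(child)

    def range_query(self, lo: int, hi: int) -> list:
        found, stack = [], [""]
        while stack:
            label = stack.pop()
            sub_lo = int(label.ljust(D, "0"), 2)   # smallest key under this prefix
            sub_hi = int(label.ljust(D, "1"), 2)   # largest key under this prefix
            if sub_hi < lo or sub_lo > hi:
                continue                           # subtree cannot contain matches
            node = self.dht[label]
            if node["leaf"]:
                found.extend(v for v in (int(k, 2) for k in node["keys"])
                             if lo <= v <= hi)
            else:
                stack.extend((label + "0", label + "1"))
        return sorted(found)
```

The range query only descends into prefixes whose key interval overlaps the requested range, which is what makes a search such as "users by age" cheaper than visiting every leaf.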

DDS security and access control
All of these distributed data structures incorporate cryptographically enforced access control mechanisms. Confidentiality is typically provided through directed encryption with the public keys of the read-enabled users. Integrity is provided through the signature of the author, which is stored alongside the encrypted/unencrypted data. Data updates, that is, the overwriting or deletion of data, require the signature of the previous author to take effect. All replicating nodes consider this and thus a majority of the replica holders must confirm that the changes are valid.
The need for such advanced access control options is seen in the functionality of the news wall of every user. This social networking specific feature gives each user a personal wall page that they can post entries to. Other users can also post to another user's personal page and also comment on those posts if they have been accepted as friends.
Users may also alter their comments or even delete them. Entries of others must not be altered. Even the owner of the news wall cannot alter the entries of others, but may delete an entry in total. Deleting a comment means overwriting it with a tombstone object, which is treated differently in the presentation. The owner of a wall can delete a complete entry by removing the pointer to that comment from his wall item, which he owns, but cannot change the wording of the entry: this would require replacing the comment at the comment's data item ID in the DHT, which is not allowed, as he cannot present a valid signature from the previous owner.
The same applies to comments on photos in photo albums and entries in the forum of a group. As the access control is enforced by the P2P framework, the applications and plugins do not have to deal with the complexity of the access control enforcement. In the dissertation 56 , we present the DDS concept in full detail.

Communication Channels

Unicast, multicast and aggregation
LibreSocial supports both synchronous and asynchronous messaging. It also provides support for unicast (1-to-1) messaging such as in direct

Publish/Subscribe (pub/sub)
This allows users to communicate with each other without knowing each other's addresses. The messages are published to topic channels that the users have subscribed to. Pub/sub is implemented using Scribe 48 with support for caching of created topics. Scribe creates a minimal spanning tree for all participants in the pub/sub channel and delivers messages from any origin to all subscribers.

Streaming via SplitStream
Streaming can be visualized as a high bandwidth 1-to-N communication, hence is different from a basic file transfer. LibreSocial supports streaming using SplitStream 49 which allows minimization of the upload bandwidth requirement by equally distributing the workload and ensuring that nodes which consume also participate in routing. Support for order preservation is integrated and can be enabled or disabled depending on the application plugin requirements as this includes an overhead.
In essence the P2P framework supports transmission of low overhead streaming data that is already properly formatted. Also, there are additional options for resending of lost packets and using checksums to verify received data.

Streaming via WebRTC
While the streaming option through SplitStream routes data flows from one P2P framework instance (node) to another, there is often the need for simple, low latency audio and video conferencing. This use case is addressed through WebRTC, which is provided by the browser and allows browsers to connect to each other. Access to the webcam and microphone makes it possible to set up conferencing tools. In this case, the data flow takes place between browser instances and does not pass through the P2P framework. This side-channel is the only exception to the aim of managing all data and communication in the P2P framework.

Secure message channel
The nodeIDs in the network are public keys. Thus, any communication can be encrypted and signed. To secure messages, they can be encrypted either with a symmetric key or with an asymmetric key, depending on the communication channel that is to be used. If the message is intended for only one recipient, only asymmetric key encryption is applied. If it is meant for a group, the symmetric key is applied first, followed by the public key of the receiving user, which is identical to the nodeID to which the message is sent. As the sender's nodeID is also its public key, the recipient can easily verify the message.

Monitoring and Testing
Running a large-scale distributed P2P system bears the risk of unforeseen emerging performance issues. In order to assess the quality and performance of the network, a reliable monitoring solution is needed. LibreSocial performs tree-based monitoring using SkyEye.KOM 60,61 (or simply SkyEye). SkyEye works by implementing a further overlay on top of the P2P network and arranges the nodes into a tree structure as shown in Fig. 6. Every node collects local measurements for certain pre-defined metrics while at the same time receiving metrics from its own child nodes (if it has any) and aggregates these received values with its own. After performing the collection and aggregation, the node sends these metrics to its pre-calculated parent node in the tree. This is shown in Fig. 7.
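The bottom-up aggregation can be illustrated with a minimal sketch. It is not the SkyEye.KOM implementation: the classes Aggregate and MonitorNode are hypothetical, the choice of count, sum, minimum and maximum as aggregated values is an assumption, and the recursive call stands in for the periodic messages that child nodes send to their parent.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    count: int
    total: float
    minimum: float
    maximum: float

    def merge(self, other: "Aggregate") -> "Aggregate":
        """Combine two partial aggregates; order of merging does not matter."""
        return Aggregate(self.count + other.count,
                         self.total + other.total,
                         min(self.minimum, other.minimum),
                         max(self.maximum, other.maximum))

    @property
    def mean(self) -> float:
        return self.total / self.count


@dataclass
class MonitorNode:
    local_value: float                            # locally measured metric
    children: list = field(default_factory=list)

    def report(self) -> Aggregate:
        """Merge the local measurement with the reports of all child nodes."""
        v = self.local_value
        aggregate = Aggregate(1, v, v, v)
        for child in self.children:
            aggregate = aggregate.merge(child.report())
        return aggregate
```

Calling report() on the root yields the network-wide view, while every inner node only ever handles the aggregates of its own subtree.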
We elaborate the long-term vision for monitoring and management. We have shown in simulation that metric intervals can be defined and reached through such a distributed control loop from both sides, that is, both in lowering the average hop count (when the response time is "too high") and in raising the hop count (when the response time is high but the maintenance traffic should be lowered). However, for an integration in LibreSocial, the framework must be adapted so that the configuration of the system can be changed during runtime, which is not trivial.
Testing of the system allows the developers to find bugs that exist in the system, while monitoring allows the system users to find out exactly

AppStore - Repositories for (OSN) Plugins
The AppStore is an independent plugin extension for repository based app/plugin management. The development of the AppStore was pos-

Other supporting components
In addition to the P2P features discussed in the framework, there are three other important components integrated into the framework.
These are the Storage Dispatcher, the Message Dispatcher and the Information Cache. The component framework shown in Figure 8 shows the general placement of these components in the architecture.
The Storage Dispatcher provides storage services for the platform-specific data objects, both locally and remotely. It keeps track of the application data being stored. All data objects and messages in LibreSocial extend a common class called SharedItem, having a storage key, header and the data object itself, making it a storable object. The Storage Dispatcher acts as a local stub and performs efficient storage, retrieval, update and removal operations on the data object using the P2P framework. Execution of these operations can be either synchronous or asynchronous.
The TopicChannel uses the publish/subscribe mechanism. It sends a message to every node that has registered to a particular TopicChannel, e.g. participants in a chat room.
The AggregationChannel is used to combine data on a global/network scale, used in SkyEye. In order for nodes to provide data, they must add a sensor and to receive combined/aggregated data, the node must register a callback. The aggregation of the sensor data takes place in the AggregationServer which is in the backend.
The Information Cache acts as a cache for objects, either data objects or stored messages, requested by the higher layers from the distributed storage. These requested objects are expected to change infrequently, hence they can be kept in the cache so that subsequent requests can be served locally and directly to the Apps/GUI, minimizing network traffic due to repeated requests. The cache size is configurable and the caching strategy employed in LibreSocial is the least recently used (LRU) strategy. The use of the Information Cache exempts the plugins in the upper layers from handling asynchronous events. The plugins simply decide which objects they need at any particular time and retrieve them from the cache. Such data may be available, already requested or not available. In case of unavailability, the plugin requests the object, leaving the cache to initiate the lookup for the requested object and to process the irregularly incoming data.
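The LRU strategy itself can be sketched compactly. The class below is a hypothetical illustration and not the Information Cache of LibreSocial; it covers only eviction and leaves out the asynchronous lookup of missing objects.

```python
from collections import OrderedDict


class LruCache:
    """Fixed-size cache that evicts the least recently used entry first."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()       # oldest entry first, newest last

    def get(self, key):
        """Return the cached object, or None if it is not available."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)      # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # drop the least recently used entry
```

A plugin that receives None from get would issue a request and pick up the object from the cache once the lookup has completed.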

Summary of the P2P Framework
The framework is a collection of (advanced) P2P functions to harness the resources in the overlay, hide the complexities and provide interfaces for advanced applications on top. PAST provides replication functionality and has been extended to support access control, security, heterogeneity and also load balancing. With the mechanisms for distributed data structures and the advanced communication options, which all consider the security of the users, various applications can be built. The monitoring solution SkyEye supports the network by continuously gathering and providing information on the network's performance that can be used to fine tune the nodes' configuration in the system.

THE PLUGINS AND APPLICATIONS
Plugins are software components that add a specific feature to a system to enhance the system's capabilities. The use of plugins in system design provides for increased extensibility, simplicity in system design and parallel development of a software application. The plugins used in LibreSocial are based on the OSGi framework and are placed on top of the P2P framework layer and rely on the services that it provides. Each plugin also provides an OSGi command interface that allows the Test plugin to check functionality of the plugin during distributed tests. The following are the plugins that were implemented.
• Login: This is the entry point into the network. The plugin handles the user's registration and login in the underlying framework.
• Profile: This plugin is used to create a data item that contains the user's personal data. The item is stored in the secure P2P storage. It allows for granular adjustment of the private data that users can view. They can also set cover and profile pictures.

THE GRAPHICAL USER INTERFACE
The topmost layer of the architecture (Fig. 1) is the graphical user interface (GUI), which is the point of contact with the users. LibreSocial's GUI has undergone many changes since the initial implementation, starting from a pure command line interface, moving to an Eclipse-based applet framework, and arriving at its current design, which uses standard web technologies such as HTML5, AJAX, JQuery, Bootstrap and Knockback.js. The current GUI is shown as a screenshot in Fig. 10. This combination of web technologies also allows the provision of multi-language support (currently only English and German) as well as support for mobile devices (at least the GUI is accessible). The GUI's backend is composed of three essential parts, the plugin template, the plugin logic and the WebProvider. These are discussed in detail.
In Table 1, the differences are tabulated. We give a summary of the differences below. This allows the generated public key to be directly used in the network for encrypted communication.
• Message routing strategy: In addition to the traditional strategy of forwarding to the node whose nodeID is numerically closest to the required nodeID, LibreSocial also implements a parallel/iterative routing strategy. This mitigates certain attacks such as Sybil, Eclipse and routing attacks, while also limiting the lookup time and the amount of traffic generated.
• Capacity awareness: By supporting heterogeneous nodes in LibreSocial, it is possible to further introduce strategies that allow the system to gather more informative data such as available persistent storage space, memory, bandwidth, and type of devices at login and even at runtime so as to dynamically adjust the routing table to support strong and weak nodes.
• Group access control: A group access control mechanism has been introduced to support sets and nested sets of users, which eases the management of various friend groups and supports the mapping of organizational hierarchies to e.g. the CSCW groups and forums.

EVALUATION OF PERFORMANCE AND COST
In order to measure the efficacy of the system, we carry out an experiment in which all the plugins are run so as to ensure that all system functions are activated. The quality of the system is given through the function set and provided features. Three aspects were considered of importance to show the general costs of the system. These are: (a) Network cost: This focuses on the data rate (bytes/sec), the message rate (messages/sec) and the average hop count for lookups.
(b) Storage cost: The focus is on the retrieval time, storage time, average data stored versus actual data stored at a single node, and average number of replicas per node versus the actual number of replicas in a node, displaying the storage load.
(c) Cost of security: This looks at the impact on the data objects when either symmetric or asymmetric encryption is performed. Analogous to the workload plan, the objects are created and requested; the results are given in Tables 2 and 3. A discussion of this analysis follows.

Network
In the setup with 64 nodes, we observe that the total data transfer reaches an approximate maximum of 250 MB/s in the network (see Fig. 11e) with a maximum peak load per node of 3.9 MB/s within the first ten minutes (see Fig. 11f), which corresponds to the network initialization phase. However, in general, network data rates oscillate between 0 and 150 MB/s with peaks during the network initialization phase, the file uploads and the photo uploads (see Fig. 11d). This corresponds to 2.4 MB/s as average peak load per node across all setups, which is acceptable. The number of nodes does not affect the maximum node load, as new nodes statistically bring more resources than they consume. Note that we present the incoming traffic characteristics, as the outgoing (sending) transmission rates are lower than the receiving rates. This is due to the same homogeneous transmission load at all nodes in the work plan, i.e. homogeneous send rates, heterogeneous receive rates. The maximum load for sending messages is roughly half of the receiving load.
The transmission of DDS, i.e. distributed data structures, has the biggest impact on the traffic, as shown in Fig. 11g. Further traffic sources are the replication of the stored data items as well as messages.
Messages in a network may be join messages, leave messages, maintenance messages, user messages or request and retrieval of results 64 .
The network message rate (see Fig. 11h) peaks at the end of the experiment with up to 13,000 messages/s, corresponding to the Wall plugin experiment. This is due to the retrieval of the DDS associated with the wall comments. In general, the total message rate oscillates between 0 and 6000 messages/s. Fig. 11h shows the maximum per node message rate, reaching 100 messages/s. Both numbers are considered low. We have a very low hop count of 0 to 1 in the system (see Fig. 11i).
This is due to the fact that the routing table has room for 160·20 = 3200 entries, sufficient to list the maximum of 64 nodes in the setup. In most cases, every node has information about every other node in our setup. As the overlay provides lookups in time logarithmic in the number of nodes in the system, only a setup with hundreds of thousands of nodes would increase the hop count significantly. Corresponding to the low hop count, the retrieval times (see Fig. 11j) and storage times (see Fig. 11k) are also very low, at 50 and 110 milliseconds respectively. Storage takes about twice the retrieval time, mainly due to the creation of replicas during storage.
Both values are considered tolerable.
Figure 11m to Figure 11p show the storage analysis of our tests. The focus of the storage analysis is the storage and replication load in total and at the most loaded node. The overall number of unique objects is shown in Fig. 11m, and the number of corresponding replicas is presented in Fig. 11o. As the same test operations were performed at every node instance, the storage initiation load was the same for all nodes, while the actual storage node varied. As pointed out earlier, LibreSocial is designed to ensure that no single node is overwhelmed with storage requests, and that the replication requests are evenly distributed in the entire network. The peak load is fairly small, with 4100 unique items at the most loaded node (see Fig. 11n) and a maximum of 14,000 replicated items per node (see Fig. 11p). These maximum values are less than twice the average storage load of 2500 unique items per node and approx. 8000 replicated items per node. Thus the maximum load deviation is below 2, showing a fair load distribution. This shows that the additional load brought by further nodes to the system enlarges the resource pool that is used uniformly.
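The load deviation quoted above is the ratio of the load at the most loaded node to the average load per node. A small worked check with the reported values follows; the function name load_deviation is hypothetical.

```python
def load_deviation(max_load: float, avg_load: float) -> float:
    """Ratio of the peak per-node load to the average per-node load."""
    return max_load / avg_load


# Values as reported in the text: unique items and replicated items per node.
unique_deviation = load_deviation(4100, 2500)     # 4100 / 2500 = 1.64
replica_deviation = load_deviation(14000, 8000)   # 14000 / 8000 = 1.75
```

Both ratios stay below 2, which matches the statement that the most loaded node carries less than twice the average load.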

Security
One of the changes between LifeSocial and LibreSocial is the public key infrastructure used. LifeSocial implemented 1024-bit RSA algorithm 50 while in LibreSocial this was changed to a 160-bit ECC algorithm 51,52 .
The performance of the ECC algorithm is shown in Table 2, which can be compared with the results in 22 . In general, the overhead is much smaller than for the RSA algorithm, and the encryption and decryption times are likewise significantly reduced. We also evaluated the AES algorithm for symmetric encryption and tabulated the results in Table 3. The overheads are generally small, as most of the objects are smaller than 1 kilobyte, with the encryption and decryption times being less than 1 millisecond.

CONCLUSION
This paper presents LibreSocial in full, a P2P-based platform for online social networks. The goal of the development of this OSN application is to provide a fully distributed, secure online social network that offers high-quality services while having practically no operational cost, despite running on unreliable, insecure and sometimes malicious user devices. To match the needs of such an OSN, the paper specifies technical requirements for a P2P-based OSN and shows how LibreSocial is designed to meet these requirements. LibreSocial is built on a structured P2P overlay, FreePastry, with modifications for identity management and security, hence guaranteeing logarithmic routing efficiency. While PAST offers simple file storage, the inclusion of distributed sets, distributed linked lists and prefix hash trees provides support for complex data such as albums, comments and inbox messages, while ensuring these have access control features, and opens up the system to the implementation of more advanced searching mechanisms such as range searches. The monitoring and testing plugins included in LibreSocial set it apart from other systems, as they allow for quality of service (QoS) monitoring using the available aggregated metrics, so that adjustments can be made to achieve desired QoS standards. Through the broad capabilities of the P2P framework used, LibreSocial provides simple yet powerful implementations of OSN functions, such as friends, messaging, photos and walls, but also unique features such as groups/forums, file hosting, voting and audio/video chat. The modern user interface makes it compelling to use.
Selected elements of LibreSocial have been published before and were received very positively in the community; the corresponding dissertations 21,55,56,57 around LibreSocial elaborate on specific elements, such as the overlay, the storage and the monitoring. This is the first detailed overview of the overall architecture and the interdependencies of its elements. In general, LibreSocial offers a working solution for fully distributed, secure and high-quality social networking and is capable of supporting a wide set of simple-to-develop applications.
As a next step, we aim to deploy LibreSocial in a beta test to gain further insights on its performance 'in the wild'. With this, we work on our vision to provide a feature-rich tool for secure and privacy-aware communication and interaction that cannot be surveilled or shut down 65 , thus providing a tool for free speech in today's challenging times.