Peer cluster: a maximum flow-based trust mechanism in P2P file sharing networks

Correspondence: Weifeng Sun, School of Software Technology, Dalian University of Technology, Dalian, 116621, China.

E-mail: xinxinyuanfan@mail.dlut.edu.cn

ABSTRACT

Trust mechanisms have become a research focus in recent years as a novel and valid way to ensure transaction security in peer-to-peer file sharing networks. Nevertheless, some fundamental challenges remain, for example: How can malicious peers be effectively isolated? How can various threats of manipulation by strategic peers be resisted? What strategy should be used to ensure that the service providers are authentic peers? With these challenges in mind, we propose a new trust mechanism based on maximum flow theory. We first add a few prestigious peers to a cluster as the original members according to their transaction behaviors over a period; then, we run a maximum flow algorithm and identify those peers that are still linked from (or to) the peers in the cluster as new members. This step is carried out repeatedly until almost every normal peer has become a member of the cluster. Under our trust mechanism, each request peer gives priority to this cluster when selecting downloading sources. In this way, malicious peers are isolated, and their transaction behaviors are largely confined even if they have high reputation. Extensive experimental results confirm the efficiency of our trust mechanism against the threats of exaggeration, cheating, collusion, and disguise. Copyright © 2013 John Wiley & Sons, Ltd.

1 INTRODUCTION

1.1 Background

Peer-to-peer (P2P) file sharing networks, as a means of sharing and distributing information, are receiving increasing attention. In such networks, the management of trust and reputation plays an important role in helping participants establish authentic, mutually beneficial cooperation. Trust is a personal and subjective phenomenon based on various factors or pieces of evidence, some of which carry more weight than others; an individual's subjective trust can be derived from a combination of received referrals and personal experience. Reputation can be considered a collective measure of trustworthiness based on the referrals or ratings from members of a community [1]. The concept of reputation is closely linked to that of trustworthiness, but there is a clear and important difference: trust systems produce a score that reflects the trusting entity's subjective view of the trusted entity's trustworthiness, whereas reputation is a single value that represents what the community as a whole thinks about a certain user [2]. In our work, the reputation of a peer aggregates the trust ratings given by other peers and reflects its capability of providing file downloading services, whereas the trust of a peer represents the direct confidence it has in another and reflects its personal opinion. Usually, request peers use the reputation of a response peer as the criterion for deciding whether to transact with it. Without trust and reputation mechanisms, the network cannot offer an authentic transaction platform for its participants, which can lead to mistrust among them and eventually to system failure.
By collecting, distributing, and aggregating the feedback about the participants' past behaviors, the reputation-based trust models can help participants decide whom to trust, encourage trustworthy behavior, and deter participation by those who are unskilled or dishonest [3].

Recently, many studies on trust and reputation mechanisms have been proposed, using various methods such as fuzzy logic theory [4-6], Bayesian networks [7, 8], subjective logic [9, 10], social cognitive [11-14] and fine-grained [15, 16] approaches, and even game theory [17-20]; both academia and industry are focusing their attention on this field.

1.2 Motivation

In P2P networks, the reputation of a peer indicates its importance: the higher the reputation, the larger the probability that it is selected as a downloading source. Most traditional trust mechanisms adopt this scheme to manage system transactions. However, the presence of many malicious peers makes it impossible to assess transaction behavior fairly, so transactions with malicious peers are unavoidable. Generally speaking, normal peers provide authentic file uploading services and give authentic trust ratings to other peers. In contrast, malicious peers not only provide unauthentic file uploading services but also give unauthentic trust ratings, and may even cooperate with each other to mount collusion attacks. These characteristics raise two questions: Can we find a method to identify the normal peers and isolate the malicious ones? What strategy should be implemented to ensure that the service providers are authentic peers? To address these questions, we propose a new trust mechanism based on maximum flow theory. Our mechanism makes the normal peers form a cluster, whose members have priority when downloading sources are selected. When a request peer issues a query for a file, the peers that hold matching files are the candidates; the request peer first considers those candidates that are members of the cluster. If there are two or more, it selects the one with the highest reputation as the downloading source. If no candidate is in the cluster, it selects the response peer with the highest reputation directly. In this way, the transaction behaviors of malicious peers can be confined effectively even if they have high reputation.

1.3 Related work

Trust mechanism research can be analyzed from different perspectives. In the simple eBay system [3], feedback consists of a short comment and one of three discrete ratings: positive (1), negative (−1), or neutral (0). A peer's reputation simply aggregates the ratings given by a limited set of neighbors, without considering the short comment. The system is easy to implement, but it cannot take any effective measures to isolate and punish malicious peers. Kerschbaum et al. propose the PathTrust model [21], which exploits the maximum-weight path to obtain personalized reputation ratings. This model focuses only on the reputation between the initiator and the candidates and ignores the related trust ratings given to the candidates by other peers. Both models thus rely on limited trust ratings rather than complete rating information. In this paper, by contrast, we use all the related trust ratings given by other transaction partners when calculating personal reputation.

Kamvar et al. [22] propose the trust model EigenRep, which is based on trust transitivity. The system first defines some pre-trusted peers with high reputation, and each peer's reputation relies on the others' trust ratings. However, how to select and distribute the pre-trusted peers in large-scale, decentralized P2P networks remains an open question. If there are few malicious peers, EigenRep may assess transaction behavior properly; if there are many, the results are poor because trust propagates among the malicious peers: each receives exaggerated ratings from other malicious peers and, in return, gives them high trust ratings. The key, therefore, is how to confine trust transitivity among malicious peers, especially colluding ones. In this paper, our algorithms make normal peers cluster together, and the response peers in the cluster have priority in providing services to requesters. In this way, malicious peers, even if they possess high reputation, are not selected as downloading sources when a cluster member can serve the request, which confines trust transitivity among them.

Levien et al. [23] adopted a maximum flow trust metric to choose trustworthy servers. Efstathiou et al. [24] used a maximum flow algorithm to calculate the decision function for their protocol, making it robust against less naive colluders who combine false trading with real contribution. The decision function is a probability: peer P provides service to C with probability p = min(mf(P → C)/mf(C → P), 1), where mf(⋅) is the result of the maximum flow algorithm for a pair of vertices. If the denominator is 0, P provides service only if the numerator is strictly positive. The authors also define a generalized maximum flow algorithm (GMF) to detect excessive false trading. The weight w_e of an edge e encountered during the maximum flow computation is discounted according to the distance d(P, S_e) of e's source vertex S_e from the maximum flow source P. The weight w_e also depends on q(P, S_e), the ratio of the sum of the weights of all outgoing edges of S_e to the sum of the weights of all outgoing edges of P. That is, w_e is first divided by math formula; it is then divided by q(P, S_e), but only if q(P, S_e) > 1. P allows a request from C to proceed to the test with probability p = min(g × GMF(P → C)/gmf, 1), where g is a constant and gmf is updated every time P computes a new, strictly positive GMF result. The update follows gmf = α × gmf_old + (1 − α) × GMF(P → C) with α = 0.5. Reference [25], which also uses maximum flow-based subjective reputation, shows that the maximum flow-based decision function is robust and makes the system converge to optimal cooperation and performance; its authors also propose other incentive techniques, such as discriminating server selection, subjective reputation, and adaptive stranger policies, to tackle the cooperation problem common in P2P systems.
These works use the maximum flow algorithm to calculate a decision function (a probability) that determines whether one peer provides service to another. In our work, by contrast, we mainly use the maximum flow algorithm to calculate the trust flow among peers according to their local trust and to add trusted peers to a cluster. Our purpose is to form a trusted community (cluster) via the maximum flow algorithm.
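The decision function of [24] and its gmf update, as summarized above, can be sketched in a few lines of Python. This is only an illustration of the two formulas quoted in the text; the function names are hypothetical, and the maximum flow values are assumed to be computed elsewhere.

```python
def service_probability(mf_p_to_c: float, mf_c_to_p: float) -> float:
    """p = min(mf(P -> C) / mf(C -> P), 1), with the zero-denominator rule."""
    if mf_c_to_p == 0:
        # With a zero denominator, P serves only if the numerator is strictly positive.
        return 1.0 if mf_p_to_c > 0 else 0.0
    return min(mf_p_to_c / mf_c_to_p, 1.0)


def update_gmf(gmf_old: float, gmf_result: float, alpha: float = 0.5) -> float:
    """gmf = alpha * gmf_old + (1 - alpha) * GMF(P -> C), applied to positive results only."""
    if gmf_result <= 0:
        # The running value changes only when a new, strictly positive result arrives.
        return gmf_old
    return alpha * gmf_old + (1 - alpha) * gmf_result


print(service_probability(3, 6))
print(update_gmf(4.0, 2.0))
```

In contrast to this per-request probability, the mechanism proposed in this paper uses the flow computation to decide cluster membership.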

1.4 Challenging issues

The anonymous, autonomous, and open nature of P2P file sharing networks makes it hard to ensure their security and sound operation. Marmol et al. [26] describe nine important security threat scenarios. Faced with these threats, it is very hard to devise a comprehensive and valid trust mechanism. Considering the variability of different types of malicious peers, we identify two main challenges: (1) How can we identify the normal peers and isolate the malicious peers effectively, especially the strategic malicious ones? (2) What strategy should be implemented to ensure that the service providers are authentic peers?

1.5 Our contributions

To address the aforementioned challenges, we propose a new trust mechanism based on maximum flow theory. The main contributions are the following: (1) our algorithms make the normal peers form a cluster by performing the maximum flow computation repeatedly, which allows us to identify normal peers and isolate malicious ones effectively; (2) we define the peer cluster and give the response peers in it priority in providing file uploading services to request peers; we also define the local trust rating between a pair of arbitrary peers and the personal reputation of each peer; (3) to evaluate the performance of our trust mechanism more effectively, we introduce two further classic criteria, recall and precision. To verify the availability and rationality of our model, we compare its performance under several types of threats with the EigenRep, PathTrust, and Random models. Experimental results show that our model achieves better performance against different types of threats.

The remainder of this paper is organized as follows: Section 2 details our trust mechanism; Section 3 presents the experimental parameters and analyzes the corresponding results; finally, Section 4 presents some conclusions.

2 PEER CLUSTER TRUST MECHANISM

In this section, we introduce our trust management mechanism in detail. First, we briefly introduce how to calculate the maximum flow using two common approaches: the augmenting path algorithm and the preflow-push algorithm. Second, we define the local trust rating, which reflects the direct opinion (confidence) that one peer has in another, and the personal reputation of each peer, which reflects the probability of its providing file uploading services to other peers. Third, we define the peer cluster and prove its existence in theory. Our algorithms then describe how to make normal peers form a cluster and how to ensure that the response peers in the cluster have priority in providing file uploading services to request peers. Finally, we analyze the computational complexity of our algorithms and prove their correctness.

2.1 Maximum flow algorithm

The maximum flow algorithm, a classic optimization method, has been applied in many research fields. The maximum flow in a directed graph can be defined as follows. Let G = (V,E) be a directed graph with vertex set V, edge set E, and edge capacity c(u,v), u, v ∈ V. In addition, we add two virtual vertices: source s and sink t. The maximum flow problem is to find the maximum flow that can be routed from the source to the sink while obeying all capacity constraints.

There are two commonly used approaches for solving the maximum flow problem: the augmenting path algorithm and the preflow-push algorithm. We first introduce the augmenting path algorithm, following [27]. It has two parts, Routine A and Routine B. The first is a labeling process that searches for a flow augmenting path, i.e., a path from s to t for which f < c along all forward arcs and f > 0 along all backward arcs. If Routine A finds a flow augmenting path, Routine B changes the flow accordingly. A node is in one of three states: unlabeled, labeled and scanned, or labeled and unscanned. Upon entering Routine A, all nodes are unlabeled. The first step renders the source labeled and unscanned.

We now describe Routine A and Routine B in detail.

Routine A (labeling process):

Initially, label the source (−, ε(s) = ∞).

General step: select any node x that is labeled and unscanned, and let (z±, ε(x)) be its label. To all unlabeled successor nodes y such that f(x,y) < c(x,y), assign the label (x+, ε(y)), where ε(y) = min{ε(x), c(x,y) − f(x,y)} (such y are now labeled and unscanned). To all unlabeled predecessor nodes y such that f(y,x) > 0, assign the label (x−, ε(y)), where ε(y) = min{ε(x), f(y,x)} (such y are now labeled and unscanned). Now define x to be labeled and scanned. Repeat the general step until the sink is labeled and unscanned or until no more labels can be assigned. In the former case, go to Routine B; in the latter case, terminate (f is a maximum flow).

Routine B (flow change):

The sink has been labeled (y±, ε(t)). If the first part of the label is y+, replace f(y,t) with f(y,t) + ε(t); otherwise, replace f(t,y) with f(t,y) − ε(t). Go to node y and treat it the same way: if its label is (x+, ε(y)), replace f(x,y) with f(x,y) + ε(t); if its label is (x−, ε(y)), replace f(y,x) with f(y,x) − ε(t). In either case, go to node x and repeat until the source is reached. Then, discard all labels and return to Routine A. The following example illustrates the procedure in detail.

The arc numbers are c(x,y) and f(x,y). The initial flow sends one unit along the path (s,x,y,t) described in Figure 1(a).

Figure 1.

The performance process of augmenting path algorithm.

The first execution of Routine A results in “breakthrough” (i.e., finding a flow augmenting path) with the following labels: (a) After one step in Routine A, the labels are as shown in Figure 1(b). (b) After two steps in Routine A, the labels are as shown in Figure 1(c). (c) After three steps in Routine A, the labels are as shown in Figure 1(d). (d) Now execute Routine B to obtain the new flow shown in Figure 1(e). (e) After repeating Routine A, we finish with the labels in Figure 1(f).

This shows that the flow can be increased by 1 (=ε(t)) along the flow augmenting path, (s,(y,x),t). Working backwards from the sink, the path traverses arc (x,t) in a forward direction, so its flow increases by 1 (f(x,t) becomes 2). The label at x tells us that we traverse arc (x,y) in a backward direction, so we decrease that flow by 1 (f(x,y) becomes 0). We then move to node y, and its label tells us that the flow augmenting path traverses arc (s,y) in the forward direction, so its flow increases by 1 (f(s,y) becomes 2). Because we reach the source (s), Routine B terminates, and we return to Routine A with the following flows and initial label shown in Figure 1(g).

After executing Routine A, we find that there is no flow augmenting path, so the current flow has the maximum value of three units from s to t. The final labels are shown in Figure 1(h).
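The labeling procedure above can be sketched in Python as follows. This is a minimal illustration, not the exact routine of [27]: it scans labeled nodes in breadth-first order, stores forward and backward arcs together as residual capacities, and uses a small four-node network whose capacities are hypothetical (they are not the values of Figure 1, which is not reproduced here).

```python
from collections import deque


def max_flow_augmenting(capacity, s, t):
    """Return (value, flow) for a dict-of-dicts capacity map, using augmenting paths."""
    # Residual capacities; reverse arcs start at 0 so that flow can be cancelled later.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    value = 0
    while True:
        # Labeling (Routine A): label[y] = (predecessor, epsilon(y)).
        label = {s: (None, float("inf"))}
        queue = deque([s])
        while queue and t not in label:
            x = queue.popleft()
            for y, r in residual[x].items():
                if r > 0 and y not in label:
                    label[y] = (x, min(label[x][1], r))
                    queue.append(y)
        if t not in label:
            break  # no flow augmenting path: the current flow is maximum
        # Flow change (Routine B): walk back from the sink, adjusting by epsilon(t).
        eps = label[t][1]
        y = t
        while y != s:
            x = label[y][0]
            residual[x][y] -= eps
            residual[y][x] += eps
            y = x
        value += eps
    flow = {(u, v): c - residual[u][v]
            for u, nbrs in capacity.items() for v, c in nbrs.items()}
    return value, flow


# Hypothetical capacities chosen only for illustration.
example = {"s": {"x": 2, "y": 2}, "x": {"y": 1, "t": 2}, "y": {"t": 1}}
value, flow = max_flow_augmenting(example, "s", "t")
print(value, flow)
```

On this small network the total capacity into t is 3, so no flow can exceed three units, and the sketch terminates once no labeled path reaches the sink.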

In 1986, Goldberg and Tarjan [28] proposed the preflow-push algorithm for calculating the maximum flow, which has been widely applied because of its simplicity, flexibility, and efficiency. In this paper, we also use the preflow-push algorithm to compute the maximum flow. For details, see [28]. Following [29], we now briefly describe how to compute a maximum flow with the preflow-push method on the residual network.

We take Figure 2(a) as the initial network; the entire preflow-push process is shown in Figure 2(b). In the initial phase, we push flow along edges s–1 and s–2, making vertices 1 and 2 active (a node is active if it has strictly positive excess). In the second phase, we push flow from these two vertices to 3 and 4, which makes 3 and 4 active and 1 inactive (2 remains active). In the third phase, we push flow through 3 and 4 to t, which makes them inactive (2 still remains active). In the fourth phase, 2 is the only active node and edge 2–s is admissible, so one unit of flow is pushed back along 2–s to complete the computation.

Figure 2.

The performance process of preflow-push algorithm.

2.2 Local trust rating and personal reputation

In P2P file sharing networks, the transactions among peers can be represented by a directed weighted graph. The vertex set V denotes the system peers, and the edge set E = {(i, j) | j ∈ Trans(i)} carries the trust ratings, where Trans(i) is the set of peers from which peer i has downloaded files and lij is the trust rating from peer i to j. After a certain number of transactions, the entire network forms a web of trust in which the weight between a pair of peers is the trust rating, as shown in Figure 3.

Figure 3.

Web of trust with six peers.

Each time peer i downloads a file from peer j, it may rate the transaction as successful when the downloaded file is satisfactory or as unsuccessful when it is not; this rating reflects its direct opinion of the downloading source. We now define the local trust rating based on historical transaction behavior.

Definition 1. The local trust rating lij is the degree of satisfaction with the individual transactions in which peer i has downloaded files from peer j.

display math(1)
display math(2)

Here, succ(i,j) is the number of transactions that peer i deems successful, unsu(i,j) is the number of transactions that peer i deems unsuccessful, and max(pij,0) denotes the larger of pij and 0.

The local trust rating reflects the direct opinion or confidence one peer has in another. However, we use the personal reputation to represent the common view of trustworthiness. So we define the personal reputation as follows:

display math(3)

Here, I(i) represents the set of peers that have downloaded files from peer i, and |I(i)| is the number of peers in the set.
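As an illustration of how such ratings might be computed, the Python sketch below derives local trust and personal reputation from transaction counts. Equations (1)–(3) are not reproduced in this text, so the concrete formulas are assumptions made only for this example: pij = succ(i,j) − unsu(i,j), lij = max(pij,0) normalized over all partners of peer i, and R(i) taken as the mean of lji over the peers j ∈ I(i). The function names are hypothetical.

```python
def local_trust(succ, unsu):
    """succ/unsu map (i, j) -> transaction counts; returns (i, j) -> l_ij.

    Assumed form (not taken from the paper's equations):
    p_ij = succ - unsu, l_ij = max(p_ij, 0) / sum_k max(p_ik, 0).
    """
    p = {}
    for key in set(succ) | set(unsu):
        p[key] = max(succ.get(key, 0) - unsu.get(key, 0), 0)
    totals = {}
    for (i, _j), val in p.items():
        totals[i] = totals.get(i, 0) + val
    return {(i, j): (val / totals[i] if totals[i] > 0 else 0.0)
            for (i, j), val in p.items()}


def personal_reputation(trust, peer):
    """Assumed form: mean of l_ji over the peers j that downloaded from `peer`."""
    ratings = [l for (_i, j), l in trust.items() if j == peer]
    return sum(ratings) / len(ratings) if ratings else 0.0


# Hypothetical counts: peer 0 rated peers 2 and 3, peer 1 rated peer 2.
succ = {(0, 2): 10, (0, 3): 6, (1, 2): 4}
unsu = {(0, 2): 15, (0, 3): 2}
trust = local_trust(succ, unsu)
print(trust, personal_reputation(trust, 2))
```

With these counts, peer 0's negative balance toward peer 2 is clipped to 0, so only peer 1's rating contributes positively to peer 2's reputation.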

The distributed implementation of query and storage in a P2P system is crucial and difficult because it determines not only the practical applicability but also the feasibility of building the peer cluster in a real system. A number of P2P file sharing systems have emerged, and each has its own data location scheme [30]. Gnutella [31] uses broadcast-based schemes and does not guarantee reliable content location, whereas CAN [32], Chord [33], Pastry [34], and P-Grid [35] use a distributed hash table to deterministically map keys into points in a logical coordinate space and guarantee a definite answer to a query in a bounded number of network hops. However, most P2P systems deployed on the Internet are unstructured [2]. Therefore, in this paper, each peer keeps a simple data list that stores its rating information about other peers, based on whether it obtained the authentic resource in each transaction.

As shown in Figure 4, peer 0 has two transaction partners, 2 and 3; the number of successful transactions with peer 2 is 10, and the number of unsuccessful ones is 15. The last item, R(j), is the partner's personal reputation value. The list is updated whenever succ(i,j), unsu(i,j), or R(j) changes in a time series. Peers store their local transaction information and query their partners' rating information; the local trust value between any pair of peers can then be obtained, and the corresponding personal reputation can be computed. At the same time, an updated trust network is formed from the view of each peer. The next step is to run the maximum flow algorithm over the updated network and add those peers that are still connected to the peers in the cluster as new members.

Figure 4.

Peer 0's data list.
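A per-peer data list of this kind could be represented as in the Python sketch below. The class and method names are hypothetical, and the layout (one record per partner holding succ, unsu, and R(j)) is an assumption based on the description of Figure 4, not a reproduction of it.

```python
from dataclasses import dataclass, field


@dataclass
class PartnerRecord:
    succ: int = 0            # successful transactions with this partner
    unsu: int = 0            # unsuccessful transactions with this partner
    reputation: float = 0.0  # last known personal reputation R(j) of the partner


@dataclass
class DataList:
    owner: int
    partners: dict = field(default_factory=dict)  # partner id -> PartnerRecord

    def record(self, partner: int, success: bool) -> None:
        """Update the counters after one download from `partner`."""
        rec = self.partners.setdefault(partner, PartnerRecord())
        if success:
            rec.succ += 1
        else:
            rec.unsu += 1

    def set_reputation(self, partner: int, value: float) -> None:
        """Store the reputation value obtained by querying the partner's raters."""
        self.partners.setdefault(partner, PartnerRecord()).reputation = value


peer0 = DataList(owner=0)
peer0.record(2, True)
peer0.record(2, False)
peer0.set_reputation(3, 0.4)
print(peer0.partners)
```

Keeping raw counts rather than a precomputed rating lets the local trust value be recomputed whenever the list changes.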

2.3 Cluster description

The main purpose of our algorithms is to make the normal peers form a cluster and isolate those malicious peers. Now, we define peer cluster in terms of directed weighted graph, which represents the transaction relationship among peers.

Definition 2. A peer cluster is a subset C ⊂ V such that every vertex v ∈ C has at least as high a level of local trust with the vertices in C as it does with the vertices in V − C, as depicted in Figure 5.

Figure 5.

A tightly linked peer cluster.

In Definition 2, the vertices in the cluster have a high level of local trust among themselves. To verify the existence of a peer cluster, we give the following theorem.

Theorem 1. Cluster C is the set of all vertices reachable from s after the minimum cut is removed, where s and t are viewed as the source and sink, respectively.

Proof. By contradiction, assume that some vertex v reachable from source s after removing the minimum cut does not belong to the cluster. Because v ∉ C, it has a higher level of trust with V − C than with C. Moving v to the other side, which includes sink t, would then yield a more efficient cut, contradicting the minimality of the cut that separates the two sides. The case in which some vertex v belonging to cluster C is reachable from sink t can be proved similarly.

2.4 Algorithm framework

In this subsection, we describe in detail how to make the normal peers form a cluster based on maximum flow theory. Before performing the preflow-push algorithm, we must define the flow capacity. The capacity range influences the size of the cluster: if the capacity is too large, the cluster may become large and admit many malicious peers; if it is too small, the cluster would be small, admitting few normal peers and rejecting the others. A suitable capacity is therefore needed to obtain a cluster of suitable size. In our work, we use the difference between the numbers of successful and unsuccessful transactions to express the capacity between a pair of arbitrary peers. We define it as follows:

display math(4)

The number of original members in a cluster also influences how well the normal peers are gathered; it should be neither too large nor too small. The simulation results in Section 3.3 show that 5% of the system peers is a reasonable number of original members. Therefore, we select the 5% of peers with the highest personal reputation R(i) as the original members of the cluster, according to their transaction behaviors in the first time series. These original members differ from the pre-trusted peers in the EigenRep model because those pre-trusted peers are defined directly, without reference to any trust evaluation by other peers. Clearly, the larger the capacity between a peer and peers in the cluster, the higher the probability that it is identified as a new member. Normal peers, which provide authentic services to request peers and give authentic trust ratings to response peers, will have high capacity and will be added to the cluster steadily.

Next, we describe how to make the normal peers form a cluster via Algorithm 1.

[Algorithm 1: forming the peer cluster]

In Step 1, we select some prestigious peers as the original members of our cluster, such as peers 1 and 2 in Figure 6(a). In our work, 5% of the peers are chosen according to their ranked personal reputation after the first time series. Then, the capacities from the source and to the sink are defined in Step 2. After performing the preflow-push algorithm and removing the minimum cut, we identify those peers that are still connected to the peers in the cluster as new members, such as peers 3, 4, 5, and 6. Finally, all the peers with high local trust ratings are identified as members and form a cluster, as shown in Figure 6(b).

Figure 6.

Peer cluster.
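The cluster-forming idea can be sketched in Python under several stated assumptions, since Algorithm 1 and Equation (4) are not reproduced in this text. The sketch assumes that peer-to-peer capacities are supplied as numbers (for instance, max(succ − unsu, 0)), that the virtual source is linked to every current member with infinite capacity, and that every non-member is linked to the virtual sink with a hypothetical parameter sink_cap. It uses breadth-first augmenting paths in place of preflow-push for brevity; both yield the same minimum cut value. All function names are illustrative.

```python
from collections import deque

INF = float("inf")


def pick_seeds(reputation, fraction=0.05):
    """Return the top `fraction` of peers by personal reputation (at least one)."""
    k = max(1, int(fraction * len(reputation)))
    return sorted(reputation, key=reputation.get, reverse=True)[:k]


def residual_after_max_flow(cap, s, t):
    """Run augmenting paths on cap[(u, v)] and return the final residual graph."""
    res = {s: {}, t: {}}
    for (u, v), c in cap.items():
        res.setdefault(u, {}).setdefault(v, 0)
        res.setdefault(v, {}).setdefault(u, 0)
        res[u][v] += c
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, r in res[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return res
        eps, v = INF, t
        while parent[v] is not None:      # bottleneck along the path
            eps = min(eps, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:      # augment along the path
            u = parent[v]
            res[u][v] -= eps
            res[v][u] += eps
            v = u


def grow_cluster(peers, trust_cap, seeds, sink_cap=1.0):
    """Repeat the max-flow step until no new peer joins the cluster."""
    s, t = "_source", "_sink"
    cluster = set(seeds)
    while True:
        cap = dict(trust_cap)
        for p in cluster:
            cap[(s, p)] = INF             # assumption: members tied to the source
        for p in peers:
            if p not in cluster:
                cap[(p, t)] = sink_cap    # assumption: non-members tied to the sink
        res = residual_after_max_flow(cap, s, t)
        seen, stack = {s}, [s]            # source side of the minimum cut
        while stack:
            u = stack.pop()
            for v, r in res[u].items():
                if r > 0 and v not in seen:
                    seen.add(v)
                    stack.append(v)
        new_cluster = (seen - {s, t}) | cluster
        if new_cluster == cluster:
            return cluster
        cluster = new_cluster


# Hypothetical six-peer network: peers 5 and 6 mostly rate each other.
reputation = {1: 0.9, 2: 0.8, 3: 0.6, 4: 0.7, 5: 0.3, 6: 0.2}
trust_cap = {(1, 3): 5, (2, 3): 4, (3, 4): 5, (2, 4): 3,
             (4, 5): 1, (5, 6): 9, (6, 5): 9}
seeds = pick_seeds(reputation, fraction=0.34)
print(grow_cluster(list(reputation), trust_cap, seeds))
```

In this toy network, peers 3 and 4 receive far more capacity from the seeds than they can drain to the sink, so they stay on the source side, whereas the single weak edge into peer 5 is saturated and leaves the colluding pair outside the cluster.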

The request peers send their queries for downloading files and select response peers from the cluster according to personal reputation R(i). If no response peer is in the cluster, they directly select the response peer with the highest personal reputation, as described in Algorithm 2.

[Algorithm 2: selecting the downloading source]
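The selection rule described above can be sketched in Python as follows. The function name and argument layout are hypothetical; the logic follows the text: prefer response peers inside the cluster, and fall back to all response peers only when none of them is a member.

```python
def select_source(candidates, cluster, reputation):
    """Pick the downloading source among the response peers (`candidates`)."""
    if not candidates:
        return None
    in_cluster = [p for p in candidates if p in cluster]
    # Cluster members have priority; otherwise fall back to every response peer.
    pool = in_cluster if in_cluster else list(candidates)
    return max(pool, key=lambda p: reputation[p])


cluster = {1, 2, 3, 4}
reputation = {3: 0.6, 4: 0.7, 5: 0.95, 6: 0.2}
print(select_source([3, 5, 4], cluster, reputation))
print(select_source([5, 6], cluster, reputation))
```

In the first call, peer 5 has the highest reputation but is outside the cluster, so a cluster member is chosen instead; in the second call, no candidate is a member, so the highest-reputation response peer is chosen.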

2.5 Complexity and correctness

Because our proposed trust mechanism repeatedly calculates the maximum network flow, we discuss its computational complexity and correctness in this section. In Algorithms 1 and 2, the complexity depends mainly on the preflow-push algorithm used while identifying normal peers as new members and on selecting the downloading source; preflow-push in turn depends mainly on the push and relabel operations. To count these operations accurately, we first illustrate the process on the simple flow network in Figure 7. Here, f is a preflow function that satisfies ∑_j f_ji − ∑_j f_ij ≥ 0 for all i ∉ {s,t}; e is the flow excess at a vertex, defined as e(i) = ∑_j f_ji − ∑_j f_ij; and h is a labeling height function, which assigns an integer to each vertex v. Initially, h(s) = n (n = |V|) and h(v) = 0 for all other vertices, and h(v) ≤ h(w) + 1 holds for every residual edge (v,w). In Algorithm 1, we use a queue to decide which vertex to relabel or push. The first-in, first-out selection scheme applies the discharge operation until the queue is empty. The discharge operation removes the vertex at the front of the queue, applies push or relabel operations until its height is relabeled or its excess becomes 0, and adds newly active vertices to the rear of the queue. The detailed process of the preflow-push algorithm is shown in Steps 1–6.

Figure 7.

The performing process of preflow-push algorithm.
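The first-in, first-out discharge scheme can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: vertices are numbered 0 to n − 1, residual capacities are kept in a dense matrix, and the example network uses hypothetical capacities (it is not the network of Figure 7).

```python
from collections import deque


def preflow_push(n, edges, s, t):
    """FIFO preflow-push; edges is a list of (u, v, capacity). Returns (value, heights)."""
    res = [[0] * n for _ in range(n)]       # residual capacities
    for u, v, c in edges:
        res[u][v] += c
    height = [0] * n
    excess = [0] * n
    height[s] = n                           # h(s) = n, h(v) = 0 for all other vertices
    queue = deque()
    for v in range(n):                      # saturate every arc leaving the source
        d = res[s][v]
        if d > 0:
            res[s][v] -= d
            res[v][s] += d
            excess[v] += d
            if v != t:
                queue.append(v)
    while queue:
        u = queue.popleft()                 # discharge the vertex at the front
        for v in range(n):
            if excess[u] == 0:
                break
            if res[u][v] > 0 and height[u] == height[v] + 1:
                d = min(excess[u], res[u][v])
                res[u][v] -= d
                res[v][u] += d
                excess[u] -= d
                if v not in (s, t) and excess[v] == 0:
                    queue.append(v)         # v becomes newly active
                excess[v] += d
        if excess[u] > 0:
            # No admissible arc is left: relabel and return u to the rear of the queue.
            height[u] = 1 + min(height[v] for v in range(n) if res[u][v] > 0)
            queue.append(u)
    return excess[t], height


# Hypothetical network: 0 = s, 3 = t.
value, height = preflow_push(4, [(0, 1, 2), (0, 2, 2), (1, 2, 1), (1, 3, 2), (2, 3, 1)], 0, 3)
print(value, height)
```

In this run, one unit of excess at vertex 2 cannot reach the sink; the vertex is relabeled above h(s) and the unit is pushed back to the source, which mirrors the final phase described for Figure 2.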

Then, we will analyze the computation complexity of our proposed algorithms and prove the correctness.

Lemma 1. Source s is reachable from any active (excess) vertex i ∉ {s,t} by a path in the residual network.

Proof. By flow decomposition theory, any preflow can be decomposed into three parts: (a) flow along paths from source s to sink t; (b) flow along directed cycles; and (c) flow along paths from source s to an active vertex. Parts (a) and (b) cannot create excess at vertex i, so there must be a path from source s to active vertex i that carries flow. The residual network contains the reversal of this path; therefore, source s is reachable from active vertex i.

Lemma 2. For all active vertices i ∉ {s,t}, the maximum labeling height h(i) is at most 2n − 1.

Proof. Because i is an active vertex, by Lemma 1 there is a path from i to source s in the residual network. Suppose the path is i = v_0, v_1, …, v_l = s; its length l is at most n − 1. Because h(v_k) ≤ h(v_{k+1}) + 1 and h(s) = n, we have h(i) = h(v_0) ≤ h(v_1) + 1 ≤ h(v_2) + 2 ≤ ⋯ ≤ h(v_l) + l ≤ h(s) + (n − 1) = 2n − 1.

First, we bound the number of relabeling operations using the lemmas above. In the preflow-push algorithm, only the n − 2 vertices in V − {s,t} may be relabeled, and the value of h(v) is initially 0 and grows to at most 2n − 1 by Lemma 2. Therefore, the number of relabeling operations is at most (n − 2)(2n − 1) = 2n² − 5n + 2 ≤ 2n².

Then, we explore the number of push operations. In the preflow-push algorithm, there are two types of push operations: saturating push and nonsaturating push.

For any vertex pair (u,v), we count the saturating pushes from u to v and from v to u together, treating them as the saturating pushes between u and v. If there are such pushes, at least one of (u,v) and (v,u) is an edge in E. Now, suppose that a saturating push from u to v occurs; then h(v) = h(u) − 1. In order to push from u to v again, the algorithm must first push flow from v to u, which cannot happen until h(v) = h(u) + 1. Because h(u) never decreases, the value of h(v) must increase by at least 2. Similarly, h(u) must increase by at least 2 between saturating pushes from v to u. By Lemma 2, no vertex height ever exceeds 2n − 1, which implies that the number of times any vertex has its height increased by 2 is less than n. Because at least one of h(u) and h(v) must increase by 2 between any two saturating pushes between u and v, the number of saturating pushes is at most 2n per edge, and the total number of saturating pushes is at most 2n ⋅ |E|.

To count the nonsaturating pushes, we define a potential function Φ = ∑_{v: e(v) > 0} h(v). Each nonsaturating push causes Φ to decrease by at least 1. There are two ways in which Φ can increase. First, relabeling a vertex increases Φ by less than 2n − 1 because, by Lemma 2, relabeling cannot raise a vertex above its maximum possible height. Second, a saturating push from a vertex u to a vertex v increases Φ by less than 2n because no height changes and only vertex v, whose height is at most 2n − 1, can become active. A nonsaturating push from u to v decreases Φ by at least 1: before the push, u was active and v may or may not have been active; after the push, u is no longer active, and v must be active unless it is the source. Therefore, Φ has decreased by exactly h(u) and increased by either 0 or h(v). Because h(u) − h(v) = 1, Φ has decreased by at least 1. Thus, the total increase in Φ, which is due to relabeling and saturating pushes, is less than (2n − 1)(2n) + (2n − 1)(2n ⋅ |E|) = 4n² ⋅ |E| + 4n² − 2n ⋅ |E| − 2n ≤ 4n² ⋅ (|E| + 1) = O(n² ⋅ |E|), which also bounds the number of nonsaturating pushes.

In addition, we analyze the complexity of selecting the downloading source. In Algorithm 2, selecting the downloading source means finding the peer with the largest personal reputation value R(i) among all the response peers. Suppose that the number of response peers is nc. Let the downloading source peer be d and the response peers be pi, 1 ≤ i ≤ nc. Then, we can find peer d via the following nc − 1 steps: d = max(p1,p2), d = max(d,p3), …, d = max(d,pnc), where max compares personal reputation values. Suppose further that the number of request peers is nr; the total number of computations is then at most nc ⋅ nr. Both nc and nr are less than n; thus, the total number of computations for selecting downloading sources is less than n².

Considering the operations of relabeling, saturating pushes, nonsaturating pushes, and downloading source selection, the overall worst-case time complexity of the algorithm is (2n²) + (2n ⋅ |E|) + (4n² ⋅ |E| + 4n² − 2n ⋅ |E| − 2n) + n² = O(n² ⋅ |E|).
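To make the three counted operations concrete, the following Python sketch implements the generic push-relabel method of Goldberg and Tarjan and counts relabels, saturating pushes, and nonsaturating pushes. It is an illustration under stated assumptions, not the paper's Algorithm 1: vertices are numbered 0..n − 1, capacities are integers, active vertices are processed in an arbitrary order, and all function and variable names are hypothetical.

```python
from collections import defaultdict


def push_relabel(n, edges, s, t):
    """Generic push-relabel maximum flow; returns (flow_value, operation_counts)."""
    cap = defaultdict(int)               # residual capacities
    adj = [set() for _ in range(n)]
    for u, v, c in edges:
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)                    # reverse residual edge

    height = [0] * n
    excess = [0] * n
    height[s] = n                        # the source starts at height n
    counts = {"relabel": 0, "saturating": 0, "nonsaturating": 0}

    # Initial preflow: saturate every edge leaving the source.
    for v in adj[s]:
        c = cap[(s, v)]
        if c > 0:
            cap[(s, v)] -= c
            cap[(v, s)] += c
            excess[v] += c
            excess[s] -= c

    active = [v for v in range(n) if v not in (s, t) and excess[v] > 0]
    while active:
        u = active[-1]
        pushed = False
        for v in adj[u]:
            # An edge is admissible if it has residual capacity and goes downhill by 1.
            if cap[(u, v)] > 0 and height[u] == height[v] + 1:
                delta = min(excess[u], cap[(u, v)])
                kind = "saturating" if delta == cap[(u, v)] else "nonsaturating"
                counts[kind] += 1
                cap[(u, v)] -= delta
                cap[(v, u)] += delta
                excess[u] -= delta
                excess[v] += delta
                if v not in (s, t) and excess[v] == delta:
                    active.append(v)     # v had no excess before, so it becomes active
                pushed = True
                if excess[u] == 0:
                    break
        if excess[u] == 0:
            active.remove(u)
        elif not pushed:
            # No admissible edge: lift u just above its lowest residual neighbor.
            height[u] = 1 + min(height[v] for v in adj[u] if cap[(u, v)] > 0)
            counts["relabel"] += 1
    return excess[t], counts


if __name__ == "__main__":
    example = [(0, 1, 16), (0, 2, 13), (1, 3, 12), (2, 1, 4), (2, 4, 14),
               (3, 2, 9), (3, 5, 20), (4, 3, 7), (4, 5, 4)]
    print(push_relabel(6, example, 0, 5))
```

On small inputs the returned counters can be compared directly with the bounds 2n², 2n ⋅ |E|, and 4n² ⋅ (|E| + 1) used in the analysis.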

To prove the correctness, we use the classical theory of Ford and Fulkerson [36], in which the augmenting path, a simple path from s to t in the residual network, is used to verify the maximum flow by Theorem 2:

Theorem 2. A flow f is the maximum flow if and only if there is no augmenting path.

Now, we prove the correctness by Theorem 3:

Theorem 3. The algorithm is correct; that is, the preflow f is maximum when the algorithm terminates and all labeling heights are finite.

Proof. When the algorithm terminates and all labeling heights are finite, every vertex i ∉ {s,t} must have no excess because there are no active vertices. So, the preflow f is a flow. Next, we prove by contradiction that there is no augmenting path. Suppose there is an augmenting path s = v0, v1, …, vl = t; then, we have h(vi) ≤ h(vi + 1) + 1, for 0 ≤ i < l. Therefore, we have h(s) = h(v0) ≤ h(v1) + 1 ≤ h(v2) + 2 ≤ ⋯ ≤ h(t) + l. Because h(t) = 0 and l < n, we obtain h(s) < n, which contradicts h(s) = n. Therefore, there is no augmenting path, and by Theorem 2 the flow f is maximum.

3 EXPERIMENTAL RESULTS AND ANALYSIS

3.1 Types of threats

Considering the different transaction behaviors, we classify peers into two main types: normal and malicious. Normal peers provide authentic file uploading services and give authentic trust ratings; malicious peers not only provide unauthentic file uploading services but also give unauthentic trust ratings, and may even slander normal peers or pretend to be normal peers. According to how the malicious peers behave, we divide them into three types of threats:

  1. Individual malicious peers (IMPs): Malicious peers of this type provide unauthentic uploading services and give unauthentic trust ratings to other peers.
  2. Collective malicious peers (CMPs): These malicious peers mount a collusion attack. They provide unauthentic file uploading services, cheat and slander normal peers, and give exaggerated ratings to one another, which poses a serious threat.
  3. Disguised malicious peers (DMPs): In some cases when they are selected as downloading sources, DMPs provide authentic and popular files to attain a high level of trust from other peers; then, they give high local trust ratings to other malicious peers. This type of malicious peer is very “cunning” and has both the IMP and CMP characteristics.

3.2 Experiment configuration

We design a power-law P2P network to simulate our trust mechanism because this type of network is prevalent in real-world P2P networks [37]. In addition, we develop a simulation prototype in C++, with which the three types of threats (IMP, CMP, and DMP) can be simulated to verify the feasibility and efficiency of our trust mechanism. The simulation programs run on a computer with Microsoft Windows XP Professional, a Genuine Intel(R) CPU at 1.61 GHz, and 1 GB of memory. The program creates 300 peers with 3000 files on this computer. Although the transactions among these 300 peers happen on a single computer, the distributed placement and interaction of the real world can also be implemented as presented in Section 2.2. During each simulation, some peers issue service requests for downloading files, and other peers respond to the requests. When a peer issues a request, it is propagated by broadcast through the entire network in the usual Gnutella [38] way.

Gnutella is a decentralized P2P system consisting of hosts connected to each other over TCP/IP. In the Gnutella network, the traffic consists of queries for data, replies to queries, and discovery messages used to find nodes. This type of network allows arbitrary resources to be shared. If a node wishes to join the Gnutella network, it must connect to an existing node of the system. The “host caches” mechanism allows a new node to join the network by connecting to a random node's IP address provided via DNS or on a specific website. The new node opens a TCP connection to the existing node and performs a handshake. A node needs to connect to multiple existing nodes in order to reach the whole Gnutella network. Once it has joined the network, it communicates with neighbor nodes by sending and receiving Gnutella protocol messages and by accepting connections from new nodes. The main protocol messages are as follows: (a) Ping: a request for information about other nodes; (b) Pong: a reply carrying information about a node; (c) Push: a mechanism that allows a firewalled node to share data; (d) Query: a request for a resource; and (e) Query Hit: a response identifying an available resource. Gnutella is a broadcast network, in which Pings and Queries are forwarded to multiple nodes. To reduce resource consumption, nodes cache Pongs and use them to respond to Pings when they can. Pongs and Query Hits are routed back along the path needed to reach the destination. In this respect, the queries are inefficient because of flooding, but the replies are rather efficient. Figure 8 shows a typical query and response in a Gnutella network.

Figure 8.

A typical query and response in Gnutella.

A node that wants to perform a search sends a Query message to all nodes connected directly to it. Each of them then replicates and relays the Query message to its neighbors. A node replies with a Query Hit message when it has content that satisfies the request. This Query Hit contains the IP address and port number where this specific node can be reached for the data transfer. When a node receives a Query Hit message, it knows where to obtain the data. In Gnutella, the data are downloaded out-of-network: instead of consuming the Gnutella network capacity, the two nodes involved in the transfer connect over TCP/IP and transfer the data directly using the standard protocols HTTP/1.0 or HTTP/1.1. The concrete experimental parameters are shown in Table 1.
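The query flooding described above can be sketched in Python as follows. This is a simplified illustration, not the simulator used in the experiments: the adjacency dictionary, the set of peers holding the requested file, and the function name are hypothetical, and the hop limit ttl is an assumption that the text above does not discuss. The seen set stands in for the duplicate suppression that keeps a relayed Query from being processed twice by the same node.

```python
from collections import deque


def flood_query(neighbors, holders, origin, ttl):
    """Flood a Query from origin and return the peers that would send a Query Hit.

    neighbors: dict mapping each peer to the list of peers directly connected to it.
    holders:   set of peers that own content matching the query.
    ttl:       maximum number of hops the Query may travel (assumed parameter).
    """
    seen = {origin}
    frontier = deque([(origin, ttl)])
    hits = []
    while frontier:
        node, hops_left = frontier.popleft()
        if node != origin and node in holders:
            hits.append(node)                 # this peer answers with a Query Hit
        if hops_left == 0:
            continue                          # the Query is not relayed any further
        for nb in neighbors[node]:
            if nb not in seen:                # relay only to peers not yet reached
                seen.add(nb)
                frontier.append((nb, hops_left - 1))
    return hits


if __name__ == "__main__":
    overlay = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"],
               "d": ["b", "e"], "e": ["d"]}
    print(flood_query(overlay, {"d", "e"}, "a", ttl=3))
```

The requester would then pick one of the returned peers as the downloading source and fetch the file from it directly, outside the overlay.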

Table 1. Experiment parameter configuration.
Parameter configuration                            Value
Peers of system                                    300
Files of system                                    3000
Types of files                                     100
Time series each simulation                        100
Request services each time series                  30
Number of original members of our cluster (5%)     15

As shown in Table 1, our system has 300 peers with 3000 files that are divided into 100 types, and each peer has 10 different files. In each simulation, there are 100 time series. In each time series, 30 peers randomly issue 30 file downloading service requests. The original members of our cluster are defined as 5% of the system peers according to their transaction behaviors in the first time series. To obtain accurate values when simulating the EigenRep and PathTrust models, we always run 10 simulations and take the average as the final result.

Therefore, each simulation contains 100 × 30 = 3000 transactions. A transaction corresponds to an edge in our algorithms; thus, there are 3000 edges after 100 time series, and some edges may be linked repeatedly, which fits the node degree distribution of a power-law network well. This is the reason that we select 100 time series and 30 service requests in each time series.

In this paper, we introduce the successful downloading percentage (SDP) to evaluate the performance of a trust mechanism more accurately and reasonably. Generally speaking, the more successful transactions there are, the better the trust mechanism performs. So we define the SDP as follows:

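$$\mathrm{SDP} = \frac{\sum_{i}\sum_{j} succ(i,j)}{\sum_{i}\sum_{j} \left[ succ(i,j) + unsu(i,j) \right]} \times 100\% \tag{5}$$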

Here, succ(i,j) and unsu(i,j) have been defined in Section 2.2. In formula (5), the SDP represents the percentage of satisfactory transactions in the entire system.
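As an illustration of how the SDP can be computed from transaction counts, the following Python sketch assumes that the SDP is the share of successful transactions among all transactions, expressed as a percentage. The dictionaries keyed by peer pairs (i, j) and the function name are hypothetical and only stand in for succ(i,j) and unsu(i,j).

```python
def successful_downloading_percentage(succ, unsu):
    """Return the system-wide SDP in percent.

    succ: dict mapping (i, j) to the number of successful downloads of i from j.
    unsu: dict mapping (i, j) to the number of unsuccessful downloads of i from j.
    Both layouts are assumptions made for this sketch.
    """
    total_succ = sum(succ.values())
    total = total_succ + sum(unsu.values())
    if total == 0:
        return 0.0                    # no transactions yet, so report 0 by convention
    return 100.0 * total_succ / total


if __name__ == "__main__":
    succ = {(1, 2): 3, (2, 3): 1}
    unsu = {(1, 2): 1}
    print(successful_downloading_percentage(succ, unsu))
```

Summing over all pairs before dividing gives one figure for the whole system rather than an average of per-peer ratios, which is how the text characterizes the SDP.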

3.3 Amount of original peers in cluster

In our trust mechanism, selecting a suitable number of original members for the cluster is also a crucial problem. If we identify many peers as original members, some malicious peers may be added into the cluster; if we identify few peers as original members, some normal peers with low local trust ratings may not be added into the cluster. Neither outcome is desirable, so we should select a suitable number of original members.

In order to choose the original members more reasonably, we introduce two widely used measures from statistical classification: recall and precision. Recall is a measure of completeness, whereas precision can be seen as a measure of exactness or fidelity. In a statistical classification task, recall is defined as the number of true positives divided by the total number of elements that actually belong to the positive class (the sum of true positives and false negatives, where false negatives are items not labeled as belonging to the positive class that should have been). The precision for a class is the number of true positives (the number of items correctly labeled as belonging to the positive class) divided by the total number of elements labeled as belonging to the positive class (the sum of true positives and false positives, where false positives are items incorrectly labeled as belonging to the class). On this basis, we define recall and precision in this paper as follows:

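$$\mathrm{recall} = \frac{|cluster\_peers \cap normal\_peers|}{|normal\_peers|} \tag{6}$$
$$\mathrm{precision} = \frac{|cluster\_peers \cap normal\_peers|}{|cluster\_peers|} \tag{7}$$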

Here, cluster_peers represents the peers in the cluster, and normal_peers represents the peers that are actually normal. A recall score of 1.0 means that every normal peer is added into the cluster C (but says nothing about how many malicious peers are also added into the cluster C), whereas a precision score of 1.0 means that every peer identified as belonging to cluster C is indeed normal (but says nothing about whether all the normal peers are added into the cluster C).
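The two measures can be sketched in Python on sets of peer identifiers. This is a minimal illustration that follows the verbal definitions above; the function name and the convention of returning 0.0 for an empty denominator are assumptions.

```python
def recall_precision(cluster_peers, normal_peers):
    """Return (recall, precision) of a cluster with respect to the truly normal peers."""
    cluster_peers = set(cluster_peers)
    normal_peers = set(normal_peers)
    true_positives = len(cluster_peers & normal_peers)    # normal peers inside the cluster
    recall = true_positives / len(normal_peers) if normal_peers else 0.0
    precision = true_positives / len(cluster_peers) if cluster_peers else 0.0
    return recall, precision


if __name__ == "__main__":
    # Peers 1-4 are normal; the cluster holds three of them plus malicious peer 9.
    print(recall_precision({1, 2, 3, 9}, {1, 2, 3, 4}))
```

In this toy case one normal peer is missing from the cluster and one malicious peer has entered it, so both scores fall below 1.0 for different reasons.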

The accuracy of our trust mechanism is closely related to the number of peers originally selected for the cluster. To obtain a reasonable number of original peers, we perform several simulations on recall and precision while changing the number of system peers from 100 to 3000. In Figure 9, the precision curves decrease overall as the number of original peers increases, which means that a larger number of original peers may have a negative effect and cause malicious peers to be added into the cluster. Unlike precision, the recall curves oscillate slightly, but some curves rise noticeably, such as in Figure 9(c, d); thus, a reasonable choice of the number of original peers in the cluster should be a trade-off between recall and precision. The results in Figure 9(a–d, f) show that both recall and precision are high when the percentage of original peers is 5%, and Figure 9(e) shows that the best percentage is 4%, which is very close to 5%. So, we think that the number of originally selected peers in the cluster should scale in proportion to the total number of system peers, and 5% is a rational selection.

Figure 9.

The recall and precision when the number of network peers is (a) 100, (b) 200, (c) 300, (d) 1000, (e) 2000, and (f) 3000, respectively.

3.4 Convergence and cluster size

We run our trust mechanism so that the normal peers form a cluster and provide services for other request peers based on Algorithms 1 and 2. System stability is a crucial feature because it ensures the efficiency of our proposed trust mechanism. To verify the stability, we perform several simulations under the IMP threat in which the percentages of malicious peers are 40%, 50%, 60%, and 70%, respectively, and the results are depicted in Figure 10.

Figure 10.

The number of unsuccessful transactions with the increase of time series.

In Figure 10, the number of unsuccessful transactions increases slowly as the time series advance and finally becomes stable. Meanwhile, the more malicious peers there are, the more slowly the system stabilizes. In contrast to the case in which the percentage of malicious peers is 70%, the number of unsuccessful transactions becomes almost stable and stops increasing at an early stage when the percentages of malicious peers are 40%, 50%, and 60%. As the number of malicious peers increases, unsuccessful transactions are avoided more slowly, but our algorithms still manage to avoid them eventually.

To verify the effectiveness of clustering more clearly, we present another group of simulations in Figure 11. The curves describe how the number of peers in the cluster grows as the time series advance under the IMP, CMP, and DMP threats. In Figure 11(a)–(h), the percentages of malicious peers are 10%, 20%, 30%, 40%, 50%, 60%, 70%, and 80%, respectively. From these percentages, the theoretical number of normal peers in the cluster is 300 * (1 − 10%) = 270, 300 * (1 − 20%) = 240, 300 * (1 − 30%) = 210, 300 * (1 − 40%) = 180, 300 * (1 − 50%) = 150, 300 * (1 − 60%) = 120, 300 * (1 − 70%) = 90, and 300 * (1 − 80%) = 60. From the experimental results, we can conclude that our trust mechanism makes almost all the normal peers cluster together, which indicates its efficiency and stability. In addition, the simulation results show that the convergence trend varies with the number of malicious peers. For instance, the curves flatten very quickly when the percentages of malicious peers are 10%, 20%, 30%, 40%, and 50%. The normal peers gather more slowly when the percentages of malicious peers are 60%, 70%, and 80%, but our algorithms can still make them form a cluster as the time series advance.

Figure 11.

The number of peers in cluster with the increase of time series when the percentages of malicious peers are (a) 10%, (b) 20%, (c) 30%, (d) 40%, (e) 50%, (f) 60%, (g) 70% and (h) 80%. IMP, individual malicious peer; CMP, collective malicious peer; DMP, disguised malicious peer.

3.5 IMP simulation and analysis

These IMPs cheat independently. They provide unauthentic file uploading services for request peers and give unauthentic local trust ratings to response peers. According to these characteristics, we run simulations of our trust mechanism together with the EigenRep, PathTrust, and Random models. In the Random model, request peers randomly select other peers as the downloading sources without considering any trust or reputation. The corresponding experimental results are depicted in Figure 12.

Figure 12.

The successful downloading percentage with different percentages of individual malicious peers.

According to the results, our trust mechanism performs better than the other trust models. Because we let the normal peers form a cluster and give them priority in providing uploading services for request peers, the SDP remains at a high level even when the percentage of malicious peers is 80%. EigenRep performs well at first because the predefined pre-trusted peers leave little blindness, but its SDP declines markedly as the number of malicious peers increases because of the trust transitivity among the many malicious peers. The performance of PathTrust is also poor because the model considers only the local trust values between the initiator and the candidates as the criterion for selecting the downloading source, without aggregating the complete local trust ratings provided by other peers to the candidates, so the trust mechanism may be one-sided. The Random model performs worst, and its SDP changes almost linearly.

3.6 CMP simulation and analysis

Malicious peers of this type not only have the IMP characteristics but also collaborate with each other. They always exaggerate their local trust ratings of one another, which can pose a serious threat to the entire system. According to these characteristics, we run the experiments, and the results are shown in Figure 13.

Figure 13.

The successful downloading percentage with different percentages of collective malicious peers.

Faced with these united attacks, our trust mechanism still performs better than the other trust models. The performance of EigenRep is poor because it cannot identify and isolate the CMPs effectively. As the number of malicious peers increases, some of them obtain high reputation; in return, they give high local trust ratings to other malicious peers via the trust transitivity chain. Finally, most of the malicious peers have relatively high reputation as a result of continuously exaggerated ratings. In contrast, our trust mechanism makes the network form a cluster that identifies the normal peers. Then, we make the response peers in the cluster provide uploading services for request peers, which confines the transaction behaviors of malicious peers even when they have high reputation. Nevertheless, these trust models perform poorly when the number of malicious peers is large enough. PathTrust calculates only the trust values between the initiator and the candidates without aggregating all the local trust ratings; thus, it can confine the exaggerated behaviors among the CMPs, and it performs better than EigenRep. The behavior of the Random model is again unchanged. According to the results in Figure 13, we can conclude that this type of attack poses a serious threat owing to the united behaviors.

3.7 DMP simulation and analysis

In fact, this type includes IMPs and DMPs. The DMPs are very cunning because, in some cases when they are selected as downloading sources, they provide popular and authentic files for normal peers to achieve high reputation values and then give high local trust ratings to other IMPs. In our simulation, the disguised peers are defined as 50% of the total malicious peers (TMPs), and the experimental results are shown in Figure 14.

Figure 14.

The successful downloading percentage with different percentages of total malicious peers.

The performance of EigenRep is good at first because of the pre-trusted peers, which can resist the temptation of the disguised peers. Because the pre-trusted peers have high, predefined reputation, the request peers prefer to select these pre-trusted peers as downloading sources. However, as the number of malicious peers increases, the SDP declines markedly. The reason is that the disguised peers can gain positive ratings from request peers by providing authentic files and in turn promote the IMPs. Our trust mechanism always makes the request peers select downloading sources from the cluster, which resists the temptation of disguised peers and confines the transaction behaviors of malicious peers effectively. PathTrust selects only the maximum-weight paths as the trust values between the initiator and the candidates, which more or less avoids the trust transitivity between disguised peers and those candidates, so its experimental results are a little better than those of EigenRep. The behavior of the Random model is unchanged as well. In the DMP type, the disguised peers can provide authentic files; thus, the threat is weaker than that of the CMP type but still stronger than that of the IMP type.

3.8 Recall and precision of IMP, CMP, and DMP

In order to evaluate the performance of our trust mechanism more comprehensively, we list the experimental data of IMP, CMP, and DMP in Table 2, in which TMP denotes the type of malicious peers, PMP represents the percentage of malicious peers, TNP is the total number of peers in the cluster, and NMP is the number of malicious peers in the cluster. Then, we calculate the recall and precision via formulas (6) and (7), and the results are depicted in Figures 15 and 16.

Table 2. Number of normal and malicious peers in cluster.
TMP          IMP            CMP            DMP
PMP (%)      TNP    NMP     TNP    NMP     TNP    NMP
10           268    0       271    2       270    0
20           239    0       242    2       242    2
30           209    0       213    4       214    4
40           174    0       179    4       180    4
50           150    2       156    6       157    7
60           118    4       127    8       126    7
70           96     10      89     8       79     10
80           63     10      63     13      67     12

Figure 15.

The recall with different percentages of malicious peers. IMP, individual malicious peer; CMP, collective malicious peer; DMP, disguised malicious peer.

Figure 16.

The precision with different percentages of malicious peers. IMP, individual malicious peer; CMP, collective malicious peer; DMP, disguised malicious peer.

Usually, there is an inverse relationship between recall and precision, and it is possible to increase one at the cost of reducing the other. For instance, our trust mechanism can increase the recall by adding more peers into the cluster, at the cost of increasing the number of malicious peers in it (decreasing precision). Similarly, it can admit only a few, reliably normal peers to achieve high precision by raising the trust level required between a candidate and the peers in the cluster when identifying new members after removing the minimum cut, at the cost of decreasing recall. However, in Figures 15 and 16, both recall and precision remain at a high level as the number of malicious peers increases, which shows the rationality and correctness of the flow capacity and the number of original members designed in our trust mechanism.
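As a worked example of this calculation, the following Python sketch derives recall and precision from the TNP and NMP counts as read from Table 2. It rests on two assumptions that should be checked against formulas (6) and (7): the number of normal peers in the cluster is TNP − NMP, and the number of normal peers in the system is 300 ⋅ (1 − PMP). The function and variable names are hypothetical.

```python
TOTAL_PEERS = 300  # system size from Table 1

# (TNP, NMP) for each percentage of malicious peers (PMP), as read from Table 2.
TABLE_2 = {
    "IMP": {10: (268, 0), 20: (239, 0), 30: (209, 0), 40: (174, 0),
            50: (150, 2), 60: (118, 4), 70: (96, 10), 80: (63, 10)},
    "CMP": {10: (271, 2), 20: (242, 2), 30: (213, 4), 40: (179, 4),
            50: (156, 6), 60: (127, 8), 70: (89, 8), 80: (63, 13)},
    "DMP": {10: (270, 0), 20: (242, 2), 30: (214, 4), 40: (180, 4),
            50: (157, 7), 60: (126, 7), 70: (79, 10), 80: (67, 12)},
}


def recall_precision_from_counts(pmp, tnp, nmp, total=TOTAL_PEERS):
    """Return (recall, precision) from one Table 2 entry."""
    normal_total = total * (100 - pmp) // 100    # assumed number of normal peers
    normal_in_cluster = tnp - nmp                # cluster members that are normal
    return normal_in_cluster / normal_total, normal_in_cluster / tnp


if __name__ == "__main__":
    for threat, rows in TABLE_2.items():
        for pmp, (tnp, nmp) in sorted(rows.items()):
            recall, precision = recall_precision_from_counts(pmp, tnp, nmp)
            print(f"{threat} PMP={pmp}%: recall={recall:.3f}, precision={precision:.3f}")
```

Under these assumptions, no entry yields a recall above 1, which is consistent with the counts in Table 2.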

4 CONCLUSION

We propose a new trust mechanism that makes the normal peers form a cluster and provide file uploading services for other request peers. Our trust mechanism has the following properties: (1) to isolate the malicious peers and confine their transaction behaviors, we make the normal peers form a cluster by performing the maximum flow algorithm repeatedly; (2) to ensure that the service providers are normal peers, we consider not only whether they belong to the cluster but also their personal reputation values R(i); (3) we define two further criteria (recall and precision) to estimate the performance of our trust mechanism. The detailed experimental results also show that our trust mechanism achieves better performance against the threats of exaggeration, cheat, collusion, and disguise.

ACKNOWLEDGEMENTS

This paper is supported by Nature Science Foundation of China under grant no. 60673046 and no. 90715037; University Doctor Subject Fund of Education ministry of China under grant no. 200801410028; National 973 Plan of China under grant no. 2007CB7142057; National Nature Science Foundation of China No.: 61272173; Graduate Creative Talents Project of DUT; Natural Science Foundation of China under grant no. 61103233; NSFC-JST under grant no.51021140004; and the Fundamental Research Funds for the Central Universities under grant no. DUT12JR08.
