## 1. Introduction

Our aim in this paper is to study the mechanisms behind the growth of large networks. The models that we consider are random-graph processes where nodes and links are added to the graph, but without removal of either nodes or links.

In recent years, empirical data on large-scale networks have become available and have provided valuable insights into the structure of real-world connected systems. In many complex networks it is found that the degrees of the nodes follow a power law distribution. In an early reference in this field, de Solla Price (1976) found that the number of citations of scientific articles follows a power law, and he showed that this power law can be explained if one assumes that the probability that a new publication cites a given article is proportional to the number of citations that the article has already received. This was taken up by Barabási and Albert (1999), who proposed a similar model for the World Wide Web, and who coined the name ‘preferential attachment’ for this mechanism. Today, most existing models that are aimed at reproducing power law graphs incorporate a preferential attachment mechanism of some kind; numerous examples can be found in Dorogovtsev and Mendes (2003). However, already in 1925, Yule had published a closely related stochastic branching model. The Yule process is general and has been adapted to networks; see Krapivsky *et al.* (2000), Newman (2005) and Dorogovtsev *et al.* (2000).

Jeong *et al.* (2003) proposed a method to identify a preferential attachment process. The idea is to estimate, for every *k*, the intensity Π(*k*) with which nodes of degree *k* acquire new links during a small time interval Δ*t*. The preferential attachment hypothesis predicts that the rate Π(*k*) is a monotonically increasing function of *k*, and Jeong *et al.* (2003) found that this is indeed so for several real-world networks. In other words, there is a correlation between a node's attractiveness and its popularity, where we interpret the *attractiveness* of a node as its rate of acquiring new links, and the *popularity* of a node as the number of links that it has already acquired.
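The two-snapshot idea can be illustrated with a minimal sketch. It is not the exact estimator of Jeong *et al.* (2003); it simply groups nodes by their degree at the first observation time and averages the number of links gained per unit time. The function name `estimate_attachment_rate`, the dictionary-based input format and the toy degrees are assumptions made for illustration.

```python
from collections import defaultdict


def estimate_attachment_rate(deg_t1, deg_t2, dt):
    """Crude estimate of Pi(k): mean number of new links per unit time
    acquired by nodes that had degree k at the first observation time.

    deg_t1, deg_t2: dicts mapping node id -> degree at the two snapshots.
    dt: length of the time interval between the snapshots.
    """
    gained = defaultdict(int)  # total links gained by nodes of degree k
    count = defaultdict(int)   # number of nodes of degree k at the first snapshot
    for node, k in deg_t1.items():
        if node not in deg_t2:
            # Should not occur in a model without node removal; skip defensively.
            continue
        gained[k] += deg_t2[node] - k
        count[k] += 1
    return {k: gained[k] / (count[k] * dt) for k in sorted(count)}


# Toy data (hypothetical): node "f" is born after the first snapshot and is ignored.
deg_t1 = {"a": 1, "b": 1, "c": 3, "d": 3, "e": 5}
deg_t2 = {"a": 1, "b": 2, "c": 4, "d": 5, "e": 8, "f": 1}
rates = estimate_attachment_rate(deg_t1, deg_t2, dt=1.0)
print(rates)  # {1: 0.5, 3: 1.5, 5: 3.0}
```

In this toy example the estimated rate increases with *k*, which is the pattern that the preferential attachment hypothesis predicts.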

However, the preferential attachment hypothesis actually makes a stronger prediction, namely that a node is attractive *because* it is popular, i.e. a node with high degree attracts new links at a higher rate than other nodes *because* it has high degree. Despite the popularity of the preferential attachment hypothesis in explaining the structure of complex networks, it seems natural also to investigate the opposite relationship: a node may have some intrinsic quality causing it to acquire links at a higher rate than other nodes. Such nodes will in general end up with higher degree than less attractive nodes; thus, in this setting we shall also find an increasing Π(*k*), although the growth mechanism is different from preferential attachment.
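To see why heterogeneity alone can produce an increasing Π(*k*), consider a simplified setting, which is an assumption made for illustration and not the model that is analysed in this paper. A node acquires links according to a homogeneous Poisson process with rate *Z*, where *Z* is gamma distributed with hypothetical shape α and rate β, and *K*(*t*) denotes the number of links that the node has acquired by age *t*. By the standard gamma–Poisson conjugacy,

```latex
\begin{align*}
P\{K(t)=k \mid Z=z\} &= \frac{(zt)^{k} e^{-zt}}{k!},
& f_Z(z) &= \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, z^{\alpha-1} e^{-\beta z}, \\
f(z \mid K(t)=k) &\propto z^{\alpha+k-1} e^{-(\beta+t)z}, \\
E[Z \mid K(t)=k] &= \frac{\alpha+k}{\beta+t}.
\end{align*}
```

Under these assumptions the expected link acquisition rate of a node with *k* links increases linearly in *k*, although the rate of no individual node depends on its degree.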

In some networks a frailty mechanism, i.e. intrinsic heterogeneity between the nodes, provides a seemingly reasonable explanation of the connectivity distribution. For example, in a sexual network the degree of a node (i.e. the number of previous sexual partners) is not displayed and cannot be used directly as a selection criterion. Likewise, research papers with a high citation count are likely to have some merit, and are, one would hope, cited again because of their quality, and not merely because of their previous popularity. However, papers with many citations are more likely to be found in a literature search by a potential citer and are also for this reason more likely to be cited. It seems reasonable that in many real networks there is a mixed mechanism, such that a node's attractiveness increases with its popularity, but such that there still is heterogeneity between the nodes, which is not directly related to their degrees.

In this paper we shall study a random-graph model that evolves by a mechanism that mimics this behaviour, and we shall investigate statistical methods by which we can distinguish such a process from a pure preferential attachment process. We call the proposed random-graph model a *frailty graph*. At birth every node *v* is assigned a *frailty* *Z*_{v}, which is drawn from the probability distribution of a generic frailty variable *Z*. Frailty is a term that is used in event history analysis to describe unobserved heterogeneity in data. The random graph then grows in such a way that each existing node acquires new links at a rate that is proportional to its assigned frailty variable. We shall assume that we cannot observe the frailty *Z*_{v} for a given node *v*; however, it is possible to infer properties of the distribution of *Z* from the behaviour of the random-graph process.

We can consider a preferential attachment process and the frailty graph process as two extremes: in the preferential attachment process nodes are equal at the outset, and their rate of acquiring new links depends solely on their degree. Nodes which enter the graph at a later time, or nodes that are unlucky in the beginning and acquire few links, will have little chance of overtaking nodes with an early success. In the frailty graph process, however, there is an inherent unevenness between the nodes from the beginning and, even if a node with unfavourable frailty is lucky and receives several links in one time period, this will not affect its ability to attract new links in another time period. Furthermore, entering the graph at a later time is not an obstacle to becoming a successful node, as long as the node has a favourable frailty variable.

The method to identify preferential attachment, which was proposed by Jeong *et al.* (2003), considers the degree distribution of the network at two relatively close observation times and estimates the link acquiring rate of a node as a function of its degree. As explained above, this method does not distinguish a network evolving by the preferential attachment mechanism from a frailty network. As an alternative we propose methods which make use of three observation times, *t*_{1}, *t*_{2} and *t*_{3}, or two observation times combined with information on age.
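The following sketch illustrates why a two-snapshot comparison cannot separate the two mechanisms. It uses a deliberately simplified setting that is an assumption of the sketch and not the growing-graph models of this paper: the node set is fixed, links are handed out one at a time, the frailties are exponentially distributed, and the preferential attachment weight is taken to be degree plus 1. All function names, parameter values and degree thresholds are hypothetical.

```python
import random
from collections import Counter


def frailty_links(z, n_links, rng):
    """Hand out n_links links; node v is chosen with probability
    proportional to its fixed frailty z[v], regardless of its degree."""
    picks = rng.choices(range(len(z)), weights=z, k=n_links)
    return Counter(picks)


def preferential_links(degree, n_links, rng):
    """Hand out n_links links one by one; node v is chosen with probability
    proportional to degree[v] + 1. The list `degree` is updated in place."""
    # Each node appears (degree + 1) times in the urn.
    urn = [v for v, k in enumerate(degree) for _ in range(k + 1)]
    gained = Counter()
    for _ in range(n_links):
        v = rng.choice(urn)
        urn.append(v)
        degree[v] += 1
        gained[v] += 1
    return gained


def mean_gain_by_group(deg_t1, gained, low_max=2, high_min=8):
    """Mean number of links gained after t1 by low- and high-degree nodes."""
    low = [gained[v] for v, k in enumerate(deg_t1) if k <= low_max]
    high = [gained[v] for v, k in enumerate(deg_t1) if k >= high_min]
    return sum(low) / len(low), sum(high) / len(high)


rng = random.Random(1)
n_nodes, n_links = 2000, 10000

# Frailty process: rates are fixed at birth and never depend on degree.
z = [rng.expovariate(1.0) for _ in range(n_nodes)]
first = frailty_links(z, n_links, rng)
deg_t1_frailty = [first[v] for v in range(n_nodes)]
second = frailty_links(z, n_links, rng)
frailty_low, frailty_high = mean_gain_by_group(deg_t1_frailty, second)

# Preferential attachment process: all nodes start equal.
degree = [0] * n_nodes
preferential_links(degree, n_links, rng)
deg_t1_pa = list(degree)  # snapshot at the first observation time
gained_pa = preferential_links(degree, n_links, rng)
pa_low, pa_high = mean_gain_by_group(deg_t1_pa, gained_pa)

print("frailty:      low-degree gain %.2f, high-degree gain %.2f" % (frailty_low, frailty_high))
print("preferential: low-degree gain %.2f, high-degree gain %.2f" % (pa_low, pa_high))
```

In both runs the nodes with high degree at the first observation time gain more links afterwards than the nodes with low degree, so an increasing two-snapshot estimate of Π(*k*) does not by itself identify the mechanism.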

A neat property of the preferential attachment mechanism is that it automatically leads to graphs where the degrees follow a power law distribution, provided that the rate at which a node acquires new links grows linearly with its degree. A power law distribution has been observed in many real-world complex networks; for this reason the preferential attachment hypothesis is an appealing explanation. In the frailty model we obtain a power law distribution only if the frailty variable itself is power law distributed; thus, a power law may be more difficult to rationalize if we assume that a frailty process is the underlying mechanism of the network. However, power laws are observed in many situations in the real world, e.g. in social interactions as pointed out by Zipf (1949), and may be caused by several mechanisms (Newman, 2005), so it is not so far-fetched to assume that such a frailty variable may have a power law distribution. In this paper we derive a distribution for the frailty variable which ensures that the graph process asymptotically has the same degree distribution as the Yule process and thus follows a power law.

We shall also consider two real world networks, namely citation data for a certain set of mathematical research papers, and the sexual network of a subset of the Norwegian population. Much work has been done on the study of sexual networks, since this is relevant with regard to the spread of sexually transmitted infections. In a study of homosexual men attending a sexually transmitted infections clinic, Colgate *et al.* (1989) observed that the sexual contacts follow a power law degree distribution. This finding was supported by Liljeros *et al.* (2001), on the basis of analyses of data from a population-based sexual survey in Sweden, and later Liljeros *et al.* (2003) suggested that preferential attachment could be of relevance for sexual network growth. However, these issues have been the subject of some controversy, and existing data samples are too small to verify scaling behaviour for more than 1–2 decades. In an attempt to find growth models for sexual networks, Jones and Handcock (2003) and Handcock and Jones (2004) noted that heterogeneity between the nodes for forming new links is the mechanism that is best supported by data, and they concluded that

‘a unitary behavioural process, such as preferential attachment, is unlikely to underlie empirical sexual network degree distributions’.

Moreover, they found that, although sexual networks in some portions of some societies are well fitted by a power law, a better fit is generally obtained by a negative binomial distribution and variants thereof. Similarly, Hamilton *et al.* (2008) found that, when compared with alternative models, the power law hypothesis fails to have consistent support. In a different setting, Stephen and Toubia (2009) compared preferential attachment with other possible mechanisms to find an explanation for the process of link addition in a certain social commerce network. They found that the mechanism which fits the data best is a mechanism that is based on vertex attributes, akin to a frailty effect.

Real networks are governed by complex social, behavioural and evolutionary dynamics, and the frailty and preferential attachment graph models that are considered here are idealizations that involve a severe reduction of that complexity. Empirical networks probably evolve by a combination of the two mechanisms, in addition to time-inhomogeneous growth. In the first part of the paper we focus on the two ‘pure’ graph models and investigate the theoretical differences between them; then we shall consider some data examples and discuss how they relate to the two models.