Complex Networks and the Web: Insights From Nonlinear Physics

Authors

  • Andrea Scharnhorst

    1. Senior researcher in the Networked Research and Digital Information (Nerdi) group at the Royal Netherlands Academy of Arts and Sciences. She obtained her Diploma in theoretical physics and her PhD in philosophy at the Humboldt University Berlin. She was employed at the Academy of Sciences of the GDR and later at the Wissenschaftszentrum Berlin für Sozialforschung. Her research interests cover models of self-organization and evolution of complex systems and their application to social sciences, bibliometric analysis and evaluation, and Web based science, technology and innovation indicators. Currently she is coordinating the EU-funded WISER project.

Abstract

The Internet and the Web can be described as huge networks of connected computers, connected Web pages or connected users. The formal structure of these networks is expected to represent patterns of communication and organization, and to influence the nature of communication in these networks. A number of approaches have been developed to study these phenomena. This paper reviews the emergence from theoretical physics of a new specialty in complexity theory which analyses the Internet and the Web as complex networks. Concepts and findings from this area of physics are reviewed and made accessible to a non-physics audience. In complexity theory, the concept of connectivity is expressed by mathematical laws, addressing the distribution of links over nodes, the emergence of hierarchies or the behavior of “inhabitants” of such networks. The paper begins with an introduction to the topological classification of complex networks as “small world networks” and “scale-free networks.” It discusses how specific topologies or connectivity patterns are based on the construction and growth of such networks. Major findings about the Internet and the Web are discussed. The paper also explores the possibilities of linking statistical empirical analysis and mathematical modeling to qualitative research as a way of gaining insight into the emergence of complex networks.

Introduction

Networks surround us; we are built from networks and form part of networks. In general, a network is understood to be a set of nodes related to each other by certain links. Beyond this abstract structural description, however, the appearance of networks is amazingly diverse. One can look at networks of railways and electricity lines connecting power stations, food webs and networks of neuronal cells, semantic webs and networks of trade relationships [1].

The interest of science in networks has a long tradition. More recent examples are the foundation of random graph theory in mathematics (Erdös, Rényi) starting in the late 1950s (Bollobás, 1985) and the emergence of social network analysis as a field in sociology in the 1960s and 1970s (Scott, 2001). Graph theory has long been described as part of system theory (Laue, 1970). The last decade witnessed an additional increase in interest in describing and analyzing networks. Several books have appeared about the networked character of nature and society (Barabási, 2002; Bornholdt & Schuster, 2003; Buchanan, 2002; Dorogovtsev & Mendes, 2003; Huberman, 2001; Watts, 1999). Interestingly, many of these books are written or edited by physicists.

Once again, physics seems to be taking the initiative in discovering general laws in quite different phenomena. Like the theories of self-organization in the 1970s and 1980s and the complexity theories in the 1990s, network theories are used to establish a bridge in the explanations of different phenomena in society and nature. One can see this development as part of a more general approach of complexity theory [2]. Physics contributes to the interdisciplinary field of complexity research. As I will discuss in this article, complex networks theory is an emerging specialty inside statistical and non-linear physics. Its aim, however, is to describe phenomena far outside physical processes. In talking about a bridge in the explanation of nature and society, one would expect to observe a diffusion of methods between social and natural sciences. The mutual reception of methods and concepts around complex social networks between physics and sociology is still in its early stages. Although social networks have been studied with methods from complex network theory (Barabási, Jeong, Néda, Ravasz, Schubert & Vicsek, 2002; Newman, 2001a, 2001b; Redner, 1998; Watts, 1999), few papers discuss both the concepts and methods of social network analysis and those of (physical) complex network theory (Watts, 1999; Watts, Dodds & Newman, 2002). I find the framework of models and concepts developed so far inside physics impressive enough to be worthy of the attention of scientists outside of physics. One aim of this paper is therefore to expose the findings of the physics community in the area of complex networks to a wider audience.

This paper was initially motivated by the sudden increase in the number of papers in physics journals dealing with the Internet and the Web. This raises the question of what makes the Web so interesting to a theoretical physicist. In fact, this development turned out to be an indication of the emergence of a new specialty inside statistical physics that can best be described as complex networks theory. It will be seen that the concept of connectivity is specifically shaped in this literature. By presenting some details of theories, methods and empirical studies in complex networks theory, it will be possible to explore intersections between quantitative and qualitative research.

As previously mentioned, the Web plays a prominent role in this paper, since it served as a leading example for network studies. The development of the Internet and the Web triggered the increase in the number of papers about complex networks, which is examined via a bibliometric analysis.

The main part of the paper is devoted to a review of results from the emerging field of complex networks theory. It contains sections about the topology and statistics of networks, growth and the evolution of networks, and dynamic processes on networks. The paper uses examples of the application of this knowledge to the Web and the Internet.

A Bibliometric Exercise to Begin: A Sample of “Complex Networks” Papers

Two years ago, Bernardo Huberman published a book called The Laws of the Web (Huberman, 2001), summarizing the work of a group previously called “Internet ecologies.” It analyses the Web from the point of view of statistical physics. At present, several groups are active in the field of complex networks, investigating not only the Web but several other networks. Hundreds of articles have appeared in this area. To make this development visible in a bibliometric study, the Web of Science database [3] (http://www.isinet.com/isi/products/citation/wos/) was examined. To get a representative sample of relevant articles, I decided to search for a certain string (topic search). This covered title, keywords and abstracts of documents [4] in the database. I started with different combinations including the term “network” in the period from 1981 to 2002 in all databases. The string “network*” alone delivered over 100,000 hits. The combination “dynamic networks” produced about 100 hits, mostly in physics journals. Finally, “complex networks” gave 248 hits (end of September 2002). I decided to rely on the phrase “complex networks” as one that produced a broad, yet manageable sample for my purposes. It should be noted here that I am quite aware of the limitations of a keyword search. Several articles relevant in the field do not contain this phrase in their titles, keywords or abstracts, and have been identified as relevant in parallel by a literature search [5]. Furthermore, terms describing the same phenomena may well differ from field to field. The sample of 248 documents does not represent a full coverage of articles devoted to the phenomenon of “complex networks.” It does, however, cover a broad selection of highly relevant articles in many areas, not only in physics. It can be seen as the tip of an iceberg.

The documents contained in the sample have been classified manually into categories representing areas of scientific research. I used a classification from the Science and Engineering Indicators [6] that considers life sciences (clinical medicine, biomedical research, and biology), physical and environmental sciences (chemistry, physics, and earth and space sciences), mathematics, engineering and technology, and social and behavioral sciences (social sciences, psychology, health, and professional fields). The documents have been classified by considering both their topic (title and abstract) and the journal in which they appeared. I chose not to differentiate within the life sciences category. For example, an article about “Complex networks of interactions connect genes to phenotypes” in the journal Trends in Biochemical Sciences (Cornish-Bowden & Cardenas, 2001), as well as an article about “Species positions and extinction dynamics in simple food webs” in the Journal of Theoretical Biology (Jordan, Scheuring & Vida, 2002), belong to the category of life sciences. As I was interested in the specific approach developed within physics for complex networks, I differentiated in the next category of physical and environmental sciences between physics on the one hand, and chemistry and space and earth sciences on the other. The last two fields contain a very small number of articles, and are combined in the graphical presentation. Many of the articles classified into physics carry “complex networks” in their titles (for example, “Exploring complex networks” (Strogatz, 2001) or “Statistical mechanics of complex networks” (Albert & Barabási, 2002)). Other titles seem not to refer to physics at all, like “Topology of the conceptual network of language,” but do appear in mainstream physics journals such as Physical Review E (Motter, de Moura, Lai, & Dasgupta, 2002). Nine articles from a total of thirty physics articles are published in Physical Review E, a journal devoted to statistical, non-linear and soft matter physics [7]. The other articles are distributed over Nature, Physical Review Letters, Physica A, and other physics journals.

Mathematics, engineering and technology have been merged into one field. This part of the sample contains articles like “Expansion method for the throughput analysis of open finite manufacturing/queueing networks with N-policy” in the journal Computers and Operations Research (Kavusturucu & Gupta, 1999). Although operations research is an interdisciplinary field addressing topics in ecology, transportation, urban planning as well as economics, it is mostly based on specific mathematical methods and the use of computers [8]. This last example is representative of a specific, mathematically abstract approach to complex networks. Social and behavioral sciences form the last category. One example of this category is an article about the evolution of communication systems. Here, the author used the phrase “complex networks” metaphorically to describe that “self-organization can be distinguished in terms of developmental stages of increasingly complex networks” (Leydesdorff, 1994).

Figure 1 shows the distribution of documents over fields. Due to the small number of documents in the period from 1981 to 1990, only the period after 1990 is presented in Figure 1 [9].

Figure 1.

The growth of documents with the phrase “complex networks” in title, keywords or abstract in different fields over time (Source: Web of Science, September 2002).

Although a burst of articles noticed in physics led me to look into the phenomenon of “complex networks,” physics articles are almost absent in the early years. In life sciences, the notion of “complex networks” was used earlier, and quite extensively. Graph theory in mathematics, the analysis of power networks, road networks and telecommunication networks, and the analysis of traffic in computer networks were other contexts in which the term was used frequently over the years. Only a few articles with “complex networks” appear from time to time in chemistry or earth and space sciences. In social and behavioral sciences the term appears in the range of one to five documents.

As can be seen in Figure 1, the number of physics articles using the notion “complex networks” has increased tremendously since 2000. As we will note later, most of these papers are related to a trend in theoretical physics, more particularly in statistical and non-linear physics, which has established a specialty. This article concentrates on that specialty.

Besides the different contexts in which networks are studied, they can also be described at different levels, focusing either on parts of networks, entire networks, or the function of networks in larger processes. The following topics form the core of the literature about “complex networks”: network structure, network statistics, network evolution and network dynamics. In the following section I introduce some of the aspects of these topics discussed in the physics literature.

General Laws of Complex Networks: The Emergence of the Specialty “Complex Networks Theory”

Real World Networks, Complex Networks Theory and the Role of the Web

In physics, “complex networks” is not only a frequently used notion; it is also seen as a field where “dramatic advances … have been witnessed in the past few years” (Wang, 2002). Within this field, the structural and dynamic properties of networks are analyzed on a general level, independent of the concrete nature of the elements (nodes) and the links (edges). Although mathematical description seems to dominate the work, physics articles differ from pure graph-theoretical analysis in mathematics. An important aspect of the validity of the “laws” found is always their confirmation in real network structures. The notion of “real world” networks has been introduced to describe this feature (Strogatz, 2001). The fascination for physicists comes from the realization that one and the same mathematical law, and supposedly also the same underlying mechanism, can be used to describe quite different real networks.

The development of the field of network analysis, or “complex networks” in statistical physics, was mainly inspired by the Internet and the World Wide Web. Without the availability of digitized data, none of the examples actually used in complex networks theory could have been exploited. Availability of data in digitized form facilitates computation, in particular for huge networks. To give an example, topological properties of “metabolic networks of 43 different organisms based on data deposited in the WIT database” [10] have been analyzed (Jeong, Tombor, Albert, Oltvai, & Barabási, 2000, p.651). Furthermore, the Internet and the Web served as objects of investigation. In a recent review, Dorogovtsev and Mendes used the Internet and the World Wide Web as THE examples of complex networks (Dorogovtsev & Mendes, 2002). They write about the origin of the rapid growth in this field: “The first experimental data, mostly for the simplest structural characteristics of the communication networks, were obtained in 1997-1999. … After these findings, physicists started intensive study of networks in various areas, from communications to biology and public relations” (p.2). The literature Dorogovtsev and Mendes refer to stresses the Web or the Internet as a field of study in the titles [11]. All of them have the Web as the main locus of investigation. Other authors report the same conclusion: “With the recent appearance of the Internet and the world-wide-Web, understanding the properties of growing networks with popularity-based construction rules has become an active and fruitful research area” [12] (Krapivsky & Redner, 2002).

A second reason for the explosion in the number of complex networks articles is related to the increased use of information and communication technologies (ICTs). In another review Wang writes: “In the past few years, the computerization of data acquisition, and the availability of high computing power have led to the emergence of large databases on complex topology of various real networks… The availability of these huge amount of real data has in turn stimulated great interest in trying to uncover the generic properties of complex networks” (Wang, 2002).

The examples given in physics cover a wide range of connotations and quite different disciplines. Table 1 [13] contains examples of specific networks that have been investigated from the perspective of complex network theory. Bearing in mind the diversity of substantive contexts in which “complex networks” appear, one can understand the enthusiasm inside physics about the universality in mathematical terms of the laws found by applying the concepts of theoretical physics.

Table 1.  Examples of real-world networks analyzed from the perspective of complex network theory.
| Network/reference | Nodes | Links | Features |
|---|---|---|---|
| Electronic circuits (Ferrer, Janssen, & Sole, 2001) | Electronic components (resistors, diodes, capacitors) | Wires | Small world, scale-free |
| Southern California Power Grid (Amaral, Scala, Barthelemy, & Stanley, 2000) | Transformers, substations, generators | High-voltage transmission lines | Small world, not scale-free |
| Network of World Airports (Amaral et al., 2000) | Airports | Non-stop connections [14] | Single scale network |
| Metabolic networks in 43 organisms like bacteria, archaea, and eukaryotes (Jeong, Tombor, Albert, Oltvai, & Barabási, 2000) | Substrates, enzymes, intermediate complexes | Biochemical reactions | Scale-free |
| Conceptual network of language (Motter, de Moura, Lai, & Dasgupta, 2002) | Words | Appearance in the same entry of a thesaurus of English language | Small world, scale-free |
| World Wide Web (Adamic, 1999; Albert, Jeong, & Barabási, 1999) | Sites (Pages) | Hyperlinks | Small world, scale-free |
| Internet (Vázquez, Pastor-Satorras, & Vespignani, 2002) | Routers [15] | Wires | Scale-free |
| Co-authorship network (Barabási et al., 2002) | Authors of scientific articles | Co-authorship | Scale-free |

Topology and Statistics of Networks: Static Descriptions

Small world networks and “six degrees of separation”

The concept of “small world networks” was introduced by Watts and Strogatz to characterize networks in which any two nodes are linked to each other through only a few intermediate nodes (Watts, 1999; Watts & Strogatz, 1998). Interestingly, the name comes from an experiment Stanley Milgram conducted in the mid-sixties (Milgram, 1967). The experiment is a social psychological one and at first glance has nothing to do with mathematical definitions. In one description of the experiment (accounts of the exact details vary), Buchanan explains that Milgram asked people to send a letter to “a stockbroker friend of his living in Boston, but he did not give them the address. To forward the letter, he asked them to send it only to someone they knew personally and whom they thought might be socially ‘closer’ to the stockbroker” (Buchanan, 2002, p.13). Most of the letters arrived and, surprisingly, took on average only six steps. The phrase “six degrees of separation” comes out of this experiment. Its relevance to small world networks is that it makes the structure of ties between people (the social network) visible.

Small world networks can be constructed from regular networks just by adding random elements to them (Buchanan, 2002). They are located somewhere in the middle between a totally regular graph and a random graph.

Even in large networks, the distance between two nodes can be small. The distance is usually defined as the number of edges or links that connect them along the shortest path. Small world networks share this property with random networks. The difference from randomly created networks is that small world networks have a higher clustering coefficient, C. The clustering coefficient of a network characterizes the extent to which nodes connected to a certain node are also connected to each other. It compares the number of existing links in the neighborhood of a node with the number of all possible links in that neighborhood. For social networks one could say that the clustering coefficient expresses “the degree to which a person's acquaintances are acquainted with each other and so measures the cliquishness” of the friendship network of a node (Watts, 1999, p. 33). If the (local) clustering coefficients around nodes correlate with other characteristics of the nodes, such as the number of links to a node, a hierarchical structure is present in the system (Serrano & Boguñá, 2003). Often, the clustering coefficient of a real world network is compared with the clustering coefficient of a random network, which can be calculated from a formula. Strogatz compared the clustering coefficients for three affiliation networks: company directors (linked by joint membership in boards of firms), movie actors (linked by joint appearance in films) and authors of biomedical papers (linked by co-authorship) (Strogatz, 2001). When the predicted and measured clustering coefficients are close to each other, as in the case of the directors network, Strogatz interpreted this result as meaning that “no additional social forces need to be invoked” in the explanation (p. 272). In the case of the two other networks, the prediction “accounts for about half of the observed clustering. The remaining portion depends on social mechanisms at work in these communities” (p. 272).
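The definition of the clustering coefficient can be made concrete with a minimal sketch in Python. The tiny friendship network and the function names are hypothetical and serve only as illustration; the convention that a node with fewer than two neighbours gets the value 0 is an assumption, since the text does not address that case.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Share of possible links among a node's neighbours that actually exist."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: no pair of neighbours, so count as 0
    existing = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    possible = k * (k - 1) / 2
    return existing / possible

def average_clustering(adj):
    """Network-level C as the mean of the local coefficients."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Hypothetical friendship network: a triangle (a, b, c) plus a pendant node d.
friends = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(local_clustering(friends, "c"))
print(average_clustering(friends))
```

Node c has three neighbours, of which only one pair (a and b) is linked, so one of three possible links exists in its neighbourhood. Comparing such measured values with the value expected for a random network is the kind of comparison described above.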

Another definition of small world networks examines the local neighborhood and the diameter of the network. The diameter of a network is usually defined as the average shortest-path length of a network [16]. Small world networks are then defined as (1) having a local neighborhood structure like that of regular lattices and (2) having a diameter that increases logarithmically with the number of nodes (as is the case for random networks) (Amaral et al., 2000; Hayes, 2000). In this way, a small world network combines local interaction with the possibility of long distance interaction. Where local interaction is represented by so-called strong ties (or close-knit friends), weak ties (or acquaintances over several nodes) are responsible for the bridging of different closely connected parts of a network. This non-local feature is important for the functioning of social networks (Granovetter, 1973; Watts, 1999, p. 21).
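How a few long-distance links shorten paths in an otherwise regular network can be illustrated with a short Python sketch. It is a simplified variant that adds random shortcuts to a ring lattice rather than rewiring existing links; the network size, the number of shortcuts and all function names are assumptions chosen for illustration.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Regular ring: every node is linked to its k nearest neighbours on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def add_shortcuts(adj, count, rng):
    """Return a copy with `count` extra links between random non-adjacent nodes."""
    new = {node: set(nbrs) for node, nbrs in adj.items()}
    nodes = list(new)
    added = 0
    while added < count:
        u, v = rng.sample(nodes, 2)
        if v not in new[u]:
            new[u].add(v)
            new[v].add(u)
            added += 1
    return new

def average_path_length(adj):
    """Mean shortest-path length over all reachable pairs, by breadth-first search."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(0)                      # fixed seed, for repeatability only
lattice = ring_lattice(200, 2)              # purely local links
small_world = add_shortcuts(lattice, 10, rng)
print(average_path_length(lattice), average_path_length(small_world))
```

Because the shortcuts leave the local neighbourhoods almost untouched, the sketch combines the two ingredients named above: lattice-like local structure and short average paths.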

The measurement of clustering coefficients and shortest path lengths helps to show the small world feature. A further step is the construction of models or algorithms to explain the emergence of such small world phenomena. In other words, one asks how people are able to find the shortest path between nodes when they have only local information about the nodes directly connected to them (Kleinberg, 1999; Newman, 2000).

Scale-free networks and connectivity distributions

Albert and Barabási introduced the notion of a scale-free network to characterize a specific distribution of links over nodes (Albert & Barabási, 2002). Empirically, one can count how many nodes have zero links, one link, two links and so on, and construct frequency distributions. The number of links to a node is called the degree of the node, so these distributions are sometimes called degree distributions. In other articles they appear under the name “connectivity distribution.” In a random graph, the degree distribution follows a Poisson distribution with an almost bell-shaped curve (see Figure 3). One finds a dominant average number of links per node (in our demonstration case it is <k> = 5). This average defines a certain scale on which most of the connected nodes can be found. Further, Barabási notes that the exponential decay of [the distribution function] P(k) guarantees the absence of nodes with significantly more than <k> links (Barabási, 2001). In other words, nodes with significantly more than <k> links are very rare.

Figure 3.

Degree or connectivity distribution for random graphs (pink line) and for scale-free networks (blue line)

This is not the case in scale-free networks. Instead, they display an extremely skewed distribution with a long tail. Mathematically, such distributions can be described by a power law: “…the power law distribution implies that there is an abundance of nodes with only few links, and a small, but significant, minority that have a very large number of links” (Barabási, 2001). So-called skew distributions characterize a broad range of phenomena, ranging from income distribution to size distributions of firms, or word distributions in a text (Ijiri & Simon, 1977). The distributions carry different names like Zipf, Pareto or Lotka distribution [17]. Skew distributions follow a power law. Power laws in nature often characterize the transition from disorder to order. Therefore, the finding of a power law is particularly important because it comes together with the presence of self-organizing mechanisms. For network theories, the appearance of power laws stands for a kind of paradigmatic change. Barabási writes that through these findings complex networks were lifted out of “the jungle of randomness where Erdös and Rényi had placed them forty years earlier and dropped them into the center of a colorful and conceptual rich arena of self-organization” (Barabási, 2002, p.77).
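The contrast between the two kinds of distribution can be checked numerically. The following Python sketch compares a Poisson distribution with average degree <k> = 5 and a power law; the exponent 2.1 and the finite cut-off used for normalization are illustrative assumptions, not values fitted to any data set.

```python
import math

def poisson_pmf(k, mean):
    """Degree distribution of a random graph with average degree `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def power_law_pmf(k, gamma, k_max=100_000):
    """P(k) proportional to k^(-gamma), normalized over k = 1..k_max (assumed cut-off)."""
    norm = sum(j ** -gamma for j in range(1, k_max + 1))
    return k ** -gamma / norm

mean_degree = 5   # <k>, as in the demonstration case of the text
gamma = 2.1       # illustrative exponent
for k in (1, 5, 20, 50):
    print(k, poisson_pmf(k, mean_degree), power_law_pmf(k, gamma))
```

Near the average degree the Poisson value is the larger one, while for k = 50 the Poisson probability is vanishingly small and the power law still assigns a noticeable probability. This is the long tail that allows highly connected nodes to exist.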

Building an average for skew distributions is less meaningful. In other words “there is no characteristic scale for the node degree” (Krapivsky & Redner, 2002). For this reason, skew distributions are labeled as scale-free. Mathematically, a power law distribution means that the probability P(k) of finding a node with k links to other nodes is proportional to (k + k0)^(-γ), where γ is the degree exponent and k0 a constant offset. If the data of the distribution are plotted in a so-called log-log plot, where both the x axis and the y axis have a logarithmic scale, they should follow a straight line (for k0 = 0). The slope of this line is equal to -γ. Each power law is characterized by such a unique number. This makes the exponent γ very important for the analysis, and papers often concentrate on fitting the degree exponent γ to data taken from different real world networks. The other aspect of the relevance of measuring the exponent γ is the question of why a certain value of γ occurs. Different values of the exponent, standing for different power laws, indicate different dynamic mechanisms working behind the distributions; similar values indicate the action of similar mechanisms. I will show later that certain theoretical mathematical models produce quite specific values for γ. Also, the resistance of networks against attacks (i.e. the removal of nodes or links) seems to depend critically on the value of the degree exponent.
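Reading the degree exponent off a log-log plot amounts to fitting a straight line to the logarithms of the data. The Python sketch below shows this for synthetic, noise-free counts; the data and the function name are hypothetical. It uses an ordinary least-squares fit, which is only the simplest possible choice and can be unreliable for noisy empirical distributions.

```python
import math

def fit_degree_exponent(degree_counts):
    """Least-squares line through (log k, log P(k)); returns the estimate of gamma."""
    total = sum(degree_counts.values())
    xs = [math.log(k) for k in degree_counts]
    ys = [math.log(count / total) for count in degree_counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope  # on the log-log plot the line falls, so gamma is minus the slope

# Synthetic counts following k^(-2.5) exactly; not data from a real network.
synthetic = {k: 1e9 * k ** -2.5 for k in range(1, 101)}
print(fit_degree_exponent(synthetic))
```

The function assumes degrees of at least 1 and non-zero counts, because the logarithm is undefined otherwise. Nodes without links would have to be left out before fitting.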

If one considers real world networks, different values of γ have been found. In the review of Dorogovtsev and Mendes alone, thirty-one different estimates for degree exponents from real world networks are reported. Here, γ ranges from around 1.5 for networks built from words, to around 2.2 for metabolic networks, around 2.5 for protein-protein interactions, around 2.5 for collaboration networks and between 2.5 and 3.0 for citation networks (Dorogovtsev & Mendes, 2002).

For the topological structure of the network, the power law “predicts that most nodes have only few links, held together by a few highly connected hubs” (Barabási, 2002, p. 71). These hubs occur in scale-free networks with a much higher probability than in random graphs. Goh and others point to the fact that this imbalance between many sparsely connected nodes and few intensively connected nodes plays an important role in functionality (Goh, Oh, Jeong, Kahng, & Kim, 2002). As I will discuss later, this property has consequences for dynamic processes going on in the network.

The power law cannot always be observed in a pure form in real data. Therefore, other types of distributions have been introduced in the literature. Besides scale-free distributions, so-called broad-scale networks and single-scale networks have been defined. Broad-scale networks show a (scale-free) power law behavior over different scales, followed by a sharp cut-off. Single scale networks have an exponentially decaying tail (Amaral et al., 2000).

Scale-free networks themselves can be further investigated according to additional topological characteristics. For example, it has been shown that the existence or non-existence of tree-like structures inside the network leads to different stability properties against the removal of nodes or links (Goh et al., 2002).

The Web as small world and scale-free

To investigate the connectivity distribution of the Web, several authors have considered the structure of hyperlink networks. Here, the Web documents act as nodes and the links are created by hyperlinks between them. Hyperlinks, in terms of their techno-social implementation, lead from one Web source to another and can be considered directed links. Therefore, one differentiates between in-link distributions (links which point to a page of interest) and out-link distributions (links which start from one page of interest and point to other pages). In 1998 Albert, Barabási and Jeong wrote a computer program (robot or Web crawler) “that adds to its database all URLs found on a document and recursively follows these to retrieve the related documents and URLs” (Albert, Jeong, & Barabási, 1999). They studied the domain of their own academic institution (nd.edu) with 325,729 documents and 1,469,680 links. Their expectation was “to find that Webpages are connected to each other randomly” (Barabási, 2002, p. 66). Instead, they found that the tail of the incoming link distribution follows a power law with an exponent γin = 2.1. The outgoing link distribution has an exponent of γout = 2.45. They also looked at other domains such as whitehouse.gov, yahoo.com and snu.ac.kr and found the same power law for the different domains. Kumar and co-authors reported similar findings in a crawl of the Web they did in 1997, provided by the search engine Alexa (Kumar, Raghavan, Rajagopalan, & Tomkins, 1999). In a technical comment on a paper of Barabási and Albert [18], Adamic and Huberman wrote about the result of a crawl already done in 1997 that delivered a distribution function for the number of in-links with a slope γin = 1.94 (Adamic & Huberman, 1999). The question that remains after all these findings is how “millions of Webpage creators work together in some magic way to generate a complex Web that defies the random universe” (Barabási, 2002, p. 68) [19].
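The step from a list of hyperlinks to in-link and out-link distributions can be sketched in a few lines of Python. The page names and links below are invented for illustration and are not taken from any of the crawls mentioned; the function name is likewise hypothetical.

```python
from collections import Counter

def degree_distributions(links):
    """Return (in_dist, out_dist): how many pages have k incoming / outgoing links."""
    in_degree, out_degree = Counter(), Counter()
    pages = set()
    for source, target in links:
        out_degree[source] += 1
        in_degree[target] += 1
        pages.update((source, target))
    # Pages that never appear as a target (or source) are counted with k = 0.
    in_dist = Counter(in_degree[p] for p in pages)
    out_dist = Counter(out_degree[p] for p in pages)
    return in_dist, out_dist

# Hypothetical hyperlinks as (source page, target page) pairs.
links = [("home", "about"), ("home", "news"), ("news", "home"),
         ("about", "home"), ("blog", "home")]
in_dist, out_dist = degree_distributions(links)
print(dict(in_dist), dict(out_dist))
```

Because hyperlinks are directed, the two distributions differ even in this toy example: the page "home" collects three in-links, while no page has more than two out-links.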

As well as degree distributions, other network characteristics have also been calculated for parts of the Web. Albert, Jeong and Barabási found that “two randomly chosen documents are on average 19 clicks away from each other” (Albert et al., 1999). Adamic calculated that the average shortest path between two nodes is around three clicks, using data from a crawl performed by the search engine Alexa in 1998. The clustering coefficient for that real world network was calculated as 0.1078, compared to 0.00023 for a random network with the same number of nodes and links. Both the clustering coefficient and the shortest path between two nodes point to the fact that the Web also shows features of small world networks. According to the literature, the Web shows characteristics both of small world networks and of scale-free networks, and this also holds for other real world networks.

The relationship between small world and scale-free networks

Reading the literature, one might get confused about the two characteristics “being a small world” or “being scale free” for complex networks. Sometimes, both characteristics are attributed to networks. Sometimes, the radically different character of these two types of networks is highlighted. Part of the confusion arises from the fact that different levels of reflection are being applied. Mathematical models, as theoretical explanations for empirical facts, follow one type of reasoning. Empirical analysis of real world networks follows another. In the search for explanations for both small world networks and scale-free networks, prototypes of models have been developed at the theoretical level. Small world networks are represented by the Watts-Strogatz (WS) model, and scale-free networks by the Barabási-Albert (BA) model. I will introduce the models in more detail in the next section. For the time being, let me state here that they represent two distinct classes of models. As Albert and Barabási showed, the WS model of a small world graph simply has a degree distribution similar to a random graph and is not scale-free (Albert & Barabási, 2002). Holme and others note that the WS model has high clustering like, e.g., social networks, whereas the BA model has a clustering coefficient that scales toward zero as N [the number of nodes] approaches infinity (Holme, Kim, Yoon, & Han, 2002). The BA model produces a scale-free network, which is not a small world. However, it shares one feature with a small world network, namely that two randomly chosen nodes are connected by a very short path. Meanwhile, to make the situation even more complex, other models have been developed which show both properties.
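As a rough preview of the second prototype, the following Python sketch grows a network by the two ingredients usually attributed to the BA model: new nodes are added one at a time, and each attaches to existing nodes with a probability proportional to their degree. The seed network, the parameter values and the function name are assumptions made for this illustration.

```python
import random

def barabasi_albert(n, m, rng):
    """Grow a network to n nodes; each newcomer links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    adj = {i: set() for i in range(m + 1)}
    for i in range(m + 1):            # assumed seed: m + 1 fully connected nodes
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # Every node appears in `stubs` once per link end, so a uniform draw
    # from this list is a draw proportional to degree.
    stubs = [node for node, nbrs in adj.items() for _ in nbrs]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            stubs.extend((new, t))
    return adj

network = barabasi_albert(2000, 2, random.Random(1))
degrees = sorted((len(nbrs) for nbrs in network.values()), reverse=True)
print("largest degrees:", degrees[:5])
print("mean degree:", sum(degrees) / len(degrees))
```

The mean degree stays close to 2m, while the oldest nodes tend to accumulate far more links than the average. This is the hub structure that distinguishes such a network from the WS model, whose degree distribution resembles that of a random graph.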

In terms of empirical analysis, real networks usually exhibit a variety of degree distributions and, as I mentioned above, do not meet the exact criteria of a power law. In addition, the smaller the network in terms of number of nodes and links, the more difficult a statistical analysis of its static properties becomes. Consequently, it is no surprise that real networks display different features. The creation of variants of the two “prototype models” mentioned above is a theoretical reflection of variation in empirical measurements. Furthermore, one can apply criteria of differing strictness to both types of networks. Depending upon which theoretical definition is chosen, properties of both network types can be found in real world networks.

The importance of connectivity and other statistical properties of networks

Let us come now to some consequences of topological analysis. To determine the static properties of a network is not only of mathematical interest, but also has practical consequences. In the case of small-world networks, Huberman wrote: “Political influence, searching for a job, even the spread of diseases and other forms of social contagion, such as rumors or news, depend on the existence of such short chains of acquaintances, in ways that have been documented by social network scientists for over four decades” (Huberman, 2001). For the Web, one question is to what extent information about its connectivity distribution can be used to find information more easily, for example, the development of specific search algorithms that do not proceed in a random way but use “strategies which preferentially utilize the high connectivity nodes…” (Adamic, Lukose, Puniyani, & Huberman, 2001). Adamic et al. call these local search algorithms. The authors further argue that “it may be not coincidental that several large networks are structured in a way that naturally facilitates search” (Adamic et al., 2001). It seems that networks where information is located and distributed locally, without perfect global control, tend to follow a power law with the exact exponents that are favorable to local search mechanisms (Adamic et al., 2001).
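A minimal sketch may help to show what a degree-seeking local search looks like. It is not the algorithm of Adamic et al., only an illustration of the idea under simple assumptions: the network is a dictionary of neighbor sets, each node knows the identity and degree of its direct neighbors only, and the walk does not backtrack. The function name and the toy network are hypothetical.

```python
def high_degree_search(adj, start, target, max_steps=1000):
    """Walk from start, always moving to the unvisited neighbor of highest degree.

    Returns the list of visited nodes ending at target, or None if the
    walk hits a dead end or exceeds max_steps.
    """
    path = [start]
    visited = {start}
    current = start
    for _ in range(max_steps):
        if current == target:
            return path
        if target in adj[current]:
            # the target is a direct neighbor: local knowledge suffices
            path.append(target)
            return path
        candidates = [n for n in adj[current] if n not in visited]
        if not candidates:
            return None  # dead end; this simple sketch does no backtracking
        # sorting first makes ties between equal degrees deterministic
        current = max(sorted(candidates), key=lambda n: len(adj[n]))
        visited.add(current)
        path.append(current)
    return None


# Toy network (hypothetical): hub 0 with four neighbors; node 5 hangs off node 4.
toy = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0, 5}, 5: {4}}
print(high_degree_search(toy, start=1, target=5))
```

The walk reaches the hub in one step and then proceeds through the best-connected remaining neighbor, which is why skewed degree distributions can make such purely local strategies effective.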

Statistical analysis, as introduced so far, focuses on the outcome of the interplay of hundreds and thousands of individual acts at the macro level of the system. Although the individual behavior at the micro level is not described in detail, it remains responsible for what can be traced on the macro level. The surprisingly ordered outcome of all these chaotic individual actions feeds back to the individual level.

The topology of a small world network, or a scale-free network, sets boundary conditions for individual behavior. As the information about this topological structure is expressed using probability distributions, the model can still account for a lot of variation and randomness created at the individual level. There are no strict deterministic rules an individual has to follow, but the probability for certain actions might be higher than for others. For example, in a random graph, two nodes are no more likely to link to each other by virtue of having a common neighbor. In small world networks, just the opposite is the case and a high clustering coefficient is a measure of this transitivity. An example of a real world network with this property is the network of co-authorships. Newman calculated clustering coefficients for co-authorship networks in the Physics e-Print archive and the high energy physics SPIRES database (Newman, 2001a). All these coefficients have higher values than those expected for a random network of the same size. Newman argues that multiple co-authorship, e.g., articles having three authors, contributes to the higher clustering. But this effect can account for some, though not all, of the clustering seen in these graphs. He concludes that other social mechanisms seem to be active, such as gaining acquaintance through a common collaborator, being part of the same circles (conferences, journals), or working at the same institution. A smaller value of the clustering coefficient in the MEDLINE database compared with the physics networks is interpreted as the expression of more hierarchically organized collaboration structures in biomedical research (Newman, 2001a). The indicators introduced so far can thus describe the way an individual makes connections to others. For groups of individuals, a tendency in behavior is defined that in the end will lead to a certain topology. I will make these tendencies more explicit in the next section, where I consider mechanisms that lead to different topologies.

To summarize, in order to understand the nature of the mechanisms that lead to a certain network, it is useful to look at the topology of the finally emerging network. Are links created totally randomly (random networks), are they created following precise rules (regular networks), or is there a combination of randomness and certain rules present (complex networks as small worlds or scale-free networks)? In addition, networks are not only constructed differently, but they also behave differently depending on different topologies.

Growth and Evolution of Complex Networks

Random networks remain a special reference point for the characterization of complex networks. The classical random network is usually considered as consisting of a fixed number of nodes. Where there is growth, this only affects the distribution of links between these nodes. To be able to reproduce the features of real networks, the fact that the network itself grows and/or declines has to be taken into account. Growth can follow quite different mechanisms. One can differentiate between four different elementary processes: addition or removal of a node and addition or removal of a link (Albert & Barabási, 2002).

Each of these processes might have a certain probability of occurrence in a real world network. For instance, in citation networks, links remain stable whereas in co-authorship networks, co-authorship links can be renewed. For the Web, both nodes (pages or sites) and links (hyperlinks) are in a state of continuous change.

Exponentially growing networks and small world networks

The simplest random network with a growing number of nodes can be constructed in the following way: in each time step, one new node is added to the network. For the new node, one selects another node, totally at random (“without any preference”), to which the newcomer will be linked. One can get analytical expressions for the degree distribution of such a growing network. According to the formula, the distribution has an exponential form. It is different from random networks but shares with them the rapid decrease of the degree distribution for highly connected nodes. Such networks are labeled exponentially growing networks (Dorogovtsev & Mendes, 2002).
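The construction just described can be simulated in a few lines. The sketch below is one possible reading of the recipe, assuming a start from two linked nodes and exactly one link per newcomer; the function names and the seed are illustrative choices.

```python
import random
from collections import Counter


def grow_uniform(n_nodes, seed=0):
    """Grow a network in which each newcomer links to one uniformly chosen node.

    Returns the list of node degrees (index = node id).
    """
    rng = random.Random(seed)
    degree = [1, 1]  # assumption: start with two nodes joined by one link
    for new in range(2, n_nodes):
        target = rng.randrange(new)  # every existing node is equally likely
        degree[target] += 1
        degree.append(1)             # the newcomer arrives with one link
    return degree


def degree_distribution(degree):
    """Fraction of nodes having each observed degree."""
    counts = Counter(degree)
    n = len(degree)
    return {k: counts[k] / n for k in sorted(counts)}


if __name__ == "__main__":
    for k, share in degree_distribution(grow_uniform(10000)).items():
        print(k, round(share, 4))
```

In this particular variant, the share of nodes is expected to fall by roughly one half with each additional link, which is the rapid, exponential decrease mentioned above; highly connected nodes are practically absent.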

The “recipe” for a small world network usually begins with a constant number of nodes linked according to a regular order (Figure 2). For instance, the nodes form a ring where each node is connected to its immediate neighbor and the neighbor after that. Then, two scenarios are discussed: random rewiring of links with a certain probability (Watts-Strogatz model) and the addition of links to randomly chosen nodes with a certain probability (Newman-Watts model) (Wang, 2002).

Figure 2.

Schematic representation of the “creation” of a small world network from a regular network and the corresponding degree distributions.
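The first of the two scenarios, random rewiring, can be sketched as follows. This is a simplified illustration rather than the exact published procedure: it assumes a ring in which each node is linked to its two nearest neighbors on each side, and it rewires one end of each link with probability p while avoiding self-links and duplicate links. All names are hypothetical.

```python
import random


def ring_lattice(n, k_half=2):
    """Regular ring: each node is linked to its k_half nearest neighbors per side."""
    edges = set()
    for i in range(n):
        for step in range(1, k_half + 1):
            j = (i + step) % n
            edges.add((min(i, j), max(i, j)))
    return edges


def rewire(edges, n, p, seed=0):
    """With probability p, replace link (u, v) by (u, w) for a random new w."""
    rng = random.Random(seed)
    result = set(edges)
    for (u, v) in sorted(edges):
        if rng.random() < p:
            # admissible new partners: no self-link, no duplicate link
            choices = [w for w in range(n)
                       if w != u and (min(u, w), max(u, w)) not in result]
            if choices:
                w = rng.choice(choices)
                result.remove((u, v))
                result.add((min(u, w), max(u, w)))
    return result


lattice = ring_lattice(20)
small_world = rewire(lattice, n=20, p=0.1)
print(len(lattice), len(small_world))  # the number of links is unchanged
```

With p = 0 the regular ring is returned unchanged, and with p = 1 every link is rewired; the small world regime lies at small intermediate values of p, where a few shortcuts shorten paths while most of the local clustering survives.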

The principle “popularity is attractive” or the mechanism of preferential attachment

Scale-free networks are constructed differently; not only the number of links but also the number of nodes grows. One starts with a small number of nodes. Then, new nodes are introduced and connected to a certain number of already existing nodes. These nodes are chosen with a probability proportional to the degree k (the number of links) they have. In other words, nodes that already have a lot of links are more often chosen as partners for the new nodes. This mechanism is described as “preferential attachment.” If the probability Π is only a linear function of the degree k, and increases with k, this mechanism produces a scale-free network. The degree distribution of this network has the form $P(k) \sim k^{-\gamma}$, with the exponent γ equal to 3. This model is called the Barabási-Albert model.
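The growth rule can be written down compactly. The following sketch assumes a small fully connected starting core and m links per new node, both illustrative choices; it uses a common bookkeeping device in which every node is listed once per link it holds, so that a uniform draw from that list selects nodes with probability proportional to their degree.

```python
import random


def barabasi_albert(n_nodes, m=2, seed=0):
    """Grow a network by preferential attachment; returns the list of links."""
    rng = random.Random(seed)
    # assumption: start from a fully connected core of m + 1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # each node appears in `stubs` once per link it holds, so a uniform
    # draw from `stubs` picks a node with probability proportional to degree
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:          # m distinct, already existing partners
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


links = barabasi_albert(2000, m=2)
degree = {}
for u, v in links:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(max(degree.values()), sum(degree.values()) / len(degree))
```

Even in a modest run, the largest degree is typically many times the average degree, a signature of the hierarchy that uniform attachment does not produce.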

According to Barabási the combination of growth and preferential attachment is a simple model for producing a hierarchy: “A node rich in links increases its connectivity faster than the rest of the nodes because incoming nodes link to it with higher probability-this ‘rich-gets-richer’ phenomenon is present in many competitive systems” (Barabási, 2001). The principle is also known under the name of the “Matthew effect.” Robert Merton, the famous American sociologist coined the term to describe the uneven allocation of reputation in science20 (Merton, 1968; Bonitz, Bruckner, & Scharnhorst, 1997). The preferential attachment mechanism has also been labeled as “popularity is attractive.” Implicitly, one assumes here that the linking behavior is oriented to the popularity or attractiveness of a node, and that this attractiveness can, conversely, be expressed in terms of the number of links of a node.

In the previous section, I mentioned that real networks usually express a variety of degree distributions. To find a good fit for the value of the exponent γ by regression analysis presents one goal of empirical investigations. As I showed, the value of γ can differ among different real world networks. The differences in γ indicate the presence of other, or additional, mechanisms to the pure process of preferential attachment introduced above. Indeed, the theoretical analysis provides different variants of the original Barabási-Albert model. Some of these models will be discussed here in more detail to make clear which details in mathematical models might be linked to qualitative studies.
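As an illustration of such a fit, the sketch below estimates γ by ordinary least squares on the logarithms of degree and relative frequency. It is a deliberately simple version, assuming unbinned data and a lower cut-off `k_min` (a hypothetical parameter name); real degree data are noisy in the tail, so estimates obtained this way should be treated with caution.

```python
import math
from collections import Counter


def fit_gamma(degrees, k_min=1):
    """Estimate gamma in P(k) ~ k^(-gamma) by least squares on log-log values."""
    counts = Counter(k for k in degrees if k >= k_min)
    n = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(counts[k] / n) for k in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx  # the slope of the fitted line is -gamma


# Synthetic check: node counts exactly proportional to k^(-2) for k = 1..6.
synthetic = [k for k in range(1, 7) for _ in range(3600 // (k * k))]
print(fit_gamma(synthetic))
```

On the synthetic sample the fitted exponent is 2 up to rounding error, because the counts were constructed to follow the power law exactly.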

What seems to be critical for the construction of networks that are only scale-free is the type of dependency between the degree of a node and the probability of adding another link to it. Qualitatively, the question is what are the criteria for linking a new node to already existing nodes? Let us consider a mathematical formula that expresses this relationship:

$$\Pi(k) = A + a\,k^{\alpha} \qquad (1)$$

The Π stands for the probability to add a link to a certain node and depends on the degree (number of links) this node has (Albert & Barabási, 2002). On the right side of the above equation we find two terms representing two different processes. Three parameters are present: A, a, and α. These can all be interpreted in a certain way. The first term is simply a constant A. This is also called the initial attractiveness of the node i. If a node does not yet have links, there must also be a chance for such an isolated node to become connected within the growing network. The term A expresses the probability that a node will be connected, independently of the degree it has. This kind of basic connectability ensures that every node has a chance to become linked at all. Usually, it would be assumed that this probability is small. This constant term is a simple but quite important extension of the original Barabási-Albert model. The way in which isolated nodes can be connected to a network has to do with the addition of innovation to a network. If the fact that a node is already linked remained the only criterion for growth, isolated nodes would never get a chance to be part of the network.

Once a node has a link, one can speak of a certain boundary having been crossed. Then, a second mechanism starts and is assumed to be dominant, compared with the initial attractiveness. This second mechanism, $a k^{\alpha}$, ensures that the probability to link a node to others will increase with the number of links that this node already has. This mechanism has been introduced previously as preferential attachment. It can be interpreted as a kind of imitation behavior. The mathematical formula, however, offers a further way of differentiation by presenting us with two parameters, a and α.

The exponent α describes the way in which the probability grows with degree. When α equals 1, linear growth occurs and the value of γ will vary between 2 and ∞ depending on the other parameter a. If α is less than 1 (sub-linear case), the degree distribution approaches a stretched exponential form. In the case that α is greater than 1 (super-linear case), it can be shown that for α > 2 “a ‘winner takes all’ phenomenon emerges, in that almost all nodes have a single edge, connecting them to a ‘gel’ node that has the rest of the edges of the network” (Albert & Barabási, 2002).
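A small numerical example shows how the three parameters shape the choice among existing nodes. The sketch assumes that the weights $A + a k^{\alpha}$ are normalized over all existing nodes to give selection probabilities; the function name and the sample degrees are illustrative.

```python
def attachment_probabilities(degrees, A=0.0, a=1.0, alpha=1.0):
    """Normalize the weights A + a * k**alpha into selection probabilities."""
    weights = [A + a * k ** alpha for k in degrees]
    total = sum(weights)
    return [w / total for w in weights]


degrees = [0, 1, 2, 5]  # one isolated node, two ordinary nodes, one hub

# Pure linear preferential attachment: the isolated node can never be chosen.
print(attachment_probabilities(degrees))                # [0.0, 0.125, 0.25, 0.625]

# A small initial attractiveness gives the isolated node a chance.
print(attachment_probabilities(degrees, A=0.1))

# Super-linear attachment concentrates the probability on the hub.
print(attachment_probabilities(degrees, alpha=2.0))
```

With α = 2 the hub's share rises from 5/8 to 25/30, which hints at how super-linear attachment lets a single node absorb most new links.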

In summary, only when α equals 1 is the scale-free character of the network reconstructed, whereas any other form of preferential attachment seems to destroy the scale-free character of the network (Krapivsky, Redner, & Leyvraz, 2000). This defines possible behavior on the micro level within quite sharp boundaries. In fact, one can reconstruct the function Π from empirical observations. For certain networks, e.g., networks of citations or of co-authorship, one can follow the growth of the network over time. The number of nodes added to other nodes and the degree of these nodes can be counted and the function Π(k) can be obtained from the data. For several networks, Π(k) is a linear function of the degree k, and the parameter α is equal to one. The adding of nodes follows a sort of “simple” self-reproduction process. Reaction kinetics can be used to illustrate the behavior at the micro level that leads to a linear function in the preferential attachment mechanism. In reaction kinetics, linear growth appears whenever a new chemical product is created from the reaction of a substrate with the product itself, which functions in this case as a catalyst of its own production (auto-catalysis). Outside chemistry, the duplication process of RNA and DNA follows the same law. In social systems, for instance, it has been argued that the recruitment of new scientists by professors educating PhD students can be seen as a self-reproduction process (Bruckner, Ebeling, & Scharnhorst, 1989). Returning to our network, the degree of a node would grow according to a self-reproduction mechanism. This picture inverts the idea of adding a node to another one. Now, a node with a certain degree is “fishing” for new nodes (substrate) to increase its own degree. If only the degree of the node is relevant for the adding process, one would expect to have such a “pure” preferential attachment mechanism. For the motivations of the individuals producing links in a social network of whatever kind, the pure preferential attachment mechanism would correspond to an orientation solely toward the attractiveness of the node to link with. If the neighborhood of the node (e.g., the degree of neighboring nodes) is also involved in the growth process, one would expect nonlinear growth with α unequal to one. The analogy in chemical kinetics is cross-catalytic reactions, where more than one product is involved in the production, or “network effects,” where more than one molecule of the product has to be present to produce the product itself (auto-catalysis of higher order). Or, to take another example, the growth of science can be argued to be influenced by the building of scientific schools, which create such a network effect resulting in non-linear growth rates (Bruckner & Scharnhorst, 1986).
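The empirical reconstruction of Π(k) mentioned above can be sketched as a counting exercise. The version below is a simplified illustration, not the procedure of any particular study: it assumes a time-ordered list of links of the form (new node, existing node) and, for every degree k, divides the number of links received by nodes of degree k by the number of node-steps spent at that degree. The function and variable names are hypothetical.

```python
from collections import Counter, defaultdict


def measure_attachment_rate(edge_sequence):
    """Estimate how often nodes of degree k receive a new link.

    edge_sequence: time-ordered (new_node, target_node) pairs.
    Returns {k: links received at degree k / node-steps spent at degree k}.
    """
    degree = Counter()
    hits = defaultdict(int)      # links received by nodes that had degree k
    exposure = defaultdict(int)  # how often a node of degree k could be chosen
    for new, target in edge_sequence:
        if target in degree:     # skip links between two brand-new nodes
            for k in degree.values():
                exposure[k] += 1
            hits[degree[target]] += 1
        degree[new] += 1
        degree[target] += 1
    return {k: hits[k] / exposure[k] for k in sorted(hits)}


# Tiny hypothetical growth history: nodes 2 and 3 both attach to node 1.
history = [(0, 1), (2, 1), (3, 1)]
print(measure_attachment_rate(history))  # {1: 0.25, 2: 1.0}
```

Applied to a long growth history, a rate that increases in proportion to k would correspond to α equal to one, whereas a markedly curved relation on a log-log plot would point to a sub-linear or super-linear mechanism.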

In the literature, further extensions of formula (1) can be found by adding or modifying terms. Pennock et al. proposed a slightly different combination of two processes: the process of preferential attachment and the process of uniform attachment (Pennock, Flake, Lawrence, Glover, & Giles, 2002). In addition, the process of introducing new links may be changed. If the number of these added links increases in time, this is called “accelerated growth.” Albert and Barabási (2002) showed that such a process changes the exponent γ of the degree distribution. Nodes (and links) may have a limited lifetime, and networks that also decay can be considered. Dorogovtsev and Mendes (2000) found that the power law dependence in connectivity distributions remains only if a small fraction of links between old nodes is removed. The empirical investigation of these processes remains to be pursued via case studies. These would establish, first, whether such processes occur in real-world networks and, second, what the system-specific reasons are for them to appear.

Modeling the Web as a growing complex network

The finite lifetime of nodes seems to be extremely important for modeling processes on the Web, since it is known for its dynamic and unstable character (Koehler, 2002). The literature on models for Web growth and connectivity shows that there are several modeling possibilities that lead to distributions similar to the observed one. I will discuss only two of them here.

In a simulation model, Boudourides and Antypas assume that Web pages have a greater probability of linking to another Web page on the same topic than to Web pages on other topics (Boudourides & Antypas, 2002). According to the authors, this rule acts as a preferential attachment mechanism. In the model, topics are randomly assigned to Web pages. With a set of relatively simple rules, Boudourides and Antypas are able to reproduce the coefficient for the out-link distribution within the range of other estimations for a skew distribution. Furthermore, the clustering coefficient is much higher than for a comparable random network and the average shortest path is in the same range. Thus, a small world property also seems to be reproducible.

Whilst Boudourides and Antypas' argument rests on topic similarity of Web pages in the linking procedure, another model argues in terms of the procedure for updating links on Web pages. The model is “motivated by the conduct of the agents in the real Web to update outgoing links (re)directing them towards constantly changing selected nodes” (Tadic, 2001). The author adds a mechanism of preferential update to the mechanism of preferential attachment. Tadic shows that the existence of temporal variation in the outgoing links (updating) is a necessary pre-condition for explaining not only the degree distribution, but also the distribution and size of connected clusters inside the network. The dynamic structure of the hyperlinks makes the Web unique in comparison with other networks such as citation networks, which have “frozen” links. The two examples show that the differentiated modeling of growth processes of complex networks opens a variety of questions for qualitative studies.

Dynamics on Complex Networks

Dynamics on networks (rather than of networks) refers to dynamic processes that take place in a network topology. One can think of the spread of a disease over a network of social contacts, or of computer viruses among computers. The spreading of a chemical reaction in a physical space can also be understood with epidemic models on networks. In all these cases, each node gets additional characteristics. It is “infected” or “not infected.” The network topology defines the neighborhood between nodes, and so the possible method of infection.

For dynamics on networks, random graphs or exponential networks also serve as an important reference point. Epidemic processes on exponential networks exhibit a characteristic threshold. The existence of this threshold can be explained in the following way. To reach a balance between infected and uninfected nodes, two processes are usually assumed to be relevant. Infected nodes can become “healthy” at a certain rate, and nodes become newly infected at a certain infection rate λ. Let us consider the number (or density) of infected nodes as a relevant variable to characterize the state of the whole network. This number depends, as one would expect, upon the value of the infection rate. Below a critical value of λ, the healing processes dominate the infection processes and the number of infected nodes remains zero. Above the critical value, the infection spreads and affects a growing part of the network as the infection rate increases. The first surprising result for scale-free networks is that, for certain ranges of the exponent γ of the degree distribution, no such epidemic threshold exists. Pastor-Satorras and Vespignani write that “for 0 < γ ≤ 1 the model does not show an epidemic threshold and the infection can always pervade the whole system. In the region 1 < γ ≤ 2, the model shows an epidemic threshold that is approached, however, with a vanishing slope [no singular behavior]…Only for 2 < γ we recover again the usual critical behavior at the threshold” (Pastor-Satorras & Vespignani, 2001, p. 1).
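The role of highly connected nodes in lowering the threshold can be illustrated numerically. The sketch below relies on an assumption that goes beyond the text above: the standard mean-field estimate for this kind of infection-and-recovery model on an uncorrelated network, in which the critical infection rate is the mean degree divided by the mean squared degree. The function name and the two degree sequences are hypothetical.

```python
def epidemic_threshold(degrees):
    """Mean-field estimate of the critical infection rate: <k> / <k^2>."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return mean_k / mean_k2


# Homogeneous network: every node has four links.
print(epidemic_threshold([4] * 100))        # 0.25

# Heterogeneous network: 99 nodes with a single link and one hub with 99 links.
print(epidemic_threshold([1] * 99 + [99]))
```

Although the second network has a lower mean degree, its estimated threshold is more than ten times smaller, because the hub dominates the mean squared degree. When that quantity grows without bound as the network grows, the estimated threshold tends to zero.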

The existence of a certain range of the exponent γ in which the epidemic threshold does not appear underlines the importance of determining the exact value of the exponent for real world networks. To understand the disappearance of the epidemic threshold, one has to return to the connectivity distribution and examine the deviations in the connectivity from the average <k>. Connectivity fluctuations in scale-free networks diverge because of the highly connected nodes. In other words, one can say that “the network nodes possess a statistically significant probability of having a virtually unbounded number of connections compared to the average value.” (Moreno, Pastor-Satorras, & Vespignani, 2002, p.1). This makes these networks very vulnerable to epidemic attacks.

On the other hand, the special topology of the network allows for other possibilities of counteractions. Pastor-Satorras and Vespignani refer to “targeted immunization schemes, based on the nodes' connectivity hierarchy” that might be able to “sharply lower the network's vulnerability to epidemic attacks” (Pastor-Satorras & Vespignani, 2001). Therefore, instead of a random uniform immunization of individual nodes, the most highly connected nodes are immunized, “the ones more likely to spread the disease” (Dezso & Barabási, 2002; Pastor-Satorras & Vespignani, 2001, p.6).

As a practical example, the authors discuss the spreading of computer viruses on the Internet. After a simulation using parameter values from an Internet network, they come to the conclusion that “an optimized immunization of the Internet can be reached only through a global immunization organization that secures a small set of selected high-traffic routers or Internet domains.” Unfortunately, as they argue further, “the self-organized nature of the Internet does not allow to easily figure out how such an organization should operate” (Pastor-Satorras & Vespignani, 2001, p.8).

The special immunization schemes proposed above use specific characteristics of scale-free networks, discussed in connection with the attack vulnerability of complex networks (Holme et al., 2002): “In particular, it has been shown that SF [scale-free] networks possess a noticeable resilience to random connection failures, which implies that the network can resist a high level of damage (disconnected links), without losing its global connectivity properties; i.e., the possibility to find a connected path between almost any two nodes in the system” (Pastor-Satorras & Vespignani, 2001, p. 6).

Not all networks are equally vulnerable, and they may be more or less resilient in the face of different kinds of attacks. Holme et al. introduced different attack strategies by removing nodes or links. One strategy determines which objects are to be removed from the initial topological structure of the network (e.g., starting with the nodes with the highest degree). Another strategy recalculates the structure (e.g., ranking list of high degree nodes) after each step of removal (Holme et al., 2002). The difference in attack strategies showed the importance of changes in the network's structure during the attack. Recalculating strategies were the most effective for real networks.
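The difference between the two kinds of strategy can be shown on a toy example. The sketch below is an illustration only, not the procedure of Holme et al.: it removes nodes by degree, either ranked once on the initial network or re-ranked after every removal, and measures the damage by the size of the largest remaining connected component. The network and all names are hypothetical.

```python
def largest_component(adj, removed):
    """Size of the largest connected component once `removed` nodes are gone."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def attack_initial_degree(adj, n_remove):
    """Rank nodes once by their degree in the intact network."""
    order = sorted(adj, key=lambda n: (-len(adj[n]), n))
    return set(order[:n_remove])


def attack_recalculated_degree(adj, n_remove):
    """Re-rank the surviving nodes by their current degree after each removal."""
    removed = set()
    for _ in range(n_remove):
        remaining = sorted(n for n in adj if n not in removed)
        victim = max(remaining,
                     key=lambda n: sum(1 for v in adj[n] if v not in removed))
        removed.add(victim)
    return removed


toy = {0: {1, 2, 3, 4}, 1: {0, 5, 6}, 2: {0}, 3: {0}, 4: {0, 7},
       5: {1}, 6: {1}, 7: {4, 8, 9}, 8: {7}, 9: {7}}
for attack in (attack_initial_degree, attack_recalculated_degree):
    removed = attack(toy, 2)
    print(attack.__name__, sorted(removed), largest_component(toy, removed))
```

On this network the initial ranking removes nodes 0 and 1 and leaves a component of four nodes, whereas the recalculated ranking removes nodes 0 and 7 and leaves at most three connected nodes, because node 1 has already lost part of its importance once node 0 is gone.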

Investigations of this kind gain importance if the topology of networks is used to protect networks against attacks. Random networks are still the most robust networks. From this result the authors conclude: “this supports the intuitive idea that building a server-less network would be very robust to attack” (Holme et al., 2002, p. 13). On the other hand, it has been established that scale-free networks are also quite robust against random attacks, and that they can organize the flow of information effectively. In the end, ‘real-world’ networks represent a combination of functionalities described by different models. The goal of analysis consists in understanding the mechanisms and driving forces behind these functionalities.

Conclusions: Complex Networks Theory and Qualitative Studies

The last few years have witnessed the emergence of a new specialty, called complex networks theory, which has deep roots in statistical and non-linear physics. It has been argued that the formation of this specialty was very much triggered by the rise of the Internet and the Web. On the one hand, they form an outstanding and interesting representation of networks. On the other hand, information and communication technologies, the digitization of data, and the existence of Web-based databases made data about many more networks suddenly available for statistical analysis. The development of the specialty started with empirical investigations of different types of “real-world” networks; among them, the Internet and the Web figured prominently. It extended further into model-building activities that tried to mirror the statistical features found empirically. A rich class of different models has become available. This paper has reviewed some of these modeling activities, namely those concerning small world networks and scale-free networks. It has discussed the topology of such networks, mechanisms of growth, and the appearance of dynamic processes on such networks. It has used the Internet and the Web as primary empirical examples.

It should be noted that the role the Web played in the growth of this specialty may also influence the way in which “real-world” networks are analyzed. Connectivity and the corresponding link structure play a prominent role. This might have biased the research that has been done in complex networks theory in a certain way. It would be interesting to look at the growth of “modern” complex networks theory from the point of view of history of science. Given that graph theoretical approaches have a long history, it would be reasonable to expect to find predecessors of reflection in the literature about complex networks, perhaps using other technical terms and certainly producing other theoretical perspectives. Whatever the results of such a science historical analysis, the recent research in complex networks seems to mirror the Zeitgeist of living in a globally connected world where many different kinds of human activities are embedded in networks (Castells, 1996).

In this paper, although the Web is used as a prominent example, I have shown that many more different real-world networks are under consideration. A certain number of them are related to social processes or human behavior, for instance, networks of language, traffic networks, networks of movie actors, networks of scientific collaboration and citation networks. What is often missing in these social applications of physics is the link to social science theories. Here is a wide open field for further interdisciplinary research. The extension of the empirical analysis towards other huge social “real world” networks is another possibility of future research. (See as an example for such networks (Garrido & Halavais, 2003) and Alexander Halavais's homepage http://alex.halavais.net/research.html).

Here, I would like to draw attention to the possibility that complex networks theories can be used not only as a basis for empirical analysis. One can also use them as a heuristic framework to find research questions in the social sciences. The application of statistical analysis might be limited for a number of social networks because of their small size. However, the models presented in this paper can also be used as a heuristic guideline to suggest the kinds of processes that should be observed. I elaborate one such example here: the growth mechanism of preferential attachment as discussed earlier.

It was found that, in the case of scale-free networks, preferential attachment takes place in a very specific form. This means that the individual act of adding and linking a new node to a network follows certain boundary conditions and statistically cannot be done arbitrarily. On average, a new node will be linked to a node that is already “rich” in links. Scale-free networks only emerge according to one specific form of preferential attachment. This defines sharply how, mathematically speaking, the linking has to take place. As I discussed earlier, a linear growth of links is the pre-condition of a scale-free network with a certain exponent γ of the degree distribution.

To be able to exploit these insights for a substantial discussion of growth processes of networks, one has to re-contextualize the processes under study. Let's take the WWW as one example. According to the literature, the network of hyperlinks amongst Web pages shows a scale-free behavior with exponents $\gamma_{\mathrm{in}} = 2.1$ and $\gamma_{\mathrm{out}} = 2.7$ for the in-link and the out-link distributions (Albert, Jeong, & Barabási, 1999; Dorogovtsev & Mendes, 2002). Accordingly, it could be assumed that the mechanism of creating the hyperlink network follows a preferential attachment mechanism. For the out-link network, the original Barabási-Albert model (which gives an exponent γ equal to three) might even present a good explanation for the empirically observed distribution. Nevertheless, one finds further models in the literature which aim to explain more adequately the emergence of such a scale-free network. Tadic proposed processes such as the adding of new nodes and the updating of links (Tadic, 2002). Dorogovtsev, Mendes and Samukhin proposed a whole spectrum of linking processes related to the introduction of a new node: a number of links distributed preferentially each unit of time, a number of links pointed at a new node at the instant of its birth, and a number of links distributed randomly without any preferential attachment (Dorogovtsev, Mendes, & Samukhin, 2000). All these different processes occur with certain probabilities, which can be described by certain parameters. Concerning the Web, the values of these parameters are still not known. Often authors search for plausibility arguments in the assumption of a certain process. The lifetime of a link on a Web page, or the initial number of links on a Web page, have not been empirically tested. This is not impossible in principle. The data are potentially there, though arduous to retrieve.
An attempt could be made to make these processes visible by empirically observing a sample of Web pages and monitoring how newcomers become linked to the existing population. Also, a differentiation between randomly distributed links and preferentially attached links seems to be possible. Pennock et al. argue in such a direction when writing about two common behaviors of Web page authors: “(i) creating links to pages that the author is aware of because they are popular, and (ii) creating links to pages that the author is aware of because they are personally interesting or relevant” (Pennock, Flake, Lawrence, Glover, & Giles, 2002). Such assumptions could be verified by conducting interviews. A sample of Webmasters could be questioned about their motivations for linking. One part of the linking activity will be goal-oriented, with the goal possibly being determined by “popularity” arguments. Another part of the linking activity might have different reasons and could be summarized under “random” linking. This way, at least a percentage rate of preferential attachment processes and other processes could be estimated. Webmasters have already been questioned, but in a quite different context and not related to complex networks analysis (Hine, 2001; Park, 2002). These studies found evidence that linking behavior is related to so-called “authoritative” Web pages and that the way the Webmaster envisions different audiences, as well as the standard setting of procedures inside organizations, determines the establishment of hyperlinks. For the formal analysis that I have in mind here, one would expect to relate the diversity of different motivations to criteria which are measurable in terms of a network structure. For example, one would examine the extent to which the authority of a Web page is related to the number of hyperlinks (in-links) to that page.

In any case, the specialty of complex networks theory remains an interesting development that social scientists should be aware of. It concerns new definitions for connectivity and new indicators for network analysis. It also concerns results about the functionality of connectivity, which has implications for the accessibility of information in networks and the functional stability of this information. Possible explanations of connectivity with the help of mathematical models require further qualitative and context-bound research into the nature of complex networks.

Acknowledgments

I would like to thank colleagues at NIWI and in ASCoR who commented on this paper. In particular, I would like to thank Loet Leydesdorff for important comments on an earlier draft of this paper. I am indebted to Anne Beaulieu, co-editor of this special issue, for her comprehensive feedback that helped clarify my arguments.

Footnotes

  • 1

    See, e.g., a lecture by Peter Erdi given in 2001 for numerous graphical examples of networks http://www.rmki.kfki.hu/biofiz/cneuro/tutorials/kzoo/kzooall/.

  • 2

    See, e.g., the special issue “Networks and Complexity” of the journal Complexity, 8 (1), 2002.

  • 3

    The Institute for Scientific Information in Philadelphia (http://www.isinet.com/isi/) produces the Web of Science. The Web of Science compiles its databases (Science Citation Index®, Social Science Citation Index®, Arts & Humanities™) from about 8500 scientific journals covering all disciplines.

  • 4

    Documents include scientific articles, editorials, and book reviews.

  • 5

    The literature search involved scanning the bibliographic references in the retrieved articles, searching in Science Direct and on the Web using the phrase “complex networks.”

  • 6

    The Science and Engineering Indicators are published every two years by the NSF. For the latest on-line version see URL:http://www.nsf.gov/sbe/srs/seind02/start.htm.

  • 7

    This journal, as well as Physical Review B and Physical Review Letters, is edited by the American Physical Society; see URL http://publish.aps.org/ for more information.

  • 8

    See the mission of the journal at the URL http://www.elsevier.com/inca/publications/store////3/0/0/300.pub.htt#aimsc

  • 9

    Abstracts and keywords have been included in the Web of Science databases only since 1990. From this point onwards, it becomes easier to locate relevant documents via the Web of Science topic search facility.

  • 10

    The database is located at http://igWeb.integratedgenomics.com/IGwit; however, it is no longer publicly available. For more information see also http://www.integratedgenomics.com/genomic.html.

  • 11

    See, e.g., Huberman, Pirolli, Pitkow, & Lukose (1998) Strong regularities in World Wide Web surfing; Albert, Jeong, & Barabási (1999) The diameter of the world-wide Web; Faloutsos, Faloutsos, & Faloutsos (1999) On power-law relationships of the internet topology; Huberman & Adamic (1999) Growth dynamics of the world-wide Web; Kumar, Raghavan, Rajagopalan, & Tomkins (1999) Extracting large-scale knowledge bases from the Web; Kleinberg, Kumar, Raghavan, Rajagopalan, & Tomkins (1999) The Web as a graph: measurements, models, and methods.

  • 12

    ‘Popularity-based construction rules’ is equivalent to the so-called preferential attachment mechanisms. Hereby, new links and also new nodes connect to nodes that already have a lot of links, which are already ‘popular’. I will discuss this principle in more detail in the section “Growth and Evolution of Complex Networks.”

  • 13

    For a broader and more detailed overview see, e.g., the reviews by Dorogovtsev and Mendes (2002) and Goh et al. (2002).

  • 14

    Here the authors consider the distributions of cargo and passengers over the different airports assuming that the numbers of cargo and passengers are proportional to the number of connections that an airport has to other airports.

  • 15

    A router is an element of the Internet which is defined as “a device or, in some cases, software in a computer, that determines the next network point to which a packet should be forwarded toward its destination. The router is connected to at least two networks and decides which way to send each information packet based on its current understanding of the state of the networks it is connected to.” (searchNetworking.com, Definitions, http://searchNetworking.techtarget.com/sDefinition/0,sid7_gci212924,00.html.)

  • 16

    For each pair of nodes one can calculate the shortest path between them. This opens the possibility of considering a distribution of shortest path lengths over pairs of nodes in a network and determining an average shortest-path length. This average is called the diameter of the network. In some papers the diameter of the network instead refers to the maximal shortest path, as a characteristic of the maximal extent of a network (Dorogovtsev & Mendes, 2002).

  • 17

    For a comparison between these different laws see (Adamic, 2000).

  • 18

    “Emergence of scaling in random networks” (Barabási & Albert, 1999).

  • 19

    It should be noted here that the group around Huberman was among the first to discuss other skew distributions on the Web, such as the number of pages per site (Huberman & Adamic, 1999) or the number of users of Web pages (examining usage logs from AOL) (Adamic & Huberman, 2000). For the publications of the Huberman group see http://www.hpl.hp.com/shl/. Other authors analyzed the structure of the Internet network looking at the level of routers and domains (for a review see Dorogovtsev & Mendes, 2002).

  • 20

    See also a comment by Eugene Garfield on the history of the Matthew effect in an article about citation indexing (http://garfield.library.upenn.edu/papers/history/heritagey1998.html).

  • 21

    Some of the articles in the references are also available from the Physics e-Print archive. They can be found in the subfield “condensed matter” or “computer science.” To obtain them go to http://arxiv.org/abs/cond-mat or to http://arxiv.org/abs/cs and type in the abstract number after the slash.
