Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Hyperlink Analysis Approaches From Computer Science and Statistical Physics
  5. Hyperlink Network Analysis: Theory and Methods
  6. A Survey of Hyperlink Network Studies
  7. Hyperlink Analysis: The Webometrics Approach
  8. Conclusions
  9. Acknowledgments
  10. References

We have recently witnessed the growth of hyperlink studies in the field of Internet research. Although investigations have been conducted across many disciplines and topics, their approaches can be largely divided into hyperlink network analysis (HNA) and Webometrics. This article is an extensive review of the two analytical methods, and a reflection on their application. HNA casts hyperlinks between Web sites (or Web pages) as social and communicational ties, applying standard techniques from Social Networks Analysis to this new data source. Webometrics has tended to apply much simpler techniques combined with a more in-depth investigation into the validity of hypotheses about possible interpretations of the results. We conclude that hyperlinks are a highly promising but problematic new source of data that can be mined for previously hidden patterns of information, although much care must be taken in the collection of raw data and in the interpretation of the results. In particular, link creation is an unregulated phenomenon and so it would not be sensible to assume that the meaning of hyperlinks in any given context is evident, without a systematic study of the context of link creation, and of the relationship between link counts, among other measurements. Social Networks Analysis tools and techniques form an excellent resource for hyperlink analysis, but should only be used in conjunction with improved techniques for data collection, validation and interpretation.


Introduction

Our purpose in this paper is to review and compare two approaches to hyperlink analysis, and thereby to contribute to methodological discussions in Internet studies. The approaches and goals of these two areas have some features in common, but also important differences that it will be useful to identify and explain. Both areas have the broad goal of extracting useful information about Web use and employ the general approach of using predominantly quantitative techniques to display or summarize hyperlink-based data. This is in contrast to hyperlink analysis as conducted by statistical physicists and computer scientists, which adopts a much more abstract approach, as will be revealed in the brief review of these areas below.

The two approaches come from different fields. Hyperlink Network Analysis (HNA) derives from Social Network Analysis, whereas Webometrics derives from information science. As a result it would be natural to expect the different backgrounds to be reflected in the kinds of problems tackled, the methods developed and the outcomes sought. Field differences can also mean that there is little interaction between practitioners of each approach, and our review aims to enrich the understanding of these approaches by highlighting what they share and how they differ.

A second motivation for this review is to introduce the techniques to a wider group of researchers, in the belief that with the increasing importance of the Web for an ever-broader spectrum of human activities, online analysis techniques should be more widely developed and exploited. Hyperlink analysis provides Internet researchers with new analytical methods for the study of networked (or connected) structures on the World Wide Web. This paper both provides a review of hyperlink research that originates in hyperlink network analysis or Webometrics communities and examines practical issues related to link data-collection techniques.

Compared to other Web methods such as content-based analysis, the relative advantage of hyperlink analysis is that it can examine the way in which Web sites form relations with one another via hyperlinks. According to Weare and Lin (2000), content-based studies may be missing an opportunity if they fail to consider the hyperlinked environment. Many Web sites with common topics are hyperlinked together, which allows users to access materials or services hosted on other sites. Given this interwoven hyperlink structure, it may be necessary to recognize individual Web sites as mutually dependent entities, which together constitute a Web system. If a content analysis of individual Web sites does not include materials to which the Web site under investigation hyperlinks (such as academic reports that are hosted on other sites), it fails to see the structures in the environment that afford social navigation. Hyperlink analysis also enables visualization of navigational elements, such as changes in the hyperlink structure of the contents of a Web site, and makes possible the quantification of relational attributes among sites within a community being studied. Using this information in combination with other Web analyses (Howard, 2002; Park, 2002c) can contribute to the understanding of why and how certain types of contents come to appear on Web sites.

Hyperlink Analysis Approaches From Computer Science and Statistical Physics

The Web as a small world network

Using hyperlink data obtained from the Web, statistical physics researchers have examined the Web as an abstract network. We discuss here the work that highlights the importance of hyperlinks, while Scharnhorst (this volume) provides an extensive discussion of physicists' study of the Web and other networks. One of the first questions addressed was whether a small world network exists online in addition to those found in the offline world (Björneborn, 2001; Milgram, 1967; Watts & Strogatz, 1998). According to the hypothesis of Albert, Jeong and Barabasi (1999), if you selected two Web pages at random, you could get from one to the other by following hyperlinks on average 19 times, but Broder et al. (2000) later showed that this was incorrect: many pairs of Web pages are not joined by chains of hyperlinks.

Broder et al. (2000) examined about 200 million pages and 1.5 billion hyperlinks. Their study revealed that more than 90 percent of the sample Web pages form a single connected part if hyperlinks are treated as bi-directional. The probability that there was a hyperlink path between two randomly chosen Web pages was only 24 percent. When there was a path, there were an average of approximately 16 hyperlinks in the path between pages. The figure was much smaller, 6.83, for an undirected path. The Web sites of the universities within a country seem to follow a similar pattern (Thelwall & Wilkinson, 2003b) and a more detailed topological model has been given by Baeza-Yates and Castillo (2001). Other research that sought to measure distance between Web sites (or Web pages) supports a small world phenomenon in the pattern of hyperlink connections (Adamic & Adar, 2001).
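The distinction between directed and undirected paths can be made concrete with a small sketch. The code below is illustrative only: the four-page network and the function names are assumptions, not data or code from Broder et al. It computes, for a toy set of pages, the share of ordered page pairs joined by a hyperlink path and the mean path length over those pairs, first following links in their own direction and then treating them as bi-directional.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return {page: number of links} for every page reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_stats(adj, pages):
    """Share of ordered pairs joined by a path, and mean path length over those pairs."""
    joined, total_length = 0, 0
    for source in pages:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                joined += 1
                total_length += d
    pairs = len(pages) * (len(pages) - 1)
    mean = total_length / joined if joined else float("nan")
    return joined / pairs, mean

def undirected(adj):
    """Treat every hyperlink as bi-directional."""
    sym = {}
    for u, targets in adj.items():
        for v in targets:
            sym.setdefault(u, set()).add(v)
            sym.setdefault(v, set()).add(u)
    return sym

# Hypothetical toy Web: a -> b -> c and d -> c
links = {"a": ["b"], "b": ["c"], "d": ["c"]}
pages = ["a", "b", "c", "d"]

directed_share, directed_mean = path_stats(links, pages)
undirected_share, undirected_mean = path_stats(undirected(links), pages)
print(directed_share, directed_mean, undirected_share, undirected_mean)
```

In this toy case only a third of the ordered pairs are joined when link direction is respected, whereas every pair is joined once direction is ignored, which mirrors in miniature why the two figures reported for the Web differ so much.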

Hyperlinks for Web Information Retrieval

One factor that has generated interest in the potential to extract information from hyperlinks is their use in Web information retrieval algorithms for search engines. Google led the way, in terms of application, with its PageRank calculation designed to find the most important pages on the Web by analysing hyperlink structures (Brin & Page, 1998; Henzinger, 2001). The success of Google (Thelwall, 2002j) means that Web site designers should take into account how it analyses links when designing their site's navigational structure (Park, 2002a; Thelwall, 2002k). Google's success has also led to PageRank's adoption for bibliometrics (Thelwall, 2003c).
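The idea behind PageRank, that a page is important if important pages link to it, can be illustrated with a simple power iteration. This is a minimal sketch under stated assumptions, not Google's implementation: the three-page network is hypothetical, the damping value of 0.85 and the even redistribution of rank from pages without outlinks are conventional choices made here for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank scores for a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no outlinks spreads its rank evenly.
                share = damping * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical toy network: a and b both link to c, and c links back to a.
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

Page c, which receives two links, ranks highest; page a ranks above b because its single inlink comes from the highly ranked c, whereas b receives no links at all.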

Kleinberg's (1999) HITS algorithm uses a combination of page content and link structures to identify the most useful pages for the topic matching a search engine user's query. This is based upon the assumption that the overall link structure of the Web is not as important as that in the locality of the topic of concern. One recent clustering algorithm is that of Flake et al. (2002), which is also designed to identify local clusters of pages, but this time by link structures alone. The breakthrough with this algorithm is a technical one: it is able to solve the previously impractically complex Web clustering task. Variants of this algorithm have been produced to explore the community structure of the Web (Thelwall, 2003d).
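The link-based part of HITS can be sketched as a mutual reinforcement between two scores: a page is a good hub if it links to good authorities, and a good authority if good hubs link to it. The sketch below assumes that the topic-specific set of pages has already been selected (the content-matching step is not shown), and the page names are hypothetical.

```python
import math

def hits(links, iterations=50):
    """Return (hub, authority) scores for a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {v for targets in links.values() for v in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for t in targets:
                auth[t] += hub[source]
        # Hub score: sum of the authority scores of pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalise both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(x * x for x in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical topic neighbourhood: h1 links to p and q, h2 links to p only.
toy = {"h1": ["p", "q"], "h2": ["p"], "p": []}
hub, auth = hits(toy)
print(max(hub, key=hub.get), max(auth, key=auth.get))
```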

It is also worth mentioning that links and co-links are part of the armory of a modern search engine (Arasu et al., 2001), used in features such as the “find similar pages” function. A recent study has shown that these, along with bibliometric couplings, can be used to help identify similar Web sites (Thelwall & Wilkinson, 2004). In a complete reversal of relationships, Web site owners can also use traces from their log files to identify search engine links used to access their site, giving valuable information about sources of new visitors (Thelwall, 2001h).
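Co-links and bibliometric couplings are both simple pair counts over the same link data. As a hedged illustration (the site names are invented and the function is not taken from any of the cited studies), two sites are co-linked when a third site links to both of them, and coupled when both link to the same third site:

```python
from itertools import combinations

def colink_and_coupling(links):
    """links maps each source site to the set of sites it links to.

    Returns two dicts keyed by alphabetically ordered site pairs:
    colinks[(a, b)]  = number of sources that link to both a and b
    coupling[(a, b)] = number of targets that both a and b link to
    """
    coupling = {}
    for a, b in combinations(sorted(links), 2):
        shared = len(set(links[a]) & set(links[b]))
        if shared:
            coupling[(a, b)] = shared
    colinks = {}
    for targets in links.values():
        for a, b in combinations(sorted(set(targets)), 2):
            colinks[(a, b)] = colinks.get((a, b), 0) + 1
    return colinks, coupling

# Hypothetical link data: x and y each link to p and q; z links to q only.
toy = {"x": {"p", "q"}, "y": {"p", "q"}, "z": {"q"}}
colinks, coupling = colink_and_coupling(toy)
print(colinks)
print(coupling)
```

Higher counts for a pair suggest, but do not establish, that the two sites are similar.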

The analysis of hyperlinks in the areas of computer science and statistical physics serves as a starting point because it found that the Web is not a chaotic entity. Rather, it appears to evolve in a manner that obeys a simple mathematical law and on a large scale, to be highly structured. Our review therefore finds support in these findings, which point to the appropriateness of applying quantitative techniques to measure patterns of hyperlink connectivity on the Web.

Hyperlink Network Analysis: Theory and Methods

Theoretical background

As the computer and the Internet became increasingly important tools for social interaction and information exchange among people, the Journal of Computer-Mediated Communication published a special issue in 1997 whose theme was “Studying the Net” (see http://www.ascusc.org/jcmc/vol3/issue1/). In that issue, Garton, Haythornthwaite and Wellman (1997) and Jackson (1997) suggested that the methods of social network analysis could be applied to understand the interplay between computer-mediated communication (CMC) processes. In particular, Jackson (1997) argued that hyperlink-based social network analysis could be a strong approach for studying the representation and interpretation of the Web's communication structure.

The theoretical framework of hyperlink network analysis is based on the use and application of traditional social network analysis, which studies the relations that exist among people, organizations, and nation-states (Wasserman & Faust, 1994; Wellman & Berkowitz, 1989). Social relations are generally arranged based on exchanges among social actors. The contents of exchanges can be visible or intangible, and include manufacturing goods, knowledge, political power, citation, social support, media content, or information. The exchange takes on patterns or regularities that could not be found if social members were analyzed individually. In other words, exchange relationships among members of a social system can be represented as networks, that is, sets of ties describing their interconnections. In the field of communication, social network analysis examines the relationships among a social system's components (generally the individual) based on the stable patterns of use of the communication system (consisting of channel/media, message, and symbol) (Monge & Contractor, 2000; Richards & Barnett, 1993; Rogers & Kincaid, 1981).

With the introduction of computer technologies into society, several researchers have examined CMC networks among computer conference users (Danowski & Edison-Swift, 1985; Rice, 1982, 1994; Rice & Barnett, 1986). Following this approach, Paccagnella (1998) used the social network approach to examine the structural communication patterns among people using Italian cyberpunk computer conference systems such as bulletin boards. Marc Smith (1999c) and others (Kang & Choi, 1999; Paolillo, 2001) have analyzed the network of text message flows among Internet users. In relation to a scholarly setting, some studies (Koku, Nazer, & Wellman, 2001; Haythornthwaite & Wellman, 1998; Matzat, 2001; Walsh & Maloney, 2002) have employed network analysis to examine the pattern of communication relations and media use among researchers. A number of factors were examined: working and social relations such as scientific productivity, the frequency of collaborative communication, the information exchange relationships, and the types of communication technologies used. Also, a series of studies conducted by Haythornthwaite (2000, see http://alexia.lis.uiuc.edu/~haythorn/) and the Human-Computer Interaction group of Cornell University (Gay et al., 2001) have investigated online communication networks among students from the perspective of social networks. In contrast to these studies, some research (Hampton & Wellman, 2000; Matei & Ball-Rokeach, 2001) extended the role of CMC networks to offline life. They examined online social ties among people in relation to social interactions in their offline world.

As claimed earlier, the Web forms an important CMC network that may contain social networks (Wellman, 2001). As social members start to use hyperlinks to create and maintain their personal or organizational ties online as well as offline, a social network connected by hyperlinks becomes a part of CMC networks. In other words, a hyperlink network can be described as a specific type of CMC network, in which Web site authors are interconnected by hyperlinks (Park, in press).

A hyperlink is a technological capability that enables, in principle, one specific Web site to connect seamlessly with another. The shared (bilateral or unilateral) hyperlinks among Web sites allow documents and pictures to be referred to through the Web. The information or contents may be ‘transmitted’ through the simple click of a mouse (Pirolli & Card, 1999). A hyperlink between two Web sites functionally brings the two sites closer together. While any individual or organization has complete freedom in choosing the hyperlinks on its Web site, hyperlink structures are likely to be designed, sustained, or modified by Web site creators to reflect their communicative choices and agendas (Jackson, 1997; Park, 2002). That which binds together the nodes of the Web, Web sites, can be social networks as well as technological components (Kling, 2000). From this perspective, we can potentially discern fingerprints of social relations through the analysis of configurations of hyperlink interconnections among Web sites that represent a social system's components such as people, private companies, public organizations, cities, or nation-states.

Analytical method

Hyperlink network analysis uses a set of analytical techniques and tools (e.g., density, centrality, cluster analysis, block modelling, and multi-dimensional scaling) derived from social network analysis. The difference between hyperlink and traditional network analysis is the use of hyperlink data that can be obtained only from Web sites. The basic hyperlink network data set is an n x n matrix S (also called a 1-mode network matrix), where n equals the number of nodes in the analysis. In hyperlink network analysis, the nodes are Web sites that represent social actors such as people, groups, organizations, cities, or nation-states. Each cell, s_ij, indicates the absence or presence, or the frequency, of hyperlinks from node i to node j. For example, s_ij could be a zero or a one depending on whether there is a hyperlink between nodes i and j. The strength of the relationship can also be expressed if each cell records how many hyperlinks exist between two nodes. S is symmetrical (s_ij = s_ji) when one is not concerned with the directionality of the hyperlinks. In those instances when the source and receiver of the information are differentiated, S is asymmetrical (s_ij ≠ s_ji for at least some pairs). Alternatively, a 2-mode hyperlink network matrix, an m x n matrix, may be made. In this matrix, the rows usually represent the types of hyperlinking Web sites and the columns the contents of hyperlinks. A 2-mode matrix can be converted to a 1-mode matrix depending on the research questions of interest. In the new converted matrix, the value in the ijth cell is the number of hyperlinks for which both Web site i and Web site j contain the same content.
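The matrices described above can be sketched in a few lines. The code below is an illustrative sketch only: the site names, the link list, and the function names are hypothetical, and symmetrizing by taking the larger of the two directed cells is one possible convention among several (summing the two cells is another).

```python
def link_matrix(sites, links, binary=True):
    """Build the n x n matrix S from a list of (source, target) hyperlinks.

    With binary=True each cell is 0 or 1; otherwise it counts the links.
    """
    index = {site: k for k, site in enumerate(sites)}
    S = [[0] * len(sites) for _ in sites]
    for source, target in links:
        i, j = index[source], index[target]
        if i != j:  # links within a site are ignored
            S[i][j] = 1 if binary else S[i][j] + 1
    return S

def symmetrize(S):
    """Ignore link direction so that s_ij equals s_ji (assumed convention: max)."""
    n = len(S)
    return [[max(S[i][j], S[j][i]) for j in range(n)] for i in range(n)]

def one_mode(B):
    """Convert an m x n 2-mode matrix into an m x m 1-mode matrix.

    Cell (i, j) counts the columns in which rows i and j are both non-zero.
    """
    m = len(B)
    return [[sum(1 for x, y in zip(B[i], B[j]) if x and y) for j in range(m)]
            for i in range(m)]

# Hypothetical data: site a links to b twice, and b links to c once.
sites = ["a", "b", "c"]
links = [("a", "b"), ("a", "b"), ("b", "c")]
S_binary = link_matrix(sites, links)
S_valued = link_matrix(sites, links, binary=False)
S_undirected = symmetrize(S_binary)

# Hypothetical 2-mode matrix: rows are sites, columns are kinds of link content.
B = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
S_shared = one_mode(B)
```

Here S_binary is asymmetrical because the links run in one direction only, while S_undirected satisfies s_ij = s_ji by construction.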

One of the important outcomes of hyperlink network analysis is to identify a central node, in this case, a central Web site, generally defined as the site that provides the most and/or shortest connections to other members within the group (Scott, 1991; Wasserman & Faust, 1994). The central Web site usually plays the role of hub, broker, and authoritative or prestigious site. There exist various centrality measures. Bonacich's eigenvector centrality is often used as a global indicator in hyperlink network analysis. It is appropriate in those instances where the hyperlink network is symmetrically interconnected and the frequencies of the hyperlink connections among Web sites are not binary and are relatively dense (Bonacich & Lloyd, 2001). However, this measure provides an inadequate description of a directional (or asymmetrical) network. As a result, directional hyperlinks can be analysed using Freeman's degree centrality, which measures the number of a Web site's direct hyperlink connections with others in the group (Freeman, 1979). Freeman's measure consists of incoming and outgoing degree centrality. In hyperlink network analysis, indegree centrality is calculated from the number of hyperlinks a Web site receives from the other sites, while outdegree centrality is determined by the number of hyperlinks originating from a site. Besides these values, there are Freeman's closeness and betweenness centrality measures (Freeman, 1979). Closeness centrality is used to determine which Web site has the shortest path to all others in the group (Freeman, 1979). Betweenness centrality refers to the frequency with which a Web site falls between pairs of other sites in the group and represents the potential for control of communication, as a broker or a gatekeeper (Freeman, 1979).
Finally, Richards' Negopy centrality is the mean number of hyperlink connections required to reach each of the other Web sites in a group, such that the lower the value the more central the site (Richards, 1995). The majority of group Web sites, such as the websites of members of a university department, are connected to the central site so that Internet users can navigate with fewer links when going through it.
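Two of these measures are simple enough to sketch directly from a binary link matrix. The example below is an assumption-laden illustration rather than the procedure of any particular software package: it computes indegree and outdegree by counting non-zero cells, and the mean number of links needed to reach every other site by breadth-first search. The four-site "department" network is hypothetical.

```python
from collections import deque

def degree_centrality(S):
    """Return (indegree, outdegree) lists for a binary or valued link matrix S."""
    n = len(S)
    outdegree = [sum(1 for j in range(n) if S[i][j]) for i in range(n)]
    indegree = [sum(1 for i in range(n) if S[i][j]) for j in range(n)]
    return indegree, outdegree

def mean_distance(S, source):
    """Mean number of links needed to reach each other site from source.

    Returns None if some site cannot be reached. Lower values mean a more
    central site; the reciprocal behaves like a closeness score.
    """
    n = len(S)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if S[u][v] and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) < n:
        return None
    return sum(dist.values()) / (n - 1)

# Hypothetical department: site 0 is the main page; the three member
# sites link to it and are linked from it, but not to each other.
S = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
indegree, outdegree = degree_centrality(S)
distances = [mean_distance(S, k) for k in range(len(S))]
print(indegree, outdegree, distances)
```

Site 0 has the highest indegree and outdegree and the lowest mean distance, which is the pattern expected of a central site through which users can reach the rest of the group in fewer steps.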

There are other procedures frequently used in hyperlink network analysis. Alongside centrality indicators, centralization and density are useful for examining overall characteristics of the network. Centralization means the extent to which a hyperlink network is organized around its central Web sites (Scott, 1991; Wasserman & Faust, 1994). Density indicates the overall level of network integration (Freeman, 1979; Wasserman & Faust, 1994). While centralization is the proportion of other Web sites' connections with a central site, density reflects how sites are connected to one another in an entire network. Next, a cluster analysis identifies those groupings of Web sites that best represent their hyperlinked relations, producing central and periphery groups in terms of density within each cluster (Aldenderfer & Blashfield, 1984). While cluster analysis sorts Web sites into discrete clusters based on the existence of direct or indirect connections among themselves, block modelling groups together sites that occupy similar positions (Burt, 1992). Web sites in the same block show similar patterns of hyperlink connections to others; they need not be linked to each other. Multi-dimensional scaling methods such as correspondence analysis (Barnett, 1993; Torgerson, 1958) are able to reveal the positions that nodes occupy in space. A matrix of hyperlink connectivity is converted to two- or three-dimensional coordinates and a graphic representation, that is, a map, is drawn. A quadratic assignment procedure (QAP) examines the association between two networks or different attributes of the same network (Krackhardt & Porter, 1986). Since it is able to test whether two data matrices are similar to each other, it is often used to assess whether a hyperlink network among Web sites indicates other relations among the creators of sites.
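Density and the QAP comparison lend themselves to a short sketch. The code below is a simplified illustration, not the routine of any specific network analysis package: density is taken as the share of off-diagonal cells that are non-zero, and the QAP test correlates the off-diagonal cells of two matrices and then repeatedly relabels the nodes of one matrix (permuting its rows and columns together) to see how often a correlation at least as large arises by chance. The six-site data, the group labels, and the number of permutations are all assumptions.

```python
import random

def off_diagonal(M, order=None):
    """Off-diagonal cells of M, optionally after relabelling nodes by order."""
    n = len(M)
    order = order or list(range(n))
    return [M[order[i]][order[j]] for i in range(n) for j in range(n) if i != j]

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def density(M):
    """Share of possible directed ties (off-diagonal cells) that are present."""
    cells = off_diagonal(M)
    return sum(1 for c in cells if c) / len(cells)

def qap(A, B, permutations=500, seed=0):
    """Observed correlation of A and B, and the share of random relabellings
    of A whose correlation with B is at least as large."""
    rng = random.Random(seed)
    observed = pearson(off_diagonal(A), off_diagonal(B))
    order = list(range(len(A)))
    at_least = 0
    for _ in range(permutations):
        rng.shuffle(order)
        if pearson(off_diagonal(A, order), off_diagonal(B)) >= observed:
            at_least += 1
    return observed, at_least / permutations

# Hypothetical data: six sites in two groups of three.
group = [0, 0, 0, 1, 1, 1]
B = [[1 if i != j and group[i] == group[j] else 0 for j in range(6)]
     for i in range(6)]          # shared-membership matrix
A = [row[:] for row in B]        # hyperlink matrix, mostly within groups
A[0][2] = 0                      # one within-group link is missing
A[2][3] = 1                      # one link crosses the groups

observed, p_value = qap(A, B)
print(density(A), round(observed, 3), p_value)
```

A small share of permutations matching or exceeding the observed correlation would suggest that the resemblance between the hyperlink matrix and the membership matrix is unlikely to be due to the labelling of nodes alone.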

A Survey of Hyperlink Network Studies

This section reviews prior hyperlink research, conducted from the perspective of social network analysis within the topics of e-commerce, social movements, interpersonal, interorganizational, and international communication. The underlying belief is that the structural patterns of hyperlink connectivity can serve a particular social or communicative function.

E-commerce

Krebs' (2000) study of Amazon.com indirectly revealed the role of hyperlinks in relation to mutual trust among online consumers. Amazon.com provides customers with information of the following type: people who bought this book also bought these books, with appropriate hyperlinks so that prospective customers can take a look at the related books directly. Krebs argued that the fact that people with similar interests bought those books contributes to persuading prospective consumers to buy them. Choosing a specific book as a focal node, he built a hyperlink network among books, assuming that books indirectly represent people who bought them and thus that the network can be seen as a social network. Palmer, Bailey, and Faraj (2000) also used the hyperlink method to examine e-commerce. When purchasing a commodity online, a consumer's trust in a Web site has been regarded as one of the most influential factors in the transaction process. Based upon this theory, Palmer et al. used the number of inbound hyperlinks to a Web site as an indicator of the trust bestowed upon an Internet firm. They obtained their data from Alexa.com. As a Web information company, Alexa provides a variety of statistics about individual Web sites. The results obtained by Palmer et al. indicated that the number of incoming hyperlinks was strongly related with the use and prominence of TTPs (Trusted Third Parties) and privacy statements that may be regarded as further trust indicators. Their research method was similar to a traditional network analysis that measures an individual's prestige in terms of the number of the friends who choose the person as their representative. Davenport and Cronin (2000) see hyperlinks in general as potential new sources of trust in their targets, in a way that is similar to the function of references in scholarly literature. 
It is also possible to further categorize the type of trust granted to Web sites by examining the context of hyperlinks (Beaulieu & Simakova, 2002).

Social movements

The configuration of hyperlink networks itself can convey useful overall information about the landscape of certain issues in a society. Adamic (1999) explored the hyperlink structure of Web sites that dealt with the abortion issue. Her results revealed differences between groups representing different positions in the debate, in terms of the structures of the sites and in terms of the density of connections between sites. Rogers and Marres (2000) analysed a hyperlink network among Web sites run by nongovernmental organizations (NGOs), governmental bodies, and private companies such as car manufacturers involved with the climate change issue on the Web. They found that the organizations were producing a symbolic representation of their alliance through the selection of hyperlinks. For example, NGOs tend to hyperlink to governmental institutions including the United Nations, but not to any private players. Also, Hine's (2000) ethnographic analysis of the trial of British nanny Louise Woodward shows that tracing the complex hyperlink connections between the different Web sites related to a particular topic may help in understanding agenda setting and building processes. It also paints a picture of the way relations are formed and maintained between actors, resulting in the creation of hyperlinks. More recently, Halavais and Garrido (2003) described the hyperlink network structure of NGO Web sites related to the Zapatista movement and examined their role in the global NGO networks. Their study presents another good example of using a hyperlink network analysis to tease out the structure of a social movement, based on the distribution of links among the activists' Web sites. Thus, the topology of hyperlink relations among the Web sites of social organizations addressing a common issue enables researchers to discern what types of social issues are present and prominent through Web networks.
The frequency or intensity of hyperlinks among a group of personal or organizational sites devoted to a social issue may provide clues to the salience of the topic in the community. Furthermore, this representation can also indicate the underlying relationships between a site creator's position in an offline world and that of their Web presence.

Although recent studies on small world, Web information retrieval, e-commerce and social movements have tended to adopt hyperlink analysis, there is a lack of investigation into social networks among the nodes. In other words, no systematic examination of how hyperlink networks among the Web sites (or pages) reflect social relations among their producers has been undertaken. On the other hand, another body of studies that follows has examined how individual social actors' lives, embedded in a Web environment, are inscribed on hyperlink networks.

Interpersonal communication

Park, Barnett, and Kim (2000, 2001) analyzed hyperlinks among South Korea's political parties and National Assemblymen. The research was carried out to examine whether the hyperlink structure reflects homophilous attributes among political actors. A large body of the communication network research (Monge & Contractor, 2000; Rogers & Bhowmik, 1971) suggests the importance of homophily, that is, the tendency to select communication partners who are similar to oneself. Pointing out that the impact of homophily seems unlikely to diminish in cyberspace, Park et al. (2000, 2001) developed a site-by-site matrix of hyperlinks and compared the hyperlink connectivity matrix with a shared party membership matrix using QAP. They found that the structure of their hyperlink network is significantly related to party membership. In other words, politicians and their constituencies form a community of “birds of a feather” (McPherson, Smith-Lovin, & Cook, 2001). In particular, they argued that the South Korean culture based on “Confucianism” had a great impact on strengthening homogeneity among political actors on the Web. Thus, we can interpret the social and communicational structure among those social actors based on hyperlink structure.

Interorganizational communication

Many organizations have created their own Web sites, regardless of whether their activities, services, or products are concerned with the Internet (Shapiro & Varian, 1999). Some organizations such as Yahoo.com largely exist as an independent entity on the Web. Web sites may be regarded as organizations themselves. Thus, interorganizational networks can be formed via shared hyperlinks. For example, an organization's Web site tends to offer hyperlinks to other sites because they have a certain kind of business partnership. Bae and Choi (2000) employed hyperlink network analysis among Web sites to capture the structure of hyperlink-mediated communication between 402 human rights NGOs. They found that many NGOs form a hyperlink network with others according to the similarity of their aims or activities, rather than geographic location. In Thelwall's (2001a) research, business hyperlinks were the most common type of external hyperlink. In the sample, 72 out of 232 sites, or 31 percent, were found to have hyperlinks based on affiliated business relations. Using a selection of British universities, Web directories, and large computing companies such as Netscape and Oracle, Thelwall (2001b) found that the business relationships among organizations are distributed throughout their Web sites via the hyperlink selection. Further, Park, Barnett, and Nam (2002a) analyzed the affiliation network among 152 commercial Web sites in South Korea. They created a site-by-site relation matrix based on the existence of hyperlinks in a Web page titled affiliated programs. They measured centralities and found that the clustering structure of the hyperlink-affiliation network was influenced by the financial Web sites (such as credit card companies' sites) with which others are affiliated. Park et al. (2002a) explained that the major revenue sources of commercial sites are advertising and e-commerce. Payment by credit card is the most common transaction on the Internet.
Also, financial companies such as banks play important roles as trusted parties in credit card transactions. Their high centrality scores can be interpreted as the degree of involvement of Web sites in commercial activities: the more business they perform, the more likely that the financial company plays an important role in the network. Thus it appears that hyperlink networks articulate a wide range of other relationships.

International communication

Finally, hyperlink research in the field of international communication also supports the notion that the configurations of hyperlinks contain sufficient information to infer social relationships inscribed into the offline world. The process of globalization may be understood from the perspective of World System Theory (Wallerstein, 1976), which argues that global society may be characterized by an unequal exchange between powerful information-rich and information-poor countries (Barnett, 2001; Hargittai, 1999).

Halavais (2001) examined the role of geographic borders in cyberspace using the hyperlink pattern of Web sites. His approach was to take a sample of 4,000 Web sites, analyze their external hyperlinks and determine the total percentage of hyperlinks from the sites to various countries. He found that Web sites are more likely to hyperlink to sites hosted in the United States than to sites anywhere else in the world. Similarly, Brunn and Dodge (2001) analyzed the inter-domain hyperlinks among Web sites belonging to 174 geographic TLDs (top-level domains, such as .ca for Canada). Their research revealed that the most and least connected regions and countries are somewhat similar to the center-to-periphery dimension in World System literature. Barnett, Chon, Park, and Rosen's (2001) research examined hyperlinkage patterns among the 29 OECD (Organization for Economic Cooperation and Development) countries. The United States was found to be the most central country, and Iceland and Turkey most peripheral. There was also a correlation between each country's centrality in the hyperlink network and its GDP, and they also found an association between the strength of the hyperlink network and other social and communicational networks.
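Tabulating link targets by country can be approximated, very crudely, by counting the top-level domain of each target URL. The sketch below is an illustrative assumption and not the method of the studies cited above: the URLs are invented, and a TLD is only a rough proxy for location, since generic domains such as .com carry no country information.

```python
from collections import Counter
from urllib.parse import urlparse

def tld_shares(urls):
    """Percentage of link targets falling under each top-level domain."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname
        if host and "." in host:
            # The TLD is taken to be the text after the last dot of the host name.
            counts[host.rsplit(".", 1)[1].lower()] += 1
    total = sum(counts.values())
    if not total:
        return {}
    return {tld: 100.0 * n / total for tld, n in counts.items()}

# Hypothetical external link targets collected from a sample of sites.
example_links = [
    "http://www.example.com/page",
    "http://example.ca/",
    "http://www.example.co.uk/index.html",
    "http://example.org/a",
]
shares = tld_shares(example_links)
print(shares)
```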

More recently, Park (2002b, 2002c) used hyperlink network analysis to study international scientific communication and transnational information flow in the Asian region. His research differed from the above studies in that it did not rely on World System theory. Under the assumption that hyperlinks can be regarded as a form of communication network connecting individual researchers, Park (2002b) compared the structure of (incoming and outgoing) hyperlinks embedded in university Web sites hosted in 10 Asian countries with co-authorship patterns among those countries. He found that the two network structures were significantly correlated with each other. He argued that the academic hyperlink network represented one aspect of research communications across national borders. Another of Park's studies (2002c) examined an international information flow system from South Korea to Taiwan based on the Web. Similar social and historical contexts (such as the colonial rule of Japan, a confrontational stance towards communist countries, and rapid economic growth) had tied South Korea and Taiwan together for many years. However, their relationships have taken a different path since South Korea opened official diplomatic relations with China in 1992, in the name of the “one China” policy. Motivated by this interesting background, Park analyzed the structure of hyperlink connectivity between South Korea and Taiwan, and found that the hyperlink network was very sparsely connected in terms of the number of South Korean Web pages hyperlinking to the pages of the other country. Because both countries are heavily dependent on exportation of computer products and Internet communication technologies, however, they have to weigh the costs and benefits of further economic cooperation with each other. This might influence such a hyperlink configuration. Park's research implies that some aspect of the relationship of the two countries can be described by Web hyperlinks.

Issues: Units of analysis

Selection of a unit of analysis depends on the research question. The possible units are geographic TLDs (top-level domains) such as .kr (for South Korea), secondary domains (e.g., Oxford at http://www.ox.ac.uk), and Web documents (such as HTML-formatted Web pages, Web-accessible PDF files or PowerPoint slides). For instance, geographic TLDs will be suitable to examine international communication and trans-national knowledge flow in the age of globalization. Because a geographic TLD usually represents a single country, this can be a reasonable approach. It is known, however, that some domains are heavily used by international organizations, including .to and .tv, and that related errors can occur when using the link: command (Smith, 1999b). A more detailed explanation of problems involved in using the link: command will be provided in the section on Webometrics. In the case of inter-organizational networks among dot-com companies, secondary domains may be proper units of analysis since they are regarded as organizations themselves (Park et al., 2002ab). Similarly, research on informal communication between individual scholars can employ the scholar's homepage as the unit of analysis. For those who are interested in formal communication among researchers such as citing behavior, Web documents can be taken as units. In scientific communication, scholarly documents represent their authors so that research articles on the Web can be reliable units of analysis.

Issues: Data gathering

Data on hyperlink networks between Web sites can be obtained in three ways: 1) observation, 2) computer-assisted measurement, and 3) a combination of 1 and 2. There is no doubt that observation has been a central measurement tool for gathering network data. Nevertheless, the use of human coders has limitations. It requires a researcher to surf Web sites and many Web pages within each site carefully. When it is used for a large number of sites, there is a high labor cost as well as the possibility of coding errors. For such cases, computer-assisted measurement may offer significant advantages. Past research in the field of CMC has used computer-assisted tools to gather social network data (Hampton, 1999). In order to collect hyperlink data, some researchers have written Web crawler programs to collect the data directly from Web sites (Halavais & Garrido, 2003; Terveen & Hill, 1998; Thelwall, 2001ef). Although this process seems to be more effective than traditional observation methods, it is also problematic because access to these programs is limited. This prevents other researchers from replicating previous research. In response to this problem researchers can make their full data sets freely available online for others to verify their findings (Thelwall, 2001e). An additional problem is that Web crawlers run with slight differences in their operating parameters and so no two crawlers can guarantee identical coverage of a Web site, even if running at the same time. In fact, the difficulty in deciding on data-gathering tools is a common issue for the Internet research community (Jones, 1999; Mann & Stewart, 2000). In order to determine the validity and reliability of a research method, a data-collecting tool needs to be dependable and accessible at an affordable price. The measurement tool should be available to the researcher without any serious barriers. 
Alternatively, search engines have been used as tools to trace the hyperlinks among Web sites (Adamic & Adar, 2001; Brunn & Dodge, 2001; Ciolek, 2001; Ingwersen, 1998; Park, Barnett, & Nam, 2002b; Thelwall, 2001a), but see the Webometrics section below and Wouters and Gerber (this volume) for reservations about their use. They are less satisfactory as a data source because they are not under the control of the researcher: the crawling parameters under which they operate are commercial secrets, unknown to the research community. For example, in a large set of sites it is likely that a significant percentage will not be covered at all by any given search engine (Lawrence & Giles, 1999; Thelwall, 2001c), creating potential problems for data analysis. AltaVista (http://www.altavista.com) is a good example. It records incoming and outgoing hyperlinks separately. It should be noted that none of the search engines commonly used produces outcomes tailored for hyperlink network analysis. Based upon the results generated by search engines, a researcher needs to compile the results into matrices. More detailed technical issues related to data gathering will be discussed again in the section on Webometrics.
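
The step of compiling raw link counts into a matrix can be sketched as follows. This is a minimal illustration rather than a tool used in any of the studies cited: the function names (build_link_matrix, indegree, outdegree) and the input format, a dictionary of counts keyed by (source site, target site), are assumptions made for the example.

```python
def build_link_matrix(sites, link_counts):
    """Return a square matrix m in which m[i][j] is the number of
    hyperlinks from sites[i] to sites[j].

    link_counts maps (source_site, target_site) pairs to counts, for
    example as transcribed from search engine or crawler output.
    """
    index = {site: i for i, site in enumerate(sites)}
    matrix = [[0] * len(sites) for _ in sites]
    for (source, target), count in link_counts.items():
        if source == target:
            continue  # skip site-internal links; only inter-site ties are kept
        matrix[index[source]][index[target]] += count
    return matrix


def indegree(matrix):
    """Column sums: hyperlinks received by each site."""
    return [sum(row[j] for row in matrix) for j in range(len(matrix))]


def outdegree(matrix):
    """Row sums: hyperlinks sent by each site."""
    return [sum(row) for row in matrix]
```

The resulting matrix is directed, so incoming and outgoing hyperlinks remain separate, and it can be passed to standard network analysis software for centrality calculations.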

Issues: Hyperlink creation motivations

Some hyperlink network analyses tend to assume that the motivations of hyperlinking to another person's or institution's Web site are recommendation or endorsement of the Web site. But hyperlinks may be based on hostile relations rather than cooperative alliances among Web site producers (Sunstein, 2001). Especially in satirical political Web sites, some sites hyperlink to others as part of a sarcastic and negative criticism, while the other sites do not reciprocate the hyperlink so as to avoid the appearance of association. In the case of issue networks, the motivations for link creation are not so much attributable to the authority of the hyperlinked Web sites, but rather to the relevance of sites involved.

Terveen and Hill (1998) examined the number of hyperlinks between Web sites as an indicator of the quality of sites, and found that hyperlink connectivity had a significant relationship to experts' quality judgments of sites. In particular, the indegree connectivity of a site was positively correlated with these judgments. Park et al. (2002b) also regarded the number of hyperlinks incoming to a Web site as an indicator of site credibility. They empirically tested associations among hyperlink network structures, the number of visitors and Internet users' perceptions of the Web site's credibility, using a sample composed of 50 South Korean Web sites. They found that a site's incoming centrality in the hyperlink network was significantly related to visiting behavior and perceived Web site credibility. Also, Park (2002a) conducted a survey with 64 Korean Webmasters to examine the reasons Web sites form hyperlink networks with other sites. The results indicated that Webmasters require that the credibility of hyperlinked Web sites be higher than average when deciding to hyperlink to them. Thus, the more credible the Web site, the more incoming links the Web site has. The more incoming links, the more visitors the Web site has, and vice versa. The results of past research support these contentions (Park et al., 2002b; Pirolli & Card, 1999; Terveen & Hill, 1998).

The findings reported above may indicate that Web site credibility or reputation functions as a causal factor or antecedent influencing the hyperlinking pattern among Web sites. At a practical level, there is no quick and easy method for evaluating a very large number of Web sites or documents (regardless of their types) except for hyperlink counts. Thus, hyperlink analysis, though imperfect, may be the best current filtering system.

Although a number of issues remain unresolved, hyperlink network analysis is certainly a worthwhile method to analyze various kinds of information obtained from the Web. It enables researchers to identify an invisible network in the field of interpersonal and organizational communication. Hyperlink network analysis has rendered visible a latent network among people or organizations that might not appear when focusing only on the organization and its members' relationships. Also, hyperlink network analysis has the advantage of being unobtrusive (Webb, 1966). This can avoid sensitive issues that result from obtrusive observation on the Internet, such as monitoring, physical fatigue, and privacy.

Hyperlink Analysis: The Webometrics Approach

Background

Information science interest in hyperlinks started in 1996 and has been mainly driven by analogies with citations in journal articles. The fields of bibliometrics and scientometrics make extensive use of citations both to help assess the quality of academic work and to trace patterns of scholarly communication (Borgman & Furner, 2002; Wouters, 1999). The underlying assumptions are that more important, or higher-quality, articles will tend to be cited more, and that citations often indicate that the work in the cited article has been built upon or otherwise used by the citing article (Cronin, 1984). In fact, reasons for citing are extremely diverse (Borgman & Furner, 2002) but citation analysis remains an effective, if controversial, tool that is used for a variety of purposes (Garfield, 1979; Oppenheim, 1997; Moed, 2002).

Two early articles examined the use of hyperlinks to track Web information (Larson, 1996; Rodríguez Gairín, 1997). An extensive theoretical discussion by Almind and Ingwersen (1997) also set the foundations for and gave a name to the new field of Webometrics. The event that triggered Webometrics was the deployment by commercial search engines such as AltaVista of an interface that allowed anybody to count links between large web spaces with a simple command. This made it possible to think about creating techniques to exploit this new facility and to begin to speculate about and investigate potential new applications. The information scientists who noted this potential naturally turned to their own disciplines to look for applications, and the apparently close analogy between hyperlinks and citations, both being the referencing of one document by another, gave them a ready-made set of research questions and techniques through the adaptation of citation analysis.

Rousseau (1997) popularized the term sitation for a Web hyperlink, foregrounding the citation analogy. Aguillo (1998) at the same time started the e-journal Cybermetrics and began an extensive investigation into various aspects of Web and Internet use, including hyperlinks. The analogy between hyperlinks and citations has continued to generate interest within information science, including speculations about the kind of information that they could reveal in different contexts (Borgman & Furner, 2002; Björneborn & Ingwersen, 2001; Cronin, 2001; Davenport & Cronin, 2000; Thelwall, 2002i).

The starting point for Webometrics, then, was the attempt to apply citation analysis to the Web context. Since citation analysis tracks (to some extent) scholarly communication, some researchers have sought to use hyperlink counts as a measure of the extent of online communication between the owners of two or more sets of Web pages. Other citation analysis attempts to evaluate bodies of work through their citation counts, which has led to a second type of Webometrics approach: to see whether link counts can be valid measures of online impact. This has led to investigations into whether pages attract hyperlinks primarily for the quality or interest level of their contents, so that hyperlink counts would measure some kind of online impact.

The starting points in terms of methods are not just formulae and algorithms for computing useful information, but also a wide range of data validation techniques, partly a legacy of the continued controversy surrounding evaluative citation analysis.

Data collection: Web crawlers and commercial search engines

Early Webometric studies used commercial search engine advanced queries to obtain hyperlink counts, primarily AltaVista and AllTheWeb. For example, entering the query host:wlv.ac.uk AND link:knaw.nl into the AltaVista advanced query section could be used to count the number of pages with domain names ending in wlv.ac.uk that contain a hyperlink that includes the text knaw.nl. This could be expected to give a count of Wolverhampton University (http://www.wlv.ac.uk) pages that hyperlink to the Koninklijke Nederlandse Akademie van Wetenschappen (KNAW, Royal Netherlands Academy of Arts and Sciences, http://www.knaw.nl), either the main site or a subdomain in each case. Such queries gave easy access to useful data from the huge search engine databases. One well-known drawback is that no search engine can index the whole Web (Thelwall, 2002h), and actual coverage in 1999 appeared to be below 16% for the major search engines of the time (Lawrence & Giles, 1999). If a commercial search engine is used, then clearly this is a limitation that must be accepted and discussed in the study.

A second major problem, however, was immediately discovered: that the results returned by search engines fluctuated irregularly, sometimes dramatically (Bar-Ilan, 1999; Mettrop & Nieuwenhuysen, 2001; Rousseau, 1999; Snyder & Rosenbaum, 1999; Thelwall, 1999). In order to combat this, Rousseau (1999) proposed multiple search rounds and an averaging process.

Additionally, search engines have exhibited peculiarities in behavior in the past. Snyder and Rosenbaum (1999) reported that AltaVista had the following problems:

The AltaVista metaterm ‘link’ is intended to retrieve the total number of pages each of which has at least one link to a specified page. In practice, the metaterm frequently fails to retrieve all, or even most of the linkpages. Conversely, the ‘link’ command sometimes retrieves pages that do not contain the link specified. (p. 380).

A more recent systematic comparison of AltaVista's results with those of a specialist crawler found it to be highly reliable for UK university sites (Thelwall, 2001a). This particular problem seems to have disappeared, although there is no guarantee that other problems will not occur in the future.

A logical alternative is to create a specialist information science Web crawler to ensure reliable access to data, as called for by Bar-Ilan (2001). Such a tool has now been developed and extensively used (Thelwall, 2001ef) but this is not on a scale to rival commercial search engines, being capable of crawling all universities in a single country within a month, but not capable of giving significant international coverage. As a result, and helped by improvements in search engine reliability (Thelwall, 2001g; Vaughan & Thelwall, 2003), subsequent research has used both approaches.

In situations where the results are suspected to be unreliable, or because the query sent cannot be specific enough to capture the data, human observation can be used either as an additional filtering step, or a random sample can be taken to estimate the accuracy of the results (Cronin et al., 1998). This approach could be used, for example to ensure that the hyperlink still connected to the desired Web site (or Web page) in case the site (or page) had moved or been deleted since the hyperlink was created. In this case, error messages such as “The server does not have a DNS entry” or “404 URL Not Found” would often be obtained.
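
The random-sample check can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure of Cronin et al. (1998): the function names are hypothetical, the human judgments are supplied as a list of True/False values, and the standard error uses the usual normal approximation for a sample proportion.

```python
import math
import random


def draw_sample(result_urls, sample_size, seed=None):
    """Pick a simple random sample of result URLs for human checking."""
    rng = random.Random(seed)
    return rng.sample(list(result_urls), sample_size)


def estimate_accuracy(judgments):
    """Estimate the share of correct results from human judgments.

    judgments holds one boolean per sampled result: True if the coder
    confirmed that the page really contains the expected hyperlink.
    Returns (proportion_correct, standard_error).
    """
    n = len(judgments)
    if n == 0:
        raise ValueError("at least one judgment is required")
    p = sum(1 for ok in judgments if ok) / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p, standard_error
```

A researcher might, for example, draw 50 results, have a coder visit each page, and report the estimated accuracy together with its standard error alongside the raw search engine counts.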

Data analysis methodologies

A key article that provoked many follow-up investigations was that of Ingwersen (1998), which introduced the Web Impact Factor (WIF). This is a metric designed to assess the impact of an area of the Web based upon counting the number of hyperlinks to it. In fact many different variants were proposed and tested, but the most successful initially were the external absolute WIF and the external relative WIF. The former is simply the number of pages outside the area being measured that contain a hyperlink to it. The latter is this figure divided by the number of pages in the target area. Both of these omit hyperlinks between pages within the target area, since these are often created for navigation purposes and are therefore not useful indicators of external impact.
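
The two external WIF variants described above can be computed from a list of page-level links roughly as follows. This is a sketch under assumptions: the area of the Web is identified by a domain suffix, the links are given as (source URL, target URL) pairs, and the function names are invented for the example.

```python
from urllib.parse import urlparse


def in_area(url, area_domain):
    """True if the URL's host is area_domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == area_domain or host.endswith("." + area_domain)


def external_wifs(links, area_domain, pages_in_area):
    """Return (external absolute WIF, external relative WIF).

    The absolute WIF counts distinct pages outside the area that
    contain at least one hyperlink into it. The relative WIF divides
    that count by the number of pages in the target area. Links that
    start inside the area are ignored.
    """
    linking_pages = {
        source
        for source, target in links
        if in_area(target, area_domain) and not in_area(source, area_domain)
    }
    absolute = len(linking_pages)
    relative = absolute / pages_in_area
    return absolute, relative
```

Because linking pages are collected in a set, a page with several hyperlinks into the area is counted once, which matches the page-based definition given above.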

These or other hyperlink metrics have been applied to various Web spaces (Ingwersen, 1998). A summary of the different spaces analyzed is listed in Table 1. Key findings will be summarized separately later.

One problem with the relative WIF is that counts of pages within an area of the Web can be substantially more unreliable than hyperlink counts. This is due to various factors including mirror site inclusion and design decisions about Web page sizes and format (Thelwall, 2001a). In response to this, academic WIFs have also used university staff numbers as their denominator (Thelwall, 2001a, 2003b), giving improved results. An alternative approach has been to dispense with the WIF denominator altogether and to model the data using the raw hyperlink counts, which arguably gives more intuitive results (Thelwall, 2002b).

Other analytic approaches have included a variety of multivariate statistical techniques and pathfinder network scaling (Thelwall, 2002e) but these have all been found wanting when applied to national academic Webs, because of a poor fit between the underlying assumptions of the statistical tests and the multiple trends present in the data. A simpler approach, applicable for small collections of sites analyzed, is the network diagram where arrows are drawn connecting sites with thickness proportional to a hyperlink-based calculation (Thelwall, 2001b). Even this approach has its limitations, however, with four potential arrow thickness metrics giving different results and problems of scale occurring with domains of different sizes (Thelwall & Smith, 2002). The four metrics used for arrow thickness are as follows.

  • Total link counts
  • Total links divided by total number of pages in the target site(s)
  • Total links divided by total number of pages in the source site(s)
  • Total links divided by total number of pages in the source and target site(s)

Each method gives a different perspective on the data set and it can be useful to produce all four. Total link counts paint a picture of the total linking among the set whereas the last one factors out size so that the underlying tendency to link can be seen. In contrast, dividing by target site pages gives an indication of which sites attract the most links per page, and where these links come from. Dividing by source size gives an indication of which sites host the most links per page, and the sites that they target.
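
The four arrow thickness metrics can be written out as a small calculation. This is only a sketch: the text does not specify how the source and target page counts are combined in the fourth metric, so dividing by their product is an assumption made here, and the function and key names are hypothetical.

```python
def arrow_thickness(link_count, source_pages, target_pages):
    """Return the four arrow thickness values for one source-target pair.

    link_count   -- total hyperlinks from the source site(s) to the target site(s)
    source_pages -- total number of pages in the source site(s)
    target_pages -- total number of pages in the target site(s)
    """
    if source_pages <= 0 or target_pages <= 0:
        raise ValueError("page counts must be positive")
    return {
        "total_links": link_count,
        "links_per_target_page": link_count / target_pages,
        "links_per_source_page": link_count / source_pages,
        # ASSUMPTION: "source and target" is read as the product of the
        # two page counts; the original text does not state the formula.
        "links_per_source_and_target_page": link_count
        / (source_pages * target_pages),
    }
```

Computing all four values for every pair of sites makes it straightforward to draw the four alternative diagrams and compare the perspectives they give.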

Validating Web hyperlink counts: Correlations and motivations

When a new data source is found, it is important to validate it in terms of reliability, representativity and its potential uses. Typically, this process involves theoretical studies of the data source as well as statistical exercises to compare the data with other, better-known quantities (Oppenheim, 2000). As a result, early Webometric studies compared WIFs with other impact measures. It was found that WIFs did not correlate highly with journal impact factors for e-journals (Harter & Ford, 2000; Smith, 1999a) and did not seem to be related to the research impact or quality of universities or departments (Smith, 1999a; Thelwall, 2000; Thomas & Willett, 2000). These results were a major blow for Webometrics, but then a whole series of positive results showed that the approaches did have potential. A study of 25 UK universities made the breakthrough by showing a significant correlation between WIFs and average research quality, both for AltaVista data and specialist crawler data (Thelwall, 2001a). Since then, further significant results have been found for a larger group of 109 UK universities (Thelwall, 2002d), Australian universities (Smith & Thelwall, 2002) and Taiwanese universities (Thelwall & Tang, 2003). In addition, significant correlations have now been found for journal Web sites (Vaughan & Hysen, 2002; Vaughan & Thelwall, 2003) and academic departments (Chu et al., 2002; Li et al., 2002; Tang & Thelwall, 2003). As a result of these studies, it can now be concluded that counts of hyperlinks to academic-related Web sites are frequently strongly associated with research quality. This does not imply, however, that there is a cause-and-effect relationship between the two. To address this issue, it is necessary to find out why hyperlinks are created and to study motivations for hyperlinking.

Some of the earlier correlation-based studies included exercises that attempted to identify the motivations behind hyperlink creation. After investigating various hypotheses to explain these (including the Matthew effect) (Thelwall, 2001a, 2002b, 2003b), it seems that there is greater Web-related activity in institutions which produce more research.

At the individual researcher level, Kim (2000) explored the influences on a researcher's hyperlinking behavior, based on a focus interview with 15 scholars who included external (also called outgoing) hyperlinks in their academic papers. Although the hyperlinks in scholarly electronic articles are created as the result of a variety of motivations, he found that scholarly and social motivations are as strong as technological reasons. At the interorganizational level, in a case study of ArXiv.org and SPIRES-HEP (http://www.slac.stanford.edu/spires/hep), which are a Web-accessible pre-print article server and an extensive bibliographic database respectively, Kling et al. (2001) found that hyperlinks between the two organizations arose from a variety of social, economic, and technological relationships, such as joint grant research and a Web site interface. Although hyperlinking technology was born of the advancement of computer technologies, whether and how hyperlinking among people and organizations is established is socially and culturally determined in the particular context (Hine, 2000).

The most direct academic hyperlink motivation study took a random sample of 414 hyperlinks between UK universities and classified them by the apparent motivation for their creation (Wilkinson et al., 2003). It was found that although fewer than 1% of hyperlinks targeted formal scholarly publications, such as a journal article or conference paper, over 90% targeted material that was in some way related to research or other scholarly activity, such as teaching. This shows that Web hyperlinks are best viewed as data about informal scholarly communication. In fact they may be the most publicly available data source for informal scholarly communication, and hence have great potential.

It is likely that in the future, much more detailed studies of hyperlinking motivation will be undertaken in an attempt to gain a better understanding of the phenomenon, but it is expected that the results will be complex and highly context-dependent (Thelwall, 2002g).

Alternative document models

One discovery from the studies that analyzed hyperlinks between universities (Thelwall, 2001a; 2002b; 2003b) was that there were many cases in which one site contained thousands of hyperlinks to another, all created for essentially the same reason, and that this would show up as an anomaly in the data. A typical example would be the Web site of a collaborative research project where each page contains a standard hyperlinks bar that includes a hyperlink to the home page of each partner institution. This violates the implicit assumption of hyperlink analysis that each hyperlink should be of approximately the same importance as the others. As an example, clearly 1000 automatically-generated hyperlinks should carry less weight than 1000 created by the decisions of different academics.

In order to circumvent this problem, alternative document models (ADMs) were created (Thelwall, 2002d), which aggregate hyperlinks together based upon directories, domains and whole sites instead of the page, as used in all previous research. The ADMs produced much better correlations with research productivity (Thelwall, 2002; Thelwall & Harries, 2003; Thelwall & Wilkinson, 2003a), showing the value of this approach. Their main drawback is that they are difficult to use if raw data is obtained from commercial search engines, since for each hyperlink count the exact hyperlink source and target URL must be known to perform the aggregation process. Programs are now freely available online to perform ADM aggregations on crawler data (Thelwall, 2001e).
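
The aggregation idea behind the ADMs can be illustrated with a short sketch. It is not the freely available software mentioned above. Two assumptions are made for the example: a site is approximated by the last few labels of the host name (the site_labels parameter), and at most one link is counted between each pair of aggregated units. All function names are hypothetical.

```python
from urllib.parse import urlparse


def adm_unit(url, model, site_labels=3):
    """Map a URL to its document unit under the chosen model.

    model is one of "page", "directory", "domain" or "site".
    site_labels is the number of trailing host labels treated as the
    site name (3 suits hosts such as www.wlv.ac.uk; an assumption).
    """
    parts = urlparse(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    if model == "page":
        return host + path
    if model == "directory":
        return host + path.rsplit("/", 1)[0] + "/"
    if model == "domain":
        return host
    if model == "site":
        return ".".join(host.split(".")[-site_labels:])
    raise ValueError("unknown document model: " + model)


def adm_link_count(links, model, site_labels=3):
    """Count links after aggregation, one per distinct pair of units."""
    pairs = {
        (adm_unit(s, model, site_labels), adm_unit(t, model, site_labels))
        for s, t in links
    }
    return len(pairs)
```

Under this sketch, a project site whose pages all carry the same navigation link to a partner's home page contributes many links in the page model but only one in the domain or site model, which is the effect the ADMs were designed to achieve.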

ADMs have been found flawed in two contexts. Sheffield University appears to use different domain names less frequently than other UK universities, turning it into an anomaly for the domain ADM (Thelwall, 2002a). Essex University hosts a database of hyperlinks to thousands of different German Web sites, creating an external hyperlinks anomaly (Thelwall, 2003a).

Key findings

Webometrics studies have yielded a number of interesting results (Smith & Thelwall, 2002; Tang & Thelwall, 2003; Thelwall, 2001c, 2002a, 2002c, 2002e, 2002g; Thelwall et al., 2003; Thelwall & Smith, 2002; Thelwall & Vaughan 2003). Among other indications, these results point to the need to study the Internet in a way that is sensitive to fields, as a dynamic, differentiated space, in which geographical notions still matter.

From these findings the concerns with online relationships and impact can be seen. What they do not show is the concern with data validation and methodological development that has characterized most Webometrics research and has been discussed above.

The key contributions of Webometrics to hyperlink analysis have been the development of methods for data collection, processing and validation. In addition, a range of general results has been generated about how the Web is used, primarily in academia, and establishing factors that influence web use or impact, as measured by hyperlink counts.

Clearly, the conclusion must be drawn that great care has to be taken in collecting data, processing it to remove the anomalies (e.g. using the Alternative Document Models if possible), and interpreting the results. Failure to do this risks arriving at incorrect results and misleading conclusions.

Conclusions

The hyperlink network analysis approach is based upon the assumption that hyperlinks may be the formalized bridge between hyperlinking and hyperlinked Web sites' authors, serving as social symbols or signs of communication hyperlinkage among themselves. In other words, hyperlinks are considered not simply as a technological tool but as a newly emerging social and communicational channel. There is a tie through hyperlinks that connects individuals, organizations, or countries on the Web. The literature suggests how hyperlink networks may in some circumstances reflect off-line connections among social actors, and be unique to online interactions in other cases. Further, hyperlink networks among Web sites and social relations in the offline world may be seen as co-constructing each other to some extent, so that offline relationships can influence how online relationships are developed and established (Birnie & Horvath, 2002; Hampton & Wellman, 2000). In terms of the development of methods, hyperlink analysis has been able to apply social networks analysis techniques to collections of Web sites and draw conclusions based upon an assumption of actor relationships.

Webometrics research into hyperlinks has so far focused on data reliability and validity issues in addition to the development of analysis tools. Investigations have asked whether links can be used to represent connections between page authors or, alternatively, whether they may reflect the value of or interest in the documents targeted. Most studies have also used academic domains. To sum up the progress in this direction, it can be seen that it is possible to extract meaningful data from academic sites, if care is taken to employ the best methods and to validate the data. The results for academic spaces can be used to give information about patterns of online scholarly communication, but little is known so far about the validity or correct interpretation of results from non-academic domains. Webometrics has not had the large range of pre-existing network analysis calculations to draw upon in the way that Hyperlink Network Analysis has; instead, it has used very simple link counts, sometimes divided by size measures. For more sophisticated analyses standard statistical techniques have been employed, such as multivariate statistics. A range of basic graphical techniques has also been developed, primarily types of network diagrams.

Two approaches to hyperlink research, hyperlink network analysis and Webometrics, look similar in that both approaches examine Web sites (or documents) based on the relations with others rather than the attributes of individual sites. In other words, they examine the relational attributes among Web sites. But they are different in their interpretation of the meaning of hyperlinks. The former tends to cast hyperlinks between Web sites as social and communicational ties, while the latter has concentrated on hyperlinks between academic Web sites. It has found these to be indicators of a range of predominantly informal scholarly communication, sometimes representing social ties but also the flow of information. The evidence from Webometrics is that in an academic setting at least, hyperlinks rarely directly represent social ties between individuals. Indeed, the most common link targets are university home pages, and it is unusual to find a link from the home page of one scholar to that of another. Hyperlinks typically represent a wide range of communication behaviors, some relating to social ties and others relating to the flow of Web information.

So what can the two approaches learn from each other? Hyperlink Network Analysis can borrow the more developed data collection, processing and validation tools, techniques and approaches from Webometrics. Similarly, Webometrics can benefit from adopting the extensive Social Networks Analysis tool set used by Hyperlink Network Analysis, in situations where data validation indicates that it is appropriate to use assumptions of hyperlinks representing social or communicational ties. Otherwise the Social Networks Analysis tools may need to be adapted, or their results interpreted differently, to suit a context devoid of any kind of social network.

It is evident, then, that hyperlink analysis is a complex and problematic tool. Although patterns can be extracted from hyperlinks, they remain a largely unregulated and anarchic phenomenon. As a result, great care must be taken to validate data when conducting hyperlink analyses, to avoid drawing false conclusions because of data unreliability. Nevertheless, the importance of links on the Web means that the approaches described here are set to have a promising future.

Acknowledgments

The authors are listed in alphabetical order, and both contributed equally to writing this paper. The first author acknowledges that some of the work included here was done while he was working for the Royal Netherlands Academy (KNAW). Both authors are grateful to the anonymous reviewers and Anne Beaulieu for their valuable comments.

References

  1. Top of page
  2. Abstract
  3. Introduction
  4. Hyperlink Analysis Approaches From Computer Science and Statistical Physics
  5. Hyperlink Network Analysis: Theory and Methods
  6. A Survey of Hyperlink Network Studies
  7. Hyperlink Analysis: The Webometrics Approach
  8. Conclusions
  9. Acknowledgments
  10. References
  • Adamic, L. A. (1999). The small world Web. Proceedings of 3rd European Conference of Research and Advanced Technology for Digital Libraries, ECDL. Retrieved October 3, 2002 from http://www.hpl.hp.com/shl/papers/smallworld/smallworldpaper.html.
  • Adamic, L. A., & Adar, E. (2001, May). You are what you link. Paper presented to the 10th annual International World Wide Web Conference, Hong Kong. Retrieved June 19, 2001 from http://www10.org/program/society/yawyl/YouAreWhatYouLink.htm.
  • Aguillo, I. F. (1998). STM information on the Web and the development of new Internet R&D databases and indicators, in Online Information 98: Proceedings. Learned Information, 239243.
  • Albert, R., Jeong, H., & Barabasi, A. -L. (1999). Diameter of the World Wide Web. Nature, 401(9), 130131.
  • Aldenderfer, M. S., & Blashfield, R. K. (1984). Cluster analysis. Beverly Hills , CA : Sage.
  • Almind, T. C. & Ingwersen, P. (1997). Informetric analyses on methodological approaches to ‘Webometrics.’ Journal of Documentation, 53(4) 404426.
  • Arasu, A., Cho, J., Garcia-Molina, H., Paepcke, A. & Raghavan, S. (2001). Searching the Web. ACM Transactions on Internet Technology, 1(1), 243.
  • Bae, S., & Choi, J. H. (2000, April). Cyberlinks between human rights NGOs: A network analysis. Paper presented to the 58th annual national meeting of the Midwest Political Science Association, Chicago.
  • Baeza-Yates, R. & Castillo, C. (2001). Relating Web characteristics with link based Web page raking. Proceedings of SPIRE 2001, IEEE (pp. 2132). CS Press, Laguna San Rafael , Chile .
  • Bar-Ilan, J. (1999). Search engine results over time - A case study on search engine stability. Cybermetrics, 2/3. http://www.cindoc.csic.es/cybermetrics/articles/v2i1p1.html. Available http://dois.mimas.ac.uk/DoIS/data/Articles/upvupvcyby:1998-99:v:2-3:i:1:p:1.html.
  • Bar-Ilan, J. (2001). Data collection methods on the Web for informetric purposes - A review and analysis. Scientometrics, 50(1), 732.
  • Barnett, G. A. (1993). Correspondence analysis: A method for the description of communication networks. In Barnett, G., & Richards, W. (Eds.). Progress in communication sciences (pp. 135164). Norwood , N.J. : Ablex.
  • Barnett, G. A. (2001). A longitudinal analysis of the international telecommunication network, 1978–1996. American Behavioral Scientist, 44(10), 16381655.
  • Barnett, G. A., Chon, B., Park, H., & Rosen, D. (2001, May). An examination of international Internet flows: An autopoietic model. Paper presented at the annual conference of International Communication Association, Washington, D.C.
  • Beaulieu, A., & Simakova, (2002). The public face of databases: data resources on the web and the creation of trust in science. Paper presented at the Society for the Social Studies of Science (4S) 2002 Annual Meeting. Milwaukee, USA.
  • Birnie, S. A., & Horvath, P. (2002). Psychological predictors of Internet social communication. Journal of Computer-Mediated Communication, 7(4). Retrieved September 24, 2002 from http://www.ascusc.org/jcmc/vol7/issue4/horvath.html.
  • Bonacich, P., & Lloyd, P. (2001). Eigenvector-like measures of centrality for asymmetric relations. Social Networks, 23, 191201.
  • Björneborn, L., & Ingwersen, P. (2001). Perspectives of Webometrics. Scientometrics, 50(1), 6582.
  • Björneborn, L. (2001). Small-world linkage and co-linkage. Proceedings of the 12th ACM Conference on Hypertext and Hypermedia (pp. 133134). New York : ACM Press.
  • Borgman, C & Furner, J. (2002). Scholarly communication and bibliometrics. In Cronin, B. (Ed.), Annual Review of Information Science and Technology 36 (pp. 372). Medford , NJ : Information Today Inc..
  • Brin, S. & Page, L. (1998). The anatomy of a large scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7), 107117.
  • Broder, A. Kumar, R, Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A. & Wiener, J. (2000). Graph structure in the Web, Journal of Computer Networks, 33(1-6), 309320.
  • Brunn, S. D., & Dodge, M. (2001). Mapping the ‘Worlds’ of the world wide web: (Re)Structuring global commerce through hyperlinks. American Behavioral Scientist, 44(10), 17171739.
  • Burt, R. S. (1992). Structural holes: The social structure of competition. Cambridge , MA : Harvard University Press.
  • Chu, H., He, S. & Thelwall, M. (2002). Library and information science schools in Canada and USA: A Webometric perspective. Journal of Education for Library and Information Science 43(2), 110125.
  • Ciolek, T. M. (2001, September). Networked information flows in Asia: The research uses of the AltaVista search engine and “weblinksurvey” software. Paper presented to the Internet Political Economy Forum 2001: Internet and Development in Asia, Singapore. Retrieved February 8, 2002 from http://www.ciolek.com/PAPERS/weblinksurvey2001.html.
  • Cronin, B. (1984). The citation process. London : Taylor Graham.
  • Cronin, B. (2001). Bibliometrics and beyond: Some thoughts on Web-based citation analysis. Journal of Information Science, 27(1), 17.
  • Cronin, B., Snyder, H.W., Rosenbaum, H., Martinson, A. & Callahan, E. (1998). Invoked on the Web. Journal of the American Society for Information Science, 49(14), 13191328.
  • Cui, L. (1999). Rating health Web sites using the principles of citation analysis: A bibliometric approach. Journal of Medical Internet Research, 1(1), e4. Retrieved January 3, 2003 from http://www.jmir.org/1999/1/e4/index.htm.
  • Danowski, J., & Edison-Swift, P. (1985). Crisis effects on intraorganizational computer-based communication. Communication Research, 12(2), 251270.
  • Darmoni S. J., Thirion B., Douyére M., Challoub, C., & Leroy J. P. (2000). Mesure de l'impact des sites Web : Le Web impact factor. L'exemple des CHU français. Revue du Praticien - Médecine Générale, 14(516), 20792080.
  • Davenport, E. & Cronin, B. (2000). The citation network as a prototype for representing trust in virtual environments. In: Cronin, B. & Atkins, H. B. (Eds.). The Web of knowledge: a festschrift in honor of Eugene Garfield. Metford , NJ : Information Today Inc. ASIS Monograph Series, 517534.
  • Douyére, M., Soualmia, L.F., Le Duff, F., Thelwall, M. & Darmoni, S.J. (2002, May). Web impact factor : Un outil bibliométrique appliqué aux sites Web des facultés de médecine et des CHU français, Neuviémes Journées Francophones d'Informatique Médicale. Québec, Canada.
  • Flake, G.W., Lawrence, S., Giles, C.L., & Coetzee, F.M. (2002). Self-organization and identification of Web communities, IEEE Computer, 35, 6671.
  • Freeman, L. C. (1979). Centrality in social networks: Conceptual clarification. Social Networks, 1, 215239.
  • Garfield, E. (1979). Citation indexing: Its theory and applications in science, technology and the humanities. New York : Wiley Interscience.
  • Garton, L., Haythornthwaite, C., & Wellman, B. (1997). Studying online social networks. Journal of Computer-Mediated Communication, 3(1). Retrieved September 19, 2000 from http://www.ascusc.org/jcmc/vol3/issue1/garton.html.
  • Gay, G., Stefanone, M., Grace-Martin, M., & Hembrooke, H. (2001). The effects of wirelss computing in collaborative learning environments. International Journal of Human-Computer Interaction, 13(2), 257276.
  • Goodrum, A. A., McCain, K. W., Lawrence, S. & Giles, C. L. (2001). Scholarly publishing in the Internet age: A citation analysis of computer science literature. Information Processing & Management, 37(5), 661676.
  • Halavais, A. (2000). National borders on the World Wide Web. New Media & Society, 2(1), 728.
  • Halavais, A., & Garrido, M. (2003). Mapping networks of support for the Zapatista movement. In McCaughy, M., & Ayers, M. D. (Eds.), Cyberactivism: Online activism in theory and practice. London : Routledge.
  • Hampton, K. N., & Wellman, B. (2000). Examining community in the digital neighborhood: Early results from Canada's wired suburb. In Ishida, T. & Isbister, K. (Eds.), Digital cities: Technologies, experiences, and future perspectives. (pp. 194208). Heidelberg , Germany : Springer-Verlag.
  • Hampton, K. N. (1999). Computer assisted interviewing: The design and application of survey software to the wired suburb project. Bulletin de Methode Sociologique (BMS), 62, 4968.
  • Hargittai, E. (1999). Weaving the western Web: Explaining differences in Internet connectivity among OECD countries. Telecommunications Policy, 23(10/11), 701718.
  • Harter, S. & Ford, C. (2000). Web-based analysis of E-journal impact: Approaches, problems, and issues. Journal of the American Society for Information Science, 51(13), 115976.
  • Haythornthwaite, C. (2000). Online personal networks: Size, composition and media use among distance learners. New Media & Society, 2(2), 195226.
  • Haythornthwaite, C., & Wellman, B. (1998). Work, friendship and media use for information exchange in a networked organization. Journal of the American Society for Information Science, 46(12), 11011114.
  • Henzinger, M. R. (2001), Hyperlink analysis for the web. IEEE Internet Computing 5(1), 4550.
  • Hernandez-Borges, A., Macias-Cervi, P., & Gaspar Guadardo, M. (1999). Can examination of WWW usage statistics and other indirect quality indicators help to distinguish the relative quality of medical Web sites Journal of Medical Internet Research, 1e1. Retrieved January 3, 2003 from http://www.jmir.org/1999/1/e1/index.htm.
  • Howard, P. (2002). Network ethnography and the hypermedia organization: New organizations, new media, new methods. New Media & Society, 4(4), 551575.
  • Ingwersen, P. (1998). The calculation of Web impact factors. Journal of Documentation, 54(2), 236243.
  • Jackson, M. H. (1997). Assessing the structure of communication on the world wide web. Journal of Computer-Mediated Communication, 3(1). Retrieved September 19, 2000 from http://www.ascusc.org/jcmc/vol3/issue1/jackson.html.
  • Jones, S. (Ed.) Doing Internet research: Critical issues and methods for examining the Net. Thousand Oaks , CA : Sage.
  • Kang, N., & Choi, J. H. (1999). Structural implications of the crossposting network of international news in cyberspace. Communication Research, 26(4), 454481.
  • Kim, H. J. (2000). Motivations for hyperlinking in scholarly electronic articles: A qualitative study. Journal of the American Society for Information Science, 51(10), 887899.
  • Kleinberg, J., (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604632.
  • Kling, R. & McKim, G. (2000). Not just a matter of time: Field differences in the shaping of electronic media in supporting scientific communication. Journal of the American Society for Information Science, 51(14), 13061320.
  • Kling, R. (2000). Learning about information technologies and social change: The contribution of social informatics. The Information Society, 16(3), 217232.
  • Kling, R., McKim, G., & King, A. (2001). A bit more to IT: Scholarly communication forums as socio-technical interaction networks. Unpublished manuscript retrieved August 14, 2002 from http://www.slis.indiana.edu/csi/Wp/wp01-02B.html.
  • Koku, E., Nazer, N., & Wellman, B. (2001). Netting scholars: Online and offline. American Behavioral Scientist, 43(Special issue: Mapping globalization), 17501772.
  • Krackhardt, D., & Porter, L. (1986). The snowball effect: Turnover embedded in communication networks. Journal of Applied Psychology, 71, 5055.
  • Krebs, V. (2000). Working in the connected world book network. IHRIM (International Association for Human Resource Information Management) Journal, 4(1), 8790.
  • Kumar, R., Raghavan, P., Rajagopalan, S., & Tomkins, A. (1999, April). Trawling the Web for cyber communities. Paper presented to the 8th World Wide Web Conference, Toronto, Canada.
  • Larson, R. R. (1996). Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace. Proceedings of the AISS 59th annual meeting .
  • Lawrence, S. & Giles, C. L. (1999). Accessibility of information on the Web. Nature, 400, 107109.
  • Leydesdorff, L. & Curran, M. (2000). Mapping university-industry-government relations on the Internet: The construction of indicators for a knowledge-based economy, Cybermetrics, 4. Retrieved January 3, 2003 from http://www.cindoc.csic.es/cybermetrics/articles/v4i1p2.html.
  • Li, X., Thelwall, M., Musgrove, P., & Wilkinson, D. (2002, September). The relationship between the links/Web impact factors of computer science departments in UK and their RAE (Research Assessment Exercise) ranking in 2001. Paper presented to the Seventh International S&T Indicators Conference, Karlsruhe, Germany.
  • Mann, C., & Stewart, F. (2000). Internet communication and qualitative research: A handbook for researching online. Thousand Oaks , CA : Sage.
  • Matei, S., & Ball-Rokeach, S. (2001). Real and virtual social ties: Connections in the everyday lives of seven ethnic neighborhoods. American Behavioral Scientist, 45(3), 550563.
  • Matzat, U. (2001). Social networks and cooperation in electronic communities. A theoretical-empirical study on academic communication and Internet discussion groups. Amsterdam : Thela Publisher.
  • McPherson, M., Smith-Lovinl, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415444.
  • Mettrop, W. & Nieuwenhuysen, P. (2001). Internet search engines - fluctuations in document accessibility. Journal of Documentation, 57(5), 623651.
  • Milgram, S. (1967). The small world problem. Psychology Today, 1(1), 6067.
  • Moed, H. (2002). The impact-factors debate: The ISI's uses and limits, Nature, 415, 731732.
  • Monge, P., & Contractor, N. S. (2000). Emergence of communication networks. In Jablin, F. M., & Putnam, L. L. (Eds.), The new handbook of organizational communication: advances in theory, research, and methods (pp. 440502). Thousand Oaks , CA : Sage.
  • Oppenheim, C. (1997). The correlation between citation counts and the 1992 research assessment exercise ratings for British research in genetics, anatomy and archaeology. Journal of Documentation, 53, 477487.
  • Oppenheim, C. (2000). Do patent citations count? In Cronin, B. & Atkins, H. B. (Eds.), The Web of knowledge: A festschrift in honor of Eugene Garfield. Metford , NJ : Information Today Inc. ASIS Monograph Series, 405432.
  • Palmer J. W., Bailey, J. P., & Faraj S. (2000). The role of intermediaries in the development of trust on the WWW: The use and prominence of trusted third parties and privacy statements. Journal of Computer-Mediated Communication, 5(3). Retrieved Jun 22, 2000 from http://www.ascusc.org/jcmc/vol5/issue3/palmer.html.
  • Paolillo, J. C. (2001). Language variation on Internet Relay Chat: A social network approach. Journal of Sociolinguistics, 5(2), 180213.
  • Park, H. W. (in press). What is hyperlink network analysis?: New method for the study of social structure on the Web. Connections.
  • Park, H. W. (2002a). Examining the determinants of who is hyperlinked to whom: A survey of Webmasters in Korea. First Monday, 7(11). Retrieved November 19, 2002 from http://www.firstmonday.org/issues/issue7_11/park/index.html.
  • Park, H. W. (2002b, November). E-science and hyperlink network analysis: Collaborative communication through hyperlinking. Paper presented to the conference of the Netherlands School of Communications Research, Utrecht, Netherlands.
  • Park, H. W. (2002c, December). No diplomatic relationship between Korea and Taiwan on the Web? Looking at an enemy's Websites. Paper presented to the TKU (Tamkang University) 2002 International Communication Convention, Taipei, Taiwan.
  • Park, H. W., Barnett, G. A., & Kim, C. S. (2001). Internet communication structure in Korean National Assembly: A network analysis. Korean Journal of Journalism & Communication Studies (Special English edition), 185204.
  • Park, H. W., Barnett, G. A., & Kim, C. S. (2000). Political communication structure in Internet networks - A Korean case. Sungkok Journalism Review, 11, 6789.
  • Park, H. W., Barnett, G. A. & Nam, I. Y. (2002a). Hyperlink-affiliation network structure of top Web sites: Examining affiliates with hyperlink in Korea. Journal of the American Society for Information Science and Technology, 53(7), 592601.
  • Park, H. W., Barnett, G. A., & Nam, I. Y. (2002b). Interorganizational hyperlink networks among websites in South Korea. NETCOM: Networks and communications studies, 16(3/4, Special issue on the Internet development in Asia), 155173.
  • Pirolli, P., & Card, S. K. (1999). Information foraging. Psychological Review, 106(4), 643675.
  • Polanco, X, Boudourides, M. A., Besagni, D., & Roche, I. (2001). Clustering and mapping Web sites for displaying implicit associations and visualising networks. University of Patras.
  • Rice, R. E., & Barnett, G. (1986). Group communication networking in an information environment: Applying metric multidimensional scaling. In McLaughlin, M. (Ed.) Communication Yearbook, 9 (pp. 315338). Beverly Hills , CA : Sage.
  • Rice, R. E. (1982). Communication networking in computer-conferencing systems: A longitudinal study of group roles and system structure. In Burgoon, M. (Ed.), Communication Yearbook, 6 (pp. 925944). Beverly Hills , CA : Sage.
  • Rice, R. E. (1994). Network analysis and computer-mediated communication systems. In Wasserman, S., & Galaskiewicz, J. (Eds.), Advances in social network analysis (pp. 167203). Thousand Oaks : Sage.
  • Richards, W. D. Jr. (1995) The NEGOPY network analysis program. Burnaby , BC : Department of Communication, Simon Fraser University.
  • Richards, W. D. Jr., & Barnett, G. A. (Eds.). (1993). Progress in communication science, 12. Norwood , NJ : Ablex.
  • Rodrìguez Gairìn, J. M. (1997). Valorando el impacto de la informacion en Internet: AltaVista, el “citation index” de la Red, Revista Espanola de Documentacion Cientifica, 20:175181. Retrieved January 3, 2003 from http://www.kronosdoc.com/publicacions/altavis.htm.
  • Rogers, E. M., & Bhowmik, D. K. (1971). Homophily-heterophily: Relational concepts for communication research. In Barker, L. L., & Kibler, R. J. (Eds). Speech communication behavior: Perspectives and principles (pp. 206225). Englewood Cliffs , N.J. : Prentice-Hall, Inc.
  • Rogers, E. M., & Kincaid, D. L. (1981). Communication networks: Toward a new paradigm for research. New York : Free Press.
  • Rogers, R., & Marres, N. (2000). Landscaping climate change: A mapping technique for understanding science and technology debates on the world wide web. Public Understanding of Science, 9, 141163.
  • Rousseau, R. (1997). Sitations: an exploratory study, Cybermetrics, 1. Retrieved January 3, 2003 from http://www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html.
  • Rousseau, R. (1999). Daily time series of common single word searches in AltaVista and NorthernLight, Cybermetrics, 2/3. Retrieved January 3, 2003 from http://www.cindoc.csic.es/cybermetrics/articles/v2i1p2.html.
  • Scharnhorst, A. (2003). Complex networks and the Web-insights from non-linear physics. Journal of Computer-Mediated Communication 8(4). Available: http://www.ascusc.org/jcmc/vol8/issue4/scharnhorst.html.
  • Scott, J. (1991). Social network analysis: A handbook. Thousand Oaks , CA : Sage.
  • Smith, A. & Thelwall, M. (2002). Web impact factors for Australasian universities. Scientometrics, 54(3), 363380.
  • Smith, A. G. (1999a). A tale of two Web spaces: Comparing sites using Web impact factors. Journal of Documentation, 55(5), 577592.
  • Smith, A. G. (1999b). The Impact of Web sites: A comparison between Australasia and Latin America. In Proceedings of INFO'99, Congreso Internacional de Informacion, Havana, 4-8 October 1999. Retrieved January 3, 2003 from http://www.vuw.ac.nz/~agsmith/publns/austlat/.
  • Smith, M. (1999c). Invisible crowds in cyberspace: Measuring and mapping the social structure of USENET. In Smith, M., & Kollock, P. (Eds.), Communities in cyberspace (pp. 195219). London : Routledge.
  • Snyder, H., & Rosenbaum, H. (1999). Can search engines be used for Web-link analysis? A critical review. Journal of Documentation, 55(4), 375384.
  • Soualmia, L.F., Darmoni, S.J. Le Duff, F., Douyére, M., & Thelwall, M. (2002). Web impact factor: A bibliometric criterion applied to medical informatics societies' Web sites. In Proceedings of MIE 2002.
  • Seventeenth International Congress of the European Federation for Medical Informatics, Studies in Health Technology & Informatics, 90, 178183.
  • Sunstein, C. R. (2001). Republic.com. Princeton , NJ : Princeton University Press.
  • Tang, R., & Thelwall, M. (2003, in press). Disciplinary differences in US academic departmental web site interlinking. Library and Information Science Research.
  • Terveen, L., & Hill, W. (1998, November). Evaluating emergent collaboration on the Web. Conference of Computer Supported Cooperative Work, Seattle , Washington .
  • Thelwall, M., & Harries, G. (2003). The connection between the research of a university and counts of links to its Web pages: An investigation based upon a classification of the relationships of pages to the research of the host university. Journal of the American Society for Information Science and Technology, 54(7), 594602.
  • Thelwall, M., & Smith, A. (2002). A study of the interlinking between Asia-Pacific University Web sites. Scientometrics, 55(3), 363376.
  • Thelwall, M., Tang, R. & Price, E. (2003). Linguistic patterns of academic Web use in western Europe. Scientometrics, 56(3), 417432.
  • Thelwall, M., & Tang, R. (2003, in press). Disciplinary and linguistic considerations for academic Web linking: An exploratory hyperlink mediated study with Mainland China and Taiwan. Scientometrics.
  • Thelwall, M., & Wilkinson, D. (2003a). Three target document range metrics for university Web sites. Journal of the American Society for Information Science and Technology, 54(6). 489496.
  • Thelwall, M., & Wilkinson, D. (2003b). Graph structure in three national academic Webs: Power laws with anomalies. Journal of the American Society for Information Science and Technology, 706712.
  • Thelwall, M., & Wilkinson, D. (2004, in press). Finding similar academic Web sites with links, bibliometric couplings and colinks. Information Processing & Management.
  • Thelwall, M. (2000). Web impact factors and search engine coverage. Journal of Documentation, 56(2), 185189.
  • Thelwall, M. (2001a). Extracting macroscopic information from Web links. Journal of the American Society for Information Science and Technology, 52(13), 11571168.
  • Thelwall, M. (2001b). Exploring the link structure of the Web with network diagrams. Journal of Information Science 27(6) 393402.
  • Thelwall, M. (2001c), Commercial Web site links. Internet Research, 11(2), 114124.
  • Thelwall, M. (2001d), Results from a Web Impact Factor crawler. Journal of Documentation, 57(2), 177191.
  • Thelwall, M. (2001e). A publicly accessible database of UK university Website links and a discussion of the need for human intervention in Web crawling. Retrieved January 3, 2003 from http://www.scit.wlv.ac.uk/~cm1993/papers/a_publicly_accessible_database.pdf.
  • Thelwall, M. (2001f) A Web crawler design for data mining. Journal of Information Science, 27(5), 319325.
  • Thelwall, M. (2001g), The responsiveness of search engine indexes. Cybermetrics, 5(1). Retrieved January 3, 2003 from http://www.cindoc.csic.es/cybermetrics/articles/v5i1p1.html.
  • Thelwall, M. (2001h). Web log file analysis: Backlinks and queries. ASLIB Proceedings, 53(6), 217223.
  • Thelwall, M. (2002a). The top 100 linked pages on UK university Web sites: High inlink counts are not usually directly associated with quality scholarly content. Journal of Information Science, 28(6), 485493.
  • Thelwall, M. (2002b). A research and institutional size based model for national university Web site interlinking. Journal of Documentation, 58(6), 683694.
  • Thelwall, M. (2002c). Evidence for the existence of geographic trends in university Web site interlinking. Journal of Documentation, 58(5), 563574.
  • Thelwall, M. (2002d). Conceptualizing documentation on the Web: An evaluation of different heuristic-based models for counting links between university Web sites. Journal of the American Society for Information Science and Technology, 53(12), 9951005.
  • Thelwall, M. (2002e). An initial exploration of the link relationship between UK university Web sites. ASLIB Proceedings, 54(2), 118126.
  • Thelwall, M. (2002f). A comparison of sources of links for academic Web Impact Factor calculations. Journal of Documentation, 58(1), 6072.
  • Thelwall, M. (2002g). What is this link doing here? Beginning a fine-grained process of identifying reasons for academic hyperlink creation. Information Research, 8(3), paper no. 151. Available at: http://informationr.net/ir/8-3/paper151.html.
  • Thelwall, M. (2002h). Methodologies for crawler based Web surveys. Internet Research: Electronic Networking and Applications, 12(2), 124138.
  • Thelwall, M. (2002i). Research dissemination and invocation on the Web. Online Information Review, 26(6), 413420.
  • Thelwall, M. (2002j) In praise of Google: Finding law journal Web sites. Online Information Review, 26(4), 271272.
  • Thelwall, M. (2002k). Subject gateway sites and search engine ranking. Online Information Review, 26(2), 101107.
  • Thelwall, M. (2003a, in press). Methods for reporting on the targets of links from national systems of university Web sites. Information Processing and Management.
  • Thelwall, M. (2003b). Web use and peer interconnectivity metrics for academic Web sites. Journal of Information Science, 29(1), 1120.
  • Thelwall, M. (2003c). Can Google's PageRank be used to find the most important academic Web pages Journal of Documentation, 59(2), 205217.
  • Thelwall, M. (2003d, in press). A layered approach for investigating the topological structure of communities in the Web, Journal of Documentation, 59(3).
  • Thomas, O. & Willett, P. (2000). Webometric analysis of departments of librarianship and information science. Journal of Information Science, 26(6), 421428.
  • Torgerson, W. S. (1958). Theory and methods of scaling. New York : Wiley.
  • Vaughan, L. & Thelwall, M. (2003). Scholarly use of the Web: What are the key inducers of links to journal Web sites Journal of the American Society for Information Science and Technology, 54(1), 2938.
  • Vaughan, L. Q. & Hysen, K. (2002). Do Web link counts resemble citation counts: An empirical examination. ASLIB Proceedings, 54(6), 356361.
  • Wallerstein, I. (1976). The modern world system. New York : Academic Press.
  • Walsh, J. P., & Maloney, N. G. (2002). Computer network use, collaboration structures and productivity. In Hinds, P., & Kiesler, S. (Eds.), Distributed work. (pp. 433458). Cambridge , MA : MIT Press. Retrieved June 10, 2002 from http://tigger.uic.edu/~jwalsh/Collab.html.
  • Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge , NY : Cambridge University Press.
  • Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393, 440442.
  • Weare, C., & Lin, W. Y. (2000). Content analysis of the World Wide Web-Opportunities and challenges. Social Science Computer Review, 18(3), 272292.
  • Web, E. J. (1966). Unobtrusive measures: Nonreactive research in the social sciences. Chicago : Rand McNally.
  • Wellman, B. (2001). Computer networks as social networks. Science, 293(14). 20312034.
  • Wellman, B., & Berkowitz, S. D. (1989). Social structures: A network approach. New York : Cambridge University Press.
  • Wilkinson, D., Harries, G., Thelwall, M., & Price, E. (2003). Motivations for academic Web site interlinking: Evidence for the Web as a novel source of information on informal scholarly communication. Journal of Information Science, 29(1), 5966.
  • Wouters, P. F. (1999). The citation culture. Ph. D. thesis. University of Amsterdam.
  • Wouters, P. F., & Gerber, D. (2003). Interactive Internet? Studying mediated interaction with publicly available search engines. Journal of Computer-Mediated Communication, 8(4). Available: http://www.ascusc.org/jcmc/vol8/issue4/wouters.html.