Although it has a long history, the mapping and visualization of knowledge domains has attracted great interest in recent years, both within and outside the field of Library and Information Science/Studies (LIS). This is probably due to a number of factors. For example, with the explosive growth of the World Wide Web and its plethora of online databases, written knowledge has become available in digitized form, providing large amounts and varieties of data for this type of study; and increased computing power has enabled social science researchers to routinely analyze and visualize such vast networks of information (Shiffrin & Börner, 2004).
The three main components of network analysis are (a) the objects of analysis, (b) the relationships between these objects, and (c) the mapping or visualization tools. The former two correspond to the nodes and edges of a network; the latter are methods for representing the resulting network in an intuitively graspable manner in two- or three-dimensional space for simple viewing or interactive exploration. Traditionally, individual articles or patents, authors' oeuvres, and scholarly journals have been used as objects of analysis; citation links and word or phrase cooccurrences as measures of relatedness between these objects; and multivariate analysis techniques such as cluster analysis, multidimensional scaling (MDS), and factor analysis (FA) as analysis and mapping tools. All of these have recently been adapted to the Web environment (e.g., Web sites viewed as journals and hyperlinks as citation links), giving rise to Webometrics (Thelwall, Vaughan, & Björneborn, 2005). New types of objects and relationships, e.g., genes and DNA sequences, have appeared in recent studies, as have new mapping methods (Börner, Maru, & Goldstone, 2004; Boyack, Börner, & Klavans, 2007; Chen, 2006; Henzinger & Lawrence, 2004; Van Eck, Waltman, & Van Berg, 2006; Wilkinson & Huberman, 2004).
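MDS, one of the mapping tools named above, places objects in a low-dimensional space so that the distances between them reflect their dissimilarities. The following is a minimal sketch of classical (Torgerson) metric MDS, assuming NumPy; it is one MDS variant among several and not necessarily the one used in the studies cited. The function name and toy points are hypothetical.

```python
import numpy as np

def classical_mds(dist: np.ndarray, dims: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: embed n items in `dims` dimensions so that
    Euclidean distances approximate the given n x n dissimilarity matrix."""
    n = dist.shape[0]
    # Double-center the squared dissimilarities: B = -1/2 * J D^2 J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    # Eigen-decompose the symmetric matrix B and keep the largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:dims]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Coordinates = eigenvectors scaled by sqrt of the (non-negative) eigenvalues
    return eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))

# Hypothetical toy data: four objects at the corners of a 3 x 4 rectangle
points = np.array([[0, 0], [3, 0], [3, 4], [0, 4]], dtype=float)
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = classical_mds(dist)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.allclose(recovered, dist))  # True: the map reproduces the distances
```

In a real study the dissimilarities would be derived from the relatedness measure (for example, one minus a similarity); the map is only determined up to rotation, reflection, and translation.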
Visualization of LIS
LIS itself is perhaps the most frequently mapped knowledge domain in the LIS literature, which is not surprising given the amount of expert domain knowledge this type of study requires. Among these studies, White and McCain (1998), referred to as W&M98 below, is probably one of the most thorough, covering the field of Information Science (IS) for the years 1972–1995. This study defined the IS field by its 12 core journals and collected citation and cocitation data from the databases of the Institute for Scientific Information (ISI) through DIALOG search facilities. It used authors' oeuvres as the objects of analysis, and factor analysis and MDS to visualize the field at several stages of its development. Naturally, this well-cited study also provided an excellent review of earlier visualization studies of the LIS field. After its publication, relatively few visualization studies of the LIS field appeared for a long time, which is striking given the surging interest in the visualization of knowledge domains noted above, until Jansens, Leta, Glanzel, and De Moor (2006) and Åström (2007).
Jansens et al. (2006) analyzed full-text articles from five core LIS journals covering the years 2002–2004, and Åström (2007) analyzed articles from 21 LIS journals covering the years 1990–2004, together with the references in these articles as indexed in Web of Science. Both used individual articles as objects of analysis; the former used word links and the latter cocitation links to measure the relationships between articles. Both used MDS and cluster analysis techniques to visualize these relationships.
These two studies provided two views of the LIS field as represented by individual papers. An author cocitation analysis (ACA) study of the field should contribute to an even more complete view of the field and, equally important, to research on the visualization of knowledge domains, because contrasting it with the two article-based studies lets researchers compare different types of visualization studies.
Using ACA techniques, the present study examines the IS field, defined in the same way as in W&M98, for the years 1996–2005, i.e., the decade following the period examined in that study. In addition to providing a static view of the field during this decade, this allows us to compare our results with that study to see how the IS field has changed. We also take this opportunity to try to clear up a few points of confusion in ACA and to introduce a novel visualization technique for factor analysis results in ACA.
Author Cocitation Analysis (ACA)
Since its introduction by White and Griffith (1981), author cocitation analysis has gained great renown in the study of intellectual structures of scholarly fields and of the implied social structures of the corresponding communities and, as a result, has become an important area of information science (White & McCain, 1998; Zhao & Strotmann, 2007).
ACA is one particular type of cocitation analysis. In cocitation analysis, a set of items (documents, authors, journals, etc.) is selected to represent a research area that can be as large as an entire science or as small as a single specialty within a research field (Small, 1999). Relationships between these items are then analyzed, using their cocitation counts as a similarity measure and multivariate analysis techniques such as cluster analysis and MDS as analysis tools, in order to study the intellectual structure of the research field and to infer some characteristics of the corresponding scientific community. In general, two items are counted as cocited when they appear together in the reference list of the same article. ACA takes the author as the unit of its analysis, uses cocitation counts to measure the relationships between oeuvres of representative authors, and applies multivariate analysis techniques such as FA to study the underlying structure of a research field as represented by these oeuvres.
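The counting rule just described can be sketched as follows, assuming NumPy and reference lists that have already been reduced to single cited-author names (as with first-author counting in classic ACA). The function name, author labels, and reference lists are hypothetical. The sketch builds an occurrence matrix (citing articles by selected authors) and derives the symmetric author cocitation matrix from it.

```python
import numpy as np

def cocitation_matrix(reference_lists, authors):
    """Return the (citing article x author) occurrence matrix and the
    symmetric author cocitation matrix for the selected authors."""
    index = {name: i for i, name in enumerate(authors)}
    occ = np.zeros((len(reference_lists), len(authors)), dtype=int)
    for row, refs in enumerate(reference_lists):
        # An author counts once per citing article, however often cited in it
        for name in set(refs):
            if name in index:  # ignore authors outside the selected set
                occ[row, index[name]] = 1
    # Cell (i, j) off the diagonal: number of citing articles whose
    # reference lists contain both author i and author j
    co = occ.T @ occ
    return occ, co

# Hypothetical reference lists of four citing articles
refs = [["A", "B", "D"], ["A", "C"], ["A", "B", "C", "E"], ["D", "B"]]
authors = ["A", "B", "C", "D"]  # "E" is not among the selected authors
occ, co = cocitation_matrix(refs, authors)
print(co)  # e.g. co[0, 1] == 2: A and B appear together in two reference lists
```

The diagonal of `co` holds the number of citing articles that cite each author; how the diagonal should be treated in ACA is a separate methodological choice that this sketch does not address.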
Most ACA studies have applied the fairly consistent general steps and techniques of classic ACA (McCain, 1990a) to different research fields with little or no modification. In recent years, however, interest in exploring alternatives has resurged. Some studies have proposed new techniques for mapping author clusters (Chen, 2006; White, 2003); others have explored how to go beyond first-author counting in ACA with the support of newly available citation analysis data sources (Schneider, Larsen, & Ingwersen, 2007; Zhao, 2006); still others have challenged some of the statistical techniques used in classic ACA, for example, Ahlgren, Jarneving, and Rousseau (2003) on whether Pearson's r or the cosine measure should be used to define similarity in ACA, and Leydesdorff and Vaughan (2006) on whether a symmetric cocitation matrix or an asymmetric citation matrix should be used as input for mapping purposes.
These and related studies, including Ahlgren, Jarneving, and Rousseau (2004a, 2004b), Bensman (2004), and White (2003, 2004), have helped clarify some existing measures and techniques and develop new ones, which may lead to improved ACA studies. The present study aims to contribute further to this line of research by providing empirical evidence to help clarify a few issues in the use of FA in ACA.
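To make the Pearson's r versus cosine question raised by Ahlgren, Jarneving, and Rousseau (2003) concrete, the sketch below computes both measures on two authors' cocitation profiles, assuming NumPy; the profiles are hypothetical. It illustrates one property that distinguishes the two measures: adding authors with whom neither author is cocited (shared zeros) changes Pearson's r but leaves the cosine unchanged.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation coefficient between two cocitation profiles."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def cosine(x, y):
    """Salton's cosine between two cocitation profiles."""
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

# Hypothetical cocitation counts of two authors with four other authors
x = np.array([5.0, 3.0, 0.0, 1.0])
y = np.array([4.0, 2.0, 1.0, 0.0])
# The same profiles after adding three authors cocited with neither of them
x0 = np.concatenate([x, np.zeros(3)])
y0 = np.concatenate([y, np.zeros(3)])

print(pearson_r(x, y), pearson_r(x0, y0))  # r changes with the added zeros
print(cosine(x, y), cosine(x0, y0))        # cosine is unchanged
```

Both measures are unaffected by rescaling a profile; they differ in that Pearson's r first centers each profile on its mean, which is why the number of zero entries matters for r.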
Factor Analysis in ACA
Factor analysis (FA) has been used in ACA from the very beginning (White & Griffith, 1981). Its goal is to seek a small number of meaningful underlying factors that explain the relationships among a larger number of items, and it is used in ACA to reveal specialty structures of research fields and authors' memberships in one or more such specialties. FA applied in ACA has been shown to provide clear and revealing results as to the nature of a discipline (White & McCain, 1998).
In addition to the specialty structure of a research field that other multivariate analysis techniques such as cluster analysis and MDS can also identify, FA can assign individual authors to more than one specialty, can indicate the level of association between authors and the specialties to which they belong, and can provide measures of “the degree of relationship between specialties” (White & Griffith, 1982, p. 260).
A researcher faces a number of methodological decisions when designing an FA study in ACA: how to select a base set of citing documents for cocitation counting, which authors to analyze as representative of the field of interest, what type of data to input to the FA routine, which method to use for extracting factors, how many factors to extract, which factor rotation method to apply, and which factor loadings to consider meaningful for interpreting factors. Different decisions may well produce different results (Leydesdorff & Vaughan, 2006). It is therefore important to understand the options that are available (e.g., types of input data, methods of extracting factors, factor rotation methods), the different results that may be obtained using different techniques, and when and why certain techniques may be more appropriate than others.
An examination of previous ACA studies shows some common practices in the application of FA techniques in ACA: (a) principal component analysis has been the major method used for extracting factors; (b) Kaiser's rule (retaining factors with eigenvalues greater than one) has been the main technique for deciding the number of factors to extract; (c) both raw cocitation frequency matrices and Pearson's r correlation matrices have been used as input data to the factor analysis procedure; (d) both orthogonal and oblique rotations have been applied in ACA to facilitate the interpretation of results; and (e) various cutoff values for meaningful factor loadings, ranging from 0.3 to 0.5, have been used for interpretation.
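Practices (a), (b), and (e) can be sketched together as follows, assuming NumPy and a Pearson's r correlation matrix as input (practice (c)); the function name, the default cutoff of 0.3, and the correlation values are hypothetical. Loadings are the eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues.

```python
import numpy as np

def pca_kaiser(corr, min_loading=0.3):
    """Principal component extraction with Kaiser's eigenvalue > 1 rule."""
    eigvals, eigvecs = np.linalg.eigh(corr)       # ascending for symmetric input
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                          # Kaiser's rule
    # Unrotated loadings: eigenvectors scaled by sqrt(eigenvalue)
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    # An author is read as belonging to a factor if |loading| >= cutoff
    members = np.abs(loadings) >= min_loading
    return eigvals, loadings, members

# Hypothetical correlation matrix for four authors in two specialties
corr = np.array([[1.0, 0.8, 0.1, 0.1],
                 [0.8, 1.0, 0.1, 0.1],
                 [0.1, 0.1, 1.0, 0.8],
                 [0.1, 0.1, 0.8, 1.0]])
eigvals, loadings, members = pca_kaiser(corr)
# Eigenvalues are 2.0, 1.6, 0.2, 0.2, so two components are retained
print(np.round(eigvals, 2))
print(np.round(loadings, 2))
print(members)
```

In this toy case every author loads above the cutoff on both unrotated components, so the unrotated solution does not separate the two specialties; this is the interpretive problem that practice (d), rotation, is meant to address.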
However, these studies appear to be largely silent on why a particular technique was chosen over others of the same type, and they do not compare ACA results obtained with different techniques (e.g., oblique versus orthogonal rotation) on identical data sets to help researchers choose among them. The present study attempts to address this gap. We focus our examination on FA rotation methods, where confusion exists in the ACA literature, and look forward to seeing other aspects of ACA methodology clarified in future research.
There are two types of rotation methods in FA: orthogonal and oblique. Theoretically, an orthogonal rotation assumes that resulting factors are not correlated and works best for revealing independent dimensions of the underlying structure being studied. An oblique rotation, by contrast, is not restricted to keeping the extracted factors independent of each other and therefore works better at separating out factors when it can be expected theoretically that the resulting factors would in reality be correlated (Hair, Anderson, Tatham, & Black, 1998). In the case of ACA, large factors are interpreted as specialties within a research field, and the correlation between factors can therefore be expected to be fairly high. Indeed, high correlations between factors were found in different fields, with the highest correlation between two factors reported in a given study ranging from 0.39 to 0.65 (McCain, 1990a; White & Griffith, 1982; Zhao, 2006). An oblique rotation, therefore, appears theoretically more appropriate for ACA. As an additional advantage, an oblique rotation produces a component correlation matrix to indicate the degree of correlation between resulting factors (McCain, 1990a; White & Griffith, 1982).
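The difference between the two rotation types can be sketched as follows, assuming NumPy. Varimax is used as the orthogonal method and promax, which starts from the varimax solution, as the oblique method; these are common choices but not necessarily the methods used in the studies cited, and this promax omits the Kaiser normalization that statistical packages often apply, so results may differ slightly from package output. The loading values are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (SVD-based iteration, gamma = 1)."""
    p, k = loadings.shape
    rot = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        lam = loadings @ rot
        # Direction that increases the variance of squared loadings per factor
        target = lam ** 3 - lam * (np.sum(lam ** 2, axis=0) / p)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rot = u @ vt  # nearest orthogonal matrix, so factors stay uncorrelated
        new_objective = s.sum()
        if new_objective < objective * (1 + tol):
            break  # no meaningful improvement
        objective = new_objective
    return loadings @ rot, rot

def promax(loadings, power=4):
    """Oblique promax rotation built on varimax; returns pattern loadings
    and the factor correlation matrix (no Kaiser normalization)."""
    vm, _ = varimax(loadings)
    # Target: varimax loadings with magnitudes raised to `power`, signs kept
    target = vm * np.abs(vm) ** (power - 1)
    # Least-squares transformation U such that vm @ U approximates the target
    u, *_ = np.linalg.lstsq(vm, target, rcond=None)
    # Rescale columns so the implied factor correlations have a unit diagonal
    u = u * np.sqrt(np.diag(np.linalg.inv(u.T @ u)))
    pattern = vm @ u
    phi = np.linalg.inv(u.T @ u)  # correlations between the oblique factors
    return pattern, phi

# Hypothetical simple-structure loadings for six authors on two specialties,
# mixed by a 30-degree rotation to mimic unrotated extraction output
simple = np.array([[0.8, 0.1], [0.7, 0.2], [0.9, 0.0],
                   [0.1, 0.8], [0.2, 0.7], [0.0, 0.9]])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
unrotated = simple @ mix

vm, vm_rot = varimax(unrotated)
pattern, phi = promax(unrotated)
print(np.round(vm, 2))       # orthogonal solution: factors uncorrelated by design
print(np.round(pattern, 2))  # oblique pattern loadings
print(np.round(phi, 2))      # off-diagonal: estimated correlation between factors
```

Both rotations recover the same two groups of authors here; only the oblique solution also reports how strongly the two factors are correlated, which is the additional information discussed above.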
In practice, however, both oblique and orthogonal rotation methods have been used in ACA studies, even by the same authors, and both types of studies have demonstrated how useful and informative the factor-analytic approach to ACA is. For example, McCain (1990a) and White and Griffith (1982) applied an oblique rotation method and discussed its advantage of providing measures of the interrelationships between factors. Later studies by these authors, such as W&M98, however, switched to an orthogonal rotation without discussing why they made this choice. Yet the orthogonal rotation in that study still produced clear and revealing results as to the nature of the IS field (White & McCain, 1998).
We therefore wondered whether different rotation methods would produce different specialty structures in ACA and, if so, whether such differences are large enough to justify preferring one over the other.
To summarize, the present study addresses the following research questions:
1. What is the intellectual structure of the IS field, 1996–2005, the first decade of the Web? How has it changed from the decades before this time period?
2. What are the differences in factor analysis results in ACA between the use of an oblique rotation and an orthogonal rotation?
We hope to contribute to a more complete mapping of the IS field, to a better understanding of factor analysis techniques as applied to ACA, and consequently to improved ACA studies.