Traditional field delineation methods in bibliometric studies appear to have reached their limits when dealing with highly interdisciplinary fields such as nanotechnology or stem cell research, which have recently become a focus of science and technology policy research. Researchers have therefore developed sophisticated algorithmic procedures to overcome these difficulties, hoping to collect a set of articles that represents the research field under study both completely and cleanly. The present case study explores the effect of field delineation on author co-citation analysis (ACA) studies of the intellectual structure of the library and information science (LIS) field, using two different but overlapping journal sets to define the LIS field. We find that the major overall structure remains largely the same between these two views of the LIS field, which suggests that field delineation is not crucial to ACA studies of research fields, provided the emphasis of a study is on the major overall structure of a research field. The two views do, however, differ at a more detailed level of analysis, which suggests that studies that aim to shed light on particularly subtle research policy issues may need to pay serious attention to the way they delineate their fields.
Field delineation has always been an integral part of bibliometric studies, and indeed has been recognized as a fundamental and largely unsolved problem in the scientometric literature (van Raan, 1996; Zitt, 2006). It has recently become a research area of its own because traditional field delineation methods appear to have reached their limits when dealing with highly interdisciplinary fields such as nanotechnology (Bassecoulard & Zitt, 2007) or stem cell research which have recently become a focus of science and technology policy research.
Field delineation is normally the first step in bibliometric studies of a research field - the process of collecting a set of research articles that can represent this research field for a given time period. Citation analysis studies additionally require the reference lists of the articles being considered. The collective view of the authors of these articles as indicated by their citing behaviour is then analyzed with regard to which documents or authors they find most useful for their research and how these documents or authors are related to each other.
For a traditional, well-defined research field such as mathematics or sociology, articles published during a period of time in a set of core journals in the field are often used for this purpose. This method, however, does not work well for emerging, interdisciplinary or multidisciplinary research fields, because research articles in these fields are published in an extremely wide range of journals. Keyword searching in existing citation indexes such as Web of Science and Scopus does not always work well in these fields either, due to the many limitations inherent in these databases, e.g., limited coverage and search features (Web of Science) or caps on the maximum number of records available for download (Scopus).
Researchers therefore have developed sophisticated algorithmic procedures to overcome these difficulties, hoping to collect a set of articles in a research field being studied that is complete as well as clean, i.e., includes all articles in a field but no articles on topics outside this field. Zitt & Bassecoulard (2006), for example, combined citation analysis, keyword search and document clustering techniques to delineate the nano-sciences literature using Web of Science.
While it would be ideal for all bibliometric studies to start from a well-delineated field, and doing so may be essential for certain types of studies, it remains unclear how much a complete and clean dataset matters to typical evaluative and relational bibliometric studies, such as author rankings and co-citation mapping of a research field, and whether the significant extra cost of achieving such a dataset is worthwhile. We begin exploring these questions in the present study.
This study is motivated by studies that explored how much modified and sophisticated citation indicators improve upon simple citation counts for evaluation of impact, and those that compared author rankings using different citation databases. These studies found that simple citation counts often work very well, and that author rankings are highly correlated although ranks of some individual authors may vary significantly for various reasons.
We therefore explore the following research question using two different sets of journals in the same research area: how different are the intellectual structures shown in author co-citation maps resulting from these two datasets?
By exploring this question, we hope to understand how much typical citation-based bibliometric studies are affected by field delineation, and how confident we can be when using these bibliometric methods in policy and other studies.
We choose the library and information science (LIS) research field for our study, and compare maps produced from an author co-citation analysis (ACA) of articles in two different but overlapping sets of core LIS journals, to see how much of an influence the field delineation method may have on the outcome of an ACA study.
Studies (Nisonger & Davis, 2005; Kohl & Davis, 1985) have shown that ratings of LIS journals by the deans of LIS schools and by the directors of academic and research libraries (ARL) in North America in terms of value for tenure and promotion were quite different. For example, Nisonger & Davis (2005) found that the journal Information Processing & Management was ranked #3 by the deans and #27 by the directors, whereas College & Research Libraries was ranked #1 by the directors but #12 by the deans (Table 2, pp. 350-351).
We take this as indication that LIS faculty and ARL librarians tend to publish in different journals, considering the importance of tenure and promotion for faculty and librarians. This may also indicate that LIS faculty and ARL librarians are interested in different research problems and consult different resources in their research as different journals normally have different foci and target different audiences.
To collect our source papers, we therefore take the journals that were top ranked by deans of LIS schools in one set and those top ranked by ARL directors in a second set, as found in Nisonger & Davis (2005). We hope that these two sets of journals, to some extent, represent the research problems and topics valued by LIS faculty and ARL librarians respectively, although they have considerable overlap and they each publish research by both LIS faculty and ARL librarians. By analyzing these two sets of journals, we hope to see whether these two views of the LIS field are different in addition to answering our research question above.
Specifically, we took the top 20 journals from each of the two ranked lists of LIS journals presented in Table 2 of Nisonger & Davis (2005). Because two journals tied for 20th place in the deans' list, we included the 21st journal on the directors' list in order to have an equal number of journals from the two lists. To focus on research articles, we excluded the journal Annual Review of Information Science and Technology, as it mainly publishes reviews.
As citation analysis requires data on cited references, we also removed from our study the three journals that were not covered by Web of Science in 2007 - the citation index that we used to obtain data on cited references. These three journals are: (1) School Library Media Research, which was on the deans' list, (2) Reference Services Review, which was on the directors' list, and (3) Libraries & Culture, which was on both lists.
This results in 18 journals from each of the two ranked lists of journals (Table 1). Journals marked with * are those unique to each list. This table shows that 2/3 of the journals in the two lists overlap.
Table 1. Two sets of Journals Analyzed
We retrieved full Web of Science records of all research articles in these journals published from 2006 through June 2008 (when the searches and downloads were carried out). The deans' list of journals contained a total of 2322 articles that included 70947 cited references and 26970 unique cited (1st) authors, whereas the directors' list contained 1781 articles that together included 52006 cited references and 22362 unique cited (1st) authors. Articles in the deans' list of journals, which are arguably more research-oriented, included slightly more references on average than those in the directors' list, which are perhaps more practice-oriented (30.6 vs. 29.2).
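The per-article reference averages reported above follow directly from the article and reference counts; a quick sanity check:

```python
# Verify the reported mean references per article for the two datasets.
deans_mean = 70947 / 2322       # deans' list: cited references / articles
directors_mean = 52006 / 1781   # directors' list: cited references / articles
print(round(deans_mean, 1))      # 30.6
print(round(directors_mean, 1))  # 29.2
```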
Using computer programs we created for this purpose, we calculated citation counts of all authors as well as co-citation counts between pairs of most highly cited authors, once based on the journals in the deans' list and once based on those in the directors' list. From these data we created ACA maps using methods described in Zhao & Strotmann (2008).
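The counting step can be sketched as follows. This is a minimal illustration, not the authors' actual programs; the input format (one set of cited first authors per citing article) and the function name are assumptions for the sketch:

```python
from itertools import combinations
from collections import Counter

def cocitation_counts(reference_lists):
    """Count citations and co-citations of cited (first) authors.

    `reference_lists` holds one entry per citing article: the set of
    first authors appearing in that article's reference list.
    (Hypothetical input layout; the paper does not detail its programs.)
    """
    citations = Counter()    # author -> number of citing articles
    cocitations = Counter()  # (author_a, author_b) -> co-citation count
    for cited_authors in reference_lists:
        citations.update(cited_authors)
        # Two authors are co-cited when they appear together in the
        # reference list of the same citing article.
        for a, b in combinations(sorted(cited_authors), 2):
            cocitations[(a, b)] += 1
    return citations, cocitations

# Toy example with three citing articles
refs = [{"White", "McCain", "Zhao"},
        {"White", "McCain"},
        {"Zhao", "Strotmann"}]
cites, cocites = cocitation_counts(refs)
print(cites["White"])                # 2
print(cocites[("McCain", "White")])  # 2
```

The pairs are keyed in sorted order so that ("McCain", "White") and ("White", "McCain") accumulate into one count.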
Results and discussion
An 11-factor model was produced from a factor analysis of each of the two co-citation matrices of the 150 most highly cited authors in the two datasets. As seen in Table 2, the model fit for the directors' dataset is significantly lower than that for the deans' dataset, as indicated by a lower percentage of total variance explained, a larger number of non-redundant residuals (i.e., differences between observed and implied correlations) with an absolute value greater than 0.05, and a larger number of authors that are not well explained by the factor model (i.e., with low communalities). This suggests that highly cited authors in the directors' dataset are more diverse than those in the deans' dataset.
Table 2. Factor models and their model fits
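As a rough sketch of how such a factor model and its fit statistics can be derived from a co-citation matrix, the following uses an unrotated principal-components extraction. The paper's actual software, rotation method and matrix preprocessing are not specified here, so every detail below (including the toy matrix) is illustrative only:

```python
import numpy as np

def pca_factor_model(cocite_matrix, n_factors):
    """Extract an unrotated principal-components factor model from an
    author co-citation matrix, with the fit statistics discussed above:
    variance explained, communalities, and residual correlations."""
    # Treat each author's co-citation profile as a variable and factor
    # the correlation matrix of those profiles.
    R = np.corrcoef(cocite_matrix)
    eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]      # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(np.maximum(eigvals[:n_factors], 0))
    variance_explained = eigvals[:n_factors].sum() / eigvals.sum()
    communalities = (loadings ** 2).sum(axis=1)   # per-author model fit
    residuals = R - loadings @ loadings.T         # observed minus implied
    return loadings, variance_explained, communalities, residuals

# Toy symmetric "co-citation matrix" with two visible author groups
M = np.array([[10, 8, 7, 1, 0],
              [ 8, 9, 6, 1, 1],
              [ 7, 6, 8, 0, 1],
              [ 1, 1, 0, 7, 5],
              [ 0, 1, 1, 5, 6]], dtype=float)
load, var_exp, comm, resid = pca_factor_model(M, n_factors=2)
```

A lower `var_exp`, more entries of `resid` above 0.05 in absolute value, and more low values in `comm` correspond to the three fit indicators compared in Table 2.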
Table 3 provides the names and sizes of factors of the two 11-factor models. The size of a factor in this table is the number of authors who primarily load on that factor. A factor was named upon examining the articles written by authors who primarily load on the factor.
Table 3. Factors and their sizes
If large factors are interpreted as specialties in ACA (White & McCain, 1998), Table 3 shows that the major specialties of the LIS field revealed by the two datasets are largely the same, although their sizes differ. The main difference is that Children's information behaviour is a distinct specialty in the directors' dataset but does not emerge as a factor from the deans' dataset. In addition, there is a very small and weak factor from the deans' dataset that appears to consist of authors who studied the H-index. This factor, tentatively named Mathematical Bibliometrics, did not emerge from the directors' dataset.
Concerning sizes and distinctness of specialties, the IR interaction specialty is much larger in the directors' dataset than in the deans' dataset, whereas the Scientometrics and IR systems specialties are, conversely, larger in the deans' dataset. Other specialties are roughly of the same size in both datasets, but those marked with square brackets appear clearly in only one of the two datasets (we consider a factor clear if at least 3 authors load on that factor with a value of at least 0.7). For example, the Theory/Foundations specialty is a clear specialty in the directors' dataset, but a weak and unclear one in the deans' dataset. This is quite an interesting finding, because one would have expected stronger citing of theoretical foundations from LIS faculty than from ARL librarians.
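The clarity criterion used above is simple to state as code; the threshold (0.7) and minimum author count (3) are exactly the paper's, while the function name is ours:

```python
def is_clear_factor(loadings_on_factor, min_authors=3, threshold=0.7):
    """A factor is considered clear if at least `min_authors` authors
    load on it with an absolute loading of at least `threshold`."""
    strong = sum(1 for l in loadings_on_factor if abs(l) >= threshold)
    return strong >= min_authors

print(is_clear_factor([0.82, 0.75, 0.71, 0.40]))  # True
print(is_clear_factor([0.82, 0.75, 0.40]))        # False
```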
Interrelationships between specialties are shown and illustrated in Figures 1 and 2, visual representations of the two factor models from ACA using the two data sets. As seen here, the two views of the LIS field are quite similar in terms of the field's overall structure, but show some differences at a more detailed level.
The two-camp structure (i.e., the literatures camp and the information retrieval (IR) camp) that has been observed repeatedly in ACA studies of the information science (IS) field (White & McCain, 1998; Zhao & Strotmann, 2008) can be observed on both maps here. Added to this LIS mainstream, we find three specialties that were not seen at all or not clearly seen in earlier studies of the IS field: Medical informatics, Technology diffusion, and Knowledge management (KM). All three appear to be extensions to LIS.
This picture of mainstream specialties vs. extensions is particularly clear on the map from the directors' dataset (Figure 2), especially when viewed in a three-dimensional layout (not shown here). It is less visible on the deans' map because the connection between the KM specialty and the LIS mainstream specialties (both camps) is much denser and more diverse than on the directors' map where the connection is weak and mainly via shared theoretical or methodological foundations as indicated by Strauss and Latour. The KM specialty also serves as the bridge between the Technology diffusion specialty and the LIS mainstream on both maps, although more strongly on the deans' map than on the directors' map.
The Usage analysis of e-resources specialty is relatively separate from the rest of the literatures camp on both maps, serving as one of the bridges between the two camps. The IR systems specialty, however, is separated out from the rest of the IR camp quite clearly on the directors' map, but much less so on the deans' map. The Theory/Foundations specialty is connected to both camps on the deans' map, but only to the IR camp on the directors' map. The Medical informatics specialty is connected to the LIS field through the IR systems specialty on both maps, but the two connecting authors on the deans' map also co-load on the IR interaction specialty.
The observed differences between the two views of the LIS field can be interpreted as meaning that, compared to ARL librarians, LIS faculty have discovered more connections between LIS and related fields such as management as well as between ‘soft’ IR (i.e., IR interaction) and ‘hard’ IR (i.e., IR systems). They also seem to have discovered some common theoretical foundations for both camps of the LIS field.
This case study explored the effect of field delineation on ACA studies of the intellectual structure of the LIS field. The major overall intellectual structure, as indicated both by the major specialties identified and by the interrelationships between them as mapped out via ACA, remains largely the same when the source papers come from two journal sets that differ in about one third of the journals included.
The two views of the LIS field do however differ at a more detailed level of analysis. Sizes of some major specialties and the strengths of connections between mainstream LIS and its extensions are examples. The differences we find suggest that research valued by the deans has found more diverse as well as stronger connections between mainstream LIS and the extensions to it as well as between different areas within the mainstream, than research valued more by the ARL directors has.
We conclude that field delineation is not crucial to ACA studies of research fields, provided the emphasis of a study is on the major overall structure of a research field. A good sample of the literature may work well enough in many cases, and a complete and clean dataset is justified only when the extra cost is not overwhelmingly high.
The detailed differences we find, however, also suggest that studies that aim to go into great detail or which try to shed light on particularly subtle research policy issues may need to pay serious attention to the way that they delineate their fields.
This study was supported in part by the Social Sciences and Humanities Research Council of Canada.