The present paper explores simultaneous modelling of cross-reference activity between authors by use of asymmetric proximities and multidimensional unfolding. We thereby model and map both citing and cited relations between authors in a common space. This enables a more comprehensive comparison of authors' dual roles of citing and being cited in a reference network. We model a set of 31 authors and compare the results to a recent author co-citation study of Information Science. We find that multidimensional unfolding is a reliable and insightful technique for modelling authors' citing and cited dimensions simultaneously. The common space of citing and cited positions shows that some authors have substantial discrepancies between their citing behaviour and the way their works are used by peers in the set. Further, modelling mutual relationships as asymmetric brings more accuracy and nuance into the maps as relationships become overt. Finally, the study discusses how high publication activity influences mapping results considerably. To counter this effect, we demonstrate the appropriateness of correcting data for main effects by use of an asymmetric proximity measure of odds ratios.
In the decade since the paper by White and McCain (1998) the interest in science mapping has increased considerably. Important methodical improvements have been achieved. Much less attention has been given to the theoretical and methodological assumptions behind science maps, their validity, and how they should be evaluated. Some notable exceptions are, for example, Persson (2001), Boyack (2004), Schneider & Borlund (2007a; 2007b), Klavans & Boyack (2009), and Schneider, Larsen & Ingwersen (2009). The motivation behind the current research is to investigate some of the tacitly held conventions in mapping studies.
Mapping studies are mostly based on co-occurrence analysis. A co-occurrence is an indirect measure of mutual symmetric relationship between two objects (e.g., Schneider & Borlund, 2007a). Co-occurrences are in fact aggregated measures from a reference network. Reference networks are multidimensional with a citing and a cited dimension. A co-occurrence analysis only models one of these dimensions. Co-occurrences are certainly valuable; but they disregard the overt multidimensional nature of a reference network. Co-occurrence analysis implies symmetric relationships between objects, whereas a reference from one object to another is a genuine asymmetric relation. Objects have various related roles in reference networks, contrary to what is modeled in a co-occurrence analysis. The active role of citing determines the “behavioral” profiles of objects. The passive role of being cited creates a fluctuating profile of reception among peers in the network. Consequently, objects have structural patterns in both the citing and cited dimensions of the network. While these patterns may differ considerably, they are associated.
It is therefore important to investigate, simultaneously, both the citing and cited roles of objects in reference networks, and to examine differences between such analyses and traditional co-occurrence analysis used in mapping studies. Hitherto this has been a difficult task; however, the present paper suggests a novel approach to simultaneous modelling of both citing and cited roles of authors in a reference network. We model the reference activity between a set of authors with an asymmetric probabilistic proximity measure. The results are mapped to a common space by use of multidimensional unfolding (Borg & Groenen, 2005), where each author is represented by two points, one for the citing and one for the cited position.
The common space enables comparison of the different roles of authors, their citing behavior, and their cited impact among the authors included in the study. To our knowledge, multidimensional unfolding of reference networks has only been attempted in two previous studies, both with journals as objects (Heiser & Busing, 2004; Schneider, 2008).
The focus of this paper is methodical. We are interested in the modelling effects of the explored approach. We therefore employ a small convenience sample of objects based on a recent author co-citation mapping of the field of Information Science (Persson, 2008). The convenience sample enables us to model some known characteristics from the findings in Persson (2008). The purpose is to explore how cross-reference activity between authors, modeled by an asymmetric proximity measure and multidimensional unfolding, can supplement, elaborate or even question some of the tacitly held conventions and common findings in author co-citation analyses. Notice, this is not a mapping study of Information Science per se. The paper is structured as follows: The next section elaborates on the background for the study; subsequently we present the methods applied; followed by the results section; and finally we summarize and discuss the major findings from the unfolding analysis.
In bibliometric studies, cross-reference activity is modeled by use of transaction matrices (Price, 1981; Noma, 1982). Transaction matrices are square asymmetric matrices. A square matrix has identical objects in rows and columns. The matrix is asymmetric because the information flow expressed by references given by one object to another does not necessarily reflect a mutual relationship. A transaction matrix for a set of authors therefore has citing authors in rows and cited authors in columns. The diagonal of the matrix contains the authors' self-references. Notice that authors in rows and columns are identical due to the matrix being square. Hence, the matrix represents transactions between the set of authors, in the present case their cross-reference activity.
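The construction of such a matrix can be sketched as follows. This is a minimal illustration and not the procedure used to collect the data in the study; the function name `build_transaction_matrix`, the author labels and the reference pairs are hypothetical.

```python
import numpy as np

def build_transaction_matrix(references, authors):
    """Count references from citing author (row) to cited author (column)."""
    index = {name: k for k, name in enumerate(authors)}
    matrix = np.zeros((len(authors), len(authors)), dtype=int)
    for citing, cited in references:
        # References involving authors outside the chosen set are ignored.
        if citing in index and cited in index:
            matrix[index[citing], index[cited]] += 1
    return matrix

# Hypothetical toy data: (citing author, cited author) pairs.
authors = ["A", "B", "C"]
references = [("A", "B"), ("A", "B"), ("B", "A"),
              ("A", "A"), ("C", "A"), ("C", "D")]

F = build_transaction_matrix(references, authors)
print(F)  # rows: citing authors, columns: cited authors, diagonal: self-references
```

In this toy example A cites B twice while B cites A only once, so the matrix is square but not symmetric, and the single self-reference by A sits on the diagonal.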
Modelling cross-reference transactions is not novel. Typically, the focus is upon cross-reference activity between journals and not authors, as suggested in this study (e.g., Tijssen, De Leeuw & Van Raan, 1987; Leydesdorff, 2001). Especially the utilization of journal transactions, available from the Journal Citation Reports®, has been the focus of recent studies by Leydesdorff (e.g., 2004; 2006; 2007a; 2007b). Common to these studies is that they find one position for a journal in the map, which supposedly reflects both the citing and cited dimensions of the journal. The present paper is a continuation of these studies, but we apply multidimensional unfolding in order to position authors simultaneously in both their citing and cited dimensions (Borg & Groenen, 2005).
Multidimensional unfolding was originally proposed by Coombs (1950). It is a data analysis technique for two-way two-mode proximity data, such as data in a transaction matrix (Borg & Groenen, 2005). Unfolding models are often used with preference data, i.e., the preference orders of n subjects for m objects. In the general unfolding situation, we do not necessarily have objects where n = m, as in the present study. We may even have completely different types of objects in rows and columns. Distance-based multidimensional unfolding can be considered an extension of multidimensional scaling (MDS). In MDS models we have d (xi, xj) = d (xj, xi), whereas in unfolding models we generally find d (xi, yj) ≠ d (xj, yi).
Thus distances in unfolding are inherently asymmetric. Further, in MDS we must have d (xi, xi) = 0, whereas in general we have d (xi, yi) ≠ 0 in unfolding. Unlike MDS, the unfolding model also allows modelling of the diagonal of the square matrix. In theory, this implies that the unfolding model is able to reflect all three relational aspects present in a reference transaction matrix: citing, cited and self-citing relations. To our knowledge, self-reference activity is almost never considered in mapping studies, at least not modelled and reflected in the mapping solutions. For an asymmetric matrix, multidimensional unfolding finds coordinates in a low-dimensional space for two sets of objects with a common quantitative scale. Until recently, unfolding models suffered from degenerate solutions in the majority of cases. A degenerate solution satisfies a given optimality criterion perfectly, while at the same time resulting in a substantively trivial, non-informative spatial representation (Heiser, 1989). For example, a configuration is recurrently found where one set of objects collapses into the same location at the centre of a circle formed by the other set of objects. Recently, solutions to the degeneracy problem have been found in two novel unfolding algorithms, PREFSCAL (Busing, Groenen, & Heiser, 2005) and GENEFOLD (Van Deun et al., 2007), respectively.
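The difference between the two distance structures can be illustrated numerically. The sketch below assumes Euclidean distances and uses hypothetical coordinates for three authors; `unfolding_distances` is an illustrative helper, not part of any unfolding package.

```python
import numpy as np

def unfolding_distances(X, Y):
    """Euclidean distances d(x_i, y_j) between row points X and column points Y."""
    diff = X[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Hypothetical two-dimensional coordinates for three authors.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])  # citing positions x_i
Y = np.array([[0.5, 0.0], [1.0, 1.0], [3.0, 2.0]])  # cited positions y_j

D_unfold = unfolding_distances(X, Y)  # two sets of points, as in unfolding
D_mds = unfolding_distances(X, X)     # one set of points, as in MDS

print(np.allclose(D_mds, D_mds.T), np.allclose(np.diag(D_mds), 0.0))
print(np.allclose(D_unfold, D_unfold.T), np.diag(D_unfold))
```

With a single set of points the distance matrix is symmetric with a zero diagonal. With separate citing and cited points it is in general neither, and the diagonal entries d (xi, yi) are the distances between an author's own citing and cited positions.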
In order to explore the potential of modelling author cross-reference activity by multidimensional unfolding, especially seen in relation to author co-citation analysis, the present study investigates some of the major structural findings in the work by Persson (2008). Persson (2008) presents a recent author co-citation map of Information Science for the period 2003–2007. The map contains 51 of the most cited authors for this period. These cited authors are identified from the same journal set used by White and McCain (1998) in their renowned study of Information Science. Three overall clusters are identified: “informetrics”, “soft information retrieval (soft IR)” and “hard information retrieval (hard IR)”. The “informetrics” cluster is the largest (most authors), followed by “soft IR”, and finally “hard IR”. According to Persson (2008, p. 37) some of the major findings are that “webometrics” has emerged as a strong subfield within “informetrics”, and that “Peter Ingwersen” seems to be a central node linking “webometrics” and “informetrics” to the “soft IR” cluster. Persson (2008, p. 37) points out that it seems “…information science is getting more integrated and some of our colleagues are doing the job”. The present study investigates these findings from a different perspective, i.e., cross-reference activity between some of the actual authors found in the map. Accordingly, the present study explores: 1) a different data set; 2) direct measures instead of indirect measures; 3) asymmetric relations instead of symmetric ones; and 4) modelling and mapping of both citing and cited dimensions, instead of only one of them. The next section presents the method applied in the study, including data selection choices, data collection, and the general settings for the analysis.
In the study, our primary interest is the effects of modelling author cross-reference activity, and not the actual mapping of a specific discipline. In order to examine the potential of the present technique, we take a retroductive research approach, where we investigate some established findings from recent author co-citation analyses of Information Science. The objects in this study are therefore chosen selectively in order to focus exclusively upon distinct modelling aspects of the citing and cited relations between authors. As stated above, we use the findings in Persson (2008) as our context for the present analysis. Our method comprises four steps: 1) the collection of reference data in Social Sciences Citation Index® for the chosen authors; 2) the construction of a transaction matrix of cross-reference activity between the authors; 3) the calculation of asymmetric probabilistic proximities; and finally 4) the multidimensional unfolding of the matrix.
The data set comprises cross-reference activity between 31 authors. Author in this context means oeuvre, a body of writings by a person and not the person himself (White & McCain, 1998). Of the included authors, 29 are subjectively chosen among the highly cited authors represented in the author co-citation analysis of Information Science presented in Persson (2008). Two extra authors are included. Identification of the pertinent structures and groupings, and not necessarily who is in or out of the map, is the central aim in mapping studies. Obviously, such an aim entails that essential objects are included in the study. In this analysis we selected authors from the three major groupings identified in Persson (2008).
Table 1 presents the authors as well as their subfield grouping in Persson (2008); blank indicates the extra authors.
Table 1. Authors chosen for the present study and their subfield grouping in Persson (2008); i-metric (informetrics); web (webometrics); soft IR (soft information retrieval); and hard IR (hard information retrieval)
To a certain degree we reflect the slight differences in the number of authors in the groupings, though this is of minor importance to the present study. Consequently, the current set of objects is neither exhaustive nor representative of the field of Information Science. They are chosen in order to explore the ability of multidimensional unfolding to model known characteristics of co-citation maps. Individual reference activity profiles are obtained for the chosen authors. Similar to traditional author co-citation analysis, our study is limited to first cited author reference activity. Thomson Reuters' Social Sciences Citation Index® is used to identify first author cross-references between objects. Notice that we do not rely upon a specific journal set.
We do not limit the reference activity to a certain time period. Obviously, reference profiles for objects that have been active for longer periods of time will tend to be larger and more varied. This pattern is normal and poses no problem to the unfolding analysis in general for at least two reasons: 1) magnitude is levelled out by the applied proximity measure; 2) it seems that reference profiles rapidly become stable, approximating a power law distribution. When reference activity cumulates over time, the reference odds ratio tends to become stable. A sudden change in interest and publication activity is needed in order to change the otherwise stable reference pattern.
A 31 by 31 square asymmetric transaction matrix is constructed, where rows and columns correspond to authors giving references and authors receiving citations, respectively. The transaction matrix is transformed into a simple probabilistic proximity matrix of observed to expected cross-reference activities, corrected for main effects in the data. Main effects reflect the tendency of some authors to have consistently higher frequencies than others. Our purpose is to model author structure, where the magnitude of individual authors' reference activity does not dominate. Contrary to most mapping studies, we apply a probabilistic measure to reflect the asymmetric nature of data and to level out main effects in data. We denote frequencies fij, where i is the citing and j is the cited dimension. If authors give references to each other in a random fashion, we would expect the joint frequencies to satisfy the formula for the expected frequencies (eij) under independence:
\[ e_{ij} = N \cdot \frac{f_{i+}}{N} \cdot \frac{f_{+j}}{N} = \frac{f_{i+} \, f_{+j}}{N}, \]
that is, the product of the estimated probability of citing and the estimated probability of being cited times the total number of citations N (here, the + in the marginal totals replaces the index over which we have summed). As a measure of proximity to be used in the unfolding model, we define the odds of author ai giving a reference to author bj against what we expect under independence:
\[ \rho(a_i, b_j) = \frac{f_{ij}}{e_{ij}} = \frac{N \, f_{ij}}{f_{i+} \, f_{+j}}. \]
The odds are given in Table 2 in Appendix 1. Note that ρ (ai, bj) = 1 if author ai gives references to author bj as expected according to the size of cross-reference activities of the two authors; ρ (ai, bj) < 1 if author ai gives fewer references to bj than expected according to size; and ρ (ai, bj) > 1 if author ai gives more references to author bj than expected. In the unfolding solution, ρ (ai, bj) < 1 will lead to a relatively large distance d (xi, yj) and ρ (ai, bj) > 1 to a relatively small distance d (xi, yj). Author self-references are in the diagonal and are generally greater than expected. Remember that self-reference odds ratios are specific to the present set of 31 authors. They are not the actual self-citing rates for the chosen authors. Each is the odds ratio for giving a reference to oneself compared to giving a reference to one of the other authors included in the study. For example, the odds that VAN RAAN gives a reference to himself rather than exchanging references with one of the other authors in the set are 21 to 1. Very high self-citing odds may, for example, indicate a restricted reference activity for a given author towards most of the authors in the set.
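The proximity measure described above, observed frequencies divided by the frequencies expected under independence, can be sketched in a few lines. The function name `odds_ratios` and the toy counts are hypothetical, and the sketch assumes that every author has at least one reference given and one citation received, so that no expected frequency is zero.

```python
import numpy as np

def odds_ratios(F):
    """Observed-to-expected ratios f_ij / e_ij for a transaction matrix F."""
    F = np.asarray(F, dtype=float)
    N = F.sum()                                # total number of citations
    row_totals = F.sum(axis=1, keepdims=True)  # f_{i+}: references given
    col_totals = F.sum(axis=0, keepdims=True)  # f_{+j}: citations received
    E = row_totals * col_totals / N            # expected under independence
    return F / E

# Hypothetical 3 x 3 transaction matrix (rows cite columns).
F = np.array([[10,  2,  1],
              [ 4, 20,  3],
              [ 1,  2, 30]])
rho = odds_ratios(F)

# A table in which citing and cited behaviour are independent.
F_independent = np.outer([1, 2, 3], [4, 5, 6])
rho_independent = odds_ratios(F_independent)
```

For the independent table every ratio equals 1, whereas the diagonal of the toy matrix yields ratios above 1, mirroring the observation that self-references are generally greater than expected.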
The asymmetric proximity matrix of cross-reference odds ratios is the basis for the multidimensional unfolding analysis. In the present study we apply the GENEFOLD algorithm (Van Deun et al., 2007). GENEFOLD performs an alternating least squares and iterative majorization procedure. The procedure starts with a random configuration of the chosen dimensionality and then alternates between transformation updates, configuration updates and regression weight updates. This is to ensure that the sequence of loss function values converges to a local minimum, providing an optimal configuration while avoiding degenerate solutions. We apply an ordinal unfolding approach where distances in the map have a monotone relation with the odds ratios in the proximity matrix. The subsequent section presents and discusses some of the characteristic findings of the multidimensional unfolding analysis.
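To convey the general idea of fitting two sets of points to a proximity matrix, the following sketch minimises a raw stress function by plain gradient descent. It is explicitly not GENEFOLD: it is metric rather than ordinal, it has no transformation or regression weight updates, and it contains no safeguard against degeneracy. The function `metric_unfolding`, the conversion of odds ratios into dissimilarities via 1 / (1 + ρ), the step size and the toy odds ratios are all assumptions made for illustration.

```python
import numpy as np

def metric_unfolding(delta, dim=2, steps=2000, lr=0.01, seed=0):
    """Fit citing points X and cited points Y so that d(x_i, y_j) ~ delta_ij."""
    rng = np.random.default_rng(seed)
    n, m = delta.shape
    X = rng.normal(size=(n, dim))  # random start for citing positions
    Y = rng.normal(size=(m, dim))  # random start for cited positions
    history = []
    for _ in range(steps):
        diff = X[:, None, :] - Y[None, :, :]          # shape (n, m, dim)
        d = np.sqrt((diff ** 2).sum(axis=2)) + 1e-12  # avoid division by zero
        resid = d - delta
        history.append(float((resid ** 2).sum()))     # raw stress before update
        g = (resid / d)[:, :, None] * diff
        X -= lr * 2.0 * g.sum(axis=1)  # gradient of stress w.r.t. x_i
        Y += lr * 2.0 * g.sum(axis=0)  # gradient w.r.t. y_j has opposite sign
    return X, Y, history

# Hypothetical odds ratios; large ratios should give small distances.
rho = np.array([[3.7, 0.4, 0.1],
                [0.5, 2.2, 0.3],
                [0.1, 0.2, 1.9]])
delta = 1.0 / (1.0 + rho)  # assumed monotone decreasing transformation
X, Y, history = metric_unfolding(delta)
print(history[0], history[-1])
```

Each author receives one row in `X` (citing position) and one row in `Y` (cited position) in a common space. An ordinal approach such as the one applied in the study would instead re-estimate a monotone transformation of the odds ratios at every iteration.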
In multidimensional unfolding, locating the citing and cited positions of an author, assessing the mutual distance between positions, and interpreting the positions relative to other nearby authors indicate the authors' degree of resemblance in their different roles of citing and being cited, as well as their overall structural positions in the investigated reference network. Authors with similar patterns, citing, cited or both, will tend to group. A good modelling solution will locate an author's citing and cited positions according to the most dominant cross-reference proximities in their profiles. Remember that the ratios are the odds of a row author citing a column author against the expected values under independence. Notice that the unfolding solution is a common space for the two sets of objects. This entails that citing and cited positions, and distances between them, can be jointly interpreted. How an author presents him- or herself through citing behavior, and how the same author's work is perceived by peers in the network, may differ considerably. It is important to emphasize that the authors should be considered primarily as objects (oeuvres), and not as individuals. Their profiles are not exhaustive, and their inclusion in the present study is highly subjective. Take for example SALTON. In this study, SALTON citing is a heavily reduced profile of references given by Gerald Salton to the other objects included in the study. Only publications indexed by Thomson Reuters are considered in the citing profiles, and only the first cited author of a work is considered in the cited profiles. As Gerald Salton is deceased, his citing profile is in principle concluded and thus fixed, though the one applied in this study is definitely not exhaustive. A fixed, concluded citing profile is no problem. Theoretically, an object with a concluded citing profile should eventually converge upon a stable position among citing equals.
Cited profiles, however, are in principle infinite and never fixed. This entails that the object's cited dimension, in principle, is in a state of flux. With this in mind, we now examine some of the characteristic findings of the unfolding solution.
The result of the multidimensional unfolding analysis for the two sets of objects, i.e., the citing and cited dimensions for each individual author, is presented in Figure 1. The closed circles indicate citing positions and the open circles the cited positions for each author. Citing labels are to the right of the closed circles and cited labels are to the left of the open circles.
Several collective groupings emerge from the unfolding solution. At first these groupings seem to correspond with Persson (2008). Moving clockwise around the map, in the upper right quadrant, we identify what would be the “soft IR” group in Persson (2008). This group comprises: SPINK, WILSON, JANSEN, CHEN, VAKKARI, DERVIN, KUHLTHAU, BATES, SARACEVIC, BELKIN, HJORLAND, INGWERSEN and perhaps CRONIN. Notice that there are some marked differences between the citing and cited positions of some of the objects within this group. Overall the cited positions are fairly coherent, perhaps with the exception of HJORLAND and INGWERSEN. INGWERSEN's citing and cited positions in particular are some distance apart; we return to that below. Most notable is the clear separation of this group when we compare their citing positions to the cited ones, where the cited positions seem more coherent and the citing positions more split. We can see that SPINK, WILSON, JANSEN, VAKKARI, DERVIN and KUHLTHAU remain as a citing group in the upper right corner, whereas CHEN, BATES, HJORLAND, and especially SARACEVIC, BELKIN, and INGWERSEN move close to the citing and cited positions of what would be the “hard IR” group. Consequently, the citing behaviors of SARACEVIC, BELKIN, and INGWERSEN are much closer to that of the “hard IR” group than the citing behaviors of SPINK, WILSON, JANSEN, VAKKARI, DERVIN and KUHLTHAU. However, when cited, these objects are much more alike, as the group becomes more coherent. They give references to each other, and the odds of being cited from, for example, the “hard IR” group are very low. The unfolding analysis thus indicates the differences between the citing and cited dimensions of these authors. There is a clear difference between how these authors perceive themselves, as signified by their citing behavior, and how others perceive them, as signified by the reception of their works. In fact, what is indicated is a substructure not clearly deducible from the co-citation map in Persson (2008).
Moving further clockwise round the map, we find the “hard IR” group in the lower right corner. We will return to this group in the subsection below. At 6 o'clock we find the first of the “informetric” groupings. This one comprises GARFIELD, SMALL, WHITE and MCCAIN. This group seems coherent as distances between citing and cited positions are small. Notice that SMALL's cited position is a bit closer to WHITE and MCCAIN. While GARFIELD has certainly published in the scientometric field, the position and structure of this group suggest that they are closer to both “soft and hard IR”. This is not entirely unreasonable. It is tempting to name this group the “Philadelphia group”, with an ISI and a Drexel subgroup.
Moving further to the middle left part of the map, we identify what would be the “scientometrics” subgroup of the “informetrics” group. It comprises ROUSSEAU, VANRAAN, MOED, GLANZEL, EGGHE, and LEYDESDORFF. There are however some noticeable differences between the citing and cited structure of these authors. In most co-citation maps EGGHE and ROUSSEAU are located close to each other. This is also the case in the present analysis but only for the cited positions. It is clear that the unfolding analysis models ROUSSEAU's citing behavior differently from that of EGGHE. ROUSSEAU's citing position is some distance apart from the cited position, whereas EGGHE's citing position is fairly close to his cited one. This suggests some difference in research orientation between the two of them.
In the upper left to middle part of the map we find what would correspond to the “webometrics” group in Persson (2008). The group comprises THELWALL, VAUGHAN and BAR-ILAN. It is clear that there is a marked difference between the citing and cited position of THELWALL. The citing position of THELWALL is some distance away from the more coherent cited positions of the three “webometrics” objects. In Persson (2008) it is suggested that “webometrics” is the primary link between “informetrics” and “soft IR”. This is not immediately discernible from the unfolding analysis. The group seems more isolated, though there is some likelihood that BAR-ILAN will be cited by some authors outside this group. This is indicated by the cited position. It may be that the isolation of the group to some extent is the result of these objects being the “youngest” chosen for the analysis, though they do receive references from other authors in the set. More likely, the collective position of the group is the result of high internal publication activity, which also results in high internal cross-reference activity. We further discuss this in a subsection below.
Finally, there is the centre of the map. CRONIN seems to move along a vertical axis, with the citing position a bit to the “informetric” side and the cited position closer to the “soft IR” side. This makes CRONIN the central object in the present unfolding analysis. Interestingly, CRONIN is currently editor of two of the most influential Information Science journals: Annual Review of Information Science and Technology and Journal of the American Society for Information Science and Technology.
Notice, this is our interpretation. What is important in the present study is not whether this is a good reflection of the subgroups of Information Science. It is instead to explore the modelling of asymmetric cross-reference data in an unfolding analysis to see how such analyses can supplement, elaborate, or question traditional author co-citation analyses.
Below we give two specific examples of the modelling capabilities of the unfolding analysis.
The importance of asymmetric relations: “hard information retrieval”
In the lower right corner of Figure 1, CROFT, HARMAN, SALTON, ROBERTSON and SPARCK JONES seem to comprise the “hard IR” group. Interestingly, the distance between the citing and cited positions of these objects is small, except for the cited position of SALTON, which is slightly more towards the lower middle of the map. This is an indication that these objects have a clear tendency to cite within their own group. The indication is confirmed by the cross-reference odds ratios in Table 2. The only external object with a consistent odds ratio ≥ 1 of being cited by objects in this group is BELKIN. If we examine the positions of cited objects external to the group (open circles) and compare them to the group's citing positions, we find that BELKIN is indeed the closest cited object; however, it is some distance away from the “hard IR” group. Such a structural position of BELKIN is not apparent from Persson (2008). Notice that many of the objects in the present study are or have been active researchers during the same time periods. They have therefore had the opportunity to give references to at least some of the other objects internal or external to their present groupings. Apparently, external reference activity is very limited for the “hard IR” group. In the present study, “retrievalists” do not repay the gracious reception their works get from “information scientists”. The cross-reference activity is not mutual; the relationship is asymmetric, going from Information Science objects to information retrieval objects.
The findings by Persson (2008), as well as other co-citation analyses, suggest that “hard IR” is a subfield of Information Science. Information retrieval is certainly important to Information Science; it is part of its “intellectual base”. However, it is questionable whether the cited authors that usually comprise the “hard IR” group would see themselves as “information scientists”, and indeed information retrieval as a subfield of Information Science. To them, in contrast, information retrieval is rather a subfield of computer science. They have different publication patterns than Information Science, with special conferences and journals most often not included in mappings of Information Science. They do, however, occasionally publish in journals considered among the core of Information Science. This surely indicates common interests between information retrieval and Information Science. However, as indicated in this unfolding analysis, the citing preference of “retrievalists” is clearly directed towards other “retrievalists”, also when they publish in Information Science journals (if indeed such can be said to exist per se).
Consequently, the present unfolding analysis questions whether “hard IR” is a subfield of Information Science. We should of course be very careful not to make hasty generalizations about a group of objects based on insufficient data. At the very least, we demonstrate the potential of modelling direct asymmetric citing – cited relations, and emphasize its value when interpreting findings based on indirect co-occurrence relationships. Notice that we do not question the appearance of “hard IR” in co-citation maps. We just emphasize that it is a product of indirect co-occurrence analysis, and that co-occurrences can conceal more genuine asymmetric relationships. This should be remembered when interpreting bibliometric maps.
The influence of publication activity: “webometrics”
Recent mappings of Information Science, such as Åström (2007) and Persson (2008), have indicated that during the last decade “webometrics” has emerged as a strong research specialty within Information Science. Persson (2008) assigns the cited oeuvre of INGWERSEN a structural role as the central connection between “webometrics” (“informetrics”) and “soft IR”. INGWERSEN connects the two major subfields. Indeed, “Peter Ingwersen” and co-workers published some founding papers on “webometrics” in the late 1990s and early 2000s. Nevertheless, this publication behaviour is not visible in the present analysis. INGWERSEN's works on “soft IR” dominate his citing position. INGWERSEN's cited position, however, is a considerable distance apart from the citing one. It is located slightly above the central axis in the map. From Table 2 we can infer that INGWERSEN has several odds ratios of receiving citations ≥ 1 and that most of the authors who give these references come from the “soft IR” group, but noticeably, also from the smaller “webometrics” group. Despite the “soft IR” group's reference activity towards INGWERSEN, his cited position seems to be on their periphery.
In Persson (2008) “webometrics” has a central position and THELWALL and INGWERSEN have central roles – indeed INGWERSEN seems to bridge the two groups. Yet in the present analysis “webometrics” is on the fringe of the map; it has no apparent central role. This could of course be an artefact of the choice of objects included in the study. But this is probably not so. On the contrary, the central position of “webometrics” in Persson (2008) is more likely due to the influence of “mass”, which means that the study is based upon a considerable number of publications from the authors in the “webometrics” group. A straightforward online analysis in Web of Science® supports this claim. In the journal set used by Persson (2008), given its time limitations, THELWALL, BAR-ILAN and VAUGHAN are among the top 20 most productive authors. More distinctively, THELWALL is by far the most productive author in the set with 43 journal articles. In second place comes EGGHE with 28 – a marked difference between rank 1 and 2 in this distribution. If we also investigate the distribution of authors who give references to INGWERSEN (no time limit), the picture of “mass” becomes apparent. Excluding INGWERSEN himself, all three “webometrics” authors are in the top 10 authors who give references to INGWERSEN and, most astonishingly, THELWALL's citing behaviour is responsible for almost 10% of all INGWERSEN's citations. Number two is SPINK (another highly productive author in the data set) with around 4% of the citations. It is an unusual citation profile. It is rare that one author (other than the author himself or herself) contributes such a large proportion of citations. Obviously, if such a citing author is included in a study, “the force of gravity”, to use a physical metaphor, will play a large role.
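The notion of “mass” can be made concrete by computing, for a given cited author, the share of citations contributed by each citing author. The sketch below uses a hypothetical within-set transaction matrix; the percentages reported above stem from an analysis of all citing authors in Web of Science and are not reproduced here. The helper `citing_shares` is illustrative.

```python
import numpy as np

def citing_shares(F, cited, exclude_self=True):
    """Share of the citations to author `cited` contributed by each citing author."""
    column = np.asarray(F, dtype=float)[:, cited].copy()
    if exclude_self:
        column[cited] = 0.0  # leave out self-references
    total = column.sum()
    if total == 0:
        raise ValueError("the author receives no citations from other authors")
    return column / total

# Hypothetical transaction matrix (rows cite columns).
F = np.array([[5, 9, 1],
              [2, 7, 1],
              [1, 3, 4]])
shares = citing_shares(F, cited=1)
print(shares)  # one entry per citing author; the entries sum to 1
```

A single citing author with a very large share signals that the cited author's profile, and hence any map built on raw counts, is strongly shaped by that one author's publication activity.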
So what is the effect of the “force of gravity” in the present study? The probabilistic proximity measure applied in this study corrects for main effects – “mass” – when modelling cross-reference activity. The cited position of INGWERSEN reflects that THELWALL and the “webometrics” group, as well as the “soft IR” group, are likely to cite INGWERSEN. However, contrary to the study by Persson (2008), the position of INGWERSEN is not particularly central for the two groups; others, such as JANSEN, also have bridging roles. Though the “webometrics” group is heavily cited, the citations it receives are mainly internal – asymmetric. Contrary to Persson (2008), the multidimensional unfolding is capable of modelling this effect in a way that positions “webometrics” more on the fringe than in the centre of the Information Science map. It is therefore important to notice that in the data set used in Persson (2008), the three “webometrics” authors contribute a substantial number of publications; THELWALL especially has more publications than would be expected. With such a proportion of publications, reference activity is of course influenced, both references to others and self-references. As objects in mapping studies are chosen among highly cited ones, and since self-references are not dealt with, these authors seem to be in a “favourable position” due to their publication activity. Subsequently, when indirect and symmetric measures such as co-citations are used, the reference activities of these authors come to play an important role in the resulting maps. It is highly probable that their central position in Persson (2008) is to a large degree self-constructed due to their citing “mass”. More generally, this effect is latent in all studies that use indirect measures of co-citation and do not consider publication activity. This is not the case with multidimensional unfolding. Here self-imposed “mass” is detected and modelled differently.
Hence, the question is whether the “webometrics” group has an artificial or disproportionate weight upon the structural patterns in Persson (2008).
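The correction for “mass” discussed above can be illustrated with a small sketch. It is not the exact measure used in this study; it assumes a conventional 2×2 odds ratio computed for each citing–cited pair from a matrix of cross-reference counts, with self-references excluded. The function name `odds_ratio_matrix`, the `smoothing` parameter (a small constant added to each cell to avoid division by zero) and the toy counts are all hypothetical.

```python
def odds_ratio_matrix(counts, smoothing=0.5):
    """Asymmetric odds ratios from a square matrix of cross-reference counts.

    counts[i][j] is the number of references given by author i to author j.
    Assumption: self-references are excluded, so the diagonal is set to zero.
    """
    n = len(counts)
    refs = [[0 if i == j else counts[i][j] for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in refs)
    given = [sum(row) for row in refs]                       # citing "mass" of each author
    received = [sum(refs[i][j] for i in range(n)) for j in range(n)]  # cited "mass"

    result = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no odds ratio is defined for self-references here
            a = refs[i][j]            # references from i to j
            b = given[i] - a          # references from i to all others
            c = received[j] - a       # references to j from all others
            d = total - a - b - c     # references involving neither i (citing) nor j (cited)
            s = smoothing
            result[i][j] = ((a + s) * (d + s)) / ((b + s) * (c + s))
    return result


# Toy data: author 0 is highly productive and gives 40 of the 55 references.
toy_counts = [
    [0, 30, 10],
    [3, 0, 9],
    [1, 2, 0],
]
toy_odds = odds_ratio_matrix(toy_counts, smoothing=0.0)
```

In this toy example author 0 gives more references to author 2 (10) than author 1 does (9), yet the odds ratio for author 0 towards author 2 falls below 1 while that for author 1 is well above 1, because author 0's overall citing volume is taken into account. The measure is also asymmetric: the value for author 0 towards author 1 differs from the value for author 1 towards author 0.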
With this we end the presentation of the unfolding analysis. The last section sums up the findings and discusses their implications and possible limitations.
Summary and discussion
The present paper explores modelling of cross-reference activity between authors by use of asymmetric probabilistic proximities and multidimensional unfolding. As a result we are able to simultaneously model and map both citing and cited relations between authors in a particular reference network. We thereby examine the different roles of authors, their “citing behaviour” and their “cited reception” among peers. Three major findings from the present analysis are important. First, we demonstrate that multidimensional unfolding is a reliable and insightful technique for modelling authors' citing and cited dimensions simultaneously. The common space of citing and cited positions shows that some authors have substantial discrepancies between their citing behaviour and the way their works are used by peers in the set. The case of INGWERSEN is illustrative; the citing behaviour is close to the “hard IR” group, while the cited position is some distance away, split between “soft IR” and “webometrics”. Second, modelling mutual relationships as asymmetric brings more precision, accuracy and nuance into the maps as relationships become more overt.
This is clearly demonstrated in the present case with the “hard IR” group. In most author co-citation studies of Information Science, this group is tacitly accepted as a major subgroup of Information Science. We claim that such interpretations seem to forget the indirect and symmetric nature of co-citations. The reason why “hard IR” turns up in mappings of Information Science is that the likelihood of information scientists citing this group is considerably higher than the likelihood of retrievalists citing information scientists – only a few references are returned. Information retrieval (IR) is important for Information Science. It is one of the “intellectual bases”, but it is questionable whether IR is a subgroup of Information Science. This case demonstrates the importance of examining mutual relations as genuinely asymmetric, which reduces ambiguity when interpreting mapping results. Third, the application of an asymmetric probabilistic proximity measure of odds ratios demonstrates the appropriateness of correcting data for main effects. Most author co-citation analyses collect their data from a limited set of journals that defines the field of study. A considerable variation in publication activity among journals and their authors is very likely in such a set. Consequently, this will influence the subsequent mapping study, as high publication activity leads to dominating reference patterns. In principle, this may not be a problem; however, we should reflect upon it as it certainly influences the results. In the present case, we demonstrate this with the “webometrics” group. Whereas the present odds ratios and unfolding analysis position “webometrics” on the fringe of the present map, in Persson (2008), where “mass” is not corrected for, the group has a much more central position in the map.
Consequently, we think that the present study demonstrates that modelling cross-reference activity by use of an asymmetric probabilistic proximity measure and multidimensional unfolding not only elaborates and supplements an author co-citation map; it also questions some of its established findings, even considering that the data are basically different.
There are some limitations to the present study. First, we only apply first cited author data. Collection of all-author citation data is possible, and should be examined further. Second, we still need to investigate the scalability of the unfolding technique. As unfolding is a dimensionality reduction algorithm, it necessarily has limitations as to the number of objects it can reasonably model. This also needs to be investigated further. In addition, it may appear that “younger” objects would be difficult to model in this approach. It is certainly true that “older” objects have larger and more stable reference profiles, and that the profiles of “younger” objects fluctuate more. This may create some flux in the maps, but over time author profiles approximate power law distributions and their citing and cited odds ratios become stable. Consequently, their positions in the maps will also stabilize. When profiles are stable, a substantial number of changes are needed in order to alter an object's position in a general structure. So, in principle, all objects can be modelled; in fact, it is possible to analyse both a “younger” and an “older” generation of authors in a field simultaneously.
Finally, the importance of bibliometric maps lies in their ability to reflect structure. Structural patterns, dynamics and stability are important, not necessarily who is in or out of the maps. Most maps cannot be exhaustive, as disciplinary boundaries are not exact or agreed upon and, perhaps more importantly, comprehensive data are not available.
Table 2. Asymmetric proximity matrix of cross-reference activity odds ratios between authors included in the study. An odds ratio of 1 indicates the expected reference activity of one author to another given the total cross-reference activity of the two authors; odds ratios below or above 1 reflect reference activity that is lower or higher than expected.