Geovisualization of bibliographic data using ArcGIS



This study examined the geographic aspects of literature involving the visualization of bibliographic data published by authors residing in the contiguous United States. ArcGIS was used to visualize networks of cited-citing publications and co-authors for 102 publications based on first author institutional affiliation. Spatial statistics and other tools within ArcGIS were used to explore the clustering of research activity and test the “death of distance” hypothesis among co-authors. Both the “producers” and “consumers” of the scholarly output were found to be clustered. Visual inspection of the thematic maps found research activity concentrated in the following cities: Bloomington, IN; Philadelphia, PA; Sandia, NM; Stillwater, OK; and Tucson, AZ. Over half of the co-authorship (60%) occurred among authors within the same ZIP code. The cited-citing publication network and co-author network maps shared a characteristic pattern indicating that many producers and consumers also co-authored with each other. While the number of co-authored publications in the field of visualization of bibliographic data increased from 1995–2009, the average co-author distance showed no statistically significant change from 2001–2009, the years with enough co-authored publications to test.


Visualizing relationships between authors and co-authors using citation maps began with and even facilitated the development of citation indexes (Allen, 1960). While most citation maps are created in relative space (i.e., relationships are scaled relative to each other), there is growing interest in mapping bibliographic data in geographic space to explore collaboration, national research productivity, and research spillover to surrounding businesses. More recently, the impact of the Internet on co-author collaboration has been examined to investigate whether distance remains an impediment to collaboration (Adams, Black, Clemmons, & Stephan, 2005; Hoekman, Frenken, & Tijssen, 2010; Maggioni & Uberti, 2009; Waltman, Tijssen, & Van Eck, 2011). This assertion, that the telecommunications revolution has eliminated impediments and facilitated collaboration, is often referred to as the “death of distance” (Cairncross, 2001). However, most collaboration appears to be local (e.g., advisor-student or colleagues at the same institution) and the First Law of Geography (Tobler, 1970) would predict collaboration to be localized. More specifically, “[e]verything is related to everything else, but near things are more related than distant things” (Tobler, 1970, p. 236).

Geographic Information Systems (GIS) are increasingly being utilized to present bibliographic data in geographic space. One of the earliest studies to use ArcGIS, a commercially available GIS, to map bibliographic data in geographic space is Batty's (2003) study of institutional research productivity by highly cited individuals from the Institute for Scientific Information's HighlyCited database. The bibliographic data was mapped using author institutional affiliation and presented as thematic maps with proportional symbols. Borner, Penumarthy, Meiss, and Ke (2006) mapped a set of papers from the Proceedings of the National Academy of Sciences based on first author affiliation. Borner et al. (2006) mapped the bibliographic data as thematic maps using proportional symbols, as well as flow maps connecting information “producers” and “consumers” of the scholarly works, and thus produced a visualization of knowledge diffusion. In these studies, ArcGIS facilitated the visualization of bibliographic data, but spatial statistics and other tools within ArcGIS (e.g., average nearest neighbor analysis and distance measurements) were not fully utilized to statistically explore research clustering or the “death of distance.”


This study expands upon previous work using ArcGIS to explore geographic aspects of bibliographic data. There are no known studies that utilize spatial statistics and other tools within ArcGIS to examine research clustering or the “death of distance” using bibliographic data. The approach used could be applied to any knowledge domain, but this study examines journal articles and conference papers involving the visualization of bibliographic data from 1995 to 2009 within the contiguous United States.


The objectives of this study are to: (1) demonstrate how the spatial aspects of bibliographic data can be represented as both points and lines (i.e., lines between cited-citing publications and co-authors), and (2) determine if the “death of distance” occurred in the field of visualization of bibliographic data from 1995 to 2009.


Journal articles and conference papers involving the visualization of bibliographic data were identified using Thomson Reuters' Web of Science (Thomson Reuters, 2010). This was accomplished using a single keyword search and then refining the search using Document Type and Publication Year (1995–2009). Each publication was then examined to confirm that it contained an actual visualization (e.g., co-author citation maps or subject cluster maps) rather than only tables or graphs of bibliographic data. The publications were further refined to include only those authored and/or co-authored by individuals residing in the contiguous United States. The publications that cited the initial set were also obtained. Both the cited and citing publications were geocoded using the ZIP code associated with the institution of the first author. Separate point maps were then created for the cited and citing publications in ArcGIS using a United States base map (ESRI, 2008) and projected as USA Contiguous Equidistant Conic to preserve distance measurements. Using spatial statistics tools within ArcGIS, geographic clustering was examined using an average nearest neighbor analysis.
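The average nearest neighbor statistic can be illustrated outside ArcGIS. The following is a minimal sketch, not the ArcGIS implementation: it assumes the points are already in a projected coordinate system (so Euclidean distance is meaningful) and that the study area is supplied by the caller. The function name, the sample coordinates, and the area values are hypothetical.

```python
import math

def average_nearest_neighbor(points, area):
    """Return (ratio, z_score) for projected (x, y) points in a study area.

    ratio < 1 suggests clustering, ratio > 1 suggests dispersion.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Observed mean distance from each point to its nearest neighbor.
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    observed = sum(nearest) / n
    # Expected mean distance and standard error under a random pattern.
    expected = 0.5 / math.sqrt(n / area)
    std_error = 0.26136 / math.sqrt(n * n / area)
    return observed / expected, (observed - expected) / std_error

# Hypothetical example: four points packed into one corner of a large area.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
ratio, z = average_nearest_neighbor(pts, area=10_000.0)
print(f"ratio = {ratio:.3f}, z = {z:.3f}")
```

Because publications here are geocoded by ZIP code, several points can share one location; each such point contributes a nearest neighbor distance of zero, which pulls the ratio toward clustering.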

Cited-citing publication and co-author network maps were then created. This was accomplished using a Visual Basic for Applications (VBA) script (ESRI, 2004) within ArcGIS to generate lines connecting the cited publications to the citing publications. The VBA script was also used to generate lines between co-authors of the original cited publications. The amount of co-authorship was quantified and plotted over time. An annual average distance between co-authors was then calculated for 1995–2009, and the following hypotheses were tested using the Kruskal-Wallis test after testing for normality:

H0: The annual mean distances are the same.

H1: The annual mean distances are different.

Where H0 = Null Hypothesis and H1 = Alternate Hypothesis
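The line-generation step can be sketched in a few lines of Python. This is not the VBA script used in the study; it is an illustrative stand-in that assumes each publication or author is represented by a (latitude, longitude) pair. It also swaps in great-circle (haversine) distance for the projected distance measurements made in ArcGIS. All function names and coordinates are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def citation_lines(cited_point, citing_points):
    """One line (start, end) from a cited publication to each citing one."""
    return [(cited_point, p) for p in citing_points]

def coauthor_lines(author_points):
    """One line for every unordered pair of co-authors on a publication."""
    return list(combinations(author_points, 2))

def mean_line_length_km(lines):
    """Average length of a set of lines; 0.0 when there are none."""
    if not lines:
        return 0.0
    return sum(great_circle_km(a, b) for a, b in lines) / len(lines)

# Hypothetical publication with three co-authors, two at the same location.
authors = [(39.17, -86.52), (39.17, -86.52), (32.23, -110.95)]
lines = coauthor_lines(authors)
print(len(lines), round(mean_line_length_km(lines), 1))
```

Co-authors at the same location produce zero-length lines, which is why a per-publication average can be pulled toward zero by local collaboration.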


The Web of Science search and subsequent refining resulted in 102 publications. Sixty of those publications were cited one or more times, resulting in 591 citings (i.e., citing publications) having at least a first author residing in the United States. The geographic locations of the 102 publications are presented in Figure 1 and Figure 2 and their 591 citing publications in Figure 3 and Figure 4 using point and proportional symbol maps. The point patterns in both Figure 1 and Figure 3 had a Z score ≤ −2.58 in the average nearest neighbor analysis within ArcGIS and were therefore significantly clustered at the 0.01 level. A map of the cited-citing publication network is shown in Figure 5.
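The link between a Z score of −2.58 and the 0.01 significance level can be checked directly from the standard normal distribution. The sketch below assumes a two-tailed test; the function names are hypothetical.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Two-tailed p-value for a standard normal Z score."""
    return 2 * NormalDist().cdf(-abs(z))

def classify_pattern(z, alpha=0.01):
    """Label a point pattern from its average nearest neighbor Z score."""
    if two_tailed_p(z) >= alpha:
        return "random"
    # Negative Z: observed spacing is smaller than expected (clustered).
    return "clustered" if z < 0 else "dispersed"

print(round(two_tailed_p(-2.58), 4), classify_pattern(-2.58))
```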

Figure 1. Institutional affiliation of cited publications.

Figure 2. Institutional affiliation of cited publications – proportional symbol.

Figure 3. Institutional affiliation of citing publications.

Figure 4. Institutional affiliation of citing publications – proportional symbol.

Of the 102 cited publications, 65 were co-authored. There was a statistically significant increase in the number of co-authored papers from 1995–2009 (Figure 6), but only 26 of the 65 co-authored publications were authored by individuals located in different ZIP codes. The co-author networks for those 26 publications are shown in Figure 7.
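The split between same-ZIP and cross-ZIP co-authorship is a simple count. The sketch below assumes each publication is represented by a list of its authors' ZIP codes. The records are placeholders arranged only to reproduce the counts reported above (102 publications, 65 co-authored, 26 spanning more than one ZIP code); they are not the study's data.

```python
def summarize_coauthorship(publications):
    """publications: list of author ZIP-code lists, one list per publication."""
    coauthored = [zips for zips in publications if len(zips) > 1]
    cross_zip = [zips for zips in coauthored if len(set(zips)) > 1]
    same_zip = len(coauthored) - len(cross_zip)
    return {
        "total": len(publications),
        "coauthored": len(coauthored),
        "cross_zip": len(cross_zip),
        "same_zip": same_zip,
        "same_zip_share": same_zip / len(coauthored) if coauthored else 0.0,
    }

# Placeholder records: 37 single-author, 39 same-ZIP, 26 cross-ZIP.
publications = [["A"]] * 37 + [["A", "A"]] * 39 + [["A", "B"]] * 26
summary = summarize_coauthorship(publications)
print(summary["same_zip"], round(summary["same_zip_share"], 2))  # 39 0.6
```

The remaining 39 of the 65 co-authored publications therefore account for the 60% same-ZIP share.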

The average distance between co-authors was determined for each of the 65 co-authored publications and then an annual average distance calculated for each year. An insufficient number of co-authored publications for 1995–2000 limited the statistical analysis to 2001–2009. A Shapiro-Wilk normality test indicated that the data was not normally distributed, so the following hypothesis was tested using the Kruskal-Wallis test.

H0: The annual mean distances are the same.

H1: The annual mean distances are different.

Where H0 = Null Hypothesis and H1 = Alternate Hypothesis

The Kruskal-Wallis test resulted in a p-value of 0.5205; therefore, the null hypothesis is not rejected. No statistically significant difference in annual mean distances was detected for the co-authored publications from 2001 to 2009.
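For readers who want to reproduce this kind of test, the following is a minimal pure-Python sketch of the Kruskal-Wallis procedure with a tie correction and a chi-square approximation for the p-value. It is not the software used in the study. The minimum number of publications per year (`min_publications`) is an assumed cutoff, since the threshold for an "insufficient number" is not stated, and the sample records are hypothetical.

```python
import math
from collections import defaultdict

def group_by_year(records, min_publications=2):
    """Group (year, distance) records; drop years with too few publications."""
    by_year = defaultdict(list)
    for year, distance in records:
        by_year[year].append(distance)
    return {y: d for y, d in sorted(by_year.items()) if len(d) >= min_publications}

def average_ranks(values):
    """Rank values 1..N, averaging ranks of ties; also return tie-group sizes."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks, tie_sizes, i = [0.0] * len(values), [], 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed-form series)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    chi, term, total = math.sqrt(x), math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return (math.erfc(chi / math.sqrt(2))
            + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total)

def kruskal_wallis(groups):
    """Return (H, p) for a list of samples, one sample per year."""
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)

# Hypothetical per-publication co-author distances (km) tagged by year.
records = [(2000, 40.0), (2001, 0.0), (2001, 850.0), (2002, 0.0),
           (2002, 0.0), (2002, 1200.0), (2003, 300.0), (2003, 0.0)]
h, p = kruskal_wallis(list(group_by_year(records).values()))
print(f"H = {h:.3f}, p = {p:.3f}")
```

Same-ZIP co-authorship yields many zero distances, so the tie correction matters for data of this kind. The chi-square approximation is also rough when a year contains only a few publications.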


The keyword search of Web of Science and subsequent refining resulted in a small but representative set of publications involving the visualization of bibliographic data. The average nearest neighbor analysis determined that both the cited and citing publications were clustered. The clustering is more apparent in the proportional symbol maps (Figure 2 and Figure 4), which show greater research activity in Bloomington, IN; Philadelphia, PA; Sandia, NM; Stillwater, OK; and Tucson, AZ. The main “producers” of the scholarly works are also the main “consumers” of those scholarly works.

Figure 5. Cumulative cited-citing network, 1995–2009.

Figure 6. Number of co-authored publications, 1995–2009.

Figure 7. Cumulative co-author networks, 1995–2009.

The visualizations of the cited-citing publications and co-author networks were insightful, but do have some limitations. The most salient features of the cited-citing networks are the northeast-southwest pattern and “fanning” of the lines from those citing the 102 publications back to the five main “producers” described above. The general pattern seen in the cited-citing network map (northeast-southwest) can also be seen in the co-author network map. Not only are the main producers also the main consumers, but their overall pattern suggests collaboration among those researchers. One major limitation is that overlapping lines obscure some areas of major activity (e.g., Bloomington, IN and Stillwater, OK), as well as repeated citing among major authors in this field.

The resulting 65 co-authored publications comprised a small data set, but large enough for hypothesis testing. Over half of the co-authorship (60%) occurred among authors within the same ZIP code. The collaboration tended to be one of extremes; co-authors were either located at the same institution or separated by hundreds of miles. The overlapping lines were also an issue for the co-author networks.


This study demonstrates that ArcGIS can be used to visualize bibliographic data and that spatial statistics tools within ArcGIS can be used to explore the spatial aspects of the bibliographic data. ArcGIS was effective in visualizing author and co-author publication networks, but there were some limitations (e.g., overlapping lines). The results of this study show that research involving visualizing bibliographic data is localized and the main producers are also the primary consumers. The distance between co-authors involved in the visualization of bibliographic data was unchanged from 2001–2009, so the “death of distance” did not occur among the co-authors in this study.


Several opportunities exist to further this research, including addressing some of the limitations discussed throughout the paper. For example, the sample size in this study was small but could be increased by broadening the set of publications included. The overlapping lines could be addressed by using a single line between the same ZIP codes and scaling line thickness to the number of connections between those ZIP codes. Lastly, user studies are needed to explore the effectiveness of the visualizations.
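The proposed fix for overlapping lines can be sketched as a two-step aggregation: count connections per pair of ZIP codes, then map each count to a line width. The sketch below assumes connections are undirected (appropriate for co-authorship; a cited-citing map might keep direction). The function names, width range, and ZIP codes are illustrative and not drawn from the study's data.

```python
from collections import Counter

def aggregate_connections(zip_pairs):
    """Count connections per unordered ZIP pair, skipping same-ZIP pairs."""
    counts = Counter()
    for a, b in zip_pairs:
        if a != b:
            counts[tuple(sorted((a, b)))] += 1
    return counts

def line_widths(counts, min_width=0.5, max_width=6.0):
    """Scale counts linearly: one connection gets min_width, the busiest pair max_width."""
    if not counts:
        return {}
    high = max(counts.values())
    if high == 1:
        return {pair: min_width for pair in counts}
    step = (max_width - min_width) / (high - 1)
    return {pair: min_width + (n - 1) * step for pair, n in counts.items()}

# Illustrative connections between ZIP codes, given as strings.
pairs = [("47405", "19104"), ("19104", "47405"),
         ("47405", "74078"), ("47405", "47405")]
counts = aggregate_connections(pairs)
for pair, width in line_widths(counts).items():
    print(pair, counts[pair], width)
```

Drawing one weighted line per ZIP pair keeps the number of drawn lines bounded by the number of distinct pairs rather than the number of connections.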