Can citation analysis of Web publications better detect research fronts?



We present evidence that in some research fields, research published in journals and research reported on the Web may represent different evolutionary stages of the field, with journals lagging a few years behind the Web on average, and that a “two-tier” scholarly communication system may therefore be evolving. We conclude that in such fields, (a) author co-citation analyses (ACA) that use articles published on the Web as a data source can outperform traditional ACAs that use journal articles for detecting current research fronts, and (b) as a result, it is important to use multiple data sources in citation analysis studies of scholarly communication to obtain a complete picture of communication patterns. Our evidence stems from comparing the intellectual structures of the XML research field, a subfield of computer science, as revealed by three ACAs covering two time periods: (a) from the field's beginnings in 1996 to 2001, and (b) from 2001 to 2006. For the first time period, we analyze research articles both from journals as indexed by the Science Citation Index (SCI) and from the Web as indexed by CiteSeer. We follow up with an ACA of SCI data for the second time period. We find that most of the trends in the field's evolution that emerge when comparing the SCI-based ACA results for the two time periods were already apparent in the CiteSeer-based ACA results for the first time period.