Where you search is what you get: literature mining – Google Scholar versus Web of Science using a data set from a literature search in vegetation science





Is Google Scholar superior in literature search compared to the Web of Science?


The Internet.


The maximum number of papers dealing with specific subjects was derived from a published review and compared with Google Scholar and Web of Science search results using GLM and a post-hoc test.


Search results acquired through Google Scholar were not significantly different from the maximum number of papers found by manual search, while the Web of Science search delivered significantly fewer.


Researchers should give more prominent recognition to Google Scholar as a search tool, especially when conducting quantitative reviews and meta-analyses.


Searching for published knowledge is a fundamental precondition for intellectual scientific work. While two decades ago scientists relied on libraries or paper copies of existing work, the Internet has since enabled a tremendous exchange of scientific literature, which is readily accessible online. Assessing the literature is also the basis of every scientific publication intended for an international audience.

The Web of Science (WOS; Thomson Reuters) has become one of the most popular databases for scientific journals within the natural sciences in recent years (Falagas et al. 2008). However, the Google Scholar (GS) web search engine has increasingly gained the attention of scientists, especially students (Mikki 2009). While WOS guarantees a relatively stable search environment, with clearly defined lists of indexed journals, its searches are restricted to metadata, such as title, keywords and abstracts. This limitation does not apply to GS, where, theoretically, every text that is electronically available on the Internet will be indexed (Falagas et al. 2008). However, Google does not provide information about which journals are included in the database (Mayr & Walter 2007; Walters 2007). Nevertheless, GS offers a true full-text search for the majority of scientific journals (see http://scholar.google.com/intl/en/scholar/help.html for general information on the content covered by GS). Here we test and discuss the benefits and drawbacks of using these two literature search tools: a database that includes full-text search capabilities (GS) vs a database that consists of indexed metadata (WOS). For this purpose we use a clearly defined and evaluated data set on ordination techniques within a set of important journals in vegetation science.
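The difference between the two search modes can be illustrated with a minimal sketch. The record, the field names and the two matching functions below are hypothetical and only mimic the principle; they do not reproduce how WOS or GS actually index or rank documents.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record layout, for illustration only.
    title: str
    abstract: str
    keywords: list
    body: str  # full text, including the methods section


def metadata_match(paper, term):
    """Search title, abstract and keywords only (metadata-style search)."""
    haystack = " ".join([paper.title, paper.abstract, *paper.keywords]).lower()
    return term.lower() in haystack


def fulltext_match(paper, term):
    """Search the metadata and the full text (full-text-style search)."""
    return metadata_match(paper, term) or term.lower() in paper.body.lower()


# Invented example: the ordination method is named only in the methods text.
paper = Paper(
    title="Grazing effects on steppe vegetation",
    abstract="We compare plant communities along a grazing gradient.",
    keywords=["grazing", "steppe"],
    body="Methods: Species composition was analysed with "
         "detrended correspondence analysis.",
)
term = "detrended correspondence analysis"

print(metadata_match(paper, term))  # False: the term is absent from the metadata
print(fulltext_match(paper, term))  # True: the term appears in the methods text
```

A paper that names a method only in its methods section is therefore invisible to a search restricted to metadata, which is the situation examined in this study.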


The data set used to evaluate the performance of both databases contains all publications dealing with ordination techniques within the discipline of vegetation science covering an 18-yr period (1990–2007) and considers the five journals Applied Vegetation Science (AVS), Folia Geobotanica (FG), Journal of Vegetation Science (JVS), Plant Ecology (PE) and Phytocoenologia (PH). Details on the data set can be found in von Wehrden et al. (2009), who conducted manual searches by looking through printed and digital full-text versions of the journals and searching for the same set of topics as used in this study. The authors concluded that WOS detects roughly one-third of all studies dealing with ordination techniques because its searches are restricted to title, keywords and abstract.

In the present study, GS was searched for the same ordination methods as in von Wehrden et al. (2009; see Appendix S1 for a list of all search expressions). The total number of search results was noted and the short text excerpts presented in the search results were read to eliminate irrelevant search hits or occasional duplicates. Additionally, these excerpts helped to identify only those studies that performed the ordination method in question, rather than those that cited or discussed it.
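The screening step described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure's actual implementation: the hit records are invented, and the `applies_method` field is a hypothetical placeholder for the judgement a reader records after inspecting each excerpt by hand.

```python
import re


def normalise(title):
    """Lower-case a title and collapse punctuation and whitespace,
    so that trivially different duplicates compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def screen(hits):
    """Drop duplicate hits, then keep only those that a reader judged
    to apply the method rather than merely cite or discuss it."""
    seen = set()
    kept = []
    for hit in hits:
        key = normalise(hit["title"])
        if key in seen:
            continue  # occasional duplicate: skip
        seen.add(key)
        if hit["applies_method"]:  # manual judgement, recorded by the reader
            kept.append(hit)
    return kept


# Invented search hits for illustration.
hits = [
    {"title": "Vegetation of alpine meadows", "applies_method": True},
    {"title": "Vegetation of Alpine Meadows.", "applies_method": True},   # duplicate
    {"title": "A review of ordination methods", "applies_method": False},  # only discusses
]
relevant = screen(hits)
print(len(hits), len(relevant))  # 3 1
```

Only the duplicate removal is automated here; the decision whether a study performed the ordination method remains a manual one.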

An ANOVA was used to compare the abundance of papers per journal and search method. GLMs were used to compare database searches (both GS and WOS results) against the maximum number of papers dealing with the subjects given by the search terms and found through manual search.
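The kind of comparison described above can be sketched with a one-factor count model. The sketch assumes a Poisson error family with a log link and search method as the only predictor; the text does not state the family or model structure used, and the counts are invented. For such a model the maximum-likelihood fitted values equal the group means, so no iterative fitting is needed.

```python
import math


def poisson_deviance(observed, fitted):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)),
    with y * log(y / mu) taken as 0 when y == 0."""
    dev = 0.0
    for y, mu in zip(observed, fitted):
        term = y * math.log(y / mu) if y > 0 else 0.0
        dev += 2.0 * (term - (y - mu))
    return dev


def fit_one_factor_poisson(counts_by_group, reference):
    """Fit a log-link Poisson model with one categorical predictor.
    Returns the coefficients relative to the reference level and the
    proportion of explained deviance. Group means must be positive."""
    means = {g: sum(v) / len(v) for g, v in counts_by_group.items()}
    coefs = {g: math.log(m / means[reference]) for g, m in means.items()}
    all_counts = [y for v in counts_by_group.values() for y in v]
    grand_mean = sum(all_counts) / len(all_counts)
    null_dev = poisson_deviance(all_counts, [grand_mean] * len(all_counts))
    resid_dev = sum(
        poisson_deviance(v, [means[g]] * len(v))
        for g, v in counts_by_group.items()
    )
    return coefs, 1.0 - resid_dev / null_dev


# Invented yearly paper counts per search method (not data from this study).
counts = {
    "manual": [10, 12, 14],
    "gs": [9, 12, 12],
    "wos": [3, 4, 5],
}
coefs, explained = fit_one_factor_poisson(counts, reference="manual")
for method, beta in coefs.items():
    print(method, round(beta, 3))
print("explained deviance:", round(explained, 3))
```

A coefficient near zero means that a search method recovers about as many papers as the manual search, while a strongly negative coefficient means it recovers far fewer. Significance tests and post-hoc comparisons would normally be run in a statistics package and are not part of this sketch.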


The GS search and the maximum number derived from manual searches did not differ significantly, based on the post-hoc test, while the WOS search differed significantly from both the GS search and the maximum number of papers found by manual searches (Appendix S2). All five journals showed differing patterns regarding the different search methods (Fig. 1); for example, GS slightly overestimated the number of publications within AVS, while within all other journals GS underestimated the number of publications, most prominently for PH. A search in WOS resulted in notably fewer references dealing with the respective search terms, finding at best about half of the maximum number of publications.

Figure 1.

Box-plot illustrating the number of publications per year and search method, demonstrating the difference between the five journals searched. Search results differed widely among the five journals, yet Google always detected more publications than the WOS search (see Appendix S3 for an ANOVA of the data).


In our comparison of searches in GS and WOS, results from GS clearly outperformed those from WOS and yielded numbers of articles similar to the maximum number of papers found by manually searching the journals. GS delivers an almost complete selection of relevant publications, thus confirming results from the social sciences (Kousha & Thelwall 2007). Furthermore, the explained deviance of the GS search was higher than that of the WOS search (Appendix S4), clearly indicating that the latter was less effective. However, by searching for topics that are usually only referred to in the methodological parts of publications, we may have introduced a bias into our analysis. Authors may choose not to include specific methodology in the keywords and abstract of their article because they deem these topics ‘less important’ than other aspects of their research. Nevertheless, scientists using WOS for literature searches arrive at significantly fewer relevant results than when using GS or performing a manual search. Additionally, the WOS search revealed a bias for different journals and for the different ordination techniques; thus some articles applying specific methods or published in certain journals may be cited more, while articles using different ordination methods are systematically lost when using WOS. Presumably, these biases are the outcome of specific instructions for authors given by the individual journals, and might also be influenced by the editors.

Searching GS may be more time-consuming than searching WOS, as it requires a semi-manual evaluation of the text excerpts. However, in direct comparison to the manual evaluation, which involves checking all printed or digital versions of the articles, using GS is clearly a great time-saver that delivers almost identical results. The slight tendency of GS to overestimate the number of relevant articles (i.e. the presented excerpt implied the use of the ordination technique although it was only discussed) was most conspicuous at very low article numbers (e.g. AVS in Fig. 1) but less pronounced at higher values.

Within GS, certain journals may not be available from time to time, as we experienced during the data collection for this paper. This lack of availability stems largely from whether journals are accessible to Google for full-text search, which depends partly on license agreements with publishers. For example, for Phytocoenologia GS delivered valid results only from the year 2002 onwards, whereas the WOS database also included previous years. In that specific case, the WOS search provided more relevant results for the earlier years.

The scientific publication structure demands reproducible search methods, which WOS has provided to several generations of scientists. GS, on the other hand, is a relatively new development and is used by fewer researchers (e.g. Hightower & Caldwell 2010). This may change in the future, partly as Google steadily increases the coverage of its digital library for older volumes, and partly because the free access to GS may attract more users (Levine-Clark & Gil 2009). This may be especially true in countries where WOS is not available to all scientists due to the high subscription cost.

We conclude that literature queries may yield superior results when conducted within GS compared to WOS, based on our analyses. Although we offer only a selected search for some journals based on specific keywords, we believe that the overall pattern is valid beyond our analysis. Doubts in the scientific community regarding GS are, however, still justified, given the undocumented and sometimes fluctuating journal access. In addition, GS also accesses publications besides peer-reviewed references, which can potentially lead to biases (Alcaraz & Morais 2012). Nevertheless, due to its full-text search capabilities, GS is an important and very useful tool for searching the literature. To date, it has been widely overlooked by the scientific community.


Comments and suggestions from Michael Palmer, Tom Wohlgemuth and three anonymous referees greatly improved our manuscript.