Content-based and algorithmic classifications of journals: Perspectives on the dynamics of scientific communication and indexer effects



The aggregated journal-journal citation matrix, based on the Journal Citation Reports (JCR) of the Science Citation Index, can be decomposed either by indexers or algorithmically. In this study, we test the results of two recently available algorithms for the decomposition of large matrices against two content-based classifications of journals: the ISI Subject Categories and the field/subfield classification of Glänzel and Schubert (2003). The content-based schemes allow more than one category to be attributed to a journal, whereas the algorithms maximize the ratio of within-category citations to between-category citations in the aggregated category-category citation matrix. By adding categories, indexers generate between-category citations, which may enrich the database, for example, in the case of interdisciplinary developments. Algorithmic decompositions, on the other hand, are more heavily skewed towards a relatively small number of categories, whereas content-based classifications deliberately counteract this skew. Because of these indexer effects, science policy studies and the sociology of science should be careful when using content-based classifications, which are made for bibliographic disclosure and not for analyzing latent structures in scientific communications. Despite the large differences among them, the four classification schemes enable us to generate surprisingly similar maps of science at the global level. Erroneous classifications cancel out as noise at the aggregate level, but may disturb evaluations locally.
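To make the contrast between the two kinds of classification concrete, the following is a minimal sketch of how a journal-journal citation matrix can be aggregated into a category-category matrix and how the within/between ratio is computed. It is not either of the decomposition algorithms tested here; those search over partitions, whereas this sketch only evaluates a given assignment. The function names, the four-journal matrix, and the choice of full counting (a citation from journal i to journal j is added once to every category pair of the two journals) are all hypothetical assumptions for illustration.

```python
from collections import defaultdict

def category_matrix(citations, assignment):
    """Aggregate journal-journal citations into category-category citations.

    citations[i][j]: citations from journal i to journal j (assumed layout).
    assignment[i]: set of categories attributed to journal i.
    Full counting (an assumption): a citation i -> j is added once to every
    pair (a, b) with a in assignment[i] and b in assignment[j].
    """
    agg = defaultdict(float)
    for i, row in enumerate(citations):
        for j, count in enumerate(row):
            if count == 0:
                continue
            for a in assignment[i]:
                for b in assignment[j]:
                    agg[(a, b)] += count
    return dict(agg)

def between_total(agg):
    """Sum of off-diagonal (between-category) citations."""
    return sum(v for (a, b), v in agg.items() if a != b)

def within_between_ratio(agg):
    """Ratio of within-category (diagonal) to between-category citations."""
    within = sum(v for (a, b), v in agg.items() if a == b)
    between = between_total(agg)
    return within / between if between else float("inf")

# Hypothetical data: four journals forming two citation clusters.
citations = [
    [0, 10, 1, 0],
    [8, 0, 0, 1],
    [1, 0, 0, 12],
    [0, 2, 9, 0],
]
# Algorithm-style assignment: exactly one category per journal.
single = [{"A"}, {"A"}, {"B"}, {"B"}]
# Indexer-style assignment: journal 1 is also attributed to category B.
overlapping = [{"A"}, {"A", "B"}, {"B"}, {"B"}]

agg_single = category_matrix(citations, single)
agg_overlap = category_matrix(citations, overlapping)
print(between_total(agg_single), within_between_ratio(agg_single))
print(between_total(agg_overlap), within_between_ratio(agg_overlap))
```

In this toy case, the single-category assignment yields 5 between-category citations and a ratio of 7.8, while attributing a second category to one journal raises the between-category count to 23 and lowers the ratio to about 1.83. This mirrors, on a small scale, how adding categories generates between-category citations. Fractional counting (weighting each pair by the inverse of the number of category pairs) would be an equally plausible choice and would leave the total number of citations unchanged.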