The ACM Computing Classification System (CCS) is a classification system for computer science fields with four hierarchical levels. Authors are asked to assign their work to one or more CCS classes when submitting it, so every paper in the collection is expected to carry a CCS classification. CCS has 11 top-level categories (Table 4), of which only 10 were represented in the data collection. Note that multiple classes can be assigned to one paper. For example, 3 papers were classified under Topic J, Computer Applications. Any of those 3 papers could also have been assigned to a second or third top-level category, e.g. Topic G, Mathematics of Computing. A paper might not be assigned to a second level, or it might be assigned to multiple second levels. Similar logic applies to the third- and fourth-level categories. Of the papers in the data set, 686 were used for this part of the comparison analysis; 7 papers with no classification assigned by their authors were excluded.
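The multi-label, multi-level assignment described above can be illustrated with a minimal sketch. The paper identifiers, the example paths, and the helper names `deepest_level` and `top_level_counts` are hypothetical and are not taken from the study's data or tooling; each assigned class is assumed to be stored as a tuple running from the top level down to the most specific level assigned.

```python
from collections import Counter

# Hypothetical data: each paper maps to a list of CCS paths, one tuple per
# assigned class, ordered from the top level to the most specific level.
papers = {
    "paper_a": [("J",)],                                    # top level only
    "paper_b": [("J",), ("G", "G.1")],                      # two top levels
    "paper_c": [("H", "H.3", "H.3.3")],                     # down to level 3
    "paper_d": [("H", "H.3", "H.3.3", "Search process"),    # down to level 4
                ("I", "I.2")],
    "paper_e": [],                                          # no classification
}


def deepest_level(paths):
    """Return the most refined level (1-4) among a paper's classes, or 0."""
    return max((len(path) for path in paths), default=0)


def top_level_counts(papers):
    """Count papers per top-level category; a paper counts once per category."""
    counts = Counter()
    for paths in papers.values():
        for top in {path[0] for path in paths}:
            counts[top] += 1
    return counts


for paper_id, paths in papers.items():
    print(paper_id, deepest_level(paths))
print(top_level_counts(papers))
```

Because one paper can appear under several top-level categories, the per-category counts in such a tally can sum to more than the number of papers, and papers with an empty list correspond to those excluded for lacking a classification.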
Table 5 shows the similarity between keywords or tags and the titles (kwtt, tagtt), abstracts (kwab, tagab), and titles and abstracts combined (kwtx, tagtx) when the articles are partitioned into four sets. Group 1 is any paper classified in one or more first-level topic categories with no more refined (second-level) classification. Group 2 includes articles classified in one or more first-level categories and one or more second-level categories, with no more refined classification. Group 3 consists of articles classified in one or more first- and/or second-level categories and one or more third-level categories, with no more refined classification. Group 4 consists of articles classified in one or more first-, second- and third-level categories, and one or more fourth-level categories. After partitioning, no papers were assigned to Group 1, and 27, 205, and 454 papers were assigned to Groups 2, 3 and 4 respectively.
A t-test performed on keywords and tags for each group showed no significant mean difference between keyword and tag similarity for the higher-level classes (general concepts). For Groups 3 and 4, the mean similarity differences between keywords and tags with respect to title, abstract, and both title and abstract were statistically significant, showing that keywords are better at representing specific concepts in documents. However, when only tags were compared across the classification-level groups, tags represented papers significantly better in Groups 3 and 4 than in Group 2 in all comparisons (Table 5 and Figure 3). For the comparison of tags with titles, a t-test confirmed that the mean similarity differences between Group 2 and Group 3, and between Group 2 and Group 4, were significant (t(230) = −2.184, p = .030; and t(38.336) = −3.920, p < .001 respectively).
For the comparison of tags with abstracts, and with title + abstract, the test results were similarly significant (Group 2 and Group 3 comparison: for tagab, t(230) = −2.858, p = .005; for tagtx, t(230) = −3.026, p = .003; Group 2 and Group 4 comparison: for tagab, t(479) = −2.822, p = .005; for tagtx, t(479) = −2.981, p = .003). The comparison by classification showed that, compared with keywords, tags describe what a resource is about at more general levels of concept. Although tags might not describe resources with specific concepts as well or as efficiently as keywords, the statistical tests confirmed that tags are better at representing specific concepts than general concepts.
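The degrees of freedom reported above can be read against the group sizes: 230 = 27 + 205 − 2 and 479 = 27 + 454 − 2 are what a pooled-variance t-test would give, while a fractional value such as 38.336 is what an unequal-variance (Welch) test typically produces. The text does not name the variant used, so this reading is an assumption. The sketch below computes both statistics with the standard formulas; the function names and the similarity scores are hypothetical toy values, not the study's data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's t-test statistic with pooled variance; df = n_a + n_b - 2."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def welch_t(a, b):
    """Welch's t-test statistic with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical similarity scores for a small, less refined group and a
# larger, more refined group (illustrative values only).
less_refined = [0.10, 0.12, 0.08, 0.15]
more_refined = [0.20, 0.18, 0.25, 0.22, 0.19, 0.24]

print("pooled:", pooled_t(less_refined, more_refined))
print("welch: ", welch_t(less_refined, more_refined))
```

A negative t value in this setup means the first (less refined) group has the lower mean similarity, which matches the sign of the statistics reported for the Group 2 comparisons.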