American Society for Information Science & Technology 2008 Annual Meeting contributed poster

Mining Maps of Information Objects: An Exploratory Ontological Excursion

The societal archive (virtual though it might be) is made up of information objects of all kinds: everything from stories to talismans to buttons to films to documents. To date, the boundaries around the domains within which these objects are stored have dictated both the ways in which we classify objects and, therefore, the ways in which they may be retrieved and used for scholarship. According to Hjørland (2003, 98), rationally deduced schemas predetermine the potential use of intellectual content by limiting its retrieval. The empirical derivation of units of classification, on the other hand, particularly in developing or evolving systems, provides a basis upon which conceptual systems can be built.

We wish to suggest an approach to the mapping of information objects (that is, all artifacts that are considered informative and therefore might be retrievable through an information system such as a catalog, database, or search engine) that will reveal undiscovered semantic relationships. In this presentation we will use the CIDOC Conceptual Reference Model (hereafter CIDOC CRM) to map selected sets of information objects. The CIDOC CRM is an ontology designed for the representation of artifacts and the integration of cultural information.

We hope to demonstrate the compatibility of multi-disciplinary (and therefore multi-epistemological) mappings of the components of information. That is to say, we will map the objects themselves, we will map them as artifacts resident in a repository, and we will map instantiations of their representations: the thing, the nature of its existence, and all searchable iterations of it. An example is this Etruscan Satyr-head antefix held by the University of Pennsylvania Museum of Archaeology and Anthropology (http://www.museum.upenn.edu/new/worlds intertwined/etruscan/architecture.shtml).

The decorated terracotta round would have been used at the end of a row of terracotta roof tiles to keep pests out. In addition to the antefix itself, seven representations of it (instantiations) exist in-house at the Museum. For the purpose of this abstract we have mapped the item itself and the digital image of it that appears on the museum's website. In the CRM, “E” identifies a class, “P” identifies a property, and the direction of the arrow identifies the direction of the property. For instance, one can describe the material of an object thus:

[Figure: original image, not reproduced]

which can also be rendered thus:

E22 Man-Made Object (Satyr Head) → P45 consists of : E57 Material (Terracotta)

Because terracotta is as much a process as a material, the object also can be identified thus:

E22 Man-Made Object (Satyr Head) ← P108 was produced by : E12 Production (the production of the Satyr Head) → P126 employed : E57 Material (Terracotta)
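As an illustration only, the two statements above can be held as a small list of links and walked programmatically. The sketch below is not part of the CRM or of any CRM tooling: the `Link` type and the `chains` function are hypothetical names introduced here, and each link simply records the classes, arrow, and property code in the order they are written in the text.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """One step of a mapping, recorded as written in the text (hypothetical type)."""
    left: str   # class code on the left of the arrow, e.g. "E22"
    arrow: str  # "→" or "←": the direction of the property as printed
    prop: str   # property code, e.g. "P45"
    right: str  # class code on the right, e.g. "E57"


# The two ways of stating the Satyr Head's material given above.
satyr_material = [
    Link("E22", "→", "P45", "E57"),   # consists of Material
    Link("E22", "←", "P108", "E12"),  # was produced by Production
    Link("E12", "→", "P126", "E57"),  # Production employed Material
]


def chains(links, start, goal):
    """Return every chain of property codes that leads from start to goal,
    reading the links left to right as they are written."""
    found = []

    def walk(node, path, visited):
        if node == goal and path:
            found.append(path)
            return
        for link in links:
            if link.left == node and link.right not in visited:
                walk(link.right, path + [link.prop], visited | {link.right})

    walk(start, [], {start})
    return found


material_chains = chains(satyr_material, "E22", "E57")
for chain in material_chains:
    print(" / ".join(chain))
```

Walking from E22 to E57 recovers both routes to the material: the direct one through P45 and the indirect one through P108 and P126.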

For the purpose of data-mining the mappings can be represented like this:

For the object itself:

Material {E22 (→ P45: E57)(← P108 : E12 → P126: E57)}

Artifact {E22 → P2: E55}

Item Number {E22 → P48: E42}

Source {E22 (←P30: E10 → P29 : E40 → P131: E82)(← P24: (E8 → P22: E40 → P131: E82))(E8/E10 → P7: E53 → P87: E44) + (: E53 → P27 (: E9 → P25: E53 → P87: E44) + (: E9 → P14 : E40) + P14.1: E55)}

Date {E22 (→P108: E12 → P4: E52 → P78: E49)(: E52 → P82: E61)}

To which would be added for an image of the object:

Image {E22 (→P138: E36)(→P1: E75)}

A second example is a merchant marine deck log from the United States Merchant Marine Academy Class of 1942 archives. The handwritten archival document can be represented thus:

Collection {E78 ← (P46: E78)}

Item {E22 ← (P46: E78)}

Creator {E22 ← ((P108: E12) → (P14: E82)(P14.1: E55 → P131: E82))}

Image {E22 (→P138: E36)(→P1: E75)}

At a glance it is apparent that although the two information objects are quite different, they share classes and properties (for example E22, E12, and P108, along with an identical Image mapping) that indicate semantic similarity.
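The overlap can also be checked mechanically. The following is a minimal sketch, assuming the mappings are kept as plain strings exactly as printed above. The regular expressions, the `codes` helper, and the use of a Jaccard ratio as a similarity measure are choices made for this illustration, not part of the method described in this abstract.

```python
import re

CLASS = re.compile(r"\bE\d+\b")          # class codes such as E22
PROP = re.compile(r"\bP\d+(?:\.\d+)?")   # property codes such as P45 or P14.1

# Mapping bodies copied from the two examples above.
satyr_head = [
    "E22 (→ P45: E57)(← P108 : E12 → P126: E57)",
    "E22 → P2: E55",
    "E22 → P48: E42",
    "E22 (←P30: E10 → P29 : E40 → P131: E82)(← P24: (E8 → P22: E40 → P131: E82))"
    "(E8/E10 → P7: E53 → P87: E44) + (: E53 → P27 (: E9 → P25: E53 → P87: E44)"
    " + (: E9 → P14 : E40) + P14.1: E55)",
    "E22 (→P108: E12 → P4: E52 → P78: E49)(: E52 → P82: E61)",
    "E22 (→P138: E36)(→P1: E75)",
]
deck_log = [
    "E78 ← (P46: E78)",
    "E22 ← (P46: E78)",
    "E22 ← ((P108: E12) → (P14: E82)(P14.1: E55 → P131: E82))",
    "E22 (→P138: E36)(→P1: E75)",
]


def codes(lines):
    """Collect every class and property code that appears in the given lines."""
    found = set()
    for line in lines:
        found.update(CLASS.findall(line))
        found.update(PROP.findall(line))
    return found


shared = codes(satyr_head) & codes(deck_log)
# Jaccard ratio: shared codes divided by all distinct codes in either object.
jaccard = len(shared) / len(codes(satyr_head) | codes(deck_log))

print(sorted(shared))
print(round(jaccard, 2))
```

On these strings the two objects share twelve of the forty distinct codes they use between them. The comparison ignores arrow direction and nesting, so it measures shared vocabulary only, not shared structure.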

The immediate objective of our project is to map a large set of instantiated information objects using the CRM, then to use data-mining techniques to discover as yet unknown patterns among the combinations of entities, properties, and relationships. Similar patterns will be grouped using the tools of naïve classification. Our poster will demonstrate the mapping of selected objects, and the nascent classification of properties.
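The abstract does not name a particular mining technique. As one possible starting point, the sketch below counts in how many objects each property-to-class step (such as P108 followed by E12) occurs, which is the support count that frequent-pattern mining builds on. The `STEP` pattern, the `steps` helper, and the threshold of two objects are assumptions made for this example, and with only two objects the result is merely indicative.

```python
import re
from collections import Counter

# A property code followed by the class it leads to, e.g. "P45: E57".
# Optional parentheses around the colon cover forms such as "P24: (E8" and "P27 (: E9".
STEP = re.compile(r"(P\d+(?:\.\d+)?)\s*\(?\s*:\s*\(?\s*(E\d+)")

# Mapping bodies copied from the two examples in this abstract.
objects = {
    "satyr_head": [
        "E22 (→ P45: E57)(← P108 : E12 → P126: E57)",
        "E22 → P2: E55",
        "E22 → P48: E42",
        "E22 (←P30: E10 → P29 : E40 → P131: E82)(← P24: (E8 → P22: E40 → P131: E82))"
        "(E8/E10 → P7: E53 → P87: E44) + (: E53 → P27 (: E9 → P25: E53 → P87: E44)"
        " + (: E9 → P14 : E40) + P14.1: E55)",
        "E22 (→P108: E12 → P4: E52 → P78: E49)(: E52 → P82: E61)",
        "E22 (→P138: E36)(→P1: E75)",
    ],
    "deck_log": [
        "E78 ← (P46: E78)",
        "E22 ← (P46: E78)",
        "E22 ← ((P108: E12) → (P14: E82)(P14.1: E55 → P131: E82))",
        "E22 (→P138: E36)(→P1: E75)",
    ],
}


def steps(lines):
    """Return the set of (property, class) steps found in one object's mappings."""
    found = set()
    for line in lines:
        found.update(STEP.findall(line))
    return found


# Support: the number of objects in which each step occurs at least once.
support = Counter()
for lines in objects.values():
    support.update(steps(lines))

MIN_SUPPORT = 2  # assumed threshold: the step must occur in both objects
frequent = {step for step, count in support.items() if count >= MIN_SUPPORT}

for prop, cls in sorted(frequent):
    print(prop, cls)
```

Over a larger set of mapped objects, the same counts could feed a grouping or classification step. Here they only show that steps such as P108 to E12 and P138 to E36 recur across an antefix and a deck log.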

There is evidence that societal forces lead to the evolution of large sets of information objects, and that these sets display cultural imprints in their accumulated properties, attributes, and relationships. One way to enlarge the societal archive is to mine it for heretofore unrealized meaning. Our purpose is to discover epistemologically diverse groupings of characteristics of information objects. One might even hope that such a classification would reveal as yet unknown relationships among information objects that might help to create better pathways in the semantic web, or as Patrick Wilson (1968) might have said, to create greater exploitative power.
