Principle violations: revisiting the Dublin Core 1:1 Principle
Although many best practice documents encourage Dublin Core metadata creators to “obey the 1:1 Principle,” this recommendation has proven “extremely confusing” in practice. The result of this confusion is widespread violation of the Principle, which inhibits the ability of large-scale metadata aggregations to provide useful services. A preliminary operational definition of the 1:1 Principle that identifies non-conformant descriptions is explored.
Although the Dublin Core metadata standard was originally intended for describing online “document-like objects,” libraries, archives and museums quickly adopted it to disseminate information about cultural heritage artifacts. In order to distinguish between records describing “original” resources and records describing network-retrievable “surrogates,” DCMI introduced the 1:1 Principle, characterized as:
In general, Dublin Core metadata describes one manifestation or version of a resource, rather than assuming that manifestations stand in for one another. For instance, a jpeg image of the Mona Lisa has much in common with the original painting, but it is not the same as the painting. As such the digital image should be described as itself, most likely with the creator of the digital image included as a Creator or Contributor, rather than just the painter of the original Mona Lisa. The relationship between the metadata for the original and the reproduction is part of the metadata description, and assists the user in determining whether he or she needs to go to the Louvre for the original, or whether his/her need can be met by a reproduction (Hillmann, 2003).
The specification of this principle has done little to prevent records that offer incoherent accounts of the described resources. Interviews with metadata creators indicate that the 1:1 Principle causes a “great deal of confusion” (Park & Childress, 2009). Shreeves, et al. (2005) found that “no collections maintained a consistent one-to-one mapping between a metadata record and a single resource.” The multiple, inconsistent interpretations of the 1:1 Principle result in ambiguities that are “particularly problematic” in large-scale metadata aggregations that rely on automated workflows to provide services (Shreeves, et al., 2005). Hutt & Riley's (2005) analysis found that date values were divided between those describing physical resources and those describing digitized copies. Han, et al. (2009) found that one third of ContentDM repositories included information about both physical and digital manifestations.
Because there are no formally specified “rules” for understanding the 1:1 Principle, each of these studies has relied on “high-level assumptions” as the basis of its observations (Hutt & Riley, 2005). Beyond recommendations by authors to “obey the 1:1 Principle,” “…there is a lack of studies focusing on how the one-to-one principle is reflected in the metadata creation process…as well as in actual DC records” (Park & Childress, 2009).
Based on patterns in existing cultural heritage metadata, a preliminary operational definition of the 1:1 Principle that identifies non-conformant descriptions is explored.
Dushay & Hillmann (2003) recommend using visual graphic analysis of OAI-PMH metadata harvests for evaluation and metadata augmentation purposes. Even simple, lightweight approaches can provide insights about a set of metadata records (Nichols, et al., 2008). For this pilot study, the SIMILE Gadget (http://simile.mit.edu/wiki/Gadget) utility is used to characterize 55,000 OAI-PMH Dublin Core records from 25 IMLS Digital Collections and Content (IMLS DCC) Opening History collections. Using Gadget summaries, each collection is classified by drawing on concepts defined by the DCMI Abstract Model (Powell, et al., 2007) and DCMI Metadata Terms specifications (DCMI Usage Board, 2008). Although the OAI-PMH XML schema (and many of the records under consideration) pre-date these recommendations, the defined concepts provide a framework for an operational definition of 1:1 Principle violations and a path for augmenting harvested records.
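The kind of per-element summary used to characterize a collection can be approximated with a few lines of Python. This is only an illustrative stand-in, not the Gadget utility itself: the sample records, the `records` wrapper element, and the `summarize` function are all assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Namespace of the simple Dublin Core elements carried in oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample standing in for a harvested set of records.
SAMPLE = """<records xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <oai_dc:dc>
    <dc:title>Main Street, looking north</dc:title>
    <dc:format>glass plate negative</dc:format>
    <dc:format>image/jpeg</dc:format>
  </oai_dc:dc>
  <oai_dc:dc>
    <dc:title>County courthouse</dc:title>
    <dc:format>image/jpeg</dc:format>
  </oai_dc:dc>
</records>"""


def summarize(xml_text):
    """Count how often each value occurs for each Dublin Core element."""
    summary = defaultdict(Counter)
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # ElementTree writes qualified names as "{namespace}local".
        if element.tag.startswith("{" + DC_NS + "}"):
            local_name = element.tag.split("}", 1)[1]
            value = (element.text or "").strip()
            if value:
                summary[local_name][value] += 1
    return summary


summary = summarize(SAMPLE)
for name, counts in sorted(summary.items()):
    print(name, counts.most_common())
```

A tally of this kind makes mixed usage visible at a glance: a `format` element whose values include both carrier terms and internet media types is a candidate for closer inspection.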
The account of the 1:1 Principle offered by Hillmann (2003) characterizes it as a relationship between related records – one describing a physical cultural heritage artifact and one describing a digital surrogate. In this pilot analysis, statements about a physical format (e.g. glass plate negative, 35mm slide, etc.) are classified as describing resources belonging to the PhysicalMedium class. Statements about digital formats (e.g. image/jpeg, 10224000 bytes, etc.) are classified as describing resources that are members of a DigitalResource class. This latter class is not provided by DCMI, but is defined for our purposes as equivalent to other DCMI resource classes. An interpretation of the 1:1 Principle (in the spirit of Hillmann) is that a described resource cannot be a member of both the PhysicalMedium class and the DigitalResource class. Of the 25 collections, 60% (n = 15) are identified as containing records that describe resources in both the PhysicalMedium and DigitalResource classes, thus violating our working interpretation of the Principle.
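The working interpretation above can be expressed as a small check over the format statements of a single record. The following is a minimal sketch under stated assumptions: the regular expressions, the `PHYSICAL_TERMS` list, and the function names are hypothetical and far less complete than the manual classification used in the pilot.

```python
import re

# Digital format statements: internet media types or byte extents.
MEDIA_TYPE = re.compile(r"^(application|audio|image|text|video)/[\w.+-]+$")
BYTE_EXTENT = re.compile(r"^\d+\s*(bytes|kb|mb|gb)$")

# Hypothetical, deliberately incomplete list of physical format terms.
PHYSICAL_TERMS = ("glass plate negative", "35mm slide", "photographic print")


def classify(statement):
    """Return the class a format statement implies, or None if unknown."""
    value = statement.strip().lower()
    if MEDIA_TYPE.match(value) or BYTE_EXTENT.match(value):
        return "DigitalResource"
    if any(term in value for term in PHYSICAL_TERMS):
        return "PhysicalMedium"
    return None


def violates_one_to_one(format_statements):
    """True if one description implies membership in both classes."""
    classes = {classify(s) for s in format_statements}
    return {"PhysicalMedium", "DigitalResource"} <= classes
```

For example, `violates_one_to_one(["glass plate negative", "image/jpeg"])` flags the record, while a record carrying only `image/jpeg` and `10224000 bytes` does not.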
The remaining 40% of collections appear to describe a single class of resources; however, 7 of these collections exhibit an interesting feature undocumented by previous studies. These records describe resources that are classified as PhysicalMediums (e.g. paintings, baskets, newspapers, manuscripts, etc.), yet retrieve a digital manifestation. Because non-networked resources can legitimately be assigned a URI, it is not until the URI is de-referenced that a “violation” becomes evident. Automated approaches to detecting 1:1 Principle violations that examine only the content of a record will be unable to identify these records because they are internally coherent.
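Detecting this case requires looking beyond the record to what its identifier actually retrieves. The sketch below assumes a record has already been reduced to a set of class names and an HTTP URI; `fetch_content_type` and `hidden_violation` are hypothetical helpers, and treating any successful retrieval as evidence of a digital manifestation is a simplifying assumption.

```python
import urllib.request


def fetch_content_type(uri, timeout=10):
    """De-reference a URI with an HTTP HEAD request; return its media type."""
    request = urllib.request.Request(uri, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get_content_type()


def hidden_violation(record_classes, uri, lookup=fetch_content_type):
    """True if a purely physical description resolves to a retrievable file.

    record_classes is the set of classes implied by the record's statements.
    lookup is injectable so the check can be exercised without a network.
    """
    if record_classes != {"PhysicalMedium"}:
        return False  # not an internally coherent physical description
    try:
        content_type = lookup(uri)
    except (OSError, ValueError):
        return False  # nothing retrievable, so no evidence either way
    return content_type is not None
```

Passing the lookup as a parameter keeps the classification logic separate from network access, which matters when checking tens of thousands of harvested records against servers of varying reliability.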
Only one of the collections included in this pilot did not violate the 1:1 Principle. Although two collections did not include either of the direct violations described above, they did assign dates to DigitalResources that preceded the availability of the specified formats (e.g. by asserting TIFFs dated 1916). These records suggest that other combinations of properties can also indicate a potential violation (e.g. a description of the publisher of both the physical manifestation and the digital manifestation). In either case, external world-knowledge is needed to be able to classify such records as “violations.”
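The date-based case shows how such world-knowledge might be supplied as a lookup table. In this minimal sketch the `FORMAT_INTRODUCED` table, its approximate years, and the `anachronistic` function are assumptions introduced for illustration, not part of the pilot's method.

```python
import re

# Approximate year each format was first published; illustrative values only.
FORMAT_INTRODUCED = {
    "image/tiff": 1986,
    "image/gif": 1987,
    "image/jpeg": 1992,
    "application/pdf": 1993,
    "image/png": 1996,
}

# First four-digit year found in a free-text date value.
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def anachronistic(format_value, date_value):
    """True if the date precedes the introduction of the stated format."""
    introduced = FORMAT_INTRODUCED.get(format_value.strip().lower())
    match = YEAR.search(date_value)
    if introduced is None or match is None:
        return False  # no world-knowledge or no parseable year
    return int(match.group(1)) < introduced
```

Under these assumptions, `anachronistic("image/tiff", "1916")` flags the record, while a TIFF dated `2004-05-01` passes. A flagged record is only a candidate: the date may legitimately describe the original artifact, which is exactly the ambiguity at issue.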
The failure of the 1:1 Principle to achieve its objectives suggests that it is necessary to revisit it as we develop new specifications for Dublin Core-based cultural heritage metadata. In doing so, we do not wish to offer a prescriptive set of rules, but rather to provide a formalization of the Principle, and of what counts as a violation, that can be applied to existing metadata – particularly that shared through OAI-PMH. This work must be informed by recent developments in cultural heritage domain models (such as FRBR and the CIDOC Conceptual Reference Model) in order to account for both the content and the carriers of resources described by this metadata. A formalized definition of the 1:1 Principle can be used with the improved semantics of the DCMI Abstract Model to augment existing cultural heritage metadata.