Lois Mai Chan, Ph.D
Professor, School of Library and Information Science
University of Kentucky
U.S.A.
loischan@uky.edu
ABSTRACT: The rapid growth of Internet resources and digital collections and libraries is accompanied by a proliferation of metadata schemas. Each metadata schema has been designed based on the requirements of the particular user community, intended users, type of materials, subject domain, the depth of description, etc. Problems arise when building large digital libraries or repositories with metadata records prepared according to diverse schemas. Most users do not and should not have to know or understand the underlying structure of the digital collection; but in reality, they are experiencing difficulties in resource discovery and access. How to enable a “one-stop” seamless search presents considerable challenges. This presentation reviews some of the methods that have been or are currently used to achieve or improve interoperability among metadata schemas.
In recent literature, a great deal has been written about interoperability between and among different metadata schemas. This presentation reviews and analyzes some of the methods currently used to achieve interoperability.
“The ability of multiple systems, using different hardware and software platforms, data structures, and interfaces, to exchange and share data” (NISO 2004)
“The ability of two or more systems or components to exchange information and use the exchanged information without special effort on either system” (ALCTS 2004)
“The compatibility of two or more systems such that they can exchange information and data and can use the exchanged information and data without any special manipulation” (Taylor 2004)
It should be noted that these models are not mutually exclusive. Sometimes, within a particular project, we may see more than one model being used.
In this approach, all participants of a consortium, repository, etc., use the same schema, such as MARC/AACR or the Dublin Core. By using the same standard, a high level of consistency can be maintained. This, of course, has been the approach in the library community for over a century. It is the ultimate solution to the interoperability problem. However, although it is a conceptually simple solution, it is not always feasible or practical, particularly in heterogeneous environments serving different user communities where components or participating collections contain different types of resources already described by a variety of specialized schemas. This method is only viable at the beginning or early stages of building a digital library or repository, before different schemas have been adopted by different participants of the collection or repository. Examples of uniform standardization include the MARC/AACR standards used in union catalogs of library collections and the Electronic Theses and Dissertations Metadata Standard (ETD-MS) based on the Dublin Core used by members of the Networked Digital Library of Theses and Dissertations (NDLTD).
This model ensures a similar basic structure and common elements, but with varying depths and details. Examples of application profiling include the Library-Application Profile (for using Dublin Core) and the Biological Data Profile of the National Biological Information Infrastructure (NBII), which is based on FGDC/CSDGM (Content Standard for Digital Geospatial Metadata) of the Federal Geographic Data Committee. Examples of adaptation/modification include:
There have been a substantial number of crosswalks. Some examples are:
The crosswalk approach appears to be more workable when mapping from a complex schema to a simpler one – in other words, a “one way street.” An example is the crosswalk between the Dublin Core and MARC. Because the two schemas differ in depth and complexity, a crosswalk works relatively well when mapping MARC fields to Dublin Core elements but not vice versa, because MARC is a much more complex schema. One of the problems identified by Marcia L. Zeng is the different degrees of equivalency: one-to-one, one-to-many, many-to-one, and one-to-none (Zeng 2001). Also, while a crosswalk works well when the number of schemas involved is small, mapping among multiple schemas is not only extremely tedious and labor intensive, but also requires enormous intellectual effort. For example, a one-way crosswalk requires one mapping process (A-->B), and a two-way crosswalk requires two mapping processes (A-->B and B-->A). The process becomes more cumbersome as more schemas are involved: in general, n schemas require n(n-1) one-way mapping processes. A crosswalk involving three schemas would require six (or three pairs of) mapping processes, a four-schema crosswalk would require twelve (or six pairs of) mapping processes, and a five-schema crosswalk would require twenty (or ten pairs of) mapping processes.
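The arithmetic above can be checked with a short Python sketch. It simply enumerates every ordered (source, target) pair of distinct schemas; the schema names A through E are placeholders chosen for illustration, not schemas discussed in the literature cited here.

```python
from itertools import permutations


def one_way_mappings(schemas):
    """Return every ordered (source, target) pair of distinct schemas."""
    return list(permutations(schemas, 2))


def count_one_way_mappings(n_schemas):
    """Closed form for the same count: n * (n - 1)."""
    return n_schemas * (n_schemas - 1)


# Placeholder schema names, used only to illustrate the growth in mappings.
schemas = ["A", "B", "C", "D", "E"]
for n in range(2, len(schemas) + 1):
    pairs = one_way_mappings(schemas[:n])
    # The enumeration and the closed form must agree.
    assert len(pairs) == count_one_way_mappings(n)
    print(n, "schemas:", len(pairs), "one-way mappings")
```

For two, three, four, and five schemas this reports 2, 6, 12, and 20 one-way mappings respectively, which is why pairwise crosswalking scales poorly as participants are added.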
The Picture Australia project is a digital library project encompassing a variety of institutions including libraries, the National Archives, and the Australian War Memorial, many of which came with legacy metadata records. Records from participants are collected in a central location (the National Library of Australia) and then translated into a “common record format,” with fields based on the Dublin Core (Tennant 2001). The OAI stipulates that “it is compulsory that all open archives be able to generate metadata for all resources in unqualified Dublin Core (DC)…This will ensure that service providers who do not understand any other metadata format will at least be able to glean the basic information about resources from their DC renditions.” (Suleman and Fox 2001).
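As a rough illustration of translating legacy records into a common record format, the following Python sketch maps source-specific fields onto a small set of Dublin Core-style element names. The field maps, source names, and sample record are invented for illustration; they are assumptions and do not reproduce the actual Picture Australia format or any official crosswalk.

```python
# Hypothetical, heavily simplified field maps: source field -> common element.
FIELD_MAPS = {
    "marc": {"245": "title", "100": "creator", "650": "subject", "651": "subject"},
    "local_archive": {"item_name": "title", "photographer": "creator", "taken": "date"},
}


def to_common_record(source, record):
    """Translate one source record into the common format.

    Returns (common, unmapped): 'common' collects values under the shared
    element names; 'unmapped' keeps fields with no equivalent (one-to-none).
    """
    mapping = FIELD_MAPS[source]
    common = {}
    unmapped = {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped[field] = value
        else:
            # Several source fields may land on one element (many-to-one).
            common.setdefault(target, []).append(value)
    return common, unmapped


# Invented sample record from a hypothetical contributing archive.
common, unmapped = to_common_record(
    "local_archive",
    {"item_name": "Harbour view", "photographer": "Unknown", "negative_no": "A123"},
)
print(common)
print(unmapped)
```

Fields without an equivalent in the common format are returned separately rather than discarded, so the richer original description can still be consulted in the contributing collection.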
The lingua franca superstructure is built from a set of core attributes that are common to many or most of the existing schemas used by participants in a digital library or repository. An example is the ROADS template, which uses a set of broad, generic attributes.
The question is, then: how does one determine what the “most common attributes” are? A possibility is to make use of the core attributes, identified by the IFLA Working Group on the Use of Metadata Schemas (IFLA 2003), as occurring in the most widely used metadata schemas. These common core attributes are: Subject, Date, Conditions of use, Publisher, Name assigned to the resource, Language/mode of expression, Resource identifier, Resource type, Author/creator, and Version. The results of a survey conducted by the IFLA Working Group indicate that certain elements are more universally or frequently occurring than others. Based on this evidence, it could be argued that in a particular environment (a digital library, a repository, etc.), a consistent or central index (NISO, p. 2) or a combined index -- a master index merging the most commonly occurring elements in various metadata schemas from different collections -- can be used as a tool for federated searches. Such an index enables a layered service, offering access at a high level, involving an entire digital library or repository, while at the same time allowing the browsing of rich metadata descriptions within individual collections.
This model may be applied to different information environments: print, visual, audio, geospatial, etc. The common attributes shared by components or participants within a particular environment can be defined according to their user needs. For example, in a multilingual environment, it is expected that language would be an important attribute; and in an environment encompassing resources from various parts of the world, geographic location would be significant.
Some of the advantages of defining and using a set of core attributes are:
The master index allows users to enter from a common search interface and be directed to the appropriate component(s) or service(s) within the digital library, where they may browse the rich description in the metadata records contained in the individual component parts, which may have been prepared according to different metadata schemas. Alternatively, a further, more refined search may be conducted using the unique elements of the individual metadata schema, such as controlled vocabulary, publisher name, or conditions of use.
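A minimal Python sketch of such a master index follows. It assumes that each collection's records have already been expressed with short names for the core attributes (the names in CORE_ATTRIBUTES are abbreviations chosen here, not identifiers from the IFLA report), and the collection names and records are invented for illustration.

```python
from collections import defaultdict

# Abbreviated stand-ins for the core attributes listed by the IFLA Working Group.
CORE_ATTRIBUTES = {
    "subject", "date", "conditions_of_use", "publisher", "title",
    "language", "identifier", "type", "creator", "version",
}


def build_master_index(collections):
    """Index only the core attributes of every record in every collection.

    'collections' maps a collection name to a list of record dicts. Each index
    entry points back to (collection name, record position), so a hit can
    direct the user to the full record held in the original collection.
    """
    index = defaultdict(set)
    for name, records in collections.items():
        for position, record in enumerate(records):
            for attribute, value in record.items():
                if attribute not in CORE_ATTRIBUTES:
                    continue  # schema-specific detail stays in its own collection
                for token in str(value).lower().split():
                    index[(attribute, token)].add((name, position))
    return index


def search(index, attribute, term):
    """Return sorted (collection, position) pairs matching one term."""
    return sorted(index.get((attribute, term.lower()), set()))


# Invented sample data for two hypothetical collections.
collections = {
    "maps": [{"title": "Coastal survey", "subject": "geology", "bounding_box": "n/a"}],
    "theses": [{"title": "Survey of metadata", "creator": "A. Student"}],
}
index = build_master_index(collections)
print(search(index, "title", "survey"))
```

The index deliberately ignores schema-specific fields such as the hypothetical bounding_box, which remain available for the more refined search within the individual collection.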
Resource Description Framework (RDF)
The Resource Description Framework (RDF) is a data model developed by the World Wide Web Consortium (W3C) for the description of resources on the Web that “provides a mechanism for integrating multiple metadata schemes” (NISO 2004). Because RDF can be expressed in XML, multiple namespaces may be defined to allow elements from different schemas to be combined in a single resource description. An RDF record links multiple descriptions, created at different times for different purposes, to each other. The following example shows how different metadata schemas (as indicated by namespaces) can be packaged together (Iannella 1999):
<?xml version="1.0"?>
<RDF xmlns = "http://w3.org/TR/1999/PR-rdf-syntax-19990105#"
xmlns:DC = "http://purl.org/DC#"
xmlns:AGLS = "http://naa.gov.au/AGLS#">
<Description about = "http://dstc.com.au/report.html">
<DC:Title> The Future of Metadata </DC:Title>
<DC:Creator> Jacky Crystal </DC:Creator>
<DC:Date> 1998-01-01 </DC:Date>
<DC:Subject> Metadata, RDF, Dublin Core </DC:Subject>
<AGLS:Function> Information Management – Internet </AGLS:Function>
</Description>
</RDF>
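To show how the namespaces keep elements from different schemas distinguishable, the following Python sketch parses a well-formed copy of the record with the standard library's xml.etree.ElementTree and groups the elements by the schema they come from. It is a plain XML reading of this one example rather than a full RDF parser, and the SCHEMA_NAMES lookup table is a helper introduced here for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://w3.org/TR/1999/PR-rdf-syntax-19990105#"

# Illustrative lookup from namespace URI to a short schema label.
SCHEMA_NAMES = {"http://purl.org/DC#": "DC", "http://naa.gov.au/AGLS#": "AGLS"}

RDF_XML = """<RDF xmlns="http://w3.org/TR/1999/PR-rdf-syntax-19990105#"
     xmlns:DC="http://purl.org/DC#"
     xmlns:AGLS="http://naa.gov.au/AGLS#">
  <Description about="http://dstc.com.au/report.html">
    <DC:Title> The Future of Metadata </DC:Title>
    <DC:Creator> Jacky Crystal </DC:Creator>
    <DC:Date> 1998-01-01 </DC:Date>
    <DC:Subject> Metadata, RDF, Dublin Core </DC:Subject>
    <AGLS:Function> Information Management – Internet </AGLS:Function>
  </Description>
</RDF>"""


def elements_by_schema(xml_text):
    """Group the child elements of each Description by originating schema."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for description in root.iter("{%s}Description" % RDF_NS):
        for child in description:
            # ElementTree reports qualified tags as "{namespace-uri}localname".
            uri, _, local = child.tag[1:].partition("}")
            schema = SCHEMA_NAMES.get(uri, uri)
            grouped.setdefault(schema, {})[local] = (child.text or "").strip()
    return grouped


for schema, elements in elements_by_schema(RDF_XML).items():
    print(schema, elements)
```

The four Dublin Core elements and the single AGLS element end up under separate keys even though they sit side by side in one description.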
Metadata Encoding and Transmission Standard (METS)
The Metadata Encoding and Transmission Standard (METS) is a standard for packaging descriptive, administrative, and structural metadata into one XML document for interactions with digital repositories. It provides a framework for combining several internal metadata structures with external schemas (such as MODS or MIX). It is “a standard that provides a method to encapsulate all the information about an object—whether digital or not” (Tennant May 15, 2004).
The descriptive metadata section may point to descriptive metadata external to the METS document (e.g., a MARC record in an OPAC or an EAD finding aid maintained on a WWW server), or contain internally embedded descriptive metadata, or both. Multiple instances of both external and internal descriptive metadata may be included in the descriptive metadata section. The following example shows a file section from a digital library object for an oral history which has three different versions: a TEI-encoded transcript, a master audio file in WAV format, and a derivative audio file in MP3 format (METS 2004):
<fileSec>
<fileGrp ID="VERS1">
<file ID="FILE001" MIMETYPE="application/xml" SIZE="257537" CREATED="2001-06-10">
<FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.xml</FLocat>
</file>
</fileGrp>
<fileGrp ID="VERS2">
<file ID="FILE002" MIMETYPE="audio/wav" SIZE="64232836" CREATED="2001-05-17" GROUPID="AUDIO1">
<FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.wav</FLocat>
</file>
</fileGrp>
<fileGrp ID="VERS3" VERSDATE="2001-05-18">
<file ID="FILE003" MIMETYPE="audio/mpeg" SIZE="8238866" CREATED="2001-05-18" GROUPID="AUDIO1">
<FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.mp3</FLocat>
</file>
</fileGrp>
</fileSec>
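A consuming application could read such a file section as sketched below in Python. The sketch works on a well-formed copy of the fragment exactly as printed here, without the namespace declarations a complete METS document would carry, so the bare element names used in the lookups are an assumption tied to this fragment. It groups files by GROUPID, the attribute that ties the two audio versions together.

```python
import xml.etree.ElementTree as ET

FILE_SEC = """<fileSec>
  <fileGrp ID="VERS1">
    <file ID="FILE001" MIMETYPE="application/xml" SIZE="257537" CREATED="2001-06-10">
      <FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.xml</FLocat>
    </file>
  </fileGrp>
  <fileGrp ID="VERS2">
    <file ID="FILE002" MIMETYPE="audio/wav" SIZE="64232836" CREATED="2001-05-17" GROUPID="AUDIO1">
      <FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.wav</FLocat>
    </file>
  </fileGrp>
  <fileGrp ID="VERS3" VERSDATE="2001-05-18">
    <file ID="FILE003" MIMETYPE="audio/mpeg" SIZE="8238866" CREATED="2001-05-18" GROUPID="AUDIO1">
      <FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.mp3</FLocat>
    </file>
  </fileGrp>
</fileSec>"""


def files_by_group(xml_text):
    """Group <file> entries by GROUPID (None when the attribute is absent)."""
    root = ET.fromstring(xml_text)
    groups = {}
    for file_el in root.iter("file"):
        location = file_el.find("FLocat")
        entry = {
            "id": file_el.get("ID"),
            "mimetype": file_el.get("MIMETYPE"),
            "size": int(file_el.get("SIZE")),
            "url": (location.text or "").strip() if location is not None else None,
        }
        groups.setdefault(file_el.get("GROUPID"), []).append(entry)
    return groups


for group_id, entries in files_by_group(FILE_SEC).items():
    print(group_id, [entry["mimetype"] for entry in entries])
```

The WAV master and its MP3 derivative are returned together under AUDIO1, while the TEI transcript, which has no GROUPID, is listed separately.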
Guenther, Rebecca, and Sally McCallum. (2002). New metadata standards for digital resources: MODS and METS. ASIST Bulletin, 29(2).
Heery, Rachel M., Andy Powell, and Michael William Day. (Mar. 1998). Metadata: CrossROADS and interoperability [computer file]. Ariadne (Online) no. 14.
IFLA Working Group on the Use of Metadata Schemas. (2003). Guidance on the Structure, Content, and Application of Metadata Records for Digital Resources and Collections: Report of the IFLA Cataloguing Section Working Group on the Use of Metadata Schemas: Draft – For Worldwide Review. http://www.ifla.org/VII/s13/guide/metaguide03.pdf
Iannella, Renato. (1999). An Idiot's Guide to the Resource Description Framework. http://archive.dstc.edu.au/RDU/reports/RDF-Idiot/
Johnston, Pete. (2003). Metadata and Interoperability in a Complex World [computer file]. Ariadne (Online) no. 37, p. 2.
McCallum, Sally H. (2003). Library of Congress metadata landscape. Zeitschrift für Bibliothekswesen und Bibliographie, 4.
METS: A Tutorial & Overview. (2004). http://www.loc.gov/standards/mets/METSOverview.v2.html
National Information Standards Organization. (2004). Understanding Metadata. http://www.niso.org/standards/resources/UnderstandingMetadta.pdf
Suleman, Hussein, and Edward Fox. (2001). The Open Archives Initiative: Realizing Simple and Effective Digital Library Interoperability. Journal of Library Administration, 35(1/2), 125-145.
Taylor, Arlene. (2004). The Organization of Information. 2nd ed. Westport, CT: Libraries Unlimited.
Tennant, Roy. (February 15, 2001). Different Paths to Interoperability. Library Journal, 126(3), 118-119.
Tennant, Roy. (May 15, 2004). It’s Opening Day for METS. Library Journal, 129(9), 28.
Tennant, Roy. (July 2004). Metadata’s Bitter Harvest. Library Journal, 129(12), 32.
Tennant, Roy. (Dec. 2003). The Engine of Interoperability. Library Journal, 128(20), 33.
Tennant, Roy. (May 15, 2002). The Importance of Being Granular. Library Journal, 127(9), 32-33.
Zeng, Marcia Lei. (2001). Supporting Metadata Interoperability: Trends and Issues. In: Global Digital Library Development in the New Millennium. Ching-Chih Chen ed. Beijing: Tsinghua University Press. pp. 405-412.