Web Magazine for Information Professionals

Metadata (1): Encoding OpenURLs in DC Metadata

Andy Powell and Ann Apps propose a mechanism for embedding machine parsable citations into Dublin Core (DC) metadata records.

This article proposes a mechanism for embedding machine parsable citations into Dublin Core (DC) metadata records [1] based on the OpenURL [2]. It suggests providing partial OpenURLs using the DC Identifier, Source and Relation elements together with an associated 'OpenURL' encoding scheme. It summarises the relevance of this technique to support reference linking and considers mechanisms for providing richer bibliographic citations. A mapping between OpenURL attributes and Dublin Core Metadata Element Set (DCMES) [3] elements is provided.

The OpenURL

The OpenURL provides a mechanism for encoding a citation for an information resource, typically a bibliographic resource, as a URL. The OpenURL is, in effect, an actionable URL that transports metadata or keys to access metadata for the object for which the OpenURL is provided. The target of the OpenURL is an OpenURL resolver that offers localized services in an open linking environment. The OpenURL resolver is typically referred to as the user's Institutional Service Component (ISC). The remainder of the OpenURL transports the citation.

The citation is provided by either using a global identifier for the resource, for example a Digital Object Identifier (DOI) [4], or by encoding metadata about the resource, for example title, author, journal title, etc., or by some combination of both approaches.   It is also possible to encode a local identifier for the resource within the OpenURL.  In combination with information about where the OpenURL was created, this allows software that receives the OpenURL to request further metadata about the information resource.  However, this article focuses on the OpenURL metadata encoding mechanism rather than on the specific details of how OpenURLs are processed and used by resolvers and other software.

Originally known as the SFX-URL, the OpenURL's roots lie in the SFX research on reference linking in hybrid library environments [5]. At the time of writing, the OpenURL is most appropriate for citing bibliographic resources, although this is expected to change as the OpenURL develops and moves through the standardization process. Furthermore, the OpenURL has been developed primarily to support 'reference linking' applications. On its own, it does not provide enough richness to form the basis for detailed, full bibliographic citations; for example, it includes only the first author of the work.

An OpenURL comprises two parts, a BASEURL and a QUERY. The BASEURL identifies the OpenURL resolver that will provide context sensitive services for the OpenURL. The BASEURL is specific to the particular user that is being sent the OpenURL - it typically identifies the ISC offered by the institution to which the user belongs. Services that embed OpenURLs in their Web interfaces, for example in their search results, must develop mechanisms for associating a BASEURL with each end-user. One way of doing this is to store the BASEURL in a cookie in the user's Web browser, another is to store the BASEURL along with other user preferences.

The QUERY part can be made up of one or more DESCRIPTIONs. Each DESCRIPTION comprises the metadata attributes and values that make up the citation for the resource. A full breakdown of the components of the DESCRIPTION is not provided here. See the OpenURL specification for full details [6].

Here is an example OpenURL:

http://resolver.ukoln.ac.uk/openresolver/?sid=ukoln:ariadne&genre=article
    &atitle=Information%20gateways:%20collaboration%20on%20content
    &title=Online%20Information%20Review&issn=1468-4527&volume=24
    &spage=40&epage=45&artnum=1&aulast=Heery&aufirst=Rachel

In this example the BASEURL is <http://resolver.ukoln.ac.uk/openresolver/>, the URL of the UKOLN OpenResolver demonstrator service. The rest of the OpenURL is the QUERY, which is made up of a single DESCRIPTION of an article entitled 'Information gateways: collaboration on content' by Rachel Heery. The article was published in 'Online Information Review' volume 24.

Notice that, because the OpenURL is a URL, it is encoded in such a way that special characters, for example space characters, are represented by a percentage sign followed by two hex digits. This process is known as mandatory escape encoding.

(Note that all the OpenURL examples in this article have been split across multiple lines for display purposes. Note also that the optional OpenURL 'sid' attribute, set here to 'ukoln:ariadne', indicates the service that generated the OpenURL. For simplicity, the other example OpenURLs in this article do not contain a 'sid' attribute.)
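
The structure described above, a BASEURL followed by a QUERY of escape-encoded attribute/value pairs, can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name build_openurl and its argument layout are our own, and the escape encoding is delegated to the standard urllib.parse module. The colon is deliberately left unescaped so that the output matches the style of the example above.

```python
from urllib.parse import quote, urlencode


def build_openurl(baseurl, description):
    """Join a resolver BASEURL and one DESCRIPTION into a full OpenURL.

    `description` is a list of (attribute, value) pairs. The name and
    signature of this function are illustrative only.
    """
    # quote() writes a space as %20 rather than '+'; ':' is kept as-is
    # to match the examples in this article.
    query = urlencode(description, safe=":", quote_via=quote)
    return baseurl + "?" + query


print(build_openurl(
    "http://resolver.ukoln.ac.uk/openresolver/",
    [("genre", "article"),
     ("atitle", "Information gateways: collaboration on content"),
     ("volume", "24"), ("aulast", "Heery"), ("aufirst", "Rachel")],
))
```

Because the BASEURL is passed in separately, the same DESCRIPTION can be delivered to users at different institutions simply by changing the first argument.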

Proposals

This article makes two proposals. Firstly, that an OpenURL may be given as the value of a DC Identifier element as a way of providing a citation for the resource being described by the DC record. Secondly, that an OpenURL may also be given as the value of a DC Source or Relation element as a way of providing citations for resources that are related to the resource being described.

The mechanism used in both cases is the same - a partial OpenURL is placed in the element value. A partial OpenURL is an OpenURL without a BASEURL. This is because, at the time at which the OpenURL is placed into the DC element value, there is no knowledge of which end-user(s) will receive the OpenURL. It is therefore not possible or sensible to embed the BASEURL part of the OpenURL in the element value. Only the DESCRIPTION part of the OpenURL should be placed in the element value.

A DC encoding scheme [7] of 'OpenURL' should be used to indicate that the value forms part of an OpenURL. The DESCRIPTION part of the OpenURL should be fully mandatory escape encoded before it is placed in the DC element value. Furthermore, any ampersand ('&') characters that appear in the OpenURL as attribute separators must be encoded as '&amp;'.

Software that processes DC metadata records containing OpenURL DESCRIPTIONs will have to unencode any encoded '&' characters and add a BASEURL in order to deliver full OpenURLs to the end-user.
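
The processing step just described might look something like the following Python sketch. The function name complete_openurl is hypothetical, and the code assumes that any whitespace inside the element value is display wrapping only, which is reasonable given that genuine spaces are escape encoded as %20.

```python
import html


def complete_openurl(element_value, baseurl):
    """Turn the value of an OpenURL-scheme DC element into a full OpenURL.

    `element_value` is the raw text as it appears in the record, with
    '&' still written as '&amp;'. Both names here are hypothetical.
    """
    # Decode the encoded attribute separators: '&amp;' becomes '&'.
    description = html.unescape(element_value)
    # Assumption: whitespace in the value is only line wrapping.
    description = "".join(description.split())
    # Prepend the BASEURL chosen for this particular end-user.
    return baseurl + "?" + description


print(complete_openurl(
    "genre=article&amp;issn=1468-4527&amp;volume=24&amp;aulast=Heery",
    "http://resolver.ukoln.ac.uk/openresolver/",
))
```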

Proposal 1 - providing a citation for the resource being described

In order to provide a citation for the resource being described by a DC record, place an OpenURL DESCRIPTION for the resource in the value of a DC Identifier element and indicate a scheme of 'OpenURL'.

Here is an example, encoded using the XHTML <meta> tag:

<meta name="DC.Identifier" scheme="OpenURL"
    content="genre=article
    &amp;atitle=Information%20gateways:%20collaboration%20on%20content
    &amp;title=Online%20Information%20Review&amp;issn=1468-4527&amp;volume=24
    &amp;spage=40&amp;epage=45&amp;artnum=1&amp;aulast=Heery&amp;aufirst=Rachel" />

Note that the 'OpenURL' scheme is not yet formally recognised by the Dublin Core Metadata Initiative as a recommended Dublin Core qualifier.
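
A service that generates its metadata dynamically could produce such an element with a few lines of code. The Python sketch below is one possible approach, not a prescribed one: the helper name dc_identifier_meta is invented for illustration, and the escape encoding again relies on the standard urllib.parse and html modules.

```python
from html import escape
from urllib.parse import quote, urlencode


def dc_identifier_meta(description, element="DC.Identifier"):
    """Render a partial OpenURL as an XHTML <meta> tag (illustrative helper).

    `description` is a list of (attribute, value) pairs.
    """
    # Escape encode each attribute value; no BASEURL is added.
    partial = urlencode(description, safe=":", quote_via=quote)
    # escape() rewrites each '&' separator as '&amp;'.
    return '<meta name="%s" scheme="OpenURL" content="%s" />' % (
        escape(element), escape(partial))


print(dc_identifier_meta([
    ("genre", "article"),
    ("atitle", "Information gateways: collaboration on content"),
    ("issn", "1468-4527"), ("aulast", "Heery"), ("aufirst", "Rachel"),
]))
```

Passing a different element name, for example "DC.Source", would produce the elements needed for the second proposal in the same way.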

A fuller set of XHTML <meta> tags for this resource might be:

<meta name="DC.Title" content="Information gateways: collaboration
    on content" />
<meta name="DC.Creator" content="Heery, Rachel" />
<meta name="DC.Identifier" scheme="OpenURL"
    content="genre=article
    &amp;atitle=Information%20gateways:%20collaboration%20on%20content
    &amp;title=Online%20Information%20Review&amp;issn=1468-4527&amp;volume=24
    &amp;spage=40&amp;epage=45&amp;artnum=1&amp;aulast=Heery&amp;aufirst=Rachel" />

In this case some information is duplicated in both the OpenURL DESCRIPTION and DC elements. This article makes no recommendations about whether it is sensible to duplicate the metadata in this way.

Note that for some applications, the citation provided by the OpenURL DESCRIPTION will not be sufficiently detailed. In such cases, a rich citation for the resource being described by the metadata record may only be achieved by combining the OpenURL DESCRIPTION with DCMES elements and possibly elements from other namespaces.

Proposal 2 - providing a citation for a related resource

In order to provide a citation for a resource that is related to the resource being described, place an OpenURL DESCRIPTION for the related resource in the value of a DC Source or Relation element and indicate a scheme of 'OpenURL'.

For example, imagine that an HTML version of the journal article mentioned above is made available on the Web. Its embedded metadata might be:

<meta name="DC.Title" content="Information gateways: collaboration on content">
<meta name="DC.Creator" content="Heery, Rachel">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://www.ukoln.ac.uk/~lisrmh/infogate.html">
<meta name="DC.Source" scheme="OpenURL"
    content="genre=article
    &amp;atitle=Information%20gateways:%20collaboration%20on%20content
    &amp;title=Online%20Information%20Review&amp;issn=1468-4527&amp;volume=24
    &amp;spage=40&amp;epage=45&amp;artnum=1&amp;aulast=Heery&amp;aufirst=Rachel">
<meta name="DC.Relation.references" scheme="OpenURL"
    content="id=doi:10.1045/december99-dempsey&amp;genre=article
    &amp;atitle=International%20Information%20Gateway%20Collaboration:%20report
    %20of%20the%20first%20IMesh%20Framework%20Workshop
    &amp;title=D-Lib%20Magazine&amp;issn=1082-9873&amp;date=1999-12&amp;volume=5
    &amp;artnum=12&amp;aulast=Dempsey&amp;aufirst=Lorcan">

This DC record refers to two related resources - the original journal article from which the Web version is derived (using DC Source) and an article published in D-Lib Magazine that is cited in the article (using DC Relation).
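
Software that harvests such a page needs to find the OpenURL-scheme elements and decode them. The following Python sketch shows one way this might be done with the standard html.parser module; the class name OpenURLMetaParser is our own, and the code assumes that whitespace inside a content value is display wrapping only.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl


class OpenURLMetaParser(HTMLParser):
    """Collect (element name, attributes) pairs for OpenURL-scheme <meta> tags."""

    def __init__(self):
        super().__init__()
        self.citations = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("scheme") or "").lower() != "openurl":
            return
        # The parser has already turned '&amp;' back into '&'.
        # Drop the whitespace used to wrap long values for display.
        description = "".join((attr.get("content") or "").split())
        # parse_qsl() splits on '&' and undoes the escape encoding.
        self.citations.append((attr.get("name"), dict(parse_qsl(description))))


page = """
<meta name="DC.Format" content="text/html">
<meta name="DC.Source" scheme="OpenURL"
    content="genre=article
    &amp;atitle=Information%20gateways:%20collaboration%20on%20content
    &amp;volume=24&amp;aulast=Heery&amp;aufirst=Rachel">
"""
parser = OpenURLMetaParser()
parser.feed(page)
for name, attributes in parser.citations:
    print(name, attributes)
```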

Rich citations and strategies for handling duplicate information

The example OpenURLs shown above are ideal for supporting 'reference linking' applications. However, in some cases more detailed citation information may be required.

Consider this example DC record for a journal article:

<meta name="DC.Title" content="International Information Gateway
    Collaboration: report of the first IMesh Framework Workshop">
<meta name="DC.Creator" content="Lorcan Dempsey">
<meta name="DC.Creator" content="Tracy Gardner">
<meta name="DC.Creator" content="Michael Day">
<meta name="DC.Creator" content="Titia van der Werf">
<meta name="DC.Publisher" content="Corporation for National Research Initiatives">
<meta name="DC.Date" content="1999-12"> 
<meta name="DC.Type" content="article"> 
<meta name="DC.Language" content="en-us"> 
<meta name="DC.Rights" content="Copyright (c) 1999 Lorcan Dempsey, Tracy Gardner,
    Michael Day, and Titia van der Werf"> 
<meta name="DC.Identifier" scheme="DOI" content="10.1045/december99-dempsey"> 
<meta name="DC.Identifier" content="http://www.dlib.org/dlib/december99/12dempsey.html">
<meta name="DC.Identifier" scheme="OpenURL"
    content="id=doi:10.1045/december99-dempsey&amp;genre=article
    &amp;atitle=International%20Information%20Gateway%20Collaboration:%20report
    %20of%20the%20first%20IMesh%20Framework%20Workshop
    &amp;title=D-Lib%20Magazine&amp;issn=1082-9873&amp;date=1999-12&amp;volume=5
    &amp;artnum=12&amp;aulast=Dempsey&amp;aufirst=Lorcan">

Notice that there is information contained in the DC elements that is not available in the OpenURL - for example the names of multiple authors. There is also information in the OpenURL that is not available in the DC elements, and that could not be embedded into DC elements - for example the volume and article numbers. There is information that is more accessible for machine parsing in the OpenURL such as the author's family and given names.   Finally, there is some information that is duplicated in both the DC elements and in the OpenURL.

(Note: in the general case, one can imagine information about the affiliations of the authors also being embedded into the DC metadata, though details of the mechanism to do this have not yet been agreed by the DCMI.)

In some cases it might be useful to remove the duplicated information from the DC record. One approach would be to remove attributes from the OpenURL DESCRIPTION, where that information is available in other DC elements. So, in the DC record above, the 'atitle' and 'id' attributes might be removed. In other cases it might also be possible to remove the 'date', 'aufirst' and 'aulast' attributes as well. Software that processes the DC record could attempt to reconstruct a full OpenURL by adding information to the partial DESCRIPTION based on the DC element values.

However, in many cases, particularly where metadata is embedded into a resource dynamically based on a back-end database, the cost of duplicating information in both DC elements and the OpenURL is probably not very high. Clearly, where metadata and OpenURLs are created and maintained manually, there will be consistency implications for any duplicated information.
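
Where duplicated attributes have been removed from the DESCRIPTION, the reconstruction step mentioned above could be approached as in the Python sketch below. It is only an outline under stated assumptions: the function name fill_description and the flat dictionary keys 'title', 'date' and 'doi' are hypothetical, and the record is assumed to describe an individual item such as an article. Author attributes are deliberately not reconstructed, because splitting a single DC Creator value into family and given names is unreliable.

```python
def fill_description(partial, dc):
    """Return a copy of a partial DESCRIPTION with gaps filled from DC values.

    `partial` maps OpenURL attributes to values; `dc` is a hypothetical
    flat view of the DC record with the keys 'title', 'date' and 'doi'.
    """
    filled = dict(partial)
    # Only add an attribute when the DESCRIPTION lacks it.
    if "atitle" not in filled and dc.get("title"):
        filled["atitle"] = dc["title"]
    if "date" not in filled and dc.get("date"):
        filled["date"] = dc["date"]
    if "id" not in filled and dc.get("doi"):
        filled["id"] = "doi:" + dc["doi"]
    return filled


print(fill_description(
    {"genre": "article", "title": "D-Lib Magazine", "volume": "5",
     "artnum": "12", "aulast": "Dempsey", "aufirst": "Lorcan"},
    {"title": "International Information Gateway Collaboration: "
              "report of the first IMesh Framework Workshop",
     "date": "1999-12", "doi": "10.1045/december99-dempsey"},
))
```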

A DC/OpenURL crosswalk

The table below gives the definitions of the current OpenURL attributes:

Attribute  Value                            Description
genre      bundles:
           journal                          a journal, volume of a journal, issue of a journal
           book                             a book
           conference                       a publication bundling proceedings of a conference
           individual items:
           article                          a journal article
           preprint                         a preprint
           proceeding                       a conference proceeding
           bookitem                         an item that is part of a book
aulast                                      A string with the first author's last name
aufirst                                     A string with the first author's first name
auinit                                      A string with the first author's first and middle initials
auinit1                                     A string with the first author's first initial
auinitm                                     A string with the first author's middle initials

issn                                        An ISSN number
eissn                                       An electronic ISSN number
coden                                       A CODEN
isbn                                        An ISBN number
sici                                        A SICI of a journal article, volume or issue. Compliant with ANSI/NISO Z39.56-1996 Version 2 (see http://sunsite.berkeley.edu/SICI/)
bici                                        A BICI for a section of a book, to which an ISBN has been assigned. Compliant with http://www.niso.org/bici.html
title                                       The title of a bundle (journal, book, conference)
stitle                                      The abbreviated title of a bundle
atitle                                      The title of an individual item (article, preprint, conference proceeding, part of a book)

volume                                      The volume of a bundle
part                                        The part of a bundle
issue                                       The issue of a bundle
spage                                       The start page of an individual item in a bundle
epage                                       The end page of an individual item in a bundle
pages                                       Pages covered by an individual item in a bundle. The format of this field is 'spage-epage'
artnum                                      The number of an individual item, in cases where there are no pages available.
date       YYYY-MM-DD | YYYY-MM | YYYY      The publication date of the item or bundle encoded in the "Complete date" variant of ISO8601 (see http://www.w3.org/TR/NOTE-datetime). This format is YYYY-MM-DD where YYYY is the four-digit year, MM is the month of the year between 01 (January) and 12 (December), and DD is the day of the month between 01 and 28 or 29 or 30 or 31, depending on length of the month and whether it is a leap year.
ssn        winter | spring | summer | fall  The season of publication
quarter    1 | 2 | 3 | 4                    The quarter of publication

The table below provides a mapping from OpenURL attributes to unqualified DC elements.

genre      individual items                                bundles
           article     preprint    proceeding  bookitem    book        journal     conference
aulast     creator     creator     creator     creator     creator     -           contributor
aufirst    creator     creator     creator     creator     creator     -           contributor
auinit     creator     creator     creator     creator     creator     -           contributor
auinit1    creator     creator     creator     creator     creator     -           contributor
auinitm    creator     creator     creator     creator     creator     -           contributor
issn       X           -           X           -           -           identifier  X
eissn      X           -           X           -           -           identifier  X
coden      X           -           X           -           -           identifier  X
isbn       -           -           X           X           identifier  -           identifier
sici       identifier  -           identifier  -           -           identifier  identifier
bici       -           -           identifier  identifier  -           -           -
title      X           -           X           X           title       title       title
stitle     X           -           X           X           title       title       title
atitle     title       title       title       title       -           -           -
volume     X           -           X           X           X           X           X
part       X           -           X           X           X           X           X
issue      X           -           X           -           -           X           X
spage      X           X           X           X           -           -           -
epage      X           X           X           X           -           -           -
pages      X           X           X           X           -           -           -
artnum     X           X           X           X           -           -           -
date       date        date        date        date        date        date        date
ssn        date        date        date        date        date        date        date
quarter    date        date        date        date        date        date        date

The table shows OpenURL attributes against the genres for which they are allowed to be used. Mappings to DC elements are shown at appropriate points. An X in the table indicates that the OpenURL attribute may be used with the particular genre, but that there is no sensible DC mapping at that point.

The OpenURL 'genre' can be mapped to the DC Type element, although the list of OpenURL genres does not correspond with the list of types in the recommended DCMIType encoding scheme qualifier [8].

Note that five (author-related) OpenURL attributes are shown mapping to the DC Creator and Contributor elements. In general, several of these OpenURL attributes must be combined to form a complete DC Creator or Contributor value (for example aufirst and aulast). Depending on the formatting of a DC Creator or Contributor element value, mapping back from DC to these OpenURL attributes may be difficult because of the problems of splitting a single name into multiple components.

A richer crosswalk would be possible using qualified Dublin Core elements but this has not been presented here.
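
To make the crosswalk concrete, the Python sketch below transcribes the rows of the mapping table that have a DC equivalent and uses them to derive unqualified DC elements from an OpenURL DESCRIPTION. It is an illustration under several assumptions: the names ROWS, dc_element and to_dc are our own, only aulast and aufirst are combined into a name (in the 'family, given' form used in the examples above), and attributes such as volume or spage are omitted because the table gives them no DC mapping.

```python
GENRES = ["article", "preprint", "proceeding", "bookitem",
          "book", "journal", "conference"]

AUTHOR = "creator creator creator creator creator - contributor"
DATE = "date date date date date date date"

# One row per OpenURL attribute, one cell per genre, copied from the table.
ROWS = {
    "aulast": AUTHOR, "aufirst": AUTHOR, "auinit": AUTHOR,
    "auinit1": AUTHOR, "auinitm": AUTHOR,
    "issn": "X - X - - identifier X",
    "eissn": "X - X - - identifier X",
    "coden": "X - X - - identifier X",
    "isbn": "- - X X identifier - identifier",
    "sici": "identifier - identifier - - identifier identifier",
    "bici": "- - identifier identifier - - -",
    "title": "X - X X title title title",
    "stitle": "X - X X title title title",
    "atitle": "title title title title - - -",
    "date": DATE, "ssn": DATE, "quarter": DATE,
}


def dc_element(attribute, genre):
    """Return the DC element for an attribute and genre, or None if there is none."""
    row = ROWS.get(attribute)
    if row is None or genre not in GENRES:
        return None
    cell = row.split()[GENRES.index(genre)]
    return None if cell in ("X", "-") else cell


def to_dc(description):
    """Map a DESCRIPTION (dict of attributes) to a list of (element, value) pairs."""
    genre = description.get("genre")
    record = []
    # Combine the name parts into one Creator or Contributor value.
    last, first = description.get("aulast"), description.get("aufirst")
    element = dc_element("aulast", genre)
    if element and last:
        record.append((element, f"{last}, {first}" if first else last))
    for attribute in ("atitle", "title", "stitle", "issn", "eissn",
                      "coden", "isbn", "sici", "bici", "date"):
        element = dc_element(attribute, genre)
        if element and attribute in description:
            record.append((element, description[attribute]))
    return record


print(to_dc({"genre": "article", "aulast": "Heery", "aufirst": "Rachel",
             "atitle": "Information gateways: collaboration on content",
             "title": "Online Information Review", "issn": "1468-4527"}))
```

For the article genre the journal title and ISSN are dropped, which reflects the X cells in the table: those attributes are permitted but have no sensible unqualified DC equivalent at that point.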

OpenURL standardization and future work

A request for fast-track standardization of the OpenURL was approved by NISO during its December 2000 SCD meeting. The expectation is that "NISO's aim will be to move rapidly towards a Draft Standard for Trial Use". Work is currently underway with NISO to establish a Steering Committee to work on the standardization. However, at the time of writing no firm timescales had been established.

It is anticipated that there will be some changes to the OpenURL specification during the standardization process. The nature of the changes will be:

(The authors would like to thank Herbert Van de Sompel, Cornell University, for providing background information for this section.)

Relation to DC Citation Working Group recommendations

The DC Citation Working Group was set up in November 1998 and was responsible for identifying standard methods for including bibliographic citation information about resources in their own metadata, and related problems of identifying resource version information. The group concentrated specifically on an article's placement within a journal, volume, and issue. The group has made several proposals for qualifiers to the Dublin Core Metadata Element Set (DCMES) to achieve this aim. Specifically:

It is worth noting that the working group's proposed structured-value set can be mapped directly to available OpenURL attributes as follows:

Proposed structured value   OpenURL attribute
JournalTitleFull            title
JournalTitleAbbreviated     stitle
JournalVolume               volume
JournalIssueNumber          issue
JournalPages                spage, epage, pages

More recently the working group began discussing a related problem of how to capture bibliographic citation information about conference papers, with a view to including other bibliographic genres in the future. OpenURLs provide a way to encode citation information for books, book parts, conference proceedings and papers. However, some conference proceedings are also journal issues. In this case, to capture citation information for an article as both a conference item and a journal item, it would be necessary to include two OpenURLs within repeated DC Identifier elements.

Therefore, the OpenURL DESCRIPTION appears to offer all the functionality identified by the working group for encoding bibliographic citations for simple resource discovery, albeit using a less human-readable syntax than that proposed by the working group.   However, it may not offer the required functionality for individual Dublin Core based applications.

(The authors would like to thank Cliff Morgan, John Wiley & Sons, Ltd. (previous chair of the DC-Citation Working Group) for supplying background information for this section.)

Conclusion

The main purpose of this article has been to propose the adoption of an 'OpenURL' encoding scheme for the DC Identifier, Source and Relation elements.   By doing this, the DCMI will provide users of DC metadata with a simple method of encoding machine-readable citations for bibliographic resources within their metadata, in particular supporting a mechanism for linking between digital resources and non-digital resources.  We have also provided a crosswalk between unqualified DC and the OpenURL attributes and shown how a combination of both OpenURLs and DC metadata can be used to provide richer citations than those provided by either technology on its own.

References

  1. Dublin Core Metadata Initiative
    <http://dublincore.org/>
  2. OpenURL
    <http://www.sfxit.com/openurl/>
  3. Dublin Core Metadata Element Set (DCMES)
    <http://dublincore.org/documents/dces/>
  4. Digital Object Identifier (DOI)
    <http://www.doi.org/>
  5. Reference linking in a hybrid library environment. Part 3: Generalizing the SFX solution in the "SFX@Ghent & SFX@LANL" experiment.
    Van de Sompel, Herbert and Hochstenbach, Patrick.
    D-Lib Magazine, October 1999.
    <http://www.dlib.org/dlib/october99/van_de_sompel/10van_de_sompel.html>
  6. OpenURL Syntax Description
    <http://sfx1.exlibris-usa.com/openurl/openurl.html>
  7. Dublin Core Qualifiers
    <http://dublincore.org/documents/dcmes-qualifiers/>
  8. DCMI Type Vocabulary
    <http://dublincore.org/documents/dcmi-type-vocabulary/>

 

Author Details

 

Andy Powell
Assistant Director, Distributed Systems and Services
UKOLN: the UK Office for Library and Information Networking
University of Bath
Bath BA2 7AY, UK
E-mail: a.powell@ukoln.ac.uk

Ann Apps
Research and Development (Electronic Publishing)
MIMAS, University of Manchester
E-mail: ann.apps@man.ac.uk

[Andy is a member of the Dublin Core Advisory Committee. Ann is chair of the Dublin Core DC-Type Working Group and a member of the Dublin Core Advisory Committee. Ann is also a member of the OpenURL NISO Standards Committee.]