Web Magazine for Information Professionals

Exposing Information Resources for E-learning

Steve Richardson and Andy Powell on harvesting and searching IMS metadata using both the OAI Protocol for Metadata Harvesting and the Z39.50 protocol.

An introduction to the IMS Digital Repositories Working Group

IMS [1] is a global consortium that develops open specifications to support the delivery of e-learning through Learning Management Systems (LMS). (Note: in UK higher and further education we tend to use the term Virtual Learning Environment (VLE) in preference to LMS). IMS activities cover a broad range of areas including accessibility, competency definitions, content packaging, digital repositories, integration with ‘enterprise’ systems, learner information, metadata, question & test and simple sequencing. Of particular relevance to this article is the work of the IMS Digital Repositories Working Group (DRWG) [2].

The DRWG is working to define a set of interfaces to repositories (databases) of learning objects and/or information resources in order to support resource discovery from within an LMS. In particular, the specifications currently define mechanisms that support distributed searching of remote repositories, harvesting metadata from repositories, depositing content with repositories and delivery of content from the repository to remote systems. Future versions of the specifications will also consider alerting mechanisms, for discovering new resources that have been added to repositories.

Note that, at the time of writing, the DRWG specifications are in draft form.

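Two broad classes of repository are considered:

  1. ‘Native’ repositories of learning objects.
  2. ‘Information’ repositories of other information resources.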

In the former, it is assumed that, typically, the learning objects are described using the IMS metadata specification [3] and packaged using the IMS content packaging specification [4]. The latter includes many existing sources of information, including library OPACs, bibliographic databases and museum catalogues, where metadata schemas other than IMS are in use. In both cases it is assumed that the repository may hold both assets and metadata, or metadata only. Both of the example implementations described below fall into the second category of repository.

The DRWG specifications describe the use of XQuery [5] over SOAP [6] to query ‘native’ repositories of learning objects. This usage is not discussed any further in this article. The specifications also describe how to search and harvest IMS metadata from ‘information’ repositories using the OAI Protocol for Metadata Harvesting (OAI-PMH) [7] and Z39.50 [8].

The primary intention of the specifications is two-fold. Firstly, they support the integration of an LMS with one or more back-end learning object repositories. Secondly, they support relatively seamless discovery of resources in one or more information repositories by the end-user from within an LMS.

So, why is this important? Well, as information providers we are used to disclosing information about the resources we make available, either through our Web sites or in more structured ways using, for example, Z39.50. However, in the main, such disclosure tends to happen in the context of other information systems. Increasingly, information resources will need to be exposed for use in the context of online learning systems, and it is reasonable to expect that the primary specifications used to deliver those systems will be those being developed by IMS.

IMS metadata and the JISC Information Environment

The JISC Information Environment (JISC IE) technical architecture [9] specifies a set of standards and protocols that support the development and delivery of an integrated set of networked services that allow the end-user to discover, access, use and publish digital and physical resources as part of their learning and research activities. In the context of the JISC IE, both learning object repositories and information repositories are known as ‘content providers’, while a VLE (or LMS, to use IMS terminology) is known as a ‘presentation service’, because it is primarily involved in interacting with the end-user.

It is interesting to note that two of the key technologies endorsed by the JISC IE, Z39.50 (to support distributed searching) and the OAI-PMH (for metadata harvesting), are the same technologies specified by the DRWG. The difference between the two approaches is that the JISC IE uses these protocols to exchange simple Dublin Core (DC) [10] metadata records, while the DRWG uses them to exchange IMS metadata records.

As we show below, content providers that already support Z39.50 or OAI-PMH to expose simple DC metadata records probably don’t have to do too much work to make IMS metadata records available.

Case study 1: Integrating IMS into the RESULTs OAI repository

The RESULTs Learning Technology Portal [11] is a project funded by JISC and is intended to be a dynamic Web portal for learning technologists in the sense that it will provide multiple views to multiple types of resources for multiple types of users and aspects of practice. The portal accommodates resource browsing, search, collating, resource categorisation, submission and editing, interactive activities and discussion networks.

RESULTs is a metadata repository and does not host actual resources. Typically resources reside on other servers and the URLs for those resources are stored in RESULTs as part of enriched metadata records.

Careful attention has been paid to interoperability standards throughout the development of RESULTs and support for both Dublin Core and IMS metadata formats has been integrated into the underlying relational database structure.

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a transport protocol that oversees the transfer of any metadata from one computer, acting as the data provider (or repository), to another computer, acting as the service provider (or harvester). A harvester can make requests for information about the repository or for an individual record or groups of records that may be restricted by date or by other predefined groupings.

The requests of interest here are requests for records, and in particular the type of metadata that is returned. The default metadata schema for records in OAI-PMH is simple Dublin Core (oai_dc). All repositories must support the oai_dc record format, but there are no restrictions on the other metadata formats that can be offered. It is equally valid to use IMS metadata, or indeed any other metadata standard, provided that it can be encoded using XML.

OAI-PMH uses HTTP to encode requests and XML to encode responses. A typical request for a single record with the identifier 568 from the RESULTs OAI Repository in DC format looks like this:

http://www.results.ac.uk/phpoai/oai2.php?verb=GetRecord&identifier=oai:uk.ac.results:568&metadataPrefix=oai_dc

Breaking this into its constituent parts, there is a repository gateway that handles all OAI requests:

http://www.results.ac.uk/phpoai/oai2.php

An instruction, or verb, that tells the repository what to do:

verb=GetRecord

An identifier so that the Repository knows which record is being requested:

identifier=oai:uk.ac.results:568

And finally the metadata specification, in this case asking for ‘simple’ DC metadata:

metadataPrefix=oai_dc
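The same breakdown can be reproduced programmatically. The following is a minimal sketch in Python, using only the standard library; the function name split_oai_request is our own invention for illustration and is not part of any OAI toolkit.

```python
from urllib.parse import urlsplit, parse_qs

# The GetRecord request shown above.
url = ("http://www.results.ac.uk/phpoai/oai2.php"
       "?verb=GetRecord&identifier=oai:uk.ac.results:568&metadataPrefix=oai_dc")


def split_oai_request(request_url):
    """Split an OAI-PMH GET request into (base URL, argument dictionary).

    Hypothetical helper, for illustration only.
    """
    parts = urlsplit(request_url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs returns a list of values per argument; the arguments in this
    # request each occur once, so we keep only the first value.
    arguments = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return base, arguments


base_url, args = split_oai_request(url)
print(base_url)
for name, value in args.items():
    print(f"{name}={value}")
```

Note that the argument names are case-sensitive, so they should be written exactly as they appear in the request URL.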

The response is coded in XML, which is simplified into a schematic view below:

<OAI-PMH>
 <responseDate>2002-11-05T14:11:52Z</responseDate>
 <request verb="GetRecord" identifier="oai:uk.ac.results:568"
  metadataPrefix="oai_dc">http://www.results.ac.uk/phpoai/oai2.php</request>
 <GetRecord>
  <record>
   <header>
    <identifier>oai:uk.ac.results:568</identifier>
    <datestamp>2002-03-19</datestamp>
   </header>
   <metadata>
    OAI Dublin Core Metadata Record
   </metadata>
  </record>
 </GetRecord>
</OAI-PMH>

Two aspects of this response are of interest here. One is the ‘metadataPrefix’ attribute of the request element, which simply says that the encoding of the returned metadata is ‘oai_dc’ (technically, this means that the returned metadata must conform to an XML schema for simple DC, as defined by OAI). The second is the actual metadata itself, in this case an ‘oai_dc’ metadata record, the details of which are not shown for simplicity.
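A harvester might pick these two pieces of information out of a response along the following lines. This is a sketch only, in Python: the sample response mirrors the schematic view above, with the OAI-PMH 2.0 XML namespace added on the assumption that the repository implements version 2.0 of the protocol (the schematic omits namespaces), and the function name summarise is hypothetical.

```python
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 namespace; an assumption, since the schematic view omits it.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <responseDate>2002-11-05T14:11:52Z</responseDate>
 <request verb="GetRecord" identifier="oai:uk.ac.results:568"
  metadataPrefix="oai_dc">http://www.results.ac.uk/phpoai/oai2.php</request>
 <GetRecord>
  <record>
   <header>
    <identifier>oai:uk.ac.results:568</identifier>
    <datestamp>2002-03-19</datestamp>
   </header>
   <metadata><!-- metadata record omitted, as in the schematic --></metadata>
  </record>
 </GetRecord>
</OAI-PMH>"""


def summarise(response_xml):
    """Return the metadata prefix, identifier and datestamp of a GetRecord response."""
    root = ET.fromstring(response_xml)
    request = root.find("oai:request", OAI_NS)
    header = root.find("oai:GetRecord/oai:record/oai:header", OAI_NS)
    return {
        # XML attributes are not namespace-qualified here, so a plain name works.
        "metadataPrefix": request.get("metadataPrefix"),
        "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
    }


summary = summarise(SAMPLE_RESPONSE)
print(summary)
```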

A similar request for the same record but in IMS metadata format would look like this:

http://www.results.ac.uk/phpoai/oai2.php?verb=GetRecord&identifier=oai:uk.ac.results:568&metadataPrefix=ims

The only difference in the request is the value of the ‘metadataPrefix’ argument. As can be seen below, the only things that change in the response are the prefix and the actual metadata itself. The same principle can be applied to any metadata schema, i.e. OAI can transport any metadata, provided it can be encoded using XML and the encoding can be described using XML schema.

<OAI-PMH>
 <responseDate>2002-11-05T14:11:52Z</responseDate>
 <request verb="GetRecord" identifier="oai:uk.ac.results:568"
  metadataPrefix="ims">http://www.results.ac.uk/phpoai/oai2.php</request>
 <GetRecord>
  <record>
   <header>
    <identifier>oai:uk.ac.results:568</identifier>
    <datestamp>2002-03-19</datestamp>
   </header>
   <metadata>
    IMS Metadata Record
   </metadata>
  </record>
 </GetRecord>
</OAI-PMH>

Implementation Details

The OAI Web site has a selection of tools implemented by members of the OAI Community [12]. The RESULTs server is running MySQL and PHP as the database and main programming environment and there is a PHP OAI Repository tool kit [13] available for download from the OAI site. Integrating the code was simply a case of copying the files onto the server and editing the configuration script to reflect the RESULTs specific information.

The code provided only supports flat database tables, where all the information about a record is stored in one table. Because RESULTs has a relational database structure, some additions had to be made to the code to resolve reference numbers into actual values.

Integrating IMS

The code provided only supported ‘oai_dc’ metadata, but the author did have the foresight to provide a mechanism by which other metadata formats could be easily integrated. Only two things were required:

  1. Update the configuration file to support the IMS record format.
  2. Write a script that takes a database record, resolves any foreign keys into actual values and wraps the data in IMS compliant format.

By writing an IMS metadata template and then adding IMS as a metadata type in the configuration file, along with information on how to find the IMS template, the Repository now handles requests for IMS data equally well.
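The general shape of this mechanism can be sketched as follows. The sketch is written in Python rather than PHP and is not the toolkit's actual code: the lookup table, field names, function names and the heavily simplified XML fragments are all hypothetical. It only illustrates the two steps, namely resolving a foreign key into a value and registering one template per metadata format.

```python
from xml.sax.saxutils import escape

# Hypothetical lookup table, standing in for a related database table.
RESOURCE_TYPES = {1: "Case study", 2: "Tutorial"}


def resolve(row):
    """Return a copy of the row with the foreign key replaced by its value."""
    resolved = dict(row)
    resolved["type"] = RESOURCE_TYPES[resolved.pop("type_id")]
    return resolved


def dc_template(rec):
    # Simplified fragment, not a complete oai_dc record.
    return (f"<dc:title>{escape(rec['title'])}</dc:title>"
            f"<dc:type>{escape(rec['type'])}</dc:type>")


def ims_template(rec):
    # Simplified fragment, not a complete IMS metadata record.
    return ("<general><title><langstring>"
            f"{escape(rec['title'])}</langstring></title></general>"
            "<educational><learningresourcetype><value><langstring>"
            f"{escape(rec['type'])}"
            "</langstring></value></learningresourcetype></educational>")


# The 'configuration': supporting a new format means adding one entry here
# and writing one template function.
METADATA_FORMATS = {"oai_dc": dc_template, "ims": ims_template}


def render(row, metadata_prefix):
    """Render a database row in the requested metadata format."""
    try:
        template = METADATA_FORMATS[metadata_prefix]
    except KeyError:
        # cannotDisseminateFormat is the OAI-PMH error code for this case.
        raise ValueError("cannotDisseminateFormat") from None
    return template(resolve(row))


row = {"title": "Learning & teaching online", "type_id": 2}
print(render(row, "oai_dc"))
print(render(row, "ims"))
```

Keeping the format-specific logic in a template per format means the request handling code does not need to change when a new format is added.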

A demonstration of this functionality is available on the RESULTs site and the following URLs will demonstrate the services provided:

http://www.results.ac.uk/phpoai/oai2.php?verb=Identify

http://www.results.ac.uk/phpoai/oai2.php?verb=GetRecord&identifier=oai:uk.ac.results:568&metadataPrefix=oai_dc

http://www.results.ac.uk/phpoai/oai2.php?verb=ListRecords&metadataPrefix=oai_dc

http://www.results.ac.uk/phpoai/oai2.php?verb=ListRecords&from=2002-07-06&metadataPrefix=oai_dc

http://www.results.ac.uk/phpoai/oai2.php?verb=ListRecords&from=2002-07-06&until=2002-10-11&metadataPrefix=oai_dc

All the above examples that request records will work equally well for IMS records by simply changing the metadataPrefix to ‘ims’, like so:

http://www.results.ac.uk/phpoai/oai2.php?verb=GetRecord&identifier=oai:uk.ac.results:568&metadataPrefix=ims
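A harvester would normally build such requests rather than type them by hand. The following Python sketch constructs ListRecords requests with optional date restrictions for either metadata format; the function name list_records_url is our own, chosen for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.results.ac.uk/phpoai/oai2.php"


def list_records_url(metadata_prefix, from_date=None, until_date=None):
    """Build a ListRecords request, optionally restricted by date (YYYY-MM-DD)."""
    arguments = {"verb": "ListRecords"}
    if from_date:
        arguments["from"] = from_date
    if until_date:
        arguments["until"] = until_date
    arguments["metadataPrefix"] = metadata_prefix
    return f"{BASE_URL}?{urlencode(arguments)}"


# The same selective harvest, once per metadata format.
dc_url = list_records_url("oai_dc", "2002-07-06", "2002-10-11")
ims_url = list_records_url("ims", "2002-07-06", "2002-10-11")
print(dc_url)
print(ims_url)
```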

Note: the RESULTs repository does not support OAI-PMH sets, as it is still unclear what actually constitutes a set in RESULTs.

Case study 2: Integrating IMS into the RDN Z39.50 target

The Resource Discovery Network (RDN) [14] is a national service funded by JISC to provide access to high quality Internet resources for the UK higher and further education communities. The RDN is a cooperative network of subject ‘hubs’, including ALTIS (hospitality, leisure, sport and tourism), BIOME (health, medicine and life sciences), EEVL (engineering, mathematics and computing), HUMBUL (humanities), PSIgate (physical sciences) and SOSIG (social science, business and law). Each hub provides access to one or more Internet resource catalogues, containing descriptions of high quality Internet sites, selected and described by specialists from within UK academia and affiliated organisations. Value-added services such as interactive Web tutorials and alerting services are also provided to enable users to make more of their time on the Internet.

The resource descriptions available in each of the hub catalogues are gathered into a central database of all RDN records, known as the RDN ResourceFinder. The OAI-PMH is used to gather the records together. Currently, the default simple DC record format is used to share records, though there are plans to exchange richer metadata records based on qualified DC.

Various interfaces to ResourceFinder are made available [15] including a Z39.50 target that complies with functional area C of the Bath Profile. The database technology used to deliver ResourceFinder is Cheshire [16], an open source XML-based information retrieval tool. A Cheshire configuration file defines the search attributes that ResourceFinder supports, and record conversion ‘output filters’ written in Perl convert the internal XML record syntax stored in the Cheshire database to Bath Profile compliant XML and SUTRS (unstructured text) record syntaxes for delivery as search results. (In Z39.50 terminology, ‘XML’ and ‘SUTRS’ are known as Record Formats).

In order to modify the ResourceFinder Z39.50 target to support the draft DRWG specification, we needed to do three things:

  1. Decide on an Element Set Name for our new IMS metadata XML record syntax. In this case we chose ‘IMS’ as the name.
  2. Write a new Perl output filter to convert the internal XML record syntax stored in Cheshire to an IMS-compliant XML record syntax.
  3. Modify the Cheshire configuration file to associate the new output filter with the ‘IMS’ Element Set Name.
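The second step, the output filter, is essentially a record conversion. The real filter is written in Perl and works on the internal Cheshire XML; the Python sketch below is only an illustration of the kind of mapping involved. The input dictionary, its field names and the function names are hypothetical, and the description is abbreviated. The element names and namespace follow the IMS record shown in the session transcript later in this section.

```python
import xml.etree.ElementTree as ET

IMS_NS = "http://www.imsglobal.org/xsd/imsmd_v1p2"


def q(tag):
    """Qualify a tag name with the IMS metadata namespace."""
    return f"{{{IMS_NS}}}{tag}"


def add_langstring(parent, tag, text):
    """Append <tag><langstring>text</langstring></tag> to parent."""
    wrapper = ET.SubElement(parent, q(tag))
    ET.SubElement(wrapper, q("langstring")).text = text
    return wrapper


def to_ims(record):
    """Convert a simple DC-like dictionary into an IMS <lom> XML string."""
    ET.register_namespace("", IMS_NS)  # serialise IMS as the default namespace
    lom = ET.Element(q("lom"))
    general = ET.SubElement(lom, q("general"))
    ET.SubElement(general, q("identifier")).text = record["identifier"]
    add_langstring(general, "title", record["title"])
    add_langstring(general, "description", record["description"])
    for subject in record.get("subjects", []):
        add_langstring(general, "keyword", subject)  # DC subject -> IMS keyword
    technical = ET.SubElement(lom, q("technical"))
    ET.SubElement(technical, q("location")).text = record["location"]
    return ET.tostring(lom, encoding="unicode")


sample = {
    "identifier": "oai:rdn:nmap:4072654",
    "title": "Identification of MRSA reservoirs in the acute care setting",
    "description": "One of a series of reviews related to MRSA infection control practices.",
    "subjects": ["Bacterial Infections", "Antibiotics"],
    "location": "http://www.joannabriggs.edu.au/EXMRSAident.pdf",
}
ims_xml = to_ims(sample)
print(ims_xml)
```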

The results of this work can be seen in the live ResourceFinder Z39.50 target (z3950.rdn.ac.uk, port 210). It is worth noting that the work done so far is not fully compliant with the DRWG specifications, partly because they are still undergoing development. For example, the DRWG specifications define a large number of IMS-specific search attributes (the attributes upon which searches can be based). The ResourceFinder target does not currently support any of these; it only supports the DC search attributes required for Functional Area C of the Bath Profile. Furthermore, the ‘IMS’ Element Set Name that we use is not part of the current IMS specifications.

To demonstrate the results of this work, here is an annotated transcript of a Z39.50 session using the UNIX linemode Z39.50 client, yaz-client [17], to search the ResourceFinder database:

$ yaz-client z3950.rdn.ac.uk:210
Connecting...Ok.
Sent initrequest.
Connection accepted by target.
ID : 2001
Name : Cheshire II zServer - XRDN - RDN ResourceFinder
Version: 2.33
Options: search present delSet resourceCtrl accessCtrl scan sort
Elapsed: 0.158460

…run yaz-client and connect to the ResourceFinder target…

Z> base xxdefault

…set the database name to ‘xxdefault’…

Z> find MRSA
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 5, setno 1
records returned: 0
Elapsed: 0.010991

…search for ‘MRSA’ (the hospital superbug which is passed through poor hygiene)…

Z> format SUTRS
Z> elements F
Z> show 1
Sent presentRequest (1+1).
Records: 1
[xxdefault]Record type: SUTRS
Title: Identification of MRSA reservoirs in the acute care setting: a systematic review - executive summary
Description: One of a series of reviews related to MRSA infection control practices written by Rhonda Griffiths et al. The purpose of this review is to present the best available evidence regarding “the design of clinical areas and the role of the inanimate objects commonly found there, in the transmission of MRSA in the acute hospital setting.” This executive summary discusses the scope of the review; inclusion and exclusion criteria; search strategies used; quality assessment; data collection and analysis; and includes a brief summary of the results and their implications for practice. Published in 2002 by the Joanna Briggs Institute for Evidence Based Nursing and Midwifery, and available on the Web in PDF (requires Adobe Acrobat Reader). The full-text version is available to members only.
Identifier: http://www.rdn.ac.uk/record/redirect/oai:rdn:nmap:4072654
Identifier: http://www.rdn.ac.uk/record/redirect/?url=http%3A%2F%2Fwww... joannabriggs.edu.au%2FEXMRSAident.pdf
Type: Document/report / Systematic review
Subject: Staphylococcal Infections / transmission
Subject: Bacterial Infections
Subject: Antibiotics
Subject: Literature Reviews
Subject: Review Literature [Publication Type]
Subject: Methicillin Resistance
Subject: Nosocomial Infection
Subject: Cross Infection
Note: This metadata record is copyright an RDN partner. Personal and educational use is allowed. All other use prohibited without permission. http://www.rdn.ac.uk/copyright/
nextResultSetPosition = 2
Elapsed: 0.611498

…set the record format to ‘SUTRS’, the element set to ‘F’ (full) and return the first result…

Z> format XML
Z> elements F
Z> show 1
Sent presentRequest (1+1).
Records: 1
[xxdefault]Record type: XML
<?xml version="1.0"?>
<record-list>
 <dc-record>
  <title>
   Identification of MRSA reservoirs in the acute care setting : a systematic review - executive summary
  </title>
  <description>
   One of a series of reviews related to MRSA infection control practices written by Rhonda Griffiths et al. The purpose of this review is to present the best available evidence regarding “the design of clinical areas and the role of the inanimate objects commonly found there, in the transmission of MRSA in the acute hospital setting.” This executive summary discusses the scope of the review; inclusion and exclusion criteria; search strategies used; quality assessment; data collection and analysis; and includes a brief summary of the results and their implications for practice. Published in 2002 by the Joanna Briggs Institute for Evidence Based Nursing and Midwifery, and available on the Web in PDF (requires Adobe Acrobat Reader). The full-text version is available to members only.
  </description>
  <identifier>http://www.rdn.ac.uk/record/redirect/oai:rdn:nmap:4072654</identifier>
  <identifier>http://www.rdn.ac.uk/record/redirect/?url=http%3A%2F%2Fwww... joannabriggs.edu.au%2FEXMRSAident.pdf</identifier>
  <subject>Staphylococcal Infections / transmission</subject>
  <subject>Bacterial Infections</subject>
  <subject>Antibiotics</subject>
  <subject>Literature Reviews</subject>
  <subject>Review Literature [Publication Type]</subject>
  <subject>Methicillin Resistance</subject>
  <subject>Nosocomial Infection</subject>
  <subject>Cross Infection</subject>
  <type> Document/report / Systematic review </type>
 </dc-record>
</record-list>
nextResultSetPosition = 2
Elapsed: 0.574363

…set the record format to ‘XML’, the element set name to ‘F’ (full) and re-display the first result…

Z> format XML
Z> elements IMS
Z> show 1
Sent presentRequest (1+1).
Records: 1
[xxdefault]Record type: XML
<?xml version="1.0" encoding="UTF-8"?>
<lom xmlns="http://www.imsglobal.org/xsd/imsmd_v1p2"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.imsglobal.org/xsd/imsmd_v1p2 http://www.imsglobal.org/xsd/imsmd_v1p2p2.xsd">
 <general>
  <identifier> oai:rdn:nmap:4072654 </identifier>
  <title>
   <langstring>
    Identification of MRSA reservoirs in the acute care setting : a systematic review - executive summary
   </langstring>
  </title>
  <language> eng </language>
  <description>
   <langstring xml:lang="en-GB">
    One of a series of reviews related to MRSA infection control practices written by Rhonda Griffiths et al. The purpose of this review is to present the best available evidence regarding “the design of clinical areas and the role of the inanimate objects commonly found there, in the transmission of MRSA in the acute hospital setting.” This executive summary discusses the scope of the review; inclusion and exclusion criteria; search strategies used; quality assessment; data collection and analysis; and includes a brief summary of the results and their implications for practice. Published in 2002 by the Joanna Briggs Institute for Evidence Based Nursing and Midwifery, and available on the Web in PDF (requires Adobe Acrobat Reader). The full-text version is available to members only.
   </langstring>
  </description>
  <keyword><langstring>Staphylococcal Infections / transmission</langstring></keyword>
  <keyword><langstring>Bacterial Infections</langstring></keyword>
  <keyword><langstring>Antibiotics</langstring></keyword>
  <keyword><langstring>Literature Reviews</langstring></keyword>
  <keyword><langstring>Review Literature [Publication Type]</langstring></keyword>
  <keyword><langstring>Methicillin Resistance</langstring></keyword>
  <keyword><langstring>Nosocomial Infection</langstring></keyword>
  <keyword><langstring>Cross Infection</langstring></keyword>
 </general>
 <metametadata>
  <contribute>
   <role>
    <source><langstring>RDN</langstring></source>
    <value><langstring>Creator</langstring></value>
   </role>
   <centity>
    <vcard>
BEGIN:VCARD
ORG:
END:VCARD
    </vcard>
   </centity>
  </contribute>
  <metadatascheme> IMS Metadata 1.2 </metadatascheme>
 </metametadata>
 <technical>
  <location> http://www.joannabriggs.edu.au/EXMRSAident.pdf </location>
 </technical>
 <educational>
  <learningresourcetype>
   <source><langstring>RDN</langstring></source>
   <value><langstring> Document/report / Systematic review </langstring></value>
  </learningresourcetype>
 </educational>
</lom>
nextResultSetPosition = 2
Elapsed: 0.918393

…set the record format to ‘XML’, the element set name to ‘IMS’ and display the first result a third time…

Z> quit

…quit.

Note: support for the use of Z39.50 to expose IMS metadata records by the RDN ResourceFinder should be seen as purely experimental at the time of writing.

Conclusions

The purpose of this article has been to raise awareness of the work of IMS in the area of providing access to learning object and information repositories, and to show that implementing these specifications for existing systems may not be an overly difficult task. However, while we are confident that the use of OAI-PMH described here will form a sensible basis for interoperability between different systems, the draft nature of the DRWG specifications probably means that it is a little early to be spending significant effort on supporting IMS metadata in Z39.50.

References

  1. IMS
    <http://www.imsglobal.org/>
  2. IMS Digital Repositories Working Group
    <http://www.imsglobal.org/digitalrepositories/>
  3. IMS Learning Resource Meta-data Specification
    <http://www.imsglobal.org/metadata/>
  4. IMS Content Packaging Specification
    <http://www.imsglobal.org/content/packaging/>
  5. XML Query
    <http://www.w3.org/XML/Query>
  6. Simple Object Access Protocol (SOAP)
    <http://www.w3.org/2000/xp/Group/>
  7. OAI Protocol for Metadata Harvesting
    <http://www.openarchives.org/>
  8. Z39.50
    <http://lcweb.loc.gov/z3950/agency/>
  9. JISC Information Environment Technical Architecture
    <http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/>
  10. Dublin Core Metadata Initiative
    <http://dublincore.org/>
  11. RESULTs
    <http://www.results.ac.uk/>
  12. OAI-PMH Tools
    <http://www.openarchives.org/tools/>
  13. PHP OAI Data Provider, University of Oldenburg
    <http://physnet.uni-oldenburg.de/oai/>
  14. Resource Discovery Network (RDN)
    <http://www.rdn.ac.uk/>
  15. Working with the RDN
    <http://www.rdn.ac.uk/publications/workingwithrdn/>
  16. CHESHIRE
    <http://cheshire.lib.berkeley.edu/>
  17. YAZ
    <http://www.indexdata.dk/yaz/>

Authors

Andy Powell
UKOLN, University of Bath
a.powell@ukoln.ac.uk

Steven Richardson
UMIST
s.richardson@umist.ac.uk