Web Magazine for Information Professionals

Metadata Corner: CrossROADS and Interoperability

Michael Day, Rachel Heery and Andy Powell report on work in progress on enhancements to the ROADS software.

The third phase of the ROADS eLib project [1] puts service interoperability at centre stage. The project, which provides software to a number of subject services within eLib and beyond, is now working in an environment where interoperability is a requirement. In one ‘strand’ of the project we are investigating a variety of tools and protocols that might contribute inter-working functionality to the ROADS ‘toolkit’. This account gives some brief context for this work and a sketch of work in progress.

ROADS and its environment

For services participating in an ‘inter-working environment’, exploration of the business and organisational models on which future services will be based is critical. These models will need to accommodate the strategies of the information service providers, strategies that may, at present, be unaligned. Even more important, interoperable services will need to fulfil the requirements of information users, requirements that have not yet been articulated at a detailed level. It is, for example, unclear which services users wish to be integrated in a ‘federated service’ and which should remain individualised and targeted at particular audiences.

However, these wider discussions are for another forum. Within ROADS, as with a number of other eLib projects, one of our objectives is to investigate and demonstrate ways in which the variety of existing and emerging technologies can be used to enable services to work together. We are looking in particular at interoperability between the subject services using our software, but we are also developing means by which the ROADS services can inter-work with services based on other software. We hope that a number of the issues raised and the solutions developed will be of interest not only to ROADS users but also to a wider audience.

What do we mean by interoperability?

Interoperability can be viewed as existing at a number of levels. It can be investigated from a variety of viewpoints. The MODELS project [2] has been the focus for significant activity and discussion in relation to interoperability of service provision within the UK. MODELS seeks to provide a framework for the management of access to distributed resources and services. It proposes a system architecture for integrating the different stages of the information gathering process. From the perspective of systems architecture interoperability can be seen in terms of processes, for example:

Another approach might be to consider interoperability from a management perspective. A valuable analysis might be made in terms of:

From the user’s viewpoint, interoperability can be more or less effective depending on how closely the various services are integrated as regards semantics, query language, indexing and management of results. For the user of a typical library OPAC, various assumptions are implicit about the quality of the records, the matching of query language to indexing policy and the precision of the retrieval process. (These assumptions may be more or less correct, but they represent a shared view of the ‘information space’ and its navigational limitations.)

A useful approach is suggested by Clifford Lynch and Hector Garcia-Molina in their report of the 1995 IITA Digital Libraries workshop where they outline a continuum of levels of interoperability. These different levels are characterised by:

‘… provide a superficial uniformity for navigation and access but rely almost entirely on human intelligence to provide any coherence of content’ [3]
‘… (the interchange of metadata and the use of digital object transmission protocols and formats based on this metadata rather than simply common navigation, query, and viewing interfaces) as a means of providing limited coherence of content, supplemented by human interpretation.’ [3]
‘… to access, consistently and coherently, similar (though autonomously defined and managed) classes of digital objects and services, distributed across heterogeneous repositories, with federating or mediating software compensating for site-by-site variations.’ [3]

It would be an interesting exercise to place proposed services on this continuum, and to consider how explicit this positioning needs to be in order to allow the searcher to navigate their ‘information space’ effectively.

Interoperability work within the ROADS project

Within ROADS we are considering a number of ways in which services can begin to work together. To illustrate the possibilities we are making ‘demonstrators’ available from the web sites of the project partners. We are also incorporating inter-working functionality into the next release of the ROADS software (currently in beta testing). We hope our implementation experience can feed into other activities.

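The demonstrators fall into three broad areas: searching across multiple ROADS services, interoperability between protocols, and integrating other databases with ROADS.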

In addition we are looking at providing a framework to support common approaches amongst inter-working ROADS services. Our work in this area has focused on producing guidelines for the usage and content of ROADS templates. We have worked with ROADS services to produce Cataloguing Guidelines and a Template Registry to inform and promote standard practice.

Searching across multiple ROADS services

ROADS provides a set of tools to manage Internet resource descriptions. These descriptions are based on ROADS templates and are searchable using Whois++ [4], a simple directory service protocol. Whois++ allows co-operating services to inter-work with each other by forming a ‘mesh’ of servers. A server in the mesh that is unable to satisfy a particular query may route the query on to another server that it ‘knows’ holds the necessary information. This process is known as ‘query routing’ based on ‘forward knowledge’ and is described in some detail by Kirriemuir et al. [5].
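The routing idea can be illustrated with a small sketch. This is not the ROADS or Whois++ implementation: the `Server` class, its methods and the word-level summaries are hypothetical simplifications, and the exchange happens in memory rather than over the Whois++ protocol. It assumes only that each server holds a summary of the terms indexed by its peers and consults that summary when it cannot answer a query itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical mesh node: local records plus forward knowledge of peers."""
    name: str
    records: dict[str, str]  # handle -> description text
    forward_knowledge: dict[str, set[str]] = field(default_factory=dict)

    def summary(self) -> set[str]:
        """The set of words this server indexes, as offered to other servers."""
        return {w.lower() for text in self.records.values() for w in text.split()}

    def learn_about(self, peer: Server) -> None:
        """Store a peer's summary as forward knowledge."""
        self.forward_knowledge[peer.name] = peer.summary()

    def search_local(self, term: str) -> list[str]:
        """Handles of local records containing the term as a whole word."""
        term = term.lower()
        return sorted(handle for handle, text in self.records.items()
                      if term in {w.lower() for w in text.split()})

    def route(self, term: str) -> list[str]:
        """Names of peers whose summary says they hold the term."""
        term = term.lower()
        return sorted(name for name, terms in self.forward_knowledge.items()
                      if term in terms)


def search(entry: Server, mesh: dict[str, Server], term: str) -> dict[str, list[str]]:
    """Answer locally if possible; otherwise route the query to known peers."""
    hits = entry.search_local(term)
    if hits:
        return {entry.name: hits}
    return {name: mesh[name].search_local(term) for name in entry.route(term)}
```

With a medicine server that has learned about an engineering server, a query for a term held only by the engineering server is answered from there, while a term that no summary mentions returns nothing without any peer being contacted.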

There are several ROADS-based subject services covering a number of subject areas. Some of these services might be usefully grouped to allow them to be searched together. Initial demonstrators of such grouping in the areas of medicine and engineering are under development [6].

Interoperability between protocols

In the current information landscape, services are made available using a variety of protocols. To search across multiple services that are made available using different protocols, gateways are required. The ROADS project has already developed a simple Z39.50 to Whois++ gateway known as ZEXI [7]. This gateway allows the user of a Z39.50 client to search a ROADS-based service. In this way ROADS services can be integrated into a Z39.50 environment.

ZEXI builds on Isite [8], a freely available software package for building full-text indexes that includes a Z39.50 front end. Isite provides a simple back-end interface (API) to integrate existing external databases. ZEXI provides a Z39.50 to Whois++ gateway by implementing a simple Whois++ client as a back end. However, limitations in Isite mean that only unstructured plain text records (SUTRS) can be returned in the result set to the Z39.50 client.

We hope to enhance Isite and the current ZEXI so that the gateway can return structured USMARC or GRS-1 records to the Z39.50 client. This will involve enhancing the Isite back-end API and mapping ROADS/Whois++ records to either USMARC or GRS-1 records.

Integrating other databases with ROADS

The ROADS Whois++ server supports a back end API [9] that allows an arbitrary database to be used to store resource descriptions.

The NewsAgent eLib project [10] is developing a current awareness service for library and information staff based on a repository of resource descriptions. This information is stored in an Oracle-based database developed by Fretwell-Downing Informatics Ltd. An experimental back end script that allows the ROADS Whois++ server to be placed in front of the NewsAgent database has been developed. This allows us to integrate the NewsAgent database into a ROADS environment. A demonstrator is available [11].

Usage and content of ROADS templates

Resource descriptions in ROADS-based services are stored using ROADS templates. Different templates are defined for different resource-types, e.g. for documents, services or projects. Data-elements are defined as simple attribute-value pairs. The ROADS system has been designed to be configurable and it is relatively easy to create new data-elements and new template-types for ROADS templates. The unlimited creation of new data-elements or template-types is likely, however, to have a detrimental impact on interoperability. To help solve this, a metadata registry for ROADS templates has been set up. The ROADS template registry [12] provides a human-readable list of all ROADS template-types and the associated data-elements that are in use. Each element is briefly defined. Users, and potential users, of ROADS can therefore find out which particular elements are being used currently and help avoid the unnecessary proliferation of service-specific template-types or data-elements. Users who do need to create new template-types or data-elements can add these to the registry.
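As an illustration of how a registry might be used to spot service-specific elements, the following sketch parses a template written as attribute-value pairs and lists any elements that are not registered for its template-type. The line-per-pair `Name: value` layout, the `Template-Type` element, the `REGISTRY` contents and both function names are assumptions made for this example; the actual registry [12] is a human-readable list, not a programming interface.

```python
def parse_template(text: str) -> list[tuple[str, str]]:
    """Split 'Name: value' lines into (name, value) pairs, skipping blank lines."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"not an attribute-value pair: {line!r}")
        pairs.append((name.strip(), value.strip()))
    return pairs


# Hypothetical registry: template-type -> registered data-elements.
REGISTRY = {
    "DOCUMENT": {"Handle", "Title", "URI", "Description", "Keywords"},
}


def unregistered_elements(pairs: list[tuple[str, str]],
                          registry: dict[str, set[str]]) -> list[str]:
    """Return the data-elements in a template that the registry does not list."""
    template_type = dict(pairs).get("Template-Type")
    if template_type not in registry:
        raise KeyError(f"unregistered template-type: {template_type!r}")
    allowed = registry[template_type] | {"Template-Type"}
    return [name for name, _ in pairs if name not in allowed]
```

A service could run a check of this kind before adding records, and then decide whether an unlisted element should be renamed to a registered one or added to the registry.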

A metadata registry cannot, however, ensure that the metadata content of one subject service is interoperable with that of another. For example, dates and names may be stored in a wide variety of formats. The traditional way of dealing with this problem in the print context is to develop cataloguing rules. The ROADS project has developed, in co-operation with ROADS-based services, some generic ROADS cataloguing guidelines [13] to help define how services should deal with the content of templates. The guidelines make particular suggestions on the format of dates, languages and names and also advise on the use of mandatory fields. The development of cataloguing guidelines for ROADS can also have a positive impact on wider interoperability because the guidelines can, wherever possible, maintain consistency with legacy resource-description standards such as the International Standard Bibliographic Description (ISBD) or the Anglo-American Cataloguing Rules (AACR). These standards are being revised (or soon will be) so that they can describe electronic resources of all types.
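To show why an agreed date format matters for cross-searching, here is a minimal sketch that converts a few common date spellings into one form. The choice of ISO 8601 (YYYY-MM-DD) as the target, the list of accepted input formats and the day-first reading of numeric dates are all assumptions for this example; the format actually recommended is set out in the cataloguing guidelines [13]. Month names are matched in English under Python's default locale.

```python
from datetime import datetime

# Input spellings this sketch accepts (an illustrative, not exhaustive, list).
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%B %d, %Y")


def normalise_date(raw: str) -> str:
    """Return the date as YYYY-MM-DD, or raise ValueError if it is not recognised."""
    cleaned = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next spelling
    raise ValueError(f"unrecognised date format: {raw!r}")
```

Once every service stores dates in the same form, a single date query can be compared against records from any of them without per-service conversion.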

References

  1. The ROADS Project
    http://www.ilrt.bris.ac.uk/roads/
  2. The MODELS Project
    http://www.ukoln.ac.uk/dlis/models/
  3. Clifford Lynch and Hector Garcia-Molina. Interoperability, Scaling, and the Digital Libraries Research Agenda: A Report on the May 18-19, 1995 IITA Digital Libraries Workshop. August 22, 1995
    http://www-diglib.stanford.edu/diglib/pub/reports/iita-dlw/main.html
  4. Patrik Faltstrom, Sima Newell, Leslie L. Daigle. Architecture of the Whois++ service.
    http://ds.internic.net/internet-drafts/draft-ietf-asid-whoispp-01.txt
  5. John Kirriemuir, Dan Brickley, Susan Welsh, Jon Knight, Martin Hamilton. Cross-searching subject gateways: the query routing and forward knowledge approach. D-Lib Magazine, January 1998.
    http://mirrored.ukoln.ac.uk/lis-journals/dlib/dlib/dlib/january98/01kirriemuir.html
    http://www.dlib.org/dlib/january98/01kirriemuir.html
    http://sunsite.anu.edu.au/mirrors/dlib/dlib/january98/01kirriemuir.html
  6. CrossROADS demonstrators
    http://www.ukoln.ac.uk/metadata/roads/crossroads/
  7. ZEXI
    http://www.roads.lut.ac.uk/zexi/
  8. Isite Information System
    http://www.cnidr.org/ir/isite.html
  9. Martin Hamilton, John Knight. WHOIS++ Gateway Interface specification - version 1.0
    http://www.roads.lut.ac.uk/Reports/wgi/wgi.txt
  10. The NewsAgent Project
    http://www.sbu.ac.uk/litc/newsagent/
  11. NewsAgent Whois++ search interface
    http://roads.ukoln.ac.uk/newsagent/cgi-bin/search.pl
  12. ROADS Template Registry
    http://www.ukoln.ac.uk/metadata/roads/templates/
  13. ROADS Cataloguing Guidelines
    http://www.ukoln.ac.uk/metadata/roads/cataloguing/

Author details

Michael Day, Rachel Heery and Andy Powell
UKOLN Metadata Group
UKOLN
University of Bath