Web Magazine for Information Professionals

The JOIN-UP Programme: Seminar on Linking Technologies

Sandy Shaw reports on a seminar bringing together experts in the field of linking technology for JISC's JOIN-UP Programme.

This seminar brought together experts in the field of linking technology with participants in the four projects which constitute the JOIN-UP programme, for exploration and discussion of recent technical developments in reference linking.

The JOIN-UP project cluster forms part of the DNER infrastructure programme supported by the JISC 5/99 initiative. Its focus is the development of the infrastructure needed to support services that supply users with journal articles and similar resources. The programme addresses the linkage between references found in discovery databases (such as Abstracting and Indexing databases and Table of Contents databases) and the supply of services for the referenced item (typically, a journal article) in printed or electronic form. Four individual projects have been combined in the JOIN-UP Programme: Docusend (King’s College London); ZBLSA and Xgrain (EDINA); ZETOC (British Library). These projects will work together to contribute separate but compatible and inter-operable parts of the four DNER functional components: discover, locate, request, and access.

The aim of the seminar was to develop understanding of recent developments in reference linking and to build consensus among the JOIN-UP partners on the adoption of common solutions. The seminar timetable grouped the eight excellent presentations into three sessions with ample time for discussion among the thirty participants.

Ann Bell, Director of Library Services, King’s College London, chaired the first three sessions. Peter Burnhill, Director of EDINA, chaired the closing discussion session. The presentations may be found on the JOIN-UP Programme Web site [1].

Setting the Scene: Background and Purpose

Lorcan Dempsey, the Director of the DNER, welcomed the participants and set the context of the seminar within the wider ambitions of the DNER. Recent work on the DNER has recognised the distinction between content (the information resources themselves) and environment (the means by which these resources are accessed and managed). The major challenge is to bring together the disparate types of information resources into a unified information environment that enables them to be used in an integrated way. Integration of this framework with local institutional services is a further requirement. Current DNER development activity is due to report shortly, and should result in a completed strategy framework by Summer 2001. A new round of implementation activity for new service developments will follow.

Andy Powell, UKOLN, discussed recent work on the development of the DNER architecture. In addition to the four functions, discover, locate, request, and access, there is a need for services supporting information creation and collaboration among the creators, providers, and consumers of information. Rather than existing as a collection of stand-alone services, the DNER should function as an integrated whole: for example, the user should be able to discover material of interest held by a range of content providers, and move directly to the content. To enable the development of portals which will provide access to these multiple services, the services will need to expose their content and metadata for machine-to-machine (m2m) access. Many different types of portal will be required (subject, data centre, institutional, learning), using either of two broad approaches: thin (shallow linking) or thick (deep linking). Thin portals will provide searches over HTTP on content description and service description services. Thick portals will provide advanced services for searching, sharing, and alerting, using additional technologies for each function. Cross-searching services will make use of Z39.50 and the Bath Profile to access Z-targets. Sharing may be achieved by use of the Open Archives Initiative harvesting protocol, which allows repositories to share with external portals the metadata records corresponding to their holdings. Alerting services may be provided by use of RSS (Rich/RDF Site Summary) technology.
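To make the harvesting step concrete, the following minimal Python sketch issues an OAI ListRecords request and extracts identifiers and titles from the returned metadata. The repository address is invented, and the XML namespaces shown are those of the OAI-PMH 2.0 specification; real repositories advertise their own base URLs.

    # Minimal sketch: harvesting metadata records with the OAI protocol.
    # The base URL below is hypothetical; real repositories publish their own.
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    BASE_URL = "http://repository.example.org/oai"  # hypothetical endpoint

    def list_records(metadata_prefix="oai_dc"):
        """Issue an OAI ListRecords request; yield (identifier, title) pairs."""
        query = urllib.parse.urlencode({
            "verb": "ListRecords",
            "metadataPrefix": metadata_prefix,
        })
        with urllib.request.urlopen(f"{BASE_URL}?{query}") as response:
            tree = ET.parse(response)
        oai = "{http://www.openarchives.org/OAI/2.0/}"
        dc = "{http://purl.org/dc/elements/1.1/}"
        for record in tree.iter(f"{oai}record"):
            identifier = record.findtext(f"{oai}header/{oai}identifier")
            title = record.findtext(f".//{dc}title")
            yield identifier, title

    for identifier, title in list_records():
        print(identifier, title)

A portal would run requests of this kind periodically, building a local index of the metadata that each repository exposes.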

A key problem with the use of URLs, both as identifiers for information objects and as locators, is that they lack persistence (dead links are a common feature of the Web). A ‘discovery’ action should produce metadata about an object that is persistent, and that identifies it unambiguously. The user then needs to be able to resolve this metadata, to determine a location where an instance of the resource may be found. In order to locate an ‘appropriate copy’ of the resource, i.e. a copy that the user is able, and authorised, to access, the resolution process may take account of the user’s identity, rights, and location. A mechanism that has been proposed for handling metadata records of this type is the OpenURL, which encodes metadata for a resource (effectively, a citation) as a URL. A portal which assists a user in the discovery of a resource may refer information about the resource (in the form of an OpenURL) to a resolution server, possibly local to the requesting user, to determine the location of an appropriate copy.

In summary, the emerging DNER architecture model anticipates that the glue between presentation services (portals) and content will be provided by a set of middleware/fusion services that will manage authentication, authorisation, collection description, service description, and resolution. Interesting times lie ahead!

Peter Burnhill, Director of EDINA, gave an overview of the JOIN-UP programme, which brings together four projects with a common interest in developing the DNER infrastructure for the provision of journal articles: Docusend, Xgrain, ZBLSA, and ZETOC. In applying the four ‘demand-side’ verbs, discover, locate, request, and access, to the provision of journal articles, there is a clear need for the use of common identifiers to communicate information reliably between the services providing these four functions. The present offerings in the DNER, while providing services that implement each function, do so in isolation from one another. Hence the user is obliged to treat each step of the process as a distinct activity, to be undertaken separately. The aim of the JOIN-UP programme is to create a framework within which the four ‘demand-side’ activities can inter-operate. While not insisting that this would operate ‘seamlessly’, Peter was at pains to stress that the final product would at least be ‘well-seamed’. Of the four JOIN-UP projects, Docusend and ZETOC were originally conceived as end-to-end services for document delivery, and Xgrain and ZBLSA as functional brokers for discovery and location, respectively. The difference in nature of the four projects provides the opportunity to investigate how heterogeneous services can work together to form key infrastructure components of the DNER.

The ZBLSA project aims to develop a pilot service that acts as a locate broker for articles on behalf of the various DNER portals. Given a reference to an article, and information about the requesting user, ZBLSA will enumerate the resources best placed to supply services on the article to that user. In general, the ZBLSA broker will act as a ‘Rosetta Stone’, directing requests from different types of portal (A&I, ToC, Subject RDN, local library, L&T OLE) to different types of content provider (library, aggregator, publisher, document delivery service, open archive).

Defining the Problem: Scope and Purpose of Linking

In the first of two talks, Jenny Walker of Ex Libris presented a general review of linking, defined briefly as the problem of deriving, from the information present in a standard citation, the location of an instance of the object itself. Early solutions to the problem made use of static links (as used in ISI Links, IOP). These links are precomputed and hard-wired into citation lists and A&I records. They are not ‘context-sensitive’, in that they are invariant, taking no account of the affiliation or status of the requesting user. While they are highly reliable, they may be generated only by an agent which has full control over the information environment in which the objects are stored. Static linking solutions are built round a central database holding unique identifiers and associated metadata. They provide search facilities taking metadata input and deriving the corresponding identifier; in turn, these identifiers may be resolved by a central service to locate the full text object.

While these products bring great advantages to users, they suffer from a serious limitation: in failing to take account of the context of the requesting user, they do not identify the most ‘appropriate copy’ for that user. The same journal may be available from many different sources, such as aggregators, local library mirrors, or ‘free with subscription’ copies. The appropriate copy for a given user has more to do with who the user is than what the journal is. So the original definition of the problem may be restated: given the information in a citation to an article, how does the user find an appropriate copy of the article? What is required is a scheme that generates links dynamically, according to the rights status of the requesting user. A further limitation of some existing linking frameworks is that they consider only the delivery of the full text object. Since links, in this case, are determined by a remote service provider, the local librarian has no means of supplying additional links to other related services, such as abstracts, citation databases, or OPAC and union catalogues. Given this additional requirement, the problem can be refined further: given a record of bibliographic metadata, how does one deliver appropriate services for it?
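The refined problem can be illustrated with a toy resolver. The following Python sketch is purely illustrative: the holdings data, source names, and service list are all invented, and a real resolver would draw on a much richer knowledge base of licences and subscriptions.

    # Illustrative sketch of a context-sensitive resolver: from a citation
    # plus a user profile to the services appropriate for that user.
    # All holdings data and source names are hypothetical.

    HOLDINGS = {
        # ISSN -> sources known to hold the full text
        "1234-5679": ["aggregator_a", "publisher_site"],
    }

    EXTENDED_SERVICES = ["abstract", "citation_database", "opac_lookup"]

    def resolve(citation, user_subscriptions):
        """Return service links appropriate to this user, not just any copy."""
        services = []
        for source in HOLDINGS.get(citation["issn"], []):
            if source in user_subscriptions:
                services.append(("full_text", source))
        # The menu is under local control, so non-full-text services
        # (abstracts, citation databases, OPACs) can always be offered too.
        services.extend(("extended", s) for s in EXTENDED_SERVICES)
        return services

    citation = {"issn": "1234-5679", "volume": "12", "spage": "45"}
    print(resolve(citation, user_subscriptions={"aggregator_a"}))

The essential point is the second argument: the same citation yields different full text links for users with different subscriptions.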

Albert Simmons of Open Names Service, OCLC, gave a presentation on naming as a key component of robust link resolution. Firstly, it is important to be clear about the meaning of names. As mentioned by Jenny Walker, some useful distinctions have been drawn within the model developed by IFLA. The object of interest may be a distinct intellectual or artistic creation itself, i.e. a work such as Romeo & Juliet. The specific realisation of the work is called an expression (e.g. original text, revised text, performance). The physical embodiment of an expression of a work is called a manifestation (e.g. CD-ROM). A specific copy of a manifestation is called an item. Any naming framework must distinguish between works, expressions, and manifestations in assigning names. Numerous naming schemes exist for different types of object (e.g. ISBN, ISSN, ISMN, ISAN, BICI, SICI, DOI etc.). Each has its own function, naming authority, and rules of application. Equally, numerous metadata definition schemes exist (e.g. INDECS, Dublin Core, MARC). Rules for interoperability are required, especially to handle the growing market in multimedia products. Albert outlined some of the issues of handling digital objects within just one of these naming schemes, the ISBN, such as the need to extend the name space (by assigning additional digits to the ISBN); the absence of agreed core metadata; the authority to assign ISBNs to ebooks; and the definition of who is a publisher (everyone?). Clearly, much work will be required to resolve all the issues surrounding digital objects within all these naming schemes.
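The IFLA hierarchy can be captured directly as a data structure. The Python sketch below uses the Romeo & Juliet example from the talk; the field names, the sample ISBN, and the shelfmark are illustrative only.

    # Sketch of the IFLA work/expression/manifestation/item hierarchy.
    # Field names and sample identifiers are illustrative, not prescriptive.
    from dataclasses import dataclass

    @dataclass
    class Work:               # the distinct intellectual or artistic creation
        title: str

    @dataclass
    class Expression:         # a specific realisation of a work
        work: Work
        form: str             # e.g. "original text", "revised text", "performance"

    @dataclass
    class Manifestation:      # the physical embodiment of an expression
        expression: Expression
        carrier: str          # e.g. "CD-ROM", "printed book"
        identifier: str       # a name such as an ISBN attaches at this level

    @dataclass
    class Item:               # a specific copy of a manifestation
        manifestation: Manifestation
        copy_id: str

    work = Work("Romeo & Juliet")
    text = Expression(work, "original text")
    book = Manifestation(text, "printed book", "ISBN 0-000-00000-0")  # hypothetical
    library_copy = Item(book, "shelfmark PR2831.A1")                  # hypothetical

Naming schemes differ in which level they name: an ISBN names a manifestation, whereas a scheme naming works or expressions would group many manifestations together.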

OCLC has identified the need for an Open Names Service that will act as a general-purpose name resolution service to handle transactions in e-commerce and digital rights management. A single interface will handle all link types, and will redirect enquiries to sites according to the type of name. The base service will maintain customer profiles and will manage authentication. Additional services will include trusted third party activities, collaborative development, and service profiling. The service is scheduled to go into production in January 2002.

Appraising Solutions

Matthew Dovey, Oxford University, JAFER project, gave a presentation on Z39.50: Old Criticisms and New Developments. Z39.50 is a substantial client/server protocol with an extensive range of facilities. In typical use, a client will send a search request which specifies items of bibliographic information; the server will identify a matching set of bibliographic records within its database, and permit the user to retrieve these selectively in an agreed format. Z39.50 clients may reside on a user’s desktop machine, but more commonly exist on Web Z39.50 proxies, which users access over HTTP. A number of problems have arisen in the practical use of Z39.50 services for various reasons: weaknesses in implementation, vendor ignorance, and unrealistic expectations. Rightly or wrongly, a substantial number of negative perceptions are also commonly held: commercially irrelevant; overweight and expensive; resistant to integration. Various initiatives have attempted to redress these problems: the Bath Profile, which sets a clear standard for implementers to observe; Z39.50 embedded in Windows; and the JAFER project itself. JAFER has undertaken a number of developments: Client Beans is a JavaBean encapsulation of a Z39.50 client; Server Beans is a Z39.50 front end with a set of pluggable back-end data handlers for different data environments. Further facilities have been developed to enable non-programmers to handle data using Z39.50. These implementations are all available as open source. So while Z39.50 has had its critics, a number of developments are now under way to improve its accessibility and reliability.
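As a conceptual illustration of the interaction described above, the following Python skeleton names the protocol’s main phases. ZClient is a hypothetical wrapper, not a real library: it stands in for implementations such as the JAFER Client Beans (which are Java), and the example query uses Z39.50’s Bib-1 attribute notation.

    # Conceptual skeleton of a typical Z39.50 session. ZClient is a
    # hypothetical wrapper; only the phase structure is significant.

    class ZClient:
        """Hypothetical Z39.50 client, mirroring the protocol's phases."""

        def connect(self, host, port, database):
            """Init phase: negotiate a session with the Z-target."""
            raise NotImplementedError

        def search(self, query):
            """Search phase: the server builds a result set of matching
            records and reports how many it contains."""
            raise NotImplementedError

        def present(self, start, count, record_syntax="MARC"):
            """Present phase: selectively retrieve records from the result
            set in an agreed format (e.g. MARC, SUTRS)."""
            raise NotImplementedError

    # Intended usage (port 210 is the registered Z39.50 port; Bib-1
    # attribute 1=4 denotes a title search):
    #
    #   client = ZClient()
    #   client.connect("z3950.example.ac.uk", 210, "default")
    #   hits = client.search('@attr 1=4 "reference linking"')
    #   records = client.present(start=1, count=10, record_syntax="MARC")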

In her second presentation, Jenny Walker used the example of the Ex Libris SFX model to describe the use of OpenURLs for context-sensitive reference linking. This addresses the ‘appropriate copy’ problem by divorcing the generation of links from the information resources that provide the citation metadata. Rather than linking directly from the OPAC or A&I database, links are directed to a resolution server local to the user. This server is configured with knowledge of the rights of local users (such as subscriptions) and can therefore identify the location of services which will provide that user with full text or other services relevant to the metadata record. While Ex Libris are vendors of the SFX resolution server, the technology is essentially open, and other types of resolution server may be deployed.

The enabling technology for this open linking approach is provided by the OpenURL draft standard, which is currently proceeding through the NISO standardisation process as a fast-track work item. The OpenURL has two parts: a base URL, which identifies the user’s local resolution server, and a content component, which contains elements of metadata associated with the information object of interest. An information service which supports open linking must be able to determine whether a requesting user employs a local OpenURL resolver (using one of several possible methods) and, if so, must provide OpenURL links for each reference object (e.g. citation) it holds. If a user consulting such an information service clicks one of these OpenURL links, the user’s browser is directed to the local resolver, where the associated metadata is resolved. The resolver then presents the user with a choice of extended services relevant to the information object: full text, abstract, author information, local library holdings, and so on. Many types of information service may provide OpenURL links: A&I services, ToC services, OPACs, e-journals. An attractive feature of the approach is that the local resolver is under local control, and provides the librarian with a single point of administration and the means to implement service access policies. Work has been done to demonstrate how the SFX approach and the CrossRef approach (described below) can operate in a complementary manner.
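A short Python sketch shows how little machinery the OpenURL itself requires: a base URL naming the local resolver, plus citation metadata encoded as a query string using element names from the draft (genre, issn, volume, spage, and so on). The resolver address and the citation values below are invented for illustration.

    # Minimal sketch of constructing an OpenURL: the base URL of the
    # user's local resolver plus citation metadata as a query string.
    # The resolver address and citation values are hypothetical.
    from urllib.parse import urlencode

    BASE_URL = "http://resolver.example.ac.uk/openurl"  # local resolution server

    citation = {
        "sid": "example:abcdb",  # identifies the referring information service
        "genre": "article",
        "issn": "1234-5679",
        "date": "2001",
        "volume": "12",
        "issue": "3",
        "spage": "45",
        "aulast": "Smith",
        "atitle": "An example article title",
    }

    openurl = f"{BASE_URL}?{urlencode(citation)}"
    print(openurl)

Because only the base URL varies between institutions, the same citation, encoded once by the information service, can be resolved differently, and appropriately, for users at different institutions.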

Richard O’Beirne of Blackwell Publishing gave a presentation on Digital Object Identifiers (DOIs) and CrossRef. Blackwell publishes a substantial number of scholarly journals and has worked on the CrossRef initiative since its inception, together with other major STM publishers. Publishers had recognised that while adding internal cross-linking capabilities to their electronic full text publications provided a useful service, what users really wanted was full linking across the range of all publishers’ journals. Solving this problem by means of bilateral agreements becomes impractical as the number of participating publishers grows. The CrossRef solution was to establish a single resolution service for journal articles. Each participating publisher embeds persistent links from their article references to the cited articles held at other publishers’ sites. CrossRef uses DOIs for this purpose. Resolution is supported by the DOI/Handle resolution system, which maps a DOI to the location of the corresponding content. While DOIs themselves are persistent, the use of the resolution system means that links will still work even if a publisher reorganises its URLs internally, or a journal moves to a new publisher altogether.

When publishing an article, the publisher deposits metadata for the article (including its DOI) with CrossRef. In addition, for each reference within the article, the publisher queries CrossRef to discover the corresponding DOIs, and embeds these as reference links in the online journal article. When a user clicks one of these reference links, the DOI contained within it is resolved by the Handle system, and the user is automatically redirected to the cited article on the responsible publisher’s site.
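From the browser’s point of view, DOI resolution is simply an HTTP redirect issued by the Handle system’s public proxy (dx.doi.org). The following Python sketch inspects that redirect rather than following it; the DOI shown is hypothetical, so a real run would need a genuine one.

    # Sketch of DOI resolution as an HTTP redirect from the Handle proxy.
    # The DOI below is hypothetical; substitute a real one to try this.
    import urllib.error
    import urllib.request

    class NoRedirect(urllib.request.HTTPRedirectHandler):
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            return None  # stop here so the Location header can be inspected

    opener = urllib.request.build_opener(NoRedirect)
    doi = "10.1234/example.doi"  # hypothetical
    try:
        opener.open(f"https://dx.doi.org/{doi}")
    except urllib.error.HTTPError as err:
        # A 30x answer carries the cited article's current URL in Location;
        # updating this mapping is what keeps published links working.
        print(err.code, err.headers.get("Location"))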

While CrossRef has enjoyed much success (used in over 3 million published articles by 2001), substantial revisions to the technology are under way. The existing ‘one-to-one’ model (one DOI resolves to one URL) will be replaced by a ‘one-to-many’ approach, so that functions other than simple full text location can be provided. A requirement will be made that standardised metadata is supplied whenever a DOI is registered. This will extend the range of possible uses of DOIs, and will enable the multiple resolution goal. The ‘appropriate copy’ problem has also been recognised, and a prototype which addresses it has been developed in collaboration with the DLF, CNRI, and the IDF.

Implications for JOIN-UP

Peter opened the final discussion session with a reminder of the ‘Rosetta Stone’ model for a general-purpose locator/broker. In the context of JOIN-UP, the locator function will be investigated and demonstrated by the ZBLSA project. This will deliver a pilot facility designed to meet the needs of portals in the DNER. Given a metadata record describing an article, and information about the requestor, ZBLSA will identify a set of services on the article available to the user.

The importance of identifiers of different types was discussed, particularly with regard to the historical base of print journals. It was noted that for journal articles, the ISSN has particular significance. There was some opinion that a lower priority should be given to the role of Z39.50 in the ZBLSA project. The OpenURL is becoming widely accepted as a mechanism for the encapsulation of object metadata over HTTP, and should be seriously considered as the main mechanism for conveying simple service requests to ZBLSA.

Acronyms

A&I: Abstracting and Indexing
BICI: Book Item and Contribution Identifier
CNRI: Corporation for National Research Initiatives
DLF: Digital Library Federation
DNER: Distributed National Electronic Resource
DOI: Digital Object Identifier
IDF: International DOI Foundation
IFLA: International Federation of Library Associations
INDECS: Interoperability of Data in E-commerce Systems
IOP: Institute of Physics
ISAN: International Standard Audiovisual Number
ISBN: International Standard Book Number
ISMN: International Standard Music Number
ISSN: International Standard Serial Number
MARC: Machine-Readable Cataloging
NISO: National Information Standards Organization
OCLC: Online Computer Library Center
OLE: Online Learning Environment
OPAC: Online Public Access Catalogue
RDN: Resource Discovery Network
SICI: Serial Item and Contribution Identifier
STM: Scientific, Technical and Medical Publishers
ToC: Table of Contents

References

  1. JOIN-UP Programme Web site at http://edina.ac.uk/projects/joinup/

Author Details

Sandy Shaw
EDINA
University of Edinburgh

Email: S.Shaw@ed.ac.uk