Web Magazine for Information Professionals

Clumps Come Up Trumps

Helena Gillis, Verity Brack, John Gilby and Marian Hogg review the four eLib CLUMP projects now at the end of their funding periods.

This article is an end of project review of the Large Scale Resource Discovery strand of the eLib Phase 3 Programme. Four ‘clump’ [1] projects were funded: CAIRNS, M25 Link and RIDING are regionally based, and Music Libraries Online (MLO) is subject based.

One question that this article aims to answer is ‘Have the clumps projects been a success?’ The following sections highlight some of the many issues that the four projects have looked at and the progress that has been made.

Technical Issues

Z39.50 Software

All the projects used the Z39.50 protocol [2] for bibliographic search and retrieval. This protocol uses the client-server model, and the clumps have developed their own clients to search their libraries’ catalogues. When the projects began in 1998 only a limited number of options was available: M25 Link and CAIRNS chose the bespoke route, investigating which Z39.50 clients, both free of charge and commercial, could be obtained, whereas RIDING and MLO used the Fretwell-Downing [3] VDX software.

Use of Z39.50 has increased throughout the lifetime of the projects and it is now the norm for academic libraries (and significantly, other sectors) to include a Z39.50 server in their library systems. There are many different Z-clients now available and it is not obvious which software package is the best for any given application. Availability, support, cost, ease of installation, ability to configure to specific Z-servers, skills needed to make changes, future expansion, and system requirements are some of the issues considered when making a choice of software.

M25 Link adopted the Europagate [4] software at an early stage of the project and carried out development work to build a Z-client into the existing M25 Web Guide. Following evaluation, the team is implementing a new user interface with two additional clients, the Index Data PHP [5] client and the Zeta Perl client, so that their searching performance can be compared. One facility designed into M25 Link is a database that holds, for each member catalogue, the Z-server connection details, bibliographic attribute settings, mappings for returned data and general information about the catalogue.
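The article does not describe how the M25 Link configuration database is structured. As a minimal sketch, assuming one record per member catalogue, such a table might look like the following; the field names, hostnames and supported attribute sets are all hypothetical. Only the port number (210, the registered Z39.50 port) and the Bib-1 Use values (4 Title, 7 ISBN, 1003 Author, 1016 Any) are real.

```python
from dataclasses import dataclass, field


@dataclass
class ZTarget:
    """One member catalogue in a hypothetical target-configuration table."""
    name: str
    host: str
    port: int
    database: str
    record_syntax: str = "USMARC"
    # Bib-1 Use attributes this server is known to support
    supported_use: set[int] = field(default_factory=set)
    # free-text information shown to users (access rules, opening hours, ...)
    info: str = ""

    def address(self) -> str:
        # host:port/database, a common way of naming a Z39.50 target
        return f"{self.host}:{self.port}/{self.database}"

    def supports(self, use_attr: int) -> bool:
        return use_attr in self.supported_use


# Hypothetical entries for two member catalogues
TARGETS = [
    ZTarget("Library A", "z3950.library-a.example", 210, "cat",
            supported_use={4, 7, 1003, 1016}),
    ZTarget("Library B", "opac.library-b.example", 2100, "books",
            record_syntax="UKMARC", supported_use={4, 1003, 1016}),
]
```

Keeping this information in one place means the Z-client can consult it before sending a search, rather than discovering a server's limitations from its error messages.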

CAIRNS has investigated a number of gateway packages including GEOWEB [6], ZAP [7], Europagate and ELiAS [8]. Work on GEOWEB was frozen when it was found that the gateway, in its current form, was unable to send different search attributes to each target database. epixtech software [9] was adopted as the main avenue for development when investigation showed that it could be used to develop a mechanism (termed ‘dynamic clumping’ by CAIRNS) to produce a subset of targets for users to search (see ‘Selecting Catalogues for Searching’ below), and to display accurate holdings and circulation information. ZAP and ELiAS are of interest because they can handle both MARC and GRS-1 records, a requirement for incorporating non-library targets into the clumps.
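A Z-client that accepts both MARC and GRS-1 has to decide how to decode each returned record. Z39.50 identifies record syntaxes by object identifier (OID); the sketch below, which is not taken from ZAP or ELiAS, maps the OIDs for USMARC, UKMARC, SUTRS and GRS-1 to a hypothetical decoding route. The OID values should be checked against the register kept by the Maintenance Agency [2].

```python
# Z39.50 record-syntax object identifiers (OIDs)
RECORD_SYNTAXES = {
    "1.2.840.10003.5.10": "USMARC",
    "1.2.840.10003.5.11": "UKMARC",
    "1.2.840.10003.5.101": "SUTRS",
    "1.2.840.10003.5.105": "GRS-1",
}


def handler_for(oid: str) -> str:
    """Choose a (hypothetical) decoding route for a returned record."""
    syntax = RECORD_SYNTAXES.get(oid)
    if syntax in ("USMARC", "UKMARC"):
        return "marc"   # tagged MARC record
    if syntax == "GRS-1":
        return "grs1"   # generic tree-structured record
    if syntax == "SUTRS":
        return "text"   # simple unstructured text
    raise ValueError(f"unsupported record syntax {oid}")
```

Unknown syntaxes raise an error rather than being displayed as unreadable data, which matters once non-library targets begin returning unfamiliar formats.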

Fretwell-Downing’s VDX gateway software is used by a number of existing services, such as AHDS [10], as well as RIDING and MLO. It offers a fully-featured, integrated inter-library loan system as well as a Z-client. Both RIDING and MLO were able to customise their VDX user interfaces, and the software can be configured to specify different search attributes for each target (‘query adaptation’). MLO also have a JAFER software demonstrator [11] available from their web site; this software is to be developed under the JISC Distributed National Electronic Resource (DNER) [12] development programme.
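Query adaptation can be pictured as a per-target rewrite of one generic query. The VDX implementation is not described here, so the following is only a sketch under stated assumptions: a query is a mapping from Bib-1 attribute type to value, each target has a hypothetical table of Use substitutions, and attribute types a target rejects are omitted so that the server applies its own default.

```python
# Bib-1 attribute types: 1 Use, 2 Relation, 3 Position,
# 4 Structure, 5 Truncation, 6 Completeness
Query = dict[int, int]


def adapt(query: Query, use_map: dict[int, int], dropped_types: set[int]) -> Query:
    """Rewrite one generic query for a single target."""
    adapted = {}
    for attr_type, value in query.items():
        if attr_type in dropped_types:
            continue  # omit it and let the server use its default
        if attr_type == 1:
            value = use_map.get(value, value)  # substitute a supported Use value
        adapted[attr_type] = value
    return adapted


author_keyword = {1: 1003, 2: 3, 3: 3, 4: 2, 5: 100, 6: 1}
# hypothetical target: indexes authors under Use 1004, rejects Completeness
print(adapt(author_keyword, use_map={1003: 1004}, dropped_types={6}))
# {1: 1004, 2: 3, 3: 3, 4: 2, 5: 100}
```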

Holdings Information

Three of the projects have looked at the retrieval of holdings information, something that is often not available to a Z-search although of great importance to users. The epixtech software used by CAIRNS enables the display of detailed holdings for each item record from libraries using epixtech systems. It is planned to expand this to retrieve data from other system types in the future. The VDX software used by RIDING and MLO is currently able to access holdings information from Innopac systems.

M25 Link can access general holdings and location information with Aleph, Innopac, Talis and Unicorn systems, and in the future with Horizon. M25 Link has also implemented a specific periodical title search option, essential for users seeking serials information.

De-duplication

Another software issue is the combining and de-duplicating of results. M25 Link combines the results from searched catalogues and then runs a de-duping routine. Whilst not perfect, this does reduce the quantity of information on-screen for the user to look at. However, some users, particularly cataloguers, are wary of the accuracy of operation of de-duping routines, and others have questioned its usefulness.
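M25 Link's de-duping routine is not described in detail. To illustrate why such routines are useful but distrusted, here is a hypothetical sketch that builds a crude match key from a normalised title (ignoring a leading article), the first word of the author heading and the year, then merges records sharing that key while keeping every holding library. It assumes author headings of the form ‘Surname, Forename’.

```python
import re
import unicodedata


def norm(s: str) -> str:
    """Lower-case, strip accents and punctuation."""
    s = unicodedata.normalize("NFKD", s)
    s = "".join(c for c in s if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()


def match_key(rec: dict) -> tuple:
    title = re.sub(r"^(the|a|an) ", "", norm(rec.get("title", "")))
    surname = norm(rec.get("author", "")).split(" ")[0]
    return (title, surname, rec.get("year", ""))


def dedupe(records: list[dict]) -> list[dict]:
    """Merge records sharing a key, keeping every holding library."""
    merged: dict[tuple, dict] = {}
    for rec in records:
        key = match_key(rec)
        if key in merged:
            merged[key]["libraries"].append(rec["library"])
        else:
            merged[key] = {**rec, "libraries": [rec["library"]]}
    return list(merged.values())


hits = [
    {"title": "The Rite of Spring", "author": "Stravinsky, Igor", "year": "1913", "library": "A"},
    {"title": "Rite of spring.", "author": "Stravinsky, I.", "year": "1913", "library": "B"},
    {"title": "Petrushka", "author": "Stravinsky, Igor", "year": "1911", "library": "A"},
]
print(len(dedupe(hits)))  # 2
```

A key this coarse will also merge distinct editions published in the same year, or different works sharing a title and surname, which is exactly the kind of error that makes cataloguers wary.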

A requirement that all the projects have found essential is that the Z-client must have the facility to be tailored to individual library catalogues. This theme is expanded below.

Interoperability Issues

Interoperability is the term used for different systems working with each other and with each other’s data. Interoperability is of many types – technical, semantic, political, human, domain, community etc. [13].

From the clumps perspective there are two main interoperability issues: technical interoperability for Z39.50 use between different library systems, and the consistency of data in the catalogues of the clumps libraries.

Z39.50 Attributes

Bibliographic applications of Z39.50 use the Bib-1 attribute set to define access points to bibliographic records. This comprises 99 Use attributes, such as Title, ISBN and Date of publication, together with Relation, Position, Structure, Truncation and Completeness attributes of differing values that are used in conjunction with a Use attribute. Z-servers normally implement only a selection of these attribute values rather than the complete set, so incompatibility is likely between elements of a Z-client’s search request and the particular subset of Bib-1 that a Z-server supports.
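This incompatibility can be made concrete by comparing the six attribute values in a request with what a server has implemented. In the sketch below the server's supported values are invented for illustration; the attribute type numbers and Use value 7 (ISBN) are from Bib-1.

```python
ATTR_TYPES = {1: "Use", 2: "Relation", 3: "Position",
              4: "Structure", 5: "Truncation", 6: "Completeness"}


def unsupported(request: dict[int, int],
                server_support: dict[int, set[int]]) -> list[str]:
    """List the attribute values in a request that a server has not implemented."""
    problems = []
    for attr_type, value in sorted(request.items()):
        if value not in server_support.get(attr_type, set()):
            problems.append(f"{ATTR_TYPES[attr_type]}={value}")
    return problems


# hypothetical server supporting only a few Bib-1 values (no Completeness at all)
server = {1: {4, 1003, 1016}, 2: {3}, 3: {3}, 4: {1, 2}, 5: {1, 100}}
isbn_search = {1: 7, 2: 3, 3: 3, 4: 1, 5: 100, 6: 1}
print(unsupported(isbn_search, server))  # ['Use=7', 'Completeness=1']
```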

If a Z-server does not support an attribute value that is specified by a Z-client’s request, it can respond in a variety of ways:

· it may invoke an alternative, default value; e.g. if the requested entry point is not recognised, the encompassing Use attribute ‘Any’ could be used.
· it may reject the search request and return an ‘unsupported attribute’ error message.
· it may not complete the request.
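A client can soften the second of these behaviours by retrying with the broader ‘Any’ attribute and telling the user it has done so. The sketch assumes a hypothetical `search` callable that returns hits plus a diagnostic code (None on success); 114 is the Bib-1 diagnostic for an unsupported Use attribute and 1016 is the Use value for ‘Any’. The third behaviour, a request that never completes, would need a timeout, which is not shown.

```python
UNSUPPORTED_USE = 114  # Bib-1 diagnostic: Unsupported Use attribute
ANY = 1016             # Bib-1 Use attribute 'Any'


def search_with_fallback(search, use_attr: int, term: str):
    """Retry with Use=Any if the server rejects the requested access point.

    Returns (hits, diagnostic, broadened); broadened is True when the
    results came from the wider 'Any' search.
    """
    hits, diag = search(use_attr, term)
    if diag == UNSUPPORTED_USE and use_attr != ANY:
        hits, diag = search(ANY, term)
        return hits, diag, True
    return hits, diag, False


def fake_server(use_attr, term):
    """Stand-in server that only supports Title (4) and Any (1016)."""
    if use_attr not in {4, 1016}:
        return [], UNSUPPORTED_USE
    return [f"record matching {term!r}"], None


print(search_with_fallback(fake_server, 1003, "britten"))
# (["record matching 'britten'"], None, True)
```

Flagging the broadened search matters because a silent substitution is one reason hits can appear to bear little relation to the search term.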

Use attributes are mapped to particular indexes within the Z-server, so the Author attribute (1003), for example, would access an author index. Occasionally this mapping is inappropriate, so although data is returned, the hits appear to have little relation to the search term used.

Profiles

Given the considerable scope for variation in the implementation of Bib-1 attribute values, it is necessary for any Z-gateway to utilise some form of controlling mechanism. Such mechanisms are referred to as profiles, the essential function of which is to regularise the performance of the client and server in any Z39.50 association. The Bath Profile [14] has relatively recently gained ISO recognition and is gaining acceptance in the library community. The profile offers a suite of nineteen search options grouped into three levels of conformance. A full set of six attribute values is specified for each search type.
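Because each Bath Profile search pins down all six attribute types, a conforming client can send the same query to every server. The values below are assumed for three Level 0 keyword searches and should be checked against the profile itself [14]. The rendering uses the Prefix Query Format (PQF) accepted by Index Data's YAZ tools, and the code assumes search terms contain no double quotes.

```python
# Six Bib-1 values per search, in type order:
# Use, Relation, Position, Structure, Truncation, Completeness.
# Assumed values; verify against the Bath Profile text.
KEYWORD_SEARCHES = {
    "author": (1003, 3, 3, 2, 100, 1),
    "title": (4, 3, 3, 2, 100, 1),
    "any": (1016, 3, 3, 2, 100, 1),
}


def to_pqf(search: str, term: str) -> str:
    """Render a profile search as a PQF string (term must not contain '"')."""
    attrs = KEYWORD_SEARCHES[search]
    parts = [f"@attr {attr_type}={value}"
             for attr_type, value in enumerate(attrs, start=1)]
    return " ".join(parts) + f' "{term}"'


print(to_pqf("author", "holst"))
# @attr 1=1003 @attr 2=3 @attr 3=3 @attr 4=2 @attr 5=100 @attr 6=1 "holst"
```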

Given the enhanced interoperability that this profile affords, M25 Link will formally adopt it, and is holding discussions with supplier representatives of the seven library systems currently included in M25 Link. CAIRNS, RIDING and MLO also support the adoption of the Bath Profile.

Cataloguing and Indexing Issues

As is often the case with any form of union catalogue, the cataloguing practices and conventions of member institutions and library systems can lead to wide variation in the usefulness of results for the poor user. Cross-searching highlights the differences between systems, and it is important that users understand that the quality of the data they receive does not depend purely on the functionality of Z39.50.

Bibliographic Standards

One of the obvious differences among catalogues is use of the MARC standard: partner libraries of the clumps variously use USMARC or UKMARC, and many have also introduced local fields; at least one clump library does not use MARC at all.

It is possible to develop mapping specifications that 'translate' each Z attribute to the relevant MARC tags in each database, and MLO has taken this path. This has the effect of refining the search and display facility considerably, but the rules have to be adapted for new partners joining the clump. This is obviously not an ideal solution in the long term, and Z39.50 searching would certainly benefit from the potential harmonisation of MARC standards represented by MARC21. The British Library is currently moving towards adopting MARC21, and if this were more widely adopted it would be a welcome development in overcoming confusion in the use and meaning of MARC tags.
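The per-database mapping approach can be sketched as a table from display element to MARC tag and subfield for each partner database. Everything here is illustrative: records are simplified to (tag, subfield, value) triples rather than parsed MARC, the database names are hypothetical, and the ‘performer’ tags stand in for the local fields that make such tables necessary.

```python
# A record as (tag, subfield code, value) triples; real MARC parsing is out of scope.
Record = list[tuple[str, str, str]]

# Hypothetical per-database mappings from display element to (tag, subfield)
MAPPINGS = {
    "library_a": {"title": [("245", "a")], "author": [("100", "a")],
                  "performer": [("511", "a")]},
    "library_b": {"title": [("245", "a")], "author": [("100", "a")],
                  "performer": [("900", "a")]},  # hypothetical local field
}


def extract(record: Record, database: str, element: str) -> list[str]:
    """Return the values of one display element using that database's mapping."""
    wanted = MAPPINGS[database][element]
    return [value for tag, code, value in record if (tag, code) in wanted]


rec_b = [("245", "a", "The planets"), ("100", "a", "Holst, Gustav"),
         ("900", "a", "London Symphony Orchestra")]
print(extract(rec_b, "library_b", "performer"))  # ['London Symphony Orchestra']
print(extract(rec_b, "library_a", "performer"))  # []
```

The second call shows the problem the paragraph describes: the same record read with another partner's rules loses information, so each new partner needs its own entry in the table.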

The clumps have looked closely at index 'content' issues, and found wide variations in the actual content of bibliographic records. Most specialist material, such as music or electronic datasets, presents cataloguing difficulties that are not fully addressed by traditional international cataloguing rules and standards. Interpretation of and adherence to AACR2 is often inconsistent. For example, MLO has found problems in: the application of Uniform Titles; transliteration of non-Western alphabets (Rachmaninov, Rakhmaninov, Rakmaninoff etc.); use and citation of thematic catalogues and/or opus numbers (crucial to music since, in the absence of an ISBN, these often represent the only unique identifiers); designation of the function of added names, such as performers; and subject headings, where almost all the libraries had their own internal system.
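Cataloguing guidelines tackle variant forms at source. A complementary client-side technique, not one described by the projects, is to expand a search term into its known variants and OR them together. The variant list below is hypothetical and would in practice come from authority data; the query is rendered in PQF, where `@or` takes exactly two operands and so has to nest.

```python
# Hypothetical cross-reference groups: spellings of the same name
VARIANT_GROUPS = [
    {"rachmaninov", "rakhmaninov", "rakmaninoff"},
]


def expand(term: str) -> list[str]:
    """Return every known spelling in the term's group, or just the term."""
    for group in VARIANT_GROUPS:
        if term.lower() in group:
            return sorted(group)
    return [term]


def or_query(terms: list[str], use_attr: int = 1003) -> str:
    """OR the terms together in PQF; @or is binary, so the operators nest."""
    query = f'@attr 1={use_attr} "{terms[0]}"'
    for t in terms[1:]:
        query = f'@or {query} @attr 1={use_attr} "{t}"'
    return query


print(or_query(expand("Rakhmaninov")))
```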

Some of these variations are more crucial than others, and CAIRNS and MLO have drawn up guides of recommended cataloguing and indexing practice [15] to encourage greater uniformity within the consortia. SCURL (the Scottish Confederation of University and Research Libraries) [16] has recently adopted the CAIRNS recommendations. Without major retrospective conversion of catalogues there will never be complete uniformity, but progress has been made in identifying the extent of diversity of practice.

Indexing Policies

As well as the different interpretations and applications of MARC tags, a further complication is that of different indexing policies. It is quite possible that each library system will have its indexes built from different sets of MARC tags. For instance, a Name index may include editors, performers, illustrators etc., as well as authors, and an Author search may well produce false hits because of this. Equally, in records where the Notes field has been used extensively for additional names, such as performers on a sound recording or video, the names may well not appear in the Name index, and a personal name search would return no results. Of course, these problems are not unique to virtual union catalogues, and every catalogue has its own search and retrieval idiosyncrasies. However, it is much more complicated trying to guide a user through the multiple possibilities of search and retrieval that the cross-searching of diverse catalogues presents than it is for a single OPAC.

Clumps projects cannot expect partner libraries to make massive changes in their databases to suit cross-searching, but each project has encouraged its consortium libraries to think about interoperability in the long term, so that when new library systems are introduced, or major retrospective conversion programmes undertaken, some of these issues are taken into consideration.

User Interface

Interface Design

The development of web-based virtual union catalogues creates a tension between OPAC and web searching. In his article on building a union catalogue in an earlier edition of Ariadne, Matthew Dovey [17] identifies the two major concepts of information retrieval: recall and precision. An OPAC user would usually expect high recall and precision, in that they expect a query to return a comprehensive set of relevant results. Web users, on the other hand, are used to low recall and precision, as they are generally searching unstructured data. Naturally, the virtual union catalogues seek to be as like an OPAC as possible, but they do not have the advantage of a single set of cataloguing and indexing policies, consistent rules, or a single structured database.
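Precision is the proportion of retrieved records that are relevant; recall is the proportion of all relevant records that are retrieved. A minimal worked example, with invented record numbers:

```python
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# hypothetical search: 8 records returned, 6 of them relevant,
# out of 10 relevant records held across the catalogues
retrieved = set(range(8))
relevant = set(range(2, 12))
print(precision_recall(retrieved, relevant))  # (0.75, 0.6)
```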

Interfaces for cross-searching therefore need to be something between a traditional OPAC and a web search engine. All the clumps projects have consulted their users to establish their requirements, and have designed their interfaces accordingly. The results of these consultations can be summarised by the three C's of Choice, Clarity and Customisation.

· Choice: Users want to be able to select the catalogues they search, rather than be faced with the whole range of libraries available. They also want to be able to choose from a wide range of access points, with minimum guidance.

· Clarity: Users require logical and clearly laid out choices, such as the selection of catalogues; subject strengths should be organised in some form of classified sequence. Terminology should be appropriate to the user community and avoid web or computer jargon. Instructions for searching should also be clear, and the display of results should indicate how the results are ordered and, if possible, include the facility to sort them.

· Customisation: Users' increasing familiarity with web searching, rather than OPAC use, means that they require and expect a customisable interface, in which they can store a personal 'profile' of the selections they use regularly.

Selecting Catalogues for Searching

Each of the clump projects has been faced with the problem of how to guide users to the collections most likely to meet their needs. Even with the relatively small number of collections included in each of the clumps, it does not make sense, in terms of either the user’s time or gateway and network efficiency, to search all available collections. A mechanism to narrow the focus of an enquiry to fewer, more suitable collections is a sensible approach even within a single clump. The method the clumps are using to help users target their searches more precisely is that of Collection Level Descriptions.

The development of a collection description scheme for the clumps involved input from a national working group [18] and also from other eLib Phase 3 projects. The scheme has to encompass collections of physical items, collections of digital surrogates of physical items, collections of other digital items and catalogues of such collections. The basic scheme was developed and refined, and is used by RIDING and Agora at their gateways. Users are able to search collection level descriptions for location, subject matter, access details, and other information. MLO have also based their collection descriptions on this scheme, and have additionally developed a list of specialist subject headings based on the adoption and expansion of the BUBL [19] list of subject terms for music.

The M25 Link and CAIRNS clumps use slightly different approaches. Originally, M25 Link allowed users to identify collections with particular subject strengths (based on the M25 librarians’ own views of their collection strengths), and also to select by geographical area within the M25. Following evaluation, a new interface is being designed whose default is to select catalogues by geographical zone only. Options are also available for selecting catalogues by subject strength, from an individual institution pick list, and by access arrangements. CAIRNS has adopted the Research Support Libraries Programme (RSLP) Collection Level Descriptions schema [20], using a SQL database (MS Access) to store location, subject and other information, which can be searched and displayed in a client web browser using ColdFusion. Collections can be identified using subject strengths taken from the Research Collections Online (RCO) [21] service of SCURL, and by geographical location in Scotland. As well as 'dynamic clumping', CAIRNS also offers pre-selected 'mini-clumps': subsets of the available Z collections and searches, selected by and for specific user groups.
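Dynamic clumping amounts to filtering collection descriptions and passing the matching targets to the Z-client. This sketch is not the CAIRNS implementation (which uses MS Access and ColdFusion); it assumes a much-reduced description holding only institution, target name, subject strengths and region, all with hypothetical values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    """Minimal, hypothetical collection-level description."""
    institution: str
    target: str  # Z39.50 target name for this catalogue
    subjects: frozenset[str]
    region: str


def dynamic_clump(collections, subject=None, region=None) -> list[str]:
    """Select the Z-targets whose descriptions match the user's criteria."""
    chosen = []
    for c in collections:
        if subject and subject not in c.subjects:
            continue
        if region and c.region != region:
            continue
        chosen.append(c.target)
    return chosen


COLLECTIONS = [
    Collection("Institution 1", "inst1", frozenset({"music", "history"}), "west"),
    Collection("Institution 2", "inst2", frozenset({"law"}), "west"),
    Collection("Institution 3", "inst3", frozenset({"music"}), "east"),
]
print(dynamic_clump(COLLECTIONS, subject="music", region="west"))  # ['inst1']
```

A pre-selected ‘mini-clump’ would then simply be a stored set of these criteria, or a fixed list of targets, offered to a particular user group.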

Work on collection level description schemes is being taken forward in a number of ways, particularly by the RSLP Collection Description project, and the High Level Thesaurus (HILT) Project [22], which is investigating subject vocabularies. Development of similar schemes is also taking place in other domains, such as the museums sector, as the importance of metadata increases.

Additional Clump Services

Access to Resources

The clump projects were based on the need for large scale resource discovery, and despite interoperability difficulties, have all succeeded in providing services for resource discovery. However, it is inevitable that a user, having discovered a pertinent resource, will then wish to access it. Thus the clumps have been faced with additional issues regarding physical access for consultation purposes and/or borrowing access.

Reciprocal Access and Borrowing Policies

Initially, the clump projects merely provided information on access and loan rights but subsequently reciprocal access and borrowing policies have been developed for some of the consortia, and the investigation of inter-library loan services through the clump gateways is in progress. Some clump projects include non-academic library targets - RIDING includes the Leeds Library and Information Service for example – and implementation of reciprocal policies becomes more problematical when users outside the academic sector are included.

Obviously the display of detailed holdings information in the item records is of great importance here (see ‘Holdings Information’ above); users would also like to see availability information if possible. Z-searches normally return only the Z-target location, e.g. University of Sheffield, and cannot necessarily state which branch library holds the item. Other holdings information is returned with some searches: RIDING returns the call number for sites with Innopac library systems; CAIRNS returns the call number, holding status and location information for sites with epixtech library systems. M25 Link returns library location for most system types, and recent versions of Unicorn are also able to include classmark and circulation data.

Inter-library Loans

RIDING has established an inter-library loan service as an extension to its virtual union catalogue. The user can select the ‘Request’ link beside an item in their list of search results; and, if authorised, access an inter-library loan form, already populated with the appropriate bibliographic details. The form data can be edited if necessary, and the ILL request submitted. The system automates the processing of requests to supplying libraries, and the complete loan cycle for an item is tracked. A user can check the progress of the request at any time by logging in to their own RIDING account.
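The tracked loan cycle can be thought of as a small state machine. The states and transitions below are a hypothetical simplification for illustration only; they are not VDX's internal model, and the ISO ILL protocol standard defines its own, richer set of states.

```python
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    SENT_TO_SUPPLIER = "sent to supplier"
    SHIPPED = "shipped"
    RECEIVED = "received"
    RETURNED = "returned"
    CANCELLED = "cancelled"


# Allowed transitions in this hypothetical loan cycle
NEXT = {
    Status.SUBMITTED: {Status.SENT_TO_SUPPLIER, Status.CANCELLED},
    Status.SENT_TO_SUPPLIER: {Status.SHIPPED, Status.CANCELLED},
    Status.SHIPPED: {Status.RECEIVED},
    Status.RECEIVED: {Status.RETURNED},
    Status.RETURNED: set(),
    Status.CANCELLED: set(),
}


class LoanRequest:
    def __init__(self, bib: dict):
        self.bib = dict(bib)  # pre-populated from the search result; editable
        self.history = [Status.SUBMITTED]

    @property
    def status(self) -> Status:
        return self.history[-1]

    def advance(self, new: Status) -> None:
        """Move to a new state, refusing transitions the cycle does not allow."""
        if new not in NEXT[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new.value}")
        self.history.append(new)
```

Keeping the full history, rather than only the current state, is what allows a user to check the progress of a request at any time.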

However, many developments remain to be made and issues to be resolved before online ILL is available to all users; these include:

· copyright
· conflict with existing ILL systems
· relationships with the British Library Document Supply Centre
· adoption of the ILL ISO standard by system suppliers
· reciprocal lending agreements

Inter-library loans are a key area which the clump services have the potential to support, but further development is required.

Future Clump Developments

The primary aim of each of the eLib clump projects is to provide searching of a virtual union catalogue for its consortium sites. However, the wider vision for the clumps has always been to use the cross-searching mechanism as a way of increasing access to resources. For this reason, each of the clump projects investigated the development of a number of service extensions to the central clump search mechanism. These have included the investigation of inter-library loan services (see above), authentication and authorisation facilities, and the manipulation of records to suit the needs of users. Evaluation has shown that users of the clump services (students, academics and subject librarians in particular) would like to see further development of the services. This development would include work on all the issues raised in this article and also on information landscapes and individual user profiles.

Conclusion

To answer the question posed at the start of this article, the (perhaps a little biased!) answer is ‘yes’: the projects have shown that Z39.50 is viable and can be used successfully to produce a union catalogue. Evaluation of the clumps services has indicated that end users are amazed that cross-searching is possible, and are happy to cope with its shortcomings, at least for the moment. Cross-searching in the ‘real world’ will always have to cope with inconsistencies, and it is very unlikely that it will be able to achieve the highest recall and the highest precision, but it does provide a rapid and easy-to-use service.

The projects have generated interest and stimulated development in several areas; a list of successful outcomes of the clump projects includes:

· working Z39.50 services cross-searching heterogeneous databases
· input into development of the Bath Profile
· development of working collection description schemas
· service developments such as access and borrowing policies and inter-library loan services
· guidelines on cataloguing and indexing for interoperability
· blueprints for development of other clumps and for organisations wishing to join a clump
· discussions with library system suppliers about their Z39.50 servers to improve interoperability and add further features
· encouragement for consortium building, providing a forum for co-operation between consortium members at all levels

The clump projects now have a pool of expertise, which they are willing to share with others who might be interested in creating a clump or joining a clump. Please contact any of the authors of this article for information and advice.

Current Status of the Projects

CAIRNS is due to complete its eLib Phase 3 funding activities on 31 December 2000 and thereafter work will continue under the auspices of the Scottish Collections Network Extension (SCONE) Project [23].

http://www.cairns.lib.gla.ac.uk/; Project Co-ordinator Helena Gillis (h.gillis@lib.gla.ac.uk)

MLO carries on until 30 April 2001 and is seeking funding opportunities to become a live service.

http://www.musiconline.ac.uk/

M25 Link has funding until 31 July 2001 at which time the service will go live with all 40 members of the M25 Consortium of Higher Education Libraries as partners.

http://www.m25lib.ac.uk/M25link/

RIDING completed its project period in the spring of 2000 and continues as a live service for the Yorkshire & Humberside Universities Association until the end of 2001.

http://www.riding.ac.uk

References

  1. For information on clumps architecture see: http://www.ukoln.ac.uk/dlis/models/clumps/
  2. The Z39.50 Maintenance Agency at the Library of Congress: http://lcweb.loc.gov/z3950/agency/
  3. Fretwell-Downing Informatics homepage: http://www.fdgroup.com/fdi/company/home.html
  4. The Europagate web gateway home page: http://europagate.dtv.dk/wg_index.htm
  5. Index Data homepage: http://www.indexdata.dk/
  6. The GEOWEB homepage is at: http://www.library.geac.com/default.asp?Page=geoweb-na
  7. The ZAP homepage is at: http://www.indexdata.dk/zap/
  8. The ELiAS homepage is at: http://www.elias.be/
  9. The epixtech homepage is at: http://www.amlibs.com/
  10. Arts and Humanities Data Service gateway: http://ahds.ac.uk:8080/ahds_live/
  11. The MLO experimental JAFER gateway: http://samantha.las.ox.ac.uk:8080/
  12. Information about the Distributed National Electronic Resource (DNER): http://www.jisc.ac.uk/dner/
  13. Interoperability definitions by Paul Miller, the UK Interoperability Focus: http://www.ariadne.ac.uk/issue24/interoperability/
  14. The Bath Profile - An International Z39.50 Specification for Library Applications and Resource Discovery: http://www.ukoln.ac.uk/interop-focus/bath/
  15. CAIRNS cataloguing and indexing recommendations: http://cairns.lib.gla.ac.uk/docs/index.html
  16. SCURL – the Scottish Confederation of University and Research Libraries: http://scurl.ac.uk/
  17. Dovey, Matthew So you want to build a union catalogue? Ariadne, issue 23: http://www.ariadne.ac.uk/issue23/dovey/
  18. The national collection description working group report is at: http://www.ukoln.ac.uk/metadata/cld/wg-report/
  19. BUBL list of subject terms for music: http://link.bubl.ac.uk/music/
  20. Research Support Libraries Programme (RSLP) Collection Description project: http://www.ukoln.ac.uk/metadata/rslp/
  21. For information on the Research Collections Online (RCO) database and clumping, see the article by Dennis Nicholson et al, Cairns that go clump in the night: A description of work in Scottish universities on distributed library systems, one of the 'clump' projects under eLib phase 3, in Library Technology, vol.3(5) November 1998 at: http://www.sbu.ac.uk/litc/lt/1998/news1167.html
  22. The High Level Thesaurus Project (HILT) is at: http://hilt.cdlr.strath.ac.uk/
  23. The Scottish Collections Network Extension (SCONE) project is at: http://scone.strath.ac.uk/

Author Details

 
Verity Brack
RIDING Project Manager
St. George’s Library
University of Sheffield
Email: v.brack@shef.ac.uk
Web site: www.shef.ac.uk
 
John Gilby
M25 Link Project Manager
BLPES
London School of Economics
Email: j.gilby@lse.ac.uk
Web site: www.lse.ac.uk
 
Helena Gillis
CAIRNS Project Co-ordinator
Library
University of Glasgow
Email: h.gillis@lib.gla.ac.uk
Web site: www.gla.ac.uk
 
Marian Hogg
MLO Project Co-ordinator
Library
Trinity College of Music
Email: mlo@tcm.ac.uk
Web site: www.tcm.ac.uk