If the concept of parallel searching of catalogues via Z39.50 is stimulating, its first working manifestation is truly exciting. Maybe not exactly Alexander Graham Bell or Archimedes territory, but life-enhancing nevertheless: to have been working on the implementation of an idea for over twelve months, as the UK eLib clumps projects have, and then suddenly to see bibliographic records returned simultaneously from a search across multiple library catalogues makes all the arguments, stress and technical tinkering seem finally worthwhile. Only as the first exhilaration subsides does reality kick in: the questions start and systematic testing takes over.
There is little doubt that Z39.50, in its current state of development, is not the panacea for all the interoperability and resource discovery ills afflicting librarianship and information science internationally. Many of the problems – and the reasons for them – have been well documented and are, accordingly, common knowledge in the Z39.50 community: three recent papers have appeared in Ariadne alone [1], [2], [3]. This contribution revisits some of the issues and considers their potential effect on end users.
At the same time, the prospect of using clumps to kick-start large-scale resource discovery was a major incentive behind their initiation, particularly as it opened up the possibility of creating a UK virtual union catalogue. In parallel with the clumps, of course, JISC, the Joint Information Systems Committee, has also been funding COPAC, the CURL OPAC [4], as a physical union catalogue, and a study comparing these two approaches to a national holdings database is expected during 2000 (a study of the implications of Z39.50 for COPAC was conducted on behalf of CURL in 1998 [5]). So, given current Z39.50 limitations, will the clumps projects – bearing in mind that they are not yet complete – be consigned to the dustbin of eternity, seen as an interesting side road that inadvertently ended up as a cul-de-sac? Against this, the projects have always insisted that they were not purely technical exercises and that, in judging success or failure, it is necessary to consider not only the impact of the technical limitations – in the knowledge that improvements here will definitely be forthcoming – but also wider issues relating to service.
This paper is an attempt to provide a brief, critical review of the current state of clumping in the UK. It begins by making some comparisons between clumps and the alternative of a physical union catalogue, a comparison that should not be seen in any way as a criticism of COPAC but simply as ‘a view from the clumps’. Having established that clumps are ‘a good thing’, the paper then outlines ways of focusing searching via collection descriptions and dynamic clumping, considers some technical concerns about Z39.50, and ends by looking at the possibilities for interconnecting clumps.
Clumping the UK union catalogue
Catalogues as collections
Apart perhaps from those sectors of society where a lingering idealism remains, the concept of perfection, certainly of the perfect creation, has long since been overtaken by sheer pragmatism, lack of time and economics. The idea of the perfect national union catalogue, a comprehensive, monolithic edifice within which all items for research, education or pleasure can be located quickly and precisely, falls somewhere down the possibility stakes, though, sensibly, that does not stop librarians reinvigorating the idea from time to time and updating it to take account of the latest technologies. The key to a useful national union catalogue is comprehensiveness, making it the ideal jumping-off point for ‘known item, only-one-in-the-country’ searches; the key to a usable national union catalogue combines comprehensiveness with other characteristics such as a good user interface, ease of access, speed of response and currency.
Comprehensiveness in this context might be easy to define – all the books, monographs and other items published by the nation, together with all foreign publications purchased by the nation’s libraries – but creating (the perfection of!) the national union catalogue presents a number of practical hurdles. Not least is the fact that significant collections of interest to staff and students in higher education are held in non-HE libraries and, while cross-sectoral initiatives are increasingly encouraged, the funding required for such a large-scale project is unlikely to be forthcoming in the near future, in spite of the aspirations raised by the possibilities of joined-up government. In other words, a physical national union catalogue will inevitably be a partial national union catalogue. Whether a physical national union catalogue restricted to HE holdings alone would be doomed to failure remains open to speculation. The sheer size of such an undertaking would appear daunting: CURL currently has around 20 full members, representing approximately 20% of HE institutions in the UK, which leaves COPAC significantly short on the comprehensiveness front. Furthermore, as the collections of the major UK research libraries are the foundation of COPAC, the law of diminishing returns would be expected to kick in before too long, with each new library representing a significant additional implementation workload for the data centre while providing a decreasing number of unique items for the union database.
The importance of COPAC as a major tool used internationally by researchers, librarians and others in checking the existence and location of books and when searching for bibliographic records is beyond doubt. This pre-eminence will remain whether or not COPAC expands. As stated above, COPAC owes its importance to the significance of the collections of its founding libraries but, even within the overall database, these collections could be viewed as related but distinct (and sometimes overlapping) groups: the legal deposit libraries; universities having research excellence in particular subject areas; universities that have gained excellent teaching quality marks in particular subject areas; and university libraries with special collections. From this perspective, a physical union catalogue like COPAC is a ‘clump of clumps’ offering centralised access through a common interface, though in this scenario there is still a risk that the ‘unique item’ search will be unsuccessful because economic forces limit comprehensiveness. A further limitation on comprehensiveness is the retrospective cataloguing backlog in many university libraries, though this will affect physical and virtual union catalogues equally.
The library collections in the eLib clumps reflect a similar – if currently smaller – spread. RIDING [6], for example, contains a deposit library (BLDSC) and two CURL members (Leeds and Sheffield) together with other strong research and teaching institutions and a broad mix of special collections: a strong clump with a lower national comprehensiveness factor than COPAC, though with the advantage that it includes catalogues not in COPAC. A combination of clumps would increase the comprehensiveness factor while bringing in even more non-CURL libraries. And, as COPAC itself is Z39.50-compliant, it could be searched as an additional database in parallel with the clumps: at this point the comprehensiveness factor increases significantly, as, consequently, do the chances of locating that one-off unique item. Further opportunities to improve comprehensiveness lie with those clumps that form strong links with public libraries, providing access to a myriad of special collections that might otherwise be lost to higher education. RIDING and CAIRNS [7] expect to be exemplars of cross-sectoral clumps though, as funding becomes available, there is a strong possibility that public libraries will form their own clumps which will, in turn, link to those from HE. Treating a national union catalogue (physical or virtual) as a group of collections that can be combined in different ways, depending on the needs of the searcher, offers some potential for the unique item search, which could be ‘fired off’ to those collections most likely to match the topic of the item and then broadened out if that focused search failed. Ways of achieving this in a clumped environment, through the use of collection descriptions and dynamic clumping, are indicated in the next main section.
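The ‘fire off, then broaden out’ strategy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from any clump gateway: the catalogue names, the grouping of collections into tiers and the `toy_search` function (a substring match standing in for a real Z39.50 search) are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a Z39.50 search of one target:
# (target name, query) -> list of matching titles.
SearchFn = Callable[[str, str], List[str]]


def broadening_search(query: str,
                      tiers: Sequence[Sequence[str]],
                      search: SearchFn) -> Dict[str, List[str]]:
    """Search the most likely collections first, widening only on failure.

    `tiers` is an ordered list of target groups, most relevant first.
    Returns hits (keyed by target) from the first tier that yields any;
    an empty dict means every tier was tried without success.
    """
    for tier in tiers:
        # Query every target in this tier (a real gateway would do this in
        # parallel) and keep only the targets that returned something.
        results = {target: search(target, query) for target in tier}
        hits = {target: found for target, found in results.items() if found}
        if hits:
            return hits
    return {}


# Toy holdings for hypothetical catalogues.
HOLDINGS = {
    "textile-special-collection": ["Wool and worsted in the West Riding"],
    "home-cat-a": ["Railways of the Pennines"],
    "home-cat-b": ["Wool trade ledgers"],
    "remote-cat": ["Pennine packhorse routes"],
}


def toy_search(target: str, query: str) -> List[str]:
    """Case-insensitive substring match over a target's toy holdings."""
    return [t for t in HOLDINGS.get(target, []) if query.lower() in t.lower()]


TIERS = [
    ["textile-special-collection"],   # collections most likely to match
    ["home-cat-a", "home-cat-b"],     # the rest of the home clump
    ["remote-cat"],                   # other clumps
]

print(broadening_search("wool", TIERS, toy_search))
print(broadening_search("packhorse", TIERS, toy_search))
```

Searching stops at the first tier that produces hits, so for ‘wool’ the home-clump catalogues are never queried; that saving in network and server load is the point of focusing the search before broadening it.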
Union catalogues and services
One of the key differences between virtual and physical union catalogues appears to revolve around service. The four UK clumps projects are being developed by consortia, and a consortium implies a group of libraries working together to provide, as far as possible, extended co-operative services that benefit all members. A consortium implies trust and the willingness to test new ideas and services with others, while offering degrees of protection for one’s own institution and users. This is just what CURL, as the originator of COPAC, has done, and it has explored – though not yet initiated – interlending possibilities as an extension of its database. As far as the clumps are concerned, extended services – including access, interlending and sometimes borrowing – have been closely tied to each Z39.50 gateway. For example, RIDING has created its RAP, the RIDING Access Policy, negotiated during the project, which allows all ‘accredited researchers’ (postgraduate students, researchers, academic staff) to borrow from any member library on production of a RIDING Access Card; interlending services associated with the clump gateway are also being piloted among sites.
Services such as these require wide involvement at all staffing levels and across many functional areas before being put into effect, with the consequence that the clumps have not developed as isolated projects but have been integrated into existing services to real users at all member libraries. A key advantage of this approach is that staff across each consortium know each other well, understand and support the aims of the clump and accordingly respond to users’ needs in a pro-active manner. By comparison, it is suspected that, as the physical catalogue model grows to improve its comprehensiveness, it will become monolithic, necessarily separated from real services, real users and individual libraries, and will interpose a layer of impersonality and, potentially, bureaucracy between the end user and the item required. Certainly, compared with an integrated resource discovery and request service such as that being developed with Fretwell-Downing Informatics for RIDING, where an item identified in a catalogue search can be dropped into an interloan request form and dispatched to the home library for mediation, the physical union catalogue model leaves the bibliographic record of an item more remote from its actual use.
Closely associated with this stronger service orientation of the virtual catalogue is the ability to customise the gateway interface so that it more closely reflects the needs of librarians and users in the home consortium. This might include information on consortium services, details of individual libraries and their collections (though see below), descriptions of the limitations of individual Z-targets’ responses, and authorisation and authentication procedures. The way targets are presented to users may be changed to reflect consortial circumstances, and links may be added to other clumps, some with inter-consortial relationships and some without. Similarly, greater ‘home’ control brings the possibility of adding other Z39.50-compliant resources, whether non-catalogue or non-bibliographic, widening the range of the gateway and its potential usefulness.
Refining and focusing
Moving from the concept of the virtual union catalogue – the clump as ‘a good thing’ – to a practical manifestation free of technical problems is not a seamless operation, and some of the issues surrounding the transition are covered in the next two sections. But even if the technology were flawless and clumps could themselves readily be clumped, how can users navigate through the jungle to find the tree they need? Little enough is known about the links between user needs, search behaviour and results, and the present generation of web search engines does not always give cause for celebration, encouraging as it does undirected searching. This approach does not help users, particularly the increasing numbers with limited time, and it potentially puts increased loads on networks and servers, the effects of which, this being an emergent technology, have not yet been tested. It could, however, reasonably be predicted that a virtual national union catalogue that ground networks to a halt, slowed individual library management systems to a crawl through external load, and wasted users’ time would not be an unqualified success. From a different perspective, extremely high demand could also degrade both the network connections to, and the machine performance of, a physical union catalogue. Methods that enable users to refine searches of the virtual union catalogue should alleviate these problems, and current attempts focus on the identification of collections. The use of collection-level descriptions is being investigated by RIDING, and the rather different approach of dynamic clumping being pioneered by CAIRNS is described by Dennis Nicholson in this issue of Ariadne.
Through the use of collection descriptions, it is hoped that users will discover collections of interest and then search just the relevant databases rather than all those available. Furthermore, it is hoped that users will be able to perform searches across multiple collections in a controlled way, and that software will perform such tasks on the basis of known user preferences. The eLib Collection Description Scheme has been developed as 29 metadata elements [8] and is being implemented in RIDING; ways of encouraging its use prior to a clump search, particularly through interface design, are under discussion.
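One way software might act on collection descriptions and known user preferences is to rank collections by how many of their subject terms match a user’s stated interests, and then search only the catalogues holding the best matches. The sketch below assumes a drastically simplified record with just a name, a target and a set of subject terms; these field names are invented for illustration and are not elements of the eLib scheme, and the collections and catalogue names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CollectionDescription:
    """A drastically simplified, hypothetical collection-level record.

    Only three illustrative fields are modelled; their names are invented
    for this sketch and are not elements of the eLib scheme.
    """
    name: str
    target: str                                  # catalogue holding the collection
    subjects: Set[str] = field(default_factory=set)


def select_targets(descriptions: List[CollectionDescription],
                   interests: Set[str], limit: int = 3) -> List[str]:
    """Return up to `limit` targets whose collections best match `interests`.

    Collections are scored by the number of shared subject terms; those
    with no overlap are left out rather than searched 'just in case'.
    """
    wanted = {s.lower() for s in interests}
    scored = []
    for d in descriptions:
        overlap = len(wanted & {s.lower() for s in d.subjects})
        if overlap:
            scored.append((overlap, d.target))
    # Best score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    targets: List[str] = []
    for _, target in scored:
        if target not in targets:   # one catalogue may hold several collections
            targets.append(target)
    return targets[:limit]


DESCRIPTIONS = [
    CollectionDescription("Quaker archives", "cat-a", {"Quakers", "Religion", "Yorkshire"}),
    CollectionDescription("Textile trade records", "cat-b", {"Textiles", "Yorkshire"}),
    CollectionDescription("Mill architecture", "cat-b", {"Textiles", "Architecture"}),
    CollectionDescription("History of medicine", "cat-c", {"Medicine"}),
]

print(select_targets(DESCRIPTIONS, {"yorkshire", "textiles"}))  # ['cat-b', 'cat-a']
```

Simple term overlap is only one possible ranking; a real service would probably weight subject strength or use controlled vocabularies, but the effect is the same: fewer, better-chosen targets per search.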
Is Z39.50 fit for purpose?
Several papers have dealt with the technical issues surrounding Z39.50, considering in particular some of the deficiencies of the protocol and the difficulties of implementing it across a range of library management systems, much as the clumps projects have had to do. Results of testing the clumps in their present state of development indicate similar inconsistencies in the bibliographic data retrieved, and the potential for confusion when these records are returned to end users gives some cause for concern.
The RIDING experience of analysing Z39.50 search results is fairly typical. It suggests that substantial inconsistency occurs in the response of targets to author searches: while a simple surname search invariably delivers the expected goods, though with some data overload if the surname is a common one, there are pronounced differences between targets once first names are introduced into the search string. In these cases, some targets return zero hits whatever the format of the first name, others return a reduced subset when the first initial is used, and others return a different subset again when the first name is given in full. Attempting to increase search specificity by combining search terms – say author and title – similarly works well with some targets and fails completely with others.
With a knowledge of the operation of the Z39.50 attributes, their combinations and order, the methods behind the construction of the indexes of individual library management systems, and the other reactions of each target, responses such as those described above can be explained. It is also possible that, with ‘query adaptation’, results can be matched more closely to expectations by adjusting the values of attributes and the order in which they are transmitted to targets. If such adaptations do not work, however, analytical explanations of target performance serve little purpose when the signals sent to end users undermine their confidence. One view – already expressed by more than one RIDING librarian – is that the users of our library systems possess well-honed searching skills and that any move away from the excellent capabilities of the existing systems will result in negative feedback and bad clump publicity. Alternatively, the overwhelming use of Web search engines, with an acceptance of high recall and low relevance, must also have affected the expectations of users of library OPACs. It is extremely difficult to predict reactions to new systems, and it is just possible that users will bring to Z39.50 gateways a tolerance not currently shown by some expert searchers of OPACs. This tolerance, coupled with a little experience, familiarity and understanding – the qualities needed in life – may elicit an enthusiasm born of the additional possibilities and services provided, in spite of inherent limitations. The clumps intend to obtain feedback on these issues during evaluation.
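Query adaptation can be illustrated with the author-search behaviour reported above. The sketch assumes a hypothetical per-target profile, recorded after testing, of how much of a first name each target copes with, and builds a query in YAZ-style prefix query format (PQF) using Bib-1 attributes (Use 1003 = Author; Structure 1 = phrase, 2 = word). The profile fields, target names and name formatting are illustrative assumptions; real targets vary in ways a two-field profile cannot capture.

```python
from dataclasses import dataclass

AUTHOR_USE = 1003                       # Bib-1 Use attribute (type 1): Author
STRUCTURE = {"phrase": 1, "word": 2}    # Bib-1 Structure attribute (type 4)
NAME_FORMS = ("surname", "initial", "full")


@dataclass(frozen=True)
class TargetProfile:
    """Hypothetical record of how one target handles author searches.

    name_form: how much of a first name the target copes with:
      'surname' (send the surname only), 'initial' or 'full'.
    structure: whether the target's author index expects 'phrase' or 'word'.
    """
    name_form: str = "full"
    structure: str = "phrase"


def author_query(surname: str, forename: str, profile: TargetProfile) -> str:
    """Build a PQF author query adapted to one target's known behaviour."""
    if profile.name_form not in NAME_FORMS:
        raise ValueError(f"unknown name form: {profile.name_form!r}")
    if profile.name_form == "surname" or not forename:
        term = surname                       # avoid the zero-hit case
    elif profile.name_form == "initial":
        term = f"{surname}, {forename[0]}"
    else:
        term = f"{surname}, {forename}"
    return f'@attr 1={AUTHOR_USE} @attr 4={STRUCTURE[profile.structure]} "{term}"'


PROFILES = {
    "target-a": TargetProfile(name_form="surname", structure="word"),
    "target-b": TargetProfile(name_form="initial"),
    "target-c": TargetProfile(),
}

for name, profile in PROFILES.items():
    print(name, author_query("Smith", "John", profile))
```

The same user search thus becomes three different queries; the cost is that the gateway must keep each profile up to date as targets and their indexes change.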
This is not to say that the clumps projects should (or will) rest on their laurels and fail to develop better searching possibilities within the current restrictions of the Z39.50 standard: all the projects are committed to implementing the best gateway possible. But when all the technical steps have been taken, there will probably remain a need for pop-up boxes, help screens and other devices to explain use and limitations, at least in the immediate future. RIDING, for one, is developing these as an active service to users, and Ridley [2] has indicated similar support mechanisms in BOPAC (the Bradford [or ‘Better’] OPAC) to clarify the retrieval of items that do not, at first sight, appear to match the search criteria. Other help is also at hand: there is little doubt that the formerly fragmented world of Z39.50 implementations has benefited tremendously from the growth of international co-operation over the last few years, and the recently announced Bath Profile [9] is the latest initiative designed to bring greater harmony to the interpretation and use of attribute combinations and to catalogue interoperability.
One of the other current limitations of Z39.50 systems, the general lack of holdings and circulation data, should soon have a platform for resolution. The ZIG (Z39.50 Implementors Group) Holdings Schema [10] is expected to be ratified in January 2000, and some vendors are committed to implementing it within six months, though others may take longer (some targets can already return at least holdings details). Interpreting holdings information is further complicated by variations in cataloguing practice between (and sometimes within) institutions.
Clumping the clumps
While the previous section indicates a generally optimistic outlook for future developments in Z39.50, the clumps projects have spent much of their two-year (or so) lifespan resolving and clarifying technical issues to ensure that gateways to their groups of resources are up and running. Moving these individual clumps towards the beginnings of a virtual national union catalogue by clumping the clumps ramps up the complexity and has yet to be explored in practice. It is hoped that the comments below will contribute to that process.
The addition of an external resource to an existing clump, for either one-off or regular searching, is straightforward: a secondary list of potential targets – COPAC, the Library of Congress, etc. – can be provided and users select those of relevance (possibly guided by collection descriptions). For the duration of the search, the additional resources become, in effect, part of the original clump. In exactly the same way, the secondary list could incorporate other clumps, each of which would behave as a single target to the originating gateway. It is assumed, in this scenario, that the originating gateway would not need to know the attribute settings used by the secondary clumps for their individual targets: the gateway would be treated just like any other end user. There is also the question of resource and network efficiency, which may make it preferable to force a search of the home clump first. The way the existing clumps have developed gives good reason for this: a clump such as RIDING has been established to offer services to the combined clientele of its institutions. For example, non-RIDING members may search library catalogues via the RIDING Gateway but will not be eligible for added-value services such as ILL. In this way, the home clump may be considered an access tool for members but a resource discovery tool for non-members.
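The idea that a secondary clump ‘would behave as a single target to the originating gateway’ corresponds to the composite pattern: a clump exposes the same search interface as a single catalogue, so clumps can contain clumps. The sketch below is hypothetical throughout (class names, catalogue names and toy holdings), with a substring match in place of a real Z39.50 search; it also searches the home clump first, as suggested above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Protocol, Sequence, Tuple

Results = Dict[str, List[str]]   # catalogue name -> matching titles


class Target(Protocol):
    """Anything searchable: a single catalogue or a whole clump."""
    name: str

    def search(self, query: str) -> Results: ...


class Catalogue:
    """Stand-in for one Z39.50 target, holding a toy list of titles."""

    def __init__(self, name: str, titles: Sequence[str]) -> None:
        self.name = name
        self.titles = list(titles)

    def search(self, query: str) -> Results:
        hits = [t for t in self.titles if query.lower() in t.lower()]
        return {self.name: hits} if hits else {}


class Clump:
    """A group of targets that itself behaves as a single target.

    Callers see only search(); how the clump tunes queries for its own
    members stays inside it, so another gateway can use it like an end user.
    """

    def __init__(self, name: str, members: Sequence[Target]) -> None:
        self.name = name
        self.members = list(members)

    def search(self, query: str) -> Results:
        merged: Results = {}
        with ThreadPoolExecutor() as pool:          # members searched in parallel
            for partial in pool.map(lambda m: m.search(query), self.members):
                merged.update(partial)
        return merged


def gateway_search(home: Clump, secondary: Sequence[Target],
                   query: str) -> List[Tuple[str, Results]]:
    """Search the home clump first, then each secondary resource chosen."""
    ordered = [(home.name, home.search(query))]
    ordered += [(s.name, s.search(query)) for s in secondary]
    return ordered


HOME = Clump("home", [Catalogue("lib-a", ["Wool trade ledgers"]),
                      Catalogue("lib-b", ["Railways of the Pennines"])])
OTHER = Clump("other-clump", [Catalogue("lib-c", ["Wool and worsted"])])
UNION = Catalogue("union-cat", ["Wool: a history"])

for source, hits in gateway_search(HOME, [OTHER, UNION], "wool"):
    print(source, hits)
```

Because `Clump.search` returns results keyed by catalogue, a clump of clumps simply flattens its members’ results, and the originating gateway never sees how a secondary clump tunes queries for its own targets.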
The practicalities of this scenario need testing clump to clump but, on the surface, would seem to present few problems. The most effective way of presenting other clumps to end users needs addressing, as do the requirements of end users when searching secondary clumps. These might include: distance from study base (‘I’m prepared to travel x miles or for y hours’); a geographical zone (‘I’m going to London for the weekend; which libraries will offer…’); and resource strengths. Resolving these questions will raise user interface issues and the possibility of extending collection descriptions to incorporate geographic and travel information. In terms of technical efficiency, this approach would put less load on any one node (e.g. the home clump) by ‘stacking’ servers, though it does not reduce the overall network traffic or the load on the resources being searched. Indeed, it could increase these by encouraging users to search the whole of a remote clump, or even to search the same resource twice where, for example, the same catalogue was included in both a geographic clump and a subject clump.
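Two of the requirements above, a travel limit from the user’s study base and avoiding a second search of a catalogue that belongs to both a geographic and a subject clump, can be combined in a simple planning step. This sketch assumes that collection descriptions have been extended with a latitude and longitude for each library (a hypothetical extension), identifies a target by host, port and database name, and uses great-circle distance as a crude stand-in for travel time. Coordinates and host names are illustrative only; 210 is the registered Z39.50 port.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

LatLon = Tuple[float, float]


@dataclass(frozen=True)
class Target:
    """A Z39.50 database plus a hypothetical location extension.

    lat/lon assume collection descriptions have been extended with the
    library's position; this is not part of any existing scheme.
    """
    host: str
    port: int
    database: str
    lat: float
    lon: float


def km_between(a: LatLon, b: LatLon) -> float:
    """Great-circle (haversine) distance in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def plan_search(clumps: Dict[str, Iterable[Target]],
                base: LatLon, max_km: float) -> List[Target]:
    """Targets within range of `base`, each listed once even if it belongs
    to several clumps (e.g. both a geographic and a subject clump)."""
    seen = set()
    plan: List[Target] = []
    for members in clumps.values():
        for t in members:
            key = (t.host, t.port, t.database)   # identity of a Z39.50 database
            if key in seen:
                continue
            seen.add(key)
            if km_between(base, (t.lat, t.lon)) <= max_km:
                plan.append(t)
    return plan


LEEDS = (53.80, -1.55)   # illustrative study base
SHEFFIELD = Target("cat-s.example.org", 210, "sheffield", 53.38, -1.47)
LONDON = Target("cat-l.example.org", 210, "london", 51.51, -0.13)
CLUMPS = {"geographic": [SHEFFIELD], "subject": [SHEFFIELD, LONDON]}

print([t.database for t in plan_search(CLUMPS, LEEDS, max_km=100)])  # ['sheffield']
```

The de-duplication addresses the extra load noted above; replacing straight-line distance with real travel times would need data that collection descriptions do not currently carry.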
Of course, this straightforward scenario is simplistic. Few users will need to search all the catalogues of a secondary clump, particularly once effective collection descriptions are in place. Their needs are more likely to be met by a pick-and-mix approach: choosing catalogues from the home clump, adding a further resource from a secondary clump and yet another database from a third clump. The question then arises, ‘How do you do this technically?’: in the absence of universal adoption of the Bath Profile, can the attribute settings, carefully adjusted to provide efficient searching from one clump gateway, be passed ‘on the fly’ to external gateways? A further question is, ‘How do you do this in user interface terms?’: what is the best way of presenting clumps to be used in a pick-and-mix manner? The relevance of these issues to national clumping discussions has become apparent during the existing projects, but current timescales and budgets will not allow their exploration in detail. Additional work is required to take this further.
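One possible answer to the technical question is for each clump gateway to export the attribute overrides it has tuned for its own targets, so that a pick-and-mix gateway can merge them and apply each target’s settings on the fly. The export format, the baseline attribute set and all target names below are assumptions made for illustration; the baseline is not the Bath Profile. Bib-1 attribute types 1 to 6 are Use, Relation, Position, Structure, Truncation and Completeness, and queries are written in YAZ-style PQF.

```python
from typing import Dict, List, Tuple

# Bib-1 attribute types: 1 Use, 2 Relation, 3 Position,
# 4 Structure, 5 Truncation, 6 Completeness.
Attrs = Dict[int, int]

# Hypothetical baseline for an author search (Use=Author, Structure=word).
# This is an assumption for the sketch, not the Bath Profile.
BASELINE_AUTHOR: Attrs = {1: 1003, 4: 2}


def merge_exports(exports: Dict[str, Dict[str, Attrs]]) -> Dict[str, Attrs]:
    """Combine the per-target overrides exported by several clump gateways.

    If two clumps tune the same target differently, the later export wins.
    """
    registry: Dict[str, Attrs] = {}
    for per_target in exports.values():
        for target, overrides in per_target.items():
            registry.setdefault(target, {}).update(overrides)
    return registry


def pick_and_mix(selection: List[str], registry: Dict[str, Attrs],
                 term: str) -> List[Tuple[str, str]]:
    """Build one PQF query per chosen target: the baseline attributes,
    overlaid with any overrides that target's home clump exported."""
    plan = []
    for target in selection:
        attrs = {**BASELINE_AUTHOR, **registry.get(target, {})}
        prefix = " ".join(f"@attr {t}={v}" for t, v in sorted(attrs.items()))
        plan.append((target, f'{prefix} "{term}"'))
    return plan


EXPORTS = {
    "clump-x": {"cat-1": {4: 1}},   # clump-x found cat-1 needs phrase structure
    "clump-y": {"cat-2": {5: 1}},   # clump-y found cat-2 needs right truncation
}

for target, query in pick_and_mix(["cat-1", "cat-2", "cat-3"],
                                  merge_exports(EXPORTS), "smith"):
    print(target, query)
```

Targets that no clump has tuned fall back to the baseline, which is where a widely adopted profile such as the Bath Profile would help. When two clumps export conflicting settings for the same target, this sketch simply lets the later export win; a real service would need an agreed policy for that case.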
At present, both the physical and the virtual routes to a national union catalogue have limitations. In terms of today’s technology, the physical model predominates, but it would require the resolution of a number of organisational and political issues to provide the comprehensiveness expected. For example, if the physical model were to be based on COPAC, CURL would have to agree a way forward for a significant expansion of the database in addition to discussing methods for incorporating non-HE records. Technology for the virtual national union catalogue is moderately sound as far as it goes (basic search and retrieve), while tomorrow’s developments will lead to improved interoperability and services to users via initiatives such as the Bath Profile, the ZIG Holdings Schema, further refinements to collection descriptions and the use of dynamic clumping. A considerable amount of work remains to be done to develop inter-clumping and its associated user interface, but the clumps do indicate an attractive way forward, based as they are on services, customisation for their home client groups and the possibility of extensions to include non-bibliographic databases. What is clear is the tremendous enthusiasm on the part of both physical and virtual clumpers for moving the technology forward, and an awareness that they need each other to achieve real changes in services for end users.
References
- [1] Dye, Juliet and Harrington, Jane (1999), Clumps in the real world: what do users need? Ariadne, no. 20. http://www.ariadne.ac.uk/issue20/clumps-workshop/
- [2] Ridley, Mick (1999), Practical clumping. Ariadne, no. 20. http://www.ariadne.ac.uk/issue20/bopac/
- [3] Miller, Paul (1999), Z39.50 for all. Ariadne, no. 21. http://www.ariadne.ac.uk/issue21/z39.50/
- [4] COPAC: http://copac.ac.uk/copac/
- [5] CURL feasibility study to investigate potential applications and strategic implications of Z39.50 technology on the COPAC service: http://www.curl.ac.uk/z39-50_project.htm
- [6] RIDING: http://www.riding.ac.uk/
- [7] CAIRNS: http://cairns.lib.strath.ac.uk/
- [8] Brack, Verity (2000), The eLib Collection Description Scheme. Vine, no. 116 (due January).
- [9] The Bath Group (1999), The Bath Profile: an international Z39.50 specification for library applications and resource discovery, 15 October 1999. http://www.ukoln.ac.uk/interop-focus/activities/z3950/int_profile/bath/draft/
- [10] Z39.50 Holdings Schema: http://lcweb.loc.gov/z3950/agency/holdings.html
- The next CLUMPS event is Library Resource Sharing and Discovery: Catalogues for the 21st Century, a one-day workshop (held in two locations, London and Glasgow) presented by the eLib Clump Projects and co-ordinated by UKOLN. The London event is on 3 March and the Glasgow event on 11 April. Further details are available at: http://www.ukoln.ac.uk/events/elib-clumps-2000/intro.html