Web Magazine for Information Professionals

Collection Description Focus Showcase: Mapping the Information Landscape

Verity Brack reports on this one-day showcase of Collection Description projects and services held at the British Library, London, 25 March 2003.

The national Collection Description Focus is based at UKOLN [1] and funded by the British Library [2], the Joint Information Systems Committee (JISC) [3], the Research Support Libraries Programme (RSLP) [4] and Resource [5]. It aims to improve co-ordination of work on collection description methods, schemas and tools, with the goal of ensuring consistency and compatibility of approaches across projects, disciplines, institutions and sectors.

The Showcase Event

There is an increasing emphasis on mapping and managing information resources in a way that will allow users to find, access, use and disseminate resources from a range of diverse collections. JISC’s vision of a managed “information environment” and new programmes of digital content creation have highlighted the role of collection-level description. This one-day event focussed on the challenge of bringing together distributed content and how collection-level description can help to facilitate the process.

The programme included input from several major strategic initiatives including the New Opportunities Fund portal (EnrichUK) [6], JISC’s Information Environment Service Registry (IESR) [7], the CC-Interop (COPAC/Clumps Continuing Technical Cooperation) project [8] and the Natural History Museum’s Collection Level Description project [9]. The day also featured demonstrator and pilot services from a wide range of collection-level description projects.

Programme

Chris Batt, Director of the Libraries and Information Science Team at Resource, welcomed us to a day of ‘acronym heaven’, and this proved to be a remarkably accurate forecast.

The day started with a presentation by Chris Anderson, New Opportunities Fund, and Pete Dowdell, UKOLN, who are collaborating on the development of EnrichUK, a portal for cultural, social, artistic, and historical material from the UK. The NOF-Digi project, the largest publicly funded content creation project in the world, has created a huge amount of digital material - 1 million pages of text, 400,000 images, and thousands of film and audio clips. 150 websites linked to 500 contributors have been developed and the EnrichUK portal will bring these together in a ‘one-stop content showcase’. Unlike many other collections of digital material, these are aimed at the informal, casual learner, not the educational sector. The EnrichUK model takes a simple approach with three main descriptions: project, collection and agent. For collection description they have produced their own very simple schema, based on the RSLP work but with some of the RSLP fields renamed, e.g. ‘subject’ is now called ‘topics’. Controlled terms are used for subject, geographic coverage, collection type, and language. The user interface has been designed to provide easy browsing and searching, and the input interface for collection description contributors is also simple. Despite the simplicity, errors in data input are still found, often due to a misunderstanding of the underlying concept, i.e. what a collection is.
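The idea of a simple collection record whose subject, geographic coverage, collection type and language fields are restricted to controlled terms can be sketched roughly as follows. This is only an illustration: the field names, the term lists and the validate helper are all assumptions made for the example, and none of them is taken from the actual EnrichUK schema or input interface.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical controlled vocabularies. The real EnrichUK term lists
# are not given in this report; these values are placeholders.
CONTROLLED_TERMS = {
    "topics": {"history", "art", "science"},
    "geographic_coverage": {"England", "Scotland", "Wales", "Northern Ireland"},
    "collection_type": {"image", "text", "audio", "film"},
    "language": {"en", "cy", "gd"},
}


@dataclass
class CollectionDescription:
    """A collection record linked to a project and an agent (names assumed)."""
    title: str
    project: str
    agent: str
    topics: List[str] = field(default_factory=list)  # RSLP 'subject', renamed
    geographic_coverage: List[str] = field(default_factory=list)
    collection_type: List[str] = field(default_factory=list)
    language: List[str] = field(default_factory=list)


def validate(record: CollectionDescription) -> List[Tuple[str, str]]:
    """Return (field, value) pairs whose value is not a controlled term."""
    problems = []
    for name, allowed in CONTROLLED_TERMS.items():
        for value in getattr(record, name):
            if value not in allowed:
                problems.append((name, value))
    return problems


example = CollectionDescription(
    title="Example pottery images",
    project="Example project",
    agent="Example museum",
    topics=["history", "pottery"],
    collection_type=["image"],
    language=["en"],
)
print(validate(example))
```

A check of this kind catches values outside the term lists, but it cannot catch the conceptual error mentioned above, where a contributor describes something that is not really a collection.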

Neil Thomson of the Natural History Museum followed with an insight into biodiversity collections and the work being carried out at national and international levels. Biodiversity collections come in several different types - preserved specimens in museums, observational data, and living collections, for example, in gardens. The type of user also varies widely, from academic researchers, through naturalists and natural history societies, to the general public. Catering for all of these is quite a challenge. Collection Navigator is the museum’s new resource discovery tool, allowing access to the Natural History Museum’s 70 million specimens, 10,000 archive items, and over 1 million titles. BioCASE, a European-funded project, is bringing together collection records from national nodes in 31 countries, and developing a collection-level profile. This profile is based on a simplified version of the EAD (Encoded Archival Description) schema and uses a variation of Dublin Core known as Darwin Core (a profile for search and retrieval of natural history collections and observation databases). The challenges arising from this development, apart from actually obtaining the data, were already sounding familiar - the problem of names, particularly acute in the natural history field where items have scientific names and common names, as well as synonyms, and the problem of names for geographic locations, particularly ‘fuzzy’ names, e.g. the Mediterranean Basin.
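One common way of handling the names problem is to index every variant (scientific name, synonym, common name) against a single accepted name. The sketch below illustrates that idea only; the NAME_INDEX table and the resolve function are hypothetical and are not drawn from Collection Navigator or BioCASE.

```python
from typing import Optional

# Illustrative index: every variant points to one accepted scientific name.
ACCEPTED = "Puma concolor"
NAME_INDEX = {
    "puma concolor": ACCEPTED,
    "felis concolor": ACCEPTED,  # older synonym
    "cougar": ACCEPTED,          # common names
    "puma": ACCEPTED,
    "mountain lion": ACCEPTED,
}


def resolve(name: str) -> Optional[str]:
    """Return the accepted name for a variant, or None if it is unknown.

    Lookups ignore case and collapse repeated whitespace.
    """
    key = " ".join(name.lower().split())
    return NAME_INDEX.get(key)


print(resolve("Mountain  Lion"))
print(resolve("Felis concolor"))
```

A lookup table like this does nothing for ‘fuzzy’ geographic names such as the Mediterranean Basin, which have no single agreed boundary to map to.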

The third presentation was by Dennis Nicholson and Gordon Dunsire of the CC-Interop and SCONE projects (among others). CC-Interop developed from the e-Lib Clumps projects and is looking at interoperability with COPAC. Two of the clumps, CAIRNS and RIDING, are being used with the SCONE collection level description service to investigate collection description standards requirements, and to compile cataloguing and indexing standards in the clumps. They have surveyed and compared clumps, collection description services, and draft schemas; the resulting report includes a comparative data dictionary for the schemes so that levels of metadata structure can be compared and mapped. All the schemes were then compared with the RSLP model, and a high degree of structural compatibility was found. However, there is less compatibility in content standards, one of the issues being that of granularity (what is a collection). Other interoperability issues were discussed: collection identifiers, names, physical location data elements, and date ranges (there is confusion over what is meant: the date of collection, the date of production, or the subject date. For example, what is the date of a collection of 18th century books about classical Rome that were collected in the 19th century?). Service-level description is a necessary partner to many collection-level descriptions, but is still at an embryonic stage (and probably complex). Future development with SCONE would involve using additional attributes and data elements such as agent administrator, language of audio material, education level of content, and classification or subject scheme used. Finally, a complicated diagram of the co-operative infrastructure for Scotland, incorporating at least 18 acronyms, reminded us that interoperability applies to people as well!
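The date-range confusion can be made concrete by recording each kind of date separately. The following sketch applies this to the 18th century books example; the field names and the overlaps helper are assumptions for illustration, not elements of the SCONE or RSLP schemas, and the year range used for classical Rome is a rough placeholder.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

YearRange = Tuple[int, int]  # (start year, end year), inclusive


@dataclass
class CollectionDates:
    """Three distinct date ranges for one collection (field names assumed)."""
    accumulation: Optional[YearRange] = None  # when the items were collected
    production: Optional[YearRange] = None    # when the items were made
    subject: Optional[YearRange] = None       # the period the items are about


def overlaps(a: Optional[YearRange], b: YearRange) -> bool:
    """True if range a is recorded and shares at least one year with range b."""
    return a is not None and a[0] <= b[1] and b[0] <= a[1]


# 18th century books about classical Rome, collected in the 19th century.
# The subject range is an illustrative placeholder only.
rome_books = CollectionDates(
    accumulation=(1801, 1900),
    production=(1701, 1800),
    subject=(-753, 476),
)

query = (1750, 1760)
print(overlaps(rome_books.production, query))    # the books were printed then
print(overlaps(rome_books.accumulation, query))  # but not collected then
print(overlaps(rome_books.subject, query))       # and are not about that period
```

With a single undifferentiated date field, the same query would match or miss this collection depending on which of the three meanings the cataloguer happened to have in mind.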

Fourth came Susi Woodhouse and Nick Poole from Resource to give us a national perspective on collection-level descriptions, including EnrichUK, Cornucopia [10], with over 2500 collections from the cultural sector, Cecilia [11] from the music sector, currently with 1500 collections from 500 institutions, and Crossroads [12], a pilot project in the West Midlands incorporating pottery collections from Stoke. Development of collection-level descriptions centred on the RSLP schema with the additional issues of legacy terminology and interoperability between these different projects. User issues were not forgotten, as these services are aimed at a wide audience, and the concept of a ‘Google for Collections’ was suggested - most users just want a quick answer and are reluctant to search in depth, a point picked up later in the discussion.

The final session before the lunch break saw Rachel Bruce, from JISC, describe middleware for the JISC Information Environment. Currently, an end user of JISC services has to access them separately and via differing interfaces, but in the near future the Information Environment aims to provide integrated access. Middleware will provide machine-readable information about services, content, rights and users to enable a user interface, such as a portal, to interact appropriately. Collection descriptions are part of this middleware, and the Information Environment Service Registry (IESR) [7] is a pilot project set up to explore and develop collection and service metadata for the Information Environment. It is the intention that the IESR will be used both by machines and by humans, and again the RSLP schema is being used as a basis for the collection descriptions.

Lunchtime gave us the opportunity to try out so many of the Showcase demonstrator and pilot projects that little time remained to stoke up for the afternoon sessions! The afternoon kicked off with Bridget Robinson and Ann Chapman, of the Collection Description Focus, presenting a potted history of the development of collection level description, from its beginnings in the e-Lib programme through the development by the RSLP, to the current situation. The RSLP analytical model of collections and their catalogues has formed the basis of much current work. Obviously, applying a theoretical model to real-life situations means that modifications are required, and there are still several issues with RSLP that need addressing, such as missing elements, difficulties with element labels, and lack of detail in some elements. Input from end users should tell us how the scheme could be developed and refined further. They ended by reminding us that it is still early days in the development of collection-level descriptions, and end users have yet to become familiar with the concept of a description for a collection rather than a single item. To assist this development the Collection Description Focus have developed an Online Tutorial, and a prototype version was displayed. A second version of this will be made available after Easter for feedback and suggestions.

Finally, the day was rounded off with a short presentation on the future development of collection descriptions by Paul Miller of UKOLN, and a discussion panel. The main emphasis of the event, at least to the reporter, was the overwhelming reliance on the RSLP schema and the fact that everyone feels they have to simplify it. There are many good reasons for this of course, but perhaps an ‘RSLP Core’ is one way to go. The problems with terminology just won’t go away, and work with end users has scarcely begun! This was an interesting and informative day, and I look forward to more.

References

  1. UKOLN Web site http://www.ukoln.ac.uk/
  2. The British Library http://www.bl.uk/
  3. The Joint Information Systems Committee (JISC) http://www.jisc.ac.uk/
  4. The Research Support Libraries Programme (RSLP) http://www.rslp.ac.uk/
  5. Resource: The Council for Museums, Archives and Libraries http://www.resource.gov.uk
  6. EnrichUK http://www.enrichuk.net
  7. JISC Information Environment Service Registry http://www.mimas.ac.uk/iesr/
  8. CC-interop: COPAC / Clumps Continuing Technical Cooperation Project http://ccinterop.cdlr.strath.ac.uk/
  9. The Natural History Museum http://www.nhm.ac.uk
  10. Cornucopia http://www.cornucopia.org.uk/
  11. Cecilia http://www.cecilia-uk.org/
  12. Crossroads http://www.crossroads-wm.org.uk/

Author Details

Verity Brack
Institute for Lifelong Learning
University of Sheffield

Email: v.brack@shef.ac.uk
Web site: http://www.shef.ac.uk/till/