Web Magazine for Information Professionals

OAI: The Fourth Open Archives Forum Workshop

Manjula Patel provides us with an overview of the 4th Open Archives Forum Workshop.

Welcome and Introduction

Rachel Heery, UKOLN, University of Bath

Delegates were welcomed and reminded that this was the fourth and final in a series of workshops which have been organised by the Open Archives Forum Project. Rachel Heery explained that the project was a supporting action funded by the European Commission to bring together EU researchers and implementers working in the area of open access to archives.

Developing the OA-Forum Online OAI Tutorial

Leona Carpenter, Digital Library and Information Consultant

This tutorial has drawn on various presentations and tutorials from the OAI community which were acknowledged. It is being developed as a set of Web pages and is available from the OA-Forum Web site [1]. The tutorial breaks down into two main areas: OAI for beginners and OAI-PMH (OAI Protocol for Metadata Harvesting). The beginners section provides some background on what the protocol is and what it does; technical detail is at an introductory level for those considering implementing the protocol. The second part relates to various topics covered in workshop tutorials, largely basic ideas, history and development and the technical basis for implementing the protocol. It also covers XML schemas and metadata formats.

Breakout Session: E-theses

facilitated by Jessica Lindholm, Electronic Information Services Librarian, Lund University Libraries, Sweden

The session covered a variety of e-theses-related issues from a range of perspectives. The group identified metadata, workflows, copyright issues, preservation and convincing decision makers as themes of special interest for the session.

In the time available it was not possible to discuss the topics that had been identified as being of common interest in great depth, but the overall impression of the current situation was encouraging. Many developmental projects are underway, some universities are already requiring electronic submission, and there was an indication from one participant that e-theses work, initially funded on a project basis at his university, is now considered a routine part of the work of library and computer staff (and funded accordingly).

With a large number of people undertaking research in this area, problems are being resolved and pockets of expertise are emerging. Generic ready-to-use tools are desired, but local needs are likely to place high expectations and demands on such tools. They are expected to handle the entire publication chain, producing a ‘useful’ document for resource discovery which also accommodates preservation needs, and to support the learning environment as well as the archival environment.

The participants felt that they would benefit from closer collaboration, particularly in terms of sharing best practice in several of the areas that were discussed. For example, a joint approach on standardised copyright agreements between authors and publishers could be interesting to investigate in depth. As more institutions adopt e-theses, it becomes easier to convince decision makers of the merits of this approach.

Breakout Session: Quality Issues

facilitated by Rachel Heery, UKOLN, University of Bath

This session discussed the identification of good practice and the formulation of generic guidelines for the future. The distributed nature of systems means that quality becomes more and more important the greater the number of contributions. Quality Assurance (QA) needs to be considered right from the start (e.g. all JISC projects are now required to take QA into account, with additional funding for this specific purpose). QA requirements change to meet technical advances, so the process needs to be iterative.

There was a question over who should be responsible for QA: Data Providers, Service Providers or Clusters? At the moment OAI-PMH mandates DC (Dublin Core) as the lowest common denominator for metadata description; unfortunately this means that the default offers very low specificity. The guidelines need to encourage the sharing of richer metadata formats such as MARC and IEEE LOM, to enable records to be re-tasked and re-purposed.
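As an illustration of how the choice of metadata format surfaces in the protocol, the sketch below builds two OAI-PMH harvesting requests. The `ListRecords` verb, the `metadataPrefix` argument and the `oai_dc` prefix come from the OAI-PMH 2.0 specification; the base URL and the `marc21` prefix are assumptions made for this example, since prefixes other than `oai_dc` are chosen by each repository.

```python
from urllib.parse import urlencode

# Hypothetical repository base URL, used only for illustration.
BASE_URL = "http://repository.example.org/oai"


def build_request(base_url, verb, **arguments):
    """Return an OAI-PMH GET request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


# Every OAI-PMH repository must be able to disseminate unqualified
# Dublin Core under the prefix "oai_dc".
dc_url = build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

# A richer format can only be requested if the repository offers it;
# "marc21" is an assumed prefix here, not one mandated by the protocol.
marc_url = build_request(BASE_URL, "ListRecords", metadataPrefix="marc21")

print(dc_url)
print(marc_url)
```

A harvester that wants richer records would normally try the richer prefix first and fall back to `oai_dc` when the repository does not support it.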

Breakout Session: Sustainability Issues

facilitated by the DARE Project

In this breakout session participants discussed organisational issues relating to the sustainability of open archives. Sustainability entails securing organisational support and providing a service based on an institutional repository to a scientific or scholarly community by building services on top of the archive. A question was raised as to whether a “self-archive” model was appropriate for institutional repositories.

IMLS Collection Registry and Item-level Metadata Repository at the University of Illinois

Timothy Cole, Mathematics Librarian & Professor of Library Administration, University of Illinois at Urbana-Champaign, USA

Tim began the presentation by providing background information on the Institute of Museum and Library Services National Leadership Grant Program (IMLS NLG) in the US which funds research and demonstration, digitization, preservation, model programs and new technology in the Library and Museums arena.

He gave an overview of the “IMLS Framework of Guidance for Building Good Digital Collections”, published in November 2001 [2], and spoke of four general recommendations from the IMLS Forum, one of which indicates that the IMLS should encourage the integration of an archiving component into every project plan by requiring a description of how data will be preserved.

An overview was also provided of an interesting OAI project in which the University of Illinois had been involved under a Mellon grant. The primary objectives of this project were to create and demonstrate OAI tools; to build a portal using aggregated metadata describing cultural heritage resources; to investigate the use of EAD (Encoded Archival Description) metadata in an OAI context; and to research the utility of aggregated metadata. The portal currently has 25 OAI data providers and aggregates some 479,000 metadata items.

An announcement was made of a preliminary version of OAI guidelines for static repositories, as a lower barrier option for exposing relatively static and small collections of metadata [3].

In addition, OAI data provider services are now being built into many popular digital library applications such as ContentDM, Encompass, DLXS, DSpace, and Eprints.org. However, some of the implementations are limited in that they may support only the oai_dc metadata schema, or have limited feature sets and metadata mappings which may not be configurable.
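One way to see whether a given data provider is limited to oai_dc is to inspect its response to the `ListMetadataFormats` verb. The sketch below parses such a response; the element names and namespace follow the OAI-PMH 2.0 specification, while the sample response, its date and the repository URL are invented for illustration.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Invented sample response from a hypothetical repository that
# advertises a single metadata format.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-10-30T12:00:00Z</responseDate>
  <request verb="ListMetadataFormats">http://repository.example.org/oai</request>
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def list_prefixes(response_xml):
    """Return the metadataPrefix values advertised in a response."""
    root = ET.fromstring(response_xml)
    tag = "{%s}metadataPrefix" % OAI_NS
    return [element.text.strip() for element in root.iter(tag)]


def supports_only_dc(response_xml):
    """True when oai_dc is the only format the repository offers."""
    return list_prefixes(response_xml) == ["oai_dc"]


print(list_prefixes(SAMPLE_RESPONSE))
print(supports_only_dc(SAMPLE_RESPONSE))
```

A service provider could run a check of this kind across its data providers to find out how many of them can supply anything richer than unqualified Dublin Core.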

Some concerns were expressed over IP rights issues and uncertainties as to whether licences limit metadata sharing; Timothy Cole’s view is that machine readable IP rights attributes are needed to facilitate reuse.

In closing he considered OAI in context; descriptive item-level metadata alone appears not to be sufficient. It needs to be combined with collection descriptions, user annotations, machine-generated clustering etc.; it is important to note that OAI-PMH is not limited to item-level descriptive metadata.


The second day of the workshop was themed around several applications, amongst which cultural heritage featured prominently.

Theses Alive!

Theo Andrew, Project Officer, University of Edinburgh
Richard Jones, Systems Developer, Edinburgh University Library

This two-year project is being led by the University of Edinburgh. Its aims are to develop an OAI-compliant thesis archive and submission system for use in all participating universities, and to develop an infrastructure which enables e-theses to be published on the Web, to the extent that a minimum of 500 e-theses exist within the UK segment of the NDLTD (Networked Digital Library of Theses and Dissertations) after two years. As was explained, the project also aims to develop and implement a metadata export system (crosswalk) capable of delivering metadata to relevant metadata repositories for UK thesis information, and to produce a “checklist approach” for universities to use as they develop e-theses capability.
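A crosswalk of the kind mentioned above is, at its simplest, a table that maps the fields of one metadata schema onto the elements of another. The sketch below shows the general idea only: the local field names and the choice of Dublin Core as the target are assumptions made for illustration, and the report does not describe the mapping the project actually uses.

```python
# Hypothetical mapping from local thesis-record fields to Dublin Core
# element names; neither side is taken from the Theses Alive! project.
CROSSWALK = {
    "thesis_title": "title",
    "author": "creator",
    "supervisor": "contributor",
    "abstract": "description",
    "award_date": "date",
    "institution": "publisher",
}


def crosswalk(record, mapping=CROSSWALK):
    """Translate a local record into a dict keyed by target element name.

    Values are collected in lists because Dublin Core elements are
    repeatable. Fields without a mapping are dropped.
    """
    target = {}
    for field, value in record.items():
        element = mapping.get(field)
        if element is None:
            continue  # no equivalent in the target schema
        values = value if isinstance(value, list) else [value]
        target.setdefault(element, []).extend(values)
    return target


# Invented example record.
local_record = {
    "thesis_title": "An example thesis",
    "author": "Example, Author",
    "supervisor": ["First, Supervisor", "Second, Supervisor"],
    "shelfmark": "X.123",
}

print(crosswalk(local_record))
```

The unmapped `shelfmark` field is lost in translation, which illustrates why crosswalks into a simple target schema tend to reduce the specificity of the original records.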

An analysis of the current situation has revealed that on average 100 theses are accessed via a reading room per month and approximately 50 paper copies of theses are sent out per year. Given such demands, the school of informatics was keen to develop a repository for theses.

DSpace [4] was chosen for its power and functionality, technical support, continued development, emphasis on digital preservation and its use of the most up-to-date version of OAI-PMH (2.0). The repository was to be developed in two phases: a post-viva deposit of the final version and a pre-viva submission for examination. Since DSpace does not currently provide functionality to handle the submission phase, a software module was developed by the project to cater for thesis authoring and supervision; workflow features for thesis submission; required forms for submission; metadata export/crosswalk facilities; and an option to withhold the thesis at the archive end.

The project is looking not only at the act of building and populating an e-theses archive, but also at addressing the requirements of university administrators, examiners, students and academics. It has become apparent that a new role is required for the Library/Information Services, one which is not simply a replacement for the traditional interlibrary loan.

Aquitaine Patrimonies & Cyberdocs: A French cultural heritage portal and an electronic structured document publishing platform

Rasik Pandey, Developer, AJLSM

The presentation gave an overview of the development of a cultural heritage portal from a service provider’s point of view. A diverse range of cultural heritage information relating to the Aquitaine region in France is harvested using the OAI-PMH. The portal is currently in a validation phase, the final version being due in March 2004 [5]. At present there are several types of search available: a simple search based on full text; an advanced search based on field-level free text; and a cartographic search which works by searching geographic departments followed by the town. The project has found that a major difficulty stems from trying to find common threads in diverse content by which resources can be presented such that value can be added by the service provider. They also found that DC was insufficient and added seven additional terms.

The second half of the presentation described Cyberdocs, which has its origins in developing an information processing platform for scholarly publishing, its forerunner being Cybertheses. Cyberdocs is an open source platform for publishing structured electronic documents.

DARE: a new age in academic information provision in the Netherlands

Henk Ellermann, Project Leader, Erasmus University, Rotterdam

DARE (Digital Academic Repositories) is the Dutch equivalent of the open archives movement [6]. It aims to have repositories at all universities, catering for the archiving of all academic output, including theses, articles, data sets, lecture slides, etc. A major aim is to enable reuse of such resources and the provision of services based on the repositories. A tender process is currently in progress for the provision of services.

Science and Culture: Developing a knowledge site in distributed information environments

Ann Borda, Head of Collections Multimedia, Science Museum
Alpay Beler, IS Architect, Science Museum
Nick Wyatt, Collections Services Librarian, Science Museum Library

This presentation described a project being undertaken by the Science Museum in London with the support of funding from the New Opportunities Fund (NOF). The project involves several museums in the UK, aiming to make a rich quantity of materials and collections accessible. It further aims to contextualise information through intelligent display, searching and relational linking and to develop user-focused activities and personalisation tools. NOF projects are required to use DC and XML, although flexibility is allowed in the implementation. Content is drawn from 5 disparate databases amounting to 40,000 digitised images and associated text; 30,000 library records; 10,000 object records and 50 narrative topics. The project has found authority control to be an essential feature if consistency in the data is required. At the collection level DC and Research Support Libraries Programme Collection Level Description (RSLP CLD) [7] elements are used. Future work is envisaged in the area of creating communities based on interest groups linked to subject hierarchies.

Harvesting the Fitzwilliam

Shaun Osborne, Project Manager, Fitzwilliam Museum, University of Cambridge

The Fitzwilliam is the Art Museum of the University of Cambridge [8], providing access to teaching, learning and research as well as access to the general public. The museum has a diverse collection of 500,000 objects managed in 5 curatorial groups: manuscripts and printed books; paintings, drawings and prints; antiquities; applied arts; coins and medals. The goal of the work being undertaken is to develop a unified catalogue of object records which can be used for: collections management; teaching, learning and research; and electronic access. The work is being funded by JISC’s FAIR (Focus on Access to Institutional Resources) Programme to support the preparation of records and images to provide access using the OAI-PMH.

OA-F Organisational Issues Working Group Report

Paul Child, Project Manager: Artworld, University of Cambridge

This presentation provided a summary of the review of organisational issues which have emerged through discussions within the OA-Forum project and covered the following issues: business models, intellectual property rights, quality assurance, metadata, interoperability, content management systems, and the importance of organisational issues. The review forms a part of the set of deliverables of the project.

Open Archives, Open Access and the Scholarly Communication Process

David Prosser, Director, SPARC Europe

The scholarly publishing process comprises four functions: registration (establishing intellectual priority); certification (certifying the quality/validity of the research); awareness (assuring accessibility of research); and archiving (preserving research for future use). Looking at each function from an institutional repository perspective, it is clear that registration can be achieved. Certification, on the other hand, via the process of peer review, is independent of the medium. Awareness can be enhanced by OAI-compliance and interoperability, so that search engines can index the metadata harvested from federated repositories. The advantage in terms of preservation is that an institutional repository helps to put librarians rather than journal publishers in charge of digital archiving.

Panel Session and discussion of the future of Open Archives

Tim Cole

The OAI-PMH technical guidelines have been deemed to be good. But object identity attributes are still not well understood. For example, to what extent does the object’s “value-addedness” need to change before it can take on a new identity? Simple DC is neither rich nor structured enough; DC is focused on static objects and operates at an item-level view. CLD may be more important. On a philosophical level, open archives initiatives need to encourage reuse of information objects and provide cost-benefit analyses.

Carl Lagoze

We should remember that metadata has uses other than just resource discovery. OAI metadata is largely for dissemination purposes. It may be useful to add work flow processes into the OAI model, so that for example migration of articles can be tracked or information such as annotations or versions can be maintained.

Andy Powell

Largely from a technical point of view, as far as the OAI-PMH is concerned, no further development is required. Assignment of rights, such as how to carry Creative Commons licences, could be an area to investigate, as could how to carry metadata other than DC. There is certainly no need to remove the mandatory status of DC. There is a need for additional mechanisms (over and above cataloguing guidelines) to cater for QA in metadata provision. Some scepticism was expressed with regard to getting OA eprints adopted in the UK; self-archiving may not be the correct model.

Muriel Foulonneau

Muriel Foulonneau was largely concerned with how memory organisations take up technology and collaborate (with a French focus). There is a need for preservation metadata to be adopted, as well as data transfer and repository synchronisation. OAI-PMH appears to be under-exploited by memory organisations.

Discussion

To begin with, questions from the floor and discussions centred on the issue of take-up of open archives, in particular institutional repositories. Andy Powell felt that it was difficult to convince academics to self-archive and that there probably would be no significant take-up until there is a requirement to do so by funding bodies. Tim Cole suggested that there is good take-up of archives in some areas such as physics, but studies were required to get a better picture. A member of the audience thought that linking the UK’s Research Assessment Exercise to institutional repositories would also encourage take-up in the UK. A significant point made by a member of the audience was that, if institutional repositories are to gain acceptance by academics, there will need to be a change in the scholarly publishing process - a view with which the author agrees.

Further discussions related to: whether publishers would be happy with self-archiving of published material; the cost of running an institutional repository; institutional versus discipline-based repositories (e.g. with joint authorship, where should the article be archived?); and tracking of how harvested metadata is being used.

References

  1. The Open Archives Forum Web site
    http://www.oaforum.org/
  2. IMLS Framework of Guidance for Building Good Digital Collections
    http://www.imls.gov/pubs/forumframework.htm
  3. OAI Guidelines for Static Repositories
    http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm
  4. DSpace
    http://www.dspace.org/
  5. Aquitaine Patrimonies Portal
    http://ajlsm-sdx.hopto.org/sdx-22h/pa-portail/
  6. The DARE Project
    http://www.surf.nl/
  7. RSLP Collection Description
    http://www.ukoln.ac.uk/metadata/rslp/
  8. The Fitzwilliam Museum
    http://www.fitzmuseum.cam.ac.uk/

Author Details

Manjula Patel
Research Officer UKOLN

Email: m.patel@ukoln.ac.uk
Website: http://www.ukoln.ac.uk

Article Title: “Fourth Open Archives Forum Workshop In Practice, Good Practice: The Future of Open Archives”
Author: Manjula Patel
Publication Date: 30-October-2003
Publication: Ariadne Issue 37
Originating URL: http://www.ariadne.ac.uk/issue37/oa-forum-ws-rpt/