Manjula Patel provides an overview of much of the workshop organised by UKOLN on 4-5 September 2003 at the University of Bath, UK.
Delegates were welcomed and reminded that this was the fourth and final workshop in a series organised by the Open Archives Forum Project. Rachel Heery explained that the project was a supporting action funded by the European Commission to bring together EU researchers and implementers working in the area of open access to archives.
This tutorial draws on various presentations and tutorials from the OAI community, all of which are acknowledged. It is being developed as a set of Web pages and is available from the OA-Forum Web site [1]. The tutorial is divided into two main areas: OAI for beginners and OAI-PMH (OAI Protocol for Metadata Harvesting). The beginners' section provides some background on what the protocol is and what it does, with technical detail kept at an introductory level for those considering implementing the protocol. The second part covers topics from the workshop tutorials: largely the basic ideas, the history and development of the protocol, and the technical basis for implementing it. It also covers XML schemas and metadata formats.
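To make "the technical basis for implementing the protocol" a little more concrete: an OAI-PMH request is an ordinary HTTP GET carrying a `verb` parameter plus verb-specific arguments. The sketch below assumes a hypothetical repository base URL and checks only a few of the protocol's argument rules (the six verbs, the exclusive `resumptionToken`, and the `metadataPrefix` needed by list requests); it is an illustration, not a complete client.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH version 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_request(base_url: str, verb: str, **args: str) -> str:
    """Return an OAI-PMH GET request URL for `verb` with the given arguments.

    Argument names follow the protocol (metadataPrefix, set, identifier,
    until, resumptionToken). 'from' is a Python keyword, so pass it via a
    dict: build_oai_request(url, "ListRecords", **{"from": "2003-01-01"}).
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # resumptionToken is an exclusive argument: it must be sent on its own.
    if "resumptionToken" in args and len(args) > 1:
        raise ValueError("resumptionToken cannot be combined with other arguments")
    # List requests need a metadataPrefix unless they are resuming a harvest.
    if verb in ("ListRecords", "ListIdentifiers") and not (
        "metadataPrefix" in args or "resumptionToken" in args
    ):
        raise ValueError(f"{verb} requires metadataPrefix or resumptionToken")
    query = urlencode({"verb": verb, **args})
    return f"{base_url}?{query}"
```

For example, `build_oai_request("http://repository.example.org/oai", "ListRecords", metadataPrefix="oai_dc")` yields `http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc`, a request for every record in simple Dublin Core.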
The session covered a variety of e-theses-related issues from a range of perspectives. The group identified metadata, workflows, copyright issues, preservation and convincing decision makers as themes of special interest for the session.
In the time available it was not possible to discuss in great depth the topics identified as being of common interest, but the overall impression of the current situation was encouraging. Many development projects are underway, some universities already require electronic submission, and one participant indicated that e-theses work at his university, initially funded on a project basis, is now considered a routine part of the work of library and computer staff (and funded accordingly).
With a large number of people undertaking research in this area, problems are being resolved and pockets of expertise are emerging. Generic, ready-to-use tools are desired, but local needs are likely to place high expectations and demands on such tools. They would need to handle the entire publication lifecycle, producing a 'useful' document for resource discovery that also accommodates preservation needs, and to support the learning environment as well as the archival environment.
The participants felt that they would benefit from closer collaboration, particularly in sharing best practice in several of the areas discussed. For example, a joint approach to standardised copyright agreements between authors and publishers would be worth investigating in depth. As more institutions adopt e-theses, it becomes easier to convince decision makers of the merits of this approach.
This session discussed the identification of good practice and the formulation of generic guidelines for the future. In distributed systems, quality becomes increasingly important as the number of contributions grows. Quality Assurance (QA) needs to be considered right from the start (e.g. all JISC projects are now required to take QA into account, with additional funding for this specific purpose). Since QA requirements change with technical advances, the process needs to be iterative.
There was a question over who should be responsible for QA: Data Providers, Service Providers or Clusters. At the moment OAI-PMH mandates DC (Dublin Core) as the lowest common denominator for metadata description; unfortunately, this makes the default description one of very low specificity. The guidelines need to encourage the sharing of richer metadata records, such as MARC and IEEE LOM, so that records can be re-tasked and re-purposed.
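The low specificity of the mandatory format is easy to see in an actual `oai_dc` record: every element is optional, repeatable and unqualified, so a harvester receives little more than a flat bag of strings. The sketch below parses a ListRecords response with Python's standard ElementTree; the namespaces are those used by OAI-PMH 2.0 and simple DC, while the sample record itself is invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A hypothetical ListRecords response containing one simple-DC record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
        <datestamp>2003-09-04</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Thesis</dc:title>
          <dc:creator>Smith, Jane</dc:creator>
          <dc:subject>Informatics</dc:subject>
          <dc:subject>Digital libraries</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_oai_dc(xml_text: str) -> list:
    """Return [{'identifier': ..., 'dc': {element: [values]}}] for each record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        fields = {}
        if dc is not None:  # deleted records carry a header but no metadata
            for element in dc:
                # Tags look like "{http://purl.org/dc/elements/1.1/}title".
                name = element.tag.split("}", 1)[1]
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, "dc": fields})
    return records
```

Nothing in the result says whether "Smith, Jane" is the author or a supervisor, or which subject scheme the keywords come from; that information could only travel in a richer format requested under a different `metadataPrefix`.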
In this breakout session participants discussed organisational issues relating to the sustainability of open archives. Sustainability entails securing organisational support and, by building services on top of the archive, providing a service based on an institutional repository to a scientific or scholarly community. A question was raised as to whether a "self-archive" model was appropriate for institutional repositories.
Tim Cole began the presentation by providing background on the Institute of Museum and Library Services National Leadership Grant Program (IMLS NLG) in the US, which funds research and demonstration, digitization, preservation, model programs and new technology in the library and museum arena.
He gave an overview of the "IMLS Framework of Guidance for Building Good Digital Collections", published in November 2001 [2] and spoke of four general recommendations from the IMLS Forum, one of which indicates that the IMLS should encourage the integration of an archiving component into every project plan by requiring a description of how data will be preserved.
An overview was also provided of an interesting OAI project in which the University of Illinois had been involved under a Mellon grant. The primary objectives of this project were to create and demonstrate OAI tools; build a portal using aggregated metadata describing cultural heritage resources; investigate the use of EAD (Encoded Archival Description) metadata in an OAI context; and research the utility of aggregated metadata. The portal currently has 25 OAI data providers and aggregates some 479,000 metadata items.
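Aggregating hundreds of thousands of items from many data providers depends on OAI-PMH flow control: a data provider may return a large list in pages, ending each incomplete page with a `resumptionToken` that the harvester sends back to get the next one. The loop below is a minimal sketch of that pattern, not the Illinois portal's code; the page-fetching function is passed in as a parameter (a hypothetical design choice that keeps the example testable offline).

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 response namespace


def harvest(base_url: str, fetch: Callable[[str], bytes],
            metadata_prefix: str = "oai_dc") -> Iterator[ET.Element]:
    """Yield every <record> from a ListRecords harvest, following resumptionTokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        yield from root.iter(f"{OAI}record")
        token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        # An absent or empty resumptionToken marks the last page of the list
        # (an <error> response such as noRecordsMatch also ends the loop here).
        if token is None or not (token.text or "").strip():
            return
        # Follow-up requests carry the token as their only argument.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Against a live repository, `fetch` could simply be `lambda url: urllib.request.urlopen(url).read()`; a production harvester would also need to handle HTTP errors, `503 Retry-After` responses and incremental (`from`-dated) re-harvesting.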
An announcement was made of a preliminary version of OAI guidelines for static repositories, as a lower barrier option for exposing relatively static and small collections of metadata [3].
In addition, OAI data provider services are now being built into many popular digital library applications such as ContentDM, Encompass, DLXS, DSpace and Eprints.org. However, some of these implementations are limited: they may support only the oai_dc metadata schema, or have limited feature sets and metadata mappings that may not be configurable.
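A harvester can discover whether a given data provider goes beyond `oai_dc` by issuing a `ListMetadataFormats` request and reading the advertised `metadataPrefix` values. The sketch below does this with ElementTree; the sample response is invented, and the `marc21` prefix is only an example, since each repository chooses its own prefixes.

```python
import xml.etree.ElementTree as ET
from typing import List

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# A hypothetical ListMetadataFormats response advertising two formats.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>marc21</metadataPrefix>
      <schema>http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd</schema>
      <metadataNamespace>http://www.loc.gov/MARC21/slim</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def supported_prefixes(list_metadata_formats_xml: str) -> List[str]:
    """Return the metadataPrefix values advertised in a ListMetadataFormats response."""
    root = ET.fromstring(list_metadata_formats_xml)
    return [
        prefix.text.strip()
        for prefix in root.iterfind(
            "oai:ListMetadataFormats/oai:metadataFormat/oai:metadataPrefix", OAI_NS
        )
        if prefix.text
    ]


def offers_richer_metadata(list_metadata_formats_xml: str) -> bool:
    """True if the data provider exposes any format besides the mandatory oai_dc."""
    return any(p != "oai_dc" for p in supported_prefixes(list_metadata_formats_xml))
```

Here `supported_prefixes(SAMPLE_RESPONSE)` returns `["oai_dc", "marc21"]`; an implementation limited to the mandatory format would return only `["oai_dc"]`.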
Some concerns were expressed over IP rights issues and uncertainty as to whether licences limit metadata sharing; in Timothy Cole's view, machine-readable IP rights attributes are needed to facilitate reuse.
In closing, he considered OAI in context. Descriptive item-level metadata alone does not appear to be sufficient; it needs to be combined with collection descriptions, user annotations, machine-generated clustering, etc. It is also important to note that OAI-PMH is not limited to item-level descriptive metadata.
The second day of the workshop was themed around several applications amongst which cultural heritage featured prominently.
This two-year project is being led by the University of Edinburgh. Its aims are to develop an OAI-compliant thesis archive and submission system for use in all participating universities, and to develop an infrastructure that enables e-theses to be published on the Web, such that at least 500 e-theses exist within the UK segment of the NDLTD (Networked Digital Library of Theses and Dissertations) after two years. The project also aims to develop and implement a metadata export system (crosswalk) capable of delivering metadata to relevant metadata repositories for UK thesis information, and to produce a "checklist approach" for universities to use as they develop e-theses capability.
An analysis of the current situation revealed that, on average, 100 theses are accessed via a reading room per month and approximately 50 paper copies of theses are sent out per year. Given such demand, the School of Informatics was keen to develop a repository for theses.
DSpace [4] was chosen for its power and functionality, technical support, continued development, emphasis on digital preservation and its support for OAI-PMH version 2.0, the most up-to-date version of the protocol. The repository was to be developed in two phases: a post-viva deposit of the final version and a pre-viva submission for examination. Since DSpace does not currently provide functionality to handle the submission phase, the project developed a software module to provide thesis authoring and supervision; workflow features for thesis submission; the required submission forms; metadata export/crosswalk facilities; and an option to withhold the thesis at the archive end.
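A metadata crosswalk of the kind described here maps fields from a local record onto the elements of a target schema. The sketch below is not the project's module: the local field names and the mapping table are hypothetical, chosen only to show the technique of exporting a thesis record as unqualified Dublin Core, and the added `type` value of `Text` (from the DCMI Type vocabulary) is likewise an assumption.

```python
from typing import Dict, List

# Hypothetical local thesis fields -> unqualified Dublin Core elements.
# A real crosswalk would follow the submission system's own schema and
# whatever mapping guidelines the participating universities agree on.
THESIS_TO_DC = {
    "title": "title",
    "author": "creator",
    "supervisor": "contributor",
    "abstract": "description",
    "keywords": "subject",
    "award_date": "date",
    "institution": "publisher",
    "language": "language",
}


def crosswalk_to_dc(record: Dict[str, object]) -> Dict[str, List[str]]:
    """Map a local thesis record onto repeatable simple-DC elements."""
    dc: Dict[str, List[str]] = {}
    for local_field, value in record.items():
        element = THESIS_TO_DC.get(local_field)
        if element is None or not value:
            continue  # empty fields and fields with no DC equivalent are dropped
        values = value if isinstance(value, list) else [value]
        dc.setdefault(element, []).extend(str(v) for v in values)
    # Assumption: every exported thesis is typed with the DCMI Type term "Text".
    dc.setdefault("type", []).append("Text")
    return dc
```

Note what the mapping loses: a local field such as a viva date has no slot in simple DC and silently disappears, which is one reason the discussion elsewhere in the workshop favoured richer formats alongside `oai_dc`.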
The project is looking not only at building and populating an e-theses archive, but also at addressing the requirements of university administrators, examiners, students and academics. It has become apparent that a new role is required for Library/Information Services: one which is not simply a replacement for traditional interlibrary loan.
The presentation gave an overview of the development of a cultural heritage portal from a service provider's point of view. A diverse range of cultural heritage information relating to the Aquitaine region in France is harvested using OAI-PMH. The portal is currently in a validation phase, with the final version due in March 2004 [5]. At present the following types of search are available: a simple full-text search; an advanced search using free text within individual fields; and a cartographic search, which works by selecting a geographical department and then a town. The project has found that a major difficulty lies in finding common threads across such diverse content, by which resources can be presented so that the service provider can add value. The project also found DC to be insufficient and added seven terms.
The second half of the presentation described Cyberdocs, which has its origins in developing an information processing platform for scholarly publishing; its forerunner was Cybertheses. Cyberdocs is an open source platform for publishing structured electronic documents.
DARE (Digital Academic Repositories) [6] is the Dutch counterpart of the open archives movement. It aims to have repositories at all universities to archive all academic output, including theses, articles, data sets and lecture slides. A major aim is to enable the reuse of such resources and the provision of services based on the repositories. A tender process for the provision of services is currently in progress.
This presentation described a project being undertaken by the Science Museum in London with funding from the New Opportunities Fund (NOF). The project involves several museums in the UK and aims to make a wealth of materials and collections accessible. It further aims to contextualise information through intelligent display, searching and relational linking, and to develop user-focused activities and personalisation tools. NOF projects are required to use DC and XML, although flexibility is allowed in the implementation. Content is drawn from 5 disparate databases, amounting to 40,000 digitised images and associated text; 30,000 library records; 10,000 object records; and 50 narrative topics. The project has found authority control to be essential if the data are to be consistent. At the collection level, DC and Research Support Libraries Programme Collection Level Description (RSLP CLD) [7] elements are used. Future work is envisaged in creating communities based on interest groups linked to subject hierarchies.
The Fitzwilliam is the Art Museum of the University of Cambridge [8], supporting teaching, learning and research as well as providing access for the general public. The museum has a diverse collection of 500,000 objects managed by 5 curatorial groups: manuscripts and printed books; paintings, drawings and prints; antiquities; applied arts; and coins and medals. The goal of the work being undertaken is to develop a unified catalogue of object records which can be used for collections management; teaching, learning and research; and electronic access. The work is funded by JISC's FAIR (Focus on Access to Institutional Resources) Programme, to support the preparation of records and images for access via OAI-PMH.
This presentation provided a summary of the review of organisational issues which have emerged through discussions within the OA-Forum project and covered the following issues: business models, intellectual property rights, quality assurance, metadata, interoperability, content management systems, and the importance of organisational issues. The review forms a part of the set of deliverables of the project.
The scholarly publishing process comprises four functions: registration (establishing intellectual priority); certification (certifying the quality/validity of the research); awareness (assuring accessibility of research); and archiving (preserving research for future use). Looking at each function from an institutional repository perspective, it is clear that registration can be achieved. Certification, on the other hand, via the process of peer review, is independent of the medium. Awareness can be enhanced through OAI compliance and interoperability, so that search engines can index the metadata harvested from federated repositories. In terms of preservation, the advantage is that an institutional repository helps to put librarians, rather than journal publishers, in charge of digital archiving.
The OAI-PMH technical guidelines have been judged to be good, but object identity attributes are still not well understood. For example, to what extent does an object's "value-addedness" need to change before it takes on a new identity? Simple DC is neither rich nor structured enough: it is focused on static objects and operates at an item-level view. Collection-level description (CLD) may be more important. On a philosophical level, open archives initiatives need to encourage reuse of information objects and provide cost-benefit analyses.
We should remember that metadata has uses other than just resource discovery; OAI metadata is largely for dissemination purposes. It may be useful to add workflow processes to the OAI model so that, for example, the migration of articles can be tracked, or annotations and versions can be maintained.
From a largely technical point of view, as far as the OAI-PMH protocol itself is concerned, no further development is required. Assignment of rights, such as how to carry Creative Commons licences, could be an area to investigate, as could how to carry metadata other than DC. There is certainly no need to remove the mandatory status of DC. Additional mechanisms (over and above cataloguing guidelines) are needed to cater for QA in metadata provision. Some scepticism was expressed about getting OA eprints adopted in the UK: self-archiving may not be the correct model.
This contribution was largely concerned with how memory organisations take up technology and collaborate (with a French focus). Preservation metadata needs to be adopted, as do mechanisms for data transfer and repository synchronisation. OAI-PMH appears to be under-exploited by memory organisations.
Questions from the floor and the ensuing discussion initially centred on the take-up of open archives, in particular institutional repositories. Andy Powell felt that it was difficult to convince academics to self-archive and that there would probably be no significant take-up until funding bodies required it. Tim Cole suggested that take-up of archives is good in some areas, such as physics, but that studies were required to get a better picture. A member of the audience thought that linking the UK's Research Assessment Exercise to institutional repositories would also encourage take-up in the UK. Another significant point made from the floor was that, if institutional repositories are to gain acceptance by academics, the scholarly publishing process will need to change, a view with which the author agrees.
Further discussions related to: whether publishers would be happy with self-archiving of published material; the cost of running an institutional repository; institutional versus discipline-based repositories (e.g. with joint authorship, where should the article be archived?); and tracking of how harvested metadata is being used.
Manjula Patel
Research Officer UKOLN
Email: m.patel@ukoln.ac.uk
Website: http://www.ukoln.ac.uk
Article Title: "Fourth Open Archives Forum Workshop In Practice, Good Practice: The Future of Open Archives" Author: Manjula Patel
Publication Date: 30-October-2003 Publication: Ariadne Issue 37
Originating URL: http://www.ariadne.ac.uk/issue37/oa-forum-ws-rpt/intro.html
Ariadne is published every three months by UKOLN. UKOLN is funded by the Joint Information Systems Committee (JISC) of the Higher Education Funding Councils, as well as by project funding from the JISC, the European Union and the Museums, Libraries and Archives Council. UKOLN also receives support from the University of Bath where it is based. Material referred to on this page is copyright Ariadne (University of Bath) and original authors.