Web Magazine for Information Professionals

Developing an Agenda for Institutional E-Print Archives

Philip Hunter, John MacColl and Marieke Napier report on a one day Open Archives conference on OAI compliant metadata and e-print issues, held at the Institute of Mechanical Engineers, London, on 11 July 2001.

A one day Open Archives event co-ordinated by the DNER, CURL and UKOLN was held on Wednesday 11th July at the Institute of Mechanical Engineers, Birdcage Walk, London. Birdcage Walk is in a very impressive part of London, circumscribed by Westminster Abbey, Buckingham Palace, and the Houses of Parliament. Luckily for us, the hot sun added to the splendour of the location.

Catherine Grout giving the opening presentation

The Institute of Mechanical Engineers building itself is also very grand. After registration and coffee we all moved into the lecture theatre, a striking room filled with impressive art work and the most comfy lecture theatre seats most of us have experienced. The chair for the day was Sheila Corrall, Director of Academic Support Services at the University of Southampton. She explained that the aims for the day were to come up with a list of recommendations for JISC on ways that they could encourage OAI uptake.

A general introduction to OAI was then given by Catherine Grout, Assistant Director (Development) of the DNER. She began with the quotation from Wallace Stevens ‘the whole, the complicate, the amassing harmony’. She felt that the quote (initially used by Lorcan Dempsey) sums up what the OA initiative is all about. The JISC/DNER perspective is that open archiving provides a technology for cementing the DNER architecture. The DNER investment over the next few years will be dedicated to building an Information Environment appropriate to the needs of learners, teachers and researchers in UK HE and FE. She argued that:

there is considerable investment which needs to take place in a range of middleware, fusion and portal services to support this development. At the moment we have a number of different services delivering content and presenting them to end users by a variety of different interfaces. Our challenge is to develop the Information Environment in such a way that we considerably advance the coherence with which these services are offered to end users. Our ultimate goal for this development is the seamless searching of rich relevant resources which the DNER vision enshrines.

The JISC also has an interest in exploring the use of Open Archives as a key way of disclosing metadata about the resources held by our services, and of particular significance here, by members of the higher and further education community. Some JISC Services, for example RDN and MIMAS, have already been working to look at making their metadata OAI compliant. JISC has also funded the open sourcing of the eprints software at Southampton, and is supporting the Open Citations project. Tools, guidelines, best practice case studies and pilot projects are all likely to be the sort of initiatives which JISC will wish to fund in the future. JISC will also be interested in projects involving a range of members of the HE and FE community as part of moving forward in this area. Catherine emphasized Sheila’s point that today’s meeting was mainly about leveraging community resources in support of institutional agendas for using OAI for e-print archives. Her priorities were to find out what institutions can do and what service providers can do (where central funding can genuinely add value) and to come up with practical ideas in the form of tools, guidelines, best practice case studies and possible pilots.

Michael Nelson at the podium

Following Catherine was Michael Nelson, recently of the University of North Carolina, but now working for NASA. His entertaining talk was entitled OAI past, present and future. Michael started off by saying that distributed searching, the computing science hammer to the interoperability nail, is hard to do. There were many attempts in the mid-90s, which failed. But metadata harvesting, proposed instead by Van de Sompel (recently appointed the e-director of the British Library), Nelson, Lagoze and others, also turned out to be difficult. Every archive had its own format; for example, the early repositories included arXiv (physics), Cogprints (cognitive science), NDLTD (theses) and RePEc (economics). OAI separated out data providers from service providers. Data providers must provide methods for metadata harvesting. The objective was to achieve ‘self-describing archives’. Nelson described OAI as a generic bulk metadata transport protocol. It is only about metadata – not full-text. It is also neutral with respect to the source of the metadata. Commercial publishers are interested too. The protocol, launched in January/February of this year, has been frozen for 12-15 months to allow services to be built on a stable platform. Nelson explained the difference between OAI and OAIS (Open Archival Information System), which is a developing standard for digital preservation.

Coffee Break

The protocol was initially a subset of the Dienst protocol; it then defined its own OAI-specific protocol and metadata set. That metadata set has now been dropped in favour of unqualified Dublin Core (over-simple and with questionable semantics, but then so was the OAI-specific set). The protocol supports multiple metadata formats. It also employs flow control, in order to deal with very large data sets. The intention is that existing data sets should be capable of being made OAI-compliant with very little effort. Much of the protocol is optional, and DC is very accommodating (even a null record is DC-compliant). This is important for author-contributed metadata. DC serves as a ‘lowest common denominator’ format, alongside which parallel metadata formats can be offered, including MARC (Nelson suggested we shall soon see ‘the revenge of MARC’ in a return to ‘thousand flowers blooming’ metadata sets). The protocol developers very much want a range of community-specific metadata formats to develop. The protocol also uses XML, which has lots of advantages (e.g. schemas to determine compliance). But XML is unforgiving, so harvesting in small batches is recommended. It is, however, a good disciplinarian in that it forces clean metadata.
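The points about Dublin Core and XML can be illustrated with a minimal Python sketch, which is not taken from the talk. It assumes the oai_dc and dc namespace URIs published with a later version of the protocol (2.0), and the function names make_dc_record and is_well_formed are hypothetical. It builds an unqualified DC record in which every element is optional, and checks well-formedness only (the standard library does not validate against a schema).

```python
import xml.etree.ElementTree as ET

# Namespace URIs as published for OAI-PMH 2.0 (an assumption: this
# version post-dates the meeting reported here).
DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"

ET.register_namespace("dc", DC)
ET.register_namespace("oai_dc", OAI_DC)


def make_dc_record(**fields):
    """Build an unqualified DC record; every element is optional."""
    record = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in fields.items():
        # ElementTree escapes characters such as '&' when serialising.
        ET.SubElement(record, f"{{{DC}}}{name}").text = value
    return ET.tostring(record, encoding="unicode")


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


empty = make_dc_record()  # a 'null' record with no elements at all
full = make_dc_record(title="R & D in e-prints", creator="A. N. Author")

# A batch assembled by hand with one unescaped '&' cannot be parsed at
# all, which is one reason to harvest in small batches.
bad_batch = "<batch><title>R & D in e-prints</title></batch>"
```

Here is_well_formed(empty) and is_well_formed(full) both hold, while is_well_formed(bad_batch) does not: one dirty record spoils the whole response that contains it.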

The OAI protocol is always a front-end for another dataset: it has no interface for record input or deletion. Eprints, for example, is an archiving system with the OAI protocol built in. The protocol also supports no terms and conditions restrictions. It is, however, possible to set up public and private OAI servers, which feed a source database. Service providers decide how regularly to poll archives and extract metadata, and thus how promptly users see updates, additions and deletions. The protocol also supports ‘sets’ to partition archives, e.g. by discipline.
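The harvesting model described above (a service provider polling a data provider, flow control over large result sets, and optional sets) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the speaker's code: it uses the ListRecords verb and the metadataPrefix, set and resumptionToken arguments as defined in version 2.0 of the protocol, which post-dates this meeting. The base URL, the set name and the helper names (list_records_url, parse_batch, harvest) are hypothetical, and the fetch function is passed in so that the sketch makes no network calls itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces as published for OAI-PMH 2.0 (assumed here).
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, set_spec=None, resumption_token=None):
    """Build a ListRecords request for unqualified Dublin Core."""
    if resumption_token:
        # A resumption token is sent on its own, without other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if set_spec:
            params["set"] = set_spec  # e.g. a discipline-based partition
    return base_url + "?" + urlencode(params)


def parse_batch(xml_text):
    """Return (titles, next_token) for one batch of harvested records."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iter(DC_NS + "title")]
    token_el = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    # A missing or empty token means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


def harvest(base_url, fetch, set_spec=None):
    """Collect all titles, following resumption tokens batch by batch."""
    titles, token = [], None
    while True:
        url = list_records_url(base_url, set_spec, token)
        batch, token = parse_batch(fetch(url))
        titles.extend(batch)
        if token is None:
            return titles
```

A service provider would call harvest with a fetch function that performs the HTTP request (for instance a thin wrapper around urllib.request.urlopen) and would decide for itself how often to repeat the harvest.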

Stevan Harnad on the potential of institutional eprint archives

Stevan Harnad, Professor of Cognitive Science at the University of Southampton, then gave a paper on The potential of institutional eprint archives. He explained that although OAI started with a dedicated focus, this has widened as people realized the potential of sharing metadata. Stevan’s intention was to narrow this focus back to the original publication type, something he now calls the ‘Self-Archiving Initiative’. He began by pointing out that Southampton is developing a ‘quick and simple’ version of ‘industrial strength’ e-print software which makes installation much simpler. After negotiation with Chris Gutteridge, also of Southampton, Stevan stated that the software should be available by the end of the month. Cornell will also soon produce a registration tool to make registering archives as simple as possible. Stevan believes that all peer reviewed journals should be freely available in an interoperable electronic format. He argued that as long as we fail to make all articles available, research itself will be the biggest loser. Currently over 20,000 refereed journals publish over 2 million articles a year and most researchers cannot access them. Stevan argued that institutional libraries need to be rethought as outgoing collections, which give as well as take. The incentive is at the level of institutions, since institutions lose when their own researchers’ work lacks impact, as it does because peer researchers in other institutions are debarred from access by high subscription costs. Stevan advocates that all research universities should mandate a CV with all published papers linked to an institutional archive. There is therefore an explicit link to RAE methodology, which could make the RAE redundant (the impact would be measured by ‘continuous assessment’). He has been trying to persuade a group of Provosts of elite US universities to do this. In the UK, the people we need to persuade are the Funding Councils, in order to change the methodology for research assessment [1].

After the presentation there were a number of questions about the effect of self archiving on the publishing industry. In response Stevan volunteered a page entitled ‘Zenos Prima FAQs’ that had an answer for every apprehension expressed. The matter of digital preservation was raised by a number of people, and there seemed to be a consensus that there is a lot more to electronic publishing than just text, and that how we deal with this material is a serious issue. The other area that people were keen to discuss was whether the way forward for OAI should be domain based or institution based. Throughout the day there were many arguments raised for both sides.

After lunch, Paul Ayris, Director of Library Services at UCL, spoke on Why research libraries need open archives. He began by referring to the ‘serials crisis’, which Stevan Harnad had told us we should not describe in such terms. Yet his graph proved that there is a crisis. The cumulative increase in the RPI since 1986 is c. 50%; that in periodical prices is nearly 300%, while library funding in real terms has dropped by about 1% over the same period. He mentioned that the NESLI deals which have been brokered have proved difficult for CURL, since they have been based on traditional spend on print journals. This is effectively a ‘tax on research’. CURL wants to lobby for a general review of STM publishing by the Director of Fair Trading. CURL will produce advocacy packs for its member institutions for the next academic session, to alert Principals and Vice-Chancellors to the problems.

As Chair of the relevant CURL Task Force, Paul advocated the establishment of OAI servers in institutions – though consortial or regional models may also be appropriate. Libraries should lead this. He outlined the risks, however. VCs will cavil at the ‘multiple costs’ involved in ‘speculating to accumulate’. We will still have to pay for print. He asked about the costs of OAI, in terms of staffing, metadata and infrastructure. There is also the need to clarify the ownership of IPR. In his suggested action plan he noted that Glasgow, Nottingham, Edinburgh, Southampton and Strathclyde are all setting up archives, and asked whether JISC could fund an evaluation of these. Can JISC funding also be provided to support the establishment of archives in all institutions? Charles Oppenheim mentioned in the Q&A that JISC is setting up an IPR committee under Brian Fender, and including Charles. This will address many of the issues which Paul had raised.

Chris Rusbridge on setting up an institutional eprints archive

The next paper was given by Chris Rusbridge and William Nixon of the University of Glasgow: Setting up an institutional eprints archive: what is involved? The Glasgow model is inclusive of all types of scholarly publication, including reports, conference papers, monographs and book chapters. They had hoped to invoke their archive in the current RAE, but could not get things established in time. They were explicit about long-term digital preservation not being part of the aim. The Glasgow server has 15 papers at present – all by Chris’ own staff! Some of the formats supported by the eprints software were questioned by Chris (Word and HTML, for example). Glasgow has added PDF and is planning for XML. Being able to link in to the authentication structures of the institution (as in single sign-on) would be a good thing. Links to Reference Manager should be supported, and a better audit trail is required, for example recording the submission date. He also asked whether any quality checking should happen. Can we be sure that the papers submitted are indeed submitted by our own academics?

Computing Services at Glasgow did the installation, rather than Library staff. Chris Rusbridge spoke of his wish to set up an e-theses service at Glasgow, possibly using NDLTD (and he mentioned the interesting fact that City University is the only UK university in membership of NDLTD). NDLTD is likely to increase use of theses by 400%, according to Virginia Tech figures. One of the comments after the presentation was that a bulletin board should be available so that people setting up OA software could share ideas online. Chris Gutteridge explained that there is a mailing list that deals with technical issues, but there remained a feeling that there should be a mechanism available for people to discuss wider cultural issues.

Rachel Heery on the Open Archives Forum

The final paper was by Rachel Heery, Assistant Director for Research & Development at UKOLN, on European support for Open Archive activity. She introduced a new European project, the Open Archives Forum, an Accompanying Measure funded by the EC IST programme. It includes Humboldt University and IEI-CNR in Pisa. Rachel explained that the last question on the possible creation of a bulletin board had been very apt when considering the role of the Open Archives Forum. Some of the main project objectives are to provide a focus for dissemination and to encourage collaborative working and the exchange of information. The OA Forum will also explore some of the relevant business models and evaluate the OAI protocol technologies, comparing the protocol with HARVEST and Z39.50, for example, and asking whether DC is sufficiently rich as a metadata format. Registration of interest was invited.

After a coffee break the meeting led into an open discussion moderated by Jan Wilkinson, University Librarian at the University of Leeds. Jan pointed out again the need to let JISC know what we wanted from them. She suggested we try to gain a certain level of commitment at an institutional level and discuss barriers to progress, be they at an institutional, service provider or funding level. Gordon Dunsire made a plea for the initiative to cover all scholarly material. That point, however, had already been granted. There was a suggestion that an XML cleanser be provided to the community. Ronald Milne presented the case for disciplinary rather than institutional archives, echoing a point made earlier by Peter Brophy. This did not receive too much support, though the point was made that most papers these days are written by authors from several institutions.

Charles Oppenheim made the point that institutional frameworks can help junior researchers. I suggested that JISC should fund a pilot study in a small group of institutions to assess research impact differently, by requiring that researcher CVs are deposited online with links to papers in a local open archive, as Stevan Harnad had suggested in the morning session. This should also assist in filling archives. I reckoned that it might be necessary to make academics ‘hate their institutions’ in order to get the archives filled (several people said that academics do not want their institutions to impose on them; Les Carr of Southampton said he ‘hated’ his university, as an academic.) There was some support for this notion of a pilot initiative funded by JISC, from Paul Ayris. This connected with Catherine Grout’s reminder that there is a research contract obliging this in the case of some research councils. Fred Friend also supported the notion of tackling the Funding Councils on this. Chris Rusbridge suggested that JISC may make such deposit a condition of any grant it awards. Thomas Krichel suggested the creation of a disciplinary archive in library and information science.

Sheila Corrall then concluded the meeting with a summing-up. She highlighted the need to create cultural change. We want funding from JISC to allow us to build on existing projects and to experiment. A ‘side by side’ institutional/disciplinary approach seems to be the most positive route, since the two are not incompatible. She then suggested that JISC invite bids for imaginative suggestions for populating archives, of either type. They should put the funding up and invite us to bid imaginatively for it. CURL and the DNER were thanked for initiating the event.


[1] Minotaur: Six Proposals for Freeing the Refereed Literature Online: A Comparison by Stevan Harnad, 22 June, Ariadne Issue 28 http://www.ariadne.ac.uk/issue28/minotaur/intro.html

An archive of presentations from the conference is available at: http://www.ukoln.ac.uk/events/open-archives/open-programme.html

Author Details

John MacColl
Email: j.maccoll@ed.ac.uk
Marieke Napier
Email: m.napier@ukoln.ac.uk
Philip Hunter
Email: p.j.hunter@ukoln.ac.uk