PALS Conference: Institutional Repositories and Their Impact on Publishing

Kurt Paulus describes for us the Publisher and Library/Learning Solutions (PALS) Conference held in London this June.

PALS [1] is the ongoing collaboration between UK publishers (ALPSP [2] and the Publishers Association [3]), and Higher/Further Education (JISC). PALS aims to foster mutual understanding and work collaboratively towards the solution of issues arising from electronic publication.

This was a 'hot issue' conference [4], on a topic - institutional repositories - that has seen much interest and a good deal of activity and experiment. The general direction of the concept is not yet clear, but at least some of the issues are being exposed and are beginning to be clarified. Moreover, it's another opportunity to learn lots of new acronyms, including PALS itself: Publisher and Library/Learning Solutions.

Before going into detail about the conference, held at the Royal College of Obstetricians and Gynaecologists, London, it is interesting to look at the composition of the audience. Over 100 people had signed up to attend. Of these, about 40 were from universities - academic and library departments. About 35 were from publishers of various sorts, and the remainder were from JISC, funding agencies, national laboratories and so on. So, it appears, the interest of the audience was more in the first part of the conference topic than the second, and this was reflected in the individual contributions.

Most speakers provided their own definition of what institutional repositories are: institution-based services that provide storage, dissemination, management and stewardship of content created by the institution(s) housing the repository. Although the language differed between speakers, the basic concept was agreed.

Clifford Lynch of the Coalition for Networked Information (CNI) [5] set the scene with a keynote talk on the future infrastructure for scholarship in the digital age as he saw it. He pointed to some significant developments that in his view will shape the information landscape of the future. The developing intellectual life of (academic) institutions is not best served by the formal scholarly publishing mechanisms with which we are familiar. The intellectual wealth of an institution is unstructured, with many scholarly, educational and administrative strands, and repositories need to reflect that. In contrast to the published literature, with its permanency and open access, it is not clear that all of an institution's intellectual content should be open to everyone, or that it should be preserved forever.

He posed a challenge to publishers particularly, though not exclusively, in the sciences: the contemporary research environment in the sciences relies heavily on computer networks, sensors and the data they capture, vast datasets (e.g. in astronomy), software and simulation. Traditional scholarly publishing has accommodated these changes in only a very limited way, and it may be that institutional repositories will be the format for making this content more accessible.

Much of the discussion of repositories has revolved around the outputs of the research communities. Increasingly, though, attention is turning to the digital resources required for education and learning: lecture notes, course collections, interactive learning materials and so on, the latter constrained to a degree by intellectual property issues. And what about the wider public: repositories in public libraries, for example? One step at a time!

In his usual way, Clifford provided us with both a context and a vision, as well as food for thought for what he called a conversation not a conference. The speakers for the rest of the day dwelt on what is happening on the ground, today, which tends to be much more prosaic. It will be interesting to see how the projection of current activities will reflect Clifford's more visionary expectations in a few years' time.

It fell to Mark Ware, of Mark Ware Consulting [6], who recently reported to PALS on Web-based repositories, to provide some factual background. Yes, interest in institution-based repositories is growing strongly, ranging from the pioneering ones at MIT [7] and Caltech [8], via Glasgow, to the collective initiative of the Dutch universities. User interest is growing more rapidly than content: between January and June this year, accesses to the Caltech CODA Archives, for example, increased by 107% but the number of records by only 7%. While there has been growth in content at MIT's DSpace, it has come from new departments contributing working papers and technical reports, rather than from new material contributed by the existing participants. Most of the repositories are small, with the number of records in the hundreds only. There are now over 130 e-prints repositories but with an average of only 350 records each (254 if the original three repositories are excluded).
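
The two averages quoted above imply how much content sits in the original three repositories. The following is a rough back-of-envelope sketch only: it assumes exactly 130 repositories (the talk said 'over 130') and treats the rounded averages of 350 and 254 as exact, so the results are indicative rather than reported figures.

```python
# Back-of-envelope check of the averages quoted in the talk.
# Assumption: exactly 130 repositories ("over 130" in the source),
# and the quoted averages taken as exact.
TOTAL_REPOSITORIES = 130
AVERAGE_ALL = 350          # average records per repository, all included
ORIGINAL_REPOSITORIES = 3
AVERAGE_REST = 254         # average once the original three are excluded

total_records = TOTAL_REPOSITORIES * AVERAGE_ALL
rest_records = (TOTAL_REPOSITORIES - ORIGINAL_REPOSITORIES) * AVERAGE_REST
original_records = total_records - rest_records
average_original = original_records / ORIGINAL_REPOSITORIES
share_original = original_records / total_records

print(f"Implied records in all repositories: {total_records}")
print(f"Implied records in the original three: {original_records}")
print(f"Implied average per original repository: {average_original:.0f}")
print(f"Implied share held by the original three: {share_original:.0%}")
```

On these assumptions, the three original repositories would hold somewhat under a third of all records, which is consistent with the remark that most of the others are small.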

These figures suggest that one of the early main issues is to persuade academics to deposit their outputs in the repositories, through advocacy and training. One or two institutions take a somewhat more coercive line, but none of the speakers recommended this as a sensible route. With the current slow rate of progress, there is little evidence yet that repositories are focusing on reforming scholarly publishing. Nor have they yet begun to tackle long-term preservation seriously; they are still some way from achieving the sort of vision that Clifford Lynch presented.

In principle, well-founded and well-stocked institutional repositories could have a significant impact on scholarly publishing, but Mark Ware's survey of publishers suggested that they are not yet quaking in their boots. Fewer than half of those surveyed thought repositories would have a significant impact on traditional publishing within five years. Nearly three quarters considered that the commercial impact would be zero or neutral. Their permissions policies reflect this fairly relaxed view, and they are split between waiting and experimenting to explore the many publishing issues surrounding repositories.

The Joint Information Systems Committee in the UK has actively supported new approaches to the management and dissemination of academic information. Chris Awre from JISC gave an overview of JISC's FAIR (Focus on Access to Institutional Resources) Programme [9]. Under the programme, JISC is sponsoring some 14 projects looking at e-prints, museums and images. It is also sponsoring institutional portals to investigate the technical and cultural issues, test the interoperability between different repositories through open archive standards and explore the issues related to supporting different communities in parallel, for example e-science and teaching and learning.

Chris Awre's view was that institutional repositories can be valuable tools for the sharing of information and breaking down the barriers between currently separate silos of information, and FAIR's role is to test how well repositories can achieve these objectives.

Repositories need new software, and Raym Crow of SPARC Consulting Group [10] summarised what is currently available. There are now more than a handful of systems available publicly via open source licences, with Eprints, DSpace and Fedora in the lead. All are compliant with OAI metadata harvesting protocols. Interoperability is therefore a key characteristic, facilitating discovery of content.
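
To illustrate what OAI compliance means for discovery, here is a minimal sketch of how a harvester might address a repository using the OAI Protocol for Metadata Harvesting (OAI-PMH). The verb `ListRecords`, the `metadataPrefix` value `oai_dc` and the XML namespaces are part of the protocol; the base URL, the helper function names and the sample response are invented for illustration and do not describe any of the systems mentioned at the conference.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_request(base_url, verb, **params):
    """Build an OAI-PMH request URL (hypothetical helper)."""
    return base_url + "?" + urlencode({"verb": verb, **params})


def extract_titles(xml_text):
    """Return the Dublin Core titles found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//oai:record//dc:title", NS)]


# Hypothetical repository address; a real harvester would fetch this URL.
url = build_request(
    "https://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
)

# Invented, heavily abbreviated response, embedded so the sketch runs offline.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example working paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

titles = extract_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```

Because every compliant repository answers the same requests in the same format, a single harvester of this kind can in principle gather metadata from Eprints, DSpace and Fedora installations alike.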

Currently they support traditional content - preprints and postprints, curriculum support materials, conference proceedings. Content submission and management processes are developing, though there is as yet no peer review facility, something that is still in the hands of the publishers. None of the systems provide turnkey solutions for long-term preservation, but preservation is clearly a key concern. Other issues being addressed are varieties of access policies and the complexities of rights and permissions.

The afternoon of the conference was devoted to four somewhat dry case studies. It is a reflection of the interest in the subject that the vast majority of the audience stayed in their seats even as the presentations stretched beyond their allotted times. The first was by Greg Tananbaum of Berkeley Electronic Press, talking about the e-Scholarship Repository at the University of California. The system has an impressive feature set, including peer review (!), ability to publish HTML, full text searching and personal e-mail notification of new content.

What struck this listener most was how closely the repository resembles a good scholarly publishing approach: ease of submission for the author with automated conversion to PDF; energetic marketing to potential authors; author service and feedback on how widely the content had been accessed (with some 90% of readers coming from outside the institution); and so on. Tananbaum's main conclusions from this experience were the need for clarity over what the repository is to achieve, the need to make a compelling case to faculties that use of the repository saves them time and enhances their reputation, and the need to make sure that non-academics, including administrators, are actively engaged.

All content, printed, musical or other, has rights attached to it, so the issue of how to manage these rights in the context of preprints, e-prints, post-prints and repositories is an obvious concern. The worry has been, perhaps, more acute for publishers but is clearly taken on board by the proponents of institutional repositories. Steve Probets of Loughborough University reviewed the outcome of the JISC-funded RoMEO Project (Rights MEtadata for Open archiving) [11] set up to explore the rights issues related to self-archiving by authors. This included an extensive survey of existing copyright transfer agreements, an analysis of what authors can legally do with their works and a survey of what authors consider important in what is done with their works. Not surprisingly, authors tend to be fairly permissive, as increased availability of their papers is of benefit to them. This also goes hand-in-hand with the more relaxed attitude that many publishers have been developing. That there is a significant lack of clarity overall - perhaps reflecting this laid-back view of rights management - is shown by indications that many authors quite happily sign over copyright, that data and service providers are less than rigorous in checking the rights status of the work they handle, and that only half the data providers ask authors explicitly for the right to manage their works.

Nevertheless, there is agreement among data and service providers that rights management needs to be tackled by means of machine-readable rights specifications and integrated into OAI-compliant protocols. The Creative Commons solution - 11 licences each in three versions - is perhaps over-complex but could be integrated into OAI-compliant systems in a way that also satisfies authors' aspirations.

Leo Waaijers gave a sweeping account of the Dutch multi-university multi-project DARE (Digital Academic Repositories) Initiative [12]. The project reflects some of the concerns expressed by other speakers and re-emphasises the need to market to the internal academic audience. In DARE, for example, 144 top scientists in the participating institutions have been targeted to get them to deposit their work in the repositories by the end of this year. DARE recognises that there are different audiences - academics, teachers and students, 'society' - and is structured appropriately, allowing for institutional and disciplinary repositories with the back-up of an 'e-depot' to be managed by the Royal Library. Conceptually it allows for a variety of derivative and overlay products to be derived from it, conceivably giving publishers the opportunity to be part of the action. However, with the big search engines such as Google and Yahoo entering the research and scientific arenas, Leo Waaijers felt modestly uncertain about the direction matters would take: direct use by readers via search engines, derivative products generated by publishers and others, or a bit of both.

Oxford University Press (OUP) [13] is one of a number of publishers who have been dipping their toes into the water rather than waiting and seeing. Johanneke Sytsema and Richard O'Beirne had just a few minutes at the end of the proceedings to talk about collaboration between OUP (via Oxford E-prints) and the (JISC- and CURL-funded) SHERPA Project [14]. As with many of the other projects it is early days, but the experience so far has been positive, and recent collaboration demonstrates that there is more to be learned from rolling up one's sleeves than standing on the sidelines.

Discussion time was modest between the two sessions, and non-existent at the end of the day. The comments reflected the concerns already recorded, with the exception of some exchanges about whether institutional repositories could help with the difficulties facing monograph publishing. It is not a technical problem, again, but a social/political one requiring tenure committees to review their assessment criteria.

If one were to attempt to summarise the conference, one would say that institutional repositories are only at the starting gate but are beginning to translate aspiration into identifying the issues that have to be defined and resolved before substantial progress occurs. The main content providers - researchers and teachers - are slow to come on board. The technical, standards and legal problems are being addressed in a coherent and apparently successful way. The question of whether this is a sea change in the way information is disseminated, rather than an additional tool at the disposal of the researcher and student, seems to be, on the evidence presented at this meeting, quite an open one. For publishers, again, there are threats to be assessed and opportunities to be explored.

