Web Magazine for Information Professionals

Open Repositories 2008

Mahendra Mahey reports on the third international Open Repositories 2008 Conference, held at the School of Electronics and Computer Science, University of Southampton in April 2008.

This was the third international Open Repositories Conference, the previous two having been held in San Antonio, Texas, in 2007 [1] and in Sydney in 2006 [2], so Europe was the third continent to host the event. Southampton was gloriously sunny throughout the conference (1-4 April), so there was no need to use the disposable plastic macs that were provided in the delegate bags. The event tends to attract people who have already set up digital repositories in their institutions, are thinking about doing so, or are interested in various aspects of repositories. Typically these repositories allow open access to research outputs and/or teaching or learning materials. Delegates included developers, administrators, librarians, managers and even policy makers.

The main theme of the conference this year was practice and innovation, and this was clearly reflected in the main conference programme [3]. The conference kept the format of previous years: general plenary sessions followed by parallel Technical Open User Group meetings for DSpace [4], Fedora [5] and EPrints [6], the popular institutional repository software platforms. New to the event was the very first 'Repository Challenge', organised by the JISC CRIG (Common Repository Interface Group) [7], in which software developers worked together in small teams in a very informal way (barcamp style) to create real-life software demonstrators and services quickly. The challenge built on previous CRIG work (started in March 2007), which had produced real-life, user-relevant scenarios and services for digital repositories. A first prize of US$5,000 was awarded to the winner, of which more later.

As part of this packed programme there were also Birds of a Feather Sessions, meetings of the UK Council of Research Repository Managers, the EurOpenScholar group of European Universities Senior Managers and the Open Archives Initiative - Object Reuse and Exchange (OAI-ORE) European Rollout - the third alpha release of the OAI-ORE specifications.

Facts and Figures

This year the event was heavily over-subscribed: there were over 450 delegates from 34 countries. There were 234 submissions in total to the conference. After peer review, 31 submissions were accepted for the main conference (some sessions ran in parallel), 51 were accepted as posters and 45 were accepted for the user group sessions, giving an overall acceptance rate of 54% (127 of 234).

Figure 1: Some of the delegates at OR08 in the main conference room.
Flickr image courtesy of adamnfield

For the first time, all the presentations and papers are available from a real repository, the OR08 conference repository [8].

Figure 2: The OR08 conference repository

Use of CrowdVine for Conference Networking

The organisers of the conference used the CrowdVine Web-based system to facilitate networking between delegates. Although the software has similarities with Facebook (which I have used), this was the first time I had used such software for a conference. Delegates were encouraged to sign up and use it for social networking throughout the conference, and some people joined who did not actually attend, mentioning no names. The software had some interesting features, such as showing your network of friends (alas I never win in the popularity stakes), who you were a fan of, those you wanted to meet and those who wanted to meet you. I would recommend this software to anyone organising an event, especially quite large meetings of, say, 100+; it is easy and free to set up for a conference. It was clear that most people were using CrowdVine during the conference, though I think it is a good idea to give access to it well before and after a conference, which the organisers had done.

Figure 3: Open Repositories 2008 - Home (CrowdVine) [9] on 29 April 2008

More information about setting up CrowdVine for your event for free is available [10].

The Main Conference

As with many conferences of this type, it was physically impossible to attend all the sessions, so what follows are my personal highlights of the week; apologies for any sessions I may have missed out. Remember that the conference repository contains presentations and documents of all the sessions, in case there is something of interest to you that I have omitted [8].

Highlights from Day One

Opening Keynote

Repositories for Scientific Data: Peter Murray-Rust, University of Cambridge

Peter Murray-Rust delivered the opening keynote. I think Peter's main aim was to highlight the 'elephant in the room' as far as digital repositories are concerned, i.e. where do we put data from research within educational institutions, especially that which is generated from the 'long tail of science'? Most of the work to date in institutional digital repositories has tended to focus on making research publications freely available through the principles of the Open Access movement. Although there were a few technical difficulties during the presentation (on which Peter has subsequently blogged [11]), he focused on the fact that, while there is no one-size-fits-all solution for data repositories for science, there are some general principles that could be adopted by those interested in setting them up. For example, data repositories should be a natural and invisible part of a scientist's workflow and should directly support the scientific process, with the people running them physically present in or around the laboratory. Peter was also at pains to stress that solutions to data repositories need not be complicated; the typical scientific informatics toolset could consist of nothing more than word processing and spreadsheet software, free-text indexing tools, reliable storage and an HTTP/REST interface to access the data. He offered an insight into how all of this is being approached in the chemistry / crystallography domain. Peter's presentation is available from the conference repository [8].

Session 2b: Sustainability Issues (a)

Collaboration in building a sustainable repository environment: a national library's role.

Warwick Cathro: National Library of Australia

Warwick highlighted what can be achieved when a national library and the university community work together. The National Library of Australia and the Australian university community are developing a number of sustainable services, for example:

  • national discovery and metadata aggregation services;
  • collection registry services based on a prototype registry service called Online Research Collections Australia (ORCA) [12], which is built on the ISO 2146 (Registry Services for Libraries and Related Organisations) standard [13];
  • the PILIN Project [14], which is likely to lead to a National Persistent Identifier Service (based on the Handle System [15]);
  • the Automated Obsolescence Notification System (AONS) [16], a very useful tool which alerts repository managers to potential obsolescence of file formats in their repository;
  • the development of an Australian Metadata Encoding and Transmission Standard (METS) metadata profile [17].
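To give a flavour of what an obsolescence notification tool does, here is a minimal Python sketch. It is not AONS code, and the talk did not describe how AONS works internally; the watch list, its notes and the function name are hypothetical, chosen only to show the general idea of checking a repository's holdings against a list of formats considered at risk.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical watch list: file extensions mapped to an illustrative note.
# These entries are examples only; a real service would consult a
# maintained format registry rather than a hard-coded dictionary.
AT_RISK_FORMATS = {
    ".wpd": "example entry: older word-processor format",
    ".rm": "example entry: older streaming media format",
}


def obsolescence_report(filenames):
    """Count repository files whose extension appears on the watch list."""
    counts = Counter(PurePath(name).suffix.lower() for name in filenames)
    return {
        ext: {"files": counts[ext], "note": note}
        for ext, note in AT_RISK_FORMATS.items()
        if counts[ext] > 0
    }


# Hypothetical holdings of a small repository.
holdings = ["thesis.pdf", "notes.WPD", "lecture.rm", "draft.wpd"]
report = obsolescence_report(holdings)

for ext, details in sorted(report.items()):
    print(f"{ext}: {details['files']} file(s) flagged ({details['note']})")
```

A repository manager would run something like this on a schedule and be alerted whenever the report is not empty.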

Session 3: Interoperability

SWORD: Julie Allinson, University of York, UK

My former UKOLN colleague Julie Allinson spoke about SWORD (Simple Web service Offering Repository Deposit) [18], following on from the Deposit API work [19] that she presented at the Open Repositories Conference in San Antonio in 2007. Julie gave some background on how a deposit specification / protocol for use across repository platforms came about, and how JISC came to the rescue with funding to ensure that the protocols and demonstrators were developed. There seemed to be a genuine buzz of interest around the work, not just in the presentation but throughout the conference.

Poster Sessions: Minute Madness

Figure 4: Julie Allinson, University of York, presenting her poster
Picture courtesy of 'Ares' Picasa Web Albums

In traditional 'minute madness' fashion, there were slightly chaotic scenes of people trying to organise the poster presenters (there were in fact 51 posters, although I do not think all of them were actually presented). I have a few vivid memories. Firstly, the huge queue of people waiting to talk. Secondly, all presenters had exactly one minute to present their poster and convince people to come to their stand in the informal poster session / wine reception afterwards. Memorable presenters included:

Leonie Hayes, who gave as the chief reason for people to come to her stand the fact that she had travelled the most time zones to get to the conference, from New Zealand [20].

Peter Sefton, who presented and then admitted that his poster had yet to be created, but would be ready in the next hour or so for people when they came to his stand [21].

Pero Šipka, who told a story about a 'marriage' between SCIndeks and a national journals repository. The other partner in this 'marriage' was actually Pero's wife Biljana Kasanovic, which he only revealed at the end of the sixty seconds [22].

Highlights from Day Two

Session 4a: National and International Perspectives

What can we learn from Europe in our quest for populating our repositories?

Vanessa Proudman, Tilburg University, Netherlands

Vanessa presented an analysis of six case studies conducted at various European institutions on how they managed to populate their repositories. Interestingly, all the institutions in the study already had a mandate for deposit in place for their staff. Vanessa then identified six areas that influence the successful population of a repository, namely: policy, organisation, influential factors for populating repositories, advocacy, services and legal aspects. She went on to give her top six critical success factors for populating a repository from a list of seventeen; at this point many delegates were scribbling furiously. The factors included: having a strong communications plan; showcasing your efforts and achievements; using your local, regional, national and international networks for the development of policy, services and personnel; and providing sound Intellectual Property Rights information and support. A full list of the factors is available [23].

Session 5a: Legal

Repositories and Digital Rights: An Overview of the landscape and an action plan

Grace Agnew, Rutgers University Libraries, US

Grace gave an excellent, very thorough and entertaining overview of copyright, digital rights and repositories over a time span of nearly forty years. Topics included the Digital Rights (DR) agenda for repositories, examining a data model for DR in repositories, a review of the current legal and technical landscape of Digital Rights Management (DRM) and the importance of repositories in this area. It was clear that Grace was very knowledgeable about her subject; every slide was packed with so much information that there was much more than twenty minutes' worth.

Session 6b: Models, Architecture and Frameworks

The aDORe Federation Architecture

Herbert Van de Sompel, Los Alamos National Laboratory (LANL), Research Library, US

Much of the work in the repository space focuses on institutions with relatively small collections, numbering thousands of digital objects; what happens to repository architectures if they have to deal with tens of millions of objects? The answer from Herbert Van de Sompel of LANL was clear in his opening remarks: 'scale changes everything'! Herbert gave an overview of how the LANL team needed to develop a new repository architecture to supersede the existing structure, so that it could deal with the enormous scale and overcome the severe deficiencies of the previous information discovery environment developed for the LANL Research Library.

The deficiencies included: metadata-centric records; tens of millions of digital assets stored as separate files; and content being tied to a collection and its discovery, preventing other applications from utilising the content. The LANL team had explored existing open source repository solutions and decided they were not sufficient to deal with the specific problems at LANL. The solution was a three-tier, distributed, component-based approach designed to meet their specific challenges of scale. The metadata-centric approach was replaced by a compound object approach, where digital assets were bundled into storage containers that reduce the number of files in a file system. It was also necessary to separate the storage repository from applications, using a 'surrogate' repository system and providing a simple HTTP protocol machine interface to allow access to the stored assets. Herbert also pointed out that the resultant software, the aDORe Archive Installer, is freely available [24].
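The storage container idea can be illustrated with a small Python sketch. This is not aDORe code, and the container format aDORe actually uses is not described in this report; a plain tar archive from the Python standard library stands in for it here, and the identifiers and function names are hypothetical. The point is simply that a thousand small assets end up in one file rather than a thousand, while each remains retrievable by its identifier.

```python
import io
import tarfile


def bundle_assets(assets):
    """Write many small assets into one in-memory tar container.

    `assets` maps an identifier (used as the member name) to its bytes.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as container:
        for identifier, payload in assets.items():
            info = tarfile.TarInfo(name=identifier)
            info.size = len(payload)
            container.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_asset(container_bytes, identifier):
    """Fetch a single asset from the container by its identifier."""
    with tarfile.open(fileobj=io.BytesIO(container_bytes), mode="r") as container:
        return container.extractfile(identifier).read()


# One thousand hypothetical assets become a single container.
assets = {f"asset-{n}": f"payload {n}".encode() for n in range(1000)}
container_bytes = bundle_assets(assets)

print(read_asset(container_bytes, "asset-42"))
```

In a design along the lines Herbert described, a lookup such as `read_asset` would sit behind the simple HTTP interface, so that applications never touch the storage layer directly.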

Session 7: Usage

MESUR: Implications of usage-based evaluations of scholarly status for open repositories

Johan Bollen, Los Alamos National Laboratory, US

Johan's presentation was my personal favourite of the whole conference. Even though it was at the end of the day, the so-called 'graveyard shift', I can honestly say that the audience were on the edge of their seats and were thoroughly entertained. From the opening slide, a picture of Britney Spears [25] and Big Star [26] (a relatively unknown band from the early 1970s, judging by the lack of hands going up when the audience was asked), it was clear that this was going to be an engaging presentation. Johan's main argument was that current metrics which assess the impact of research using measures of popularity are an inadequate measure of actual influence and impact on research; other factors need to be taken into account.

He supported his assertion with the judgement that although Britney Spears has sold 83 million records whereas Big Star had only sold 50,000, it was Big Star that has had the greater influence on popular music. Johan argued that a much more complicated model is required to explain the impact of research. LANL's research was based on an enormous amount of usage, citation and bibliographic data obtained from a variety of publishers (his team had to sign confidentiality agreements with them), aggregators and institutions. Johan argued that scholarly evaluation largely based on the supposedly valued citation data has several shortcomings, e.g. publication delays mean that citation data lag scholarly developments by significant periods, and therefore more emphasis should be placed on services that could actually shorten the scholarly life cycle. Johan also pointed out that citation data ignore the growing body of grey literature and non-textual scholarly objects that exist outside the scholarly journals in which authors regularly publish and that have an impact on research. Johan concluded that research on aggregated usage data over a range of scholarly sources is a significant adjunct to traditional citation data.
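The distinction between popularity and influence can be shown with a toy Python example. It is not the MESUR model, which this report does not detail; PageRank is swapped in here purely as a familiar network-based measure, and the names, counts and links are invented for illustration. One item has by far the highest raw count, while another is the one that later works most often point back to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dict of node -> list of targets."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is shared among all nodes.
        dangling = sum(rank[n] for n in nodes if not links.get(n))
        new_rank = {
            n: (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
            for n in nodes
        }
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Invented raw popularity counts (think sales or downloads).
usage_counts = {"popular": 1000, "cult": 10, "later_1": 50, "later_2": 40, "later_3": 30}

# Invented "was influenced by" links from later works to earlier ones.
influence_links = {
    "later_1": ["popular", "cult"],
    "later_2": ["cult"],
    "later_3": ["cult"],
}

most_used = max(usage_counts, key=usage_counts.get)
ranks = pagerank(influence_links)
most_influential = max(ranks, key=ranks.get)

print("Highest raw count:", most_used)
print("Highest network rank:", most_influential)
```

The two measures disagree on this toy data, which is the gist of the Britney Spears and Big Star comparison: a count says how often something was consumed, whereas a network measure says how much else depends on it.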

The Repository Challenge

Figure 5: The graphic for Common Repositories Interface Group (CRIG) [27].

Many software developers attend the Open Repositories conference, so this year the JISC-funded CRIG decided to organise a week-long developers' activity to produce new demos of novel repository capability - the Repository Challenge! Nineteen teams registered for the challenge, and they had some real-life, user-relevant scenarios and services to inspire them from previous CRIG activities (e.g. scenario development, expert teleconferences, Unconferencing and a Barcamp). The Repository Challenge seemed a natural 'follow-on' from CRIG work and much praise must go to David Flanders, chair of the Repository Challenge, who has also been organising much of the work of the CRIG.

Figure 6: Some example 'ideas' for developers to work on

I really feel that this activity generated excitement around the conference, especially during the break sessions. The challenge was run in a very informal way: coders were developing over copious cups of coffee or meeting up over drinks and pizza in true Barcamp style over the week.

The challenge culminated at the conference dinner, where five of the entrants were short-listed (after being judged by a panel of expert judges) and a video of each was presented to the delegates, who then had to vote for their favourite. The winning entry attracted a US$5,000 prize, with cash prizes for second and third places plus gifts for honourable mentions.

Figure 7: General Chair of the Organisation Committee Les Carr, at the conference dinner
Flickr image courtesy of 'adamnfield'

The order of merit is shown below:

  • 1st Place: Mining with ORE by Dave Tarrant, Ben O'Steen and Tim Brody
  • 2nd Place: Zero Click Ingest by Leo Momus, Peter Sefton, Scott Yedon and Christaan Kortekaas
  • 3rd Place: BibApp 1.0 by Tim Donohue
  • Shortlisted: FileBlast by Scott Wilson and Kris Popat
  • Shortlisted: Visualiser by Patrick McSweeney

Figure 8: Congratulations to the winners of 'The Repository Challenge',
Dave Tarrant and Tim Brody (University of Southampton) and Ben O'Steen (University of Oxford), for their entry 'Mining with ORE' [28]

More details about each of the entries are available, including some videos of the presentations [29].

Repository User Group Sessions and Other Meetings

Below are brief highlights from the Repository User Group sessions:

DSpace

DSpace 1.5 was launched, a major and fundamental upgrade to DSpace to allow modularisation; a strategy and roadmap for the development of DSpace in 2008/2009 were also presented. There were also demonstrations of how DSpace 1.5 can be customised using overlays; how the submission process can be configured; and how repositories can be personalised. A demonstration of a Virtual Olympic Museum, technical metadata and issues with content packaging together with three case studies formed the rest of the DSpace session.

Fedora Commons

A technical update to Fedora Commons was given as well as strategies for long-term digital preservation with repositories. Several demonstrations were given: mashups, Muradora, Plone front end to Fedora, Fedora integration with Honeycomb, a batch metadata editing tool, records management and digital preservation using Fedora and iCalendar, Fedora and datasets, a toolkit for implementing ingest and preservation workflows, XForms 3 and Fedora, Fedora+Atom, Fedora and Ruby on Rails, search and Fedora, using VUE and content-based image retrieval. Three Fedora case studies were also presented.

EPrints

There were presentations on the launch of EPrints 3.1 beta and the future of EPrints, as well as on improving the support for the editorial review process for EPrints, extending EPrints and repository analytics. In addition there were three case studies focusing on the UK Research Assessment experience.

OAI-ORE European Rollout Meeting

The meeting was intended for information managers, strategists, and implementers of networked information systems. It was led by the two coordinators of OAI-ORE [30], Carl Lagoze of Cornell University and Herbert Van de Sompel of Los Alamos National Laboratory. Delegates learnt more about the ORE data model and about the translation of this data model to the XML-based Atom syndication format. They also learnt about initial experiments that have been carried out with the specifications, and there was an opportunity for delegates to ask questions and discuss the specifications.

EurOpenScholar Meeting

Theme: The University's Mission, Management and Mandate in the Open Access Era

Figure 9: Dr Alma Swan, Dr John Smith and Prof Bernard Rentier speaking at the EurOpenScholar (EOS) [31] meeting
Picture courtesy of heystax77 from flickr

EOS is the European movement for Open Access to scientific and scholarly publications whose goals are to inform the European university communities about the opportunities available to researchers today for providing open access, and to establish institutional repositories in the universities and research centres of Europe in order to deliver a range of benefits.

Conclusion

This was a very busy event, clearly over-subscribed, but very well organised with lots of interesting speakers and sessions, especially 'The Repository Challenge'. Several people have blogged about their experiences: Neil Grindley [32], Pete Johnston [33] and Maureen Pennock [34]. Open Repositories 2009 - OR2009 [35], will be in Atlanta, Georgia, USA, over 18-21 May 2009; see you there.

References

  1. 'What Is an Open Repository?', Julie Allinson, Jessie Hey, Chris Awre and Mahendra Mahey, April 2007, Ariadne Issue 51 http://www.ariadne.ac.uk/issue51/open-repos-rpt/
  2. APSR http://www.apsr.edu.au/Open_Repositories_2006/
  3. Open Repositories 2008 http://or08.ecs.soton.ac.uk/conference.html
  4. dspace.org – Home http://www.dspace.org/
  5. Fedora Commons http://www.fedora.info/
  6. EPrints for Digital Repositories - http://www.eprints.org/
  7. CRIG - DigiRepWiki http://www.ukoln.ac.uk/repositories/digirep/index/CRIG
  8. Welcome to OR08 Publications - OR08 Publications http://pubs.or08.ecs.soton.ac.uk/
  9. Open Repositories 2008 - Home (CrowdVine) - http://or08.crowdvine.com/
  10. CrowdVine: Create your own social network http://www.crowdvine.com/
  11. Unilever Centre for Molecular Informatics, Cambridge - petermr's blog; Blog Archive >> OR08 "Repositories and Scientific Data" - the challenge of complexity
    http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1019
  12. Online Research Collections Australia (ORCA) http://www.library.uq.edu.au/escholarship/orca.html
  13. ISO 2146 Project http://www.nla.gov.au/wgroups/ISO2146/
  14. PILIN https://www.pilin.net.au/
  15. HANDLE.NET - The Handle System http://www.handle.net/
  16. SourceForge.net: AONS http://sourceforge.net/projects/aons
  17. The Australian METS Profile - A Journey about Metadata - http://www.dlib.org/dlib/march08/pearce/03pearce.html
  18. SWORD - DigiRepWiki http://www.ukoln.ac.uk/repositories/digirep/index/SWORD
  19. Repository Deposit Service Description; Presentation to OR 2007 : the 2nd International Conference on Open Repositories, San Antonio, Texas, USA, 23-26 Jan 2007. Presenter: Julie Allinson, UKOLN, University of Bath; Co-authors: Rachel Heery (UKOLN), Martin Morrey (Intrallect), Christopher Gutteridge (Southampton), and Jim Downing (Cambridge) (application/pdf Object) http://www.openrepositories.org/2007/program/files/5/heery.pdf
  20. RESEARCHSPACE@AUCKLAND : DISASTER RECOVERY (DR) - OR08 Publications http://pubs.or08.ecs.soton.ac.uk/67/
  21. Swimming upstream. From the repository to the source, in search of better content - OR08 Publications http://pubs.or08.ecs.soton.ac.uk/74/
  22. The national citation index as a platform to achieve interoperability of a national journals repository - OR08 Publications http://pubs.or08.ecs.soton.ac.uk/60/
  23. Seventeen guidelines for stimulating the population of repositories (application/pdf Object) http://arno.uvt.nl/show.cgi?fid=69760
  24. Tutorial http://african.lanl.gov/aDORe/projects/adoreArchive/docs/tutorial.html
  25. Britney Spears Official Site http://www.britneyspears.com/
  26. BIG STAR * IN SPACE http://bigstarband.com/index.html
  27. CRIG - DigiRepWiki http://www.ukoln.ac.uk/repositories/digirep/index/CRIG
  28. To see their winning presentation visit http://blip.tv/file/866653
  29. CRIG Repository Challenge at OR08 – DigiRepWiki http://www.ukoln.ac.uk/repositories/digirep/index/CRIG_Repository_Challenge_at_OR08
  30. Open Archives Initiative Protocol - Object Exchange and Reuse - http://www.openarchives.org/ore/
  31. Bernard Rentier, Recteur » EurOpenScholar http://recteur.blogs.ulg.ac.be/?p=151
  32. Open Repositories 2008 : Information Environment Team http://infteam.jiscinvolve.org/2008/04/18/open-repositories-2008-2/
  33. eFoundations: Open Repositories 2008 http://efoundations.typepad.com/efoundations/2008/04/open-repositori.html
  34. Digital Curation Blog: Open Repositories 2008 http://digitalcuration.blogspot.com/2008/04/open-repositories-2008.html
  35. Open Repositories Conference 2009 http://or09.library.gatech.edu/

Author Details

Mahendra Mahey
Repositories Research Officer
UKOLN

Email: m.mahey@ukoln.ac.uk
Web site: http://www.ukoln.ac.uk/repositories/
