Web Magazine for Information Professionals

What Is an Open Repository?

Julie Allinson, Jessie Hey, Chris Awre and Mahendra Mahey report on the Open Repositories 2007 conference, held in San Antonio, Texas, from 23 to 26 January 2007.

23-26 January 2007 saw the second Open Repositories Conference [1], this year hosted at the enormous Marriott Rivercenter Hotel in San Antonio, Texas, around the corner from the Alamo. The conference followed on from the inaugural one held last year in Sydney [2], offering the U.S. repositories community an ideal opportunity to gather, together with a generous scattering of attendees from other parts of the world. With the strap-line 'achieving interoperability in an open world', the conference promoted interoperability and openness in various ways, not just between repositories on a technical level, but also between development communities, technical implementers, librarians and repository managers. The question posed in the title of this article was raised in one conference blog [3], whose author pointed to the positive impact of bringing together issues surrounding both open source and open access in one conference. The very act of blogging, as illustrated in various posts about the conference [4], demonstrates a genuine commitment to open debate.

The three-and-a-half-day conference was structured in two parts, with the first one and a half days dedicated to user group meetings for the three main open-source repository software platforms: EPrints, DSpace and Fedora. The remainder of the conference was given over to cross-platform plenary, poster and keynote presentations.

Figure 1 : The main conference rooms at Open Repositories 2007.
Flickr image courtesy of 'Afraid of Ducks'

User Group Sessions

Tuesday and Wednesday morning hosted six user group sessions. The sessions were non-exclusive, offering both experts and those less familiar with the different software platforms a chance to share practical experience and gather information, perhaps to take back and inform their 'home' user group. For those with a strictly non-partisan stance, it was a great opportunity to jump from group to group and see many of the exciting and fast-moving developments happening within each community. With so much going on concurrently, it was impossible to capture it all, but the User Groups drinks reception on the 23rd was a great opportunity for informal exchange and discussion.

DSpace

The DSpace user group opened with a session dedicated to 'Governance and Architecture', where Mackenzie Smith and John Ockerbloom talked about the status and proposed new technical architecture of the DSpace software [5]. Use of DSpace is international and growing, and the new architecture aims to support the community's needs through increased modularisation. The DSpace User Interface project, Manakin [6], was introduced by Scott Phillips, who demonstrated how a third-party add-on for DSpace can improve customisability. The other DSpace sessions presented various examples of how the software has been implemented and customised for a range of needs and data types, such as digitised collections, geospatial data and scholarly publications, and for federated access to primary research data in the SPECTRa project [7].

EPrints

Figure 2 : Les Carr launching EPrints 3.0.
Image courtesy of Jessie Hey

EPrints took this conference as an opportunity to launch version 3 [8] officially, a major upgrade over previous versions. Les Carr offered a detailed walk-through of the software, illustrating the various new and improved features such as extensive use of plug-ins, a more user-friendly workflow, support for publisher embargo periods, auto-suggest for fields such as author name, and easier importing of metadata from other systems. Some of the other speakers described their EPrints experiences in specific subject-based repositories, such as that of the E-LIS repository, in Europe, outlined by Zeno Tajoli [9]. The presentation prepared by Anita Coleman and Joseph Roback from dLIST, in the USA, described the 'Latest News' feature that they added to EPrints 2.0 - a social networking tool that draws on the Web 2.0 ethos to facilitate active participation from the repository user community. Pauline Simpson spoke of changing research funder attitudes and the example of the Natural Environment Research Council in the UK. This was complemented by the talk by Stephanie Haas from Florida on the Aquatic Commons [10]. The challenges of providing useful services such as classification systems were presented by Cheryl Malone, and the development of exemplar preservation services for repositories, as envisioned by the UK PRESERV Project [11], was introduced by Jessie Hey.

Fedora

Following on from the Fedora User Group meeting at the University of Virginia last June [12], this strand of the Open Repositories 2007 Conference contained a range of presentations describing work that had been done rather than work that was being planned. This showed how far the Fedora community had moved on, and delegates were left much encouraged by what they had heard.

The work behind the development of the National Science Digital Library (nearly 5 million digital objects) [13] was covered in two presentations: Dean Krafft described plans for developing collaborative services over this huge library, and Chris Wilper and Aaron Birkland reported on the development of a new open source RDF triplestore, MPTStore [14], built to give users rapid responses when searching the library. Smaller-scale implementation reports came from Tufts University, Indiana University, and OhioLink. Talks also covered specific aspects of Fedora use, including ingest of digital objects, their validation and versioning, workflow - including work at the University of Hull and the Arts & Humanities Data Service in the UK - authentication & authorisation, and editing of repository content.

Two of the Fedora Working Groups also reported. Carol Minton Morris from the Communications & Outreach group presented the results of a survey of Fedora users, finding users within four distinct but overlapping communities: scholarly communication, museums & libraries, education, and e-science. Wider promotion of Fedora within these communities of practice is planned for the future. Ron Jantz from the Preservation Services group reported on activity that had led to the introduction of object integrity checking within Fedora 2.2. Event messaging is a current interest, and is planned for Fedora 2.3. The group is working on making Fedora capable of meeting the requirements to be a Trusted Repository according to the criteria of the Research Libraries Group. Sandy Payette, co-Director of the Fedora project, rounded off the user group session with a look to the future and the establishment of the Fedora Commons as a non-profit foundation upon which future Fedora development can be based and opened up to contribution from all users.

Opening Keynote

The opening keynote was given by James L. Hilton, Vice President and Chief Information Officer at the University of Virginia. As its title suggests - 'Open source for open repositories : new models for software development and sustainability' - the presentation focused on software development and changes in education's attitude towards enterprise and open source software. Hilton argued in favour of community development, collaboration and unbundling software ownership and support, using the development of the Sakai open source courseware system as an exemplar. Open source is not without its challenges, though, and Hilton accepted that IPR, patents and licensing were all issues. Nor did he believe that open source equals 'free', likening it to a free puppy which requires time, finances and energy. Overall, this energetic opening speech was very positive and Hilton ended by asserting that 'we underestimate the extent to which individuals (or collections of individuals) can change the world'.

Plenary and Poster Sessions

Figure 3 : Perusing the posters at Open Repositories 2007.
Flickr image courtesy of Julie Allinson

Over the ensuing two days, there were eighteen presentations to the assembled conference, organised into six themes, plus a busy and varied poster reception and a presentation on Open Access from the Scholarly Publishing and Academic Resources Coalition (SPARC).

In the Management Strategy and Policy session, Andrew Treloar gave an overview of the current status of the Australian ARROW project [15], including some reflections on using vanilla Fedora in conjunction with the Fedora-based VITAL repository product from VTLS, a third-party vendor. Familiar issues of metadata, persistent identification and communication were tackled within the ARROW project. Leslie Johnston then talked about 'how principles and activities of digital curation guide repository management and operations'. This speaker looked at the UVA Digital Library Repository (another Fedora instance) and the principles used in building services and curating materials. The four main principles were: selection, standards, trustworthiness and preservation/sustainability. To round off this session, Atsuko Takano covered the Chiba University institutional repository and the institutional repository movement in Japan. She introduced the "principle of principled promiscuity" whereby repositories, in order to encourage deposit, welcome everything. This formed the basis for the Chiba strategy along with cooperation between faculty and library, and outsourcing metadata creation.

Thursday began with the Preservation session, where Mackenzie Smith outlined an approach to expressing actionable preservation policies for repositories, from the PoLicy Enforcement in Data Grid Environments (PLEDGE) project [16]. Joan Smith then talked about archiving websites using the 'mod_oai' Apache web server plugin, which allows a site's resources to be harvested via OAI-PMH, packaged as MPEG-21 DIDL documents [17]. Miguel Ferreira described the CRiB [18] recommendation service being developed at the University of Minho, Portugal. This offers a service-oriented architecture for executing, evaluating and recommending migration-based preservation activities, drawing on services such as PRONOM [19], PANIC [20], TOM [21] and MyMorph [22].
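As a rough illustration of the harvesting step mentioned for mod_oai, the Python sketch below assembles an OAI-PMH ListRecords request URL. The verb and the metadataPrefix and from arguments are part of the OAI-PMH protocol; the base URL is a placeholder, the function name is hypothetical, and the 'oai_didl' prefix is an assumption about how a server might label its MPEG-21 DIDL output. A harvester would normally confirm the available prefixes with a ListMetadataFormats request first.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix, from_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper).

    base_url        -- the repository's OAI-PMH endpoint (placeholder here)
    metadata_prefix -- the metadata format to request, e.g. 'oai_dc'
    from_date       -- optional lower bound for selective harvesting (YYYY-MM-DD)
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # 'from' restricts the response to records changed on or after this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Placeholder endpoint and an assumed prefix for DIDL-packaged resources.
url = build_list_records_url(
    "http://www.example.org/modoai", "oai_didl", from_date="2007-01-23"
)
print(url)
```

The request itself is an ordinary HTTP GET, which is part of what makes this approach to website archiving lightweight: any existing OAI-PMH harvester can issue it.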

In the User Services and Workflow session, the two presentations ranged from a technical look at Fez, an open source content model and workflow management front-end to Fedora [23], to a user-focussed ethnographic study of institutional repository librarians and their experiences of usability. This particular study found that the attitudes and expectations of academics and librarians differ greatly!

Next up, the Semantic Web and Web 2.0 session introduced three technical approaches to creating and accessing content and to populating repositories in a lightweight way. The DLESE Teaching Boxes [24] are customisable, digital replicas of educators' 'teaching boxes' for use in creating and adapting pedagogical content and context. The SIMILE (Semantic Interoperability of Metadata and Information in unLike Environments) project at MIT [25], presented by Richard Rodgers, is using RDF and other semantic web technologies to gather heterogeneous data sources into a single web interface. The SIMILE project has produced various tools including RDFizers, which transform existing data into an RDF representation; the Timeline widget for visualizing time-based events; and Longwell, a web-based, RDF-powered, highly configurable faceted browser. Eric Larson presented BibApp, a mash-up that uses several lightweight open source technologies to gather information about people and their publications at the University of Wisconsin-Madison Libraries and expose it via a single online interface.
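To give a flavour of what turning existing data into RDF involves, the Python sketch below converts a simple metadata record into N-Triples statements using Dublin Core element URIs. It is not SIMILE code: the function name, the example record and the item URI are all hypothetical, and the escaping handles only backslashes and double quotes, so it is an illustration of the general idea rather than a complete serialiser.

```python
# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC = "http://purl.org/dc/elements/1.1/"


def record_to_ntriples(subject_uri, record):
    """Return one N-Triples line per field of a flat metadata record.

    Hypothetical helper: field names are assumed to be Dublin Core
    element names (title, creator, ...) and values plain strings.
    """
    lines = []
    for field, value in sorted(record.items()):
        # Escape backslashes first, then double quotes, inside the literal.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject_uri}> <{DC}{field}> "{literal}" .')
    return lines


# Hypothetical record, as might be exported from a repository.
record = {"title": "What Is an Open Repository?", "creator": "Allinson, Julie"}
for line in record_to_ntriples("http://example.org/item/1", record):
    print(line)
```

Once heterogeneous sources are expressed as triples like these, they can be merged and browsed together, which is the kind of use a faceted browser puts them to.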

Thursday was wrapped up by a session on Interoperability, where Carl Lagoze presented preliminary ideas about the new OAI-ORE [26] initiative and reported on a recent technical committee meeting. Lagoze introduced the resource-centric ORE as a companion to the metadata-centric OAI and outlined the ORE view of a compound digital object whose constituent parts can be re-used and uniquely referenced. Julie Allinson went on to describe some related work in the U.K. to create a lightweight service for facilitating deposit across multiple repositories in a standard way [27]. Mahendra Mahey presented preliminary findings from an analysis of scenarios and use cases collected by repository projects funded in the U.K. [28].

The closing plenary session on Friday morning was on e-Science and e-Scholarship and opened with Julie Allinson who introduced a Dublin Core Application Profile for describing scholarly works [29] which had made use of the FRBR application model and the Dublin Core Abstract Model to facilitate the capture of multiple descriptions for different entities. Matthias Razum talked about eSciDoc [30], a collaborative project between the Max Planck Society and FIZ Karlsruhe to create a scholarly information and communication platform that moves research organisations away from 'information silos' and supports the research process from idea through to completion. To close, C. Lee Giles introduced work on the ChemXSeer portal [31] for the chemistry discipline. The repository will be truly hybrid, integrating scientific literature with experimental and analytical data, and automatically harvested materials with user submitted data.

Closing Keynote

The closing keynote came from Tony Hey, currently Vice-President for Technical Computing at Microsoft and formerly Director of the UK e-Science Core Programme. Hey talked about e-Science and Scholarly Communication, presenting a vision of the direction that both will take in the digital age. Hey believes that we are 'on the verge of a new type of science paradigm' that will see scientific research becoming increasingly data-centric, using computational methods to enrich the scholarly data lifecycle from data acquisition and ingest, through metadata and annotation, to storage and provenance. Linking experimental data with publications, analysis and statistical data is also a critical element of the cycle, and Hey pointed to a range of examples to illustrate this, including the ChemXSeer portal outlined by C. Lee Giles in the final plenary presentation. Hey ended his talk by making the case for open access and open document formats.

Conclusion

As the number of repositories continues to grow, and the potential for using repositories to store, manage, aggregate and provide access to a wide range of materials begins to be realised, this conference offered an ideal opportunity to share experience, best practice, new developments and mechanisms for 'achieving interoperability in an open world'. The mixture of research and production examples gave attendees a real sense that they could take away practical examples to implement in their own repositories, in the knowledge that more is to come.

Continuing to leap across continents, Open Repositories will be coming to the UK in 2008, to be hosted by the University of Southampton [32].

References

  1. Open Repositories 2007 http://www.openrepositories.org/
  2. Open Repositories 2006 http://www.apsr.edu.au/Open_Repositories_2006/
  3. Anderson, Bill. "OR2007, Retrospection 1: 'What is an open repository?' ", PRAXIS101, 27 January 2007 http://praxis101.com/blog/archives/000095.html
  4. For some examples see:
    Musings on Information and Librarianship
    http://infomotions.com/musings/open-repositories-2007/,
    Jim Downing
    http://wwmm.ch.cam.ac.uk/blogs/downing/?m=200701,
    the Chronicles of Richard
    http://chronicles-of-richard.blogspot.com/2007/01/open-repositories-2007-preliminary.html,
    The Disruptive Library Technology Jester (DLTJ)
    http://dltj.org/2007/01/open-source-for-open-repositories/
  5. DSpace Federation http://www.dspace.org/
  6. MANAKIN project http://wiki.dspace.org/index.php//Manakin
  7. SPECTRa project http://www.lib.cam.ac.uk/spectra/
  8. EPrints software http://www.eprints.org/software/
  9. E-LIS http://eprints.rclis.org/
  10. Aquatic Commons http://www.iamslic.org/index.php?section=147
  11. PRESERV Project http://preserv.eprints.org/
  12. Fedora Users Conference, University of Virginia, 18-19 June 2006
    http://www.lib.virginia.edu/digital/fedoraconf/index.shtml
  13. National Science Digital Library http://nsdl.org/
  14. MPTStore http://mptstore.sourceforge.net/
  15. Arrow http://www.arrow.edu.au/
  16. PLEDGE Project http://pledge.mit.edu/index.php/Main_Page
  17. mod_oai http://www.modoai.org/
  18. CRiB http://crib.dsi.uminho.pt/
  19. PRONOM http://www.nationalarchives.gov.uk/pronom/
  20. PANIC http://metadata.net/panic/
  21. Typed Object Model (TOM) http://tom.library.upenn.edu/
  22. MyMorph http://docmorph.nlm.nih.gov/docmorph/mymorph.htm
  23. Fez http://sourceforge.net/projects/fez/
  24. Teaching Boxes http://teachingboxes.org/
  25. SIMILE http://simile.mit.edu/
  26. OAI-ORE http://www.openarchives.org/ore/
  27. Deposit API http://www.ukoln.ac.uk/repositories/digirep/index/Deposit_API
  28. Scenarios and Use Cases http://www.ukoln.ac.uk/repositories/digirep/index/Scenarios_and_use_cases
  29. Eprints Application Profile http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile
  30. eSciDoc http://www.escidoc-project.de/homepage.html
  31. ChemXSeer portal http://www.czen.org/node/332
  32. Open Repositories 2008 http://www.openrepositories.org/2008/

Author Details

Julie Allinson
Repositories Research Officer
UKOLN, University of Bath

Email: j.allinson@ukoln.ac.uk
Web site: http://www.ukoln.ac.uk/repositories/

Mahendra Mahey
Repositories Research Officer
UKOLN, University of Bath

Email: m.mahey@ukoln.ac.uk
Web site: http://www.ukoln.ac.uk/repositories/

Chris Awre
Integration Architect
e-Services Integration Group
University of Hull

Email: c.awre@hull.ac.uk
Web site: http://www.hull.ac.uk/esig/

Jessie Hey
Digital Repositories Services Researcher
School of Electronics and Computer Science
University of Southampton

Email: jmnh@ecs.soton.ac.uk
Web site: http://eprints.soton.ac.uk/
