Web Magazine for Information Professionals

Hydra UK: Flexible Repository Solutions to Meet Varied Needs

Chris Awre reports on the Hydra UK event held on 22 November 2012 at the Library of the London School of Economics.

Hydra, as described in the opening presentation of this event, is a project initiated in 2008 by the University of Hull, Stanford University, University of Virginia, and DuraSpace to work towards a reusable framework for multi-purpose, multi-functional, multi-institutional repository-enabled solutions for the management of digital content collections [1]. Within the project’s initial three-year timeframe, all founding institutional partners successfully implemented a repository demonstrating these characteristics. A key aim of the project has always been to generate wider interest beyond the partners, fostering sustainability not only in the technology but also in the community around this open source development. Hydra has been disseminated through a range of events, particularly the international Open Repositories conferences [2], and interest has now grown to the point where dedicated events are being held in different countries: Hydra UK is one of them.

The Hydra UK event was held on 22 November 2012, kindly hosted by the Library at the London School of Economics. Representatives from institutions across the UK, as well as Ireland, Austria and Switzerland, came together to learn about the Hydra Project and to discuss how Hydra might serve their digital content collection management needs. Twenty-nine delegates from 21 institutions were present, representing mostly universities but also the archive, museum and commercial sectors. Five presentations were given on Hydra, focusing on practical experience of using the framework and how it fits into overall system architectures. Time was also deliberately set aside for discussion of more specific topics of interest and for delegates to voice their requirements. The presentations were:

Introduction to Hydra

Chris Awre from the University of Hull gave the opening presentation. The starting basis for Hydra was mutual recognition by all the founding partners that a repository should be an enabler for managing digital content collections, not a constraint or simply a silo of content. Digital repositories have been put forward and applied as a potential solution for a variety of use cases over the years, and have been used at different stages of a content lifecycle.

Figure 1: LSE Library
(Photo courtesy of Simon Lamb, University of Hull.)

To avoid producing a landscape of multiple repositories all having to be managed to cover these use cases, the Hydra Project sought to identify a way in which one repository solution could be applied flexibly to meet the requirements of different use cases. The idea of a single repository with multiple points of interaction came into being – Hydra – and the concept of individual Hydra ‘head’ solutions.

The Hydra Project is informed by two main principles:

The Hydra Project has sought to provide the common infrastructure upon which flexible solutions can be built, and shared.

The recognition that no single institution can achieve everything it might want for its repository has influenced the project from the start. To quote an African proverb, ‘If you want to go fast, go alone; if you want to go far, go together’. Working together has been vital. To organise this interaction, Hydra has structured itself through three interleaving sub-communities: the Steering Group, the Partners and the Developers, as shown in Figure 2.

Figure 2: Hydra community structure

The concept of a Hydra Partner has emerged from this model of actively working together, and the project has a Memorandum of Understanding (MoU) process for any institution wishing to have its use of, and contribution and commitment to, Hydra recognised. Starting with the original four partners in 2008, Hydra now has 11 partners, with two more in the process of joining. All have made valuable contributions and helped to make Hydra better. Partnership is not the only route to involvement, though: many in the Hydra developer community are adopters of the software who have not reached a stage where partnership is appropriate.

The technical implementation of Hydra was supported through early involvement in the project by MediaShelf, a commercial technical consultancy focused on repository solutions.  All Hydra software is, though, open source, available under the Apache 2.0 licence, and all software code contributions are managed in this way.  The technical implementation is based on a set of core principles that describe how content objects should be structured within the repository, and with an understanding that different content types can be managed using different workflows.  Following these principles, Hydra could be implemented in a variety of ways: the technical direction taken by the project is simply the one that suited the partners at the time.

Hydra as currently implemented is built on existing open source components, and the project partners are committed to supporting these over time:

These components are arranged in the architecture shown in Figure 3.

Figure 3: Hydra architecture

A common feature of the last three components in the list above is the use of Ruby on Rails as the coding framework, together with Ruby’s ability to package up functionality in discrete ‘gems’. This was consciously chosen for Hydra because of its agile programming capabilities, its use of the MVC (Model–View–Controller) structure, and its testing infrastructure. The choice has been validated on a number of occasions as Hydra has developed. However, it was noted that other coding languages and systems could be used to implement Hydra where appropriate. This applies to all the main components, even Fedora. Whilst a powerful and flexible repository solution in its own right, Fedora has proved to be complex to use: Hydra has sought in part to tap this capability through simpler interfaces and interactions.

Figure 4: Richard Green presenting Hydra @ Hull
(Photo courtesy of Simon Lamb, University of Hull.)

Hydra @ Hull

Richard Green, Consultant to Library and Learning Innovation at the University of Hull, followed up the introduction to Hydra by describing in detail the implementation of Hydra at Hull, working as one of the founding partners in taking this forward. Hydra @ Hull [7] is deliberately set up as a generic repository that can, in principle, cater for any type of digital content the University wishes to manage: as such, the repository holds a wide variety of content types. For the most part they are managed using a common workflow, although a separate workflow has been developed for theses and there are plans to develop specific workflows for images and multimedia.  However, alongside the workflows developed purely for deposit as part of Hydra, Hull has also implemented workflows that support deposit of content from other systems in a way that then allows them to be managed through the repository.  Examples (a combination of prototypes and implementations) include:

These workflows structure the content objects in a way that makes them Hydra-compliant, so that Hydra can work with them.

Figure 5: Hydra UK participants
(Photo courtesy of Simon Lamb, University of Hull.)

Richard then highlighted aspects of Hydra as implemented at Hull that are applicable across all Hydra implementations. Facet-driven access is provided through the Blacklight user interface. Different content types can be displayed in different ways (allowing, for example, a Google Maps integration for datasets that is not applicable for other content types). Security within the system also enables collections to be exposed openly or delivered only to appropriate user groups. Content can be arranged in collections of two types: hierarchical collections for structuring the repository as an aid to managing it; and flexible collections that allow the display of items from across the repository for a particular purpose and facilitate access to them (eg a combination of archival materials held in separate hierarchies). Flexible interfaces can also be applied when creating and updating records within a repository, such that different templates can be displayed according to content type. Different workflow steps can also be applied: Hull has found that a relatively simple two-stage workflow works for most items. Everything deposited in the repository goes into a QA queue, and is then checked before being formally published.
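The two-stage workflow described above can be illustrated with a minimal sketch. It is not Hull's implementation: the class and method names (`Repository`, `deposit`, `publish`) are hypothetical, and the only behaviour taken from the presentation is that every deposit enters a QA queue and is published after a check.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    QA_QUEUE = "qa_queue"    # deposited, awaiting a check
    PUBLISHED = "published"  # checked and formally released


@dataclass
class Item:
    title: str
    content_type: str
    status: Status = Status.QA_QUEUE  # every deposit starts in the QA queue


@dataclass
class Repository:
    """Hypothetical stand-in for a repository with a two-stage workflow."""

    items: list = field(default_factory=list)

    def deposit(self, title, content_type):
        item = Item(title, content_type)
        self.items.append(item)
        return item

    def publish(self, item):
        # Only items that are still waiting in the QA queue may be published.
        if item.status is not Status.QA_QUEUE:
            raise ValueError("only items in the QA queue can be published")
        item.status = Status.PUBLISHED

    def qa_queue(self):
        return [i for i in self.items if i.status is Status.QA_QUEUE]

    def published(self):
        return [i for i in self.items if i.status is Status.PUBLISHED]


repo = Repository()
thesis = repo.deposit("A thesis", "thesis")
dataset = repo.deposit("A dataset", "dataset")
repo.publish(thesis)  # the dataset stays in the QA queue
```

Keeping the status on the item, rather than in separate lists, means a further stage could be added later by extending the `Status` enumeration.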

Hull’s experience of being a Hydra partner has been a fruitful one, and we have been able to contribute in a variety of different ways (code, architecture, Web site maintenance, documentation). It has also at times been a painful birth as we have seen the system come to life.  The work has been well worth it: Hydra @ Hull has been well-received by users, and Hull has also been able to exploit one of the Hydra components, Blacklight, further by using it as an alternative interface to the library catalogue (a role Blacklight was originally designed to support).  Future integration of catalogue and repository search is now under investigation.

Hydra @ GCU

Caroline Webb, Repository Developer at Glasgow Caledonian University, spoke about using Hydra to support the Spoken Word service at Glasgow Caledonian*, describing how Hydra offered a community that could help sustain any solution implemented. Caroline is solely responsible for maintaining Spoken Word, a repository of recordings from the BBC that are made available to support teaching and learning. It had originally been decided to hold this collection using Fedora, but building an interface that didn’t have to be maintained entirely in-house has proven to be an added bonus.

Hull and GCU had previously worked together on the JISC REMAP Project [8], and, in an example of Hydra re-purposing, Caroline’s initial adoption of Hydra has been to take Hull’s Hydra head and adapt it to meet GCU’s needs.  A requirements analysis highlighted the following:

Whilst Hydra @ Hull can hold audio and video material (and it does have such materials in the collection) in a basic fashion, it cannot currently provide easy access to them through players, or hold all the metadata that can be useful for such materials.  Adding this has been the focus of GCU’s work and is now in place.

Technically, GCU colleagues have focused on delivering the audiovisual material using progressive download or pseudo-streaming, as they have had historical difficulties with streaming to different browsers. This has worked well, using the JW Player for its cross-format compatibility, although adaptive HTTP streaming may be investigated in the future. Working with Hydra has been a steep learning curve at times, not least because the community and technology have been developing fast over the past 18 months, and Ruby on Rails training was a necessary, though invaluable, starting point. Once up and running, the flexibility of the framework and agility in making changes have enabled rapid progress to be made.
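The presentation did not describe how GCU's delivery is implemented, so the following is only a general illustration of what lets a player seek within a progressively downloaded file: the client asks the server for part of the file rather than all of it. The sketch uses HTTP byte ranges, one common mechanism (some pseudo-streaming setups pass the offset as a URL parameter instead). The function names `resolve_range` and `serve` are hypothetical.

```python
def resolve_range(header, size):
    """Return inclusive (start, end) byte offsets for a single-range header.

    Handles "bytes=START-END", "bytes=START-" and "bytes=-SUFFIX".
    Returns None when the header is absent, malformed or out of bounds.
    """
    if not header or not header.startswith("bytes=") or size <= 0:
        return None
    spec = header[len("bytes="):]
    if "," in spec:  # multiple ranges are out of scope for this sketch
        return None
    first, sep, last = spec.partition("-")
    if not sep:
        return None
    try:
        if first == "":  # suffix form: the final N bytes of the file
            length = int(last)
            if length <= 0:
                return None
            return max(size - length, 0), size - 1
        start = int(first)
        end = int(last) if last else size - 1
    except ValueError:
        return None
    end = min(end, size - 1)
    if start >= size or end < start:
        return None
    return start, end


def serve(data, range_header=None):
    """Return (status, headers, body) for a full or partial response.

    Simplification: an unusable range falls back to the whole file,
    whereas a real server may answer 416 for an unsatisfiable range.
    """
    rng = resolve_range(range_header, len(data))
    if rng is None:
        return 200, {"Content-Length": str(len(data))}, data
    start, end = rng
    body = data[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{len(data)}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body


# A player seeking part-way into a recording would request the remainder:
status, headers, body = serve(b"0123456789", "bytes=4-")
```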

Hydra @ LSE

Ed Fay, Digital Library Manager for the LSE Library, started his presentation by announcing that LSE has formally joined Hydra as a Partner. This is the result of long-term interest in the Hydra community and adoption of some, though not all, of the technical components. Partnership now offers LSE colleagues the chance to contribute back actively and also to inform Hydra’s development based on their experience.

The need for a system like Hydra emerged as LSE looked to manage an increasing amount of digital library material that was being generated locally through the LSE Digital Library [9].  An analysis of options was made and a business case generated, resulting in the adoption of, initially, Fedora (a process previously documented in Ariadne [10]).  The subsequent adoption of Hydra was based on a further analysis of interface solutions and recognition that Hydra and the LSE had more shared use cases.  Having said that, pragmatism in the LSE implementation has resulted in selective use of Hydra’s components at this time.

Uses                     Does not use
ActiveFedora             Blacklight
Solrizer                 User authentication
Hydra community input    Web-based interface to editing content

Table 1: Current LSE use of Hydra components

Hydra has aided LSE’s work through support of deposit, manipulation and structuring of content objects. The decisions made so far do not necessarily mean that further components won’t be used, but rather that the LSE repository architecture, covering both preservation and access repositories, has found it beneficial to focus mainly on certain parts and address the others more locally. For example, a local user interface solution has been created that suits current needs.

Future work planned includes doing more with digitised collections, further developing the preservation workflow, and understanding better the relationship with the existing EPrints repository used for research outputs.

Hydra @ Oxford

Neil Jefferies, R&D Manager at the Bodleian Libraries, started his presentation with the revelation that Oxford doesn’t really use Fedora. It is still used as the underpinning repository for the Oxford Research Archive (ORA), but they had become increasingly frustrated by certain aspects of Fedora that over-complicated what they would like to use it for. Could a system be created that focused on all the best bits (primarily, the flexible generic object model for content, the semantic model, the REST API, and storage abstraction) whilst leaving behind the unnecessary wrapping of objects and an architecture in which modularity, although possible in principle, was lacking? An alternative approach was identified in the California Digital Library microservices concept [11], and they have used this to develop a CDL microservices repository with the features of Fedora they liked. This became known as DataBank [12], software that is now available for use by others as a solution for the management of research data.

In this context, what is the link to Hydra? In exploring interface solutions for use with DataBank, it was observed that the subset of Fedora functions implemented for DataBank was almost exactly the same as that used by Hydra heads. Hence, it became apparent that Hydra could be used over DataBank as much as over Fedora. A number of use cases have emerged to which Hydra can be applied within the overall repository architecture Oxford colleagues have implemented:

At the heart of Oxford’s development is the concept of the semantically aware object, an object that knows what it is and how it relates to other objects and information.  Hydra provides a way of working with such objects within an overall architecture.
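As a rough illustration of an object that ‘knows what it is and how it relates to other objects’, the sketch below stores a type and a set of named relationships on each object and can emit them as subject, predicate, object triples. It is not Oxford's data model: the class, the identifiers and the predicate names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticObject:
    """Hypothetical object carrying its own type and relationships."""

    identifier: str
    kind: str
    relations: set = field(default_factory=set)  # (predicate, target id) pairs

    def relate(self, predicate, target):
        self.relations.add((predicate, target.identifier))

    def related(self, predicate):
        return sorted(t for p, t in self.relations if p == predicate)

    def triples(self):
        # The object can describe itself without consulting a central index.
        yield (self.identifier, "type", self.kind)
        for predicate, target in sorted(self.relations):
            yield (self.identifier, predicate, target)


thesis = SemanticObject("ex:thesis-1", "Thesis")
dataset = SemanticObject("ex:dataset-1", "Dataset")
author = SemanticObject("ex:person-1", "Person")

dataset.relate("supports", thesis)
dataset.relate("createdBy", author)
thesis.relate("createdBy", author)

dataset_triples = list(dataset.triples())
```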

Discussion

Three discussion groups emerged in unconference style:

Technology

The technical discussion proved to be the most popular, as delegates sought to understand more fully what was possible.  Discussion addressed the issue of what constitutes a Hydra content object, and how this could be best modelled, plus how different authentication options could be embedded within a single Hydra head.  Noting the emphasis during Oxford’s presentation on not using the full Fedora system, there was interest in how Hydra might be applied over other repository engines.  A more generic version of the ActiveFedora gem that enables interaction between Ruby and Fedora via the REST interface was mooted: ActiveRepository, perhaps.  Alongside specific technical topics, there was also interest in Ruby on Rails training courses, and the potential of a European Hydra Camp to provide focused training on the Hydra technologies.
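The ‘ActiveRepository’ idea was only mooted at the event, so no such library is assumed to exist. The sketch below simply shows the general pattern the discussion points towards: application objects talk to an abstract backend, and each repository engine supplies its own implementation. Every name here (`RepositoryBackend`, `InMemoryBackend`, `ContentObject`) is hypothetical, and the in-memory backend stands in for code that would in practice call an engine's REST interface.

```python
from abc import ABC, abstractmethod


class RepositoryBackend(ABC):
    """Minimal contract a storage engine would need to satisfy (hypothetical)."""

    @abstractmethod
    def load(self, pid):
        """Return the stored attributes for an object identifier."""

    @abstractmethod
    def save(self, pid, attributes):
        """Persist the attributes for an object identifier."""


class InMemoryBackend(RepositoryBackend):
    """Stand-in engine for the sketch; a real one would call a REST API."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def load(self, pid):
        return dict(self._objects[pid])  # raises KeyError if the pid is unknown

    def save(self, pid, attributes):
        self._objects[pid] = dict(attributes)


class ContentObject:
    """Application-side object that never talks to an engine directly."""

    def __init__(self, backend, pid, **attributes):
        self.backend = backend
        self.pid = pid
        self.attributes = attributes

    def save(self):
        self.backend.save(self.pid, self.attributes)
        return self

    @classmethod
    def find(cls, backend, pid):
        return cls(backend, pid, **backend.load(pid))


# The same application code works against two different engines.
engine_a = InMemoryBackend("engine-a")
engine_b = InMemoryBackend("engine-b")
ContentObject(engine_a, "demo:1", title="Lecture recording").save()
ContentObject(engine_b, "demo:1", title="Research dataset").save()
```

The design choice illustrated is that swapping the engine means writing one new backend class, leaving `ContentObject` and anything built on it unchanged.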

Collections Management

Although a few collections use cases had been proposed in scoping the discussions, the primary use cases that emerged were around research data management, and how Hydra might support this.  Experience at Hull has currently sought to keep this simple, although with the expectation that Hydra can be enhanced to accommodate evolving needs.  The role of Hydra as a data catalogue, mirrored to some extent in Oxford’s development of DataFinder, a companion service to DataBank, is feasible alongside the management of datasets.  Key issues were the ability to link versions of datasets and deal with complex content objects comprising multiple files.

Digital Preservation

Those in this discussion group highlighted the range of experiences they had had in addressing digital preservation, the common theme being that there is an increasing body of material that needs preservation attention. Themes included the emphasis on taking what you get, and on making sure versioning works correctly in case subsequent copies are added. A question for consideration was whether disk images should be broken up into their files, or indexed as a single entity with the files accessed within it. It was noted that archival processes often work better for external depositors, and that work to gather internal materials may need development.

Conclusion

So what is Hydra?  As the project’s Web site states, Hydra can be three things:

The day offered an opportunity to understand these aspects of Hydra and find out more about how it has been applied in a variety of scenarios in the UK.  The event did not touch on the many US implementations and experiences, though information is available elsewhere [15].  Delegates welcomed the practical nature of the experiences described, and the description of issues that need to be properly considered and addressed to manage digital content effectively.  Hydra UK will develop as the community needs it to by continuing to put Hydra into practice.

*Editor’s note: Readers may obtain further information from:

Iain Wallace, Graeme West, David Donald. "Capacity Building: Spoken Word at Glasgow Caledonian University". July 2007, Ariadne Issue 52 http://www.ariadne.ac.uk/issue52/wallace-et-al/

References

  1. Hydra http://projecthydra.org
  2. Hydra dissemination https://wiki.duraspace.org/display/hydra/Events%2C+presentations+and+articles
  3. Fedora Commons http://fedora-commons.org/
  4. Apache Solr  http://lucene.apache.org/solr/
  5. Blacklight http://projectblacklight.org
  6. Hydra’s technical framework
    https://wiki.duraspace.org/display/hydra/Technical+Framework+and+its+Parts
  7. Hydra @ Hull http://hydra.hull.ac.uk
  8. Richard Green, Chris Awre, “The REMAP Project: Steps Towards a Repository-enabled Information Environment” April 2009, Ariadne Issue 59
    http://www.ariadne.ac.uk/issue59/green-awre#7
  9. LSE Digital Library http://digital.library.lse.ac.uk/
  10. Ed Fay, “Repository Software Comparison: Building Digital Library Infrastructure at LSE” July 2010, Ariadne Issue 64  http://www.ariadne.ac.uk/issue64/fay
  11. California Digital Library microservices https://wiki.ucop.edu/display/Curation/Microservices
  12. DataBank https://databank.ora.ox.ac.uk/
  13. Shared Canvas http://www.shared-canvas.org/
  14. IIIF http://www-sul.stanford.edu/iiif/image-api/
  15. Hydra applications and demos http://projecthydra.org/apps-demos-2-2/

Author Details

Chris Awre
Head of Information Management
Library and Learning Innovation
Brynmor Jones Library
University of Hull
Hull 
HU6 7RX

Email: c.awre@hull.ac.uk
Web site: http://www2.hull.ac.uk/lli/

Chris Awre is Head of Information Management within Library and Learning Innovation at the University of Hull. He oversees the teams responsible for the acquisition, processing and cataloguing of all materials managed through the Library from both external and internal sources, the latter focusing on the development of the digital repository and local digital collections. Chris has a background as a systems librarian and advocates the value of a broad approach to the systems used for digital repository collection development.