Moving Towards Interoperability: Experiences of the Archives Hub
The Archives Hub is a JISC-funded service based at Mimas, a National Data Centre supporting world-class learning and research. It brings together descriptions of archives for research and education, enabling users to search across nearly 200 repositories. It stores descriptions in Encoded Archival Description (EAD).
Interoperability is about working together (inter-operating). Whilst the central theme of this article is data interoperability – the ability to exchange or share information and use that information – this also requires individuals and organisations to work together, which is another form of interoperability. Over the last 18 months, the Archives Hub team have been engaged on a JISC-funded Enhancements Project to promote interoperability through practical means of encouraging data sharing, and have worked collaboratively with colleagues to help achieve that aim.
Benefits of Interoperability
For the archive community, the drive towards interoperability should be seen as something that is hugely beneficial for researchers. It is inevitable that we will move more towards the seamless integration of resources because in the digital age the ability to search for resources efficiently and effectively is a basic expectation of users. It saves users time, it pulls together archives by name, place and subject, enabling researchers to make new connections, and it pools the experience, expertise and resources of archivists for the greater good.
Archive repositories generally feel that it is very worthwhile and valuable for them to contribute to the Archives Hub, but they do not want to create finding aids specifically for the Hub, as well as creating finding aids for their own system and maybe for other network sites, such as AIM25 and SCAN. This duplication of work should not be required in the current environment, when we have the technical know-how to implement more efficient solutions. One of the benefits of interoperability should be that one description can be used for various purposes. In reality, it may be that one description needs to be modified for different purposes, but that modification should be minimal and preferably automated.
Of course, the ideal scenario is to support remote searching. We are taking a step in this direction with the work that we are currently doing with The Women's Library, which hosts Genesis, a portal for searching for resources relating to women's studies. The Archives Hub is providing remote access to our descriptions so that Genesis can remotely search a pre-defined sub-set of descriptions relating specifically to women. This is the sort of innovative work that will take the archives community forwards towards true interoperability.
UK Archives Discovery
There are many challenges involved in the sharing and cross-searching of data. The first hurdle is to convince stakeholders that it is worth prioritising this kind of work. The archive community has begun to come together on this, and the formation of the UK Archives Discovery network (UKAD), which has evolved out of the National Archive Network (NAN), is an indication of this. UKAD has the following objectives:
- To promote the opening up of data and to offer the capacity for cross-searching across the UK archive networks and online repository catalogues
- To lead and support resource discovery through the promotion of relevant national and international standards
- To support the development and use of name authorities
- To advocate for the reduction of cataloguing backlogs and the retro-conversion of hard-copy catalogues
- To promote access to digitised and digital archives via cross-searching resource discovery systems
- To work with other domains and potential funders to promote archive discovery
The UKAD network will be an important vehicle for sharing information, ideas, projects and outcomes. For the Archives Hub, undertaking this sort of work in isolation is contrary to our strategic aim to foster a culture of collaboration and community. However, even once the will is there and the benefits are articulated, the lack of resources to tackle the challenges inherent in this kind of integration of data sources can also be a barrier. Archivists commonly adhere to ISAD(G) as a standard for creating finding aids. However, it is not a content standard and it does not lead to anything like the level of consistency in data systems, data structure and format required for pulling data sources together. The Archives Hub has adopted EAD as a standard that is international and widely used, and it is helping us achieve interoperability, but it is only one option for structuring data. Many repositories use relational databases, which provide a whole range of advantages. But not all database systems facilitate exporting of data, or importing of data from other sources. Even once this issue is addressed, there remains the whole gamut of data-related problems. We are not consistent in the way we create titles, dates, reference numbers and index terms; we do not all include the same content; and we do not catalogue archives to the same level of detail.
The Archives Hub has been looking at ways to tackle these issues and open up online access. As an aggregating service and a focal point for parts of the archive community, we believe that we should promote and support standards and best practice, initiating and participating in collaborative ventures that will benefit archive users and raise the profile of archives.
Background to the Current Work
In 2008 and 2009 the Archives Hub undertook enhancement work that was part of a larger Mimas Enhancements Programme funded by JISC. The vision for the Programme was: to deliver benefits to researchers by advancing interoperability and linking through to new content; to integrate services; to share best practices to support long-term usability and sustainability across Mimas services.
As part of this project we looked into the status of export from Axiell CALM and Adlib and talked to our community of contributors and potential contributors to gauge the level of interest in exporting to the Hub. We also realised that it was important to investigate the potential for exchanging data with other providers, in particular with AIM25. The project established a solid foundation from which we could continue to develop this work, and JISC agreed to fund a second enhancements project to do just that.
Working with AIM25
Over the past few years the Hub team have developed a productive relationship with AIM25, the London consortium. There are obvious benefits for our users in working together, and this is facilitated by the fact that we both use EAD. The basic aim is to share data, and beyond this to benefit from working together in other areas.
In November 2007 we attended a meeting of the CALM and AIM25 Users' Group in London. That was the start of an initiative which eventually led to the Archives Hub taking in nearly 2,000 records from the University of London Senate House Library, Courtauld Institute of Art and Queen Mary University of London. These records were provided to us by AIM25 and we were able to put them onto the Hub with only minimal data editing. We are looking at ways to minimise further any editing, and if possible, automate all editing that is required. We should emphasise that this is not editing of content, but of structure. Our colleagues at AIM25, Geoff Browell and Francis Blomeley, agreed to work on an export routine that would provide us with 'Hub flavour' EAD, and we agreed in turn that we would work to create a specification for an export routine from CALM and Adlib that would be suitable for both the Archives Hub and AIM25. By co-operating in our approach we have been able to reduce duplication of effort and benefit users.
The Enhancements Interoperability Project
In April 2009 we started work on the second enhancements project, to promote interoperability by improving the ease of exporting from CALM and Adlib to the Archives Hub and AIM25. CALM and Adlib are the two leading archival software systems used in the UK, and we decided to work with these systems to open up the benefits of the project to as many repositories as possible.
The existing CALM and Adlib export functions are currently somewhat out of date, having been put together some years ago, and one of the specific aims of the project was to update them. Once the project had been approved, and a project officer appointed, we set out to liaise with Hub contributors in repositories across the UK.
We were very keen that the project should be based on real data, from real contributors. There is very little point basing a specification on ideal data, as you are unlikely ever to encounter them in a real-world situation. We wanted to ensure that we could cope with as many vagaries of data as possible – not only different interpretations of ISAD(G), but also differences in the way that contributors use CALM and Adlib. We know that rules are not always applied consistently, and that each repository will have a slightly different way of using its archival software. While we knew from the start of the project that we would not be able to accommodate all idiosyncrasies, we were determined to make the export process something that as many repositories as possible could use, without significantly altering their workflows.
To expedite this, we called for volunteers to be part of the project, and received a good response – both from existing Hub contributors, and from people interested in contributing in the future. Overall, we received descriptions from 14 institutions which volunteered following a call on the CALM users (calm-lis) Jiscmail list. Once we had gathered our contributors, we asked each of them to send a selection of descriptions, exported using the current export routines. We asked that these descriptions be a variety of collection-level and multi-level descriptions; we also asked contributors not to tidy them up, or alter them in any way other than what might be required to run the export routines. This meant that we knew we were dealing with data that would be comparable to those exported to the Hub during normal business.
Once the project officer had received these data, she built a database to handle the outputs from the processing. This database contained every EAD field which the Hub displays with an example of ideal Hub form and syntax. The descriptions were then examined, element by element, and each element was classed as 'ideal', 'satisfactory', or 'unsatisfactory'.
These data were then used to build a picture of where the major problems were – if an element or attribute was consistently unacceptable, then we could be reasonably sure that the problem was with the export routine, not the contributor's data.
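The reasoning above can be illustrated with a minimal sketch, assuming the review results are held as simple (repository, element, rating) records. The data, the function name and the 80% threshold are all hypothetical and are not taken from the project database; the point is only that an element rated 'unsatisfactory' across most contributors suggests a fault in the export routine rather than in any one repository's data.

```python
from collections import Counter, defaultdict

# Hypothetical review records: (repository, EAD element, rating).
# The three ratings mirror the classes described in the article;
# the records themselves are invented for illustration.
REVIEWS = [
    ("repo_a", "unittitle", "ideal"),
    ("repo_a", "persname", "unsatisfactory"),
    ("repo_a", "unitdate", "ideal"),
    ("repo_b", "unittitle", "satisfactory"),
    ("repo_b", "persname", "unsatisfactory"),
    ("repo_b", "unitdate", "ideal"),
    ("repo_c", "unittitle", "ideal"),
    ("repo_c", "persname", "unsatisfactory"),
    ("repo_c", "unitdate", "unsatisfactory"),
]


def suspect_export_elements(reviews, threshold=0.8):
    """Return elements rated 'unsatisfactory' in at least `threshold`
    (an assumed cut-off) of the reviews that mention them."""
    tallies = defaultdict(Counter)
    for _repo, element, rating in reviews:
        tallies[element][rating] += 1

    suspects = []
    for element, counts in tallies.items():
        total = sum(counts.values())
        if counts["unsatisfactory"] / total >= threshold:
            suspects.append(element)
    return sorted(suspects)


print(suspect_export_elements(REVIEWS))
```

In this invented sample only `persname` fails for every repository, so it is the only element flagged; `unitdate` fails once and would more plausibly be a problem with that contributor's data.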
To help us with this, Axiell and Adlib kindly gave us access to test copies of CALM and Adlib. This meant that we were able to examine the descriptions in their native environment, as well as run test, or dummy, descriptions. This gave us the freedom to experiment, and to determine which fields within CALM and Adlib produced the most satisfactory EAD descriptions.
One of the first challenges we met, which proved to be a major one, was the problem of index terms. They are vital for the Hub's system of resource discovery, and we recommend that all Hub descriptions should have at least the name of the creator indexed as an access point. These access points are currently marked up on the Hub in a very idiosyncratic way, so that we can distinguish between forename, surname, life dates, etc. We are currently in the process of reviewing this practice, because the advantages of identifying the components of an index term need to be set against the significant disadvantages of non-standard markup. This index term markup was therefore not considered for this project. As this markup was also the main difference between Archives Hub and AIM25 descriptions, this meant that it was much easier to produce a specification that would be satisfactory for both.
While we hoped that this would make the question of access points easier to handle, we encountered an issue in that the version of CALM to which we had access did not provide access to the CALM database of index terms. This meant that we were unable to experiment with index terms, and had to rely on the descriptions provided by our volunteers. This was further complicated by the fact that the majority of access points did not, in fact, export – we were faced with description after description full of empty personal name, corporate name, and geographic name tags.
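A check for empty access points of this kind could be sketched as follows. This is an illustrative example only, not the procedure used in the project: the sample fragment is invented, the function name is hypothetical, and the code assumes EAD without an XML namespace. The element names `persname`, `corpname` and `geogname` are the standard EAD tags for personal, corporate and geographic names.

```python
import xml.etree.ElementTree as ET

# EAD access-point elements covered by this check.
ACCESS_POINT_TAGS = ("persname", "corpname", "geogname")

# Invented fragment, not taken from any contributor's export.
SAMPLE_EAD = """
<ead>
  <archdesc level="fonds">
    <did><unittitle>Example papers</unittitle></did>
    <controlaccess>
      <persname></persname>
      <corpname>Example Society</corpname>
      <geogname>   </geogname>
    </controlaccess>
  </archdesc>
</ead>
"""


def empty_access_points(ead_xml):
    """Return the tag name of each access-point element with no text."""
    root = ET.fromstring(ead_xml)
    empty = []
    for tag in ACCESS_POINT_TAGS:
        for element in root.iter(tag):
            # itertext() also collects text held in any child elements,
            # so only elements with no text at all count as empty.
            if not "".join(element.itertext()).strip():
                empty.append(tag)
    return empty


print(empty_access_points(SAMPLE_EAD))
```

Run over a batch of exported descriptions, a report like this would show quickly which access-point types fail to export and for which repositories.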
Our first round of contributor visits was designed to address this problem, and so we chose to visit repositories whose access points had exported correctly. This would enable us to see where in CALM this information was coming from, and adjust our specification, or advice to contributors, accordingly. However, for some less used access points, such as genre form, we were unable to find examples of successful exports.
Many contributors also use links to the CALM thesaurus, resulting in an ID being present in the field in CALM, but nothing being exported. While we have asked Axiell to remedy this, we have been unable to give them any details of how to do so.
Another challenge arose from CALM being so dominant in the archival software market in the UK - we only had one Adlib volunteer for the project. This meant that, while we were able to provide a specification to Adlib, it was less comprehensive, and relied less on real-world data, than that provided to Axiell for CALM.
We have provided both software providers with specifications for a revised and improved version of their export routines, to be used both for the Hub and AIM25. While we are keen that this improved export is implemented as soon as possible, this is now out of our hands, and we are reliant on the resources which Axiell and Adlib have available to devote to this.
We have used the knowledge gained from this project to produce guidelines for contributors on how to amend their current CALM exports so that they are ideal for the Hub. We have tried to make this as unobtrusive to contributor workflows as possible, and have given several alternatives for contributors to manage this: in the software itself; by hand in an XML editor, or by editing descriptions once they have been uploaded to the Hub's new EAD editor. We are recommending that contributors use the last, as the editor has been specifically designed to facilitate the quick and easy creation of valid and interoperable EAD.
We have disseminated this guidance to contributors, and already have some new contributors who are using this method to upload their CALM descriptions to the Hub. We hope that publishing the results of this project will encourage more contributors to do so.
This project has already realised a number of benefits, with more expected in the future. Some are specific benefits for the Hub and AIM25; others are specific benefits for contributors, for software suppliers, and for the archive and research communities as a whole.
The improvement of interoperability, and the expansion of coverage of the Archives Hub and AIM25, is surely the most important community benefit, as it provides a larger proportion of UK archive descriptions stored in sustainable and interoperable EAD format, and increases the number of archive descriptions which are discoverable and searchable online.
The Archives Hub and AIM25 benefit, of course, from an increased number and variety of contributors, and also from descriptions which require less editing before they can be uploaded.
Contributors benefit by being more easily able to make their descriptions available in EAD, with the attendant benefits of interoperability and discoverability.
For aggregation services such as the Archives Hub to thrive, they need to be responsive to the online environment, to explore innovative ways to expand and disseminate content and to strike a balance between solidity and flexibility, innovation and reliability. We are aware that in an environment where archive repositories are increasingly stretched and where creating archival descriptions is time-consuming, we need to respond by showing that the Archives Hub is of benefit to archives as a means to disseminate information, raise awareness and provide all the benefits of cross-searching diverse archival content.
- The Archives Hub http://www.archiveshub.ac.uk
- Mimas http://mimas.ac.uk/
- UK Archives Discovery Network http://archivesnetwork.ning.com/
- Archives Hub strategic aims http://www.archiveshub.ac.uk/strategy/
- International Standard Archival Description http://www.ica.org/en/node/30000
- Encoded Archival Description official site http://www.loc.gov/ead/index.html
- Axiell CALM http://www.axiell.co.uk
- Adlib http://www.adlibsoft.com/
- AIM25 http://www.aim25.ac.uk/