Further Experiences in Collecting Born Digital Archives at the Wellcome Library
In a previous article we discussed how the Wellcome Library had accepted that born digital material will form part of its collections in the future. Work is now under way to give practical shape to these plans, and in the last six months born digital archival material has begun to be acquired by the Library. This article assesses the progress that has been made and discusses the experiences, and challenges, of dealing with real digital material.
Rules of Engagement
To recap from our previous report, the Library has acknowledged that much of the expertise necessary for dealing with digital material already resides with the archivists who work in the Library. The processes for acquiring and managing digital material are being built on sound archival practice, and driven by the archivists, supported by one new appointment, a digital curator to provide technical support. To date, this approach has proved to be robust, flexible and practicable.
We have begun to identify opportunities for acquiring digital material, initially from existing donor/creators, since in these cases we can build on an existing relationship. Much work has taken place to set out and make explicit the workflows that will determine the flow of digital material into the Library, and the sequence of activities that will track its course from negotiation to ingest into a repository. We have described how the workflows we were setting up for digital material were found to mirror quite closely those in existence for traditional paper material. However, digital material is 'diverted' down new paths for processes such as anti-virus checking and the creation of file listings. These processes generate new data, 'digital provenance', as well as technical metadata and some new information flows between archivist and donor/creator. We are still far from clear about exactly what sorts of metadata we will eventually require, and we have yet to define more clearly what we shall require from these tools.
We confirm receipt of digital material and its virus-free status (or otherwise) and provide donor/creators with a 'manifest' that details the material they have transferred to us. These new workflow paths can be seen as short loops added to existing workflows; indeed, one can see virus scanning before ingest as analogous to fumigation of paper records from a suspect source before allowing them into the strong room. These parallelisms provide something of a familiar model for donor/creators and for archivists facing this new medium.
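The 'manifest' described above can be illustrated with a short sketch. It is not the Library's actual tool: the function names, the choice of SHA-256 as a checksum and the CSV layout are all assumptions made for the purpose of illustration. It simply lists every file in a transfer with its relative path, size and checksum, which is the kind of information a donor/creator could check against their own records.

```python
import csv
import hashlib
import io
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to cope with large files."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(transfer_dir: Path) -> list[dict]:
    """List every file under transfer_dir (hypothetical layout: one folder per transfer)."""
    rows = []
    for path in sorted(p for p in transfer_dir.rglob("*") if p.is_file()):
        rows.append({
            "relative_path": path.relative_to(transfer_dir).as_posix(),
            "size_bytes": path.stat().st_size,
            "sha256": file_checksum(path),  # assumed checksum choice
        })
    return rows


def manifest_as_csv(rows: list[dict]) -> str:
    """Render the manifest as CSV text that could be sent to the donor/creator."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["relative_path", "size_bytes", "sha256"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling build_manifest on the quarantined copy of a transfer and passing the result to manifest_as_csv would give a listing that can be attached to the confirmation of receipt.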
We also supply donor/creators with an FAQ (Frequently Asked Questions) that sets out the broad steps we take in dealing with their digital material, as well as encouraging them to contact us with any concerns that they may have. In this way we aim to maintain the trust of our donor/creator community by providing processes that are transparent and predictable.
Experiences with Digital Materials
Working with digital material has proven more dynamic than we had anticipated. Each transfer varies: in format, in size/total volume and in the challenges associated with rendering it. Each has different and often new issues relating to selection, appraisal, arrangement and description. The design of our process has been cyclical: plan - implement - evaluate - plan. This iterative approach is designed to build a framework to structure our work. It allows us to change and adapt any of our plans very quickly in response to either learning or circumstance. By applying a consistent set of processes we ensure consistency across these widely differing situations.
There seems to be a fear of digital material and some uncertainty about how to move this material efficiently into the future. Understanding this has helped the Library identify how and when it has a role to play. As a collector of material the Library has to deal with many organisations as well as material in a variety of formats. We have a choice as to what we accept and the terms on which we do so. Equally, donor/creators have no obligation to provide us with material. We have to work hard to convince them that we can be trusted. The 'fear' of digital material cuts two ways: donor/creators can be nervous, and have to feel sure that the Library's strategy is appropriate for their material and addresses their concerns. Equally there is an awareness of the evanescence of digital material, and of the need for urgent action to preserve it, that can work in the Library's favour, provided our strategy convinces. To date our approach is succeeding with donor/creators. Feedback seems to indicate they value honesty about what we can promise and what remains to be addressed. For example, for the moment our focus is upon collection, ingestion and safe custody, with production of the material to researchers representing a separate problem to be addressed later. Donor/creators respond positively to our setting this out clearly. It also means that their learning curve is not so steep as to be unmanageable. Emphasising that this is a learning process for both parties has proved a productive approach.
In addition to working with material that has survived to be passed to us, we now understand better how material is lost. For example, one organisation promised us material but subsequently could not find it on their network. It later transpired the material in question had probably been deleted when it was discovered that the 'old' formats were incompatible with the new 'office' type software the organisation had adopted. Another organisation approached us after its digitisation project failed. It had looked to digitisation to provide gains in economic and operational efficiency, but it had not thought through the storage and access issues. The project failed, leaving it with the original paper files as well as a considerable amount of unorganised and unstructured digital material. Others assume that we will want to receive all material only in PDF, particularly PDF/A format.
Management support for our plans to collect digital material has remained crucial. Communication has been an essential strategy in maintaining this support. Staff involved have been communicating ideas, progress, and intentions to management, as well as to those staff members less closely involved, on a regular basis. This is not a trivial activity: it takes the time and effort of the Library's Digital Curator and two archivists, as well as affecting, to a lesser extent, the available time of other archivists. 'Going digital' is proving neither a cheap nor easy option.
As noted above, leveraging the existing goodwill and trust of our donor/creator community is crucial. However, the transition to working digitally is not completely smooth for some organisations. Stressing the experimental nature of our current operations, we have suggested that paper transfers should continue as normal, whilst we also receive digital material, with the donor/creator and the archive both taking a relaxed attitude to any resulting duplication. It was hoped that this would take some of the pressure off the donor/creator, who could operate knowing that there would be a paper insurance copy of the digital material transferred. In fact, if anything, this can lead to more complex interactions. Many organisations do not have any sort of organised records management policy for digital material and, accordingly, different staff may handle the different record formats. Paper records may be handled by a records/information management section, with varying degrees of corporate support, whilst, on the other hand, locating and transferring digital material may be the responsibility of an IT department with different perspectives upon what constitutes important material. Digital material can be found 'on request' by those responsible but we have much work to do if we are to build regular transfer schedules into their everyday business. A good relationship with the section handling paper records is crucial and can be leveraged into an entrée to the IT staff when a direct approach might be more difficult. There is a degree of redundancy involved in explaining again concepts with which the manager of paper records is already familiar, and in building once again that relationship of trust. Frequently Asked Questions (FAQs) and the plan to create pages on our Library web site should help make this education role more economical of effort.
From the outset we knew that the support of our own IT department would be crucial. We have continued to work with these colleagues, to seek their advice and to keep them informed of our plans and activities. With their support we have established a process for quarantining incoming digital material. We have also established procedures for virus checking and have secured stand-alone hardware for this purpose. Working with our IT colleagues has demonstrated to them our understanding of the issues and risks of bringing digital material into our corporate IT environment. In turn we are trusted to carry out anti-virus checks, and so retain full control over the material we are acquiring. As in our relations with donor/creators, openness and communication build the relationships on which the work succeeds.
The Library is clear that the only sustainable way to hold and manage digital material is through the use of a digital object repository. Work began on defining requirements for a repository in autumn 2007. Our requirements build on our experience with the Fedora repository and on requirements documents from other institutions, with significant input from the Library's archivists. The requirements are supported by documents outlining our workflows in relation to digital materials, our Preservation Policy, and work to identify those file formats we believe we have the ability and resources to preserve adequately.
Practical Problems Encountered
Our work to date has taken the form of trial runs, engaging for the first time with the practical issues of acquiring digital material. So far the Library has not encountered any major problems: for instance, we have not yet met any virus-infected material. There have, however, been some problems with validation keys assigned by the donor/creator. For some the use of such tools is an unfamiliar concept. In other cases we have not always been told what tools were used; even when we have, it can still be difficult to validate material successfully. Greater experience on our part and on the part of donor/creators is expected to smooth this process.
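One way to approach the validation problem described above is sketched below. It assumes that the 'validation keys' are checksums expressed as hexadecimal strings, which the text does not state explicitly, and the list of candidate algorithms is our own guess at what a donor/creator might have used. When we have not been told which tool produced a key, the sketch computes several common digests in a single pass over the file and reports which one, if any, matches.

```python
import hashlib

# Hypothetical shortlist of algorithms a donor/creator's tool might have used.
CANDIDATE_ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def digests_for(path: str) -> dict:
    """Compute every candidate digest for a file in one pass over its contents."""
    hashers = {name: hashlib.new(name) for name in CANDIDATE_ALGORITHMS}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def match_validation_key(path: str, supplied_key: str):
    """Return the name of the algorithm whose digest equals the supplied key, or None."""
    wanted = supplied_key.strip().lower()  # tolerate stray spaces and upper-case hex
    for name, value in digests_for(path).items():
        if value == wanted:
            return name
    return None
```

A result of None does not prove the file is damaged: the key may have been produced by a tool outside the shortlist, or computed over a different version of the file, so a follow-up conversation with the donor/creator would still be needed.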
One trend that seems to be emerging is the role of digitisation programmes. It is, of course, traditional that transfer of archives is often driven by running out of storage space. Our experience in the past few months - which may or may not turn out to be representative - suggests that organisations which formerly would simply have sought a home for bulky paper records may now reach for digitisation as the solution for bulk storage problems; sometimes without a clear picture of the preservation and storage requirements that such a programme would have.
We noted above the nervousness that exists around digital material, and resistance to transferring digital material to the Library. Issues and concerns seem to centre on uncertainty about the permanence of the material, or on confusion about what material we are attempting to acquire and for what reason. Somehow digital material is being perceived as 'different'. It is essential that we continue to work with our donor/creator community and further develop an education programme. This will acknowledge that for the near future paper and digital materials may be looked after by different officers in an organisation, who will have different skills, concerns and priorities.
What Remains to Be Done?
We still lack experience in dealing with digital material, and we have only just begun to engage with our donor/creators. We need to communicate widely that the Wellcome Trust is collecting digital material if we are to gain donor/creator trust. We will be making a space on the Wellcome Library Web site for digital curation and these pages will make our FAQs and other model documents publicly available. Nonetheless we anticipate that this education role will always represent a part of our work.
There still remains a gap between our plans and their implementation. We lack the practical tools to make working with digital material easier and more reliable. In particular we lack the tools to acquire technical metadata automatically. In the short term this may not represent too large a problem since we are 'learning by doing'. In the longer term greater experience with digital material will help us fill this gap.
As an institution that collects from a potentially huge variety of sources we are likely to acquire a wide range of material and formats. We know that much of the management of this material could be automated. We are not yet in a position to achieve this. We lack practical tools and infrastructure for the automated extraction of technical metadata and we have not yet done much work to identify what metadata, especially technical metadata, we will require for long-term preservation. We also have further work to do with donor/creators to ensure that they provide us with regular transfers of material, ideally in 'preferred' formats along with some of the necessary metadata. In the short term we will have to undertake manual management of most incoming material, which is not sustainable in the long term, although it is a great learning opportunity.
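To give a sense of what even very basic automation might look like, the sketch below records a few technical properties of a single file using only the Python standard library. The field names are hypothetical, and the file type is merely guessed from the file extension; this is an assumption-laden stand-in for proper format identification, not a description of the tools or metadata the Library will eventually adopt.

```python
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def basic_technical_metadata(path: Path) -> dict:
    """Collect a minimal, illustrative set of technical metadata for one file."""
    stat = path.stat()
    # guess_type inspects the file name only; it does not examine file contents.
    guessed_type, _encoding = mimetypes.guess_type(path.name)
    return {
        "file_name": path.name,
        "extension": path.suffix.lower(),
        "size_bytes": stat.st_size,
        "last_modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        "guessed_mime_type": guessed_type,  # None when the extension is not recognised
    }
```

Run over every file in a transfer, a function of this kind would at least replace some manual listing, although which properties are actually needed for long-term preservation remains the open question noted above.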
Are We Succeeding?
Clearly we are embarking on a long journey and it is too early to talk of success or failure, but based on our recent work the signs are promising. We are acquiring real digital material and donor/creators seem receptive to the way in which we are engaging with them. To date we can demonstrate progress, which feeds into the growth of confidence in our work and makes further progress likely. Success, we trust, will build upon success.
Going digital is clearly not an easy option nor one to be undertaken lightly. Yet engaging with digital material has proved to be a valuable practical step. Only by engaging with it can we test our hypotheses, learn from our experience and move forward. Building on the professional skills of archivists has been a useful example of this very practical and successful approach. The key to this has been staying flexible and being prepared to change any plans in the light of experience.
Examining the differences and similarities between digital and physical material confirms for us that we are building processes that are meaningful for our library. Increasingly, the similarities between digital and physical material suggest that we should re-examine our professional practice and apply more consistency of process between the two. This makes for more economically sustainable activities, leverages existing skills and experience and makes it easier to assimilate new material into our collections. We also recognise the value of documenting our processes. By documenting how we plan to work, we are creating tools to feed into our cyclical planning process of plan - implement - evaluate - plan.
Communication is proving crucial to supporting this work. We are communicating what we are doing not only to our donor/creator community, but also to our own senior management and our own IT department. Transparency and openness are being successfully used as tools to build trust and confidence. The process is slow, especially with external donor/creators, but those that we work with and keep informed feel engaged in our processes and supportive of our long-term strategy.
It also means making changes to our business as we go. In working through how we will deal with incoming digital material we are finding ourselves re-examining how we deal with physical material and making explicit processes that had hitherto been taken for granted. We are looking at how we communicate with our donor/creator community, how and why we record actions and activities we undertake, and the ways in which we appraise incoming material. This willingness to re-examine the way we work and to change it where necessary is an important step in the right direction.
The Wellcome Library has accepted that born digital material will form part of its collections in the future: we have only made first steps towards the achievement of that aim but it is proving to be an interesting and fruitful journey.
- "Collecting Born Digital Archives at the Wellcome Library", Christopher Hilton and Dave Thompson, January 2007, Ariadne Issue 50 http://www.ariadne.ac.uk/issue50/hilton-thompson/
- Hereafter simply referred to as digital material.
- "Wellcome Library Preservation Policy for Materials Held in Collections", http://library.wellcome.ac.uk/assets/wtx038065.pdf