Metadata: Preservation 2000
The Cedars conference, "Preservation 2000: an International Conference on the Preservation and Long Term Accessibility of Digital Materials," was held at the Viking Moat House Hotel in York on 7-8 December 2000. There were over 150 participants, about half of them from outside the UK. As a prelude to the conference proper, a one-day workshop entitled "Information Infrastructures for Digital Preservation" was held at the same venue on 6 December. This workshop was mostly concerned with preservation metadata and attracted over 70 participants.
The theme of the "Preservation 2000" conference was the long-term preservation of digital materials. The conference was organised as part of the Cedars (CURL Exemplars in Digital Archives) project and was sponsored by the Joint Information Systems Committee (JISC), the Research Libraries Group (RLG) and OCLC Online Computer Library Center. Cedars is a Consortium of University Research Libraries (CURL) project funded by JISC under phase 3 of the Electronic Libraries Programme (eLib). The project was funded to investigate some of the issues that relate to digital preservation, and the conference was an opportunity both for the project to share information about its outcomes and to look at other recent international developments. Throughout the three days, demonstrations of the Cedars demonstrator archive and of some BBC microcomputer emulation experiments undertaken as part of the CAMiLEON project were available in a room adjoining the conference venue.
Lynne Brindley, Chief Executive of the British Library, opened the "Preservation 2000" conference with a keynote address. The presentation described a range of digital preservation activities that had been undertaken in the UK since 1995 (including the Cedars project) and highlighted several national library approaches to the issue, including that of the National Library of Australia (NLA) and the European NEDLIB project. Brindley noted that, through these initiatives and others, much good work had already been carried out, but that there was a critical need to involve more stakeholders and to get digital preservation on the agenda of key decision makers and funding bodies. She pointed out that those concerned with the digital preservation problem have not "yet brought seriously on board authors, publishers and other digital content creators, funding agencies, senior administrators, hardware and software manufacturers, and so on." The presentation ended with a plea for a national (and international) manifesto, with eight specific commitments outlined. In summary, these commitments included:
- Public relations - there is a need to get digital preservation on the agenda of key decision-makers and funding agencies.
- Web archiving - there is a specific need to collaborate in research and development (and implementation strategies) for preserving significant Web sites.
- Digital preservation strategies - there is a need for the development of digital preservation strategies both at national and international level.
- Collaborative working - there is a need to commit to working internationally and collaboratively in ways that allow us to learn from mistakes as well as successes. We also need to ensure that there are links with other international stakeholder groupings. Specifically within the UK, it will be important to support the creation of the Digital Preservation Coalition.
The papers that followed were divided up into broad themes. For example, the first session was entitled "Models for distributed digital archives," and included an account of Cedars project outcomes by Kelly Russell of the Consortium of University Research Libraries (CURL) and a description of the LOCKSS project by Vicky Reich of Stanford University Library.
Russell's paper gave some background on the Cedars project and stressed the "vertical learning curve" that participants had faced at its beginning. She described how the project had adopted a distributed architecture based on the draft Reference Model for an Open Archival Information System (OAIS) produced by the Consultative Committee for Space Data Systems (CCSDS). The presentation also outlined the main project deliverables, which included the development of a demonstrator archive, the production of a metadata schema and guidance for collection managers. In summing up, Russell noted a number of lessons that the project had learnt over the past two and a half years, some of which are worth enumerating here. First, adopting the OAIS model had been important because, in addition to providing a generic architecture for a digital repository, it had helped the project to acquire a shared understanding of vocabulary and concepts. Secondly, Russell noted the vital importance of creating and maintaining metadata regardless of which digital preservation strategy has been adopted. Indeed, Russell pointed out that, in one sense, digital preservation is "all about metadata." This raises the issue of what specialist knowledge and expertise will be required of the staff who will need to create and maintain this metadata. The experience of the Cedars project suggests that expertise may need to be shared with specialists outside the traditional library and information domain, e.g. computer scientists. There may also be specific educational requirements for all of the cultural heritage professions, some of which could be met by specialised courses like the MPhil in Digital Management and Preservation offered by the Humanities Advanced Technology and Information Institute (HATII) at the University of Glasgow.
Russell finished her presentation with an explanation that funding had been acquired for an additional year of the project. This would enable some scalability testing of the Cedars demonstrator to take place as well as some continued work on preservation metadata standardisation and the organisation of some collection management workshops in conjunction with the JISC Digital Preservation Focus activity.
The LOCKSS (Lots of Copies Keep Stuff Safe) project is an initiative of Stanford University Libraries. It is concerned with maintaining access to Web-based journals (chiefly in the scientific, technical and medical areas) by distributing copies in Web caches managed by a number of distributed organisations (chiefly libraries). If many libraries take "custody" of Web content in this way, the caches can communicate with each other through a protocol called LCAP (Library Cache Auditing Protocol) and recover lost content. There was considerable interest in the LOCKSS approach at the conference, but it was noted that the technique could currently only be applied to Web-based information of a non-volatile nature.
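The "lots of copies" idea can be illustrated with a deliberately simplified sketch: each cache holds a copy of the same article, the caches compare digests of their copies, and any copy that disagrees with a clear majority is repaired from a copy that agrees. This is not LCAP itself; the function names, the use of SHA-256 digests and the simple majority vote are assumptions made for illustration only.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return a SHA-256 hex digest of one cached copy (illustrative fixity value)."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(caches: dict[str, bytes]) -> dict[str, bytes]:
    """Compare every cache's copy and repair copies that disagree with the majority.

    `caches` maps a hypothetical library name to its cached copy of one article.
    Returns a new mapping; the input is left unchanged.
    """
    votes = Counter(digest(copy) for copy in caches.values())
    winning_digest, count = votes.most_common(1)[0]
    if count <= len(caches) // 2:
        # No strict majority: a real system would need further auditing,
        # so this sketch leaves every copy untouched.
        return dict(caches)
    # Any copy matching the majority digest can serve as the repair source.
    good_copy = next(c for c in caches.values() if digest(c) == winning_digest)
    return {
        name: copy if digest(copy) == winning_digest else good_copy
        for name, copy in caches.items()
    }


if __name__ == "__main__":
    caches = {
        "library_a": b"<html>article text</html>",
        "library_b": b"<html>article text</html>",
        "library_c": b"<html>articXe text</html>",  # damaged copy
    }
    repaired = audit_and_repair(caches)
    print(repaired["library_c"] == caches["library_a"])  # True
```

The sketch also suggests why the technique suits non-volatile material: if the content legitimately changes, honest copies start to disagree and a majority vote can no longer distinguish change from damage.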
The next session concerned the management of national collections of digital information. This included descriptions of preservation activities in the British Library by Helen Shenton, a description of the NEDLIB project by Lex Sijtsma of the Koninklijke Bibliotheek (the National Library of the Netherlands) and a description of the National Library of Australia's experience by Colin Webb. Helen Shenton's presentation was entitled "From talking to doing" and described some of the preservation initiatives of the British Library, including the preservation functions of its newly commissioned Digital Library System (DLS). The core of the presentation, however, concentrated on the diversity of staff skills that are required, and how the British Library has built up its own expertise in this area both by involving existing staff in its digital preservation initiatives and by bringing in others from outside. Lex Sijtsma described the European Union-funded NEDLIB (Networked European Deposit Library) project, whose partners included a number of European national libraries and other organisations. He described the project's development of a Deposit System for Electronic Publications (dSEP), based on OAIS, and demonstrated an "Interactive Work Flow Tour" of it. Colin Webb's presentation reviewed NLA initiatives since the beginning of the PANDORA (Preserving and Accessing Networked Documentary Resources of Australia) project. This was an interesting paper based on the NLA's extensive practical experience of digital preservation. Reflecting on the strengths and weaknesses of the NLA approach, he noted that a particular long-term strength was the fact that the library had not been reliant on external funding for its digital preservation initiatives. The NLA itself had funded them as part of its core business.
This meant that these initiatives were not dependent on the whims of short-term research-type funding and that the NLA itself had built up a considerable amount of expertise in digital preservation.
The second day of the conference began with a session on the practicalities of digital preservation. This started with a description of the draft Workbook for the Preservation Management of Digital Materials by Maggie Jones of the Arts and Humanities Data Service. The "Workbook" is the result of a project funded by the Library and Information Commission (now Re:source) and is a comprehensive outline of best practice in the digital preservation area. Jones's paper concentrated on the results of the peer-review process that had just been completed and looked forward to developing the Workbook further as a training tool through various workshops. This presentation was followed by a paper entitled "Comparing Preservation Strategies and Practices for Electronic Records" by Michèle Cloonan and Shelby Sanett of the University of California, Los Angeles. This reported on a study carried out on behalf of the Preservation Task Force of the InterPARES project. The study involved interviews with individuals involved in digital preservation strategies. The presentation concentrated on perceived changes of emphasis in how the term "preservation" was understood and on some preliminary thoughts on the economic costs of preservation to institutions. The session was wrapped up with a brief outline of intellectual property rights issues by Ellis Weinberger of Cambridge University Library and the Cedars project.
The next session covered important issues relating to preserving the authenticity of digital information. The first presentation was by Nancy Brodie of the Treasury Board Secretariat of the Government of Canada, who addressed authenticity issues with regard to the requirements of scholars, law librarians and governments. This was followed by a presentation by Kevin Ashley of the University of London Computing Centre (ULCC) who, like Brodie, emphasised the need for the authenticity of digital objects to be known, and suggested that any accompanying metadata itself needed to be part of the chain of proof. He also looked at how differing rights of access might determine which parts of digital resources would have to be "hidden" from users or archive staff. For example, personal data in databases might have to be deleted or anonymised, as might some descriptive metadata. The final paper in the session was an outline by George D. Barnum of the US Government Printing Office (GPO) of that organisation's creation of a Federal Depository Library Program Electronic Collection. This looked at how the GPO is changing from an organisation primarily concerned with distributing printed materials to traditional library-type organisations into one that also hosts digital collections, in addition to its original role.
The final session was about "working together" and included an account by Robin Dale of RLG of ongoing international collaboration on digital preservation metadata. The presentation included a report of the Information Infrastructures for Digital Preservation workshop that was held the day before the conference (and which is described in more detail below). This was followed by a description of JISC's Digital Preservation Focus activity by Neil Beagrie, Assistant Director (Preservation) of JISC's DNER (Distributed National Electronic Resource) Office. One of the most important tasks of the Digital Preservation Focus will be setting up a UK Digital Preservation Coalition, which will be an important focus for continued collaborative activity in this area, both within the UK and internationally.
The conference ended with a closing keynote by James Michalko, President of RLG. He echoed many of the more internationally applicable points of Lynne Brindley's proposed manifesto and raised some difficult issues. For example, he asked participants to note that several important groups were not represented at the conference, e.g. high-level decision makers, software manufacturers, publishers and government agencies. He also noted that there were as yet no widely accepted business models for digital preservation. Commenting on Lynne Brindley's manifesto, he suggested that the UK Digital Preservation Coalition could help take forward the public relations agenda and orchestrate research and development initiatives in the wider international context.
Information Infrastructures for Digital Preservation workshop
On the day before the "Preservation 2000" conference started, a one-day workshop entitled "Information Infrastructures for Digital Preservation" was held at the same venue. The main theme of the workshop was digital preservation metadata. Accordingly, the first presentation of the day was by Brian Lavoie of OCLC, who gave an outline of a White Paper produced as part of the work of the joint OCLC and RLG Working Group on Metadata for Digital Preservation. The White Paper, which will be published in January 2001, describes the current state of the art in the development of metadata to support digital preservation. The paper begins by defining the objectives of a broadly applicable preservation metadata element set and then reviews how the OAIS model characterises the range of metadata elements needed to support the operation of a digital archive. Lavoie pointed out that the OAIS model was a useful starting point for identifying the types of information required to manage preservation in an archival system. He also noted, however, that it should not be treated as a rigid blueprint and that the model may need to be adapted, extended and altered. The White Paper also reviews, and attempts to synthesise, the various draft metadata specifications proposed by the Cedars project, the National Library of Australia (NLA), the NEDLIB project and Harvard University Library. Lavoie noted areas of convergence between the various specifications, notably the explicit or implicit influence of the OAIS model and their emphasis on defining preservation in terms of maintaining the accessibility of digital information objects in the context of changing technical environments. He also noted points of divergence relating to important issues like granularity and implementation. Lavoie pointed out that the real work of the joint OCLC/RLG working group will begin after the White Paper is published.
This will include the development of an overall metadata framework and the identification of the metadata elements needed to support digital preservation. The working group will also have to address implementation issues, possibly through some kind of testbed project, and produce recommendations on best practice.
At the end of Lavoie's paper, an international reaction panel made some comments. The panel included representatives of the developers of all of the preservation metadata initiatives reviewed in the White Paper: Stephen Chapman of Harvard University, Kelly Russell of the Cedars project, Colin Webb of the NLA and Catherine Lupovici of the Bibliothèque nationale de France (BnF) on behalf of the NEDLIB project. All of the panellists were happy with the general approach of the White Paper and appreciated RLG's and OCLC's support for the working group. Several of the panellists made the point that the preservation metadata specifications produced as part of their projects had been developed in response to specific practical requirements, and that it was interesting to see how these would all fit into a generic metadata framework.
The following four presentations concerned particular projects. Inka Tappenbeck of the State Library of Lower Saxony and University Library of Göttingen (SUB Göttingen) described the ongoing work of AP 2/5 of the CARMEN (Content Analysis, Retrieval and MetaData: Effective Networking) project. The project is funded by the Global Info programme, which is supported by German scientific societies, scientific information centres, libraries, publishing houses and the Federal Ministry for Education and Research (BMBF). Work package (AP) 2/5 of CARMEN concerns metadata for terms and conditions and for archiving. Partners in AP 2/5 include SUB Göttingen, the publisher Springer-Verlag and Munich University of Technology (TUM). AP 2/5 aimed to build on existing standards wherever possible, basing its metadata element set on the Dublin Core Metadata Element Set (DCMES) and the OAIS model. The OAIS model provided the starting point for the structure of the CARMEN AP 2/5 metadata element set with, for example, Reference Information being provided by DCMES 1.1 and Provenance Information being based on an adapted version of the metadata described in the Cedars outline specification. The prototype set was then compared with the format used by the Göttingen Digitisation Centre (GDZ).
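The layering described above, with OAIS supplying the overall structure, DCMES 1.1 supplying Reference Information and a Cedars-derived set supplying Provenance Information, might be sketched as nested records. This is a hypothetical illustration, not the actual CARMEN AP 2/5 element set: only the OAIS category names and the handful of DCMES 1.1 element names used for Reference Information come from those standards, while the class names, the provenance fields and the example values are invented for the sketch.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ReferenceInformation:
    """OAIS Reference Information, filled here with a few DCMES 1.1 element names."""
    identifier: str
    title: str
    creator: str = ""
    publisher: str = ""
    date: str = ""
    format: str = ""


@dataclass
class ProvenanceEvent:
    """One step in an object's history (hypothetical fields)."""
    date: str
    agent: str
    action: str


@dataclass
class ProvenanceInformation:
    """OAIS Provenance Information (field names are illustrative only)."""
    origin: str
    events: list[ProvenanceEvent] = field(default_factory=list)


@dataclass
class PreservationRecord:
    """Top-level record whose two parts follow OAIS information categories."""
    reference: ReferenceInformation
    provenance: ProvenanceInformation

    def add_event(self, date: str, agent: str, action: str) -> None:
        """Append a history entry so the record documents what was done to the object."""
        self.provenance.events.append(ProvenanceEvent(date, agent, action))


if __name__ == "__main__":
    record = PreservationRecord(
        ReferenceInformation(identifier="urn:example:article-1",
                             title="An example article",
                             publisher="Example Publisher",
                             format="application/pdf"),
        ProvenanceInformation(origin="Deposited by publisher"),
    )
    record.add_event("2000-12-06", "archive staff", "ingested into archive")
    print(asdict(record))  # nested dict, ready for serialisation
```

Keeping each OAIS category in its own structure mirrors the CARMEN approach of borrowing a different existing standard for each part, so that one part can be swapped or extended without disturbing the others.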
In the second presentation, Oya Rieger of the Department of Preservation and Conservation at Cornell University Library described the metadata aspects of Project Prism. Prism (Preservation, Reliability, Interoperability, Security, and Metadata) is a four-year research project funded by phase 2 of the Digital Libraries Initiative and involves personnel from both Cornell's Computer Science department and Cornell University Library. A major focus of Project Prism is information integrity. Rieger talked about the need for a well-defined data model for metadata to ensure extensibility. The presentation also mentioned two other Cornell-based projects. The first was a Web site profiler tool developed by students in the Computer Science department that can be used to analyse Web sites and create a profile of characteristics that might be "important in maintaining, mirroring and preserving them." The second project concerned the implementation of a digital preservation strategy for Cornell's digital image collections, a project funded by the Institute of Museum and Library Services (IMLS).
The third presentation was by Günter Mühlberger of the University of Innsbruck. He described the Metadata Engine (METAe) project, funded by the European Commission under the 5th Framework Programme. The project intends to develop digitisation software that will be able to extract metadata about a document's structure during the digitisation process itself and create richly structured text in XML (Extensible Markup Language). This enriched output of the digitisation process might enable new products to be developed, might make access easier for visually impaired users and might also aid the long-term preservation of the digital object created by the digitisation process.
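The kind of output METAe aims for can be pictured with a small sketch that turns recognised structural units (for example, headings and paragraphs detected on a scanned page) into nested XML. The `(kind, text)` input format, the element names and the `page_to_xml` function are hypothetical illustrations; they are not the METAe software or its schema.

```python
import xml.etree.ElementTree as ET


def page_to_xml(page_number: int, blocks: list[tuple[str, str]]) -> str:
    """Serialise recognised text blocks from one page as structured XML.

    `blocks` is a list of (kind, text) pairs, where kind is "heading" or
    "paragraph"; both this input format and the element names are hypothetical.
    """
    page = ET.Element("page", number=str(page_number))
    section = None
    for kind, text in blocks:
        if kind == "heading":
            # A heading opens a new section; following paragraphs nest inside it.
            section = ET.SubElement(page, "section")
            ET.SubElement(section, "heading").text = text
        elif kind == "paragraph":
            # Paragraphs before the first heading attach directly to the page.
            parent = section if section is not None else page
            ET.SubElement(parent, "p").text = text
        else:
            raise ValueError(f"unknown block kind: {kind}")
    return ET.tostring(page, encoding="unicode")


if __name__ == "__main__":
    blocks = [("heading", "Introduction"), ("paragraph", "First paragraph.")]
    print(page_to_xml(1, blocks))
    # <page number="1"><section><heading>Introduction</heading><p>First paragraph.</p></section></page>
```

Because the structure is captured when the page is digitised, rather than reconstructed later from flat text, the XML can serve both access (for example, navigation by heading) and preservation.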
Finally, Margaret Byrnes of the US National Library of Medicine (NLM) outlined the findings of the NLM's permanence working group. The working group developed an initial permanence rating system that could be applied to the range of digital resources that the NLM publishes.
Some common themes
It is very difficult to identify and summarise all of the issues raised at the Preservation 2000 conference and the Information Infrastructures for Digital Preservation workshop, but what follows is a personal attempt to identify some common themes. These are in no particular order and, as they are the result of personal reflection, some other important themes may have been neglected.
- Preservation strategies - Lynne Brindley reminded us that in a world of digital information we are no longer able to rely on "benign neglect" as a backup preservation strategy. This strategy worked quite well (in some contexts) for traditional information types, but it is not a sustainable option in the digital era. Those concerned with digital preservation need to be involved at the beginning of the digital life cycle.
- Data creators and publishers - it is vital that there is good communication between those who create and publish digital information on the one hand and those who are responsible for its preservation on the other. This is not just an intellectual property issue; it also reflects the fact that data creators and publishers often have a very detailed technical understanding of the nature of the resources they make available. This technical knowledge might need to be part of the metadata that accompanies a digital resource.
- Intellectual property rights (IPR) - some of the issues relating to IPR were discussed in Ellis Weinberger's presentation, but solving IPR issues, in collaboration with publishers and other rights owners, remains an important problem that needs to be addressed by libraries and other cultural heritage organisations. Negotiating specific rights for preservation may need to be part of licence negotiations between publishers and library consortia.
- Collection management - although we don't have precise figures, there is a perception that digital preservation (if not digital storage per se) costs a lot of money. Collection management policies for digital information will have to strike a balance between keeping everything, which maintains the possibility of future serendipity, and keeping the minimum amount of information possible. Lynne Brindley said that "we must bear in mind that we are in effect deciding what record will be available in the future: a decision not to select a digital document means there is unlikely to be the serendipitous find in the future."
- Metadata - two of the presenters (Kelly Russell and Oya Rieger) emphasised the centrality of metadata in the digital preservation process. The work of the OCLC and RLG Working Group on Metadata for Digital Preservation will be important in informing the future development of preservation metadata standards.
- Web archiving - the importance of preserving (parts of) the World Wide Web was raised several times at the conference. This should be the topic of future research.
- Staff expertise - Kelly Russell and others raised the issue of the level of staff skills required for digital preservation activity, especially for the creation of technical metadata (e.g., Representation Information in OAIS terminology). This may need to be addressed through specific education and training.
- Collaboration - throughout the conference and workshop there was an emphasis on working together and sharing experiences, successes and failures. It is hoped that the development of groups like the UK Digital Preservation Coalition will help foster international collaboration and co-operation.
The Preservation 2000 conference was a good reflection of the current state of the art in digital preservation. It demonstrated, if nothing else, that digital preservation is slowly moving from being the focus of specific research and development projects to being seen as part of the core mission of libraries and other cultural heritage organisations. That is not to say that there is no need for more research and development work. There is, and it is hoped that this can be co-ordinated at a national level. Full proceedings of both the conference and the workshop will be made available on the RLG Web site, and the conference papers will be published in a special issue of the New Review of Academic Librarianship. An initial account of both events has already been published in RLG DigiNews.
- 1. Preservation 2000: an International Conference on the Preservation and Long Term Accessibility of Digital Materials: http://www.ukoln.ac.uk/events/cedars-2000/
- 2. Cedars project: http://www.leeds.ac.uk/cedars/
- 3. CAMiLEON project: http://www.si.umich.edu/CAMILEON/
- 4. Lynne Brindley, Preservation 2000: keynote speech, Presentation given at: Preservation 2000: an International Conference on the Preservation and Long Term Accessibility of Digital Materials, Viking Moat House Hotel, York, 7-8 December 2000. http://www.bl.uk/concord/otherpubs_speeches04.html
- 5. Consultative Committee for Space Data Systems, Reference model for an Open Archival Information System (OAIS), Red Book, Issue 1. CCSDS 650.0-R-1. Washington, D.C.: National Aeronautics and Space Administration, May 1999. http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html
- 6. University of Glasgow, Humanities Advanced Technology and Information Institute, MPhil in Digital Management and Preservation. http://www.hatii.arts.gla.ac.uk/Courses/DigitalMPhil/
- 7. LOCKSS: http://lockss.stanford.edu/
- 8. NEDLIB project: http://www.kb.nl/coop/nedlib/
- 9. PANDORA project: http://pandora.nla.gov.au/pandora/
- 10. Neil Beagrie and Maggie Jones, Preservation Management of Digital Materials Workbook. Pre-publication draft, October 2000. http://www.jisc.ac.uk/dner/preservation/workbook/
- 11. InterPARES project: http://www.interpares.org/
- 12. Government Printing Office (GPO), Federal Depository Library Program Electronic Collection (FDLP/EC): http://www.access.gpo.gov/su_docs/locators/net/abtfdlpec.html
- 13. JISC Digital Preservation: http://www.jisc.ac.uk/dner/preservation/
- 14. RLG and OCLC Explore Digital Archiving. RLG News Release, 10 March 2000: http://www.rlg.org/pr/pr2000-oclc.html
- 15. Kelly Russell, Derek Sergeant, Andy Stone, Ellis Weinberger and Michael Day, Metadata for Digital Preservation: the Cedars Outline Specification. Leeds: Cedars project, March 2000. http://www.leeds.ac.uk/cedars/OutlineSpec.htm
- 16. National Library of Australia Preservation Metadata Working Group, Preservation Metadata for Digital Collections: Exposure Draft. Canberra: National Library of Australia, 15 October 1999. http://www.nla.gov.au/preserve/pmeta.html
- 17. Catherine Lupovici and Julien Masanès, Metadata for Long Term Preservation. NEDLIB Report series, 2. The Hague: Koninklijke Bibliotheek, July 2000. http://www.kb.nl/coop/nedlib/results/D4.2/D4.2.htm
- 18. CARMEN AP 2/5: Metadaten zu Terms und Conditions und zur Archivierung: http://harvest.sub.uni-goettingen.de/carmen/
- 19. Project Prism: http://www.prism.cornell.edu/main.htm
- 20. Web Site Profiler: http://www.cs.cornell.edu/Courses/cs501/2000fa/project-folder/profiler.html
- 21. Preserving Cornell's Digital Image Collections: Implementing an Archival Strategy: http://www.library.cornell.edu/imls/
- 22. Metadata engine project: http://meta-e.uibk.ac.at/
- 23. Papers from the Preservation 2000 conference and presentations from the Information Infrastructures for Digital Preservation workshop are available on the RLG Web site: http://www.rlg.org/events/pres-2000/
- 24. Robin Dale and Neil Beagrie, "Digital Preservation Conference: Report from York, UK." RLG DigiNews, Vol. 4, no. 6, 15 December 2000. http://www.rlg.org/preserv/diginews/diginews4-6.html#feature2
Photographs by Philip Hunter, UKOLN