Web Magazine for Information Professionals

Volcanic Eruptions Fail to Thwart Digital Preservation - the Planets Way

Matthew Barr, Amir Bernstein, Clive Billenness and Manfred Thaller report on the final Planets training event Digital Preservation - The Planets Way held in Rome over 19 - 21 April 2010.

In far more dramatic circumstances than expected, the Planets Project [1] held its 3-day training event Digital Preservation – The Planets Way in Rome over 19 - 21 April 2010. This article reports its proceedings.

The venue chosen for this Planets training event, the last of a series of five held over the past 12 months around Europe, was the prestigious Pontificia Università Gregoriana – the first Jesuit University, founded over 450 years ago in the heart of Rome and only a few minutes’ walk from the Trevi Fountain. There the Planets training team had been looking forward to welcoming delegates from all over Europe. When, therefore, a few days before the start, the ash cloud from the eruptions of the Icelandic volcano Eyjafjallajökull progressively closed increasing volumes of airspace across Europe, it also cast an increasing shadow over the entire event.

However, because approximately half the delegates confirmed their determination to attend, and all but one presenter successfully made alternative travel arrangements, the decision was taken that the event should proceed as planned, with those unable to travel being offered complimentary places at the Planets Open All-Staff Conference in Berlin on 19 May 2010.

Day 1: 19 April 2010

And so it was that delegates and presenters, having travelled from Austria, Germany, Italy, the Netherlands, Switzerland and the UK by car and train, gathered on 19 April to hear the welcome address by Reverend Father Martin M Morales SJ, Director of the University’s Archive. Father Morales welcomed the work of the Planets Project in providing practical tools and solutions for long-term digital preservation. He told the Conference that preservation has always been one of the tasks of cultural heritage institutions owing to the fragility of historic printed material which, over time, would make it increasingly difficult to provide access to scholars. Digitisation and digital preservation were simply new techniques in the continuing process of preservation. He stressed, however, the need to create structured metadata for digitised material to ensure that the problems which were now being encountered with paper material would not be encountered in the future when dealing with digitised versions.

Planets Trainers and Delegates gather in Rome

Clive Billenness, Planets Programme Manager, British Library, then proposed a view of Digital Preservation in which it is treated as simply another response to Business Risk, to be addressed within an organisation’s normal corporate governance arrangements. He demonstrated how the Planets Approach provides tools and services to support each stage of the Risk Management Cycle defined in a number of Standards and Guidance documents, running from Risk Identification, through Assessment, Evaluation, Planning and Execution to Review. He also demonstrated how this approach conforms to the OAIS model. He invited delegates to return to their own organisation and examine their corporate Risk Register to see whether they could find references to risks to digital holdings within that document; and if not, to encourage their corporate risk managers to consider this as an emerging area of Risk.

Following this theme, Mark Guttenbrunner, Vienna University of Technology, one of the team developing the Planets PLATO Preservation Planning tool, then showed how the preservation of individual digital objects related to the wider policies and procedures of an organisation. He also emphasised that effective preservation planning is a cornerstone of creating a ‘trusted’ repository, citing work in this area by NESTOR, DRAMBORA and TRAC. Mark showed a workflow from Preservation Planning to Preservation Action. He also quoted the results of a User Survey performed by Planets in 2009 which demonstrated a clear link between the availability of budget within organisations to manage digital preservation and the existence of a digital preservation policy.

After coffee, Dr Ross King, Austrian Institute of Technology, presented some thought-provoking figures about the current size and projected growth-rate of the digital universe, translating the quite abstract measure of Exabytes of data created each year into the more pictorial visualisation of a stack of CDs which could reach to Mars and half-way back again.

Ross then considered both the challenges and incentives related to digital preservation. He noted that over the millennia, while the density at which successive civilisations have been able to store data on different media has increased (from hieroglyphics on the pyramids to the modern USB memory device), the permanence and robustness of the media used have reduced in almost direct proportion. So, while the Rosetta Stone, dating back to the second century BC, provided an effective and durable storage medium to enable three languages to be compared, at 760kg it lacked portability, although it required only a human eye to read its contents. In comparison, although a modern PC could store the same content within a few milligrams of silicon, it required a range of hardware and software to render the content readable by a human.

He contrasted the issues of format and media obsolescence and reflected on the pros and cons of the two main approaches to digital preservation: migration and emulation, pointing out that Planets accommodates both of these approaches.

Finally, returning to the theme of risk in digital preservation, Ross reflected on the possible contradiction which exists between normal business planning, which is based on the principle of a rapid Return on Investment, and Digital Preservation. The latter, if not driven by legislation, is likely to be undertaken with a view on very long-term returns which might not be immediately measurable financially. He therefore agreed that a risk management-based approach might be easier to use to create a business case justifying the investment required.

Sara van Bussel, National Library of the Netherlands, then introduced delegates to the types of action which might be taken to undertake digital preservation. She compared the two main approaches to digital preservation – migration of material from one format to another and the presentation of digital material in its original format within an emulation of its original technical environment.

Migration denotes that digital objects are regularly converted from one bytestream representation, following a specific format, used by the software available at one point of time, into the representations needed by later generations of software. Emulation describes an approach which leaves the bytestreams unchanged, but instead tries to preserve the older generation of software by allowing it to run unchanged on more modern hardware. She explained the strengths and risks associated with each approach and also considered whether format migration was best performed at ingest or access.

Sara then summarised the findings of the Planets survey of preservation tool provision for the most commonly used file formats in memory institutions. This confirmed that tools were available for the 10 most commonly used formats within the 57 migration tools known to and/or used by Planets Project partners.

She then explained to the delegates the different approaches to emulation available within Planets, and finished her presentation by demonstrating the relationship between the Planets Core Registry and the other Planets components, and explaining the potential uses for the data it stores about file formats, media and preservation tools.

To close the morning session, Professor Manfred Thaller, University at Cologne, then introduced the delegates to some of the complications arising from attempting to understand digital objects – the process of characterisation. He also demonstrated, using a simple graphical object and a commonly used graphics application, how the appearance of a file can be substantially altered despite an apparently successful migration from one format to another. He then detailed the various characterisation services available within Planets to Identify, Validate, Extract and Compare the properties of different digital objects.
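As a rough illustration of the Identify step, the sketch below guesses a format from the leading bytes of a file. It is not the Planets identification service: the function name `identify` and the small signature table are assumptions made for this example, although the byte signatures themselves are the published ones for these formats.

```python
# Hypothetical sketch of signature-based identification; not a Planets API.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
]


def identify(data: bytes) -> str:
    """Return a format label based on the leading bytes, or 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

A matching signature only suggests a format. A file can start with the right bytes and still be malformed, which is why identification is followed by validation.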

Manfred considered different options for the comparison of objects, and introduced the conference to Planets’ eXtensible Characterisation Extraction Language (XCEL), which describes the formats of digital objects created in different applications, and the eXtensible Characterisation Definition Language (XCDL) which is used to create application-independent XML files describing the contents of individual digital objects and so enable comparison of objects created by different applications.

He also demonstrated the need to apply automated techniques to the large-scale comparison of collections of objects owing to the time which would be required to perform human comparisons.

Having stated clearly the extent of the problems to be resolved in characterisation, Manfred then showed how Planets could assist with these problems by the use of XCEL, XCDL, the Planets Collection Profiling Tool (to inventory the different formats in use within a given collection) and the Planets Comparator to assist in the quality assurance of format migration.
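The idea of comparing extracted properties to check a migration can be sketched in a few lines. This is a simplified illustration only: it uses plain Python dictionaries rather than XCDL documents, and the property names (`width`, `colour_depth`, `has_alpha`) and the function `compare_properties` are invented for the example.

```python
# Hypothetical sketch: compare properties extracted before and after migration.
def compare_properties(before: dict, after: dict) -> dict:
    """Report properties that were lost, gained, or changed by a migration."""
    report = {}
    for name in sorted(set(before) | set(after)):
        if name not in after:
            report[name] = ("lost", before[name], None)
        elif name not in before:
            report[name] = ("gained", None, after[name])
        elif before[name] != after[name]:
            report[name] = ("changed", before[name], after[name])
    return report


# Invented property values for an image and its migrated copy.
original = {"width": 640, "height": 480, "colour_depth": 24, "has_alpha": True}
migrated = {"width": 640, "height": 480, "colour_depth": 8}
differences = compare_properties(original, migrated)
```

Here `differences` flags the reduced colour depth and the missing alpha channel, while the unchanged dimensions are omitted. Run over a whole collection, a report of this kind is what makes automated quality assurance feasible where human inspection is not.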

After lunch, the focus on the application of Planets tools and services continued.

Edith Michaeler, Austrian National Library, provided an overview of the Planets Testbed as a service to assist with the selection and benchmarking of digital preservation tools using its own ‘corpora’ of over 4,500 digital objects in multiple formats, and also to support Preservation Planning. In preparation for the practical training sessions which would follow on Days 2 and 3 of the course, Edith outlined the process by which experiments are conducted in the Testbed and invited the delegates to register to use the Testbed, which had recently been made publicly available [2]; a user account can also be requested by email [3].

An important aspect of the Testbed approach is the sharing of experiment results, including input and output files. As a heritage institution or other content holder, this is valuable because experiment data can provide an insight into other institutions’ experiences with preservation tools and, in turn, encourage a community of digital preservation expertise. Tool developers, too, can further their understanding of the community’s needs and benchmark their products against those of other providers.

Mark Guttenbrunner then returned with his colleague from the Vienna University of Technology, Hannes Kulovits, to introduce delegates to the Planets Preservation Planning Framework and the Planets online Tool (Plato). Hannes emphasised the need for preservation planning for individual objects to be contained within a framework encompassing organisational objectives and wider preservation policies. He illustrated the variety of different potential stakeholders within an organisation with a potential interest in any preservation plan adopted.

He then demonstrated how their views might be taken into consideration within a planning assessment framework. This can then be described in an objective-based decision tree, developed either directly in the browser-based Plato tool or using the Freemind mind-mapping tool and then imported into Plato. A structured assessment process then takes place, considering different possible preservation strategies and evaluating each against scored criteria derived from the objective tree. From this evaluation, a fully documented preservation plan can be identified and adopted for execution in the Planets Interoperability Framework.
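To make the evaluation step more concrete, the following sketch scores two candidate strategies against a small objective tree. It assumes a simple weighted average at each level, which may differ from the aggregation Plato actually performs; the tree, the weights, the 0 to 5 scores and the names `Node` and `score` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One objective in the tree; leaves are the measurable criteria."""
    name: str
    weight: int                      # relative weight among siblings
    children: list = field(default_factory=list)


def score(node: Node, leaf_scores: dict) -> float:
    """Weighted average of child scores; leaves are looked up in leaf_scores."""
    if not node.children:
        return float(leaf_scores[node.name])
    total = sum(child.weight for child in node.children)
    weighted = sum(child.weight * score(child, leaf_scores) for child in node.children)
    return weighted / total


# Invented objective tree: content matters more than cost (weights 3 and 2).
tree = Node("root", 1, [
    Node("content preserved", 3, [
        Node("text intact", 1),
        Node("layout intact", 1),
    ]),
    Node("cost", 2),
])

# Invented scores on a 0..5 scale for two candidate strategies.
option_a = {"text intact": 5, "layout intact": 3, "cost": 2}
option_b = {"text intact": 4, "layout intact": 4, "cost": 4}

result_a = score(tree, option_a)
result_b = score(tree, option_b)
```

Under these assumed weights, option B scores higher overall even though option A preserves the text slightly better. Changing the weights changes the ranking, which is why the stakeholder consultation that produces the tree matters as much as the measurements.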

The purpose of the session was to introduce delegates to the functionality of Plato and prepare them for practical sessions using the tool on the following two days, and so focused primarily on principles rather than hands-on use. The delegates were, however, now well prepared for the exercises on Days 2 and 3 of the event.

After a break, Ross King explained the structure of the overall Planets Framework within the context of the OAIS Repository Model. Ross demonstrated how workflows are created within the Planets Interoperability Framework using a series of templates which can either be used ‘as is’ or modified to meet the particular requirements of the activity envisaged.

Ross then demonstrated how this framework was applied within a ‘real’ digital preservation project at the British Library – a project to migrate existing digitised 19th Century Newspapers. The delegates were treated to a preview of a 5-minute documentary about this project - now available to view online at YouTube [4] - after which Ross explained how the Planets Framework had been connected to the BL’s repository services to complete this application of the Planets toolset successfully.

One positive comment which delegates have made about every Planets training course has been that they were pleased to see the inclusion of case studies about the application of Planets tools and services by partners. The Rome event was no exception as Barbara Sierman, National Library of the Netherlands, explained how it was envisaged that Planets tools would be integrated within their forthcoming project to replace their existing digital repository system (e-Depot). She also demonstrated how the Functional Model which underpins Planets is compatible with the OAIS model.

Closing the first day, Clive Billenness, British Library, gave a short presentation on plans to sustain the outputs from the Planets Project. He gave details of the newly formed not-for-profit organisation The Open Planets Foundation (OPF) [5]. He explained how the individual organisations would be able to benefit from existing Planets outputs and continuing development undertaken by the OPF either as subscribing members of the OPF or simply as part of the wider Digital Preservation Community.

Day 2: 20 April 2010

On the following day, delegates returned to participate in more practical work with the Planets tools and services.

The day started, however, with a joint presentation by Giovanni Bergamin of the Biblioteca Nazionale Centrale di Firenze and Rossella Caffo of the Central Institute for the Union Catalogue of Italian Libraries and for Bibliographic Information.

Giovanni gave a background to the Magazzini Digitale (Digital Stacks) Project and the vision of how it would evolve from a development project into an actual service. He emphasised that this was not so much a technical IT project as a series of management activities addressing financial, curatorial and legal issues. He also highlighted the similarity in the issues to be addressed for both digital and physical assets. This project focuses heavily on non-proprietary software and operating systems in order to reduce the problems with future hardware and software compatibility. For the same reasons, it uses low-cost and commonly available storage technologies relying on replication across multiple sites in preference to more complex disk mirroring.

Giovanni also explained that the project was being conducted in an environmentally conscious way, taking into account power consumption and cooling requirements in the design of solutions.

Finally, he contrasted the ‘River’ and ‘Lake’ approaches to metadata management proposed by Eric Hellman and Lorcan Dempsey. The ‘Lake’ assumes relatively few metadata schemas in use within the repository model with a limited number of sources feeding them. In the preferred ‘River’ model, there is an assumption that, over a long period of time, schemas may change and stores of metadata may be fed by a large number of tributary sources. Giovanni explained why the Magazzini Digitale, with its wide and undefined potential future community of practice, would be adopting the ‘River’ model.

Rossella Caffo then introduced the latest digital preservation projects in Italy. She explained how The Institute of the Union Catalogue of Italian Libraries (MiBAC) promotes preservation of and access to digital cultural heritage across Europe. Within the ATHENA Project (Access to Cultural Heritage Networks across Europe) Italy fosters the integration of museums in digitising programmes. A further joint-programming initiative chaired by the Italian ministry of culture co-ordinates research in the field of cultural heritage, both tangible and digital. These strategic framework programmes will raise awareness of digital preservation and help define a common European agenda for digital preservation research.

The Conference now moved into a series of more practical sessions. Firstly, continuing his presentation from Day 1, Manfred Thaller gave a practical presentation on the issues relating to characterising digital objects. He demonstrated how fragile digital file formats can be, and how this fragility and susceptibility to corruption through media storage or transcription error can influence the choice of file format for long-term preservation.
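The effect of a single corrupted bit can be explored with a small experiment. The sketch below is not taken from the presentation: it flips one bit in an uncompressed text and in a zlib-compressed copy of the same text, then reports how much of the original can be recovered. The helper names are invented, and zlib merely stands in for any compressed format.

```python
import zlib


def flip_bit(data: bytes, index: int, bit: int = 0) -> bytes:
    """Return a copy of data with one bit inverted at the given byte index."""
    damaged = bytearray(data)
    damaged[index] ^= 1 << bit
    return bytes(damaged)


def read_compressed(blob: bytes):
    """Decompress, returning None when the decoder rejects the stream."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


def damage_report(original: bytes, recovered) -> str:
    """Summarise how much of the original survived."""
    if recovered is None:
        return "unreadable"
    if len(recovered) != len(original):
        return "length changed"
    wrong = sum(a != b for a, b in zip(original, recovered))
    return f"{wrong} byte(s) wrong"


text = ("Digitisation and digital preservation are simply new techniques "
        "in the continuing process of preservation.").encode("ascii")

# One flipped bit in the middle of the uncompressed text.
plain_result = damage_report(text, flip_bit(text, len(text) // 2))

# One flipped bit in the middle of the compressed copy.
packed = zlib.compress(text)
packed_result = damage_report(text, read_compressed(flip_bit(packed, len(packed) // 2)))
```

In the uncompressed case exactly one byte is wrong and the rest remains legible. In the compressed case the outcome depends on where the flip lands, and the decoder will typically reject the whole stream, which is one reason simpler and less tightly packed formats can be attractive for long-term storage.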

He also raised the important question of what is authentic when considering preservation, where rendering and layout influence the meaning conveyed. He posed this question in the context of a document transmitted in A4 page size and received in Letter page size, leading to differences in page break points and changes in the meaning conveyed by the content.
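The page-size effect is easy to quantify. The following sketch assumes 25 mm top and bottom margins and 12-point line spacing, both chosen purely for illustration; real layouts depend on the word processor and fonts in use. The page heights are the standard ones: 297 mm for A4 and 279.4 mm (11 inches) for Letter.

```python
MM_PER_POINT = 25.4 / 72  # one typographic point in millimetres

PAGE_HEIGHT_MM = {"A4": 297.0, "Letter": 279.4}


def lines_per_page(page: str, margin_mm: float = 25.0, line_pt: float = 12.0) -> int:
    """Whole lines that fit between the top and bottom margins (assumed values)."""
    usable_mm = PAGE_HEIGHT_MM[page] - 2 * margin_mm
    return int(usable_mm // (line_pt * MM_PER_POINT))


def page_of_line(line_number: int, page: str) -> int:
    """1-based page on which a 1-based line number falls."""
    return (line_number - 1) // lines_per_page(page) + 1
```

Under these assumptions an A4 page holds 58 lines and a Letter page 54, so line 55 of a document sits at the foot of page 1 on A4 but opens page 2 on Letter. Any reference such as ‘see the table on page 2’ can therefore change meaning in transit.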

Sara van Bussel then gave a practical demonstration of two of the emulation tools available within Planets:

Sara also provided a more detailed explanation of the Planets Core Registry, explaining how it had evolved from The UK National Archives’ Pronom registry and how it can be accessed and exploited by other Planets components and also how users might make use of its services either from within Planets or as a direct data source accessible via a Web browser or via another Web service.

Next, Mark Guttenbrunner and Hannes Kulovits introduced an extended practical session working with the Planets Preservation Planning Tool, Plato. Working in syndicates, delegates were given a sample collection of archived images containing both text and graphics and were invited to use the Freemind tool to create mindmaps for different aspects of the digital preservation issues to be considered in creating a planning assessment framework. These mindmaps can be imported into Plato to form a preservation decision tree.

Typically, the decision tree considers preservation issues under 4 main areas:

These issues are expressed as measurable results, either as a Boolean Yes/No or as an ordinal value or score.
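A small sketch shows how Boolean and ordinal results might be placed on a common numeric scale so that they can be compared. The ordinal labels, the 0 to 5 range and the function `to_score` are assumptions for illustration and are not taken from Plato.

```python
# Hypothetical ordinal scale, ordered from worst to best.
ORDINAL_SCALE = ["unusable", "poor", "acceptable", "good", "excellent"]


def to_score(value, max_score: float = 5.0) -> float:
    """Map a Boolean or ordinal measurement onto a 0..max_score scale."""
    if isinstance(value, bool):
        return max_score if value else 0.0
    if value in ORDINAL_SCALE:
        position = ORDINAL_SCALE.index(value)
        return max_score * position / (len(ORDINAL_SCALE) - 1)
    raise ValueError(f"unrecognised measurement: {value!r}")
```

With this mapping, a Yes scores 5.0, a No scores 0.0, and the ordinal value "acceptable" lands in the middle at 2.5.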

At the end of the session, the delegates brought together their proposals for different aspects of the decision tree and discussed each other’s work. This exercise led to a great deal of interesting discussion as varying requirements arose from different delegates’ perceptions of the needs of their collection.

The exercise highlighted that preservation planning cannot be done quickly, nor without a considerable amount of consultation. The Plato team confirmed, however, that template preservation plans would be provided to accelerate the process of generating a plan, and also to present a series of commonly arising considerations for different types of object.

Finally, to close the day, Edith Michaeler was joined by Matthew Barr of HATII at the University of Glasgow for a practical demonstration of the Planets Testbed in preparation for its use by delegates on Day 3. Experiments on the Testbed follow a defined cycle, and during this session, Edith and Matthew took delegates through the entire cycle from design and definition to execution and assessment of the results.

Day 3: 21 April 2010

The final day maintained its focus on the practical aspects of the Planets Tools and Services.

The day began, however, with an introduction by Amir Bernstein of the Swiss Federal Archives to their database archiving tool SIARD (Software Independent Archiving of Relational Databases). Amir began by considering the logical structure of relational databases and some of the difficulty in the long-term preservation of their contents.

The SIARD system extracts all the data tables and records from a variety of proprietary databases and stores them in XML format in a series of folders, all within an uncompressed ZIP64 file. The database is supplemented with metadata generated during the extraction process, and the final file can be both read in its extracted form and re-imported into another database for processing.
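The general approach can be sketched with standard library modules. This is not SIARD itself: an in-memory SQLite database stands in for a proprietary one, and the folder layout, XML element names and the function `export_database` are illustrative assumptions that do not follow the SIARD specification.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
import zipfile


def export_database(conn: sqlite3.Connection) -> bytes:
    """Write every table to its own XML file inside an uncompressed ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED,
                         allowZip64=True) as archive:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        metadata = ET.Element("metadata")
        for table in tables:
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            columns = [description[0] for description in cursor.description]
            root = ET.Element("table", name=table)
            for record in cursor:
                row_el = ET.SubElement(root, "row")
                for column, value in zip(columns, record):
                    cell = ET.SubElement(row_el, "column", name=column)
                    cell.text = None if value is None else str(value)
            # One folder per table (hypothetical layout).
            archive.writestr(f"content/{table}/{table}.xml",
                             ET.tostring(root, encoding="utf-8"))
            # Record the table structure as metadata gathered during extraction.
            table_meta = ET.SubElement(metadata, "table", name=table)
            for column in columns:
                ET.SubElement(table_meta, "column", name=column)
        archive.writestr("header/metadata.xml",
                         ET.tostring(metadata, encoding="utf-8"))
    return buffer.getvalue()


# Stand-in database with a single invented table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE holdings (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO holdings VALUES (1, 'Newspaper scan')")
package = export_database(conn)
```

Storing the entries uncompressed keeps each XML file directly readable inside the archive, and `allowZip64=True` permits archives beyond the size limits of the classic ZIP format.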

Amir then connected live to a database server in Switzerland and demonstrated the tool operating with a remote database, extracting all the data into a new .ZIP file stored on his local computer.

The tool drew many positive comments from the delegates, who were very interested to learn about future plans to expand the range of databases to which it can be connected.

The greater part of the rest of the day was filled with two further practical sessions facilitated by the course leaders.

The first one was to complete the sample PLATO plan based on the work of the previous day by testing a number of preservation options and assessing their outcomes. This provided the opportunity to explore how different decisions taken earlier within the decision tree affected the scores and recommendations emerging from the plans. Delegates were also shown how to investigate in detail the outputs from a planning exercise to determine what were the critical factors which led some options to be either rejected entirely or otherwise to be marked lower than others.

The second session was to enable delegates to conduct individual experiments within the Planets Testbed using a number of different scenarios provided for them. Not all of them were designed to end successfully, and, again, delegates were encouraged to investigate the diagnostic information to understand better why an experiment had ‘failed’.

The final session of the conference was conducted by Clive Billenness, who provided a technical explanation of the installation options for the Planets software as well as a more detailed description of the components provided within the installation suite, which includes a suite of management and configuration tools to enable Planets to be installed, set up and used by an IT-literate user without excessive difficulty.

Clive finished his presentation with a warning, however, that the Planets tools are still part of a project and are therefore likely to undergo final updates and reconfigurations before the project finishes. For this reason, he encouraged delegates to use the online versions at the Project Web site first before experimenting with their own installations.

Conclusion

And so the fifth and final Planets training event closed, and delegates and trainers set off home, this time with the airways re-opened and the roads less congested. Those delegates who were unable to attend this course because of the travel disruption have been offered places free of charge at the final Planets event – an open All-Staff Conference in Berlin on 20 May 2010.

During the past 12 months, more than 200 people have attended the Planets training events in Copenhagen, Sofia, Bern, London, and Rome. Feedback has been consistently positive about the format and content of the courses, and the training material has been continuously modified in response to comments received.

A number of delegates have indicated that they would like the Planets team to come to their country and provide a repeat course for colleagues. While this cannot be undertaken within the Planets Project, it is anticipated that its successor, the Open Planets Foundation, will provide regular training courses around the world.

The material presented during the course has been placed online to provide a free training resource for the future and the training team hope that delegates will continue to benefit from it for a long time to come.

As Programme Manager for this project, I would like to take the opportunity publicly to thank here the many people who worked tirelessly and, in some cases, in the midst of all kinds of crises, to ensure that all the events ran smoothly. As always, making an event appear effortless inevitably requires a great deal of effort behind the scenes. I would also like to thank all those who attended the courses and who contributed so actively to the various sessions.

My final thanks I offer to the European Commission, who provided us with the finance not only to run these courses at a cost which made them accessible to both private researchers and employer-sponsored delegates, but also to make grants to delegates from certain countries to cover the full costs of their attendance.

Additional Resources

The slides used over the 3 days are now available for download [6].

In addition, a series of audio-visual presentations based on the material used on Day 1 of the Training Course is available to view or download [7].

They provide a complete set of study material which introduces the concepts of digital preservation and gives a first outline of the Planets tools and services without requiring any deep prior knowledge about the topic.

References

  1. Planets (Preservation and Long-term Access through Networked Services)
    http://www.planets-project.eu/
  2. Planets Testbed http://testbed.planets-project.eu/testbed
  3. helpdesktb@planets-project.eu
  4. Preserving the British Library’s C19 Newspaper Collection with Planets: a short film
    http://www.youtube.com/watch?v=K6NnFcSpAh8
  5. The Open Planets Foundation (OPF) http://openplanetsfoundation.org/
  6. Zip file http://www.planets-project.eu/events/rome-2010/presentations/planets_all.zip
  7. Planets Publications http://www.planets-project.eu/events/audio-visual/index.htm

Author Details

Matthew Barr
Planets Project Testbed Team
HATII at University of Glasgow

Email: m.barr@hatii.arts.gla.ac.uk
Web site : http://www.gla.ac.uk/hatii/

Amir Bernstein
SIARD Team
Swiss Federal Archive

Email: amir.bernstein@postmail.ch
Web site: http://www.bar.admin.ch/

Clive Billenness
Planets Project Programme Manager
The British Library

Email: clive.billenness@bl.uk
Web site: http://www.bl.uk/

Manfred Thaller
Professor of Computer Science for the Humanities
University at Cologne

Email: manfred.thaller@uni-koeln.de
Web site: http://www.hki.uni-koeln.de/
