iPRES 2008

Frances Boyle and Adam Farquhar report on the two-day international conference, the fifth in the series on the preservation of digital objects, held at the British Library on 29-30 September 2008.

iPRES 2008, the Fifth International Conference on Preservation of Digital Objects, was held at the British Library on 29-30 September 2008. From its beginnings five years ago, iPRES has retained its strong international flavour. This year, it brought together over 250 participants from 33 countries. iPRES has become a major international forum for the exchange of ideas and practice in digital preservation.

The theme of the conference was ‘Joined Up and Working: tools and methods for digital preservation’. Preserving our scientific, cultural and social digital heritage draws together activity across diverse disciplines. It transcends international boundaries and incorporates the needs of disparate communities. By working together, the community has made real, concrete progress towards solving the problems that were identified in earlier years.

This was the first year that iPRES collected and published full papers in addition to the presentations provided at the conference. Authors’ abstracts were reviewed by at least three members of the Programme Committee for quality, innovation, and significance. There are very few venues for publishing conceptual frameworks, scientific results, and practical experience in digital preservation. The inclusion of full papers makes an important contribution to the field by addressing this problem.

This was also the first year that iPRES had a vendor and sponsor presence during the conference. Ex Libris, Sun and Tessella demonstrated offerings with emerging preservation components. As digital preservation moves into the mainstream, it is anticipated that this trend will continue.

The conference was blogged extensively. Particular highlights included the comprehensive near-real-time report by Chris Rusbridge, Director of the Digital Curation Centre [1], and the Information Environment blog from the JISC [2].

There were two tracks in the programme: one designed for those with an interest in practically preserving digital content within their organisation, the other for those with an interest in underpinning concepts and digital preservation technology. The programme was a mix of short presentations (between 15 and 20 minutes) and longer panel discussions with audience participation.

Day One – 29 September 2008

The keynote address was given by Dame Lynne Brindley, CEO of the British Library. Her theme was digital preservation as a component of a larger, complex jigsaw. She described some of the projects and work ongoing at the British Library (BL). She noted that there was now an awareness of the need for digital preservation even among the general public, as confirmed by the public’s response to a recent BL project on email preservation. Her final comment, a thesis which was returned to throughout the conference, was that digital preservation was perhaps no longer the most appropriate term for the activities involved; she suggested ‘preservation for access’.

Session 1: Modelling Organisational Goals

1. Digital Preservation: A Subject of No Importance? Neil Beagrie, Najla Rettberg
2. Modelling Organisational Goals to Guide Preservation. Angela Dappert
3. Component Business Model for Digital Preservation. Raymond Van Diessen, Barbara Sierman.
4. Development of Organisational and Business Models for Long-term Preservation of Digital Objects. Suzanne Lang, Michael Minkus.

The opening session of the conference looked at high-level mapping of digital preservation policies to wider institutional goals and strategies. Across the four papers the recurrent message was that in order to gain resonance and ‘buy-in’ from the host organisation the approach needed to be contextualised. It was essential to take into account the overall goals of the organisation.

The first paper reported on work funded by JISC to model a generic preservation policy. The work had not been fully published at the time of the conference or of writing, but it will certainly be of interest to the wider community. The other three speakers looked at more detailed approaches to modelling, including work at specific institutions, the National Library of the Netherlands (KB) and the Bavarian State Library. The Planets [3] Project looked at the risk management issues associated with the decisions that would need to be made at the local organisational level, depending on the approach taken.

Session 2: Disciplinary Contexts (Practitioner Track) and Digital Preservation Formats (Technical Track)

Disciplinary Contexts
1. Long-term Preservation of Electronic Literature. Sabine Schrimpf
2. Preservation of Art in the Digital Realm. Tim au Yeung
3. Dexterity: Data Exchange Tools and Standards for Social Sciences. Louise Corti
4. Sustaining Digital Scholarship as Cooperative Digital Preservation. Bradley J Daigle
5. Adapting Existing Technologies for the Digital Archiving of Personal Lives. Jeremy Leighton John

Each of these parallel tracks had five presentations concluding with a question and answer session.

The practitioner track featured presentations about diverse formats from a range of institutions and countries. A common thread throughout the presentations was the need to couch their work in terms of the organisational needs and objectives and to ensure that ongoing work was given an institutional orientation.

We heard about a joint approach by the Deutsche Nationalbibliothek, the Deutsches Literaturarchiv Marbach and nestor in their work to develop preservation strategies for electronic literature. From the University of Calgary we heard about the issues around preserving digital art, drawn from a number of case studies. The speaker concluded that there were no definite answers in this field and that the work was in a state of flux.

The complex problem set surrounding digital scholarship was illustrated from work at the University of Virginia. The thrust of this paper was that to sustain digital scholarship there needs to be a collaborative approach taken at the organisation level to formulate an achievable digital preservation strategy.

The work at the UKDA on the data exchange tools for social scientists graphically illustrated the issues in working with a group of researchers who all work and produce data on a variety of proprietary software packages. The JISC-funded work looks at open data exchange formats which will eventually produce a suite of tools to enable data to be preserved and remain exchangeable.

The final paper provided a fascinating look at the BL’s Digital Manuscripts project. The work combined digital forensics with an evolutionary approach to the issues. The key message pointed to a need for a flexible and diverse approach to work in this area.

Digital Preservation Formats
1. Enduring Access to Digitised Books. Oya Rieger
2. Creating Virtual CD-ROM Collections. Kam Woods
3. Preservation of Web Resources, The JISC PoWR Project. Brian Kelly
4. Preserving the Content and Network: An Innovative Approach to Web-archiving. Amanda Spencer
5. “What? So What?”: The Next-Generation JHOVE2 Architecture for Format-Aware Characterization. Stephen Abrams

The session on Digital Preservation Formats drew on concrete experience among the presenters defining formats to preserve specific types of content. The texture and detail provided by the presenters made this one of the most enjoyable and memorable sessions at the conference. It began with Oya Rieger’s overview of the work that she and colleagues have done at Cornell University to define formats to support their large-scale book digitisation programme. Kam Woods from Indiana University described how they provide a virtual CD library online and highlighted many of the issues that arose in so doing. Brian Kelly of UKOLN gave an especially entertaining presentation that highlighted archived versions of the University of Bath Web site over many years; this demonstrated how the University has changed its image of itself and interacted with its students. It made a powerful argument to justify the need to archive, preserve, and use online material. Amanda Spencer gave an enjoyable exposition of the work that is going on at The National Archives (TNA - UK) to capture and preserve the increasing amount of Government material published on the Web. To complete the session, Stephen Abrams provided an update and outlook for JHOVE2. JHOVE has become one of the most widely used tools to identify and validate digital content in the preservation community. JHOVE2 is its much anticipated successor and the audience was eager to hear about the current plans for it.

Session 3: Preservation Planning (Plenary Session)

1. Emulation: From Digital Artefact to Remotely Rendered Environments. Dirk von Suchodoletz
2. Data Without Meaning: Establishing the Significant Properties of Digital Research. Gareth Knight
3. Towards a Curation and Preservation Architecture for CAD Engineering Models. Alexander Ball
4. Evaluating Strategies for Preservation of Console Video Games. Mark Guttenbrunner

The four speakers in this session looked at different aspects and approaches to preservation planning.

The first speaker, from the University of Freiburg, spoke about an emulation approach which has the debatable advantage of preserving the objects in their original form and allowing them to be experienced in their original environment. The latter, of course, may have ongoing related IPR issues.

That much-vaunted term ‘significant properties’ was discussed through the work of the InSPECT Project during the second paper. The work will produce a data dictionary of significant properties which will lead to the production of an XML schema.

We then turned to work on CAD engineering models from the University of Bath. Engineering is a discipline which has requirements for retaining and preserving data over the long term. Again in this work IPR issues were a consideration as engineering moves towards a service approach rather than merely product delivery. Two tools have been developed. The first is a preservation planning tool which uses Representation Information (as defined by OAIS) to advise on appropriate strategies for migrating CAD models to archival or exchange formats. The second, LiMMA, provides an architecture for lite formats.

The final speaker, from the Vienna University of Technology, outlined the Planets approach to preservation for what some may have considered ephemeral material (but, as the keynote speaker remarked, will it always be ephemeral?) in the form of console video games. The Planets preservation planning approach was applied to three case studies, adopting both emulation and migration strategies. Again IPR was a factor that needed considering.

During the Q&A session the relevance of the OAIS [4] framework was brought into the discussion – this was particularly timely as there is a current review of the OAIS model.

Session 4: Understanding Costs and Risks (Practitioner Track) and Preservation Metadata (Technical Track)

Understanding Costs and Risks
1. LIFE2 Costing the Digital Preservation Life-cycle More Effectively. Paul Wheatley
2. Risk Assessment: Using a Risk-based Approach to Prioritise Handheld Digital Information. Rory McLeod
3. The Significance of Storage in the Cost of Risk of Digital Preservation. Richard Wright
4. International Study on the Impact of Copyright Law on Digital Preservation. William G. LeFurgy, Adrienne Muir.

The four speakers all shared their practical experience and outcomes from their recent projects, addressing two core questions: how much will it cost, and is it legal?

The experience and findings from the LIFE2 Project were discussed. This was a project that was evidence-based and one in which the methodology had been refined from the preceding LIFE project. Additional case studies were presented which informed the lifecycle costing models and the generic preservation model. However it was evident that accurate forecasting of preservation costs was still some time away.

The risk assessment work at the BL was outlined. It was clear that the success of this work was due in part to the fact that this was not a stand-alone exercise. Rather it was an integral part of the overarching digital library and preservation work under way at the BL. Twenty-three risks were identified (using the AS/NZS 4360:2004 standard) and classed as either direct or indirect. The major risks found were media deterioration and, for hand-held material, software obsolescence. The indirect risks, not surprisingly, were related to policy.

The third speaker offered an insightful look at the cost of risk, focusing in particular on the storage of audiovisual material. The thesis was that declining storage costs affect capacity as more ‘stuff’ becomes available. This leads to growing usage and a resulting increase in risk, since risk is directly proportional to usage. The upshot: the recognition of the need to pay attention to bit preservation, particularly for complex files which are much more fragile than simple files.
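One common way of paying attention to bit preservation is routine fixity checking: record a checksum for each file, then recompute it later to detect silent damage. The sketch below is only an illustration of that general technique and is not taken from the presentation; the manifest layout, the function names and the sample file names are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root to its digest (hypothetical manifest layout)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of files that are missing or no longer match."""
    damaged = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(name)
    return damaged


def demo() -> list[str]:
    """Create two sample files, flip one bit in one of them, then audit."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "simple.txt").write_bytes(b"a short plain-text note")
        (root / "complex.bin").write_bytes(bytes(range(256)) * 4)
        manifest = build_manifest(root)

        # Simulate silent corruption: flip a single bit in the binary file.
        data = bytearray((root / "complex.bin").read_bytes())
        data[10] ^= 0x01
        (root / "complex.bin").write_bytes(bytes(data))

        return audit(root, manifest)
```

Calling demo() returns the list of files whose current digest differs from the recorded one. A checksum detects damage but does not repair it, so a scheme like this only helps when a second good copy exists to restore from.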

The final speakers in the session gave the audience a taster of that most popular of topics, copyright. A review of a recent international study which looked at the impact of copyright on preservation from a number of jurisdictions was presented. Each reported domain had exceptions in regard to digital preservation but there was wide variation between countries. However as it stands, the current legislation is, in some areas, a barrier to effective digital preservation. The recommendation from the report is that laws and policies should support and encourage digital preservation, not act as a barrier.

Preservation Metadata
1. Developing Preservation Metadata for Use in Grid Based Preservation Systems. Arwen Hutt
2. Using METS, PREMIS and MODS for Archiving EJournals. Markus Enders
3. Harvester Results in Digital Preservation System. Tobias Steinke
4. The FRBR Theoretical Library: The Role of Conceptual Data Modelling in Cultural Heritage Information System Design. Ronald Murray

The first speaker outlined the metadata that was needed to support the distributed federated Chronopolis repository.

The second speaker discussed the approach used at the British Library to archive e-journal metadata. No single standard covers all of the needs for e-journal metadata. He explained how METS was used as a container and how PREMIS and MODS each played a role within it. This is a pattern that several organisations are using, and the concrete examples and discussion of design decisions may be useful for others.
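To make the container pattern concrete, here is a minimal sketch of a METS document that wraps a MODS descriptive record and a PREMIS fixity record. It is not the British Library's profile: the section IDs, the choice of techMD for the PREMIS block and the function names are assumptions for illustration, and the PREMIS object is deliberately incomplete, so the output would not validate against the schemas.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for METS, MODS version 3 and PREMIS version 2.
METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
PREMIS = "info:lc/xmlns/premis-v2"

for prefix, uri in (("mets", METS), ("mods", MODS), ("premis", PREMIS)):
    ET.register_namespace(prefix, uri)


def q(namespace: str, tag: str) -> str:
    """Build the '{namespace}tag' form that ElementTree expects."""
    return f"{{{namespace}}}{tag}"


def wrap(parent, section_tag: str, section_id: str, mdtype: str):
    """Add a METS metadata section and return its xmlData element."""
    section = ET.SubElement(parent, q(METS, section_tag), ID=section_id)
    md_wrap = ET.SubElement(section, q(METS, "mdWrap"), MDTYPE=mdtype)
    return ET.SubElement(md_wrap, q(METS, "xmlData"))


def build_article_record(title: str, checksum: str):
    """Return a METS root holding MODS and PREMIS for one article."""
    mets = ET.Element(q(METS, "mets"))

    # Descriptive metadata: MODS inside a METS dmdSec.
    xml_data = wrap(mets, "dmdSec", "DMD1", "MODS")
    mods = ET.SubElement(xml_data, q(MODS, "mods"))
    title_info = ET.SubElement(mods, q(MODS, "titleInfo"))
    ET.SubElement(title_info, q(MODS, "title")).text = title

    # Preservation metadata: PREMIS inside a METS amdSec/techMD.
    amd = ET.SubElement(mets, q(METS, "amdSec"))
    xml_data = wrap(amd, "techMD", "TECH1", "PREMIS")
    obj = ET.SubElement(xml_data, q(PREMIS, "object"))
    chars = ET.SubElement(obj, q(PREMIS, "objectCharacteristics"))
    fixity = ET.SubElement(chars, q(PREMIS, "fixity"))
    ET.SubElement(fixity, q(PREMIS, "messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, q(PREMIS, "messageDigest")).text = checksum

    # The structural map ties both metadata sections to the article.
    struct_map = ET.SubElement(mets, q(METS, "structMap"))
    ET.SubElement(struct_map, q(METS, "div"),
                  TYPE="article", DMDID="DMD1", ADMID="TECH1")
    return mets


if __name__ == "__main__":
    record = build_article_record("An Example Article", "0123abcd")
    print(ET.tostring(record, encoding="unicode"))
```

The design point is that METS itself says nothing about titles or checksums. It supplies the envelope and the ID links, and each wrapped standard carries the information it was designed for.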

The third speaker explored and contrasted the capabilities of the METS and WARC formats for holding metadata on the results of Web harvesting.

The final speaker in the session noted that for every preservation metadata approach, there is an underlying conceptual model. He argued that there could be benefit in looking explicitly at the literature on conceptual modelling.

Day One: Plenary: Closing Remarks

The final session, as observed on the day, was a Dickensian slant on the future of the iPRES series. Representatives from iPRES past, present and future outlined the growth of iPRES and the future plans for the series. The delegates were encouraged to consider this and to capture their thoughts either in the feedback form or by speaking to one of the Programme Committee.

Day Two – 30 September 2008

The keynote was given by Dr Horst Forster of the Directorate General for Information and Media who gave an overview of the EU’s work in digital preservation. He also discussed the importance of digital preservation at national and international levels, as well as a commitment to dedicate the resources needed to make progress. An interesting ‘take-home’ was the feeling that the ‘p’ word was not the most apt and perhaps the preservation community might look to interact with other areas in the digital landscape.

Session 5: National Initiatives (Practitioner Track) and Grid Storage Architecture (Technical Track)

National Initiatives
1. JISC Funding and Digital Preservation Initiatives in the UK. Neil Grindley
2. Weaving a National Network of Partnerships in the National Digital Information Infrastructure and Preservation Programme. Martha Anderson
3. Digital Strategy and the National Library of New Zealand. Steve Knight
4. Digital Preservation Activities Across Communities – Benefits and Problems. Natascha Schumann

The four speakers gave an overview of the national initiatives in their particular space. The panel session looked at areas where there might be opportunities for co-operation and collaboration internationally, and discussed shared challenges.

Interesting debate followed about the differences between the rhetoric and the reality when it comes to collaboration. A suggestion, which seemed to have resonance with the audience, was that collaboration would be much more forthcoming if it was directed at shared specific issues which needed solving.

The sustainability of national bodies such as the DPC and nestor was also discussed – for instance, would it be more effective to have a pan-European body?

Grid Storage Architecture
1. Towards Smart Storage for Repository Preservation Infrastructure. Steve Hitchcock
2. Repository and Preservation Storage Architecture. Keith Rajecki
3. Implementing Preservation Services over the Storage Resource Broker. Douglas Kosovic
4. Embedding Legacy Environments into a Grid-based Infrastructure. Claus-Peter Klas

The first three talks in the Grid and Storage Architecture session demonstrated that the way that we think about storage is changing radically. The first talk provided an architecture for ‘smart storage’ that would move many of the capabilities that are currently implemented in the repository or preservation layers down into the storage layer itself. The second talk outlined a modular storage architecture and contrasted it with several existing repository approaches. The third talk described work to extend the Storage Resource Broker (SRB) to support preservation services. The final talk introduced the EU co-funded SHAMAN Project and its plans to integrate preservation services with data grid technology.

Session 6: Establishing Trust in Service Providers (Practitioner Track) and Service Architecture for Digital Preservation (Technical Track)

Establishing Trust in Service Providers
1. Creating Trust Relationships for Digital Preservation. Tyler Walters
2. The Use of Quality Management Standards in Trustworthy Digital Archives. Suzanne Dobratz
3. The Data Audit Framework: A Toolkit to Identify Research Assets and Improve Data Management in Research-led Institutions. Sarah Jones
4. Data Seal of Approval, Data Archiving and Networked Services. Henk Harmsen

This session looked at the issues and challenges which are common to all areas of collaborative working and shared services provision.

The first speaker presented a model for successful preservation federation, considering three different trust models. It was evident that so-called institutional trust was initially based on individual relationships. He went on to outline the five different stages in developing trust. Indeed the conclusion was that trust is not only relevant to processes and procedures but also has an equally important human perspective.

The next speaker, from the Humboldt University of Berlin, spoke about their work in quality management in regard to trustworthy digital archives. The work, from the nestor programme, looked at the relevance and applicability of different standards to trusted digital repositories (TDRs). From the survey it was not evident whether a separate standard for trusted repositories was required; however, work was ongoing.

An overview of the Data Audit Framework was particularly timely as it was officially launched at a separate meeting the following day. The work, part of a JISC project, produced an audit methodology, online toolkit and registry. Four pilot sites were used and the speaker presented many practical tips which came to light as the work progressed to produce a self-audit tool.

The final presentation looked at the Data Seal of Approval from Data Archiving and Networked Services (DANS) in The Hague. There are 17 guidelines under the seal. The speaker outlined them and how the Archives have worked to satisfy such requirements. There was a nice aggregation of the responsibilities between the data provider and the repository which in itself was a useful pragmatic step on the trust continuum. The next stages for the work would be its international acceptance and ongoing work to meet TRAC requirements.

During the active Q&A session an apposite comment from the floor was that building trust was easy compared to rebuilding trust once there had been a break in the relationship.

Service Architecture for Digital Preservation
1. Updating DAITSS: Transitioning to a Web-service Architecture. Randall Fischer
2. Conceptual Framework for the Use of the Service Oriented Architecture Approach. Christian Saul
3. RODA and Crib: A Service Oriented Digital Repository. Jose Carlos Ramalho
4. Persistent Identifiers distributed system for Cultural Heritage digital objects. Maurizio Lunghi

The Service Architecture session focused on how a new generation of digital repositories has begun to provide open service-oriented interfaces. This is an essential step to enable digital preservation tools and services to work with more than one repository. These interfaces are still new, however, and common practice has not yet emerged. The final paper reminded the audience that the problem of long-lived identifiers for digital objects has also not yet been fully put to rest.

Session 7: Training and Curriculum Development (Practitioner Track) and International Approaches to Web Archiving (Technical Track)

Training and Curriculum Development
1. Digital Preservation Management Workshop at the Five-Year Mark. Nancy McGovern
2. Digital Preservation Training Programme. Kevin Ashley
3. Funding Digital Preservation Research Practice and Education in the US. Rachel Frick
4. The Key Challenges in Training and Educating a Professional Digital Preservation Workforce. Seamus Ross

This session brought together an international panel of experts in the field of training delivery and curriculum development.

The first speaker, from the Interuniversity Consortium for Political and Social Research (ICPSR), shared experiences of the seminal work of the Digital Preservation Management Workshop. The work has now reached a level of maturity and is looking for further development in its next phase. Key tips which were shared included: know your audience, mix the teaching styles, and be clear about the course objectives to ensure that they match delegates’ expectations.

This segued nicely into the work of the Digital Preservation Training Programme (DPTP) which was first run in collaboration with the previous speaker. The main thrust of the DPTP is to foster critical thinking rather than provide a prescriptive fix for all preservation issues.

The third speaker gave the perspective of a funding body – looking at what they require from proposals to fund training programmes. Though this was a US body, the issues that were raised were pertinent to the international audiences. The initiatives they funded included internships and elevation practice.

The pan-European collaborative approach was succinctly outlined in the final presentation. WePreserve – an umbrella organisation of large EU-funded preservation projects – works to reduce duplication of effort and to foster collaboration and a cohesive approach to training.

The final discussion session touched on different expectations which people have from training courses. The clear message, from all the speakers, was that comprehensive preservation expertise was not something which could be picked up from attending a 5-day course. Rather, what the current courses aimed to do was to raise awareness in the community of the gamut of issues involved and to empower delegates to take forward the digital preservation agenda in their own organisations.

International Approaches to Web Archiving
1. Thorsteinn Hallgrimsson - National and University Library of Iceland
2. Birgit N. Henriksen – The Royal Library, Denmark
3. Helen Hockx-Yu – The British Library
4. Gildas Illien - National Library of France
5. Colin Webb - National Library of Australia

The Web archiving panel provided an insight into one of the most challenging digital preservation problems of our time. Speakers gave unique national perspectives from Iceland, the United Kingdom, Denmark, France, and Australia. They described the tremendous scale of the national efforts, as well as the complex legal and technical challenges. The legal frameworks in France and Denmark have enabled them to carry out large-scale crawls with archives in the 100TB range. The frameworks in Australia and the United Kingdom, in contrast, have required explicit permission to collect sites, resulting in smaller archives. The International Internet Preservation Consortium (IIPC) provides good examples of collaborative development and tool reuse (e.g., the Heritrix crawler, the Web Curator Toolkit). An important development reported by Colin Webb is that the IIPC has recently established a digital preservation working group that is looking beyond bit-level preservation. The heterogeneous Web archive collections provide an enormous opportunity for, and challenge to, the digital preservation community.

Session 8: Digital Preservation Services (Practitioner Track) and Foundations (Technical Track)

Digital Preservation Services
1. Encouraging Cyber-infrastructure Collaboration for Digital Preservation. Chris Jordan
2. Establishing a Community-based Approach to Electronic Journal Archiving: LOCKSS. Adam Rusbridge
3. The KB e-Depot as a Driving Force for Change. Marcel Ras
4. Building a Digital Repository: a Practical Implementation. Filip Boudrez

The Services session comprised four papers, each outlining current digital preservation services. The first looked at a range of initiatives at the supercomputing centres in the US and outlined some of the important recent work from a number of programmes including NDIIPP and the NSF DataNet. The next paper outlined the LOCKSS Pilot Programme, a JISC- and RLUK-funded project to ensure long-term accessibility of electronic journal content for the scholarly community. Their conclusion was that community involvement was essential to take forward this work. The pragmatic two-stream approach of the e-Depot development at the National Library of the Netherlands (KB) was outlined by the third speaker. Their cross-departmental approach and their involvement in international work such as Planets have allowed them to implement practical solutions for permanent access.

The final speaker in this session described the implementation path taken at the City of Antwerp Archives in building their digital repository. Their incremental approach, taken alongside their records management work, had paid dividends.

Foundations
1. Bit Preservation: A Solved Problem? David H Rosenthal
2. Modelling Reliability for Digital Preservation Systems. Yan Han
3. Ways to Deal with Complexity. Christian Keitel
4. A Logic-based Approach to the Formal Specification of Data Formats. Michael Hartle

The Foundations session brought several papers together looking at mathematical and theoretical foundations for digital preservation. It started with a very strong paper and presentation by David Rosenthal who challenged some of our common assumptions about bit preservation. His analysis suggests that bit preservation is far from a solved problem and that reliability estimates for hardware and systems are hugely optimistic. This paper could change the way that you think about holding large quantities of digital material over the long haul. Holding a petabyte for a hundred years is not as easy as we might think! The session moved on to a paper that applied well-known mathematical modelling techniques to look at system-level reliability for a digital archive composed of a mixture of redundant disk and tape storage.
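A back-of-envelope calculation gives a feel for why a petabyte over a century is so demanding. The sketch below is not the analysis from the paper: it assumes a deliberately simple model in which every bit is damaged independently with the same small probability each year, and the function names and the decimal definition of a petabyte are choices made for the example. Under that model the chance that all N bits survive T years is (1 - p)^(N·T).

```python
import math

# One decimal petabyte: 10**15 bytes of 8 bits each (an assumption;
# a binary pebibyte would be about 12.6% larger).
BITS_PER_PETABYTE = 8 * 10**15


def survival_probability(bits: int, years: float, p_bit_year: float) -> float:
    """Chance that no bit is damaged, assuming independent failures."""
    # log1p keeps precision when p_bit_year is extremely small.
    return math.exp(bits * years * math.log1p(-p_bit_year))


def max_rate_for(bits: int, years: float, target: float) -> float:
    """Largest per-bit, per-year damage probability that meets the target."""
    # Solve (1 - p) ** (bits * years) = target for p.
    return -math.expm1(math.log(target) / (bits * years))


if __name__ == "__main__":
    p = max_rate_for(BITS_PER_PETABYTE, 100, 0.5)
    print(f"per-bit yearly damage probability for a 50% chance: {p:.3e}")
    print(f"check: {survival_probability(BITS_PER_PETABYTE, 100, p):.3f}")
```

Even for a coin-flip chance of getting through the century undamaged, this toy model requires a per-bit yearly damage probability below roughly 10^-18. Real systems add replication, auditing and repair, and real failures are correlated rather than independent, so the figure is an illustration of scale, not a prediction.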

The session concluded with papers looking at ways to reduce the complexity of preservation systems and at providing a more rigorous basis for file format specifications.

Conclusion

The iPRES Conference concluded with some summary remarks by Steve Knight from the National Library of New Zealand. Steve noted that the conference theme “Joined up and Working” may not yet fully characterise the state of practice in the field, but is a call to arms.

The programme committee collected feedback forms from the participants. Ninety-five percent rated the programme as very good or excellent (4 or 5 out of 5) and ninety-one percent felt that the conference increased their knowledge of digital preservation. Anecdotal and recorded feedback, including feedback from blogs, confirms that iPRES 2008 was a success. As well as the blog coverage, there was also some press coverage in the Guardian [5]. The new initiatives introduced this year were well received. It would seem that iPRES is fully established as a conference in the digital preservation world and we look forward to next year’s at the California Digital Library.

Please note that all the presentations and the conference proceedings are available online at the iPRES Web site [6][7].

References

  1. DCC blog http://digitalcuration.blogspot.com/
  2. JISC blog http://infteam.jiscinvolve.org/?s=ipres
  3. Planets project http://www.planets-project.eu/
  4. OAIS http://public.ccsds.org/publications/archive/650x0b1.pdf
  5. In praise of … preserving digital memories, Guardian article, 30 September 2008
    http://www.guardian.co.uk/commentisfree/2008/sep/30/internet.digitalmusic
  6. Adam Farquhar (Ed). The Fifth International Conference on Preservation of Digital Objects.
    London: The British Library, 2008. ISBN-978-0-7123-0913
  7. iPRES 2008 http://www.bl.uk/ipres2008/

Author Details

Frances Boyle
Executive Director
Digital Preservation Coalition

Email: fb@dpconline.org
Web site: http://www.dpconline.org

Adam Farquhar
Head of Digital Library Technology
The British Library

Email: adam.farquhar@bl.uk
Web site: http://www.bl.uk
