Book Review: Preparing Collections for Digitization

Michael Day reviews a recently published book on the selection and preparation of archive and library collections for digitisation.

Over the past 20 years a great deal of information and guidance has been published to support cultural heritage organisations interested in undertaking digitisation projects. It is well over a decade now since the seminal Joint National Preservation Office and Research Libraries Group Preservation Conference on Guidelines for digital imaging [1] and standard introductory texts on digitisation like Anne Kenney and Oya Rieger's Moving theory into practice [2] and Stuart Lee's Digital imaging: a practical handbook [3] are of a similar age - although still extremely useful. More up-to-date guidance is also available from services like JISC Digital Media [4] and the Federal Agencies Digitization Guidelines Initiative [5].

Into this mix comes this new book on the preparation of collections for digitisation by Anna Bülow and Jess Ahmon, respectively Head of Preservation and Preservation Officer at The National Archives in Kew, London. The book claims to fill a gap in the existing literature, covering the practical aspects of safeguarding collections during image capture. It is perhaps worth noting upfront that the main focus of the book is on textual resources and documentary records, meaning that it would seem to be most useful for those working in the libraries and archives sectors.

The first chapter provides some essential context, linking digitisation initiatives to the ongoing collection management practices of archives and libraries. It makes the general point that collection management has three main aspects: the development, use and preservation of collections.

Collection management involves making well informed decisions in order to prioritise actions and optimise the allocation of resources to maintain as much accessible value as possible. (p. 5)

Bülow and Ahmon argue that digital technologies have created new challenges for collection management, being partly responsible, for example, for a shift in attention from the development and preservation role to the development and use role. In practice, however, the link between the roles can be more nuanced. For example, in some cases digitisation may benefit conservation aims by helping to reduce the physical handling of fragile materials. In general, however, the authors feel that while the long-term sustainability challenges of digital content remain unresolved, "digitization of any book or document cannot be seen as a preservation measure for the original itself" (p. 8). The chapter concludes with a brief outline of the four phases of digitisation, each of which is made up of multiple steps. Of these, this book focuses primarily on the first two phases, covering all of the tasks that need to be done prior to imaging (e.g. selection, rights clearance, document preparation) as well as those associated with the digitisation process itself (imaging, quality assurance, transcription, metadata creation). The remaining two phases, chiefly facilitating use and sustainability, are not dealt with in any detail by this book.

Chapter two outlines some of the things that need to be done prior to digitisation, such as building project teams that provide a wide range of skills, including those of expert conservators. From their archives perspective, Bülow and Ahmon strongly advocate the involvement of conservation staff from the outset of digitisation projects (p. 18). Further sections deal in more detail with the questions of outsourcing and the role of microform. As a reasonably stable medium, microform has become the de facto standard for some preservation reformatting projects, and this can interact with digitisation projects in a number of different ways. The general principle adhered to here seems to be a reasonable one, i.e. that it is preferable to undertake imaging of a collection once only (p. 28), meaning that some digitisation projects will have to fit both microfilming and digitisation into a single workflow.

The third chapter is an outline of digital image technology by Ross Spencer, a colleague of Bülow and Ahmon at The National Archives. This chapter provides a quick overview of key concepts like image resolution (e.g. in pixels per inch or dimension), bit-depth/colour-depth and colour management. Spencer also introduces the concept of developing an 'archival master' from which a variety of so-called 'service copies' can be generated (these are sometimes also known as 'access files'). Archival masters are intended to be useable over reasonably long periods of time, and so care needs to be taken over the exact choice of format and the compression algorithms used. Following widespread practice in the cultural heritage sector, Spencer comments that image formats like TIFF (Tagged Image File Format) "have proven to be robust and have lasted over time" (p. 38). Regardless of format, however, it is important that image quality is sufficient to support its continued use and reuse.

Image specifications should be as high as possible. One of the benefits of digitized objects is the manipulations and techniques that might be used on them and so [...] it is important to specify a requirement for images with the highest possible and highest achievable standards for our time. (p. 42)

On compression techniques, Spencer also expresses the widely accepted view that the use of lossy compression is not best practice for archival masters (p. 41). Other topics covered in the chapter are image post-processing (e.g. image enhancement, cropping, de-skewing) and the need to retain metadata about the image as part of the provenance record. It might have been useful for the chapter to have referred to a standard text like Howard Besser's Introduction to imaging [6], as this goes into far more detail on most of these topics.
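
As a rough illustration of how the resolution and bit-depth concepts covered in this chapter interact, the following Python sketch estimates the pixel dimensions and uncompressed size of a scanned page. It is not taken from the book: the function name, the 300 pixels per inch figure and the page dimensions are illustrative assumptions only, and real file sizes will differ depending on format overheads and any lossless compression applied.

```python
def estimate_uncompressed_image(width_in, height_in, ppi, bits_per_channel, channels):
    """Estimate pixel dimensions and raw (uncompressed) size of a scanned image.

    All parameters are hypothetical inputs chosen for illustration:
    width_in / height_in  - physical size of the original, in inches
    ppi                   - capture resolution in pixels per inch
    bits_per_channel      - bit-depth per colour channel (e.g. 8 or 16)
    channels              - number of channels (1 = greyscale, 3 = RGB)
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    total_bits = width_px * height_px * bits_per_channel * channels
    # Round up to whole bytes; this ignores file headers and row padding.
    size_bytes = (total_bits + 7) // 8
    return width_px, height_px, size_bytes


if __name__ == "__main__":
    # A 10 x 8 inch document captured at an assumed 300 ppi in 24-bit colour.
    w, h, size = estimate_uncompressed_image(10, 8, 300, 8, 3)
    print(f"{w} x {h} pixels, about {size / 1_000_000:.1f} MB uncompressed")
```

Doubling the resolution in this sketch quadruples the estimated size, which gives some sense of why the choice of specification for archival masters has direct consequences for storage planning.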

The following chapter turns to look in more detail at the process for selecting collections for digitisation. Despite the advent of large-scale digitisation initiatives in the last decade, Bülow and Ahmon consider that most cultural heritage organisations would not currently be in a position to consider digitising any more than a relatively small proportion of their content (p. 47). Selecting collections for digitisation needs to fit with wider institutional goals, and could be based on (for example) use, format and physical condition (e.g. as part of a conservation strategy for fragile originals), or the potential of the content for reuse in education and research. The chapter refers to the popular decision-making selection matrix developed by Harvard University Library [7], suggesting that the main value of such tools is in providing "a checklist of questions to form the foundation of a selection policy" (p. 53). The rest of the chapter concerns the initial assessment of the physical format and condition of collections before they can be finally selected for digitisation.

Chapter five introduces the more detailed collection surveys that need to be undertaken to help identify specific challenges that will be encountered during digitisation itself. Bülow and Ahmon argue that such surveys are not optional extras, but essential to help inform choices on equipment and much else (p. 63). The examples provided in the chapter (which includes some interesting illustrations) provide evidence of the extremely wide range of different physical formats that might be encountered in archives, but the techniques described would also be useful for helping libraries to identify where significant gaps might need to be filled from other collections or institutions (p. 66), or in deciding what to do with things like uncut pages, fold-outs or extra-tight bindings. As a general principle, Bülow and Ahmon argue that the planning stage of a digitisation project must "allow time for the survey and analysis of results because the survey will provide information that can have a crucial impact on the imaging operation and hence the timescale for the whole digitization project" (p. 67).

The following chapter goes on to outline some of the issues around the equipment used for image capture. Bülow and Ahmon concede that there is an abundance of literature on this topic (p. 91), so they focus in this book on equipment requirements as they relate to original content. The chapter first describes risks to content from increased handling, exposure to light and heat, and disassociation, then goes on to discuss the features to look for when selecting specific equipment. It concludes with a brief overview of the different types of imaging equipment, including flatbed scanners, overhead scanners and digital cameras.

Chapters seven and eight deal with the physical aspects of preparing documents for image capture as part of a digitisation workflow. Chapter seven focuses on document formats and fastenings, using a number of practical examples from The National Archives' own holdings. Images illustrate the assortment of fastenings used in archives and the chapter provides, with short case studies, some tips for how these can best be dealt with during the imaging process. These issues may not be as significant for most libraries, however, since - as Bülow and Ahmon admit - printed books "can be the most trouble-free of document formats to digitize" (p. 123). The chapter does, however, investigate the contentious issue of disbinding books for digitisation. Chapter eight deals in more detail with the preparation of damaged documents, and how to evaluate whether specific conservation expertise would be needed and how this would potentially fit within the digitisation workflow. Short case studies illustrate how conservation expertise has been integrated within two of The National Archives' own projects.

The final chapter describes setting up the imaging operation from the perspective of document welfare. The principle expounded here is that the collection manager should be "involved during the planning and setting-up of the imaging operation so that they can advise on these issues" (p. 159). Some compromises may need to be made here because the optimal environmental conditions for documents may not always coincide with those for digitisation staff. The chapter covers workspace design, health and safety, document tracking and staffing.

The remainder of the book comprises a two-page conclusion, a very short annotated list of further reading and an index.

To conclude, Preparing Collections for Digitization is an extremely valuable addition to the literature on digitisation. It should be of interest to all involved in the digitisation of documentary records and textual materials and would usefully inform the development of conservation strategies and digitisation workflows dealing with these content types. The volume has been attractively produced, making extensive use of short case studies and illustrations, particularly in the sections dealing with collection surveying and preparation. I spotted the occasional error or inconsistency, e.g. the use of 'principle' for 'principal' in chapter 1 (p. 1) and the confusing use of both 'phase' and 'stage' in Figure 1.2 (p. 11). I also wondered whether it was necessary to include both Figures 1.1 and 1.3, as they are largely identical.

As the title implies, Preparing Collections for Digitization focuses almost entirely on the collection management and conservation processes that need to be undertaken prior to the imaging stage. Naturally, it has far less to say about what comes afterwards, e.g. text transcription, structural metadata, delivery, evaluation and sustainability. In practice, the book would therefore need to be used in conjunction with the wide range of other digitisation guidance available in other publications and on the Web. However, like much of that wider digitisation literature, books like this - perhaps inevitably - largely represent the producer interest in that they focus on the specific requirements of libraries and archives rather than those of the potential users of the resources that are being generated by digitisation projects. There are some hints in chapter 4 that people outside the collecting institutions themselves - e.g. members of the research community or the general public - might have a potential role in helping to select content for digitisation. However, there is less awareness of how the digitisation choices of cultural heritage organisations might limit the later use of the resources they are so expensively creating. Researchers can be very resourceful in their use of digitised resources, but it is still very important to capture potential usage scenarios at the planning stage of a digitisation project.

That said, archives and libraries planning to undertake digitisation will find this new book a useful source of guidance on the collection surveying and preparation activities that are a necessary part of any digitisation project.


Author Details

Michael Day

Research and Development Team Leader
University of Bath

Email: m.day@ukoln.ac.uk
Web site: http://www.ukoln.ac.uk/

Michael Day has worked at UKOLN since 1996 on a variety of research projects relating to resource description, semantic interoperability and digital preservation. He currently leads UKOLN's research and development team and is part of the small team at UKOLN contributing to the EU-funded IMPACT (Improving Access to Text) Project: http://www.impact-project.eu/

