Digitisation and e-Delivery of Theses from ePrints Soton

Julian Ball and Christine Fowler describe the partnership between the University of Southampton’s Library Digitisation Unit and its institutional repository for digitising and hosting theses.

The Hartley Library at the University of Southampton has in excess of 15,000 bound PhD and MPhil theses on 340 linear metres of shelving. Consultation of the hard-copy version is now restricted to readers making a personal visit to the Library, as no further microfiche copies are being produced by the British Library and no master copies of theses are lent from the Library. Retrieval of theses from storage for readers and their subsequent return requires effort from a large number of staff. Second soft-bound copies of theses were once deposited by authors and were available to libraries for consultation through the inter-library loan (ILL) process. Due to ever-tightening constraints on physical storage, these copies are now no longer deposited, and those once held have been sent for disposal, thus removing the facility of loans for external readers.

With the move towards electronic distribution of all University written materials, the University of Southampton amended its Calendar, commencing with the period October 2008/09, to require authors, as a condition of award, to deposit an electronic copy of their thesis in addition to the paper copy, so that the University can distribute it electronically [1]. At the time of deposit, consent is obtained for the University to make the thesis electronically available through the University of Southampton Research Repository, subject to any approved embargo period.

Authors retain the copyright of their thesis and this remains unaffected by the Calendar amendment.

Digitisation and Authors’ Rights

With the general move of the University towards electronic distribution of written materials, the University Calendar was amended in 2008, establishing that theses would be deposited electronically, to enable open access from the research repository, ePrints Soton [2].

The Library has started a retrospective mass digitisation programme, generating searchable PDFs for theses submitted prior to 2008. Because copyright resides with authors and the Calendar made no provision for electronic delivery prior to 2008, procedures have been established to gain the consent of authors before providing access to the PDFs. Other factors may also prevent the digitisation and electronic distribution of a thesis, including the presence of published papers, third-party copyright materials within the text, and part of the work having been submitted as a digital copy.

The digitisation methodology used by the Library is based primarily on that adopted by the EThOS electronic theses online system [3]. Comment is provided on the digitisation of unbound theses, scanning techniques, file size, watermarking and compression to ensure that theses can be delivered from ePrints Soton.

Figure 1: Study area in the Hartley Library, University of Southampton

In order to facilitate access to theses by external readers, author consents are being gained retrospectively. Currently all theses are catalogued on the Library management system (SirsiDynix Symphony) and electronic deposits are catalogued in ePrints Soton. Readers are provided with links between the two sources.

Since 1 October 2008 a number of theses have been deposited in an electronic format and made available from ePrints Soton [2], the University research repository. Distribution is made to the world community free of charge and without any restrictions except where there are legitimate embargoes for legal reasons, such as patent technology or other required confidentiality issues.

The University of Southampton Calendars prior to 2008/09 did not include any provision for electronic deposit and distribution. The author’s copyright remains in place for a period of 70 years from the end of the calendar year in which the author dies (Copyright, Designs and Patents Act 1988 s.12 [4]). In consequence, no digital copies of theses submitted prior to September 2008 can be made for the purpose of open access distribution without the written permission of the author or their Estate.
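As a worked illustration of the copyright term described above, the short Python sketch below computes the last day of protection from an author's year of death. The function name `copyright_expiry` and the example year are hypothetical; the 70-year rule is the one quoted in the text, and the sketch does not attempt to model any special cases in the Act.

```python
from datetime import date

COPYRIGHT_TERM_YEARS = 70  # term quoted in the article (CDPA 1988 s.12)

def copyright_expiry(death_year: int) -> date:
    """Return the last day on which the author's copyright subsists.

    The term runs for 70 years from the end of the calendar year in
    which the author dies, so it always ends on 31 December.
    """
    return date(death_year + COPYRIGHT_TERM_YEARS, 12, 31)

# Hypothetical example: an author who died during 1990
print(copyright_expiry(1990))  # 2060-12-31
```

For almost every thesis held by the Library the term will still be running, which is why consent from the author or their Estate is the practical route to open access.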

Rationale for Digitisation and Electronic Delivery of Theses

The programme of digitisation and electronic delivery of theses at Southampton is informed by the following points:

  • local use studies at Southampton have shown that bound theses submitted in the last 10 years are widely consulted
  • the University has a research repository, ePrints Soton, that is being used for the delivery of theses. In the academic year 2012/2013, from a holding of 2,166 unrestricted PDFs, there were 249,406 downloads and 200,331 abstract views
  • theses represent a rich resource of research data and digitisation will further expose their content through web search engines
  • potential global exposure increases their citation frequency
  • theses are written in English and many researchers worldwide have at least a working knowledge of English, making these materials a valuable world-wide resource
  • current access to the paper version is restricted to readers physically present at the University of Southampton Library
  • electronic access will reduce storage space and staff retrieval time
  • electronic access provides researchers a stable URL that can be referenced
  • the University of Southampton Library has a well-equipped Digitisation Unit with a range of book scanners and staff able to digitise theses
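To put the 2012/2013 usage figures quoted in the list above into per-thesis terms, the averages can be worked out as in the following Python sketch. The helper name `per_item` is hypothetical, and a simple mean says nothing about how downloads are distributed across individual theses.

```python
# Figures quoted in the article for the academic year 2012/2013
unrestricted_pdfs = 2166
downloads = 249_406
abstract_views = 200_331

def per_item(total: int, items: int) -> float:
    """Average count per item, rounded to one decimal place."""
    return round(total / items, 1)

print(per_item(downloads, unrestricted_pdfs))       # 115.1
print(per_item(abstract_views, unrestricted_pdfs))  # 92.5
```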

Based on this rationale, the University of Southampton Library has embarked on a programme of digitisation of its theses published prior to 1 October 2008 for distribution via ePrints Soton. Currently 3,212 theses have been digitised from the period 2001 - September 2008. Searchable PDFs generated from these theses are stored in a dark archive and uploaded into ePrints Soton when the author’s consent has been obtained.

A number of considerations have been addressed prior to and during the digitisation process; these are outlined in the following sections.

Addressing Copyright

Prior to and including session 2007/2008, the University of Southampton stipulated in the Calendar 2007/08 s31a [5] that authors only give written permission for their paper thesis to be lent, copied and distributed. This gave no provision for the creation and distribution of digital surrogates or for publishing on a web repository or distribution through electronic means.

The Copyright, Designs and Patents Act 1988 c.48 s.42 [6] enables prescribed libraries, which include university libraries, to copy any item in their permanent collections for preservation purposes. This provision does not enable free and open electronic distribution of theses.

A number of options were considered to protect the University from copyright infringement when electronically distributing theses submitted prior to September 2008. These included:

  • a takedown policy for any thesis made available electronically if challenged by the author
  • placing notices in national newspapers stating that the University would make theses electronically available unless an objection was made within a specified time-frame
  • obtaining the consent of authors to make their thesis electronically available

The first two options were deemed inadequate by an IP specialist in the University of Southampton as they could not be relied on. The third approach is currently being implemented, whereby the consent of each author is being obtained prior to making their thesis digitally available to the world community from the research repository.

Obtaining the Consent of Authors

To ensure that the University of Southampton has credible consent from authors who wish to allow their thesis to be electronically distributed, a ‘Supplementary author declaration and consent (e-thesis)’ form has been developed and is available online [7]. Authors are invited to sign the form and send the original to the University. Upon receipt of the form, their thesis can be made available via ePrints Soton if there are no third-party copyright issues.

To obtain the permission of individual authors a number of approaches are being employed which include:

  • placing a small article in the University of Southampton Alumni newsletter and asking for the completion of the ‘Supplementary Release Form’
  • placing the Supplementary Release Form on the Library and the Alumni Web pages
  • letters to individual authors
  • emails to individual authors

The University is actively raising awareness through the Office of Development and Alumni Relations to gain author consent. In one issue of the Alumni newsletter, authors were invited to sign the ‘Supplementary Release Form’ and 35 consents were obtained. Letters were sent individually to 100 authors and 21 consents were obtained. A further 15 authors gave consent after noting the invitation on the Hartley Library Web pages. The Office of Development and Alumni Relations supplied 166 email addresses of pre-2008 PhD alumni. A targeted email to these alumni produced 19 responses enabling their theses to be uploaded into ePrints Soton for open access.
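The figures above allow a rough comparison of the consent channels. The Python sketch below tabulates them and computes a response rate where the article gives the number of invitations sent. It assumes that the 19 email responses were all consents and that the channels do not overlap, neither of which the article states explicitly; the names `channels` and `response_rate` are hypothetical.

```python
# (consents, invitations sent) per channel, as reported in the article.
# None means the article does not give the number of people reached.
channels = {
    "alumni newsletter": (35, None),
    "individual letters": (21, 100),
    "library web pages": (15, None),
    "targeted email": (19, 166),
}

def response_rate(consents, sent):
    """Percentage of invitations that produced a consent, or None if unknown."""
    if sent is None:
        return None
    return round(100 * consents / sent, 1)

# Total across channels, assuming no author responded through two channels
total_consents = sum(consents for consents, _ in channels.values())

for name, (consents, sent) in channels.items():
    rate = response_rate(consents, sent)
    shown = "n/a" if rate is None else f"{rate}%"
    print(f"{name}: {consents} consents, response rate {shown}")
print("total consents:", total_consents)
```

On these figures, individually addressed letters (21 per cent) drew roughly twice the response rate of the targeted email (about 11 per cent).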

Digitisation from Unbound Theses

Theses at Southampton have for the most part been bound by the University bindery in a relatively uniform style. The A4 sheets are stitched into sheaves, glued and bound into a round-backed spine. The resulting openings are very tight, with the text lying towards the binding gutter. Digitisation of the intact volumes using book scanners can result in some of the text being lost.

Getting a better view, copyright image, used under licence from shutterstock.com

A trial to unbind theses and digitise single sheets was undertaken. The case, front and back boards and spine were removed intact and the binding removed by guillotining. Experience has shown that the cases from other binders may not be so easily removed. Digitisation of single sheets opened up the possibility of using document feeders. It was found that the digitisation and quality assurance of single sheets, using either manual book scanners or form-feed photocopiers, produced better-quality images than from bound theses and was a quicker process. The decision was therefore taken to unbind theses prior to digitisation. Following digitisation, the sheets are returned loose to the intact case, bound with linen tape and sent off site for long-term storage.

Digitisation Methodology

The methodology of digitisation adopted by the JISC-funded EThOS Project [8] has been used in part by the Library Digitisation Unit for University of Southampton theses. The remit is to produce a digital copy of the deposited hard-bound paper thesis and to make the content searchable, with a file size that enables electronic transmission within current bandwidth restrictions. This is being achieved with the following digitisation procedure:

  • theses are digitised at a true resolution of 300dpi to maximise the optical character recognition (OCR) accuracy
  • individual pages that contain only black text or diagrams are digitised in black and white 1-bit monochrome
  • pages containing any grey or colour component are digitised in 24-bit colour
  • ABBYY Recognition Server 3.0 [9] is used to provide a searchable PDF. By applying JPEG compression, a file size of 5-50 MB per thesis can be achieved
  • Goobi, open source software maintained by intranda GmbH [10], manages and tracks all digitisation jobs, including queuing to ABBYY via an XML ticketing API and handling the subsequent output
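To illustrate why bit depth and compression matter for the file sizes mentioned in the procedure above, the following Python sketch estimates the uncompressed size of a single A4 page image at 300dpi. It is a back-of-envelope calculation only: it ignores file headers and row padding, and the helper names `pixels` and `raw_bytes` are hypothetical.

```python
A4_MM = (210, 297)   # A4 page size in millimetres
DPI = 300            # scanning resolution given in the article
MM_PER_INCH = 25.4

def pixels(mm: float, dpi: int = DPI) -> int:
    """Convert a length in millimetres to pixels at the given resolution."""
    return round(mm / MM_PER_INCH * dpi)

def raw_bytes(bits_per_pixel: int) -> int:
    """Uncompressed size of one A4 page image, ignoring file headers."""
    width, height = (pixels(side) for side in A4_MM)
    return width * height * bits_per_pixel // 8

print(pixels(210), pixels(297))   # 2480 3508
print(raw_bytes(1))               # 1087480  (about 1 MB)
print(raw_bytes(24))              # 26099520 (about 26 MB)
```

A 24-bit page is 24 times the size of a 1-bit page before compression, so capturing text-only pages in monochrome and compressing the remainder is what makes a per-thesis file size in the tens of megabytes feasible.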

Due to the variable content of the A4 pages in terms of colour depth, two methods are employed to capture these variable colour components efficiently:

  • theses that contain no colour or a minimum number of colour pages are digitised on form-feed photocopiers as 1-bit images and colour pages are inserted subsequently. Older theses were, typically, produced without any colour text or images
  • theses that contain a large number of colour pages are scanned by hand on book scanners producing 24- or 1-bit images depending on the bit depth of individual pages
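The choice between the two capture routes above can be sketched as a simple rule. The article does not state what counts as "a minimum number" of colour pages, so the 5 per cent cut-off in this Python sketch is purely an illustrative assumption, as are the names `Thesis` and `capture_method`.

```python
from dataclasses import dataclass

# Hypothetical cut-off: the article gives no figure for "a minimum
# number" of colour pages, so this value is for illustration only.
MAX_COLOUR_FRACTION_FOR_FEEDER = 0.05

@dataclass
class Thesis:
    total_pages: int
    colour_pages: int  # pages with any grey or colour component

def capture_method(thesis: Thesis) -> str:
    """Choose between the two capture routes described in the article."""
    fraction = thesis.colour_pages / thesis.total_pages
    if fraction <= MAX_COLOUR_FRACTION_FOR_FEEDER:
        # 1-bit pass on the document feeder; the few colour pages are
        # captured separately and inserted afterwards.
        return "form-feed photocopier"
    # Page-by-page capture at 24-bit or 1-bit as each page requires.
    return "book scanner"

print(capture_method(Thesis(total_pages=250, colour_pages=3)))   # form-feed photocopier
print(capture_method(Thesis(total_pages=250, colour_pages=80)))  # book scanner
```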

Digitisation Restrictions

The re-publication of a thesis in a digital format brings a different set of copyright issues that may preclude digital publication. Such issues include:

  • requirement for rights clearance
  • presentation of part of the thesis as a digital file
  • inclusion of third-party copyright materials within the text
  • bound inclusion of published papers

Request Process for Creating e-Theses

Internal and external requestors may generally download PDFs of theses submitted post-2008, and of those pre-2008 theses that have undergone rights clearance. However, University of Southampton bound theses remain in high use and in the 12-month period between 2010 and 2011 approximately 900 theses from Library stock were requested by readers. Previously, second paper copies or microfiche copies were made available to external requestors, but they are now no longer available. When external readers do request a copy of a thesis, the Library:

  • encourages the reader to visit the Library to read the hard copy
  • will undertake to contact the author and seek signature of the ‘Supplementary Release Form’ and, if successful, digitise the thesis and make it available in ePrints Soton

The Library will continue mass digitisation of its theses for the period 2000-2008 and store them in a dark archive until author consent has been gained to upload them into ePrints Soton. Theses prior to 2000 will only be digitised when author consent has been provided.
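The handling rules described above can be summarised as a small decision function. The Python sketch below is one reading of the article rather than the Library's actual procedure: the function name and the status strings are hypothetical, the year boundary follows the 2000-2008 period quoted in this section, and checks for third-party copyright material are left out for brevity.

```python
from datetime import date

ELECTRONIC_DEPOSIT_START = date(2008, 10, 1)  # date given in the article
MASS_DIGITISATION_FROM_YEAR = 2000            # period quoted in this section

def thesis_status(submitted: date, author_consent: bool) -> str:
    """Return the handling route for a thesis under the rules above."""
    if submitted >= ELECTRONIC_DEPOSIT_START:
        # Electronic deposit is a condition of award from this date.
        return "electronic deposit in ePrints Soton (subject to embargo)"
    if author_consent:
        return "digitise if needed and upload to ePrints Soton"
    if submitted.year >= MASS_DIGITISATION_FROM_YEAR:
        return "digitise and hold in dark archive until consent"
    return "not digitised; seek author consent first"

print(thesis_status(date(2005, 6, 1), author_consent=False))
print(thesis_status(date(1995, 6, 1), author_consent=False))
```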

Consolidation of Platforms, Catalogue Entry and Master Copy

Currently all bound theses within the Library are catalogued and records appended to the Library catalogue, SirsiDynix Symphony. Theses deposited electronically and those deposited post-October 2008 are accessed via ePrints Soton which has a further catalogue entry. Links are currently provided between the two catalogues.

The ePrints Soton metadata is being harvested by a number of aggregators using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH); it is exposed in the UKETD_DC format and is searchable from EThOS [11]. As previously noted, the paper and electronic versions of a thesis are not always identical, there being greater scope to hold various types of media within an electronic version. The Calendar mandates that both digital and paper versions are submitted, but currently there remains the risk of discrepancies between the two versions.
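For readers unfamiliar with OAI-PMH, the harvesting mentioned above works through plain HTTP requests carrying a `verb` and a `metadataPrefix`. The Python sketch below only builds such a request URL and does not contact any server. The endpoint path `/cgi/oai2` is an assumption based on the usual layout of EPrints installations, and the prefix `uketd_dc` is assumed from the format name in the text; neither is confirmed by the article.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint: EPrints installations commonly expose OAI-PMH at
# /cgi/oai2, but the article does not state the actual URL.
BASE_URL = "http://eprints.soton.ac.uk/cgi/oai2"

def list_records_url(metadata_prefix: str, from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    `from_date` (YYYY-MM-DD) restricts the harvest to records created
    or changed on or after that date, allowing incremental harvesting.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return f"{BASE_URL}?{urlencode(params)}"

# Assumed prefix for the UKETD_DC format
print(list_records_url("uketd_dc", from_date="2013-01-01"))
```

An aggregator would fetch this URL, parse the XML response and follow any resumption token the repository returns to page through the full record set.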

Conclusions and Remarks

Thesis digitisation activities at Southampton contribute to the University’s wider visibility as well as being an obvious factor in space management and a saving in staff time. For a Russell Group university with a major research presence, any activities which contribute to overall research impact are highly significant. The current preparations within UK HE for the Research Excellence Framework exercise (REF) in 2014 [12] have highlighted research impact as a key measure of research excellence. Along with other Research Councils, the Arts & Humanities Research Council believes that free and open access to publicly funded research, including theses, offers significant social and economic benefits [13]. The link between global exposure of research outputs and impact is well documented and it is clear that putting thesis metadata and digital objects into an institutional repository will enhance that institution’s overall impact [14].

As the requirement to deposit an electronic version of the thesis was agreed in 2008, there are currently a number of hard-copy theses being received by the University Library as well as the regular receipts of born-digital theses into ePrints Soton. This mixed economy of materials illustrates that a reader-focused approach to finding this material is paramount. Currently, metadata for theses since 2008 are captured in both ePrints Soton and the Library catalogue. For theses prior to 2008, all metadata are present in the Library catalogue, with a small number of these theses also in ePrints Soton. The conclusion is that visibility of the metadata in both locations is more helpful to readers than any form of cross-referral with complicated caveats between the two sources. In time, it may be expedient to have all thesis metadata in ePrints Soton only, but the Library will need to be guided by its users on this point.

Staff development has taken place over the past three years in terms of understanding the IPR implications of born-digital and retrospectively digitised theses. The Library Digitisation Unit has developed a considerable amount of background knowledge about the various permutations of thesis submission and their suitability for digitisation. The development of a workflow to carry out this procedure has also been a product of this development curve. There is a critical set of relationships between the teams within cataloguing, inter-library loan, digitisation and ePrints Soton, and the related workflows are key to the smooth operation of the service.

References

  1. University of Southampton Calendar 2008/9 http://www.calendar.soton.ac.uk/arch2008_9/sectionV/part4.html
  2. University of Southampton Institutional Research Repository ePrints Soton http://eprints.soton.ac.uk
  3. EThOS Electronic Theses Online System http://www.ethos.ac.uk
  4. Copyright, Designs and Patents Act 1988, Chapter 48  http://www.legislation.gov.uk/ukpga/1988/48
  5. University of Southampton Calendar 2007/08 http://www.calendar.soton.ac.uk/arch2007_8/sectionV/part4.html
  6. Copyright, Designs and Patents Act 1988, Chapter 48, Section 42
    http://www.legislation.gov.uk/ukpga/1988/48/section/42
  7. Supplementary author declaration and consent (e-thesis) form
    http://www.southampton.ac.uk/library/resources/documents/thesisconsentform.pdf
  8. UK theses digitisation project http://www.jisc.ac.uk/whatwedo/programmes/digitisation/theses.aspx
  9. Server-based Document Conversion Software - ABBYY Recognition Server 3.5 - Product Overview http://www.abbyy.com/recognition_server/product_overview
  10. intranda GmbH http://www.intranda.com
  11. British Library EThOS - search and order theses online http://ethos.bl.uk/Home.do
  12. Research Excellence Framework (REF) 2014 http://www.ref.ac.uk
  13. Arts & Humanities Research Council - Open access to research output
    http://www.ahrc.ac.uk/About-Us/Policies,-standards,-and-forms/open-access/Pages/open-access.aspx
  14. Harnad, S and Carr, L and Swan, A and Sale, AHJ and Bosc, H (2009) Open Access Repositories - maximizing and measuring research impact through university and research-funder open-access self-archiving mandates. Wissenschaftsmanagement, 4 (4). pp. 36-41.

Author Details

Julian Ball
Library Digitisation Unit
Hartley Library
University of Southampton
University Road
Southampton SO17 1BJ

Email: j.h.ball@soton.ac.uk
Web site: http://www.southampton.ac.uk/library/ldu/

Julian Ball is manager of the University of Southampton Library Digitisation Unit and has been part of two major JISC digitisation projects, 18C parliamentary papers and the 19C political pamphlets. The Unit specialises in the digital capture of high resolution archive resources to support research. A seamless workflow has been developed for the digitisation of materials, through to web delivery. This incorporates quality control steps, optical character recognition and standardised metadata.

Christine Fowler
Head of Electronic Library Services
Hartley Library
University of Southampton
Southampton SO17 1BJ

Email: c.a.fowler@soton.ac.uk
Web site: http://www.southampton.ac.uk/library/

Christine Fowler is Head of E-Library Services and Faculty Librarian for Humanities and Health Sciences at the University of Southampton. Her interests include business models for e-resource creation and content sustainability. Christine has been associated with the Library Digitisation Unit for seven years and was part of the project team that created both the 18th Century Official Parliamentary Publications collection (available through ProQuest) and latterly the 19th Century British Pamphlets Online collection (JSTOR) as part of the JISC Digitisation Programme.

Date published: Wednesday, 26 March 2014
Copyright statement: 

This article has been published under Creative Commons Attribution 3.0 Unported (CC BY 3.0) licence. Please note this CC BY licence applies to textual content of this article, and that some images or other non-textual elements may be covered by special copyright arrangements. For guidance on citing this article (giving attribution as required by the CC BY licence), please see below our recommendation of 'How to cite this article'.