Digitisation and e-Delivery of Theses from ePrints Soton

Julian Ball and Christine Fowler describe the partnership between the University of Southampton’s Library Digitisation Unit and its institutional repository for digitising and hosting theses.

The Hartley Library at the University of Southampton has in excess of 15,000 bound PhD and MPhil theses on 340 linear metres of shelving. Consultation of the hard-copy version is now restricted to readers making a personal visit to the Library, as no further microfiche copies are being produced by the British Library and no master copies of theses are lent from the Library. Retrieval of theses from storage for readers and their subsequent return requires effort from a large number of staff. Second soft-bound copies of theses were once deposited by authors and were available to libraries for consultation through the inter-library loan (ILL) process. Due to ever-tightening constraints on physical storage, these copies are now no longer deposited, and those once held have been sent for disposal, thus removing the facility of loans for external readers.

With the move towards electronic distribution of all University written materials, the University of Southampton amended its Calendar, commencing with the period October 2008/09, to require authors, as a condition of award, to deposit an electronic copy of their thesis in addition to the paper copy, so that the University can distribute it electronically [1]. At the time of deposit, consent is obtained for the University to make the thesis electronically available through the University of Southampton Research Repository, subject to any approved embargo period.

Authors retain the copyright of their thesis and this remains unaffected by the Calendar amendment.

Digitisation and Authors’ Rights

With the general move of the University towards electronic distribution of written materials, the University Calendar was amended in 2008, establishing that theses would be deposited electronically, to enable open access from the research repository, ePrints Soton [2].

The Library has started a retrospective mass digitisation programme generating searchable PDFs for theses prior to 2008. Because copyright resides with authors and the Calendar made no provision for electronic delivery prior to 2008, procedures have been established to gain the consent of authors before providing access to the PDFs. Further factors may prevent the digitisation and electronic distribution of theses, including the presence of published papers, third-party copyright material within the text, and part of the work having been submitted as a digital copy.

The digitisation methodology used by the Library is based primarily on that adopted by the EThOS electronic theses online system [3]. Comment is provided on the digitisation of unbound theses, scanning techniques, file size, watermarking and compression to ensure that theses can be delivered from ePrints Soton.

Figure 1: Study area in the Hartley Library, University of Southampton

To facilitate access to theses by external readers, author consents are being gained retrospectively. Currently all theses are catalogued on the Library management system (SirsiDynix Symphony) and electronic deposits are catalogued in ePrints Soton. Readers are provided with links between the two sources.

Since 1 October 2008 a number of theses have been deposited in an electronic format and made available from ePrints Soton [2], the University research repository. Distribution is made to the world community free of charge and without any restrictions except where there are legitimate embargoes for legal reasons, such as patent technology or other required confidentiality issues.

The University of Southampton Calendars prior to 2008/09 did not include any provision for electronic deposit and distribution. The author’s copyright remains in place for a period of 70 years from the end of the calendar year in which the author dies (Copyright, Designs and Patents Act 1988 s.12 [4]). In consequence, no digital copies of theses deposited prior to September 2008 can be made for the purpose of open access distribution without the written permission of the author or their Estate.
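The copyright term quoted above is simple date arithmetic. The following is a minimal sketch, assuming only the rule as stated in this article (70 years from the end of the calendar year of the author's death); the function name is hypothetical and chosen purely for illustration.

```python
from datetime import date

# Term quoted in the text (Copyright, Designs and Patents Act 1988 s.12).
COPYRIGHT_TERM_YEARS = 70


def copyright_expiry(death_year: int) -> date:
    """Return the last day on which the author's copyright subsists.

    The term runs from the END of the calendar year of death, so the
    expiry date is always 31 December, whatever the actual day of death.
    """
    return date(death_year + COPYRIGHT_TERM_YEARS, 12, 31)


print(copyright_expiry(1990))  # 2060-12-31
```

For any thesis author who is living or recently deceased, the expiry date lies decades ahead, which is why consent rather than expiry of term is the practical route to open access.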

Rationale for Digitisation and Electronic Delivery of Theses

The programme of digitisation and electronic delivery of theses at Southampton is informed by the following points:

Based on this rationale, the University of Southampton Library has embarked on a programme of digitisation of its theses published prior to 1 October 2008 for distribution via ePrints Soton. Currently 3,212 theses have been digitised from the period 2001 - September 2008. Searchable PDFs generated from these theses are stored in a dark archive and uploaded into ePrints Soton when the author’s consent has been obtained.

A number of considerations addressed prior to and during the digitisation process are outlined in the following sections.

Addressing Copyright

Prior to and including session 2007/2008, the University of Southampton stipulated in the Calendar 2007/08 s31a [5] that authors only give written permission for their paper thesis to be lent, copied and distributed. This gave no provision for the creation and distribution of digital surrogates or for publishing on a web repository or distribution through electronic means.
The Copyright, Designs and Patents Act 1988 c.48 s.42 [6] enables prescribed libraries, which include university libraries, to copy any item in their permanent collections for preservation purposes. This provision does not enable free and open electronic distribution of theses.

A number of options were considered to protect the University from copyright infringement of electronically distributed theses submitted prior to September 2008 that included:

The first two options were deemed inadequate by an IP specialist in the University of Southampton as they could not be relied on. The third approach is currently being implemented, whereby the consent of each author is being obtained prior to making their thesis digitally available to the world community from the research repository.

Obtaining the Consent of Authors

To ensure that the University of Southampton has credible consent from authors who wish to allow their thesis to be electronically distributed, a ‘Supplementary author declaration and consent (e-thesis)’ form has been developed and is available online [7]. Authors are invited to sign the form and send the original to the University. Upon receipt of the form, their thesis can be made available via ePrints Soton if there are no third-party copyright issues.

To obtain the permission of individual authors a number of approaches are being employed which include:

The University is actively raising awareness through the Office of Development and Alumni Relations to gain author consent. In one issue of the Alumni newsletter, authors were invited to sign the ‘Supplementary Release Form’ and 35 consents were obtained. Letters were sent individually to 100 authors and 21 consents were obtained. A further 15 authors, who noted the invitation on the Hartley Library Web pages, have given consent. The Office of Development and Alumni Relations supplied 166 email addresses of pre-2008 PhD alumni. A targeted email to these alumni produced 19 responses, enabling their theses to be uploaded into ePrints Soton for open access.
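The figures above can be tallied to compare the approaches. The following is a minimal sketch using only the numbers quoted in this section; it assumes that each of the 19 email responses represents one consent, and it leaves the rate undefined where the number of authors contacted is not stated. The dictionary keys and function name are illustrative only.

```python
# Consents per approach, as quoted in the text. "contacted" is None where
# the article does not say how many authors were reached.
campaigns = {
    "alumni newsletter": {"consents": 35, "contacted": None},
    "individual letters": {"consents": 21, "contacted": 100},
    "library web pages": {"consents": 15, "contacted": None},
    "targeted email": {"consents": 19, "contacted": 166},
}


def response_rate(consents, contacted):
    """Return the consent rate as a percentage, or None if unknown."""
    if contacted is None:
        return None
    return 100 * consents / contacted


total_consents = sum(c["consents"] for c in campaigns.values())

for name, c in campaigns.items():
    rate = response_rate(c["consents"], c["contacted"])
    shown = "n/a" if rate is None else f"{rate:.1f}%"
    print(f"{name}: {c['consents']} consents, rate {shown}")

print(f"total consents: {total_consents}")
```

On these figures, individual letters drew a 21.0% consent rate against 11.4% for the targeted email, with 90 consents gained across all four approaches.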

Digitisation from Unbound Theses

Theses at Southampton have for the most part been bound by the University bindery in a relatively uniform style. The A4 sheets are stitched into sheaves, glued and bound into a round-backed spine. The resulting openings are very tight, with the text lying towards the binding gutter. Digitising the volumes intact using book scanners can result in some of the text being lost.

Getting a better view, copyright image, used under licence from shutterstock.com

A trial to unbind theses and digitise single sheets was undertaken. The case, front and back boards and spine were removed intact and the binding removed by guillotining. Experience has shown that the cases from other binders may not be so easily removed. Digitisation of single sheets opened up the possibility of using document feeders. It was found that the digitisation and quality assurance of single sheets, using either manual book scanners or form-feed photocopiers, produced better-quality images than from bound theses and was a quicker process. The decision was therefore taken to unbind theses prior to digitisation. Following digitisation, the sheets are returned loose to the intact case, bound with linen tape and sent off site for long-term storage.

Digitisation Methodology

The methodology of digitisation adopted by the JISC-funded EThOS [8] Project has been used in part by the Library Digitisation Unit for University of Southampton theses. The remit is to produce a digital copy of the deposited hard-bound paper thesis and to make the content searchable with a file size that enables electronic transmission over current bandwidth restrictions. This is being achieved with the following digitisation procedure:

Due to the variable content of the A4 pages in terms of colour depth, two methods are employed to capture these variable colour components efficiently:

Digitisation Restrictions

The re-publication of a thesis in a digital format brings a different set of copyright issues that may preclude digital publication. Such issues include:

Request Process for Creating e-Theses

Internal and external requesters may generally download PDFs of theses deposited post-2008, and of pre-2008 theses that have undergone rights clearance. However, University of Southampton bound theses remain in high use, and in the 12-month period between 2010 and 2011 approximately 900 theses from Library stock were requested by readers. Previously, second paper copies or microfiche copies were made available for external requests, but these are no longer available. When external readers do request a copy of a thesis, the Library:

The Library will continue mass digitisation of its theses for the period 2000-2008 and store them in a dark archive until author consent has been gained to upload them into ePrints Soton. Theses prior to 2000 will only be digitised when author consent has been provided.
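The rules described in this article for deciding what happens to a given thesis can be summarised as a small decision function. The following is a minimal sketch only, not the Library's actual workflow: the `Thesis` fields, the `next_action` function and the wording of the returned actions are hypothetical, and the year 2000 boundary is taken from the paragraph above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MANDATE_START = date(2008, 10, 1)   # electronic deposit required from this date
MASS_DIGITISATION_FROM_YEAR = 2000  # lower bound of the mass programme


@dataclass
class Thesis:
    deposited: date
    author_consent: bool = False        # signed consent form received
    third_party_issues: bool = False    # unresolved third-party copyright
    embargo_until: Optional[date] = None


def next_action(thesis: Thesis, today: date) -> str:
    """Return the next step for a thesis under the rules described above."""
    if thesis.deposited >= MANDATE_START:
        # Deposited electronically under the amended Calendar.
        if thesis.embargo_until and today < thesis.embargo_until:
            return "hold under embargo"
        return "available in repository"
    if thesis.author_consent:
        if thesis.third_party_issues:
            return "resolve third-party copyright before upload"
        return "digitise if needed and upload to repository"
    if thesis.deposited.year >= MASS_DIGITISATION_FROM_YEAR:
        return "digitise to dark archive and await consent"
    return "leave undigitised until consent is provided"


print(next_action(Thesis(deposited=date(2004, 6, 1)), date(2012, 1, 1)))
# digitise to dark archive and await consent
```

Keeping the date boundaries as named constants makes it straightforward to adjust the sketch if the scope of the mass digitisation programme changes.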

Consolidation of Platforms, Catalogue Entry and Master Copy

Currently all bound theses within the Library are catalogued and records appended to the Library catalogue, SirsiDynix Symphony. Theses deposited electronically and those deposited post-October 2008 are accessed via ePrints Soton which has a further catalogue entry. Links are currently provided between the two catalogues.

The ePrints Soton metadata is harvested by a number of aggregators via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH); it is exposed in the UKETD_DC format and is searchable from EThOS [11]. As previously noted, the paper and electronic versions of a thesis are not always identical, there being greater scope to hold various types of media within an electronic version. The Calendar mandates that both digital and paper versions are submitted, but currently there remains the risk of discrepancies between the two versions.
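To illustrate what harvesting over OAI-PMH involves, the following minimal sketch builds a ListRecords request and extracts record identifiers from a response. The base URL is a placeholder rather than the real ePrints Soton endpoint, the metadata prefix `uketd_dc` is assumed from the format name, and the sample response is invented for illustration; only the `ListRecords` verb, the `metadataPrefix` and `set` arguments and the XML namespace come from the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(response_xml):
    """Extract the identifier of every record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:record/oai:header/oai:identifier"
    return [element.text for element in root.findall(path, OAI_NS)]


# Invented sample response; a real harvester would fetch this over HTTP
# and follow resumption tokens for large result sets.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

# Placeholder endpoint and assumed metadata prefix.
print(list_records_url("https://repository.example.org/oai2", "uketd_dc"))
print(record_identifiers(SAMPLE_RESPONSE))
```

The sketch parses an embedded response so that it runs without network access; the same parsing function would apply to a response retrieved from a live repository.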

Conclusions and Remarks

Thesis digitisation activities at Southampton contribute to the University’s wider visibility as well as being an obvious factor in space management and a saving in staff time. For a Russell Group university with a major research presence, any activities which contribute to overall research impact are highly significant. The current preparations within UK HE for the Research Excellence Framework exercise (REF) in 2014 [12] have highlighted research impact as a key measure of research excellence. Along with other Research Councils, the Arts & Humanities Research Council believes that free and open access to publicly funded research, including theses, offers significant social and economic benefits [13]. The link between global exposure of research outputs and impact is well documented and it is clear that putting thesis metadata and digital objects into an institutional repository will enhance that institution’s overall impact [14].

As the requirement to deposit an electronic thesis version was agreed in 2008, there are currently a number of hard-copy theses being received by the University Library as well as the regular receipts of born-digital theses into ePrints Soton. This mixed economy of materials illustrates that a reader-focused approach to finding this material is paramount. Currently the thesis metadata since 2008 are captured in both ePrints Soton and the Library catalogue. Prior to 2008, all thesis metadata are present in the Library catalogue, with a small number of theses from before this date also in ePrints Soton. The conclusion is that visibility of the metadata in both locations is more helpful to readers than any form of cross-referral with complicated caveats between the two sources. In time, it may be expedient to have all thesis metadata in ePrints Soton only, but the Library will need to be guided by its users on this point.

Staff development has taken place over the past three years in terms of understanding the IPR implications of born-digital and retrospectively digitised theses. The Library Digitisation Unit has developed a considerable amount of background knowledge about the various permutations of thesis submission and their suitability for digitisation. The development of a workflow to carry out this procedure has also been a product of this development curve. There is a critical set of relationships between the cataloguing, inter-library loan, digitisation and ePrints Soton teams, and the related workflows are key to the smooth operation of the service.


