Electronic Theses and Dissertations: A Strategy for the UK
‘ETDs’ is the acronym widely used in the US to stand for ‘Electronic Theses and Dissertations’. The father of the ETD movement, Professor Ed Fox of Virginia Polytechnic Institute (Virginia Tech), explains the acronym as containing an implicit Boolean ‘OR’: ‘ETs’ OR ‘EDs’ equals ‘ETDs’. This makes for a very convenient shorthand, whereby a digital object which is either an electronic thesis or an electronic dissertation can be referred to as ‘an ETD’. It makes sense to have the acronym as inclusive as this, since the use of the terminology ‘theses’ and ‘dissertations’ varies considerably not merely from country to country, but even within countries. Certainly in the US, there is no uniform agreement among graduate schools over what is a thesis and what a dissertation.
The international ETD movement had its fifth annual symposium last month – although ETDs have been created at Virginia Tech since 1994. The event was held in the US, and the delegates were mostly from the US and Canada, with a sprinkling of South Americans, Australians and a few Europeans. Germany, France and Scandinavia were all represented. There was only one delegate from the UK.
The ETD movement has obviously not yet taken root in the UK. Nevertheless, there has been a group concerned with exploring the issues. UTOG, the UK Theses Online Group, was established several years ago, and it has been aware of the activities of the ETD movement, but also of the distinctive characteristics of thesis publishing in the UK, which have made us – so far – unsympathetic to the creation of an ETD culture here. In 2001, UTOG commissioned a report into the digitisation of theses from the SELLIC(1) Project at the University of Edinburgh. In addition to considering retrospective digitisation, the report also looked at the issues involved in the production and management of ‘born digital’ theses, identified reasons for the UK to become properly involved in this initiative, and recommended that UK institutions join the Networked Digital Library of Theses and Dissertations (NDLTD). These ideas were taken forward by JISC’s Scholarly Communications Group, and SELLIC was encouraged to submit a project proposal to the JISC Focus on Access to Intellectual Resources (FAIR) programme at the beginning of 2002, under the name Theses Alive! As a result, a two-year project was funded, based at Edinburgh, and will be starting work in a national effort to promote ETDs from the autumn of 2002. Theses Alive! is one of three projects in electronic theses funded by JISC. The others are DAEDALUS, led by Glasgow, and E-theses in the UK, led by the Robert Gordon University. Theses Alive! will work with both of these, and will share an advisory structure with the latter.
Why put them online?
What is the value in putting theses and dissertations online? First, it is important not to be too exclusive about the material we are talking about. When we think of ‘theses’, most of us probably conjure up an image of thick, black-bound volumes standing forbiddingly in rows on library shelves, very probably in closed access stack. These unwelcoming and largely unused and unvisited volumes stand as silent witness to the claim of our major universities to the designation ‘research institutions’ – alongside the research publications of our academic staff, which of course are treated very differently, with the aim of securing maximum impact. Printed theses and dissertations, unfortunately, have all too little scope for impact.
But what they also are is the fruit of original research, achieved usually only at great effort by students many of whom go on to become the academy – to careers in research publishing and teaching in universities. They contain years of work: important ideas, painstaking methodologies, literature reviews, successful hypotheses, and records of experimentation. While most of the very important original research in them is likely eventually to be recorded elsewhere, in the journal literature, or in monograph publications which are based upon them, particularly in the case of the humanities and social sciences, this is by no means true of all of them. It is not unusual to come across theses which are the only substantial work of criticism of a minor author, for example. Even where it does happen, there is usually a significant delay between the completed research and its publication.
And this body of work is only the doctoral thesis literature. In the UK, we are consistent in describing doctoral-level research outputs as ‘theses’, while reserving ‘dissertations’ for less elevated research: that done for masters degrees, or even at undergraduate level. Are we then to exclude dissertations from our consideration, or might there be a value in releasing this research also, or at least a portion of it, into the online corpus of research material? Where it also represents original work, the means of introducing our junior scholars to the world of research, then it surely has a value in advancing knowledge?
ETDs, OAI-PMH and NDLTD
At this point, there is a need to explain some other acronyms. OAI-PMH stands for ‘Open Archives Initiative Protocol for Metadata Harvesting’. NDLTD stands for ‘Networked Digital Library of Theses and Dissertations’. OAI-PMH is fairly new on the scene; the NDLTD, on the other hand, has been around for several years now. The NDLTD was begun by Ed Fox and his colleagues at Virginia Tech, as a movement to promote the development of online theses and dissertations. Joining the NDLTD commits an institution to moving towards a requirement that, from a given launch date, the thesis and dissertation (TD) literature, or a portion of it, is mounted online on a university server for free internet access. Many US universities now do this. Virginia Tech alone has over 4,000 ETDs available (more than half of them being Masters dissertations). In Europe, the presence of ETDs is patchier, although Humboldt University in Berlin has developed an impressive system for managing ETDs, and has changed the legislation surrounding thesis literature in Germany (which until recently mandated print output) in the process.
The NDLTD is a realistic and pragmatic movement. It does not expect institutions to change everything overnight. There is a journey to be undertaken, the principal steps of which are:
- a. Ensure that the metadata for ETDs is available online.
- b. Move to a hybrid system of print and online TDs for a period of time.
- c. Arrive at the point at which the online medium is the authorised medium for the production of TDs.
This journey may take several years.
The NDLTD has been in existence since 1997. To date, there has been no membership fee, and it has been run on the efforts of a number of dedicated volunteers, largely inspired by Ed Fox’s philanthropic vision. However, as international interest grows and the movement spreads, the organisation has reached the point at which it needs a stronger organisational structure to sustain its efforts, and to undertake promotional activity and produce support materials in a more concerted way. At this year’s conference it was announced that it is soon to be registered officially in the US as a non-profit membership organisation, and a fee will be levied from members.
NDLTD has been around much longer than OAI-PMH, but the recent development of the latter protocol has given a significant new boost to its objectives. The Open Archives Initiative Protocol for Metadata Harvesting is an important new protocol in the world of internet information. Through the use of a ‘lowest common denominator’ metadata format (unqualified Dublin Core), it allows those producing metadata for all types of digital objects to ‘expose’ that metadata across the internet, so that it can be harvested by software and made available to any community that wishes to use it, for whatever purpose. The true significance of the protocol lies in its support for interoperability. It is a tool for building union catalogues from a potentially vast range of discrete collections, and it therefore exploits the power of the internet to make virtually possible what is physically impossible. Collections of theses and dissertations, catalogued by librarians at the institutions in which they were produced, can – via the OAI-PMH – contribute to a vast worldwide virtual database of metadata, which can be searched through services built on the harvested records. This means that all the ETDs on the subject of sub-Saharan climatic change, for example, or manipulation of a particular gene, or the economy of Hong Kong, or the works of the Scottish poet Iain Crichton Smith, can be found in a single search.
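To make the harvesting idea concrete, here is a minimal sketch in Python of how a harvester might ask a repository for its records and read the Dublin Core fields from the reply. The verb `ListRecords`, the `oai_dc` metadata prefix and the XML namespaces come from the OAI-PMH and Dublin Core specifications; the repository address, the record identifier and the sample record are invented for illustration, and the function names are hypothetical. A real harvester would also fetch the URL over HTTP and follow resumption tokens, which this sketch leaves out.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample reply, shaped like a ListRecords response (not real data).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:etd.example.ac.uk:1234</identifier>
        <datestamp>2002-06-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis on the poetry of Iain Crichton Smith</dc:title>
          <dc:creator>Example, A. N.</dc:creator>
          <dc:type>Thesis</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL of a ListRecords request against a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one set, e.g. theses only.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, titles and creators from a ListRecords reply."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": [e.text for e in record.iterfind(".//dc:title", NS)],
            "creator": [e.text for e in record.iterfind(".//dc:creator", NS)],
        })
    return records


if __name__ == "__main__":
    # Hypothetical repository address, used only to show the request shape.
    print(list_records_url("http://etd.example.ac.uk/oai"))
    for rec in parse_records(SAMPLE_RESPONSE):
        print(rec["identifier"], rec["title"], rec["creator"])
```

A union catalogue of the kind described above would run such a harvest against many repositories in turn and load the combined records into its own search index.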
The NDLTD does not mandate the use of OAI-PMH – at least not yet. Before the new protocol was on the scene, it advocated the use of the Z39.50 protocol for search and retrieval (and the two protocols used together would offer a powerful combination). But it seems clear that the NDLTD’s effort to build a global virtual database of ETDs does point very naturally to the use of the OAI-PMH, and the most commonly-used database systems which have been developed to support ETDs, including the Virginia Tech ETD software, are now being adapted to become OAI-PMH compliant, with OAI data service interfaces being developed.
Is metadata not enough?
Do we need thousands of electronic theses and dissertations clogging the arteries of the internet (or possibly the Grid, in the near future)? In the UK at least we have grown used to engaging with this literature on the basis of its proxies – the metadata for theses, which usually includes an abstract. We have the British Thesis Service (BTS)(2), run by the British Library and supported by most UK universities – though there are a few significant omissions of research universities from its ranks. The BTS takes copies of printed theses produced by its members, and makes microfilm copies of them for loan or sale via the British Library. This saves the individual institutions the trouble of responding to requests for sale or loan copies directly, and also means that the metadata is searchable via a common union catalogue. The British Library made a commitment at the end of 2001 to join the NDLTD, and is planning to convert its microfilm operations to a digitisation-based service, which it also hopes to apply retrospectively to its huge body of thesis literature. When this is achieved – and it must surely be a mammoth task – hundreds of thousands of UK-produced ETDs will find their way onto the NDLTD.
The UK also has a commercial metadata service, the Index to Theses(3) published by Expert Information Ltd, whose coverage does not map exactly onto that of the BTS, although the two services are examining ways of harmonising their operations. The ProQuest service, Digital Dissertations, based on the University Microfilms International (UMI) Dissertations Abstracts database, while it has wide international coverage, does not feature many UK theses, because the UK has been so well catered for by the BTS and Index to Theses. The model, then, has been primarily a centralised one, and one which is based upon metadata. Those wishing to search the thesis literature would most commonly use the Index to Theses, and then order a sale or loan copy of the thesis they wish to consult either from the British Library, or direct from the university concerned, if it is not a member of the BTS.
One reason why the UK has perhaps not moved faster into the world of ETDs is this centralised model. Many universities have become used to a procedure involving the despatch of their theses to the British Library, and have considered the management of their theses an issue for the British Library rather than for themselves, shelving their own local copies of theses in closed access stack, and fetching copies out when requested for use on-campus. But we are all familiar with the limitations of microform as opposed to online dissemination, and the difficulty for the British Library’s service has been the size of its operation, which makes switching from a microfilm to an online process costly and time-consuming. Nevertheless, such a switch is necessary for several reasons, the main one being that metadata is not enough. The now intuitive action, for the researcher using this literature, is to proceed from the metadata to the full text at the instant they wish to.
We might consider this a measure of a literature’s responsiveness. Until the advent of ETDs, thesis literature was one of the least responsive. Once identified in an index of theses (of which there were several), the thesis had to be requested using the local interlibrary loan service, could take some time to arrive, and could then be used only with the discomfort of a microform reader. If a bound copy was loaned, usually it would be for restricted use in the borrower’s library only. As the rest of the research literature becomes more and more available through aggregated ejournal services, offering instant access to sets of journals extending back now often to their origins, searchable in a variety of ways across a large online corpus, the thesis literature, by contrast, could appear antiquated and intractable.
For that reason, and because web sites are now so prominent in the communications of researchers among themselves, thesis literature has been moving online anyway, in a patchy and uncontrolled way. There is nothing to prevent a student putting a copy of their thesis onto their own web server, or a departmental server. And so there is ‘bottom-up’ pressure to provide ETDs, more than pressure from the organisations whose research is being published. There is also evidence that students know about and use the NDLTD already. Recent figures on the use of the ETDs in the Virginia Tech archive indicate a high number of hits from the UK.(4)
The metadata alone is not enough, and is patchy in any case, with the Index to Theses providing the best service. The BTS cannot be searched directly, but recommends that users use the Index to Theses or the SIGLE(5) service, or else the British Library Public Catalogue Books File.(6) This multiplicity of possible routes to finding only the metadata is one of the reasons why the thesis literature is not more sought after by researchers. Many are not prepared to fight their way through the unconnected metadata sources only to end up with a potentially long wait for an item which is likely to arrive in a difficult format.
A strategy for the UK
In developing a strategy for ETDs in the UK, however, we do wish to start with metadata. Theses Alive! will aim to ‘genericise’ the metadata creation process for all UK theses and dissertations, in order to simplify the distribution of metadata while at the same time linking it to an ETD wherever possible. The model we plan is shown below.
Using the submission software, the student creates their own metadata, which is quality-controlled by the Library. Also in the loop, inevitably, is the authority responsible for validating the approved thesis, which we have simply called the ‘registrar’ in the model. Interaction between student and supervisors goes on throughout the course of the degree programme. Finally, the system outputs are the metadata, formatted as required for various agencies, and the ETD itself, which may be a PDF or other file format attachment, or may be part of the same XML file, plus any linked files.
Our expectation is that an XML schema – or perhaps a number of schemas - will be developed for UK theses and dissertations, possibly based upon schemas which already exist for use in another context. A schema will describe each thesis according to its various structural elements, and should support the export of metadata in all of the various formats required, while at the same time describing the full text of the thesis. In other words, PDF is not likely to be sufficient in the longer term. Using XML provides us with a non-proprietary format, with greater scope for database storage of deconstructed documents, greater search flexibility, and the possibility of preserving the ‘raw’ source of the document.
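As an illustration of what ‘a single structured document, many metadata outputs’ could mean in practice, here is a minimal Python sketch. No schema for UK theses exists yet, so every element name in the sample thesis (`thesis`, `front`, `body`, `chapter` and so on) is a hypothetical placeholder, as are the function names and the mapping to Dublin Core; only the Dublin Core namespace itself is taken from the published standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set (from the published standard).
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical thesis document: front matter plus structured full text.
THESIS_XML = """
<thesis degree="PhD">
  <front>
    <title>An example thesis title</title>
    <author>A. N. Example</author>
    <institution>Example University</institution>
    <year>2002</year>
    <abstract>A short abstract of the thesis.</abstract>
  </front>
  <body>
    <chapter n="1">
      <heading>Introduction</heading>
      <p>Full text goes here.</p>
    </chapter>
  </body>
</thesis>
"""

# Assumed mapping from our invented structural elements to Dublin Core.
FIELD_MAP = [
    ("front/title", "title"),
    ("front/author", "creator"),
    ("front/institution", "publisher"),
    ("front/year", "date"),
    ("front/abstract", "description"),
]


def export_dublin_core(thesis_xml):
    """Derive a Dublin Core metadata record from the thesis document."""
    thesis = ET.fromstring(thesis_xml)
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("metadata")
    for path, dc_name in FIELD_MAP:
        text = thesis.findtext(path)
        if text:
            ET.SubElement(record, f"{{{DC_NS}}}{dc_name}").text = text.strip()
    ET.SubElement(record, f"{{{DC_NS}}}type").text = "Thesis"
    return record


def chapter_headings(thesis_xml):
    """The same source also exposes the structure of the full text."""
    thesis = ET.fromstring(thesis_xml)
    return [h.text for h in thesis.iterfind("body/chapter/heading")]


if __name__ == "__main__":
    print(ET.tostring(export_dublin_core(THESIS_XML), encoding="unicode"))
    print(chapter_headings(THESIS_XML))
```

Other export routines, for a national index or a library catalogue for instance, could read the same front matter, so the student would enter the metadata once only.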
The more challenging task may be to find universities which are willing to allow ETDs to be created in their institutions, and to work with us in the Theses Alive! project, as pilot sites. We are not providing any funding for hardware for sites, but will support them with software installation, and will provide technical and advocacy support. The intention is to solicit interest from institutions willing to act as pilot sites, in August 2002. We hope to have five or six of these, representing a mix of different university types in the UK, and providing both doctoral theses and Masters-level dissertations to the project.
Much of our work will be on the political and cultural changes needed in institutions in order to prepare them for the inevitable future context of ETDs. For some institutions, moving to an environment in which the electronic thesis or dissertation is the authoritative copy, the one which is preserved and used, may seem a huge step which is still years away. Even in the US, the number of institutions which have made provision for ETDs is still relatively small, though growing almost daily.
At this point, let me add a little more detail to the three-stage process described above. This is the strategy we wish to see adopted by individual universities in the UK, and supported by Theses Alive!
- Genericise the metadata: this step is not a prerequisite, but it helps inasmuch as it implies the use of a single ETD as a structured digital object, and creates an ‘ETD in waiting’. The ETD is there, but is restricted to non-public access only, by a system administrator. This is the first goal.
- Introduce a hybrid print and electronic TD publishing policy: very few institutions are likely to adopt ETDs outright, without running a parallel print and web service to begin with. During this stage, the print version remains the authority version for a period, but at a particular point, the roles swap over, and the ETD becomes the authority. This is the second goal.
- It is then a fairly short step to the third stage, in which the electronic format is the required format for submission.
Of course, there is a great deal of work to be done within institutions as they move through these stages. Theses Alive! will develop an ETD submission system designed for use in the UK, but rolling it out for use by newly-commencing postgraduate students in universities will involve a lot of effort. University staff will require training in order that they can offer training programmes to the students concerned. It is likely that these staff will be Library staff, although other staff in a training role, from IT services or even academic staff training new postgraduates in research skills, may be the preferred source of this training. Virginia Tech uses graduate students themselves, which clearly also has a number of advantages, though it would be a less common model in the UK. A major component of the training programme will be attention to Intellectual Property Rights (IPR). Students will need to be educated not only about their own rights in their theses or dissertations, but also about the need to clear rights for linked or embedded content. Theses Alive! will provide central support for the training programmes in pilot sites.
The number of US universities which have gone down the ‘required’ route is still also small – perhaps around a dozen. Some of them have followed an accelerated process, taking their users by surprise. Others have worked with pilot departments for a while, and then taken the plunge. A number of speakers at the Fifth International Symposium on Electronic Theses and Dissertations, held at Brigham Young University, Utah, last month, spoke about going ‘cold turkey’: taking a decision to require ETDs, then announcing when it will happen, and dealing with the pain along the way. Many have found this a tough process, but none have gone back on it.
In the US context, those universities which are already far advanced on the ETD path generally have achieved this by means of a collaboration involving four different players on campus – academic staff, administrators, library staff and IT staff. Of these four, the most important group is perhaps the administrators – those involved in the management of graduate education. Most US universities have an organisation on campus called the ‘graduate school’, with a supporting infrastructure which is coherent and well-resourced. Few UK universities have ‘graduate schools’ as such, though they are growing in number. Postgraduate education in the UK is more commonly managed on a departmental basis. This arrangement, being more fragmented and less capable of achieving economies of scale across the postgraduate studies layer, may well make the task of engaging these administrators, the Deans of Graduate Schools, considerably more difficult in the UK. But without the support of senior university managers, the ability of an institution to move in the direction of requiring ETDs is likely to be very much compromised. Certainly, the library cannot do it alone – nor can the computer services department. Academics can lobby successfully, if they become convinced of the value of the initiative, but they might be content with achieving ETDs in their own department only – a partial solution which will not satisfy the library’s desire for uniform access.
The benefits to postgraduate education
There are clear benefits to scholarship in the development of ETDs. Here is a body of original research material which is hugely underused. Cynics might say that some research supervisors exploit this situation. Knowing that a student’s thesis is unlikely ever to be read by more than a handful of people might make a supervisor less diligent in providing supervision than is in the best interests of the student. Another way of looking at this is to accept that researchers are under tremendous pressure to do their own research, and getting the PhD out of the way is a necessary evil which, once passed, allows a real research career to begin.
Yet this is to sell our junior scholars short. Providing them with a structured environment in which to write their theses and dissertations makes a major improvement to their experience as postgraduates. Another very clear advantage is the creation of a corpus of ETDs produced by previous cohorts of students, which can serve as exemplars and support newly arrived students in selecting their own TD topics. The systems which have been developed to support ETD development can do more than simply allow a finished ETD to be uploaded to a server, to await processing by research committee, academic registrar and library. They should support the interactivity between student and supervisor or research committee from day one of the postgraduate degree course. Here, perhaps, is what a real ‘virtual research environment’ should be. They should also provide the student with a record of their progress, containing a history of the changes and edits, of supervisor comments, together with the bibliographic review and experimental history. There are systems around now to support these functions – some open source and others commercial. Theses Alive! is likely to base its UK-oriented system on one which is already in existence.
And in the interests of the advancement of knowledge generally, here we have a new body of material to add to the corpus of freely available research now supported by the web via new standards such as the OAI-PMH and the OpenURL protocol. ETDs can then join eprints in representing the growing body of original research produced by our academic community, which is quality-controlled – or has at least some indication of quality control status – and is provided for peers to use as they wish, at no cost – not even that of an interlibrary loan copy on microfilm, or a photocopy.
The systems which handle eprints and ETDs, indeed, share a considerable amount of functionality. Much of the workflow is similar, though the sequencing is not quite the same. Both follow the same basic sequence of preparation – submission – review – finished publication – although ETDs obviously have much more in-progress review. A key difference will be in their readiness to join the corpus. A researcher using a search provider to query the research corpus is only likely to be interested in completed ETDs (and extremely unlikely to be able to find anything else), whereas they may be happy to search for eprints which include those submitted for publication but not yet published. An interesting area of overlap occurs in the case of research programmes which require their students to publish an article or a number of articles in a peer-reviewed journal as part of their degree. It is not difficult to imagine a scholarly publishing system which allowed students working in an ETD module to tap in to the functionality used by academics in eprint self-archiving and journal submission.
Theses Alive! is a project which many consider well overdue in the UK. The hard work of the UK Theses Online Group over the past few years now has an opportunity of bearing fruit. That work will commence in the autumn, and we would welcome expressions of interest in participating in it from universities across the country.
June 2002. © John MacColl. Non-exclusive right of publication granted.
- (1) Science & Engineering Library, Learning & Information Centre [www.sellic.ed.ac.uk]
- (2) www.bl.uk/services/document/brittheses.html
- (3) www.theses.com
- (4) See http://scholar.lib.vt.edu/theses/data/somefacts.html#logs
- (5) ‘System of Information for Grey Literature in Europe’ [www.kb.nl/infolev/eagle/frames.htm]
- (6) http://blpc.bl.uk/