At Glasgow Caledonian University (GCU) the Spoken Word, a project in the JISC / NSF Digital Libraries in the Classroom (DLiC) programme, was conceived in 2001-2002 in response to a set of pedagogical and institutional imperatives. A small group of social scientists had, since the 1990s, been promoting the idea of using 'an information technology-intensive learning environment' to recapture some of the traditional aspirations of Scottish Higher Education, in particular independent, critical and co-operative learning. And the institution was about to embark on the construction of the SALTIRE, a purpose-built learning centre, incorporating a full set of resources to support the contemporary learner.
The partners in the five-year (2003-08) project are: in the USA, the MATRIX unit at Michigan State University (MSU) with the OYEZ project and Academic Technologies at Northwestern University (NU); and, in the UK, Learner Support at GCU with the British Broadcasting Corporation (BBC) Information and Archives. Prior to the submission the project partners were linked through their interests, and cooperation, in developing the academic use of remote digital libraries of audio and video content.
GCU set out to collaborate with these partners to demonstrate how substantial resources of audio (and subsequently video) from the archives of the BBC might be made available to students and their teachers. The project was to be exemplary of the approaches to learning and scholarship and the concomitant supports offered to the University community from the new SALTIRE learning centre. It was also intended to build capacities within the support services, and in particular, to embed resources for electronic library development.
Library standards were important to GCU from the outset. A stated objective was to induce students to 'write on and for the Internet' using authoritative sources and legitimate rhetoric. To secure the confidence of academic supervisors, procedures analogous to those used in traditional text were pursued. In particular, the citing and referencing of accessible sources of established provenance and persistence had to be possible. Tools to address digital libraries, retrieve content and support the creation of student projects were needed.
A central aspiration was to develop resources to meet these standards and yet support flexible and adaptable delivery. The intention was to accommodate 'pedagogical pluralism' - allowing for different approaches to, and styles of, teaching by different personalities and within different disciplines. A range of teachers and disciplines, from GCU and from other universities in the EU and the USA, has been involved throughout. From 'privatisation and regulation' in economics and 'the ethics of biology' at GCU, through 'hospitality management' at University of Strathclyde and 'the impact of technology from 1945' in history at Northwestern to 'Gandhi' for anthropology and 'women in British politics' for women's studies at Kansas State, the range of disciplines and topics using the service has been considerable. Over 1,500 registered users have used the service. A second stated objective of the original project was to 'enhance digital libraries' (of, for example, BBC content) for academic use and that has been a major priority. High 'usability' was a prerequisite for successful use in the essentially (if symbiotically) linked activities of learning, teaching and research.
The requirement to serve diversity, and a consciousness of relatively rapid and continuous technological change, were the major influences on the design of the service.
As a general principle, an approach analogous to 'separation of concerns' programming was adopted. Partitioning of the functions and components of the service provided flexibility and better facilitated the management of change. This approach allowed a 'division of labour' but the associated 'specialisation of function' has been harder to maintain in a small team. The general principle of maintaining some separation of: front-end applications / presentation layer; a business logic layer; and a backend storage and delivery layer has not been fully realised but remains a goal. Thus ideally the tools for searching, collecting items, marking and annotating clips and creating final presentations would be portable and might address a range of content-providing repositories through appropriate middleware. In the current set-up they are either completely integrated or relatively separate. The goal is 'plug-in plug-out'.
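The layering described above can be illustrated with a minimal sketch. It is not the project's code: the class names, the two record fields and the in-memory backend are all hypothetical, chosen only to show how a front end can stay unchanged while the repository behind it is swapped.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Record:
    """A catalogue record; the two fields are illustrative only."""
    identifier: str
    title: str


class Repository(Protocol):
    """Storage and delivery layer: any backend that can answer a search."""

    def search(self, term: str) -> list[Record]: ...


class InMemoryRepository:
    """Hypothetical stand-in for a real repository backend."""

    def __init__(self, records: list[Record]) -> None:
        self._records = records

    def search(self, term: str) -> list[Record]:
        needle = term.lower()
        return [r for r in self._records if needle in r.title.lower()]


class SearchService:
    """Business logic layer: depends only on the Repository interface."""

    def __init__(self, repository: Repository) -> None:
        self._repository = repository

    def find_titles(self, term: str) -> list[str]:
        return sorted(r.title for r in self._repository.search(term))


def render(titles: list[str]) -> str:
    """Presentation layer: formats results without knowing their origin."""
    return "\n".join(f"- {title}" for title in titles)
```

Because `SearchService` only relies on the `search` method, a different backend could in principle be 'plugged in' without touching the presentation code, which is the sense of 'plug-in plug-out' used here.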
Fundamental to the provision of the service was access to a satisfactorily large and appropriate subset of BBC content (programmes and metadata). 'Satisfactorily large' would mean sufficient content to allow users choices and alternatives. Students would be obliged to choose carefully what was best for their stated purposes.
The negotiation of rights represented the largest obstacle. GCU and the BBC entered into a legal deposit agreement in which the Corporation gave the University non-exclusive permissions to hold content deposited from the archives and to serve them internationally 'for educational use only' as both streams for access from GCU URLs and as downloads 'for individual study use only'. The University was required to accept liability for securing third-party rights and thus for undertaking the clearance of such rights. This necessitated the establishment of a rights clearance team and, in the first instance, the use of a FileMaker database to keep an audit of the processes of clearance and to record data and calculate the basic statistics. All participants in programmes broadcast from 1988 onwards are searched for and, if found, contacted and asked to give permission for educational use, worldwide through the Web. Of those successfully contacted, the vast majority have granted permissions. Full clearance statistics will be published before the end of the project.
The next element in the process was the identification of programmes of potential interest to teachers. Disciplinary experts (teachers) are assisted by the project librarians to select programmes from the BBC's proprietary catalogue (INFAX). Initially this process relied on a private Greenstone representation of the BBC catalogue. Recently, as an aspect of a set of general initiatives to provide wider access to the archives, the BBC has made its catalogue publicly available online and this has been used for selection. Selected programmes are requested in batches from the BBC, which digitises as necessary. Digitised programmes are delivered to GCU and transcoded from the original .wav files for streaming and downloading from Spoken Word GCU servers. They are then linked to BBC and additional GCU-generated metadata in a repository and streamed as MPEG4/H264 with MP3/MPEG4 downloads.
At an early stage (2004), the project successfully ingested the data from the BBC (INFAX) catalogue into a FEDORA repository - but the development effort required to use the FEDORA application to create a working service, with acceptable user interfaces, in an acceptable time-frame, was deemed too heavy. REPOS is a working repository solution (PHP/MySQL) created by MSU. Although it had shortcomings in relation to the project vision, it enabled the rapid development of a working digital library. It has supported a working service for the last three years. However, the front end had not been developed for use by teachers and students. MATRIX at MSU provided a rudimentary version of a more user-friendly interface and GCU developed and elaborated this into Padova, which we describe in more detail below.
Each of the records created is a super-set of the original BBC catalogue entry and has a link to the digitised BBC programme. The additional information provided always includes the name of the teacher who is the collector and his or her discipline. The original intention of collecting additional rich information from the expert collectors and for investigating the capture of 'user-generated metadata' has not yet been fully pursued: in the service version we do not, as yet, have the capacity to enable users to write back to the repository.
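As a rough illustration of what 'super-set' means here, the sketch below copies a catalogue entry and adds the collector, discipline and media link alongside it. The field names (including the `gcu_` prefix) are hypothetical and do not reflect the actual INFAX or REPOS schemas.

```python
from __future__ import annotations


def build_record(
    bbc_entry: dict[str, str],
    collector: str,
    discipline: str,
    media_url: str,
) -> dict[str, str]:
    """Return a new record holding every BBC field plus the added ones."""
    record = dict(bbc_entry)  # copy, so the original entry is left untouched
    # Prefixed keys (an assumption) keep additions from overwriting BBC fields.
    record["gcu_collector"] = collector
    record["gcu_discipline"] = discipline
    record["gcu_media_url"] = media_url
    return record


def is_superset(record: dict[str, str], bbc_entry: dict[str, str]) -> bool:
    """Check that every original field survives unchanged in the record."""
    return all(record.get(key) == value for key, value in bbc_entry.items())
```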
The initial requirement of users of a library of multimedia items is to find programmes or items and then to assess their appropriateness. The goal of the service is to provide a range of ways to locate resources. Ideally, searches could be initiated from within tools which collect items for later use and/or enable text or voice annotations of programmes or clips. Searching from the catalogues (OPACs etc.) used to locate books and articles would also be convenient. Both Sentient Discover and Sirsi Rooms were investigated; however, it was felt their implementation could not be justified within the scope of the project. On a demonstration and development server we have entries and live links to our repository running in an instance of our University Sirsi Unicorn library catalogue; we hope to achieve a full implementation for the coming session.
But Padova was developed as our main finding aid.
Padova is the name given to the public-facing, Web-delivered, front end to the REPOS repository. It provides search, browse, citation, notification, and external service integration systems to users, along with delivery of the media itself.
A central concern was maintaining the integrity of the deposit agreement between the BBC and GCU. An authentication layer was developed, enabling metadata to be openly searchable - but with the BBC media protected behind a registration process which obliged users to accept the terms of a 'student-friendly' End User License Agreement.
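The access rule just described, open metadata with media gated by registration and acceptance of the licence, can be sketched in a few lines. This is an assumption-laden illustration rather than Padova's implementation: the `User` class and both function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    """A registered user; accepted_eula records agreement to the licence."""
    username: str
    accepted_eula: bool = False


def can_view_metadata(user: Optional[User]) -> bool:
    """Metadata is openly searchable, so anonymous visitors are allowed."""
    return True


def can_access_media(user: Optional[User]) -> bool:
    """Media requires a registered user who has accepted the licence."""
    return user is not None and user.accepted_eula
```

Keeping the two checks separate mirrors the split in the text: search stays public, while every request for a stream or download passes through the stricter test.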
In pursuit of the vision of enhancing the capacities of learners to write 'on and for the Internet', several features were developed to encourage the use of established scholarly practices in the new repository environment. Of particular significance is the inclusion of a citation system, enabling consistent references to content to be downloaded for import into standard reference management software. Citations contain a URL link to the media item, allowing learners to build their own libraries of Spoken Word content.
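To make the idea of a downloadable citation concrete, the sketch below writes one record in the RIS tagged format, which common reference managers can import. The source does not say which export format Padova uses, so RIS is an assumption, as are the function name and the choice of fields.

```python
def to_ris(title: str, contributor: str, year: int, url: str) -> str:
    """Serialise one media item as an RIS record (illustrative fields only)."""
    lines = [
        "TY  - SOUND",            # RIS reference type for a sound recording
        f"TI  - {title}",
        f"AU  - {contributor}",
        f"PY  - {year}",
        f"UR  - {url}",           # link back to the media item
        "ER  - ",                 # end-of-record marker
    ]
    return "\n".join(lines) + "\n"
```

The `UR` line is what carries the URL link to the media item, so a learner's reference library can point straight back to the content.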
Padova incorporates notification systems, providing RSS and Atom feeds for searches. Although such feeds are now relatively commonplace, Padova was one of the first systems to allow users to generate their own personalised RSS feeds based on any search query with links to media items attached. This can create, for any feed reading application, a 'live linked' listing of content akin to a bibliography. When a new item conforming to the search is added, it appears in the list. Registered users can also access the functionality to create RSS feeds with the media items attached as MP3 files, thus supporting the creation of instant and customised subject-specific podcasts.
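A search-driven podcast feed of this kind amounts to an RSS 2.0 document in which each item carries an `enclosure` pointing at an MP3 file. The following sketch, using only the Python standard library, shows the shape of such a feed; the function name, the dictionary keys and any URLs passed in are hypothetical, and this is not the Padova code.

```python
import xml.etree.ElementTree as ET


def build_feed(query: str, search_url: str, items: list) -> str:
    """Build an RSS 2.0 feed for a search, attaching each item's MP3."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Search results for: {query}"
    ET.SubElement(channel, "link").text = search_url
    ET.SubElement(channel, "description").text = f"Items matching '{query}'"

    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["page_url"]
        ET.SubElement(node, "guid").text = item["page_url"]
        # The enclosure is what turns a plain feed into a podcast.
        ET.SubElement(
            node,
            "enclosure",
            url=item["mp3_url"],
            length=str(item["size_bytes"]),
            type="audio/mpeg",
        )

    return ET.tostring(rss, encoding="unicode")
```

Regenerating the feed from the stored query each time it is requested is one simple way to obtain the 'live linked' behaviour: a newly added item that matches the search simply appears as another `item`.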
External services are integrated throughout and can be further developed. Currently, automatic links to Wikipedia pages and to Google Scholar provide searches based on names of programme participants parsed out of record metadata, as well as internal links to other Spoken Word resources featuring those individuals. For example a search for "Bedouin" will find a programme which lists P.J. O'Rourke as a participant and that entry will be hyper-linked to occurrences in both external services and in the internal catalogue.
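Generating such links is largely a matter of URL-encoding a participant's name into each service's search address. In the sketch below the two URL patterns are the public search addresses of Wikipedia and Google Scholar; whether Padova builds its links in exactly this way is an assumption, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def participant_links(name: str) -> dict:
    """Return external search links for one programme participant."""
    return {
        "wikipedia": "https://en.wikipedia.org/w/index.php?"
        + urlencode({"search": name}),
        "google_scholar": "https://scholar.google.com/scholar?"
        + urlencode({"q": name}),
    }


# Example: links for the participant named in the text above.
links = participant_links("P.J. O'Rourke")
```

`urlencode` takes care of characters such as the space and apostrophe in a name, which would otherwise produce broken links.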
Delivery of audio and video material is of course the main purpose of Padova. The Project has obligations to restrict the use of BBC media on third-party educational Web sites to streaming media only. On teaching sites, including VLEs, a GCU URL must link to the GCU servers. This led to the integration of Apple's Darwin Streaming Server within the repository infrastructure as a cost-effective solution to the delivery of audio and video in a relatively standards-compliant fashion. Investigation of Adobe Flash as a media delivery format is underway, and provision is planned as an alternative delivery mechanism. However, the MPEG family of standards looks set to remain the project's mainstay for the foreseeable future.
MATRIX has developed a tool set called MediaMatrix which allows users to visit Web pages and collect resources for subsequent use. This software supports:
The items collected are stored on the user's personal portal page in MATRIX servers. This page keeps all their annotations and maintains direct links to the marked portions of the original pages.
Users can organise their thoughts on the portal page using folders. Teachers can create a group for each of their classes and invite students to join that group, allowing both the teacher and students to preview the work of, and collaborate with, other members of the class. Members of a group have complete control over permissions for access allowing a folder to be shared with the group or kept private.
Attaching text (or voice) annotations to marked sections of audio, video or images is an attractive goal. Being able to do that in a shared real-time environment offers a host of teaching, research and development possibilities. Project Pad from Academic Technologies at Northwestern University delivers that functionality in a particularly attractive and customisable user environment. It is programmed in Java and Flash (ActionScript). Its major functions are:
The imperatives for the library world of the widely acknowledged and ongoing "Revolution in Scholarly Communication" are dependent upon and supported by trusted digital repositories. As Spoken Word considered the transition from a project to a set of core digital library services, the need for a long-term robust repository solution became clear. General inquiries as to who was doing what, and existing expertise from our project partners in Academic Technologies at NU, led to early investigations into the open source repository software Fedora.
Although complex, FEDORA repository software offers a flexible service-oriented architecture for managing and delivering digital content. In late 2005 meetings in Denmark and in Wales led to the formation of a UK and Ireland Fedora Users Group. Spoken Word Services attended the first meeting of this group, held in the University of Hull in May 2006. Subsequent meetings at GCU and the University of Oxford have seen the numbers of participants grow. A full list of current users, meetings and other activities is available from the Fedora UK&I wiki. This domestic interest and our experience of group discussions and activities further reinforced our decision to investigate actively the migration to Fedora.
Furthermore, membership of the Users Group has led to a number of important developments and collaborations for Spoken Word Services. Colleagues at the E-Services Integration Group, University of Hull have been particularly supportive and have shared considerable expertise in digital preservation, demonstrated in existing projects such as RepoMMan, the Repository Metadata and Management project. The REMAP Project, funded under the JISC 'Repositories and Preservation' Programme, initiates a formal partnership between the Hull group and Spoken Word Services. It builds on the work of RepoMMan to investigate the use of a digital repository to support the embedding of records management and digital preservation in the context of a UK Higher Education institution.
Spoken Word Services is increasingly concerned with the curation and preservation of audio and video materials produced by GCU and other partners. Projects such as REMAP are important in helping us to consider the functionality of the tools required in managing and developing our repository. Working with Fedora has also led us to discussions with the IRIScotland project, which is building an institutional repository infrastructure for Scotland. The intention is to learn from each other's experience.
Spoken Word Services is a partner in the EDINA-led Visual and Sound Materials (VSM) portal scoping study and demonstrator project, funded under the JISC Portals Programme. The aim of the project is to investigate the value and feasibility of a national portal for both time-based media and image collections dedicated to the needs of the Further and Higher Education communities. It is anticipated that the portal demonstrator will go live in January 2008 and will continue to be available for a further two years. The intention is that EDINA will harvest from the Spoken Word FEDORA.
During the lifetime of the Spoken Word Project, the BBC itself has been through many changes, both institutional and technological. There has been a considerable push to increase public accessibility to BBC archive material, firstly through the Creative Archive Project, and more recently, through the online publication of the BBC Programme Catalogue. Rights remain a major challenge. The exclusively educational focus of the Spoken Word, and the related legal deposit agreement with the BBC, remain unique. The experience of careful third-party rights clearance to secure permissions for educational use is proving highly instructive and will be of considerable use to both the BBC and the educational community. The creation of a metadata repository 'secondary' to the 'primary' data from the BBC INFAX data set suggests a useful strategy for the re-cataloguing for scholarly use of the one million plus items held by the BBC. Multiple 'secondary' metadata repositories could provide manageable specialist finding aids for a range of disciplines and purposes. Knowledgeable scholars can collect and annotate resources of direct relevance to their areas of expertise for research, learning and teaching. In a process analogous to the use of 'slips' for the construction of the Oxford English Dictionary, learned users can enhance digital libraries for scholarly purposes.
In general we have demonstrated to our BBC colleagues the value and use of their digitised materials in the environment of a community-based trusted digital repository. These materials, which are created and preserved by public funds, form the basis of an extremely rich source of scholarly material both within the BBC and for use by external scholars. Viewing them as a remote digital library rather than a mere collection of assets has considerable advantages for both preservation and use.
To take full advantage of the possibilities offered by digital repositories, the metadata produced by authoritative expert users must be captured and written back to the repository. Some aspects of that work are being approached through the linkage of Project Pad to Fedora which is mentioned above. The XML produced by the application's annotations provides the potential for enhancing the metadata, but the issues raised by 'tagging' and 'social software', which relate to authoritative provenance, consistent metadata schemes and so on, are all relevant to the operation of such a system.
Spoken Word Services has deliberately invested in capacity building at GCU and now hopes that transfer of knowledge and technology will enable complementary local institutional repository development. Personnel for the Spoken Word project were either seconded from University positions or were first appointed to the University and then seconded. As the project nears completion, staff will either be appointed to the Spoken Word Services or will return to their previous positions. In either event, the expertise and experience gained on the project will be retained by the University. Knowledge transfer has also occurred. Spoken Word has worked formally with a number of units in the University and informally with others. The Library, Learning Resources and Learner Support are all examples of such co-operation. Teaching and learning has been an important area of collaboration and knowledge transfer.
The selection of the materials to be acquired from the BBC was largely undertaken by teachers. Most of these teachers, but not all, were at GCU. This has led to interactions amongst the GCU teachers and between them and teachers elsewhere. In cooperation with the Caledonian Academy, a formal discussion of pedagogical approaches and notions for learning improvement related to multimedia is about to be initiated. Teachers who have used our content and tools will be asked to reflect on their experience and a new group of teachers will be invited to speculate on their expectations and how these might be realised.
A set of Higher Education Academy projects has been undertaken as part of the project dissemination study. The disciplines involved have been Social Policy, Political Economy and Hospitality Management. The resources have been used in a variety of ways including podcasting.
The establishment of a set of digital library services at Glasgow Caledonian University was planned to allow for development beyond 2008. Building expertise and capacity in relation to repositories, and linking them to teaching and learning, has been a particularly important aspect of that strategy. But we are also intending to capitalise further on the links we have created with other institutions and development communities. A 'Son of Spoken Word' project is currently under discussion with the potential for worldwide scholarly communication using a range of broadcast archives. The drive for interoperability pushed by the Open Data movement makes 'open access multi-media resources' enormously attractive for scholarly exploitation (both the 'treasures' in 20th century collections and the huge daily outputs from new media sources which are increasingly challenging the dominance of traditional mass media). Users should be able to access content across media types (e.g. images, texts, video and audio) and across language boundaries both easily and intuitively.
Demonstrating and asserting the rather special requirements of academics, and promoting the potential of their contributions, is a very valuable project. It is necessarily international: a lack of this dimension is a major weakness of some current and recent digitisation projects. Rights are a central issue in this regard. It is hard to contemplate a satisfactory network of scholarly communications which is limited to a single rights regime. International, educational rights and their protection would have to be an element for investigation. We hope that Spoken Word has made a worthwhile contribution to the development of such an international network, and will continue to do so.