The Second Digital Repositories Programme Meeting

Julie Allinson and Mahendra Mahey report on a two-day JISC Digital Repositories Programme meeting, held in Warwick, UK, on 27-28 March 2006, which focused on project clusters working together and related issues.

The JISC (Joint Information Systems Committee) Digital Repositories Programme [1] held its second Programme meeting towards the end of March. Following in the collaborative tradition set by last October's joint Programme meeting with the Digital Preservation and Asset Management Programme [2], this gathering was themed around the cluster groups established by the Digital Repositories Programme [3] and included many guests from other JISC areas of work and beyond. These clusters seek to encompass many of the diverse issues being considered across the Digital Repositories Programme, including the different repository types (e-Learning and Scientific data), the infrastructural and technical issues (Integrating infrastructure and Machine services) and the social, cultural and legal topics (Legal and policy, Personal resource management strategies and Preservation).

Invited guests included members of the Repositories and Preservation Advisory Group, whose formal meeting followed on the Tuesday afternoon, representatives from other JISC projects, such as those falling within the Digital Preservation and Asset Management Programme, and external invitees from the Wellcome Trust and Key Perspectives Ltd. [4]

Informal Meetings

On the Monday morning, some time was made available for an informal thematic meeting on personal resource management strategies - a topic that cuts across many of the projects, irrespective of domain and cluster.

Personal Resource Management Strategies

This session [5] looked at harnessing, supporting and integrating personal resource management strategies within repositories. Sarah Currier introduced the session and the theme, which is relevant to her project, Community Dimensions of Learning Object Repositories. Richard Green then gave a presentation on some of the outcomes of the RepoMMan (Repository Metadata and Management) project [6]. RepoMMan are building a workflow tool on top of FEDORA (Flexible Extensible Digital Object Repository Architecture) [7] to enable users to use a repository for the entire lifecycle of a document, from inception to publication. Richard talked about their online user requirements survey, which has had some interesting findings. These suggested that digital repositories were seen as a welcome addition and could contribute to personal resource management, particularly for users whose current strategies are haphazard.

Chris Pegler then talked about the PROWE (Personal Repositories Online Wiki Environment) [8] project, which is looking at the use of wikis and informal mechanisms for part-time, distributed lecturers. The project is a partnership between the Open University and the University of Leicester, both of which have large distance learning programmes. Again, personal resource management plays an important role, as lecturers might be employed at multiple institutions, in diverse locations and on a part-time basis.

Discussion at the end of this session acknowledged that there was a wide range of issues here, including version control, workflow, the use of file formats, the role of communities and the use of formal and informal approaches.

Plenary Presentation: Monday

Rachel Bruce, JISC Programme Director for the Information Environment, gave the keynote address on Monday, focusing on current JISC capital spending on repositories. Rachel outlined how the £14 million would be spent across different workpackages relating to repositories. Calls for proposals will be issued in April and September 2006 and April 2007. The overall aim is to improve the digital content infrastructure through curation, interoperability and discovery. Funding areas include local repository creation and innovation and, at a national level, improvements to interoperability, infrastructure and shared services, together with building repository knowledge and skills through a network of repository experts.

Plenary Presentation: Tuesday

Brian Kelly, from UKOLN, gave a talk on 'Standards For JISC's Digital Repositories Programme' [9]. Brian looked at the approach to the use of standards developed for JISC's development programmes that focuses on the use of open standards wherever possible. Brian also gave an overview of standards for the digital repositories programme, looking at some specific areas of interest: standards for metadata, harvesting and identifiers. In addition, Brian gave some suggestions for how feedback might be gathered.

Cluster Sessions

The remainder of the meeting was spent in parallel cluster sessions.

E-Learning Cluster Session

This session [10] was chaired by Amber Thomas from JISC and was attended by CD-LOR (Community Dimension of Learning Object Repositories) [11], PROWE, Rights and Rewards [12], WM-Share (West Midlands - Share) [13], JORUM [14] and TrustDR (Trust in Digital Repositories) [15]. Each project gave a short presentation, focussing on one of two topics: 'Communities - finding or building them?' or 'IPR in Elearning repositories'. In addition, Phil Barker of CETIS (Centre for Educational Technology Interoperability Standards) [16] gave a briefing on OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting). Phil outlined the background of OAI-PMH, which emerged from the world of sharing e-prints through the web. He explained how the protocol works and how it relates to metadata harvesting tools, and showed practical examples of how its requests work over a browser interface or with other systems.
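
As an illustration of the kind of browser-based requests Phil demonstrated, the following Python sketch builds OAI-PMH request URLs. The repository base URL is a hypothetical placeholder; the six verbs and the arguments used here (metadataPrefix, from) come from the OAI-PMH 2.0 specification, and oai_dc (unqualified Dublin Core) is the metadata format every compliant repository must support.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.ac.uk/oai"

# The six verbs defined by OAI-PMH 2.0.
VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(verb, **args):
    """Build an OAI-PMH request URL that can be pasted into a browser.

    Keyword arguments become query parameters; 'from_' is mapped to the
    protocol's 'from' argument because 'from' is a Python keyword.
    """
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for key, value in args.items():
        params["from" if key == "from_" else key] = value
    return f"{BASE_URL}?{urlencode(params)}"


# Ask for Dublin Core records added or changed since 1 March 2006.
print(oai_request_url("ListRecords", metadataPrefix="oai_dc", from_="2006-03-01"))
# -> https://repository.example.ac.uk/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2006-03-01
```

A repository answering such a request returns XML; when a result list is long, a harvester repeats the request using the resumptionToken argument supplied in the previous response until the list is complete.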

Communities: Finding or Building Them?

Discussion here covered the usefulness of community as a notion, and whether it should encompass set groups such as teaching teams. It was suggested that there can be dysfunctional communities, and some negative experiences of communities were discussed. There is evidence from JORUM and other repositories that users do value the community concept and that there is an incentive to contribute to and use it. All the projects noted that people are concerned about 'making things visible', the notion of 'reification'. It seemed that all the projects conceptualised groups of users, rather than isolated individuals, as being key, but whether the groups are tightly defined enough or voluntary enough to be 'communities' was a point for discussion.

Intellectual Property Rights (IPR) in Elearning Repositories

Discussion here covered widespread common practices that are contrary to best practice. They may arise from lack of awareness but there is also calculated risk-taking. Once the content is more widely available, the risk of the IPR owner finding out about infringements is much greater, so the risk assessment needs reviewing. Effective notice and takedown procedures are crucial. It was suggested that attributing third party copyright is good practice and may actually reduce the risk of the IPR owner taking action.

At the end of the session, some actions and suggestions for future work were identified, including using the Digital Repositories Programme wiki in various ways to share and collaborate.

Integrating Infrastructure Cluster Session

This session [17] was chaired by Neil Jacobs from JISC and attended by a diverse range of projects, including IRI Scotland (Institutional Repository Infrastructure for Scotland) [18], SHERPA (Securing a Hybrid Environment for Research Preservation and Access) Plus [19], PerX (Pilot Engineering Repository XSearch) [20], MIDESS (Management of Images in a Distributed Environment with Shared Services) [21], Repository Bridge [22], Community Eprints [23], IESR (Information Environment Service Registry) [24], IEMSR (Information Environment Metadata Schema Registry) [25], SPIRE (Secure Personal Institutional and Inter-Institutional Repository Environment) [26], Versions (Versions of Eprints - user Requirements Study and Investigation Of the Need for Standards) [27], Geo-X-Walk [28], STARGATE (Static Repository Gateway and Toolkit) and the JISC Linking UK Repositories Study [29]. Prior to the meeting, projects in this cluster were asked to submit summaries and identify questions and/or topic areas for discussion.

To open the meeting, Neil Jacobs gave a brief outline of the forthcoming JISC funding for repositories before introducing Alma Swan, from Key Perspectives Ltd [30], to talk about the Linking Repositories scoping study. This study is being carried out by Key Perspectives Ltd and the University of Hull, with significant input from SHERPA (Securing a Hybrid Environment for Research Preservation and Access) [31] and the University of Southampton. The final report is due at the end of April 2006 and its purpose is to scope technical and organisational models for establishing a national repository services infrastructure. The study is looking at user requirements, the roles and responsibilities of repositories and services, technical architecture and business models for offering a viable and sustainable infrastructure. Alma talked about findings to date, which identify a wide range of issues. In conclusion, Alma stressed that communication is critical to the development of good, viable, sustainable services.

Following on from Alma's presentation, there was discussion of the need to understand the repository landscape, including the role and interrelationship of informal (such as P2P) and formal repositories, where function and context define the standards and technologies used. The group also discussed the potential for repository coalition(s), which were seen as having roles in helping to clarify confusion over the landscape, supporting repository development, providing a collective voice and drawing together the resources that already exist. These coalition(s) would naturally grow out of communities, which are created for a range of purposes. Understanding these communities and their needs would help define repository services, which might need to be specific repository-level services or aggregated services provided at a higher level. Effecting culture change and changing the notion of publishing were seen as beneficial outcomes of creating a better understood, sustainable and integrated repository landscape.

Scientific Data Cluster Session

This session was chaired by Rachel Bruce from JISC. Attendees at this session [32] included eBank [33], GRADE (Scoping a Geospatial Repository for Academic Deposit and Extraction) [34], R4L (Repository for the Laboratory) [35], StORe (Source-to-Output Repositories) [36], SPECTRa (Submission, Preservation and Exposure of Chemistry Teaching and Research Data) [37], User Needs and Potential Users of Public Repositories [38], CLADDIER (Citation, Location, And Deposition in Discipline and Institutional Repositories) [39] and RepoMMan. Each project discussed issues around the themes of 'identifiers', 'metadata' and 'digital rights management'. The session also included a presentation from Robert Terry of the Wellcome Institute on UK PubMed Central [40].


There was general agreement that it is difficult enough to agree on unique identifiers for formal publications, and that the difficulties are even greater given the range of projects present and the different data types to identify, e.g. digital objects, data streams and information packages.

The CLADDIER project felt there was too much diversity to standardise on identifiers, so each archive in the CLADDIER project is using its own identifiers in order to maintain autonomy. eBank overlaps with R4L and is using DOIs (Digital Object Identifiers) [41] to identify datasets. The current cost is 12 euro cents to register each identifier and 2 cents per year to maintain it. eBank stated that 'resolution' (the returning of metadata about the object rather than a copy of it) is a big issue around identifiers, and reported that the DOI database does not cover as much metadata as other identifier systems, so it may be of limited future use for identifying datasets. It was stressed that there needs to be discussion about how to create stable URIs (Uniform Resource Identifiers), as this is a major issue for eBank and hinges largely on how long the URI provider remains in existence.

GRADE are partners with CLADDIER, who use the Handle System [42], and the project is looking at the use and re-use of geospatial data from formal to informal contexts, including Digital Rights Management (DRM). One issue emerging for the project is when to identify a particular dataset. Probity (integrity) is therefore an interesting issue in this area and is also related to the whole issue of digital rights.

StORe reported that their project is looking at potential relationships between source and output repositories, so that it is possible to look in both directions, from a published paper to its source data and back. Currently, there is no evidence to support how a research article based on data got its results, as there is no link back to the data on which the paper was based. However, as long as the underlying data are also stored in a repository, an identifier in the research article could be used to make this link. StORe would therefore be interested in the types of identifiers projects are using to reference articles as well as data.
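
The two-way link StORe describes can be sketched as a simple index that records each article-to-dataset relationship in both directions. This is a minimal in-memory illustration under stated assumptions: every article and dataset is assumed to already carry a persistent identifier, the identifier strings are made-up placeholders, and the LinkIndex class is hypothetical rather than part of any StORe software.

```python
from collections import defaultdict


class LinkIndex:
    """Hypothetical two-way index between output (article) and source (dataset) identifiers."""

    def __init__(self):
        self._sources_of = defaultdict(set)  # article id -> dataset ids
        self._outputs_of = defaultdict(set)  # dataset id -> article ids

    def link(self, article_id, dataset_id):
        # Record the relationship in both directions at once,
        # so neither side can drift out of step with the other.
        self._sources_of[article_id].add(dataset_id)
        self._outputs_of[dataset_id].add(article_id)

    def sources(self, article_id):
        """Datasets a published paper was based on (empty list if none recorded)."""
        return sorted(self._sources_of.get(article_id, ()))

    def outputs(self, dataset_id):
        """Papers recorded as using a given dataset (empty list if none recorded)."""
        return sorted(self._outputs_of.get(dataset_id, ()))


index = LinkIndex()
index.link("article:example-1", "dataset:example-A")  # placeholder identifiers
index.link("article:example-2", "dataset:example-A")
print(index.sources("article:example-1"))  # ['dataset:example-A']
print(index.outputs("dataset:example-A"))  # ['article:example-1', 'article:example-2']
```

In practice the links would live in repository metadata rather than in memory; the point of the sketch is only that navigating "in both directions" requires the relationship to be recorded against identifiers on both sides.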

SPECTRa pointed out that cost becomes a considerable problem with identifiers, especially when data are handled on a large scale and thousands of datasets might need identifiers. eBank pointed out that one way of approaching large numbers of datasets is to merge several into one report and then register that report as a dataset. SPECTRa also highlighted the need to discuss versioning of data, where changes make a dataset unique, so that it is no longer simply another version of the dataset but rather a completely new one.
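
The scale problem SPECTRa raised, and the effect of eBank's suggestion of registering merged reports, can be made concrete with simple arithmetic: for n identifiers kept for y years, the cost is n × (12 + 2y) cents. The sketch below uses the per-identifier costs reported in the session (the 2 cents maintenance fee is assumed to be in euro cents); the number of datasets, the ten-year horizon and the grouping of 100 datasets per report are hypothetical figures chosen only for illustration.

```python
REGISTRATION_CENTS = 12          # per identifier, as reported in the session (2006)
MAINTENANCE_CENTS_PER_YEAR = 2   # per identifier per year; assumed to be euro cents


def identifier_cost_cents(n_identifiers, years):
    """Total cost in euro cents to register n identifiers and maintain them for `years`."""
    return n_identifiers * (REGISTRATION_CENTS + MAINTENANCE_CENTS_PER_YEAR * years)


def reports_needed(n_datasets, datasets_per_report):
    # Ceiling division: a partly filled report still needs its own identifier.
    return -(-n_datasets // datasets_per_report)


n_datasets, years = 5000, 10  # hypothetical scale and time horizon
individual = identifier_cost_cents(n_datasets, years)
merged = identifier_cost_cents(reports_needed(n_datasets, 100), years)
print(f"one identifier per dataset: EUR {individual / 100:.2f}")      # EUR 1600.00
print(f"one identifier per 100-dataset report: EUR {merged / 100:.2f}")  # EUR 16.00
```

Working in integer cents avoids floating-point rounding in the totals. Merging reduces the bill in proportion to the grouping, at the cost of identifying the merged report rather than each individual dataset.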

The group felt that there is a potential role for funding bodies to make money available for identifier registration, to ensure that identifiers are used to link to available datasets.

Robert Terry from UK PubMed Central pointed out that the ADC (Astronomical Data Center) [43] has quite a strict policy on the quality of datasets it accepts, and could therefore be a good model for projects deciding when a dataset should be assigned an identifier. He made the point that a line needs to be drawn between data that need global identifiers and data for which local ones suffice. He also mentioned that no one had thought of the additional cost of keeping clinical trial data on computers for ten years, and that curation issues need to be discussed in relation to all datasets. Finally, there is no reward system for making data available, and there are significant differences in how data are shared and managed across disciplines. Planetary research is a good example: it can take 25 years to develop the probes that collect the data, the team that runs an instrument generally gets exclusive use of its data, and only after an embargo period do the data go into a public repository for other researchers to use.

Rights Issues

The data cluster then discussed rights issues:

The GRADE project has identified that, in the UK geospatial community, copyright is retained even over re-mapping. The team is therefore currently working out which area of law this comes under, particularly for Crown Copyright, which is concerned with paper. Currently some projects cannot share their data because they are based on OS (Ordnance Survey) base maps. The Office of Public Sector Information is looking to change its position on the availability of the data.

CLADDIER is likely to have to look at rights issues of data at some point in time but has no specific plans to do so at present.

eBank has had to think about a rights model, as making data public negates IPR and patent claims. Rights can be relatively simple, as they usually reside with the institution that created the data, and making the data publicly available is therefore that institution's responsibility. One promising area is that of an 'embargo period', where work is made freely available after a fixed period of time specified by the depositor when the work is first deposited in the repository. This could serve as a model for other academic disciplines where this might be an issue.
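
The depositor-specified embargo described above amounts to recording a release date at the time of deposit. The following minimal sketch computes when an embargoed item would become public; the function names and dates are hypothetical, a five-year period is used purely as an example, and moving 29 February to 28 February in non-leap years is an assumed convention rather than one taken from any repository's policy.

```python
from datetime import date


def release_date(deposited, embargo_years):
    """Date on which a deposit becomes public, given an embargo set at deposit time.

    Assumed convention: a deposit made on 29 February is released on
    28 February if the release year is not a leap year.
    """
    try:
        return deposited.replace(year=deposited.year + embargo_years)
    except ValueError:
        # 29 February does not exist in the target year.
        return deposited.replace(year=deposited.year + embargo_years, day=28)


def is_public(deposited, embargo_years, today):
    """True once the embargo period chosen by the depositor has elapsed."""
    return today >= release_date(deposited, embargo_years)


deposit = date(2006, 3, 28)
print(release_date(deposit, 5))                       # 2011-03-28
print(is_public(deposit, 5, today=date(2010, 1, 1)))  # False
```

Because the release date is fixed when the item is deposited, the repository can enforce the embargo automatically without revisiting each deposit.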

StORe is not having to find answers on rights, because repository owners (usually the institution) make their own rules.

The SPECTRa project reported that most chemists are worried about other researchers using their material before it appears in a peer-reviewed publication, and quite often they would like a five-year embargo on their data. The open access principle had to be compromised to get chemists on board. There is therefore an argument that this issue can be resolved by an agreement for chemists not to publish articles using the source data. SPECTRa posed the interesting question of whether it is worth challenging these assumptions to make data more widely available, as academics would be prepared at least to submit their articles for publication, if not the associated data.

UK PubMed Central (Robert Terry) and the Wellcome Trust provide funding to cover the costs of publishing for the top 30 universities in the UK. They felt there was a need to make it as easy as possible for researchers to publish, and that doing so would create a critical mass of people which would in turn influence policy. Robert stated that the service was starting to build a significant international base of published data that could get to the heart of difficulties around standard ways of publishing for the academic community.


Most projects use standards for metadata that have already been developed due to the need for interoperability.

Further Actions

A number of future actions were agreed by this cluster and can be viewed on the wiki [32].

Legal and Policy Issues Cluster Session


This session [44] was chaired by Naomi Korn and attended by TrustDR, GRADE, Rights & Rewards, VERSIONS and EThOS (Electronic Theses Online Service) [45]. To start the session, each project gave a short presentation about its work. Following this, Naomi Korn and Charles Oppenheim provided an update from the JISC IPR consultancy, a two-year consultancy running until 2007, which is internal to JISC, supports projects and programmes, and produces reports and newsletters for JISC staff. It differs from JISC which is a front end advisory service that is proactive and reactive to user queries. There are three consultants: Naomi Korn, Charles Oppenheim and Sol Picciotto. The session then divided into small groups to discuss the following statement: 'Enforcing contracts is not always possible, so the use, access and contribution of material to repositories will often require varying levels of trust and other methods'. Key issues reported back to the other cluster groups included agreement that commonalities exist between projects, although the projects are also diverse and are taking different approaches. Different communities and media types affect the way rights are approached, but diversity of approach may be a good thing. By talking to each other, communities can address overlaps and differences and find solutions. The group agreed that the cluster was a useful forum and should move forward by inviting new projects, holding regular meetings and sharing resources and information via the Programme wiki [44].

Repositories and Preservation Cluster Session

This session was chaired by Steve Hitchcock from the University of Southampton. Attendees at this session [46] included Sherpa DP [47], Sherpa Plus, PARADIGM (Personal Archives Accessible in Digital Media) [48], PRESERV (PReservation Eprint SERVices) [49], Repository Bridge, DCC (Digital Curation Centre) [50] and MIDESS. Steve Hitchcock introduced the meeting and its aim to enable collaboration between projects. Thinking about new areas and projects for future JISC funding was also encouraged. Helen Hockx-Yu then gave a recap of the 12 workpackages for forthcoming JISC funding, outlining some of the preservation elements. There was some discussion at this point about support for the uptake of XML (Extensible Markup Language).

Steve Hitchcock opened the discussion session with a short presentation on the content-service provider model and talked about projects that are developing practical applications based on this model. Referring to the OAIS (Open Archival Information System) [51], Steve noted that repositories are well advanced in fulfilling data management functions, but preservation elements are lacking. Existing expertise, such as that brought together by the cluster, can be used to plug this gap. Discussion kicked off with some debate around the definition of curation and preservation. Other broad areas of discussion covered lifecycle and advocacy and issues relating to embedding preservation practices into the workflow at the point of creation, rather than attempting to retrospectively preserve. The importance of metadata in this process should not be underestimated. IPR and DRM issues also surfaced in collecting and preserving material, particularly where transfer occurs. There was some debate about assessment factors and risk, with general agreement that preservation risks should not be over-stated or used as a 'scare tactic'. It is better to look to the shorter term and at specific needs, such as specifying standards for preserving multimedia and scientific data. Trust and certification were also discussed, with some concern expressed over developing a high-level certification process which could act as a barrier to repositories taking preservation on board.

Ideas for further work included sharing knowledge and tools through exchanges of code or expertise, DCC-organised events, using the Programme wiki to share and discuss, holding further focussed meetings, ensuring skills are widely used and exploring lightweight approaches to certification.

Machine services

This session was chaired by Philip Vaughan from JISC. Attendees at this session [52] included IESR, IEMSR, RepoMMan, GeoXWalk, HILT (High-Level Thesaurus) [53] and ASK (Accessing and Storing Knowledge) [54].

There was an initial discussion of what machine and shared services are, and a list of characteristics was drawn up:

After this, IESR, IEMSR, GeoXWalk and HILT gave presentations which are available on the programme wiki [52]. Each presentation was followed by a question/answer session.

Following this, the JISC Programme projects present gave a brief overview of how their projects are using (or could use) machine-to-machine services, and how they could link to and use some of the services presented earlier.

The following issues and ideas were agreed and will be taken forward:


Thanks to Amber Thomas, Sarah Currier and Brian Kelly.


Each session reported back to the whole meeting, presenting some of the issues discussed and ideas for further work. Overall, the Second Digital Repositories meeting provided an opportunity for projects within the Programme to meet, network, share experiences, concerns and issues, and identify ways of working together within their cluster groups. By inviting projects from across JISC and beyond, the scope of activity and collaboration was further widened to ensure that repository development is embedded within and across JISC activities over the coming years.


