This workshop was held at the University of Lancaster Centre for e-Science. The organisers were Rob Crouchley, Rob Allan and Caroline Ingram; there were 17 other attendees.
The main aim of this workshop was to explore the relationship between digital repositories, e-Research and Portals in the UK with a view to discovering e-infrastructure gaps and articulating requirements. The hosts had been commissioned by JISC to undertake the ITT: JISC Information Environment Portal activity - supporting the needs of e-Research.
The aims and objectives of the study were:
The workshop was recorded and will be made available as a resource on the Web for future reference (as has been done with previous workshops).
The event was a lively one with a wide range of talks. Topics included: Data Webs and repositories for subject-specific collections in Zoology and for Geospatial Information; Open Archival publication and Digital Curation; the need to provide a greater variety of Information Environment tools, inter-operating with existing ones such as the Information Environment Service Registry (IESR); and the need to link data with information and make both available in the 'Discovery to Delivery' cycle.
Most of the speakers are involved with JISC-funded projects, or have been at one time. We now provide a short summary of each presentation.
Simon Coles, National Crystallography Service, University of Southampton
Simon gave a wide-ranging introduction from the perspective of a practising chemist who is also the head of the UK Crystallography Service, run from the University of Southampton. He has been inspirational in the e-Bank and R4L projects, which are investigating tools to support the research life cycle and linking repositories directly into the laboratory for archival of data and associated metadata. A key message was that underlying data is not suited to the 'printed page', as attempts to represent it there result in a loss of information. Published papers should contain the interpretation and intellectual input, together with a link to the actual data for re-evaluation if necessary. Current tools to discover and interpret data are, however, barely adequate. Simon reported that Google copes quite well with the chemical identifiers (InChI) used for indexing purposes. Current interfaces to the UK Crystallography Service are included in portals such as Intute: Science, Engineering and Technology (formerly PSIGate), and OAI is used for the repositories.
Fred Friend, University College London
Fred continued the line of discussion started by Simon, but addressed the question of where the archives should be hosted. Several factors are involved, including: cultural/loyalty; political; convenience. It was suggested that there is no perfect solution and that multiple repository types would persist. There were questions such as "who holds my publications if I change institution?" and "how do we classify a facility provider such as CCLRC which fits neither the institution nor subject category?". In the end, users will probably choose either:
Repository providers will need to supply tools which enable cross searching.
Chris Awre, University of Hull
Chris outlined the history of portals as we now see them and some JISC-funded projects. There are many different portals for different purposes. Within an institution this has to be managed and a common interface provided for the users with single sign-on. Standards such as WSRP (Web Services for Remote Portlets) and JSR-168 facilitate this and are supported by many Java frameworks. uPortal is the most commonly used open-source institutional portal. Administration, library and teaching/learning functionalities are being brought together in the institutional portal. A portal can now be viewed as a thin layer that aggregates, integrates, personalises and presents information, transactions and applications to the user seamlessly and securely, according to their role, location and preferences, and in a manner independent of browser platform or device. Portals complement repositories by providing a user interface to the content.
Catherine Jones, CCLRC
Cathy outlined CCLRC's ePubs Project to develop an open archival repository for publications by its staff and facility users. Several interesting factors have emerged from this work. Take-up of ePubs is strongest in departments with an existing culture of publication and information collection. There can be a competitive element when staff see how many publications their peers are producing. Organisation of the content is also important. There needs to be a culture change in the deposit of publications for the success of ePubs. One possibility is keeping a version which can be deposited (e.g. a preprint). Cathy also mentioned some related work on the JISC-funded CLADDIER Project which is investigating linking data to publications, and on a digitisation programme for older technical reports going back to the early 1960s. This aims to capture much research-related technical information which would otherwise be hard to access.
Derek Sergeant, University of Leeds
Derek's presentation took us through a survey of user needs for the JISC-funded EVIE Project (Embedding a VRE in an Institutional Environment). This analysed the research life cycle from the perspective of several different research disciplines. The main findings of this study were that resource discovery was deemed the most essential function of the portal, while support for research outputs was deemed less so, and that users wanted to see all the databases and resources available for their subject, but through a single Google-style search box. However, no one size fits all users. The collaboration tools most often requested were a meeting organiser and the ability to share files.
Matthew Mascord, University of Oxford
Matthew presented subject-specific work to enhance the IB project through the provision of a portal. A 3-month research process analysis was carried out with some key users from the heart disease and cancer modelling communities. This identified tools to be provided in the portal, including a repository interface for studies and results of model runs, a collaboration tool for sharing and annotating moving images (Vannotea is being evaluated), and a management tool (Transparent Approach to Costing (TRAC) is being used). The use of a USB digital pen is also being investigated as a means of remotely sharing discussions of the underlying mathematics, particularly for the cancer models.
David Giaretta, Digital Curation Centre, CCLRC
David explained some work going on in the EU-funded CASPAR Project which is concerned with ingestion and publication of scientific data. A key part of this process is the 'representation information' which captures knowledge of data bit structure and any other information needed to interpret it in the future.
Rob Allan, CCLRC
Rob presented the work which had been done to date in the JISC ITT mentioned in the Introduction to this paper. A full final report and additional background information will be available soon. In addition to the issues identified by the earlier presenters, Rob mentioned the need for researchers to combine and cross-search data and information, possibly in a collaborative manner, as well as the need for both wide (e.g. Google) and deep (e.g. subject-based) discovery facilities.
Ann Apps, MIMAS, Manchester
Ann presented the current status and functionality of the IESR, which is an example of the components of the IE. It is a machine-to-machine registry providing: descriptions of collections of resources (e.g. census, e-learning resources); descriptions of services that make resources available; agents; transactional services, all of which are needed for resource discovery. A portal builder would not need to know about all the separate underlying services. A number of protocols are supported and integration with other services, such as UDDI and RSS, is being considered. The biggest problem at present seems to be take-up of the IESR and people need to be made aware of the advantages of using it.
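The registry idea described above can be sketched in a few lines. This is only an illustration of the general pattern (collection descriptions linked to descriptions of the services that expose them), not the IESR's actual data model or API. The class names, the protocol labels and the example.org endpoints are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Description of one access point to a collection (hypothetical model)."""
    name: str
    protocol: str   # illustrative label only, e.g. "OAI-PMH" or "Z39.50"
    endpoint: str   # placeholder URL, not a real service


@dataclass
class Collection:
    """Description of a collection and the services that make it available."""
    title: str
    subjects: set
    services: list = field(default_factory=list)


class Registry:
    """In-memory stand-in for a machine-to-machine service registry."""

    def __init__(self):
        self._collections = []

    def register(self, collection):
        self._collections.append(collection)

    def find_services(self, subject, protocol=None):
        """Return (collection title, service) pairs matching a subject,
        optionally restricted to one protocol."""
        hits = []
        for collection in self._collections:
            if subject not in collection.subjects:
                continue
            for service in collection.services:
                if protocol is None or service.protocol == protocol:
                    hits.append((collection.title, service))
        return hits
```

A portal builder using such a registry would ask for, say, all services on a given subject that speak a protocol the portal already supports, without needing prior knowledge of each underlying service.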
Ken Miller, UK Data Archive, Essex
Ken's talk also addressed the linking of data (source) to publications (research output) in the JISC/CURL-funded StORe Project. Ken reported on a detailed survey of researcher method and practice in the use and management of digital repositories, involving the research communities of 7 scientific disciplines. Implementations are now beginning to address issues of how researchers can deposit and link data and share it with known peers.
Peter Millington, University of Nottingham
Peter described a directory of open archive repositories, OpenDOAR, which includes institutional and subject repositories and funders' OA archives, but not OA journals. An aim is to provide access to authoritative, evaluated data. Two related projects, RoMEO and JULIET, are concerned respectively with publishers' policies (a prototype API is available) and with funders' policies on mandatory deposition of research results. Although the project is relatively new, interest is growing. Only 33% of OAI archives currently have policies published via OAI-PMH and more are being encouraged to submit them via suggested defaults. More machine-to-machine interfaces are being developed.
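For readers unfamiliar with OAI-PMH, requests are ordinary HTTP queries in which a `verb` parameter selects one of the six operations defined by the protocol. The following minimal sketch builds such request URLs. The helper name `oai_request_url` and the base URL are hypothetical placeholders, and the sketch does not show how any particular repository publishes its policies.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH GET request URL (hypothetical helper).

    base_url is the repository's OAI-PMH endpoint; params are the
    verb-specific arguments such as metadataPrefix or set.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    # Skip arguments left as None so optional parameters can be omitted.
    query.update({k: v for k, v in params.items() if v is not None})
    return f"{base_url}?{urlencode(query)}"


# Placeholder endpoint, used for illustration only.
BASE = "https://repository.example.org/oai"
identify_url = oai_request_url(BASE, "Identify")
records_url = oai_request_url(BASE, "ListRecords", metadataPrefix="oai_dc")
```

An `Identify` request returns a description of the repository itself, while `ListRecords` with the `oai_dc` prefix asks for records in unqualified Dublin Core, the one metadata format every OAI-PMH repository is required to support.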
David Shotton, University of Oxford
Zoologists are developing the BioImage database and finding ways to populate it. David described the difference between a database, in which data is confined, and a Data Web, in which metadata harvesting is used to discover data and information from independent sources. The BioImage Data Web registry provides interoperability, gathering, ordering and integrating the metadata from across the Web into a single searchable graph, then directs users to original sources. Primary data holders benefit by increased user traffic, but retain control locally. The project leverages many of the advantages of Web 2.0 and includes subject-specific semantic tagging.
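The distinction between a database and a Data Web can be illustrated with a toy registry that holds only harvested metadata and sends searchers back to the original holders. This is a sketch under simplifying assumptions: it uses a plain keyword index in place of the searchable graph described in the talk, and the class name, record fields and URLs are invented for illustration.

```python
class DataWebRegistry:
    """Toy metadata registry: stores pointers to data, never the data itself."""

    def __init__(self):
        # Maps a lower-cased keyword to the set of (source name, record URL)
        # pairs whose harvested metadata mentions it.
        self._index = {}

    def harvest(self, source_name, records):
        """Ingest metadata records from one independent source.

        Each record is assumed to be a dict with a 'url' pointing at the
        original item and a list of 'keywords' (hypothetical format).
        """
        for record in records:
            for keyword in record["keywords"]:
                entry = (source_name, record["url"])
                self._index.setdefault(keyword.lower(), set()).add(entry)

    def search(self, keyword):
        """Return sorted (source, url) pointers to the original holders."""
        return sorted(self._index.get(keyword.lower(), set()))
```

Because the registry keeps only metadata and links, each primary data holder continues to serve its own content and retains local control, which is the property highlighted in the talk.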
James Reid, EDINA
James looked at an infrastructure for improved geospatial data sharing, based on improving access via digital repositories. The work addresses two issues, a lack of willingness to share data and the difficulty of locating it, and explores mechanisms for both sharing and locating data. A demonstrator is currently linked into the LandMap and DigiMap services and has search, upload and download tools.
Caroline Ingram, CSI Consultancy
The second day ended with an active discussion session led by Caroline Ingram. The main research resource discovery issues drawn out from this meeting were:
It was stated that "It is better to address un-satisfied requirements now than to deploy new frameworks". But David Shotton reminded the group that a tool has to be fit for purpose, not shoehorned into place.
In conclusion, the workshop was extremely useful and interesting, not only as a contribution to the e-research and portals study, but also more generally for the community, in highlighting considerations for the future development of portals and digital repositories. Thanks again from the organisers to all participants, who contributed considerably to a successful couple of days' discussion.