Open Repositories 2010
The air temperature in Madrid was around 37°C when the Edinburgh contingent arrived in mid-afternoon on 5 July. The excellent air-conditioned Metro took us all the way into town (about 14km) for only 2 Euros. We were told later that the temperature during the preceding week had been about 21°C, but by the end of the conference week we were enjoying 39°C. The conference venue turned out to be opposite the Santiago Bernabéu stadium (home of Real Madrid), on the Paseo de la Castellana. Because of the World Cup and Spain's steady progress through the upper reaches of the tournament, football was never far away.
This event report looks briefly at a small cross-section of presentations that gave clues about likely future repository development. Because the conference ran two main streams, it was not possible to attend every session, so this is a selective review, with apologies for omissions. Overall there was a change of emphasis this year, towards research management rather than open access; indeed, numerous presentations dealt with research management solutions. Organisations have not given up on open access, but they increasingly see populating the repository as a spin-off from a more complete research management solution, and regard this as a more fruitful approach.
One of the principal sponsors of the event was Microsoft Research. Three years ago at the Open Repositories 2007 conference in San Antonio, Texas, Tony Hey talked about the potential for a contribution to the repository world from Microsoft. It has taken a little time, but now things are starting to move, and it seems likely that Microsoft will be a significant player in providing technical solutions where they are possible.
As well as the three stalwarts, DSpace, EPrints and Fedora, several newer players were represented, such as Equella and PubMan.
The conference programme was as follows:
- 6-7 July 2010: The general sessions.
- 8-9 July 2010: Specific user group sessions for DSpace, EPrints and Fedora.
Pre-prints and the Origin of the Web
One of the most interesting presentations at the event was given on Tuesday 6 July by Salvatore Mele of CERN: 'INSPIRE: A new information system for High-Energy Physics'. He told us that analysis of clickstreams in the leading digital library of the field shows that HEP (High-Energy Physics) scientists seldom read journals, preferring pre-prints instead. He pointed out that HEP has explored alternative communication strategies for decades, first through the mass mailing of paper copies of preliminary manuscripts, then through the creation of the first online repositories and digital libraries. He showed an intriguing photograph of the CERN library taken shortly before the birth of the Web, with shelves full of pre-print papers. One of the original motivations for creating the Web was to support the circulation of information about these papers, and eventually to make the papers themselves available. HEP is therefore uniquely placed to answer recurrent questions raised by current trends in scholarly communication.
The services INSPIRE currently offers include citation analysis, author disambiguation, harvesting of Open Access content, and strategic partnerships with other information providers and leading publishers in the field. INSPIRE is now designing and implementing new services, including tagging, automatic key-wording, crowd-sourcing of curation, automatic author disambiguation, widening the scope of the collections, and semantic analysis of the content.
IRs and Research Information Registries
Sally Rumsey, University of Oxford, in her paper 'Blurring the boundaries between an institutional repository and a research information registry: where's the join?' articulated the view that increasingly institutional repositories (IRs) are being considered as tools for research management as part of pan-institutional systems. This might be driven by the need for statutory reporting such as that required for the forthcoming UK REF (Research Excellence Framework). Such functionality generally requires integration with other management systems within Higher Education institutions. It is common to find that each research management system has been selected to serve a specific need within an organisational department. As a result, data are held in silos, are duplicated and can even be 'locked in' to those systems. Some institutions are addressing this problem by considering CRISs (Current Research Information Systems) or business intelligence systems.
The solution at Oxford is the development of a registry and tools to support research information management. Many of the motivations behind the repository are shared with those for research information management, and there is considerable overlap of design, data, services and stakeholder requirements. Considering these two areas together, along with other related digital repository services, can reveal new opportunities and efficiencies to the benefit of all stakeholders.
Deposit (Modus Operandi)
David Tarrant, University of Southampton, outlined the new DepositMO Project, which aims to develop an effective culture-change mechanism that will embed a deposit culture into the everyday work of researchers and lecturers. The project intends to extend the capabilities of repositories to exploit the familiar desktop and authoring environments of their users. The objective is to turn the repository into an invaluable extension of the researcher's desktop, in which depositing research outputs becomes an everyday activity. To maximise impact and benefit, the target desktop software suite is Microsoft Office, which is widely used across many disciplines. Targeting both EPrints and DSpace, and leveraging the SWORD (Simple Web-service Offering Repository Deposit) and OAI-ORE (Object Reuse and Exchange) protocols, DepositMO outputs will support a large number of organisations.
The ultimate goal is to change the Modus Operandi of researchers so that repository deposit becomes standard practice across a large number of disciplines using familiar desktop tools. Partners for this project include the University of Southampton, Microsoft Corporation, Macmillan Publishers Ltd, the Higher Education Academy, LLAS (the Subject Centre for Languages, Linguistics and Area Studies at Southampton), and the University of Edinburgh.
Author Disambiguation: Open Researcher and Contributor ID
Author disambiguation was a significant issue at this conference and was discussed in several presentations, but there was nothing really new. It is likely to be some time before a good solution to this problem is found. Simeon Warner, Cornell University, gave an interesting presentation on the ORCID Project, a collaborative effort by institutions and publishers to solve the names question. The Open Researcher and Contributor ID Project is an effort to create an open, independent registry of researcher and contributor identifiers. Its mission is to resolve systemic name ambiguity by assigning unique identifiers linkable to an individual's research output, to enhance the scientific discovery process, with the additional goal of improving the efficiency of funding and collaboration. Importantly, the founding parties include not only the leading commercial players (Thomson Reuters, Elsevier, Nature, etc.) but also open repository projects such as INSPIRE, SSRN (the Social Science Research Network) and arXiv. This is an unprecedented opportunity to address the name disambiguation problem in scholarly communication and to provide an effective substitute for many author identity-based services. The effort can only achieve its full potential if the academic community, primarily disciplinary and institutional repositories, is an active partner.
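At the time of the conference ORCID had not yet fixed an identifier format. The scheme it eventually adopted (16 characters, written in four hyphenated groups, with a final check character computed by ISO/IEC 7064 MOD 11-2) shows one practical side of "assigning unique identifiers": a check character lets software reject most mistyped identifiers before any lookup. The following is a minimal Python sketch of that check; the function names are hypothetical, and the example identifier 0000-0002-1825-0097 is the sample used in ORCID's own documentation.

```python
def orcid_check_character(base_digits: str) -> str:
    """ISO/IEC 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    # A result of 10 is written as "X".
    return "X" if result == 10 else str(result)


def is_valid_orcid(identifier: str) -> bool:
    """True if the identifier has 16 characters (ignoring hyphens) and a correct check character."""
    compact = identifier.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15].upper()


# Usage: a single mistyped final digit is caught.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

A check character does nothing to decide *which* person a name refers to; it only protects the identifier once one has been assigned, which is why the registry itself remains the hard part.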
Several presentations, covering different repository platforms, demonstrated the use of Solr as a separate indexing and search engine. For DSpace this work is being done by @mire and will be included in DSpace 1.7. Although this is partly a behind-the-scenes improvement, it will provide faceted searching for the first time.
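To make "faceted searching" concrete, here is a minimal sketch of the kind of request a repository front end might send to Solr's standard select handler: `facet=true` plus one `facet.field` parameter per field to be counted, with the counts coming back as a flat value/count list. The Solr core URL and the field names (`author`, `subject`) are hypothetical; DSpace's own Solr schema defines its own fields.

```python
from urllib.parse import urlencode


def build_facet_query(solr_core_url, query, facet_fields, rows=10):
    """Build a Solr select URL that also asks for facet counts on the given fields."""
    params = [
        ("q", query),
        ("rows", str(rows)),
        ("wt", "json"),           # ask for a JSON response
        ("facet", "true"),        # switch faceting on
        ("facet.mincount", "1"),  # omit facet values with zero hits
    ]
    # One facet.field parameter per field whose values should be counted.
    params += [("facet.field", field) for field in facet_fields]
    return solr_core_url.rstrip("/") + "/select?" + urlencode(params)


def facet_counts(response, field):
    """Turn Solr's flat [value, count, value, count, ...] JSON list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical core URL and field names, for illustration only.
example_url = build_facet_query("http://localhost:8983/solr/repository",
                                "title:repository", ["author", "subject"])
```

The user interface then shows each facet dictionary as a list of clickable refinements, and a click adds a filter query to the next request.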
There were two presentations on DSpace statistics: one on using Google Analytics (by Graham Triggs of OpenRepository) and one on the @mire statistics modules. There may have been a bit of behind-the-scenes competition going on here, as Graham was able to demonstrate that the same statistics could be produced from the free Google Analytics data. It might, however, be unwise to rush to judgement without further investigation.
Poster Sessions: Pecha Kucha
This was well organised this year, with a full-screen countdown. People have got the hang of the format now, and hardly anybody ran over the statutory minute. Some people in the community can turn in an excellent summary in the time available, so if more drama is wanted in future, a shorter slot might provide it.
Conference Dinner at La Casa de Mónico
This location is a favourite for weddings and other functions, located just a few kilometres outside Madrid. The evening of 7 July was the same evening as the World Cup semi-final between Spain and Germany - the organisers of the event caved in to the inevitable, and provided two large plasma screens on the presentation stage, which showed the match during the early part of the meal. We found ourselves at a table which was not at all interested in football, and we discussed other things.
Future Development of DSpace
The DSpace and Fedora user group sessions were packaged together as a DuraSpace stream with some shared sessions. It seemed as though most of the DuraSpace organisation (the result of the DSpace/Fedora organisational merger) was present at OR 2010. In a special presentation, the future development of DSpace was outlined in some detail. Essentially, future development of the DSpace platform will be focussed on including Fedora 'inside', meaning that the current storage-related code in DSpace will be replaced with Fedora. In this way, the perceived advantages of Fedora (a simple, flexible database for which easily customisable interfaces can be developed) will be available to DSpace developers. DSpace 1.7 is scheduled for release in December 2010.
The Developer Challenge ran in the 'Developer Lounge' (and an attached bar terrace) throughout the event. The task was to create a functioning repository user interface that presents a single metadata record with as many automatically created, useful links to related external content as possible (the task resembles an earlier Repository Fringe challenge). For the purposes of the challenge, the key terms were defined as follows:
- "functioning" in this sense means that mockups/screenshots are not sufficient – however a working prototype is OK
- "related" in this sense means that the external content is related to this particular metadata record in some way.
- "as many useful links" means that marks will be awarded for useful links, so an interface with fifty meaningless links does not beat one with three genuinely useful links!
- links must relate to content, not just to a system. So, for example, a link to the page at http://www.wikipedia.org is not legitimate, but a link to a specific page in Wikipedia could be. Only one link of each 'type' counts: for example, four links to URLs that reference 'topics' in a given system are fine but count as a single link for the challenge.
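Read together, these rules amount to a small scoring algorithm: drop links that are not judged useful or that point at a system rather than content, then count distinct link types. The sketch below is one hypothetical reading of the rules, not the judges' actual procedure; the `Link` type, the type labels and the "bare site root means a system link" heuristic are all assumptions.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Link:
    url: str
    link_type: str  # hypothetical label, e.g. "wikipedia-topic"
    useful: bool    # a judge's verdict; not something code can decide


def points_at_content(url):
    """Assumed heuristic: a bare site root such as http://www.wikipedia.org
    links to a system, not to content."""
    return urlparse(url).path not in ("", "/")


def challenge_score(links):
    """Count distinct link types among useful links that point at content."""
    counted = {link.link_type for link in links
               if link.useful and points_at_content(link.url)}
    return len(counted)


# Fifty meaningless links score nothing; three useful links of different types score 3.
print(challenge_score([Link(f"http://example.org/{i}", f"type-{i}", False) for i in range(50)]))  # 0
```

Counting a set of types rather than a list of links is what makes four links to Wikipedia topic pages worth one point.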
Richard Davis and Rory McNicholl from the University of London Computer Centre won the Challenge.
The SWORD workshop was held on the final afternoon of the event. It was well attended, not just by developers but by a significant number of managers. In our case, we already understood what SWORD did, but not much about how it did it, so it was interesting to hear the history of SWORD development and the details of the underlying AtomPub protocol. The uses of SWORD were demonstrated in a number of ways, including repository submission by email and via Facebook. SWORD is amazingly versatile, and the workshop was one of the highlights of the conference.
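As a rough illustration of what a SWORD deposit looks like on the wire, the following Python sketch builds (but does not send) a SWORD 1.3-style AtomPub POST of a zipped package to a collection URI. The collection URI, credentials and file name are hypothetical, and the `X-Packaging` value shown is the identifier SWORD 1.3 uses for a DSpace METS package; the formats a particular repository accepts should be checked against its service document.

```python
import base64
import urllib.request

# Packaging identifier for a DSpace METS SIP in SWORD 1.3 (assumed accepted
# by the target repository; check its service document).
METS_DSPACE_SIP = "http://purl.org/net/sword-types/METSDSpaceSIP"


def build_sword_deposit(collection_uri, package_bytes, filename,
                        packaging=METS_DSPACE_SIP,
                        username=None, password=None, on_behalf_of=None):
    """Build (but do not send) a SWORD 1.3-style AtomPub POST of a zip package."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": packaging,
    }
    if username is not None and password is not None:
        # HTTP Basic authentication, as commonly used by SWORD 1.x servers.
        credentials = f"{username}:{password}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(credentials).decode("ascii")
    if on_behalf_of:
        # Mediated deposit: the authenticated user deposits for someone else.
        headers["X-On-Behalf-Of"] = on_behalf_of
    return urllib.request.Request(collection_uri, data=package_bytes,
                                  headers=headers, method="POST")


# Hypothetical collection URI and credentials, for illustration only.
example = build_sword_deposit(
    "https://repository.example.ac.uk/sword/deposit/collection-1",
    b"PK\x03\x04", "article.zip", username="alice", password="secret")
# urllib.request.urlopen(example) would send it; on success the server
# replies with an Atom entry describing the new item.
```

Because a deposit is a single plain HTTP request, almost any client can issue one, which goes some way to explaining the variety of SWORD front ends shown at the workshop.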
Lee Dirks, Alex Wade and Oscar Naim of Microsoft Research also held a workshop on Friday 9 July, 'Tools for Repositories: Microsoft Research & the Scholarly Information Ecosystem', which was designed to 'provide a deep dive into several freely available tools from Microsoft External Research' and to demonstrate 'how these can help supplement and enhance current repository offerings.'
Some among us might have been expecting OR conferences to run out of steam after a few years, but instead they are developing. Where in previous years the presentations were often "this is our repository with our user interface customisations", this year they were more focussed on how to add value to the repository and how to integrate it with other software.
All presentations will be available from the Open Repositories web site in due course.
- ORCID http://orcid.securesites.net/
- Solr http://wiki.apache.org/solr/FAQ
- Details of the Developer Challenge can be found on the DevCSI blog
- Open Repositories 2010 http://or2010.fecyt.es/publico/Home/index.aspx