Handshake Session at International Repositories Infrastructure Workshop, Amsterdam
I was a participant in the repository handshake group discussions at the JISC-, SURF- and DRIVER-funded International Repositories Workshop in Amsterdam in March 2009. Motivation for deposit was widely discussed at the start of the day. It also became apparent after a few hours that the premise of the discussions, i.e. 'repository handshake', was not a universally clear concept. Some felt the term referred to technical protocols, and some to business processes. There was a general feeling that the term might be too broad to be helpful.
On the second day the group discussed and refined some previously provided use cases. A number of new use cases were then developed, and there was an attempt made to prioritise the resultant set. The group felt that the development of certain demonstrators would be helpful, and that this should ideally happen in the next six to twelve months. It was generally felt that any significant international infrastructural solutions would require funding and facilitation.
I attended this event with my colleagues Paul Walk and Michael Day of UKOLN. The purpose and objectives of the workshop were stated as follows:
'The focus of this two day workshop is limited to repositories of open access research papers in order to progress the most mature aspects of developing repository networks. However, it will be necessary to highlight links with other data/resource types and to reflect trends in research practice including international, collaborative, data-driven science.
The objectives of the workshop are:
- to review and accept the description of the current position as described in the briefing materials (see below)
- to come to a shared vision of an international repositories infrastructure or, at least, the infrastructure components that might best be developed internationally
- to identify the essential components of an international repositories infrastructure
- to review the approaches to sustainability, scalability and interoperability being taken by these components, bearing in mind the wider research infrastructure
- to agree ways to resolve any issues identified in (3) above, including areas where practical international collaboration would help
- to identify critical success factors in achieving the progress identified in (4) above, bearing in mind the current position
- to consider ways in which the progress might be coordinated and reviewed over time
We hope that we will be able to agree action plans that can be taken forward internationally.'
16 March 2009
The first thing to note was that the title of all the groups had changed between the pre-meeting wiki discussions and the event itself. The group I was interested in had changed from 'Deposit Workflows and Environments' to the more opaque-sounding, but roughly analogous 'Repository Handshake' group. I think it's fair to say that the premise of this session was confused from the outset and providing a report is bound to be subject to some retrospective rationalisation and reductionism.
Discussions on Definitions
Peter Burnhill from EDINA was facilitating the session and he began with an overview of some typical deposit scenarios.
The session started with a general discussion around motivations for deposit. One well-received point was that the way repositories fit into depositors' everyday workflows in their place of work matters over and above any repository-specific issues.
It eventually became clear after several hours that there was little clarity or common understanding about what was meant by the term 'repository handshake'. Requirements were felt to be essential to clarifying the debate. Simeon Warner of Cornell University suggested that motivation for deposit should be ruled out of scope given that plenty of examples existed already.
A key issue appeared to be whether we were talking about handshake protocols or policies, and views differed on which should be the subject of the session. There was considerable disagreement about 'what the problem is' – some thought protocols, others metadata, and others again policies – and the discussion became seriously fragmented for some time.
Paul Walk of UKOLN suggested there should be a distinction between handshakes for internal purposes within institutions and their wider usage, an approach that would help in deriving use cases. There was some consensus that the scope of our discussions should only cover obtaining resources from 'outside' the system, despite some members of the group wanting to focus on moving resources between repositories.
In the second session Peter suggested that we need to work out 'what is to be done' as Lenin would have put it. Simeon amplified the point that the term 'repository handshake' was not a clear one. I felt this helped to move the debate along, as I think many in the room had not been altogether clear in their minds as to the basis for the discussions taking place. For some, the term was very much about the protocol, for others more about the business processes, i.e. who does what and how in the handshake.
As I think Simeon rightly pointed out, the discussion was so general at this stage that we were making no real progress. He noted that repository handshakes have properties that are widely subject-specific, one example being in Physics, where academics do not cite page numbers.
17 March 2009
The second day started more promisingly with an attempt to get to work refining the four pre-prepared use cases. In the first use case 'wishes' was changed to 'obliged to'. The use case of depositing a link as opposed to the item itself was seen as being a key issue. This was puzzling to me, as Jorum has been doing this for many years.
The title of the second use case was changed from 'multiple deposit' to 'deposit', with multiple deposit deemed too wide-ranging. Simeon felt it was important that users should not have to retype entries or go anywhere near the 'horrible repository'. Issues surrounding the need for repositories to cater for academic recognition, and the difficulties occasioned by the lack of motivation to deposit, were widely discussed.
The third use case covered journal deposit. Tara Packer from the Nature Publishing Group suggested that publishers would want to get involved in this area, to make a significant contribution to the process and provide a service that made the publishers look good.
In the session break I had an interesting chat with Clifford Lynch. He said he was quite puzzled by the European focus on Open Access and funding mandates as the drivers for deposit, and the general desire for institutions to have their own IRs. He said that in the US the driver was essentially data management provided at a national level, with repositories being subject-based and operated in a relatively cost-effective manner. I also had an amusing conversation with Peter where the idea of a new repository verb 'BEG' was minted.
Emerging Use Cases
In the final session we turned to a few new use cases.
- Handshake 5 – Institution-assisted deposit by users of research project
- Handshake 6 – CV-driven bibliographies with links to papers
- Handshake 7 – External sources that assist in the deposit process
- Handshake 8 – Assisting researcher/author with publisher – discipline-specific
Peter then attempted to get us to vote on the eight handshakes and met with some hesitancy from the group.
It was felt that the development of some kind of 'product' or demonstrator was required within the next six to twelve months. Paul Walk suggested that some organisational facilitation would be needed to ensure things happen in terms of generating international solutions, such as providing international funding. Simeon suggested it would be very difficult to get funding from the US and that it's probably best to persuade existing funding bodies to put cash in the pot.
In the closing session all groups came back together. Peter Burnhill summarised the handshake session.
He suggested that our use cases could be nicknamed 'deposit opportunities'. The focus should be on: more deposits; making it easier to deposit; and making deposit more rewarding for the depositor. These were the key words for deposit – 'more', 'easier' and 'rewarding'. While I doubt anyone would disagree with anything in Peter's summary, I was not sure it could reflect the entirety of discussions across the group.
Peter suggested a four-phase plan. We may be able to achieve national leverage, but for truly international solutions we needed to get funders to engage with each other to manage this.
He went on to say that the group had outlined eight deposit opportunities with the suggestion that we focus on three of them in a six-month period, and that three or four exemplars should be highlighted in order to put some flesh on the bones. A two- to three-year horizon seemed more realistic for the wider opportunities to bear fruit.
The two main use cases that came out of the two days of discussion were:
- Multiple author deposit – institutionally motivated.
- Journal deposit from publisher as a service to their authors.
Input from Other Groups
There were then summaries from the other groups.
Andrew Treloar spoke for the interoperable identification infrastructure group. Some issues outlined were identifiers at the repository level, rules, approval mechanisms, temporal issues, issues around self-populating and depopulating repositories. Authorship was mentioned, and how we might want multiple personae for work and home use.
Les Carr summarised for the citation services session. The group looked at addressing issues surrounding global open access literature. He walked us through a citation model the group had developed. Les talked about repositories being able to hand on accurate references to other services. This would be achieved by: establishing a testbed of repositories; creating APIs for reference list extraction; developing a reference extractor plug-in; and an OAI-PMH schema for reference lists. There would need to be liaison with Crossref, university publishers, and so on. There was mention of the creation of reference processors, reference services and other exemplar 'whizzy' reference services, for example for network visualisation and trackback.
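As a rough illustration of the last item in that list, the sketch below builds the kind of OAI-PMH harvesting request a reference service might send to a repository. The `verb`, `metadataPrefix`, `from` and `set` request arguments are standard OAI-PMH; everything else is an assumption made for illustration. In particular, the `refs` metadata prefix, the example base URL and the function name are hypothetical, since the group proposed such a schema without defining one.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="refs",
                           from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    metadata_prefix="refs" is a hypothetical prefix standing in for the
    reference-list schema proposed at the workshop; it is not a real,
    registered OAI-PMH metadata format.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed on or after this date.
        params["from"] = from_date
    if set_spec:
        # Optional set membership filter.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # repository.example.org is a placeholder host, not a real service.
    print(build_list_records_url("https://repository.example.org/oai",
                                 from_date="2009-03-16"))
```

A repository that supported such a format would advertise it in its `ListMetadataFormats` response, so a harvester could check for it before asking for records.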
Finally, the international repositories organisation group's deliberations were summarised. There was talk of an activity plan, and that bottom-up funding for an organisation is required. Furthermore, a concept should be formed around stakeholder needs.
Clifford Lynch provided the closing note speech. He addressed what he saw as the cross-cutting themes and implicit assumptions that had appeared during the two days of the workshop.
His first observation was that, given the workshop was about repositories and infrastructure, we had asked how we join up international repositories to get international infrastructure, but hadn't asked the question 'to what end?' We had been wrestling with how repositories interacted with a whole set of other services, many of which were quite distinct. We also had identity management issues related to identification of the individual and the organisation to address. We also touched upon the complexities of coupling repositories with other services and their identity systems. The important question was: do these places live within the same access management spheres? If not, we had a problem.
Clifford felt there was a fascinating set of issues around the relationships between name management and name authority. We needed to think about how personal name authority fitted with identity management in the repository space. There were some registries in place, but they only represented the beginnings. We needed to codify policy in a more sophisticated way. We also had to include the needs of the deposit mandates of funders, for example, who funded the work, and who was the employer of the creator.
He noted there was some discussion about the interfaces we expected repositories to make available. It was clear that we needed a lot more than just OAI-PMH. We needed guidance on what range of interfaces repositories should provide. The ability to cross-deposit in an automated way was seen as very important, such as the idea of being able to upgrade materials or data in one repository with materials or data from another system, as might be the case if a publisher held better metadata.
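The 'upgrade' idea can be sketched in a few lines. The example below is a minimal illustration, not anything presented at the workshop: the field names, the `trusted_fields` parameter and the merge rule (fill gaps always, overwrite only where the incoming source is trusted) are all assumptions.

```python
def upgrade_metadata(local, incoming, trusted_fields=frozenset()):
    """Return a copy of `local` upgraded with values from `incoming`.

    Hypothetical merge rule, assumed for illustration:
    - empty incoming values are ignored;
    - an incoming value fills a field that is missing or empty locally;
    - an incoming value overwrites a local one only for trusted fields.
    """
    merged = dict(local)
    for field, value in incoming.items():
        if value in (None, "", []):
            continue  # nothing useful to copy across
        if field in trusted_fields or not merged.get(field):
            merged[field] = value
    return merged


if __name__ == "__main__":
    # Illustrative records only; the DOI is a made-up placeholder.
    repository_record = {"title": "A paper", "doi": "", "pages": None}
    publisher_record = {"title": "A Paper: Full Title",
                        "doi": "10.1000/example", "pages": ""}
    print(upgrade_metadata(repository_record, publisher_record,
                           trusted_fields={"title"}))
```

Keeping the rule explicit about which fields the other system may overwrite is one way to stop an automated exchange from silently degrading a good local record.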
We might want to be able to do a mass extraction or export from a repository. These mass moves were quite likely to trigger performance issues. We needed to be careful not to abuse the systems/networks in any syncing/matching exercises - efficiency was key here.
Clifford was concerned about the idea/wish for auto-replication of materials from other repositories, and suggested that we needed to think about duplication problems from the outset, otherwise we were in danger of replicating some of the classic mistakes of computing history. He mentioned that he and others at the event were frequently very frustrated by the restriction of the debate to traditional scholarly materials.
He noted that workflow was discussed at length, and he was confused about the lack of clarity around this concept. Scholarly workflow might be clear, but generally workflow was not. Where repositories were used in a more general context there was considerable likelihood of confusion over workflow notions.
The final issues covered were the need to change user behaviour in terms of authorial practices, and the role of the publisher. Clifford noted that it was very difficult to alter scholarly behaviour. The benefit of doing so had to be very clear to scholars if we were to have any hope of achieving this.
Clifford then mentioned the wish for structured references that supplement standard references with links and DOIs. Could we do this with repositories? If so, how cost-effective would it be and of what quality? Repositories and overlays, with the need to absorb some traditional publisher tasks into the repository space, might replace the traditional reference approach. Repositories would exist alongside traditional publication methods.
We needed to make our assumptions explicit to get our priorities clear. There was an ambiguity about the possible futures, and different people placed very different priorities on the range of possible services. Clifford thought repositories were coming of age, and we needed to move our thinking about repository infrastructure out of its closed space and into a wider context. The implication of this was that we had to situate the repository as just one of an overall set of services. Peter Burnhill asked about possible timescales and the priorities. Clifford replied we needed to get the identifiers solution right first, else we would encounter major problems down the line. The capacity to cross-deposit lent itself to prototypes, which also required careful examination very soon.
Generally speaking the workshop managed to go some way to achieving its objectives, and it does appear that the handshake group was one of the more difficult sessions from which to obtain clear and focused outputs. Much of the first day was spent orientating ourselves and trying to get a grip on the scope of the discussions. The second day did provide some refinement on existing use cases and some new ones that should be useful in informing the debate. Clifford Lynch, as usual, provided some astute comments and much-needed perspective to help place the discussions in a wider context. A wiki supporting the event's ongoing activities has now been made public, and I look forward to seeing some activity in terms of the development of some demonstrators, especially as my own SWORD Project is mentioned in phase 1 of the activities.
- UKOLN Events: International Repositories Workshop, March 2009: Home Page http://www.ukoln.ac.uk/events/ir-workshop-2009/
- In imitation of the widely circulated pamphlet by Lenin of the same name, published in March 1902 as part of his attempt to galvanise Russian Social Democrats into practical and pragmatic action.
- Repository Handshake use cases (Slideshare) http://www.slideshare.net/Earthowned/repository-handshake-use-cases
- International Repositories Infrastructure wiki http://repinf.pbwiki.com/
- SWORD Project http://www.swordapp.org/
- Repository Handshake – Extending An Open Hand http://repinf.pbwiki.com/Repository-handshake