The Digital Curation Centre (DCC) is staging a series of free regional data management roadshows to support institutional data management, planning and training. These events run over three days, presenting best practice and showcasing new tools and resources. Each day is designed for a different audience with complementary content so that participants can attend the days that best meet their needs. Presentations from both the second roadshow in Sheffield and the first one in Bath in November 2010 are on the DCC Web site.
The goal for Day 1 was to get us all up to speed with the nature, current challenges and existing good practice relating to research data. The format for achieving this was a substantial morning presentation followed by an afternoon of illustrative case studies.
Liz Lyon delivered a comprehensive overview which was both accessible for newcomers and thought-provoking for delegates with some familiarity with the topic. Clearly scale and volume are key features of research data, but the audience was stunned to hear that, in the context of all global digital information, the International Data Corporation (IDC) estimates a growth rate of 58% per year to reach 35 Zettabytes by 2020 (a Zettabyte being one million million Gigabytes). Complexity is also an issue, both in terms of the data themselves and the infrastructure workflows to process them across organisational, disciplinary and national boundaries in the context of Open Science and the Panton Principles. No wonder then that major funders like the Natural Environment Research Council (NERC) and the National Science Foundation (NSF) are formulating policies that make data management planning mandatory, and that institutions are beginning to respond to the challenge. Liz highlighted some useful exemplars at institutional level, referring to progress at Edinburgh, the JISC-funded Incremental Project and collaboration with North American partners to develop the DCC Data Management Planning (DMP) Online Tool. This is all happening in the context of freedom of information, citizen science and the ensuing ethics and privacy issues as the sharing culture progresses.
So what practical issues do data management policies need to address, and where are the gaps? The list addressed all stages of the lifecycle from storage, appraisal and selection, right through to licensing, sharing, attribution and citation. Solutions need to be secure, resilient, provide value for money, and be sustainable; so cloud services may be part of the way forward. Underlying the technical concerns though are two fundamental questions. Can we incentivise data management through recognition, impact metrics or other means? Secondly, how do we unravel the funding conundrum in relation to who owns, preserves, and benefits from the process? Required reading before starting to tackle this would seem to be the report from the Blue Ribbon Task Force on Sustainable Digital Preservation and Access, and the Keeping Research Data Safe materials.
The morning raised questions relating to the what and why aspects of research data management, so we were ready for an afternoon of case studies from Yorkshire and Manchester looking at how to tackle it in practice.
MaDAM, based at the University of Manchester, is a project in the JISC Managing Research Data Programme. The project team has examined the research data management requirements of selected biomedical researchers and is developing a pilot infrastructure capable of upscaling to a university-wide service embedded in the normal research lifecycle. The researchers wanted a centralised storage system with automatic back-up that could make data searchable, retrievable and above all shareable. Meik showed screenshots of how the interface had evolved, so that the prototype was now ready to move to a production service. He emphasised the importance of a design that would in future be hospitable to a range of subjects and data types; for example, offering automatic thumbnails for images. The end product will provide the facility to archive complete projects with links to their published outputs.
NeISS is a three-year project in the JISC Information Environment Programme with Leeds as the lead partner. Mark gave a clear explanation of social simulation and its relevance to a wide range of social science areas. Critically, the data processed are of value beyond the boundaries of academic research and into the field of policy development. The project is therefore a key example of integrating data from varied sources and processing them for subsequent sharing and reuse. It also encompasses making the data and their underlying methodology accessible to and usable by non-specialists.
Mark showed a diagram of the architecture through which they envisage linking together the varied datasets and models into a collection of portal services in order to make outputs visible and push them out to different types of user. He illustrated this approach in more detail with a series of video clips developed to instruct users. Work in progress includes further investigation of 'data-crunching' using the National Grid Service, and curation of the outputs, given their resource-intensive production process.
York Digital Library (YODL) is a multimedia repository at York, and is part of a group of related services including the local Virtual Learning Environment (VLE) together with the White Rose Research Online and White Rose eTheses Online services shared with Leeds and Sheffield. YODL's remit is not limited exclusively to research, a fact reflected in the recently issued policy and guidelines, which relate to 'University produced digital content relating to teaching, learning and research' and which resonate with the growing emphasis on research-led teaching and the use of research data by students at undergraduate level.
Matthew explained that YODL had worked predominantly with Humanities departments and therefore had processed mostly audio, image and video data. The aim in principle was to be able to handle a wide range of data, while balancing the needs of complex and simpler structures. The repository is based on Fedora software for its flexibility, but time, effort and expertise are needed to benefit from this flexibility; hence projects have deliberately costed in dedicated developers on fixed-term contracts.
YODL was also perceived as flexible in terms of offering support at four different levels:
DMTpsych is a JISC-funded Research Data Management (RDM) Training Materials project aiming to deliver a workbook and slides to support lectures for postgraduate research students. The Psychology area spans a wide range of complex and voluminous data sources including interviews, statistics and MRI scans, and is supported by an equally wide range of funders with varying data management requirements. Richard emphasised the need to make the training delivery engaging; the video clip of a fire complete with sound effects certainly caught the audience's attention. Somewhat paradoxically, it seemed the students preferred printed rather than online material to support the activities, though they also wanted more examples for cutting and pasting into data management plans (DMPs).
Richard outlined the lecture content, noting that the students liked an approach which became gradually more specific as well as the opportunity to follow up lectures in smaller work groups. In the longer term, the project intends to increase research data awareness in the psychology community and collaborate further, for example, with the British Psychological Society.
Though also focussed on training, the DataTrain case study raised some different issues. Cultural divergence in the attitudes of the two subject areas of Archaeology and Social Anthropology had been observed; even within Archaeology there was a need to distinguish between requirements in the academic and professional fields. There was also some similarity with the Psychology case study (DMTpsych), in that these disciplines use a very wide range of data collection techniques and have to meet diverse funder requirements.
Again the plan is to have a series of lectures in order to cover the topic adequately and demystify it effectively without the use of jargon. As with other initiatives in information literacy, engagement from academics to get the material embedded in courses emerged as critical. However, there was no conclusion as to whether in the long run this might extend to delivery by academics rather than by central services. The project has now reached the pilot stage, with real students in Archaeology and Social Anthropology at the University of Cambridge engaging with the material. It will be interesting to see how their feedback influences further development. How can we make research data management training both useful and interesting?
Kevin outlined the support that the DCC can offer in terms of the DCC Curation Lifecycle Model, tools, guidance materials and further events to share good practice. He stressed that the approach was collaborative rather than prescriptive, and that the DCC could provide support to underpin local work such as possible cloud services, and synthesising information on cost benefits. He also offered post-event follow-up to discuss local issues relating to the needs that participants had identified at this roadshow.
Day 2 was aimed at those in senior management roles and looked at strategic and policy implementation objectives. The format was a mixture of presentations from Liz Lyon plus group work exercises and discussion facilitated by her. Marieke Guy's write-up of Day 2 at the Bath Roadshow, which followed a similar pattern, complements the report below. The DCC has also undertaken to collate the results of the group work and make them available.
Liz gave an overview of the challenges, highlighting both institutional and disciplinary diversity. She posed questions about three factors to be considered in facing them: requirements; motivations, benefits and risks; and lastly costs. We then had the option of selecting one of those three factors and working on it in a small group. Group feedback about requirements gathering noted the need for the personal touch, using interviews, finding a champion and making a structured proposal for discussion, perhaps based on the DCC Curation Lifecycle Model. I worked in one of several groups looking at motivations, benefits and risks, and we felt that preserving academic reputation could be a key driver if we could also find a way to reduce administrative burden. We concluded that synthesising evidence of benefits from existing reports might be a useful service that the DCC could provide. A few brave delegates elected to consider costs and drew out the need to cost the full lifecycle and the difficulty of estimating post-grant costs. One positive aspect was the feeling that the culture is shifting towards better costing of bids at an earlier stage.
Liz then gave us some input to enable us to take our deliberations forward later at a local level. On requirements gathering she drew attention to the Data Asset Framework (DAF) and practical findings from pilot users with some entertaining quotes. Data loss, increased costs, legal risks including Freedom of Information issues, and loss of reputation were noted as key issues. She suggested that the Keeping Research Data Safe benefits taxonomy was a valuable framework.
Following the same format, we looked at our local institutional readiness against the background of the DCC Curation Lifecycle Model and attempted a SWOT (Strengths, Weaknesses, Opportunities and Threats) analysis of our own situation. Overall, most institutions had pockets of expertise and varying advocacy networks, but our weakness was the inability to join them up to form a coherent approach. We agreed that it was now opportune to address the topic and that technology offered potential solutions, but also felt that the economic climate, restructuring that ignored the new agenda, and the risk of divergent subject approaches posed significant threats.
Liz encouraged us by reporting progress at Edinburgh, the advent of possible cloud solutions, the firming up of funders' policies, for example the Natural Environment Research Council, and the evolution of the DMP Online Tool. Further afield, it appears that Australia has established good practice.
We were now drilling down to specifics and tackled next the issues surrounding skills. Liz hit us hard with a long list of skills that might be needed, and asked us to audit where we were in that respect and to prioritise what we saw as the core skills. One group was diverted by the great metadata skills debate, but we noted that it remained unclear how far those needs might be reduced by more sophisticated resource discovery tools. We concluded that overall there were significant pockets of expertise in libraries, IT services, records management and research offices, but again joining them up in a coherent way was going to present a challenge. There seemed to be an absence of relevant staff training opportunities, but this was a gap the DCC could help us fill and would be covered in more detail on Day 3. Postgraduate training was being covered by the JISC RDM Training Materials and Liz drew attention to useful projects we had not covered on Day 1.
We were now at the heart of the matter. Liz focussed on the need to optimise organisational support, and flagged up key components such as leadership, coordination and role definitions. She referred to the reorganisation of research support services at the Queensland University of Technology and asked us to take that approach into account as we moved to conjuring up our own plans. We were asked to identify actions and a timeframe for them ranging from 0 to over 36 months, and were challenged to answer the question 'What will you do tomorrow?'
Our short-term plans included establishing a task force, building on research office links with our research community, getting research data into the grant application workflow and awareness training. Starting a real fire to destroy some data was a light-hearted suggestion, but the mood was certainly there to start some metaphorical fires and get some quick wins with key high-profile datasets. Medium-term suggestions were linked to the Research Excellence Framework, roadmap production and building a solid business case, preferably with an eye-catching exemplar at its heart. There was less certainty however about longer-term plans, but a consensus emerged that maintaining momentum would be crucial, that an element of restructuring was likely, and that monitoring national and discipline-based services would be essential in arriving at a fully formed business plan.
The delegates felt that a lot of valuable outputs had been produced, and Kevin undertook to ensure that the group work would be collated and shared. The SWOT analyses in particular could be usefully amalgamated for further consideration. DCC would also follow up with each institution represented.
The aim of Day 3 was to provide practical 'nuggets' for participants to take back to institutions so they could start doing something small but on a practical level. There were lectures from DCC staff and some group exercises.
In this session Joy explored how existing policies, codes of practice and support can be exploited when making the case to senior managers.
Research income is a good place to start: what do the research funders now require with respect to RDM? Non-compliance could result in a serious loss of research income in the future. As a first step, ensure the relevant funder policies are somewhere researchers can easily find them.
Look up codes of research practice at your institution and bring them to the attention of your researchers. These codes may assume that RDM is happening but is there evidence that there are policies and systems in place to support it? If not, highlight the risk to the institution of not doing this. Lost or leaked data could cause irreparable harm to an institution's reputation, and drastically affect future funding as well as potential collaboration. Find real-world examples of where it has all gone wrong to illustrate your point.
The University of Glasgow's code of good research practice expects all researchers to take responsibility themselves for data management and to observe the standards of practice of the funding bodies. Crucially though, Glasgow has also committed to providing training and resources for this effort; you may need to lobby for similar at your own institution.
Build trust with your researchers. Talk to them in language relevant to them and find out what they want. If you can provide evidence of need, you may be able to get support from staff development. Exploit the channels you already have: don't set up a new group on RDM, but fit in with pre-existing committees (such as research ethics). Make use of existing courses, tie into them and add slides to existing course materials. For more ideas see the Incremental Project, DCC 101 Training materials and the UK Data Archive best practice guide.
Another approach is to make use of assessment tools such as the DAF. Doing an assessment may reveal where efforts are being duplicated or resources under-utilised. Senior managers are more likely to respond positively if you illustrate the potential for efficiency savings.
Finally, be sure to avoid using curation-related terminology and refine your message to your target audience. Timing can be crucial; hook into new initiatives as you become aware of them. Draw upon existing policies and mandates where you can, and augment or adapt existing materials that are freely available.
Data management planning is fast becoming a requirement for the majority of funding bodies and research councils. This session explored the current landscape and future directions.
The DCC has provided a summary table of research data policies which is a useful starting point. Most funders now have a requirement for data management and sharing plans which should describe
Some institutions have already been involved in projects to develop policies to support RDM. The University of Oxford has produced a statement of commitment to RDM, developed through collaboration with the University of Melbourne. Similarly, the University of Edinburgh has developed policy and strategy for RDM and provides a wealth of advice and guidance for researchers on its Web pages.
There is plenty of guidance available for producing a Data Management Plan (DMP), derived from JISC-funded projects. DMTpsych, for example, used the DCC checklist as the basis for producing a guidance booklet for postgraduate students. The Incremental Project has led to the production of a short set of FAQs entitled 'Who can help with....'. Researchers have asked for practical examples, so the site includes case studies and video clips of researchers. This type of initiative can help raise the profile of support and ensure it becomes embedded into wider courses.
At the end of March 2011, all the JISC Managing Research Data (JISCMRD) projects will be reporting back and all the material they produce will be freely available for anyone to use. It is important to remember though, that access to general guidance is not enough. Researchers need to be talking to IT and library staff to discuss initial ideas, get practical support and find the best way forward.
Working in teams of between five and seven people, each group had to develop a 'pitch' aimed at senior management. The pitch had to last less than two minutes and relay the relevance of data management to the overall strategic aims of the institution. Delegates were reminded to give careful consideration to the language they used and to think about concrete evidence that could be used to support their arguments.
In the event, while everyone enjoyed this exercise, when it came to reporting back most groups felt they were not quite ready to pitch, although they did have the basic structure. For example, one group developed a pitch aimed at a pro-vice-chancellor, pointing out an example of an RDM disaster and potential problems with current procedures, and also setting out the first steps towards solving them.
Funder policies are diverse and have different requirements. Some funders have different policies for different programmes and require researchers to produce plans in different ways. To help researchers, the DCC has produced DMP Online, an online data management planning tool. In the first version it had 51 questions/headings. After consultation, this increased to 115 (though not everything needs to be answered by everyone). The tool provides the headings needed depending on the research funder concerned and the stage of the research cycle. DMP Online is freely available and enables users to create, store and update multiple versions of their plans; it meets funders' specific data-related requirements and researchers can obtain instant guidance on how to write their plans.
DMP Online v.2 is now live. The main differences are a clearer interface and a versioning feature which makes it possible to create a new version of a DMP based on an existing one.
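The two behaviours described above, showing only the headings relevant to a funder and research stage, and deriving a new plan version from an existing one, can be illustrated with a minimal sketch. This is not how DMP Online is implemented; the funder labels, stage names, headings and the `Heading`, `Plan`, `headings_for` and `new_version` names are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heading:
    text: str
    funders: frozenset  # hypothetical: funders that ask for this heading
    stages: frozenset   # hypothetical: research stages where it applies


# Invented checklist; the real tool's questions are not reproduced here.
CHECKLIST = [
    Heading("What data will be created?",
            frozenset({"FunderA", "FunderB"}),
            frozenset({"application", "in-project"})),
    Heading("How will the data be shared?",
            frozenset({"FunderB"}),
            frozenset({"application"})),
    Heading("Where will the data be archived?",
            frozenset({"FunderA"}),
            frozenset({"in-project", "end-of-project"})),
]


def headings_for(funder, stage, checklist=CHECKLIST):
    """Return only the headings that apply to this funder at this stage."""
    return [h.text for h in checklist
            if funder in h.funders and stage in h.stages]


@dataclass(frozen=True)
class Plan:
    version: int
    answers: dict  # heading text -> answer text


def new_version(plan, changes):
    """Create the next version of a plan, keeping unchanged answers."""
    return Plan(plan.version + 1, {**plan.answers, **changes})


v1 = Plan(1, {"What data will be created?": "Survey responses"})
v2 = new_version(v1, {"Where will the data be archived?": "Institutional repository"})
```

Keeping each version as a separate immutable record means the earlier plan remains available for comparison after an update.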
This presentation concentrated on how to identify the data you want to keep. It is important to do it throughout the life of the research project as various factors will affect what you are going to be able to do with the data later on.
When selecting data, the following points needed to be taken into consideration:
To tackle all of the above, you need the researcher involved and to be thinking from the very beginning about what is going to need to be kept. However there is always going to be a balance between want and need, what is permissible with the data and what the institution can actually afford to do.
You also need to discuss within your institution who is responsible for different parts of the DMP. It will be a mix of research group expertise, central research support, library and computing services, depending on what aspect of the research practice and storage you are talking about. Resources available include the DCC How-to Guide on selection and appraisal.
Alex began his presentation with a key question: why license your data? One possible answer is that it gives clarity to the reuse possibilities of your data. The problem is that the law is not up to speed with databases, because several distinct things are involved: the database itself, the content, the input and the visualisation that can be derived from the data. Each aspect may comply to differing degrees with current copyright laws, and situations vary in different jurisdictions.
He described the four types of Creative Commons (CC) licences and their limitations in relation to research data. For example, if you license data as 'non-commercial', does writing an article based on this data that is then published in a commercial journal contravene this licence? As a rule, CC licences work well with homogeneous things such as reports; but if you have different types of data, data from different sources or different attributes for the data and for the database in which they are housed, then all becomes horrendously complicated. Other possibilities are the Open Data Commons suite of licences or the Open Government Licence.
Whichever type of licence you choose to use, you need to attach it to the data, putting it where people are going to read it and in a format that is discoverable by harvesters. Guidance on licensing is imperative because once you apply a licence to your data, that licence is irrevocable. Researchers need to be clear about what they want to happen with their data and will need expert advice on which licence is appropriate. The DCC provides a guide on How to License Research Data.
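One way to picture attaching a licence in a form that both people and harvesters can read is a small machine-readable record stored alongside the data. The sketch below is only an illustration under stated assumptions: the field names are hypothetical rather than taken from any metadata standard, and the licence name and URL are placeholders, not a real licence.

```python
import json


def licence_sidecar(dataset_name, licence_name, licence_url):
    """Build a JSON record naming the licence that applies to a dataset.

    The field names are hypothetical, chosen for this example only.
    """
    record = {
        "dataset": dataset_name,
        "licence": {"name": licence_name, "url": licence_url},
    }
    return json.dumps(record, indent=2, sort_keys=True)


sidecar_text = licence_sidecar(
    "survey-2011.csv",                          # placeholder dataset name
    "Example Open Licence",                     # placeholder licence name
    "https://example.org/licences/example-open" # placeholder URL
)
```

The resulting text could be saved next to the data file, so the licence statement travels with the dataset and can be parsed automatically.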
The day finished with very little time for groups to complete a final exercise by actually starting to write a DMP. We discovered that it was very difficult for one person to come up with a DMP. The first step is to identify the relevant people who can help. The problem areas were considered to be ensuring access, data sharing and reuse, as well as long-term storage: which institutions have facilities in place?
This roadshow presented an excellent opportunity for the exchange of ideas and experience. It was well attended by delegates representing a wide range of roles in the RDM process: researchers, research support staff, IT staff and librarians.
Research data are the research institutions' 'crown jewels' and can be considered the new 'special collections'. They are at great risk of loss, damage or leaking, unless correctly managed (slides of USB drives and a melted laptop made the point). As these data increase on a massive scale year on year, the problem becomes more acute. Senior management of many research institutions are slowly coming round to developing and implementing RDM policies and providing resources to support them. This is to some extent driven by research funders, but research support staff and researchers will also need to make the case for RDM support.
There is now a wealth of experience and resources made available by the DCC and others. They are willing to collaborate on developing and sharing information on RDM good practice, training materials and tools.
The roadshow succeeded in the aim of providing delegates with advice and guidance to support institutional RDM. We have been inspired by this roadshow to encourage RDM planning and training at our institution. We thoroughly recommend attending future roadshows to be held in South-East England and South Wales.
Research Development Librarian
The University of Sheffield Library
Marion Tattersall is a Research Development Librarian at the University of Sheffield Library. She is interested in how researchers gather their information and the ways they publish and disseminate their findings. Her current role is to develop library services that support those activities in a way that fits in with researchers' workflows. Her previous roles there were in eServices development and subject liaison with a range of departments in Engineering, Science and Medicine.
Faculty Librarian for Science
The University of Sheffield Library
Carmen O'Dell is the Faculty Librarian for Science at the University of Sheffield. She is interested in developing close partnerships with departments to ensure resources and services closely match the needs of students and researchers. Before coming to Sheffield Carmen worked in the CERN Library, Geneva.
Senior Library Assistant
The University of Sheffield Library
John Lewis is a senior customer services assistant at the University of Sheffield Library. He has been drawn to study the issues surrounding research data management whilst completing a Master's degree in Information Management. He will be involved in developing and providing research support services at the University library.