Web Magazine for Information Professionals

Research Data Preservation and Access: The Views of Researchers

Neil Beagrie, Robert Beagrie and Ian Rowlands present findings from a UKRDS survey of researchers' views on and practices for preservation and dissemination of research data in four UK universities.

Data has always been fundamental to many areas of research, but in recent years it has become central to more disciplines and inter-disciplinary projects and has grown substantially in scale and complexity. There is increasing awareness of its strategic importance as a resource in addressing modern global challenges such as climate change, and of the possibilities being unlocked by rapid technological advances and their application in research. In the US the National Science Board has stated that:

'It is exceedingly rare that fundamentally new approaches to research and education arise. Information technology has ushered in such a fundamental change. Digital data collections are at the heart of this change. They enable analysis at unprecedented levels of accuracy and sophistication and provide novel insights through innovative information integration. Through their very size and complexity, such digital collections provide new phenomena for study.' [1].

Similar views have been expressed internationally through the International Council for Science:

'Because of the critical importance of data and information in the global scientific enterprise, the international research community must address a series of new challenges if it is to take full advantage of the data and information resources available for research today. Equally, if not more important than its own data and information needs, today's research community must also assume responsibility for building a robust data and information infrastructure for the future.' [2].

In a UK context, the UK Government is a signatory to the OECD's Declaration on Access to Research Data from Public Funding [3] and has a strong policy commitment to supporting science and innovation. In 2004 the Treasury, the Department of Trade and Industry (DTI) and the Department for Education and Skills (DfES) published the Science and Innovation Investment Framework 2004-2014, which set out the Government's ambitions for UK science and innovation over that period, in particular their contribution to economic growth and public services. A section of the Framework addressed the need for an e-infrastructure for research. It argued that over the next decade the growing UK research base must have ready and efficient access to digital information of all kinds such as experimental data sets, journals, theses, conference proceedings and patents. Such information is the lifeblood of research and innovation, but unresolved challenges in its long-term management present a number of major risks [4].

These challenges in the management, preservation and curation of research data have been recognised over many years and in a number of studies, e.g. Lievesley and Jones in 1998 [5], Lord and Macdonald in 2003 [6], Tessella in 2006 [7], OSI in 2007 [8], and Lyon in 2007 [9]. Although encouraging progress has been made towards fulfilling some of the recommendations of these studies, much remains to be done. As a result, research data remain an important and strategic issue for the UK Higher Education Funding Councils and the UK Research Councils.

The UK Research Data Service Feasibility Study

In April 2007 the Higher Education Funding Council for England (HEFCE) requested expressions of interest in leading feasibility studies in the area of Higher Education shared services. The Russell Group IT Directors and Research Libraries UK, with the support of the Joint Information Systems Committee, the British Library, and the Research Information Network, submitted a joint expression of interest to undertake a UK Research Data Service (UKRDS) Feasibility Study and were successful in obtaining funding. Following an invitation to tender, Serco Consulting (in partnership with Charles Beagrie Limited and Grant Thornton) were appointed as consultants for the study.

The overall objective of the UKRDS feasibility study was to assess the feasibility and costs of developing and maintaining a national shared digital research data service for the UK Higher Education sector. From the outset it was recognised that the scope and requirements for a UKRDS should be determined primarily by researchers themselves. Therefore a major component of the feasibility study for UKRDS was the Survey of the Researcher Viewpoint conducted in collaboration with the Universities of Bristol, Leeds, Leicester and Oxford. This involved researchers and staff from central services in the four universities. Their views were sought on the potential scope and requirements and how a national shared data service could help their research, their institution, and UK research competitiveness. The four universities represented a sample of research-intensive universities of different scales within the UK.

Survey Methodology

The survey was undertaken between March and October 2008 and was conducted as an online questionnaire with a series of nine focus groups at the Universities of Bristol, Leeds and Leicester (three at each institution). In addition the University of Oxford undertook a series of in-depth qualitative interviews and a workshop with their researchers to contribute to the survey.

The work and analysis of findings at Oxford were conducted by Luis Martinez-Uribe of the University of Oxford. The survey work at the other three case study sites was carried out by a joint team from Serco Consulting and Charles Beagrie, led by Neil Beagrie, with analysis and illustration of the online questionnaire results undertaken by Ian Rowlands and Robert Beagrie. David Grounds and Mike Callanan from Serco Consulting, Julia Chruszcz from Charles Beagrie, and university staff at Bristol, Leeds, and Leicester made significant contributions, particularly to the workshops and to the definition and review of emerging findings and themes.

The online questionnaire was developed as a set of draft questions which was then piloted and refined with researchers in initial focus groups at the case study sites. When the questions were finalised they were hosted on Survey Monkey, and researchers were invited by letters distributed by the case study institutions to complete the questionnaire during May 2008.

Each case study site for the online questionnaire aimed to maximise responses from researchers and respondents across their universities. Different channels for inviting responses were available at each site. These varied from university-wide email lists to personal letters to heads of department asking them to cascade information to relevant staff. The number of researchers invited to participate in each case cannot therefore be calculated. However the proportion of the known population of active researchers who responded at each university is given below.

We received 179 responses to the online questionnaire covering over 500 researchers in the universities of Bristol, Leeds, and Leicester. Responses could be made by individual researchers or research teams as a group. Group responses were primarily in Science, Technology and Medical (STM) subjects. The response rates from each institution represented approximately 6-10% of the total population of active researchers at Leeds and Bristol and 17% of the active research population at Leicester. Upon completion a second series of focus groups was then held at each site to discuss preliminary findings and responses from the Survey.

In addition, 37 one-to-one interviews with researchers were conducted in the University of Oxford by Luis Martinez-Uribe and a workshop held to discuss the initial findings from the interviews at Oxford [10].

A final series of focus groups was held at Bristol, Leeds, and Leicester and a workshop at Oxford during October to seek researcher feedback on emerging proposals for UKRDS. The case study sites deliberately invited to these focus groups researchers with a range of needs from different faculties, including arts and humanities and social science researchers as well as biological and physical sciences. They also deliberately included researchers who, from their questionnaire responses, might be considered sceptics as well as supporters of a potential UKRDS.

Scope of This Article

A preliminary report from the Survey was included in the UKRDS Interim Report [11] and discussed with the UKRDS Project Board and Steering Committee. Elements of the Survey and its findings were also incorporated in the Final Report of the UKRDS Feasibility Study submitted to HEFCE [12]. However space constraints precluded presentation of all the data and findings in full in these reports and they were mainly included in a separate unpublished appendix to the final report. In total 27 questions were included in the online questionnaire. This article therefore aims to publish more of this material and set it in its context (largely edited from the reports with updates from more recent published studies). In particular it provides selections from the quantitative data from the questions and responses and qualitative data from the additional free text comments in the online questionnaire.

Results from the Online Survey

Question 3: Your Department/Faculty

In this section respondents selected Higher Education Statistics Agency cost centre codes. There was a broad range of responses from different disciplines. In general terms STM researchers' responses totalled somewhat above what would be expected as a proportion of the known population at the case study sites, whereas the arts and humanities and social sciences were slightly under-represented. Bio-sciences, the largest single category of academic research by funding and number of publications, was particularly well represented in responses (17.3%). Overall the sample was too small for data covering the smaller disciplines with relatively low numbers of researchers in the case study institutions. The responses were voluntary and perhaps can be seen to some degree as a 'heat map' indicating disciplines where research data issues were of greatest concern.

Figure 1: Department or Faculty of Survey Respondent [Source data]

Question 6: Your Position

There was a good spread of responses across grades of staff. Seventeen research fellows responded, and they form the majority of respondents in the 'other' category.

Figure 2: Position of Survey Respondent [Source data]

Question 9: Your Data Storage Needs

Researchers were asked which data storage features should be included if a national research data service were established in the UK, and how important each would be. A strong need for UKRDS to focus on long-term (>5 years) and medium-term (1-5 years) data curation and preservation was indicated. There was lower demand for short-term data storage (1-12 months).

Figure 3: Relative Importance of Different UKRDS Data Storage and Preservation Time-Scales/Services

Question 12: Your Current and Future Data Storage Requirements

Overall a 360% growth in data volumes is anticipated over the next three years by researchers in the survey. There is some significant variation within and between disciplines.
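As a rough illustration of what this figure implies for year-on-year planning, the sketch below converts a total increase over three years into an equivalent compound annual rate. It assumes that '360% growth' means an increase of 360% on top of current volumes (a factor of 4.6) and that growth compounds evenly each year; neither assumption comes from the Survey, and the function name is purely illustrative.

```python
def implied_annual_growth(total_growth_pct: float, years: int) -> float:
    """Return the compound annual growth rate (as a percentage) implied by
    a total percentage increase spread over a number of years."""
    # Assumption: the figure is an increase BY that percentage,
    # so 360% growth takes volumes to 4.6 times today's level.
    factor = 1 + total_growth_pct / 100
    # Assumption: growth compounds evenly year on year.
    annual_factor = factor ** (1 / years)
    return (annual_factor - 1) * 100


if __name__ == "__main__":
    rate = implied_annual_growth(360, 3)
    print(f"Implied annual growth: {rate:.0f}%")
```

Under these assumptions the anticipated growth corresponds to data volumes rising by roughly two-thirds each year. If respondents instead meant growth to 360% of current volumes, the implied annual rate would be lower.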

Question 13: For what period after the conclusion of your project will the long-term research data remain useful for research and need to be retained either for yourself or others?

48.9% of data is seen as having a useful life of under 10 years. Only about 27% is seen as having indefinite value and requiring indefinite retention. There are significant differences across disciplines in retention requirements, as illustrated in Figure 4 below, which shows the perceived longer-term value of current research data holdings by discipline.

Figure 4: Perceived Longer-term Value of Current Research Data Holdings by Discipline [Source data]

Question 14: Data use and users. During this retention period who is likely to be interested in using it? (respondents could tick all applicable categories)

The majority of the retained data is seen as useful to, or is used by, a small number of users. Only 24% have 20-100+ external users. In terms of the US National Science Board's data collection levels [1], the majority of data surveyed would be 'research data collections' with only 24% likely to be in 'community data collections' or 'reference data collections'.

Figure 5: Users of Research Data [Source data]

Question 15: Are there any grant or legal requirements to retain your research data?

33% responded there is a grant requirement and 14% a legal requirement to retain their data. A surprisingly high number of respondents believe there is no requirement (about 26%) or do not know (36%) if there is a grant or legal requirement to retain their research data.

Question 16: Location/Storage of Your Data

Most research data is held locally with less than 20% of respondents using an international or national facility for data deposit. Most data is held on individual PCs and departmental servers.

Question 17: Why do you use the storage options above [Q16] and do they meet your needs?

There were 117 free-text responses in total to this question. They have been analysed and categorised as follows:

Top ten ranked positive factors/requirements    Number of responses
Convenience                                     23
Security/Secure/Safe                            23
Backed-up                                       21
Ease of access                                  21
Ease/Flexibility of use                         16
Low cost (price)/free                           13
Storage capacity                                13
What is available                               9
Data sharing                                    8
Self-management/Control                         6

Table 1: Respondents' top ten positive factors/requirements (ranked by total number of responses)

Ranked negative factors/improvements needed     Number of responses
Limited storage capacity locally                7
Limited security locally                        6
Limited backup locally                          6
Difficult access for externals                  6
Expensive local service                         3
Local facilities do not meet need               3
Time needed for local backup                    2
Limited publicity when local                    2

Table 2: Respondents' most often cited negative factors/improvements needed (ranked by total number of responses)

Question 19: What measures do you use to make your own research data available?

Most researchers share data: only about 12% do not make their data available. Informal peer exchange networks within research teams and with collaborators predominate. Only about 18% share data via a data centre.

Figure 6: Methods for Sharing Research Data [Source data]

Question 25: How would you normally access the research data of other researchers?

In contrast to responses to question 19 where only 18% share via a data centre, about 43% responded that they access other researchers' data via a data centre.

Question 26: Do you access data from other sources (e.g. Government, National Health Service) for your research?

About 41% of respondents are using data from sources such as government and the NHS in their research.

The Oxford University Scoping Digital Repository Services for Research Data Management Project

The project Scoping Digital Repository Services for Research Data Management represented a cross-agency collaboration between the Office of the Director of IT, the Oxford University Computing Services, the Oxford University Library Services and the Oxford e-Research Centre. Findings from the project form the basis for the Oxford case study work for the UK Research Data Service (UKRDS) feasibility study. The project aimed to scope requirements for services to manage and curate research data generated by Oxford researchers. A central activity of the project was a requirements-gathering exercise to learn more about current data management practices amongst different research groups in the University and to identify the researchers' top requirements for services to help them manage data more effectively. To complement this exercise, as well as to raise awareness and encourage discussion, a workshop was organised in June 2008 to hear about examples of good and interesting research data management practice from the perspective of different disciplines. A full report from the Oxford scoping study and workshop is available [10].

Findings and Top Requirements for Services at Oxford

A total of 37 researchers were interviewed between May and June 2008 and 46 people attended the workshop in June. This good response from researchers reveals the interest in research data management and it helped to document current practice and capture requirements for services across disciplines in Oxford.

The key findings were:

The management of research data in the University of Oxford is exercised with varying degrees of maturity across the institution. There are departments and individuals with extensive experience in handling the data they collect, and big projects with a focus on data activities which produce, document and share data to a very high standard. On the other hand, there are many other departments and small-scale projects in which data management depends entirely on individual researchers' skills and is sometimes driven by individual short-term convenience.

Overall, the vast majority of researchers interviewed thought that there are potential services that could help them to manage their data more effectively.

The top requirements from Oxford researchers for services to help them with their data management activities gathered from the interviews and the workshop are:

Advice on practical issues related to managing data across their life cycle. This help would range from assistance in producing a data management/sharing plan; advice on best formats for data creation and options for storing and sharing data securely; to guidance on publishing and preserving these research data.

A secure and user-friendly solution that supports storage of large volumes of data and their sharing in a controlled way that will permit the use of fine-grained access control mechanisms.

A sustainable infrastructure that allows publication and long-term preservation of research data for those disciplines without domain-specific archives such as the UK Data Archive, the Natural Environment Research Council Data Centres, the European Bioinformatics Institute and others.

Funding that could help address some of the departmental challenges in the management of the research data that are being produced.

The scoping study also undertook a consultation with service providers to identify gaps in service provision and piloted the Data Audit Framework methodology to document in depth the data management workflows of two research groups in Oxford. This project is now being followed up at Oxford by the JISC-funded Embedding Institutional Data Curation Services in Research (EIDCSR) Project, which will address the research data management and curation requirements of two collaborating research groups.

Emerging Findings and Themes from the Survey

The interim and final reports of the feasibility study produced a detailed analysis of current service provision for researchers and universities. They also considered the research data policies of the research funders and conducted a gap analysis based on desk research of all available information, including the Survey and the workshops with the case study sites, and detailed interviews with key stakeholders. The following conclusions were drawn:

The Survey findings were followed up and discussed at the four final focus group sessions attended by researchers and service support professionals in each of the case study sites. From this input the following main themes emerged:

The Survey in a UK and International Context

Statistics for the UK Higher Education sector are available from the Higher Education Statistics Agency (HESA) and updated annually by individual universities. The figures for all UK universities for the Quality Research-related funding (QR) provided by the funding councils are provided in the figure below. This provides one measure of ranking research activity in universities and correlates quite closely with figures for research projects and number of researchers provided by our case study sites.

The Survey looked at needs and views of researchers in four UK universities. The position of each of the case study sites in this ranking is also shown below.

Figure 7: British Universities rank-ordered by total Quality Research-related funding (QR) in 2005/6 [Source data]

In terms of UK ranking by Quality Related funding (2005-6) Oxford is 2nd, Leeds 9th, Bristol 13th and Leicester 27th. We believe the universities in this group ranked from 1-27 in QR funding account for the majority of funded research projects and a large part of the potential market of UKRDS for research data services.

The online surveys, focus groups, and discussions held with the Universities of Bristol, Leeds, Leicester, and Oxford provide findings which suggest that participating in the UKRDS service could be of interest to these institutions, and comparable universities of similar scale. This suggestion is supported by the expressions of interest from peer institutions to be involved in future phases of UKRDS.

What is currently unknown, however, is the potential level of interest in much smaller institutions, ranking between 27 and 127 in QR funding, which undertake some research and have much lower levels of research funding and numbers of research active staff and research projects. However it seems likely that individual active researchers or research teams within such institutions would also benefit from institutional participation in the UKRDS service; over time the potential scope of the service could be all 127 Higher Education institutions across the UK undertaking research.

Modern research is of course international and our Survey unsurprisingly has many echoes in the findings and themes emerging from studies on research data in other countries.

In Australia, the government has accepted the recommendation of the proposal report Towards the Australian Data Commons: A proposal for an Australian National Data Service [14] to establish an Australian National Data Service (ANDS) and has provided funding of Au$24 million (£11.5 million) over 4 years.

In Canada, a working group comprised of a number of Canadian organisations and agencies has been established to provide recommendations and an action plan for a new national approach to the stewardship of research data in Canada and has already published a Gap Analysis of existing provision [15].

Other European countries are also keenly aware of the challenges and opportunities associated with this aspect of e-infrastructure. In Germany the Alliance of German Science Organisations has identified research data as a major focus in its Priority Initiative on Digital Information [16] and the University of Göttingen, one of Germany's leading research universities, has published a study of the digital preservation needs of its researchers [17].

Conclusions

Researchers' demands for data management support at institutional level are becoming more frequent as data volumes grow, and as more research funders (notably the UK's Research Councils) develop policies for the management of data outputs generated by grant holders. Researchers' own expectation is often that the institutional library and/or IT service will make the necessary provision; although this has started to happen on a small scale, university managers have serious concerns about the cost, scalability and sustainability of purely local solutions, and the duplication of effort that may result.

Although some of the UK's Research Councils have data centres for their outputs, and there are some discipline-based repositories at national and international level, the large gaps in the UK's current provision mean that the challenge of managing this data and ensuring its long-term sustainability and potential re-use often defaults to the individual university or even the individual research group or researcher.

It is also important to understand that the data management challenge is by no means restricted to so-called 'big science', although large-scale facilities in areas such as particle physics generate huge data volumes. More modestly funded projects in all disciplines will also bring data challenges and data formats of varying complexity. The issue of the data deluge and increasing data complexity is not going to be restricted to Russell Group universities, although they can reasonably expect to have the largest volumes of research data to deal with. All UK universities will be affected by these challenges and it seems likely all could benefit to some degree from a UK-wide research data service.

Clearly there are several excellent facilities currently in place which provide outstanding services on behalf of their communities for access to and storage of research data. However, that does not imply - if we take a holistic look at the situation within universities and the opportunities and guidance available to researchers for preserving and providing access to research data across all disciplines - that researchers' needs are being fully met.

The UKRDS Feasibility Study and the Survey presented here can make a strong case for a national shared service in order to enhance the capacity, skills, and R&D investment needed to sustain UK research data cost-effectively. In the complex landscape of existing provision, the devil, of course, will be in the detail of implementation. The introduction and testing of any new provision and shared services must inevitably be pragmatic and evolutionary. A national shared service might take a number of forms, with components provided by universities, the Research Councils, and other agencies. The Feasibility Study reviewed and appraised the possible options and recommended in its final report a 'pathfinder approach' to establishing a UKRDS. At the time of writing a further UKRDS project is getting under way, funded mainly by HEFCE with another contribution from JISC. This phase, known as the UKRDS Interim Project, will move things forward in the planning, design and preparation for a UKRDS by testing proof of concept in a modest way in collaboration with the four case study universities [18].

References

  1. National Science Board (NSB), 2005, Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century, September 2005 (National Science Foundation). Retrieved 10 December 2007 from
    http://www.nsf.gov/pubs/2005/nsb0540/nsb0540.pdf
  2. International Council for Science, 2004, ICSU Report of the CSPR Assessment Panel on Scientific Data and Information (International Council for Science). Retrieved 17 November 2006 from
    http://www.icsu.org/Gestion/img/ICSU_DOC_DOWNLOAD/551_DD_FILE_PAA_Data_and_Information.pdf
  3. OECD, 2004, Declaration on Access to Research Data from Public Funding. (Organisation for Economic Co-operation and Development, Paris). Retrieved 1 June 2009 from
    http://www.oecd.org/document/0,2340,en_2649_34487_25998799_1_1_1_1,00.html
  4. Her Majesty's Stationery Office (HMSO), 2004, Science and innovation investment framework 2004-2014, (Her Majesty's Stationery Office, London). Retrieved 1 June 2009 from
    http://www.hm-treasury.gov.uk/spending_sr04_science.htm
  5. Lievesley, D., and Jones, S., 1998, An Investigation into the Digital Preservation Needs of Universities and Research Funders: the Future of Unpublished Research Materials, British Library Research and Innovation Centre Report no.109 (British Library 1998). Retrieved 29 June 2008 from
    http://www.ukoln.ac.uk/services/papers/bl/blri109/
  6. Lord, P., and Macdonald, A., 2003, e-Science curation report (Joint Information Systems Committee). Retrieved 29 June 2008 from
    http://www.jisc.ac.uk/uploaded_documents/e-ScienceReportFinal.pdf
  7. Tessella, 2006, Mind the Gap: Assessing Digital Preservation Needs in the UK (Digital Preservation Coalition, York). Retrieved 17 November 2006 from
    http://www.dpconline.org/docs/reports/uknamindthegap.pdf
  8. OSI e-infrastructure Working Group, 2007, Developing the UK's e-infrastructure for Science and Innovation (National e-Science Centre Edinburgh). Retrieved 20 May 2008 from
    http://www.nesc.ac.uk/documents/OSI/report.pdf
  9. Lyon, E., 2007, Dealing with Data: Roles, Rights, Responsibilities and Relationships. Consultancy Report, (UKOLN University of Bath). Retrieved 3 January 2008 from http://www.jisc.ac.uk/media/documents/programmes/digitalrepositories/dealing_with_data_report-final.pdf
  10. Martinez-Uribe, L., 2008, Findings of the Scoping Study and Research Data Management Workshop - Main Report, (University of Oxford). Retrieved 1 June 2009 from
    http://ora.ouls.ox.ac.uk/objects/uuid%3A4e2b7e64-d941-4237-a17f-659fe8a12eb5/datastreams/ATTACHMENT02
  11. Serco Consulting, 2008a, UKRDS Interim Report, Version v0.1a.030708 7th July 2008. Retrieved 1 June 2009 from
    http://www.ukrds.ac.uk/UKRDS%20SC%2010%20July%2008%20Item%205%20(2).doc
  12. Serco Consulting, 2008b, The UK research data service feasibility study: Report and Recommendations to HEFCE, 19 December 2008. Retrieved 1 June 2009 from
    http://www.ukrds.ac.uk/HEFCE%20UKRDS%20Final%20Report%20V%201.1.doc
  13. Swan, A., and Brown, S., 2008, Skills, Role & Career Structure of Data Scientists & Curators: Assessment of Current Practice & Future Needs, July 2008. Retrieved 30 July 2009 from
    http://www.jisc.ac.uk/publications/documents/dataskillscareersfinalreport.aspx
  14. ANDS Technical Working Group, 2007, Towards the Australian Data Commons: A proposal for an Australian National Data Service (Commonwealth of Australia, Canberra). Retrieved 20 May 2008 from
    http://www.pfc.org.au/twiki/pub/Main/Data/TowardstheAustralianDataCommons.pdf
  15. Research Data Strategy Working Group, 2008, Stewardship of Research Data in Canada: Gap Analysis. Retrieved 1 June 2009 from
    http://data-donnees.gc.ca/docs/GapAnalysis.pdf
  16. The Alliance of German Science Organisations, 2008, Priority Initiative Digital Information. Retrieved 1 June 2009 from
    http://www.dfg.de/forschungsfoerderung/wissenschaftliche_infrastruktur/lis/download/allianz_initiative_digital_information_en.pdf
  17. Neuroth, H., Strathmann, S., Vlaeminck, S., 2008, 'Digital Preservation Needs of Scientific Communities: The Example of Göttingen University', in Research and advanced technology for digital libraries: 12th European conference, ECDL 2008, Aarhus, Denmark, September 14-19, 2008: proceedings (Springer 2008).
  18. UK Research Data Service (UKRDS) http://ukrds.ac.uk

Author Details

Neil Beagrie
Charles Beagrie Ltd.

Email: neil@beagrie.com
Web site: http://www.beagrie.com/

Robert Beagrie
Charles Beagrie Ltd.

Email: rob@beagrie.com
Web site: http://www.beagrie.com/

Ian Rowlands
Reader
Department of Information Studies
University College London

Email: i.rowlands@ucl.ac.uk
Web site: http://www.slais.ucl.ac.uk/
