
The Fourth DCC-RIN Research Data Management Forum

Martin Donnelly and Graham Pryor report on the fourth Research Data Management Forum event, on the theme "Dealing with Sensitive Data: Managing Ethics, Security and Trust," organised by the Digital Curation Centre (DCC) and Research Information Network (RIN) in Manchester, England, on 10-11 March 2010.

The fourth meeting of the Research Data Management Forum was held in Manchester on 10 and 11 March 2010, co-sponsored by the Digital Curation Centre (DCC) [1] and the Research Information Network (RIN) [2]. The event took Dealing with Sensitive Data: Managing Ethics, Security and Trust as its theme [3].

Day 1: 10 March 2010

DCC Associate Director Liz Lyon and RIN Head of Programmes Stéphane Goldstein welcomed the 45 delegates to the event, and began by introducing the keynote speaker, Iain Buchan, Professor of Public Health Informatics and Director of the Northwest Institute for Bio-Health Informatics (NIBHI), University of Manchester.

Iain's talk was entitled Opening Bio-Health Data and Models Securely and Effectively for Public Benefit, and addressed three main questions:

  1. Where does the public's health need digital innovation?
  2. How can research curators promote this innovation (and what are the implications for Ethics, Security and Trust)?
  3. Is a framework required (covering the Social Contract and a digital and operational infrastructure)?

A major theme in contemporary healthcare is prevention, and the need for proactive 'citizen buy-in' to avert NHS bankruptcy, a need supported by the use of 'persuasive technologies'. There is, however, a disconnect between the proactive public health model and the reactive clinical model, and between expectations and available resources. 'Digital bridges', composed of new information technologies, are used to close the gaps between primary and secondary care, and to link disease-specific pathways.

Iain touched on the impact that the data deluge is having on healthcare, reflecting that knowledge can no longer be managed solely by reading scholarly papers: the datasets and structures now extend far beyond any single study's observations. It is now necessary to build data-centred models, and to interrogate them for clusters via dedicated algorithms.
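
By way of illustration, the kind of cluster search alluded to here can be sketched in a few lines of Python. The indicator columns, synthetic values and choice of k-means below are assumptions made purely for illustration, not a description of anything NIBHI actually runs.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Synthetic stand-in for patient-level indicators (BMI, systolic BP, age);
    # real data-centred models would draw on far richer records.
    rng = np.random.default_rng(0)
    records = rng.normal(loc=[27, 130, 55], scale=[4, 15, 12], size=(500, 3))

    # Standardise, then look for clusters with a dedicated algorithm (k-means here).
    scaled = StandardScaler().fit_transform(records)
    model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)

    for label in range(model.n_clusters):
        members = records[model.labels_ == label]
        print(f"cluster {label}: n={len(members)}, means={members.mean(axis=0).round(1)}")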

However, there are holes in the datasets: clinical trials, for example, exclude women of childbearing age and subjects undergoing certain treatments. Electronic health records must therefore be mined to fill these gaps, but that mining can be hampered by a lack of useful metadata, leading to 'healthcare data tombs': repositories of health records lacking the contextual information to make them useful. Such data resources may be worse than useless: they may be misinformation.

Comprehensible social networks with user-friendly interfaces can be used to improve the quality of metadata, based on the principle that more frequent use leads to better quality information. These networks can also help to overcome the Balkanisation that occurs when different groups tackle the same issue from different standpoints (e.g. examining obesity from dietary and exercise perspectives without sharing data across those boundaries). The vision is for a joint, open, unifying and interdisciplinary framework and understanding wherein resources and expertise are shared. Of course, crossing these divides raises a raft of trust and security issues, and Iain described the various measures implemented to cope with them.

Iain discussed the ethical issues surrounding wider use of health record information across the NHS, including consent (opt-in versus opt-out), the right (or lack thereof) of an investigator to go to a patient directly, and – perhaps most controversially – whether it was actually unethical to allow a health dataset to go under-exploited. If this is indeed the case, it follows that there is a real need to audit the demonstrable good that is derived from datasets.

Day 2: 11 March 2010

The morning's session began with a presentation from Nicky Tarry, Department for Work and Pensions (DWP), who spoke on Data Handling: doing the right thing and doing it in the right way. Nicky began by outlining the data held by the DWP: benefits data (including sickness and disability records), drug abuse/criminal history, personal (including ethnicity and family make-up), plus data from surveys and other government departments, including employment, tax and other financial information.

He then covered the legacy security measures that were used to protect data (sampling and partial anonymisation), and the changes put in place to improve them in the wake of recent data loss scandals (full anonymisation, senior sign-off).

He outlined SARA (Survey and Risk Assessment), a risk-based system that was introduced following a recent Cabinet Office review that was triggered by a number of high-profile data leaks. He also spoke of the new Data Labs which are being prepared to process data that meet certain criteria, and which will allow flexibility while still meeting security requirements. Nicky also mentioned the DWP Data Access Ethics Committee, established in 2004 and comprising a mixed membership from a variety of backgrounds.

The next session was entitled Providing a Service, and was split into two parts. The first presentation was from Kevin Ashley, speaking in his role as former head of the Digital Archives Group at the University of London Computer Centre (ULCC), and covered Managing Issues of Confidentiality, Ethics and Trust.

Established in 1997, the National Digital Archive of Datasets (NDAD) is a branch of the National Archives (TNA) dedicated to the preservation of, and access to, digital datasets, metadata and related documents from UK central government departments. NDAD is hosted and managed by ULCC on a sub-contract basis, while TNA identifies the datasets and documents to be deposited.

The focus of Kevin's talk was the process by which users can access sometimes-sensitive information. Records are deliberately preserved in their original (imperfect) states, so contextual information is important to explain the imperfections. NDAD always checks with the originating government department that a record is safe to be released, and records are signed off by both NDAD and that department prior to release as an additional safety mechanism.

Redaction (censorship) of data may be achieved by restricting access (for example via embargo), by excising sensitive data, or via redaction of metadata that describes the sensitive data. The manual redaction process can be time-consuming and therefore expensive. NDAD's rule-based model defines precisely who is allowed to see what, and under which circumstances, and records of past restrictions are also stored and preserved.
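
The rule-based model described here can be pictured as something like the following sketch, in which each rule states who may see a given field and under what circumstances. The roles, field names and embargo logic are invented purely to illustrate the idea, not taken from NDAD's actual system.

    from dataclasses import dataclass
    from datetime import date
    from typing import Callable, Set

    @dataclass
    class AccessRule:
        field: str                          # the element of the record this rule governs
        allowed_roles: Set[str]             # who may see it at all
        condition: Callable[[dict], bool]   # under which circumstances (e.g. embargo lifted)

    def embargo_lifted(record: dict) -> bool:
        return record.get("embargo_until", date.min) <= date.today()

    RULES = [
        # Sensitive identifier: excised for everyone, in all circumstances.
        AccessRule("national_insurance_no", set(), lambda record: False),
        # Case notes: visible to archivists and the owning department once any embargo has passed.
        AccessRule("case_notes", {"archivist", "department_officer"}, embargo_lifted),
    ]

    def visible_record(record: dict, role: str) -> dict:
        """Return a copy of the record containing only what this role may see."""
        redacted = dict(record)
        for rule in RULES:
            if role not in rule.allowed_roles or not rule.condition(record):
                redacted.pop(rule.field, None)
        return redacted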

Finally, in terms of managing trust, different stakeholders have different concerns: data owners need to be assured that their instructions are being followed and that they are kept informed about developments; TNA guidelines must be followed and auditable records kept of all requests and accesses; and the general public may wish to ask questions which NDAD is obliged to answer.

Following Kevin, Reza Afkhami of the UK Data Archive (UKDA) spoke about Secure Approaches to Data Management, specifically the UKDA's new Secure Data Service (SDS). Based at the University of Essex in Colchester, the UKDA is a TNA-designated Place of Deposit, and houses the UK's largest collection of digital humanities and social science data. The SDS mission is to promote researcher access to sensitive microdata (maximising data access and utility), while simultaneously protecting confidentiality and minimising the risk of disclosure of sensitive information.

Identifying a threshold of maximum tolerable risk involves a trade-off between data utility and disclosure risk, and the SDS threshold for released data sits slightly below this maximum. SDS users are required to register with the service, and access criteria are set based on purpose, users, research output (which is screened), location (which can be restricted to specific IP addresses if appropriate), and data licensing (which involves contracts with users and data owners). The data security model incorporates a series of checks, including the validity of the statistical purpose, trusted researchers, anonymisation of data where necessary, technical controls (e.g. over encryption and import/export of data), and scrutiny of the research outputs in order to prevent inappropriate disclosure.
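
A hedged sketch of the kind of threshold test this trade-off implies is given below: a release is only approved when an estimated disclosure risk falls below a maximum tolerable level. The risk measure (the share of records that are unique on a set of quasi-identifiers) and the threshold value are assumptions chosen for illustration, not a statement of actual SDS policy.

    from collections import Counter

    def uniqueness_risk(rows, quasi_identifiers):
        """Fraction of records that are unique on the chosen quasi-identifiers."""
        keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
        counts = Counter(keys)
        return sum(1 for key in keys if counts[key] == 1) / len(keys)

    MAX_TOLERABLE_RISK = 0.05  # illustrative threshold, set slightly below the tolerable maximum

    sample = [
        {"age_band": "30-39", "region": "NW", "income": 21000},
        {"age_band": "30-39", "region": "NW", "income": 24000},
        {"age_band": "70-79", "region": "SE", "income": 18000},
    ]

    risk = uniqueness_risk(sample, ["age_band", "region"])
    decision = "release" if risk < MAX_TOLERABLE_RISK else "withhold or anonymise further"
    print(f"estimated disclosure risk: {risk:.2f} -> {decision}")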

Reza also outlined the legal and regulatory framework within which SDS operates, comprising UK law, duties of care, and rules imposed by the data suppliers, as well as the penalties that can be imposed upon researchers who break the rules.

Next up was Reza's colleague Veerle Van Den Eynden from UKDA's Research Data Management Support Services, who gave another side of the story in her presentation entitled Sharing and Managing Sensitive Data in Research with People.

Veerle's presentation addressed two core questions against a backdrop of research with people as participants and/or as subjects for study:

  1. What can researchers do to enable sharing of sensitive data?
  2. How can data archives help researchers?

In response to the first question, the recommendation is that issues relating to data sensitivity and sharing should be considered from the early planning stage, and that pre-existing methodologies should then be adapted to fit the circumstances of the research. It is necessary to include data sharing within the ethical review process, to manage consent in an informed way, and to consider how data will be managed once the research is over (and the funding is no longer there).

Veerle quoted definitions of 'personal' and 'sensitive' data from the Data Protection Act (1998) and the Data Sharing Review (2008), and outlined the other pieces of legislation which together form the relevant ethical and legal framework, including the Statistics and Registration Service Act (2007) and the Human Rights Act (1998).

With regard to the role of data archives in the research process, the UKDA position is to provide guidance and training for researchers (covering informed consent, anonymisation, data security, and access regulations), guidance for ethics committees (at the institutional, faculty, school or departmental level), and the execution of disclosure checks and data anonymisation during the data archiving process.
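
As a rough sketch of the disclosure-check and anonymisation step described above, the fragment below pseudonymises a direct identifier, drops others, and coarsens exact ages into bands. The field names, salt handling and banding are purely illustrative assumptions, not UKDA procedure.

    import hashlib

    SALT = "archive-specific-secret"  # assumed; a real archive would manage this securely

    def pseudonymise(identifier: str) -> str:
        """Replace a direct identifier with a stable, non-reversible token."""
        return hashlib.sha256((SALT + identifier).encode("utf-8")).hexdigest()[:12]

    def anonymise(record: dict) -> dict:
        out = dict(record)
        out["participant_id"] = pseudonymise(out.pop("name"))   # pseudonymise the name
        out.pop("address", None)                                # drop other direct identifiers
        out["age_band"] = f"{(out.pop('age') // 10) * 10}s"     # coarsen exact age into a decade band
        return out

    print(anonymise({"name": "J. Smith", "address": "1 High St", "age": 47, "response": "yes"}))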

The final presentation of the morning came from Andrew Charlesworth, Director of the Centre for IT and Law (CITL) at the University of Bristol, and was entitled Thinking Outside the Tick Box: complying with the spirit, and not just the letter, of the law.

Beginning with a reference to the recent 'Climategate' scandal [4], Andrew noted the scale of interest that third parties (from the public to the press, and beyond) might have in research data, the fact that Freedom of Information (FOI) legislation may apply to it, and the risks that accompany failure to comply with FOI requests. Researchers, therefore, need to be kept aware of potential problems.

Andrew identified potential issues which may arise from a disconnect between regulation and practice. He cited the example of a Faculty Ethics Committee which had issued a blanket ruling that all research data must be kept for a prescribed period: the question of who exactly would take responsibility for the data's safekeeping did not appear to have occurred to the committee, nor had the appropriateness or feasibility of holding onto every last dataset.

Past experience shows that leaving responsibility with the researchers themselves is problematic: they move institutions, retire, and replace or lose equipment. Furthermore, this approach exposes the institution to the risk of being unable to locate particular data within the FOI notice period.

And, echoing Reza's point from earlier in the morning's session, Andrew ended with a couple of notes of caution, the first being: beware lack of sanctions. If there is no sanction for bad data practice, no one takes responsibility for ensuring good practice. The second was: beware the tick box. Experience indicates that policies underpinned by standardised forms breed a complacent 'box-ticking' culture, which generally fails to engender a true engagement with the spirit of the policy, and compromises effective risk assessment and risk management.

Breakout Groups and Subsequent Discussion

After lunch, the delegates divided into three breakout groups:

  1. Identifying a good practice checklist for an HEI dealing with sensitive data
  2. Assessing optimal technical approaches for data curation systems and services
  3. Scoping training requirements for UK HE data managers

Once the delegates had reconvened, the last formal item of the event was a summary of the points raised and next steps, provided by DCC Associate Director Graham Pryor. Considering the morning's presentations, he identified the human infrastructure as a key strand running through each of the situations that had been explored. From here he saw the potential for a significant redefinition of institutional repositories, which should be perceived less as a locale or physical entity and instead promoted as a bundle of services: an advisory role (data policy and compliance, the benefits and techniques of preservation/curation), undertaking and supporting data audits, and training. A prerequisite to success would be for repository and data management staff to engage more actively in a direct relationship with the research project lifecycle and its protagonists.

The issue of applying appropriate and achievable sanctions to non-compliance with measures for dealing with sensitive data raised the prospect of a number of tricky scenarios that would have to be balanced against the more open-handed approach necessary for buy-in to effective data management. It was suggested that the DCC training programme may contribute here by including courses for researchers on data ethics and managing sensitive data.

There were also some messages for the DCC coming from the three breakout groups. Looking at the needs of a good practice checklist, the DCC should be expected to lend its authority by providing advocacy at a high level within institutions, by explaining best practice models, and through targeted training. The concept of a pilot programme was mooted, and this will be explored in the context of the new DCC data management roadshows.

With respect to optimal technical approaches, it was evident that the capability of practitioners would be a first principle, and this again brought everyone's attention back to the DCC training programme. It would also be essential to have a framework of policies for data handling – and buy-in to those policies! – which returns us to the old chestnut of appropriate and effective advocacy, another primary objective of the new roadshows.

The third breakout group, having covered a theme that is core to the DCC's recently commenced Phase 3, produced fresh demands in the shape of a national standard 'driving licence' for data management. However, calls for data management to be embedded in broader researcher development were thought to be met by the DCC's existing programme of 'Train the Trainer' events. Nonetheless, the DCC mission to increase both capability and capacity was given further definition in the ensuing debate, where, for instance, it was acknowledged that university ethics committees as well as IT managers need to gain a broader perspective on the research lifecycle than their offices currently permit. Moreover, it was felt that the need for training in data management should be clearly identified in the data management plans being developed as part of research funding submissions.

Conclusion and Next Meeting

Closing the event, DCC Director-Designate Kevin Ashley introduced himself as having quickly to occupy the 'big shoes' left behind by his predecessor, Chris Rusbridge, but explained that from an early age he had been required to grow into his clothes, and so was up for the challenge! Referring to the DCC having recently entered its third phase, but with a reduced budget and staff, he complimented the RDMF as a means of engaging with a diverse community of stakeholders. This latest event had tackled some very challenging issues, offering serious food for thought as the DCC reflected on its next three-year programme of work.

The Forum's fifth meeting is expected to be held in October or November 2010, again in Manchester. Full details will be released via the JISC Research Data Management mailing list [5] nearer the time, so subscribe to this if you would like to be kept in the loop.

References

  1. Digital Curation Centre (DCC) http://www.dcc.ac.uk/
  2. Research Information Network (RIN) http://www.rin.ac.uk/
  3. All presentation slides are available via the RDMF4 event page on the DCC Web site
    http://www.dcc.ac.uk/events/research-data-management-forum/research-data-management-forum-dealing-sensitive-data
  4. See, for example, "Hackers target leading climate research unit". BBC News: Science and Environment: 20 November 2009, 14:13 GMT
    http://news.bbc.co.uk/1/hi/sci/tech/8370282.stm
  5. JISCMail - RESEARCH-DATAMAN List at JISCMAIL.AC.UK
    http://www.jiscmail.ac.uk/lists/RESEARCH-DATAMAN.html

Author Details

Martin Donnelly
Curation Research Officer
Digital Curation Centre
University of Edinburgh

Email: martin.donnelly@ed.ac.uk
Web site: http://www.dcc.ac.uk/

Graham Pryor
Associate Director
Digital Curation Centre
University of Edinburgh

Email: graham.pryor@ed.ac.uk
Web site: http://www.dcc.ac.uk/
