In 1999 Sir John Taylor, then Director General of the UK Research Councils, talked about e-Science, i.e. global collaboration in key areas of science and the next generation of infrastructure that will support it. It encompasses computationally intensive science that is carried out in highly distributed network environments or that uses immense datasets that require grid computing. In the US the term cyberinfrastructure has been used to describe the new research environments that support advanced data acquisition, data storage, data management, data integration, data mining, data visualisation and other computing and information processing services over the Internet. In Australia, as in other countries, the term eResearch extends e-Science and cyberinfrastructure to other disciplines, including the humanities and social sciences, and denotes the use of information technology to support existing and new forms of research.
It is within this rapidly evolving context that the researcher of the 21st century now operates. However, not all researchers are responding to changes in this new environment. In this article we will examine the current research paradigm, the main drivers for researchers to engage with this paradigm, reasons for lack of engagement, and a project undertaken at an Australian university, as part of a national initiative, to start to address the problem of making data from research activity, past and current, more discoverable and accessible.
Trends in the Evolution of Research Paradigms
Gray discusses the evolution of research paradigms which has led to the increasingly important role that information technology now plays in supporting research. For thousands of years there was experimental / empirical science, followed in the last few hundred years by theoretical science. The third research paradigm, which evolved in the last few decades, was characterised by increasingly complex research challenges based principally on large-scale computational simulation. This led to the concept of holistic systems of systems; the evolution from wet labs (hands-on scientific research and experimentation) to virtual labs; and an emphasis on modelling, simulation, projection and prediction.
E-Science / cyberinfrastructure / eResearch are short-hand terms for a new fourth paradigm which is characterised by data-intensive science. The focus is on data analysis and mining; pattern discovery; and the evolution of large databases and data archives. One by-product is the so-called 'data deluge', which has led to enormous challenges in research data management. The fourth paradigm is changing the longstanding model of scholarly communication as, in the words of Clifford Lynch, 'the paper becomes a window for a scientist to not only actively understand a scientific result, but also reproduce it or extend it'. This latest paradigm is also characterised by the collaborative and multidisciplinary nature of the research being undertaken at both national and international levels.
National and International Drivers for Change
Governments worldwide are investing in national research information infrastructures to drive national innovation. Because universities clearly have a central role in the generation of knowledge and innovation, they are major stakeholders in national innovation strategies. A university's research impact (the extent to which its research informs further research and practice) is a significant component of the university league table measures.
In May 2004 the Australian Government announced that it would establish Quality and Accessibility Frameworks for Publicly Funded Research as part of Backing Australia's Ability – Building our Future through Science and Innovation. The principal objective of the Research Quality Framework (RQF) initiative was to develop the basis for an improved assessment of the quality and impact of publicly funded research as well as an effective process to achieve this. The proposed Australian model, not surprisingly, had strong parallels with its British counterpart. However, as it happened, feedback and submissions from the sector in response to various issues took place prior to the announcement in March 2006 that the UK government was going to change its Research Assessment Exercise (RAE) model.
Ultimately both countries adopted amended research evaluation models which, however, retained the original principal objectives. The UK now has its Research Excellence Framework (REF), while Australia has just initiated its Excellence in Research for Australia (ERA). At the same time New Zealand has introduced the Performance-Based Research Fund (PBRF). In Australia ERA will be used to measure the quality of Australian universities' research against international benchmarks as well as to inform funding decisions.
Another driver is the fact that good research requires good management. In recognition of this principle, many major research funders worldwide either currently have or are implementing policies that require grant holders to submit data management plans for formal approval and to manage their data in accordance with those plans. The National Science Foundation, for example, has mandated that data management plans will be subject to peer review.
In terms of the actual practice of eResearch, stellar examples in Australia include supercomputing (e.g. climate modelling), the human physiome project, bioinformatics, the Australian Synchrotron, and preservation of languages and cultures. However, these are but 'islands of excellence in oceans of ignorance'. Australia is not unique in this regard.
Why, therefore, despite all the clear indications of how research is evolving, is there such a low uptake of eResearch? The answer can be found in a model developed to explain the lifecycle for the adoption of technology. The technology adoption lifecycle is a sociological model developed by Joe M. Bohlen, George M. Beal and Everett M. Rogers at Iowa State University, building on earlier research conducted there by Neal C. Gross and Bryce Ryan. Subsequently, in his book Crossing the Chasm, Geoffrey Moore proposed a variation of the original lifecycle. He suggests that for 'discontinuous or disruptive innovations', there is a gap or chasm between the first two adopter groups (innovators / early adopters) and the early majority.
Moore's technology adoption lifecycle graph describes how technology enthusiasts and visionaries are the first to embrace a new technology, followed by a frustrating period of time before the pragmatists (early majority) start to utilise the technology, followed by conservatives (late majority) and finally sceptics (laggards).
Using Moore's model for the adoption of technology, we are at that point on the chart where eResearch needs to move across the 'chasm' which separates early adopters from the early majority.
Deterrents to Crossing the Chasm
With increased opportunity comes increased complexity. Across all disciplines researchers are expressing concern about:
- duplication of effort
- loss or difficulty in recovering data for use in future research projects
- security of confidential data
- data organisation
- data backups and archiving data for long-term preservation
- data sharing or publishing
- data ethics
- data synchronisation
Undoubtedly one of the largest deterrents is the effort required of researchers not only to locate their data, but also to format it for sharing. This highlights the lack of implementation of clearly defined data standards, e.g. formats and metadata. Compounding the problem, even if researchers are prepared to allocate resources to data sharing, it is unquestionably a lot of work, and in many cases there are no mechanisms in place to facilitate the process. The latter is particularly true of the newer data types.
Researchers are caught in the tension between being pushed to adopt eResearch practices and resistance to change because of perceived obstacles.
Addressing the Challenge
In developing and supporting research infrastructure, it is clear that the content (and content is used here to encompass all research output) will not achieve critical mass by virtue of individual voluntary effort. It is a huge task which should not be left to non-profit organisations and individual universities, writes James Boyle, a Duke University law professor and founding board member of Creative Commons. Instead the energy must shift to a coordinated effort between institutions, particularly universities, and the national government. This need for high-level collaboration has been echoed in a recent report to the European Commission. In addition, as O'Brien observes, individuals are important to the outcome. The infrastructure must build 'a bridge between researchers, university and national priorities'. Such an infrastructure will, by its very nature, help bridge the eResearch chasm.
Australian National Agenda
As part of the Australian government's NCRIS (National Collaborative Research Infrastructure Strategy) initiative, the Australian National Data Service (ANDS) was formed to support the 'Platforms for Collaboration' capability. The service is underpinned by two fundamental concepts:
- with the evolution of new means of data capture and storage, data has become an increasingly important component of the research endeavour, and
- research collaboration is fundamental to the resolution of the major challenges facing humanity in the twenty-first century.
With a view to increasing the visibility / discoverability of Australian research data collections, ANDS is building the Research Data Australia (RDA) service. It consists of Web pages describing data collections produced by or relevant to Australian researchers. RDA publishes only the descriptive metadata; it is at the discretion of the custodian whether access, i.e. links, will be provided to the corresponding data. Behind RDA lies the Australian Research Data Commons (ARDC), which is a combination of the set of shareable Australian research collections, the descriptions of those collections including the information required to support their reuse, the relationships between the various elements involved (the data, the researchers who produced it, the instruments that collected it and the institutions where they work), and the infrastructure needed to enable, populate and support the Commons.
Griffith University: Creating a Framework for eResearch
Griffith University has received NCRIS grant funding for research data identification and discovery. Griffith's Seeding the Data Commons project has captured data about the University's research datasets, has assessed each dataset and determined appropriate access, and has then published 1,100+ records to Research Data Australia. In addition Griffith University is the lead partner with the Queensland University of Technology in an ANDS-EIF (Education Investment Fund) project to develop a middleware software solution which will aggregate data sources from within the University for uploading to Research Data Australia.
In a university with an active, broad research programme, it is to be expected that research data collections will reside in a range of different repositories, e.g. specialised discipline-specific repositories for stem cell research, historical data, and environmental data. To participate in Australia's collaborative research infrastructure, universities will need to generate and collate a consistent metadata feed with which to populate Research Data Australia (RDA).
The Research Activity (Metadata Exchange) Hub is a joint Griffith University and Queensland University of Technology (QUT) project, funded by ANDS, for the purpose of developing a master collection of research data within the respective institutions, along with an automated update (feed) to Research Data Australia. The hub collects appropriate metadata from research collections (at the content metadata level where possible) within the University through customised feeds from the various university content management systems. Also where authoritative source metadata is held in University corporate systems, feeds extract data directly from those databases. This hub then acts as a central university repository to feed information in a standard format to Research Data Australia as well as university library discovery tools and other research federations where appropriate. The overall project objectives are:
- to develop a sustainable solution to automate the collection of new research data held within the University and to populate RDA; and
- to provide exemplars / good practice for Australian universities which want to be part of the national collaborative research infrastructure.
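The aggregation role described above can be illustrated with a minimal sketch. It assumes two hypothetical feeds (an institutional repository and a corporate system) whose native field names differ; the record shape, field names and sample values are all invented for illustration and do not reflect the project's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class HubRecord:
    """Common shape held by the hub, whatever the source system (hypothetical)."""
    source: str    # which feed the record came from
    local_id: str  # identifier within that source system
    title: str
    creator: str


# Hypothetical native records: each system uses its own field names.
REPOSITORY_FEED = [
    {"handle": "repo-001", "dc_title": "Reef survey data", "dc_creator": "A. Researcher"},
]
CORPORATE_FEED = [
    {"project_code": "P-42", "name": "Stem cell imaging", "lead": "B. Investigator"},
]


def from_repository(row: Dict[str, str]) -> HubRecord:
    # Map repository-style fields onto the common record.
    return HubRecord("repository", row["handle"], row["dc_title"], row["dc_creator"])


def from_corporate(row: Dict[str, str]) -> HubRecord:
    # Map corporate-system fields onto the same common record.
    return HubRecord("corporate", row["project_code"], row["name"], row["lead"])


def aggregate() -> List[HubRecord]:
    """Collect every feed into one list of uniformly shaped records."""
    records = [from_repository(r) for r in REPOSITORY_FEED]
    records += [from_corporate(r) for r in CORPORATE_FEED]
    return records


hub_records = aggregate()
```

Keeping one small mapping function per source means a new repository can be added without touching the common record or the downstream feed.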
The following diagram depicts the role that the metadata hub plays in aggregating data and identifying the relationships between key data elements. In addition, two external interfaces are required to complete the metadata picture; these are needed to utilise persistent identifiers from ANDS and from the National Library of Australia People Australia service. The end result is that this service not only integrates and aggregates data within the institution, but also provides a key link into national systems.
Given that the architecture that defines the hub must be open source, the decision was taken to implement a loosely coupled solution based on the Vitro software developed at Cornell University, which is an open source Integrated Ontology Editor and Semantic Web Application. The project is using this solution to support a research-focused ontology and to establish relationships between researchers and organisations, research collections, research activities (e.g. projects) and services. The project uses several other open source components, e.g. Persistent ID generator, OAI-PMH provider and data integrator. This approach has enabled maximum use of existing software and best use of programming time.
The following diagram is a simple illustration of the Metadata Exchange Hub components. In the following diagrams Vitro is labelled 'VIVO', Cornell's implementation of Vitro, which has been deployed here without changes to the underlying software architecture.
Research activity metadata is uploaded to Research Data Australia (RDA) using the Registry Interchange Format - Collections and Services (RIF-CS). This data interchange format is based on ISO 2146:2010 Information and documentation -- Registry services for libraries and related organizations. In addition, an important part of the project has involved the development of a national research-focused ontology, based on the core Vitro ontology, which has been successfully deployed in the first version of the tool. This ontology employs components of a number of established ontology standards and describes the relationships between them. Collectively they provide a coherent framework for mapping the bulk of institutional research activity in Australia. The table below lists all ontologies that are included in this customised version of Vitro.
ANDSHarvest (includes RIF-CS)
Dublin Core elements
Dublin Core terms (includes RIF-CS)
Event Ontology (includes RIF-CS)
FOAF (includes RIF-CS)
FOR 2008 Ontology
SEO 2008 Ontology
SKOS (Simple Knowledge Organization System) (includes RIF-CS)
Vitro public constructs
VIVO core 1.0 (includes RIF-CS)
Internal vitro/vivo ontology
Griffith University Profile Extensions
Internal vitro/vivo ontology
The architecture of the Hub has been designed to allow for automatic machine-to-machine communication for the ingest of university research activity data. In the first step (Figure 4) previously identified relevant metadata is harvested from university repositories, data stores and corporate systems in its native form.
The next step (Figure 5) is to check if persistent identifiers exist for any people or projects in national systems. Key national systems are operated by the National Library of Australia (Trove, People Australia), the Australian Research Council (ARC), and the National Health and Medical Research Council (NHMRC). In the case of Trove and ANDS, if none exist, requests are made (machine to machine) to create an ID, e.g. a new researcher person ID.
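The check-then-create logic of this step can be sketched as follows. The registry class is an in-memory stand-in written purely for illustration; it is not the interface of Trove, People Australia or ANDS, and the identifier format is an assumption.

```python
import itertools
from typing import Dict, Optional


class IdentifierRegistry:
    """In-memory stand-in for a national identifier service (hypothetical)."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix
        self._ids: Dict[str, str] = {}
        self._counter = itertools.count(1)

    def lookup(self, name: str) -> Optional[str]:
        # Return the existing identifier, or None if the name is unknown.
        return self._ids.get(name)

    def mint(self, name: str) -> str:
        # Create and remember a new identifier for this name.
        identifier = f"{self._prefix}-{next(self._counter):04d}"
        self._ids[name] = identifier
        return identifier


def resolve_identifier(registry: IdentifierRegistry, name: str) -> str:
    """Reuse an existing identifier where one exists; otherwise request a new one."""
    existing = registry.lookup(name)
    return existing if existing is not None else registry.mint(name)


registry = IdentifierRegistry("party")
first = resolve_identifier(registry, "A. Researcher")
again = resolve_identifier(registry, "A. Researcher")
other = resolve_identifier(registry, "B. Investigator")
```

Looking up before minting is what keeps one researcher from accumulating several identifiers across repeated harvests.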
The final step (Figure 6) is to upload the file to the RDA for publication. This is done by making the RIF-CS formatted metadata available for harvest via an OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) interface using the OAI-CAT component. Research Data Australia periodically harvests the new and updated institutional data via this interface.
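From the harvester's side, an incremental OAI-PMH harvest is an HTTP request carrying the `ListRecords` verb, a `metadataPrefix` and an optional `from` date. The sketch below only builds such a request URL; the base URL is a placeholder, and the prefix value `rif` is an assumption about how the provider is configured rather than something stated in this article.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str,
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    `verb`, `metadataPrefix` and `from` are standard OAI-PMH request arguments.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Restrict the harvest to records added or changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and assumed prefix, for illustration only.
url = list_records_url("https://hub.example.edu/oai", "rif", "2010-06-01")
```

Passing a `from` date lets a periodic harvest fetch only new and updated records instead of the whole collection.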
The connectivity between research data and researchers is important, especially for purposes of reuse and in cross-disciplinary research. Identifying relationships between people, projects and institutions, for example, enhances opportunities for collaboration and new research. An important part of Griffith University's Metadata Exchange Hub is to expose the relationships, using RIF-CS, among researchers, their projects and their research outputs, as illustrated in Figure 7.
A more detailed analysis of the technical aspects of the implementation was presented at the 2010 VIVO conference.
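To make the value of these relationships concrete, the sketch below stores them as simple subject-relation-object triples and answers two questions: which collections a researcher is connected to, and who their project collaborators are. The identifiers and relation names are invented for illustration; they are not the RIF-CS relation vocabulary or the project's ontology.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# (subject, relation, object) triples; all names are hypothetical.
TRIPLES: List[Triple] = [
    ("researcher:a", "participatesIn", "project:reef"),
    ("researcher:b", "participatesIn", "project:reef"),
    ("project:reef", "produces", "collection:reef-survey"),
    ("researcher:a", "memberOf", "org:university"),
]


def build_index(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index outgoing relationships by subject for quick traversal."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
    return index


def collections_for(researcher: str, index: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Follow researcher -> project -> collection links."""
    found: List[str] = []
    for relation, project in index.get(researcher, []):
        if relation == "participatesIn":
            found += [o for r, o in index.get(project, []) if r == "produces"]
    return found


def collaborators_of(researcher: str, triples: List[Triple]) -> List[str]:
    """Other researchers who share at least one project with this one."""
    projects = {o for s, r, o in triples if s == researcher and r == "participatesIn"}
    return sorted({s for s, r, o in triples
                   if r == "participatesIn" and o in projects and s != researcher})


index = build_index(TRIPLES)
reef_collections = collections_for("researcher:a", index)
reef_collaborators = collaborators_of("researcher:a", TRIPLES)
```

Because the links are explicit, a profile page or discovery tool can traverse them in either direction without any source system knowing about the others.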
Implications for Researchers
Researchers are already providing feedback to Research Data Services (Division of Information Services) on corrections to metadata. In addition, exposing their research in RDA has made many of them more aware of Griffith's own institutional repository. They have been contacting the Division, requesting assistance in getting more of their publications (metadata and full-text files) into the Institutional Repository. It has been heartening to see the engagement from researchers. A support page for Research Data Australia on the Griffith Library Web site provides information for other interested researchers.
In addition researchers are seeing the potential for building a rich personal profile based on the relationships as shown above in Figure 7. Research Centre directors are seeing opportunities to take this one step further by using these relationships to develop research centre profiles. Griffith University in turn is building an institutional research profile. An increase in the collection of research data should increase opportunities for identifying collaborators for new research and identifying existing data for reuse.
As a consequence of researchers publishing this data, ideally more attention will be paid to research data management, including preservation. This initiative should also help to prepare researchers for new scholarly publishing paradigms, especially integration of data with publications. All of this work ideally will lead to improvements in research quality as well as an increase in the rate at which new discoveries are made and put to use.
In Moore's model, crossing the chasm (in marketing and sales terms) requires the following steps: target a specific market niche as the 'point of attack' and then focus all resources on achieving the dominant leadership position in that segment. In Australia, the Australian National Data Service (ANDS) has focused on research collaboration and has taken a leadership role in building a national collaborative research infrastructure. This strategy is designed to facilitate better engagement with eResearch by Australian researchers and to integrate with other international initiatives.
It was suggested at the beginning of this paper that coordinated efforts should be initiated at a national level between the government and institutions, particularly universities. Within this context, initiatives such as those undertaken at Griffith University start to build the next layer of infrastructure which connects individual researchers, universities and the national research community.
At an institutional level the Griffith Metadata Exchange Hub is a first step in building local infrastructure which helps address some of the deterrents for researchers to 'cross the chasm' by removing key technology barriers. It encourages researchers to see beyond the 'project' to the importance of making data from research activity, past and current, more discoverable and accessible.
The authors would like to acknowledge a speech presented at Griffith University in 2010 by Rob Cooke, CEO, Queensland Cyber Infrastructure Foundation, which inspired them to contextualise this discussion in terms of Moore's chasm. They would also like to acknowledge the work of the Research Collection Metadata Exchange Hub Project Team - a joint effort among Griffith University, Queensland University of Technology and the Australian National Data Service - for the technical aspects of this paper.
- Hey, Tony and Anne E. Trefethen (2002). "The UK e-Science Core Programme and the Grid", Future Generation Computer Systems 18(8): 1017-1031
- Hey, Tony, Stewart Tansley, and Kristin Tolle (eds) (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, Redmond,
- O'Brien, Linda (2010a). "The changing scholarly information landscape: reinventing information services to increase research impact",
ELPUB2010 - Conference on Electronic Publishing, Helsinki http://hdl.handle.net/10072/32050
- Richardson, Joanna (2006). "Research Quality Framework as a catalyst for open access",
AusWeb06, Lismore, Australia http://ausweb.scu.edu.au/aw06/papers/refereed/richardson/
- Higher Education Funding Council for England, Research Excellence Framework http://www.hefce.ac.uk/research/ref/
- Australian Research Council, Excellence in Research for Australia http://www.arc.gov.au/era/default.htm
- Tertiary Education Commission (NZ), Performance-Based Research Fund
- National Science Foundation (2010). "Scientists seeking NSF funding will soon be required to submit data management plans"
- Australian Synchrotron http://www.synchrotron.org.au/
- Cooke, Rob (2010). "eResearch services and advanced IT – the next generation", unpublished presentation, Griffith University, Australia
- Bohlen, Joe M. and George M. Beal (1957). "The diffusion process", Special Report No. 18 (Agriculture Extension Service, Iowa State College) 1: 56–77
- Moore, Geoffrey A. (1991). Crossing the Chasm. New York: HarperBusiness
- This figure is available, under Creative Commons Attribution 3.0 Unported License,
- Nelson, Bryn (2009). "Empty archives", Nature 461(7261):160-163 http://dx.doi.org/10.1038/461160a
- High Level Expert Group on Scientific Data (2010). "Riding the wave - how Europe can gain from the rising tide of scientific data. A submission to the European Commission".
- O'Brien, Linda (2010b). "Innovation, university research and information infrastructure: making sound investments in information infrastructure", EUNIS (European University Information Systems) Congress, Warsaw, June http://hdl.handle.net/10072/32064
- Sandland, Ron (2009). "Introduction to ANDS", Share: Newsletter of the Australian National Data Service, 1(July):1
- Australian National Data Service, Research Data Australia http://services.ands.org.au/home/orca/rda/
- Australian National Data Service, Australian Research Data Commons http://ands.org.au/guides/discovery-ardc.html
- National Library of Australia, People Australia http://www.nla.gov.au/initiatives/peopleaustralia/
- Vitro http://vitro.mannlib.cornell.edu/
- International Organization for Standardization, ISO 2146:2010 http://www.iso.org/iso/catalogue_detail.htm?csnumber=44936
- Australian National Data Service, Registry Interchange Format - Collections and Services (RIF-CS)
- Holley, Rose (2010). "Trove: Innovation in Access to Information in Australia", Ariadne, Issue 64, July
- OCLC, OAICat http://www.oclc.org/research/activities/oaicat/default.htm
- Buetow, Kenneth H. (2009). "Speeding research and development through a collaborative ecosystem", Collaborative Innovation in Biomedicine, Washington, DC
- Thelwall, Mike, Xuemei Li, Franz Barjak and Simon Robinson (2008). "Assessing the international web connectivity of research groups", Aslib Proceedings 60(1):18-31
- Rebollo, Robyn, Lance De Vine and Simon Porter (2010). "Building an Australian user community for VIVO", VIVO Conference, New York
- He-Ze Lin, Xue-Song Geng and Colin Campbell-Hunt (2009). "Research collaboration and research output: A longitudinal study of 65 biomedical scientists in a New Zealand university", Research Policy 38(2):306-317