Web Magazine for Information Professionals

How to Publish Data Using Overlay Journals: The OJIMS Project

Sarah Callaghan, Sam Pepler, Fiona Hewer, Paul Hardaker and Alan Gadian describe the implementation details that can be used to create overlay journals for data publishing in the meteorological sciences.

The previous article about the Overlay Journal Infrastructure for Meteorological Sciences (OJIMS) Project [1] dealt with an introduction to the concept of overlay journals and their potential impact on the meteorological sciences. It also discussed the business cases and requirements that must be met for overlay journals to become operational as data publications.

There is significant interest in data journals at this time, as they could provide a framework for the peer review and citation of datasets, thereby encouraging data scientists to ensure their data and metadata are complete and valid, and granting them academic credit for this work. The wider community would also benefit, as data publication would ensure that expensive (and often irreproducible) data are archived and curated appropriately. Science, as a discipline, benefits from publishing processes that facilitate the appropriate application of data and the reproducibility of experiments.

The OJIMS Project aimed to develop the mechanisms that could support both a new (overlay) Journal of Meteorological Data and an Open-Access Repository for documents related to the meteorological sciences. Its work was conducted by a partnership between the Royal Meteorological Society (RMetS) and two members of the National Centre for Atmospheric Science (NCAS), namely the British Atmospheric Data Centre (BADC) and the University of Leeds.

This article goes into more technical detail about the OJIMS Project, giving details of the software used to deploy a demonstration data journal and operational document repository and the form of the submission processes for each.

OJIMS Aims and Objectives

Aims

At the start of the OJIMS Project, there were three fundamental aims:

  1. Creation of overlay journal mechanics
  2. Creation of an open access subject-based repository for meteorology and atmospheric sciences
  3. Construction and evaluation of business models for potential overlay journals

The third aim has been detailed in our previous article [1], so this contribution will concentrate on the details of the first two aims.

Objectives

The specific objectives of the project were as follows.

Repository Set-up

Set up a repository for meteorology and atmospheric sciences capable of preserving documents relating to the subject area with the following in mind:

  1. The repository should take peer-reviewed publications, 'grey' literature (which includes technical reports, images, video, podcasts etc.) and structured metadata documents.
  2. Create the repository's deposit and access policies.

Demonstration Overlay System

Create a demonstration overlay journal system with the following aspects addressed:

  1. The system must present an online journal to the reader, and be capable of organising the workflows associated with the peer-review process.
  2. Construct a prototype data journal (MetData) in order to evaluate its sustainability. This will include review procedures, presentation and trial content.
  3. Construct a prototype 'star-rated' overlay journal (MetRep) in order to evaluate its sustainability. This will include review procedures, presentation and trial content.

Most of these objectives remained the same over the course of the project, though time spent working on the prototype 'star-rated' journal was reduced in order to spend more time on the construction of the prototype data journal. This was decided after in-depth user surveys (as reported in [2] [3]) suggested that the meteorological and atmospheric science communities were more interested in a data journal than the provision of a 'star-rated' overlay journal (mainly due to the low levels of documents in pre-existing repositories). It should be pointed out that the software developed to provide the overlay documents for the data journal is nonetheless equally applicable to the 'star-rated' journal.

However, after examining the business models, we discovered that the creation and operation of the data and 'star-rated' journals themselves lay outside the project scope, as such work requires a long-term commitment from a journal publisher.

Methodology

The main project issues were:

Figure 1 gives an overview of the components required for this project and their interactions. It is worth noting that the software requirements for the data journal and the overlay subject repository are very similar, hence the same basic software (with minor modifications) can be used for both the data journal and overlay subject repository.

Figure 1: Schematic diagram of the project, detailing the software and procedures (blue ovals) and repositories and processes (square boxes) that are required to build an overlay MetRep subject repository and an overlay MetData data journal

The OJIMS Project Web site was produced to act as a dissemination point for the results of the project, and as a collaboration tool for the project partners. The Web site [4] will remain operational for several years after the project ends to publicise the project results.

Implementation

The work of the OJIMS Project was conducted by a partnership between the Royal Meteorological Society (RMetS) and two members of the National Centre for Atmospheric Science (the British Atmospheric Data Centre and the University of Leeds).

Building the National Centre for Atmospheric Science (NCAS) Document Repository

A key deliverable of the OJIMS Project was to create a discipline-based open access document repository embedded within the BADC. There were two main requirements for the subject repository:

  1. A suitable place to lodge grey literature
  2. Mechanics for the creation of records that describe documents in other repositories (overlay documents)

The overlay document requirements are considered in the data journal developments (see Creating the Infrastructure for Overlay Journals) so the subject repository development concentrated on identifying how to provide a suitable place to lodge grey literature.

The deposit policy, documentation and training process for maintenance of the repository system were all developed during the project. The full deposit policy is available on the repository site [5]. It is broken down into separate metadata, data, content, submission and preservation policies. Key parts of the policy are that anyone can access the metadata, full-text and other full data items stored in the repository free of charge, and that items stored in the repository will be retained indefinitely.

Implementation of the subject repository was done by installing the EPrints software (version 3) on a Xen (virtual server) platform running Red Hat Enterprise. The basic configuration was supplemented by:

After the repository had been populated with some sample content and BADC staff had been trained to administer it, it was launched on 30 October 2008 and advertised to BADC users. Documents already held by the BADC and NEODC were added to the repository. The repository has been running operationally since launch as the Centre for Environmental Data Archival Document Repository (CEDA Docs [9]).

The repository has the standard EPrints interface with the addition of the tags and comments extensions from the SNEEP Project. The standard repository workflows apply. The repository currently has over 200 items, mainly added by BADC staff from existing material held within the data centre, and 27 registered users.

The OJIMS Project provided the funding to run the CEDA document repository for a year, with the principal expenditure devoted to moderating the deposit of new items into the repository. The sustainability and cost modelling of the repository were also investigated, and the costs of running the repository within the BADC in the long term were not found to be prohibitive. Hence the repository will be maintained for the foreseeable future now that the OJIMS Project has ended.

Figure 2: Screenshot of the CEDA document repository

Creating the Infrastructure for Overlay Journals

The infrastructure requirements for the overlay journals are similar, regardless of whether the overlay journal is a data journal or a 'star-rated' journal. The project team examined current overlay infrastructure tools and technologies and chose the Open Journal Systems (OJS) because of its open source nature and its ease of adaptation. A series of interfaces and forms was generated for the publishers and authors, including a peer-review management interface and issue construction interface for publishers, and a submission interface form for authors.

Overlay Documents for the Repository and Data Journal

An overlay document is a structured document that is created to annotate another resource with information on the quality of the resource. This document can be referred to as the data description document. However, it contains more than just a description of the data, including, for example, details of the review process context for which it is constructed. It is for this reason that the term 'overlay document' has been coined. The document has three basic elements:

When considering how to encode this information, project staff considered various implementation methods; as this is an annotation document, RDF seemed appropriate. It is potentially harder to render RDF documents for human readers because of RDF's more complex data representation, but as the structure of these documents is not overly complex, it can be done. We took inspiration from annotations of Flickr photos by Masahide Kanzaki [10].
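To give a flavour of what an RDF annotation carrying Dublin Core fields might look like, the following is a minimal sketch only. It is not the OJIMS schema: the function name, the review namespace and the property name reviewProcess are hypothetical, chosen for illustration. Only the RDF and Dublin Core namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

# Standard RDF and Dublin Core element namespaces.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Hypothetical namespace for review information (an assumption, not from OJIMS).
REVIEW_NS = "http://example.org/overlay/review#"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("rev", REVIEW_NS)


def build_overlay_document(dataset_url, title, creator, review_process):
    """Return an RDF/XML string that annotates the resource at dataset_url."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # rdf:about names the overlaid resource; the dataset itself is untouched.
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": dataset_url},
    )
    ET.SubElement(description, f"{{{DC_NS}}}title").text = title
    ET.SubElement(description, f"{{{DC_NS}}}creator").text = creator
    # Hypothetical property recording the review context of the annotation.
    ET.SubElement(description, f"{{{REVIEW_NS}}}reviewProcess").text = review_process
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_overlay_document(
        "http://example.org/datasets/rain-gauge",  # placeholder URL
        "Rain gauge measurements",
        "A. Scientist",
        "Reviewed under the journal's standard data review procedure.",
    ))
```

Because the annotation points at the dataset by URL rather than embedding it, the same pattern would serve equally for a document held in another repository.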

Only openly available software was used to create the overlay document editor and the structure for the data journal. Any modifications made to the software during the project have been made freely available in the Subversion repository on the OJIMS Web site [4].

The creation of the overlay documents used in the overlay journals required a custom-built editor system. This was written using the Pylons Web application framework. The editor system supported creation of documents with XML schema, Dublin Core fields for the overlay documents themselves and, for the overlaid dataset, metadata for the data centre. The OJIMS editor is also freely available from the Subversion repository on the OJIMS site and will remain there for the foreseeable future.

Policies and Procedures for the 'Star-rated' and Data Overlay Journal

This work, led by the RMetS, concentrated on producing viable business plans, as well as submission and acceptance policies for the data and 'star-rated' journal.

The main tasks for the data journal included:

For the 'star-rated' overlay journal, the tasks included:

Both types of overlay journal required sustainability and business modelling. Full details of the policies and procedures for data and star-rated journals can be found in the business models report [11].

For the data journal, the acceptance policy for datasets depends on two things: the subject area covered by the journal, and whether the dataset is stored in an existing data centre that satisfies standards of good practice in archiving and data management and is registered with the data journal. For example, for a data journal specialising in meteorological data, a dataset of rain gauge measurements stored in the BADC (or other accredited data centre) would be appropriate for publication, while a dataset on road traffic flows would not.
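The two conditions in this acceptance policy can be expressed as a simple check. The sketch below is illustrative only: the Dataset class, the is_eligible function and the subject labels are hypothetical names, and a real journal would apply editorial judgement rather than exact string matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record of a dataset submitted to the data journal."""
    title: str
    subject: str
    data_centre: str


def is_eligible(dataset, journal_subjects, registered_centres):
    """A dataset qualifies only if it is in scope AND held by a registered centre."""
    in_scope = dataset.subject in journal_subjects
    properly_archived = dataset.data_centre in registered_centres
    return in_scope and properly_archived


if __name__ == "__main__":
    subjects = {"meteorology"}   # assumed subject label
    centres = {"BADC"}           # data centres registered with the journal
    rain = Dataset("Rain gauge measurements", "meteorology", "BADC")
    traffic = Dataset("Road traffic flows", "transport", "BADC")
    print(is_eligible(rain, subjects, centres))
    print(is_eligible(traffic, subjects, centres))
```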

The contents of the data journal could be categorized in the following ways:

For the overlay journal and document repository, two types of ratings for the referenced documents were proposed. The first rating advises readers on how far the material has gone through the independent peer-review process, giving four ratings as explained in Figure 3.

Figure 3: Example method of rating the contents of the repository/overlay journal according to its level of peer review [11].

The second form of rating comes from the users of the overlay journal (Figure 4), where users could rate the entry out of 10. The average rating would be displayed alongside the number of reviews and number of downloads.

Figure 4: Example form of rating from the users of the subject repository or overlay journal [11].
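The figures shown to the reader under this second form of rating (the average score, the number of reviews and the number of downloads) could be computed along the following lines. This is a sketch under stated assumptions: the function name rating_summary, the dictionary keys and the rounding to one decimal place are illustrative choices, not part of the proposal in [11].

```python
from statistics import mean


def rating_summary(ratings, downloads):
    """Summarise user ratings (each out of 10) for display beside an entry."""
    for rating in ratings:
        if not 0 <= rating <= 10:
            raise ValueError(f"rating out of range: {rating}")
    # No average is shown until at least one user has rated the entry.
    average = round(mean(ratings), 1) if ratings else None
    return {"average": average, "reviews": len(ratings), "downloads": downloads}


if __name__ == "__main__":
    print(rating_summary([7, 8, 10], downloads=42))
```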

The Data Journal

A demonstration overlay journal system used to produce a data journal has the following requirements:

The production of an overlay document repository can be done using an analogous process.

Figure 5 gives a schematic view of the data journal structure. The data journal contains a database of XML documents relating to various published datasets. These XML data description documents contain links to the datasets as they are published in various accredited data repositories. The data journal editor edits these XML files, but does not make any changes whatsoever to the underlying datasets.

Figure 5: Schematic of MetData structure

The approach taken in developing the demonstration system was to use standard online journal technologies as far as possible, thereby introducing all the functions of journals without engineering new solutions. Various online journal systems were considered, including the Open Journal Systems (OJS), Digital Publishing System (Dpubs) and Hyperjournal. OJS was chosen because of its open source nature and its ease of adaptation. The RIOJA [12] Project also used this software for exactly these reasons.

The approach used was to add the data description documents into the standard workflow of the journal software. The additional elements needed were a tool to author the data description documents and a method to render the documents.

To create these documents a Web-based authoring tool was developed. This was done using the Pylons Web application framework, which allows the rapid development of Web applications in the Python programming language. The code for this application is available from the Subversion repository on the OJIMS Web site [13]. The editor requires input of metadata about the overlaid dataset and other information such as the author of the document. It also adds information set and constrained by the data journal's review processes. For example, a text description of the review process is the same for all documents and is simply inserted from the editor's configuration.
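The split between fields entered by the author and fields fixed by the editor's configuration can be illustrated as follows. This is a minimal sketch and not the OJIMS editor code: the field names, the configuration values and the function assemble_document are all hypothetical.

```python
# Hypothetical editor configuration: values the journal fixes for every document.
FIXED_FIELDS = {
    "format": "overlay-document",  # assumed value, for illustration only
    "review_process": "Standard data review procedure of the journal.",
}


def assemble_document(author_fields, fixed_fields=FIXED_FIELDS):
    """Combine author-supplied metadata with configuration-controlled fields."""
    # Authors may not override anything the journal's review process constrains.
    clashes = sorted(set(author_fields) & set(fixed_fields))
    if clashes:
        raise ValueError(f"fields set by configuration cannot be edited: {clashes}")
    document = dict(author_fields)
    document.update(fixed_fields)
    return document


if __name__ == "__main__":
    print(assemble_document({
        "title": "Rain gauge measurements",
        "author": "A. Scientist",
        "dataset_url": "http://example.org/datasets/rain-gauge",  # placeholder URL
    }))
```

Rejecting clashes outright, rather than silently overwriting them, makes it obvious to a submitting author which parts of the document are under the journal's control.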

The XML documents produced by the editor were rendered into a human-readable document using an XSLT style sheet when viewed through the data journal interface (see screenshots below).

Figure 6: The front page of the journal demo. There is a link from the front page and from the submission page to the overlay document editor

Figure 7: The overlay document creation tool. This page is where a new document can be created either online or by uploading an existing document in the same format which has been created by other means

Figure 8: The document in the editing stage. Some fields are editable, others (e.g. format) are set by the configuration of the editor

Figure 9: After submission and review, the documents are viewable in the same ways as any other online journal. The contents of a demonstration journal issue are shown in this screenshot

Figure 10: The link to the item allows the Web browser to render the document using an XSLT style sheet

Outcomes of the OJIMS Project

The main project achievements have included:

Impact on the Meteorological Sciences Research Community

A significant part of the OJIMS Project work was the survey of scientists and organisations, which served both to introduce the work the project was doing and to capture the requirements for the data journal and document repository. The results from these surveys are documented in the reports OJIMS Survey of Organisations [2] and OJIMS Survey of Scientists [3].

These surveys, together with presentations at conferences and meetings, served to kick-start a community debate on what materials need archiving and which should be regarded as 'publication-quality'. The OJIMS Project has a high profile within the repository and atmospheric science communities. At the recent NERC Data Management Workshop (February 2009 [14]) the OJIMS Project was mentioned in more than one keynote speech, with special emphasis on the data journal and its potential ability to provide academic credit for those data scientists who publish their data.

Conclusions and Recommendations

The OJIMS Project has demonstrated that standard online journal technologies are suitable for the development and operation of a data journal as they allow the use of all the functions of journals without the need to engineer new solutions.

OJIMS also showed that there is a significant desire in the meteorological sciences community for a data journal, as this would allow scientists to receive academic recognition (in the form of citations) for their work in ensuring the quality of datasets. The funders of the research that produces these data also benefit from data publication as it raises the profile of the data, ensuring reuse. Furthermore, such publication encourages the scientists involved to submit to accredited data repositories, where their data will be properly archived.

With regard to standards, the OJIMS data journal system chosen was the Open Journal Systems (OJS) and the repository software was EPrints. Both OJS and EPrints were chosen because of their open source nature and their ease of adaptation. They also offer standard interfaces such as OAI-PMH [15].
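As an indication of what such a standard interface offers, an OAI-PMH harvester requests records by adding a small set of query parameters to a repository's base URL. The sketch below only constructs such a request and does not contact any server. The verb ListRecords, the metadata prefix oai_dc and the from argument are defined by the protocol [15]; the function name and the base URL are placeholders, and the actual endpoint of any given repository is an assumption to be checked against that repository.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL for the given base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder base URL; substitute the repository's real OAI-PMH endpoint.
    print(list_records_url("http://example.org/oai", from_date="2009-01-01"))
```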

The overlay document schema incorporated Dublin Core metadata and used RDF to encode the needed information.

The project endeavoured to make use of pre-existing and mature software to implement the document repository and the overlay journal infrastructure, modifying it as appropriate. This was to ensure ease of use and stability of the resulting software.

The OJIMS Project recommends that further work be done on the implementation and operation of a data journal. The authors are aware of one data journal currently in operation, the Earth System Science Data Journal (ESSD) [16], which has four papers in its library at the time of writing.

Acknowledgements

The authors would like to acknowledge the Joint Information Systems Committee (JISC) as the principal funder of the OJIMS Project under the JISC Capital Programme call for Projects, Strand D: 'Repository Start-up and Enhancement Projects' (4/06). Complementary funding was provided by NCAS through the BADC core agreement, and also by the Natural Environment Research Council.

References

  1. Sarah Callaghan, Fiona Hewer, Sam Pepler, Paul Hardaker and Alan Gadian, "Overlay Journals in the Meteorological Sciences", July 2009, Ariadne, Issue 60 http://www.ariadne.ac.uk/issue60/callaghan-et-al/
  2. Fiona Hewer, OJIMS Survey of Organisations, Version 2.0, March 2009
    http://proj.badc.rl.ac.uk/ojims/attachment/wiki/WikiStart/FRK_RMetSOJIMS_SurveyOfOrgsV2%209Mar2009.pdf
  3. Fiona Hewer, OJIMS Survey of Scientists, Version 2.0, March 2009
    http://proj.badc.rl.ac.uk/ojims/attachment/wiki/WikiStart/FRK_RMetSOJIMS_SurveyOfScientistsV2%209Mar2009.pdf
  4. Overlay Journal Infrastructure for Meteorological Sciences (OJIMS) - Trac http://proj.badc.rl.ac.uk/ojims
  5. Policies: CEDA Repository http://cedadocs.badc.rl.ac.uk/policies.html
  6. NERC Open Research Archive http://nora.nerc.ac.uk/
  7. Directory of Open Access Repositories http://www.opendoar.org/
  8. Social Networking Extensions for EPrints http://sneep.ulcc.ac.uk/wiki/index.php/Main_Page
  9. CEDA Repository http://cedadocs.badc.rl.ac.uk/
  10. Image Annotator: The Web Kanzaki http://www.kanzaki.com/docs/sw/img-annotator.html
  11. Fiona Hewer, OJIMS Business Models Report, March 2009
    http://proj.badc.rl.ac.uk/ojims/attachment/wiki/WikiStart/FRK_RMetSOJIMS_BusinessModelsV2p1.pdf
  12. Repository Interface for Overlaid Journal Archives (RIOJA) http://www.ucl.ac.uk/ls/rioja/
  13. OJIMS - Trac http://proj.badc.rl.ac.uk/ojims/browser
  14. 2009 Workshop Programme - NERC Data Management Workshop - CEH Wiki http://wiki.ceh.ac.uk/display/nercworkshop/2009+Workshop+Programme
  15. The Open Archives Initiative Protocol for Metadata Harvesting http://www.openarchives.org/OAI/openarchivesprotocol.html
  16. Earth System Science Data (ESSD) http://www.earth-system-science-data.net/

Author Details

Sarah Callaghan
Senior Scientific Researcher and Project Manager
British Atmospheric Data Centre

Email: sarah.callaghan@stfc.ac.uk
Web site: http://badc.nerc.ac.uk

Sam Pepler
Head, Science Support Group
British Atmospheric Data Centre

Email: sam.pepler@stfc.ac.uk
Web site: http://badc.nerc.ac.uk

Fiona Hewer
Environmental Consultant
Fiona's Red Kite

Email: fiona@fionasredkite.co.uk
Web site: http://www.fionasredkite.co.uk/

Paul Hardaker
Chief Executive
Royal Meteorological Society

Email: chiefexec@rmets.org
Web site: http://www.rmets.org/

Alan Gadian
Senior Research Lecturer
National Centre for Atmospheric Science

Email: a.gadian@see.leeds.ac.uk
Web site: http://www.ncas.ac.uk/weather/
