Historically, scientific publishing has focused on publicising the methodology that a scientist uses to analyse a dataset and the conclusions drawn from that analysis, as this is the information that can easily be published in text format with supporting diagrams. Datasets do not lend themselves to conventional hard-copy publication, even when they are small enough to allow it, and are more useful when stored on digital media. This means that the peer-review process, which provides both scrutiny and validation of academic work, is generally applied only to the final conclusions and interpretations of a dataset. Some research areas, countries and organisations make the underlying datasets available, but these are often not tightly coupled to the publications that result from them, nor have they themselves been reviewed. This becomes a problem where the conclusions drawn from the analysis of datasets are of significant importance, either within the academic field or because the work has legal or policy implications. It is widely recognised that conclusions drawn from the analysis of a dataset must be based on valid data in order to be sound. Furthermore, as datasets become larger and more complex, a reliable method for the peer review of data is needed.
In the meteorological and climate sciences, information about weather and climate change is being scrutinised more than ever to meet the need for advice to policy-makers on greenhouse gas emissions and their consequences. Datasets of meteorological measurements such as air temperature, pressure, rain rates, etc. dating back centuries, are subject to increasing scrutiny and analysis in order to investigate and quantify the effects of climate change.
Overlay journals are a technology which is already being used to facilitate peer review and publication on-line. The availability of the technology enables a wider group of organisations to become publishers (e.g. the RIOJA Project). However, the technology is limited, in some cases, by the accessibility and functionality of what is overlaid, and business models are needed to achieve long-term sustainability of overlay journals.
Why Publish Data?
Peer-reviewing and publishing data has benefits for more than just the data scientists who create the datasets. It also benefits the funding bodies that pay for the data to be collected as well as the wider academic community.
Benefits for the Data Scientist
The data scientists who build, maintain, validate and collect the data for large databanks have to ensure that the data are of high quality and that the associated metadata and documentation are complete and understandable. This often represents a major task, which leaves little time for the analysis of the data required to produce a paper suitable for journal publication. Publishing a dataset in a data journal will provide academic credit to data scientists without diverting effort from their primary work of ensuring data quality.
Benefits for the Funding Organisation
A key driver for the funding organisations is obtaining the best possible science for their money. Running measurement campaigns is expensive, both in terms of equipment and time, so the more reuse that can be derived from a dataset, the better. Part of the submission process for publication in a data journal is uploading the dataset to a trusted repository where it will be backed up, properly archived and curated. As a result, the problems of data stored on obsolete media or suffering from bit-rot will be avoided, thereby minimising the need to repeat costly experiments.
Similarly, the peer-review process reassures the funder that the published dataset is of good quality and that the experiment was carried out appropriately.
Benefits for the Wider Research Community
Peer review and publication demonstrate to the wider research community that the datasets are reliable and complete, and therefore that the data can be trusted.
Publication of datasets will also be useful to researchers outside the immediate field, as going to a data journal for information about datasets will be a quick and convenient way of finding out not only what high-quality data are available, but also whom to contact about accessing them. This will encourage inter-disciplinary collaboration, and open up the user base not only for the datasets, but also the data journal and the underlying repositories.
Moreover, the availability of published datasets will make it easier to validate conclusions through the reanalysis of those datasets.
The technology required for publishing data is already available in the form of online journal systems. In many cases, the software is available for free, and can easily be downloaded and installed on a Web server.
Overlay journals sit on top of, and make use of, the content stored in other pre-existing repositories. The overlay journal database itself consists of a number of overlay documents, which are structured documents created to annotate another resource with information on the quality of the resource. The overlay document has three basic elements:
- metadata about the overlay document itself;
- information about and from the quality process for which the document was constructed; and
- basic metadata from the referenced resource to aid discovery and identification (Figure 1).
The overlay documents can then be treated as any other documents in an electronic system, and they can provide added-value information about the resource they refer to, for example, a star-rating given by readers, or a series of review comments.
Overlay journals therefore do not store the datasets they reference. Instead, they store overlay documents about the datasets, and these documents contain links to the datasets.
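The three elements of an overlay document described above can be sketched as a small data structure. This is only an illustrative sketch: the class names, field names, journal title and example URL are all assumptions made for the example, and do not describe the format actually used by any overlay journal.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class OverlayMetadata:
    """Metadata about the overlay document itself (hypothetical fields)."""
    overlay_id: str
    created: str  # ISO 8601 date string
    journal: str


@dataclass
class QualityRecord:
    """Information about, and from, the quality process (hypothetical fields)."""
    process: str  # e.g. "peer review" or "reader rating"
    review_comments: List[str] = field(default_factory=list)
    star_ratings: List[int] = field(default_factory=list)

    def average_rating(self) -> Optional[float]:
        # No ratings yet means there is no average to report.
        return mean(self.star_ratings) if self.star_ratings else None


@dataclass
class ResourceMetadata:
    """Basic metadata copied from the referenced resource (hypothetical fields)."""
    title: str
    creators: List[str]
    resource_url: str  # a link only; the resource stays in its own repository


@dataclass
class OverlayDocument:
    metadata: OverlayMetadata
    quality: QualityRecord
    resource: ResourceMetadata


# Illustrative values only; none of these identifiers come from a real journal.
doc = OverlayDocument(
    metadata=OverlayMetadata("overlay-001", "2009-03-01", "Example Data Journal"),
    quality=QualityRecord("reader rating", ["Clear documentation."], [4, 5, 3]),
    resource=ResourceMetadata(
        "Example rain-rate dataset",
        ["A. Scientist"],
        "https://repository.example.org/datasets/rain-rate",
    ),
)
print(doc.quality.average_rating())
```

Note that the overlay document holds only a link to the resource, which reflects the point that the journal does not store the dataset itself.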
The concept of overlay journals is not solely limited to data publication; they can be applied to other objects which can be stored in a repository, but which might not be so easy to reproduce in print, for instance, video or multimedia files. For example, an overlay journal might look at other journal-published and unpublished papers; its overlay document might allow users of the overlay journal to award star ratings to the paper to which it refers. The underlying technology for such an overlay journal remains the same.
The Submission and Review Process
The procedure for submitting a dataset for publication to an overlay journal is analogous to that of submitting a conventional paper to a print or on-line journal (Figure 2).
A scientist wishing to submit a paper for publication first writes and prepares the paper according to the journal style and requirements. The author then submits the paper as an electronic document (usually PDF) to the journal submission site, where it is stored and passed on to the reviewer, who reviews the paper against the journal’s acceptance criteria. Once the paper has been accepted for publication, it is released on the journal’s Web site for readers to read.
For a data scientist who wishes to publish a dataset, the first step remains the same. The dataset has to be prepared for publication, together with its supporting documentation and metadata. (This is analogous to the editing that needs to be done to a paper before it is submitted.) The dataset must be stored in a trusted data repository, and the data journal would provide guidance on which repositories it trusts. To submit the dataset for review, the data scientist would go to the overlay journal site and fill out the data journal’s form, providing details of where the dataset is stored and giving reviewers access to the dataset so that they can complete their review. The overlay journal site would then create a document collating all these details in the journal format to pass on to the reviewer. The reviewer then reviews this overlay document and examines the dataset stored in the indicated repository in order to determine whether it meets the journal’s acceptance criteria. Once the dataset is accepted for publication, the overlay document is released on the overlay journal’s site for readers to access.
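The dataset submission steps described above can be sketched as a simple workflow. This is a minimal sketch under stated assumptions: the status names, the function names and the trusted-repository list are hypothetical and chosen for illustration only, and the acceptance decision is reduced to a single true/false value supplied by the reviewer.

```python
from dataclasses import dataclass
from enum import Enum
from urllib.parse import urlparse


class Status(Enum):
    SUBMITTED = "submitted"
    UNDER_REVIEW = "under review"
    PUBLISHED = "published"
    REJECTED = "rejected"


# Hypothetical list of repository hosts that the journal trusts.
TRUSTED_REPOSITORIES = {"repository.example.org"}


@dataclass
class Submission:
    dataset_url: str
    documentation_url: str
    status: Status = Status.SUBMITTED


def submit(dataset_url: str, documentation_url: str) -> Submission:
    """Accept a submission only if the dataset sits in a trusted repository."""
    host = urlparse(dataset_url).hostname
    if host not in TRUSTED_REPOSITORIES:
        raise ValueError(f"{host!r} is not a trusted repository")
    return Submission(dataset_url, documentation_url)


def start_review(submission: Submission) -> None:
    """Pass the collated overlay document on to the reviewer."""
    if submission.status is not Status.SUBMITTED:
        raise ValueError("only a newly submitted dataset can enter review")
    submission.status = Status.UNDER_REVIEW


def record_decision(submission: Submission, meets_criteria: bool) -> None:
    """Release the overlay document if the reviewer's criteria are met."""
    if submission.status is not Status.UNDER_REVIEW:
        raise ValueError("a decision requires a submission under review")
    submission.status = Status.PUBLISHED if meets_criteria else Status.REJECTED


sub = submit(
    "https://repository.example.org/datasets/rain-rate",
    "https://repository.example.org/datasets/rain-rate/docs",
)
start_review(sub)
record_decision(sub, meets_criteria=True)
print(sub.status.value)
```

The check against a list of trusted repositories happens at submission time, before any reviewer effort is spent, which mirrors the requirement that the dataset already be archived when it is submitted.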
The OJIMS Project
The Overlay Journal Infrastructure for Meteorological Sciences (OJIMS) Project aimed to develop the mechanisms that could support both a new (overlay) Journal of Meteorological Data and an Open Access Repository for documents related to the meteorological sciences. Its work was conducted by a partnership between the Royal Meteorological Society (RMetS) and two members of the National Centre for Atmospheric Science (NCAS), the British Atmospheric Data Centre (BADC) and the University of Leeds.
The OJIMS Project aimed to exploit the existing data repository at the BADC, along with the expertise of the RMetS, to develop the mechanisms which could support both a new data journal and a repository for the meteorological sciences. This concept comprised four components:
- a new open access discipline-specific document repository based at the BADC;
- the existing BADC data repository;
- a new overlay journal in which ‘articles’ link peer-reviewed documents to peer-reviewed datasets (codenamed the MetData journal); and
- an overlay journal (codenamed MetRep) framework that would provide links to highly regarded ‘star-rated’ papers via the repository (either to the repository contents or the version of the record held by the original journal publisher).
The work built on the previous JISC-funded CLADDIER Project and took the next steps towards making these two classes of overlay journal (i.e. a ‘data’ journal and a ‘really useful papers’ journal) possible.
During the project, the RMetS undertook work to identify what possible business models exist and to recommend a method for the identification and practical implementation of a sustainable business model that would guarantee the longevity of these journals. The analysis has been made available to other learned societies or groups considering such activities via the OJIMS Web site. The software created by the project team to run the overlay journals is open source and available to all via the OJIMS Web site.
These two overlay journal activities address two key issues. The first is the application of the peer-review process to data, which has been discussed above. The second is that, while it is easy to create a repository for a discipline, it is not so easy to get it populated. While NCAS, as a distributed body of (primarily) university staff, needs such an entity, and is in a position to expect (even mandate) NCAS-funded staff to use it, the real success of such a repository would be if it were populated by a much wider community from the UK, Europe and even further afield.
This discipline repository would also provide the documents required to provide references to the proposed overlay journal of ‘star-rated’ papers. The resulting journal should provide a new way of focusing community attention on papers of special merit, regardless of the original journal location. It would also act both as a mechanism to encourage repository population, and as a mechanism to encourage publishers to accept the merits of open access. Achieving a high ‘star-rating’ would also increase the number of citations of a given paper.
Survey of Organisations and Scientists
The OJIMS Project carried out a survey of organisations and scientists to investigate the potential implications for the meteorological sciences should a data journal and an open access repository be created and operated.
Survey of Organisations
The OJIMS survey of organisations report describes the results of a survey of commercial and public sector organisations which was conducted to assess attitudes to the creation and operation of the proposed Journal of Meteorological Data and an Open Access Repository. It was a small survey with 14 respondents, but these represented a wide variety of organisations from across meteorology. The respondents included international energy companies with operations in the UK, a non-UK national meteorological service, a UK government agency, a local government authority, small and medium-sized companies involved in instrument manufacture and a private sector provider of weather services.
There was a positive response to the opportunity for these organisations to use any new online facility with information about meteorological sciences from the Royal Meteorological Society; all of the respondents replied that they would make use of such a facility. These 14 organisations identified more than 300 staff who would use the facility, and when asked which topics these staff were most likely to access, all respondents said ‘operational systems and trials’ with some also selecting experimental campaigns, numerical modelling projects, instrument and observing facilities, data structure, software, pre-prints and post-prints.
The single representative of the international meteorological service community (of which there are 190) was a very positive respondent, indicating that there were 50 staff willing to use the facilities proposed, and no obstacles to submitting articles or data.
Commercial sensitivity represents a significant obstacle that will prevent larger commercial organisations making their meteorological articles available through an open access repository. Smaller organisations and public sector organisations saw no obstacles, or merely licensing requirements, though some did not feel they held any articles of wider interest. The three small organisations that manufactured instruments responded that they could definitely make data available to an RMetS publication for free and unrestricted use. Others were less sure, and were concerned about commercial sensitivities.
The OJIMS project team has identified the quality assurance offered by a peer-review process as a benefit of the Journal of Meteorological Data. However, the value of peer review was not highly rated by these organisations. Raising awareness of the organisation’s brand was the benefit most highly rated by those respondents.
Survey of Users
As well as investigating scientists’ reactions to the proposed data journal and open access repository, the survey of users report also describes reactions to supervised run-throughs of a demonstration of the Journal of Meteorological Data. The survey and demonstration were conducted at the NCAS Conference in Bristol on 8-10 December 2008.
The survey achieved a high rate of response from delegates at the conference. More than a third of delegates (85) from 24 institutions responded. Respondents were mainly university-based scientists from the fields of atmospheric composition and chemistry, atmospheric physics, dynamical meteorology and climate science; scientists from meteorological programmes, observations/remote sensing, oceanography, hydrology or other areas were not represented in significant numbers. A high proportion of respondents were less experienced scientists, with 46% having less than three years’ experience of research work, but 25% of respondents had more than 10 years’ experience.
Further insight into attitudes to the Journal of Meteorological Data came from the supervised run-throughs of a demonstrator by seven volunteers at the NCAS Conference. Useful feedback was given to the supervisor on the benefits to data creators, the review process, branding, version control and citations.
The following summarises the key responses to the user surveys.
Overlay Journal and Open Access Repository for Documents Related to the Meteorological Sciences
- The concept of the Open Access Repository includes both a new subject-based repository and overlay mechanics to search and access it and other repositories, as well as producing a ‘star-rated’ overlay journal for the meteorological sciences.
- The Open Access Repository idea was popular with NCAS delegates with about 70% rating at least one of its features as a great idea that they would use.
- The most appealing feature of the Repository was a ‘Single Web site to search many repositories’ with 71% saying they would use it. This is a function that is provided by overlay journal mechanics.
- User rating of articles, supplementary information, e.g. videos, discussion group open forum, and ‘user comments and tags for items’ attracted minority support (12-18% said they would use each of them). This user rating would be a key feature of the ‘star-rated’ journal.
- Use of other repositories that the new Repository system would overlay is lower than might be expected. Only 19% use repositories as their most common method for getting the full text of articles (a further 28% use them occasionally to do so) and only 38% use institutional repositories to archive their articles. It is concluded that the overlay ‘star-rated’ journal could not become a single, comprehensive source of information for the meteorological sciences unless it attracts unprecedented volumes of deposits in its new repository, or inspires a step-change increase in archiving to existing repositories.
Journal of Meteorological Data
The concept behind the Journal of Meteorological Data is to extend the scientific discipline of peer review to data. To summarise:
- It received a strong positive response in the survey.
- 69% agreed that they would like to access data from an RMetS Journal.
- 67% agreed that they were more likely to deposit their data in a data centre if they can obtain academic credit through a data journal.
- Almost all respondents were users or creators of meteorological data of some kind. Data from experimental campaigns were more commonly used and created by these NCAS delegates than data from General Circulation Models, other numerical models, operational systems, and instrument and observing facilities.
- The only existing data journal in this area is aimed at all environmental sciences. 91% of respondents had never heard of it.
- Atmospheric Science Letters, the RMetS online journal, is one of the best known online-only meteorological journals with 93% of respondents having heard of it.
The Business Cases for the Subject Repository and Overlay and Data Journals
The business models produced as part of the OJIMS Project are described in more detail in the Business Models report. A review of publishing, data centres and electronic repositories in the meteorological sciences was carried out. Information about the potential usage of the subject repository and overlay journals was collected through the user and organisational surveys described above. From these information collection exercises, the functional requirements, content, benefits and success measures for the repository and journals were identified. In parallel, development of the software infrastructure to support the repository and journals continued, with communication between the two strands to ensure that the user requirements and technical costs were fully understood.
Discussions were held with stakeholders regarding the governance and management structure for the repository and journals, along with publication ethics and the review processes and procedures that would be adopted. Finally, a market analysis of this information was carried out, including a full cost-benefit analysis.
The recommendations from the business cases were as follows:
In the short term, there is a technical issue and a behavioural issue that would severely limit the chances of success for an Open Access Repository for the Meteorological Sciences. Technically, the mechanics for the overlay of other repositories are immature, and direct access from the Open Access Repository to other repositories is not feasible. Behaviourally, the rate of depositing information into repositories is low, and would require a step-change increase for the Open Access Repository to be able to provide comprehensive access to meteorological information.
Engagement with technology development stakeholders, learned society members and potential funding organisations is needed to overcome these issues.
However, it is recognised that there are needs (listed below) that can be met, in the medium to long term, by the creation of an Open Access Repository for the Meteorological Sciences.
- Quality assurance of increasing volumes of information – without compromising peer review or the facilitation of greater information exchange
- Exploitation of new ways of sharing scientific information, e.g. Web 2.0 capabilities such as Web pages, wikis, blogs and podcasts
- Enabling communication between scientists and stakeholders
- Reduction of environmental costs of print journals and face-to-face communication
With regard to the data journal, there does seem to be value in either a subscription or an author-pays model for financing a Journal of Meteorological Data. To ensure the long-term success of the Journal of Meteorological Data, there needs to be engagement with the community of those involved with data collection and analysis. Further, such a journal would not succeed without the development of strategic relationships with national data centres in several countries.
OJIMS: Conclusions and Implications
The following can be concluded from the OJIMS Project work:
- A document repository capable of storing grey literature (Web pages, project reports, pictures, video etc.) as well as journal papers is acknowledged to be a useful addition to an organisation. The repository created as part of the OJIMS Project is already in use as the operational repository for the BADC/NEODC and will continue to be used as such for the foreseeable future.
- Interaction with meteorological and atmospheric data scientists and organisations has shown that there is a strong need for a method for publishing data (a data journal). Publication of data will ensure that the datasets are of good quality, having been peer-reviewed, and will provide data scientists with academic credit for having created the datasets and placed them in an accredited data repository where the data can be professionally archived and curated.
- Similarly, interaction with meteorological and atmospheric data scientists and organisations has shown that there is a desire to have an overlay repository which can serve as a single point of search for numerous institutional repositories. However, at the moment it is felt that the institutional repositories do not have a critical mass of documentation stored in them to merit the investment required to develop an overlay repository framework.
The implications of this project are considerable for data scientists in the meteorological and atmospheric sciences (and potentially data scientists in other fields). The user surveys have shown that there is a significant desire in the user community for a data journal, which would allow scientists to receive academic recognition (in the form of citations) for their work in ensuring the quality of datasets. The sponsors and funding bodies for the experimental campaigns that produce these data (such as NERC) would also benefit as it would encourage scientists to submit their data to accredited data repositories, where they would be archived and curated.
On the broader subject of document repositories, the project has demonstrated that an overlay repository with the capability to be a single point to search multiple repositories is a tool that would be of value to significant numbers of researchers. However, this does rely on the repositories being searched having a sufficient number of documents in them in the first place, which is not always the case. Further work on user interaction with repositories, and on determining why they are not as widely used as they could be, may prove of interest in the future.
The authors would like to acknowledge the Joint Information Systems Committee (JISC) as the principal funder of the OJIMS Project under the JISC Capital Programme call for Projects, Strand D: ‘Repository Start-up and Enhancement Projects’ (4/06). Complementary funding was provided by NCAS through the BADC core agreement, and also by the Natural Environment Research Council.
- Overlay Journal Infrastructure for Meteorological Sciences (OJIMS)
- Fiona Hewer, OJIMS Survey of Organisations, March 2009
- Fiona Hewer, OJIMS Survey of Scientists, March 2009
- 2008 NCAS Atmospheric Science Conference, 8-10 December, Bristol
- Fiona Hewer, OJIMS Business Models Report, March 2009