Historically, scientific publishing has focused on publicising the methodology that a scientist uses to analyse a dataset, and the conclusions drawn from that analysis, as this is the information that can easily be published in text format with supporting diagrams. Datasets do not lend themselves to conventional hard-copy publication, even when they are small enough to allow it, and they are more useful when stored in digital media. This means that the peer-review process, which provides both scrutiny and validation of academic work, is generally applied only to the final conclusions and interpretations of a dataset. Some research areas, countries and organisations make the underlying datasets available, but these are generally not tightly coupled to the publications that result from them, nor have they themselves been reviewed. This becomes a problem where the conclusions drawn from the analysis of datasets are of significant importance, either within the academic field or because the work has legal or policy implications. It is widely recognised that conclusions drawn from the analysis of a dataset are sound only if the underlying data are valid. Furthermore, as datasets become larger and more complex, a reliable method for the peer review of data is needed.
In the meteorological and climate sciences, information about weather and climate change is being scrutinised more than ever to meet the need for advice to policy-makers on greenhouse gas emissions and their consequences. Datasets of meteorological measurements such as air temperature, pressure and rain rates, some dating back centuries, are subject to increasing scrutiny and analysis in order to investigate and quantify the effects of climate change.
Overlay journals are a technology which is already being used to facilitate peer review and publication on-line. The availability of the technology enables a wider group of organisations to become publishers (e.g. the RIOJA Project). However, the technology is limited, in some cases, by the accessibility and functionality of what is overlaid, and business models are needed to achieve long-term sustainability of overlay journals.
Peer-reviewing and publishing data has benefits for more than just the data scientists who create the datasets. It also benefits the funding bodies that pay for the data to be collected as well as the wider academic community.
The data scientists who build, maintain, validate and collect the data for large databanks have to ensure that the data are of high quality and that the associated metadata and documentation are complete and understandable. This often represents a major task, which leaves little time for the analysis of the data required to produce a paper suitable for journal publication. Publishing a dataset in a data journal will provide academic credit to data scientists without diverting effort from their primary work on ensuring data quality.
A key driver for the funding organisations is obtaining the best possible science for their money. Running measurement campaigns is expensive, both in terms of equipment and time, so the more reuse that can be derived from a dataset, the better. Part of the submission process for publication in a data journal is uploading the dataset to a trusted repository where it will be backed up, properly archived and curated. As a result, the problems of data stored on obsolete media or suffering from bit-rot will be avoided, thereby minimising the need to repeat costly experiments.
Similarly, the peer-review process reassures the funder that the published dataset is of good quality and that the experiment was carried out appropriately.
When datasets have been peer-reviewed and published, it demonstrates to the wider research community that the datasets are reliable and complete, and therefore the data can be trusted.
Publication of datasets will also be useful to researchers outside the immediate field, as going to a data journal for information about datasets will be a quick and convenient way of finding out not only what high-quality data are available, but also whom to contact about accessing them. This will encourage inter-disciplinary collaboration, and open up the user base not only for the datasets, but also the data journal and the underlying repositories.
Moreover, the availability of published datasets will make it easier to validate conclusions through the reanalysis of those datasets.
The technology required for publishing data is already available in the form of online journal systems. In a lot of cases, the software is available for free, and can easily be downloaded and installed on a Web server.
Overlay journals sit on top of, and make use of, the content stored in other pre-existing repositories. The overlay journal database itself consists of a number of overlay documents, which are structured documents created to annotate another resource with information on the quality of the resource. The overlay document has three basic elements:
The overlay documents can then be treated as any other documents in an electronic system, and they can provide added-value information about the resource they refer to, for example, a star-rating given by readers, or a series of review comments.
So, overlay journals themselves do not actually store the datasets they reference; instead, they simply store overlay documents about the datasets, which contain links to the datasets.
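The idea of an overlay document that annotates a dataset without containing it can be illustrated with a minimal Python sketch. This is not the OJIMS implementation: the class name, field names and example URL are all hypothetical, and the fields are chosen only to mirror the link, star-rating and review-comment features mentioned in the text.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class OverlayDocument:
    """Hypothetical overlay document: it points at a dataset, it does not hold it."""
    title: str
    resource_url: str  # link to the dataset stored in an external repository
    star_ratings: List[int] = field(default_factory=list)
    review_comments: List[str] = field(default_factory=list)

    def add_rating(self, stars: int) -> None:
        # Assumed 1 to 5 scale; the source does not specify one.
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self.star_ratings.append(stars)

    def average_rating(self) -> Optional[float]:
        # None signals that no reader has rated the resource yet.
        return mean(self.star_ratings) if self.star_ratings else None


doc = OverlayDocument(
    title="Example surface temperature record",
    resource_url="https://repository.example.org/datasets/1234",  # placeholder
)
doc.add_rating(4)
doc.add_rating(5)
doc.review_comments.append("Metadata complete; units documented.")
print(doc.average_rating())
```

Because the document stores only a URL, the dataset itself can stay in whichever repository curates it, and the journal database remains small.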
The concept of overlay journals is not solely limited to data publication; they can be applied to other objects which can be stored in a repository, but which might not be so easy to reproduce in print, for instance, video or multimedia files. For example, an overlay journal might look at other journal-published and unpublished papers; its overlay document might allow users of the overlay journal to award star ratings to the paper to which it refers. The underlying technology for such an overlay journal remains the same.
The procedure for submitting a dataset for publication to an overlay journal is analogous to that of submitting a conventional paper to a print or on-line journal (Figure 2).
A scientist wishing to submit a paper for publication first writes and prepares the paper according to the journal style and requirements. The author then submits the paper as an electronic document (usually PDF) to the journal submission site, where it is stored and passed on to the reviewer, who reviews the paper against the journal's acceptance criteria. Once the paper has been accepted for publication, it is released on the journal's Web site for readers to read.
In the case of a data scientist who wishes to publish a dataset, the first step remains the same. The dataset has to be prepared for publication, together with its supporting documentation and metadata. (This is analogous to the editing that needs to be done to a paper before it is submitted.) The dataset must be stored in a trusted data repository, and the data journal would provide guidance on which repositories it trusted. To submit the dataset for review, the data scientist would go to the overlay journal site and fill out the data journal's form, providing details about where the dataset is stored, and giving reviewers access to the dataset so that they can complete their review. The overlay journal site would then create a document collating all these details in the journal format to pass on to the reviewer. The reviewer then reviews this overlay document and examines the dataset stored in the indicated repository in order to determine whether it meets the journal's acceptance criteria. Once the dataset is accepted for publication, the overlay document is released on the overlay journal's site for readers to access.
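The submission steps described above can be sketched as a small state machine in Python. This is an illustrative sketch only, not the software produced by the project: the status names, the function names, the repository host name and the list of trusted repositories are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    UNDER_REVIEW = "under review"
    PUBLISHED = "published"
    REJECTED = "rejected"


# Hypothetical list of repositories that the data journal trusts.
TRUSTED_REPOSITORIES = {"repository.example.org"}


@dataclass
class Submission:
    dataset_url: str
    repository: str
    status: Status = Status.SUBMITTED


def submit(dataset_url: str, repository: str) -> Submission:
    """Accept a submission only if the dataset sits in a trusted repository."""
    if repository not in TRUSTED_REPOSITORIES:
        raise ValueError(f"{repository} is not a trusted repository")
    return Submission(dataset_url, repository)


def start_review(submission: Submission) -> None:
    """Pass the overlay document on to the reviewer."""
    if submission.status is not Status.SUBMITTED:
        raise RuntimeError("only a newly submitted dataset can enter review")
    submission.status = Status.UNDER_REVIEW


def record_decision(submission: Submission, accepted: bool) -> None:
    """Release the overlay document if the dataset meets the criteria."""
    if submission.status is not Status.UNDER_REVIEW:
        raise RuntimeError("a decision requires a submission under review")
    submission.status = Status.PUBLISHED if accepted else Status.REJECTED


sub = submit("https://repository.example.org/datasets/1234",
             "repository.example.org")
start_review(sub)
record_decision(sub, accepted=True)
print(sub.status.value)
```

Checking the repository at submission time reflects the requirement that the dataset is already archived and curated before review begins; the journal itself never receives the data.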
The Overlay Journal Infrastructure for Meteorological Sciences (OJIMS) Project aimed to develop the mechanisms that could support both a new (overlay) Journal of Meteorological Data and an Open Access Repository for documents related to the meteorological sciences. Its work was conducted by a partnership between the Royal Meteorological Society (RMetS) and two members of the National Centre for Atmospheric Science (NCAS), the British Atmospheric Data Centre (BADC) and the University of Leeds.
The OJIMS Project aimed to exploit the existing data repository at the BADC, along with the expertise of the RMetS, to develop the mechanisms which could support both a new data journal and a repository for the meteorological sciences. This concept comprised four components:
The work built on the previous JISC-funded CLADDIER Project and took the next steps towards making these two classes of overlay journal (i.e. a 'data' journal and a 'really useful papers' journal) possible.
During the project, the RMetS undertook work to identify what possible business models exist and to recommend a method for identification and practical implementation of a sustainable business model that would guarantee the longevity of these journals. The analysis has been made available to other learned societies or groups considering such activities via the OJIMS Web site. The software created by the project team to run the overlay journals is open source and available to all via the OJIMS Web site.
These two overlay journal activities address two key issues. The first is the application of the peer-review process to data, which has been discussed above. The second is that, while it is easy to create a repository for a discipline, it is not so easy to get it populated. While NCAS as a distributed body of (primarily) university staff needs such an entity, and is in a position to expect (even mandate) NCAS-funded staff to use it, the real success of such a repository would be if it were populated by a much wider community from the UK, Europe and even further afield.
This discipline repository would also provide the documents required to provide references to the proposed overlay journal of 'star-rated' papers. The resulting journal should provide a new way of focusing community attention on papers of special merit, regardless of the original journal location. It would also act both as a mechanism to encourage repository population, and as a mechanism to encourage publishers to accept the merits of open access. Achieving a high 'star-rating' would also increase the number of citations of a given paper.
The OJIMS Project carried out a survey of organisations and scientists to investigate the potential implications for the meteorological sciences should a data journal and an open access repository be created and operated.
The OJIMS survey of organisations report describes the results of a survey of commercial and public sector organisations which was conducted to assess attitudes to the creation and operation of the proposed Journal of Meteorological Data and an Open Access Repository. It was a small survey with 14 respondents, but these represented a wide variety of organisations from across meteorology. The respondents included international energy companies with operations in the UK, a non-UK national meteorological service, a UK government agency, a local government authority, small and medium-sized companies involved in instrument manufacture and a private sector provider of weather services.
There was a positive response to the opportunity for these organisations to use any new online facility with information about meteorological sciences from the Royal Meteorological Society; all of the respondents replied that they would make use of such a facility. These 14 organisations identified more than 300 staff who would use the facility, and when asked which topics these staff were most likely to access, all respondents said 'operational systems and trials' with some also selecting experimental campaigns, numerical modelling projects, instrument and observing facilities, data structure, software, pre-prints and post-prints.
The single representative of the international meteorological service community (of which there are 190) was a very positive respondent, indicating that there were 50 staff willing to use the facilities proposed, and no obstacles to submitting articles or data.
Commercial sensitivity represents a significant obstacle that will prevent larger commercial organisations making their meteorological articles available through an open access repository. Smaller organisations and public sector organisations saw no obstacles, or merely licensing requirements, though some did not feel they held any articles of wider interest. The three small organisations that manufactured instruments responded that they could definitely make data available to an RMetS publication for free and unrestricted use. Others were less sure, and were concerned about commercial sensitivities.
The OJIMS project team has identified the quality assurance offered by a peer-review process as a benefit of the Journal of Meteorological Data. However, the value of peer review was not highly rated by these organisations. Raising awareness of the organisation's brand was the benefit most highly rated by those respondents.
As well as investigating scientists' reactions to the proposed data journal and open access repository, the survey of users report also describes reactions to supervised run-throughs of a demonstration of the Journal of Meteorological Data. The survey and demonstration were conducted at the NCAS Conference in Bristol on 8-10 December 2008.
The survey achieved a high rate of response from delegates at the conference. More than a third of delegates (85) from 24 institutions responded. Respondents were mainly university-based scientists from the fields of atmospheric composition and chemistry, atmospheric physics, dynamical meteorology and climate science; scientists from meteorological programmes, observations/remote sensing, oceanography, hydrology or other areas were not represented in significant numbers. A high proportion of respondents were less experienced scientists, with 46% having less than three years' experience of research work, but 25% of respondents had more than 10 years' experience.
Further insight into attitudes to the Journal of Meteorological Data came from the supervised run-throughs of a demonstrator by seven volunteers at the NCAS Conference. Useful feedback was given to the supervisor on the benefits to data creators, the review process, branding, version control and citations.
The following summarises the key responses to the user surveys.
The concept behind the Journal of Meteorological Data is to extend the scientific discipline of peer review to data. To summarise:
Atmospheric Science Letters, the RMetS online journal, is one of the best known online-only meteorological journals with 93% of respondents having heard of it.
The business models produced as part of the OJIMS Project are described in more detail in the Business Models report. A review of publishing in the meteorological sciences was carried out, along with a review of data centres and electronic repositories. Information about the potential usage of the subject repository and overlay journals was collected through the user and organisational surveys described above. From these information collection exercises, the functional requirements, content, benefits and success measures for the repository and journals were identified. In parallel, development of the software infrastructure to support the repository and journals continued, with communication between the two strands to ensure that the user requirements and technical costs were fully understood.
Discussion was carried out with stakeholders regarding the governance and management structure for the repository and journals, along with publication ethics and the review processes and procedures that would be adopted. Finally a market analysis of this information was carried out, including a full cost-benefit analysis.
The recommendations from the business cases were as follows:
In the short term, there is a technical issue and a behavioural issue that would severely limit the chances of success for an Open Access Repository for the Meteorological Sciences. Technically, the mechanics for the overlay of other repositories are immature, and direct access from the Open Access Repository to other repositories is not feasible. The rate of depositing information into repositories is low, and would require a step-change increase for the Open Access Repository to be able to provide comprehensive access to meteorological information.
However, it is recognised that there are needs (listed below) that can be met, in the medium to long term, by the creation of an Open Access Repository for the Meteorological Sciences.
Engagement with technology development stakeholders, learned society members and potential funding organisations is needed to overcome these issues.
With regard to the data journal, there does seem to be value in either a subscription or an author-pays model for financing a Journal of Meteorological Data. To ensure the long-term success of the Journal of Meteorological Data, there needs to be engagement with the community of those involved with data collection and analysis. Further, such a journal would not succeed without the development of strategic relationships with national data centres in several countries.
The following can be concluded from the OJIMS Project work:
The implications of this project are considerable for data scientists in the meteorological and atmospheric sciences (and potentially data scientists in other fields). The user surveys have shown that there is a significant desire in the user community for a data journal, which would allow scientists to receive academic recognition (in the form of citations) for their work in ensuring the quality of datasets. The sponsors and funding bodies for the experimental campaigns that produce these data (such as NERC) would also benefit as it would encourage scientists to submit their data to accredited data repositories, where they would be archived and curated.
On the broader subject of document repositories, the project has demonstrated that an overlay repository with the capability to be a single point to search multiple repositories is a tool that would be of value to significant numbers of researchers. However, this does rely on the repositories being searched having a sufficient number of documents in them in the first place, which is not always the case. Further work on user interaction with repositories, and on determining why they are not as widely used as they could be, may prove of interest in the future.
The authors would like to acknowledge the Joint Information Systems Committee (JISC) as the principal funder of the OJIMS Project under the JISC Capital Programme call for Projects, Strand D: - 'Repository Start-up and Enhancement Projects' (4/06). Complementary funding was provided by NCAS through the BADC core agreement, and also by the Natural Environment Research Council.