Web Magazine for Information Professionals

Stargate: Exploring Static Repositories for Small Publishers

R. John Robertson introduces a project examining the potential benefits of OAI-PMH Static Repositories as a means of enabling small publishers to participate more fully in the information environment.

With the wider deployment of repositories, the Open Archives Initiative - Protocol for Metadata Harvesting (OAI-PMH) is becoming a common method of supporting interoperability between repositories and services. It provides 'an application-independent interoperability framework based on metadata harvesting' [1]. Nodes in a network using this protocol are 'data providers' or 'service providers'.

Although repository software supporting OAI-PMH is not overly complex [2], without programming skills or access to technical support, implementing and supporting a repository is not an entirely straightforward task. Static repositories and static repository gateways [3] are a development of the OAI-PMH specification that makes participation in networks of data and service providers even simpler. In essence a static repository is an XML file publicly available online at a persistent address. This file is registered in a static repository gateway which then presents it as a (slightly limited) OAI-PMH data provider.
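To make the idea of 'a repository that is just an XML file' concrete, the following is a minimal sketch, in Python, of how such a file might be generated. The element names and namespaces follow the static repository specification [3] as far as this sketch understands it; the journal name, URLs, e-mail address and article details are hypothetical placeholders, and the output should be validated against the specification before being registered with a gateway.

```python
import xml.etree.ElementTree as ET

# Namespaces used by static repositories and by unqualified Dublin Core.
SR_NS = "http://www.openarchives.org/OAI/2.0/static-repository"
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("", SR_NS)
ET.register_namespace("oai", OAI_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def q(ns, tag):
    """Return a namespace-qualified tag in ElementTree's {uri}tag form."""
    return f"{{{ns}}}{tag}"


def build_static_repository(name, base_url, admin_email, records):
    """Build the Repository element for a list of article records."""
    repo = ET.Element(q(SR_NS, "Repository"))

    # Identify: descriptive information about the repository as a whole.
    identify = ET.SubElement(repo, q(SR_NS, "Identify"))
    for tag, text in [
        ("repositoryName", name),
        ("baseURL", base_url),
        ("protocolVersion", "2.0"),
        ("adminEmail", admin_email),
        ("earliestDatestamp", min(r["datestamp"] for r in records)),
        ("deletedRecord", "no"),
        ("granularity", "YYYY-MM-DD"),
    ]:
        ET.SubElement(identify, q(OAI_NS, tag)).text = text

    # ListMetadataFormats: this sketch offers oai_dc only.
    formats = ET.SubElement(repo, q(SR_NS, "ListMetadataFormats"))
    fmt = ET.SubElement(formats, q(OAI_NS, "metadataFormat"))
    ET.SubElement(fmt, q(OAI_NS, "metadataPrefix")).text = "oai_dc"
    ET.SubElement(fmt, q(OAI_NS, "schema")).text = (
        "http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
    )
    ET.SubElement(fmt, q(OAI_NS, "metadataNamespace")).text = OAI_DC_NS

    # ListRecords: one record (header plus metadata) per article.
    list_records = ET.SubElement(
        repo, q(SR_NS, "ListRecords"), {"metadataPrefix": "oai_dc"}
    )
    for rec in records:
        record = ET.SubElement(list_records, q(OAI_NS, "record"))
        header = ET.SubElement(record, q(OAI_NS, "header"))
        ET.SubElement(header, q(OAI_NS, "identifier")).text = rec["identifier"]
        ET.SubElement(header, q(OAI_NS, "datestamp")).text = rec["datestamp"]
        metadata = ET.SubElement(record, q(OAI_NS, "metadata"))
        dc = ET.SubElement(metadata, q(OAI_DC_NS, "dc"))
        for field, value in rec["dc"].items():
            ET.SubElement(dc, q(DC_NS, field)).text = value
    return repo


# Hypothetical article metadata, for illustration only.
records = [
    {
        "identifier": "oai:journal.example.org:v1n1a1",
        "datestamp": "2006-01-31",
        "dc": {
            "title": "An Example Article",
            "creator": "Author, A.",
            "identifier": "http://journal.example.org/v1/n1/a1.html",
        },
    }
]
repo = build_static_repository(
    "Example Journal",
    "http://gateway.example.org/oai/journal.example.org/static.xml",
    "editor@journal.example.org",
    records,
)
xml_text = ET.tostring(repo, encoding="unicode")
```

The resulting text would be saved as a single file on the publisher's Web server; nothing else needs to run on that server.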

One community that the static repository approach might benefit is the community of small publishers, particularly those publishers who only produce one or two journals. Such publishers, who may not have dedicated technical support, are less likely to be able to implement and maintain a repository supporting the full OAI-PMH. They might however be able to maintain a static repository, and participate in these wider networks in this way.

This article introduces STARGATE (Static Repository Gateway and Toolkit: Enabling small publishers to participate in OAI-PMH-based services) [4], a project funded by the Joint Information Systems Committee (JISC) and based in the Centre for Digital Library Research at the University of Strathclyde, which is undertaking an investigation of the applicability of this technology to small publishers.

OAI-PMH grew out of an attempt by members of the e-print community to improve access to and dissemination of scholarly communication [5]. The success of the protocol is demonstrated in its implementation, not only in the software commonly used to create e-print repositories (such as EPrints, DSpace, and Fedora) but also in the growth of services that take advantage of the increased access to metadata it allows. The experimental registry at UIUC (University of Illinois at Urbana-Champaign) currently lists 987 existing repositories supporting OAI-PMH [6].

The protocol has found extensive use among data providers, not only because it facilitates the exchange of data (and so has allowed the construction of federated collections of metadata) but also because of the development of a number of specific services that use this metadata. Examples of these include: OAIster [7] - aiming to provide 'a collection of freely available, previously difficult-to-access, academically-oriented digital resources that are easily searchable by anyone' and Citebase [8] - providing 'a semi-autonomous citation index for the free, online research literature. It harvests pre- and post- prints (most author self-archived) from OAI-PMH-compliant archives, parses and links their references and indexes the metadata in a search engine'.

The Benefits and Problems of OAI-PMH Exposure

The OAI-PMH based exposure of metadata held in databases allows services and search engines to index records not otherwise visible to automated processes. For example, Google is harvesting and indexing materials from the National Library of Australia's digital collections through OAI-PMH [9]. The greater availability of metadata to search engines that this technology allows has resulted in increased visibility for scholarly works and other types of assets whose metadata had previously only been visible through a specific interface at a specific location (physical or virtual). This 'unlocking' of metadata has enabled greater access to information about articles and in many cases to copies of the articles themselves - benefiting not only the scholarly community but also the general public.

Although this process has benefited scholars and others, it has also created uncertainty about which version of an article is being described and linked to. The version of an article an author can provide to a repository depends on the copyright agreement between the author and publisher. Thus the metadata record for any given article can link to the deposited copy (pre-print or post-print), the publisher's copy, both copies, or no copy. This variety creates a difficulty for users and publishers: for any given article, the metadata and link(s) retrieved by a search may not correspond to the formally published, peer-reviewed version (irrespective of users' rights to access the final formal version), and, even if they do, users may not have enough information to cite the article properly.

The potential problem for both publishers and academics is multiplied in that, if the final version is not linked to by the data provider (i.e. the repository), the correct citation (i.e. publishers' final version accessed through their designated provider) of a paper will not occur in higher-level services. This creates the potential for scholars to be referring to the same intellectual effort but in different instantiations - for example, there may be differences in page numbering, content, date of publication, and even author attribution.

Another difficulty is that some of the value of a journal article comes from its co-location with other articles. The focus of a journal, the progression of relevant topics in sequential issues, and the editorial selection of complementary or conflicting articles within an issue (in particular in a themed issue) are lost, as any given repository (institutional or subject) will not contain an entire journal run or even a complete issue. Even within higher-level services based on many repositories, retrieving a journal issue is currently almost impossible, as the basic metadata set exposed and harvested through OAI-PMH does not explicitly record the journal issue.

A Way Forward

One way to begin obviating these problems is for publishers to become involved in OAI-PMH based services by exposing their metadata. This would not only increase the visibility of the citable formal version, but would, as services provided on the basis of harvested metadata grow in sophistication, also ensure that compound records, produced by services aggregating and disambiguating metadata, include a link to the publisher's version.

Although the involvement of publishers in the interoperability framework provided by OAI-PMH was envisioned at the start of the protocol [5], take-up by the publishing community has been slow. One example of publisher participation in OAI-PMH is that of Inderscience, a publisher of journals 'in the fields of engineering and technology, management and business administration, and energy, environment and sustainable development' [10]. Inderscience worked together with a project team from EEVL (The Internet Guide to Engineering, Mathematics and Computing) to integrate metadata about their products into external services, in particular cross-referencing services. The development of the Inderscience OAI-PMH repository was in part funded by JISC as part of the Metadata and Interoperability Projects (5/03) strand.

The experience of EEVL and Inderscience, and Inderscience's ongoing participation in OAI-PMH, suggests that, in practice as well as in theory, publishers can benefit from exposing their metadata.

An Obstacle to Participation in OAI-PMH and a Proposed Solution

There are however publishers for whom establishing and maintaining such a full OAI-PMH repository may be problematic. The case study provided by EEVL on the above repository development comments that 'the generic task of configuring a web server to handle OAI-PMH requests and parsing out the arguments should involve less than a day of work for someone experienced with setting up Web servers and writing CGI scripts' [11]. Although this task may be straightforward compared to developing other Web services, for small publishers without technical support it may still remain a significant challenge.

The community developing the Open Archives Initiative has striven to make participation in OAI-PMH as easy as possible and has developed a simpler solution. This solution uses a combination of static repositories (XML files) and a static repository gateway. All the participant has to do is create a compliant XML file, place it on a Web server, and register it with a gateway. The static repository is then available for harvesting via the gateway [12].
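Once a static repository is registered, a service provider harvests it through the gateway much as it would any other data provider. The sketch below, in Python, assumes the convention described in the specification [12], under which the gateway-mediated base URL is formed from the gateway's own URL followed by the static repository's URL without its leading 'http://'. The gateway and journal addresses are hypothetical, and the helper function names are this sketch's own.

```python
from urllib.parse import urlencode


def gateway_base_url(gateway_url, static_repository_url):
    """Form the base URL under which a gateway exposes a static repository.

    Assumption: gateway URL + "/" + static repository URL minus its scheme.
    """
    without_scheme = static_repository_url.split("://", 1)[-1]
    return gateway_url.rstrip("/") + "/" + without_scheme


def list_records_request(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL for the given base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records stamped on or after this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical addresses, for illustration only.
base = gateway_base_url(
    "http://gateway.example.org/oai",
    "http://journal.example.org/oai/static.xml",
)
request = list_records_request(base, from_date="2006-01-01")
```

Here `base` is `http://gateway.example.org/oai/journal.example.org/oai/static.xml`, and `request` appends `?verb=ListRecords&metadataPrefix=oai_dc&from=2006-01-01`. The publisher only ever maintains the XML file; answering such requests is the gateway's job.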

The utility of static repositories to lower the barriers to participation has been demonstrated in the Open Language Archives Community (OLAC), which has fostered a community 'creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources' [13]. OLAC's network includes both full repositories and static repositories, and they have, alongside the OAI_DC metadata set, implemented community-specific metadata sets to extend the services they can offer. The potential value of static repositories to lower the technical barrier to participation was also highlighted as one of the key outcomes of the HaIRST Project [14].

As the name suggests, static repositories are designed for relatively static collections of metadata. The specification of the protocol, however, allows for changes to the contents of a collection, implying that the use of static repositories for more dynamic collections is certainly possible.
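A rough sketch of what such a change might look like in practice follows: when a new issue is published, a record for each new article is appended to the existing file with a fresh datestamp, so that harvesters can pick up only what has changed. The example is written in Python; the existing file is reduced to a single abbreviated record (a real file would also carry the Identify and ListMetadataFormats sections), and all identifiers and titles are hypothetical.

```python
import xml.etree.ElementTree as ET

SR_NS = "http://www.openarchives.org/OAI/2.0/static-repository"
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Abbreviated stand-in for a static repository file already on the server.
EXISTING = f"""<Repository xmlns="{SR_NS}" xmlns:oai="{OAI_NS}"
    xmlns:oai_dc="{OAI_DC_NS}" xmlns:dc="{DC_NS}">
  <ListRecords metadataPrefix="oai_dc">
    <oai:record>
      <oai:header>
        <oai:identifier>oai:journal.example.org:v1n1a1</oai:identifier>
        <oai:datestamp>2005-10-01</oai:datestamp>
      </oai:header>
      <oai:metadata>
        <oai_dc:dc><dc:title>An Earlier Article</dc:title></oai_dc:dc>
      </oai:metadata>
    </oai:record>
  </ListRecords>
</Repository>"""


def add_record(repo, identifier, datestamp, title):
    """Append one article record to the ListRecords section of repo."""
    list_records = repo.find(f"{{{SR_NS}}}ListRecords")
    # Record identifiers must stay unique within the repository.
    known = {e.text for e in list_records.iter(f"{{{OAI_NS}}}identifier")}
    if identifier in known:
        raise ValueError(f"duplicate identifier: {identifier}")
    record = ET.SubElement(list_records, f"{{{OAI_NS}}}record")
    header = ET.SubElement(record, f"{{{OAI_NS}}}header")
    ET.SubElement(header, f"{{{OAI_NS}}}identifier").text = identifier
    ET.SubElement(header, f"{{{OAI_NS}}}datestamp").text = datestamp
    metadata = ET.SubElement(record, f"{{{OAI_NS}}}metadata")
    dc = ET.SubElement(metadata, f"{{{OAI_DC_NS}}}dc")
    ET.SubElement(dc, f"{{{DC_NS}}}title").text = title
    return record


repo = ET.fromstring(EXISTING)
add_record(repo, "oai:journal.example.org:v1n2a1", "2006-01-31",
           "An Article from the New Issue")
datestamps = sorted(e.text for e in repo.iter(f"{{{OAI_NS}}}datestamp"))
```

After the call, `datestamps` holds one entry per record; a harvester issuing a request with a `from` date after the earlier issue would receive only the newly added article.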

Static repositories may, therefore, present an apt technical solution to allow small publishers to participate in OAI-PMH based services. This use of static repositories would represent an innovative use of the technology as it is being applied to collections of metadata that change as each issue of the journal is released. The STARGATE Project is investigating the applicability of this solution.

The STARGATE Project

The project will demonstrate the applicability of OAI-PMH static repositories by creating a series of static repositories containing publisher metadata and a gateway in which these static repositories are registered and exposed. It will also demonstrate the harvesting of publisher metadata, using HaIRST's ARC harvester, and cross-searching of the exposed metadata, using the EEVL Xtra service.

The project will create case studies documenting the set-up of the static repositories and the initial tools used to support their creation, and will critically reflect on the strengths and weaknesses of the static repositories approach to exposing publisher metadata. This reflective analysis will draw on the publishers' impressions of the processes involved and will also draw comparisons with alternative approaches to exposing publisher metadata. The outcome will be a set of recommendations on how, and in what circumstances, publishers might choose to implement the static repositories approach.

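The four journals (all from the field of Library and Information Science) participating in the project are Journal of Digital Information [15], Information Research [16], Library and Information Research [17], and Information Scotland [18].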

Although all of these publishers provide electronic versions of their journal, they have different publication processes and different technical support available to them. The differences between the journals (frequency of publication, method, staff involved, metadata created) allow the applicability and efficiency of static repositories to be assessed in a variety of settings. One of the publishers, Texas A&M University, is in the process of setting up its own OAI-PMH repository, which may allow a comparison between static and full repositories.

Creating static repositories for publisher metadata will not in itself resolve the difficulties with identifying consecutive articles from particular issues of a journal. It will, however, allow for searches to be restricted to a particular journal and may inform the future development of appropriate metadata elements or extensions.

Conclusion

The outcomes of this project exploring the benefits of static repositories to the publishing community will support both the greater participation of that community within the OAI community and the wider use of static repositories. Enabling small publishers of professional and peer-reviewed journals to expose their metadata increases the visibility of the citable final version and provides other repositories with a clear link to this version. Testing the flexibility of static repositories promotes their use for other, perhaps more dynamic, content such as e-books, e-learning materials and other digital resources.

The project [4] is underway and will finish at the end of May 2006.

References

  1. The Open Archives Initiative Protocol for Metadata Harvesting
    http://www.openarchives.org/OAI/openarchivesprotocol.html
  2. Chumbe, S., Macleod, R. "Developing Seamless Discovery of Scholarly and Trade Journal Resources Via OAI and RSS", Isaias, P., Karmakar, N. eds. Proceedings of the IADIS International Conference WWW/Internet 2003, Algarve, Portugal, 5-8 November 2003, Volume 2, pp. 853-856.
  3. Specification for an OAI Static Repository and an OAI Static Repository Gateway
    http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm
  4. STARGATE http://cdlr.strath.ac.uk/stargate/
  5. Lagoze, C., Van de Sompel, H., "Building a low-barrier interoperability framework", JCDL '01, June 17-23, 2001, Roanoke, VA. http://www.openarchives.org/documents/jcdl2001-oai.pdf
  6. Experimental OAI Registry at UIUC http://gita.grainger.uiuc.edu/registry/
  7. OAIster http://oaister.umdl.umich.edu/o/oaister/
  8. Citebase http://www.citebase.org/
  9. National Library of Australia Digital Object Repository http://www.nla.gov.au/digicoll/oai/
  10. Inderscience Publishers Ltd. http://www.inderscience.com/mapper.php?id=11
  11. Kerr, L., Corlett, J., Chumbe, S. (2003) Case Study for the creation of an OAI repository in a small/medium sized publishers
    http://www.eevl.ac.uk/projects_503.htm
  12. Specification for an OAI Static Repository and an OAI Static Repository Gateway
    http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm
  13. Open Language Archives Community
    http://www.language-archives.org/
  14. Brophy, P. HaIRST Project summative evaluation: report. Manchester: CERLIM. (2005)
    http://hairst.cdlr.strath.ac.uk/documents/HAIRST-Summative-Evaluation-Final.pdf
  15. Journal of Digital Information http://jodi.tamu.edu/
  16. Information Research http://informationr.net/ir/
  17. Library and Information Research
    http://www.cilip.org.uk/specialinterestgroups/bysubject/research/publications/journal
  18. Information Scotland
    http://www.slainte.org.uk/publications/serials/infoscot/contents.html

Author Details

R. John Robertson
Researcher / Stargate Project Officer
Centre for Digital Library Research
Department of Computer and Information Sciences
University of Strathclyde

Email: robert.robertson@cis.strath.ac.uk
Web site: http://cdlr.strath.ac.uk/
