The National Library of Australia (NLA) has been able to achieve new business practices such as digitising its collections and hosting federated search services by exploiting recent standards including the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), handles for persistent identification, and metadata schemas for new types of content. Each instantiation of the OAI-PMH opens up new ways of creating and managing our digital libraries while making them more accessible for learning, teaching and research purposes. Using handles as the basis for managing the persistence of a large, digitised collection has allowed information to be identified and cited in many different ways. Standards have transformed, and continue to transform, the way in which the National Library conducts its core business of making its digital library collections available for all to use.
Although the National Library of Australia has always adhered to the use of standards such as MARC21 and AACR2 for the creation and management of bibliographic data [1], our adoption of newer Web-based protocols has allowed users distant from the Library to experience digital and digitised collections in virtual ways. The Library has been able to retain legacy system investments by surfacing rich metadata into new services. This process has encouraged the exploration of the ongoing suitability of metadata schemas, as well as contributing to the assessment of the need for new schemas.
The Library promotes the standards it uses with a comprehensive overview on its Web site providing links to more detailed documents which can act as guidelines for other service providers [2].
This article focuses specifically on the application of standards in three areas:
Under the imprimatur of its Australian National Bibliographic Database (NBD) [3], the Library has supported federated resource discovery for more than two decades. Australian libraries have contributed cataloguing and holdings records for finding and copying in a centralised framework since 1981. Even when these processes gradually changed to a hybrid model, where records were created on individual open access catalogues, the contribution to a central discovery point remained intact. There is still a commitment to sharing information at this national level.
The arrival of other metadata schemas such as the Dublin Core allowed the National Library to emulate the hybrid model for the discovery of digital objects. The Library first became interested in the Open Archives Initiative when its harvesting protocol was known as the Santa Fe convention [4]. However, it was not until the stable OAI-PMH version 1.1 became available that the Library implemented it, initially in the PictureAustralia service [5]. The service was able to move from a cumbersome HTTP/HTML method of harvesting to the more streamlined use of the OAI-PMH. This decision enabled other cultural agencies such as state and regional libraries and museums to become familiar with its use. Australian university libraries and cultural institutions overseas are now also providing digitised images to the service [6].
Before using the OAI Protocol for Metadata Harvesting, complete Web harvesting for PictureAustralia took about 14 days every two months. Harvesting larger sites with around 200,000 metadata records took up to five days and was not completely reliable. Sometimes a harvest of these large sites would fail and have to be completely re-done. The PictureAustralia service was really only up to date six times per year.
After implementing OAI, a complete OAI harvest of the large sites took about 4 hours. Incremental OAI harvests take less than one minute. At present the Library uses a hybrid model in which the larger sites that support OAI are incrementally harvested every day, while the smaller sites are Web-harvested once per week. PictureAustralia is therefore completely up to date once a week, rather than six times per year, which represents a marked improvement in the currency of the more than one million records in the service.
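The incremental harvesting described above can be sketched in a few lines of Python. This is a minimal illustration, not the Library's own harvester: it assumes OAI-PMH 2.0 conventions (the article describes a version 1.1 implementation, which used different response namespaces), and the base URL, sample response and function names are hypothetical. It shows how a harvester asks only for records changed since a given date and how it follows a resumption token when a response is split across pages.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses (assumption: a 2.0 repository).
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, from_date=None, metadata_prefix="oai_dc",
                     resumption_token=None):
    """Build a ListRecords request URL for a full or incremental harvest."""
    if resumption_token:
        # A resumptionToken is sent on its own, alongside the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # 'from' restricts the harvest to records changed since that date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical response page, used here instead of a live network call.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai", from_date="2004-04-13"))
    ids, token = parse_page(SAMPLE_RESPONSE)
    print(ids, token)
```

A daily incremental harvest would pass the date of the previous successful run as `from_date`, which is why it transfers far less data than a complete harvest.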
Experience with the OAI-PMH and the use of the Online Computer Library Center (OCLC)'s OAICat software [7] with the Library's digital object repository opened up our digital libraries of cultural heritage materials for inclusion in international federated resource discovery services such as Google [8], OAIster [9] and the Research Libraries Group's Cultural Materials Initiative [10].
Use of the Protocol in open services, which are available 24 hours and 7 days a week, gives the National Library a solid platform from which to move to the next stage of new service development. The Library has been encouraging the university sector to work with OAI infrastructure by using a small prototype to harvest research outputs and collocate materials useful for research purposes [11]. This work is being progressed as part of the Australian Higher Education sector's ARROW Project [12]. The Library is developing a national discovery layer, which will harvest the metadata for all research outputs from individual institutional repositories and provide cross-searching services. Additional functionality is still being considered.
One area where it has been difficult to obtain international agreement is the establishment of a single Uniform Resource Name (URN) scheme for digital or digitised objects [13].
The Library considers persistent identification of digital objects to be a necessary part of managing a digital library, just as ISBN and International Standard Serial Number (ISSN) assignments are essential components of managing a print-based collection. Identification schemes for print-based objects such as books (ISBNs) or journals (ISSNs) or sheet music (International Standard Music Numbering - ISMNs) were tested, and in some cases were successfully redeployed as a component part of a digital identifier for digital materials, but they do not match requirements exactly for objects which do not emulate print forms.
The Library introduced a persistent identification scheme in 2001 to assign identifiers to objects in sub-collections, such as Web sites captured into the digital archive PANDORA [14]. Based on the Handle system, the scheme was extended to provide further intelligence for composite digitised objects such as manuscripts. For example, <collection id>-<collection no.>-<series no.>-<item no.>-<sequence no.>-<role code>-<generation code> becomes nla.ms-ms8822-001-0001-002-d for the file which is the display image of the second page of the first item in series 1 of the Mabo papers [15]. The Library has recently registered its persistence schemes in the Info-URI registry hosted by OCLC [16].
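To make the structure of such an identifier concrete, the following Python sketch splits the example above into its named components. It is an illustration only, not the Library's resolver: the field names and the `parse_manuscript_id` function are hypothetical, and the sketch assumes that components are separated by hyphens and that trailing components may be omitted, since the example carries six components against the seven in the template. Mapping the final `d` to the role code is a purely positional assumption.

```python
# Component names follow the template in the text; the Python names are
# hypothetical labels chosen for this sketch.
FIELDS = (
    "collection_id",
    "collection_no",
    "series_no",
    "item_no",
    "sequence_no",
    "role_code",
    "generation_code",
)


def parse_manuscript_id(identifier):
    """Split a hyphen-separated identifier into named components.

    Assumption: components are matched to field names by position, and
    trailing components may be absent.
    """
    parts = identifier.split("-")
    if len(parts) > len(FIELDS):
        raise ValueError("too many components in identifier: " + identifier)
    return dict(zip(FIELDS, parts))


if __name__ == "__main__":
    parsed = parse_manuscript_id("nla.ms-ms8822-001-0001-002-d")
    for name, value in parsed.items():
        print(name, "=", value)
```

Because each level of the hierarchy has its own component, a service can derive the identifier of the parent item or series by dropping trailing components, which is part of what lets the same object be cited at different levels of granularity.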
"Using persistent identifiers provides the ability to guarantee:
A persistent identifier scheme is also being used in the ARROW Project, which provides an additional commitment to the delivery of a top-quality service.
The National Library has worked with descriptive metadata standards such as MARC21 and AACR2 since the 1960s. In recent times, however, the use of bibliographic standards has become the subject of a well-recognised controversy: the return on investment in the metadata creation process has been challenged [18].
It is true that there has been an explosion in the amounts of information, in packaged or unpackaged form, which needs describing. There are simply not enough qualified professionals such as librarians and indexers available to create the necessary descriptions for subsequent discovery and management of information objects. Providers of tertiary-level information services have started to query this. The ARROW Project is exploring a combination of solutions for the creation of metadata, which will adhere to the following seven principles:
The working environment will also dictate in part who creates the metadata. The ARROW Project is investigating whether a shared approach will deliver the best result. This concept has already been explored to a certain extent for the UK Higher Education sector by the ePrints UK service [26].
The creation/addition of metadata in any working environment is not necessarily undertaken in the implied linear order of the diagram below (provided by UKOLN); it should be an iterative process. The diagram does, however, exemplify how multiple roles in the metadata creation process are possible. The new business cases for the management of research outputs, postulated by the establishment of individual institutional repositories, allow metadata workflows to be engineered afresh. They are not restricted by pre-existing data conditions often imposed by legacy metadata [27].

Figure 1: Metadata creation workflow
(Diagram Source: Improving the Quality of Metadata in Eprint Archives, Marieke Guy, Andy Powell and Michael Day, Ariadne Issue 38 [26]). The ARROW Project is keen to explore how workflows for metadata creation can be transformed by this approach. Automated approaches may require review combined with enhancement.
Information specialists including librarians and indexers can add metadata from rich schemes such as Library of Congress Subject Headings after the creator of the work creates a skeleton metadata record.
Metadata, the bread and butter of cataloguing services such as the Australian National Bibliographic Database, can attract a dual responsibility and continue to facilitate the sharing of information by libraries for the benefit of everyone else. A shared approach provides a response to the concerns expressed recently by Tony Hey, Director of the e-Science Project [28]. Capturing the metadata, using a combination of automated software and people with a stake in the longevity of their work, is the first step in changing the way digital objects, the foundation stone of digital libraries, can be identified, captured and managed in perpetuity. What better way to transform services to become our digital libraries of the future?
The author is grateful to colleagues Jasmine Cameron, Assistant Director-General and Tony Boston, Director Digital Services, both of the National Library of Australia, for their assistance in reviewing this article.
All URLs accessed 14 April 2004
Ariadne is published every three months by UKOLN. UKOLN is funded by MLA the Museums, Libraries and Archives Council, the Joint Information Systems Committee (JISC) of the Higher Education Funding Councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based. Material referred to on this page is copyright Ariadne (University of Bath) and original authors.