Metadata Corner: Naming Names - Metadata Registries

Rachel Heery

Rachel Heery examines metadata issues.

Today’s information user may well have access to a range of resources, and these resources will be described in more diverse resource description formats than traditional MARC. During the search process a user will encounter systems based on several different resource description formats (for example, their local OPAC, an internet subject gateway and an electronic text archive), each of which will manipulate a different variety of metadata. Although the promise of more ‘seamless searching’ across interoperable systems means that end users will not themselves be aware of these diverse formats, there are other people, and indeed software, that will need to understand and manage them.

Creators of resource descriptions, those who are implementing new search services, and designers of interoperable systems all need to be aware of the authoritative version of a particular resource description format and its precise data element definitions. Those who recall updating loose-leaf manuals (UKMARC, USMARC etc.) will realise this is not a scalable option. The information needs to be made accessible quickly and accurately over the network.

Among those involved in the Dublin Core effort, the need for sharing information about the element set was recognised at the 4th Dublin Core Metadata Workshop. How would implementors communicate their decisions regarding the use of qualifiers and extensions? How would the ‘final fifteen’ elements be recorded authoritatively? Stu Weibel comments in the workshop report [1]:

Name space management is prominent among the thorny impediments of deploying a world-wide metadata architecture. Who controls a given metadata name space (such as the Dublin Core, for example)? How can such a name space be partitioned such that it can easily be extended by others without re-inventing a particular wheel? What conventions are necessary to make a naming authority globally visible and accessible, and what sort of structured or unstructured data should be available online for humans or applications to process?

Establishing some form of ‘metadata registry’ seems to offer a way forward. For any particular ‘metadata element set’ the registry could record designation of content and agreed usage. Any extensions to the format would be recorded, as would agreed mappings between formats. The role of such registries would be both to promote and to inform, thereby encouraging the use of standard formats and reducing duplication of effort.
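As a minimal sketch of what such a registry might hold, the following Python models a single element set with its definitions, usage notes, registered extensions and mappings to other formats. All class names, field names and sample entries here are hypothetical, chosen only for illustration; they do not describe any existing registry.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One data element: its name, agreed definition and usage notes."""
    name: str
    definition: str
    usage: str = ""
    is_extension: bool = False  # True if added locally to the core set


@dataclass
class ElementSet:
    """A registered element set, with its elements and agreed mappings."""
    name: str
    elements: dict = field(default_factory=dict)
    # (element name, target format) -> field name in the target format
    mappings: dict = field(default_factory=dict)

    def register(self, element):
        # Refuse duplicates so each name has one authoritative definition.
        if element.name in self.elements:
            raise ValueError(f"{element.name!r} is already registered")
        self.elements[element.name] = element

    def map_to(self, element_name, target_format, target_field):
        # Only registered elements may be mapped to another format.
        if element_name not in self.elements:
            raise KeyError(element_name)
        self.mappings[(element_name, target_format)] = target_field

    def extensions(self):
        return sorted(e.name for e in self.elements.values() if e.is_extension)


# Hypothetical sample data, for illustration only.
example_set = ElementSet("ExampleSet")
example_set.register(Element("Title", "The name given to the resource"))
example_set.register(
    Element("ReadingLevel", "Intended audience level", is_extension=True)
)
example_set.map_to("Title", "OtherFormat", "main-title")
print(example_set.extensions())
```

Keeping extensions in the same structure as core elements, distinguished only by a flag, is one possible design; a registry could equally record them in a separate namespace.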

Within the UK there is growing recognition of the desirability of searching across subject domains and across media types. The higher education funding bodies have shown commitment to encouraging the adoption of standards and protocols which will enable such cross searching and retrieval. Interoperability requires metadata to be used in standard ways, and one way to promote such standardisation is to establish a system of registries. The need for interoperability occurs at various levels. These can be typified as:

global
regional
domain (by subject or resource type)

There may be more (e.g. sectoral).

At each of these levels there is a different business model for interoperability. The organisational impetus for achieving interoperability differs, the market differs, user expectations differ. A distributed model for registries would take account of this fact. Distributed domain/regional registries might exist in a mesh with higher level ‘global’ registries.

UKOLN is currently setting up a registry for use of the ROADS template. (ROADS is an eLib project providing software to support subject-based internet search services.) UKOLN registers template types and elements in use in current implementations. We act as a focus for additions and amendments to the element set, and provide element definitions and guidelines for use. We have drawn up mapping ‘crosswalks’ from ROADS templates to USMARC and to the Z39.50 bib-1 use attribute set. UKOLN has plans to extend its registry activity: to provide rules for formulation of content (simple cataloguing rules), and to begin to provide a framework for interoperability within UK higher education resource discovery services. We also intend to record use of extensions to the Dublin Core element set in UK implementations.
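A crosswalk of this kind is, at its simplest, a lookup table from elements in one format to fields in another. The Python sketch below shows the idea. The template element names and their pairing with bib-1 use attribute numbers are assumptions made for illustration, not UKOLN's published crosswalk, and the `translate` helper is hypothetical.

```python
# Illustrative crosswalk: template element name -> bib-1 use attribute.
# The pairings are assumptions for this example only.
CROSSWALK_TO_BIB1 = {
    "Title": 4,
    "Author": 1003,
}


def translate(record, crosswalk):
    """Split a record into mapped values and names with no mapping.

    Returns (mapped, unmapped): mapped is keyed by the target field,
    unmapped lists source element names the crosswalk does not cover.
    """
    mapped, unmapped = {}, []
    for name, value in record.items():
        if name in crosswalk:
            mapped[crosswalk[name]] = value
        else:
            unmapped.append(name)
    return mapped, unmapped


# Hypothetical record; "LocalNote" stands in for an unregistered element.
record = {
    "Title": "Metadata Registries",
    "Author": "Heery, R.",
    "LocalNote": "internal use",
}
mapped, unmapped = translate(record, CROSSWALK_TO_BIB1)
print(mapped)
print(unmapped)
```

Reporting the unmapped elements, rather than silently dropping them, makes visible where a crosswalk loses information.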

It does appear there is some convergence of interest in metadata registries at the moment, coming not just from the digital library community but from a variety of other implementation areas and business groups.

This interest has been recognised by the International Organization for Standardization (ISO), in particular by those individuals involved with the development of ISO/IEC 11179: Specification and Standardization of Data Elements, Part 6 of which deals with registration of data elements. The ISO Joint Technical Committee 1, Subcommittee 14 (Data Engineering) proposed a ‘Joint Workshop on Metadata Registries’ [2], which was held in Berkeley in July of this year and was sponsored by the US Environmental Protection Agency, OCLC and the Metadata Coalition.

The Metadata Registries Workshop brought together people from a wide range of communities including those involved in digital library management, database management, and software engineering; they included people from research backgrounds, as well as service providers in government, business and educational sectors. In fact the diversity of this meeting was such that achieving consensus on an approach to metadata registration that could be encapsulated in any single ‘standard’ was not seen by many as a viable objective. The value of such a gathering was rather to increase awareness of different approaches to metadata, indeed to gain some understanding of what ‘metadata’ means to these communities, and to become informed regarding good practice and implementation experience.

One interesting example of the very few metadata registries in existence is the Australian National Health Information Knowledgebase [3]. At the Workshop Nigel Mercer, Head of the Institute’s Corporate Data Management Unit, gave an account of the development of the registry. The initial contents are based on those elements in the Australian National Health Data Dictionary, but in the future the registry will be extended to include data definitions from welfare and community services. This is an electronic repository and query tool for health metadata, providing information about the use of particular data elements. As well as definitions and permitted values, it includes information on related types of data and the nature of the relationship, the data collections available, who collects that kind of data, and so on. The Knowledgebase has been constructed according to ISO/IEC 11179.

In the US the Environmental Protection Agency has established another ISO/IEC 11179 compliant registry, the Environmental Data Registry [4]. This data registry allows you to retrieve information about data elements and data concepts found in selected EPA systems. Once again the context of the registry is that of data surveys and data collection, with an acknowledged hierarchy of authority for formulating definitions and permitted values.

Clifford Lynch, as keynote speaker at the Workshop, considered the role of the metadata registry in facilitating extensibility and interoperability in the context of network resource discovery. For example, the registry would allow specific communities of practice to assume a common framework as regards their chosen ‘tagged language’ and ensure ‘collision avoidance’ within the format structure. Extensible semantics as defined in a registry might be either human readable or machine readable. The registry might allow the interpretation of different metadata formats by means of crosswalks, mappings or translations.
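One way to picture ‘collision avoidance’ is a registry that hands each community its own name space prefix and only lets the owning authority define elements within it. The Python below is a sketch under that assumption; the class, the prefixes and the authority names are all hypothetical, and a real registry would need far richer governance.

```python
class NamespaceRegistry:
    """Toy registry: each prefix is owned by one naming authority."""

    def __init__(self):
        self._owners = {}    # prefix -> naming authority
        self._elements = {}  # (prefix, element name) -> definition

    def claim(self, prefix, authority):
        # The first claimant owns the prefix; a rival claim is rejected.
        owner = self._owners.setdefault(prefix, authority)
        if owner != authority:
            raise ValueError(f"prefix {prefix!r} already belongs to {owner!r}")

    def define(self, prefix, name, definition, authority):
        # Only the owner of a prefix may add elements under it.
        if self._owners.get(prefix) != authority:
            raise PermissionError(f"{authority!r} does not control {prefix!r}")
        key = (prefix, name)
        if key in self._elements:
            raise ValueError(f"{prefix}:{name} is already defined")
        self._elements[key] = definition

    def lookup(self, qualified_name):
        # Qualified names take the form "prefix:name".
        prefix, _, name = qualified_name.partition(":")
        return self._elements[(prefix, name)]


# Two hypothetical communities both want an element called "date".
registry = NamespaceRegistry()
registry.claim("lib", "Library Group")
registry.claim("env", "Survey Group")
registry.define("lib", "date", "Date of publication", "Library Group")
registry.define("env", "date", "Date the sample was collected", "Survey Group")
print(registry.lookup("lib:date"))
print(registry.lookup("env:date"))
```

Because every element is looked up by its qualified name, the two definitions of "date" coexist without either community needing to know about the other.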

As a participant in the workshop I, along with others, found communication across so many boundaries a difficult and, at times, a frustrating process. That the workshop was able to agree on a number of resolutions (at present in draft format [5]) was no small achievement and a tribute to the determination of the organisers.

References

[1] The 4th Dublin Core Metadata Workshop Report, D-Lib Magazine, June 1997
http://hosted.ukoln.ac.uk/mirrored/lis-journals/dlib/dlib/dlib/june97/metadata/06weibel.html

[2] Joint Workshop on Metadata Registries
http://www.lbl.gov/~olken/EPA/Workshop/

[3] Australian National Health Information Knowledgebase
http://meteor.aihw.gov.au/ (Formerly http://www.aihw.gov.au/nhik/ -Ed.)

[4] Environmental Data Registry
http://www.epa.gov/edr/

[5] Joint Workshop on Metadata Registries
http://www.lbl.gov/~olken/EPA/Workshop/report.html

Author details

Rachel Heery
UKOLN Metadata Group
UKOLN
University of Bath

Email: r.heery@ukoln.ac.uk
Own Web Site: http://www.ukoln.ac.uk/ukoln/staff/r.heery/