David Little outlines the resource sharing arrangements between the MedHist gateway and the Humbul hub, using the OAI Protocol for Metadata Harvesting, and some of the issues it has raised.
The MedHist gateway [1] was launched in August 2002, providing access to a searchable and browsable catalogue of high-quality, evaluated history of medicine Internet resources. MedHist has been funded and developed by the Wellcome Library for the History and Understanding of Medicine [2], but is hosted by the BIOME health and life sciences hub [3], and as such is part of the Resource Discovery Network (RDN). MedHist was developed principally to fill the gaps left in the coverage of the history of medicine by existing resource discovery services within and outside the RDN. Both the Humbul Humanities Hub [4] and the OMNI [5] gateway within BIOME provided some coverage of the subject, although this was not exhaustive. Outside the RDN, resource discovery services for the history of medicine were either defunct or concentrated on far narrower or broader subject areas [6].
The interdisciplinary nature of the history of medicine caused problems for the Wellcome Library in deciding where to locate MedHist. Keen to keep the service within the RDN, the Library decided to make MedHist a part of BIOME, whose federated structure of health and medicine related gateways under a single hub suited the creation of an independent gateway with an affiliation to an existing service.
However, the interdisciplinary nature of the subject area suggested that it would also be important to make MedHist's resource description records available to other services with overlapping subject interests, such as Humbul and the SOSIG [7] gateway, and to import any relevant metadata from other gateways, such as Humbul's History and Philosophy of Science records. Therefore, early on in the development of MedHist, methods of making its metadata available to other services, and of importing external metadata, were investigated. The solution decided upon was the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [8].
MedHist, in line with other RDN gateways, collects and makes available descriptive metadata about Internet resources, catalogued in accordance with the Dublin Core Element Set [9]. In addition to obvious access points such as title and URL, resource descriptions include an evaluative paragraph outlining the purpose and main features of the resource, and keywords are assigned from the National Library of Medicine's MeSH (Medical Subject Headings) thesaurus [10]. Where a resource is dedicated to an individual, personal name headings from the Library of Congress Name Authority File (LCNAF) are also added [11]. Alongside this descriptive metadata, administrative metadata such as site creator and owner is also collected, but not displayed on the MedHist Website.
MedHist records are automatically available via a number of different access points: the MedHist Website; the BIOME Website, which searches the catalogues of all its constituent gateways; and the RDN ResourceFinder database, a "union catalogue" of all the RDN's service providers' catalogues [12].
However, sharing metadata directly between different RDN services required the use of a separate process. Several options were considered before deciding on the use of OAI-PMH to expose and harvest metadata:
At present, MedHist records are exported to Humbul, and Humbul's History and Philosophy of Science (HPS) records are imported into MedHist, on a weekly basis. Whilst the exchange of records has been largely successful, ongoing issues, some of which are examined below, have so far prevented MedHist records from being displayed on Humbul. However, Humbul HPS records are fully accessible via MedHist.
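As a rough illustration of the kind of weekly harvest described above, the following Python sketch builds an OAI-PMH `ListRecords` request for unqualified Dublin Core and extracts a few fields from the response. It is not the code used by MedHist or Humbul: the base URL, the set name `hps`, the function names and the sample response are all assumptions made for the example. Only the protocol verb, the request parameters and the XML namespaces come from the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and the Dublin Core Element Set.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, set_spec=None, from_date=None):
    """Build a ListRecords request asking for unqualified Dublin Core."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec      # restrict the harvest to one collection
    if from_date:
        params["from"] = from_date    # only records changed since this date
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return a list of dicts, one per harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            continue  # e.g. a deleted record, which has a header only
        records.append({
            "oai_id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": dc.findtext("dc:title", namespaces=NS),
            "url": dc.findtext("dc:identifier", namespaces=NS),
            "description": dc.findtext("dc:description", namespaces=NS),
        })
    return records


# Invented response, standing in for what a data provider might return.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2003-01-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example history of science site</dc:title>
          <dc:identifier>http://example.org/site/</dc:identifier>
          <dc:description>Placeholder description.</dc:description>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai", set_spec="hps"))
    for record in parse_records(SAMPLE_RESPONSE):
        print(record["title"], record["url"])
```

A real harvester would also need to fetch the URL, follow any `resumptionToken` the provider returns for large result sets, and handle error responses; these are omitted here to keep the sketch short.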
Humbul OAI records are currently exported to MedHist on a weekly basis:
Once imported, MedHist staff:
Once live, Humbul HPS records are available on the MedHist Website:
Record display for the Aldous Huxley: the author and his times Website
...and the same record displaying in MedHist
To date the process has highlighted some issues which are currently being addressed by both Humbul and MedHist.
Staff time overheads
To fully integrate Humbul records into MedHist, MedHist staff must spend
time adding subject headings to imported records, and deleting any Humbul
records which "duplicate" existing MedHist entries. Given the number of
records currently imported from Humbul, this is not too time-consuming,
although it raises questions about the sustainability of the process if it
were extended to include records from other gateways. One approach
currently being considered is the automated addition of suitable subject
headings by the data provider, i.e. for Humbul to add agreed MeSH keywords
to each of the records exported. This would allow records to be
incorporated into MedHist's browse structure, although it is likely they
would have to be very generic headings (e.g. "Science" and "Philosophy").
Whether it is possible, or even desirable, for third party metadata to
go "live" automatically after import is another area which
needs to be examined more closely.
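The two manual steps described above, removing imported records that duplicate existing entries and attaching generic subject headings, could in principle be partly automated. The sketch below is one possible approach and not a description of MedHist's actual workflow: the record fields (`url`, `keywords`, `status`), the URL-matching rule and the function names are all assumptions. Imported records are deliberately left in a "pending" state rather than made live, reflecting the open question of whether automatic publication is desirable.

```python
from urllib.parse import urlsplit


def normalise_url(url):
    """Reduce a URL to a comparable form: lower-case host, no trailing slash.

    Query strings and fragments are ignored, which is a simplification.
    """
    parts = urlsplit(url.strip())
    return f"{parts.scheme}://{parts.netloc.lower()}{parts.path.rstrip('/')}"


def prepare_import(imported, existing_urls, default_keywords):
    """Split imported records into those needing review and likely duplicates.

    A record counts as a duplicate when its normalised URL matches a resource
    already catalogued locally. All other records receive the agreed generic
    keywords and are marked "pending" so that staff can check them.
    """
    known = {normalise_url(u) for u in existing_urls}
    to_review, duplicates = [], []
    for record in imported:
        if normalise_url(record["url"]) in known:
            duplicates.append(record)
            continue
        prepared = dict(record)  # copy, so the harvested data is not altered
        prepared["keywords"] = list(default_keywords)
        prepared["status"] = "pending"
        to_review.append(prepared)
    return to_review, duplicates
```

Matching on URL alone will miss duplicates catalogued under different addresses, so a step like this could reduce, but not remove, the need for staff checking.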
Re-presenting metadata
MedHist and Humbul have slightly different conventions for the display of
metadata records. Humbul favour both a short and full record display, the
latter displaying information such as site author and publisher in
addition to title, URL and description etc. MedHist has only one record
display which features hyperlinked title, description and keywords. BIOME
have tended to use author / publisher information only for internal
purposes, whereas Humbul consider this part of the full record display, in
the same way it would be displayed for books within a library OPAC.
Certainly BIOME have expressed some concern about this administrative data
being displayed and the implications it may have for data protection. This
is an area which may need to be considered across the RDN as a whole.
Rights
In a similar vein, some thought has had to be given to the expression
of rights statements when displaying third party metadata. At present
MedHist displays a single rights statement which indicates the record is
from Humbul's History and Philosophy of Science collection, but does not
reflect the rights statements published by Humbul which acknowledge the
work of individual cataloguers within its distributed cataloguing system.
The most likely way that this will be addressed is to have a hyperlinked
rights statement that will lead users to the full record within Humbul,
where they will also be able to see the resource creator and publisher
information.
Collection development
Over-reliance on third party metadata could potentially encourage
gateways not to catalogue within certain areas of their collections which
may be covered by other gateways. Whilst this may cut down duplication of
effort and be seen as beneficial, it may mean that some key resources will
not have been described by a particular gateway for its key audience. MedHist
continues to catalogue all history of medicine resources, including ones
that may already be within Humbul, and views other Humbul records as ways
of supplying records which may be slightly more peripheral, although still
of interest to the subject area. In addition to "adding value"
to the service, these more tangential records help to provide contextual
information for those interested in, for instance, developments within
scientific thinking during a particular period of time.
OAI and Dublin Core
At present, the OAI records exchanged use only unqualified Dublin Core,
the one metadata format that OAI-PMH requires every repository to support.
This means that records imported and exported cannot express the full
richness of the metadata collected. For instance, MedHist keywords are
exported without any indication that they are from MeSH or the LCNAF.
Similarly, Humbul's author / creator distinctions (e.g. Web designer,
author, compiler) are lost during the export process (although it must be
noted that these would not be able to be used by MedHist in any case).
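The loss of information described above can be shown with a small sketch. The internal record format, the field names and the sample values below are invented for illustration. Only the `oai_dc` and `dc` namespaces are taken from the relevant specifications. Each keyword is stored with its source scheme, but unqualified Dublin Core offers only a plain `dc:subject` element, so the scheme has nowhere to go when the record is exported.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def to_unqualified_dc(record):
    """Serialise a record as unqualified Dublin Core (oai_dc)."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    ET.SubElement(root, f"{{{DC}}}title").text = record["title"]
    for keyword in record["keywords"]:
        # Only the term is written out: the scheme is silently dropped.
        ET.SubElement(root, f"{{{DC}}}subject").text = keyword["term"]
    return ET.tostring(root, encoding="unicode")


# Hypothetical internal record; terms and schemes are illustrative values.
RECORD = {
    "title": "Example record",
    "keywords": [
        {"term": "History of Medicine", "scheme": "MeSH"},
        {"term": "Huxley, Aldous, 1894-1963", "scheme": "LCNAF"},
    ],
}

if __name__ == "__main__":
    print(to_unqualified_dc(RECORD))
```

In the output both keywords appear as identical `dc:subject` elements, so a harvesting service cannot tell a MeSH heading from a personal name heading.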
Different records for different services?
At present, BIOME must export two sets of OAI records: one for
the RDN which features a "cut-down" version of its gateways'
metadata (featuring basic descriptive metadata and metadata about the
metadata record itself), and one for Humbul which must additionally
feature author and publisher information. This raises questions about
standards within the RDN and the extra work for technical staff which "bilateral"
agreements between gateways can create. This is an issue that will
probably have to be considered across the RDN as a whole.
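One way to picture the two record sets is as two "export profiles" applied to the same internal record. The sketch below is purely illustrative: the profile names, the field names and the idea of driving exports from a single table are assumptions, not a description of how BIOME produces its OAI records.

```python
# Hypothetical export profiles; all field names are invented for illustration.
BASIC_FIELDS = ["title", "url", "description", "record_created", "record_modified"]

EXPORT_PROFILES = {
    "rdn": BASIC_FIELDS,                             # "cut-down" record
    "humbul": BASIC_FIELDS + ["author", "publisher"],  # adds administrative data
}


def export_record(record, profile):
    """Return only the fields that the named profile allows to be exported."""
    allowed = EXPORT_PROFILES[profile]
    return {name: record[name] for name in allowed if name in record}
```

Each further bilateral agreement would add another entry to such a table, which gives a sense of how the maintenance burden grows with the number of partner gateways.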
It has become clear that OAI is an effective way of sharing metadata between gateway services, but that it is not a panacea for all interoperability ills. The process between MedHist and Humbul has not been as straightforward as originally envisaged. It has been a "learning process" which has raised almost as many questions as it has answered. However, the issues raised are hurdles that anyone using OAI-PMH or metadata aggregation services may have to overcome. Overall, OAI-PMH has shown itself to be an efficient and effective means of exchanging metadata, and has also demonstrated how data may be re-used and re-formatted outside its original context. We certainly look forward to resolving some of the outstanding problems and pushing the resource sharing agenda forward with other related gateways, including SOSIG.
David Little
MedHist Project Officer
Wellcome Library for the History and Understanding of Medicine
Wellcome Trust
Email: d.little@wellcome.ac.uk