Web Magazine for Information Professionals

JISC Terminology Services Workshop

Sarah Shreeves reports on a one-day workshop on current developments and future directions for JISC terminology services held in London, February 2004.

Co-sponsored by the Joint Information Systems Committee (JISC) and UKOLN, the JISC Terminology Services Workshop was held at the CBI Conference Centre in London on 13 February 2004. Terminology services are networked services which use knowledge organisation systems (such as ontologies, controlled vocabularies, and classification systems) that can be accessed at certain stages of the production and use of metadata. Chris Rusbridge, Director of Information Services at the University of Glasgow, welcomed the participants and outlined the primary purposes of the workshop: to give an overview of research and work on networked terminology services in multiple domains and to inform future JISC development activities in this area. He reminded the participants that the ultimate purpose of terminology services is to help - whether directly or indirectly - users find the appropriate resources.

Lorcan Dempsey, Vice President for Research at OCLC, gave the keynote address. Noting that terminologies should be regarded as resources in their own right, he argued for leveraging these resources through the development of accessible, modular, web-based terminology services rather than a large monolithic service. He illustrated the modular approach by demonstrating the integration of the Library of Congress (LC) Name Authority File into the metadata creation module of DSpace. He urged JISC to encourage experimentation with a diversity of small, simple services rather than a single universal service.

The remainder of the workshop was divided into several sessions, each reported on below.

Many of the presentations are available at the workshop Web site.[1]

User Requirements

Paul Miller, Director of Common Information Environment at JISC, introduced the first session, a panel discussion on user needs for terminologies and terminology services within several specific domains.

Sarah Currier, Librarian for Learning Resources for Social Care at the University of Strathclyde, spoke of user needs within the learning community. There is a basic need for widely agreed subject vocabularies and classification schemes for learning objects, but there is also a need for specific vocabularies for different levels of education. These are fundamentally different from the controlled vocabularies typically used in libraries and other types of cultural heritage institutions. The wide range of learning objects in the e-learning domain requires specialised descriptions.

Nicholas Gibbins, Research Staff at the University of Southampton, reported on the needs of researchers and projects in the e-Science domain. e-Science involves large distributed collaborations with large data collections and high-performance computing and storage needs. e-Science requires metadata and terminologies for both resources and services. Nicholas observed that some of the features of the Semantic Web (such as the Resource Description Framework (RDF) and Web Ontology Language (OWL)) are attractive to e-Scientists.

Peter Jordan, Search and Metadata Manager at the Office of the e-Envoy, spoke about the use of the UK Online Web site, a portal to government information [2]. He stepped through an example showing the terms that users of the Web site employ to find information about life-long learning. He found that the terms entered by users were most often not the terms that were employed by the developers of the portal and that more mapping needed to be done between the two.

Ben Toth, Director of the National Electronic Library for Health, spoke about the use of terminologies at the National Health Service (NHS). Terminologies and terminology services that map between vocabularies are a key issue because of the electronic patient record initiative which includes a need to create links between the clinical terminologies (such as the Systematized Nomenclature of Medicine (SNOMED)) and bibliographic terminologies (such as the Medical Subject Headings (MeSH)). He argued that a business case must be made for terminology services and that in order to do this strong use cases should be developed.

David Dawson, Senior ICT Advisor for the Museums, Libraries, and Archives Council (MLA), spoke on the development of a broad subject terminology for the EnrichUK portal [3] which includes a range of collections digitised through the New Opportunities Fund (NOF). The NOF-funded projects selected one or more subject terms from a predefined list. However, these subject terminologies have not matched the search terms entered by users of the portal. Dawson presented an open directory structure that would allow users to understand what was in the portal before initiating a specific search and would support browsing as an alternative to searching.

Current Developments in Terminology Services

The next session of the workshop focused on current research and technologies developed to support terminology services. Dennis Nicholson, Director of the Centre for Digital Library Research at the University of Strathclyde, spoke about the High Level Thesaurus (HILT) Phase II Project funded by JISC. The primary goal of the project was to 'determine specific design requirements' [4] of a terminology service and to design an experimental pilot terminology service at the collection level for the JISC Information Environment which could inform JISC's further development of such services. The HILT II Project focused on building a mapping between subject schemes by using the Dewey Decimal Classification (DDC) as a spine. At the search end, a user might enter a term which is mapped to a terminology set (such as Library of Congress Subject Headings (LCSH)). This would then be mapped to DDC and from DDC to other subject terminologies (for example, UNESCO). This mapping could potentially allow the identification of multiple collections that meet the user's query as well as facilitate cross-collection searching. Once the subject term has been mapped, the options for terms are presented to the user, who can disambiguate between terms as needed. Dennis noted that there were several areas, including the user interface design and the machine-to-machine interactivity issues, that needed further investigation.
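The spine approach described above can be illustrated with a minimal sketch. It is not HILT's actual design: the data structures, the function names (`invert`, `map_term`) and all sample terms and class numbers are assumptions made for illustration, and the class numbers have not been checked against any edition of DDC.

```python
# Illustrative sketch of mapping between subject schemes through a spine.
# All terms and class numbers are hypothetical sample data, not taken from
# HILT, LCSH, UNESCO or any edition of DDC.
TO_SPINE = {
    "LCSH": {"Lifelong learning": "374", "Museums": "069"},
    "UNESCO": {
        "Lifelong education": "374",
        "Adult education": "374",
        "Museums": "069",
    },
}


def invert(mapping):
    """Build a spine-class -> terms index from a term -> spine-class mapping."""
    index = {}
    for term, spine_class in mapping.items():
        index.setdefault(spine_class, []).append(term)
    return index


# One reverse index per scheme, so each scheme is mapped to the spine only once.
FROM_SPINE = {scheme: invert(terms) for scheme, terms in TO_SPINE.items()}


def map_term(term, source, target):
    """Map a term from the source scheme to the target scheme via the spine.

    Returns a list of candidate terms; more than one candidate means the
    user would be asked to disambiguate. Returns [] when no mapping exists.
    """
    spine_class = TO_SPINE[source].get(term)
    if spine_class is None:
        return []
    return list(FROM_SPINE[target].get(spine_class, []))


candidates = map_term("Lifelong learning", "LCSH", "UNESCO")
print(candidates)
```

The appeal of a spine is that each scheme needs only one mapping (to the spine) rather than a separate mapping to every other scheme. The cost, visible even in this toy data, is that a single spine class may correspond to several target terms, which is where user disambiguation comes in.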

Dennis then posed the question of what the next step should be. The HILT Phase II Final Report recommended moving forward with the work of the HILT Project [4]. He noted that the rough estimate of the cost of the development of such a service would be approximately £1 million over five years. He discussed a possible alternative of an automatic subject categorisation matrix which would be based on 'auto-categorising both resources and user queries' and would eliminate the need for mapping between schemas. The other alternative was the development of a single schema, though this has met with some resistance as found in the first phase of the HILT Project [5].

Rachel Heery, Assistant Director, Research and Development at UKOLN, spoke about delivering HILT as a shared service, i.e. a machine-to-machine service, within the JISC Information Environment. She advocated using scenario-based design to develop use cases in order to understand how applications might use terminology services. She also gave a brief example of how a terminology service might be used to enhance user queries. Rachel noted moreover that in order to begin fitting HILT into the JISC Information Environment more work is needed on the structured representation of terminologies, queries, and exchange formats and protocols.
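As a rough illustration of how an application might use a terminology service to enhance a user query, the sketch below expands a search term with synonyms and narrower terms before it is sent to a search engine. The thesaurus fragment, the relation names and the functions `expand_query` and `to_boolean` are hypothetical; the presentation did not specify a format or protocol, and a real shared service would be called over the network rather than read from a local dictionary.

```python
# Hypothetical thesaurus fragment standing in for a terminology service.
THESAURUS = {
    "lifelong learning": {
        "synonyms": ["continuing education"],
        "narrower": ["adult education"],
    },
}


def expand_query(query, thesaurus=THESAURUS):
    """Return the normalised query term followed by any related terms.

    Terms with no thesaurus entry are passed through unchanged, so the
    search still works when the service knows nothing about the term.
    """
    key = query.strip().lower()
    terms = [key]
    entry = thesaurus.get(key, {})
    for relation in ("synonyms", "narrower"):
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    return terms


def to_boolean(terms):
    """Join terms into a simple quoted OR expression for a search engine."""
    return " OR ".join(f'"{term}"' for term in terms)


print(to_boolean(expand_query("Lifelong Learning")))
```

Even this small example shows why structured representation matters: the application has to know which relations exist (here `synonyms` and `narrower`) and how they are encoded before it can use them.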

Leonard Will of Willpower Information gave a brief evaluation of the HILT Phase II Project. He noted in particular the challenge for a mapping project like HILT to work with compound concepts as well as mapping from specialised terminologies to more general terminologies. Until problems such as these are solved and mapping work is complete, he identified a need for users to access easily the local subject terminology used in the relevant collections. The full evaluation is available in the HILT Phase II Final Report [4].

Douglas Tudhope, Reader in the School of Computing and leader of the Hypermedia Research Unit at the University of Glamorgan, took a step back from specific approaches to terminology services to give an overview of recent research and thinking around knowledge organisation systems. He began with a taxonomy of knowledge organisation systems including term lists, classification schemes, thesauri and ontologies. He then reviewed recent research and sources, including citations about the Semantic Web. Douglas went on to note where networked knowledge organisation systems fit into digital library services and potentially within the JISC Information Environment. He provided a very useful outline of the life cycle of such systems (from the creation of the knowledge organisation system to indexing and classification to searching and query expansion to translation support and content integration). He situated each of the technology demonstrations that were to take place during the lunch hour within this life cycle.

He observed several critical issues to consider when developing knowledge organisation systems:

  1. the variety of standards and developing standards;
  2. the cost-benefit analysis of any formalisation of a knowledge organisation system; and
  3. the critical importance of the user interface and the involvement of users in the development of these systems.

The workshop lunch included demonstrations from several projects and technologies. Demonstration summaries and links to Web sites can be found on the workshop Web site [6].

Reports from Breakout Sessions

The first portion of the afternoon was spent in breakout sessions, each focused on a separate topic. Each of these groups reported back in a full session. The reports echoed several of the themes from the morning presentations.

Group One: Roles of automatic and human indexing

Chair: Dennis Nicholson, Director of the Centre for Digital Library Research

Group One focused on the roles of automatic and human indexing. They raised the issue of making a business case for terminology services, noted previously by both Ben Toth and Douglas Tudhope. As part of this business case and in order to determine how to index resources, certain questions need to be asked: What is worth indexing? Is some content more valuable than other content? For whom is the content valuable? What value is the indexing to users? Measurement tools for both value and quality are needed. It was noted that one can talk about the indexing cost for producers, but it is also critically important to understand the cost to users if a resource is not indexed.

In discussing what method of indexing should be used, the group observed that different domains will have varying demands for quality and precision of indexing. There are clear limitations to automatic indexing, but there was general agreement in the group that this is not an either/or case. The use of automated techniques in the background with human intellectual supervision could be an appropriate balance. Finally, the group noted barriers to the implementation of automatic indexing, particularly staff who are fearful of losing work.
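One way to picture automated techniques working in the background under human supervision is a simple triage: the software suggests subject terms with a score, accepts the confident ones, and queues the rest for an indexer to review. The sketch below is only an illustration of that idea. The vocabulary, the cue words, the scoring rule and the `accept_at` threshold are all assumptions, and real automatic indexing would use far more sophisticated methods than word matching.

```python
# Hypothetical controlled vocabulary: each subject term has a set of cue words.
VOCABULARY = {
    "Museums": {"museum", "gallery", "exhibition"},
    "Health": {"health", "patient", "clinical"},
}


def suggest_terms(text, vocabulary=VOCABULARY):
    """Score each subject term by the fraction of its cue words found in the text."""
    words = set(text.lower().split())
    scores = {}
    for term, cues in vocabulary.items():
        hits = len(words & cues)
        if hits:
            scores[term] = hits / len(cues)
    return scores


def triage(text, accept_at=0.6):
    """Split suggestions into auto-accepted terms and terms needing human review."""
    accepted, review = [], []
    for term, score in suggest_terms(text).items():
        if score >= accept_at:
            accepted.append(term)
        else:
            review.append(term)
    return accepted, review


accepted, review = triage("museum gallery exhibition for a patient")
print("accepted:", accepted, "needs review:", review)
```

Raising or lowering the threshold is a direct way of trading indexing cost against quality, which is the balance different domains would need to set for themselves.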

Group Two: User perspectives

Chair: Peter Brophy, Director of Centre for Research in Library and Information Management, Manchester Metropolitan University

Group Two focused on user behaviours and what their requirements might be for terminology services. There are many assumptions about user behaviours that are not well tested, although there are many user studies occurring in several domains. These, however, have not been compiled and coordinated to any great extent, and work towards the coordination of these might yield a better framework in which to situate the usefulness and value of terminology services. However, user behaviours are often task-based and contextual; it can be problematic to speak of 'the user'.

Picking up Lorcan Dempsey's call for modular 'webulated' terminology services, the group discussed what types of services might have the most impact on users. The group identified place (geographical terms), name (name authority files), and time (temporal periods) as three terminology sets which could have great impact on users and applications because of their importance across multiple domains and their relatively specific natures.

Group Three: General schemes vs. specific terminologies

Chair: Fred Garnett, Head of Community Programmes, British Educational Communications and Technology Agency (Becta)

Group Three focused on the possibilities of enabling applications to use the context of the user - in particular their own terminology - to bridge the gap between classification schemes and the terms users employ. They discussed the explosion of contexts for searching - particularly in the Web-based environment. They noted that the role of expert mediation has changed, but is still needed, particularly at the back end where expert appraisal and classification can add value. Classification specifically can enable browsing for users. The group also noted the need for more and better information literacy education.

Group Four: Technical aspects of the structure and use of terminologies

Chairs: Rachel Heery, Assistant Director, Research and Development at UKOLN and Alan Rector, Professor of Medical Informatics, University of Manchester

Group Four explored what technical standards might be needed and used for the development of common terminology services. The group noted that the technical challenges were quite different for machine-to-machine services than for human-to-machine services. Machine-to-machine services require well defined and rigorous protocols and standards, while human-to-machine services allow an opportunity for further disambiguation by the user. The group was divided on whether to focus on the development of small modular services or on a universal ontology (or meta-model) for knowledge organisation systems from which other ontologies could devolve and thus enable greater interoperability.

During the discussions following the last report, a tension implicit during the entire workshop was explicitly drawn out. The participants were primarily from the library and information science community and the computer science community with a few from the learning objects community. There is a noticeable gap between these communities' perspectives on how best to go about solving the problems posed by working with knowledge organisation systems. However, these communities are conducting similar work and have similar interests, and discussions such as these at the workshop were valuable for bringing the communities closer together.

Lessons for JISC and Group Discussion

Alan Robiette, Acting Head of Development for JISC, summed up some of the lessons for JISC from this workshop. He noted that JISC wants to focus on basic services that others can draw from, and that increasingly JISC is thinking beyond the traditional JISC audiences. A major question for JISC is whether to develop one terminology service or many. He remarked that multiple services are attractive, and that these different services could be maintained by different communities.

He went on to observe that there is often a dichotomy between what users say they want and what they actually do. The specialised users and the general users also represent two ends of the spectrum for the development of terminology services. It can be difficult to keep up with the ever increasing complexity of specialist terminologies. Developing terminology services for general users must take into account a wide range of background culture and vocabularies.

Alan then invited the participants to discuss further and identify important issues for JISC to consider. Several themes emerged:

Conclusion

As someone who is an outsider to the terminology services community (and to the UK!), I thought that the JISC Terminology Services Workshop allowed a lively exchange of views on terminology services and knowledge organisation systems from multiple communities. There seemed to be some agreement that JISC should explore small, modular approaches to terminology services. Further exploration of the role of the Semantic Web, RDF, and OWL in the development of these and other knowledge organisation systems (as well as the role of library and information scientists in the development of the Semantic Web) was an underlying theme as well. It was heartening to hear the concept of the monolithic 'user' broken down a bit. Finally, the engagement of the library and information science community and the computer science community over the issues presented at the workshop was an important, continuing step in a constructive relationship between the two.

References

  1. JISC Terminology Services Workshop Programme. http://www.ukoln.ac.uk/events/jisc-terminology/programme.html
  2. UKOnline. http://www.ukonline.gov.uk/
  3. EnrichUK. http://www.enrichuk.net/
  4. Nicholson, D., et al., HILT: High Level Thesaurus Project Phase II. Final Report to JISC. January 2004. http://hilt.cdlr.strath.ac.uk/hilt2web/finalreport.htm
  5. Nicholson, D., et al., HILT: High Level Thesaurus Project. Final Report to RSLP and JISC. December 2001. http://hilt.cdlr.strath.ac.uk/Reports/FinalReport.html
  6. Demonstration Summaries. JISC Terminology Services Workshop. http://www.ukoln.ac.uk/events/jisc-terminology/demonstration-summaries.html

Author Details

Sarah Shreeves
Sarah is a Visiting Assistant Professor of Library Administration and the Project Coordinator for the IMLS Digital Collections and Content Project at the University of Illinois at Urbana-Champaign.

Email: sshreeve@uiuc.edu
Web site: http://imlsdcc.grainger.uiuc.edu/
