Image-based information is a key component of human progress in a number of distinct subject domains, and digital image retrieval is a fast-growing research area for both still and moving images. To address some of the relevant issues, the Second UK Conference on Image Retrieval: The Challenge of Image Retrieval (CIR 99) was held in Newcastle upon Tyne on 25 and 26 February 1999. Participants included both researchers and practitioners in the area of image retrieval.
The conference opened with an overview of the state of the art in content-based image retrieval (CBIR) systems by John Eakins of the Institute for Image Data Research (IIDR) at the University of Northumbria at Newcastle. This presentation was based on a review of CBIR technologies being prepared by Eakins and Margaret Graham (IIDR) for the JISC Technology Applications Programme (JTAP). Traditional image retrieval techniques for both digital and non-digital images have used text-based systems, in which human cataloguers create metadata about images and text retrieval software is then used to retrieve them. This process can be highly labour-intensive and inconsistent. CBIR techniques, by contrast, aim to recognise and retrieve information based on the content of the images themselves. Eakins noted that most current CBIR systems are designed to retrieve by what can be called 'primitive' features, such as colour, texture and shape. Research has been undertaken into more sophisticated CBIR techniques that might recognise particular types of objects or scenes, but the systems developed so far are not yet mature enough for practical applications. CBIR techniques based on 'primitive' features can, however, be used in particular specialist applications. For example, they can be used for automatic shot boundary detection in video retrieval systems. They are also used by a variety of organisations in some very specialised domains, e.g. crime prevention, the analysis of satellite images and the comparison of trademark images.
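As an illustration of retrieval by one 'primitive' feature, the following minimal Python sketch (not code from any system discussed at CIR 99) ranks images by colour similarity and flags shot boundaries in a sequence of video frames. It assumes each image or frame is supplied as a list of (r, g, b) pixel tuples; the function names, the default bin count and the shot-change threshold are hypothetical, illustrative choices. Similarity is measured by histogram intersection, the colour-indexing measure introduced by Swain and Ballard, and a shot boundary is declared wherever the colour distributions of consecutive frames differ sharply.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each channel 0-255


def colour_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Quantise each RGB channel into `bins` levels; return a normalised histogram."""
    if not pixels:
        raise ValueError("an image must contain at least one pixel")
    counts = Counter(
        (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        for r, g, b in pixels
    )
    return [counts[i] / len(pixels) for i in range(bins ** 3)]


def intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical colour distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_colour(
    query: Sequence[Pixel], collection: Dict[str, Sequence[Pixel]], bins: int = 4
) -> List[str]:
    """Return collection keys ordered from most to least similar in colour to the query."""
    q = colour_histogram(query, bins)
    scores = {name: intersection(q, colour_histogram(px, bins)) for name, px in collection.items()}
    return sorted(scores, key=lambda name: -scores[name])


def shot_boundaries(
    frames: Sequence[Sequence[Pixel]], threshold: float = 0.5, bins: int = 4
) -> List[int]:
    """Return indices i where frame i appears to start a new shot."""
    hists = [colour_histogram(frame, bins) for frame in frames]
    return [
        i for i in range(1, len(hists))
        if 1.0 - intersection(hists[i - 1], hists[i]) > threshold
    ]


if __name__ == "__main__":
    red = [(200, 10, 10)] * 100
    blue = [(10, 10, 200)] * 100
    print(rank_by_colour(red, {"sea": blue, "sunset": red}))  # ['sunset', 'sea']
    print(shot_boundaries([red, red, blue, blue]))             # [2]
```

With two uniformly red frames followed by two uniformly blue ones, `shot_boundaries` reports a cut at frame 2. Practical systems use finer quantisation and more robust measures (gradual transitions such as fades can defeat a single threshold), so this should be read only as a sketch of the idea.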
Eakins observed that there had been very little systematic evaluation of either CBIR or traditional retrieval systems from a user perspective, and stressed that image retrieval research needed to take serious account of user needs. He also noted evidence of a growing synergy between traditional text-based and content-based retrieval techniques, and suggested that systems combining the two might yield better results.
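One simple way of combining the two approaches, assumed here for illustration rather than taken from any paper at the conference, is 'late fusion': each image receives one score from a text-based search and another from a content-based search, and the final ranking uses a weighted sum of the two. In the Python sketch below, the function name, the assumption that both scores lie in [0, 1] and the `weight` parameter are all hypothetical.

```python
from typing import Dict, List


def combine_scores(
    text_scores: Dict[str, float],
    visual_scores: Dict[str, float],
    weight: float = 0.5,
) -> List[str]:
    """Rank images by a weighted sum of text-match and visual-similarity scores.

    Assumes both score sets lie in [0, 1]; an image absent from one set scores 0 there.
    `weight` is the share given to the text score (a hypothetical tuning parameter).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    images = set(text_scores) | set(visual_scores)
    combined = {
        img: weight * text_scores.get(img, 0.0) + (1.0 - weight) * visual_scores.get(img, 0.0)
        for img in images
    }
    # Highest combined score first; ties broken alphabetically so the order is stable.
    return sorted(combined, key=lambda img: (-combined[img], img))


if __name__ == "__main__":
    text = {"a.jpg": 0.9, "b.jpg": 0.4}
    visual = {"b.jpg": 0.95, "c.jpg": 0.6}
    print(combine_scores(text, visual))              # ['b.jpg', 'a.jpg', 'c.jpg']
    print(combine_scores(text, visual, weight=1.0))  # ['a.jpg', 'b.jpg', 'c.jpg']
```

With equal weighting, `b.jpg` ranks first because it scores reasonably on both sources of evidence; setting `weight=1.0` reduces the ranking to a purely text-based search.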
Michael Swain (Audio and Video Search Project Team, Cambridge Research Laboratory, Compaq Computer Corporation) developed similar themes in his keynote presentation, "Image and video searching on the Web". Swain has worked in the CBIR area for a number of years and has previously developed techniques for colour-based retrieval. His paper at CIR 99 described work carried out at the University of Chicago and at Compaq's Cambridge Research Laboratory on developing image search services, stressing the extremely large (and growing) number of images and other multimedia items that exist on the Web.
Swain previously directed the team in the Department of Computer Science at the University of Chicago that developed WebSeer, an image search engine for the Web. WebSeer retrieved images from the Web using information from two sources: the text that relates to an image and the image itself. Most images on the Web are embedded in HTML documents, so search engines such as WebSeer can base image retrieval on cues taken from the text associated with images, such as image file names, captions and the text of HTML ALT attributes. WebSeer also analysed the images themselves in an attempt to distinguish photographs from computer-generated or hand-drawn images. The WebSeer research additionally used face-finding and horizon-detection techniques and experimented with identifying regions of images by colour and texture analysis.
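The text-based half of this approach can be sketched with Python's standard `html.parser` module. The sketch below is an illustration under stated assumptions, not WebSeer's actual code: for each `<img>` tag it collects keywords from the image file name and from the ALT text, then matches query terms against them. Captions are ignored for brevity, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser
from pathlib import PurePosixPath
from typing import List, Optional, Tuple
from urllib.parse import urlparse


@dataclass
class ImageCues:
    src: str
    alt: str
    keywords: List[str]


class ImageCueExtractor(HTMLParser):
    """Collect text cues (file-name words and ALT-text words) for every <img> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.images: List[ImageCues] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "img":  # html.parser already lower-cases tag and attribute names
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        alt = attributes.get("alt") or ""
        # "/photos/tyne_bridge.jpg" -> "tyne_bridge"
        stem = PurePosixPath(urlparse(src).path).stem
        keywords = re.findall(r"[a-z0-9]+", f"{stem} {alt}".lower())
        self.images.append(ImageCues(src, alt, keywords))


def image_cues(html: str) -> List[ImageCues]:
    parser = ImageCueExtractor()
    parser.feed(html)
    parser.close()
    return parser.images


def search(html: str, query: str) -> List[str]:
    """Return the src of every image whose cues contain all the query terms."""
    terms = query.lower().split()
    return [img.src for img in image_cues(html) if all(t in img.keywords for t in terms)]


if __name__ == "__main__":
    page = ('<p>Views</p><img src="/photos/tyne_bridge.jpg" alt="The bridge at night">'
            '<img src="logo.gif">')
    print(search(page, "bridge night"))  # ['/photos/tyne_bridge.jpg']
```

Here the query "bridge night" matches the first image through a combination of its file name and its ALT text, while the logo, which has no ALT text, can only be found by its file name.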
Swain is now continuing aspects of this work in connection with the AltaVista Photo Finder service. This search service has features similar to WebSeer's but also offers the option to search for visually similar items, and it operates a so-called 'Family Filter' that prevents certain inappropriate images from being retrieved; the filter is based on analysis of the surrounding text and on reference to external sources. Research at Compaq's Cambridge Research Laboratory is also proceeding into indexing multimedia (primarily video) as a means of studying the problems related to delivering video and audio through networks.
Several other presentations at CIR 99 covered practical CBIR applications and user studies. Robert van der Zwan (Open University) described an evaluation of the Informedia Digital Video Library system at the Open University. The Informedia video retrieval system was developed at Carnegie Mellon University as part of an NSF/NASA/ARPA Digital Libraries Initiative project and is designed to facilitate multimedia retrieval. The Open University creates and holds large amounts of video-based information, primarily in the form of video recordings of television programmes. A subset of this collection has been digitised, segmented and indexed at Carnegie Mellon using the original tapes and transcripts provided by the Open University. The paper described a user study of the Informedia system carried out at the Open University, which indicated some potential for its use within the university in a variety of applications.
P.M. Hayward (Applied Science & Technology Group, IBM UK Laboratories) described a user study of CBIR carried out as part of the European Union-funded Electronic Library Image Service for Europe (ELISE) project. ELISE is concerned with the issues that surround building a complete digital image service. The study, carried out by the Applied Science & Technology Group at IBM UK Laboratories, assessed the usability of IBM's QBIC (Query By Image Content) technology with images of cultural artefacts from ELISE. The results were positive but suggested that more work is required on the design and evaluation of image search interfaces.
Richard Harvey (School of Information Systems, University of East Anglia) delivered an illustrated paper on research at the University of East Anglia (UEA) to identify and block images that contain pornography. Harvey first outlined the legal and commercial issues that relate to the downloading and transfer of pornographic images over the Internet. Research into using CBIR techniques to identify skin tones and shapes that could indicate the presence of naked human bodies had already been carried out successfully in the U.S. by David A. Forsyth of the University of California at Berkeley and Margaret M. Fleck of the University of Iowa. Harvey reported on work by a UEA research team that used CBIR techniques for skin-tone recognition, combined with text-based analysis, as the basis of a prototype system that could help filter inappropriate material.
The conference ended with three papers that discussed standards relating to image retrieval. Alan Lock (Technical Advisory Service for Images, Institute for Learning and Research Technology, University of Bristol) introduced issues relating to standards for image formats and their associated metadata, for the technical quality and provenance of images, and for intellectual property rights. This was followed by a paper that described Dublin Core and other current metadata initiatives, and that noted the importance of integrating image retrieval into a distributed and heterogeneous digital landscape. The paper also introduced work being carried out in this area by cultural heritage organisations, primarily museums [15, 16]. It additionally stressed the importance of creating and maintaining technical and rights metadata for the secure management of digital images and for the long-term preservation of digital information [17, 18].
The final paper, given by Edward Hartley (Distributed Multimedia Research Group, Computing Department, University of Lancaster), reviewed progress with MPEG-7. The Moving Picture Experts Group (MPEG) has been working since 1996 to develop a standard 'Multimedia Content Description Interface' that would enable the discovery and retrieval of information from audio-visual content. MPEG-7 provides a framework for creating standard descriptions (metadata) of audio-visual content. These might include structured descriptions of the content, information on coding schemes, intellectual property rights information, content ratings and information on the context of a recording. A working draft of the MPEG-7 standard is due at the end of 1999, and it is planned that it will become an international (ISO) standard before the end of 2001.
CIR 99 was a useful means of bringing together researchers and practitioners from a wide range of subject domains to discuss the retrieval of digital images. Certain common themes emerged. The importance of developing and adopting appropriate standards for digital images was stressed, and much interest was expressed in the ongoing development of the MPEG-7 framework. The conference also highlighted the comparative lack of relevant user studies. The real challenge of image retrieval would appear to be developing implementations that are truly user-led rather than technology-led. David Harper (School of Computer & Mathematical Sciences, Robert Gordon University) suggested that addressing this issue meant building user evaluation into projects and image retrieval implementations, and that research was needed into important areas such as interface design. Another common theme was the essential complementarity of image retrieval based on text (or metadata) and CBIR techniques. Many current image retrieval services, such as the AltaVista Photo Finder, depend on both CBIR techniques and analysis of the accompanying text. With current technologies, systems that combine image retrieval based on structured text descriptions (metadata) with CBIR techniques appear to offer the best way forward.
The author would like to thank Margaret Graham and John Eakins of the Institute for Image Data Research at the University of Northumbria at Newcastle for their kind invitation to speak at CIR 99 and for their hospitality in Newcastle upon Tyne.
[Image: The Tyne Bridge, Newcastle upon Tyne.]
UKOLN is funded by the British Library Research and Innovation Centre (BLRIC), the Joint Information Systems Committee (JISC) of the UK higher education councils, as well as by project funding from several sources. UKOLN also receives support from the University of Bath, where it is based.