Multi-media and Image Handling: The future is Textless
The Software Technology Outreach Programme (1) was initiated in autumn 2000 by the Department of Trade and Industry (DTI) to bridge the gap between academia and industry. As Tony Stock explains, “More than 1,000 advanced projects are carried out in UK universities every year, each producing on average five potential business applications - but companies are missing out because they and the universities are out of touch with each other”. Software Technology Outreach coordinates a variety of workshops on relevant technology research areas, giving academic researchers an opportunity to present to a commercial audience. Recent seminars have included Virtual Reality: Real Uses, Advanced Networking and its Applications, New Roles for Software in Mobile Networks, Data Visualisation and Data Mining. The programme of seminars is run in conjunction with the Engineering and Physical Sciences Research Council (EPSRC).
In November I attended the Multi-media and Image Handling seminar to see what the programme has to offer academics and business professionals alike.
The seminar was held in the Regus Centre, No 1 Poultry, just next to Bank tube station. On exiting the station I suddenly felt like I was missing something… could it be a bowler hat and an umbrella? Bank is situated in the heart of the City of London and there is a serious business feel to the surroundings. So what potential image handling products could academics offer these high-powered, fast-moving business people?
After coffee and an introduction by Tony Stock, Software Technology Outreach Programme Director, Richard Storey from the BBC Research and Development department gave a presentation on Software technology in broadcasting. Most of the presentation dealt with the problems the BBC has in storing production information (metadata) and archiving material, which was very apt considering its recent call for old videos of BBC programmes. Richard then contemplated what will come next after television and radio and had a quick look at entertainment software. He observed that broadcast is a conservative business that hasn’t taken too well to IT. The result is a growing mass of content without metadata and, as he put it, “content with no metadata is a liability”. Richard’s concrete example of this was exploding nitrate films! What the BBC needs is a low cost solution that could link metadata to the essence (content). The solution would need elements of audio content recognition, data mining from existing texts, image content recognition, and contextual and natural language searching. Richard and his Research and Development team are working on ideas but are still interested in finding out whether anyone has anything better to offer. Someone suggested scanning the credits!
Richard’s talk led nicely into the one by Paul Lewis from the University of Southampton on Content based retrieval and navigation. In the past, metadata has been used to retrieve multimedia data from multimedia information bases. Recent research has tried to address some of the limitations of text metadata by using the multimedia content itself as a basis for retrieval. Paul discussed several projects he has worked on or is working on in this area, including MAVIS, MAVIS 2 and ARTISTE (2). The ARTISTE (An integrated Art Analysis and Navigation Environment) project is building a tool for the intelligent retrieval and indexing of high-resolution images. As part of their research the team are investigating sub image location, crack classification, and border finding and classification. These methods use algorithms that deal with the ‘semantic gap’ through use of a ‘Multimedia Thesaurus’ for images. Paul concluded by saying that these methods have their own set of limitations and that content based retrieval for non-text media is still too crude to be used on its own without additional support.
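To give a flavour of what retrieving by content rather than by text metadata can mean in practice, here is a minimal sketch that ranks images by colour similarity using histogram intersection. It is a generic textbook technique chosen for illustration, not the method used in MAVIS or ARTISTE; the function names, the bin count and the representation of an image as a list of (r, g, b) tuples are all assumptions made for the example.

```python
from collections import Counter


def colour_histogram(pixels, bins=4):
    """Quantise each RGB channel into `bins` levels and return normalised counts.

    `pixels` is a list of (r, g, b) tuples with values in 0..255 (an assumed,
    deliberately simple image representation).
    """
    if not pixels:
        return {}
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {key: n / total for key, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the overlap between two normalised histograms."""
    return sum(min(value, h2.get(key, 0.0)) for key, value in h1.items())


def rank_by_similarity(query_pixels, collection):
    """Return (score, name) pairs, most similar first.

    `collection` maps a hypothetical image name to its list of pixels.
    """
    query = colour_histogram(query_pixels)
    scored = [
        (histogram_intersection(query, colour_histogram(pixels)), name)
        for name, pixels in collection.items()
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    collection = {
        "sunset": [(250, 40, 30)] * 90 + [(20, 20, 200)] * 10,
        "ocean": [(10, 30, 220)] * 100,
    }
    query = [(240, 50, 20)] * 100  # a mostly red query image
    for score, name in rank_by_similarity(query, collection):
        print(name, round(score, 2))
```

A colour histogram knows nothing about what a picture depicts, which is one simple way to see the ‘semantic gap’ Paul described: two images can share colours without sharing meaning.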
Farzin Mokhtarian’s presentation was on The Curvature Scale Space representation (CSS for short, which could easily be confused with cascading style sheets if you work in web development). Shape, like colour and texture features, can also give a great deal of information about images. Farzin explained that 2D shapes can be represented using a multi-scale organisation (graph) which then allows searching on particular shapes. Using even more complicated algorithms it is possible to recognise 3D objects and noisy artificial objects. The CSS shape descriptor was actually selected for MPEG-7 standardisation in September this year.
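The general idea behind a curvature scale space can be sketched in a few lines: smooth a closed contour with Gaussians of increasing width and record where the curvature changes sign at each scale. The sketch below is only an illustration under simplifying assumptions (the contour is parameterised by sample index rather than arc length, and the test shape, function names and scale values are invented for the example); it is not the descriptor as presented at the seminar or as specified in MPEG-7.

```python
import math


def smooth_closed(values, sigma):
    """Circularly convolve one coordinate of a closed contour with a Gaussian."""
    n = len(values)
    radius = max(1, int(3 * sigma))  # truncate the kernel at about 3 sigma
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    total = sum(weights)
    return [
        sum(w * values[(i + k) % n] for k, w in zip(offsets, weights)) / total
        for i in range(n)
    ]


def curvature_zero_crossings(xs, ys, sigma):
    """Return the sample indices where curvature changes sign at scale `sigma`."""
    sx, sy = smooth_closed(xs, sigma), smooth_closed(ys, sigma)
    n = len(sx)
    signed = []
    for i in range(n):
        prev, nxt = (i - 1) % n, (i + 1) % n
        dx, dy = (sx[nxt] - sx[prev]) / 2.0, (sy[nxt] - sy[prev]) / 2.0
        ddx = sx[nxt] - 2.0 * sx[i] + sx[prev]
        ddy = sy[nxt] - 2.0 * sy[i] + sy[prev]
        # The sign of x'y'' - y'x'' is the sign of the curvature.
        signed.append(dx * ddy - dy * ddx)
    return [i for i in range(n) if signed[i - 1] * signed[i] < 0]


def wavy_contour(n=200, lobes=5, depth=0.3):
    """A hypothetical test shape: a circle with `lobes` bumps, as two lists."""
    angles = [2.0 * math.pi * i / n for i in range(n)]
    radii = [1.0 + depth * math.cos(lobes * t) for t in angles]
    xs = [r * math.cos(t) for r, t in zip(radii, angles)]
    ys = [r * math.sin(t) for r, t in zip(radii, angles)]
    return xs, ys


if __name__ == "__main__":
    xs, ys = wavy_contour()
    for sigma in (1.0, 5.0, 10.0, 20.0):
        crossings = curvature_zero_crossings(xs, ys, sigma)
        print(sigma, len(crossings))
```

Plotting the crossing positions against the scale gives the kind of multi-scale picture of a shape that can then be compared between images: fine detail produces crossings that vanish quickly as the smoothing grows, while large features survive longer.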
I took a deep breath… things did seem to be getting progressively more academic. The last presentation before lunch was given by Boris Mirkin of Birkbeck College on Intelligent K-means clustering. K-means clustering is a popular method of data mining; however, it has been found to have a number of limitations, such as those relating to data standardization. Boris proposed some effective algorithms for automatically producing both the initial setting and three-level interpretation aids.
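For readers who have not met it, plain K-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below shows only that standard procedure, together with a simple z-score standardisation step; it is not Boris Mirkin’s intelligent K-means, and the function names and the choice to pass the starting centroids in by hand are assumptions for the example. That hand-picked starting point is exactly the “initial setting” that ordinary K-means leaves to the user.

```python
import math


def standardise(points):
    """Z-score each feature so that no single scale dominates the distances."""
    dims = len(points[0])
    count = len(points)
    means = [sum(p[d] for p in points) / count for d in range(dims)]
    # Fall back to 1.0 for a constant feature to avoid dividing by zero.
    stds = [
        math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / count) or 1.0
        for d in range(dims)
    ]
    return [tuple((p[d] - means[d]) / stds[d] for d in range(dims)) for p in points]


def k_means(points, initial_centroids, max_iter=100):
    """Standard K-means: returns (labels, centroids) for the given start."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    labels, centroids = k_means(data, [data[0], data[2]])
    print(labels, centroids)
```

Different starting centroids, or features measured on very different scales, can lead this procedure to different answers, which gives some sense of why automatic initialisation and standardisation are worth researching.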
After the presentations there was some time for discussion. A delegate from the Heritage Image Partnership asked what had changed in the multimedia and image-handling world in the last 10 years, given that the essence of the problems discussed was already there then and it still remains difficult to retrieve images using media content. The speakers seemed in agreement that the big change has been in perceptions and attitudes, not technology. Ten years ago archives were being created but nobody really understood why, or what benefits they would have. Now, with more sophisticated and faster computers, managers are starting to realise that there is money in archives, searching and image handling. Although there is no longer a technological wall stopping us from finding the answers to these problems, there is still much research needed before academics arrive at a product that is suitable for mainstream multi-media and image handling. I pondered that the situation is very similar to that of machine translation; once the algorithms are cracked the technology will follow.
After lunch, a cup of tea and a quick walk around Bank for some fresh air, the second lot of presentations began.
Dave Elliman of Nottingham University started the afternoon on a lighter note with his presentation on Image extraction and Interpretation (or escaping from under the paper mountain). Dave used some great sound effects and a short parable to demonstrate the negative repercussions of information overload. As a cure for the problem he proffered Nottingham University’s various state-of-the-art paper ‘capturing and understanding’ systems. At this point in time Nottingham can offer cue-based agents that answer mail automatically, document readers that recognise handwritten and cursive script, and content-based retrieval systems. Dave explained that if a document can be understood it can be stored efficiently and easily formatted. When asked how his software differed from the other paper capturers out there, he gave its adaptability and personalization features as an answer. Dave left us with an image in our heads of a time when all documents could be processed entirely automatically or discarded without anyone needing to spend valuable time reading them. I wasn’t too sure whether, when this day comes, we’d find ourselves with time on our hands to do the things we really should be doing, or all out of jobs.
Xiaohong Gao from Middlesex University gave the next presentation on recent progress in PET Image retrieval and traffic sign recognition, in which she considered two areas of research. She started off by discussing the two main image types on which the majority of research on indexing medical images by their content features (e.g. tumor shape, texture) has been done - computerised tomodensitometry (CT) and magnetic resonance (MR) images. She then introduced another image type, Positron Emission Tomography (PET). The second half of her presentation considered a vision model based approach for traffic sign recognition.
Min Chen of the University of Wales gave the final presentation of the day on Volume Graphics. Volume graphics is a new field in computer graphics concerned with graphic scenes defined in volume data types; a model is specified by a mass of points instead of a collection of surfaces.
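The difference between a surface model and a volume model is easy to sketch. Below, a sphere is not described by its boundary but by a regular grid of samples (voxels), each marked as inside or outside the object, and its volume is estimated simply by counting the occupied samples. This is a toy illustration only; the grid size, the radius and the function names are assumptions for the example, and real volume data sets usually store a measured value at each sample rather than a 0 or 1.

```python
import math


def sphere_voxels(size=32, radius=10.0):
    """Return a size x size x size grid: 1 where a sample lies inside the sphere."""
    centre = size / 2.0
    limit = radius * radius
    return [
        [
            [
                1 if (x - centre) ** 2 + (y - centre) ** 2 + (z - centre) ** 2 <= limit else 0
                for z in range(size)
            ]
            for y in range(size)
        ]
        for x in range(size)
    ]


def occupied_volume(grid):
    """Count the occupied samples; each one stands for a unit cube of space."""
    return sum(value for plane in grid for row in plane for value in row)


if __name__ == "__main__":
    grid = sphere_voxels()
    exact = 4.0 / 3.0 * math.pi * 10.0 ** 3
    print(occupied_volume(grid), round(exact, 1))
```

Because the model is a mass of samples, questions about the inside of an object (how much of it there is, what a slice through it looks like) become simple operations on the grid, which a collection of surfaces cannot answer so directly.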
The day ended with some discussion on one of the points made by Min Chen on the gap between the academic software community and business users. He had used the slide above to discuss the flow of research to product. He felt that ideas are encouraged through academic research and that products are created using business acumen because there is money to make. However, to arrive at a business product, technologies need to be created, and at the moment a gap seems to have emerged at this stage of the process. Delegates discussed how, to bring people’s ideas to the market, businesses would need to get involved at an earlier stage when the financial gains are not so high. Industry needs to take further risks and academics need to do more to exploit their research. Delegates felt that partnership between business and academia need not always have a financial basis, and suggestions for further collaboration included businesses providing real data for researchers to work with.
After the discussion people carried on business card swapping, networking and deliberating over the issues raised during the day. The Software Technology Outreach Programme definitely seems to be on the way to bridging the knowledge gap between universities and industry.
(1) Software Technology Outreach Programme http://www.st-outreach.org.uk
(2) ARTISTE: An integrated Art Analysis and Navigation Environment http://www.cultivate-int.org/issue1/artiste/