Web Magazine for Information Professionals

Intelligent Searching Agents on the Web

Tracey Stanley describes Web-based Intelligent Searching Agents, and takes a closer look at a few examples you may wish to play with.

What are Intelligent Searching Agents?

Many web search engines use the concept of a ‘spider’ - automated software which goes out onto the web and trawls through the contents of each server it encounters, indexing documents as it finds them. This approach results in the kinds of databases maintained by services such as Alta Vista and Excite - huge indexes to a vast chunk of what’s currently available on the web. However, the problems which users can face when using such databases are beginning to be well documented. A recent JISC-funded investigation [1] into the use of web search engines indicates that users can typically encounter a number of difficulties. These include the issue of finding information relevant to their needs, and the problem of information overload - when far too much information is returned from a search.
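As a rough illustration of the spider idea, the sketch below crawls a tiny made-up web held in memory and builds an index of which words appear on which pages. The page names, their contents and the function name crawl_and_index are all invented for this example; a real spider fetches pages over the network and is far more elaborate than this.

```python
from collections import deque

# A hypothetical miniature "web": page name -> (text, outgoing links).
PAGES = {
    "home": ("library news and search tools", ["agents", "music"]),
    "agents": ("intelligent agents search the web", ["home"]),
    "music": ("music and film reviews", ["agents", "missing"]),
}

def crawl_and_index(start):
    """Breadth-first crawl from `start`, returning word -> set of pages."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page not in PAGES:      # dead link: nothing to index
            continue
        text, links = PAGES[page]
        for word in text.split():
            index.setdefault(word, set()).add(page)
        for link in links:
            if link not in seen:   # queue each page only once
                seen.add(link)
                queue.append(link)
    return index

index = crawl_and_index("home")
```

Looking up a word in the resulting index returns every page that contains it, which also hints at the overload problem: on a real web a common word would match an enormous number of pages.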

Typically, a search on Alta Vista can result in thousands of hits, many of which will not be relevant to a user’s enquiry. The size and wide coverage of such a database can make it difficult to quickly and effectively track down relevant information, using the limited searching features which are available.

Intelligent searching agents have been developed in order to provide a solution to this problem. Intelligent agents can utilise the spider technology used by traditional web search engines, and employ it in new ways. Typically, these tools are spiders which can be trained by the user to search the web for specific types of information resources. The agent can be personalised by its owner so that it can build up a picture of individual likes, dislikes and precise information needs. An intelligent agent can also be autonomous - so that it is capable of making judgements about the likely relevance of material.

Once trained, an agent can then be set free to roam the network turning up useful information sources whilst the user gets on with more urgent tasks, or even goes off line. This means that intelligent agents could be left roaming the web overnight, or at weekends, and a user could simply pick up search results at whichever is the most convenient time for them.

Another feature of intelligent agents is that their usefulness as searching tools should increase the more frequently they are used. Over a period of time, an agent will build up an accurate picture of a user’s information needs. It will learn from past experiences, as a user will have the option of reviewing search results and rejecting any information sources which aren’t relevant or useful. This information will be stored in a user profile which the agent uses when performing a search. So, an agent can also learn from its initial forays into the web, and return with a more tightly defined searching agenda if requested.
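One simple way to picture this kind of learning is a profile of word weights which goes up when the user accepts a result and down when they reject one. The sketch below is only an assumption about how such a profile might work, not a description of any particular product; the functions update_profile and score, and the sample documents, are invented for illustration.

```python
from collections import Counter

def update_profile(profile, document, accepted):
    """Raise weights of words in accepted documents, lower them for rejected ones."""
    step = 1 if accepted else -1
    for word in set(document.lower().split()):
        profile[word] += step
    return profile

def score(profile, document):
    """Sum the profile weights of the distinct words in a document."""
    return sum(profile[word] for word in set(document.lower().split()))

# The user reviews two results: one accepted, one rejected.
profile = Counter()
update_profile(profile, "digital library catalogue", accepted=True)
update_profile(profile, "library opening hours", accepted=False)

# New documents can now be scored against the stored profile.
good = score(profile, "digital catalogue search")
bad = score(profile, "opening hours today")
```

After these two judgements the profile favours documents about digital catalogues over documents about opening hours, and each further review would sharpen it a little more.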

Some examples of Intelligent Agents

FireFly

Firefly is a music and film recommendation system on the web which uses intelligent agents to build up a complex profile of user preferences using a technique known as automated collaborative filtering.

Let’s say, for example, that you are a big fan of The Spice Girls, and you want to find out if there are any other similar groups that might also be to your musical taste. You can tell FireFly which groups you like, and it will start to build up a picture of your tastes. This information goes into a personal profile which is stored in the FireFly database. FireFly will then go away and check its database to see if anyone else has indicated a preference for The Spice Girls - if so, it will take a look at the musical profile of other Spice Girls fans and suggest other artists, based on the premise that people who like The Spice Girls will also like other similar types of music. So, it’s the computer equivalent of running into someone in the pub and having a chat about the types of music you like.
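The general idea of collaborative filtering can be sketched in a few lines. This is a deliberately simplified illustration and not FireFly's actual method, which is not described here: the member profiles are made up, and similarity is assumed to be nothing more than the number of artists two people have in common.

```python
from collections import Counter

# Hypothetical profiles: member -> set of artists they say they like.
LIKES = {
    "ann": {"Spice Girls", "Take That", "Eternal"},
    "bob": {"Spice Girls", "Take That"},
    "cat": {"Oasis", "Blur"},
}

def recommend(my_likes, profiles):
    """Rank artists liked by members whose tastes overlap with mine."""
    votes = Counter()
    for likes in profiles.values():
        overlap = len(my_likes & likes)   # shared artists = similarity
        if overlap == 0:
            continue                      # no shared taste, ignore
        for artist in likes - my_likes:   # only suggest new artists
            votes[artist] += overlap
    return [artist for artist, _ in votes.most_common()]

suggestions = recommend({"Spice Girls"}, LIKES)
```

Here two of the three invented members share the new user's taste, so the artists they like are suggested, with the one they both like ranked first. The third member has nothing in common with the new user and contributes nothing.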

Once Firefly starts to recommend artists to you it will also give you the opportunity to rate these artists on a scale from “don’t know” to “the best!”. As you continue to add your ratings, you expand the musical profile which Firefly holds for you. You can also click on a hypertext link to find more information about an artist, read the views of other members, or follow links to audio clips of music. There is also a facility for buying albums on-line.

By now you’re probably thinking that Firefly sounds more like the kind of system that might be popular with American undergraduates, and not really the kind of tool that has any use for serious research. The point to be made here is that it is important to think about the other possible scenarios in which a tool such as FireFly could be used. Imagine, for example, a group of social scientists using a Firefly-like tool to create a rating system for social science resources on the web. Researchers could input a set of keywords describing the type of material they are searching for and then have their request cross-matched against thousands of others in the database. It would also be possible to build up individual user profiles of research needs, so that you could send Firefly out on a regular basis to traverse its database or other publicly accessible databases to find potentially useful material which has been rated as useful by others working in your field.

Interestingly, FireFly have also recently announced a collaboration with Yahoo to create a website recommendation service. This will work in a similar way to the music and movie recommendation service in that users will be able to build up their own profile of web sites they find useful, and get recommendations for new sites based on their profile and the profiles of other users [2].

One word of caution with FireFly: you do have to spend quite a lot of time inputting your preferences in order for FireFly to build up a useful and accurate picture of your tastes. This can be time-consuming; so unless you are prepared to dedicate a fair amount of time initially in order to let FireFly get to know you, you may find that you are disappointed with the results it produces.

FireFly is available [3] over the Web.

Autonomy

Autonomy provides you with a whole suite of different intelligent agents to suit a variety of searching needs. Autonomy isn’t a web-based service; it’s a package which needs to be downloaded and installed on your own PC in MS Windows. It then works with your web browser to provide searching facilities. A free 30-day trial of the product is available at the Autonomy web pages [4] and it has been available for sale in the UK since November 1996.

Autonomy agents are trained by typing a few sentences about your subject of interest into a box provided on screen; you then let the agents loose on the web and they go off to look for relevant documents. These documents are graded according to their perceived relevance to the topics you have chosen.
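How Autonomy actually grades documents is not something I can describe here, but the general idea of ranking pages against a few training sentences can be sketched with a standard technique: treat each text as a bag of words and compare them by cosine similarity. Everything in the example below (the function names cosine and grade, and the two sample pages) is an assumption made for illustration only.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two texts treated as bags of words."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def grade(training_text, documents):
    """Return (score, title) pairs, most relevant first."""
    scored = [(cosine(training_text, text), title)
              for title, text in documents.items()]
    return sorted(scored, reverse=True)

# Hypothetical pages an agent might have retrieved.
docs = {
    "agents page": "intelligent agents for searching the web",
    "recipe page": "how to bake bread at home",
}
ranked = grade("intelligent web searching agents", docs)
```

The page which shares most of its vocabulary with the training sentence comes out on top, and the page with no words in common scores zero. A scheme this crude also shows why the wording of the training text matters so much: it can only match the words it is given.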

Autonomy enables you to create a variety of agents to search for different topics. Each agent has to be individually trained, and they are then released onto the web by dragging them onto a web icon on screen. The agent will then start to search the web for your chosen subject. As it searches you will see a graphical map of the sites it is exploring appear on screen as it moves from one server to another. Once your agent has finished searching it displays a list of sites it has found. You can then review these sites and accept those that appear to be relevant to your information needs. Autonomy will create a library for the sites you have accepted and use this information to refine its searching the next time you ask it to perform a search on that particular topic.

It is possible to send your Autonomy agent off on a web search and leave it running in the background whilst you get on with other work. However, I found that you do need a fairly fast PC for this to work well; my PC suffered quite a bit under the strain of having both Autonomy and Word 6 open at the same time. The searching process seems to be fairly slow, although this problem could be avoided by setting the agent up to search over evenings or weekends.

I also had some difficulties making sense of the results I got from my Autonomy agent, as the sites it retrieved didn’t necessarily seem to relate to the topic I had requested. Recent discussions on the lis-ukolug mailing list [5] show that other users seem to have encountered this problem as well. Certainly, the help information on how to train your agent effectively isn’t very clear, and it is presented on screen in a way which makes it difficult to read. It may be necessary to spend quite a bit of time thinking about your search query and how best to frame it to get the results you need.

References

[1] Stobart, S. and Kerridge, S., WWW Search Engine Study, November 1996,
http://www.ukoln.ac.uk/ariadne/issue6/survey/

[2] UMBC AgentNews Web Letter, Agents on the Net, Vol. 1, No. 17, December 1996,
http://www.cs.umbc.edu/agentnews/96/17/

[3] Firefly Web Site,
http://www.firefly.com

[4] Autonomy Web Site,
http://www.agentware.com/

[5] Correspondence on lis-ukolug mailing list, Intelligent Agents, November 1996,
http://www.mailbase.ac.uk/lists/lis-ukolug/1996-11/index.html

Author Details

Tracey Stanley is the Networked Information Officer of the Library at the University of Leeds, UK
Email: T.S.Stanley@leeds.ac.uk
Personal Web Page: http://www.leeds.ac.uk/ucs/people/TSStanley/TSStanley.htm