Web Magazine for Information Professionals

Search Engines Corner: Alta Vista LiveTopics

Tracey Stanley looks at LiveTopics, a more flexible and user-controlled way of searching the Alta Vista Web page index.

The search engine world is becoming increasingly competitive, with each major service vying to come up with a killer feature that will attract more users to its site and so boost its advertising revenue. Developers are always looking for new features to increase the functionality of their search engines, and so far the battle has been fought over issues such as the size of the database and the ‘look and feel’ of the user interface. Alta Vista has always been something of a trailblazer in this area, and they now appear to have stolen a march on the likes of Lycos and Excite by introducing a new service called LiveTopics [1].

Although it is currently in public beta testing, LiveTopics represents a potentially major enhancement to the already highly functional Alta Vista search service. Alta Vista refer to LiveTopics as a “personal search assistant” [2]: it takes the results of an individual search and automatically sorts them into a set of categories, adding structure and meaning to your results.

A single search on Alta Vista can yield as many as 500,000 results, and as the web continues to grow, this information overload is likely to intensify. Faced with such huge sets of results, users can easily feel overwhelmed: making sense of them and finding information that is actually relevant can be extremely difficult, if not impossible.

LiveTopics aims to get around this problem by using dynamic categorisation of results [3]. Once a search has been performed, Alta Vista groups the results into a number of key topics or themes, so that web pages with similar content are organised under the same heading. This grouping is not predefined but is done ‘on the fly’ from the results of your search: pages are automatically analysed for their content, and statistical analysis, based on the number of occurrences of words in each document in the result set, is used to group subjects into themes. The great advantage of this is that the categorisation of documents can change as their content changes, rather than being fixed. The user can then review these topics and refine their search by choosing the ones that seem most relevant to their needs, excluding irrelevant information from the search.
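Alta Vista don’t publish the details of their statistical analysis, so the following Python sketch only illustrates the general idea, assuming a simple word co-occurrence heuristic rather than Alta Vista’s actual method. It counts how many result pages each word appears in, picks frequent words that never share a page as topic ‘seeds’, and files the remaining words under the seed they appear alongside most often. The stopword list, the function names and the sample pages are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stopword list: without one, words such as "you" and
# "yours" would turn up as topics.
STOPWORDS = {"a", "and", "for", "from", "in", "of", "on", "the", "to",
             "you", "yours"}

def tokenize(text):
    """Lower-case a document, strip punctuation and drop stopwords."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]

def categorise(documents, query_words=(), n_topics=3, words_per_topic=4):
    """Group the commonest words in a result set into topics.

    Each topic is seeded by a frequent word that never appears in the
    same document as an earlier seed; every other word joins the seed
    it shares the most documents with.
    """
    # The query words occur in every result, so they cannot separate topics.
    doc_words = [set(tokenize(d)) - set(query_words) for d in documents]
    # Document frequency: how many results contain each word.
    df = Counter(w for words in doc_words for w in words)
    # Co-occurrence: how many results contain both words of a pair.
    co = Counter()
    for words in doc_words:
        co.update(combinations(sorted(words), 2))

    def together(a, b):
        return co[tuple(sorted((a, b)))]

    # Most frequent first; ties broken alphabetically for a stable order.
    ranked = sorted(df, key=lambda w: (-df[w], w))
    seeds = []
    for word in ranked:
        if len(seeds) == n_topics:
            break
        if all(together(word, s) == 0 for s in seeds):
            seeds.append(word)

    topics = {s: [] for s in seeds}
    for word in ranked:
        if word in topics:
            continue
        best = max(seeds, key=lambda s: together(word, s))
        if together(word, best) > 0 and len(topics[best]) < words_per_topic:
            topics[best].append(word)
    return topics

# Hypothetical result "pages" for a search on 'ariadne'.
DOCS = [
    "Ariadne and Theseus in Greek mythology",
    "Theseus, the Minotaur and Ariadne: a myth",
    "Ariadne electronic journal for librarians",
    "Ariadne journal from UKOLN for librarians",
    "Ariadne software for the Amiga",
]
print(categorise(DOCS, query_words=("ariadne",)))
```

On the five sample pages this produces three topics, headed journal, theseus and amiga, roughly mirroring the journal, mythology and Amiga themes in the Ariadne search discussed later. Running the same function on a different result set gives different topics, which is the sense in which the categories are dynamic rather than predefined.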

When you perform a search on Alta Vista, your results are displayed on screen. If you’ve got 200 or more hits, Alta Vista will automatically offer you the opportunity to refine your search using a number of different LiveTopics options. LiveTopics isn’t offered if you get fewer than 200 results: Alta Vista is generally unable to come up with appropriate topics when the set of search results is quite small (by web standards), and tends to start generating categories out of irrelevant words such as ‘you’ and ‘yours’.
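The 200-hit rule amounts to a simple threshold check. A minimal Python sketch, where the constant and function names are hypothetical and only the figure of 200 comes from the article:

```python
# Hypothetical name; the value of 200 hits is the threshold reported above.
LIVETOPICS_MIN_HITS = 200

def livetopics_offered(hit_count: int) -> bool:
    """Return True when a result set is large enough to offer LiveTopics."""
    return hit_count >= LIVETOPICS_MIN_HITS

for hits in (9000, 200, 150):
    print(hits, livetopics_offered(hits))
```

Both of the searches described later (around 9000 hits for ‘Ariadne’ and over 10000 for “general election”) comfortably clear the threshold.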

On selecting one of the available formats for LiveTopics, your results will be categorised into a number of different themes. You can then select or exclude the main topics presented, or any of the sub-categories that fall within them. Once you have refined your search, Alta Vista runs it again and gives you a new set of results based on the topic areas you have selected or excluded. You can then refine your search further if necessary.

Three formats are available for LiveTopics: a Java interface, a JavaScript interface and a plain text interface for those with text-only web browsers. The main differences between the interfaces are explained below:

Java Interface
Alta Vista need to be congratulated here for providing possibly the best example I’ve seen so far of Java being used in a genuinely useful context! However, the Java interface can only be used with Java-enabled web browsers such as Netscape 3.0 and above on Windows 95 or NT, Macintosh and Unix, Internet Explorer 3.0 and above on Windows 95 and NT, and Netscape 3.01 on Windows 3.1. The Java interface also takes a little while to load the first time you use it, as the Java applet has to be downloaded and started.

The Java interface offers two views of your results: Topic Words and Topic Relationships. Clicking on the tabs lets you move between the two views. The Topic Words option displays the different topics as rows in a table, with the topics most likely to be relevant to your query at the top of each list.

Figure 1: A search through the Java Interface with table-based results

To require a word in your search results, click on it once: it becomes underlined in green and appears in the search form above the table. The original query updates as you select or exclude new words. Clicking twice on a word in the table excludes it from your search; it then appears in red with a line through it.
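The click behaviour in the Topic Words view can be modelled as a small state machine that feeds back into the query. This Python sketch assumes Alta Vista’s simple-search convention of prefixing required words with + and excluded words with -; the names are hypothetical, and since the effect of a third click isn’t described here, cycling back to neutral is an assumption.

```python
from enum import Enum

class WordState(Enum):
    NEUTRAL = 0
    REQUIRED = 1  # shown underlined in green
    EXCLUDED = 2  # shown in red with a line through it

def click(state):
    """Advance a topic word's state by one click.

    One click requires the word and a second excludes it; returning to
    NEUTRAL on a third click is an assumption, not documented behaviour.
    """
    return WordState((state.value + 1) % 3)

def build_query(original, states):
    """Rewrite the query from the original terms plus each word's state."""
    terms = [original]
    for word, state in states.items():
        if state is WordState.REQUIRED:
            terms.append("+" + word)
        elif state is WordState.EXCLUDED:
            terms.append("-" + word)
    return " ".join(terms)

states = {"electronic": WordState.NEUTRAL, "mythology": WordState.NEUTRAL}
states["electronic"] = click(states["electronic"])       # one click: required
states["mythology"] = click(click(states["mythology"]))  # two clicks: excluded
print(build_query("ariadne", states))  # ariadne +electronic -mythology
```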

The Topic Relationships option displays a graphical representation of your results: a map of topics, with links between those that are related. Each topic heading is expandable, so that you can see the sub-categories that fall below it. Where two topics are linked, this usually means that some of the words under each heading are likely to appear in the same or similar documents. This enables you to zero in on the areas that are relevant to your information needs. It also helps you to see where the relationships between different topics lie, and perhaps to come across unexpected relationships and contexts.
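The links in the Topic Relationships map can be approximated by counting the result pages that contain words from both of two topics. A minimal Python sketch, assuming this co-occurrence rule (the article only says that linked topics’ words are likely to appear in the same or similar documents); the topics, the pages and the `min_shared` parameter are hypothetical:

```python
from itertools import combinations

def topic_links(topics, documents, min_shared=1):
    """Link pairs of topics whose words co-occur in at least min_shared pages.

    topics: mapping of topic heading -> list of words under it
    documents: list of sets of words, one set per result page
    Returns (topic_a, topic_b, shared_page_count) tuples.
    """
    links = []
    for a, b in combinations(sorted(topics), 2):
        words_a, words_b = set(topics[a]), set(topics[b])
        # A page counts if it contains at least one word from each topic.
        shared = sum(1 for doc in documents if doc & words_a and doc & words_b)
        if shared >= min_shared:
            links.append((a, b, shared))
    return links

# Hypothetical topics and result pages.
TOPICS = {
    "Libraries": ["librarians", "opacs"],
    "UKOLN": ["ukoln", "elib"],
    "Mythology": ["theseus", "goddess"],
}
RESULTS = [
    {"librarians", "ukoln"},
    {"opacs", "elib", "ukoln"},
    {"theseus", "goddess"},
    {"librarians", "opacs"},
]
print(topic_links(TOPICS, RESULTS))  # [('Libraries', 'UKOLN', 2)]
```

Mythology ends up unlinked because no page mentions it alongside the library words, which is the kind of separation that makes irrelevant topics easy to spot on the map.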

Figure 2: A search through the Java Interface with relationship-based results

When you move your cursor over a word, the entire topic, with all its related words, is displayed. You can then include or exclude specific words from within that topic: clicking once on a word includes it in your search, and clicking twice excludes it from the search results.

In both cases, once you are ready to re-run your refined search, you can do this simply by clicking on the Submit button.

JavaScript Interface
The JavaScript interface works with any JavaScript-enabled web browser, such as Netscape 3.0 and above for Windows 3.1, Windows 95, Windows NT, Macintosh and Unix, and Internet Explorer 3.0 and above for the same platforms.

The JavaScript interface doesn’t have as much functionality as the Java interface, as it can’t display the graphical topics map. It basically consists of a series of tables containing the main topic headings, with sub-categories of words beneath them.

Figure 3: Searching on a variety of political concepts

You can choose to include or exclude words from the topic areas. Clicking in the boxes underneath the tick mark includes a word, whereas clicking in the boxes underneath the cross mark excludes a word. The query is automatically updated in the search form as you select or exclude words.

Text Interface
This interface is for users who have a text-only web browser such as Lynx, although it can also be used with any other web browser. It is basically a simple table interface with limited functionality.

A Search Example

A search for the keyword ‘Ariadne’ is a good test of the functionality of LiveTopics.

If you perform a search for ‘Ariadne’ on Alta Vista, you retrieve around 9000 documents. None of the first 10 hits appears to be relevant to our search for the journal, as they are links to software companies of the same name. With a set of results this size, Alta Vista automatically prompts you to refine your search using LiveTopics. LiveTopics brings up a number of topic headings, including mythology, Amiga, Goddess, OPACS, UKOLN and Libraries. We can now exclude irrelevant documents from our search by placing a cross in the boxes next to the irrelevant words.

We can also select specific words that appear beneath the relevant topics, such as librarians, electronic and journal. The words we have chosen or excluded are displayed in the search form as we make our selection, and we can then re-run the search by clicking on the Submit button.

Figure 4: Searching on Ariadne

In the example shown above, I have chosen to include the words electronic, journal and ukoln, and to exclude the words theseus, mythology and myths. This brings my search results down from 9000 documents to around 200, and I can go back and refine my search further if necessary. Alta Vista prompts me to use LiveTopics to summarise my new results: if I choose this option, it redefines the topic categories for the new result set and presents me with a new set of options based on them.
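Behind the scenes, including and excluding words acts as a filter on the result set: a page survives only if it contains every included word and none of the excluded ones. A minimal Python sketch of that filter, using the words chosen above but hypothetical pages (the .example addresses are placeholders, not real sites); the `refine` function is likewise hypothetical:

```python
def refine(results, required, excluded):
    """Keep pages that contain every required word and no excluded word.

    results: mapping of page address -> set of words on the page
    """
    required, excluded = set(required), set(excluded)
    return [
        url for url, words in results.items()
        if required <= words and not (excluded & words)
    ]

# Hypothetical results for a search on 'ariadne'.
results = {
    "myth.example/ariadne": {"ariadne", "theseus", "mythology"},
    "amiga.example/ariadne": {"ariadne", "amiga", "software"},
    "ukoln.example/ariadne": {"ariadne", "electronic", "journal", "ukoln"},
    "library.example/links": {"ariadne", "journal", "librarians"},
}
kept = refine(results,
              required=["electronic", "journal", "ukoln"],
              excluded=["theseus", "mythology", "myths"])
print(kept)  # ['ukoln.example/ariadne']
```

Note that the library links page is dropped too, even though it looks relevant, simply because it lacks ‘electronic’ and ‘ukoln’: a foretaste of the AND problem discussed below.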

I am now presented with a new set of categories, which include Elib, Librarianship, Scholarly, Networked and Harnad. I can select or reject further words from within these categories to narrow my search again.

Are There Any Drawbacks?

When using LiveTopics, it can be tempting to select all the topics that appear to be relevant to your query. However, if you do this you are actually performing a Boolean search using the AND operator. If your query contains a large number of required words, this will drastically restrict the results you get back, as only documents that contain ALL of the words you have specified will be returned.

For example, a search on Alta Vista for the phrase “general election” retrieves over 10000 documents. If we then put LiveTopics into action, it suggests a number of topics and possible keywords, including Labour, Tories, Blair, Britain, British, Conservative, Conservatives and Liberals. It is tempting to select all of these, as they all appear relevant to the type of information we are looking for. However, selecting all of them means that we will only retrieve documents containing every single one of the specified words, which limits our results drastically. The best way to use LiveTopics is perhaps to be quite conservative (no pun intended) at first about the number of required words you select, and to be prepared to keep going back and refining the query according to the results you receive. Alta Vista recommend selecting required words one at a time and reviewing your results after each addition or exclusion [3].
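The shrinking effect of AND is easy to see in a small sketch. This hypothetical Python example adds required words one at a time, as Alta Vista recommend, and records the number of matching documents after each step; the documents and function names are invented for illustration:

```python
def matching(docs, required):
    """Count documents containing ALL of the required words (Boolean AND)."""
    required = set(required)
    return sum(1 for words in docs if required <= words)

def add_one_at_a_time(docs, candidates):
    """Require candidate words one by one, recording the hit count each time."""
    chosen, history = [], []
    for word in candidates:
        chosen.append(word)
        history.append((word, matching(docs, chosen)))
    return history

# Hypothetical 'general election' results, each reduced to its keywords.
ELECTION_DOCS = [
    {"labour", "blair", "britain"},
    {"tories", "conservative", "britain"},
    {"labour", "tories", "blair", "british"},
    {"liberals", "britain", "british"},
    {"labour", "conservative", "liberals", "britain"},
]
for word, hits in add_one_at_a_time(ELECTION_DOCS,
                                    ["labour", "britain", "tories", "liberals"]):
    print(f"+{word}: {hits} documents")
```

The count can only stay the same or fall with each required word. Here requiring ‘tories’ at the third step empties the set, which is exactly the point at which a searcher reviewing results after each addition would back that word out again.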

It would also be useful if you could exclude an entire topic from a search simply by clicking on the topic heading. At present you can only exclude specific words from within a topic. For example, the Ariadne search I performed above presented me with a number of topics that clearly weren’t relevant to my query, such as the ones on Greek Gods and Amiga software. I can exclude specific words from these topics, such as mythology, Theseus and Aphrodite, but I can’t exclude a topic as a whole.
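The missing ‘exclude whole topic’ feature would amount to expanding a topic heading into exclusions for every word beneath it. A hypothetical Python sketch, assuming the - prefix for excluded words in Alta Vista’s simple query syntax; none of these functions exist in LiveTopics, and the topic lists are invented:

```python
def exclude_topic(topics, heading, excluded=frozenset()):
    """Hypothetical feature: exclude every word listed under a topic heading."""
    return set(excluded) | set(topics[heading])

def exclusion_terms(excluded):
    """Render exclusions as '-word' terms, sorted for a stable query string."""
    return " ".join("-" + w for w in sorted(excluded))

# Hypothetical topics from the 'ariadne' search.
topics = {
    "Greek Gods": ["mythology", "theseus", "aphrodite"],
    "Amiga": ["amiga", "software"],
}
excluded = exclude_topic(topics, "Greek Gods")
excluded = exclude_topic(topics, "Amiga", excluded)
query = "ariadne " + exclusion_terms(excluded)
print(query)  # ariadne -amiga -aphrodite -mythology -software -theseus
```

One risk is that a broad word under a heading, such as ‘software’ here, could knock out relevant pages as well, so such a feature would ideally show which words a heading expands to before applying it.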

Unfortunately, LiveTopics doesn’t currently appear to be available on the Alta Vista European mirror site [4]. This may be because the service is still in beta testing and so hasn’t yet been rolled out to European users.

Finally, one other uncertainty I have about LiveTopics is that it seems to be promoting a model of information searching which, as a librarian, I’m not sure that I’d entirely recommend! Librarians tend to spend a lot of time promoting the need for careful planning of a search strategy in advance of sitting down at a computer to perform the search. To some extent, LiveTopics seems to be aiming to take the need for this away, by suggesting alternative terms, synonyms and antonyms that might be used to refine a search once the initial search has been performed. I’m not really sure whether this is to be praised because it takes some of the hard work out of searching for information, or whether I should be expressing doubts about it for the same reason!

References

  1. Alta Vista Live Topics,
    http://www.altavista.digital.com/av/lt/
  2. Malkiel, C. and Monaco, K., Alta Vista announces breakthrough in Internet Search Technology,
    http://www.digital.com/info/internet/pr-news/970211livetopics.html
  3. Alta Vista Live Topics Help,
    http://www.altavista.digital.com/av/lt/help.html
  4. Alta Vista European Mirror Site,
    http://www.altavista.telia.com/

Author Details

Tracey Stanley
Networked Information Officer
University of Leeds Library, UK
Email: T.S.Stanley@leeds.ac.uk
Personal Web Page: http://www.leeds.ac.uk/ucs/people/TSStanley/TSStanley.htm