Web Magazine for Information Professionals

Search Engines Corner: Finding UK and European Resources on the Web

Tracey Stanley looks at how to keep your search results coming from within particular geographic areas and thus save on bandwidth.

When searching the web, many people are keen to find ways of restricting their search to material based in the UK or Europe. The advantages of doing so can include the greater relevance to UK needs of material with a UK or European focus, faster access to the information, and a better ability to focus a search on local topics such as news.

A number of tools already exist for finding UK and European information on the web. Many of the eLib Information Gateways, such as SOSIG and OMNI, offer options which enable you to restrict a search of their databases to UK and European material. Other tools, such as Alta Vista, enable you to restrict a search to UK material by adding the string url:uk to your search. This limits the search to those documents which contain the word UK in their URL. For example, an Alta Vista search on +Oasis +url:uk will find documents about the group Oasis which have been published on UK web servers. However, this strategy can prove problematic, as some documents held on UK web servers do not contain the word UK in their URLs (for example, those of the UK internet service provider ClaraNet, whose URL is http://www.clara.net).
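The limitation of a url:uk restriction can be illustrated with a short Python sketch. It is not Alta Vista's implementation: it simply assumes that the restriction means "the word uk appears somewhere in the URL", and compares that with a stricter check on the host name. The function names and the example.com address are hypothetical.

```python
import re
from urllib.parse import urlparse

def url_contains_word_uk(url: str) -> bool:
    """Assumed reading of url:uk: 'uk' appears as a separate word in the URL."""
    words = re.findall(r"[a-z0-9]+", url.lower())
    return "uk" in words

def host_in_uk_domain(url: str) -> bool:
    """Stricter check: the host name ends in the .uk top-level domain."""
    host = urlparse(url).hostname or ""
    return host.endswith(".uk")

urls = [
    "http://www.leeds.ac.uk/",               # UK server with uk in the URL
    "http://www.clara.net",                  # UK provider, no uk in the URL
    "http://www.example.com/uk/index.html",  # hypothetical: uk only in the path
]

for url in urls:
    print(url, url_contains_word_uk(url), host_in_uk_domain(url))
```

Neither test says where a server is physically located: the ClaraNet address fails both, while the hypothetical third address passes the word test purely because of its path.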

A number of search engines have therefore been developed which offer a much more focused approach to searching the web for UK and European material. Some of these tools are enhancements of existing global search engines, whereas others are databases of purely UK or European material. This article reviews two services which offer searching for UK and European material - EuroFerret and Excite UK.

EuroFerret

EuroFerret [1] is a database of European web sites aimed at member countries of the European Union. There are currently (August 1997) around 5 million documents in the database. The EuroFerret search engine is based on Muscat technology, which uses intelligent indexing and searching agents to provide interactive analysis of web documents. Detailed information about how this works is available at Muscat [2].

The user-interface for EuroFerret is quite straightforward; you simply type your search keywords into the form on screen and click on the button marked Find. You can also choose to restrict your search to a particular country by choosing it from a pull-down menu on the home page. The default for searching by country is Any - which enables you to search across the entire database of European web sites. Options are also available for searching the database in French, German, Dutch, Spanish, Italian or Swedish.

EuroFerret doesn’t offer options for performing Boolean searching, and the (fairly limited) help information which is available states that ‘the power of the search is greatly enhanced by using several words’ [3]. EuroFerret performs automatic relevancy ranking on the documents that match your query, so that those which contain the most occurrences of the word(s) you have chosen are pushed towards the top of the list of search results. The lack of Boolean operators means that EuroFerret doesn’t have the flexibility or power of searching that a tool such as Alta Vista can provide.
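Frequency-based relevancy ranking of this kind can be sketched in a few lines of Python. This is only an illustration of the general idea, not the Muscat algorithm: it assumes that documents are ordered first by how many of the keywords they contain and then by the total number of occurrences. The sample documents and function names are invented for the example.

```python
import re
from collections import Counter

def score(text: str, keywords: list[str]) -> tuple[int, int]:
    """Return (number of distinct keywords matched, total keyword occurrences)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    matched = sum(1 for k in keywords if counts[k.lower()] > 0)
    total = sum(counts[k.lower()] for k in keywords)
    return matched, total

def rank(documents: dict[str, str], keywords: list[str]) -> list[str]:
    """List the documents matching at least one keyword, best match first."""
    scores = {name: score(text, keywords) for name, text in documents.items()}
    hits = [name for name, s in scores.items() if s[0] > 0]
    return sorted(hits, key=lambda name: scores[name], reverse=True)

# Invented sample documents
documents = {
    "oasis": "Oasis announce a new album. The new Oasis album is out soon.",
    "charlatans": "The Charlatans have a new single. Oasis were there. "
                  "Their last album sold well.",
    "gardening": "A page about gardening.",
}

print(rank(documents, ["Oasis", "new", "album"]))
```

Note that a page which mentions each keyword once, in unrelated sentences, still counts as matching all three keywords under this scheme.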

I decided to put EuroFerret to the test by trying out a search for some information about the new Oasis album which has had a lot of coverage recently. I expected that there would be quite a lot of UK material available on the web on this subject. A EuroFerret search on the keywords Oasis new album retrieved over 1000 documents matching one or more of these words. A list of hits is then displayed on screen giving a relevancy ranking for each document, and showing how many of the keywords were matched in the document. My first ten hits all have a relevancy ranking of 100% and match all three of my chosen keywords. A closer look at a couple of these documents brings up a few anomalies: my first hit turns out to be a Norwegian web page about the group The Charlatans, which includes a passing mention of Oasis, and includes both the keywords new and album, but in separate sentences.

There is, however, one document in the list of the first 10 hits which has Oasis in the title. Having identified this document, it is now possible to refine my search. This can be done by selecting the relevant document or documents by placing a cross in the checkbox which appears next to each hit. I can then click on a button marked Expand and get a list of words taken from the selected documents which I can add to my search in order to refine it. Possible words include noel, definit, morn and interview. EuroFerret informs me that the words it is displaying have been ‘suffix-stripped’ so that, for example, match will also find matches and matching. I can select some words which appear to be relevant to my query and then re-run the search. The perceived relevance of the results is re-assessed in the light of the further information that I’ve provided about my needs, and then a new set of results is displayed. I’m now being pointed towards more documents which are focused on Oasis, and a few which mention a new album. This process of selecting relevant documents and refining the search can be repeated as necessary.
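The idea behind ‘suffix-stripped’ words can be shown with a deliberately naive Python sketch. The suffix list and the minimum stem length below are assumptions made for illustration; the rules EuroFerret actually applies are not described in its help pages and are certainly more elaborate.

```python
# Illustrative suffix list, tried in this order (an assumption, not EuroFerret's rules)
SUFFIXES = ("ing", "es", "ed", "s")

def strip_suffix(word: str) -> str:
    """Remove the first matching suffix, keeping a stem of at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def same_stem(query_word: str, document_word: str) -> bool:
    """Two words match if they reduce to the same stem."""
    return strip_suffix(query_word) == strip_suffix(document_word)

for word in ["match", "matches", "matching", "morning"]:
    print(word, "->", strip_suffix(word))
```

Under these rules match, matches and matching all reduce to match, so a query on any one of them finds the other two.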

Comparing this search against a search for similar information on Alta Vista immediately reveals some of the weaknesses of the EuroFerret approach. Using Alta Vista I can express my query in much more precise terms - for example: +Oasis +“new album”, which tells Alta Vista straightaway that I only want documents which contain both the word Oasis and the phrase “new album”. This should avoid the problem noted with EuroFerret whereby documents were being returned which contained the words new and album in separate sentences. The Alta Vista search finds 696 documents [4]. I can then further refine this search by restricting it to UK material by adding +url:uk. This brings the search results down to 87 documents [5]. This gives me a much more manageable set of results in about half the time it takes to refine the EuroFerret search, and a quick glance at some of the documents retrieved shows that the relevance of the material found using Alta Vista appears to be greater.
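The difference between matching separate words and matching a phrase can be made concrete with a small Python sketch. It is a simplified model of the + and quotation mark syntax, not Alta Vista's own query parser: it uses plain double quotes, handles only required (+) items, and ignores field restrictions such as url:uk. The sample sentences and function names are invented.

```python
import re

def parse_query(query: str) -> list[str]:
    """Split a query such as '+Oasis +"new album"' into required words and phrases."""
    # Each required item is a + followed by a quoted phrase or a single word.
    pairs = re.findall(r'\+(?:"([^"]+)"|(\S+))', query)
    return [(phrase or word).lower() for phrase, word in pairs]

def matches(text: str, required: list[str]) -> bool:
    """True if every required word or phrase occurs in the text as whole words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    normalised = " " + " ".join(words) + " "
    return all(f" {item} " in normalised for item in required)

required = parse_query('+Oasis +"new album"')

# Invented sample texts
separate = "Oasis have a new single. Their last album sold well."
together = "The new album from Oasis is out this month."

print(required)
print(matches(separate, required))
print(matches(together, required))
```

The first sample contains all three words but not the phrase, so it is rejected; the second contains the phrase and is accepted. Because punctuation is discarded before matching, this sketch would still accept a phrase that straddles a sentence boundary.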

A further problem with the results retrieved from the EuroFerret search appears to be the currency of the material. Quite a few of the documents appear to refer to the previous Oasis album, not the current one. It is difficult to get a picture of how up to date the documents are, as no date is included in the information given about each hit on screen. The provision of dates would be a welcome enhancement to the service.

A new EuroFerret interface is currently under development, and a preview of this can be tested on the EuroFerret web pages [6]. This is an enhancement of the intelligent agent technology currently being used in EuroFerret. The agent service provides an instantaneous analysis of all the documents it retrieves from a search. Categories of words are then generated from this analysis and a list of words which can be used to refine the search is given on screen above the search results. The user can then choose between selecting words from this list to refine the search, or selecting particular documents which appear to be relevant.

A further service, known as Agent Briefing, is also currently under development. It is necessary to register with a username and password to use this service. The Agent Briefing service is a news service which enables you to search for news items over a given period of time. Once an initial search has been performed it can be saved as a ‘working brief’ which the agent can then use again in the future to run further searches on the topic. This brief can be refined over time as your information needs become more defined.

Overall, EuroFerret is a useful tool with a number of impressive features and some exciting-looking developments on the horizon. It could be vastly improved with the addition of some of the features I’ve mentioned above.

Excite UK

Excite UK is a new UK version of the popular Excite search engine, offering a choice between searching UK sites, European sites or worldwide sites [7]. The full database consists of approximately 50 million web sites, although the UK and European sections will be considerably smaller than this.

Excite UK uses a technology known as intelligent concept extraction (ICE) [8] to find relationships between words and concepts. Like EuroFerret, it will automatically analyse the results of your search and generate a list of key words and concepts which frequently appear in the documents which make up your results. You can then refine your search by selecting from this list of words. Excite UK takes this a step further by intelligently linking together related topics. An example is given on the Excite web pages: a search for “dog care” will also find material on the related topic of “pet grooming” [9].

Excite UK makes use of the + (plus) and - (minus) signs for Boolean searching, and phrase searching is also possible using quotation marks around phrases. Thus, we can perform our search for information on the new Oasis album as follows: +Oasis +“new album”. This search generates 157 results when restricted to the UK-only section of the database. Excite UK also makes use of relevancy ranking, so our results are sorted according to the frequency of occurrence of our required words.

Excite UK also provides an extremely useful option for viewing the results of a search by web site. This is useful in seeing how many pages in our results are in fact sub-sections of the same web site. The option is then available for going directly to the home page for that web site rather than trawling through the subpages listed.
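Viewing results by web site amounts to grouping the hits by the host name in their URLs. The Python sketch below shows one way this might be done; it is an assumption about the general approach rather than a description of how Excite UK works, and the example.ac.uk and example.co.uk addresses are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

def group_by_site(urls: list[str]) -> dict[str, list[str]]:
    """Group result URLs under the host name of the web site they belong to."""
    sites = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname or url  # fall back to the raw string
        sites[host].append(url)
    return dict(sites)

# Hypothetical search results
results = [
    "http://www.example.ac.uk/music/oasis.html",
    "http://www.example.ac.uk/music/reviews/album.html",
    "http://www.example.co.uk/news.html",
]

for host, pages in group_by_site(results).items():
    print(f"http://{host}/ ({len(pages)} pages)")
```

Each group can then be presented as a single entry linking to the site's home page, with the individual subpages listed beneath it.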

Excite UK also provides a More like this.. option. Once you have identified a relevant document you can click on the More like this.. link next to it to find other, similar documents.

Unfortunately, like EuroFerret, Excite UK doesn’t provide a date for each document it retrieves from a search. This would be a welcome addition to the functionality of the service.

Excite UK is an extremely useful service with powerful facilities for efficient searching and for finding UK and European resources. It provides the functionality of a tool such as Alta Vista, with the flexibility of limiting the search to UK and European material.

References

[1] EuroFerret
http://www.muscat.co.uk/euroferret/

[2] Muscat
http://www.muscat.co.uk/chd/amusfram.htm

[3] EuroFerret Help
http://www.muscat.com/ferret/nouveau/help.html [22 August 1997]

[4] Alta Vista search
http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&kl=XX&q=%2BOasis+%2B%22new+album%22 [26 August 1997]

[5] Alta Vista search
http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&kl=XX&q=%2BOasis+%2B%22new+album%22+%2Burl%3Auk&search.x=53&search.y=9 [26 August 1997]

[6] Muscat Agent
http://www.muscat.co.uk/cgi-bin/fx?DB=ferret.dup [22 August 1997]

[7] Excite search engine
http://www.excite.co.uk

[8] Excite Help - Using Excite Search and Directory
http://www.excite.co.uk/info/how_to.html [26 August 1997]

[9] Excite: What’s New?
http://www.excite.co.uk/whatsnew.html [26 August 1997]

Author Details

Tracey Stanley
Networked Information Officer
University of Leeds Library, UK
Email: T.S.Stanley@leeds.ac.uk
Personal Web Page: http://www.leeds.ac.uk/ucs/people/TSStanley/TSStanley.htm