Web Magazine for Information Professionals

Search Engines

Phil Bradley's regular column on search engine technology.

As we’re all aware, the world of the search engine is constantly changing; sometimes in my courses I refer to using search engines as being akin to trying to dance on quicksand. In the last few months there have been even more changes than usual, so rather than concentrate on a particular engine or a specific subject, I thought that in this column I’d try to pull together some of the things that have been happening with the major engines.

Excite [1]

Many of you will know that Excite has been in deep financial trouble, and it looked as though it was going to close down completely. However, InfoSpace bid $10 million for certain of its assets, such as the domain names, trademarks and user traffic associated with the Excite.com web site. Most users will find little difference when they use it in future (although some of the personalised features may well look rather different), though ‘under the bonnet’ InfoSpace’s Dogpile metasearch is going to replace Excite’s crawler. Consequently, searchers will find that they end up with the same results whether they use Excite or Dogpile. Other elements of the site, such as news and television listings, will be powered by iWon.

More details on the sale can be found at ISP News. [2]

Google [3]

Google has finally taken a step forward with regard to stop words: it is now possible to search for them if they are included within quote marks. You can try this for yourself by going to the search engine and searching for the individual words to be or not to be (without quote marks); you’ll find that you get a nonsense set of results, since the only word that Google will be interested in is ‘not’. However, if you re-run the search as “to be or not to be” (making sure to put it in quote marks) Google will actually do the search that you’re expecting. It’s still not a very good set of results, since only one hit in the first ten relates to the Shakespearian quote, but this comes as little surprise since I’ve never found that phrase searching at Google works particularly well.
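
For readers who like to see the idea spelled out, here is a minimal sketch in Python of the behaviour described above. It is a toy model, not Google's actual code: the stop-word list and the function name effective_terms are my own assumptions, chosen purely for illustration.

```python
# Hypothetical stop-word list, for illustration only; the list that any
# real search engine uses is an assumption here.
STOP_WORDS = {"to", "be", "or", "a", "an", "the", "of", "and", "in"}


def effective_terms(query: str) -> list[str]:
    """Return the terms this toy model would actually search for."""
    query = query.strip()
    # A query wrapped in double quote marks is treated as a phrase,
    # so every word is kept, stop words included.
    if len(query) >= 2 and query[0] == '"' and query[-1] == '"':
        return query[1:-1].lower().split()
    # Otherwise stop words are silently dropped from the query.
    return [word for word in query.lower().split() if word not in STOP_WORDS]


print(effective_terms('to be or not to be'))    # ['not']
print(effective_terms('"to be or not to be"'))  # ['to', 'be', 'or', 'not', 'to', 'be']
```

Under this model the unquoted search collapses to the single word ‘not’, which is why the results look like nonsense, while the quoted search keeps all six words.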

Google is also interested in getting feedback on the results of the searches that it performs. When you’ve done your search for the quote, take a quick look at the very bottom of the page, where you’ll see a link entitled “Unsatisfied with your results? Help us improve.” If you follow this link, it takes you to a feedback form where you can tell Google exactly what you think of the set of results you’ve obtained. It will be interesting to see if this leads to any long-term improvement in the results!

Yet another interesting feature at Google is Google Zeitgeist [4], which is regularly updated to provide us with information on the searching habits of users. On this page they detail the top ten gaining and declining searches, the languages used to access the search engine, the web browsers and operating systems that users have installed, and some information on image searches that have been performed.

Lycos [5]

Lycos has come up with a new and interesting twist on the whole filtering problem. I’ve never been a great fan of filtering systems, as I find them clumsy, and I dislike the idea of passing control of my searches over to some anonymous system somewhere that I can’t control. Lycos has, however, now introduced a feature called Lycos SearchGuard [6], which allows users to decide what sort of content they filter – current options include filtering out pornography, information on weapons, and violent or hate material. I don’t believe that there will ever be a perfect filtering system, but if it’s important to you to filter out unpleasant material, it may well be worth taking a look at.

Allsearchenginesuk [7]

A new search engine has been launched which attempts to focus on UK resources. It’s a multi/meta search engine that passes queries out to six major engines that have a UK component - Lycos, Overture, MSN, Yahoo!, Mirago, and AltaVista. If you are interested in doing regionally based searching you may want to take a look at it, though I must confess that when I visited it and tried it out I was completely underwhelmed; the results didn’t appear to have been de-duplicated, and the ranking system seemed to be based on the concept Ixquick [8] uses (ranking by the number of search engines that find a page and its position within the top ten), but without being as effective. Personally I’ll be sticking with Ixquick – I know it makes sense!
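
To make the ranking concept a little more concrete, here is a rough sketch in Python of how a metasearch engine might merge and de-duplicate results along the lines described above. It is emphatically not Ixquick's real algorithm: the function name merge_results, the scoring (ten points for first place, nine for second, and so on), the crude URL normalisation and the example addresses are all assumptions of mine.

```python
from collections import defaultdict


def merge_results(results_by_engine, top_n=10):
    """Merge per-engine ranked URL lists into one de-duplicated ranking.

    Pages are ordered first by how many engines returned them within
    their top_n results, then by a position score (an assumed scheme:
    top_n points for first place, top_n - 1 for second, and so on).
    """
    engine_count = defaultdict(int)
    position_score = defaultdict(int)
    for urls in results_by_engine.values():
        seen = set()
        for position, url in enumerate(urls[:top_n]):
            # Crude normalisation (an assumption): ignore case and any
            # trailing slash so near-identical URLs count as one page.
            key = url.rstrip("/").lower()
            if key in seen:
                continue  # count each page once per engine
            seen.add(key)
            engine_count[key] += 1
            position_score[key] += top_n - position
    return sorted(
        engine_count,
        key=lambda k: (-engine_count[k], -position_score[k], k),
    )


# Made-up engines and addresses, purely for illustration.
results = {
    "engine_a": ["http://a.example/", "http://b.example", "http://c.example"],
    "engine_b": ["http://b.example", "http://a.example", "http://d.example"],
    "engine_c": ["http://b.example/", "http://e.example"],
}
print(merge_results(results))
```

In this made-up example the page found by all three engines comes first, the page found by two comes second, and the remaining pages are ordered by how high they appeared in the single engine that found them. Each page appears only once in the merged list, which is the de-duplication that seemed to be missing.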

AltaVista [9]

For many years, AltaVista was my search engine of choice – fast and effective, with a large database that was frequently updated. Diligent readers of my column will, however, remember that I finally said goodbye to it, since I was dissatisfied with the results that I was getting. Even so, I was surprised at the extent to which it was being vilified at the Online conference held last December; the only time that it was referred to was in tones of sadness, and several speakers openly derided it.

AltaVista has obviously tried to overcome its problems, but sad to say it’s done what it always does – changed the look of the interface once again. It has returned to the look that it had several years ago, with clean lines, a tabbed interface and lots of white space. I think it looks much better, so I was keen to see if it had done anything else. The easiest way to check how fresh a search engine’s database is, is to do a search on a popular term (I tend to use ‘internet’) and choose the limit by date option. I did this with AltaVista, limiting to the last month, and found a miserable 634 references. In comparison, Alltheweb [10] had a massive 22,506,697 references! As you can imagine, I’ve not been tempted back, nice clean-looking interface notwithstanding.

A closing miscellany

Those of you who read the various articles that I write will know that I have a rather cynical approach to search engines, best summarised by the statement ‘They don’t work’. Cruel and extreme, I’ll grant you, but it’s a position I’m happy to back up if necessary. I was therefore very interested to read that two computer scientists from the City College of New York have reached the conclusion that the results from search engines can be biased, particularly if the search that’s been conducted has not been well thought out. The article, in Information Processing & Management, is freely available, although you do have to register with the site [11] before you can read it.

Following on from this is another issue that I know concerns a lot of people: the thorny matter of paid placement. Search engines do not exist for free – they have to make money the same as all of us, and just being an excellent search engine doesn’t impress a bank manager. One of the methods that more and more search engines are using to make money is selling (or, if you prefer, ‘emphasising’) certain websites, for a price. Do a search for a well-known term, and you’re likely to find that the top hits are for sites such as eBay or Amazon. They will have bid for the use of keywords in order to direct people to their sites, and will pay search engines a certain amount of money each time someone clicks on the link. This is all well and good as long as it’s made perfectly clear in the list of results that the top site isn’t the most relevant but has paid the most; unfortunately, it’s not always clear-cut. For a more extensive discussion than I’m able to provide here, you might want to take a look at an article provided by Mercury News. [12]

[1] http://www.excite.com
[2] http://www.internetnews.com/isp-news/article/0,,8_921261,00.html
[3] http://www.google.com
[4] http://www.google.com/press/zeitgeist.html
[5] http://www.lycos.com
[6] http://searchguard.lycos.com/
[7] http://www.allsearchengines.co.uk
[8] http://www.ixquick.com
[9] http://www.altavista.com
[10] http://www.alltheweb.com
[11] http://www.elsevier.com/cdweb/journals/03064573/samplecopy/viewer.htt
[12] http://www.siliconvalley.com/mld/siliconvalley/business/technology/personal_technology/2806431.htm
 

Author Details

 
Phil Bradley
Email: philb@twilight.demon.co.uk