Search Engines: New and Developing Search Engines

Phil Bradley

Phil Bradley looks at some existing search engines and also some new ones to bring you up to date on what is happening in the world of Internet search engines.

In the last edition of Ariadne, I wrote about a few search engines that had come to my attention. Obviously the summer months are a fruitful period of development for search engine producers, because I've got a new crop to write about again! I've also discovered some useful Web pages about search engines, along with developments in existing search engines, so I'll be mentioning those as well.

Search Engines

A9

A9 [1] is the search engine provided by a wholly owned subsidiary of Amazon. It pulls results from the Google database, and I spoke about it in some detail in the last issue of Ariadne [2]. Since then, some new features have been added which have, in my opinion, improved what was already a very good offering indeed. There is now a beta test version called 'Discover', which A9 uses to look at the Web pages that you've visited, match them with information from Alexa [3] and suggest related Web sites, categories, frequently visited sites and movers and shakers, all based on your individual research using the search engine. You do, however, need to register and log in before you can use this. It's a great idea, and clearly A9 is attempting to explore the holy grail of personalisation very quickly, though I suspect that they need to do some more work in this area: I was being suggested sites that really had nothing whatsoever to do with the sample searches that I'd been running. In fairness, they say the results are more accurate if you use their toolbar, but then they would say that.

Another option is to store a list of bookmarks on one of the other Home Page utilities that are available. It's easy, once you've run a search, to drop a particular page into the bookmark option. I suppose that it's slightly quicker than using the Favorites option from the browser bar, but if you're doing a particular set of searches it may be preferable to use the A9 feature to avoid cluttering up your Favorites list any more than it is already.

There is also a diary function, which is only available in conjunction with the toolbar offering. This allows users to take notes on any Web page that they visit, and then find them again using a search function, or by going directly to the diary. Since these notes are not held locally, they are always going to be available regardless of whatever machine you use to access the Internet (with the limiting factor that the toolbar needs to be loaded). It's a useful function and provides you with a better way of remembering why you visited a Web page in the first place and why it was so useful to you. How much more useful it would be, however, if it could be used by a group of people, so that you and your colleagues could comment on pages and share this information backwards and forwards automatically whenever any of you visited that particular page! Eurekster [4] is already a long way down that general avenue of sharing data between a group of people, so it shouldn't be long before something similar appears on the horizon.

Peerbot

Peerbot [5] is a strange little search engine, but it was quite fun to play with, so I thought I would bring it to your attention. This engine just searches for favicons. A favicon is a small icon that some Web authors use to make their sites slightly different, and to stand out a little bit more. You're already used to seeing them, even if you didn't realise it. If you glance up towards the top left of your browser window, and look just to the right of Address, and just to the left of the URL of the page that you're looking at, you'll see a small image - probably a piece of paper with a dog-eared top right corner, with a lower case 'e' superimposed on the front of it. That's the default favicon that Internet Explorer adds, though of course if you're using a different browser you may well not see this image at all. Enterprising Web authors are now starting to make use of favicons themselves, by adding small (16x16 pixel) images into their root directories. When Explorer visits the page, it will load the author-designed favicon instead of the default version. You can make your own favicons very easily by visiting the Favicon Web site [6], where you can also learn more about them.
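As a rough illustration of the 'root directory' idea described above, the short Python sketch below works out where a browser would conventionally look for a site's favicon, given the address of any page on that site. It assumes the customary file name favicon.ico, which is not stated in this article, and the function name and sample address are made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def favicon_url(page_url: str) -> str:
    """Return the conventional root-directory favicon address for a page.

    Assumption: the icon is called favicon.ico and sits in the site root.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, swap in the root-level icon path,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/favicon.ico", "", ""))


# Hypothetical page address, used only to show the transformation.
print(favicon_url("http://www.example.com/articles/page.html?id=7#top"))
```

Whatever page you start from, the result points at the same single file in the site root, which is why one small image is enough to brand a whole site.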

So, that's what a favicon is. Peerbot looks for pages containing the key words that you have asked for and simply displays a listing of all the favicons that it finds. I suppose that it's useful if you are thinking of creating your own (which is worth doing, because when anyone adds your page to their list of favorites or makes a short cut to your page, they'll see your favicon, so it's a good way of keeping your site at the forefront of people's minds), because you can see what other people have done. It's also an amusing way to spend some time, just to see how much creativity some people have got. However, other than that, I'm not exactly sure what the value of the engine would be to most of us.

SMEALSearch

Rather more seriously, we have SMEALSearch [7], a niche search engine for business literature. It has been produced by the eBusiness Research Centre [8] of the Pennsylvania State University [9]. It searches the Web and catalogues academic articles as well as commercially produced articles and reports that address any branch of business (though they don't appear to specify exactly what they mean by that), concentrating on university Web sites, commercial organisations, research institutes and government departments. In particular it is looking for academic articles, working papers, white papers, consulting reports, magazine articles, and published statistics and facts. However, it should also be pointed out that the search engine only looks for documents in PS (or compressed versions), PDF or Word formats, and not straight HTML pages. Once it has found such items, the search engine performs a citation analysis of all the academic articles accessed and lists them in order of their citation rates in academic papers, with the most cited articles first.
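The two behaviours just described (ignoring straight HTML pages, and listing the most cited items first) can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not SMEALSearch's actual algorithm: the Document class, the format labels, the sample titles and the citation counts are all invented for the example.

```python
from dataclasses import dataclass

# Formats the article says are indexed; compressed variants are left
# out of this toy example, and "doc" stands in for Word files.
INDEXED_FORMATS = {"ps", "pdf", "doc"}


@dataclass
class Document:
    title: str
    fmt: str        # file format, e.g. "pdf" or "html"
    citations: int  # how many times other papers cite this one


def rank_by_citations(docs):
    """Drop formats that are not indexed, then put the most cited first."""
    indexed = [d for d in docs if d.fmt.lower() in INDEXED_FORMATS]
    return sorted(indexed, key=lambda d: d.citations, reverse=True)


# Invented sample data, purely for illustration.
sample = [
    Document("Working paper A", "pdf", 12),
    Document("Web page B", "html", 40),
    Document("White paper C", "doc", 31),
    Document("Report D", "ps", 3),
]

for doc in rank_by_citations(sample):
    print(doc.citations, doc.title)
```

Note that 'Web page B' never appears in the output despite having the highest count, because it is an HTML page and so would not have been collected in the first place.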

Searches can be run in one of two ways. The first is by Documents: once a search has been run and results obtained, these can be narrowed down by restricting to keywords in the Header or Title, and ordered by expected citations, hubs, usage or date. Results provide the title of the article, a short summary with keywords in context, and a very useful link to show which papers or articles have cited the result in question. The second method is to search specifically for Citations, and the results can then be restricted by Author or Title and ordered by Expected Citations or Date. One particularly interesting feature at the bottom of the page of results is a graph showing the year of publication of cited articles.

The search engine has a good help page (regular readers will know that this is something that I take very seriously and always look for), with some very useful guidance on how to search for articles. This is worth reading because it does require a slightly different approach to the 'normal' search engines that are out there. There is also a very useful FAQ section that goes into a great amount of detail on algorithms (most unusual; I've never seen a search engine be quite as upfront about it before), authors and contributors, document formats, general information, legal issues, querying and user modes and statistics.

If you are involved in researching for business-based information, and particularly if you require articles and citations, this is definitely a search engine worth spending some time with.

Other Search Engine News

Wotbox [10] now comes in several national flavours - Australian, Canadian, German, Spanish, French, Italian, New Zealand, UK and US. It's currently indexing just over 17 million pages, so it's got a long way to travel before it can start to compete with the big players out there.

Lycos has created a new search facility [11] for searching information contained in forums, bulletin boards and groups (although not USENET, since Google already has that particular arena sewn up with Google Groups [12]). Unfortunately Lycos is not providing any sort of list of which resources it is using, so there's some guess work involved here. It seems to have taken a fairly wide remit in its definition, since some results have included pages from my site. It is also indexing data from FreePint [13], .org sites, Yahoo groups and MSN groups, though it is excluding weblogs at the moment. The service is still in beta test mode, so there isn't as yet an advanced search function, but this may well be added later. The facility would be useful for anyone who is interested in seeing what people are writing about now, and could well be of value if you need to find that elusive ephemeral information. I was amused by their adult filter option though; you can use it to block offensive content always, never, or sometimes. Quite why you'd choose the third option there I'm not entirely sure!

Briefly, Accumo [14] is a search engine of the 'cluster' type, which tries to group similar pages together by subject. It's not exactly a multi search engine, since it doesn't obtain data from a number of different search engines and collate the results together, but it does allow users to search Google, Yahoo, MSN, Open Directory or Wisenut. Other search engines, such as Vivisimo [15], are already taking this clustering approach and to my mind are doing it rather better, but Accumo is worth looking at if you have a couple of minutes to spare.

ggler [sic] [16] is powered by Google, though it gives rather different sets of results since it doesn't limit itself to one page per site (consequently a search for my name resulted in pages from my site being the top 16 results); but it does show a small thumbshot of each page that it returns. I'm not however convinced about the value of this, since the thumbnails are so small as to be virtually useless.
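To make the 'one page per site' point concrete, here is a small Python sketch of how a results list might be collapsed so that each site appears only once. It is an illustration of the general idea only, not a description of how Google or ggler actually do it; the function name and the sample addresses are made up.

```python
from urllib.parse import urlsplit


def one_page_per_site(urls):
    """Keep only the first (highest-ranked) result from each host."""
    seen = set()
    kept = []
    for url in urls:
        host = urlsplit(url).netloc.lower()
        if host not in seen:
            seen.add(host)
            kept.append(url)
    return kept


# Invented results list, in ranked order, purely for illustration.
results = [
    "http://www.example.com/",
    "http://www.example.com/articles.html",
    "http://www.example.org/review.html",
    "http://www.example.com/contact.html",
]

print(one_page_per_site(results))
```

Without this step, a site with many relevant pages can fill the top of the list on its own, which is the effect described above.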

Useful Web Pages

I found a couple of useful Web pages that provided me with information on how search engines work together: who provides database content, and which search engines give data to or receive data from which others. The search engine relationship chart provided by Bruce Clay [17] allows you to focus on one particular search engine, while the search engine chart at Ihelpyou [18] is an animated image of the relationships among more than 20 different engines. Both are very useful if you are running training courses on search engines, or just want to get a clearer idea yourself as to what the relationships are between them all. Believe me, it's a very confusing world out there, so they are worth taking a quick peep at!

References

[1] A9 http://www.a9.com
[2] Ariadne, issue 40 http://www.ariadne.ac.uk/issue40/search-engines/
[3] Alexa http://www.alexa.com
[4] Eurekster http://search.eurekster.com/
[5] Peerbot http://www.peerbot.com/1.seerch.index.html
[6] Favicon http://www.favicon.com/
[7] SMEALSearch http://smealsearch.psu.edu/
[8] eBusiness Research Centre http://www.smeal.psu.edu/ebrc/
[9] Pennsylvania State University http://www.psu.edu/
[10] Wotbox http://www.wotbox.com/
[11] Lycos discussion search facility http://discussion.lycos.com/default.asp
[12] Google Groups http://www.google.com/grphp
[13] FreePint http://www.freepint.com
[14] Accumo http://www.accumo.com/
[15] Vivisimo http://vivisimo.com/
[16] ggler http://www.ggler.com/
[17] Bruce Clay's search engine chart http://www.bruceclay.com/searchenginerelationshipchart.htm
[18] Ihelpyou's search engine chart http://www.ihelpyou.com/search-engine-chart.html

Author Details

Phil Bradley
Internet Consultant

Email: philb@philb.com
Web site: http://www.philb.com
