Web Magazine for Information Professionals

Human-powered Search Engines: An Overview and Roundup

Phil Bradley looks at the major contenders and discusses the value of this type of search engine.

‘Human-powered search engines’ is perhaps a slightly unfortunate term, since it makes me think of lots of people running around on treadmills providing the energy to keep the servers powered up! However, it’s the term in general use, so we’ll go with it. Essentially it means a search engine which can, and will, have its results (or at least the position of its results) affected by human intervention, usually by people rating individual results further up or further down the rankings.

The major search engines that you’ll be familiar with all use complex computer algorithms to decide whether one result, or Web page, should rank higher than another. These algorithms take into account the position of words on a page, their repetition, the number of links pointing to a page and so on. Each search engine makes particular claims for its own methodology, asserting that it provides better and more relevant rankings than the opposition. As a result, each engine keeps its exact ‘recipe’ secret, which is one of the reasons why you receive different results for the same search with each and every engine you use – apart, of course, from the obvious point that they’re each searching different databases.

Unfair Gaming

While this approach generally works reasonably well, there are of course flaws in the system. If people can work out how a search engine ranks pages, they can rewrite their own Web pages to attract higher rankings – often known as ‘gaming’ a search engine. There’s nothing intrinsically wrong with wanting to achieve a high ranking with a search engine; after all, it’s in the interests of end-users to receive sound, relevant results. All too often, however, this manipulation is done for quite different reasons. There is a huge industry devoted to achieving high rankings, and the methods used are not always as ethical as perhaps they should be, with hundreds if not thousands of Web sites being artificially created to point to each other, or to one site in particular, in order to boost it up the rankings. Even where this isn’t the case, mistakes can occur – a simple search for ‘Martin Luther King’ over at Google returns a racist Web site in a high position. At the time of writing this particular site (which I won’t mention in more detail since I don’t want to give it even more publicity!) is in 3rd position, though this is of course subject to change. Part of the reason is that people link to it as an example of inaccurate information, but Google treats any link as a positive endorsement, which is not always the case.

Consequently the ‘traditional’ approach does have considerable drawbacks, and people often look for better ways of providing access to data. Early versions of human-powered or moderated sites were what I term ‘directory’ or ‘index’ sites, where real live people did (and indeed still do) check individual sites before adding them to their hierarchical collections. Strictly speaking, however, such sites are not what most of us would consider ‘search engines’ these days. A number of approaches have been taken to add other human touches to results, in most instances by allowing individuals to engage in the ranking process, by voting for or against specific results and/or writing their own results for particular searches. In this article I’ll take a look at some of these approaches and consider how well they work, and indeed how far the concept succeeds, or whether it is as flawed as the traditional method.

Anoox

An early entrant into the field is Anoox [1], which traces its origins back to 2005. It’s a very straightforward search engine which provides very brief listings of results (title and URL), together with a voting button – to move a result up or down the rankings, or to vote it as spam. I will confess that it does not inspire me with confidence – the first result returned for a search on ‘librarian’ is a placeholder site. Moreover, I couldn’t see any of the more usual librarian Web sites appearing on the first page of results. A perfect opportunity to affect the results, surely? Unfortunately not. Before I can vote results up and down I have to register, declaring my name, address, telephone number, area of expertise and email address. My registration will, I am promised, take about 24 hours to process, after which I’ll be able to become a democratic member of the voting community.

This hiccough takes us straight to the heart of the problem with human-powered search engines – the element of fraud or spam. It’s all too easy to game the engine artificially by voting one particular site higher and another lower. Even registration isn’t going to stop anyone who really wants to achieve a high ranking since they can register repeatedly or indeed employ people to do so on their behalf. While it’s equally true that other people can vote sites down, or report them as spam, it’s going to be very difficult to stop anyone who is really determined. In fact, it’s much easier to affect the rankings in this way than by the traditional method of re-writing pages, getting more links and so on. One just has to hope that most people will play fairly, but this is not always the case.
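None of these engines publishes its ranking rules, so the following Python sketch is purely hypothetical. It assumes a simple scheme in which results are re-ordered by net votes (up votes minus down votes), with each account allowed only one vote per result. It illustrates the point above: a one-vote-per-account rule does nothing to stop one person registering several accounts. The function name `rank_by_votes` and the `.example` sites are invented for illustration.

```python
from collections import Counter


def rank_by_votes(results, votes):
    """Re-order results by net human votes (up votes minus down votes).

    results -- URLs in the engine's original, algorithmic order
    votes   -- (account, url, +1 or -1) tuples, oldest first
    """
    # Keep only each account's latest vote on a URL, so a single account
    # cannot vote the same result up twice.
    latest = {}
    for account, url, value in votes:
        latest[(account, url)] = value

    net = Counter()
    for (_account, url), value in latest.items():
        net[url] += value

    # sorted() is stable: results with equal scores keep their original order.
    return sorted(results, key=lambda url: -net[url])


results = ["library-association.example",
           "librarian-blog.example",
           "spam-shop.example"]

# A handful of honest searchers.
honest = [("alice", "library-association.example", +1),
          ("bob", "library-association.example", +1),
          ("carol", "spam-shop.example", -1)]

# One determined site owner with five separately registered accounts.
# The duplicate vote from sock0 collapses into one, but every account counts.
stuffed = [(f"sock{i}", "spam-shop.example", +1) for i in range(5)]
stuffed.append(("sock0", "spam-shop.example", +1))

print(rank_by_votes(results, honest))
# ['library-association.example', 'librarian-blog.example', 'spam-shop.example']
print(rank_by_votes(results, honest + stuffed))
# ['spam-shop.example', 'library-association.example', 'librarian-blog.example']
```

Five extra registrations are enough to lift the least trustworthy result to the top; the honest user's vote against it is simply outweighed.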

Meanwhile my attempt to bring some sense to the poor search result has floundered at the first step. I’m prepared and indeed happy to register with these search engines, but a 24-hour turn-around time? It’s just not realistic.

This brings us to a second problem, and it’s one of self-interest. With a site like Wikipedia it’s easy to make the case that it’s a huge joint effort on the part of thousands of people, and if nothing else contributors can feel a glow of satisfaction in having made a contribution. However, if I’m doing a search once (as most of mine are), then to do justice to the system I’m going to have to examine at least a few results to see which ones are the best, then go back and rate them accordingly. All this for a search that I’m never going to do again, or indeed which no-one may ever do again? I simply cannot see the value in spending my time in this way. The only people who are going to do it are those who wish to gain financially, as the search term ‘internet’ shows – almost all the first page of results point towards business Web sites, poker sites, optimisation services and so on. Now, I don’t know if these sites have been voted for (the interface doesn’t provide me with that data) but, cynic that I am, it wouldn’t surprise me. Consequently, we have arrived very quickly at the exact opposite of the initial concept!

ChaCha

ChaCha [2] is a search engine that’s twisted the concept of human-powered slightly. It was launched in September 2006 and provides straightforward results in exactly the manner you would expect – sponsored links followed by results with titles, reasonable summaries and URLs. It also provides ‘related searches’. The interesting point with ChaCha, however, is the ‘search with a guide’ option – and you need to register in order to be able to use this.

ChaCha uses guides who are paid $5 an hour to search for results with you. They are not employees of the organisation, and are probably best considered to be ‘home workers’. Rather than running their own searches, users can click on the ‘Search with guide’ option. After a few seconds a sidebar window opens with a chat box and connects you to a named guide, who asks what your query is and then attempts to find answers for you. Your searches are saved and you can return to them at any point in the next three months. The service is very slick, and the guide was very helpful and polite, but unfortunately my question ‘What were the names of the invasion beaches on Sicily during World War 2?’ was only partially answered, as I discovered when I later checked the answer myself.

ChaCha isn’t so much a human-powered search engine as a personalised questions and answers service that uses a search engine, so it’s really a glorified version of Yahoo Answers.

Collarity Relevance Engine

Collarity [3] is taking a different approach to the concept of human-powered search engines. This engine aims to personalise searching, down to the level of a single person or group. Collarity learns over time by watching the searches that are performed and matching them to appropriate results. This is best explained by way of an example. I used Collarity to find Web pages that related to alternative health therapies in the area in which I live, Billericay, Essex, UK, and my searches generally took the form of <name_of_therapy> <Billericay>. A few days later I returned to the engine to continue searching and, as soon as I typed in a form of alternative therapy that I hadn’t searched for previously, the search engine immediately suggested an association with Billericay. However, when I then tried a search for <butcher> the same association did not appear.

While it wasn’t possible to affect the rankings by voting for or against sites, as it is with many of the other search engines in this category, the associations did at least arise from my previous searches. It is possible to take Collarity further, however, and use it with a group of individuals by integrating its search functionality into a Web site. There is a commercial aspect to this, though, so it may not be appropriate for everyone. A rather cheaper alternative may be the Eurekster Swicki, mentioned below.

Earthfrisk

Earthfrisk [4] is a multi- or meta-search engine, which pulls Web results from Google, Yahoo, Live, Ask and Clusty. Earthfrisk also takes data from social networking sources such as digg, del.icio.us, Technorati, StumbleUpon and Reddit under the heading of ‘social media’, and from reference sources such as Wikipedia, Britannica Online Encyclopedia, Infoplease and WebMD. It offers Web, video, image, map and directory search. The results, in common with most of these engines, are very basic – title, brief summary and URL. There is an option to discuss and vote on any result that you see. Moreover, it’s possible to ‘claim your name’ and, if you’ve registered with the engine, you’ll be able to add a few lines of text, including links and a photograph. Individual sites can be added to the database, tagged and described. It is also possible to view the actions of individuals, to see what they have commented on and voted for, and to save this as an RSS feed. Not exactly social networking, but it’s a nod in that direction. There is a toolbar that can be installed, although to be honest, I’m only going to have one toolbar running at any one time, and it’s not their turn. The toolbar does offer extra functionality, in that it shows the colour coding given to any particular site, but Earthfrisk seems slightly vague about how this is supposed to work, which does not engender confidence or enthusiasm.

Clicking on the voting option opens up a new window with a comments box and radio buttons for ‘very good’, ‘good’ and ‘not good’. I have voted and commented on several results, but this doesn’t appear to be reflected in the position of results. Nor is it obvious that results have been voted or commented on – it’s necessary to go into the link option to check this out. This seems rather foolish to me – surely the whole point of the exercise is to see quickly what people are saying about pages and sites, and to be able to affect positioning?

I certainly think that Earthfrisk is heading in the right direction, but it’s not quite there yet; if I can’t see the work that I’ve put in there, it’s not really going to motivate me; and if my recommendations and votes do not alter the position of results I have to question the value of doing it.

iRazoo

The iRazoo search engine [5] attempts to address my earlier point about self-interest, in that searchers earn points for searching, voting and commenting on Web sites. Once registered, users can search as normal, with fairly basic results of title, brief summary and URL. When users click on a result, a new window opens at the appropriate page, where they can then vote for or against it. Points can then be exchanged for prizes. To obtain an Apple 30 GB iPod video Black (5.5 Generation), a user would need to amass 73,000 points; and since the top ‘earner’ had not yet reached 10,000 points, I don’t think iRazoo is going to have to buy in very many!

The information bar displayed with the page can be closed or ignored as preferred. Irritatingly, however, the option to go back to the search results loads them in that same new window rather than returning you to the original one, so in a very short space of time you can amass a lot of tabs for no good reason.

Once a page has been recommended, it is pulled to the top of the list of results; the number of times it has been recommended is shown, and searchers can view the comments that have been made about it and add their own. A search for ‘bbc’ showed that the BBC news site had been recommended 9 times and commented on twice. Oddly, however, voting against a page that had not been recommended had no visible effect. I located the racist Martin Luther King site and voted against recommending it, but this was not taken into account by the software, and I was also unable to see the comment that I had made about the site. While positive recommendation is obviously helpful, surely a negative response to a Web site is equally so? The only situation in which I could see the ‘anti’ recommendation system working was when I voted against a site that already had votes: when I looked again at the number of people who had recommended the site, the figure had dropped by one, but the fact that it had received a vote against it wasn’t mentioned. Consequently the worst that can happen to a site is that it stays where it is, so really bad sites will not suffer, although to be fair, good sites should flourish.
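The behaviour described above can be modelled, very roughly, as follows. This is a hypothetical Python sketch inferred from the outside, not iRazoo’s actual code: a vote against a page only cancels an existing recommendation and is ignored if there is nothing to cancel. The ordering of recommended pages by their count is also an assumption, and the function names and `.example` sites are invented.

```python
def vote(counts, url, up):
    """Record one vote. counts maps URL -> number of recommendations."""
    if up:
        counts[url] = counts.get(url, 0) + 1
    elif counts.get(url, 0) > 0:
        # A vote against only cancels an existing recommendation...
        counts[url] -= 1
    # ...otherwise it is silently ignored, so no count ever drops below zero.


def rerank(results, counts):
    """Recommended pages first (assumed: most-recommended first), then the
    rest in the engine's original order."""
    recommended = sorted((u for u in results if counts.get(u, 0) > 0),
                         key=lambda u: -counts[u])
    others = [u for u in results if counts.get(u, 0) == 0]
    return recommended + others


results = ["racist-site.example", "other-site.example", "bbc-news.example"]
counts = {}

for _ in range(9):
    vote(counts, "bbc-news.example", up=True)
for _ in range(3):
    vote(counts, "racist-site.example", up=False)  # ignored: nothing to cancel
vote(counts, "bbc-news.example", up=False)          # 9 recommendations -> 8

print(counts)                   # {'bbc-news.example': 8}
print(rerank(results, counts))  # ['bbc-news.example', 'racist-site.example', 'other-site.example']
```

The racist page never acquires a negative score, so it keeps its original position among the unrecommended results: the good page flourishes, but the bad one does not suffer.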

Mahalo

The Mahalo engine [6] was launched last year in a welter of publicity, mainly due to the reputation and background of the creator, Jason Calacanis [7]. Mahalo styles itself as the world’s first human-powered search engine, but I suspect that one or two other engines may well disagree with that description. However, it would be fair to say that it is the engine that has drawn most attention since its debut.

The search interface is clear and uncluttered, offering suggestions as the searcher types. Once the search is run, Mahalo comes back with various types of information. It gives searchers the opportunity to view tabbed results from 8 different resources (the major search engines, Wikipedia, Flickr and so on) as well as information from Mahalo itself.

The ‘human-powered’ element can be seen in the form of guides that have been written by volunteers. These provide quick fact files, links to appropriate sites, and options to email the information to contacts and to share the page via various social bookmarking systems. In my opinion the guides are, paradoxically, both the greatest strength and the biggest weakness of the system. On the one hand, it’s very useful to have all that information pulled together for me in one place, rather like a virtual library such as Intute. On the other hand, I have to ask myself if I’m prepared to trust the word of a ‘guide’ about whom I know nothing. The page on Everton FC was written by someone who appears to have no link to, or particular knowledge of, English football, and though I don’t take issue with any of the content on the page, it’s not as useful as the Wikipedia entry, for example. Worse, it was last updated in early August 2007. Consequently much of the ‘current’ information was already out of date, lacking details of the new players who had arrived and of the change of assistant manager. Wikipedia provides better quality content, and the Google tab provides me with current information as well.

Now, it could easily be argued that I could do something about this, as there is an option to comment once I have registered and logged in. There is nothing to say, however, that my comments will have any effect; they are quite simply that – just comments. As a searcher I would still need to check the information from other sources. Mahalo suffers the same type of limitations as Wikipedia, only more so; at least with Wikipedia the currency is better (Mahalo’s page had last been updated 5 months earlier, whereas Wikipedia’s Everton article had been updated less than 5 hours beforehand!). Moreover, with Wikipedia I am not relying on one person to get everything correct, but on a group of people.

Mahalo has, however, recently expanded and is now emphasising the social aspect of the site more than it has done in the past. Its toolbar (remember the blissful days when we didn’t have any toolbars?) allows users to recommend sites quickly, shows Mahalo results whenever a search matches a Mahalo page, and detects relevant Mahalo pages based on the content of the page being viewed. There is also a ‘Mahalo Social’ section, which allows users to register, create profiles, create and share recommendations, recommend links for searches, and discuss specific pages with other users. This is a very interesting development, and Mahalo is taking some of the best elements of existing social networking systems such as Facebook, as well as of social bookmarking systems, and blending them into a new style of network. However, once again there is a problem here, because I have friends and colleagues with widely different areas of interest which do not necessarily overlap. While my contacts may be interested in anything that I find which relates to search engines, it doesn’t necessarily follow that they will be equally interested in material on the football team I support or on my photography. Furthermore, when people use search engines the information required will not always be in the same subject area; my search interests are continually changing, and a social networking system will simply reflect the complete hodgepodge of my information requirements at any particular time.

This leads to another problem with the concept of human-powered, or social, searching. If I’m searching for ‘java’ in the coffee sense, it makes no difference to me that most people have voted and commented on results in the computing sense of the word, or vice versa. In fact, it could be argued that being able to vote for sites will make it more, not less, difficult to find the information that I need. My ‘coffee’ interest is going to get buried under all those votes for the programming aspects of Java, whereas traditional search engines at least tend to offer me better options for narrowing or widening my search. Human-powered searching is often seen as being ‘better’ searching, but so far I have seen little to suggest that this is actually the case. Furthermore, if I want to be social, I’ll sit on Facebook for a while and chat with friends, or play stupid games. When I want to search, searching is what, and all, I want to do. I confess I am not particularly interested in making search results better for other people; I have neither the time nor the inclination. All I want is sound, accurate information, delivered quickly, without fuss. Usually I’m searching for information and data that I do not know – that’s why I’m looking for it. Consequently, I am not in a position to say whether a certain page is good, bad or indifferent until I’ve spent considerable time researching, by which time I’m keen to use the information, not return to the search engine to comment on its value (which is going to change over time anyway). Just because I think a page is rubbish or excellent today is no guarantee that it will be tomorrow, and I’m not going to keep going back to check.

Sproose

Sproose [8] has a small number of search options available to users – Web, video, popular tags (basically just an active tag cloud displayed on the screen) and users. Search results are fairly straightforward, with title, brief summary, URL, vote count number and the option of commenting on a result. It did have the advantage of providing an RSS feed for searches though, which was welcome.

The voting process was easy; simply click on the ‘I like it’ button to the left of each result. While it is possible to unvote, or remove your vote, it is not possible to vote against a site – the most that can be done in that direction is to write an adverse comment, which is displayed with the page result. Interestingly, a comment will also move a result up in the rankings. This is not necessarily a welcome feature: commenting that a particular site was racist and that its information should not be trusted actually placed that site right at the top of the results, giving it more publicity, not less.

There is a slight ‘social’ element to the search engine in that it’s possible to look at other users and see what they have commented on, but there doesn’t appear to be any way of connecting with users, or subscribing to what they’re doing, and so on.

I ran a variety of searches and there didn’t appear to be any particular attempts to force higher ratings for inappropriate sites or to ‘game’ the index, though the top result for George Bush was ‘George W Bush or Chimpanzee?’ which is not particularly encouraging!

Wikia Search

Wikia Search [9] is the latest human-powered search engine to enter the fray, having launched at the beginning of January 2008. Most commentators have shown an interest in it primarily because the leading light behind it is Jimmy Wales, who founded Wikipedia. The general response to the engine has been critical, to say the least. The site itself states that ‘We are aware that the quality of the search results is low.’ (their emphasis) [10]. It also points out, however, that Wikia Search is not a search engine but a project to build one, in much the same way that when Wikipedia was launched it wasn’t an encyclopedia and had nothing by way of entries. Wales himself estimates that it will take 2 years or more for the engine to reach a satisfactory level.

At the moment there is little to see. Search results are poor, there is a 5-star rating system which doesn’t work yet, it’s not possible to affect the ratings and all that it’s really possible to do is to create a profile and contribute mini articles to search results. Given that the search engine has only just launched there is little point in being too critical; it’s probably worth looking at again in six to twelve months, but it will be much longer than that before it’s a viable engine to use.

A Possible Alternative: Build Your Own Engine

There are flaws in all of the engines discussed above and, perhaps more importantly, flaws which are inherent in the very concept of human-powered search engines. However, there is an alternative, which I mentioned briefly in a previous Ariadne article [11]. Various resources now allow users to create their own search engines, limiting results to a small (or in some cases, large) number of Web sites. The concept is straightforward – if you know a lot about a subject area you will be in a good position to judge useful and authoritative Web sites, and can create a search engine that searches just those sites, excluding everything else. This means that it’s possible to tailor a search engine very precisely to reflect a specific interest, whether by subject, vertical market, demographics and so on. The major players in this area are Rollyo [12], Eurekster [13], Yahoo search builder [14] and Google Custom Search [15].

I’m not going to go into detail about such engines on this occasion, since that would require a column in its own right, but they all work in a similar fashion. Their principal characteristic is that they allow individuals to ‘cherrypick’ sites that they think are reliable or which emphasise a specific aspect of a subject. A search engine can then be created which limits searches to just those sites. It is true that, with the exception of the Eurekster Swicki, they do not permit users to rank results or vote for sites, and neither do they offer much by way of a social aspect, but there is still a high degree of human intervention. For example (and with a lack of modesty), if you want to search for Web 2.0 resources you could do worse than try my Web 2.0 search engine [16], or if you have an interest in UK-based information you could try my UK search engine, which searches 90 other search engines [17].
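The core idea behind these tools can be sketched in a few lines of Python: take a list of result URLs from a general engine and keep only those from a hand-picked set of sites, including their subdomains. This illustrates the principle only. The services named above do the restriction on their own servers, and the functions `in_collection` and `custom_search`, like the lookalike domain used as a decoy, are hypothetical rather than part of any of their APIs.

```python
from urllib.parse import urlsplit


def in_collection(url, sites):
    """Return True if the URL's host is one of the chosen sites or a subdomain."""
    # urlsplit().hostname is already lower-cased; it is None for relative URLs.
    host = urlsplit(url).hostname or ""
    # Match on a dot boundary, so "notariadne.ac.uk" does not count as "ariadne.ac.uk".
    return any(host == site or host.endswith("." + site) for site in sites)


def custom_search(hits, sites):
    """Filter a general engine's hits down to the hand-picked sites, keeping order."""
    return [url for url in hits if in_collection(url, sites)]


# A hand-picked collection; site names must be written in lower case.
sites = {"ariadne.ac.uk", "philb.com"}

hits = ["http://www.ariadne.ac.uk/issue47/search-engines/",
        "http://spam-shop.example/web-2-0",
        "http://www.philb.com/",
        "http://notariadne.ac.uk/page"]   # made-up lookalike domain

for url in custom_search(hits, sites):
    print(url)
```

Checking on a dot boundary, rather than simply testing whether the host ends with the site name, stops lookalike domains from slipping into the collection.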

Conclusion

While there is a variety of human-powered search engines, they all have drawbacks as well as advantages. To come full circle, my major concern remains that they are still just as easy to ‘game’ as more traditional engines (despite what their creators say) and in order for them to work a lot of people do need to buy into the concept. Even then, users will end up with a group consensus of what is a good result, and that will not actually help a searcher who wants Apple the record company rather than apple the fruit. From a purely personal standpoint, the only search engine that I return to on a regular basis from the list above is Collarity [3] and the search engines that I’ve created myself. Perhaps success lies less in human-powered search engines and more in personally powered search engines.

References

  1. Anoox http://www.anoox.com/
  2. ChaCha http://www.chacha.com/
  3. Collarity http://www.collarity.com/
  4. Earthfrisk http://earthfrisk.org/
  5. iRazoo http://www.irazoo.com/
  6. Mahalo http://www.mahalo.com
  7. Wikipedia entry on Jason Calacanis, 29 January 2008 http://en.wikipedia.org/wiki/Jason_Calacanis
  8. Sproose http://www.sproose.com
  9. Wikia Search http://alpha.search.wikia.com/
  10. About Wikia Search http://alpha.search.wikia.com/about.html
  11. “Search Engines: Where We Were, Are Now, and Will Ever Be”, Phil Bradley, April 2006, Ariadne Issue 47 http://www.ariadne.ac.uk/issue47/search-engines/
  12. Rollyo http://www.rollyo.com/
  13. Eurekster http://www.eurekster.com/
  14. Yahoo search builder http://builder.search.yahoo.com/
  15. Google Custom Search http://www.google.com/coop/cse/
  16. Web 2.0 search engine http://moourl.com/philsweb2engine
  17. United Kingdom search engine of search engines http://moourl.com/philsukengine

Author Details

Phil Bradley
Independent Internet Consultant

Email: philb@philb.com
Web site: http://www.philb.com
