Brief thoughts on indexing the World
As speculation mounts on when, as opposed to whether, Digital will be mirroring Alta Vista in the UK/Europe, and what exactly they mean by "will have partners who can localize the pages and include regional content", late-night, pre-Ariadne-launch thoughts turn to matters of indexing.
In this issue Dave Beckett and Neil Smith describe AC/DC, a Web search engine. The hugely significant difference between AC/DC and other search engines is that it indexes only a subset of publicly available UK Web servers, i.e. a large number of those with domains ending in .ac.uk, and nothing else. This usually means that anything you find using AC/DC (if the located resource still exists) can be accessed much more quickly than most of the finds resulting from a Lycos or Alta Vista search.
This service is only experimental; as people have pointed out, it is also not the prettiest (though is that relevant?) or the easiest to use. Having said that, it is excellent for finding, for example, some of the UK academic sites that have links to a resource on your own server. Search engines that restrict their indexes to some subset of the Web, such as the UK, or just academic or business-based servers, or that let you search on some subset of their index, may be a useful complement to subject-based gateways and global "hoover" type indexes, such as the SOSIGs and (current) Alta Vistas of this world respectively.
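As a rough illustration of the kind of restriction described above, keeping only servers whose domain ends in .ac.uk, here is a minimal sketch in Python. It is not how AC/DC itself is implemented; the function name `in_scope` and the example URLs are hypothetical, chosen purely for illustration.

```python
from urllib.parse import urlsplit


def in_scope(url, suffix=".ac.uk"):
    """Return True if the URL's hostname ends with the given domain suffix.

    Hypothetical helper for illustration; not taken from AC/DC.
    """
    # urlsplit() lowercases the hostname; it is None when no host is present.
    host = urlsplit(url).hostname or ""
    return host.endswith(suffix)


# Made-up candidate URLs that a crawler might have discovered.
candidates = [
    "http://www.example.ac.uk/library/",
    "http://www.example.com/",
    "http://www.example.co.uk/",
]

# Keep only the servers that fall inside the restricted subset.
to_index = [url for url in candidates if in_scope(url)]
```

Passing a different suffix (say, ".co.uk") would give a business-oriented subset instead; the point is only that the scope of an index can be decided by a very simple test on the server name.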
Certainly, with the problems of a still-growing Web, international bandwidth traffic, search engine precision/recall and the proliferation of said engines, searching these large indexes with confidence is becoming an increasingly difficult and long-winded task. With several UK-based national, institutional, programme and project-based strategies in caching, mirroring, subject-based indexing, metadata etc. either established or getting off the ground, the quest is well underway for the holy grail of a net-based searching system or strategy with acceptable recall, precision and speed. How long this will take is for braver, or more reckless, people than me to speculate on.
Of cardigans, anoraks and sharp suits...
Leafing through the articles we have in this issue, a common theme becomes apparent: what you wear is what you do for a living. In the Elvira and New Text Search Engines conference write-ups, clothing to suit the profession is crisply defined: cardigans for the librarians, anoraks for the teccies/geeks/people who write the software, sharp suits for the people who sell the systems made by the anoraks to the cardigans. Many people have noted that the balance between the teccies and the sharp suits (someone called this the suit to t-shirt ratio at the 4th World Wide Web conference) is swinging towards the latter as the Web rolls on; talk of anoraked people and suited people has been bandied around for years, mostly in a derogatory sense by the opposing camp.
However, this talk of cardigans is perhaps a newer phenomenon. In bygone times, librarians were stereotyped as possessing the proverbial bun on their head. Before joining a library school and repeatedly encountering current and past librarians in close proximity, I thought no such person could exist; but they do, albeit in extremely small and diminishing pockets of bun-dom, and certainly not in sufficient numbers to justify the description more broadly. A search of the lis-link archive, though revealing several instances of the "bun" label, reveals no comment on cardigans. Perhaps this newer label puts librarians on a more equal footing, regarding social status, with the anoraked and suited hordes. In which case, what article of clothing do information science/librarianship research and information officers, who aren't usually heavily in the land of geekdom, the shelves of librarian-dom or the cut and thrust world of business, wear? Just a thought...