The Librarian of Babel: The Self-Citation Machine
Good morning, and welcome back to the Library of Babel. Today's tour will focus on some of the issues raised by our unique acquisitions policy. As you probably know, our mission statement enjoins us:
To make accessible to all the totality of human knowledge
and we aim, towards this end, to collect all possible texts. We have been greatly assisted in this by the work of Tim Berners-Lee and Robert Cailliau, without whom our enterprise might have remained forever a pipe-dream. This work has also, however, made our comprehensive task significantly more difficult. Naturally, the Library must seek to collect all translations of Shakespeare, including those into Xingu and the original German. Recent developments in cultural theory have left us with the alternately depressing and gigglesome task of collecting the script and cast photos of the stage musical based on the Disney cartoon of the film of the play of the book of whatever folk-tales inspired the Quasimodo industry. And all the intermediates, and their reviews, and the Web pages inspired by each.
There are, of course, notorious problems with the rigorous referencing of the latter class of material, which we have informally dubbed "the mood-ring literature" to reflect its tendency to change from grey to black and back with temperature and the phase of the moon. In an attempt to remedy this situation, the Library has recently begun a process of translation of World-Wide Web pages into the Harvard Reference System.
Technical progress has been surprisingly swift. Our prototype cataloguing engine examines all hypertext references in a document, and assigns a date and lead author to each referenced document. Clearly, this task will be very significantly facilitated by the adoption of common metadata standards. However, we are sadly of the opinion that it will be a number of years before these are widespread, and we are therefore developing a parser for Web pages. This, within certain restrictions, will recognise those documents such as CVs and resumés which contain an author name. In searching for such pages it constructs a relationship tree which assigns proper bibliographic references to the pages which form its nodes.
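For visitors who like to see the gears, the labelling step can be sketched roughly as follows. This is a minimal illustration and not the Library's actual engine: it assumes the lead author and year have already been extracted for each referenced page, and the names `suffix` and `harvard_labels` are hypothetical.

```python
from collections import defaultdict
from string import ascii_lowercase


def suffix(n):
    """Map 0, 1, ..., 25, 26, ... to 'a', 'b', ..., 'z', 'aa', ... (hypothetical helper)."""
    letters = ""
    n += 1
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = ascii_lowercase[remainder] + letters
    return letters


def harvard_labels(documents):
    """Assign a Harvard-style label to each referenced document.

    `documents` is a list of (url, lead_author, year) tuples in the order
    the hypertext references were encountered. Every label gets a letter,
    so a lone 1996 page still becomes "Valdez 1996a".
    """
    seen = defaultdict(int)  # (author, year) -> number of labels issued so far
    labels = {}
    for url, author, year in documents:
        labels[url] = f"{author} {year}{suffix(seen[(author, year)])}"
        seen[(author, year)] += 1
    return labels
```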
The results are interesting:
Arno Valdez's Home Page
Hi. I am a postgraduate at the University of Kesteven, (Regents 1977a) in the Department of Psychosomatic Metallurgy. (Regents 1977b) I graduated from the Ross Ice Shelf Institute of Technology (RIT 1997a) in 1994.
- Some effects of molybdenum nitride inclusions in stressed lithium-sarium sinter-castings (Valdez 1997q)
- Tin Sickness: a non-directive Adlerian approach (Valdez 1997r)
This page lovingly hand-crafted in vi; last updated 30 June 1997.
Devil, A Great Daemon story,
Valdez (1996a) University of Kesteven, Mire-under-Wold:
Valdez (1997a) University of Kesteven, Mire-under-Wold:
Valdez (1997b) University of Kesteven, Mire-under-Wold:
Valdez (1997q) University of Kesteven, Mire-under-Wold:
Valdez (1997r) University of Kesteven, Mire-under-Wold:
We tentatively conclude that the so-called World-Wide Web is evolving into a self-citation machine.
In an academic environment increasingly driven by productivity measures which appeal to the souls of accountants, this must have significant effects. We have anecdotal evidence of authors competing for the higher number of Web citations of their work. In the current state of the art, this is either an approximate or a labour-intensive undertaking. We have been told by one visiting author that an initial search showed 431 references - but that 157 of these turned out on manual inspection to be the same article, which appeared with a different URL each time the indexing robot visited the site. It cannot be long, though, before the gradual rise of electronic publishing of refereed papers lends credence to the concept of a Web citation index joining the paper citation index in research review processes.
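The manual inspection our visitor describes could be partly automated. If we read the figures as 157 hits all pointing at one article, the 431 hits collapse to 431 - 157 + 1 = 275 distinct items. The sketch below assumes that the search returns (URL, page text) pairs and that identical text, ignoring whitespace, means the same article; the function name `distinct_citations` is hypothetical.

```python
import hashlib


def distinct_citations(hits):
    """Collapse search hits whose page text is identical.

    `hits` is an iterable of (url, body_text) pairs. Returns a dict mapping
    a content digest to the first URL seen with that content, so its length
    is the number of distinct citing pages.
    """
    first_url_by_digest = {}
    for url, body in hits:
        # Normalise runs of whitespace so trivial reformatting does not matter.
        normalised = " ".join(body.split())
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        first_url_by_digest.setdefault(digest, url)
    return first_url_by_digest
```

Exact-text matching is the crudest possible test; a page that changes its date stamp on every visit would still be counted afresh each time.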
In order to monitor this phenomenon, the Library Administration Research Department is currently considering a proposal to convert the Harvardized Web pages back into HTML. This will enable us to make use of automated indexing and cataloguing facilities. It will, we hope, allow us to detect and filter some instances of self-citation; we are alarmed, however, that the innocent self-citers will be more heavily adjusted than the malicious, who may simply spread their Web sites across multiple servers in order to defeat our first-level author-identity parser.
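The weakness is easy to demonstrate. The sketch below assumes, purely for illustration, that a first-level parser treats two pages as having the same author when they share a hostname; the function name `split_citations` and the example hosts are hypothetical.

```python
from urllib.parse import urlsplit


def split_citations(links):
    """Count (self_citations, external_citations) among hyperlinks.

    `links` is an iterable of (citing_url, cited_url) pairs. Two pages are
    assumed to share an author when their hostnames match, which is the
    naive rule this sketch is meant to expose.
    """
    self_cites = external = 0
    for citing, cited in links:
        if urlsplit(citing).hostname == urlsplit(cited).hostname:
            self_cites += 1
        else:
            external += 1
    return self_cites, external
```

An author who keeps everything at `www.kesteven.example` has every internal link discounted, while one who mirrors the same papers at `mirror.example` has each link counted as an independent citation.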
The Harvardization programme also leads to the intriguing or frankly alarming prospect of a shadow-Web in this form. This, however, is entirely in the spirit of the Library of Babel. We look forward to rising to this new challenge, and to those which our responses to it will, it seems, inevitably generate.
Thank you for lending your attention to this brief presentation. Please feel free to take a copy of our petition to the Joint Universities Group Underwriting Library Administration Research as you leave.
 Home page of Robert Cailliau (do not address Robert in French),
 To demonstrate both how incomplete is the state of exploration of the Library and how comprehensive it is, note that at the time of this visit the only English-language document among the 11 returned by an AltaVista  search for the phrase "in xanadu did kublai khan" was this: http://www.redcat.org.uk/~matt/html/think.html
 Nyahhh, we have a much shorter URL so we must be cool!
Author details
Mike Holderness
Mike Holderness: The Internet for Journalists - Web pages at: