World Wide Web Conference 2004

Dave Beckett reports on the international WWW2004 conference held in New York, 19-21 May 2004.

WWW2004 [1] was the 13th conference in the series of international World Wide Web conferences organised by the IW3C2 (International World Wide Web Conference Committee). This was the annual gathering of Web researchers and technologists to present the latest work on the Web and Web standardisation at the World Wide Web Consortium (W3C).

This conference is very much a networking event in both the technical and personal sense. For the last three years it has had pervasive wireless networking ('wi-fi') available, allowing interaction with the sessions and the speakers during the conference. For the last two years, I have been involved in providing community coverage [2][3] via IRC and weblogging (with the help of the XMLhack people) to take a contemporaneous record of the event. This allows users on- and off-site to participate and makes it possible to keep an eye on multiple sessions, which is tricky at conferences with several parallel tracks such as this one. It also means that a more permanent public record of the event appears rather quickly.

Conference Tutorials and Workshops

One of my favourite sessions was one that (sadly) I could not attend all day: a fascinating workshop, Beyond the Click: Interaction Design and the Semantic Web [4]. It discussed many interesting research problems in representing the richer information that the Semantic Web can provide. I dipped into this a few times and wished I could have seen more. This looks like an area to watch.

Opening Keynote

The conference opened with a declaration that 19 May was 'World Wide Web Day in New York City', read by Gino Menchini, standing in for the Mayor of New York who was attending the 9/11 hearings. This was followed by the opening keynote from Sir Tim Berners-Lee, Director of the W3C and inventor of the Web. This year he celebrated the achievements and discussed some topical Web-related technologies in his keynote Celebrations and Challenges [5].

Berners-Lee described how the foundation of the Web rested not only on the use of URIs for naming Web pages but also, critically, on a lower-level technology, the Domain Name System (DNS). The DNS provides the distributed naming system that allows the Web to scale by removing the need to pre-coordinate names; this was one of the big problems affecting earlier closed hypertext systems with centralised linkbases. This meant that making links between Web pages became cheap and easy, although they could now fail (i.e. error 404). He described how domain names are now brands: people try addresses like www.something.com when they want to find out about a brand or company. Short domain names are one of the few limited resources on the Web. Most recently there have been proposals to expand the Top Level Domains (TLDs) such as .com, .org etc. to include many more.

Author's Aside: There have already been domain name expansions but I have never seen a .aero or .museum web address used prominently. More popular recent expansions have included .name for personal sites.

The new domains being suggested tend to be for particular content, such as .xxx to fence off a space for pornography. This has been suggested in the past and the key issue remains: whose community or legal standard defines this term? Berners-Lee suggested that these proposals are motivated more by the desire to print money in the form of domain names. Only the largest organisations can afford to keep buying up bigco.abc etc. for each new .abc to protect their brands, and this despite the fact that nobody is ever likely to seek out the bigco site under the new domain - people will just add .com and try that first.

Berners-Lee cited the .mobi domain proposal for mobile phone content as a good example of a bad idea. If you are using a .mobi Web site on your phone and synchronise with your laptop or desktop, does the bookmark still work? Why not? It was a wrong-headed choice and there are better solutions for the actual problem - such as dynamically adjusting the content for the device through the server, a solution which works right now. Burning that single content choice into the URI will not work; 'small devices' are changing all the time and acquiring more features, thereby making it a choice that will soon be out of date. Imagine a .html1 domain for all original Web content and new domains each time a new type of content is needed [6].
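The server-side alternative mentioned above can be sketched in a few lines of Python. This is only an illustration and not anything shown in the keynote: the function names, the device markers and the two variant names are all hypothetical, and a real site would rely on a maintained device description database rather than substring matching. The point it shows is that the URI stays the same while the server picks the representation from the request's User-Agent header.

```python
# Illustrative sketch only: names and markers below are assumptions,
# not taken from the keynote or from any real product.

# Hypothetical substrings hinting at a small-screen device.
SMALL_DEVICE_MARKERS = ("mobile", "symbian", "windows ce", "palm")


def choose_variant(user_agent: str) -> str:
    """Pick a page variant from the User-Agent header value."""
    ua = user_agent.lower()
    if any(marker in ua for marker in SMALL_DEVICE_MARKERS):
        return "compact"
    return "full"


def render(path: str, user_agent: str) -> str:
    """Return a stand-in for the page body served at `path`.

    The path (and therefore the bookmark) never changes;
    only the chosen variant differs between devices.
    """
    variant = choose_variant(user_agent)
    return f"{path} [{variant} layout]"


if __name__ == "__main__":
    print(render("/news", "Nokia6600/1.0 SymbianOS/7.0s"))
    print(render("/news", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Because both requests use the same /news address, a bookmark synchronised from phone to desktop keeps working, which is the property a separate .mobi address would lose.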

Consequently Berners-Lee was of the view that we should avoid the temptation to create new top-level domains except in special circumstances, for example in order to identify phones (for which there are several proposals). However the approach should be to identify the device - not its properties. (I should add that I was just picking one topic from a larger speech, a lot of which has been reported elsewhere [5].)

Main Conference

My research interest for some time has been RDF and the Semantic Web, and this is the main conference for that area. Indeed there were multiple tracks that essentially served as a mini-conference, with several interesting papers from practitioners.

On the first day I attended a panel entitled Will the Semantic Web Scale whose members, for the most part, appeared not to have noticed that it already had. There were several audience members who vigorously disagreed with the panel. One other highlight of the first day was in the Web of Communities track: the presentation The Role of Standards in Creating Community by Kathi Martin, Drexel University. It included an analysis of words taken from subtitle indexing of President Bush's State of the Union speech; this was run in parallel with retrieved images matching the words, which gave a novel form of display. Kathi also discussed other work being done on the Digimuse Project [7] at Drexel. Later that day in Semantic Interfaces and OWL Tools, the Haystack browser was again demonstrated by Dennis Quan of IBM. It's a powerful interface but rather scary; you could call it a 'Shrek' if you like!

The second day started with a pair of plenary talks. The first one, Empowering the Individual, by Rick Rashid of Microsoft, was more of a rush through possible future technologies than anything profound. The buzzword density was huge. Following that was an interesting presentation by Udi Manber from search company A9.com (an Amazon company) on their work on search innovations and the problems they had found, including their work on long-term search problems. He was limited in what he could say about the cool stuff coming up, but it was good to see a serious competitor and innovator to match Google.

If you try A9.com you may be surprised to see it recognise your name. This is because, as part of Amazon, it reads your amazon.com cookie and can link the two together. There were several questions raised about the site's privacy, and Udi pointed out that there was a cookie-free anonymous site [8] that people could use without all the cool bits.

Two sessions on the second day stood out for me. The first, in the Semantic Web Applications track, was CS AKTive Space: Representing Computer Science in the Semantic Web, presented by Les Carr and Monica M.C. Schraefel from the University of Southampton. I've seen some of the CS AKTive Space work before, but it's always interesting since it shows the use of real data in a Semantic Web of information which, to a UK computer scientist such as myself, is very topical. The other fun, fact-filled talk, Information Diffusion through Blogspace in the Reputation Networks track, was presented by Andrew Tomkins of IBM. He showed multiple ways of presenting information grabbed from 12,000 RSS feeds over time to synthesize such things as current, and especially evolving, blog topics. It's probably unclear whether this is yet a robust source of information, but it was certainly interesting, in the style of the Google Zeitgeist [9] or Yahoo! Buzz [10], but more detailed.

On the final day of the main conference, Friday 21 May, the best presentations that I saw were Newsjunkie and Information Diffusion through Blogspace by Gruhl et al in the Mining New Media track. More delving into blogspace for fascinating facts, which seems to be a bit of a trend here. As a Semantic Web developer I also found Index Structures and Algorithms for Querying Distributed RDF Repositories by Heiner Stuckenschmidt et al, in the Distributed Semantic Querying track, a very useful analysis for future consideration. This session was a highlight of the conference for several Semantic Web developers to whom I talked, as it presented results of practical work. Although I missed it, I heard that the Semantic Web Foundations track was pretty good, and A Proposal for an OWL Rules Language by Horrocks et al will probably be a good pointer to future work at the high end of Semantic Web development.

Developer's Day

My favourite part of the WWW conference series is the Developer's Day. This year it was larger than ever, with lots of interesting things in parallel tracks - how to choose between tracks called "cool stuff" and "Semantic Web", for example?

I went to see Doug Cutting talk about the Nutch [11] open source Web search engine. Doug was one of the creators of the well-regarded Lucene search engine, and Nutch is his latest work. It seems really handy and looks like it might be a replacement for the venerable ht://Dig. He was aiming to improve the transparency of Web search by exposing what the commercial engines won't tell you, such as how the results were created. This presentation was very well received and I expect there were more than a few downloads during the talk.

As last year, Tim Berners-Lee did a live question-and-answer walkabout over lunch and the topics ranged from the general Web to applications for the Semantic Web. It was maybe not as good as last year's.

After lunch I saw Tucana [12] demonstrate their Semantic Web work (the core of which is open source, the rest commercial), which includes a high-performance, scalable Semantic Web system. They have reached over 1 billion triples on 64-bit PCs, with advanced inferencing and searching on top, which somewhat contradicted the earlier panel on scaling the Semantic Web.

Jim Hendler of the University of Maryland and his group demonstrated their SWOOP [13] ontology browser and editor. They showed how much easier it was to create OWL ontologies and work with them. If you have ever had a brush with the Protege editor, and come off worse, this is the friendly version, designed for OWL and still being improved.

The final session I attended was that of my colleague Alistair Miles from RAL who was presenting on our SWADE Project Thesaurus Activity [14] which has produced the SKOS (Simple Knowledge Organisation Systems) set of schemas and documents and is generating good feedback.


This was a great conference to attend, held in a central venue in Manhattan five minutes from Times Square, with lots of innovation going on. It felt a bit like the Web conferences of the late 1990s, when the Web was growing.


Author Details

Dave Beckett
Senior Technical Researcher
Institute for Learning and Research Technology
University of Bristol

Email: dave.beckett@bristol.ac.uk
Web site: http://www.ilrt.bristol.ac.uk/

