Web Magazine for Information Professionals

Wire: Dave Beckett, Interviewed

Dave Beckett is subjected to an interview via email.

What do you do in the world of networking/libraries/WWW?

I edit and maintain the Internet Parallel Computing Archive (IPCA) based at HENSA Unix, both of which are funded by JISC ISSC, and I work at the Computing Laboratory, University of Kent at Canterbury. The IPCA is a large and popular archive of High Performance and Parallel Computing materials: it holds over 500 MB of files, serves around 3,000 of them each day and has four mirror sites, in Paris, Athens, Osaka and Canberra. The popularity of the archive meant that it generated a lot of log information, so I wrote some software, The Combined Log System, which integrates the logs for WWW and other services. I presented this at the 3rd WWW Conference in Darmstadt. On a large WWW site it can be difficult for people to find things; this is the Resource Discovery problem. To help with this, I have catalogued most files in the IPCA by hand using digital metadata, which is to a digital file what a library catalogue entry is to a book. The metadata format I use is IAFA Templates (the same format used by the ROADS project for subject-based indexes such as the Social Science Information Gateway (SOSIG) and Organising Medical Networked Information (OMNI)). I implemented software to automate the handling of the metadata (presented at the 4th WWW Conference in Boston), and I am currently adding searching to it using the Harvest indexing system.
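IAFA templates are plain-text catalogue records made of "Attribute: value" lines. As a rough illustration of what "handling the metadata" involves, here is a minimal Python reader for a simplified subset of that layout. It assumes one template per blank-line-separated block and treats indented lines as continuations of the previous value; it ignores the rest of the IAFA specification. The function name `parse_iafa` and the sample records (including the example.org URIs) are hypothetical and are not taken from the IPCA software.

```python
def parse_iafa(text: str) -> list[dict[str, str]]:
    """Parse blank-line-separated 'Attribute: value' templates into dicts.

    Simplified IAFA-style reader (assumption: only the basic layout is handled).
    """
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current template.
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[0] in " \t" and last_key is not None:
            # An indented line continues the previous attribute's value.
            current[last_key] += " " + line.strip()
        else:
            # Split on the first colon only, so values such as URIs stay intact.
            key, sep, value = line.partition(":")
            if sep:  # lines without a colon are skipped in this sketch
                last_key = key.strip()
                current[last_key] = value.strip()
    if current:
        records.append(current)
    return records


# Hypothetical sample records, for illustration only.
SAMPLE = """Template-Type: DOCUMENT
Title: Sample parallel computing report
URI: ftp://example.org/pub/report.ps
Description: A hypothetical record used only to
  illustrate continuation lines.

Template-Type: DOCUMENT
Title: Second sample record
URI: ftp://example.org/pub/other.ps
"""

for rec in parse_iafa(SAMPLE):
    print(rec["Template-Type"], "|", rec["Title"])
# → DOCUMENT | Sample parallel computing report
# → DOCUMENT | Second sample record
```

Keeping each record as a plain dictionary makes it straightforward to generate HTML listings or feed the values to an indexer, which is the kind of automation the paragraph describes.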

Like Chris Lilley in issue 2, I am also interested in the new, open and legally unencumbered Portable Network Graphics (PNG) standard, and have done several projects with it. Firstly, I added PNG support to the Arena experimental WWW browser from the World Wide Web Consortium (W3C), as a demonstration of PNG's key features of full-image transparency and gamma correction. Secondly, I wrote a program, pngmeta, to extract the metadata that can be stored in PNG images and make it available. I then used this with the Harvest system to index PNG images and provide keyword searching over them. This has interested WWW indexing companies, who are keen to do the same on the Web.
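PNG stores textual metadata in ancillary chunks: a `tEXt` chunk holds a Latin-1 keyword, a NUL byte and Latin-1 text, and a `zTXt` chunk holds the same with the text zlib-compressed. The sketch below shows the general idea of pulling that metadata out of a file. It is an illustration written against the PNG chunk layout, not pngmeta's actual code. The names `read_png_text` and `make_chunk` are hypothetical, international `iTXt` chunks are not handled, and the sample image and its metadata are made up for the example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(data: bytes) -> list[tuple[str, str]]:
    """Return (keyword, text) pairs from a PNG's tEXt and zTXt chunks."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    pairs = []
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length
        if end + 4 > len(data):
            raise ValueError("truncated chunk")
        body = data[pos + 8:end]
        (crc,) = struct.unpack(">I", data[end:end + 4])
        # The CRC covers the chunk type and data, not the length field.
        if zlib.crc32(ctype + body) != crc:
            raise ValueError(f"bad CRC in {ctype!r} chunk")
        if ctype == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            pairs.append((keyword.decode("latin-1"), text.decode("latin-1")))
        elif ctype == b"zTXt":
            # Keyword, NUL, one compression-method byte (0 = zlib), compressed text.
            keyword, _, rest = body.partition(b"\x00")
            text = zlib.decompress(rest[1:])
            pairs.append((keyword.decode("latin-1"), text.decode("latin-1")))
        elif ctype == b"IEND":
            break
        pos = end + 4
    return pairs


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Build one PNG chunk with its length and CRC (used here to make test data)."""
    return (struct.pack(">I", len(body)) + ctype + body
            + struct.pack(">I", zlib.crc32(ctype + body)))


# A 1x1 8-bit greyscale image carrying hypothetical metadata.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
sample_png = (PNG_SIGNATURE
              + make_chunk(b"IHDR", ihdr)
              + make_chunk(b"tEXt", b"Title\x00Sample image")
              + make_chunk(b"zTXt", b"Comment\x00\x00" + zlib.compress(b"Compressed note"))
              + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
              + make_chunk(b"IEND", b""))

print(read_png_text(sample_png))
# → [('Title', 'Sample image'), ('Comment', 'Compressed note')]
```

Pairs are returned as a list rather than a dictionary because PNG allows the same keyword to appear in more than one text chunk. The resulting keyword/text pairs are the kind of data an indexer such as Harvest could then search.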

Finally, I have been working on UK-based resource discovery systems. UK Internet Sites is an automated database of all the sites below .uk, listing any known WWW sites together with a short title, description and category. The most recent project I have been working on, in cooperation with HENSA Unix, is the experimental ACademic DireCtory (AC/DC) service, a WWW index of all the known sites in the .ac.uk domain (and a few others). This is a collaborative project with universities around the UK to share the gathering work, but currently the majority of the gathering and all of the indexing is done at HENSA Unix.

... and how did you get into this area?

In May 1993, our Parallel Processing research group at UKC had developed lots of software and documents in the parallel computing area that we wanted to disseminate, and using the Internet was the natural choice. HENSA Unix, which is based at UKC, had just started at that point, so I based the IPCA there. Gradually other people and companies donated their work, and I collected relevant materials I found all over the world into the one place. The IPCA blossomed from there to its current state, with over 1.1 million files downloaded during the three years of the service.

What was your reaction to the Web when you encountered it for the first time (and when was this)?

I first encountered the WWW sometime in 1993 using the Lynx text browser; it looked interesting but slow. I stuck with that until I got a graphical terminal on my desk and could run Mosaic and see the graphical Web. In 1993 and 1994 Mosaic was the Web, and it is a little sad to think that there are probably people using the Web now who have never heard of it.

How significant an impact do you think that mobile code, such as Java applets, will have on the development of the Web as an information resource?

It could be a major development if they remain open standards and can be kept secure. Given the recent security problems Java has had, the latter has not yet been proven to my satisfaction. Processor- and browser-specific 'plug-ins' are not mobile code, and I hope they don't become the future: we have met this kind of situation before with PCs and got locked in to a specific architecture and operating system.

...and how significant is VRML in the same context?

I don't think VRML is ready for the Internet yet: it is pretty slow, and the current version of VRML isn't powerful enough for properly animated and interactive 'worlds'. If much more bandwidth and more powerful machines appear, it should have a role in certain domains such as visualisation, art and entertainment.

A friend buys a PC, a relatively fast modem and an Internet connection. (S)he asks you which browser they should buy or obtain. What do you advise?

There is currently vibrant competition between the major browser vendors, and the situation is changing all the time, so there is no definitive answer. However, assuming the PC is running Windows 95, I would advise trying out the latest beta versions of both Netscape Navigator 2.0 and MS Internet Explorer 2.0 (3.0 if it is out) and using one or more of them. If you are running a PC Unix-based system (Linux, BSDI), then Navigator or Arena could be used. In the UK, with bandwidth to the US being very limited, a good text browser is also useful; Lynx is the obvious choice and can serve well for fast WWW access. I personally use all of the above, but for quick WWW use I use W3, the browser inside the GNU Emacs text editor. This is portable among all the OSes that GNU Emacs runs on, from VMS to DOS.

One of the most frequent complaints from UK Web users is the slow speed of accessing some Web sites, especially those held in the US during times of high network traffic. What, if anything, should be done (and by whom) to alleviate this problem? Are caching and mirroring the answer (and if so, how should these approaches best be deployed, locally, nationally and/or internationally)?

All of the above: caching at all levels, from the user and the local site to the region, the country and globally. This is a rapidly evolving service and, as the UK National Web Cache at HENSA Unix (http://www.hensa.ac.uk/wwwcache/) shows, the UK is leading that process. Mirroring of important and timely materials will continue for the foreseeable future, until more sophisticated resource 'replication' services appear.

Web pages can generally be created (a) by using software to convert from some other format, e.g. RTF, to HTML, (b) by using a package such as HoTMetaL to automate/validate the construction, or (c) by hand in a text editor, then validated by e.g. the Halsoft validation mechanism (mirrored at HENSA). Which method do you use, and why?

(c) I mostly write HTML by hand in my text editor, GNU Emacs, which can validate the HTML as I write it. I also check it with other validating tools such as htmlcheck and weblint, and with the Halsoft service (http://www.hensa.ac.uk/html-val-svc/). Most of the IPCA HTML files, however, are generated automatically from the metadata, but in my own style.

What would you like to see happen in the field of the World Wide Web in 1996/97?

The further development and use of OPEN and machine independent standards for protocols, services and systems to support a growing WWW which is faster and more secure.

This should include an end to pages that only work on a specific version of one browser and don't work at all on text-only browsers. For example, of the 6640 .uk WWW site home pages I have collected, 1603 (24%) of them mention a specific browser. One of the main reasons Tim Berners-Lee invented HTML and the Web was to remove such system dependencies from information services and provide universal access - whatever way you are interacting with the web. It shouldn't matter if you are listening to the web page being read out on a speech synthesiser, reading it on a text browser or looking at the images, animation or virtual reality version - it should be available to all.

Ta for the interview, Dave