From the Trenches - HTML: Which Version?

From the Trenches, a regular column written by Jon Knight, delves into the more technical aspects of networking and the World Wide Web.

A look at the causes of good and bad HTML and what can be done about them. [Ed: this article has been validated as HTML 2.0, which is why its style looks marginally different from that of other articles in Ariadne.]

by Jon Knight (J.P.Knight@lut.ac.uk), Dept. of Computer Studies, Loughborough University of Technology.



Introduction

Most people concerned with Electronic Libraries have by now marked up a document in the HyperText Markup Language (HTML), even if it's only their home page. HTML provides an easy means of adding functionality such as distributed hyperlinking and the insertion of multimedia objects into documents. Done well, HTML provides access to information across a wide variety of platforms, using many different browsers that reach servers over all manner of network connections. However, it is also possible to do HTML badly. Badly done HTML may tie a document to a particular browser or hardware platform, or make it useless over slow network connections. As the Electronic Libraries programme is concerned with empowering people by giving them easy access to information via the Net, deciding what is and is not bad HTML, and then avoiding it, is obviously something many librarians and library systems staff will currently be grappling with. This article aims to provide an informal overview of some of the issues surrounding good HTML markup and to highlight some resources that may help improve the markup used in Electronic Library services.

The versions available

Before looking at what may constitute good and bad HTML markup, let us first review the wide variety of HTML versions available. There are currently only two versions of HTML on the Internet standards track: HTML 2.0 and HTML 3.0. All other versions are bastardised, vendor-specific extensions to one of these open, non-proprietary versions. There is also a version prior to HTML 2.0 known, unsurprisingly, as HTML 1.0. It provides the basic hyperlinks and anchors that make HTML a hypertext markup language, plus some elements for highlighting text in a variety of ways. HTML 1.0 therefore gives us a lowest common denominator across all the different versions: if you mark a document up to the HTML 1.0 specification, the chances are that more or less every browser will do something vaguely sensible with it, and so the information will reach the user intact. However, HTML 1.0 was an informal specification that never entered the Internet standards process, and its use is somewhat deprecated today.

One problem with HTML 1.0 is that it only offers a way to present basic textual information to a user; the means of getting feedback from the user are very limited. HTML 2.0 helps to overcome this by giving the document author the FORMs capability. Its markup tags let you embed forms containing text input boxes, check boxes, radio buttons and many of the other features common in user interfaces. These forms can be interspersed with HTML 1.0 tags, both to add functionality to a FORMs document and to give browsers that do not support HTML 2.0 some form of access to the available data. However, such browsers are few and far between these days. HTML 2.0 is thus regarded by many as the baseline to code to if you wish to reach the largest population of browsers and still have reasonable document presentation.
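To make the FORMs capability a little more concrete, here is a minimal sketch in Python (chosen purely for illustration). It assumes a hypothetical feedback form: the field names, the ACTION URL and the submitted query string are all made up. The sketch lists the controls the form declares, using only elements defined in HTML 2.0, and then decodes the kind of name=value string a browser sends back to the server when such a form is submitted with METHOD="GET".

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs

# A hypothetical HTML 2.0 feedback form: a text box, a check box and a pair
# of radio buttons. Field names and the ACTION URL are invented examples.
FORM = """
<FORM METHOD="GET" ACTION="/cgi-bin/feedback">
<P>Name: <INPUT TYPE="text" NAME="name" SIZE="30">
<P><INPUT TYPE="checkbox" NAME="subscribe" VALUE="yes"> Keep me informed
<P>Format:
<INPUT TYPE="radio" NAME="format" VALUE="html" CHECKED> HTML
<INPUT TYPE="radio" NAME="format" VALUE="text"> Plain text
<P><INPUT TYPE="submit" VALUE="Send">
</FORM>
"""

class FormControls(HTMLParser):
    """Collect (type, name) for every INPUT element in a document."""

    def __init__(self):
        super().__init__()
        self.controls = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names (not values).
        if tag == "input":
            a = dict(attrs)
            # An INPUT with no TYPE attribute defaults to a text box.
            self.controls.append((a.get("type", "text"), a.get("name")))

parser = FormControls()
parser.feed(FORM)
parser.close()
print(parser.controls)

# An illustrative query string such as a CGI script might receive after
# submission (invented, not captured from a real browser).
query = "name=A+Reader&subscribe=yes&format=html"
print(parse_qs(query))
```

Each control's NAME becomes a key in the decoded data, and parse_qs returns lists because a name may legitimately appear more than once in a submission.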

The latest version, HTML 3.0, is in reality still under development. It addresses the lack of detailed presentation control in the previous two versions by introducing style sheets and tables. The HTML 3.0 specification also includes mathematics markup that is very reminiscent of that provided by LaTeX. As HTML 3.0 is still under development, no browser can claim to be fully compliant with the standard, although many of the more recent browsers have added some of the core HTML 3.0 elements to their own HTML 2.0 base.

Vendors also add their own proprietary tags to the core, standard HTML specifications. These tags are often presentation-oriented or exploit some feature peculiar to that vendor's browser. The best known commercial browser is currently Netscape Navigator, whose share of the total browser population has been estimated at anywhere between 50% and 90%. It adds many presentational tags that are widely used in documents purporting to be HTML. Reading one of these "Netscaped" documents on another browser can result in anything from a slight loss of visual appeal to a completely unreadable (and therefore unusable) document. Some authors are so intent on using these Netscapisms that they even link to the Netscape distribution site so that readers not blessed with Netscape can download it to view the author's documents. Things are only set to get worse with the entry of Microsoft and IBM into the fray.
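One practical way to spot Netscapisms is to list every element a document uses that is not defined in HTML 2.0. The Python sketch below does this with the standard library's html.parser module. The element list is our own reading of the HTML 2.0 DTD in RFC 1866 and should be checked against the DTD itself before being relied on; the sample document is invented.

```python
from html.parser import HTMLParser

# Element names from the HTML 2.0 DTD (RFC 1866), as we read it.
# Assumption: verify against the DTD before relying on this list.
HTML2_ELEMENTS = {
    "a", "address", "b", "base", "blockquote", "body", "br", "cite", "code",
    "dd", "dir", "dl", "dt", "em", "form", "h1", "h2", "h3", "h4", "h5", "h6",
    "head", "hr", "html", "i", "img", "input", "isindex", "kbd", "li", "link",
    "listing", "menu", "meta", "nextid", "ol", "option", "p", "plaintext",
    "pre", "samp", "select", "strong", "textarea", "title", "tt", "ul", "var",
    "xmp",
}

class ExtensionFinder(HTMLParser):
    """Count every start tag whose element is not part of HTML 2.0."""

    def __init__(self):
        super().__init__()
        self.unknown = {}  # element name -> number of occurrences

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, matching the set above.
        if tag not in HTML2_ELEMENTS:
            self.unknown[tag] = self.unknown.get(tag, 0) + 1

def non_html2_elements(document):
    """Return a dict of non-HTML 2.0 element names and their counts."""
    finder = ExtensionFinder()
    finder.feed(document)
    finder.close()
    return finder.unknown

sample = '<P><CENTER><FONT SIZE="+2">Welcome!</FONT></CENTER> <BLINK>New</BLINK>'
print(non_html2_elements(sample))
```

Anything it reports, such as the Netscape additions CENTER, FONT and BLINK in the sample, is a candidate for degrading badly on other browsers.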

Some of the problems users experience arise partly because browser authors graft extra tags from one version of the HTML standard onto a core taken from an earlier version, and invent proprietary elements of their own. This is compounded by the fact that, as the markup gets more complicated, the opportunity for bugs to creep into different browsers increases. The result is that we have browsers and documents that all claim to be HTML when in fact many of them are not. To make matters worse, many people neither specify which version of HTML a document is marked up in nor validate their documents to check that they conform to one of the specifications (each of which is formally defined by a Document Type Definition, or DTD).
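Declaring the version is done with a DOCTYPE line at the top of the document, naming the public identifier of the DTD the author claims to follow. The following Python sketch reads that line and reports which of the two standards-track versions, if any, a document declares. It only covers what we take to be the common identifiers for HTML 2.0 and the HTML 3.0 draft (the specifications allow some alternative forms, which are ignored here), and it reports what a document claims, not whether it conforms: checking conformance still needs an SGML validator run against the DTD.

```python
import re
from html.parser import HTMLParser

# Assumed public identifiers for the two standards-track versions. HTML 1.0
# and the vendor dialects have no agreed identifier, so none are listed.
KNOWN_DTDS = {
    "-//IETF//DTD HTML 2.0//EN": "HTML 2.0",
    "-//IETF//DTD HTML 3.0//EN": "HTML 3.0",
}

DOCTYPE_RE = re.compile(r'DOCTYPE\s+HTML\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)

class DoctypeReader(HTMLParser):
    """Remember the public identifier from the first DOCTYPE declaration."""

    def __init__(self):
        super().__init__()
        self.public_id = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", for example
        # 'DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"'
        match = DOCTYPE_RE.match(decl)
        if match and self.public_id is None:
            self.public_id = match.group(1)

def declared_version(document):
    """Return the HTML version a document claims, or 'undeclared'."""
    reader = DoctypeReader()
    reader.feed(document)
    reader.close()
    if reader.public_id is None:
        return "undeclared"
    return KNOWN_DTDS.get(reader.public_id, "unrecognised: " + reader.public_id)

print(declared_version('<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<H1>T</H1>'))
print(declared_version("<H1>T</H1>"))
```

An "undeclared" result is exactly the situation described above: neither a reader nor a validator can tell which specification the author was aiming at.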

Many browsers are very tolerant of the markup they receive. In some ways this is a good thing, as it means the end user is likely to see something even if the document's author has made a complete mess of marking it up. It has probably contributed to the Web's rapid growth, as people perceive it to be relatively easy to add markup to documents and get working results. Unfortunately, the flip side is that we are left with a Web full of poorly marked-up documents that do not conform to any of the standards, not even the vendor-specific extensions.
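Tolerant browsers quietly repair mistakes such as mis-nested highlighting tags, so authors often never see them. A rough Python sketch of how such mistakes can be caught is shown here. It is not a validator: it simply tracks open elements on a stack and skips the elements that, in HTML 2.0, have no end tag or an optional one (the list used is our assumption), so it will miss many errors that a real DTD-based check would find.

```python
from html.parser import HTMLParser

# Elements with no end tag, or an omissible one, in HTML 2.0 (our assumed
# list). A tolerant browser copes with both; this sketch skips them.
EMPTY_OR_OPTIONAL = {
    "base", "br", "hr", "img", "input", "isindex", "link", "meta", "nextid",
    "p", "li", "dt", "dd", "option", "html", "head", "body",
}

class NestingChecker(HTMLParser):
    """Record end tags that do not match the most recently opened element."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open elements, innermost last
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in EMPTY_OR_OPTIONAL:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in EMPTY_OR_OPTIONAL:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        elif tag in self.stack:
            # Elements opened inside `tag` were never closed: mis-nesting.
            while self.stack[-1] != tag:
                inner = self.stack.pop()
                self.problems.append("unclosed <%s> before </%s>" % (inner, tag))
            self.stack.pop()
        else:
            self.problems.append("stray </%s>" % tag)

def check_nesting(document):
    """Return a list of nesting problems found in the document."""
    checker = NestingChecker()
    checker.feed(document)
    checker.close()
    leftovers = ["<%s> never closed" % t for t in reversed(checker.stack)]
    return checker.problems + leftovers

print(check_nesting("<B><I>bold italic</B></I>"))
```

Most browsers would render the sample as bold italic text without complaint, which is precisely why such errors survive in published documents.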







January 17th 1996 - Comments can be emailed to Ariadne