
HTML Markup In Electronic Libraries

Electronic Libraries are in the business of providing information to their patrons via the network. The version of HTML markup they use will therefore depend on what they are trying to achieve and who their patrons are. If a service is only to be made available on a single site, and that site uses only a single vendor's browser, then the library is free to use whatever vendor-specific HTML extensions it chooses. For example, if a service is only to be used within a site where all the users have Netscape Navigator v2.0, the Library can make use of blinking text, multiple fonts and frames, knowing that all its users will see much the same thing that the author did.

One point to note, however, is that if the documents are intended to be very long lived, the use of proprietary markup might make the upgrade to the "next great browser" much harder than it would be if the documents were encoded using HTML 1.0 or 2.0. For documents with long life cycles (in computing terms, long is more than five years!) the library should really investigate the use of a more content-oriented SGML markup such as TEI and then generate documents conforming to a specific HTML version from that.

However, if the service being provided will be used by patrons with many different browsers, it may be worthwhile sacrificing browser-specific bells and whistles in favour of a more generic markup using the standard HTML DTDs. Although the result may not look as "pretty" as one using a vendor's proprietary tags, the chances are that it will not look a complete mess when viewed on another browser either. HTML 2.0 contains enough functionality that it can be used in most information provision situations. A library, after all, should be providing useful, high quality information resources to all comers and not trying to compete with ad agencies for "cool site of the week" awards.
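
As a rough illustration of this trade-off, the fragment below contrasts a Netscape-specific way of emphasising a notice with an equivalent that stays within HTML 2.0. The notice text is invented for the example. BLINK and FONT are Netscape extensions that are not part of the HTML 2.0 DTD, whereas STRONG is.

```html
<!-- Vendor-specific: relies on Netscape's BLINK and FONT extensions -->
<P><BLINK><FONT SIZE="+2">The reading room closes at 5pm today</FONT></BLINK></P>

<!-- Generic HTML 2.0: the browser decides how to render strong emphasis -->
<P><STRONG>The reading room closes at 5pm today</STRONG></P>
```

A browser that does not know the extensions will typically ignore the unknown tags and show the notice as ordinary text, so the emphasis is lost. The second form should carry its meaning across to any HTML 2.0 browser.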

There are some things that all sites should do, however. The first of these is to include a line at the top of every document they serve that specifies the DTD in use. This is rarely done, and even this author admits to having written a large number of documents with no indication as to which version of HTML they conform to. To make this information easy for browsers to process there is a standard markup for it, which is actually part of the SGML mechanism upon which all the standard HTML versions are based. An example of this "DOCTYPE" line for an HTML 2.0 document is:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
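
To show where that line sits, here is a minimal sketch of a complete HTML 2.0 document. The title and body text are placeholders invented for the example. Only the DOCTYPE line is taken from above.

```html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML>
<HEAD>
<!-- TITLE is the one element HTML 2.0 requires inside HEAD -->
<TITLE>Example Library Guide (placeholder title)</TITLE>
</HEAD>
<BODY>
<H1>Example Library Guide</H1>
<P>Placeholder text for the body of the document.</P>
</BODY>
</HTML>
```

The declaration must come before any other markup so that a validator or browser knows which DTD to apply to the rest of the document.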

The next thing that Electronic Library document authors can do to help raise the quality of HTML markup in use on the Web is to validate their documents against the appropriate DTDs. Originally this was a tedious and difficult thing to do, which might explain why it is rarely done. Today, however, there are a number of HTML editors available that will prevent the generation of invalid HTML, some browsers (such as Arena) indicate when they receive invalid markup, and there are also a number of online validation sites, such as Halsoft's HTML Validation Service, with which this article has been checked.

The latter are particularly useful as they usually have a range of up to date DTDs for a variety of HTML versions and can be used without the need to buy or install any new software on your machines. The validation is done using HTML 2.0 FORMs into which either fragments of HTML can be entered to be checked or URLs for entire documents can be specified. When you give such a service a URL for one of your documents, the program that processes the FORM will retrieve the document from your server, validate it against the requested DTD and then return a list of any errors to you. One neat trick with the online validation services is that you can often insert a small piece of HTML markup at the end of all of your documents that mimics the action of the service's form, allowing you to quickly validate a document by just clicking on a "validate me" button at the end of the document. Having such a button present may also encourage your users to try validating your documents. This will help you spot accidental errors if you make a change that invalidates the HTML but forget to revalidate it, and it will also "spread the word" about the practice of validating HTML.
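
A "validate me" button of this kind might look something like the sketch below. The service address (validator.example.org), the field name (url) and the document address are all hypothetical. A real button would need the ACTION and field names copied from the form of whichever validation service you use.

```html
<!-- Hypothetical validation service: replace ACTION and NAME with the
     values used by the real service's own form -->
<FORM METHOD="GET" ACTION="http://validator.example.org/check">
<P>
<!-- Hidden field carrying the address of this document (placeholder URL) -->
<INPUT TYPE="hidden" NAME="url" VALUE="http://library.example.ac.uk/guide.html">
<INPUT TYPE="submit" VALUE="Validate me">
</P>
</FORM>
```

Because the document's own URL is carried in a hidden field, each page needs its own copy of the fragment with the VALUE changed to match.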

As well as generating valid HTML with an appropriate DTD, an Electronic Library service must also consider how its patrons will be accessing its documents. If they are all on a campus sitting at workstations and high end PCs with graphical browsers and high speed network links, then the inclusion of inlined images in documents will present little problem. However, if they are accessing your service over slow international or dial-up links, inlined images can be a pain. Nothing is more annoying to a network user than finding that a potentially useful page is full of inlined images and little else. If a document is to be widely available on the Web, the number of inlined images should be kept to a minimum, and they should only be used for decoration or have their content replicated in textual links. This is because most graphical browsers provide the option for the user to turn inlined images off, which many dial-up users take advantage of. It must also be remembered that there are still a large number of people using text based browsers such as Lynx. If the majority of a document's information content is contained only in the inlined images, it will be lost to these two classes of user.
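
One way of following this advice is sketched below: a decorative image carries ALT text, and a row of graphical buttons is backed up by the same links in plain text. The file names and link targets are invented for the example.

```html
<!-- Decorative image: ALT text gives text-only users something sensible -->
<P><IMG SRC="logo.gif" ALT="[Library logo]"></P>

<!-- Graphical buttons, each with ALT text describing the link -->
<P>
<A HREF="catalogue.html"><IMG SRC="catalogue.gif" ALT="Catalogue"></A>
<A HREF="opening.html"><IMG SRC="opening.gif" ALT="Opening hours"></A>
</P>

<!-- The same destinations repeated as ordinary text links -->
<P>[ <A HREF="catalogue.html">Catalogue</A> | <A HREF="opening.html">Opening hours</A> ]</P>
```

With images turned off, or in a text based browser, the reader should still be able to reach both pages.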

Conclusions

HTML is a great way of providing useful functionality to end users and has helped push the lowest common denominator up a little from pure plain ASCII text in many situations. However, Electronic Library service providers must be aware that the way they mark up their documents will affect the documents' usability, and thus their usefulness to the end user. Proprietary vendor extensions are best avoided for widely used services; documents should include an indication of which HTML DTD they conform to; and some form of validation should be performed. Public services should also avoid heavy use of inlined images to carry information content, as it alienates users on slow links and non-graphical browsers.

If services take some of these simple approaches to marking up documents in HTML for delivery via the Web, we will have fewer users complaining about unreadable documents or slow links. Electronic Libraries have the opportunity to become showcases of good HTML markup and high quality information provision. Let's not miss that chance.

HTML 2.0 Checked!




January 17th 1996 - Comments can be emailed to Ariadne