Web Magazine for Information Professionals

Web Focus: Let's Get Serious about HTML Standards

Brian Kelly encourages authors to treat compliance with HTML standards seriously.

If you talk to long-established Web authors, or to those responsible for managing large Web sites or developing Web applications intended for widespread use in a heterogeneous environment, you are likely to find that the need for compliance with Web standards is well understood. There will be an understanding of the need to avoid a recurrence of the "browser wars" and to minimise the development time for an environment in which, especially in the higher education community, end users are likely to use a wide range of platforms (MS Windows, Apple Macintosh, Linux, etc.) and browsers (Internet Explorer, Netscape, Mozilla, Galeon, Lynx, etc.).

However, although many experienced Web developers will state their commitment to Web standards, such aspirations are not always implemented in practice. This may be because the importance of HTML compliance is not communicated widely within an organisation (especially when there are many authors, as is likely to be the case within higher educational institutions); because HTML authoring tools fail to implement standards; or because authors do not accept the need for standards and either make use of non-standard features or fail to actively address non-compliance.

This article aims to persuade HTML authors of the importance of compliance with HTML standards. It also provides an update on Web standards and gives advice on techniques for ensuring that resources comply with standards and for checking compliance.

The Dangers Of Failures To Comply With Standards

Does compliance with HTML standards really matter? Surely if the page looks OK in Netscape and Internet Explorer Web browsers this will be sufficient?

Testing compliance with HTML standards by visual inspection is not satisfactory, for the simple reason that Web browsers are designed to process Web pages which do not comply with standards as best they can. However, this permissive approach by Web browsers should not be used as a justification for not bothering with compliance. Strict compliance with HTML standards is important for several reasons:

Avoiding Browser Lockin
Web pages which make use of proprietary browser features will not be accessible to other browsers. As we have seen with Netscape, even if a browser vendor has a significant market share there is no guarantee that this state of affairs will continue indefinitely.
Maximise Access To Browsers
Certain browsers may be more lenient with errors than others.
Maximise Accessibility
Web resources which comply with HTML standards will be more easily processed by screen readers and other accessibility devices.
Avoidance Of Court Cases
If, for example, Web-based teaching and learning resources are not accessible to students with disabilities, students may have a case, under the SENDA legislation which becomes law in September 2002, to sue the organisation.
Enhance Interoperability
Web resources which comply with HTML standards will be more easily processed by software tools, allowing for greater interoperability of the resource.
Enhance Performance
Web resources which comply with HTML standards, especially the XHTML standard, are likely to be processed and displayed more efficiently since the HTML parser will be able to process a valid resource and not check for errors as existing Web browsers are forced to do.
Facilitate Debugging
Web resources which comply with HTML standards should be easier to debug if the pages are not rendered correctly.
Facilitate Migration
Web resources which comply with HTML standards should be more easily ported to other environments.

It should be noted that when HTML resources need to be reused by other applications, there is an increasing requirement for the resources to comply rigorously with HTML standards. Arguing that a resource is almost compliant is like describing someone as almost a virgin!

HTML Standards

If HTML standards are important, which standards should be used? Many organisations are likely to have standardised on the HTML 4.01 specification [1]. Many widely-used HTML authoring tools can be used to create HTML pages which comply with this standard.

However, HTML 4 is no longer the latest version of HTML. The latest version is XHTML 1.0 [2], which became an official W3C (World Wide Web Consortium) Recommendation in January 2000. XHTML is a "reformulation of HTML 4 in XML 1.0", which means that it can be used in conjunction with XML tools and will benefit from developments of the XML language, as described in "The XHTML Interview" [3].

One benefit of XHTML which is worth noting is that it can be processed using the XSLT language, which converts an XML resource into another format. This could be another XML application or a non-HTML format, such as PDF. However, in order for XSLT to work the XML resource must be compliant; at a minimum it must be well-formed XML.
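As a rough illustration of why XML tools are less forgiving than browsers, the following Python sketch tests whether a piece of markup is well-formed XML using the standard library's xml.etree.ElementTree parser. The function name is_well_formed is invented for this example, and well-formedness is only the first hurdle: it does not show that a page is valid against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# HTML-style markup with an unclosed <br> is tolerated by browsers,
# but an XML parser rejects it.
print(is_well_formed("<p>First line<br>second line</p>"))   # False
print(is_well_formed("<p>First line<br/>second line</p>"))  # True
```

A resource which fails this test cannot be read by an XML tool at all, which is why "almost compliant" is of little use once a resource is to be transformed.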

Ideally organisations should standardise on XHTML 1.0, as this is the latest version. However, there may be obstacles to the use of XHTML 1.0 as an organisational standard, such as the need to upgrade authoring tools, provide training, etc. If this is the case then HTML 4.01 should be the organisational standard. Versions of HTML prior to this should not be used, as they do not provide an adequate level of support for accessibility.

Implementation Issues

Whether you are using XHTML or HTML 4.01 there are a small number of elements which you must use in order to ensure your resources are compliant.

Your Web page must begin with the document type declaration (DOCTYPE). For XHTML this is of the form:

 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

whereas for an HTML 4.01 document it could be:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

The document type declaration is used to specify which type of HTML is to be used. In the first example above the document is an XHTML 1.0 transitional document, whereas in the second example the document is an HTML 4.01 transitional document.
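To show how a tool might discover which type of HTML a page claims to be, here is a minimal Python sketch which extracts the public identifier from a DOCTYPE using a regular expression. The names DOCTYPE_RE and public_identifier are assumptions made for this example, and a real validator parses the declaration far more carefully than this.

```python
import re
from typing import Optional

# Matches a DOCTYPE with a PUBLIC identifier, for example:
# <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "...">
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)


def public_identifier(document: str) -> Optional[str]:
    """Return the public identifier from the DOCTYPE, or None if absent."""
    # The DOCTYPE comes at the start of the page, so only look there.
    match = DOCTYPE_RE.search(document[:1024])
    return match.group(1) if match else None
```

A page with no DOCTYPE gives None, which is a useful first warning that the page was not created from a standards-based template.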

Once the DOCTYPE has been defined you should give the <html> element. If you are using XHTML, you will have to specify the namespace:

<html xmlns="http://www.w3.org/1999/xhtml">

Why is this needed? XHTML is an XML application and XML can be regarded as a meta-language which can be used to create other languages - for example MathML - the Mathematical Markup Language [4]. Since it may be necessary to create resources which combine languages (for example an XHTML document which contains mathematical formulae) a namespace is needed to differentiate the XHTML element names from those belonging to MathML.
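The effect of a namespace can be seen with a small Python sketch. It parses a document which mixes XHTML and MathML and counts the elements belonging to each language. The sample document and the function name count_by_namespace are invented for illustration; the parser is the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# A tiny XHTML page containing one MathML expression.
document = f"""
<html xmlns="{XHTML_NS}">
  <body>
    <p>The radius is:</p>
    <math xmlns="{MATHML_NS}"><mi>r</mi></math>
  </body>
</html>
""".strip()


def count_by_namespace(markup: str) -> dict:
    """Count the elements in each namespace of a well-formed XML document."""
    counts = {}
    for element in ET.fromstring(markup).iter():
        # ElementTree expands names to the form "{namespace-uri}localname".
        tag = element.tag
        namespace = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
        counts[namespace] = counts.get(namespace, 0) + 1
    return counts


print(count_by_namespace(document))
```

Because every element carries its namespace, software processing the page can tell an XHTML element from a MathML one even if the two languages happened to use the same element name.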

In a <meta> element within the document's <head> you should specify the character set for the document:

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

Your XHTML will therefore have the following basic structure:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>XHTML Template</title>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>
</head>
<body>

</body>
</html>

Note that the elements shown above can be regarded as mandatory for most XHTML documents (the DOCTYPE could be replaced by a more rigorous definition, but the one given is suitable for most purposes).
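A template can be checked automatically for these elements. The following Python sketch looks for the DOCTYPE, the XHTML namespace on the root element, a title and a character set declaration. The function name template_problems and the wording of its messages are assumptions for this example, and the checks are much weaker than those of a full validator.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def template_problems(markup: str) -> list:
    """Return a list of problems found in an XHTML page skeleton."""
    problems = []
    if not markup.lstrip().upper().startswith("<!DOCTYPE"):
        problems.append("missing DOCTYPE")
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as error:
        return problems + [f"not well-formed: {error}"]
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append("root element is not <html> in the XHTML namespace")
    if root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title") is None:
        problems.append("missing <title>")
    metas = root.findall(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}meta")
    if not any("charset=" in meta.get("content", "").lower() for meta in metas):
        problems.append("no character set declared in a <meta> element")
    return problems
```

An empty list means the skeleton contains the elements described above; it does not mean that the page as a whole is valid XHTML.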

The format of HTML 4.01 documents will be:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>HTML 4.01 Template</title>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
</head>
<body>

</body>
</html>

Note that the DOCTYPE shown above is mandatory for most HTML 4.01 documents (it could be replaced by a more rigorous definition, but the one given is suitable for most purposes).

If you are updating the template for resources on your Web site it would be useful to include a declaration of the language of the document:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en-gb">

or

<html lang="en-gb">

Although not mandatory, the language definition is needed if you wish to seek compliance with the W3C WAI AAA guidelines [5].
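Whether a page declares its language is easy to test. The following Python sketch uses the standard library's html.parser module, which copes with both HTML 4.01 and XHTML pages, to report the lang attribute on the <html> element. The class name LangFinder and the function name declared_language are invented for this example.

```python
from html.parser import HTMLParser
from typing import Optional


class LangFinder(HTMLParser):
    """Record the lang attribute of the first <html> start tag seen."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser supplies tag and attribute names in lower case.
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def declared_language(markup: str) -> Optional[str]:
    """Return the language declared on <html>, or None if there is none."""
    finder = LangFinder()
    finder.feed(markup)
    finder.close()
    return finder.lang
```

A result of None indicates a page which would need updating before it could meet the guideline.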

Ensuring Compliance With HTML Standards

We have seen some of the mandatory elements of XHTML and HTML 4. These must appear in compliant documents. Ideally these will be included in templates provided to Web page authors or generated by a Content Management System, by use of XSLT, backend scripts, SSIs (server-side includes), etc.

However, as we know, agreeing on a standard and providing templates do not necessarily mean that compliant documents will be produced: authoring tools may still fail to produce compliant resources, templates may be altered, etc. There will still be a need to test resources for compliance with standards.

There are several approaches for the checking of compliance with HTML standards:

Checking Within The Authoring Tool
Many HTML authoring tools provide HTML compliance checking facilities. However, it should be noted that (a) checking for compliance with the XHTML standard may not be possible and (b) authoring tools which work with HTML fragments may not provide correct results.
External HTML Validation Tools
A number of HTML validation tools are available. These include desktop tools such as CSE HTML Validator [6] and Doctor HTML [7] and Web-based tools such as the W3C HTML Validator [8] and the WDG HTML Validator [9].

Although many HTML validation tools are available, using them to check individual pages is difficult if you have an existing large Web site to maintain. In addition, if the validation process is separate from the page creation or maintenance process, it is likely that validation will be forgotten.

There are ways of addressing these problems, such as use of tools which can validate entire Web sites and integrating the validation with the page maintenance process.

A number of tools can validate entire Web sites, such as CSE HTML Validator Professional 6.0 [6] and the WDG HTML Validator mentioned previously [10].
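The idea of checking a whole site rather than a single page can be sketched in a few lines of Python. The example below walks a directory of files and lists the pages which do not begin with a DOCTYPE. It is not a substitute for the tools mentioned above, since it carries out only this one crude test, and the function name pages_without_doctype is an assumption made for the example.

```python
from pathlib import Path


def pages_without_doctype(site_root: str) -> list:
    """List the .html and .htm files under site_root that lack a DOCTYPE."""
    missing = []
    for path in sorted(Path(site_root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in {".html", ".htm"}:
            continue
        # Only the start of the file matters; utf-8-sig skips a byte order mark.
        start = path.read_text(encoding="utf-8-sig", errors="replace")[:512]
        if not start.lstrip().upper().startswith("<!DOCTYPE"):
            missing.append(path.relative_to(site_root).as_posix())
    return missing
```

Run regularly, for example as part of the process which publishes pages, a report of this kind helps to ensure that checking is not forgotten.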

Another approach is to embed a live link to an online validation service, allowing the page to be validated by clicking on the link. This approach was used on the Institutional Web Management Workshop 2002 Web site [11] as illustrated below.

Figure 1: Embedded Links to Validation Services
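A link of this kind could be generated by the script or template which builds the page. The Python sketch below is one possibility. It assumes that the validation service accepts the address of the page in a "uri" query parameter at the address shown; this should be confirmed against the documentation of the service you use. The function name validation_link is invented for the example.

```python
from html import escape
from urllib.parse import urlencode

# Assumption: the service reads the page address from a "uri" parameter.
VALIDATOR_ENDPOINT = "http://validator.w3.org/check"


def validation_link(page_url: str, label: str = "Validate this page") -> str:
    """Return an HTML link which asks the validation service to check page_url."""
    query = urlencode({"uri": page_url})  # percent-encodes the page address
    return f'<a href="{VALIDATOR_ENDPOINT}?{query}">{escape(label)}</a>'


print(validation_link("http://www.ukoln.ac.uk/web-focus/"))
```

Since the link is built from the address of the page itself, the same code can be used in a template shared by every page on the site.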

A refinement to this approach could be to provide a personalised interface to such validation links, so that the icons are seen only by the page maintainer. This could be implemented through, for example, use of cookies.

Another approach, which allows validation services to be integrated with the Web browser, is to make use of a technique sometimes referred to as "bookmarklets". With this approach a bookmark to, for example, a validation service is added to your Web browser. The bookmarklet can be configured so that it will analyse the page which is currently being viewed, thus avoiding the need to copy and paste URLs. Use of this type of service is illustrated below.

Figure 2: Use Of Bookmarklets

A number of bookmarklets, together with further information on how they work, are available from the Bookmarklets Web site [12].
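A bookmarklet is simply a bookmark whose address is a short piece of JavaScript. The Python sketch below builds such an address, which could then be offered on a page of tools for your authors to add to their browsers. As before, the "uri" parameter and the address of the service are assumptions to be checked against the service you use, and the function name validation_bookmarklet is invented for the example.

```python
# Assumption: the service reads the page address from a "uri" parameter.
VALIDATOR_ENDPOINT = "http://validator.w3.org/check"


def validation_bookmarklet(endpoint: str = VALIDATOR_ENDPOINT) -> str:
    """Build a javascript: address which sends the current page to a validator."""
    # When the bookmark is selected, the browser runs this script in the page
    # being viewed: it reads the page address and moves to the validator.
    script = f"location.href='{endpoint}?uri='+encodeURIComponent(location.href);"
    return "javascript:" + script


print(validation_bookmarklet())
# javascript:location.href='http://validator.w3.org/check?uri='+encodeURIComponent(location.href);
```

The address of the page being viewed is supplied by the browser when the bookmark is selected, which is why there is no need to copy and paste URLs.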

In addition to these approaches it is likely that we will see a growth in commercial Web site auditing and testing tools such as LinkScan Server and Workstation software [13] and services such as that provided by Business2WWW [14].

Challenges In Ensuring Compliance

This article has described the importance of compliance with HTML standards, some of the key elements of XHTML and HTML 4.01 documents, and a number of tools and approaches for ensuring that documents comply with standards. However, even if Web managers provide tools to create XHTML-compliant resources, it is still likely that on large Web sites non-compliant resources will be created. This is especially likely when Web resources are created using third-party software which gives little control over the output format. This is true of, for example, Microsoft Office files, although, to be fair to Microsoft, the open source Open Office software [15] also does not support XHTML output.

What can be done in such cases? The best advice is to ensure that the resource is available in HTML, even if the HTML fails to comply with standards. This will ensure the resource is available to standard Web browsers, even if the resource cannot easily be repurposed. In the case of software such as Microsoft Office, which provides an option for the type of HTML to be generated, you should ensure that the HTML output can be viewed by a wide range of browsers and is not optimised for particular browsers. In the case of widely used proprietary formats for which viewers are freely available, you should probably provide a link to both the HTML and the proprietary version.

Another option, in cases where conversion to HTML may be time-consuming, would be to provide a link to an online conversion service, such as Adobe's online conversion tool which can convert PDF to HTML [16].

Further Information

A good starting point for further information on Web and HTML standards is The Web Standards Project - a group which "fights for standards that reduce the cost and complexity of development while increasing the accessibility and long-term viability of any site published on the Web" [17]. The Web Standards Project provides a valuable FAQ on "What are web standards and why should I use them?" [18].

IBM provide a useful introduction to XHTML, which provides a more complete description of the mandatory features of XHTML [19].

HotWired provide a useful summary of work of the W3C and The Web Standards Project in an article on "Web Standards For Hard Times" [20].

Finally the W3C are in the process of developing guidelines on "Buying Standards Compliant Web Sites" [21]. They have also recently set up the public-evangelist mailing list which provides a forum for discussion of Web standards [22].

References

  1. HTML 4.01 Specification, W3C
    http://www.w3.org/TR/html4/
  2. XHTML 1.0, W3C
  3. The XHTML Interview, Exploit Interactive, issue 6, 26th June 2000
    http://www.exploit-lib.org/issue6/xhtml/
  4. W3C Math Home, W3C
    http://www.w3.org/Math/
  5. Web Content Accessibility Guidelines 1.0, W3C
    http://www.w3.org/TR/1999/WAI-WEBCONTENT-19990505/#tech-identify-lang
  6. CSE HTML Validator
    http://www.htmlvalidator.com/
  7. Doctor HTML
    http://www2.imagiware.com/RxHTML/
  8. W3C Validation Service, W3C
    http://validator.w3.org/
  9. WDG HTML Validator, WDG
    http://www.htmlhelp.com/tools/validator/
  10. WDG HTML Validator - Batch Mode, WDG
    http://www.htmlhelp.com/tools/validator/batch.html
  11. Institutional Web Management Workshop 2002, UKOLN
    http://www.ukoln.ac.uk/web-focus/events/workshops/webmaster-2002/
  12. Bookmarklets Home Page, Bookmarklets
    http://www.bookmarklets.com/
  13. Linkscan, Elsop
    http://www.elsop.com/
  14. Business2WWW - SiteMorse Automated Web Testing, Business2WWW
    http://www.business2www.com/
  15. OpenOffice, OpenOffice.org
    http://www.openoffice.org/
  16. PDF Conversion, Adobe
    http://access.adobe.com/simple_form.html
  17. The Web Standards Project, The Web Standards Project
    http://www.webstandards.org/
  18. What are web standards and why should I use them?, The Web Standards Project
    http://www.webstandards.org/learn/faq/
  19. XHTML 1.0: Marking up a new dawn, IBM
    http://www-106.ibm.com/developerworks/library/w-xhtml.html
  20. Web Standards for Hard Times, HotWired
    http://hotwired.lycos.com/webmonkey/02/33/index1a.html
  21. Buy Standards Compliant Web Sites, W3C
    http://www.w3c.org/QA/2002/07/WebAgency-Requirements
  22. public-evangelist@w3.org Mail Archive, W3C
    http://lists.w3.org/Archives/Public/public-evangelist/

Author Details

Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath
BA2 7AY

Email: b.kelly@ukoln.ac.uk

Brian Kelly is UK Web Focus. He works for UKOLN, which is based at the University of Bath.