A recent Web Focus article argued that HTML resources need to comply strictly with HTML standards if they are to be functional, widely accessible and interoperable. The importance of HTML compliance is growing as HTML develops from being primarily an output format for display by Web browsers into XHTML, in which a resource can be transformed for a variety of purposes. This will make XHTML a much richer and more widely accessible format; the cost, however, is that XHTML resources must comply with the XHTML standard.
Insisting that HTML resources must be compliant is, of course, insufficient in itself. There is a need to ensure that the authoring environment is capable of creating compliant resources and, ideally, enforcing compliance. In addition, if the publishing system is not capable of enforcing compliance, there is a need to make use of tools which can check for compliance failures.
An ideal solution would be to make use of a Content Management System (CMS) which provided, as part of the workflow process, a mechanism for validating the resource prior to publication. In many cases, however, such systems are not available. It is probably the case that conventional HTML authoring tools such as DreamWeaver and FrontPage or basic text editors are still the most common way of creating HTML pages. Although such tools may provide validation mechanisms, there can be no guarantee that authors will make use of them, or even fix any errors which are reported.
In such circumstances there is a need to validate resources after they have been published. Many HTML validation tools are available, including online validation services such as the W3C MarkUp Validation Service and the WDG HTML Validator, and desktop tools such as CSE HTML Validator.
However, if you have ever used a tool such as CSE HTML Validator to validate an entire Web site, you may well have been inundated with error messages. In such a situation you may feel that the problem is insurmountable and simply abandon any attempt to ensure that your Web site complies with HTML standards. Your Web site is then likely to degrade even further as new, non-compliant resources are published.
Rather than abandoning any attempt to ensure that a Web site complies with HTML standards, you can take approaches which help to identify the key resources which should be fixed.
W3C have developed a Log Validator, which is a useful tool for Web managers. The Log Validator is a Web server log analysis tool: it takes a Web server's recent logs and processes them through validation modules in order to find the most popular documents which are non-compliant. The Web manager can then use the results to prioritise the resources to fix.
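The idea behind this kind of log analysis can be illustrated with a short sketch. The code below is not the W3C implementation; it is a minimal illustration, assuming that the server writes Common Log Format entries and that some validator is available as a function. The names `popular_documents`, `most_popular_invalid` and `is_valid` are hypothetical and are introduced here only for illustration.

```python
import re
from collections import Counter
from typing import Callable, Iterable

# Matches the request and status fields of a Common Log Format line, e.g.
# 1.2.3.4 - - [10/Oct/2003:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})')


def popular_documents(lines: Iterable[str]) -> list[tuple[str, int]]:
    """Count successful GET requests per path, most requested first."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match["method"] == "GET" and match["status"] == "200":
            hits[match["path"]] += 1
    return hits.most_common()


def most_popular_invalid(
    lines: Iterable[str],
    is_valid: Callable[[str], bool],  # hypothetical stand-in for a real validator
    limit: int = 10,
) -> tuple[list[tuple[str, int]], int]:
    """Walk documents in popularity order until `limit` invalid ones are found.

    Returns the invalid documents (with hit counts) and the number of
    documents that had to be checked to find them.
    """
    invalid: list[tuple[str, int]] = []
    checked = 0
    for path, count in popular_documents(lines):
        if len(invalid) >= limit:
            break
        checked += 1
        if not is_valid(path):
            invalid.append((path, count))
    return invalid, checked
```

Walking the documents in popularity order means that the first invalid pages found are the ones most visitors actually see, which is what makes the resulting list useful for prioritising fixes.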
The QA Focus Web site has an HTML policy which states that:
"QA Focus Web pages will be based primarily on XHTML 1.0 Transitional and Cascading Stylesheets (CSS). Resources should comply fully with the appropriate standards."
Since resources are maintained using an HTML editor and there is no automated publishing process which can guarantee compliance with standards, we have adopted manual procedures for validating resources:
"When new resources are added to the Web site or existing resources updated, QA Focus will check XHTML validation using ',validate' after each page is created. QA Focus will run a monthly batch validation check on the whole site using ',rvalidate'. All manually created pages will be checked."
Documenting technical policies and ensuring that procedures are in place is the approach to quality assurance which QA Focus is recommending for projects funded under JISC's digital library programmes. However, since the QA Focus procedures require active use by authors, there will inevitably be occasions when the procedures are not implemented. There is therefore a need to provide additional validation procedures.
The approach taken has been to deploy the Log Validator tool. The tool has been configured to run automatically once a month. It creates a report which lists the ten most popular pages which are non-compliant. The monthly reports are published on the QA Focus Web site. An example of a report is illustrated below.
|Log Validator results|
|Results for module HTMLValidator|
|Here are the 10 most popular invalid document(s) that I could find in the logs for www.ukoln.ac.uk.|
|Conclusion: I had to check 291 document(s) in order to find 10 invalid HTML documents. This means that about 3.43% of your most popular documents was invalid.|
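The percentage in the report's conclusion follows directly from the two counts it gives. As a quick check (the function name `invalid_share` is introduced here only for illustration):

```python
import math


def invalid_share(invalid_found: int, documents_checked: int) -> float:
    """Percentage of the checked documents that turned out to be invalid."""
    return 100.0 * invalid_found / documents_checked


share = invalid_share(10, 291)             # roughly 3.4364
truncated = math.floor(share * 100) / 100  # drop digits beyond two decimal places
```

Ten invalid documents out of 291 checked is roughly 3.436%. The 3.43% shown in the report matches this value truncated, rather than rounded, to two decimal places.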
This report helps to prioritise the resources which need to be fixed. As well as identifying pages which contain errors which need to be fixed, the report also helps to spot systematic errors.
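One way of spotting systematic errors is to look for the same error message recurring across several of the reported pages. The sketch below assumes that the error messages for each page have already been collected into a dictionary; the function name `systematic_errors`, the `min_pages` threshold and the data layout are all hypothetical and are not part of the Log Validator.

```python
from collections import defaultdict


def systematic_errors(
    errors_by_page: dict[str, list[str]],
    min_pages: int = 2,
) -> list[tuple[str, list[str]]]:
    """Return error messages that occur on at least `min_pages` different pages.

    The result is ordered with the most widespread error first; each entry
    pairs the message with the sorted list of pages on which it appears.
    """
    pages_per_error: dict[str, set[str]] = defaultdict(set)
    for page, messages in errors_by_page.items():
        for message in messages:
            pages_per_error[message].add(page)
    repeated = {
        message: sorted(pages)
        for message, pages in pages_per_error.items()
        if len(pages) >= min_pages
    }
    return sorted(repeated.items(), key=lambda item: (-len(item[1]), item[0]))
```

An error which appears on many pages often points to a shared cause, such as a page template or an authoring tool setting, so fixing it at source can be more effective than correcting each page individually.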
This tool can help to improve the quality of HTML resources by providing a summary of the most popular pages which are non-compliant. It can also be used as part of a formal Web publishing policy. For example, an organisation could implement a policy which states that non-compliant resources reported by the tool will be fixed within a specified period. Even better would be a policy which stated that the causes of the problems would be identified and the workflow procedures updated to ensure that such errors do not recur. There may, however, be occasions when it is not possible to fix errors (for example, your recommended HTML authoring tool may fail to create compliant HTML, and replacing it with one which can would be expensive). In such circumstances it is desirable to keep a record of the decision and its justification, as this should inform future planning.
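A policy of this kind could be supported by a simple record of each reported page. The following is a minimal sketch, assuming a fix-by period of 30 days; the class `ReportedPage`, its fields and the `overdue` function are hypothetical and do not describe any existing QA Focus system.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class ReportedPage:
    url: str
    reported_on: date                      # date of the report listing the page
    fixed_on: Optional[date] = None        # set once the page validates again
    wont_fix_reason: Optional[str] = None  # justification if a fix is not feasible


def overdue(
    pages: list[ReportedPage],
    today: date,
    fix_within_days: int = 30,  # assumed policy period, not taken from the article
) -> list[ReportedPage]:
    """Pages that are neither fixed nor excused and have passed the policy period."""
    period = timedelta(days=fix_within_days)
    return [
        page
        for page in pages
        if page.fixed_on is None
        and page.wont_fix_reason is None
        and today - page.reported_on > period
    ]
```

Recording a reason when a page will not be fixed keeps the decision and its justification alongside the report, so that it is available when the publishing approach is next reviewed.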
A record of Web Log Validator reports and notes of actions taken is kept on the QA Focus Web site, as illustrated below.
W3C's Web Log Validator is a simple tool which should be of interest to Web authors and developers for whom compliance with HTML standards is important. However, it is only a reporting tool, so it should be used in conjunction not only with a Web publishing approach which aims to ensure that HTML resources comply with HTML standards, but also with quality assurance procedures which ensure that the publishing process works correctly and that problem areas are addressed.