In 1997 UKOLN received funding for a project known as WebWatch. The aim of the WebWatch project was to develop and use automated robot software to analyse Web sites across a number of public sector communities. After the project funding finished, UKOLN continued to provide WebWatch surveys across communities such as UK Higher Education Web sites. However, once the initial WebWatch software developer left, it was decided to adopt a slightly different approach: rather than continuing to develop our own WebWatch robot software, we chose to make use of freely available Web-based services.
The feedback we have received on our WebWatch surveys has been positive. A number of people have expressed interest in running their own WebWatch surveys. This article describes how you can apply the same methodology yourself across your own community, such as UK HE academic libraries, a particular academic discipline, projects funded by a particular programme, etc.
Why would you wish to carry out your own WebWatch survey? There are several reasons: you may be a national centre and wish to observe approaches taken to the development of Web sites within your community; you may be a funding body and wish to ensure that Web sites you have funded comply with service level agreements; you may have an academic or research interest in developments; etc.
Whatever your reason, WebWatch surveys will help you to:
Current WebWatch surveys are mainly based on the use of Web-based services. Although in some cases it would be possible to make use of desktop applications, the Web-based solution has been adopted since the methodology is transparent and this approach allows end users to have live access to the survey services for themselves, enabling them to reproduce the findings. Wherever possible the surveys make use of freely available Web services which will allow others to repeat the surveys without having to incur any charges for using licensed services. The methodology also allows them to compare arbitrary Web sites with published findings.
There is a wide variety of Web-based services which can be used in the way described. Several of them are listed in the following table.
| Service | Function |
|---|---|
| NetMechanic | General testing, including page analysis of HTML, links and file size. |
| Bobby | Accessibility testing and page file size analysis. |
| LinkPopularity | Numbers of links to a Web site, based on AltaVista, Google and HotBot. |
To check whether a Web service which reports on and analyses a Web site can be used in a WebWatch survey, go to the service and use it. On the page containing the results, examine the URL in the browser's address bar: if it contains the URL of the Web site you have analysed, you should be able to use the service to survey your community.
Figure 1 illustrates this. The URL of the entry point of the UKOLN Web site is supplied. The Bobby accessibility checking service then analyses the UKOLN Web site. The URL of the page of results contains the URL of the UKOLN entry point:
(Note that in this case certain characters, / and :, are encoded.) Further information on this technique is given on the Bobby Web site.
We can then simply include the URL given above as a hypertext link in a Web page. Repeat this for all the entry points in our community and we can provide a WebWatch survey of the accessibility of the entry points within our community. If we so desire, we can use other Web services in a similar way to provide a more comprehensive survey.
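As a minimal sketch of how such a page of links might be generated, the Python below percent-encodes each entry point and substitutes it into a URL template for each service. The service names and templates in `SERVICES` are hypothetical placeholders; each real service defines its own URL format, which you would copy from its results page.

```python
from html import escape
from urllib.parse import quote

# Hypothetical URL templates: copy the real format from each service's results page.
SERVICES = {
    "Accessibility": "http://accessibility.example.org/check?URL={url}",
    "Link popularity": "http://links.example.org/count?site={url}",
}


def survey_links(entry_url: str) -> dict[str, str]:
    """Return one results URL per service for a single entry point."""
    # safe="" makes quote() encode ':' and '/' as well (%3A, %2F).
    encoded = quote(entry_url, safe="")
    return {name: tpl.format(url=encoded) for name, tpl in SERVICES.items()}


def survey_page(entry_points: dict[str, str]) -> str:
    """Return an HTML list with a line of survey links per institution."""
    items = []
    for name, url in entry_points.items():
        links = " | ".join(
            f'<a href="{escape(link)}">{escape(service)}</a>'
            for service, link in survey_links(url).items()
        )
        items.append(f"<li>{escape(name)}: {links}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(survey_page({"UKOLN": "http://www.ukoln.ac.uk/"}))
```

Adding a further service to the survey then only means adding one entry to the template dictionary.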
It should be noted that if the Web service does not include the URL of the Web site being analysed in the URL of its results page, the technique described in this article cannot be used.
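Checking a candidate service can be partly automated. The sketch below, assuming you have pasted in the URL of a results page, tests whether the analysed URL appears in it either literally or in percent-encoded form. The checker host and the `URL` parameter in the example are invented for illustration.

```python
from urllib.parse import unquote


def result_url_embeds_target(result_url: str, target_url: str) -> bool:
    """True if target_url appears in result_url, literally or percent-encoded."""
    if target_url in result_url:
        return True
    # Decode escapes such as %3A and %2F so an encoded copy also matches.
    return target_url in unquote(result_url)


# Invented results URL of the kind described above.
example = "http://checker.example.org/report?URL=http%3A%2F%2Fwww.ukoln.ac.uk%2F"
print(result_url_embeds_target(example, "http://www.ukoln.ac.uk/"))  # True
```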
We have described the technique used for carrying out WebWatch surveys. So how should we proceed with a survey?
The first thing to do is to find an authoritative source for the entry points of the Web sites in your community. An organisation such as HESA may provide a definitive list of entry points to UK Higher Education Institutions. Other useful starting points include NISS and the University of Wolverhampton's UK Sensitive Maps. Organisations outside the HE community that provide directories include iBerry's list of Higher Education Links and Tagish's directory of UK Universities. You will sometimes find lists provided by volunteers, such as Ian Tilsted's list of UK Higher Education & Research Libraries and Tom Wilson's World List of Departments and Schools of Information Studies, Information Management, Information Systems, etc. With all of these services you should bear in mind that the quality of the data may be variable: for example, Dr Dave's links to UK HEIs gives a link to the Computer Science Department at the University of Hull rather than to the University of Hull entry point.
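Because the quality of directory data varies, a quick automated pass can highlight entries that need a closer look. The sketch below uses a heuristic of our own (an assumption, not a rule): a site entry point is normally the root of its host, so any path or query string is flagged. It will not catch a department that has its own host name, so manual checking is still needed.

```python
from typing import Optional
from urllib.parse import urlsplit


def check_entry_point(url: str) -> Optional[str]:
    """Return a warning if url does not look like a site entry point, else None.

    Heuristic (an assumption): entry points are normally the root of a host.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return f"not an absolute http(s) URL: {url}"
    if parts.path not in ("", "/") or parts.query:
        return f"below the site root, check manually: {url}"
    return None


# Invented example data.
for candidate in ["http://www.example.ac.uk/",
                  "http://www.example.ac.uk/depts/cs/",
                  "www.example.ac.uk"]:
    print(candidate, "->", check_entry_point(candidate) or "ok")
```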
You should bear in mind that although some organisations and individuals may be willing for their data to be reused in this way, others may not. You should therefore seek permission before reusing data from other people's Web sites; you may find that the Web site owner is happy for the data to be reused.
Once you have obtained your list of entry point URLs, you will have to use them as input data for the Web services you will be using. Copying and pasting each URL into an HTML template would be simple but, on a large scale, time-consuming (and you would have to remember to encode any special characters, such as ://).
A better way is to store your data in a backend database and to wrap the appropriate HTML tags around the data when it is retrieved. Another alternative would be to make use of server-side scripting (e.g. ASP on a Windows NT platform or PHP on a Unix platform).
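The database approach might look like the following minimal sketch, which uses Python's built-in `sqlite3` module in place of whatever backend you actually run. The table layout, the function names and the `TEMPLATE` URL are all hypothetical.

```python
import sqlite3
from html import escape
from urllib.parse import quote

# Hypothetical results URL template; use the real service's format instead.
TEMPLATE = "http://accessibility.example.org/check?URL={url}"


def load_entry_points(conn: sqlite3.Connection, rows: list[tuple[str, str]]) -> None:
    """Create the (name, url) table and store the entry points."""
    conn.execute("CREATE TABLE IF NOT EXISTS entry_points (name TEXT, url TEXT)")
    conn.executemany("INSERT INTO entry_points VALUES (?, ?)", rows)
    conn.commit()


def rows_as_html(conn: sqlite3.Connection) -> str:
    """Wrap each stored entry point in an HTML table row linking to its report."""
    rows = []
    for name, url in conn.execute("SELECT name, url FROM entry_points ORDER BY name"):
        report = TEMPLATE.format(url=quote(url, safe=""))
        rows.append(f'<tr><td>{escape(name)}</td>'
                    f'<td><a href="{escape(report)}">report</a></td></tr>')
    return "<table>\n" + "\n".join(rows) + "\n</table>"


conn = sqlite3.connect(":memory:")  # pass a file name to keep the data
load_entry_points(conn, [("UKOLN", "http://www.ukoln.ac.uk/"),
                         ("Example University", "http://www.example.ac.uk/")])
print(rows_as_html(conn))
```

Keeping the URLs in one table means a corrected entry point only has to be changed once, and every generated survey page picks it up.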
Once you have generated links to the Web site services, you will need to manually follow the links to obtain the results and then store them.
It would be possible to write a script which initiates the requests and processes the results for you (this is sometimes known as "HTML-scraping"). However, there is a danger that automated submission of requests to a service which has been developed for use by humans could degrade the performance of the service and, in extreme circumstances, even amount to a 'denial of service' attack. You should not use this approach unless you have obtained permission from the service provider.
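If a service provider does give permission, requests should still be spaced out. The sketch below shows one way to do this: the caller supplies the `fetch` function (for example a wrapper around `urllib.request.urlopen`), and a fixed pause is inserted between requests. The function name and the 30-second default are assumptions; agree the actual rate with the provider.

```python
import time
from typing import Callable, Iterable


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Fetch each URL in turn, pausing between consecutive requests.

    `fetch` is supplied by the caller; `sleep` can be replaced for testing.
    """
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # never send requests back-to-back
        results[url] = fetch(url)
    return results
```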
Once you have carried out your survey you will need to analyse your findings. You will normally find that you have a set of numerical values for your community: for example, the file size of entry points, the number of links to a site, the number of broken links on a site, etc. These can be summarised in a graphical format.
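Before drawing graphs it can help to compute a few summary figures. The following sketch computes basic statistics with Python's `statistics` module and prints a crude text bar chart; the site names and file sizes are invented sample values, not survey results.

```python
import statistics


def summarise(values: dict[str, float]) -> dict[str, float]:
    """Basic summary statistics for one measure across a community."""
    data = list(values.values())
    return {
        "count": len(data),
        "min": min(data),
        "median": statistics.median(data),
        "mean": statistics.mean(data),
        "max": max(data),
    }


def text_bar_chart(values: dict[str, float], width: int = 40) -> str:
    """One line per site, largest first, with bars scaled to the maximum."""
    top = max(values.values())
    lines = []
    for name, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * value / top) if top else ""
        lines.append(f"{name:<10} {bar} {value:g}")
    return "\n".join(lines)


# Invented entry point sizes in KB, for illustration only.
sizes = {"Site A": 42.0, "Site B": 18.5, "Site C": 3.1, "Site D": 55.0}
print(summarise(sizes))
print(text_bar_chart(sizes))
```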
When we summarised the file sizes, we noticed that several appeared to be very small. Further examination revealed that this could be due to several factors: (a) analysis of a redirect message from the server; (b) analysis of a NO-FRAMES message (e.g. a message saying "Your browser does not support frames"); (c) analysis of other error messages.
In light of these findings, you should treat automatically gathered results with caution and, in particular, examine any outliers in your findings in more detail.
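One common way to pick out outliers for closer inspection, used here as an illustration rather than as the method used in the WebWatch surveys, is Tukey's fences: values more than 1.5 times the interquartile range below the first quartile or above the third quartile are flagged. The sample data is invented.

```python
import statistics


def flag_outliers(values: dict[str, float], k: float = 1.5) -> dict[str, float]:
    """Return the entries lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values.values(), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {name: v for name, v in values.items() if v < low or v > high}


# Invented sizes in KB; "Site F" mimics a tiny redirect page.
sizes = {"Site A": 40, "Site B": 42, "Site C": 45,
         "Site D": 38, "Site E": 41, "Site F": 0.4}
print(flag_outliers(sizes))
```

A flagged value is only a prompt to look at the page itself; it may turn out to be a genuinely small entry point rather than a redirect or error message.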
As well as Web-based services, there are other tools which can be used to support your survey.
University Web managers have found the "rolling demonstration" of University entry points, search engines and 404 error pages helpful when thinking about the redesign of local facilities. Information on how to do this for your own community is available.
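The mechanism behind the rolling demonstrations is not described here. One plausible sketch, assuming a chain of static pages that each show an entry point in an `<iframe>` and use a `meta` refresh to move on to the next, is shown below. The `demoN.html` file names are hypothetical, and some sites may refuse to be displayed inside a frame.

```python
from html import escape


def rolling_demo_pages(urls: list[str], seconds: int = 10) -> dict[str, str]:
    """Return {file name: HTML} for a looping slide show of entry points.

    Each page shows one URL in an iframe and, after `seconds`, refreshes to
    the next page; the last page refreshes back to the first.
    """
    pages = {}
    for i, url in enumerate(urls):
        next_page = f"demo{(i + 1) % len(urls)}.html"
        pages[f"demo{i}.html"] = (
            "<!DOCTYPE html>\n<html><head>\n"
            f'<meta http-equiv="refresh" content="{seconds};url={next_page}">\n'
            f"<title>{escape(url)}</title>\n</head><body>\n"
            f'<iframe src="{escape(url)}" width="100%" height="600"></iframe>\n'
            "</body></html>\n"
        )
    return pages


# Invented example URLs.
pages = rolling_demo_pages(["http://www.example.ac.uk/", "http://www.example.org/"])
print(sorted(pages))
```

Writing each value to a file of the same name in one directory, then opening `demo0.html`, starts the loop.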
You may wish to manage the URLs of your entry points in a bookmark manager. You could use the bookmark facility in your Web browser. However you may find that dedicated bookmark management tools will provide richer functionality. For example, you may wish to receive notification when an entry point no longer exists or the contents of a page changes. Many bookmark managers will provide such functionality (e.g. see  ).
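One simple way to detect such changes is to keep a fingerprint of each page and compare it on the next check. A minimal sketch of that idea, assuming you fetch the pages yourself and store a SHA-256 hash of each one, follows; the function names are hypothetical. Hashing raw bytes will also report trivial changes, such as an embedded date.

```python
import hashlib
from typing import Optional


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest of a page's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(previous: dict[str, str],
                   current: dict[str, Optional[bytes]]) -> dict[str, str]:
    """Compare stored fingerprints with freshly fetched pages.

    `current` maps each URL to its page bytes, or None if the fetch failed.
    Unchanged pages are omitted from the report.
    """
    report = {}
    for url, content in current.items():
        if content is None:
            report[url] = "unavailable"
        elif url not in previous:
            report[url] = "new"
        elif fingerprint(content) != previous[url]:
            report[url] = "changed"
    return report
```

After each run you would store `fingerprint(content)` for every page that was fetched, ready for the next comparison.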
In addition to desktop bookmark managers there are also a range of Web-based bookmark management tools which you may wish to use (e.g. see ). Since Web-based bookmark managers can be accessed by anyone, they can not only help you to manage your WebWatch survey but also act as part of the survey itself. For example, the WebWatch surveys make use of the LinkBank link management tool. As well as providing email notification when a resource is no longer available, it also provides an automated display of resources.
You may wish to consider using an offline browser in order to capture Web pages from your community and hold them locally (e.g. see ). This could provide a way of archiving pages in order to make comparisons at a later date (although there are copyright and legal issues which you will have to consider if you wish to do this).
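If you do keep local copies, a consistent naming scheme makes later comparisons easier. The sketch below, which assumes one directory per capture date and a file name derived from the percent-encoded URL, is one possible layout rather than the way any particular offline browser stores pages; the function names are hypothetical.

```python
from datetime import date
from pathlib import Path
from urllib.parse import quote


def snapshot_path(base: Path, url: str, day: date) -> Path:
    """base/YYYY-MM-DD/<percent-encoded URL>.html for one captured page."""
    # Encoding ':' and '/' keeps the whole URL in a single file name.
    return base / day.isoformat() / (quote(url, safe="") + ".html")


def save_snapshot(base: Path, url: str, content: bytes, day: date) -> Path:
    """Write the captured bytes to their snapshot path and return it."""
    path = snapshot_path(base, url, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return path


print(snapshot_path(Path("archive"), "http://www.ukoln.ac.uk/", date(2024, 1, 31)))
```

Very long URLs can exceed file system limits on file name length, so a real archive might need a shortened or hashed name instead.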
Your survey of the Web sites across your community should be of interest to members of the community. The findings should help them to gauge how they compare with their peers, encouraging those whose Web sites are reported favourably and acting as a spur for those whose Web sites did not appear to do so well.
You may wish to make your findings available on the Web. Other options include writing articles based on your findings or giving presentations at appropriate conferences.
You may also find it useful to repeat your survey periodically in order to monitor trends.
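Comparing two rounds of a survey is straightforward once the results are stored as site-to-value mappings. The sketch below (with invented values and a hypothetical function name) reports the change for sites present in both rounds and lists sites that appear in only one, for example because an entry point has moved.

```python
def compare_surveys(before: dict[str, float], after: dict[str, float]) -> dict:
    """Per-site change between two survey rounds, plus sites in only one round."""
    common = before.keys() & after.keys()
    return {
        "changes": {site: after[site] - before[site] for site in sorted(common)},
        "only_before": sorted(before.keys() - after.keys()),
        "only_after": sorted(after.keys() - before.keys()),
    }


# Invented entry point sizes (KB) from two rounds of a survey.
round1 = {"Site A": 40, "Site B": 20, "Site C": 10}
round2 = {"Site A": 35, "Site B": 25, "Site D": 5}
print(compare_surveys(round1, round2))
```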
You may also find it useful to compare your findings with other communities: for example, how do UK University entry points compare with those in the USA?
The approach described in this article is based on use of freely available Web sites. However, there are a number of limitations to this approach:
In the future we should see solutions which will address the limitations of the current approach. The term "Web Services" has been used to describe reusable software components which are designed for use by other applications and can be accessed using standard Web protocols. Use of "Web Services" to provide auditing and benchmarking services about Web sites would appear to address the concerns mentioned above.
UK Web Focus
University of Bath