University of Manchester
Web Cache: The National JANET Web Cache Progress Report
In May 1998 (the end of the last academic year) the National Caching Service was receiving over 27,000,000 requests and shipping around 250 GBytes of data on a busy day. In recent weeks we have exceeded 40,000,000 requests and shipped over 400 GBytes per day, and these figures are likely to increase in the coming months. Over 150 institutions currently use the service, and this number too is set to increase as Colleges of Further Education and other organisations begin to use it. We are expanding the service to meet this growth in demand. We are also undertaking some service enhancements to improve performance.
The current service is provided by over 30 Intel-based machines running the Linux and FreeBSD operating systems and the Squid caching software. There are three caching service nodes, located at Manchester, Loughborough and ULCC. Each node comprises several machines which cooperate with each other using the Cache Digest mechanism. Institutions are allocated machines at two of these nodes and thereby gain access to the aggregate filestore at each node, which comprises hundreds of GBytes of cached information.
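As a rough illustration of how a digest lets cooperating caches share knowledge of their contents, the sketch below builds a small Bloom-filter summary of the URLs a cache holds and uses it to pick a peer that probably has an object. It is a simplified model and not Squid's actual implementation: the class name, the digest size, the peer names and the choose_peer helper are all assumptions made for the example.

```python
import hashlib


class CacheDigest:
    """Toy Bloom-filter summary of the URLs a cache holds (illustrative only)."""

    def __init__(self, num_bits=1024):
        self.num_bits = num_bits
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, url):
        # Split one 128-bit MD5 digest into four 32-bit integers,
        # each of which selects one bit position in the digest.
        digest = hashlib.md5(url.encode("utf-8")).digest()
        return [int.from_bytes(digest[i:i + 4], "big") % self.num_bits
                for i in range(0, 16, 4)]

    def add(self, url):
        """Record that this cache holds the given URL."""
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url):
        """False means definitely absent; True means probably present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))


def choose_peer(url, peer_digests):
    """Return the first peer whose digest suggests it holds the URL, else None."""
    for peer_name, digest in peer_digests.items():
        if digest.might_contain(url):
            return peer_name
    return None


# Hypothetical peers within one node.
peers = {"cache-a": CacheDigest(), "cache-b": CacheDigest()}
peers["cache-b"].add("http://www.example.ac.uk/index.html")
print(choose_peer("http://www.example.ac.uk/index.html", peers))
```

Because a digest is a compact summary that peers can exchange periodically, a cache can consult it locally instead of querying every neighbour for each request. The price is an occasional false positive, where a peer is asked for an object it does not in fact hold.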
Many factors influence the effectiveness of the caching service. Two of the major ones are machine loading during periods of peak demand and the processing of requests for non-cacheable objects, such as CGI scripts, Web-based email and secure HTTP (HTTPS). The former results in users observing unacceptably high Web object retrieval response times, whilst the latter imposes unnecessary processing load on the caches, also resulting in increased response times.
One way of reducing response times in general, and during periods of peak demand in particular, is to balance the load being processed by a cluster of caches. We have been investigating various load-balancing mechanisms and have developed a very promising one based on the LVS (Linux Virtual Server) project. This system enables a "front-end" machine to balance load, in our case incoming requests, between a group of Linux servers. Tests with several volunteer sites have indicated that this mechanism is very effective in reducing object retrieval times during periods of peak demand. Accordingly, during the next few months we will move the national caching service to an LVS-based system, details of which are available at our web site at the URL listed below.
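The idea of a front-end machine spreading requests across a group of caches can be sketched in a few lines. The example below uses least-connection scheduling, which is one of several policies such a balancer might apply; the choice of policy, the class name and the server names are assumptions for illustration, not a description of the configuration used by the service.

```python
class LeastConnectionBalancer:
    """Toy front-end that sends each new request to the least-busy cache."""

    def __init__(self, servers):
        # Number of requests currently in progress on each back-end cache.
        self.active = {name: 0 for name in servers}

    def assign(self):
        """Pick the cache with the fewest active requests and count the new one."""
        # On a tie, min() returns the first such server in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Record that a request on the given cache has completed."""
        if self.active[server] > 0:
            self.active[server] -= 1


# Hypothetical cluster of three caches behind one front-end.
balancer = LeastConnectionBalancer(["cache1", "cache2", "cache3"])
first = [balancer.assign() for _ in range(3)]   # one request to each cache
balancer.release("cache2")                      # cache2 finishes its request
print(first, balancer.assign())                 # the next request goes to cache2
```

A policy of this kind keeps a slow or heavily loaded machine from accumulating a queue while its neighbours sit idle, which is the situation that drives up retrieval times at peak.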
Also in the coming months we plan to improve response times further still by addressing the problem of caches handling requests for non-cacheable objects. Ideally we would wish to route requests for such objects directly to the origin sites, bypassing the caches completely. One way of doing this would be to ask institutions not to send requests for non-cacheable objects to the national caching service. Another way would be to adopt a "Level 7" switch approach at the national caching service and redirect such requests to their origin sites. We are investigating these possibilities with some urgency and with a view to implementation as soon as possible.
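Either approach depends on recognising a probably non-cacheable request from the request itself. The sketch below shows one plausible set of heuristics: non-GET methods, HTTPS URLs, query strings and paths containing "cgi-bin". These rules, and the function names, are assumptions chosen for illustration rather than the rules the service will adopt, and they would not by themselves identify every Web-based email request.

```python
from urllib.parse import urlsplit


def is_probably_non_cacheable(method, url):
    """Guess from the request line alone whether a cache could usefully serve it."""
    # Methods other than GET and HEAD (for example POST or CONNECT)
    # are generally not served from a cache.
    if method.upper() not in ("GET", "HEAD"):
        return True
    parts = urlsplit(url)
    # Secure HTTP traffic is encrypted end to end and cannot be cached.
    if parts.scheme.lower() == "https":
        return True
    # Query strings and cgi-bin paths usually indicate dynamic content.
    if parts.query or "cgi-bin" in parts.path:
        return True
    return False


def route(method, url):
    """Return 'origin' to bypass the caches, or 'cache' to use them."""
    return "origin" if is_probably_non_cacheable(method, url) else "cache"


# Hypothetical requests.
print(route("GET", "http://www.example.ac.uk/index.html"))
print(route("GET", "http://www.example.ac.uk/cgi-bin/search"))
print(route("POST", "http://www.example.ac.uk/form"))
```

The same test could be applied at an institution's own proxy, before a request is forwarded, or at a switch in front of the national caches; only the place where the decision is made differs.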
The introduction of load-balancing and the redirection of non-cacheable requests should produce a tangible improvement in the effectiveness of the national caching service provided to the academic community. Future articles will elaborate on the mechanisms used. In the meantime you will find further information at http://wwwcache.ja.net.
If you have any comments or suggestions please email email@example.com.
JANET Web Caching Service Team