URL Monitoring Software and Services
Paul Hollands: One of the problems that the academic community faces with respect to the Internet is that certain types of resources are, by nature, subject to rapid change (eJournals and eZines, for example). How do you remember when to look for the latest edition of your favourite Web publications? Once you have found that ideal specialist list of sources, how do you know when new items are added? From the web author's point of view an even greater difficulty is keeping links within your own documents up to date.
There are three basic approaches to managing information about site changes at your desktop:
- If you use the latest versions of the Netscape Navigator or NCSA Mosaic browsers you have limited monitoring functions built in.
- The second approach relies on the use of email. Whenever a site changes, you are sent an email message, prompting you to check that URL.
- In the third instance you have a piece of software sitting on your machine which periodically polls the sites you have specified and informs you of any changes.
- Netscape - What's New? Function
Netscape v 2.0x has a built-in URL monitoring function tied to its Bookmark facilities. If you choose Go to Bookmarks from the Bookmarks menu and then choose the File menu in the Bookmarks window proper, you will see a What's New? option. This apparently gives you the option of checking a selection or all of your bookmarks to see what updates have occurred.
Unfortunately I find the whole set of Bookmark functions in Netscape 2.0x very unintuitive and after running this function for several days I am still unsure what has changed in my bookmarks listings. Nice try but no cigar...
- Mosaic - Autosurf Function
This function checks the links in a document rather than a list of URLs per se. You simply pull up the page with the links you want to check and then choose the Mosaic AutoSurf option on the Navigate menu. Mosaic then furnishes you with a report about the state of the links. (Thanks to Walter Scales who originally posted these browser tips on the netskills-forum list.)
An example of the second or email approach to monitoring is the URL-Minder http://www.netmind.com/URL-minder/URL-minder.html service. This is a free service provided by NetMind. The number of URLs you can submit is unlimited and you can even include Yahoo! search strings to be periodically run. This is perhaps the most useful aspect of the service (and what's more, this should also work for any search engine that uses the "Get" command).
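The reason a "Get" search can be monitored like any other page is that the whole query travels in the URL itself, so re-running the search is just a matter of retrieving the same address again. A minimal sketch of this follows. The host name, path and the parameter name `p` are hypothetical and do not describe Yahoo! or any real search engine.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base_url, **terms):
    """Append the search terms to base_url as a GET query string."""
    return base_url + "?" + urlencode(terms)


# Hypothetical search endpoint and parameter name, used only for illustration.
url = build_search_url("http://search.example.org/find", p="electronic journals")

print(url)
# The query can be read straight back out of the URL, which is why a
# monitoring service needs nothing more than the address to repeat the search.
print(parse_qs(urlsplit(url).query))
```

A search form that submits its terms in the body of the request rather than in the URL could not be registered in this way, since the address alone would not reproduce the search.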
The URL-minder keeps track of World Wide Web resources and sends users e-mail when it detects a change in a registered Web page or CGI execution. The URL-minder keeps track of one Web page, gif, or other resource at a time. It will not keep track of all the Web pages linked to the page you submit. A separate URL must be submitted for every distinct page you want the URL-minder to track for you. The URL-minder tracks the actual HTML markup, binary contents, or ascii contents of the URL you submit. If an HTML page includes a GIF or JPEG graphic, the URL-minder will inform you when the reference to the graphic changes.
If you want to know when the actual content of a binary graphic file changes, you must submit the URL of the binary graphic itself. The URL-minder currently checks on your registered URLs at least once per week, and will inform you if it fails to retrieve your registered URL after trying twice.
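One plausible way to detect that the content behind a URL has changed is to keep a fingerprint of the bytes seen at the previous check and compare it with the next retrieval. The sketch below is an assumption made for illustration only. NetMind does not say how the URL-minder is implemented, and the function names and the choice of SHA-256 are mine.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a hex digest that stands in for the full content of a resource."""
    return hashlib.sha256(content).hexdigest()


def has_changed(previous_fingerprint: str, current_content: bytes) -> bool:
    """Report whether the content differs from what was seen on the last check."""
    return fingerprint(current_content) != previous_fingerprint


# Two versions of a page that differ only in the graphic they reference.
old_page = b'<html><body><img src="logo.gif"></body></html>'
new_page = b'<html><body><img src="logo2.gif"></body></html>'

stored = fingerprint(old_page)
print(has_changed(stored, old_page))  # same markup as before
print(has_changed(stored, new_page))  # the reference to the graphic changed
```

A fingerprint of the page covers only the markup, not the bytes of the image it points to. That is consistent with the behaviour described above: a new picture saved under the same file name would go unnoticed unless the graphic's own URL were registered as well.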
There are two ways to register your chosen URLs. Web authors can insert the relevant forms into the documents on their sites so that you can register as and when you visit; otherwise you must make a note of the URL and then point your browser at the URL-minder site.
This would be an ideal service for managing sources such as eJournals and Online Newspapers. Unfortunately it won't work with those sites where you have to register.
The benefit for the academic in using the email method over the software approach is that email is ubiquitous and platform independent. No matter what type of machine you might have or what type of mail setup, the information will find its way to you. There are no problems with software support, you don't have to use up hard disk space by building databases of site references nor do you have to wait around while your machine is busy polling sites for you.
However the only service around at the moment seems to be URL-minder and, good as it is, you still have to type in your email address and name each time you want to log a URL. A better approach would be to have a registration system similar to The Electronic Telegraph and The Times. Also it would be nice to be able to log a whole list of URLs in one go. An easy way of doing this would be to be able to mail your bookmark.htm dump file to URL-minder and register the whole lot.
These are limitations at the moment, but NetMind are extending the service all the time. I put my criticisms to Matt Frieval of NetMind. He replied:
"Multiple options for managing registrations seem to be in order, since everyone has an opinion and a preference. We'll work the issue as time and resources allow.... We are adding more capacity next week, and hope to be able to put some additional user capabilities in place over the next few months. "
So watch this space....
The third approach to URL monitoring involves the use of a client on your own machine. Examples of these are Netscape Smartmarks, WebWatch, NetBuddy, Blue Squirrel SqURL and Surflogic Surfbot.
This third area seems to be where most of the effort has been focused although from a lot of standpoints it is rather flawed. The major stumbling block from the point of view of the home market (which is where most of these products are aimed) is that you need to be online hiking up your phone bill while your monitor goes off and monitors. While this is not so much of a problem for academic users you still have machine time tied up while the software executes. This is a major deficit compared to the email system.
Only Blue Squirrel seem to have acknowledged this and have bundled their package accordingly. They offer a whole suite of software which includes a utility to launch applications on a timer. This means that you could have your monitoring done while at home in bed to get the double benefit of cheap rate calls and an idle network.
The other difficulty is platform dependency. These products are proprietary systems designed almost exclusively for stand-alone Windows PCs. Also, nearly all of them have been developed by small outfits in North America, which doesn't bode well from a support point of view.
I have looked at several of these products:
A) Netscape Smartmarks: http://home.netscape.com/comprod/smartmarks.html This is a product that was actually developed by a company called First Floor Software and is based on their Smart Catalogue Technology. It has been licensed to Netscape to sell as their own. This is a reasonable product if you need to organize a very large number of bookmarks. (I find that the font size and layout of the existing Netscape bookmarks functions can make bookmarks difficult to locate once you get beyond a certain number of entries.) It also interacts in a reasonably smooth way with the browser.
The big flaw as far as Smartmarks is concerned is its footprint. It seems very memory hungry and my PC (DX-4 100MHz with 8Mb RAM) slows quite considerably when it's in the background. I found that if I run any other memory-hungry applications I get General Protection Fault errors. When you have this and Netscape 2.0 running together and load up Word with a big document, you are walking on a knife edge.
Another difficulty is that when you fire up Netscape, Smartmarks starts automatically as well. This makes the time taken for both products to load interminable (I go and make coffee while I'm waiting). It is possible to stop both pieces of software loading at the same time but you have to decide at the installation stage or mess about editing your win.ini file.
If you are considering this product I suggest you get yourself a Pentium with at least 16 Mb RAM. If you have a 4Mb machine then forget it.
I'd be interested to see how Smartmarks performed once the database had several thousand URLs in it as well. My hunch is that it would be so slow as to be unusable.
You may encounter problems with the monitoring functions of Smartmarks if your institution has an HTTP Proxy. Smartmarks reads the details from the Netscape.ini file and if this entry starts with http:// then Smartmarks will not be able to locate it. You will need to remove the http:// prefix for the monitoring functions to work (e.g. wwwcache.uni.ac.uk instead of http://wwwcache.uni.ac.uk).
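The fix amounts to reducing the proxy setting to a bare host name. As a small illustration, the helper below (a hypothetical function of my own, not part of Smartmarks or Netscape) shows the transformation on a string. It does not read or write Netscape.ini, so you would still make the change to the file by hand.

```python
def strip_scheme(proxy_value: str) -> str:
    """Drop a leading http:// from a proxy setting, leaving the host (and any port)."""
    value = proxy_value.strip()
    prefix = "http://"
    if value.lower().startswith(prefix):
        value = value[len(prefix):]
    # A trailing slash is not part of the host name either.
    return value.rstrip("/")


print(strip_scheme("http://wwwcache.uni.ac.uk"))
print(strip_scheme("wwwcache.uni.ac.uk"))
```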
Another problem I found was that the database tended to get corrupted easily, especially if you shut down Netscape before Smartmarks. This can be fixed quite quickly by deleting a few files and running the backup / database check utility bundled with the product (a nice touch) but it's still a pain. Also if you put bookmarks in the Bookmarks Menu rather than a Smartmarks folder and then the software crashes, you lose everything in the menu.
The final problem with Smartmarks is that it will only run under Windows (3.x, 95 and NT). It is not available in Mac or Unix flavours.
Smartmarks comes with a quite impressive list of features including the Smart Finder and its use of the Bulletin standard to speed monitoring. In practice however these are of minor benefit to your day to day work, and the problems of the beta make it a less than pleasurable package to use at present.
Even when the faults are ironed out for the full release the huge footprint of the product will still be an issue. Overall, at present, it's over-engineered and cumbersome. Would that it were designed more like NetBuddy...
B) NetBuddy: http://www.internetsol.com/netbuddy.html
NetBuddy is produced by Internet Solutions, a small company from Seattle. It is marketed as an alternative to Smartmarks and its main selling point is its small footprint (510K). Despite the silly name I like NetBuddy. It reminds me of Eudora in the elegant simplicity of its design. It is clearly being marketed as a friendly, fun sort of product and in this respect it succeeds. NetBuddy is only available for Windows platforms.
Like Smartmarks, you can import existing bookmarks and arrange them in folders. NetBuddy then contacts each site in turn and stores a record of the text of each HTML file. I presume that when it checks for changes it looks for them in the text itself.
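If my presumption is right, the comparison might work along the following lines: strip the markup, keep the text, and compare it with the text stored on the previous visit. This is a guess at the general technique, not a description of NetBuddy's internals, and the class and function names are my own.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags, skipping script and style content."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace so that re-indented markup compares equal.
        return " ".join(" ".join(self._chunks).split())


def page_text(html):
    """Return the visible text of an HTML document as a single normalised string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


stored = page_text("<html><body><h1>News</h1><p>Issue 1</p></body></html>")
relaid = page_text("<html><body>\n<h1>News</h1>\n  <p>Issue 1</p>\n</body></html>")
updated = page_text("<html><body><h1>News</h1><p>Issue 2</p></body></html>")

print(stored == relaid)   # only the layout of the markup differs
print(stored == updated)  # the wording of the page differs
```

Comparing text rather than raw markup means that a page which has merely been re-indented or re-tagged is not reported as changed, which would suit a reader who only cares about new content.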
A nice feature is that if you right-click on an entry, a dialogue box appears with an option to view the text of the Web page. You can also view the HTML source complete with various bits of meta data at the top and even search for words/terms in the body of the text using the Find button. This is an enormously useful feature of NetBuddy as you can view the text contents of a web page in your listings without having to contact the site.
NetBuddy also interacts very well with Netscape, and I found it easier to pass URLs from browser to database and back again than with Smartmarks, which is slow.
NetBuddy lacks a Find function to search the URL listings at present. I have found this function to be invaluable in Smartmarks but this sort of enhancement is very easy to do. I make a great deal of use of a similar function in Eudora to locate messages quickly.
What would be ideal is if I could search the text of all the URLs listed without having to open up a dialogue box for each. Again this should be an easy extension of present functionality. This would put NetBuddy head and shoulders above Smartmarks and its Smart Finder.
The other thing about NetBuddy is that it is very quick at contacting and checking Web documents; much faster than any of the other products I have tried. Potentially this means much less waiting around. I'd be interested to see how it copes under Windows 95 multitasking as this might solve the problem altogether.
Before you all clog up the Internet Solutions server to get this product, however, there is one major problem. The software is still in Alpha test, which means it's free, but also that it is very unstable. NetBuddy has taken down Windows quite spectacularly twice this afternoon. The copy I have (A51) is riddled with bugs and at the time of writing this is the latest release. Let's hope the Beta is released soon!
C) WebWatch from Surflogic: http://www.specter.com/ww.1.x/products.html
This is the most straightforward product of the three and also the cheapest ($18). The design is very workmanlike (the whole product consists of one dialogue box), but again it is only available for Windows-based platforms.
As input to the program you specify a local HTML document (on your hard disk) that references the URLs you want to track, and the date and time after which you consider updates to the referenced documents to be interesting. You also have the option to override this date with the time of your last visit to each referenced document, as recorded by your Web browser. (I presume the idea is that this file could be your bookmark.htm file exported from your browser.)
WebWatch will generate a local HTML document that contains links to only those documents which were updated after the given date. You can use this local document to navigate to the updated documents, using any Web browser. The benefit of this approach is that it is simple and quick but it does rely on authors including a "Last Modified" date for the site you are monitoring:
WebWatch retrieves only the "header" of a document, to check its "Last Modified" date. The size of this header is usually quite small compared to that of the entire document. With WebWatch, your connect time, and ultimately the load on the network, is significantly reduced.
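The approach described here can be sketched in a few lines: pull the links out of a local HTML file, ask each server for the headers only, and compare the "Last Modified" date with your cut-off. What follows is my own illustration under those assumptions, not WebWatch's code, and all the names are hypothetical. `fetch_last_modified` issues an HTTP HEAD request, while the example at the bottom uses literal header values so that it runs without touching the network.

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Gather the href of every anchor tag, ignoring all other markup and text."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def fetch_last_modified(url, timeout=60):
    """Request only the headers of url and return its Last-Modified value, if any.

    The 60 second timeout is an arbitrary default chosen for illustration.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Last-Modified")


def updated_since(last_modified, cutoff):
    """True or False if the date can be compared, None if it is missing or unreadable."""
    if not last_modified:
        return None
    try:
        modified = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return None
    if modified.tzinfo is None:
        # Treat a date without a zone as UTC so the comparison is well defined.
        modified = modified.replace(tzinfo=timezone.utc)
    return modified > cutoff


bookmarks = (
    '<a href="http://example.org/a.html">A</a> <p>some text</p> '
    '<a href="http://example.org/b.html">B</a>'
)
cutoff = datetime(1996, 3, 1, tzinfo=timezone.utc)

print(extract_links(bookmarks))
print(updated_since("Mon, 01 Apr 1996 10:00:00 GMT", cutoff))
print(updated_since("Thu, 01 Feb 1996 10:00:00 GMT", cutoff))
print(updated_since(None, cutoff))
```

The `None` result is the weak spot of the method: a server that sends no "Last Modified" date gives the program nothing to compare.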
I tried this product on a local copy of my own Web page and it worked a treat. It trawled through all the links in the document (ignoring the rest of the HTML and text) and gave me back a list of updated sites, error 404 sites (presumably sites that have died) and even sites that had moved.
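A report of this kind could be produced by sorting each link into a category according to the HTTP status code the server returns. The sketch below shows one way of doing that. The category names and the exact set of status codes are my assumptions and are not taken from WebWatch.

```python
from collections import defaultdict


def classify(status, newer_than_cutoff=None):
    """Map an HTTP status code (and an optional date comparison) to a report category."""
    if status in (301, 302, 307, 308):
        return "moved"
    if status == 404:
        return "not found"
    if 200 <= status < 300:
        if newer_than_cutoff is None:
            return "no date"
        return "updated" if newer_than_cutoff else "unchanged"
    return "other error"


def build_report(results):
    """Group (url, status, newer_than_cutoff) tuples by category."""
    report = defaultdict(list)
    for url, status, newer in results:
        report[classify(status, newer)].append(url)
    return dict(report)


# Made-up results for illustration; no requests are sent.
results = [
    ("http://example.org/a.html", 200, True),
    ("http://example.org/old.html", 404, None),
    ("http://example.org/moved.html", 301, None),
    ("http://example.org/b.html", 200, None),
]

for category, urls in build_report(results).items():
    print(category, urls)
```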
I can wholeheartedly recommend this product both because of its brilliant simplicity and the price. You do need to be careful to get the balance right between your Include URLs newer than and Skip (seconds) settings; I would suggest at least 60 seconds for the latter (the length of time a site is polled before the software times out and moves on).
The only difficulty is with sites with no last modified entries. For these I would recommend you use URL-minder and you should have all bases covered.
To summarize, I would recommend using URL-minder if you have a small number of sites you need to check which you know will change regularly. The benefits are that you can register sites and forget about them. The service does all the work.
If you are authoring and want to check to ensure your links are kept up to date the easiest product is WebWatch. Use URL-minder as a supplement for the odd site without a Last Modified entry.
If you want to use a product which combines bookmark management with monitoring (and you have a Windows PC) then the only option available to you at present that is stable enough is Smartmarks. My personal preference would be to regularly export my bookmarks using the Save As function on the Bookmarks File menu and then run WebWatch over it once a week until the NetBuddy Beta release.
For those of you who have a Mac I'm afraid the only options at present are URL-minder or whatever functions your browser offers.
I'm sure you folks out there using Unix based systems have a few tricks up your collective sleeves and that there are a myriad of Perl scripts in existence to monitor for you. It is still worth giving URL-minder a try however as it is a good service which deserves to flourish.