Web Magazine for Information Professionals

Web Focus: Using the Web to Promote Your Web Site

Brian Kelly explains how to promote your web site.

Many readers of this article will be involved in setting up new web sites, possibly for European or nationally-funded projects, for internal, institutional projects or perhaps for community projects. As the size of the web grows there is an increasing awareness of the need to be pro-active in promoting web sites - we can no longer simply sit back and expect visitors to arrive at our new site. This article describes a variety of approaches which can be taken to the promotion of a web site. The article is based on a presentation on "Promoting Your Project Web Site" [1] given at the "Consolidating The European Library Space" conference [2].

Submission to Search Engines

Many visitors to a web site will find it by using a search engine. Although search engines can find new web sites automatically as they become linked into the web from existing web sites, the growth in the size of the web is making it increasingly difficult for indexing robots to keep up. It is probably desirable to be proactive and submit resources to search engines when a web site is launched.

Many of the main search engines provide an option to "Submit a Resource". Figure 1 illustrates the interface for submitting a resource to AltaVista.

Figure 1: Submitting a Resource to AltaVista

Since there are a number of popular search engines, and the search engines may limit the number of URLs which can be submitted, it may be desirable to make use of a submission application or web service.

A large number of submission programs are available including WebPosition [3], NetSubmitter [4], RegisterPro [5], Engenius [6] and the Exploit Submission Wizard [7].

In addition to the submission programs there are a number of web-based submission services including Broadcaster [8] and Submit-it [9].

An illustration of one of these products (WebPosition) is shown in Figure 2.

Figure 2a: WebPosition Interface
Figure 2b: Output from WebPosition
Figure 2: WebPosition

The products for submitting resources to multiple search engines typically provide other functions as well, such as analysing your pages, reporting on your position in search engines, creating metadata, etc.

Web Directories

Web directories such as Yahoo! are an alternative to search engines. They also provide a popular location for searching for resources. Unlike search engines, web directories are compiled manually. Web directories also provide an interface for submitting resources, as illustrated in Figure 3.

Figure 3: Submitting a Resource to Yahoo!

A number of the submission programs will automate the submission of resources to web directories as well as search engines.

Possible Problems

Can we solve the promotion of our web site by simply purchasing a submission program? Unfortunately not. Due to the sheer size of the web, search engines and directory services do not attempt to index all the resources they find.

A sample of a web site may be indexed
Although the coverage of commercial search engines, for commercial reasons, tends not to be fully documented, it is believed that a number of search engines will only index a small sample (say 500 pages) of a web site.
A robot may only index to a limited depth
Indexing robots may only index the "surface" of a web site and not follow resources which are located deep in the hierarchy.
The user interface may present a barrier to the robot software
A number of indexing robots cannot process framed web sites or web sites with "splash screens".
A robot may not index certain URL strings
URLs containing question marks (e.g. http://www.foo.com/get.asp?record=1) may not be indexed.
A directory service may only catalogue complete web sites and not individual projects
It is believed, for example, that sub-domains have difficulties in getting into Yahoo!

Possible Solutions

Some possible solutions to the challenges listed above follow.

Domain Name

If a project has its own domain name it is more likely to be catalogued by a directory service such as Yahoo! In addition it is more likely to be fully indexed by a search engine than if it was part of a large web site.

The Robot Exclusion Protocol

Since search engines are likely to index only a small part of a web site it may be desirable to control the areas of the web site which are indexed. For example you may wish to exclude personal information, draft resources or experimental work from being indexed.

The Robot Exclusion Protocol (REP) enables a web site administrator to specify areas of the web site which should not be indexed. The REP makes use of a robots.txt file located in the root of the web server. A typical robots.txt file is shown in Figure 4.



User-agent: *           # Following apply to all robots
Disallow: /cgi-bin/     # Don't index /cgi-bin directory
Disallow: /tmp/         # Don't index /tmp directory

Figure 4:  A Typical robots.txt File

The robots.txt file has a simple format and can be managed by hand. However a number of tools are also available to help you manage this file, such as RoboGen [10].
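Before publishing a robots.txt file it can be useful to check that it excludes what you expect. The following is a minimal sketch, not a definitive tool: it uses the robots.txt parser in Python's standard library, applied to the rules shown in Figure 4. The robot name "ExampleBot" and the paths tested are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from Figure 4, written as a single record with no blank
# lines between the User-agent line and its Disallow lines.
ROBOTS_TXT = """\
User-agent: *           # Following apply to all robots
Disallow: /cgi-bin/     # Don't index /cgi-bin directory
Disallow: /tmp/         # Don't index /tmp directory
"""


def allowed(path, robots_txt=ROBOTS_TXT, agent="ExampleBot"):
    """Return True if a robot called `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise the rules above.
    for path in ("/index.html", "/cgi-bin/search", "/tmp/draft.html"):
        print(path, "allowed" if allowed(path) else "excluded")
```

Note that the file can only ask robots to stay away from an area; it relies on the robot software choosing to honour the protocol.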

Robot Exclusion in HTML

Although the Robot Exclusion Protocol is conceptually very simple, in practice it may be difficult to exploit since updating the robots.txt file is likely to be restricted to the web site administrator. Fortunately there is now a HTML feature which enables authors of HTML pages to control robot access to their own pages. The following HTML element located in the HTML HEAD:

<meta name="robots" content="noindex, nofollow">

will prevent robots from indexing the resource and from following links within the resource.

Further information on the Robot Exclusion Protocol and Robots META tag has been produced by Martijn Koster [11].
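If many authors manage their own pages it may be helpful to check which pages carry a Robots META tag. The sketch below is one possible approach, assuming the page has already been retrieved as a string; the class and function names are hypothetical, and only Python's standard HTML parser is used.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of any <meta name="robots"> element."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in (attributes.get("content") or "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html_text):
    """Return the set of robots directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = ('<html><head><meta name="robots" content="noindex, nofollow">'
            "</head><body></body></html>")
    print(sorted(robots_directives(page)))
```

A page for which the function returns an empty set carries no Robots META tag, so robots will treat it in the normal way.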

Web Site Design

Avoid use of frames and splash screens in your web site design. As well as enabling indexing robots to access resources on your web site, this also has accessibility benefits (visitors with browsers which do not support frames will still be able to access your web site).

Improving Search Results

Once the key pages in your web site have been indexed by a search engine you might expect a sensible query to retrieve the resources. Unfortunately the resource may fail to be located near the top of the search results. How can you improve the ranking?

Metadata

Metadata may help to improve the ranking. Simple keywords and description metadata, as illustrated below, are desirable since this metadata is used by a number of search engines, including AltaVista:

<meta name="keywords" content="exploit, web magazine, TAP, telematics">

<meta name="description" content="Exploit Interactive is a ..">

Dublin Core metadata provides a more comprehensive and standardised approach to metadata for resource discovery. Unfortunately it is not yet widely supported by the major search engines. It is probably worth implementing Dublin Core metadata if you can make use of it to enhance local searching and you can address the maintenance of the metadata.

An example of the use of metadata to enhance local searching, and of an architecture to manage the metadata, can be seen in the Exploit Interactive web magazine [12]. The search interface is illustrated in Figure 5.

Figure 5: The Exploit Interactive Search Interface

As illustrated in Figure 5 the search facility can be used to search the full text of articles, the author of an article (using the DC.Creator Dublin Core attribute) or the description (using the DC.Description Dublin Core attribute).

The metadata is stored in a neutral format (as variables in an "Active Server Page"). A server side include (SSI) is used to transform the metadata to the appropriate format. Currently the metadata is transformed into <meta name="DC.Creator" ...> and <meta name="DC.Description" ...>. However, providing the metadata in, say, RDF would simply require a single update to the SSI script.

The approach taken by Exploit Interactive provides enhanced searching for visitors to the web site, Dublin Core metadata which could be used by third party applications and an architecture which helps to minimise ongoing maintenance.
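The idea of holding metadata in a neutral format and transforming it on output can be sketched as follows. This is an illustration only: it is written in Python rather than the Active Server Pages used by Exploit Interactive, and the record, its URL and values, and the function names are all hypothetical.

```python
from html import escape

# Hypothetical record: the field names follow Dublin Core elements,
# but the URL and values are placeholders, not real article metadata.
ARTICLE = {
    "url": "http://www.example.org/issue1/article/",
    "fields": {
        "creator": "A. N. Author",
        "description": "An example article about web site promotion",
    },
}


def as_html_meta(record):
    """Render the record as Dublin Core <meta> elements for the HTML HEAD."""
    return "\n".join(
        '<meta name="DC.{}" content="{}">'.format(name.capitalize(), escape(value))
        for name, value in record["fields"].items()
    )


def as_rdf_xml(record):
    """Render the same record as a simple RDF/XML description."""
    lines = [
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"',
        '         xmlns:dc="http://purl.org/dc/elements/1.1/">',
        '  <rdf:Description rdf:about="{}">'.format(escape(record["url"])),
    ]
    for name, value in record["fields"].items():
        lines.append("    <dc:{0}>{1}</dc:{0}>".format(name, escape(value)))
    lines += ["  </rdf:Description>", "</rdf:RDF>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(as_html_meta(ARTICLE))
    print(as_rdf_xml(ARTICLE))
```

Because each output format is produced by a single function, supporting a new format means writing one new function; the stored metadata itself does not need to change.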

Citation

So far we have considered techniques which will ensure that a web site is indexed and ways of improving the ranking. We should also take into account the citation of web sites - for example URLs which are included in articles (both online and print), used in publicity materials or spoken (e.g. when giving talks or presentations or on the phone).

Domain Name

The domain name for the web site can affect promotion of a web site in a number of ways. For example short and memorable domain names:

UKOLN uses the names www.exploit-lib.org and www.ariadne.ac.uk for its Exploit Interactive [12] and Ariadne [13] web magazines. Both of these domain names are short and easy to remember.

Use of separate domain names or qualified domain names - sometimes used by departments (such as http://www.scs.leeds.ac.uk/) and sometimes for a particular function (such as Student Home Pages at Loughborough University - see http://www-student.lboro.ac.uk/) - appears to be on the increase. This is probably due to (a) the ease and low cost of obtaining domain names and (b) the increase in expertise and knowledge of running web servers.

URL Conventions

As well as having a short, memorable domain name it is also desirable to make use of short URLs. Before releasing your web site it is useful to develop guidelines for URL naming conventions. Some suggestions are given below:

Scalable Naming Conventions
You should ensure that your naming conventions are scalable so that a re-organisation of your directory structure is not needed in, say, two years' time.
Avoid Unusual File Extensions
You should try to avoid use of unusual file name extensions. For example files ending in .asp or .cgi and URLs which contain question marks (e.g. get.asp?record=1) are difficult to cite and may fail to be indexed by indexing robots. It should be noted that this suggestion may conflict with information management requirements (e.g. it may be desirable to store information in a backend database). If resources are accessed using a CGI script or a similar method, it is advisable to try to ensure that URLs which appear to be static are provided. A number of techniques, such as Apache rewrites, can be used.
Make Use of Directory Defaults
Use of the default file names for directories can help in shortening the length of URLs and avoiding ambiguities in file extensions. For example the URL for an article could be referred to as http://www.exploit.org/issue1/pride/article.htm but this could easily be confused with http://www.exploit.org/issue1/pride/article.html. If the article has a file name which is the web server's default file name when a directory is requested (such as into.htm or into.asp) not only will this ambiguity be resolved, but the URL will be shorter.
Avoid Citation of Binary Files
When referring to, say, an individual document or presentation it is advisable to cite a HTML resource. For example URLs of the form http://www.foo.org/presentations/talk-dec1999.ppt should be avoided as (a) not all potential readers will have access to a PowerPoint viewer; (b) it is not possible to provide links to alternative versions of the resource and (c) it will be difficult to provide additional information related to the presentation.

Jakob Nielsen's Alertbox column provides some valuable comments on the "URL as UI" [14].
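Some of the URL conventions suggested above could be checked automatically before a URL is cited or publicised. The following sketch is one possible approach; the function name and the sets of extensions are assumptions (only .asp, .cgi and .ppt are taken from this article) and could be extended to suit local guidelines.

```python
import posixpath
from urllib.parse import urlsplit

# Extensions mentioned in this article; extend to match local guidelines.
SCRIPT_EXTENSIONS = {".asp", ".cgi"}
BINARY_EXTENSIONS = {".ppt"}


def url_warnings(url):
    """Return a list of reasons why a URL may be hard to cite or index."""
    parts = urlsplit(url)
    extension = posixpath.splitext(parts.path)[1].lower()
    warnings = []
    if parts.query:
        warnings.append("contains a question mark (query string)")
    if extension in SCRIPT_EXTENSIONS:
        warnings.append("unusual file extension " + extension)
    if extension in BINARY_EXTENSIONS:
        warnings.append("binary file " + extension)
    return warnings


if __name__ == "__main__":
    for url in (
        "http://www.exploit-lib.org/issue1/pride/",
        "http://www.foo.com/get.asp?record=1",
        "http://www.foo.org/presentations/talk-dec1999.ppt",
    ):
        print(url, url_warnings(url) or "no warnings")
```

A URL which relies on the directory default, such as the first example, produces no warnings.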

Giving Away Your Web Site

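As well as the various suggestions on ways in which you can enhance the visibility of your web site you may also wish to consider giving the web site away! For example you could allow other web sites to host a search interface to your web site, or allow your web site to be mirrored.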

Figure 6 shows an interface for searching for medical information on the web which is available on the OMNI web site [15].

Figure 6: The Interface for Searching for Medical Information on the Web at OMNI

This type of interface is probably more likely to generate search requests than a page simply containing links to the remote search interface. There are dangers in encouraging remote web sites to install a search interface to your web site search engine, in particular change control if you decide to introduce a new or updated search engine. However this is an option you may wish to consider.

You may wish to give your entire web site away. A mirror of your web site may enhance its visibility. If this is an option for your web site you may need to structure your web site so that it can easily be mirrored. This will include using directories to delineate areas of your web site which are to be mirrored, appropriate use of relative URLs and, if you use server-side scripting for management purposes, hiding (or rewriting) unusual URLs where possible. Although sophisticated mirroring and replication software is now available, the mirroring task will probably be much easier if the site has been developed with mirroring in mind. It should also be noted that this may help in the digital preservation of a web site.

Publications

This article has described the submission of resources to search engines and web directories and described web architectures which will help to make web sites more accessible to search engines. It should be noted that articles about your web site can also help in its promotion. Articles in print and web publications should obviously raise the visibility. In addition web magazines may submit their pages to search engines and links in the pages may be harvested. Web magazines may also be made available on CD ROM, in free text systems, citation reports, etc. As an example a number of Ariadne articles have been cited in Current Cites [16] and Ariadne itself features in PubList's Internet Directory of Publications [17].

Evaluation

If you have followed the various suggestions given in this article how can you evaluate the effectiveness and assess the benefits against the resources used?

Monitoring Links to Your Web Site

One suggestion would be to monitor the number of links pointing to your web site. The LinkPopularity.com web site [18] enables the numbers of links, as recorded by a number of large search engines, to be measured as illustrated in Figure 7.

Figure 7: The LinkPopularity.com Web Site

Monitoring the number of links to your web site, and the growth in the number of links, will be useful in evaluating the impact of your web site. It can also be of use if you wish to sell advertising space on your web site. As Roddy MacLeod, manager of the EEVL gateway [19], mentioned in a posting to the lis-elib Mailbase list:

"I tried [LinkPopularity.com], pointing out to a potential advertiser that EEVL had, according to HotBot, 1099 sites linking to it, whilst there were only 18 sites linking to their site, and suggested that what they needed was more exposure. It seems to have worked, as they have agreed to buy an ad on the soon to be released new design EEVL site." [20].

Analyse Your Web Statistics

Analysis of your web statistics can help in measuring the effectiveness of your web promotion strategy. A more thorough report on web statistics will be published at a later date. This article considers only the analysis of access to web sites by robot software. The BotWatch software [21] can produce reports on access to your web site by robot software, as illustrated in Figure 8.

Figure 8: BotWatch
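As a rough illustration of this kind of analysis, the sketch below counts requests made by software which has fetched the robots.txt file. It is not how BotWatch works: it is a simple heuristic which assumes that the server log is in the widely used "combined" format and that a visitor which requests /robots.txt is a robot. The sample log lines, addresses and robot name are invented for the example.

```python
import re
from collections import Counter

# Matches one line of a "combined" format access log (an assumption
# about how the web server has been configured).
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Invented sample data, for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Dec/1999:10:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 120 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [10/Dec/1999:10:00:05 +0000] "GET /issue1/ HTTP/1.0" 200 5120 "-" "ExampleBot/1.0"',
    '192.0.2.7 - - [10/Dec/1999:10:01:00 +0000] "GET /issue1/ HTTP/1.0" 200 5120 "-" "Mozilla/4.0"',
]


def robot_hits(lines):
    """Count requests per user agent, for agents which fetched /robots.txt."""
    requests = []
    robots = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # Skip lines which are not in the expected format.
        agent, path = match.group("agent"), match.group("path")
        requests.append(agent)
        if path == "/robots.txt":
            robots.add(agent)
    return Counter(agent for agent in requests if agent in robots)


if __name__ == "__main__":
    for agent, hits in robot_hits(SAMPLE_LOG).most_common():
        print(agent, hits)
```

Robots which do not request robots.txt will be missed by this heuristic, so the counts should be treated as a lower bound.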

Conclusions

Ideally you will think about the promotion of your web site before the web site has been launched. A number of technical decisions which can help with web site promotion should be made before the launch as changes to a running service will be difficult to implement. However even if your web site is well-established many of the suggestions in this article will still be relevant.

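Many of the suggestions given in this article on web site promotion will have additional benefits in other areas. For example, avoiding frames also improves accessibility, and designing a web site so that it can easily be mirrored may also help with its digital preservation.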

Further Information

Additional useful information on web site promotion is provided by Deadlock Design [22], SearchEngineWatch [23], VirtualPROMOTE [24], Pegasoweb [25], did-it [26] and Yahoo! [27].

Book reviews for "Poor Richard's Internet marketing and promotions: how to promote yourself, your business, your ideas online" [28] and "How to promote your Web site effectively" [29] have been published in the Internet Resources Newsletter.

Checklist

A checklist of the points mentioned in this article follows.

 Short domain name
 Short URL naming conventions
 Use of Robot Exclusion Protocol
 Metadata provided in key areas
 Architecture to implement and deploy metadata more widely
 Web site submitted to search engines
 Architecture to submit new resources to search engines
 Procedures to produce web statistics
 Procedures to make use of web statistics
 Procedures to produce web statistics for robots
 Procedures to make use of web statistics for robots
 Site designed to be easily mirrored

References

  1. Promoting Your Project Web Site, Brian Kelly
    http://www.ukoln.ac.uk/web-focus/events/concertation/libraries-nov99/
  2. Consolidating The European Library Space Conference, DG Information Society Cultural Heritage Applications Unit
    http://www2.echo.lu/libraries/events/FP4CE/agenda.html
  3. WebPosition Gold
    http://www.webposition.com/
  4. Net Submitter Professional
    http://www.netsubmitter.com/
  5. Register Pro
    http://www.registerpro.com/
  6. Engenius
    http://www.pegasoweb.com/engenius/
  7. Exploit Submission Wizard
    http://www.exploit.com/wizard/
  8. Broadcaster Website Promotion, Broadcaster
    http://www.broadcaster.co.uk/
  9. Submit it!: Web Site Promotion and Marketing, Submit it!
    http://www.submit-it.com/
  10. RoboGen
    http://www.rietta.com/robogen/
  11. Robots Exclusion, Martijn Koster
    http://info.webcrawler.com/mak/projects/robots/exclusion.html
  12. Exploit Interactive
    http://www.exploit-lib.org/
  13. Ariadne
    http://www.ariadne.ac.uk/
  14. URL as UI, Jakob Nielsen
    http://www.useit.com/alertbox/990321.html
  15. Searching for Medical Information on the Web, OMNI
    http://www.omni.ac.uk/other-search/
  16. CurrentCites Bibliography on Demand (search for Ariadne), CurrentCites
    http://sunsite.berkeley.edu/CurrentCites/bibondemand.cgi?query=ariadne
  17. Ariadne Main Information Page, PubList
    http://www.publist.com/cgi-bin/show?PLID=4931361
  18. LinkPopularity.com
    http://www.linkpopularity.com/
  19. EEVL
    http://www.eevl.ac.uk/
  20. lis-elib archive, Mailbase
    http://www.mailbase.ac.uk/lists/lis-elib/1999-11/0015.html
  21. BotWatch
    http://www.tardis.ed.ac.uk/~sxw/robots/botwatch.html
  22. Art of Business Web Site Promotion, Deadlock Design
    http://www.deadlock.com/promote/
  23. Search Engine Submission Tips, SearchEngineWatch
    http://www.searchenginewatch.com/webmasters/
  24. Web Site Promotion, VirtualPROMOTE
    http://www.virtualpromote.com/promotea.html
  25. Web Site Promotion, PegasoWeb
    http://www.pegasoweb.com/
  26. did-it
    http://www.did-it.com/
  27. Yahoo!
    http://dir.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/Information_and_Documentation/Site_Announcement_and_Promotion/
  28. Recent Internet Books in Heriot-Watt University Library, Internet Resources Newsletter, Issue 58, July 1999
    http://www.hw.ac.uk/libWWW/irn/irn58/irn58d.html#recent
  29. Recent Internet Books in Heriot-Watt University Library, Internet Resources Newsletter, Issue 59, August 1999
    http://www.hw.ac.uk/libWWW/irn/irn59/irn59d.html#recent

Author Details

Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath
BA2 7AY

Email: b.kelly@ukoln.ac.uk

Brian Kelly is UK Web Focus. He works for UKOLN, which is based at the University of Bath.