Web Magazine for Information Professionals

Unix: What Is mod_perl?

Ian Peacock explains mod_perl technology for supercharging the Apache Server.

mod_perl [1] has to be one of the most useful and powerful of the Apache modules. Beneath the inconspicuous name, this module marries two of the most successful and widely acclaimed products of open-source software (OSS), the Apache Webserver [2] and Perl [3]. The result is a kind of Web developer's utopia, with Perl providing easy access to, and control of, the formidable Apache API. Powerful applications can be rapidly created and deployed as solutions to anything from an office intranet to Enterprise-level Web requirements.

Note that although this article discusses mod_perl from a Unix perspective, both Apache and Perl will run on a number of different platforms, including Win32.

Apache Insight

Many readers will have at least heard of the Apache project. The highly popular [4] Apache Webserver has established itself as one of the protagonists of the "open source revolution". Designed to be extensible, the basic server core is complemented by various modules which supply additional functionality. For such a modular approach to be successful, the core must embody a comprehensive and well thought out API. The Apache API [5] provides just this: access to nearly all of the server's internal processing, so that custom steps may be introduced at any stage of the request process. Examples of such modules include mod_cgi [6], a module from the standard distribution for executing CGI scripts, and the third-party module mod_gunzip [7] for uncompressing files on-the-fly.

However, until a couple of years ago, the power of the Apache API could only be tamed using the 'C' language. The overhead of writing and testing a module in C has meant that modules have largely been confined to the core server package and to cases requiring the performance benefits of C (e.g. where a quick CGI-based hack will not suffice). Fortunately, the introduction of mod_perl in 1996 by Doug MacEachern [8] changed things. Because mod_perl places a Perl interpreter inside an Apache module, the interpreter is in effect embedded within the server itself. This can mean significantly increased performance for Perl CGI scripts, but the primary benefit is that mod_perl gives Perl programmers a direct line to the Apache API via Perl objects and method calls. The outcome is that server-side Perl programs can offer far more versatility than traditional CGI scripts, which cannot interact with the server at different phases of the request process.

Why Perl?

Q: What is Perl?
A: Perl is a language for getting your job done.

The above explanation opens the preface of Programming Perl [9] and strikes me as a very suitable synopsis. In the Unix world Perl enjoys huge popularity and has been adopted as standard on many systems. The reasons for its success are many and include its dual role as both a scripting and a programming language, its usability, the support available for it and the fact that it is free (open-source).

It is a language eminently suitable for rapid application development, where it may assume a programming language role, or for smaller projects such as a CGI application, where it may be treated more as a scripting language. Results are generally easy to achieve with Perl ("you don't have to know all about Perl to work with it") and it is flexible enough to suit the needs of many different types of application.

Support for Perl is vast. A glance at CPAN [3] shows modules that can provide everything from TCP network support to cryptography. There are also many script repositories, discussion groups and mailing lists covering Perl and Perl specifics.

The benefits of the Perl programming language are treated in full in numerous articles and books [10].

What can you do with mod_perl?

mod_perl is more than a way of running Perl scripts. It is a unification of Apache with Perl, meaning that much of Apache can be controlled from Perl (including its configuration, which can therefore be dynamic!). Although Perl can be embedded within HTML documents (through ePerl, mod_perl-extended Server Side Includes or other methods), mod_perl is usually used to supply the functionality that allows Apache extension modules to be written in Perl.

The benefits of choosing mod_perl are the combined benefits of using Perl and Apache. There is a great deal of support available for mod_perl, through mailing lists and through repositories of code. Through the Apache, Perl and mod_perl communities, a valuable (and free) knowledge base is available. Portability is another advantage: a mod_perl system can be installed on virtually any flavour of Unix, and on Microsoft Windows systems, and once installed it operates in an efficient and stable way. The reliability and stability of Apache is widely recognised [11].

As an open-source project, its code undergoes constant review, and fixes and upgrades are produced frequently, keeping a mod_perl system in touch with current trends and technologies.

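In order to appreciate how mod_perl can extend the server, it is useful to know the different processing phases that Apache goes through when a request is received:

  1. Post-read-request: the request has just been read from the client
  2. URI translation: the requested URI is mapped to a file name
  3. Header parsing
  4. Access control (e.g. by client address)
  5. Authentication: who is the user?
  6. Authorisation: is this user allowed this resource?
  7. MIME type checking
  8. Fixups: last adjustments before the response is generated
  9. Response: the content is generated and sent to the client
  10. Logging
  11. Cleanup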

External modules can define custom 'handlers' to enhance or supersede Apache's core behaviour at each phase. With mod_perl installed, these modules can be implemented in Perl. Some examples of what different kinds of handler could achieve are given below.
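The handler mechanism can be modelled, in Python rather than Perl, as a list of callbacks for each phase. The return codes OK and DECLINED mirror the values in Apache's C API; the phase names, the request dictionary and the two handlers are illustrative assumptions, not mod_perl's interface. The model is deliberately simplified: real Apache runs every registered handler in some phases (logging, for instance) and still logs requests that were refused.

```python
from typing import Callable, Dict, List

# Return codes as defined in Apache's C API (httpd.h): OK is 0, DECLINED is -1.
OK = 0
DECLINED = -1

Handler = Callable[[dict], int]

# A simplified subset of the request phases, in the order Apache runs them.
PHASES = ["translate", "access", "authen", "authz", "type", "fixup", "response", "log"]


def run_request(request: dict, handlers: Dict[str, List[Handler]]) -> int:
    """Run every phase in order; return OK, or the HTTP status that ended the request."""
    for phase in PHASES:
        for handler in handlers.get(phase, []):
            status = handler(request)
            if status == DECLINED:
                continue        # not interested: offer the phase to the next handler
            if status != OK:
                return status   # e.g. 403 Forbidden stops processing at once
            break               # OK: this phase is done, move on to the next one
    return OK


# Two hypothetical custom handlers of the kind mod_perl lets you write in Perl.
def block_bad_hosts(request: dict) -> int:
    return 403 if request["remote_ip"] == "10.0.0.66" else DECLINED


def hello_response(request: dict) -> int:
    request["body"] = "Hello from " + request["uri"]
    return OK


handlers = {"access": [block_bad_hosts], "response": [hello_response]}
```

Returning DECLINED is what lets a custom module enhance Apache without replacing it: anything the module does not want to handle falls through to the next handler, and ultimately to Apache's own behaviour.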

Example 1: transparently uncompress HTML on-the-fly

Handy for server administrators who are running out of disk space. This module would be a content handler acting as a file processor. The handler would be called on a request for an HTML file, say example.html. If the HTML file exists, it declines the request and allows Apache to deal with it as usual. If the file does not exist, it looks for example.html.gz, decompresses it on-the-fly and sends it back to the user (who thinks they have retrieved a static HTML file). If neither file exists, the handler declines the request to let Apache dish out a 404. This module is in operation on parts of the Netcraft site [12].
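The decision logic of such a handler can be sketched in Python. This is not the Netcraft module: the function name, the OK/DECLINED constants and the assumption that the URI has already been translated into a file path are all illustrative, and a real content handler would also set the Content-Type header before sending the body.

```python
import gzip
import os
import tempfile
from typing import Optional, Tuple

OK = 0
DECLINED = -1  # same meaning as Apache's DECLINED: "not mine, let the server carry on"


def gunzip_handler(filename: str) -> Tuple[int, Optional[bytes]]:
    """Decide how to answer a request for `filename` (a path on disk)."""
    if os.path.exists(filename):
        return DECLINED, None            # plain file exists: let Apache serve it as usual
    compressed = filename + ".gz"
    if os.path.exists(compressed):
        with gzip.open(compressed, "rb") as fh:
            return OK, fh.read()         # decompress on the fly and send the HTML
    return DECLINED, None                # neither exists: Apache will send its 404


# Usage: store only example.html.gz, then ask for example.html.
docroot = tempfile.mkdtemp()
page = os.path.join(docroot, "example.html")
with gzip.open(page + ".gz", "wb") as fh:
    fh.write(b"<h1>Hello</h1>")
status, body = gunzip_handler(page)
```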

Example 2: access control based on client attributes

A number of modules exist that allow different kinds of access control:

Example 3: cookie-based access control

The Apache::TicketAccess module was designed to handle the situation where user authentication is expensive. Instead of performing full authentication each time the user requests a page, the module only authenticates against a relational database the first time the user connects. After successfully validating the user's identity, the module issues the user a 'ticket' - an HTTP cookie carrying the user's name, IP address, expiration date and cryptographic signature. Until it expires, the ticket can be used to gain entry to the site under the control of Apache::TicketAccess.
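The ticket idea can be sketched in Python as follows. The field layout, the `|` separator (user names and addresses are assumed not to contain it), the hypothetical secret and the use of HMAC-SHA256 are assumptions of this sketch, not necessarily what Apache::TicketAccess does. The point is that checking a ticket needs only the server's secret, not another database lookup.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"hypothetical-server-secret"  # assumption: a key known only to the server


def _signature(user: str, ip: str, expires: int) -> str:
    message = f"{user}|{ip}|{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def issue_ticket(user: str, ip: str, lifetime: int = 3600, now: Optional[int] = None) -> str:
    """Called once, after the (expensive) database check has succeeded."""
    now = int(time.time()) if now is None else now
    expires = now + lifetime
    return f"{user}|{ip}|{expires}|{_signature(user, ip, expires)}"


def check_ticket(ticket: str, client_ip: str, now: Optional[int] = None) -> bool:
    """Called on every later request; no database access is needed."""
    now = int(time.time()) if now is None else now
    try:
        user, ip, expires_text, signature = ticket.split("|")
        expires = int(expires_text)
    except ValueError:
        return False                      # malformed cookie
    if ip != client_ip or now >= expires:
        return False                      # wrong machine, or the ticket has expired
    return hmac.compare_digest(signature, _signature(user, ip, expires))
```

Because the signature covers the name, address and expiry time, a user who edits any of them in the cookie invalidates the ticket.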

Example 4: Embedded scripting

Apache::ASP provides an Active Server Pages port to the Apache HTTP server with Perl as the host scripting language.

Running Perl CGI scripts under mod_perl

CGI scripts continue to work unchanged once mod_perl is installed, but those written in Perl can instead be run through Apache::Registry for a performance increase.

The Apache::Registry module allows legacy Perl CGI scripts (which the maintainer has no time to convert to modules) to be run under mod_perl. A CGI environment is emulated, and the CGI script is compiled once and cached, ready in executable form whenever a request comes in.
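The compile-once, run-many idea can be illustrated with a Python analogue. This is not Apache::Registry's code: the function and variable names are hypothetical, the "CGI environment" is reduced to a dictionary handed to the script, and the modification-time check is simply this sketch's way of noticing that a script has been edited.

```python
import contextlib
import io
import os
import tempfile

_compiled = {}      # path -> (modification time, compiled code)
compile_count = 0   # how many times a script has actually been compiled


def run_registry_script(path: str, environ: dict) -> str:
    """Run a CGI-style script, compiling it only when it is new or has changed."""
    global compile_count
    mtime = os.path.getmtime(path)
    cached = _compiled.get(path)
    if cached is None or cached[0] != mtime:
        with open(path) as fh:
            code = compile(fh.read(), path, "exec")
        _compiled[path] = (mtime, code)
        compile_count += 1
    else:
        code = cached[1]
    output = io.StringIO()
    with contextlib.redirect_stdout(output):    # capture what the script prints
        exec(code, {"environ": dict(environ)})  # the emulated CGI environment
    return output.getvalue()


# Usage: a "legacy" script that prints a header, a blank line and a greeting.
script = os.path.join(tempfile.mkdtemp(), "hello.py")
with open(script, "w") as fh:
    fh.write('print("Content-Type: text/plain")\nprint()\nprint("Hi", environ["QUERY_STRING"])\n')
first = run_registry_script(script, {"QUERY_STRING": "one"})
second = run_registry_script(script, {"QUERY_STRING": "two"})
```

The second request reuses the cached code object, which is where the saving over a fresh CGI process (start interpreter, compile, run, exit) comes from.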

CPAN [3] contains a wide range of Apache Perl modules.

What is the difference between mod_perl and ...?

ActiveX

Based upon Microsoft's COM and DCOM ([Distributed] Component Object Model) architectures, ActiveX provides a container for dynamic link libraries, called an ActiveX control, which can be created using the likes of Visual Basic or C++. Such controls can be downloaded to the client and run on the client machine. COM is an architecture supported by only a few operating systems and browsers, making ActiveX suitable for a known supported client or a homogeneous intranet.

ASP

Microsoft's Active Server Pages [13] are, for the IIS Web server [14], an offering similar to PHP on Apache. Although ASP has been ported to other platforms and Web servers, these ports account for few of the sites deploying ASP. The full benefits of ASP are likely to be reaped within a Microsoft environment, where it can be integrated with other Windows applications and components, such as an ISAPI [15] filter.

ASP is the most widely used solution for providing server-side technologies using Windows NT (almost invariably with IIS).

CGI

The Common Gateway Interface [16] is not a language but a protocol. It describes how a Web server hands a request to an external program and returns that program's output to the client, typically when the client needs to send small amounts of information to the server via HTTP (the results of filling in a form, for example). Any server-side processing technology should be able to deal with such information: an ASP-enabled server, a mod_perl-enabled server or a standalone script executed by the Web server with its output sent back to the client (a 'traditional' CGI program) should all be able to process the submitted form data.
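A 'traditional' CGI program can be sketched in Python. REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH are standard CGI environment variables; the form field `name` and the function `cgi_main` are hypothetical. The program reads the request from its environment (and, for POST, from standard input) and writes header lines, a blank line and the body to standard output, which the server relays to the client.

```python
import os
import sys
from urllib.parse import parse_qs


def cgi_main(environ, stdin) -> str:
    """Build the complete output of a CGI program for one request."""
    if environ.get("REQUEST_METHOD", "GET") == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)               # POSTed form data arrives on stdin
    else:
        raw = environ.get("QUERY_STRING", "")  # GET form data arrives in the environment
    fields = parse_qs(raw)
    name = fields.get("name", ["stranger"])[0]
    # A CGI response is header lines, a blank line, then the body.
    return "Content-Type: text/plain\n\n" + f"Hello, {name}\n"


if __name__ == "__main__":
    # When run by the Web server, request details come from the environment and stdin.
    sys.stdout.write(cgi_main(os.environ, sys.stdin))
```

Each request starts a new process that must load and compile the program again, which is the overhead that embedded interpreters such as mod_perl avoid.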

Javascript/VBScript

JavaScript [18] (or ECMAScript [17], as it should now be known) and Microsoft VBScript [19] are examples of client-side scripting languages (there is actually a server-side JavaScript, but the client form is far more popular). These languages are embedded within the document source and processed by the Web browser rather than by the server. This means that the browser parses the document for the script and executes it (unlike a server-side embedded script, which may be included in HTML source via pseudo-HTML elements that are removed when the server processes the document). The browser must also know how to execute the script (Netscape, for example, will not parse VBScript). To perform useful tasks, the script must interface with the browser: whereas server-side languages may interact with the server via the server's API, a client-side script uses an API offered by the browser. The standard defining just what this API should be able to achieve is the Document Object Model (DOM) [20].

There is sometimes a choice over whether to use server-side or client-side scripting. A few of the pros and cons are shown below:

Client-side scripting                          | Server-side scripting
-----------------------------------------------+------------------------------------------------------------------
Have to make assumptions about client browser  | Client browser does not affect processing
Client processor load increases slightly       | Server load may increase significantly at times of high activity
Script needs to be downloaded                  | Script remains on the server (private)
Client can view script source                  | Client cannot view script source

Server-side and client-side scripting can often be used in a complementary way. For example, a client-side script may check that the contents of an HTML form conform to certain rules before submitting the information to a server to be processed by a server-side script.
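Because the client can view, and therefore alter or bypass, a client-side script, the server-side script should not assume the checks were made. A minimal Python sketch of the server-side re-check follows; the field names, the rules and the deliberately loose e-mail pattern are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical rules for a form with "email" and "age" fields.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")   # deliberately loose
AGE_PATTERN = re.compile(r"[0-9]{1,3}")


def validate_form(fields: Dict[str, str]) -> List[str]:
    """Server-side re-check of rules a client-side script may already have applied."""
    errors = []
    if not EMAIL_PATTERN.fullmatch(fields.get("email", "")):
        errors.append("email: not a valid address")
    age = fields.get("age", "")
    if not AGE_PATTERN.fullmatch(age) or not 1 <= int(age) <= 120:
        errors.append("age: must be a whole number from 1 to 120")
    return errors
```

The client-side version of these rules then becomes a convenience that saves a round trip, while the server-side version remains the one that is enforced.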

Java servlets

According to Sun, "a servlet can almost be thought of as an applet that runs on the server side -- without a face." Sun's Java Servlet API [21] supplies 'hooks' via which server-side applications can be created. Servlets are embedded into a JavaServer [22] Web server, where applications use the API in much the same way as Apache modules use the Apache API.

The Jakarta project [22] is an Apache working group dedicated to providing a pure Java Servlet and JavaServer implementation for use with the Apache Web server. Until the fruits of this project are reaped, Apache JServ [22] provides an extension module that allows server extensions to be written in Java, as servlets, rather than in C or Perl.

mod_pyapache

For those sites scripted in Python, mod_pyapache [23] embeds a Python interpreter into the server. mod_pyapache does not provide a Python interface to the Apache API.

PHP

PHP [24] is an open source Apache module allowing scripts to be embedded within HTML which will be processed by the server. PHP scripts may also occupy their own file. At a certain stage of the request cycle, mod_php will be called to deal with embedded PHP code, substituting it with output if necessary.

PHP is a powerful scripting language with syntax borrowed from 'C', Perl and Java. It has good support in various areas such as database interaction. As a script processor, PHP does not offer the wider functionality of mod_perl, but it is a lighter-weight and credible solution for problems that can be solved using embedded scripting. With careful configuration, both mod_perl and mod_php can be installed on the same server.

The July 1999 Netcraft Web Server Survey [25] shows that of 6,598,697 IP addresses, 8.7% (574,433 sites) were running PHP. Recently, the growth in the use of PHP has also exceeded the growth of the Apache server itself (see 'Who uses mod_perl?' for further discussion).

Script co-processing

mod_perl improves the performance of Perl scripts by keeping a Perl interpreter in the server and using it to run pre-compiled scripts. An alternative performance booster is to keep scripts running as a co-process and have the Web server communicate with that process whenever the script needs to be run. Two examples of script co-processing are FastCGI [26] and mod_jserv [27].
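The co-process approach can be sketched in Python with a worker that is started once and then fed requests over pipes. FastCGI itself uses its own binary record protocol over sockets or pipes; the line-per-request exchange, the worker's source and the `CoProcess` class here are simplifications invented for illustration.

```python
import subprocess
import sys

# Source of a hypothetical worker. It does its start-up work once, then answers
# one request per input line until its standard input is closed.
WORKER_SOURCE = r"""
import os, sys
greeting = "hello"   # stands in for expensive start-up (loading and compiling code)
while True:
    line = sys.stdin.readline()
    if not line:
        break        # the server closed the pipe: shut down
    sys.stdout.write(str(os.getpid()) + " " + greeting + " " + line.strip() + "\n")
    sys.stdout.flush()
"""


class CoProcess:
    """Keeps one worker running and passes each request to it over pipes."""

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

    def request(self, payload: str) -> str:
        self.proc.stdin.write(payload + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.wait()
```

Each reply starts with the worker's process ID, so successive requests can be seen to be served by the same long-running process rather than by a fresh one per request.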

SSI

Server Side Includes are a feature of many Web servers. They are designed for simple tasks, such as including a footer in HTML pages or stamping the date. They lack the power of a scripting language, but are very useful for such jobs. An option when using mod_perl is to extend the standard Apache SSI mechanism to call Perl subroutines.
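How a server expands SSI directives can be sketched in Python. The `echo var` and `include file`/`include virtual` directives follow Apache's mod_include syntax, and "(none)" and "[an error occurred while processing this directive]" mirror Apache's default messages; the function, its arguments and the include callback are hypothetical, and the Perl-subroutine extension mentioned above is not modelled.

```python
import re
from typing import Callable, Dict

# Directives use Apache mod_include syntax, e.g. <!--#echo var="DATE_LOCAL" -->
DIRECTIVE = re.compile(r'<!--#(\w+)\s+(\w+)="([^"]*)"\s*-->')


def process_ssi(html: str, variables: Dict[str, str],
                include: Callable[[str], str]) -> str:
    """Replace each directive in `html` with its output, as the server would."""

    def expand(match) -> str:
        command, attribute, value = match.groups()
        if command == "echo" and attribute == "var":
            return variables.get(value, "(none)")
        if command == "include" and attribute in ("file", "virtual"):
            return include(value)          # fetch the named document's text
        return "[an error occurred while processing this directive]"

    return DIRECTIVE.sub(expand, html)
```

The directives are ordinary HTML comments, so a page that is served without SSI processing still displays, just without the inserted text.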

Others

There are numerous other embedded scripting solutions; for further details see the references section.

When choosing a particular solution for server technologies, developers should be aware of the strengths and weaknesses of the various products, relative to their requirements. Proprietary products may work well with other proprietary products, but interoperability outside that may be weak. On the other hand, open standard solutions may interoperate well, but not offer the required functionality.

Who uses mod_perl?

The July 1999 Netcraft Web Server Survey [25] found that of 6,598,697 IP addresses, 56% were running Apache. Of these, some 5% (202,081) were using mod_perl. Since January 1999, the average monthly growth rate of Apache has been around 9%, while within the Apache user base the average monthly growth of mod_perl has been around 16%. The increasing number of sites using mod_perl is therefore not simply due to the increasing use of Apache: mod_perl's share of Apache installations rose from 3.7% in January 1999 to 5.4% in July 1999.
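As a rough consistency check, assume both growth rates are compound monthly rates that held steadily over the six months from January to July 1999. mod_perl's share s of Apache installations then grows by the ratio of the two rates each month:

```latex
s_{\text{Jul}} \approx s_{\text{Jan}} \left(\frac{1.16}{1.09}\right)^{6}
  \approx 3.7\% \times 1.064^{6}
  \approx 3.7\% \times 1.45
  \approx 5.4\%
```

which agrees with the survey's July figure.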

The increase in take-up of mod_perl probably reflects the current trend among e-commerce, commercial and other large sites to move beyond using a vanilla Web server to serve static pages.

These trends are driven by the need to support ideas such as personalisation, e-commerce and banner advertising. Underlying server technologies are responsible for implementing these concepts, and "behind the scenes" a server will be expected to offer support for things like custom authentication and access control, secure transactions, content negotiation and dynamic content (including database interaction and server-side scripting).

mod_perl has been chosen by many sites to provide the server technologies required by a modern web site. Examples include:

The Internet Movie Database [28]
mod_perl has been used to make interactive database queries efficient through a query cache. It also supports language negotiation.
Metacrawler [29]
All requests to this popular meta search engine are routed through a Perl module.
O'Reilly and Associates [30]
Access control to the online books site is provided through mod_perl.
HotBot [31]
mod_perl is used for the HotBot mail and HotBot homepages application.
Slashdot [32]
Slashdot.org ("News for nerds") is powered by Perl and MySQL.
CMPnet [33]
CMPnet is a technology information network. mod_perl is used to generate 70% of its pages (half a million hits per day). The CMPnet network includes TechWeb and FileMine.
Lind-Waldock & Co [34]
The world's largest discount commodities trading firm uses mod_perl under Stronghold [35] to generate live and delayed quotes, dynamic charts and news. The system is integrated with a relational database used for customer authentication and transaction processing.

It is hoped that the above examples will provide some insight into the performance and scalability of mod_perl (since I didn't have enough time to write that section!).

Summary

mod_perl is a serious contender as a solution for providing a modern, feature-rich Web site. Many of the benefits of a mod_perl system derive from the open-source licensing of Perl, Apache and mod_perl. This has ensured that all three products have evolved, through the scrutiny and review of experts and end-users alike, to embody the functionality and performance required of such products in today's Web space. Portability, scalability, efficiency and good security are all well-known features of the three products. Of course, a significant benefit for many Web administrators is that the software is available to all without charge.

The integration of Perl with Apache gives the Web administrator a route to the rapid development of complex Web applications that operate efficiently, have the potential to scale, and are free from the hindrance of proprietary caveats such as usage licenses or a dependence on a proprietary system for best operation.

Of course, any solution must be considered on the basis of exact requirements and available resources (including any existing electronic infrastructure). However, in many cases, full consideration should place mod_perl on the short list.

References

  1. The Apache/Perl Integration Project
    http://perl.apache.org/
  2. The Apache Server Project
    http://www.apache.org/httpd.html
  3. CPAN: Comprehensive Perl Archive Network
    http://www.cpan.org/
  4. The Number One HTTP Server On The Internet
    http://www.apache.org/httpd.html
  5. Apache API Notes
    http://www.apache.org/docs/misc/API.html
  6. Module mod_cgi
    http://www.apache.org/docs/mod/mod_cgi.html
  7. Module mod_gunzip
    http://sep.hamburg.com/
  8. Doug MacEachern
    mailto:dougm@pobox.com
  9. Programming Perl
    http://www.oreilly.com/catalog/pperl2/index.html
  10. Google search for 'perl'
    http://www.google.com/search?q=perl
  11. Apache Performance Notes
    http://www.apache.org.uk/docs/misc/perf-tuning.html
  12. Netcraft
    http://www.netcraft.com/
  13. ASP Technology Feature Overview
    http://msdn.microsoft.com/workshop/server/asp/aspfeat.asp
  14. Internet Information Server
    http://www.microsoft.com/ntserver/web/default.asp
  15. Writing ISAPI Filters
    http://www.microsoft.com/MSJ/0498/IIS/IIS.htm
  16. The Common Gateway Interface
    http://hoohoo.ncsa.uiuc.edu/cgi/
  17. ECMAScript Language Specification
    http://hoohoo.ncsa.uiuc.edu/cgi/
  18. JavaScript Developer Central
    http://developer.netscape.com/tech/javascript/index.html
  19. Microsoft vbscript
    http://msdn.microsoft.com/scripting/default.htm?/scripting/vbscript/default.htm
  20. The Document Object Model (DOM)
    http://www.w3.org/DOM/
  21. The Java(tm) Servlet API
    http://java.sun.com/products/servlet/index.html
  22. The Java Apache Project
    http://java.apache.org/
  23. PyApache
    http://www.msg.com.mx/pyapache/
  24. PHP
    http://www.php.net/
  25. July 1999 Netcraft Web Server Survey
    http://www.netcraft.com/survey/Reports/9907/
  26. FastCGI
    http://www.fastcgi.com/
  27. mod_jserv
    http://java.apache.org/jserv/index.html
  28. The Internet Movie Database
    http://www.imdb.com/
  29. Metacrawler
    http://www.metacrawler.com/
  30. O'Reilly and Associates
    http://www.oreilly.com/
  31. HotBot
    http://www.hotbot.com/
  32. Slashdot
    http://www.slashdot.org/
  33. CMPnet
    http://www.cmpnet.com/
  34. Lind-Waldock & Co
    http://www.lind-waldock.com/
  35. Stronghold
    http://www.c2.net/products/sh2/

Author Details

Ian Peacock
Netcraft
email address: ip@netcraft.com