Unix: What Is mod_perl?
mod_perl has to be one of the most useful and powerful of the Apache modules. Beneath the inconspicuous name, this module marries two of the most successful and widely acclaimed products of OSS, the Apache Webserver and Perl. The result is a kind of Web developer's Utopia, with Perl providing easy access to, and control of, the formidable Apache API. Powerful applications can be rapidly created and deployed as solutions to anything from an office Intranet to Enterprise level Web requirements.
Note that although this article discusses mod_perl from a Unix perspective, both Apache and Perl will run on a number of different platforms, including Win32.
Many readers will have at least heard of the Apache project. The highly popular Apache Webserver has established itself as one of the protagonists of the "open source revolution". Designed to be extensible, the basic server core is complemented by various modules which supply functionality. For such a modular approach to be successful, the core must embody a comprehensive and well thought out API. The Apache API provides just this - access to nearly all of the server's internal processing, so that custom steps may be introduced at any stage of the request process. Examples of such modules include the standard distribution mod_cgi, a module for executing CGI scripts, and the third party module mod_gunzip for uncompressing files on-the-fly.
However, until a couple of years ago, the power of the Apache API could only be tamed using the C language. The overhead of writing and testing a module in C has meant that modules have largely been limited to those in the core server package and those that need the performance benefits of C (e.g. when a CGI-based quick hack will not suffice). Fortunately, the introduction of mod_perl in 1996 by Doug MacEachern changed things. Because the module contains a Perl interpreter, an interpreter is embedded within the server itself. This can mean significantly increased performance for Perl CGI scripts, but the primary benefit is that mod_perl supplies Perl programmers with a direct line to the Apache API via Perl objects and method calls. The outcome is that server-side Perl programs can offer far more versatility than traditional CGI scripts, which cannot interact with the server at different phases of the request process.
|Q:||What is Perl?|
|A:||Perl is a language for getting your job done.|
The above explanation opens the preface of Programming Perl and strikes me as a very suitable synopsis. In the Unix world Perl enjoys huge popularity and has been adopted as standard on many systems. The reasons for its success are many and include a scripting language/programming language duality, usability, support and the fact that it is free (open-source).
It is a language eminently suitable for rapid application development, where it may assume a programming language role, or for smaller projects such as a CGI application, where it may be treated more as a scripting language. Results are generally easy to achieve with Perl ("you don't have to know all about Perl to work with it") and it is flexible enough to suit the needs of many different types of application.
Support for Perl is vast. A glance at CPAN shows modules that can provide everything from TCP network support to cryptography. There are also many script repositories, discussion groups and mailing lists covering Perl and Perl specifics.
A full treatise on the benefits of the Perl programming language is the subject of numerous articles and books.
What can you do with mod_perl?
mod_perl is more than a scripting language. It is a unification of Apache with Perl, meaning that much of Apache can be controlled from Perl (including its configuration, which can therefore be dynamic!). Although Perl can be embedded within HTML documents (through ePerl, mod_perl extended Server Side Includes or other methods), mod_perl is usually used to supply the functionality that allows Apache extension modules to be written in Perl.
The benefits of choosing mod_perl are the combined benefits of using Perl and Apache. There is a great deal of support available for mod_perl, through mailing lists and through repositories of code. Through the Apache, Perl and mod_perl communities, a valuable (and free) knowledge base is available. Portability is another advantage, since a mod_perl system can be installed on virtually any flavour of Unix and on Microsoft Windows systems, and once installed it operates in an efficient and stable way. The reliability and stability of Apache is widely recognised.
As an open-source project, code undergoes constant review, and fixes and upgrades are frequently produced, keeping a mod_perl system in touch with current trends and technologies.
In order to appreciate how mod_perl can extend the server, it is useful to know the different processing states that Apache undergoes when a request is received:
- URI translation phase [work out what the URI refers to (a physical file, a virtual document or a document generated by a module)]
- Access control phase [from where does the request originate?]
- Authentication phase [who is making the request?]
- Authorisation phase [who is allowed to perform this request?]
- MIME type checking phase [what is the document's type? Select an appropriate content handler to deal with it]
- Response phase [who will generate the content for this document?]
- Logging phase [who will log this transaction?]
- Cleanup phase [who's going to clean up?]
External modules can define custom 'handlers' to enhance or supersede Apache's core behaviour at each phase. With mod_perl installed, these modules can be implemented in Perl. Some examples of what different kinds of handler could achieve are given below.
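Before turning to those examples, the way handlers slot into the phases listed above can be sketched in plain Perl, without Apache. This is a toy model and not the mod_perl API: the phase names, the `%handlers` table, the `run_request` function and the `OK`/`DECLINED` constants are all illustrative assumptions. They only mimic the general idea that a handler either deals with a phase or declines, in which case the server's default behaviour applies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy status codes, loosely modelled on the OK / DECLINED convention.
use constant { OK => 0, DECLINED => -1 };

# Phases in the order a request passes through them (names are illustrative).
my @phases = qw(translate access authen authz type response log cleanup);

# Hypothetical custom handlers: one code ref for each phase we customise.
my %handlers = (
    access   => sub { my $req = shift; return $req->{ip} =~ /^10\./ ? OK : DECLINED },
    response => sub { my $req = shift; $req->{body} = "Hello from Perl\n"; return OK },
);

# Walk the phases, recording whether a custom handler dealt with each one
# or whether the (imaginary) server default had to take over.
sub run_request {
    my ($req) = @_;
    my @trace;
    for my $phase (@phases) {
        my $handler = $handlers{$phase};
        my $status  = $handler ? $handler->($req) : DECLINED;
        my $how     = ($status == OK) ? 'custom' : 'default';
        push @trace, "$phase:$how";
    }
    return @trace;
}

my %req = (uri => '/index.html', ip => '10.0.0.7');
print join(' ', run_request(\%req)), "\n";
```

In a real mod_perl installation the server itself drives this loop, and handlers are attached to phases through the server configuration rather than through a hash.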
Example 1: transparently uncompress HTML on-the-fly
Handy for server administrators who are running out of disk space. This module would be a content handler acting as a file processor. The handler would be called on a request for an HTML file, say example.html. If the HTML file exists, the handler declines the request and allows Apache to deal with it as usual. If the file does not exist, the handler looks for example.html.gz, ungzips it on-the-fly, and sends it back to the user (who thinks they have retrieved a static HTML file). If neither file exists, the handler declines the request to let Apache dish out a 404. This module is in operation on parts of the Netcraft site.
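The decision logic described above can be sketched as a standalone Perl function. This is not the code of the module mentioned in the text: `gunzip_fallback` is a hypothetical name, and returning nothing stands in for "declining" the request. The sketch assumes the `IO::Uncompress::Gunzip` module that ships with modern Perl.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Return the uncompressed content of "$path.gz" when $path itself is missing.
# Return nothing (i.e. "decline") when the plain file exists or when neither
# version exists, leaving the caller to serve the file or a 404 as usual.
sub gunzip_fallback {
    my ($path) = @_;
    return if -e $path;              # plain file exists: decline
    my $gz = "$path.gz";
    return unless -e $gz;            # nothing to serve: decline
    my $html;
    gunzip($gz => \$html)
        or die "gunzip of $gz failed: $GunzipError\n";
    return $html;
}

my $content = gunzip_fallback('example.html');
print defined $content ? $content : "declined\n";
```

A real content handler would also set the response headers and return a status code to the server. Those steps are omitted here because they depend on the Apache API.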
Example 2: access control based on client attributes
A number of modules exist that allow different kinds of access control:
- Time-based, where access to parts of the site is restricted at certain times
- Browser-based, where parts of the site are restricted to certain user-agents
- Speed-based, where the site enforces a requests/sec limit and bans clients breaking this limit for a short length of time
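As an illustration of the speed-based case, the following plain-Perl sketch keeps a short history of request times per client and bans a client that exceeds the limit. It does not reproduce any existing module: the limits, the `allow_request` function and the in-memory hashes are assumptions chosen for the example, and a real access handler would need state shared between server processes.

```perl
use strict;
use warnings;

my $MAX_REQUESTS = 5;     # requests allowed per window (assumed value)
my $WINDOW       = 1;     # window length in seconds
my $BAN_TIME     = 30;    # length of a ban in seconds (assumed value)

my (%hits, %banned_until);    # per-client state, keyed by IP address

# Return 1 if the client may proceed, 0 if it is (or has just been) banned.
sub allow_request {
    my ($ip, $now) = @_;
    return 0 if ($banned_until{$ip} || 0) > $now;

    # Keep only the timestamps that fall inside the current window.
    my @recent = grep { $_ > $now - $WINDOW } @{ $hits{$ip} || [] };
    push @recent, $now;
    $hits{$ip} = \@recent;

    if (@recent > $MAX_REQUESTS) {
        $banned_until{$ip} = $now + $BAN_TIME;
        return 0;
    }
    return 1;
}

# Seven requests arriving within the same second from one client.
my $t = time;
for my $n (1 .. 7) {
    my $ok = allow_request('192.0.2.1', $t);
    print "request $n: ", ($ok ? "allowed" : "refused"), "\n";
}
```

With these settings the first five requests are allowed and the remaining two are refused, after which the client stays banned for the next thirty seconds.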
Example 3: cookie-based access control
The Apache::TicketAccess module was designed to handle the situation where user authentication is expensive. Instead of performing full authentication each time the user requests a page, the module only authenticates against a relational database the first time the user connects. After successfully validating the user's identity, the module issues the user a 'ticket' - an HTTP cookie carrying the user's name, IP address, expiration date and cryptographic signature. Until it expires, the ticket can be used to gain entry to the site under the control of Apache::TicketAccess.
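The idea of a signed ticket can be sketched in a few lines of plain Perl. This is not the Apache::TicketAccess implementation: the field layout, the `make_ticket` and `check_ticket` functions, the placeholder secret and the choice of HMAC-SHA256 (from the `Digest::SHA` module) are all assumptions made for illustration. The sketch also assumes that user names and addresses contain no colons.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

my $SECRET = 'change-me';    # server-side secret (placeholder value)

# Build a ticket: the three fields followed by a signature over them.
sub make_ticket {
    my ($user, $ip, $expires) = @_;
    my $data = join ':', $user, $ip, $expires;
    return $data . ':' . hmac_sha256_hex($data, $SECRET);
}

# Return the user name if the ticket is genuine, was issued to this IP
# address and has not expired; return nothing otherwise.
sub check_ticket {
    my ($ticket, $ip, $now) = @_;
    my ($user, $t_ip, $expires, $sig) = split /:/, $ticket, 4;
    return unless defined $sig;
    my $expected = hmac_sha256_hex(join(':', $user, $t_ip, $expires), $SECRET);
    return unless $sig eq $expected;                 # tampered or forged
    return unless $t_ip eq $ip && $expires > $now;   # wrong client or expired
    return $user;
}

my $ticket = make_ticket('alice', '192.0.2.10', time + 3600);
my $user   = check_ticket($ticket, '192.0.2.10', time);
print defined $user ? "welcome back, $user\n" : "please log in\n";
```

Because the signature covers every field, a client cannot alter the user name or extend the expiry date without invalidating the ticket. Only the initial login needs to consult the database.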
Example 4: Embedded scripting
Apache::ASP provides an Active Server Pages port to the Apache HTTP server with Perl as the host scripting language.
Running Perl CGI scripts under mod_perl
Although CGI scripts can work as-is when mod_perl is installed, those written in Perl can be run through Apache::Registry for a performance increase.
The Apache::Registry module allows legacy Perl CGI scripts (that the maintainer has no time to convert to modules) to be run under mod_perl. A CGI environment is emulated, and the CGI script is compiled and cached, ready in executable form whenever a request comes in.
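The compile-and-cache idea can be illustrated in plain Perl. The sketch below is a simplified model and not the Apache::Registry source: `run_cached`, the `%cache` hash and the script held in a string are hypothetical stand-ins. It shows a script body being wrapped in a subroutine, compiled once, and then reused for later requests.

```perl
use strict;
use warnings;

my %cache;           # script name => compiled code ref
my $compiles = 0;    # how many times something was actually compiled

# Compile the script source into a subroutine once, then reuse it.
sub run_cached {
    my ($name, $source) = @_;
    unless ($cache{$name}) {
        # Wrap the script body in an anonymous sub and compile it with eval.
        $cache{$name} = eval "sub { $source }"
            or die "compile of $name failed: $@";
        $compiles++;
    }
    return $cache{$name}->();
}

# A stand-in for a legacy CGI script, held in a string for the demonstration.
my $script = q{ return "Content-type: text/plain\n\nHello, world\n"; };

run_cached('hello.cgi', $script) for 1 .. 3;
print "requests: 3, compilations: $compiles\n";
```

Three requests are served here with a single compilation. A plain CGI setup, by contrast, starts a new Perl process and recompiles the script for every request.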
CPAN contains a wide range of Apache Perl modules.
What is the difference between mod_perl and ...?
Based upon Microsoft's COM and DCOM ([Distributed] Component Object Model) architectures, ActiveX provides a container for dynamic link libraries, called an ActiveX control, which can be created using the likes of Visual Basic or C++. Such controls can be downloaded to the client and run on the client machine. COM is supported by only a few operating systems and browsers, making ActiveX suitable only where the client is known to support it, such as a homogeneous intranet.
Microsoft's Active Server Pages (ASP) run on the IIS Web server and are a similar offering to PHP on Apache. Although ASP has been ported to other platforms and Web servers, these account for few of the sites deploying ASP. The full benefits of ASP are likely to be reaped within a Microsoft environment, where there can be integration with other Windows applications, for example, ASP access to an ISAPI filter.
ASP is the most widely used solution for providing server-side technologies using Windows NT (almost invariably with IIS).
The Common Gateway Interface is not a language but a protocol that describes how a Web client and server should interact when the client needs to send small amounts of information to the server via HTTP (the results of filling in a form, for example). Any server-side processing technology should be able to deal with CGI. An ASP enabled server, a mod_perl enabled server or a standalone script executed by the web server with output sent back to the client (a 'traditional' CGI program) should all be able to process the information sent via CGI.
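As a small illustration of the kind of information that travels via CGI, the following Perl sketch decodes a form submission passed in the QUERY_STRING environment variable, which a web server sets for GET requests. The `url_decode` and `parse_query` functions and the sample data are hypothetical. In practice a library such as CGI.pm would normally do this work, and it also handles cases ignored here, such as repeated field names and POST bodies.

```perl
use strict;
use warnings;

# Decode the %XX escapes and '+' (meaning a space) used in form submissions.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Parse a query string such as "name=Ada+Lovelace&lang=perl" into a hash.
# A repeated field name simply overwrites the earlier value in this sketch.
sub parse_query {
    my ($query) = @_;
    my %form;
    for my $pair (split /[&;]/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $form{ url_decode($key) } = url_decode(defined $value ? $value : '');
    }
    return %form;
}

# Use the real query string when run by a web server, sample data otherwise.
my %form = parse_query($ENV{QUERY_STRING} || 'name=Ada+Lovelace&lang=perl');
print "$_ = $form{$_}\n" for sort keys %form;
```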
There is sometimes a choice over whether to use server-side or client-side scripting. A few of the pros and cons are shown below:
|Client-side scripting||Server-side scripting|
|Have to make assumptions about client browser||Client browser does not affect processing|
|Client processor load increases slightly||Server load may increase significantly at times of high activity|
|Script needs to be downloaded||Script remains on the server (private)|
|Client can view script source||Client cannot view script source|
Server-side and client-side scripting can often be used in a complementary way. For example, a client-side script may check that the contents of an HTML form conform to certain rules before submitting the information to a server to be processed by a server-side script.
According to Sun, "a servlet can almost be thought of as an applet that runs on the server side -- without a face". Sun's Java Servlet API supplies 'hooks' via which server-side applications can be created. Servlets are embedded into a JavaServer web server, where applications use the API in much the same way as Apache modules use the Apache API.
The Jakarta project is an Apache working group dedicated to providing a pure Java Servlet and JavaServer implementation for use in the Apache Web Server. Until the fruits of this project are reaped, the Apache Jserv project aims to create an extension module that will allow other extension modules to be written in Java (rather than C or Perl).
For those sites scripted in Python, mod_pyapache embeds a Python interpreter into the server. mod_pyapache does not provide a Python interface to the Apache API.
PHP is an open source Apache module that allows scripts embedded within HTML to be processed by the server. PHP scripts may also occupy their own file. At a certain stage of the request cycle, mod_php will be called to deal with embedded PHP code, substituting it with output if necessary.
PHP is a powerful scripting language with syntax borrowed from 'C', Perl and Java. It has good support in various areas such as database interaction. As a script processor, PHP does not offer the wider functionality of mod_perl, but is a lighter weight and creditable solution for problems that can be solved using embedded scripting. With careful configuration, both mod_perl and mod_php can be installed on the same server.
The July 1999 Netcraft Web Server Survey shows that of 6,598,697 IP addresses, 8.7% (574,433 sites) were running PHP. Recent growth in the use of PHP has also exceeded the growth of the Apache server itself (see 'Who uses mod_perl?' for further discussion).
mod_perl increases the performance of Perl scripts by keeping a Perl interpreter in the server and using it to run pre-compiled scripts. An alternative performance booster is to keep scripts running as a co-process and have the web server communicate with that process when the script needs to be run. Two examples of script co-processing are FastCGI and mod_jserv.
Server Side Includes are a feature of many Web servers. They are designed for simple tasks, such as including a footer in HTML pages, or stamping the date. They lack the power of a scripting language, but can be very useful for simple tasks. An option when using mod_perl is to extend the standard Apache SSI mechanism to call Perl subroutines.
There are numerous other embedded scripting solutions; for further details see the references section.
When choosing a particular solution for server technologies, developers should be aware of the strengths and weaknesses of the various products, relative to their requirements. Proprietary products may work well with other proprietary products, but interoperability outside that may be weak. On the other hand, open standard solutions may interoperate well, but not offer the required functionality.
Who uses mod_perl?
The July 1999 Netcraft Web Server Survey found that of 6,598,697 IP addresses, 56% were running Apache. Of these, some 5% (202,081) were using mod_perl. Since January 1999, the average monthly growth rate of Apache has been around 9%, while among Apache users the average monthly growth of mod_perl has been around 16%. The increasing number of sites using mod_perl is therefore not simply due to the increasing use of Apache: mod_perl's share of Apache installations rose from 3.7% in January 1999 to 5.4% in July 1999.
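As a rough consistency check on these figures, the short Perl calculation below compounds the two monthly growth rates over the six months from January to July. It assumes that both rates describe growth in the number of sites and that they held steady each month, which is an interpretation of the survey figures rather than something the survey states.

```perl
use strict;
use warnings;

my $share_jan      = 3.7;     # mod_perl share of Apache sites, January 1999 (%)
my $apache_growth  = 0.09;    # average monthly growth of Apache
my $modperl_growth = 0.16;    # average monthly growth of mod_perl
my $months         = 6;       # January to July

# Each month the share is multiplied by the ratio of the two growth factors.
my $share_jul = $share_jan
    * ((1 + $modperl_growth) / (1 + $apache_growth)) ** $months;

printf "projected July share: %.1f%%\n", $share_jul;
```

Under those assumptions the projected share comes out at about 5.4%, in line with the July figure quoted above.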
The increase in take-up of mod_perl likely reflects the current trends shown in e-commerce, commercial, and other large sites to move beyond using a vanilla Web server to serve static pages.
These trends are based on the requirements to support ideas such as personalisation, e-commerce and banner advertising. Underlying server technologies are responsible for instantiating these concepts, and "behind the scenes", a server will be expected to offer support for things like custom authentication & access control, secure transactions, content negotiation and dynamic content (including database interaction and server-side scripting).
mod_perl has been chosen by many sites to provide the server technologies required by a modern web site. Examples include:
- The Internet Movie Database: mod_perl has been used to make efficient interactive database queries through a query cache. It also supports language negotiation.
- Metacrawler: All requests to this popular metasearch engine are routed through a Perl module.
- O'Reilly and Associates: Access control to the online books site is provided through mod_perl.
- HotBot: mod_perl is used for the HotBot mail and HotBot homepages applications.
- Slashdot: Slashdot.org - news for nerds is powered by Perl and MySQL.
- CMPnet: CMPnet is a technology information network. mod_perl is used to generate 70% of its pages (half a million hits per day). The CMPnet network includes TechWeb and FileMine.
- Lind-Waldock & Co: The world's largest discount commodities trading firm uses mod_perl under Stronghold to generate live and delayed quotes, dynamic charts and news. The system is integrated with a relational database used for customer authentication and transaction processing.
It is hoped that the above examples will provide some insight into the performance and scalability of mod_perl (since I didn't have enough time to write that section!).
mod_perl is a serious contender as a solution to providing a modern and feature-full website. Many of the benefits of a mod_perl system derive from the open-source licensing of Perl, Apache and mod_perl. This has ensured that all three products have evolved, through the scrutiny and review of experts and end-users alike, to embody the functionality and performance required of such products in today's Web space. Portability, scalability, efficiency and good security are all well-known features of the three products. Of course, a significant benefit for many web administrators is that the product is available without charge for all.
The integration of Perl with Apache gives the Web administrator a route to the rapid development of complex web applications that operate efficiently, have the potential to scale, and are free from proprietary constraints such as usage licenses or a dependence on a proprietary system for best operation.
Of course, any solution must be considered on the basis of exact requirements and available resources (including any existing electronic infrastructure). However, in many cases, full consideration should place mod_perl on the short list.
- The Apache/Perl Integration Project
- The Apache Server Project
- CPAN: Comprehensive Perl Archive Network
- The Number One HTTP Server On The Internet
- Apache API Notes
- Module mod_cgi
- Module mod_gunzip
- Doug MacEachern
- Programming Perl
- Google search for 'perl'
- Apache Performance Notes
- ASP Technology Feature Overview
- Internet Information Server
- Writing ISAPI Filters
- The Common Gateway Interface
- ECMAScript Language Specification
- Microsoft vbscript
- The Document Object Model (DOM)
- The Java(tm) Servlet API
- The Java Apache Project
- July 1999 Netcraft Web Server Survey
- The Internet Movie Database
- O'Reilly and Associates
- Lind-Waldock & Co