First of all, my previous post on Nginx vs. Lighttpd for a small VPS seems to have been a “hit”. Thanks to Glenn for submitting it to programming.reddit.com, where it got picked up by a few bloggers. I got around 1,000 unique visitors on the day the post went live, which I guess is pretty good for this one-post-a-week site of mine.

Since I briefly touched on the “URL rewrite-ability” of Nginx and Lighttpd in my previous post, I think it might be useful to have some examples showing how rewrite rules are written on these web servers to support clean URLs. I will take the open source CMS Drupal as the example, since it is what Hosting Fu runs on. Btw, Drupal 5.0 has just been released and it rocks.

Prerequisite

These are the things that I assume you would know before reading this article.

  • Why clean URLs. Pick your reason: SEO, or a general dislike of query strings.
  • Setting up PHP. I won’t be talking about how to set up PHP/FastCGI on Nginx or Lighttpd. Here’s one for Nginx and one for Lighttpd.
  • Installing Drupal. Check the related section in the Drupal handbook.

I am only going to discuss the rewrite rules needed to enable clean URLs in Drupal on either Nginx or Lighttpd.

Apache’s Mod_Rewrite

Like most open source PHP applications, Drupal comes with a .htaccess file that assumes Apache is serving the pages. We will use it as the reference for how the rewrite rules can be written for the other two web servers.

Here’s the bit in .htaccess that does rewrites:

RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

What it does is:

  • If the requested file exists, serve it.
  • If the requested directory exists, serve it according to how the index option is configured.
  • Otherwise, send the request to index.php, setting the parameter ‘q’ to the path of the original request and appending the rest of the query string.
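As a rough illustration of the decision those rules make, here is a small Python model. It is only a sketch of the logic, not how Apache implements it. The function name `resolve` and the `docroot` argument are made up for this example.

```python
import os


def resolve(docroot, path, query=""):
    """Model the Drupal .htaccess rules for one request.

    Returns the URL that would actually be served. `path` is the
    request path with a leading slash, `query` the raw query string.
    """
    local = os.path.join(docroot, path.lstrip("/"))

    # RewriteCond !-f / !-d: an existing file or directory is served
    # as-is, so the RewriteRule is skipped.
    if os.path.isfile(local) or os.path.isdir(local):
        return path + ("?" + query if query else "")

    # RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
    # In .htaccess context the leading slash is not part of the match.
    target = "/index.php?q=" + path.lstrip("/")
    if query:
        # QSA: append the original query string.
        target += "&" + query
    return target
```

With a document root that contains robots.txt but no node directory, `resolve(docroot, "/robots.txt")` leaves the URL alone, while `resolve(docroot, "/node/5", "page=2")` gives `/index.php?q=node/5&page=2`.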

Simple. Now let’s see how you can do the same with the other two web servers.

Lighttpd

This site runs on Lighttpd 1.4.13, and there seem to be a lot of different ways to do clean URLs with Drupal on Lighttpd. Although it comes with its own “mod_rewrite”, I found its functionality very limited. The biggest problem is the lack of conditional rules to check whether a file or directory already exists. In the end you have to either (1) make lots of exceptions in Lighttpd’s rewrite rules, or (2) modify Drupal so it works better with Lighttpd.

I went with the second option.

Well, here are the steps.

  1. After Drupal has been installed and tested with clean URLs disabled, add the following rule to lighty’s configuration file:

    server.error-handler-404 = "/index.php"
    

    It basically makes index.php the 404 error handler, so any request that is not handled by a local file or directory will be sent to Drupal.

  2. Add the following PHP code to Drupal. I just appended it to the end of sites/default/settings.php.

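    // Clean URL hack: only active when served by Lighttpd.
    if (strpos($_SERVER['SERVER_SOFTWARE'], 'lighttpd') !== false) {
        // Split REQUEST_URI into its path and query string.
        $_lighty_url = $base_url.$_SERVER['REQUEST_URI'];
        $_lighty_url = @parse_url($_lighty_url);
        // Avoid an undefined index when the request has no query string.
        $_lighty_qs = isset($_lighty_url['query']) ? $_lighty_url['query'] : '';

        if ($_lighty_url['path'] != '/index.php' && $_lighty_url['path'] != '/') {
            // Rebuild what Lighttpd does not pass to the 404 handler.
            $_SERVER['QUERY_STRING'] = $_lighty_qs;
            parse_str($_lighty_qs, $_lighty_query);
            foreach ($_lighty_query as $key => $val) {
                $_GET[$key] = $_REQUEST[$key] = $val;
            }
            // The path, minus its leading slash, becomes Drupal's q argument.
            $_GET['q'] = $_REQUEST['q'] = substr($_lighty_url['path'], 1);
        }
    }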
    

    Let me explain what it does:

    • If we are behind lighty, turn on this hack (I use Nginx on my development box, so this code does not apply there).
    • Try to parse the REQUEST_URI. If invoked as the 404 error handler, parse the query string ourselves and copy the values to PHP’s $_GET and $_REQUEST variables.
    • Also set the path part of REQUEST_URI as the query argument q.

    The reason we have to parse the query string ourselves is that Lighttpd deliberately does not set QUERY_STRING when FastCGI is invoked as the 404 error handler.

  3. Restart lighty, then go to Drupal, enable clean URLs and see whether it works.

Well, it has been working fine for me, but YMMV.
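To make the effect of the settings.php hack in step 2 easier to follow, here is the same parsing modelled in Python. This is a sketch for illustration only, not code to deploy. The function name `lighty_params` is invented here, and PHP's array-style parameters (such as `a[]=1`) are not modelled.

```python
from urllib.parse import parse_qsl, urlsplit


def lighty_params(request_uri):
    """Return the GET parameters the hack would build, or None
    when the request is left untouched."""
    parts = urlsplit(request_uri)

    # Direct hits on the front controller need no fixing.
    if parts.path in ("/index.php", "/"):
        return None

    # Rebuild the parameters from the raw query string, because the
    # 404 handler does not receive QUERY_STRING.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))

    # The path, minus its leading slash, becomes Drupal's q argument.
    params["q"] = parts.path[1:]
    return params
```

A request for `/node/5?page=2` therefore ends up with both `page` and `q` set, which is what Drupal would have seen behind Apache's rewrite rule.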

Nginx

Nginx supports conditions in its rewrite rules, so it is much easier. I basically have the following code in my Nginx configuration file to emulate Apache’s behaviour.

if (!-e $request_filename) {
    rewrite ^/(.*)$ /index.php?q=$1 last;
}

That’s it! Restart Nginx, and you can now turn on clean URLs in Drupal.

However, Nginx is not perfect, and so far I have found one small issue with its rewrite engine. When you use a regular expression in an Nginx rewrite rule, Nginx will try to encode the matches in the replacement URL. So far I have seen it break Drupal’s search module. For example, if you search for “Hosting Fu”, Drupal will use the following URL:

GET /search/node/Hosting+Fu HTTP/1.1

However, Nginx will rewrite that to:

GET /index.php?q=search/node/Hosting%2BFu HTTP/1.1

Note that it encoded ‘+’ as ‘%2B’, which confuses Drupal into thinking that you are actually searching for the phrase ‘Hosting+Fu’. With Apache, the ‘+’ passes through the rewrite rules untouched.
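The difference is easy to reproduce outside of either server. In a query string, a literal ‘+’ decodes to a space, while ‘%2B’ decodes to a real plus sign. The snippet below shows this with Python's standard library. It only illustrates the decoding convention, which PHP also follows when it fills $_GET; it says nothing about Nginx internals.

```python
from urllib.parse import parse_qs

# What Apache hands to PHP: the '+' is still a form-encoded space.
apache = parse_qs("q=search/node/Hosting+Fu")["q"][0]

# What Nginx hands to PHP: '%2B' is an escaped literal plus sign.
nginx = parse_qs("q=search/node/Hosting%2BFu")["q"][0]

print(repr(apache))  # 'search/node/Hosting Fu'
print(repr(nginx))   # 'search/node/Hosting+Fu'
```

So the search module receives two words in the first case and a single token containing a plus sign in the second.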

Conclusion

Many of the open source PHP applications that I have experience with assume the existence of Apache, and provide clean URLs only to Apache users. On the other hand, developers of other frameworks like Ruby on Rails, Django, Webpy, etc. take clean URLs for granted because they are handled right inside the framework. That makes the life of the web server guy much easier. What rewrite? Just proxy or pass through the whole damn thing!

I am hoping more and more PHP applications will use simplified rewrite rules. Let applications themselves take care of parsing the REQUEST_URI, instead of generating a million lines of Apache mod_rewrite rules and dumping them into .htaccess files.

Meanwhile, Nginx users will have a much easier time porting those rules than Lighttpd users.