Oh, The Huge Manatee

A blog about technology, open source, and the web... from someone who works with all three.

The Complete Drupal Cache - Serving HTTP and HTTPS Content With Varnish

Varnish is a fantastic caching proxy, commonly used for CMSes. It’s not uncommon to see benchmarks boasting 300-500 page loads per second - I’ve seen benches up to 5000 hits per second. That’s faster than serving flat HTML from Apache; we’re talking about a serious benefit to your server load here.

Part of Varnish’s tremendous speed comes from how lean it is. At only 58,000 lines of code, it’s very lightweight. Unfortunately, this necessitates a no-frills approach. And SSL is a frill.

I think it’s very well put by Poul-Henning Kamp (lead developer on the Varnish project) in this mailing list post:

I have two main reservations about SSL in Varnish:

1. OpenSSL is almost 350.000 lines of code, Varnish is only 58.000,
Adding such a massive amount of code to Varnish footprint, should
result in a very tangible benefit.

Compared to running a SSL proxy in front of Varnish, I can see
very, very little benefit from integration. Yeah, one process
less and only one set of config parameters.

But that all sounds like “second systems syndrome” thinking to me,
it does not really sound lige a genuine “The world would become
a better place” feature request.

But I do see some some serious drawbacks: The necessary changes
to Varnish internal logic will almost certainly hurt varnish
performance for the plain HTTP case. We need to add an inordinate
about of overhead code, to configure and deal with the key/cert
bits.

2. I have looked at the OpenSSL source code, I think it is a catastrophe
waiting to happen. In fact, the only thing that prevents attackers
from exploiting problems more actively, is that the source code is
fundamentally unreadable and impenetrable.

Unless those two issues can be addressed, I don’t see SSL in Varnish
any time soon.

Ouch. But that doesn’t help those of us who want Varnish’s speed with SSL’s security. Really the only solution is to set up an SSL proxy in front of Varnish. There are lots of ways to do this. I will show you what I think is the easiest option: Pound and Varnish.

1) Set up Varnish


I assume that you’ve already got Apache up and running. Now we have to put Varnish in front of it. The first step is to move Apache off port 80, since that’s where Varnish is going to live. To do this, find the “Listen” line in Apache’s configuration. On a standard install, it reads something like:

Listen 0.0.0.0:80

Change that to another port. 8080 is a popular choice, but it can really be anything above 1024. On Debian systems you can find this line in /etc/apache2/ports.conf; on CentOS it’s in /etc/httpd/conf/httpd.conf. If you’re not sure where it is, grep for the standard help text around it: grep -r "Change this to Listen on specific IP addresses" /etc/apache2/ (or /etc/httpd/ on CentOS).

You also want Apache to serve pages only to localhost, so outsiders can’t bypass Varnish and attack Apache directly. Modify the line to look like this:

Listen 127.0.0.1:8080
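If you’d rather script this change than make it by hand, here’s a minimal sketch using GNU sed (the -i and -E flags behave this way on GNU/Linux, not on BSD). The function name move_apache_listen is hypothetical, and it assumes the port-80 directive sits on its own line as Listen 80 or Listen <address>:80. It saves a .bak copy first and leaves other Listen lines, such as 443, alone. It does not touch VirtualHost blocks declared as *:80 (or a NameVirtualHost *:80 line on Apache 2.2), which need the same port change.

```shell
#!/bin/sh
# Hypothetical helper: move Apache's port-80 Listen directive to 127.0.0.1:8080.
# Usage: move_apache_listen /etc/apache2/ports.conf    (Debian)
#        move_apache_listen /etc/httpd/conf/httpd.conf (CentOS)
move_apache_listen() {
    conf="$1"
    # Keep a backup so the change is easy to undo.
    cp -p "$conf" "$conf.bak" || return 1
    # Match "Listen 80" or "Listen <address>:80" on a line of its own and
    # rewrite it; "Listen 443", "Listen 8080", etc. do not match.
    sed -i -E \
        's/^[[:space:]]*Listen[[:space:]]+([0-9.]+:)?80[[:space:]]*$/Listen 127.0.0.1:8080/' \
        "$conf"
}
```

Check the result with grep -n '^Listen' on the same file before restarting anything.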

Now let’s install and configure Varnish. On Debian/Ubuntu you can install it from the apt repositories: apt-get install varnish. On CentOS, you first have to add the right repository for yum: install the Extra Packages for Enterprise Linux (EPEL) repo via RPM, using the link for your version and architecture from the EPEL site. I used:

sudo rpm -Uvh http://fr2.rpmfind.net/linux/epel/5/x86_64/epel-release-5-4.noarch.rpm
sudo yum install varnish

Varnish is configured in two places. General command-line options that are passed directly to the daemon are set in /etc/sysconfig/varnish (/etc/default/varnish on Debian/Ubuntu), and the proxy’s specific behaviors are configured in a .vcl file stored in /etc/varnish.

Varnish is extremely configurable and tunable, but this guide will focus on the basics you need for Drupal 7 (Drupal 6 only works if you use Pressflow rather than vanilla Drupal, but that’s well documented elsewhere). First, edit the daemon options in /etc/sysconfig/varnish. The default file gives you four alternative configurations to choose from; we want configuration 2, the first one that uses a .vcl. Uncomment its DAEMON_OPTS lines, change the listen port to 80, and name your own .vcl file. Here are my mostly default DAEMON_OPTS:

DAEMON_OPTS="-a :80 \
-T localhost:6082 \
-f /etc/varnish/swearingatcomputers.com.vcl \
-u varnish -g varnish \
-s file,/var/lib/varnish/varnish_storage.bin,1G"

Save the file. Now we’ll set up the .vcl file that configures the proxy itself. This is my .vcl; you can pretty safely just dump it into the .vcl you named in DAEMON_OPTS above:


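# VCL for Varnish 2.x (it uses "set req.hash +=", which Varnish 3 replaced with hash_data()).
# The back end is Apache, which we moved to 127.0.0.1:8080 above.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}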

sub vcl_recv {
    # // Remove has_js and Google Analytics __* cookies.
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
    # // Remove a ";" prefix, if present.
    set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
    # // Remove empty cookies.
    if (req.http.Cookie ~ "^\s*$") {
        unset req.http.Cookie;
    }

    # // fix compression per http://www.varnish-cache.org/trac/wiki/FAQ/Compression
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate" && req.http.user-agent !~ "MSIE") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }

}

sub vcl_hash {
    if (req.http.Cookie) {
        set req.hash += req.http.Cookie;
    }
}

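The bulk of this file deals with cookies and compression. By default Varnish won’t serve a request that carries cookies from its cache, so vcl_recv strips the cookies Drupal doesn’t care about (has_js and the Google Analytics __* cookies) and removes the Cookie header entirely if nothing is left. Requests that still carry a cookie, such as a logged-in user’s session, go through to Apache as before, and vcl_hash adds any remaining Cookie header to the cache key. The compression block normalizes the Accept-Encoding header, so Varnish doesn’t store a separate copy of each page for every browser’s way of spelling “gzip”.

The only part you should need to edit is the backend default { block at the top. This is where you tell Varnish about all the back ends it caches for. Varnish is a great load balancer, so if you had five back-end servers all serving content, you could list them here, each with its own backend declaration. If you want to load balance, see a different guide; we’re just interested in caching for now. So set the .host and .port values to match your Apache. If you followed the step above, the values shown (127.0.0.1 and 8080) are already right.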
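To make the two vcl_recv rewrites concrete, here’s a shell sketch that approximates them outside Varnish, so you can feed it sample headers and see what comes out. The function names strip_cookies and normalize_accept_encoding are hypothetical, and sed -E and shell case patterns stand in for VCL’s regsuball() and ~ matching; it models the logic, it isn’t something Varnish runs. An empty result means the VCL would unset (or remove) that header.

```shell
#!/bin/sh
# Hypothetical helpers that mimic the vcl_recv logic above, for experimenting.

# Mimics the Cookie clean-up: drop has_js and __* (Google Analytics) cookies,
# then strip a leading "; ". Empty output means the Cookie header would be unset.
strip_cookies() {
    printf '%s\n' "$1" \
        | sed -E 's/(^|;[[:space:]]*)(__[a-z]+|has_js)=[^;]*//g' \
        | sed -E 's/^;[[:space:]]*//'
}

# Mimics the Accept-Encoding normalization.
# Args: URL, Accept-Encoding value, User-Agent. Prints the normalized value;
# empty output means the header would be removed.
normalize_accept_encoding() {
    url="$1"; ae="$2"; ua="$3"
    [ -n "$ae" ] || return 0            # no header: nothing to normalize
    case "$url" in
        *.jpg|*.png|*.gif|*.gz|*.tgz|*.bz2|*.tbz|*.mp3|*.ogg)
            return 0 ;;                 # already-compressed file types: drop it
    esac
    case "$ae" in
        *gzip*) echo gzip; return 0 ;;  # prefer gzip whenever it is offered
    esac
    case "$ae" in
        *deflate*)
            case "$ua" in
                *MSIE*) ;;              # the VCL never hands deflate to MSIE
                *) echo deflate; return 0 ;;
            esac ;;
    esac
    return 0                            # unknown algorithm: drop it
}
```

For example, strip_cookies 'has_js=1; __utma=123; SESSabc=xyz' prints SESSabc=xyz: the Drupal session cookie survives, so that request would still bypass the cache.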

Now test the whole thing by restarting Apache and starting Varnish (on Debian/Ubuntu, the Apache service is called apache2 rather than httpd):

sudo service httpd restart
sudo service varnish restart

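If you don’t see any errors, you’re good to go! If you just get a generic [FAILED] for Varnish without any error messages, there’s probably a syntax problem with your VCL. Running varnishd -C -f /etc/varnish/swearingatcomputers.com.vcl compiles the file by hand and prints the actual error.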

2) Set up your SSL certs for Pound


Create your server’s private key and certificate request. I get confused easily between the different certs, so I name them in an idiot-proof way that you might want to copy:

openssl req -new -newkey rsa:2048 -nodes -keyout swearingatcomputers.com.private.key -out swearingatcomputers.com.certreq.pem

Traditionally, your private key goes in /etc/ssl/private on Debian/Ubuntu, or /etc/pki/tls/private on CentOS. It doesn’t really matter, but this gives you a nice central place to store your keys.

Now use that certificate request to get a signed cert. I get mine on the cheap from Godaddy ($50/yr is hard to beat!), but if you just want to test, you can make a self-signed cert like this:

openssl x509 -req -days 365 -in swearingatcomputers.com.certreq.pem -signkey swearingatcomputers.com.private.key -out swearingatcomputers.com.selfsigned.crt

The signed cert typically goes in /etc/ssl/certs (/etc/pki/tls/certs on CentOS).
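Before going further, it’s worth confirming that the certificate you got back really belongs to your private key; a mismatched pair is an easy mistake to make when juggling several similarly named files. A minimal sketch, assuming RSA keys like the one generated above: key_matches_cert is a hypothetical helper that compares the RSA modulus reported by openssl rsa and openssl x509.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the RSA private key ($1) and the
# certificate ($2) share the same modulus, i.e. belong together.
key_matches_cert() {
    key_mod=$(openssl rsa -noout -modulus -in "$1") || return 1
    crt_mod=$(openssl x509 -noout -modulus -in "$2") || return 1
    [ "$key_mod" = "$crt_mod" ]
}

# Example (paths from this article):
# key_matches_cert /etc/ssl/private/swearingatcomputers.com.private.key \
#     /etc/ssl/certs/swearingatcomputers.com.crt && echo "key and cert match"
```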

For a normal SSL setup, this is all you need. But Pound wants the certificate and the private key together in a single file, so we’re going to make a special combined version for Pound.


openssl x509 -in /etc/ssl/certs/swearingatcomputers.com.crt -out /etc/ssl/private/swearingatcomputers.com.combined.pem
openssl rsa -in /etc/ssl/private/swearingatcomputers.com.private.key >> /etc/ssl/private/swearingatcomputers.com.combined.pem
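The same two commands can be wrapped in a small script that also locks down permissions (the combined file contains your private key) and checks that both parts made it in. A minimal sketch; make_pound_pem is a hypothetical name. It accepts either private-key header, because newer OpenSSL releases may write BEGIN PRIVATE KEY instead of BEGIN RSA PRIVATE KEY.

```shell
#!/bin/sh
# Hypothetical helper: build Pound's combined certificate + key file.
# Args: certificate, private key, output file.
# The body runs in a subshell ( ... ) so the umask change stays local.
make_pound_pem() (
    crt="$1"; key="$2"; out="$3"
    umask 077                                   # new file readable by owner only
    openssl x509 -in "$crt" -out "$out" || exit 1
    openssl rsa -in "$key" 2>/dev/null >> "$out" || exit 1
    chmod 600 "$out"                            # in case $out already existed
    # Sanity check: the file must contain a certificate and a private key.
    grep -q -- '-----BEGIN CERTIFICATE-----' "$out" || exit 1
    grep -Eq -- '-----BEGIN (RSA )?PRIVATE KEY-----' "$out"
)
```

Usage, with this article’s paths: make_pound_pem /etc/ssl/certs/swearingatcomputers.com.crt /etc/ssl/private/swearingatcomputers.com.private.key /etc/ssl/private/swearingatcomputers.com.combined.pem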

Now we’re ready to set up Pound.

3) Set up the Pound SSL proxy


This part surprised me with how easy it was. Pound is a great system that is very simple to configure! Install it using apt-get or yum (yum install pound), then configure it in /etc/pound.cfg (/etc/pound/pound.cfg on Debian/Ubuntu).

First comment out or delete the ListenHTTP section. We don’t want Pound to listen on port 80 at all.

Then we’ll set up the ListenHTTPS section. Apart from telling it to listen on port 443 on all interfaces and giving it the combined certificate, we’re going to make sure it sets a special header to notify Drupal that the request was forwarded from an HTTPS proxy, first removing any copy of that header sent by the client so it can’t be spoofed. We’re also going to allow PUT and DELETE requests, which Pound rejects by default. Then at the end, we tell it where to find the back end: Varnish, on port 80. Here’s my Pound config:


User "pound"
Group "pound"
Control "/var/lib/pound/pound.cfg"

#ListenHTTP
#    Address 0.0.0.0
#    Port 80
#End

ListenHTTPS
    Address 0.0.0.0
    Port 443
    # Combined certificate + private key built in step 2
    Cert "/etc/ssl/private/swearingatcomputers.com.combined.pem"

    # set X-Forwarded-Proto so D7 knows we're behind an HTTPS proxy.
    HeadRemove "X-Forwarded-Proto"
    AddHeader "X-Forwarded-Proto:https"

    # Allow PUT and DELETE too (xHTTP 0, the default, accepts only GET, POST and HEAD)
    xHTTP 1
End

Service
    BackEnd
        Address 127.0.0.1
        Port 80
    End
End
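The xHTTP setting is easy to get backwards, so here’s a tiny shell sketch encoding what Pound’s documentation says about its first two levels: 0 (the default) accepts only GET, POST and HEAD, while 1 and above also accept PUT and DELETE. pound_allows is a hypothetical helper for illustration only; the WebDAV verbs that higher levels add are not modelled.

```shell
#!/bin/sh
# Hypothetical helper: would a Pound listener with "xHTTP <level>" accept <method>?
# Only the verbs discussed in this article are modelled; levels are cumulative.
pound_allows() {
    level="$1"; method="$2"
    case "$method" in
        GET|POST|HEAD) return 0 ;;          # accepted at every level, including 0
        PUT|DELETE) [ "$level" -ge 1 ] ;;   # needs xHTTP 1 or higher
        *) return 1 ;;                      # anything else: not modelled here
    esac
}
```

So pound_allows 0 PUT fails, while pound_allows 1 PUT succeeds.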

Save the config file and start Pound with service pound start. There you go: you’ve got an HTTPS forwarder.

4) Make Drupal HTTPS aware


One big problem with the setup so far is that Drupal doesn’t know it’s serving HTTPS content. Remember, as far as Apache is concerned, it’s just HTTP served in the clear to Varnish. Even Varnish doesn’t really know about the HTTPS on the front end. We’re going to follow this X-Forwarded-Proto:https header back through the stack to make sure that every level interprets it properly.

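First we deal with Varnish. Varnish already passes the X-Forwarded-Proto header through to Apache untouched, but it also needs to cache the HTTP and HTTPS versions of each page separately; otherwise a page cached for an HTTPS visitor (with https:// links) could be served to an HTTP visitor, or vice versa. Find the sub vcl_hash section of your .vcl file, /etc/varnish/swearingatcomputers.com.vcl, and add these lines: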


    if (req.http.x-forwarded-proto) {
        set req.hash += req.http.x-forwarded-proto;
    }

If you’re using my template above, the whole section will look like this:


sub vcl_hash {
    if (req.http.Cookie) {
        set req.hash += req.http.Cookie;
    }
    if (req.http.x-forwarded-proto) {
        set req.hash += req.http.x-forwarded-proto;
    }
}

You’ll have to restart Varnish after making this change.
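Why does adding the header to the hash matter? Varnish’s built-in vcl_hash already keys each object on the URL and the Host header (or the server IP when there is no Host), and the vcl_hash above adds the Cookie header and now X-Forwarded-Proto. Plain HTTP visitors reach Varnish directly and carry no such header, while Pound adds it to every HTTPS request, so the two now land on different cache entries. The sketch below is a conceptual model only: cache_key is a hypothetical helper and does not reproduce Varnish’s actual hashing.

```shell
#!/bin/sh
# Conceptual model only (hypothetical helper, not Varnish's real code):
# combine the request fields that feed the cache key and hash them.
# Args: URL, Host, Cookie (may be empty), X-Forwarded-Proto (may be empty).
cache_key() {
    key="$1|$2"
    if [ -n "$3" ]; then key="$key|$3"; fi   # added by our vcl_hash: Cookie
    if [ -n "$4" ]; then key="$key|$4"; fi   # added by our vcl_hash: X-Forwarded-Proto
    printf '%s' "$key" | sha256sum | cut -d' ' -f1
}
```

Running cache_key / swearingatcomputers.com '' '' and cache_key / swearingatcomputers.com '' https prints two different keys, which is exactly the separation we want.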

Now let’s make sure that Drupal knows to look for this header. D7 has some variables for this in its settings.php, just waiting to be uncommented. You can walk through the explanations in the file itself and uncomment the relevant lines, or just add this at the end:


# Settings for Varnish - tell Drupal that it's behind a reverse proxy

$conf['reverse_proxy'] = TRUE;
$conf['reverse_proxy_addresses'] = array('127.0.0.1');

$conf['page_cache_invoke_hooks'] = FALSE;

# Settings for HTTPS cache - tell Drupal that forwarded https is the real thing
if (isset($_SERVER['HTTP_X_FORWARDED_PROTO']) &&
    $_SERVER['HTTP_X_FORWARDED_PROTO'] == 'https') {
  $_SERVER['HTTPS'] = 'on';
}
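If you manage several sites, appending this block by hand gets old. Here’s a minimal sketch that appends it once and skips files that already have it; add_drupal_proxy_settings is a hypothetical name, and the marker it looks for is simply the first comment line of the block. Two assumptions: your settings.php doesn’t end with a closing ?> tag (Drupal 7’s default file doesn’t, and anything after ?> would be sent to the browser as text), and the file is writable, which it usually isn’t after installation, so you may need to loosen its permissions temporarily.

```shell
#!/bin/sh
# Hypothetical helper: append the reverse-proxy/HTTPS settings to a Drupal 7
# settings.php, but only if they are not already there.
add_drupal_proxy_settings() {
    settings="$1"                 # e.g. sites/default/settings.php
    marker="# Settings for Varnish - tell Drupal that it's behind a reverse proxy"
    if grep -qF "$marker" "$settings"; then
        return 0                  # already present: don't add a second copy
    fi
    # Quoted 'EOF' keeps the shell from expanding the PHP $variables.
    cat >> "$settings" <<'EOF'

# Settings for Varnish - tell Drupal that it's behind a reverse proxy
$conf['reverse_proxy'] = TRUE;
$conf['reverse_proxy_addresses'] = array('127.0.0.1');
$conf['page_cache_invoke_hooks'] = FALSE;

# Settings for HTTPS cache - tell Drupal that forwarded https is the real thing
if (isset($_SERVER['HTTP_X_FORWARDED_PROTO']) &&
    $_SERVER['HTTP_X_FORWARDED_PROTO'] == 'https') {
  $_SERVER['HTTPS'] = 'on';
}
EOF
}
```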

5) Test and brag


That’s it - you should have your proxy configured! A simple way to check that it’s working is to watch varnishlog for cache hits. Run varnishlog | grep hit in a terminal, then refresh the front page of your site a few times in a browser where you’re not logged in (a logged-in session cookie sends requests straight through to Apache). You should see a few lines of hits pop up in the log. (If not, try grepping for “pass” or “miss” to help work out what’s happening.)

Now let’s see how this caching holds up under load. After all, that’s the whole point, right? I like a simple ab (ApacheBench) test:

ab -c 40 -n 5000 -q http://swearingatcomputers.com/

This sends 5000 requests to the front page, 40 at a time (-c sets the number of concurrent requests, not a rate). Look for “Requests per second”; that’s my favorite statistic here. On my “playing around” Amazon Micro instance, I get about 650 requests per second. In theory, this smallest of VPS servers could handle over 2 million hits per hour (650 × 3,600 = 2,340,000)!
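If you run ab repeatedly (say, while tuning), it’s handy to pull that one number out automatically. A minimal sketch: rps_from_ab and per_hour are hypothetical helper names, and the awk pattern assumes ab’s usual summary line, Requests per second: <number> [#/sec] (mean).

```shell
#!/bin/sh
# Hypothetical helpers for summarizing an ApacheBench (ab) run.

# Read ab's report on stdin and print the mean requests-per-second figure.
rps_from_ab() {
    awk '/^Requests per second:/ { print $4 }'
}

# Scale a requests-per-second figure to requests per hour (truncated).
per_hour() {
    awk -v rps="$1" 'BEGIN { printf "%d\n", rps * 3600 }'
}

# Example:
# rps=$(ab -c 40 -n 5000 -q http://swearingatcomputers.com/ | rps_from_ab)
# echo "$rps req/s, roughly $(per_hour "$rps") req/hour"
```

For instance, per_hour 650 prints 2340000.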

I love Varnish.
