[DRAFT] Using Squid + Apache for Maximum Web App Performance

March 17, 2009

In this guide, I’m going to go through how I’ve configured high-performance web applications in the past using Squid and Apache to front-end the application server(s).

I was not completely alone in this, and need to thank a peer of mine, Tyler Hutcheon, for doing the initial research on Squid.

I’ve personally found this setup highly stable - I have one environment that processes about 25M requests per month and gets recycled (stopped and started again) maybe 3-4 times a year, purely for administrative purposes.

I’ve used this technique in front of Rails and Java applications of various types. I find it’s quite useful for taking a great deal of load off of the application server itself, given that many web app pages these days are assembled from many separate pieces of JavaScript, CSS and images. And many frameworks (like JBoss Seam on the Java side, for example) really don’t provide any way to consolidate JavaScript or CSS into single files, so you end up with quite a few requests back to the server for these individual fragment files.

Earlier this year, I had an application with a very complex table-based page containing a tremendous amount of data. Using these techniques, I cut its render time from 30-40 seconds down to 5-8 seconds - strictly by injecting appropriate cache-control headers and keeping those requests from distracting the application server. I continued to use only a single application server instance.

I also need to credit YSlow here - without this tool, I was completely lost as to where our performance was going. We were blaming all kinds of crazy things while looking for the cause of the slow page-load time, and it simply turned out to be the latency of assembling half a dozen or so JavaScript files, a dozen or so CSS fragments, and a couple hundred images.

A word of caution, however - this is not a magic bullet. The first time you go through this exercise, you will discover all kinds of things you’ll want to change in your application to make it cache-friendly, and you won’t be able to take full advantage of this environment until you do. Once tuned, though, you can realize some fairly dramatic gains, and if you re-use these techniques on future applications you can achieve them with no additional programming overhead.

Also, whenever you deal with fixed-time caching, you trade performance against stale data. An important attribute to consider is what I call the immediacy of your data. Take, for example, product image thumbnails on an e-commerce site: if one changes, how quickly does that change need to be reflected on the public site? 3 seconds? 3 minutes? 3 hours? 3 days?

A cache time as low as 1 second can improve performance by a factor of 10 when that asset is being requested 10 times per second - only the first request each second reaches the application server; the other nine are served from cache.
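To make that concrete, here’s one way to inject such a header from Apache for image thumbnails. This is just a sketch - it assumes mod_headers is loaded, and the 3-minute lifetime is one possible answer to the immediacy question above:

<LocationMatch "\.(gif|jpe?g|png)$">
    Header set Cache-Control "public, max-age=180"
</LocationMatch>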

So let’s get started…

1. Your Application

Set up and start your application, and make sure it’s working correctly.

On my production servers, I like to run the application on a private network interface or on an IP alias assigned to the loopback device. In a production environment, there should be no reason to expose your application directly to the Internet.

For a Rails process-based environment where you may want to start several Mongrel instances, sometimes I bind to the same IP address on multiple ports, and sometimes I set up multiple IP addresses - the choice is yours. Usually I end up re-using the same private IP address and assigning each Mongrel instance its own port. For the best performance, I typically start 2-4 Mongrel processes per CPU core in the server - but this depends on how your application is structured. If you have a lot of latency between your app server(s) and your database server(s) you may want to increase that number.
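As a sketch of the process-per-port approach, a mongrel_cluster configuration might look like this - the address, path and counts are illustrative, assuming a dual-core box:

# config/mongrel_cluster.yml - 3 processes per core x 2 cores
cwd: /var/www/myapp
environment: production
address: 192.168.254.10
port: "8000"
servers: 6
pid_file: tmp/pids/mongrel.pid

This starts six Mongrels on ports 8000-8005 when you run “mongrel_rails cluster::start”.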

For a Java thread-based environment, you only need to start multiple server instances if you are looking for fault tolerance - there is no need to do this for performance. If your application server’s front-end is Tomcat based, you will probably want to tune the number of threads down from the default (250!) to 2-4 times the number of CPU cores in your server, again depending on the latencies inherent in your application. In my opinion, 250 is a terrible default - I’ve seen so many apps crash and burn because of this when overloaded.
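For reference, that thread count lives on the Connector element in Tomcat’s server.xml. A sketch for a quad-core box, assuming the stock AJP connector:

<Connector port="8009" protocol="AJP/1.3" maxThreads="16" redirectPort="8443" />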

If you have several physical or virtual servers assigned to host your application server instances, you’ll want to make sure they’re connected by a private switched network that is segmented from your production traffic. I’ve done this as simply as using a crossover cable for a pair of hosts, with a segmented VLAN on a managed switch, or, on a VMware ESX server, with a separate private virtual switch dedicated to the application server farm.

2. Apache (v2.2) Load Balancer

The Apache HTTP Server’s mod_proxy module has a capable load balancer. The purpose of your Apache installation will be to handle incoming requests on a single private IP address and port, and proxy them back, using a load-balancing strategy, to all of your running application server instances.

If you are only using a single Java application server instance, there is actually no need for this step. However, if you want to remain able to expand your capacity quickly, you might set this up anyway so that it’s ready to extend at a moment’s notice.
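Before any of this will work, Apache needs the relevant proxy modules loaded. Depending on how your httpd was built they may already be compiled in; if not, the LoadModule lines look something like this (the module paths are the typical defaults - adjust for your install):

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so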

With those modules in place, the first step is to set up your balancer configuration. I have a block like this:

<Proxy balancer://publishing>
   BalancerMember ajp://192.168.254.1:8009 route=cms1 loadfactor=1
   BalancerMember ajp://192.168.254.2:8009 route=cms2 loadfactor=1
</Proxy>

We’re defining this balancer with the name “publishing”. Note also that I’m defining the members to use AJP (Apache JServ Protocol) to proxy back. You can use AJP or HTTP here.

The “route” parameter specifies a logical name for each member, and “loadfactor” lets you weight more powerful servers more heavily than less powerful ones. I’m using the same “loadfactor” for each, which means both members will get the same number of requests.
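If, say, the first box were twice as powerful as the second, you could weight it accordingly (the 2:1 split here is purely illustrative):

   BalancerMember ajp://192.168.254.1:8009 route=cms1 loadfactor=2
   BalancerMember ajp://192.168.254.2:8009 route=cms2 loadfactor=1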

Now you set up the virtual paths on your Apache server to pass requests back to your defined balancer(s).

<Location /cms/content>
    ProxyPass balancer://publishing/cms/content
</Location>

The virtual path “/cms/content” on your Apache server will now be proxied back to the “/cms/content” path on your balancer named “publishing”.

The content at this path for our CMS system is provided in a purely stateless manner. There is no session to worry about, so we can have any server in the farm deliver it.
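One aside: if you proxy with HTTP rather than AJP, you’ll generally want a matching ProxyPassReverse directive so that redirects issued by the backend get rewritten to the public URL (AJP backends usually don’t need this, since AJP forwards the original request host). A sketch, with a hypothetical “/app” path and HTTP backend:

<Location /app>
    ProxyPass http://192.168.254.1:8080/app
    ProxyPassReverse http://192.168.254.1:8080/app
</Location>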

If you’re passing back to an application, you may need to maintain session “stickiness” with a particular server. This means that when a visitor comes to your site, they’re serviced by the same server in your farm for the duration of their session. That matters because the other servers normally won’t know about the visitor’s session unless you’re using some kind of shared session mechanism - storing it in a database or a memcached instance, or using a Java application server with clustering enabled.

Here’s how to maintain stickiness:

<Location /cms>
    ProxyPass balancer://admin/cms stickysession=JSESSIONID
</Location>

Note the “stickysession” parameter. It names the cookie used to track the session - in this case JSESSIONID, the standard session cookie for Java web applications.
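One gotcha here: for stickiness to work, each backend has to append its route name to the session id so that mod_proxy_balancer can match it. On Tomcat, that means setting jvmRoute on the Engine element in server.xml to the same name you used in the BalancerMember’s “route” - for the cms1 member, something like:

<Engine name="Catalina" defaultHost="localhost" jvmRoute="cms1">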

Something you may want to do is open up the balancer-manager on your install so that you can get a little visibility into how the server’s doing. Add this block to set that up:

<Location /balancer-manager>
    SetHandler balancer-manager
    AuthType Basic
    AuthName "Cluster Manager"
    AuthUserFile /home/cmslb/apache/conf/users.htpasswd
    Require valid-user
</Location>

Don’t forget to create your .htpasswd file with your username and password using the “htpasswd -c my.htpasswd myusername” command and ensure your “AuthUserFile” directive above matches where that file is on your server.

Astute observers will notice that I’ve sandboxed Apache in its own user account (hence the /home/cmslb path above). This is just good security practice. You can sandbox each service individually, or, if you’re setting these things up in a multi-tenant environment, you might group all of the services under a particular tenant’s user account.

In terms of tuning, there really isn’t much to adjust here other than the balancer weighting for each node. Most tuning happens in your application and in Squid.

3. Squid Accelerator

You may be familiar with Squid as a caching proxy server for outgoing requests across a congested Internet connection, but you may not be familiar with it as an application accelerator for incoming requests.

The purpose of your Squid installation will be to handle public requests for URLs, and decide whether to immediately send back a cached version of the response or proxy the request back to your Apache installation, which will in turn balance it across your application server farm.
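To give you a taste of where this is going, here’s a minimal sketch of Squid in accelerator mode (Squid 2.6-style syntax; the site name and the Apache address and port are placeholders for illustration):

# squid.conf - accept public traffic and treat Apache as the origin server
http_port 80 accel defaultsite=www.example.com
cache_peer 127.0.0.1 parent 8080 0 no-query originserver name=apachelb
acl our_site dstdomain www.example.com
http_access allow our_site
cache_peer_access apachelb allow our_site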

