By default, the standard LAMP (Linux, Apache, MySQL, PHP/Perl/Python) stack doesn't come particularly well optimised for handling more than a trivial amount of load. For most people this isn't a problem: either they're running on a large enough server, or their traffic never pushes them up against the limits.
Anyway, I've hit these limits on a number of occasions now, and while there are many good articles out there on the subject, I thought I'd write down my notes – for my own sake as much as anything else…
Apache
Apache's default configuration on most Linux distributions is not the most helpful, and your goal here is to do everything possible to stop the server from hitting swap and thrashing.
- MaxClients – The important one. If this is too high, apache will merrily spawn new servers to handle new requests, which is great until the server runs out of memory and dies. Rule of thumb:
MaxClients = (Memory - other running stuff) / average size of apache process.
If you're serving dynamic PHP pages or pulling a lot of data from databases, the amount of memory a process takes up can quickly balloon – sometimes to as much as 15–20MB. Over time, every running Apache process will grow to roughly the size of your largest script (a worked example configuration is given after this list).
- MaxRequestsPerChild – Setting this to a non-zero value will cause these large spawned processes to eventually die and free their memory. Generally this is a good thing, but set the value fairly high, say a few thousand.
- KeepAliveTimeout – By default, apache keeps connections open for 15 seconds waiting for subsequent requests from the same client. This can leave processes sitting around, eating up memory and resources that could be serving new requests, so consider dropping it to a second or two.
- KeepAlive – If your requests typically come from more distinct IP addresses than you have MaxClients (as they do in most typical thundering-herd slashdottings), strongly consider turning this off – keepalives only help when the same client comes back for more.
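Putting those directives together, here's the sort of prefork configuration I'd end up with on, say, a 2GB box where MySQL and the rest of the system take roughly 500MB and the average Apache process weighs in at around 20MB. The numbers are purely illustrative – plug in your own:
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    # (2048MB - ~500MB) / ~20MB per process ≈ 75
    ServerLimit          75
    MaxClients           75
    # recycle children before leaked memory builds up
    MaxRequestsPerChild 3000
</IfModule>
# Either switch KeepAlive off entirely under heavy unique-visitor load...
KeepAlive Off
# ...or at least don't let an idle connection hold a process for 15 seconds:
# KeepAliveTimeout 2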
Caching
- Squid – Squid Reverse Proxy sits on your server and caches requests, turning expensive dynamic pages into simple static ones, meaning that at periods of high load, requests never need to touch apache. Configuration seems complex at first, but all that is really required is to run apache on a different port (say 8080), run squid on port 80 and configure apache as a caching peer, e.g.
http_port 80 accel defaultsite=www.mysite.com vhost
cache_peer 127.0.0.1 parent 8080 0 no-query originserver login=PASS name=myAccel
One gotcha I found is that you have to name the domains you'll accept proxying for, otherwise you'll get a bunch of Access Denied errors, meaning that in a vhost environment with multiple domains this can be a bit fiddly.
A workaround is to specify an ACL with the top-level domains, e.g.
acl our_sites dstdomain .uk .com .net .org
http_access allow our_sites
cache_peer_access myAccel allow our_sites
A quick way to check it's all working is given after this list.
- PHP code cache – Opcode caching can boost performance by caching compiled PHP. There are a number out there, but I use xcache, purely because it was easily apt-gettable.
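To check that the Squid cache is actually doing its job, request a page twice and look at the X-Cache header that Squid adds to its responses (the hostname below is just the example one from the config above):
curl -I http://www.mysite.com/
The first request should come back with an X-Cache: MISS from <your host> header; repeat it and a cacheable page should report X-Cache: HIT, meaning it never touched apache.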
PHP
It goes without saying that you'd probably want to make your website code as efficient as possible, but don't spend too much energy on this – there is lower-hanging fruit, and as a rule of thumb memory and CPU are cheap compared to developer time.
That said, PHP is full of happy little gotchas, so…
- Chunk output – If your script makes use of output buffering (which Elgg does, as do a number of other frameworks), be sure that when you finally echo the buffer you do it in chunks (a sketch follows this list).
It turns out (and this bit us on the bum when building Elgg) that there is a bug/feature/interaction between Apache and PHP (some internal buffer getting burst, or something) which can add multiple seconds to page delivery if you attempt to output large blocks of data all at once.
- Avoid calling array_merge in a loop – When profiling Elgg some time ago I discovered that array_merge was (and I believe still is) horrifically expensive. The function does a lot of validation which in most cases isn't necessary, and calling it in a loop is ruinous. Consider using the "+" operator instead (see the example after this list).
- Profile – Profile your code using Xdebug and find out where the bottlenecks are; you'd be surprised by what is expensive and what isn't (see the previous point). A php.ini snippet for switching the profiler on is given after this list.
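A minimal sketch of the chunked output point; the helper name and the 8KB chunk size are my own choices rather than anything from Elgg:
<?php
// Echo a large buffer in fixed-size pieces rather than one giant block,
// which is what triggers the slow-delivery problem described above.
function echo_in_chunks($buffer, $chunk_size = 8192) {
    $length = strlen($buffer);
    for ($offset = 0; $offset < $length; $offset += $chunk_size) {
        echo substr($buffer, $offset, $chunk_size);
    }
}

ob_start();
// ... build the page as normal ...
$page = ob_get_clean();
echo_in_chunks($page);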
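And a sketch of the array_merge point, assuming rows keyed by ID are being accumulated inside a loop (the data is made up for illustration). One caveat: the "+" operator keeps the existing value when keys collide and doesn't renumber numeric keys, so it isn't a drop-in replacement everywhere:
<?php
// Two illustrative batches of rows keyed by ID.
$batches = array(
    array(1 => 'first row', 2 => 'second row'),
    array(3 => 'third row'),
);

$results = array();
foreach ($batches as $batch) {
    // Slow: array_merge() re-validates and copies the whole array on every pass.
    // $results = array_merge($results, $batch);

    // Cheap: the union operator only adds keys that aren't already in $results.
    $results = $results + $batch;
}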
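For the profiling point, the Xdebug profiler is switched on from php.ini; these are the Xdebug 2 directive names, and the output directory is just an example. The resulting cachegrind files can be opened in a viewer such as KCachegrind or Webgrind:
; php.ini – enable Xdebug's profiler and choose where the cachegrind files go
xdebug.profiler_enable = 1
xdebug.profiler_output_dir = /tmp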
A non-exhaustive list, but I hope it helps!
Worth mentioning eAccelerator here as well.
Indeed, other code caches do exist. I picked one at random, wasn't making an endorsement! 😀
eAccelerator's not a traditional cache – I run it on my server alongside Squid. It compiles the PHP to bytecode and caches the result, which makes dynamic PHP apps run significantly faster, alongside the gains Squid gives you on cached content.
It should also be noted that it is generally accepted to be a dead project – no updates since May 2010, so support for future versions of PHP is doubtful. (http://en.wikipedia.org/wiki/List_of_PHP_accelerators#eAccelerator)
Instead of eAccelerator, take a look at APC.
I have previously talked about speeding up your site by using Squid as a reverse proxy to cache served pages. This is a great thing to do, but it presents a problem now that all sites are moving over to HTTPS, since a cache can't do anything useful with traffic it can't decrypt – something has to terminate the HTTPS session before requests reach the proxy.
These days the standard way of doing this seems to be using Varnish as a cache, and Squid seems to be a little "old hat"; however, I have several client estates which were set up before Varnish came on the scene, so I needed a solution I could get up and running very quickly.
Terminating HTTPS
Thankfully, the fix is much the same whatever reverse proxy you're using: install something that terminates the HTTPS session and hands the decrypted traffic on to your proxy. The simplest way to do this is to install nginx and configure it to handle HTTPS.
1) Disable Apache's handling of HTTPS (if you've got an existing, un-cached, HTTPS server) – an example command is given after the nginx config below.
2) Install a basic nginx package:
apt-get install nginx-light
3) Configure nginx to forward to your proxy (which you have previously configured to listen on port 80)
server {
    listen 443 ssl;
    server_name www.example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:80;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-Port 443;
        proxy_set_header Host $host;
    }
}
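For step 1, on a Debian-flavoured Apache the quickest route is usually to disable the SSL virtual host and reload – "default-ssl" is an assumption, so substitute the name of your own HTTPS site:
a2dissite default-ssl
service apache2 reload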
After restarting nginx, you should see HTTPS requests coming in on your Squid proxy logs.
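For reference, "restarting nginx" plus a quick sanity check amounts to something like the following (Debian-style commands; your Squid access log path may well differ):
nginx -t
service nginx restart
tail -f /var/log/squid3/access.log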
Gotchas
The biggest gotcha you're going to hit is that if your app checks whether a request is HTTPS (e.g. to automatically redirect visitors from the insecure to the secure version), the standard protocol checks will no longer work. The reason is that HTTPS is terminated by nginx, so by the time the request reaches your app it will not look secure!
To perform such a test, you're instead going to have to check for the X-Forwarded-Proto header ($_SERVER['HTTP_X_FORWARDED_PROTO'] in PHP).
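As a minimal sketch of that check (the function name is mine, and it assumes the only thing setting X-Forwarded-Proto is the trusted nginx in front of you):
<?php
// True when the original client connection was HTTPS, even though nginx has
// terminated TLS and forwarded the request to us over plain HTTP on port 80.
function request_is_https() {
    if (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off') {
        return true; // direct HTTPS, no terminating proxy in front
    }
    return isset($_SERVER['HTTP_X_FORWARDED_PROTO'])
        && strtolower($_SERVER['HTTP_X_FORWARDED_PROTO']) === 'https';
}

if (!request_is_https()) {
    // e.g. bounce insecure visitors over to the HTTPS version of the page
    header('Location: https://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI']);
    exit;
}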