PHP and HTTP HEAD Requests
HTTP, the protocol used by Web servers, supports several request methods. A typical GET request returns a whole Web page, while a HEAD request returns only the headers that the GET would have sent. Smart clients use this to determine whether a document has been modified before retrieving the whole thing.
I never really considered this, but it turns out that a HEAD request for a dynamic page using PHP will execute the entire script, just like a GET request, by default. This has a couple of implications: first, a script that counts page views may be incorrectly counting HEAD requests. Second, a HEAD request puts the same load on the server as a full GET request, despite not sending the full output of the script.
I found out about this because the W3C Link Checker issued over ten thousand HEAD requests to various pages on SlashNot in a single day. These requests came rapid-fire and nearly crashed the server a couple of times. The link checker racked up a total of 61,000 requests over the last week. It still seems to be running a link check every 35 minutes or so, although now with far fewer requests. I don’t know if this is a well-meaning recursive link check gone horribly wrong or someone using it against us on purpose, but we certainly didn’t request any link-checking.
Lessons learned: (1) Check $_SERVER['REQUEST_METHOD'] in PHP programs and respond appropriately to HEAD requests so they can’t overload the server. (2) Block the W3C-checklink user agent or use robots.txt in case the link checker falls madly in love with our site again.
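For lesson (1), here is a minimal sketch of what such a check might look like at the top of a script. It assumes PHP 7 or later. The helper name is_head_request() is made up for illustration, and the Content-Type header is just an example; a real page would send whatever headers its GET response normally sends. This is not the exact code running on SlashNot.

```php
<?php
// Hypothetical helper: true when the request method is HEAD.
// Takes the server array as a parameter so it can be tried out
// without a real Web request. Falls back to GET if the key is
// missing (for example, when run from the command line).
function is_head_request(array $server): bool
{
    $method = $server['REQUEST_METHOD'] ?? 'GET';
    return strtoupper($method) === 'HEAD';
}

if (is_head_request($_SERVER)) {
    // Send the headers the page would normally send (example only),
    // then stop before any expensive work happens.
    header('Content-Type: text/html; charset=utf-8');
    exit;
}

// From here on the request is not a HEAD request, so it is safe to
// count the page view, run database queries, and render the page.
```

Because the check comes before the page-view counter and any database work, a HEAD request costs little more than sending a few headers.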
Thanks, that is exactly what I was looking for to finish off my little caching project (: