Is your web server using gzip encoding? Surprisingly, many are not. I just wrote a little script to fetch the 30 external links off news.yc and check whether they use gzip encoding. Only 18 did, which means the other 12 sites are needlessly slow and are also wasting money on bandwidth.
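The script itself isn't shown in the post. A minimal sketch of the core check, assuming Python's standard urllib, sends an `Accept-Encoding: gzip` request header and looks for `gzip` in the response's `Content-Encoding` header. The function names are hypothetical, not the author's.

```python
import urllib.request


def response_is_gzipped(headers):
    """Return True if the Content-Encoding header lists gzip.

    `headers` is any mapping with .get(); urllib's response headers qualify.
    """
    value = headers.get("Content-Encoding") or ""
    codings = [c.strip().lower() for c in value.split(",")]
    return "gzip" in codings or "x-gzip" in codings


def site_uses_gzip(url, timeout=10):
    """Fetch `url` while advertising gzip support, then inspect the headers.

    urllib does not decompress bodies, and only the headers matter here,
    so the body is never read.
    """
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return response_is_gzipped(resp.headers)
```

For example, `site_uses_gzip("http://news.ycombinator.com/")` returns True or False; looping it over a page's external links gives the kind of tally described above.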
Some people think gzip is "too slow". It's not. Here's an example (run on my laptop) using data from one of the links on news.ycombinator.com:
$ cat < /tmp/sd.html | wc -c
146117
$ gzip < /tmp/sd.html | wc -c
35481
$ time gzip < /tmp/sd.html >/dev/null

real    0m0.009s
user    0m0.004s
sys     0m0.004s

It took 9 ms to compress 146,117 bytes of HTML (and that includes process creation time, etc.), and the compressed data was only about 24% the size of the input. At that rate, compressing 1 GB of data would require about 66 seconds of CPU time. Repeating the test with a much larger file yields about 42 sec/GB, so 66 sec is a reasonable, even conservative, estimate.
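The same measurement can be sketched in-process with Python's `gzip` module. This is an illustration, not what the post ran: unlike the `time gzip` run, it excludes process startup, and it passes `compresslevel=6` because that is the gzip command-line default, while `gzip.compress` defaults to 9. The helper names are hypothetical.

```python
import gzip
import time

GIB = 1024 ** 3


def measure_gzip(data, compresslevel=6):
    """Compress `data` once; return (compressed_size, ratio, elapsed_seconds)."""
    start = time.perf_counter()
    compressed = gzip.compress(data, compresslevel=compresslevel)
    elapsed = time.perf_counter() - start
    return len(compressed), len(compressed) / len(data), elapsed


def seconds_per_gib(elapsed, nbytes):
    """Linearly extrapolate a single timing to CPU seconds per GiB."""
    return elapsed * GIB / nbytes


# The figures from the terminal session: 9 ms for 146,117 bytes.
print(round(seconds_per_gib(0.009, 146117)))  # 66
```

The extrapolation makes the arithmetic explicit: 9 ms for 146,117 bytes scales to roughly 66 seconds per gigabyte when 1 GB is taken as 2^30 bytes (about 62 seconds if it is taken as 10^9 bytes).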
Inevitably, someone will argue that they can't spare a few ms per page to compress the data, even though doing so will make their site much more responsive. However, it occurred to me today that, thanks to Amazon, it's very easy to compare the cost of CPU vs. bandwidth. According to their pricing page, a "small" (single-core) instance costs $0.10/hour, and data transfer out costs $0.17/GB (though it goes down to $0.10/GB if you use over 150 TB/month, which you probably don't).
Using these numbers, we can estimate that it would cost $1.88 to gzip 1 TB of data on Amazon EC2 (66 sec/GB × 1,024 GB ≈ 18.8 CPU-hours at $0.10/hour), and $174 to transfer 1 TB of data (1,024 GB × $0.17/GB). If you instead compress your data (and get 4-to-1 compression, which is not unusual for HTML), the bandwidth will only cost $43.52.
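The cost arithmetic can be reproduced in a few lines. The defaults are the figures quoted in the post (66 CPU-sec/GB, $0.10/hour, $0.17/GB, 4:1 compression); `GB_PER_TB = 1024` is an assumption that makes the post's numbers come out, and the function name is hypothetical.

```python
GB_PER_TB = 1024  # assumption: the post's figures work out with 1 TB = 1,024 GB


def gzip_costs(tb=1.0, cpu_sec_per_gb=66.0, cpu_usd_per_hour=0.10,
               bw_usd_per_gb=0.17, ratio=4.0):
    """Return (cpu, bandwidth_gzipped, bandwidth_plain) in USD for `tb` terabytes.

    `ratio` is uncompressed size divided by compressed size.
    """
    gb = tb * GB_PER_TB
    cpu = gb * cpu_sec_per_gb / 3600 * cpu_usd_per_hour  # CPU-hours times price
    bw_plain = gb * bw_usd_per_gb                          # ship the raw bytes
    bw_gzipped = bw_plain / ratio                          # ship 1/ratio as many
    return cpu, bw_gzipped, bw_plain


cpu, bw_gz, bw_plain = gzip_costs()
print(f"with gzip:    ${cpu:.2f} cpu + ${bw_gz:.2f} bandwidth = ${cpu + bw_gz:.2f}")
print(f"without gzip: ${bw_plain:.2f} bandwidth")
print(f"saved:        ${bw_plain - (cpu + bw_gz):.2f}")
```

Unrounded, uncompressed bandwidth comes to $174.08, so the saving is about $128.68; the summary rounds bandwidth to $174.00. Passing `cpu_sec_per_gb=42` (the large-file measurement) drops the CPU cost to about $1.19.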
Summary:
with gzip: $1.88 for CPU + $43.52 for bandwidth = $45.40 + happier users
without gzip: $174.00 for bandwidth = $128.60 wasted + less happy users

The other excuse for not gzipping content is that your web server doesn't support it for some reason. Fortunately, there's a simple solution: put nginx in front of your servers. That's what we do at FriendFeed, and it works very well (we use a custom, epoll-based Python server). Nginx acts as a proxy: outside requests connect to nginx, and nginx connects to whatever web server you are already using (and along the way it compresses your response and does other good stuff).
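Nginx does the compression for you, but as a rough illustration of what happens "along the way", here is a hypothetical WSGI middleware in Python that performs the same steps a compressing proxy does: check the client's Accept-Encoding, compress text responses above a size threshold, and rewrite Content-Encoding, Content-Length and Vary. It is not nginx and not FriendFeed's server; the class name, the `min_length` threshold and the text/* rule are assumptions, and it buffers whole responses.

```python
import gzip


def accepts_gzip(accept_encoding):
    """True if an Accept-Encoding header lists gzip (q-values are ignored here)."""
    for part in accept_encoding.split(","):
        coding = part.split(";")[0].strip().lower()
        if coding in ("gzip", "x-gzip"):
            return True
    return False


class GzipMiddleware:
    """Hypothetical WSGI middleware that gzips text/* responses in memory."""

    def __init__(self, app, min_length=256, compresslevel=6):
        self.app = app
        self.min_length = min_length
        self.compresslevel = compresslevel

    def __call__(self, environ, start_response):
        if not accepts_gzip(environ.get("HTTP_ACCEPT_ENCODING", "")):
            return self.app(environ, start_response)  # pass through untouched

        captured = {}
        chunks = []

        def capture(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, list(headers)
            return chunks.append  # buffer legacy write() calls as well

        result = self.app(environ, capture)
        try:
            chunks.extend(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        status, headers = captured["status"], captured["headers"]
        body = b"".join(chunks)
        lookup = {k.lower(): v for k, v in headers}
        compressible = (
            lookup.get("content-type", "").startswith("text/")
            and "content-encoding" not in lookup
            and len(body) >= self.min_length
        )
        if compressible:
            body = gzip.compress(body, compresslevel=self.compresslevel)
            headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
            headers += [
                ("Content-Encoding", "gzip"),
                ("Content-Length", str(len(body))),
                ("Vary", "Accept-Encoding"),
            ]
        start_response(status, headers)
        return [body]
```

Wrapping an existing app is one line, e.g. `app = GzipMiddleware(app)`. Putting nginx in front, as the post suggests, keeps this work out of your application process entirely.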