The coordination of compression between servers and clients seems simple, but it must work correctly. The page could easily break if either the client or server makes a mistake (sending gzipped content to a client that can't understand it, forgetting to declare a compressed response as gzip-encoded, etc.). Mistakes don't happen often, but there are edge cases to take into consideration.
Approximately 90% of today's Internet traffic travels through browsers that claim to support gzip. If a browser says it supports gzip you can generally trust it. There are some known bugs with unpatched early versions of Internet Explorer, specifically Internet Explorer 5.5 and Internet Explorer 6.0 SP1, and Microsoft has published two Knowledge Base articles documenting the problem (http://support.microsoft.com/kb/313712/en-us and http://support.microsoft.com/kb/312496/en-us). There are other known problems, but they occur on browsers that represent less than 1% of Internet traffic. A safe approach is to serve compressed content only for browsers that are proven to support it, such as Internet Explorer 6.0 and later and Mozilla 5.0 and later. This is called a browser whitelist approach.
With this approach you may miss the opportunity to serve compressed content to a few browsers that would have supported it. The alternative, serving compressed content to a browser that can't support it, is far worse. Using mod_gzip in Apache 1.3, a browser whitelist is specified using mod_gzip_item_include with the appropriate User-Agent values:
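A minimal sketch of such a whitelist is shown below; the User-Agent patterns are illustrative and should be tuned to your audience:

```apache
# Serve gzip only to browsers proven to handle it correctly.
# These regular expressions are examples, not an exhaustive list.
mod_gzip_on Yes
mod_gzip_item_include reqheader "User-Agent: MSIE [6-9]"
mod_gzip_item_include reqheader "User-Agent: Mozilla/[5-9]"
```

Any browser whose User-Agent matches neither pattern receives the uncompressed response.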
Adding proxy caches to the mix complicates the handling of these edge case browsers. It's not possible to share your browser whitelist configuration with the proxy. The directives used to set up the browser whitelist are too complex to encode using HTTP headers. The best you can do is add User-Agent to the Vary header as another criterion for the proxy.
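If you manage the whitelist yourself rather than relying on mod_gzip, one way to declare this extra cache key is with mod_headers (a sketch, assuming mod_headers is loaded):

```apache
# Tell proxies to key cached responses on the browser identity
# as well as on Accept-Encoding (which mod_gzip adds itself).
Header append Vary User-Agent
```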
Once again, mod_gzip takes care of this automatically by adding the User-Agent field to the Vary header when it detects that you're using a browser whitelist. Unfortunately, there are thousands of different values for User-Agent. It's unlikely that the proxy is able to cache all the combinations of Accept-Encoding and User-Agent for all the URLs it proxies. The mod_gzip documentation (http://www.schroepl.net/projekte/mod_gzip/cache.htm) goes as far as to say, "using filter rules evaluating the User-Agent HTTP header will lead to totally disabling any caching for response packets." Because this virtually defeats proxy caching, another approach is to disable proxy caching explicitly using a Vary: * or Cache-Control: private header. Because the Vary: * header prevents the browser from using cached components, the Cache-Control: private header is preferred and is used by both Google and Yahoo!. Keep in mind that this disables proxy caching for all browsers and therefore increases your bandwidth costs because proxies won't cache your content.
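In Apache, opting out of proxy caching this way might look like the following fragment (again assuming mod_headers):

```apache
# Prefer Cache-Control: private over Vary: * -- it disables
# shared (proxy) caches but still lets each browser cache
# the response locally.
Header set Cache-Control "private"
```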
Deciding how to balance compression against proxy support is complex, trading off fast response times, reduced bandwidth costs, and edge case browser bugs. The right answer for you depends on your site:
If your site has few users and they're a niche audience (for example, an intranet or all using Firefox 1.5), edge case browsers are less of a concern. Compress your content and use Vary: Accept-Encoding. This improves the user experience by reducing the size of components and leveraging proxy caches.
If you're watching bandwidth costs closely, do the same as in the previous case: compress your content and use Vary: Accept-Encoding. This reduces the bandwidth costs from your servers and increases the number of requests handled by proxies.
If you have a large, diverse audience, can afford higher bandwidth costs, and have a reputation for high quality, compress your content and use Cache-Control: private. This disables proxies but avoids edge case bugs.
There is one more proxy edge case worth pointing out. The problem is that, by default, ETags (explained in Chapter 13) don't reflect whether the content is compressed, so proxies might serve the wrong content to a browser. The issue is described in Apache's bug database (http://issues.apache.org/bugzilla/show_bug.cgi?id=39727). The best solution is to disable ETags. Since that's also the solution proposed in Chapter 13, I go into more detail about ETags there.
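Disabling ETags in Apache can be sketched as follows (FileETag is a core directive; the Header line assumes mod_headers):

```apache
# Stop generating ETags for static files, and strip any ETag
# that another module may have added, so proxies cannot match
# a compressed response against an uncompressed validator.
FileETag None
Header unset ETag
```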