Who’s not getting gzip? (stevesouders.com)
37 points by ypavan on Nov 18, 2009 | 18 comments



From the referenced Google Code Blog http://googlecode.blogspot.com/2009/11/use-compression-to-ma...:

Anti-virus software may try to minimize CPU operations by intercepting and altering requests so that web servers send back uncompressed content. But if the CPU is not the bottleneck, the software is not doing users any favors. Some popular antivirus programs interfere with compression. Users can check if their anti-virus software is interfering with compression by visiting the browser compression test page at Browserscope.org.

Seriously? Anti-virus software that does that is acting almost like malware, doing things the user didn't ask for. Does compression have any security implications, even minor ones, that would legitimize this? And does anyone know which anti-virus products do this?


Does compression have any security implications, even minor ones, that would legitimize this?

The quote you gave said that anti-virus apps might do it to "minimize(sic) CPU operations" rather than for security reasons.


(sic)? Not where the post was written it's not :P


But it is not written thus where I wrote my comment, ergo I raise you a smiley

;0P>


Well, anti-virus software usually checks incoming data for the signatures of known malware. I imagine compression would slow this down: they would have to decompress the data and then check for signatures.

(You could say that they can just compress the signatures, but that probably won't work for the more complex compression schemes, because those are not that predictable. Or maybe it can work, but the AV people considered it too hard to code.)


Compression schemes have to be predictable when run on the same data. Otherwise decompression wouldn't work. If signature detection actually is more difficult on compressed data, it's not because the compression is unpredictable.


Yes, they are ultimately predictable, but they are not easily predictable. That is, one usually has to go through all the steps of decompression to predict them; there are no shortcuts.

Imagine a data stream X that is compressed into another data stream Y. Imagine that a small portion of X is a data stream x1, which is the portion of data used for a signature. That will get compressed into y1. Now let's define x2 as all the data in X that is not x1. If you were always guaranteed that the same x1 would get compressed into the same y1, then things would be easily predictable and you could just compare compressed signatures. But this is not the case: if x2 is different, then the same x1 can be compressed into a different string.
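For what it's worth, this is easy to demonstrate with Python's zlib (the "signature" bytes here are made up for illustration):

    import zlib

    # x1: the bytes a scanner would want to find
    sig = b"stand-in-for-a-malware-signature"

    # The same x1 embedded in two streams with different surrounding data (x2)
    stream_a = b"A" * 1000 + sig + b"A" * 1000
    stream_b = b"The quick brown fox jumps over the lazy dog. " * 40 + sig

    ya = zlib.compress(stream_a)
    yb = zlib.compress(stream_b)

    # Compressing the signature on its own gives yet another byte string,
    # and it appears verbatim in neither compressed stream:
    ysig = zlib.compress(sig)
    print(ysig in ya, ysig in yb)  # False False

    # Decompression, by contrast, is fully deterministic:
    print(sig in zlib.decompress(ya), sig in zlib.decompress(yb))  # True True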


Compression schemes have to decode predictably, but they don't necessarily generate the same compressed data. gzip -9 will usually use less space than gzip -1 and decompress to the same data.
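The same point, sketched with Python's zlib (the same DEFLATE compression gzip uses, just wrapped differently); the sample data is arbitrary:

    import zlib

    data = b"Who's not getting gzip? " * 200

    fast = zlib.compress(data, 1)  # roughly gzip -1
    best = zlib.compress(data, 9)  # roughly gzip -9

    print(fast == best)           # False: different compressed bytes
    print(len(fast), len(best))   # level 9 is usually the smaller of the two
    print(zlib.decompress(fast) == zlib.decompress(best) == data)  # True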

What's probably happening is that virus scanners are taking the easy way out rather than implementing decompression algorithms in their scanning engines (it probably looks better on traditional benchmarks, too).


Even if different data are being generated, the compression is still predictable. There must be something in the gzip header indicating the level of compression (to ensure that the correct decompression algorithm is used). Point being, there's nothing stopping the AV program from figuring out which compressed signature is the correct one. It can even compress the raw signature on the fly.

On your second point I agree. I'll refrain from ranting about AV programs here, but suffice it to say there hasn't been enough innovation in the field, because AV companies are able to sell substandard products and still make good money.


To be more specific, compression schemes decode predictably, but any given portion of the compressed stream need not be independently predictable. Gzip (and DEFLATE in general) uses adaptive compression - the data determines the encoding. If you know your content of interest is at the beginning of the stream, pre-computing different compressions should work out. Otherwise...


You've got it backwards: it is the decompression that is predictable; compressors are free to transform the input in any way such that a compliant decompressor will extract the original file.

The gzip format documentation is available here btw: http://www.gzip.org/zlib/rfc-gzip.html
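Per that RFC, the header does include an XFL byte hinting at the compression level used (2 = maximum compression, 4 = fastest), but it is purely informational - a decoder never needs it, since DEFLATE streams are self-describing. A quick look at the header from Python (field names per RFC 1952):

    import gzip, io, struct

    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as f:
        f.write(b"hello gzip")

    # RFC 1952 header: magic (2 bytes), CM, FLG, MTIME, XFL, OS
    magic, cm, flg, mtime, xfl, os_ = struct.unpack("<2sBBIBB", buf.getvalue()[:10])
    print(magic)  # b'\x1f\x8b'
    print(cm)     # 8 = DEFLATE
    print(xfl)    # 2 = "maximum compression" hint; decoders ignore it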


As an exercise, I heard that a toolbar company (who shall remain nameless) ran an experiment recording a client-side hash of the DOM of some particular static pages, and found that something like 30-40% of users had modified or tampered results. Some of this could be spyware, some the anti-virus add-ins, but I still thought that shockingly high for static content.


"DOM model" or innerHTML? Did they control for normal modifications that browsers do, like inserting missing <tbody> tags, removing whitespace and the like? Any more data on the nature or amount of modification?


I believe he was looking pretty closely at the entire DOM rather than specifically at innerHTML, but I didn't really dig down on that.

As for the browser mods, he controlled for them by holding browser version and OS steady, so browser-introduced changes would be consistent within each browser/version/OS pairing. It was a rather scientific approach, imo.


Those are scary numbers. Anybody know of rules to force it on, without breaking clients that really don't support it?


I suppose they use User-Agent sniffing, which is obviously not very robust. That said, something like the rule below should work (mod_gzip/Apache 1.x):

    mod_gzip_item_include         reqheader  "User-agent: .*MSIE 6.0.*"
Mind you, this has not been tested, interacts badly with proxies, and is based on a pretty broken idea in the first place. On the other hand, almost every modern browser supports gzip, so it's not clear how much it would hurt.


Thanks. I suppose it needs a lot of testing.


Netflix did an awesome presentation on front-end performance and put up the full slides here: http://www.scribd.com/doc/14205366/High-Performance-Web-Page....

One of the slides has their code-ready Apache configuration for gzip.



