

New multi-page HTTP compression proposal from Google - dmv
http://lists.w3.org/Archives/Public/ietf-http-wg/2008JulSep/0441.html

======
axod
Support for something like this would be a step in the right direction, but I
think there are a couple of simpler ways to improve HTTP:

A similar peeve of mine is HTTP headers.

If a browser opens a connection to a web server, and the connection is keep-
alive, the browser will send several requests down that one connection.

But for _every_ single request, it'll send out its full headers. That's
really wasteful and idiotic. Send full headers when the connection is opened;
there is no need to repeat them every single time.

Also, if the connection is keep-alive, it'd be reasonably simple to have gzip
compression over the full data stream - not per request. This would achieve the
same as the Google proposal, but in a better way IMHO.

The HTTP headers can add up quite a bit if you're using XMLHttpRequest or
similar. Also if the data is small, compression isn't worthwhile. HTTP header
spam is a PITA.

So if I had my way:

* Headers _only_ sent once at the start of a connection, not per request. Only send them again if they change - e.g. a new cookie has been set since the last request :/ (see the sketch below)

* A new transfer-type to specify that the data is gzipped as one stream - instead of gzipped per request.

Those 2 simple changes to HTTP would make things _so_ much better.
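
A minimal sketch of the first change, assuming a hypothetical delta rule: full
headers on the first request, then only the headers that changed (or a marker
for removed ones) on later requests. None of the names, values, or encoding
below come from any spec:

    # Hypothetical header-delta rule; HTTP defines nothing like this today.
    def header_delta(previous, current):
        """Return only the headers that changed, plus explicit removals."""
        delta = {k: v for k, v in current.items() if previous.get(k) != v}
        for name in previous:
            if name not in current:
                delta[name] = None  # None marks "remove this header"
        return delta

    # Made-up but representative headers for two requests on one connection.
    first = {
        "Host": "example.com",
        "User-Agent": "Mozilla/5.0 (example)",
        "Accept": "text/html",
        "Cookie": "session=abc",
    }
    second = dict(first, Accept="image/png", Cookie="session=def")

    full = "".join("%s: %s\r\n" % kv for kv in second.items())
    delta = "".join("%s: %s\r\n" % kv for kv in header_delta(first, second).items())
    print(len(full), "bytes resent today vs", len(delta), "bytes as a delta")

The exact encoding doesn't matter much; the point is that after the first
request only a line or two per request would survive.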

~~~
coderrr
Those are interesting changes.

It's true most headers don't change. One I can think of that usually changes
between resource types is Accept. It will usually be slightly different
between <img>, <script>, <link>, and <iframe>, but this probably wouldn't make
much of a difference if you only allow sending the headers that changed. I'd be
curious to see how much bandwidth you save with this. You might also want to
allow for header removal as well as header changes. I can't think of a scenario
where not removing a header would cause a problem, but there could potentially be one.

For gzip over the whole connection instead of per request, there's one reason I
can't see many browsers taking advantage of it. Most browsers make requests as
write request, read response, write request, read response - instead of write,
write, write, read, read, read. So I'm not sure how you could unzip everything
together unless you wait to display the items until the entire connection is
finished. It would also require the client to give an indication when it is
done writing requests to the stream, so that all the data can be fetched from
the server and then zipped together, which would be a much bigger change to the
protocol.

Is there anything I'm missing?

~~~
axod
The main gain with headers would be for comet-like applications. In
Mibbit/Meebo-type applications you're sending a lot of small messages,
interspersed with HTTP header spam. Often the data is smaller than the HTTP
headers.

For gzip, I don't see an issue. The only change that would be needed would be
for the gzip state to be saved between requests. For the browser, it would
request object A, get the response, unzip it, display. Then it would request
object B, unzip it using the previous gzip state, etc.

For the sender, likewise, so there would be no change in terms of timing. The
only difference is that the gzip state would be carried over to the next
request. (It's possible I'm remembering wrong and gzip can't do this - if so, a
different compression method that can be compressed/decompressed per message
while sharing a running dictionary/state across the connection would be needed.)
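
For what it's worth, DEFLATE (which gzip is built on) can do exactly this: the
compressor can be flushed at each message boundary while keeping its window, so
later responses get to reference earlier ones. A rough sketch of the idea in
Python - the per-connection compressor pair, the zlib container, and the lack
of any HTTP framing are all assumptions for illustration, not anything HTTP
defines:

    import zlib

    # One compressor/decompressor pair per connection; the stream state
    # (the 32 KB window) survives across responses, so repeated markup
    # compresses very well on later responses.
    comp = zlib.compressobj()
    decomp = zlib.decompressobj()

    def send(body):
        # Z_SYNC_FLUSH forces out everything compressed so far without
        # ending the stream, so the receiver can decode this response now.
        return comp.compress(body) + comp.flush(zlib.Z_SYNC_FLUSH)

    def receive(chunk):
        return decomp.decompress(chunk)

    page_a = b"<html><head><title>Page A</title></head><body>hello</body></html>"
    page_b = b"<html><head><title>Page B</title></head><body>world</body></html>"

    for page in (page_a, page_b):
        wire = send(page)
        assert receive(wire) == page
        print(len(page), "->", len(wire), "bytes on the wire")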

~~~
tlrobinson
Comet optimizes for latency; the big improvement is avoiding the cost of
opening a TCP connection and sending the request each time.

With both long polling and streaming you could probably send the headers long
before the actual data is ready to be sent as well.

------
ardit33
I read the whole thing, and I just don't like it. The beauty of HTTP headers,
cookies, and elements is their simplicity (or primitiveness). They are easy to
implement.

This proposal would introduce a huge amount of complexity to the HTTP spec. If
you have implemented caching in a client, you know how easy it is for things to
go wrong; even if the clients are right, the server and content managers could
mess this up royally, really fast.

The other thing I don't like is that when you use raw sockets and try to
implement HTTP over them (there are many reasons to do this, especially on
mobile), you now have to deal with more complexity.

As somebody mentioned above, I'd rather see duplicate HTTP headers eliminated
and the duplication issue addressed in the markup language itself (i.e. HTML5
or XHTML2), not in the transport protocol.

~~~
ardit33
Here is somebody's counterpoint:

"It seems to me that AJAX can be used to solve this problem in a simpler
manner. Take Gmail for example--it downloads the whole UI once and then uses
AJAX to get the state-specific data. The example from the PPT showed a 40%
reduction in the number of bytes transmitted when using SDCH (beyond what GZIP
provided) for google SERPs. I bet you could do about that well just by
AJAXifying the SERPs (making them more like GMail) + using regular HTTP cache
controls + using a compact, application-specific data format for the dynamic
parts of the page + GZIP. Maybe Google's AJAX Search API already does that? In
fact, you might not even need AJAX for this; maybe IFRAMEs are enough.

I also noticed that this proposal makes the request and response HTTP headers
larger in an effort to make entity bodies smaller. It seems over time there is
a trend of increasingly large HTTP headers as applications stuff more and
more metadata into them, where it is not all that unusual for a GET request to
require more than one packet now, especially when longish URI-encoded IRIs are
used in the message header. Firefox cut down on the request headers it sends
[2] specifically to increase the chances that GET requests are small enough to
fit in one packet. Since HTTP headers are naturally _highly_ repetitive
(especially for resources from the same server), a mechanism that could
compress them would be ideal. Perhaps this could be recast as transport-level
compression so that it could be deployed as a TLS/IPV6/IPSEC compression
scheme.

Regards, Brian "
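
As a rough illustration of how much that header repetition is worth, here is a
quick experiment with a DEFLATE stream shared across consecutive header blocks
on one connection - purely an illustration with made-up header values, not part
of the SDCH proposal:

    import zlib

    # Two consecutive GET requests to the same server; the second differs
    # only in the path and the Accept header.
    req1 = (b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n"
            b"User-Agent: Mozilla/5.0 (example)\r\nAccept: text/html\r\n"
            b"Accept-Encoding: gzip\r\nCookie: session=abc123\r\n\r\n")
    req2 = req1.replace(b"/index.html", b"/logo.png").replace(b"text/html", b"image/png")

    comp = zlib.compressobj()
    first = comp.compress(req1) + comp.flush(zlib.Z_SYNC_FLUSH)
    second = comp.compress(req2) + comp.flush(zlib.Z_SYNC_FLUSH)
    print(len(req1), "->", len(first), "and", len(req2), "->", len(second), "bytes")

The second block shrinks to a small fraction of its raw size because almost all
of it matches the first.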

~~~
litewulf
I assume the main argument against this idea is the burden it places on the
JavaScript engine. It's the same reason people use gzip and not packer (well,
assuming packer produces a smaller file, which happens sometimes).

Engines are getting faster, but they still really can't compete with native
browser facilities.

------
jwilliams
I haven't read the detail of the specification, but it sounds like a great idea.

The amount of similarity between pages of markup (esp. XML) or related pieces
of JavaScript could be significant.

I found this Google PowerPoint that hints at some of the benefits
[http://209.85.141.104/search?q=cache:RIkP-5qZ4awJ:assets.en....](http://209.85.141.104/search?q=cache:RIkP-5qZ4awJ:assets.en.oreilly.com/1/event/7/Shared%2520Dictionary%2520Compression%2520Over%2520HTTP%2520Presentation.ppt+SDCH+results&hl=en)

The PPT claims _about 40 percent better data reduction than gzip alone on
Google search_.

------
dmv
Link (of a link) to the PDF:
[http://sdch.googlegroups.com/web/Shared_Dictionary_Compressi...](http://sdch.googlegroups.com/web/Shared_Dictionary_Compression_over_HTTP.pdf)

------
andrewf
Can't be a coincidence that they started pushing this a week after Chrome
arrived. I wonder what other proposals Google has coming?

------
bprater
Curious as to how this compares to standard GZIP compression over the course
of a hundred pages on a website.

~~~
andreyf
Another post cites 40%.

