

SPDY Protocol (draft 3) submitted to IETF - igrigorik
https://tools.ietf.org/html/draft-mbelshe-httpbis-spdy-00

======
metabrew
I have a (mostly working, neglected) implementation of SPDYv2 in Erlang
(github.com/RJ/erlang-spdy). I've been following the mailing list about the v3
spec. Notable differences if you are updating a SPDY library:

* various fields changed size

* compression dictionary [1] for headers block updated

* a few things clarified where the spec was ambiguous

* CREDENTIAL frame, so multiple SSL certs can be used on one connection

* addition of per-stream flow control

The last two are probably the most work for implementers.

[1] Compression dictionary for header block:

* Paper: <http://www.eecis.udel.edu/~amer/PEL/poc/pdf/SPDY-Fan.pdf>

* The Dictionary: <http://www.cis.udel.edu/~amer/PEL/SPDY/SPDY-proposed-initial-zlib-dictionary.txt>
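For library authors, the mechanism is just zlib's ordinary preset-dictionary support; here's a rough sketch in Python (the mini-dictionary and header block below are made up for illustration, not the real SPDY ones):

```python
import zlib

# Made-up mini-dictionary of strings common in HTTP headers; the real SPDY
# dictionary is much longer and is specified in the draft.
ZDICT = (b"optionsgetheadpostputdeletetraceacceptaccept-charsetaccept-encoding"
         b"accept-languageauthorizationif-modified-sincelast-modifieduser-agent"
         b"hostcontent-lengthcontent-typedategzip,deflatehttp/1.1200")

# An illustrative request header block.
headers = (b"host: www.example.com\r\n"
           b"user-agent: Mozilla/5.0\r\n"
           b"accept-encoding: gzip, deflate\r\n"
           b"if-modified-since: Sat, 01 Jan 2012 00:00:00 GMT\r\n")

def deflate(data, zdict=None):
    # SPDY keeps one compression context per connection and flushes after
    # each header block, hence Z_SYNC_FLUSH rather than closing the stream.
    co = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return co.compress(data) + co.flush(zlib.Z_SYNC_FLUSH)

plain = deflate(headers)
primed = deflate(headers, ZDICT)
print(len(headers), len(plain), len(primed))
```

Priming only helps the first block on a connection; after that, both compressors have seen real header data and converge.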

------
0xABADC0DA
Still has a prefix dictionary. The benefit of the prefix dictionary over
starting from scratch is extremely marginal (about a hundred bytes on the
first request).

Still making claims comparing to HTTP without pipelining. After all this time
they still haven't even compared it to pipelining (because Chrome doesn't do
pipelining; is Firefox banned at Google or something?)

Still has server push. Read the spec to see how complicated this is, all to
save one one-way trip.

Still hardcodes parts of HTTP into the protocol.

...gross.

~~~
JoshTriplett
> Still has a prefix dictionary. The benefit of the prefix dictionary over
> starting from scratch is extremely marginal (about a hundred bytes on the
> first request).

And when the entire exchange consists of a few hundred bytes (which the server
can answer with a 304 Not Modified), that seems like a substantial win. What
argument do you have against doing this?

> Still making claims comparing to HTTP without pipelining. After all this
> time they still haven't even compared it to pipelining (because Chrome
> doesn't do pipelining; is Firefox banned at Google or something?)

Firefox still doesn't do pipelining by default either, with the stated reason
that some web servers will break with it enabled. Using HTTP pipelining at
this point would require some kind of positive indication from the server that
it will work, or some kind of autodetection by the browser (which will take
multiple requests to do). Also, HTTP pipelining requires answering requests in
order, while SPDY doesn't, allowing the server to respond to requests as data
becomes available.
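The ordering difference can be sketched with made-up service times (assume the server processes all three requests concurrently, and ignore transmission time):

```python
# Hypothetical per-request processing times in seconds; the first request
# happens to be slow (e.g. a dynamic page).
service = [0.9, 0.1, 0.1]

# HTTP pipelining: responses must be sent in request order, so response i
# cannot be delivered before every earlier response has finished.
pipelined = []
ready = 0.0
for s in service:
    ready = max(ready, s)
    pipelined.append(ready)

# SPDY multiplexing: each response is delivered as soon as it is ready.
multiplexed = list(service)

print(pipelined)    # the slow first response delays the other two
print(multiplexed)  # the two fast responses arrive immediately
```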

> Still has server push. Read the spec to see how complicated this is, all to
> save one one-way trip.

Server push potentially eliminates several full round trips. If I request a
page generated by an expensive CGI, the server can go ahead and send me the
CSS and JavaScript it knows all pages will reference, and the pile of images
referenced from that CSS or JavaScript, all while the CGI runs. That
eliminates at least two round-trips, or even more if more images exist than
the maximum number of concurrent requests browsers will hit a server with.
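A back-of-the-envelope sketch with made-up timings (one round trip per dependent fetch, ignoring bandwidth):

```python
rtt = 0.1       # assumed network round-trip time, seconds
cgi_time = 0.5  # assumed server-side time to generate the HTML

# Without push: fetch HTML, parse it, fetch CSS/JS, parse those, fetch the
# images they reference -> three sequential request rounds.
without_push = cgi_time + 3 * rtt

# With push: the server streams CSS/JS/images while the CGI runs, so the
# subresources cost no additional round trips.
with_push = cgi_time + 1 * rtt

print(without_push, with_push)
```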

> Still hardcodes parts of HTTP into the protocol.

SPDY specifically exists to replace HTTP, not anything else.

~~~
0xABADC0DA
> And when the entire exchange consists of a few hundred bytes (which the
> server can answer with a 304 Not Modified), that seems like a substantial
> win. What argument do you have against doing this?

Turn this around. What's the argument _for_ doing it? "The gain using our
proposed initial dictionary is seen only for the first header". This is
completely counter to the goal of Spdy to keep connections open and reuse
them; the longer connections are used the less the initial dictionary matters.

Even just visiting one page, averaging say 500 KiB, it saves on average 121
bytes total. That's a 0.02% reduction in size. This seems like a "substantial
win"?

Meanwhile, how many versions of the dictionary are there so far? 5? 10? And
they propose that it will evolve over time, so that will just keep growing,
with software everywhere having to carry dozens of legacy dictionaries.

> Also, HTTP pipelining requires answering requests in order, while SPDY
> doesn't, allowing the server to respond to requests as data becomes
> available.

This is why metrics are important. By ignoring pipelining because of perceived
problems, the authors are basing their protocol on assumptions. The assumption
is that in the real world 'head of line' blocking is a significant factor.
Judging by the 0.02% average gain from prefix dictionaries, I don't give them
the benefit of the doubt that their assumptions are correct. But I see that
Chrome 17 has some type of pipelining support, so maybe they will actually
test this someday.

> If I request a page generated by an expensive CGI, the server can go ahead
> and send me the CSS and JavaScript it knows all pages will reference ...

Yes, if the very first request is an expensive CGI this may be some marginal
benefit. I think even the Spdy designers claimed it was less than 1% and
sometimes a loss. Does it really happen very often that the first page
requested from a site has some really slow-loading CGI? If so, I think the
site is broken.

~~~
JoshTriplett
> Turn this around. What's the argument for doing it? "The gain using our
> proposed initial dictionary is seen only for the first header". This is
> completely counter to the goal of Spdy to keep connections open and reuse
> them; the longer connections are used the less the initial dictionary
> matters.

Browsers won't keep SPDY connections open forever in the hopes of someday
reusing them; the "initial connection" case will happen quite frequently in a
normal browsing session.

> Even just visiting one page, averaging say 500 KiB, it saves on average 121
> bytes total. That's a 0.02% reduction in size. This seems like a "substantial
> win"?

When did you last visit a page with 500k of HTML? I suggested the hopefully
very common case of sending a very small request and getting back a very small
304. The whole exchange consists entirely of headers; I just checked a few
examples and got figures in the 300-400 byte range for request and response
combined. That would make 121 bytes closer to a 30-40% savings.
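That ratio is just 121 bytes against those measured totals:

```python
saved = 121                 # average first-exchange saving, per the paper
for total in (300, 400):    # measured request+response header sizes, bytes
    print(f"{saved}/{total} bytes = {saved/total:.0%}")
```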

(Also, I'd love to see a reference for your figures on expected bytes saved
through the prefix dictionary.)

> Yes, if the very first request is an expensive CGI this may be some marginal
> benefit. I think even the Spdy designers claimed it was less than 1% and
> sometimes a loss. Does it really happen very often that the first page
> requested from a site has some really slow-loading CGI? If so, I think the
> site is broken.

Server push seems like a win at any point in a SPDY connection, not just for
the initial connection.

~~~
0xABADC0DA
> Browsers won't keep SPDY connections open forever in the hopes of someday
> reusing them; the "initial connection" case will happen quite frequently in
> a normal browsing session.

This is what boggles my mind about Spdy, the dissonance. On the one hand Spdy
is great because it does a bunch of requests on the same TCP connection, but
on the other hand Spdy is great because it saves 100 bytes per connection and
that's a big deal because there are going to be so many connections made? It
doesn't make sense.

Spdy is great because it has compression, but on the other hand Spdy is great
because it requires SSL which already has compression. Huh?

> I suggested the hopefully very common case of sending a very small request
> and getting back a very small 304. ... The whole exchange consists entirely
> of headers ... That would make 121 bytes closer to a 30-40% savings.

On a first request only. You visit some site and check exactly one resource?
Not likely. In any case, the cost of transferring 100 bytes once is irrelevant
in any grand scheme of things.

> (Also, I'd love to see a reference for your figures on expected bytes saved
> through the prefix dictionary.)

<http://www.eecis.udel.edu/~amer/PEL/poc/pdf/SPDY-Fan.pdf>

The dictionary construction part is suspect though... take a look at how many
times the same string length count (\0\0\0\4 for example) occurs in the prefix
-- this can't be optimal.
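A quick sketch of the kind of repetition I mean, using a made-up fragment of a length-prefixed dictionary (not the real one):

```python
import struct

# Hypothetical dictionary fragment: each entry is a 4-byte big-endian length
# followed by the string, as in the v3 draft's dictionary layout.
entries = [b"head", b"post", b"date", b"etag", b"accept-encoding"]
dictionary = b"".join(struct.pack(">I", len(e)) + e for e in entries)

# Every 4-character entry contributes an identical \0\0\0\4 length prefix.
print(dictionary.count(b"\x00\x00\x00\x04"))
```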

~~~
JoshTriplett
> This is what boggles my mind about Spdy, the dissonance. On the one hand
> Spdy is great because it does a bunch of requests on the same TCP
> connection, but on the other hand Spdy is great because it saves 100 bytes
> per connection and that's a big deal because there are going to be so many
> connections made? It doesn't make sense.

Adaptability: SPDY works well for both cases, rather than only picking one
case and optimizing for that case alone. I see quite a bit of value in
optimizing SPDY for a pile of short single-request connections to sites, as
well as optimizing for numerous requests to the same site. Google's own site
will serve numerous examples of both; see below.

> On a first request only. You visit some site and check exactly one resource?
> Not likely. In any case, the cost of transferring 100 bytes once is
> irrelevant in any grand scheme of things.

It costs nothing to initialize the compressor state differently, and saves
100+ bytes per initial request/response.

A quick search suggests that Google services several billion searches per day.
Assume for the moment that a substantial fraction of those searches come from
user agents that don't already have an open connection to Google. Starting the
compressor out with a specific state based on a prefix dictionary costs
nothing (just changing the initial state of the compressor), but saves 100+
bytes per initial request/response. So, just for Google alone that change
could save on the order of 100GB of traffic per day, for something that costs
browsers and servers nothing to do. Now multiply that out for every other site
on the Internet.
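The arithmetic, assuming a conservative one billion such connections:

```python
connections_per_day = 1e9  # assumed lower bound for "several billion" searches
bytes_saved = 100          # per initial request/response pair

total_gb = connections_per_day * bytes_saved / 1e9
print(total_gb, "GB/day")  # -> 100.0 GB/day
```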

And for a different approach, consider the total number of initial connections
that take place on GSM/3G networks every day.

More importantly than bandwidth, saving 100 bytes provides a potentially
substantial latency benefit for the initial response, making it that much more
likely to fit into a single frame rather than fragmenting.

If SPDY can save 100 bytes for free, why _not_ do it?

> <http://www.eecis.udel.edu/~amer/PEL/poc/pdf/SPDY-Fan.pdf>

Thanks for the reference!

That paper seems to compare their proposed dictionary to "SPDY’s current
default initial dictionary", which as far as I can tell means the prefix
dictionary from some iteration of SPDY, rather than zlib's default. I don't
think the paper provides statistics on how much a prefix dictionary saves over
not having one at all.

> The dictionary construction part is suspect though... take a look at how
> many times the same string length count (\0\0\0\4 for example) occurs in the
> prefix -- this can't be optimal.

I agree that the default dictionary could probably use work. That kind of work
seems like the most likely reason for the several versions of prefix
dictionaries that have appeared so far, which you complained about in a
previous comment. :)

~~~
0xABADC0DA
> So, just for Google alone that change could save on the order of 100GB of
> traffic per day

So, say, $150/mo? Or 0.00002% of their yearly profit. Come on.

> I don't think the paper provides statistics on how much a prefix dictionary
> saves over not having one at all.

The paper compares plain, deflate, and deflate + prefix dictionary. The prefix
is only a benefit on the first request and response, adding up to 121 bytes on
average in their test.

~~~
JoshTriplett
> The paper compares plain, deflate, and deflate + prefix dictionary. The
> prefix is only a benefit on the first request and response, adding up to 121
> bytes on average in their test.

As far as I can tell, it seems to compare plain, deflate with SPDY's current
prefix dictionary, and deflate with the paper's proposed prefix dictionary.

~~~
0xABADC0DA
You are correct, the paper does not compare against compression with a null
prefix. In haste I read it wrong, since it was not far off from calculations I
had done previously in a reddit discussion, where I found ~100 bytes saved
using the prefix dictionary. Thanks for being persistent; I'll see if I can
rerun their data using a null prefix before the next Spdy thread, but I doubt
it will show more than 200 bytes of total savings per connection, which
doesn't change anything IMO.

