HTTP 2.0 (ietf.org)
434 points by abhshkdz on July 9, 2013 | 322 comments

Oh wonderful, it appears to use SPDY-like header compression. Why don't we just create a new compression algorithm and predefine common HTML tags and words to improve compression of the response body too! But we shouldn't base it on any known compression scheme and only use it for the case where the content length is less than 220,498 bytes but more than 8,494 bytes to optimize the behavior of today's more common MTU settings for PPPoE in Scandinavian countries minus of course the most common size of compressed headers today. It will be particularly optimized for the kind of responses Google sends back to minimize their own server load and common adwords will of course be included in the predefined list of compressed tokens.


What happened to simple protocols? Seriously, per-hop flow control (which works best with which of the dozen versions of TCPv4 or TCPv6 flow control?)? TCPv4-like framing with weird limits (16383 bytes)? Keepalives/Ping? Truly ridiculous specialized compression for headers which ignores the role of HTTP proxies? QoS?

Why not just implement the whole protocol over a raw IP connection and stop pretending like we're operating in layer 4+? I get that multiplexing is difficult without flow control, but good lord does this thing look overdesigned for what few benefits it offers over HTTP/1.1.

Well, we had a nice thing for a while, and now it's time to fuck that all up. The wheel turns.

Look on the bright side--if everyone decides the public Internet is too insecure, maybe we can convince them to keep using the old standards on darknets?

Recapitulating a comment from downthread, but, look at DNS for an example of how the IETF botches compression in its "simple" protocols. Not that compression isn't fraught (look at TLS), but I see its use as a sign of maturity.

"What happened to simple protocols?"

Answer: The Internet is still running on them, 30 years later.

Whenever I read something like "simplicity is hard", it makes me cringe. I hear that a lot, and I see evidence of gratuitous complexity everywhere I look these days. I'd hazard a guess the engineers behind SPDY would find simplicity (and reliability) boring.

Debugging binary protocols is either great job security for overeager engineers like the SPDY team or a great waste of our collective time. I'll let you all decide which.

I can't help thinking some of the pain could be resolved if we had a reliable datagram protocol between UDP and TCP. Delimiting a TCP stream to create a messaging protocol is already suboptimal and error prone, and it's the root cause of the head-of-line blocking problem experienced by HTTP ('fixed' in SPDY).
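
To make the delimiting problem concrete, here is a minimal sketch in Python of the usual workaround: length-prefixed framing on top of a byte stream. The 4-byte big-endian prefix and the names `frame` and `Deframer` are assumptions made up for illustration; nothing here comes from the draft.

```python
import struct


def frame(payload: bytes) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(payload)) + payload


class Deframer:
    """Reassemble whole messages from arbitrary chunks of a byte stream."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add received bytes and return every message that is now complete."""
        self._buf.extend(chunk)
        messages = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack("!I", self._buf[:4])
            if len(self._buf) < 4 + length:
                break  # body not fully received yet, wait for more bytes
            messages.append(bytes(self._buf[4:4 + length]))
            del self._buf[:4 + length]
        return messages
```

Feeding the receiver a stream that was cut at an arbitrary point returns nothing until the first message is complete. Because everything travels in one ordered stream, a message that is large or stalled delays whatever is queued behind it, which is the head-of-line blocking mentioned above.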

You mean like SCTP? It's struggling to get traction.

Yes, because MS haven't implemented it.

It wouldn't get through any home firewalls either. Adoption would take decades...

Something like QUIC? I reckon a few of the authors of this draft are also involved in QUIC.


There can be no new layer 4 protocols due to NAT. You have to use either TCP or UDP.

So we can move from IPv4 to IPv6 but not introduce a new transport because of NAT? Hmm...

The IPv6 transition is not going that much better than SCTP. So many people have v4-only home routers that they have no plans to ever upgrade.

Most people have a free router from their ISP; the ISP will just send out new ones.

My ISP's modem has a built-in NAT that can't be disabled, but it still passes all IPv4 packets through in DMZ mode, so there's hope.

So instead of doing things properly, we'll hack around it... Isn't that what we're all complaining about?

Let's do it right and then fix NAT

Erm, reliable UDP?

You can get a startling amount of the way there with sequence numbers and a few other things--it's a fun exercise.

SCTP is more modern and still under active development. RUDP only has a draft RFC and hasn't been updated since 1999.

Oh, no no no, you misunderstand--spend an afternoon to implement a reliable communication layer over UDP yourself. I don't suggest another standard (or a defunct one): roll your own.
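
For anyone tempted by the exercise, the core idea fits in a few lines. Below is a hedged sketch in Python of stop-and-wait retransmission with sequence numbers, run against a simulated lossy link rather than a real UDP socket. `LossyChannel`, `Receiver`, and `send_reliably` are made-up names, and the "few other things" (timeouts, windowing, congestion control) are deliberately left out.

```python
import random


class LossyChannel:
    """Simulated datagram link that silently drops a fraction of packets."""

    def __init__(self, loss_rate: float, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._loss_rate = loss_rate

    def deliver(self, packet):
        """Return the packet, or None if it was lost in transit."""
        return None if self._rng.random() < self._loss_rate else packet


class Receiver:
    """Accepts packets in order, ignores duplicates, and acks each one."""

    def __init__(self) -> None:
        self.expected = 0
        self.delivered = []

    def on_packet(self, seq: int, data: bytes) -> int:
        if seq == self.expected:
            self.delivered.append(data)
            self.expected += 1
        # Always ack the sequence number we saw, so a retransmission of an
        # already delivered packet (whose ack was lost) gets acked again.
        return seq


def send_reliably(messages, channel, receiver, max_tries=50):
    """Stop-and-wait: retransmit each packet until its ack comes back."""
    transmissions = 0
    for seq, data in enumerate(messages):
        for _ in range(max_tries):
            transmissions += 1
            packet = channel.deliver((seq, data))
            if packet is None:
                continue  # data lost; a real sender would time out here
            ack = channel.deliver(receiver.on_packet(*packet))
            if ack == seq:
                break  # acknowledged, move on to the next message
        else:
            raise RuntimeError(f"gave up on packet {seq}")
    return transmissions
```

With a 30% loss rate the receiver still ends up with every message exactly once and in order; the cost shows up as extra transmissions. Stop-and-wait is the simplest scheme that works, not a fast one, which is part of why the afternoon project and a deployable transport are different things.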

I understand the motivation for going to a binary protocol for a SPDY-inspired HTTP 2.0, but I would have liked to see an ASCII-based protocol similar to what jgrahamc proposed last year [0]. I thought it was a much cleaner protocol to read and understand, and much more in tune with what the web is supposed to be. Why not keep the clever binary stuff separate in SPDY, endorse it through the IETF, and keep HTTP ASCII?

[0] http://blog.jgc.org/2012/12/speeding-up-http-with-minimal-pr...

There were so many people just calling for rubber-stamping SPDY as HTTP 2.0 without any changes that frankly, I feel lucky that we're getting revisions at all. The editor of the draft is a good guy, and I trust him to make good changes.

I think it's pretty ludicrous that so many people are willing to let an advertising company basically re-write the way that every computer on the web talks to each other with very little questioning, but this is the place that we're at nowadays. For better or worse, the big vendors guide the standards process. I'd like to see more involvement from the little guys, but that has its own set of challenges.

What does Google's involvement in advertising have to do with the design of the SPDY protocol? Can you make a substantive criticism of SPDY based on Google's advertising incentives, or is this just innuendo?

Solely on their advertising incentives: no. It's a rhetorical flourish.

That said, I _do_ think it's extra important to pay attention to what Google does, for two reasons:

1. They're the largest entity on the Internet. This means their incentives are different than smaller players.

2. They _are_ an advertising company. Advertising companies make money by showing ads. They make more money by showing targeted ads. You target ads by collecting data on people.

I think people often forget Google's purpose in the world, and are simply dazzled by 'whoah cool stuff.' I appreciate some of Google's more interesting and ambitious initiatives, but get very scared when people start accepting any entity's actions without question. Specifically when that entity has large financial incentive to collect data about people.

There are, of course, many technical criticisms of SPDY, but none that rely specifically on the advertising angle.

Personally, I find this line of argumentation nothing more than pure ad hominem. This is an open specification, open to review by everyone. If there were technological changes made to somehow support better data collection, people would be able to see that.

The IETF and the W3C have always had large companies involved in specs, often with their own agendas. I see no reason to attack Google in this way.

An ad hominem would mean that I am saying they're wrong. My argument is not

    Google is an advertising company, therefore SPDY is bad.

My argument is

    Google is an advertising company, and the largest
    single entity on the Internet, and therefore, their
    actions deserve a healthy dose of skepticism. I'm
    not sure that we've been giving them enough skepticism.

SPDY does have good points, and bad points. I just saw a lot of chatter from people who want to ignore the bad points simply because of where the spec came from.

Google also handles more traffic than almost any company on the planet, so their involvement in a discussion about transporting bits is, well, not really that shocking.

The issue is that you're throwing around them being an advertising company as a negative without any discernible proof that it has negatively affected the outcome.

What, specifically (and please spare us the 'rhetorical flourishes') has been proposed that is unfairly biased towards advertising? Which parts should we be skeptical about?

It's not advertising that I'm skeptical of w/r/t Google. It's their amount of capital.

E.g., relating to the ASCII/binary discussion above:

A binary web would require advanced retooling and therefore investment. Smaller business entities are not in such a strong position to deal with such a large shift in their workflow. Therefore, switching to a binary protocol would disadvantage entities smaller than Google.

Sure, but bigger and smaller scale operations have different needs. The internet isn't supposed to be about what's best for the big guys, it's supposed to be about what's best for humanity.

I've mentioned specifics elsewhere in this thread, for example, https://news.ycombinator.com/item?id=6013468 and https://news.ycombinator.com/item?id=6012906

The "specifics" in the other threads are more ad hominem. You're saying "we should be wary of what Google does" without actually mentioning what's there in the spec to be wary about. You're saying "we shouldn't trust Google to pass specs unchecked", people are saying "but we aren't: we've read the spec, and it's good", and you're saying "yeah, but we shouldn't trust Google".

> without actually mentioning what's there in the spec to be wary about.


And, as I mention elsewhere, it's not just the spec, it's the general trend of Google dominating the web. https://news.ycombinator.com/item?id=6013468

Sure, the general trend is a good point, just not entirely relevant to this specific thread. Your comment above does make good points about the actual spec, I agree.

But why does an advertising company deserve more skepticism than a company that sells operating systems or database servers?

As I mentioned elsewhere in the thread, an advertising company has incentives to collect and process as much data about individuals as possible. This is Google's core competency.

Oh, and in Google's case specifically, they now control the largest web site, one of the largest web browsers, (with this) the protocol it talks to, and they're attempting to supersede JavaScript too... and one of the largest client-side frameworks. The list goes on and on. The largest mobile phone OS. Email provider. (?) Working on social network...

Unless you can point to a specific point in this IETF draft which furthers Google's interests while diminishing others' I'm going to ignore you.

There is more to the world than a technical draft, you can't just abstract away the rest of everything else.

For technical critiques, I commented earlier: https://news.ycombinator.com/item?id=6013088

> There is more to the world than a technical draft, you can't just abstract away the rest of everything else.

Then maybe you should actually bring these things up. You seem to be playing coy and throwing around innuendo. If you have real, substantive concerns about something, I think the conversation would be greatly improved by actually bringing those up rather than just casting aspersions about Google.

I have brought them up, this isn't my only comment in this thread.

There are a large number of Google employees and stockholders on HN, and it would not be at all surprising if they leapt to Google's defense for non-technical reasons.

It would be enlightening to see the defenders of Google disclosing whether they have any financial or other interest in Google.

To complement that disclosure, any attackers of Google should likewise disclose whether they have any financial or other interests in Google's competitors.

It would help if people just criticized the spec on its technical merits.

I agree that technical critiques are good, but technology is social. You can't abstract away the rest of the real world. We need both kinds.

Sure, but I don't find the social critique very useful either.

There are things that Google does, as a corporation, that you can find fault with, technological decisions that may or may not have been influenced by business model. For those, fire away. But the attack on SPDY/HTTP/2.0 because "Google is an advertising company" (which if you actually worked at Google, and knew how people made decisions here, you'd know is ridiculous from an intent or motivation point of view) is just pure mudslinging.

Examples of stuff that I, as a Google employee, would criticize Google for: Real Names, building "siloed" services and moving away from federated/decentralized approaches (see my essay here: http://timepedia.blogspot.com/2008/05/decentralizing-web.htm...), most of what Yegge said about APIs, Google Hangouts going "silo" and away from the XMPP model, etc.

People who work on ads and take their marching orders from ads are a small portion of employees at Google. The guys working on Chromium/Blink/SPDY do not report to ads, do not take orders from ads, and in general, work on technology without reference to monetization strategy. Their day to day job is to improve technology, with the hope that if you raise the tide, all boats will be lifted, and there'll be some ROI from that.

But the idea that engineers are taking marching orders from shareholders to maximize profits based on ads by tweaking web standards is hilariously wrong for people working on Chrome.

I'm not talking about "Larry And Sergey have decreed that Evil Shall Happen!" I'm talking about broad economic incentives. Since I don't work for Google, I have to treat them as a black box; I see what goes in, I see what comes out. I know nothing of the internals, I only have one friend who actually works there. If I implied there was some kind of conspiracy, that is my fault. You're right that that would be ridiculous.

I would also criticize Google for your reasons, and they may be even more important. But this isn't a thread about those things.

A reasonable argument would say that we don't need the social and political stuff in standards discussions, which should be based instead on engineering.

Absolutely. This is why I wouldn't make these comments on the IETF mailing list. I do think that HN is an appropriate venue, this is very much a social place.

"Should we be doing this?" and "How should we do this?" are two very different questions.

That's a fair point for why your argument is germane to HN, but for what it's worth I still don't agree with the argument.

While I agree with keeping a healthy skepticism of Google, I have a bit of difficulty with this wording:

> I think people often forget Google's purpose in the world

The fact that Google makes money through showing ads doesn't mean it's their purpose.

I think it's fair to argue that a company's purpose is best illustrated by its revenue streams. Reasonable people can disagree depending on the circumstances.

Not necessarily the case.

Github's revenue stream is through private repositories (both hosted on github.com and self-hosted enterprise), but I don't think you could reasonably assert that Github's purpose is to make a profit off of keeping code private. Their actions, in fact, suggest precisely the opposite.

In some cases, a company could transcend its initial purpose, but still keep it around as a/the revenue stream as a means to the new end. Not many/any new and further out there Google initiatives have made it to wide scale public adoption, so it's yet unclear whether Google would be such a company, but it could very well turn out to be one.

If it's a publicly traded company, it has a fiduciary responsibility to make money for its investors; so I'd have to agree with you. Its purpose is to make money. It might spend money to buy goodwill to earn loyalty, but at the end of the business day, it's a business.

Google's corporate charter was specifically written to avoid that. And shareholders have no meaningful voting rights, so they can't override it there either.

>If it's a publicly traded company, it has a fiduciary responsibility to make money for its investors //

Where is this codified?

It doesn't have to be "codified" to be fiduciary. The trust relationship between any investor and the investment enterprise is that the enterprise will be able to generate a return on the investment. If it doesn't assume this, it generally will be deemed a non-profit.

If it's not codified then it's more likely an expectation than a responsibility. Of course investors expect a return, that's what the term "investor" entails.

Non-pecuniary returns can satisfy the responsibilities of an enterprise.

It appeared that a legal obligation was being suggested. What sort of obligation was being suggested and how is that obligation derived and enforced?

To play devil's advocate, what is Apple's "purpose"? How often has it changed?

Their shareholders would beg to differ.

Could you please share some of the technical criticisms of SPDY that you find most compelling?

I can't speak for Steve, but I thought Poul-Henning Kamp's critique was pretty compelling.


In brief, he makes three points. The first is that SPDY/HTTP 2.0 doesn't do anything about the widely lamented lack of session handling. The second is that it doesn't contain any simplifications of HTTP, despite there being several examples of things that could be simplified (header parsing, for instance, is hairier than it could be). The third is that it is going to pose problems for proxies.

I don't know how many of these points continue to apply with this HTTP 2.0 draft, nor do I have any skin in this game, but I respect PHK quite a bit so his outrage creates in me a sense of mild reservation. :)

I too have unreserved respect for PHK as an implementor. I'm not sure I find his critique compelling. It seems to me that it distills to a couple simple points:

* SPDY depends on Deflate compression, and will require middleboxes to implement deflate to route requests. I think the "IETF school of design" has an irrational fear of good compression and I think it's harmed other protocols, most notably DNS. I may be poisoned into this viewpoint by Bernstein.

* There are protocol constants that PHK doesn't know the background of, which strikes me as the kind of documentation bug that something like an HTTP 2.0 would address.

* SPDY might have required another WKP, which isn't really a SPDY problem.

* There's DoS potential in SPDY --- but of course, there's DoS potential in HTTP too; look at chunked encoding, for instance. For that matter, modern HTTP 1.1 also accommodates compression; when it comes to attack surface, in for a penny, in for a pound.

* A similar argument addresses PHK's concerns about the (theoretic) security of the push model, which is also something that modern HTTP accommodates.


Oh oh also: PHK sees HTTP 2.0 as an opportunity to correct the session management problem, which has led to the "bass ackwards" design of heavyweight signed cookies in web applications. I sympathize with him on this point, but it's not HTTP's fault that this happens. HTTP 1.1 cookies also used to be simple opaque session IDs; heavyweight signed cookies are a consequence of server app architecture, not the underlying protocol.

Even if HTTP 2.0 had built-in robust session management, Rails apps would still be shoving several kbytes of encrypted state out to web browsers.

I hadn't read about DNS compression from DJB, but having looked at implementing it, I wish I had never started reading the specs for it.

Such a complete and utter mess of a scheme!

The first two criticisms of SPDY sound like "doesn't solve every known problem with HTTP at once", which was never a design goal; that doesn't make SPDY bad, it just means that further room for improvement still exists.

The third criticism, that SPDY makes life more difficult for routers, makes me wonder: would this get easier if SPDY just said "forget the Host header, SPDY requires SNI"? Seems like that would help.

My main objection is that the name you call something does matter. SPDY is a very different protocol from HTTP, which addresses a very particular set of concerns. It diverges quite a bit from the "intent" of HTTP. This is all fine and good until you change the name from SPDY to HTTP 2.0. One expects 2.0 of something to continue the same philosophy and motivation that produced 1.0. When that doesn't happen (R6RS is another good example) you can expect some pushback. In this particular case, the "label swap" nature of the process is generating animosity from those who feel that the process has been co-opted by people trying to pull a fast one. I don't think SPDY is intrinsically wrong, I just don't think it looks like a natural successor to HTTP. I wouldn't expect HTTP 2.0 to address every known problem with HTTP at once, but I don't think it's unreasonable to expect at least a few aesthetic improvements.

I don't see how this follows from your earlier objections. "It doesn't add session handling and it doesn't simplify header parsing, therefore it diverges from the intent of HTTP" seems like a non sequitur.

Don't confuse my objections with PHK's objections. There may be good technical answers to his objections; Thomas replied to them above quite cogently, but in any event, PHK's opinion carries a lot more weight than mine. I'm just a spectator.

My objection (observation, really) is that one expects protocol 2.0 to do more than address performance optimization. Simplifying the protocol is a good thing to do with a major revision; they didn't do that. Making the protocol more friendly for upper layer users is another good thing to do with a major revision; they didn't do that either. Instead they took an obviously different protocol designed to address a handful of extremely technical performance matters and rubber-stamped it as HTTP 2.0. Whether you like SPDY or not, it should be clear that this kind of "process" is going to leave people feeling disenfranchised. The spirit of HTTP, inasmuch as such a thing exists, is one of simplicity. SPDY just doesn't "smell" like the successor.

I think the comparison to R6RS is very appropriate to my point. R6RS was designed to address well-known shortcomings of Scheme. The process it took to get approved circumvented a lot of the community. A large segment of the community responded to this by essentially whining about it and ignoring it. We already see the whining about HTTP 2.0. I predict it will be followed by ignoring it, and some years in the future, an HTTP 2.1 or 3.0 that more closely resembles HTTP 1.1.

My sibling has already pointed out one of the better critiques I've seen. There is also http://www.guypo.com/technical/not-as-spdy-as-you-thought/ , which I believe has been discussed on HN before, but I'm on a pomodoro break, so I'm trying to keep this short.

One critique that I don't remember seeing in either of these two concerns header compression. Header compression seems to make sense, as compression is good. The problem is that intermediaries make routing decisions based on the headers, and so it's quite possible that the CPU time needed to decompress, possibly modify, and recompress the headers outweighs any gains that the compression brought in the first place.

I've also seen some vague commentary about 'mixing application concerns into the transport layer' which I find compelling, but I don't have enough experience with the low-level networking to properly judge on my own.

Break over! Gotta run.

Worst of all, compression is stateful: you need to capture the whole HTTP/2.0 session to be able to reconstruct any information, and HTTP/2.0 debug tools become mandatory.
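
The statefulness point is easy to see in a small experiment. The sketch below is Python using `zlib` with one compression context shared across two header blocks. Note the assumption: zlib is only a stand-in for whatever header compression the draft ends up with, and the header blocks are invented for the demo.

```python
import zlib

headers_1 = b"GET /index.html\r\nhost: example.com\r\nuser-agent: demo\r\n\r\n"
headers_2 = b"GET /style.css\r\nhost: example.com\r\nuser-agent: demo\r\n\r\n"

# One compression context shared by every header block on the connection.
compressor = zlib.compressobj()
block_1 = compressor.compress(headers_1) + compressor.flush(zlib.Z_SYNC_FLUSH)
block_2 = compressor.compress(headers_2) + compressor.flush(zlib.Z_SYNC_FLUSH)

# A peer that saw the whole session decodes both blocks in order.
decompressor = zlib.decompressobj()
assert decompressor.decompress(block_1) == headers_1
assert decompressor.decompress(block_2) == headers_2


def decode_alone(block: bytes):
    """Try to decode one block with no earlier context; None on failure."""
    try:
        return zlib.decompressobj().decompress(block)
    except zlib.error:
        return None


# A tool that only captured the second block cannot recover the headers.
late_capture = decode_alone(block_2)
assert late_capture != headers_2
```

The second block is much smaller than the first precisely because it refers back to bytes the first one carried. That is the saving, and it is also why a capture that starts mid-session is unreadable.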

> their incentives are different than smaller players

Yes. They are not representing smaller players, i.e. the majority. And I think for smaller players speed is not as important as convenience. So this can even hurt smaller players in the long run.

Google's self-declared, initial purpose was to organize, curate, and present all the world's information. We didn't know that meant people too.

You don't think Google receives enough skepticism? Every time they brew a pot of coffee, somebody out there declares that Google has violated their "Don't be evil" motto and is out to destroy us all with their dark caffeinated schemes. I can think of very few companies that are treated with more skepticism than Google.

* flow control issues, iirc QUIC is intended to help address this.

* TLS not mandatory for HTTP/2.0

Google is the industry's most active and effective corporate advocate for TLS. They're one of the key drivers for certificate pinning and one of the earliest mainstream deployers of forward secrecy. So I think that argument is a little bogus.

I don't understand the first point, though. Could you clarify?

QUIC is a very new, experimental protocol that runs on UDP. Their (relevant) basis is that TCP's algorithms are completely controlled by the OSes and the routers and all. Using UDP, QUIC can quickly deploy new algorithms without requiring a major part of the world's infrastructure to be changed.

Sure, but SPDY is a TCP protocol, and so inherits TCP-friendliness from that; I'm just not seeing what that has to do with HTTP.

Google is the industry's most active and effective corporate advocate for TLS simply because it makes tracking users and selling targeted advertising a whole lot easier. Their involvement in the whole PRISM affair has undoubtedly demonstrated that privacy is none of their concern.

I wish I could somehow CAPTCHA comments like these so I could tell if they were people or Markov generators.

Years ago, in days of old, when magic filled the air, I wrote a Slashdot troll post generator. It eventually produced some pretty hilarious posts, but I never closed the loop by allowing it to post. It would make a fun project for learning a new language; perhaps I'll install Dart and give it a shot.

With SPDY as implemented all requests for google analytics reuse the same TCP connection. This connection acts as an implicit tracking cookie uniquely identifying your browsing session.

>I think it's pretty ludicrous that so many people are willing to let an advertising company basically re-write the way that every computer on the web talks to each other with very little questioning

I don't think there's very little questioning... In any case, Google's goal is to deliver ads to you in the most efficient and fastest way possible. The more you browse the web the more Google makes; it's in their interest to develop SPDY/HTTP2.0. So what is wrong with them doing it? IEFT spec drafts are public and they're audited (as SPDY has been).

Also, most of the specs that we love and rely on today came from "big vendors". It's nice and all to say you want the little guy to be a part (and they should be), but it takes quite a bit of manpower to develop, draft, and finally get a spec ratified.

> Also, most of the specs that we love and rely on today came from "big vendors"

The core internet protocols we rely on, though, mostly didn't. If you look at the authors of RFCs specifying the widely used standards, nearly all of them were at research institutions: Steve Crocker was at UCLA then ARPA; Vint Cerf was at UCLA, then Stanford, then ARPA; Bob Kahn was at ARPA; Jon Postel was at UCLA then USC; Paul Mockapetris was at UC Irvine; Abhay Bhushan was at MIT; Tim Berners-Lee was at CERN then MIT.

Not sure if that's good or bad, but it seems to have been uncommon until recently for internet protocols to come from vendors.

edit: I did think of one important one, IPv6. Steve Deering was at Stanford, then Xerox PARC, then Cisco, and IPv6 came out during his Xerox/Cisco period. Bob Hinden was at Ipsilon Networks, then Nokia.

> it seems to have been uncommon until recently for internet protocols to come from vendors

Sorry, that's been going on for quite a while now. Cases in point: http://tools.ietf.org/html/rfc3768 http://tools.ietf.org/html/rfc5077 http://tools.ietf.org/html/rfc2637 http://tools.ietf.org/html/rfc2281 (these are just examples, there are many more citing Cisco, Microsoft, Nokia, Google, etc.)

Don't forget Dave Clark, who was at MIT and is the chief architect of TCP itself.

> IEFT [sic] spec drafts are public and they're audited (as SPDY has been).

Absolutely. But there's more than one kind of control. I don't think enough programmers understand the effects of social control. If the standards are all public and audited, but only employees of Apple, Google, and Microsoft have the time and energy to keep up with discussions, well...

And, of course, I'm not implying that _only_ that is true; I just fear that big organizations are dominating the discussion. I have more free time than the vast majority of programmers, am subscribed to the HTTP 2.0 mailing list, and find it hard to keep up.

> IEFT spec drafts are public and they're audited (as SPDY has been).

> Also, most of the specs that we love and rely on today came from "big vendors". It's nice and all to say you want the little guy to be a part (and they should be), but it takes quite a bit of manpower to develop, draft, and finally get a spec ratified.

It's especially hard when the call for proposals period of the draft is about 4 months and there happens to be a ready made proposal from a big player at the ready to be agreed on almost immediately. It's nice to say the little guy should be a part, but in this case the little guy mostly heard about it long after it happened.

> I think it's pretty ludicrous that so many people are willing to let an advertising company basically re-write the way that every computer on the web

First, to call Google an advertising company makes no sense. It's a tech company. You don't call a newspaper an advertising company either.

Second, there were people involved in designing this, not just an anonymous corporation. You can actually see their names in the proposals. It's a good design, that's why it has been adopted.

> First

How does Google make money again? Newspaper companies are advertising companies, especially given the quality of the news lately. ;)

> Second

Absolutely, and I don't mean to denigrate their technical efforts. I'm glad people want to move the web forward. I'm just recommending caution.

> It's a good design, that's why it has been adopted.

Many poor designs have garnered wide adoption in the past, so adoption alone doesn't prove a design is good.

Wide adoption of SPDY also seems like quite an overstatement. A few big players have been trying it, but the only number we have so far is 339 SSL certificates used with SPDY-enabled servers in May 2012. http://news.netcraft.com/archives/2012/05/02/may-2012-web-se...

A Google recruiter actually told me that Google is "basically an advertising company." Then again, he was recruiting for an AdWords-related position, so his viewpoint may have been skewed.

Oh come on don't let paranoia encourage you to throw the baby out with the bath water…

The way Mark Nottingham ran the original CFP for HTTP 2 and the eventual adoption of SPDY as a starting point was very fair - it's all there in the IETF archives for anyone to see. From memory there were only two other proposals (from Microsoft and someone else).

The reason Google were able to get a new protocol up and running is because they have both heavily used web properties and a browser. They're also willing to carry out experiments in public.

As it stands HTTP 2.0 will be good for the little guys too. Based on the testing I've done, little guys will see an improvement in performance without needing to do all the merging that destroys cache lifetimes.

3rd-party content is the fly in the ointment to the performance improvements, so we'll need to be much more careful about the performance of the 3rd-party sites we include.

N.B. Apart from using their products I have no affiliation with Google

> For better or worse, the big vendors guide the standards process.

It costs money to have people on staff who write IETF drafts and haggle them up to RFCs. Hopefully the standard isn't too degraded by the needs of Google in this instance and everyone benefits.

Absolutely. I'm not saying that it doesn't. I share your hope, but I'm not super optimistic in the general case.

This idea is bad for a couple reasons.

First, ASCII is inefficient. People don't interpret HTTP, computers do. Web servers and browsers. People only look at HTTP when they want to troubleshoot without any tools. With real tools, you can find out what's broken much quicker. And there's plenty of things you can miss without a real HTTP interpreter. Most hackers prefer to think of themselves as wizards that can spy 0's and 1's and tell you what the weather is. It doesn't make for a better protocol, though.

Second, we can already break HTTP responses up in multiple parts, using a novel idea called "multipart". It sucks and nobody has used it since HTML/JS found new ways of providing content. http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html

Third, it's a hack. If you want to improve the protocol, improve the protocol, don't just hack onto it to make it do what you want. I could make a horse and buggy go 60mph, but would it be a good idea? How about just designing a better buggy that is intended to go 60mph?

Fourth, fixed-length records are the wave of the future! It solves crazy problems like header injection and request integrity checking. Moreover, it makes for simpler, more efficient parsing of requests.

Fifth, redundancies introduced from the beginning of time need to go away, like terminating every record with "\r\n", or passing the same headers on every single damn request when once should be just fine for a stream of requests. Little inefficiencies like this don't go away if you just hack onto the same old protocol forever.

Sixth, the flow control improvements can make different applications more efficient by both not having to hold state of where and when traffic is coming and improving flow across disparate network hops.

Seventh, as to a general question "why use binary when we can compress stuff", what is smaller: a string of bits, or a compressed string of digits? Add to this that every header could have a 32-bit identifier (4 bytes) and you've got more efficient compression than gzip. Of course, most people would argue that you don't save enough time or bandwidth or cpu cycles to make a difference, but it would make for easier-to-write parsers, which would make working with the protocol in general more attractive. But then you have your binary-detractor-wizard-hackers and the whole conversation becomes an infinite loop.
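To put rough numbers on the size claim, here is a minimal sketch comparing one header as an ASCII line against a made-up binary layout. The ID table and the 4-byte-ID plus 2-byte-length framing are assumptions for illustration only; they are not taken from any HTTP/2.0 draft.

```python
import struct

# Hypothetical header IDs; no real protocol assigns these numbers.
HEADER_IDS = {"Content-Length": 1, "Content-Type": 2}

def encode_text(name, value):
    """Encode one header the HTTP/1.1 way: 'Name: value' plus CRLF."""
    return f"{name}: {value}\r\n".encode("ascii")

def encode_binary(name, value):
    """Encode one header as a 32-bit ID, a 16-bit length, then the raw value."""
    body = value.encode("ascii")
    return struct.pack(">IH", HEADER_IDS[name], len(body)) + body

text = encode_text("Content-Type", "text/html")
binary = encode_binary("Content-Type", "text/html")
print(len(text), "bytes as text,", len(binary), "bytes as binary")
```

The saving comes entirely from the header name; the value is the same size either way, so how much this matters depends on how long the values are.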

> First, ASCII is inefficient. People don't interpret HTTP, computers do.

I can't tell you how many times I've manually read HTTP. To be sure, it's insignificant compared to how many HTTP headers have passed through my computer unseen by me.

ASCII may be inefficient, but computers are really fast, people are not. I don't have any measurements, but in making a browser, HTTP header parsing/writing was never near a performance issue. Bandwidth-wise it is also tiny compared to that image file you'd inevitably download for every page visit.

And sometimes you don't have tools. Sometimes you don't want tools. Sometimes you want to use tools that work with text to analyse your problem.

(Your other arguments might still stand, though :) )

> HTTP header parsing/writing was never near a performance issue.

Since header lengths are not limited, and a single TCP packet's payload is quite limited, long headers can cause very measurable latency difference. Additionally, while I agree the generation / parsing overhead is probably quite small, saving it for every HTTP request is still a boon.
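A back-of-the-envelope sketch of that point, assuming a 1460-byte MSS (a 1500-byte Ethernet MTU minus 20 bytes of IPv4 header and 20 bytes of TCP header, no options). Real paths vary, so treat the constant as an assumption.

```python
import math

# Assumed MSS: 1500-byte Ethernet MTU minus 20 bytes IPv4 and 20 bytes TCP header.
ASSUMED_MSS = 1460

def segments_needed(header_bytes, mss=ASSUMED_MSS):
    """Return how many TCP segments a header block of this size occupies."""
    return math.ceil(header_bytes / mss)

for size in (800, 1461, 4096):
    print(size, "bytes ->", segments_needed(size), "segment(s)")
```

Once the headers spill past one segment, the request can no longer go out in a single packet, which is where the extra latency would come from.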

I'm also curious where you are reading raw HTTP from?

For me it's primarily in two situations, reading a packet capture from WireShark, or in the browser's debugger. In both cases, the tool will end up translating the request for me.

I guess a lot of people here never had to work with the X protocols like X.400 and X.500. When you need specialized tools for every protocol and encoding format, development is a real drag.

Just because I use Charles or Wireshark doesn't mean that I only want to use those specialized tools. I have definitely been in situations where I'm doing something like running nc as a proxy and looking at raw HTTP. I wouldn't choose to throw that away and revert to the bad old days without a big win.

The nc example is the last bastion of this argument. But what about SSL? For a long time you just couldn't test it, or maybe use OpenSSL's server feature piped to nc.

But now nc supports SSL natively, making it super easy. Just as it will support binary HTTP natively, making it super easy. And everyone will finally stop caring about ASCII.

Should your willingness or unwillingness to use a tool for these (rare) scenarios influence the design of the protocol in any substantial way?

> saving it for every HTTP request is still a boon

I'm questioning the measurability of this, though. Smells like premature optimization. You could be right, but I'd like to at least measure it before we go about changing one of the fundamental protocols on the internet.

Last time I read raw HTTP was when writing a script to automate some stuff on a web page. I specifically did not want the browser's headers and behaviors. I had a bug which only happened from my script, and raw HTTP helped me track it down. I could have used wireshark, but I am much faster in vim for a simple task like that.

HTTP has existed for over 20 years. We've had some time to look at it. It has been measured.

As a comparison to your scripting story, you would use Wget or Curl or LWP::UserAgent or a thousand other things to automate HTTP requests. One function call to do what you did manually. To find bugs you would use an HTTP fuzzer like Skipfish to automate the process. If you think somehow your manual process was faster, I say to you, teach a man to fish...

(I automate things in web pages for a living, and I only use tools like Firebug and LWP)

Plain-text formats have always been slower for things that are not plain text. But even 30 years ago, when computers were even slower, Unix designers decided plain text was still the way to go, because it was easier to debug and easier for humans to work with. No specialized tools required, no poring over hex dumps. HTML won over other document formats. JSON and XML won over other binary formats. Any coder can look at JSON and see what is being transferred, without the aid of anything but a text editor. Plain-text marshalling formats for binary data (e.g., base64) are still useful for pasting data into an email or adding ssh keys to authorized_keys with "cat >>". Tool support is not going to make SPDY any nicer.

Things have changed in 30 years. Unix designers didn't have the time or resources to write elaborate tools, nor the need for complicated software. Back then you would use telnet to browse Gopher or send your mail. Things are different now. I dare you to read a 3KB JSON file without a parser. Base64 was a hack for text-based protocols. Tool support will make it a lot nicer than no support.

A simple protocol makes it simpler to write tools. Simpler tools are easier to change and upgrade, and it's simpler to add features: faster development, faster improvement. A better life for the programmer.

Also, sometimes you need something specific, and then you have the option to code it up yourself or to change an open source library quickly and efficiently.

The main "thing" that changed in 30 years is computational power, which is now several orders of magnitude greater. If 30 years ago computers sporting the power of timex watches spared the cycles for text protocol overhead, I fail to see the need to squeeze, in today's hardware, that last drop of performance.

The advantages of text based protocol remain the same. The disadvantage is lessened by faster CPUs.

The only advantage to a text-based protocol is you can read and understand it in raw form. Unfortunately this is not an advantage over binary protocols.

If anything, the more complex the protocol, the more redundant the text becomes, because we have to write tools to parse the text and output it so we can understand it better or identify flaws in it, and work around bugs introduced by the human element of the protocol. The ability to view and interpret the protocol in a text editor is equivalent to the ability to view and interpret the protocol as output from a debugging tool or log file - except the tool can give you much more detail than the text file in a variety of ways. Text files are inferior, but they can be quicker/simpler, depending on what you're doing.

You still need a library or tool to write the protocol out, as it's complicated and needs to be structured for the machine, not a person.

Second, saying "it's ok that it's slow, we'll just buy a faster CPU" is not a good argument for anything ever. It's part of the reason it's taken so long to adopt encrypted services everywhere. Someone (Google) had to finally prove it wasn't slow so people would adopt it.

Third, the state of modern computers is that there is no difference in speed between interpreting most text protocols and binary protocols. But that has nothing to do with efficiency, or what the machine is naturally suited to doing. You have to translate from English into machine code for a computer to know what the hell another machine is talking about. Machines don't care about line-by-line, or capitalization, or indentation, spaces, or any vestige of our natural language. Strip all those things away and machines purr along happily with less bullshit to deal with, which means simpler, more efficient code. Note that I didn't say faster.

Fourth, your performance and history observation is flawed. We need a lot more performance today than we did before, as we're scaling existing technology to many, many orders of magnitude higher than anything that existed when it was invented. Yes, we have faster CPUs. We also have more users and more data, and we don't have time to sit around reading packet dumps in text editors.

> The ability to view and interpret the protocol in a text editor is equivalent to the ability to view and interpret the protocol as output from a debugging tool or log file - except the tool can give you much more detail than the text file in a variety of ways. Text files are inferior, but they can be quicker/simpler, depending on what you're doing.

This is entirely false, as anyone who ever had to debug a malfunctioning http proxy or a misbehaving IMAP can tell you. Nothing beats netcat for a quick bug isolation test. As for the need for formal parsing, again it is true for production code, entirely false for sysops transient tasks.

Compare debugging a corba server with debugging http for a whiff of the difference.

Tools aren't omnipresent. My myriad busybox embedded devices likely won't ever have a protocol analyzer. If I'm in need of one there, I'm done for.

No I don't particularly want to read a 3kB JSON file without tools - but the point is: in a pinch I _CAN_.

With a binary protocol you're entirely dependent on tools (unless you want to trawl through it with a hex editor).

You're already dependent on tools - your eyes and language processing parts of your brain - to use text formats. With a binary protocol you'd be equally dependent on tools. They're just not embedded in your skull.

Seeing as we use binary protocols every day of our lives, and the tools to work with them have existed for years, and nobody has any problem with using them, let's let this argument rest.

"Here is a ridiculous stretch of semantics because I can't admit anyone else has a point. Also let's stop arguing at the end of this sentence to avoid rebuttal."


And reading such comments all I think is:

"The history is doomed to repeat itself".

The same arguments can be applied to HTML, CSS, JSON, RSS and so on. I fail to see the crucial difference between those and HTTP. Or would you say the web as a whole should be binary?

> Fifth, redundancies introduced from the beginning of time need to go away

I wholeheartedly agree with this, but it doesn't automatically warrant binary encoding.

Human readability is a huge bonus in any protocol or format. Not because normally people read those protocols, but because people read ASCII and therefore they have good tools to work with ASCII.

...? Seriously? You don't see the difference?

HTTP is a layer 7 communication protocol. HTML/CSS are markup languages for designing an interface. JSON is a data interchange format. RSS is a content syndication format.

They are all wildly, vastly different. The only thing they have in common is they're all ASCII. If anything, you're making my argument for me: a communications protocol is not a format for displaying documents, it is a language for communicating machine instructions to network applications. Historically they have always been binary because it works better that way.

Your argument that "people can read ASCII, so ASCII is good" leaves out a couple points. Like, human beings do not read an HTTP statement, go into a file folder, bring out a document and present it to their computer. It's the other way around.

Really this just reflects a strange phobia people seem to have. Like your brain is tricking you into thinking you'll lose something by not looking directly "at the wire".

When you look at HTTP headers, 90% of the time you're actually looking at a pre-parsed, normalized set of fields. If you look at a raw packet dump, the whole message may not show up in one packet; you may have to reassemble it, which means parsing. If you have multiple requests in one connection, you have to find the end of the last request, which means seeking through the stream; seeing requests broken down individually means a tool already parsed them. Firebug and wireshark and other tools all take care of the automated, machine-operated work for you.
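For what it's worth, the "find the end of the last request" step is small for text HTTP. A minimal sketch, assuming body-less requests (no Content-Length or chunked bodies) and a capture buffer that has already been reassembled; `split_requests` is a name made up for this example.

```python
def split_requests(stream: bytes):
    """Split a buffer of body-less HTTP/1.1 requests on the blank line.

    Returns the complete requests and any incomplete tail still in the buffer.
    """
    requests = []
    while True:
        end = stream.find(b"\r\n\r\n")
        if end == -1:
            break
        requests.append(stream[:end + 4])
        stream = stream[end + 4:]
    return requests, stream

capture = (
    b"GET /a HTTP/1.1\r\nHost: example.com\r\n\r\n"
    b"GET /b HTTP/1.1\r\nHost: example.com\r\n\r\n"
    b"GET /c HT"  # third request has not fully arrived yet
)
complete, tail = split_requests(capture)
print(len(complete), "complete requests, tail:", tail)
```

Requests with bodies need real parsing of the length headers, which is the part the tools are doing for you.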

And what's left? What do you have to do with HTTP, really? Apache rules? They'd stay human-readable. Application testing? We use proxies that handle it, and APIs for client/server programming. Firewalling? Handled by tools and appliances.

Stop giving me the blanket "ASCII is great for everything" excuse and tell me one thing, one single thing, that only humans are able to do with HTTP using their eyes without a tool. But you don't have to, because that's impossible: HTTP is not for humans.

> Apache rules? They'd stay human-readable.

I look forward to servers having different text representation of the same binary headers in their config files.

> tell me one thing, one single thing, that only humans are able to do with HTTP using their eyes

You're missing the point.

No one writes HTML manually anymore either. People generate it using tools (string processing tools in a language or templates) and read it using browsers. Heck, even Notepad++ is a tool, but a generic one.

If you want, you can generate all your HTML using DOM. But almost no one does that, because DOM tools are clumsy, while text-based tools are easy to use.

"No one writes HTML manually anymore either."

I was up until four last night doing just this. It's commonly done for templating purposes all the time, or quick hacks and placeholders.

Have you lost your mind?

You're actually still arguing for my point instead of against it.

If no one writes HTML manually anymore, then we have no need for it to look like English when the computer interprets it! We can compile the HTML down to bytecode and have it be interpreted much quicker by the computer, which won't have to do the job of lexing, compiling, assembling, etc. Here, two steps would be eliminated immediately, resulting in increased speed and more efficient storage and transmission: http://www.html5rocks.com/en/tutorials/internals/howbrowsers...

For that matter, if it's generated by tools, and we use programs designed to interpret and decipher and color-code it, all of that can happen without it being in English!

On top of that, you missed when I said HTTP is a communications protocol. Ever seen the movie The Matrix? Know how the sentinels would sometimes look at each other and make scuttling noises, then shovel off somewhere? They weren't speaking English ASCII. They were speaking a binary communications protocol. Know how I know? BECAUSE MACHINES AREN'T HUMANS! It would be absolutely moronic for them to speak English to each other. It would be like dogs saying the English word "bark" instead of just barking. Completely unnecessary and crazy. But that's what an ASCII communications protocol for machines is.

On top of that, there is no benefit, not one at all, to humans being able to read it when tools already exist to interpret and display it even more human-readable than its natural state. We squish and compress and strip HTML and JS already just to make it more efficient, and then undo the whole process just to read it. It's insane.

So you really think we'd be here today if instead of HTML, CSS, JavaScript, JSON, XML we had a web based on bytecode formats?

The web is made by people, not computers. Open a ubiquitous text editor and you can start working on something right away. If you have to download a dozen different compilers and IDEs to do that, it's definitely not the same.

"The web" is actually just a collection of hyperlinks, applications that parse markup and document storage and retrieval services. You don't see code. You see pictures of cats. And you never, ever need a text editor to use it.

Face it. Your love affair with ASCII is just that: an emotion.

(As to your original question: humans haven't needed to program in binary or assembly for decades. That's what's so great about computers: they do the hard work for us, so we don't need to type everything manually into a text editor. Is that such a hard pill to swallow?)

You're completely ignoring the fact that the web began as (and still is, in part) a collaborative tool and publishing platform. Text-based formats played an immense part in that: GeoCities, the rise of personal publishing, blogs. These would not have happened without them.

Yes, binary is more efficient, but then tell me why is JSON the most popular data interchange format on the web today?

Because XML, the preeminent human-editable data interchange format, sucked balls. It's only superseded YAML because it can be stripped of whitespace and it has the word "Javascript" in it.

Binary formats sucked so much that they had to invent XML, and it was a much better way to start the interaction era, where services talk to each other without having to read a 30-page spec just to understand how to write the right payload for the interchange format used. Let alone the byte order...

> For that matter, if it's generated by tools, and we use programs designed to interpret and decipher and color-code it, all of that can happen without it being in English!

Yes, let's base HTML 6 on Word .doc.

Also, the machines in The Matrix were hostile to humans. We'd like machines in the real world to be... not so.

Are you arguing that we should have embraced Java Applets and ActiveX controls, because they are binary formats, hence more efficient? HTTP is NOT a communication protocol, it is an APPLICATION protocol. HTTP is an application on top of a transport layer, HTTP, just like SMTP, IRC, FTP, IMAP etc etc is just a protocol that describes applications. It is not TCP or UDP and SHOULD NOT BE!

> and tell me one thing, one single thing, that only humans are able to do with HTTP using their eyes without a tool.


How are you going to see the header? Useless to debug if you can't see it.

Haven't you ever done an HTTP request through nc or even just telnet to see what responses came back? This is the best way to troubleshoot strange reverse proxies or rewrite rules.
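For anyone who hasn't done this, what you type into nc or telnet is just a few lines of text. A minimal sketch of the same thing in Python; `build_request` and `fetch_raw` are names invented for this example, and `fetch_raw` needs network access, so it is defined here but not called.

```python
import socket

def build_request(host, path="/"):
    """Build the bytes you would otherwise type by hand into nc or telnet."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")

def fetch_raw(host, port=80, path="/"):
    """Send the request over a plain TCP socket and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

print(build_request("example.com").decode("ascii"))
```

Everything that comes back from `fetch_raw` is exactly what the server (or the proxy in front of it) sent, with no client rewriting headers along the way.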

Boy, wouldn't it be crazy if applications included debugging modes that told you exactly what they were doing?

Do all applications include something like that? If not, what makes you think that they will in your hypothetical "binary is king" future?

Have you ever heard of the tcp/ip protocol suite? I hear there's some things you can use to debug it. Might even support HTTP in the future.

Yes, but that has nothing whatsoever to do with the question I asked.

Here's the problem; you keep on assuming that good tools will magically appear, that help with debugging. But good tools take a lot of time and work to perfect. In reality, you usually wind up with just barely good enough tools.

With a text based protocol, you can inspect it visually with no special tools, and munge it with general purpose tools that you already know how to use (shell script, sed, awk, perl, python, ruby, what have you) with no special support libraries or anything of the sort. Support libraries can help you with the more complex aspects of the protocols, but for basic debugging purposes, you can do it all with general purpose tools.

With a binary protocol, you need those libraries to even have a chance of being able to work with it. Now you can't use a general purpose shell pipeline to munge it; no more nc | grep or what have you. You have to have a wireshark dissector; and good luck figuring out how to grep through the results of what a wireshark dissector generates.

The main point is that the overhead of the ASCII encoding isn't the main problem with HTTP. Reading ASCII encoded CRLF delimited headers is a solved problem (and heck, you could probably switch that to just LF delimiters, since I'm sure that most processors already handle that case just fine).
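As a rough illustration of how little code that takes, here is a minimal sketch of a header-field parser that accepts either CRLF or bare LF. It handles only the header fields (not the request line, folded headers, or repeated field names), and `parse_headers` is a name made up for this example.

```python
def parse_headers(raw: bytes) -> dict:
    """Parse 'Name: value' lines into a dict, accepting CRLF or bare LF."""
    headers = {}
    # Normalise CRLF to LF first so both delimiter styles take the same path.
    for line in raw.replace(b"\r\n", b"\n").split(b"\n"):
        if not line:
            break  # a blank line ends the header block
        name, _, value = line.decode("ascii").partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

crlf = parse_headers(b"Host: example.com\r\nAccept: */*\r\n\r\n")
lf = parse_headers(b"Host: example.com\nAccept: */*\n\n")
print(crlf == lf, crlf)
```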

The problems are things like having to repeat headers over and over again for each request in a session, enormous cookies that need to get sent with every request, and the like. But you can solve those without throwing away the easily debuggable ASCII-encoded headers; and compression really does solve most of the problem with the inefficiency of ASCII encoding (and you're going to want to use it anyhow, since the HTML, CSS, and JavaScript that you're delivering is all a fairly inefficient ASCII representation too).
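A minimal sketch of the repeated-headers point, using Python's zlib with one compression context kept alive across two identical requests. The request contents are invented, and this only illustrates the general idea of a per-connection compression context; it is not the scheme from any particular draft.

```python
import zlib

# An invented request; the cookie stands in for the "enormous cookies" case.
request = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"User-Agent: ExampleBrowser/1.0\r\n"
    b"Cookie: session=0123456789abcdef0123456789abcdef\r\n"
    b"\r\n"
)

# One compressor for the whole connection, flushed after each request so the
# receiver can decode each request as soon as it arrives.
compressor = zlib.compressobj()
first = compressor.compress(request) + compressor.flush(zlib.Z_SYNC_FLUSH)
second = compressor.compress(request) + compressor.flush(zlib.Z_SYNC_FLUSH)

# The receiver keeps a matching decompressor for the same connection.
decompressor = zlib.decompressobj()
decoded_first = decompressor.decompress(first)
decoded_second = decompressor.decompress(second)

print(len(request), "raw;", len(first), "first;", len(second), "second")
```

The second copy compresses to a small fraction of the first because the compressor can refer back to bytes it has already sent, while the uncompressed form stays plain ASCII for debugging.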

Glad you came by, you can surely help me. I'm in dire need of a debugging tool that allows me to start corba requests and works with all large corba vendors.

There isn't one, don't bother looking. Corba is the poster child for the problems with binary protocols: fragmentation, buggy implementations, incompatible extensions.

I'd rather not see HTTP follow the Corba path.

> The same arguments can be applied to HTML, CSS, JSON, RSS and so on.

It can be, but that doesn't really make sense. The vast majority of web development is done without manually editing HTTP headers. It could change to a binary format tomorrow and - so long as our tools preserved their interfaces - we wouldn't even notice. The same cannot be said for any of the other technologies you listed.

First, I'm not sure what majority you refer to. Neither do I know what you mean by "manually" editing. I used this text-based function on more than one occasion:


This is a good example where adding an object-oriented representation to every header out there would require a lot of work. Not sure if it would justify the gains.

> It could change to a binary format tomorrow and - so long as our tools preserved their interfaces - we wouldn't even notice.

Until you try to use grep or something of that sort for some non-trivial analysis operation. Everything 'speaks' ASCII. Custom tools for binary format would take years to evolve to be as powerful as generic text tools.

> Custom tools for binary format would take years to evolve to be as powerful as generic text tools.

All you need is one parsing tool that produces a textual representation of the binary protocol, and you can once again use grep and friends.
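A minimal sketch of what such a tool could look like. The wire layout here (1-byte header ID, 2-byte big-endian length, then the value) and the ID table are entirely hypothetical, invented only to show the decode-then-grep flow.

```python
import struct

# Hypothetical ID table; not taken from any real protocol.
HEADER_NAMES = {1: "host", 2: "user-agent", 3: "cookie"}

def encode(headers):
    """Pack (name, value) pairs as: 1-byte ID, 2-byte length, raw value."""
    ids = {name: header_id for header_id, name in HEADER_NAMES.items()}
    out = b""
    for name, value in headers:
        body = value.encode("ascii")
        out += struct.pack(">BH", ids[name], len(body)) + body
    return out

def dump_text(frame):
    """Decode the binary frame back into 'name: value' text lines."""
    lines = []
    offset = 0
    while offset < len(frame):
        header_id, length = struct.unpack_from(">BH", frame, offset)
        offset += 3  # size of the ID and length fields
        value = frame[offset:offset + length].decode("ascii")
        offset += length
        lines.append(f"{HEADER_NAMES[header_id]}: {value}")
    return lines

frame = encode([("host", "example.com"), ("cookie", "session=abc")])
# The grep step: ordinary text matching on the decoder's output.
matches = [line for line in dump_text(frame) if "cookie" in line]
print(matches)
```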

All you need is one parsing tool ported to the 1000 platforms in existence now. We already have ASCII tools on all of those platforms, and we are pretty much guaranteed that once a new platform is created it will have basic ASCII tools. However, it is not at all guaranteed that it will have a decoder tool for every binary protocol out there. That's why ASCII protocols are easier to handle than binary ones. And for 99.999% of protocol users, savings from converting to binary would not even be measurable. Sure, for the likes of Google and Amazon the economy of scale would be substantial. But 99.999% of web users aren't humongous-scale projects; they are relatively low-tech projects for which simplicity is much more important than squeezing out every last bit of performance.

So long as nobody that needs to speak ASCII deploys a server that doesn't speak HTTP/1.1, I think switching to a more compact binary protocol for HTTP/2.0 is a good thing. Embedded devices will be able to handle more sessions with less CPU power, for example.

I'm not convinced that, for the average embedded device, parsing HTTP headers represents a significant amount of energy spent. Is there any data that suggests that for the average device - I don't mean Google's specialized routers or any other hardware specifically designed to parse HTTP - this change would produce a measurable improvement? In other words, how much longer would the battery on my iPad last? I don't think I'd gain even a single second, but I'd be very interested to see data that suggests otherwise.

I'm talking about things like Philips hue and hardware with a 70MHz CPU, or Arduino even.

> This is a good example where adding an object-oriented representation to every header out there would require a lot of work.

Most of the tools I've used represent the header as a hash/dictionary. I fail to see how that approach "requires a lot of work".

> Until you try to use grep or something of that sort for some non-trivial analysis operation.

You're arguing from the assumption that a binary protocol would be implemented by idiots. Custom binary tools can always emit a textual representation, at which point you can grep through it to your heart's content. This is the exact same problem that we've been solving with compilers for generations. It isn't nearly as insurmountable as you seem to believe.

Most people aren't writing HTTP requests by hand in a text editor.

No, but a lot of people are writing them in telnet prompts and through netcat pipes.

Curl and wget are nearly as ubiquitous as netcat.

The older, HTTP/1.X compatible versions maybe. What happens after the web upgrades to 2.0? How long until compatible tools make it to default installs? Even now, OSX doesn't ship with wget.

But they watch, capture and post them to mailing lists and stack overflow.

People are actually writing HTML/CSS/JS, so no. They have a whole other set of issues like being XML based (for HTML), but they do the job and are not likely to experience fundamental change in the next 5 years in broad adoption.

> They have a whole other set of issues like being XML based (for HTML)

HTML isn't XML based; there is an XML-based relative of HTML (XHTML) which was originally (before HTML5) viewed as a potential successor to HTML, and with HTML5 there is an available XML-based serialization of the HTML's semantics, but HTML is its own thing (prior to HTML 5, HTML was SGML-based; XML was inspired by HTML rather than serving as the basis for it.)

ASCII is inefficient, but nobody cares. For the vast majority of Web users, who aren't working at the scale of Google and Amazon, the difference in performance couldn't even be measured. And for them ease of use - a low barrier to entry, basically all you need is a basic ASCII tool and you're ready to go - is vastly more important than completely immeasurable performance gains from using opaque protocols.

Of course, you can say TCP/IP is still binary, and it is true. But TCP/IP tools are built in every OS in existence now, so they do not form a real entrance barrier. Would HTTP tools be in the same position? I'm not sure - most HTTP tools right now are not standard and do not cover even HTTP/1.1 completely, what reason is there to expect they'd cover the whole 2.0 protocol properly and be widely standard and available on the level tcp/ip tools are? Which means much higher barrier of entry.

I'd figure the inefficiency cost of ascii vs binary http headers over yottabytes of packets every year would add up. It hurts your bandwidth, it wastes electricity on the wire, and it wastes processing power. An insignificant smidgen on average, but add it up and it would probably be substantial.

That is always my stance on things - if one computer is going to run something, write it in python, make it bleed memory, just make it work. If it is going to run on a million, you have to consider the raw power waste of inefficient programming. If it is going to run on trillions of devices for decades, your choices are few in my mind.

Yottabytes only come into play if everybody switches. But the complexity of binary protocols would work against that. So probably only very large sites would implement it - and even for them, does parsing HTTP cost that much?

>>> If it is going to run on trillions of devices for decades, your choices are few in my mind.

History suggests otherwise - the majority of mass-produced software is not written with performance as the ultimate concern. You would find a lot of software written in languages like Python or Java, even though using C or assembly would probably produce better performance. But using C or assembly, that software would probably never be produced, because its complexity would be harder to manage.

Of course, performance does matter - even writing in Python, you have to worry about performance. But here we effectively see an argument saying "since we have a lot of software in Python, if we switch it to C we'll have massive performance gains". I think it is a wrong line of argument, if we switch to C a lot of this software wouldn't be written. (Note it's not against Python or C - I use both and they both are great in their areas :)

So I guess an optimized protocol does have its uses for high-volume websites, but I am concerned its advantages would be offset by its complexity. The designation of it as HTTP/2.0 implies it is the next version of HTTP, but it's really a rather different thing with a different use case. I'd rather have it as a separate protocol for high-traffic websites.

Amen! Efficiency by whose standards or values? It's a financial issue for big Internet, but what's the real cost in values to the rest of the Internet? The financial cost is obviously minimal, as everyone I know who wants to publish can. Why sacrifice durability, readability, and the original core values of the Internet to save "big dollars" for "big providers"?

The Art of Unix Programming by Eric Raymond makes an excellent argument for textual protocols in Chapter 5:


I strongly disagree with "protocols are observed with tools, we don't need ASCII".

It's pretty annoying to see this kind of thinking. The reason why everyone codes in JS and uses HTML and CSS is because it's ASCII. It's easy to understand, hack, etc. Same reason Python is so popular. Even Go is pretty simple like that. Sure, it's languages vs protocols, but the reasoning is exactly the same.

And in fact, the comparison works with protocols as well: SMTP, IMAP, HTTP, IRC are EXTREMELY easy to understand and code for. Binary protocols are a huge PITA to code for. The argument that you're going to use a lib or whatever tool just doesn't hold any water. You want to understand what exactly happens.

That's how everyone learns, etc. I could write my own SMTP and IRC clients when I was 10. I could understand it. It works. No way could I fully understand the documented binary protocols. I tried, and it was just too painful and not fun at all (hey, I was 10).

I'm not certain the added performance of using a binary format and some of the other advantages are really good enough to make the world unable to understand what's going on anymore by just looking at it.

Sure purely technically speaking, it sounds like "binary is the way to go" for pure performance.

But if you think about it, making hacking around that stuff a niche thing is perhaps a much greater loss. Even the reliability of a binary protocol is VERY arguable.

In fact I'll make one last comparison: shell pipes and ASCII. Many tried to replace them with smart binary protocols, objects, etc. It's cool. It's more powerful. More efficient. At the end of the day, though, a quick hack with regular pipes transferring ASCII is just easier to understand, and we all use those, not the fancy binary objects.

> First, ASCII is inefficient.

Why aren't we using binary file formats anymore?

XML is more easily debugged when the machine tools don't work.

It's also more accessible, more easily created and modified, and thus, more available to a wider range of people than just web design professionals.

A binary protocol is bad for a couple of reasons, too.

First, OSI layer 6 called and it wants its old job back. It sat around connecting layers 5 and 7 peacefully since the dawn of the ARPANET, all while the HTTPbis guys were passing around messages back and forth trying to obsolete it.

Second, you can't type a binary protocol. Yet, you have to somehow make the server work without a client or vice versa (for the initial implementation). That's going to be a lot more difficult. With HTTP, you could literally hook up a teletype to the Internet, let it print incoming requests and type the response back to the user agent. I've done this occasionally on a terminal emulator for debugging purposes.

Third, the Internet is big-endian while most common processors in use today are little-endian. This is going to haunt people's lives forever because you have to continuously convert between the two and although the conversions are orthogonal, the methods aren't idempotent (as opposed to converting a string to ASCII, or a text buffer to DOS style line endings).

You mention 32-bit identifiers as opposed to a string of digits. This is more error-prone than you think: two's complement isn't the only integer representation out there. Implementations written in C would have to deal with their underlying architecture as the standard allows for 3 different representations (so the compiler wouldn't help you out). Then there's signed and unsigned, either of which might not be available in the implementer's programming language. You end up unpacking the identifier by hand, which may end up being slower than just looping through a string. ASCII is hardly an inefficient serialisation format.
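
To make the byte-order point concrete, here is a minimal Python sketch. It is not taken from the HTTP/2.0 draft; the identifier value is made up, and only the standard `struct` module is assumed. It sends the same identifier once as four network-order bytes and once as ASCII digits, then decodes the binary form with the right and the wrong byte order.

```python
import struct

ident = 0x12345678  # hypothetical 32-bit header identifier

# Binary form: 4 bytes in network (big-endian) byte order.
wire_binary = struct.pack("!I", ident)

# Text form: decimal digits, as an ASCII protocol would send them.
wire_text = str(ident).encode("ascii")

# Decoding with the agreed byte order recovers the value.
decoded_ok = struct.unpack("!I", wire_binary)[0]

# Decoding the same bytes as little-endian silently yields a different number.
decoded_swapped = struct.unpack("<I", wire_binary)[0]

# The text form has no byte order to get wrong.
decoded_text = int(wire_text)

print(wire_binary.hex(), len(wire_binary))
print(wire_text, len(wire_text))
print(decoded_ok, decoded_swapped, decoded_text)
```

The binary form is shorter, but a peer that forgets the conversion gets a valid-looking wrong value rather than an error, which is the failure mode being argued about.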

Fourth, any fixed-length records are going to be useless at some point in the future. Several versions later the fixed length records are going to either point to an extra set of records tailing the 2.0 records or will simply have a (designated) backwards compatible value for consumption by older peers. With HTTP, we can add a header anywhere in the request or response except for the very top. We can even shuffle them at will without adverse effects on peers.

Fifth, it doesn't make sense to optimise a tiny fraction of the entire HTTP session. Any benefits are too small to be worthwhile and would therefore result in a net-negative to most implementers.

Sixth, you can still make improvements to HTTP without moving to a binary protocol. Not sending the same headers on every single request isn't one of them. HTTP is essentially a stateless protocol and every request could be handled by a different server. You can architect clusters of servers routing incoming requests however you please and satisfy every one of them correctly and efficiently. For starters, you can replace any underlying protocol in the stack with a more cluster-friendly protocol in transit.

Seventh, just because no one is using a given content type in HTTP (I think you were referring to multipart/related) doesn't mean the protocol used to transfer that content is bad. Heck, it's not even part of any HTTP standard.

1. This is not an argument.

2. Amazingly, people have made binary protocols work before, in spite of no preexisting implementation, so it's not impossible. I'm sure we will be able to meet the challenge.

3. Do not try to sell me the endianness issue. I have written multi-arch TCP/IP stacks and I'm not a CS major. Trust me, it will be okay.

4. Yes, and IPv6 address space will someday expire. But not soon. And as many fixed-length frame protocols have done in the past, you leave an "extra frame options" bit to stack more fields on. It's fine.

5. It's really not about optimization at all. It's about common sense. The computer works better when you talk to it in computer-speak, and we gain absolutely nothing by talking to it in English human-speak. The benefits are a net-positive because parsing is easier, because a computer is parsing it, not a human. There is no sane argument that can validly claim that parsing human-readable English is easier for a computer than fixed-length bitstrings. CPUs don't grok ASCII, they grok BINARY.

6. Modern designs for clusters of web applications route by session, not by individual request. You are session-oriented instead of connection-oriented, though in practice it's almost the same thing. And see previous comment on why adding onto HTTP willy-nilly is just a hack.

7. No, just the jgc's re-implementation of multipart is bad, for previously stated reasons.

If you think binary protocols are so great, you have to then explain why text protocols are winning all over the place. People tried to do binary for a long time before HTTP won. We had all kinds of RPC mechanism - CORBA, DCOM, etc. Even the winning data serialization formats are mostly text (JSON, XML) despite the fact we know it's less efficient. Even where people make binary versions the ones that succeed are direct one-to-one translations of the text (eg: BSON).

In the end, it is formats that people can understand that win the day. You can't just write that off as if it has no value. It plays out in technical ways: all the CORBA implementations ended up having very poor interoperability partly because they were hard to debug. Nobody could actually look at a CORBA exchange and see what was wrong with it.

It's because developers need to read JSON/XML regularly. They validate the data they are sending to the client, they create test cases, the testers often read them as well, and it is sometimes stored in databases. It's because the format changes so frequently that reading it is important.

HTTP is not comparable because it never really changes. It's a fixed format. And frankly the majority of developers never need to go down to that level anyway.

HTTP is not fixed and is in fact very flexible. And even if it were fixed, that doesn't make it broken and certainly doesn't mean you get to replace it with a binary protocol. They don't compare.

I'd wager every site sitting behind CDNs or a Varnish saw a developer go down to telnetting to port 80 to debug the cache behaviour. If you include frontend developers, sure, your "majority of developers" assertion is true. Select sysops only, and you'll be surprised.

Re 4: Only under incredibly optimistic models of the future survival and expansion of our species! The IPv6 address space has about 10^38 addresses. Earth's land surface is about 150*10^12 square meters. So in a future where the planet is so crowded that every person lives on a single square meter and owns 10^6 globally routable gadgets, we'd still need on the order of 10^18 Earth-sized planets to exhaust IPv6.
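
The estimate can be re-run in a few lines of Python. This is a sketch under stated assumptions: land area is taken as roughly 150 million km² (1.5*10^14 m²), with one person per square meter and the 10^6 gadgets per person from the comment. The variable names are mine.

```python
# Back-of-the-envelope check of the IPv6 exhaustion estimate.
# Assumptions: ~150 million km^2 of land, one person per m^2,
# 10**6 globally routable devices per person.
IPV6_ADDRESSES = 2 ** 128
LAND_AREA_M2 = 150 * 10 ** 6 * 10 ** 6   # km^2 converted to m^2
DEVICES_PER_PERSON = 10 ** 6

addresses_per_planet = LAND_AREA_M2 * DEVICES_PER_PERSON
planets_needed = IPV6_ADDRESSES // addresses_per_planet

print(f"addresses:      {IPV6_ADDRESSES:.2e}")
print(f"per planet:     {addresses_per_planet:.2e}")
print(f"planets needed: {planets_needed:.2e}")
```

Whatever the exact land figure, the conclusion is insensitive to it: the result stays many orders of magnitude above one planet.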

(Caveat: I'm back-of-the-enveloping this on my phone, about to go to sleep. But still!)

All right, if you're so inclined...

1. It definitely is an argument and in fact my main argument. At least explain how this would be any less valid than your "it's a hack" and (bandwagon) "wave of the future" arguments.

2. My turn to invoke "not an argument". Just because one can simply copy a struct over a socket doesn't mean it's a good idea to do so. Especially in light of the flourishing culture of diversity we have on the Internet.

3. You conveniently choose to ignore one half of the argument, but miss it entirely. The point is not that we can't overcome endianness mismatch, it's that we shouldn't have to. At least not inside layer 7.

4. Except the old records will have to remain there forever. HTTP implementations dropped the Pragma header a long time ago and today we can simply pretend it was never there.

5. When it comes to common sense, ASCII is right there. That's because a protocol on the Internet needs to be interoperable with many systems. Sure, all of those systems use binary one way or another. But human operators are still going to have to program those systems and ASCII is a useful representation which enables us to do just that. Furthermore, the draft proposes to encode binary headers in base64 in order to transfer them in an HTTP/1.1 upgrade request. Now you have 3 ways of transferring HTTP headers instead of just one and we'll have to support all of them in any case. This might seem trivial to you, but it's a problem for servers and quite a huge one at that for intermediaries (proxies).

6. Amending HTTP with a new header again is much less a hack than providing a way to switch to a binary protocol and resume communications from there. Your buggy argument doesn't stand, for HTTP is not the car. It's the pavement upon which old buggies can ride along just fine until it's no longer considered safe amongst the faster carriages.

7. Yes, I'm not convinced client-provided request identifiers are the way forward myself. Though I would consider the proposal a better starting point for discussions than the current HTTP/2.0 draft because it leverages existing mechanisms better.

1. I understood your point to be "Well why isn't HTTP layer 6?" or "Why isn't layer 6 used?" which makes no sense as TLS is layer 6, and HTTP (and the web service) is layer 7. They necessitate each other. Simply stating that X and Y are different parts of the OSI model is not an argument about the format of a protocol in one layer.

2. My argument isn't "just because you can", it's "you can." You seemed to be saying it would be difficult if not impossible. I was saying, no it isn't.

3. Endianness will always be an issue, forever. The only time it will go away is when every architecture picks one format. It's a really simple operation and it's part of how computers expect us to behave due to their nature and design. Hacking around it doesn't make it disappear, nor does it help anything.

4. What old records? Pragma was deprecated in 1.1 yet included anyway for god knows why. There's no reason they should do so again, but if they do, it will exist both in text and non-text versions. This is a non-issue.


  > a protocol on the Internet needs to be interoperable with many systems
You mean like IP, ICMP, TCP and UDP?

  > But human operators are still going to have to program those systems and ASCII is a useful representation which enables us to do just that
Sure. My C code editor displays ASCII. It totally enables me to write IP, TCP and UDP code, using an ASCII display with code in ASCII. And it neatly compiles down to binary and runs a binary protocol. Amazing!!!! (seriously though, if your argument is that ASCII is just easier to "program" as a protocol, you're up shit creek; you have to write more code to handle converting ASCII to binary and back anyway. your high-level language abstractions hide this fact from you, and you think it's a convenience because you never have to learn what a constant is)

  > it's a problem to servers and quite a huge one at that for intermediaries
That's backwards compatibility for you. If the alternative is to simply mangle and bungle the existing format into a frankenstein into eternity, it's not going to be any better.

6. Are you comparing extending HTTP/1.1 for a single feature to the backwards-compatibility support of HTTP/2.0? Because that makes no sense. The vehicle analogy is just weird at this point.

7. See, this is where the vehicle analogy works again. "leveraging existing mechanisms". In other words, let's throw one more feature on top. It never ends, because all you have to do is keep adding more lines, and modify the browser, and modify either the server or your web app, and keep going to support god knows what. At some point they'll implement an incredibly complex binary protocol and embed it in base64-encoded ASCII HTTP/1.1 headers, because "leveraging existing technologies" is thought of as a neat thing to do. It will also be insane. At some point you need to just make a better <whatever> instead of hacking and hacking and hacking onto it to make it do what you want.

Like building the great pyramid of Giza out of tinker toys. Sure, it's easier for people to use tinker toys. It's easy to understand. You don't have to do any real work. And it's also not meant for that task. At some point you need to throw out the toys and use stone.

I can even go further. ASCII is too old to use. Really, it's been antiquated by UTF-8. It is telegraphic codes for teleprinters. And ASCII itself was micro-optimized to only be 7 bits, and the 8th bit was used as a parity bit because perforated tape had space for 8. ASCII is so antiquated (1960) that nobody should be using it anymore.

Clearly we need to implement HTTP/2.0 in UTF-8 wide characters, so connections to China, Japan and India will support their native language in the protocol. (After all, what's the point of a native-language protocol if only English speakers can read it?) Also, we should include the byte order mark at the beginning of all messages so we don't have to worry about how endianness works.

> At some point they'll implement an incredibly complex binary protocol

No need to wait: http://tools.ietf.org/id/draft-ietf-httpbis-header-compressi...

Just look at that and weep. That whole document deals with how to represent HTTP headers. It doesn't define them, their behaviour and how they should interact. No. This multi-page document merely documents how these headers should be represented.

You know, things which up until now have been:

    Lines of text with key-value-pairs delimited by a colon-sign.
Notice how that didn't take eighteen pages or presume anything about current-generation consumer DSL MTUs? Yeah. That's a nice, simple and good spec.

Obviously this HTTP2 binary monstrosity is being done all in the holy Google-name of micro-optimizing performance.

This is terrible design and quite literally obfuscation more than anything else. I cannot believe the IETF is even considering this junk.

Edit: Link to an IETF discussion on the subject: http://www.w3.org/Search/Mail/Public/search?keywords=&hdr-1-...

1. TLS has nothing to do with this, it's Transport Layer Security (even though you may think of it as layer 6) because it doesn't alter its payload. ASCII/UTF representation and the messages themselves are layered on top of it. By going binary, you may well end up forcing your encoding onto systems which are not native to that encoding. Whereas right now you can link any two systems and exchange messages, a binary protocol would mean that some systems can exchange messages freely and other systems would see garbage. That's why the Internet was standardised on ASCII and \r\n, so we wouldn't ever have to deal with that again.

2. I don't disagree it's easy to come up with a binary protocol, taking a short cut is always easier. Just like it's easier not to write a test harness with full coverage for a software project, that's entirely up to you. When a regression causes havoc down the road before you realise what's going on, well, rather you than me.

3. You're defending a regression, as of right now it's a non-issue. And are you really calling ASCII a hack around endianness issues? On what planet?

4. Records that are going to be deprecated down the road, which I think is fair to consider inevitable. All I'm saying is it's been a problem in binary protocols before, so let's not do that. You don't see this as a problem at all, so I'll digress.

5. You mean like IP, ICMP, TCP and UDP?

Yes, these are built into the operating system. Once you start using e.g. netcat (who hasn't piped tar into netcat for a quick backup?), all of that becomes transparent.

No, my argument isn't that ASCII is ipso facto easier to implement. It's that it's easier to test, debug and always see exactly what's going on over the wire.

If the alternative to a text-based frankenstein format is its binary-bastard child, I'll have the former thank you.

6. No, I'm telling you to think of HTTP as a conveyor rather than a payload. There's a difference, and that's why the vehicle analogy is weird.

7. So you're proposing that every N years we create an entirely new HTTP and upgrade to that? At what point will the streaming pile of upgrade requests yield a noticeable reduction in performance?

Also, UTF-8 characters are not "wide", they're variable length but not wide as in multibyte encodings. Then you go on to suggest we use a BOM at the start of every (UTF-8, mind you) message, I'll leave it up to yourself to let that sink in. You even spelled it out.
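
A quick way to check both points, sketched with Python's standard codecs (the sample characters are arbitrary): UTF-8 uses one to four bytes per code point, and its BOM is a fixed three-byte sequence that carries no byte-order information, unlike UTF-16 where the two byte orders really do differ.

```python
import codecs

# Arbitrary sample characters: Latin A, e-acute, a CJK ideograph, an emoji.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

# UTF-8 is variable length: 1 to 4 bytes per code point.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# UTF-8 is defined as a byte sequence, so its optional BOM is always the same
# three bytes and says nothing about endianness.
utf8_bom = codecs.BOM_UTF8

# UTF-16 does depend on byte order, which is what a BOM is actually for.
utf16_be = "A".encode("utf-16-be")
utf16_le = "A".encode("utf-16-le")

print(utf8_lengths)
print(utf8_bom.hex(), utf16_be.hex(), utf16_le.hex())
```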

I agree 100% on binary being a bad idea in HTTP spec, largely because of encoding and going backwards. But also binary and fixed lengths lead to harder to stream situations, chunked problems and being less approachable which leads to less innovation I believe. I'd argue HTTP clients/servers are better at dealing with buffer overruns because it isn't so set to a fixed length and more based on better defensive content messaging.

Many of the complaints about HTTP are really complaints about MIME messaging, which the entire internet is really built on (the standards, anyway) and which has run pretty smoothly for a very long time. Improving HTTP by addendum, like SPDY, is a better idea. Or possibly transporting it better over streamed protocols like SCTP: http://tools.ietf.org/html/rfc6525 (no need to modify the packaging/messaging format).

MIME/HTTP/HTTPS are very flexible, and if you want binary it can be added in, as it has been in multipart; EDI/HTTP/AS2 and other RFCs use this. Multipart isn't used as much because it is more problematic (it is used heavily in email and custom protocols), so making the whole spec that way would be bad overall. The point about the OSI layers is key: let's not revert to binary + base64 everything just to get data across the wire. You can put anything in there, but basing it on human-readable text is always a good idea. That is really what this whole layer is about. A move to binary pushes us back to the days of non-standard blobs, problems that HTTP messaging, then content as XML, then JSON, solved by standardizing a readable exchange of data. When you are exchanging data in a standard way, it should be very basic, to minimize problems rather than compound them. Throwing out all of MIME just to speed up HTTP, when faster protocols exist for other needs (real-time, attaching files, streaming, etc.), is a bad idea. Also, changing support from HTTP < 1.0 to 1.0 to 1.1 caused many problems; unless this adds considerable benefit, changing it adds more problems.

> Second, you can't type a binary protocol. Yet, you have to somehow make the server work without a client or vice versa (for the initial implementation). That's going to be a lot more difficult. With HTTP, you could literally hook up a teletype to the Internet, let it print incoming requests and type the response back to the user agent. I've done this occasionally on a terminal emulator for debugging purposes.

That's a very 1970s way to develop a new protocol.

These days you specify the protocol in, say, an XML- or JSON-based file format, and then run a code generator to produce client and server libraries directly from the spec. This has the advantage that the implementation is derived directly from the specification, so there is little room for ambiguity.

Wayland is one example in the open-source world of where this is done, but I've seen the technique used in proprietary shops as well.
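
A toy version of that workflow, sketched in Python: a JSON description of one fixed-layout frame is turned into encode/decode functions at runtime. Everything here is hypothetical. The frame name, field names and type codes are invented for illustration and are not Wayland's protocol format or anything from the HTTP/2.0 draft.

```python
import json
import struct
from collections import namedtuple

# Hypothetical spec for a single frame type.
SPEC = json.loads("""
{
  "name": "PingFrame",
  "fields": [
    {"name": "stream_id", "type": "u32"},
    {"name": "flags",     "type": "u8"},
    {"name": "length",    "type": "u16"}
  ]
}
""")

# Map spec types to struct format codes; network byte order is applied below.
TYPE_CODES = {"u8": "B", "u16": "H", "u32": "I"}


def build_codec(spec):
    """Derive a record type plus encode/decode functions from the spec."""
    fields = spec["fields"]
    record = namedtuple(spec["name"], [f["name"] for f in fields])
    layout = struct.Struct("!" + "".join(TYPE_CODES[f["type"]] for f in fields))

    def encode(rec):
        return layout.pack(*rec)

    def decode(data):
        return record(*layout.unpack(data))

    return record, encode, decode


PingFrame, encode, decode = build_codec(SPEC)
frame = PingFrame(stream_id=7, flags=1, length=0)
wire = encode(frame)
print(wire.hex(), decode(wire))
```

A real generator would emit source code for several languages from the same file, but the idea is the same: the wire layout exists in exactly one place.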

That's useful for RPC type protocols, but HTTP isn't RPC based and it has lots of semantics written in English. I think it's better that way because it allows for a greater variety of use cases and implementations. You can still do RPC with websockets, if that's what you want.

I've seen this done with the on-the-wire protocol of scientific measuring equipment. It's hardly just for RPC (which HTTP increasingly resembles anyway).

The point is, the ability to "type" a protocol is irrelevant to how modern distributed software gets developed. Maybe it mattered in the days when comms were at 300 baud, machines had kilobytes or perhaps megabytes of core, and the Mark I eyeball was the best way to debug machine-to-machine comms, but these days we have tools that can decipher binary wire protocols for us. Performance and adaptability are far more important now than human readability. That war has been lost.

By these days you mean the last 30 years? ASN.1 was defined in 1984, so actually it's probably closer to 40...

Real tools? I would argue that there are way more real tools for debugging ASCII-based protocols than there will ever be for binary ones. An ASCII-based protocol is highly composable because it allows us to separate intent (the contents of the message) from ever-evolving encodings (rebasing, encryption, compression, etc.). In this world of fighting complexity, why are we not favoring the simple?

> Fourth, fixed-length records are the wave of the future!

1960s-style arbitrary field size limitations: the wave of the future! No doubt any day now we'll reorganize the internet around shipping punched-card images around, too. We could call the project the "Because It's Time Network", or BITNET.

> Of course, most people would argue that you don't save enough time or bandwidth or cpu cycles to make a difference, but it would make for easier-to-write parsers.

Most people would argue that, yes; famously, Kernighan and Plauger did argue that in The Elements of Programming Style in 1974. By "easier-to-write parsers", you mean easier than dict(((key.lower(), value.lstrip()) for key, value in (line.split(':', 1) for line in header.split('\r\n'))))? Because I think that's going to be a pretty tough bar to fit under. (Yeah, I know you need another couple of lines of code if you're going to handle indented continuation lines, but you can get rid of those without returning your protocol design to the Summer of Love. You could also get rid of the .lower() and the .lstrip() while you're at it.)
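
Spelled out as a runnable snippet, the one-liner looks like this. The sample header block is made up. As the comment says, continuation lines are not handled, and a line without a colon would raise `ValueError` here.

```python
def parse_headers(header):
    """Parse CRLF-separated 'Name: value' lines into a dict with lowercased names."""
    return dict(
        (key.lower(), value.lstrip())
        for key, value in (line.split(":", 1) for line in header.split("\r\n"))
    )


# Hypothetical header block (no request line, no trailing blank line).
sample = "Host: news.ycombinator.com\r\nAccept: text/html\r\nX-Time: 12:30"
headers = parse_headers(sample)
print(headers)
```

Splitting on the first colon only (`split(":", 1)`) is what keeps values such as `12:30` intact.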

> Add to this that every header could have a 32-bit identifier (4 bytes)

Padding out sub-byte-sized values to fill out fixed-width fields: the Intelligent Man's Approach to Saving Bandwidth! Or you could just use one-letter names in an ASCII protocol.

> [Fixed-length fields] solves crazy problems like header injection and request integrity checking.

Clearly we've never had parsing bugs in binary protocols full of fixed-width fields, now have we? Surely not bugs that produced security holes? Except maybe TCP, IP, and DNS. And X.400, and X.500, and X.509, and some of those were the fault of ASN.1 BER and DER, which are hardly fixed-width formats. And surely silently truncating a value to put it into a fixed-width field would never change its semantics, right?

> as to a general question "why use binary when we can compress stuff", what is smaller: a string of bits, or a compressed string of digits?

Well, let's see. How many files do I have here?

    $ find | wc
       5946   17074  330147
Each one has a ctime, an mtime, an atime, and an inode number. The ctime is generally going to be the same as the mtime in this case, so we'll leave it out. I think they're technically 64-bit values in the current inode structure, but let's count them as 32 bits instead, since none of my files are from after 2038. The inode number is also 32 bits. So if we take these three 32-bit values per file, we have 5946×12 bytes, or 71352 bytes. And if we print them out as digits and compress them?

    $ find -printf %.10A@%.10T@%i | gzip -9c | wc -c
But wait, that string of digits isn't parsable; it's just digits. So let's add delimiters.

    $ find -printf %.10A@\ %.10T@\ %i\\n | head -3
    1373396230 1373338685 2359306
    1372967497 1372967489 5246218
    1365458166 1365458157 5248264
    $ find -printf %.10A@\ %.10T@\ %i\\n | gzip -9c | wc -c
So a compressed string of digits is a lot smaller in this case. But you could argue that that's just because my data is highly redundant, since most of the timestamps are going to be within the current few years, which is true. But then, most data is highly redundant. How bad can it get, in the worst case of representing uniformly distributed random 32-bit values as compressed strings of digits, with spaces between? It adds about 44% overhead:

    $ dd </dev/urandom bs=1024 count=1 | 
      od -w1024 -l | tr -s ' ' ' ' |
      gzip -9c | wc -c
    1+0 records in
    1+0 records out
    1024 bytes (1.0 kB) copied, 0.000622635 s, 1.6 MB/s
> Little inefficiencies like this don't go away if you just hack onto the same old protocol forever.

But you know what? They do go away if you tunnel over a deflate-compressed VPN, or if you increase the available bandwidth by a few percent. Maybe instead of trying to take us back to bug-prone 1960s designs you should be working on that.
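
The worst-case measurement above can be approximated in Python. This is a sketch, not a reproduction of the shell pipeline: it uses 256 unsigned 32-bit values from a seeded generator rather than `od -l` output, so the exact percentage will differ from the figure quoted in the comment.

```python
import gzip
import random
import struct

rng = random.Random(0)  # fixed seed so runs are repeatable
values = [rng.getrandbits(32) for _ in range(256)]

# Binary form: 256 values * 4 bytes = 1024 bytes of essentially random data.
raw = struct.pack("!256I", *values)

# Text form: decimal digits separated by spaces, gzipped at the highest level.
text = " ".join(str(v) for v in values).encode("ascii")
compressed = gzip.compress(text, compresslevel=9)

overhead = len(compressed) / len(raw) - 1
print(len(raw), len(text), len(compressed), f"{overhead:.0%}")
```

Compression removes most of the cost of spelling numbers as digits, but for random values it cannot get below the size of the raw bits, so some overhead always remains.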

1. Sarcasm meeting sarcasm; HN truly has turned into hell.

2. Trying to fit an HTTP field parser into one line is not the way to win a programming argument.

3. One ASCII character names? 255 (edit: 37 alphanumeric) possible headers, of which only a few might correlate to what you'd expect? You're trolling me now.

4. Yes, software bugs happen! It's crazy I know. But let's go ahead and assume that the security holes that still plague applications today due to design flaws are not the same as a couple off-by-one bugs decades ago.

5. By virtue of the algorithm, compression works better the more you have of the same thing. You won't have 70KB of headers to compress at once; more like 400 bytes. The compression of individual header groups each time will not benefit from the previous data's compression, as TLS or SPDY might do. The eventual overhead would not only be larger than a bitstream but take more CPU to decode.

6. Not only are they inefficient, they add complication to the parsing of the protocol, which is one more thing an application can mess up. Not only is it slower, it's more prone to errors. A VPN does not fix that. Perhaps you should focus on real solutions to real problems versus quoting what decade network application programming began in.
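
Point 5 is easy to measure. The sketch below compresses ten similar header blocks twice with Python's `zlib`: once independently, and once through a single deflate stream that keeps its history across requests (roughly the shape of what SPDY does for headers, although this is not SPDY's actual scheme or dictionary). The header contents are invented.

```python
import zlib


def headers_for(path):
    """Build a hypothetical request header block; only the path varies."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        "Host: news.ycombinator.com\r\n"
        "User-Agent: ExampleBrowser/1.0\r\n"
        "Accept: text/html,application/xhtml+xml\r\n"
        "Accept-Encoding: gzip, deflate\r\n"
        "Cookie: session=0123456789abcdef\r\n\r\n"
    ).encode("ascii")


requests = [headers_for(f"/item?id={n}") for n in range(10)]

# Case 1: every header block is compressed on its own, with no shared history.
independent = sum(len(zlib.compress(r, 9)) for r in requests)

# Case 2: one deflate stream per connection. Each block is sync-flushed so it
# can be sent immediately, but later blocks can refer back to earlier ones.
stream = zlib.compressobj(9)
shared = sum(
    len(stream.compress(r) + stream.flush(zlib.Z_SYNC_FLUSH)) for r in requests
)

plain = sum(len(r) for r in requests)
print(plain, independent, shared)
```

Small blocks compressed in isolation gain little, while a shared context shrinks repeated headers to a few bytes each, at the price of per-connection state on both ends.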

> One ASCII character names? 255 (edit: 37 alphanumeric) possible headers, of which only a few might correlate to what you'd expect?

I wasn't suggesting encoding "Host: news.ycombinator.com\r\n" as "Hnews.ycombinator.com\r\n" but as "H:news.ycombinator.com\r\n". As long as you keep the colon, you can still use long names for other headers.
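
A minimal sketch of what that could look like on the parsing side. The alias table is hypothetical; only `H` for `Host` comes from the comment, and nothing here is part of any HTTP specification.

```python
# Hypothetical one-letter aliases; only "h" -> "host" comes from the thread.
ALIASES = {"h": "host", "a": "accept", "c": "cookie"}


def parse_compact(header):
    """Parse 'Name:value' lines, expanding one-letter aliases to full names."""
    parsed = {}
    for line in header.split("\r\n"):
        key, value = line.split(":", 1)
        key = key.strip().lower()
        parsed[ALIASES.get(key, key)] = value.lstrip()
    return parsed


sample = "H:news.ycombinator.com\r\nA:text/html\r\nX-Requested-With: XMLHttpRequest"
headers = parse_compact(sample)
print(headers)
```

Because the colon stays, long and short names can be mixed freely in the same message.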

> Trying to fit an HTTP field parser into one line is not the way to win a programming argument.

You said parsing would be "simpler". It's going to be hard to get simpler than something that you can fit into one line.

> Perhaps you should focus on real solutions to real problems versus quoting what decade network application programming began in.

Well, that's kind of what I'm saying: let's focus on solving real problems, instead of recreating new ones that we'd already solved decades ago, in order to "solve" non-problems like HTTP header encoding.

So you'd rather an illegible ASCII representation instead of an illegible binary representation. This is why I hate getting into these arguments; people will insist on completely illogical nonsense as far as they can take it.

> It's going to be hard to get simpler than something that you can fit into one line.

This is a terrible argument, as you can fit anything onto one line if you string it along enough. But here's one example of something simpler:

memcpy( &frame_struct, buffer, sizeof(frame_struct) );  /* memcpy rather than strncpy, which stops at the first NUL byte */

And I'm not proposing we merely solve the problems of HTTP. That would make too much sense; people are much more willing to put up with bullshit than do the hard work to make things work correctly. I was proposing we make things work better, simpler, and more reliable, and throw away the completely nonsense argument that using netcat once a year is worth a ridiculously impractical protocol format. But whatever, it's not like this thread will amount to a hill of beans.

So let's just convert all of HTTP into XML.

Your actual parser there is the definition of frame_struct, which you left out; and, as others have pointed out, if you're putting ints in there, you need to ntohl them. Also, you probably need some kind of extensibility.

And I don't really think "H:news.ycombinator.com" is quite as illegible as your suggested 32-bit integer space — which, by the way, is small enough that you'll probably need a central registry to prevent header name conflicts — and it also occupies only two bytes instead of four for the header type. So, from my point of view, the "completely illogical" thing is to go from, "The header names currently in HTTP are too long!" to "Therefore let's replace them with 32-bit integers in a binary protocol" instead of "Therefore let's shorten the header names in HTTP", which solves the problem more thoroughly and with less collateral damage.

And what is this about "if you string it along enough"? We're talking about a parser (for RFC-822 headers without continuation lines) that fits into 110 characters, here, without the least obfuscation. Less than a Tweet. In fact, I just Tweeted it. And it worked on the first try.

> the completely nonsense argument that using netcat once a year is worth a ridiculously impractical protocol format.

You know, we did kind of try binary protocols already: the whole IPX stack, CIFS, X.everything, SNMP, TFTP, ICB, Sun RPC and thus NFS and NIS, and so on. A few survive in common use: DNS, TCP, IP, ICMP, SSL, SSH, BGP, and to some extent, SNMP. And there are lots of them working fine inside of particular companies, rather than between implementations by different vendors. But for the most part, they've been replaced with textual protocols, despite the lower efficiency and in many cases the first-mover advantage: HTTP, SMTP, and IRC, and previously FTP, Gopher, and Finger. You seem to be arguing that was an accident, or a mistake. It's not.

SIP is inspired by HTTP and has defined compact headers like this for over a decade [0]. It is hardly indecipherable.

Why do you find HTTP impractical? There seems to be a lot of evidence to the contrary.

[0] http://www.ietf.org/rfc/rfc3261.txt

> But you know what? They do go away if you tunnel over a deflate-compressed VPN, or if you increase the available bandwidth by a few percent. Maybe instead of trying to take us back to bug-prone 1960s designs you should be working on that.

This was mentioned once in the IETF discussion, before someone said "but uhm, SPDY is binary, and we have data from SPDY, and yeah".

After that everyone was too busy discussing how horribly unstrict and ambiguous text formats can be, before jumping off to a 20-email discussion about which endianness should be preferred and how the clients and server should determine which one to use, or whether clients should support both kinds of endianness.

All without a hint of irony. It's like a Bizarro world IETF discussion.

> what is smaller: a string of bits, or a compressed string of digits?

What are you talking about? You could have had something on your mind, but this is a terrible terrible juxtaposition of words...

>I would have liked to see an ascii based protocol

I disagree. I think HTTP should have been a simple binary protocol from the start and HTML should have required compilation into a binary format.

How much work would it really have been? htmlcmp foo.html providing foo.bhtml. No whitespace, no end tags, one- or two-byte tags, etc. Strictness in the reference HTML compiler implementation could have saved the web from all the stuff outside the actual standard that browsers (and other tooling) now have to support (so they don't "break" the web).

I'm not suggesting anything as crazy as the flash binary format (I wrote a Java flash player once...), but when I started to write things like proxy servers and HTML minifiers I was blown away by the extreme inefficiency of HTTP/HTML.

This is a step in the "right" direction IMHO.

I believe that the "world wide web" took off for two reasons. It was completely free and it was incredibly easy for ANYONE to make a website. If you needed to "compile" all HTML it would have discouraged a lot of casual experimentation. Not everyone understands what a compiler is.

Just as an example I started playing around with HTML when I was about 12 (in 1998), it was easy and I got instant results. A year or so later I tried to learn Perl and quickly gave up because I couldn't get my first script to run. It was another year before I tried to "program" again and became hooked.

HTML and HTTP were successful because they were incredibly easy to understand. I wish we could keep it that way.

Wouldn't it have been cool if there were an efficient binary format and a nice human-readable and human-editable format, with a well-defined transformation from one to the other?

Back in the 1990s, Ron Rivest came up with canonical S-expressions, which are fully capable of representing the same information represented by HTML, XML, ASN.1—but can be either human-readable or binary.

Here's a simple example: (p (* class (footer x-treme)) "This is a " (b "footer") "."). Very human-readable, very human-editable, and easily machine-readable, wouldn't you agree?

As a binary format, it would be (1:p(1:*5:class(6:footer7:x-treme))10:This is a (1:b6:footer)1:.). Still geek-manipulable, if necessary, and extraordinarily simple to parse. And it has the advantage that it is a distinguished encoding--any of the myriad human-readable encodings all reduce to the same canonical encoding, which has advantages for hashing.
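To make the canonical form above concrete, here is a minimal sketch of an encoder. It assumes the tree is held as nested Python lists of strings; the function name `canonical` and that in-memory representation are illustrative choices, not part of Rivest's proposal.

```python
def canonical(expr):
    """Encode nested lists of str/bytes as a canonical S-expression."""
    if isinstance(expr, (list, tuple)):
        # A list is its encoded members wrapped in parentheses, no separators.
        return b"(" + b"".join(canonical(item) for item in expr) + b")"
    data = expr.encode("utf-8") if isinstance(expr, str) else bytes(expr)
    # An atom is its byte length in decimal, a colon, then the raw bytes.
    return str(len(data)).encode("ascii") + b":" + data


# The footer example from the comment above, written as nested lists.
tree = ["p", ["*", "class", ["footer", "x-treme"]],
        "This is a ", ["b", "footer"], "."]
print(canonical(tree).decode("ascii"))
```

Because every atom carries its own length, a parser never has to scan for delimiters or handle escaping, which is where the "extraordinarily simple to parse" claim comes from.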

I'd argue the big thing that made HTML so easy to play with was permissiveness, not the lack of explicit compilation.

Perl doesn't tend to have an explicit compilation step. You could write a HTML compiler to be as permissive as HTML parsers and you wouldn't have the same frustration you had trying to get started with perl.

Gopher was simpler to understand than HTTP/HTML. But it wasn't as flexible, and didn't have as nice support for pretty pictures and midi files.

If anything, HTTP/REST was much more complex. HTML was somewhat mystic yet allowed you to move things around easier, and actually design things vs just presenting them. And hyperlinks were really cool.


>HTML and HTTP were successful because they were incredibly easy to understand. I wish we could keep it that way.

The web is big enough today that we can afford to make it more efficient since we have so many professionals that don't require the kind of implicit hand-holding provided by the original implementations. Back then everyone was a newbie and there were no web professionals.

I think we need to be careful about claiming it's more efficient. There seemed to be significant disagreement last year about how much more efficient SPDY was, judging from the various benchmarks. From memory, the consensus seemed to be that it's only 5-10% quicker. Does anyone have any more up-to-date benchmarks?

I hate the idea that you would have to be a "web professional" in order to get started today. We should embrace and keep the culture the web was built on.

That's an incredibly good way to lock out new newbies. I don't know how many people I know who got into programming through web development. Just because existing web developers don't necessarily need a simple solution doesn't mean new web developers couldn't take advantage of it.

For what it's worth, I started messing with HTML in elementary school, but had no idea how simple HTTP/1.0 was until I hit the 400 level classes in college. Editing in Notepad was the only way to go back then: even Dreamweaver got too complicated. Likewise, my first foray into JMS/message-queue type stuff was with STOMP, the simple text-oriented message protocol: there was no way I was going to understand the 'open wire protocol' for ActiveMQ. Ain't got time for that!

All that said, I think peterwwillis is right: http/X.0, as long as it's well-simplified, is better in binary than it is in text. Ideally, there's a bijection between your text-mode and binary-mode (like a lens), where it's easy to parse (rely on your toolset to do the translation back and forth) and easy to put on the wire. Forth is a good example of how to do it sanely.

Really? I can't imagine anything they might do putting this out of the reach of say a sophomore C-S student or a motivated independent learner.

A lot of current professionals started out as 11- and 12-year-olds dodging COPPA to host their first HTML web sites, not CS sophomores.

>A lot of current professionals started out as 11- and 12-year-olds dodging COPPA to host their first HTML web sites, not CS sophomores.

I would put those 11 and 12 year old down under "motivated independent learner"

If your concern is a binary HTTP, too bad, it's already here; it's called HTTPS. It doesn't seem to be holding anyone back.

If your concern is a binary HTML/compiler for it, I don't buy that either. I was using a compiler when I was 8 years old and didn't have a problem with it conceptually.

HTTPS is HTTP over a TLS connection, not a binary version of HTTP.

Obviously, but it isn't very helpful to look at HTTPS with something like wireshark. That was my point.

Yesterday, I showed a classroom full of high school kids HTML and CSS. They were so excited to be building stuff, and thought it was so neat that the text would turn into a web page.

Just because we have more 'professionals' today doesn't mean everyone is a professional.

I think to get maximum innovation on anything technology related, it should always be simple enough to get started at any level, then as you gain more experience you can (and the specs allow you to now -- AS2/EDIHTTP spec for one has encryption, compression, on top of HTTP/HTTPS - http://www.ietf.org/rfc/rfc4130) do more for performance, optimization etc otherwise it is premature optimization and a bit of a wall.

Like with gaming, each game should be simple to start but deeper to master. That is what this layer is all about, lower down the OSI stack it is much more of a wall to beginners. Never lock out beginners as they can be better masters with time, don't hide the entrance to the labyrinth. Leave things as approachable, but with better professional experience, modifiable to perform better. We have all that now, a competing binary protocol will never see the innovation that a more simple one like HTTP will. Higher up the stack you can see how this was better for exchanging data in standard ways from old school binary blobs, to CSV files, to XML files, to JSON. Same reason REST services won over SOAP, simplify... There is a JSON binary format BSON in mongo and also MessagePack but guess which one is used more in service/exchanging data the textual one or the binary ones? The binary formats work well in certain situations where both endpoints are controlled by the same entity.

Binary and more locked down/optimized formats and messaging have their place but the start/base should always focus on simplicity over optimization.

The general rule in exchanging via standards is be liberal in what you accept, conservative in what you send. Being all binary all the time is a backwards step and is conservative on what is accepted, I also think it would lead to a host of difficult to debug problems just based on work I did with AS2/HTTP RFC implementation one of them being streaming and of course encoding/decoding which can fill hours of work if you can't visually see the content at some level.

But I don't think we should be raising the barrier to entry and making it more difficult for a young hacker to get started. Like the above commenter I started hacking around with HTML and CSS when I was around 12 and it led me into linux/python/php/web frameworks.

There's something beautiful about being able to teach someone how to write a basic HTML document in 20 minutes. My mother can easily understand what HTML is doing, I doubt she'd understand a compiler.

I see where you're coming from, I had a similar experience playing with HTML as a kid. That said, "compiling" HTML from text to an efficient binary format wouldn't have prevented browsers from including those compilers themselves and transparently compiling all plain text pages received. Newbies could keep on serving plain text HTML (until someone chastises them for it) while the web as a whole would benefit from the increased efficiency that comes from binary pages being the norm.

Your argument doesn't really apply at all to HTTP, though. No one "got their start" peeking at HTTP requests. It's solely the domain of those working on infrastructure (for some definition of that word). Anyone with any idea of what HTTP really is (ie more than "that thing at the start of a URL") should have no problem using a tool to convert between a binary and text representation of the protocol. It's not like you can just magically pull HTTP requests out of the ether, you need a tool anyway. There's no reason why curl (for example) couldn't transform a request you write into an equivalent binary protocol, or why it couldn't do the inverse operation when it receives the response. It's utterly ridiculous to me that there is such inefficiency in HTTP just to make things slightly easier on the implementers of curl and wireshark.

HTTP is a protocol designed to communicate information between two machines. There's no reason it should be human-readable. Trying to make a protocol that's easy for humans to read and write invariably means making it harder to write software for.

A while back, I wrote a daemon that checked a bunch of network stuff in a loop. I needed a UI for it, so I made it speak http. No library, just raw GET support. It worked. Didn't take long to write, either. I would never have tried that with a binary protocol.

I don't think anyone's saying HTTP/1.1 should go away.

You say that like you think a custom binary protocol would be harder than writing a HTTP server, even a simple one, from scratch.

A custom binary protocol? Then I would have to write a client! And install it everywhere. And keep it updated. And design my gui in wingdi instead of html.

That is not a viable option.

I thought you said you weren't using existing libraries? Or did you just mean for the server? You made it sound as if you thought writing a server for a custom binary protocol was harder than writing a HTTP server from scratch.

If you're aiming for an existing ecosystem, then sure, there's no reason not to use HTTP, assuming you make use of established libraries. But widespread use is HTTP's only real virtue; the protocol is considerably more difficult to implement correctly than it should be.

No libraries. Ordinary web browsers for the client side.

A minimal HTTP server that recognizes GET requests, finds the url, throws away everything else, runs the relevant code and returns an HTML document of the results is actually really easy to write. And easy to integrate into an existing event loop.
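A rough sketch of what such a GET-only server might look like, using nothing but sockets. The names `handle_request`, `render` and `serve`, the port, and the status page contents are all hypothetical; this is not the daemon described above. It deliberately cuts corners: it reads the request with a single recv, ignores every header, and closes the connection after each response.

```python
import html
import socket


def render(path: str) -> str:
    # Hypothetical status page; a real daemon would report its checks here.
    return f"<html><body><h1>status</h1><p>path: {html.escape(path)}</p></body></html>"


def handle_request(raw: bytes) -> bytes:
    """Look only at the request line; throw away headers and any body."""
    request_line = raw.split(b"\r\n", 1)[0].decode("latin-1")
    parts = request_line.split(" ")
    if len(parts) < 2 or parts[0] != "GET":
        status, body = "405 Method Not Allowed", "<p>GET only</p>"
    else:
        status, body = "200 OK", render(parts[1])
    payload = body.encode("utf-8")
    head = (
        f"HTTP/1.0 {status}\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + payload


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Blocking accept loop; one request per connection."""
    with socket.create_server((host, port)) as server:
        while True:
            conn, _addr = server.accept()
            with conn:
                conn.sendall(handle_request(conn.recv(65536)))
```

Calling `serve()` and pointing an ordinary browser at the chosen port is enough to see the page. Keeping `handle_request` as a pure bytes-to-bytes function is what would make it easy to call from an existing event loop instead of the blocking loop shown here.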

Sure, if you cut corners, place some hard buffer limits, discard the vast majority of the specification, and don't care about breaking parts for brevity, HTTP starts to become tractable.

But even a minimal HTTP server, even one that ignores things like HEAD requests, is still going to be more complex than, say, receiving the URL raw, or even a URL wrapped in a simple structure like a netstring.

Actually implementing a full, correct HTTP server would be one or two orders of magnitude more complex than implementing a more modern protocol from scratch.
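For comparison, this is roughly all the framing a netstring needs: a decimal length, a colon, the payload, and a trailing comma. The helper names `encode_netstring` and `decode_netstring` are made up for this sketch, and the example URL is arbitrary.

```python
def encode_netstring(data: bytes) -> bytes:
    """Wrap a payload as <length>:<payload>,"""
    return str(len(data)).encode("ascii") + b":" + data + b","


def decode_netstring(buf: bytes):
    """Return (payload, remaining_bytes) or raise ValueError on bad framing."""
    head, sep, tail = buf.partition(b":")
    if not sep or not head.isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(head)
    # The payload must be followed by exactly one comma.
    if len(tail) < length + 1 or tail[length:length + 1] != b",":
        raise ValueError("truncated payload or missing trailing comma")
    return tail[:length], tail[length + 1:]


wire = encode_netstring(b"/status")
payload, rest = decode_netstring(wire)
print(wire, payload, rest)
```

There is no quoting, no line folding and no header parsing, which is the sense in which receiving a URL this way is simpler than even a stripped-down HTTP request.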

Despite the "extreme inefficiency", HTTP/HTML have managed to work successfully for a very long period of time (and even worked decently well on very slow hardware in the '90s).

There is actually a binary format that much of the web content is compiled into: gzip. It's remarkably effective.

I will grant that following Postel's Law means that browsers have more work to do to ensure that all kinds of "busted" stuff on the web continues to work, but I'd guess that, at this stage, that work is pretty small compared to everything else browsers are trying to do.

>There is actually a binary format that much of the web content is compiled into: gzip. It's remarkably effective.

I'm sure you realize this but gzip is a content encoding. If there was a binary html format, it could still (and should be) gzipped (or better yet 7-zipped).

I did some playing around with this, and gzipped binary-encoded HTML ended up around 1/4 the size of gzipped minified HTML.

Gzip doesn't give you the other advantages of the hypothetical binary format, primarily: An extremely standardized and quickly parseable format.

What "binary encoded html" did you use? I would be interested in seeing this, as my initial thought would be that both would gzip to almost exactly the same size?

>what "binary encoded html" did you use?

This was for internal testing, so it was a custom implementation.

>my initial thought would be that the both would gzip to almost exactly the same size?

That is like saying you would expect a gzipped CSV file to be as small as a gzipped database engine file. This is of course not the case.

So which is which? Is html the csv or the DB dump? What is this metaphor even supposed to mean? It's not as though a csv file is sufficient to serialize a database.

Actually, it would be more informative to point us at a "binary encoding", any binary encoding, that compresses smaller than the equivalent html text.

I would expect the gzipped CSV file to be smaller.

When you add in indexes or B-trees or whatever else database engines use to retrieve data quickly, database files can get quite big indeed.

HTTP should have been a simple binary protocol from the start

What are the advantages of binary HTML vs plaintext HTML + gzip?

I've observed many binary format efforts. Various binary XML, BSON, binary VRML.

The only efforts that made any sense (penciled out) were binary geometry representations eg enabling mesh compression.

(PS- I hate tagged formats like XML and HTML with the passion of a billion burning suns.)

Compiled binary HTML seems to me would be the end to the open nature of the web. The fact that anybody can view the HTML source is probably irrelevant from a network point of view. But I consider it a fundamental aspect of the web.

How would:

  hacker@gibson:~$ htmcmp ./myPage.html
  myPage.bhtml generated successfully. 
  hacker@gibson:~$ htmdmp ./myPage.bhtml
  myPage.html dumped successfully.

Hinder an open web? Maybe I am wrong, but my gut tells me this argument comes from developers who are not used to working with compiled languages.

I think it would help new devs. Can you imagine how nice it would be to have a handy HTML validator like this compiler would be? And how good for the web it would be to force everyone to use it? I can't imagine learning C without the feedback of a compiler, and that was what I hated most about learning javascript and HTML.

If you really hate the idea of having to run this compiler, it could be automatically run by apache, IIS, nginx, etc when serving your page for the first time. This is all hypothetical of course since such a standard does not exist.

I think what jakejake is getting at is that right now, anyone can pull up a page in their browser, click 'View Source' and see exactly how to express that page in a human-readable format. This is also true to a large extent (via various free development tools--some even baked into various browsers) for JavaScript and CSS.

What's interesting is that what you describe:

>this compiler, it could be automatically run by apache, IIS, nginx, etc

...essentially already occurs in many cases in the form of compression such as gzip. These files are also automatically extracted by the browser and are essentially as transparent as non-compressed ones.

So I think you both have a point: I agree with jakejake that the web must remain transparent. I agree with you that as long as sufficient tooling is freely available, it doesn't matter how the underlying protocol works. I.e., a binary representation of the page is still fairly transparent if there are plentiful tools that will deserialize it into a page object that can be expanded/perused/manipulated and then re-serialized when e.g., Ctrl+s is hit.

What would be bad is if 'View Source' showed something like: 01010100011010000110100101110011001000000110100101110011001000 00011011110111000001100001011100010111010101100101 etc. ...and you needed to spend umpteen hundreds to get a decompilation tool that only gave you an obfuscated/inexact reproduction of the recipe for the page.

Given IE, Safari, Firebug and Chrome all support ADDITIONAL developer tools, how likely do you think it is that you'll be stuck with raw binary when the browser has to decode this to the same internal representation used for HTML today?

Nothing will change on the front-end, just the servers will get faster and new bugs will be introduced. :)

Meta: Is it too late for you to edit your post to break up your string of digits or put it in a code block (prefix with two spaces, surrounded by blank lines)? It's forcing the whole comment page to have a horizontal scrollbar.

Yes: it's annoying enough for me to decide not to bother with the comments on this story.

Sorry, I'm willing but appear to be unable at this point.

Because the web took off because non-developers and proto-developers were able to easily create pages, look at html/css/js and become developers. It seems trivial to you, but having a compile step complicates things beyond the reach of many.

Plain text is more open and accessible than binary, full stop.

I remember. There were an awful lot of people who learned from, "How did that work? Let's view source!"

Once pages become big and complex, this learning strategy becomes much harder. But in the early days of the web it was utterly essential.

You will still need and use plain text. File:// isn't powered by HTTP ;-)

You could have embedded the compiler right in the browser. If you would open a html page it could have been automatically compiled and vice versa.

Not that it really matters, but since I'm the one who brought it up and your gut was telling you that I'm not likely familiar with compiled languages - I started programming in Pascal and Assembly in the 80's, moved onto Java and C# and lately I spend a lot of my time writing Objective C code.

It'll be no skin off my nose if the web turns into a compiled protocol. I don't know if it's even necessary to continue with plain text web sites these days. But, it most certainly was a major deal initially for me to just "view source" and see that it wasn't a black box of voodoo. The low barrier to entry is one of the major reasons the web took off.

I'll admit I'm an old timer in this business, probably ready to be taken out back and shot! So I have no idea if you youngsters were similarly inspired by viewing the source code on a web site? Maybe with all of the complex client side code and minimized scripts that it isn't even relevant anymore? It probably just looks like gobbledygook to a non-programmer these days.

> from developers who are not used to working with compiled languages.

I am used to working with compiled languages: 9 million lines of C++ on embedded custom hardware.

Now I know of course that it would not be that bad, but still, I don't want to compile more stuff, I want to compile less stuff. I'll let the VM do the compiling for me.

In the grand scheme of things these discussions hardly matter; it's just a 'cool' topic to fight about. With neither HTML nor HTTP is size a big issue; it is almost never the bottleneck.

The "standard" way to have done this at the time would have been to use ASN.1, either on top of an OSI stack or used in a similar way to SNMP.

Doesn't gzipping HTML fix most of what you're complaining about?

No. There is more discussion in this thread, but for starters:

1. Gzipping doesn't make the HTML syntax that must be parsed by browsers more regular and efficient to parse.

2. Gzipped minified HTML is 4x as large as gzipped binary encoded HTML when I tested it.

Genuine question: Is it really that big of an issue? We still have to use tools (telnet/curl) to inspect text-based protocols. We don't examine raw bytes by hand, do we? In other words, isn't every protocol ultimately a binary one and the issue here is only a matter of degree?

I think it is. We have a large amount of tooling in existence to inspect text based formats. Binary format inspection requires new tooling to be written and deployed. This is a major barrier.

I may be a bit old-school, but I learned HTTP via telnet, a tool definitely not written to inspect HTTP, and I still use it when I’m trying to debug things. Not having to install tooling is still something I take advantage of.

Obviously binary formats can and do succeed, and with sufficient backing tools will be written and deployed. But if HTTP hadn’t been so easily inspectable I don’t think it would have been nearly as successful the 1st time around, when the benefits of the protocol were less well known.

And I do think some of the "culture of the protocol" will be lost moving to binary°. It just isn’t as hacker-friendly or at least newbie-hacker-friendly and that sends a message.


° Of course this is already happening with HTTPS, so this is probably not a winnable fight. And the benefits of binary formats are significant, so it might not even be a fight I want to win. Still something is being lost here.

> Genuine question: Is it really that big of an issue?


Text-based means everything is open for inspection and self-evident if well designed. This goes away with binary.

Not to mention this protocol will need to be implemented everywhere and used by everyone. This means everyone will need to understand it as well. Open text-based protocols support these requirements simply by being open and text-based.

As if there's not enough bad code around already, in any language not assembly/C/C++, the reduction in code quality and clarity, and amount of additional issues involved the second you change the word "text" to "binary" is staggering. Let's not go there if we don't have to.

Most debuggers today can dump strings fine. Dumping binary, while not impossible, brings extra hurdles. Putting hurdles into debugging, pretty much means you will end up with more buggy code.

As if that's not enough to make you think "hmmm", there's the issue of future-proofing and extending the protocol. That is much easier to do cleanly in a text-based format. A binary HTTP 2.0 will be brittle and short-lived.

> We don't examine raw bytes by hand

Because that is very, very impractical. And thus making that a de-facto necessity for working with HTTP 2.0 doesn't make much sense.

I say we should look at it the other way around: Considering everything a binary-format will cost us (just some of those mentioned above), what benefits does it bring us to justify this huge cost?

I say the answer to that is near none, and in an ideal world that would be the end of the discussion.

An argument that it's a fake issue: the fact that best-practices compliant HTTP applications already tend to run under HTTPS, and that understanding what's happening at an HTTPS level in operationally-relevant ways (namely: are we performant enough? and are we secure enough?) already requires parsing of that binary protocol.

It probably already exists, but you could write a telnet that simply displays ASCII and hex side-by-side. I used something like this when I used to reverse engineer laboratory device control protocols (but it was a hardware implementation: http://www.atecorp.com/ATECorp/media/ProductImages/R/HP-Agil...)
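The side-by-side view itself is only a few lines. A sketch, assuming the bytes have already been read from the connection; the function name `hexdump` and the column layout are arbitrary choices, not any particular tool's output format.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex column and printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Non-printable bytes are shown as dots in the text column.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the text column lines up on short rows.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  |{text_part}|")
    return "\n".join(lines)


print(hexdump(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n"))
```

With a view like this, a text protocol and a binary one are inspected the same way; the difference is only in how much of the right-hand column is dots.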

Not to mention wireshark...

A tool already capable of doing it is socat. Example:

  socat -v -x - tcp:google.com:80

Add >/dev/null if you want to discard the plain-text output. It also works over SSL:

  socat -v -x - openssl:google.com:443,verify=0

I agree. HTTP, like many core internet protocols was designed to work with any client setup, even someone sitting at a telnet terminal sending hand-rolled HTTP requests. FTP is just the same, and SMTP, etc. While very few people these days will actually wind up using telnet to access network resources, it is a very successful design and one which ought not be abandoned without considerable thought.

> understand the motivation to going to a binary protocol for a SPDY inspired HTTP 2.0 but I would have liked to see an ascii based protocol

You're not alone. I've opened an issue on the subject [0]. Feel free to add weight to it.

[0] https://github.com/http2/http2-spec/issues/169

That's a childish way of asking for what you want: (1) opening an issue in the wrong forum and (2) failing to familiarize yourself with the previous discussion threads. You know this is a controversial topic, please act more maturely in the future.

Any time a bug report is linked in Hacker News the community ends up spamming the tracker with our enlightened opinions. Show some restraint.

> (1) opening an issue in the wrong forum and (2) failing to familiarize yourself with the previous discussion threads

OK. Fair enough, but I did look for an existing issue on the matter. There were none.

> You know this is a controversial topic, please act more maturely in the future.

Personally, I find attempting to lock down open internet-protocols by transforming them into some sort of binary obscurity rude and against everything the internet was built on.

I don't think everyone is aware of that happening right now, and I see nothing wrong with attempting to bring focus to the issue if that is what is needed. Seeing how this draft still outlines a binary protocol, this clearly still is in need of attention.

Just because some Google-heads somewhere decided that, by completely ignoring everything the internet has taught us so far and in the name of premature optimization, they can shave 2ms off their page load doesn't mean the internet should pander to their interests.

If anything is controversial it is how this is all being done without any documentation showing us what benefits we get from the costs associated with a binary protocol. That is amazing. Completely and utterly amazing.

But OK. Let's say I listen to your guidance: Where would be a good place to raise this issue? Where would I take my "childish" issues to ensure they get the attention they so much deserve?

As they said, it belongs on the mailing list. I too find it quite horrible that they demand technical arguments for why something should _not_ change. Usually one needs technical arguments to justify a change.

There is no reason why someone couldn't build a tool to send/receive binary responses and print them out as text. We have wireshark anyway. Just need a cli translator or a program that lists all the possible valid inputs for a given connection state and lets you select one (and allows for raw input just in case you want to try and break something).
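As a sketch of how small the "print it out as text" half of such a translator could be: the snippet below decodes a frame header into a readable line. It assumes the 9-byte header layout of the later published HTTP/2 specification (RFC 7540): 24-bit length, 8-bit type, 8-bit flags, one reserved bit and a 31-bit stream identifier. The draft discussed in this thread may lay the header out differently, and the function name `describe_frame_header` is made up for illustration.

```python
# Frame type codes as assigned in RFC 7540 (an assumption for this sketch;
# the 2013 draft may number them differently).
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


def describe_frame_header(header: bytes) -> str:
    """Turn a 9-byte frame header into a one-line text description."""
    if len(header) != 9:
        raise ValueError("frame header must be exactly 9 bytes")
    length = int.from_bytes(header[0:3], "big")
    frame_type = header[3]
    flags = header[4]
    # The top bit of the last four bytes is reserved; mask it off.
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
    name = FRAME_TYPES.get(frame_type, f"UNKNOWN(0x{frame_type:02x})")
    return f"{name} length={length} flags=0x{flags:02x} stream={stream_id}"


print(describe_frame_header(bytes([0, 0, 8, 0x6, 0, 0, 0, 0, 0])))
```

The inverse direction (text in, bytes out) is the same table read the other way, which is why a CLI translator of this kind is plausible even for a fully binary protocol.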

For most cases you will probably be able to get the old HTTP protocol anyway. The only potential issue would be for people who are implementing the HTTP 2.0 protocol itself, just about everyone else in the world will be using a webserver or at least a library.

In fact, raising the bar to prevent people developing their own HTTP 2.0 implementations would probably be a good thing; it would cut down on the number of bug-riddled implementations.

In any case I don't think it makes sense to slow down the whole internet just for a few developers.

Binary could also see the possibility of hardware implementations. Think webserver on a chip.

HTTP 1.x was fine for the early 1990s. It's 2013 and Windows has won. Unix is dead.

Maybe you have a nice Ubuntu box and that's fine. I'm talking philosophically. Modern developers favor the performance and functionality advantages that large applications, deep APIs, and binary communication protocols provide over the transparency advantages that small, highly focused tools, orthogonal APIs, and human-readable communication protocols provide.

Unix is dead?

Aren't Mac OS X, iOS, and Android basically unix OSes?

I'd say Unix won.

The essence of Unix is not in the name of Unix, nor in the code of Unix, said Master Foo.

And then Chuck Norris roundhouse-kicked him in the face.

Point being that if by Unix you mean an OS running a Unix-inspired kernel, then yes, Unix won. But if by Unix you mean the design patterns and philosophical principles that have long inhered to Unix development, then Unix is, if not dead, then on its last legs and soon to be discredited.

The "Unix philosophy" has long favored small, easily composable tools each with a specific purpose, orthogonal APIs which were as small as possible, and textual transmission formats which are easy for humans to read and write. There were exceptions (X, Emacs), but these exceptions tended to support and play well with the Unix philosophy even if they didn't 100% espouse it.

By contrast we may propose a Windows philosophy which favors large, do-everything applications over small tools (because that's how people are accustomed to using PCs starting from the DOS days when only one program could be up at a time), heavyweight frameworks and oftentimes entire inner platforms (because in a world where time-to-market is paramount, developers shouldn't have to think very hard to begin cranking out apps for the new technology of the week), and binary file formats (because the damn thing has to run in 640k, a text parser won't fit).

Look at the platforms you mentioned. Mac OS X, iOS, and Android are all app-centric, not tool-centric. You can treat Mac OS X as a Unix box if you want, but it's hard to do so with the other two. Furthermore, when you write an app for these platforms you are not targeting the Unix kernel or libraries but an inner platform built on top of them. Which brings us to binary file formats -- like HTTP 2.0.

The Windows philosophy has won.

Wow. I don't see that at all. You must not use any of the platforms you mentioned.

If you look under the hood on OSX, iOS, or Android, they are all composed of smaller single purpose components. If you are arguing that they do not use interprocess communication to join these component together, then you are correct. However, that is not the Unix philosophy. A great example is Outlook vs OSX/iOS Mail/Calendar/Contacts/Notes. On OSX and iOS, those applications try to do one thing well. On Windows, Outlook tries to do everything.

Beyond that, just about every embedded device and the vast majority of servers now run linux (or a unix variant). Just look at a list of packages on those linux devices and you will understand that it clearly is built around combining small single-purpose components.

Given that, it is hard to argue that the Windows philosophy has won. In fact, Windows seems to be struggling (as evidenced by slowing interest and reorg/rearch/rebrand thrashing in Redmond). Looking at the market, I'd argue that the Unix approach won quite some time ago.

Am I missing something?

Point him towards the github repo!

No matter what anyone says, it's all because Google did research on SPDY.

At first glance, the spec appears to be re-implementing a TCP stack inside a TCP connection...

HTTP/2.0: For all those people who thought 'the web' was 'the internet'. Well now it is! We've implemented TCP inside of HTTP. There's even a PING message there so you can do ICMP over HTTP as well :)

Yes, the reason is that starting a new stream multiplexed inside an already-open TCP connection is faster than starting a new TCP connection. Attempts to improve TCP itself are slowed down by slow uptake of new Windows versions.

While there have been some tweaks around the edges in terms of congestion control and the like, the TCP wire format has not changed in over 25 years and it has very little to do with Windows. It has to do with all the middleboxes and routers in the ossified backbone of the internet that choke if they see anything over IP that is not exactly what they expect.

That's not true. HTTP/2.0 (or at least the successor to HTTP/1.1) has been discussed for over a decade, there have been many OS releases since then!

Inventing a whole new protocol to overcome a TCP deficiency is IMO a terrible motive...

These guys have something to say about that: http://blog.chromium.org/2013/06/experimenting-with-quic.htm...

A new protocol that runs on UDP precisely to side-step slow TCP changes.

That's great, but it means it'll always be a userland library rather than an OS-supported networking protocol... which means the API will never be a de facto standard akin to the BSD sockets API. The reference implementation is also in obtuse-looking C++... which will hinder bindings to other languages.

In short, don't expect to see this outside of Android or Chrome.

On the plus side, this is exactly what UDP was for in the first place.

What efforts are these? That sounds much more interesting than mucking around with HTTP micro-optimizations.

See TCP Fast Open, MPTCP, ObsTCP, TCPCrypt, CurveCP, QUIC, MinimaLT, etc.

I also get the impression that TLS SNI/Snap Start/False Start/Ludicrous Start were hampered by browsers that used the Windows SSL stack, although I never understood why they would do that.

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.

Guess what the designers chose.

Being open and readable is also a check on this. Compare, for instance, the HTTP spec or JSON or XML files to the binary file formats of applications. If it is hidden from sight, things get ugly fast. Case in point: the PSD file format and any Office file format (http://www.joelonsoftware.com/items/2008/02/19.html). Binary formats are usually only a good idea at the outset; as they change, they become a pile. Binary formats only really work well when you control both endpoints (even then they get crufty). For exchanging information, being unreadable leads to many bad things and is a great step backwards.

I work in game development and application development and the former really loves binary formats (slowly changing away from that with server and editable needs). In many bugs or crashes the root of the problem is some offset in a binary file or some incorrect custom binary file format node that breaks everything after it. Readable, keyed and debuggable formats are so much better at their root (they can be binary, base64'd, compressed after but at the root they should be standard in some way and able to change without breaking the whole thing).
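
A small Python sketch of the kind of breakage I mean, assuming a made-up save record (the field names health, armor, and level are hypothetical). A newer writer inserts one field into a positional binary layout and an older reader silently picks up the wrong value, while a keyed format keeps working for the same reader.

```python
import json
import struct

# Old layout the reader was built against: health, level (two little-endian uint16).
V1 = struct.Struct("<HH")
# Newer writer inserts "armor" between them: health, armor, level.
V2 = struct.Struct("<HHH")

blob_v2 = V2.pack(100, 7, 42)  # health=100, armor=7, level=42

# The old reader still reads by position, so it takes armor for level.
health, level = V1.unpack_from(blob_v2, 0)

# The same record in a keyed format: the old reader looks fields up by name
# and simply ignores the one it does not know about.
doc_v2 = json.dumps({"health": 100, "armor": 7, "level": 42})
record = json.loads(doc_v2)
keyed_level = record["level"]

print(level, keyed_level)
```

Nothing crashes in the binary case, which is what makes it nasty: the wrong number just flows downstream until something else breaks.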

I'd like to see increased adoption of SCTP to get multiplexing benefits for all protocols including HTTP.

Google mentions SCTP briefly here, along with some other alternatives: http://www.chromium.org/spdy/spdy-whitepaper

Their main reason for going with SPDY seems to be that rolling out an application-level protocol is easier, while rolling out a transport-level protocol would require changing routers and such. That is almost certainly true. Fixing the multiplexing problem in an HTTP-specific way rather than generally does seem like an unfortunate hack, but it's probably the pragmatic approach.

Doesn't SCTP run over IP as a layer 4 (transport layer) protocol? Why would you need to change routers (mostly layer 3)?
