What happened to simple protocols? Seriously, per-hop flow control (which works best with which of the dozen versions of TCPv4 or TCPv6 flow control?)? TCPv4-like framing with weird limits (16383 bytes)? Keepalives/Ping? Truly ridiculous specialized compression for headers which ignores the role of HTTP proxies? QoS?
Why not just implement the whole protocol over a raw IP connection and stop pretending like we're operating in layer 4+? I get that multiplexing is difficult without flow control, but good lord does this thing look overdesigned for what few benefits it offers over HTTP/1.1.
Look on the bright side--if everyone decides the public Internet is too insecure, maybe we can convince them to keep using the old standards on darknets?
Answer: The Internet is still running on them, 30 years later.
Whenever I read something like "simplicity is hard", it makes me cringe. I hear that a lot, and I see evidence of gratuitous complexity everywhere I look these days. I'd hazard a guess the engineers behind SPDY would find simplicity (and reliability) boring.
Debugging binary protocols is either great job security for overeager engineers like the SPDY team or a great waste of our collective time. I'll let you all decide which.
Let's do it right and then fix NAT
You can get a startling amount of the way there with sequence numbers and a few other things--it's a fun exercise.
I think it's pretty ludicrous that so many people are willing to let an advertising company basically re-write the way that every computer on the web talks to each other with very little questioning, but this is the place that we're at nowadays. For better or worse, the big vendors guide the standards process. I'd like to see more involvement from the little guys, but that has its own set of challenges.
That said, I _do_ think it's extra important to pay attention to what Google does, for two reasons:
1. They're the largest entity on the Internet. This means their incentives are different than smaller players.
2. They _are_ an advertising company. Advertising companies make money by showing ads. They make more money by showing targeted ads. You target ads by collecting data on people.
I think people often forget Google's purpose in the world, and are simply dazzled by 'whoah cool stuff.' I appreciate some of Google's more interesting and ambitious initiatives, but get very scared when people start accepting any entity's actions without question. Specifically when that entity has large financial incentive to collect data about people.
There are, of course, many technical criticisms of SPDY, but none that rely specifically on the advertising angle.
The IETF and the W3C have always had large companies involved in specs, often with their own agendas. I see no reason to attack Google in this way.
Google is an advertising company, therefore SPDY is bad.
Google is an advertising company, and the largest single entity on the Internet, and therefore, their actions deserve a healthy dose of skepticism. I'm not sure that we've been giving them enough skepticism.
The issue is that you're throwing around them being an advertising company as a negative without any discernible proof that it has negatively affected the outcome.
What, specifically (and please spare us the 'rhetorical flourishes') has been proposed that is unfairly biased towards advertising? Which parts should we be skeptical about?
E.g., relating to the ASCII/binary discussion above:
A binary web would require advanced retooling and therefore investment. Smaller business entities are not in such a strong position to deal with such a large shift in their workflow. Therefore, switching to a binary protocol would disadvantage entities smaller than Google.
I've mentioned specifics elsewhere in this thread, for example, https://news.ycombinator.com/item?id=6013468 and https://news.ycombinator.com/item?id=6012906
And, as I mention elsewhere, it's not just the spec, it's the general trend of Google dominating the web. https://news.ycombinator.com/item?id=6013468
For technical critiques, I commented earlier: https://news.ycombinator.com/item?id=6013088
Then maybe you should actually bring these things up. You seem to be playing coy and throwing around innuendo. If you have real, substantive concerns about something, I think the conversation would be greatly improved by actually bringing those up rather than just casting aspersions about Google.
It would be enlightening to see the defenders of Google disclosing whether they have any financial or other interest in Google.
To complement that disclosure, any attackers of Google should likewise disclose whether they have any financial or other interests in Google's competitors.
There are things that Google does, as a corporation, that you can find fault with, technological decisions that may or may not have been influenced by business model. For those, fire away. But the attack on SPDY/HTTP/2.0 because "Google is an advertising company" (which if you actually worked at Google, and knew how people made decisions here, you'd know is ridiculous from an intent or motivation point of view) is just pure mudslinging.
Examples of stuff that I, as a Google employee, would criticize Google for: Real Names, building "siloed" services and moving away from federated/decentralized approaches (see my essay here: http://timepedia.blogspot.com/2008/05/decentralizing-web.htm...), most of what Yegge said about APIs, Google Hangouts going "silo" and away from the XMPP model, etc.
People who work on ads and take their marching orders from ads are a small portion of employees at Google. The guys working on Chromium/Blink/SPDY do not report to ads, do not take orders from ads, and in general, work on technology without reference to monetization strategy. Their day to day job is to improve technology, with the hope that if you raise the tide, all boats will be lifted, and there'll be some ROI from that.
But the idea that engineers are taking marching orders from shareholders to maximize profits based on ads by tweaking web standards is hilariously wrong for people working on Chrome.
I would also criticize Google for your reasons, and they may be even more important. But this isn't a thread about those things.
"Should we be doing this?" and "How should we do this?" are two very different questions.
> I think people often forget Google's purpose in the world
The fact that Google makes money through showing ads doesn't mean it's their purpose.
Github's revenue stream is through private repositories (both hosted on github.com and self-hosted enterprise), but I don't think you could reasonably assert that Github's purpose is to make a profit off of keeping code private. Their actions, in fact, suggest precisely the opposite.
Where is this codified?
Non-pecuniary returns can satisfy the responsibilities of an enterprise.
It appeared that a legal obligation was being suggested. What sort of obligation was being suggested and how is that obligation derived and enforced?
In brief, he makes three points. The first is that SPDY/HTTP 2.0 doesn't do anything about the widely lamented lack of session handling. The second is that it doesn't contain any simplifications of HTTP, despite there being several examples of things that could be simplified (header parsing, for instance, is hairier than it could be). The third is that it is going to pose problems for proxies.
I don't know how many of these points continue to apply with this HTTP 2.0 draft, nor do I have any skin in this game, but I respect PHK quite a bit so his outrage creates in me a sense of mild reservation. :)
* SPDY depends on Deflate compression, and will require middleboxes to implement deflate to route requests. I think the "IETF school of design" has an irrational fear of good compression and I think it's harmed other protocols, most notably DNS. I may be poisoned into this viewpoint by Bernstein.
* There are protocol constants that PHK doesn't know the background of, which strikes me as the kind of documentation bug that something like an HTTP 2.0 would address.
* SPDY might have required another WKP, which isn't really a SPDY problem.
* There's DoS potential in SPDY --- but of course, there's DoS potential in HTTP too; look at chunked encoding, for instance. For that matter, modern HTTP 1.1 also accommodates compression; when it comes to attack surface, in for a penny, in for a pound.
* A similar argument addresses PHK's concerns about the (theoretic) security of the push model, which is also something that modern HTTP accommodates.
Oh oh also: PHK sees HTTP 2.0 as an opportunity to correct the session management problem, which has led to the "bass ackwards" design of heavyweight signed cookies in web applications. I sympathize with him on this point, but it's not HTTP's fault that this happens. HTTP 1.1 cookies also used to be simple opaque session IDs; heavyweight signed cookies are a consequence of server app architecture, not the underlying protocol.
Even if HTTP 2.0 had built-in robust session management, Rails apps would still be shoving several kbytes of encrypted state out to web browsers.
Such a complete and utter mess of a scheme!
The third criticism, that SPDY makes life more difficult for routers, makes me wonder: would this get easier if SPDY just said "forget the Host header, SPDY requires SNI"? Seems like that would help.
My objection (observation, really) is that one expects protocol 2.0 to do more than address performance optimization. Simplifying the protocol is a good thing to do with a major revision; they didn't do that. Making the protocol more friendly for upper layer users is another good thing to do with a major revision; they didn't do that either. Instead they took an obviously different protocol designed to address a handful of extremely technical performance matters and rubber-stamped it as HTTP 2.0. Whether you like SPDY or not, it should be clear that this kind of "process" is going to leave people feeling disenfranchised. The spirit of HTTP, inasmuch as such a thing exists, is one of simplicity. SPDY just doesn't "smell" like the successor.
I think the comparison to R6RS is very appropriate to my point. R6RS was designed to address well-known shortcomings of Scheme. The process it took to get approved circumvented a lot of the community. A large segment of the community responded to this by essentially whining about it and ignoring it. We already see the whining about HTTP 2.0. I predict it will be followed by ignoring it, and some years in the future, an HTTP 2.1 or 3.0 that more closely resembles HTTP 1.1.
One critique that I don't remember being in either of these two concerns header compression. Header compression seems to make sense, as compression is good. The problem is that intermediaries make routing decisions based on the headers, and so it's quite possible that the CPU time needed to decompress, possibly modify, and recompress the headers outweighs any gains that the compression brought in the first place.
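To make the intermediary's extra work concrete, here is a minimal Python sketch of the inflate, modify, deflate round trip. Everything in it is an assumption for illustration: the header values are made up, `proxy_rewrite` is a hypothetical helper, and it uses a fresh zlib stream per message rather than whatever compression context the draft actually specifies.

```python
import zlib

# Made-up request headers, purely for illustration.
HEADERS = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"User-Agent: ExampleBrowser/1.0\r\n"
    b"Accept: text/html\r\n"
    b"Cookie: session=abc123\r\n"
    b"\r\n"
)


def compress_headers(raw: bytes) -> bytes:
    """Client side: deflate the header block before sending it."""
    return zlib.compress(raw)


def proxy_rewrite(compressed: bytes, extra_header: bytes) -> bytes:
    """Hypothetical intermediary: it cannot route on compressed bytes,
    so it must inflate, inspect or modify, and deflate again."""
    raw = zlib.decompress(compressed)
    # Drop the final blank line, append the new header, restore the blank line.
    modified = raw[:-2] + extra_header + b"\r\n\r\n"
    return zlib.compress(modified)


if __name__ == "__main__":
    wire = compress_headers(HEADERS)
    forwarded = proxy_rewrite(wire, b"Via: 1.1 example-proxy")
    print(len(HEADERS), len(wire), len(forwarded))
```

Whether the bytes saved on the wire outweigh the two extra zlib passes per hop is exactly the thing that would need measuring; the sketch only shows where the cost comes from.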
I've also seen some vague commentary about 'mixing application concerns into the transport layer' which I find compelling, but I don't have enough experience with the low-level networking to properly judge on my own.
Break over! Gotta run.
Yes. They are not representing smaller players, i.e. the majority. And I think for smaller players speed is not as important as convenience. So this can even hurt smaller players in the long run.
* TLS not mandatory for HTTP/2.0
I don't understand the first point, though. Could you clarify?
I don't think there's very little questioning... In any case, Google's goal is to deliver ads to you in the most efficient and fastest way possible. The more you browse the web the more Google makes; it's in their interest to develop SPDY/HTTP2.0. So what is wrong with them doing it? IETF spec drafts are public and they're audited (as SPDY has been).
Also, most of the specs that we love and rely on today came from "big vendors". It's nice and all to say you want the little guy to be a part (and they should be), but it takes quite a bit of manpower to develop, draft and finally get a spec ratified.
The core internet protocols we rely on, though, mostly didn't. If you look at the authors of RFCs specifying the widely used standards, nearly all of them were at research institutions: Steve Crocker was at UCLA then ARPA; Vint Cerf was at UCLA, then Stanford, then ARPA; Bob Kahn was at ARPA; Jon Postel was at UCLA then USC; Paul Mockapetris was at UC Irvine; Abhay Bhushan was at MIT; Tim Berners-Lee was at CERN then MIT.
Not sure if that's good or bad, but it seems to have been uncommon until recently for internet protocols to come from vendors.
edit: I did think of one important one, IPv6. Steve Deering was at Stanford, then Xerox PARC, then Cisco, and IPv6 came out during his Xerox/Cisco period. Bob Hinden was at Ipsilon Networks, then Nokia.
Sorry, that's been going on for quite a while now. Cases in point: http://tools.ietf.org/html/rfc3768 http://tools.ietf.org/html/rfc5077 http://tools.ietf.org/html/rfc2637 http://tools.ietf.org/html/rfc2281 (these are just examples, there are many more citing Cisco, Microsoft, Nokia, Google, etc.)
Absolutely. But there's more than one kind of control. I don't think enough programmers understand the effects of social control. If the standards are all public and audited, but only employees of Apple, Google, and Microsoft have the time and energy to keep up with discussions, well...
And, of course, I'm not implying that _only_ that is true; I just fear that big organizations are dominating the discussion. I have more free time than the vast majority of programmers, am subscribed to the HTTP 2.0 mailing list, and find it hard to keep up.
It's especially hard when the call for proposals period of the draft is about 4 months and there happens to be a ready made proposal from a big player at the ready to be agreed on almost immediately. It's nice to say the little guy should be a part, but in this case the little guy mostly heard about it long after it happened.
First, to call Google an advertising company makes no sense. It's a tech company. You don't call a newspaper an advertising company either.
Second, there were people involved in designing this, not just an anonymous corporation. You can actually see their names in the proposals. It's a good design, that's why it has been adopted.
How does Google make money again? Newspaper companies are advertising companies, especially given the quality of the news lately. ;)
Absolutely, and I don't mean to denigrate their technical efforts. I'm glad people want to move the web forward. I'm just recommending caution.
> It's a good design, that's why it has been adopted.
Many poor designs have garnered wide adoption in the past, so this is not inherently true.
The way Mark Nottingham ran the original CFP for HTTP 2 and the eventual adoption of SPDY as a starting point was very fair - it's all there in the IETF archives for anyone to see. From memory there were only two other proposals (from Microsoft and someone else).
The reason Google were able to get a new protocol up and running is because they have both heavily used web properties and a browser. They're also willing to carry out experiments in public.
As it stands HTTP 2.0 will be good for the little guys too. Based on the testing I've done, little guys will see an improvement in performance without needing to do all the merging that destroys cache lifetimes.
3rd-party content is the fly in the ointment for the performance improvements, so we'll need to be much more careful about the performance of the 3rd-party sites we include.
N.B. Apart from using their products I have no affiliation with Google
It costs money to have people on staff who write IETF drafts and haggle them up to RFCs. Hopefully the standard isn't too degraded by the needs of Google in this instance and everyone benefits.
First, ASCII is inefficient. People don't interpret HTTP, computers do. Web servers and browsers. People only look at HTTP when they want to troubleshoot without any tools. With real tools, you can find out what's broken much quicker. And there's plenty of things you can miss without a real HTTP interpreter. Most hackers prefer to think of themselves as wizards that can spy 0's and 1's and tell you what the weather is. It doesn't make for a better protocol, though.
Second, we can already break HTTP responses up in multiple parts, using a novel idea called "multipart". It sucks and nobody has used it since HTML/JS found new ways of providing content. http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
Third, it's a hack. If you want to improve the protocol, improve the protocol, don't just hack onto it to make it do what you want. I could make a horse and buggy go 60mph, but would it be a good idea? How about just designing a better buggy that is intended to go 60mph?
Fourth, fixed-length records are the wave of the future! It solves crazy problems like header injection and request integrity checking. Moreover, it makes for simpler, more efficient parsing of requests.
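A minimal sketch of the header injection point, in Python. Note the swap: this uses length-prefixed fields rather than strictly fixed-length records, and the 2-byte length layout and all function names are invented here, not taken from any draft.

```python
import struct


def text_header(name: str, value: str) -> bytes:
    # Naive text serialization: nothing stops value from containing CRLF.
    return f"{name}: {value}\r\n".encode("ascii")


def parse_text_headers(data: bytes) -> list:
    headers = []
    for line in data.decode("ascii").split("\r\n"):
        if line:
            name, _, value = line.partition(": ")
            headers.append((name, value))
    return headers


def binary_header(name: str, value: str) -> bytes:
    # Hypothetical layout: 2-byte name length, 2-byte value length, then bytes.
    n, v = name.encode("ascii"), value.encode("ascii")
    return struct.pack("!HH", len(n), len(v)) + n + v


def parse_binary_headers(data: bytes) -> list:
    headers, offset = [], 0
    while offset < len(data):
        name_len, value_len = struct.unpack_from("!HH", data, offset)
        offset += 4
        name = data[offset:offset + name_len].decode("ascii")
        offset += name_len
        value = data[offset:offset + value_len].decode("ascii")
        offset += value_len
        headers.append((name, value))
    return headers


if __name__ == "__main__":
    evil = "en\r\nSet-Cookie: admin=1"  # attacker-controlled value
    print(parse_text_headers(text_header("X-Lang", evil)))
    print(parse_binary_headers(binary_header("X-Lang", evil)))
```

With the text form the receiver sees two headers, one of them injected; with the length-prefixed form it sees a single header whose value happens to contain CRLF bytes. A careful text serializer can of course reject CRLF in values, so this is about what the format makes easy, not what is possible.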
Fifth, redundancies introduced from the beginning of time need to go away, like terminating every record with "\r\n", or passing the same headers on every single damn request when once should be just fine for a stream of requests. Little inefficiencies like this don't go away if you just hack onto the same old protocol forever.
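The "send the headers once" idea can be sketched as a per-connection delta, assuming both ends remember the previous request's headers. This is only an illustration in Python; `header_delta` and `apply_delta` are hypothetical names, and it is not how any particular draft encodes headers.

```python
def header_delta(previous: dict, current: dict):
    """Sender side: work out what changed since the last request on this stream."""
    changed = {k: v for k, v in current.items() if previous.get(k) != v}
    removed = [k for k in previous if k not in current]
    return changed, removed


def apply_delta(previous: dict, changed: dict, removed: list) -> dict:
    """Receiver side: rebuild the full header set from the remembered state."""
    merged = {k: v for k, v in previous.items() if k not in removed}
    merged.update(changed)
    return merged


if __name__ == "__main__":
    first = {
        "Path": "/index.html",
        "Host": "example.com",
        "User-Agent": "ExampleBrowser/1.0",
        "Cookie": "session=abc123",
    }
    second = dict(first, Path="/style.css")
    changed, removed = header_delta(first, second)
    print(changed, removed)  # only the path needs to go over the wire
    assert apply_delta(first, changed, removed) == second
```

The trade-off is that both endpoints, and any intermediary that wants to read the headers, now have to keep state per connection.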
Sixth, the flow control improvements can make different applications more efficient by both not having to hold state of where and when traffic is coming and improving flow across disparate network hops.
Seventh, as to a general question "why use binary when we can compress stuff", what is smaller: a string of bits, or a compressed string of digits? Add to this that every header could have a 32-bit identifier (4 bytes) and you've got more efficient compression than gzip. Of course, most people would argue that you don't save enough time or bandwidth or cpu cycles to make a difference, but it would make for easier-to-write parsers, which would make working with the protocol in general more attractive. But then you have your binary-detractor-wizard-hackers and the whole conversation becomes an infinite loop.
I can't tell you how many times I've manually read HTTP. To be sure, it's insignificant compared to how many HTTP headers have passed through my computer unseen by me.
ASCII may be inefficient, but computers are really fast; people are not. I don't have any measurements, but in making a browser, HTTP header parsing/writing was never close to being a performance issue. Bandwidth-wise it's also tiny compared to that image file you'd inevitably download for every page visit.
And sometimes you don't have tools. Sometimes you don't want tools. Sometimes you want to use tools that work with text to analyse your problem.
(Your other arguments might still stand, though :) )
Since header lengths are not limited, and a single TCP packet's payload is quite limited, long headers can cause very measurable latency difference. Additionally, while I agree the generation / parsing overhead is probably quite small, saving it for every HTTP request is still a boon.
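For a rough feel of the packet arithmetic, here is a small Python sketch. The 1460-byte payload is an assumption (a 1500-byte Ethernet MTU minus 40 bytes of IPv4 and TCP headers, no options); real paths vary.

```python
import math

ASSUMED_MSS = 1460  # assumption: 1500-byte MTU minus 20-byte IPv4 and 20-byte TCP headers


def segments_needed(header_bytes: int, mss: int = ASSUMED_MSS) -> int:
    """Number of TCP segments a header block of this size occupies."""
    return math.ceil(header_bytes / mss)


if __name__ == "__main__":
    for size in (400, 1460, 1461, 8000):
        print(size, "bytes ->", segments_needed(size), "segment(s)")
```

Whether the extra segments cost a full round trip depends on the connection's congestion window at that moment, which this sketch does not model.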
I'm also curious where you are reading raw HTTP from?
For me it's primarily in two situations, reading a packet capture from WireShark, or in the browser's debugger. In both cases, the tool will end up translating the request for me.
Just because I use Charles or Wireshark doesn't mean that I only want to use those specialized tools. I have definitely been in situations where I'm doing something like running nc as a proxy and looking at raw HTTP. I wouldn't choose to throw that away and revert to the bad old days without a big win.
But now nc supports SSL natively, making it super easy. Just as it will support binary HTTP natively, making it super easy. And everyone will finally stop caring about ASCII.
I'm questioning the measurability of this, though. Smells like premature optimization. You could be right, but I'd like to at least measure it before we go about changing one of the fundamental protocols on the internet.
Last time I read raw HTTP was when writing a script to automate some stuff on a web page. I specifically did not want the browser's headers and behaviors. I had a bug which only happened from my script, and raw HTTP helped me track it down. I could have used wireshark, but I am much faster in vim for a simple task like that.
As a comparison to your scripting story, you would use Wget or Curl or LWP::UserAgent or a thousand other things to automate HTTP requests. One function call to do what you did manually. To find bugs you would use an HTTP fuzzer like Skipfish to automate the process. If you think somehow your manual process was faster, I say to you, teach a man to fish...
(I automate things in web pages for a living, and I only use tools like Firebug and LWP)
Also, sometimes you need something specific, and then you have the option to code it up yourself or change an open source library quickly and efficiently.
The advantages of text based protocol remain the same. The disadvantage is lessened by faster CPUs.
If anything, the more complex the protocol, the more redundant the text becomes, because we have to write tools to parse the text and output it so we can understand it better or identify flaws in it, and work around bugs introduced by the human element of the protocol. The ability to view and interpret the protocol in a text editor is equivalent to the ability to view and interpret the protocol as output from a debugging tool or log file - except the tool can give you much more detail than the text file in a variety of ways. Text files are inferior, but they can be quicker/simpler, depending on what you're doing.
You still need a library or tool to write the protocol out, as it's complicated and needs to be structured for the machine, not a person.
Second, saying "it's ok that it's slow, we'll just buy a faster CPU" is not a good argument for anything ever. It's part of the reason it's taken so long to adopt encrypted services everywhere. Someone (Google) had to finally prove it wasn't slow so people would adopt it.
Third, the state of modern computers is that there is no difference in speed between interpreting most text protocols and binary protocols. But that has nothing to do with efficiency, or what the machine is naturally suited to doing. You have to translate from English into machine code for a computer to know what the hell another machine is talking about. Machines don't care about line-by-line, or capitalization, or indentation, spaces, or any vestige of our natural language. Strip all those things away and machines purr along happily with less bullshit to deal with, which means simpler, more efficient code. Note that I didn't say faster.
Fourth, your performance and history observation is flawed. We need a lot more performance today than we did before, as we're scaling existing technology to many, many orders of magnitude higher than anything that existed when it was invented. Yes, we have faster CPUs. We also have more users and more data, and we don't have time to sit around reading packet dumps in text editors.
This is entirely false, as anyone who ever had to debug a malfunctioning http proxy or a misbehaving IMAP can tell you. Nothing beats netcat for a quick bug isolation test. As for the need for formal parsing, again it is true for production code, entirely false for sysops transient tasks.
Compare debugging a Corba server with debugging HTTP for a whiff of the difference.
Tools aren't omnipresent. My myriad busybox embedded devices will likely never have a protocol analyzer. If I need one there, I'm done for.
With a binary protocol you're entirely dependent on tools (unless you want to trawl through it with a hex editor).
Seeing as we use binary protocols every day of our lives, and the tools to work with them have existed for years, and nobody has any problem with using them, let's let this argument rest.
"The history is doomed to repeat itself".
> Fifth, redundancies introduced from the beginning of time need to go away
I wholeheartedly agree with this, but it doesn't automatically warrant binary encoding.
Human readability is a huge bonus in any protocol or format. Not because normally people read those protocols, but because people read ASCII and therefore they have good tools to work with ASCII.
HTTP is a layer 7 communication protocol. HTML/CSS are markup languages for designing an interface. JSON is a data interchange format. RSS is a content syndication format.
They are all wildly, vastly different. The only thing they have in common is they're all ASCII. If anything, you're making my argument for me: a communications protocol is not a format for displaying documents, it is a language for communicating machine instructions to network applications. Historically they have always been binary because it works better that way.
Your argument that "people can read ASCII, so ASCII is good" leaves out a couple points. Like, human beings do not read an HTTP statement, go into a file folder, bring out a document and present it to their computer. It's the other way around.
Really this just reflects a strange phobia people seem to have. Like your brain is tricking you into thinking you'll lose something by not looking directly "at the wire".
When you look at HTTP headers, 90% of the time you're actually looking at a pre-parsed, normalized set of fields. If you look at a raw packet dump, the whole message may not show up in one packet; you may have to reassemble it, which means parsing. If you have multiple requests in one connection, you have to find the end of the last request, which means seeking through the stream; seeing requests broken down individually means a tool already parsed them. Firebug and wireshark and other tools all take care of the automated, machine-operated work for you.
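Here is roughly what "finding the end of the last request" involves, as a Python sketch. It assumes the packets have already been reassembled into one byte stream and only understands `Content-Length`; chunked encoding and malformed input are deliberately ignored, and `split_requests` is a made-up name.

```python
def split_requests(stream: bytes) -> list:
    """Split a reassembled connection stream into individual HTTP/1.1 requests."""
    requests = []
    pos = 0
    while pos < len(stream):
        head_end = stream.find(b"\r\n\r\n", pos)
        if head_end == -1:
            break  # header block incomplete; a real tool would wait for more data
        head = stream[pos:head_end]
        body_len = 0
        for line in head.split(b"\r\n")[1:]:  # skip the request line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                body_len = int(value.strip())
        end = head_end + 4 + body_len
        if end > len(stream):
            break  # body not fully received yet
        requests.append(stream[pos:end])
        pos = end
    return requests


if __name__ == "__main__":
    get = b"GET /a HTTP/1.1\r\nHost: example.com\r\n\r\n"
    post = b"POST /b HTTP/1.1\r\nHost: example.com\r\nContent-Length: 5\r\n\r\nhello"
    for request in split_requests(get + post + get):
        print(request.split(b"\r\n", 1)[0].decode("ascii"))
```

So even with a text protocol, seeing requests one at a time already means something parsed the stream for you.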
And what's left? What do you have to do with HTTP, really? Apache rules? They'd stay human-readable. Application testing? We use proxies that handle it, and APIs for client/server programming. Firewalling? Handled by tools and appliances.
Stop giving me the blanket "ASCII is great for everything" excuse and tell me one thing, one single thing, that only humans are able to do with HTTP using their eyes without a tool. But you don't have to, because that's impossible: HTTP is not for humans.
I look forward to servers having different text representation of the same binary headers in their config files.
> tell me one thing, one single thing, that only humans are able to do with HTTP using their eyes
You're missing the point.
No one writes HTML manually anymore either. People generate it using tools (string processing tools in a language or templates) and read it using browsers. Heck, even Notepad++ is a tool, but a generic one.
If you want, you can generate all your HTML using DOM. But almost no one does that, because DOM tools are clumsy, while text-based tools are easy to use.
I was up until four last night doing just this. It's commonly done for templating purposes all the time, or quick hacks and placeholders.
Have you lost your mind?
If no one writes HTML manually anymore, then we have no need for it to look like English when the computer interprets it! We can compile the HTML down to bytecode and have it be interpreted much quicker by the computer, which won't have to do the job of lexing, compiling, assembling, etc. Here, two steps would be eliminated immediately, resulting in increased speed and more efficient storage and transmission: http://www.html5rocks.com/en/tutorials/internals/howbrowsers...
For that matter, if it's generated by tools, and we use programs designed to interpret and decipher and color-code it, all of that can happen without it being in English!
On top of that, you missed when I said HTTP is a communications protocol. Ever seen the movie The Matrix? Know how the sentinels would sometimes look at each other and make scuttling noises, then shovel off somewhere? They weren't speaking English ASCII. They were speaking a binary communications protocol. Know how I know? BECAUSE MACHINES AREN'T HUMANS! It would be absolutely moronic for them to speak English to each other. It would be like dogs saying the English word "bark" instead of just barking. Completely unnecessary and crazy. But that's what an ASCII communications protocol for machines is.
On top of that, there is no benefit, not one at all, to humans being able to read it when tools already exist to interpret and display it even more human-readable than its natural state. We squish and compress and strip HTML and JS already just to make it more efficient, and then undo the whole process just to read it. It's insane.
The web is made by people, not computers. Open a ubiquitous text editor and you can start working on something right away. If you have to download a dozen different compilers and IDEs to do that, it's definitely not the same.
Face it. Your love affair with ASCII is just that: an emotion.
(As to your original question: humans haven't needed to program in binary or assembly for decades. That's what's so great about computers: they do the hard work for us, so we don't need to type everything manually into a text editor. Is that such a hard pill to swallow?)
Yes, binary is more efficient, but then tell me why is JSON the most popular data interchange format on the web today?
Yes, let's base HTML 6 on Word .doc.
Also, the machines in The Matrix were hostile to humans. We'd like machines in the real world to be... not so.
With a text based protocol, you can inspect it visually with no special tools, and munge it with general purpose tools that you already know how to use (shell script, sed, awk, perl, python, ruby, what have you) with no special support libraries or anything of the sort. Support libraries can help you with the more complex aspects of the protocols, but for basic debugging purposes, you can do it all with general purpose tools.
With a binary protocol, you need those libraries to even have a chance of being able to work with it. Now you can't use a general purpose shell pipeline to munge it; no more nc | grep or what have you. You have to have a wireshark dissector; and good luck figuring out how to grep through the results of what a wireshark dissector generates.
The main point is that the overhead of the ASCII encoding isn't the main problem with HTTP. Reading ASCII encoded CRLF delimited headers is a solved problem (and heck, you could probably switch that to just LF delimiters, since I'm sure that most processors already handle that case just fine).
There isn't one, don't bother looking. Corba is the poster child for the problems with binary protocols: fragmentation, buggy implementations, incompatible extensions.
I'd rather not see HTTP follow the Corba path.
It can be, but that doesn't really make sense. The vast majority of web development is done without manually editing HTTP headers. It could change to a binary format tomorrow and - so long as our tools preserved their interfaces - we wouldn't even notice. The same cannot be said for any of the other technologies you listed.
This is a good example where adding an object-oriented representation to every header out there would require a lot of work. Not sure if it would justify the gains.
> It could change to a binary format tomorrow and - so long as our tools preserved their interfaces - we wouldn't even notice.
Until you try to use grep or something of that sort for some non-trivial analysis operation. Everything 'speaks' ASCII. Custom tools for binary format would take years to evolve to be as powerful as generic text tools.
All you need is one parsing tool that produces a textual representation of the binary protocol, and you can once again use grep and friends.
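Something like this Python sketch, for instance. The frame layout (1-byte type, 4-byte stream id, 2-byte length) is invented for the example and is not the HTTP/2.0 draft's format; the point is only that one small dumper gets you back to line-oriented text.

```python
import struct

# Hypothetical frame layout, invented for this sketch:
# 1-byte type, 4-byte stream id, 2-byte payload length, then the payload.
FRAME_HEADER = struct.Struct("!BIH")
FRAME_TYPES = {0: "DATA", 1: "HEADERS"}


def encode_frame(frame_type: int, stream_id: int, payload: bytes) -> bytes:
    return FRAME_HEADER.pack(frame_type, stream_id, len(payload)) + payload


def dump_frames(data: bytes):
    """Yield one greppable text line per binary frame."""
    offset = 0
    while offset < len(data):
        frame_type, stream_id, length = FRAME_HEADER.unpack_from(data, offset)
        offset += FRAME_HEADER.size
        payload = data[offset:offset + length]
        offset += length
        name = FRAME_TYPES.get(frame_type, f"UNKNOWN({frame_type})")
        yield f"{name} stream={stream_id} len={length} payload={payload!r}"


if __name__ == "__main__":
    wire = encode_frame(1, 1, b"host: example.com") + encode_frame(0, 1, b"hello")
    for line in dump_frames(wire):
        print(line)
```

Pipe that output into grep, sed, awk or whatever you already know. The catch raised elsewhere in the thread still applies: the dumper has to exist on the box you are debugging.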
Most of the tools I've used represent the header as a hash/dictionary. I fail to see how that approach "requires a lot of work".
> Until you try to use grep or something of that sort for some non-trivial analysis operation.
You're arguing from the assumption that a binary protocol would be implemented by idiots. Custom binary tools can always emit a textual representation, at which point you can grep through it to your heart's content. This is the exact same problem that we've been solving with compilers for generations. It isn't nearly as insurmountable as you seem to believe.
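As a rough illustration of the "emit a textual representation, then grep it" idea, here is a minimal Python sketch. The frame layout (2-byte length, 1-byte type, 4-byte stream id, all big-endian) and the name dump_frames are invented for this example; they are not taken from SPDY or any HTTP/2.0 draft.

```python
import struct

# Hypothetical frame layout, invented for this sketch:
#   2 bytes  payload length (unsigned, big-endian)
#   1 byte   frame type
#   4 bytes  stream id (unsigned, big-endian)
#   N bytes  payload
HEADER = struct.Struct("!HBI")


def dump_frames(data):
    """Yield one grep-friendly text line per complete frame header in data."""
    offset = 0
    while offset + HEADER.size <= len(data):
        length, ftype, stream = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        yield "stream=%d type=%d len=%d payload=%r" % (stream, ftype, length, payload)
        offset = start + length


if __name__ == "__main__":
    # Two example frames on the wire, dumped as text.
    wire = HEADER.pack(5, 1, 3) + b"hello" + HEADER.pack(2, 0, 7) + b"ok"
    for line in dump_frames(wire):
        print(line)
```

Piping that output through grep works like any other text, though it only helps once a dumper like this exists for the protocol in question, which is the cost the other side of the thread is pointing at.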
HTML isn't XML-based; there is an XML-based relative of HTML (XHTML) which was originally (before HTML5) viewed as a potential successor to HTML, and with HTML5 there is an available XML-based serialization of HTML's semantics, but HTML is its own thing (prior to HTML5, HTML was SGML-based; XML was inspired by HTML rather than serving as the basis for it).
Of course, you can say TCP/IP is still binary, and it is true. But TCP/IP tools are built into every OS in existence now, so they do not form a real entrance barrier. Would HTTP tools be in the same position? I'm not sure - most HTTP tools right now are not standard and do not cover even HTTP/1.1 completely, so what reason is there to expect they'd cover the whole 2.0 protocol properly and be as widely standard and available as TCP/IP tools are? Which means a much higher barrier to entry.
That is always my stance on things - if one computer is going to run something, write it in Python, make it bleed memory, just make it work. If it is going to run on a million, you have to consider the raw power waste of inefficient programming. If it is going to run on trillions of devices for decades, your choices are few in my mind.
>>> If it is going to run on trillions of devices for decades, your choices are few in my mind.
History suggests otherwise - the majority of mass-produced software is not written with performance as an ultimate concern. You would find a lot of software written in languages like Python or Java, even though using C or assembly would probably produce better performance. But using C or assembly, that software would probably never be produced, because its complexity would be harder to manage.
Of course, performance does matter - even writing in Python, you have to worry about performance. But here we effectively see an argument saying "since we have a lot of software in Python, if we switch it to C we'll have massive performance gains". I think it is a wrong line of argument, if we switch to C a lot of this software wouldn't be written. (Note it's not against Python or C - I use both and they both are great in their areas :)
So I guess an optimized protocol does have its uses for high-volume websites - but I am concerned its advantages would be offset by its complexity. The designation of it as HTTP/2.0 implies it is the next version of HTTP - but it's a rather different thing with a different use case. I'd rather have it as a separate protocol for high-traffic websites.
It's pretty annoying to see this kind of thinking. The reason why everyone codes in JS and uses HTML, CSS is because it's ASCII. It's easy to understand, hack, etc. Same reason Python is so popular. Even Go is pretty simple like that. Sure, it's languages vs protocols, but the reasoning is exactly the same.
And in fact, the comparison works with protocols as well:
SMTP, IMAP, HTTP, IRC are EXTREMELY easy to understand and code for.
Binary protocols are a huge PITA to code for. The argument that you're going to use a lib or whatever tool just doesn't hold any water. You want to understand what exactly happens.
That's how everyone learns, etc. I could write my own SMTP, IRC clients when I was 10. I could understand it. It works. No way I could fully understand the documented binary protocols. I tried, and it was just too painful and not fun at all (hey, I was 10).
I'm not certain the added performance of using a binary format and some of the other advantages are really good enough to make the world unable to understand what's going on anymore by just looking at it.
Sure, purely technically speaking, it sounds like "binary is the way to go" for pure performance.
But if you think about it, making hacking around that stuff a niche thing is perhaps a much greater loss. Even the reliability of a binary protocol is VERY arguable.
In fact I'll put a last comparison. Shell pipes and ASCII. Many tried to replace them with smart binary protocols, objects, etc. It's cool. It's more powerful. More efficient. At the end of the day tho, a quick hack with regular pipes transferring ASCII is just easier to understand and we all use those - not the fancy binary objects.
Why aren't we using binary file formats anymore?
XML is more easily debugged when the machine tools don't work.
It's also more accessible, more easily created and modified, and thus, more available to a wider range of people than just web design professionals.
First, OSI layer 6 called and it wants its old job back. It sat around connecting layers 5 and 7 peacefully since the dawn of the ARPANET, all while the HTTPbis guys were passing around messages back and forth trying to obsolete it.
Second, you can't type a binary protocol. Yet, you have to somehow make the server work without a client or vice versa (for the initial implementation). That's going to be a lot more difficult. With HTTP, you could literally hook up a teletype to the Internet, let it print incoming requests and type the response back to the user agent. I've done this occasionally on a terminal emulator for debugging purposes.
Third, the Internet is big-endian while most common processors in use today are little-endian. This is going to haunt people's lives forever because you have to continuously convert between the two and although the conversions are orthogonal, the methods aren't idempotent (as opposed to converting a string to ASCII, or a text buffer to DOS style line endings).
You mention 32-bit identifiers as opposed to a string of digits. This is more error-prone than you think, two's complement isn't the only integer representation out there. Implementations written in C would have to deal with their underlying architecture as the standard allows for 3 different representations (so the compiler wouldn't help you out). Then there's signed and unsigned, either of which might not be available in the implementer's programming language. You end up unpacking the identifier by hand, which may end up being slower than just looping through a string. ASCII is hardly an inefficient serialisation format.
Fourth, any fixed-length records are going to be useless at some point in the future. Several versions later the fixed length records are going to either point to an extra set of records tailing the 2.0 records or will simply have a (designated) backwards compatible value for consumption by older peers. With HTTP, we can add a header anywhere in the request or response except for the very top. We can even shuffle them at will without adverse effects on peers.
Fifth, it doesn't make sense to optimise a tiny fraction of the entire HTTP session. Any benefits are too small to be worthwhile and would therefore result in a net-negative to most implementers.
Sixth, you can still make improvements to HTTP without moving to a binary protocol. Not sending the same headers on every single request isn't one of them. HTTP is essentially a stateless protocol and every request could be handled by a different server. You can architect clusters of servers routing incoming requests however you please and satisfy every one of them correctly and efficiently. For starters, you can replace any underlying protocol in the stack with a more cluster-friendly protocol in transit.
Seventh, just because no one is using a given content type in HTTP (I think you were referring to multipart/related) doesn't mean the protocol used to transfer that content is bad. Heck, it's not even part of any HTTP standard.
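The byte-order point in the third item can be seen in a few lines of Python. This is only a sketch: the identifier value and the helper names swap32 and parse_decimal are made up for illustration. It shows that the same 32-bit value has two byte layouts, that applying the swap twice undoes it (so converting "once too often" silently corrupts the value), and that a string of ASCII digits has no byte order to get wrong.

```python
import struct

ident = 0x01020304  # arbitrary example identifier

big = struct.pack(">I", ident)     # big-endian ("network order") bytes
little = struct.pack("<I", ident)  # little-endian bytes, as on x86


def swap32(value):
    """Reverse the byte order of an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


def parse_decimal(digits):
    """Parse a bytes string of ASCII digits; no byte order is involved."""
    value = 0
    for byte in digits:
        if not 48 <= byte <= 57:  # ord("0") .. ord("9")
            raise ValueError("not a digit: %r" % byte)
        value = value * 10 + (byte - 48)
    return value


if __name__ == "__main__":
    print(big.hex(), little.hex())
    print(swap32(ident))                        # a different number
    print(swap32(swap32(ident)) == ident)       # swapping twice undoes the swap
    print(parse_decimal(b"16909060") == ident)  # same value from ASCII digits
```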
2. Amazingly, people have made binary protocols work before, in spite of no preexisting implementation, so it's not impossible. I'm sure we will be able to meet the challenge.
3. Do not try to sell me the endianness issue. I have written multi-arch TCP/IP stacks and I'm not a CS major. Trust me, it will be okay.
4. Yes, and IPv6 address space will someday expire. But not soon. And as many fixed-length frame protocols have done in the past, you leave an "extra frame options" bit to stack more fields on. It's fine.
5. It's really not about optimization at all. It's about common sense. The computer works better when you talk to it in computer-speak, and we gain absolutely nothing by talking to it in English human-speak. The benefits are a net-positive because parsing is easier, because a computer is parsing it, not a human. There is no sane argument that can validly claim that parsing human-readable English is easier for a computer than fixed-length bitstrings. CPUs don't grok ASCII, they grok BINARY.
6. Modern designs for clusters of web applications route by session, not by individual request. You are session-oriented instead of connection-oriented, though in practice it's almost the same thing. And see previous comment on why adding onto HTTP willy-nilly is just a hack.
7. No, just the jgc's re-implementation of multipart is bad, for previously stated reasons.
In the end, it is formats that people can understand that win the day. You can't just write that off as if it has no value. It plays out in technical ways: all the CORBA implementations ended up having very poor interoperability partly because they were hard to debug. Nobody could actually look at a CORBA exchange and see what was wrong with it.
HTTP is not comparable because it never really changes. It's a fixed format. And frankly the majority of developers never need to go down to that level anyway.
(Caveat, I'm back-of-the-enveloping this on my phone about to go to sleep.)
1. It definitely is an argument and in fact my main argument. At least explain how this would be any less valid than your "it's a hack" and (bandwagon) "wave of the future" arguments.
2. My turn to invoke "not an argument". Just because one can simply copy a struct over a socket doesn't mean it's a good idea to do so. Especially in light of the flourishing culture of diversity we have on the Internet.
3. You conveniently choose to ignore one half of the argument, but miss it entirely. The point is not that we can't overcome endianness mismatch, it's that we shouldn't have to. At least not inside layer 7.
4. Except the old records will have to remain there forever. HTTP implementations dropped the Pragma header a long time ago and today we can simply pretend it was never there.
5. When it comes to common sense, ASCII is right there. That's because a protocol on the Internet needs to be interoperable with many systems. Sure, all of those systems use binary one way or another. But human operators are still going to have to program those systems and ASCII is a useful representation which enables us to do just that. Furthermore, the draft proposes to encode binary headers in base64 in order to transfer them in an HTTP/1.1 upgrade request. Now you have 3 ways of transferring HTTP headers instead of just one and we'll have to support all of them in any case. This might seem trivial to you, but it's a problem for servers and quite a huge one for intermediaries (proxies).
6. Amending HTTP with a new header again is much less a hack than providing a way to switch to a binary protocol and resume communications from there. Your buggy argument doesn't stand, for HTTP is not the car. It's the pavement upon which old buggies can ride along just fine until it's no longer considered safe amongst the faster carriages.
7. Yes, I'm not convinced client-provided request identifiers are the way forward myself. Though I would consider the proposal a better starting point for discussions than the current HTTP/2.0 draft because it leverages existing mechanisms better.
2. My argument isn't "just because you can", it's "you can." You seemed to be saying it would be difficult if not impossible. I was saying, no it isn't.
3. Endianness will always be an issue, forever. The only time it will go away is when every architecture picks one format. It's a really simple operation and it's part of how computers expect us to behave due to their nature and design. Hacking around it doesn't make it disappear, nor does it help anything.
4. What old records? Pragma was deprecated in 1.1 yet included anyway for god knows why. There's no reason they should do so again, but if they do, it will exist both in text and non-text versions. This is a non-issue.
> a protocol on the Internet needs to interoperable with many systems
> But human operators are still going to have to program those systems
> and ASCII is a useful representation which enables us to do just that
> it's a problem to servers and quite a huge one at that for intermediaries
6. Are you comparing extending HTTP/1.1 for a single feature to the backwards-compatibility support of HTTP/2.0? Because that makes no sense. The vehicle analogy is just weird at this point.
7. See, this is where the vehicle analogy works again. "leveraging existing mechanisms". In other words, let's throw one more feature on top. It never ends, because all you have to do is keep adding more lines, and modify the browser, and modify either the server or your web app, and keep going to support god knows what. At some point they'll implement an incredibly complex binary protocol and embed it in base64-encoded ASCII HTTP/1.1 headers, because "leveraging existing technologies" is thought of as a neat thing to do. It will also be insane. At some point you need to just make a better <whatever> instead of hacking and hacking and hacking onto it to make it do what you want.
Like building the great pyramid of Giza out of tinker toys. Sure, it's easier for people to use tinker toys. It's easy to understand. You don't have to do any real work. And it's also not meant for that task. At some point you need to throw out the toys and use stone.
I can even go further. ASCII is too old to use. Really, it's been antiquated by UTF-8. It is telegraphic codes for teleprinters. And ASCII itself was micro-optimized to only be 7 bits, and the 8th bit was used as a parity bit because perforated tape had space for 8. ASCII is so antiquated (1960) that nobody should be using it anymore.
Clearly we need to implement HTTP/2.0 in UTF-8 wide characters, so connections to China, Japan and India will support their native language in the protocol. (After all, what's the point of a native-language protocol if only English speakers can read it?) Also, we should include the byte order mark at the beginning of all messages so we don't have to worry about how endianness works.
No need to wait: http://tools.ietf.org/id/draft-ietf-httpbis-header-compressi...
Just look at that and weep. That whole document deals with how to represent HTTP headers. It doesn't define them, their behaviour and how they should interact. No. This multi-page document merely documents how these headers should be represented.
You know, things which up until now have been:
Lines of text with key-value-pairs delimited by a colon-sign.
Obviously this HTTP2 binary monstrosity is being done all in the holy Google-name of micro-optimizing performance.
This is terrible design and quite literally obfuscation more than anything else. I cannot believe the IETF is even considering this junk.
Edit: Link to an IETF discussion on the subject: http://www.w3.org/Search/Mail/Public/search?keywords=&hdr-1-...
2. I don't disagree it's easy to come up with a binary protocol, taking a short cut is always easier. Just like it's easier not to write a test harness with full coverage for a software project, that's entirely up to you. When a regression causes havoc down the road before you realise what's going on, well, rather you than me.
3. You're defending a regression, as of right now it's a non-issue. And are you really calling ASCII a hack around endianness issues? On what planet?
4. Records that are going to be deprecated down the road, which I think is fair to consider inevitable. All I'm saying is it's been a problem in binary protocols before, so let's not do that. You don't see this as a problem at all, so I'll digress.
You mean like IP, ICMP, TCP and UDP?
Yes, these are built into the operating system. Once you start using e.g. netcat (who hasn't piped tar into netcat for a quick backup?), all of that becomes transparent.
No, my argument isn't that ASCII is ipso facto easier to implement. It's that it's easier to test, debug and always see exactly what's going on over the wire.
If the alternative to a text-based frankenstein format is its binary-bastard child, I'll have the former thank you.
6. No, I'm telling you to think of HTTP as a conveyor rather than a payload. There's a difference, and that's why the vehicle analogy is weird.
7. So you're proposing that every N years we create an entirely new HTTP and upgrade to that? At what point will the streaming pile of upgrade requests yield a noticeable reduction in performance?
Also, UTF-8 characters are not "wide", they're variable length but not wide as in multibyte encodings. Then you go on to suggest we use a BOM at the start of every (UTF-8, mind you) message, I'll leave it up to yourself to let that sink in. You even spelled it out.
Many of the complaints about HTTP are really complaints about MIME messaging, which the entire internet is really built on (the standards, anyway) and which has run pretty smoothly for a very long time. Improving HTTP by addendum, like SPDY, is a better idea. Or possibly transporting it better over streamed protocols like SCTP: http://tools.ietf.org/html/rfc6525 with no need to modify the packaging/messaging format.
MIME/HTTP/HTTPS are very flexible, and if you want binary it can be added in, as it has been with multipart; EDI/HTTP/AS2 and other RFCs use this. Multipart isn't used as much because it is more problematic (it is used heavily in email and custom protocols), so making the whole spec that way would be bad overall. The point about the OSI layers is key: let's not revert to binary + base64 everything just to get data across the wire. You can put anything in there, but basing it on human-readable text is always a good idea. That is really what this whole layer is about. A move to binary pushes us back to the days of non-standard blobs, problems that HTTP messaging, then content as XML, then JSON solved by standardizing readable exchange of data. When you are exchanging data in a standard way, it should be very basic, to minimize problems rather than compound them. Throwing out all of MIME just to speed up HTTP, when faster protocols exist for other needs (real-time, attaching files, streaming, etc.), is a bad idea. Also, moving from HTTP < 1.0 to 1.0 to 1.1 caused many problems; unless this adds considerable benefit, changing it just adds more.
That's a very 1970s way to develop a new protocol.
These days you specify the protocol in, say, an XML- or JSON-based file format, and then run a code generator to produce client and server libraries directly from the spec. This has the advantage that the implementation is derived directly from the specification, so there is little room for ambiguity.
Wayland is one example in the open-source world of where this is done, but I've seen the technique used in proprietary shops as well.
The point is, the ability to "type" a protocol is irrelevant to how modern distributed software gets developed. Maybe it mattered in the days when comms were at 300 baud, machines had kilobytes or perhaps megabytes of core, and the Mark I eyeball was the best way to debug machine-to-machine comms, but these days we have tools that can decipher binary wire protocols for us. Performance and adaptability are far more important now than human readability. That war has been lost.
1960s-style arbitrary field size limitations: the wave of the future! No doubt any day now we'll reorganize the internet around shipping punched-card images around, too. We could call the project the "Because It's Time Network", or BITNET.
> Of course, most people would argue that you don't save enough time or bandwidth or cpu cycles to make a difference, but it would make for easier-to-write parsers.
Most people would argue that, yes; famously, Kernighan and Plauger did argue that in The Elements of Programming Style in 1974. By "easier-to-write parsers", you mean easier than dict(((key.lower(), value.lstrip()) for key, value in (line.split(':', 1) for line in header.split('\r\n'))))? Because I think that's going to be a pretty tough bar to fit under. (Yeah, I know you need another couple of lines of code if you're going to handle indented continuation lines, but you can get rid of those without returning your protocol design to the Summer of Love. You could also get rid of the .lower() and the .lstrip() while you're at it.)
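For readers who find the one-liner dense, here is a spelled-out sketch of the same idea in Python, with the "couple of extra lines" for indented continuation lines included. The function name parse_headers is made up for this example, and it deliberately ignores everything else a real HTTP parser has to care about (repeated headers, size limits, bare LF line endings); a line with no colon raises ValueError.

```python
def parse_headers(header):
    """Parse CRLF-delimited 'Name: value' lines into a dict keyed by lower-cased name."""
    headers = {}
    last = None
    for line in header.split("\r\n"):
        if not line:
            continue  # skip the blank line that ends the header block
        if line[0] in " \t" and last is not None:
            # Indented continuation line: fold it into the previous value.
            headers[last] += " " + line.strip()
            continue
        key, value = line.split(":", 1)
        last = key.lower()
        headers[last] = value.strip()
    return headers


if __name__ == "__main__":
    raw = ("Host: news.ycombinator.com\r\n"
           "Accept: text/html,\r\n"
           " application/xml\r\n"
           "\r\n")
    print(parse_headers(raw))
```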
> Add to this that every header could have a 32-bit identifier (4 bytes)
Padding out sub-byte-sized values to fill out fixed-width fields: the Intelligent Man's Approach to Saving Bandwidth! Or you could just use one-letter names in an ASCII protocol.
> [Fixed-length fields] solves crazy problems like header injection and request integrity checking.
Clearly we've never had parsing bugs in binary protocols full of fixed-width fields, now have we? Surely not bugs that produced security holes? Except maybe TCP, IP, and DNS. And X.400, and X.500, and X.509, and some of those were the fault of ASN.1 BER and DER, which are hardly fixed-width formats. And surely silently truncating a value to put it into a fixed-width field would never change its semantics, right?
> as to a general question "why use binary when we can compress stuff", what is smaller: a string of bits, or a compressed string of digits?
Well, let's see. How many files do I have here?
$ find | wc
5946 17074 330147
$ find -printf %.10A@%.10T@%i | gzip -9c | wc -c
$ find -printf %.10A@\ %.10T@\ %i\\n | head -3
1373396230 1373338685 2359306
1372967497 1372967489 5246218
1365458166 1365458157 5248264
$ find -printf %.10A@\ %.10T@\ %i\\n | gzip -9c | wc -c
$ dd </dev/urandom bs=1024 count=1 |
od -w1024 -l | tr -s ' ' ' ' |
gzip -9c | wc -c
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.000622635 s, 1.6 MB/s
But you know what? They do go away if you tunnel over a deflate-compressed VPN, or if you increase the available bandwidth by a few percent. Maybe instead of trying to take us back to bug-prone 1960s designs you should be working on that.
2. Trying to fit an HTTP field parser into one line is not the way to win a programming argument.
3. One ASCII character names? 255 (edit: 37 alphanumeric) possible headers, of which only a few might correlate to what you'd expect? You're trolling me now.
4. Yes, software bugs happen! It's crazy I know. But let's go ahead and assume that the security holes that still plague applications today due to design flaws are not the same as a couple off-by-one bugs decades ago.
5. By virtue of the algorithm, compression works better the more you have of the same thing. You won't have 70KB of headers to compress at once; more like 400 bytes. The compression of individual header groups each time will not benefit from the previous data's compression, as TLS or SPDY might do. The eventual overhead would not only be larger than a bitstream but take more CPU to decode.
6. Not only are they inefficient, they add complication to the parsing of the protocol, which is one more thing an application can mess up. Not only is it slower, it's more prone to errors. A VPN does not fix that. Perhaps you should focus on real solutions to real problems versus quoting what decade network application programming began in.
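The claim in point 5, that small header blocks compressed one at a time can't benefit from what was sent before, is easy to check with Python's zlib module. This is a sketch only; the header block (including the made-up User-Agent string) is just example data. It compares compressing one block on its own against sending the same block a second time through a single long-lived deflate stream, which is roughly the shared-context situation the comment attributes to TLS or SPDY.

```python
import zlib

# Example header block; the values are illustrative only.
headers = (b"GET / HTTP/1.1\r\n"
           b"Host: news.ycombinator.com\r\n"
           b"User-Agent: ExampleBrowser/1.0\r\n"
           b"Accept: text/html\r\n"
           b"Accept-Encoding: gzip\r\n"
           b"\r\n")

# Case 1: every block is compressed independently, with no shared history.
standalone = len(zlib.compress(headers, 9))

# Case 2: one persistent stream; later blocks can refer back to earlier ones.
stream = zlib.compressobj(9)
first = stream.compress(headers) + stream.flush(zlib.Z_SYNC_FLUSH)
second = stream.compress(headers) + stream.flush(zlib.Z_SYNC_FLUSH)

if __name__ == "__main__":
    print("raw bytes:            ", len(headers))
    print("standalone compressed:", standalone)
    print("first in stream:      ", len(first))
    print("repeat in stream:     ", len(second))
```

The repeated block shrinks to a handful of bytes only in the persistent-stream case. That supports the narrow point about per-block compression, and also suggests the gap closes without changing the header syntax, provided both ends keep compression state across requests.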
I wasn't suggesting encoding "Host: news.ycombinator.com\r\n" as "Hnews.ycombinator.com\r\n" but as "H:news.ycombinator.com\r\n". As long as you keep the colon, you can still use long names for other headers.
> Trying to fit an HTTP field parser into one line is not the way to win a programming argument.
You said parsing would be "simpler". It's going to be hard to get simpler than something that you can fit into one line.
> Perhaps you should focus on real solutions to real problems versus quoting what decade network application programming began in.
Well, that's kind of what I'm saying: let's focus on solving real problems, instead of recreating new ones that we'd already solved decades ago, in order to "solve" non-problems like HTTP header encoding.
> It's going to be hard to get simpler than something that you can fit into one line.
This is a terrible argument, as you can fit anything onto one line if you string it along enough. But here's one example of something simpler:
strncpy( frame_struct, buffer, sizeof(frame_struct) );
And I'm not proposing we merely solve the problems of HTTP. That would make too much sense; people are much more willing to put up with bullshit than do the hard work to make things work correctly. I was proposing we make things work better, simpler, and more reliable, and throw away the completely nonsense argument that using netcat once a year is worth a ridiculously impractical protocol format. But whatever, it's not like this thread will amount to a hill of beans.
So let's just convert all of HTTP into XML.
And I don't really think "H:news.ycombinator.com" is quite as illegible as your suggested 32-bit integer space — which, by the way, is small enough that you'll probably need a central registry to prevent header name conflicts — and it also occupies only two bytes instead of four for the header type. So, from my point of view, the "completely illogical" thing is to go from, "The header names currently in HTTP are too long!" to "Therefore let's replace them with 32-bit integers in a binary protocol" instead of "Therefore let's shorten the header names in HTTP", which solves the problem more thoroughly and with less collateral damage.
And what is this about "if you string it along enough"? We're talking about a parser (for RFC-822 headers without continuation lines) that fits into 110 characters, here, without the least obfuscation. Less than a Tweet. In fact, I just Tweeted it. And it worked on the first try.
> the completely nonsense argument that using netcat once a year is worth a ridiculously impractical protocol format.
You know, we did kind of try binary protocols already: the whole IPX stack, CIFS, X.everything, SNMP, TFTP, ICB, Sun RPC and thus NFS and NIS, and so on. A few survive in common use: DNS, TCP, IP, ICMP, SSL, SSH, BGP, and to some extent, SNMP. And there are lots of them working fine inside of particular companies, rather than between implementations by different vendors. But for the most part, they've been replaced with textual protocols, despite the lower efficiency and in many cases the first-mover advantage: HTTP, SMTP, and IRC, and previously FTP, Gopher, and Finger. You seem to be arguing that was an accident, or a mistake. It's not.
Why do you find HTTP impractical? There seems to be a lot of evidence to the contrary.
This was mentioned once in the IETF discussion, before someone said "but uhm, SPDY is binary, and we have data from SPDY, and yeah".
After that everyone was too busy discussing how horribly unstrict and ambiguous text formats can be, before jumping off to a 20-email discussion about which endianness should be preferred and how the clients and server should determine which one to use, or maybe clients should support both kinds of endianness.
All without a hint of irony. It's like a Bizarro world IETF discussion.
What are you talking about? You could have had something on your mind, but this is a terrible terrible juxtaposition of words...
I disagree. I think HTTP should have been a simple binary protocol from the start and HTML should have required compilation into a binary format.
How much work would it really have been? htmlcmp foo.html providing foo.bhtml. No whitespace, no end tags, one or two byte tags, etc. Strictness in the reference HTML compiler implementation could have saved the web from all the stuff outside the actual standard that browsers (and other tooling) now have to support (so they don't "break" the web).
I'm not suggesting anything as crazy as the flash binary format (I wrote a Java flash player once...), but when I started to write things like proxy servers and HTML minifiers I was blown away by the extreme inefficiency of HTTP/HTML.
This is a step in the "right" direction IMHO.
Just as an example I started playing around with HTML when I was about 12 (in 1998), it was easy and I got instant results. A year or so later I tried to learn Perl and quickly gave up because I couldn't get my first script to run. It was another year before I tried to "program" again and became hooked.
HTML and HTTP were successful because they were incredibly easy to understand. I wish we could keep it that way.
Back in the 1990s, Ron Rivest came up with canonical S-expressions, which are fully capable of representing the same information represented by HTML, XML, ASN.1—but can be either human-readable or binary.
Here's a simple example: (p (* class (footer x-treme)) "This is a " (b "footer") "."). Very human-readable, very human-editable, and easily machine-readable, wouldn't you agree?
As a binary format, it would be (1:p(1:*5:class(6:footer7:x-treme))10:This is a (1:b6:footer)1:.). Still geek-manipulable, if necessary, and extraordinarily simple to parse. And it has the advantage that it is a distinguished encoding--any of the myriad human-readable encodings all reduce to the same canonical encoding, which has advantages for hashing.
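To back up the "extraordinarily simple to parse" claim, here is a minimal Python sketch of the length-prefixed canonical form shown above. It covers only what the example uses (atoms written as length, colon, bytes; lists in parentheses) and is not a full implementation of Rivest's draft; the function names canonical and parse, and the choice of nested Python lists as the in-memory form, are this example's own.

```python
def canonical(node):
    """Encode nested lists of str/bytes atoms as a canonical S-expression."""
    if isinstance(node, (list, tuple)):
        return b"(" + b"".join(canonical(child) for child in node) + b")"
    atom = node.encode("utf-8") if isinstance(node, str) else bytes(node)
    return str(len(atom)).encode("ascii") + b":" + atom


def parse(data, pos=0):
    """Decode one expression starting at pos; return (node, next_pos)."""
    if data[pos:pos + 1] == b"(":
        pos += 1
        items = []
        while data[pos:pos + 1] != b")":
            if pos >= len(data):
                raise ValueError("unterminated list")
            item, pos = parse(data, pos)
            items.append(item)
        return items, pos + 1  # skip the closing parenthesis
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


if __name__ == "__main__":
    tree = ["p", ["*", "class", ["footer", "x-treme"]],
            "This is a ", ["b", "footer"], "."]
    wire = canonical(tree)
    print(wire.decode("ascii"))
    print(parse(wire)[0])
```

Because each atom carries its length, the parser never scans for delimiters or escapes inside the payload, which is where the parsing simplicity comes from.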
Perl doesn't tend to have an explicit compilation step. You could write an HTML compiler to be as permissive as HTML parsers and you wouldn't have the same frustration you had trying to get started with Perl.
If anything, HTTP/REST was much more complex. HTML was somewhat mystic yet allowed you to move things around easier, and actually design things vs just presenting them. And hyperlinks were really cool.
The web is big enough today that we can afford to make it more efficient since we have so many professionals that don't require the kind of implicit hand-holding provided by the original implementations. Back then everyone was a newbie and there were no web professionals.
I hate the idea that you would have to be a "web professional" in order to get started today. We should embrace and keep the culture the web was built on.
All that said, I think peterwwillis is right: http/X.0, as long as it's well-simplified, is better in binary than it is in text. Ideally, there's a bijection between your text-mode and binary-mode (like a lens), where it's easy to parse (rely on your toolset to do the translation back and forth) and easy to put on the wire. Forth is a good example of how to do it sanely.
I would put those 11- and 12-year-olds down under "motivated independent learner".
If your concern is a binary HTTP, too bad, it's already here; it's called HTTPS. It doesn't seem to be holding anyone back.
If your concern is a binary HTML/compiler for it, I don't buy that either. I was using a compiler when I was 8 years old and didn't have a problem with it conceptually.
Just because we have more 'professionals' today doesn't mean everyone is a professional.
Like with gaming, each game should be simple to start but deeper to master. That is what this layer is all about; lower down the OSI stack it is much more of a wall to beginners. Never lock out beginners, as they can be better masters with time; don't hide the entrance to the labyrinth. Leave things approachable but, with more professional experience, modifiable to perform better. We have all that now; a competing binary protocol will never see the innovation that a simpler one like HTTP will. Higher up the stack you can see how this was better for exchanging data in standard ways, from old-school binary blobs, to CSV files, to XML files, to JSON. Same reason REST services won over SOAP: simplify... There is a binary JSON format, BSON, in Mongo, and also MessagePack, but guess which is used more for services and exchanging data, the textual one or the binary ones? The binary formats work well in certain situations where both endpoints are controlled by the same entity.
Binary and more locked down/optimized formats and messaging have their place but the start/base should always focus on simplicity over optimization.
The general rule in exchanging data via standards is: be liberal in what you accept, conservative in what you send. Being all binary all the time is a backwards step, since it is conservative in what is accepted. I also think it would lead to a host of difficult-to-debug problems, based on work I did on an AS2/HTTP RFC implementation: streaming, for one, and of course encoding/decoding, which can eat hours of work if you can't visually see the content at some level.
There's something beautiful about being able to teach someone how to write a basic HTML document in 20 minutes. My mother can easily understand what HTML is doing, I doubt she'd understand a compiler.
Your argument doesn't really apply at all to HTTP, though. No one "got their start" peeking at HTTP requests. It's solely the domain of those working on infrastructure (for some definition of that word). Anyone with any idea of what HTTP really is (ie more than "that thing at the start of a URL") should have no problem using a tool to convert between a binary and text representation of the protocol. It's not like you can just magically pull HTTP requests out of the ether, you need a tool anyway. There's no reason why curl (for example) couldn't transform a request you write into an equivalent binary protocol, or why it couldn't do the inverse operation when it receives the response. It's utterly ridiculous to me that there is such inefficiency in HTTP just to make things slightly easier on the implementers of curl and wireshark.
That is not a viable option.
If you're aiming for an existing ecosystem, then sure, there's no reason not to use HTTP, assuming you make use of established libraries. But widespread use is HTTP's only real virtue; the protocol is considerably more difficult to implement correctly than it should be.
A minimal HTTP server that recognizes GET requests, finds the url, throws away everything else, runs the relevant code and returns an HTML document of the results is actually really easy to write. And easy to integrate into an existing event loop.
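As a rough illustration of how little parsing that takes, here is a sketch in Python of the "find the url, throw away everything else" step. The function name `extract_get_path` is made up for this example, and it deliberately ignores headers, other methods, and the malformed-input cases a correct server would have to handle.

```python
from typing import Optional

def extract_get_path(request: bytes) -> Optional[str]:
    """Return the request target of a GET request, or None for anything else."""
    # Only the first line (the request line) is inspected; headers are ignored.
    request_line = request.split(b"\r\n", 1)[0]
    parts = request_line.split(b" ")
    # A request line has three space-separated parts: method, target, version.
    if len(parts) == 3 and parts[0] == b"GET":
        return parts[1].decode("ascii", errors="replace")
    return None

path = extract_get_path(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n")
```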
But even a minimal HTTP server, even one that ignores things like HEAD requests, is still going to be more complex than, say, receiving the URL raw, or even a URL wrapped in a simple structure like a netstring.
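For comparison, a netstring is just a decimal length, a colon, the payload, and a trailing comma. A minimal sketch of encoding and decoding one in Python follows; the function names are mine, and real code would also want a cap on the declared length.

```python
from typing import Tuple

def netstring_encode(payload: bytes) -> bytes:
    """Wrap payload as <decimal length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","

def netstring_decode(data: bytes) -> Tuple[bytes, bytes]:
    """Return (payload, remaining bytes) for the first netstring in data."""
    length_text, sep, rest = data.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("malformed netstring length")
    length = int(length_text)
    # The payload must be followed immediately by a comma.
    if len(rest) < length + 1 or rest[length:length + 1] != b",":
        raise ValueError("malformed netstring payload")
    return rest[:length], rest[length + 1:]

wrapped = netstring_encode(b"/index.html")
url, leftover = netstring_decode(wrapped)
```

There is no header folding, no case-insensitive field names, and no line-ending ambiguity to get wrong, which is the point being made about implementation complexity.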
Actually implementing a full, correct HTTP server would be one or two orders of magnitude more complex than implementing a more modern protocol from scratch.
There is actually a binary format that much of the web content is compiled into: gzip. It's remarkably effective.
I will grant that following Postel's Law means that browsers have more work to do to ensure that all kinds of "busted" stuff on the web continues to work, but I'd guess that, at this stage, that work is pretty small compared to everything else browsers are trying to do.
I'm sure you realize this, but gzip is a content encoding. If there was a binary HTML format, it could still be (and should be) gzipped (or better yet 7-zipped).
I did some playing around with this, and gzipped binary-encoded HTML ended up around 1/4 the size of gzipped minified HTML.
Gzip doesn't give you the other advantages of the hypothetical binary format, primarily: An extremely standardized and quickly parseable format.
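To illustrate the content-encoding point: gzip operates on opaque bytes, so it can be layered over a text body or a binary one, and it does nothing for parsing. A small sketch using Python's standard `gzip` module; the sample markup is invented for the example and says nothing about real-world ratios.

```python
import gzip

# Invented sample markup, repeated so there is redundancy for DEFLATE to exploit.
html = ("<ul>" + "<li class=\"item\">hello</li>" * 200 + "</ul>").encode("utf-8")

compressed = gzip.compress(html)

# gzip only sees bytes: the same call would work on any binary page format.
print(len(html), "bytes raw ->", len(compressed), "bytes gzipped")

# Decompression restores the body exactly; parsing the HTML is a separate step.
restored = gzip.decompress(compressed)
```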
This was for internal testing, so it was a custom implementation.
>my initial thought would be that the both would gzip to almost exactly the same size?
That is like saying you would expect a gzipped CSV file to be as small as a gzipped database engine file. This is of course not the case.
Actually, it would be more informative to point us at a "binary encoding", any binary encoding, that compresses smaller than the equivalent html text.
When you add in indexes or B-trees or whatever else database engines use to retrieve data quickly, database files can get quite big indeed.
What are the advantages of binary HTML vs plaintext HTML + gzip?
I've observed many binary format efforts. Various binary XML, BSON, binary VRML.
The only efforts that made any sense (penciled out) were binary geometry representations eg enabling mesh compression.
(PS- I hate tagged formats like XML and HTML with the passion of a billion burning suns.)
hacker@gibson:~$ htmcmp ./myPage.html
myPage.bhtml generated successfully.
hacker@gibson:~$ htmdmp ./myPage.bhtml
myPage.html dumped successfully.
If you really hate the idea of having to run this compiler, it could be automatically run by apache, IIS, nginx, etc when serving your page for the first time. This is all hypothetical of course since such a standard does not exist.
What's interesting is that what you describe:
>this compiler, it could be automatically run by apache, IIS, nginx, etc
...essentially already occurs in many cases in the form of compression such as gzip. These files are also automatically extracted by the browser and are essentially as transparent as non-compressed ones.
So I think you both have a point:
I agree with jakejake that the web must remain transparent.
I agree with you that as long as sufficient tooling is freely available, it doesn't matter how the underlying protocol works.
I.e., a binary representation of the page is still fairly transparent if there are plentiful tools that will deserialize it into a page object that can be expanded/perused/manipulated and then re-serialized when e.g., Ctrl+s is hit.
What would be bad is if 'View Source' showed something like:
...and you needed to spend umpteen hundreds to get a decompilation tool that only gave you an obfuscated/inexact reproduction of the recipe for the page.
Nothing will change on the front-end, just the servers will get faster and new bugs will be introduced. :)
Plain text is more open and accessible than binary, full stop.
Once pages become big and complex, this learning strategy becomes much harder. But in the early days of the web it was utterly essential.
It'll be no skin off my nose if the web turns into a compiled protocol. I don't know if it's even necessary to continue with plain text web sites these days. But, it most certainly was a major deal initially for me to just "view source" and see that it wasn't a black box of voodoo. The low barrier to entry is one of the major reasons the web took off.
I'll admit I'm an old timer in this business, probably ready to be taken out back and shot! So I have no idea if you youngsters were similarly inspired by viewing the source code on a web site? Maybe with all of the complex client side code and minimized scripts that it isn't even relevant anymore? It probably just looks like gobbledygook to a non-programmer these days.
I am used to working with compiled languages: 9 million lines of C++ on embedded custom hardware.
Now I know of course that it would not be that bad, but still, I don't want to compile more stuff, I want to compile less stuff. I'll let the VM do the compiling for me.
In the grand scheme of things these discussions hardly matter; it's just a 'cool' topic to fight about. With neither HTML nor HTTP is size a big issue; it is almost never the bottleneck.
1. Gzipping doesn't make the HTML syntax that must be parsed by browsers more regular and efficient to parse.
2. Gzipped minified HTML is 4x as large as gzipped binary encoded HTML when I tested it.
I may be a bit old-school, but I learned HTTP via telnet, a tool definitely not written to inspect HTTP, and I still use it when I’m trying to debug things. Not having to install tooling is still something I take advantage of.
Obviously binary formats can and do succeed, and with sufficient backing tools will be written and deployed. But if HTTP hadn’t been so easily inspect-able I don’t think it would have been nearly as successful the 1st time around, when the benefits of the protocol were less well known.
And I do think some of the "culture of the protocol" will be lost moving to binary°. It just isn’t as hacker-friendly or at least newbie-hacker-friendly and that sends a message.
° Of course this is already happening with HTTPS, so this is probably not a winnable fight. And the benefits of binary formats are significant, so it might not even be a fight I want to win. Still something is being lost here.
Text-based means everything is open for inspection and self-evident if well designed. This goes away with binary.
Not to mention this protocol will need to be implemented everywhere and used by everyone. This means everyone will need to understand it as well. Open text-based protocols support these requirements simply by being open and text-based.
As if there's not enough bad code around already, in any language not assembly/C/C++, the reduction in code quality and clarity, and amount of additional issues involved the second you change the word "text" to "binary" is staggering. Let's not go there if we don't have to.
Most debuggers today can dump strings fine. Dumping binary, while not impossible, brings extra hurdles. Putting hurdles into debugging, pretty much means you will end up with more buggy code.
As if that's not enough to make you think "hmmm", there's the issue of future-proofing and extending the protocol. That is much easier to do cleanly in a text-based format. A binary HTTP 2.0 will be brittle and short-lived.
> We don't examine raw bytes by hand
Because that is very, very impractical. And thus making that a de-facto necessity for working with HTTP 2.0 doesn't make much sense.
I say we should look at it the other way around: Considering everything a binary-format will cost us (just some of those mentioned above), what benefits does it bring us to justify this huge cost?
I say the answer to that is near none, and in an ideal world that would be the end of the discussion.
Not to mention wireshark...
You're not alone. I've opened an issue on the subject. Feel free to add weight to it.
Any time a bug report is linked in Hacker News the community ends up spamming the tracker with our enlightened opinions. Show some restraint.
OK. Fair enough, but I did look for an existing issue on the matter. There were none.
> You know this is a controversial topic, please act more maturely in the future.
Personally, I find attempting to lock down open internet-protocols by transforming them into some sort of binary obscurity rude and against everything the internet was built on.
I don't think everyone is aware of that happening right now, and I see nothing wrong with attempting to bring focus to the issue if that is what is needed. Seeing how this draft still outlines a binary protocol, this clearly is still in need of attention.
Just because some Google-heads somewhere decided that by completely ignoring everything the internet has taught us so far, and in the name of pre-mature optimization can shave off 2ms on their page-load doesn't mean the internet should pander to their interests.
If anything is controversial it is how this is all being done without any documentation showing us what benefits we get from the costs associated with a binary protocol. That is amazing. Completely and utterly amazing.
But OK. Let's say I listen to your guidance: Where would be a good place to raise this issue? Where would I take my "childish" issues to ensure they get the attention they so much deserve?
For most cases you will probably be able to get the old HTTP protocol anyway. The only potential issue would be for people who are implementing the HTTP 2.0 protocol itself, just about everyone else in the world will be using a webserver or at least a library.
In fact raising the bar to prevent people developing their own HTTP 2.0 implementations would probably be a good thing; it would cut down on the number of bug-riddled implementations.
In any case I don't think it makes sense to slow down the whole internet just for a few developers.
Binary could also see the possibility of hardware implementations. Think webserver on a chip.
Maybe you have a nice Ubuntu box and that's fine. I'm talking philosophically. Modern developers favor the performance and functionality advantages that large applications, deep APIs, and binary communication protocols provide over the transparency advantages that small, highly focused tools, orthogonal APIs, and human-readable communication protocols provide.
Aren't Mac OS X, iOS, and Android basically unix OSes?
I'd say Unix won.
And then Chuck Norris roundhouse-kicked him in the face.
Point being that if by Unix you mean an OS running a Unix-inspired kernel, then yes, Unix won. But if by Unix you mean the design patterns and philosophical principles that have long inhered to Unix development, then Unix is, if not dead, then on its last legs and soon to be discredited.
The "Unix philosophy" has long favored small, easily composable tools each with a specific purpose, orthogonal APIs which were as small as possible, and textual transmission formats which are easy for humans to read and write. There were exceptions (X, Emacs), but these exceptions tended to support and play well with the Unix philosophy even if they didn't 100% espouse it.
By contrast we may propose a Windows philosophy which favors large, do-everything applications over small tools (because that's how people are accustomed to using PCs starting from the DOS days when only one program could be up at a time), heavyweight frameworks and oftentimes entire inner platforms (because in a world where time-to-market is paramount, developers shouldn't have to think very hard to begin cranking out apps for the new technology of the week), and binary file formats (because the damn thing has to run in 640k, a text parser won't fit).
Look at the platforms you mentioned. Mac OS X, iOS, and Android are all app-centric, not tool-centric. You can treat Mac OS X as a Unix box if you want, but it's hard to do so with the other two. Furthermore, when you write an app for these platforms you are not targeting the Unix kernel or libraries but an inner platform built on top of them. Which brings us to binary file formats -- like HTTP 2.0.
The Windows philosophy has won.
If you look under the hood on OSX, iOS, or Android, they are all composed of smaller single purpose components. If you are arguing that they do not use interprocess communication to join these component together, then you are correct. However, that is not the Unix philosophy. A great example is Outlook vs OSX/iOS Mail/Calendar/Contacts/Notes. On OSX and iOS, those applications try to do one thing well. On Windows, Outlook tries to do everything.
Beyond that, just about every embedded device and the vast majority of servers now run Linux (or a Unix variant). Just look at a list of packages on those Linux devices and you will understand that it clearly is built around combining small single-purpose components.
Given that, it is hard to argue that the Windows philosophy has won. In fact, Windows seems to be struggling (as evidenced by slowing interest and reorg/rearch/rebrand thrashing in Redmond). Looking at the market, I'd argue that the Unix approach won quite some time ago.
Am I missing something?
Inventing a whole new protocol to overcome a TCP deficiency is IMO a terrible motive...
A new protocol that runs on UDP precisely to side-step slow TCP changes.
In short, don't expect to see this outside of Android or Chrome.
On the plus side, this is exactly what UDP was for in the first place.
I also get the impression that TLS SNI/Snap Start/False Start/Ludicrous Start were also hampered by browsers that used the Windows SSL stack, although I never understood why they would do that.
Guess what the designers chose.
I work in game development and application development and the former really loves binary formats (slowly changing away from that with server and editable needs). In many bugs or crashes the root of the problem is some offset in a binary file or some incorrect custom binary file format node that breaks everything after it. Readable, keyed and debuggable formats are so much better at their root (they can be binary, base64'd, compressed after but at the root they should be standard in some way and able to change without breaking the whole thing).
Their main reason for going with SPDY seems to be that rolling out an application-level protocol is easier, while rolling out a transport-level protocol would require changing routers and such. That is almost certainly true. Fixing the multiplexing problem in an HTTP-specific way rather than generally does seem like an unfortunate hack, but it's probably the pragmatic approach.
That's the way it's supposed to work; the horse is meant to drag the cart. The RFC database is littered with cart-led insurgencies that went nowhere. If binary HTTP is one of those, it'll join them as a historical curiosity.
HTTP/2.0 enables a server to pre-emptively send (or "push") multiple associated resources to a client in response to a single request. This feature becomes particularly helpful when the server knows the client will need to have those resources available in order to fully process the originally requested resource.
If someone asks for your HTML, might as well send them the CSS, JS, and images it is going to use. Would cut down on requests. Only thing more efficient than compression, is not having to ask in the first place.
And I just don't see why it should be rammed through. SPDY has a connection upgrade path, that afaik is largely unchanged for http2.0 anyways.
What demonstrable value? Google has never compared it to pipelining and when Microsoft did so they found it about the same and sometimes slower.
There is next to zero value in SPDY over pipelining across several connections. It's just complicated for no good reason.
I didn't read this super carefully, but this is basically Google's SPDY protocol.
This is obviously a critical mistake, given HTTPS stripping is possibly the biggest weakness in current web transport security.
In short: While TLS is a really good idea for just about every user-facing site on the web, there are many applications where some combination of the administrative burden, need for intermediaries and performance cost make TLS sufficiently undesirable to preclude adoption of a spec that required it.
Ah, so every HTTP 2.0 connection has PRISM in it.
We're also still stuck with DEFLATE, which is vastly inferior to modern compression formats like LZMA.
But while you're going to the trouble to push a new version of HTTP, why not put some sort of mandatory encryption in there? Make the browser generate its own RSA key when it's installed? Require pinned self-signed certificates at the absolute minimum? Hell, I'm not an expert, I just want the NSA to have to work harder.
HTTP already supports LZMA; it's just the browsers and web servers that don't. If a browser sent the header:
Accept-Encoding: compress, gzip, lzma
Accept-Encoding: compress, gzip
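The negotiation implied by those two headers can be sketched roughly as below. This is an assumption-heavy illustration: `lzma` is used as a coding token only because the comment above uses it, the names `ENCODERS` and `choose_encoding` are invented, and q-values are ignored entirely.

```python
import gzip
import lzma

# Hypothetical server-side table, listed in order of preference.
ENCODERS = {"lzma": lzma.compress, "gzip": gzip.compress}

def choose_encoding(accept_encoding: str):
    """Pick the first server-preferred coding the client listed, else None."""
    # Drop any ";q=..." parameters and normalize case and whitespace.
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    for name in ENCODERS:
        if name in offered:
            return name
    return None

new_browser = choose_encoding("compress, gzip, lzma")
old_browser = choose_encoding("compress, gzip")
body = ENCODERS[new_browser](b"<html>...</html>")
```

A client that never advertises the newer coding keeps getting gzip, so nothing breaks for existing browsers.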
DNSSEC could be used to distribute certificates: https://en.wikipedia.org/wiki/DNS-based_Authentication_of_Na...
Is it really worth the complication to save a few bytes on HTTP headers when you're just going to shovel out inefficient text data in most cases anyway?
I don't understand why they're trying to shove this down everybody's throat as a new version of HTTP.
There are a ton of ports besides 80. The better thing to do, IMO, would be to make a new protocol on some other port and add it to their browsers.
See, it allows the request for www.google.com, including all of Google's tracking cookies, to fit inside one TCP-frame, causing a 5ms improvement in load time, and Google has evidence that this means they make more money.
Never mind the open, exploratory nature of the internet and how text-based protocols were what made the internet into what it is today. We're going to throw away all that the internet has taught us about that because Google says we should.
This thing is fucked up beyond belief.
Edit: found a link to the IETF discussion about this:
It admits in plain sight that the only thing they care about with HTTP 2.0 is solving Google's massive-scale issues at the cost of everyone else:
> I finally admitted it was a dead end. At the moment the challenges consist in feeding requests as fast as possible over high latency connections and processing them as fast as possible on load balancers
A good, open, self-documenting protocol didn't suit Google, so let's throw it away. It's a "dead end", after all.
Jesus christ these people need a bitch-slapping.
More emails on the subject can be found here:
Only implementations of the final, published RFC can identify
themselves as "HTTP/2.0". Until such an RFC exists, implementations
MUST NOT identify themselves using "HTTP/2.0".
Further, it's less user-explorable, which further distances the web from access by the everyday tinkerer as opposed to the professional with lots of time to explore.
Those who don't learn history are doomed to repeat it.
I predict that this will lead to unexpected performance problems due to collisions in Tx window size between TCP windows and HTTP 2.0 frames.
That means, whenever you open a new TCP connection to a http2 server, you must start with HTTP/1.1 and perform the Upgrade negotiation anew.
Making the spec require HTTP2 support on all servers of a cluster if at least one server supports it would have been much better, performance-wise. The only reason I see why they kept it is transparent proxies (if your browser is using an explicit proxy, it should be able to figure out if the proxy supports HTTP2).
Either way, with time, we can change the browser to behave differently based on the number of servers supporting that protocol, the same as with HTTP/1.1.
One of the problems with HTTP 1.x is that it is stateless - every time you want something from a server, you treat it as if it is the first time you talk to it. I understand this is by design, but as a consequence we are transmitting all the same headers to the server all the time... which can be mitigated with HTTP2, except that we still need to do that (send all headers) and the HTTP2 upgrade, every time we open a new TCP socket to that server.
header($_SERVER['SERVER_PROTOCOL'] . ' 400 Bad Request');
header("Status: 400 Bad Request");
No surprise they've opted for a binary protocol rather than ASCII, anything to reduce transparency
Independent of the reasons behind doing so, I too have a problem with that. If you're interested, I've opened an issue on it.
Also, how is a binary protocol used to "reduce transparency"? It's simply a different way of encoding, and once this takes off there will be a vast set of tools for analyzing HTTP/2.0. The purpose of binary is to reduce the data transmitted.
The claim that Google had foreknowledge of PRISM, or worked with the NSA to build some kind of firehose for them is unsupportable given what we know, and is pure speculation on your part.
Thus, the claim of close cooperation with the NSA is frankly, ad hominem.
Given what's come out about the NSA, it's much more plausible to assume that Google is cooperating with them.
Much like IAFIS is a standard database operated by the FBI.