Oh wonderful, it appears to use SPDY-like header compression. Why don't we just create a new compression algorithm and predefine common HTML tags and words to improve compression of the response body too! But we shouldn't base it on any known compression scheme and only use it for the case where the content length is less than 220,498 bytes but more than 8,494 bytes to optimize the behavior of today's more common MTU settings for PPPoE in Scandinavian countries minus of course the most common size of compressed headers today. It will be particularly optimized for the kind of responses Google sends back to minimize their own server load and common adwords will of course be included in the predefined list of compressed tokens.
What happened to simple protocols? Seriously, per-hop flow control (which works best with which of the dozen versions of TCPv4 or TCPv6 flow control?)? TCPv4-like framing with weird limits (16383 bytes)? Keepalives/Ping? Truly ridiculous specialized compression for headers which ignores the role of HTTP proxies? QoS?
Why not just implement the whole protocol over a raw IP connection and stop pretending like we're operating in layer 4+? I get that multiplexing is difficult without flow control, but good lord does this thing look overdesigned for what few benefits it offers over HTTP/1.1.
Recapitulating a comment from downthread, but, look at DNS for an example of how the IETF botches compression in its "simple" protocols. Not that compression isn't fraught (look at TLS), but I see its use as a sign of maturity.
Answer: The Internet is still running on them, 30 years later.
Whenever I read something like "simplicity is hard", it makes me cringe. I hear that a lot, and I see evidence of gratuitous complexity everywhere I look these days. I'd hazard a guess the engineers behind SPDY would find simplicity (and reliability) boring.
Debugging binary protocols is either great job security for over eager engineers like the SPDY team or a great waste of our collective time. I'll let you all decide which.
I can't help thinking some of the pain could be resolved if we had a reliable datagram protocol between UDP and TCP. Delimiting a TCP stream to create a messaging protocol is already suboptimal and error prone, and it's the root cause of the head-of-line blocking problem experienced by HTTP ('fixed' in SPDY).
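Delimiting a stream into messages is exactly the step where head-of-line blocking sneaks in. A minimal length-prefix framing sketch in Python (purely illustrative, not any real protocol's framing):

```python
import struct

def frame(msg: bytes) -> bytes:
    # Length-prefix framing: 4-byte big-endian length, then payload.
    return struct.pack("!I", len(msg)) + msg

def deframe(buf: bytes):
    """Return complete messages plus leftover bytes. Until the *first*
    message fully arrives, nothing behind it can be delivered -- the
    head-of-line blocking described above."""
    msgs = []
    while len(buf) >= 4:
        (n,) = struct.unpack("!I", buf[:4])
        if len(buf) < 4 + n:
            break  # partial message stalls everything queued behind it
        msgs.append(buf[4:4 + n])
        buf = buf[4 + n:]
    return msgs, buf

# "second" arrives truncated: "first" is deliverable, the rest must wait.
data = frame(b"first") + frame(b"second")[:5]
msgs, rest = deframe(data)
```

A reliable datagram layer would hand you whole messages and make this buffering (and the stall behind a lost packet) someone else's problem.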
I understand the motivation for going to a binary protocol for a SPDY-inspired HTTP 2.0, but I would have liked to see an ASCII-based protocol similar to what jgrahamc proposed last year. I thought it was a much cleaner protocol to read and understand, and much more in tune with what the web is supposed to be. Why not keep the clever binary stuff separate in SPDY, endorse it through the IETF, and keep HTTP ASCII?
There were so many people just calling for rubber-stamping SPDY as HTTP 2.0 without any changes that frankly, I feel lucky that we're getting revisions at all. The editor of the draft is a good guy, and I trust him to make good changes.
I think it's pretty ludicrous that so many people are willing to let an advertising company basically re-write the way that every computer on the web talks to each other with very little questioning, but this is the place that we're at nowadays. For better or worse, the big vendors guide the standards process. I'd like to see more involvement from the little guys, but that has its own set of challenges.
What does Google's involvement in advertising have to do with the design of the SPDY protocol? Can you make a substantive criticism of SPDY based on Google's advertising incentives, or is this just innuendo?
Solely on their advertising incentives: no. It's a rhetorical flourish.
That said, I _do_ think it's extra important to pay attention to what Google does, for two reasons:
1. They're the largest entity on the Internet. This means their incentives are different than smaller players.
2. They _are_ an advertising company. Advertising companies make money by showing ads. They make more money by showing targeted ads. You target ads by collecting data on people.
I think people often forget Google's purpose in the world, and are simply dazzled by 'whoah cool stuff.' I appreciate some of Google's more interesting and ambitious initiatives, but get very scared when people start accepting any entity's actions without question. Specifically when that entity has large financial incentive to collect data about people.
There are, of course, many technical criticisms of SPDY, but none that rely specifically on the advertising angle.
Personally, I find this line of argumentation nothing more than pure ad hominem. This is an open specification, open to review by everyone. If there were technological changes made to somehow support better data collection, people would be able to see that.
The IETF and the W3C has always had large companies involved in specs, often with their own agendas, I see no reason to attack Google in this way.
An ad hominem would mean that I am saying they're wrong. My argument is not
Google is an advertising company, therefore SPDY is bad.
My argument is
Google is an advertising company, and the largest single entity on the Internet, and therefore, their actions deserve a healthy dose of skepticism. I'm not sure that we've been giving them enough skepticism.
SPDY does have good points, and bad points. I just saw a lot of chatter from people who want to ignore the bad points simply because of where the spec came from.
It's not advertising that I'm skeptical of w/r/t Google. It's their amount of capital.
E.g., relating to the ASCII/binary discussion above:
A binary web would require advanced retooling and therefore investment. Smaller business entities are not in such a strong position to deal with such a large shift in their workflow. Therefore, switching to a binary protocol would disadvantage entities smaller than Google.
The "specifics" in the other threads are more ad hominem. You're saying "we should be wary of what Google does" without actually mentioning what's there in the spec to be wary about. You're saying "we shouldn't trust Google to pass specs unchecked", people are saying "but we aren't: we've read the spec, and it's good", and you're saying "yeah, but we shouldn't trust Google".
As I mentioned elsewhere in the thread, an advertising company has incentives to collect and process as much data about individuals as possible. This is Google's core competency.
> There is more to the world than a technical draft, you can't just abstract away the rest of everything else.
Then maybe you should actually bring these things up. You seem to be playing coy and throwing around innuendo. If you have real, substantive concerns about something, I think the conversation would be greatly improved by actually bringing those up rather than just casting aspersions about Google.
Sure, but I don't find the social critique very useful either.
There are things that Google does, as a corporation, that you can find fault with, technological decisions that may or may not have been influenced by business model. For those, fire away. But the attack on SPDY/HTTP/2.0 because "Google is an advertising company" (which if you actually worked at Google, and knew how people made decisions here, you'd know is ridiculous from an intent or motivation point of view) is just pure mudslinging.
Examples of stuff that I, as a Google employee, would criticize Google for: Real Names, building "siloed" services and moving away from federated/decentralized approaches (see my essay here: http://timepedia.blogspot.com/2008/05/decentralizing-web.htm...), most of what Yegge said about APIs, Google Hangouts going "silo" and away from the XMPP model, etc.
People who work on ads and take their marching orders from ads are a small portion of employees at Google. The guys working on Chromium/Blink/SPDY do not report to ads, do not take orders from ads, and in general, work on technology without reference to monetization strategy. Their day to day job is to improve technology, with the hope that if you raise the tide, all boats will be lifted, and they'll be some ROI from that.
But the idea that engineers are taking marching orders from shareholders to maximize profits based on ads by tweaking web standards is hilariously wrong for people working on Chrome.
I'm not talking about "Larry And Sergey have decreed that Evil Shall Happen!" I'm talking about broad economic incentives. Since I don't work for Google, I have to treat them as a black box; I see what goes in, I see what comes out. I know nothing of the internals, I only have one friend who actually works there. If I implied there was some kind of conspiracy, that is my fault. You're right that that would be ridiculous.
I would also criticize Google for your reasons, and they may be even more important. But this isn't a thread about those things.
Github's revenue stream is through private repositories (both hosted on github.com and self-hosted enterprise), but I don't think you could reasonably assert that Github's purpose is to make a profit off of keeping code private. Their actions, in fact, suggest precisely the opposite.
In some cases, a company can transcend its initial purpose but keep it around as a revenue stream, a means to the new end. Few if any of Google's newer, further-out initiatives have reached wide-scale public adoption, so it's not yet clear whether Google will be such a company, but it could very well turn out to be one.
If it's a publicly traded company, it has a fiduciary responsibility to make money for its investors, so I'd have to agree with you. Its purpose is to make money. It might spend money to buy goodwill to earn loyalty, but at the end of the business day, it's a business.
It doesn't have to be "codified" to be fiduciary. The trust relationship between any investor and the investment enterprise is that the enterprise will be able to generate a return on the investment. If it doesn't assume this, it generally will be deemed a non-profit.
In brief, he makes three points. The first is that SPDY/HTTP 2.0 doesn't do anything about the widely lamented lack of session handling. The second is that it doesn't contain any simplifications of HTTP, despite there being several examples of things that could be simplified (header parsing, for instance, is hairier than it could be). The third is that it is going to pose problems for proxies.
I don't know how many of these points continue to apply with this HTTP 2.0 draft, nor do I have any skin in this game, but I respect PHK quite a bit so his outrage creates in me a sense of mild reservation. :)
I too have unreserved respect for PHK as an implementor. I'm not sure I find his critique compelling. It seems to me that it distills to a couple simple points:
* SPDY depends on Deflate compression, and will require middleboxes to implement deflate to route requests. I think the "IETF school of design" has an irrational fear of good compression and I think it's harmed other protocols, most notably DNS. I may be poisoned into this viewpoint by Bernstein.
* There are protocol constants that PHK doesn't know the background of, which strikes me as the kind of documentation bug that something like an HTTP 2.0 would address.
* SPDY might have required another WKP, which isn't really a SPDY problem.
* There's DoS potential in SPDY --- but of course, there's DoS potential in HTTP too; look at chunked encoding, for instance. For that matter, modern HTTP 1.1 also accommodates compression; when it comes to attack surface, in for a penny, in for a pound.
* A similar argument addresses PHK's concerns about the (theoretic) security of the push model, which is also something that modern HTTP accommodates.
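The Deflate point is easy to make concrete: SPDY seeds zlib with a predefined dictionary of common header text, so any middlebox that wants to read the headers has to carry the same dictionary and inflate every request. A toy version in Python (the dictionary here is a short illustrative stand-in, not SPDY's actual one):

```python
import zlib

# Illustrative stand-in for SPDY's predefined dictionary of common
# header text (the real one is fixed by the spec and much longer).
ZDICT = b"hostaccept-encodinguser-agentcontent-typegzip: \r\n"

headers = b"host: example.com\r\naccept-encoding: gzip\r\n"

c = zlib.compressobj(zdict=ZDICT)
wire = c.compress(headers) + c.flush()

# A middlebox that routes on these headers must carry the same
# dictionary and run this inflate step for every request it forwards.
d = zlib.decompressobj(zdict=ZDICT)
assert d.decompress(wire) == headers
```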
Oh oh also: PHK sees HTTP 2.0 as an opportunity to correct the session management problem, which has led to the "bass ackwards" design of heavyweight signed cookies in web applications. I sympathize with him on this point, but it's not HTTP's fault that this happens. HTTP 1.1 cookies also used to be simple opaque session IDs; heavyweight signed cookies are a consequence of server app architecture, not the underlying protocol.
Even if HTTP 2.0 had built-in robust session management, Rails apps would still be shoving several kbytes of encrypted state out to web browsers.
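For reference, the "simple opaque session ID" style is tiny; a minimal sketch in Python (key and cookie layout hypothetical, nothing like any particular framework):

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # hypothetical key; never leaves the server

def sign_session(session_id):
    # The cookie carries only an opaque ID plus a MAC over it; all real
    # session state stays server-side.
    mac = hmac.new(SECRET, session_id.encode(), hashlib.sha256).hexdigest()
    return session_id + "." + mac

def verify_session(cookie):
    session_id, _, mac = cookie.rpartition(".")
    good = hmac.new(SECRET, session_id.encode(), hashlib.sha256).hexdigest()
    return session_id if hmac.compare_digest(mac, good) else None

cookie = sign_session("abc123")
tampered = cookie.replace("abc123", "evil99", 1)
```

Shoving kilobytes of encrypted state into the cookie instead of the ID is an app-framework choice layered on top of this; the protocol never forced it.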
The first two criticisms of SPDY sound like "doesn't solve every known problem with HTTP at once", which was never a design goal; that doesn't make SPDY bad, it just means that further room for improvement still exists.
The third criticism, that SPDY makes life more difficult for routers, makes me wonder: would this get easier if SPDY just said "forget the Host header, SPDY requires SNI"? Seems like that would help.
My main objection is that the name you call something does matter. SPDY is a very different protocol from HTTP, which addresses a very particular set of concerns. It diverges quite a bit from the "intent" of HTTP. This is all fine and good until you change the name from SPDY to HTTP 2.0. One expects 2.0 of something to continue the same philosophy and motivation that produced 1.0. When that doesn't happen (R6RS is another good example) you can expect some pushback. In this particular case, the "label swap" nature of the process is generating animosity from those who feel that the process has been co-opted by people trying to pull a fast one. I don't think SPDY is intrinsically wrong, I just don't think it looks like a natural successor to HTTP. I wouldn't expect HTTP 2.0 to address every known problem with HTTP at once, but I don't think it's unreasonable to expect at least a few aesthetic improvements.
I don't see how this follows from your earlier objections. "It doesn't add session handling and it doesn't simplify header parsing, therefore it diverges from the intent of HTTP" seems like a non sequitur.
Don't confuse my objections with PHK's objections. There may be good technical answers to his objections; Thomas replied to them above quite cogently, but in any event, PHK's opinion carries a lot more weight than mine. I'm just a spectator.
My objection (observation, really) is that one expects protocol 2.0 to do more than address performance optimization. Simplifying the protocol is a good thing to do with a major revision; they didn't do that. Making the protocol more friendly for upper layer users is another good thing to do with a major revision; they didn't do that either. Instead they took an obviously different protocol designed to address a handful of extremely technical performance matters and rubber-stamped it as HTTP 2.0. Whether you like SPDY or not, it should be clear that this kind of "process" is going to leave people feeling disenfranchised. The spirit of HTTP, inasmuch as such a thing exists, is one of simplicity. SPDY just doesn't "smell" like the successor.
I think the comparison to R6RS is very appropriate to my point. R6RS was designed to address well-known shortcomings of Scheme. The process it took to get approved circumvented a lot of the community. A large segment of the community responded to this by essentially whining about it and ignoring it. We already see the whining about HTTP 2.0. I predict it will be followed by ignoring it, and some years in the future, an HTTP 2.1 or 3.0 that more closely resembles HTTP 1.1.
One critique, which I don't remember whether either of these two contains, is header compression. Header compression seems to make sense, as compression is good. The problem is that intermediaries make routing decisions based on the headers, so it's quite possible that the CPU time needed to decompress, possibly modify, and recompress the headers outweighs any gains the compression brought in the first place.
I've also seen some vague commentary about 'mixing application concerns into the transport layer' which I find compelling, but I don't have enough experience with the low-level networking to properly judge on my own.
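The intermediary cost is the inflate/edit/deflate round trip; a rough sketch, with plain zlib standing in for whatever scheme the headers actually use:

```python
import zlib

# What a header-rewriting intermediary does per request when headers
# arrive compressed: inflate, edit, deflate again before forwarding.
original = b"host: example.com\r\nx-forwarded-for: 10.0.0.1\r\n"
wire = zlib.compress(original)

headers = zlib.decompress(wire)                    # inflate just to read
headers = headers.replace(b"10.0.0.1",
                          b"10.0.0.1, 192.0.2.7")  # append this hop
forwarded = zlib.compress(headers)                 # deflate it all again
```

With uncompressed headers the middle two steps are a cheap byte-level edit; with compression, every hop that touches headers pays for both codec passes.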
You don't think Google receives enough skepticism? Every time they brew a pot of coffee, somebody out there declares that Google has violated their "Don't be evil" motto and is out to destroy us all with their dark caffeinated schemes. I can think of very few companies that are treated with more skepticism than Google.
Google is the industry's most active and effective corporate advocate for TLS. They're one of the key drivers for certificate pinning and one of the earliest mainstream deployers of forward secrecy. So I think that argument is a little bogus.
I don't understand the first point, though. Could you clarify?
QUIC is a very new, experimental protocol that runs on UDP.
Their (relevant) rationale is that TCP's algorithms are controlled by the OSes and the routers and all. Using UDP, QUIC can quickly deploy new algorithms without requiring that a major part of the world's infrastructure change.
Google is the industry's most active and effective corporate advocate for TLS simply because it makes tracking users and selling targeted advertising a whole lot easier. Their involvement in the whole PRISM affair has undoubtedly demonstrated that privacy is none of their concern.
Years ago, in days of old, when magic filled the air, I wrote a Slashdot troll post generator. It eventually produced some pretty hilarious posts, but I never closed the loop by allowing it to post. It would make a fun project for learning a new language; perhaps I'll install Dart and give it a shot.
>I think it's pretty ludicrous that so many people are willing to let an advertising company basically re-write the way that every computer on the web talks to each other with very little questioning
I don't think there's very little questioning... In any case, Google's goal is to deliver ads to you in the most efficient and fastest way possible. The more you browse the web, the more Google makes; it's in their interest to develop SPDY/HTTP 2.0. So what is wrong with them doing it? IEFT spec drafts are public and they're audited (as SPDY has been).
Also, most of the specs that we love and rely on today came from "big vendors". It's nice and all to say you want the little guy to be a part (and they should be), but it takes quite a bit of manpower to develop, draft, and finally get a spec ratified.
> Also, most of the specs that we love and rely on today came from "big vendors"
The core internet protocols we rely on, though, mostly didn't. If you look at the authors of RFCs specifying the widely used standards, nearly all of them were at research institutions: Steve Crocker was at UCLA then ARPA; Vint Cerf was at UCLA, then Stanford, then ARPA; Bob Kahn was at ARPA; Jon Postel was at UCLA then USC; Paul Mockapetris was at UC Irvine; Abhay Bhushan was at MIT; Tim Berners-Lee was at CERN then MIT.
Not sure if that's good or bad, but it seems to have been uncommon until recently for internet protocols to come from vendors.
edit: I did think of one important one, IPv6. Steve Deering was at Stanford, then Xerox PARC, then Cisco, and IPv6 came out during his Xerox/Cisco period. Bob Hinden was at Ipsilon Networks, then Nokia.
> IEFT [sic] spec drafts are public and they're audited (as SPDY has been).
Absolutely. But there's more than one kind of control. I don't think enough programmers understand the effects of social control. If the standards are all public and audited, but only employees of Apple, Google, and Microsoft have the time and energy to keep up with discussions, well...
And, of course, I'm not implying that _only_ that is true; I just fear that big organizations are dominating the discussion. I have more free time than the vast majority of programmers, am subscribed to the HTTP 2.0 mailing list, and still find it hard to keep up.
> IEFT spec drafts are public and they're audited (as SPDY has been).
> Also, most of the specs that we love and rely on today came from "big vendors", its nice and all to say you want the little guy to be a part (and they should be) but it takes quite a bit of man power to develop, draft and finally get ratified a spec.
It's especially hard when the call for proposals period of the draft is about 4 months and there happens to be a ready made proposal from a big player at the ready to be agreed on almost immediately. It's nice to say the little guy should be a part, but in this case the little guy mostly heard about it long after it happened.
Oh come on, don't let paranoia encourage you to throw the baby out with the bathwater…
The way Mark Nottingham ran the original CFP for HTTP 2 and the eventual adoption of SPDY as a starting point was very fair - it's all there in the IETF archives for anyone to see. From memory there were only two other proposals (from Microsoft and someone else).
The reason Google were able to get a new protocol up and running is because they have both heavily used web properties and a browser. They're also willing to carry out experiments in public.
As it stands HTTP 2.0 will be good for the little guys too, based on the testing I've done little guys will see an improvement in performance without needing to do all the merging that destroys cache lifetimes.
3rd-party content is the fly in the ointment for the performance improvements, so we'll need to be much more careful about the performance of the 3rd-party sites we include.
N.B. Apart from using their products I have no affiliation with Google
First, ASCII is inefficient. People don't interpret HTTP, computers do. Web servers and browsers. People only look at HTTP when they want to troubleshoot without any tools. With real tools, you can find out what's broken much quicker. And there's plenty of things you can miss without a real HTTP interpreter. Most hackers prefer to think of themselves as wizards that can spy 0's and 1's and tell you what the weather is. It doesn't make for a better protocol, though.
Third, it's a hack. If you want to improve the protocol, improve the protocol, don't just hack onto it to make it do what you want. I could make a horse and buggy go 60mph, but would it be a good idea? How about just designing a better buggy that is intended to go 60mph?
Fourth, fixed-length records are the wave of the future! It solves crazy problems like header injection and request integrity checking. Moreover, it makes for simpler, more efficient parsing of requests.
Fifth, redundancies introduced from the beginning of time need to go away, like terminating every record with "\r\n", or passing the same headers on every single damn request when once should be just fine for a stream of requests. Little inefficiencies like this don't go away if you just hack onto the same old protocol forever.
Sixth, the flow control improvements can make different applications more efficient by both not having to hold state of where and when traffic is coming and improving flow across disparate network hops.
Seventh, as to a general question "why use binary when we can compress stuff", what is smaller: a string of bits, or a compressed string of digits? Add to this that every header could have a 32-bit identifier (4 bytes) and you've got more efficient compression than gzip. Of course, most people would argue that you don't save enough time or bandwidth or cpu cycles to make a difference, but it would make for easier-to-write parsers, which would make working with the protocol in general more attractive. But then you have your binary-detractor-wizard-hackers and the whole conversation becomes an infinite loop.
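The 32-bit-identifier idea can be sketched in a few lines (the ID registry and layout here are made up for illustration, not from any draft):

```python
import struct

# Hypothetical registry: each well-known header name gets a 32-bit ID.
HEADER_IDS = {"content-length": 0x00000010, "content-type": 0x00000011}

def encode_header(name, value):
    # 4-byte ID + 2-byte value length + value: fixed offsets, so the
    # parser never scans for ':' or '\r\n'.
    return struct.pack("!IH", HEADER_IDS[name], len(value)) + value

ascii_form = b"Content-Length: 1024\r\n"                # 22 bytes on the wire
binary_form = encode_header("content-length", b"1024")  # 10 bytes
```

The parser reads fixed-width fields instead of tokenizing text, which is the "easier-to-write parsers" point above.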
> First, ASCII is inefficient. People don't interpret HTTP, computers do.
I can't tell you how many times I've manually read HTTP. To be sure, it's insignificant compared to how many HTTP headers have passed through my computer unseen by me.
ASCII may be inefficient, but computers are really fast; people are not. I don't have any measurements, but in making a browser, HTTP header parsing/writing was never near a performance issue. Bandwidth-wise it's also tiny compared to that image file you'd inevitably download for every page visit.
And sometimes you don't have tools. Sometimes you don't want tools. Sometimes you want to use tools that work with text to analyse your problem.
(Your other arguments might still stand, though :) )
> HTTP header parsing/writing was never near a performance issue.
Since header lengths are not limited, and a single TCP packet's payload is quite limited, long headers can cause a very measurable latency difference. Additionally, while I agree the generation/parsing overhead is probably quite small, saving it for every HTTP request is still a boon.
I'm also curious where you are reading raw HTTP from?
For me it's primarily in two situations, reading a packet capture from WireShark, or in the browser's debugger. In both cases, the tool will end up translating the request for me.
I guess a lot of people here never had to work with the X protocols like X.400 and X.500. When you need specialized tools for every protocol and encoding format, development is a real drag.
Just because I use Charles or Wireshark doesn't mean that I only want to use those specialized tools. I have definitely been in situations where I'm doing something like running nc as a proxy and looking at raw HTTP. I wouldn't choose to throw that away and revert to the bad old days without a big win.
> saving it for every HTTP request is still a boon
I'm questioning the measurability of this, though. Smells like premature optimization. You could be right, but I'd like to at least measure it before we go about changing one of the fundamental protocols on the internet.
Last time I read raw HTTP was when writing a script to automate some stuff on a web page. I specifically did not want the browser's headers and behaviors. I had a bug which only happened from my script, and raw HTTP helped me track it down. I could have used wireshark, but I am much faster in vim for a simple task like that.
HTTP has existed for over 20 years. We've had some time to look at it. It has been measured.
As a comparison to your scripting story, you would use Wget or Curl or LWP::UserAgent or a thousand other things to automate HTTP requests. One function call to do what you did manually. To find bugs you would use an HTTP fuzzer like Skipfish to automate the process. If you think somehow your manual process was faster, I say to you, teach a man to fish...
(I automate things in web pages for a living, and I only use tools like Firebug and LWP)
Plain-text formats have always been slower for things that are not plain text. But even 30 years ago, when computers were even slower, Unix designers decided plain text was still the way to go, because it was easier to debug and easier for humans to work with. No specialized tools required, no poring over hex dumps. HTML won over other document formats. JSON and XML won over binary formats. Any coder can look at JSON and see what is being transferred, without the aid of anything but a text editor. Plain-text marshalling formats for binary data (e.g., base64) are still useful for pasting data into an email or adding ssh keys to authorized_keys with "cat >>". Tool support is not going to make SPDY any nicer.
Things have changed in 30 years. Unix designers didn't have the time or resources to write elaborate tools, nor the need for complicated software. Back then you would use telnet to browse Gopher or send your mail. Things are different now. I dare you to read a 3KB JSON file without a parser. Base64 was a hack for text-based protocols. Tool support will make it a lot nicer than no support.
The main "thing" that changed in 30 years is computational power, which is now several orders of magnitude greater. If 30 years ago computers sporting the power of Timex watches spared the cycles for text-protocol overhead, I fail to see the need to squeeze out that last drop of performance on today's hardware.
The advantages of text based protocol remain the same. The disadvantage is lessened by faster CPUs.
The only advantage to a text-based protocol is you can read and understand it in raw form. Unfortunately this is not an advantage over binary protocols.
If anything, the more complex the protocol, the more redundant the text becomes, because we have to write tools to parse the text and output it so we can understand it better or identify flaws in it, and work around bugs introduced by the human element of the protocol. The ability to view and interpret the protocol in a text editor is equivalent to the ability to view and interpret the protocol as output from a debugging tool or log file - except the tool can give you much more detail than the text file in a variety of ways. Text files are inferior, but they can be quicker/simpler, depending on what you're doing.
You still need a library or tool to write the protocol out, as it's complicated and needs to be structured for the machine, not a person.
Second, saying "it's ok that it's slow, we'll just buy a faster CPU" is not a good argument for anything ever. It's part of the reason it's taken so long to adopt encrypted services everywhere. Someone (Google) had to finally prove it wasn't slow so people would adopt it.
Third, the state of modern computers is that there is no difference in speed between interpreting most text protocols and binary protocols. But that has nothing to do with efficiency, or what the machine is naturally suited to doing. You have to translate from English into machine code for a computer to know what the hell another machine is talking about. Machines don't care about line-by-line, or capitalization, or indentation, spaces, or any vestige of our natural language. Strip all those things away and machines purr along happily with less bullshit to deal with, which means simpler, more efficient code. Note that I didn't say faster.
Fourth, your performance and history observation is flawed. We need a lot more performance today than we did before, as we're scaling existing technology to many, many orders of magnitude higher than anything that existed when it was invented. Yes, we have faster CPUs. We also have more users and more data, and we don't have time to sit around reading packet dumps in text editors.
> The ability to view and interpret the protocol in a text editor is equivalent to the ability to view and interpret the protocol as output from a debugging tool or log file - except the tool can give you much more detail than the text file in a variety of ways. Text files are inferior, but they can be quicker/simpler, depending on what you're doing.
This is entirely false, as anyone who has ever had to debug a malfunctioning HTTP proxy or a misbehaving IMAP server can tell you. Nothing beats netcat for a quick bug-isolation test. As for the need for formal parsing, again, it's true for production code, entirely false for transient sysops tasks.
Compare debugging a CORBA server with debugging HTTP for a whiff of the difference.
Tools aren't omnipresent. My myriad BusyBox embedded devices will likely never have a protocol analyzer. If I need one there, I'm done for.
You're already dependent on tools - your eyes and language processing parts of your brain - to use text formats. With a binary protocol you'd be equally dependent on tools. They're just not embedded in your skull.
Seeing as we use binary protocols every day of our lives, and the tools to work with them have existed for years, and nobody has any problem with using them, let's let this argument rest.
HTTP is a layer 7 communication protocol. HTML/CSS are markup languages for designing an interface. JSON is a data interchange format. RSS is a content syndication format.
They are all wildly, vastly different. The only thing they have in common is they're all ASCII. If anything, you're making my argument for me: a communications protocol is not a format for displaying documents, it is a language for communicating machine instructions to network applications. Historically they have always been binary because it works better that way.
Your argument that "people can read ASCII, so ASCII is good" leaves out a couple points. Like, human beings do not read an HTTP statement, go into a file folder, bring out a document and present it to their computer. It's the other way around.
Really this just reflects a strange phobia people seem to have. Like your brain is tricking you into thinking you'll lose something by not looking directly "at the wire".
When you look at HTTP headers, 90% of the time you're actually looking at a pre-parsed, normalized set of fields. If you look at a raw packet dump, the whole message may not show up in one packet; you may have to reassemble it, which means parsing. If you have multiple requests in one connection, you have to find the end of the last request, which means seeking through the stream; seeing requests broken down individually means a tool already parsed them. Firebug and wireshark and other tools all take care of the automated, machine-operated work for you.
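To make that concrete, here is a minimal sketch (Python; body handling, chunked encoding, and error cases all omitted) of the reassembly-and-parse step that any such tool performs before showing you "readable" headers:

```python
def split_messages(stream: bytes):
    """Split a raw byte stream into (request-line, headers) pairs.

    This mimics the first thing every HTTP tool does before showing you
    pre-parsed headers: find the \r\n\r\n boundary, then turn each
    "Name: value" line into a dict entry.  Bodies are ignored here.
    """
    messages = []
    while True:
        end = stream.find(b"\r\n\r\n")
        if end == -1:
            break
        head = stream[:end].decode("iso-8859-1")
        stream = stream[end + 4:]
        lines = head.split("\r\n")
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        messages.append((lines[0], headers))
    return messages

# Two pipelined requests arriving on one connection:
raw = (b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
       b"GET /a HTTP/1.1\r\nHost: example.com\r\n\r\n")
msgs = split_messages(raw)
```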
And what's left? What do you have to do with HTTP, really? Apache rules? They'd stay human-readable. Application testing? We use proxies that handle it, and APIs for client/server programming. Firewalling? Handled by tools and appliances.
Stop giving me the blanket "ASCII is great for everything" excuse and tell me one thing, one single thing, that only humans are able to do with HTTP using their eyes without a tool. But you don't have to, because that's impossible: HTTP is not for humans.
I look forward to servers having different text representations of the same binary headers in their config files.
tell me one thing, one single thing, that only humans are able to do with HTTP using their eyes
You're missing the point.
No one writes HTML manually anymore either. People generate it using tools (string processing tools in a language or templates) and read it using browsers. Heck, even Notepad++ is a tool, but a generic one.
If you want, you can generate all your HTML using DOM. But almost no one does that, because DOM tools are clumsy, while text-based tools are easy to use.
You're actually still arguing for my point instead of against it.
If no one writes HTML manually anymore, then we have no need for it to look like English when the computer interprets it! We can compile the HTML down to bytecode and have it be interpreted much quicker by the computer, which won't have to do the job of lexing, compiling, assembling, etc. Here, two steps would be eliminated immediately, resulting in increased speed and more efficient storage and transmission: http://www.html5rocks.com/en/tutorials/internals/howbrowsers...
For that matter, if it's generated by tools, and we use programs designed to interpret and decipher and color-code it, all of that can happen without it being in English!
On top of that, you missed when I said HTTP is a communications protocol. Ever seen the movie The Matrix? Know how the sentinels would sometimes look at each other and make scuttling noises, then shovel off somewhere? They weren't speaking English ASCII. They were speaking a binary communications protocol. Know how I know? BECAUSE MACHINES AREN'T HUMANS! It would be absolutely moronic for them to speak English to each other. It would be like dogs saying the English word "bark" instead of just barking. Completely unnecessary and crazy. But that's what an ASCII communications protocol for machines is.
On top of that, there is no benefit, not one at all, to humans being able to read it when tools already exist to interpret and display it even more human-readable than its natural state. We squish and compress and strip HTML and JS already just to make it more efficient, and then undo the whole process just to read it. It's insane.
The web is made by people, not computers. Open an ubiquitous text-editor and you can start working on something right away. If you have to download a dozen different compilers and IDEs to do that, it's definitely not the same.
"The web" is actually just a collection of hyperlinks, applications that parse markup and document storage and retrieval services. You don't see code. You see pictures of cats. And you never, ever need a text editor to use it.
Face it. Your love affair with ASCII is just that: an emotion.
(As to your original question: humans haven't needed to program in binary or assembly for decades. That's what's so great about computers: they do the hard work for us, so we don't need to type everything manually into a text editor. Is that such a hard pill to swallow?)
You're completely ignoring the fact that the web began as (and still is, in part) a collaborative tool and publishing platform. Text-based formats played an immense part in that, geocities, the rise of personal publishing, blogs, these would not have happened without them.
Yes, binary is more efficient, but then tell me: why is JSON the most popular data interchange format on the web today?
Binary formats sucked so much that they had to invent XML, and it was a much better way to start the interaction era, where services talk to each other without having to read a 30-page spec just to understand how to write the right payload for the interchange format used.
Let alone the byte order...
Are you arguing that we should have embraced Java Applets and ActiveX controls, because they are binary formats, hence more efficient?
HTTP is NOT a communication protocol, it is an APPLICATION protocol.
HTTP is an application on top of a transport layer, HTTP, just like SMTP, IRC, FTP, IMAP etc etc is just a protocol that describes applications.
It is not TCP or UDP and SHOULD NOT BE!
Here's the problem: you keep assuming that good tools that help with debugging will magically appear. But good tools take a lot of time and work to perfect. In reality, you usually wind up with barely-good-enough tools.
With a text based protocol, you can inspect it visually with no special tools, and munge it with general purpose tools that you already know how to use (shell script, sed, awk, perl, python, ruby, what have you) with no special support libraries or anything of the sort. Support libraries can help you with the more complex aspects of the protocols, but for basic debugging purposes, you can do it all with general purpose tools.
With a binary protocol, you need those libraries to even have a chance of being able to work with it. Now you can't use a general purpose shell pipeline to munge it; no more nc | grep or what have you. You have to have a wireshark dissector; and good luck figuring out how to grep through the results of what a wireshark dissector generates.
The main point is that the overhead of the ASCII encoding isn't the main problem with HTTP. Reading ASCII encoded CRLF delimited headers is a solved problem (and heck, you could probably switch that to just LF delimiters, since I'm sure that most processors already handle that case just fine).
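As an illustration of how small that "solved problem" is, here is a sketch of a header parser that tolerates both CRLF and bare-LF delimiters (Python; no validation, purely illustrative):

```python
import re

def parse_head(head: bytes) -> dict:
    # Accept either CRLF or bare LF as the line terminator, as most
    # real-world HTTP parsers already do in practice.
    lines = re.split(rb"\r?\n", head.strip())
    return {name.strip().lower(): value.strip()
            for name, value in (line.split(b":", 1) for line in lines[1:])}

crlf = parse_head(b"GET / HTTP/1.1\r\nHost: a\r\nAccept: */*\r\n\r\n")
lf   = parse_head(b"GET / HTTP/1.1\nHost: a\nAccept: */*\n\n")
```

Both forms parse to the identical dictionary, which is the whole point: relaxing the delimiter costs one regex.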
> The same arguments can be applied to HTML, CSS, JSON, RSS and so on.
It can be, but that doesn't really make sense. The vast majority of web development is done without manually editing HTTP headers. It could change to a binary format tomorrow and - so long as our tools preserved their interfaces - we wouldn't even notice. The same cannot be said for any of the other technologies you listed.
This is a good example of where adding an object-oriented representation to every header out there would require a lot of work. I'm not sure the gains would justify it.
It could change to a binary format tomorrow and - so long as our tools preserved their interfaces - we wouldn't even notice.
Until you try to use grep or something of that sort for some non-trivial analysis operation. Everything 'speaks' ASCII. Custom tools for binary format would take years to evolve to be as powerful as generic text tools.
All you need is one parsing tool ported to the 1000 platforms in existence now. We already have ASCII tools on all of those platforms, and we are pretty much guaranteed that once a new platform is created it will have basic ASCII tools. However, it is not at all guaranteed it would have a decoder tool for every binary protocol out there. That's why ASCII protocols are easier to handle than binary ones. And for 99.999% of protocol users, the savings from converting to binary would not even be measurable. Sure, for the likes of Google and Amazon the economy of scale would be substantial. But 99.999% of web users aren't humongous-scale projects; they are relatively low-tech projects for which simplicity is much more important than squeezing out every last bit of performance.
So long as nobody that needs to speak ASCII deploys a server that doesn't speak HTTP/1.1, I think switching to a more compact binary protocol for HTTP/2.0 is a good thing. Embedded devices will be able to handle more sessions with less CPU power, for example.
I'm not convinced that for the average embedded device, parsing HTTP headers represents a significant amount of energy spent. Is there any data suggesting that for an average device - I don't mean Google's specialized routers or any other hardware specifically designed to parse HTTP - this change would produce a measurable improvement? In other words, how much longer would the battery on my iPad last? I don't think I'd gain even a single second, but I'd be very interested to see data that suggest otherwise.
> This is a good example where adding an object-oriented representation to every header out there would require a lot of work.
Most of the tools I've used represent the header as a hash/dictionary. I fail to see how that approach "requires a lot of work".
> Until you try to use grep or something of that sort for some non-trivial analysis operation.
You're arguing from the assumption that a binary protocol would be implemented by idiots. Custom binary tools can always emit a textual representation, at which point you can grep through it to your heart's content. This is the exact same problem that we've been solving with compilers for generations. It isn't nearly as insurmountable as you seem to believe.
People are actually writing HTML/CSS/JS, so no. They have a whole other set of issues like being XML based (for HTML), but they do the job and are not likely to experience fundamental change in the next 5 years in broad adoption.
> They have a whole other set of issues like being XML based (for HTML)
HTML isn't XML based; there is an XML-based relative of HTML (XHTML) which was originally (before HTML5) viewed as a potential successor to HTML, and with HTML5 there is an available XML-based serialization of the HTML's semantics, but HTML is its own thing (prior to HTML 5, HTML was SGML-based; XML was inspired by HTML rather than serving as the basis for it.)
ASCII is inefficient, but nobody cares. For the vast majority of Web users, who aren't working at the scale of Google or Amazon, the difference in performance couldn't even be measured. And for them, ease of use - a low barrier to entry; basically all you need is a basic ASCII tool and you're ready to go - is vastly more important than completely immeasurable performance gains from using opaque protocols.
Of course, you can say TCP/IP is still binary, and that is true. But TCP/IP tools are built into every OS in existence now, so they do not form a real entry barrier. Would HTTP tools be in the same position? I'm not sure - most HTTP tools right now are not standard and do not cover even HTTP/1.1 completely; what reason is there to expect they'd cover the whole 2.0 protocol properly and be as widely standard and available as TCP/IP tools are? Which means a much higher barrier to entry.
I'd figure the inefficiency cost of ascii vs binary http headers over yottabytes of packets every year would add up. It hurts your bandwidth, it wastes electricity on the wire, and it wastes processing power. An insignificant smidgen on average, but add it up and it would probably be substantial.
That is always my stance on things - if one computer is going to run something, write it in python, make it bleed memory, just make it work. If it is going to run on a million, you have to consider the raw power waste of inefficient programming. If it is going to run on trillions of devices for decades, your choices are few in my mind.
Yottabytes only come into play if everybody switches. But the complexity of binary protocols would work against that. So probably only very large sites would implement it - and even for them, does parsing HTTP cost that much?
>>> If it is going to run on trillions of devices for decades, your choices are few in my mind.
History suggests otherwise - the majority of mass-produced software is not written with performance as the ultimate concern. You will find a lot of software written in languages like Python or Java, even though using C or assembly would probably produce better performance. But using C or assembly, that software would probably never have been produced, because its complexity would be harder to manage.
Of course, performance does matter - even writing in Python, you have to worry about performance. But here we effectively see an argument saying "since we have a lot of software in Python, if we switch it to C we'll have massive performance gains". I think that is a wrong line of argument; if we switched to C, a lot of this software wouldn't be written at all. (Note it's not against Python or C - I use both and they are both great in their areas :)
So I guess an optimized protocol does have its uses for high-volume websites - but I am concerned its advantages would be offset by its complexity. The designation of it as HTTP/2.0 implies it is the next version of HTTP - but it's really a rather different thing with a different use case. I'd rather have it as a separate protocol for high-traffic websites.
Amen! Efficiency by whose standards or values? It's a financial issue for big Internet, but what is the real cost, in values, to the rest of the Internet? The financial cost is obviously minimal, as everyone I know who wants to publish can. Why sacrifice durability, readability, and the original core values of the Internet to save "big dollars" for "big providers"?
I strongly disagree with "protocols are observed with tools we don't need ASCII".
That's pretty annoying to see this kind of thinking. The reason everyone codes in JS and uses HTML and CSS is because it's ASCII. It's easy to understand, hack on, etc. Same reason Python is so popular. Even Go is pretty simple like that. Sure, it's languages vs. protocols, but the reasoning is exactly the same.
And in fact, the comparison works with protocols as well:
SMTP, IMAP, HTTP, IRC are EXTREMELY easy to understand and code for.
Binary protocols are a huge PITA to code for. The argument that you're going to use a lib or whatever tool just doesn't hold any water. You want to understand exactly what happens.
That's how everyone learns. I could write my own SMTP and IRC clients when I was 10. I could understand it. It works. No way I could fully understand the documented binary protocols. I tried, and it was just too painful and not fun at all (hey, I was 10).
I'm not certain the added performance of a binary format and some of the other advantages are really good enough to justify making the world unable to understand what's going on anymore just by looking at it.
Sure purely technically speaking, it sounds like "binary is the way to go" for pure performance.
But if you think about it, making hacking around with that stuff a niche thing is perhaps a much greater loss. Even the reliability of a binary protocol is VERY arguable.
In fact, I'll put in a last comparison: shell pipes and ASCII. Many have tried to replace them with smart binary protocols, objects, etc. It's cool. It's more powerful. More efficient. At the end of the day, though, a quick hack with regular pipes transferring ASCII is just easier to understand, and we all use those - not the fancy binary objects.
A binary protocol is bad for a couple of reasons, too.
First, OSI layer 6 called and it wants its old job back. It sat around connecting layers 5 and 7 peacefully since the dawn of the ARPANET, all while the HTTPbis guys were passing around messages back and forth trying to obsolete it.
Second, you can't type a binary protocol. Yet, you have to somehow make the server work without a client or vice versa (for the initial implementation). That's going to be a lot more difficult. With HTTP, you could literally hook up a teletype to the Internet, let it print incoming requests and type the response back to the user agent. I've done this occasionally on a terminal emulator for debugging purposes.
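The teletype scenario can be sketched in a few lines. Here a socketpair stands in for the network, and both the request and the response are literal hand-typed strings (a toy illustration, not a real server):

```python
import socket
import threading

# One side sends a hand-typed HTTP request; the other side plays the
# "human at the teletype" and replies with a hand-typed response.
client, server = socket.socketpair()

def human_server():
    request = server.recv(4096)          # "read the incoming request"
    assert request.startswith(b"GET")
    server.sendall(b"HTTP/1.1 200 OK\r\n"
                   b"Content-Length: 2\r\n\r\nhi")
    server.close()

t = threading.Thread(target=human_server)
t.start()
client.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")
response = client.recv(4096)
t.join()
client.close()
```

The entire exchange is typeable on a keyboard and readable on a screen; no binary framing layer needs to exist before either endpoint does.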
Third, the Internet is big-endian while most common processors in use today are little-endian. This is going to haunt people's lives forever because you have to continuously convert between the two, and although the conversions are orthogonal, the methods aren't idempotent (as opposed to converting a string to ASCII, or a text buffer to DOS-style line endings).
You mention 32-bit identifiers as opposed to a string of digits. This is more error-prone than you think; two's complement isn't the only integer representation out there. Implementations written in C would have to deal with their underlying architecture, as the standard allows for 3 different representations (so the compiler wouldn't help you out). Then there's signed and unsigned, either of which might not be available in the implementer's programming language. You end up unpacking the identifier by hand, which may end up being slower than just looping through a string. ASCII is hardly an inefficient serialisation format.
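Python's struct module hides the C-level pitfalls, but it does show the byte-order agreement both peers must maintain, and the ASCII alternative that sidesteps it entirely (a sketch; the 32-bit identifier value is hypothetical):

```python
import struct

stream_id = 0x0000002A  # a hypothetical identifier both peers must agree on

# Network byte order (big-endian), as a binary wire format would mandate:
wire = struct.pack("!I", stream_id)

# A little-endian host interpreting the same 4 bytes naively gets a
# completely different number -- this is the mismatch in question:
naive   = struct.unpack("<I", wire)[0]
correct = struct.unpack("!I", wire)[0]

# The ASCII alternative: no byte order at all, just digits.
ascii_form = str(stream_id).encode()
```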
Fourth, any fixed-length records are going to be useless at some point in the future. Several versions later the fixed length records are going to either point to an extra set of records tailing the 2.0 records or will simply have a (designated) backwards compatible value for consumption by older peers. With HTTP, we can add a header anywhere in the request or response except for the very top. We can even shuffle them at will without adverse effects on peers.
Fifth, it doesn't make sense to optimise a tiny fraction of the entire HTTP session. Any benefits are too small to be worthwhile and would therefore result in a net-negative to most implementers.
Sixth, you can still make improvements to HTTP without moving to a binary protocol. Not sending the same headers on every single request isn't one of them. HTTP is essentially a stateless protocol and every request could be handled by a different server. You can architect clusters of servers routing incoming requests however you please and satisfy every one of them correctly and efficiently. For starters, you can replace any underlying protocol in the stack with a more cluster-friendly protocol in transit.
Seventh, just because no one is using a given content type in HTTP (I think you were referring to multipart/related) doesn't mean the protocol used to transfer that content is bad. Heck, it's not even part of any HTTP standard.
2. Amazingly, people have made binary protocols work before, in spite of no preexisting implementation, so it's not impossible. I'm sure we will be able to meet the challenge.
3. Do not try to sell me the endianness issue. I have written multi-arch TCP/IP stacks and I'm not a CS major. Trust me, it will be okay.
4. Yes, and the IPv6 address space will someday be exhausted. But not soon. And as many fixed-length frame protocols have done in the past, you leave an "extra frame options" bit to stack more fields on. It's fine.
5. It's really not about optimization at all. It's about common sense. The computer works better when you talk to it in computer-speak, and we gain absolutely nothing by talking to it in English human-speak. The benefits are a net-positive because parsing is easier, because a computer is parsing it, not a human. There is no sane argument that can validly claim that parsing human-readable English is easier for a computer than fixed-length bitstrings. CPUs don't grok ASCII, they grok BINARY.
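For what it's worth, parsing a fixed-length binary frame header really is a one-step operation for the machine. The layout below is illustrative only, not the actual draft wire format: a 16-bit length (note the 16383-byte limit mentioned upthread fits in 14 bits), an 8-bit type, 8-bit flags, and a 32-bit stream identifier:

```python
import struct

# An illustrative fixed-length frame header: length, type, flags, stream id.
# (Field sizes are assumptions for the sketch, not the HTTP/2.0 draft's.)
FRAME_HEADER = struct.Struct("!HBBI")

def parse_frame_header(buf: bytes) -> dict:
    length, ftype, flags, stream_id = FRAME_HEADER.unpack_from(buf)
    return {"length": length, "type": ftype,
            "flags": flags, "stream": stream_id & 0x7FFFFFFF}

hdr = parse_frame_header(struct.pack("!HBBI", 16383, 1, 0, 42))
```

No lexing, no delimiter scanning: the machine reads eight bytes and is done.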
6. Modern designs for clusters of web applications route by session, not by individual request. You are session-oriented instead of connection-oriented, though in practice it's almost the same thing. And see previous comment on why adding onto HTTP willy-nilly is just a hack.
7. No, just jgc's re-implementation of multipart is bad, for previously stated reasons.
If you think binary protocols are so great, you then have to explain why text protocols are winning all over the place. People tried binary for a long time before HTTP won. We had all kinds of RPC mechanisms - CORBA, DCOM, etc. Even the winning data serialization formats are mostly text (JSON, XML), despite the fact that we know it's less efficient. Even where people make binary versions, the ones that succeed are direct one-to-one translations of the text (e.g. BSON).
In the end, it is formats that people can understand that win the day. You can't just write that off as if it has no value. It plays out in technical ways: all the CORBA implementations ended up having very poor interoperability, partly because they were hard to debug. Nobody could actually look at a CORBA exchange and see what was wrong with it.
It's because developers need to read JSON/XML regularly. They validate the data they are sending to the client, they create test cases, the testers often read them as well, and the data is sometimes stored in databases. It's because the format changes so frequently that reading it is important.
HTTP is not comparable because it never really changes. It's a fixed format. And frankly the majority of developers never need to go down to that level anyway.
I'd wager every site sitting behind CDNs or a Varnish saw a developer go down to telnetting to port 80 to debug the cache behaviour. If you include frontend developers, sure, your majority-of-developers assertion is true. Select sysops only, and you'll be surprised.
Re 4: Only under incredibly optimistic models of the future survival and expansion of our species! The IPv6 address space has about 10^38 addresses. Earth's land surface is about 1.5*10^14 square meters. So in a future where the planet is so crowded that every person lives on a single square meter and owns 10^6 globally routable gadgets, we'd still need on the order of 10^18 Earth-sized planets to exhaust IPv6.
(Caveat: I'm back-of-the-enveloping this on my phone, about to go to sleep.)
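Running the numbers with commonly cited figures (2^128 IPv6 addresses, roughly 1.5e14 square meters of land on Earth) supports the conclusion:

```python
# Back-of-the-envelope check of the IPv6 exhaustion estimate.
addresses  = 2 ** 128                 # IPv6 address space, ~3.4e38
land_m2    = 1.5e14                   # Earth's land surface in square meters
gadgets    = 1e6                      # per person, one person per square meter
per_planet = land_m2 * gadgets        # addresses consumed per crowded planet
planets    = addresses / per_planet   # Earth-sized planets needed
```

The result lands around 2*10^18 planets, so the "not soon" verdict stands either way.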
1. It definitely is an argument and in fact my main argument. At least explain how this would be any less valid than your "it's a hack" and (bandwagon) "wave of the future" arguments.
2. My turn to invoke "not an argument". Just because one can simply copy a struct over a socket doesn't mean it's a good idea to do so. Especially in light of the flourishing culture of diversity we have on the Internet.
3. You conveniently pick at one half of the argument, but miss it entirely. The point is not that we can't overcome an endianness mismatch; it's that we shouldn't have to. At least not inside layer 7.
4. Except the old records will have to remain there forever. HTTP implementations dropped the Pragma header a long time ago and today we can simply pretend it was never there.
5. When it comes to common sense, ASCII is right there. That's because a protocol on the Internet needs to be interoperable with many systems. Sure, all of those systems use binary one way or another. But human operators are still going to have to program those systems, and ASCII is a useful representation which enables us to do just that. Furthermore, the draft proposes to encode binary headers in base64 in order to transfer them in an HTTP/1.1 upgrade request. Now you have 3 ways of transferring HTTP headers instead of just one, and we'll have to support all of them in any case. This might seem trivial to you, but it's a problem for servers and quite a huge one for intermediaries (proxies).
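To see that complaint concretely: the same header would exist in at least three wire forms. The binary encoding below is made up for illustration (a simple length-prefixed name/value pair), not the actual draft encoding:

```python
import base64

name, value = b"host", b"example.com"

# Form 1: the classic HTTP/1.1 text representation.
text_form = name + b": " + value + b"\r\n"

# Form 2: a hypothetical length-prefixed binary representation
# (illustrative only -- not the draft's real header encoding).
binary_form = bytes([len(name)]) + name + bytes([len(value)]) + value

# Form 3: that binary blob base64-encoded to survive a text-based
# HTTP/1.1 Upgrade request.
base64_form = base64.b64encode(binary_form)
```

Every intermediary would have to recognize, and convert between, all three.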
6. Amending HTTP with a new header again is much less a hack than providing a way to switch to a binary protocol and resume communications from there. Your buggy argument doesn't stand, for HTTP is not the car. It's the pavement upon which old buggies can ride along just fine until it's no longer considered safe amongst the faster carriages.
7. Yes, I'm not convinced client-provided request identifiers are the way forward myself. Though I would consider the proposal a better starting point for discussions than the current HTTP/2.0 draft because it leverages existing mechanisms better.
1. I understood your point to be "Well why isn't HTTP layer 6?" or "Why isn't layer 6 used?", which makes no sense, as TLS is layer 6 and HTTP (and the web service) is layer 7. They necessitate each other. Simply stating that X and Y are different parts of the OSI model is not an argument about the format of a protocol in one layer.
2. My argument isn't "just because you can", it's "you can." You seemed to be saying it would be difficult if not impossible. I was saying, no it isn't.
3. Endianness will always be an issue, forever. The only time it will go away is when every architecture picks one format. It's a really simple operation, and it's part of how computers expect us to behave due to their nature and design. Hacking around it doesn't make it disappear, nor does it help anything.
4. What old records? Pragma was deprecated in 1.1 yet included anyway for god knows why. There's no reason they should do so again, but if they do, it will exist both in text and non-text versions. This is a non-issue.
> a protocol on the Internet needs to interoperable with many systems
You mean like IP, ICMP, TCP and UDP?
> But human operators are still going to have to program those systems
> and ASCII is a useful representation which enables us to do just that
Sure. My C code editor displays ASCII. It totally enables me to write IP, TCP and UDP code, using an ASCII display with code in ASCII. And it neatly compiles down to binary and runs a binary protocol. Amazing!!!! (Seriously though, if your argument is that ASCII is just easier to "program" as a protocol, you're up shit creek; you have to write more code to handle converting ASCII to binary and back anyway. Your high-level language abstractions hide this fact from you, and you think it's a convenience because you never had to learn what a constant is.)
> it's a problem to servers and quite a huge one at that for intermediaries
That's backwards compatibility for you. If the alternative is to simply mangle and bungle the existing format into a Frankenstein for eternity, it's not going to be any better.
6. Are you comparing extending HTTP/1.1 for a single feature to the backwards-compatibility support of HTTP/2.0? Because that makes no sense. The vehicle analogy is just weird at this point.
7. See, this is where the vehicle analogy works again. "leveraging existing mechanisms". In other words, let's throw one more feature on top. It never ends, because all you have to do is keep adding more lines, and modify the browser, and modify either the server or your web app, and keep going to support god knows what. At some point they'll implement an incredibly complex binary protocol and embed it in base64-encoded ASCII HTTP/1.1 headers, because "leveraging existing technologies" is thought of as a neat thing to do. It will also be insane. At some point you need to just make a better <whatever> instead of hacking and hacking and hacking onto it to make it do what you want.
Like building the great pyramid of Giza out of tinker toys. Sure, it's easier for people to use tinker toys. It's easy to understand. You don't have to do any real work. And it's also not meant for that task. At some point you need to throw out the toys and use stone.
I can even go further. ASCII is too old to use. Really, it's been antiquated by UTF-8. It descends from telegraphic codes for teleprinters. And ASCII itself was micro-optimized down to 7 bits, with the 8th bit used as a parity bit because perforated tape had space for 8. ASCII is so antiquated (1960) that nobody should be using it anymore.
Clearly we need to implement HTTP/2.0 in UTF-8 wide characters, so connections to China, Japan and India will support their native language in the protocol. (After all, what's the point of a native-language protocol if only English speakers can read it?) Also, we should include the byte order mark at the beginning of all messages so we don't have to worry about how endianess works.
Just look at that and weep. That whole document deals with how to represent HTTP headers. It doesn't define them, their behaviour and how they should interact. No. This multi-page document merely documents how these headers should be represented.
You know, things which up until now has been:
Lines of text with key-value-pairs delimited by a colon-sign.
Notice how that didn't take eighteen pages and didn't presume anything about current-generation consumer DSL MTUs? Yeah. That's a nice, simple, good spec.
Obviously this HTTP2 binary monstrosity is being done all in the holy Google-name of micro-optimizing performance.
This is terrible design and quite literally obfuscation more than anything else. I cannot believe the IETF is even considering this junk.
1. TLS has nothing to do with this; it's Transport Layer Security (even though you may think of it as layer 6) because it doesn't alter its payload. ASCII/UTF representation and the messages themselves are layered on top of it. By going binary, you may well end up forcing your encoding onto systems to which that encoding is not native. Whereas right now you can link any two systems and exchange messages, with a binary protocol some systems could exchange messages freely while others would see garbage. That's why the Internet was standardised on ASCII and \r\n, so we wouldn't ever have to deal with that again.
2. I don't disagree it's easy to come up with a binary protocol, taking a short cut is always easier. Just like it's easier not to write a test harness with full coverage for a software project, that's entirely up to you. When a regression causes havoc down the road before you realise what's going on, well, rather you than me.
3. You're defending a regression, as of right now it's a non-issue. And are you really calling ASCII a hack around endianness issues? On what planet?
4. Records that are going to be deprecated down the road, which I think is fair to consider inevitable. All I'm saying is that it's been a problem in binary protocols before, so let's not do that. You don't see this as a problem at all, so I'll digress.
You mean like IP, ICMP, TCP and UDP?
Yes, these are built into the operating system. Once you start using e.g. netcat (who hasn't piped tar into netcat for a quick backup?), all of that becomes transparent.
No, my argument isn't that ASCII is ipso facto easier to implement. It's that it's easier to test, debug and always see exactly what's going on over the wire.
If the alternative to a text-based frankenstein format is its binary-bastard child, I'll have the former thank you.
6. No, I'm telling you to think of HTTP as a conveyor rather than a payload. There's a difference, and that's why the vehicle analogy is weird.
7. So you're proposing that every N years we create an entirely new HTTP and upgrade to that? At what point will the steaming pile of upgrade requests yield a noticeable reduction in performance?
Also, UTF-8 characters are not "wide"; they're variable-length, not wide as in fixed-width multibyte encodings. Then you go on to suggest we use a BOM at the start of every (UTF-8, mind you) message; I'll leave it up to you to let that sink in. You even spelled it out.
I agree 100% on binary being a bad idea in the HTTP spec, largely because of encoding and backwards compatibility. Binary and fixed lengths also lead to harder streaming situations, chunking problems, and a less approachable protocol, which I believe leads to less innovation. I'd argue HTTP clients/servers are better at dealing with buffer overruns because the format isn't fixed-length and relies on more defensive content messaging.
Many of the complaints about HTTP are really complaints about MIME messaging, which the entire internet (the standards, anyway) is really built on and which has run pretty smoothly for a very long time. Improving HTTP by addendum, as SPDY does, is a better idea. Or possibly transporting it over streamed protocols like SCTP: http://tools.ietf.org/html/rfc6525; there's no need to modify the packaging/messaging format.
MIME/HTTP/HTTPS are very flexible, and binary can be added where you want it; it already has been, via multipart, and EDI/HTTP/AS2 and other RFCs use this. Multipart isn't used as much because it is more problematic (though it's used heavily in email and custom protocols), so making the whole spec that way would be bad overall. The point about the OSI layers is key: let's not revert to binary plus base64-ing everything just to get data across the wire. You can put anything in there; basing it in text and keeping it human-readable is always a good idea, and that is really what this whole layer is about. A move toward binary pushes us back to the days of non-standard blobs, problems that HTTP messaging, then content as XML, then JSON solved by standardizing the readable exchange of data. When you are exchanging data in a standard way it should be very basic, to minimize problems rather than compound them. Throwing out all of MIME just to speed up HTTP, when other protocols exist for any needs that are faster (real-time, attaching files, streaming, etc.), is a bad idea. Also, changing support from pre-1.0 HTTP, to 1.0, to 1.1 caused many problems; unless this adds considerable benefit, changing it adds more.
Second, you can't type a binary protocol. Yet, you have to somehow make the server work without a client or vice versa (for the initial implementation). That's going to be a lot more difficult. With HTTP, you could literally hook up a teletype to the Internet, let it print incoming requests and type the response back to the user agent. I've done this occasionally on a terminal emulator for debugging purposes.
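That teletype workflow is easy to reproduce today. Below is a toy sketch, not anyone's actual setup, where a "server" on loopback reads the request as plain text and types an HTTP/1.0 response straight back:

```python
import socket
import threading

# A toy "teletype" endpoint: read whatever text the user agent sends,
# then type an HTTP/1.0 response back by hand. Loopback only.
def teletype(server):
    conn, _ = server.accept()
    request = conn.recv(4096).decode("ascii")  # readable as-is, no decoder needed
    assert request.startswith("GET ")
    conn.sendall(b"HTTP/1.0 200 OK\r\n"
                 b"Content-Type: text/plain\r\n\r\n"
                 b"hello from the teletype\n")
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
threading.Thread(target=teletype, args=(server,), daemon=True).start()

# The "user agent": type a request, read the plain-text reply.
with socket.create_connection(server.getsockname()) as sock:
    sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
    reply = b"".join(iter(lambda: sock.recv(4096), b""))

print(reply.decode().splitlines()[0])  # HTTP/1.0 200 OK
```

Every byte on the wire is printable ASCII plus CRLF, which is the whole point being made: the protocol is typable and readable without any tooling in between.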
That's a very 1970s way to develop a new protocol.
These days you specify the protocol in, say, an XML- or JSON-based file format, and then run a code generator to produce client and server libraries directly from the spec. This has the advantage that the implementation is derived directly from the specification, so there is little room for ambiguity.
Wayland is one example in the open-source world of where this is done, but I've seen the technique used in proprietary shops as well.
That's useful for RPC type protocols, but HTTP isn't RPC based and it has lots of semantics written in English. I think it's better that way because it allows for a greater variety of use cases and implementations. You can still do RPC with websockets, if that's what you want.
I've seen this done with the on-the-wire protocol of scientific measuring equipment. It's hardly just for RPC (which HTTP increasingly resembles anyway).
The point is, the ability to "type" a protocol is irrelevant to how modern distributed software gets developed. Maybe it mattered in the days when comms were at 300 baud, machines had kilobytes or perhaps megabytes of core, and the Mark I eyeball was the best way to debug machine-to-machine comms, but these days we have tools that can decipher binary wire protocols for us. Performance and adaptability are far more important now than human readability. That war has been lost.
Real tools? I would argue that there are far more real tools for debugging ASCII-based protocols than there will ever be for binary-based ones. An ASCII-based protocol is highly composable, allowing us to separate intent (the contents of the message) from ever-evolving encodings (rebasing, encryption, compression, etc.). In this world of fighting complexity, why are we not favoring the simple?
> Fourth, fixed-length records are the wave of the future!
1960s-style arbitrary field size limitations: the wave of the future! No doubt any day now we'll reorganize the internet around shipping punched-card images around, too. We could call the project the "Because It's Time Network", or BITNET.
> Of course, most people would argue that you don't save enough time or bandwidth or cpu cycles to make a difference, but it would make for easier-to-write parsers.
Most people would argue that, yes; famously, Kernighan and Plauger did argue that in The Elements of Programming Style in 1974. By "easier-to-write parsers", you mean easier than dict(((key.lower(), value.lstrip()) for key, value in (line.split(':', 1) for line in header.split('\r\n'))))? Because I think that's going to be a pretty tough bar to fit under. (Yeah, I know you need another couple of lines of code if you're going to handle indented continuation lines, but you can get rid of those without returning your protocol design to the Summer of Love. You could also get rid of the .lower() and the .lstrip() while you're at it.)
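For what it's worth, that one-liner really does run; here it is applied to a made-up header block (no continuation lines, as the comment says):

```python
# The one-line header parser from the comment, applied to a sample block.
header = "Host: news.ycombinator.com\r\nContent-Length: 42\r\nX-Thing: a"
parsed = dict(((key.lower(), value.lstrip())
               for key, value in (line.split(':', 1)
                                  for line in header.split('\r\n'))))
print(parsed)
# {'host': 'news.ycombinator.com', 'content-length': '42', 'x-thing': 'a'}
```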
> Add to this that every header could have a 32-bit identifier (4 bytes)
Padding out sub-byte-sized values to fill out fixed-width fields: the Intelligent Man's Approach to Saving Bandwidth! Or you could just use one-letter names in an ASCII protocol.
> [Fixed-length fields] solves crazy problems like header injection and request integrity checking.
Clearly we've never had parsing bugs in binary protocols full of fixed-width fields, now have we? Surely not bugs that produced security holes? Except maybe TCP, IP, and DNS. And X.400, and X.500, and X.509, and some of those were the fault of ASN.1 BER and DER, which are hardly fixed-width formats. And surely silently truncating a value to put it into a fixed-width field would never change its semantics, right?
> as to a general question "why use binary when we can compress stuff", what is smaller: a string of bits, or a compressed string of digits?
Well, let's see. How many files do I have here?
$ find | wc
5946 17074 330147
Each one has a ctime, an mtime, an atime, and an inode number. The ctime is generally going to be the same as the mtime in this case, so we'll leave it out. I think they're technically 64-bit values in the current inode structure, but let's count them as 32 bits instead, since none of my files are from after 2038. The inode number is also 32 bits. So if we take these three 32-bit values per file, we have 5946×12 bytes, or 71352 bytes. And if we print them out as digits and compress them?
So a compressed string of digits is a lot smaller in this case. But you could argue that that's just because my data is highly redundant, since most of the timestamps are going to be within the current few years, which is true. But then, most data is highly redundant. How bad can it get, in the worst case of representing uniformly distributed random 32-bit values as compressed strings of digits, with spaces between? It adds about 44% overhead:
$ dd </dev/urandom bs=1024 count=1 |
od -w1024 -l | tr -s ' ' ' ' |
gzip -9c | wc -c
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.000622635 s, 1.6 MB/s
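A rough Python equivalent of that worst-case experiment, for anyone who wants to reproduce it without /dev/urandom (256 random 32-bit values instead of od's output format, so the exact percentage will differ from the shell version):

```python
import gzip
import random

random.seed(0)
values = [random.getrandbits(32) for _ in range(256)]

raw = b"".join(v.to_bytes(4, "big") for v in values)  # 1024 bytes of binary
digits = " ".join(str(v) for v in values).encode()    # decimal text, space-separated
compressed = gzip.compress(digits, 9)

# Uniformly random data can't compress below its own entropy, so the
# digit encoding carries some overhead even after gzip.
print(len(raw), len(digits), len(compressed))
```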
> Little inefficiencies like this don't go away if you just hack onto the same old protocol forever.
But you know what? They do go away if you tunnel over a deflate-compressed VPN, or if you increase the available bandwidth by a few percent. Maybe instead of trying to take us back to bug-prone 1960s designs you should be working on that.
1. Sarcasm meeting sarcasm; HN truly has turned into hell.
2. Trying to fit an HTTP field parser into one line is not the way to win a programming argument.
3. One ASCII character names? 255 (edit: 37 alphanumeric) possible headers, of which only a few might correlate to what you'd expect? You're trolling me now.
4. Yes, software bugs happen! It's crazy, I know. But let's go ahead and assume that the security holes that still plague applications today due to design flaws are not the same as a couple of off-by-one bugs decades ago.
5. By virtue of the algorithm, compression works better the more you have of the same thing. You won't have 70KB of headers to compress at once; more like 400 bytes. Compressing each header group individually gains nothing from previously compressed data, the way TLS or SPDY can. The eventual overhead would not only be larger than a bitstream but take more CPU to decode.
6. Not only are they inefficient, they add complication to the parsing of the protocol, which is one more thing an application can mess up. Not only is it slower, it's more prone to errors, and a VPN does not fix that. Perhaps you should focus on real solutions to real problems rather than quoting which decade network application programming began in.
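The per-group compression point is easy to demonstrate with zlib: compressing each small header block independently gains nothing from earlier blocks, while one long-lived compression context (roughly what SPDY's shared header context buys) can back-reference them. The header values below are invented:

```python
import zlib

# Fifty near-identical request header blocks, differing only in the URL.
blocks = [
    (f"GET /item?id={n} HTTP/1.1\r\n"
     "Host: news.ycombinator.com\r\n"
     "User-Agent: ExampleBrowser/1.0\r\n"
     "Accept: text/html\r\n\r\n").encode()
    for n in range(50)
]

# Each ~120-byte block compressed entirely on its own.
independent = sum(len(zlib.compress(b, 9)) for b in blocks)

# One shared context across all blocks, sync-flushed after each so
# every block remains individually decodable.
comp = zlib.compressobj(9)
shared = sum(len(comp.compress(b)) + len(comp.flush(zlib.Z_SYNC_FLUSH))
             for b in blocks)

print(independent, shared)  # the shared context wins by a wide margin
```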
> One ASCII character names? 255 (edit: 37 alphanumeric) possible headers, of which only a few might correlate to what you'd expect?
I wasn't suggesting encoding "Host: news.ycombinator.com\r\n" as "Hnews.ycombinator.com\r\n" but as "H:news.ycombinator.com\r\n". As long as you keep the colon, you can still use long names for other headers.
> Trying to fit an HTTP field parser into one line is not the way to win a programming argument.
You said parsing would be "simpler". It's going to be hard to get simpler than something that you can fit into one line.
> Perhaps you should focus on real solutions to real problems versus quoting what decade network application programming began in.
Well, that's kind of what I'm saying: let's focus on solving real problems, instead of recreating new ones that we'd already solved decades ago, in order to "solve" non-problems like HTTP header encoding.
So you'd rather have an illegible ASCII representation than an illegible binary representation. This is why I hate getting into these arguments; people will insist on completely illogical nonsense as far as they can take it.
> It's going to be hard to get simpler than something that you can fit into one line.
This is a terrible argument, as you can fit anything onto one line if you string it along enough. But here's one example of something simpler:
And I'm not proposing we merely solve the problems of HTTP. That would make too much sense; people are much more willing to put up with bullshit than do the hard work to make things work correctly. I was proposing we make things work better, simpler, and more reliably, and throw away the completely nonsense argument that using netcat once a year is worth a ridiculously impractical protocol format. But whatever, it's not like this thread will amount to a hill of beans.
Your actual parser there is the definition of frame_struct, which you left out; and, as others have pointed out, if you're putting ints in there, you need to ntohl them. Also, you probably need some kind of extensibility.
And I don't really think "H:news.ycombinator.com" is quite as illegible as your suggested 32-bit integer space — which, by the way, is small enough that you'll probably need a central registry to prevent header name conflicts — and it also occupies only two bytes instead of four for the header type. So, from my point of view, the "completely illogical" thing is to go from, "The header names currently in HTTP are too long!" to "Therefore let's replace them with 32-bit integers in a binary protocol" instead of "Therefore let's shorten the header names in HTTP", which solves the problem more thoroughly and with less collateral damage.
And what is this about "if you string it along enough"? We're talking about a parser (for RFC-822 headers without continuation lines) that fits into 110 characters, here, without the least obfuscation. Less than a Tweet. In fact, I just Tweeted it. And it worked on the first try.
> the completely nonsense argument that using netcat once a year is worth a ridiculously impractical protocol format.
You know, we did kind of try binary protocols already: the whole IPX stack, CIFS, X.everything, SNMP, TFTP, ICB, Sun RPC and thus NFS and NIS, and so on. A few survive in common use: DNS, TCP, IP, ICMP, SSL, SSH, BGP, and to some extent, SNMP. And there are lots of them working fine inside of particular companies, rather than between implementations by different vendors. But for the most part, they've been replaced with textual protocols, despite the lower efficiency and in many cases the first-mover advantage: HTTP, SMTP, and IRC, and previously FTP, Gopher, and Finger. You seem to be arguing that was an accident, or a mistake. It's not.
> But you know what? They do go away if you tunnel over a deflate-compressed VPN, or if you increase the available bandwidth by a few percent. Maybe instead of trying to take us back to bug-prone 1960s designs you should be working on that.
This was mentioned once in the IETF discussion, before someone said "but uhm, SPDY is binary, and we have data from SPDY, and yeah".
After that, everyone was too busy discussing how horribly lax and ambiguous text formats can be, before jumping off into a 20-email discussion about which endianness should be preferred, how clients and servers should determine which one to use, and whether clients should perhaps support both kinds of endianness.
All without a hint of irony. It's like a Bizarro world IETF discussion.
>I would have liked to see an ascii based protocol
I disagree; I think HTTP should have been a simple binary protocol from the start, and HTML should have required compilation into a binary format.
How much work would it really have been? htmlcmp foo.html producing foo.bhtml. No whitespace, no end tags, one- or two-byte tags, etc. Strictness in the reference HTML compiler implementation could have saved the web from all the stuff outside the actual standard that browsers (and other tooling) now have to support (so they don't "break" the web).
I'm not suggesting anything as crazy as the Flash binary format (I wrote a Java Flash player once...), but when I started to write things like proxy servers and HTML minifiers I was blown away by the extreme inefficiency of HTTP/HTML.
I believe that the "world wide web" took off for two reasons. It was completely free and it was incredibly easy for ANYONE to make a website. If you needed to "compile" all HTML it would have discouraged a lot of casual experimentation. Not everyone understands what a compiler is.
Just as an example I started playing around with HTML when I was about 12 (in 1998), it was easy and I got instant results. A year or so later I tried to learn Perl and quickly gave up because I couldn't get my first script to run. It was another year before I tried to "program" again and became hooked.
HTML and HTTP were successful because they were incredibly easy to understand. I wish we could keep it that way.
Wouldn't it have been cool if there were an efficient binary format and a nice human-readable and human-editable format, with a well-defined transformation from one to the other?
Back in the 1990s, Ron Rivest came up with canonical S-expressions, which are fully capable of representing the same information represented by HTML, XML, ASN.1—but can be either human-readable or binary.
Here's a simple example: (p (* class (footer x-treme)) "This is a " (b "footer") "."). Very human-readable, very human-editable, and easily machine-readable, wouldn't you agree?
As a binary format, it would be (1:p(1:*5:class(6:footer7:x-treme))10:This is a (1:b6:footer)1:.). Still geek-manipulable, if necessary, and extraordinarily simple to parse. And it has the advantage that it is a distinguished encoding--any of the myriad human-readable encodings all reduce to the same canonical encoding, which has advantages for hashing.
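For the curious, a canonical-S-expression encoder is only a few lines. This sketch (atoms only, no error handling) reproduces the binary form above exactly:

```python
# Minimal canonical S-expression (csexp) encoder: atoms become
# <length>:<bytes>, lists are wrapped in parentheses.
def csexp(node):
    if isinstance(node, (list, tuple)):
        return b"(" + b"".join(csexp(n) for n in node) + b")"
    data = node.encode() if isinstance(node, str) else node
    return str(len(data)).encode() + b":" + data

doc = ["p", ["*", "class", ["footer", "x-treme"]],
       "This is a ", ["b", "footer"], "."]
print(csexp(doc).decode())
# (1:p(1:*5:class(6:footer7:x-treme))10:This is a (1:b6:footer)1:.)
```

The length prefixes are what make the encoding canonical: there is exactly one binary form for any given tree, which is what gives it the nice hashing property.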
I'd argue the big thing that made HTML so easy to play with was permissiveness, not the lack of an explicit compilation step.
Perl doesn't tend to have an explicit compilation step either. You could write an HTML compiler as permissive as today's HTML parsers, and you wouldn't have the same frustration you had trying to get started with Perl.
Gopher was simpler to understand than HTTP/HTML. But it wasn't as flexible, and didn't have as nice support for pretty pictures and midi files.
If anything, HTTP/REST was much more complex. HTML was somewhat mystic, yet it let you move things around more easily and actually design things rather than just present them. And hyperlinks were really cool.
>HTML and HTTP were successful because they were incredibly easy to understand. I wish we could keep it that way.
The web is big enough today that we can afford to make it more efficient since we have so many professionals that don't require the kind of implicit hand-holding provided by the original implementations. Back then everyone was a newbie and there were no web professionals.
I think we need to be careful about claiming it's more efficient. There seemed to be significant disagreement last year about how much more efficient SPDY was, across the various benchmarks. From memory, the consensus seemed to be that it's only 5-10% quicker. Does anyone have any more up-to-date benchmarks?
I hate the idea that you would have to be a "web professional" in order to get started today. We should embrace and keep the culture the web was built on.
That's an incredibly good way to lock out new newbies. I don't know how many people I know who got into programming through web development. Just because existing web developers don't necessarily need a simple solution doesn't mean new web developers couldn't take advantage of it.
For what it's worth, I started messing with HTML in elementary school, but had no idea how simple HTTP/1.0 was until I hit the 400 level classes in college. Editing in Notepad was the only way to go back then: even Dreamweaver got too complicated. Likewise, my first foray into JMS/message-queue type stuff was with STOMP, the simple text-oriented message protocol: there was no way I was going to understand the 'open wire protocol' for ActiveMQ. Ain't got time for that!
All that said, I think peterwwillis is right: http/X.0, as long as it's well-simplified, is better in binary than it is in text. Ideally, there's a bijection between your text-mode and binary-mode (like a lens), where it's easy to parse (rely on your toolset to do the translation back and forth) and easy to put on the wire. Forth is a good example of how to do it sanely.
I think that to get maximum innovation in anything technology-related, it should always be simple enough to get started at any level; then, as you gain more experience, the specs allow you to do more for performance and optimization (the AS2/EDI-over-HTTP spec, for one, adds encryption and compression on top of HTTP/HTTPS: http://www.ietf.org/rfc/rfc4130). Otherwise it is premature optimization and a bit of a wall.
Like with gaming: each game should be simple to start but deep to master. That is what this layer is all about; lower down the OSI stack there is much more of a wall for beginners. Never lock out beginners, as they can become masters with time; don't hide the entrance to the labyrinth. Leave things approachable but, with professional experience, modifiable to perform better. We have all that now; a competing binary protocol will never see the innovation that a simpler one like HTTP will. Higher up the stack you can see how this was better for exchanging data in standard ways: from old-school binary blobs, to CSV files, to XML files, to JSON. It's the same reason REST services won over SOAP: simplify. There are binary JSON formats (BSON in Mongo, and also MessagePack), but guess which is used more for services and exchanging data, the textual one or the binary ones? The binary formats work well in situations where both endpoints are controlled by the same entity.
Binary and more locked down/optimized formats and messaging have their place but the start/base should always focus on simplicity over optimization.
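The trade-off reads concretely in code. Here's a quick standard-library sketch (the record and field layout are invented) of the same data as self-describing text versus a packed binary struct:

```python
import json
import struct

record = {"id": 1234, "temp": 21.5, "ok": True}

# Self-describing text: the field names travel with the data.
text = json.dumps(record).encode()

# Packed binary: smaller, but the field names, order, and types live
# only in out-of-band documentation shared by both endpoints.
binary = struct.pack("<If?", 1234, 21.5, True)

print(len(text), len(binary))
```

The binary form is a fraction of the size, and completely opaque without the spec; that's the "both endpoints controlled by the same entity" situation in miniature.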
The general rule in exchanging via standards is: be liberal in what you accept, conservative in what you send. Being all-binary all the time is a backwards step that is conservative in what is accepted. I also think it would lead to a host of difficult-to-debug problems, based on work I did implementing the AS2/HTTP RFC; streaming was one of them, and of course encoding/decoding, which can eat hours of work if you can't visually inspect the content at some level.
But I don't think we should be raising the barrier to entry and making it more difficult for a young hacker to get started. Like the above commenter I started hacking around with HTML and CSS when I was around 12 and it led me into linux/python/php/web frameworks.
There's something beautiful about being able to teach someone how to write a basic HTML document in 20 minutes. My mother can easily understand what HTML is doing, I doubt she'd understand a compiler.
I see where you're coming from, I had a similar experience playing with HTML as a kid. That said, "compiling" HTML from text to an efficient binary format wouldn't have prevented browsers from including those compilers themselves and transparently compiling all plain text pages received. Newbies could keep on serving plain text HTML (until someone chastises them for it) while the web as a whole would benefit from the increased efficiency that comes from binary pages being the norm.
Your argument doesn't really apply at all to HTTP, though. No one "got their start" peeking at HTTP requests. It's solely the domain of those working on infrastructure (for some definition of that word). Anyone with any idea of what HTTP really is (i.e. more than "that thing at the start of a URL") should have no problem using a tool to convert between a binary and text representation of the protocol. It's not like you can just magically pull HTTP requests out of the ether; you need a tool anyway. There's no reason why curl (for example) couldn't transform a request you write into an equivalent binary protocol, or do the inverse operation when it receives the response. It's utterly ridiculous to me that there is such inefficiency in HTTP just to make things slightly easier on the implementers of curl and Wireshark.
HTTP is a protocol designed to communicate information between two machines. There's no reason it should be human-readable. Trying to make a protocol that's easy for humans to read and write invariably means making it harder to write software for.
A while back, I wrote a daemon that checked a bunch of network stuff in a loop. I needed a UI for it, so I made it speak http. No library, just raw GET support. It worked. Didn't take long to write, either. I would never have tried that with a binary protocol.
I thought you said you weren't using existing libraries? Or did you just mean for the server? You made it sound as if you thought writing a server for a custom binary protocol was harder than writing an HTTP server from scratch.
If you're aiming for an existing ecosystem, then sure, there's no reason not to use HTTP, assuming you make use of established libraries. But widespread use is HTTP's only real virtue; the protocol is considerably more difficult to implement correctly than it should be.
No libraries. Ordinary web browsers for the client side.
A minimal HTTP server that recognizes GET requests, finds the url, throws away everything else, runs the relevant code and returns an HTML document of the results is actually really easy to write. And easy to integrate into an existing event loop.
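The throwaway pattern described above is small enough to sketch in full; the handler table and paths here are hypothetical:

```python
# Recognize GET, keep the path, throw the rest away, return HTML.
routes = {"/status": lambda: "<h1>all checks passing</h1>"}

def respond(raw_request):
    # Only the request line matters; everything after it is discarded.
    request_line = raw_request.split(b"\r\n", 1)[0].decode("ascii")
    method, path, _version = request_line.split(" ")
    handler = routes.get(path)
    if method != "GET" or handler is None:
        return b"HTTP/1.0 404 Not Found\r\n\r\n"
    body = handler().encode()
    return b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n" + body

print(respond(b"GET /status HTTP/1.0\r\nUser-Agent: x\r\n\r\n")
      .split(b"\r\n")[0].decode())
# HTTP/1.0 200 OK
```

Wire this up to a listening socket in an existing event loop and an ordinary browser can talk to it, which is the appeal being described.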
Sure, if you cut corners, place some hard buffer limits, discard the vast majority of the specification, and don't care about breaking parts for brevity, HTTP starts to become tractable.
But even a minimal HTTP server, even one that ignores things like HEAD requests, is still going to be more complex than, say, receiving the URL raw, or even a URL wrapped in a simple structure like a netstring.
Actually implementing a full, correct HTTP server would be one or two orders of magnitude more complex than implementing a more modern protocol from scratch.
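For scale, the netstring framing mentioned above ("<length>:<payload>,") takes only a few lines to implement correctly:

```python
# Netstring framing: "<length>:<payload>,"
def netstring_encode(data):
    return str(len(data)).encode() + b":" + data + b","

def netstring_decode(buf):
    """Return (payload, remaining bytes); raise on malformed input."""
    length, _, rest = buf.partition(b":")
    n = int(length)                      # rejects non-digit length fields
    payload, comma = rest[:n], rest[n:n + 1]
    if comma != b",":
        raise ValueError("missing terminator")
    return payload, rest[n + 1:]

wire = netstring_encode(b"/path/to/resource")
print(wire)  # b'17:/path/to/resource,'
```

Compare that to even a corner-cutting HTTP parser and the complexity gap the comment describes is apparent.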
Despite the "extreme inefficiency", HTTP/HTML have managed to work successfully for a very long period of time (and even worked decently well on very slow hardware in the '90s).
There is actually a binary format that much of the web content is compiled into: gzip. It's remarkably effective.
I will grant that following Postel's Law means that browsers have more work to do to ensure that all kinds of "busted" stuff on the web continues to work, but I'd guess that, at this stage, that work is pretty small compared to everything else browsers are trying to do.
Compiled binary HTML seems to me would be the end to the open nature of the web. The fact that anybody can view the HTML source is probably irrelevant from a network point of view. But I consider it a fundamental aspect of the web.
Hinder an open web? Maybe I am wrong, but my gut tells me this argument comes from developers who are not used to working with compiled languages.
If you really hate the idea of having to run this compiler, it could be automatically run by apache, IIS, nginx, etc when serving your page for the first time. This is all hypothetical of course since such a standard does not exist.
What's interesting is that what you describe:
>this compiler, it could be automatically run by apache, IIS, nginx, etc
...essentially already occurs in many cases in the form of compression such as gzip. These files are also automatically extracted by the browser and are essentially as transparent as non-compressed ones.
So I think you both have a point:
I agree with jakejake that the web must remain transparent.
I agree with you that as long as sufficient tooling is freely available, it doesn't matter how the underlying protocol works.
I.e., a binary representation of the page is still fairly transparent if there are plentiful tools that will deserialize it into a page object that can be expanded/perused/manipulated and then re-serialized when e.g., Ctrl+s is hit.
What would be bad is if 'View Source' showed something like:
...and you needed to spend umpteen hundreds to get a decompilation tool that only gave you an obfuscated/inexact reproduction of the recipe for the page.
Given IE, Safari, Firebug and Chrome all support ADDITIONAL developer tools, how likely do you think it is that you'll be stuck with raw binary when the browser has to decode this to the same internal representation used for HTML today?
Nothing will change on the front-end, just the servers will get faster and new bugs will be introduced. :)
Meta: Is it too late for you to edit your post to break up your string of digits or put it in a code block (prefix with two spaces, surrounded by blank lines)? It's forcing the whole comment page to have a horizontal scrollbar.
Because the web took off because non-developers and proto-developers were able to easily create pages, look at HTML/CSS/JS, and become developers. It seems trivial to you, but having a compile step complicates things beyond the reach of many.
Plain text is more open and accessible than binary, full stop.
Not that it really matters, but since I'm the one who brought it up and your gut was telling you that I'm not likely familiar with compiled languages - I started programming in Pascal and Assembly in the 80's, moved onto Java and C# and lately I spend a lot of my time writing Objective C code.
It'll be no skin off my nose if the web turns into a compiled protocol. I don't know if it's even necessary to continue with plain text web sites these days. But, it most certainly was a major deal initially for me to just "view source" and see that it wasn't a black box of voodoo. The low barrier to entry is one of the major reasons the web took off.
I'll admit I'm an old timer in this business, probably ready to be taken out back and shot! So I have no idea if you youngsters were similarly inspired by viewing the source code on a web site? Maybe with all of the complex client side code and minimized scripts that it isn't even relevant anymore? It probably just looks like gobbledygook to a non-programmer these days.
Genuine question: Is it really that big of an issue? We still have to use tools (telnet/curl) to inspect text-based protocols. We don't examine raw bytes by hand, do we? In other words, isn't every protocol ultimately a binary one and the issue here is only a matter of degree?
I think it is. We have a large amount of tooling in existence to inspect text based formats. Binary format inspection requires new tooling to be written and deployed. This is a major barrier.
I may be a bit old-school, but I learned HTTP via telnet, a tool definitely not written to inspect HTTP, and I still use it when I’m trying to debug things. Not having to install tooling is still something I take advantage of.
Obviously binary formats can and do succeed, and with sufficient backing, tools will be written and deployed. But if HTTP hadn't been so easily inspectable, I don't think it would have been nearly as successful the first time around, when the benefits of the protocol were less well known.
And I do think some of the "culture of the protocol" will be lost moving to binary°. It just isn’t as hacker-friendly or at least newbie-hacker-friendly and that sends a message.
° Of course this is already happening with HTTPS, so this is probably not a winnable fight. And the benefits of binary formats are significant, so it might not even be a fight I want to win. Still something is being lost here.
> Genuine question: Is it really that big of an issue?
Text-based means everything is open for inspection and self-evident if well designed. This goes away with binary.
Not to mention this protocol will need to be implemented everywhere and used by everyone. This means everyone will need to understand it as well. Open text-based protocols support these requirements simply by being open and text-based.
As if there's not enough bad code around already: in any language other than assembly/C/C++, the reduction in code quality and clarity, and the number of additional issues that appear the second you change the word "text" to "binary", is staggering. Let's not go there if we don't have to.
Most debuggers today can dump strings fine. Dumping binary, while not impossible, brings extra hurdles, and putting hurdles into debugging pretty much means you will end up with more buggy code.
As if that's not enough to make you think "hmmm", there's the issue of future-proofing and extending the protocol. That is much easier to do cleanly in a text-based format; a binary HTTP 2.0 will be brittle and short-lived.
> We don't examine raw bytes by hand
Because that is very, very impractical. And thus making that a de-facto necessity for working with HTTP 2.0 doesn't make much sense.
I say we should look at it the other way around: Considering everything a binary-format will cost us (just some of those mentioned above), what benefits does it bring us to justify this huge cost?
I say the answer to that is near none, and in an ideal world that would be the end of the discussion.
An argument that it's a fake issue: the fact that best-practices compliant HTTP applications already tend to run under HTTPS, and that understanding what's happening at an HTTPS level in operationally-relevant ways (namely: are we performant enough? and are we secure enough?) already requires parsing of that binary protocol.
A tool already capable of doing this is socat.
ex: socat -v -x - tcp:google.com:80
Add >/dev/null if you want to discard the plain-text output.
It also works over SSL:
$ socat -v -x - openssl:google.com:443,verify=0
I agree. HTTP, like many core internet protocols was designed to work with any client setup, even someone sitting at a telnet terminal sending hand-rolled HTTP requests. FTP is just the same, and SMTP, etc. While very few people these days will actually wind up using telnet to access network resources, it is a very successful design and one which ought not be abandoned without considerable thought.
That's a childish way of asking for what you want: (1) opening an issue in the wrong forum and (2) failing to familiarize yourself with the previous discussion threads. You know this is a controversial topic, please act more maturely in the future.
Any time a bug report is linked in Hacker News the community ends up spamming the tracker with our enlightened opinions. Show some restraint.
> (1) opening an issue in the wrong forum and (2) failing to familiarize yourself with the previous discussion threads
OK. Fair enough, but I did look for an existing issue on the matter. There were none.
> You know this is a controversial topic, please act more maturely in the future.
Personally, I find attempting to lock down open internet protocols by transforming them into some sort of binary obscurity rude, and against everything the internet was built on.
I don't think everyone is aware that this is happening right now, and I see nothing wrong with attempting to bring focus to the issue if that is what's needed. Seeing how this draft still outlines a binary protocol, it clearly still is in need of attention.
Just because some Google-heads somewhere decided that, by completely ignoring everything the internet has taught us so far, and in the name of premature optimization, they can shave 2ms off their page-load doesn't mean the internet should pander to their interests.
If anything is controversial it is how this is all being done without any documentation showing us what benefits we get from the costs associated with a binary protocol. That is amazing. Completely and utterly amazing.
But OK. Let's say I listen to your guidance: Where would be a good place to raise this issue? Where would I take my "childish" issues to ensure they get the attention they so much deserve?
As they said, it belongs on the mailing list. I too find it quite horrible that they demand technical arguments for why something should _not_ change. Usually one needs technical arguments to justify a change.
There is no reason why someone couldn't build a tool to send/receive binary messages and print them out as text. We have Wireshark anyway. We just need a CLI translator, or a program that lists all the possible valid inputs for a given connection state and lets you select one (and allows raw input, in case you want to try to break something).
For most cases you will probably be able to get the old HTTP protocol anyway. The only potential issue would be for people who are implementing the HTTP 2.0 protocol itself; just about everyone else in the world will be using a webserver or at least a library.
In fact, raising the bar to prevent people from developing their own HTTP 2.0 implementations would probably be a good thing; it would cut down on the number of bug-riddled implementations.
In any case I don't think it makes sense to slow down the whole internet just for a few developers.
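The CLI translator suggested above could start out very small. Here is a sketch in Python, assuming an 8-byte frame header (16-bit length, 8-bit type, 8-bit flags, then a reserved bit and a 31-bit stream identifier); the frame type codes are illustrative and should be checked against whichever draft revision you target:

```python
import struct

# Sketch of a CLI translator for the draft's binary frames, assuming an
# 8-byte frame header: 16-bit length, 8-bit type, 8-bit flags, then a
# reserved bit and a 31-bit stream identifier. The type codes below are
# illustrative, not authoritative.
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x6: "PING", 0x7: "GOAWAY"}

def describe_frame(header):
    length, ftype, flags, stream = struct.unpack("!HBBI", header)
    stream &= 0x7FFFFFFF  # drop the reserved bit
    name = FRAME_TYPES.get(ftype, "UNKNOWN(0x%x)" % ftype)
    return f"{name} len={length} flags=0x{flags:02x} stream={stream}"

# Example: a PING frame header on stream 0
print(describe_frame(struct.pack("!HBBI", 8, 0x6, 0, 0)))
# → PING len=8 flags=0x00 stream=0
```

A real tool would loop over a socket, read each header, then read `length` more bytes of payload, but the text-ification itself is this trivial.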
Binary also opens up the possibility of hardware implementations. Think webserver-on-a-chip.
HTTP 1.x was fine for the early 1990s. It's 2013 and Windows has won. Unix is dead.
Maybe you have a nice Ubuntu box and that's fine. I'm talking philosophically. Modern developers favor the performance and functionality advantages that large applications, deep APIs, and binary communication protocols provide over the transparency advantages that small, highly focused tools, orthogonal APIs, and human-readable communication protocols provide.
The essence of Unix is not in the name of Unix, nor in the code of Unix, said Master Foo.
And then Chuck Norris roundhouse-kicked him in the face.
Point being that if by Unix you mean an OS running a Unix-inspired kernel, then yes, Unix won. But if by Unix you mean the design patterns and philosophical principles that have long inhered in Unix development, then Unix is, if not dead, then on its last legs and soon to be discredited.
The "Unix philosophy" has long favored small, easily composable tools each with a specific purpose, orthogonal APIs which were as small as possible, and textual transmission formats which are easy for humans to read and write. There were exceptions (X, Emacs), but these exceptions tended to support and play well with the Unix philosophy even if they didn't 100% espouse it.
By contrast we may propose a Windows philosophy which favors large, do-everything applications over small tools (because that's how people are accustomed to using PCs starting from the DOS days when only one program could be up at a time), heavyweight frameworks and oftentimes entire inner platforms (because in a world where time-to-market is paramount, developers shouldn't have to think very hard to begin cranking out apps for the new technology of the week), and binary file formats (because the damn thing has to run in 640k, a text parser won't fit).
Look at the platforms you mentioned. Mac OS X, iOS, and Android are all app-centric, not tool-centric. You can treat Mac OS X as a Unix box if you want, but it's hard to do so with the other two. Furthermore, when you write an app for these platforms you are not targeting the Unix kernel or libraries but an inner platform built on top of them. Which brings us to binary file formats -- like HTTP 2.0.
Wow. I don't see that at all. You must not use any of the platforms you mentioned.
If you look under the hood on OSX, iOS, or Android, they are all composed of smaller single-purpose components. If you are arguing that they do not use interprocess communication to join these components together, then you are correct. However, that is not the Unix philosophy. A great example is Outlook vs OSX/iOS Mail/Calendar/Contacts/Notes. On OSX and iOS, those applications each try to do one thing well. On Windows, Outlook tries to do everything.
Beyond that, just about every embedded device and the vast majority of servers now run Linux (or a Unix variant). Just look at the list of packages on those Linux devices and you will understand that it clearly is built around combining small, single-purpose components.
Given that, it is hard to argue that the Windows philosophy has won. In fact, Windows seems to be struggling (as evidenced by slowing interest and the reorg/rearch/rebrand thrashing in Redmond). Looking at the market, I'd argue that the Unix approach won quite some time ago.
Yes, the reason is that starting a new stream multiplexed inside an already-open TCP connection is faster than starting a new TCP connection. Attempts to improve TCP itself are slowed down by slow uptake of new Windows versions.
While there have been some tweaks around the edges in terms of congestion control and the like, the TCP wire format has not changed in over 25 years and it has very little to do with Windows. It has to do with the all the middle boxes and routers in the ossified backbone of the internet that choke if they see anything over IP that is not exactly what they expect.
That's great, but it means it'll always be a userland library rather than an OS-supported networking protocol... which means the API will never be a de facto standard akin to the BSD sockets API. The reference implementation is also in obtuse-looking C++... which will hinder bindings to other languages.
In short, don't expect to see this outside of Android or Chrome.
On the plus side, this is exactly what UDP was for in the first place.
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.
Being open and readable is also a check on this. Compare, for instance, the HTTP spec or JSON or XML files to the binary file formats of applications. If it is hidden from sight, things get ugly fast. Case in point: the PSD file format and any Office file format (http://www.joelonsoftware.com/items/2008/02/19.html). Binary formats are usually only a good idea at the onset; with change they become a pile. Binary formats only really work well when you control both endpoints (and even then they get crufty); for exchanging information, being unreadable leads to many bad things and is a great step backwards.
I work in game development and application development and the former really loves binary formats (slowly changing away from that with server and editable needs). In many bugs or crashes the root of the problem is some offset in a binary file or some incorrect custom binary file format node that breaks everything after it. Readable, keyed and debuggable formats are so much better at their root (they can be binary, base64'd, compressed after but at the root they should be standard in some way and able to change without breaking the whole thing).
Their main reason for going with SPDY seems to be that rolling out an application-level protocol is easier, while rolling out a transport-level protocol would require changing routers and such. That is almost certainly true. Fixing the multiplexing problem in an HTTP-specific way rather than generally does seem like an unfortunate hack, but it's probably the pragmatic approach.
Don't confuse the cart and the horse. The IETF standardization process has never been less relevant than it is today. The reality of deployment of a "next-generation HTTP" involves two things, neither of which the IETF has any say in: (a) whether anyone can improve HTTP as it runs between a mainstream browser (or some as-yet unforeseen browser replacement) and a content serving backend in a way meaningful enough to drive adoption, and (b) buy-in from the major browser owners.
That's the way it's supposed to work; the horse is meant to drag the cart. The RFC database is littered with cart-led insurgencies that went nowhere. If binary HTTP is one of those, it'll join them as a historical curiosity.
HTTP/2.0 enables a server to pre-emptively send (or "push") multiple associated resources to a client in response to a single request. This feature becomes particularly helpful when the server knows the client will need to have those resources available in order to fully process the originally requested resource.
If someone asks for your HTML, you might as well send them the CSS, JS, and images it is going to use. That would cut down on requests. The only thing more efficient than compression is not having to ask in the first place.
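The decision a push-capable server makes can be illustrated with a toy: scan the HTML you are about to send and collect the subresources the client will ask for anyway. The `push_candidates` helper and its regex are invented for illustration, not taken from any spec:

```python
import re

# Toy illustration of server push: find the subresources referenced by
# an HTML page, i.e. the things a server could send pre-emptively.
# Real servers would use configuration or a proper HTML parser, not a regex.
def push_candidates(html):
    return re.findall(r'(?:href|src)="([^"]+\.(?:css|js|png))"', html)

page = '<link rel="stylesheet" href="/app.css"><script src="/app.js"></script>'
print(push_candidates(page))  # → ['/app.css', '/app.js']
```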
I still feel like this is just being rammed through far far too fast. SPDY might well be the right way to go, but it's only been deployed for a small number of very large sites for a couple of years at this point. It was a foregone conclusion when the schedule was announced for http2.0 that it would be spdy, because only spdy has demonstrable value in production. There was no time for anyone to come up with or evaluate an alternative. But on the other hand, there hasn't been near enough time to demonstrate that spdy is good for the whole internet, which is what blessing it with the http protocol name implies.
And I just don't see why it should be rammed through. SPDY has a connection upgrade path, that afaik is largely unchanged for http2.0 anyways.
I think you're seeing a defence of spdy in my post that isn't really there. By value I mean simply that it is at least functional. Any alternative proposal, given the short proposal cycle, would not even have that.
I don't get where the "binary" part is. Does "GET /docs/index.html HTTP/1.1" get converted into binary? What are the benefits of this, given that it's really a limited amount of information compared to the body of the request/response which can of course be binary - or any other format.
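Roughly, yes: the request line becomes structured binary fields instead of ASCII text. Here is a toy comparison; this encoding is invented purely for illustration and is NOT the actual HTTP/2.0 wire format:

```python
import struct

# Toy comparison only -- NOT the actual HTTP/2.0 wire format. The point:
# the method and version become structured fields (here, one hypothetical
# numeric code plus a length-prefixed path) instead of ASCII words,
# spaces, and CRLF separators.
text = b"GET /docs/index.html HTTP/1.1\r\n"

METHODS = {"GET": 1, "POST": 2}  # hypothetical numeric method codes
path = b"/docs/index.html"
binary = struct.pack("!BB", METHODS["GET"], len(path)) + path

print(len(text), len(binary))  # the binary form is noticeably shorter
```

The savings on one request line are small; the draft's real gains come from header compression and multiplexing, not from this encoding alone.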
In short: While TLS is a really good idea for just about every user-facing site on the web, there are many applications where some combination of the administrative burden, need for intermediaries and performance cost make TLS sufficiently undesirable to preclude adoption of a spec that required it.
You still can't self-sign certificates and rely on certificate pinning for transport security without domain verification. With all of the security concerns people have with governments logging all unencrypted internet traffic, I was hoping HTTP 2.0 would require or at least optionally allow self-signed cert TLS for every http: connection (with no UI indication to the user that they are secure) and require CA-signed cert TLS for https:
We're also still stuck with DEFLATE, which is vastly inferior to modern compression formats like LZMA.
This was my main objection as well, after reading this. Make SPDY a standard, people say it's overengineered, but Goog and MSFT say it'll get us 10% performance benefit. Whatever, I trust them. And hey, think of all the next-generation tcpdump-style tools we'll get to write to decode the multiplexed connections (assuming any existing ones for SPDY won't already suffice).
But while you're going to the trouble to push a new version of HTTP, why not put some sort of mandatory encryption in there? Make the browser generate its own RSA key when it's installed? Require pinned self-signed certificates at the absolute minimum? Hell, I'm not an expert, I just want the NSA to have to work harder.
Encryption is only half the problem; authentication is the other. Self-signed certificates don't authenticate the web server, and allowing them would open the web up to MITM attacks. What we need is some form of decentralised certificate authority; I have no idea how that could work, though.
HTTP already supports LZMA; it's just the browsers and web servers that don't. If a browser sent the header:
Accept-Encoding: compress, gzip, lzma
and the server supported it, it would "just work". Unfortunately, none of the major browsers or servers have implemented it.
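The server side of that negotiation is simple. A sketch (simplified: real parsing must also handle q-values and "*"; the `SUPPORTED` preference order is hypothetical):

```python
# Sketch of server-side Accept-Encoding negotiation: pick the first
# encoding, in the server's own preference order, that the client
# offered. Simplified -- ignores q-values and "*".
SUPPORTED = ["lzma", "gzip", "compress"]  # hypothetical server preference

def choose_encoding(accept_encoding):
    offered = [e.strip() for e in accept_encoding.split(",")]
    for enc in SUPPORTED:
        if enc in offered:
            return enc
    return None

print(choose_encoding("compress, gzip, lzma"))  # → lzma
print(choose_encoding("compress, gzip"))        # → gzip
```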
Unencrypted HTTP is already open to MITM attacks. What I proposed was using self-signed certificates in conjunction with certificate pinning, which means that as long as your first connection to a website isn't MITMed, later connections cannot be MITMed or snooped on. As for the compression, I was referring to LZMA for header compression. I am well aware of implementation efforts of LZMA in Firefox and Chrome.
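The trust-on-first-use pinning being proposed can be sketched like this. The `check_pin` helper is hypothetical; a real client would persist the pins across sessions and would normally pin a hash of the certificate's public key rather than the whole certificate:

```python
import hashlib

# Sketch of trust-on-first-use certificate pinning: remember the hash of
# a server's certificate on first contact, reject any different
# certificate later. `pins` stands in for a persistent store.
pins = {}

def check_pin(host, cert_der):
    fp = hashlib.sha256(cert_der).hexdigest()
    if host not in pins:
        pins[host] = fp        # first connection: trust and remember
        return True
    return pins[host] == fp    # later connections must present the same cert

print(check_pin("example.com", b"cert-A"))  # → True  (first use)
print(check_pin("example.com", b"cert-B"))  # → False (pin mismatch)
```

This is exactly the "as long as your first connection isn't MITMed" trade-off: the scheme's security rests entirely on that first exchange.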
According to the shills employed by or rooting for Google, here on HN and elsewhere on the internet, it's not silly: It is paramount.
See, it allows the request for www.google.com, including all of Google's tracking cookies, to fit inside one TCP-frame, causing a 5ms improvement in load time, and Google has evidence that this means they make more money.
Never mind the open, exploratory nature of the internet, and how text-based protocols were what made the internet into what it is today. We're going to throw away everything the internet has taught us about that because Google says we should.
This thing is fucked up beyond belief.
Edit: found a link to the IETF discussion about this:
It admits in plain sight that the only thing they care about with HTTP 2.0 is solving Google's massive-scale issues at the cost of everyone else:
> I finally admitted it was a dead end. At the moment the challenges consist in feeding requests as fast as possible over high latency connections and processing them as fast as possible on load balancers
A good, open, self-documenting protocol didn't suit Google, so let's throw it away. It's a "dead end", after all.
I really disliked the following part: A client MAY immediately send HTTP/2.0 frames to a server that is known to support HTTP/2.0, after the connection header. [..] Prior support for HTTP/2.0 is not a strong signal that a given server will support HTTP/2.0 for future connections. (Section 3.4)
That means, whenever you open a new TCP connection to a http2 server, you must start with HTTP/1.1 and perform the Upgrade negotiation anew.
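Concretely, each new connection has to begin with a plain HTTP/1.1 request carrying the Upgrade headers. A sketch of what that looks like on the wire; the upgrade token and header names follow the draft text, and the SETTINGS payload is elided:

```python
# The per-connection negotiation the draft requires, even against a
# server known to support HTTP/2.0: an HTTP/1.1 request with the
# draft's Upgrade headers. The base64url SETTINGS payload is elided.
request = (
    "GET / HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Connection: Upgrade, HTTP2-Settings\r\n"
    "Upgrade: HTTP/2.0\r\n"
    "HTTP2-Settings: <base64url-encoded SETTINGS payload>\r\n"
    "\r\n"
)
print(request)
```

That round trip is the cost the parent comment is objecting to.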
Making the spec require HTTP2 support on all servers of a cluster if at least one server supports it would have been much better, performance-wise. The only reason I see why they kept it is transparent proxies (if your browser is using an explicit proxy, it should be able to figure out if the proxy supports HTTP2).
Yes, but storing "supports HTTP2" for a given server for a given time after connecting to that server and negotiating the upgrade would also do the job.
One of the problems with HTTP 1.x is that it is stateless - every time you want something from a server, you treat it as if it is the first time you talk to it. I understand this is by design, but as a consequence we are transmitting all the same headers to the server all the time... which can be mitigated with HTTP2, except that we still need to do that (send all headers) and the HTTP2 upgrade, every time we open a new TCP socket to that server.
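The mitigation amounts to delta-encoding headers across a connection: only what changed since the previous request needs to go on the wire. A toy version of the idea (illustrative only, not the draft's actual compression scheme):

```python
# Toy version of per-connection header compression: resend only the
# headers that changed since the previous request on this connection.
# Not the draft's actual algorithm -- just the underlying idea.
def delta(prev, cur):
    return {k: v for k, v in cur.items() if prev.get(k) != v}

first  = {"host": "example.com", "user-agent": "Foo/1.0", "path": "/a"}
second = {"host": "example.com", "user-agent": "Foo/1.0", "path": "/b"}
print(delta(first, second))  # → {'path': '/b'}
```

And as the comment notes, that state lives per TCP connection, so every new socket starts from scratch: full headers plus the Upgrade dance again.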
Hmm... I have a few scripts that do something like:
header($_SERVER['SERVER_PROTOCOL'] . ' 400 Bad Request');
Since I'm using SERVER_PROTOCOL, if I upgraded Nginx and it started to send HTTP/2.0 traffic to the server, would I need to rewrite this line? How would I send a specific status code using the new binary protocol?
A Microsoft employee is the editor on this document.
Also, how is a binary protocol used to "reduce transparency"? It's simply a different way of encoding, and once this takes off there will be a vast set of tools for analyzing HTTP/2.0. The purpose of binary is to reduce the data transmitted.
It's a rhetorical question. None of the leaks provide any evidence that Google actively cooperates with the NSA in any way. No evidence of equipment at Google data centers, no evidence of any kind of remote access for the NSA, other than the already existing legal channels.
The claim that Google had foreknowledge of PRISM, or worked with the NSA to build some kind of firehose for them is unsupportable given what we know, and is pure speculation on your part.
Thus, the claim of close cooperation with the NSA is frankly, ad hominem.
If we really want to get into the FYIs here, PRISM is not some network intrusion tool designed to siphon data away from large companies; it's just a stupid database where governmental agencies can throw their data, obtained by either NSA letters or a standard bench warrant.
Much like IAFIS is a standard database operated by the FBI.