OTOH, there are several things HTTP does ridiculously badly.
It all depends on what you're building. Evaluate the case, and sure, if you need a more specific protocol, design one yourself. It's simple to layer on encryption and compression if you need them.
Also the argument slightly reeks of the same "Everything knows how to deal with it!" that caused some people to use XML for everything regardless of how well it fit the job.
HTTP isn't perfect, and there are plenty of times when a custom protocol makes sense. However, I think the author is arguing against the "everything needs its own custom protocol" mindset which is all too prevalent in the corporate world. In my work for USPS, I've encountered numerous custom protocols, most of which are far inferior to vanilla HTTP and which cause endless headaches due to poor documentation, difficult debugging, limited extensibility, general bugginess, etc.
So I agree with the author and my own approach to protocols is to start off with the assumption that I'm using HTTP and only use something else if there's a good reason why HTTP isn't appropriate (performance being the most common dealbreaker).
Oh, you mean like SOAP or XML-RPC? Yes, I would prefer using HTTP as a protocol over using those any day.
Sarcasm aside, I don't see exactly what HTTP buys you over establishing a TCP connection to the server and talking to it directly. Points one and two, check. If your favorite language can talk HTTP, it can talk over TCP.
Besides, a URL like http://foo/bar?v1=x&v2=y&v3=z is in itself a type of protocol, namely the name/value pairs. Sure, I know HTTP and can debug problems at THAT level, but what about at the higher level? There's still a protocol (oh, I forgot that v4 is mandatory when calling bar, but optional when calling baz).
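To put it in code, a little sketch of what I mean (Python; the endpoints and the "v4 is mandatory for bar" rule are just the hypothetical from above, not anything real):

    # Even with plain HTTP, the server still has to enforce a higher-level
    # contract hidden in the query string -- HTTP itself knows nothing about it.
    # Endpoints and the v4 rule are invented for illustration.
    from urllib.parse import urlparse, parse_qs

    REQUIRED = {
        "bar": {"v1", "v2", "v3", "v4"},   # v4 mandatory here...
        "baz": {"v1", "v2", "v3"},         # ...but optional here
    }

    def validate(url):
        parts = urlparse(url)
        endpoint = parts.path.rsplit("/", 1)[-1]
        params = set(parse_qs(parts.query))
        missing = REQUIRED.get(endpoint, set()) - params
        if missing:
            raise ValueError("missing parameters: %s" % ", ".join(sorted(missing)))

    validate("http://foo/baz?v1=x&v2=y&v3=z")        # fine: v4 optional for baz
    try:
        validate("http://foo/bar?v1=x&v2=y&v3=z")
    except ValueError as e:
        print(e)                                     # missing parameters: v4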
Read the rest of his post - he mentions specifically that performance issues would probably be the number 1 reason to not use HTTP. Bear in mind also that most applications are low-bandwidth, latency-insensitive, and do not need to make optimal use of available resources. This makes HTTP very quick to implement, nearly universal in compatibility, and fairly foolproof.
I think he was complaining more along the lines of "why are we running a custom protocol when we're just passing along a bunch of strings?"
Because the tools are there to support this. Let's just say I want to be able to query the current weather at a longitude/latitude, and I want to make sure this service is accessible not only to my client application, but also any other apps that want to integrate this feature.
I can:
- Write my own protocol with very low overhead, just a simple TCP handshake, the client sends me a little string with its location, and I send back the weather. Simple, really fast, has low network and processing overhead. But then you find that this is almost impossible for someone to integrate into their webapp, or even a native app, without opening sockets and doing a stupid amount of work just to set up that connection. I can also write my server code in C++, which gets me some hellishly fast performance.
or
- I can implement it in PHP, make it accessible via a URL like http://www.mysite.com/weather.php?lat=X&long=Y, and be done with it. Total coding time? Almost zero. There is no need to set up or tear down connections, and no worry about encodings, endianness, or anything of that sort that a raw TCP connection would force on you. These tools are also proven, and relatively bug free, which one can't confidently say about anything they re-implement themselves at a low level. Hell, a simple shell script can curl my URL and get the weather back, just like that. (A rough sketch of both approaches follows below.)
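To make the contrast concrete, a minimal sketch in Python (the host, port and wire format are invented to match the example; this is not production code):

    # Option 1: my own little TCP protocol -- the client has to know the
    # port, the message framing, the encoding, everything.
    import socket
    from urllib.request import urlopen

    def weather_raw_tcp(lat, lon, host="weather.mysite.com", port=9000):
        with socket.create_connection((host, port)) as sock:
            sock.sendall(("%f,%f\n" % (lat, lon)).encode("ascii"))
            return sock.makefile().readline().strip()

    # Option 2: plain HTTP -- any language, tool or shell one-liner can do this.
    def weather_http(lat, lon):
        url = "http://www.mysite.com/weather.php?lat=%f&long=%f" % (lat, lon)
        with urlopen(url) as resp:
            return resp.read().decode("utf-8")

    # ...and from the shell:
    #   curl 'http://www.mysite.com/weather.php?lat=52.5&long=13.4'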
There are gains to be had for doing everything yourself, but for the majority of webapps the tradeoff in reliability and ease of use/development simply isn't worth it. The existence of talent, tools, and community knowledge is often worth the lower performance and sometimes unnecessary overhead.
On the other hand, you're giving up extremely well designed integration with almost every language and runtime on the planet. Why use your own custom protocol, when you could use HTTP? The argument isn't that HTTP is always better, but rather that it should be the default choice.
This is a ridiculous conversation though. The default choice for what?
What are we even contemplating building here? HTTP is clearly not a sane choice for everything, and most likely not a sane choice for the majority of things people build.
This argument is like saying "If you're going to write any software, you should assume you'll write it in Java, unless you have a good reason not to". (Which some people were probably chanting in 1998).
To be fair, I've at least taken the approach of starting with HTTP for many projects (knowing I'll probably ditch it at some point) - the author does have a point about it being very easy to debug using things like wget and your web browser.
Sure, but if we were working on something like skype, or a multiplayer gaming network, or an instant messaging system, HTTP wouldn't be an obvious choice.
Maybe I missed the part in the article where it said "In the world of websites/webapps... etc" :)
God I hated those arguments so much. I especially hated being told I should be using xml "because the parser is already written" as if writing a parser was the difficult part of... well anything really ;-)
The main point seemed to be that you wouldn't need to layer much, if anything, onto HTTP. That is not necessarily that much of an advantage, depending on your environment, but should be worth considering.
rsync uses its own protocol (when run as a daemon), and so do ssh and a whole slew of others. The common element is that http cannot provide these applications with the functionality they seek.
I think that the golden rule should be something like:
If you can't do it on http you're free to roll your own, but if you can do it over http then you really should.
Interestingly, rsync can be used via an HTTP proxy, so presumably the protocol gets divided into HTTP requests. Doesn't seem too crazy really: HTTP supports partial downloads and uploads, and the rolling hashes could easily be encoded in HTTP requests too.
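The partial-download part at least is just a standard Range request; a quick sketch (the host and path are placeholders, not a real rsync-over-HTTP protocol):

    # Fetching only part of a file over plain HTTP with a Range header --
    # the building block you'd need for rsync-style partial transfers.
    import http.client

    conn = http.client.HTTPConnection("example.com")
    conn.request("GET", "/bigfile.bin", headers={"Range": "bytes=0-1023"})
    resp = conn.getresponse()
    print(resp.status)                      # 206 Partial Content, if the server supports it
    print(resp.getheader("Content-Range"))  # e.g. "bytes 0-1023/1048576"
    chunk = resp.read()                     # first 1 KiB of the file
    conn.close()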
Yes, but the more useful options that rsync supports, such as all the file manipulation gear won't work via http so you'll end up with a subset of the functionality.
> if you can do it over http then you really should
There seems to be some wisdom here that I am not extracting successfully. It seems HTTP "can" in most cases. But where it can, do you really want to use it? Technically you could also write a db engine that reads over HTTP, but this must be a terrible idea; what would be the argument here that counters the rule of thumb?
- without getting really tortuous, though rethinking your problem a little bit should be fine
- without sacrificing a large amount of speed or functionality
- without compromising your design goals
OK, so I'm a noob about protocols. Can someone help me read between the lines? Does TF's defense/advocacy for HTTP reflect a "threat" to its universality? In any case, what protocols are developing as alternatives (besides the obvious: FTP, XMPP) - and for what purposes that HTTP cannot serve well?
Well, it seems the author drank a bit too much of the webservices kool-aid.
HTTP is just one protocol amongst many. It was never meant to be "universal". The only reason why it's being abused for even the most unsuitable tasks nowadays is because when all you have is a hammer then everything starts to look like a thumb...
There are many purposes where HTTP is not suitable at all. Realtime audio/video comes to mind - UDP is commonly used here because even the TCP latency is too much. Anything where you need a persistent, bi-directional connection is not a good match either. Anything where you need to push from server to client. In short: Anything that doesn't fit into the request/response paradigm.
In fact, many consider HTTP in its current incarnation to not even be suitable for the interactivity that we expect of modern webapps.
But of course that doesn't stop the truly enlightened.
Hence we got abominations like "Comet", a persistent stream-socket emulated on top of an inherently request/response-based protocol. Or RSS, which is nothing more than Usenet done really, really wrong - all lessons unlearned.
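For the unfamiliar, the trick behind Comet-style systems looks roughly like this -- a generic long-polling sketch, not any particular library's code, and the URL is a placeholder:

    # The client keeps re-issuing a request that the server holds open until
    # it has something to say. The loop is the "persistent stream", emulated
    # on top of a request/response protocol.
    import time
    from urllib.request import urlopen
    from urllib.error import URLError

    def poll_events(url="http://example.com/events?since=0"):
        while True:
            try:
                with urlopen(url, timeout=60) as resp:   # server blocks until an event arrives
                    event = resp.read().decode("utf-8")
                    print("got event:", event)
            except URLError:
                time.sleep(5)                            # back off, then reconnect
            # and immediately request again...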
Sorry, got a bit carried away. But you get the idea I guess.
The enthusiasm for tunneling everything over HTTP comes from that being easier than getting firewall rules changed, not any intrinsic properties of HTTP itself. It's port 80 people are interested in, really.
I edited out a section where I talked about how it's NOT just port 80.
It turns out lots of ISPs and broken routers and terrible corporate proxies only allow HTTP. They do things like require HTTP headers, so they can check for whitelists/blacklists. It's stupid and it breaks TCP/IP in general, but most people only care about the web, so for them it just works.
Comet is an abomination if you think the problem is how to design a persistent stream. But the real problem is how to design a persistent stream in the browser.
Well, technically it's an abomination no matter from what angle you look at it (and I am looking at the orbited sourcecode in another window right now).
To me comet is a testament to our collective failure to demand proper tools from the "big guys" (Mozilla, Opera, Microsoft). The term "web 2.0" was coined roughly 7 years ago, and even long before that it had been clear that some sort of WebSocket would be incredibly useful to have in a browser.
Yet no single browser vendor (to my knowledge) has implemented it to this day.
Imagine where we could stand if only Firefox had a true, non-standard Socket class already. People would be using it (socket for the fox, comet-hacks for the rest) and IE market share would probably be dropping faster than ever because all the fancy new apps work better elsewhere.
But I digress, this is mostly whining. You have a point; the comet hack, as ugly as it is, is justified by the lack of proper alternatives. But still, an abomination on so many levels...
I think this exactly illustrates my point... (sorry for calling you out twice moe, I very much appreciate you expressing your opinions even though I disagree with them)
You say "all lessons unlearned" implying that RSS is worse than previous incarnations of solving the same use case. This is a common opinion I really am railing against. RSS is EXTREMELY popular. Usenet is a dying architecture. It's a case where being worse in the "obvious" technical architecture allows you to be better in things that actually matter: ease of access, simplicity of use, lack of installing things and wide support. These are just some of the reasons RSS took off. Easier means more viral, because easier means a shorter viral loop.
Of course, I think RSS is better for syndication than notification, and would prefer more sites like twitter use Webhooks in addition to RSS.
> It's a case where being worse in the "obvious" technical architecture allows you to be better in things that actually matter: ease of access, simplicity of use, lack of installing things and wide support.
Well, ubiquity != adequacy. I understood the original author's article as a technical recommendation. He basically suggests to use HTTP for everything because he thinks "it's technically good for everything".
He made broad claims about how any HTTP-based protocol will magically scale by leveraging proxies, load balancers and other existing infrastructure, completely ignoring the fact that many applications just don't fit into the request/response paradigm in the first place.
RSS is a great example for the power of the internet that enables us to "build on what we have" without waiting for some standards body or greater authority to get moving. But it is also a text-book counterexample to his scalability and "one size fits all" claims. Polling just doesn't scale for these things and technically it's a step back from Usenet, that had these problems sorted out already. We went back to square 1 with RSS and are now locked up there until we get a true WebSocket and worthwhile persistent storage in browsers.
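The best a plain-HTTP feed reader can do is a conditional GET, and it's still polling, just cheaper polling -- a rough sketch (the feed URL is a placeholder):

    # Poll the feed with If-None-Match; a 304 means nothing changed since the
    # last poll. urllib surfaces the 304 as an HTTPError.
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError

    def fetch_feed(url, etag=None):
        headers = {"If-None-Match": etag} if etag else {}
        try:
            with urlopen(Request(url, headers=headers)) as resp:
                return resp.read(), resp.getheader("ETag")
        except HTTPError as e:
            if e.code == 304:          # Not Modified -- nothing new, poll again later
                return None, etag
            raise

    body, etag = fetch_feed("http://example.com/feed.xml")
    # ...sleep, then poll again with the saved ETag:
    body, etag = fetch_feed("http://example.com/feed.xml", etag)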
FTP is actually a lot older than HTTP and therefore wasn't developed as an alternative. Traditional, multi-socket FTP actually has massive disadvantages in today's firewalled, NATed internet. HTTP is more useful than FTP or most other protocols for transferring individual files these days. I've never worked with WebDAV, but it seems like it would be a preferable alternative almost anywhere you might use FTP.
An example of where HTTP doesn't work so well is pushing information, as opposed to servicing requests. COMET is a hack and requires you to be very careful with clients (e.g. the two-connection limit in older browsers). You'd only really want to use it where there's no alternative, like in a browser.
Basically, HTTP is great for servicing individual requests. It can kind of do persistent connections, but they're more an optimisation than part of the design and therefore optional for the client even if the server suggests them.
If your use case falls outside of the basic request/response premise, you can maybe use it for prototyping for a while, but you should probably switch to something more suitable to your setup after that unless you need to use HTTP for reasons outside of your control.
Well, depends on your problem. If you have a sensor network that needs to transfer information, but needs to do it in an energy-conscious fashion, HTTP might not be the best choice.
Sensor networks are a bunch of sensors that collect info, then network themselves to transfer what they know, usually to a mothership base station of some sort. Because it's a pain in the ass to go out in the field and replace the batteries on these sensors, you have to make their transmissions energy-efficient. There is nothing in the HTTP protocol that takes energy consumption into account. Nor should it. It's not an application layer problem. However, the only reason I mention it is that I've seen people start mixing the protocol layers together in a single protocol to achieve the energy consumption characteristics they want, which included the application layer.
So that's one instance where HTTP isn't a first choice. As long as you know what HTTP is for, and don't use it blindly, you'll find that it probably suits your needs.
I recently implemented an event logging server, intended to process a massive, concurrent number of event messages from large clusters of systems.
We chose to implement a CPU-efficient binary protocol, automatic client-side queuing of outbound messages if a server failed, and automatic client-side fail-over to the next available server.
The initial implementation, with no time spent on profiling/optimization, was able to receive and process 25,000 event messages per second from a single client, and scale up clients/cores relatively linearly.
I can't even begin to fathom solving this problem with HTTP, or why the 'features' of HTTP listed (proxies, load balancers, web browsers, 'extensive hardware', etc) would be an improvement over the relatively simple and highly scalable single-tier implementation we created.
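For the curious, the general shape of such a framing is something like this -- a simplified sketch of length-prefixed binary framing, not our actual wire format:

    # A fixed header (length + message type), then the payload. No text
    # parsing, no headers, no per-message request/response round trip.
    import struct

    HEADER = struct.Struct("!IH")        # 4-byte payload length, 2-byte message type

    def send_event(sock, msg_type, payload):
        sock.sendall(HEADER.pack(len(payload), msg_type) + payload)

    def recv_exact(sock, n):
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("socket closed mid-message")
            buf += chunk
        return buf

    def recv_event(sock):
        length, msg_type = HEADER.unpack(recv_exact(sock, HEADER.size))
        return msg_type, recv_exact(sock, length)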
Clearly, HTTP works well for some things, but "just use HTTP, your life will be simpler" is a naive axiom.