But do you have any concrete protocol suggestions, or are you criticizing the choice based on a hypothetical protocol that would be very similar to HTTP but incompatible with it and all its existing tools (millions of tested and deployed caching servers, load balancers, etc.), and for which whole new libraries would have to be written, just to make it somewhat more efficient?
I think you're grossly overestimating the difficulty of defining a protocol. It's no more difficult than defining the protocol for which you'll use HTTP as the transport.
Load balancers know how to load balance straight TCP. HTTP caching servers are an HTTP-centric idea.
The 'libraries' you'll need can be much, much smaller when all you need is a bit of framing and serialization instead of a complete, complex, RFC-compliant HTTP client stack.
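For a sense of scale, the framing half of that fits in a couple dozen lines. A rough sketch in Go, assuming a made-up format of a 4-byte big-endian length prefix followed by the payload (which can be whatever serialization you like; the example payload is invented):

    package main

    import (
        "bytes"
        "encoding/binary"
        "fmt"
        "io"
    )

    // writeFrame sends one message as a 4-byte big-endian length prefix
    // followed by the payload bytes.
    func writeFrame(w io.Writer, payload []byte) error {
        var hdr [4]byte
        binary.BigEndian.PutUint32(hdr[:], uint32(len(payload)))
        if _, err := w.Write(hdr[:]); err != nil {
            return err
        }
        _, err := w.Write(payload)
        return err
    }

    // readFrame reads one length-prefixed message.
    func readFrame(r io.Reader) ([]byte, error) {
        var hdr [4]byte
        if _, err := io.ReadFull(r, hdr[:]); err != nil {
            return nil, err
        }
        payload := make([]byte, binary.BigEndian.Uint32(hdr[:]))
        _, err := io.ReadFull(r, payload)
        return payload, err
    }

    func main() {
        // Round-trip a message through an in-memory buffer; over the
        // wire this would be a net.Conn instead.
        var buf bytes.Buffer
        writeFrame(&buf, []byte(`{"user": 42, "op": "get_feed"}`))
        msg, err := readFrame(&buf)
        if err != nil {
            panic(err)
        }
        fmt.Printf("%s\n", msg)
    }

That, plus your serializer of choice, is the entire 'client stack'.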
> I think you're grossly overestimating the difficulty of defining a protocol.
It's not writing the protocol that I find the most difficult. It's reimplementing everything that uses the protocol.
> Load balancers know how to load balance straight TCP.
Which is only useful if all the nodes are exactly the same, but that prevents you from distributing the data across them based on user profiles and then load balancing according to the user ID, as (if I'm not mistaken) Netflix does. Since they're using subdomains as user identifiers, you'd get that for free with an existing, well-tested HTTP load balancer.
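To show how little logic is involved, here's the gist of what such a balancer does, sketched in Go (the shard addresses and hashing scheme are made up for illustration; in practice you'd configure this in an off-the-shelf balancer rather than write it):

    package main

    import (
        "hash/fnv"
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
        "strings"
    )

    // Hypothetical pool of application servers, one per shard of user data.
    var shards = []string{"http://10.0.0.1:8080", "http://10.0.0.2:8080"}

    // shardFor maps a Host header like "user42.example.com" to a shard.
    // Keying the routing on the (sub)domain is exactly the kind of rule
    // an existing HTTP load balancer applies without custom code.
    func shardFor(host string) *url.URL {
        sub := strings.SplitN(host, ".", 2)[0]
        h := fnv.New32a()
        h.Write([]byte(sub))
        u, _ := url.Parse(shards[h.Sum32()%uint32(len(shards))])
        return u
    }

    func main() {
        proxy := &httputil.ReverseProxy{
            Director: func(r *http.Request) {
                target := shardFor(r.Host)
                r.URL.Scheme = target.Scheme
                r.URL.Host = target.Host
            },
        }
        log.Fatal(http.ListenAndServe(":8080", proxy))
    }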
> HTTP caching servers are an HTTP-centric idea.
That's a tautology. The question is: are they a useful idea? Is being able to take advantage of existing and deployed solutions like CDNs useful? Seems to me like it would be.
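For instance, a single standard header is enough to let every cache and CDN between you and the user answer repeat requests without the origin being involved. A minimal sketch in Go (the avatar endpoint is made up):

    package main

    import (
        "fmt"
        "net/http"
    )

    // The only interesting line is Cache-Control: with it set, any
    // HTTP cache or CDN in the path may serve this response for a day
    // without contacting this server again.
    func avatar(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Cache-Control", "public, max-age=86400")
        w.Header().Set("Content-Type", "image/png")
        fmt.Fprint(w, "...png bytes would go here...")
    }

    func main() {
        http.HandleFunc("/avatars/", avatar)
        fmt.Println(http.ListenAndServe(":8080", nil))
    }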
> The 'libraries' you'll need can be much, much smaller when all you need is a bit of framing and serialization, instead of a complete, complex, RFC-compliant HTTP client stack.
I think you underestimate the advantages that some of the core HTTP concepts provide.
> Which is only useful if all the nodes are exactly the same, but that prevents you from distributing the data across them based on user profiles and then load balancing according to the user ID, as (if I'm not mistaken) Netflix does. Since they're using subdomains as user identifiers, you'd get that for free with an existing, well-tested HTTP load balancer.
I'm not sure what you think makes that complicated to implement without HTTP, or why you consider it 'free'. Netflix had to write custom code to support that, and could just as easily have done so on top of a message-passing architecture à la ZeroMQ, or even AMQP.
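The routing decision itself is a handful of lines regardless of transport. A sketch with made-up types (in a ZeroMQ-style setup this would decide which socket or queue a message is published to):

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // A message in some hypothetical non-HTTP protocol; the user ID is
    // an ordinary field instead of a subdomain.
    type message struct {
        UserID  string
        Payload []byte
    }

    var workers = []string{"worker-1", "worker-2", "worker-3"}

    // routeTo picks a worker for a message -- the same decision the
    // subdomain trick delegates to an HTTP load balancer.
    func routeTo(m message) string {
        h := fnv.New32a()
        h.Write([]byte(m.UserID))
        return workers[h.Sum32()%uint32(len(workers))]
    }

    func main() {
        fmt.Println(routeTo(message{UserID: "user42"}))
    }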
> That's a tautology. The question is: are they a useful idea? Is being able to take advantage of existing and deployed solutions like CDNs useful? Seems to me like it would be.
Not really, no -- it's not a tautology, and they're not particularly useful for API implementation. Their primary value is in caching resources for HTTP requests in a way that meshes well with the complexity of HTTP.
If you need geographically distributed resource distribution, then HTTP may be a good idea, simply because:
- There's widespread standardized support for HTTP resource distribution.
- Its inefficiencies are easily outweighed by the simple transit costs of a large file transfer.
We're largely talking about a server "API", however.
> I think you underestimate the advantages that some of the core HTTP concepts provide.
No, the core concepts are more-or-less fine. It's the stack that's inefficient and grossly complex, largely due to browser constraints and historical limitations.
> I'm not sure what you think makes that complicated to implement without HTTP, or why you consider it 'free'. Netflix had to write custom code to support that, and could just as easily have done so on top of a message-passing architecture à la ZeroMQ, or even AMQP.
It's free because it already exists. Load balancers for hypothetical protocols don't.
> Not really, no -- it's not a tautology, and they're not particularly useful for API implementation. Their primary value is in caching resources for HTTP requests in a way that meshes well with the complexity of HTTP.
> If you need geographically distributed resource distribution, then HTTP may be a good idea, simply because:
> - There's widespread standardized support for HTTP resource distribution.
> - Its inefficiencies are easily outweighed by the simple transit costs of a large file transfer.
> We're largely talking about a server "API", however.
Isn't the whole point of this system to transfer people's content - posts, pictures, videos, etc. - between servers? I would think pure API "calls" would be a small part of the whole traffic.
> No, the core concepts are more-or-less fine. It's the stack that's inefficient and grossly complex, largely due to browser constraints and historical limitations.
But to implement them, you need more than "a bit of framing and serialization".
> But to implement them, you need more than "a bit of framing and serialization".
I posit you're still grossly overestimating complexity based on your own experience with HTTP, coupled with grossly underestimating the complexity, time costs, and efficiency costs of the stack HTTP weds you to.
A TCP stream is simple. It's as simple as it gets. Load balancing it requires a few hundred lines of code, at a maximum. It only gets complicated when you start layering on a protocol stack that is targeted at web browsers, that grew over the past 20 years, that requires all sorts of hoop-jumping for efficiency (keep-alive, WebSockets, long-polling), and that requires a slew of text parsing and escaping (percent-escapes, URL encoding, base64 HTTP basic auth, OAuth, ...), plus cookies, MIME parsing/encoding, and so on and so forth.
All this complexity is targeted at web browsers, introduces significant inefficiencies, and requires huge libraries to make it accessible to application/server engineers.
What's the gain? Nothing other than familiarity, as evidenced by your belief that the core of what HTTP provides is so incredibly complicated that you couldn't possibly replace it.
No -- it's HTTP itself that's complicated, not the concepts that underlie it. Drop the HTTP legacy and things get a heckuva lot simpler.
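And to put a number on 'a few hundred lines, at a maximum': here is essentially a whole TCP load balancer, sketched in Go with made-up backend addresses. A production version would add timeouts and health checks, but the core is just a socket splice:

    package main

    import (
        "io"
        "log"
        "net"
    )

    // Hypothetical backend pool; round-robin for simplicity. Routing on
    // a user-id prefix read off the stream would only add a few lines.
    var backends = []string{"10.0.0.1:9000", "10.0.0.2:9000"}

    func proxy(client net.Conn, addr string) {
        defer client.Close()
        backend, err := net.Dial("tcp", addr)
        if err != nil {
            return
        }
        defer backend.Close()
        // Splice the two streams; no knowledge of the bytes flowing
        // through is required.
        go io.Copy(backend, client)
        io.Copy(client, backend)
    }

    func main() {
        ln, err := net.Listen("tcp", ":7000")
        if err != nil {
            log.Fatal(err)
        }
        for i := 0; ; i++ {
            conn, err := ln.Accept()
            if err != nil {
                log.Fatal(err)
            }
            go proxy(conn, backends[i%len(backends)])
        }
    }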
It's not so much that you can't replace HTTP as that you can't replace all the thousands of tools and packages that already work with HTTP, and that can be very useful for a project like this. And you can't easily replace the knowledge that people have of HTTP, either. (Claiming that OAuth is part of HTTP doesn't help your credibility either, I'm afraid.)
Furthermore, even if the developers of this project could replace the required tools and forgo the rest, I doubt it would make sense.
Frankly, you'd need a working prototype to convince me of the contrary, so I guess we'll have to leave it at that. I'm a stubborn man ;)