
WebSockets is a stream, not a message based protocol - saurabh
http://www.lenholgate.com/blog/2011/07/websockets-is-a-stream-not-a-message-based-protocol.html
======
zaphoyd
I think his analysis is flawed. WebSocket is a message based protocol that
does not specify a maximum message size in the RFC. This does not make it a
streaming protocol until an implementation decides to deliver incomplete
messages to the end application. Some implementations have done this, many
(including all browsers) have not and will not.

Time and time again it has been demonstrated that we are bad at choosing a
maximum allowed value for all applications and all future considerations (see:
ethernet frame sizes, IP address lengths, operating system address spaces,
file system block sizes/counts, etc).

In some cases (many of those previously listed) there were hardware, cost, or
technical concerns that led to nailing down a number in an RFC. For WebSocket
there is no clear benefit to forever encoding a specific numeric maximum
message size. It is a high enough level protocol that there is no technical or
cost benefit to make message sizes limited by anything other than individual
application needs.

As such, the WebSocket RFC leaves maximum message size implementation defined,
and specifically says that an implementation SHOULD implement a reasonable
maximum message size for its purpose. A chat application that knows it will
only be moving small text messages can set its maximum message threshold small
to improve buffer performance and catch invalid messages sooner. An
application that finds a business case for sending a large file in one large
message can set itself up accordingly. Generic WebSocket parsers should expose
a method of setting the maximum message size the application wishes to
receive.
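The configurable limit described above can be sketched in a few lines. This is a minimal illustration in Python; the class and method names are hypothetical, not taken from any particular WebSocket library:

```python
class MessageAssembler:
    """Sketch of a parser that enforces a caller-chosen maximum
    message size, per the RFC's SHOULD recommendation."""

    def __init__(self, max_message_size=1 << 20):  # 1 MiB default, arbitrary
        self.max_message_size = max_message_size
        self._buffer = bytearray()

    def feed(self, fragment, fin):
        """Accumulate one fragment. Reject an oversized message as soon
        as the cap is exceeded (an implementation would map this to
        close code 1009, "Message Too Big"), rather than waiting for
        the final frame to arrive."""
        if len(self._buffer) + len(fragment) > self.max_message_size:
            self._buffer.clear()
            raise ValueError("message exceeds configured maximum")
        self._buffer.extend(fragment)
        if not fin:
            return None  # message incomplete, keep buffering
        message, self._buffer = bytes(self._buffer), bytearray()
        return message
```

A chat server might construct this with a limit of a few kilobytes; a file-transfer endpoint might allow gigabytes. The point is that the number lives in the application, not the RFC.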

I definitely agree that not requiring implementations to return their maximum
message size along with the "Message too big" error will make some sorts of
interoperability more difficult. However, it also prevents exposing
implementation security details and simplifies the core spec (the author has
already complained that the spec is too complicated). It is relatively

simple for an application to negotiate a maximum message size privately if
necessary and the WebSocket extension mechanism allows a method for
standardizing a way of doing so if this turns out to be a serious issue in the
future.

~~~
LenHolgate
I've no problem with the lack of a max message size in the RFC; what could
cause problems is the fact that it needs to be passed between client and
server "out of band", i.e. at the application protocol layer rather than at
the websocket protocol layer. Also bear in mind that this blog entry was
written based on Draft HyBi 09 and not the final RFC; the wording has changed
somewhat since then.

The draft in question suggested that providing a message based interface to
application code was possible and that the parser could/should deliver only
complete messages to the application code. That's hard to do if you also want
to allow for the 'endless streaming' scenario that others on the working group
were fond of. The result was a bit of a mess.

The final RFC addresses some of this, but there's no getting around the fact
that the websocket protocol itself can't tell you how big a message is until
you get the final frame.

Sure you can work around all of this even for a generic parser but the initial
wording in the draft in question could lead you towards the wrong design if
you're not careful.

~~~
zaphoyd
As I mentioned above, I agree that needing to pass the maximum message size
out of band makes some things more difficult. Whether that was the right
tradeoff in terms of convenience vs protocol complexity I think has yet to be
seen. At any rate, an extension to perform this in band should be trivial.
Perhaps I will try writing one to test out my extension handling code.

I do see your point on the "endless streaming" section of the RFC. Stating
(section 5.4) that "The primary purpose of fragmentation is to allow sending a
message that is of unknown size when the message is started without having to
buffer that message" implies that a WebSocket implementation should support
this sort of operation. Indeed, if you want to support sending messages of
unknown size you must expose an interface more complicated than the default
message based one.

That said, a message-only implementation that does not allow sending
unknown-sized messages is 100% compliant with the spec and can still receive
such messages. The RFC could have made this fact clearer. I believe
that endless streaming mode will not be a common use case and have not
implemented it in my generic WebSocket library. I do believe, however, that
fragmentation of messages provides important benefits even without unknown
size sends. Once you have message fragmentation there is no additional
protocol cost to allow unknown size sends.
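The "no additional protocol cost" point can be made concrete with a toy encoder: the first frame carries the real opcode with FIN clear, continuations use opcode 0, and only the last frame sets FIN, so the sender never needs to know the total size up front. This is an illustrative Python sketch of unmasked server-side frames with small payloads only, not any real library's API:

```python
def encode_frame(payload, opcode, fin):
    """Build one unmasked frame: FIN bit plus opcode in byte 1,
    7-bit payload length in byte 2 (payloads < 126 bytes only,
    to keep the sketch short)."""
    assert len(payload) < 126
    header = bytes([(0x80 if fin else 0x00) | opcode, len(payload)])
    return header + payload

def fragment_message(chunks):
    """Send a message whose total size is unknown when sending starts:
    frame 1 carries the opcode (0x2 = binary), later frames are
    continuations (0x0), and only the last frame sets FIN."""
    chunks = list(chunks)
    frames = []
    for i, chunk in enumerate(chunks):
        opcode = 0x2 if i == 0 else 0x0
        fin = i == len(chunks) - 1
        frames.append(encode_frame(chunk, opcode, fin))
    return frames
```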

~~~
LenHolgate
I agree with all you're saying.

The wording of the RFC has improved since that draft and the flexibility could
be useful in some scenarios.

I ended up with an API which can be asked to deliver complete messages 'if
possible' given the buffers provided by the client of the API. If it's not
possible and the buffer becomes full then the API simply gives you the
fragment of data and tells you if it knows how much more there is to come or
not.

------
lambda
The WebSocket protocol design was hijacked by architecture astronauts who
decided that it _must_ have all of these extra features added, instead of
remaining a simple, easily implementable and understandable protocol. The
original WebSocket protocol was a simple stream of delimited messages, with
the only complexity being in the handshake that was necessary to ensure that
JavaScript apps couldn't send arbitrary data to arbitrary ports without
permission.

The problem is that the original handshake wasn't good enough (there were
still security vulnerabilities despite the handshake), and when Ian Hickson
decided to hand over control to the IETF, the architecture astronauts took
over, adding complex framing with six different frame types, subprotocols,
extensions, versions, complex bit twiddling required to parse frame headers,
fragmentation of messages into smaller frames (which is what this article is
complaining about), control frames interleaved with fragmented messages,
numeric status codes _and_ textual close reason strings that "MUST NOT" be
shown to the user, masking of data by xor'ing with a random value that changes
for each frame, but only for one direction (client->server), a two-way closing
handshake on top of the existing TCP mechanisms for closing the connection,
pings to test the connection for liveness, and so on. There are six registries
to test the connection for liveness, and so on. There are six registries
defined for IANA to keep track of
<http://www.iana.org/assignments/websocket/websocket.xml>; extensions,
subprotocols, version numbers, close codes, opcodes, and framing bits.
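Of the features listed, the client-to-server masking is at least mechanically simple. A sketch of the XOR scheme from RFC 6455 §5.3 (the function name is illustrative):

```python
import os

def mask_payload(payload, masking_key=None):
    """XOR each payload byte with a 4-byte masking key, cycling the key.
    Clients must choose a fresh random key per frame; XOR is its own
    inverse, so applying the same key again unmasks, and a server can
    reuse this exact function to read client frames."""
    if masking_key is None:
        masking_key = os.urandom(4)  # new random key for each frame
    masked = bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))
    return masking_key, masked
```

The randomness per frame is what defeats the cache-poisoning attacks the masking was added for; server-to-client frames are sent unmasked.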

And despite all of this over-engineering and attempt at extensibility, all
extensions must know about each other, because there is no standard method for
delimiting different extensions' data (or even specifying how much data an
extension uses), and there are three header bits and 10 frame types that all
extensions must share. And I don't really know why there's a need for
subprotocols on top of the ability to just encode that information in the URL.

It's kind of sad how what could have been a relatively simple and easy to
implement protocol has been taken over by architecture astronauts. Yes, a few
of these features are actually required to securely deploy websockets (the
handshake and masking). Most of them are people making up features that would
be nice in theory, instead of implementing something simple that works. Ian
Hickson's original protocol wasn't perfect; it still needed some work by the
time he left. But it was simple, and easy to implement, and didn't impose
restrictions that couldn't be worked around at a higher level.

~~~
pork
Thanks for an extremely illuminating explanation. I may be wrong, but it
almost sounds like long polling and other alternatives are preferable to
Websockets because of the added complexity. Stories like this also make me
feel fortunate that my favorite beacon of simplicity, JSON, didn't get handed
over to a "task force".

------
samwillis
It seems to me that the hype around Web Sockets has overshadowed the Server
Sent Events API (<http://dev.w3.org/html5/eventsource/>) which for most
situations where you don't need a continuous stream of data is a more sensible
system. It is purely a message sending system by design.

The really nice thing about SSE is that you can fall back to long polling very
easily with exactly the same back end, and since it runs over vanilla HTTP
without the upgrade handshake it is much easier to implement: you just don't
close the connection after sending a message. Obviously it's only one-way, but
we have a well-established way of sending messages in the other direction with
HTTP POST.
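The wire format really is that simple: each event is a few `field: value` lines ended by a blank line, streamed over a plain `text/event-stream` response that you keep open. A sketch of formatting one event (the helper function itself is illustrative, the field names are from the SSE spec):

```python
def sse_event(data, event=None, event_id=None):
    """Format one Server-Sent Events message: optional 'event:' and
    'id:' fields, one 'data:' line per line of payload, and a blank
    line as the terminator."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    for data_line in data.splitlines() or [""]:
        lines.append(f"data: {data_line}")
    return "\n".join(lines) + "\n\n"
```

A long-polling fallback can serve the same bytes and simply close the response after the first event, which is why the back end needs no changes.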

~~~
scarmig
Which browsers have implemented SSE/EventSource?

~~~
pornel
All except IE and Android, as usual.

<http://www.caniuse.com/#search=eventsource>

However, since SSE is HTTP-compatible you can easily implement fallback for
these, e.g.

<https://github.com/Yaffle/EventSource>

~~~
scarmig
Wow, I seem to have missed the boat on caniuse.com =)

Thanks for the info, everyone.

------
andrewvc
This is some screwed up stuff, as nearly every WebSocket library and tutorial
really encourages treating them as discrete messages. This should be fixed
posthaste, because very few people really want a stream-based protocol for
web sockets.

~~~
theturtle32
I don't think it'll be a problem in practice. Most implementations' primary
APIs will be message based and not intended for this streaming case. I'm
planning on refactoring my implementation to have a low-level streaming API
that's used internally and is exposed if you really need it, but on top of
that build the message based API that 99.5% of people will use.

------
NHQ
Hey, this article is better than a rant, if you don't know much about what's
going on with the RFC, but are using websockets anyway. :D

Everything you send up or down is a message, or a packet, and the size of that
cannot exceed the size of the pipe (with bottlenecks or intermediary
restrictions). Call 'em Quantum Packets, or a stream, or a message. The
websocket protocol, as imagined by this developer, is meant to allow a
continuous lot of Quantum Packets to "flow", without the _application level_
overhead of parsing a bunch of protocol, headers, wackness. I want to get the
data into my applications AFAP, cuz I still have to transcode it, analyze it,
and all else to make the baby dance.

What we need as developers are minimum-for-reliability standards. No two
people in different locations will have the same pipe. As a developer, I
consider it my domain to write software on top of, or using, the socket layer
to determine the potential throughput of the given socket, and to test such
as needed throughout the simulcast. I don't even want the socket-layer-
wrapper writers (may God shower them with blessings) intervening at this
level, until everybody on Earth has unrestricted 10 Mbps up and down.

If that control is hidden from me, or not an option, or is nullified by
protocol, then my app or media could break in ways I could not predict or
understand, and so I would have to design my app using the socket layer in a
lowest-common-reliability kind of way.

These are not the opinions of a WebSocket RFC acquainted developer.

------
simpsond
The WebSocket protocol works for both small and large messages in a single
frame (message based), and also small and large frames in multiple fragments
(stream based)... It's capable of being used for both. It's a good idea to
restrict frame sizes in your application if you know what your limits are.

~~~
marshray
It's only capable of being used for messages if there's something to guarantee
that all hops along the way are going to preserve the message boundaries in
ways expected by the application layer. Can the protocol split single
messages? Can the protocol merge adjacent messages without reordering?

Unless the protocol specifically guarantees certain behavior _and_ commonly-
used systems regularly exercise this guarantee, it's just not going to work
reliably when it's needed.

Hearing some of the "works for me" discussion from developers suggests that
we're heading for that magic situation where it works 99.9% of the time. I.e.,
the system looks fine in testing and then fails in mysterious ways (that
require deep protocol fixes) in production.

Ideally, implementations of such a protocol would intentionally fragment the
messages somewhat if they were not going to guarantee they were atomic. But
there are very few developers (and code reviewing managers) enlightened enough
to let that kind of thing ship.

~~~
LenHolgate
The protocol preserves message boundaries but not fragment boundaries. You may
send a message of, say, 100 bytes and get 100 x 1 byte fragments arrive, or
you may send 100 x 1 byte fragments and get 100 bytes in a single frame. The
main issue, for me, at the time, was that when you get that first 1 byte frame
there's no way to know how big the resulting message will be.

Luckily there's a rather excellent compliance test suite here:
<http://www.tavendo.de/autobahn/testsuite.html> which should go a long way
toward nailing down interop issues.

~~~
marshray
_when you get that first 1 byte frame there's no way to know how big the
resulting message will be_

So that's a design tradeoff. I've implemented protocols that did it both ways
and it's definitely easier on the guy trying to implement a library or other
receiving application if he can get a reasonable upper limit on the size of
the messages.

But on the other hand, by _not_ requiring the total message length be known in
advance, it eases the logical (and memory) burden on the sender. Often the
sender will be an overloaded server.

Nothing can prevent a higher-level protocol on top of WS from negotiating its
own max message size.

So the design choice that was made would seem to _allow_ optimizations for the
overloaded server case without prohibiting other optimizations. This is
typical of W3C protocols.

~~~
simpsond
Well, all frames actually do have a length in the header. It's just not in the
first byte... it's the lowest 7 bits of the second byte, and possibly more
(check the spec).

So, if I can read 2 bytes from my buffer, I can get a good idea of the message
size. If I only have 1 byte, I rewind and wait until I receive another.
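That two-byte read, plus the possible extended length field, can be sketched as follows. This is an illustrative decoder for just the length portion of the RFC 6455 §5.2 header, not production code:

```python
def parse_frame_length(header):
    """Decode the payload length of one WebSocket frame. The low 7 bits
    of byte 2 (the high bit is the MASK flag, stripped here) are either
    the length itself (< 126), or a flag selecting a 16-bit (126) or
    64-bit (127) big-endian extended length field. Returns a tuple of
    (payload_length, header_bytes_consumed), or None when more bytes
    are needed (the "rewind and wait" case)."""
    if len(header) < 2:
        return None
    length7 = header[1] & 0x7F
    if length7 < 126:
        return length7, 2
    needed = 4 if length7 == 126 else 10
    if len(header) < needed:
        return None
    return int.from_bytes(header[2:needed], "big"), needed
```

Note this gives the size of one frame, not of the message, which is exactly the distinction the next reply makes.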

~~~
marshray
But we're talking about messages, not frames.
<http://tools.ietf.org/html/rfc6455> "frames have no semantic meaning"

If Websocket developers begin making unwarranted assumptions about message
framing and fragmentation you will regret it. BELIEVE ME.

/me gets back to bugfixing

------
reedhedges
Since this post seems to actually have been made in July of this year, does
anyone who has been following WebSocket details have any comments on how this
situation has changed?

My impression of WebSockets is that it's not actually a "finished" high level
protocol. They could have just brought a basic socket style interface into
JavaScript and left it at that. (And based on its name, that's what you'd
expect at first.) But they decided to add various features, (for better or
worse, I don't know yet) on top of that. (I guess part of it is the challenge
of working not just on TCP, but sort of within HTTP as well). Just as you
wouldn't pick up TCP and start blowing "data" through it without some
additional application specific structure, you're going to need to add your
own structure inside WebSocket's framework.

~~~
LenHolgate
I posted an update when HyBi 13 came out:
<http://www.lenholgate.com/blog/2011/09/the-websocket-protocol---draft-hybi-13.html>

The wording was improved around the suggestion to provide only a message based
API.

I think the WebSockets protocol ended up being a little more than it should
have been. You have to understand that it was being pulled in all sorts of
directions by the working group members and that there are good reasons for
all of the parts of the protocol (though some of those parts could work better
with other parts IMHO). It had to be finished at some point though and I think
the working group did a good job in the end.

Personally I think it would have been better had it been explicitly stream
based from a user's perspective, but then I don't have the javascript/browser
background to know how foolish that probably sounds.

------
kokey
Looks like we should be implementing UUCP over WebSockets.

------
jerf
You don't sound smart for mocking the idea that an 8-exabyte message in a
communication protocol is "big enough", you sound like you're mindlessly
parroting ideas you don't fully understand. Yes, 8 exabytes _is_ enough for a
single message, and always will be. TCP works on "messages" (packets) in the
kilobyte range, for comparison. Communication protocol packet sizes aren't
equal to the amount of data the communication protocol can send.

~~~
LenHolgate
The argument FOR 63 bit message sizes was that you could effectively turn the
message based protocol into a stream, except, unfortunately, the "stream" has
a limit even if it seems plenty big enough now.

Personally I wouldn't have included the 63 bit message size.

~~~
maximusprime
What's the point? What's the difference between a stream of messages, and a
stream of bytes? Nothing.

~~~
LenHolgate
The difference is that with a stream of websocket messages every so often you
need to deal with the framing. With a stream of bytes you don't. This
precludes the use case that was the basis of the argument for 63-bit message
sizes if you ever have to 'stream' a 'message' that needs to be longer... Sure
you can send it as multiple fragments, but then you can't do the 'here's a
file handle, read the stream' thing that was proposed.

------
ilaksh
Does socket.io handle these issues? Does Now.js?

------
angersock
Out of curiosity, and forgive my ignorance here, but since everyone seems to
prefer using event-driven methods in JS, why was a message-based protocol
passed over in favor of this stream solution?

~~~
forgotusername
It's elsewhere in the comments or the article: a feature of the protocol
allows transmitting partial messages, where the message size is unknown. One
example might be the result of a slow, unbuffered SQL query, where it's more
useful for the server to pass the result to the client incrementally, rather
than buffer the full message ahead of time.

Why you'd want to do that is another question entirely. Introducing roundtrips
by feeding tiny chunks to TCP is generally a horrible idea; however, it does
prevent the server from dedicating a potentially huge chunk of RAM to buffer
the result ahead of time.

Because of this feature, and the author's desire to model this feature as part
of some client library API (a mistake? you decide), he's concluded that it's
in fact a stream-oriented protocol. That's like concluding it's a byte-
oriented protocol because TCP can/will further fragment the partial frames due
to segment size constraints, etc. (i.e. it's a silly conclusion).

~~~
LenHolgate
I'm not sure I follow how you get to it being a "silly conclusion".

As I said, the draft at the time suggested presenting whole messages to the
application layer. The parser can't know it has a whole message until it gets
the final frame... This could lead to interesting memory usage ;)

The protocol provides for a series of infinitely long messages, each separated
by a terminator. I don't have a problem with that in itself, but the draft at
the time was misleading to suggest otherwise...

------
pyrotechnick
A rose by any other name would smell as sweet.

