
The Blink Protocol - robdjc
http://blinkprotocol.org/
======
bkirwi
Glanced through the spec and the comparison to protocol buffers on their blog [0]. A couple of quick notes:

- The messages are prefixed with the message length; this has a variety of advantages (e.g. splitting up messages with minimal parsing) but forces you to know your message size in advance. In the general case, this requires processing the full structure before the first byte can be written. (Protocol buffers, on the other hand, can be written incrementally, with a single pass over the data and using constant memory; see the sketch below.)

- Since the Blink format contains no metadata, the recipient must have the schema with which the message was written. If schema migrations must be supported, this requires prefixing each batch of messages with the (encoded) schema, which is only efficient when the batch size is large. (See Apache Avro [1] for another format that makes this tradeoff; since Avro is intended for processing in Hadoop, where the typical dataset has some millions of records, prefixing with the schema is not significant overhead.)
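
To make the first point concrete, here's a minimal sketch; the 4-byte big-endian prefix and the encode_field callback are illustrative assumptions, not Blink's actual framing:

    import io
    import struct

    def write_length_prefixed(stream, serialize):
        # The length comes first, so the whole payload must be
        # materialized in memory before the first byte can be sent.
        buf = io.BytesIO()
        serialize(buf)
        payload = buf.getvalue()
        stream.write(struct.pack(">I", len(payload)))  # assumed 4-byte prefix
        stream.write(payload)

    def write_incremental(stream, fields, encode_field):
        # Protobuf-style: each field is self-delimiting (tag + value),
        # so it can be flushed as soon as it is produced -- one pass,
        # constant memory.
        for tag, value in fields:
            stream.write(encode_field(tag, value))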

[0] <http://blog.blinkprotocol.org/2013/01/blink-compared-to-google-protocol.html>

[1] <http://avro.apache.org/docs/current/spec.html>

~~~
tripzilch
There was this recent security presentation about formal language-theoretic
reasons why message-length prefixes are a bad idea.
<http://www.cs.dartmouth.edu/~sergey/langsec/>

The problem was that it causes unnecessary complexity in validation and parsing: _<length-of-message><message>..._ describes a context-sensitive grammar, while a more XML- or JSON-like syntax with delimiters, _<delim><message></delim>..._, is context-free. The complexity of proper CSG validation can be exploited for resource-depletion/DoS attacks, while _not_ validating the syntax of a message leaves the part of the program that _uses_ the data in these messages holding potentially invalid data, putting it into an inconsistent state and possibly leading to DoS or worse exploits.

At least that's what I got from their presentation, which was very interesting
BTW, especially the part about "weird machines"--that's basically the
inconsistent state, the emergent machine that is made out of all the things an
attacker can make a system do by exploiting it. It was also _extremely_ geeky.

It just left me wondering: if I understood this right, _every_ protocol that
uses message-length prefixes is _at least_ vulnerable to resource-depletion
attacks, and there's no way of solving this short of proving P=NP. But that's
quite a lot of protocols! Why isn't the Internet burning down as we speak?

~~~
kryptiskt
But in any normal design you never feed the message length into the same parser; you use it to read the message, and the parser is fed the message once you have received that many bytes.

For the reader, a message stream consists of <length>+<blob>+<length>+<blob>+ etc. It can be in two states: having enough data, in which case it passes the blob on to the parser (which doesn't have to validate that the blob is <length> bytes; we know that already), or awaiting more data (you can attack that by dribbling data slowly, but you can do that with any protocol).

EDIT: Ok, I used a flat structure there; it would be different in a recursive structure with length on every element. But a top-level reader that reads <length> bytes of the outer wrapper and doesn't muck about with the innards is still a sensible thing, as in the sketch below.
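
Concretely, a minimal sketch of such a reader (the fixed 4-byte big-endian prefix is just an illustrative assumption):

    import struct

    def frames(read):
        # Two-state framing reader: accumulate bytes until the length
        # prefix is complete, then until <length> payload bytes have
        # arrived; only then hand the blob to the real parser.
        buf = b""
        while True:
            while len(buf) < 4:                     # state 1: need the prefix
                chunk = read(4096)
                if not chunk:
                    return
                buf += chunk
            (length,) = struct.unpack(">I", buf[:4])
            while len(buf) < 4 + length:            # state 2: need the blob
                chunk = read(4096)
                if not chunk:
                    return                          # truncated stream
                buf += chunk
            yield buf[4:4 + length]                 # exactly <length> bytes
            buf = buf[4 + length:]

Usage would be something like: for blob in frames(sock.recv): parse(blob).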

~~~
tripzilch
First off, this isn't about deeply nested structures; the flat example demonstrates the problem perfectly fine :)

> But in any normal design you never put the message length into the same
> parser, you use it to read the message, and the parser is fed the message
> when you have received that many bytes.

That won't work; this is a fundamental problem in computer science.

Splitting the parser into a pre-parser and a post-parser isn't going to help
solve the fundamental problem, because the combination of two parsers is still
a parser.

One of the problems is that you cannot distinguish <blob> bytes from <length> bytes. If the data stream gets out of sync with the parser state (hiccup, dropped packet), you have a very non-trivial problem on your hands. A context-free grammar, however, is free of context (ohh!) and can therefore resync in time proportional to how deeply it's nested.

Speaking of nesting, that's another bit where I expect CSGs to become incredibly hairy. Of course you can use a hybrid approach: length-prefixed messages for the "outer stream", and a context-free XML/JSON/Lisp-style format (delimiters on both sides [0]) for recursive structures. But why would you do that? If you wanted to save bytes by avoiding the delimiters on the very outer structures, there are a _lot more_ of them to be saved by applying the same "optimization" to the inner recursive structures. If you don't know what I'm talking about, think about how a tree-like recursive data structure is represented in the memory of a C program. Yes, pointers. Alternatively you could length-prefix the subtrees like before; C programs don't do that because you'd need to scan through everything and it's less efficient. Regardless, both approaches are context-sensitive, and good luck distinguishing malformed data from correct data.
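
To make the contrast concrete, a minimal sketch (the Node shape is a hypothetical stand-in for your actual data):

    import struct

    class Node:
        def __init__(self, value: bytes, children=()):
            self.value, self.children = value, list(children)

    def encode_delimited(node):
        # Context-free style: delimiters on both sides. Assumes the
        # delimiter bytes never occur inside value (see [2] below);
        # a parser can always resync by scanning for the delimiters.
        return (b"(" + node.value
                + b"".join(encode_delimited(c) for c in node.children)
                + b")")

    def encode_prefixed(node):
        # Context-sensitive style: every subtree carries its byte length,
        # so a parent can't be written until all its children are fully
        # serialized, and one corrupted length desyncs everything after it.
        body = node.value + b"".join(encode_prefixed(c) for c in node.children)
        return struct.pack(">I", len(body)) + body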

Now, this whole "formal languages and automata" business is a very complex subject[1], so while it may seem that the whole argument hinges on dropping a packet and desyncing the parser[2], I got the feeling from that talk that there are other (similarly fundamental) problems; this particular one is just the one I understood well enough to make a compelling argument for :)

[0] afaik you might actually get away with a delimiter on just one side, but that makes it harder to parse because you need strict precedence rules to resolve ambiguities (e.g. 2+3*4+1*8*2+1+1+7*4*3*2*1+0)

[1] it was considered one of the hardest courses during my CS college years (the other one being on "formal proofs of program correctness"); for various reasons I retook this course 4 times (underestimating its difficulty at first being one of those reasons), but when I finally did pass, I did so with a score of 9 out of 10, and I'm kinda proud of that :P But the _real_ benefit of studying 4 times for the same difficult course is that you never really forget it (some parallels there with that post about "spaced repetition learning" last week).

[2] another thing they recommended that makes a lot of sense, but again is a parsing-complexity (security) vs bandwidth-efficiency trade-off: make the delimiters (say, parentheses) out-of-band characters, so they're not allowed in binary blobs. This saves you from all sorts of escaping exploits (think XSS), makes resyncing more efficient, and makes parsing a lot easier. Of course it's really hard to step out of the "we really need all 8 bits in a byte" paradigm; how else can you design data formats with out-of-band characters? I don't know, and the talk I watched didn't give a solution either, just that it would be a good idea (to which I agree).
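
One classic trick in this direction (not from the talk, just a well-known example) is byte-stuffing as in SLIP (RFC 1055): the delimiter byte is escaped out of every payload, so it can never appear in-band and a desynced reader can always scan for the next frame boundary. A minimal sketch:

    # SLIP (RFC 1055) constants: END delimits frames, ESC escapes
    # occurrences of END/ESC inside payloads.
    END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD

    def frame(payload: bytes) -> bytes:
        out = bytearray()
        for b in payload:
            if b == END:
                out += bytes([ESC, ESC_END])   # 0xC0 -> 0xDB 0xDC
            elif b == ESC:
                out += bytes([ESC, ESC_ESC])   # 0xDB -> 0xDB 0xDD
            else:
                out.append(b)
        out.append(END)  # the delimiter itself is now guaranteed out-of-band
        return bytes(out)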

------
cjensen
Another binary message format? Another language for describing the format
instead of XML or JSON?

"Why?" is the big question which sits there unanswered.

~~~
cbsmith
I'm with you on this. When Thrift came out, it kind of made sense, because
there wasn't really anything quite like it (closest I'd say was Hessian) in
terms of the mix of simplicity, performance, etc. When Protocol Buffers came
out, it kind of made sense, because Google had worked on it prior to Thrift
even existing (indeed, it inspired Thrift). When Avro came out, I was like,
WTF!? This is making that mistake yet again!!

EDIT: Damn, I forgot to mention MessagePack. ;-)

~~~
mullr
Avro is useful because it also defines a container format which is easily
splittable by Hadoop.

~~~
cbsmith
Yeah, but one could just define a container format. It could have been done
with protocol buffers in fact. I know, because I have implemented it.
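
Something along these lines -- a hypothetical sketch, not the actual code; the Avro-style random sync marker is what makes the blocks splittable:

    import os
    import struct

    SYNC = os.urandom(16)  # per-file marker, written once in the file header

    def write_block(out, records):
        # Each block: <record count><payload size><length-prefixed records><SYNC>.
        # A Hadoop split can start at any byte offset and scan forward for
        # SYNC to find the next block boundary; the records themselves can
        # be serialized protocol buffer messages.
        payload = b"".join(struct.pack(">I", len(r)) + r for r in records)
        out.write(struct.pack(">II", len(records), len(payload)))
        out.write(payload)
        out.write(SYNC)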

------
vinkelhake
This is by the guys who came up with FAST. That protocol is somewhat common in
the dissemination of market data from exchanges. Pantor targets the financial
sector and real-time trading systems in particular. It's a small company with
smart people.

<http://www.fixprotocol.org/fast>

------
tlrobinson
_Suitability for FPGA implementation_

That's an interesting requirement you don't see very often.

~~~
IvyMike
High Frequency Trading applications, maybe?

------
yannis
Go straight to the tutorial <http://blinkprotocol.org/s/tutorial.html>; it makes it much easier to understand what this is all about.

------
georgeorwell
I'm curious, why do you have a 3-clause BSD-style license on some PDF files
that contain specifications? I guess they are a kind of binary built from the
page layout source code, but I haven't really seen this before. I'm not sure
if you can license the specification itself, i.e. put conditions on people
that want to implement it.

<http://blinkprotocol.org/s/specs.html>

~~~
alanctgardner2
I'm not an expert, but you can definitely restrict the documents which
describe the standard. Now, if someone sits there and reverse-engineers the
spec without ever seeing that doc, that might be tough to prevent. But
generally you can apply a license to any creative work you want; this includes
technical documents. A good (or bad, depending on how you view it) example is
service manuals for laptops.

------
nanodeath
As someone who knows little about Protocol Buffers, my first question was how this compares to Protocol Buffers... which they say blogging about is "on their todo list".

The protocol comes from Pantor Engineering, which apparently provides an
"advanced trading system" out of Stockholm. Neat.

~~~
afuruhed
Hi, I'm going to update the FAQ. We have started comparing to GPB; please see here:

<http://blog.blinkprotocol.org/2013/01/blink-compared-to-google-protocol.html>

As you can see in a comment by Rolf, part of the perf limitation of GPB is the
implementation and not the wire protocol.

Anders

~~~
Inufu
Is there a reason why all the fields for protobuf are declared as "optional",
when it looks like they are in fact not optional at all?

~~~
bchallenor
To support schema evolution. Of course it's up to you to write code that can
actually handle missing fields, but once you have a required field in a
message, you can never remove it.

From the docs:

"You should be very careful about marking fields as required. If at some point
you wish to stop writing or sending a required field, it will be problematic
to change the field to an optional field – old readers will consider messages
without this field to be incomplete and may reject or drop them
unintentionally. You should consider writing application-specific custom
validation routines for your buffers instead. Some engineers at Google have
come to the conclusion that using required does more harm than good; they
prefer to use only optional and repeated. However, this view is not
universal."

<https://developers.google.com/protocol-buffers/docs/proto>
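
For illustration, a proto2-style sketch; addressbook_pb2 is a hypothetical generated module containing "message Person { optional string email = 1; }":

    # A minimal sketch assuming proto2 semantics and the hypothetical
    # generated module described above.
    import addressbook_pb2

    def notify(raw: bytes) -> None:
        person = addressbook_pb2.Person()
        person.ParseFromString(raw)
        if person.HasField("email"):       # proto2 tracks field presence
            print("mailing", person.email)
        else:
            # schema evolution: an older (or newer) writer may simply
            # omit the field, and readers have to cope
            print("no email on record; skipping")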

------
mingpan
Feels kinda like a cross between ProtoBuf and MessagePack. We already have so
many serialization formats, though. What is the compelling use-case for this
one?

------
gdp
Could be useful for some things. I feel like it's a missed opportunity not to employ something like session types, given the existence of a quasi-typed human-readable "schema" definition. It seems like one could (and should!) go further and make it an unambiguous protocol specification against which implementing programs could eventually be checked.

~~~
afuruhed
Good point. Is it possible to create an unambiguous protocol specification that scales to more complex interactions? If so, and if you know how, I guess you would need a different specification language to get that expressiveness? Blink has an extensible schema, but that won't cut it. Anyhow, we have so far not even tried to do what you want, but we hope that there is nothing in Blink that puts limitations on doing it separately. There is a lot more that we don't try to do in the protocol core.

------
wavesounds
Where does this fit on the network protocol stack? Does it replace TCP/UDP? Or
is it on top of HTTP? Or is it not even a network protocol? I've skimmed the
website but I'm surprised this isn't jumping right out at me tbh.

~~~
wavesounds
"Blink can be used as an efficient presentation layer on top of middleware
solutions like message queues and messaging platforms. However, Blink is also
well-suited to be layered on TCP and UDP without any need for encapsulating
headers, thanks to its built in message type system."

How is it TCP without the headers? And how can you put a layer on top of TCP and call that saving space? Shouldn't you just be using TCP? It's late; I must be missing something ...

~~~
ekimekim
It's a standard for encoding data over streams like TCP. If you receive "00 4a 33 2b" over TCP, you have no idea what it means. But if you have a codebook or language defined to understand that data, you'll understand it means "Meet me on the first floor in half an hour". Blink is:

- A meta-language for defining such languages

- A specification of how messages in these languages are turned into binary data and back.

When both client and server know the definitions (written in the meta-language), they can determine which kind of message the other side is sending from the metadata sent with the message (their "message type system"), without the need for any extra information like an HTTP Content-Type header.
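
Roughly like this sketch (not Blink's actual wire format; the one-byte type id is just for illustration):

    # Hypothetical dispatch on a type id carried with each message,
    # removing the need for an encapsulating header like Content-Type.
    DECODERS = {}  # type id -> decode function, generated from the schema

    def decode_message(buf: bytes):
        type_id, body = buf[0], buf[1:]
        if type_id not in DECODERS:
            raise ValueError("unknown message type %d" % type_id)
        return DECODERS[type_id](body)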

------
wrl
This actually looks pretty cool. Disappointed to see that there's no reference
implementation available as a library, however. Is there a plan to release
one?

~~~
afuruhed
The (maybe only) good thing about there being no reference implementation is that we get feedback on the protocol itself and not on the characteristics of a particular implementation. In this first phase, we are providing tools to ease independent implementation. We think it would be a sign of health if the specs are (or become) good enough to create interoperable implementations from. We will make our plans known as they develop. We are doing more with the core protocol next.

~~~
vinkelhake
One thing I sorely missed when working with FAST was an official and
comprehensive test suite. A wide set of templates and encoded files to
exercise all the nooks and crannies of the spec would have been very useful.

It looks like Blink has far fewer special cases, so the need for an official test suite might not be as great. Might be something worth considering anyway.

~~~
randersson
agreed, a test suite will be essential for implementors. I'd be very
interested to hear you thoughts on this. Please mail me if you have interest.
Rolf

------
dpakrk
Does any binary message format (Protocol Buffers, MessagePack, Blink, etc.) support data streaming? As far as I know, none do.

------
swalsh
Why would I use this instead of ASN.1?

~~~
codex
From the FAQ:

"We did not know how to make use of ASN.1 without making Blink a lot more
complex."

This is the same reason Protocol Buffers exist: it was more fun to write
something new than to understand how to use a better but more complex
technology.

~~~
cbsmith
I think you failed to grasp the point of that decision.

It's not about understanding the more complex technology. At least in the case
of protocol buffers, they understood ASN.1. Implementing all that complexity
comes with a price. If nothing else, it takes time away from other priorities.

If you look at protocol buffers, there are still ways to make the code faster,
cleaner, etc., even without adding new features.

Having worked with a variety of ASN.1 libraries in a variety of ways, I can tell you that in almost every case I found performance bugs, functional bugs, or missing features that literally got in my way.

~~~
andrewcooke
also, the ASN.1 spec isn't free, which reduces adoption.

------
berlinbrown
Interesting.

And it applies to anything?

------
shurcooL
What can be said about this?

