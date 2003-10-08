> Imagine if there were 40 competing and completely
> mutually unintelligible versions of html or text encodings
> There really should be a one size fits all minimal
> serialization protocol
> just the same way there is a one size fits all network
> protocol which moves data around the entire internet
But data isn't like water, it's like 'chemicals'. You can't have a standard component for processing data, the same way you can't have a single component that knows how to process sulphuric acid, crude oil, hydrazine, mercury and molten salt.
Data can be binary, delimited, fixed-field, lossy, ASCII or one of many variations of Unicode. It can be executable, contain attacks such as SQL injection, be encrypted, have time-sensitive delivery requirements, include checksums, require checksums to be applied in the protocol, be meaningless without metadata or other data, and have who knows what other constraints, limitations and requirements.
Building a data processing system is not at all like building a water works. It's more like building a chemicals processing plant.
[0] https://www.joelonsoftware.com/2003/10/08/the-absolute-minim...
Meanwhile, XML-RPC (which is not a serialization format!), JSON-RPC, SOAP and Swagger are stacks that intentionally leave open the possibility that someone will come along and consume the form-on-the-wire directly, outside the tooling of the environment. Most in-the-wild JSON-responding APIs have the same expectation.
IDLs themselves are a very old idea, probably because we like declarative ways of specifying contracts that are then applicable across a heterogeneous environment, or in different languages and runtimes, and so on.
As for why there are dozens of offshoots of standalone serialization formats, all predominantly occupied with packing numbers efficiently while keeping the general data model of JSON, I can't answer [1].
[1] https://news.ycombinator.com/item?id=12440783
One example of a tradeoff that is hard to eliminate: you can reduce size and increase performance substantially if you pre-specify a schema, as Cap'n Proto (and others) do. The downside is that if you receive a message without knowing what it is, it's difficult to figure out what it contains. The only way out of this tradeoff that I can see is a global schema registry, with every message dedicating 8 bytes to a schema ID, and that has downsides of its own, especially for small messages.
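A rough sketch of the registry idea, assuming a hypothetical in-process dict stands in for the "global" registry and that each schema is a fixed `struct` layout. The schema IDs, layouts and function names are made up for illustration; the point is the 8-byte header in front of every payload.

```python
import struct

# Hypothetical "global" registry: 8-byte schema ID -> fixed binary layout.
# IDs and layouts are invented for this sketch.
SCHEMA_REGISTRY = {
    1: struct.Struct("<iq"),  # e.g. (user_id: int32, timestamp: int64), 12 bytes
    2: struct.Struct("<dd"),  # e.g. (latitude: float64, longitude: float64), 16 bytes
}

HEADER = struct.Struct("<Q")  # 8-byte little-endian schema ID

def encode(schema_id: int, *fields) -> bytes:
    """Prefix the packed payload with the ID of the schema that describes it."""
    return HEADER.pack(schema_id) + SCHEMA_REGISTRY[schema_id].pack(*fields)

def decode(message: bytes) -> tuple:
    """Read the schema ID, look the layout up, then unpack the payload with it."""
    (schema_id,) = HEADER.unpack_from(message, 0)
    layout = SCHEMA_REGISTRY.get(schema_id)
    if layout is None:
        raise KeyError(f"unknown schema id {schema_id:#x}")
    if len(message) != HEADER.size + layout.size:
        raise ValueError("payload length does not match the registered schema")
    return schema_id, layout.unpack_from(message, HEADER.size)
```

With schema 1 the payload is 12 bytes, so `encode(1, 42, 1_600_000_000)` produces 20 bytes, 8 of which (40%) are the schema ID. That is the "downsides for small messages" cost in concrete terms.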
I do agree with the author though that we could do with more binary serialization protocols with tools to easily translate back and forth to a human-readable text format for debugging.
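A minimal sketch of what such a translation tool could look like, assuming a hypothetical schema expressed as an ordered list of (field name, `struct` format code) pairs. The binary form stays compact on the wire, and JSON is used only as the debugging view.

```python
import json
import struct

# Hypothetical schema: ordered (field name, struct format code) pairs.
FIELDS = [("id", "I"), ("temperature", "f"), ("flags", "B")]
LAYOUT = struct.Struct("<" + "".join(code for _, code in FIELDS))  # 9 bytes, no padding

def binary_to_text(data: bytes) -> str:
    """Decode a binary record into JSON so a human can read it."""
    values = LAYOUT.unpack(data)
    return json.dumps(dict(zip((name for name, _ in FIELDS), values)))

def text_to_binary(text: str) -> bytes:
    """Parse the (possibly hand-edited) JSON back into the binary record."""
    obj = json.loads(text)
    return LAYOUT.pack(*(obj[name] for name, _ in FIELDS))
```

Because the schema drives both directions, a record can be dumped, edited by hand and re-encoded without anyone writing a parser. Note that float fields are only as precise as their binary width (float32 here).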
Cap'n Proto https://capnproto.org/
Simple Binary Encoding (by Martin Thompson) https://github.com/real-logic/simple-binary-encoding/wiki/De...
and if neither of those will do, raw C structs on the wire (basically what the other two are anyway).
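The catch with raw C structs is that the byte layout depends on the compiler's padding and the machine's endianness. A sketch using Python's `struct` module, which mirrors C struct layouts; the `quote` struct is a hypothetical example. Pinning byte order with `<` also turns off native alignment, so both ends agree on the exact bytes.

```python
import struct

# Hypothetical C struct sent as raw bytes:
#   struct quote { uint32_t instrument_id; int64_t price_nanos; uint16_t qty; };
# "<" = little-endian, standard sizes, no alignment padding: 4 + 8 + 2 = 14 bytes.
QUOTE = struct.Struct("<IqH")

def pack_quote(instrument_id: int, price_nanos: int, qty: int) -> bytes:
    return QUOTE.pack(instrument_id, price_nanos, qty)

def unpack_quote(buf: bytes) -> tuple:
    return QUOTE.unpack(buf)

# With native layout ("@") the int64 gets aligned, so the same fields usually
# take more bytes; the exact size is platform-dependent.
NATIVE_SIZE = struct.calcsize("@IqH")
```

Sending a natively laid-out struct works until the two ends disagree on padding or byte order, which is roughly what Cap'n Proto and SBE pin down for you.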
JSON may be everywhere, and it's tempting to look at its flaws and think, "we can do better" but it also has the great benefit of having decent serialization libraries already written in the vast majority of programming languages. That's one heck of a feature.
In the same vein, I'm not a fan of APIs using JSON and/or XML or some other overly flexible textual encoding. Simple binary encodings, TLV-ish if necessary, are the best.
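For anyone who hasn't met TLV (type-length-value): each field is a type tag, a length, then the raw value. A minimal sketch, assuming a hypothetical layout of a 1-byte tag and a 2-byte big-endian length; the tag numbers are invented.

```python
import struct

# Hypothetical TLV layout: 1-byte tag, 2-byte big-endian length, then the value.
TAG_NAME = 0x01  # invented tag numbers
TAG_PORT = 0x02

def tlv_encode(items) -> bytes:
    """Encode (tag, value_bytes) pairs back to back."""
    out = bytearray()
    for tag, value in items:
        if len(value) > 0xFFFF:
            raise ValueError("value too long for a 2-byte length field")
        out += struct.pack(">BH", tag, len(value)) + value
    return bytes(out)

def tlv_decode(data: bytes) -> list:
    """Walk the buffer, returning (tag, value_bytes) pairs in order."""
    items, offset = [], 0
    while offset < len(data):
        tag, length = struct.unpack_from(">BH", data, offset)
        offset += 3
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV record")
        items.append((tag, value))
        offset += length
    return items
```

Since every field carries its own length, a decoder can skip tags it doesn't recognise, which is most of the flexibility people reach for JSON to get.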
I was never really convinced by the "human readable" argument for textual encodings either: you just need to get used to it, and then you can read and write the bytes in a hexdump as easily as you can English. In fact I'd prefer working with hexdumps to XML. But unfortunately there's now a whole generation of developers who can't even count to 2 in binary and don't know what a hex editor is...
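For reference, a minimal hexdump formatter in the spirit of `xxd` or `hexdump -C`: offset, hex bytes, then printable ASCII. The exact column layout here is an assumption and doesn't match either tool byte for byte.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """One row per `width` bytes: 8-digit hex offset, hex bytes, ASCII ('.' for non-printables)."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short final rows.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)
```

Fed a small TLV-style record such as `b"\x01\x00\x03db1"`, the row reads `01 00 03 64 62 31` followed by `...db1`, and after a while the tag, the length and the string jump out without any tooling.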
>"Java monkeys eventually noticed how slow XML was between garbage collects and wrote the slightly less shitty but still completely missing the point Avro."
I would like to know why the author feels that Avro misses the point. Can anyone hazard a guess?
and similar for:
>"Oh yeah, I do like Messagepack; it’s pretty cool."
It would be interesting to hear why they (or anyone else, for that matter) consider MessagePack a worthwhile contribution to the serialization tool shed but not Avro.
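Not the author's reasoning, but one concrete difference that may bear on the question: a MessagePack message is self-describing (each value starts with a type byte), whereas Avro's binary encoding of a record carries no field names or type tags, so the reader needs the writer's schema (Avro container files embed it in the header). A sketch of a small subset of the MessagePack format; the wider int, string and container formats of the full spec are deliberately left out.

```python
def msgpack_encode(obj) -> bytes:
    """Encode a small MessagePack subset: nil, booleans, ints 0..127,
    strings up to 31 UTF-8 bytes, and arrays/maps of up to 15 entries."""
    if obj is None:
        return b"\xc0"                                 # nil
    if obj is True:                                    # check bools before int
        return b"\xc3"
    if obj is False:
        return b"\xc2"
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                            # positive fixint
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw      # fixstr
    if isinstance(obj, list) and len(obj) <= 15:
        return bytes([0x90 | len(obj)]) + b"".join(msgpack_encode(x) for x in obj)  # fixarray
    if isinstance(obj, dict) and len(obj) <= 15:
        out = bytearray([0x80 | len(obj)])             # fixmap
        for key, value in obj.items():
            out += msgpack_encode(key) + msgpack_encode(value)
        return bytes(out)
    raise ValueError(f"value outside this sketch's subset: {obj!r}")
```

`{"a": 1}` encodes to the four bytes `81 a1 61 01`, and a reader with no schema can still tell that it is a one-entry map from a one-byte string to the integer 1.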
XDR is nice, though, apart from being big-endian and not having widely supported 64-bit integers. It's a pity it's unfashionable.
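A sketch of a few XDR encodings from RFC 4506: everything is big-endian and padded to 4-byte units. The spec does define 64-bit `hyper` integers; the complaint above is presumably about support in implementations. The function names are just for this sketch.

```python
import struct

def xdr_int(value: int) -> bytes:
    """XDR int: 32-bit two's complement, big-endian."""
    return struct.pack(">i", value)

def xdr_hyper(value: int) -> bytes:
    """XDR hyper: 64-bit two's complement, big-endian."""
    return struct.pack(">q", value)

def xdr_string(text: str) -> bytes:
    """XDR string: 4-byte unsigned length, the ASCII bytes, then zero padding
    up to the next multiple of 4 bytes."""
    raw = text.encode("ascii")
    padding = (4 - len(raw) % 4) % 4
    return struct.pack(">I", len(raw)) + raw + b"\x00" * padding
```

So `"hello"` becomes 12 bytes: `00 00 00 05`, the five characters, and three zero bytes of padding.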
External Data Representation (XDR)
https://en.wikipedia.org/wiki/External_Data_Representation
https://tools.ietf.org/html/rfc4506
Abstract Syntax Notation One (ASN.1)
https://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One
When was the last time you talked to people working on another software stack? Cooperation between different tribes has to be enforced by strong leadership or a Big Need, like the imminent extinction of the tribe. As long as that doesn't exist and the whole ecosystem keeps growing, you can just sit there and watch people build one silo after another instead of reaching a higher step in evolution.
And it's actually the reasonable thing to do. I mean, would you rather have a minuscule share of a cake others baked, or your own cake? When both take about the same effort, I'd rather have my own cake, even if I have to define a new serialization protocol to store it.
I'm tired of this state of affairs. All of the above can be done in real programming languages, with real syntax. There is no need for yet another external DSL when an embedded DSL in the form of a library will suffice.
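A rough illustration of the embedded-DSL approach: the schema is an ordinary Python dataclass and the wire layout is derived from its type annotations, so there is no separate IDL file or code generator. The type-to-wire mapping and the helper names are hypothetical and don't correspond to any particular library.

```python
import struct
from dataclasses import dataclass, fields

# Hypothetical mapping from Python annotations to fixed-width wire types.
_WIRE_CODES = {int: "q", float: "d", bool: "?"}

def wire_struct(cls) -> struct.Struct:
    """Derive a little-endian, unpadded layout from a dataclass's field annotations."""
    return struct.Struct("<" + "".join(_WIRE_CODES[f.type] for f in fields(cls)))

def serialize(obj) -> bytes:
    layout = wire_struct(type(obj))
    return layout.pack(*(getattr(obj, f.name) for f in fields(obj)))

def deserialize(cls, data: bytes):
    return cls(*wire_struct(cls).unpack(data))

@dataclass
class Reading:  # the "schema" is plain code in the host language
    sensor_id: int
    value: float
    valid: bool
```

`serialize(Reading(7, 1.5, True))` yields 17 bytes (8 + 8 + 1) and `deserialize` rebuilds an equal `Reading`. The tradeoff is that other languages can't read this schema without porting it, which is the usual argument for an external IDL.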
People use those "external DSLs" because they are tired of bash and ssh for the things they want to do.
https://xkcd.com/927/
