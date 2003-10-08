Hacker News
Please stop writing new serialization protocols (scottlocklin.wordpress.com)
41 points by otterley 1 hour ago | 33 comments





  > Imagine if there were 40 competing and completely
  > mutually unintelligible versions of html or text encodings
There are.

  > There really should be a one size fits all minimal
  > serialization protocol
There can't be.

  > just the same way there is a one size fits all network
  > protocol which moves data around the entire internet
There isn't.

reply


Every now and then I come across someone who thinks data is like water and we should have standardised 'pipes' and components for processing and transporting data and converting it between formats like we have pipes and valves for transporting and purifying water.

But data isn't like water, it's like 'chemicals'. You can't have a standard component for processing data, the same way you can't have a single component that knows how to process sulphuric acid, crude oil, hydrazine, mercury and molten salt.

Data can be binary, delimited, fixed-field, lossy, ASCII, any of many variations of Unicode, executable, or encrypted. It can contain attacks such as SQL injection, have time-sensitive delivery requirements, include checksums, require checksums to be applied in the protocol, be meaningless without metadata or other data, and have who knows what other constraints, limitations and requirements.
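To make the "many variations of Unicode" point concrete, here is a minimal sketch showing one string turned into three different byte sequences. The sample string is an assumption chosen for illustration; it does not come from the thread.

```python
# Illustrative only: the sample string is an assumption, not from the thread.
text = "naïve"

utf8 = text.encode("utf-8")        # 'ï' becomes two bytes: c3 af
utf16 = text.encode("utf-16-le")   # every character here becomes two bytes
latin1 = text.encode("latin-1")    # 'ï' becomes one byte: ef

print(utf8.hex(" "))
print(utf16.hex(" "))
print(latin1.hex(" "))

# Decoding with the wrong codec does not raise an error here;
# it silently produces different text.
misread = utf8.decode("latin-1")
print(misread == text)  # False
```

The receiver cannot tell from the bytes alone which encoding was meant, which is one reason a payload is often meaningless without metadata.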

Building a data processing system is not at all like building a water works. It's more like building a chemicals processing plant.

reply


All valid. But if you want to avoid a shock, please don't talk to people who transport actual water in actual pipes. I bet you'd be surprised how many different standards of pipes there are and the long list of features that separate one kind of water from the other. ;-)

reply


I suspect the author hasn't read The Absolute Minimum ... about Unicode [0] (or why there's no such thing as "plain text")

[0] https://www.joelonsoftware.com/2003/10/08/the-absolute-minim...

reply


Oh no, there are 40 competing standards. We must make one to unify them all! Oh no, there are 41 competing standards.

reply


In fairness they are saying pick one, the earlier one.

reply


https://xkcd.com/927/

edit: scrolling down shows me this has already been posted 3 times in this thread (and will probably be posted a few more times before it's off the front page)

reply


You sum it up nicely.

reply


Thrift and Avro -- and later, gRPC, but not protobuf -- are full RPC stacks where you use an IDL to codegen your endpoints, and those endpoints communicate using their own serialization. Since this form-on-the-wire is an "internal" concern not meant for direct public consumption, I find this acceptable.

Meanwhile, XML-RPC (which is not a serialization format!), JSON-RPC, SOAP, and Swagger are stacks that intentionally leave open the possibility that someone will come along and consume the form-on-the-wire directly, outside of the tooling of the environment. Most in-the-wild JSON-responding APIs have the same expectation.

IDLs themselves are a very old idea, probably because we like declarative ways of specifying contracts that are then applicable across a heterogeneous environment, or in different languages and runtimes, and so on.

As for why there are dozens of offshoots of standalone serialization formats, all predominantly occupied with the efficient packing of numbers while keeping the general data model of JSON, I can't answer [1].

[1] https://news.ycombinator.com/item?id=12440783

reply


I don't think this is as big a problem as the article suggests. There are a substantial number of tradeoffs in serialization protocols, and each application/ecosystem can choose their protocol to get the best of these tradeoffs. As long as there are few enough that every popular protocol has a library in almost every language this isn't too bad.

One example of a tradeoff that is hard to eliminate is that you can reduce size and increase performance substantially if you pre-specify a schema, as Cap'n Proto (and others) do. The downside is that if you just get a message without knowing what it is about, it's difficult to figure out. The only way out of this tradeoff that I can see is having a global schema registry and every message having 8 bytes dedicated to schema ID, and that has downsides of its own, especially for small messages.
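A rough sketch of the schema-ID idea, assuming a hypothetical in-process registry (`SCHEMAS`) standing in for the global one and a made-up schema layout. It is only meant to show where the 8 bytes go and how much of a small message they take up.

```python
import struct

# Hypothetical registry: schema ID -> fixed binary layout.
# A real global registry would live outside the process.
SCHEMAS = {
    0x1: struct.Struct("<IHd"),  # made-up schema: (user_id, port, load)
}
HEADER = struct.Struct("<Q")     # 8-byte unsigned schema ID

def encode(schema_id, *fields):
    """Prefix the schema-encoded payload with its 8-byte schema ID."""
    return HEADER.pack(schema_id) + SCHEMAS[schema_id].pack(*fields)

def decode(message):
    """Read the schema ID, look up the layout, then unpack the payload."""
    (schema_id,) = HEADER.unpack_from(message, 0)
    layout = SCHEMAS[schema_id]
    return schema_id, layout.unpack_from(message, HEADER.size)

msg = encode(0x1, 42, 8080, 0.5)
print(len(msg))     # 22: 8 bytes of header in front of a 14-byte payload
print(decode(msg))  # (1, (42, 8080, 0.5))
```

In this example the header is more than a third of the message, which is the small-message downside mentioned above.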

I do agree with the author though that we could do with more binary serialization protocols with tools to easily translate back and forth to a human-readable text format for debugging.

reply


The only three I ever use:

Cap'n Proto https://capnproto.org/

Simple Binary Encoding (by Martin Thompson) https://github.com/real-logic/simple-binary-encoding/wiki/De...

and if neither of those will do, raw C-structs on the wire (basically what the other 2 are anyways).
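For readers who have not seen the raw-struct approach, a minimal sketch using Python's `ctypes` to mirror a C struct. The `Tick` message and its fields are hypothetical. Fields are ordered largest first so the layout needs no padding.

```python
import ctypes

class Tick(ctypes.LittleEndianStructure):
    """Hypothetical fixed-layout message; not taken from any real protocol."""
    _fields_ = [
        ("price", ctypes.c_int64),           # bytes 0..7
        ("instrument_id", ctypes.c_uint32),  # bytes 8..11
        ("quantity", ctypes.c_uint16),       # bytes 12..13
        ("flags", ctypes.c_uint16),          # bytes 14..15
    ]

tick = Tick(1234500, 7, 10, 0)
wire = bytes(tick)                     # the struct's memory, byte for byte
decoded = Tick.from_buffer_copy(wire)  # reinterpret the bytes, no parsing step

print(len(wire))                       # 16
print(decoded.price, decoded.instrument_id, decoded.quantity)
```

There is no encode or decode logic at all, which is the appeal. The cost is that both sides must agree on field order, widths and byte order out of band.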

reply


There are some really good reasons to pick an unusual serialization protocol, and even sometimes reasons to invent your own. (Embedded systems, limited environments, licensing restrictions, etc.) Generally though, you should use something the rest of your development team / community is familiar with. Not because this is efficient in terms of resource usage on the machine, but because this is efficient in terms of teaching your other developers how your serialization protocol works.

JSON may be everywhere, and it's tempting to look at its flaws and think, "we can do better" but it also has the great benefit of having decent serialization libraries already written in the vast majority of programming languages. That's one heck of a feature.

reply


As much as it seems to be recommended against (by... authors of serialisation protocols?) I am a strong believer in just using simple structures (and unions if necessary) directly --- all these serialisation abstractions appear to have been invented at a time when machines varied far more widely in their characteristics such as endianness, alignment, word size, integer representation, and even byte size. Now that your platform is almost certainly going to be x86 or ARM, it makes little sense to add a layer of (sometimes substantial) complexity in essentially attempting to accommodate flexibility that won't be needed. I can see the necessity if e.g. you need to communicate with a 36-bit 1's complement mainframe, but otherwise it's just bloat.

Along the same sentiment, I'm not a fan of APIs using JSON and/or XML or some other overly-flexible textual encoding. Simple binary encodings, TLV-ish if necessary, are the best.
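As an illustration of what "TLV-ish" can mean, here is a small type-length-value sketch. The 1-byte tag and 2-byte little-endian length are assumptions made for the example; real TLV formats differ in field widths and byte order.

```python
import struct

RECORD_HEADER = struct.Struct("<BH")  # assumed layout: 1-byte tag, 2-byte length

def tlv_encode(records):
    """Encode (tag, value_bytes) pairs as tag | length | value."""
    out = bytearray()
    for tag, value in records:
        out += RECORD_HEADER.pack(tag, len(value))
        out += value
    return bytes(out)

def tlv_decode(data):
    """Walk the buffer record by record; unknown tags can simply be skipped."""
    records = []
    offset = 0
    while offset < len(data):
        tag, length = RECORD_HEADER.unpack_from(data, offset)
        offset += RECORD_HEADER.size
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated record")
        records.append((tag, value))
        offset += length
    return records

encoded = tlv_encode([(1, b"hello"), (2, b"\x2a\x00")])
print(encoded.hex(" "))  # 01 05 00 68 65 6c 6c 6f 02 02 00 2a 00
print(tlv_decode(encoded))
```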

I was never really convinced by the "human readable" argument for textual encodings either --- you just need to get used to it, then you can read and write the bytes in a hexdump as easily as you can English. In fact I'd prefer working with hexdumps to XML. But unfortunately there's now a whole generation of developers who can't even count to 2 in binary and don't know what a hex editor is...

reply


It sounds reasonable at first. But I'm very happy that we have a lot of stuff supporting very rare edge cases. Maybe it's just bad luck, but at work I often run into edge cases and I'm very happy if it is at least partly supported.

reply


Please stop writing articles that say "Please stop X"

K, thnx

reply


The author states:

>"Java monkeys eventually noticed how slow XML was between garbage collects and wrote the slightly less shitty but still completely missing the point Avro."

I would like to know why the author feels that Avro misses the point. Can anyone hazard a guess?

and similar for:

>"Oh yeah, I do like Messagepack; it’s pretty cool."

It would be interesting to hear why they (or anyone else for that matter) consider Messagepack a worthwhile contribution to the serialization tool shed but Avro is not.

reply


The big problem is that rpcgen is hairy and not reeeeeeally portable (in my experience, if there's a feature you want, it's not supported on your Unix).

XDR is nice, though, apart from being big-endian and not having widely-supported 64-bit integers. It's a pity it's unfashionable.
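For reference, a hand-rolled sketch of two XDR encodings as described in RFC 4506: a 32-bit signed integer in big-endian order, and a string as a 4-byte length followed by the bytes, zero-padded to a multiple of four. The helper names are made up for the example.

```python
import struct

def xdr_int(value):
    """XDR int: 32-bit signed, big-endian."""
    return struct.pack(">i", value)

def xdr_string(text):
    """XDR string: 4-byte unsigned length, bytes, zero padding to 4-byte boundary."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding

encoded = xdr_int(1) + xdr_string("hello")
print(encoded.hex(" "))
# 00 00 00 01 00 00 00 05 68 65 6c 6c 6f 00 00 00
```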

reply


I had to look up XDR and ASN.1 ...

External Data Representation (XDR)

https://en.wikipedia.org/wiki/External_Data_Representation

https://tools.ietf.org/html/rfc4506

Abstract Syntax Notation One (ASN.1)

https://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One

reply


The only problem with this kind of rant is the assumption that people actually talk to each other. But why would they?

When was the last time you talked to people working on another software stack? Cooperation between different tribes needs to be enforced by strong leadership or a Big Need like imminent extinction of the tribe. As long as that doesn't exist and the whole ecosystem continues to grow, you can just sit there and watch people building the next silo and the next instead of getting to a higher step in evolution.

And it's actually the reasonable thing to do. I mean, would you rather have a minuscule share of a cake others baked, or do you want to have your own cake? When both are about the same effort, I'd rather have my own cake, even if I have to define a new serialization protocol to store it.

reply


In a similar vein, please stop writing external DSLs. Especially in the DevOps ecosystem. I'm really tired of learning yet another syntax for (bash + ssh).

reply


This is a hairier problem than you make it out to be. Eventually, you end up needing to describe the state of a system based on input values you may not be able to control. Arbitrarily complex dependency graphs add another layer of complication. Being able to construct simple data structures can suffice for almost all use cases, which is why YAML is so popular. Eventually, though, you end up needing to programmatically generate those trees of structures and oops now you're Turing-complete. If the goal is to decouple describing the state and executing the actions leading to convergence, there's no good solution to bridge the gap between the two, so you end up writing a DSL or creating an unholy marriage of a data description language and an imperative templating language (YAML+Jinja2), or just doing it all in BASH and giving up on the idea of clean separation.

reply


It's not hairy at all. All these "orchestration" things written in some weird YAML format end up doing the same thing. Re-inventing their own syntax for modules. Re-inventing their own querying language. Re-inventing idempotent building blocks and then forcing you to compose it all in some YAML format.

I'm tired of this state of affairs. All of the above can be done in real programming languages, with real syntax. There is no need for yet another external DSL when an embedded DSL in the form of a library will suffice.
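A toy sketch of what an embedded DSL in library form might look like: one idempotent building block plus a plain list of steps, composed with ordinary language syntax. Both `ensure_line` and `converge` are hypothetical names invented for this example, not the API of any real orchestration tool.

```python
import os
import tempfile

def ensure_line(path, line):
    """Idempotent step: append `line` unless the file already has it.

    Returns True if the file was changed, False if it was already converged.
    """
    existing = []
    if os.path.exists(path):
        with open(path) as handle:
            existing = handle.read().splitlines()
    if line in existing:
        return False
    with open(path, "a") as handle:
        handle.write(line + "\n")
    return True

def converge(steps):
    """Run each step in order and report which ones changed something."""
    return [step() for step in steps]

with tempfile.TemporaryDirectory() as workdir:
    hosts = os.path.join(workdir, "hosts")
    plan = [
        lambda: ensure_line(hosts, "10.0.0.1 db"),
        lambda: ensure_line(hosts, "10.0.0.2 cache"),
    ]
    first_run = converge(plan)
    second_run = converge(plan)

print(first_run)   # [True, True]
print(second_run)  # [False, False]
```

The plan is ordinary data in the host language, so loops, functions and conditionals come for free instead of being reinvented in a template layer.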

reply


But which programming language? Some like Python, some like Go, some like Ruby. Whichever one you choose, it will alienate a large group of people who could otherwise use the product. It's not black and white.


If you're tired, then just use bash + ssh.

People use those "external DSLs" because they are tired of bash and ssh for the things they want to do.

reply


Or, you know, they could just write a library in a real programming language instead of forcing people to write YAML. I don't quite get the fascination with YAML. Give me actual fucking syntax instead of some bastardized serialization format that is badly trying to ape Lisp.

reply


I wonder what the author dislikes about gRPC.

reply


When Facebook wrote Thrift, protobuf wasn't open-sourced yet.

reply


That kind of thinking is what brought us UML - the most powerful useless tool.

reply


At the end, it sounds very much like he's suggesting adding another protocol based on XDR, with just a few reasonable changes. It's like the article is a living example of the classic xkcd joke.

[]: https://xkcd.com/927/

reply


Situation: there are now n+1 competing serialization protocols

https://xkcd.com/927/

reply


Sad you got downvoted. That's pretty much what the author did -- list a bunch of serialization protocols, say that the ones developed in the 80's were good enough, and then recommend we all use MessagePack.

reply


That isn't accurate: he states a preference for XDR, which is ancient, and says that he "likes" MessagePack. He didn't explicitly advocate it.

Not blaming anybody for skimming the post, which was a pretty typical blog rant. If it were from certain other people it would clearly be clickbait, but this seemed like genuine ranting, at least.

reply


I thought of this immediately. Not sure why you're downvoted.

reply




