1) scanf on a floating-point number works differently on different platforms. If you explore the space of expressible numbers, you will find very different amounts of floating-point error depending on which implementation you use.
Transmitting floating point as ASCII is therefore just asking for trouble. But IEEE 754 is fairly universal, and a binary fixed-point conversion will look the same everywhere.
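To illustrate, here's a minimal Python sketch (assuming IEEE-754 doubles, which nearly all modern hardware uses): the text round-trip loses information, while the raw binary round-trip is bit-exact.

```python
import struct

x = 0.1 + 0.2  # 0.30000000000000004, not exactly 0.3

# A typical "%f"-style text round-trip silently drops precision:
text = "%f" % x              # '0.300000'
assert float(text) != x

# A raw IEEE-754 round-trip is bit-exact, and the big-endian byte
# layout looks the same on every platform:
wire = struct.pack(">d", x)  # 8 bytes on the wire
assert struct.unpack(">d", wire)[0] == x
```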
3) The complaint about not being able to locate fields in a text dump is fair enough, but it's easily solved by adding a four-char code that specifies the message type. For example, if you open a video file in a hex editor, you'll see headers like 'MPEG' all over.
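A sketch of what that looks like in practice (the 'TIME' tag and the length field here are invented for illustration, not from any real protocol):

```python
import struct

def pack_msg(tag: bytes, payload: bytes) -> bytes:
    """Frame a payload with a 4-char type code and a 4-byte length."""
    assert len(tag) == 4
    return tag + struct.pack(">I", len(payload)) + payload

msg = pack_msg(b"TIME", b"\x01\x02\x03")
# The four-char code is plainly visible in any hex dump:
assert msg[:4] == b"TIME"
```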
3) In high performance or low memory applications, translating text to binary and back is expensive. But writing binary as text for debugging is just a printf away.
4) Auxiliary to 3, allocating memory for streams becomes an O(log n) problem in terms of allocations, since you don't know where the current message of arbitrary length will end. Meanwhile, binary messages over either UDP or TCP are fixed-length and atomic, vastly simplifying your streaming and event code.
The worst case of this I ever saw was a text RPC protocol for video streaming, where a single message could be anywhere between 20 bytes and several GB. The guy who wrote that one made the same argument ESR makes: "It's so easy to read!"
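The fixed-length point is easy to sketch (the field layout below is invented for illustration): with a known message size, the reader never scans for a delimiter or reallocates a growing buffer.

```python
import io
import struct

MSG_FMT = ">IdH"                     # hypothetical layout: id, timestamp, flags
MSG_SIZE = struct.calcsize(MSG_FMT)  # 14 bytes, known before any data arrives

def read_messages(stream):
    # Read exactly MSG_SIZE bytes per message: no delimiter search,
    # no guessing where a message of arbitrary length ends.
    while chunk := stream.read(MSG_SIZE):
        yield struct.unpack(MSG_FMT, chunk)

stream = io.BytesIO(struct.pack(MSG_FMT, 1, 2.5, 0) * 3)
assert len(list(read_messages(stream))) == 3
```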
There is clearly an efficiency downside to packing this description into the messages themselves when the recipient software will necessarily have the protocol information necessary to decode it.
I have learned some things about how the NTP protocol was designed. And it is a good summary of trade-offs involved. You may or may not agree with them, and they may or may not be important to you.
However, I have learned even more things about how absolutely mind-bogglingly huge the egos of some posters here are.
> “Are you” she asked “the most famous programmer in the world?”
> This was a question which I had, believe it or not, never thought about before. But it’s a reasonable one to ask...
That reminds me of the Mork file format. Apparently, the edict from management was that the database file format should be space-efficient and human readable. This is to store essentially an entity-attribute-value relationship. The resulting file looks like this: (^a0^23fca)(^a1^23fcb)(^a2^23fcc)... totally unreadable. And it's not particularly efficient as a file format either as a result.
> A wire protocol is a way to pass data structures or aggregates over a serial channel between different computing environments.
A real wire protocol involves not just the flow of data but the flow of control. Without knowing which data structures are replies to which others, issued under what circumstances and affecting which other parts of the state space, you don't have a protocol. All you have is a format. It's the difference between the floor plan of a courtroom vs. the rules for what happens within one, and that difference is not a minor one.
Then, predictably, ESR fails to distinguish between three separate concepts: binary vs. text, fixed vs. variable-length fields, self-describing vs. not. While he sets up a false dichotomy between two of them, all eight combinations actually exist. There are even further variations: "self-describing" can apply to any combination of lengths/delimiters, field names, internal format/encoding, applicable versions, mandatory vs. optional fields, and more. If you want to have a serious discussion about designing protocols and formats, the design space is much larger than "NTP's wire format sucks for one set of constraints and purposes," which is all you'll get from this article.
I don't like the article for the same reasons. It felt long-winded and didn't ultimately offer anything of value. What's worse is that it's written from a position of authority, common for this particular author, so some people will be duped into treating it as gospel.
I would look forward to reading articles from any point of view, however strongly people felt, and he seems like he would make for a great debate partner, but at the end it was just... "You think you know, but you don't."
Seems like a classic ivory-tower-architect mentality that grinds potentially good conversations to a halt.
I have spent a lot of time working with FPGAs that process network packets and many of the performance guarantees relied on the rigid structure of L2-4 protocol headers.
Incidentally, in my work lately I've used JSON for data exchange over the network where performance is not important, and MsgPack (which is essentially packed JSON) where it is.
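MsgPack isn't in the Python standard library, so as a stand-in here's the same size trade-off illustrated with a struct-packed record versus JSON (field names and values invented for illustration):

```python
import json
import struct

record = {"id": 42, "ts": 1234.5}

# Self-describing text: field names and formatting travel in every message.
text_wire = json.dumps(record).encode()  # 24 bytes

# Packed binary: the schema (">Id") lives out-of-band in the code/docs.
binary_wire = struct.pack(">Id", record["id"], record["ts"])  # 12 bytes

assert len(binary_wire) < len(text_wire)
```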
That is precious.
"The average IQ of the Haitian population is 67... Haiti is, quite literally, a country full of violent idiots." -Eric S Raymond
"... The minimum level of training required to make someone effective as a self defense shooter is not very high... unfortunately, this doesn't cover the BLM crowd, which would have an average IQ of 85 if it's statistically representative of American blacks as a whole. I've never tried to train anyone that dim and wouldn't want to." -Eric S Raymond
(Note: this is just the tip of the shitberg. There are SO MANY MORE examples on so many other topics (like "Is the casting couch fair trade?") from so many other times over the decades.)
JSON is a disaster for many reasons. Hardware incompatible floating point is one; inconsistency in parser implementations (and ambiguities in the spec) also don’t help.
Also, why use a tree-structured data representation when the underlying data structure is fundamentally just an N-tuple with a fixed schema?
Similarly, why use a text protocol to send around fixed length blobs or encrypted data?
If I got the chance to redesign NTP from scratch, there are a lot of things I'd change, but the use of fixed binary fields is not one of them.
What happens if I want to run NTP on an ARM M4 microcontroller with a lithium coin-cell battery? Because, you know, perhaps I actually might want accurate time on devices that outship even cell phones?
Sending such a message without introducing drift would be difficult because of the sheer number of bytes involved; the transmission time is far too long. I could go on and on...
If you want to see a relatively well-designed protocol, go chew through the BLE (Bluetooth Low Energy) spec. It's not perfect, but it shows you how to balance functionality vs. engineering (note the number of times you have a "length" parameter so that you can chew through the binary blobs even if you can't parse all of them).
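The "length parameter" trick is worth seeing in miniature. The sketch below is loosely modeled on BLE's type/length framing but is not an exact reproduction of it (real BLE advertising data puts the length byte first and counts the type byte inside it):

```python
def iter_blocks(blob: bytes):
    # Each block: 1-byte type, 1-byte payload length, payload.
    # Unknown types are harmless: the length still tells us how far to skip.
    i = 0
    while i < len(blob):
        btype, length = blob[i], blob[i + 1]
        yield btype, blob[i + 2 : i + 2 + length]
        i += 2 + length

blob = bytes([0x01, 2, 0xAA, 0xBB,   # a type we understand
              0x7F, 3, 1, 2, 3])     # an unknown type: still skippable
assert [t for t, _ in iter_blocks(blob)] == [0x01, 0x7F]
```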
Please quit giving ESR a platform when it's quite clear he really sucks as a programmer.
I am not sure this post complies with Hacker News guidelines.
I find ESR to be ferociously overrated from a technical standpoint and resent the fact that he absorbs oxygen from people far more talented but far less "adept" at self-promotion. In addition, the "technical" ideas that he promulgates occasionally have to be actively undone by those with stronger technical chops.
Why does verbalizing this run afoul of them?
Don't we already have formats? (EDIT: Like CAN or I2C?)
A "protocol" sits on top of things like I2C, SPI, and CAN.
Protocols answer questions like: "How do I send more bytes than the underlying transport can take in a single transaction?" "How do I exchange data between hardware with different characteristics?" "How do I minimize the power or time needed to exchange data?"
Different protocols have different strengths and weaknesses.
Remember: part of my complaint about this "protocol" is its verbosity. If you are on a battery or are bandwidth-constrained (Narrowband-IoT, LoRa, ANT, etc.), you want a protocol that exchanges short messages. Time sync is certainly something you don't want to require lengthy messages just to set up.
Too many people think "embedded" means "runs a Linux installation larger than the average computer in 1996".
He starts by presenting a false dichotomy (bit stream vs. self-documenting text) and proceeds to apply his personal experience with proprietary GPS trackers to the well-documented NTP protocol. He describes his favorite approach without mentioning its downsides, and that approach is JSON! JSON!
By design, the JSON format lacks any capacity for extension. Its creators figured out that backwards and forwards compatibility is more important than anything else, so they froze the specification at version 1 and refused to introduce new features or extension support. And thus JSON can't...
1) contain comments;
2) properly encode non-Latin text (no, \u hex escapes are even worse than no encoding);
3) have more than one top-level element;
4) have any data types except the ones in the JSON spec.
Each of those limitations has led to the creation of at least one incompatible JSON-like format that can't be processed by spec-compliant JSON parsers. Pick a random piece of JSON from the wild, and you may find that it isn't actually "JSON", but one of those quasi-JSON formats. To make matters worse, the JSON spec doesn't specify a maximum supported number size or precision, so JSON payloads from one implementation may not properly decode in another.
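The number-precision problem is easy to demonstrate: the spec allows arbitrary decimal numbers but says nothing about what a receiver must preserve, so two compliant implementations can disagree.

```python
import json

big = 2**53 + 1  # 9007199254740993: beyond the exact-integer range of a double

# Python's parser keeps arbitrary-precision integers exact...
assert json.loads(json.dumps(big)) == big

# ...but any implementation that maps every JSON number to an IEEE-754
# double (as JavaScript does) cannot: the nearest double is 2**53.
assert float(big) == float(2**53)
```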
If he wants to design a JSON-based NTP protocol, he is welcome to do so. But widely adopting such a thing would be unwise: we already suffer from traffic-amplification attacks via NTP, and bigger packets would make those worse.
This is (RFC) valid JSON:
And if you don't like the limitations of JSON, extending the format for your particular use-case is a valid solution. (Though I would argue that going w/ a well-known format that already supports your needs is a more pragmatic one.)
You mean, "valid according to the latest 2017 RFC". Such a young RFC is still too raw, too immature to adopt, especially where data interchange formats are concerned. IPv6 was created in 1995, and it's apparently still too young!
I fear that a proper full-featured JSON spec, with comment support, mandatory UTF-8, and a strict prohibition of hex encoding won't be created and implemented by most JSON parsers until at least 2090. At that point the JSON format itself will likely no longer be sufficiently hip for general use (just as XML suddenly stopped being hip enough in the early 2000s).
> I fear, that a proper full-featured JSON spec, with comment support
Many of us use JSON as a language to exchange data, service to service. Comments do no good in that regard. Even with comments, JSON is not terribly friendly; I'd recommend TOML or YAML, depending on the situation.
> mandatory UTF-8
JSON is required to be encoded in one of the Unicode UTF encodings. So, it's not required to be UTF-8, but it's pretty close, and I don't think I've yet run across a JSON document that wasn't UTF-8.
> strict prohibition of hex-encoding
I don't think you'd really want this. (Particularly if you want human-friendly features, like comments…) In debug situations, certain non-printing characters are just easier to deal w/ if they're not printed, for example.
When designing any system, be it a wire protocol or anything else, it’s tempting to optimize for metrics that are easy to measure and forget about metrics that are hard to measure. Humans are expensive. Time is expensive. It may very well be worth using a little extra bandwidth to minimize development and debugging costs. That won’t always be the case, but it’s an important question to ask while you’re still in the designing stages.
Any decent programmer can see that the technical arguments here are flawed: NTP, by nature, needs to be very predictable and use as few bytes as possible, or embedded systems are going to run into issues. That's a hard technical requirement, not a matter of optimization. Unfortunately, that oversight, combined with the author's poor track record, detracts from an important point. Sadly, the author has chosen to present his argument as a misguided rant about a particular protocol rather than a strong theoretical debate over the pros and cons of different optimization goals.
Error replies should also include at least a short text response (often alongside a numeric one).
Initial connection strings might also have a text-string that says something useful to humans, like what the protocol is; just like the various multimedia container formats that were developed on the Internet rather than by 'media companies'.
Push complexity as far up the protocol stack as possible, but don't sacrifice useful extensibility at lower levels if it makes sense. At the same time, don't depend on that data staying the same (if it exists in another layer, it should be modifiable without breaking your actual protocol). FTP is a great example of a protocol that (because of connection-multiplexing limitations) embeds data which should be low-level into a higher level.
If an RFC exists that describes the protocol, then packed is probably OK. If no RFC-described protocol or method exists, try to get as close as possible with off-the-shelf stuff, and prototype with human-readable things where possible on top of that, until solid requirements for a new RFC are refined.
I always try to design in a debug mode. Turn it on, and the destination will try to tell you exactly what you did wrong instead of stonewalling you.
Painful experience has taught me that you really want unique start and stop tokens, a message type, a message length, a rev field, and a checksum/MAC, always. That at least allows you to mechanically validate and dispatch packets/messages.
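As a sketch of that checklist (every token, format string, and field width below is invented for illustration, not taken from a real protocol):

```python
import struct
import zlib

START, END = b"\xAA\x55", b"\x55\xAA"  # unique start/stop tokens
HEADER = ">2sBBH"                      # start, msg type, rev, payload length

def frame(msg_type: int, rev: int, payload: bytes) -> bytes:
    head = struct.pack(HEADER, START, msg_type, rev, len(payload))
    crc = struct.pack(">I", zlib.crc32(head + payload))
    return head + payload + crc + END

def validate(pkt: bytes) -> bytes:
    # Mechanical validation: tokens, declared length, and checksum
    # must all agree before we ever look at the payload.
    start, mtype, rev, length = struct.unpack_from(HEADER, pkt)
    hlen = struct.calcsize(HEADER)
    assert start == START and pkt[-2:] == END
    (crc,) = struct.unpack_from(">I", pkt, hlen + length)
    assert crc == zlib.crc32(pkt[: hlen + length])
    return pkt[hlen : hlen + length]

assert validate(frame(7, 1, b"hello")) == b"hello"
```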
You reminded me of the horror of talking to closed source things where there isn't even useful server-side debugging data. Stuff just fails or gets dropped without informing anyone why it went bad.
This also applies to things like my credit card - I would love to have the last week or two of even /failed/ attempts at using my card in my online statement. That would really, really help with figuring out whether someone was trying to use my card, or whether a given service that rejected my card even tried to hit the CC company. (This happened with a major travel site which probably did its own processing; having a firm direction to push and solid data might have helped.)
I had some young feller tell me some time back why ASN.1 was the anti-christ, but I can't remember (for the life of me) why. Do you happen to know/remember why ASN.1 is 'bad'?
I can write 8-bit assembler to it, I can write 32- or 64-bit compiled C to it, and it's pretty easy to create an FPGA pre-processor (and router) for it, and it's certainly more constrained than random JSON strings. What's wrong with ASN.1? Too old?
But mostly ASN.1 comes from the same "bad neighborhood" as CORBA, X.400, X.509, OSI protocols, etc.
Whenever (way back in the day) we (our/my company) did SET (secure 3P "secure" credit card protocol) with competitors (MS/HP/RSA/IBM/Netscape/etc.), because we compiled an interpreter from the spec, we were able to put in code-path switches depending on the counter party and adapt. Since they had a buggy compiler from a 3rd party - they could not.
Was that an issue with ASN.1? Or crappy tooling that people used?
I had to resort to a 409 dump
In my dozen-plus years of working with them, I've not seen a large change like this. And TBH, there are ways to structure them to make sure they handle changes.
I mostly deal with hardware that has to be stable in the field for years, so maybe that's the difference.
Sounds like the bit-packed protocols save bits (that's clearly good), while the other protocols are self-documenting. Documenting is good.
Why can't we have both? Something like protocol buffers (yes, Raymond mentions them in the article) is a binary protocol that makes pretty efficient use of the bits on the wire. But they are also very well documented. And it's "documented" that is useful, not "documentation is included in every message that gets sent".
Is it possible to look at the on-wire protocol and tell that this message uses protocol buffers and which protocol specification it is using? (I'm not sure. I hope the answer is yes.) Is it possible to find the documentation for a particular protocol buffers specification once you know which one it is? (I think it is, if by no other means than a google search, although a more automated repository might be nice.) If both these things are true, then we can have bit-level efficiency AND have well-documented and extensible messages.
I wonder if there's merit to recreating such a thing under ICANN, where the issued serial numbers are useful for file types, wire protocols, etc.
Then anyone needing to reliably interpret a packet could (1) look for the format serial number at some well-known location, and then (2) consult the well-publicized registry for whatever information has been provided regarding the format.
In classic Mac OS and BeOS, file inodes carried file types so you didn't have to guess about file types either.
I thought everyone hated SOAP.
I shouldn't pile on (Sorry Eric Raymond), but there's this one:
> A decimal digit string is a decimal digit string; there’s no real ambiguity about how to interpret it
The context is as contrasted to a 64-bit big endian value.
Of course, a decimal digit string is subject to its binary encoding (no different than anything else sent as bits).
I know the author is talking about JSON, but I've seen the length of decimal digit strings determined a lot of different ways: null terminator, double-quote terminator, single-quote terminator, a two's-complement 16-bit or 32-bit length before the first character of the string. (I think the 16-bit-prefixed kind is known as a pstring or Pascal string? My memory is not 100% here.) I'll bet someone's done a 64-bit value before the string, though I haven't seen it myself. Oh, and I've seen the length determined by knowledge of the data structure (that is, something like bytes 10-25 are a name, padded with spaces or null terminators, usually leaving readers to infer the encoding based on the dominant platform). And once you start terminating a string with a certain sequence of bits, there's an escaping mechanism you need to deal with. Let's look at the source code for a quality JSON parser before we call it unambiguous?
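For concreteness, here are four of those conventions applied to the same three-byte payload (Python sketch; the names are invented):

```python
import struct

name = b"Ada"

c_string    = name + b"\x00"                       # NUL terminator
quoted      = b'"' + name + b'"'                   # delimiters (needs escaping)
pascal16    = struct.pack(">H", len(name)) + name  # 16-bit length prefix
fixed_field = name.ljust(16, b"\x00")              # fixed-width, padded

# Same payload, four incompatible wire forms, each needing its own parser.
assert len({c_string, quoted, pascal16, fixed_field}) == 4
```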
I mean, on my first paying gig of my life I made $50 writing some sample code for a BASIC tutorial. The first version of it was rejected because it didn't work right on their EBCDIC system. I was 16 and I was thinking, "what the heck (I actually swore back then) is EBCDIC?!?" I know we all use ASCII and Unicode now, and IIRC the actual digits 0-9 were the same, but not the decimal point, so maybe parsing integers is OK but not floating-point values. Speaking of which, using '.' for the decimal point is not exactly universal (even forgetting about EBCDIC, and let's please do)...
JSON has its rules (which is good!), but my point is: a decimal digit string is not necessarily a simple thing. I realize the author doesn't know this, and I don't begrudge him (I am certainly not happier for knowing otherwise); I'm just trying to point out that the authors of NTPv4 were not exactly working in an era where a good programmer could possibly think a decimal digit string was anything but a hornets' nest sitting on a land mine guarded by the MCP (Tron reference, sorry).
So... the author complains that parsing an NTPv4 packet requires prior knowledge of things like big endianness, but parsing JSON requires plenty of prior knowledge.
I get it: big- vs. little-endian is not something people are used to dealing with these days, so it jumps up and bites you when you do. But it's just another encoding and is actually much, much simpler than ones you deal with every day.
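For comparison, the entire "encoding" in question is one format character per direction (Python sketch):

```python
import struct

n = 0x01020304

big    = struct.pack(">I", n)  # network byte order, as NTP puts it on the wire
little = struct.pack("<I", n)  # native order on most desktop CPUs

assert big == b"\x01\x02\x03\x04"
assert little == big[::-1]
assert struct.unpack(">I", big)[0] == struct.unpack("<I", little)[0] == n
```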
(Back then, all the cool CPUs were big endian so I think it was pretty understandable how it ended up on the wire.)
Does it matter? YES (well, not my personal anecdotes! but the other bits).
The dead truth is: you're going to have to understand and parse the messages you receive and they may or may not use conventions and idioms you already understand.