Consistent Overhead Byte Stuffing (1999) [pdf] (stuartcheshire.org)
38 points by tjalfi 41 days ago | 11 comments

This is Stuart’s PhD paper from Stanford. His later work includes ZeroConf, Rendezvous/Bonjour, and a lot of the underlying network code of modern Apple devices. Some people may know him from his legendary multiplayer tank combat game, Bolo, for the BBC Micro and vintage Macs.

One nice use case of COBS is to clearly delimit packets being sent over a serial stream. Framing packets with zero bytes is particularly convenient for small projects because it’s easy to implement and debug.

I’ve used COBS and MessagePack combined for some embedded sensor projects. My implementation for Arduino (also includes some other functionality on top to encode data with MessagePack): https://github.com/telemetryjet/telemetryjet-arduino-sdk

For an example closer to home, the JPEG family of image compression formats all use a byte stuffing algorithm to prevent JPEG markers from occurring in the compressed stream.

USB and video codecs also come to mind.

COBS is also nice over serial because the most likely corruption that can happen is a dropped byte, which COBS can usually detect.

One feature of this format which made me choose it for a project I worked on is self-synchronization -- if your data is truncated or corrupted in some way, you can recover by stepping forward until the start of the next frame. And since frame lengths are limited to a small number, there's no worry of one corrupt byte causing the parser to skip the rest of a long file.

So how would you send a single non-zero byte this way? Or is that why you need the zero byte to delimit your messages? Also it seems that if you wanted to pause your transmission you can’t really “flush” the incomplete COBS frame unless it happens to end with a zero...

As per the siblings, the encoding for that would be 02 xx (also figure 1 in the paper).

As for delimiters, it's not so much that COBS needs the delimiter; the causality usually runs the other way. You want to send packets over an unreliable stream where every symbol is valid data, so you use 00 as the packet delimiter and encode the packets themselves with COBS (so the encoded packets won't contain zeros).
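
For the receiving side of that scheme, here is a minimal Python sketch. The function names `cobs_decode` and `decode_stream` are made up for illustration and are not from the paper. It assumes the whole stream is already in memory and that everything after the last 00 can be treated as a frame. Empty frames (back-to-back delimiters) are skipped, and frames that fail to decode are dropped.

```python
def cobs_decode(frame: bytes) -> bytes:
    """Decode one COBS frame (delimiter already removed)."""
    out = bytearray()
    i = 0
    while i < len(frame):
        code = frame[i]
        if code == 0:
            raise ValueError("zero byte inside frame")
        block = frame[i + 1 : i + code]   # code - 1 data bytes follow the code
        if len(block) != code - 1 or 0 in block:
            raise ValueError("truncated or corrupt block")
        out += block
        i += code
        # Every code except 0xFF implies a zero after its block. The zero
        # implied by the last block is the logically appended one, so drop it.
        if code != 0xFF and i < len(frame):
            out.append(0)
    return bytes(out)


def decode_stream(stream: bytes) -> list:
    """Split a byte stream on 00 delimiters and decode each frame."""
    packets = []
    for frame in stream.split(b"\x00"):
        if not frame:
            continue                      # nothing between two delimiters
        try:
            packets.append(cobs_decode(frame))
        except ValueError:
            pass                          # damaged frame: ignore, move on
    return packets


# Leading 00, two good frames, then a truncated frame (05 01)
stream = b"\x00\x02\x42\x00\x03\x11\x22\x02\x33\x00\x05\x01\x00"
print(decode_stream(stream))
```

Because a 00 can only ever be a delimiter, a damaged frame costs you that one packet and the receiver picks up again at the next 00.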

So in general you won't need to "flush" partial data, because the natural way to interface to this type of channel is to send whole packets. If you don't know what state the channel is in, you can always send out a 00 before the start of the packet to make sure the previous packet gets terminated.

And usually the packets themselves will have some additional validation like CRC, so the previous incomplete packet will be ignored.
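
A rough sketch of that kind of validation, assuming CRC-32 from Python's `zlib` and a 4-byte little-endian trailer. Both are arbitrary choices for illustration, and `seal` / `unseal` are hypothetical names. The CRC is appended to the raw packet before COBS encoding and checked after decoding.

```python
import struct
import zlib


def seal(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload as a 4-byte little-endian trailer."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def unseal(packet: bytes):
    """Return the payload if the trailing CRC-32 matches, otherwise None."""
    if len(packet) < 4:
        return None                       # too short to even hold a CRC
    payload = packet[:-4]
    (received,) = struct.unpack("<I", packet[-4:])
    return payload if zlib.crc32(payload) == received else None


packet = seal(b"hello")
print(unseal(packet))                     # intact packet: payload comes back

damaged = bytearray(packet)
damaged[0] ^= 0x01                        # flip one bit in the payload
print(unseal(bytes(damaged)))             # CRC mismatch: rejected
```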

> COBS first takes its input data and logically appends a single zero byte. (It is not necessary actually to add this zero byte to the end of the packet in memory; the encoding routine simply has to behave as if the added zero were there.)

So it's `02 xx`. By the way, a single zero byte would be `01 01`.
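
To check those two encodings, here is a minimal Python sketch of an encoder following the description quoted above (behave as if a zero were appended, never actually store it). The name `cobs_encode` is my own, and this is an illustration rather than the paper's reference code. It does not add the 00 frame delimiter.

```python
def cobs_encode(data: bytes) -> bytes:
    """COBS-encode data; the output contains no zero bytes."""
    out = bytearray()
    block = bytearray()                   # non-zero bytes of the current block
    for byte in data:
        if byte == 0:
            # A zero closes the block: code = block length + 1
            out.append(len(block) + 1)
            out += block
            block.clear()
        else:
            block.append(byte)
            if len(block) == 254:
                # Full block: code 0xFF, no zero implied after it
                out.append(0xFF)
                out += block
                block.clear()
    # Final block, closed by the logically appended zero
    out.append(len(block) + 1)
    out += block
    return bytes(out)


print(cobs_encode(b"\x42").hex(" "))      # 02 42
print(cobs_encode(b"\x00").hex(" "))      # 01 01
```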

Ah, thanks, I missed that part.
