
Useful old technologies: ASN.1 (2013) - LaSombra
http://ttsiodras.github.io/asn1.html
======
lvh
ASN.1 is a bit unfortunate. On the one hand, I want to believe: it's a fairly
ubiquitous standard (at least, as long as you're dealing with cryptography or
a handful of other specific fields), and it's true that it can be a lot more
compact than JSON or XML (see below), and it's even occasionally faster.
However:

1\. A lot of ASN.1 software is pretty buggy and undermaintained. How do I
submit a bug to pyasn1? Because it's entrenched in libraries three or four
layers below what most developers will see, it's also difficult to replace.

2\. Faster and more compact depends on the encoding rules. Let's talk about
the dozen or so different ways you have of encoding ASN.1. BER? DER? Maybe
CER? XER? Why not both: CXER! I'm missing a bunch, I know.

3\. The performance and network arguments ignore what you can do with
compressed verbose formats like JSON, XML or EDN. lz4 is _magic_. Even totally
ubiquitous gz compression works extraordinarily well.

4\. While ASN.1 messages do contain descriptions of what types they contain
(e.g., you can see that there's a bit string coming), they aren't self-
describing in the same way that JSON or XML are, which is quite annoying for
debugging. You can make self-describing messages using XML, but at that point
you're literally just doing XML. Good luck finding software that lets you
easily use whichever you like -- even if you subscribe to the notion that easy
debuggability doesn't matter for production messages, a notion I strongly
disagree with.

5\. I'd harp about its extensibility, but it's really no better than JSON, so
I won't.

All-in-all, I'm compressing some EDN, and I'm pretty happy.

~~~
tokenrove
I feel the same way about ASN.1. I remember implementing some ASN.1 tools as
part of an SNMP engine and thinking "how could anyone possibly implement all
of this correctly?" \-- a few years later, all those vulnerabilities in SNMP
products at the ASN.1 encoding level came to light. On the other hand, it's
sad to see efficient on-the-wire encoding ignored, and the ASN.1 standards
aren't really abominations when you compare them with some of the web services
standards.

In addition to what you mentioned, I wanted to mention EXI (and FAST, the FIX
encoding) as interesting on-the-wire encodings that maybe more people should
consider over just compressed JSON or XML. Generic LZ-based compression
doesn't necessarily win very much with lots of short messages.
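
To put rough numbers on the short-message point, here is a minimal sketch using Python's standard zlib (standing in for lz4 or gz, which behave similarly in spirit). The message shape and the `make_message` helper are made up for illustration, and this says nothing about how EXI or FAST would do on the same data.

```python
import json
import zlib


def make_message(i: int) -> bytes:
    # Hypothetical message shape, chosen only for illustration.
    return json.dumps({"id": i, "ok": True}, separators=(",", ":")).encode("utf-8")


# One short message compressed on its own: the zlib header, checksum and
# block framing cost more than the handful of bytes there is to save.
single = make_message(1)
single_z = zlib.compress(single)

# Many similar messages compressed together: the repeated keys are found
# by the LZ stage and the batch shrinks considerably.
batch = b"\n".join(make_message(i) for i in range(1000))
batch_z = zlib.compress(batch)

print(f"single: {len(single)} -> {len(single_z)} bytes")
print(f"batch:  {len(batch)} -> {len(batch_z)} bytes")
```

A shared dictionary or a long-lived compression stream can narrow the gap for short messages, at the cost of both ends agreeing on extra state.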

~~~
abecedarius
I had a similar feeling making an SNMP MIB processor around the same time
(2000?) -- the ASN.1 libraries were so complicated. So I wrote my own, which turned
out simple and symmetrical just by encoding from back to front instead of the
messy front-to-back-and-backpatch everyone else was doing. (And I guess by
leaving some things out that we didn't use, like other encoding rules? I don't
remember.)

When those vulnerabilities hit I never found out how that code did -- I wish
they'd open-sourced it as they'd planned.

So, that particular part (DER? again I forget) seemed tolerable to me. The
newer stuff like Cap'n Proto is probably still better.

~~~
tptacek
Yes, this! This is the secret to simple BER/DER encoding: back-to-front. I was
_almost_ converted to ASN.1 BER after discovering this, but in the intervening
8 years the spell has (thankfully) worn off and I can see it for the
clattering technological jalopy that it is.

The features in the author's F# ASN.1 compiler are pretty swank. ASN.1
probably gets a bad rap because of BER/DER.
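
For anyone wondering why back-to-front helps: a length prefix depends on the size of the content that follows it, so if the content is written first (at the end of the buffer), the length is already known when the header goes in and nothing needs backpatching. Below is a minimal sketch of that idea. It is a reconstruction, not the code either commenter wrote; `BackwardWriter` and the `put_*` helpers are hypothetical names, and only single-byte tags and non-negative integers are handled.

```python
class BackwardWriter:
    """Fills a fixed buffer from its end toward its start."""

    def __init__(self, capacity: int = 1024) -> None:
        self.buf = bytearray(capacity)
        self.pos = capacity  # index of the first byte written so far

    def prepend(self, data: bytes) -> None:
        if len(data) > self.pos:
            raise OverflowError("buffer too small")
        self.pos -= len(data)
        self.buf[self.pos:self.pos + len(data)] = data

    def header(self, tag: int, end: int) -> None:
        """Prepend tag and length for the content occupying buf[pos:end]."""
        length = end - self.pos
        if length < 0x80:
            self.prepend(bytes([length]))  # short form
        else:
            body = length.to_bytes((length.bit_length() + 7) // 8, "big")
            self.prepend(bytes([0x80 | len(body)]) + body)  # long form
        self.prepend(bytes([tag]))

    def getvalue(self) -> bytes:
        return bytes(self.buf[self.pos:])


def put_octet_string(w: BackwardWriter, value: bytes) -> None:
    end = w.pos
    w.prepend(value)
    w.header(0x04, end)


def put_integer(w: BackwardWriter, value: int) -> None:
    if value < 0:
        raise ValueError("sketch handles non-negative integers only")
    end = w.pos
    # One extra byte whenever the top bit would otherwise read as a sign.
    w.prepend(value.to_bytes(value.bit_length() // 8 + 1, "big"))
    w.header(0x02, end)


# SEQUENCE { INTEGER 5, OCTET STRING "hi" }, written last element first.
w = BackwardWriter()
end = w.pos
put_octet_string(w, b"hi")
put_integer(w, 5)
w.header(0x30, end)  # everything between w.pos and end is the content
encoded = w.getvalue()
print(encoded.hex())  # 300702010504026869
```

The SEQUENCE header is written last, when the total size of its children is simply `end - w.pos`. A front-to-back encoder has to either pre-compute that size or reserve space and fix it up afterwards.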

------
MrBuddyCasino
I remember back when Protobuf and Thrift were new, there were discussions
about them re-inventing the ASN.1 wheel.

I'd love to hear the opinions of people who have used them, and what their
experience was. I've found an interesting thread [1] discussing them, and the
claimed advantages for Protobuf are as follows:

\- faster (real existing software, not some hypothetical ASN.1 compiler that
could do x)

\- easier to maintain backward compatibility

\- much simpler, and thus easier to understand and more robust

Especially maintaining compatibility seems crucial to me in large distributed
systems.

[1]
[https://groups.google.com/forum/#!topic/protobuf/eNAZlnPKVW4](https://groups.google.com/forum/#!topic/protobuf/eNAZlnPKVW4)

~~~
mike_hearn
Protobuf is vastly simpler, but has most of the power. As it's used internally
by Google it's also very likely to have been audited and reviewed carefully,
and thus very unlikely to have security bugs (otherwise all their internal
systems would be wide open).

------
coderjames
ASN.1 is used as one protocol in the next-gen air traffic control systems
currently being deployed in Europe. The relevant aerospace standards spell out
a specific subset which helps with getting it correct. Only the single Packed
Encoding Rules (PER) encoding is used, for example.

~~~
msvalkon
ASN.1 (PER) is also used in the EU eCall system as the encoding protocol of
the data between the on-board unit and the emergency call center.

------
lmb
Coincidentally, ASN.1 (specifically its DER encoding) is used in X.509v3,
better known as TLS certificates. For a taste of the craziness that ensues
check out Peter Gutmann's X.509 style guide [1].

Personal highlights:

* Many different ways of encoding a simple string, with some very obscure encodings.

* SET OFs are sorted in the DER encoding, to ensure consistent bytestreams. This sucks for embedded systems.

* OIDs (unique identifiers for things) are unbounded.

[1]
[https://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt](https://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt)
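
To make the SET OF point concrete, here is a minimal sketch of why the sorting hurts on small devices: the members are ordered by their encoded bytes, so none can be emitted until all have been encoded and compared. The `der_tlv` and `der_set_of` helpers are hypothetical and only handle short-form lengths. Plain byte-string comparison is assumed to be enough here because one well-formed TLV cannot be a proper prefix of another.

```python
def der_tlv(tag: int, content: bytes) -> bytes:
    """Encode one TLV; short-form lengths only, to keep the sketch small."""
    if len(content) >= 0x80:
        raise ValueError("sketch handles content shorter than 128 bytes only")
    return bytes([tag, len(content)]) + content


def der_set_of(encoded_elements) -> bytes:
    # Every element must be fully encoded and held in memory before the
    # first one can be written out, because the order depends on all of them.
    return der_tlv(0x31, b"".join(sorted(encoded_elements)))


elements = [
    der_tlv(0x02, b"\x05"),      # INTEGER 5
    der_tlv(0x02, b"\x01\x00"),  # INTEGER 256
    der_tlv(0x02, b"\x03"),      # INTEGER 3
]
print(der_set_of(elements).hex())  # 310a02010302010502020100
```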

------
timdierks
I implemented X.509 several times in the late 90's. Generalized ASN.1 is a
mess: I don't believe there were any decent open source toolsets then, and the
code emitted by the ASN.1 compilers I could test was lame and required you to
adopt their conventions completely through your code.

Furthermore, ASN.1 has the usual lameness you get when people build generic
description languages: for example, it's quite common to encode a particular
ASN.1 structure, and then put the resulting structure into an OCTET STRING for
inclusion in a parent structure (take a look at Extension in RFC 5280's ASN.1,
for example). This is presumably because ASN.1 didn't (doesn't?) support an
ANY type to allow inclusion of arbitrary structures that the decoder didn't
know how to parse, so there's no extensibility without such tricks.

In the end, I punted and just used BER/DER directly without ever using ASN.1.
This made a lot of things much simpler and produced much smaller and more
efficient code (e.g. my cert parser for our SSL library for the Palm III ran
with no additional allocation space, and compiled to a few K of code).

------
Animats
Marshalling and unmarshalling are common, yet ill-matched to most programming
languages. You usually have to define the marshalled form in some cross-
language notation, then use some overly complex tool set to get it to play
well with the language. This applies to ASN.1, Google protocol buffers,
OpenRPC, and XML. Each has its very own data definition language and tool
chain.

JSON is taking over because it's a good match to languages where you can
define lists and dictionaries easily. Most languages now have that. The
overhead is high, but the simplicity is helpful. As a practical matter, it's
usually easier to get things to talk to each other with JSON than with the
more rigid forms. Someone is usually out of compliance with the data
definition.

There's now "Universal binary JSON"
([http://ubjson.org/](http://ubjson.org/)). That's just a binary
representation of JSON. Then there's JSON Schema, which adds a data definition
language to JSON. Put both of those together, and you have something very
close to ASN.1.

And the wheel goes round and round.

~~~
espadrine
I felt like MessagePack was the clear JSON-as-binary winner. Oddly, ubjson
does not mention it. Do you know how it compares?

------
AceJohnny2
I'd be curious to know if Google's Protobuf library, and kentonv's followup
Cap'n Proto library, use concepts initially from ASN.1.

~~~
rdtsc
There is also FlatBuffers which takes Cap'n Proto ideas and brings it back to
Google
([http://google.github.io/flatbuffers/](http://google.github.io/flatbuffers/)).

These encoding formats (or rather their implementations) are based on
minimizing copying of data. Deep down they are based on mmap-ing memory areas,
not unlike casting blobs of memory to packed structures (but with more
safety).
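
As a rough illustration of the "cast a blob to a structure" idea (and only that; this is not the FlatBuffers or Cap'n Proto API), here is a sketch using Python's ctypes. The `Header` layout and its field names are invented for the example.

```python
import ctypes


class Header(ctypes.LittleEndianStructure):
    # Hypothetical fixed layout: 4 + 2 + 2 bytes, no padding needed.
    _fields_ = [
        ("msg_id", ctypes.c_uint32),
        ("flags", ctypes.c_uint16),
        ("length", ctypes.c_uint16),
    ]


wire = bytearray(b"\x2a\x00\x00\x00\x01\x00\x08\x00")

# The "more safety" part: refuse buffers too short for the structure.
if len(wire) < ctypes.sizeof(Header):
    raise ValueError("buffer shorter than header")

hdr = Header.from_buffer(wire)  # a view onto wire, not a copy
print(hdr.msg_id, hdr.flags, hdr.length)  # 42 1 8

wire[0] = 0x2B     # change the underlying bytes...
print(hdr.msg_id)  # 43: the view reflects it, since nothing was copied
```

There is no decode step at all here: reading a field is just a memory access at a fixed offset, which is where the speed comes from.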

------
hyc_symas
Nice to see more applications of it. But you don't even need a fancy ASN.1
compiler; OpenLDAP's liblber will do BER/DER just fine. (It uses malloc, but
that's because LDAP messages have unbounded size. And it doesn't _overuse_
malloc...)

XML and JSON are both ridiculously inefficient, both for static storage and
especially for communication protocols. Can't wait for them to die.

------
dkersten
I dealt with ASN.1 some years ago when working on an SMS anti-spam/anti-fraud
system. ASN.1 was both amazingly awesome and horribly frustrating at the same
time. I have an awful lot of respect for it especially given its age, but I'm
also glad to now be working somewhere where I can use JSON, EDN and Transit :)

------
athenot
And if you happen to be dealing at the bit level or have variable-length
fields, as found in encodings such as ExpGolomb (typical in audio-visual
encodings), then check out Flavor[0].

[0] [http://flavor.sourceforge.net](http://flavor.sourceforge.net)

------
osandov
This is orthogonal to the argument in the article, but the "buffer overflow"
example in C is incorrect. Even if sizeof(b) is smaller in the receiver than
in the sender, the receiver will only read at most as many bytes as it (the
receiver) thinks are in b -- whatever it got for sizeof(b). Of course, this
could still lead to a truncated message, but we'd all be in pretty big trouble
if you could buffer overflow a server by sending it a message larger than its
recv buffer :)

------
ExpiredLink
Maybe you don't need full ASN.1 but only TLV?

[http://en.wikipedia.org/wiki/Type-length-value](http://en.wikipedia.org/wiki/Type-length-value)

------
dkopi
ASN.1 is an awesome way of encoding information, but with variable lengths
all over, implementations are very prone to bugs and buffer overflows.

It sure was fun implementing back then.

~~~
userbinator
At some point you will have to deal with items of varying length, and what
ASN.1 does is IMHO far better than some of the other protocols. In particular,
DER is all TLV, so the lengths are explicitly provided _a priori_ and can be
checked easily.

Contrast this with text-based protocols that rely on scanning forward to find
a delimiter - the lengths are _implicit_. I think this is what really can
cause bugs, as not having explicit lengths makes it easier to forget to check
them against the buffer's size.
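
A minimal sketch of what that check looks like for a definite-length TLV: the declared length is read first and compared against the bytes actually available before any content is touched. `read_tlv` is a hypothetical helper that handles single-byte tags only and does not enforce DER's minimal-length rules.

```python
def read_tlv(buf: bytes, offset: int = 0):
    """Return (tag, content, next_offset) for the TLV starting at offset."""
    if offset + 2 > len(buf):
        raise ValueError("truncated header")
    tag = buf[offset]
    first = buf[offset + 1]
    offset += 2
    if first < 0x80:          # short form: the byte is the length
        length = first
    else:
        n = first & 0x7F      # long form: n length bytes follow
        if n == 0 or offset + n > len(buf):
            raise ValueError("unsupported or truncated length")
        length = int.from_bytes(buf[offset:offset + n], "big")
        offset += n
    # The length is known up front, so it can be validated against the
    # buffer before the content is read.
    if length > len(buf) - offset:
        raise ValueError("declared length exceeds buffer")
    return tag, buf[offset:offset + length], offset + length


tag, content, end = read_tlv(bytes.fromhex("04026869"))
print(hex(tag), content, end)  # 0x4 b'hi' 4
```

The vulnerabilities mentioned upthread tend to come from skipping that last comparison, or from trusting a huge long-form length when allocating.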

~~~
noblethrasher
On the other hand, there is some research[1][2] which indicates that one way
of improving software security and reliability is to (1) realize that any
program that accepts input is basically a recognizer of strings belonging to
some formal language, and (2) that we should limit the grammar of said
language to regular or, at most, context-free. Length fields automatically
make the grammar context sensitive which is much harder to secure according to
langsec.

[1] [http://langsec.org/](http://langsec.org/)

[2] [http://youtu.be/UzjfeFJJseU](http://youtu.be/UzjfeFJJseU)

~~~
pmahoney
Interesting link, just skimmed it and haven't had a chance to watch the video.

> Length fields automatically make the grammar context sensitive which is much
> harder to secure according to langsec.

Is this accurate given a finite length field? I can imagine a DFA that
recognizes the language of a single byte length prefix followed by strings of
1 to 255 characters, just that the node that consumes the length field will
have 255 branches to sub-DFAs that recognize 1, 2, ..., 255 character strings.

~~~
noblethrasher
Yes, I should have said “unbounded length field”. But, with respect to the
discussion, a 32 or 64-bit integer is only bounded in the academic sense.

Also, here is a video that works a bit better as an introduction to LANGSEC:
[https://www.youtube.com/watch?v=3kEfedtQVOY](https://www.youtube.com/watch?v=3kEfedtQVOY)
(around 19:00 is especially entertaining)

