
Meredith Patterson's astonishing CCC talk on computing and linguistics. It's a tour de force, presenting a systematic and practical proposal for how we can build an open and secure future for computing. I can't imagine a better example of the joys of inter-disciplinary thinking.

https://www.youtube.com/watch?v=3kEfedtQVOY




I found the message in the talk persuasive, but the presentation of it sorely lacking. Her delivery was stilted, probably because she was very nervous and/or hadn't practiced her presentation nearly often enough to sound natural. On top of that, the talk was way too long. It could easily have been cut in half, at the very least.

In a Boing Boing article[1] (which also, incidentally, calls her presentation a "tour-de-force"), her points were summarized in these quotes:

  Hard-to-parse protocols require complex parsers. Complex, buggy
  parsers become weird machines for exploits to run on. Help stop weird
  machines today: Make your protocol context-free or regular!

  Protocols and file formats that are Turing-complete input languages
  are the worst offenders, because for them, recognizing valid or
  expected inputs is UNDECIDABLE: no amount of programming or testing
  will get it right.

  A Turing-complete input language destroys security for generations of
  users.  Avoid Turing-complete input languages!
Ok. I got that. Seems reasonable enough. It didn't take me 40 minutes to read that, and it shouldn't take 40 minutes to present basically the same thing. Maybe half that if you thought you really had to fight hard to make your case.

A more concise presentation containing just the important/interesting bits would have been less painful to watch and probably a lot easier to prepare for.

That said, the Q&A period at the end was good. She seemed a lot more comfortable there, and there was a lot more interesting and informative content there than in the presentation proper.

[1] - http://boingboing.net/2011/12/28/linguistics-turing-complete...


I'm watching it right now and there is nothing wrong with the delivery whatsoever. I also didn't get the feeling that the talk was too long. Sure, you can sum up pretty much anything with a TL;DR, but that's not the point of giving a talk, right? I mean at its core a presentation is meant to inform or entertain and this one here did both. So I really don't get where the criticism is coming from.


This is a great talk, although maybe a little bit too precious (but this is common to a lot of [security] conference talks so I don't hold it against her). Pushing people towards using more formal methods to generate and accept protocols is always good.

That said, I thought the argument against length fields was somewhat weak. But maybe I'm misunderstanding the context; the question at the end was not answered satisfactorily. A protocol with a length field is most certainly deterministic, and if you go the other way and use a delimiter, escaping/encoding is the only way you're going to handle arbitrary user data. I would argue the length field is miles better. If someone injects bytes into your stream that match the protocol, your recognizer isn't going to save you, just the same way someone rewriting the length field is going to blow up.

What this talk seems to argue for is making the language simpler (e.g. context-free), so you can validate that the semantics are working as intended, but transferring arbitrary blobs of data is always going to be an issue as long as people enter random musings in text boxes that are rendered by Turing complete software. :-)
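For what it's worth, the two framing styles being compared can be sketched side by side. This is a minimal illustration (hypothetical wire formats of my own, not anything from the talk): a 4-byte length prefix versus a delimiter byte with an escape byte for payload collisions.

```python
def encode_length_prefixed(payload: bytes) -> bytes:
    """Frame a message as a 4-byte big-endian length followed by the payload."""
    return len(payload).to_bytes(4, "big") + payload

def decode_length_prefixed(frame: bytes) -> bytes:
    n = int.from_bytes(frame[:4], "big")
    if len(frame) < 4 + n:
        raise ValueError("truncated frame")
    return frame[4:4 + n]

DELIM = b"\x00"
ESCAPE = b"\x1b"

def encode_delimited(payload: bytes) -> bytes:
    """Frame with a trailing delimiter; escape any payload byte that
    collides with the delimiter or the escape byte itself."""
    escaped = payload.replace(ESCAPE, ESCAPE + ESCAPE).replace(DELIM, ESCAPE + DELIM)
    return escaped + DELIM

def decode_delimited(frame: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(frame):
        b = frame[i:i + 1]
        if b == ESCAPE:
            out += frame[i + 1:i + 2]  # next byte is literal
            i += 2
        elif b == DELIM:
            return bytes(out)
        else:
            out += b
            i += 1
    raise ValueError("missing delimiter")
```

Note that the delimited decoder has to touch every byte to honor escapes, while the length-prefixed one can slice once the header is read, which is part of why the "delimiters are not slower" claim is debatable.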


As I understood it, the argument against using an unbounded length field is that it makes the language recognizing it context-sensitive. When processing some inner payload of a data packet, you need to carry around the state of outer context-sensitive protocol layers to make sure your inputs are well formed.

The fact that it is trivial to maliciously craft the length field makes it cheap for the attacker to try to exhaust receiver memory, overflow buffers or make DDoS attacks more effective. If you use a delimiter, the attacker has to at least spend the required bandwidth to try to exhaust resources.

I suspect that if your protocol specification bounds the length field to some finite range, then your language can be classified as a regular language for verification purposes, just with an FSM branch for each possible value of the length field.
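A toy sketch of that bounded-length idea (a hypothetical one-byte length field with an arbitrary bound of my choosing): because the remaining-byte counter can only take finitely many values, the whole thing collapses to a finite-state machine.

```python
MAX_LEN = 4  # protocol-imposed bound on the length field (arbitrary here)

def recognize(frame: bytes) -> bool:
    """Accept exactly: one length byte n <= MAX_LEN, then exactly n body bytes.
    The bounded counter means only finitely many (state, remaining) pairs
    exist, so this is a regular language."""
    state = "LEN"
    remaining = 0
    for byte in frame:
        if state == "LEN":
            if byte > MAX_LEN:
                return False           # length outside the bound: reject
            remaining = byte
            state = "DONE" if remaining == 0 else "BODY"
        elif state == "BODY":
            remaining -= 1
            if remaining == 0:
                state = "DONE"
        else:
            return False               # trailing bytes after a complete frame
    return state == "DONE"
```

With an unbounded length field the counter is unbounded, and no finite set of states suffices, which is the context-sensitivity complaint above.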


> The fact that it is trivial to maliciously craft the length field makes it cheap for the attacker to try to exhaust receiver memory, overflow buffers or make DDoS attacks more effective. If you use a delimiter, the attacker has to at least spend the required bandwidth to try to exhaust resources.

A length field doesn't mean that you have to pre-allocate that amount of memory. Never do that! Robust implementations use the length field only as a hint, as a hidden delimiter, and allocate memory as the data comes in.
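That "hint, not allocation" approach might look something like this sketch (names like MAX_MESSAGE and the 4-byte header are my own assumptions for illustration): reject absurd advertised lengths up front, then grow the buffer only as bytes actually arrive.

```python
import io

CHUNK = 4096
MAX_MESSAGE = 1 << 20  # hard cap, regardless of what the sender advertises

def read_message(stream: io.BufferedIOBase) -> bytes:
    header = stream.read(4)
    if len(header) < 4:
        raise ValueError("truncated header")
    advertised = int.from_bytes(header, "big")
    if advertised > MAX_MESSAGE:
        raise ValueError("advertised length exceeds protocol limit")
    buf = bytearray()
    while len(buf) < advertised:
        chunk = stream.read(min(CHUNK, advertised - len(buf)))
        if not chunk:
            raise ValueError("connection closed mid-message")
        buf += chunk  # buffer grows with data actually received, not with the claim
    return bytes(buf)
```

An attacker who lies in the length field either gets rejected at the cap or has to spend real bandwidth to fill the buffer, much like the delimiter case.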

That said, she does have a point. Though escaping is fraught with dangers, too (remember PHP in the beginnings? magic quotes, ugh).


Aha, that was it. I was assuming you'd bound the length. :-)


She also had a more practical talk on the subject:

https://www.youtube.com/watch?v=XVZrmp5MAas

She also talks about a general parser she's building for this issue on GitHub[1]. Using this parser, she has built DNS and base64 recognizers, and I think they are working on building recognizers for more protocols.

[1] https://github.com/abiggerhammer/hammer


Gosh, thank you!


I just watched it and thought it was great. She is well spoken and clear.

But I sort of thought this was common knowledge. I made a related comment here: https://news.ycombinator.com/item?id=4677998 (and before the Ruby Gems YAML exploit, which maybe was kind of prescient)

Incidentally, I felt she sort of brushed off computational complexity when one person brought it up. Complexity is just as important as computability/decidability, and it fits very well with her formal language perspective (e.g. regular languages can be recognized in linear time and constant space).
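To make that linear-time, constant-space point concrete, here is a toy table-driven DFA (the language it accepts, strings over {a, b} with an even number of a's, is an arbitrary example of mine): one pass over the input, a single state variable, no backtracking.

```python
# Transition table: (state, symbol) -> next state
DFA = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}
ACCEPTING = {"even"}

def accepts(s: str) -> bool:
    """Scan the input once; space usage is one state name, O(1)."""
    state = "even"
    for ch in s:
        state = DFA.get((state, ch))
        if state is None:  # symbol outside the alphabet: reject
            return False
    return state in ACCEPTING
```

Contrast that with recognizers for richer language classes, which need a stack (context-free) or unbounded storage, which is where the complexity and resource-exhaustion concerns come in.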

Another thing was that she didn't give that many concrete examples. The X.509 certificate example seemed really good (I didn't look into the details). But I would have liked to see an instance of a protocol that was not meant to be Turing complete, but IS.

She gave the example of HTML5 + CSS, which was a good one. But I don't think people have accidentally written Turing-complete HTTP parsers or anything like that. How big a problem is it in practice? Is she claiming that most security bugs fall into the category of being caused by language errors?

And, like another comment, I'm not sure I agree with disavowal of the length field. I'll have to think about that. I believe delimiting and escaping is more complex and thus more fragile. I also believe it's slower, despite what she says in the talk. We have to consider the tools we have now; maybe in some future world we'll get better formal language tools.

EDIT: http://en.wikipedia.org/wiki/Billion_laughs -- a design flaw in XML related to computational complexity, but not computability.
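For scale, the Billion Laughs arithmetic works out like this: the classic payload defines nine nesting levels, each entity expanding to ten copies of the previous one, so a sub-kilobyte document blows up to a billion "lol"s.

```python
LEVELS = 9   # entity definitions beyond the base "lol"
FANOUT = 10  # each level references the previous entity ten times

copies = FANOUT ** LEVELS            # 10^9 copies of the base string
size_bytes = copies * len("lol")     # ~3 GB expanded from well under 1 KB
```

The input language here isn't even Turing complete; bounded-but-exponential expansion is already enough to take a parser down, which is the complexity-versus-computability distinction above.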



