
Langsec explained in a few slogans - signa11
http://www.cs.dartmouth.edu/~sergey/langsec/occupy/
======
cbd1984
> Every piece of software that takes inputs contains a de facto recognizer for
> accepting valid or expected inputs and rejecting invalid or malicious ones.

This should be expanded: Langsec isn't about adding complexity so much as it's
about getting the complexity right.

You say you want a revolution, well, you know, those things have a way of
coming back around. In this case, your "revolutionary" ideas about YAGNI and
Do The Simplest Possible Thing have already occurred to others, and they're
the ones who wrote the parser libraries you're so sure you're not going to
need.

All you need is a simple lexer and some extra state... and a little more extra
state, with a bit of recursion to get nesting right... and now you're unsure
how string escapes should work...

The end result isn't an input language without grammar. Every language has
grammar. The problems come when nobody actually knows what that grammar is.
Because if you don't know, you don't know if one of your grammar's productions
is INSERT ZERO-DAY HERE.

So Go YAGNI Yourself and use some pre-written code, or at least act as if you
did by writing an actual grammar first and then implementing it, and then
implement it again, _correctly_ this time. Your users will thank you.
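To make the "write the grammar down first, then implement it" advice concrete, here's a sketch of my own (toy grammar, illustrative names, not from the article): a tiny language of quoted strings and nested lists, and a recursive-descent recognizer that accepts exactly that language -- nesting and string escapes included, because they're in the grammar rather than bolted on.

```python
# Toy grammar, written down before any code:
#   value  ::= string | list
#   list   ::= "[" ( value ( "," value )* )? "]"
#   string ::= '"' ( plain-char | '\"' | '\\' )* '"'
# A recursive-descent recognizer that accepts exactly this language.

class ParseError(ValueError):
    pass

def recognize(text: str) -> None:
    pos = parse_value(text, 0)
    if pos != len(text):
        raise ParseError(f"trailing garbage at offset {pos}")

def parse_value(s: str, i: int) -> int:
    if i < len(s) and s[i] == '"':
        return parse_string(s, i)
    if i < len(s) and s[i] == '[':
        return parse_list(s, i)
    raise ParseError(f"expected value at offset {i}")

def parse_list(s: str, i: int) -> int:
    i += 1  # consume '['
    if i < len(s) and s[i] == ']':
        return i + 1
    while True:
        i = parse_value(s, i)
        if i < len(s) and s[i] == ',':
            i += 1
        elif i < len(s) and s[i] == ']':
            return i + 1
        else:
            raise ParseError(f"expected ',' or ']' at offset {i}")

def parse_string(s: str, i: int) -> int:
    i += 1  # consume opening '"'
    while i < len(s):
        if s[i] == '"':
            return i + 1
        if s[i] == '\\':  # escapes: only \" and \\ are defined
            if i + 1 < len(s) and s[i + 1] in '"\\':
                i += 2
                continue
            raise ParseError(f"bad escape at offset {i}")
        i += 1
    raise ParseError("unterminated string")
```

Every "and a little more extra state" question from above (nesting, escapes, trailing junk) has a definite answer here, because the grammar decided it before the code did.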

------
pag
I feel like this focuses more on the recognizer than on the implementation of
the language itself. Sure, given a grammar you can implement
a correct parser that accepts only valid inputs. This doesn't really buy you
anything if your implementation contains a bug though. It's like an undergrad
compiler course that spends all its time on parsing and how great it is, then
tells you almost nothing about the actual meat: the middle- and back-ends.

Is your position that formally specifying the input language is a necessary
first step toward formally specifying the language (operational/denotational)
semantics, and eventually having a provably correct implementation? If so, why
stop short at just talking about the parser when that's typically the least
complicated part of any interesting input format?

------
munin
neither necessary nor sufficient. you can have a "turing complete" input with
no exploitable bugs (or really, no bugs at all), and you can have a very
simple, recognizable, computationally limited input whose handling code has
all kinds of crazy buffer overflows.

the story is about abstractions and security boundaries. what is it that
allows an application to write a value of type A to a location expecting type
B? why is this expressible? the langsec focus on inputs and parsers is missing
the forest for the trees.
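A sketch of that point (hypothetical names, illustrative Python): the input language below is trivially regular and strictly validated, yet the code behind the recognizer still leaks data, because the bug lives past the parser.

```python
import re

# The input language is trivially regular and strictly validated:
#   "GET <index>"  where <index> is 1-3 digits.
MSG = re.compile(r"GET (\d{1,3})")

secrets = ["alice's data", "bob's data"]  # slots 0 and 1

def handle(msg: str, caller_slot: int) -> str:
    m = MSG.fullmatch(msg)
    if not m:
        raise ValueError("malformed input")
    idx = int(m.group(1))
    # Bug: the recognizer was perfect, but the handler never checks
    # that the caller is allowed to read this slot (caller_slot is
    # ignored entirely), nor that the slot exists at all.
    return secrets[idx]

# "GET 1" is perfectly valid input, yet lets caller 0 read caller 1's data:
print(handle("GET 1", caller_slot=0))
```

The security boundary (who may read which slot) was never expressed anywhere, which is exactly the abstraction problem described above.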

~~~
ared38
All Turing-complete inputs can be attacked by sending a non-terminating
program. You could limit the amount of time an input is allowed to run, but
then it's no longer Turing complete.
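A tiny illustration of that tradeoff (toy stack language of my own, not from the thread): cap the number of interpreter steps and every input terminates, at the cost of Turing completeness.

```python
# A step-budgeted interpreter for a toy stack language: once you cap
# the number of steps ("fuel"), every input terminates -- and the
# input language is no longer Turing-complete.

def run(program, fuel=1000):
    stack, pc, steps = [], 0, 0
    while pc < len(program):
        steps += 1
        if steps > fuel:
            raise TimeoutError("fuel exhausted")
        op, *args = program[pc]
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "jmp":  # unconditional backward jumps allow loops
            pc = args[0]
            continue
        pc += 1
    return stack

run([("push", 1), ("push", 2), ("add",)])  # terminates normally
# run([("jmp", 0)])  # infinite loop, cut off by the budget instead of hanging
```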

~~~
munin
it turns out that every system we've ever devised has had a limit on the
amount of time the input is allowed to run, so this really seems more like a
matter of degree. this is also only an "attack" if the availability of the
system is an invariant that is relevant from a security perspective. usually,
we care much more about confidentiality and integrity, and absent
termination-sensitive noninterference (which is a different problem that langsec wholly
ignores and has nothing to do with the complexity of the input), i'm not sure
how the termination of a program will have any impact on its confidentiality
or integrity...

~~~
ared38
DoS isn't an attack? I agree it's less severe than leaking credit card
numbers, but availability is a big deal in many applications.

~~~
munin
if DoS was the only problem we had to worry about, we would have won.

------
lovboat
I think the unix philosophy is the model to follow here. Design an input
method just capable of receiving the information and let other applications
process your input. Input is also about context: you need to establish the
context before processing an input within that context.

Example: Context = a numeric question about integers and the four arithmetic
operators +, -, *, /

input = r'[0-9+\-*/]+'
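A runnable version of that whitelist check (assuming the intended fourth operator is `*`, and using Python's `re` for illustration): anything outside the expected alphabet is rejected before any further processing.

```python
import re

# Whitelist recognizer for the context described above:
# digits and the four arithmetic operators, nothing else.
VALID = re.compile(r"[0-9+\-*/]+")

def accept(user_input: str) -> bool:
    return VALID.fullmatch(user_input) is not None

accept("1+2*3")         # True
accept("1+2; rm -rf /")  # False: space, ';', and letters are not in the alphabet
```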

------
jolux
I'm thinking about this and I'm finding it hard not to read it as a direct
violation of the robustness principle. I'm probably reading it incorrectly but
where am I going wrong here?

~~~
fabulist
You're totally right! Security practitioners hate the robustness principle!
This article, for instance, contains a small rant.

[http://noxxi.de/research/http-evader-explained-8-borderline-robustness.html](http://noxxi.de/research/http-evader-explained-8-borderline-robustness.html)

At a minimum, the robustness principle leads to exploitable parser
differentials (filter bypasses such as the above being the most minor form).
When you've already determined the data doesn't meet your assumptions, it's
really easy to make a mistake in handling it, leading to memory corruption
etc.

Don't try too hard to be robust! Call broken data out for what it is!
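A minimal sketch of such a parser differential (illustrative Python, not real server code): two "robust" parsers that each tolerate a duplicate Content-Length header but resolve the conflict differently. A front end using one and a back end using the other disagree about where the request body ends -- the classic request-smuggling setup.

```python
# A request with a duplicate Content-Length header -- malformed per the
# HTTP spec, but each "robust" parser below accepts it anyway.
raw_headers = [
    ("Content-Length", "44"),
    ("Content-Length", "0"),
]

def length_first(headers):
    # "Robust" choice #1: take the first value seen.
    for name, value in headers:
        if name.lower() == "content-length":
            return int(value)

def length_last(headers):
    # "Robust" choice #2: take the last value seen.
    result = None
    for name, value in headers:
        if name.lower() == "content-length":
            result = int(value)
    return result

print(length_first(raw_headers))  # front end forwards 44 body bytes
print(length_last(raw_headers))   # back end sees an empty body, then
                                  # parses those 44 bytes as a new request
```

Rejecting the duplicate header outright makes both parsers trivially agree; tolerating it forces them to invent (different) answers.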

~~~
jolux
I mean I think I do see that side of it, but the robustness principle is
highly pragmatic. I think if HTML parsing were completely strict then the
internet might not exist.

It's an adage born from the fact that if you aren't tolerant in what you
accept, you end up accepting very little. Especially when the people trying to
create valid documents are not highly skilled or knowledgeable about why it
can't "just work."

Of course, all of security is about tradeoffs like these, because an ideally
secure computer is one that isn't turned on. The robustness principle may be a
nightmare from a security standpoint, but so are JavaScript, Java plugins,
Flash, other assorted NPAPI crap, and basically anything else having to do
with the internet.

~~~
fabulist
You probably won't see this response, but in case you do:

I see what you're saying, and I'm not going to criticize decisions people made
more than a decade ago, but we've outgrown the robustness principle.

You should not accept bad input just to coddle your users. You
should write simpler languages (so the effort they must put in is reasonable)
and better development tools (so they understand their mistakes). Even if you
decided to let HTML be "robust" since it comes from humans, why would you let
an HTTP client or server emit broken headers?

The real problem with the robustness principle is that it has short-term
benefits and extreme long-term costs. Today it means that you can open up
netcat and talk to Apache without using CRLF linebreaks (as per the spec). But
next year, and the year after that, and the year after that, you'll have to
support increasingly noncompliant legacy implementations.
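The CRLF case, sketched in Python (illustrative, not Apache's actual code): the tolerant splitter is convenient today, but every client that comes to rely on it becomes legacy you must support forever.

```python
# Strict vs. tolerant request-line splitting for an HTTP/1.1-style
# protocol, where the spec requires CRLF ("\r\n") line endings.

def split_lines_tolerant(data: bytes):
    # Accepts bare LF as well as CRLF -- handy for netcat users.
    return data.replace(b"\r\n", b"\n").split(b"\n")

def split_lines_strict(data: bytes):
    # After removing well-formed CRLFs, any remaining LF is bare.
    if b"\n" in data.replace(b"\r\n", b""):
        raise ValueError("bare LF: CRLF required by the spec")
    return data.split(b"\r\n")

req = b"GET / HTTP/1.1\nHost: example.com\n"  # bare-LF request
split_lines_tolerant(req)                      # accepted
# split_lines_strict(req)                      # raises ValueError
```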

We write specifications to make things concrete. The robustness principle,
over time, creates a "shadow standard" one must abide by, putting bad code
into your project to support the bad code everyone else has been writing
because we _let them do it_. Maybe it makes Firefox better able to process its
inputs but it makes the web as a whole fragile. It does not make broken data
any less broken, it simply silences errors.

In short, it is neither robust nor principled.

Security is about tradeoffs, but after so much time building broken systems,
it's time to realize that this tradeoff is not reasonable.

~~~
jolux
HTTP clients and servers should absolutely not emit broken headers, but
emitting broken headers is itself a violation of the robustness principle (the
"be conservative in what you send" half).

The way I think about it is imagining how many websites I use daily just
_wouldn't work_ if HTML parsing were completely strict. I regularly open up
Dev Tools to see which of the websites I visit work without CSS or HTML
parsing errors.

My primary concern is that there needs to be a way to implement strict parsing
that doesn't greatly raise the barrier to entry or make the product unusable.
I do think a lot about this but it's nonetheless confounding from both
perspectives and I think the usability of strict parsers needs to be
addressed, whether that means better error messages or really good linting or
whatever.

------
alexbecker
The last one is a little misleading. While the equivalence problem for two
such parsers is undecidable, this only means that no algorithm can decide
equivalence for every pair of parsers--for many particular pairs a proof of
equivalence still exists.

------
hobo_mark
Link is dead but there seems to be a mirror on the langsec website itself:

[http://langsec.org/occupy/](http://langsec.org/occupy/)

