

Postel’s Principle is a Bad Idea - tinsel
http://programmingisterrible.com/post/42215715657/postels-principle-is-a-bad-idea

======
tptacek
Postel's Principle was very important for bootstrapping adoption of TCP/IP,
but it's mostly a curse in mature systems. It doesn't even help new
implementors; instead, it deceives them into thinking that they've achieved
interoperability when instead they've accidentally built dependencies on other
people's implementation details.

That said, I wouldn't suggest that our Insertion, Evasion paper presented an
argument regarding the Principle in either direction. Even if we forbade
leniency, there'd still be ambiguous standards.

~~~
gruseom
_when instead they've accidentally built dependencies on other people's
implementation details_

I think it would be very interesting if you could give an example or two of
this.

~~~
inopinatus
Two off the top of my head:

* A classic would be IE's abuse of TCP RST: [http://www.stroppykitten.com/cms/index.php?option=com_conten...](http://www.stroppykitten.com/cms/index.php?option=com_content&view=article&id=2:internet-explorer-and-tcp-rst-a-reason-to-dislike&catid=1:tech&Itemid=2)

* A decent chunk of email server code (SMTP & IMAP implementations in particular) is there to handle erroneous client behaviours. The worst cases are those where the workaround leads to misbehaviours (or less optimal behaviours) for conforming clients. If I remember correctly, the popular Outlook series of clients is a notorious source of such warts. A number of SMTP sender libraries will skip over significant parts of the protocol state machine; configuring a mail server to handle that degenerate case can weaken its anti-spam provisions.
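
A hypothetical sketch of that last kind of workaround (not any real server's
code): a lenient handler that tolerates senders jumping straight to MAIL FROM,
at the cost of whatever policy was hung off the greeting.

    def handle_command(session, line):
        verb = line.split(" ", 1)[0].upper()
        if verb in ("HELO", "EHLO"):
            session["greeted"] = True
            return "250 hello"
        if verb == "MAIL":
            if not session.get("greeted"):
                # Strict server: return "503 send EHLO first".
                # Lenient workaround: accept anyway - but any check keyed off
                # the greeting (e.g. some anti-spam provisions) silently never runs.
                session["greeted"] = True
            session["sender"] = line
            return "250 ok"
        return "502 command not recognized"

    session = {}
    print(handle_command(session, "MAIL FROM:<user@example.com>"))  # "250 ok", no EHLO ever sent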

~~~
mattmanser
Yes, there were major changes in the way IMAP worked between Outlook 2003 and
Outlook 2007 too; I think they rewrote their IMAP code, as it behaved
completely differently. Admittedly it was much better, but it broke our custom
IMAP server, which I ended up fixing. I can't remember exactly why now, but
Outlook 2003's implementation was bizarre, as if they'd not read the RFC.

That reminds me of the bizarre bug caused by them using a short uint to store
the message UID. Maybe it wasn't a short, but it definitely wasn't 32-bit as
per the spec; there was some magic number that, if you went over it, 'boom'.
As an end user it appeared that some messages just disappeared.

------
colomon
It seems to me it is a very valid principle in many areas.

For instance, the STEP file standard very clearly states that all input files
must be 7-bit ASCII. Many of the programs that generate these files (including
earlier versions of my own) paid no attention to this and wrote out 8-bit
values in strings if the user requested it. Clearly this behavior is wrong.
(The principle agrees: "Be conservative in what you do.")

However, rejecting an entire CAD file merely because the text strings in it
used an illegal encoding is downright silly. It in no way can change the
meaning of the geometry of the file. There is no hidden vector in there for
malicious attacks. It makes perfect sense to accept illegal files like this
and do your best to make them work, even if it might not get quite the same
text strings the user intended.

I think jbert's point about being conservative in what you do in all respects
is a strong one. Taking that into account suggests that maybe carefully
marking the illegal character as such in the string might well be worthwhile,
and is definitely more appropriate than trying to guess what 8-bit character
standard was intended.

~~~
pornel
That's exactly the kind of security risk that the article is talking about.
Internet Explorer could be tricked into using US-ASCII encoding and
interpreting ¼script¾ as a script tag (CVE-2006-3227).

Liberal vs strict is a false dichotomy. The third solution is to accept all
possible inputs, but in a specified way.

Instead of taking the draconian XML approach, you can solve the problem by
taking the HTML5 approach and making error handling as interoperable as
handling of correct input. In the case of STEP files you could require all
implementations to clear the 8th bit (or drop or clamp bytes out of range -
whatever, as long as it's specified and mandatory).
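
A minimal sketch of what that could look like for STEP strings (hypothetical,
not from any spec): one mandated normalization rule that every reader applies
identically, so even nonconforming input has exactly one interpretation.

    REPLACEMENT = "?"  # fixed, visible marker; the point is the rule, not cleverness

    def normalize_step_string(raw: bytes) -> str:
        # Hypothetical mandated rule: any byte outside 7-bit ASCII is replaced
        # with a fixed marker. Every conforming reader does the same thing, so
        # two tools can never decode the same file differently.
        return "".join(chr(b) if b < 0x80 else REPLACEMENT for b in raw)

    print(normalize_step_string(b"N\xb0 5 valve"))  # -> "N? 5 valve"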

~~~
colomon
Maybe I'm missing something here, but a valid STEP string can already encode
any arbitrary Unicode code point. It just does it using 7-bit ASCII. If your
code is somehow executing these strings without examining their content, then
you are already in big, big trouble.

Trying to do something with 8-bit characters -- whether skipping them,
indicating an illegal character in the string, or trying to guess what was
really meant -- cannot make that situation any worse.

~~~
tedunangst
The problem is if you decode a particular byte sequence that causes a bad
action (if that's possible with step files) in a different way than some other
program that is supposed to keep you safe.

In the case of IE, IE decoded one way and forum software might decode a
different way. So the forum software says the string is safe for the browser
(according to its decoding rules) but then the browser applies different rules
and gets a bad string.

You may not be seeing the danger because you implicitly think a step file from
unsafe sources is always unsafe. But imagine if you had a safe file detector
program, except it applied different rules than the program you're actually
going to open the file with.
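
A quick sketch of that mismatch, using the IE case pornel mentioned (the
mechanism is assumed here to be clearing the 8th bit, which turns ¼ and ¾ into
< and >); hypothetical code, not any real filter:

    def forum_filter(raw: bytes) -> bool:
        # The sanitizer inspects the raw bytes and finds no "<script" anywhere.
        return b"<script" not in raw.lower()

    def browser_decode(raw: bytes) -> str:
        # A liberal decoder that clears the high bit maps 0xBC (¼) to "<"
        # and 0xBE (¾) to ">".
        return "".join(chr(b & 0x7F) for b in raw)

    payload = "¼script¾alert(1)¼/script¾".encode("latin-1")
    print(forum_filter(payload))    # True - the filter calls it safe
    print(browser_decode(payload))  # <script>alert(1)</script> - the browser disagrees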

~~~
colomon
As jbert pointed out, if your program's main job is to say whether or not
something is safe, and it liberally says "Oh yeah, I think that's safe",
that's pretty much the exact opposite of "be conservative in what you do".

~~~
im3w1l
Please explain the proper way of escaping/rejecting html in forum posts, when
you can't rely on the browsers following the spec.

------
othermaciej
This post, if you read it to the end, doesn't reject the principle of being
liberal in what you accept. Rather, it proposes being liberal in a formally
specified and interoperable way - i.e. specs should explicitly define behavior
for all inputs, including any error correction.

HTML5 takes this path with its parsing algorithm, and in fact is cited as an
example in the post. However, the designers of the parsing algorithm saw it as
being an application of Postel's Principle, rather than an example of the
opposite.

The post is really more nuanced than it sounds and would better be titled
"Specifications Should Define How to be Liberal in What You Accept".

------
caf
Quite apart from security, Postel's Principle can hurt the capacity to make
backwards-compatible feature additions in the future.

For example, if you have a set of unused flag bits documented as "reserved,
must be zero", then a receiver that silently ignores non-zero bits allows
senders that erroneously set those bits to propagate. This is fine, until one
day in a future standard you want to define new behaviour for one of those
bits, and find you can't - because there's large numbers of senders out there
that erroneously set it but don't have any idea about the new behaviour.
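
A sketch of the two receiver behaviours (hypothetical field layout, not from
any particular protocol):

    RESERVED_MASK = 0b1111_0000  # say the upper four flag bits are "reserved, must be zero"

    def parse_flags_strict(flags: int) -> int:
        # Rejecting nonzero reserved bits surfaces broken senders immediately,
        # so the bits stay available for a future revision of the standard.
        if flags & RESERVED_MASK:
            raise ValueError("reserved flag bits must be zero")
        return flags

    def parse_flags_lenient(flags: int) -> int:
        # Silently masking the bits off "works" today, but lets senders that
        # set them by mistake proliferate, and the new meaning can never
        # safely be assigned.
        return flags & ~RESERVED_MASK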

------
Arnt
It's a pity so many people grossly misunderstand Postel's Principle.

Postel didn't talk about off-spec behaviour. He talked about the borderline
details, which were often quite hazy in early RFCs. When an RFC says the line
length is at most 512 bytes and the terminator is CRLF, does that mean
510+CRLF or 512+CRLF? Postel says to accept 512+CRLF and send 510+CRLF.
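
In code, that reading looks roughly like this (a hypothetical sketch using the
512-byte figure from the example):

    MAX_LINE = 512  # limit from the example above; lines are terminated by CRLF

    def send_line(payload: str) -> bytes:
        # Conservative sender: keep payload plus CRLF within the limit,
        # i.e. at most 510 bytes of payload.
        if len(payload) > MAX_LINE - 2:
            raise ValueError("line too long to send")
        return payload.encode("ascii") + b"\r\n"

    def receive_line(line: bytes) -> str:
        # Liberal receiver: tolerate the hazier reading, 512 bytes plus CRLF.
        if len(line) > MAX_LINE + 2:
            raise ValueError("line too long to accept")
        return line.rstrip(b"\r\n").decode("ascii")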

If you write a receiver and want to accept 1024 bytes instead, maybe that's a
good idea and maybe it's a bad idea. But if you do that, don't invoke Postel's
Principle in defense.

~~~
walshemj
As someone who used to work on OSI-based systems: well, that's just sloppy
standards writing, the bane of internet standards.

It's a pity that RFCs and other internet standards are not written and
implemented more rigorously - for example, Google has problems interpreting
the XML sitemap standard, and that is only 3 pages, FFS.

~~~
Arnt
I've written ten RFCs of varying quality. It's terribly difficult to write
something that a) gives a good overview of the subject, b) explains the
choices that had to be made, c) spells out every detail, and d) remains short
enough that implementers actually read all of it. All of mine fail in some
way. I've heard the OSI documents failed too.

Quoting one implementer, whose code did not accept non-ASCII passwords: "Oh,
the password syntax is on page 88? My printout ends after page 68". In that
RFC, the details are spelt out in appendices, and Appendix A starts on page
69. (And I'm sure pg assigns bonus karma if you can identify the RFC.)

~~~
im3w1l
Is there any way to request an official clarification for borderline cases the
RFC-author didn't think about when typing it up?

~~~
Arnt
You can submit an erratum and an author will comment and often clarify, so
formally speaking the answer to your question is yes. But other implementers
don't generally read the errata, so you have to expect that your
interoperation peers haven't read the clarification.

Once you understand the problem, the clarification, and that your interop
peers do not, I bet your implementation's handling of the issue will be
conservative in what it sends and liberal in what it receives.

------
jbert
As a counterpoint, perhaps it _is_ reasonable, if interpreted more strictly.

Taking the perl-over-c-stdlib example (but I think it applies in other cases),
if the "perl layer" was more strict in what it sent to the stdlib layer, there
would have been no problem.

i.e. the error is in thinking of only the network as the place to apply the
maxim. In fact, you should scrupulously adhere to every interface you pass
data to (internal or external) - and interpret as reasonably as possible all
interfaces you receive data from.

[I'd agree that the latter point can be weakened. But it does help interop -
and if you clean up your act before you hit the next layer, then you limit any
damage.]

~~~
DanBC
Postel wrote it in 1980. (First found in RFC760[1])

Since then we've had computer viruses, worms, and other malware; we've had
hackers, crackers, spies, criminals, and semi-competent people flooding the
Internet; we have people not just making accidental requests but fuzzing and
fusking to try to break things or bypass controls.

It's a great principle for the human stuff, but it feels really outdated for
technical stuff.

[1] (<http://www.ietf.org/rfc/rfc760.txt>)

~~~
_delirium
I agree, Postel's principle makes a lot of sense in context, if you view it as
a bunch of mostly good-faith people attempting to bootstrap communication in a
new medium. Then it's clear that to get things working, you want to forgive
errors on the receiving side (to the extent you can do so), but send as clean
and unproblematic output as you can. Basically what a sensible, not-anally-
bureaucratic human who's trying to establish communication would do. It was
also, iirc, influenced by some of the difficulties ARPANET had experienced in
getting different implementations to interoperate. But it may make less sense
today.

~~~
ams6110
Agreed. Today, I think "fail fast" is a much better principle.

------
YZF
There are times when you want to build for robustness and times you want to be
more concise. If you have control over the set of inputs (e.g. by formalizing
it and using the right tools) that's great but there's usually some overhead
involved in doing that. My argument would be that security is orthogonal to
robustness - just because you accept input that is outside the original
specification doesn't mean that you should do that insecurely. The robust
(liberal) implementation and the limited (conservative) implementation simply
support different protocols, they can do either with security holes or
without. Does this increase the attack "surface"? It may or may not.

A bigger problem is when the liberal implementations become the de-facto
standard.

------
MichaelGG
The authors of the SIP spec published another spec (RFC4475[1]) called "SIP
torture tests", where they seem to take a perverse glee in showing how messed
up their "human readable" syntax can get.

They even use the phrase "infer" in several places, encouraging systems to
take obviously malformed packets and try to figure out what they meant.

Being liberal in accepting input, apart from security issues, seems to create
a worse situation. Implementation A messes up something, but B seems to be OK
with it. C then accidentally requires it, while D rejects it. Depending on how
large and responsive the vendors behind those implementations are, you end up
with a nasty state of affairs, with random hacks here and there.

It's hard enough to create unambiguous, comprehensible, specifications.
Telling implementations to be liberal only makes it worse.

~~~
btilly
I can't read this comment without thinking about SOAP.

~~~
salgernon
I could kiss you for that.

If a format meant for interoperability can only reasonably be used by a
single vendor, it has no benefit over a binary protocol.

The entire SOAP and XML-RPC space is Postel's law writ large.

~~~
btilly
Yup.

I only had to face the true horrors on one occasion, for a Responsys
integration. They had the C# examples and the Java examples. The API they
offered for the two had differences, because some methods would work with one
and some with the other.

I'm a Perl programmer, so I tried that. After all, you just have to translate
the language, right? Wrong. After banging my head against that mess for a week
or so, I finally gave up, wrote the communication in Java, and had a Perl
launcher for it.

------
mmahemoff
Browser tolerance for HTML errors is one of the main reasons the web took off
so fast.

~~~
MichaelGG
Citation needed? Is there any reason to believe that if the browsers had
insisted on well-formed documents and provided errors like "error at line X,
table tag not closed", people would not have been able to fix up their
documents? I don't believe that would have stopped things.

But that exact behaviour, trying to infer intent, meant that tons of
unspecified behaviour had to be added to all browsers to try to mimic what
each one did to handle totally invalid cases.

So, even if leniency did make it easier to create a web page, it also
contributed greatly to the already difficult task of creating consistent
cross-browser rendering.

Look at JavaScript, and the recent semicolon debacle with Bootstrap and some
other tool. Having "implementer defined" leniency just means you'll get
multiple interpretations and problems.

~~~
othermaciej
We've tried this experiment - it's called XHTML. For a long time, adoption by
authors of the strict error handling it offered was stymied by lack of support
in MSIE. So it's not a full counter-factual. However, we have learned two
things:

(1) Now that MSIE does support true XML parsing of XHTML, almost no one is
choosing to use it over HTML.

(2) Of the few experts who conditionally served either text/html or
application/xhtml+xml depending on the UA, or serve XML unconditionally now,
almost all have bugs in their sites which can get them to produce ill-formed
XML which then shows an error page in the browser (for instance, submitting
comments with certain sorts of errors). This is evidence that the draconian
error handling approach is too challenging even for experts and imposes the
costs of small mistakes on users.

~~~
tedunangst
I think the bigger lesson to be learned is that after poorly followed ad hoc
standards have made a mess of things, it's hard to come in and clean up later.

------
benatkin
I'd be more interested in reading a paper called _How JSON escaped Postel's
Principle_ which included discussion on the ambiguity being pushed to other
areas, such as date parsing.

~~~
tedunangst
Once upon a time, somebody wrote a json parser that used an existing date
parser because component reuse is the shizzle. Then somebody updated the date
parser to handle more formats because they were using it in a different
project. Congratulations, now you have a json parser that eats a multitude of
date formats. (Some liberties with facts taken, but building a tower out of
flexible components results in a flexible tower.)
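
Roughly that failure mode, sketched in Python (hypothetical code, not the
parser from the story): hang a reused, do-everything date parser off
json.loads and the JSON layer quietly inherits its liberal behaviour.

    import json
    from datetime import datetime

    # A reusable "flexible" date parser that has grown formats other projects needed.
    DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y", "%Y-%m-%dT%H:%M:%S"]

    def parse_date_flexible(value):
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(value, fmt)
            except ValueError:
                pass
        return None

    def decode_with_dates(obj):
        # object_hook applied by json.loads: any string the flexible parser
        # recognizes quietly becomes a datetime.
        return {k: (parse_date_flexible(v) or v) if isinstance(v, str) else v
                for k, v in obj.items()}

    doc = json.loads('{"created": "Jan 05, 2013"}', object_hook=decode_with_dates)
    print(doc["created"])  # the JSON layer now accepts every format the date parser ever grew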

------
grosskur
See also DJB's notes on protocol design: <http://cr.yp.to/proto/design.html>

------
gbog
I am reminded of the discussion we had with colleagues about Markdown versus
RestructuredText.

On the Markdown side, you have a sexy but ill-defined grammar; on the RST side
you have a slightly less nice-looking guy with a much better-defined grammar,
which allows building saner tooling on top of it.

------
michaelfeathers
_Postel’s Principle is wrong, or perhaps wrongly applied._

The Robustness Principle is a prescription that lays down a strategy for
growing robust systems. It works. The problem is that the robustness it
provides isn't quite what people want it to be.

------
jmount
Postel's law is a maintenance and composition nightmare. My take:
[http://www.win-vector.com/blog/2010/02/postels-law-not-sure-...](http://www.win-vector.com/blog/2010/02/postels-law-not-sure-who-to-be-angry-with/)

------
tedunangst
See also: <http://www.joelonsoftware.com/items/2008/03/17.html>

------
cousin_it
Also see this list of links from 2004, arguing about whether Postel's Law
applies to syndication formats:
<http://www.imc.org/atom-syntax/mail-archive/msg04697.html>. Sadly, Mark
Pilgrim's famous rant is no longer online.

------
peterkelly
See also: HTML parsers

------
badgar
> Treat input handling computational power as a privilege, and reduce it
> whenever possible.

A great example of this was Google's Code Search product, before it was
canceled. Since full backtracking search was blowing out the tiny thread
stacks in servers, they had to restrict what they allowed to actual _regular
expressions_ - expressions generating a regular language. Queries could then
be matched by simulating an automaton in time linear in the input, making
arbitrary public regex searches over code indices feasible.
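
Not Code Search's actual implementation, but a toy sketch of why the
restriction helps: simulate a set of automaton states instead of backtracking,
and a fixed pattern is matched in time proportional to pattern length times
input length, never exponentially.

    def compile_pattern(pattern):
        # Turn "ab*c" into [("a", False), ("b", True), ("c", False)].
        tokens, i = [], 0
        while i < len(pattern):
            starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
            tokens.append((pattern[i], starred))
            i += 2 if starred else 1
        return tokens

    def closure(states, tokens):
        # A starred token may match zero times, so it can be skipped for free.
        result, stack = set(states), list(states)
        while stack:
            i = stack.pop()
            if i < len(tokens) and tokens[i][1] and i + 1 not in result:
                result.add(i + 1)
                stack.append(i + 1)
        return result

    def match(pattern, text):
        # Supports literal characters, "." and "*" only; no backtracking.
        tokens = compile_pattern(pattern)
        states = closure({0}, tokens)
        for c in text:
            nxt = set()
            for i in states:
                if i < len(tokens) and tokens[i][0] in (c, "."):
                    nxt.add(i + 1)      # consume c and move past the token
                    if tokens[i][1]:
                        nxt.add(i)      # a starred token may keep matching
            states = closure(nxt, tokens)
            if not states:
                return False
        return len(tokens) in states

    print(match("a*b.c", "aaabxc"))  # True
    print(match("a*b.c", "bc"))      # False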

Russ Cox's regular expression write-ups are a fascinating deep dive:
<http://swtch.com/~rsc/regexp/>

