

HTTP for Servers - dedalus
http://www.and.org/texts/server-http

======
saurik
> This is hidden at the bottom of section 2.1 "Augmented BNF" ... which you're
> almost guaranteed to skip.

I have no sympathy for someone who reads the descriptive parts of the
specification, trying to build their implementation from examples, and then
ignores the machine-parse-able, normative grammar at the bottom: if you aren't
reading the specification, you are not qualified to build a server for it.

Seriously: this guy seems to believe that the correct way to implement
something is with a massive set of examples and test cases, where you massage
your implementation until all of the examples work right and all the tests
pass.

> Then any sane person can have a quick look through the rfc, impliment what
> they think is required and when they're finished run the tests to see.

No: you don't just "implement what you think is required", _you implement the
specification_. It really isn't that hard to build a parser: starting from
scratch, I wrote a parser combinator library followed by an IMAP parser (very
intricate grammar) in a few days.
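As a rough illustration of the approach (a minimal Python sketch, not saurik's
actual library; all names here are hypothetical), a parser combinator is just a
function from input and position to either a parse result or a failure, and
small combinators compose them into a grammar:

```python
# Minimal parser-combinator sketch. Each parser is a function
# (text, pos) -> (value, new_pos) on success, or None on failure.

def literal(s):
    """Match the exact string s."""
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse

def seq(*parsers):
    """Match all parsers in order, collecting their values."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def alt(*parsers):
    """Try each parser in turn, returning the first success."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

# Tiny grammar fragment: method = "GET" | "HEAD"
method = alt(literal("GET"), literal("HEAD"))
request_start = seq(method, literal(" "), literal("/"))

print(request_start("GET /index.html HTTP/1.1", 0))
# → (['GET', ' ', '/'], 5)
```

A real library would add repetition, mapping, and error reporting, but the
core idea scales directly from an ABNF grammar: one combinator per production
rule.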

In truth, the people at the IETF went through a lot of work to build that
grammar, and if you implement that grammar and prove your implementation of
the grammar, you have implemented this specification and can move on with your
life.

Mark Crispin, the guy who designed and is still in charge of the IMAP
specification (whose "combined works", in the form of mailing list posts and
IETF documents, I've recently been studying to level up my "historical
perspective"), said just last year:

> First and foremost, the Formal Syntax section of RFC 3501 should be your
> holy book. If any part of RFC 3501 distracts you from the Formal Syntax,
> ignore it in favor of the Formal Syntax.

...

> Whatever you do, DO NOT ATTEMPT TO IMPLEMENT ANY COMMAND OR RESPONSE BY
> LOOKING AT THE EXAMPLES! You are guarantee to screw up if you use the
> examples as a model (that means YOU, Microsoft!). Use only the Formal
> Syntax. The result should indeed look like the examples; and if it doesn't
> then study the Formal Syntax to understand why.

...

> Fifth, keep in mind that pretty much all of RFC 3501 is mandatory to
> implement. ...

<http://mailman2.u.washington.edu/pipermail/imap-protocol/2011-June/001471.html>

(edit: I had a ton more complaints here, but I believe that they are
irrelevant and overly long.)

~~~
zimbatm
> I have no sympathy for someone who reads the descriptive parts of the
> specification, trying to build their implementation from examples, and then
> ignores the machine-parse-able, normative grammar at the bottom: if you
> aren't reading the specification, you are not qualified to build a server
> for it.

I don't think it invalidates any of his other points. The important thing to
get is that implementing a valid HTTP server (or client) takes a lot of work
and is not made trivial by how the RFC is built. Lots of people have thought
it would be and are still fixing bugs years later.

Getting the syntax parsing right is just one part of the job. Most of the
issues come with the semantics and, as he shows, the violations in layering.
Here is an example of the state machine you must implement to make your web
server semantically correct:
<http://wiki.basho.com/images/http-headers-status-v3.png> . The diagram is
complete but still misses Keep-Alive, Range requests, chunked encoding, ...

~~~
saurik
Yes: I agree that these standards are complex. In some cases, I believe these
standards are even poorly designed (although I try to reserve judgement until
I've "thought through the problem" for at least a few years, and until I've
spent enough time reading through documents from the era trying to figure out
why a specific feature was designed the way it was); in a handful of cases I
even have direct evidence to conclude the people who drafted the spec were
disappointed with the result. However, whining about the complexity of a
specific set of semantics while indicating a disdain for even reading the
entire specification... there is simply no reason to trust the resulting
conclusions (and you really are just trusting him: he doesn't put any of these
things into context or explain how they are problematic to implement).

Some things in life are complex, and some of those things actually had to be
fairly complex (maybe not as complex as they are) to handle their myriad
goals. In the case of HTTP, there are tons of things people want to be able to
do: all of those headers he is complaining about _do something_, and as
someone who uses HTTP quite often I am honestly not certain which ones I'd be
willing to live without. You could always encode them differently, but that's
again just a syntax problem. The semantics of a distributed document database
flexible enough to support the notion that people speak different languages,
that file formats go in and out of style, that you may want or need
third parties to proxy information for purposes of either filtering or
improving performance, and yet at the same time allows for state to be
persisted and only sometimes shared on cached documents... this is a hard
problem, and when "lots of people have thought it would be [trivial]" they are
just being naive. I'd go so far as to say "disrespectful" of the hard work put
in by the people who came before them in the attempt to design these
specifications.

To then respond to one of the specific points you made, the "violations in
layering", this guy's proposal is itself a layering violation (something I had
originally spent some time complaining about in my first post, but then
removed as I realized there was no point defending implementation details from
someone who doesn't feel people should read the spec): imagine what it would
mean to have these kinds of requests coming through a proxy server... what
does the proxy server do about the state? You either have the proxy treated as
a pass-through, in which case the state is going to get highly muddled and
require one-to-one connection mapping through the proxy, or you are going to
have the proxy handle its own state, which will ruin the performance benefits
of not having to do multiple round-trips (as the proxy will have to wait for
the previous request to come back before sending the new one).

Even if he did have something actually better, the people who built these
specifications that he's ranting about had the really hard problem of getting
agreement from a bunch of people who had differing ideas about what they were
willing or wanting to implement, and who had varying amounts of resources and
ability to alter existing systems. Honestly, I think they've done a great job
on the whole, and I have absolutely no faith that if this guy were somehow in
their place we would have ended up with something different, and certainly not
something better. Instead, someone just spends a bunch of time ranting about
the work these people did (or even the real-world complexities that the people
working on Apache httpd ran into that caused them to end up with some of the
specific incompatibilities mentioned) without really painting a picture of the
original context, finding out what the various challenges and goals were, or
pointing out specific ways the result could have worked out differently with
the information available at the time. As it stands, this is just textbook
de-motivational whining. :(

------
mbell
The best idea from the article is the recommendation of an official unit test
suite. I'd love it if every group that builds an API or protocol spec were
required to include a unit test suite for the defined behavior prior to
finalization of the spec. It would not only make implementations easier but
more importantly result in the spec writers running into some of the design
problems they tend to miss.

~~~
motter
I think this is similar to what happens in the Java Community Process with
TCKs.

"One of the three required pieces for each JSR is the Technology Compatibility
Kit (TCK). A TCK is a suite of tests, tools and documentation that determines
whether or not a product complies with a particular Java™ technology
specification. A TCK must be included in a final Java technology release under
the direction of its Expert Group."

Quoted from <http://jcp.org/en/resources/tdk>

------
jtchang
The reality, though, is that even with how screwed up the spec is, HTTP is an
enormous success. It got more complex as time went on, but in the early days
it was fairly simple.

Now contrast that with SAML, OAuth2, WS-Security (anyone care to add any
others?) and you can see how much of a trainwreck we avoided.

~~~
patio11
Just imagine what would have happened if 301/302 redirects had been replaced
with delegation, OpenID style. We might not have an Internet yet.

------
luriel
HTTP is a truly messy protocol, and from what I have seen the proposals for
HTTP/2 aren't going to make it much simpler, quite the contrary (as the author
of Varnish explained here: <http://news.ycombinator.com/item?id=4253538>), and
they don't address some of its fundamental flaws.

I have been thinking for a while about writing down a simple and small subset
of HTTP plus some conventions and calling it HTTP 0.2. I even got a pre-draft
(hah) of what the goals for such a spec would be; maybe I should start working
on it again: <http://http02.cat-v.org>

~~~
ilikeit
Cool idea. Less is more. People have done cool things, e.g., with C by
restricting themselves to a subset of it.

It seems counterintuitive but I think reducing the number of features in a
spec, a language, etc., actually gives way to _more_ creativity.

Indeed, it seems "too much freedom/lack of limitations" in the HTTP spec
appears to be the underlying condition that fuels Antill's rant.

(And, funnily enough, an article about the "tyranny of choice" serendipitously
appeared on HN simultaneously with this one. I didn't notice it until _after_
I typed this comment.)

~~~
aut0mat0n1c
It's not counter-intuitive at all. What is architecture if not a set of
constraints?

~~~
therefore0
Perhaps that's why he said "It seems counterintuitive..."

If he thought it was counterintuitive, then he would have said "It is
counterintuitive..."

~~~
aut0mat0n1c
lol.. just saw this. I would modify my statement for your pedantic approval if
I was able.

------
est
This trick is handy to break DPI systems or firewalls (Like China's).

I remember, a few years ago, connecting to YouTube directly from China using a
Python HTTP proxy that replaced the spaces in the HTTP request headers with
\t. The GFW let the requests pass.
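The trick works because RFC 2616 treats horizontal tab as linear white space,
so a tab after a header's colon is grammatically as valid as a space, while a
filter matching literal "Name: value" bytes misses it. A hypothetical sketch
(the helper name is made up; this is not the original proxy code):

```python
# Hypothetical sketch of the header-tab trick described above.
# RFC 2616 allows LWS (including HT) before the field value, so
# "Host:\tyoutube.com" is still a valid header, but a byte-matching
# filter looking for "Host: youtube.com" will not fire.

def tabify_headers(raw_request: bytes) -> bytes:
    head, sep, body = raw_request.partition(b"\r\n\r\n")
    request_line, _, headers = head.partition(b"\r\n")
    # Swap the space after each "Name:" for a tab; the request line
    # itself must keep its single spaces per the grammar.
    return request_line + b"\r\n" + headers.replace(b": ", b":\t") + sep + body

req = b"GET / HTTP/1.1\r\nHost: youtube.com\r\n\r\n"
print(tabify_headers(req))
# → b'GET / HTTP/1.1\r\nHost:\tyoutube.com\r\n\r\n'
```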

~~~
peterwwillis
Another neat trick: linefeed instead of CRLF.

Some access control software that major mobile carriers use to captive-portal
their data traffic only match on HTTP requests that use CRLF. If the same HTTP
traffic uses linefeeds, it's not matched by the ACLs.

A simple HTTP proxy that does s/\r\n$/\n/g would allow one to surf for free,
even if their bill hasn't been paid in months. Boy, that HTTP standard sure is
handy :D
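The substitution itself is trivial; what makes it work is that RFC 2616
(section 19.3) recommends that parsers accept a bare LF as a line terminator,
so servers still understand the request while a CRLF-anchored ACL does not. A
minimal sketch (assumed, not the actual proxy):

```python
# Sketch of the LF-only rewrite described above: servers following
# RFC 2616's tolerant-parsing advice accept bare LF line endings,
# but an ACL pattern anchored on "\r\n" no longer matches.

def lf_only(raw_request: bytes) -> bytes:
    return raw_request.replace(b"\r\n", b"\n")

print(lf_only(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"))
# → b'GET / HTTP/1.1\nHost: example.com\n\n'
```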

------
the_mitsuhiko
The whitespace rule as stated there is, as far as my understanding goes,
incorrect. Whitespace is allowed between quoted strings and word tokens, not
between individual production rules (if that is the term). By that I mean "q =
0 . 1" is invalid, while "q = 0.1" is valid. That simplifies things a lot.

If you lex the headers in two phases (handle line continuations and header
combination first, then feed the joined lines to the parser), HTTP is
reasonably simple to parse.
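A sketch of that first phase (assumed structure, not the_mitsuhiko's actual
code): unfold continuation lines, which begin with SP or HT and belong to the
previous header, and join repeated headers with commas before handing the
values to the real parser:

```python
# Phase 1 of two-phase header lexing: unfold continuations and
# combine duplicate headers, producing clean name -> value pairs.

def unfold_headers(lines):
    unfolded = []
    for line in lines:
        if line[:1] in (" ", "\t") and unfolded:
            # Continuation line: folded onto the previous header.
            unfolded[-1] += " " + line.strip()
        else:
            unfolded.append(line.rstrip("\r\n"))
    headers = {}
    for line in unfolded:
        name, _, value = line.partition(":")
        name, value = name.strip().lower(), value.strip()
        # Repeated headers combine into one comma-separated value.
        headers[name] = headers[name] + ", " + value if name in headers else value
    return headers

print(unfold_headers([
    "Accept: text/html;\r\n",
    "\tq=0.9\r\n",
    "Accept: */*;q=0.1\r\n",
]))
# → {'accept': 'text/html; q=0.9, */*;q=0.1'}
```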

~~~
d0mine
Yes. The "implied LWS" rule says LWS can be included between any two adjacent
words, but tokens MUST be separated by a delimiter; therefore, if "0.000" is
allowed with no delimiters, it is a single token, and the "implied LWS" rule
does not apply within the 0.000 part of a quality value:

> implied *LWS: The grammar described by this specification is word-based.
> Except where noted otherwise, linear white space (LWS) can be included
> between any two adjacent words (token or quoted-string), and between
> adjacent words and separators, without changing the interpretation of a
> field. At least one delimiter (LWS and/or separators) MUST exist between
> any two tokens (for the definition of "token" below), since they would
> otherwise be interpreted as a single token.

Quoted from <http://www.ietf.org/rfc/rfc2616.txt>

Quality values:

    qvalue = ( "0" [ "." 0*3DIGIT ] )
           | ( "1" [ "." 0*3("0") ] )
It seems the author is mistaken in allowing:

    0     .   0          0   0
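The qvalue grammar above transcribes directly into a regex, since no implied
LWS can appear inside a single token, which makes the point concrete (a
sketch, using Python's re module):

```python
import re

# qvalue = ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] )
# Anchored regex transcription: no whitespace is permitted anywhere
# inside the token, and at most three digits follow the dot.
QVALUE = re.compile(r'^(0(\.[0-9]{0,3})?|1(\.0{0,3})?)$')

for s in ["0.1", "0.000", "1.000", "0 . 1", "0.0000"]:
    print(s, bool(QVALUE.match(s)))
# "0.1", "0.000", "1.000" match; "0 . 1" and "0.0000" do not
```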

------
buro9
The quality range between 0 and 1 is because it's a percentage.

Think of an image, the HTTP request being for an image. The client can say (in
an Accept header), "Give me a lossless .tiff if you have it, but if you don't
then I'd like an image in .jpg format that is 80% quality. If you can't do
that then I'd prefer a .gif that is equivalent to 50% quality. If you've
failed on that... well, screw it, I'll take ASCII art, it's better than
nothing.".

The quality represents a loss in quality of the supplied resource, and a
preference list of the order of acceptable types (in the case of an Accept
header) for each step down in quality.

It's a bit of an abuse to read quality as "order by" and to treat the percent
quality as "range 0-1000".

I think that example is actually given in the specs, but I'm fairly sure audio
was the resource type used.
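The preference ordering described above can be sketched as a small
Accept-header parse that sorts the offered media types by q-value (defaulting
to 1 when omitted); this is illustrative only, since real negotiation also
handles wildcards and specificity:

```python
# Parse an Accept header into (media_type, q) pairs and order by
# descending q-value, the client's preference order.

def parse_accept(header):
    prefs = []
    for part in header.split(","):
        fields = part.strip().split(";")
        media_type, q = fields[0].strip(), 1.0  # q defaults to 1
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                q = float(value)
        prefs.append((media_type, q))
    return sorted(prefs, key=lambda p: p[1], reverse=True)

print(parse_accept("image/gif;q=0.5, image/tiff, image/jpeg;q=0.8"))
# → [('image/tiff', 1.0), ('image/jpeg', 0.8), ('image/gif', 0.5)]
```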

------
jvehent
And this is why we use actual web servers, which are HTTP compliant (it takes
years to get there), to handle incoming HTTP requests from the Internet. 90%
of the "web frameworks" I've played with have only limited HTTP support and
must not be exposed directly on port 80.

~~~
vhf
Which HTTP/1.1-compliant "actual web server" would you recommend, since this
article tends to show that no "actual web server" handles it correctly?

~~~
pjscott
Leaving aside the pesky little issue of what the spec _actually says_, in
practice Apache or nginx will serve nicely.

------
hiccup
This rant is from 2007.

~~~
andrewthornton
2007 is still pretty recent considering the lifespan of 1.1.
<http://www.w3.org/Protocols/History.html>

------
SageRaven
Has anyone found the software yet? All the source links appear to be dead-
ends.

