HTTP for Servers (and.org)
97 points by dedalus on Sept 16, 2012 | 28 comments

> This is hidden at the bottom of section 2.1 "Augmented BNF" ... which you're almost guaranteed to skip.

I have no sympathy for someone who reads the descriptive parts of the specification, trying to build their implementation from examples, and then ignores the machine-parse-able, normative grammar at the bottom: if you aren't reading the specification, you are not qualified to build a server for it.

Seriously: this guy seems to believe that the correct way to implement something is with a massive set of examples and test cases, where you massage your implementation until all of the examples work right and all the tests pass.

> Then any sane person can have a quick look through the rfc, impliment what they think is required and when they're finished run the tests to see.

No: you don't just "implement what you think is required", you implement the specification. It really isn't that hard to build a parser: starting from scratch, I wrote a parser combinator library followed by an IMAP parser (very intricate grammar) in a few days.
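To give a sense of scale: the core of a parser combinator library really is tiny. Here is a minimal sketch (Python, illustrative names only, not the actual library mentioned above), where a parser is just a function from input text to either a (value, remaining text) pair or None:

```python
# Minimal parser combinator sketch: a parser is a function
# text -> (value, rest) on success, or None on failure.

def literal(s):
    """Match an exact string at the start of the input."""
    def parse(text):
        return (s, text[len(s):]) if text.startswith(s) else None
    return parse

def seq(*parsers):
    """Run parsers one after another; all must succeed."""
    def parse(text):
        values = []
        for p in parsers:
            result = p(text)
            if result is None:
                return None
            value, text = result
            values.append(value)
        return values, text
    return parse

def alt(*parsers):
    """Try each parser in order; return the first success."""
    def parse(text):
        for p in parsers:
            result = p(text)
            if result is not None:
                return result
        return None
    return parse

# Example: the leading "0" / "1" alternative of RFC 2616's qvalue rule.
zero_or_one = alt(literal("0"), literal("1"))
print(zero_or_one("0.5;"))  # ('0', '.5;')
```

From primitives like these, each ABNF production in the spec maps almost one-to-one onto a combinator expression, which is what makes "implement the grammar" a tractable few-days job.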

In truth, the people at the IETF went through a lot of work to build that grammar, and if you implement that grammar and prove your implementation of the grammar, you have implemented this specification and can move on with your life.

To quote Mark Crispin, the guy who designed and is still in charge of the IMAP specification (whose "combined works", in the form of mailing list posts and IETF documents, I've recently been studying to level up my "historical perspective"), writing just last year:

> First and foremost, the Formal Syntax section of RFC 3501 should be your holy book. If any part of RFC 3501 distracts you from the Formal Syntax, ignore it in favor of the Formal Syntax.


> Whatever you do, DO NOT ATTEMPT TO IMPLEMENT ANY COMMAND OR RESPONSE BY LOOKING AT THE EXAMPLES! You are guarantee to screw up if you use the examples as a model (that means YOU, Microsoft!). Use only the Formal Syntax. The result should indeed look like the examples; and if it doesn't then study the Formal Syntax to understand why.


> Fifth, keep in mind that pretty much all of RFC 3501 is mandatory to implement. ...


(edit: I had a ton more complaints here, but I believe that they are irrelevant and overly long.)

> Seriously: this guy seems to believe that the correct way to implement something is with a massive set of examples and test cases, where you massage your implementation until all of the examples work right and all the tests pass.

You have no idea how common that utterly bonkers way of thinking is. I have a colleague according to whom every protocol is trivial to implement. Why? "Trace what an existing implementation does, and replace the values. It works"

> I have no sympathy for someone who reads the descriptive parts of the specification, trying to build their implementation from examples, and then ignores the machine-parse-able, normative grammar at the bottom: if you aren't reading the specification, you are not qualified to build a server for it.

I don't think it invalidates any of his other points. The important thing to get is that implementing a valid HTTP server (or client) takes a lot of work and is not made trivial by how the RFC is built. Lots of people have thought it would be and are still fixing bugs years later.

Getting the syntax parsing right is just one part of the job. Most of the issues come with the semantics and, as he shows, the violations in layering. Here is an example of the state machine you'd need to implement to make your web server semantically correct: http://wiki.basho.com/images/http-headers-status-v3.png . The diagram is thorough but still misses Keep-Alive, Range requests, chunked encoding, ...

Yes: I agree that these standards are complex; in some cases, I believe these standards even are poorly designed (although I try to reserve judgement until I've "thought through the problem" for at least a few years, and until I've spent enough time reading through documents from the era trying to figure out why a specific feature was designed the way it was); in a handful of cases I even have direct evidence to conclude the people who drafted the spec were disappointed with the result: however, whining about the complexity of a specific set of semantics while indicating a disdain for even reading the entire specification... there is simply no reason to trust the resulting conclusions (and you really are just trusting him: he doesn't put any of these things into context or explain how they are problematic to implement).

Some things in life are complex, and some of those things actually had to be fairly complex (maybe not as complex as they are) to handle their myriad goals. In the case of HTTP, there are tons of things people want to be able to do: all of those headers he is complaining about do something, and as someone who uses HTTP quite often I am honestly not certain which ones I'd be willing to live without. You could always encode them differently, but that's again just a syntax problem: the semantics of a distributed document database that is flexible enough to support the notion that people speak different languages, that file formats go in and out of style, that you may want or need third parties to proxy information for purposes of either filtering or improving performance, and yet at the same time allows for state to be persisted and only sometimes shared on cached documents... this is a hard problem. When "lots of people have thought it would be [trivial]" they are just being naive, and I'd go so far as to say "disrespectful" of the hard work put in by the people who came before them in the attempt to design these specifications.

To then respond to one of the specific points you made, the "violations in layering", this guy's proposal is itself a layering violation (something I had originally spent some time complaining about in my first post, but then removed as I realized there was no point defending implementation details from someone who doesn't feel people should read the spec): imagine what it would mean to have these kinds of requests coming through a proxy server... what does the proxy server do about the state? You either have the proxy treated as a pass-through, in which case the state is going to get highly muddled and require one-to-one connection mapping through the proxy, or you are going to have the proxy handle its own state, which will ruin the performance benefits of not having to do multiple round-trips (as the proxy will have to wait for the previous request to come back before sending the new one).

Even if he did have something actually better, the people who built these specifications that he's ranting about had the really hard problem of getting agreement from a bunch of people who had differing ideas about what they were willing or wanting to implement and had varying amounts of resources and ability to alter existing systems; honestly, I think they've done a great job on the whole, and I have absolutely no faith that if this guy were somehow in their place we would have ended up with something different, and certainly not something better. Someone who wants to spend a bunch of time ranting about the work these people did (or even the real-world complexities that the people working on Apache httpd ran into that caused them to end up with some of the specific incompatibilities mentioned) should first paint a picture of the original context, find out what the various challenges and goals were, and then point out specific ways the result could have worked out differently with the information available at the time. As it stands, this is just textbook de-motivational whining. :(

The best idea from the article is the recommendation of an official unit test suite. I'd love it if every group that builds an API or protocol spec were required to include a unit test suite for the defined behavior prior to finalization of the spec. It would not only make implementations easier but more importantly result in the spec writers running into some of the design problems they tend to miss.

I think this is similar to what happens in the Java Community Process with TCKs.

"One of the three required pieces for each JSR is the Technology Compatibility Kit (TCK). A TCK is a suite of tests, tools and documentation that determines whether or not a product complies with a particular Java™ technology specification. A TCK must be included in a final Java technology release under the direction of its Expert Group."

Quoted from http://jcp.org/en/resources/tdk

The reality, though, is that even with how screwed up the spec is, HTTP is an enormous success. It got more complex as time went on, but in the early days it was fairly simple.

Now contrast that with SAML, OAuth2, WS-Security (anyone care to add any others?) and you can see how much of a trainwreck we avoided.

Just imagine what would have happened if 301/302 redirects had been replaced with delegation, OpenID style. We might not have an Internet yet.


HTTP is a truly messy protocol, and from what I have seen the proposals for HTTP/2 aren't going to make it much simpler, quite the contrary (as the author of Varnish explained here: http://news.ycombinator.com/item?id=4253538 ), and they don't address some of its fundamental flaws.

I have been thinking for a while about writing down a simple and small subset of HTTP plus some conventions and calling it HTTP 0.2. I even have a pre-draft (hah) of what the goals for such a spec would be; maybe I should start to work on it again: http://http02.cat-v.org

HTTP has its problems, but implementing it is a walk in the park compared to many other telecom protocols. Try SIP and its dozens of add-on RFCs, or tracing a H.323 call flow, or even making sense of something like Megaco from the ITU/IETF spec.

Cool idea. Less is more. People have done cool things, e.g., with C by restricting themselves to a subset of it.

It seems counterintuitive but I think reducing the number of features in a spec, a language, etc., actually gives way to _more_ creativity.

Indeed, it seems "too much freedom/lack of limitations" in the HTTP spec appears to be the underlying condition that fuels Antill's rant.

(And, funnily enough, an article about the "tyranny of choice" serendipitously appeared on HN simultaneously with this one. I didn't notice it until _after_ I typed this comment.)

There is also Rob Pike's excellent post titled "Less is Exponentially More".


It's not counter-intuitive at all. What is architecture if not a set of constraints?

So how do we convince programmers (architects?) to use languages and programs that are small, which due to lack of features appear to have lots of restraints? (Really they don't, but one needs to be creative.) How do we produce specs that are not the result of committees and packed with everyone's favorite feature (and thereby masochistic for any mortal to try to implement)?

See the 1999 interview with Ken Thompson that luriel posted earlier. It has some great comments about these dynamics.

When I say "counterintuitive", I mean that most programmers, judging by what's said in forums like this one, will _not_ be receptive to perceived restraints. They will believe it limits what they can do. Instead, they will see a language with multiple ways to do the same thing with the same effectiveness as somehow more flexible, making them more productive. They want hundreds of libraries. They want IDEs. They do not want to write things in Scheme. Some even hate the command line. I say these things about "most" programmers. Not all. (Thankfully.)

Again, Thompson's comments are insightful here. Many programmers are cogs in a corporate wheel. They have little opportunity to be creative. They want to save time and effort in any way possible.

But some people, the increasingly rare few, will see things differently. I side with Thompson's preference in that I cannot understand large languages where I have to work from the top down and read a 500-page manual to try to understand all the components. I cannot manage that degree of complexity as easily as I can work with something simple and build things from the bottom up. (Alas, the things I can build do not amount to a UNIX kernel! :)

So, yeah, it's not counterintuitive to me. It makes perfect sense. But I don't find myself agreeing with most comments about these issues I see in forums. I can't even read StackOverflow. The herd mentality is overbearing.

Perhaps that's why he said "It seems counterintuitive..."

If he thought it was counterintuitive, then he would have said "It is counterintuitive..."

lol.. just saw this. I would modify my statement for your pedantic approval if I was able.

This trick is handy for breaking DPI systems or firewalls (like China's).

I remember a few years ago I connected to YouTube directly from China using a Python HTTP proxy that replaced spaces with \t in HTTP request headers. The GFW let the request pass.

Another neat trick: linefeed instead of CRLF.

Some access control software that major mobile carriers use to captive-portal their data traffic only matches on HTTP requests that use CRLF. If the same HTTP traffic uses linefeeds, it's not matched by the ACLs.

A simple HTTP proxy that does s/\r\n$/\n/g would allow one to surf for free, even if their bill hasn't been paid in months. Boy, that HTTP standard sure is handy :D
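The rewrite itself is trivial. A sketch of the transformation such a proxy would apply to each outgoing request (illustration only; whether it actually slips past a given middlebox depends entirely on that middlebox, and lenient servers accepting bare LF is the RFC 2616 section 19.3 "tolerant applications" behavior, not a guarantee):

```python
# Replace CRLF line endings with bare LF in a raw HTTP request.
# Lenient origin servers often accept LF-only request lines and
# headers, while strict pattern-matchers keyed on "\r\n" won't match.

def lf_only(raw_request: bytes) -> bytes:
    return raw_request.replace(b"\r\n", b"\n")

req = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
print(lf_only(req))  # b'GET / HTTP/1.1\nHost: example.com\n\n'
```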

The whitespace rule as stated there is, as far as my understanding goes, incorrect. Whitespace is allowed between quoted strings and word tokens, not between individual production rules (if that is the term). By that I mean "q = 0 . 1" is invalid, while "q = 0.1" is valid. That simplifies things a lot.

If you lex the headers in two phases (handle line continuations and header combination first, feed joined lines to parser) HTTP is reasonably simple to parse.
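A rough sketch of phase one of that approach (hypothetical helper name; the rule, per RFC 2616, is that a header line beginning with a space or horizontal tab continues the previous header line):

```python
def unfold_headers(raw: str) -> list:
    """Phase 1: join continuation lines before feeding headers
    to the parser. A line starting with SP or HTAB continues
    the previous header line (RFC 2616 "LWS folding")."""
    joined = []
    for line in raw.split("\r\n"):
        if line[:1] in (" ", "\t") and joined:
            joined[-1] += " " + line.strip()
        else:
            joined.append(line)
    return joined

raw = "Accept: text/html,\r\n\tapplication/xml\r\nHost: example.com"
print(unfold_headers(raw))
# ['Accept: text/html, application/xml', 'Host: example.com']
```

Phase two (combining duplicate header names into comma-separated values, then parsing each joined line) operates on this flat list, which is what keeps the parser itself simple.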

Yes. The "implied LWS" rule says LWS can be included between any two tokens, but tokens MUST be separated by a delimiter; therefore, if 0.000 is allowed with no delimiters, it is a single token, and the "implied LWS" rule does not apply inside the 0.000 part of a quality value:

> implied LWS: The grammar described by this specification is word-based. Except where noted otherwise, linear white space (LWS) can be included between any two adjacent words (token or quoted-string), and between adjacent words and separators, without changing the interpretation of a field. At least one delimiter (LWS and/or separators) MUST exist between any two tokens (for the definition of "token" below), since they would otherwise be interpreted as a single token.

(from http://www.ietf.org/rfc/rfc2616.txt)

Quality values:

       qvalue         = ( "0" [ "." 0*3DIGIT ] )
                            | ( "1" [ "." 0*3("0") ] )

It seems the author is mistaken in allowing:

   0     .   0          0   0
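Under the single-token reading, the qvalue grammar translates directly into a regex. A quick sketch that accepts everything the production allows but rejects the space-littered variant:

```python
import re

# qvalue = ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] )
QVALUE = re.compile(r'\A(?:0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?)\Z')

for candidate in ["0.1", "1.000", "0.", "1.001", "0 . 0 0 0"]:
    print(candidate, bool(QVALUE.match(candidate)))
# 0.1 True / 1.000 True / 0. True / 1.001 False / 0 . 0 0 0 False
```

(Note that a bare "0." is valid, since 0*3DIGIT permits zero digits, but anything above 1 and anything containing internal whitespace is not.)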

The quality range between 0 and 1 is because it's a percentage.

Think of an image, the HTTP request being for an image. The client can say (in an Accept header), "Give me a lossless .tiff if you have it, but if you don't then I'd like an image in .jpg format that is 80% quality. If you can't do that then I'd prefer a .gif that is equivalent to 50% quality. If you've failed on that... well, screw it, I'll take ASCII art, it's better than nothing.".

The quality represents a loss in quality of the supplied resource, and a preference list of the order of acceptable types (in the case of an Accept header) for each step down in quality.

It's a bit of an abuse to read quality as "order by" and to treat the percent quality as "range 0-1000".

I think that example is actually given in the specs, but I'm fairly sure audio was the resource type used.
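In header form, the image preference described above might look like the following (the header value is a hypothetical example, and the parser is a rough sketch: it ignores every Accept parameter other than q, and q defaults to 1 when absent, per RFC 2616 section 14.1):

```python
# Hypothetical Accept header expressing the image preference above:
accept = "image/tiff, image/jpeg;q=0.8, image/gif;q=0.5, text/plain;q=0.1"

def parse_accept(value: str):
    """Return (media-type, q) pairs sorted by descending preference."""
    prefs = []
    for part in value.split(","):
        media, _, params = part.strip().partition(";")
        q = 1.0  # an absent q parameter means q=1
        if params.strip().startswith("q="):
            q = float(params.strip()[2:])
        prefs.append((media, q))
    return sorted(prefs, key=lambda p: p[1], reverse=True)

print(parse_accept(accept)[0])  # ('image/tiff', 1.0)
```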

And this is why we use actual web servers, which are HTTP compliant (it takes years to get there), to handle incoming HTTP requests from the Internet. 90% of the "web frameworks" I've played with have only limited HTTP support and must not be exposed directly on port 80.

What HTTP/1.1-compliant "actual web server" would you recommend, since this article tends to show that no "actual web server" handles it correctly?

Leaving aside the pesky little issue of what the spec actually says, in practice Apache or nginx will serve nicely.

This rant is from 2007.

2007 is still pretty recent considering the lifespan of 1.1. http://www.w3.org/Protocols/History.html

Has anyone found the software yet? All the source links appear to be dead-ends.
