

Tell HN: Please Stop Breaking HTTP Clients - tdavis

For whatever ungodly reason, the news.yc web server doesn't terminate header lines properly when sending a response, which happens to break any HTTP client that actually follows the RFC. Please see RFC 2616 section 3.7.1 and use _\r\n_ as God intended.

    This flexibility regarding
    line breaks applies only to text media in the entity-body; a bare CR
    or LF MUST NOT be substituted for CRLF within any of the HTTP control
    structures (such as header fields and multipart boundaries).

Thank you!
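
For anyone who wants to check a server for this, here is a minimal sketch in Python. The helper name `find_bare_lfs` and the sample responses are made up for illustration; they are not captured from the news.yc server. It reports the byte offset of every LF in the header section that is not preceded by a CR.

```python
import re

# Matches an LF that is not immediately preceded by a CR.
BARE_LF = re.compile(rb"(?<!\r)\n")


def find_bare_lfs(raw: bytes) -> list:
    """Return byte offsets of bare LFs in the header section of a raw HTTP message."""
    # The header section ends at the first blank line. Accept either style of
    # blank line so that a non-conforming response can still be inspected.
    match = re.search(rb"\r?\n\r?\n", raw)
    head = raw if match is None else raw[:match.end()]
    return [m.start() for m in BARE_LF.finditer(head)]


# Hypothetical sample responses: one conforming, one using bare LF.
good = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>\n</html>"
bad = b"HTTP/1.0 200 OK\nContent-Type: text/html\n\n<html>\n</html>"
```

Line breaks in the body are deliberately ignored, since the flexibility quoted above applies to the entity-body but not to the header fields.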
======
nir
Voted up, as a good example of why it's better to build on top of existing
software rather than reinvent the wheel (or in this case, Apache).

~~~
blasdel
Apache is probably the worst example you could have possibly picked; it's such
a dog.

~~~
kaens
Could you expand on why you don't like Apache? I'll grant you that it can be
confusing to get used to configuring, especially if you're not _really_
familiar with how the web works, but as far as I can tell it's a pretty damn
good server.

What do you prefer over apache?

~~~
blasdel
My fundamental beef with Apache is that people wrap _way_ too much
functionality into it, to the point where their entire end-to-end web stack is
two sets of processes: Apache and a database server.

On top of that its configuration is awful, the modules are all awful compared
to their domain-specific alternatives, the architecture is awful (especially
wrt concurrency), the development process is molasses, and the Apache Software
Foundation has all but abandoned httpd for IBM-focused all-Java astronaut
architecture with as much bureaucracy as they can possibly fit into a public
process.

I hate Apache because I'm _intimately_ familiar with how the web works, and
the ASF is responsible for so much web-hostile WS-* garbage.

For a while (~4 years ago) I thought lighttpd was worthwhile as a total
replacement, and did a project that hacked its WebDAV implementation for
userspace filesystems (pre-MacFUSE), but fundamentally its architecture is
still Apache just with sensible concurrency.

I've come to be really fond of the reverse-proxy model, and really like Nginx
running in front of independent app processes using whatever HTTP abstraction
is native to the language (WSGI, Rack, Servlets, etc.) along with a nice
native high-level spec-focused HTTP server (twisted.web, mongrel, ???). The
last web application I wrote from scratch was on Google AppEngine, and I
_really_ like their version of WSGI.

~~~
tome
I'm not familiar with what you're talking about. Could you clarify something
for me? Why would you need Nginx and twisted.web? Wouldn't you just call your
WSGI application from Nginx?

~~~
jaddison
Nginx's architecture doesn't really mesh well with WSGI's interface
protocol... I believe it has to do with blocking the serving Nginx process
whilst the WSGI request is being processed.

Nginx prefers to "pass off" the request to a web-app server that does the
heavy lifting, while Nginx itself handles the easy serving as quickly as
possible.
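
A rough sketch of the app-process side of that arrangement, using only Python's standard `wsgiref` module. The names `app` and `serve`, the response body, and port 8000 are arbitrary choices for illustration, and `wsgiref` stands in here for whichever app server you would actually run behind the proxy.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """A tiny WSGI application. The app server, not the app, writes the wire format."""
    body = b"hello from the app process\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(host="127.0.0.1", port=8000):
    """Run the app in its own process; a front-end proxy would forward requests here."""
    # Host and port are assumptions for this sketch, not values from the thread.
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` starts the app process, and the front-end proxy is then configured to forward requests to that address.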

~~~
tome
Thanks!

------
visitor4rmindia
Maybe this would help?

Section 19.3

 _The line terminator for message-header fields is the sequence CRLF. However,
we recommend that applications, when parsing such headers, recognize a single
LF as a line terminator and ignore the leading CR._
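
A minimal sketch of what that recommendation looks like on the client side, in Python. The function name `parse_headers_leniently` is hypothetical, and the sketch skips folded continuation lines and anything else a real parser has to handle.

```python
def parse_headers_leniently(head: bytes) -> list:
    """Split a header block into (name, value) pairs, accepting CRLF or bare LF."""
    headers = []
    for line in head.split(b"\n"):
        # Treat LF as the terminator and drop the CR that precedes it, if any.
        if line.endswith(b"\r"):
            line = line[:-1]
        if not line:
            break  # a blank line marks the end of the header section
        name, sep, value = line.partition(b":")
        if not sep:
            continue  # no colon, so not a "name: value" line (e.g. the status line)
        headers.append((name.strip().decode("latin-1"),
                        value.strip().decode("latin-1")))
    return headers
```

For example, `parse_headers_leniently(b"HTTP/1.0 200 OK\nContent-Type: text/html\r\n\n")` yields the same single header whether the lines end in CRLF or in a bare LF.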

~~~
blasdel
That guidance is for how best to "be liberal in what you accept"

news.yc's silly problem is in the area of "be conservative in what you emit"
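
The emitting side is not much code either. A sketch in Python, assuming a hypothetical helper `build_response_head` and an HTTP/1.0 status line; it has nothing to do with the actual news.yc source.

```python
CRLF = "\r\n"


def build_response_head(status: str, headers: list) -> bytes:
    """Serialize a status line and headers, terminating every line with CRLF."""
    lines = ["HTTP/1.0 " + status]
    for name, value in headers:
        # Reject embedded CR or LF so a header cannot smuggle in a bare line break.
        if "\r" in name + value or "\n" in name + value:
            raise ValueError("header contains a line break: %r" % name)
        lines.append("%s: %s" % (name, value))
    # Each line ends in CRLF, and one extra CRLF ends the header section.
    return (CRLF.join(lines) + CRLF + CRLF).encode("latin-1")
```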

------
aristus
Voted up. But a lot of homebrew webservers don't adhere to the spec, and
you'll just have to deal. In this case "God" is W3C, and not every decision
they made was good, e.g. the "Referer" [sic] header. :)

[deleted grumble about writing production HTTP clients in 2009]

~~~
paulgb
I wonder if anyone has tried to calculate how many bytes of data transfer have
been saved by W3C's spelling of the word "referer".

~~~
aristus
Fewer than that wasted by redundant CRLFs.

------
davidw
I wonder if this is the reason I get a proxy error when I try to browse HN
from my mobile phone.

------
coderrr
I had to make changes to Ruby's em-http-request for this:

[http://github.com/coderrr/em-http-request/commit/8e6444fe472...](http://github.com/coderrr/em-http-request/commit/8e6444fe4727f05a7c4e1efc2cc39555a2ae2338)

------
pufuwozu
I had the same problem a while ago and ended up writing a patch to make PHP's
HTTP library more flexible:

<http://pecl.php.net/bugs/bug.php?id=15223>

------
ivank
This was (probably) fixed in Anarki a while ago:

[http://github.com/nex3/arc/commit/6cb43b3a5977950a61bfd6ce5a...](http://github.com/nex3/arc/commit/6cb43b3a5977950a61bfd6ce5a0af6f18f1df558)

------
herf
The rather good Fiddler debugging proxy (<http://www.fiddler2.com/fiddler2/>)
always flags HN for this. In a popup, no less. So it drives me nuts too, and
if it could be fixed, I could click on fewer popups.

------
stefano
It broke my HTTP client too (written in Arc, btw) some time ago:
<http://arclanguage.org/item?id=8283>

------
jacquesm
So, you're crawling HN and you expect the target of your crawl to fix
something so you have an easier time of it :) ?

~~~
tdavis
I'm not doing anything with HN. I expect the server to emit the proper line
breaks because that's what it should do, especially if it's going to be made
freely available and used by others.

There's no reason that such a simple bug should have made it into or remained
in the production source for so long. It's laziness for laziness' sake.

