
I've implemented a new HTTP/1.1 request and response parser by hand - fogus
http://four.livejournal.com/1033160.html
======
viraptor
For a couple of reasons, this code doesn't look very reliable...

Almost identical cases don't reuse code (not even a #define). There are also
lines like "`if (usual[ch >> 5] & (1 << (ch & 0x1f))) break;`" without any
comments. The code hardcodes the HTTP methods for some reason, so the
detection code spans ~130 lines (454..580). It looks like it will accept
"HXXX/1.1" if strict checking is off. This double check is... interesting:

    
    
        if (!parser->FOR##_mark) return 0; \
        assert(parser->FOR##_mark); \
    

Sure - speed++, but at what cost? Otherwise... cool code - I like the MARK /
CALLBACK macros.
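
(For anyone wondering about the uncommented `usual[...]` line above: it reads
like a 256-bit character-class bitmap - `ch >> 5` picks one of eight 32-bit
words, `ch & 0x1f` picks the bit inside it. A rough sketch of the idiom, with
made-up names and only a guess at what the real `usual` table contains:)

    #include <stdint.h>
    
    /* Guess at the idiom behind `usual[ch >> 5] & (1 << (ch & 0x1f))`:
     * a 256-entry character class packed into 8 x 32-bit words. */
    static uint32_t usual[8];
    
    static void mark_usual(unsigned char ch) {
      usual[ch >> 5] |= (uint32_t)1 << (ch & 0x1f);   /* set bit for ch */
    }
    
    static int is_usual(unsigned char ch) {
      return (usual[ch >> 5] >> (ch & 0x1f)) & 1;     /* test bit for ch */
    }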

~~~
jerf
"I can make it arbitrarily fast if I don't actually have to make it work":
[http://blogs.msdn.com/larryosterman/archive/2009/09/29/i-can...](http://blogs.msdn.com/larryosterman/archive/2009/09/29/i-can-make-it-arbitrarily-fast-if-i-don-t-actually-have-to-make-it-work.aspx)

~~~
nostrademons
I'm disappointed that the student who hardcoded the results was disqualified.
You should get points for finding bugs in your professor's specification.

~~~
andreyf
Not to mention the "winning" entry did exactly the same thing: it found an
`n` (number of words) that works _for this input only_, then created a fast
algorithm that works _for this input only_ but is incorrect in general.

~~~
ricree
Especially since the other solution that got docked for breaking the rules
actually performed the calculation that was requested.

------
brianobush
At my company we rolled our own too. The hard part was tracking down all the
brain-dead servers that seemingly worked fine with browsers but were
off-spec.

~~~
jacobolus
Are there any good sets of HTTP compatibility tests that cover the deviations
from the spec that show up in existing servers/browsers?

~~~
wmf
<http://coad.measurement-factory.com/>

Warning: not free.

------
xal
It has also been merged into node. This thing is starting to be a case study
for optimized C network servers.

------
rglullis
I need to dig through my old computer and find my college lab assignments
where we built a basic HTTP/1.1 server. For the request parser, we had to
create a lex file for a grammar that included actions for all of the verbs,
and it also had to be robust against non-standard verbs, i.e., return a 400
code.

I doubt that my yacc'd program would be only 124 bytes in size, but it would
be interesting to get that old code and compare the results.

------
snorkel
Clever use of ## in preprocessor directives.
<http://en.wikipedia.org/wiki/C_preprocessor>
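
For anyone who hasn't seen the trick: `##` pastes tokens together at
expansion time, so one macro body can address several similarly named struct
fields. A small sketch, loosely modeled on the MARK macros (my own names, not
the parser's exact code):

    struct parser {
      const char *url_mark;
      const char *header_field_mark;
    };
    
    /* FOR##_mark pastes into url_mark, header_field_mark, ...; this
     * sketch assumes locals named `parser` and `p` are in scope. */
    #define MARK(FOR)  (parser->FOR##_mark = p)
    
    static void example(struct parser *parser, const char *p) {
      MARK(url);           /* expands to parser->url_mark = p */
      MARK(header_field);  /* expands to parser->header_field_mark = p */
    }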

------
pquerna
I thought it was a pretty cool implementation (zero memory allocation is
pretty hard most of the time), though I was disappointed that the HTTP
methods were hard-coded in, so adding a new HTTP method (like PROPFIND?)
would mean adding a ton of code. I think you could make a few macros for the
HTTP methods and keep the rest of it, though.
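
Something like an X-macro would probably do it: list the methods once and
generate both the enum and the name table from that list, so a new method is
a one-line change. A sketch of what I mean (not code from the parser):

    /* Declare the method list once, expand it twice.  Adding PROPFIND
     * is then a single line in HTTP_METHOD_MAP. */
    #define HTTP_METHOD_MAP(XX) \
      XX(GET)                   \
      XX(HEAD)                  \
      XX(POST)                  \
      XX(PUT)                   \
      XX(DELETE)                \
      XX(PROPFIND)
    
    enum http_method {
    #define XX(name) HTTP_##name,
      HTTP_METHOD_MAP(XX)
    #undef XX
    };
    
    static const char *method_strings[] = {
    #define XX(name) #name,
      HTTP_METHOD_MAP(XX)
    #undef XX
    };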

~~~
davisp
Hacked together an on_method callback last night for the Python bindings I'm
writing:

[http://github.com/davisp/http-parser/commit/50e54f95fd4c2eac...](http://github.com/davisp/http-parser/commit/50e54f95fd4c2eac59d669ef5e9e0e8b61a0955d)

------
bumblebird
By _hand_??? As opposed to what exactly? Isn't most code written by hand?

~~~
ionfish
As opposed to using Ragel, as you'd have known if you'd read the link.

~~~
bumblebird
The title is "I've implemented a new HTTP/1.1 request and response parser by
hand"

My question was: how do you implement a new HTTP/1.1 request and response
parser if not by hand? Isn't most code written using hands?

I've written one too... big whoop.

Meh _anyway_...

~~~
viraptor
Most parsers nowadays are generated with tools like yacc: you write the
grammar, yacc generates the parser (and flex the lexer). Writing the parser
yourself is something completely different.

~~~
bumblebird
It was just a bit surprising to me. Parsing HTTP isn't exactly complex.

~~~
ryah
clearly you're unfamiliar with http.

~~~
bumblebird
I certainly am. It's a complete mystery to me. Is it like visual basic?

The only PITA with HTTP is chunked encoding. Whoever thought that gem up
should be shot. The rest is fairly trivial. Certainly parsing headers is. This
implementation looks pretty silly. Having individual states for each of the
characters in "HTTP" etc? WTF?
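
For context, the chunked wire format: each chunk is a hex length, CRLF, that
many body bytes, CRLF, ended by a zero-length chunk (optionally followed by
trailers). Roughly like this (just an illustration, nothing specific to this
parser):

    /* Rough illustration of a chunked response on the wire: hex size,
     * CRLF, that many body bytes, CRLF, repeated; "0" ends the body. */
    static const char chunked_example[] =
        "HTTP/1.1 200 OK\r\n"
        "Transfer-Encoding: chunked\r\n"
        "\r\n"
        "4\r\n"
        "Wiki\r\n"
        "5\r\n"
        "pedia\r\n"
        "0\r\n"
        "\r\n";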

edit: Instead of just downmodding me, why not explain exactly what part of
parsing HTTP headers is non-trivial?

~~~
viraptor
Parser correctness is not trivial. RFC 2616 contains the complete grammar you
need, so it's fairly simple to implement. OTOH, if you write a parser on your
own, you're likely to miss stuff like section 4.2, which says header values
can span multiple lines if the continuation lines start with LWS. The parser
from this article will fail on (just looking at the source, I'm 99.5% sure of
this):

    
    
        abc:
         def
    

It also doesn't like tabs and will not support comma-separated header values.
It's not rocket science to write a "good enough" HTTP parser, but writing a
fully compliant one is something completely different. There are also cool
parts of the spec that you can read 10 times and come to a different
conclusion each time - for example, what does the "\" CR LF part mean if it's
inside a quoted string, and does it finish the header value or not? Writing a
"correct" parser is a LOT of fun...
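
To make the folding point concrete, here's a rough sketch of the section 4.2
rule - CRLF followed by SP or HT continues the previous value - as an
in-place unfold (illustration only, not the article's code):

    #include <stddef.h>
    
    /* Replace CRLF + LWS runs with a single space, per the usual
     * reading of RFC 2616 section 4.2.  Returns the new length. */
    static size_t unfold(char *buf, size_t len) {
      size_t r = 0, w = 0;
      while (r < len) {
        if (r + 2 < len && buf[r] == '\r' && buf[r + 1] == '\n' &&
            (buf[r + 2] == ' ' || buf[r + 2] == '\t')) {
          buf[w++] = ' ';                  /* folded: value continues */
          r += 3;
          while (r < len && (buf[r] == ' ' || buf[r] == '\t'))
            r++;                           /* swallow the rest of the LWS */
        } else {
          buf[w++] = buf[r++];
        }
      }
      return w;
    }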

Keeping separate states for the characters in "HTTP" probably saves you a
couple of cycles, because you match as you go and can reject the message
early, knowing the exact place that didn't match. It's a bit useless for a
4-letter string, though.
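
Roughly what "individual states" looks like (made-up states, not the actual
ones from the parser): each byte advances or kills the match, so there's
nothing to buffer and nothing to strcmp later.

    /* Match "HTTP" one byte at a time; a wrong byte fails immediately,
     * at a known offset, without buffering. */
    enum state { s_start, s_H, s_HT, s_HTT, s_HTTP, s_error };
    
    static enum state step(enum state s, char ch) {
      switch (s) {
        case s_start: return ch == 'H' ? s_H    : s_error;
        case s_H:     return ch == 'T' ? s_HT   : s_error;
        case s_HT:    return ch == 'T' ? s_HTT  : s_error;
        case s_HTT:   return ch == 'P' ? s_HTTP : s_error;
        default:      return s_error;
      }
    }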

