
More likely, they're stopping on the first space OR colon to parse the header name since "Content-Length : 0" is valid.

Personally, if I were writing an HTTP request parser while being lazy about enforcing the spec, I'd split ONLY on the colon, then strip the whitespace on either side of both the header name and value. In Python:

    header, value = line.split(':', maxsplit=1)
    header = header.strip().lower()
    value = value.strip()
After that, `header` should ALWAYS be checked via equality, and never `.startswith(...)`.



Note that parsing is likely more complicated than your code suggests, because you have assumed that each “line” has already been identified before you parse it. AFAIK there is an escape sequence for the header delimiter (\r\n).

Also, your code doesn’t fix the issue where a header name containing whitespace is accepted (which may violate expectations, depending on the server).

Your pseudocode also doesn’t handle the edge case where two headers that normalize to the same stripped text collide. One HTTP smuggling vector is the front server keeping a different header value than the back server when two header names collide.
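
A minimal sketch of one way to surface that collision instead of silently keeping one value. `parse_headers` is a hypothetical helper name, and rejecting every duplicate is an assumed policy for illustration, not something the thread prescribes:

```python
def parse_headers(lines):
    """Parse header lines into a dict, refusing names that collide after normalization.

    Hypothetical helper: rejecting all duplicates is an assumed (strict) policy.
    """
    headers = {}
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"no colon in header line: {line!r}")
        # Same lazy normalization as the snippet upthread.
        key = name.strip().lower()
        if key in headers:
            # e.g. "Content-Length" and "Content-Length " end up as the same key
            raise ValueError(f"duplicate header after normalization: {key!r}")
        headers[key] = value.strip()
    return headers
```

A real server would probably need to allow repeats for list-valued headers and only be this strict for ones like Content-Length, but failing loudly at least means the front and back servers can't quietly disagree.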


> since "Content-Length : 0" is valid.

According to which spec? RFC 7230 allows optional whitespace (OWS) after the colon, but not before it:

   header-field   = field-name ":" OWS field-value OWS
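
For comparison with the strip-everything approach, here is a rough sketch that follows that grammar more literally. `parse_field_strict` is a hypothetical name; the name character set is my reading of the RFC 7230 `token` rule, and the field-value check is deliberately loose (it only refuses CR and LF):

```python
import re

# field-name ":" OWS field-value OWS
# Name characters are taken from RFC 7230's `token` rule; value checking is simplified.
_FIELD_LINE = re.compile(
    r"([!#$%&'*+\-.^_`|~0-9A-Za-z]+):[ \t]*([^\r\n]*?)[ \t]*"
)


def parse_field_strict(line):
    """Return (name, value); raise ValueError if the line does not fit the grammar.

    Assumes `line` has already been split off, without its trailing CRLF.
    """
    m = _FIELD_LINE.fullmatch(line)
    if m is None:
        raise ValueError(f"malformed header field: {line!r}")
    return m.group(1).lower(), m.group(2)
```

With this, "Content-Length: 0" parses, while "Content-Length : 0" is refused because the space before the colon is not a token character.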


You definitely shouldn't strip the left side of the header name, as a leading space is the syntax for splitting a header over multiple lines, at least in email. I'm not sure if this applies to HTTP, but some parsers may do that anyway, and some don't.
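
For what it's worth, my understanding is that RFC 7230 calls this folding `obs-fold`, deprecates it for HTTP, and lets a server either reject it or replace it with spaces. A small sketch of the "reject" option, done at the line-splitting stage so the leading whitespace is never stripped away first. `split_header_lines` is a hypothetical helper, and it assumes the input is the header section only (no request line):

```python
def split_header_lines(block):
    """Split a raw header block on CRLF, refusing folded (continuation) lines.

    Hypothetical helper; assumes `block` starts at the first header line.
    """
    lines = []
    for line in block.split("\r\n"):
        if line == "":
            break  # a blank line ends the header section
        if line[0] in " \t":
            # Leading SP/HTAB marks a continuation of the previous header.
            raise ValueError(f"folded header line not accepted: {line!r}")
        lines.append(line)
    return lines
```

Doing the check before any `.strip()` matters: once the name has been stripped, a continuation line is indistinguishable from a new header.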

Just shows how easy it is to be wrong by being lazy with HTTP parsing.



