

The ups and downs of the HTTP header - Keithamus
http://blog.keithcirkel.co.uk/the-ups-and-downs-of-the-http-header/

======
teddyh
A number of errors in this article make me wary:

1\. The "request" line in HTTP is not a header - it is the request, which can
have associated headers. The headers are all “about” the request. The request
itself is not a header, and does not follow the header syntax. (The historical
reason for this is that the request line was defined in HTTP 0.9, which did
not have headers.)

2\. ISO-8859-1 is not “a crappy Windows character set”. It is an international
standard specifically different from what Microsoft was using at the time
(code page 437 was standard for MS-DOS in the US). Later, Windows switched to
code page 1252, which is a copy of ISO-8859-1 except for some extra glyphs in the
bytes the ISO standard defined as control characters.
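To make point 1 concrete, here is a minimal Python sketch (the function name `split_request` is made up for illustration). It treats the first line of an HTTP/1.x request as method, target and version, and only the lines after it as `Name: value` header fields. It assumes a well-formed request with no folded headers. An HTTP/0.9-style request such as `GET /path` has no version token, so the three-way unpacking here would fail on it.

```python
def split_request(raw: bytes):
    """Split a raw HTTP/1.x request head into (request line, header fields)."""
    # Everything before the first blank line is the request head.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    first, *rest = head.split("\r\n")
    # The request line is "method SP request-target SP HTTP-version";
    # it has no "Name:" prefix, so it does not follow the header syntax.
    method, target, version = first.split(" ")
    headers = []
    for line in rest:
        # Each remaining line is a header field "Name: value" about that request.
        name, _, value = line.partition(":")
        headers.append((name, value.strip()))
    return (method, target, version), headers


request_line, headers = split_request(
    b"GET /index.html HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n"
)
print(request_line)  # ('GET', '/index.html', 'HTTP/1.1')
print(headers)       # [('Host', 'example.com'), ('Accept', '*/*')]
```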

~~~
Keithamus
Thanks for the clarification about the request line, I'll edit the article to
point that out!

I mostly referred to it as a "crappy Windows character set" because A) it has
a limited set of characters, mostly Western European, and B) it's pretty much
only used by Windows these days. While the term "crappy Windows character set"
is perhaps not entirely accurate, it is a short, tongue-in-cheek summary of
ISO-8859-1.

~~~
wereHamster
Unicode also has a limited set of characters, mostly those that the Unicode
Consortium has agreed to include in the standard.

~~~
Keithamus
That's splitting hairs - UTF-8 allows for over a million code points, enough
to cover pretty much every written language, and then some (including swathes
of emoji characters). ISO-8859-1 has 256 code points, barely enough to cover
Europe and America.
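A minimal Python sketch backing both sides of this exchange, using the standard `iso-8859-1`, `cp1252` and `utf-8` codecs (the helper names are made up). ISO-8859-1 maps byte n straight to code point U+00nn, so it can only ever hold 256 code points; cp1252 differs from it only in 0x80-0x9F, the C1 control range, as teddyh says; UTF-8 can encode any code point up to U+10FFFF, which is 1,114,112 code points in total.

```python
def differing_bytes() -> list:
    """Byte values that cp1252 and ISO-8859-1 decode to different characters."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("iso-8859-1")  # every byte n decodes to U+00nn
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            continue  # a handful of bytes (e.g. 0x81) are unassigned in cp1252
        if cp1252 != latin1:
            diffs.append(b)
    return diffs


def encodable(text: str, encoding: str) -> bool:
    """True if every character of `text` can be encoded in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    diffs = differing_bytes()
    print(hex(min(diffs)), hex(max(diffs)))  # all differences sit in 0x80-0x9F
    print(encodable("café", "iso-8859-1"))    # True: é is U+00E9
    print(encodable("\u20ac", "iso-8859-1"))  # False: the euro sign is U+20AC
    print(encodable("\U0001F600", "utf-8"))   # True: UTF-8 reaches U+10FFFF
```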

------
IgorPartola
Why is the UA header so screwed up, aside from the historical issues with it?
Isn't it time that we replace it with something a bit more sane and
structured? It seems the idea of detecting the browser vs detecting browser
features goes back and forth. Sure, on the client side, where you have access
to the DOM and the JavaScript runtime, it's great to know whether you can use
the placeholder attribute in a text input, but server-side you need to decide
which video file to serve to the client, and this gets tricky.

Instead, why don't we have something like this?

    OS: Windows
    OS-Version: 8.1
    Browser: Chrome
    Browser-Version: 18.5

(Not suggesting the format, just the type of data.)

That way we can ditch the stupid stuff such as "like Gecko", which means
nothing, and focus on actually useful things.

~~~
_greim_
Web developers have historically tended to write shitty UA detection logic en
masse, which has in turn incentivized browser makers to carefully craft UAs to
break as few of them as possible. Basically, avoiding the all-too-common "this
site requires IE6 or higher" message when you visit a site in IE11. The same
situation would likely develop with the proposal above, which is essentially
no different from the original intent for UA strings. The most viable option
would be for all browser makers to just simultaneously disable them. Like a
band-aid; right off!
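A minimal Python sketch of the kind of sniffer that produces that message. The regex and function name are hypothetical; the two UA strings are representative IE8 (Windows 7) and IE11 (Windows 8.1) strings. IE11 dropped the `MSIE` token (and picked up "like Gecko"), so a check written for older IE versions decides the newest one is not IE at all.

```python
import re


def naive_is_ie6_or_higher(ua: str) -> bool:
    """Typical fragile sniffing: look for 'MSIE <major>' and compare the number."""
    match = re.search(r"MSIE (\d+)", ua)
    return match is not None and int(match.group(1)) >= 6


IE8_UA = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)"
IE11_UA = "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"

print(naive_is_ie6_or_higher(IE8_UA))   # True
print(naive_is_ie6_or_higher(IE11_UA))  # False: no 'MSIE' token, so "requires IE6 or higher"
```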

------
MichaelGG
Well, as part of a rant, I'll point out two bizarro-world features of HTTP
headers: line folding and comments.

You can add arbitrary CRLFs to any header, so long as you start the next line
with whitespace. Proper implementations need to treat every continuation line
as part of the same header. It's very annoying to implement (and
implementations of similar protocols do not all agree!), and has no benefit,
unless you're composing HTTP headers to read on an 80-column layout. And that
kind of thing has no place in a computer protocol.
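For anyone who has to handle it anyway, here is a minimal Python sketch of unfolding (the function name is hypothetical). It assumes the input is the header lines of one message, already split on CRLF and without the terminating blank line. RFC 7230 later deprecated folding as `obs-fold`; replacing each fold with a space, as done here, is one of the options it gives recipients.

```python
def unfold_headers(lines: list) -> list:
    """Join folded (continuation) lines onto the header field they belong to."""
    fields = []
    for line in lines:
        if line[:1] in (" ", "\t"):
            # Leading whitespace means "still part of the previous header".
            if not fields:
                raise ValueError("continuation line before any header field")
            name, value = fields[-1]
            # Replace the fold (CRLF + leading whitespace) with a single space.
            fields[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            fields.append((name.strip(), value.strip()))
    return fields


folded = ["X-Note: first part", "   second part", "\tthird part", "Host: example.com"]
print(unfold_headers(folded))
# [('X-Note', 'first part second part third part'), ('Host', 'example.com')]
```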

Comments. Seriously, read this from the spec:

    Comments can be included in some HTTP header fields by surrounding
    the comment text with parentheses. Comments are only allowed in
    fields containing "comment" as part of their field value definition.
    In all other fields, parentheses are considered part of the field
    value.

That's even more bizarre. It also means the parser needs to know which header
it is operating on. It just adds room for mis-implementation and security
issues (confused deputy), and it hurts performance. It's only useful if you're
writing HTTP headers by hand and feel the need to comment them for ... I can't
think of a legit case.
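A minimal Python sketch of what that rule forces on a parser (all names hypothetical). `COMMENT_FIELDS` is an assumption: it is taken here to be User-Agent, Server and Via, the fields whose RFC 2616 grammar includes `comment`. Comments can nest and can contain backslash-escaped characters, while the same parentheses are plain data in any other field, so the header name has to be passed into value parsing. Quoted strings are not handled, since User-Agent has none.

```python
# Assumed set of fields whose grammar allows "comment" (RFC 2616).
COMMENT_FIELDS = {"user-agent", "server", "via"}


def strip_comments(value: str) -> str:
    """Remove (possibly nested) parenthesised comments, honouring backslash escapes."""
    out = []
    depth = 0
    escaped = False
    for ch in value:
        if depth == 0:
            if ch == "(":
                depth = 1
            else:
                out.append(ch)
        elif escaped:
            escaped = False  # quoted-pair: the escaped char is just comment text
        elif ch == "\\":
            escaped = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
    return " ".join("".join(out).split())


def field_value(name: str, value: str) -> str:
    # The same characters mean different things depending on the header name.
    return strip_comments(value) if name.lower() in COMMENT_FIELDS else value


ua = "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"
print(field_value("User-Agent", ua))    # Mozilla/5.0 like Gecko
print(field_value("X-Custom", "a (b)"))  # a (b)
```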

"Human readable" computer protocols are debatable (parsing rules always seem
to become more difficult, which is very bad), but "human writable" is just
silly.

~~~
ChickeNES
This tripped me up to no end when I had to implement a web proxy in one of my
intro to CS classes. I couldn't find this mentioned in the standard anywhere
and different browsers treated it differently.

~~~
MichaelGG
I've discovered exploitable holes "in the wild" due to SIP using the same
inane parsing rules. Proxy A asserts security and billing. Server B processes
the message but instead of reading Proxy A's assertions, it reads "cutely
formed" data directly from the client.

Fixing is a royal pain, because some systems require the behaviour to be one
way or another.

"Fortunately" security in VoIP is such a joke that tricks like this aren't the
biggest issue and so far, I've not seen any such attempts in any attacks.

------
rplnt
A bit of trivia on why Opera claims to be 9.80: they used 10.00 in the beta of
Opera 10 and found out that many sites' sniffers couldn't handle a two-digit
version number. So with the final release (and from then until the death of
the browser) they used Opera/9.80 and put the actual version elsewhere in the
string.
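A small Python sketch of that failure (regexes and function names are hypothetical; the UA strings are simplified examples, not exact Opera builds). A sniffer that captures one digit after `Opera/` reads "10.00" as version 1; Opera's workaround kept `Opera/9.80` at the front and moved the real version into a trailing `Version/` token.

```python
import re
from typing import Optional


def naive_opera_version(ua: str) -> Optional[int]:
    """Fragile sniffer: assumes the major version is a single digit."""
    match = re.search(r"Opera/(\d)", ua)
    return int(match.group(1)) if match else None


def opera_version(ua: str) -> Optional[int]:
    """Prefer the trailing 'Version/x.yy' token, then fall back to 'Opera/x.yy'."""
    match = re.search(r"Version/(\d+)", ua) or re.search(r"Opera/(\d+)", ua)
    return int(match.group(1)) if match else None


BETA_UA = "Opera/10.00 (Windows NT 6.1; U; en)"               # hypothetical beta-style string
FINAL_UA = "Opera/9.80 (Windows NT 6.1; U; en) Version/10.00"  # simplified release-style string

print(naive_opera_version(BETA_UA))   # 1: the two-digit version is misread
print(naive_opera_version(FINAL_UA))  # 9: old sniffers see a familiar version
print(opera_version(FINAL_UA))        # 10
```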

That being said, people who sniff UA string to serve different content (or
even block the user) should end up in hell. I'd start with Google.

~~~
webignition
_That being said, people who sniff UA string to serve different content (or
even block the user) should end up in hell._

Goodness me how I could rant endlessly on this subject.

I operate an automated web frontend testing service and much of that centres
around retrieving an HTML document and running some tests against it.

I have tried very hard to be nice and fair and to set appropriate UA strings,
such as featuring only the product name and relevant version numbers.
Unfortunately, because some services alter their responses based on the UA
string, this is not possible.

My product features the word 'test' in the name. Some server-side services
return a 404 or a 500 if the UA string contains 'test' in any form. Due to
this I can't include the full product name in the UA string and expect all
tests for all end users to work in cases where they really should. Some others
respond similarly if the UA string is only 'agent'.

The number of services that respond in a different manner to a blank UA string
is significant. Likewise for cases where the UA string is not somewhat similar
to that of common browsers.

On a related subject, I'd love it if everyone supported the simple HEAD method
consistently.

Some services respond as expected and return only the response headers. Some
services respond fairly with either a '405 Method Not Allowed' or '501 Not
Implemented', giving me the option to try again with an equivalent GET
request. Some services send a 404 or 500 in response to a HEAD in cases where
the equivalent GET request works just fine.

And lastly, [https://myspace.com/](https://myspace.com/) responds with nothing
when making a HEAD request and you have to wait for the request to time out in
cases where an equivalent GET works just fine.
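A minimal Python sketch of the HEAD-then-GET fallback described above, using only the standard library (`fetch_headers` and `should_retry_with_get` are hypothetical names). It retries with GET only on 405 and 501, the two "fair" answers; given the servers described above, a tester may also want to retry on 404 or 500. The timeout bounds how long a silent server, like the myspace.com case, can stall the check; the timeout error itself is left to the caller.

```python
import urllib.error
import urllib.request

# Status codes meaning "this server won't do HEAD", not "the resource is missing".
HEAD_NOT_SUPPORTED = {405, 501}


def should_retry_with_get(status: int) -> bool:
    return status in HEAD_NOT_SUPPORTED


def fetch_headers(url: str, timeout: float = 10.0) -> dict:
    """Return response headers via HEAD, falling back to GET if HEAD is refused."""
    try:
        head = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(head, timeout=timeout) as resp:
            return dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        if not should_retry_with_get(err.code):
            raise
    # GET fallback: read the headers, then close without consuming the body.
    get = urllib.request.Request(url, method="GET")
    with urllib.request.urlopen(get, timeout=timeout) as resp:
        return dict(resp.headers.items())
```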

------
nmc
Interesting article, but for the part about the User-Agent header, I really
liked the history lesson by Aaron Andersen [1] from 2008.

[1] [http://webaim.org/blog/user-agent-string-history/](http://webaim.org/blog/user-agent-string-history/)

------
yukkurishite
Can't say I like the design of the page, but a good read nonetheless. Though
after all those warnings, I expected it to be much longer. Is it really that
long an article?

------
crazygringo
> _Opera 12 then just gets weird on us. It says "Generic English please, or
> U.S English, if not then uh... Arabic! If not then perhaps Catalan? If not
> then Danish, or if not that then Dutch. Ok perhaps Greek? Finnish?... Go
> home Opera, you're drunk._

Most amusing part. Seriously, I can't imagine why Opera sends all these
languages in its request. Bizarre.

~~~
throwaway0094
Not only that, but ... prioritized!
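For context on the prioritising: each entry in `Accept-Language` can carry a `q` weight between 0 and 1 (a missing weight means 1, and 0 means "not acceptable"), and the server is meant to prefer higher weights. A minimal Python sketch of reading such a header; the function name and the example headers are hypothetical, not Opera's actual value.

```python
def parse_accept_language(header: str) -> list:
    """Return (language-range, q) pairs sorted from most to least preferred."""
    prefs = []
    for index, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        lang = parts[0]
        if not lang:
            continue
        q = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                q = float(val)
        if q <= 0:
            continue  # q=0 means "not acceptable"
        prefs.append((q, index, lang))
    # Higher q first; ties keep the order in which they were sent.
    prefs.sort(key=lambda t: (-t[0], t[1]))
    return [(lang, q) for q, _, lang in prefs]


print(parse_accept_language("fr;q=0.5, de, en;q=0.8"))
# [('de', 1.0), ('en', 0.8), ('fr', 0.5)]
```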

------
julien_c
Slightly off topic, but this is the first post I've read on a Ghost-powered
blog – I think it looks great.

~~~
nly
Plain black text on a white background is great now?

~~~
noblethrasher
Yes.

~~~
julien_c
There's more to Web content than the colour of its fonts.

