The long and sordid history of the user-agent HTTP header field (webaim.org)
195 points by damian2000 on July 15, 2012 | 46 comments



...and everyone pretended to be everyone else, and confusion abounded.

And then JavaScript-driven feature detection came to be, and everyone thought it was a good idea. And the people wrung their hands and wept.


Not really, because it won't allow you to serve an MS-compatible CSS page.


IE conditional comments solve that problem.


What's wrong with it? Unlike user agent sniffing, it works properly.


Oh? So how do I use it to detect which box model to use in my CSS?

How do I use it to detect whether I can use gradients?

And if you're just thinking about JavaScript, I've run into a few text manipulation functions that always returned "not implemented". But hey, sniffer says they're there!

edit: Oh, and there's still plenty of html fun, too. How does your browser support <video>? <thead>? <svg>? I'm guessing the answer starts with "javascript".


I think the GP was referring to weeping with joy.


People don't wring their hands with joy.


They wrung their hands of user agent sniffing and wept for joy.


Oh, in that case, carry on!


The user-agent has two implied pieces of functionality:

1) Describe the device that the agent is coming from (operating system)

2) Describe the capabilities of the agent (this browser, those plugins)

One of the things I loathe about the user agent header is the lack of reasonable maximum length, and the inconsistent way in which developers have overloaded the value. Parsing it is difficult (especially given that the length means there is a lot of scope for bad input).

I would love to see user agent be a virtual header comprised of other headers.

The other headers would not be mandatory, but as most browsers would provide them you could reasonably use them in most cases.

These other headers may be things like:

  os: Windows
  os-version: 7
  client: Gecko
  client-version: 16
  plugins: [{'flash':11}]

Basically... same info but more structure with known acceptable types for certain values.

Since headers take up uncompressed space, it would also be helpful if shorthand names were accepted: c-v for client-version, etc.

This is me thinking aloud, and perhaps it's an idea that has been thought of before and rejected... but by offering User-Agent as a virtual header that is composed of all of the other headers you maintain some backward compatibility whilst providing something easier to parse, use and trust for developers.
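To make the "virtual header" idea concrete, here is a minimal sketch of how a server might compose a legacy-style User-Agent value from the structured headers listed above. The header names (os, os-version, client, client-version) are the hypothetical ones from this comment and are not part of HTTP; the output format and the function name `virtual_user_agent` are assumptions made for illustration.

```python
from typing import Mapping


def virtual_user_agent(headers: Mapping[str, str]) -> str:
    """Compose a User-Agent-like string from optional structured headers.

    The header names are the hypothetical ones proposed in the comment;
    none of them are real HTTP headers.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {name.lower(): value.strip() for name, value in headers.items()}

    parts = []
    if "client" in h:
        client = h["client"]
        if "client-version" in h:
            client += "/" + h["client-version"]
        parts.append(client)
    if "os" in h:
        os_part = h["os"]
        if "os-version" in h:
            os_part += " " + h["os-version"]
        parts.append("(" + os_part + ")")

    # Every header is optional, so an empty result is possible.
    return " ".join(parts)


example = {
    "os": "Windows",
    "os-version": "7",
    "client": "Gecko",
    "client-version": "16",
}
print(virtual_user_agent(example))  # Gecko/16 (Windows 7)
```

Because each header is optional, the composed value simply gets shorter when a client omits something, rather than failing.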


The problem with "fixing" the user-agent string is that making it easier to parse/use only means web-developers will find it easier to continue to abuse it.

In an ideal world web-developers should be testing if individual pieces of functionality exist rather than inferring what is supported based on the browser.

I think JS does a fairly good job of allowing developers to test for functionality; unfortunately, CSS does not. I am well aware that it is meant to "fail gracefully", but a lot of developers want to supply alternative looks where functionality isn't available, and CSS doesn't lend itself to that.

So you wind up inferring CSS support from JS support which is just as broken as inferring JS support from the browser's version/name/platform.


The user agent string is too loaded with backwards compatibility to remove or change. So the next best thing to do is supersede it - add a new agent-id or some such which is mandated by standard to be in the form "BrowserName/Version", e.g. "Chrome/22.0" or "Firefox/15.0.1", while keeping the old user-agent. Problem is I guess it's not really worth it - it doesn't expose any new information not already in the user agent, and it doesn't stop site authors relying on specific agent-ids. So I guess the way forward is try to ignore the user agent completely and just use feature detection.
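For illustration, a strict "BrowserName/Version" value would be trivial to parse. The sketch below assumes a grammar of a single alphanumeric name, a slash, and a dotted numeric version; that grammar, the `agent-id` header itself, and the names `AgentId` and `parse_agent_id` are hypothetical and not taken from any standard.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed grammar: one alphanumeric name, a slash, dot-separated integers.
AGENT_ID = re.compile(r"([A-Za-z][A-Za-z0-9]*)/(\d+(?:\.\d+)*)")


class AgentId(NamedTuple):
    name: str
    version: Tuple[int, ...]


def parse_agent_id(value: str) -> Optional[AgentId]:
    """Parse a hypothetical agent-id value; return None if it is malformed."""
    match = AGENT_ID.fullmatch(value.strip())
    if match is None:
        return None
    version = tuple(int(part) for part in match.group(2).split("."))
    return AgentId(match.group(1), version)


print(parse_agent_id("Chrome/22.0"))
print(parse_agent_id("Firefox/15.0.1"))
# A legacy user-agent string does not fit the strict form and is rejected.
print(parse_agent_id("Mozilla/5.0 (Windows NT 6.1)"))
```

Returning None for anything that does not match keeps malformed input from being half-parsed, which is one of the complaints about the existing string.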


Aren't there many uses of user agent strings outside of javascript/css usage? E.g. rate limiting, visitor stats etc


I enjoy sites which provide me with the correct download link for software, based on the fact Firefox places "Linux x86_64" in my User-Agent.

Having said that, the flip side is just as bad, when you get completely rejected from sites because "We haven't tested this site for your browser or operating system". It's a website for crying out loud. I don't get too mad at sites which implement this as long as they provide me a way to continue "at my own risk". However it's the final straw when they flat out refuse to serve me anything other than a page telling me they haven't tested the website for my browser/OS combination... sigh

Edit: This might give some food for thought for AshleysBrain. I like your suggestion, but am curious if we can find a way to send OS and architecture to sites so that they can give me nice download links...
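A rough sketch of the friendly version of this behaviour: use the user-agent only as a hint for a default download, and fall back to the full list instead of refusing service. The file names and the function name `pick_download` are made up for the example, and the substring checks are a best-effort guess, not a reliable platform detector.

```python
def pick_download(user_agent: str) -> str:
    """Suggest a default download based on user-agent substrings.

    File names are hypothetical. Unknown agents get the full download
    page rather than an error.
    """
    ua = user_agent.lower()
    if "linux" in ua:
        if "x86_64" in ua:
            return "app-linux-x86_64.tar.gz"
        return "app-linux-i686.tar.gz"
    if "windows" in ua:
        return "app-windows-setup.exe"
    if "mac os x" in ua:
        return "app-macos.dmg"
    # Never block the visitor: show every available download instead.
    return "downloads.html"


firefox_linux = "Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20100101 Firefox/13.0"
print(pick_download(firefox_linux))     # app-linux-x86_64.tar.gz
print(pick_download("SomeBrowser/1.0")) # downloads.html
```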


A lot of that functionality can be implemented using JS rather than the user-agent string.


Pretty much just visitor stats (not that those aren't important!)

It doesn't make sense to use User Agent string for anything that someone might have a reason to game since it's so trivial to change them.


I wonder why browsers with a modern automatic-update process don't set their user-agent to something that discards all this madness ("Chrome/23.4.5678 (Windows)", or similar) for the cutting-edge/nightly builds only (or even betas, if they wanted to discourage casual users from switching to them, but I don't think that's the case at this point). Surely their users have signed up for a little breakage in exchange for the latest features? And if they actually get website operators to stop or at least fix their sniffing, the whole prisoners-dilemma situation would disappear.

(I guess this assumes that the huge user-agent that my Chrome is currently sending is necessarily bad, and in the real world maybe no one really cares...)


When we actually do this, it does not necessarily convince publishers to fix things. For example: for several months Mozilla has been testing one tiny reduction to the User-Agent string in Firefox Nightly builds (replacing "Gecko/20100101" with "Gecko/16.0"). Zillow.com is the highest-profile site that is broken by this change, and after five months they still haven't even responded to any of our attempts to contact them: http://bugzil.la/754680

It's much better to resist adding things to the UA in the first place, since removing anything later on is a huge pain and inevitably breaks things for users. Mozilla has managed to keep the UA relatively minimal (and successfully reduced it a bit in Firefox 4): https://developer.mozilla.org/en/Gecko_user_agent_string_ref...


I forwarded this on to one of my zillow friends (I used to work there) -- he'll look into it.


Thanks! It looks like it worked. I had even tried contacting @zillow on Twitter, but I hadn't quite reached the point of emailing random engineers. :)


Interesting! I wasn't sure if anything like this had been done. I guess it shows that there's nothing good that can come from change here.


You're implying that if nightly builds of a browser with a simplified UA broke a website that the website owners would fix their code, but that is unlikely to happen. Most websites, particularly the sort with bad UA sniffing, have a high cost to change (engineering, QA, making releases) and no incentive ("it broke on the new Chrome, probably a Chrome bug").

The two instances of UA spoofing I know of in Chrome are for large sites -- hotmail and Yahoo mail. My vague memory of the hotmail case is that Microsoft agreed to fix their code but said it'd take months to make the push. (http://neugierig.org/software/chromium/notes/2009/02/user-ag... , http://neugierig.org/software/chromium/notes/2009/02/user-ag... )

Even a relatively flexible company like Google gets UA sniffing wrong for many of its domains. At one point (as an author of Chrome and an employee of Google) I tried to track down the right people to get things fixed and ran into more or less the above problems. (The non-Chrome non-Safari webkit browsers these days must spoof Chrome to not fall into some "other" browser bucket.)


Ah, the pain of working with the development process of websites. I still remember the Hotmail globalStorage fiasco that led to Firefox 13.0.1 putting it back temporarily: https://bugzilla.mozilla.org/show_bug.cgi?id=736731


1. I think the most interesting thing about that blog post is that it illustrates how the incentives in standards building get warped. I like to describe this sort of thing as "the effect of economics on programming" - not because there is money involved, but because of the nature of the incentives.

2. Graceful degradation. We've sniffed UAs from the minute they were invented. Any change whatsoever would create untold problems for untold millions of people. The UA is just an arbitrary string so… who cares? Very few people (you and I are amongst these "very few") have to be concerned with this compared to the people such a change would affect.

It's because of 1 and 2 (my second point is really an instance of the first) that we're stuck with Javascript. No one in their right mind thinks it's a good language, but getting all the different browser vendors to adopt a good bytecode would be nightmarish (and not necessarily in the interest of every browser vendor).


I think JS is a good language, but I am probably not in my right mind, so touché.


Imagine the problems that would cause! You have Chrome 58 Beta, and stuff works one way. Then they say it's good and release Chrome 58 final, and all of a sudden, stuff changes all over the web.

UA string is just one example of unfortunate hacks that evolved in the web protocols. Compared to probably everything else in HTML it's probably just not even worth it to consider fixing it. We'll always need the old string for compatibility, so it's really only to save a few lines of parsing. Compared to the nightmare of parsing rules for HTTP and HTML, it's not even relevant.


Not only the user agent, either. Try `navigator.appName` in JavaScript in any browser, and you'll get "Netscape". `navigator.appCodeName` in most browsers returns "Mozilla".

Mike Taylor gave a talk about this and more at yesterday's GothamJS conference:

http://miketaylr.com/pres/gothamjs/shower/


I wanted to try making an HTTP request from Telnet the other day. I tried Wikipedia, using the Host header. I got a 403 for not including a user agent, so I tried again with User-Agent: Telnet and it worked!

It's one of the most important headers for clients, since if you don't include it you might not get a 200.


In the particular case of Wikipedia, I think they check User-Agent to prevent people from unthinkingly wasting gigabytes of bandwidth scraping Wikipedia via tools like wget. In Wikipedia's case, better ways exist to download large quantities of their content in a more usable form.


They may do that (though requesting a single article works fine), but it's not very smart. Throttling heavy users - possibly returning 429 with a link to the download pages - would make much more sense. It's not like wget users can't change their UA.
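As a sketch of what throttling by usage could look like, here is a small sliding-window limiter that answers 429 once a client exceeds its budget. How a client is identified (IP address, API key, and so on), the limits, and the class name `SlidingWindowLimiter` are all assumptions for the example; nothing here describes what Wikipedia actually does.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.hits = defaultdict(deque)  # client id -> recent request times

    def status_for(self, client_id):
        """Return the HTTP status to send: 200 if allowed, 429 if throttled."""
        now = self.clock()
        recent = self.hits[client_id]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return 429  # Too Many Requests
        recent.append(now)
        return 200


limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10)
print([limiter.status_for("203.0.113.7") for _ in range(3)])
```

The decision depends only on how often a client asks, so changing the user-agent string does not help a heavy scraper get around it.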


?

  bbot@magnesium:~> wget http://en.wikipedia.org/wiki/Japanese_yen
  --2012-07-15 13:54:29--  http://en.wikipedia.org/wiki/Japanese_yen
  Resolving en.wikipedia.org... 208.80.154.225, 2620:0:861:ed1a::1
  Connecting to en.wikipedia.org|208.80.154.225|:80... connected.
  HTTP request sent, awaiting response... 200 OK
  Length: 203481 (199K) [text/html]


wget uses the "wget/version" useragent.


Yes, I am aware. The point of my comment is that Wikipedia obviously does not block wget.


The point is that if it becomes a problem they'll just block that particular useragent.


The point is that you can use -U to specify arbitrary user-agent strings, and -E robots=off to ignore robots.txt.
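The same is true of pretty much any HTTP client, not just wget. A minimal sketch with Python's standard urllib, where the user-agent value is an arbitrary made-up string; the request object is only built here, nothing is sent.

```python
import urllib.request

# The client chooses whatever User-Agent value it likes.
req = urllib.request.Request(
    "http://en.wikipedia.org/wiki/Japanese_yen",
    headers={"User-Agent": "Mozilla/5.0 (made up by the client)"},
)

# urllib stores header names via str.capitalize(), so the lookup key
# is "User-agent". No network traffic happens in this example.
print(req.get_header("User-agent"))
```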

User-agent blocking is completely braindead. It does nothing at all. The fact that somebody in 2012 can possibly think it works is astounding to me.


I return a 403 if User-Agent or Host headers are missing. And my firewall will lock you out completely if you use "User-agent" instead of "User-Agent" (among many other obvious giveaways in the User-Agent header).
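The first rule is simple to sketch. The function below takes the raw header lines of a request and returns 403 when either header is absent; the function name and input shape are assumptions for the example. Note that it compares header names case-insensitively, as HTTP defines them, so the exact-casing firewall heuristic mentioned above is deliberately not reproduced here.

```python
def status_for_request(raw_header_lines):
    """Return 403 if the User-Agent or Host header is missing, else 200."""
    names = {
        line.split(":", 1)[0].strip().lower()
        for line in raw_header_lines
        if ":" in line
    }
    if "user-agent" not in names or "host" not in names:
        return 403
    return 200


print(status_for_request(["Host: example.org"]))                        # 403
print(status_for_request(["Host: example.org", "User-Agent: Telnet"]))  # 200
```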


Why?


I block anything that looks like penetration testing or content scraping if there's no chance of false positives. Even when there's no vulnerability present, it conserves resources on dynamically generated sites.


While I can appreciate that, why not block based on patterns of use rather than headers?


Educational and fun!

All this could have been avoided if Webmasters used <noframes>, but I'm not sure when it was added to HTML.


In fact, frames were introduced in Netscape 2, released after IE2.


This ugly quagmire makes me wary of compatibility fixes where mimicking another browser is somehow involved. When I heard about non-WebKit browsers adopting -webkit CSS vendor prefixes, the user agent string mess was the first thing that came to mind.


The problem with the user agent is that you can't fix it without repeating the same cycle. All you'd do is make it easier.

It's a good depiction of the issues you have with trying to write code once, and have it work the same in many different environments, though. It's just with browsers, rather than operating systems or hardware.

Such is the evolution of the internet.


That was interesting and fun to read, never thought about the User-Agent string and why it's so messed up.


I just read your article after having some Austrian wine and absinthe, and it was very funny


actually meant the article was really cool, but that's what i get - downvotes- for using my slim english skills while tipsy (:



