
Why Parse the User Agent? - cbr
http://www.jefftk.com/p/why-parse-the-user-agent
======
jgraham
Unfortunately, encouraging people to parse the UA header leads to them getting
it wrong in a way that would be amusing if it weren't so irritating.

For example, FirefoxOS has a UA string of the form "User-Agent: Mozilla/5.0
Mobile Gecko/28 Firefox/32.0". There's an example in Bugzilla of a major site
that not only depends on the fine details of that format, but will send the
mobile version of the site if and only if the number after Firefox is _even_
and in the range 18 <= x <= 34.
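
As a purely illustrative sketch (not the actual site's code), the kind of
check being described works out to something like:

    // Hypothetical reconstruction of the broken check described above.
    function wantsMobileSite(ua) {
      var match = /Firefox\/(\d+)/.exec(ua);
      if (!match) return false;
      var version = parseInt(match[1], 10);
      // mobile markup only when the version is even and between 18 and 34
      return version % 2 === 0 && version >= 18 && version <= 34;
    }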

Such examples are not one-offs either. In fact, it is so common for sites to
screw up here that browser vendors often resort to sending specially crafted
UA strings to certain sites so that their content becomes accessible. The most
extreme example of this was pre-Blink Opera, which had much better compat than
it was given credit for, but was often blocked (accidentally or intentionally)
from sites that effectively whitelisted certain browsers. As a result it used
an auto-updated list of sites for which it had to send an alternate UA string
[1], as well as a sophisticated system of JavaScript injection to work around
more complex bugs in these sites [2].

[1]
[https://github.com/operasoftware/browserjs/blob/master/deskt...](https://github.com/operasoftware/browserjs/blob/master/desktop/spoofs.xml)
[2]
[https://github.com/operasoftware/browserjs/blob/master/deskt...](https://github.com/operasoftware/browserjs/blob/master/desktop/browserjs-12.50.js)

~~~
markbnj
There's a whole class of old ASP sites that break if you take "like Gecko" out
of the UA. It would be funny if it hadn't made me cry so often.

~~~
drdaeman
Why not just break them?

~~~
vertex-four
Because then people use a competing browser that doesn't break the sites they
use.

~~~
drdaeman
I seriously doubt it. Backwards-incompatible changes (mostly deprecating old
kludges) have happened many times, and I don't think any of them caused a
serious migration of users.

I'm really not sure the cons of User-Agent cleanup (or complete deprecation)
outweigh the pros. At the very least, I think it's not obvious and is
debatable.

~~~
cbr
What are the pros? Smaller requests are only going to have a very small effect
on speed or bytes used.

~~~
drdaeman
Smaller requests, right.

Also, some privacy improvement from having less data available for browser
fingerprinting.

Also, if we consider dropping the User-Agent header altogether, there are
fewer ways to detect the UA, so it's a bit harder to show "sorry, your browser
is unsupported, this site is IE^H^H Chrome-only".

Obviously, the UA header alone isn't even remotely sufficient for any of those
goals, but it would be a good start.

------
anon1385
I do find it slightly strange that saving a few bytes from the size of an
image is suddenly of such pressing importance that we need to resort to all
these hacks.

JPEG2000 has been available in some browsers for years and years now, but I've
yet to see any site selectively sending it to supporting browsers to save
bandwidth, or anybody advocating for doing that. JPEG2000 first became
available in browsers 10+ years ago when connection speeds were very much
slower than today, so the benefit of using it was even greater back then, yet
it saw very limited adoption.

Also, I note that the saving from using WebP compared to the optimised JPEG is
only 18kb, while the page wastes 40kb of bandwidth on a Google Analytics
script. What Google gives with one hand it takes away with the other.

~~~
takeda
Also, the appropriate way for a browser to declare what it supports is the
Accept HTTP header.
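
For example, a browser that can decode WebP advertises it in its image
requests roughly like this (the exact value varies by browser and version):

    Accept: image/webp,*/*;q=0.8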

------
HeavenFox
I'm working on a single-page application that heavily utilizes IndexedDB.
While feature detection can tell you it's present, it tells you nothing about
the quality of the implementation. For example, in Safari, inserting ~9k rows
takes 10 minutes, and IE doesn't support compound key paths. While it's
possible to write custom logic to detect the latter, it's impossible to detect
the former cheaply and reliably. I guess I have to resort to UA sniffing.
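
For the compound key path case, a rough sketch of that custom detection
(database and store names here are placeholders) is to try creating a store
with an array keyPath and see whether it throws:

    // Rough sketch: probe for compound (array) keyPath support.
    var probe = indexedDB.open('keypath-probe', 1);
    probe.onupgradeneeded = function (e) {
      var db = e.target.result;
      try {
        db.createObjectStore('test', { keyPath: ['a', 'b'] });
        window.supportsCompoundKeyPath = true;
      } catch (err) {
        window.supportsCompoundKeyPath = false; // e.g. IE lands here
      }
    };

There's no comparable probe for "inserting ~9k rows takes 10 minutes", which
is the problem.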

------
pornel
The whole thing isn't worth as much as it seems.

In this case PageSpeed generated the JPEG at significantly higher quality than
the WebP, so most of the file size difference is due to the quality difference,
not codec compression efficiency (distortion measured with DSSIM is 0.009 for
the JPEG and 0.015 for the WebP).

If you make the comparison fair by creating a JPEG (using mozjpeg) at the same
quality as the WebP, it's 22539 vs 20158 bytes (an 11% saving).

~~~
cbr
We set default quality parameters for PageSpeed with the goal of producing
equal quality images between WebP and JPEG, on average (quality 80 for WebP,
85 for JPEG), but you're right that in this case the WebP image is more
distorted as measured by SSIM. Sorry, and thanks for catching this!

Testing now with this image, to bring the DSSIM-measured distortion on the
JPEG up to 0.015 [1] I need to encode it at quality=69, which gets me a file
size of 24317 bytes [2] compared to the 20158 [3] I had for the WebP. That's a
17.1% improvement for WebP over JPEG, as opposed to the 47.7% improvement I
found before. This is all with libjpeg-turbo as included by PageSpeed 1.9.

Running mozjpeg's jpegtran with no arguments, built at commit f46c787, on this
image, I get 23870 bytes with no change to the SSIM [4]. That's a 15.6%
WebP-over-JPEG improvement.

It looks like:

1) We should run some timing tests on the mozjpeg encoder, and if it's in the
same range as the WebP encoder, or not too much worse, switch PageSpeed from
libjpeg-turbo to mozjpeg.

2) We should check that quality-80 with WebP is right for getting levels of
distortion similar to quality-85 with JPEG. Is this image just a poor case for
WebP, or is it typical and something's wrong with our defaults?

[1] Technically, 0.014956 with JPEG compared to 0.015060 for the WebP.

[2] [http://www.jefftk.com/kitten-demo--ssim15--pagespeed.jpg?PageSpeed=off](http://www.jefftk.com/kitten-demo--ssim15--pagespeed.jpg?PageSpeed=off)

[3] [http://www.jefftk.com/kitten-demo--ssim15--pagespeed.webp?PageSpeed=off](http://www.jefftk.com/kitten-demo--ssim15--pagespeed.webp?PageSpeed=off)

[4] [http://www.jefftk.com/kitten-demo--ssim15--mozjpeg.jpg?PageSpeed=off](http://www.jefftk.com/kitten-demo--ssim15--mozjpeg.jpg?PageSpeed=off)

~~~
pornel
Thanks for looking into this. I'm aware that choosing the right quality level
is a hard problem.

If you only ran mozjpeg's jpegtran on a file created with another JPEG
library, you won't get the benefit of trellis quantization. Try creating JPEGs
with mozjpeg's cjpeg (and -sample 2x2 to match WebP's limitation).

Here are the files I've been testing (one is same size, one is same quality
based on my DSSIM tool v0.5):
[http://i.imgur.com/O57WF19.jpg](http://i.imgur.com/O57WF19.jpg)
[http://i.imgur.com/ZBD6ioD.jpg](http://i.imgur.com/ZBD6ioD.jpg)

~~~
cbr
Is this the right set of arguments to be testing?

    $ mozjpeg/build/cjpeg \
        -quality $quality \
        -optimize \
        -progressive

~~~
pornel
I suggest adding -sample 2x2 to ensure chroma subsampling is enabled.

~~~
cbr
Will do.

Talking to some people here, they think your DSSIM tool [1] isn't what I
should use. Specifically, they said it runs blur and downscale steps that
aren't part of the SSIM metric. They suggested using Mehdi's C++ implementation
[2], which I understand yours is a rewrite of.

Presumably you think I should use your tool instead? What makes the (D)SSIM
numbers from yours a better match for human perception than those from
Mehdi's? Or should they be giving the same numbers?

[1] [https://pornel.net/dssim](https://pornel.net/dssim)

[2] [http://mehdi.rabah.free.fr/SSIM/](http://mehdi.rabah.free.fr/SSIM/)

~~~
pornel
I'm not sure if mine is better. I do fudge blurring with fast box blur
approximations rather than a proper gaussian.

I have two issues with Mehdi's implementation:

* It works on raw RGB data, which is a poor model for measuring perceptual difference (e.g. the black-to-green range is very sensitive, while green-to-cyan is almost indistinguishable, yet numerically the distances are the same in RGB). Some benchmarks solve that by testing grayscale only, but that allows encoders to cheat by encoding color as poorly as they want to.

* It's based on OpenCV, and when I tested it I found OpenCV didn't apply gamma correction. This makes a huge difference on images with noisy dark areas (and photos have plenty of underexposed areas). Maybe it's a matter of OpenCV version or settings; you can verify this by dumping `ssim_map` and seeing whether it shows a high difference in dark areas that look fine to you on a well-calibrated monitor.

I've tried to fix those issues by using a gamma-corrected Lab colorspace and
including a score from the color channels, but tested at lower resolution
(since the eye is much more sensitive to luminance).

However, when I tested my tool against the TID2008 database I got an overall
score lower than expected for SSIM (0.73 instead of 0.80), though still better
than most other tools they've tested.

------
discreditable
Author is going about this in completely the wrong way. The best way to
determine if you should deliver webp or jpeg is to look at the browser's
accept header [1]. In this header you will find a listing of what the browser
is willing to accept and the order of preference.

[1] [https://www.igvita.com/2013/05/01/deploying-webp-via-accept-content-negotiation/](https://www.igvita.com/2013/05/01/deploying-webp-via-accept-content-negotiation/)
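
As a minimal sketch of that negotiation (file names and port are placeholders,
and it ignores the caching issues discussed elsewhere in this thread):

    // Minimal Node.js sketch of Accept-based negotiation; not production code.
    var http = require('http');
    http.createServer(function (req, res) {
      var accept = req.headers['accept'] || '';
      var useWebp = accept.indexOf('image/webp') !== -1;
      res.setHeader('Vary', 'Accept'); // caches must key on the Accept header
      res.setHeader('Content-Type', useWebp ? 'image/webp' : 'image/jpeg');
      res.end('would stream ' + (useWebp ? 'kitten.webp' : 'kitten.jpg'));
    }).listen(8080);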

~~~
cdmckay
From the article:

"To emit html that references either a JPEG or a WebP depending on the
browser, you need some way that the server can tell whether the browser
supports WebP. Because this feature is so valuable, there is a standard way of
indicating support for it: include image/webp in the Accept header.
Unfortunately this doesn't quite work in practice. For example, Chrome v36 on
iOS broke support for WebP images outside of data:// urls but was still
sending Accept: image/webp. Similarly, Opera added image/webp to their Accept
header before they supported WebP lossless. And no one indicates in their
Accept whether they support animated WebP."

~~~
myhf
That's a good argument for the image host to sniff the UA string in the course
of content negotiation.

It's not a good argument for crafting html that bypasses content negotiation.

~~~
cbr
The problem with solving this in content negotiation is that breaks proxy
caches. If you serve uncachable html that references either jpg or webp then
you can serve those resources as "Cache-Control: public", and if you include a
content hash or versioning in the URL then you can send a long cache lifetime.
If you use the Accept or User-Agent headers to choose what image format to
send them you need to issue a Vary header, and basically all proxy caches will
treat that as uncachable.

There's some progress with "Vary: Accept" support, but "Vary: User-Agent" is
probably never going to be supported.
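
To make the difference concrete, here's an illustrative pair of responses
(URLs and lifetimes are made up):

    # Image referenced directly from the rewritten HTML; URL contains a content hash:
    Cache-Control: public, max-age=31536000

    # Same URL negotiated on Accept (or User-Agent):
    Cache-Control: public, max-age=31536000
    Vary: Accept   # most shared caches treat this as effectively uncachable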

------
spdustin
Asp.Net 2.0 (and up to 3.5, which is still framework 2.0) sites use UA
sniffing to determine which markup to emit for many built-in controls, like
Asp:Menu. When Yosemite / iOS 8 / Safari 8 was released, anyone with a
SharePoint 2007/2010 site had their navigation get all screwy. I know first
hand how infuriating someone's badly designed UA sniffing can be; it happened
to me, and I wrote up a fix for it [0].

Why did it happen in the first place? Where did the UA sniffing fall apart?

A poorly designed RegExp looking for "60" in Safari's UA without a word
boundary or EOL check. The new Safari UA includes "600" in the relevant
section, and suddenly SharePoint sites were sending markup intended for (now)
ancient browsers - browsers that weren't even on the compatibility list for
SharePoint in the first place.
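
Illustratively (this is not the actual SharePoint pattern), the failure mode
is the classic one:

    // Without a boundary, a check for "60" also matches "600":
    /AppleWebKit\/60/.test('AppleWebKit/600.1.17 ... Safari/600.1.17');   // true
    /AppleWebKit\/60\b/.test('AppleWebKit/600.1.17 ... Safari/600.1.17'); // false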

UA sniffing does need to go away for determining what structure your markup
sends to the user agent.

[0]: [http://blog.sharepointexperience.com/2014/10/did-safari-or-ios-8-break-your-sharepoint-2010-site/](http://blog.sharepointexperience.com/2014/10/did-safari-or-ios-8-break-your-sharepoint-2010-site/)

~~~
makomk
Due to using a moderately obscure browser, I've had problems with ASP.NET
sites just plain refusing to include any version of the JavaScript they
required for form submissions to work.

------
Animats
One of my systems uses the user agent "Sitetruth.com site rating system". This
breaks some Coyote Point load balancers. Their parsing of HTTP headers breaks
if the last letter of the header is "m". I had to stick an unnecessary
"Accept */*" header at the end to bypass that.

I suspect that one of their regular expressions has "\m" where they meant
"\n".

------
jimktrains2
Isn't the problem more that the browser can't be given multiple sources and
types and allowed to choose what it believes is best?

User-Agent sniffing really does need to die. Part of me wonders what use UA
headers serve beyond sniffing and analytics. EDIT: or the Accept (and Content-
Type) header, but that's a next-to-worthless header these days :-\

~~~
cbr
> Isn't the problem more that the browser can't be given multiple sources and
> types and allowed to choose what it believes is best?

How would this look? If you simply did:

    <img src-options="image.jpg
                      image.webp">

There's no way for the browser to know which one it wants without downloading
all of them, because extensions aren't special. But I guess we could extend
srcset to let you specify the content type:

    <img srcset="medium.jpg  1000w image/jpeg,
                 medium.webp 1000w image/webp,
                 large.jpg   2000w image/jpeg,
                 large.webp  2000w image/webp">

~~~
larzang
You just described the new "picture" element. It's an element with one or more
"source" child elements, each of which can have a media query or a content
type and a URI, as well as an img child element to use as a fallback in
non-supporting browsers.

Browser support isn't really there yet, but it's coming, and since it has a
built-in fallback you can use it now without requiring JS workarounds.
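
For the format-selection case above it would look roughly like this (file
names are placeholders):

    <picture>
      <source type="image/webp" srcset="kitten.webp">
      <img src="kitten.jpg" alt="kitten">
    </picture>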

~~~
acdha
It's under consideration in IE so now would be a good time to vote for it
since it's so much cleaner than srcset:

[https://wpdev.uservoice.com/forums/257854-internet-explorer-platform/suggestions/6261271-picture-element](https://wpdev.uservoice.com/forums/257854-internet-explorer-platform/suggestions/6261271-picture-element)

------
calineczka
I imagine that in the future, with HTTP/2, we could send the list of available
browser features as HTTP headers, such as "Animated-WebP: Supported". With
HTTP/1 the cost of sending a long list of available features would probably be
too big, but HTTP/2 could solve this: if I recall correctly, it only sends
diffs of headers between subsequent requests on the same connection. Maybe
that could be helpful for this class of problems.

~~~
jed_watson
I was just thinking a similar thing. If user agents have effectively evolved
into a way of saying "our browser supports standards X" (where X is an awkward
grouping based on which browser versions originally supported a given feature
or standard), surely the ultimate solution is to provide a header enumerating
support on a per-feature basis.

Obviously it would take a while for such a standard to be adopted to the point
where it was useful, but as a web developer I dream of a time when we receive
a header that lists the features the client supports (similar to the class
names Modernizr adds to the document body, and probably eventually versioned)
and no longer have to resort to either UA parsing or JavaScript-based feature
detection, both of which are hacks because this problem hasn't actually been
solved properly at the standards level.

------
jmarantz
This is a great debate. I've thought about this quite a bit lately, since the
PageSpeed team was contacted by Microsoft [1] about the conflict between
PageSpeed assuming that Android 4.0 can handle WebP and mobile IE masquerading
as Android, thus breaking on PageSpeed-enabled sites.

The debate will never end but it's nice to understand the perspectives.

PageSpeed, plus other WPO tools and complex sites, want the first view to be
fast and small. We measure our effectiveness on first-view speedup. The most
effective way to get this is by UA sniffing. We know about the other
mechanisms and use them too at times, but they don't produce results that are
as good for the metrics we care about. However, we are pretty serious about
doing a good job of robust UA parsing; we put more energy into getting this
right than a web developer should be expected to expend. The downside (as
Microsoft
points out) is that when we have to make an update we can't push it to all our
users instantly. We should consider mechanisms to dynamically load an updated
UA-mapping from a server we control when PageSpeed starts up, but haven't
started such an effort.

Browser vendors have their own legitimate motivations. Microsoft IE11
developers don't want to be punished for the shortcomings of earlier versions
of IE, and want users of IE11 to get the best, most modern web sites possible,
not the IE6 version. Microsoft justifiably wants servers to send mobile
versions of sites to Windows Phones. I totally sympathize with their
perspective. Chrome was in the same boat when it first launched so it has all
kinds of masquerading in its UA and unfortunately still does. Same story,
different decade.

[1]
[https://code.google.com/p/modpagespeed/issues/detail?id=978](https://code.google.com/p/modpagespeed/issues/detail?id=978)

------
drivingmenuts
Yay! Yet another image format that I can't open in Preview, if I save it to my
hard drive.

Webp may be great for displaying in a web page, but it's TFU if I save it to
my hard drive and try to view it with default viewing or editing software.

It may not be a proprietary format, but it might as well be.

~~~
timothyb89
It hardly seems fair to reject the format simply because Apple hasn't added
support for it to their image viewer. At least here my desktop environment
happily displayed the saved image with its built-in image viewer (Gwenview).

I'll admit that support for the format isn't perfect, but at least in your
case it can be remedied pretty easily with a quick search [1]. That said,
format adoption has to start somewhere. It'd be pretty sad if we were still
using GIFs for all of our lossless images because PNGs were never allowed to
catch on before they could become universally supported.

[1]: [https://github.com/dchest/webp-quicklook](https://github.com/dchest/webp-quicklook)

~~~
zanny
That's odd, I have Gwenview 4.14.2 and Qt 4.8.6, and webp doesn't display
properly. Rekonq can't open it either, but KolourPaint can and Krita cannot.

Is webp support added in Qt 5?

------
dps
Isn't this what Accept: image/webp is for?

------
crabasa
It seems, for many of the reasons stated below, that feature detection in
JavaScript will never be a sufficient solution. But clearly asking web
developers to manually parse and interpret UA strings is asking for trouble.

Does anyone know if there is an authoritative database of user agent strings,
their associated browsers, and the features of those browsers?

------
jrochkind1
I agree this is a decent solution, and the one the OP compares it to is not.

But I would have JavaScript do the feature detection and set a cookie based on
what it finds. Then, for all but the first page, the server can look at the
cookie instead of the User-Agent string. The server only needs to sniff the
User-Agent string if it doesn't (yet) have a cookie.
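
A minimal sketch of the client side, assuming a quick canvas-based WebP probe
(the cookie name is arbitrary):

    // Set a cookie the server can read on every subsequent request.
    function detectWebp() {
      var canvas = document.createElement('canvas');
      if (!canvas.getContext || !canvas.getContext('2d')) return false;
      return canvas.toDataURL('image/webp').indexOf('data:image/webp') === 0;
    }
    document.cookie = 'webp=' + (detectWebp() ? '1' : '0') + '; path=/';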

------
RubyPinch
[https://hacks.mozilla.org/2014/11/an-easier-way-of-using-polyfills/](https://hacks.mozilla.org/2014/11/an-easier-way-of-using-polyfills/)
also has a small bit on UA sniffing vs. feature detection.

------
cwbrandsma
My personal approach: use feature detection for the first pass, and User-Agent
sniffing for the cases where it doesn't work. And by the way, it doesn't
always work, and some "features" cannot be detected.

I've spent a lot of time optimizing pages for video. Some browsers work just
fine, some work but look like crap, so you use a flash/silverlight plugin,
then a new version comes out and looks less crappy.

I also have a lot of code I turn off just for IE7...the code works, but
doesn't render properly. There is no way to detect "renders ugly"

So, "YES", feature detection is the future...but it isn't quite there yet
(about 98%).

------
nly
Mr Kaufman, JavaScript is like violence: If it isn't solving your problems,
then you're not using enough of it.

[http://libwebpjs.hohenlimburg.org/](http://libwebpjs.hohenlimburg.org/)

------
jgalt212
I have yet to see a solution that provides acceptable levels of client data
and accuracy while completely avoiding parsing the User-Agent string. Feature
detection alone won't cut it, and the converse is true as well.

So for the time being, the most effective method to optimize your website for
the widest variety of clients is to employ both User-Agent parsing (server
side) and feature detection (client side).

------
gkoberger
The author dismisses feature detection as being "slow", but it's _much_ less
error prone:

[http://www.stucox.com/blog/using-webp-with-modernizr/](http://www.stucox.com/blog/using-webp-with-modernizr/)

You can just cache the result in localStorage; you take a few-millisecond hit
once.
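
Something along those lines, assuming Modernizr 3's `Modernizr.on` callback
for its async webp detect (the localStorage key is arbitrary):

    // Cache the async Modernizr result so only the first page view pays for it.
    var cached = localStorage.getItem('supports-webp');
    if (cached === null) {
      Modernizr.on('webp', function (result) {
        localStorage.setItem('supports-webp', result ? '1' : '0');
      });
    }
    // Later, cached === '1' means it's safe to request .webp variants.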

Here's how the feature detection is done:

[https://github.com/Modernizr/Modernizr/blob/924c7611c170ef2dc502582e5079507aff61e388/feature-detects/img/webp.js](https://github.com/Modernizr/Modernizr/blob/924c7611c170ef2dc502582e5079507aff61e388/feature-detects/img/webp.js)

__HOWEVER__, in this case (webp), it's likely better to detect it on the
server:

[https://github.com/igrigorik/webp-detect](https://github.com/igrigorik/webp-detect)
(More info: [http://www.stucox.com/blog/client-side-vs-server-side-detection-for-webp/](http://www.stucox.com/blog/client-side-vs-server-side-detection-for-webp/))

The only time you _really_ should be looking at the User Agent is if you're
trying to do something like show a "Download for (Chrome|Firefox|etc)" button.

~~~
sauere
> The only time you really should be looking at the User Agent is if you're
> trying to do something like show a "Download for (Chrome|Firefox|etc)"
> button.

Amen.

~~~
tracker1
I'm currently working on a site that uses react/flux with some use of media
queries and size detection for rendering... effectively the phone and portrait
tablet displays will be similar, and the desktop and large displays will also
be similar... I'd prefer to be able to break this into two camps for the
initial html sent to the browser (via server-side rendering). That way less is
sent to the phone/tablet browsers to render initially, and there's less
jarring change as other features load after the initial display.

The only way to do this is browser sniffing, although minimal: for the most
part, if "phone" or "mobile" appear in the UA string I can send the lighter
phone version; otherwise they initially get the desktop view. It isn't
perfect, and to be honest, I'm doing more detection than just that.

There are also issues with where and when I can cache rendered output, and
when and where I can cache certain results. For example, for a given search I
cache the first 10k record ids so that paging becomes very fast, with the
results limited to that set. Other queries will also be cached.

It really depends on what you are trying to do... growing from a few thousand
users to millions takes some effort, planning, and thoughtfulness that isn't
always the same.

The way I would design an app for a few hundred internal users isn't the same
as for a public site, which isn't the same as for a very busy public site. You
have to make trade-offs or collapse.

Why should a phone user on a strained 3G connection get the content meant for
a desktop browser? By the same token, I _HATE_ when I get a search result for
a page on my phone and that site redirects me to m.site.com/ (no path after
/)... I don't just design/develop web-based applications, I'm also a user...

------
IvyMike
I always wonder why the first browser to misidentify itself as something else
wasn't sued.

I mean, I sure couldn't slap an "Internet Explorer" or "Netscape Navigator"
splash screen on my shareware, but that's what they're doing in the User-Agent
field.

~~~
marvy
Lying to people is not allowed, because you're tricking them into paying you
based on false data. Lying to programs is allowed, because you're tricking
them into working properly when they would otherwise fail. The first browser
to do this had the same motivation as modern browsers do.

------
dpweb
The fact that UA is so easily spoofed makes it a really poor choice.

Client side JS for feature detection can work well, I have a simple example in
the code of [http://http-echo.com](http://http-echo.com)

~~~
cbr
In practice the UA isn't usually spoofed, and when it is, it's by people
who've thought through the consequences and are in a position to deal with
potential breakage.

Client-side feature detection can also be spoofed, but in practice that's not
a big problem either.

------
LocalPCGuy
On any decently sized site where you expect a user to visit multiple pages,
you would load the optimized JPG on the first load, but also detect whether
WEBP is supported, and if so, set a flag that will load WEBP for any
additional loads.

