First of all, good job on the project. I do not mean to be too negative, but this:
> The site updates weekly with data sourced from the server access logs of another site I run in order to give an accurate picture of the devices and browsers being used on the web.
If you only source data from one website, it is not an accurate picture by any means. Consider that other websites have different user bases. Your Firefox usage seems way too high, for example; it's likely your other website is largely used by other technical people.
AFAICT they are not trying to provide current browser usage stats. They are only after a list of popular browsers, which is only a subset of the former.
The last paragraph about scraping seems to indicate that.
So they only need what is a reasonable UA as of this week. They don't need what is the most popular one.
I would accept this argument if the sample was unbiased but noisy. In this case it's extremely biased but (potentially) low in noise.
If people from Uganda aren't part of the target audience of this site, we won't get Ugandan user agents even if they happen to be a fair chunk of web users worldwide (certainly more than in my small but high-tech country.)
I think it's worse than that. Based on quantization of the percentage field [0], I think the source data has to be about 925 requests, and it includes all browsers seen at least twice.
[0] I.e. look at how many browsers are bunched at 0.22, 0.32, 0.43, 0.54, 0.65 with no entries between.
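For anyone who wants to check the arithmetic, here is a quick sketch of the estimate (the listed values are just the bunched ones quoted above, not the full dataset):

```python
# If the published percentages are counts/N rounded to two decimals, the
# average gap between adjacent bunched values approximates 100/N.
observed = [0.22, 0.32, 0.43, 0.54, 0.65]  # the bunched percentages noted in [0]

avg_step = (observed[-1] - observed[0]) / (len(observed) - 1)  # ~0.1075 %
print(round(100 / avg_step))  # ~930, i.e. on the order of the ~925 guessed above

# Sanity check: with N = 925, counts of 2 through 6 round to exactly those values
print([round(100 * k / 925, 2) for k in range(2, 7)])  # [0.22, 0.32, 0.43, 0.54, 0.65]
```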
Thanks! And yep, fair comment, and I had noticed this as well even more so in last week's list. I have been thinking about how I could adjust the numbers in some way to counteract this or add another data source.
it's kinda weird how high Windows Firefox is on the list yet Firefox on mac doesn't make it in the top 50 at all... Do mac users just not use Firefox??
On Mac I can use Orion so I don't use Firefox that much.
Why Orion?
1. It supports tree style tabs (note lower case, I mean the concept, not the extension) natively. (I think you can also use TST, the Firefox extension, as Orion strives to be compatible with Firefox and Chrome extensions).
2. It is not based on Chrome/Chromium, but rather on the built in browser engine on Macs.
That said I don't know if it shows up as Safari or if it shows up as its own.
Can we please just freeze the user-agent string all browsers use? (I'd prefer if we could remove it, but we all know that isn't ever happening due to all the old websites.) It's silly that it's almost 2023 and web browsers are still sending legacy-infested nonsense like "(KHTML, like Gecko)" and websites actually change behavior based on it! We have proper APIs for feature detection nowadays, websites shouldn't need to change behavior based on the browser anymore.
Browsers still have weird behavior cases that aren't covered by feature detection. Here's an example from this year where I needed to change behavior based on the browser.
We have a single-page app with lots of dynamically created elements. We discovered a case where Chrome kept input focus on an element that was deleted and recreated, while Firefox didn't, resulting in different behavior if the user pressed Enter next after clicking on that element. (I don't know which of those is the bug, if either. We wanted the Firefox behavior so I hacked in a .blur() in the event handler for Chrome.)
But that doesn't require you to check the browser. You can force the same behaviour in every case, as you did with .blur() Neither feature detection nor user agent string matter here.
To me, that just sounds like a pretty good argument to improve feature detection. Freezing user-agents and improving feature detection aren't mutually exclusive.
On the one hand, I absolutely agree, UA-dependent behaviour is generally a huge PITA. Some sites even refuse to return content if you don't set a UA header in the request, WTF is with that? Last time I checked, only the Host header was absolutely required.
But on the other hand, there are certainly some cases where you want to know the actual browser version to work around critical bugs: e.g., Safari 13 (I think?) had broken their implementation of WebSocket compression for about a year: the byte stream becomes garbled after about 64 KiB of data transmitted, and the WebSocket connection breaks down. How do you work around that without the UA string? (A rough sketch of the UA-based workaround follows this comment.)
And on the third hand, we may have proper APIs for feature detection nowadays, but how many webdevs have actually heard of them? And how many are willing to update their legacy codebases? I personally haven't, although I'm not a webdev by any stretch. And when one tries to google how to e.g. support some feature in different browsers, there are lots of older pages/resources/answers telling how to do it with the contents of the UA header.
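On the Safari WebSocket example: the usual workaround is blunt and server-side, i.e. sniff the UA and simply stop offering permessage-deflate to the affected Safari builds. A minimal sketch, assuming the broken range is "Safari 13.x" (that exact range is an illustration, not a vetted list of affected releases):

```python
import re

# Safari's UA carries a "Version/x.y" token that Chrome-style UAs generally
# lack, so matching on it keeps false positives unlikely.
BROKEN_SAFARI = re.compile(r"Version/13\.\d+(\.\d+)? .*Safari/")

def allow_permessage_deflate(user_agent: str) -> bool:
    """Return False when the client looks like a Safari build with the broken
    WebSocket compression, so the server never negotiates the extension."""
    return not BROKEN_SAFARI.search(user_agent or "")
```

The boolean then feeds into whatever WebSocket server library you use when deciding which extensions to offer during the handshake.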
Firefox and Chrome froze the macOS version in their User-Agent strings, too, because jumping from “Mac OS X 10.15” to “Mac OS X 11.0” broke quite a few websites’ UA parsing code that never expected a value other than 10.
literally this. user agents are an obsolete feature (i considered them obsolete around 10-15 years ago) and whenever a new protocol includes one you already know it's made by backseat networking "engineers". all user agents do is break more shit because people think they're smart or clever for modulating their machine behavior based on them (blocking, working around broken standards, etc)
"there exists a use case that wouldn't be possible to implement without user agents" is not an argument. it's not philosophically valid. its dead obvious that you do not consider the whole picture if you make this argument. why does that 0.1% use case matter? especially given that it's not officially supported. especially given that for the last 10 years, the likes of google have shoved everything _straight_ into the web specs when they started considering it an official supported use case? if you are doing something that's not an official supported use case, why do we need to have hacks to support it? obviously the least ecologically harmful solution is for your use case to not be supported. of course we are talking web here, where scope creep is infinite and the protocols do nothing well instead of one thing well.
this is the very problem with the web, that "it doesn't have a concrete use case maaaaan", "i'll just add and take what i want as i go", "the web is just the web maaaan you need to think on my wavelength to get it". these people embedded in the web actually unironically just make shit up as they go along. they literally operate on an oscillating wave where today feature set 1 is good, tomorrow feature set 2 is good, and the day after, (what is essentially) feature set 1 is good again. arguing to have a user agent string is the same idea. this is in strict contrast to a well designed product that actually solves a specific problem, like Standard ML, or a good SQL implementation (disregarding the specification conundrum), or JSON, or VGA, or TCP.
What’s so hard to understand about Edge pretending to be Chrome pretending to be Safari, using Blink pretending to be WebKit pretending to be KHTML pretending to be Gecko, all pretending to be Netscape Navigator? I’m the dude playing the dude disguised as another dude.
The OS column comes off as a bit deceptive, because Windows 11 still presents as Windows NT 10.0 and newer versions of macOS still present themselves as "Intel Mac OS X 10_15_7", even if you're on an ARM chip...
If you're parsing this info into the OS column, you should probably display it with at least a note, or something like "Windows 10/11" or "macOS 10.15 or newer".
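Something along these lines would probably do it (the token checks are illustrative, not exhaustive):

```python
def display_os(ua: str) -> str:
    """Map the raw OS token from a UA string to a more honest display label."""
    if "Windows NT 10.0" in ua:
        # Windows 11 still reports NT 10.0, so the UA alone can't tell them apart
        return "Windows 10/11"
    if "Intel Mac OS X 10_15_7" in ua or "Intel Mac OS X 10.15" in ua:
        # Modern browsers freeze the macOS version at 10.15.7, even on ARM Macs
        return "macOS 10.15 or newer"
    if "Android" in ua:
        return "Android"
    if "iPhone OS" in ua or "iPad; CPU OS" in ua:
        return "iOS/iPadOS"
    return "Other"
```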
I was going to say, having read the chart: a little surprising that Windows 11 isn't on here somewhere, wonder if it just advertises as 10.0 still. And apparently it does.
I wonder if we can get to a default User-Agent string for a browser where just none of the information it contains is accurate. Lying to say you're "Mozilla/5.0" is ubiquitous, now we've got lying about the version of the OS you're on and lying about the architecture... the only stumbling block is that browsers pretty much all admit who they themselves actually are and their version somewhere. So we need to get a browser that's lying about those things, too.
It was still differentiated in the minor version number though, 6.0 was Vista, 6.1 was Windows 7, 6.2 was Windows 8 and 6.3 was Windows 8.1...
Windows 11 uses the same 10.0 as Windows 10; the difference shifted to the "build number", which feels a bit weird because now 10.0.19045 (Win10 22H2) was released a year later than 10.0.22000 (Win11 21H2)...
Brave's user agent string is identical to Chrome's by default, unlike some other Chromium-based browsers like Opera or Edge, which also list their own name.
it is so easy to spoof and really does not matter in usage except for desktop/mobile. there are so many analytics signals outside of that that it is indeed antiquated.
Always interesting to browse these lists. Figured I'd add this week's browser %s from a site I run to the thread for others interested in another source:
The rest are scrapers which I didn't bother to label. Google, Bing, Semrush, Ahrefs, Pinterest, Yandex, BLEX, Petal, MJ12Bot, Neeva, MetaJob..
They're probably overrepresented in this data set since I collected logs from about a dozen sites, but only a couple of them have any real popularity while all get scraped due to SEO.
I wonder how much of the Chrome is actually Firefox presenting itself as Chrome, because of the increasingly common practice of making sites that attempt to be locked to Chrome but are actually perfectly usable with Firefox if you just switch the UA.
Probably very very little... there just aren't that many Firefox users to begin with, and the number that are going to alter the User-Agent is a tiny subset.
On your second point, I use a mix of Chrome and Firefox... can't remember the last time I ran into something that didn't work on Firefox. Is mandating Chrome really particularly common?
I didn't know that, interesting! Definitely would affect one's experience of how common browser-locked sites are.
Probably not relevant here though, as they seem to be pulling their data from one site's traffic; if it were on that compatibility list you'd expect to see no Firefox users at all.
I agree with you, but it is sort of funny — most of the evidence for Firefox being very rare comes from querying user agents, right? If we were observing the computing universe from the outside, we might speculate that Firefox and spoofing are pretty widespread.
Sadly this kind of lockout is implemented client-side in JavaScript. Either by actively nerfing the FF performance (if ff then bad perf), or by using APIs known to be bad on Firefox. All of Google's websites are a perfect example of both actually (no proof needed, but I cannot believe it's only the second option).
I don't think Apple does, but I've noticed stuff like the Touch Bar and the fingerprint reader prompting to automatically sign in on certain sites only work on Safari. It makes sense that Safari would be better integrated with the OS. Not sure if it's a matter of Apple not making the API available to other browsers or them not taking advantage of it, though.
> Actual browsers decide to now just send the always accepted UA on any system to ensure the best experience
Case in point. An issue with Vivaldi browser was reported at the beginning of the year where google search/gmail/etc would show broken pages with links and buttons not working:
> Search for example shows the carriage, but does not accept user input.
> Gmail opens the page, but does not respond to any clicks to open emails.
> Image search doesn't show the images and pretends it's stuck "loading"
IIRC the issue was resolved on Vivaldi's side by changing UA to Chrome's for google services.
The copy/paste Python function looks nice, but why don't you publish this as a package on PyPI? That would allow for automatic updates through pip vs copying the function manually into my source code every n months.
Transparent license/copyright info is another advantage of a package.
"I really would like you to contrast that with what you have to do with HTML on the Internet. Think about it. HTML on the Internet has gone back to the dark ages because it presupposes that there should be a browser that should understand its formats. This has to be one of the worst ideas since MS-DOS."
This is because newer versions of macOS still present themselves as 10.15 (Catalina). A bunch of user agent parsers just assume macOS never goes beyond version 10, so to minimize breakage browsers have frozen the macOS version at "10_15_7". One tell is that if it's Safari 16 it can't actually be on Catalina. Here's a relevant WebKit issue: https://bugs.webkit.org/show_bug.cgi?id=216593
People who were okay with updating to 8 were probably okay to keep updating, while 7 holdouts often think it's the last good version. It reminds me of the Smash Bros. series, where the most commonly played games are always Melee and whatever the new one is. People who played Smash 4 pretty much all moved on to Ultimate, but people who like Melee keep playing it.
I'm going to start hitting this "API" once a week to update the UA of my selenium tool (which I call bullshit) that I use when I have to automate use of a business partner's website for lack of a proper API.
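Roughly this, in case anyone wants to do the same (the endpoint URL is a placeholder, not the site's documented API; I'm assuming it returns a plain JSON list of UA strings):

```python
import requests
from selenium import webdriver

UA_LIST_URL = "https://example.com/latest-user-agents.json"  # placeholder endpoint

def current_chrome_ua() -> str:
    # Fetch the current list and pick the first Chrome-on-Windows entry.
    uas = requests.get(UA_LIST_URL, timeout=10).json()
    return next(ua for ua in uas if "Chrome/" in ua and "Windows NT" in ua)

# Run this weekly (cron or similar) so the spoofed UA never goes stale.
options = webdriver.ChromeOptions()
options.add_argument(f"--user-agent={current_chrome_ua()}")
driver = webdriver.Chrome(options=options)
```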
I do! I tend to use it as my default Chromium browser because I find it faster/less annoying than Chrome in some aspects (tho over time that gap has closed) and find the dev tools to be a smidge better. In an ideal world we could all use Safari, but it isn’t reliable on too many websites I have to access for work.
Hm! It could just be a parsing error. The user agent in question is like this: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.41
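A naive "contains Chrome" check would file that UA under Chrome, since the Edg/ token has to be looked at before the generic Chrome/ and Safari/ ones. Toy illustration, not a real parser:

```python
ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.41")

def naive_browser(ua: str) -> str:
    # Checks Chrome first, so Edge and Opera end up lumped in with it
    if "Chrome/" in ua:
        return "Chrome"
    return "Safari" if "Safari/" in ua else "Other"

def better_browser(ua: str) -> str:
    # Check the more specific vendor tokens before the generic ones
    if "Edg/" in ua:
        return "Edge"
    if "OPR/" in ua:
        return "Opera"
    if "Chrome/" in ua:
        return "Chrome"
    return "Safari" if "Safari/" in ua else "Other"

print(naive_browser(ua), better_browser(ua))  # Chrome Edge
```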
Proud to see Linux + Firefox coming in as the 17th most common UA string! (I truly mean this non-sarcastically--I was expecting this combo to be #30 or even lower)
This list will be useful if it stays up-to-date. I find that on qutebrowser and sometimes IceCat that I've had to spoof my UA to get past the CloudFlare browser checks that are unfortunately on many sites these days (such as the GitLab login). Sometimes this workaround quits working and I suspect my old UA gets treated as more suspicious over time.
Suggestion: Include common headers other than User-Agent and their values in your API. If you want to use the data to act like the latest common browsers, some sites will look at things like Accept and Accept-Encoding to see if you look "normal".
The one thing that should be on here is the Sec-CH-UA (client hints) headers, because these are effectively replacing User-Agent headers and are increasingly necessary for Chrome.
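For anyone who wants a concrete picture, this is roughly the header set a "normal"-looking Chrome request carries today; the values are a snapshot for Chrome 108 and would ideally come from the same data source so they stay current:

```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,"
              "image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    # Client hint headers that are gradually replacing the classic User-Agent
    "Sec-CH-UA": '"Not?A_Brand";v="8", "Chromium";v="108", "Google Chrome";v="108"',
    "Sec-CH-UA-Mobile": "?0",
    "Sec-CH-UA-Platform": '"Windows"',
}

resp = requests.get("https://example.com/", headers=headers, timeout=10)
print(resp.status_code)
```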
I filter out any user agents that are invalid, but there's no way to see which are real or faked. The access logs include the user agent of every single site visitor, not only errors/bad actors.
> most scraping tasks require either desktop or mobile useragents and not both together
Why? Do some sites serve completely different content? Or is it simply markup differences? I have never done much scraping and I'd expect the viewport size to be the decisive factor these days, not the user agent. But, again, I don't know much about that.