We analyzed 425k favicons (iconmap.io)
550 points by gurgeous 44 days ago | 128 comments

I got mine down to 160 bytes with some pixel tweaking and converting it to a 16-color indexed PNG. It's not a lot of work or very difficult (I'm an idiot at graphics editing), but you do need to spend the (small amount of) effort. I embed it as a data URI and it's just four lines of (col-80 wrapped) base64 text, which seems reasonable to me.

Haven't managed to get my headshot down to less than 10k without looking horrible no matter how much I tweaked the JPEG or WebP settings, and thought that was just a tad too big to embed. Maybe I need to find a different picture that compresses better.
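For anyone wondering what the data URI embedding mentioned above looks like in practice, here is a minimal Python sketch. It assumes you already have the optimized icon as bytes; the helper names `favicon_data_uri` and `wrap_base64` and the 160-byte stand-in payload are made up for illustration (the payload is not a real PNG).

```python
import base64

def favicon_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Return a data: URI that embeds the given image bytes as base64."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

def wrap_base64(image_bytes: bytes, width: int = 80) -> list[str]:
    """Split the base64 text into lines of at most `width` columns."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [encoded[i:i + width] for i in range(0, len(encoded), width)]

# Stand-in payload: 160 arbitrary bytes, the size quoted above (not a real PNG).
payload = bytes(range(160))
uri = favicon_data_uri(payload)
lines = wrap_base64(payload)
print(f"{len(lines)} lines of base64, {len(uri)} characters in the full URI")
```

The resulting string would go in the `href` of a `<link rel="icon">` tag; how many wrapped lines you end up with depends on the markup around it.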

I got that 280k Discord favicon down to just 24K simply by opening it in GIMP and saving it again. I got it down to 12K by indexing it to 255 colours rather than using RGB (I can't tell the difference even at full size). You can probably make it even smaller if you tried, but that's diminishing returns. Still, I bet with 5 more minutes you can get it to ~5k or so.

It's very easy; you just need to care. Does it matter? Well, when I used Slack I regularly spent a minute waiting for them to push their >10M updates, so I'd say that 250k here and 250k there etc. adds up and matters, giving real actual improvements to your customers.

The Event Horizon Telescope having a huge favicon I can understand; probably just some astronomer who uploaded it in WordPress or something. Arguably a fault of the software for not dealing with that more sensibly, but these sorts of oversights happen. For a tech company making custom software for a living, it is quite frankly just embarrassing to the entire industry. It's a big fat "fuck you" to anyone from less developed areas with less-than-ideal internet connections.

Oh hey, Discord must have seen this article -- their favicon is down to 14k now.

It's not, at least for me. If you checked in devtools, that's gzip over the wire size. Hover over the size and it'll show you the actual resource size, still 285k for me.
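To see why the two numbers in devtools can differ so much, here is a rough Python sketch that compares the gzip-compressed size with the raw size of a response body. The payload is a made-up, highly repetitive stand-in (not the actual Discord icon), so the ratio it prints is illustrative only.

```python
import gzip

def transfer_and_resource_size(body: bytes) -> tuple[int, int]:
    """Return (gzip-compressed size, uncompressed size) in bytes."""
    return len(gzip.compress(body)), len(body)

# Hypothetical stand-in for a poorly optimized icon: very repetitive bytes.
body = b"\x00\x11\x22\x33" * 70_000  # 280,000 bytes uncompressed
wire, actual = transfer_and_resource_size(body)
print(f"over the wire: {wire} bytes, actual resource: {actual} bytes")
```

A file that shrinks a lot under gzip is usually a sign that the image itself was saved with weak or no compression.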

I committed a fix, it's now 24k uncompressed! :)

Congratulations. Don't forget your 11x improvement when it comes to the end of year reviews.

Yes it is! That should go on your profile. I wonder how much money that simple change is going to save!

The gzipped size is probably the correct metric to care about, right? Virtually all browsers will support that.

Sure, Discord could do a bit better, but it's not correct to knock them here for costing their users 285KB.

This is bad math, not researched heavily, but in 2020 Discord had 300 million users. 285 KB goes a long way in wasted energy and bits flowing through the pipes. I agree generally with what you're saying, though: gzipped sizes are what's being sent, at the cost of some CPU usage somewhere to unzip. Fewer bytes == less waste?

PNG basically includes gzip in the file format, so you're not reducing the amount of CPU used, you're just moving where it happens.

Includes but doesn't always use. PNG also includes filters which can dramatically decrease sizes, especially when combined with compression.

That's why tools like OptiPNG basically brute-force all the combinations of options. Depending on the image content, different combinations of filters and compression will get the best file size.
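A small Python sketch of why filters matter, under some simplifying assumptions: it applies only the PNG "Sub" filter (each byte minus the byte to its left) to a synthetic 8-bit grayscale gradient and compares the zlib output sizes. A real PNG also stores a filter-type byte per row and can pick a different filter for each row; the image data and helper names here are invented for the example.

```python
import zlib

def sub_filter(row: bytes, bpp: int = 1) -> bytes:
    """PNG 'Sub' filter: each byte minus the byte `bpp` positions to its left, mod 256."""
    return bytes((row[i] - (row[i - bpp] if i >= bpp else 0)) & 0xFF
                 for i in range(len(row)))

def sub_unfilter(filtered: bytes, bpp: int = 1) -> bytes:
    """Inverse of sub_filter, showing that the filter is lossless."""
    out = bytearray()
    for i, value in enumerate(filtered):
        left = out[i - bpp] if i >= bpp else 0
        out.append((value + left) & 0xFF)
    return bytes(out)

# Synthetic 256x64 grayscale image: every row is a smooth ramp.
rows = [bytes((x + y) % 256 for x in range(256)) for y in range(64)]
raw = b"".join(rows)
filtered = b"".join(sub_filter(row) for row in rows)

raw_size = len(zlib.compress(raw, 9))
filtered_size = len(zlib.compress(filtered, 9))
print(f"unfiltered: {raw_size} bytes, Sub-filtered: {filtered_size} bytes")
```

On smooth gradients the filtered rows become long runs of the same small value, which deflate handles very well; on other content a different filter (or none) may win, which is why optimizers try several.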

That’s lit and a fantastic turnaround. Great work to whoever is reading this!

“I got that 280k Discord favicon down to just 24K simply by opening it in GIMP and saving it again.”

You made me laugh out loud.

I agree that stuff like YouTube.com saying 144x but really 145x seems like it should be embarrassing.

I wouldn't be surprised if that was for a specific reason, like somehow showing up better somewhere for some reason, or something like that. Or maybe not; who knows...

A 256x256 PNG reduced to 256 colors with pixel transparency gets it to 2.68K. I manually dropped the color depth to indexed and saved it out in Photoshop, then used FileOptimizer to shrink it. It includes 12 different image shrinkers and runs them all.

there are png optimizer programs, e.g. optipng

The Squoosh (web) app is awesome for this too! All processing is done locally with wasm.


Yep, just tried the Discord icon with OxiPNG and it went from 285k to 6.35k, visually indistinguishable.

I'd love to have a browser plugin that converts all images I upload to CMS using Squoosh.

`optipng -o9 -strip all` is a must

ImageOptim was a favorite of mine. They have a standalone Mac app and a web service. It combines several of these tools into a single GUI.

I found https://pngquant.org/ to be pretty good.

Note that unlike some of the other tools mentioned here, pngquant does lossy compression. Might still be the right tool in many cases, but it means you should check the output while e.g. optipng is a no-brainer to add to whatever your publishing pipeline is.

Warning: pngquant is GPL v3 licensed.

The difference between the Apple “precomposed” and standard icons had to do with the gloss effect on icons on pre-iOS 7 home screens.

When adding a website/webapp to these earlier home screens, the OS would apply a gloss effect over the icon in order to match the aesthetic of the standard apps. The precomposed icon was a way for the developer to stop the OS from applying this effect, for example if their logo already had a different gloss effect applied (i.e. “precomposed”) or used some other design where adding the glossy shine wouldn’t look right. The standard icon allowed the OS to apply the gloss effect, which was a timesaver because Apple tweaked the gloss contour over the years; using a standard icon ensured that the website/webapp always matched the user’s OS version.

Also, we turned up 2,000 domains that redirect to a very shady site called happyfamilymedstore[dot]com. Stuff like avanafill[dot]com, pfzviagra[dot]com, prednisoloneotc[dot]com. These domains made it into the Tranco 100k somehow.

Full list here - https://gist.github.com/gurgeous/bcb3e851087763efe4b2f4b992f...

Lately, happyfamilymedstore has mysteriously always been in the top ~ten Google Images results for super niche bicycle parts searches I do. They seem to have ripped an insane amount of images that get reposted on their domain.

What kind of parts are you looking for?

Does anyone know the story behind these? How do seemingly obscure sites consistently get massive amounts of obscure content placed highly in results?

What most of them do is use WordPress exploits to get into random WordPress websites run by people who know nothing about managing a website and are running on a $3/mo shared hosting account.

After they get into these random WordPress sites, they then embed links back to their sketchy site in obscure places on the WordPress site that they hacked, so that owners of the site don't notice, but search bots do. They usually leave the WordPress site alone, but will create a user account to get back into it again later if WordPress patches an exploit. All of this exploiting and link adding is automated, so it is just done by crawlers and bots.

This is done tens of thousands or even millions of times over. All of these sketchy backlinks eventually add up, even if they are low quality, and provide higher ranking for the site they all point to.

Think of websites like mommy blogs, diet diaries, family sites, personal blogs, and random service companies (plumbers, pest control, restaurants, etc) that had their nephew throw up a wordpress site instead of hiring a professional.

I don't mean to pick on WordPress, but it really is the most common culprit of these attacks, because so many WordPress sites exist that are operated by people who aren't informed about basic security. Plus, WordPress is open source, so exploits get discovered by looking at source code, and attackers will sell those exploits instead of reporting them. So WordPress is in an infinite cycle of chasing exploits and patching them.

> "had their nephew throw up a wordpress site instead of hiring a professional"

The web is supposed to be accessible to everyone.

This type of "blame the victim" attitude is a poor way to handle criminal activity.

If they had used static content, it would remain 100% accessible to them, but also vastly more secure.

Dynamic content generation on the fly for a blog is unnecessary complexity that invites attacks.

Static content is definitively not as accessible to the typical person asking their nephew to put up a WP blog on shared GoDaddy hosting.

wouldn't that preclude a few popular features like a rich text editor?

You can have a separate system, even a locally running desktop app do that. You can still have a database, complex HTML templating, and image resizing! You just do it offline as a preprocessing step instead of online dynamically for each page view.

Unfortunately, this approach never took off, even though it scales trivially to enormous sites and traffic levels.

I recently tried to optimise a CMS where it was streaming photos from the database to the web tier, which then resized and even optimised them on the fly. Even with caching, the overheads were just obscene. Over 100 cores could barely push out 200 Mbps of content. Meanwhile a single-core VM can easily do 1 Gbps of static content!

I thought about a "serverless" blog.

Here's some rough scheme I came up with (I never implemented it, though):

1. Use github pages to serve content.

2. Use github login to authenticate using just JS.

3. Use JS to implement rich text editor and other edit features.

4. When you're done with editing, your browser creates a commit and pushes it using GitHub API.

5. GitHub rebuilds your website and few seconds later your website reflects the changes. JavaScript with localStorage can reflect the changes instantly to improve editor experience.

6. Comments could be implemented with fork/pull requests. Of course that implies that your users are registered on GitHub, so it may not be appropriate for every blog. Or just use an external commenting system.

So, essentially a site generated with Jekyll, hosted on GitHub Pages with Utterances [0] for comments and updated with GitHub Actions.

I don’t know if https://github.dev version of Visual Studio Code supports extensions/plugins, but if so, then there is also a rich text editor for markdown ready.

All that’s left would be an instant refresh for editing.

[0]: https://utteranc.es

If this is a serious suggestion (I really hope it isn't), you have never met the kind of person setting up the blogs the GP is talking about.

There are plenty of places that you can go to on this planet with little to no law enforcement. Don't be surprised if you end up dead there. Handling global crime is very difficult.

and anyone can hire me to design them a website.

Pretty sure closed source wasn’t very effective at stopping 0days either (Windows). The most common platform gets the attention generally.

I recently saw and reported one to a local business.

If you typed in the domain and visited directly, it wouldn't redirect to the scam site. But if you clicked on a link from a google search, then it would redirect.

Probably makes it harder to find for small website owners if they're not clicking their own google searches.

It happens through search engine optimization, SEO, and a mix of planting reviews and other tactics. Think of it like this - what would you do to get people talking about your site? You'd somehow put links, conversations, reviews, quotes, etc. in front of them.

IMHO, you should add this note in the blog too. Also, wondering about the use case of the website... are you building anything else too?

I worked on Opera Link, the first built-in synchronization between different installations of the Opera browser, covering desktop, Opera Mini and Opera Mobile (+ a web view).

Favicons got included in the data from day one, and it was awesome to get the look and feel of your bookmark bar/UI with the correct icons right away.

Back then we stored the bookmarks in a home-grown XML data store (built on top of MySQL, acting more or less as a key-value store). This worked quite nicely, and it allowed us to easily scale the system.

One night the databases and backends handling the client requests suddenly started eating a lot more memory, and the database started using much more storage than normal.

As one of only two backend devops working on Opera Link, I had to debug this, and find out what was going on. After a while I isolated the problem to a handful of users. But how could a few users affect the system so much?

As a part of the XML data store, we naively decided to store the favicons in the XML as base64-encoded strings. While not pretty, a 16x16 PNG is not that much data, and even with thousands of bookmarks, the total overhead on compression and parsing was negligible. What we did not foresee was what I uncovered that night: a semi-popular porn site had changed something on their server. They had started serving the images while also pointing browsers to the same images as the favicon! Each image was multiple megabytes, sent from the client, parsed on the backend, decoded, verified, encoded back to base64, added to the XML DOM, serialized, compressed and pushed back to the database...

Before going to bed that night, I had implemented a blacklist of domains we would not accept favicons for, cleaned up the "affected" user data, and washed my eyes with soap.

I miss those days!
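To put rough numbers on the base64-in-XML story above, here is a small Python sketch. The 300-byte figure for a typical 16x16 PNG, the 3 MiB image, and the size cap are all assumptions of mine for illustration; the fix described above was a domain blacklist, not a size limit.

```python
import base64

MAX_FAVICON_BYTES = 64 * 1024  # hypothetical cap, not anything Opera Link used

def base64_length(n_bytes: int) -> int:
    """Length of standard padded base64 text for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)

def accept_favicon(data: bytes, limit: int = MAX_FAVICON_BYTES) -> bool:
    """Reject oversized icons before they are decoded, re-encoded and stored."""
    return len(data) <= limit

for label, size in (("16x16 PNG (assumed ~300 bytes)", 300),
                    ("multi-megabyte image (assumed 3 MiB)", 3 * 1024 * 1024)):
    print(f"{label}: {size} bytes raw, {base64_length(size)} bytes as base64")
```

Base64 adds about a third on top of the raw size, which is invisible for a few hundred bytes and painful once every sync round-trips megabytes per bookmark.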

I have fond memories of using Opera <= 12. You guys were in space compared to other browsers at the time.

Wait, so you can see user's data directly?

The truth is that most services will have a set of devops with access to personal information. And sometimes we need to look at private data to solve issues like this. My first instinct back then was that some smart hacker had created FUSE support for Link or something similar.

Opera Link did not encrypt bookmarks, speed dials, etc., but it had datatypes encrypted with the master password, even while syncing. We were two people with the access and knowledge to view individual user information, and we took it very seriously.

GP is talking about something implemented in 2008. It was a different time and a different mentality.

Google Chrome Sync help docs imply that it stores data on servers unencrypted by default.[1]

Firefox Sync seems to have sane/encrypted defaults.[2]

[1] https://support.google.com/chrome/answer/165139

[2] https://hacks.mozilla.org/2018/11/firefox-sync-privacy/

The favicon visualization brought back memories of the Million Dollar Homepage. I suppose it was a precursor of NFTs.

>The favicon visualization brought back memories of the Million Dollar Homepage. I suppose it was a precursor of NFTs.

It was not. NFTs are digital certificates saying that you own certain digital content; The Million Dollar Homepage, on the other hand, was basically selling ad space on a website.

You can argue that you could buy part of the website (digital space) and therefore owned that part, but in reality you were renting it as ad space meant to promote your website (link).

The purpose and vision of The Million Dollar Homepage and NFTs are completely different, but I can see similarities between quasi-owning digital space (part of a website) and owning digital content or a digital certificate (digital token).

Is it really necessary that we assume a precursor must be a strict equality in all dimensions aside from time?

No, but for all the aforementioned reasons they are only minimally similar.

> In fact, I recommend that browsers ignore these hints because they are wrong much of the time.

I don't agree. That's the kind of coddling that encourages incompetence. Instead of compensating for others' mistakes, just let their stuff break.

I wonder if Safari on iOS ignores the hints. When I tested, I was surprised to see that pressing the share icon, which holds the option for `Add to Home Screen`, would cause a download of all of the icons listed with `link rel="icon"`.

Favicons are a huge pain to deal with correctly.

A problem with this is that when a website breaks in one browser, but works in another, I imagine most people's reaction would be to blame the browser. This leads to a kind of race-to-the-bottom for browser compatibility. See for example the history of User-Agent strings.

depends on the error message? Maybe instead of failing, give an annoying prompt to offer a workaround.

People make mistakes all the time. Breaking because somebody made a mistake that you can correct for just leads to unnecessarily fragile code.

What's the point of failing and breaking stuff if someone tells you their image is 144x144 but it's really 145x145? Who does that benefit?

The opposite is the case. Overall, being too lenient in what code accepts and applying heuristics will lead to way worse problems down the line. For example, you want your compiler to fail hard instead of saying: "Oh, this isn't a pointer, but I'm sure you meant well, I'm just going to treat it as a pointer!"

In this particular case, it seems to me that the hints serve no purpose and should be abolished, and in the meantime fully ignored, altogether. All necessary metadata is contained in the image file, and browsers should also be (relatively) strict in what image files with what metadata they accept, for security reasons alone.

And if they also went so far as limiting file size, the perpetrators that clog up bandwidth by putting up multi-MB favicons would catch on much earlier (or at all), too.

So what actually is the point of those hints, if browsers have to fall back anyway?

The hints are not a hint about how to render the icon; browsers don't need hints for that. The hints are an instruction to browsers on which icon to download in the case where multiple icons are specified.

If you are Safari and you don't know how to display SVG favicons, then you don't need to waste bytes downloading a favicon only to fail to display it. HTML does not limit a site to only one favicon.
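A rough Python sketch of the selection idea described above: use the `type` and `sizes` hints to decide which single icon to fetch before downloading anything. This is not how any particular browser implements it; `IconLink`, `pick_icon` and the ranking rule (smallest declared size that is large enough, else the largest, else anything unhinted) are assumptions made up for the example.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class IconLink:
    href: str
    sizes: str = ""  # e.g. "16x16 32x32", "any", or empty when no hint is given
    type: str = ""   # e.g. "image/png", or empty when no hint is given

def declared_widths(sizes: str) -> list[float]:
    """Parse a sizes attribute into widths; "any" (scalable) counts as infinite."""
    widths = []
    for token in sizes.lower().split():
        if token == "any":
            widths.append(math.inf)
        else:
            w, _, h = token.partition("x")
            if w.isdigit() and h.isdigit():
                widths.append(float(w))
    return widths

def pick_icon(links: list[IconLink], wanted_px: int,
              supported_types: set[str]) -> Optional[IconLink]:
    """Choose one icon to download using only the declared hints."""
    # Skip formats we are told about and cannot display; keep unhinted links.
    candidates = [l for l in links if not l.type or l.type in supported_types]

    def rank(link: IconLink) -> tuple[int, float]:
        widths = declared_widths(link.sizes)
        big_enough = [w for w in widths if w >= wanted_px]
        if big_enough:
            return (0, min(big_enough))  # smallest icon that is large enough
        if widths:
            return (1, -max(widths))     # otherwise the largest declared one
        return (2, 0.0)                  # no size hint at all: last resort

    return min(candidates, key=rank, default=None)

links = [
    IconLink("/favicon.svg", "any", "image/svg+xml"),
    IconLink("/icon-16.png", "16x16", "image/png"),
    IconLink("/icon-192.png", "192x192", "image/png"),
]
print(pick_icon(links, 16, {"image/png"}).href)
```

A client without SVG support never requests `/favicon.svg` here, which is the byte saving the hints are meant to buy, and also why wrong hints can lead to the wrong download.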

Why is that not done through the MIME type and using HEAD? The server is apparently much better able to figure out the MIME type through magic numbers and file extensions of the actual file, than the author (human or not) of the HTML, as we see.

The same headers also inform the browser that they can skip downloading a favicon that they consider too big, for example.

HEAD support is never a guarantee, and content type auto detection is just another kind of heuristics.

Ugh, HEAD is not universally supported, at least for static content? Okay, I accept that this has value then.

As for the MIME type, for image types I'd say it's more than stable enough. Certainly much, much more stable than the 6.7% error rate mentioned in the article here, I'd be surprised if it was even 1%. If you double click on an image on your desktop for example, you can in almost all cases expect that it will be opened correctly. It ceases being a heuristic entirely if you tell the webserver that *.png is image/png, and only put PNGs with names ending in ".png".

Guess those are the reasons why I got out of web development 10 years ago: everything's held together by scaffolding and is needlessly wasteful and inefficient there.
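For reference, the "magic numbers" detection mentioned above is only a few lines. This is a minimal Python sketch covering a handful of common favicon formats; the function name and the choice of formats are mine, and a real server or browser sniffer checks far more cases.

```python
from typing import Optional

def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess an image MIME type from the leading bytes, or None if unknown."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    if data.startswith(b"\x00\x00\x01\x00"):
        return "image/x-icon"  # ICO: reserved 0, type 1, little-endian
    return None

print(sniff_image_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))
```

Comparing this result against the declared `type` attribute is one way to spot the mismatches the article counted.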

You might be overthinking this. I agree with the philosophy that stricter is better, but in this case what do you expect broken hints to do?

They’re not used for rendering, they’re used for figuring out what to fetch. A HEAD request would be far less efficient than knowing ahead of time what to fetch: 1 request versus 2N+1 requests.

What you suggest sounds all fine but the entire web is user input for a browser, so no matter what, you need to define how to fail. If you can fail gracefully, you might as well do so, because a failure might not even be triggered by bad code/configuration on your side but simply by flaky network issues.

Yeah, I get how those hints make sense, now that you (and others in the thread) have told me how things are, and I did overlook that HEAD is still an extra request, while the attributes are (effectively) for free.

I do wish that content negotiation (e.g. Accept headers) worked properly. In the end though, those hints implement a subset of content negotiation in a reasonable way, given the state of affairs.

Just don't ignore filename extension. favicon.svg is SVG and that's about it. If you don't support SVG, don't download it. If you want to store png in favicon.svg, don't do that.

The web runs on mime types and file extensions are irrelevant except for buggy browsers that try to be too clever (Internet Explorer).

YouTube and Twitter both have wrong parameters. Presumably this means all major browsers ignore them or someone would have noticed their favicons not displaying right?

I don’t see Postel’s Law cited here yet, which I find pragmatic and worth sharing/considering as I used to be in the “let their stuff break” camp.

https://en.m.wikipedia.org/wiki/Robustness_principle (Quite short)

Browsers ignore the hints because they aren't needed. The image file itself has everything you need for rendering it.

The point for the hints is probably that the browser doesn't need to fetch the 2000×2000 favicon if it only needs something in 16×16 to render in the tab bar.

That may be your viewpoint but browsers have historically always taken the other viewpoint. Take HTML parsing for example. You can miss closing tags and a ton of other stuff, and it'll all work on a best-effort basis.

The browser's job is to do the best it can; that's what users want. No one would use a browser that breaks at the tiniest error in the source code.

> browsers have historically always taken the other viewpoint.

except for the short-lived XHTML fad which tbh I kind of miss every day

XHTML is still supported and works even with HTML5 tags.

It's such a shame that Safari does not support SVG favicons. It's the only major browser which doesn't: https://caniuse.com/link-icon-svg

All current browsers support PNG.

Don't hold your breath. Safari is the new IE6.

It's such a shame that PNG does not support packing multiple dimensions into one file like .ico formats actually do.

It can be done with MNG. There just has never been a tooling ecosystem that supports it for non-animated applications.

Will it look good on a browser tab? Seems like the res would be too low.

It's a vector graphic; its resolution is whatever you render it at. "S" as in, "Scalable".

Sure, there is some nuance in that you wouldn't want some fine detail to get lost at the displayed size, but presumably you know you're making a favicon when you do so.

Or, you're the NFL & you're going to supply a 4 megapixel image IDK.

> Sure, there is some nuance in that you wouldn't want some fine detail to get lost at the displayed size, but presumably you know you're making a favicon when you do so.

On the other hand, SVG is really not designed for the fine pixel control you want to make the icon look good at smaller sizes as it does not have the equivalent of font hinting.

Not at very low resolutions, <= 32 px. See sibling comment.

... and wrote an interesting technical article about it, that even someone like me, who doesn't do web development, enjoys reading. Definitely why I come to HN (no sarcasm, it is).

Aside: This article is a decent use case for the esoteric `image-rendering: pixelated;` CSS property.

I used it to make this PWA work well on iPhones: http://dmitry.gr/89

I loaded this up on a surface tablet--it renders larger than my viewport, but with no scrollbar.

I was able to zoom out and see everything, but some people don't know (or wouldn't think of) that trick.

Designed for personal use as a PWA specifically on my iPhone. I migrated from Android, where I had a TI-89 emulator app. No such thing exists for iOS. Usability by others was never a requirement :)

Ha - that’s a fantastically nerdy little project. I love it!

My website, gameboyessentials.com, would not exist without this esoteric CSS property. I wanted to show Game Boy images in their exact resolution (160 by 144). With image-rendering: pixelated; I have crisp pictures on my site whose sizes are counted in bytes.

Great tip. I've never come across this before. I updated the post and the scaled up icons look much sharper now.

Also see the gigantic map - https://iconmap.io

The blog post is the analysis of the data set, the map is the visualization.

I wonder if there might be a way to map all these using t-SNE to discrete grid locations? Maybe even an autoencoder. I'd love to see what features it could pick out.

I don't see their data set though. hmmm.

maybe I'll just have to crawl it on my own if I want to do it.

You can use t-SNE (or even better, UMAP or one of its variations) to create a 2D point cloud, and then use something like RasterFairy [1] to map the 2D positions to the cells of a grid. It usually works well.

[1] https://github.com/Quasimondo/RasterFairy

side note: instead of t-SNE consider UMAP - provides better results (and it's much faster) https://github.com/lmcinnes/umap

Is the dataset available for download? I couldn't immediately find a download link for the dataset in the linked article.

My hands itch to do some dimension reduction on that data and make some nice plots

We'd be happy to share the data. Reach us at help at gurge.com if you're interested.

damn I was thinking about that too :-)

I see a lot of repetitions in the map?

It's one icon per domain. Try hovering (on desktop) and you'll see that many domains have the same favicon.

It also works on mobile if you tap the favicon.

Favicons are slightly useful. You can serve your page at http://www.example.com with a favicon from https://example.com that has an HTTP Strict-Transport-Security header with includeSubDomains, and then future page loads in that browser will be https (across your whole domain). (This assumes you want your domain to be https.)

Other than that, I'm still pretty meh about them.

Nmap generated a similar version many years ago and it's still available at:


We also did something that looks at favicons by IP:


I know of a company whose favicon was a hi-res true color PNG that weighed in at more than 2 MB. The web site was the dominion of marketing. Suggestions to improve the situation were detrimental to one's career path. sigh

Huh, there's a row of identical icons of 3 blue circles (search for cashadvancewow[dot]com) and all the domains using them are loan-related. Interesting way to do forensics on clone sites (although trying a few of them, they're not showing any icons right now, and the URL /favicon.ico 404's)

And I checked a few of the sites, I just got lorem-ipsum style landing pages. I wonder what's the point, or are the scammers using the domains mostly for emails?

There are multiple runs of "just a bit too abstract" icons that point into the abyssal cesspools of the Internet. Most of them seem to be about loans, so I'm going to avoid announcing that too loudly if I ever need a loan, since clearly, there are some scumbags out there.

Not really relevant, but using Go to fetch the data, and then Ruby to process the data is the best. I used this exact set up for a project and it was amazing. Really the sweet spot of use cases for both languages.

Can you please explain why they are the best languages for these jobs?

Go's got an awesome feature set built in to the language for building small networked services. I implemented a client to a cryptocurrency network to extract information about its status and clients. I can't really express why it's so good, it just feels right.

Same for Ruby, the syntax is perfectly suited for transforming, digging through and acting upon data. I didn't even add a Gemfile, only used standard library functions, transforming the data the Go program mined into usable information serialized in JSON which was subsequently used as a static database for a webpage.

You can find the source here: https://github.com/tinco/stellar-core-go, the Go is in cmd and the Ruby is in tools.

The site it powers is now defunct, apparently they changed some stuff in the past 3 years and the crawler no longer functions.

The non-PNG Apple touch icons might be CgBI files? It's an undocumented proprietary Apple extension to PNG which most PNG tools won't accept, but which Xcode uses for iOS apps.

> Strangely, only 96.1% of Apple touch icons are PNG. Presumably the other 4% are broken.

What does broken mean in this context? Non-PNG, or actually broken? I assume the author has the files.

Off in one of the more esoteric corners of favicons, you have games played within the favicon: https://www.youtube.com/watch?v=fpjM5myls7I

Sadly it doesn't quite work for me any more, but the youtube video does a decent job showing what it looked like when it worked.

That article was a fun read! There was one sentence that bothered me though.

> I recommend that browsers ignore these hints because they are wrong much of the time. We calculated a 6.7% error rate for these attributes, where the size is not found in the image or the type does not match the file format.

I think of much in this context to mean at least more than 50% of the time. So I had to look up the definition of the word. One definition from Merriam is "more than is expected or acceptable : more than enough." So I guess the usage is acceptable!

I always enjoy finding I have a slightly wrong definition in my mind for a word. Many arguments, or much arguments, fail to move forward due to the differing, unidentified, underlying assumptions relying on words with slightly different definitions, both people having a slightly different question they are arguing in their mind.
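On the 6.7% figure itself: checking whether a declared size "is found in the image" is cheap for PNGs, because the dimensions sit at a fixed offset in the IHDR chunk. Below is a minimal Python sketch of such a check; it is my guess at the kind of comparison involved, not the article's actual code, and `fake_png_header` only builds enough bytes to exercise the parser (no CRC, no pixel data).

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk that must follow the signature."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])
    return width, height

def hint_matches(sizes_attr: str, data: bytes) -> bool:
    """True if the real dimensions appear among the declared sizes tokens."""
    actual = "{}x{}".format(*png_dimensions(data))
    return actual in sizes_attr.lower().split()

def fake_png_header(width: int, height: int) -> bytes:
    """Hypothetical test helper: signature plus an IHDR chunk without its CRC."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr

# The 144-versus-145 case mentioned elsewhere in the thread.
print(hint_matches("144x144", fake_png_header(145, 145)))
```

Only the first 24 bytes are needed, so a crawler could run this check without decoding the image at all.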

Less analysis, but a couple years ago I posted a script to download and then generate mosaics from favicons: https://smalldata.dev/posts/favicon-mosaic/

example image: https://smalldata.dev/images/mosaic.jpeg

script to get the favicons: https://gist.github.com/philshem/e59388197fd9ddb7dcdb8098f9f...

This reminds me of the time I reported to CIRA (Canadian domain registry) that their favicon was ~2mb /w bad caching rules and was causing issues in ... many situations.

In related news, the FDA approved Ketamine nose spray for treatment resistant depression not too too long ago https://www.fda.gov/news-events/press-announcements/fda-appr...

Didn't they miss all the pre-sized icons in their scan as well? For a while Apple encouraged multiple resolution sizes for favicons for... reasons.

I know they additionally missed the directory specific favicons which have always had iffy support (i.e. /index.html => /favicon.ico and /munks-page/index.html => /munks-page/favicon.ico)

> Check out this startling ICO with 64 images, all roughly 16x16. I suspect a bug.

I suspect an animation. Anybody know how to find out?
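One way to at least see what is inside such a file is to read the ICO directory, which lists every embedded image. A minimal Python sketch follows, assuming the standard layout (6-byte header, then one 16-byte entry per image); the sample bytes are built by hand for illustration. As far as I know the directory carries no frame timing, so this shows the sizes but cannot confirm an animation.

```python
import struct

def ico_image_sizes(data: bytes) -> list[tuple[int, int]]:
    """Return (width, height) for every image listed in an ICO directory."""
    reserved, kind, count = struct.unpack_from("<HHH", data, 0)
    if reserved != 0 or kind != 1:
        raise ValueError("not an ICO file")
    sizes = []
    for i in range(count):
        # Each directory entry is 16 bytes; width and height are its first two.
        width, height = struct.unpack_from("<BB", data, 6 + 16 * i)
        # A stored value of 0 means 256 pixels.
        sizes.append((width or 256, height or 256))
    return sizes

def fake_entry(width: int, height: int) -> bytes:
    """Hypothetical directory entry: 32-bit color, zero length and offset."""
    return struct.pack("<BBBBHHII", width, height, 0, 0, 1, 32, 0, 0)

sample = struct.pack("<HHH", 0, 1, 2) + fake_entry(16, 16) + fake_entry(0, 0)
print(ico_image_sizes(sample))
```

Running this over the real file would show whether the 64 entries are genuinely near-identical sizes or something odder.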

One weird behavior with favicons that I noticed is that Firefox will download both the 16x16 icon that matches the size it's displayed at (on a 1x pixel ratio screen) and the largest icon, and then will display whichever finished last. This behavior makes no sense to me.

Would have liked to see more color analysis, like a graph showing the number of distinct colours per icon.

That "I am feeling lucky" button does not seem random at all, it brought me in order to: Microsoft Windows, Blogger, The Financial Times, Github, Adobe ...

As every other location I randomly scroll to has no recognizable image on it ... that seems preselected :-)

I haven't updated the favicon on a site I run in years, if not decades. It's a 32x32 GIF 89a file that runs 131 bytes.

It's interesting to ponder how many hundreds of bytes are exchanged between the browser and the site just for a simple GET request for the image.

I have always wanted to do this _exact_ analysis - so awesome! Every time I am building some kind of semi-intelligent parser to fetch an arbitrary visual icon for a URL I think to myself there has gotta be a better way to do this.

I did something similar in 2008: https://tech.arantius.com/favicon-survey

I have to wonder if this is being passed around Netflix, for example, today, asking who is in charge of fixing it.

Slightly OT, but what was that one that came around a few years ago that would make everyone's CPU go to 100%?

I use an inline svg for mine... which is really just a poop emoji.

What is the Tranco dataset that this is based on? I mean come on -- anything that claims to be based on 'Alexa' (or any of these others: Cisco Umbrella/openDNS? Majestic? Quantcast?) is sooo suspect. None of these sources are that good and especially Alexa which harks back to a time 20 years ago of browser toolbars and extensions which the large majority do not use anymore.

Just saying yes maybe it's easy to come up with a top 1000 list of sites on the net, but other than that no one really knows unless you're like Google/Bing/Apple/Cloudflare that have redirection urls/DNS control, tracking clicks etc

> We did a hacky image analysis with ImageMagick to survey favicon colors. Here is the dominant color breakdown across our favicons. This isn’t very accurate, unfortunately. I suspect that many multicolored favicons are getting lumped in with purple.

Writing or reviewing a sentence like this should make you reconsider. Either do the right analysis or remove this from your article. But when you say your analysis is probably wrong and the results look weird, then why publish as is?

Imperfect analysis with known limitations still has value. We can build upon it and improve. I'd rather have it out in the open than omitted.
