Web Bloat Score Calculator (webbloatscore.com)
184 points by zdw on Oct 25, 2016 | 74 comments

I found an interesting case where WebBS does not seem to reflect my understanding of "bloat". A very long page with

Page size: 3,672 kB and 219 requests

Image size: 11,390 kB (http://www.webbloatscore.com/Screenshots/58b58030-6807-4638-...)

WebBS score: 0.322

It's apparently easy to obtain a good WebBS score by including lots of JPEG images. The PNG used for the image size calculation is typically much larger for the same bitmap, so every JPEG reduces the WebBS score.

More generally, visual simplicity will be penalized. As an example, http://jsbin.com/milelocito gets a score of 0.011, and http://jsbin.com/fejevatihi a score of 0.035.

The site timed out while trying to calculate the scores, so I had to do it manually. The respective sizes were 3377 (html), 283130 (png) and 3323 (html), 92815 (png).

This example is a bit contrived, but you'll see the same effect for more subtle decorations as well (e.g. box-shadow, ...).
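For reference, the manual calculation above can be reproduced with a few lines of JavaScript. This is only a sketch: it assumes the score is simply the total page size divided by the size of the full-page PNG screenshot, which is consistent with the figures quoted in this thread; the function name `webBloatScore` is made up for illustration.

```javascript
// Hypothetical helper: ratio of bytes transferred to bytes in the
// full-page screenshot. A lower value means less "bloat" by this measure.
function webBloatScore(totalPageBytes, screenshotBytes) {
  if (!(screenshotBytes > 0)) {
    throw new RangeError("screenshotBytes must be positive");
  }
  return totalPageBytes / screenshotBytes;
}

// Sizes quoted in the comment above: (HTML bytes, PNG bytes) for each page.
const pageA = webBloatScore(3377, 283130);
const pageB = webBloatScore(3323, 92815);

console.log(pageA.toFixed(4), pageB.toFixed(4));
```

The two ratios come out at roughly 0.0119 and 0.0358, which agrees with the quoted 0.011 and 0.035 if those were truncated rather than rounded.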

Hi, author here.

I wanted to measure the visual information content of a page, and yes, PNG is maybe not the best way to do that. JPEGs are smaller, but they introduce another problem: visible artifacts around text.

If anybody knows a better way to measure visual information content, please suggest one.

I don't have a suggestion, but I wanted to congratulate you on a wonderful idea, really nicely executed.

Thanks Maciej, your obesity article was one of the sources of inspiration :)

I don't have a worked-out idea, but maybe if a site has more than a certain amount of images, start to multiply their contribution to the size of the page? So if you double the image content, then a page that literally takes 10 MB to load, of which 9.5 MB are images, has the calculation:

19.5 / 10 = 1.95. (assuming the PNG is 10 MB for simplicity).

You could be fancy and use an exponent.
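A rough sketch of that weighting, to make the arithmetic concrete. Everything here is an assumption for illustration: the function name, the default weight of 2, and the rule that the weight only applies once images exceed a given share of the page (one possible reading of "more than a certain amount of images").

```javascript
// Hypothetical variant of the score in which bytes spent on images count
// `imageWeight` times once images exceed `imageShareThreshold` of the page.
// All sizes must be given in the same unit (MB in the example below).
function weightedBloatScore(
  totalSize,
  imageSize,
  screenshotSize,
  { imageWeight = 2, imageShareThreshold = 0.5 } = {}
) {
  if (!(totalSize > 0) || !(screenshotSize > 0) || imageSize > totalSize) {
    throw new RangeError("sizes must be positive and imageSize <= totalSize");
  }
  const otherSize = totalSize - imageSize;
  const imageHeavy = imageSize / totalSize > imageShareThreshold;
  const weight = imageHeavy ? imageWeight : 1;
  return (otherSize + weight * imageSize) / screenshotSize;
}

// Figures from the comment: 10 MB page, 9.5 MB of images, 10 MB screenshot.
console.log(weightedBloatScore(10, 9.5, 10)); // (0.5 + 2 * 9.5) / 10 = 1.95
```

Swapping the linear weight for a nonlinear function of the image size would give the "exponent" variant, though the result would then depend on the unit the sizes are expressed in.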

But I doubt there's a good mechanical way to do this. Whether an image is bloat or valuable content requires sophisticated judgment and understanding of what the purpose of the page is.

> Whether an image is bloat or valuable content requires sophisticated judgment and understanding of what the purpose of the page is.

I completely agree with that. A quick check is to replace all images on a page with placeholder images; if that doesn't reduce the functionality of the page, then maybe you don't need those images at all.

Well, you wouldn't be showing the JPEGs to anyone. Using JPEG compression for the calculation might be perfectly fine.

By the way, your calculator seems to include prefetched pages in the calculation. (link rel="prefetch".) Not sure if that's deliberate or not.

Yes, prefetch is deliberate, as that is the default behavior of SlimerJS and most browsers.

Why are visible artifacts around text a problem? The screenshots are just being used to calculate the metric. Is it really important what they look like?

There is some truth to that, but I wanted screenshots to be visible for comparison and to have the same visual functionality. Artifacts around text make it hard to read, thus less functional.

How do you measure "TotalPageSize"? Does it take into consideration any plain-text content that's served with "Content-Encoding:gzip"?

From my developer: "We measure every request that was received by SlimerJS, also for any request that sends a resource as GZIP, but does not specify its size, we will GZIP that content ourselves in order to get its size."
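The approach described in that quote might look roughly like the following in Node.js. This is a sketch, not the calculator's actual code: the response shape (`body`, `contentLength`, `contentEncoding`) and both function names are hypothetical, and re-compressing with default gzip settings only approximates what the server really sent.

```javascript
const zlib = require("zlib");

// Hypothetical response shape:
//   { body: Buffer (decoded content),
//     contentLength: number | null,
//     contentEncoding: string | null }
function transferSize(response) {
  if (Number.isInteger(response.contentLength) && response.contentLength >= 0) {
    return response.contentLength; // size reported by the server
  }
  if (response.contentEncoding === "gzip") {
    // No size reported: re-compress the decoded body to estimate it.
    return zlib.gzipSync(response.body).length;
  }
  return response.body.length; // sent uncompressed
}

// Sum the estimated transfer size over every response for the page.
function totalPageSize(responses) {
  return responses.reduce((sum, r) => sum + transferSize(r), 0);
}
```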

I was finally able to try it out and this is exactly my observation too.

I've got a very similar score of 0.323 for an Angular SPA with a bunch of dependencies (1,276 kB, 44 requests) that basically shows 5 jpeg images.

Would be interesting to see the result when using JPEG instead of PNG. However, it's still easy to game: just display some large JPEG images in full size and the score will approach 1.

Edit: By the way, here's a working screenshot link for the score above (image size 3,947 kB): http://www.webbloatscore.com/Screenshots/aeea9b6a-4852-4b01-... They screenshot the whole page.

> just display some large JPEG images in full size and the score will approach 1.

Or less, depending on the JPEG quality setting used in each case I guess. That's not completely unreasonable though - if I only want to look at a few large images (e.g. from Mars) and my browser downloads mostly just that, it's not really bloated.

Look at the homepages of Tim Berners-Lee, Bjarne Stroustrup, and Donald Knuth. All three together have 235 kB, less than one Google SERP. Images are optimized, most of the content is above the fold, and their pages were "responsive" two decades before responsive design became a thing. But they are all ugly.

"Ugly" is subjective. Personally, I like that minimal information-rich style instead of the over-designed monstrosities I come across so often which are glittery and trendy on first glance, yet quite vapid in terms of content. They also tend to be annoyingly distracting to actually try reading, due to all the extra crap. The former feels like reading a book, the latter a tabloid.

A third of a KB of CSS can convert any of them from ugly into really nice.

How? They aren't ugly in the first place. In fact they're eminently readable and navigable.

My biggest complaints about most of those sorts of sites are:

    * Tight line spacing
    * Full window width lines
    * Small fonts
    * Sometimes, font choice

When I'm reading something lengthy on a site like those named, I'll usually set a max-width on the body, give it auto margins, and play with line-height, font-size, and usually set it to Arial. Takes me 30 seconds on pages that aren't terrible and makes it much easier to read.

How? By changing those things in the browser's console, or some other way? And if in console, do you have to do it each time, or can you save it somehow?

You can use a Bookmarklet to inject/change styles. I have one that toggles CSS on/off:

    javascript:(function(){function d(a,b){a.setAttribute("data-css-storage",b)}function e(a){var b=a.getAttribute("data-css-storage");a.removeAttribute("data-css-storage");return b}var c=[];(function(){var a=document.body,b=a.hasAttribute("data-css-disabled");b?a.removeAttribute("data-css-disabled"):a.setAttribute("data-css-disabled","");return b})()?(c=document.querySelectorAll("[data-css-storage]"),[].slice.call(c).forEach(function(a){"STYLE"===a.tagName?a.innerHTML=e(a):"LINK"===a.tagName?a.disabled=!1:a.style.cssText=e(a)})):(c=document.querySelectorAll("[style], link, style"),[].slice.call(c).forEach(function(a){"STYLE"===a.tagName?(d(a,a.innerHTML),a.innerHTML=""):"LINK"===a.tagName?(d(a,""),a.disabled=!0):(d(a,a.style.cssText),a.style.cssText="")}))})();
I also use Stylish and Greasemonkey to style and script any sites I visit frequently to match my personal tastes and add missing functionality or helpful features.
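In the same spirit, the readability tweaks described a few comments up (max-width on the body, auto margins, line-height, font-size, Arial) could be packaged as a bookmarklet. The sketch below generates one; the function names and the default values are assumptions chosen for illustration, and a page's own styles may still override the injected rule.

```javascript
// Build a CSS rule for the readability tweaks. The default values are
// illustrative assumptions, not values taken from the thread.
function buildReadabilityCss({
  maxWidth = "40em",
  lineHeight = 1.5,
  fontSize = "18px",
  fontFamily = "Arial, sans-serif",
} = {}) {
  return (
    `body{max-width:${maxWidth};margin-left:auto;margin-right:auto;` +
    `line-height:${lineHeight};font-size:${fontSize};` +
    `font-family:${fontFamily}}`
  );
}

// Wrap the CSS in a bookmarklet that appends a <style> element to the page.
function toBookmarklet(css) {
  const source =
    "(function(){var s=document.createElement('style');" +
    "s.textContent=" + JSON.stringify(css) + ";" +
    "document.head.appendChild(s)})();";
  return "javascript:" + encodeURIComponent(source);
}

// Print a bookmarklet URL that can be saved as a browser bookmark.
console.log(toBookmarklet(buildReadabilityCss()));
```

Saving the printed `javascript:` URL as a bookmark means the tweak is one click away on any page, without reopening the console each time.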

Not him but I use the Stylish add-on whenever I make a change to a website I visit often.

I find "The long tale of two tribes" at the bottom of the page far more interesting than the score calculator itself. We should discuss this and not nitpick about WebBS gaming. It would be better for discussion to have the article at the beginning of the page and introduce the score calculator later on.

I would not say that it is easy to pick one particular side. It is true, however, that the web feels too bloated. It is funny how a simple static page with everything embedded can be faster than whatever fancy web framework. Look how quick St4k [1] is.

Computing on the server may be fast and energy efficient. It also hides complexity. But then we end up in a proprietary walled garden. There must be some middle ground. I am hoping that the Sandstorm project [2] will bring democratization of the server space.

Lately I think that our technology is hitting a ceiling. In the past we imagined that everyone would have their personal flying car or that we would have a space plane. Now we have subsonic jets and maybe electric self-driving cars around the corner. We have a promise of reusable rockets with stages landing separately like some kind of elevator. Everything is smoke and mirrors. We will not have thin clients with low latency links to servers everywhere, just like we will never have independent and fully distributed thick clients. We will fake it until we make it. There may be a Netflix cache [3] near you to simulate that the network is fast in both senses.

[1] http://danlec.com/st4k#

[2] https://sandstorm.io/

[3] https://openconnect.netflix.com/en/delivery-options/

[EDIT] I worded it poorly. St4k was not meant to be an example of a fully static page, but rather of a general trend. Yes, St4k is an SPA, but after one request you have the whole application. Subsequent requests bring content only.

st4k is very much not a "simple static page"; it's a tiny SPA which requires JS.

What's on it? It doesn't render for me with NoScript enabled. :)

It's Stackoverflow in 4kB. Here is the explanation: http://danlec.com/blog/stackoverflow-in-4096-bytes

Quite funny. My blog page (with text only) has a score of 4.06. Converting it to an image is stupid. Btw. I've been working with a Korean company on a couple of Korean websites. What they provided was ONLY IMAGES. All the texts were images, new content was in images, all the changes were just image changes. It was a nightmare.

A lot of Japanese websites are the same. All the navigation buttons are images, for example. I found that quite a bad experience over slow wifi!

Having dealt with the Korean web (as a user) it was daunting.

Once had a customer who did the same. His reason at the time was that his customers would see boxes otherwise. This was one of my first projects and I was still in school then. Didn't know anything about Unicode, plus Unicode was not that popular then.

I guess maybe your customer has the same problem.

Setting aside countless other obvious issues, converting text to images is a huge accessibility issue.

Lucky it's not based in the US, that's the sort of thing that gets you sued.

that's what you get in an IE 6 society

Calling the metric "WebBS" was fantastic, it had me continually reading it as "Web Bulls!@#"

I would be surprised if it wasn't the author's exact intent.

The naming is rather brilliant

This is like a poster child for a bad metric. It's bad because it's possible for a team to do a good job and have the metric look bad, and it's possible for a team to do a bad job and have the metric look good. Are we sure that this metric is measuring something that we care about?

This metric is interesting to me, but it's an esoteric thing compared to practical issues of how well a site achieves its goals. This metric reminds me of tongue and groove house design: It's wonderful, but only matters to artisans.

To some extent this is true of many metrics.

The real question is whether the metric motivates people to do the right thing. Is this something people should optimize for to the exclusion of other goals? Probably not. But bloat is a serious problem, and this metric just shines the light of recognition on that.

While the phrasing seems problematic, as HTML has value even if larger than the equivalent image (selectable text, clickable links, interactive elements, etc), the metric seems vaguely reasonable. In particular, if you compare to the compressed screenshot size, then you've incorporated a very simplistic complexity metric beyond just the dimensions of the page.

Minor bug report: I was testing a page that was using SVG Stacks[1], where multiple icons are encapsulated in a single SVG image and the individual icons are referenced via fragment identifiers. This caused the WebBS calculator to count the SVG image multiple times (once for each fragment reference), resulting in a score that was significantly worse than it should have been.

[1] http://simurai.com/blog/2012/04/02/svg-stacks
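One way a measurement tool could avoid that double counting is to strip the fragment identifier before summing sizes, since the fragment is resolved client-side and is never sent to the server. A minimal sketch, assuming a hypothetical list of `{ url, size }` request records; this is not how the calculator is known to work:

```javascript
// Sum response sizes, counting each resource once even when it is
// referenced through several fragment identifiers
// (e.g. icons.svg#home and icons.svg#search).
function totalSizeIgnoringFragments(requests) {
  const sizeByUrl = new Map();
  for (const { url, size } of requests) {
    const u = new URL(url);
    u.hash = ""; // drop "#fragment"; it is not part of the HTTP request
    if (!sizeByUrl.has(u.href)) {
      sizeByUrl.set(u.href, size);
    }
  }
  let total = 0;
  for (const size of sizeByUrl.values()) {
    total += size;
  }
  return total;
}
```

Note that this also collapses genuinely repeated downloads of the same URL, which is reasonable if the browser would have served them from cache but is an assumption nonetheless.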

It makes me glad to see that HN scores well, as presumed:

Page size: 10.4 kB and 6 requests

Image size: 59 kB

Web-BS: 0.176

WebBS seems inaccurate as a measure of information density for pages with lots of images, video streaming, or text rendered as images. It seems to aim at measuring bloat for information delivered as text (layout, colors, etc. also convey information, but for simplicity let's avoid trying to measure them).

So, in essence, WebBS aims to measure rendered HTML without tags (like using the text() function of jQuery to get the contents of a div tag). In that case, why not traverse the DOM, get the clean text content, measure it (count bytes), and divide by the request size? In this case, WebBS will always be below 1, and a value closer to 1 is better. Also, some penalty points should be considered for the number of requests (or maybe define another rate, since that is not bloat per se).
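A quick sketch of that text-based ratio, runnable in Node.js. The regex-based `extractText` is a crude stand-in used only for illustration (it does not decode entities or handle malformed markup); in a browser, `document.body.textContent` would be the more natural source of the text. Both function names are made up.

```javascript
// Crude tag stripper for illustration only.
function extractText(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ") // drop code and CSS
    .replace(/<[^>]*>/g, " ") // drop remaining tags
    .replace(/\s+/g, " ") // collapse whitespace
    .trim();
}

// Bytes of visible text divided by total bytes transferred for the page.
function textDensity(html, totalTransferBytes) {
  if (!(totalTransferBytes > 0)) {
    throw new RangeError("totalTransferBytes must be positive");
  }
  const textBytes = Buffer.byteLength(extractText(html), "utf8");
  return textBytes / totalTransferBytes;
}
```

One caveat: if the transfer size is measured after gzip compression, a text-heavy page could score above 1, so the "always below 1" property only holds against uncompressed sizes.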

Why don't you cache the results for your proposed websites (or at least just your own)?

+1, that should take quite some load off. Some kind of retest link would be helpful though, to clear the cache and test a page again.
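Something along these lines would cover both the cache and the retest link. It is a minimal in-memory sketch; `measure` stands for whatever expensive function actually loads the page and computes the score, and all names and the one-day default lifetime are assumptions.

```javascript
// Minimal in-memory cache keyed by URL. `measure` is a hypothetical
// function that performs the expensive page measurement; `now` can be
// replaced to control the clock.
function createScoreCache(measure, maxAgeMs = 24 * 60 * 60 * 1000, now = Date.now) {
  const entries = new Map();
  return function getScore(url, { retest = false } = {}) {
    const hit = entries.get(url);
    if (!retest && hit && now() - hit.storedAt < maxAgeMs) {
      return hit.score; // fresh cached result
    }
    const score = measure(url); // cache miss, stale entry, or forced retest
    entries.set(url, { score, storedAt: now() });
    return score;
  };
}
```

A "retest" link would simply call `getScore(url, { retest: true })`, which bypasses the cached entry and overwrites it with the new result.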

A job well done, congratulations! The idea is fantastic and your implementation, including the choice of non-lossy reference image, has a sound design. The purpose and intent is obvious.

And please ignore the complaining bunch! Any metric can of course be misused (think unit test coverage, linting, soft coding, etc.) but writing software is craftsmanship. We're half-way to art. To even be able to measure is an achievement, and what gets measured gets better. Me and my senior colleagues immediately took it to heart and will use it as objectively as language shootout. Many thanks for planting an industry standard seed!

Tried to run it on boston.com, and I got a "Calculator stopped measuring after ~20 sec timeout." error. When the bloat tool can't even fully measure your bloat, you might have a problem.

While the implementation is not optimal (it favors photographic content, as PNG is not going to beat JPEG here; overall it favors pages of few colors), the idea is actually quite decent.

I'd very much prefer a script bloat calculator.

1. Open dev tools

2. Run a JS CPU profile.

3. :)

The score is a bit flawed as it assumes pertinent information can only be contained in text, not in images.

Is there a site or software that allows you to really convert your site to an image map?

They can't process requests now. Hug of death already?

Whoa. Your last comment before this one was 6.2 years ago.

You seem to be banned, by the way. I had to vouch this comment, and your previous comment (https://news.ycombinator.com/item?id=1525566) is dead too. Since that was long before the new mod team, you may want to email hn@ycombinator.com to discuss the possibility of getting yourself unbanned.

Welcome back to HN.

> Whoa. Your last comment before this one was 6.2 years ago.

Now that's what I call Lurking ;)

Too bad there's no Stackoverflow-style badge for it.

Unbanned already as far as I can tell.

The comment I replied to was marked [dead], which usually indicates a ban. It's alive now since I vouched it. Is there another way to know?

Despite having used HN for several years now, I've never really understood [dead] etc. The FAQ says it's anything that has been flagged as such by mods or the software itself. So does the original author of the comment see "[dead]"? Does it happen automatically if the author is banned/shadow banned? Why/how are so many 'New' submissions 'DOA'? Is there anywhere that explains this so I never have to ask these questions - and others - again?

The first HN moderation rule is "don't talk about HN moderation rules". The goal is maturity through obscurity.

How does a lack of transparency help?

Sure, feel free to ask anything you're curious about. The only reason I know the answers is because I've been around since day two across various accounts.

> So does the original author of the comment see "[dead]"?


> Does it happen automatically if the author is banned/shadow banned?

Yes, all submissions and comments from a banned account are automatically killed.

> Why/how are so many 'New' submissions 'DOA'?

There is a banlist for domains in addition to accounts. Any attempt to submit a link to a banned domain will be killed.

Interestingly, news.ycombinator.com is banned in order to prevent meta-submissions. If you're curious what someone sees when they submit a banned post, submit e.g. this comment.

> Is there anywhere that explains this so I never have to ask these questions - and others - again?

Nope. :) It's an evolutionary process that's been going on for 9.7 years. http://www.paulgraham.com/hackernews.html

There are at least four other ways that a comment can become dead: if it's a dupe (try submitting the same comment twice), if it's flagged by multiple users (around three or four, I think?), if a moderator manually kills a comment, or if someone is posting from Tor under a brand-new account.

In the case of a dupe or a flag, you'll see [dupe] [dead] or [flagged] [dead] respectively. That's how you can tell why a comment is dead: if it's just [dead], either a mod killed it, it was posted by a banned user, or was posted from a Tor user. If the user's name is green, it's probably a Tor user.

In this case, they were obviously banned because the comment was only 24 minutes old when I replied. Nowadays, when a moderator kills a comment manually, they will usually leave a reply stating the reason. Since there was no reply, I suspected the user was banned.

This brings us to one of HN's most interesting features: the vouching system. If you have a certain amount of karma, I think a few hundred points, then you can cause a comment to go from [dead] to alive. The reason this was implemented is because sometimes a banned user posts an informative or harmless comment. These comments would provide value to the community.

To vouch a dead comment, you must first be able to see dead comments. If you go into your profile, you'll find a setting called "showdead" which you can set to true. (By default, no one sees dead comments.) Then to vouch, click on a dead comment's timestamp. You'll be taken to a page with a "vouch" link.

I don't know how it is for other users, but for me, clicking on "vouch" always resurrects the dead comment. I.e. only my own vouch is necessary. I assume it's the same for other users.

This is a serious responsibility. The moment that people start vouching crap comments is the moment dang will have to rework the vouching system. HN brings a bit of joy to my life, so I like to go out of my way to vouch dead comments that are high quality, and then read through the user's previous comments to see whether they were banned and why. 95% of the time, I say nothing to them. Users were almost always banned with good reason. In the case where you spot someone who was banned for seemingly no reason, the user was almost always involved in abusing the site (voting with multiple accounts, etc).

This user happened to be the exception to that. They were banned 6.2 years ago with seemingly no reason. Perhaps they were involved in some sort of abuse against HN. Or perhaps they were posting from a shared IP address and happened to be misidentified. Either way, 6 years is certainly enough time for them to appeal their ban. Besides, I like the idea of someone coming back to the site and being welcomed.

The HN team is very responsive and quite reasonable, so usually if you send a sincere apology and promise to behave then you'll be given another chance.

It's been pretty fascinating to watch the moderation process evolve along with the community over the years.

Thanks very much for your informative and comprehensive reply. It's a complicated system, but I understand the reasons behind that. HN is such a (relatively) high-quality forum, and it's interesting to me how that's partly because of the underlying software, and partly because of the policies and how the community applies, and is governed by, them.

Let's terminate this off-topic discussion here! :)

If they got rid of all that bloat, they'd be better able to cope with the load, clearly.

yep. dead

Timeout when checking vwo.com

Not loading at all.

It's certainly a lean web site. Or rather, the 503 is lean.

It would kind of make sense, but only for static pages. Pages are now interactive applications, so comparing them to the size of a visual representation of their initial state is not the best metric.

Almost all pages I regularly visit are not interactive applications. HN is one of the most interactive sites.

News websites in particular are guilty of downloading Megabytes upon Megabytes of who knows what to display the five paragraphs of text and the image that I actually want to see.

A lot of developers like to think that the websites they're building are interactive applications where people spend a lot of time strolling around. In almost all cases I get to such "applications" through direct deep links, and I'll close the tab when done looking at that specific page. I rarely care about anything else but the initial state.

Same experience when I was working for a news site. An increasing number of users get to articles directly from, e.g., Facebook and close the site after reading.

I personally try to avoid interactive pages. Most of them seem to be built by people who hire psychologists with the goal of preventing you from doing other things.

If by interactive you mean "reactive html5" pages, then I have to admit that I avoid them like the plague.

They require endless scrolling, there are often pointless template images on them and the fonts are too big, and their maintainers often want you to try out or download something 'for free' - and three days later you find out it's not free at all and they want $5/mo.

Sure, there are exceptions. I couldn't imagine Google Maps without bloat. That is a web application. But it is an exception. The current problem is that we treat exceptions like they are the norm. Most sites do not need tons of JS/CSS/etc. to display a 10-paragraph news article, yet they do because "reasons".
