That's a weak argument in the article. If they were serving pages of 500 bytes each this would indeed be a huge improvement, but no page is 500 bytes. I just opened 4chan.org, and the markup before <body> is already 1,836 bytes. The entire front page of /b/ is 114,428 bytes, and saving 50 of them is absolutely negligible. On the other hand, if saving single bytes was significant, there would be a lot more potential in the source by shortening CSS class names etc. rather than picking a short domain name.
EDIT: According to http://www.4chan.org/advertise, there are a total of 575,000,000 monthly page impressions.
> On the other hand, if saving single bytes was significant, there would be a lot more potential in the source by shortening CSS class names etc. rather than picking a short domain name.
Also spot on, but the point I was trying to make is that I was given the choice between a longer domain and a shorter one, and the shorter one resulted in a smaller page size, which does yield some (though, as you put it, negligible) savings in terms of transfer. CSS/JS refactoring/pruning would definitely be a better bang for your buck if your goal were solely to reduce page weight, but my primary goal was to decrease request overhead, and this was just a side benefit at no additional cost to me.
As an aside, I would say the non-technical benefit of the longer domain (4chan-cdn.org) would have been avoiding user confusion, but I feel this is mitigated since visiting 4cdn.org directly bounces you to www.4chan.org, and our custom error pages make 4cdn.org clearly 4chan-related.
Do you have (well, you're moot, so -- want to share) some more data on 4chan's current size? I'm sure lots of people would be interested in hearing about that. (This would probably make a great individual post.)
Thanks for this article moot.
We essentially think of flat files on disk as a cache for the database, and don't employ a proxy cache or any other common HTTP proxies. It's a little unorthodox, but it works well for us.
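Purely to illustrate the pattern (the paths, function names, and use of Node here are my own assumptions, not 4chan's actual code), the idea is to regenerate a flat HTML file from the database only when it's missing, and serve the file on disk otherwise:

const fs = require('fs');

function getPage(board, pageNo) {
  const path = '/cache/' + board + '/' + pageNo + '.html';  // hypothetical cache path
  if (fs.existsSync(path)) {
    return fs.readFileSync(path, 'utf8');                   // cache hit: serve the flat file
  }
  const html = renderFromDatabase(board, pageNo);           // hypothetical: query the DB and build the HTML
  fs.writeFileSync(path, html);                             // write-through so the next request is served from disk
  return html;
}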
We used that memory-partition-over-network thing for a while, but actually switched to SSD-over-network because it was faster than the memory partition. I spoke with a FreeBSD maintainer about it and he said what we were doing was so unsupported/unoptimized that he wasn't surprised.
We run into weird FreeBSD edge cases pretty often where there are few people, if any, who can answer our questions. Sometimes I wish we'd gone with Linux, but after ten years the hassle of switching doesn't seem worth it. Thankfully 9.2-RELEASE has been pretty good to us.
If there's a power blink and the posts get lost? It's 4chan.
In addition, let's not forget to multiply the cookie overhead by the number of images on the page... likely to be quite a large number given 4chan's love of images and enormous, eager-loaded discussion threads.
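Back-of-the-envelope, with made-up numbers just to show the shape of it:

const cookieBytes = 150;      // assumed size of the cookie headers sent with each request
const imagesPerPage = 200;    // assumed image count for a big thread
console.log(cookieBytes * imagesPerPage);  // 30000 bytes of cookies re-uploaded for a single page view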
All those bytes are justified because they are rendered on the screen and provide direct utility to the user. The extra 50 bytes do nothing but slow the page down.
there would be a lot more potential in the source by shortening CSS class names etc
There is no either/or here: this can and should be done as well.
I realize there are a lot of things that could go wrong, such as
* correlating the HTML and CSS for an entire site instead of just one page
* dealing with third party dependencies that require certain class names to be used
Just wondering if there's any work already done with this approach.
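As a rough sketch of the idea (a hypothetical build step, not any existing tool), the renamer would build one site-wide map and run it over both the CSS and the HTML/templates:

// Hypothetical build step: one site-wide map from long class names to short ones.
const classMap = { 'postContainer': 'a', 'replyLink': 'b', 'fileThumb': 'c' };

function shorten(source) {
  // Naive whole-word replace; a real tool would parse selectors and class attributes
  // instead of plain text, and would skip names required by third-party scripts.
  return source.replace(/[A-Za-z_][\w-]*/g, function (name) {
    return classMap.hasOwnProperty(name) ? classMap[name] : name;
  });
}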
Changing the class names would be difficult as it wouldn't pick up any dynamic class names, like
var className = 'user_' + user.getState(); //'user_deleted' or 'user_active'
An obfuscator generating code with IOCCC-level efficiency would be quite neat (e.g. http://www.ioccc.org/2012/endoh1/endoh1.c).
Nope. That's classic non-engineer thinking. The same kind of thinking that does "optimization" without profiling.
Let me introduce you to my friend: http://en.wikipedia.org/wiki/Diminishing_returns
There's also savings in not sending cookies with every image request - which saves them a honkin' 46 terabytes.
Plus, he himself admits that this isn't the first place to start optimizing - it's just where he chose to.
If he reworked old code just to save 50 bytes, that would probably be a mistake, but it sounds like the work was being done anyway and he had the choice to save 50 bytes OR use a longer domain name.
Basically this. I thought it was an interesting side benefit that came at no additional cost to the main benefit of greatly reducing request overhead for static resources.
This is rubbish. What matters is how many packets of data go through the network. It is packets which are the unit of transfer and which are handled by intermediate nodes. The difference between, say, 2,340 and 2,290 bytes is of no material consequence whatsoever; it may cause one less packet to be sent, but probably won't. To think it has a consequence is to demonstrate a complete lack of understanding of what happens in the network and in endpoints. None of these '50 byte savings' accumulate anywhere in any meaningful or measurable sense whatsoever. So no, it doesn't "add up" to anything.
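To put rough numbers on it (assuming a typical ~1460-byte TCP payload per segment):

const mss = 1460;                    // assumed TCP payload bytes per packet
console.log(Math.ceil(2340 / mss));  // 2 packets
console.log(Math.ceil(2290 / mss));  // still 2 packets -- the 50 bytes change nothing here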
And if you're going to downvote me explain why I am wrong and how the benefit of these mythical 'savings' can be demonstrated.
What the thread OP is talking about is the decrease in page size from the URLs included in the page source. Choosing a shorter URL for the static domain versus a longer one resulted in a rough savings of 50 bytes per page, compressed.
You're referring to request cookie size, which was also decreased significantly (CloudFlare still sets a single cookie, unfortunately); that's where the big savings of ~100 KB upstream per page load comes from.
I thought you were quoting me here, I wrote the same text yesterday only with a figure of 100 million pageviews. That was weird to see.
This works for 4chan, but in case of Facebook, it actually reduces security - since cookies cannot be checked for photos on static domains, everybody can access every photo (as long as they are given an URL), regardless of the photo's privacy settings. In the case of Facebook, they are probably using sufficiently random URLs that mostly mitigate the issue, but a naive implementation could be very problematic.
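A "sufficiently random" URL here usually just means embedding a long unguessable token; something like this hypothetical sketch (not Facebook's actual scheme, and static.example.com is a placeholder):

const crypto = require('crypto');

function photoUrl(photoId) {
  const token = crypto.randomBytes(16).toString('hex');  // 128-bit unguessable token, stored with the photo
  return 'https://static.example.com/photos/' + photoId + '/' + token + '.jpg';
}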
Finally, instead of using the browser extensions (which are helpful), get actual PageSpeed installed on the servers!
Regardless, awesome stuff. Big fan. Much love.
There's definitely a tradeoff between inlining JS and small images, but I think in our case it makes more sense to leave them external to leverage browser (and since we use a CDN for static content -- edge) caching.
I believe I tried to get ngx_pagespeed up and running when it was announced, but couldn't get it to compile from source. Sometimes (read: often) it sucks to be a FreeBSD user.
I can really recommend the compiler in AO mode; the size saving is insane (75% reduction in file size) and the type checking is sweet.
which saves the extra HTTP requests
4chan reloads the page on each click, meaning you save a shitload of bandwidth by not having to send the scripts and CSS each time the page loads.
I refuse to disable my VPN, which means I can't post anymore. A shame 4chan has no way for privacy-conscious users to post, especially given your support of StopWatching.us and such.
The alternative is to have two views of the site that hit different servers, one that requires login and another that does not, but that introduces a whole slew of other problems. It would probably be the way to go if you wanted to do this however.
> "4chan Pass users may bypass ISP, IP range, and country blocks"
> "Pass users cannot bypass individual (regular) IP bans."
So, if some random spammer uses the same VPN server, it gets blocked by an individual ban. This rapidly happens to all popular shared VPNs.
Kind of like Unix permissions vs. jails/virtual machines. Both are secure, but one is more secure against incompetence than the other.
Certainly, it is not automatically a bad idea to set such cookies. I see that.
Unfortunately, this means our cookies are sent to static.domain. Worse, once we get rid of beta.domain there's no going back on wildcard cookies - there's no way to force clients to expunge cookies.
It's not an inherent flaw of the tech. It's a flaw in how we use it.
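The difference is just the Domain attribute on the cookie; a minimal sketch assuming a Node-style response object (example.com is a placeholder):

// Domain-wide cookie: also sent to static.example.com, beta.example.com, etc.
res.setHeader('Set-Cookie', 'session=abc123; Path=/; Domain=.example.com; Secure; HttpOnly');

// Host-only cookie (no Domain attribute): only sent back to the exact host that set it.
res.setHeader('Set-Cookie', 'session=abc123; Path=/; Secure; HttpOnly');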
CNAMEs are inherently more flexible and more resilient in the face of various load challenges or DoS attacks:
"Root domains are aesthetically pleasing, but the nature of DNS prevents them from being a robust solution for web apps. Root domains don't allow CNAMEs, which requires hardcoding IP addresses, which in turn prevents flexibility on updates to IPs which may need to change over time to handle new load or divert denial-of-service attacks. We strongly recommend against using root domains. Use a subdomain that can be CNAME aliased... "
- Heroku [https://status.heroku.com/incident/156]
However, having static assets spread across multiple host names also helps browsers, which can open multiple connections to pull assets from a page. I think most browsers allow 4 concurrent connections per host. In this case, it's just one additional host.
Not saying you're doing anything wrong, just curious. I assume some of it is for ad tracking, but that's still a hell of a lot of data!
It's almost entirely Google Analytics, unfortunately. Our ads are served from a different domain (4chan-ads.org) for specifically this reason (user privacy and cookie bloat).
Google Analytics has its shortcomings, but it's a great product and free.
How do you feel about the believability of the data?
On Google News, thumbnails would not be shown unless they came from the same domain as the page itself.
So our content was on "www.example.com" and pictures were on "media.example.com", but the cookies were set for "example.com", so they got sent with every image request.
It could work for the smaller boards, though...
Thus the cost is only 160 bytes for the page, which isn't all that much.
Additionally, assuming their cache timeouts are non-trivial, there are several caches that reduce the delay incurred by this.
Have you investigated the gains to using Cloudflare's Railgun? Seems like it'd be able to save quite a bit of bandwidth on your end.
SSL is forced on our domain you post to (sys.4chan.org) with redirects and HSTS, and we set cookies with proper Secure and HTTP-Only flags. Maybe some day we'll force SSL site-wide, but I don't think that's the right decision for now.
I definitely encourage people to use the EFF's wonderful HTTPS Everywhere extension, though: https://www.eff.org/https-everywhere
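In case it helps anyone, the redirect-plus-HSTS part is roughly this (a minimal sketch assuming a Node-style server, not 4chan's actual setup):

const http = require('http');

// Bounce any plain-HTTP request over to HTTPS.
http.createServer(function (req, res) {
  res.writeHead(301, { 'Location': 'https://sys.4chan.org' + req.url });
  res.end();
}).listen(80);

// ...and on the HTTPS side, tell browsers to keep using HTTPS from now on:
// res.setHeader('Strict-Transport-Security', 'max-age=31536000');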
I imagined a lot of people would be using mobile, and I know that Safari on iOS doesn't support SPDY. Does this mean that > 80% of users are browsing on desktops, or is it possible there's a mobile app that's reporting a false user agent?
Or maybe all the iOS users fell victim to waterproof tests...
Having some idea of the 4chan user base, I wouldn't be surprised if Android is more popular than iPhone.
Also, I don't know if you're using different VIPs for load balancing or lack of SNI support or whatnot, but if your certificate provides proof of authentication for all your hostnames (you'd probably need to use SubjectAltNames and maybe wildcards too) and the VIPs match, then Chrome & Firefox will send requests for those different hostnames over the same SPDY connection.
The example below isn't the most scientific, but should give you a rough idea.
Test index page with different static URLs:
URLs as 4cdn.org -- 23261 bytes compressed
URLs as 4chan-cdn.org -- 23311 bytes compressed
URLs as 4chan.org (control) -- 23278 bytes compressed