
Small things add up: 4chan's migration to a cookieless domain - moot
http://chrishateswriting.com/post/68794699432/small-things-add-up
======
quchen
> 50 bytes may not seem like a lot, but when you’re serving 500 million
> pageviews per month, it adds up.

That's a hollow argument in the article. If pages were 500 bytes each, this
would indeed be a huge improvement, but no page is 500 bytes. I just opened
4chan.org, and the markup _before <body>_ is already 1,836 bytes. The entire
front page of /b/ is 114,428 bytes, so saving 50 is absolutely negligible. On
the other hand, if saving single bytes were significant, there would be far
more potential in the source -- shortening CSS class names and the like --
than in picking a short domain name.

EDIT: According to
[http://www.4chan.org/advertise](http://www.4chan.org/advertise), there are a
total of 575,000,000 monthly page impressions.

~~~
moot
You're absolutely right -- it _is_ negligible. When you're serving upwards of
a petabyte per month, 23 GB isn't exactly a lot!
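
For scale, a quick back-of-the-envelope check on that figure (a sketch, using
the ~500 million monthly pageviews from the quoted line):

    // Sketch: per-pageview saving times monthly pageviews (figures from
    // this thread, not exact measurements).
    const bytesSavedPerPage = 50;
    const pageviewsPerMonth = 500_000_000;
    const savedBytes = bytesSavedPerPage * pageviewsPerMonth; // 25 billion bytes
    console.log(savedBytes / 2 ** 30); // ~23.3 GiB per month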

> On the other hand, if saving single bytes was significant, there would be a
> lot more potential in the source by shortening CSS class names etc. rather
> than picking a short domain name.

Also spot on, but the point I was trying to make is that I was given the
choice between a longer domain and a shorter one, and the shorter one resulted
in a smaller page size, which does yield some (though, as you put it,
negligible) savings in transfer. CSS/JS refactoring and pruning would
definitely be better bang for your buck if your goal were solely to reduce
page weight, but my primary goal was to decrease request overhead and this was
just a side benefit at no additional cost to me.

As an aside, I'd say the non-technical benefit of the longer domain
(4chan-cdn.org) would have been avoiding user confusion, but I feel this is
mitigated since visiting 4cdn.org directly bounces you to www.4chan.org, and
our custom error pages make 4cdn.org clearly 4chan-related.

~~~
quchen
Right, if you use a short domain name "because why not" for something new,
that's alright. It was just the "it adds up" part I was commenting on.

Do you have (well, you're moot, so -- want to share) some more data about
4chan's current size? I'm sure lots of people would be interested in hearing
about that. (It would probably make a great standalone post.)

~~~
Killswitch
I'd be very interested in this... I used to visit 4chan all the time back in
the day, not so much now, but I still enjoy reading statistics about the site.

Thanks for this article, moot.

~~~
moot
I've been meaning to write a post about all of the weird stuff we do in the
interest of maximizing our limited resources. We've always had to stretch
things as far as possible given server, financial, and time constraints, which
has led to some interesting/unorthodox "solutions."

~~~
Killswitch
That's what interests me about 4chan's story... How you do that is amazing.

------
tomp
> If you’ve been linked directly to a Facebook photo, you may have noticed the
> domain wasn’t facebook.com, but instead something like fbcdn-x-x.akamaihd.net.
> Large sites load static content from special domains for a few reasons, but
> primarily to reduce request overhead, and sometimes security.

This works for 4chan, but in Facebook's case it actually reduces security:
since cookies cannot be checked for photos on static domains, _everybody_ can
access _every_ photo (as long as they are given a URL), regardless of the
photo's privacy settings. Facebook probably uses sufficiently random URLs to
mostly mitigate the issue, but a naive implementation could be very
problematic.
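
A common mitigation (a sketch of the general idea, not Facebook's actual
scheme; names are made up) is to put a long random token in each photo's URL
so it can't be enumerated:

    import { randomBytes } from "crypto";

    // Hypothetical: generate an unguessable path segment once at upload
    // time and store it with the photo record. 16 random bytes = 2^128
    // possibilities, far too many to enumerate by brute force.
    function newPhotoPath(photoId: number): string {
      const token = randomBytes(16).toString("hex");
      return `/photos/${photoId}-${token}.jpg`;
    }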

~~~
gokhan
If you can give someone the URL of a photo you're allowed to see, you can just
as well give them the photo itself. I don't see the security risk here.

~~~
trothoun
If the URLs are guessable, then someone could harvest images without ever
having been given URLs by authorized users.

------
largehotcoffee
A quick look shows that you could minify the JavaScript quite a bit more
(which is relatively easy and would save a lot more than 50 bytes). You might
also look at inlining all of the JavaScript directly into the HTML (this is
what Google does), which saves the extra HTTP requests. You could do the same
with some of the persistent images on the page (the logo) by inlining them as
base64 (though I don't think that actually helps with page size).
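
For reference, base64 inlining looks something like this (a sketch; the file
name is made up, and note the encoding inflates the bytes by roughly a third):

    import { readFileSync } from "fs";

    // Hypothetical: embed a small logo as a data URI so it costs zero
    // extra HTTP requests (at the price of ~33% base64 overhead).
    const logo = readFileSync("logo.png");
    const dataUri = `data:image/png;base64,${logo.toString("base64")}`;
    const imgTag = `<img src="${dataUri}" alt="logo">`;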

Finally, instead of using the browser extensions (which are helpful), get
actual PageSpeed installed on the servers!

Apache:
[https://developers.google.com/speed/pagespeed/module](https://developers.google.com/speed/pagespeed/module)
nginx:
[https://github.com/pagespeed/ngx_pagespeed](https://github.com/pagespeed/ngx_pagespeed)

Regardless, awesome stuff. Big fan. Much love.

~~~
moot
Thanks for pointing that out. We minify production JS using Closure Compiler,
but sometimes that still leaves room for improvement.

There's definitely a tradeoff between inlining JS and small images, but I
think in our case it makes more sense to leave them external to leverage
browser (and, since we use a CDN for static content, edge) caching.

I believe I tried to get ngx_pagespeed up and running when it was announced,
but couldn't get it to compile from source. Sometimes (read: often) it sucks
to be a FreeBSD user.

~~~
Kudos
I've found that UglifyJS does about as good a job as Closure Compiler, but
waaay faster. Could be a nice speed-up in deploy times for you.

~~~
lennel
Closure beats Uglify on size when compiling with advanced optimisations.

~~~
Kudos
I found that the advanced optimisations broke our JavaScript when I compared
the two a couple of years back. We weren't interested in rewriting our
JavaScript to make it compatible with Closure.

~~~
lennel
The thing that will break your code is this type of notation this['function']
since the compiler can have no idea what renaming should apply. There are
aspects of the library that lets you expose public apis.

I can really recommend the compiler in AO mode, the type saving is insane (75%
reduction in file size) and type checking is sweet
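
To illustrate the renaming hazard (a minimal sketch of the failure mode, not
anyone's actual code):

    // Closure's advanced mode renames properties, so after compilation
    // `userName` might become `a`.
    const obj = { userName: "anon" };
    console.log(obj.userName);    // dot access is renamed consistently -- works
    console.log(obj["userName"]); // quoted access is left as-is -- now undefined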

------
doctorfoo
Hey moot, why don't you let Pass users bypass individual IP blocks?

I refuse to disable my VPN, so it means I can't post any more. It's a shame
4chan has no way for privacy-conscious users to post, especially given your
support of StopWatching.us and the like.

~~~
jhgg
You can... you just have to shell out money for a 4chan pass :\

~~~
doctorfoo
I have one.

> "4chan Pass users may bypass ISP, IP range, and country blocks" > "Pass
> users cannot bypass individual (regular) IP bans."

So, if some random spammer uses the same VPN server, it gets blocked by an
individual ban. This rapidly happens to all popular shared VPNs.

------
adriancooney
I never even contemplated the size of cookies before seeing this. It never
occurred to me that they could create so much overhead. It'd be incredibly
handy if we could set a header like `x-send-cookies: no` to stop the browser
from sending cookies along with requests for static content. Great post, a
real eye-opener.

~~~
reddiric
Set a header where? (On which request, from whom, to whom?)

~~~
dnissley
The initial request from browser to server, I presume.

------
gambler
Maybe it's just me, but I hate how current web technologies force people to
register separate domains for static content. This is not how domains are
supposed to work.

~~~
cheald
They don't. You could easily serve your site off of www.domain.com and your
CDN off of cdn.domain.com.

~~~
bjt
But then you still have to remain vigilant against a clueless dev or a random
JS lib on www.domain.com setting a cookie for .domain.com, which your browser
will helpfully include with requests to cdn.domain.com. With a completely
separate root, you're protected from that.

[http://en.wikipedia.org/wiki/HTTP_cookie#Domain_and_Path](http://en.wikipedia.org/wiki/HTTP_cookie#Domain_and_Path)
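
Concretely, the scoping difference looks like this (illustrative cookie
values):

    Set-Cookie: session=abc123; Domain=www.domain.com
        -> sent to www.domain.com, not to cdn.domain.com
    Set-Cookie: session=abc123; Domain=.domain.com
        -> sent to www.domain.com AND cdn.domain.com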

~~~
sliverstorm
So basically it doesn't _have_ to be that way, but to protect yourself from
cluelessness/stupidity, you do it.

Kind of like Unix permissions vs. jails/virtual machines. Both are secure, but
one is more secure against incompetence than the other.

~~~
moot
I wouldn't agree. There are completely legitimate reasons to set cookies on
*.domain.com -- it isn't "clueless/stupid" to do so, just less ideal.

~~~
sliverstorm
I wasn't making a statement, just trying to clarify and then making an analogy
to verify my understanding. I think I omitted a question mark where I should
have had one.

Certainly, it is not _automatically_ a bad idea to set such cookies. I see
that.

------
mapgrep
I'm very curious why an anonymous site that doesn't even allow registered
users gets 100k worth of cookies on a typical connection.

Not saying you're doing anything wrong, just curious. I assume some of it is
for ad tracking, but that's still a hell of a lot of data!

~~~
moot
Well, it's a single kilobyte, but 100 KB in aggregate.

It's almost entirely Google Analytics, unfortunately. Our ads are served from
a different domain (4chan-ads.org) specifically for this reason (user privacy
and cookie bloat).

~~~
bdt101
The newest version of Google Analytics eliminates most of the cookie bloat.
There's now just a single ID cookie of around 30 bytes.

[https://developers.google.com/analytics/devguides/collection/upgrade/](https://developers.google.com/analytics/devguides/collection/upgrade/)
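
(For reference, the new-style cookie is just a client ID plus a timestamp,
something like `_ga=GA1.2.1234567890.1387262471` -- values made up.)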

~~~
moot
Wasn't aware of this (but was hoping it was in the works) -- thanks a bunch!

------
slyall
We wanted to do this at a news site I worked for (we had way too many
cookies), but the problem was Google News.

On Google News, thumbnails would not be shown unless they came from the same
domain as the article page.

So our content was on "www.example.com" and pictures were on
"media.example.com", but the cookies were set for "example.com" and so got
sent with every image request.

~~~
aidenn0
Set cookie for www.example.com?

~~~
slyall
Unfortunately they used various subdomains for things.

------
d0ugie
moot, given that your site seems like an ideal candidate for WebP, why not use
it with PageSpeed (and/or let users opt in to the format when posting)?

~~~
asiekierka
In order to stay compatible with browsers that don't support WebP, moot would
have to keep two versions of every image file on disk, causing a lot of pain
and wasted disk space.

It could work for the smaller boards, though...
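
The two-copy approach would hinge on content negotiation, roughly like this (a
hypothetical sketch, not how 4chan serves images):

    import { createServer } from "http";
    import { existsSync, createReadStream } from "fs";

    // Hypothetical: serve image.webp to browsers that advertise WebP
    // support, image.jpg to everyone else -- which is exactly why two
    // copies of every image would have to live on disk.
    createServer((req, res) => {
      const acceptsWebp = (req.headers.accept ?? "").includes("image/webp");
      const useWebp = acceptsWebp && existsSync("image.webp");
      res.setHeader("Content-Type", useWebp ? "image/webp" : "image/jpeg");
      res.setHeader("Vary", "Accept"); // caches must key on the Accept header
      createReadStream(useWebp ? "image.webp" : "image.jpg").pipe(res);
    }).listen(8080);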

~~~
TazeTSchnitzel
A board for WebP animations would be amazing.

------
webhat
A way to save ~33 raw bytes by using localStorage was recently suggested in
the HTML5 Boilerplate issues.

[https://github.com/h5bp/html5-boilerplate/issues/1444](https://github.com/h5bp/html5-boilerplate/issues/1444)

------
lucb1e
Now, what about the extra DNS lookup? That adds one round-trip time for the
user, plus 20-40 bytes of IP header (v4/v6), plus 8 bytes of UDP header, plus
~25 bytes for the query and ~100 bytes for the reply.
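
In rough numbers (a sketch using the IPv4 figures above):

    // One extra DNS lookup; the query and the reply each carry an
    // IP + UDP header.
    const ipHeader = 20, udpHeader = 8, query = 25, reply = 100;
    const dnsOverheadBytes = 2 * (ipHeader + udpHeader) + query + reply; // 181
    // Paid once per client until the record falls out of cache, versus
    // ~50 bytes saved on every pageview thereafter.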

~~~
TorKlingberg
Wouldn't that be cached after the first load?

~~~
pw77
Doesn't that depend on the browser? Chromium definitely caches DNS records
(chrome://net-internals/#dns). I remember Firefox relying on a
system-configured DNS cache (dnsmasq or pdnsd on Linux).

------
jhgg
4chan uses CloudFlare, right?

Have you investigated the gains from using CloudFlare's Railgun? It seems like
it'd save quite a bit of bandwidth on your end.

------
gabriel34
I'm a bit paranoid, don't trust Google, and never really liked the idea of
reCAPTCHA on 4chan because of the illusion of anonymity found there. The
cherry on top is finding out it uses Google Analytics. (Mind you, I haven't
been there in a while; back then I wasn't nearly this concerned with privacy.)

~~~
ANTSANTS
I also wish 4chan would switch to open-source, self-hosted CAPTCHA and
analytics solutions. CAPTCHA especially, because while analytics scripts,
tracking images, etc. can easily be blocked, you cannot participate on the
site without allowing reCAPTCHA to constantly phone home to Google.

------
Jhsto
They should probably also force connections through their SPDY-supporting
HTTPS, rather than making it an option.

~~~
maxk42
A huge portion of their user base doesn't visit with SPDY-capable browsers.

~~~
moot
Actually 78.75% of our users are on Chrome/Firefox!

SSL is forced on the domain you post to (sys.4chan.org) with redirects and
HSTS, and we set cookies with the proper Secure and HttpOnly flags. Maybe some
day we'll force SSL site-wide, but I don't think that's the right decision for
now.
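
Those protections boil down to response headers along these lines
(illustrative values; the cookie name here is made up):

    Strict-Transport-Security: max-age=31536000
    Set-Cookie: pass_token=abc123; Secure; HttpOnly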

I definitely encourage people to use the EFF's wonderful HTTPS Everywhere
extension, though:
[https://www.eff.org/https-everywhere](https://www.eff.org/https-everywhere)

~~~
maxk42
78.75%? That's great news!

I imagined a lot of people would be using mobile, and I know Safari on iOS
doesn't support SPDY. Does this mean that >80% of users are browsing on
desktops, or is it possible there's a mobile app reporting a false user agent?

Or maybe all the iOS users fell victim to waterproof tests...

~~~
moot
We get surprisingly little mobile web traffic -- only 16% in November.

------
pieter
I doubt domain length will really make any difference; gzip compression should
take care of a longer domain name.

~~~
moot
The 50-byte figure represents a compressed response. (We actually write all of
our pages compressed to disk before serving them -- nothing is served
dynamically. But that's for another post...)

The example below isn't the most scientific, but it should give you a rough
idea.

    Test index page with different static URLs:
    URLs as 4cdn.org -- 23261 bytes compressed
    URLs as 4chan-cdn.org -- 23311 bytes compressed
    URLs as 4chan.org (control) -- 23278 bytes compressed
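
If you want to reproduce the comparison yourself, something like this works (a
sketch; assumes a local copy of the page saved as index.html):

    import { readFileSync } from "fs";
    import { gzipSync } from "zlib";

    // Swap the static-content hostname in a saved copy of the page and
    // compare compressed sizes.
    const page = readFileSync("index.html", "utf8");
    for (const host of ["4cdn.org", "4chan-cdn.org", "4chan.org"]) {
      const variant = page.replaceAll("4cdn.org", host);
      console.log(host, gzipSync(variant).length, "bytes compressed");
    }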

~~~
dingaling
So 4ch.io would have knocked off another 30 bytes or so?

~~~
moot
BRB, switching everything now!

------
AsymetricCom
I don't have any of these problems with 4chan because I block most of their
JS, I don't accept third-party cookies, and I definitely don't let any Google
APIs run on my local machine. Google's APIs are for Google's hardware, which
my hardware is not a subset of.

