Meh, I'm always super unimpressed when simple text-based websites have trouble scaling. Everything that's highly requested should be held in memory, and it should be trivial to spit it out instantly.
I'm not a scaling wizard, but I'd guess 99/100 times the reason CRUD apps have problems scaling is because they are over-engineered, and there is a tendency to solve scaling issues by adding another layer of complexity instead of optimizing the root application.
4chan is an image board. None of the content is there long enough to be considered "highly requested", and like half the posts have JPEGs and PNGs attached.
Besides, how does caching solve the "My bandwidth bills are killing my wallet!" issue?
Here's how I would scale 4chan. All static items served from s3/cloudfront. Posted images pushed to s3/cloudfront (or wherever file hosting is cheapest). Those are all the high-bandwidth items. All that's left is the text/html, which isn't that much work.
I'd argue that all of the text could be served from 1 nice box if you wanted to (multiple boxes make it more complicated, but not that much more). Send the post to each box, add it to a table/indexes in memory, and write the post to disk and backups for recovery purposes. Then either update and cache every page the new post affects, or mark the pages dirty and update and cache them the next time they are requested.
Done, all pages served out of memory super fast, what am I missing?
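To make the "mark the pages dirty and rebuild on the next request" idea concrete, here's a rough single-box sketch. Everything in it is made up for illustration (the `BoardCache` name, the one-page-per-thread layout, the tab-separated log file); it is not how 4chan actually works, and it leaves out backups, eviction, and locking.

```python
import html
from typing import Dict, List, Set


class BoardCache:
    """Hypothetical in-memory store: posts per thread plus lazily rebuilt pages."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path                  # append-only file used for recovery
        self.posts: Dict[int, List[str]] = {}     # thread id -> post bodies
        self.pages: Dict[int, str] = {}           # thread id -> rendered HTML
        self.dirty: Set[int] = set()              # threads whose cached page is stale
        self.renders = 0                          # counts rebuilds, for illustration

    def add_post(self, thread_id: int, body: str) -> None:
        # Write to disk first so the post survives a crash.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(f"{thread_id}\t{body!r}\n")
        # Update the in-memory table, then mark the affected page dirty
        # instead of re-rendering it right away.
        self.posts.setdefault(thread_id, []).append(body)
        self.dirty.add(thread_id)

    def get_page(self, thread_id: int) -> str:
        # Rebuild only if the page is stale or has never been rendered.
        if thread_id in self.dirty or thread_id not in self.pages:
            self.pages[thread_id] = self._render(thread_id)
            self.dirty.discard(thread_id)
        return self.pages[thread_id]

    def _render(self, thread_id: int) -> str:
        self.renders += 1
        items = "".join(
            f"<li>{html.escape(body)}</li>" for body in self.posts.get(thread_id, [])
        )
        return f"<ul>{items}</ul>"
```

With this shape, a thread that gets ten posts between two page views is rendered once, not ten times. The eager variant ("update and cache every page the new post affects") would call `_render` inside `add_post` instead.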
As far as bandwidth bills go, most browsers observe cache settings and won't re-download what they have already downloaded. His complaint is about getting hit too hard serving the html/text, not the images.
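For what "observing cache settings" looks like on the server side, here's a minimal sketch of a conditional response using `ETag` and `Cache-Control`. The `respond` function and the 30-second `max_age` default are assumptions for illustration, and it only handles a single exact `If-None-Match` value (real headers can carry a list or weak validators).

```python
import hashlib
from typing import Dict, Optional, Tuple


def respond(
    body: bytes, if_none_match: Optional[str], max_age: int = 30
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a page, honoring If-None-Match."""
    # Derive the validator from the content, so it changes whenever the page does.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {"ETag": etag, "Cache-Control": f"public, max-age={max_age}"}
    if if_none_match == etag:
        # The client already has this exact version: send headers only.
        return 304, headers, b""
    return 200, headers, body
```

A 304 still costs a request, but the payload is empty, so an unchanged page costs almost no bandwidth on a repeat visit.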
> All static items served from s3/cloudfront. Posted images pushed to s3/cloudfront. (Or wherever filehosting is cheapest).
Congratulations, you just massively blew out your bandwidth bill. The cheapest option is to host your static content yourself, especially if you're serving over a petabyte of it per month.
It really doesn't matter where you host the static/image files, as long as it's completely separate from the application server and doesn't eat into its resources.
Yea, I see that now. He's complaining about both. Still, asking 4chan add-on developers to be courteous seems pretty silly. Gotta detect/ban/rate-limit them on the back-end.
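One common way to rate-limit on the back-end is a per-client token bucket. This is only a sketch under my own assumptions: the `TokenBucketLimiter` name, keying by an opaque client string (IP, API key, whatever), and the injectable `clock` are all made up here, and nothing in the thread says this is what 4chan does.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Hypothetical per-client token bucket: `rate` tokens/second, up to `burst`."""

    def __init__(
        self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        # client key -> (tokens remaining, time of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(client, (float(self.burst), now))
        # Refill for the time elapsed since the last request, capped at burst.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[client] = (tokens - 1.0, now)
            return True
        # Out of tokens: the caller would answer with HTTP 429 or similar.
        self.buckets[client] = (tokens, now)
        return False
```

A well-behaved add-on polling every few seconds never notices the limit, while one hammering the server runs out of tokens and gets rejected until the bucket refills.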
> Notice that I say proxied and not cached. CloudFlare does not cache HTML—every connection/request for HTML is passed on to our servers, and our server must send a response.
CloudFlare is not caching his html, thus the performance problems, because his backend is probably dog-slow.