This seems more like a DailyWTF story than a sweet optimization to be reminiscing about. I'd love to hear more details about this. I mean, a single video is roughly equivalent to tens of thousands of 404 pages.
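A rough back-of-envelope check of that ratio. Both sizes are hypothetical assumptions (a ~5 KB error page and a ~100 MB video); neither figure comes from the thread:

```python
# Hypothetical sizes; neither figure comes from the thread.
PAGE_404_BYTES = 5 * 1024          # assumed ~5 KB error page
VIDEO_BYTES = 100 * 1024 * 1024    # assumed ~100 MB video

# How many 404 responses add up to the bytes of one video?
ratio = VIDEO_BYTES // PAGE_404_BYTES
print(ratio)  # 20480
```

Under those assumptions one video is worth about twenty thousand 404s, which is why 404s only dominate bandwidth when they vastly outnumber real content requests.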
I'm not sure which is more mind-boggling: spending 66% of their bandwidth and 50% of their CPU on serving these trivially cacheable pages, or the fact that they didn't correct the problem as soon as it was consuming more than 5%.
If you read the entire thing (not just this one selected comment), you will find that they had just re-engineered their site from Drupal to Django, which means they would have had the fix in place within days, if not hours, of discovering the bottleneck.
I can confirm that videos, articles, images, and 404 pages are all served by a CDN. Our 404 pages were not cached by the CDN. Allowing them to be cached reduced the origin penetration rate (the share of requests that pass through the CDN to our servers) enough to cut outgoing bandwidth by 66% compared with serving uncached 404s.
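A minimal sketch of making 404s cacheable at the CDN, assuming a plain WSGI app and a CDN that honours standard Cache-Control headers on error responses (some CDNs do not cache 404s unless told to). The body, the path and the 300-second max-age are hypothetical, not what this site actually used:

```python
from wsgiref.util import setup_testing_defaults

# Precomputed once at startup, so a 404 costs no template or database work.
NOT_FOUND_BODY = b"<h1>404 Not Found</h1>"

# Hypothetical TTL (seconds) a shared cache such as a CDN edge may keep the 404.
NOT_FOUND_MAX_AGE = 300


def app(environ, start_response):
    """Serve '/' normally; every other path gets a CDN-cacheable 404."""
    if environ.get("PATH_INFO") == "/":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"home"]
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(NOT_FOUND_BODY))),
        # 'public' plus 'max-age' allow shared caches to store the error page,
        # so repeat requests for the same bad URL stop reaching the origin.
        ("Cache-Control", f"public, max-age={NOT_FOUND_MAX_AGE}"),
    ]
    start_response("404 Not Found", headers)
    return [NOT_FOUND_BODY]


# Usage: simulate one request for a missing page.
environ = {}
setup_testing_defaults(environ)
environ["PATH_INFO"] = "/no-such-page"
captured = {}


def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


body = b"".join(app(environ, start_response))
print(captured["status"], captured["headers"]["Cache-Control"])
# 404 Not Found public, max-age=300
```

The max-age is a trade-off: longer values offload more traffic from the origin, but a page that later starts to exist stays a 404 at the edge until the cached copy expires or is purged.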
Edit: This is not to say that our 404s were not cached at our origin. Our precomputed 404 was cached and served on every connection without a database hit; however, each request still invoked the regular expression engine for URL pattern matching and taxed the machine's network I/O.
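A minimal sketch of why a cached 404 still costs CPU at the origin, assuming a Django-style router that tries regex URL patterns in order. The patterns and view names are hypothetical. A URL that matches nothing must be tested against every pattern before the precomputed 404 can be returned (this sketch covers only the regex cost, not the network I/O):

```python
import re

# Hypothetical URL patterns, tried in order as in a Django-style urlconf.
URL_PATTERNS = [
    (re.compile(r"^/$"), "home"),
    (re.compile(r"^/videos/(?P<slug>[\w-]+)/$"), "video"),
    (re.compile(r"^/articles/(?P<year>\d{4})/(?P<slug>[\w-]+)/$"), "article"),
    (re.compile(r"^/images/(?P<name>[\w.-]+)$"), "image"),
]

# The 404 body is computed once, so a miss needs no database query.
PRECOMPUTED_404 = b"<h1>404 Not Found</h1>"


def resolve(path):
    """Return (view_name or None, number of regex evaluations performed)."""
    attempts = 0
    for pattern, view_name in URL_PATTERNS:
        attempts += 1
        if pattern.match(path):
            return view_name, attempts
    # No pattern matched: every regex was run before falling through.
    return None, attempts


def handle(path):
    view_name, attempts = resolve(path)
    if view_name is None:
        return 404, PRECOMPUTED_404, attempts
    return 200, view_name.encode(), attempts


status, body, attempts = handle("/no-such-page")
print(status, attempts)  # 404 4
```

With a CDN caching the 404, repeat requests for the same bad URL never reach this loop at all, which is where the origin savings come from.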