The linked post itself says it's mostly spiders following links from various places on the web to old/invalid URLs:

> And the biggest performance boost of all: caching 404s and sending Cache-Control headers to the CDN on 404. Upwards of 66% of our server time is spent on serving 404s from spiders crawling invalid urls and from urls that exist out in the wild from 6-10 years ago.

They're still serving the 404 pages, but they're serving them from the CDN instead of from their main server.
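For anyone wondering what that looks like in practice, here's a rough sketch of the idea as a bare WSGI app. This is not their code: `KNOWN_PAGES`, `NOT_FOUND_MAX_AGE`, and the 300-second lifetime are all made up for illustration, and a real CDN may need its own configuration before it will cache error responses.

```python
# Hypothetical routing table standing in for the real site.
KNOWN_PAGES = {"/": b"home"}

# Assumed cache lifetime for 404s, in seconds; the post does not give a value.
NOT_FOUND_MAX_AGE = 300


def app(environ, start_response):
    """Serve known pages; mark 404s as cacheable so a CDN can absorb repeats."""
    path = environ.get("PATH_INFO", "/")
    body = KNOWN_PAGES.get(path)

    if body is not None:
        start_response("200 OK", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    body = b"Not Found"
    start_response("404 Not Found", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
        # Tells shared caches (the CDN) they may reuse this 404 response.
        ("Cache-Control", f"public, max-age={NOT_FOUND_MAX_AGE}"),
    ])
    return [body]
```

With a header like that, a spider hammering the same dead URL should mostly hit the CDN's cached 404 until the lifetime expires, rather than the origin server each time.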

Still, I wonder what Google does with PageRank that flows to a 404 versus a permanent redirect.

