Okay, you guys obviously aren't getting the whole picture. A minority of our links point to old content whose URLs I no longer know. We redirect everything published within the last 5 years. Stuff originally published 6-10 years ago could potentially be redirected, but none of it came from a database; it was all static HTML in its initial incarnation, and the redirects weren't maintained. This was before I took charge of The Onion's link management.
I can't fix someone else's broken spider or tell it to stop requesting links that don't even exist, but I still have to serve the 404s.
Edit: In other words, because of broken spiders that try to guess URLs, I have roughly 15-20x as many 404 responses as 200 article responses, and there's nothing that can be done about it short of moving every single resource on my pages into a CSS sprite map.
Edit #2: You can't see the URLs very clearly, but what's happening is that a spider finds a filename on our page, appends it to the end of the page's URL, and requests that to see if it exists.
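To make the pattern concrete, here's a minimal sketch of what that broken spider seems to be doing. The function name and the example URLs are my own invention for illustration; the actual spider and site paths aren't known from the thread:

```python
def spider_guess(page_url: str, filename: str) -> str:
    """Reconstruct the broken spider's URL-guessing behavior:
    it treats the page URL itself as if it were a directory and
    tacks a filename it scraped from the page body onto the end,
    producing a URL that has never existed."""
    return page_url.rstrip("/") + "/" + filename

# Hypothetical example: the spider scrapes "masthead.png" out of an
# article page, then requests it relative to the article URL itself.
page = "https://example.com/articles/some-story.html"
guessed = spider_guess(page, "masthead.png")
print(guessed)
# The guessed URL ends in ".html/masthead.png", so the server can
# only ever answer it with a 404.
```

That's why every asset filename visible on a page multiplies into a guaranteed 404 for each article URL the spider crawls.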