Spiders make up the vast majority of my 404s. They request URIs that in no sane world should exist. They request http://www.theonion.com/video/breaking-news-some-bullshit-ha...
They request http://www.theonion.com/articles/man-plans-special-weekend-t... even though that asset's domain is explicitly set to http://media.theonion.com/ and it is not a relative URL in the page source.
I can't fix a broken spider or tell it to stop requesting links that don't exist, but I still have to serve its 404s.
Edit: In other words, because of broken spiders that try to guess URLs, I have roughly 15-20x as many 404s as 200 article responses, and there's nothing I can do about it short of moving every single resource on my page into a CSS spritemap.
Edit #2: You can't see the URLs very clearly, but what's happening is that a spider finds a filename on our page, appends it to the end of the article URL, and requests that to see if it exists.
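To make the behavior concrete, here's a minimal sketch of what such a broken spider seems to be doing (the page and asset URLs below are made up for illustration): the page links an image by its full absolute URL on the media domain, but the spider keeps only the bare filename and resolves it against the article URL it's crawling.

```python
from urllib.parse import urljoin

# Hypothetical example URLs, not real ones from the site.
page_url = "http://www.theonion.com/articles/man-plans-special-weekend/"
asset_url = "http://media.theonion.com/images/photo.jpg"  # what the page actually links

# Buggy spider: strips the asset down to its filename...
filename = asset_url.rsplit("/", 1)[-1]
# ...then resolves that filename relative to the page being crawled.
guessed = urljoin(page_url, filename)

print(guessed)
# -> http://www.theonion.com/articles/man-plans-special-weekend/photo.jpg
```

That guessed URL never existed anywhere in the page source, so the only possible response is a 404.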