In general we want to measure what matters to users, so removing a lot of spam that nobody sees doesn't really make anything better.
By the same token, if you are trying to launch a new spam classifier that has some false positives and one of them is Yahoo or Facebook, it doesn't really matter how good the classifier is otherwise: it will never be worth the collateral damage.
As a result, rather than measuring precision/recall as a percentage of domains or as a percentage of urls we usually try to measure it as a percentage of results that appeared on a search result page or results that users click on, mined from our logs.
This is one of the bajillion reasons why it's absurd to expect Google to throw away all its logs data. The logs are essential to coming to any meaningful conclusions about the current quality of our search, let alone finding ways to improve it.
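To make the difference between counting URLs and counting what users actually see concrete, here is a minimal sketch, not Google's actual pipeline. It assumes you already have ground-truth spam labels (e.g. from human raters) plus per-URL impression and click counts mined from logs; the `LoggedResult` record, the `precision_recall` helper, and all of the data are hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class LoggedResult:
    url: str
    is_spam: bool      # ground-truth label, assumed to come from human raters
    flagged: bool      # what the candidate spam classifier predicted
    impressions: int   # times the URL appeared on a results page (from logs)
    clicks: int        # times users clicked the URL (from logs)


def precision_recall(results: Iterable[LoggedResult],
                     weight: Callable[[LoggedResult], float] = lambda r: 1
                     ) -> Tuple[float, float]:
    """Precision/recall where each result counts weight(r) times instead of once."""
    results = list(results)
    tp = sum(weight(r) for r in results if r.flagged and r.is_spam)
    fp = sum(weight(r) for r in results if r.flagged and not r.is_spam)
    fn = sum(weight(r) for r in results if not r.flagged and r.is_spam)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical data: nine obscure spam pages caught, one spam page missed,
# and a single false positive on a very popular page.
results = [LoggedResult(f"spam{i}.example", True, True, 5, 1) for i in range(9)]
results.append(LoggedResult("missed-spam.example", True, False, 50, 10))
results.append(LoggedResult("popular.example", False, True, 10_000, 3_000))

print(precision_recall(results))                           # per URL: (0.9, 0.9)
print(precision_recall(results, lambda r: r.impressions))  # weighted by impressions
print(precision_recall(results, lambda r: r.clicks))       # weighted by clicks
```

Counted per URL, this classifier looks like 90% precision and 90% recall. Weighted by impressions or clicks, its precision falls below 1%, because the one popular false positive dominates everything users actually saw. That is the collateral-damage effect described above, and it only shows up if you have the logs to weight by.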
"As a result, rather than measuring precision/recall as a percentage of domains or as a percentage of urls we usually try to measure it as a percentage of results that appeared on a search result page or results that users click on, mined from our logs."
So this basically means that you are able to discern content spam on authoritative domains (Facebook, wordpress.com, etc.) based on CTR, bounce rate, and impressions compared to the surrounding SERP results, rather than comparing that data against the parent domain as a whole?