
In general we want to measure what matters to users, so removing a lot of spam that nobody sees doesn't really make anything better.

By the same token, if you are trying to launch a new spam classifier that has some false positives, and one of those false positives is yahoo or facebook, it doesn't really matter how good the classifier is otherwise; it will never be worth the collateral damage.

As a result, rather than measuring precision/recall as a percentage of domains or as a percentage of URLs, we usually try to measure it as a percentage of results that appeared on a search result page, or of results that users click on, mined from our logs.
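
To make that concrete, here is a minimal sketch (not Google's actual pipeline; the log fields, URLs, and numbers are all made up) of how the same classifier judgments look very different once precision/recall is weighted by impressions instead of counted per URL:

    from dataclasses import dataclass

    @dataclass
    class LogRow:
        url: str
        impressions: int   # times the result appeared on a search result page
        clicks: int        # times users clicked it
        flagged: bool      # classifier says "spam"
        is_spam: bool      # ground-truth label

    def weighted_precision_recall(rows, weight):
        tp = sum(weight(r) for r in rows if r.flagged and r.is_spam)
        fp = sum(weight(r) for r in rows if r.flagged and not r.is_spam)
        fn = sum(weight(r) for r in rows if not r.flagged and r.is_spam)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

    rows = [
        LogRow("facebook.com/somepage", 1_000_000, 400_000, flagged=True,  is_spam=False),
        LogRow("spam-1.example/x",              3,       0, flagged=True,  is_spam=True),
        LogRow("spam-2.example/y",              2,       0, flagged=False, is_spam=True),
    ]

    # Counting URLs, the classifier looks mediocre but tolerable (P=0.5, R=0.5).
    print(weighted_precision_recall(rows, weight=lambda r: 1))
    # Weighted by impressions, the single high-traffic false positive dominates
    # and precision collapses to ~3e-6: the collateral damage swamps the wins.
    print(weighted_precision_recall(rows, weight=lambda r: r.impressions))

The point is that the denominator becomes "what users actually saw (or clicked)", so one huge false positive outweighs millions of zero-traffic spam pages.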

This is one of a bajillion reasons why it's absurd to expect Google to throw away all of its log data. The logs are essential to drawing any meaningful conclusions about the current quality of our search, let alone to finding ways to improve it.



Thanks.

Am I right in understanding that if a SERP result for a given keyword doesn't get clicked by users enough, it will be removed?

By the same token: does the result's bounce rate matter? I imagine spammy sites have a very high bounce rate.


"As a result, rather than measuring precision/recall as a percentage of domains or as a percentage of urls we usually try to measure it as a percentage of results that appeared on a search result page or results that users click on, mined from our logs."

So this basically means that you are able to discern content spam on authoritative domains (facebook, wordpress.com, etc.) based on CTR, bounces, and impressions compared to surrounding SERP results, rather than by comparing that data against the parent domain as a whole?


No, not at all. This is just talking about the granularity at which we compute metrics like "spam rate."
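
For example (made-up numbers, purely to illustrate the granularity point): the same spam judgments give a very different "spam rate" depending on whether you count URLs or impressions.

    # Hypothetical log rows for one query: (url, impressions, is_spam)
    results = [
        ("blogspam.example/a",      10, True),
        ("blogspam.example/b",       5, True),
        ("news.example/story",  50_000, False),
    ]

    def spam_rate(rows, weight):
        total = sum(weight(r) for r in rows)
        return sum(weight(r) for r in rows if r[2]) / total if total else 0.0

    print(spam_rate(results, weight=lambda r: 1))     # per URL: ~0.67 of URLs are spam
    print(spam_rate(results, weight=lambda r: r[1]))  # per impression: ~0.0003 of what users see is spam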





