Hacker News

When you say the algorithm affects 0.3% of searches on Google, do you mean that each query is selected with probability 0.003 and passed through this algorithm, or that the algorithm is applied 100% of the time whenever query A is followed by query B, and those combinations account for 0.3% of all searches? If it is the latter, then the fact that "Obama" is a magic keyword may mean you are biasing a very high percentage of political searches. I'm sure an election involving an incumbent is an edge case that is very difficult to account for, but now that we know about this, I hope Google will try to correct the results to remove these specific accidental biases in the future.
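To make the difference between the two readings concrete, here is a minimal sketch on an invented query log. The function names, the log, and the ("obama", "election") trigger pair are all hypothetical illustrations, not anything from Google's actual system. Both readings can produce the same 0.3% headline figure, but under the second one every matching pair is affected.

```python
import random

def affected_fraction(log, fires):
    """Share of (previous_query, current_query) pairs in `log` for which
    the predicate `fires` says the algorithm was applied."""
    if not log:
        return 0.0
    return sum(1 for pair in log if fires(pair)) / len(log)

def make_sampled(rate, seed=0):
    """Reading 1: each query is independently routed through the algorithm
    with probability `rate`, whatever its content."""
    rng = random.Random(seed)
    return lambda pair: rng.random() < rate

def make_conditional(trigger_pairs):
    """Reading 2: the algorithm always fires when the (A, B) pair is one of
    `trigger_pairs`, and never fires otherwise."""
    return lambda pair: pair in trigger_pairs

# Hypothetical log: 300 "political" pairs out of 100,000 total (0.3%).
political = [("obama", "election")] * 300
log = political + [("weather", "forecast")] * 99_700

conditional = make_conditional({("obama", "election")})
sampled = make_sampled(0.003)

overall_conditional = affected_fraction(log, conditional)          # 0.003
political_conditional = affected_fraction(political, conditional)  # 1.0
overall_sampled = affected_fraction(log, sampled)                  # roughly 0.003
```

Under the sampled reading, political queries are hit at about the same 0.3% rate as everything else; under the conditional reading, the same overall 0.3% is concentrated entirely on the matching pairs, which is exactly the worry about political searches.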



The solution for most of these sorts of things is just to refresh the data more quickly. Lots of queries, particularly head queries, are pretty stable in their characteristics over time. For a system that isn't absolutely critical to getting the query correct it's perniciously seductive to think that you can just push out the data once, then refresh it every quarter or so.

We used to have a problem with spelling where some news event would make a person with an uncommon name famous, but Google would mistakenly correct it to a more common but incorrect name, just because the spelling system had never seen this person's name before. We've fixed that issue and many other freshness-related things: http://googleblog.blogspot.com/2011/11/giving-you-fresher-mo... but this is an ongoing area of focus throughout a lot of our systems.
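The failure mode described here can be illustrated with a minimal sketch, assuming a simple Norvig-style corrector (single-edit candidates ranked by frequency). This is a stand-in technique, not Google's spelling system; the `SpellCorrector` class, its `refresh` method, and the "smith"/"smyth" counts are hypothetical. With a stale vocabulary, a newly famous rare name gets "corrected" to a common neighbour; once recent counts are folded in, it is left alone.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

class SpellCorrector:
    """Hypothetical corrector backed by a word-frequency table."""

    def __init__(self, counts):
        self.counts = Counter(counts)

    def refresh(self, recent_counts):
        # Fold in counts from recent data (e.g. fresh queries or news).
        self.counts.update(recent_counts)

    def correct(self, word):
        # Known words are never changed.
        if word in self.counts:
            return word
        # Otherwise pick the most frequent known word one edit away, if any.
        candidates = [w for w in edits1(word) if w in self.counts]
        if not candidates:
            return word
        return max(candidates, key=self.counts.__getitem__)

corrector = SpellCorrector({"smith": 1000, "john": 800})
before = corrector.correct("smyth")  # "smith": the rare name was never seen
corrector.refresh({"smyth": 50})     # newly popular name appears in fresh data
after = corrector.correct("smyth")   # "smyth": no longer "corrected"
```

The fix in this toy version is purely a data-freshness one: the correction logic is unchanged, and only the vocabulary it consults is updated sooner.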

It's an interesting problem because for many things, recomputing the data faster will only fix a handful of queries, so from a raw-impact standpoint it hardly seems worth it. However, those queries end up being ones that are in the news and related to things that people care a lot about.
