
He is saying that non-Google search engines are just repackaging Google search results because no one actually has the resources to build a search engine anymore. Bing was already caught doing this about 10 years back, if I remember correctly, and while I doubt Bing still does this, I bet a number of the smaller players do.

It's as easy as setting up a Lambda with headless Chromium and then integrating those results, on the fly, into whatever internal results your system has (if any). Google is fast enough that this could be done without any perceptible performance impact, and that's ignoring the possibility of massive caching of common searches. It will look like normal web traffic to Google, and the Lambda network will result in a diversity of IP addresses, so it might never get flagged.




It's quite easy for Google to catch this behaviour by issuing unique queries to the other engine and then looking for them in Google's web server logs, which means that all these companies have to get permission from Google to repackage the results.
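To make the unique-query idea concrete, here is a minimal sketch. The function names and the log line format are made up for illustration; nothing here reflects how Google actually does it.

```python
import secrets


def make_honeypot_query(prefix="hp"):
    """Return a query string that a real user is very unlikely to type."""
    # 8 random bytes -> 16 hex characters, so accidental collisions are negligible.
    return f"{prefix}-{secrets.token_hex(8)}"


def find_honeypot_hits(log_lines, honeypots):
    """Return (honeypot, log_line) pairs for every log line containing a honeypot."""
    hits = []
    for line in log_lines:
        for hp in honeypots:
            if hp in line:
                hits.append((hp, line))
    return hits
```

If a token that was only ever typed into the other engine later appears in your own request logs, something forwarded it to you.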


I presume you are referring to AWS Lambda, in which case wouldn't the IP addresses be within the IP range used by AWS? If so, Google might have already blocked those IPs.
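For what it's worth, checking whether an address falls inside a known set of ranges is only a few lines. A rough sketch, assuming you already have a list of CIDR blocks (AWS publishes its ranges in a file called ip-ranges.json); the blocks in the usage note are documentation placeholders, not real AWS ranges, and `in_any_range` is a name I made up.

```python
import ipaddress


def in_any_range(ip, cidr_blocks):
    """Return True if ip falls inside at least one of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa);
    # the membership test just returns False in that case.
    return any(addr in ipaddress.ip_network(block) for block in cidr_blocks)
```

With placeholder blocks `["192.0.2.0/24", "2001:db8::/32"]`, the address `192.0.2.17` matches and `203.0.113.5` does not. Whether Google actually blocks on this basis is a separate question.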


Yes, except they haven't because I've done exactly this in a lambda with no issues.


Bing got 'caught' doing something completely innocuous: using opt-in data about what Bing Toolbar users click on to improve search results. They never looked at what Google would return for any searches. Google had to install the Bing Toolbar, enable that data sharing, and actively submit their 'honeypot' keyword->click data to Bing before it would show up in Bing results.


> using opt-in data about what Bing Toolbar users click on

lol specifically, what Google search results Bing Toolbar users click on and what query they had entered in Google to get those results.


Specifically, every link they ever clicked on, on every web site.

Yes, some fraction of this was on Google. That's not cheating. No content from Google was transmitted back to Bing. Just the next site the users went to. Is Bing supposed to go through the logs looking for visits to Google and then delete the rest of the session because it might indirectly reveal something about what existed on the Google page?

I should own my records about what pages I go to, not Google. If I share them with Bing it's nobody else's business.


> every link they ever clicked on, on every web site. [...] No content from google was transmitted back to Bing. Just the next site the users went to.

Your theory cannot explain how Bing associated that "next site" with the specific search term the user had entered in Google.

If you had clicked a link on Hacker News, it wouldn't have shown up in Bing under some random search phrase. It's obvious that Microsoft parsed the search query out of the google.com URL, and the only reason why you'd do that is to mine what results were being presented for each search query.
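To show how little work that parsing is, here is a sketch of pulling the query out of a Google search URL, where the search terms sit in the `q` parameter. This is only my illustration, not Microsoft's code, and `search_query_from_url` is a made-up name.

```python
from urllib.parse import parse_qs, urlsplit


def search_query_from_url(url):
    """Return the search terms from a Google search URL, or None for anything else."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Accept google.com and its subdomains only.
    if host != "google.com" and not host.endswith(".google.com"):
        return None
    if parts.path != "/search":
        return None
    # parse_qs decodes percent-escapes and turns '+' into spaces.
    values = parse_qs(parts.query).get("q")
    return values[0] if values else None
```

A URL from any other site, such as a Hacker News item page, yields None, so only clicks that follow a Google results page can be tied to a search phrase this way.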



