Google just cut off 90% of the internet from AI – no one's talking about it (reddit.com)
12 points by alexgotoi 3 hours ago | 14 comments




Clickbait title. They only cut off the AIs that were using Google as their crawler, which was never a good idea in the first place. I’d love to ask the developers of these AIs: what exactly did you expect to happen here?

Writing a web crawler is not too complicated.
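
It's really just a breadth-first fetch loop plus a link extractor; everything hard (robots.txt, politeness delays, dedup at scale) gets bolted on afterwards. A minimal stdlib-only Python sketch:

  # Minimal breadth-first crawler, standard library only.
  # Left out: robots.txt, politeness delays, dedup at scale.
  from collections import deque
  from html.parser import HTMLParser
  from urllib.parse import urljoin
  from urllib.request import urlopen

  class LinkParser(HTMLParser):
      def __init__(self):
          super().__init__()
          self.links = []

      def handle_starttag(self, tag, attrs):
          if tag == "a":
              for name, value in attrs:
                  if name == "href" and value:
                      self.links.append(value)

  def crawl(seed, max_pages=100):
      seen, queue, pages = {seed}, deque([seed]), {}
      while queue and len(pages) < max_pages:
          url = queue.popleft()
          try:
              html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
          except Exception:
              continue  # unreachable or non-text page: skip it
          pages[url] = html
          parser = LinkParser()
          parser.feed(html)
          for href in parser.links:
              absolute = urljoin(url, href)
              if absolute.startswith("http") and absolute not in seen:
                  seen.add(absolute)
                  queue.append(absolute)
      return pages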

I predict every 'AI' company will have a homegrown search engine in a few months to account for this.

The way this would become publicly usable is through the new generation of 'AI' browsers.


It is not crawling but indexing that is the problem. Google has, over the years, learned the patterns and authority of different articles. That will be hard for others to replicate, but not impossible.

What Google should do is offer API-based access to these providers, but a lot of these providers might not adhere to contracts. So there is that.


Indexing is a fairly well-understood technology.

You could hire one or two experts and make this doable with a pretty good amount of scalability.
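
The core structure is just an inverted index mapping each term to the documents containing it. A toy sketch (real engines layer tokenization, ranking signals, posting-list compression, and sharding on top):

  # Toy inverted index: term -> set of doc ids, the structure at
  # the core of every search engine.
  from collections import defaultdict

  def build_index(docs):
      index = defaultdict(set)
      for doc_id, text in docs.items():
          for term in text.lower().split():
              index[term].add(doc_id)
      return index

  def search(index, query):
      # AND semantics: docs containing every query term
      sets = [index.get(term, set()) for term in query.lower().split()]
      return set.intersection(*sets) if sets else set()

  docs = {1: "google cut off the crawlers", 2: "ai crawlers need an index"}
  print(search(build_index(docs), "crawlers index"))  # {2}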


Crawling isn't complicated. But ranking? That was Google's reason for existence for a very long time. It remains to be seen whether AI companies will be able to replicate that.
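
For context, the idea Google started from, PageRank, is itself only a short power iteration; the moat is the decades of signals layered on top since. A toy sketch over a made-up three-page link graph:

  # Toy PageRank by power iteration over a made-up link graph.
  def pagerank(links, damping=0.85, iterations=50):
      pages = list(links)
      n = len(pages)
      rank = {p: 1.0 / n for p in pages}
      for _ in range(iterations):
          new = {p: (1.0 - damping) / n for p in pages}
          for page, outgoing in links.items():
              if not outgoing:  # dangling page: spread its rank evenly
                  for p in pages:
                      new[p] += damping * rank[page] / n
              else:
                  for target in outgoing:
                      new[target] += damping * rank[page] / len(outgoing)
          rank = new
      return rank

  print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))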

Here's the interesting part - ranking matters when humans are looking at the results.

For a bot with a large context window though, not so much.



> You can no longer view 100 results at once. The new hard limit is 10.

Does Google not support lazy-loading more results, or is that not supported via the API? What's going on here?
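
If "via API" means the official Custom Search JSON API: it has long capped num at 10 per request and paginates with start, topping out (as far as I know) at 100 results total. Roughly like this, with placeholder credentials:

  import json
  from urllib.parse import urlencode
  from urllib.request import urlopen

  API_KEY, ENGINE_ID = "YOUR_API_KEY", "YOUR_CX_ID"  # placeholders

  def search_pages(query, pages=3):
      items = []
      for page in range(pages):
          params = urlencode({
              "key": API_KEY, "cx": ENGINE_ID, "q": query,
              "num": 10,               # hard per-request maximum
              "start": 1 + page * 10,  # 1-based offset for paging
          })
          url = "https://www.googleapis.com/customsearch/v1?" + params
          items.extend(json.load(urlopen(url)).get("items", []))
      return items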


Results are certainly thinner than in years past, when you could seemingly inspect the entire crawled corpus. Today you can search for a pretty broad topic and hit the wall fast. I think they are limiting the depth of queries these days, probably owing to search volume and the size of the cache they can sustain given current webdev standards. It was a different story 20 years ago, when websites were a few KB to a few MB, even though storage is "cheaper" today.

"Most large language models like OpenAI, Anthropic, and Perplexity rely directly or indirectly on Google's indexed results to feed their retrieval systems and crawlers."

Is this true?

I thought OpenAI was using Bing. Gemini will obviously use Google, but the restriction does not apply to them. Claude says it uses Brave.


Or OAI-SearchBot for the "web search" feature to augment queries, and GPTBot for training?

I swear I've read about how aggressive GPTBot is. Surely they aren't just googling stuff?

https://platform.openai.com/docs/bots
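
That page lists separate user agents, so a site can opt into search while opting out of training in robots.txt, e.g.:

  # opt in to search, opt out of training
  User-agent: OAI-SearchBot
  Allow: /

  User-agent: GPTBot
  Disallow: /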


I thought most of the major AI vendors (excluding Google) used their own crawlers and indexes, or licensed from a non-Google company.

> used their own crawlers and indexes

Not yet, but they will eventually, for sure.

If you're an expert who has worked on Google Search or something like that, this would be a great time to start a company for this.


Clickbait article.


