Google just cut off 90% of the internet from AI – no one's talking about it (reddit.com)
12 points by alexgotoi 3 hours ago | 14 comments




Clickbait title. They only cut off the AIs that were using Google as their crawler, which was never a good idea in the first place. I’d love to ask the developers of these AIs: what exactly did you expect to happen here?

Writing a web crawler is not too complicated.
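
It's really just a breadth-first fetch loop plus a link extractor; everything hard (robots.txt, politeness delays, dedup at scale) gets bolted on afterwards. A minimal stdlib-only Python sketch:

  # Minimal breadth-first crawler, standard library only.
  # Left out: robots.txt, politeness delays, dedup at scale.
  from collections import deque
  from html.parser import HTMLParser
  from urllib.parse import urljoin
  from urllib.request import urlopen

  class LinkParser(HTMLParser):
      def __init__(self):
          super().__init__()
          self.links = []

      def handle_starttag(self, tag, attrs):
          if tag == "a":
              for name, value in attrs:
                  if name == "href" and value:
                      self.links.append(value)

  def crawl(seed, max_pages=100):
      seen, queue, pages = {seed}, deque([seed]), {}
      while queue and len(pages) < max_pages:
          url = queue.popleft()
          try:
              html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
          except Exception:
              continue  # unreachable or non-text page: skip it
          pages[url] = html
          parser = LinkParser()
          parser.feed(html)
          for href in parser.links:
              absolute = urljoin(url, href)
              if absolute.startswith("http") and absolute not in seen:
                  seen.add(absolute)
                  queue.append(absolute)
      return pages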

I predict every 'AI' company will have a homegrown search engine in a few months to account for this.

The way this would become publicly usable is through the new generation of 'AI' browsers.


It is not crawling but indexing that is the problem. Google has, over the years, learned the patterns and authority of different articles. That will be hard for others to replicate, but not impossible.

What Google should do is offer API-based access to these providers, but a lot of these providers might not adhere to contracts. So there is that.


Indexing is a fairly well-understood technology.

You could hire one or two experts and make this doable with a pretty good amount of scalability.
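
The core structure is just an inverted index mapping each term to the documents containing it. A toy sketch (real engines layer tokenization, ranking signals, posting-list compression, and sharding on top):

  # Toy inverted index: term -> set of doc ids, the structure at
  # the core of every search engine.
  from collections import defaultdict

  def build_index(docs):
      index = defaultdict(set)
      for doc_id, text in docs.items():
          for term in text.lower().split():
              index[term].add(doc_id)
      return index

  def search(index, query):
      # AND semantics: docs containing every query term
      sets = [index.get(term, set()) for term in query.lower().split()]
      return set.intersection(*sets) if sets else set()

  docs = {1: "google cut off the crawlers", 2: "ai crawlers need an index"}
  print(search(build_index(docs), "crawlers index"))  # {2}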


Crawling isn't complicated. But ranking? That was Google's reason for existence for a very long time. It remains to be seen whether AI companies will be able to replicate that.
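
For context, the idea Google started from, PageRank, is itself only a short power iteration; the moat is the decades of signals layered on top since. A toy sketch over a made-up three-page link graph:

  # Toy PageRank by power iteration over a made-up link graph.
  def pagerank(links, damping=0.85, iterations=50):
      pages = list(links)
      n = len(pages)
      rank = {p: 1.0 / n for p in pages}
      for _ in range(iterations):
          new = {p: (1.0 - damping) / n for p in pages}
          for page, outgoing in links.items():
              if not outgoing:  # dangling page: spread its rank evenly
                  for p in pages:
                      new[p] += damping * rank[page] / n
              else:
                  for target in outgoing:
                      new[target] += damping * rank[page] / len(outgoing)
          rank = new
      return rank

  print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))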

Here's the interesting part - ranking matters when humans are looking at the results.

For a bot with a large context window though, not so much.



> You can no longer view 100 results at once. The new hard limit is 10.

Does Google not support lazy-loading more results, or is that not supported via the API? What's going on here?
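
If "via API" means the official Custom Search JSON API: it has long capped num at 10 per request and paginates with start, topping out (as far as I know) at 100 results total. Roughly like this, with placeholder credentials:

  import json
  from urllib.parse import urlencode
  from urllib.request import urlopen

  API_KEY, ENGINE_ID = "YOUR_API_KEY", "YOUR_CX_ID"  # placeholders

  def search_pages(query, pages=3):
      items = []
      for page in range(pages):
          params = urlencode({
              "key": API_KEY, "cx": ENGINE_ID, "q": query,
              "num": 10,               # hard per-request maximum
              "start": 1 + page * 10,  # 1-based offset for paging
          })
          url = "https://www.googleapis.com/customsearch/v1?" + params
          items.extend(json.load(urlopen(url)).get("items", []))
      return items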


Results are certainly thinner than in years past, when you could seemingly inspect the entire crawled corpus. Today you can search for a pretty broad topic and hit the wall fast. I think they are limiting the depth of queries these days, probably owing to search volume and the size of the cache they can sustain given current webdev standards. It was a different story 20 years ago, when websites were a few KB to a few MB, even though storage is "cheaper" today.

"Most large language models like OpenAI, Anthropic, and Perplexity rely directly or indirectly on Google's indexed results to feed their retrieval systems and crawlers."

Is this true?

I thought OpenAI was using Bing. Gemini will obviously use Google, but the restriction does not apply to them. Claude says it uses Brave.


Or OAI-SearchBot for the "web search" feature to augment queries, and GPTBot for training?

I swear I've read about how aggressive GPTBot is. Surely they aren't just googling stuff?

https://platform.openai.com/docs/bots
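
That page lists separate user agents, so a site can opt into search while opting out of training in robots.txt, e.g.:

  # opt in to search, opt out of training
  User-agent: OAI-SearchBot
  Allow: /

  User-agent: GPTBot
  Disallow: /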


I thought most of the major AI vendors (excluding Google) used their own crawlers and indexes, or licensed from a non-Google company.

> used their own crawlers and indexes

Not yet, but they will eventually, for sure.

If you're an expert who has worked on Google Search or something like that, this would be a great time to start a company for this.


Clickbait article.


