Reddit has updated its robots.txt to block all web crawlers

dunno7456 · 2024-07-04T11:08:00 1720091280

"to tell crawler to not crawl" which can be ignored AFAIK

schroeding · 2024-07-04T12:24:25 1720095865

It can be ignored (it's the equivalent of a "keep out" sign on a lawn), but I very much doubt Google et al. (Edit: Oops, Bing et al.) will actually ignore it.

Anarch157a · 2024-07-04T12:34:35 1720096475

The article says Google is paying Reddit to get the data directly from their firehose API, so they won't even bother crawling the public website.

The_Colonel · 2024-07-04T13:14:19 1720098859

I wonder how much they pay. Reddit profits a lot from showing up at the top for many search queries. I very often search for "whatever I'm looking for reddit" (e.g. for product reviews), since the Reddit results often provide higher quality information than normal results.

toomuchtodo · 2024-07-04T15:08:25 1720105705

$60M/year

https://www.reuters.com/technology/reddit-ai-content-licensi...

rbetts · 2024-07-04T12:59:04 1720097944

I wonder if these indexing deals will become more antitrust evidence.

dzhiurgis · 2024-07-04T21:42:40 1720129360

Google sometimes ignores it when that makes sense (i.e. a big bank accidentally adds its login page to the disallow list) or to check for spam activity (in which case Google doesn't use its bot user agent).

littlecranky67 · 2024-07-04T11:18:28 1720091908

> User-agent: *

> Disallow: /

Uh oh, that means all search engines are gonna delist Reddit content.
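For anyone who wants to see what those two lines mean to a well-behaved crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed` and the `ExampleBot` user agent are made up for illustration. The check runs entirely on the crawler's side, so it only has an effect if the crawler chooses to perform it.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url.

    Hypothetical helper: it parses rules from a list of lines instead of
    downloading a live robots.txt, so it works offline.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# The two rules quoted above: every user agent, every path disallowed.
RULES = ["User-agent: *", "Disallow: /"]

# "ExampleBot" is a made-up crawler name used only for this sketch.
print(is_allowed(RULES, "ExampleBot", "https://www.reddit.com/r/programming/"))
```

A crawler that never calls something like `can_fetch` is not stopped by anything here, which is the "can be ignored" part.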

contrarian1234 · 2024-07-04T11:22:32 1720092152

That's probably the whole point ..

I'd say I add +reddit to a third of my searches these days

Now I'll have to go to their shitty built-in search and they can algorithmically feed me garbage and make lots of money from people that pay them

littlecranky67 · 2024-07-04T11:24:00 1720092240

Smart move actually. The "+reddit" is something more and more people do on Google, and on kagi.com Reddit results are usually ranked No. 1. Seems Google's search result quality will drop even further now.

EDIT: Article explains there is a $60M deal under which Google uses Reddit's API so that it can continue delivering results from Reddit. Will only hurt smaller search engines, like Kagi :(

Expurple · 2024-07-04T12:26:58 1720096018

Kagi in particular shouldn't be affected. Apart from using its own index, it also aggregates results from other search engines, most likely including Google. https://help.kagi.com/kagi/search-details/search-sources.htm...

seeknotfind · 2024-07-04T16:00:12 1720108812

All of Reddit was freely and readily available just a few years ago. Just goes to show - archive and save what you love.

nojvek · 2024-07-04T12:13:20 1720095200

Reddit making deals with search engines and AI companies for millions of dollars.

Public data belong to Reddit to sell. Makes sense, why would they give it away for free when they can charge for it.

b3ing · 2024-07-04T11:32:51 1720092771

I figured it was to warn people not to use “their” data for AI. The data belongs to their users though

nunez · 2024-07-04T13:42:12 1720100532

God fucking damn it.

"User privacy" my ass. This is a pure lock-in play.

Sorry for the swear words. Reddit was _the_ way I got honest reviews about restaurants, products, and damn near everything, but their search engine was horrible and the platform is very clearly built to drive engagement.

I hate what the Internet has become. I guess it's time to go through the book list I've accumulated over the years.