Hacker News
Reddit has updated its robots.txt to block all web crawlers (stackdiary.com)
19 points by skilled 89 days ago | 15 comments



"to tell crawler to not crawl" which can be ignored AFAIK


It can be ignored (it's the equivalent of a "keep out" sign on a lawn), but I very much doubt Google et al. (Edit: Oops, Bing et al.) will actually ignore it.


The article says Google is paying Reddit to get the data directly from their firehose API, so they won't even bother crawling the public website.


I wonder how much they pay. Reddit profits a lot from showing up at the top for many search queries. I very often search for "whatever I'm looking for reddit" (e.g. for product reviews), since the Reddit results often provide higher-quality information than normal results.



I wonder if these indexing deals will become more antitrust evidence.


Google sometimes ignores it when that makes sense (e.g. a big bank accidentally adds its login page to the disallow list) or to check for spam activity (in which case Google doesn't use its bot user agent).


> User-agent: *

> Disallow: /

Uh oh, that means all search engines are going to delist Reddit content.
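
For anyone wondering what those two lines do in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The bot names and the example URL are made up for illustration. It only shows how a crawler that chooses to honor robots.txt would read the rules; nothing in it stops a crawler that ignores the file.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted above, as they would appear in a robots.txt file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical user agents and URL, chosen only for the demonstration.
    url = "https://www.reddit.com/r/programming/"
    for agent in ("ExampleBot", "AnotherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

Because the rule applies to `*` and disallows `/`, every user agent that checks the file is told to stay away from every path. Enforcement is entirely up to the crawler.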


That's probably the whole point...

I'd say I add +reddit to a third of my searches these days

Now I'll have to go to their shitty built-in search and they can algorithmically feed me garbage and make lots of money from people that pay them


Smart move actually. The "+reddit" is something more and more people do on Google, and on kagi.com Reddit results are usually ranked No. 1. Seems Google's search result quality will drop even further now.

EDIT: The article explains there is a 60M deal with Google for using Reddit's API so that they can continue delivering results from Reddit. This will only hurt smaller search engines, like Kagi :(


Kagi in particular shouldn't be affected. Apart from using its own index, it also aggregates results from other search engines, most likely including Google. https://help.kagi.com/kagi/search-details/search-sources.htm...


All of Reddit was freely and readily available just a few years ago. Just goes to show - archive and save what you love.


Reddit making deals with search engines and AI companies for millions of dollars.

Public data belong to Reddit to sell. Makes sense, why would they give it away for free when they can charge for it.


I figured it was to warn people not to use “their” data for AI. The data belongs to their users, though.


God fucking damn it.

"User privacy" my ass. This is a pure lock-in play.

Sorry for the swear words. Reddit was _the_ way I got honest reviews about restaurants, products, and damn near everything, but their search engine was horrible and the platform is very clearly built to drive engagement.

I hate what the Internet has become. I guess it's time to go through the book list I've accumulated over the years.



