Reddit can't build a better search engine (ruky.me)
292 points by rukshn on Feb 17, 2022 | 280 comments



I've searched for Reddit in the past because Reddit is the only site where I can get community answers to an important question. In those cases, the community aspect mattered to me - for example, I recently moved to Boston, so I solicited Bostonians' recommendations from r/boston; my partner and I were having troubles, and I wanted to hear from people who'd gone through similar experiences; I wanted to learn about compiler construction and went looking for other smart folks who were interested in the same topic.

I would never do a code search on Reddit. I would never search for "restaurants near me" on Reddit. I would almost certainly never solicit medical advice from Reddit, or type in a new term I'd heard to get all the information about it, or try to shop for things from Reddit links. I just don't think that's what Reddit's for.

That's why I find both this response and the original so bizarre in their thesis. Search isn't homogeneous, and, while I have noticed declining quality in a few areas I search frequently, those areas almost never overlap with my Reddit-specific search areas. Reddit wouldn't replace my Google any time soon, even if it built a better search engine.


Reddit doesn't replace all of search, but for me it does effectively replace search engines when I'm looking for an opinion. Life is short... I don't have time to weigh all the facts about something and make my own decision. While I do that a lot, when getting started with something, an initial opinion to go on is very useful.

Search engines have become really bad at surfacing genuine opinions. Crappy GPT-2 generated blogs have overwhelmed the ability of search engines to surface human generated blogs, and even the human ones are often clearly just someone's hour of research on a topic with the conclusion being "decide for yourself". Sites like Quora make it too easy for someone to insert their opinion without it being adequately tested in the public space. YouTube was once a fairly decent place to hear opinions but, like with regular search, it's currently optimized for the equivalent of low-effort blog posts: videos that are ~12 minutes long with little valuable information.

Reddit has many issues, but at least it's still a useful place to find opinions that are also openly tested against their respective subcommunities. The usual blog type content doesn't last long in most Reddit communities without becoming substantially downvoted. If I want to know why people are choosing a particular programming language, for instance, I can do `site:reddit.com why use c++` and actually get some concise and even nuanced opinions. An equivalent search across all sites on a search engine usually leads to long treatises with little value appearing first above the fold.

If I couldn't use `site:` syntax on DDG or Yandex, I'd probably go directly to Reddit's own search engine a lot of times. So Reddit wouldn't necessarily replace all of my searching, but it definitely would replace a large portion of it.

And I wish that wasn't so. I wish more of what appeared in search engine results could either be Reddit-type content or academic articles as opposed to all the flotsam we get that's somewhere in between those two yet fulfilling the needs of few readers.

EDIT: Somehow I forget how I'll also use HN in a similar way, though less often for some reason.


> EDIT: Somehow I forget how I'll also use HN in a similar way, though less often for some reason.

It's because reddit has dedicated communities for so many topics. If you search for something diy related you'll find posts from a specialist sub, and this works even for the most obscure topics.


imagine using overly censored, social credit priming reddit as your search engine


It's great when you want to search for niche information. Especially because google won't do all its censoring and boosting on the contents as much as it would on a typical google search. That's the point. Neither Google nor Reddit are actively censoring certain types of information. E.g. what is your favorite nail polish or best ice cream in NYC.

I think we might agree that if you search for anything politically censored on either, you're going to be led astray.


From the article, "The best way to fix the system is to prioritize websites that are there to share knowledge, not websites with their primary priority to make ad revenue."

You know who shares valuable knowledge? Other humans who have personally experienced the knowledge I'm trying to acquire. In this context, Reddit IS the website that shares REAL knowledge. I agree I wouldn't search "restaurants near me" on Reddit, but what I do search for, every single time I visit a new city, is "favorite restaurants in city x". The result is real crowd-voted feedback from real humans that has almost always been extremely reliable. Google doesn't do this effectively.

Another example. I was trying to buy new skis. Google any query and you get SEO'd results which are almost always useless or sponsored reviews/content. Now, try Reddit: you get real testimonials from people who have actually used the product and often provide insights I wasn't originally even aware I should be considering.

The reddit dataset is already a search engine, it just doesn't have a pretty UI. It's like trying to buy Gucci shoes at Nordstrom Rack vs. the Gucci store.


Google doesn't do this because no one is posting entries like "My trip to X city - Favorite Restaurants" or "Ski Trip - Awesome Demos" on their public personal blogs. There is not enough authentic raw data to work with so google is completely overwhelmed by SEO spam.

We get around this by specifically telling Google to prioritize results from Reddit as we "happen to know" that Reddit is a more-or-less reliable public forum where real people share their real opinions.

The same thing is already starting to happen though. Astro-turfing and bot spam working to manufacture opinion. Reddit is at least slightly resistant to this due to the nature of the arrangement/management of its communities.


While there's a powerful inclination to take Reddit recommendations, it's also increasingly dangerous. Just from normal usage, it's increasingly apparent that a disturbing proportion of comments are astroturf. There's a flood of accounts that have generic names in the form of word-word-number or word_word_number, usually are only a few days or weeks old with little karma, and run around threads responding to everything until they're banned or cycled out for a new account.
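A minimal sketch of how the account pattern described above could be flagged automatically. This is only an illustration: the regular expression, the function names, and the age and karma thresholds are all assumptions, not anything Reddit is known to use, and plenty of genuine users may have names of this shape, so a match is at most a weak signal.

```python
import re

# Hypothetical pattern: word-word-number or word_word_number.
# The backreference \1 requires the same separator in both positions.
GENERIC_NAME = re.compile(r"[A-Za-z]+([-_])[A-Za-z]+\1\d+")


def looks_generic(username: str) -> bool:
    """Return True if the whole username has the word-word-number shape."""
    return GENERIC_NAME.fullmatch(username) is not None


def is_suspicious(username: str, account_age_days: int, karma: int,
                  max_age_days: int = 30, max_karma: int = 100) -> bool:
    """Combine the three signals from the comment: generic name,
    young account, little karma. Thresholds are illustrative guesses."""
    return (looks_generic(username)
            and account_age_days <= max_age_days
            and karma <= max_karma)


if __name__ == "__main__":
    # Made-up usernames, used only to exercise the heuristic.
    for name, age, karma in [("quiet_river_77", 5, 12),
                             ("Brave-Otter-4821", 900, 15000),
                             ("rukshn", 3, 1)]:
        print(name, is_suspicious(name, age, karma))
```

A heuristic like this would need human review behind it, since it says nothing about what the account actually posts.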

This happens not just on political topics, but things related to brands. It's very clear that paid actors are out there attempting to manage public opinion about brands or advertise products.

IIRC, part of the original supposition was that product-related search results were flooded with spam sites promoting products... but the bar is far lower to do the same with Reddit accounts. You don't even need an email to create one, let alone a domain name or hosting.

There's a very real spam problem, but spam-fighting is stuck in the past.


I think this is a really strong second-order point, in that Reddit is vulnerable to comments gamification in the same way Google was to SEO gamification, which sort of defeats the point of this whole enterprise.


> the community aspect mattered to me

That is why Google was worried about Social Media / Facebook eating them, and hence the creation of Google+. When most of the important or valuable answers are on the social network, the search engine will cease to be the entry point to the internet. And in many countries, Facebook and WhatsApp really are "the" Internet.

And after that there seems to be a gentleman's agreement: Zuckerberg won't enter search, and Google won't enter social.

And actually I use HN for searches as well, from product recommendations to many other niche subjects.


I'm sure this is the reason given as to why facebook's search function is barely better than grep + line noise...


Right. I still use google to search reddit. It's that Reddit's content is more directly from people - not marketing departments or SEO optimizers.


> try to shop for things from Reddit

Any reason why? I often check Reddit for reviews of things I plan to buy.


This thesis is off. It's not as if Reddit just appeared on the internet yesterday. It's been a place people look for this kind of info for a long time, and people have been trying to game it for just as long. And it already has a good search engine - that just happens to be Google. So it still leaves the question: why does Reddit still have such a high signal-to-noise ratio?

A few things I've noticed:

1. A lot of conversations happen before that many people are interested. Subreddits tend to attract mavens, and they often discuss things months or years before people really care (or the marketing team for whatever is being discussed is even looped in). Pay attention to when the posts you're looking at occurred. In a lot of cases, they were there earlier than you'd expect.

2. There is incredible dispersion in where conversations on a topic occur. It's not uncommon to have 10s or 100s of different communities discussing the same thing, and it's not clear which is going to end up being the place people trust. Many of the sub-communities are also somewhat mutually exclusive (geography, Android vs. iOS, etc.), meaning it's going to look incredibly insincere if the same account is posting in a bunch of them.

3. Reddit posts allow negative feedback in a way few other venues do, especially not pages optimized for SEO and controlled by a single entity.

4. It is one of the few platforms with an appetite for long-form content. It is almost an anti-Twitter. Meaningfully moving a Reddit discussion on a single popular post could take hours if it could be done at all. For communities with more lasting artifacts like a wiki, it could be practically impossible.

5. As others have pointed out, the subreddits aren't controlled by and don't have the same incentives as Reddit Inc. Optimization tools aren't going to generalize well since what it takes to get to the top of each is different.


Recently I have noticed a business creating fictional threads on Reddit (multiple, in different communities) that work as ads to sell something.

Example:

„What is the cheapest best cooling for cpu now?”

All discussions point to a product that more experienced users know is crap.

But still the wrong product is at the top with the most (botted) upvotes.

Companies realized that people look for opinions on Reddit -> that created demand for those services -> bot companies implemented a new strategy.

It's getting very, very hard to get an honest expert opinion. Seems like the only way now is to belong to specific closed communities.


Thank you for sharing this view. There are a few nails in reddit's coffin and this is certainly one of them.

I used to be able to gauge real world opinions on reddit pretty accurately, i.e. be aware what was happening before the curve. That is gone now.


I can second this. I'm looking for a Bluetooth speaker, and if I believed Reddit I would be throwing money away if I'm not buying Anker. I'm sure it's fine, but it's disproportionately represented on Reddit compared with other audiophile forums.


I really like my Anker Bluetooth speaker, it's an old one that just keeps on working, lasted longer than Android phones. Listening to it now, every morning - the Anker love is real. ;)


If you need something cheap that lasts, Anker is your best bet. There are others that have longer lifetime, but they are from Sony, Bose or JBL and thus more expensive. My sister has the Soundcore for 2 years now.


Have had mine for several, don't use it as much as I used to because I prefer the sound signature of my phone speaker for spoken content but still love it for music. Recommended it to my extended family and still get thrown for a loop when I see them using it all the time, like a piece of my home in theirs.


I still think if you are internet savvy you detect these things almost instantly. Uncanny valley works on text too and the revulsion is just as strong.

And all it takes to destroy your best cooling for cpu now bot setup is one user saying “No it is not and here is why…” right under the comment.


A few years ago astroturf on Reddit was of low value and was easily spottable.

Now, some of it is so good, it takes longer to confirm than to find.

If there is enough value in a community people will exploit it.


That might happen if the product being pushed is awful. But if the product meets a baseline – not amazing, but perfectly cromulent – then a positive comment might not get such a rebuke.

I guess it's not the end of the world if you get astroturfed but you still get a decent enough product, but still, it's not ideal.


Yes, Reddit is being gamed. The question, as with any system, is: to what extent, and how does it compare to alternative systems?

The capital that is most relevant on Reddit is status. It is very easy to have people participate in Reddit, which makes it inviting enough to start building status, but it's somewhat harder to actually do. Users care about their status and about their communities, and Reddit combines the two in a nifty way that already provides some solid resistance against gamification.

Sure, you can infiltrate the system with any/or a lot of accounts, careful VPNing, lucky/clever reposting and also keep the guise up (you must, since your portfolio of both content and accounts is under constant review, as long as it remains visible). To get a highish ranking article on google I mostly have to write what feels like convoluted infomercials (or maybe let GPT3 do it for me). There is an industry of people being paid to do so. All it takes is one mediocre writer and some time.

To me, the former system seems very brittle and expensive to game, the latter much less so, especially since the prevalence of convoluted writing for SEO purposes makes the distinction between SEO bullshit and expansive content increasingly difficult: people are getting used to infomercial-type content on the web and so it is increasingly what they learn to expect. And we give the people what they apparently want.


I don't think that's a particularly new thing. I used to be a very active reddit user, but I haven't been active on the site for 2-3 years now. But even 3+ years ago I recall plenty of (granted, anecdotal) reports of commercial astroturfing even for relatively nice subreddits.


Yeah initially I thought Reddit was useful for "give an opinion" type queries, especially where you're getting answers to a newly posted query. Eg my brother managed to get a specialist doctor to give an opinion about our mom's treatment.

But for the vast majority of commercially relevant searches, it will get sunk quite soon. It can't be terribly hard to hook up a GPT style bot to harvest karma on various accounts and then use that to gain credibility on "which headphones should I buy" type queries. You might not even need to automate it, a farm of real people could potentially be useful for that, I bet that's offered by someone out there already.


> It can't be terribly hard to hook up a GPT style bot to harvest karma on various accounts

Or more easily, steal and repost images on cute animal subreddits, sometimes adding white borders to the bottom right so duplicate image detection algorithms can't find it.


This problem will only get worse as language models get better, making it cheaper to flood a forum with bot comments that look more and more like real humans. Moderators and users alike will spend far more time and resources weeding that stuff out, while it takes far fewer resources to post that stuff in large volumes.

The only way to counter that is to have some kind of account verification requiring 1 account per person. Such schemes are all either very unpopular ("Why does reddit need my passport photo!") or very ineffective ("So I just need a new IP and I can sign up for a new account? Great, I'll set up 1000 accounts").


Reddit accounts for this already with friction like account creation dates or minimum karma. Seems to work decently
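As a rough illustration of the kind of friction meant here, a posting gate based on account age and karma might look like the sketch below. The `Account` type, the `may_post` name, and the 7-day and 50-karma thresholds are hypothetical and chosen only for the example; real communities configure their own rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Account:
    # Hypothetical minimal account record for this sketch.
    name: str
    created_at: datetime
    karma: int


def may_post(account: Account, now: datetime,
             min_age: timedelta = timedelta(days=7),
             min_karma: int = 50) -> bool:
    """Allow posting only if the account is old enough AND has enough karma."""
    old_enough = (now - account.created_at) >= min_age
    trusted_enough = account.karma >= min_karma
    return old_enough and trusted_enough


if __name__ == "__main__":
    now = datetime(2022, 2, 17, tzinfo=timezone.utc)
    fresh = Account("new_user", now - timedelta(days=1), karma=3)
    veteran = Account("old_user", now - timedelta(days=400), karma=1200)
    print(may_post(fresh, now), may_post(veteran, now))
```

A gate like this raises the cost of each throwaway account but does not stop anyone willing to age accounts and farm karma first, which is the objection raised elsewhere in the thread.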


Closed communities is where it's at... But they have the obvious inherent problem: they don't scale. If a closed community lets in too many people, it becomes a somewhat obscured open community, with all the downsides. And if it doesn't, well, you are unlikely to get in on a topic you are interested in.


One of the lessons of the last decade (?) is that if a community scales, subverting it also scales


That's something I've been grappling with recently, and the solution I landed on as "good enough" (though far from perfect) is to have a community where the ability to post and interact is closed, but the ability to view the content is open.

Some subreddits have tried a variation of this, such as the Economics subreddit granting a "Bureau Member" flair for individuals who have proven their knowledge of economics in long form posts, but without taking it farther than just extra flair. There's no weekly threads for Bureau Members only to post, or anything like that.

The issue then becomes how do you set up a judging body that determines the criteria for who is allowed to interact, and prevent that body from being co-opted in some way. Not an easy task, especially for a social media company where there's absolutely perverse incentives at play.


Seems like an easy thing to provide a link for?


The antidote outside of ruthlessly moderated special-interest "walled gardens" is a system no one really wants I think. It would need to avoid mechanisms that give community or interest groups powers to promote some content over others. It would need to avoid algorithms that automatically determine and recommend certain content. Other than pruning illegal contributions and obvious spam, it would need to avoid giving virtually any moderation powers to select individuals or the community. It couldn't allow paid promotion or require payment to participate as that merely enables those with greater financial means to contribute and have their content seen.

Introduce any one of these things and the whole system becomes gameable. Within these constraints it seems the outcome would be a loose graph of content tied to known individuals. It would grow amorphously over time, be largely undirected, and lacking the capability of being fully gamed, would probably avoid the attention of marketers and influencers. And probably general users too, since it wouldn't be easy to find interesting and relevant stuff and be quite boring for the average attention span.

The constraints bring to mind the early 90s web which basically had all of these properties. Outside of bulletin boards and UseNet that is. People wanted more convenience, hence search engines became the de facto way of finding entry points into these networks, which caused changes in how these networks presented and arranged themselves. The point being: even if we designed the perfect ungameable system, as you alluded to, if any search engine or aggregator crawled and ranked its content, it would just be meta-gamed into uselessness.

Personally I've come to the conclusion we just have to accept that for the majority of online community-driven content, low SNR is here to stay, and we have to take what value we can individually as it comes. Unless we're willing to forgo conveniences like search, or relevant content recommendations, which I think for the majority of the web is a big old no.

For enthusiasts, building their own walled gardens with in-tune moderation is probably the way to go. Hacker News I think though not perfect is a good example of this, the SNR here is fairly decent compared to the broader web.


Don't forget:

0. Unlike the majority of the web, Reddit is moderated by actual humans.


I mean the example given is something about a car review. If it wasn't for reddit, it's likely the search query would be suffixed with a community name (like traditional forums), or a (to the searcher) known car review site. That said, the difference is that reddit has forums for everything, so you can expect to find a community about cars on there; if you're a noob at cars, you probably don't know which communities there are, what they're called, or if they're reliable.


I think it's just that Reddit threads the needle on openness and moderation.

Subreddits are moderated by the subreddit community. Well moderated subs attract and produce good content. Multiple subs on the same topic can exist, often a poorly moderated sub will be replaced by a better one.


My dissertation was about point #2.


Marketer perspective: you can not build a communication tool you find useful that won't end up being used to make money. It will get figured out.

I could go into a bit more philosophical angle with reterritorialization done by capital, but I think it's much simpler to consider the following: between politics (where once any division in population is found, it becomes valuable instantly), classical business marketing (where once anyone makes purchasing decision informed by something, it becomes valuable), capital markets (where once anything can be used to predict anything about any company or asset class, it becomes valuable) and more personal scams (where once you figure out someone's niche interest, it becomes valuable) there just isn't anything left. Go ahead, try to find something.

Reddit is being constantly targeted. I still use it, because what else?, but if the method isn't obvious, here's what you do: you are hired by/own small company making niche potato chips dip sold via Amazon. You go to google keywords and check 'potato chips dip', you google all you can find there and some of your ideas, you write down all top10 results and check every now and then (well, your SEO monitoring app does it for you). Whatever you find that allows for user input, you generate that input - accounts are cheap - and maybe do some external SEO (thus beating 99,9% of social media results online).

That's it, it's easy. What Reddit (and any mildly aware SM company) does, is they try to offer marketers access to audience for a price that's lower than cost of what I just described. There will be edge cases, especially on international markets where value of time for various business owners differs vastly, which will lead to sites slowly getting more clogged up with ads, but that's the general gist. If you can imagine using similar method to get in front of your eyes when you're looking for something, then it would just be quite weird if nobody ever did it.


What if you explicitly target it with respect to the money?

Most of the crap on Google is going to be monetized sites. If you just block/downrank all the sites with e.g. ads, then the profit motive for spam goes down - the more value you extract, the less value you are able to extract.


It's an improvement, even though it's going to be gamed as well.

That said, I have no idea how you would do it. More specifically, 1) I see almost everything as an ad, 2) you're going to get an extremely biased internet and logistics/technicalities of the project are hellish.

1) you downgrade all blogs with ads. All journalism (if they advertise their own stuff that's still profit motive and ad, right?) and all company blogs. All social media (that have over 100mln users right now). Also all blogs that are meant to promote author as an authority to get them gigs down the road, I'm guessing. You will be left with more overt PR and

2) organizations who can afford to not have profit motive in any form. That's just unsustainable for almost everything. NGOs are still asking for donations. That's going to be mostly politically motivated messaging and a tiny % of people who can afford to just post for free because they like to (god forbid they lose that position and set up a Patreon).

Audience allows for attaching marketing, and that allows attaching all content production to business (and the other cases I mentioned). Again, I like the idea in principle, but 'let's just get the money out of the system' was tried in many different contexts and the same structural issues resurface.


What about stackoverflow?


The only thing I noticed on SO is sometimes having question titled in a more general way, with details being actually focused on a specific new framework, then discussion and answer showing how amazing the framework is for this use case. Which is also the only thing I can reasonably imagine being promoted, but incentives here are so closely aligned I can't even be mad honestly.

I don't know what happens on 'front page'/'trending' of SO and other other stackexchanges.

EDIT: I checked stats and SO has <15mln users. Part of it being the way it is comes from being relatively tiny and I honestly can't guess how expensive moderation on SO is.


> However, once Reddit creates a search engine, and once people get to know that there is an opportunity to game the system and create a financial opportunity, people will abuse that system and we will be back to the place where we are now. SEO stuffed websites.

I help run a Reddit website and open source project for managing scheduled posts [1] and we see a lot of garbage there, sadly. Over the years the site has processed nearly half a million posts and has tens of thousands of registered Redditors. In the early days it was pretty cool, we had a lot of individual users with various interesting projects. I remember early on there were some musicians using it and a few book authors that would release their books one chapter at a time - using our service to schedule the posts so they didn't have to do it manually.

But, as time went on, the entire platform has been just swamped with mostly live sex workers and people shilling bullshit products. I'm not judging the sex workers or saying they shouldn't be allowed on the platform and we don't ban them, but I would say now literally 99/100 of new users are just trying to push their Only Fans accounts to porn subreddits. Regarding the weird product shills, we've seen a few really odd campaigns. There was one guy who would post to various left-leaning and "green" oriented subreddits on the terrible nature of plastics while also promoting his own metal straw sales. Again, none of it is illegal and I'm not even sure if any of it is unethical or immoral since it really depends on if you think people are doing these things in bad faith, etc.

Either way, the project has for sure taken a backseat for us. If the server stays up and everything works, cool, if not, that's fine too. I have no interest in trying to monetize the service anymore whereas in the early days we took donations.

My point is the world of gamifying and commercializing Reddit content is already very very well established and there are large players doing it at a very large scale. If you measure it the porn industry is virtually all of it, but if you exclude them and look a bit harder you'll find other industries that have also carved out their niche in exploiting the platform.

[1]: https://cronnit.com


This is already happening. Advertisers create normal looking fake accounts, and subtly inject product placements into discussions.

The best way to market a thing is to not make people realize they are being marketed to.

I remember a few years ago, there was a post (AMA) by someone who does this and his whole job was to maintain hundreds of normal looking accounts and post SEO friendly stuff, good words about products, etc on reddit.


Why don't you guys add a flat monthly/one-time usage fee?


Accepting payment for this kind of service is actually quite problematic. Even more so now that the vast majority of the userbase is Only Fans creators. I'm not saying it should be like that, but if you've ever tried to accept payments for anything "high risk" it's a nightmare.

For example, Stripe won't allow any of it at all. It's blacklisted in multiple ways through their restricted services.


Isn't it high risk when consumers are paying for online porn? I don't see how the creators would be high risk.


> I was too young to use Google when it first got started, but according to many, in the beginning, Google had better or more accurate search results.

I'm old enough to remember when Google came, and the main difference was this:

With Google you did not have to flip through 3 pages of completely unrelated search results that were paid for. Google used to display advertisements on the right as well, i.e. not among the regular search results.

This made Google so much better (and more popular) than their competition.


The early search engines (Webcrawler, Lycos, ...) ranked "relevance" based on some silly metric like "the more times the search term appears on the page, the more relevant the page must be".
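To make the weakness of that metric concrete, here is a small sketch of ranking purely by how often the search term appears. It is an assumption-laden toy, not a reconstruction of how Webcrawler or Lycos actually worked; the function names and the `.example` page names are made up for illustration.

```python
import re
from collections import Counter


def term_count(text: str, term: str) -> int:
    """Count case-insensitive whole-word occurrences of term in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(words)[term.lower()]


def rank_by_frequency(pages: dict[str, str], term: str) -> list[str]:
    """Order page names by raw term count, highest first."""
    return sorted(pages, key=lambda name: term_count(pages[name], term),
                  reverse=True)


if __name__ == "__main__":
    pages = {
        # A genuine page that mentions the term a couple of times.
        "honest.example": "An apple a day. This page reviews apple varieties.",
        # A stuffed page: the term repeated 50 times, as in hidden-text spam.
        "spam.example": "apple " * 50,
    }
    print(rank_by_frequency(pages, "apple"))
```

Because the score depends only on text the page author controls, repeating the term is enough to outrank a genuinely relevant page.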

So the result was that the first pages were full of spam sites that just used search terms as hidden text. Then Altavista came and it was a huge change for the better. They at least made some effort of trying to rank actual relevance. Of course, Page/Brin's invention that made Google what it is today is PageRank, and none of the competition were close to an idea that good.

So PageRank'ed pages, good spam detection filters and ads on the side = Good Google. A search engine that stood out against the competition.

Now Google has horrible site spam detection and shows ads in the main result list (sometimes it's all you see). Google now has very little edge against the competition. Even just 1-2 years ago DDG and Bing were a lot worse than even the modern day Google. But what should make the search people at google really sweat isn't so much that google has deteriorated a bit, but that Bing and DDG have improved so much that they are now on Par. Google search has never had peers at any point since it was launched.


> made Google what it is today is PageRank

Yes, but it was no secret that Google's page ranking was based on counting links from different web-pages; I think Page/Brin even wrote a paper about this when Google was in its infancy. Google clearly had better ranking at first, but it would not have helped the others even if their ranking had been better than Google's, since the "paid results" polluted their search results so much they became a joke. A lot of the paid results were "adult content" as well, creating a somewhat strange atmosphere during presentations if the lecturer had to perform an internet search. Although a large part of the www was adult content in the late 90's :-)

The competition doomed their own business by selling search results; they really did not play the long-game, but went for the quick cash. I guess most people would have...


>but that Bing and DDG have improved so much that they are now on Par

Worth noting that these are essentially the same. DDG scrapes a handful of sites but they lean heavily on Bing's search API (or Yandex, if you search in Russian).

It would be interesting if there could be some independent consumer group that would be able to grade search engine results and compare them regularly. To not only hold them accountable but give credit where it's due too.


> But what should make the search people at google really sweat isn't so much that google has deteriorated a bit, but that Bing and DDG have improved so much that they are now on Par.

I'd say it's both, but the force that has been driving me to use DDG more and more recently is not that it's gotten so much better but that G has gotten so much worse.


The competition at the time was relying on HTML meta tags' keywords. It was purely based on faith that the web developer would be honest and help classify pages. Which was reasonable at the time, before the Web was so commercial, but that ship has long since sailed. Keyword-stuffing was already a thing that was happening, and other sites simply neglected to tag pages. Some search engines were starting to read the page content and count keyword frequency, so a page that had "apple" many times on it would be deemed relevant to a search for "apple."

Google's whole thing was it did something smart, and treated the Web as the big graph it is and more or less measured the in and out degrees of the vertices to rank their popularity. The idea was that something relevant would be linked to frequently. This was the subject of an academic paper and had associated patents for a while. https://en.wikipedia.org/wiki/PageRank It was a big deal at the time, but once again the Web has changed.
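For anyone curious what "treating the Web as a graph" looks like in code, below is a rough sketch of the textbook power-iteration form of PageRank. It goes a bit beyond counting in and out degrees: a link counts for more when it comes from a page that is itself highly ranked. The damping factor of 0.85, the iteration count, and the tiny four-page graph are assumptions for illustration, and this is certainly not Google's production ranking.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Power iteration over a small link graph given as {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: everyone links to "hub", which links to "a".
graph = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(graph)
top = max(ranks, key=ranks.get)
```

In this toy graph "hub" ends up on top, and "a" outranks "b" and "c" because its single inbound link comes from the highest-ranked page.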

Google's search relevance is a function of the changing Web and how it's now highly commercial and populated by everyone instead of being the world's largest community of nerds. Links aren't a measure of popularity, because the gulf between the average user and people who operate web sites has widened...and the common denominator has dropped to the point that optimizing for popularity is questionable in the first place.


And now the majority of the first page of results on Google are ads.


80% of queries have no ads


Which makes sense, that means they can't fill the inventory for those searches. The majority of queries likely yield garbage these days and/or are just searches for things like conversions, definition look-ups, searches for the homepage of gmail or facebook (which is how many older people get back to those places), or people wanting to go directly to a specific page on Wikipedia (knowing that's half of what Google is good for now) and countless other similar searches.

It's quite amusing that so much of Google's value is now in being a glorified link generator for Wikipedia and Reddit.


But the practice of appending 'reddit' to your Google search isn't depending on Reddit building a better search engine. It's using Google's search engine, and trying to direct it to a database that you hope has the answer you're looking for. So there's that. Of course, if that practice became widespread, then people could still try to stuff Reddit posts with junk in the hope that it gets picked up by Google in these searches, and then the Reddit folks would have to try to filter that out.


> people could still try to stuff Reddit posts with junk in the hope that it gets picked up by Google in these searches

This is already happening. I search for reviews and help on Reddit via Google often, and it's not that uncommon for the first result to be SEO spam: a Reddit post with no votes or comments, just lots of keywords and a link to a product or another site.


There was a post that trended on HN yesterday (2000+ upvotes) which stated that Reddit is sitting on a gold mine and can build a better search engine because people are appending 'reddit' to the end of their searches.

This is a response to the original post, explaining why the logic behind it is false.


Frankly I don't really see the logic. You're failing to distinguish between "generally better," "better for Reddit inc," and "better for redditors," which are all separate things. That is, I'm not convinced that a Reddit search engine would necessarily be subject to the same level of gaming, for a variety of reasons:

* Reddit groups are moderated in a way that "THE INTERNET" as a whole isn't.

* Reddit is a destination in itself. It's not like Google, which is a means to arrive at various destinations.

* A Reddit search engine just has to index Reddit. It can do things like entirely filter out clickable external links, which would make SEO much less fruitful.

* Reddit is dependent on being an effective conversational watercooler. For financial reasons, it's not going to allow itself to become filled with the same level of cruft as the entire web is.

This is somewhat evidenced by the difference between Reddit and Usenet. Compared with how Usenet quickly devolved into an unmanageable morass with the advent of spam, Reddit has managed to stay fairly legible.

(BTW off topic but personally I think this whole post could've just been a comment subthread in the original post. Of course we all want our ideas to bubble to the top. But if everybody started a whole new thread every time they thought they had something REALLY IMPORTANT to say that shouldn't be ignored, then every large thread would give birth to dozens of baby threads and make HN less effective overall. On the other hand, I see someone else thanked you for putting this in a separate "breakout room" so to speak, so what do I know? Cheers.)


Perhaps Disqus is a destination site too and no one has realized it yet?

It's like a distributed reddit and individual site owners could agree to have their content pulled into a main site, eg via rss or meta tags.

Human moderation has value and there was a time when Google let users curate their results.


@rukshn Thanks for your post. Personally I'm not convinced by the idea of adding site:reddit.com to your Google search query either, not just because it will get gamed by SEO practitioners as you suggest, but also because there's a lot more to the internet than reddit and a lot of great content hidden away which isn't on reddit. I also believe that a more general solution to better search is both to reward informational sites and also to penalise advertising sites - that's why I built a search engine with a PageRank-style algorithm which also detects result pages containing adverts and heavily downranks them.


Sure, there are still single-topic forums that are much better for those subjects than Reddit, you just have to know about them.


I built https://www.unscatter.com using Reddit to source links for the search index. Last year I added Twitter as another source.

I don't think it's a "better" search engine. It's a different lens through which to search. Reddit and Twitter are an information source for what people are talking about. This is why I limit my index to articles that have popped in my Reddit/Twitter input in the last 30 days, deleting anything older.
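As a rough illustration of the rolling 30-day window described above, here is a small sketch. The in-memory dict and the `prune_index` name are hypothetical stand-ins (the real index is not shown in this thread); only the 30-day cutoff comes from the comment.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # the 30-day window mentioned in the comment


def prune_index(index: dict[str, datetime], now: datetime) -> dict[str, datetime]:
    """Keep only links last seen within the retention window.

    `index` maps a URL to the time it last appeared in the input feed
    (a hypothetical in-memory stand-in for a real index table).
    """
    cutoff = now - RETENTION
    return {url: seen for url, seen in index.items() if seen >= cutoff}


now = datetime(2022, 2, 17, tzinfo=timezone.utc)
index = {
    "https://example.com/fresh": now - timedelta(days=3),
    "https://example.com/stale": now - timedelta(days=45),
}
pruned = prune_index(index, now)
```

Keying on "last seen" rather than "first seen" means a link that keeps getting reposted stays in the index, which seems to match the "what people are talking about" goal.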

I've actually had it up for years now, just don't know what to do with it. Been focused on my career in IT rather than entrepreneurism because well, life. I can say just this morning I saw "Stanytsia Luhanska" pop as a trending term on the front page of Unscatter and at the time mainstream media had not picked up the story of the school being hit by Russian shells.

I think overall the quality results still come from Reddit. Twitter often gets gamed and I see content terms pop up in the trending list. However, Twitter overnight (my time, US East) gets a more international flavor with lots of Korean and other Asia Pac country content bubbling to the top during that time because of Twitter.


Very cool site! Is there a way to get the words in a list instead of a cloud?

I am interested in what your thoughts are about being in an echo chamber and getting out of it, but this is a great way to get a high level view of what's happening on reddit.


Right now the terms are available only via the word cloud. I have considered trying to put together an api and also keeping more metrics about each link. For example how many times it's popped up, reddit and twitter users that posted it and which subreddits the post is in. I just haven't gotten around to it.

The concern of it being an echo chamber is one of the major reasons I added Twitter and I still look for more sources. For most of Trump's presidency some form of his name was the top trend 24 hours a day. Crypto is another trend, that while it's great for me because I'm interested in it as well, I question if it's really reflective of what the world is talking about.

I have considered creating some sub-sites as well to try and dig more. Focusing on subreddits for specific categories, but the Twitter api (at least what I can afford which means free) isn't quite as flexible for doing that kind of thing while staying inside my api call limits.


> I question if it's really reflective of what the world is talking about.

I'm very curious what each country is talking about. I made a little script that would show me what each country's subreddit was talking about, but the fact that it is on reddit means that it's already biased. Maybe something where you use the most popular source inside a country would be a way out of the echo chamber, but the work in maintaining that I imagine might be a bit much.


This is pretty cool but am wondering, how are you choosing which subreddits/tweets are indexed?

I notice for instance "manga updates" is in the word cloud but a search for "kanye west" doesn't return any relevant results despite him being in the news a bunch lately for being kind of nuts.


I get the top 100 posts from the top 100 popular subreddits for the past hour (as defined by reddit). I then do some basic filtering on subreddits to exclude a few that are often in the top that I find are either mostly text or media content, I'm looking for links.
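The "top N of the top N, minus some subreddits, links only" step could look something like the sketch below. It works on already-fetched data so it makes no claims about the Reddit API; the `collect_links` name, the `url` / `is_self` fields, and the sample data are all assumptions for illustration.

```python
def collect_links(
    posts_by_subreddit: dict[str, list[dict]],
    excluded: set[str],
    per_subreddit: int = 100,
) -> list[str]:
    """Gather external links from the top posts of each non-excluded subreddit."""
    links: list[str] = []
    seen: set[str] = set()
    for subreddit, posts in posts_by_subreddit.items():
        if subreddit in excluded:
            continue  # skip subreddits that are mostly text or media
        for post in posts[:per_subreddit]:
            url = post.get("url")
            if post.get("is_self") or not url or url in seen:
                continue  # text posts and duplicates carry no new link
            seen.add(url)
            links.append(url)
    return links


# Hypothetical, already-fetched listing data (no network calls here).
sample = {
    "worldnews": [
        {"url": "https://example.com/story", "is_self": False},
        {"url": None, "is_self": True},
    ],
    "pics": [{"url": "https://example.com/cat.jpg", "is_self": False}],
    "technology": [{"url": "https://example.com/story", "is_self": False}],
}
links = collect_links(sample, excluded={"pics"})
```

Raising `per_subreddit` or passing in more subreddits is the knob for widening the pull, at the cost of more crawling.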

The lack of Kanye West content is interesting. I'm going to try to find some time this weekend to dive into that. The only thing I can think of is it's so well known that fewer people are sharing links and more people are just making text posts on reddit about it.

However, it could be an indexing issue too. I'm using postgres for the index so maybe there is something there. I'll research that. Thanks for noticing and calling it out!


So if anyone is following this, right now it appears that not many Kanye stories are bubbling to the top. It's simply not in the top 100 of the top 100. For the full 30 day index, checking just on Kanye in story titles, this is all I have in my index.

Kanye West - Gold Digger Kanye wants Billie Eilish to say sorry or he'll pull out of Coachella Kanye West: ‘Stop Asking Me to Do NFTs... Ask Me Later’ Kanye West Does Not Want to Get Involved With NFTs Kanye West Rejects NFTs, Tells Fans To Stop Asking: 'I Make Music In The Real World'

So long story short, I think I may need to consider increasing my pull. Going from the top 100 of the top 100 to maybe the top 1000 of the top 100 or the top 100 of the top 1000. I'll have to do some research and also validate my crawl can support it.


> Reddit and Twitter are an information source for what people are talking about.

It's a source of information for what Reddit and Twitter owners want you to see. Both websites are heavily manipulated in a myriad of ways. (The simplest one is via massive amounts of accounts banned for wrongthink.) This is blatantly obvious if instead of passively absorbing news you deep-dive into a specific issue and then look up the discussions and trends, especially on Reddit. Sometimes discussions for major events just aren't there, which is nigh impossible organically on a website with tens of millions of users.


Manipulated or not, I do agree it's an echo chamber. I added Twitter in an attempt to break the echo chamber aspect but wasn't as successful as I hoped. I do occasionally look for other sources to add but finding similar sites that can match the volume of those two is difficult. Which makes it more difficult to figure out how to weigh other sites' results against those two.

This discussion did make me poke around some more. I may consider using the free tier of Bing's api just to pull in trending topics.


What are some examples of this extreme and blatantly obvious manipulation? Certainly it happens but you're making it sound like we live in the Truman Show.


Here is a simple one.

https://www.reddit.com/r/ontario/comments/se7t5p/i_saw_the_t...

https://www.reddit.com/r/conspiracy/comments/secvqz/i_stood_...

A subreddit with 480K users manages to upvote the topic 1.3K times with 78% up ratio. A subreddit with 1.7 million users upvotes a similarly themed topic (expressing a different opinion) only to get it 555 points with 91% up ratio. Ten times the difference in engagement.
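The comparison above can be checked with a quick back-of-the-envelope calculation. This sketch uses only the figures quoted in the comment and assumes that post score divided by subscriber count is a fair proxy for engagement, which is debatable: score is net points, and subscriber counts say little about active users.

```python
def points_per_thousand(score: int, subscribers: int) -> float:
    """Post score per 1,000 subscribers, a crude engagement proxy."""
    return 1000 * score / subscribers


# Figures as quoted in the comment above.
ontario = points_per_thousand(1_300, 480_000)
conspiracy = points_per_thousand(555, 1_700_000)
ratio = ontario / conspiracy
```

On these numbers the ratio works out to a bit over 8, so "ten times" is in the right ballpark, though the figure says nothing about why the gap exists.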

This is the norm. Reddit bans the fuck out of people with opinions that don't align with the hivemind. You can search HN for "reddit banned" and see a limitless supply of stories. At one point they deleted two thousand subreddits in one go.

I'm not going into certain topics magically not appearing anywhere on Reddit, because it would require significantly more detailed examples, but it's a thing as well. It is the Truman Show for public opinion.


A lot of "influencers" get paid good money to shill on Reddit.

It is a huge social media site. Of course, it is not ignored by the industry. Products, fads and politics.

Over the last several years it has become quite infested.

All is not lost.

There are some really good smaller subreddits. In smaller subreddits that have been around for a long time, social media influencers are easier to spot.

Some subreddits are so focused on an obscure niche that it is not worth money to try.

Well paid reddit influencers are well trained in "propaganda". One might have several accounts and the information comes "organic" from a "group effort". Some will not even name a product directly but give enough hints.

You also have negative influencers that work hard to hurt the credibility of others and of brands.

>Reddit posts are good because the people who create these posts or make comments are doing it to share their knowledge. And there is no financial incentive associated with it. They know that do it to share their knowledge.


I largely blame Reddit's redesign. Once Reddit became more about infinite scrolling with media content becoming front and center, that made it much easier to gain attention through low-effort gifs and memes. It draws in the lowest common denominator of people looking to scroll through mindless content while on break. More eyeballs and more screen real estate means more opportunities to shill and propagandize. Old Reddit doesn't encourage this to the same degree because feeds are mostly composed of titles and thumbnails without infinite scroll or autoplaying media; much less exciting for mindless content consumers but better for those who are genuinely interested in discussions around particular topics. The change in Reddit's design brought in a very different element that also brought more shilling with it, and so many subreddits have turned into glorified meme generators.


The low effort racist/political/pepe shitpost image macros came across from Facebook. My grandparents love them.

I wouldn't mind seeing a lot less of them on Reddit, but that's apparently the only part of the site's userbase that doesn't have adblock installed which is why Reddit caters exclusively to them.


Umm... I guess that can be considered part of it, but somehow I manage to avoid the racist shit without trying.

I'm more broadly talking about how it evolved many of Reddit's communities to be more "look at me!" based as opposed to "what do you think?" or "here's my story". Not every community is like this, but if moderators aren't strict, communities quickly become an amalgam of these:

- Here's me doing a thing loosely related to the topic of the sub! Look and congratulate me!

- Here's a screenshot of some news story with no hyperlink. Get mad with me!

- Here's a meme making fun of breaking news. Laugh and mock with me!

- Here's a product I bought that's loosely related to the topic. Don't I look badass? Discount code in the thread!

- Here I am, [pretending to be] an idiot just getting into the topic. Tell me what to buy!

This is of course an incomplete picture, but this more or less reflects my experience with what I consider to be low quality Reddit content. They are mostly about visual attention and are designed to either promote products, their YouTubes/Instagrams, or make you think that the community believes something, ergo the reader should believe it, too.

Old Reddit wasn't immune to this, but I think there's a definite difference when the UX is designed around passive content consumption versus something more active. The old Reddit interface was ironically more of an active experience because you were forced more to visit individual posts based on the title and thumbnail, meaning posts better be good or fewer people will read them. With the New Reddit's passive content consumption, the visual content is front and center on the screen with no effort by the user other than to keep swiping their finger upwards to move down their feed. This design both invites a different type of user and greater amounts of shilling for products and political memes.

But maybe I'm way off base with how most people use Reddit. I believe I've seen Reddit change overall as well as some individual subs.

To the creddit of Reddit, subs do have the ability to avoid most of these issues. Some of them turn off image posts entirely, which seems to greatly increase the value of a community. Unfortunately, that doesn't stop Reddit as a whole from being a shilling and propaganda machine. People want their dumb pictures.


The structure of reddit really works well for preventing shilling from being a fatal problem. Namespaces moderated by genuine enthusiasts gain more users, and even if they are infiltrated, there is this constant evaluation of subreddits and posts by subscriber numbers and upvotes. It's a darwinian ecosystem with constant speciation and competition among subreddits.


I completely disagree with some of the points here. There seems to be this weird assumption from people that Reddit is some organic source of information that hasn't been polluted by advertisers and SEO. But the truth is that it absolutely has, the difference is that the way you game the system is different. Google is this crazy algorithm and no one really knows how it works, so people just throw shit at it. Keyword spamming, link farms, etc. All this auto-generated content. It's much more difficult to manipulate Google, but as an algorithm google doesn't object to people trying. The result is an obvious explosion of rubbish that raises the noise floor.

Reddit is different, they don't have algorithms. They have moderators. You don't need to figure out some crazy complicated way of manipulating things, you just have to either bribe or trick a couple of unpaid part-time staff. And we've seen this so many times - moderators who use their position for their own gain. Now the result is different, this is what's misleading. On reddit, because it's all manually moderated, spamming is heavily penalized. So you don't get the noise floor of rubbish content. What you do get though, is highly unreliable information about anything that could be monetized.

It's exactly the same way on twitter, your energy company will be perfectly happy treating you like crap and keeping you on hold for 3 hours, or your airline company will throw your bags into a skip in Timbuktu and tell you you're on your own. But suddenly if you have a following on twitter and complain you get the red carpet treatment. Why? Because they've found a very easy way of cheating their customer service reputation.


> What you do get though, is highly unreliable information about anything that could be monetized.

So it's just like the rest of the Internet. And at least in Reddit (or, more to the point, the newer federated lookalikes - Reddit is dying and losing its highest value and most committed users) you can go look for the tiny niche communities focused around that interest, where users will be better focused on policing insincere content.


They don't exist. Open forums have been dying for the last decade. First they migrated to Reddit, then walled garden Facebook groups, and lately to Discord.

Google has also removed a lot of large and active forums from its index outright, either due to AI bungling of its core product, anti-competitive reasons, or political conspiracy.


The problem with reddit isn't different to the problem of astroturf in other places, it's that it's a highly concentrated form of it since it's so popular it's a highly valuable target.


This is a weird article to me. I somehow agree with the conclusion but disagree with all of its points.

Unless my memory is failing me (possible, it’s late), it straw mans the article it’s referencing by implying Reddit would want to do something like that anyway. It also fails to offer a compelling narrative for what makes Reddit good to me because frankly, and I feel that a lot of people seem to be overlooking this here too, Reddit has plenty of its own problems.

Reddit is owned by Condé Nast and is largely used as a content mill to prop up its existing properties. Reddit is dominated by the Pareto principle. 80% of its content is generated by a very small subset of users. Reddit doesn’t have the best track record for free expression. Many subreddits have crazy moderators that volunteer for some Machiavellian rush and use their power to stifle or twist the narrative of public sentiment. Reddit doesn’t want to kill the golden goose, so when this kind of controversy arises or when controversial subreddits crop up they kill it with fire to protect their investment. Lastly, Reddit users more than half a decade ago were already lambasting the amount of corporate shilling and guerilla advertising that takes place on Reddit daily. A lot of this is, in fact, orchestrated by those users I mentioned who produce 80% of the content because yes, they have discovered a financial incentive to do so.

I wouldn’t deny that Reddit has any sincere users any more than I would deny that we have any sincere users here (I’m certainly not getting paid for this), but I’ve been a bit perturbed by the amount of whitewashed, positive vibes Reddit seems to be getting around here lately. I stick Reddit on the end of search queries too, but I feel like I still wade into results with a similar level of skepticism to that which I carry around most of the rest of the web.


Reddit doesn't really need to build a search engine, they could just have an algolia instance like HN does: https://hn.algolia.com/.

While not perfect, it does a pretty good job, in my experience.

However, the article makes a valid point -- the moment people start using it heavily, we'll get a reddit comment equivalent of SEO, with keyword stuffing and whatnot.


I absolutely love the HN Algolia search. I am amazed by its sheer speed, and that it does exactly what you'd expect, lets you sort by popularity and filter by time, and does no neural-network-deep-learning-garbage query rewriting.

I'm thankful for its existence, and I hope they get plenty of clients due to this HN demo of their power.

I hope they can afford to keep it alive forever. I use it instead of the " reddit" suffix in Google, usually HN has what I'm interested in, and in any case has much higher signal-to-noise ratio.


Algolia was the first thing I thought about when the post was made yesterday.

I commented this elsewhere but I don't think Reddit wants a good search option. Users tend to repost a non-trivial amount of content/questions/etc., and if those users could find the original content with a better search, it's likely reddit would see less user engagement.


Greed killed the golden goose and this article points this out in a simple and convincing way.

> The best way to fix the system is to prioritize websites that are there to share knowledge

This doesn't immediately provide a solution because knowledge sharing does need infrastructure (which needs to be developed and maintained etc.) But as the (relatively) tiny budgets of organizations like wikimedia indicate, if you have the right incentives less might be more.

There is room for money to be made in the digital economy, lots of it actually, but it will require us to pull the plug on adtech and all the ways it has degenerated. In broad brush we need to cleanly bifurcate into 1) truly free commons and 2) pay-to-play services that respect and are accountable to the client/user


Thanks for posting op, it's interesting to think about.

> Reddit posts are good because the people who create these posts or make comments are doing it to share their knowledge.

I would offer a bit of a different perspective here, I think maybe even most do it for what you could call self-soothing.

For example, a lot of people share knowledge, but it's _their knowledge_, which can also be defined as _sharing their subjective past_ and that's known to be a very soothing thing for people who may even be troubled by the unknown in their personal lives. Hop online, boom, you're a domain expert in your own past. Find a place that's _about your past_, be it /r/linux or /r/formula1 or whatever, and there you go, it's comfy. You generally get an upvote + reply vibes dopamine bonus just for being there, due to your relevant past.

So, Reddit's sooth-sayers (so to speak) are kind of lazy but incentivized-lazy in some ways, which I think can also speak to some issues with the platform surrounding cognitive blind spots and what some here have called the echo chamber effect. That's an opportunity for a new search engine, in some ways, but it also may offer insights to new services that could be even more effective.

> And there is no financial incentive associated with it. [posting on Reddit]

There is though. I mean I've myself been financially incentivized into doing it, as have probably many, many others. Nothing sneaky, even.

You've got people posting their stuff all over the place, and the more community or indie or niche it is, the more welcomed is the post-as-advertisement or comment-as-advertisement. And that's not even going into corporate efforts to operate on the platform, which can be very subtle.

> However, once Reddit creates a search engine, and once people get to know that there is an opportunity to game the system and create a financial opportunity,

Perhaps--but Google was _really_ good for a very long time. That was worth a lot to millions of people.

I like to think that in the future, an ecosystem-mindset toward incentive systems could really help, like planning for different emergent systems that incentivize good results.

Anyway, fun to think about, thanks again.


> I would offer a bit of a different perspective here, I think maybe even most do it for what you could call self-soothing.

> For example, a lot of people share knowledge, but it's _their knowledge_, which can also be defined as _sharing their subjective past_ and that's known to be a very soothing thing for people who may even be troubled by the unknown in their personal lives. Hop online, boom, you're a domain expert in your own past. Find a place that's _about your past_, be it /r/linux or /r/formula1 or whatever, and there you go, it's comfy. You generally get an upvote + reply vibes dopamine bonus just for being there, due to your relevant past.

That's a bit too psycho-analytical. A lot of what gets shared on technical/domain-expertise subs is simply technical knowledge. Most posts on such subs are questions, and upvoted comments are typically answers. If you accuse people of feeling good about helping others with their knowledge, well, I'd say that the joy is well-earned by the generous act.

In fact, if we are talking about the exhilaration of being upvoted, it's much easier to farm karma on reddit by lying. There are many big subs that are purely made up stories.


You are illustrating my point for me, and for this I thank you for sharing your subje...err, past experiences.


Then, I'll admit that I don't understand your point. Your tone seems to suggest that there is a better alternative than putting someone's past experience as part of their argument. What might that be?


Despite all of its drawbacks, like the obnoxious mobile app pushing and the deterioration of communities after they pass a certain size, it is still the closest equivalent I know of for the early 2000's forum culture. It is the only social network I know (apart from HN) that does not exclusively deal in bite-sized dopamine hits.

Let's hope that the upcoming IPO does not compromise the incentives there.


Agreed, but it doesn't have to. It is addictive as hell.

I guess, if you own facebook, you track daily stats and percentages and convert them into insights like "how long are people on the site" or "how many ads to posts ratio do people experience" and FB optimizes for that.

I guess, Reddit is still growing enough that it does not have to do such analysis and take any action.

Also, the fundamental structure of reddit is different. People search out for sub-reddits and choose to be in them. The dynamic is different. Even in FB, those who spend time in groups, will feel different than browsing through profile posts.

While FB is a chore, reddit is an acquired taste.

Another important thing I have noticed is that the majority of conversations on reddit revolve around internet events (a blog post, a picture, an upload, a tweet, a news item, a real world event) and seldom is there any "friends and family" networking stuff happening. If I post a cute pic of my kid, I don't expect my brother to see it and say "my nephew is cute", I expect an honest or sarcastic or angry or jealous discussion, or no discussion at all.

The variety of experience reddit provides, is its USP. You can see glimpses of that in Twitter and Insta. A social network pivoted around interests and experiences, rather than friends and family.


There are a lot of dynamic forums out there. Enough for a company (Discourse) to have powering them as their business model.

I tend to navigate very niche spaces and I stumble upon forums on a regular basis.


> However, once Reddit creates a search engine, and once people get to know that there is an opportunity to game the system and create a financial opportunity, people will abuse that system and we will be back to the place where we are now

I mean, you don't have to imagine this, it already happens. People do game Reddit to inject ads, create bots that farm karma then use it to post their content and get them to the front page, etc. The only problem is that (at least some) redditors are incredibly sensitive to this and will go full internet detectives on you. See /r/HailCorporate.


There is already a great search engine, it is Google.

The problem is that it contains an overwhelming volume of spam and auto-generated sites.

To solve that, we all use the best search engine and apply site filters. Sites where we know other humans may have already discussed it.

We filter for forums - and Reddit is the largest of those - and we filter for specific websites we already trust.

Sometimes I filter `site:news.ycombinator.com companyname`, sometimes it's `site:dcrainmaker.com best gps watch`, sometimes it's `site:lfgss.com bottom bracket cable guide de rosa`.

But if I don't know about a specialist website, then it's `site:reddit.com searchterm`.

The search engine exists, it's just full of garbage with no effective way to search minus the garbage.

I'd take another DMOZ, a human curation of specialist sites. Then I'd have something better than Reddit too... I'd have the right filter to make every search useful.


How would you search private forums (subreddits) that you are part of, but not accessible for Google?


I'm already not signed in to Reddit so this already applies. It turns out most forums default to public as that drives organic growth that sustains them over time. There are enough public fora to not have to seek out private ones to answer things.


As someone who appends "Reddit" to his Google queries - SEO spammers have already caught up on this trend. I've seen it on several occasions that some blog site will have the word "Reddit" in the title.


That’s why you should use ‘site:reddit.com’ in your search query to only get results from this domain.


I make custom search engines for this. In Chrome, go to Settings -> Search engine -> Manage search engines -> Other search engines -> Add, then add this

Search engine: Reddit through Google

Keyword: r

URL with %s in place of query: {google:baseURL}search?q=site%3Areddit.com+%s

After that you just type "r best electric bicycle" in the top bar and it'll turn that into "site:reddit.com best electric bicycle".
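For anyone who wants the same shortcut outside the browser, here is a small sketch of how such a template could be expanded. It assumes `{google:baseURL}` resolves to `https://www.google.com/` and that the query is form-encoded with `+` for spaces; Chrome's actual substitution rules may differ, and `expand_template` is a made-up helper name.

```python
from urllib.parse import quote_plus


def expand_template(template: str, query: str, base_url: str = "https://www.google.com/") -> str:
    """Fill a Chrome-style search template with a URL-encoded query.

    The default base_url is an assumption about what {google:baseURL} expands to.
    """
    return template.replace("{google:baseURL}", base_url).replace("%s", quote_plus(query))


# Template copied from the "URL with %s in place of query" field above.
REDDIT_VIA_GOOGLE = "{google:baseURL}search?q=site%3Areddit.com+%s"
url = expand_template(REDDIT_VIA_GOOGLE, "best electric bicycle")
```

Here `url` comes out as a Google search for `site:reddit.com best electric bicycle`, the same query the "r" keyword produces in the address bar.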

I also have

    n  {google:baseURL}search?q=site%3Anews.ycombinator.com+%s
    w  https://en.wikipedia.org/wiki/Special:Search?search=%s
    wk https://en.wiktionary.org/wiki/%s
    u  https://www.urbandictionary.com/define.php?term=%s
    l  https://libgen.is/search.php?req=%s
    a  {google:baseURL}search?q=site%3Aarxiv.org+%s
    t  https://thepiratebay.org/search/%s/0/99/0
and a bunch more I never use. The LibGen one isn't great because it sometimes breaks if they get kicked off their TLD. I worry that someone could snag the domain after they get kicked off and I wouldn't notice, but that's a bit too paranoid.

Then I also have

    p  {google:baseURL}search?q=site:docs.python.org%2F3%2F+%s&btnI=I%27m+Feeling+Lucky
which goes straight to the first search result using Google's "I'm Feeling Lucky" feature (turns out that's what that button was for), but in order to avoid having to click through a redirect notice, you need to install (at your own risk) the "Redirect Google Redirects" extension: https://chrome.google.com/webstore/detail/redirect-google-re...
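
If it helps to see what the browser is doing with those templates, here is a rough Python sketch of the keyword expansion. It assumes `{google:baseURL}` resolves to `https://www.google.com/` and that the query is form-encoded; the `expand` helper is made up for illustration and is not how Chrome actually implements it.

```python
from urllib.parse import quote_plus

# Assumption: Chrome substitutes the user's Google domain here.
GOOGLE_BASE_URL = "https://www.google.com/"

# A few of the keyword -> URL template pairs from above.
ENGINES = {
    "r": "{google:baseURL}search?q=site%3Areddit.com+%s",
    "n": "{google:baseURL}search?q=site%3Anews.ycombinator.com+%s",
    "w": "https://en.wikipedia.org/wiki/Special:Search?search=%s",
}


def expand(typed):
    """Turn 'keyword query words' into the final search URL, or None."""
    keyword, _, query = typed.partition(" ")
    template = ENGINES.get(keyword)
    if template is None or not query:
        return None  # not a known keyword, fall back to normal search
    url = template.replace("{google:baseURL}", GOOGLE_BASE_URL)
    return url.replace("%s", quote_plus(query))


print(expand("r best electric bicycle"))
```

Typing "r best electric bicycle" ends up as a Google search for `site:reddit.com best electric bicycle`, just URL-encoded.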


For Hacker News I always use https://hn.algolia.com - it has more specific sorting and filtering options. Besides HN itself it is one of my main bookmarks because I search for Hacker News discussions all the time when checking out new technology.


Append `site:Reddit.com` instead of just `Reddit` :)


>I was too young to use Google when it first got started

This statement aged me 10 years


Yeah that one was weird. But then I thought about it and I didn't either. I don't think I started using Google until like 2004.


>Reddit can't build a better search engine

Reddit can't build any search functionality whatsoever. Current search is so broken it can't even find my own posts.


I've said it before and I'll say it again... I think building better search goes against everything Reddit wants to be. Reddit doesn't want to be Google or Wikipedia. Reddit wants to be a mix between Discord, TikTok, and Instagram.

The redesign they did years ago pushed a more Instagram like design. Things like chat and rPan are meant to keep you on the site interacting with other users in real time. Their new video player is just a TikTok rip off. They give away awards every day and started pushing award karma in hopes that you'll give them to other users. Reddit wants you on their site, interacting with other users in real time and building a better search goes against that goal.

If I'm looking for a new pair of boots, there's two paths I can take on the reddit site. I can search "best boots" or I can go make a post on /r/boots. If their search is fantastic and I find a post that's 2 years old, there may be some great information there, but I'm probably not going to comment. Even if I do comment, I'm probably not going to get a lot of responses. If their search is shit (intentionally or unintentionally) it pushes me toward making my own post and drives up their engagement metrics.

It would be a massive undertaking to improve their search and there's very little incentive. Reddit wants users who just want to browse and that's not what the users coming in from a specific Google search are doing.


There is massive incentive, since Google earns over $50B per quarter selling ads to people who want to buy stuff. Targeting these queries effectively would be hugely more valuable than my current ad selection of increasingly desperate shitcoin/NFT shilling and pleas to buy Reddit gold.


I disagree. Reddit already has more than enough information to effectively target users. Just because they aren't doing it well right now, doesn't mean they need a new approach.

The subreddits you spend your time on are basically just you classifying your interests for Reddit. I'd argue that knowing that someone spends 1-2 hours a day scrolling through hiking subreddits is just as much of an indicator of their interests as any combination of Google searches. Reddit basically has access to all the information that Google tries to get through 3rd party cookies, except they get it way easier since you're never leaving their platform. They know where you're going, how long you're there, and even if you like or dislike the content (upvotes and downvotes). I'm sure their targeting could be improved with search, but like I said in my first post, I think that's a lot of effort for a small improvement.


It would be even more amazing if ads were stuck exclusively on search result pages. That would minimize the interruption while maximizing the relevance of targeted ads, with minimal tracking needs.


“most of the sites back [when Google was created] were indie websites, which did not care about SEO and money and created websites for sharing information.”

This is a very bad misinterpretation of the history. At the time, we’d already passed the inflection point of most websites being just for sharing information (that probably happened in 1995). Traditional search engines like Lycos and AltaVista were being overwhelmed by SEO spam. I remember having given up on most searches for anything remotely commercial.

What made Google so revolutionary was that it worked around that with a very different algorithm that relied on incoming links and site reputation. Of course the spammers figured that out soon enough, and Google worked hard to stay ahead of things for a long time.

But now what we’re seeing is that Google has captured itself in the SEO game, and the incentives to continue investing in providing a high quality search engine are mostly gone. Google makes money off of showing you ads, and getting you to click on ads, and getting you to use other Google services that show you ads. And the ads they sell are a much smaller and more tightly controlled search space than the spammy web. And they are more concerned with who you are and what you’ve been up to than what you are actually looking for in the moment.

I agree that Reddit won’t be replacing Google as a general purpose search engine, but eventually Google will become so user hostile that alternatives will emerge. It’ll just take longer than we want.


>So how can we fix the system? The best way to fix the system is to prioritize websites that are there to share knowledge, not websites with their primary priority to make ad revenue.

Different solution:

- Make a search engine that is actually good. Blacklist all SEO spam such as pinterest, allow complex queries, restrict by date, site, regex, etc

- Charge 1 cent per search, or as much as is necessary

Why has nobody done this? What am I missing?

I would totally use this. And if the pricing were correct, you wouldn't need a captcha and could just expose it as an api for bots
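
To be concrete about the kind of filtering I mean, here is a minimal Python sketch over an already-fetched result list. The result format (dicts with `url`, `date` and `text`), the `filter_results` helper and the one-entry blocklist are all assumptions for illustration; a real engine would do this inside the index, not after the fact.

```python
import re
from datetime import date
from urllib.parse import urlparse

# Hypothetical blocklist; a real one would need constant upkeep.
BLOCKED_DOMAINS = {"pinterest.com"}


def on_domain(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def filter_results(results, pattern=None, after=None, site=None):
    """Drop blocked domains, then apply optional regex/date/site filters."""
    regex = re.compile(pattern) if pattern else None
    kept = []
    for result in results:
        host = urlparse(result["url"]).hostname or ""
        if any(on_domain(host, blocked) for blocked in BLOCKED_DOMAINS):
            continue
        if site and not on_domain(host, site):
            continue
        if after and result["date"] < after:
            continue
        if regex and not regex.search(result["text"]):
            continue
        kept.append(result)
    return kept


sample = [
    {"url": "https://www.pinterest.com/pin/1", "date": date(2021, 5, 1), "text": "boots"},
    {"url": "https://example.org/boots", "date": date(2019, 1, 1), "text": "old boots review"},
    {"url": "https://example.org/new", "date": date(2022, 1, 1), "text": "boots v2 review"},
]
print(filter_results(sample, pattern=r"v\d+", after=date(2020, 1, 1)))
```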


>Blacklist all SEO spam

That is like saying:

"Electricity problems? Just build a fusion reactor..."

Sure, it's the answer, but the devil is in the details!

The "easy ones" like pinterest are "easy" but I can guarantee you that finding all the other ones and new ones will be much much harder, especially once you have false positives.


>Why has nobody done this? What am I missing?

The answer is in your proposal:

>- Charge 1 cent per search, or as much as is necessary

90% of people do potentially hundreds of searches every day. Especially non-expert users who practically use the search engine to look up the domain of the sites they need to go to.

The people ready to pay a fixed fee per search wouldn't be able to cover even a fraction of the infrastructure costs of a "search engine that is actually good".

Then there is the question of what good means.

Google's solution that "outside links" are considered as "votes" that this site is good is what made it good but also what made it vulnerable to SEO.

Do you have any other idea on how do you decide if a result is good or not beyond the literal text search?
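
For anyone who hasn't seen it spelled out, the "links as votes" idea is roughly the following. This is a simplified PageRank-style power iteration in Python, not Google's actual ranking; the tiny link graph and the `pagerank` helper are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline, the rest comes from inbound links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page with no outgoing links: spread its vote evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


honest = {"a": ["c"], "b": ["c"], "c": ["a"]}
print(pagerank(honest))

# The SEO weakness: a handful of throwaway pages all "voting" for b.
farmed = dict(honest, **{f"spam{i}": ["b"] for i in range(5)})
print(pagerank(farmed)["b"] > pagerank(honest)["b"])
```

The second print shows the vulnerability: five junk pages linking to `b` are enough to lift its score, which is exactly what link farms exploit.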


>Google's solution that "outside links" are considered as "votes" that this site is good is what made it good but also what made it vulnerable to SEO. Do you have any other idea on how do you decide if a result is good or not beyond the literal text search?

Maybe you could have a "report spam" button next to every search result, and sites that got reported more often would get weighted way down in the results. But whatever google was doing around 2010-2012 is my benchmark for "actually good search engine". Maybe hire some people who worked at google back then and have them write something similar

>90% of people do potentially hundreds of searches everyday. Especially non expert users that practically use the search engine to search for the domain of the sites they need to go too. The people ready to pay a fixed fee /per search wouldn't be able to cover even a fraction of the infrastructure costs of : "search engine that is actually good"

I'm pretty skeptical of this. Ok, so you spend a few dollars per day on search. So what? People spend that much on coffee, and you get way more utility out of an actually good search engine. The market would definitely exist
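
A minimal sketch of the "report spam" weighting in Python, assuming you track how often a result was shown and how often it was reported. The formula and the `penalty` constant are invented for illustration, not anything a real engine is known to use.

```python
def adjusted_score(relevance, reports, impressions, penalty=10.0):
    """Scale a relevance score down by the share of viewers who reported it."""
    if impressions <= 0:
        return relevance  # no data yet, leave the score alone
    report_rate = reports / impressions
    return relevance / (1.0 + penalty * report_rate)


print(adjusted_score(1.0, reports=0, impressions=100))
print(adjusted_score(1.0, reports=10, impressions=100))
```

Using the report rate instead of the raw count keeps popular sites from being punished just for being seen a lot. Nothing here defends against coordinated false reports.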


If there was a “report spam” button, companies would hire people to “report” anything that praises their competitors’ products.


>Google's solution that "outside links" are considered as "votes" that this site is good is what made it good but also what made it vulnerable to SEO. Do you have any other idea on how do you decide if a result is good or not beyond the literal text search?

Exactly this! I used to be head of search for a big eCommerce site, and even within this very limited-scope vertical I was always amazed at the vast differences between people's opinions on what "good results" are. It's so personal and ambiguous most of the time.

Even in our smallish dev team (< 15), there was never 100% consensus on what the "query to product-ranking" should be.

Doing this at Google scale, I can only imagine, is 1000x harder.


Step 1 is quite difficult, and step 2 has the downside of disincentivizing people to use it. There are some attempts to make search engines where you pay per month.


Kagi Search.


Building a good search engine is hard

It gets worse if the posts you're trying to index are all labeled with the same title, like "me_irl". While they're usually just pointless memes, there are occasionally a few gems that I would like to find again to share with friends, but it's basically impossible with such poor signals. I think it's a bit sad that those rare gems are essentially lost and forgotten.


When I append "reddit" to a search term for something it's because I'm looking for the opinions of mavens. For example, I needed a flashlight recently and rather than just buy the first flashlight on Amazon I instead searched "flashlight reddit", found the subreddit for flashlight enthusiasts, scanned their wiki and posts about what to buy and then just bought what seemed like their top recommendation for my usecase. It took a few minutes more, but now, I don't just have a flashlight, I have a flashlight that people who care about flashlights think is a good flashlight.

I think the reason reddit building a better search engine wouldn't help them is that it's already pretty darn easy to just add "reddit" to a search and, at least for me personally, I don't do it that often. Typing "reddit" at the end of my search seems a lot easier and faster than going to reddit and typing my query into their search.

Also, contrary to the blog post, I think people have always tried to game both search results and make money on reddit.


This is exactly right: if Reddit ever decided to build a search engine they'd wind up destroying their site, because everyone would try to game the results and fill Reddit with crap to do it.

The point is that search engines are adversarial: people who are skilled and determined are actively trying to replace good results with their results.


one advantage that reddit has is that its content is community moderated. People may have an incentive to game search, but communities have an incentive to keep the content authentic. Not to mention that Reddit as a platform also has some means to restrict spam accounts.

Google if it does this sort of clean-up at all has to do so algorithmically which is kind of shoddy. Reddit basically gets a human workforce for free, so I'm not so sure the comparison holds.


Their site has been thoroughly trashed already. It's an absolute cesspool of bots and groupthink. When I joined back in 2007, it was a completely different site full of actual discussion and not just cringey self deprecating nerd jokes and memes.

It's a shame, but that's how things go. Once a thing gets that big, it gets its soul sucked out.


It's true, early reddit was rather similar to HN actually. I can even pinpoint one of the cataclysmic events, reddit's "Eternal September" so to speak, where things accelerated downhill: When digg 4.0 was rolled out and failed spectacularly, leading to an influx of digg users to reddit.

Prior to that I checked digg out every once in a while and got immediately annoyed of all the "cringey self deprecating nerd jokes and memes". And suddenly they were everywhere on reddit.


Yup. I bounced from slashdot, to reddit, and now here.


Your experience on reddit is 100% the result of the subreddits to which you subscribe. Unsub ALL default reddits and you'll be surprised at how quickly the quality goes up.


Yeah, I unsubbed the main subreddits years ago. I only stuck around for the tech stuff. And as they all grow bigger they turn into meme fests, "hey guys can you be Google for me?", stupid one liner in-jokes, and lowest common denominator shit posting.


Avoid r/Canada, it's just a toxic community where mods delete anything that offends them or goes against their beliefs. I honestly think it is also overrun by bots trying to steer a narrative. Everyone just argues and fights now, it seems.


I think a lot of the Canada-based subreddits are so political and group-thinky, for lack of a better term, that they are better off avoided at this point. /r/Ontario is the most glaringly obvious example of an echo chamber that I see no point in ever checking it, even though I live in Ontario. My small town's subreddit is decent though, just for the odd news stories every now and then.

Best advice for reddit I can give is to stick to small, focused subreddits. In my case I only visit gaming/tech/dev/podcast related subreddits. The site is only usable like this now, imo.


This is a good point I'd never considered before. The better your search, the more people will try to game it, the worse your results will get.


You (and the OP) are echoing Goodhart's Law, which describes one of the reasons that I've lost faith in the wisdom of markets.


Yes, however there is an element of real feedback: if a subreddit turns out to be garbage, then people will choose a different subreddit with a different moderation policy. It is possible to prioritize by number of points or by some dynamics of a subreddit's popularity; that would incorporate this feedback.


I've seen people game bad search engines just the same.


This take doesn't make any sense to me.

> Reddit posts are good because the people who create these posts or make comments are doing it to share their knowledge. And there is no financial incentive associated with it. They know that do it to share their knowledge.

Ok, that's why people are Googling for "xyz reddit"...

> However, once Reddit creates a search engine, and once people get to know that there is an opportunity to game the system and create a financial opportunity, people will abuse that system and we will be back to the place where we are now. SEO stuffed websites.

There's nothing in this article explaining how the existence of better search on Reddit would provide different incentives than the very popular search engine for Reddit that is being used today (Google). What is this "financial opportunity" that doesn't already exist that will spell doom?


Reddit results are not organic at all, though they may appear to be on the surface.

Subs have their own "reddit" brands in accordance with the topic. They also have their own "reddit" opinions, shared among almost every sub with enough people in it to justify propaganda.

I already did my own informal research on this - reddit has transitioned into consumption. I went sub by sub for thousands of posts, even a few years ago, and found that at least 75% of the selection of hobby subs I chose were related to the consumption of products within that hobby, not the active participation and knowledge translation within that hobby. You'd think camping subs would be talking about camping? Sure, see the sidebar for basic info. However, what you'll really see is 80 of the top 100 posts of the day will be pictures of various products within different backgrounds/settings.


> Subs have their own "reddit" brands in accordance to the topic. They also have their own "reddit" opinions shared among almost every sub with enough people in it for justify propaganda.

Which is actually the cool thing about Reddit, because you can just create your own subreddit about a topic and set the rules as you see fit. Don't like that camping pics have products in the background? Alright, set a rule for your new sub that says no products in pics, and then rule your kingdom with an iron fist.

Heck, that could be the whole point of your sub. /r/campingProductFree or something.


This used to work well, but moderators have "professionalized" and nowadays will viciously stamp out any kind of separatism.

Creating a forum has never been hard. Attracting people is.

I would imagine this would get worse as money corrupts reddit the same way money corrupted google.

Whoever moderates /r/camping probably isn't doing it for free.


> This used to work well but moderators have "professionalized" and nowadays will viciously stamp out any kind of separarism.

How would they go about doing that? I see new, competing subreddits being created all the time. Like, what could a mod do to stop me if I wanted to create a new competing sub?


Censor posts even mentioning it in competing subs they moderate?

Maybe I wasn't clear about that.

Even if everybody from an existing subreddit would move to a better one, they won't if they don't hear about it.


> They also have their own "reddit" opinions shared among almost every sub with enough people in it for justify propaganda.

This has been a problem I've observed when using reddit to choose products. If a bunch of people are going to get together to talk about X, they are going to be way more into X than most people. When I was choosing a mesh network system the conclusion I got from reddit was that spending less than $600-$800 and having a system without a wired backbone would make my internet unusable. It was extremely difficult to find a "good enough for typical users" recommendation since none of the posters were typical users.


My hot take is that this represents actual human behavior, like in a human Sturgeon's law sense. 90% of campers are going to camp less than X times a year. (X == 3? IDK. some low number) They're going to go to Reddit and look for "best tent" and then go to REI and buy it and feel great. That's all they need.

10% of campers are going to camp > 10 times a year and have nuanced opinions about tents. Occasionally, these campers will post their lengthy tent opinions on Reddit, but mostly they'll stay quiet because they already know what the best tent for their own uses is and don't need to deliberate on it.


I think people give SEO too much credit. SEO is a big problem as long as there is only one big search engine, with one algorithm. That allows absurd specialization to abuse the idiosyncrasies of that search engine. That's just how it goes with monocultures though.

I've had very little problem identifying SEO spam by just not being Google and promoting other values than they do. Since the search engine spammers are very mindful to follow every one of google's best practices, then you can effectively find the rest of the internet by just punishing sites that do not, in some way, follow Google's rules.

Unencrypted HTTP? Bring it. Poorly optimized for mobile? Don't mind if I do. Weird looking URL? This is gonna be good! Et cetera etc.

You can also weed out a lot of this nonsense by looking at where websites created by humans link. Rarely do they link to spam.
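
As a toy illustration of that contrarian scoring, in Python. The signal names, the weights and the `contrarian_score` helper are all invented for this sketch and only show the shape of the idea; it assumes you already have these signals per page.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    uses_https: bool
    mobile_optimized: bool
    tidy_url: bool
    links_from_human_sites: int  # inbound links from hand-made sites


def contrarian_score(page: PageSignals) -> float:
    """Penalize Google best practices, reward links from human-made sites."""
    # Each followed best practice costs one point (weights are arbitrary).
    followed = sum([page.uses_https, page.mobile_optimized, page.tidy_url])
    score = -1.0 * followed
    # Humans rarely link to spam, so their links count as endorsements.
    # Capped so one heavily linked page cannot dominate everything.
    score += 2.0 * min(page.links_from_human_sites, 5)
    return score


old_homepage = PageSignals(False, False, False, links_from_human_sites=3)
seo_blog = PageSignals(True, True, True, links_from_human_sites=0)
print(contrarian_score(old_homepage), contrarian_score(seo_blog))
```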


I think Google is over because the problems we have now are harder and harder to google.

That, combined with stagnation of constructive and/or simple progress, all caused by lack of energy, means there is less and less you don't know that Google can teach you. I haven't googled anything important since around 2016.

Reddit gives you a narrow subject with people that care about said subject, much much better!


Well there still seems to be a lot of deep content on the web, it's just that google doesn't show you any of it. You may, if you're lucky, get a Wikipedia box. And Wikipedia, for all its breadth, is incredibly shallow on most topics.

I submit that the reason you aren't learning anything from google is that it just doesn't surface anything for you to learn from, but usually just the same dozen or so big sites, wikipedia, stackoverflow, etc, and the rest is all spam and ads.

If you google "Strongly Connected Components", it shows you a wikipedia page and a few pages with what looks like freebooted content and a bunch of ads.

This is a far better resource than any you'll find on Google:

https://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorit...


So why do you think this particular website must be the top answer? I looked at Google's results and I disagree: they are not bad, with actual program examples, videos and papers, for example.


Besides the ads, the Wikipedia article, the next result I'm getting is this: https://www.geeksforgeeks.org/strongly-connected-components/

Which seems to mostly just be paraphrasing the Wikipedia article and the page I linked. The kent.edu page seems aimed at teaching you the material, while the geeksforgeeks page seems primarily focused on getting you through your introduction to computer science exam. It gives answers, not understanding.


I disagree, you can get a pretty good understanding of the subject. There are many resources linked there, and you are focusing on a few perceivably weak ones. Your link is not an exhaustive answer. E.g. I prefer learning from actual code examples and videos.


> Reddit posts are good because the people who create these posts or make comments are doing it to share their knowledge.

people post to Reddit (and here) not to share their knowledge but because they get a little squirt of dopamine from upvotes and downvotes and replies.


I find that most people on this site use it to engage in discussion, to learn and to teach if they think they have something of value to say. For the most part, you're right about Reddit these days. When I used that site I'd check votes every time I was done scrolling. This site I check threads to see if anyone responded to me with anything I'd like to reply to.


Actually they can and only they can.

As noticed, once Reddit builds a search engine, people will flood it with spam. But Reddit is in the unique position that they can figure out who is a genuine user. They have the ip addresses of all the submitters and they have the source redditor of each downvote.

They can still show the comments from VPNs and such. But for their search engine, they can exclusively rely on using the signals from verified IPs.

Instead of building it themselves, they could cooperate with Microsoft and Facebook for the search technology and to know each user even further. Those signals, combined with the downvotes to identify spammers, could create the foundation for Google's nightmares.


This argument makes precisely zero sense. We've already established that people are using site:reddit.com in search queries so the incentive already exists. And frankly Reddit is already at least 50% astroturfed. But search results on Reddit are still more useful for many queries.

It's also worth noting that unlike Google, Reddit has total control over the content that they're indexing. If they detect bad SEO actors they have far more power to influence them or shut them down completely because they're indexing a platform they control. Google is indexing the open web.


Reddit search results could be better. For example, it'd be great to search only in comments, or in post titles or descriptions. Right now, it searches posts only. The subreddit search shows some relevant subreddits by their names, but the other results aren't valid. E.g. type any search query and there's a high chance that r/teenagers is there too. There's no need for fancy algorithms, just give us more filters. Meanwhile, people try to build websites around Reddit to leverage its search (which is what I do).


I just don't agree with this trope that Google is "bad" or even quantitatively "worse". It's almost always based on:

1. Ads in search results. I don't agree with the premise that ads are necessarily bad. Someone selling a thing may be the best result for a search. As long as ads are labelled as such I'm fine with this;

2. Anecdotes. I can match you anecdote for anecdote;

3. Personalization: it depends on what's included but knowing you're in Chicago changes what's most relevant when you search for "auto repair shop"; and

4. Astroturfing, content farms and SEO gaming. To me this is the biggest problem and will be forever an arms race. The article touches on this. It's really a product of views (ie display advertising) and clicks (ie affiliate programs) generating revenue and any search engine will have this problem.

The lesson you should take from using "site:reddit.com" or "reddit" or any other term to Google search like this is evidence of just how hard search is. The 21st century is littered with the corpses of dead "Google killer" search startups.

The article (correctly IMHO) mentions Reddit doesn't get as much attention SEO-wise and that's actually what can make it useful but it misses the main point: there are communities that form around niche topics (eg vacuum cleaners) that are generally resistant to astroturfing.


> I was too young to use Google when it first got started, but according to many, in the beginning, Google had better or more accurate search results. One reason for this was the fact that most of the sites back then were indie websites, which did not care about SEO and money and created websites for sharing information.

> But I think all can agree that Google results reduced when people started to game the algorithm and create search engine optimized garbage websites.

Well not quite.

There were piles of SEO crap sites being produced in 2001. Google used to manage to react to any new exploit and improve their results. They actively fought that battle for at least a decade or so. They used to win.

It is only more recently that they seem to have given up.


>It is only more recently that they seem to have given up.

They haven't given up; the OP has a point. The "sites" you are hoping for Google to return _don't exist_. Any website online right now that doesn't exist to drive ad revenue is exceedingly rare. In 2001, there were way more websites that existed just for fun; any tom, dick and harry could open up note pad and get a website online. That doesn't exist anymore.

It's my opinion that those who complain about Google search results are frustrated that Google can no longer find a web that no longer exists


It does exist, just the same - probably more so, because the web as a whole is many orders of magnitude larger now, so there's just more of everything.

However, 90% of everything is crap, so the good stuff is not buried under thousands, or millions, or mere billions of pages of crap, but trillions.

> any tom, dick and harry could open up note pad and get a website online.

This part hasn't changed one iota - you can still do _exactly_ that, even following a tutorial from 1996 to the letter, if you want. All that stuff still works fine - and I have websites that old, built that way, that are still up and work fine.


> It's my opinion that those who complain about Google search results are frustrated that Google can no longer find a web that no longer exists

You ever try to search for a niche piece of software for your computer? Google surfaces the "Top 7 Best software" content farm articles before you ever get to the official sites. That's what some of us are frustrated about.


Have you ever searched for a recipe in Google? Have you noticed that the top results always go on for a few pages of SEO-optimized storytelling and explanation before you actually see the recipe?

I'm sure someone at Google has. They could fix it if they wanted to, and boost well-known sites that get straight to the recipe (allrecipes.com, foodnetwork.com) instead of making you scroll through the SEO spam first. They haven't.


I used to think this too, but if all the recipe websites start going straight to the recipe without the personalization, then how would you rank them? Remember, there are thousands of these websites while there are only so many ways you can boil an egg, for example.


PageRank-like system, ideally based on real identities? (i.e. depending on how often the recipe is linked from Facebook, etc.)

Also if you are specifically doing it for recipes, just select manually websites that have precise ingredient amounts, very detailed and explicit recipes, that explain the reasoning behind each step, provide photos of how things should and shouldn't look like, why the proposed technique is optimal and/or the tradeoff of alternative techniques, the chemistry involved, comparison with other recipes, etc. (although in my experience most or all recipe sites are horrible and aren't even vaguely close to something like this)


It is largely Google's fault.

Example. If you searched the word "flight" in 2002, the first two results would be NASA links and the third would be the Museum of Flight. These pages still exist. But now the entire first page of Google search results are cost comparison websites. This is Google's badness, not the web's.


I mean, you do you but when I search flight I for one am looking to purchase a flight. If I want NASA links or the Museum of Flight I would specify that.


Alternatively, I could have searched for "flight comparisons" or "US airlines" back in 2002 if that's the type of result that I actually wanted.

The problem is that there is no way I can tell current Google to give me stuff that's interesting, novel and information dense anymore. It's all well and good if I knew that the Museum of Flight existed before my search, but the point of Google is to tell me about its existence. I can't tell Google "flight ... but no cost comparison or airliner website stuff"; such a query is currently not possible.


It's not possible because the demand for this is very low and implementation cost is extremely high. No one will pay for that.


> demand for this is very low and implementation cost is extremely high.

Demand is real as evidenced by everyone in this thread wanting such a search engine. We are not representative of genpop, but we aren't insignificant either.

Google had this in 2002 (i.e. a default to novel and info dense results unless otherwise specified), so it can't be that hard or expensive.

> No one will pay for that.

So we have ads.


> They haven't given up; the OP has a point. The "sites" you are hoping for Google to return _don't exist_.

I was beginning to wonder if this explanation was true but after starting to use Marginalia (mostly for fun) I now know the old web still exists - Google just doesn't let me see it.


They have given up, as in if you Google anything technical these days you'll get tons of spam sites that just copy the content from GitHub and StackOverflow. Google used to punish sites that just copied content from other sites, but that isn't the case anymore.


No. There still are forums on lots of topics. But search how to fix X and instead of prioritizing a forum site with answers you get 5 top results trying to sell you the parts to fix X. The answers are always way down the list.


>There still are forums on lots of topics.

What is Reddit if not the largest forum in the world? You have the same gripe; you are looking for a web where people share things for free (like in a forum), and when Google first existed most of the web was like that.

And if google were to prioritize forums; then SEO blackhats would do everything in their power to make every website look like a forum; and then you are back to square one. The problem is the profit motive, not Google's algorithms.


Yes, but Google crawls the internet, so it would be pretty trivial for it to scan a site and realize it was full of affiliate links and trying to sell things. They could rank those sites lower than a site with just text that isn't trying to sell everything.
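
A minimal sketch of that kind of scan, assuming affiliate links can be spotted by their query parameters. The parameter names in `AFFILIATE_PARAMS` and the function `affiliate_link_ratio` are made up for illustration; nothing here reflects how Google actually ranks pages.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs

# Hypothetical query parameters treated as affiliate markers in this sketch.
AFFILIATE_PARAMS = {"tag", "ref", "affid", "aff_id"}


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def affiliate_link_ratio(html):
    """Return the share of links that carry an affiliate-style parameter."""
    collector = LinkCollector()
    collector.feed(html)
    if not collector.hrefs:
        return 0.0
    flagged = 0
    for href in collector.hrefs:
        params = parse_qs(urlparse(href).query)
        if AFFILIATE_PARAMS.intersection(params):
            flagged += 1
    return flagged / len(collector.hrefs)


page = (
    '<a href="https://shop.example/item?tag=abc">buy</a>'
    '<a href="https://forum.example/thread/1">thread</a>'
)
ratio = affiliate_link_ratio(page)
```

A ranker could use a ratio like this as one demotion signal among many. On its own it would be easy to evade, for example by hiding the affiliate hop behind a redirect.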


Explain away how literally verbatim searching with quotes doesn't return verbatim results. Or how searches that match the very title of an article return mountains of SEO spam before listing the article, if it shows up at all.

If there's an algorithm being gamed that causes that, that algorithm was designed explicitly to be gamed.


Google intentionally finds synonyms for obscure jargon that would have typically turned up these sites. The verbatim setting barely makes a difference.

The reliance on synonyms and semantically related topics has made iterative refining impossible as well, it's essentially non-converging.

My solution has been to search in multiple foreign languages, which looks less conducive to these "affordances".


Yeah, having spent a lot of time working with dense vector based techniques, especially in the context of a domain which is really similar to semantic search, I can say 100% that I don't want BERT and related techniques anywhere near my google search. That's the real crux of the problem, is the reliance on these sorts of models, and their utilization within query rewriting schemes.


YES exactly THIS.

When the "No Search results" page was deleted, the internet died.


Sure they do. It is just that in 2001 there were 30M websites, and let's say 5% of them were good. And in 2020 there are 2000M websites... and the number of good ones is about the same (because there are only so many people willing to spend time making a personal website). So the fraction is much lower now.

It is easier than ever to open your own website, and there are tons of free providers today. Look at any manual aggregator -- like hacker news or hackaday.com -- to see them. This is a variety I did not have in 2001!
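
Plugging in the numbers above (the 5% share is a guess from the comment, not a measurement, and the variable names are just for illustration):

```python
# Figures from the comment above; the 5% "good" share is a guess, not data.
sites_2001 = 30_000_000
sites_2020 = 2_000_000_000
good_share_2001_percent = 5

# Assumption from the comment: the count of good sites stayed about the same.
good_sites = sites_2001 * good_share_2001_percent // 100
good_share_2020_percent = good_sites / sites_2020 * 100

print(f"good sites: {good_sites:,}")
print(f"share in 2020: {good_share_2020_percent:.3f}%")
```

Under those assumptions the good sites go from 5% of the web to roughly 0.075% of it, even though none of them disappeared.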


Part of the problem is that Google started heavily favoring the age of a domain about a decade or so ago. If you spin up a new site, with high quality content, you simply won’t show up in Google for a long time.

Also, the amount of good content today is vastly larger than it was in 2001. Though you are right about the ratio changing.


Back in 2001 advertising was far more lucrative than it is today. So a website could be both: useful, and a for-profit, ad-supported enterprise. Hell, the reason why most online versions of newspapers were free was because advertising was so lucrative. Now, ads pay peanuts, and only make sense if you already have ludicrous scale, or are able to game search results with low-effort """content""" to get that scale.[0]

So it's both "less signal" (fewer high-quality websites) and "more noise" (more spammy SEO-to-oblivion sites).

[0] A similar thing happened to video advertising, which is why even really good YouTube videos are filled to the brim with ad integrations, merch tie-ins, and calls to like and subscribe.


In this vein it used to be easy to find dissenting opinions on something:

<Topic> sucks

RIP "sucks".


Because it's simply not possible. There are too many barriers now. You need to get a static IP, for one. Domains cost more. Email for those domains costs quite a bit. And your friends are far less likely to visit since they're all addicted to infinite scrolling websites.


> It's my opinion that those who complain about Google search results are frustrated that Google can no longer find a web that no longer exists

That, and the Google search results are bad for the parts that do exist.


Well you are stating your opinion as fact, which is not useful when your opinion is wrong, which it is in this case.

There is still lots of great content out there, and it is oftentimes the target of my original Google query.


I think the incentives are not there anymore.

Before the rise of smartphones a competitor could have emerged and taken users from Google. Yahoo wasn't completely dead yet. Platforms were mostly open, and a sea change could have happened (though it was highly unlikely).

Into the smartphone era, other engines became anecdotal (Bing just exists in the background), there was enough lock-in to not have to fear direct competition, and the main focus became having users spend more time on the web and search more, instead of staying in native apps. That's when, if my memory is correct, they ditched language-based landing pages and started phasing out modifier support and all the "power user" parts of their interface.

Fighting SEO scams becomes a waste of time when it has no real impact on whether users will continue to use the product or not.


When it came out, PageRank was a really innovative way to order search results that cut through most of the "SEO" at the time, which was webpages doing lots of unsophisticated things to rank well for a given topic (e.g., putting that topic in the <title> many times).


Yeah. PageRank worked because web site owners were linking to each other as a service to the "Surfers", signaling what they think are high quality sites. There is an incentive to keep sites high quality or it'll risk being delinked.

When people stopped "surfing" and started "searching", the whole mechanism broke down. Plus, Google now uses other metrics for ranking sites.
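
For reference, the link-voting idea can be sketched in a few lines. This is a simplified power-iteration version with the commonly cited 0.85 damping factor, not Google's production ranking, and the page names in the example graph are invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: links maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of inbound links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Invented example: two sites vouch for "reference"; nobody links to "keyword_spam".
links = {
    "blog": ["reference"],
    "forum": ["reference"],
    "reference": ["blog"],
    "keyword_spam": [],
}
ranks = pagerank(links)
```

The keyword-stuffed page ends up at the bottom no matter what it puts in its own `<title>`, because the score only moves when other pages link to it. That is also why the scheme depends on site owners linking to each other honestly.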


SEO before Google PageRank was basically submitting a site multiple times in an online form to Altavista and it would rank higher. As a teenager I automated this with the only programming language I knew, mIRC scripts.


The real experts or people who create great content just for the sake of sharing information never care about backlinks and whatever else Google requires for SEO. OTOH, the SEO scammers like Pinterest create little content and give the search engine exactly what it wants (backlinks, social signals, etc.), which is why the good content is so hard to find.

But at the end of the day it's the search engine's job to separate the wheat from the chaff, and Google has been doing a very poor job of it lately. Like the article rightly said, the best approach is to prioritise websites that are about sharing knowledge, not affiliate marketing, paid ads, marketing, etc.


Google is an advertising company. So it makes sense that their search engine (and youtube) exist as a platform for advertising. It really is that simple.

Reddit doesn't want people to be able to easily find information. They want to make it just easy enough that you won't leave but hard enough that it will take you longer to find what you need. They are all about retention and finding that balance between aggravation and engagement. Twitter is the same.


I wonder if Google is the de facto search engine because they actually cannot be beaten, or because it's virtually impossible for a new player to "scrape" the internet as easily as Google does.

Websites, including Reddit, are happy to give all their data to Google bots, but if anyone else tries to crawl them they will get black-listed very fast even if they follow the robots.txt rules.
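
For anyone who hasn't written a crawler: checking robots.txt is the easy part, and Python ships a parser for it. The rules below are a made-up example of a site that welcomes Googlebot and turns everyone else away, not Reddit's actual file, and `HobbyCrawler` is a hypothetical user agent.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: Googlebot may fetch everything, all other bots nothing.
rules = """
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

url = "https://example.com/r/boston"
google_allowed = parser.can_fetch("Googlebot", url)
hobby_allowed = parser.can_fetch("HobbyCrawler", url)

print("Googlebot allowed:", google_allowed)
print("HobbyCrawler allowed:", hobby_allowed)
```

With a file like this, a polite new crawler is locked out before it sends a single page request. Rate limiting and IP blocking then come on top of that.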


I lobbied heavily to crawl my company's intranet since our official IT folks would not include the ~10k "grey" (as in unofficial) web servers set up by technical people over the years.

So I tried this myself and quickly realized how hard it was. I stumbled upon thousands of devices that had TCP port 80 (and 443) open so I had to devise various ways of removing these devices.

By the end of my project, I had run out of disk space so many times it was laughable. And tuning the crawling and the resulting mountain of data was daunting and started to affect my day job so I eventually gave up.

A couple of months after my "project", and with enough warnings from IT and our network security folks, our company decided to purchase a couple google 1U servers.


"However, once Reddit creates a search engine, and once people get to know that there is an opportunity to game the system and create a financial opportunity, people will abuse that system and we will be back to the place where we are now. SEO stuffed websites."

Reddit already has a "search engine" to search reddit.com. People used to call this "site search". As Google became popular, some websites gave up on site search and outsourced it to Google. Nevertheless, plenty of websites still have their own search engine. They do not use Google.

AFAICT, Reddit does not use Google for site search. Let's put the premise in the author's title to the test. Someone suggest some search terms. We will then search (a) using Reddit's site search and (b) using Google search with "site:reddit.com". We can then compare the results from (a) and (b).


Reddit does have its own search engine, it's just bad. The premise in the author's post is that as soon as it becomes "good", the SEO optimizers will ruin it in the same way they ruined Google search.


I see reddit as a replacement for the old "forum" search function that Google used to have. It's an easy first place to check and see if someone has information about a very specific niche.

An example would be something like the UKVisa subreddit. I want to know if someone else has run into the same issue that I had with the god awful 3rd party visa processing website that the UK have. I'm not able to find that information on Google - the best I can hope for is something along the lines of a general guide for how to process a visa from an immigration law firm. But on Reddit there is almost certainly someone who has run into the exact same issue, as well as others who have been through the process and can answer from first hand knowledge.


Kagi is the search engine I'm most excited about. I'm using it as my default now and will pay when it exits beta.

The placement of financial incentive at the user should help them invest in combatting spam and returning good results. The results are already pretty solid.

Money easily creates perverse incentives in media. The advertising driven model is the easiest to scale. You lose the friction of signups and people don't want to pay anyway. They'll happily blow $50/month at Starbucks but $10/month for media is too much. Unfortunately the advertising model places the incentive around capturing user attention rather than delivering anything useful, so it incentivizes the media equivalent of empty carbs and sugar.


i remember history very differently. the old search engines before google analyzed only page content in isolation, which led to SEO sites appending invisible blocks of keywords at the bottom or putting loads of barely related keywords in the page title.

google was the one that managed to break this cycle with pagerank. suddenly, the spamword filled trash sites were worthless because nobody (of worth) linked to them. on the other hand blogs flourished, because their structure worked well with pagerank.

it took a long time for the spammers to re-build their networks and strategies to adapt to the situation, but even then having a well-written and regularly updated company weblog usually did more for your search ranking than other SEO trickery.


This.

In the early days of Google search, there were lots of other search engines (Hotbot and Altavista come to mind), and they were crap compared to Google.


Although this article has many valid points, it's important to say that Reddit is mostly used by english speakers. (Maybe 1/3 of the world, i think that probably less). I'm a software developer and i think that most of the time i prefer websites like stack overflow or github issues directly.

As a software developer i see that reddit answers most questions like "Is it better to use A over B?", "Why using C can be bad for my code?". It's for questions that are more open and less exact or technical.

So yeah, i don't see reddit being used as a search engine because english speakers are not even half of the world and it definitely doesn't answer most day-to-day questions.


> One reason for this was the fact that most of the sites back then were indie websites, which did not care about SEO and money and created websites for sharing information.

I was around when it started. Let me tell you there was a massive amount of SEO going on back then. The big driver to Google back then was that it was a crawler not directory, so you could get newer results. But most importantly the search was clean and the landing page was empty.

All the other search pages were so rammed with crap and asking to install ask Jeeves toolbars. Google was minimal.

Webrings were the main way to find indie sites for me anyway. Digg was always good too


> I’d still prefer a targeted ad over a random ad that is unrelated to me.

No, a million times no. A targeted ad requires my personal information being collected, and the resulting assumption of my needs may be offensive and even dangerous.

I prefer a random ad. Or better, a place where I can just search the product I want in detail.


"Random ad vs targeted ad" is a false dichotomy. There are also contextual ads. For example, the audience of Linus Tech Tips regularly gets their fair share of contextual ads delivered by Linus in-band. You're not going to see an ad of a toilet paper on LTT. Instead you're going to see ads of hard drives, Squarespace, online shopping platforms, etc. The ads are the same for all viewers (i.e. definitely not targeted) and don't require any JS spyware. Moreover, LTT (and other creators) get more money from those ads than from YouTube ads. Everybody wins (except Google and Facebook).


Everyone wins except the viewer, who now has to manually skip through ads because adblockers don't block those types of ads.


SponsorBlock might be of interest.

> Skip sponsorships, subscription begging and more on YouTube videos. Report sponsors on videos you watch to save others' time. [0]

[0]: https://chrome.google.com/webstore/detail/sponsorblock-for-y...


how do you expect the producers of the content you consume to continue putting out the content you want to consume if they're unable to monetise that content?


Honestly, that's not my problem. "How do you satisfy your customer's needs while making a profit" is the same problem every other business has. It's up to the content creators to solve it. If they can't, then they go out of business. It's that simple.

If you can't make a profit from your content because you're giving it away with blockable ads then you should stop doing that and do something else.

If I went to a Ferrari showroom and they said "Here's a free Ferrari, but you have to keep that advert we've put on the windscreen. If you take it off we'll have to stop giving Ferraris away.", then I am driving a beautiful ad-free Ferrari around now and the showroom can't do anything about it.


> It's up to the content creators to solve it. If they can't, then they go out of business. It's that simple.

They have been solving it, over and over, for the past 20 years. The way so far is to find new ways to shove ads down enough people's throats. And the internet can't stop complaining about them doing so, understandably.

> Honestly, that's not my problem.

So, yeah, maybe it kinda is. A state of war is mostly a lose-lose situation.


> A state of war is mostly a lose-lose situation.

It's not lose-lose though. The creator loses, but I don't. If I don't care about a specific creator (and I obviously don't if I don't pay them and I block their ads) then I'm only losing access to the content they produce, which I value at essentially zero. So, really, I'm not losing anything. I just move on to the next YouTube video/podcast/whatever.


That's a misrepresentation of what's happening. They've largely not tried to solve the adblock 'problem' because adblock users are in the minority. And yet, despite that, they've still ramped up adverts. They're always going to advertise the maximum amount they can before it hinders returns - that's economics.

So I could argue that by using blockers I'm reducing the amount of advertising the populace will take and increasing the value of a given, lower level of advertising.


Why would consumers consume content if 50%+ of the content is ads? Adblockers make content consumable; without them, i'd stop watching 80% of the stuff I watch now. If ads (the amount and the type) were sensible, then sure, most of us wouldn't really care that much about them, but if a 35-second "#shorts" video has a 20s ad at the beginning, a "this is sponsored by..." by the author, and a 15s ad at the end, then we get pissed. Also, every fucking video has the same ad... raid shadow legends, nordvpn, raid, vpn, raid, vpn... same thing in every video. It really became so bad (for me personally) that I'd rather not watch the video at all if I can't avoid the ads, because it's just too much.


Seriously. I don't get why people expect that they should pay nothing to watch their favorite content creators. Sure you bought an LTT water bottle, but make no mistake Google is a business, YouTube is a business and they are not providing these videos out of the goodness of their heart. If YouTube ever doesn't make sense (algorithmically, financially, or through whatever metric Google decides is important) Google will cut it.


I don't think people expect anything for free but I don't want to pay with getting bombarded with ads. If their business model is not viable then they have to find a new one. Just as we can't expect content for free they can't expect me to sit there and look at adverts out of the goodness of my heart. If at all possible, I will find a way to block ads out of my mind as much as I can.


Cable TV Promises.

Buy our service and get no ads. 5-ish years later, just like OTA.

NY Times ('81) https://www.nytimes.com/1981/07/26/arts/will-cable-tv-be-inv...

Satellite radio, mostly no ads. Tune to the "laugh" channels ... tons of ads.


Personally I don't feel a moral dilemma about not giving revenue to YouTube.

That said, if Person A buys an LTT water bottle and blocks all ads and Person B watches all ads and never buys anything, then Person B needs to watch thousands of ads to make up the same revenue from that single merchandise purchase.

I agree with Linus that skipping ads is piracy, but if you finance it in better ways should you care?

For reference: https://www.youtube.com/watch?v=6jUxOnoWsFU


That was basically his stance. It's piracy, and I see no issues with it.

I'm not sure why people got absolutely butthurt about that.


Linus's point has a slightly different nuance: adblocking is theft. Similarly to how Taylor Swift was against Spotify because (IIRC) people should buy albums.

My version was that adblocking damages revenue in very quantifiable ways* and you can work around that.

* I am almost sure that YouTube's recommendation algorithm does not count watched ads as a metric; that is, maybe watching ads increases the total watch-time of the video, but otherwise an ad-full view is counted the same as an ad-free view. But this belief is not based on anything.

If this happened to be wrong then an adblocker could exponentially hurt a channel's growth and that might equal quite a few water bottles.


Oh no, YouTube shuts down and we all have to go outside and read a book. Realistically, enough people will remain ignorant of ad blocking hygiene that YT will remain relevant to the big G for the foreseeable future. "What if everyone made the same choice?" We already know they don't.


People have been able to tune out of, skip or ignore advertising on every media platform since the beginning of time. There is no obligation, anywhere, on the part of the consumer to engage with advertising in exchange for content.


> I don't get why people expect that they should pay nothing to watch their favourite content creators

Because they were offered up for free? And only now are charging?


I for one could do without a large percentage of the content made for monetization.


I think the honest answer is that one expects to be in the minority that uses adblock -- and for the content to be monetized by the majority that don't.


I'm in a restrictive country that's extremely difficult to get stuff to, I don't have the standard payment methods and instead all these internal only services.

There is absolutely no point in me watching those because I couldn't buy anything even if I wanted to


Behind a privacy respecting paywall if they want. Let the market decide if their content is worth some dollar amount. Advertisement is never the answer, IMO.


I don't mind sponsor slots - they directly pay the content creators and don't involve me being stalked as I go about my life.

If the sponsored segments get irritating, and in some cases they have, I just go elsewhere. If I don't want the content enough to put up with the sponsor segment then I simply don't want the content enough.


It depends how you define targeted. An advert for Intel while I'm looking at a technical site is targeted in the traditional sense, like advertising sports equipment and beer in TV broadcasts of football (where the audience is likely to be interested in one, the other, or both), and I'm fine with that.

Like you I object to adverts that are targeted by stalking me as I go about my daily life, or buying & selling information gleaned about me by an entity performing such stalking. Targeting based on what I'm actively doing right now seems fine.

(and away from the targeting question, I also object to other dark patterns like pop-ups/-unders, auto-playing audio & video, etc.)


Yeah totally, that statement is actually absurd to me. Because ANY ad is equally off putting. But a targeted ad is based on personal information.

Regardless of how the ad was made, I will do anything to avoid it, just as I would any other ad.


A targeted ad can be a contextual ad, with no personal information involved.


Ads are not the fundamental problem. The fundamental problem is tracking.

The internet has not been broken by ads. It’s been broken by data harvesting and monopolies. Contextual ads are no problem IMO, and can sometimes be a helpful (economic) signal. Without ads how can challengers get some awareness?

So yes for targeted ads, but only if it does not require personal data. Contextual ads can be targeted without engaging in use of personal data.


I hope you realize this is just your perspective.

To offer the alternative, to me ads are the fundamental problem. As much as I don't want to be tracked and avoid providing my information in any way possible, I just assume I am being tracked. It does not "annoy me" while consuming some content.

On the other hand, any random ad / paid promotion that gets through the blocking does annoy me immensely and I do consider nearly any form of advertisement harmful.


Sure, I realize. I'm not alone in this thinking but realize I am out on a limb. I totally understand your point of view; it was how I thought for most of my life. Perhaps it was my shift from being mostly developer and one-company focused to later helping and/or working with or alongside hundreds of startups, including YC S12. Without ads to gain awareness and grow, many now-great companies would not have crossed the chasm. And we all benefit from those companies, from innovations that flourished, and use them in our toolset. Sadly a few of them achieved their own, or their backers', goal of being a monopoly, leading to the annoyance, damage and extraction we all experience.


Personalizing ads per se does not need tracking or profiling. Doing it server-side and checking if the targeting works does.

The Mr Robot scandal, for better or worse, was implemented almost like this:

https://news.ycombinator.com/item?id=15941302


This is too dismissive. Reddit users and communities have history, links, upvotes etc, which is hugely valuable, and it should be possible to implement a PageRank-type system to separate the signal from the noise and surface the most valuable results. Now that noise is still considerable, thanks to karma farming etc (did you know you can buy and sell Reddit accounts?), but humans can usually tell apart the shills from the legit posters and automating this doesn't seem insurmountable if Reddit cared. Which, oddly, they don't seem to.


> once Reddit creates a search engine, and once people get to know that there is an opportunity to game the system and create a financial opportunity, people will abuse that system and we will be back to the place where we are now. SEO stuffed websites.

reddit already has a search feature. It's garbage, but it does exist.

Also, plenty of people already use Google to search reddit. If reddit integrated a Google-powered search feature, would that be a game-changer? I doubt it. What would be the difference compared to if they implemented their own search?


When I saw the reddit example, I thought of quora. Quora will never be a search engine because all users need to do to stand out is add something, and what is usually added is garbage.


I sometimes wonder if we would have had very good search engines if one of the Tech Giants invested in a huge amount of manual information entry/organization instead of optimising around spam and junk websites.

Like Encyclopedia Britannica, but on steroids. With 500.000 editors and a couple million volunteers instead of a couple hundred editors.

Google has 135.000 people on staff right now, all very highly paid. They could have a massive advantage if they had their own unique dataset.


> Like Encyclopedia Britannica, but on steroids. With 500.000 editors and a couple million volunteers instead of a couple hundred editors.

DMOZ was pretty close to that, but freely available. And Google now has their Knowledge Graph, which is internally curated but largely fed from outside sources including the free Wikidata.


SEO tricks are shit to me.

The content itself is the SEO.

If i were building the search engine, i'd just ignore all the nonsense websites with their tricks, as well as the clunky tracking scripts in their headers.


This misses the mark. Reddit's search is completely useless for any sort of query, and has been for at least a decade. Searching for a post's title verbatim but without quotes will frequently return completely unrelated results. That, combined with the fact that Reddit constitutes something like 90% of downtime of top 10 global websites and that their video player is, by all accounts, terrible, tells me they have institutional technical problems.


They don't have to. They just have to be a better option to look for acceptable "answers" (and can be acquired even via google anyway, at least for now).

The real underlying point is the degradation of Google's default results: manipulate the boolean syntax to filter for certain characteristics, and Google becomes a tool to find answers elsewhere rather than from Google's own results directly.


Reddit would need strong anti-fraud/anti-spam expertise, because currently most of BlackHat SEO masters are focused on Google.

Subreddits with their own moderators can work for the current traffic (and probably 10x that), but the quality of the platform is likely to degrade if reddit received >10% overall google search traffic.

I am not saying Reddit can't do that, and I would be really happy to see it happen.


The payoff of doing SEO probably wouldn't be anywhere near what it is for Google search. I suspect many reddit users don't know it has a search feature, or tried it exactly once. The basic usage is scrolling through posts. No matter how good their search becomes, the thing you'd want to optimize would presumably be getting onto people's front page.


I think this article misses the fact that Google polices spam/seo hacks via software automation whereas Reddit (in theory) polices it via crowdsourcing and moderators. These volunteers are super vigilant as they seem to treat the subreddits as their turf or property. If we could downvote google results and share those with others, that would be closer.


I still remember the Panda change they made in 2011.

It is vividly burned in my memory because I used to work for one of those trash websites that polluted Google search results.

When Panda came out, I was like… what took you so long (also, time to get a new job).

For whatever reason, Google is too shy to pull that move again.

Just imagine, Pinterest could easily be obliterated in one random day.


I think this is actually a good argument. Once you build a search algorithm then people start trying to exploit it for visibility. It creates a reward system that conflicts with Reddit's voting system that's already there.

I just wish Reddit would get rid of all the NSFW content on the site, it makes for a juvenile atmosphere.


> So how can we fix the system? The best way to fix the system is to prioritize websites that are there to share knowledge, not websites with their primary priority to make ad revenue.

Which makes it a shame that Google is in the business of pushing ad revenue rather than search.


Reddit posts are good because the people who create these posts or make comments are doing it to share their knowledge.

So how can we fix the system? The best way to fix the system is to prioritize websites that are there to share knowledge

OK, so append Reddit to the google search, got it.


Reddit may not be able to but using Google to search Reddit works great :).

Here is a simple site that you may want to bookmark.

Whenever you start to type goo.. just visit https://gooreddit.com/


A lot of people don't use search in Reddit. They'll ask the same questions over and over ("Why is "other" taking all my storage???").

Reddit could make a better search engine than google and it wouldn't matter.


Yeah, much of reddit's content is repetitive, even so much that commenters will complain about "reposts".

If reddit ever built a search engine, it would need to build a graph that would cluster all these repetitive posts. Users would then be able to easily find a previous post before they went on to create their own repost.

This would lead to fewer posts and fewer avenues for other users to engage, which is counter to Reddit's own model and probably why Reddit doesn't want a search engine that works.
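
A rough sketch of the clustering step, using plain word overlap (Jaccard similarity) on post titles. The 0.5 threshold, the function names and the sample titles are all assumptions for illustration; a real system would more likely use something smarter than bag-of-words matching.

```python
import re


def tokens(title):
    """Lowercase a title and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Share of words two token sets have in common (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_titles(titles, threshold=0.5):
    """Greedily group titles that overlap with a cluster's first title."""
    clusters = []  # list of (tokens of the first title, member titles)
    for title in titles:
        title_tokens = tokens(title)
        for representative, members in clusters:
            if jaccard(title_tokens, representative) >= threshold:
                members.append(title)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append((title_tokens, [title]))
    return [members for _, members in clusters]


posts = [
    'Why is "other" taking all my storage???',
    "why is other taking all my storage",
    "Best pizza in Boston?",
]
groups = cluster_titles(posts)
```

Showing the matching cluster while someone is typing a new title would be enough to surface the earlier thread, which is exactly the engagement trade-off described above.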


They're already aware of it. There's astroturfing, brigading etc. Reddit moderators help a lot, but the community itself points out bots or suspicious accounts. That cleans out a lot.


Let's bring back the old dmoz.org [1]

[1]: https://en.wikipedia.org/wiki/DMOZ


REDDIT POPULARITY ON GOOGLE TRENDS https://i.imgur.com/oAGwCZc.png



I know of companies that employ hundreds of people to do "webutation" or "guerilla marketing". That involves creating a bunch of accounts on social networks (reddit as well) and posting positive info about their product in relevant places. I have zero confidence in any product review on the internet, if I don't know the poster personally.

When I was younger, I thought verifying identity to connect to the internet would be against my rights or something.

Now I think this would be great, though probably impossible to coordinate around the world.


Search Reddit via Google: https://www.searchbettr.com/


“The best way to fix the system is to prioritize websites that are there to share knowledge, not websites with their primary priority to make ad revenue.”

The best way to fix the system is to prioritize the best content irrespective of monetization strategy.

Search engines should not penalize websites trying to make money.

They should penalize spammy websites that add little value but know how to game the system.

The degree to which the creator monetizes their content should bear little weight with a search engine that is trying to serve the best content.


I haven't seen much discussion about solving the entire "search sucks" mess by simply introducing rich filtering for the results.

The way I see it is that Google is still some weird magic box that throws a bunch of random results out based on some black magic logic.

It's beyond my comprehension that there are no filters on any of the search engines (at least I don't remember seeing any).

By having some filtering options, such as "No JavaScript", "No ads", "SSGs only", "Page size < 1MB" and so on, search would be much better.
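
As a rough sketch of what such filters could look like on the result side, assuming the engine already records a few facts about each page. The `Result` fields and the `apply_filters` parameters are hypothetical names for illustration; no search engine is known to expose exactly these.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Result:
    """Hypothetical per-page metadata a crawler might record."""
    url: str
    uses_javascript: bool
    has_ads: bool
    static_site: bool  # generated by a static site generator (SSG)
    page_bytes: int


def apply_filters(
    results: List[Result],
    no_javascript: bool = False,
    no_ads: bool = False,
    ssg_only: bool = False,
    max_page_bytes: Optional[int] = None,
) -> List[Result]:
    """Keep only the results that pass every enabled filter."""
    kept = []
    for r in results:
        if no_javascript and r.uses_javascript:
            continue
        if no_ads and r.has_ads:
            continue
        if ssg_only and not r.static_site:
            continue
        # "Page size < 1MB" style filter: the limit itself is excluded.
        if max_page_bytes is not None and r.page_bytes >= max_page_bytes:
            continue
        kept.append(r)
    return kept


if __name__ == "__main__":
    results = [
        Result("https://example.com/blog", False, False, True, 120_000),
        Result("https://example.com/app", True, True, False, 4_500_000),
    ]
    for r in apply_filters(results, no_ads=True, max_page_bytes=1_000_000):
        print(r.url)
```

The filtering itself is trivial; the hard part would be collecting trustworthy metadata for every page.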


I definitely agree with this one. That's why I use Google search even when I'm looking for specific Reddit content.


insightful.

Even Amazon can't build a better search engine. Besides Reddit, I now always have to use Google to find specific products on Amazon. I can't understand why Amazon doesn't return them. These same products are not discoverable on Amazon by browsing either. But somehow Google is able to catalog them.

eg: 'ohto ceramic refill blue'


So I can kill Reddit if I build a search engine for it that becomes popular and Reddit users start optimizing for that?



That's just false. I often want to advertise in subreddits, but either the mods don't allow anyone to advertise or the Reddit ads are really crappy and juvenile. If I could have a spot on the sidebar of Reddit search results, I'd happily pay more than I do for their CPC ads.


Either the problem is very hard or reddit's technology culture is crap.

If they go public I would not be bullish


Reddit can’t even make a rich text editor for comments that isn’t borked in browsers.


Remember when Google said it was going to punish blatant SEO?

Must have been April 1st.


Reddit organizes information hierarchically. No shame in that.


Astroturfing alone means Reddit is not a good search engine.


FOSS can't build a better search engine either.


Maybe Google should just add an upvote button.


The lack of a Reddit search engine suits me fine. If I want to use Reddit then I go via Google, visit the page that has the information I need, and RUN.

I don't want to spend any more time within the Reddit ecosystem than that. It's a vicious miasma of hate and stupidity. Is there any community on the Internet more susceptible to propaganda than the front page of Reddit? They make boomers on Facebook look like James Randi in comparison.

At least Facebook and Twitter actually try and flag misinformation. Reddit actively promotes it, and it's by design!


That could have been an ema... comment.


Then ban the people gaming the system


Reddit is good for customer reviews of niche products, not much else.


Nice troll in the URL.


URL: Reddit can

Title: Reddit can't


Nice troll in th


The idea of reddit building competent software is hard to take seriously. I am appalled every time I try to buy an ad. Those folks have no idea what they are doing. Rank amateurs.


There is a Mark Zuckerberg quote about Twitter, something like "Twitter is a clown car that fell into a gold mine." I often think that about Reddit, minus the gold mine.


tl;dr: • No one can build a better search engine, bc it will just get gamed to show bad results. • Google can't build a better search engine bc they are beholden to ad rev (per Brin/Page's 1998 warning).


There is a Chinese counterpart/copycat of Reddit --- Tieba. It is owned by Baidu (counterpart/copycat of Google). More information can be found here: https://en.wikipedia.org/wiki/Baidu_Tieba.

Tieba was once popular just like Reddit. Then, the management (of Baidu) figured out how to make money with Tieba and its search engine by promoting brands and erasing negative comments. Not many people use it anymore.

The point in the article is pretty convincing to me.



