
A Graph of Related Subreddits - freediver
https://anvaka.github.io/sayit/?query=programming
======
anvaka
Oh hey there, I'm the author of this project.

Just wanted to say thank you for sharing this! Would be happy to answer any
questions - graphs are my longtime hobby, and I love them!

PS: You can find more recent graphs and fun projects here:
[https://twitter.com/search?q=from%3Aanvaka%20min_retweets%3A...](https://twitter.com/search?q=from%3Aanvaka%20min_retweets%3A20&src=typed_query)

~~~
bobosha
How does this infer subreddit similarity? For instance, I checked for
r/AskHistorians and the results don't seem that relevant.

edit: never mind, I just read your GitHub readme. But the question still
stands: is "users posting in x also posted in y" a good way to infer
similarity? Could comparing top-ranked posts be a better comparator?

~~~
anvaka
I used my own metric that is based on Jaccard similarity, which in turn is
based on a "users who posted to X also posted to Y" metric.
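As a rough illustration of the base idea only (not the project's actual code, and without the custom tweaks mentioned above), Jaccard similarity over sets of posters might be sketched like this. The `jaccard` helper, the subreddit sizes, and the usernames are all made up for the example.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A & B| / |A | B|, or 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical data: subreddit name -> set of users who posted there.
posters = {
    "programming": {"ann", "bob", "cat", "dan"},
    "python": {"bob", "cat", "eve"},
}

# Two shared posters out of five distinct posters overall.
score = jaccard(posters["programming"], posters["python"])
print(f"programming ~ python: {score:.2f}")
```

The score is symmetric: swapping the two subreddits gives the same number.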

That said, a few subreddits are so popular (/r/videos, /r/funny, etc.) that
their similarity results were too saturated, so I did a manual override by
looking at the most commonly mentioned other subreddits, and sometimes at the
subreddit's `about` blurb.

Please don't consider these recommendations a source of truth! It's just a fun
way to discover other subreddits :).

I'm also very open to changing this metric to something else - please let me
know if you have any recommendations!

[1]: [https://github.com/anvaka/sayit#the-data](https://github.com/anvaka/sayit#the-data) \- describes the data,
indexing scripts are here:
[https://github.com/anvaka/sayit/tree/master/scripts](https://github.com/anvaka/sayit/tree/master/scripts)

[2]: Manual overrides can be found here:
[https://github.com/anvaka/sayit-data#sayit---recommendation-...](https://github.com/anvaka/sayit-data#sayit---recommendation-data)

~~~
ketralnis
I work for reddit, and here I’ve used a technique similar to your Jaccard
distance but with one twist: divide by the size of the smaller subreddit (in
your case, the number of unique posters that you’ve recorded). That gives you
a directional relatedness, that is, programming->python but not necessarily
python->programming. Used this way, you account for the giant subreddit problem
automatically, but now the results are less “amitheasshole is related to
askreddit” and more like “linguisticshumor is a more niche version of
linguistics”.

The great thing is that it’s actually more actionable as far as
recommendations go! Everybody has already heard of the bigger version of this
subreddit, but they probably haven’t heard of the smaller versions. And it’s
self-correcting. As a subreddit gets bigger we are less likely to recommend it
(which is great because it needs our help less)
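
One possible reading of this, sketched under assumptions and not reddit's actual implementation: score a pair by the shared posters divided by the smaller subreddit's poster count, and only recommend in the bigger-to-smaller direction. The function names, the `threshold` value, the "only suggest smaller subreddits" rule, and the data are all hypothetical.

```python
def overlap_over_smaller(a: set, b: set) -> float:
    """Shared posters divided by the size of the smaller poster set."""
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return len(a & b) / smaller


def recommend(source: str, posters: dict, threshold: float = 0.5) -> list:
    """Suggest subreddits smaller than `source`, best score first."""
    src = posters[source]
    found = []
    for name, users in posters.items():
        # Skip the source itself and anything at least as big as it:
        # the direction is always bigger -> smaller (an assumed rule).
        if name == source or len(users) >= len(src):
            continue
        score = overlap_over_smaller(src, users)
        if score >= threshold:
            found.append((name, score))
    return sorted(found, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: subreddit name -> set of users who posted there.
posters = {
    "askreddit": {f"u{i}" for i in range(10)},
    "linguistics": {"u0", "u1", "u2", "u3"},
    "linguisticshumor": {"u1", "u2", "x"},
}

print(recommend("linguistics", posters))       # suggests the niche sub
print(recommend("linguisticshumor", posters))  # nothing smaller to suggest
```

In this toy data, linguistics never leads to askreddit, because askreddit is bigger. That matches the "self-correcting" behavior described above: a subreddit stops being suggested from a given source once it outgrows that source.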

~~~
anvaka
This is super awesome, thank you for sharing!

If you guys are interested in seeing how your recommendations work for the
entire reddit, I'd be happy to build you a spaceship similar to this one:
[https://github.com/anvaka/word2vec-graph](https://github.com/anvaka/word2vec-graph).

I couldn't find an easy way to download the entire recommendation graph, but
it would be awesome if we could make it work. My email is the same as this
account at gmail, and twitter is all open:
[https://twitter.com/anvaka](https://twitter.com/anvaka)

~~~
yantrams
In case you haven't come across it already, here is a very exhaustive list of
distance measures for dealing with problems of this kind -
[http://www.iiisci.org/journal/CV$/sci/pdfs/GS315JG.pdf](http://www.iiisci.org/journal/CV$/sci/pdfs/GS315JG.pdf)

I fooled around a bit with lastfm data for band recommendation and found this
sheet quite helpful.

If you are interested in learning more about asymmetrical similarity, here is
a great primer by Tversky -
[http://www.cogsci.ucsd.edu/~coulson/203/tversky-features.pdf](http://www.cogsci.ucsd.edu/~coulson/203/tversky-features.pdf)
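
For a quick feel of what "asymmetrical" means here, this is a minimal sketch of the set-based form usually called the Tversky index. The `tversky` helper and the listener sets are invented for illustration, and the choice of `alpha` and `beta` is an assumption to tune.

```python
def tversky(x: set, y: set, alpha: float, beta: float) -> float:
    """Tversky index: |X & Y| / (|X & Y| + alpha*|X - Y| + beta*|Y - X|)."""
    common = len(x & y)
    denom = common + alpha * len(x - y) + beta * len(y - x)
    return common / denom if denom else 0.0


# Hypothetical listener sets for two bands.
big_band = {"u1", "u2", "u3", "u4"}
small_band = {"u3", "u4", "u5"}

# alpha = beta = 1 reduces to Jaccard similarity (symmetric).
print(tversky(big_band, small_band, 1.0, 1.0))

# alpha != beta makes the score depend on argument order.
print(tversky(big_band, small_band, 1.0, 0.0))
print(tversky(small_band, big_band, 1.0, 0.0))
```

With `alpha = 1, beta = 0`, only the first set's unshared members count against the score, so the result answers "how much of X is covered by Y" rather than "how alike are X and Y".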

~~~
anvaka
This is an absolute treasure trove. Thank you so much!

------
mckirk
The most interesting graph I could find so far:
[https://anvaka.github.io/sayit/?query=chairsunderwater](https://anvaka.github.io/sayit/?query=chairsunderwater)

Speaking of which, it'd be awesome if the site could automatically generate
multi-reddits from the results. I think a multi-reddit constructed from that
graph would be quite interesting ;)

~~~
semipro
Thanks, I knew about r/chairsunderwater, but never heard about
r/breadstapedtotrees. My life will never be the same.

~~~
anvaka
I didn't know about either. Utterly impressed

~~~
mckirk
I found r/chairsunderwater while looking for a subreddit that could help me
pick out a new office chair.

r/chairs has 800 people subscribed. r/chairsunderwater has 115k. It's just
reddit things ¯\\_(ツ)_/¯

------
GloriousKoji
Bon Appetit is a New York City-based food magazine that also has pretty
entertaining YouTube content.

The graph is extremely shocking and not at all what I would have expected.

[https://anvaka.github.io/sayit/?query=bon_appetit](https://anvaka.github.io/sayit/?query=bon_appetit)

------
orf
The_donald members often post in TwoXChromosomes and TropicalWeather?

[https://anvaka.github.io/sayit/?query=the_donald](https://anvaka.github.io/sayit/?query=the_donald)

~~~
throw_m239339
In fact, a lot of users seem to post on both extreme right-wing and left-wing
subreddits. I tested a few of these subs, and it's just baffling.
Either it's the phenomenon called "brigading", where a thread from
one side is linked on another sub, which leads users of the latter to post in
the former, or a lot of users are just trolls playing both sides and inciting
drama and outrage between people just for kicks.

~~~
papln
> inciting drama and outrage between people just for kicks.

or

"popping their filter bubbles", or "are generally interested in
boundary-pushing ideas", or "are susceptible to the rage-inducing trolls who
run fringe communities".

"Affinity for extreme ideas of any kind" may be a stronger / more common
personality trait than "interested in one extreme point in the vector space of
ideas".

------
rednerrus
The link between r/Android and r/iamverysmart seems legit.

~~~
soylentcola
Link also exists with r/apple and r/ios (but not r/windows or r/linux).

------
rewq4321
See also:
[https://subredditstats.com/subreddit-user-overlaps/programmi...](https://subredditstats.com/subreddit-user-overlaps/programming)

~~~
Glosster
This one seems to be doing a worse job. Compare what I'm getting for
/r/longevity:

[https://subredditstats.com/subreddit-user-overlaps/longevity](https://subredditstats.com/subreddit-user-overlaps/longevity)

vs

[https://anvaka.github.io/sayit/?query=longevity](https://anvaka.github.io/sayit/?query=longevity)

------
FredrikMeyer
Earlier discussion (June 2019):
[https://news.ycombinator.com/item?id=18866800](https://news.ycombinator.com/item?id=18866800)

------
morceauxdebois
Someone should make a graph of all the moderators letting subreddits turn into
garbage bot farms

------
neiman
What a fantastic idea!

I'm always looking for interesting new subreddits. This tool is the best
I've seen for this task so far.

~~~
anvaka
I'm so glad to know this! Thank you!

------
dredmorbius
[https://anvaka.github.io/sayit/?query=hackernews](https://anvaka.github.io/sayit/?query=hackernews)

------
mapleboi
this is so sick! finally an easy way to find some new subreddits. thanks!

~~~
anvaka
Yay! Thank you!

------
iblaine
If I pick a relatively obscure subreddit like dataengineering, then the
results are noisy. Maybe increase the distance/decrease the charge between
nodes as the number of children on a node increases?

------
itsmhuang
I'm not able to see the contents of a subreddit in the sidebar after clicking
on it. I'm on macOS Catalina using Google Chrome 79 (pretty modern).

~~~
anvaka
I heard this might be caused by some adblocking extensions - they consider
reddit to be an ad/tracking system, so they block all JavaScript requests to
it. Do you happen to have one of those extensions?

------
justaman
Searching for r/aww links to lots of porn.

------
petey283
Great project.. and great implementation.

~~~
anvaka
Thank you!

------
ProbablyRyaan
*immediately types in porn subreddit.

------
shanth
would be nice to keep that renderer as a separate library.

~~~
anvaka
Thank you for your suggestion!

This implementation is tailored to smaller graphs with sometimes long text
boxes.

I'll make a note to extract it to a reusable component.

~~~
totony
I second this, really neat graph

