
Show HN: A tool to find related subreddits - anvaka
https://anvaka.github.io/sayit/?query=linux
======
charlieegan3
This is a great idea for a project, the 'users who also posted' metric seems
to have worked really well.

The site seems to fail to load the 'hot' items for the subreddits when I click
on them, but that's not a big deal for me. On closer inspection, it doesn't
seem to be making any requests. It just says `Failed to download
https://www.reddit.com/r/thinkpad/hot.json`, etc.

~~~
anvaka
hmmm... I don't see the error on my end. What browser do you use? Can you try
in "incognito" mode? Are there any extensions that might be blocking this?

~~~
jsloss
I'm getting the same error on Firefox Quantum.

~~~
anvaka
Hm... I'm at a loss.

[https://jsbin.com/fuyijan/2/edit?js,console](https://jsbin.com/fuyijan/2/edit?js,console)
- this works in Chrome and in the non-private mode of Firefox Quantum (64.0.2
(64-bit)). However, when I open private browsing in Firefox Quantum, the
request fails.

Does anyone know why?

~~~
EvilTerran
That sounds like Content Blocking kicking in - that's only active in Private
Browsing by default:
[https://support.mozilla.org/en-US/kb/content-blocking](https://support.mozilla.org/en-US/kb/content-blocking)

I note that page says "By default, content blocking uses the Disconnect.me
basic protection list" - and reddit.com is on that list:
[https://github.com/disconnectme/disconnect-tracking-
protecti...](https://github.com/disconnectme/disconnect-tracking-
protection/blob/master/services.json)

(I'm guessing reddit's "social button" is considered a tracker.)

[edit] confirmed, it's definitely Content Blocking: I just loaded that jsbin
in an FF private window, and there's a message in the console to that effect.

~~~
anvaka
Thank you so much! I opened an issue here:
[https://github.com/disconnectme/disconnect-tracking-
protecti...](https://github.com/disconnectme/disconnect-tracking-
protection/issues/67)

------
Smithalicious
It's a cool tool, but it seems very biased towards bigger subs. If you let it
loose on a small sub it will emphasize big, kinda-but-not-really related
subs over tiny-but-closely-related subs.

~~~
ppod
Using Jaccard has this effect; mutual information would correct more for the
independent frequency of the posts per subreddit.
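
A minimal sketch of what such a correction might look like. The comment does not say which form of mutual information is meant; pointwise mutual information (PMI) over user sets is assumed here, and the function name and toy data are hypothetical. PMI divides the observed overlap by the overlap expected if posting to the two subreddits were independent, so a subreddit's raw size is factored out.

```python
import math


def pmi(users_a, users_b, total_users):
    """Pointwise mutual information (in bits) between posting to A and to B.

    PMI = log2( P(A and B) / (P(A) * P(B)) ), with probabilities estimated
    as fractions of total_users. Returns -inf when there is no overlap.
    """
    a, b = set(users_a), set(users_b)
    both = len(a & b)
    if both == 0:
        return float("-inf")
    # P(A,B) / (P(A) P(B)) simplifies to both * N / (|A| * |B|).
    return math.log2(both * total_users / (len(a) * len(b)))


# Hypothetical toy data: 8 users overall, two subreddits with 2 posters each.
score = pmi({"u1", "u2"}, {"u2", "u3"}, total_users=8)
print(score)
```

A score above zero means the two subreddits share more posters than chance would predict; whether that ranks small subs better than Jaccard on the real data is not something this sketch shows.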

~~~
Smithalicious
It's a shame since this tool would be particularly useful for recommending
small subs. I don't need it to tell me about big subs, since I already know
them.

------
patcon
This is seriously amazing man! Interesting to see how different subject-areas
network themselves differently.

For example, comparing "r/permaculture" to "r/linux".

Also, looking at r/girlgamers makes me realize my privilege for being able to
navigate my interest areas without such a clusterfuck of bullshit going on:
[https://anvaka.github.io/sayit/?query=girlgamers](https://anvaka.github.io/sayit/?query=girlgamers)

~~~
swampthinker
It's really sad how toxic Reddit brigading is

------
skilled
This is awesome! My input had exactly the results I expected.

Thanks for creating this tool, bookmarking!

~~~
anvaka
Thank you! I'm very glad you liked it :)

------
viraptor
I checked VXjunkies and found a level of weirdness I hadn't expected. Will
need a few hours to browse through this while nobody is around / can be
startled by sudden, random laughter...

------
nairboon
That's a cool tool. A useful extension would be if it preserved the location
history as you navigate topics, so that you can go back.

~~~
anvaka
Good call. I was worried that I'd "spam" the browser history and people who
are coming from reddit or HN would never go back to where they came from :)

------
adrianmonk
Usability improvement idea: make it easier to discover how to re-center the
graph around a new subreddit.

I spent several minutes playing around with this, and I was just typing in the
name of the desired subreddit because that was the only way I could figure out.
Finally, after much experimenting, I realized double-clicking is the solution.

Oh, and a second, related usability idea: if I double-click, don't open the
preview sidebar at the right. I can see how the sidebar is useful, but if I'm
doing one action, I don't want it to have two effects. Also, I have signaled
clear intent to browse the graph, so I want more screen real estate to be
devoted to that.

EDIT: bonus usability idea/request: clicking on a node brings up the preview
sidebar. It'd be nice if clicking on it again (not double-clicking) makes the
sidebar hide again.

------
KasianFranks
Anvaka, when you accept BTC or ETH let us know, we can contribute to your
efforts.

~~~
anvaka
Thank you, Kasian!

------
hueyjj
> The relationship is determined by a metric "users who posted to this
> subreddit also post to...".

I'm interested, could you share with us the entire metric you used to
determine the relationship?

~~~
anvaka
It is jaccard similarity
[https://en.wikipedia.org/wiki/Jaccard_index](https://en.wikipedia.org/wiki/Jaccard_index)

Also I described it a bit more here:
[https://www.reddit.com/r/MachineLearning/comments/aek3yk/p_l...](https://www.reddit.com/r/MachineLearning/comments/aek3yk/p_learning_related_subreddits_to_rmachinelearning/)
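
For readers who want the metric spelled out, here is a minimal sketch of Jaccard similarity over sets of posters. It is not the author's script (that was written in node.js and is not shown in this thread); the function names and the toy data are hypothetical.

```python
def jaccard(users_a, users_b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    a, b = set(users_a), set(users_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def related(target, posters):
    """Rank every other subreddit by similarity to target, highest first."""
    scores = [
        (jaccard(posters[target], users), name)
        for name, users in posters.items()
        if name != target
    ]
    return [(name, score) for score, name in sorted(scores, reverse=True)]


# Hypothetical data: subreddit name -> set of users who posted there.
posters = {
    "linux": {"ann", "bob", "cat", "dan"},
    "thinkpad": {"bob", "cat", "eve"},
    "aww": {"fay"},
}

print(related("linux", posters))
```

Here "linux" and "thinkpad" share 2 of 5 distinct posters, so their score is 0.4, while "aww" shares none and scores 0.0.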

~~~
jcims
Have you tried polling profiles to see how many are sharing upvotes/downvotes?
It used to be a small percentage but is pretty informative.

------
minimaxir
You indicated that you used the Pushshift.io datasets, but how did you compute
Jaccard Similarity on a dataset of 38M?

~~~
anvaka
I didn't use pushshift, sorry. The data was collected from bigquery, stored
locally into CSV files, and then I just wrote a node.js script to compute
similarities.

~~~
Scaevolus
Did you simply collect "user has posted to X, Y, and Z subreddits", or did you
look at frequency too?

~~~
anvaka
I didn't look into frequency. Is there a version of jaccard similarity that
accounts for frequencies?

~~~
yorwba
There's a weighted variant:
[https://en.wikipedia.org/wiki/Jaccard_index#Weighted_Jaccard...](https://en.wikipedia.org/wiki/Jaccard_index#Weighted_Jaccard_similarity_and_distance)
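
A minimal sketch of that weighted variant, assuming per-user post counts are available for each subreddit (the thread says frequency was not collected, so the data below is hypothetical). Each user contributes the minimum of the two counts to the numerator and the maximum to the denominator.

```python
def weighted_jaccard(counts_a, counts_b):
    """Weighted Jaccard: sum of per-user minima over sum of per-user maxima.

    counts_a and counts_b map user -> number of posts in each subreddit;
    a user missing from one mapping counts as 0 there.
    """
    users = set(counts_a) | set(counts_b)
    numerator = sum(min(counts_a.get(u, 0), counts_b.get(u, 0)) for u in users)
    denominator = sum(max(counts_a.get(u, 0), counts_b.get(u, 0)) for u in users)
    return numerator / denominator if denominator else 0.0


# Hypothetical post counts per user.
sub_a = {"ann": 10, "bob": 1}
sub_b = {"ann": 5, "cat": 2}

print(weighted_jaccard(sub_a, sub_b))
```

With these counts the minima sum to 5 and the maxima to 13. When every count is 1, the result is the same as the plain set-based Jaccard index.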

------
jotato
*types in DunderMifflin

related: MapsWithoutSouthSudan

I know what I am going to be doing for the next 30 minutes

------
bibyte
This is a really useful tool. It works so smoothly on my mobile.

~~~
anvaka
Happy to hear :)!

------
Phenomenit
Great,

I've been searching for a tool like this for ages, bookmarked!

~~~
anvaka
Thanks :)

------
laurynas-s
This is really nice!

~~~
anvaka
Thanks :)!

------
techaddict009
Good tool. If possible, add an option to view the result data in tabular format
with the number of subscribers, as this way it's difficult to use.

------
DevX101
Great tool! This site supports my suspicions that much of the activity on
/r/The_Donald is the coordinated effort of a few individuals posting across
multiple accounts. For those not familiar with this sub, it was created
sometime during the 2016 election leadup and unabashedly supports Donald Trump
with memes and shitposting. At one point, the entire frontpage of reddit was
just posts from /r/The_Donald until reddit admins had to alter their algorithm
to force the sub off.

If you look at the network graph for /r/The_Donald, it doesn't look...organic.
There are 4 clearly delineated clusters of subs related to that sub. Posters to
/r/The_Donald heavily post to /r/news & /r/politics, /r/TropicalWeather (?),
/r/TwoXChromosomes (?) and /r/AskTheDonald (and other alt-right subs).

There's not much interaction with the rest of reddit. Posters from other subs
don't also post content to /r/The_Donald.

This is unusual.

For every other sub I've looked at, there's a much more complex & dynamic graph
where users post across various communities across the site. Every other major
sub looks like a real network with dozens of interconnected links. Yet,
/r/The_Donald, with almost 700,000 subscribers only has a strong connection to
4 clusters.

The alternate hypothesis is that people on that sub heavily use alternate
accounts. This might also explain the lack of interaction with the site
compared to other subs of similar size.

~~~
zawerf
He manually overrode some large subs:
[https://old.reddit.com/r/dataisbeautiful/comments/ae88pk/int...](https://old.reddit.com/r/dataisbeautiful/comments/ae88pk/interactive_visualization_of_related_subreddits/ednceat/)

~~~
DevX101
Thanks! That's probably it then. I guess this doesn't support my hypothesis
after all.

------
bdibs
This is great, and works flawlessly!

~~~
anvaka
Thank you! I'm so happy you like it.

~~~
bdibs
It’s simple and just works, don’t stop making great things.

~~~
anvaka
Aww, thank you!

> don’t stop making great things.

Not going to ever stop! I have sooo many ideas - I wish I could be more
efficient :).

------
cannedslime
Useful little tool! Reddit humor subs are so damn specific, it can be hard to
find them all.

------
mrfusion
Is this only for tech subjects or am I using it wrong?

Edit. Somehow I missed the big searchbar at the top.

~~~
cambaceres
I tried "tits", that worked.

~~~
criddell
Ornithologist?

~~~
cambaceres
There were some cocks present anyway

------
jamiek88
Fantastic! I tested the heck out of this and found it really useful.

Already found some cool subs.

------
belltaco
You should submit this to r/dataisbeautiful if not already done.

------
ppod
Which javascript network vis library does this use? It's very nice.

------
yanslookup
I was sort of expecting to be able to click through to the subreddit...

------
benibraz
Very nice tool, thank you very much for that. This is why I love HN

------
kerbalspacepro
Interesting finds:

* /r/askscience is nested at the center of defaults (I think a lot of older, famous subs will end up highly connected)

* /r/relationship_advice is kind of a loner. The graph generates six distinct subreddit clusters- feminism, lgbt-issues, counseling, and misc. science fields. The last cluster is a very large, diffuse cluster of sex/porn/depression subreddits that skew towards defaults.

* /r/slatestarcodex has distinct clusters too. 1) Effective altruism and philosophy, 2) Psychiatry, 3) Rational fiction writing, 4) Liberal-tarian, IDW defaults, 5) "Classic effort post" subs like true_reddit and depth_hub.

* /r/bigboye is a tiny part of a very large network of animal gifs subreddits. /r/animalsbeingbros connects it to a bunch of high volume gif subs.

~~~
newman314
* /r/the_donald has a surprising link to /r/TwoXChromosomes [[https://anvaka.github.io/sayit/?query=the_donald](https://anvaka.github.io/sayit/?query=the_donald)]

* /r/politics seems to have higher interconnection

* /r/awww is quite wholesome =) [[https://anvaka.github.io/sayit/?query=Awww](https://anvaka.github.io/sayit/?query=Awww)]

* /r/puppers has some strange nsfw links

~~~
belltaco
>/r/the_donald has a surprising link to /r/TwoXChromosomes

I don't think it's surprising. Donald fans on social media tend to hate
minorities and women, not surprised they would try to brigade women oriented
subs.

It got so bad that subs like /r/offmychest automatically ban people that post
in many alt right related subreddits.

------
chad_strategic
This is great!

But on a side note, I can also waste more time on the Internets!

------
diziet
Is this built on top of your work on yasiv before?

~~~
anvaka
It would be fair to say so. The core layout is the same with a bit more
polished overlap removal and animation.

------
myself248
Why do I get stuck in "dead ends"? For instance,
[https://anvaka.github.io/sayit/?query=rtlsdr](https://anvaka.github.io/sayit/?query=rtlsdr)
contains
[https://anvaka.github.io/sayit/?query=PlutoSDR](https://anvaka.github.io/sayit/?query=PlutoSDR)
but the inverse is not true -- once I'm in PlutoSDR there's only one other
subreddit and the two of them are an island.

------
andyidsinga
damn - I'm wondering if this could help with marketing, in order to find out
where your audience hangs out.

------
sureaboutthis
Ya' know this assumes one would use reddit as a reference for learning which
one should never, EVER do, don't ya?

------
thro_a_way
hi thanks for this. Is there a guide to how you are storing the data on github
pages?

------
amunategui
Great visualization! Nice work.

------
chx
Incredibly useful, thanks!

------
diimdeep
Amazing!

------
flylib
nice tool

------
yzb
Would be nice if banned subs appeared in a different colour.

------
patcon
If you have spacetime, you might consider sharing this with LGBTQ and kink
communities experiencing the Tumblr diaspora.

Context: [https://nowtoronto.com/lifestyle/advice/savage-love-
tumblr-p...](https://nowtoronto.com/lifestyle/advice/savage-love-tumblr-porn-
ban-is-hurting-my-kink/)

Lots of people feel uprooted from sex-positive and/or tightly-bound
communities they've been part of for years, and don't know how to rediscover
or rebuild the healthy networks they've lost on Tumblr. I know full-grown
adult women who are struggling to find footing again in the most personal of
spaces.

