
Show HN: Little Ball of Fur – A NetworkX extension library for graph sampling - benitorosenberg
https://github.com/benedekrozemberczki/littleballoffur
======
klmadfejno
Can you give some examples of when this would be useful? What's distinctive
about sampling methods beyond, say, picking a random node and all of its
neighbors? What problem does that solve?

~~~
benitorosenberg
The reason not to do that is the bias that such sampling introduces.
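Here is one concrete form of that bias. It is an illustration, not necessarily the only bias meant here. Suppose you pick a uniformly random seed node in a simple undirected graph with n nodes and keep the seed plus all its neighbors. Then a node u lands in the sample with probability (deg(u) + 1) / n, so high-degree hubs are over-represented. The stdlib-only sketch below enumerates every possible seed on a small hypothetical toy graph to show this exactly. The function name is illustrative, not part of any library.

```python
from collections import defaultdict
from fractions import Fraction


def ego_inclusion_probabilities(adj):
    """Exact probability that each node appears in a 'random node + all its
    neighbors' sample.

    adj: dict mapping node -> set of neighbor nodes (simple undirected graph).
    Every node is equally likely to be the seed (probability 1/n), so we
    enumerate all seeds and add 1/n to each node the seed's sample contains.
    """
    n = len(adj)
    prob = defaultdict(Fraction)  # Fraction() == 0
    for seed, nbrs in adj.items():
        for node in {seed} | nbrs:
            prob[node] += Fraction(1, n)
    return dict(prob)


# Hypothetical toy graph: hub 0 linked to leaves 1-4, plus one extra edge 4-5.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5)]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

probs = ego_inclusion_probabilities(adj)
for node in sorted(adj):
    # node, degree, inclusion probability
    print(node, len(adj[node]), probs[node])
```

On this graph the hub (degree 4) is included with probability 5/6, while each degree-1 leaf is included with only 1/3. Any statistic you compute on such samples (mean degree, label distribution, and so on) is therefore skewed toward well-connected nodes.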

We are writing a paper on this, but the main point is that you can achieve
these two things with minimal degradation in classification performance:

1. Speeding up node embedding and classification.
2. Speeding up whole graph embedding and classification.
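To make the speed-up idea concrete: one option is to run the embedding on a smaller sampled subgraph instead of the full graph. Below is a minimal pure-Python sketch of one classic sampler, a random walk with random restarts, that returns the induced subgraph on the visited nodes. The function name and the `restart_prob` parameter are hypothetical. This is not Little Ball of Fur's API or its implementation, just an illustration of the technique under those assumptions.

```python
import random


def random_walk_sample(adj, num_nodes, restart_prob=0.15, seed=None):
    """Sample num_nodes distinct nodes with a random walk that occasionally
    restarts at a uniformly random node.

    adj: dict mapping node -> set of neighbors (undirected graph).
    Returns the induced subgraph on the visited nodes, in the same dict format.
    With restart_prob > 0 the walk can reach every node, so it terminates even
    on disconnected graphs.
    """
    if not 0 < num_nodes <= len(adj):
        raise ValueError("num_nodes must be between 1 and the number of nodes")
    rng = random.Random(seed)
    nodes = list(adj)
    current = rng.choice(nodes)
    visited = {current}
    while len(visited) < num_nodes:
        nbrs = adj[current]
        if not nbrs or rng.random() < restart_prob:
            # Jump anywhere: escapes dead ends and small components.
            current = rng.choice(nodes)
        else:
            # Sorting makes the walk reproducible for a fixed seed.
            current = rng.choice(sorted(nbrs))
        visited.add(current)
    # Induced subgraph: keep only edges whose endpoints were both visited.
    return {v: adj[v] & visited for v in visited}


# Usage: shrink a 50-node ring to 10 nodes before running an embedding on it.
ring = {i: {(i - 1) % 50, (i + 1) % 50} for i in range(50)}
sub = random_walk_sample(ring, 10, seed=0)
print(len(sub), sorted(sub))
```

A walk-based sampler tends to keep local structure (neighbors of visited nodes are likely to be visited too), which is usually what a downstream embedding needs. The restart step is a design choice that trades some locality for guaranteed coverage.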

~~~
klmadfejno
Can you speak a little more about how those work? I understand word embeddings
conceptually. And I can imagine using a similar process to embed the arbitrary
data stored in a graph. Embedding an entire graph makes less sense to me,
unless 'entire graph' means a subgraph of the general population.

I do social network stuff occasionally. If I could hypothetically create an
embedding representation of everyone, I could imagine it might be useful to,
say, run t-SNE on it all instead of a force layout for viz. Or maybe use it as
a pretty black-box prediction input? Wondering if I'm missing something more
obvious here.

~~~
benitorosenberg
Entire graph embedding means that you have a lot of smaller graphs (e.g.
molecules, transactions, threads) and you want to classify them. We created
this package, which covers these methods:

https://github.com/benedekrozemberczki/karateclub
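As a minimal sketch of what "embed each small graph, then classify" means: map every graph to a fixed-length vector, then train any ordinary classifier on those vectors. The methods in the package above are far more sophisticated. This example swaps in the simplest possible featurization, a normalized degree histogram, plus a nearest-centroid classifier. Both function names, the `max_degree` parameter, and the toy "star" vs "path" dataset are hypothetical and chosen only for illustration.

```python
from collections import Counter


def degree_histogram_embedding(edges, max_degree=4):
    """Embed a small graph (given as an edge list) as a normalized degree histogram.

    Degrees above max_degree are clipped into the last bin, so every graph maps
    to a vector of the same length (max_degree + 1). Isolated nodes do not
    appear in an edge list and are therefore ignored.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    hist = [0.0] * (max_degree + 1)
    for d in degree.values():
        hist[min(d, max_degree)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def nearest_centroid(train, query):
    """train: dict label -> list of embedding vectors; returns the closest label."""
    centroids = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in train.items()
    }

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], query))


def star(k):
    return [(0, i) for i in range(1, k + 1)]


def path(k):
    return [(i, i + 1) for i in range(k - 1)]


# Hypothetical training set: a few small "star" and "path" graphs.
train = {
    "star": [degree_histogram_embedding(star(4)), degree_histogram_embedding(star(3))],
    "path": [degree_histogram_embedding(path(5)), degree_histogram_embedding(path(4))],
}
print(nearest_centroid(train, degree_histogram_embedding(star(5))))
print(nearest_centroid(train, degree_histogram_embedding(path(6))))
```

The unseen 5-leaf star is classified as "star" and the 6-node path as "path". The point is the pipeline shape: a whole graph becomes one vector, so any standard classifier can work on a collection of graphs.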

