Show HN: Python library for embedding large graphs (Written in Rust)

_han · on Nov 29, 2022

I like how this is a Rust project without any Python, that publishes a package to PyPI.

It uses Maturin (https://github.com/PyO3/maturin) for this, which I've never heard of but sounds really useful.

japanuspus · on Nov 29, 2022

In my experience, the whole `PyO3`-project is a rare gem in language integrations since the developers appear to have an intricate understanding of the ecosystems on both side of the integration.

The python parts are pythonic and the rust parts are "rusty". A joy to work with.

mcronce · on Nov 29, 2022

It probably does help that both Python and Rust treat FFI as first-class in terms of importance. That said, yeah, PyO3 is a pretty awesome suite to work with.

faizshah · on Nov 29, 2022

This is really nice, I have been looking into learning Rust or Cython to speed up some critical paths. Does anyone know if there’s any overhead to using rust vs cython for extending python?

sean_pedersen · on Nov 29, 2022

A lib generating 2D artifacts might benefit from showcasing these artifacts using image technology for the curious minds.

h4kor · on Nov 29, 2022

Thanks for the feedback. The library only calculates the positions and has no graphical output.

As an example: I created this to build this embedding of Mastodon instances. https://h4kor.github.io/fediverse-explorer/

bArray · on Nov 29, 2022

Could you not give input and output examples for the graphs? I.e. show networkx before and after your positioning?

Also how well will this scale for large numbers of nodes, say thousands? Currently using networkx I have to perform tens of thousands of iterations on the spring_layout to get quite mediocre results.

h4kor · on Nov 29, 2022

I've two models implement, one of which is the networkx model, which I would recommend at the moment.

I implemented this because networkx was to memory consuming and too slow.

Where networkx crashed with 24k nodes I was able to embed them with this library, it will still take several hours, but at least it's doable. The number of iterations should be on same magnitude as the number of nodes. For my graph #Iteration = 1/2 #nodes was sufficient.

ljvmiranda · on Nov 29, 2022

I'm looking at the Fediverse explorer and I'm curious how to interpret the embeddings (or the distances between points). What does it mean when two instances are near each other (e.g., sigmoid.social and mastodon-belgium.be)? Is it related to the number of follows?

h4kor · on Nov 29, 2022

Currently I only use the peering information of instances. Instances are places close to each other when they are peering.

As a next step I want to refine this as peering itself does not say anything about how much users of these instances interact. For this I will have to sample the users of instances and who they follow.

anentropic · on Nov 29, 2022

I did also want to see a picture, presumably the point of the lib is to make a nice layout of nodes even though that then has to be rendered with another tool?

h4kor · on Nov 30, 2022

Yes, I built it as a replacement for the networkx spring_layout function.

jonititan · on Nov 29, 2022

Is it compatible with graphviz? https://graphviz.readthedocs.io/en/stable/

The graphviz python library works great I've found with the sole exception of not having straightforward ways to edit the graph after creation. It's really more intended for packaging it up to pass to the command line application.

Will you have file import/export for .dot and similar?

h4kor · on Nov 29, 2022

Have not used it with graphviz yet. There is an example how to use it in combination with networkx, which probably can be adjusted quiet easily.

I'm planning to better integrate it with other graph libraries, but at the moment you have manually translate between graph-force and you library of choice.

lmeyerov · on Nov 29, 2022

I'm curious what makes it for embedding large graphs - afaict the core is a spring model implemented as O(n^2) in nodes and single-threaded? Wasn't sure of the _size param for the nested loop, maybe I'm misreading. Was hoping there'd be something fun to solve the asymptotic scaling limit via fast multipole, NN's, etc :)

h4kor · on Nov 29, 2022

Main benefit over alternative (e.g. networkx): Use all cores of your machine + lower memory footprint. I had problems working with a 24k node graph, which is why I created this library.

For speed I want to try two things in the future: Barnes–Hut simulation and doing the computation on the GPU.

lmeyerov · on Nov 30, 2022

Ah yeah, I'd still fix the asymptotics first before doing multicore: 10X multicore speedup on a problem means little when haven't done the 100X+ speedup :) we do 500k+ node graphs on frontend, and million/billion on backend.. but after the core is sane

GraphLover9000 · on Nov 29, 2022

Have you considered just writing a Rust library and also releasing a thin Python wrapper over it as a separate project? That way, other people could write their own thin wrappers in their high level languages of choice and use your fast implementation via FFI.

I have spent some time looking into graph drawing algorithms and it seems to me that writing a good, optimised algorithm is non-trivial!

imranq · on Nov 29, 2022

Can someone point me to some resources on why you would want to embed graphs?

lmeyerov · on Nov 29, 2022

graph embedding is super useful for both viz + decision procedures

- Viz: Embedding gives x/y coords, and if they rendered the edges, a traditional graph view. A cool thing about recent shift to embedding approaches to the graph drawing problem is optimizing for objective functions that 1980's style spring layouts don't -- think the same things a neural network would optimize for. The code here appears more useful for small/medium graphs (ex: some ec2 tenant diagram) and I didn't see the neural network stuff, but with work, you can scale up several of the pipeline steps to handle 100X+ bigger ones (ex: we work with a lot of fraud or cyber event logs), and in headless cases, ~billion scale. It's a cool new subfield, google "graph drawing neural network".

- Decisions: Node, edge, and subgraph/graph embeddings are all super useful. I'm giving a talk at Friday's Infosec Jupyterthon (https://infosecjupyterthon.com/introduction.html) on the link prediction case for ~identity protection & resource ~monitoring (account takeover, insider threat, rogue devices, data leakage, ...) by mining log data, and as another example, link prediction is basically recsys, which is how any site with a shopping cart makes more money. Node classification, graph motif mining, etc are different but the same. Search for one of the many introductions to graph neural networks for a technical perspective, and I co-authored this survey at the beginning of the year to give a market perspective: https://gradientflow.com/what-is-graph-intelligence/

All this comes up a bunch in cyber/fraud/retail/supplychain -- we're certainly busy there. For anyone into that, we're hiring someone to own a bunch of backend/infra (k8s/gpu cloud/enterprise), 1-2 cleared folks in DC, and in Q1, (graph) data scientists. Simple question but one we're really into :)

edf13 · on Nov 29, 2022

No weights on the relationships?

h4kor · on Nov 29, 2022

Not yet. The binary format already stores weights, but they are not yet used.

edf13 · on Nov 29, 2022

Cool - be interested to look at this when weights are used!