In my experience, the whole `PyO3`-project is a rare gem in language integrations since the developers appear to have an intricate understanding of the ecosystems on both side of the integration.
The python parts are pythonic and the rust parts are "rusty". A joy to work with.
It probably does help that both Python and Rust treat FFI as first-class in terms of importance. That said, yeah, PyO3 is a pretty awesome suite to work with.
This is really nice, I have been looking into learning Rust or Cython to speed up some critical paths. Does anyone know if there’s any overhead to using rust vs cython for extending python?
Could you not give input and output examples for the graphs? I.e. show networkx before and after your positioning?
Also how well will this scale for large numbers of nodes, say thousands? Currently using networkx I have to perform tens of thousands of iterations on the spring_layout to get quite mediocre results.
I've two models implement, one of which is the networkx model, which I would recommend at the moment.
I implemented this because networkx was to memory consuming and too slow.
Where networkx crashed with 24k nodes I was able to embed them with this library, it will still take several hours, but at least it's doable. The number of iterations should be on same magnitude as the number of nodes. For my graph #Iteration = 1/2 #nodes was sufficient.
I'm looking at the Fediverse explorer and I'm curious how to interpret the embeddings (or the distances between points). What does it mean when two instances are near each other (e.g., sigmoid.social and mastodon-belgium.be)? Is it related to the number of follows?
Currently I only use the peering information of instances. Instances are places close to each other when they are peering.
As a next step I want to refine this as peering itself does not say anything about how much users of these instances interact. For this I will have to sample the users of instances and who they follow.
I did also want to see a picture, presumably the point of the lib is to make a nice layout of nodes even though that then has to be rendered with another tool?
The graphviz python library works great I've found with the sole exception of not having straightforward ways to edit the graph after creation.
It's really more intended for packaging it up to pass to the command line application.
Will you have file import/export for .dot and similar?
Have not used it with graphviz yet. There is an example how to use it in combination with networkx, which probably can be adjusted quiet easily.
I'm planning to better integrate it with other graph libraries, but at the moment you have manually translate between graph-force and you library of choice.
I'm curious what makes it for embedding large graphs - afaict the core is a spring model implemented as O(n^2) in nodes and single-threaded? Wasn't sure of the _size param for the nested loop, maybe I'm misreading. Was hoping there'd be something fun to solve the asymptotic scaling limit via fast multipole, NN's, etc :)
Main benefit over alternative (e.g. networkx): Use all cores of your machine + lower memory footprint. I had problems working with a 24k node graph, which is why I created this library.
For speed I want to try two things in the future: Barnes–Hut simulation and doing the computation on the GPU.
Ah yeah, I'd still fix the asymptotics first before doing multicore: 10X multicore speedup on a problem means little when haven't done the 100X+ speedup :) we do 500k+ node graphs on frontend, and million/billion on backend.. but after the core is sane
Have you considered just writing a Rust library and also releasing a thin Python wrapper over it as a separate project? That way, other people could write their own thin wrappers in their high level languages of choice and use your fast implementation via FFI.
I have spent some time looking into graph drawing algorithms and it seems to me that writing a good, optimised algorithm is non-trivial!
graph embedding is super useful for both viz + decision procedures
- Viz: Embedding gives x/y coords, and if they rendered the edges, a traditional graph view. A cool thing about recent shift to embedding approaches to the graph drawing problem is optimizing for objective functions that 1980's style spring layouts don't -- think the same things a neural network would optimize for. The code here appears more useful for small/medium graphs (ex: some ec2 tenant diagram) and I didn't see the neural network stuff, but with work, you can scale up several of the pipeline steps to handle 100X+ bigger ones (ex: we work with a lot of fraud or cyber event logs), and in headless cases, ~billion scale. It's a cool new subfield, google "graph drawing neural network".
- Decisions: Node, edge, and subgraph/graph embeddings are all super useful. I'm giving a talk at Friday's Infosec Jupyterthon (https://infosecjupyterthon.com/introduction.html) on the link prediction case for ~identity protection & resource ~monitoring (account takeover, insider threat, rogue devices, data leakage, ...) by mining log data, and as another example, link prediction is basically recsys, which is how any site with a shopping cart makes more money. Node classification, graph motif mining, etc are different but the same. Search for one of the many introductions to graph neural networks for a technical perspective, and I co-authored this survey at the beginning of the year to give a market perspective: https://gradientflow.com/what-is-graph-intelligence/
All this comes up a bunch in cyber/fraud/retail/supplychain -- we're certainly busy there. For anyone into that, we're hiring someone to own a bunch of backend/infra (k8s/gpu cloud/enterprise), 1-2 cleared folks in DC, and in Q1, (graph) data scientists. Simple question but one we're really into :)
It uses Maturin (https://github.com/PyO3/maturin) for this, which I've never heard of but sounds really useful.