
Show HN: Ggraph – A graph visualization library for big messy data - pcbje
https://gransk.com/ggraph.html
======
mcphage
It's hard to get a sense of how it'll look with 'big messy data' when its
sample dataset is so tiny, at 14 nodes and 21 edges. Is there a way to see how
it'll look with a ton of data?

~~~
pcbje
Still not big, but a bit bigger:
[https://gransk.com/ggraph-bigger.html](https://gransk.com/ggraph-bigger.html)
(Nodes: 611, Edges: 2499)

Edit: It's an excerpt of the Enron dataset.

~~~
nkoren
Thanks, that really helps.

It's getting noticeably laggy for me at 611 nodes, but below you say it
shouldn't be a problem to do 5k. Have you actually tested that kind of data?

~~~
pcbje
Yes. Animating node placement has a significant performance penalty, so in
larger graphs you would either hide edges or everything during the first
couple of seconds.

------
bokchoi
Perhaps related, I came across this graph library comparison page recently.
You can try out the different libraries with different graphs and sizes:

[https://anvaka.github.io/graph-drawing-libraries/#/all](https://anvaka.github.io/graph-drawing-libraries/#/all)

------
prodtorok
How big can the graph get before taking a noticeable performance hit? A better
example should be shown if advertising "Big ... Data"

~~~
pcbje
It's built on top of D3 and has the same limits (5k nodes and 20k edges should
not be a big problem). The "big" here is the ability to combine nodes, for two
reasons:

1. Reduce the number of nodes and edges, thus increasing capacity

2. Combine nodes that should be seen together (e.g. alternative spellings and
typos), to better deal with the variation aspect of big data

I see your point, but the goal here was to illustrate the functionality.
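
To make the node-combining idea concrete, here is a minimal sketch of what merging might look like as a preprocessing step. It is not Ggraph's actual API; the `merge_nodes` function, the `alias_of` mapping, and the sample names are all hypothetical, and the graph is assumed to be undirected.

```python
from collections import defaultdict


def merge_nodes(edges, alias_of):
    """Collapse aliased nodes into one canonical node.

    edges: iterable of (source, target) pairs.
    alias_of: dict mapping a node name to its canonical name
              (hypothetical input, e.g. produced by a typo matcher).
    Returns a dict of undirected edge -> number of original edges merged into it.
    """
    merged = defaultdict(int)
    for source, target in edges:
        a = alias_of.get(source, source)
        b = alias_of.get(target, target)
        if a == b:
            continue  # the edge now lies inside a single merged node
        key = tuple(sorted((a, b)))  # assumption: the graph is undirected
        merged[key] += 1
    return dict(merged)


edges = [
    ("Jon Smith", "Alice"),
    ("John Smith", "Alice"),
    ("Jhon Smith", "Bob"),
    ("John Smith", "Jon Smith"),
    ("Alice", "Bob"),
]
# Alternative spellings mapped to one canonical node.
alias_of = {"Jon Smith": "John Smith", "Jhon Smith": "John Smith"}

merged_edges = merge_nodes(edges, alias_of)
```

In this toy input, five nodes and five edges collapse to three nodes and three edges, and the two edges to "Alice" are kept as a count on the merged edge rather than dropped.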

~~~
quickben
I don't have the time to try it right now, but can it visualize 20M-100M
nodes?

~~~
pcbje
At some point you run out of pixels on your screen to visualize it all at
once. My suggestion in your case is to not start with a graph, but rather to
group nodes based on metrics such as degree (group all contacts of a node
that have only that single contact), communities, or value (e.g. IPs in a
subnet).
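
Two of these groupings can be sketched in a few lines, assuming an undirected edge list and IPv4 addresses as strings. The function names and sample data are hypothetical, and community detection is left out because it needs a dedicated algorithm.

```python
import ipaddress
from collections import defaultdict


def group_leaves(edges):
    """Group degree-1 nodes under their single neighbour (the 'hub')."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    groups = defaultdict(list)
    for node, adjacent in neighbours.items():
        if len(adjacent) == 1:
            (hub,) = adjacent
            groups[hub].append(node)
    return {hub: sorted(leaves) for hub, leaves in groups.items()}


def group_by_subnet(addresses, prefix_len=24):
    """Group IP addresses by the network they fall into."""
    groups = defaultdict(list)
    for address in addresses:
        # strict=False lets us pass a host address and get its network back
        network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
        groups[str(network)].append(address)
    return dict(groups)


edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "d"), ("c", "e")]
leaf_groups = group_leaves(edges)

subnets = group_by_subnet(["10.0.0.1", "10.0.0.77", "10.0.1.5"])
```

Each group can then be drawn as a single node, so the number of items on screen is set by the number of groups rather than the number of raw nodes.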

------
fauria
Congratulations! This seems to be a great tool to effectively display large
graphs.

Maybe you can submit it to the Neo4j team to be showcased in their "Graph
Visualization for Neo4j" section:
[https://neo4j.com/developer/guide-data-visualization/](https://neo4j.com/developer/guide-data-visualization/)

------
ivankirigin
We used [https://www.graphistry.com](https://www.graphistry.com) to visualize
hundreds of thousands of edges for
[https://www.yesgraph.com/twinmaps/](https://www.yesgraph.com/twinmaps/)

The minimal eng required to get good performance was wonderful.

~~~
phil_s_stein
For the yesgraph link I had to disable ad-blocking and uBlock Origin and still
only got a static image with a Twitter advertisement pasted over the image.

The graphistry link is similarly useless just showing a few sentences on a few
pages ending in a "request demo" button.

~~~
lmeyerov
Hi Ivan, always cool to see what Graphistry users are doing!

Phil, sad to hear you weren't able to see your Twin Graph. Many of us use ad
blockers and this is the first report we've gotten like yours, so we'll dig
in. Meanwhile, you may be able to try a direct link to my own YesGraph
TwinMap:
[https://labs.graphistry.com/graph/graph.html?dataset=lmeyero...](https://labs.graphistry.com/graph/graph.html?dataset=lmeyerov_twitter_graph)
. (Note: best on laptops, and we recently relaunched with Falcor/React, so
currently porting all our page load optimizations.)

For more information about Graphistry, we have users piloting the three
layers of our stack listed below. Because we can load 10-100X more data at the
visual tier than other systems here (so 100K-1M+ things), people have been
exploring connections across events/entities for some fascinating reasons:

* Investigation & Response -- Connect to systems like Splunk and get rich, scalable visual graph views and easy workflow automation. Ex: build an investigation template that takes an indicator of compromise and runs queries that connect it to various users, devices, alerts, etc. Or, "here are our ssh trails and anomalies around them."

* Exploration: Data scientists and data analysts will explore connections in their events or samples, e.g., for week-over-week model tuning, security research & forensics, & even now loan analysis. They'll load in a bunch of events or samples where each may have a lot of attributes (IPs, times, amounts, ...), and then they can see correlations. Ex: most false positives are from events with 3 particular combinations of characteristics, or an outage involved 4 distinct phases of behavior and entities.

* Developers: folks building internal apps for scenarios like the above.

For the latter two use cases, a good place to get started is our API:
[https://github.com/graphistry/pygraphistry](https://github.com/graphistry/pygraphistry)
. Feel free to contact us at info@ if this may solve a problem for you. (And..
we're hiring! Help us build web-based visual tools with GPU acceleration to
solve real data problems!)

------
andreash
What are typical use cases for this?

~~~
pcbje
To better understand complex relations in the data you are analyzing, e.g.
friendship in social networks.

~~~
andreash
Do you know of any good sources for example data like that? Would be fun to
explore.

~~~
pcbje
Stanford has a good collection of network datasets:
[https://snap.stanford.edu/data/](https://snap.stanford.edu/data/)

------
fourthark
The lasso seems to "give up" if too many nodes are selected - stops updating
the selection although it keeps updating the green shaded area. Perhaps it
should only give up on the text labels which are probably the expensive part.
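
One way the suggested behaviour could work is sketched below: the selection always updates for every node inside the lasso, and only the number of labels is capped. This is a guess at a possible design, not how Ggraph implements its lasso; `point_in_polygon`, `lasso_select`, and `max_labels` are hypothetical names.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as (x, y) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def lasso_select(nodes, polygon, max_labels=50):
    """Select every node in the lasso, but label at most max_labels of them.

    nodes: dict of node name -> (x, y) screen position.
    """
    selected = [
        name for name, (x, y) in nodes.items() if point_in_polygon(x, y, polygon)
    ]
    labelled = selected[:max_labels]  # cap only the (assumed expensive) labels
    return selected, labelled


lasso = [(0, 0), (10, 0), (10, 10), (0, 10)]
nodes = {"a": (1, 1), "b": (5, 5), "c": (9, 2), "d": (20, 20)}
selected, labelled = lasso_select(nodes, lasso, max_labels=2)
```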

------
ropeladder
I really like the grouping features and the labels work pretty well too
(though it's slow with large groups). Thanks for sharing your work!

~~~
pcbje
Thank you! I know, there are plenty of things yet to be done to make this
thing run smoothly.

------
sagichmal
"Hold right click and drag" doesn't work on laptops with trackpads.

